MATHEMATICAL MODELLING IN THE SCIENCE

0 downloads 0 Views 236KB Size Report
However, because the Laplace transform of the Kohlrausch function is not ... the Kohlrausch function by Pollard [68]. 3. ..... bridge University Press, 2001.
INTERNATIONAL JOURNAL OF NUMERICAL ANALYSIS AND MODELING, SERIES B

c 2012 Institute for Scientific

Computing and Information

Volume 3, Number 3, Pages 242–258

MATHEMATICAL MODELLING IN THE SCIENCE AND TECHNOLOGY OF PLANT BREEDING R. S. ANDERSSEN AND M. P. EDWARDS

Abstract. The livelihood of humanity depends crucially on the growing and harvesting of crops, the processing of crops to produce the various foods that are eaten and the distribution of the resulting products and produce to the various consumers. The underlying biological foundation on which the success of this complex industrial hierarchy of activity rests is the success of the ongoing process of plant breeding. Not only must plants be bred to ensure that the planned end-products, such as bread, cakes, pasta and noodles, are of acceptable quality, they must, in order to minimize crop failure and thereby ensure food security and supply, also be insect and/or disease resistant. The success of such endeavours rests on the quality of the underlying science, which has become highly sophisticated in recent years. Its utilization, in terms of the modern understanding of the genetics of plant growth and the increasing sophistication of experimentation and instrumentation, has greatly improved the speed and quality of plant breeding. The associated implementation of these new plant breeding protocols is generating a need for improved quantification through the utilization of mathematical modelling. In order to illustrate the diverse range of mathematics required to support such quantification, this paper discusses some illustrative aspects connected with the recent modelling of the flow and deformation of wheat-flour dough, information recovery from spectroscopic data (e.g. such as the determination of the protein content in wheat), antiviral resistance in plants and pattern formation in plants. Various aspects of the mathematics involved are highlighted from a mathematical modelling perspective, with a key secondary goal, using the discussion about these examples, of illustrating how applications impact on mathematics with the resulting mathematical developments in turn contributing to the solution of other applications with the process starting all over again. Key words. plants, plant breeding, near infrared spectroscopy, hexagonal cell modelling, linear viscoelasticity, pattern formation, reaction diffusion modelling, derivative spectroscopy, causal modelling, rheology

1. Introduction In traditional mathematical modelling, such as in the application of the equations of fluid dynamics [23] to the solution of practical problems such as flow of wind through and over plant canopies and the infiltration of water into soils [21], the underlying fundamentals are clearly understood to the point where well-defined and comprehensive mathematical models are available. In such traditional modelling, the first step often reduces to an assessment of the dominant robust behaviour of the phenomenon being modelled which should be utilized to capture the essence of the situation under examination (e.g. geophysical fluid dynamics [65]; industrial mathematical modelling [26]). The available experimental evidence and scientific background play a key role in this process. The second step reduces to solving and interpreting the resulting set of equations (e.g. numerical methods for the solution of differential equations [56, 22]). There are also areas of biology where such modelling plays the key role. They include the modelling of human physiology [64], the design of artificial arteries [42] and reaction-diffusion modelling of pattern formation in plants and animals [74, 58]. Received by the editors March 9 , 2012 and, in revised form, May 6, 2012. 1991 Mathematics Subject Classification. 92C15, 92C80, 74D99, 62P10, 65F30. 242

THE SCIENCE AND TECHNOLOGY OF PLANT BREEDING

243

In other areas of biology, the modelling is still very much a hypothesis-and-test exploratory activity, in that there is insufficient information about the problem context to give a clear guide as to how to “uniquely” identify the likely solution from within the various possibilities [25, 38, 24]. This is the situation not only in biology, but also in sociology, economics and physiology and even in many areas of industrial mathematical modelling [33]. A number of proven strategies act as guides on how to progress the exploratory process. They include the formulation and utilization of allometric scaling laws [36], calibration-and-prediction [13], compartmental modelling and ordinary differential equation modelling [34]. In many situations, especially in areas of molecular biology, because only a partial picture of the assumed mechanism under examination has been tentatively formulated, success depends on coupling the modelling with complementary experimentation. In fact, the limiting factor is very much what can be performed and measured experimentally, not the plethora of possible models of the mechanism that can be formulated and solved computationally. The coupling, if properly managed, is the strength that mathematical modelling brings to the exploratory process. However, as explained in Anderssen and Waterhouse [17] and Groenenboom and Hogeweg [40], it is a chicken-and-egg situation: on the basis of experimental observations, a variety of models are formulated to simulate possible mechanism; as a proof-by-elimination activity, new experiments are performed to reduce uncertainty about the likely mechanism; the process continues inductively. For example, in areas like plant genetics, some of the questions to be resolved are quite speculative and their resolution depends on identifying and performing the more, rather than the less, appropriate experiments. The overall goal of this paper is to illustrate, within the context of modelling biological aspects of plant breeding, the wide variety of mathematical modelling issues that arise when aiming to answer the real-world question that orchestrated and orchestrates the associated modelling activity. In particular, discussion will focus on: (i) The causal nature of mathematical models in biology with a specific emphasis on the modelling of the flow and deformation of wheat flour dough. (ii) The role of the calibration-and-prediction methods as a surrogate for direct laboratory experimentation with a specific emphasis on the identification, from a large pool of pest resistant wheats, of the ones that contain the proteins that make good breads. (iii) How insight, arising from the modelling of biological invasion, yields a possible foundation for modelling the antiviral response of plants. Such information is required for the design of biological protection for the plants selected through a plant breeding program. (iv) Insight about how genetics controls the development of the various geometric features of plants is required to improve on current plant breeding protocols by identifying the presence or absence of key signalling proteins and hormones. Currently, such insight is being obtained by formulating simple rules for the control of the positioning of the hairs on the leaves of plants. The rules must not only reproduce the observed patterns in normal plants but also, for small changes in the parameters in the model, reproduce the known mutants. A central underlying theme guiding the writing of this paper has been to illustrate “the impact of applications on mathematics”. Sometimes, but always involving a mixture of mathematical expertise and lateral thinking, it only involves

244

R. S. ANDERSSEN AND M. P. EDWARDS

seeing how traditional mathematical constructs can be adapted to model the process being investigated so that an answer to the question being under consideration follows as a natural corollary. An appropriate illustration below is the impact that trying to understand how wheat proteins control the observed flow and deformation of wheat-flour dough during mixing has had on the analysis, utilization and reinterpretation of the equations of linear viscoelasticity. At other times, as the challenges of the modelling are managed, insight is conceptualized about how new mathematical models should be formulated to accommodate the demands of the application. Below, an illustration can be found in the discussion of the modelling the positioning of the trichomes on the leaves of plants where the challenge was and is the formulation of models which not only simulate the assumed mechanism but do it in such a way that the generation of associated mutants is an essential component in the models. 2. Modelling Flow and Deformation (of Wheat-Flour Dough) The modelling of the flow and deformation of (biological) materials, such as wheat flour dough, involves three interrelated activities: (i) the formulation and analysis of mathematical models (equations) to model causal input-output processes, (ii) the coupling and solution of such models with the actual measurements that come from mixing experiments, and (iii) the utilization of the information coming from (ii) to improve on current understanding about the nature of the molecular processes involved in the flow and deformation and the key molecules controlling the process. Together, (i) and (ii) illustrate how the interaction between applications (the context in which the measurements are performed) and their mathematical modelling is a two way street with each, through the interaction, benefiting from and influencing the development in the other. The consequently effect of this is the new understanding (iii) that is thereby generated which can then be utilized in various ways in support of the context in which the measurements were performed. 2.1. Causal Modelling in Rheology. Whether as measurements resulting from an experiment performed on some biological entity, such as the rheological response of a wheat-flour dough to deformation, or as the observed phenotype of a plant in response to a specific stimulus, such as the silencing of a gene using modern gene silencing technology, the response to the stimulus is causal. Integral equation models for linear causal processes take the form of first kind convolution Volterra integral equations. For example, the causal stress response of a viscoelastic material to an imposed strain is modelled using a structure similar to that first proposed by Boltzmann Z t dγ (1) σ(t) = G(t − τ )γ(τ ˙ )dτ, γ˙ = , dt 0 where, respectively, σ, G and γ denote the measured stress response, the relaxation modulus and the imposed strain. However, the way that rheological instruments have been designed directly exploits the mathematical properties of equation (1). If, at time t = 0, the applied strain is oscillatory (shear or elongational) with γ(t) = γ0 sin(ωt),

THE SCIENCE AND TECHNOLOGY OF PLANT BREEDING

245

where γ0 and ω denote the amplitude and frequency of the input stimulus, then the output stress, as determined by equation (1), is given by (2)

σ(t) = γ0 (G′ (ω) sin(ωt) + G′′ (ω) cos(ωt))

where the inphase storage modulus and the out-of-phase loss modulus satisfy, respectively, Z ∞ (3) G′ (ω) = ω G(s) sin(ωs)ds, 0

and (4)

′′

G (ω) = ω

Z



G(s) cos(ωs)ds.

0

Commercial instruments directly return, for a chosen sequence of frequencies ω, the values of G′ (ω) and G′′ (ω) and it is these values that are utilized in rheological problem-solving and decision-making because of their direct physical interpretation. It is an excellent example of how the solution of an inverse problem is based on functionals defined on the solution of the mathematical model and not the explicit structure of the solution itself. It represents an important aspect of real-world inverse problem solving referred to as the the direct use of indirect measurements. In addition, it is an excellent example of how the practical needs of rheological decision-making has impact on the mathematics of viscoelasticity to yield a practical solution. Even for the actual determination of the relaxation modulus G(t), it is the relaxation spectrum H(t) ≥ 0, defined by Z ∞ H(τ ) exp(−t/τ ) dτ, (5) G(t) = τ 0 that is utilized as its surrogate. The importance of this choice is that it automatically guarantees that the relaxation modulus has fading memory (in the sense of Boltzmann) and thereby that the causal model formulated in equation (1) is consistent with the conservation of energy. For a given viscoelastic material, the actual determination of its relaxation spectrum is performed using the measured storage modulus G′ (ω) and loss modulus G′′ (ω) data. The substitution of the fading memory representation of equation (5) into the equations (3) and (4) yields the following dual system of equations that are solved for the relaxation spectrum H(τ ) Z ∞ ω 2 τ 2 H(τ ) G′ (ω) = dτ, 1 + ω2τ 2 τ 0 and

Z



ωτ H(τ ) dτ. 2τ 2 1 + ω τ 0 The importance of the relaxation spectrum model for the relaxation modulus of equation (5) is that it not only automatically guarantees that G(t) is a completely monotone function but also yields, via the solution of equations (3) and (4), a computationally stable process for the approximation of G(t) by a completely monotone function. Such considerations have led, from a rheological perspective [12], to the analysis of the properties of completely monotone functions [7], which represent a fundamentally important class of mathematical functions [79]. It is therefore another example of the impact of the needs of a real-world activity, rheological problem-solving, on the development of mathematics. ′′

G (ω) =

246

R. S. ANDERSSEN AND M. P. EDWARDS

2.2. Wheat Flour Dough. In the making of bread, cakes and pasta, the quality of the end product depends not only on the properties of the wheat from which the flour is made but also on the processing performed to make a particular end product. The behaviour of the product through the various processing stages is directly related to the composition of the wheat. For example, historically, it became common knowledge that hard wheats make good breads, soft wheats make good cakes and durum wheats make good pastas. The characterization and the formalization of such rules-of-thumb in terms of the composition of wheat and the resulting quality characteristics of end products has become a major activity of modern cereal science R&D. In that respect, through the use of modern technology, the flow and deformation (rheology) of a wheat flour dough has become, as well as near infrared (NIR) spectroscopy [5, 11] and grain hardness measurements [8], a key link technology that is utilized to relate wheat composition to end product quality. This modus operandi is base on the fact, confirmed by extensive experimentation and modelling, that the composition of a wheat correlates with the rheology of the dough that its flour makes, and that rheology correlates with end product quality. As a result, the rheology of wheat flour dough has become a fundamental aspect in cereal science research, technology and application. From a plant breeding perspective, such R&D has yielded detailed information about the specific components (proteins, starches, etc) in a wheat which correlate most strongly with specific quality aspects of the end products. For example, it is now known that: hard wheats, which make good breads, have a higher proportion of the gultenin proteins than soft wheat and that the doughs that they make have larger bandwidth [37]; when crushed, hard wheat are much more brittle than soft wheats and the size distributions of the resulting particles are quite different [8]. The consequential impact that cereal science has had on the theory and application of rheology has spawned a variety of independent mathematical investigations which include: (a) The modelling of the accumulation of elastic potential energy by the mining of a wheat-flour dough (on a Mixograph) as an open-loop hysteretic elongation-and-rupture process [16, 15, 37]. The accumulation represents an indirect measurement of the molecular dynamics of the development of the protein network within the dough and thereby a basis for comparing the role of different combinations of wheat gluten proteins in terms of the resulting quality of the products that their doughs make [14]. (b) In order to extend the applicability of the Boltzmann causal integral representation (1) to general wheat-flour dough modelling, the concept of a damage function f [72] has been introduced to model the measured stress response σ(t) σ(t) = f (t)

Z

0

t

G(t − τ )γ(τ ˙ )dτ,

γ˙ =

dγ . dt

2.3. Interconversion and New Opportunities. In the measurement of (linear) viscoelasticity, equation (1) models the situation where, in response to an applied strain rate γ, ˙ the stress in the material being tested relaxes. An equally important measurement of (linear) viscoelasticity is the accumulation in the material being tested of the strain (the creep) in response to an applied stress rate. The

THE SCIENCE AND TECHNOLOGY OF PLANT BREEDING

247

corresponding Boltzman causal integral equation model takes the form Z t dσ (6) γ(t) = J(t − τ )σ(τ ˙ )dτ, σ˙ = , dt 0 where J(t) is referred to as the retardation (creep) modulus. In order to guarantee ˙ that the conservation of energy is respected, the time derivative of J(t), J(t), must be a completely monotone function. By applying the Laplace convolution theorem to equations (1) and (6), combining the resulting equations to eliminate either σ or γ and then invoking the fundamental theorem of functional analysis, the following convolution identity, known as the interconversion equation, is obtained Z t Z t (7) G∗J = G(t − τ )J(τ ) = J ∗ G = J(t − τ )G(τ ) = t. 0

0

Mathematically, the importance of the interconversion relationship is that it allows equations of the form (8)

G ∗ u = f,

J ∗v =g

to be solved analytically to yield [2, 3] d2 d2 (J ∗ f ) , v = 2 (G ∗ g) . 2 dt dt The practical importance of this interconversion relationship is that once, as a result of solving the appropriate Boltzmann causal model, an estimate of either G or J has been obtained, equation (7) can be solved to determine the other. This avoids the need to invest in two separate instruments for determining estimates of G and J independently, and it ensures that, given either an estimate of G or J, the resulting J or G satisfies the interconversion relationship, which is an essential constraint on their dual behaviour. Interestingly, from a historical perspective, it was one of the first practical problems solved on one of the early electronic computers [45]. These analytic results have then been used to study the effect on the solutions u and v of perturbations in the kernels G and J, repectively [2, 3, 4]. This rheological result and the associated kernel perturbation analysis for G and J have had a consequential impact on mathematics in that they have been generalized to the analysis of first kind and second convolution Volterra equations [27]. For the first kind equations Z t (10) k∗u = k(t − τ )u(τ )dτ = f, (9)

u=

under appropriate regularity, assuming that there exists an function h such that (11)

k∗h=t

it follows that the solution of the first kind equation (10) is given by d2 (h ∗ f ) . dt2 Such results have been used to examine the effect of kernel perturbations for general first and second kind convolution Volterra equations [27]. However, any new mathematical result for Volterra equations has the potential to be applied to Volterra equations in a completely different context. For example, the defect renewal equation which defines the probability of ruin is a first kind convolution Volterra equation to which the generalization has been applied to examine the effect in kernel perturbations [6].

(12)

u=

248

R. S. ANDERSSEN AND M. P. EDWARDS

Another aspect of mathematics that is a consequence of the impact of rheological modelling on mathematics is the choice of model for the relaxation and creep modulii G and J. As already mentioned, in order to guarantee conservation of energy, G and J˙ = dJ/dt must be completely monotone functions. The popular strategy has been, using measured stress-strain data, to approximate either G or J by a sums of exponentials and solve the interconversion relationship (7). In some situations, the relaxation behaviour of G must be fitted with a large number of exponentials. An alternative strategy is to use the Kohlrausch (Williams-Watts, stretched exponential) function A exp(−αtβ ), 0 < β < 1, because it only involves three parameters which are easily estimated using a double logarithm transformation. However, because the Laplace transform of the Kohlrausch function is not known explicitly, it in turn must be approximated by a sum of functions for which the Laplace transforms are known. The obvious choice is the exponential functions. The approximation of the Kohlrausch function by sums of exponentials has recently been investigated [7] using the 1946 Laplace transform representation of the Kohlrausch function by Pollard [68]. 3. Recovery of Information from Spectroscopic Data In a plant breeding trial, a major activity is maximizing the number of varieties screened. Near infrared (NIR) spectroscopy, as well as other forms of spectroscopy, is now widely utilized because of the speed and minimal-cost of the screening, once a suitable calibration-and-prediction (CAP) for the target property has been performed. In addition, the speed and minimal cost of NIR assessments are being exploited in a large variety of other ways. For example: Australian farmers are paid for their harvested wheat on the basis of its protein and moisture content which is assessed using NIR measurements [62, 78]; Raman spectroscopy is being trialled to measure the blood sugar levels in diabetes sufferers [49, 61]; by exploiting derivative spectroscopy analysis [10], NIR measurements are being used to identify the molecular differences in cereal (barley) mutants [80]; various spectroscopic modalities are being used to check that pharmaceutical tables meet specifications [53]. For the recovery of information from spectroscopic data, the mathematical modelling is performed implicitly by assuming that, as a sparse regularization ansatz, a simple relationship holds between the measured spectra of a sample and the specific property of interest of that sample. In this way, by exploiting the fact that only a subset of the spectrum of the sample relates to the property of interest, a partial regularization of the associated under determined system of equations is achieved. The more traditional and historic approach is CAP, but other versions go under the names of machine learning, compressed sensing and sparse regularization. In one way or another, they can be viewed as methodologies that explore for structure in highly under determined situations which can be used predictively. By utilizing computer controlled technology, modern spectrometers generate highly accurate data on very fine (wavelength) grids. Consequently, visually, such data look as if they are accurate discrete realization of some smooth curve. This has led to a resurgence in the application of the historic methodology of derivative spectroscopy [10, 43] to compare the fine scale structure in spectra [80]. 3.1. Calibration-and-Prediction (CAP). For CAP, the overall goal is the determination of a predictor for some specified property P, such as the protein content of wheat, where the predictor is defined in terms of some subset of the features of the spectrum that encapsulate the essence of P. The calibration step estimates the form of the predictor.

THE SCIENCE AND TECHNOLOGY OF PLANT BREEDING

249

For a set of m representative samples, two independent classes of measurements are performed [6]: (i) For each sample, a large comprehensive set of inexpensive and rapidly recorded indirect measurements are collected. A popular choice is a nearinfrared (NIR) spectrum over a comprehensive range of wavelengths. Depending on the circumstances, other forms, such as Raman spectra [1] and grain hardness measurements [8], are utilized either independently of or jointly with the NIR spectra. The overall rationale motivating the choice of the indirect measurements is that (localized) features in them are strongly linked to the information required about P. (ii) For each sample, a (possibly expensive, time consuming and/or hazardous) laboratory measurement is performed to determined the value of P (protein content). This set of dual measurements defines the “calibration data”. For NIR measurements, where a linear relationship is justified (Beer’s Law) [63], it is assumed that the dual calibration data satisfies the linear relationship (13)

Sβ = p,

where the m × n-matrix S, with n >> m, has the m measured spectral responses for n evently spaced wavelengths stored as its rows, p is the vector of the measured values of the property P, and β is the unknown (regression) coefficient vector to be determined. Since only a subset of the spectral responses are related to the property P, the estimation of β as βˆ will involve a dimension reduction calibration step resulting in the replacement of equation (13) by (14)

Sk βˆk = p,

Sk ∈ Rm×k ,

with rank(Sk ) = k < n. It is the βˆk estimate, generated at the calibrations step, that defines the predictor ∗ p of P for a new sample with NIR spectrum s∗ ; namely, (15) p∗ = (s∗ )T βˆk . The clear advantage of this process is that it removes the need to perform laboratory measurements. The price to be paid is the need to solve the (highly) underdetermined system of equations (13) with n >> m. The fact that k < n can be achieved is reflected in the nature of an NIR spectrum (and spectral measurements in generally) in that they record the proportional presence of all the components in the sample that respond to the specific electromagnetic stimulus applied. A variety of methods, including principal component analysis (PCA) [31], partial least squares (PLS) [59] and independent component analysis (ICA) [47], have been utilized to perform the dimension reduction that is the essence of the calibration. However, the choice of β in the calibration equation (13) to define how the prediction process (of equation (15)) should be performed is not unique. This is directly reflected in the fact that the calibration equation (13) can be rewritten to take the alternative form (16) SGG−1 β = p, S¯β¯ = p, where the matrix G represents the chosen preprocessing to be performed on each of the spectra forming S. In NIR spectroscopy, in order to remove the linear scattering sample effect, second order differentiation preprocessing is often applied before the calibration step is performed [63]. If the one-sided derivatives are included in the

250

R. S. ANDERSSEN AND M. P. EDWARDS

differentiation of S, then G would correspond to the second derivative matrix D2 . An alternative choice for G, motivated by derivative spectroscopy [80, 10], is the fourth differentiation matrix D4 . This possibility has recently been investigated in [6]. Other choices are possible. They depend on how the preprocessing can assist with the subsequent calibration. In this preprocessing situation, the modified calibration equations (16) have the same basic structure as the original calibration equations (13). Consequently, in the calibration step, the estimation of β as βˆ is replaced by the estimation of β¯ as ˆ ˆ In this way, the prediction of the value of P as p∗ using the predictor (15) is β. replaced by the predictor ¯ˆ ¯ ∗ = (s∗ G)T β, (17) p where the preprocessing encapsulated in G must be applied to the measured spectrum s∗ of the new sample before the linear functional that defines the prediction can be evaluated. 3.2. Derivative Spectroscopy. It is common practice to model a given set of J discrete observational data points {dj } as (18)

dj = s(tj ) + ǫj ,

t1 < t2 < · · · < J,

where s denotes the smooth signal that the data {dj } are sampling and the {ǫj } random errors associated with the way in which the measurement of the {dj } was performed. In reality, a more appropriate model is (19)

dj = S(tj ) + s(tj ) + ǫj ,

t1 < t2 < · · · < tJ ,

where S denotes the dominant (linear, quadratic, cubic, etc) feature in the smooth signal and s some subsidiary feature the presence of which is not apparent because of the presence of S. In addition, because of the averaging, performed by the computer controlled instrument, of multiple spectra recorded for the same sample, the variance of the {ǫj } is exceedingly small. Such a situation arises for the NIR spectra of powders, such as wheat flours, due to the scattering by the particles within the powder. For two separate samples of the same powder, the packing of the particles will be different. The corresponding NIR spectra will each contain a linear scattering bias. For NIR spectra, though multiplicative scatter correction is often used to remove this linear bias, the more popular strategy is to differentiate the spectral data twice. Its importance and popularity relate not only to the efficient and automatic removal of the linear trend, but also to the fact that the differentiation performs a resolution enhancement. Higher derivatives, such as the fourth, can be used to perform the resolution enhancement [10, 43, 44, 6]. Because of the high accuracy of the measured spectra on a very fine discretization grid, there is no problems with the numerical differentiation if appropriate stable numerical differentiation procedures are used [9]. 4. Modelling Disease and Insect Resistance in Plants Plants must be bred not only to guarantee a desired end-product quality outcome but also to be disease and insect resistant. Historically and currently, the latter was and is achieved as part of the selection process and appropriate cross-breeding, which are involved and time consuming activities. Molecular biological and genetic techniques are now being used to identify and add to target plants transgenes which confer the required resistance [28, 69, 57].

THE SCIENCE AND TECHNOLOGY OF PLANT BREEDING

251

From a mathematical modelling perspective, there has been considerable research related to biological invasion between plants [48] as well as the spread of infections and insects [71]. Reaction diffusion modelling [71, 18] has played a key role along with systems of ordinary differential equations [48]. However, from a practical perspective, it is important to perform the modelling so that it can be adapted to actual measurements of the spread of an invasion. For this, integral recusrsion [77, 73, 52] and level set methods have been proposed [19]. With the discovery in plants of gene silencing, its application to hairpin RNAmediated gene silencing of target genes [75, 76, 29] and its role in the control of plant viral infections, new opportunities have arisen to study and model the various roles of gene silencing in plant germination, growth, flowering, viral infection response, etc. For the design of hairpins, it is important to ensure that only the candidate gene to be silenced is silenced. A match-searching algorithm has been proposed by Horn and Waterhouse [46]. The mathematical modelling of the dynamics of biological invasion of pests [71] represents a starting point for the formulation of models of molecular biological invasion of the type that occurs when a plant responds to infection by a virus. In the co-evolutionary development of modern plants and animals, whereas mammals developed immune systems using antibodies to detect and destroy viruses, plants developed a system to detect the presence of the double stranded RNA (dsRNA) of a replicating virus and to utilize this information to destroy it. This and related processes in animals and plants are now referred to as “gene silencing”. Understanding how it works biologically has been exploited scientifically by designing dsRNA transgenes which are inserted into plants to identify the effect of switching off target genes. The modelling of gene silencing has been discussed in various articles [40, 41, 17]. The gene silencing dynamics of viral defence in plants is a novel molecular predator-prey-predator activity of considerable complexity. The infection of a plant by a virus (the initial predator) attacks the plant cells (the initial prey). The replication of the virus initiates the plants Dicer/Agro response which produces small interfering RNAs (siRNAs) to become the predator with the virus becoming the prey. The spatial-temporal dynamics of the movement of the virus and the siRNAs in a plant determines the information recoverable from experiments about the predator-prey-predator activity. From a mathematical modelling perspective, the goal is the recovery of solutions, consistent with the available experimental observations, of the various inverse problems each of which highlights a specific feature of the underlying predator-preypredator dynamics. A related situation arises in the modelling of the macroscopic dynamics of biological invasion using reaction-diffusion equations. The speed of propagation of the wavefront of the invasion (animals, insects, viruses) represents a key constraint on the choice of the model. The impact of a viral infection in a plant is manifested in various ways. Depending on the role that the virus has evolved to perform, sometimes the plant dies, other times a local imperfection, such as an erratic target pattern on the leaves and fruits, becomes visible as a result of the spatial-temporal dynamics of the competition between the virus and the siRNAs. Two different scenarios for which mathematical models are being developed are: (i) The movement of the siRNAs as they prey on the virus. Such movement can be induced in plants using specially designed dsRNA hairpins in roots grafted to shoots containing a complementary target transgene.

252

R. S. ANDERSSEN AND M. P. EDWARDS

(ii) The interpretation of the observed spatial-temporal dynamics of the competition between the virus and the siRNAs. Such information is collected using time-lapse photography. How the mathematical modelling should be performed represents a future opportunities for mathematical biology research.

5. Modelling Pattern Formation and Floral Shape in Plants A key activity in genetics is, on the basis of experimental results, the identification of the role being performed by specific genes. Because of the large number of genes active in various cells of an organism, there is a need to not only identify the likely role of a gene but also to put it into an appropriate equivalence class. A representative equivalence class is the “housekeeping genes” for which the equivalence relation is “common to all cells” in an organism. In the study of the various phases in the life of a plant, a key goal in biological research is the discovery of the genes which are only expressed (active) in a given phase. Such a situation arises in the study of pattern formation in plants. The types of questions under investigation include: “What are the specific genes that control the positioning of trichomes (hairs) on the leaves and stems of plants?”; “What are the roles of the genes which are only expressed during floral growth?”. But, to answer such questions, the various stages in the formation of some specific pattern must be identified and characterized. This is straight forward when visual cues can be used, such as the time a trichome first appears, or the time when a flower bud opens. It is more challenging when the genetic control of the shaping of the trichomes or flowers is under consideration. Now, the identification must be characterized quantitatively. Enrico Coen and colleagues have been using principal component analysis (PCA) for the quantitative characterization of shape change [25, 38, 24]. They have adopted and adapted a technology first introduced to track the changes in lip shape as someone speaks so that it can be used to quantify lipreading [54]. Furthermore, insight about how genetics controls the development of various geometric phases in plant growth is required to improve on current understanding about what the specific genes involved are actually controlling and doing. Indirectly, such information is important for plant breeding as it helps clarify the role of known signalling proteins and hormones. Examining how to formulate models that describe some specific phase of the genetic control complements and supplements the independent information coming from experimentation and, thereby, has the potential to assist with the identification of what the next experiments should be. Currently, such insight is being obtained in a number of different ways which include: (a) Reaction-Diffusion Modelling of Pattern Formation [35, 55, 39, 32, 50]. (b) Principal Component Analysis (PCA) of Floral Development [25, 38, 24]. (c) Hexagonal Cell Modelling [67, 30]. In this work, the positioning of hairs on the leaves of plants is modelled using simple rules on hexagonal grids. The rules must not only reproduce the observed patterns in normal plants but also, for small changes in the parameters in the model, reproduce the known mutants. The importance of this constraint on the modelling appears to have been first noted by Young [81] in his modelling of the growth of the pea.

THE SCIENCE AND TECHNOLOGY OF PLANT BREEDING

253

5.1. Hexagonal Modelling of the Genetic Control of Trichome Positioning. For the positioning of the hairs on the leaves of plants, Pereverzyev and Anderssen [67] have examined the utility of modelling the growth of the leaf out from the meristem as a simple signalling process on an array of hexagonal cells. Among other things, this has led to the formulation and investigation of the algebra of hexagonal recursion [60]. The hexagonal recursion model [30] assumes that the fate of the cells on the leaf of a plant is determined as they grow out of the meristem, as illustrated by the red cells in Figure 1(A). Biological constraints taken into account include: (i) the model is cellular and is not a macroscopic model with cellular details smoothed out; (ii) the assumption that the cells are hexagonal (This is consistent with [51], where the average number of sides of leaf cell has been shown to be approximately six.); (iii) the fate of a cell is determined by its neighbours [66]; (iv) no boundary cell (of an Aribidopsis leaf [70]) becomes a trichome. In the model, a recursion rule is used to determine the hexagonal recursion value (HRV) in a cell as it grows out of the meristem, as illustrated in Figure 1(B). Boundary cells are assigned a specified constant value. The HRV of a non-boundary cell is determined as the weighted value of the HRVs of its three neighbouring cells just formed above the meristem. This is illustrated pictorially and algebraically in Figure 1(B). Stipulating that each of the weights in the formula in Figure 1(B) is greater than or equal to 1 ensures that the HRV will be increasing until reset. If a specified threshold is reached in a red cell being formed, its fate is set to be a trichome; i.e., trichome initiation is switched ON. When a second specified threshold is met in at least one of the triple of black hexagonal cells above a red one being formed, trichome production is switched OFF and the value of that cell is reset to some specified constant value. A trivial asymptotic case arises when the first threshold (ON switch) equals the boundary value and the second (OFF) threshold → ∞. In this case, the ON threshold value is met by all cells and the OFF threshold is never met, so that all cells become trichomes, including those along the boundary. A second trivial asymptotic situation arises when the second (OFF) threshold value is less that the first (ON) threshold value, so that no trichomes are initiated. Setting the OFF threshold value equal to the ON threshold value results in trichome initiation being switched OFF immediately after the initiation of a trichome occurs. This means that as soon as a cell reaches the first threshold and produces a trichome, any lower adjacent cell will have its value reset and the possibility of a trichome is “switched off”. This is consistent with wild type patterns, where cells producing trichomes cannot be adjacent [51]. In this case, there is no possibility of generating the clumping of trichomes which is observed in mutants. Setting the OFF threshold greater than the ON threshold allows production of trichomes in adjacent cells. Increasing the difference in the two threshold values leads to the thickening of the groups of adjacent cells producing trichomes. This model confirms that small changes in the parameters easily yield different patterns for the trichomes for the hexagonal cell model, capturing both wild type and mutant behaviour. Such modelling will, in time, be linked to the current understanding about the role of the stem cells in determining how a leaf is formed [20].

254

R. S. ANDERSSEN AND M. P. EDWARDS

meristem

(a)

C2 C1

C3 C



C ∗ = Wl C 1 + Wc C 2 + Wr C 3

(b)

Figure 1. (A) A hexagonal cell approximation of the epidermal cells on the upper side of a leaf. The top hexagonal cell corresponds to the tip of the leaf. The red hexagons represent the cells growing out of the meristem. Further details can be found in [67]. (B) The localized additive relationship that models how the concentration of the signal in the cells above the meristem determine its concentration in the cells forming at the meristem. As illustrated in Figures 2 and 3, small changes in the parameters easily yield different patterns for the hairs on the hexagonal cell model. Acknowledgements Useful and helpful discussions with Sergiy Pereverzyev, Frank de Hoog, Markus Hegland and Peter Waterhouse are gratefully acknowledged. References [1] R. S. Anderssen, E. Carter, B. G. Osborne, and I. J. Wesley. Joint inversion of multi-modal spectroscopic data of wheat flours. Appl. Spectro., 59:920–925, 2005. [2] R. S. Anderssen, A. R. Davies, and F. R. de Hoog. On the sensitivity of interconversion between relaxation and creep. Rheologica Acta, 47:159–167, 2008.

THE SCIENCE AND TECHNOLOGY OF PLANT BREEDING

(a)

(b)

255

(c)

Figure 2. Final distributions generated by the hexagonal recursion with fixed non-symmetric weights and the OFF threshold equal the ON threshold.

(a)

(b)

(c)

Figure 3. Final distributions generated by the hexagonal recursion with fixed non-symmetric weights and the OFF threshold greater than the ON threshold. [3] R. S. Anderssen, A. R. Davies, and F. R. de Hoog. On the Volterra integral equation relating creep and relaxation. Inverse Problems, 24:035009 (p. 13), 2008. [4] R. S. Anderssen, A. R. Davies, and F. R. de Hoog. The effect of kernel perturbations when solving the interconversion convolution equation of linear viscoelasticity. Appl. Math. Letters, 24():71–75, 2011. [5] R. S. Anderssen, F. R. de Hoog, and I. J. Wesley. Information recovery from near infrared data. In W. McLean and A. J. Roberts, editors, Proceedings of the 15th Biennial Computational Techniques and Applications Conference, CTAC-2010, volume 52 of ANZIAM J., pages C333–C348, 2011. [6] R. S. Anderssen, F. R. de Hoog, and M. Westcott. Stability of the defect renewal equation. In MODSIM2011, 19th International Congress of Modelling and Simulation, pages 359–363. Modelling and Simulation Society of Australian and New Zealand, 2011. [7] R. S. Anderssen, M. P. Edwards, S. A. Husain, and R. J. Loy. Sums of exponentials approximations for the Kohlrausch function. In MODSIM2011, 19th International Congress of Modelling and Simulation, pages 263–269. Modelling and Simulation Society of Australian and New Zealand, 2011. [8] R. S. Anderssen and R. Haraszi. Characterizing and exploiting the rheology of wheat hardness. Euro. Food Res. Tech., 229:159–174, 2009. [9] R. S. Anderssen and M. Hegland. For numerical differentiation, dimensionality can be a blessing! Mathematics of Computation, 68(227):1121–1141, 1999.

256

R. S. ANDERSSEN AND M. P. EDWARDS

[10] R. S. Anderssen and M. Hegland. Derivative Spectroscopy – An enhanced role for numerical differentiation. J. Integ. Eqn. Appl., 22(3):355–367, 2010. [11] R. S. Anderssen, M. Hegland, and I. J. Wesley. Resolution enhancement for infrared spectroscopic data. In MODSIM2011, 19th International Congress of Modelling and Simulation, pages 371–377. Modelling and Simulation Society of Australian and New Zealand, 2011. [12] R. S. Anderssen, S. Husain, and R. J. Loy. The Kohlrausch function: properties and applications. ANZIAM J (E), 45:C800–C816, 2004. [13] R. S. Anderssen, B. G. Osborne, and I. J. Wesley. The application of localisation to near infrared calibration and prediction through partial least squares regression. JNIRS, 11(1):39– 48, 2003. [14] RS Anderssen, F Bekes, PW Gras, A Nikolov, and JT Wood. Wheat-flour dough extensibility as a discriminator for wheat varieties. J. Cereal Sci., 39():195–203, 2004. [15] RS Anderssen, IG Gotz, and KH Hoffmann. The global behavior of elastoplastic and viscoelastic materials with hysteresis-type state equations. SIAM J. Appl. Math., 58:703–723, 1998. [16] RS Anderssen, PW Gras, and F MacRitchie. The rate-independence of the mixing of wheat flour dough to peak dough development. J. Cereal Sci., 27():167–177, 1998. [17] RS Anderssen and PW Waterhouse. Modelling antiviral resistance in plants. Antiviral Resistance in Plants, Chapter 10, 2011. [18] DG Aronson and HF Weinberger. Multidimensional non-linear diffusion arising in populationgenetics. Advances in Math., 30:33–76, 1978. [19] Britta Basse and Michael Plank. Modelling biological invasions over homogeneous and inhomogeneous landscapes using level set methods. Bio. Invas., 10:157–167, 2008. [20] Siobhan A. Braybrook and Cris Kuhlemeier. How a Plant Builds Leaves. Plant Cell, 22():1006–1018, 2010. [21] P Broadbridge, JH Knight, and C Rogers. Constant rate rainfall infiltration in a bounded profile - solutions of a nonlinear model. Soil Sci. Soc. Amer. J., 52:1526–1533, 1988. [22] J. C. Butcher. Numerical Methods for Ordinary Differential Equations. Wiley, 2003. [23] A. J. Chorin and Marsden. A Mathematical Introduction to Fluid Mechanics. Springer, 2000. [24] E Coen, AG Rolland-Lagan, M Matthews, JA Bangham, and P Prusinkiewicz. The genetics of geometry. PNAS, 101:4728–4735, 2004. [25] Min-Long Cui, Lucy Copsey, Amelia A. Green, J. Andrew Bangham, and Enrico Coen. Quantitative Control of Organ Shape by Combinatorial Gene Activity. PLOS Biology, 8(11), NOV 2010. [26] F. R. de Hoog. Why are simple models often appropriate in industrial mathematics? 18th World IMACS/MODSIM Congress, Cairns, July 13-17, Proceedings:23–36, 2009. [27] F. R. de Hoog and R. S. Anderssen. Kernel perturbations for Volterra convolution integral equations. J. Integral Eqn. Appl., 22:427–441, 2010. [28] E. S. Dennis, J. Ellis, A. Green, D. Llewellyn, M. Morell, L. Tabe, and W. J. Peacock. Genetic contributions to agricultural sustainability. Phil. Trans. Roy. Soc. B - Bio. Sci., 363:591–609, 2008. [29] A. Eamens, M.-B. Wang, N. A. Smith, and P. M. Waterhouse. RNA silencing in plants: Yesterday, today, and tomorrow. Plant Physiol., 147:456–468, 2008. [30] M. P. Edwards, S. Pereverzyev Jr., and R. S. Anderssen. Modelling pattern formation in plants. In MODSIM2011, 19th International Congress of Modelling and Simulation, pages 371–377. Modelling and Simulation Society of Australian and New Zealand, 2011. [31] L. Elden. Partial least-squares vs. Lanczos bidiagonalization – I: analysis of a projection method for multiple regression. Comp. Stats & Data Anal., 46(1):11–31, 2004. [32] H. Fujita, K. Toyokura, K. Okada, and M. Kawaguchi. Reaction-Diffusion Pattern in Shoot Apical Meristem of Plants. PLOS ONE, 6, 2011. [33] G. Fulford and P. F. Broadbridge. Case Studies in the Diffusion of Heat and Matter. Cambridge University Press, 2001. [34] G. Fulford, P. Forrester, and A. Jones. Modelling with Differential and Difference Equations. Cambridge University Press, Cambridge, U. K., 1997. [35] A Gierer and H Meinhardt. Theory of biological pattern formation. Kybernetik, 12:30–39, 1972. [36] Douglas S. Glazier. A unifying explanation for diverse metabolic scaling in animals and plants. Biol. Reviews, 85(1):111–138, FEB 2010. [37] PW Gras, HC Carpenter, and RS Anderssen. Modelling the developmental rheology of wheatflour dough using extension tests. J. Cereal Sci., 31():1–13, 2000.

THE SCIENCE AND TECHNOLOGY OF PLANT BREEDING

257

[38] Amelia A. Green, Richard Kennaway, Andrew I. Hanna, J. Andrew Bangham, and Enrico Coen. Genetic Control of Organ Shape and Tissue Polarity. PLOS Biology, 8, 2010. [39] P. B. Green. Expression of pattern in plants: Combining molecular and calculus-based biophysical paradigms. Amer. J. Botany, 86:1059–1076, 1999. [40] MAC Groenenboom, AFM Maree, and P Hogeweg. The RNA silencing pathway: The bits and pieces that matter. PLOS Computational Biology, 1(2):155–165, JUL 2005. [41] Marian A. C. Groenenboom and Paulien Hogeweg. RNA silencing can explain chlorotic infection patterns on plant leaves. BMC Systems Biol., 2, NOV 30 2008. [42] V. G. Hart and J. Y. Shi. Governing equations for wave-propagation in prestressed joined dissimilar elastic tubes containing fluid-flow - with an example for a tapered section. International J. Eng Sci, 33(8):1121–1138, JUN 1995. [43] M. Hegland. Error bounds for spectral enhancement which are based on variable Hilbert scale inequalities. J. Integral Eqns. Appl., 22:285–312, 2010. [44] M. Hegland and R. S. Anderssen. Resolution enhancement of spectra using differentiation. Inverse Problems, 21(3):915–934, 2005. [45] I. L. Hopkins and R. W. Hamming. On creep and relaxation. J. Appl. Phys., 28:906–909, 1957. [46] M. E. T. Horn and P. M. Waterhouse. Rapid match-searching for gene silencing assessment. Bioinformatics, 26:1932–1937, 2010. [47] A. Hyvarinen and E. Oja. Independent component analysis: algorithms and applications. Neural Net., 13(4-5):411–430, 2000. [48] M. J. Jeger, R. Jeffries, Y. Elad, and X.-M. Xu. A generic theoretical model for biological control of foliar plant diseases. J. Theor. Biol., 256:201–214, 2009. [49] OS Khalil. Spectroscopic and clinical aspects of noninvasive glucose measurements. Clinical Chem., 45:165–177, 1999. [50] S. Kondo and T. Miura. Reaction-Diffusion Model as a Framework for Understanding Biological Pattern Formation. Science, 329:1616–1620, 2010. [51] JC Larkin, N Young, M Prigge, and MD Marks. The control of trichome spacing and number in Arabidopsis. Development, 122():997–1005, 1996. [52] B. Li, M. A. Lewis, and H. F. Weinberger. Existence of traveling waves for integral recursions with nonmonotone growth functions. J. Math. Biology, 58:323–338, 2009. [53] Marta B. Lopes, Jean-Claude Wolff, Jose M. Bioucas-Dias, and Mario A. T. Figueiredo. Determination of the composition of counterfeit Heptodin (TM) tablets by near infrared chemical imaging and classical least squares estimation. Anal. Chim. Acta, 641:46–51, 2009. [54] I. Matthews, T. F. Cootes, J. A. Bangham, S. Cox, and R. Harvey. Extraction of visual features for lipreading. IEEE Trans. Pattern Anal. Machine Intel., 24:198–213, 2002. [55] H Meinhardt. Models of pattern formation and their application to plant development. Positional Control in Plant Development, pages 1–32, 1984. [56] K. W. Morton and D. F. Mayers. Numerical Solution of Patial Differential Equations. Cambridge University Press, 2005. [57] F. Murray, D. Llewellyn, H. McFadden, D. Last, E. S. Dennis, and W. J. Peacock. Expression of the Talaromyces flavus glucose oxidase gene in cotton and tobacco reduces fungal infection, but is also phytotoxic. Molec. Breeding, 5:219–232, 1999. [58] J. D. Murray. Mathematical Biology. 1989. [59] T. Naes, T. Isaksson, T. Fearn, and T. Davies. A User-Friendly Guide to Multivariate Calibration and Classification. NIR Publications, Chichester, UK, 2002. [60] CM O’Keefe, S Pereverzyev Jr., and RS Anderssen. The algebra of hexagonal numbers. The Mathematical Scientist, 36:1–9, 2011. [61] N. S. Oliver, C. Toumazou, A. E. G. Cass, and D. G. Johnston. Glucose sensors: a review of current and emerging technology. Diabetic Med., 26:197–210, 2009. [62] B. G. Osborne. Near-infrared spectroscopy in food analysis. In Encyclopedia of Analytic Chemistry, pages 1–14. J. Wiley & Sons, Chichester, UK, 2006. [63] B. G. Osborne, T. Fearn, and P. H. Hindle. Practical NIR Spectroscopy with Applications in Food and Beverage Analysis. Longman Scientific & Technical, Harlow, UK, 1993. McGrawHill Series in Higher Mathematics. [64] J. T. Ottesen, M. S. Olufsen, and J. K. Larsen. Applied Mathematical Models in Human Physiology. SIAM, Philadelphia, 2004. [65] J. Pedlosky. Geophysical Fluid Dynamics. Springer, 1987. [66] RI Pennell, QCB Cronk, S Forsberg, C Stohr, L Snogerup, P Kjellbom, and PF McCrae. Cell-Contex Signalling. Phil. Trans Roy. Soc. London, B-Biological Sci., 350:87–93, 1995.

258

R. S. ANDERSSEN AND M. P. EDWARDS

[67] S Pereverzyev Jr. and RS Anderssen. Recursive algebraic modelling of gene signalling, communication and switching. RICAM Report, 24:(), 2008. [68] H Pollard. The representation of exp(xλ ) as a Laplace integral. Bull. Amer. Math. Soc., 52:908–910, 1946. [69] D. Rungis, D. Llewellyn, E. S. Dennis, and B. R. Lyon. Investigation of the chromosomal location of the bacterial blight resistance gene present in an Australian cotton (Gossypium hirsutum L.) cultivar. Aust. J. Agric. Res., 53:551–560, 2002. [70] S Schellmann, A Schnittger, V Kirik, T Wada, K Okada, A Beermann, J Thumfahrt, G Jurgens, and M Hulskamp. TRIPTYCHON and CAPRICE mediate lateral inhibition during trichome and root hair patterning in Arabidopsis. EMBO J., 21:5036–5046, 2002. [71] N Shigesada and K Kawaasaki. Biological Invasion: Theory and Practice. Oxford University Press, 1997. [72] Roger I. Tanner, Fuzhong Qi, and Kostas D. Dai. Bread dough rheology: an improved damage function model. Rheologica Acta, 50():75–86, 2011. [73] HR Thieme. Density-dependent regulation of spatially distributed populations and their asymptotic speed of spread. J. Math. Biology, 8:173–187, 1979. [74] AM Turing. The chemical basis of morphogenesis. Phil. Trans. R. Soc. London Ser. B-Biol. Sci., 237:37–72, 1952. [75] P. M. Waterhouse, H. W. Graham, and M.-B. Wang. Virus resistance and gene silencing in plants can be induced by simultaneous expression of sense and antisense RNA. Proc. Nat. Acad. Science USA, 95:13959–13964, 1998. [76] P. M. Waterhouse, M.-B. Wang, and T. Lough. Gene silencing as an adaptive defence against viruses. Nature, 411:834–842, 2001. [77] HF Weinberger. Long-time behavior of a class of biological models. SIAM J. Math. Anal., 13:353–396, 1982. [78] I. J. Wesley, O. Larroque, B. G. Osborne, N. Azudin, H. Allen, and J. H. Skerritt. Measurement of gliadin and glutenin content of flour by NIR spectroscopy. J. Cereal Sci., 34:125–133, 2001. [79] D. V. Widder. The Laplace Transform. Princeton University Press, Princeton, 1972. [80] Paul R. Wiley, Greg J. Tanner, Peter M. Chandler, and Robert S. Anderssen. Molecular Classification of Barley (Hordeum vulgare L.) Mutants Using Derivative NIR Spectroscopy. J. Agric. Food Chem., 57:4042–4050, 2009. [81] JPW Young. Pea leaf morphogenesis - a simple-model. Annals Botany, 52:311–316, 1983. CSIRO Mathematics, Informatics and Statistics, GPO Box 664, Canberra, ACT 2601 E-mail : [email protected] URL: http://www.cmis.csiro.au/Bob.Anderssen/ School of Mathematics and Applied Statistics, University of Wollongong, NSW 2522 E-mail : [email protected]