Methods in Ecology and Evolution 2014, 5, 934–943

doi: 10.1111/2041-210X.12250

Extending the use of ecological models without sacrificing details: a generic and parsimonious meta-modelling approach

Guillaume Marie and Guillaume Simioni*

INRA, UR 629 Écologie des Forêts Méditerranéennes (URFM), Domaine Saint Paul, Site Agroparc, CS 40509, 84914 Avignon cedex 9, France

Summary

1. Difficulties in accounting for the fine-scale nature of ecological processes in large-scale simulations constitute an important issue in ecology. Among existing methods, meta-modelling, that is creating a statistical emulator of a model, has seen very few applications in ecology, even though meta-modelling methods are well advanced in the field of engineering.
2. We adapted and applied a meta-modelling approach to a case study typical of the complexity found in ecosystems. It involved a highly detailed, individual-based and spatially explicit biophysical model (noTG), parameterized for a multi-specific, spatially heterogeneous forest. Our goal was to increase its temporal domain of applicability by obtaining a much faster meta-model of its light interception module.
3. The meta-model was constructed from a series of simulations with noTG, following a latin hypercube design. Several meta-modelling techniques were compared, with neural networks providing the best results. The meta-model accurately reproduced the behaviour of noTG across a range of variables and organization levels. It was also 62 times faster.
4. These results show that meta-modelling can be a practical tool in ecology and represents a highly powerful way to change the scope of a model while still accounting for fine details.

Key-words: modelling, spatial or time-series, statistics, upscaling

*Correspondence author. E-mail: [email protected]

Introduction

Process-based models (PBMs, see Table 1 for a list of abbreviations) are essential tools to assess ecosystem responses to climate change, land use change, extreme weather patterns and other environmental disturbances. PBMs make it possible to deal with the high level of interactions and feedbacks intrinsic to ecological processes, but their complexity comes at the cost of computation time and memory. There is therefore a trade-off between the resolution needed to describe ecological processes satisfactorily (e.g. organism, population, species, vegetation type) and the computing constraints that arise when applying PBMs over large spatial extents and/or long time periods. In addition, most measurements are made at fine scales, that is at high resolution and small extent, which makes the parameterization and evaluation of PBMs at large scales less manageable (Coutts & Yokomizo 2014). Accounting for fine-scale processes in large-scale simulations is thus a major issue in ecology. In that regard, up-scaling, that is translating information on process behaviour and interaction from one scale to another, larger scale, is an interesting approach. It consists of deriving an aggregated model from a more complex one, and often relies on differential equations (Auger et al. 2000; Tietjen & Huth 2006) or model simplification (Boulain, Simioni & Gignoux

2007). The up-scaling approach increases the extent, but requires a decrease in the resolution and/or in the level of detail at which processes are described (Moorcroft, Hurtt & Pacala 2001; Urban 2005; Lischke et al. 2007). Such model simplifications can entail a sacrifice of potentially important fine-scale details. However, model simplification is not the only option. Extending the use of a model without sacrificing details and resolution requires reducing its computational needs (Lischke et al. 2007). One way around this problem is to use more computing power, but despite the continuous development of computers, computing power remains a major limitation for many PBMs. In some cases, it may be possible to optimize the model source code and its compilation. Finally, another solution consists of constructing a statistical emulator of a model: a meta-model. The term meta-model has been used with various meanings in the literature. For Urban, Acevedo & Garman (1999) and Moorcroft, Hurtt & Pacala (2001), it refers to any model of a model, while for Kleijnen (1975) and Simpson et al. (2001), it refers only to a faster statistical emulator. In this study, we used the second definition, which can be written as:

g(x) ≈ f(x) = y

where g(x) is the statistical approximation of the output y of the model f(x) for the same input x. Such meta-models have

© 2014 The Authors. Methods in Ecology and Evolution © 2014 British Ecological Society

Table 1. Table of abbreviations

APE: Absolute percentage error
C: Carbon
Cass: C assimilation (g)
DBH: Diameter at breast height (cm)
DDratio: Diffuse to direct ratio ([0,1])
Dec: Solar declination (rad)
GAP: Gap fraction ([0,1])
Hr: Relative humidity (%)
LAI: Leaf area index (m² m⁻²)
LHS: Latin hypercube sampling
MLR: Multi-linear regression
N: Nitrogen
NNT: Neural network
NPP: Net primary production
OL: Surface types (shaded or sunlit)
OLratio: Shaded to sunlit ratio ([0,1])
PARa: Absorbed photosynthetically active radiation (MJ m⁻² or W m⁻²)
PBM: Process-based model
PE: Percentage error
PMA: Polynomial regression spline
Rg: Global radiation (MJ m⁻² d⁻¹)
SSage: Mean age of each species (year)
Stress: Water stress ([0,1])
Temp: Air temperature (°C)
Ts: Leaf and soil surface temperature (°C)
Wevap: Water evapotranspiration (l or mm)
wind: Wind speed (m s⁻¹)

been widely used in the field of engineering, and various meta-modelling techniques have been described (Barton 1994; Simpson et al. 2001) and compared (Pineros Garcet et al. 2006). In addition to up-scaling applications, because it allows more simulations to be performed, a faster model emulator can be used to better understand the input/output relationships of the model (Friedman 1984; Chen et al. 2011), to improve parameter optimization (Barton 2009), or to perform sensitivity analyses (Busby 2009). Meta-models have scarcely been used in ecology, although interesting studies include Britz & Leip (2009), Kwon & Hudson (2010), Villa-Vialaneix et al. (2012) and Luo et al. (2013). Strictly speaking, as a statistical approximation, a meta-model does not explicitly include the causal relationships of the original model but only emulates them. Its accuracy is therefore critically dependent on the data used for its construction.

The goal of our study was to propose and test a generic and parsimonious method to construct a statistical emulator, in order to increase the temporal and/or spatial extent of a PBM without changing its resolution. Our five-step method (Table 2) was inspired by the method proposed by Kleijnen & Sargent (2000), but differs in some points to better account for the high number of variables, parameters, processes and feedbacks typical of ecological systems. The first step is to define whether the whole model or only part of it will be meta-modelled, according to the structure of the model and computation time profiling. The second step is to optimize the number of inputs and outputs of the meta-model to avoid possible redundancies, and possibly aggregate some inputs depending on the desired level of genericness. The third step is to define a design of simulation and create the data set used to construct the meta-model. The fourth step is to construct meta-models with various statistical tools and to select the best one. The fifth step is to test the meta-model, including evaluating error propagation.

Table 2. Proposed general method to construct a meta-model. Different options are available for each step, based on various criteria. Inputs can be parameters, or state, initial or forcing variables. 'I/O relationship type' refers to the type of relationship between inputs and outputs (e.g. linear, monotonic, nonlinear, level of interaction).

1. Defining a general strategy. Options: global, fragmented, localized. Criteria: model structure; computation time profiling.
2. Identifying I/O relationships. Options: exhaustive, selected, aggregated. Criteria: desired range of use; required precision.
3. Creating a data set. Options: complete factorial, space filling. Criteria: number of inputs; model computation time.
4. Fitting the meta-model(s). Options: linear, polynomial, machine learning. Criteria: meta-model computation time; required precision; I/O relationship type.
5. Testing the meta-model(s). Options: global, multi-scale. Criteria: error propagation; required precision; number of outputs.

To illustrate the method, we applied it to a particularly time-consuming PBM designed to simulate the carbon (C), water and nitrogen (N) cycles of ecosystems with complex structures over one to a few years. Using this model for areas larger than a forest stand or for periods longer than a few years is strongly limited by computing requirements. We also chose an ecosystem presenting a particularly complex structure (i.e. a multi-species, spatially heterogeneous forest) for which simpler models are not suitable. Through meta-modelling, we attempted to extend the use of the model to run simulations over periods up to a century. Based on this example, we discuss the advantages and inconveniences of the meta-modelling approach, especially regarding the range of applicability, error propagation and potential applications.
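The emulator definition g(x) ≈ f(x) = y can be illustrated with a toy sketch (not from the paper: the "expensive" model f and the polynomial emulator g below are hypothetical stand-ins). A cheap statistical function is fitted to a handful of runs of the original model and then checked on unseen inputs:

```python
import numpy as np

# Toy stand-in for an expensive process-based model f(x).
# (Illustrative only; not the noTG model.)
def f(x):
    return np.sin(3 * x) + 0.5 * x**2

# Run the "expensive" model on a small training design.
x_train = np.linspace(0.0, 1.0, 50)
y_train = f(x_train)

# Fit a cheap statistical emulator g(x) that approximates f(x).
coeffs = np.polyfit(x_train, y_train, deg=5)
g = np.poly1d(coeffs)

# Check the emulator on unseen inputs within the training domain.
x_test = np.linspace(0.0, 1.0, 501)
max_abs_err = float(np.max(np.abs(g(x_test) - f(x_test))))
print(f"max |g(x) - f(x)| on [0, 1]: {max_abs_err:.5f}")
```

Note that, exactly as the text warns, g is only trustworthy over the sampled domain; extrapolating beyond [0, 1] would quickly degrade.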

Materials and methods

THE ECOSYSTEM MODEL: noTG

noTG, short for 'not only TREEGRASS' (Simioni et al. 2011), is an individual-based and spatially explicit biophysical model designed to simulate the full C, water and N cycles of various terrestrial ecosystems. It inherits all the features of the TREEGRASS model (Simioni et al. 2000; Simioni 2001), which simulates water fluxes and net primary production (NPP) for tropical tree/grass ecosystems. noTG is a more comprehensive PBM that includes the processes necessary to simulate the full ecosystem C, water and N budgets for tropical, temperate and


Fig. 1. Summary of the main processes included in the noTG model. Boxes show the main components and the resolution at which they are computed. Plain arrows indicate fluxes while dashed arrows indicate influences. Grey-filled circles indicate forcing variables: Rg, wind, Hr, Temp, CO2 and rainfall (H2O). Abbreviations are defined in Table 1. Other abbreviations are for plant stomatal conductance (stom. cond.) and soil surface conductance (soil cond.). The red rectangle represents the component that has been meta-modelled.

Mediterranean ecosystems. Figure 1 summarizes the main processes incorporated in noTG. Plant foliage is partitioned into a 3D grid of voxels (Fig. 2). The word voxel, a contraction of volume and pixel, denotes a pixel in three dimensions. A voxel can contain foliage (i.e. a vegetation voxel) from one or several trees. Radiation absorption, photosynthesis and transpiration are computed for each leaf type in each voxel, before being aggregated at the tree level. Similarly, plant roots are partitioned between soil voxels, and a plant can access water and N only from the soil voxels where its roots are present. When a plant does not have access to sufficient water, the resulting water stress reduces stomatal conductance, thereby impacting both photosynthesis and transpiration at the voxel level. This representation makes it possible to simulate, at a very fine resolution, the competition between plants for light, water and N. The amounts of acquired C and N and their partitioning between tissues determine the rate of plant growth. Through senescence, C and N are transferred to the soil organic matter. The model tracks the decomposition of soil organic matter, and the corresponding inputs and outputs of C and N, for each soil voxel. Soil water is also modelled at the soil voxel level and depends on

inputs from rainfall and outputs through plant uptake and soil evaporation. The model runs at a daily time-step, for periods typically ranging from one to a few years. Ultimately, the extent of the simulated period is limited by computing power, as simulating one year requires 2–4 h and one to two gigabytes of random access memory. These computing requirements limit its use to small ecosystem plots over short durations, which makes noTG an ideal candidate for exploring the possibility of extending its use through meta-modelling techniques.

For clarity, it is important to distinguish between state variables, initial values, forcing variables, output variables and parameters. State variables can change over the course of a simulation; the main ones include C and N amounts in the various plant tissues and soil organic matter, plant dimensions, plant leaf area, phenological state, plant water status and water in soil voxels. Initial values are the values given to state variables at the beginning of a simulation, and also include plant spatial coordinates. Forcing variables are input data that can change over the course of a simulation. They include global radiation (Rg), air temperature (Temp), relative humidity (Hr), wind speed (wind), air CO2 concentration and solar declination (Dec). Parameters are input data that do not change over the course of a simulation. They characterize species (photosynthetic potential, allometric constraints, phenology, etc.), soil (water holding capacity, decomposition rates, etc.) and site (latitude, nitrogen deposition). Parameters are listed in Appendix S1.

Fig. 2. Inputs and outputs of the meta-models. The role of the meta-models was to predict PARa and Ts for each voxel of the 3D grid (a). Inputs (b) include climate forcing variables (Rg, Temp, Hr, wind, Dec) as well as state variables computed by noTG for each voxel (GAP, OL, OLratio), plant (stress) and plot (DDratio). Abbreviations are defined in Table 1.

STUDY AREA: A HETEROGENEOUS AND MULTI-SPECIFIC ECOSYSTEM

noTG was parameterized for the Font-Blanche forest (5°40′45″E, 43°14′27″N): a typical Mediterranean forest composed of an upper stratum of a conifer, Pinus halepensis Mill., a lower stratum of an evergreen broadleaf species, Quercus ilex L., plus an understorey. The two dominant species present strong differences (e.g. broadleaf vs. coniferous habit, growth pattern, response to drought). In addition, each stratum presents large gaps, allowing light to penetrate deep into the canopy, unlike more homogeneous forests. These characteristics influence light interception and partitioning between species (Simioni, Durand-Gillmann & Huc 2013), growth, response to water stress, and potentially C and water fluxes at the ecosystem level. Many PBMs that predict ecosystem C and water budgets were designed for monospecific, homogeneous canopies and cannot handle complex forest structures such as those found at Font-Blanche. All climate data and initial variables used in the simulations were measured at Font-Blanche or at nearby weather stations. Parameters were measured at Font-Blanche or taken from the literature (see Appendix S1). In our simulations, the understorey was ignored.
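The voxel bookkeeping described above (per-voxel absorption shared between trees, then aggregated to the tree level) can be sketched as follows. This is an illustrative toy, not noTG's actual code: the array shapes, the random values and the proportional light-sharing rule are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3 x 3 x 2 grid of voxels; each voxel stores the leaf area (m^2)
# contributed by each of two trees (every voxel holds foliage in this toy).
n_trees = 2
leaf_area = rng.uniform(0.01, 0.5, size=(3, 3, 2, n_trees))  # (x, y, z, tree)

# Assume an absorbed PAR value per voxel (W m^-2), e.g. from a light module.
par_voxel = rng.uniform(0.0, 400.0, size=(3, 3, 2))

# Share each voxel's absorption between trees in proportion to leaf area.
share = leaf_area / leaf_area.sum(axis=-1, keepdims=True)
par_per_tree_voxel = par_voxel[..., None] * share

# Aggregate voxel-level absorption to the tree level.
par_per_tree = par_per_tree_voxel.sum(axis=(0, 1, 2))
print("PARa per tree:", par_per_tree)

# Conservation check: the voxel total equals the tree total.
assert np.isclose(par_per_tree.sum(), par_voxel.sum())
```

The same pattern (per-voxel computation, then a sum over the grid axes) applies to any of the voxel-level quantities mentioned in the text, such as transpiration or photosynthesis.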

META-MODELLING METHOD

The following is a description of the five-step method presented in Table 2, using our case study as an example of implementation.


Step 1: Defining the meta-modelling strategy

This first stage involves examining the model structure and performing a computation time profiling. According to these, three meta-modelling strategies can be used:

Global: one meta-model is created for the whole model. This is desirable when there are few inputs and outputs, or if the model cannot be decomposed into submodels. The strategy can quickly become impractical when there are too many inputs and outputs, as the number of simulations necessary to create the meta-model could be unmanageable.

Fragmented: if the model can easily be decomposed into submodels, one meta-model can be created for each submodel.

Localized: if one submodel represents most of the computing time, only this submodel is meta-modelled.

In our case, 98% of the computation time and most of the memory use were taken up by the light interception and energy budgets (Fig. 1). The global strategy would therefore have been both very challenging, because of the high level of interactions, and unproductive, as it would have required many more simulations to complete the meta-model construction. In addition, the more complex a meta-model is, the longer its computation time. Applying the localized strategy was therefore the most relevant choice. A big advantage of the localized and fragmented strategies is that they reduce the number of inputs and outputs of the meta-model(s). The method we propose is not adapted to stochastic processes, as it relies on the constancy of the relationship between inputs and outputs. Applying the method to models including stochastic processes therefore requires meta-modelling only the parts that do not include stochasticity (i.e. fragmented or localized strategies).

The second aspect of this step is to define the targeted scope of the meta-model. In our case, the targeted spatial resolution and extent were the voxel (Fig. 2) and the forest stand, respectively, focusing on heterogeneous forests composed of Q. ilex and/or P. halepensis, which are common in the Mediterranean area. In this study, we used 2 m³ voxels (1 m long, 1 m wide, 2 m deep). The targeted temporal resolution and extent were the daily time-step and the century, respectively.

Step 2: Optimizing the inputs and outputs

The lower the number of inputs, the easier the design of simulations will be (step 3). The number of inputs and outputs of the meta-model depends on the strategy chosen in step 1, but can be further reduced according to the desired domain of applicability and the required meta-model precision. Reducing the number of inputs can be done based on a sensitivity analysis and/or input aggregation. In our case, the light interception and energy budget module has two outputs that we want to predict: (i) the photosynthetically active radiation absorbed by leaves (PARa) and (ii) the surface temperature of leaves and soil (Ts) for each surface type (sunlit or shaded, OL) in each vegetation and soil voxel. These outputs are subsequently used in noTG to compute photosynthesis, plant transpiration and soil evaporation. We carried out a global sensitivity analysis on forcing climatic and state variable inputs for each output and selected the most relevant inputs. We then aggregated species parameters to one species, which can be either P. halepensis or Q. ilex. Aggregating inputs, however, decreases the meta-model genericness: while it limited the domain of applicability to P. halepensis and Q. ilex, it reduced the number of plant parameters to one. The final selected inputs and outputs are presented in Appendix S2 and Fig. 2.

Step 3: Building a meta-model construction data set

The meta-model is constructed from a data set obtained from simulations with the original model, following a design of simulation determined by the number of inputs and by computing constraints. If the model is fast enough and has few inputs, a classic complete factorial design can be used. Otherwise, a fractional factorial design or a space filling sampling design may be more appropriate. We chose latin hypercube sampling (LHS), a space filling technique which made it possible to reduce the number of simulations while still covering the desired domain of applicability (Helton & Davis 2003). Model inputs (described in Appendix S2) are divided into two types: direct inputs, which could be used directly in the LHS, and indirect inputs, that is state variables at the voxel level (OL, OLratio, GAP) that could not be included directly in the LHS. For the latter, we used a virtual plot generator to create plots with different vegetation structures that would lead to a large range of OL, OLratio and GAP values. The virtual plot generator is an algorithm coded in R that randomly creates simulation plots. Only three parameters are required to generate a virtual plot: the mean stand age of each species (SSage), as tree density decreases when mean age increases, and the dimensions of the plot. A virtual 15 × 15 m plot was generated for each simulation. Figure 3 summarizes the whole simulation process. Table 3 presents the nine inputs of the LHS that represent our selected model inputs, and their respective distributions. Uniform distributions were used for all inputs except Wind and Stress. Based on a previous 4-year simulation using Font-Blanche climate data, we chose for Wind an exponential distribution, to better match its asymmetric distribution shape, and for Stress a beta distribution, to better account for its bimodal distribution around zero and one. For convenience, in a given run, Stress was given the same value for all trees of the same species. The LHS comprised 100 one-day simulations. Because of the very large number of data generated by each simulation, 10% of the vegetation and soil voxels were randomly sampled. This represented a total of about 12 000 voxels, a sample large enough to cover the input and output variability while reducing the computation time in step 4.

Fig. 3. Design of simulations to build the data set used to construct meta-models, based on a LHS of 100 simulations for the selected inputs. Direct inputs are forced into noTG while SSage is used in the virtual plot generator to produce a variety of GAP, OL and OLratio voxel values. Abbreviations are defined in Table 1.

Table 3. Distributions and bounds of values for the inputs used in the LHS to construct meta-models. Values between brackets are minimums and maximums, except for the beta and exponential distributions, for which they are parameters. Abbreviations are defined in Table 1.

Model input: LHS input, unit, distribution, parameters
Temp: Temp, °C, uniform, [8, 40]
Rg: Rg, MJ m⁻² d⁻¹, uniform, [0, 36]
Hr: Hr, %, uniform, [10, 100]
DDratio: DDratio, 0–1, uniform, [0, 1]
wind: wind, m s⁻¹, exponential, [0, 12]
Dec: Dec, rad, uniform, [−0.42, 0.42]
Stress: Stress, 0–1, beta, [0.6, 1.5]
OL, OLratio, GAP: Pinus SSage, year, uniform, [5, 40]
OL, OLratio, GAP: Quercus SSage, year, uniform, [5, 40]

Step 4: Fitting the meta-model

Different statistical techniques can be used to fit a meta-model. Linear and polynomial regressions are useful to understand input/output relationships but can suffer from a lack of accuracy when model complexity increases (i.e. nonlinearity, heterogeneity; Simpson et al. 2001). Machine learning, on the other hand, gives better results when relationships are complex but creates black boxes that are difficult to interpret. Following Villa-Vialaneix et al. (2012), we compared three statistical techniques according to their accuracy and computation time:

Multivariate linear regression (MLR): the simplest and most frequently used technique. Three types were used: MLR with no (MLR1), two (MLR2) and three (MLR3) degrees of interaction.

Polynomial regression spline (PMA): a more complex regression technique combining polynomials and smoothing splines (Friedman 1991).

Neural networks (NNT): a machine learning technique that mimics real neural networks to create complex empirical functions. Construction time can be expensive (Villa-Vialaneix et al. 2012).

Other techniques (e.g. LASSO, Gaussian processes, random forests) were not tested here because their computing times would have been prohibitive in our study. We constructed a meta-model for each type of output and technique, that is five for Ts and five for PARa. To select the best meta-model technique, a second, independent LHS data set was used. All statistical analyses were performed with the R software (R Development Core Team 2008). Polynomial regression splines were fitted with the 'polspline' package. For neural networks, we used the 'pcaNNet' function of the 'caret' package. The selected meta-models were integrated as an option into the noTG C++ source code, using the 'RInside' R package for C++. In the following, when noTG is run with the option of using the meta-models, it is referred to as noTGmeta. Meta-model accuracy was assessed using the coefficient of determination (R²), percentage error (PE) and absolute percentage error (APE) (see Appendix S4 for details).

Step 5: Testing the meta-model, a multi-scale error analysis

When extending the model domain of use, it is important to evaluate error propagation, especially when only part of the model is meta-modelled and errors from the meta-modelled part could propagate to other parts of the model. Ecosystems also display important memory effects, meaning that errors can be amplified over time. It was therefore important to assess whether small errors at the beginning of a simulation would translate into large errors later on. To understand error patterns across different resolutions and extents, the accuracy of noTGmeta was tested at different temporal scales and levels of spatial aggregation: (i) at the day and tree levels (Day & Tree); (ii) at the species and year levels (Year & Species); and (iii) at the forest level for the full period (Period & Plot). At each scale, five integrative output variables were analysed: the absorbed photosynthetically active radiation (PARa), the transpiration (Wevap), the gross photosynthesis (Cass), the leaf area index (LAI) and the tree diameter at breast height (DBH). These variables are influenced by most of the processes included in noTG and as such provide a good basis for checking prediction errors.

The multi-scale error analysis was carried out on two types of simulations. The first consisted of 50-year simulations using nine 15 × 15 m plots reconstituted from measurements of tree spatial coordinates and dimensions made at Font-Blanche. They represented a total of 118 trees (71 Q. ilex; 47 P. halepensis). Simulating long periods with noTG is challenging because of its long computing time, which depends critically on the number of voxels containing foliage. We therefore chose to start these simulations with mature trees and with fixed tree sizes (i.e. tree biomass could increase, but canopy dimensions remained fixed) to prevent an increase in voxel number. We acknowledge that possible effects due to changing tree size are thereby ignored; but they are ignored in both the model and the meta-model, so the test remains valuable for checking error propagation not linked to changes in tree size. The second type of simulation aimed at assessing meta-model performance when tree size changed. We used a 100-year simulation on a virtual 25 × 25 m plot, starting with 10-year-old trees, in which tree sizes were not fixed. Starting with small trees kept the number of voxels occupied by foliage reasonable throughout the simulation. The plot contained 11 Q. ilex and eight P. halepensis trees. The selected statistical metrics were the same as in the previous step.
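A minimal sketch of such a multi-scale comparison is shown below with hypothetical data (the tree labels, the 5% noise level and the column names are all invented for the example; the paper's own analyses were done in R):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical daily tree-level outputs from a "model" and its "meta-model".
n_days, trees = 365, ["Qi_1", "Qi_2", "Ph_1"]
df = pd.DataFrame({
    "day": np.repeat(np.arange(n_days), len(trees)),
    "tree": trees * n_days,
})
df["species"] = df["tree"].str[:2]
df["model"] = rng.uniform(1.0, 5.0, len(df))               # reference output
df["meta"] = df["model"] * rng.normal(1.0, 0.05, len(df))  # emulator, ~5% noise

def mean_ape(g):
    return float((100 * (g["meta"] - g["model"]).abs() / g["model"]).mean())

# Finest scale: error over all individual tree-days.
ape_day_tree = mean_ape(df)

# Coarser scale: sum outputs to the species level first, then compute APE.
yearly = df.groupby("species")[["model", "meta"]].sum().reset_index()
ape_year_species = mean_ape(yearly)

print(f"Day & Tree APE: {ape_day_tree:.2f}%, "
      f"Year & Species APE: {ape_year_species:.2f}%")
```

With unbiased noise, as here, aggregation averages errors out and the coarse-scale APE is far smaller than the fine-scale APE; a systematic bias, by contrast, would survive aggregation, which is why the study reports both PE and APE at each scale.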



Results

SELECTING THE META-MODELS

All the meta-models obtained with the different techniques had very high R², except with MLR (the lowest R² was 0.85). Neural networks and MLR3 produced the most accurate meta-models (Fig. 4). Because they had shorter prediction times than MLR3, the meta-models based on NNT were selected. Figure 5 illustrates the level of accuracy obtained at the voxel level with the neural network-based meta-models. While the degree of accuracy at this very fine scale is extremely high, predictions were inaccurate in a very few cases. The meta-models were tested on a wide range of input values. However, in certain cases, inputs very close to the bounds of the LHS generated larger errors. Upon inspection, these cases corresponded to unrealistic conditions (e.g. extremely low wind speed) not normally found in simulations with noTG.

TESTING noTGmeta PREDICTION ERROR AND BIAS AT DIFFERENT TIME SCALES AND LEVELS OF ORGANIZATION

noTGmeta was more accurate and less biased for P. halepensis than for Q. ilex (Fig. 6 and Fig. S3.1 in Appendix S3). All unilateral Student’s t-tests concluded that errors were higher for Q. ilex than for P. halepensis.

Fig. 4. Prediction errors and prediction times of five meta-modelling techniques used to predict PARa and Ts at the voxel level for Quercus ilex, Pinus halepensis and soil. Computation times were measured on about 1 million voxels. Prediction errors were computed on about 12 000 voxels of an independent data set. Mean APE was calculated for each species and for soil. The techniques compared were MLR without or with two and three degrees of interaction (MLR2 and MLR3), PMA and NNT. Abbreviations are defined in Table 1.

Fig. 5. Prediction accuracy of the selected meta-models for PARa (W m⁻²) and Ts (°C) at the voxel level, plotted as noTGmeta predictions against noTG values. Results for Quercus ilex, Pinus halepensis and soil were combined. Each dot represents one surface type in a voxel from one simulation from the validation LHS. Solid lines represent the 1:1 line. Abbreviations are defined in Table 1.



Fig. 6. Prediction errors of noTGmeta compared to noTG for nine 50-year and one 100-year simulations of a mixed forest of Quercus ilex and Pinus halepensis. Mean APE was computed at different scales: for each tree and day (Day & Tree), for each species and year (Year & Species), and for the two species combined over the full period (Period & Plot). Abbreviations are defined in Table 1.

noTGmeta was also more accurate and less biased at coarser resolutions, both in space and time (Fig. 6 and Fig. S3.1 in Appendix S3). Unilateral Student's t-tests were significantly different in 77.7% of the cases. This means that prediction errors at fine scales had limited impact on meta-model accuracy at coarser scales. Transpiration, photosynthesis and subsequent growth are driven by light absorption. However, for P. halepensis, errors in the prediction of light absorption were higher than those of the variables that depend on it (i.e. errors on light absorption did not propagate to other processes). In contrast, for Q. ilex, errors in the prediction of light absorption propagated to photosynthesis and transpiration. The best accuracy was obtained for leaf area index and stem diameter, with mean APE not exceeding 5%. Errors on light absorption occurred mainly on days when light was low (Fig. S3.2 left in Appendix S3) and for smaller trees (Fig. S3.2 middle in Appendix S3). During those days, photosynthesis was also low. Therefore, errors in light absorption had limited consequences on photosynthesis over longer periods and/or at the species or plot levels (Fig. S3.2 right in Appendix S3). The longer the simulation, the higher the bias and the lower the accuracy (Fig. 6). This result was expected, since errors in photosynthesis can entail errors in tree growth, resulting in errors in leaf area that in turn influence light interception, and so on. However, even if errors increased over time, differences between noTG and noTGmeta remained reasonable, especially for stem diameter (Fig. 7).

Fig. 7. Predicted stem DBH for each tree from the 100 year simulation with noTG (plain lines) and noTGmeta (dashed lines). The trees with smaller diameters are Quercus ilex, the others are Pinus halepensis.

We found that, for both species, stem diameter increment patterns at the tree level were the same for noTGmeta and noTG, with bias at the end of the simulation ranging between −14.4% and 13.7% (mean = 1%) for Q. ilex, and between 0.26% and 2.26% (mean = 1.3%) for P. halepensis.

© 2014 The Authors. Methods in Ecology and Evolution © 2014 British Ecological Society, Methods in Ecology and Evolution, 5, 934–943

GAIN IN COMPUTATION TIME

Simulation time with noTGmeta was 2.9 h for the 100 year simulation, compared to 182.3 h with noTG; noTGmeta was therefore more than 62 times faster. In addition, while simulation time increases exponentially with plot size when using noTG, the increase is closer to linear when using noTGmeta (data not shown), meaning that the gain in computation time increases with plot size.
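A back-of-the-envelope sketch makes this scaling argument concrete (all cost coefficients are invented; the super-linear cost is taken as quadratic purely for simplicity): when the original model's cost grows faster than linearly with plot size while the emulator's cost grows linearly, the speedup itself grows with plot size.

```python
# Toy cost models (coefficients invented): super-linear cost for the
# original model versus linear cost for the meta-model.

def cost_original(n_trees):
    # e.g. pairwise interactions between trees: quadratic for simplicity
    return 0.02 * n_trees ** 2

def cost_emulator(n_trees):
    # one emulator call per tree: linear
    return 0.05 * n_trees

# The speedup (ratio of the two costs) grows with the number of trees
speedups = {n: cost_original(n) / cost_emulator(n) for n in (50, 100, 200)}
```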

Discussion

META-MODEL PERFORMANCE

Evaluating the meta-model against the original model is a critical step to ensure that the relevant details have been retained and that sufficient precision has been achieved. However, defining an acceptable prediction error is problematic, as the required precision depends on the targeted model application. For instance, if the goal is to provide quantitative predictions of a specific variable, the meta-model precision must be higher than if one seeks to assess qualitative general trends. In addition, when choosing between different meta-modelling options, there is a trade-off between rapidity and precision (e.g. Fig. 4). In any case, the error analysis, by quantifying prediction differences between the model and the meta-model, provides crucial information on the meta-model reliability. Based on this, each user can make an informed decision about whether or not to use a meta-model. In our case, given the multi-level structure of the model (voxel, tree, species, ecosystem), we performed a multi-level error analysis. We also used different statistical metrics.

In the meta-modelling literature, a coefficient of determination R2 = 0.95 or more is considered an excellent performance (Villa-Vialaneix et al. 2012). In our case, the highest R2 values were obtained when using neural networks or multiple linear regressions with two or three degrees of interaction. This reflects the fact that input–output relationships were complex, involving non-linear interactions. A high R2 value, however, does not mean there is no bias. We therefore used other indices to better quantify prediction errors at different levels of aggregation. Quantifying those errors allowed us to characterize the strengths and weaknesses of the meta-model, and to know when it departed from the original model. This helped to define the range of situations for which the meta-model was an adequate tool, and those for which it was not.
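The point that a high R2 can coexist with a systematic bias is easy to see in a minimal sketch (data invented; these are generic textbook formulas, not the study's code):

```python
# Generic metrics for comparing a meta-model's outputs to the original
# model's outputs. Illustrative only.

def r_squared(ref, pred):
    """Coefficient of determination of pred against ref."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(ref, pred))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - ss_res / ss_tot

def mean_bias(ref, pred):
    """Mean signed difference (meta-model minus original model)."""
    return sum(p - r for r, p in zip(ref, pred)) / len(ref)

model_out = [1.0, 2.0, 3.0, 4.0, 5.0]
meta_out = [x + 0.2 for x in model_out]  # perfectly correlated, but shifted

# R2 is high (0.98) even though every prediction is biased by +0.2,
# which is why both metrics are reported
```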
We obtained similar error percentages as in previous up-scaling studies (Boulain, Simioni & Gignoux 2007; Seidl et al. 2012). It is also worthwhile to investigate the sources of errors, as this can help to assess their implications. In our case study, we found that the highest errors on light absorption occurred on days when little light was available, and therefore had limited consequences on photosynthesis over longer periods. In addition, these errors occurred on smaller trees, which also had limited consequences on predictions at the ecosystem level. These errors likely occurred because dark conditions were under-represented in the data set used to construct the meta-models (Fig. S3.3 in Appendix S3). This points to a possible issue with the construction data set.

More generally, a likely source of inadequacy is when a meta-model is used outside of the bounds of the construction data set. It is therefore paramount to define the bounds and the shapes of the distributions of the inputs in the simulation design (step 3) in accordance with the meta-model's objectives. For instance, if a meta-model is to be used in the context of climate change, the simulation design should encompass the proper ranges of temperature, precipitation and atmospheric carbon dioxide changes. To prevent such errors, it would be tempting to use the largest possible bounds, so as to construct the most robust meta-model. However, the more generic the meta-model, the more complex and the slower it is likely to be. It will also require a more comprehensive simulation design. There is therefore a likely trade-off between genericness and efficiency. If the domain of applicability of the meta-model is well known, it can be advantageous to reduce the bounds of the inputs as much as possible.

Another possibility is to aggregate inputs to reduce their number. For example, the meta-model could be intended to be used always with the same species ('narrow spectrum'), or with many species ('broad spectrum'). In these two cases, the model level of detail is the same but the domain of applicability is different. The 'narrow spectrum' case is more empirical, and species parameters can be reduced to a single input that implicitly includes a set of parameters for each species. The 'broad spectrum' case is more generic, and would require explicitly including the relevant species parameters in the meta-model. Explicit inputs increase the genericness and the complexity of the meta-model, while aggregated inputs decrease its genericness and complexity.
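The 'narrow spectrum' option can be sketched as a set of per-species emulators selected at run time (the species responses and coefficients below are invented linear toys; the study's actual emulators were neural networks):

```python
# Hypothetical 'narrow spectrum' emulators: one simple meta-model per
# species, with its species parameters baked in at fitting time.

def quercus_ilex_emulator(incident_light):
    return 0.30 * incident_light  # invented fitted coefficient

def pinus_halepensis_emulator(incident_light):
    return 0.45 * incident_light  # invented fitted coefficient

EMULATORS = {
    "Quercus ilex": quercus_ilex_emulator,
    "Pinus halepensis": pinus_halepensis_emulator,
}

def absorbed_light(species, incident_light):
    """Dispatch to the species-specific meta-model. A 'broad spectrum'
    alternative would instead take species parameters as explicit inputs
    of a single, more complex emulator."""
    return EMULATORS[species](incident_light)
```

Combining such per-species emulators keeps each one small and fast, while the collection as a whole can cover the same range of species as one generic meta-model.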
In our study, aggregated inputs were related to species and soil surface characteristics (leaf angle distribution, optical properties, conductance parameters). Therefore, simulating other species or soil types would require constructing different meta-models. An interesting option would be to combine complementary 'narrow spectrum' meta-models instead of a single 'broad spectrum' one. This would keep each meta-model relatively simple and efficient, while covering the same wide range of conditions that a generic meta-model would.

ADVANTAGES OF META-MODELLING IN ECOLOGY

In our study, we showed that meta-modelling is an interesting technique to extend a model's domain of use, that it is relatively simple to implement, and that it relies on statistical methods common in ecology. Some of these advantages also bring constraints. A meta-model retaining the same resolution also requires the same input data as the original model, as opposed to other up-scaling techniques. Depending on data availability, this may restrict the use of the meta-model. Also, by nature, meta-models suffer from the same limitation as empirical models, that is, limited applicability outside of the domain for which they were designed. Therefore, the meta-model may not be as robust or generic as the original model. However, as explained before, there are ways to get around that limitation.

A major application of meta-modelling is to allow a model to be used at larger scales. According to Levin (1992), the problem is not to choose the correct level of aggregation but to understand the change that is taking place on many scales at the same time. By allowing the extent to be changed while retaining details and resolution, meta-modelling can be a powerful way to explore how processes at different scales interact, either in time or in space. Other up-scaling approaches, such as those presented by Urban, Acevedo & Garman (1999) or Boulain, Simioni & Gignoux (2007), allow this to some extent. However, they usually emulate only a specific set of outputs of the original model and do not retain all the details. In our study, simulating at the century scale allows questions to be tackled that would be impossible with noTG, such as the effect of competition between trees on long-term forest productivity and carbon sequestration. A faster meta-model also offers the opportunity to incorporate processes relevant to larger scales, such as those linked to demography or hydrology, thus vastly increasing the scope of the model.

Independently of changing scale, by running faster, meta-models provide the ability to perform analyses that would involve a number of simulations too high to be practical with the original model. Sensitivity analyses are essential tools to study the properties of a model, but they require many simulations, leading to a prohibitive computing demand in the case of slow models. Sensitivity analyses are examples of tasks that can be performed with a meta-model, but not (or not to the same extent) with other up-scaling approaches (Coutts & Yokomizo 2014).
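As an illustration of the kind of analysis a fast emulator makes affordable (the emulator function and input names below are invented stand-ins, not the fitted noTGmeta), a one-at-a-time sensitivity analysis perturbs each input around a baseline and records the relative response:

```python
# One-at-a-time (OAT) sensitivity analysis on a cheap emulator.
# The emulator below is a toy response invented for illustration.

def emulator(temperature, light, lai):
    # invented stand-in for a fitted meta-model
    return 0.5 * light * lai / (1.0 + 0.01 * (temperature - 20.0) ** 2)

def oat_sensitivity(fn, base, step=0.01):
    """Relative output change per 1% perturbation of each input."""
    ref = fn(**base)
    sens = {}
    for name, value in base.items():
        bumped = dict(base, **{name: value * (1.0 + step)})
        sens[name] = (fn(**bumped) - ref) / (ref * step)
    return sens

baseline = {"temperature": 20.0, "light": 15.0, "lai": 3.0}
sens = oat_sensitivity(emulator, baseline)
# light and lai enter multiplicatively, so their sensitivities are near 1;
# temperature matters little near its optimum of 20 degrees C
```

Each call to the emulator costs almost nothing, so designs with thousands of perturbations (or full variance-based methods) remain tractable where the original model would not be.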
In this study, we demonstrated that it is possible to apply the meta-modelling method to a case study displaying a level of complexity typical of ecology. We showed that the meta-model was in reasonably good agreement with the original model. This lends confidence that such a technique would be of great interest to all areas of ecology.

Acknowledgements

This work was made possible by a joint PhD grant between INRA and the région Provence-Alpes-Côte-d'Azur. Part of the simulations was performed using HPC resources from GENCI-CINES (Grant 2013-c2013016613). We wish to thank Dr H. Davi (INRA-URFM), E. Rigolot (INRA-URFM) and C. Bruchou (INRA-BioSp) for comments on the initial version of the manuscript. We also wish to thank four anonymous reviewers for helpful comments on an earlier version.

Data accessibility

Data for a case study of the noTG model are available in the Dryad Digital Repository (Marie & Simioni 2014).

References

Auger, P., Charles, S., Viala, M. & Poggiale, J. (2000) Aggregation and emergence in ecological modelling: integration of ecological levels. Ecological Modelling, 127, 11–20.

Barton, R.R. (1994) Metamodeling: a state of the art review. Proceedings of the 26th Conference on Winter Simulation, pp. 237–244. Society for Computer Simulation International.
Barton, R.R. (2009) Simulation optimization using metamodels. Winter Simulation Conference, pp. 230–238.
Boulain, N., Simioni, G. & Gignoux, J. (2007) Changing scale in ecological modelling: a bottom up approach with an individual based vegetation model. Ecological Modelling, 203, 257–269.
Britz, W. & Leip, A. (2009) Development of marginal emission factors for N losses from agricultural soils with the DNDC–CAPRI meta-model. Agriculture, Ecosystems & Environment, 133, 267–279.
Busby, D. (2009) Hierarchical adaptive experimental design for Gaussian process emulators. Reliability Engineering & System Safety, 94, 1183–1193.
Chen, T., Hadinoto, K., Yan, W. & Ma, Y. (2011) Efficient meta-modelling of complex process simulations with time–space-dependent outputs. Computers & Chemical Engineering, 35, 502–509.
Coutts, S.R. & Yokomizo, H. (2014) Meta-models as a straightforward approach to the sensitivity analysis of complex models. Population Ecology, 56, 7–19.
Friedman, L.W. (1984) Establishing functional relationships in multiple response simulation: the multivariate general linear metamodel. Proceedings of the 1984 Winter Simulation Conference, pp. 285–289.
Friedman, J.H. (1991) Multivariate adaptive regression splines. The Annals of Statistics, 1–67.
Helton, J. & Davis, F. (2003) Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering & System Safety, 81, 23–69.
Kleijnen, J.P. (1975) A comment on Blanning's "metamodel for sensitivity analysis: the regression metamodel in simulation". Interfaces, 5, 21–23.
Kleijnen, J. & Sargent, R. (2000) A methodology for fitting and validating meta-models in simulation. European Journal of Operational Research, 120, 14–29.
Kwon, H.Y. & Hudson, R.J. (2010) Quantifying management-driven changes in organic matter turnover in an agricultural soil: an inverse modeling approach using historical data and a surrogate CENTURY-type model. Soil Biology and Biochemistry, 42, 2241–2253.
Levin, S.A. (1992) The problem of pattern and scale in ecology: the Robert H. MacArthur award lecture. Ecology, 73, 1943–1967.
Lischke, H., Löffler, T.J., Thornton, P.E. & Zimmermann, N.E. (2007) Model up-scaling in landscape research. A Changing World, pp. 249–272. Springer.
Luo, Z., Wang, E., Bryan, B.A., King, D., Zhao, G., Pan, X. & Bende-Michl, U. (2013) Meta-modeling soil organic carbon sequestration potential and its application at regional scale. Ecological Applications, 23, 408–420.
Marie, G. & Simioni, G. (2014) Data from: Extending the use of ecological models without sacrificing details: a generic and parsimonious meta-modelling approach. Dryad Digital Repository, http://dx.doi.org/10.5061/dryad.4254b.
Moorcroft, P., Hurtt, G. & Pacala, S. (2001) A method for scaling vegetation dynamics: the ecosystem demography model (ED). Ecological Monographs, 71, 557–585.
Pineros Garcet, J., Ordonez, A., Roosen, J. & Vanclooster, M. (2006) Metamodelling: theory, concepts and application to nitrate leaching modelling. Ecological Modelling, 193, 629–644.
R Development Core Team (2008) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Seidl, R., Rammer, W., Scheller, R.M. & Spies, T.A. (2012) An individual-based process model to simulate landscape-scale forest ecosystem dynamics. Ecological Modelling, 231, 87–100.
Simioni, G. (2001) Importance de la structure spatiale de la strate arborée sur les fonctionnements carbone et hydrique des écosystèmes herbes–arbres. Exemple d'une savane d'Afrique de l'Ouest. PhD thesis, Université Paris XI, France.
Simioni, G., Le Roux, X., Gignoux, J. & Sinoquet, H. (2000) Treegrass: a 3D, process-based model for simulating plant interactions in tree–grass ecosystems. Ecological Modelling, 131, 47–63.
Simioni, G., Marie, G., Gignoux, J., Le Roux, X. & Huc, R. (2011) noTG: a 3D mechanistic model to study fine scale water and carbon budgets of multi-specific, heterogeneous ecosystems. MEDPINE 4: 4th International Conference on Mediterranean Pines. INRA.
Simioni, G., Durand-Gillmann, M. & Huc, R. (2013) Asymmetric competition increases leaf inclination effect on light absorption in mixed canopies. Annals of Forest Science, 70, 123–131.
Simpson, T., Peplinski, J., Koch, P. & Allen, J. (2001) Metamodels for computer based engineering design: survey and recommendations. Engineering with Computers, 17, 129–150.


Tietjen, B. & Huth, A. (2006) Modelling dynamics of managed tropical rainforests – an aggregated approach. Ecological Modelling, 199, 421–432.
Urban, D. (2005) Modeling ecological processes across scales. Ecology, 86, 1996.
Urban, D.L., Acevedo, M.F. & Garman, S.L. (1999) Scaling fine-scale processes to large-scale patterns using models derived from models: meta-models. Spatial Modeling of Forest Landscape Change: Approaches and Applications, pp. 70–98. Cambridge University Press, Cambridge, UK.
Villa-Vialaneix, N., Follador, M., Ratto, M. & Leip, A. (2012) A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environmental Modelling & Software, 34, 51–66.

Received 8 January 2014; accepted 21 July 2014
Handling Editor: Tamara Münkemüller


Supporting Information

Additional Supporting Information may be found in the online version of this article.

Appendix S1. Model parameters.
Appendix S2. Meta-model inputs.
Appendix S3. Further information on error analysis.
Appendix S4. Statistical metrics.
