Toward large-scale crop production forecasts for global food security

Predicting crop production plays a critical role in food price forecasting and in mitigating potential food shortages. Crop models may require parameters from, for example, weather, crop genotype, farm management, and soil. Sources for these data are often found in very different places, and researchers spend a significant amount of time collecting and curating them. In addition, in order to scale yield forecasts from the single-farm level up to the continental scale, crop models have to be coupled with a geospatial big data platform that provides the required data inputs. In a proof-of-concept case study, we investigate the coupling of a scalable geospatial big data platform, Physical Analytics Integrated Repository and Services (PAIRS), to the Decision Support System for Agrotechnology Transfer (DSSAT) crop model. We envision running this system on a global scale. For geospatial analytics, PAIRS provides curation of heterogeneous data sources to simulate crop models using hundreds of terabytes of data.

G. Badr, L. J. Klein, M. Freitag, C. M. Albrecht, F. J. Marianno, S. Lu, X. Shao, N. Hinds, G. Hoogenboom, H. F. Hamann

Digital Object Identifier: 10.1147/JRD.2016.2591698

IBM J. RES. & DEV. VOL. 60 NO. 5/6 PAPER 5 SEPTEMBER/NOVEMBER 2016 G. BADR ET AL.

©Copyright 2016 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor. 0018-8646/16 © 2016 IBM

Introduction The world population may exceed nine billion by 2050 [1, 2], and it is estimated that food production must increase by 70% to meet the nutritional demand [3–5]. Most of the population growth is expected in the developing world, where adoption of modern agricultural practices lags. To address this need for ever-increasing food production, decision makers, scientists, and growers must collaborate in developing global solutions that promote yield increases without aggravating soil erosion, water pollution, or further expansion of land use for agricultural purposes [3, 4]. Maximizing crop production on existing agricultural land can help protect remaining intact ecosystems from being converted. Crop models are an essential tool toward achieving this goal. Of particular note is that future crop production will encounter additional challenges due to climate change. Farmers will most likely have to face water scarcity, soil erosion, and desertification in many parts of the world.

Traditionally, yield predictions have been based on well-controlled experiments where yield was monitored for many years across a few selected farms, and management practices were carefully recorded. Such studies are not available at regional scales, and this is where crop models can contribute most. For example, crop models have been used to study the impact of climate change on crop production. However, interpretation of the results can be challenging, as many of the input variables have large uncertainties [5–9]. Furthermore, crop growth, water availability, and soil health are interconnected. Therefore, they should be treated as coupled models. Large-scale modeling may require the integration and optimization of heterogeneous data sources. In particular, there is considerable interest in estimating the potential yield of strategic crops (corn, wheat, and soybean) [5]. Beyond crop yield forecasting, these efforts may help to develop decision support tools to optimize livestock and crop choice, to improve farm management, and to better inform policy makers [6]. These developments may ultimately help to address food security by taking into account multiple dependencies and understanding how optimizing one process may affect others.

With the emergence of the Internet of Things (IoT), we are facing an increasing amount of data from a variety of sources such as mobile devices, distributed sensor networks, pictures, videos, and social media. Many of these data sources carry information that can be used for crop modeling. However, efficient usage of such structured and unstructured data requires data integration and harmonization techniques. Previous studies of crop estimates [7–9] have demonstrated the importance and the challenges of creating datasets that are spatially aligned and have well-defined data schemas and standards [10]. Therefore, the quick integration of various data sources into physical and/or statistical models [11] requires new tools and approaches that can handle heterogeneous data spanning petabytes in volume.

To address data management challenges, we introduced a new geospatial data platform, called Physical Analytics Integrated Repository and Services (PAIRS) [11], which is built on top of the open source technologies Apache Hadoop**/MapReduce [12] and HBase [13]. In PAIRS, all datasets are curated and indexed so that the data aligns spatially and temporally. Furthermore, this platform is sufficiently flexible to provide information at both local and global scales. In this work, we investigate the integration of a state-of-the-art wheat crop model (Decision Support System for Agrotechnology Transfer [DSSAT]) with PAIRS. The scope of this paper is to demonstrate the integration of a crop production model with a big data platform, rather than to demonstrate prediction accuracy. For truly global crop production forecasting with acceptable accuracy, much more work will be required. As an initial case study, we present the integration of PAIRS and DSSAT for modeling winter wheat yields at multiple locations in Kansas over 34 years.

Figure 1 Geospatial big data framework delivered as a cloud service (API: application program interface).

PAIRS: A geospatial big data platform The heterogeneity of data formats, including mismatched spatial and temporal resolution, requires extensive data preparation and processing before analytics can be carried out. Traditionally, data resides in large repositories or data warehouses from which it is downloaded when needed. However, this approach is not scalable when crop models are run globally. Thus, direct access to large amounts of data is mandatory. One option to minimize repetitive steps in data processing is to spatially align all datasets and store them in a big data platform for immediate access. PAIRS addresses this need by curating all datasets. As a result, the platform speeds up analytics by a factor of up to ten compared to traditional techniques (see Figure 6 in [11]). As mentioned, PAIRS is built on top of open source big data technologies [12, 13]. All data points in PAIRS are geo-indexed for quick search, identification, and retrieval [14]. Geospatial data integration into big data platforms is currently being investigated by many groups [15–20]. The proposed combination of PAIRS and DSSAT is envisioned to speed up crop modeling, enabling global and near real-time yield modeling [15–17].

The architecture of PAIRS is shown in Figure 1. Multiple modules have been developed to download, ingest, process, validate, and retrieve data. Unique features of the platform include spatially aligned and joined data layers stored in HBase, a key-value store [13]. The platform employs a set of spatial grids with fixed resolution. The spatial resolution of consecutive grids differs by a factor of two. Currently, PAIRS supports 29 global grids of different spatial resolutions, with unit grid resolution spanning from 0.1 m up to hundreds of kilometers [11]. All raw datasets are interpolated and re-gridded before being integrated into PAIRS as part of the data ingestion step.

Table 1 Crop model relevant data sets integrated into PAIRS. Two real-time analytics running on PAIRS are weather forecasting based on machine learning and irrigation forecasting for precision agriculture.

Each data point is indexed based on a geospatial location index (x, y) and time of acquisition t. These labels serve to define a key, while the data point itself represents the value within the corresponding HBase table. For spatial indexing, a quad-tree approach is implemented [14], z = f(x, y), and the PAIRS index reads k = (z, t). The quad-tree follows a recursive scheme by successively splitting space into four squares of equal size. This ensures alignment of layers with different resolution. Because time series are stored successively in the HBase table, querying time series is very efficient with PAIRS.

Downloading the raw data is automated using agents that scout repositories and download new datasets once they become available. These agents query the server for the latest updates and check if that dataset is already integrated into PAIRS; if the data is not found in PAIRS, an automatic command is launched to retrieve it. Automation of data download is critical, as terabytes of data are ingested, and their completeness cannot be easily verified through manual processes. For example, more than 800 Landsat tiles are generated daily [10], and they provide satellite-based information about vegetation growth that is required in near real-time to calibrate and to update crop models.

Another feature of PAIRS is its multi-layer query capability, where one or multiple data layers can be selectively filtered based on one or multiple parameters [11]. The query is accelerated by the indexing scheme employed to store data in HBase tables [11, 14]. Such cross-layer queries tailor PAIRS for crop modeling, as weather, soil, or vegetation data queries can be limited to areas where corn, soybean, or wheat is planted. Some of the existing data layers in PAIRS that support crop modeling are shown in Table 1. Each layer’s spatial resolution and data acquisition frequency is detailed as well.
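To make the indexing idea above more concrete, the following Python sketch interleaves the bits of integer grid coordinates into a Z-order (Morton) value z = f(x, y), as in the file-sequencing technique of [14], and pairs it with an acquisition time to form a key k = (z, t). The function names, the 16-bit coordinate width, and the 16-byte key layout are assumptions made for illustration only; the actual PAIRS key format is not described here.

```python
import struct


def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the lowest `bits` bits of x and y (x takes the even bit positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def parent_cell(z: int) -> int:
    """Index of the enclosing cell one quad-tree level up (drops one bit of x and one of y)."""
    return z >> 2


def make_key(x: int, y: int, t: int) -> bytes:
    """Hypothetical row key: 8-byte big-endian z followed by 8-byte big-endian t."""
    return struct.pack(">QQ", morton_encode(x, y), t)


# Cells (2, 4) and (3, 5) are two of the four children of the same parent cell.
same_parent = parent_cell(morton_encode(2, 4)) == parent_cell(morton_encode(3, 5))

# Keys for one cell at successive times sort next to each other.
k_early = make_key(3, 5, 10)
k_late = make_key(3, 5, 11)
```

With a big-endian layout of this kind, all keys for one cell share a prefix and are ordered by time, which is one way a sorted key-value store can keep a time series in a contiguous key range.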

In addition to the data layers shown in Table 1, PAIRS can ingest custom data layers such as crop identification, planting pattern, and planting date estimates that are generated by custom analytics models. Once a new layer is added to PAIRS, it automatically gets spatially aligned with all other data layers. Thus, multi-layer queries of new datasets and existing PAIRS layers are readily enabled.
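A cross-layer query of the kind described above can be pictured as a filter over layers that share the same cell index. The sketch below is not the PAIRS API: the `query_layers` helper, the layer names, the crop code 24, and all values are hypothetical and only illustrate why spatial alignment makes such filters straightforward.

```python
from typing import Callable, Dict

Layer = Dict[int, float]  # cell index -> value; all layers are assumed to share one grid


def query_layers(
    layers: Dict[str, Layer],
    filters: Dict[str, Callable[[float], bool]],
) -> Dict[int, Dict[str, float]]:
    """Return the values of every layer for the cells that pass all filters."""
    # Only cells present in every layer can be compared across layers.
    common = set.intersection(*(set(layer) for layer in layers.values()))
    result = {}
    for cell in sorted(common):
        if all(test(layers[name][cell]) for name, test in filters.items()):
            result[cell] = {name: layer[cell] for name, layer in layers.items()}
    return result


# Made-up example data: a crop classification layer and a precipitation layer.
layers = {
    "crop_code": {36: 24.0, 37: 1.0, 38: 24.0, 39: 24.0},
    "precip_mm": {36: 12.5, 37: 30.0, 38: 0.0, 39: 4.2},
}

# Keep only cells classified as the (hypothetical) wheat code with more than 1 mm of rain.
wheat_with_rain = query_layers(
    layers,
    {"crop_code": lambda v: v == 24.0, "precip_mm": lambda v: v > 1.0},
)
```

Because a newly ingested layer would use the same cell indices, it could be added to `layers` and filtered together with the existing ones without any further alignment step.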

Decision support system for agrotechnology transfer Crop models have been used as a tool to predict the stability of production and to assess the impact of climate change on food security [21–27]. Yield forecasting can be approached using two tactics: 1) statistical forecasting methods, when significant historical yield data is available, or 2) crop models to estimate potential yield. As mentioned, DSSAT is one of the most widely used crop models [28–30]. DSSAT was developed by an international team of scientists to estimate production, resource use, and risks associated with different crop production practices. Several studies have used DSSAT to predict the yield of various crops with respect to climate change scenarios [31–35]. To run crop models on a global scale, the Earth surface is divided into a uniform grid, and representative parameters for soil and weather are specified. Then, parameters from each cell are run through DSSAT to generate a yield prediction. To estimate crop production, many of the existing studies rely on data grids that exceed 50 km by 50 km [8, 16]. These cells may predict the average yield on a larger scale, but do not resolve finer spatial details, such as individual farms within the cell [21–27]. Alternatively, one can run the DSSAT model for individual farms and aggregate the values from all farms to predict yield for a region [35]. Farm-level modeling (with its generally more deterministic data about management practices) may be more accurate than statistical methods that are used for larger-scale modeling [35], but requires detailed input from individual farms about weather, management practices, seed genetics, and soil properties.

Figure 2 Schematic plot for the integration of DSSAT with PAIRS. The case study of this paper manually implemented the decision tree shown in the lower part. We envision automating this process to become part of the PAIRS technology.

In this study, the DSSAT model was coupled with PAIRS to simulate winter wheat (Triticum aestivum L.) yield at a farm level for a period of 34 years (Figure 2). The predicted yield was evaluated by comparing the results with wheat yield extracted from the U.S. Department of Agriculture (USDA) reports for the state of Kansas, where yields were reported at the county level [36]. The input data for DSSAT is provisioned on the fly, and the crop model relies on PAIRS’s curated datasets. We note that there have been previous attempts to run crop models on a regional scale, or to combine crop modeling with statistical modeling [28] and machine learning [17]. Data provisioning to crop models on a global scale is just starting to become a reality [11, 18], and there are
extensive efforts to scale the results from crop models from individual farms to regional levels; two such programs are ISI-MIP (Inter-Sectoral Impact Model Intercomparison Project) [8] and AgMIP (Agricultural Model Intercomparison and Improvement Project) GGCMI (Global Gridded Crop Model Intercomparison) [7].

Study area Our study focused on winter wheat in Kansas, United States. Kansas had an area of 3,884,982 ha of winter wheat reported for 2014 [37]. The state extends from 37° to 40° N, and 94° 35′ to 102° 3′ W. The land elevation ranges from 207 m to more than 1,200 m. Kansas is located in the center of the continental United States; this location is associated with complex and varying weather patterns [38]. Two main geophysical features control the weather in Kansas: 1) the Rocky Mountains to the west and 2) the Gulf of Mexico to the south [39]. In addition, the variation in elevation has an impact on the weather and climate [39]. The eastern part of the state is extremely humid; the climate gradually becomes sub-humid and turns into dry sub-humid toward the west [39]. The southwest region is recognized as semiarid, and the crops grown in that region are mainly drought-resistant sorghums, short grasses, and small grains. Most of the precipitation (70% to 75%) is accumulated from April to September [40]. Quantitatively, precipitation ranges from 105 mm in June to 18 mm in January [41]. Kansas’s average annual temperature varies from 10 °C to 14 °C [40], while statewide average monthly temperatures range from −1.6 °C in January to 26 °C in July [41]. The data used for our proof-of-concept was extracted from data published by Kansas State University [37] and from USDA’s annual yield reports aggregated at the county level for Kansas [36].

Table 2 Yield estimate for 14 sites under four different scenarios and averaged across 34 years. (Units are in kg/ha.)

Model specifications
Calibration of DSSAT for winter wheat in Kansas The Crop-Environment-Resource-Synthesis (CERES-WHEAT) module [28–30] within DSSAT v4.6.1 computes vegetative and reproductive growth of the plant as a function of photosynthesis, growth stage, water, and nitrogen stress [28–30] on a daily basis. The module requires a minimum dataset (MDS) [28–30] to estimate the growth and development of winter wheat in a region. The required MDS data has been reported in detail in previous studies [28–30, 42, 43]. Performance trial reports available through Kansas State University Agricultural Experiment Station and
Cooperative Extension Service [37] were used to obtain input data for model calibration. This data included yield, planting date, harvest date, planting density, and fertilizer application. Based on the performance trial data, nine main test regions are established across Kansas for winter wheat. To have at least one site in each region, this study selected 14 test sites distributed across Kansas to be representative of all meteorological regions. The winter wheat land cover for Kansas was obtained from PAIRS, and the locations of the 14 test sites were identified using land-cover and land-use data to confirm that winter wheat was indeed prevalently planted in those locations. The spatial distribution of winter wheat locations was used to identify representative soil types for each region (Table 2). This was then employed to set the simulation specifications and run the DSSAT-wheat model. Daily weather data was obtained for five years (2010 to 2015) from the Kansas State Mesonet [40] stations that are located near the trial sites. The weather data includes daily maximum air temperature, minimum air temperature, precipitation, and solar radiation. Using DSSAT v4.6.1, the growth and development of winter wheat for the 2010 to 2015 growing seasons were simulated for Colby, Kansas, which is located in the western region of the state, where wheat is irrigated. Optimal conditions were assumed, and cultivar coefficients were estimated for this particular location.

According to the Kansas State University Agricultural Experiment Station and Cooperative Extension Service report, “Everest” is the dominant cultivar in most of the winter wheat growing regions across Kansas. This cultivar was used for 14.3% of the state’s wheat in 2014 [37]. However, cultivar coefficients for “Everest” were not specified; therefore, the Generalized Likelihood Uncertainty Estimation (GLUE) tool was used to estimate the cultivar coefficients. Several specified parameters were extracted from the reports [37] and were used for setting up and running the calibration experiment for the Colby site.
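The general idea behind a GLUE-style estimate can be sketched in a few lines: draw many candidate parameter sets from prior ranges, run the model for each, score each run against observations with a likelihood measure, and use the likelihood-weighted sets as the estimate. The Python sketch below is a toy illustration and not the DSSAT GLUE tool: the linear `toy_model` stands in for a crop model run, and the prior ranges, the Gaussian likelihood with `sigma`, the sample count, and the data are all assumptions. It also omits the threshold that GLUE applications often use to discard poorly performing parameter sets.

```python
import math
import random


def toy_model(params, xs):
    """Stand-in for a crop model run: a linear response a * x + b."""
    a, b = params
    return [a * x + b for x in xs]


def glue_estimate(xs, observed, priors, n_samples=5000, sigma=0.5, seed=0):
    """Likelihood-weighted mean of parameter sets sampled uniformly from `priors`."""
    rng = random.Random(seed)
    samples, weights = [], []
    for _ in range(n_samples):
        params = tuple(rng.uniform(lo, hi) for lo, hi in priors)
        simulated = toy_model(params, xs)
        sse = sum((s - o) ** 2 for s, o in zip(simulated, observed))
        samples.append(params)
        # Gaussian-shaped likelihood measure: a better fit gives a larger weight.
        weights.append(math.exp(-sse / (2.0 * sigma ** 2)))
    total = sum(weights)
    return tuple(
        sum(w * p[i] for w, p in zip(weights, samples)) / total
        for i in range(len(priors))
    )


# Made-up observations generated from a = 2, b = 1.
xs = [1.0, 2.0, 3.0, 4.0]
observed = [3.0, 5.0, 7.0, 9.0]
a_est, b_est = glue_estimate(xs, observed, priors=[(0.0, 4.0), (0.0, 3.0)])
```

In a real calibration, each sample would correspond to one crop model run with a candidate set of cultivar coefficients, and the observations would be measured quantities such as yield or phenology dates.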

Figure 3 Calibration dataset (2010 to 2015 growing seasons) for irrigated winter wheat in Colby, Kansas, USA.

Seasonal analysis of winter wheat in Kansas The winter wheat yield across the 14 locations was simulated for a period of 34 years (1980 to 2014). The simulation results were evaluated against the historical reports for winter wheat yields by the United States Department of Agriculture National Agricultural Statistics Service (USDA NASS) [36]. Crop simulation specifications were developed based on the performance trials [37] (see the previous “Calibration” subsection for more details about the assumptions). Interpolated historical Daymet daily weather was used for DSSAT modeling [44]. Seasonal analysis was conducted across the 14 locations to assess the accuracy of the CERES-WHEAT model for estimating long-term historical yield [45]. This analysis could potentially be used as a tool to evaluate the suitability of this method for estimating in-season yields or projecting yields for upcoming years in the same locations, and to quantify the potential yield. Data on detailed management practices and irrigation strategy was not available for this study. In addition, the evaluation dataset had a county-level resolution, which makes the comparison more challenging. Comparing the simulated yield for one site with the average yield for the county that encompasses the site may therefore not be the best approach, because many environmental factors control the growth and development of winter wheat, and spatial aggregation of the data should be done with caution. The simulation specifications here are based on “Everest” as the most popular cultivar, yet there is no guarantee that all of the winter wheat growers within one county are growing only one cultivar. Hence, to better capture reality, four scenarios were defined:

1) Optimum yield simulation (assuming no water stress and no nitrogen stress). The phrase “no nitrogen stress” means that nitrogen fertilizer is applied at an optimum level to achieve maximum yield.
2) No irrigation, such that water stress was introduced, and crop growth and development were based on available precipitation. No nitrogen stress was introduced.
3) Irrigation was applied based on a schedule, so that supplementary water became available to the crops, but there was no nitrogen stress.
4) The crop was irrigated assuming the same irrigation regime as in Case 3, combined with nitrogen fertilizer applied at a rate of 60 kg/ha.

Yield data was simulated spanning 34 years under all four scenarios.
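The four scenarios can be written down as a small configuration table, which also makes the size of the simulation matrix explicit. The following Python sketch is only an illustration: the `Scenario` class, its field names, and the irrigation labels are hypothetical, the short codes (YP, NIR, IR, IRNA) follow the abbreviations used in the Figure 4 caption, and the site names are placeholders rather than the actual trial sites.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    code: str
    irrigation: str                   # "unlimited" (no water stress), "none", or "scheduled"
    nitrogen_kg_ha: Optional[float]   # None means no nitrogen stress is simulated


SCENARIOS = [
    Scenario("YP", irrigation="unlimited", nitrogen_kg_ha=None),    # 1) yield potential
    Scenario("NIR", irrigation="none", nitrogen_kg_ha=None),        # 2) not irrigated
    Scenario("IR", irrigation="scheduled", nitrogen_kg_ha=None),    # 3) irrigated
    Scenario("IRNA", irrigation="scheduled", nitrogen_kg_ha=60.0),  # 4) irrigated, 60 kg/ha N
]


def run_matrix(sites, years):
    """Enumerate every (site, year, scenario code) combination to be simulated."""
    return [(site, year, sc.code) for site in sites for year in years for sc in SCENARIOS]


# Placeholder site names; 34 simulated years starting in 1980.
sites = [f"site_{i:02d}" for i in range(1, 15)]
years = range(1980, 2014)
runs = run_matrix(sites, years)
```

Under these assumptions the matrix contains 14 sites times 34 years times 4 scenarios, that is, 1,904 individual model runs.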

Case study of winter wheat yield For the crop simulation, the simulated onset of flowering (anthesis) date was, on average, 10 days earlier than the observed anthesis dates. This difference was carried through the model and may have contributed to the year-to-year variations in the yield. For all locations, the crop phenotype and soil were held constant from year to year, and anthesis was allowed to vary. With the same soil parameters, management practices, and cultivar coefficients, and with only the weather changing from year to year, the prediction sometimes underestimates and at other times overestimates the yield (Figure 3). Variations in the historical yield predictions for four different sites under different management practice scenarios are shown in Figure 4. Yield simulations were carried out using the “Everest” cultivar across 34 growing seasons, and the various management practices were implemented to obtain the yield estimates across all locations (Table 2). For all sites, the average yield is specified in Table 2. The root mean square error (RMSE) [45] across all sites for the different management scenarios is: not irrigated crop (1,298 kg/ha), irrigated crop (1,293 kg/ha), and irrigated and nitrogen applied crop (924 kg/ha). We note that these preliminary studies need to be continued to lower the RMSE values by either improving the cultivar coefficients or assessing the management practices.

Figure 4 Winter wheat yield observations (Obs) in four locations, compared with crop yield simulations based on different management practices: crop irrigated (IR), crop not irrigated (NIR), crop irrigated and nitrogen applied (IRNA), and yield potential (YP). The yield is averaged across 34 years for each survey method.

The difference and variation between modeled and observed yield can be partially explained by uncertainties in the data used for the crop yield simulations. Any large-scale simulation needs to deal with partial or incomplete datasets that carry uncertainty related to input parameters like soil and cultivar. For all the farms, the irrigation and fertilizer application practices were not known. Therefore, part of the variation that was observed may be due to the lack of sufficient management practice data. In addition, it is known that some regions across Kansas conduct dry land cultivation for winter wheat production; dry land farming is highly dependent on the annual precipitation in the region, and thus the yield differs between a wet year and a dry year. Hence, simulating yield based on historical weather data obtained from gridded weather simulations may also hinder a “perfect” agreement between the simulated and observed yields.

The general trend of the yield variation was also evaluated by comparing the results obtained from the irrigated scenario with the historical yield reports obtained from USDA. The year-to-year yield variations were reproduced for each region. This result, in turn, might indicate that with proper knowledge of a site, as well as availability of long-term data for that site, it is possible to robustly predict yield once well-characterized yield data is collected. Such observed data would contain detailed information on management practices, fertilizer applications, and cultivar coefficients. This information is then specific to individual farms; hence, the simulated yields become more reliable, and the model parameters can be up-scaled to predict yields at a regional or continental scale. Of note here is that in-season measurements could easily be ingested into PAIRS to re-calibrate the crop model and improve yield predictions. However, the potential value of yield estimates decreases the closer one is to harvest time (at harvest time, the value of yield estimates is essentially zero, since harvest measurements can be taken). Thus, timing is important, and PAIRS offers the necessary performance for quick turnaround of crop model re-runs with new data.
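For reference, the RMSE used above to compare simulated and observed yields is the square root of the mean squared difference between paired values, RMSE = sqrt((1/n) Σ (s_i − o_i)²). A minimal Python sketch follows; the yield values are made up for illustration and are not taken from Table 2.

```python
import math


def rmse(simulated, observed):
    """Root mean square error between paired simulated and observed values."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up yields in kg/ha for three site-years.
simulated_yield = [3000.0, 3500.0, 2800.0]
observed_yield = [2700.0, 3900.0, 2800.0]
error = rmse(simulated_yield, observed_yield)
```

Because the differences are squared before averaging, a few site-years with large deviations dominate the value, which is worth keeping in mind when a single-site simulation is compared with a county-level average.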

Future steps and vision for DSSAT integrated into PAIRS In-season yield prediction requires constant updating of the input parameters of crop models. Since data from individual farms may be limited, remote observations may fill the gap to some degree. The observations may be in the form of images of the farm or spectral information that tracks above-ground biomass and Leaf Area Index for individual crops. Currently, three main satellite data sources offer images with global coverage: Landsat, Sentinel, and the Moderate Resolution Imaging Spectroradiometer (MODIS). These can be used to observe crop growth and assess management practices [46–50]. The dynamics of vegetation indices during a growing season are valuable for detecting phenological variability [51–58]. MODIS, Landsat, and Sentinel information is being integrated as raster layers into PAIRS.
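One widely used vegetation index is the Normalized Difference Vegetation Index (NDVI), computed from near-infrared and red reflectance as (NIR − Red)/(NIR + Red). The Python sketch below computes an NDVI time series for a single pixel and locates its seasonal peak, a simple proxy for phenological timing. The reflectance values and helper names are made up for illustration; cloud masking and the blending of different sensors are not covered.

```python
def ndvi(nir: float, red: float) -> float:
    """NDVI from near-infrared and red reflectance; NaN if both are zero."""
    denom = nir + red
    if denom == 0:
        return float("nan")
    return (nir - red) / denom


def ndvi_series(nir_series, red_series):
    """NDVI for each acquisition date of one pixel."""
    return [ndvi(n, r) for n, r in zip(nir_series, red_series)]


def peak_index(values):
    """Position of the maximum value, e.g. the acquisition with peak greenness."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up reflectances for four acquisition dates in one growing season.
nir = [0.30, 0.45, 0.60, 0.40]
red = [0.20, 0.15, 0.10, 0.20]
series = ndvi_series(nir, red)
peak = peak_index(series)
```

In this toy series the index rises from 0.2 to about 0.71 at the third acquisition and then declines, the kind of seasonal curve from which green-up and senescence could be estimated.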

To compensate for missing satellite observations due to cloud cover, the spatially coarser data from MODIS (which in turn has a fine temporal resolution) needs to be blended with Landsat and Sentinel information to extract vegetation data. Algorithms to merge two satellite datasets with different spatial resolutions have been developed, but operationalizing this requires further work [46, 59]. Spectral information from satellites can be used for crop recognition and estimation of the total area cultivated. Furthermore, remote observations can provide the dynamics of the Normalized Difference Vegetation Index (NDVI) or the Leaf Area Index (LAI), which in turn might be coupled with the DSSAT model. Additionally, the satellite data can be used to estimate the planting and harvest days, which may vary across geographical regions and be crop specific.

Weather is another key component to correctly model yield. In PAIRS, such data is becoming available in a consistent way. Two examples are the Global Forecast System (GFS) and the forecasts of the European Centre for Medium-Range Weather Forecasts (ECMWF). In addition, PAIRS hosts regional models like the North American Mesoscale Model (NAM), which makes predictions for smaller regions. Since all numerical models have their strengths and weaknesses, it can be beneficial to adapt these models to local conditions by statistical techniques such as situation-dependent model blending [50]. In such a scheme, different weather models are combined based on weighting coefficients that can be obtained from training datasets. PAIRS runs a machine learning algorithm to pick the best combination of individual weather models.

Given a certain cultivar, one more aspect that may be considered to answer questions with respect to food security involves globally mapping locations that produce maximum yield. Regions with similar environmental conditions (if managed with the same management practices) have the potential to produce very similar yields for a given cultivar [51]. The cross-layer searching capability of PAIRS could map all regions that share the same weather and soil characteristics.
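The weighting idea behind model blending can be illustrated with a deliberately simple scheme: each weather model receives a weight inversely proportional to its mean squared error on a training period, and the blended forecast is the weighted sum. This is a stand-in for illustration only; it is not the situation-dependent machine learning approach referred to above, and the forecast and observation values are made up. Only the model names gfs and nam are taken from the text.

```python
def inverse_mse_weights(forecasts_by_model, observed):
    """Weights proportional to 1 / MSE of each model on the training data."""
    inverse = {}
    for name, forecast in forecasts_by_model.items():
        mse = sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)
        inverse[name] = 1.0 / max(mse, 1e-12)  # guard against a perfect model
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def blend(forecasts_by_model, weights):
    """Weighted sum of the model forecasts, time step by time step."""
    n_steps = len(next(iter(forecasts_by_model.values())))
    return [
        sum(weights[name] * forecasts_by_model[name][i] for name in weights)
        for i in range(n_steps)
    ]


# Made-up training data: daily maximum temperature in degrees Celsius.
observed = [10.0, 12.0, 14.0]
training = {"gfs": [11.0, 13.0, 15.0], "nam": [8.0, 10.0, 12.0]}

weights = inverse_mse_weights(training, observed)
blended = blend(training, weights)
```

A situation-dependent scheme would additionally let the weights vary with conditions such as season, location, or weather regime, rather than using one fixed weight per model.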

Conclusion In this proof-of-concept, we have linked a crop model (DSSAT) with a big data geospatial platform (PAIRS) in order to assess the feasibility of scaling up yield predictions to regional and ultimately global levels. Here we focused on winter wheat in Kansas, but this approach can be transferred to other crops and scaled to other spatial regions as well. Although the prediction accuracy in this case study is disappointing, several options were discussed to improve on the approach, which will be the subject of future research. One motivation for this preliminary report was to raise challenges for future developments of integrating DSSAT into PAIRS.

Information technology can play a significant role in providing the data required to run crop models under different climate and management scenarios, and this may be helpful when considering certain food security issues. Yield forecasts on a regional scale can help decision makers, growers, and researchers to better manage supply and demand and have access to information that can be used to validate models which, in turn, can predict food challenges, potential areas of insecurity with respect to food, and/or crop prices.

Acknowledgments We thank Robin Lougee, Stuart Siegel, Upendra Chitnis, Satish Gajjela, and Supratik Guha for informative and constructive discussions.

* Trademark, service mark, or registered trademark of International Business Machines Corporation in the United States, other countries, or both

** Trademark, service mark, or registered trademark of Apache Software Foundation in the United States, other countries, or both

References

1. H. Charles, J. Godfray, J. R. Beddington, I. R. Crute, L. Haddad, D. Lawrence, J. F. Muir, J. Pretty, S. Robinson, S. M. Thomas, and C. Toulmin, “Food security: The challenge of feeding 9 billion people,” Science, vol. 327, no. 5967, pp. 812–818, 2010.
2. L. See, S. Fritz, L. You, N. Ramankutty, M. Herrero, C. Justice, I. Becker-Reshef, P. Thornton, K. Erb, P. Gong, H. Tang, M. van der Velde, P. Ericksen, I. McCallum, F. Kraxner, and M. Obersteiner, “Improved global cropland data as an essential ingredient for food security,” Global Food Security, vol. 4, pp. 37–45, 2015.
3. D. Tilman, C. Balzer, J. Hill, and B. L. Befort, “Global food demand and the sustainable intensification of agriculture,” Proc. Nat. Acad. Sci. USA, vol. 108, no. 50, pp. 20260–20264, 2011.
4. B. A. Keating and P. S. Carberry, “Sustainable production, food security and supply chain implications,” Aspects Appl. Biol., vol. 102, pp. 7–19, 2010.
5. P. S. Carberry, W. Liang, S. Twomlow, D. P. Holzworth, J. P. Dimes, T. McClelland, N. I. Huth, F. Chen, Z. Hochman, and B. A. Keating, “Scope for improved eco-efficiency varies among diverse cropping systems,” Proc. Nat. Acad. Sci., vol. 110, no. 21, pp. 8381–8386, 2013.
6. D. P. Holzworth, V. Snow, S. Janssen, I. N. Athanasiadis, M. Donatelli, G. Hoogenboom, J. W. White, and P. Thorburn, “Agricultural production systems modelling and software: Current status and future prospects,” Environ. Model. Softw., vol. 72, pp. 276–286, 2015.
7. J. Elliott, D. Deryng, J. Chryssanthacopoulos, K. J. Boote, I. Foster, M. Glotter, J. Heinke, T. Iizumi, R. Izaurralde, N. Mueller, D. Ray, C. Rosenzweig, A. Ruane, and J. Sheffield, “The global gridded crop model intercomparison: Data and modeling protocols for Phase 1 (v1.0),” Geosci. Model Dev., vol. 8, pp. 261–277, 2015.
8. C. Rosenzweig, J. Elliott, D. Deryng, A. C. Ruane, A. Arneth, K. J. Boote, C. Folberth, M. Glotter, N. Khabarov, C. Müller, K. Neumann, F. Piontek, T. Pugh, E. Schmid, E. Stehfest, and J. W. Jones, “Assessing agricultural risks of climate change in the 21st century in a global gridded crop model intercomparison,” Proc. Nat. Acad. Sci., vol. 111, no. 9, pp. 3268–3273, 2014.

IBM J. RES. & DEV. VOL. 60 NO. 5/6 PAPER 5 SEPTEMBER/NOVEMBER 2016

9. S. Sonka, “Big data and the Ag sector: More than lots of numbers,” Int. Food Agribusiness Manage. Rev., vol. 17, no. 1, pp. 1–20, 2014.
10. D. P. Roy, M. A. Wulder, T. R. Loveland, C. E. Woodcock, R. G. Allen, M. C. Anderson, D. Helder, J. R. Irons, D. M. Johnson, R. Kennedy, T. A. Scambos, C. B. Schaaf, J. R. Schott, Y. Sheng, E. F. Vermote, A. S. Belward, R. Bindschadler, W. B. Cohen, F. Gao, J. D. Hipple, P. Hostert, J. Huntington, C. O. Justice, A. Kilic, V. Kovalskyy, Z. P. Lee, L. Lymburner, J. G. Masek, J. McCorkel, Y. Shuai, R. Trezza, J. Vogelmann, R. H. Wynnea, and Z. Zhu, “Landsat-8: Science and product vision for terrestrial global change research,” Remote Sens. Environ., vol. 145, pp. 154–172, 2014.
11. L. J. Klein, F. J. Marianno, C. M. Albrecht, M. Freitag, S. Lu, N. Hinds, X. Shao, S. Bermudez Rodriguez, and H. F. Hamann, “PAIRS: A scalable geo-spatial data analytics platform,” in Proc. IEEE Int. Conf. Big Data, 2015, pp. 1290–1298.
12. J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” in Proc. 6th Symp. OSDI, San Francisco, CA, USA, Dec. 2004, pp. 1–13.
13. L. George, HBase: The Definitive Guide, 1st ed. Sebastopol, CA, USA: O’Reilly Media, 2011.
14. G. Morton, “A computer-oriented geodetic data base and a new technique for file sequencing,” IBM Canada, Res. Rep., Markham, ON, Canada, 1966. [Online]. Available: http://domino.research.ibm.com/library/cyberdig.nsf/0/0dabf9473b9c86d48525779800566a39?OpenDocument
15. H. Karambelkar, Scaling Big Data with Hadoop and Solr. Birmingham, U.K.: Packt, 2013.
16. Z. Zhang, K. Barbary, F. Austin Nothaft, E. Sparks, O. Zahn, M. J. Franklin, D. A. Patterson, and S. Perlmutter, “Scientific computing meets big data technology: An astronomy use case,” in Proc. IEEE Int. Conf. Big Data, Oct. 29–Nov. 1, 2015, pp. 918–927.
17. J. M. Bennett, “Agricultural big data: Utilization to discover the unknown and instigate practice change,” Farm Policy J., vol. 12, no. 1, pp. 43–50, 2015.
18. A. Aji, F. Wang, H. Vo, R. Lee, Q. Liu, and X. Zhang, “Hadoop GIS: A high performance spatial data warehousing system over mapreduce,” Proc. VLDB Endowment, vol. 6, no. 11, pp. 1009–1020, 2013.
19. A. Eldawy and M. F. Mokbel, “SpatialHadoop: A MapReduce framework for spatial data,” in Proc. IEEE Int. Conf. Data Eng., 2015, pp. 1352–1363.
20. F. A. Nothaft, M. Massie, and T. Danford, “Rethinking data-intensive science using scalable analytics systems,” in Proc. ACM SIGMOD, Int. Conf. Manage. Data, 2015, pp. 631–646.
21. F. Tao, Z. Zhang, J. Liu, and M. Yokozawa, “Modelling the impacts of weather and climate variability on crop productivity over a large area: A new super-ensemble-based probabilistic projection,” Agricultural Forest Meteorol., vol. 149, no. 8, pp. 1266–1278, Aug. 2009.
22. C. Rosenzweig and M. L. Parry, “Potential impact of climate change on world food supply,” Nature, vol. 367, no. 6459, pp. 133–138, Jan. 1994.
23. W. R. Cline, “Global warming and agriculture: Impact estimates by country,” Center Global Develop., Peterson Inst. Int. Econ., Washington, DC, USA, 2007.
24. M. L. Parry, C. Rosenzweig, A. Iglesias, M. Livermore, and G. Fischer, “Effects of climate change on global food production under SRES emissions and socio-economic scenarios,” Global Environ. Change, vol. 14, no. 1, pp. 53–67, Apr. 2004.
25. “World bank development report 2010: Development and climate change,” World Bank, Washington, DC, USA, 2010. [Online]. Available: http://siteresources.worldbank.org/INTWDR2010/Resources/5287678-1226014527953/WDR10-Full-Text.pdf
26. T. Wheeler and J. von Braun, “Climate change impacts on global food security,” Science, vol. 341, no. 6145, pp. 508–513, Aug. 2013.

G. BADR ET AL.

27. Y. Kang, S. Khan, and X. Ma, “Climate change impacts on crop yield, crop water productivity and food security—A review,” Progr. Natural Sci., vol. 19, no. 12, pp. 1665–1674, Dec. 2009.
28. J. W. Jones, G. Y. Tsuji, G. Hoogenboom, L. A. Hunt, P. K. Thornton, P. W. Wilkens, D. T. Imamura, W. T. Bowen, and U. Singh, “Decision support system for agrotechnology transfer: DSSAT v3,” in Understanding Options for Agricultural Production, G. Y. Tsuji, G. Hoogenboom, and P. K. Thornton, Eds. Dordrecht, The Netherlands: Springer-Verlag, 1998, pp. 157–177.
29. J. W. Jones, G. Hoogenboom, C. H. Porter, K. J. Boote, W. D. Batchelor, L. A. Hunt, P. W. Wilkens, U. Singh, A. J. Gijsman, and J. T. Ritchie, “The DSSAT cropping system model,” Eur. J. Agronomy, vol. 18, no. 3/4, pp. 235–265, Jan. 2003.
30. G. Hoogenboom, J. W. Jones, P. W. Wilkens, C. H. Porter, K. J. Boote, L. A. Hunt, U. Singh, J. I. Lizaso, J. W. White, O. Uryasev, R. Ogoshi, J. Koo, V. Shelia, and G. Y. Tsuji, “Decision Support System for Agrotechnology Transfer (DSSAT) Version 4.6.1 (www.DSSAT.net),” DSSAT Found., Prosser, Washington, DC, USA, 2015.
31. C. M. T. Soler, P. C. Sentelhas, and G. Hoogenboom, “Application of the CSM-CERES-Maize model for planting date evaluation and yield forecasting for maize grown off-season in a subtropical environment,” Eur. J. Agronomy, vol. 27, no. 2–4, pp. 165–177, Oct. 2007.
32. I. H. Jang and Y. C. Choe, “Forecasting rice productivity using a neural network method: A global warming scenario,” Adv. Sci. Technol. Lett., vol. 49, pp. 222–228, 2014.
33. Q. Luo, M. A. J. Williams, W. Bellotti, and B. Bryan, “Quantitative and visual assessments of climate change impacts on South Australian wheat production,” Agricultural Syst., vol. 77, no. 3, pp. 173–186, Sep. 2003.
34. J. F. Lawless, Statistics in Action: A Canadian Outlook. Boca Raton, FL, USA: CRC Press, 2014.
35. T. Huffman, B. Qian, R. De Jong, J. Liu, H. Wang, B. McConkey, and J. Yang, “Upscaling modelled crop yields to regional scale: A case study using DSSAT for spring wheat on the Canadian prairies,” Can. J. Soil Sci., vol. 95, no. 1, pp. 49–61, 2015.
36. United States Department of Agriculture National Agricultural Statistics Service (USDA NASS). [Online]. Available: http://quickstats.nass.usda.gov/
37. “Winter wheat varieties: Kansas performance tests,” Kansas: Kansas State Univ. Agricultural Exp. Station Coop. Extension Serv., Manhattan, KS, USA, 2014. [Online]. Available: http://www.bookstore.ksre.ksu.edu/pubs/SRP1108.pdf
38. D. G. Goodin, J. E. Mitchell, M. C. Knapp, and R. E. Bivens, “Climate and Weather Atlas of Kansas; an Introduction,” educational series 12, Kansas Geological Survey, Lawrence, Kansas, 1995, reprinted 2004. [Online]. Available: http://www.kgs.ku.edu/Publications/Bulletins/ED12/KGS_ED12.pdf
39. C. W. Thornthwaite, “Atlas of climate types of the United States 1900–1939,” U.S. Dept. Agriculture, Washington, DC, USA, Miscellaneous Bull. 421, 7p, 1941.
40. Kansas State University Mesonet. [Online]. Available: http://mesonet.k-state.edu/
41. Kansas Climate. [Online]. Available: http://www.k-state.edu/ksclimate/
42. L. A. Hunt and K. J. Boote, “Data for model operation, calibration and evaluation,” in Understanding Options for Agricultural Production, G. Tsuji, G. Hoogenboom, and P. Thornton, Eds. Dordrecht, The Netherlands: Kluwer/ICASA, 1998, pp. 9–40.
43. G. Hoogenboom, J. W. Jones, P. C. S. Traore, and K. J. Boote, “Experiments and data for model evaluation and application,” in Improving Soil Fertility Recommendations in Africa Using the Decision Support System for Agrotechnology Transfer (DSSAT), J. Kihara, D. Fatondji, J. W. Jones, G. Hoogenboom, R. Tabo, and A. Bationo, Eds. Dordrecht, The Netherlands: Springer-Verlag, 2012.
44. P. E. Thornton, M. M. Thornton, B. W. Mayer, N. Wilhelmi, Y. Wei, R. Devarakonda, and R. B. Cook, “Daymet: Daily surface weather data on a 1-km grid for North America, Version 2,” ORNL DAAC, Oak Ridge, TN, USA, accessed Jun. 15, 2015, Time period: 1981-01-01 to 2015-12-31. Spatial range: N = 37, S = 40, E = 94.55, W = 102.05, 2014, doi: http://dx.doi.org/10.3334/ORNLDAAC/1219
45. G. James, D. Witten, and T. Hastie, An Introduction to Statistical Learning: With Applications in R. Berlin, Germany: Springer-Verlag, 2014.
46. G. Feng, J. Masek, M. Schwaller, and F. Hall, “On the blending of the Landsat and MODIS surface reflectance: Predicting daily Landsat surface reflectance,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 8, pp. 2207–2218, Aug. 2006.
47. S. Grunwald, J. A. Thompson, and J. L. Boettinger, “Digital soil mapping and modeling at continental scales: Finding solutions for global issues,” Soil Sci. Soc. Amer. J., vol. 75, no. 4, pp. 1201–1213, 2011.
48. P. A. Sanchez, S. Ahamed, and F. Carre, “Digital soil map of the world,” Science, vol. 325, pp. 680–681, 2009.
49. T. Hengl, J. M. de Jesus, R. A. MacMillan, N. H. Batjes, G. Heuvelink, E. Ribeiro, A. Samuel-Rosa, B. Kempen, J. Leenaars, M. Walsh, and M. Gonzalez, “SoilGrids1km—Global soil information based on automated mapping,” PLoS ONE, vol. 9, no. 8, 2014, Art. no. e105992.
50. S. Lu, Y. Hwang, I. Khabibrakhmanov, F. Marianno, X. Shao, J. Zhang, B. Hodge, and H. Hamann, “Machine learning based multi-physical-model blending for enhancing renewable energy forecast—Improvement via situation dependent error correction,” in Proc. ECC, Jul. 15–17, 2015, pp. 283–290.
51. C. Löffler, J. Wei, T. Fast, J. Gogerty, S. Langton, M. Bergman, B. Merrill, and M. Cooper, “Classification of maize environments using crop simulation and geographic information systems,” Crop Sci., vol. 45, no. 5, pp. 1708–1716, 2005.
52. P. J. Gregory, J. S. I. Ingram, and M. Brklacich, “Climate change and food security,” Philosoph. Trans. Roy. Soc. London B, Biol. Sci., vol. 360, no. 1463, pp. 2139–2148, Nov. 2005.
53. B. C. Reed, J. F. Brown, D. VanderZee, T. R. Loveland, J. W. Merchant, and D. O. Ohlen, “Measuring phenological variability from satellite imagery,” J. Vegetation Sci., vol. 5, no. 5, pp. 703–714, 1994.
54. N. Pettorelli, The Normalized Difference Vegetation Index. London, U.K.: Oxford Univ. Press, 2013.
55. G. Badr, G. Hoogenboom, J. Davenport, and J. Smithyman, “Estimating growing season length using vegetation indices based on remote sensing: A case study for vineyards in Washington state,” Trans. ASABE, vol. 58, no. 3, pp. 551–564, 2015.
56. P. M. Atkinson, C. Jeganathan, J. Dash, and C. Atzberger, “Inter-comparison of four models for smoothing satellite sensor time-series data to estimate vegetation phenology,” Remote Sens. Environ., vol. 123, pp. 400–417, Aug. 2012.
57. A. Fischer, “A model for the seasonal variations of vegetation indices in coarse resolution data and its inversion to extract crop parameters,” Remote Sens. Environ., vol. 48, no. 2, pp. 220–230, May 1994.
58. X. Zhang, J. C. F. Hodges, C. B. Schaaf, M. A. Friedl, A. H. Strahler, and F. Gao, “Global vegetation phenology from AVHRR and MODIS data,” in Proc. IEEE IGARSS, 2001, vol. 5, pp. 2262–2264.
59. P. C. Doraiswamy, J. L. Hatfield, T. J. Jackson, B. Akhmedov, J. Prueger, and A. Stern, “Crop condition and yield simulations using Landsat and MODIS,” Remote Sens. Environ., vol. 92, no. 4, pp. 548–559, Sep. 2004.

Received December 8, 2015; accepted for publication January 16, 2016

Golnaz Badr Washington State University, Prosser, WA 99350 USA (golnaz.badr@wsu.edu). As part of the AgWeatherNet program, Ms. Badr is a Ph.D. candidate at the Washington State University Department of Biological Systems Engineering. Her academic background includes two M.Sc. degrees, one related to “GIS and Earth Observation for Environmental Modelling and Management” and the other in agricultural engineering. Her main research interests include the applications of geospatial technologies for precision agriculture and crop modelling. Via an IBM Ph.D. fellowship, Ms. Badr worked at the T. J. Watson Research Center in the summer of 2015.

Levente J. Klein IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (kleinl@us.ibm.com). Dr. Klein is a Research Staff Member in the Physical Sciences Department at the IBM T. J. Watson Research Center, Yorktown Heights, NY. Since joining IBM Research in 2006, he has developed technologies to enable energy-efficient cooling in data centers and to integrate physical models in real-world applications. His current research interests focus on geospatial data management, application of remote sensing for precision agriculture, and distributed sensing. Dr. Klein is a member of the American Physical Society (APS), the American Vacuum Society (AVS), and the New York Academy of Sciences.

Marcus Freitag IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (mfreitag@us.ibm.com). Dr. Freitag is a Research Staff Member in the Physical Analytics group at the IBM T. J. Watson Research Center. He received his Ph.D. degree in physics and astronomy in 2002 from the University of Pennsylvania. Dr. Freitag holds a “Diplom” in Physics from the University of Tübingen, Germany (1998), and an M.S. degree in physics from the University of Massachusetts (1996). After postdoctoral work for Carbon Nanotechnologies, Inc., he joined the Research Division of IBM in 2004. He has authored or co-authored more than 70 scientific papers with more than 6000 citations. Dr. Freitag has previously worked extensively on graphene and carbon nanotubes. He is currently interested in geospatial analytics, in particular for the field of agriculture.

Conrad M. Albrecht IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (cmalbrec@us.ibm.com). Dr. Albrecht is a research scientist in the Physical Sciences department at the IBM T. J. Watson Research Center. He received his Ph.D. degree in physics with an extra certification in computer science from Heidelberg University, Germany, in 2014. Dr. Albrecht’s research interests focus on interconnecting physical models and numerical analysis, employing big data technologies. As a member of the Physical Analytics group at the T. J. Watson Research Center, he works on big data processing for geospatial data analysis and machine-learning-driven remote sensing.

Fernando J. Marianno IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (fjmarian@us.ibm.com). Mr. Marianno is a Senior Software Engineer in the Science and Technology Department at the IBM T. J. Watson Research Center. He received his B.Sc. degree in computer science from the Methodist University of Piracicaba (UNIMEP) in 2001. He received an IBM Outstanding Technical Achievement Award for developing Measurement and Management Technology in 2013 and another Outstanding Technical Award in 2015 for his contributions to the corrosion monitoring on pSeries* and zSeries* platforms. He is author or co-author of 9 patents and 10 technical papers. Mr. Marianno is a member of IEEE (Institute of Electrical and Electronics Engineers) and AGU (American Geophysical Union).

Siyuan Lu IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (lus@us.ibm.com). Dr. Lu is in the Department of Physical Sciences at IBM Research. He received his Ph.D. degree in physics in 2006 from the University of Southern California. Before joining IBM Research, he was an assistant research professor jointly appointed in the Department of Physics and the Department of Ophthalmology at the University of Southern California. Dr. Lu’s current research interests at IBM include nanostructured sensors, sensor networks, and data-driven modeling of complex systems. In one focus area, he is developing a technology for accurate renewable energy forecasting, combining physics modeling with big data processing and deep machine learning capabilities. Dr. Lu has co-authored more than 20 peer-reviewed articles and has served as a journal reviewer and as a member of governmental committees.

Xiaoyan Shao IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (shaox@us.ibm.com). Dr. Shao is a Research Staff Member in the Physical Sciences Department at the IBM T. J. Watson Research Center. She received a B.S. degree in mechanical engineering from Tsinghua University in 1995 and M.S. and Ph.D. degrees in materials science and engineering from the Johns Hopkins University in 2002. She subsequently joined IBM at the T. J. Watson Research Center, where she has worked on hard disk drives, copper interconnects for microelectronics, silicon photovoltaic solar cells, magnetoresistive random-access memory (MRAM), and physical analytics. She is author or coauthor of more than 30 issued patents. Dr. Shao is a member of ECS (Electrochemical Society), MRS (Materials Research Society), and INFORMS (Institute for Operations Research and the Management Sciences).

Nigel Hinds IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (nhinds@us.ibm.com). Dr. Hinds is a Technical Staff Member at the IBM T. J. Watson Research Center. He received his Ph.D. and M.S. degrees in Computer Science from the University of Michigan. In the past, he has contributed to the National Aeronautics and Space Administration (NASA) Earth Observing System project as well as Linux open source projects. His current interests include operating systems, distributed resource management, and large-scale data analytics systems.

Gerrit Hoogenboom AgWeatherNet Program, Washington State University, Prosser, WA 99350 USA (gerrit.hoogenboom@wsu.edu). Dr. Hoogenboom is the Director of the AgWeatherNet Program and Professor of Agrometeorology at Washington State University. He has more than 25 years of experience in research, education, and outreach in agricultural and environmental engineering. He has specialized in the development and application of crop simulation models and decision support systems. He currently coordinates work on DSSAT. He has published more than 280 scientific papers in refereed journals as well as numerous book chapters and proceedings. He is an Editor for Agricultural Systems, Journal of Agricultural Science (Cambridge), Climate Research, and Scientia Agricola.

Hendrik F. Hamann IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA (hendrikh@us.ibm.com). Dr. Hamann is a Research Manager at the IBM T. J. Watson Research Center. He received his Ph.D. degree from the University of Goettingen in Germany. Since 2001, he has led the Physical Analytics program in IBM Research as a Research Manager. His expertise includes sensor networks, sensor-based physical modeling, oil and gas, renewable energy, precision agriculture, energy management, and system physics. He has authored or co-authored more than 70 scientific papers and holds over 85 patents. Dr. Hamann has served on governmental committees such as the National Academy of Sciences.
