WATER RESOURCES RESEARCH, VOL. 41, W03019, doi:10.1029/2003WR002901, 2005

Framework to evaluate the worth of hydraulic conductivity data for optimal groundwater resources management in ecologically sensitive areas Luc Feyen1 Department of Geological and Environmental Sciences, Stanford University, Stanford, California, USA Hydrogeology and Engineering Geology, Katholieke Universiteit Leuven, Leuven, Belgium

Steven M. Gorelick Department of Geological and Environmental Sciences, Stanford University, Stanford, California, USA Received 24 November 2003; revised 27 December 2004; accepted 13 January 2005; published 18 March 2005.

[1] We propose a framework that combines simulation optimization with Bayesian

decision analysis to evaluate the worth of hydraulic conductivity data for optimal groundwater resources management in ecologically sensitive areas. A stochastic simulation optimization management model is employed to plan regionally distributed groundwater pumping while preserving the hydroecological balance in wetland areas. Because predictions made by an aquifer model are uncertain, groundwater supply systems operate below maximum yield. Collecting data from the groundwater system can potentially reduce predictive uncertainty and increase safe water production. The price paid for improvement in water management is the cost of collecting the additional data. Efficient data collection using Bayesian decision analysis proceeds in three stages: (1) The prior analysis determines the optimal pumping scheme and profit from water sales on the basis of known information. (2) The preposterior analysis estimates the optimal measurement locations and evaluates whether each sequential measurement will be costeffective before it is taken. (3) The posterior analysis then revises the prior optimal pumping scheme and consequent profit, given the new information. Stochastic simulation optimization employing a multiple-realization approach is used to determine the optimal pumping scheme in each of the three stages. The cost of new data must not exceed the expected increase in benefit obtained in optimal groundwater exploitation. An example based on groundwater management practices in Florida aimed at wetland protection showed that the cost of data collection more than paid for itself by enabling a safe and reliable increase in production. Citation: Feyen, L., and S. M. Gorelick (2005), Framework to evaluate the worth of hydraulic conductivity data for optimal groundwater resources management in ecologically sensitive areas, Water Resour. Res., 41, W03019, doi:10.1029/2003WR002901.

1. Introduction [2] We address three interrelated concerns that underlie the development of an optimal groundwater resources plan in an ecologically sensitive area. First is maximizing profits from groundwater production while limiting the decline of water levels in wetland areas. The permitted water table decline can be dictated by governmental regulation, or can be self-imposed. Second is accounting for uncertainty in predicted water levels due to unknown spatial variability in hydraulic conductivity. Third is optimizing the data collection strategy so that new conductivity data both reduce uncertainty and provide the greatest economic benefit to the water manager of reliable regional groundwater production. 1 Now at Land Management Unit, Institute for Environment and Sustainability, European Commission, Ispra, Italy.

Copyright 2005 by the American Geophysical Union. 0043-1397/05/2003WR002901

[3] We adopt the perspective of a water manager whose goal is to improve local production in the most costeffective way, or to layout the optimal measurement design to optimally exploit new sources. Although the perspective of the water manager in this case is that of a profit maximizer, an alternative analogous formulation is that the water manager is a cost minimizer. The latter case might be applicable to a public agency that assumes the responsibility of providing a reliable least cost water supply while meeting environmental and regulatory constraints. The underlying concepts and approach presented here are the same whether the goal of water manager is to maximize profits or minimize costs. [4] The three interrelated concerns above are addressed using an optimal design, data collection, and a Bayesian decision-making framework. Combined groundwater simulation and optimization is the tool used for optimal design (for reviews, see Gorelick [1983, 1990], Yeh [1992], Ahlfeld and Heidari [1994], and Wagner [1995b]). Specifically, the design component employs stochastic simulation optimiza-

W03019

1 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

tion to plan regionally distributed groundwater pumping while preserving the hydroecological balance in wetland areas. A stochastic approach is required because predictions made by an aquifer model are imprecise. Consequently, any groundwater extraction scheme based on predicted water levels is uncertain. To guarantee that the constraints aimed at wetland protection are met when there is significant doubt about the predicted water level, the regional extraction system is forced to operate below its equivalent certainty level. The water manager must intentionally extract less water from the aquifer to ensure that wetland areas do not dry up. The maximum safe and reliable groundwater production rates will generally be lowest when model uncertainty is greatest. Here ‘‘safe’’ means that the damage to wetlands, as measured by violating water level constraints and caused by pumping to obtain a sufficient water supply, is relegated to a low-probability event by carefully designing an optimal pumping plan. [5] Here we consider uncertainty due to spatially variable hydraulic conductivity. To account for the spatial variability in the hydraulic conductivity, the multiple realization management model formulation is used [Gorelick, 1987; Wagner and Gorelick, 1989; Morgan et al., 1993]. This is a simulation optimization technique in which numerous realizations of spatially variable hydraulic conductivity are considered simultaneously in a single optimization formulation. In our case, the simulation model is nonlinear because it is used to represent unconfined aquifer behavior. Consequently, the stochastic optimization problem requires nonlinear programming (NLP). Here a gradient-based iterative method is employed to solve the NLP [Gill et al., 2002]. In some situations, perhaps problem nonlinearities can be ignored thereby enabling the use of a simpler and less computationally burdensome response matrix method that relies on linear programming (LP) to determine an optimal solution [e.g., Gorelick, 1983]. [6] Data collection is the second component of the framework. The water manager seeks a water production system that is most cost effective, but is faced with a system that is consistently forced to under perform to compensate for predictive uncertainty. A reduction in uncertainty and consequent increase in production can be accomplished by collecting data from the aquifer system to build a more reliable predictive model and a better simulation optimization model used for groundwater management. However, the expected increase in production and profit will not necessarily offset the cost of additional data collection. Therefore the primary obstacle to the design of optimal sequential measurement campaigns in conjunction with groundwater management lies in quantifying the economic worth of hydrogeological information. As such, it is important to focus on net economic benefit and not simply on the increase in production or the reduction in model uncertainty. Here net profit refers to the difference between the profit from water sales and the cost of collecting the data. [7] Risk-based decision making is the third component of the framework. We use Bayesian data worth analysis to find the best locations of sequential sets of hydraulic conductivity measurements that maximize the combined benefits of groundwater production and data collection. The two specific tasks involved are to determine where to take the next set of measurements and when to halt the data collection

W03019

campaign. Bayesian decision analysis relies on the concepts of the expected value of measurement information (EVMI), which is a measure of the expected worth of information, and the expected value of perfect information (EVPI), which serves as an upper bound on the worth of exploration. Data worth is determined by comparing the cost of data collection to the expected gain in profit to the water manager. The economic value of additional data is measured by the consequent improvement in optimal groundwater management operations. The process involves sequential data collection and stochastic groundwater management, continuing until net profit is maximized. Because data collection and groundwater management are both economic optimization problems that account for uncertainty, solving the joint data worth and planning problem is computationally intensive. For each possible sampling location in the field, the methodology involves 1) a Monte Carlo analysis to characterize the hydraulic conductivity distribution at the location, and 2) a stochastic simulation optimization for each likely value of the hydraulic conductivity at that location. [8] We apply the optimal design, data collection, and decision-making framework to a hypothetical example based on groundwater management in an area where wetlands must be protected. The example is realistic but of limited complexity, and the following assumptions were adopted. (1) We consider a system represented by a twodimensional, recharge dominated simulation model with steady state conditions. (2) We evaluate the worth of data in zones rather than at individual points, sample sets of hydraulic conductivity measurements, sample those conductivity sets sequentially, do not account for estimation error, and the support scale of individual measurements is local. (3) We account only for the uncertainty in spatially variable hydraulic conductivity. However, the modular nature and Monte Carlo design of the framework allow adaptation to more complex groundwater and transport problems. In principle, the multiple-realization management model and Bayesian decision analysis approach can account for any source of uncertainty. The main drawback is computational effort, which increases with model complexity and the number of uncertain variables. This is especially true when continuous variables like the hydraulic conductivity are involved because the assessment of the value of information at a certain location requires a stochastic optimization for each possible value the variable can take at the particular location.

2. Review of Previous Work [9] To date, several studies have inspected the worth of data in hydrological problems using statistical decision theory. The earliest contributions reported are by Davis and Dvoranchik [1971] and Davis et al. [1972], who evaluated the worth of additional streamflow data for the design of piles for bridge construction. Maddock [1973] combined decision theory, groundwater simulation and mathematical programming to plan and manage an irrigated farm and to evaluate the worth of hydrogeological and other information. Gates and Kisiel [1974] computed the worth of measurements of storativity, transmissivity, discharge and recharge for a numerical groundwater prediction model. The

2 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

first application of decision making to problems involving groundwater contamination was presented by Kaunas and Haimes [1985]. Among other early contributions of statistical decision theory to contamination problems are the works provided by Grosser and Goodman [1985], Ben-Zvi et al. [1988], and Reichard and Evans [1989]. [10] Most of the aforementioned studies predate the wide use of spatial stochastic theory in hydrological sciences and treated the hydrogeological parameters as regionally homogeneous but uncertain. Since the introduction of geostatistical concepts in aquifer simulation by Delhomme [1978], there have been several sampling design studies that identify the ‘‘best’’ sampling locations for additional measurements as the locations that provide the greatest reduction in parameter uncertainty or in model prediction uncertainty [e.g., Carrera et al., 1984; Rouhani, 1985; Loaiciga, 1989; Tucciarelli and Pinder, 1991; Wagner et al., 1992; Wagner, 1995a, 1999; Criminisi et al., 1997]. They quantified the worth of a measurement by the degree by which it increased the accuracy of predictions of heads or concentrations, rather than in monetary terms. We argue that data worth analyses are far more useful if they are based on an objective function that quantifies the economic benefits of reducing model prediction uncertainty. [11] The modern risk-based design approach that included geostatistical techniques to characterize spatial variability was introduced into the literature by Massmann and Freeze [1987a, 1987b]. The concepts were further developed in a series of papers by Freeze et al. [1990], Massmann et al. [1990], and Freeze et al. [1992]. Freeze et al. [1992] showed how to evaluate the worth of spatially correlated measurements of hydraulic conductivity in a heterogeneous aquifer when predicting contaminant travel time. James and Freeze [1993] developed a Bayesian decision-making framework to evaluate the worth of data for remedial containment of contaminants in an aquifer underlain by an aquitard of uncertain continuity. James and Gorelick [1994] evaluated the monetary worth of data needed to delineate a contaminant plume, and estimated the optimal stepwise number of samples in a data collection program. Bayesian decision theory is used in a wide range of disciplines to evaluate the worth of information. For a comprehensive review of value of information analysis applications, with a focus on health risk assessment, we refer the reader to the review paper by Yokota and Thompson [2004]. [12] We build on the works of Freeze et al. [1992] and Wagner and Gorelick [1989] and propose a combination of simulation optimization and decision analysis to evaluate the worth of sets of hydraulic conductivity data to determine the maximum profit of water production in ecologically sensitive areas, where minimum water levels must be maintained. The main contribution is the development of a more comprehensive linkage between data collection, model uncertainty analysis, and optimal water resources planning. To our knowledge this is the first time stochastic simulation optimization has been integrated into formal decision analysis for optimal data collection, thereby reducing both prediction uncertainty and the consequent suboptimally losses in production. By incorporating a multiple realization simulation optimization management model into Bayesian decision analysis, we identify a reliable optimal groundwater management plan that protects wetlands while

W03019

simultaneously determining the economic worth of sequential measurements of hydraulic conductivities in random conditional fields that are spatially correlated.

3. Wetland Problem [13] We present the optimal data collection and planning approach for groundwater production in a region where wetland health depends on the maintenance of the local water table at or near the land surface. This problem is motivated by water management challenges, such as those in Florida, where the supply of groundwater is regulated to maintain the integrity of isolated lakes and palustrine wetlands. Southwest Florida relies on groundwater for approximately 86% of its 4.9 billion liters per day supply [Southwest Florida Waste Management District (SWFWMD), 2003]. The Floridian aquifer, which provides over 80% of the groundwater in the region, has been pumped so heavily that declining water levels and severe water resources problems occurred in the north Tampa Bay area in the mid-1990s. In 1996, the Florida legislature directed the Florida Department of Environmental Quality to establish minimum water levels to ensure that significant harm is not caused to the water resources and the environment. Responding to the statute, the Southwest Florida Water Management District, one of five regional authorities in the state, proposed specific minimum water levels in 1998 amidst significant controversy. After review by an independent panel of experts, these minimum water levels were established to protect ecologically sensitive habitats under threat due to groundwater declines from excessive pumping. The minimum water levels currently constitute the single most significant restriction on groundwater exploitation in the region [SWFWMD, 1999; Florida Statutes, Title XXVIII, Natural resources; conservation reclamation and use, chap. 373, Water Resources, State Water Resources Plan, 373.042 Minimum flows and levels, 2000]. Similar regulations have been established in other states, such as Kansas, where 90% of all irrigation water is derived from groundwater and minimum base flow must be maintained (Kansas Water Appropriations Act, K.S.A.82a – 703b, 1984). [14] Our illustrative example involves pumping to obtain a water supply while maintaining a high water table for wetland protection. We use topography obtained from a Digital Elevation Map of a wetland area near Dade City in southwest Florida, but this is intended to be generic, not adhering to the situation in that particular area. The example system is shown in Figure 1. The unconfined aquifer is approximately 150 m thick and of lateral dimensions 2.5 km by 2.5 km. Heads are simulated using finite differences based on discretization of the area into 50 by 50 m grid cells. [15] To limit the influence of the far-field boundary conditions on the region of concern, it is surrounded by a 5.6 km buffer zone, or termination strip, consisting of 10 additional rows and columns with successively increasing dimensions toward the exterior margins. Groundwater inflow of 2500 m3/d across the entire boundary of the domain occurs along the flux boundary at the top of Figure 1 and exits under conditions of no pumping along the constant head boundary fixed at 21.0 m at the bottom margin of Figure 1. Under natural conditions, the hydraulic

3 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

W03019

domain to reflect site-specific habitat susceptibility, or transient to allow for temporal changes in water needs of wetlands. Prior to pumping the maximum depth to water in the wetlands is 0.3 m. Alternatively, an equivalent formulation could have restricted heads or drawdowns, rather than the readily measured depth to the water table. [17] When the water table falls below the extinction depth of 2 m, the ET condition changes from a linear decline with depth to a constant value of zero. This is a discontinuity and could pose problems when using a gradient-based optimization method. However, the maximum allowable depth to the water table at the constraint locations is 0.5 m, which is far from the extinction depth of 2 m. As a result, the locations where the water table can fall below the extinction depth will be distant from the constraint locations. Hence discontinuities in the response of the depth to the water table to pumping did not occur at the constraint locations and the gradientbased optimization method successfully identified solutions in all cases. Figure 1. Example aquifer system represented as a spatially variable hydraulic conductivity field (surrounding region not to scale). gradient is approximately 104. Along the left and right sides of the domain, zero-gradient boundaries are specified at the outer edges of the buffer zone. Uniform precipitation of 1500 mm/year is applied to the system. Evapotranspiration is calculated with ETmax of 1800 mm/year and an extinction depth of 2 m. This results in a net recharge of approximately 300 mm/year. The hydraulic conductivity of the domain is represented as a realization of a second-order stationary Gaussian random field, where Y = log K is the log hydraulic conductivity. The field is characterized by a mean log value of mY = 1.6 (a geometric mean K of 40 m day1), an isotropic, exponential covariance function with log variance s2Y = 1.0, and a correlation length lY = 200 m. For the surrounding buffer zone the hydraulic conductivity is set to the geometric mean conductivity.

4.2. Optimal Groundwater Management Under Uncertainty [18] The primary cause of prediction uncertainty in the above case is unknown hydraulic conductivity, which is spatially variable. On the basis of both local data at the supply wells and data from similar environments, we assume that the spatial statistics of the hydraulic conductivity field can be estimated. Given these statistics, a suite of equally likely realizations conditioned on the conductivity values at the existing supply wells can be generated and represents the likely range of values. Optimal groundwater planning can account for uncertainty in conductivity using a stochastic simulation optimization method known as the multiple-realization or stacking approach [Gorelick, 1987; Wagner and Gorelick, 1989; Chan, 1993]. In the approach, a large number of equally likely realizations are included in the optimization formulation and a large simulation optimization, involving simultaneous simulations based on each conductivity realization, is solved. The realizations in the stack are equally likely, hence a uniform probability distribution characterizes the conductivity realizations. If 100

4. Groundwater Management Under Uncertainty 4.1. Formulation of the Groundwater Management Problem [16] We consider a water management authority that is responsible for providing a supply from groundwater extracted at four pumping centers that it owns and operates. The objective of the manager is to maximize total production, and consequent profits, by determining a single selection of pumping well rates from its existing well fields. Regulations require that minimum water levels must be maintained in ecologically sensitive areas. Figure 2 shows the location of the 4 pumping centers P1, . . ., P4. At the 422 control locations, shown as circles in Figure 2, minimum level constraints are imposed. The gray scale in Figure 2 indicates the depths to the water table under natural conditions. Water depths under conditions of no pumping are between 0.1 and 7.5 m. At the control locations, water levels, expressed as a maximum depth to water of 0.5 m, must be maintained under pumping conditions. We note that the bound can easily be made variable throughout the

Figure 2. Location of the pumping well centers (P1, P2, P3, and P4) and the control locations (circles) shown on an overlay of the regional depth to the water table.

4 of 13

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

W03019

realizations are considered, then the constraint set will consist of a stack involving 100 different simulation models. Groundwater production, or the consequent profit obtained from water sales, is maximized while constraints are obeyed in each and every simulation model based on the many hydraulic conductivity realizations in the stack. One unique set of pumping rates is determined regardless of the number of realizations in the stack. [19] Because the statistics of the stochastic model for the hydraulic conductivity are assumed known, the equally likely conductivity realizations in the stack are characterized by identical values for the mean, variance, and correlation length. Stochastic optimization employing the multiplerealization management modeling approach could be extended to account for uncertainty in the statistical parameters that govern the hydraulic conductivity distribution. The unknown statistical parameters are then treated as random quantities. Bayesian principles [e.g., Kitanidis, 1986; Feyen et al., 2002] or spatial bootstrapping [e.g., Journel, 1993] then can be applied to infer the probability distributions of the parameters. From the latter, a Monte Carlo method is used to sample combinations of statistical parameter values, which are used to generate equally likely conductivity realizations. Hence the statistics characterizing the realizations in the stack span the likely range of values, as described by the joint probability distribution of the underlying statistical parameters. The result is that the variability among the realizations in the stack not only reflects the uncertainty due to the spatial variability in conductivity, but also the additional uncertainty that stems from imperfect knowledge about the parameters of the stochastic model. Because in this work we focus on the worth of data, rather than on parameter estimation, we have assumed that the statistics are known and fixed when generating hydraulic conductivity realizations employed in the stack. However, we do condition the realizations to measured conductivity values. [20] The mathematical formulation of a multiple-realization management model for maximizing groundwater production is maximize et q

ð1Þ

subject to Realization 1 f ðqÞ ¼ e di ðK1 Þ di*

8i

ð2aÞ

Realization 2 f ðqÞ ¼ e di ðK2 Þ di* .. .

8i

ð2bÞ

Realization j f ðqÞ ¼ e di Kj di* .. .

8i

ð2cÞ

Realization ssz f ðq Þ ¼ e di ðKssz Þ di* q0

at pumping wells

8i

ð2dÞ ð3Þ

W03019

where e is a column vector of ones, t denotes the transpose operator, q is a vector of the pumping rates at the individual wells in m3/d, Kj represents the jth realization of the hydraulic conductivity field, e d i is the spatially variable depth to the water table at control location i, d*i is the maximum allowable depth to the water table at location i, and ssz is the stack size, which is the number of hydraulic conductivity realizations included in the stochastic management model. For any hydraulic conductivity realization, Kj, and a particular set of pumping rates, q, the head distribution is calculated by solving a groundwater flow model, f(q). The multiple realization optimization simulation management model was solved using the SNOPT optimization code [Gill et al., 2002]. MODFLOW-2000 [Harbaugh et al., 2000] was used to solve the groundwater flow model and was coded as an external subroutine that was called for each function evaluation in the optimization. [21] In general, the optimal pumping strategy will not be dictated by one single ‘‘worst case’’ realization. Rather, there can be parts of one realization that dictate pumping in one region and parts of another realization that dictate pumping in another region, such that the optimal solution will meet each constraint in every part of every realization in the stack. The stacking approach addresses uncertainty by determining an optimal pumping scheme that is necessarily overdesigned to guarantee success in meeting the constraints even when the underlying model predictions are in doubt. 4.3. Variability and Reliability of the Stochastic Optimization Simulation Results [22] The ensemble of realizations in a stack of finite size is a sample of the variability in hydraulic conductivity. Different stacks of the same size contain different realizations, thus constitute different samples of the range of variability. Using the multiple realization approach, optimal solutions based on stacks of the same size but consisting of different sets of realizations can be different. To evaluate the variability of the stochastic optimization simulation results, the optimal pumping scheme was determined for 300 different stacks, each with a stack size of 100 realizations, where each stack contained a unique set of realizations. Optimal total groundwater production varied between 8.69 103 m3/d and 10.37 103 m3/d, with an average production of 9.66 103 m3/d and a standard deviation of 0.25 103 m3/d. Thus the effect on the optimal solution of considering different realizations, as expressed by the relative magnitude of the standard deviation, was 2.6%. [23] The water manager must select a reliability (or range of reliabilities to be considered) that the depth to water constraints will be met. The reliability of the simulation optimization solutions can be determined a posteriori, by conducting a set of postoptimality Monte Carlo simulations. A statistical measure of the success or failure of an optimal pumping scheme is determined by evaluating its performance when applied to a series of test realizations. For a given test realization, the optimal solution is successful when none of the constraints for the test realization are violated. The reliability equals the fraction of successes out of the total set of Monte Carlo runs, i.e., the reliability is equal to one minus the probability of failure. Given the variability observed in the simulation optimization results, it is obvious that the reliability of the optimal pumping

5 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

schemes will also vary among different stacks of the same size that are represented in the formulation. [24] Some works have attempted to develop relations for the multiple-realization methods that prespecify reliability [e.g., Chan, 1993; Feyen and Gorelick, 2004], such that once the reliability is determined, the stack size can be readily estimated. Chan [1993] showed that, based on order statistics, the system reliability for a stack of n realizations is approximately n/(n + 1). He also developed an argument based on Bayesian principles in which the reliability is (n + 1)/(n + 2). Feyen and Gorelick [2004] proposed a new measure to predict the expected reliability of meeting water level constraints in wetland areas as a function of the number of conductivity realizations included in the stochastic optimization formulation and the variance of log K. It is important to note that these formulas provide a way to estimate the reliability of the optimal groundwater pumping plan without conducting postoptimization Monte Carlo analysis. In the current work, this is a technical detail that relates to assessing the likely viability of the optimal groundwater management scheme. It is not part of the Bayesian decision analysis aimed at determining the best measurement network. [25] We determined the reliabilities of each of the 300 optimal pumping strategies by counting the number of successes and failures when each of the optimal pumping schemes was applied to sets of 250 test realizations. In each determination of reliability, a different unique set of 250 test realizations was used. This resulted in 300 reliability values, ranging from 0.91 to 1.00, with an average reliability of 0.965 and a standard deviation of 0.018. Had a higher reliability been required, then the stack size necessarily would have been larger. For example, an average reliability of 0.98 would require a stack size of 200 realizations [Feyen and Gorelick, 2004]. [26] The results of the 300 multiple-realization optimizations and the reliability tests reveal that the optimal pumping schemes and the corresponding reliability may show variability that can not be neglected. Larger stacks of realizations sample a wider range of the uncertainty, and it is to be expected that the variability within the optimization results will decrease with increasing stack size. In the limit, when all possible realizations are stacked, the optimal pumping scheme should be unique and the reliability of the solution should equal one, with no variability. However, large stack sizes, e.g., >1000 realizations, may be computationally infeasible. Also, the required reliability may not justify such large stack sizes. Instead, one can perform the optimization analysis for a number of identical formulations, each with a chosen stack size, and accept the average reliability as a good estimate. This is the approach that we adopt in the worth of data analysis described below. A stack size of 100 realizations was used, hence the average reliability of the optimal pumping schemes is taken as 0.965. As shown later, the reliability of 0.965 was maintained in the final solution after data collection. This value was also determined using postoptimality Monte Carlo analysis.

5. Worth of Data [27] A central thread of our approach is the evaluation of the worth of data. The underlying precept is that the water

W03019

manager is a profit maximizer. As such, data collection efforts are only worthwhile if the expected value of each additional measurement at least pays for itself by reliably increasing groundwater production and subsequent profits. In this section, we first formulate the problem of optimal data collection mathematically. Then, we give an overview of the Bayesian decision analysis used to optimize the sampling strategy. The rest of the section details the different steps of the Bayesian decision analysis applied to the wetland problem. 5.1. Mathematical Formulation [28] The mathematical formulation for the present value worth of data problem is ð4aÞ

maximize Pnet

with Pnet ¼ Ps Ctot meas

ð4bÞ

¼ Rs Ccap þ Cop Ctot meas

ð4cÞ

T X

h i 1 unit unit ¼ prw :q:t Ccap þ Cq :q:t ð1 þ d Þt t¼1 unit ð4dÞ Cmeas : mKmeas

subject to Realization 1 f ðqÞ ¼ e di ðK1 Þ di*

8i

ð5aÞ

Realization 2 f ðq Þ ¼ e di ðK2 Þ di* .. .

8i

ð5bÞ

Realization j f ðqÞ ¼ e di Kj di* .. .

8i

ð5cÞ

Realization ssz f ðqÞ ¼ e di ðKssz Þ di*

q0

8i

ð5dÞ

ð6Þ

at pumping wells, where Pnet net profit ($); Ps profit from water sales ($); Rs revenue from water sales ($); total costs of taking hydraulic conductivity meaCtot meas surements ($); unit cost of hydraulic conductivity measurement Cunit meas ($/measurement);

6 of 13

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

W03019

W03019

table is small, single well, short-duration pumping tests provide local values of conductivity representative of a 2500 m2 footprint. In thinner aquifers slug tests might yield appropriate values. Collecting several measurements at once is common practice because it reduces fixed mobilization charges.

Figure 3. Schematic overview of Bayesian decision analysis to evaluate the worth of data. Ccap amortized capital and replacement cost (installation of the production well field) ($); Cop annual operating costs of the well field ($); d discount rate (dimensionless); annual sales price per unit water ($/m3); prunit w annual cost per unit pumping ($/m3); Cunit q K mmeas number of hydraulic conductivity campaigns (dimensionless); t time (years). The production well field is in place and operational prior to collecting the additional data determined in this analysis. The fixed cost of aquifer test well installation is contained in the unit cost of measurements. The amortized capital cost of the production well fields is included in the annual e unit groundwater production cost, or (Ccap + Cunit q .q.t) = Cq . unit e q.t, where Cq is the annual cost per unit pumping ($/m3) that includes the amortized capital costs. Equation (4d) then becomes Pnet ¼

T X t¼1

h i 1 unit e unit :q:t Cunit :mK : pr C w meas meas q t ð1 þ d Þ ð7Þ

We consider the case in which there is a negotiated fixed sales contract period, T, of five years. Therefore total net profit is a function of net annual profit per unit of water ($/m3), and the total cost of making m sets production Punit s of aquifer measurements. As such, equation (7) is unit K Pnet ¼ Punit s :q:T Cmeas : mmeas :

ð8Þ

As long as new hydraulic conductivity measurements reduce model prediction uncertainty, and enhanced model reliability enables a greater Pnet, those measurements are valuable. This is true when the increase in profit from water sales Ps = Punit s .q.T outweighs the increase in costs unit K Ctot meas = Cmeas.mmeas of taking additional measurements. The best measurement to take next in a sequence of measurements is the one that generates the greatest expected net profit. We consider campaigns to collect hydraulic conductivity data that are taken in sets of five local measurements within an area of 250,000 m2 (500 m by 500 m). In our case, in which the depth to the water

5.2. Evaluation of the Value of Information: Overview of Methodology [29] Bayesian decision analysis consists of three components and is schematically outlined in Figure 3. The prior analysis determines the profit from water sales based on prior information. The preposterior analysis estimates the optimal measurement locations and evaluates whether each sequential measurement will be cost-effective before it is taken. The posterior analysis then revises the prior optimal pumping scheme and consequent profit, given the new information. Because the mean, variance and correlation length of the hydraulic conductivity field are fixed, only the ‘‘K map’’ but not the spatial statistics of K are updated when each of the handful of new measurements becomes available. In this case, formal Bayesian updating of the hydraulic conductivity field as by Kitanidis [1986] and Feyen et al. [2002] reduces to conditional simulation to the enlarged data set of measured conductivity values. As such, conditional realizations generated in the prior, preposterior, and posterior analysis are characterized by identical spatial statistics. Only the number of conditioning data that are honored by the stochastic simulations differs. [30] The core component in determining the worth of data is the preposterior analysis. In that part, the likely worth of additional information is estimated and the best measurement points are identified. Even though the values at the proposed measurement locations remain unknown, such measurements will reduce the uncertainty, which allows the expected increase in the objective function that includes the proposed measurements to be estimated. The preposterior analysis is based on estimation of the expected profit increase due to taking additional measurements. [31] The expected value of perfect information (EVPI) assumes a perfect sampling design that reduces uncertainty to zero and hence yields the truly optimal groundwater extraction rates and maximum profit from water sales. The value of perfect information (VPI) or regret is the difference in profit from water sales Ps between the prior analysis and the perfect optimal solution when the hydraulic conductivity field is sampled exhaustively and precisely. The VPI or regret is a measure of the loss in profit from water sales due to the unknown spatial variability in hydraulic conductivity. Because the true hydraulic conductivity field is unknown, the regret cannot be calculated. However, for each equally likely representation of the hydraulic conductivity field, the regret or VPI can be calculated. The average of the calculated regrets over the set of equally like realizations of K is an estimate of the VPI, hence is called the expected VPI, or EVPI. The EVPI is an estimate of the expected loss in profit due to the unknown spatial variability in hydraulic conductivity. It is the maximum improvement that could be expected, and is an upper bound on the worth of exploration or data collection. [32] The expected value of measurement information (EVMI) reflects the expected value of an imperfect sampling round, which does not reduce uncertainty to zero. It is

7 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

an estimate of the monetary worth of the information, and allows one to determine whether sampling is valuable. A sampling program is cost-effective only if it costs less than the EVMI. The value of measurement information (VMI) is the increase in the profit from water sales Ps between the prior analysis and the posterior analysis that includes the sample information. Since the VMI remains unknown until the sample is actually taken, it can only be estimated. The EVMI is then equal to the expected increase in Ps between the prior analysis and the posterior analysis due to taking the proposed additional measurements Y(b). Formally, the EVMI at location b can be written as

EVMIðbÞ ¼ Pprior E fPs jY ðbÞg ; s

ð9Þ

where Pprior is the profit from water sales prior to collecting s conductivity measurements and E{PsjY(b)} is the expected profit from water sales given sample Y(b). On the basis of spatial stochastic theory, equation (9) can be expanded into EVMIðbÞ ¼ jPprior ½Ps ðY ðbÞ ¼ y1 Þ: PrðY ðbÞ ¼ y1 Þ þ . . . s þ Ps ðY ðbÞ ¼ yn Þ: PrðY ðbÞ ¼ yn Þj;

ð10Þ

where Ps(Y(b) = yi) is the profit from water sales when Y(b) = yi, Pr(Y(b) = yi) is the probability of sampling yi at location b, and n is the number of hydraulic conductivity values statistically sampled to characterize the distribution of log K at location b [James and Freeze, 1993]. The EVMI of a location thus equals the average increase in profit from groundwater sales over the possible outcomes of the hydraulic conductivity at that location, weighted by the probability of the respective outcomes. The n possible outcomes yi at location b are samples from N(mb,krig, s2b,krig), where mb,krig and s2b,krig are the kriged estimate and estimation variance, respectively, at location b. Values of hydraulic conductivity are sampled by generating conditional realizations and selecting the simulated values at the locations of interest. Although there may be many locations where a sample is cost-effective, a measurement will be taken only at the location with the highest EVMI. That is the most beneficial data to a water manager interested in reducing uncertainty to maximize production and profits. [33] It is important to note that the actual increase (postmeasurement) in profit from water sales can differ substantially from the EVMI, and the actual profit can increase or decrease when a measurement is taken. This is because the EVMI is an expectation; it is the average increase in profit from water sales, with the average over the possible values of the hydraulic conductivity at that location, whereas the actual increase in profit depends only on the true hydraulic conductivity value at the location. Thus, even when the likely increase in profit from water sales outweighs the cost of sampling, the increase in net profit, which is unknown until the sample is actually taken, can turn out to be negative. 5.3. Evaluation of the Value of Information: Application to the Wetland Problem 5.3.1. How Much Should the Manager Spend on Data Collection? [34] In each of the three stages of the data worth analysis, the optimal pumping rates are determined using the multi-

W03019

ple-realization approach given by equations (1) – (3). In the prior and posterior analyses, 100 stacks (snr = 100) with a stack size of 100 realizations (ssz = 100) each are included in the simulation optimization formulation and the snr = 100 optimization results are averaged. Both the prior and posterior analysis require simulation of 10,000 ‘‘aquifers’’, each with a unique hydraulic conductivity field. As shown later, the preposterior analysis requires simulating the same number of ‘‘aquifers’’ for each potential sampling location. [35] In our case, the hydraulic conductivity values at the pumping locations are known a priori, and therefore included in the prior analysis by generating realizations conditioned on the values at those locations. In the posterior analysis, the hydraulic conductivity fields will be conditioned on both the values at the pumping locations and on the newly sampled data. Determining each simulation optimization solution involves a search for the best pumping rates when considering 100 simulated aquifers in the constraint set (5), and solution of the nonlinear optimization typically requires 50 to 60 iterations. For the given formulation, the average computation time for solving a single simulation optimization management model with a stack size of 100 realizations (ssz = 100) is 45 minutes on a 2.6 GHz machine. [36] The prior analysis begins with the computation of the EVPI or average regret. The value of EVPI, expressed as a pumping rate, is equal to the difference in the average total optimal pumping rates when stacking 1 versus 100 realizations. Note that these two values are each based on taking an average of 100 simulation optimization solutions (snr = 100). For ssz = 1, average optimal production equaled 11.83 103 m3/d. For ssz = 100, average optimal production was 9.66 103 m3/d, which corresponds to the average production of the 300 multiple realization simulation optimization solutions described in Section 4.3. The computation of EVPI is simply (11.83 9.66) 103 m3/d = 2.17 103 m3/d. This is the loss in groundwater production due to unknown spatial variability in hydraulic conductivity. As such, the loss in groundwater production is 18% of the estimated potential groundwater extraction, or the system is operating at 82% of its estimated potential capacity. The profit to the water manager is 0.4 $/m3 based on the 5-year sales contract period with negotiated fixed pricing. In monetary terms, the EVPI then equals (2.17 103 m3/d) (5 365 days) (0.4 $/m3) = $1,584,100. This is the maximum budget that should be spent on data collection. It reflects the greatest potential unrealized profit due to initially unknown spatial variability of hydraulic conductivity when no additional measurements are taken. 5.3.2. Measurement Layout and Cost of Sampling [37] For the data worth analysis, we divide the region into 25 blocks, each occupying a region of 500 m by 500 m. The blocks are numbered from 1 to 25, starting with 1 in the bottom left corner of the domain, cycling by columns first, and ending in the upper right corner with 25. We look at sets of 5 measurements, spaced on a regular ‘‘5-spot’’ pattern within each block. We are interested in efficiently evaluating the worth of information in zones within our model domain, while recognizing fine-scale spatial variability and uncertainty in hydraulic conductivity. That is, each single block considered for economic analysis contains 100 local and spatially variable values of hydraulic conductivity, 5 of

8 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

Table 1. Relation Between the EVMI, Expressed in Terms of an Increase in Pumping Rate or Increase in Profit, and the Number of Measurement Campaigns That Determine Five Local Hydraulic Conductivity Values Within a Region Occupying 250,000 m2 EVMI, m3/d

EVMI, $

Number of Measurement Campaigns

68 137 274 411 548 685

50,000 100,000 200,000 300,000 400,000 500,000

0.5 1 2 3 4 5

which are (potentially) measured in a data collection campaign. Taking multiple measurements in each campaign reduces the price per conductivity measurement, and reduces the dimensionality of the problem, making it computationally tractable. The cost of the aquifer tests to determine 5 local hydraulic conductivity measurements within a block is $100,000. From Table 1 it can be seen that this value corresponds to an increase in profit generated by an increase in groundwater production rate of 137 m3/d. Hence sampling is cost effective if the measurements generate an EVMI of more than $100,000, which is equivalent to an increase in optimal pumping rate greater than 137 m3/d. 5.3.3. Evaluation of the Value of Information: EVMI Calculation and Convergence of Results [38] The aim of the preposterior analysis is to calculate the expected value of potential measurements or EVMI. For each 500 m by 500 m block b, the EVMI is obtained by evaluating equation (10). The sets of log hydraulic conductivity values yi(b) = (ybi,1,. . ., ybi,5), with i = 1, . . ., n, are obtained from different conditional realizations. For each set of values yi(b), snr simulation optimization problems each containing 100 conditional realizations (ssz = 100) are solved. Hence such an analysis results in snr estimates of the VMI or regret for each of the n sampled values yi(b). Averaging both over the snr estimates of the VMI per sample and the n samples yi(b) results in the EVMI. [39] Practically, there are computational limits on n, the number of values sampled from N(mb,krig, s2b,krig) to represent the variability in hydraulic conductivity at each location, and on snr, the number of simulation optimization problems solved for each of those statistical samples of yi(b) drawn from N(mb,krig, s2b,krig). In section 4.3 it was shown that the standard deviation of the optimal solution when including different realizations in the stack (stack size = 100) was 2.6%. A more significant source of variability in the optimal solutions has to do with n, which determines how accurately the variability of log K at a potential measurement location is characterized. The simulation optimization solutions for different log K values at a particular block in which an additional measurement might be taken depends not only on the variability of K in that block, but also on both the actual location of the block and the magnitude of the log K values. For n = 100, and considering a single stacking optimization (snr = 1) of 100 realizations (ssz = 100) for

W03019

each of the n samples yi(b), the standard deviation of the simulation optimization solutions for the 25 blocks varied between 2.9% and 4.1%. This means that the consideration of different samples of yi(b) at a given block b better accounts for a greater source of variability in the optimization solutions than consideration of more stacks (snr 1) for each of the n samples yi(b). Therefore we chose to set snr = 1, and increased n until the EVMI at all blocks converged, i.e., for each sampled set of log K values yi(b) at block b only one simulation optimization was evaluated. As discussed next, convergence occurred when n = 100. This implies that the VMI or regret obtained for measurement set yi(b) is a sample of the possible simulation optimization solutions at that block. We do not average the VMI of many alternative simulation optimization solutions for each possible outcome yi(b) and then sample a small number of log K sets from N(mb,krig, s2b,krig). Rather we take only one stack evaluated for each sample yi(b), and the VMI is averaged over a large number of sampled log K sets at that block, b. Moreover, when n increases, the probability of sampling similar yi(b) values increases, hence the effect of averaging over different stacks for one sample yi(b) is accounted for when n becomes large. [40] For the first sampling round, the fluctuation of the EVMI versus the number of samples n drawn from N(mb,krig, s2b,krig) is shown in Figure 4. The number of log hydraulic

Figure 4. Fluctuation of EVMI versus the number of values sampled n from N(mb,krig, s2b,krig ): (a) jEVMIi EVMI100j/EVMI100 for the 25 blocks; (b) average jEVMIi EVMI100j/EVMI100.

9 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

Figure 5. Plot of EVMI ($): (a) first sampling round; (b) second sampling round. conductivity sets sampled per block is shown along the x axis, with the absolute deviation relative to the EVMI for n = 100 at each block shown along the y axis. Figure 4a shows jEVMIi EVMI100j/EVMI100 for each block. The average deviation for the 25 blocks is presented in Figure 4b. The values of EVMIi and EVMI100 correspond to the EVMI, or average VMI, after drawing i and 100 samples from N(mb,krig, s2b,krig), respectively. The fluctuation in EVMI for the individual blocks varied widely for small values of n, and converged when n increased. For nearly all blocks, the fluctuation of the EVMI stabilized for n > 75. On average, the absolute deviation declined with increasing n, with the decline initially steepest, and leveling off for higher values of n. The average absolute deviation was 10% for n = 75 and dropped below 5% for n > 90. This analysis shows that for n = 100 the EVMI values were sufficiently stable to allow selecting the blocks with the highest EVMI. Consequently, the preposterior analysis required 100 stochastic optimizations with a stack size of

W03019

100 realizations each, or the equivalent of simulating 10,000 aquifers, for each potential sampling location. 5.3.4. Selection of the Blocks at Which Hydraulic Conductivity Should Be Sampled [41] The EVMI100 values obtained in the preposterior analysis of the first sampling round are displayed in the shaded map in Figure 5a. Comparison of the results with the EVMI threshold of $100,000 shows that several blocks had an EVMI larger than the cost of sampling, hence were worth sampling. The highest EVMI values were observed in the vicinity of, but not necessarily in, the blocks in which the pumping locations are positioned. Thus, even though the hydraulic conductivity values at the pumping locations were known, which implies the kriging variance was smallest nearby, it was still most cost effective to sample the blocks nearby the pumping wells. [42] Block 20, located northeast of P2, had by far the highest EVMI. It was expected that the set of 5 conductivity measurements in this block would result in an optimal pumping rate of 10.02 103 m3/d. This corresponds to an increase in pumping rate of 355 m3/d, or an increase in profit of $259,150. However, the actual increase in pumping rate can differ substantially from the EVMI calculated for the block. For example, the actual optimal pumping rate obtained in the posterior analysis after sampling block 20 was 9.81 103 m3/d, which corresponds to an increase in pumping rate of 150 m3/d, or an increase in profit of $109,500, which is less than half the expected increase estimated using the EVMI. [43] The shaded EVMI map for the preposterior analysis of the second sampling round is given in Figure 5b. Block 20 is marked by a white cross to indicate that it is sampled in the first round and is no longer considered as a potential sampling block in further sampling rounds. Comparing the EVMI maps for both sampling rounds shows that on average the EVMI values have dropped and that the pattern of EVMI has changed by sampling block 20. For the second sampling round, the highest EVMI value was obtained for block 15, which is the block east of the block in which P2 is located. The fact that this block did not have the second highest EVMI in the preposterior analysis of the first sampling round indicates that a sampling campaign based on the first EVMI map only is not optimal. Sequential sampling and updated conditioning is an important aspect of developing the most cost-effective measurement campaign. Procedurally, the EVMI map must be recalculated after each set of measurements has been taken. [44] Optimal sequential measurement campaigns should continue until the highest EVMI among the remaining blocks is lower than the cost of sampling. In this case, a total of five blocks were sampled. These were the 4 blocks surrounding P2 (blocks 14, 15, 19 and 20) and the block in which P3 is located (block 11). In the 6th sampling round, the highest EVMI was observed in block 12, located east of P3. However, the expected payoff, as reflected in the EVMI of $48,910 at this location, did not justify engaging in another round of measurements. 5.3.5. Cost-Benefit Analysis Sequential Measurement Campaigns [45] The costs and benefits of the different sampling rounds are plotted in Figure 6. Figure 6 (top) shows the

10 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

Figure 6. (top) Cumulative cost and (bottom) profit of m sequential measurement campaigns involving collecting sets of hydraulic conductivity data while maximizing profits from production under the constraint that water levels cannot be lowered excessively in wetland areas and that the optimal management plan is 0.965 reliable. Note that costs are negative values in the objective function. cumulative sampling costs as a function of the number of measurement rounds, with the costs shown along the y axis, whose range is inverted. Figure 6 (bottom) shows the profit from water sales and net profit to the water manager with each round of measurements. The profit from water sales does not include measurement costs and is a monotonically increasing function of the number of measurements taken. The water manager aims to maximize the net profit, which is obtained by subtracting the sampling costs from the profit from water sales. The results of the second measurement round show that, even though the expected increase in profit from water sales outweighs the cost of sampling, the increase in net profit of a particular campaign may turn out to be negative. This is because the EVMI is merely an expected value of the increase in profit from water sales. The actual increase in profit, which is only known after each sequential set of measurements is taken, can differ substantially from the expectation. For the second sampling round, the actual increase in profit from water sales was considerably lower than the expected increase in profit and did not pay off for the cost of taking the measurements. This resulted in a decline in net profit. However, sampling continued until the likely value of a set of measurements, as measured by the EVMI, was lower than the cost of sampling, even though the latest measurement generated a decline in profit. Thus the ‘‘stopping rule’’ relates to the expected increase in profit (EVMI) and not the actual realized increase in profit, which might fall short of the profit expectation.

W03019

[46] Engaging in the 5 sampling rounds resulted in an increase in the optimal pumping rate from 9660 m3/d to 10,670 m3/d, or an increase of 1,010 m3/d. In section 5.3.1, it was shown that the estimated potential capacity of the system was 11,830 m3/d. Thus it is only worthwhile for the water manager to recover approximately 47% of the expected suboptimality losses due to unknown spatial variability in the hydraulic conductivity by engaging in the five sampling campaigns. Optimal data collection costs totaled $500,000. This resulted in an increase in profit from water sales of $744,600 and a net profit increase of $244,600. The water manager’s return on investment is tangibly 49%. The external benefit is a better understanding and more complete representation of the aquifer system and its behavior, and maintenance of wetland viability. An additional campaign would raise total costs to $600,000 and reduce net profits to $192,050. It is to be expected that when the spatial statistics are imperfectly known, the worth of hydraulic conductivity data, and consequently the return on investment, will (at least initially) increase, because the conductivity data will have a combined effect of reducing the kriging variance as well as allowing for a better estimation of the spatial statistics. 5.3.6. Reliability of the Posterior Optimization Simulation Solutions [47] The results given by the stochastic groundwater management model after the 5 data campaigns were completed, was evaluated to determine the reliability of the optimal pumping plan. A Monte Carlo analysis was conducted in which 300 conditional test realizations were generated and aquifer simulations were run. On the basis of the number of realizations in which constraint violations occurred, the average reliability of the solution was estimated to be 0.967, with a standard deviation of 0.019. Therefore the goal was achieved of maintaining a reliability of at least 0.965 after data collection. Thus conditioning to hydraulic conductivity measurements allows more water to be extracted while maintaining the desired reliability of 0.965.

6. Summary and Conclusions [48] We demonstrated a framework that enables a water manager to determine optimal water production plans and data collection that maximize net profit. The optimal data collection program is aimed at reducing the predictive uncertainty of the simulation model in such a manner that only the most valuable data are collected. Value is not measured in terms of reduced uncertainty, but in terms of the consequent expected monetary profit to the water manager. Driving the optimal sampling program and groundwater resources management plan is the fact that spatial variability of hydraulic conductivity exists and is unknown. However, zones targeted for sampling did not coincide with regions with the highest estimation variance given by kriging. [49] The approach unites a multiple realization stochastic groundwater management model and Bayesian decision analysis to evaluate the worth of data aimed at maximizing the profit of aquifer production in ecologically sensitive areas. The method was demonstrated with a hypothetical unconfined aquifer system similar to systems in Florida.

11 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

Through successive sampling campaigns, net profit was increased, while maintaining the reliability of the optimal simulation optimization solution at 0.965. The net profit was maximized by engaging in 5 sequential measurement campaigns. Zones with the largest expected increase in profit from water production were targeted until measurements were no longer cost effective. Additional measurements would cost more than the expected profit it was expected to produce. Additional data collection beyond five rounds was shown not to be cost effective. Zones in the neighborhood of the pumping wells consistently showed the highest expected sampling benefit aimed at the safe increase in pumping. The water manager achieved nearly a 50% return on investment by engaging in the optimal sequence of data collection. Moreover, the additional data resulted in an improved representation of aquifer properties and a better model of system behavior, and in the maintenance of wetland viability. [50] Acknowledgments. We are grateful to Michael Saunders at Stanford Department of Management Science and Engineering for his advice on the optimization part of the work. The authors wish to thank Keith Beven and Jim Freer for the access to the parallel system at the Institute of Environmental Sciences at Lancaster University, UK. The first author wishes to acknowledge the Fund for Scientific Research - Flanders for providing a Postdoctoral Fellowship and a mobility grant. We gratefully acknowledge National Science Foundation grant EAR-0337393. We also thank three anonymous reviewers and the associate editor for their review comments.

References Ahlfeld, D. P., and M. Heidari (1994), Applications of optimal hydraulic control to ground-water systems, J. Water Resour. Plann. Manage., 120(3), 350 – 365. Ben-Zvi, M., B. Berkowitz, and S. Kessler (1988), Preposterior analysis as a tool for data evaluation: Application to aquifer contamination, Water Resour. Manage., 2, 11 – 20. Carrera, J., E. Usunoff, and F. Szidarovsky (1984), A method for optimal observation network design for groundwater management, J. Hydrol., 73, 147 – 163. Chan, N. (1993), Robustness of the multiple realization method for stochastic hydraulic aquifer management, Water Resour. Res., 29(9), 3159 – 3167. Criminisi, A., T. Tucciarelli, and G. P. Karatzas (1997), A methodology to determine optimal transmissivity measurement locations in groundwater quality management models with scarce field information, Water Resour. Res., 33(6), 1265 – 1274. Davis, D. R., and W. M. Dvoranchik (1971), Evaluation of the worth of additional data, Water Resour. Bull., 7(4), 700 – 707. Davis, D. R., C. C. Kisiel, and L. Duckstein (1972), Bayesian decision theory applied to design in hydrology, Water Resour. Res., 8(1), 33 – 41. Delhomme, J. P. (1978), Kriging in the hydrosciences, Adv. Water Res., 1(5), 251 – 266. Feyen, L., and S. M. Gorelick (2004), Reliable groundwater management in hydroecologically sensitive areas, Water Resour. Res., 40, W07408, doi:10.1029/2003WR003003. Feyen, L., P. J. Ribeiro Jr., F. De Smedt, and P. J. Diggle (2002), Bayesian methodology to stochastic capture zone determination: Conditioning on transmissivity measurements, Water Resour. Res., 38(9), 1164, doi:10.1029/2001WR000950. Freeze, R. A., J. Massmann, L. Smith, T. Sperling, and B. James (1990), Hydrogeolocical decision analysis, 1. A framework, Ground Water, 28(5), 738 – 766. Freeze, R. A., B. James, J. Massmann, T. Sperling, and L. Smith (1992), Hydrogeological decision analysis, 4, The concept of data worth and its use in the development of site investigation strategies, Ground Water, 30(4), 574 – 588. Gates, J. S., and C. C. Kisiel (1974), Worth of additional data to a digital computer model of a groundwater basin, Water Resour. Res., 10(5), 1031 – 1038.

W03019

Gill, P. E., W. Murray, and M. A. Saunders (2002), SNOPT: An SQP algorithm for large-scale constrained optimization, SIAM J. Optim., 12(4), 979 – 1006. Gorelick, S. M. (1983), A review of distributed parameter groundwater management modeling methods, Water Resour. Res., 19(2), 305 – 319. Gorelick, S. M. (1987), Sensitivity analysis of optimal groundwater contaminant capture curves: Spatial variability and robust solutions, in Solving Ground Water Problems With Models: An Intensive ThreeDay Conference and Exposition Devoted Exclusively to Ground Water Modeling, pp. 133 – 146, Natl. Water Well Assoc., Dublin, Ohio. Gorelick, S. M. (1990), Large-scale nonlinear deterministic and stochastic optimization: Formulations involving simulation of subsurface contamination, Math. Programm., 48, 19 – 39. Grosser, P. W., and A. S. Goodman (1985), Determination of groundwater sampling frequencies through Bayesian decision theory, Civ. Eng. Syst., 2, 186 – 194. Harbaugh, A. W., E. R. Banta, M. C. Hill, and M. G. McDonald (2000), MODFLOW-2000, U.S. Geological Survey modular ground-water model—User guide to modularization concepts and the ground-water flow process, U.S. Geol. Surv. Open File Rep., 00-92, 121 pp. James, B. R., and R. A. Freeze (1993), The worth of data in predicting aquitard continuity in hydrogeological design, Water Resour. Res., 29(7), 2049 – 2065. James, B. R., and S. M. Gorelick (1994), When enough is enough: The worth of monitoring data in aquifer remediation design, Water Resour. Res., 30(12), 3499 – 3513. Journel, A. G. (1993), Resampling from stochastic simulations, Environ. Ecol. Stat., 1, 63 – 83. Kaunas, J. R., and Y. Y. Haimes (1985), Risk management of groundwater contamination in a multiobjective framework, Water Resour. Res., 21(11), 1721 – 1730. Kitanidis, P. K. (1986), Parameter uncertainty in estimation of spatial functions: Bayesian analysis, Water Resour. Res., 22(4), 499 – 507. Loaiciga, H. A. (1989), An optimization approach for groundwater quality monitoring network design, Water Resour. Res., 25(8), 1771 – 1782. Maddock, T., III (1973), Management model as a tool for studying the worth of data, Water Resour. Res., 9(2), 270 – 280. Massmann, J., and R. A. Freeze (1987a), Groundwater contamination from waste management sites: The interaction between risk-based engineering design and regulatory policy: 1. Methodology, Water Resour. Res., 23(2), 351 – 367. Massmann, J., and R. A. Freeze (1987b), Groundwater contamination from waste management sites: The interaction between risk-based engineering design and regulatory policy: 2. Results, Water Resour. Res., 23(2), 368 – 380. Massmann, J., R. A. Freeze, L. Smith, T. Sperling, and B. James (1990), Hydrogeolocical decision analysis, 2. Applications to ground water contamination, Ground Water, 29(4), 536 – 548. Morgan, D. R., J. W. Eheart, and A. J. Valocchi (1993), Aquifer remediation design under uncertainty using a new chance constrained programming technique, Water Resour. Res., 29(3), 551 – 561. Reichard, E. G., and J. S. Evans (1989), Assessing the value of hydrogeological information for risk-based remedial action decisions, Water Resour. Res., 25(7), 1451 – 1460. Rouhani, S. (1985), Variance reduction analysis, Water Resour. Res., 21(6), 837 – 846. Southwest Florida Water Management District (1999), Establishment of minimum levels in palustrine cypress wetlands, in northern Tampa Bay minimum flows and levels for isolated wetlands, category 1 and 2 lakes, seawater intrusion, environmental aquifer levels, and Tampa Bypass Canal, report, Brooksville, Fla. Southwest Florida Water Management District (2003), 2001 estimated water use in the Southwest Florida Water Management District, report, Brooksville, Fla. (Available at http://www.swfwmd.state. fl.us/documents/reports/files/2001%20Estimated%20Water%20 Use%20.pdf) Tucciarelli, T., and G. Pinder (1991), Optimal data acquisition strategy for the development of a transport model for groundwater remediation, Water Resour. Res., 27(4), 577 – 588. Wagner, B. J. (1995a), Sampling design methods for groundwater modeling under uncertainty, Water Resour. Res., 31(10), 2581 – 2591. Wagner, B. J. (1995b), Recent advances in simulation-optimization groundwater management modeling, Rev. Geophys., 33, 1021 – 1028.

12 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

Wagner, B. J. (1999), Evaluating data worth for ground-water management under uncertainty, J. Water Resour. Plann. Manage., 125(5), 281 – 288. Wagner, B. J., and S. M. Gorelick (1989), Reliable aquifer remediation in the presence of spatially variable hydraulic conductivity: From data to design, Water Resour. Res., 25(10), 2211 – 2225. Wagner, J. M., U. Shamir, and H. R. Nemati (1992), Groundwater quality management under uncertainty: Stochastic programming approaches and the value of information, Water Resour. Res., 28(5), 1233 – 1246. Yeh, W. W.-G. (1992), Systems analysis in ground-water planning and management, J. Water Resour. Plann. Manage., 118(3), 224 – 237.

W03019

Yokota, F., and K. Thompson (2004), Value of information literature analysis: A review of applications in health risk assessment, Med. Decis. Mak., 24, 287 – 298.

L. Feyen, Land Management Unit, Institute for Environment and Sustainability, European Commission, DG Joint Research Centre, TP261, I-21020 Ispra (Va), Italy. ([email protected]) S. M. Gorelick, Department of Geological and Environmental Sciences, Stanford University, Stanford, CA 94305-2115, USA. ([email protected] stanford.edu)

13 of 13

Framework to evaluate the worth of hydraulic conductivity data for optimal groundwater resources management in ecologically sensitive areas Luc Feyen1 Department of Geological and Environmental Sciences, Stanford University, Stanford, California, USA Hydrogeology and Engineering Geology, Katholieke Universiteit Leuven, Leuven, Belgium

Steven M. Gorelick Department of Geological and Environmental Sciences, Stanford University, Stanford, California, USA Received 24 November 2003; revised 27 December 2004; accepted 13 January 2005; published 18 March 2005.

[1] We propose a framework that combines simulation optimization with Bayesian

decision analysis to evaluate the worth of hydraulic conductivity data for optimal groundwater resources management in ecologically sensitive areas. A stochastic simulation optimization management model is employed to plan regionally distributed groundwater pumping while preserving the hydroecological balance in wetland areas. Because predictions made by an aquifer model are uncertain, groundwater supply systems operate below maximum yield. Collecting data from the groundwater system can potentially reduce predictive uncertainty and increase safe water production. The price paid for improvement in water management is the cost of collecting the additional data. Efficient data collection using Bayesian decision analysis proceeds in three stages: (1) The prior analysis determines the optimal pumping scheme and profit from water sales on the basis of known information. (2) The preposterior analysis estimates the optimal measurement locations and evaluates whether each sequential measurement will be costeffective before it is taken. (3) The posterior analysis then revises the prior optimal pumping scheme and consequent profit, given the new information. Stochastic simulation optimization employing a multiple-realization approach is used to determine the optimal pumping scheme in each of the three stages. The cost of new data must not exceed the expected increase in benefit obtained in optimal groundwater exploitation. An example based on groundwater management practices in Florida aimed at wetland protection showed that the cost of data collection more than paid for itself by enabling a safe and reliable increase in production. Citation: Feyen, L., and S. M. Gorelick (2005), Framework to evaluate the worth of hydraulic conductivity data for optimal groundwater resources management in ecologically sensitive areas, Water Resour. Res., 41, W03019, doi:10.1029/2003WR002901.

1. Introduction [2] We address three interrelated concerns that underlie the development of an optimal groundwater resources plan in an ecologically sensitive area. First is maximizing profits from groundwater production while limiting the decline of water levels in wetland areas. The permitted water table decline can be dictated by governmental regulation, or can be self-imposed. Second is accounting for uncertainty in predicted water levels due to unknown spatial variability in hydraulic conductivity. Third is optimizing the data collection strategy so that new conductivity data both reduce uncertainty and provide the greatest economic benefit to the water manager of reliable regional groundwater production. 1 Now at Land Management Unit, Institute for Environment and Sustainability, European Commission, Ispra, Italy.

Copyright 2005 by the American Geophysical Union. 0043-1397/05/2003WR002901

[3] We adopt the perspective of a water manager whose goal is to improve local production in the most costeffective way, or to layout the optimal measurement design to optimally exploit new sources. Although the perspective of the water manager in this case is that of a profit maximizer, an alternative analogous formulation is that the water manager is a cost minimizer. The latter case might be applicable to a public agency that assumes the responsibility of providing a reliable least cost water supply while meeting environmental and regulatory constraints. The underlying concepts and approach presented here are the same whether the goal of water manager is to maximize profits or minimize costs. [4] The three interrelated concerns above are addressed using an optimal design, data collection, and a Bayesian decision-making framework. Combined groundwater simulation and optimization is the tool used for optimal design (for reviews, see Gorelick [1983, 1990], Yeh [1992], Ahlfeld and Heidari [1994], and Wagner [1995b]). Specifically, the design component employs stochastic simulation optimiza-

W03019

1 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

tion to plan regionally distributed groundwater pumping while preserving the hydroecological balance in wetland areas. A stochastic approach is required because predictions made by an aquifer model are imprecise. Consequently, any groundwater extraction scheme based on predicted water levels is uncertain. To guarantee that the constraints aimed at wetland protection are met when there is significant doubt about the predicted water level, the regional extraction system is forced to operate below its equivalent certainty level. The water manager must intentionally extract less water from the aquifer to ensure that wetland areas do not dry up. The maximum safe and reliable groundwater production rates will generally be lowest when model uncertainty is greatest. Here ‘‘safe’’ means that the damage to wetlands, as measured by violating water level constraints and caused by pumping to obtain a sufficient water supply, is relegated to a low-probability event by carefully designing an optimal pumping plan. [5] Here we consider uncertainty due to spatially variable hydraulic conductivity. To account for the spatial variability in the hydraulic conductivity, the multiple realization management model formulation is used [Gorelick, 1987; Wagner and Gorelick, 1989; Morgan et al., 1993]. This is a simulation optimization technique in which numerous realizations of spatially variable hydraulic conductivity are considered simultaneously in a single optimization formulation. In our case, the simulation model is nonlinear because it is used to represent unconfined aquifer behavior. Consequently, the stochastic optimization problem requires nonlinear programming (NLP). Here a gradient-based iterative method is employed to solve the NLP [Gill et al., 2002]. In some situations, perhaps problem nonlinearities can be ignored thereby enabling the use of a simpler and less computationally burdensome response matrix method that relies on linear programming (LP) to determine an optimal solution [e.g., Gorelick, 1983]. [6] Data collection is the second component of the framework. The water manager seeks a water production system that is most cost effective, but is faced with a system that is consistently forced to under perform to compensate for predictive uncertainty. A reduction in uncertainty and consequent increase in production can be accomplished by collecting data from the aquifer system to build a more reliable predictive model and a better simulation optimization model used for groundwater management. However, the expected increase in production and profit will not necessarily offset the cost of additional data collection. Therefore the primary obstacle to the design of optimal sequential measurement campaigns in conjunction with groundwater management lies in quantifying the economic worth of hydrogeological information. As such, it is important to focus on net economic benefit and not simply on the increase in production or the reduction in model uncertainty. Here net profit refers to the difference between the profit from water sales and the cost of collecting the data. [7] Risk-based decision making is the third component of the framework. We use Bayesian data worth analysis to find the best locations of sequential sets of hydraulic conductivity measurements that maximize the combined benefits of groundwater production and data collection. The two specific tasks involved are to determine where to take the next set of measurements and when to halt the data collection

W03019

campaign. Bayesian decision analysis relies on the concepts of the expected value of measurement information (EVMI), which is a measure of the expected worth of information, and the expected value of perfect information (EVPI), which serves as an upper bound on the worth of exploration. Data worth is determined by comparing the cost of data collection to the expected gain in profit to the water manager. The economic value of additional data is measured by the consequent improvement in optimal groundwater management operations. The process involves sequential data collection and stochastic groundwater management, continuing until net profit is maximized. Because data collection and groundwater management are both economic optimization problems that account for uncertainty, solving the joint data worth and planning problem is computationally intensive. For each possible sampling location in the field, the methodology involves 1) a Monte Carlo analysis to characterize the hydraulic conductivity distribution at the location, and 2) a stochastic simulation optimization for each likely value of the hydraulic conductivity at that location. [8] We apply the optimal design, data collection, and decision-making framework to a hypothetical example based on groundwater management in an area where wetlands must be protected. The example is realistic but of limited complexity, and the following assumptions were adopted. (1) We consider a system represented by a twodimensional, recharge dominated simulation model with steady state conditions. (2) We evaluate the worth of data in zones rather than at individual points, sample sets of hydraulic conductivity measurements, sample those conductivity sets sequentially, do not account for estimation error, and the support scale of individual measurements is local. (3) We account only for the uncertainty in spatially variable hydraulic conductivity. However, the modular nature and Monte Carlo design of the framework allow adaptation to more complex groundwater and transport problems. In principle, the multiple-realization management model and Bayesian decision analysis approach can account for any source of uncertainty. The main drawback is computational effort, which increases with model complexity and the number of uncertain variables. This is especially true when continuous variables like the hydraulic conductivity are involved because the assessment of the value of information at a certain location requires a stochastic optimization for each possible value the variable can take at the particular location.

2. Review of Previous Work [9] To date, several studies have inspected the worth of data in hydrological problems using statistical decision theory. The earliest contributions reported are by Davis and Dvoranchik [1971] and Davis et al. [1972], who evaluated the worth of additional streamflow data for the design of piles for bridge construction. Maddock [1973] combined decision theory, groundwater simulation and mathematical programming to plan and manage an irrigated farm and to evaluate the worth of hydrogeological and other information. Gates and Kisiel [1974] computed the worth of measurements of storativity, transmissivity, discharge and recharge for a numerical groundwater prediction model. The

2 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

first application of decision making to problems involving groundwater contamination was presented by Kaunas and Haimes [1985]. Among other early contributions of statistical decision theory to contamination problems are the works provided by Grosser and Goodman [1985], Ben-Zvi et al. [1988], and Reichard and Evans [1989]. [10] Most of the aforementioned studies predate the wide use of spatial stochastic theory in hydrological sciences and treated the hydrogeological parameters as regionally homogeneous but uncertain. Since the introduction of geostatistical concepts in aquifer simulation by Delhomme [1978], there have been several sampling design studies that identify the ‘‘best’’ sampling locations for additional measurements as the locations that provide the greatest reduction in parameter uncertainty or in model prediction uncertainty [e.g., Carrera et al., 1984; Rouhani, 1985; Loaiciga, 1989; Tucciarelli and Pinder, 1991; Wagner et al., 1992; Wagner, 1995a, 1999; Criminisi et al., 1997]. They quantified the worth of a measurement by the degree by which it increased the accuracy of predictions of heads or concentrations, rather than in monetary terms. We argue that data worth analyses are far more useful if they are based on an objective function that quantifies the economic benefits of reducing model prediction uncertainty. [11] The modern risk-based design approach that included geostatistical techniques to characterize spatial variability was introduced into the literature by Massmann and Freeze [1987a, 1987b]. The concepts were further developed in a series of papers by Freeze et al. [1990], Massmann et al. [1990], and Freeze et al. [1992]. Freeze et al. [1992] showed how to evaluate the worth of spatially correlated measurements of hydraulic conductivity in a heterogeneous aquifer when predicting contaminant travel time. James and Freeze [1993] developed a Bayesian decision-making framework to evaluate the worth of data for remedial containment of contaminants in an aquifer underlain by an aquitard of uncertain continuity. James and Gorelick [1994] evaluated the monetary worth of data needed to delineate a contaminant plume, and estimated the optimal stepwise number of samples in a data collection program. Bayesian decision theory is used in a wide range of disciplines to evaluate the worth of information. For a comprehensive review of value of information analysis applications, with a focus on health risk assessment, we refer the reader to the review paper by Yokota and Thompson [2004]. [12] We build on the works of Freeze et al. [1992] and Wagner and Gorelick [1989] and propose a combination of simulation optimization and decision analysis to evaluate the worth of sets of hydraulic conductivity data to determine the maximum profit of water production in ecologically sensitive areas, where minimum water levels must be maintained. The main contribution is the development of a more comprehensive linkage between data collection, model uncertainty analysis, and optimal water resources planning. To our knowledge this is the first time stochastic simulation optimization has been integrated into formal decision analysis for optimal data collection, thereby reducing both prediction uncertainty and the consequent suboptimally losses in production. By incorporating a multiple realization simulation optimization management model into Bayesian decision analysis, we identify a reliable optimal groundwater management plan that protects wetlands while

W03019

simultaneously determining the economic worth of sequential measurements of hydraulic conductivities in random conditional fields that are spatially correlated.

3. Wetland Problem [13] We present the optimal data collection and planning approach for groundwater production in a region where wetland health depends on the maintenance of the local water table at or near the land surface. This problem is motivated by water management challenges, such as those in Florida, where the supply of groundwater is regulated to maintain the integrity of isolated lakes and palustrine wetlands. Southwest Florida relies on groundwater for approximately 86% of its 4.9 billion liters per day supply [Southwest Florida Waste Management District (SWFWMD), 2003]. The Floridian aquifer, which provides over 80% of the groundwater in the region, has been pumped so heavily that declining water levels and severe water resources problems occurred in the north Tampa Bay area in the mid-1990s. In 1996, the Florida legislature directed the Florida Department of Environmental Quality to establish minimum water levels to ensure that significant harm is not caused to the water resources and the environment. Responding to the statute, the Southwest Florida Water Management District, one of five regional authorities in the state, proposed specific minimum water levels in 1998 amidst significant controversy. After review by an independent panel of experts, these minimum water levels were established to protect ecologically sensitive habitats under threat due to groundwater declines from excessive pumping. The minimum water levels currently constitute the single most significant restriction on groundwater exploitation in the region [SWFWMD, 1999; Florida Statutes, Title XXVIII, Natural resources; conservation reclamation and use, chap. 373, Water Resources, State Water Resources Plan, 373.042 Minimum flows and levels, 2000]. Similar regulations have been established in other states, such as Kansas, where 90% of all irrigation water is derived from groundwater and minimum base flow must be maintained (Kansas Water Appropriations Act, K.S.A.82a – 703b, 1984). [14] Our illustrative example involves pumping to obtain a water supply while maintaining a high water table for wetland protection. We use topography obtained from a Digital Elevation Map of a wetland area near Dade City in southwest Florida, but this is intended to be generic, not adhering to the situation in that particular area. The example system is shown in Figure 1. The unconfined aquifer is approximately 150 m thick and of lateral dimensions 2.5 km by 2.5 km. Heads are simulated using finite differences based on discretization of the area into 50 by 50 m grid cells. [15] To limit the influence of the far-field boundary conditions on the region of concern, it is surrounded by a 5.6 km buffer zone, or termination strip, consisting of 10 additional rows and columns with successively increasing dimensions toward the exterior margins. Groundwater inflow of 2500 m3/d across the entire boundary of the domain occurs along the flux boundary at the top of Figure 1 and exits under conditions of no pumping along the constant head boundary fixed at 21.0 m at the bottom margin of Figure 1. Under natural conditions, the hydraulic

3 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

W03019

domain to reflect site-specific habitat susceptibility, or transient to allow for temporal changes in water needs of wetlands. Prior to pumping the maximum depth to water in the wetlands is 0.3 m. Alternatively, an equivalent formulation could have restricted heads or drawdowns, rather than the readily measured depth to the water table. [17] When the water table falls below the extinction depth of 2 m, the ET condition changes from a linear decline with depth to a constant value of zero. This is a discontinuity and could pose problems when using a gradient-based optimization method. However, the maximum allowable depth to the water table at the constraint locations is 0.5 m, which is far from the extinction depth of 2 m. As a result, the locations where the water table can fall below the extinction depth will be distant from the constraint locations. Hence discontinuities in the response of the depth to the water table to pumping did not occur at the constraint locations and the gradientbased optimization method successfully identified solutions in all cases. Figure 1. Example aquifer system represented as a spatially variable hydraulic conductivity field (surrounding region not to scale). gradient is approximately 104. Along the left and right sides of the domain, zero-gradient boundaries are specified at the outer edges of the buffer zone. Uniform precipitation of 1500 mm/year is applied to the system. Evapotranspiration is calculated with ETmax of 1800 mm/year and an extinction depth of 2 m. This results in a net recharge of approximately 300 mm/year. The hydraulic conductivity of the domain is represented as a realization of a second-order stationary Gaussian random field, where Y = log K is the log hydraulic conductivity. The field is characterized by a mean log value of mY = 1.6 (a geometric mean K of 40 m day1), an isotropic, exponential covariance function with log variance s2Y = 1.0, and a correlation length lY = 200 m. For the surrounding buffer zone the hydraulic conductivity is set to the geometric mean conductivity.

4.2. Optimal Groundwater Management Under Uncertainty [18] The primary cause of prediction uncertainty in the above case is unknown hydraulic conductivity, which is spatially variable. On the basis of both local data at the supply wells and data from similar environments, we assume that the spatial statistics of the hydraulic conductivity field can be estimated. Given these statistics, a suite of equally likely realizations conditioned on the conductivity values at the existing supply wells can be generated and represents the likely range of values. Optimal groundwater planning can account for uncertainty in conductivity using a stochastic simulation optimization method known as the multiple-realization or stacking approach [Gorelick, 1987; Wagner and Gorelick, 1989; Chan, 1993]. In the approach, a large number of equally likely realizations are included in the optimization formulation and a large simulation optimization, involving simultaneous simulations based on each conductivity realization, is solved. The realizations in the stack are equally likely, hence a uniform probability distribution characterizes the conductivity realizations. If 100

4. Groundwater Management Under Uncertainty 4.1. Formulation of the Groundwater Management Problem [16] We consider a water management authority that is responsible for providing a supply from groundwater extracted at four pumping centers that it owns and operates. The objective of the manager is to maximize total production, and consequent profits, by determining a single selection of pumping well rates from its existing well fields. Regulations require that minimum water levels must be maintained in ecologically sensitive areas. Figure 2 shows the location of the 4 pumping centers P1, . . ., P4. At the 422 control locations, shown as circles in Figure 2, minimum level constraints are imposed. The gray scale in Figure 2 indicates the depths to the water table under natural conditions. Water depths under conditions of no pumping are between 0.1 and 7.5 m. At the control locations, water levels, expressed as a maximum depth to water of 0.5 m, must be maintained under pumping conditions. We note that the bound can easily be made variable throughout the

Figure 2. Location of the pumping well centers (P1, P2, P3, and P4) and the control locations (circles) shown on an overlay of the regional depth to the water table.

4 of 13

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

W03019

realizations are considered, then the constraint set will consist of a stack involving 100 different simulation models. Groundwater production, or the consequent profit obtained from water sales, is maximized while constraints are obeyed in each and every simulation model based on the many hydraulic conductivity realizations in the stack. One unique set of pumping rates is determined regardless of the number of realizations in the stack. [19] Because the statistics of the stochastic model for the hydraulic conductivity are assumed known, the equally likely conductivity realizations in the stack are characterized by identical values for the mean, variance, and correlation length. Stochastic optimization employing the multiplerealization management modeling approach could be extended to account for uncertainty in the statistical parameters that govern the hydraulic conductivity distribution. The unknown statistical parameters are then treated as random quantities. Bayesian principles [e.g., Kitanidis, 1986; Feyen et al., 2002] or spatial bootstrapping [e.g., Journel, 1993] then can be applied to infer the probability distributions of the parameters. From the latter, a Monte Carlo method is used to sample combinations of statistical parameter values, which are used to generate equally likely conductivity realizations. Hence the statistics characterizing the realizations in the stack span the likely range of values, as described by the joint probability distribution of the underlying statistical parameters. The result is that the variability among the realizations in the stack not only reflects the uncertainty due to the spatial variability in conductivity, but also the additional uncertainty that stems from imperfect knowledge about the parameters of the stochastic model. Because in this work we focus on the worth of data, rather than on parameter estimation, we have assumed that the statistics are known and fixed when generating hydraulic conductivity realizations employed in the stack. However, we do condition the realizations to measured conductivity values. [20] The mathematical formulation of a multiple-realization management model for maximizing groundwater production is maximize et q

ð1Þ

subject to Realization 1 f ðqÞ ¼ e di ðK1 Þ di*

8i

ð2aÞ

Realization 2 f ðqÞ ¼ e di ðK2 Þ di* .. .

8i

ð2bÞ

Realization j f ðqÞ ¼ e di Kj di* .. .

8i

ð2cÞ

Realization ssz f ðq Þ ¼ e di ðKssz Þ di* q0

at pumping wells

8i

ð2dÞ ð3Þ

W03019

where e is a column vector of ones, t denotes the transpose operator, q is a vector of the pumping rates at the individual wells in m3/d, Kj represents the jth realization of the hydraulic conductivity field, e d i is the spatially variable depth to the water table at control location i, d*i is the maximum allowable depth to the water table at location i, and ssz is the stack size, which is the number of hydraulic conductivity realizations included in the stochastic management model. For any hydraulic conductivity realization, Kj, and a particular set of pumping rates, q, the head distribution is calculated by solving a groundwater flow model, f(q). The multiple realization optimization simulation management model was solved using the SNOPT optimization code [Gill et al., 2002]. MODFLOW-2000 [Harbaugh et al., 2000] was used to solve the groundwater flow model and was coded as an external subroutine that was called for each function evaluation in the optimization. [21] In general, the optimal pumping strategy will not be dictated by one single ‘‘worst case’’ realization. Rather, there can be parts of one realization that dictate pumping in one region and parts of another realization that dictate pumping in another region, such that the optimal solution will meet each constraint in every part of every realization in the stack. The stacking approach addresses uncertainty by determining an optimal pumping scheme that is necessarily overdesigned to guarantee success in meeting the constraints even when the underlying model predictions are in doubt. 4.3. Variability and Reliability of the Stochastic Optimization Simulation Results [22] The ensemble of realizations in a stack of finite size is a sample of the variability in hydraulic conductivity. Different stacks of the same size contain different realizations, thus constitute different samples of the range of variability. Using the multiple realization approach, optimal solutions based on stacks of the same size but consisting of different sets of realizations can be different. To evaluate the variability of the stochastic optimization simulation results, the optimal pumping scheme was determined for 300 different stacks, each with a stack size of 100 realizations, where each stack contained a unique set of realizations. Optimal total groundwater production varied between 8.69 103 m3/d and 10.37 103 m3/d, with an average production of 9.66 103 m3/d and a standard deviation of 0.25 103 m3/d. Thus the effect on the optimal solution of considering different realizations, as expressed by the relative magnitude of the standard deviation, was 2.6%. [23] The water manager must select a reliability (or range of reliabilities to be considered) that the depth to water constraints will be met. The reliability of the simulation optimization solutions can be determined a posteriori, by conducting a set of postoptimality Monte Carlo simulations. A statistical measure of the success or failure of an optimal pumping scheme is determined by evaluating its performance when applied to a series of test realizations. For a given test realization, the optimal solution is successful when none of the constraints for the test realization are violated. The reliability equals the fraction of successes out of the total set of Monte Carlo runs, i.e., the reliability is equal to one minus the probability of failure. Given the variability observed in the simulation optimization results, it is obvious that the reliability of the optimal pumping

5 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

schemes will also vary among different stacks of the same size that are represented in the formulation. [24] Some works have attempted to develop relations for the multiple-realization methods that prespecify reliability [e.g., Chan, 1993; Feyen and Gorelick, 2004], such that once the reliability is determined, the stack size can be readily estimated. Chan [1993] showed that, based on order statistics, the system reliability for a stack of n realizations is approximately n/(n + 1). He also developed an argument based on Bayesian principles in which the reliability is (n + 1)/(n + 2). Feyen and Gorelick [2004] proposed a new measure to predict the expected reliability of meeting water level constraints in wetland areas as a function of the number of conductivity realizations included in the stochastic optimization formulation and the variance of log K. It is important to note that these formulas provide a way to estimate the reliability of the optimal groundwater pumping plan without conducting postoptimization Monte Carlo analysis. In the current work, this is a technical detail that relates to assessing the likely viability of the optimal groundwater management scheme. It is not part of the Bayesian decision analysis aimed at determining the best measurement network. [25] We determined the reliabilities of each of the 300 optimal pumping strategies by counting the number of successes and failures when each of the optimal pumping schemes was applied to sets of 250 test realizations. In each determination of reliability, a different unique set of 250 test realizations was used. This resulted in 300 reliability values, ranging from 0.91 to 1.00, with an average reliability of 0.965 and a standard deviation of 0.018. Had a higher reliability been required, then the stack size necessarily would have been larger. For example, an average reliability of 0.98 would require a stack size of 200 realizations [Feyen and Gorelick, 2004]. [26] The results of the 300 multiple-realization optimizations and the reliability tests reveal that the optimal pumping schemes and the corresponding reliability may show variability that can not be neglected. Larger stacks of realizations sample a wider range of the uncertainty, and it is to be expected that the variability within the optimization results will decrease with increasing stack size. In the limit, when all possible realizations are stacked, the optimal pumping scheme should be unique and the reliability of the solution should equal one, with no variability. However, large stack sizes, e.g., >1000 realizations, may be computationally infeasible. Also, the required reliability may not justify such large stack sizes. Instead, one can perform the optimization analysis for a number of identical formulations, each with a chosen stack size, and accept the average reliability as a good estimate. This is the approach that we adopt in the worth of data analysis described below. A stack size of 100 realizations was used, hence the average reliability of the optimal pumping schemes is taken as 0.965. As shown later, the reliability of 0.965 was maintained in the final solution after data collection. This value was also determined using postoptimality Monte Carlo analysis.

5. Worth of Data [27] A central thread of our approach is the evaluation of the worth of data. The underlying precept is that the water

W03019

manager is a profit maximizer. As such, data collection efforts are only worthwhile if the expected value of each additional measurement at least pays for itself by reliably increasing groundwater production and subsequent profits. In this section, we first formulate the problem of optimal data collection mathematically. Then, we give an overview of the Bayesian decision analysis used to optimize the sampling strategy. The rest of the section details the different steps of the Bayesian decision analysis applied to the wetland problem. 5.1. Mathematical Formulation [28] The mathematical formulation for the present value worth of data problem is ð4aÞ

maximize Pnet

with Pnet ¼ Ps Ctot meas

ð4bÞ

¼ Rs Ccap þ Cop Ctot meas

ð4cÞ

T X

h i 1 unit unit ¼ prw :q:t Ccap þ Cq :q:t ð1 þ d Þt t¼1 unit ð4dÞ Cmeas : mKmeas

subject to Realization 1 f ðqÞ ¼ e di ðK1 Þ di*

8i

ð5aÞ

Realization 2 f ðq Þ ¼ e di ðK2 Þ di* .. .

8i

ð5bÞ

Realization j f ðqÞ ¼ e di Kj di* .. .

8i

ð5cÞ

Realization ssz f ðqÞ ¼ e di ðKssz Þ di*

q0

8i

ð5dÞ

ð6Þ

at pumping wells, where Pnet net profit ($); Ps profit from water sales ($); Rs revenue from water sales ($); total costs of taking hydraulic conductivity meaCtot meas surements ($); unit cost of hydraulic conductivity measurement Cunit meas ($/measurement);

6 of 13

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

W03019

W03019

table is small, single well, short-duration pumping tests provide local values of conductivity representative of a 2500 m2 footprint. In thinner aquifers slug tests might yield appropriate values. Collecting several measurements at once is common practice because it reduces fixed mobilization charges.

Figure 3. Schematic overview of Bayesian decision analysis to evaluate the worth of data. Ccap amortized capital and replacement cost (installation of the production well field) ($); Cop annual operating costs of the well field ($); d discount rate (dimensionless); annual sales price per unit water ($/m3); prunit w annual cost per unit pumping ($/m3); Cunit q K mmeas number of hydraulic conductivity campaigns (dimensionless); t time (years). The production well field is in place and operational prior to collecting the additional data determined in this analysis. The fixed cost of aquifer test well installation is contained in the unit cost of measurements. The amortized capital cost of the production well fields is included in the annual e unit groundwater production cost, or (Ccap + Cunit q .q.t) = Cq . unit e q.t, where Cq is the annual cost per unit pumping ($/m3) that includes the amortized capital costs. Equation (4d) then becomes Pnet ¼

T X t¼1

h i 1 unit e unit :q:t Cunit :mK : pr C w meas meas q t ð1 þ d Þ ð7Þ

We consider the case in which there is a negotiated fixed sales contract period, T, of five years. Therefore total net profit is a function of net annual profit per unit of water ($/m3), and the total cost of making m sets production Punit s of aquifer measurements. As such, equation (7) is unit K Pnet ¼ Punit s :q:T Cmeas : mmeas :

ð8Þ

As long as new hydraulic conductivity measurements reduce model prediction uncertainty, and enhanced model reliability enables a greater Pnet, those measurements are valuable. This is true when the increase in profit from water sales Ps = Punit s .q.T outweighs the increase in costs unit K Ctot meas = Cmeas.mmeas of taking additional measurements. The best measurement to take next in a sequence of measurements is the one that generates the greatest expected net profit. We consider campaigns to collect hydraulic conductivity data that are taken in sets of five local measurements within an area of 250,000 m2 (500 m by 500 m). In our case, in which the depth to the water

5.2. Evaluation of the Value of Information: Overview of Methodology [29] Bayesian decision analysis consists of three components and is schematically outlined in Figure 3. The prior analysis determines the profit from water sales based on prior information. The preposterior analysis estimates the optimal measurement locations and evaluates whether each sequential measurement will be cost-effective before it is taken. The posterior analysis then revises the prior optimal pumping scheme and consequent profit, given the new information. Because the mean, variance and correlation length of the hydraulic conductivity field are fixed, only the ‘‘K map’’ but not the spatial statistics of K are updated when each of the handful of new measurements becomes available. In this case, formal Bayesian updating of the hydraulic conductivity field as by Kitanidis [1986] and Feyen et al. [2002] reduces to conditional simulation to the enlarged data set of measured conductivity values. As such, conditional realizations generated in the prior, preposterior, and posterior analysis are characterized by identical spatial statistics. Only the number of conditioning data that are honored by the stochastic simulations differs. [30] The core component in determining the worth of data is the preposterior analysis. In that part, the likely worth of additional information is estimated and the best measurement points are identified. Even though the values at the proposed measurement locations remain unknown, such measurements will reduce the uncertainty, which allows the expected increase in the objective function that includes the proposed measurements to be estimated. The preposterior analysis is based on estimation of the expected profit increase due to taking additional measurements. [31] The expected value of perfect information (EVPI) assumes a perfect sampling design that reduces uncertainty to zero and hence yields the truly optimal groundwater extraction rates and maximum profit from water sales. The value of perfect information (VPI) or regret is the difference in profit from water sales Ps between the prior analysis and the perfect optimal solution when the hydraulic conductivity field is sampled exhaustively and precisely. The VPI or regret is a measure of the loss in profit from water sales due to the unknown spatial variability in hydraulic conductivity. Because the true hydraulic conductivity field is unknown, the regret cannot be calculated. However, for each equally likely representation of the hydraulic conductivity field, the regret or VPI can be calculated. The average of the calculated regrets over the set of equally like realizations of K is an estimate of the VPI, hence is called the expected VPI, or EVPI. The EVPI is an estimate of the expected loss in profit due to the unknown spatial variability in hydraulic conductivity. It is the maximum improvement that could be expected, and is an upper bound on the worth of exploration or data collection. [32] The expected value of measurement information (EVMI) reflects the expected value of an imperfect sampling round, which does not reduce uncertainty to zero. It is

7 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

an estimate of the monetary worth of the information, and allows one to determine whether sampling is valuable. A sampling program is cost-effective only if it costs less than the EVMI. The value of measurement information (VMI) is the increase in the profit from water sales Ps between the prior analysis and the posterior analysis that includes the sample information. Since the VMI remains unknown until the sample is actually taken, it can only be estimated. The EVMI is then equal to the expected increase in Ps between the prior analysis and the posterior analysis due to taking the proposed additional measurements Y(b). Formally, the EVMI at location b can be written as

EVMIðbÞ ¼ Pprior E fPs jY ðbÞg ; s

ð9Þ

where Pprior is the profit from water sales prior to collecting s conductivity measurements and E{PsjY(b)} is the expected profit from water sales given sample Y(b). On the basis of spatial stochastic theory, equation (9) can be expanded into EVMIðbÞ ¼ jPprior ½Ps ðY ðbÞ ¼ y1 Þ: PrðY ðbÞ ¼ y1 Þ þ . . . s þ Ps ðY ðbÞ ¼ yn Þ: PrðY ðbÞ ¼ yn Þj;

ð10Þ

where Ps(Y(b) = yi) is the profit from water sales when Y(b) = yi, Pr(Y(b) = yi) is the probability of sampling yi at location b, and n is the number of hydraulic conductivity values statistically sampled to characterize the distribution of log K at location b [James and Freeze, 1993]. The EVMI of a location thus equals the average increase in profit from groundwater sales over the possible outcomes of the hydraulic conductivity at that location, weighted by the probability of the respective outcomes. The n possible outcomes yi at location b are samples from N(mb,krig, s2b,krig), where mb,krig and s2b,krig are the kriged estimate and estimation variance, respectively, at location b. Values of hydraulic conductivity are sampled by generating conditional realizations and selecting the simulated values at the locations of interest. Although there may be many locations where a sample is cost-effective, a measurement will be taken only at the location with the highest EVMI. That is the most beneficial data to a water manager interested in reducing uncertainty to maximize production and profits. [33] It is important to note that the actual increase (postmeasurement) in profit from water sales can differ substantially from the EVMI, and the actual profit can increase or decrease when a measurement is taken. This is because the EVMI is an expectation; it is the average increase in profit from water sales, with the average over the possible values of the hydraulic conductivity at that location, whereas the actual increase in profit depends only on the true hydraulic conductivity value at the location. Thus, even when the likely increase in profit from water sales outweighs the cost of sampling, the increase in net profit, which is unknown until the sample is actually taken, can turn out to be negative. 5.3. Evaluation of the Value of Information: Application to the Wetland Problem 5.3.1. How Much Should the Manager Spend on Data Collection? [34] In each of the three stages of the data worth analysis, the optimal pumping rates are determined using the multi-

W03019

ple-realization approach given by equations (1) – (3). In the prior and posterior analyses, 100 stacks (snr = 100) with a stack size of 100 realizations (ssz = 100) each are included in the simulation optimization formulation and the snr = 100 optimization results are averaged. Both the prior and posterior analysis require simulation of 10,000 ‘‘aquifers’’, each with a unique hydraulic conductivity field. As shown later, the preposterior analysis requires simulating the same number of ‘‘aquifers’’ for each potential sampling location. [35] In our case, the hydraulic conductivity values at the pumping locations are known a priori, and therefore included in the prior analysis by generating realizations conditioned on the values at those locations. In the posterior analysis, the hydraulic conductivity fields will be conditioned on both the values at the pumping locations and on the newly sampled data. Determining each simulation optimization solution involves a search for the best pumping rates when considering 100 simulated aquifers in the constraint set (5), and solution of the nonlinear optimization typically requires 50 to 60 iterations. For the given formulation, the average computation time for solving a single simulation optimization management model with a stack size of 100 realizations (ssz = 100) is 45 minutes on a 2.6 GHz machine. [36] The prior analysis begins with the computation of the EVPI or average regret. The value of EVPI, expressed as a pumping rate, is equal to the difference in the average total optimal pumping rates when stacking 1 versus 100 realizations. Note that these two values are each based on taking an average of 100 simulation optimization solutions (snr = 100). For ssz = 1, average optimal production equaled 11.83 103 m3/d. For ssz = 100, average optimal production was 9.66 103 m3/d, which corresponds to the average production of the 300 multiple realization simulation optimization solutions described in Section 4.3. The computation of EVPI is simply (11.83 9.66) 103 m3/d = 2.17 103 m3/d. This is the loss in groundwater production due to unknown spatial variability in hydraulic conductivity. As such, the loss in groundwater production is 18% of the estimated potential groundwater extraction, or the system is operating at 82% of its estimated potential capacity. The profit to the water manager is 0.4 $/m3 based on the 5-year sales contract period with negotiated fixed pricing. In monetary terms, the EVPI then equals (2.17 103 m3/d) (5 365 days) (0.4 $/m3) = $1,584,100. This is the maximum budget that should be spent on data collection. It reflects the greatest potential unrealized profit due to initially unknown spatial variability of hydraulic conductivity when no additional measurements are taken. 5.3.2. Measurement Layout and Cost of Sampling [37] For the data worth analysis, we divide the region into 25 blocks, each occupying a region of 500 m by 500 m. The blocks are numbered from 1 to 25, starting with 1 in the bottom left corner of the domain, cycling by columns first, and ending in the upper right corner with 25. We look at sets of 5 measurements, spaced on a regular ‘‘5-spot’’ pattern within each block. We are interested in efficiently evaluating the worth of information in zones within our model domain, while recognizing fine-scale spatial variability and uncertainty in hydraulic conductivity. That is, each single block considered for economic analysis contains 100 local and spatially variable values of hydraulic conductivity, 5 of

8 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

Table 1. Relation Between the EVMI, Expressed in Terms of an Increase in Pumping Rate or Increase in Profit, and the Number of Measurement Campaigns That Determine Five Local Hydraulic Conductivity Values Within a Region Occupying 250,000 m2 EVMI, m3/d

EVMI, $

Number of Measurement Campaigns

68 137 274 411 548 685

50,000 100,000 200,000 300,000 400,000 500,000

0.5 1 2 3 4 5

which are (potentially) measured in a data collection campaign. Taking multiple measurements in each campaign reduces the price per conductivity measurement, and reduces the dimensionality of the problem, making it computationally tractable. The cost of the aquifer tests to determine 5 local hydraulic conductivity measurements within a block is $100,000. From Table 1 it can be seen that this value corresponds to an increase in profit generated by an increase in groundwater production rate of 137 m3/d. Hence sampling is cost effective if the measurements generate an EVMI of more than $100,000, which is equivalent to an increase in optimal pumping rate greater than 137 m3/d. 5.3.3. Evaluation of the Value of Information: EVMI Calculation and Convergence of Results [38] The aim of the preposterior analysis is to calculate the expected value of potential measurements or EVMI. For each 500 m by 500 m block b, the EVMI is obtained by evaluating equation (10). The sets of log hydraulic conductivity values yi(b) = (ybi,1,. . ., ybi,5), with i = 1, . . ., n, are obtained from different conditional realizations. For each set of values yi(b), snr simulation optimization problems each containing 100 conditional realizations (ssz = 100) are solved. Hence such an analysis results in snr estimates of the VMI or regret for each of the n sampled values yi(b). Averaging both over the snr estimates of the VMI per sample and the n samples yi(b) results in the EVMI. [39] Practically, there are computational limits on n, the number of values sampled from N(mb,krig, s2b,krig) to represent the variability in hydraulic conductivity at each location, and on snr, the number of simulation optimization problems solved for each of those statistical samples of yi(b) drawn from N(mb,krig, s2b,krig). In section 4.3 it was shown that the standard deviation of the optimal solution when including different realizations in the stack (stack size = 100) was 2.6%. A more significant source of variability in the optimal solutions has to do with n, which determines how accurately the variability of log K at a potential measurement location is characterized. The simulation optimization solutions for different log K values at a particular block in which an additional measurement might be taken depends not only on the variability of K in that block, but also on both the actual location of the block and the magnitude of the log K values. For n = 100, and considering a single stacking optimization (snr = 1) of 100 realizations (ssz = 100) for

W03019

each of the n samples yi(b), the standard deviation of the simulation optimization solutions for the 25 blocks varied between 2.9% and 4.1%. This means that the consideration of different samples of yi(b) at a given block b better accounts for a greater source of variability in the optimization solutions than consideration of more stacks (snr 1) for each of the n samples yi(b). Therefore we chose to set snr = 1, and increased n until the EVMI at all blocks converged, i.e., for each sampled set of log K values yi(b) at block b only one simulation optimization was evaluated. As discussed next, convergence occurred when n = 100. This implies that the VMI or regret obtained for measurement set yi(b) is a sample of the possible simulation optimization solutions at that block. We do not average the VMI of many alternative simulation optimization solutions for each possible outcome yi(b) and then sample a small number of log K sets from N(mb,krig, s2b,krig). Rather we take only one stack evaluated for each sample yi(b), and the VMI is averaged over a large number of sampled log K sets at that block, b. Moreover, when n increases, the probability of sampling similar yi(b) values increases, hence the effect of averaging over different stacks for one sample yi(b) is accounted for when n becomes large. [40] For the first sampling round, the fluctuation of the EVMI versus the number of samples n drawn from N(mb,krig, s2b,krig) is shown in Figure 4. The number of log hydraulic

Figure 4. Fluctuation of EVMI versus the number of values sampled n from N(mb,krig, s2b,krig ): (a) jEVMIi EVMI100j/EVMI100 for the 25 blocks; (b) average jEVMIi EVMI100j/EVMI100.

9 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

Figure 5. Plot of EVMI ($): (a) first sampling round; (b) second sampling round. conductivity sets sampled per block is shown along the x axis, with the absolute deviation relative to the EVMI for n = 100 at each block shown along the y axis. Figure 4a shows jEVMIi EVMI100j/EVMI100 for each block. The average deviation for the 25 blocks is presented in Figure 4b. The values of EVMIi and EVMI100 correspond to the EVMI, or average VMI, after drawing i and 100 samples from N(mb,krig, s2b,krig), respectively. The fluctuation in EVMI for the individual blocks varied widely for small values of n, and converged when n increased. For nearly all blocks, the fluctuation of the EVMI stabilized for n > 75. On average, the absolute deviation declined with increasing n, with the decline initially steepest, and leveling off for higher values of n. The average absolute deviation was 10% for n = 75 and dropped below 5% for n > 90. This analysis shows that for n = 100 the EVMI values were sufficiently stable to allow selecting the blocks with the highest EVMI. Consequently, the preposterior analysis required 100 stochastic optimizations with a stack size of

W03019

100 realizations each, or the equivalent of simulating 10,000 aquifers, for each potential sampling location. 5.3.4. Selection of the Blocks at Which Hydraulic Conductivity Should Be Sampled [41] The EVMI100 values obtained in the preposterior analysis of the first sampling round are displayed in the shaded map in Figure 5a. Comparison of the results with the EVMI threshold of $100,000 shows that several blocks had an EVMI larger than the cost of sampling, hence were worth sampling. The highest EVMI values were observed in the vicinity of, but not necessarily in, the blocks in which the pumping locations are positioned. Thus, even though the hydraulic conductivity values at the pumping locations were known, which implies the kriging variance was smallest nearby, it was still most cost effective to sample the blocks nearby the pumping wells. [42] Block 20, located northeast of P2, had by far the highest EVMI. It was expected that the set of 5 conductivity measurements in this block would result in an optimal pumping rate of 10.02 103 m3/d. This corresponds to an increase in pumping rate of 355 m3/d, or an increase in profit of $259,150. However, the actual increase in pumping rate can differ substantially from the EVMI calculated for the block. For example, the actual optimal pumping rate obtained in the posterior analysis after sampling block 20 was 9.81 103 m3/d, which corresponds to an increase in pumping rate of 150 m3/d, or an increase in profit of $109,500, which is less than half the expected increase estimated using the EVMI. [43] The shaded EVMI map for the preposterior analysis of the second sampling round is given in Figure 5b. Block 20 is marked by a white cross to indicate that it is sampled in the first round and is no longer considered as a potential sampling block in further sampling rounds. Comparing the EVMI maps for both sampling rounds shows that on average the EVMI values have dropped and that the pattern of EVMI has changed by sampling block 20. For the second sampling round, the highest EVMI value was obtained for block 15, which is the block east of the block in which P2 is located. The fact that this block did not have the second highest EVMI in the preposterior analysis of the first sampling round indicates that a sampling campaign based on the first EVMI map only is not optimal. Sequential sampling and updated conditioning is an important aspect of developing the most cost-effective measurement campaign. Procedurally, the EVMI map must be recalculated after each set of measurements has been taken. [44] Optimal sequential measurement campaigns should continue until the highest EVMI among the remaining blocks is lower than the cost of sampling. In this case, a total of five blocks were sampled. These were the 4 blocks surrounding P2 (blocks 14, 15, 19 and 20) and the block in which P3 is located (block 11). In the 6th sampling round, the highest EVMI was observed in block 12, located east of P3. However, the expected payoff, as reflected in the EVMI of $48,910 at this location, did not justify engaging in another round of measurements. 5.3.5. Cost-Benefit Analysis Sequential Measurement Campaigns [45] The costs and benefits of the different sampling rounds are plotted in Figure 6. Figure 6 (top) shows the

10 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

Figure 6. (top) Cumulative cost and (bottom) profit of m sequential measurement campaigns involving collecting sets of hydraulic conductivity data while maximizing profits from production under the constraint that water levels cannot be lowered excessively in wetland areas and that the optimal management plan is 0.965 reliable. Note that costs are negative values in the objective function. cumulative sampling costs as a function of the number of measurement rounds, with the costs shown along the y axis, whose range is inverted. Figure 6 (bottom) shows the profit from water sales and net profit to the water manager with each round of measurements. The profit from water sales does not include measurement costs and is a monotonically increasing function of the number of measurements taken. The water manager aims to maximize the net profit, which is obtained by subtracting the sampling costs from the profit from water sales. The results of the second measurement round show that, even though the expected increase in profit from water sales outweighs the cost of sampling, the increase in net profit of a particular campaign may turn out to be negative. This is because the EVMI is merely an expected value of the increase in profit from water sales. The actual increase in profit, which is only known after each sequential set of measurements is taken, can differ substantially from the expectation. For the second sampling round, the actual increase in profit from water sales was considerably lower than the expected increase in profit and did not pay off for the cost of taking the measurements. This resulted in a decline in net profit. However, sampling continued until the likely value of a set of measurements, as measured by the EVMI, was lower than the cost of sampling, even though the latest measurement generated a decline in profit. Thus the ‘‘stopping rule’’ relates to the expected increase in profit (EVMI) and not the actual realized increase in profit, which might fall short of the profit expectation.

W03019

[46] Engaging in the 5 sampling rounds resulted in an increase in the optimal pumping rate from 9660 m3/d to 10,670 m3/d, or an increase of 1,010 m3/d. In section 5.3.1, it was shown that the estimated potential capacity of the system was 11,830 m3/d. Thus it is only worthwhile for the water manager to recover approximately 47% of the expected suboptimality losses due to unknown spatial variability in the hydraulic conductivity by engaging in the five sampling campaigns. Optimal data collection costs totaled $500,000. This resulted in an increase in profit from water sales of $744,600 and a net profit increase of $244,600. The water manager’s return on investment is tangibly 49%. The external benefit is a better understanding and more complete representation of the aquifer system and its behavior, and maintenance of wetland viability. An additional campaign would raise total costs to $600,000 and reduce net profits to $192,050. It is to be expected that when the spatial statistics are imperfectly known, the worth of hydraulic conductivity data, and consequently the return on investment, will (at least initially) increase, because the conductivity data will have a combined effect of reducing the kriging variance as well as allowing for a better estimation of the spatial statistics. 5.3.6. Reliability of the Posterior Optimization Simulation Solutions [47] The results given by the stochastic groundwater management model after the 5 data campaigns were completed, was evaluated to determine the reliability of the optimal pumping plan. A Monte Carlo analysis was conducted in which 300 conditional test realizations were generated and aquifer simulations were run. On the basis of the number of realizations in which constraint violations occurred, the average reliability of the solution was estimated to be 0.967, with a standard deviation of 0.019. Therefore the goal was achieved of maintaining a reliability of at least 0.965 after data collection. Thus conditioning to hydraulic conductivity measurements allows more water to be extracted while maintaining the desired reliability of 0.965.

6. Summary and Conclusions [48] We demonstrated a framework that enables a water manager to determine optimal water production plans and data collection that maximize net profit. The optimal data collection program is aimed at reducing the predictive uncertainty of the simulation model in such a manner that only the most valuable data are collected. Value is not measured in terms of reduced uncertainty, but in terms of the consequent expected monetary profit to the water manager. Driving the optimal sampling program and groundwater resources management plan is the fact that spatial variability of hydraulic conductivity exists and is unknown. However, zones targeted for sampling did not coincide with regions with the highest estimation variance given by kriging. [49] The approach unites a multiple realization stochastic groundwater management model and Bayesian decision analysis to evaluate the worth of data aimed at maximizing the profit of aquifer production in ecologically sensitive areas. The method was demonstrated with a hypothetical unconfined aquifer system similar to systems in Florida.

11 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

Through successive sampling campaigns, net profit was increased, while maintaining the reliability of the optimal simulation optimization solution at 0.965. The net profit was maximized by engaging in 5 sequential measurement campaigns. Zones with the largest expected increase in profit from water production were targeted until measurements were no longer cost effective. Additional measurements would cost more than the expected profit it was expected to produce. Additional data collection beyond five rounds was shown not to be cost effective. Zones in the neighborhood of the pumping wells consistently showed the highest expected sampling benefit aimed at the safe increase in pumping. The water manager achieved nearly a 50% return on investment by engaging in the optimal sequence of data collection. Moreover, the additional data resulted in an improved representation of aquifer properties and a better model of system behavior, and in the maintenance of wetland viability. [50] Acknowledgments. We are grateful to Michael Saunders at Stanford Department of Management Science and Engineering for his advice on the optimization part of the work. The authors wish to thank Keith Beven and Jim Freer for the access to the parallel system at the Institute of Environmental Sciences at Lancaster University, UK. The first author wishes to acknowledge the Fund for Scientific Research - Flanders for providing a Postdoctoral Fellowship and a mobility grant. We gratefully acknowledge National Science Foundation grant EAR-0337393. We also thank three anonymous reviewers and the associate editor for their review comments.

References Ahlfeld, D. P., and M. Heidari (1994), Applications of optimal hydraulic control to ground-water systems, J. Water Resour. Plann. Manage., 120(3), 350 – 365. Ben-Zvi, M., B. Berkowitz, and S. Kessler (1988), Preposterior analysis as a tool for data evaluation: Application to aquifer contamination, Water Resour. Manage., 2, 11 – 20. Carrera, J., E. Usunoff, and F. Szidarovsky (1984), A method for optimal observation network design for groundwater management, J. Hydrol., 73, 147 – 163. Chan, N. (1993), Robustness of the multiple realization method for stochastic hydraulic aquifer management, Water Resour. Res., 29(9), 3159 – 3167. Criminisi, A., T. Tucciarelli, and G. P. Karatzas (1997), A methodology to determine optimal transmissivity measurement locations in groundwater quality management models with scarce field information, Water Resour. Res., 33(6), 1265 – 1274. Davis, D. R., and W. M. Dvoranchik (1971), Evaluation of the worth of additional data, Water Resour. Bull., 7(4), 700 – 707. Davis, D. R., C. C. Kisiel, and L. Duckstein (1972), Bayesian decision theory applied to design in hydrology, Water Resour. Res., 8(1), 33 – 41. Delhomme, J. P. (1978), Kriging in the hydrosciences, Adv. Water Res., 1(5), 251 – 266. Feyen, L., and S. M. Gorelick (2004), Reliable groundwater management in hydroecologically sensitive areas, Water Resour. Res., 40, W07408, doi:10.1029/2003WR003003. Feyen, L., P. J. Ribeiro Jr., F. De Smedt, and P. J. Diggle (2002), Bayesian methodology to stochastic capture zone determination: Conditioning on transmissivity measurements, Water Resour. Res., 38(9), 1164, doi:10.1029/2001WR000950. Freeze, R. A., J. Massmann, L. Smith, T. Sperling, and B. James (1990), Hydrogeolocical decision analysis, 1. A framework, Ground Water, 28(5), 738 – 766. Freeze, R. A., B. James, J. Massmann, T. Sperling, and L. Smith (1992), Hydrogeological decision analysis, 4, The concept of data worth and its use in the development of site investigation strategies, Ground Water, 30(4), 574 – 588. Gates, J. S., and C. C. Kisiel (1974), Worth of additional data to a digital computer model of a groundwater basin, Water Resour. Res., 10(5), 1031 – 1038.

W03019

Gill, P. E., W. Murray, and M. A. Saunders (2002), SNOPT: An SQP algorithm for large-scale constrained optimization, SIAM J. Optim., 12(4), 979 – 1006. Gorelick, S. M. (1983), A review of distributed parameter groundwater management modeling methods, Water Resour. Res., 19(2), 305 – 319. Gorelick, S. M. (1987), Sensitivity analysis of optimal groundwater contaminant capture curves: Spatial variability and robust solutions, in Solving Ground Water Problems With Models: An Intensive ThreeDay Conference and Exposition Devoted Exclusively to Ground Water Modeling, pp. 133 – 146, Natl. Water Well Assoc., Dublin, Ohio. Gorelick, S. M. (1990), Large-scale nonlinear deterministic and stochastic optimization: Formulations involving simulation of subsurface contamination, Math. Programm., 48, 19 – 39. Grosser, P. W., and A. S. Goodman (1985), Determination of groundwater sampling frequencies through Bayesian decision theory, Civ. Eng. Syst., 2, 186 – 194. Harbaugh, A. W., E. R. Banta, M. C. Hill, and M. G. McDonald (2000), MODFLOW-2000, U.S. Geological Survey modular ground-water model—User guide to modularization concepts and the ground-water flow process, U.S. Geol. Surv. Open File Rep., 00-92, 121 pp. James, B. R., and R. A. Freeze (1993), The worth of data in predicting aquitard continuity in hydrogeological design, Water Resour. Res., 29(7), 2049 – 2065. James, B. R., and S. M. Gorelick (1994), When enough is enough: The worth of monitoring data in aquifer remediation design, Water Resour. Res., 30(12), 3499 – 3513. Journel, A. G. (1993), Resampling from stochastic simulations, Environ. Ecol. Stat., 1, 63 – 83. Kaunas, J. R., and Y. Y. Haimes (1985), Risk management of groundwater contamination in a multiobjective framework, Water Resour. Res., 21(11), 1721 – 1730. Kitanidis, P. K. (1986), Parameter uncertainty in estimation of spatial functions: Bayesian analysis, Water Resour. Res., 22(4), 499 – 507. Loaiciga, H. A. (1989), An optimization approach for groundwater quality monitoring network design, Water Resour. Res., 25(8), 1771 – 1782. Maddock, T., III (1973), Management model as a tool for studying the worth of data, Water Resour. Res., 9(2), 270 – 280. Massmann, J., and R. A. Freeze (1987a), Groundwater contamination from waste management sites: The interaction between risk-based engineering design and regulatory policy: 1. Methodology, Water Resour. Res., 23(2), 351 – 367. Massmann, J., and R. A. Freeze (1987b), Groundwater contamination from waste management sites: The interaction between risk-based engineering design and regulatory policy: 2. Results, Water Resour. Res., 23(2), 368 – 380. Massmann, J., R. A. Freeze, L. Smith, T. Sperling, and B. James (1990), Hydrogeolocical decision analysis, 2. Applications to ground water contamination, Ground Water, 29(4), 536 – 548. Morgan, D. R., J. W. Eheart, and A. J. Valocchi (1993), Aquifer remediation design under uncertainty using a new chance constrained programming technique, Water Resour. Res., 29(3), 551 – 561. Reichard, E. G., and J. S. Evans (1989), Assessing the value of hydrogeological information for risk-based remedial action decisions, Water Resour. Res., 25(7), 1451 – 1460. Rouhani, S. (1985), Variance reduction analysis, Water Resour. Res., 21(6), 837 – 846. Southwest Florida Water Management District (1999), Establishment of minimum levels in palustrine cypress wetlands, in northern Tampa Bay minimum flows and levels for isolated wetlands, category 1 and 2 lakes, seawater intrusion, environmental aquifer levels, and Tampa Bypass Canal, report, Brooksville, Fla. Southwest Florida Water Management District (2003), 2001 estimated water use in the Southwest Florida Water Management District, report, Brooksville, Fla. (Available at http://www.swfwmd.state. fl.us/documents/reports/files/2001%20Estimated%20Water%20 Use%20.pdf) Tucciarelli, T., and G. Pinder (1991), Optimal data acquisition strategy for the development of a transport model for groundwater remediation, Water Resour. Res., 27(4), 577 – 588. Wagner, B. J. (1995a), Sampling design methods for groundwater modeling under uncertainty, Water Resour. Res., 31(10), 2581 – 2591. Wagner, B. J. (1995b), Recent advances in simulation-optimization groundwater management modeling, Rev. Geophys., 33, 1021 – 1028.

12 of 13

W03019

FEYEN AND GORELICK: WORTH OF DATA FOR GROUNDWATER MANAGEMENT

Wagner, B. J. (1999), Evaluating data worth for ground-water management under uncertainty, J. Water Resour. Plann. Manage., 125(5), 281 – 288. Wagner, B. J., and S. M. Gorelick (1989), Reliable aquifer remediation in the presence of spatially variable hydraulic conductivity: From data to design, Water Resour. Res., 25(10), 2211 – 2225. Wagner, J. M., U. Shamir, and H. R. Nemati (1992), Groundwater quality management under uncertainty: Stochastic programming approaches and the value of information, Water Resour. Res., 28(5), 1233 – 1246. Yeh, W. W.-G. (1992), Systems analysis in ground-water planning and management, J. Water Resour. Plann. Manage., 118(3), 224 – 237.

W03019

Yokota, F., and K. Thompson (2004), Value of information literature analysis: A review of applications in health risk assessment, Med. Decis. Mak., 24, 287 – 298.

L. Feyen, Land Management Unit, Institute for Environment and Sustainability, European Commission, DG Joint Research Centre, TP261, I-21020 Ispra (Va), Italy. ([email protected]) S. M. Gorelick, Department of Geological and Environmental Sciences, Stanford University, Stanford, CA 94305-2115, USA. ([email protected] stanford.edu)

13 of 13