Spatial sampling design for prediction taking ... - Spatial Accuracy

1 downloads 0 Views 277KB Size Report
design, - we call it here experimental design, spatial sampling design becomes so ... 2 The Experimental Design Problem for Bayesian Linear Regression.
7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Edited by M. Caetano and M. Painho.

Spatial sampling design for prediction taking account of uncertain covariance structure Jürgen Pilz1 and Gunter Spöck2 1 Universität Klagenfurt, Institut für Mathematik Universitätsstraße 65-67, 9020 Klagenfurt, Austria Tel.: + 0043 463 2700-3113; Fax: + 0043 463 2700-3199 [email protected] 2 Universität Klagenfurt, Institut für Mathematik Universitätsstraße 65-67, 9020 Klagenfurt, Austria Tel.: + 0043 463 2700-3125; Fax: + 0043 463 2700-3199 [email protected]

Abstract This paper presents a model-based approach to the problem of the optimal choice of a spatial design in the presence of uncertainty about the distribution of the observations, a topic which has received only little attention in the geostatistics literature so far. In spatial sampling one usually starts with an initial design to estimate the covariance function, such a design is nonmodel-based and chosen e.g. according to principles of deterministic sampling, cluster sampling, simple random and stratified sampling etc. The basic difficulty for rigorous modelbased approaches to spatial sampling is the fact that the spatial correlatedness of the observations leads to analytically intractable design criteria, whereas for linear regression models with uncorrelated observations one obtains well-tractable objective functions which can be optimized using a rich and fully developed methodology from convex analysis. After briefly reviewing the “classical” experimental design theory approach to regression models with uncorrelated observations, we will show how this approach can be extended to spatial sampling with correlated errors, using an approximation of the random field by linear regression models with random coefficients. In this context, we also present a useful algorithm for the iterative generation of an optimal design for spatial prediction. The results are illustrated by means of a real data set of Cs137 measurements.

Keywords: spatial model-based design, experimental design theory, uncertain covariance function, Bayes kriging

1 Introduction A theme, which in our opinion receives too little attention in the spatial statistics literature, is spatial sampling design, - the problem of optimal allocation of sampling points to spatial coordinates in order to improve spatial estimation and prediction. The reason for this neglection is surely the complicatedness of the mathematical problems involved in spatial sampling design and the planning of monitoring networks. In contrast to non-spatial sampling design, - we call it here experimental design, spatial sampling design becomes so complicated, because spatial observations are correlated. In contrast to the spatial sampling design problem the experimental design problem for linear regression models with uncorrelated errors may be considered as solved, since the pioneering works of Kiefer (1959) and Fedorov (1972) and its advances. In this paper we will show that every isotropic random field can be approximated arbitrarily close by a Bayesian linear regression problem. The well-developed experimental design theory

109

7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Edited by M. Caetano and M. Painho.

for Bayesian linear regression can thus be used to calculate spatial sampling designs for random fields. On the basis of this result we develop an algorithm for the iterative generation of an optimal design for spatial prediction. We illustrate our findings and the procedure for generating optimal spatial designs by means of a data set of Cs137 measurements taken from a monitoring network in the region of Gomel (Belarus).

2 The Experimental Design Problem for Bayesian Linear Regression In this section, we will give a survey of the theory of optimal design for regression models with uncorrelated errors, since we will later show, that this theory can easily be used to solve spatial sampling design problems with correlated errors. As is shown in the next section, random fields can easily be approximated by linear regression models with stochastic coefficients with fixed mean and covariance structure. To be specific, we consider here the following Bayesian linear regression model

Y ( x ) = f ( x ) T β + ε ( x) , where we assume the error process ε ( x) to have expectation zero and be uncorrelated with fixed variance

var(Y ( x)) = σ 2 . There upon we assume to have prior knowledge about the trend parameter vector β in the form of a known prior mean E (β) = µ and a known a priori covariance matrix

cov(β) = σ 2Φ . Then the best linear affine predictor of the response surface at an unmeasured location x0 is given by

Yˆ ( x 0 ) = f ( x 0 ) T βˆ BK where βˆ BK = (FT F + Φ −1 ) −1 (FT Ydat + Φ −1µ )

is the a posteriori mean of the trend parameter vector β under the Gaussian assumption for the prior distribution and for the errors. The matrix F, known as design matrix, is made up of the components of f at n given design points x1 ,..., xn . Ydat denotes the vector of observations at these points. The total mean square error reads

E ((Y ( x0 ) − Yˆ ( x0 )) 2 ) = σ 2 [1 + f ( x0 )T (Φ −1 + FT F) −1 f ( x0 )]

110

7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Edited by M. Caetano and M. Painho.

We write vn = ( x1 , x2 ,..., xn ) for the exact design and F (vn ) = F to stress, that the above expression is dependent on the selected design points xi . Bayesian experimental design then tries to select the points in

vn

in such a way, that the expression

T T −1 −1 2 2 ∫ E ((Y ( x0 ) − Yˆ ( x0 )) ) P(dx0 ) = ∫ σ [1 + f ( x0 ) (Φ + F F) f ( x0 )]P(dx0 )

X

X

is minimised, where X is the design region and

P(dx0 )

is a fixed prespecified probability

measure on X , weighting the importance of the prediction points. Obviously this is a discrete optimization problem and is therefore very complicated to solve numerically. The matrix

M b (v n ) =

1 T (F (v n )F (v n ) + Φ −1 ) n

is called Bayesian information matrix. Its inverse is proportional to the expectation of the a posteriori covariance matrix of β . Assuming that k different design points x1 , x2 ,..., xk with k

multiplicities n1 , n2 ,..., nk , ∑ ni = n are contained in vn we may write i =1

k n 1 T F (vn )F(vn ) = ∑ i f ( xi )f ( xi )T n i =1 n

The proportions

ni n

may be interpreted as probabilities or intensities with which the different

design points xi are selected. This interpretation is the key idea to define continuous design measures ξ ( dx) on

X . A continuous design measure is just a probability measure on X ,

and is a generalization of exact design measures

ξ ( x1 ) =

n n1 n , ξ ( x 2 ) = 2 ,..., ξ ( x k ) = k n n n

k

∑ ni = n, x1 ≠ x2 ≠ ... ≠ xk , ni ∈ N . i =1

Generalizing further, we can also define a continuous Bayesian information matrix

1 M B (ξ ) = ∫ f ( x) T f ( x)ξ (dx) + Φ −1 n X The corresponding continuous design problem reads T −1 . ∫ f ( x) M B (ξ ) f ( x) P(dx) → min ξ ∈Ξ

X

Here Ξ is the set of all probability measures on the set X supposed to be compact. Defining

U = ∫ f ( x)f ( x)T P (dx) X

111

7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Edited by M. Caetano and M. Painho.

the above minimization problem may be written

Ψ (ξ ) = tr (UM B (ξ ) −1 ) → min ξ ∈Ξ

where tr (.) is the trace operator. It may be shown that the set of all information matrices on

Ξ is convex and compact. Furthermore, it is possible to show that all functionals of the above form are convex and continuous in M B (ξ ) and

ξ . The above design functional Ψ (ξ )

thus attains its minimum at a design ξ ∈ Ξ . *

3 Iteration Procedures for Determining Exact Designs We are now going to formulate iteration procedures for the construction of approximately optimal exact designs. Contrary to the construction of optimal discrete designs, which, among other design algorithms, are discussed in Pilz (1991), in this case we cannot prove convergence of the exact designs to the function value of an optimal exact design; we can only guarantee stepwise improvement of a given exact starting design, i.e. the sequence of function values decreases monotonically. We will give here two iterative algorithms, where the first serves for the construction of an initial design. 3.1 Generation of an initial design

Step 1. Choose x1 ∈ X such that

x1 = arg inf ψ (M B ( x)) x∈X

and set v1 = x . Step 2. Beginning with i = 1, find xi +1 such that

xi +1 = arg inf Ψ (M B (vi + ( x))) x∈ X

and form vi +1 = vi + ( xi +1 ) Continue with i replaced by i + 1 until i + 1 = n . Step 3. If i + 1 = n then stop and take vn = ( x1 ,..., xn ) as an initial design. For our design functional the different steps above read: Step 1. x1 = arg sup x∈ X

Step 2.

112

f ( x ) T Φ UΦ f ( x ) 1+ f ( x ) T Φ f ( x )

7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Edited by M. Caetano and M. Painho.

xi +1 = arg

f ( x)T M B (vi ) −1 UM B (vi ) −1 f ( x) , i = 1,2,..., n − 1 i + f ( x )T M B (vi ) −1 f ( x) 1≤ j ≤ n +1

sup

3.2 Exchange Type Algorithm The second algorithm is an exchange type algorithm improving n-point designs and starting from an initial design.

Step 1. Use some initial design vn ,1 = ( x1,1 ,..., xn ,1 ) ∈ X n of size n . Step 2. Beginning with s = 1 , form the design vn+1,s = vn ,s + ( xn+1,s ) by adding the point

xn+1,s = arg inf Ψ (M B (vn,s + ( x ))) x∈X

to vn , s . *

Then form vnj,s = vn+1,s − ( x j* ,s ) by deleting that point x j ,s from vn+1,s for which *

Ψ (M B (vnj,s )) =

j min Ψ (M B (vn,s ))

j∈{1,..., n +1}

Step 3. Repeat Step 2 until the point to be deleted is equivalent to the point to be added. For our design functional step 2 is determined as follows: x n + 1, s = arg sup x∈ X

j * = arg

min 1≤ j ≤ n +1

f ( x ) T M B ( v n , s ) − 1 UM B ( v n , s ) − 1 f ( x ) n + f ( x ) T M B ( v n , s ) −1 f ( x )

f ( x j , s )T M B (vn +1, s ) −1 UM B (vn +1, s ) −1 f ( x j , s ) n + 1 − f ( x j , s )T M B (vn +1, s ) −1 f ( x j , s )

4 Approximating Spatial Sampling Designs by Experimental Designs 4.1 The Spatial Linear Model In this section we will show that every isotropic random field can be approximated arbitrarily close by a Bayesian linear regression problem. The experimental design theory for Bayesian linear regression can thus be used to calculate spatial sampling designs for random fields. We will make use of the so-called polar spectral representation of isotropic random fields (Yaglom, 1987).

Here, we consider the random field {Y ( x ), x ∈ S ⊂ R d } to be decomposed as

Y ( x ) = f ( x )T β + ε ( x ) where

113

7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Edited by M. Caetano and M. Painho.

E (Y ( x) | β) = f ( x)T β Moreover we assume to have prior knowledge about the trend parameter vector in the form

E (β) = µ

cov(β) = Φ > 0 For the error process we assume

E (ε ( x) | β) = 0 and that its covariance function

K ( x1 , x2 ) = cov(ε ( x1 ), ε ( x2 )), x1 , x2 ∈ S is known. The set S ⊂ R d , d ≥ 2 , over which the random field is defined, is assumed to be compact. We furthermore assume the error process ε ( x) to be mean square continuous, which is equivalent to the continuity of the covariance function on S × S (no nugget effect). Mean square continuity and the compactness of S are the main assumptions for the decomposability of the error process by means of the Karhunen-Loeve expansion and the polar spectral representation. 4.2 Polar Spectral Representation of Isotropic Random Fields In practice the approach to the approximation of the random field by means of the KarhunenLoeve expansion will be very complicated, since generalized eigenproblems have to be solved. The eigenfunctions will very strongly oscillate, since they contain all the variability of the error process. Thus, in general we can not expect to find numerically reliable algorithms for the solution of these eigenvalue problems. This makes the Karhunen-Loeve approach useless for the spatial sampling design problem in practical applications. Another approach to the sampling design problem, our approach, is based on the so-called polar spectral representation of isotropic random fields.

For simplicity, here we restrict ourselves to isotropic random fields {ε ( x), x ∈ S ⊂ R 2 } , but all main ideas of this subsection may be generalized to isotropic random fields over

R d , d ≥ 2 . According to Yaglom (1987), every isotropic, mean square continuous random field over S ⊂ R 2 has the following spectral representation for the covariance function: ∞

K ( x1 , x2 ) = ∫ J 0 ( x1 − x2 ω )dG (ω ), x1 , x2 ∈ S 0

where J 0 (.) is the Bessel function of the first kind and order

0 , x1 − x2

is the Euclidean

distance between x1 and x2 , and G (ω ) is the so-called polar spectral distribution function. Like any distribution function, G (ω ) is positive, monotonically increasing and bounded from

114

7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Edited by M. Caetano and M. Painho.

above. On the other side, knowing only the isotropic covariance function C (h) of the random field in R 2 , its 2-dimensional spectral distribution function may be written

G (ω +) − G (ω −) ∞ C ( h) = ∫ J 1 (hω )hω dh h 2 0 Approximating the polar spectral distribution function G (ω ) by means of a step function with positive jumps δ i = G (ωi +1 ) − G (ωi ) at the points ωi +1 , i = 1,2,..., n , the polar spectral representation theorem of isotropic processes tells us, that the error process may be written in polar coordinates (radius , angle) = (t , ϕ ) approximately as M

n

M

n

m =0

i =1

m=1

i =1

ε (t , ϕ ) ≈ ∑ {cos(mϕ )∑ J m (ωi t )U m(1,)i } + ∑ {sin( mϕ )∑ J m (ωi t )U m( 2,i) } , where all the random variables U m( j,i) are uncorrelated, and their variances are given by

var(U m( j,i) ) = d mδ i , with d m = 1 if m = 0 and d m = 2 if m > 0 . Dependent on how well the step function approximates the polar spectral distribution function, the stochastic process may be approximated arbitrarily close by means of the above expression. Thus, by truncating the above series at a sufficiently large m = M , we get an approximation of the isotropic error process by means of a Bayesian linear regression model. It may be shown that the trend prediction in the approximating regression model approximates the Bayesian kriging predictor in the original model. The same is true for the total mean square error. Thus, we have the important result, that the trend estimation in the approximating Bayesian linear model approximates Bayesian kriging in the original model. 4.3 The Approximation of the Spatial Sampling Design Problem by means of a Bayesian Linear Design Problem We now come to discuss spatial sampling design. Here, most often one wants to minimize the average of the Bayesian kriging total mean square error over the area of interest. Using, instead of the integrated total mean square error, its approximation from the approximating regression model, we get

⎛ f ( x0 ) ⎞ ∫S ⎜⎜⎝ g ( x0 ) ⎟⎟⎠

T

⎛ ⎜ (F | G )T (F | G ) + σ 2 ⎛⎜ Φ 0⎜ ⎜ ⎝0 ⎝

0⎞ ⎟ Λ ⎟⎠

−1

⎞ ⎛ f ( x0 ) ⎞ ⎟⎜ ⎟ dP ( x0 ) → min ⎟⎜⎝ g ( x0 ) ⎟⎠ vn ⎠

This resembles a standard Bayesian experimental design problem, and we have the important result, that we can use standard Bayesian experimental design theory as described in section 2 to calculate spatial sampling designs for the Bayesian kriging predictor. Here the design matrix G and the vector of functions g ( x0 ) contain all the regression functions from the approximating regression model, Λ gives the variances of the random amplitudes and σ 02 is the variance of the uncorrelated errors of the approximating regression model.

115

7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Edited by M. Caetano and M. Painho.

5 Example Now we are going to discuss an example for spatial sampling design by means of approximating the isotropic random function on the basis of the polar spectral representation. We use the so-called Gomel data set, a set of Cs137 measurements taken from a monitoring network in the region of Gomel (Belarus), an area in the neighbourhood of Chernobyl. From the 591 sample locations we randomly selected 80 data points, logtransformed the corresponding Caesium137 concentrations and calculated a weighted least squares fit to the empirical variogram of the logtransformed concentrations. Figure 1 shows this variogram and a worst case approximation to this variogram based on the representation of the approximating covariance function. Here we have selected the largest angular frequency M = 35 and the radial frequencies ωi = 0.0027 * 1.1i −1 − 0.0027, i = 1,2,...,68 . The 2-dimensional spectral distribution function was approximated at these frequencies ωi by a step function and the stochastic process was approximated as well by Cosine-Sine-Bessel surface harmonics with random amplitudes. The random field considered was part of a 300 × 300 square area with origin (0,0) . The worst case approximation to the variogram was calculated along the line of points ( x,150),−150 ≤ x ≤ 150 . From the above frequency parameters M and ωi resulted a Bayesian linear regression model with 4828 regression functions, which were used to approximate the error process. As variance for the error of the approximating Bayesian regression model we used σ 02 = 0.3 . For modeling the trend of the random function only a constant trend was used. Concerning the design problem, we were interested in minimizing the average total mean square error of prediction over a regular grid of 30 × 30 points. This approximation was used, since the integrals for calculating the matrix U in the design problem T

⎛ 1 ⎞⎛ 1 ⎞ ⎟⎟ dP ( x0 ) ⎟⎟⎜⎜ tr (UM B (ξ ) −1 ) → min , U = ∫ ⎜⎜ ξ ∈Ξ S ⎝ g ( x0 ) ⎠⎝ g ( x0 ) ⎠ here simplify to simple sums. We were interested in calculating 6-, 12-, 18-, 24-and 30-point designs, that should be added to the already available 80 sample locations. For the calculation of the desired exact designs we used a combination of the algorithms in section 3. For that purpose, in every step of the algorithm two design points were added to the already available 80 sample locations by means of the 'initial design algorithm'. Thereupon, by means of the exchange type algorithm suboptimal 80+2-, 80+4-,...80+30-point designs were generated step by step, such that in every step we got nearly optimal designs. The figures show a sample of suboptimal designs. Obviously the algorithm selects design points in such areas at first, where the sampling density is low, a feature that we expect from good design algorithms. Moreover, every design point has been selected only once, a feature that seems to be a property of the polar spectral approximation, but is not a property of the used design algorithm.

116

7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Edited by M. Caetano and M. Painho.

6 150

100

50

0

−50

−100

−150 −150

Figure 1 Covariance function and its worst case approximation.

−100

−50

20

100

100

50

50

0

0

−50

−50

−100

−100

−50

0

100

150

20 150

−100

50

Figure 2 Suboptimal 80+6 point design, Bayes Kriging.

150

−150 −150

0

50

100

−150 −150

150

Figure 3 Suboptimal 80+20 point design, Bayes Kriging.

−100

−50

0

50

100

150

Figure 4 Suboptimal 1+20 point design, Bayes Kriging.

average kriging mean square error 3.4 3.2 3 2.8 2.6 2.4 2.2 2 1.8 1.6 1.4

0

5

10

15

20

25

30

35

40

Figure 5 Average Bayes Kriging Variance.

117

7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences. Edited by M. Caetano and M. Painho.

References Bandemer, H. et al. (1977), ``Theorie und Anwendung der optimalen Versuchsplanung I'', AkademieVerlag, Berlin Fedorov, V.V. (1972), ``Theory of optimal experiments'', transl. and ed. by W.J. Studden and E.M. Klimko, Academic Press, New York, (Russian Original: Nauka, Moscow 1971) Fedorov, V.V. (1996), ``Design of spatial experiments: model fitting and prediction'', in Rao, C.R. and Gosh, J., editors, Handbook of Statistics, Volume 13, North-Holland Fedorov, V.V. and Flanagan, D. (1997), ''Optimal monitoring network design based on Mercer's expansion of the covariance kernel'', Journal of Combinatorics, Information and System Sciences, 23 Kiefer, J. (1959), ``Optimum experimental design'', Journal of the Royal Statistical Society, Ser. B 21, 272-319 Omre, H. (1987), ``Bayesian Kriging-Merging Observations and Qualified Guess in Kriging'', Mathematical Geology, V.19, 25-39 Pilz, J. (1991), “Bayesian Estimation and Experimental Design in Linear Regression Models'', Wiley, New York Yaglom, A.M. (1987), ``Correlation Theory of Stationary and Related Random Functions'', Springer, New York

118