An Experimental Comparison of Ordinary and Universal Kriging and ...

9 downloads 320 Views 127KB Size Report
Dale Zimmerman; Claire Pavlik; Amy Ruggles; Marc P. Armstrong. Article. DOI : 10.1023/A:1007586507433. Cite this article as: Zimmerman, D., Pavlik, C., ...
9812

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

[632]

(PR1)

Mathematical Geology, Vol. 31, No. 4, 1999

An Experimental Comparison of Ordinary and Universal Kriging and Inverse Distance Weighting1 Dale Zimmerman,2 Claire Pavlik,3 Amy Ruggles,4 and Marc P. Armstrong5 A factorial, computational experiment was conducted to compare the spatial interpolation accuracy of ordinary and universal kriging and two types of inverse squared-distance weighting. The experiment considered, in addition to these four interpolation methods, the effects of four data and sampling characteristics: surface type, sampling pattern, noise level, and strength of small-scale spatial correlation. Interpolation accuracy was measured by the natural logarithm of the mean squared interpolation error. Main effects of all five factors, all two-factor interactions, and several three-factor interactions were highly statistically significant. Among numerous findings, the most striking was that the two kriging methods were substantially superior to the inverse distance weighting methods over all levels of surface type, sampling pattern, noise, and correlation. KEY WORDS: geostatistics, spatial interpolation, spatial pattern, surface-fitting algorithms.

INTRODUCTION An important statistical problem in the geosciences is the estimation of a twoor three-dimensional manifold across a study region from observations taken at known locations in the region. This problem, known as spatial interpolation, has been addressed using methods that include ordinary and universal kriging, inverse distance weighting, interpolating polynomials, splines, and power and Fourier series fitting (for reviews of these and other methods, see Franke, 1Received

August 1995; accepted 21 July 1998. of Statistics and Actuarial Science, The University of Iowa, Iowa City, Iowa 52242. e-mail: [email protected] 3Department of Geography, The University of Iowa, Iowa City, Iowa 52242. e-mail: [email protected] 4Department of Geography, The University of Iowa, Iowa City, Iowa 52242. e-mail: [email protected] 5Department of Geography, and Program in Applied Mathematical and Computational Sciences, The University of Iowa, Iowa City, Iowa 52242. e-mail: [email protected] 2Department

375 0882-8121/ 99/ 0500-0375$16.00/ 0  1999 International Association for Mathematical Geology

9812

376

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

[632]

(PR1)

Zimmerman, Pavlik, Ruggles, and Armstrong

1982; Lam, 1983; Cressie, 1993). The variety of available interpolation methods has led to questions about which is most appropriate in different contexts and has stimulated several comparative studies of relative accuracy. Unfortunately, published studies are not unequivocal as to which interpolation method is most accurate. In some studies (Creutin and Obled, 1982; Tabios and Salas, 1985; Rouhani, 1986; Grimm and Lynch, 1991; Laslett and McBratney, 1990; Weber and Englund, 1994; Laslett, 1994; Phillips and others, 1997), a kriging procedure performed best, while in others (Laslett and others, 1987; Weber and Englund, 1992; Gallichand and Marcotte, 1993; Brus and others, 1996; Declercq, 1996) splines or inverse distance weighting were as good or better. Most of these comparisons were based on analyses of actual geostatistical data, such as groundwater levels (Rouhani, 1986), acid rain deposition (Grimm and Lynch, 1991), soil pH (Laslett and others, 1987; Laslett and McBratney, 1990), and land elevation and elevation variances (Weber and Englund, 1992, 1994; Wood and Fisher, 1993). Using real data rather than synthetic data has several advantages; for example, it precludes one method from having an unfair advantage merely because the data used for the comparison are generated under the same model on which the method is based. On the other hand, only with synthetic data can the effect of certain data characteristics (such as the level of noise or the strength of spatial correlation) on interpolation accuracy be systematically evaluated. The purpose of this paper is to evaluate and compare the accuracy of several interpolation methods based on an analysis of synthetic data from a computational experiment. The experiment was designed so that the effects of interpolation method and certain data features and sampling procedures, both individually and in interaction with one another, could be tested for statistical significance. The interpolation methods we studied, ordinary kriging, universal kriging, and two types of inverse squared-distance weighting, are widely used methods that were given prominence in the relatively recent comparative studies of Weber and Englund (1992, 1994) and Englund, Weber, and Leviant (1992). The present study can, in fact, be considered a synthetic-data complement and supplement to these two previous studies. Our computational experiment focused on four basic factors in addition to interpolation method: surface type, sampling pattern, noise (magnitude of error variance), and strength of spatial correlation among error terms. The three surfaces we used, a simple tilted plane, a strong peak based on a sine function, and an undulating, increasing surface which is the sum of sine and cosine functions, were selected to encompass a simplified subset of the range of surfaces encountered in environmental analyses. The surfaces share, at an abstract level, many characteristics with real environmental surfaces. As Weber and Englund (1994, p. 590) argued, physical phenomena such as hydraulic head, surface water temperature, and geologic structural surfaces often vary smoothly and continuously.

9812

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

A Comparison of Kriging and Inverse Distance Weighting

[632]

(PR1)

377

The three surfaces were sampled with four commonly used patterns so that the interaction of surface type with sampling pattern could be evaluated. In general, the surfaces from which environmental samples are taken are expected to exhibit local variation (noise); in addition, physical processes operate which often lead to small-scale spatial correlation. To simulate this in our data, we systematically controlled the level of noise in, and the strength of spatial correlation among, the error terms, thus permitting an examination of the effects of noise and spatial correlation on interpolation accuracy. The systematic design of our experiment, with sampled surfaces interpolated by ordinary and universal kriging and by two types of inverse squared-distance weighting, permits comparative evaluation of the four methods. By controlling values of the parameters discussed above, we are able to reach conclusions about the conditions under which kriging (either ordinary or universal) or inverse squared-distance weighting performs better, as well as the conditions under which particular sampling patterns may be preferred. EXPERIMENTAL PROCEDURE Many characteristics of sampling and data could conceivably affect the performance of interpolation methods. As noted in the previous section, we considered four basic factors in this investigation in addition to interpolation method: surface type, sampling pattern, magnitude of error variance, and strength of small-scale spatial correlation. In this section, we describe how the levels of these factors were selected and how our computational experiments were conducted. The experimental region for our investigation was taken to be the unit square {(x, y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. Three mathematical surfaces, henceforth called the plane, the sombrero, and Morrison’s surface, were defined on this region (Fig. 1). Each of these surfaces represents a particular kind of drift, or large-scale spatial variation. The drift functions that define the plane and the sombrero are: Plane:g(x, y) c 1.05 − x Sombrero:g(x, y) c

sin[{16(x − 0.5)}2 + {16( y − 0.5)}2 ] {16(x − 0.5)}2 + {16( y − 0.5)}2 if x ? 0.5 or y ? 0.5

c 1,

if x c y c 0.5

The drift function that defines Morrison’s surface, given in Morrison (1971) but not duplicated here, is rather complicated; it is a linear combination of 48 sine

9812

378

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

[632]

(PR1)

Zimmerman, Pavlik, Ruggles, and Armstrong

Figure 1. Mathematical surfaces used for the experiment: A, plane; B, sombrero; C, Morrison’s surface.

and cosine functions of different phase, period, and amplitude. So as to eliminate possible differences in the accuracy of interpolation methods for these surfaces caused by differences in total surface variation (maximum minus minimum), all three surfaces were scaled so that the maximum was approximately equal to 2.0 and the minimum was approximately equal to 0.0. Each surface (actually a perturbed version of each surface, as will be described subsequently) was sampled at locations whose spatial distribution was determined according to a particular spatial point process. Four types of sampling patterns were considered: a regular hexagonal grid, an inhibited pattern, a simple random pattern, and a clustered pattern. In the order just listed, these types lie on

9812

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

A Comparison of Kriging and Inverse Distance Weighting

981203ap

6/29/1999

[632]

(PR1)

379

a continuum of decreasing regularity (or increasing aggregation). The hexagonal grid comprised 99 sampling points with a grid spacing of approximately 0.104 units. The other three types of patterns all consisted of 100 sampling points. For each type, realizations were obtained using methods described by Diggle (1983), which we briefly summarize. A realization of the random pattern was obtained by generating 200 independent uniform (0, 1) random variables and pairing them as they were generated to yield 100 (x, y)-coordinates. A realization of the clustered pattern was obtained by simulating a Poisson cluster process with 20 clusters. For each cluster, the number of sampling points was an independent Poisson variate with expectation 5.0; realizations were conditioned to yield a total of 100 points. Locations of points in a cluster were independently distributed according to a uniform distribution on a circle of radius 0.10 around the cluster center, with toroidal edge correction when necessary. A realization of the inhibited sampling pattern was obtained by simulating a sequential, hard-core inhibition process: the first sampling point was uniformly generated in the unit square as for a simple random pattern, but each subsequent point was chosen according to a uniform distribution on that portion of the unit square that lay no closer than 0.04 to any previously obtained point. A realization of the hexagonal grid was obtained by locating the lower leftmost grid point at random within the smallest subregion for which the resulting grid would lie completely within the unit square. A realization of each type of sampling pattern is depicted in Figure 2. Values of the surfaces at sampling points were perturbed by adding autocorrelated errors in order to introduce some local variability. The resulting surfaces more realistically represent environmental surfaces than the smooth mathematical surfaces defined above. For the purpose of evaluating the performance of interpolation methods, errors were also generated at 400 “evaluation points” arranged in a 20 × 20 square grid with lower leftmost point at (0.025, 0.025) and grid spacing 0.05 units. Autocorrelated, normally distributed errors with mean 0 and variance j 2 were generated at the 100 (or 99, for the hexagonal patterns) sampling points and 400 evaluation points using the Cholesky decomposition method (Cressie, 1993, p. 201–203). The autocorrelated errors were generated according to the isotropic exponential covariance model—a commonly used, valid (nonnegative definite), and monotone decreasing covariance function. This model is given by C(r; j 2 , a) c j 2 e − ar where r is the (Euclidean) distance between points, j 2 > 0 is the error variance, and a is the correlation strength parameter. Two values of j 2 , 0.0036 and 0.0576, and two values of a, 4 and 10, were considered. These were chosen so as to be reasonable in relation to the total variation of each surface and to the area of

9812

380

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

[632]

(PR1)

Zimmerman, Pavlik, Ruggles, and Armstrong

Figure 2. Examples of the four types of sampling patterns used for the experiment: A, hexagonal grid; B, inhibited; C, random; D, clustered.

the experimental region. When j 2 c 0.0036, roughly 95% of the errors have magnitudes less than 2 . (0.0036)1/ 2 c 0.12 (6% of the range of the data) and fewer than 0.1% of the errors exceed 10% of the range in magnitude; thus, this choice of j 2 introduces a relatively small amount of noise into the observations. When j 2 c 0.0576, however, much more noise is introduced; only 38% of the errors have magnitudes less than 6% of the range, roughly 40% of the errors exceed 10% of the range, and about 4% of the errors exceed 50% of the range. When a c 4, errors at points separated by a distance of 0.25 units are correlated by the amount 0.37; that is, moderate correlation exists at a distance of one-

9812

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

A Comparison of Kriging and Inverse Distance Weighting

981203ap

6/29/1999

[632]

(PR1)

381

fourth the length of the experimental region’s side. When a c 10, however, the correlation is much weaker; the correlation between points 0.25 units apart is only 0.08. The experiment was carried out as follows. For each combination of surface type, sampling pattern, j 2 , and a, three realizations of the sampling pattern were obtained and then, conditional on those data locations, errors with the autocovariance structure specified by j 2 and a were generated at the 100 (or 99) data locations and 400 evaluation gridpoints simultaneously. In this way, a total of 144 (3 surface types × 4 pattern types × 3 realizations of each pattern type × 2 values of j 2 × 2 values of a) sets of 500 (or 499) errors referenced by spatial location were generated. Next, each set of errors was added to the corresponding mathematical surface at the 500 (or 499) locations, yielding a synthetic data set of 100 (or 99) observations and a “true surface” of 400 values at evaluation points. Note that we define the true surface as the mathematical surface plus autocorrelated error; since our synthetic data arose from a perturbed surface, it was most appropriate for the surface to which interpolated values were compared, to have the same degree of perturbation. Each set of 100 (or 99) observations was processed by one of four interpolation algorithms: ordinary kriging (OK), universal kriging (UK) using (as is common practice) a full second-order polynomial drift function, and inverse squared-distance weighting using the nearest six observations (ISDW-6) or the nearest 12 observations (ISDW-12). The results from these interpolations then were used to estimate the value of the true surface at each of the 400 evaluation points. An estimation error at each evaluation point and for each interpolation method was obtained by subtracting the value of the true surface at the point from the value estimated by the interpolation method. The four interpolation methods were carried out using suitable functions in S+SpatialStats: krige and predict.krige for OK and UK, and find.neighbor and functions we wrote ourselves for ISDW-6 and ISDW-12 (for details on these and related functions, see Kaluzny and others, 1997). Implementation of the two kriging procedures was preceded by the selection of a suitable semivariogram model, and the estimation of that model, specific to each dataset. This was accomplished, as is common practice, by fitting each of several semivariogram models (linear, Gaussian, spherical, and exponential) to the empirical semivariogram by weighted least squares (Cressie, 1993, sect. 2.6.2) and selecting the model with the smallest weighted residual sum of squares. The S+SpatialStatsfunctions variogram and vario.fit were useful in this regard. To measure interpolation accuracy, the natural logarithm of the mean squared error [log(MSE)], the root-mean-squared error, and the mean absolute error were computed for each dataset. In addition, the correlation between the estimated values and true values at evaluation points was computed for each dataset. In what follows, we present statistical results for log (MSE) only, as this was the only measure which residual analysis showed to be nearly normally

9812

382

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

[632]

(PR1)

Zimmerman, Pavlik, Ruggles, and Armstrong

distributed. Although the other performance measures had distributions that were not as well approximated by normality, the results for these measures were qualitatively similar to results for log (MSE). The complete experiment can be regarded as a randomized factorial experiment with five factors: interpolation method, surface type, sampling pattern, noise level (j 2 ), and correlation strength (a). Each cell of the five-way layout contained three replicates (calculated from the synthetic datasets corresponding to the three realizations of each sampling pattern). An experimental measurement was obtained by: (1) comparing the interpolated surface to the true surface for a dataset at the 400 evaluation points, in the manner described previously, and (2) reducing the 400 interpolation errors to a single numerical performance measure [log(MSE)]. The design yielded 576 such measurements. Accordingly, our analysis of the experiment is based on the model log(MSE )ijklmn c f (m i , t j , pk , n l , am ) + e ijklmn (i c 1, 2, 3, 4; j c 1, 2, 3; k c 1, 2, 3, 4; l c 1, 2;

m c 1, 2;

n c 1, 2, 3)

where log(MSE)ijklmn is log (MSE) for the ith interpolation method and for the nth dataset at levels j, k, l, m of the other four factors; f is a specified function (discussed further below); m i is the effect of interpolation method i (i c 1, 2, 3, 4 for OK, UK, ISDW-6, or ISDW-12); t j is the effect of surface type j ( j c 1, 2, 3 for plane, sombrero, or Morrison’s surface); pk is the effect of sampling pattern k (k c 1, 2, 3, 4 for hexagonal, inhibited, random, or clustered); n l is the effect of the lth level of noise (l c 1 or 2 for j 2 c 0.0036 or 0.0576); am is the effect of the mth level of correlation (m c 1 or 2 for a c 4 or 10); e ijklmn is the residual for the nth data set at levels i, j, k, l, m of the five factors; these residuals are assumed to be independently and identically distributed normal random variables with mean zero and positive variance. The main-effects version of this model takes f to be a simple additive function of the five factor effects; that is, f (m i , t j , pk , n l , am ) c m i + t j + pk + n l + am . We found the most useful version, however, to be the model with main effects and two- and three-factor interactions. RESULTS Table 1 gives the analysis-of-variance table that corresponds to the prescribed model for log (MSE) with main effects and two- and three-factor interaction effects. The model described a large portion (96.7%) of the total variation, and a standard Shapiro-Wilks test could not reject the null hypothesis of normally-distributed residuals (P c 0.18). All main effects and two-factor interactions were highly significant (P < 0.0001). Also highly significant were three of the ten three-factor interactions. Further analysis, based on a model which

9812

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

[632]

A Comparison of Kriging and Inverse Distance Weighting

(PR1)

383

Table 1. ANOVA for the Experiment, Based on a Model for Log (MSE) with Main Effects and Two- and Three-Factor Interactionsa Source Method (M) Surface (S) Pattern (P) Noise (N) Correlation (C) M× S M× P M× N M× C S× P S× N S× C P× N P× C N× C M× S× P M× S× N M× S× C M× P× N M× P× C M× N× C S× P× N S× P× C S× N× C P× N× C Residual Total aHighly

df

MS

F

Pr > F

3 2 3 1 1 6 9 3 3 6 2 2 3 3 1 18 6 6 9 9 3 6 6 2 3 459 575

27.37 117.04 46.69 213.70 18.49 0.46 0.23 7.25 0.62 1.79 29.49 1.58 3.00 0.59 3.17 0.48 0.62 0.05 0.05 0.02 0.00 0.31 0.02 0.07 0.17 0.06

449.67 1923.17 767.11 3511.25 303.88 7.49 3.78 119.11 10.22 29.42 484.49 25.92 49.35 9.75 52.14 7.84 10.24 0.78 0.81 0.35 0.02 5.11 0.36 1.23 2.73

0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.0001* 0.59 0.61 0.96 0.99 0.0001* 0.90 0.29 0.04

significant terms are indicated with an asterisk. R2 c 0.97.

included four-factor interaction effects, indicated that all interactions of order four were not significant. Although the presence of three-factor interactions can, in general, obfuscate the interpretation of two-factor interactions and main effects, an examination of the three-way means corresponding to the significant three-factor interactions (not shown) revealed that in this case they do not. That is, the lack of additivity indicated by the significant three-factor interactions was due to differences in the relative magnitudes, but not to differences in the relative rankings, of log (MSE) over the three-factor combinations. Moreover, the three-factor interactions were much smaller in magnitude than the lower-order effects. Consequently, the significance of several three-factor interactions here does not compromise conclusions drawn from examining main effects and two-factor interactions only.

9812

384

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

[632]

(PR1)

Zimmerman, Pavlik, Ruggles, and Armstrong

Figure 3. Interaction plots for the significant two-factor interactions: A, Method × Surface; B, Method × Pattern; C, Method × Noise; D, Method × Correlation; E, Surface × Pattern; F, Sur-

9812

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

A Comparison of Kriging and Inverse Distance Weighting

981203ap

6/29/1999

[632]

(PR1)

385

Figure 3. (Continued) face × Noise; G, Surface × Correlation; H, Pattern × Noise; I, Pattern × Correlation; J, Noise × Correlation. Plots show the sample means of log (MSE) for one factor at each level of the other factor. Means that correspond to the same level of the first factor are connected by line segments. Horizontal bars represent values obtained by adding and subtracting twice the standard error from the mean.

Figure 3 displays interaction plots corresponding to all two-factor interactions. The plots indicate that, in most cases, the significance of a two-factor interaction is caused by a lack of additivity (parallelism) between the kriging profiles and the ISDW profiles. There are only two instances in which profiles cross: (1) in the Method × Surface plot, where OK was superior to UK for the sombrero and Morrison’s surface, but vice versa for the plane; (2) in the Method × Pattern plot, where OK was superior to UK for the clustered, inhibited, and hexagonal patterns but vice versa for the random pattern. Apart from these exceptions, the significance of the two-factor interactions, like that of the three-factor interactions, appears to be due to differences in relative magnitude rather than in relative ranking of log (MSE) over the two-factor combinations; similarly, the two-factor

9812

386

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

[632]

(PR1)

Zimmerman, Pavlik, Ruggles, and Armstrong

Figure 4. Sample one-factor means of log (MSE). Horizontal bars represent means and values obtained by adding and subtracting twice the standard error from the mean.

interactions are much smaller than the main effects. Consequently, apart from the two exceptions noted, the significance of the two-factor interactions does not materially affect conclusions drawn from examining main effects only. Figure 4 displays the one-factor means of log (MSE), averaged over all levels of the other factors, and their standard errors. The information in this figure can be summarized as follows: 1. The kriging methods performed much (20–30%) better than the two inverse squared distance weighting procedures. Furthermore, OK outperformed UK slightly and ISDW-6 outperformed ISDW-12 slightly. 2. The performance of each interpolation method was significantly better for the plane than for the sombrero, which in turn yielded better performance than Morrison’s surface. 3. The performance of each interpolation method deteriorated significantly as the sampling pattern changed from hexagonal, to inhibited, to random, and finally to clustered. 4. The performance of each interpolation method was significantly better when j 2 c 0.0036 than when j 2 c 0.0576. That is, interpolation accuracy improved when the data were not as noisy. 5. The performance of each interpolation method was significantly better when a c 4 than when a c 10. That is, interpolation accuracy improved when the data were more highly spatially correlated.

9812

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

A Comparison of Kriging and Inverse Distance Weighting

981203ap

6/29/1999

[632]

(PR1)

387

DISCUSSION Several important points emerge from the results of the previous section. Because the primary objective of this investigation was to compare interpolation methods, we first consider the effect of interpolation method on accuracy [as measured by log (MSE)]. The most striking result along these lines was that the two kriging methods consistently and substantially outperformed the two inverse distance weighting methods over all levels of the other factors. This agrees with the findings of several authors who have made comparisons among interpolation methods, but contradicts the findings of several other such authors (refer to the Introduction or to Cressie, 1993, section 5.9.3, for a list of relevant literature). We make no claim that our study completely lays this issue to rest; however, we do find the consistent superior performance of kriging over various types of surfaces and sampling patterns and various levels of noise and spatial correlation in our study rather more compelling than findings from less comprehensive studies. Of course, kriging has several other advantages to inverse distance weighting that are not performance-related. These include the capability of adapting easily to the estimation of spatial averages (block kriging) and the provision of a measure of variability of interpolated values (the kriging variance). A second point pertaining to the comparison of interpolation methods is that although the kriging methods were consistently superior to inverse squareddistance weighting over all levels of the other factors, the extent of superiority was affected by the other factors. More specifically, the disparity in performance between the two types of methods tended to be greatest at those levels of the other factors at which all four methods performed best. For example, in comparing interpolation methods over the three surfaces, the relative superiority of kriging was greatest for the plane and smallest for Morrison’s surface: these are the surfaces for which all four methods do best and worst, respectively. Similarly, the relative superiority of kriging was greatest when the sampling pattern was most regular, noise was low, and spatial correlation was high. Thirdly, consider comparisons of the two kriging methods with each other and the two inverse distance weighting methods with each other. It was expected that UK would outperform OK for the plane because the noiseless version of this surface can be fit exactly by the second-order polynomial function employed by UK (in fact a first-order polynomial would suffice). And indeed this was so, although the difference in performance was not statistically significant. Nor was the difference significant for Morrison’s surface. For the sombrero, however, OK performed significantly better than UK. This last result is interesting, for it indicates that for some types of surfaces it is better to completely ignore the modeling of trend (as OK does) than to model trend inappropriately (as, apparently, UK does for the sombrero). This issue requires further investigation; see also the relevant work of Journel and Rossi (1989).

9812

388

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

[632]

(PR1)

Zimmerman, Pavlik, Ruggles, and Armstrong

As regards the significant Method × Pattern interaction, we found that OK performed better than UK for the clustered pattern, whereas OK and UK were not significantly different for the other three patterns. A plausible explanation for the superiority of OK for clustered patterns is elusive; however, it can be argued that the issue is of little practical importance because overall performance was so much worse for clustered patterns that they should be avoided if at all possible. ISDW-6 consistently performed better than ISDW-12 across all levels of the other factors, and in most cases the difference, though only on the order of 5–10% of the larger log (MSE), was statistically significant. Thus, if an investigator insists on interpolating a surface using inverse squared-distance weighting, we recommend that the number of neighbors, k, be six rather than 12. This recommendation is consistent with those of Morrison (1974), MacDougall (1976), Peucker (1980), and Hodgson (1992) who recommended, respectively, 3 ≤ k ≤ 7, 6 ≤ k ≤ 9, k ≤ 6, and 4 ≤ k ≤ 7. It differs somewhat, however, from that of Declercq (1996), who recommended 4 ≤ k ≤ 8 for “smooth” surfaces and 16 ≤ k ≤ 24 for abruptly changing surfaces. Note that the larger the value of k, the smoother the interpolated surface. Although the primary objective of this investigation was to compare interpolation methods, the effects of the other factors on overall interpolation performance were also of some interest. The significance of the main effects and the relatively small magnitude of the interactions allow us to make several points and to note some aspects that require further investigation. First, our results indicate that overall interpolation performance deteriorated consistently and significantly as the sampling pattern varied from the most regular pattern, hexagonal, to the next most regular, inhibited, to a random pattern, and finally to the most irregular pattern, clustered. A similar phenomenon has been documented previously (Morrison, 1971; MacEachern and Davidson, 1987; Englund, Weber, and Leviant, 1992). These results also comport with spatial design theory, in which systematic sampling is known to be generally superior (see, for example, Haining, 1990, Chapter 5). Second, it is generally expected that interpolation methods will perform better, in absolute terms, at lower noise levels and at higher levels of spatial correlation. While our results were consistent with these expectations, they are somewhat at odds with the results of Englund, Weber, and Leviant (1992), who found, to their surprise, that the accuracy of OK was not significantly affected by noise level. Two possible explanations for this discrepancy are: (1) we used about three times as many datasets (144 vs. 54) and thus our analysis was perhaps more sensitive than theirs at detecting a noise effect; (2) the amounts of noise, as measured by the ratios of error variance to total variance, were considerably more disparate in our experiment (roughly 2% and 30% in our experiment vs. 0% and 10% in the study of Englund, Weber, and Leviant). Finally, consider the effect of surface type. We found that interpolation per-

9812

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

A Comparison of Kriging and Inverse Distance Weighting

6/29/1999

[632]

(PR1)

389

formance was best for the plane, next best for the sombrero, and worst for Morrison’s surface. That the plane yielded the best performance is not surprising in light of this surface’s extreme smoothness, but it is difficult to characterize the sombrero or Morrison’s surface as smoother than the other and hence difficult to explain why performance was better for the former than for the latter. Additional investigation is required to understand how surface type affects interpolation performance at both the overall level [as measured, for example, by log (MSE)] and the micro level. In particular, more work is needed to explicitly identify and parameterize those surface characteristics (e.g., peaks, troughs, or portions of the surface with large gradients) that are responsible for the greatest discrepancies in performance. At finer scales, the grid evaluation approach used in this paper could serve as the basis for visualizing and evaluating local performance characteristics. ACKNOWLEDGMENTS This research was partially supported by a grant from the Center for Global and Regional Environmental Research at The University of Iowa. Michael Hodgson kindly provided assistance in locating the equations used to generate Morrison’s surface. The computing assistance of Yanming Jiang is gratefully acknowledged. We thank two anonymous referees for comments which substantially improved the paper. REFERENCES Brus, D. J., de Gruijter, J. J., Marsman, B. A., Visschers, R., Bregt, A. K., and Breeuwsma, A., 1996, The performance of spatial interpolation methods and choropleth maps to estimate properties at points: A soil survey case study: Environmetrics, v. 7, no. 1, p. 1–16. Cressie, N., 1993, Statistics for spatial data: John Wiley & Sons, New York, 900 p. Creutin, J. D., and Obled, C., 1982, Objective analyses and mapping techniques for rainfall fields: An objective comparison: Water Resources Res., v. 18, no. 2, p. 413–431. Declercq, F. A. N., 1996, Interpolation methods for scattered sample data: Accuracy, spatial patterns, processing time: Cartography and Geographic Information Systems, v. 23, no. 3, p. 128–144. Diggle, P. J., 1983, Statistical analysis of spatial point patterns: Academic Press, London, 148 p. Englund, E. J., Weber, D. D., and Leviant, N., 1992, The effects of sampling design parameters on block selection: Math. Geology, v. 24, no. 3, p. 329–343. Franke, R., 1982, Scattered data interpolation: tests of some methods: Mathematics of Computation, v. 38, no. 157, p. 181–200. Gallichand, J., and Marcotte, D., 1993, Mapping clay content for subsurface drainage in the Nile delta: Geoderma, v. 58, nos. 3/ 4, p. 165–179. Grimm, J. W., and Lynch, J. A., 1991, Statistical analysis of errors in estimating wet deposition using five surface estimation algorithms: Atmospheric Environment, v. 25a, no. 2, p. 317–327. Haining, R., 1990, Spatial data analysis in the social and environmental sciences: Cambridge Univ. Press, Cambridge, 409 p. Hodgson, M. E., 1992, Sensitivity of spatial interpolation models to parameter variation: ACSM

9812

390

Mathematical Geology

PLENUM

Dec′ 98 − Jan′ 99

981203ap

6/29/1999

[632]

(PR1)

Zimmerman, Pavlik, Ruggles, and Armstrong

Technical Papers—Albuquerque, American Congress on Surveying and and Mapping, Bethesda, MD, v. 2, p. 113–122. Journel, A. G., and Rossi, M. E., 1989, When do we need a trend model in kriging?: Math. Geology, v. 21, no. 7, p. 715–739. Kaluzny, S. P., Vega, S. C., Cardoso, T. P., and Shelley, A. A., 1997, S+SPATIALSTATS user’s manual for windows and UNIX: Springer, New York, 384 p. Lam, N. S., 1983, Spatial interpolation methods: A review: American Cartographer, v. 10, no. 2, p. 129–149. Laslett, G. M., 1994, Kriging and splines: An empirical comparison of their predictive performance in some applications: Jour. Am. Stat. Assoc., v. 89, no. 426, p. 391–409. Laslett, G. M., and McBratney, A. B., 1990, Further comparison of spatial methods for predicting soil pH: Soil Science Society of America Journal, v. 54, no. 6, p. 1553–1558. Laslett, G. M., McBratney, A. B., Pahl, P. J., and Hutchinson, M. F., 1987, Comparison of several spatial prediction methods for soil pH: Jour. of Soil Science, v. 38, no. 2, p. 325–341. MacDougall, E. B., 1976, Computer programming for spatial problems: John Wiley & Sons, New York, 158 p. MacEachern, A. M., and Davidson, J. V., 1987, Sampling and isometric mapping of continuous geographic surfaces: The American Cartographer, v. 14, no. 4, p. 299–320. Morrison, J. L., 1971, Method-produced error in isarithmic mapping: American Congress on Surveying and Mapping, Washington, D.C., 76 p. Morrison, J. L., 1974, Observed statistical trends in various interpolation algorithms useful for first stage interpolation: The Canadian Cartographer, v. 11, no. 2, p. 142–159. Peucker, T. K., 1980, The impact of different mathematical approaches to contouring: Cartographica, v. 17, no. 2, p. 73–95. Phillips, D. L., Lee, E. H., Herstrom, A. A., Hogsett, W. E., and Tingey, D. T., 1997, Use of auxiliary data for spatial interpolation of ozone exposure in southeastern forests: Environmetrics, v. 8, no. 1, p. 43–61. Rouhani, S., 1986, Comparative study of ground-water mapping techniques: Ground Water, v. 24, no. 2, p. 207–216. Tabios, G. Q., and Salas, J. D., 1985, A comparative analysis of techniques for spatial interpolation of precipitation: Water Resources Bull., v. 21, no. 3, p. 365–380. Weber, D. D., and Englund, E. J., 1992, Evaluation and comparison of spatial interpolators: Math. Geology, v. 24, no. 4, p. 381–391. Weber, D. D., and Englund, E. J., 1994, Evaluation and comparison of spatial interpolators, II: Math. Geology, v. 26, no. 5, p. 589–603. Wood, J. D., and Fisher, P. F., 1993, Assessing interpolation accuracy in elevation models: IEEE Computer Graphics and Applications, v. 13, no. 2, p. 48–56.