INTERNATIONAL JOURNAL OF CLIMATOLOGY, VOL. 16, 1103-1115 (1996)

COMPARISON OF APPROACHES FOR ESTIMATING TIME-AVERAGED PRECIPITATION USING DATA FROM THE USA

CORT J. WILLMOTT
Center for Climatic Research, Department of Geography, University of Delaware, Newark, Delaware 19716, USA
email: [email protected]

SCOTT M. ROBESON AND MICHAEL J. JANIS
Department of Geography, Indiana University, Bloomington, Indiana 47405, USA

Received 24 April 1995
Accepted 4 January 1996

ABSTRACT

Spatial and temporal sampling errors inherent in large-scale, weather-station (raingauge) climatologies of precipitation are evaluated. A primary goal is to assess whether more representative large-scale precipitation climatologies emerge when (i) more station means are included, even when they are based on unequal periods of record, or (ii) fewer station means are included but all are derived from the same period of record. Observations drawn from the Historical Climatology Network (HCN) are used to estimate temporally averaged precipitation over 10-, 20-, and 30-year intervals at 457 stations within the USA. Two strategies for estimating these ‘observed’ means are examined, one based on temporal ‘substitution’ within each station record, and the other based on spatial interpolation from surrounding stations. Temporally estimated m-year means were obtained by substituting other m-year means, from within the same station record, for each ‘observed’ m-year mean, where m is the length of the averaging period of interest. Spatially interpolated m-year means were estimated from m-year means associated with nearby stations. Climatologies containing a greater number of station averages, even if they are computed over unequal averaging periods, appear to better represent the space-time variability in mean precipitation than climatologies containing fewer, but temporally commensurate, station means. Our results (for precipitation-station networks on the order of 50-60 stations per 10⁶ km²) indicate that the within-station-record substitution of means is about 1.3 to 2.5 times more accurate than is interpolation from surrounding station means. Within-station substitution errors associated with estimating any 10-year mean precipitation from any other 10-year mean, for example, were about 8 per cent of the long-term spatial precipitation mean for the USA, or 67.6 mm. Spatially interpolated 10-year means, from nearby stations, were in error by more than 10 per cent, or 88.8 mm on average. Much of the space-time variability in mean precipitation was not resolved adequately by the 457 HCN stations, especially high-frequency spatial variability caused by orographic and convective mechanisms. For many regions of the world, temporally homogeneous precipitation station networks are considerably more sparse than in the USA, further degrading the reliability of interpolated and spatially integrated mean precipitation fields derived solely from those networks.

KEY WORDS: precipitation variability; raingauge networks; spatial interpolation; time averages; climatic change; USA.

1. INTRODUCTION

Growing scientific and public interest in climatic patterns, trends, and anomalies has prompted climatologists to reexamine historical weather-station archives. Our purpose has been to develop measurement-based climatologies that better document characteristic levels of the more important climatic variables (e.g. Karl et al., 1990; Legates and Willmott, 1990) as well as the extent to which climate varies in both time and space (e.g. Ellsaesser et al., 1986; Bradley et al., 1987; Diaz et al., 1989; Willmott and Legates, 1991). Requests by the climate-modelling and remote-sensing communities for more reliable empirical standards have provided further impetus for examination of the instrumental record (cf. Gates, 1992; Hulme, 1992; Legates and Willmott, 1992). Although climatologists continue to produce a variety of improved climatologies (e.g. Leemans and Cramer, 1991; Hulme, 1992; Vose et al., 1992), the spatial coverage of historical station networks as well as their temporal extent remain less than optimal.

CCC 0899-8418/96/101103-13 © 1996 by the Royal Meteorological Society

Climatologists generally try to compile spatially high-resolution station networks and, at the same time, include station records that span climatically significant (or at least consistent) periods of time. Imperfect data archives, however, force the selection of less-than-optimal sets of station records. Although sparse and uneven station distributions can seriously bias analyses and interpretations (Willmott and Legates, 1991; Willmott et al., 1991, 1994; Robeson, 1994, 1995), criteria and assumptions for selecting stations are infrequently articulated and tested, and biases are incompletely evaluated and reported. At one extreme, one may elect to maximize the number and spatial resolution of stations within a network, at the expense of reducing the extent to which the station records are lengthy and temporally commensurate. At the other extreme, one may select only those station records that are sufficiently long and temporally commensurate, at the expense of reducing the number of stations and their ability to spatially resolve the climatic field of interest. Most modern climatologies fall somewhere between these two extremes, although many tend to emphasize either the temporal or the spatial dimension. Hulme (1991, 1992), for example, preserved temporal fidelity whereas Legates and Willmott (1990) favoured a high spatial resolution.

Our purpose within this paper is to examine the spatial and temporal sampling errors encountered in compiling large-scale climatologies, especially climatologies based on time-averages of station precipitation. Station archives containing observational time series at stations or grid-points are not our focus, although several of our findings have implications for such data sets.
More specifically, we examine the potential effects of limiting the number of stations represented in time-averaged climatologies by requiring that all station averages be obtained from temporally commensurate records of sufficient duration. Our analyses focus on precipitation because of its importance to biological systems and its key role in the hydrological cycle. It also is one of the most highly variable (in both space and time) climatic elements and, therefore, difficult to resolve when station networks are sparse and uneven. Four-hundred and fifty-seven precipitation records for the USA, drawn from the Historical Climatology Network (HCN) (Karl et al., 1990), are examined because (i) their spatial and temporal resolutions are sufficiently high that the sampling biases can be tested adequately, and (ii) a variety of precipitation regimes are found in the USA. Findings, therefore, may be cautiously extrapolated to other parts of the world.

2. TESTABLE PROPOSITIONS

A computer-intensive sampling strategy (explained below) is used to test the following propositions. Is it generally more accurate to estimate a climatological mean value of precipitation at a location by (i) spatially interpolating it from commensurate averages (representing the same time period) evaluated at the surrounding (usually the nearest) stations, or (ii) using a mean from a different time period (of the same length) but from that same station record? If, for instance, we wished to estimate mean annual precipitation for the period 1971-1980 at Newark, Delaware, would our estimate be likely to be more accurate if (i) we interpolated it from 1971-1980 means associated with nearby stations or (ii) estimated it by simply substituting a 10-year mean from the Newark, Delaware record but from another time period?

At least two important aspects of climatological understanding depend upon the selection of the better approach. In the first and relatively straightforward instance, it is useful to know which estimation method will likely yield the most accurate estimate of a point precipitation average. If spatial interpolation will likely produce a more accurate estimate, then, when compiling climatologies to evaluate large-scale climate and climatic variability, it is probably better to use fewer station records but records that are high quality, long-term and temporally commensurate. Missing values on the map (either at stations with incomplete or unreliable records, or at unsampled locations) then can be spatially interpolated with relative confidence. If the substitution of a same-station mean generally produces a more accurate estimate, however, it probably is better to increase the spatial resolution of the station network, at the expense of temporal fidelity, by including more stations, even though their records may not exactly represent the same time period.


If the same-station-mean substitution approach is more accurate, a disturbing implication is that there may be more temporal uncertainty at unsampled locations (between the stations selected to comprise a network) than can be explained by temporal variability within a network’s station records. Spatial integrations of station data to obtain large-scale precipitation averages (e.g. continental, terrestrial, or global), in turn, may be aliased by sparse networks, and time series constructed from such large-scale means may not reliably depict average climatic change. Climatologies compiled from historical station archives alone, in other words, simply may not be able to adequately resolve large-scale average precipitation patterns, variability, or change in general.

3. A RAINGAUGE NETWORK FOR THE USA

Raingauge (station) networks are spatial samples of precipitation. It follows that the spatial variability of precipitation can be characterized adequately only when a station network represents (without bias) most of the non-trivial spatial variability. Network adequacy, in other words, depends upon the extent to which the station locations are spatially coincident with important precipitation regimes and gradients. Although the station locations within a network may be fixed, it is important to mention that its adequacy may vary (i) as the precipitation field varies with time and (ii) with the averaging period. Adequacy also may vary if the spatial configuration of a network changes through time (Willmott et al., 1991, 1994). Because precipitation is highly variable in space and time, station networks generally must be dense to sufficiently resolve it. Spatial variability of a precipitation field and network adequacy then are closely linked and usually covary. Karl et al. (1990) provide an unusually dense network of long-term raingauge records for the USA, and we use two subsets of their data here.
Their archive (the HCN) was developed for climatic change studies and, therefore, long-term continuous records were preferred. The HCN contains 1219 (mostly cooperative) weather stations that were active in 1987, and had at least 80 years of record. Stations included within our first subset of the HCN (457 stations) each spanned the period 1901-1985, and had few missing observations (Figure 1(a)). Within any m-year subinterval of the station record, no station was retained if it had fewer than 90 per cent of the monthly precipitation totals for the same month (e.g. January). Among the retained stations, systematic intra-annual sampling biases, and their effects on the longer-term means, are reduced substantially. Our 457-station subnetwork then provides commensurate long-term monthly records, from which station means can be computed over a meaningful variety of 10-, 15-, 20-, 25-, or 30-year periods at each station within the subnetwork. Compared with most other large-scale monthly precipitation (time series) station networks (Willmott et al., 1994), our 457-station subnetwork is quite dense. It contains approximately 57 stations per 10⁶ km², and the mean distance to nearest station is about 68 km. Somewhat higher station densities are found in the eastern USA (Figure 1(a)). Lower station densities appear within the drier reaches of the central and western parts of the country. Some patchiness also is apparent in the west. Although ‘corrected’ precipitation data (adjusted for station moves, instrument changes, and missing values; Karl and Williams, 1987) are contained in the HCN, we use the ‘uncorrected’ averages. Uncorrected averages are more appropriate here because the corrections sometimes involve interstation correlations, an indirect spatial interpolation method. Our spatial interpolations, in other words, cannot be tested fairly on station data that have already been interpolated (directly or indirectly) in space.
It is also true that most precipitation climatologies in use today are ‘uncorrected’ for many non-trivial station-record inhomogeneities (Karl and Williams, 1987). Our analyses of the uncorrected records, therefore, should produce error patterns that might arise from the use of many existing climatologies.

4. SPATIAL VARIABILITY OF MEAN PRECIPITATION

In view of the synergy between the spatial variability of mean precipitation and network adequacy, we briefly preface our analyses with a description of average precipitation variability across the USA. A second HCN subnetwork is used for this purpose. Thirty-year station averages for the period 1951-1980 were spatially interpolated (Willmott et al., 1985a) from 1203 of the HCN stations (Figure 1(b)) to the nodes of a 0.25° latitude by 0.25° longitude lattice. When mapped (Figure 1(c)), mean annual precipitation across the conterminous USA displays considerable spatial variability. A nearly complete (1203-station) subset of the HCN is used to illustrate how higher-frequency spatial variability might not be resolved, even by a network as dense as our 457-station subnetwork. Some of the higher-frequency spatial variability in the Pacific Northwest (contained within the 1203-station network), for example, clearly cannot be resolved by the 457-station network.

Figure 1. Spatial distribution of (a) 457 stations drawn from the United States Historical Climatology Network (HCN) with relatively complete (having fewer than 10 per cent of the observations missing) long-term (1901-1985) monthly precipitation totals, (b) 1203 HCN stations each spanning the period 1951-1980, and (c) spatial distribution of mean annual precipitation (mm) interpolated from the 1203 station averages for the period 1951-1980

Although a significant portion of the precipitation that falls on the USA probably evapotranspired from the USA (cf. Joussaume et al., 1986), external moisture sources also are quite important. Primary external moisture sources include the Gulf of Mexico and the Pacific Ocean, and their influences are particularly evident within the southeast and northwest, where precipitation maxima occur (Figure 1(c)). With distance from the Gulf of Mexico, southeastern precipitation gradually declines, whereas, in the northwest, sharp spatial precipitation gradients appear due to the orographic/rain-shadow influences of the Cascade and Sierra Nevada Mountains. Average precipitation is more spatially variable in the western USA than in the eastern states, primarily because of the considerable topographic variability in the west. Summer convection also produces high spatial variability in the eastern and southern parts of the country, but primarily on shorter time-scales. For the entire conterminous USA, our estimate of the spatially averaged, 30-year mean precipitation is 838 mm per year.

5. METHODS FOR ESTIMATING TIME-AVERAGED PRECIPITATION

5.1. Substitution of means from other periods

At each station ($i$) within the 457-station network, ‘observed’ mean-annual precipitation ($P_{i,t}$) was computed for 10-, 20-, and 30-year periods between 1901 and 1985. The averaging period is identified by $t$. Fifteen- and 25-year means were also calculated, but not analysed, because they added little additional information. Each 10-, 20-, or 30-year averaging period was offset from the beginning of the previous averaging interval by 5 years, producing $m$ minus 5 years of overlap. Ten-year averaging periods, for example, were 1901-1910, 1906-1915, 1911-1920, and so forth. Observed time-averages ($P_{i,t}$) were computed at each station, in other words, for 16 overlapping 10-year periods, 14 overlapping 20-year periods, and 12 overlapping 30-year periods. This yielded a 16 × 457 array of observed 10-year means, a 14 × 457 array of observed 20-year means, and a 12 × 457 array of observed 30-year means to which estimates could be compared.

Our estimation by substitution is illustrated using means derived from 10-year averaging periods. An observed 10-year mean annual precipitation ($P_{i,t}$) was ‘estimated’ by substituting another ‘observed’ mean ($P_{i,\theta}$) from the same station record, but from a different ($\theta$) 10-year averaging period. The observed 1921-1930 mean, for example, was estimated with each of the other 15 ten-year means computed from the full 1901-1985 station record. This substitution process then was repeated for all 10-year averaging periods, and all 457 station records. Within each station record, 15 ‘estimated’ 10-year means were substituted for each ‘observed’ 10-year mean in this fashion. Because all permutations of 10-year means were used to estimate each other, temporal autocorrelation biases were minimized. Commensurate calculations were repeated for the 20- and 30-year means.

5.2. Spatial interpolation from nearby stations

Probably the most common method for estimating mean precipitation at an unsampled location of interest is by spatial interpolation. Traditional spatial interpolation typically involves using a set of sampled values of mean precipitation, at a set of neighbouring locations, to produce an estimate of mean precipitation at a desired location. Only the neighbouring station observations of mean precipitation, and their longitude and latitude coordinates, influence the estimate. Expressed generically, traditional interpolation of mean precipitation is

$$\hat{P}_{i,t} = I(\mathbf{P}_t, \boldsymbol{\lambda}, \boldsymbol{\phi}) \qquad (1)$$

where $\hat{P}_{i,t}$ is the interpolated mean precipitation value at location $i$, $\mathbf{P}_t$ is a vector that contains $n_i$ nearby (to $i$) station means (each of which influences $\hat{P}_{i,t}$), $\boldsymbol{\lambda}$ and $\boldsymbol{\phi}$ are the corresponding $n_i$-element longitude and latitude vectors, and $I(\,)$ is the spatial interpolation function.

Spatial interpolation, with a traditional method (Willmott et al., 1985a), is used here to estimate mean precipitation at unsampled locations. Traditional interpolation algorithms include applications of optimal interpolation (Thiebaux and Pedder, 1978), thin-plate splines (e.g. Hutchinson, 1991; Hutchinson and Gessler, 1994), kriging (e.g. Ishida and Kawashima, 1993), inverse-distance weighting (e.g. Willmott et al., 1985a), and spatial regression (Bennett et al., 1984). Although each interpolator performs somewhat differently, the better ones generally estimate precipitation with accuracies on the same order. Bussières and Hogg (1989), for example, compared four interpolators with respect to their abilities to interpolate daily precipitation over a 7° latitude by 13° longitude region of North America centred on Lake Ontario. Root-mean-square errors (RMSEs) ranged from 3.0 to 3.7 mm day⁻¹, with a Gandin-based optimal (statistical) algorithm performing the best. A simplified version of Shepard's (1968) distance-weighting procedure performed nearly as well (RMSE = 3.2 mm day⁻¹). A more complete version of Shepard's (1968) distance-weighting procedure, which was adapted to interpolate on spherical surfaces by Willmott et al. (1985a), is used here. Not only is it relatively accurate, but it is computationally more efficient than either functional minimization (e.g. splines) or spatial covariance-based methods (e.g. kriging).
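The generic form of equation (1) can be made concrete with a bare-bones inverse-distance-weighting sketch. This is a simplified stand-in and not the Shepard (1968) procedure as adapted by Willmott et al. (1985a); the function names, the choice of four neighbours, the distance exponent of 2, and the Earth radius are all assumptions made only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine great-circle distance (km) between two lon/lat points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw_estimate(target, stations, n_nearby=4, power=2.0):
    """Estimate a mean at target=(lon, lat) from stations [(lon, lat, mean), ...].

    Only the n_nearby closest station means and their coordinates
    influence the estimate, mirroring the generic form I(P_t, lambda, phi).
    """
    ranked = sorted(
        (great_circle_km(target[0], target[1], lon, lat), mean)
        for lon, lat, mean in stations
    )[:n_nearby]
    if ranked[0][0] == 0.0:
        # Target coincides with a station: return that station's mean.
        return ranked[0][1]
    weights = [dist ** -power for dist, _ in ranked]
    return sum(w * m for w, (_, m) in zip(weights, ranked)) / sum(weights)
```

In a cross-validation setting the station being estimated would be left out of `stations` before calling `idw_estimate`, so the coincident-station branch would rarely be used.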

5.3. Performance evaluations

Cross validation (Efron and Gong, 1983) is a useful way to examine the ability of our temporal and spatial estimation procedures to resolve the space-time variability in climatic variables (Robeson, 1994), and we use it here. Our application of cross validation involves removing a station precipitation average ($P_{i,t}$) and then alternatively estimating it (i) by the mean substitution approach outlined above, and (ii) by interpolating to the removed station location from time-averages for the same period associated with the $n_i$ nearby stations. Each of these estimation sequences is repeated for every station and averaging period of interest in the network.

With regard to substitution, observed m-year average precipitation at each station (for each m-year average in the station record) then can be subtracted from all other m-year means within that station record. Mean-absolute temporal substitution errors ($|P_{i,\theta} - P_{i,t}|$) then are evaluated for all observations and summarized statistically. With the constraint that $\theta \neq t$, the corresponding mean-absolute ($MAE_i$) and relative ($d_i$) performance measures for each station are

$$MAE_i = (n_m^2 - n_m)^{-1} \sum_{t=1}^{n_m} \sum_{\theta=1,\ \theta \neq t}^{n_m} |P_{i,\theta} - P_{i,t}| \qquad (2)$$

and the index of agreement, $d_i$ (equation (3); Willmott et al., 1985b), where $n_m$ is the number of m-year means within a station record, and $\bar{P}_i$ is the average m-year mean for station $i$ (Willmott et al., 1985b). Ensemble average errors ($MAE$ and $d$) are obtained by averaging over all stations.

Observed m-year average precipitation values at each station also are subtracted from corresponding (from the same averaging period) interpolated estimates to obtain an estimated spatial interpolation error ($\hat{P}_{i,t} - P_{i,t}$). Spatial interpolation error statistics, analogous to the temporal substitution error statistics, then are calculated. The number of interpolation differences per station is $n_m$, rather than ($n_m^2 - n_m$), and the summations in equations (2) and (3) must be adjusted accordingly.

If the interpolation method is reliable, how well precipitation is estimated at the stations individually and collectively reflects the adequacy of the network. Although this is an intuitively appealing way to analyse errors, it should be mentioned that an ill-conditioned station network not only may bias an interpolated precipitation field, but it also may bias the cross-validation error field. The effects of station-location bias on spatially averaged MAE values, however, may be reduced by interpolating the station cross-validation errors to a regular grid prior to averaging (Willmott and Matsuura, 1995). Spatially averaged MAE values obtained in this way were commensurate with the station-based MAE values reported here (Table I), which suggests that our average performance statistics are representative. Nevertheless, interpretations of spatial cross-validation errors should be made cautiously.
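The overlapping averaging periods of Section 5.1 and the station-level mean-absolute substitution error can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the station record is assumed to be a complete dictionary of annual totals keyed by year, and the handling of missing months described in Section 3 is ignored.

```python
from itertools import permutations


def averaging_periods(m, first=1901, last=1985, step=5):
    """(start, end) years of m-year periods whose starts are offset by `step` years."""
    return [(s, s + m - 1) for s in range(first, last - m + 2, step)]


def period_means(annual, m, first=1901, last=1985, step=5):
    """m-year means computed from a dict {year: annual precipitation in mm}."""
    return [
        sum(annual[year] for year in range(start, end + 1)) / m
        for start, end in averaging_periods(m, first, last, step)
    ]


def substitution_mae(means):
    """Mean absolute difference over all ordered pairs of means with theta != t."""
    n_m = len(means)
    total = sum(abs(p_theta - p_t) for p_theta, p_t in permutations(means, 2))
    return total / (n_m * n_m - n_m)
```

With the default arguments, `averaging_periods` yields 16, 14, and 12 periods for m = 10, 20, and 30, matching the counts reported in Section 5.1. Averaging `substitution_mae` over all station records would then give an ensemble value.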

Table I. Ensemble mean-absolute-estimation-error (MAE) and index-of-agreement ($d$) performance statistics (Willmott et al., 1985b) computed separately for the temporal-mean-substitution and spatial-interpolation analyses. Ninety-five per cent lower ($l$) and upper ($u$) confidence intervals are given as well. MAE values are in mm

Averaging               Temporal mean substitution     Spatial interpolation
period      Statistic   l       estimate   u           l       estimate   u
10 years    MAE         66.1    67.6       69.0        85.7    88.8       92.6
            d           0.89    0.89       0.90        0.85    0.86       0.86
20 years    MAE         44.3    45.5       46.7        80.4    84.2       87.7
            d           0.92    0.93       0.93        0.86    0.86       0.87
30 years    MAE         32.1    33.1       34.2        77.9    81.9       85.6
            d           1.00    1.00       1.00        0.86    0.87       0.87
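As a quick arithmetic check, the ensemble MAE values in Table I can be expressed as percentages of the 838 mm spatial mean reported in Section 4, and as interpolation-to-substitution ratios. The sketch below only restates numbers from the text; the dictionary layout and the function name are illustrative assumptions.

```python
SPATIAL_MEAN_MM = 838.0  # spatially averaged 1951-1980 mean, as reported in the text

# Ensemble MAE values (mm) from Table I: {averaging period: (substitution, interpolation)}
TABLE_MAE = {10: (67.6, 88.8), 20: (45.5, 84.2), 30: (33.1, 81.9)}


def summarize(table=TABLE_MAE, spatial_mean=SPATIAL_MEAN_MM):
    """Relative errors (per cent of the spatial mean) and error ratios per period."""
    rows = {}
    for m, (substitution, interpolation) in table.items():
        rows[m] = {
            "sub_pct": 100 * substitution / spatial_mean,
            "interp_pct": 100 * interpolation / spatial_mean,
            "ratio": interpolation / substitution,
        }
    return rows
```

The ratio runs from roughly 1.3 for 10-year means to roughly 2.5 for 30-year means, which is the range quoted in the abstract.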

5.4. Comparison of substitution and spatial interpolation

Substituted-minus-observed mean-precipitation differences ($P_{i,\theta} - P_{i,t}$) at each station, for each averaging period, were plotted against long-term average station precipitation ($\bar{P}_i$) (Figure 2(a-c)). Analogous error graphs (but of $\hat{P}_{i,t} - P_{i,t}$) also were plotted for the spatial interpolation errors (Figure 3(a-c)). Several expected patterns are apparent in both the substitution- and interpolation-error plots. Stations with higher precipitation averages, for instance, often have higher variabilities. As the averaging period lengthens, the error variance declines, especially the temporal error variance. Although this may seem to be a partial function of temporal autocorrelation that arises from our $m$ minus 5-year overlaps of the averaging periods, by and large, it is not*. It is an intrinsic characteristic of mean precipitation variability within the USA (Figure 4). It is clear from the performance statistics (Table I) that the within-station-record substitution of means is about 1.3 to 2.5 times more accurate than is interpolation from surrounding station means, when the nearby means represent the same averaging period. Average error associated with estimating any 10-year mean precipitation from any other 10-year mean, from within the same station record, is about 8 per cent of the 1951-1980 (838 mm) spatial mean, or 67.6 mm (Table I). For 20-year means, errors are of the order of 5 per cent or 45.5 mm, whereas 30-year-mean estimation errors are approximately 4 per cent or 33.1 mm. When 10-year means are spatially interpolated from commensurate 10-year means evaluated at nearby stations, average estimation error is over 10 per cent or 88.8 mm. Interpolation errors associated with estimating 20- and 30-year means are somewhat lower, but also of the order of 10 per cent (Table I).
Relative error measures ($d$) further indicate that substitution (temporal estimation) errors decline as the averaging period increases, whereas the averaging period seems to have little influence on relative interpolation accuracy. Interpolation errors are larger than substitution errors because the network cannot adequately resolve the spatial variability arising primarily from orographic influences and patchy convective precipitation. As the patterns of between-station variability also vary with time, even more sophisticated interpolative models, using additional independent variables such as topography (e.g. Daly et al., 1994), have a difficult time resolving between-station space-time variability. Ten-year-mean substitution (Figure 5(a)) and interpolation (Figure 5(b)) MAE values associated with the stations, when mapped, reveal quite dissimilar error fields. Within-station-record substitution produces a more accurate and uniform error field, although it covaries with mean precipitation. Wetter regions generally exhibit



* An experiment was conducted to determine the extent to which the $m$ minus 5-year overlaps of adjacent averaging periods may have biased our substitution analyses. Within a station record, two randomly identified 10-year averages were selected and pair-wise differenced. An absolute value of the pair-wise difference was then obtained. This was repeated 500 times for each station, from which a ‘randomized’ station MAE was calculated. All (457) station MAE values were then plotted (Figure 4). Unlike our $m$ minus 5-year overlapping means, time contiguity and overlap were not constrained, randomizing any temporal overlap and minimizing any temporal autocorrelation bias. Results of our original calculations (with $m$ minus 5-year overlapping of adjacent means) fall near the middle of this range of differences. This strongly suggests that the $m$ minus 5-year overlaps inject little bias into our results.
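The randomization experiment described in this note might be sketched as below. Several details are assumptions rather than the authors' procedure: the station record is taken to be a complete list of annual totals, the two averaging periods are drawn with distinct, otherwise unconstrained start years, and the function name and fixed seed are illustrative.

```python
import random


def randomized_station_mae(annual, m=10, n_pairs=500, seed=0):
    """Mean absolute difference between randomly paired m-year means of one record."""
    rng = random.Random(seed)
    n_starts = len(annual) - m + 1  # number of possible start positions
    total = 0.0
    for _ in range(n_pairs):
        # Two distinct start positions; the periods themselves may overlap.
        i, j = rng.sample(range(n_starts), 2)
        mean_i = sum(annual[i:i + m]) / m
        mean_j = sum(annual[j:j + m]) / m
        total += abs(mean_i - mean_j)
    return total / n_pairs
```

Repeating the call for every station record would give the set of ‘randomized’ station MAE values against which the original overlapping-period MAE can be compared.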

Figure 2. Substituted-minus-observed m-year-mean differences ($P_{i,\theta} - P_{i,t}$) at each of the 457 stations plotted against long-term (1901-1985) mean precipitation ($\bar{P}_i$) at each station ($i$): (a) 10-year-mean differences; (b) 20-year-mean differences; (c) 30-year-mean differences. A subset of size $n_m$ of the ($n_m^2 - n_m$) differences available for each station was randomly selected and plotted, without replacement

Figure 3. Interpolated-minus-observed m-year-mean differences ($\hat{P}_{i,t} - P_{i,t}$) at each of the 457 stations plotted against long-term (1901-1985) mean precipitation ($\bar{P}_i$) at each station ($i$): (a) 10-year-mean differences; (b) 20-year-mean differences; (c) 30-year-mean differences. All $n_m$ differences associated with each station are plotted


Figure 4. Station mean-absolute differences (MAEs) computed from 500 randomly selected pairs of 10-year averages (drawn from the same station record) for all 457 stations. Ninety-five per cent bootstrap confidence intervals are shaded. The original error estimate (MAE) is also plotted as a vertical dashed line.

higher substitution errors than drier areas. As mentioned above, much of the interpolation error arises because topographic exposure to precipitation-bearing weather systems is insufficiently resolved by the raingauge network. Precipitation at some relatively wet and isolated stations (e.g. at Prescott, Arizona), as a consequence, is grossly underinterpolated from nearby drier stations, during cross-validation. These relatively wet stations also contribute to overestimation at the drier stations. Such problems contribute to the large interpolation errors that are common in the topographically rugged west and southwest. An inability to fully resolve (by interpolation) somewhat patchy convective regimes, largely within the south and east, is apparent as well. It should be noted that mean substitution errors decrease more rapidly with increases in the averaging period than do interpolation errors (Table I); therefore, differences between the substitution and interpolation error fields become more pronounced as the averaging period increases. Although our analyses are based primarily on two test networks (both drawn from the HCN), we know that interpolation errors strongly vary with network configurations (Robeson, 1994; Willmott et al., 1994). The network influences on interpolation errors inherent in the HCN subsequently were investigated using cross validation, over a range of network densities. Subnetworks (of size $n_s$) were created from the 1203-station HCN network by randomly sampling 30-year (1951-1980) mean precipitation values at the stations. Mean precipitation values at the stations of each $n_s$-station subnetwork then were interpolated back to the stations, with the mean observed at station $i$ removed from the interpolation to station $i$. The observed mean at each station then was subtracted from the interpolated station mean to obtain estimates of the interpolation errors at the stations.
One-hundred ($n_e$) randomly selected subnetworks were evaluated for each of 21 network sizes (Figure 6). Typical MAE values ranged from nearly 100 mm, for subnetworks containing 200 stations, to just over 70 mm for all 1203 stations. These limits correspond to station densities ranging from 25 to over 150 per 10⁶ km². Because a 70-mm interpolation error is considerably higher than the 33.1 mm 30-year-mean substitution MAE, it appears that even networks as dense as the full HCN cannot adequately represent the spatial variability in average precipitation. Additional station records, spanning unequal time periods of necessity, would likely improve the climatological value of even the HCN.
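The subnetwork experiment can be outlined in a few lines. The sketch below is illustrative only: it uses planar coordinates and a plain inverse-distance weighting of the four nearest stations instead of the spherical procedure of Willmott et al. (1985a), and the function names, seed, and defaults are assumptions.

```python
import random


def idw(target_xy, others, power=2.0, k=4):
    """Inverse-distance-weighted estimate at target_xy from [(x, y, value), ...]."""
    ranked = sorted(
        ((x - target_xy[0]) ** 2 + (y - target_xy[1]) ** 2, value)
        for x, y, value in others
    )[:k]
    if ranked[0][0] == 0.0:
        return ranked[0][1]
    weights = [d2 ** (-power / 2) for d2, _ in ranked]  # d2 is squared distance
    return sum(w * v for w, (_, v) in zip(weights, ranked)) / sum(weights)


def cross_validation_mae(stations):
    """Leave-one-out MAE: each station is estimated from the remaining stations."""
    errors = []
    for i, (x, y, value) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        errors.append(abs(idw((x, y), others) - value))
    return sum(errors) / len(errors)


def density_experiment(stations, sizes, n_subnetworks=100, seed=0):
    """Cross-validation MAEs for random subnetworks of each requested size."""
    rng = random.Random(seed)
    return {
        n_s: [cross_validation_mae(rng.sample(stations, n_s)) for _ in range(n_subnetworks)]
        for n_s in sizes
    }
```

Each list returned by `density_experiment` holds the MAE values that a box plot for one network size would summarize.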

6. SUMMARY AND CONCLUSIONS

Two types of sampling biases inherent in station archives of time-averaged precipitation (climatologies) were examined. Our impetus was to ascertain whether it is better, in general, to ensure that all station averages included within a climatology are evaluated from temporally commensurate records, or whether it is better to include means obtained from unequal averaging periods, in order to increase the spatial resolution of the station network and associated mean precipitation field. Precipitation was of interest because it is one of the most highly variable



Figure 5. Ten-year mean-absolute differences (MAEs), at the stations, mapped for (a) within-station-record substitution and (b) spatial interpolation

(in both space and time) climatic elements and, as a consequence, relatively difficult to resolve by many existing station networks. Using the Historical Climatology Network (HCN) (Karl et al., 1990), 457 monthly precipitation records for the USA were selected and temporally averaged over 10-, 20-, and 30-year intervals. These ‘observed’ m-year means served as the ‘truth’ with which we compared other ‘estimated’ m-year means. Two separate sets of estimated means were obtained by (i) substituting other m-year means, from within the same station record, for each


[Figure 6 plot omitted; x-axis: Sample Size (stations per 10^6 km^2)]

Figure 6. Mean-absolute interpolation errors (MAEs) associated with 21 sets (each of a different station density) of 100 randomly selected subnetworks drawn from the higher-resolution (1203-station) HCN. Box plots representing each set of 100 MAEs depict central tendency, the upper and lower quartiles, and the range

‘observed’ m-year mean of interest, and (ii) spatially interpolating the m-year means from m-year means associated with nearby stations. Evaluations of each set of 10-, 20-, and 30-year means were mutually exclusive.

For precipitation-station networks of the order of 50-60 stations per 10^6 km^2, performance statistics suggested that the within-station-record substitution of means is about 1.3 to 2.5 times more accurate than is interpolation from surrounding station means. The error incurred by estimating any 10-year mean precipitation from any other 10-year mean (from within the same station record), for example, was about 8 per cent of the 1951-1980 spatial mean, or 67.6 mm, whereas interpolating 10-year means from nearby stations produced errors of the order of 10 per cent, or 88.8 mm.

It appears that mid-latitude raingauge networks, with station densities of the order of 50-60 stations per 10^6 km^2 or less, cannot adequately resolve the spatial variability in mean precipitation. This may even be true of somewhat denser networks. Orographically induced variability and patchy precipitation patterns caused by frequent convective precipitation produced most of the unresolved spatial variability. Our results indicate that most of the space-time variability in long-term averages of precipitation is spatial, and that it is insufficiently resolved by most existing large-scale station climatologies. It also seems that climatologies containing a greater number of station averages, even if they are computed over unequal averaging periods, better represent the space-time variability in mean precipitation than climatologies containing fewer, but temporally commensurate, station means (i.e. most archives of climatic ‘normals’).

Although our findings have wider implications, one should be cautious in extending them to other regions, times, networks, or climatic variables.
They are based on mid-latitude North American precipitation data only, a limited set of network configurations, and a single spatial interpolation algorithm. It is encouraging that more sophisticated interpolation models, which make use of additional independent variables such as topography, are being developed (e.g. Daly et al., 1994). Multivariate interpolation should reduce interpolation errors and, in turn, allow us to produce higher resolution and more reliable precipitation climatologies.

ACKNOWLEDGEMENTS

Portions of this research were funded by NASA Grant NAGW-1884 and US EPA Cooperative Agreement No. CR 816 278. Several helpful discussions with Kenji Matsuura and Scott Webber at the University of Delaware also are greatly appreciated.

REFERENCES

Bennett, R. J., Haining, R. P. and Griffith, D. A. 1984. ‘The problem of missing data on spatial surfaces’, Ann. Assoc. Am. Geogr., 74, 138-156.
Bradley, R. S., Diaz, H. F., Eischeid, J. K., Jones, P. D., Kelly, P. M. and Goodess, C. M. 1987. ‘Precipitation fluctuations over Northern Hemisphere land areas since the mid-19th century’, Science, 237, 171-175.



Bussières, N. and Hogg, W. 1989. ‘The objective analysis of daily rainfall by distance weighting schemes on a mesoscale grid’, Atmos. Ocean, 27, 521-541.
Daly, C., Neilson, R. P. and Phillips, D. L. 1994. ‘A statistical-topographic model for mapping climatological precipitation over mountainous terrain’, J. Appl. Meteorol., 33, 140-158.
Diaz, H., Bradley, R. S. and Eischeid, J. K. 1989. ‘Precipitation fluctuations over global land areas since the late 1800’s’, J. Geophys. Res., 94(D1), 1195-1210.
Efron, B. and Gong, G. 1983. ‘A leisurely look at the bootstrap, the jackknife, and cross-validation’, Am. Statistician, 37, 36-48.
Ellsaesser, H. W., MacCracken, M. C., Walton, J. J. and Grotch, S. L. 1986. ‘Global climatic trends revealed by the recorded data’, Rev. Geophys., 24, 745-792.
Gates, W. L. 1992. ‘AMIP: the atmospheric model intercomparison project’, Bull. Am. Meteorol. Soc., 73, 1962-1970.
Hulme, M. 1991. ‘An intercomparison of model and observed global precipitation climatologies’, Geophys. Res. Lett., 18, 1715-1718.
Hulme, M. 1992. ‘A 1951-80 global land precipitation climatology for the evaluation of general circulation models’, Climate Dyn., 7, 57-72.
Hutchinson, M. F. 1991. ‘The application of thin plate smoothing splines to continent-wide data assimilation’, in Jasper, J. D. (ed.), Data Assimilation Systems, BMRC Research Report No. 27, Bureau of Meteorology, Melbourne, Australia, pp. 104-113.
Hutchinson, M. F. and Gessler, P. E. 1994. ‘Splines--more than just a smooth interpolator’, Geoderma, 62, 45-67.
Ishida, T. and Kawashima, S. 1993. ‘Use of cokriging to estimate surface air temperature from elevation’, Theor. Appl. Climatol., 47, 147-157.
Joussaume, S., Sadourny, R. and Vignal, C. 1986. ‘Origin of precipitating water in a numerical simulation of the July climate’, Ocean-Air Interactions, 1, 43-56.
Karl, T. R. and Williams, C. N. Jr. 1987. ‘An approach to adjusting climatological time series for discontinuous inhomogeneities’, J. Clim. Appl. Meteorol., 26, 1744-1763.
Karl, T. R., Williams, C. N. Jr. and Quinlan, F. T. 1990. United States Historical Climatology Network Serial Temperature and Precipitation Data, NDP-019/R1, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, TN.
Leemans, R. and Cramer, W. P. 1991. The IIASA Data Base for Mean Monthly Values of Temperature, Precipitation and Cloudiness on a Global Terrestrial Grid, International Institute of Applied Systems Analysis, RR-91-18, Laxenburg, Austria, 61 pp.
Legates, D. R. and Willmott, C. J. 1990. ‘Mean seasonal and spatial variability in gauge-corrected, global precipitation’, Int. J. Climatol., 10, 111-127.
Legates, D. R. and Willmott, C. J. 1992. ‘A comparison of GCM-simulated and observed mean January and July precipitation’, Palaeogeogr. Palaeoclimatol. Palaeoecol. (Global Planet. Change Sect.), 97, 345-363.
Robeson, S. M. 1994. ‘Influence of spatial sampling and interpolation on estimates of air temperature change’, Clim. Res., 4, 119-126.
Robeson, S. M. 1995. ‘Resampling of network-induced variability in estimates of terrestrial air temperature change’, Climatic Change, 29, 213-229.
Shepard, D. 1968. ‘A two-dimensional interpolation function for irregularly spaced data’, Proceedings of the 23rd National Conference, ACM, pp. 517-523.
Thiébaux, H. J. and Pedder, M. A. 1987. Spatial Objective Analysis, Academic Press, New York.
Vose, R. S., Schmoyer, R. L., Steurer, P. M., Peterson, T. C., Heim, R., Karl, T. R. and Eischeid, J. K. 1992. The Global Historical Climatology Network: Long-term Monthly Temperature, Precipitation, Sea Level Pressure, and Station Pressure Data, ORNL/CDIAC-53, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, TN.
Willmott, C. J. and Legates, D. R. 1991. ‘Rising estimates of terrestrial and global precipitation’, Climate Res., 1, 179-186.
Willmott, C. J. and Matsuura, K. 1995. ‘Smart interpolation of annually averaged air temperature in the United States’, J. Appl. Meteorol., 34, 2577-2586.
Willmott, C. J., Rowe, C. M. and Philpot, W. D. 1985a. ‘Small-scale climate maps: A sensitivity analysis of some common assumptions associated with grid-point interpolation and contouring’, Am. Cartogr., 12, 5-16.
Willmott, C. J., Ackleson, S. G., Davis, R. E., Feddema, J. J., Klink, K. M., Legates, D. R., O’Donnell, J. and Rowe, C. M. 1985b. ‘Statistics for the evaluation and comparison of models’, J. Geophys. Res., 90(C5), 8995-9005.
Willmott, C. J., Robeson, S. M. and Feddema, J. J. 1991. ‘Influence of spatially variable instrument networks on climatic averages’, Geophys. Res. Lett., 18, 2249-2251.
Willmott, C. J., Robeson, S. M. and Feddema, J. J. 1994. ‘Estimating continental and terrestrial precipitation averages from rain-gauge networks’, Int. J. Climatol., 14, 403-414.