International Water Conference 2016 Water Resources in Arid Areas

COMPARISON OF PERFORMANCE OF CLASSIC AND DATA MINING METHODS IN MISSING PRECIPITATION ESTIMATION IN ARID AREAS

Mohammad Taghi Sattari1* and Ali Rezazadeh Joudi2

1- Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran. * [email protected]
2- Young Researchers and Elite Club, Maragheh Branch, Islamic Azad University, Maragheh, Iran. [email protected]

Precipitation is one of the most important factors in the hydrological cycle. Monthly precipitation data are a basic element of hydrological and meteorological studies, especially in arid regions. Without complete and reliable data sets, the results of hydrological models cannot be interpreted. In this paper, to evaluate the performance of classical and data mining methods in filling gaps in monthly rainfall data, six weather stations located in arid regions of southern Iran were selected. In the first stage, the homogeneity and trend of the data at the six stations were assessed using the standard normal homogeneity test and the Mann-Kendall trend test. In the second stage, based on the correlation between stations, the Khash station was selected as the target station for gap filling. In the third stage, a randomly chosen portion of the monthly rainfall data at the Khash station was treated as missing. Finally, classical and data mining methods were applied to estimate the missing data. Five classic statistical methods (Normal Ratio, Inverse Distance, UK traditional method, multiple linear regression and Multiple Imputations) and three modern data mining methods (multilayer perceptron, support vector regression and K-nearest neighbors) were used to estimate the missing precipitation data. The comparison of results indicates that the Normal Ratio method performed best among the classic statistical methods and the K-NN method performed best among the modern data mining methods. Given the simplicity of the Normal Ratio method, its application for estimating missing precipitation values in similar climates is suggested.

Key words: Missing precipitation data, Normal Ratio, Inverse Distance, K-NN, Support vector regression.


1. Introduction

Understanding and quantifying the spatial and temporal variability of precipitation in a watershed are crucial tasks for hydrologic modeling, analysis and design of water resources systems. Continuous precipitation data at different spatial and temporal scales are essential for hydrologic simulation models that use precipitation as an input to predict accurately the watershed response to different precipitation events (Teegavarapu et al. 2009). Gaps in the records of data acquisition systems are often attributed to causes such as absence of the observer, instrument failure and communication line breakdown, especially in developing countries (Hasanpour Kashani and Dinpashoh 2012). The accuracy of reconstruction of missing hydrological data differs among variables and has been the lowest for precipitation. This is due to the highly stochastic nature of precipitation compared with other climatic variables such as temperature, which show a lower degree of spatial and temporal variability (Kim and Pachepsky 2010).

Given the importance of reliable and complete precipitation data sets, many studies have addressed this problem. For example, Eischeid et al. (1995) examined the performance of six methods, including empirical and statistical methods, for monthly mean temperature and monthly precipitation. They concluded that the multiple linear regression method was the best among them. Xia et al. (1999) estimated missing hydrological data, including precipitation, using six methods in Bavaria, Germany. They found that the multiple linear regression method estimated the missing data of the study area well. Teegavarapu and Chandramouli (2005) developed artificial neural network, kriging and inverse distance weighting method (IDWM) approaches for estimating missing precipitation data in the state of Kentucky, USA. Their results suggested that conceptual revisions can improve the estimation of missing precipitation records by defining better weighting parameters and surrogate measures for distances in the IDWM. De Silva et al. (2007) used the Aerial precipitation ratio method, Arithmetic mean method, Normal ratio method and Inverse distance method to estimate missing rainfall data. Their results showed that the normal ratio method was the most suitable. Further, the Arithmetic mean method was more appropriate for the upcountry Wet zone, while the Aerial precipitation ratio method was more suitable for the mid-country Wet zone. Teegavarapu (2009) used association rules within weighting methods to improve the estimation of precipitation data. Dastorani et al. (2009) tried to predict missing data using the normal ratio method, the correlation method, a relevant ANN architecture and ANFIS. According to their results, the ANFIS technique showed a superior ability to predict missing flow data. ANN was also found to be an efficient method for predicting missing data in comparison with the traditional approaches. Hasanpour Kashani and Dinpashoh (2012) evaluated the efficiency of different estimation methods for missing climatological data. Their results showed that although artificial intelligence techniques are more complex, and identifying their best structure for optimum estimation is time consuming, they outperform the classical methods. Their results also indicated that multiple regression analysis was the most suitable of the classical methods. Choge and Regulwar (2013) used an artificial neural network method to estimate missing precipitation data. Their results suggest that an ANN model can work for estimating missing data. Che Ghani et al. (2014) estimated missing rainfall data for the Raja River using the gene expression programming (GEP) method. Their study illustrates the application of GEP to determine the most suitable rainfall station to replace the principal rainfall station.

There appear to be no significant studies evaluating various methods for estimating missing precipitation data in the arid parts of Iran. Moreover, most existing research compares ANN and GEP methods with classic methods, and there is no significant research evaluating the efficiency of K-nearest neighbors (K-NN), support vector machine (SVM) and multilayer perceptron (MLP) artificial neural networks for estimating missing precipitation data, especially in arid areas. The aim of this study is to investigate the ability of 8 different classic and data-driven methods to estimate missing precipitation data in the arid parts of southern Iran and to identify the most appropriate method. The eight examined methods are inverse distance interpolation (ID), multiple imputations (MI), multiple regression analysis (MLR), the normal ratio method (NR), the UK traditional method (UK) and three data mining methods: K-NN, SVM and MLP.

2. Materials and methods

2.1. Study area

The study region is located in southern and southeastern Iran and covers an area of more than 250 thousand square kilometers. It includes hot and dry parts of Iran, and its climate is arid to semi-arid. Monthly precipitation data from seven rain gauge stations in southern Iran (Bandar Abbas, Zahedan, Iranshahr, Saravan, Minab, Khash and Abomoosa Island) for the period between 1986 and 2014 were used in this investigation. The climate of each station was determined using the De Martonne (1923) aridity index, equation (1):

$$I = \frac{P}{T + 10} \qquad (1)$$

where P and T are the average annual precipitation (mm) and temperature (°C), respectively. Figure 1 shows the geographical area of the study region, and Table 1 lists the geographic coordinates, elevations and aridity indices of the examined weather stations. Table 2 presents the statistics of the precipitation data of the studied stations.
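Equation (1) can be written as a one-line function. The sketch below is only illustrative: the function name and the input values are hypothetical and are not taken from the station records.

```python
def de_martonne_index(mean_annual_precip_mm: float, mean_annual_temp_c: float) -> float:
    """De Martonne aridity index, I = P / (T + 10), as in equation (1)."""
    if mean_annual_temp_c <= -10.0:
        # The denominator T + 10 must stay positive for the index to be meaningful.
        raise ValueError("temperature must be above -10 degrees C")
    return mean_annual_precip_mm / (mean_annual_temp_c + 10.0)


# Hypothetical inputs, for illustration only (not station data from this study).
example_index = de_martonne_index(150.0, 20.0)
print(example_index)
```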


Fig. 1. Geographical area of the study region

Table 1. Characteristics of the selected weather stations in Iran

| Station | Latitude (N) | Longitude (E) | Elevation (m) | Index of aridity | Climate type |
|---|---|---|---|---|---|
| Abomoosa Island | 25° 50′ | 54° 50′ | 6.6 | 0.280 | Dry |
| Bandar Abbas | 27° 13′ | 56° 22′ | 9.8 | 0.383 | Dry |
| Zahedan | 29° 28′ | 60° 53′ | 1370 | 0.212 | Dry |
| Iranshahr | 27° 12′ | 60° 42′ | 591.1 | 0.239 | Dry |
| Saravan | 27° 20′ | 62° 20′ | 1195 | 0.282 | Dry |
| Khash | 28° 13′ | 61° 12′ | 1394 | 0.399 | Dry |
| Minab | 27° 6′ | 57° 5′ | 29.6 | 0.436 | Dry |

Table 2. Range and statistics of precipitation data at the investigated stations

| Station | Min (mm) | Max (mm) | Mean (mm) | Standard deviation |
|---|---|---|---|---|
| Abomoosa Island | 0 | 205 | 10.653 | 28.037 |
| Bandar Abbas | 0 | 194.7 | 14.128 | 33.402 |
| Zahedan | 0 | 192.3 | 7.438 | 19.314 |
| Iranshahr | 0 | 163.5 | 8.684 | 18.176 |
| Saravan | 0 | 195.3 | 10.448 | 22.594 |
| Khash | 0 | 183.6 | 12.494 | 26.944 |
| Minab | 0 | 195.3 | 16.804 | 35.979 |

In this study, about 10 percent of the existing monthly precipitation data from the Khash station were hypothetically and randomly assumed to be unavailable. After statistical analysis and quality control of the available data, the efficiency of different classic and modern data mining methods in estimating those hypothetically lost data was evaluated.
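The hold-out step described above (treating a random tenth of the target record as missing and keeping the true values for later comparison) could be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the fixed random seed and the toy series are hypothetical, and the study does not say how its random selection was implemented.

```python
import random


def mask_random_fraction(series, fraction=0.10, seed=0):
    """Hide a random fraction of a series; return the masked copy and the hidden values.

    `seed` is an illustrative choice so the selection is reproducible.
    """
    rng = random.Random(seed)
    n_missing = round(len(series) * fraction)
    missing_idx = sorted(rng.sample(range(len(series)), n_missing))
    masked = list(series)
    held_out = {}
    for i in missing_idx:
        held_out[i] = masked[i]  # keep the true value for later evaluation
        masked[i] = None         # mark the month as missing
    return masked, held_out


# Toy monthly series (hypothetical values, not the Khash record).
toy_series = [float(v) for v in range(100)]
masked_series, truth = mask_random_fraction(toy_series)
print(len(truth), "values held out")
```

The held-out dictionary plays the role of the "actual" values against which each estimation method is later scored.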

2.2. Inverse distance interpolation (ID)

This method is used to estimate missing data because of its simplicity (Hubbard 1994; Hasanpour Kashani and Dinpashoh 2012):

$$V_0 = \frac{\sum_{i=1}^{n} \left( V_i / d_i \right)}{\sum_{i=1}^{n} \left( 1 / d_i \right)} \qquad (2)$$

where V0 is the estimated value of the missing data at the target station, Vi is the value observed at the ith nearest weather station, n is the number of surrounding stations used, and di is the distance between the station having the missing data and the ith nearest weather station (Hasanpour Kashani and Dinpashoh 2012).
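Equation (2) translates directly into code. The sketch below assumes plain Python lists of neighbor values and distances; the function name and the sample numbers are hypothetical.

```python
def inverse_distance_estimate(values, distances):
    """Estimate V0 as the inverse-distance weighted mean of neighbor values (equation 2)."""
    if not values or len(values) != len(distances):
        raise ValueError("values and distances must be non-empty and of equal length")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    numerator = sum(v / d for v, d in zip(values, distances))
    denominator = sum(1.0 / d for d in distances)
    return numerator / denominator


# Hypothetical monthly precipitation (mm) at two neighbors and their distances (km).
estimate = inverse_distance_estimate([12.0, 6.0], [1.0, 2.0])
print(estimate)
```

Closer stations receive larger weights, so the estimate is pulled toward the value at the nearest neighbor.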

2.3. Normal ratio method (NR)

The normal ratio method, which was first proposed by Paulhus and Kohler (1952) and later modified by Young (1992), is a common method for estimating missing rainfall data. The estimated data are considered as a combination of variables with different weights, i.e.

$$V_0 = \frac{\sum_{i=1}^{n} W_i V_i}{\sum_{i=1}^{n} W_i} \qquad (3)$$

where Wi is the weight of the ith nearest weather station and can be estimated as:

$$W_i = \left[ r_i^2 \left( \frac{n_i - 2}{1 - r_i^2} \right) \right] \qquad (4)$$

where ri is the correlation coefficient between the target station and the ith surrounding station, and ni is the number of points used to derive the correlation coefficient (Hasanpour Kashani and Dinpashoh 2012).
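Equations (3) and (4) can be combined in a short sketch. The correlations below are the Khash values from the correlation matrix reported in the Results (Iranshahr 0.825, Zahedan 0.731, Minab 0.714); the precipitation values, the sample size of 30 and the function names are hypothetical and serve only to illustrate the calculation.

```python
def normal_ratio_weight(r, n):
    """Weight W_i = r^2 * (n - 2) / (1 - r^2), as in equation (4)."""
    if n <= 2:
        raise ValueError("need more than two points to derive the correlation")
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return (r * r) * (n - 2) / (1.0 - r * r)


def normal_ratio_estimate(values, correlations, sample_sizes):
    """Weighted mean of neighbor values using the weights of equation (4) (equation 3)."""
    weights = [normal_ratio_weight(r, n) for r, n in zip(correlations, sample_sizes)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return sum(w * v for w, v in zip(weights, values)) / total


# Correlations with Khash from the correlation matrix; values (mm) and n are hypothetical.
estimate = normal_ratio_estimate([12.0, 5.0, 20.0], [0.825, 0.731, 0.714], [30, 30, 30])
print(round(estimate, 2))
```

Because the weight grows quickly as r approaches 1, the most strongly correlated station dominates the estimate.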

2.4. Multiple linear regression (MLR)

MLR is a statistical method for estimating the relationship between a dependent variable and two or more independent (or predictor) variables. MLR produces a model that identifies the best weighted combination of independent variables to predict the dependent (or criterion) variable. Eischeid et al. (1995) highlighted many advantages of this method in data interpolation and estimation of missing data. Missing data (V0) were estimated as:

$$V_0 = a_0 + \sum_{i=1}^{n} a_i V_i \qquad (5)$$

where a0, a1, …, an are the regression coefficients.
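One way to obtain the coefficients of equation (5) is ordinary least squares. The sketch below solves the normal equations with Gaussian elimination in plain Python; it is an illustrative assumption, not the procedure used in the study, and the function names and toy data are hypothetical.

```python
def fit_mlr(predictors, target):
    """Least-squares fit of V0 = a0 + sum(a_i * V_i); returns [a0, a1, ..., an]."""
    rows = [[1.0] + [float(v) for v in row] for row in predictors]  # add intercept column
    p = len(rows[0])
    # Augmented matrix of the normal equations (X^T X) a = X^T y.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, target))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, p):
            factor = aug[k][col] / aug[col][col]
            for j in range(col, p + 1):
                aug[k][j] -= factor * aug[col][j]
    # Back substitution.
    coeffs = [0.0] * p
    for i in range(p - 1, -1, -1):
        rest = sum(aug[i][j] * coeffs[j] for j in range(i + 1, p))
        coeffs[i] = (aug[i][p] - rest) / aug[i][i]
    return coeffs


def predict_mlr(coeffs, row):
    """Apply equation (5) to one set of neighbor values."""
    return coeffs[0] + sum(a * v for a, v in zip(coeffs[1:], row))


# Toy data generated from V0 = 1 + 2*V1 + 3*V2 (hypothetical, for illustration).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefficients = fit_mlr(X, y)
print([round(c, 6) for c in coefficients])
```

In practice the rows would be months with complete records at all stations, and the fitted model would then be applied to the months where the target value is missing.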

2.5. Multiple imputations (MI)

A once-common imputation method is single imputation, which allows parameter estimation. However, single imputation ignores the variability of the estimate, which leads to underestimation of standard errors and confidence intervals. To overcome this underestimation problem, the multiple imputations method is used, in which each missing value is estimated with a distribution of imputations that reflects the uncertainty about the missing data. Multiple imputations provide one of the best methods for dealing with missing values. Since rainfall data are heavily skewed to the right, the data need to be transformed by taking the natural logarithm of the observed data before the method is applied. Then the average of the imputed data sets is calculated and used to fill in the missing data at the target station (Radi et al. 2015). In many studies, five imputed data sets are considered enough. For example, Schafer and Olsen (1998) suggest that in many applications three to five imputations are sufficient to obtain excellent results. In this study the XLSTAT statistical software was used to generate the multiple imputations.

2.6. UK traditional method (UK)

This method, traditionally used by the UK Meteorological Office to estimate missing temperature and sunshine data, is based on comparisons with a single neighboring station (Hasanpour Kashani and Dinpashoh 2012). In this study, to estimate rainfall data, the ratio of the average rainfall at the target station to the average rainfall at the station most highly correlated with the target station was calculated. This ratio was then multiplied by the value available at that most highly correlated station to obtain the corresponding missing rainfall value.
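The ratio procedure described above can be sketched in a few lines. The function name and the numbers are hypothetical, and the sketch assumes the two averages are computed over the available records of each station.

```python
def uk_ratio_estimate(target_history, neighbor_history, neighbor_value):
    """Scale the neighbor's value by the ratio of mean rainfall (target / neighbor)."""
    mean_target = sum(target_history) / len(target_history)
    mean_neighbor = sum(neighbor_history) / len(neighbor_history)
    if mean_neighbor == 0:
        raise ValueError("neighbor mean rainfall is zero; ratio is undefined")
    return (mean_target / mean_neighbor) * neighbor_value


# Hypothetical records (mm): target station, best-correlated neighbor, and the
# neighbor's value in the month that is missing at the target station.
estimate = uk_ratio_estimate([10.0, 20.0, 30.0], [5.0, 10.0, 15.0], 8.0)
print(estimate)
```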

2.7. K-Nearest Neighbors (K-NN)

The k-nearest-neighbor method was first described in the early 1950s. The method is labor intensive when given large training sets, and it did not gain popularity until the 1960s, when increased computing power became available. It has since been widely used in the area of pattern recognition. Nearest-neighbor classifiers are based on learning by analogy, that is, by comparing a given test tuple with training tuples that are similar to it. The training tuples are described by n attributes. Each tuple represents a point in an n-dimensional space. In this way, all of the training tuples are stored in an n-dimensional pattern space. When given an unknown tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k "nearest neighbors" of the unknown tuple. "Closeness" is defined in terms of a distance metric, such as the Euclidean distance, equation (6) (Han and Kamber 2006):

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (6)$$

where x and y are two tuples, and xi and yi are their values for the ith of the n attributes.
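For estimating a numeric value such as rainfall, a common variant averages the target values of the k nearest training tuples. The sketch below takes that approach as an assumption (the text above describes the classifier form only); the function names and the toy data are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two tuples with the same attributes (equation 6)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_estimate(train_x, train_y, query, k=3):
    """Average the targets of the k training tuples closest to the query tuple."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training tuples")
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / k


# Toy example: each tuple holds one neighbor-station value (mm); targets are the
# corresponding values at the station being estimated (all hypothetical).
train_x = [[0.0], [1.0], [2.0], [10.0]]
train_y = [0.0, 10.0, 20.0, 100.0]
print(knn_estimate(train_x, train_y, [1.0], k=3))
```

In a gap-filling setting each training tuple would hold the neighboring stations' precipitation for one month, and the query would be the neighbors' values in the month missing at the target station.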

2.8. Support Vector Machine (SVM)

The first paper on support vector machines was presented in 1992 by Vladimir Vapnik and colleagues Bernhard Boser and Isabelle Guyon, although the groundwork for SVMs has been around since the 1960s. Although the training time of even the fastest SVMs can be extremely slow, they are highly accurate, owing to their ability to model complex nonlinear decision boundaries. They are much less prone to overfitting than other methods. The support vectors found also provide a compact description of the learned model. SVMs can be used for prediction as well as classification. They have been applied to a number of areas, including handwritten digit recognition, object recognition and speaker identification, as well as benchmark time-series prediction tests. An SVM uses a nonlinear mapping to transform the original training data into a higher dimension. Within this new dimension, it searches for the linear optimal separating hyperplane (that is, a "decision boundary" separating the tuples of one class from another). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane (Han and Kamber 2006).

2.9. Multilayer perceptron ANNs (MLP)

A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. MLP utilizes a supervised learning technique called backpropagation for training the network (Rosenblatt 1962; Rumelhart et al. 1986). MLP is a modification of the standard linear perceptron and can distinguish data that are not linearly separable (Cybenko 1989). The multilayer perceptron consists of three or more layers (an input and an output layer with one or more hidden layers) of nonlinearly-activating nodes and is thus considered a deep neural network.

2.10. Performance criteria

In order to compare the accuracy of the discussed methods for reconstructing missing monthly rainfall data and to select the most appropriate one, three statistical measures were used as follows:

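$$r_{pearson} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (7)$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - y_i \right| \qquad (8)$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - x_i)^2}{n}} \qquad (9)$$

where x is the observed value, y denotes the computed value, x̄ and ȳ are their means, and n is the number of values compared.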
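The three criteria in equations (7) to (9) can be computed with the standard library alone. The function names and the toy series in this sketch are hypothetical.

```python
import math


def pearson_r(observed, computed):
    """Pearson correlation coefficient between observed and computed values (equation 7)."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(computed) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, computed))
    var_x = sum((x - mean_x) ** 2 for x in observed)
    var_y = sum((y - mean_y) ** 2 for y in computed)
    return cov / math.sqrt(var_x * var_y)


def mae(observed, computed):
    """Mean absolute error (equation 8)."""
    return sum(abs(x - y) for x, y in zip(observed, computed)) / len(observed)


def rmse(observed, computed):
    """Root mean square error (equation 9)."""
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(observed, computed)) / len(observed))


# Toy observed and computed series (hypothetical values in mm).
obs = [1.0, 2.0, 3.0, 4.0]
est = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(obs, est), mae(obs, est), rmse(obs, est))
```

The toy series shows why the criteria are reported together: the estimates are perfectly correlated with the observations yet still carry a nonzero MAE and RMSE.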

3. Results

Given the importance of using accurate and reliable data sets in climate and hydrologic studies, in the first step different homogeneity and normality tests and the Mann-Kendall trend test were applied to the data sets using XLSTAT software. The results indicate the reliability of the data sets used in this study. Because the correlation of monthly precipitation between stations plays a role in selecting the best input variables, the correlation values are shown in Table 3.

Table 3. Correlation matrix of the investigated stations

| Station | Zahedan | Iranshahr | Saravan | Bandar Abbas | Minab | Abomoosa | Khash |
|---|---|---|---|---|---|---|---|
| Zahedan | 1 | 0.567 | 0.600 | 0.463 | 0.546 | 0.453 | 0.731 |
| Iranshahr | 0.567 | 1 | 0.796 | 0.544 | 0.595 | 0.580 | 0.825 |
| Saravan | 0.600 | 0.796 | 1 | 0.431 | 0.474 | 0.476 | 0.703 |
| Bandar Abbas | 0.463 | 0.544 | 0.431 | 1 | 0.837 | 0.708 | 0.631 |
| Minab | 0.546 | 0.595 | 0.474 | 0.837 | 1 | 0.697 | 0.714 |
| Abomoosa | 0.453 | 0.580 | 0.476 | 0.708 | 0.697 | 1 | 0.627 |
| Khash | 0.731 | 0.825 | 0.703 | 0.631 | 0.714 | 0.627 | 1 |

As can be seen in Table 3, precipitation at the Khash synoptic station is most strongly correlated with the Iranshahr, Zahedan and Minab stations, in that order. In modeling with the classic statistical methods, data from the stations most highly correlated with the target station were used, as each method requires. For the multiple imputation method, the best results were obtained when three imputed data sets were used, which is in accord with Schafer and Olsen (1998). For the modern data mining methods, various combinations of input parameters, consisting of precipitation at stations neighboring Khash, were tested first. The modeling showed that the best results are obtained when monthly precipitation at the Iranshahr, Zahedan, Minab and Saravan stations is used. For the K-NN method, the best result was obtained using 3 nearest neighbors. For the SVR method, after different kernel functions were tested, the Pearson universal kernel with parameter values (C = 0.4, σ and ω = 3) was selected. For the MLP method, the best structure was found to be (3-4-1), with a sigmoid transfer function and a learning rate of 0.4. The results obtained from the classic statistical methods and the modern data mining methods are presented in Table 4.

Table 4. Performance criteria values for different methods of estimating missing monthly rainfall data

| Group | Method | R | RMSE (mm) | MAE (mm) |
|---|---|---|---|---|
| Classic statistical methods | NR | 0.928 | 18.77 | 9.15 |
| Classic statistical methods | ID | 0.923 | 22.76 | 10.61 |
| Classic statistical methods | UK | 0.847 | 19.38 | 11.28 |
| Classic statistical methods | MI | 0.856 | 20.29 | 15.27 |
| Classic statistical methods | MLR | 0.824 | 28.20 | 15.42 |
| Modern data mining methods | K-NN | 0.918 | 19.01 | 9.48 |
| Modern data mining methods | SVR | 0.894 | 16.48 | 9.76 |
| Modern data mining methods | MLP | 0.843 | 20.21 | 13.46 |


As can be seen in Table 4, the normal ratio and inverse distance methods are the most accurate of the classical statistical methods, in that order, and the K-NN method is the most accurate of the modern data mining methods. The high accuracy of the classic statistical methods seems to be due to the fact that the stations under study follow a rather similar precipitation pattern and have a similar climate type. The results of comparing the various methods are presented as scatter plots and time-series charts in Figures 2 and 3.

Fig. 2. Scatter diagrams of predicted and observed precipitation values (x-axis: Actual (mm); y-axis: Estimated (mm)): a) NR, b) ID, c) UK, d) MI, e) MLR, f) K-NN, g) SVR, h) MLP

Fig. 3. Time series of values estimated with selected methods (NR, ID, K-NN, SVR) and observed (Actual) missing precipitation (x-axis: month having missing data, 1 to 34; y-axis: values of precipitation (mm))

A closer look at Figures 2 and 3 shows that the values estimated for the Khash station by the normal ratio, inverse distance and K-NN methods fit the observed data more closely and consistently than those of the other methods studied, which indicates their high capability in estimating missing monthly precipitation values. Figure 3 presents the time series of monthly precipitation estimated by the normal ratio and inverse distance methods, the best classical statistical methods, and by the K-NN and SVR methods, the best modern data mining methods, against the observed monthly precipitation values at the Khash station that were treated as missing. The figure indicates the accuracy of these methods and their agreement with the observations compared with the other methods.

4. Discussion

Gaps in the records of data acquisition systems are common, and a number of different methods have been suggested to handle this problem so that hydrological studies can be reliable. In this study, after the homogeneity, normality and lack of trend in the data sets were confirmed, different classic statistical methods and modern data mining methods were applied to fill gaps in the monthly precipitation data at the Khash station. The results indicate that, among the investigated methods, the normal ratio method has the best performance in estimating missing monthly precipitation values in the study area, with criterion values of CC = 0.928, RMSE = 18.77 mm and MAE = 9.15 mm. This is attributed to the similar climate type and precipitation pattern at the investigated stations. Among the modern data mining methods, the K-NN method has higher accuracy than the SVR and MLP methods, with criterion values of CC = 0.918, RMSE = 19.01 mm and MAE = 9.48 mm. In a similar study, De Silva et al. (2007) suggested the normal ratio and inverse distance methods for estimating missing precipitation values in dry climates. In another study, Hasanpour Kashani and Dinpashoh (2012) reported that, among the classic statistical methods, the normal ratio and inverse distance methods were the most accurate after the simple arithmetic averaging method for estimating missing precipitation data in dry and semi-dry areas. However, they did not use recently developed methods such as K-NN, SVR and MLP. Given the simplicity of the normal ratio method, its application for estimating missing precipitation values in similar climates is suggested. In this research the appropriate methods for estimating missing precipitation values were selected for Iran; in our opinion the results would also be applicable to other arid and semi-arid countries, because arid and semi-arid regions share broadly similar climate conditions.

5. References

1. Boser B, Guyon I, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proc. Fifth Annual Workshop on Computational Learning Theory, pp. 144-152. ACM Press, San Mateo, CA.
2. Che Ghani, Nor Zaimah, Abuhasan, Zorkefle, and Tze Liang, Lau (2014) Estimation of missing rainfall data using GEP: case study of Raja River, Alor Setar, Kedah. Advances in Artificial Intelligence, pp. 1-5. http://dx.doi.org/10.1155/2014/716398
3. Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems 2(4): 303-314.
4. Dastorani MT, Moghadamnia A, Piri J, Rico-Ramirez M (2009) Application of ANN and ANFIS models for reconstructing missing flow data. Environ Monit Assess. doi:10.1007/s10661-009-1012-8.
5. De Martonne E (1923) Aridité et indices d'aridité. Académie des Sciences. Comptes Rendus 182(23): 1935-1938.
6. De Silva RP, Dayawansa NDK, Ratnasiri MD (2007) A comparison of methods used in estimating missing rainfall data. The Journal of Agricultural Sciences 3(2): 101-108.
7. Eischeid JK, Baker CB, Karl TR, Diaz HF (1995) The quality control of long-term climatological data using objective data analysis. Journal of Applied Meteorology and Climatology 34: 2787-2795.
8. Han J, Kamber M (2006) Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann Publishers. ISBN 1-55860-901-6.
9. Hasanpour Kashani M, Dinpashoh Y (2012) Evaluation of efficiency of different estimation methods for missing climatological data. Stochastic Environmental Research and Risk Assessment 26: 59-71.
10. Hubbard KG (1994) Spatial variability of daily weather variables in the high plains of the USA. Agric For Meteorol 68: 29-41.
11. Choge HK, Regulwar DG (2013) Artificial neural network method for estimation of missing data. International Journal of Advanced Technology in Civil Engineering 2(1): 1-4.
12. Kim JW, Pachepsky YA (2010) Reconstructing missing daily precipitation data using regression trees and artificial neural networks for SWAT streamflow simulation. Journal of Hydrology 394: 305-314.
13. Radi N, Zakaria R, Azman M (2015) Estimation of missing rainfall data using spatial interpolation and imputation methods. AIP Conference Proceedings 1643(1): 42-48. DOI: 10.1063/1.4907423.
14. Rosenblatt F (1962) Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington DC.
15. Rumelhart DE, McClelland JL and the PDP Research Group (editors) (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press, Cambridge, MA, USA. ISBN 0262-68053-X.
16. Schafer JL, Olsen MK (1998) Multiple imputation for multivariate missing-data problems: a data analyst's perspective. Multivariate Behavioral Research 33(4): 545-571.
17. Teegavarapu RSV (2009) Estimation of missing precipitation records integrating surface interpolation techniques and spatio-temporal association rules. Journal of Hydroinformatics 11(2): 133-146.
18. Teegavarapu RSV, Tufail M, Ormsbee L (2009) Optimal functional forms for estimation of missing precipitation data. Journal of Hydrology 374: 106-115.
19. Teegavarapu RSV, Chandramouli V (2005) Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records. Journal of Hydrology 312: 191-206.
20. Xia Y, Fabian P, Stohl A, Winterhalter M (1999) Forest climatology: estimation of missing values for Bavaria, Germany. Agricultural and Forest Meteorology 96: 131-144.
21. Young KC (1992) A three-way model for interpolating monthly precipitation values. Mon Weather Rev 120: 2561-2569.