Journal of Hydrology 492 (2013) 1–14

Contents lists available at SciVerse ScienceDirect

Journal of Hydrology journal homepage: www.elsevier.com/locate/jhydrol

Performance assessment of different data mining methods in statistical downscaling of daily precipitation

M. Nasseri a,⇑, H. Tavakol-Davani a, B. Zahraie b

a School of Civil Engineering, University of Tehran, Tehran, Iran
b Center of Excellence for Engineering and Management of Civil Infrastructures, School of Civil Engineering, University of Tehran, P.O. Box 11155-4563, Tehran, Iran

Article info

Article history:
Received 16 April 2012
Received in revised form 7 April 2013
Accepted 9 April 2013
Available online 18 April 2013
This manuscript was handled by Andras Bardossy, Editor-in-Chief, with the assistance of Sheng Yue, Associate Editor

Keywords: Statistical downscaling; Nonlinear data-mining method; Climate change

Summary

In this paper, nonlinear Data-Mining (DM) methods are used to extend the most cited statistical downscaling model, SDSM, for downscaling of daily precipitation. The proposed model is the Nonlinear Data-Mining Downscaling Model (NDMDM). The four nonlinear and semi-nonlinear DM methods included in NDMDM are cubic-order Multivariate Adaptive Regression Splines (MARS), Model Tree (MT), k-Nearest Neighbor (kNN) and Genetic Algorithm-optimized Support Vector Machine (GA-SVM). The daily records of 12 rain gauge stations scattered in basins with various climates in Iran are used to compare the performance of NDMDM with that of the statistical downscaling method. Comparison between the statistical downscaling and NDMDM results at the selected stations indicates that a combination of the MT and MARS methods can provide daily rainfall estimates with lower mean absolute error, and with monthly standard deviation and skewness values closer to the historical records, for both the calibration and validation periods. The future projections of precipitation at the selected rain gauge stations under the A2 and B2 SRES scenarios show significant uncertainty in the NDMDM and statistical downscaling models. © 2013 Elsevier B.V. All rights reserved.

1. Introduction

Outputs of Global Circulation Models (GCMs) are the basis of climate change studies. The spatial resolution of these data is too coarse to determine local climate change effects, so they must be recalculated to a suitable resolution to be valid for local meteorological analysis. The methods of extracting regional-scale meteorological variables from GCM outputs are known as downscaling approaches. The four general categories of downscaling approaches are regression (empirical) methods (Enke and Spekat, 1997; Faucher et al., 1999; Li and Sailor, 2000; Wilby et al., 2002; Hessami et al., 2008; Raje and Mujumdar, 2011), weather pattern approaches (Bárdossy and Plate, 1992; Yarnal et al., 2001; Bárdossy et al., 2002; Wetterhall et al., 2009; Anandhi et al., 2011), stochastic weather generators (Semenov and Barrow, 1997; Bates et al., 1998) and regional climate models (Mearns et al., 1995).

Regression or empirical methods are the most cited approaches in downscaling simulation. Ease of use, relatively low pre-processing costs and a straightforward computational procedure are the main reasons for the popularity of these downscaling techniques. Finding the empirical relationships between the global and local scales of climate circulation is the basic premise of any statistical downscaling method. According to this assumption, the correlation of global GCM meteorological variables (predictors) and local meteorological variables such as observed precipitation and temperature (predictands) is the key point of this type of downscaling procedure. The most well-known regression-based downscaling methods are structured for separate estimation of the occurrence and amount of meteorological variables. Advantages and disadvantages of statistical regression-based downscaling methods have been comprehensively discussed by Hessami et al. (2008).

Different nonlinear Data Mining (DM) methods such as Artificial Neural Networks (ANNs) (Tomassetti et al., 2009; Pasini, 2009; Mendes and Marengo, 2010; Fistikoglu and Okkan, 2011), k-Nearest Neighbor (kNN) (Yates et al., 2003; Gangopadhyay et al., 2005; Raje and Mujumdar, 2011), Support Vector Machine (SVM) (Tripathi et al., 2006; Chen et al., 2010), Model Tree (MT) (Li and Sailor, 2000) and Multivariate Adaptive Regression Splines (MARS) (Corte-Real et al., 1995), besides linear regression methods (Wilby et al., 2002; Hessami et al., 2008), have been used in previous studies for climatological research. The Statistical DownScaling Model (SDSM) is the most cited concept and package among regression-based statistical downscaling methods. This computer package uses Multiple Linear Regression (MLR) to estimate the amount and/or the occurrence of local meteorological predictands. In this paper, the efficiency of four nonlinear and semi-nonlinear DM methods with previous applications in climatological research, namely MARS, MT, kNN and Genetic Algorithm-optimized SVM (GA-SVM), is evaluated against the standard MLR in estimating both the occurrence and the amount of precipitation.

⇑ Corresponding author. Tel.: +98 912 209 4881. E-mail addresses: [email protected] (M. Nasseri), [email protected] (H. Tavakol-Davani), [email protected] (B. Zahraie).
0022-1694/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.jhydrol.2013.04.017


M. Nasseri et al. / Journal of Hydrology 492 (2013) 1–14

In this article, the structure of SDSM has been used as the main platform to develop the Nonlinear Data-Mining Downscaling Model (NDMDM) by replacing its MLR kernels with the selected DM methods. In the next sections, the local-scale datasets (areas of interest and predictands) and large-scale datasets (predictors) used in this study are described. Then, SDSM and the utilized DM methods are briefly described. The final sections of the paper present the results of the case study, concluding remarks and recommendations for further studies.

2. Datasets

2.1. Local dataset

To assess the efficiency of the proposed downscaling method, twelve rain gauge stations scattered in five different climatological basins in Iran, namely Hamoon-Jazmoorian, Sefidrood, Mordab-anzali, Shapoor-dalky and Mond, are used. These basins are located in an arid region in the southeast of Iran near the Iran-Pakistan border, a wet region in the north of Iran near the Caspian Sea, and a semi-arid region in the southwest of Iran near the Persian Gulf. Statistical characteristics of the observed daily precipitation at the selected stations, such as the average, maximum and standard deviation, are presented in Table 1. The locations of these rain gauge stations are shown in Fig. 1. As presented in Table 1, 26–35 years of daily precipitation records up to the year 2000 (the start year of the simulations of the climate change scenarios) are available for the selected rain gauge stations. For each station, the first 75% of the available record has been used for calibration of the downscaling model and the rest of the record has been used for validation. The daily precipitation records have been gathered from the Iran Water Resources Management Company.

2.2. Large scale datasets

The data bank of the Hadley Center GCM, namely HadCM3, for the A2 and B2 SRES (Special Report on Emission Scenarios) scenarios has been used in this study to project future climate behavior. The coarse-resolution (2.5° × 2.5°) reanalysis of atmospheric data from the U.S. National Centers for Environmental Prediction (NCEP) (Table 2) has been used to provide the downscaling model predictors. Because the spatial resolution of the HadCM3 outputs (3.75° (long.) × 2.5° (lat.)) differs from that of the NCEP dataset, the large-scale NCEP predictors projected onto the HadCM3 computational grid have been used in this study. The daily projected data and HadCM3 outputs are available from the Canadian Climate Impacts Scenarios (CCIS) website (www.cics.uvic.ca/scenarios/sdsm/select.cgi).

Twenty-six different atmospheric variables are available for each grid box in this database. For each rain gauge station, nine grid boxes covering and surrounding the study area have been selected. Fig. 1 depicts the center of each meteorological grid box and the locations of the selected rain gauge stations. As illustrated in this figure, the grid boxes cover a large area over and around the selected basins. In addition, one- to three-day lags of the predictors have been considered as candidate model inputs to incorporate cross-correlation and auto-correlation in the modeling process. For each station, 936 predictors (9 grid boxes × 26 meteorological predictors × 4 time lags of 0 to 3 days) have been analyzed.

3. Methodology

This section first describes the platform of SDSM. Then the different data-mining methods used in NDMDM are described, and finally the structure of NDMDM is explained.
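As a rough illustration of the predictor pool described in Section 2.2, the sketch below stacks nine grid boxes and 26 variables into one daily matrix and appends 1- to 3-day lagged copies, which gives 9 × 26 × 4 = 936 candidate columns. The array layout, the function name `build_lagged_predictors` and the random placeholder data are assumptions made for this example; they are not taken from the authors' MATLAB code.

```python
import numpy as np

N_BOXES, N_VARS, MAX_LAG = 9, 26, 3  # values quoted in Section 2.2

def build_lagged_predictors(fields, max_lag=MAX_LAG):
    """Stack lag-0..max_lag copies of a (days, boxes, vars) array.

    Returns a (days - max_lag, boxes * vars * (max_lag + 1)) matrix whose
    row r holds the predictors for day r + max_lag (hypothetical layout).
    """
    n_days = fields.shape[0]
    flat = fields.reshape(n_days, -1)              # (days, boxes * vars)
    blocks = [flat[max_lag - lag : n_days - lag]   # shift the series by `lag` days
              for lag in range(max_lag + 1)]
    return np.hstack(blocks)

rng = np.random.default_rng(0)
fields = rng.normal(size=(365, N_BOXES, N_VARS))   # placeholder data, not NCEP
X = build_lagged_predictors(fields)
print(X.shape)  # (362, 936)
```

The first three days are dropped because their 3-day lag is not available.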

Table 1. Basic information about the 12 rain gauge stations (Max. = maximum, Std. = standard deviation). Mean, Max. and Std. are statistical characteristics of observed daily rainfall (mm).

No.  Station code  Station name  Abbr.  Basin          Length of dataset (year)  Longitude (°E)  Latitude (°N)  Mean  Max.  Std.
1    44-014        Delfard       Del.   Jazmoorian     1975–2000                 57.60           29.00          1.20  150   6.31
2    44-009        Dehrood       Deh.   Jazmoorian     1975–2000                 57.73           28.87          0.76  132   4.71
3    44-016        Khoramshahi   Kho.   Jazmoorian     1975–2000                 57.75           29.00          1.32  194   6.95
4    44-024        Kharposht     Khar.  Jazmoorian     1975–2000                 57.83           28.48          0.46  80    3.20
5    17-082        Rasht         Ras.   Sefidrood      1966–2000                 49.60           37.25          3.58  188   10.34
6    17-075        Farshekan     Far.   Sefidrood      1966–2000                 49.58           37.40          3.30  168   9.93
7    18-007        Kasma         Kas.   Mordab-anzali  1966–2000                 49.30           37.31          3.01  317   9.59
8    18-017        Shanderman    Shan.  Mordab-anzali  1966–2000                 49.11           37.41          2.67  177   8.23
9    24-033        Khanzanian    Khan.  Mond           1972–2000                 52.15           29.67          1.27  92    5.26
10   23-011        Shapoor       Shap.  Shapoor-dalky  1972–2000                 51.11           29.58          0.85  75    4.45
11   23-019        Shoorjareh    Shoo.  Shapoor-dalky  1972–2000                 51.98           29.25          0.99  120   4.92
12   43-034        Arsanjan      Ars.   Shapoor-dalky  1972–2000                 51.30           29.92          0.87  111   4.56

Fig. 1. Location map of rain gauge stations in Jazmoorian, Sefidrood, Mordab-anzali, Shapoor-dalky and Mond basins.

3.1. Statistical downscaling model (SDSM)

The SDSM software is based on the Multiple Linear Regression Downscaling Model (MLRDM) (Wilby et al., 2002). SDSM outputs are the average of several weather ensembles produced by linear regression models with stochastic bias-correction terms. Because of the linear structure of SDSM, predictors are selected by correlation and partial correlation analysis between the predictand and the predictors, and the predictor weights are estimated by the ordinary least squares method. A dual simplex method is also provided in SDSM because regression coefficients can be unstable for non-orthogonal predictor vectors. Hessami et al. (2008) added ridge regression (Hoerl and Kennard, 1970) as an option in their downscaling model, ASD, as a further remedy for non-orthogonal predictor vectors. SDSM contains two separate sub-models that determine the occurrence and amount of conditional (discrete) meteorological variables such as precipitation, and a single amount model for unconditional (continuous) variables such as temperature or evaporation. Statistical downscaling using SDSM consists of the following steps:

1. In the first step, suitable predictors are selected. SDSM provides several statistical analyses that help users select the best predictors. In SDSM, predictors should have acceptable unconditional and conditional correlations with the predictand. Also, partial correlation, P-value and explained variance of the

predictors can be checked while using SDSM. The scatter plot is another tool provided in SDSM for selecting appropriate predictors. Acceptable ranges for the above-mentioned terms are proposed by Wilby et al. (2004).
2. A multiple linear regression model, called the unconditional model, is calibrated to simulate precipitation occurrence. This model can be calibrated by two different methods, namely ordinary least squares and dual simplex. An autoregressive term can be added to this model. For each month, one MLR model must be calibrated for occurrence estimation. Days with and without precipitation events are represented by 1 and 0, respectively. For each day and ensemble, a uniformly distributed random number between 0 and 1 is generated. If the random number is less than the output of the occurrence model on that day, precipitation occurs. Otherwise, precipitation does not occur.
3. Another multiple linear regression model, called the conditional model, is calibrated to simulate the precipitation amount. This model is calibrated using the rainy-day data. As with the unconditional model, SDSM calibrates a separate conditional model for each of the 12 months of the year. For a day identified as rainy in the previous step, the output of the amount model is calculated. Then, a normally distributed random number is added to the output to account for the modeling error. This random number is drawn from a normal distribution with zero mean and a standard deviation equal to the standard error.
4. The result of the previous step is compared with a predefined threshold. If the result is less than the threshold, precipitation does not occur. Otherwise, the result is taken as the rainfall amount on that day and in that ensemble.
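Steps 2 to 4 above can be sketched for a single month as follows. This is a minimal illustration, not the SDSM or MLRDM code: the linear models are fitted by ordinary least squares with `numpy.linalg.lstsq`, and the function names, the synthetic data, the ensemble size and the 0.1 mm threshold are all assumptions made for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ [1, X] (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def simulate_month(X_cal, rain_cal, X_sim, n_ens=20, threshold=0.1, seed=0):
    """Return an (n_ens, n_days) array of simulated daily precipitation."""
    rng = np.random.default_rng(seed)
    wet = rain_cal > 0
    occ_coef = fit_ols(X_cal, wet.astype(float))     # step 2: 1 = wet, 0 = dry
    amt_coef = fit_ols(X_cal[wet], rain_cal[wet])    # step 3: rainy days only
    resid = rain_cal[wet] - predict(amt_coef, X_cal[wet])
    dof = max(wet.sum() - X_cal.shape[1] - 1, 1)
    std_err = np.sqrt(np.sum(resid ** 2) / dof)      # standard error of the fit

    p_wet = predict(occ_coef, X_sim)                 # occurrence model output
    amount = predict(amt_coef, X_sim)
    u = rng.uniform(size=(n_ens, len(X_sim)))        # one draw per day and ensemble
    noise = rng.normal(0.0, std_err, size=u.shape)   # modeling-error term
    sim = np.where(u < p_wet, amount + noise, 0.0)   # amounts only on wet days
    sim[sim < threshold] = 0.0                       # step 4: threshold check
    return sim

# Synthetic demonstration data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
rain = np.where(rng.uniform(size=300) < 0.3, rng.gamma(2.0, 3.0, size=300), 0.0)
ensembles = simulate_month(X[:225], rain[:225], X[225:])  # 75% used for calibration
print(ensembles.shape)
```

A full implementation would repeat this for each of the 12 monthly models; the sketch keeps a single model to stay short.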


Table 2. Large-scale predictors from the NCEP database.

No.  Predictor                        Abbreviation
1    Mean sea level pressure          mslp
2    Surface airflow strength         p__f
3    Surface zonal velocity           p__u
4    Surface meridional velocity      p__v
5    Surface vorticity                p__z
6    Surface wind direction           p_th
7    Surface divergence               p_zh
8    500 hPa airflow strength         p5_f
9    500 hPa zonal velocity           p5_u
10   500 hPa meridional velocity      p5_v
11   500 hPa vorticity                p5_z
12   500 hPa wind direction           p5th
13   500 hPa divergence               p5zh
14   850 hPa airflow strength         p8_f
15   850 hPa zonal velocity           p8_u
16   850 hPa meridional velocity      p8_v
17   850 hPa vorticity                p8_z
18   850 hPa wind direction           p8th
19   850 hPa divergence               p8zh
20   500 hPa geopotential height      p500
21   850 hPa geopotential height      p850
22   Relative humidity at 500 hPa     r500
23   Relative humidity at 850 hPa     r850
24   Near surface relative humidity   Rhum
25   Near surface specific humidity   Shum
26   Mean temperature at 2 m          Temp

5. Furthermore, in SDSM, bias correction (b) (Eq. (1)) and variance inflation (VIF) (Eq. (2)) can be applied to the results of each monthly model to achieve acceptable ensemble results in both the calibration and validation periods (Hessami et al., 2008):

\[ b = \mathrm{Mean}_{obs} - \mathrm{Mean}_{mod} \qquad (1) \]

\[ \mathrm{VIF} = \frac{12\,(\mathrm{Var}_{obs} - \mathrm{Var}_{mod})}{\mathrm{Ste}^{2}} \qquad (2) \]

where Mean_obs and Mean_mod are the mean values of the observed and modeled precipitation, respectively, Var_obs and Var_mod are the variances of the observed and modeled precipitation for the calibration period, and Ste is the standard error in the same period. The bias correction b is added to the amount of precipitation on each day, and \(\sqrt{\mathrm{VIF}/12}\) is multiplied by the standard deviation of the modeling error. While the downscaling model is calibrated using the NCEP dataset, in estimating VIF and the bias correction the variables with the subscript mod are estimated from downscaling model outputs based on GCM simulations. This approach allows the modeler to take the bias of the GCM into account in the downscaling process.
6. Finally, to obtain a single downscaled time series from all projected ensembles, their arithmetic mean is calculated.

In this study, SDSM has been rewritten in the MATLAB environment. The accuracy and compatibility of the new MATLAB code with the SDSM package have been tested using several datasets. The SDSM MATLAB code has then been extended so that the user can choose different predictors for the amount and occurrence models. Because of this difference between the commercial SDSM package developed by Wilby et al. (2002) and the code developed in this study, the latter is referred to as MLRDM in the following sections of the paper.

3.2. MARS

The initial idea of MARS (Friedman and Stuetzle, 1981) was developed and completed by Friedman (1991). This method is based on multivariate regression with linear or nonlinear mathematical kernels enhanced by continuous data partitioning. Various applications of MARS can be found in hydrological studies (Coulibaly and Baldwine, 2005; Buccola and Wood, 2010; Herrera et al., 2010). MARS detects inherent data nonlinearity and appropriate partitions of the data structure for model parameters using a weighted summation of conditional linear or nonlinear polynomial basis functions. MARS uses the following structure for conditional regression:

\[ f(x) = \sum_{i=1}^{k} C_i \, B_i(x) \qquad (3) \]

where B_i(x) and C_i are the basis function and its constant coefficient, respectively, and k is the total number of basis functions used in the final model. The basis functions are known as hinge functions. The general form of a hinge function is as follows:

\[ B_i(x) = \max(0,\; x_i - \mathrm{const}_i) \quad \text{or} \quad \max(0,\; \mathrm{const}_i - x_i) \qquad (4) \]

In Eq. (4), the hinge function is linear, but it can also take the form of nonlinear functions of order higher than one. MARS has forward and backward routines for identifying the best structure and, especially, the model parameters. These coupled routines allow optimization of the MARS structure while avoiding over-parameterization and over-fitting. For more information, readers are referred to Friedman and Stuetzle (1981). FAST MARS (Friedman, 1993) and ARESLab (Jekabsons, 2010a) are recent software packages developed for MARS. The ARESLab package provided by Jekabsons (2010a) has been utilized in this study, and the maximum order of the polynomial function in ARESLab has been set to 3.

3.3. Model tree (MT)

MT is classified as a data-structured and modular DM method. Similar to MARS, MT is built on a data partitioning foundation. MT has splitting rules in the non-terminal nodes of the conceptual mathematical tree and regression functions at its leaves. The construction of an MT is therefore similar to that of a decision tree, while it converges faster than other DM methods. MT can successfully manage problems with high-dimensional spaces of up to hundreds of variables, and it combines a conventional decision tree with the possibility of generating linear regression functions at its leaves, which increases simulation performance. MT operates very similarly to piecewise or conditional mathematical functions. One of the first applications of a conditional regression method to describe the behavior of a hydrological rainfall-runoff system was presented in the 1970s by Becker (1976), and later by Becker and Kundzewicz (1987). MT has found many applications in climatological and hydrological sciences (Faucher et al., 1999; Li and Sailor, 2000; Xiong et al., 2001; Solomatine and Xue, 2004). M5 is a popular MT method. The M5 learning paradigm was developed by Quinlan (1986, 1993). The first version of M5 consisted of piecewise or conditional linear models, which made it an intermediate model between linear models and truly nonlinear models such as ANNs. The details of the algorithm of the first version of M5 can be found in Quinlan (1993) and Solomatine and Dulal (2003). A new version of M5, namely M5′, was presented by Wang and Witten (1997) and is used in the current paper. The main kernel of M5′ developed in MATLAB by Jekabsons (2010b) is used in NDMDM.

3.4. k-Nearest Neighbor (kNN)

One of the simplest methods in pattern recognition is k-Nearest Neighbor (kNN). It is an instance-based machine learning method. It classifies an object based on the nearest observations in the training dataset in the initial feature space: the object of interest is assigned to the most common class among its k nearest neighbors (k is a positive integer). k is the only parameter of this method that must be calibrated. Different revised forms of kNN have been presented in the literature (Lall and Sharma, 1996; Sharif and Burn, 2006). Useful reports and articles about applications of kNN in the hydro-sciences are available in the literature (Yates et al., 2003; Gangopadhyay et al., 2005; Sharif and Burn, 2006; Raje and Mujumdar, 2011). In this paper, the original type of kNN (with geometric distance) has been used for statistical downscaling simulation, and the best value of k in the range 1–20 has been identified during calibration.
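A kNN estimator with a search for k in the range 1–20 can be sketched as below. The Euclidean metric (one reading of "geometric distance"), the averaging of neighbor targets, the hold-out selection of k by mean absolute error and the synthetic data are assumptions for illustration; the source does not give these details.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Average the targets of the k nearest training points (Euclidean)."""
    # Pairwise distances, shape (n_query, n_train).
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]        # indices of the k closest days
    return y_train[nearest].mean(axis=1)

def select_k(X_train, y_train, X_val, y_val, k_values=range(1, 21)):
    """Pick the k in 1..20 with the lowest mean absolute error on held-out data."""
    errors = {k: np.mean(np.abs(knn_predict(X_train, y_train, X_val, k) - y_val))
              for k in k_values}
    best = min(errors, key=errors.get)
    return best, errors

# Synthetic example (hypothetical data, not station records).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
y = np.maximum(0.0, 2.0 * X[:, 0] + rng.normal(scale=0.5, size=400))
best_k, errors = select_k(X[:300], y[:300], X[300:], y[300:])
print(best_k, round(errors[best_k], 3))
```

With 0/1 occurrence targets the same average gives the wet fraction among the neighbors, which can be compared with a uniform random number as in the occurrence step.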

3.5. Genetic Algorithm-Optimized Support Vector Machine (GA-SVM)

SVM is a relatively new machine learning and data mining method intended to recognize data structures for classification or regression. The basic version of SVM was developed by Vapnik and Cortes (1995). The most important feature of SVM in detecting the data structure is the transformation of the original data from the input space to a new target space (feature space) using a mathematical paradigm called the kernel function (Boser et al., 1992). For this purpose, a nonlinear transformation function φ(·) is defined to project the input space into a higher-dimensional feature space R^{n_h}. According to Cover's theorem (Cover, 1965), a linear function f(·) can be formulated in the higher-dimensional feature space to represent a nonlinear relation between the inputs x_i and the outputs y_i as follows:

\[ y_i = f(x_i) = \langle w, \phi(x_i) \rangle + b \qquad (5) \]

where w and b are the model parameters. This mathematical approach was presented previously by Aizerman et al. (1964). Boser et al. (1992) utilized this formulation to develop nonlinear SVM. For more information about SVM in regression and pattern recognition modes, readers are referred to Vapnik (1998).

Because of the nonlinearity of SVM and its parameters, some researchers have optimized the SVM and kernel parameters with evolutionary algorithms (Fei and Sun, 2008; Oliveira et al., 2010). In this paper, regression-based SVM is used to downscale daily precipitation in both the occurrence and amount modes. To achieve the best performance of SVM, the kernel and SVM parameters (6 parameters) have been optimized using GA. Two kernel functions, namely sigmoid and Radial Basis Function (RBF), have been tested and calibrated in this study. For a more detailed description of SVM, readers are referred to Vapnik and Cortes (1995). In the next section, the procedure of predictor selection is described in detail.

3.6. Selection of the predictors

Feature selection techniques (here, selection of predictors) can be categorized into three main branches, namely embedded, wrapper and filter-based methods (Tan et al., 2006). Most of the well-known general approaches to feature selection fall into the latter two broad classes of wrapper and filter methods (Guyon and Elisseeff, 2003). Wrapper methods measure model performance for all or most of the possible subsets of input variables in order to find appropriate input subsets based on their calibration results (Liue and Yu, 2005). Filter-based methods are model-free techniques that use statistical criteria to find the dependencies between the candidate inputs (predictors) and the output variable(s). These criteria act as statistical benchmarks for reaching a suitable predictor dataset. The linear correlation coefficient is a popular criterion for measuring dependencies between input and output variables. Battiti (1994) showed that the efficiency of the linear correlation coefficient is affected by noise and by data transformation during data preprocessing and feature selection. Despite the popularity and simplicity of the linear correlation coefficient for exploring the dependency of variables, it is inappropriate for real nonlinear systems (Battiti, 1994). Mutual Information (MI), another filter criterion, describes the reduction in uncertainty in the estimation of one variable when another is available (Liu et al., 2009). It is a robust, nonlinear filter method and has recently been found to be an appropriate statistical criterion in feature or predictor selection problems in hydrology (Bowden et al., 2005a, 2005b; May et al., 2008a, 2008b).

Fig. 2. NDMDM procedure.

Achieving the best subset of input predictors in downscaling problems is complicated and challenging because of the large number of meteorological predictors and the interactions between the model parameters and the model structure. Since nonlinear mathematical

kernels have been used in the proposed downscaling model, MI is selected for choosing the best set of downscaling model predictor(s).

Table 3. Selected predictors for the occurrence model calibrated for different stations.

Station  Predictor  Lag  Longitude (°)  Latitude (°)  MI
Del.     rhum       0    56.25          30.00         0.0170
Del.     r850       0    56.25          30.00         0.0170
Del.     r850       0    60.00          27.50         0.0165
Del.     r850       0    56.25          27.50         0.0162
Del.     p500       0    52.50          30.00         0.0162
Deh.     r850       0    56.25          27.50         0.0169
Deh.     r850       0    56.25          30.00         0.0161
Deh.     r850       0    60.00          27.50         0.0161
Deh.     rhum       0    56.25          27.50         0.0152
Deh.     rhum       0    56.25          30.00         0.0152
Kho.     p500       0    52.50          30.00         0.0209
Kho.     pr850      0    52.50          32.50         0.0207
Kho.     pr850      0    56.25          27.50         0.0197
Kho.     pr850      0    60.00          27.50         0.0190
Kho.     rhum       0    56.25          27.50         0.0181
Khar.    r850       0    56.25          27.50         0.0113
Khar.    r850       0    56.25          30.00         0.0113
Khar.    r850       0    60.00          27.50         0.0112
Khar.    rhum       0    56.25          27.50         0.0111
Khar.    rhum       0    56.25          30.00         0.0110
Ras.     r850       0    48.75          37.50         0.0493
Ras.     r850       0    48.75          40.00         0.0429
Ras.     rhum       0    48.75          37.50         0.0380
Ras.     r850       0    52.50          37.50         0.0328
Ras.     p__u       0    48.75          40.00         0.0299
Far.     r850       0    48.75          37.50         0.0490
Far.     r850       0    48.75          40.00         0.0456
Far.     r850       0    52.50          37.50         0.0379
Far.     rhum       0    48.75          37.50         0.0335
Far.     r850       1    48.75          40.00         0.0319
Kas.     p__u       0    48.75          40.00         0.0449
Kas.     r850       0    48.75          37.50         0.0426
Kas.     r850       0    48.75          40.00         0.0352
Kas.     r850       0    52.50          37.50         0.0320
Kas.     rhum       0    48.75          37.50         0.0283
Shan.    p__u       0    48.75          40.00         0.0407
Shan.    p__z       0    48.75          40.00         0.0394
Shan.    pr850      0    48.75          37.50         0.0308
Shan.    pr850      0    48.75          40.00         0.0304
Shan.    rhum       0    48.75          37.50         0.0258
Khan.    rhum       0    52.50          30.00         0.0436
Khan.    r850       0    52.50          30.00         0.0436
Khan.    r850       0    48.75          30.00         0.0366
Khan.    r850       0    52.50          27.50         0.0350
Khan.    p8zh       1    48.75          27.50         0.0350
Shap.    r850       0    48.75          30.00         0.0473
Shap.    r850       0    48.75          32.50         0.0472
Shap.    r850       0    52.50          27.50         0.0394
Shap.    r850       0    52.50          30.00         0.0387
Shap.    rhum       0    52.50          30.00         0.0358
Shoo.    r850       0    48.75          30.00         0.0485
Shoo.    r850       0    52.50          27.50         0.0485
Shoo.    r850       0    52.50          30.00         0.0412
Shoo.    rhum       0    52.50          30.00         0.0384
Shoo.    rhum       0    56.25          30.00         0.0382
Ars.     r850       0    52.50          27.50         0.0459
Ars.     r850       0    52.50          30.00         0.0459
Ars.     r850       0    56.25          30.00         0.0389
Ars.     rhum       0    52.50          30.00         0.0389
Ars.     rhum       0    56.25          30.00         0.0359

Table 4. Selected predictors for the amount model calibrated for different stations.

Station  Predictor  Lag  Longitude (°)  Latitude (°)  MI
Del.     p5zh       0    52.50          30.00         0.0515
Del.     p5zh       0    56.25          27.50         0.0511
Del.     p5_v       0    56.25          30.00         0.0508
Del.     p5_v       0    52.50          30.00         0.0489
Deh.     r850       0    56.25          52.50         0.0538
Deh.     rhum       0    56.25          27.50         0.0521
Deh.     r850       1    56.25          30.00         0.0520
Deh.     rhum       1    56.25          30.00         0.0492
Kho.     p5_v       0    52.50          27.50         0.0482
Kho.     p5_v       0    52.50          30.00         0.0475
Kho.     p5_v       0    52.50          32.50         0.0471
Kho.     p5zh       0    52.50          27.50         0.0452
Khar.    r850       0    60.00          27.50         0.0879
Khar.    r850       0    60.00          30.00         0.0849
Khar.    rhum       0    56.25          27.50         0.0808
Khar.    rhum       0    60.00          30.00         0.0771
Ras.     p8th       0    52.50          40.00         0.0243
Ras.     p__u       0    48.75          37.50         0.0225
Ras.     p8th       0    52.50          37.50         0.0223
Ras.     p__u       0    45.00          40.00         0.0215
Far.     p__u       0    45.00          40.00         0.0242
Far.     p__u       0    48.75          37.50         0.0217
Far.     p8th       0    52.50          40.00         0.0217
Far.     r850       0    48.75          37.50         0.0215
Kas.     p__u       0    45.00          37.50         0.0140
Kas.     p__u       0    48.75          37.50         0.0137
Kas.     p8_u       0    45.00          37.50         0.0136
Kas.     p8th       0    45.00          37.50         0.0135
Shan.    p__u       0    45.00          40.00         0.0194
Shan.    p__z       0    48.75          40.00         0.0175
Shan.    p8_u       0    45.00          37.50         0.0168
Shan.    p8th       0    52.50          40.00         0.0166
Khan.    r850       0    52.50          30.00         0.0522
Khan.    rhum       0    52.50          30.00         0.0522
Khan.    p8_v       1    52.50          27.50         0.0434
Khan.    p8_v       0    56.25          30.00         0.0431
Shap.    p8_v       0    56.25          32.50         0.0624
Shap.    r850       0    52.50          30.00         0.0618
Shap.    rhum       0    52.50          27.50         0.0509
Shap.    rhum       0    52.50          30.00         0.0471
Shoo.    r850       0    52.50          30.00         0.0583
Shoo.    r850       0    56.25          30.00         0.0583
Shoo.    rhum       0    52.50          30.00         0.0446
Shoo.    rhum       0    56.25          30.00         0.0445
Ars.     p8_v       0    52.50          27.50         0.0563
Ars.     p8_v       0    56.25          30.00         0.0489
Ars.     p8zh       0    52.50          27.50         0.0483
Ars.     p8_v       1    52.50          27.50         0.0440

3.7. Statistical downscaling using NDMDM

NDMDM is a fully automated MATLAB package developed in this study. In this computational package, five models, namely MLR (which is used in the SDSM package), nonlinear MARS, MT, kNN and GA-SVM, are available for calibrating both the occurrence and amount models. Auto-calibration is available for all five models as well. NDMDM includes two separate subroutines for ensemble simulation of precipitation occurrence and amount. It also includes variance inflation and bias correction, similar to SDSM. The following steps should be taken to use NDMDM for downscaling precipitation:

1. A uniformly distributed random number in [0, 1] is generated to determine whether precipitation occurs. Similar to SDSM, for each day and in each ensemble, a wet day occurs when the random number is less than or equal to the output of the calibrated occurrence model, which can be any of the five models (MLR, nonlinear MARS, MT, kNN or GA-SVM).
2. Another model (from the set of five models available in NDMDM) is calibrated to simulate the precipitation amount using the rainy-day data. Similar to SDSM, NDMDM can calibrate a separate conditional model for each of the 12 months of the year. For a day identified as rainy in the previous step, the output of the amount model is calculated. Then, similar to SDSM, a normally distributed random number is added to the output to account for the modeling error. This random number is drawn from a normal distribution with zero mean and a standard deviation equal to the standard error.


3. In the last step, the results from the previous step are compared with a user-defined threshold to avoid generation of irrational results (such as negative values or too small positive values which can interrupt the dry spell analysis). These three steps are also shown in Fig. 2. These steps are similar to SDSM and the only major difference between NDMDM and SDSM is the four nonlinear MARS, MT, kNN and GA-SVM models which are available in NDMDM and also the possibility of considering different sets of predictors for precipitation occurrence and amount modeling in NDMDM. The following five steps have been performed in this study for evaluation of the performance of the models:  Singularity analysis is carried out in NDMDM. In this study, in model calibration phase, NCEP variables are used while in scenario generation phase, GCM variables are exploited. The calibrated models in NDMDM must be checked for possible over fitting, extrapolation and singular response modes. In this paper, computed precipitation values which are greater than hundred times of the maximum observed, are considered as singular results. Consequently, the model combinations which produce such results are rejected. This threshold is selected based on engineering judgment and can be different for other basins.  The absolute relative errors are calculated for mean, standard deviation and skewness in the dry and wet seasons for the model which passes the previous step.  The average, standard deviation, and skewness of errors have been calculated for dry and wet seasons and have been used in evaluation of NDMDM results.  Final error (FER) is calculated using Eq. (7) assuming 3, 2, and 1 as the relative weights of mean error (ErrorMean), standard deviation of errors (Errorstd.) and skewness of errors (Errorskw.), respectively. It must be noted that the error weights in this formula are hypothetical and can be changed based on the modeler’s judgment.

$$\mathrm{FER} = \frac{3\,\mathrm{Error}_{\mathrm{Mean}} + 2\,\mathrm{Error}_{\mathrm{Std.}} + \mathrm{Error}_{\mathrm{Skw.}}}{6} \qquad (6)$$

The weights used in this equation are selected based on expert judgment and may differ for other purposes. For example, for extreme precipitation, the weights of skewness and standard deviation may be set greater than the weight of the mean. The current study aims at a general evaluation of precipitation.

• Finally, the model with the lowest FER value is selected as the best one.

In the next section, the modeling results and the advantages and disadvantages of the proposed methodology are described.
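The evaluation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the NDMDM implementation: the function names, the example error values and the model labels in `candidates` are hypothetical. The text does not fully specify how the three error terms are aggregated from the seasonal errors, so the sketch takes them as given inputs. Only the factor of 100 in the singularity test and the weights 3, 2 and 1 come from the text.

```python
from typing import Dict, Sequence, Tuple


def is_singular(simulated: Sequence[float], observed: Sequence[float],
                factor: float = 100.0) -> bool:
    """Flag a model whose largest simulated value exceeds `factor` times
    the largest observed value (the singularity test described above)."""
    return max(simulated) > factor * max(observed)


def abs_relative_error(simulated_stat: float, observed_stat: float) -> float:
    """Absolute relative error of one statistic (mean, std. or skewness)."""
    return abs(simulated_stat - observed_stat) / abs(observed_stat)


def final_error(error_mean: float, error_std: float, error_skw: float,
                weights: Tuple[float, float, float] = (3.0, 2.0, 1.0)) -> float:
    """Weighted error following the FER formula; default weights sum to 6."""
    w_mean, w_std, w_skw = weights
    total = w_mean * error_mean + w_std * error_std + w_skw * error_skw
    return total / (w_mean + w_std + w_skw)


def select_best(candidates: Dict[str, Tuple[float, float, float]]) -> str:
    """Return the name of the candidate with the smallest FER."""
    return min(candidates, key=lambda name: final_error(*candidates[name]))


# Hypothetical error terms (mean, std., skewness) for two model combinations.
candidates = {"MT-MLR": (0.10, 0.20, 0.30), "MLR-MLR": (0.30, 0.50, 0.40)}
best = select_best(candidates)
print(best, round(final_error(*candidates[best]), 3))
```

Passing a different `weights` tuple, for example one that emphasizes skewness, reproduces the adjustment suggested for studies of extreme precipitation.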

4. Results and discussion

To implement NDMDM, suitable predictors must first be extracted from the pool of meteorological predictors. Based on the description presented in Section 3.6, the MI index has been calculated for different combinations of predictors and predictands to select suitable predictors. The five predictors with the highest MI values for each predictand are selected. The MI values of the selected predictors range between 0.011 and 0.09. The selected predictors are mostly relative humidity and zonal velocity at different geopotential heights without any time lag. The selected predictors are scattered across all nine neighboring boxes shown in Fig. 1. Tables 3 and 4 show the selected sets of five and four predictors for the occurrence and amount models, respectively, for all of the stations. In these tables, the MI values between the predictands and the selected predictors are presented as well. Based on these tables, five predictors with one lag have been selected for occurrence simulation (at Far. and Khan. Stations) and four for amount simulation (at Ars., Khan. and Deh. Stations). Relative humidity is the most frequently selected predictor for both the occurrence and amount models at all of the stations.

Table 5. NDMDM results for Del. Station (based on the validation period).

No.          Occurrence  Amount   Dry season             Wet season             FER
             model       model    Mean   Std.   Skw.     Mean   Std.   Skw.
1            MLR         MLR      0.27   0.51   3.92     1.98   2.96   2.59     0.4
2            MLR         kNN      0.25   0.44   4.23     1.93   2.83   3.23     0.4
3            MLR         MARS     0.25   0.41   3.73     1.97   2.83   3.46     0.41
4            MLR         MT       0.27   0.44   3.06     1.94   2.88   3.15     0.43
5            MLR         GA-SVM   0.28   0.53   8.22     2.16   2.76   2.1      0.37
6            kNN         MLR      0.16   0.54   4.33     1.84   3.56   3        0.35
7            kNN         kNN      0.14   0.47   5.1      1.78   3.23   2.94     0.4
8            kNN         MARS     0.15   0.58   11.64    1.87   3.68   5.51     0.27
9            kNN         MT       0.16   0.49   4.22     1.78   3.51   4.82     0.35
10           kNN         GA-SVM   0.17   0.64   10.34    1.93   3.23   2.17     0.29
11           MARS        MLR      0.75   1.81   2.96     1.93   3.56   3.41     0.6
12           MARS        kNN      0.67   1.62   3.15     1.88   3.34   3.79     0.48
13           MARS        MARS     0.77   1.87   2.86     1.97   3.92   9.19     0.61
14           MARS        MT       0.8    1.94   3.06     1.89   3.49   4.74     0.66
15           MARS        GA-SVM   0.82   1.97   2.96     2.08   3.21   2.39     0.71
16           MT          MLR      0.17   1.16   12.27    2.07   5.02   2.82     0.07
17           MT          kNN      0.14   0.99   13.89    2.05   4.9    2.95     0.15
18           MT          MARS     0.15   1.11   17.73    2.11   4.98   3.59     0.13
19           MT          MT       0.17   1.03   8.97     2.07   4.93   2.99     0.09
20           MT          GA-SVM   0.22   2.15   25.04    2.22   4.87   2.25     0.26
21           GA-SVM      MLR      0.29   0.7    9.56     1.75   1.46   1.07     0.42
22           GA-SVM      kNN      0.24   0.54   8.41     1.82   1.47   1.34     0.43
23           GA-SVM      MARS     0.25   0.53   10.29    1.85   1.38   1.34     0.43
24           GA-SVM      MT       0.27   0.51   5.73     1.81   1.6    3.79     0.45
25           GA-SVM      GA-SVM   0.29   0.64   12.28    2.13   1.58   1.78     0.42
Observation                       0.2    1.48   9.93     2.33   8.72   6.23     –

Note: the selected combination of the occurrence and amount models (No. 16, MT-MLR) was marked in gray in the original table.

Table 6. Best combination of occurrence and amount models in the twelve rain gauge stations.

No.   Station   Occurrence model   Amount model
1     Del.      MT                 MLR
2     Deh.      MT                 MARS
3     Kho.      MT                 MT
4     Khar.     MARS               MT
5     Ras.      MARS               MT
6     Far.      MT                 MT
7     Kas.      MARS               MT
8     Shan.     MT                 MT
9     Khan.     MT                 MT
10    Shap.     MT                 kNN
11    Shoo.     MT                 MLR
12    Ars.      MT                 MT

Using NDMDM, a set of occurrence and amount models has been calibrated automatically for each month, and stochastic weather generation has then been performed for all 25 possible combinations of the five occurrence models and the five amount models. The number of generated ensembles in each downscaling simulation is set to 100, and all of the selected models have passed the singularity test explained in the previous section. In the case of GA-SVM, the population size and cross-over and mutation rates of GA have been set to 50%, 10% and 80%, respectively; the selected SVM type is epsilon-SVM for both the precipitation amount and occurrence models. Sample results obtained from NDMDM for various combinations of the amount and occurrence models for Del. Station are presented in Table 5. To give a clear view of the seasonal performance of NDMDM, these results have been categorized into wet (November to April) and dry (May to October) seasons. In

Del. Station, MT-MLR has been found to be the best combination of occurrence-amount models. Table 6 presents the best combinations of NDMDM models (both occurrence and amount models) for the twelve stations. According to Table 6, MT, MARS, MLR and kNN have been selected 17, 4, 2 and 1 times, respectively. The only DM methods selected for occurrence modeling are MT and MARS. The most important similarity between these two models is that both partition the data when developing their regression-based model structures, and this might be the main reason for their better performance in precipitation occurrence modeling compared with the other nonlinear methods used in this study.

Table 7 compares the statistical characteristics of MLRDM (downscaling using the MLR method for both occurrence and amount modeling, which is similar to SDSM), the selected NDMDM models (based on the proposed FER criterion) and the observations at the rain gauge stations in the dry and wet seasons for the calibration and validation periods. Based on the results in Table 7, the mean values of the downscaled precipitation series estimated by NDMDM have been closer to the historical mean values than the MLRDM results in 58% and 42% of the selected rain gauge stations in the calibration and validation periods, respectively. In other words, the overall performances of NDMDM and MLRDM in reproducing mean values are competitive, although MLRDM performs slightly better than NDMDM. Table 7 also shows that the standard deviation and skewness values of the downscaled precipitation series estimated by NDMDM have been closer to the historical values than the MLRDM results in 89% and 92% of the selected rain gauge stations in the calibration and validation periods, respectively. In other words, NDMDM shows significant superiority over MLRDM in preserving the historical standard deviation and skewness values of the precipitation series. Comparison between the performances of NDMDM in the dry and wet seasons also shows that NDMDM performs better in preserving the historical mean precipitation of the dry season.

Table 7. Comparison of statistics of daily precipitation values downscaled by MLRDM and NDMDM.

                  Calibration                                   Validation
                  Dry season            Wet season              Dry season            Wet season
Station  Model    Mean   Std.   Skw.    Mean   Std.   Skw.      Mean   Std.   Skw.    Mean   Std.   Skw.
Del.     MLRDM    0.35   0.89   5.35    2.18   4.22   4.68      0.35   0.84   3.82    2.26   4.17   3.28
Del.     NDMDM    0.2    0.99   6.58    2.04   5.36   3.61      0.17   1.16   12.27   2.07   5.02   2.82
Del.     Obs.     0.28   2.58   22.40   2.12   8.47   8.20      0.20   1.48   9.93    2.33   8.72   6.23
Deh.     MLRDM    0.15   0.54   9.03    1.49   2.99   4.16      0.18   0.54   5.74    1.50   2.86   2.94
Deh.     NDMDM    0.12   0.73   9.54    1.48   4.58   4.95      0.11   0.64   7.77    1.59   4.63   4.3
Deh.     Obs.     0.12   1.59   19.96   1.47   6.78   7.76      0.08   0.95   20.12   1.29   5.32   5.94
Kho.     MLRDM    0.26   0.77   10.91   2.61   4.89   3.38      0.21   0.63   6.55    2.72   4.84   2.81
Kho.     NDMDM    0.22   1.17   7.37    2.45   7.21   5.9       0.16   0.97   7.65    2.29   6.12   4.05
Kho.     Obs.     0.23   1.99   14.27   2.55   9.63   6.77      0.20   1.52   10.84   2.08   9.28   8.45
Khar.    MLRDM    0.11   0.37   8.17    0.90   1.85   3.66      0.13   0.39   4.42    0.97   1.90   3.18
Khar.    NDMDM    0.09   0.41   18.76   0.89   2.57   6.12      0.12   0.63   14.62   0.9    2.61   6.95
Khar.    Obs.     0.07   1.02   24.68   0.87   4.55   7.95      0.12   1.53   22.28   0.76   3.63   6.39
Ras.     MLRDM    2.89   5.51   4.08    4.05   5.95   2.35      3.01   6.36   3.91    4.30   6.60   2.57
Ras.     NDMDM    3.01   6.96   5.21    4.01   6.97   4.04      3.04   7.87   5.04    4.16   7.76   3.89
Ras.     Obs.     3.00   10.28  6.62    4.16   10.33  4.71      3.44   11.31  6.06    3.77   9.38   4.12
Far.     MLRDM    2.66   5.36   4.01    3.70   5.36   2.29      2.64   5.93   3.72    3.91   5.93   2.51
Far.     NDMDM    2.63   7.41   4.68    3.67   7.34   3.32      1.68   6.17   6.11    3.74   7.85   3.53
Far.     Obs.     2.69   9.52   6.29    3.79   9.64   4.92      3.16   11.47  6.77    3.78   10.20  4.54
Kas.     MLRDM    3.04   5.02   3.31    2.93   4.17   2.48      2.79   5.55   3.68    2.74   4.32   2.88
Kas.     NDMDM    3.14   7.06   5.72    3.03   5.44   4.2       2.88   7.72   5.81    2.89   5.59   4.12
Kas.     Obs.     3.02   11.47  9.49    3.06   7.84   4.57      3.07   10.24  5.84    2.76   7.15   4.46
Shan.    MLRDM    2.81   4.13   2.86    2.48   2.90   1.83      2.63   4.63   3.10    2.37   3.18   2.86
Shan.    NDMDM    2.8    6.82   4.72    2.55   5.2    5.76      2.12   6.52   5.93    2.47   5.6    6.78
Shan.    Obs.     2.81   9.72   7.00    2.54   6.59   6.97      2.87   9.45   6.35    2.40   6.08   4.50
Khan.    MLRDM    0.20   0.72   7.13    2.33   4.12   2.82      0.21   0.70   6.11    2.36   4.47   3.16
Khan.    NDMDM    0.14   1.08   12.85   2.3    5.21   3.39      0.08   0.57   11.86   2.35   5.64   3.69
Khan.    Obs.     0.15   1.61   16.93   2.39   7.08   4.58      0.12   1.20   17.40   2.48   7.29   3.81
Shap.    MLRDM    0.07   0.53   22.94   1.51   3.40   3.58      0.06   0.37   12.27   1.58   3.52   3.55
Shap.    NDMDM    0.06   0.7    20.78   1.5    4.01   3.69      0.03   0.5    23.65   1.33   3.81   3.95
Shap.    Obs.     0.07   1.20   24.90   1.57   5.87   5.70      0.03   0.49   18.24   1.90   6.77   5.31
Shoo.    MLRDM    0.07   0.30   10.73   1.80   3.68   3.21      0.07   0.24   5.41    1.75   3.60   3.29
Shoo.    NDMDM    0.07   0.64   15.6    1.86   4.43   2.92      0.06   0.63   18.41   1.82   4.46   2.95
Shoo.    Obs.     0.07   0.91   19.45   1.83   6.53   6.59      0.06   0.74   19.86   2.24   7.53   5.01
Ars.     MLRDM    0.24   0.90   7.25    1.59   3.85   4.85      0.37   1.38   6.77    1.82   4.15   3.83
Ars.     NDMDM    0.08   0.55   11.91   1.56   4.7    6.55      0.07   0.49   11.03   1.69   5.11   5.85
Ars.     Obs.     0.09   1.19   23.65   1.62   6.10   6.99      0.05   0.71   23.11   1.79   6.78   5.89

The results for three selected stations (Del., Ras. and Khan.), including the monthly mean, monthly standard deviation, monthly skewness and q–q plots of observed versus computed values, are shown in Figs. 3–5 for the calibration and validation periods. Both MLRDM (the MLR-MLR combination, which is similar to SDSM) and NDMDM have performed acceptably in simulating the monthly mean precipitation values for the calibration and validation periods. For nearly all of the presented q–q plots, a threshold can be found above which the simulated daily precipitation values are lower than the observations. For Del. Station this threshold is about 4.5 mm. For Ras. Station, it is 8 mm, with the exception of NDMDM in the validation period. It is 4 mm for Khan. Station, with the exception of the validation period. In Table 8, the number of available samples used for occurrence and amount modeling in the calibration period for Del. Station is


presented. The number of parameters of the different NDMDM models is also shown in this table. The numbers of parameters in MARS and MT are set automatically according to the available samples to avoid over-fitting. In occurrence simulation for the calibration period, many samples are available, and models with more parameters than MLR are also acceptable. In amount simulation in the dry season, which has fewer samples, MARS and MT automatically determine a suitable number of parameters, as shown in the table. For example, in June and September, the months with the fewest samples for amount modeling, both the calibrated MT and MARS models have only one parameter, while MLR has 5 parameters. GA-SVM has 6 parameters, which can cause over-fitting in the dry season for stations with few precipitation observations.

Fig. 3. Downscaling results for Del. rain gauge. (a) Monthly mean, (b) monthly standard deviation, (c) monthly skewness and (d) q–q plot of monthly precipitation (left column: calibration period, right column: validation period).

Fig. 4. Ras. rain gauge. (a) Monthly mean, (b) monthly standard deviation, (c) monthly skewness and (d) q–q plot of monthly precipitation (left column: calibration period, right column: validation period).

To evaluate the climate change impacts at the studied stations, the SRES A2 and B2 scenarios have been considered for the years 2000–2050. The downscaling results of MLRDM and NDMDM generally differ because of the different values of the bias correction and variance inflation factors calculated by these models in the calibration period. In Fig. 6, the 5-year moving average of precipitation is presented for the A2 and B2 scenarios at Del., Ras. and Khan. Stations. As can be seen in Fig. 6, except for Khan. Station, NDMDM has produced significantly different results from the MLRDM model. For example, the downscaling results for Del. Station with MLRDM are significantly higher than those of NDMDM, and for Ras. Station it is vice versa. Table 9 reports some statistical properties of the annual precipitation estimated by MLRDM and NDMDM for all of the stations. Based on the table, except for Kas. and Far. Stations, the NDMDM and MLRDM results have been relatively similar for each of the scenarios. In other words, if the long-term maximum values in one scenario have increased or decreased according to one model, the same type of variation has also been predicted by the other model. This similarity of behavior has also been observed for the minimum values at all stations except Shan., Deh. and Shoo. Stations. It is also worth mentioning that, in both scenarios, lower minimum values have been estimated by NDMDM than by MLRDM in almost 75% of the stations. Higher variances have also been estimated by NDMDM for all of the stations except Khan., Shap. and Del. Stations.

These results demonstrate high uncertainty associated with downscaling model structures in the climate change modeling for different climate regions and future scenarios.
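The 5-year moving average used for the projections in Fig. 6 can be sketched as follows. This is an illustrative sketch only: the text does not say whether the window is trailing or centered, so a trailing window is assumed here, and the function name and the annual totals in `annual_precip_mm` are hypothetical values rather than model output.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Trailing moving average: one value for each complete window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest year, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Hypothetical annual precipitation totals (mm) for seven consecutive years.
annual_precip_mm = [300, 320, 280, 310, 290, 350, 330]
print(moving_average(annual_precip_mm))
```

Applying the same function to the MLRDM and NDMDM series of one scenario would give curves that can be compared directly, as in the figure.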

5. Conclusions

The results of this study have shown that different combinations of DM methods can provide good alternative approaches in empirical or regression-based downscaling simulations. NDMDM has also proved to be useful software for statistical downscaling. The proposed approach is applicable to the downscaling of all meteorological variables, and there is no restriction on using NDMDM for other variables; however, some tuning might be necessary, for example of the singularity test threshold. Based on the results of the MARS and MT models, it can be concluded that data partitioning plays an important role in the similarity of the statistical downscaling results. Detection of similarity is therefore highly recommended as one of the most important pre-processing steps in statistical downscaling.

Fig. 5. Khan. rain gauge. (a) Monthly mean, (b) monthly standard deviation, (c) monthly skewness and (d) q–q plot of monthly precipitation (left column: calibration period, right column: validation period).

Table 8. Number of parameters for amount and occurrence models and number of the available samples in Del. Station in the calibration period.

Model                  DM method        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
Occurrence simulation  MLR              6     6     6     6     6     6     6     6     6     6     6     6
                       kNN              1     1     1     1     1     1     1     1     1     1     1     1
                       MARS             29    16    31    29    31    34    29    21    31    31    34    29
                       MT               48    50    54    29    17    9     16    22    7     17    11    40
                       GA-SVM           6     6     6     6     6     6     6     6     6     6     6     6
                       No. of samples   620   565   612   570   589   570   589   589   578   620   600   620
Amount simulation      MLR              5     5     5     5     5     5     5     5     5     5     5     5
                       kNN              1     1     1     1     1     1     1     1     1     1     1     1
                       MARS             6     6     1     1     6     1     1     14    1     1     1     31
                       MT               9     8     5     2     3     1     3     1     1     1     1     7
                       GA-SVM           6     6     6     6     6     6     6     6     6     6     6     6
                       No. of samples   87    110   121   47    25    7     13    26    7     18    16    69

This study also shows that the good performance of NDMDM is not only related to the complexity of the selected DM methods and their larger numbers of parameters. For example, GA-SVM provides much better results in amount simulation than MLR, although they have almost the same number of parameters. As further evidence, kNN (with only one parameter, k) is the selected amount model for Shap. Station, while MLR (with 5 parameters) is selected only twice. The results of this study show that the performance of a statistical downscaling model is highly related to the modeling concept and the overall performance of the occurrence-amount simulation.
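As a rough illustration of the relation between sample size and model size, the amount-model rows of Table 8 can be turned into a samples-per-parameter ratio. This ratio is a generic heuristic introduced here for illustration and is not a criterion used in the study; the variable and function names are hypothetical, while the counts are copied from Table 8.

```python
# Amount-model rows of Table 8 for Del. Station (calibration period).
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
SAMPLES = [87, 110, 121, 47, 25, 7, 13, 26, 7, 18, 16, 69]
PARAMETERS = {
    "MLR": [5] * 12,
    "kNN": [1] * 12,
    "MARS": [6, 6, 1, 1, 6, 1, 1, 14, 1, 1, 1, 31],
    "MT": [9, 8, 5, 2, 3, 1, 3, 1, 1, 1, 1, 7],
    "GA-SVM": [6] * 12,
}


def samples_per_parameter(method: str) -> list:
    """Monthly ratio of available samples to model parameters."""
    return [s / p for s, p in zip(SAMPLES, PARAMETERS[method])]


for method in PARAMETERS:
    ratios = samples_per_parameter(method)
    worst = min(range(len(MONTHS)), key=ratios.__getitem__)
    print(f"{method}: lowest ratio {ratios[worst]:.2f} in {MONTHS[worst]}")
```

A low ratio only signals months in which a model has few samples per parameter; whether over-fitting actually occurs still has to be checked on the validation period.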

Occurrence simulation is very similar to pattern recognition, and regression-based SVM does not provide good results in pattern detection. SVM for classification is expected to provide better results in occurrence modeling, but because the SDSM platform was used, it has not been an option in this study. The results presented in this paper show that NDMDM preserves the historical monthly mean of precipitation better in dry seasons than in wet seasons and estimates the historical monthly standard deviation and skewness values more closely than MLRDM. Overall, the results of this study have shown that NDMDM can be a useful tool for statistical downscaling of precipitation in semi-arid regions with high seasonal variability of precipitation and long dry seasons. The significant uncertainties in the projection of climate change effects on precipitation in this study show that future work can focus on uncertainty assessment of the MLRDM and NDMDM models. These simulations also help in evaluating the mathematical stability of regression models and their parameters. Comparison of the presented approaches in downscaling other climatic variables, such as temperature, evaporation and the number of days with an event at the stations, is also recommended.

Fig. 6. Five-year moving average of annual precipitation (mm) for A2 and B2 scenarios in (a) Del., (b) Ras. and (c) Khan. stations (series: MLRDM-A2, MLRDM-B2, NDMDM-A2, NDMDM-B2).

Table 9. Maximum, minimum and variance of annual precipitation of all stations for different future scenarios.

                      A2                 B2
Station  Statistics   MLRDM    NDMDM     MLRDM    NDMDM
Del.     Maximum      610      497       520      452
         Minimum      137      90        188      100
         Variance     10692    10554     6090     8259
Deh.     Maximum      610      805       520      797
         Minimum      137      156       188      142
         Variance     10692    20140     6090     16655
Kho.     Maximum      585      535       582      442
         Minimum      159      89        223      111
         Variance     7988     10074     4616     4982
Khar.    Maximum      226      311       209      229
         Minimum      45       22        66       22
         Variance     1730     3380      1168     2226
Ras.     Maximum      1242     2264      1628     2786
         Minimum      689      1099      609      1003
         Variance     19241    63680     36767    89604
Far.     Maximum      1293     1225      1673     2106
         Minimum      693      546       533      464
         Variance     23853    30720     41407    91291
Kas.     Maximum      1323     1470      1486     1325
         Minimum      792      592       690      492
         Variance     18707    31380     25873    33943
Shan.    Maximum      1311     1632      1332     2201
         Minimum      707      782       733      600
         Variance     13901    46507     17757    84526
Khan.    Maximum      576      533       501      490
         Minimum      129      93        132      107
         Variance     12005    8839      8659     7614
Shap.    Maximum      462      262       382      261
         Minimum      96       45        73       47
         Variance     5480     3035      4168     2552
Shoo.    Maximum      93       188       90       193
         Minimum      35       28        32       51
         Variance     237      1220      197      994
Ars.     Maximum      540      581       416      453
         Minimum      83       47        118      70
         Variance     10241    12360     6562     7824

Acknowledgements

This work was supported by a grant from Iranian National Support Foundation (INSF) with Ref. No. 90001592. The authors would like to thank the anonymous reviewers for their valuable comments to improve the quality of the paper.
