Excerpt of the full paper "ICWRER 2013 | Proceedings" Download: www.water-environment.org/proceedings.html

DOI: 10.5675/ICWRER_2013

A comparison of linear and nonlinear regression modelling for forecasting long term urban water demand: A Case Study for Blue Mountains Water Supply System in Australia

Md Mahmudul Haque¹ · Ataur Rahman¹ · Dharma Hagare¹ · Golam Kibria²

¹ University of Western Sydney · ² Sydney Catchment Authority

Abstract

Prediction of long term water demand is necessary to assess the future adequacy of water resources, to achieve an efficient allocation of water supplies among competing water users and to ensure long-term water sustainability. Suitable mathematical models are needed to predict future water demand and to assess the effects of future climate and other factors on it. This study compares a multiple linear and a multiple nonlinear regression model for forecasting monthly water demand in the Blue Mountains Water Supply System, Australia. The performance of the developed models is assessed through the relative error (RE), the coefficient of determination (R²), the percent bias (PBIAS) and the accuracy factor (Af), computed from the observed and model-predicted water demand values. For the validation period, the RE, R², PBIAS and Af values are 0.46%, 0.88, 2.07% and 1.04, respectively, for the multiple linear regression model and 2.49%, 0.30, -20.79% and 1.21, respectively, for the multiple nonlinear regression model. The results show that the developed multiple linear regression model predicts water demand more accurately than the multiple nonlinear regression model.

1. Introduction

Urban water demand modelling plays an important role in the efficient planning, design and development of water supply systems. To ensure a reliable water supply to the residents of a city, an accurate estimate of future water demand is necessary. Such an estimate helps in planning cost-effective and reliable infrastructure expansion, developing alternative water supply sources and incorporating water demand management programs [House-Peters & Chang, 2011]. Mathematical models can be developed to estimate future water demand under changing climate, population growth and conservation measures; a suitable water demand forecasting model is therefore essential for predicting future water demand.

Water demand forecasting can be classified into two types, namely short term and long term forecasting [House-Peters & Chang, 2011; Nasseri et al., 2011]. Short term forecasting is required for the operation of reservoirs and pumping stations and for the maintenance of a water supply system [Jain et al., 2001]. Long term forecasting, on the other hand, is essential for planning and designing system expansion and for future resilience analysis [Bougadis et al., 2005].

Many approaches have been proposed in the literature to forecast short and long term urban water demand. Froukh [2001] noted that time extrapolation, disaggregate end-use, single coefficient and multiple coefficient methods are suitable for long term forecasting. In contrast, time series models (e.g. Box-Jenkins and ARIMA), memory-based learning techniques, probabilistic methods and artificial neural network models are suitable for short term forecasting. Babel et al. [2007] likewise noted that domestic water demand can be forecast by time extrapolation methods, single coefficient requirement methods, multiple coefficient requirement models, multiple coefficient demand models and disaggregated water use forecast models. Qi & Chang [2011] grouped the existing forecasting approaches into five categories: regression analysis, time series analysis, artificial intelligence approaches (e.g. artificial neural networks, fuzzy logic and agent-based models), hybrid approaches and Monte Carlo simulation.

Among these methods, regression analysis dominates the water demand literature and has been widely used for both short and long term water demand forecasting. For example, Dandy et al. [1997] developed a long term water demand forecasting model for the residential sector in Adelaide, Australia using regression analysis, and Billings & Agthe [1998] compared regression and time series methods for short term urban water demand forecasting in Tucson, Arizona. Lahlou & Colyer [2000] forecast water demand for the residential, commercial and industrial sectors in Casablanca, Morocco using multiple regression models. Babel et al. [2007] developed a multivariate regression model to forecast long term domestic water demand in Kathmandu, Nepal. Balling Jr & Gober [2007] investigated the influence of climatic variables on annual water use in the city of Phoenix, Arizona by regression analysis. Polebitski & Palmer [2009] developed regression models to forecast single family residential water demand at a bi-monthly time step in Seattle, Washington. Shandas & Hossein Parandvash [2010] employed multiple regression analysis to measure the effects of land use on regional water demand.

Most of these regression models are based on multiple linear regression analysis. Compared with multiple linear regression, multiple nonlinear regression modelling of urban water demand has received rather limited research attention; examples include Adamowski et al. [2012] and Yasar et al. [2012]. Adamowski et al. [2012] used polynomial functions to develop multiple nonlinear regression equations from observed data for forecasting domestic water demand in Montreal, Canada. Yasar et al. [2012] assumed a general nonlinear form for forecasting water demand, in which the dependent variable was expressed as the product of all the independent variables in a power relationship.

In this study, a multiple linear and a multiple nonlinear regression model were developed for forecasting long term residential water demand, using demographic, socio-economic and climatic variables as predictors. First, a simple regression analysis was carried out between each predictor variable and the dependent variable (i.e. water demand) to identify a suitable relationship. The multiple nonlinear regression function was then defined using the identified dependent-independent relationships. The developed multiple linear and nonlinear regression models were applied to model single dwelling residential water demand in the Blue Mountains Water Supply System, Australia, and their results were compared. Finally, the performance of the two models was evaluated using a number of statistical performance indices: the relative error, the coefficient of determination, the percent bias and the accuracy factor.

2. Study area

The Blue Mountains region (Figure 1) of New South Wales, Australia was selected as the study area. The Blue Mountains Water Supply System provides water to a population of around 48,000 from Faulconbridge to Mount Victoria, an area considered to comprise the Upper and Middle Blue Mountains [Sydney Catchment Authority, 2009]. The Cascades and Greaves Creek delivery systems together make up the Blue Mountains Water Supply System, which supplies twelve reservoir zones: Mount Victoria, Blackheath, Catalina, Katoomba, Yosemite, Wentworth Falls, Bodington, Bullaburra, Lawson, Woodford, Linden and Faulconbridge.


Figure 1 Blue Mountains region in Australia and Cascade and Greaves creeks water supply area [Bluemountainsaustralia, 2013]

The climate of the Blue Mountains is generally more moderate than that of the lower Sydney region. Because Mount Victoria lies over 1000 meters above sea level, temperatures there are normally about 7°C lower than in coastal Sydney. The average temperature in the Upper Blue Mountains is around 5°C in winter (June to August) and 18°C in summer (December to February). The Blue Mountains receive rainfall similar to that of Sydney; the average rainfall in the Upper Blue Mountains is around 1050 mm per year [Bluemountainsaustralia, 2013].

3. Materials and Methods

3.1. Data context

Monthly metered water consumption per dwelling, used as the dependent variable in both the linear and nonlinear regression models, was obtained from Sydney Water for the study area for the period January 1997 to September 2011. Data on the water usage price and water conservation savings (WCS) were also obtained from Sydney Water for the same period. In this study, WCS refers to the water savings from the water demand management programs implemented in the Blue Mountains, such as rainwater tanks, WaterFix (installation of new showerheads, flow restrictors and minor leak repairs by a licensed plumber), DIY (Do-It-Yourself) kits (self-installed flow restrictors), and water efficient washing machines and toilets [Sydney Water, 2010]. Water restriction savings (WRS), resulting from the water restrictions imposed in the Blue Mountains Water Supply System during the drought period (2003-2009), were estimated from the water consumption data. Temperature and rainfall data for the study area were obtained from the Sydney Catchment Authority for the period January 1997 to September 2011. The available data (1997-2011) were divided into two sets: a model development set (January 1997 to June 2009) and a validation set (July 2009 to September 2011).
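The split described above is chronological rather than random, so every validation month follows every development month. A minimal Python sketch of that partition on a monthly index is given below, assuming one record per calendar month with no gaps; the helper name `month_range` is hypothetical. It also shows how many months fall in each set.

```python
from datetime import date


def month_range(start: date, end: date) -> list[date]:
    """Return the first day of every month from start to end, inclusive."""
    months = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        months.append(date(y, m, 1))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return months


# Full monthly record: January 1997 to September 2011
all_months = month_range(date(1997, 1, 1), date(2011, 9, 1))

# Chronological split: no shuffling, validation lies strictly after development
split_point = date(2009, 7, 1)
development = [d for d in all_months if d < split_point]
validation = [d for d in all_months if d >= split_point]

print(len(all_months), len(development), len(validation))
```

Running it prints `177 150 27`: 150 months (January 1997 to June 2009) for model development and 27 months (July 2009 to September 2011) for validation.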

3.2. Multiple linear regression analysis

Multiple linear regression (MLR) analysis examines the relationship between several independent variables and a dependent variable, assuming that this relationship is linear. A multiple linear regression equation has the following form [Montgomery et al., 2001]:

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \qquad (1)$$

where α is the model intercept, β1, …, βk are the regression coefficients and k is the number of independent variables. In this study, three forms of MLR were considered for modelling single dwelling residential water demand: Raw-Data (no transformation of the variables), Semi-Log (logarithmic transformation of the dependent variable only) and Log-Log (logarithmic transformation of both the dependent and the independent variables). Based on model performance, the Semi-Log form was selected for the MLR analysis. The adopted MLR equation has the following form:

$$\ln(Y) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 \qquad (2)$$


where Y is the water consumption per dwelling in a month (kL/dwelling/month), X1 is the total monthly rainfall (mm), X2 is the monthly mean maximum temperature (°C), X3 is the water usage price (AUD/kL), X4 is the water conservation savings (kL/dwelling/month) and X5 is the water restriction savings (kL/dwelling/month). The regression coefficients were estimated using the Minitab software [Minitab, 2010].
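The study estimated the coefficients of Eq. (2) in Minitab. As a rough equivalent, the sketch below fits the Semi-Log form by ordinary least squares with NumPy, assuming a plain OLS fit with an intercept (the study does not report its estimation settings). The function names, predictor ranges and coefficients in the example are hypothetical and are not the study's data or results.

```python
import numpy as np


def fit_semilog_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS fit of ln(y) = alpha + beta1*X1 + ... + beta5*X5.

    X has one row per month and one column per predictor (X1..X5);
    y is monthly consumption per dwelling and must be strictly positive.
    Returns the array [alpha, beta1, ..., beta5].
    """
    design = np.column_stack([np.ones(len(y)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
    return coef


def predict_semilog_mlr(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Back-transform the fitted ln(Y) to kL/dwelling/month."""
    return np.exp(coef[0] + X @ coef[1:])


# Synthetic, noise-free example: every number below is made up.
rng = np.random.default_rng(0)
n = 150  # one row per month of a development period
X = np.column_stack([
    rng.uniform(0, 300, n),    # X1: monthly rainfall (mm)
    rng.uniform(10, 30, n),    # X2: mean maximum temperature (°C)
    rng.uniform(1.0, 2.5, n),  # X3: water price (AUD/kL)
    rng.uniform(0, 3, n),      # X4: conservation savings (kL/dwelling/month)
    rng.uniform(0, 4, n),      # X5: restriction savings (kL/dwelling/month)
])
true_coef = np.array([3.0, -0.0005, 0.01, -0.1, -0.05, -0.08])
y = np.exp(true_coef[0] + X @ true_coef[1:])

coef = fit_semilog_mlr(X, y)
y_hat = predict_semilog_mlr(coef, X)
print(np.allclose(coef, true_coef), np.allclose(y_hat, y))
```

Because the synthetic data are noise-free, the fit recovers the generating coefficients and the script prints `True True`. Exponentiating the fitted ln(Y) returns predictions in kL/dwelling/month; with noisy data this back-transformed value tends to estimate the conditional median rather than the mean, a known retransformation issue of log-scale regression.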

3.3. Multiple nonlinear regression analysis

In multiple nonlinear regression (MNLR) analysis, the relationship between the dependent and the independent variables is assumed to be nonlinear. Nonlinear regression can fit a model with an arbitrary functional relationship between the dependent and the independent variables. In this study, a series of simple regression analyses between the dependent variable and each independent variable was conducted first. The multiple nonlinear regression equation for single dwelling residential water demand was then developed from the relationships identified in these simple regression analyses. Linear, power, quadratic, cubic, logarithmic and exponential functions were tested to identify the best relationship for each variable. The results of the simple regression analyses between the dependent variable and the independent variables are presented in Table 1.

Table 1: Correlation coefficients of the simple regression analysis between the dependent and the independent variables

Independent variable | Linear | Logarithmic | Quadratic | Cubic | Power | Exponential
X1 | 0.024 | 0.029 | 0.028 | 0.028 | 0.027 | 0.024
X2 | 0.086 | 0.083 | 0.086 | 0.086 | 0.08 | 0.082
X3 | 0.38 | 0.39 | 0.41 | 0.42 | 0.46 | 0.44
X4 | 0.46 | * | 0.46 | 0.53 | * | 0.52
X5 | 0.61 | * | 0.61 | 0.61 | * | 0.68

*Logarithmic and power models cannot be calculated because of the presence of zero values.
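The screening summarized in Table 1 can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the study's Minitab procedure: each form is fitted by least squares (the power and exponential forms on the log scale), the score is the correlation between observed and fitted values on the original scale, and scores are rounded before comparison so that a practically tied simpler form wins, which is one way to reproduce the choice of the linear form for X2. All function names and example values are hypothetical.

```python
import numpy as np

FORMS = ("linear", "logarithmic", "quadratic", "cubic", "power", "exponential")


def fitted_values(form: str, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit of y on a single predictor x; returns fitted y."""
    if form == "linear":
        return np.polyval(np.polyfit(x, y, 1), x)
    if form == "quadratic":
        return np.polyval(np.polyfit(x, y, 2), x)
    if form == "cubic":
        return np.polyval(np.polyfit(x, y, 3), x)
    if form == "logarithmic":  # y = a + b*ln(x)
        return np.polyval(np.polyfit(np.log(x), y, 1), np.log(x))
    if form == "power":        # y = a*x**b, fitted as ln(y) = ln(a) + b*ln(x)
        return np.exp(np.polyval(np.polyfit(np.log(x), np.log(y), 1), np.log(x)))
    if form == "exponential":  # y = a*exp(b*x), fitted as ln(y) = ln(a) + b*x
        return np.exp(np.polyval(np.polyfit(x, np.log(y), 1), x))
    raise ValueError(f"unknown form: {form}")


def screen_forms(x: np.ndarray, y: np.ndarray) -> dict:
    """Correlation between observed and fitted y for each candidate form.

    Forms needing ln(x) or ln(y) are reported as None when the data contain
    zero or negative values, mirroring the asterisks in Table 1.
    """
    result = {}
    for form in FORMS:
        needs_pos_x = form in ("logarithmic", "power")
        needs_pos_y = form in ("power", "exponential")
        if (needs_pos_x and np.any(x <= 0)) or (needs_pos_y and np.any(y <= 0)):
            result[form] = None
        else:
            result[form] = float(np.corrcoef(y, fitted_values(form, x, y))[0, 1])
    return result


def best_form(scores: dict, decimals: int = 3) -> str:
    """Pick the form with the highest rounded correlation.

    max() returns the first maximal key, so rounded ties go to the form
    listed first in FORMS (e.g. linear before quadratic and cubic).
    """
    valid = {f: round(r, decimals) for f, r in scores.items() if r is not None}
    return max(valid, key=valid.get)


# Example: y grows exponentially with x, and x contains a zero
x = np.linspace(0.0, 3.0, 31)
y = np.exp(2.0 * x)
scores = screen_forms(x, y)
print(scores["logarithmic"], scores["power"], best_form(scores))

# Hypothetical unrounded scores that all round to 0.086 for three forms
x2_like = {"linear": 0.0861, "logarithmic": 0.0832, "quadratic": 0.0862,
           "cubic": 0.0863, "power": 0.0801, "exponential": 0.0822}
print(best_form(x2_like))
```

The first call prints `None None exponential`, because the zero in `x` rules out the logarithmic and power forms; the second prints `linear`.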

Based on the results presented in Table 1, the logarithmic, linear, power, cubic and exponential functions were selected for X1, X2, X3, X4 and X5, respectively, as they gave the highest correlation coefficients (for X2, the linear, quadratic and cubic functions tie at 0.086, and the linear function was adopted). The following multiple nonlinear equation was finally adopted for the prediction of water demand:

$$Y = \alpha + \beta_1 \ln(X_1) + \beta_2 X_2 + \beta_3 X_3^{\beta_4} + \beta_5 X_4 + \beta_6 X_4^2 + \beta_7 X_4^3 + \beta_8 \exp(\beta_9 X_5) \qquad (3)$$

The regression coefficients for nonlinear regression analysis were obtained using Minitab software [Minitab, 2010].
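Eq. (3) is linear in α, β1, β2, β3, β5, β6, β7 and β8 but nonlinear in the exponents β4 and β9. The study obtained all coefficients in Minitab; as an alternative sketch (not the study's procedure), the coefficients can be estimated by profiling: for each candidate pair (β4, β9) on a grid, solve for the remaining linear coefficients by ordinary least squares and keep the pair with the smallest residual sum of squares. The function names, grid values, coefficients and data below are hypothetical.

```python
import numpy as np


def mnlr_design(X: np.ndarray, b4: float, b9: float) -> np.ndarray:
    """Columns of Eq. (3) for fixed exponents b4 (on X3) and b9 (on X5):
    1, ln X1, X2, X3**b4, X4, X4**2, X4**3, exp(b9 * X5)."""
    x1, x2, x3, x4, x5 = X.T
    return np.column_stack([np.ones(len(X)), np.log(x1), x2, x3 ** b4,
                            x4, x4 ** 2, x4 ** 3, np.exp(b9 * x5)])


def fit_mnlr_profiled(X, y, b4_grid, b9_grid):
    """Grid over (b4, b9); linear least squares for the other coefficients.

    Returns (sse, b4, b9, [alpha, b1, b2, b3, b5, b6, b7, b8]).
    """
    best = None
    for b4 in b4_grid:
        for b9 in b9_grid:
            design = mnlr_design(X, b4, b9)
            lin, *_ = np.linalg.lstsq(design, y, rcond=None)
            sse = float(np.sum((y - design @ lin) ** 2))
            if best is None or sse < best[0]:
                best = (sse, b4, b9, lin)
    return best


# Hypothetical noise-free data generated from Eq. (3) with b4 = -0.5, b9 = 0.3
rng = np.random.default_rng(1)
n = 150
X = np.column_stack([
    rng.uniform(5, 300, n),    # X1 rainfall (mm); must be > 0 for ln(X1)
    rng.uniform(10, 30, n),    # X2 temperature (°C)
    rng.uniform(1.0, 2.5, n),  # X3 price (AUD/kL); must be > 0 for X3**b4
    rng.uniform(0, 3, n),      # X4 conservation savings
    rng.uniform(0, 4, n),      # X5 restriction savings
])
true_lin = np.array([5.0, 1.2, 0.3, 2.0, -0.8, 0.1, -0.02, 0.5])
y = mnlr_design(X, -0.5, 0.3) @ true_lin

sse, b4, b9, lin = fit_mnlr_profiled(
    X, y, np.arange(-1.0, 1.01, 0.25), [0.1, 0.2, 0.3, 0.4])
print(b4, b9, np.allclose(lin, true_lin))
```

On these synthetic data the search recovers b4 = -0.5 and b9 = 0.3 together with the eight linear coefficients. With real, noisy data a finer grid, or refining the best grid point with an iterative solver such as `scipy.optimize.curve_fit`, would be needed; the sketch mainly shows that only two of the ten parameters make the problem nonlinear.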


4. Model evaluation criteria

The performance of the developed MLR and MNLR models was compared using four statistical performance indices: the relative error (RE), the coefficient of determination (R²), the percent bias (PBIAS) and the accuracy factor (Af). The ideal values of these indices are 0 for RE, 1 for R², 0 for PBIAS and 1 for Af. The indices were computed from the observed and model-predicted values of the dependent variable, for both the development and the validation data sets, using the equations given in Table 2.

Table 2: Numerical indices used to evaluate model performance

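Performance index | Equation
RE | $\mathrm{RE} = \dfrac{\sum_{i=1}^{N} (O_i - P_i) \times 100}{N}$
R² | $R^2 = \dfrac{\left[\sum_{i=1}^{N} (O_i - \bar{O})(P_i - \bar{P})\right]^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2 \, \sum_{i=1}^{N} (P_i - \bar{P})^2}$
PBIAS | $\mathrm{PBIAS} = \dfrac{\sum_{i=1}^{N} (O_i - P_i) \times 100}{\sum_{i=1}^{N} O_i}$
Af | $A_f = 10^{\left(\sum_{i=1}^{N} \left|\log_{10}(P_i / O_i)\right|\right)/N}$

*O: Observed water demand, P: Model estimated water demand, Ō and P̄: means of the observed and estimated values, N: Number of observations.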
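For reference, the sketch below computes three of the Table 2 indices in Python with NumPy: R² as the squared Pearson correlation between observed and estimated values, PBIAS, and Af using base-10 logarithms of the estimated-to-observed ratio. RE is left out of this sketch. The function names and the four-value example are hypothetical, chosen to show a uniformly biased model.

```python
import numpy as np


def r_squared(obs, pred) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    o, p = np.asarray(obs, float), np.asarray(pred, float)
    num = np.sum((o - o.mean()) * (p - p.mean())) ** 2
    den = np.sum((o - o.mean()) ** 2) * np.sum((p - p.mean()) ** 2)
    return float(num / den)


def pbias(obs, pred) -> float:
    """Percent bias: positive when the model underestimates in aggregate,
    negative when it overestimates."""
    o, p = np.asarray(obs, float), np.asarray(pred, float)
    return float(100.0 * np.sum(o - p) / np.sum(o))


def accuracy_factor(obs, pred) -> float:
    """10 raised to the mean absolute log10(P/O); 1 means a perfect match."""
    o, p = np.asarray(obs, float), np.asarray(pred, float)
    return float(10 ** np.mean(np.abs(np.log10(p / o))))


# A model that overestimates every month by exactly 20 percent
observed = np.array([10.0, 12.0, 14.0, 16.0])
predicted = 1.2 * observed
print(round(r_squared(observed, predicted), 6),
      round(pbias(observed, predicted), 6),
      round(accuracy_factor(observed, predicted), 6))
```

The example prints `1.0 -20.0 1.2`: R² is perfect even though every estimate is 20 percent too high, because R² measures only linear association. This is why PBIAS and Af, which both expose the bias, are reported alongside it.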

5. Results and Discussion

The performance indices of the developed multiple linear regression (MLR) and multiple nonlinear regression (MNLR) models for estimating monthly water demand per dwelling, for both the model development and the validation periods, are presented in Table 3. For the model data set, the RE value of the MLR model (1.03%) was lower than that of the MNLR model (1.55%); for the validation data set, the RE values were 0.46% and 2.49% for the MLR and MNLR models, respectively. The MLR model had R² values of 0.70 and 0.88 for the model and validation data sets, respectively, much higher than the corresponding R² values of 0.46 and 0.30 for the MNLR model. In terms of percent bias, the two models performed similarly on the model data set, with PBIAS values of 0.65% for MLR and 0.44% for MNLR, but on the validation data set the MLR model (2.07%) clearly outperformed the MNLR model (-20.79%). The Af was 1.07 and 1.04 for the MLR model and 1.11 and 1.21 for the MNLR model for the model and validation data sets, respectively. Based on these Af values, the MLR model also performed better than the MNLR model.

Table 3: Results of the performance indices for both the MLR and the MNLR model

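Data set | Performance index | MLR | MNLR
Model data set | RE (%) | 1.03 | 1.55
Model data set | R² | 0.70 | 0.46
Model data set | PBIAS (%) | 0.65 | 0.44
Model data set | Af | 1.07 | 1.11
Validation data set | RE (%) | 0.46 | 2.49
Validation data set | R² | 0.88 | 0.30
Validation data set | PBIAS (%) | 2.07 | -20.79
Validation data set | Af | 1.04 | 1.21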

Figures 2 and 3 compare the observed monthly water demand of the single dwelling residential sector with the monthly water demand forecast by the MLR and MNLR models, respectively, for the period July 2009 to September 2011. As Figure 2 shows, the MLR model provided estimates close to the corresponding observed monthly water demand throughout the forecast period. In contrast, the MNLR model overestimated the monthly water demand relative to the observed values during the forecast period (Figure 3). Comparing Figures 2 and 3, the MLR model yields considerably better demand predictions.


Figure 2 Comparison of forecasted versus observed monthly water demand using the MLR model during the forecast period (July 2009 to September 2011) [plot: monthly water demand in kL versus month, July 2009 to September 2011; series: Observed Demand, Forecasted Demand]

Figure 3 Comparison of forecasted versus observed monthly water demand using the MNLR model during the forecast period (July 2009 to September 2011) [plot: monthly water demand in kL versus month, July 2009 to September 2011; series: Observed Demand, Forecasted Demand]

6. Conclusion

In this study, a multiple linear regression (MLR) model and a multiple nonlinear regression (MNLR) model were developed to predict monthly water demand for the single dwelling residential sector in the Blue Mountains Water Supply System, Australia. In the Blue Mountains system, the single dwelling residential sector is the largest water consuming sector: around 75% of the total supplied water is used by this sector, and the rest is shared between the multiple dwelling and commercial sectors. The developed models were validated and tested on monthly data of per dwelling single residential water demand over a period of 12 years. Both the MLR and MNLR models were constructed using five independent variables, namely total monthly rainfall, monthly mean maximum temperature, water usage price, water conservation savings and water restriction savings. The study shows that the developed MLR model forecast monthly water demand with a higher degree of accuracy. In contrast, the performance of the developed MNLR model was relatively poor compared with the MLR model, and the MNLR model overestimated the monthly water demand by about 20% over the validation period. The suitability of multiple nonlinear regression models needs to be investigated further, using different combinations of functional relationships between the dependent and the independent variables, to improve their accuracy.

Acknowledgements: Water consumption data were collected from Sydney Water on 4 May 2012. The best data available at the time of the study were used; these may be updated in the near future. Daily rainfall and temperature data were collected from the Sydney Catchment Authority (SCA). The authors sincerely thank Pei Tillman and Frank Spaninks of Sydney Water (SW) for their assistance in collating and providing the data. The authors are also very grateful to Lucinda Maunsell and Peter Cox of Sydney Water and Mahes Maheswaran of the Sydney Catchment Authority for their cooperation and assistance during data collection and analysis.

References

Adamowski, J., Chan, H. F., Prasher, S. O., Ozga-Zielinski, B., & Sliusarieva, A. (2012). Comparison of multiple linear and nonlinear regression, autoregressive integrated moving average, artificial neural network, and wavelet artificial neural network methods for urban water demand forecasting in Montreal, Canada. Water Resources Research, 48(1), W01528.

Babel, M., Gupta, A. D., & Pradhan, P. (2007). A multivariate econometric approach for domestic water demand modeling: An application to Kathmandu, Nepal. Water Resources Management, 21(3), 573-589.

Balling Jr, R. C., & Gober, P. (2007). Climate variability and residential water use in the city of Phoenix, Arizona. Journal of Applied Meteorology and Climatology, 46(7), 1130-1137.

Billings, R. B., & Agthe, D. E. (1998). State-space versus multiple regression for forecasting urban water demand. Journal of Water Resources Planning and Management, 124, 113.

Bluemountainsaustralia.com (n.d.). Location and maps, viewed 10 February 2013, http://www.bluemts.com.au/info/about/maps/

Bluemountainsaustralia.com (n.d.). Climate, viewed 10 February 2013, http://www.bluemts.com.au/info/about/climate/

Bougadis, J., Adamowski, K., & Diduch, R. (2005). Short-term municipal water demand forecasting. Hydrological Processes, 19(1), 137-148.

Dandy, G., Nguyen, T., & Davies, C. (1997). Estimating Residential Water Demand in the Presence of Free Allowances. Land Economics, 73(1), 125-139.

Froukh, M. L. (2001). Decision-support system for domestic water demand forecasting and management. Water Resources Management, 15(6), 363-382.

House-Peters, L. A., & Chang, H. (2011). Urban water demand modeling: Review of concepts, methods, and organizing principles. Water Resources Research, 47(5), W05401.

Jain, A., Kumar Varshney, A., & Chandra Joshi, U. (2001). Short-term water demand forecast modelling at IIT Kanpur using artificial neural networks. Water Resources Management, 15(5), 299-321.

Lahlou, M., & Colyer, D. (2000). Water Conservation in Casablanca, Morocco. JAWRA Journal of the American Water Resources Association, 36(5), 1003-1012.

Minitab. (2010). Minitab 16 statistical software. Minitab, Inc., State College, PA.

Montgomery, D. C., Peck, E. A., & Vining, G. G. (2001). Introduction to linear regression analysis. Third edition, John Wiley & Sons, New York, USA.

Nasseri, M., Moeini, A., & Tabesh, M. (2011). Forecasting monthly urban water demand using Extended Kalman Filter and Genetic Programming. Expert Systems with Applications, 38(6), 7387-7395.

Polebitski, A. S., & Palmer, R. N. (2009). Seasonal residential water demand forecasting for census tracts. Journal of Water Resources Planning and Management, 136(1), 27-36.

Qi, C., & Chang, N. B. (2011). System dynamics modeling for municipal water demand estimation in an urban region under uncertain economic impacts. Journal of Environmental Management, 92(6), 1628-1641.

Shandas, V., & Hossein Parandvash, G. (2010). Integrating urban form and demographics in water-demand management: an empirical case study of Portland, Oregon. Environment and Planning B: Planning and Design, 37(1), 112.

Sydney Catchment Authority (2009). Blue Mountains water supply system: Strategic review. Sydney Catchment Authority, Penrith.

Sydney Water (2010). Water conservation and recycling implementation report, 2009-10, Sydney.

Yasar, A., Bilgili, M., & Simsek, E. (2012). Water Demand Forecasting Based on Stepwise Multiple Nonlinear Regression Analysis. Arabian Journal for Science and Engineering, 37(8), 2333-2341.


German IHP/HWRP National Committee

BfG – Federal Institute of Hydrology, Koblenz

HWRP – Hydrology and Water Resources Programme of WMO

United Nations Educational, Scientific and Cultural Organization

International Hydrological Programme

German National Committee for the International Hydrological Programme (IHP) of UNESCO and the Hydrology and Water Resources Programme (HWRP) of WMO, Koblenz 2013

©IHP/HWRP Secretariat
Federal Institute of Hydrology
Am Mainzer Tor 1
50068 Koblenz · Germany
Telephone: +49 (0)261 / 1306 - 54 35
Fax: +49 (0)261 / 1306 - 54 22
www.ihp-germany.de

Disclaimer

Any papers included in these proceedings reflect the personal opinion of the authors. The publisher does not accept any liability for the correctness, accuracy or completeness of the information or for the observance of the private rights of any third parties. Any papers submitted by the authors do not necessarily reflect the editors' opinion; their publication does not constitute any evaluation by the editors.