An artificial neural network model for flood simulation using GIS: Johor River Basin, Malaysia

Masoud Bakhtyari Kia, Saied Pirasteh, Biswajeet Pradhan, Ahmad Rodzi Mahmud, Wan Nor Azmin Sulaiman & Abbas Moradi

Environmental Earth Sciences, ISSN 1866-6280, Volume 67, Number 1
Environ Earth Sci (2012) 67:251-264
DOI 10.1007/s12665-011-1504-z



ORIGINAL ARTICLE




Received: 8 November 2010 / Accepted: 13 December 2011 / Published online: 31 December 2011
© Springer-Verlag 2011

Abstract Flooding is one of the most destructive natural hazards, causing damage to both life and property every year; the development of flood models to determine inundation areas in watersheds is therefore important for decision makers. In recent years, data mining approaches such as artificial neural network (ANN) techniques have been increasingly used for flood modeling. Previously, ANN methods were frequently used for hydrological and flood modeling with rainfall as input and runoff as output, usually without considering other flood causative factors. The specific objective of this study is to develop a flood model from various flood causative factors, using ANN techniques and a geographic information system (GIS), to model and simulate flood-prone areas in the southern part of Peninsular Malaysia. The ANN model for this study was developed in MATLAB using seven flood causative factors. Relevant thematic layers (including rainfall, slope, elevation, flow accumulation, soil, land use, and geology) are generated using GIS, remote sensing data, and field surveys. In the context of objective weight assignments, the ANN is used to directly produce water levels, and the flood map is then constructed in GIS. To measure the performance of the model, four performance criteria are used: the coefficient of determination (R²), the sum squared error, the mean square error, and the root mean square error. The verification results showed satisfactory agreement between the predicted and the real hydrological records. The results of this study could help local and national government plan for the future and develop new infrastructure, appropriate to the local environmental conditions, to protect the lives and property of the people of Johor.

Keywords Flood modeling · Neural network · GIS · Remote sensing · Spatial modeling · Johor · Malaysia

Electronic supplementary material The online version of this article (doi:10.1007/s12665-011-1504-z) contains supplementary material, which is available to authorized users.

M. B. Kia · B. Pradhan (✉) · A. R. Mahmud
Institute of Advanced Technology (ITMA), University Putra Malaysia (UPM), 43400 Serdang, Selangor Darul Ehsan, Malaysia
e-mail: [email protected]; [email protected]

S. Pirasteh
Dezful Branch, Islamic Azad University, Dezful, Iran

W. N. A. Sulaiman · A. Moradi
Faculty of Environmental Studies, University Putra Malaysia, Serdang, Malaysia

Introduction

Globally, floods are frequent natural disasters that cause severe damage to both lives and property. It is estimated that, of the total economic loss caused by all kinds of disasters, 40% is due to flooding (Feng and Lu 2010). Recurring floods are a severe problem in Malaysia, causing many deaths. In Malaysia, floods are the most important natural hazard in terms of population affected, frequency, areal extent, and socio-economic damage (Pradhan and Youssef 2011). According to the Department of Irrigation and Drainage (DID), 9% of the land area (29,800 km²) in the country is prone to flooding, 22% of the population (4.82 million) is affected by floods (Fig. 1), and the average annual flood damage is about USD 0.3 billion (Pradhan 2010). In Malaysia, reports on flooding have been available since the 1920s. According to DID, Malaysia has


Fig. 1 a Flood-prone area in west of Malaysia (Hassan and Ghani 2006) and b location of the study area

experienced major floods in the past decades, most recently in December 2006 and January 2007, which severely affected Johor state. During the 2006–2007 floods in Johor, a couple of abnormally heavy rainfall events caused massive flooding. The estimated total property loss from these flood disasters amounted to USD 0.5 billion, making them among the most


costly flood events in Malaysian history. At the peak of the Johor flood, around 110,000 people were evacuated to relief centers and the death toll was 18. Despite the large costs of managing and controlling floods in Malaysia, flooding recurs every year in various parts of the country, destroying property and killing people (Pradhan 2010a). Even though flood


events are unavoidable (Maidment 2002; Youssef et al. 2011), to reduce losses, authorities and the general public should know when and where the next flood is going to happen and which areas are going to be inundated. Therefore, there is a need to develop an accurate technique for flood forecasting in order to prevent future disasters.

Hydrologists have a long history of research into constructing models of hydrological processes. The goal of flood modeling is to provide timely and accurate estimates of future discharge conditions at specific watershed locations. The wide variety of forecasting techniques used by hydrologists today includes physically based rainfall-runoff modeling techniques, data-driven techniques, and varying degrees of combination of the two, with forecasts ranging in scale from short-term, involving a number of hours, to long-term, involving a number of months or years (Smith and Ward 1998). Although hydrologists have used many models to predict flooding, the problem remains. Some models cannot cope with dynamic changes inside the watersheds, some are too difficult to calibrate and need robust optimization tools, and some require an understanding of the physical processes inside the basin. These problems have led to the exploration of a more data-driven approach (Varoonchotikul 2003).

In recent years, there have been many studies on flood susceptibility and hazard mapping using remote sensing data and GIS tools. Radar remote sensing data have been extensively used for flood monitoring across the globe (Hess et al. 1995, 1990; Pradhan and Shafie 2009; Pirasteh et al. 2010) and many of these studies have applied probabilistic methods (Farajzadeh 2001, 2002; Horritt and Bates 2002; Pradhan and Shafie 2009). Hydrological and stochastic rainfall methods for flood susceptibility mapping have been employed in other areas (Blazkova and Beven 1997; Cunderlik and Burn 2002). Flood susceptibility mapping using GIS and neural network methods has also been applied in various case studies (Islam and Sado 2001, 2002; Dixon 2005). Therefore, to increase the precision of flood models and to cope with some of the above limitations, several recent hydrological studies have used new techniques such as ANN, fuzzy logic, and neuro-fuzzy methods to make flood predictions (Dixon 2005). These techniques are capable of dealing with uncertainties in the inputs and can extract information from incomplete or contradictory datasets (Rashid et al. 1992; Pradhan 2010a, b, c, 2011a, b; Pradhan et al. 2006, 2010a, b, c, d; Rogers et al. 1995; Pradhan and Youssef 2010; Oh and Pradhan 2011; Sezer et al. 2011; Lorrai and Sechi 1995; Tamari et al. 1996; Woldt et al. 1996; Holger and Dandy 1996; Zhu et al. 1997; Schaap et al. 1998; Lin et al. 1999; Ray and Klindworth 2000; See


and Openshaw 2000; Liu and Chandrashekar 2000; Dixon 2005). These new methods are frequently developed for hydrological and flood modeling with only rainfall and runoff as input and output, usually without considering other flood causative factors. The specific objective of this study is to develop a flood model from various flood causative factors, using the ANN technique and GIS, to model and simulate flood-prone areas in the Johor River Basin, Malaysia.

This paper is presented in seven major sections: first, the “Introduction” to the paper, followed by “Study area”, which presents the data and methodology used; third, “Data set”, which presents the background of the artificial neural network model; then “Flood simulation”, which presents the application of the neural network in the Johor River Basin, followed by “Flood map generated by ANN”, which consists of the main results and discussion, followed by “Model performance assessments”; and finally, the paper summarizes the conclusions in “Discussion and conclusions”.

Study area

The Johor River Basin is located in the south-east of Peninsular Malaysia (Fig. 1). The river originates from Gunung Belumut (at an elevation of 1,010 m) and Bukit Gemuruh (at an elevation of 109 m) in the north of the basin and flows south-east to discharge into the Johor Straits. Natural forest and lowland swamps are the dominant land cover in the northern and central parts, while oil palm and rubber plantations with swamps occupy the southern part of the basin. The most important population center is Kota Tinggi, the administrative center of the Kota Tinggi District. Average annual rainfall is 2,500 mm, and the mean annual discharge at Rantau Panjang station (1,130 km²) is 37.7 m³/s. The catchment area of the Johor River at Kota Tinggi is about 1,620 km². The main tributaries of the river are the Sayong, Linggiu, Semanggar, Tiram, and Lebam.

The data set

Like most modeling methods, the techniques used in this research are based on the well-known principle that “the past and present are keys to the future”. To develop the flood model, understanding and determining the flood causative factors is crucial for this study area. These factors were selected based on knowledge acquired from a literature review and previous research, such as that of the United Nations Environment Program (UNEP 2002), Kingma (2002), Smith and Ward (1998), and the World Meteorological Organization (WMO 2008), and from field studies. Although many factors may be important with respect to the


flood occurrence for a particular region, the same factors may not be important for other regions (Pradhan 2010a). Hence, different thematic data layers corresponding to causative and intensifying flood factors, namely topography, topographic slope, soil, land cover/land use, lithology, and drainage, were prepared as input for this study. Besides intense rainfall, these factors are classified as direct and indirect causative factors in floods and are considered responsible for flood occurrence in the study area.

DEM and its derivatives

Topography, as an intensifying factor, plays an important role in flood severity and in the determination of a flood-prone area. On the one hand, topographic factors have a direct effect on flow size and runoff velocity. On the other hand, river flood-prone areas mostly have low elevation and a slight topographic slope. Digital elevation models (DEM) are an excellent source from which to derive the topographic factors responsible for flood activity in a region. Because the results of the flooding model have to be shown on the DEM to define flood-prone areas, the DEM must have appropriate accuracy (Pradhan 2009). Therefore, a DEM was generated from contours on 1:25,000 topographic maps (Fig. 2a). Topographic slope is defined as the angle between the surface and a horizontal datum. Slope governs how gravity induces runoff and determines its velocity, so this factor is very important in hydrology (Gomez and Kavzoglu 2005). Although steeper slopes produce more rapid flows, floods tend to occur on gentle slopes. The slope angle for the Johor basin was derived from the DEM and divided into four classes (Fig. 2b). Nearly 64% of the Johor basin has slope angles ranging from 0° to 5°. The mean slope angle is 6°, while the maximum slope angle is 49°. Many of the floods occur in areas of high drainage density due to the accumulation of a large quantity of water.
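The derivation of slope angle from a gridded DEM described above can be illustrated with a minimal sketch. It assumes a regular grid, a central-difference gradient, and a made-up toy DEM; the function names `slope_degrees` and `fraction_below` are hypothetical, and the GIS software used in the study may apply a different slope algorithm.

```python
import math

def slope_degrees(dem, cell_size):
    """Slope angle in degrees for interior cells, using central differences.

    Border cells are left at 0.0 because they lack neighbours on one side.
    """
    rows, cols = len(dem), len(dem[0])
    slopes = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            # Angle between the surface and a horizontal datum
            slopes[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slopes

def fraction_below(slopes, threshold_deg):
    """Share of interior cells whose slope is at or below the threshold."""
    interior = [s for row in slopes[1:-1] for s in row[1:-1]]
    return sum(s <= threshold_deg for s in interior) / len(interior)

# Toy 4 x 4 DEM (assumed data): a plane rising 1 m per 10 m cell eastward
dem = [[float(c) for c in range(4)] for _ in range(4)]
slopes = slope_degrees(dem, cell_size=10.0)
share_gentle = fraction_below(slopes, 5.0)
```

A statistic such as the share of the basin with slopes between 0° and 5° corresponds to calling `fraction_below` on the full slope grid.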
To represent drainage, a drainage data layer was prepared from the DEM. The flow direction for each filled DEM cell is one of the keys to deriving the hydrologic characteristics of a surface; this function directs the flow out of each cell (Fig. 2c). The values in the resulting pixels of the flow direction grid indicate the direction of steepest descent from that pixel. The flow accumulation was then computed as the accumulated number of upstream pixels. The results of flow accumulation can be used to create a stream network by classifying pixel values (Fig. 2d).

Geology

The watershed has different geological formations. The major rock units, which occupy about 67% of the study


area are granite, adamellite, and minor granodiorite (source: Department of Mineral & Geosciences, Malaysia). Riverine and swamp alluvium, colluvium, sand, silt, and clay with some gravel occupy about 10% of the study area along the river valleys. The center of the watershed and the area around the Linggiu Dam lake are covered by massive cross-bedded sandstone with intercalations of maroon and greenish gray mudstone, and grit (about 7%). Sandstone, siltstone, conglomerate, shale, tuff, and lava are found in the south-east part of the lake and cover about 4.2%. Acid to intermediate pyroclastics, lava, shale, and microadamellite are the other geological formations in this area. The geological map is shown in Fig. 2e.

Soil types

The study area is characterized by five different soil series. The most dominant is Ultisols, which occupies about 73% of the study area. Steep lands comprise about 16.5% of the study area and are found in the north and north-east parts of the watershed. Entisols cover about 8.3% and are found mainly along the stream valleys. Main lands are found in the eastern part of the watershed center, whereas small patches of Oxisols are found in the south-east part of the study area. The spatial distribution of the major soil series in the study area is shown in Fig. 2f.

Land use

Land use and type of land cover are also key factors responsible for flood incidence. The occurrence of flooding is inversely related to vegetation density: rain falling on barren slopes runs over the surface more rapidly than in forested areas. Consequently, some land use areas (for instance, those with a high percentage of cropland or urban land use) yield more storm runoff than similar areas covered by grassland or forest. To define this factor, a land use layer was prepared with seven dominant land use/land cover classes, namely forest, agriculture, mine area, built-up, water bodies, barren area, and swamp (Fig. 2g).
Rainfall and runoff data

To develop a flood model, rainfall and simultaneous runoff data were obtained from seven rain gauges and one water level station within the Johor River Basin; a summary of the rain gauge data is shown in Table 1. In total, 267 samples of the highest peak discharges (flood events) and their rainfalls, as published by the DID, were selected.


Fig. 2 Input thematic layers: a DEM, b slope angle, c flow direction, d flow accumulation, e geology, f soil types, and g land use


Table 1 Summary of the rain gauges of Johor River Basin

Station no.   Mean annual     Minimum annual   Maximum annual   Period of
              rainfall (mm)   rainfall (mm)    rainfall (mm)    record
1,737,001     1,946.8         1,251            2,613.9          1987–05
1,834,001     1,844.6         865.2            2,929            1989–05
1,833,123     2,111.1         1,225            3,014            1987–06
1,834,122     2,058.9         1,355.5          2,575.7          1987–06
1,835,001     2,601.2         1,847.5          5,152.5          1987–06
1,836,001     2,285           1,291            3,274            1987–06
1,739,002     2,345.3         1,292            3,178            1987–97

Flood simulation

Artificial neural networks

ANNs are mathematical models of human perception that can be trained to perform a particular task based on available empirical data. When the relationships between data are unknown, they can be a powerful modeling tool (Lek et al. 1996; Lek and Guégan 1999; Mas 2004; Pradhan and Buchroithner 2010; Pradhan and Lee 2009, 2010a, b, c; Pradhan and Pirasteh 2010). The theory and mathematical basis of ANNs are explained in detail by many researchers (Bishop 1995; Haykin 1999), so only a brief description is presented here. As shown in Fig. 3, an ANN includes a number of neurons or nodes that work in parallel to transform the input data into output categories. Typically, an ANN consists of three kinds of layers: input, hidden, and output. Each layer has a number of neurons that depends on the specific application. Each neuron is connected to the neurons in the next consecutive layer by direct links. Each link has a weight that represents the strength of the outgoing signal (Atkinson and Tatnall 1997; Varoonchotikul 2003). The input layer receives the data from different sources (e.g., thematic layers). Hence, the number of neurons in the input layer depends on the number of input data sources. The data are actively processed in the hidden and output layers. The number of hidden layers and their neurons are often defined by trial and error (Atkinson and Tatnall

1997). The number of neurons in the output layer is fixed by the application and is represented by the class being processed. Each hidden neuron responds to the weighted inputs it receives from the connected neurons in the preceding layer. Once the combined effect on each hidden neuron is determined, the activation at this neuron is determined via a transfer function. Many differentiable nonlinear functions are available as transfer functions. Since the sigmoid function enables a network to map any nonlinear process, most networks of practical interest make use of it (Bishop 1994; ASCE Task Committee 2000). A typical artificial neuron and the modeling of a multilayered neural network are illustrated in Fig. 3. Referring to Fig. 3, the signal flow from the inputs $x_1, \ldots, x_n$ is considered to be unidirectional, as indicated by arrows, as is the neuron's output signal flow ($O$). The neuron output signal $O$ is given by the following relationship:

$$O = f(\mathrm{net}) = f\left(\sum_{j=1}^{n} w_j x_j\right), \qquad (1)$$

where $w_j$ is the $j$-th component of the weight vector, and the function $f(\mathrm{net})$ is referred to as an activation (transfer) function. The variable net is defined as the scalar product of the weight and input vectors,

$$\mathrm{net} = \mathbf{w}^{T}\mathbf{x} = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n, \qquad (2)$$

where $T$ denotes the transpose, and, in the simplest case, the output value $O$ is computed as

$$O = f(\mathrm{net}) = \begin{cases} 1 & \text{if } \mathbf{w}^{T}\mathbf{x} \ge \theta \\ 0 & \text{otherwise,} \end{cases} \qquad (3)$$

where $\theta$ is called the threshold level, and this type of node is called a linear threshold unit (Abraham 2005).
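As an illustration of Eqs. 1 to 3, the following minimal sketch evaluates the net input, a sigmoid activation, and a linear threshold unit for a single neuron. The weights, inputs, and threshold values are arbitrary numbers chosen for the example and are not taken from the study.

```python
import math

def net_input(weights, inputs):
    """Scalar product of the weight and input vectors (Eq. 2)."""
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(net):
    """Sigmoid transfer function, one common choice for f in Eq. 1."""
    return 1.0 / (1.0 + math.exp(-net))

def threshold_unit(weights, inputs, theta):
    """Linear threshold unit (Eq. 3): 1 if the net input reaches theta, else 0."""
    return 1 if net_input(weights, inputs) >= theta else 0

# Arbitrary example values (assumptions, not data from the paper)
w = [0.5, -0.25, 1.0]
x = [1.0, 2.0, 0.5]
net = net_input(w, x)              # 0.5 - 0.5 + 0.5
fires = threshold_unit(w, x, 0.5)  # threshold exactly met
activation = sigmoid(net)
```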

Fig. 3 Architecture of a multilayer neural network and an artificial neuron

Application of ANN to flood simulation

The ANN flood model development consisted of three steps: the ANN architecture, training, and testing (Principe et al. 1999).

ANN architectures

The ANN architecture refers to the number of layers and connection weights. It also defines the flow of information


in the network. Designing a suitable structure is the most important and also the most difficult part of the ANN modeling process (Maier and Dandy 1996). There are no strict rules in the literature for defining the number of hidden layers and neurons, and most researchers determine them by trial and error. Although some studies use a single hidden layer, Sarle (1994) showed that greater flexibility can be obtained with more than one hidden layer, and other studies have used two hidden layers as a starting point (Flood and Kartam 1994; Tamura and Tateishi 1997). However, the optimal design of an ANN architecture depends on the type of problem under investigation. In this research, an ANN architecture comprising an input layer, two hidden layers, and an output layer, joined by three sets of interconnections, was used. The input layer contains seven neurons (one each for elevation, topographic slope, flow accumulation, geology, land use, soil, and rainfall data), each representing a causative factor that contributes to the occurrence of floods in the catchment. The output layer contains a single neuron representing river flow. The hidden layers and their neurons are used to capture the complex relationship between the input and output variables.
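A feed-forward pass through such a four-layer network can be sketched as follows. This is an illustrative example only: the weights are random, the bias terms and the linear output neuron are assumptions, and the second hidden layer size of 10 is a placeholder (the first hidden layer uses the 20 nodes reported in Table 2). The helper names `make_layer` and `forward` are hypothetical.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases; illustrative only."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, inputs):
    """Propagate inputs through the layers: sigmoid on hidden layers, linear output."""
    activation = inputs
    for i, (weights, biases) in enumerate(layers):
        nets = [sum(w * a for w, a in zip(row, activation)) + b
                for row, b in zip(weights, biases)]
        is_output = i == len(layers) - 1
        activation = nets if is_output else [1 / (1 + math.exp(-n)) for n in nets]
    return activation

rng = random.Random(0)
sizes = [7, 20, 10, 1]  # 7 inputs, 20 first-layer nodes; 10 is an assumed value
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# One made-up sample of seven normalized causative-factor values
sample = [0.3, 0.1, 0.2, 0.5, 0.4, 0.6, 0.7]
output = forward(layers, sample)  # single value standing in for river flow
```

Changing the two middle entries of `sizes` corresponds to trying a different 7-N-N-1 configuration.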

Fig. 4 A schematic architecture of ANN for flood modeling


Once the input and output variables were defined, the ANN architecture 7-N-N-1 (where N represents the number of neurons in a hidden layer) was examined to identify the number of hidden-layer neurons (Fig. 4).

Training of the network

The aim of the training process is to decrease the error between the ANN output and the real data by changing the weight values according to a given algorithm (Pradhan and Lee 2010a). Typically, back-propagation algorithms are used at this stage. A successful ANN model can predict target data from a given set of input data. Once the minimal error is achieved and training is completed, the ANN applies the feed-forward structure to generate a classification of the whole data set (Paola and Schowengerdt 1995). The flowchart for the determination of weights in the ANN is shown in Fig. 5. The weights between the layers were calculated by training the ANN through a reverse calculation process, from which the contribution or importance of each factor, i.e., its weight, was determined.
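The idea of reducing the output error by propagating it backwards and adjusting the weights can be sketched with a small example. Note the assumptions: this sketch uses plain gradient descent on a network with one hidden layer, no bias terms, and a made-up toy target, whereas the study trains a two-hidden-layer network in MATLAB; the names `train_step` and `run_epoch` and the learning rate are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w_hidden, w_out, x, target, lr):
    """One back-propagation update for a one-hidden-layer net; returns squared error."""
    # Forward pass: sigmoid hidden neurons, linear output neuron
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sum(w * h for w, h in zip(w_out, hidden))
    error = output - target
    # Backward pass: gradients of 0.5 * error**2 with respect to each weight
    for j, h in enumerate(hidden):
        grad_out = error * h
        delta_hidden = error * w_out[j] * h * (1.0 - h)  # uses the pre-update w_out[j]
        for i, xi in enumerate(x):
            w_hidden[j][i] -= lr * delta_hidden * xi
        w_out[j] -= lr * grad_out
    return error ** 2

# Toy data (assumed): target is a simple function of two inputs in [0, 1]
data = [([a, b], 0.8 * a + 0.2 * b) for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
rng = random.Random(1)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(3)]

def run_epoch(lr=0.3):
    """One pass over the data; returns the sum squared error seen during the pass."""
    return sum(train_step(w_hidden, w_out, x, t, lr) for x, t in data)

first_sse = run_epoch()
for _ in range(500):
    last_sse = run_epoch()
```

The sum squared error after repeated epochs should fall below its initial value, which is the behaviour the training stage relies on.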


Fig. 5 Flow chart showing weight determination of flood factors using ANN model (modified after Pradhan and Lee 2010a)

To train the ANN, a 7-N-N-1 format was used in this study, where N represents the number of hidden layer nodes. By varying the number of neurons in both hidden layers, the neural networks were run several times to identify the most appropriate architecture based on training and testing accuracies. For each setting of the hidden-layer sizes, the training process was repeated 20 times and the minimum mean square error was taken. The numbers of neurons in the first and second hidden layers were varied from 7 to 25 and from 3 to 14, respectively. For each ANN configuration, the training procedure was repeated starting from independent initial conditions, ultimately ensuring selection of the best performing network. The decreasing trend of the mean square error in the training and validation sets was used to decide the optimal learning. Training was stopped when the minimum mean square error was achieved, using an early stopping technique in MATLAB. A validation error that starts to rise is an indication of the network becoming over-trained: such an ANN would perform very well in the training stage but would fail to maintain that level of performance when applied to a different dataset. This over-fitting error can occur during the training stage: the training set error is very small, but the error is large when new data are used, and the ANN cannot generalize the solution to a new situation. The early stopping method is used to prevent this error during training. In this method, to develop


the algorithms, train the ANN, and prevent any over-fitting error, the data are divided into three independent parts: training, validation, and testing. The training part is used for training and updating the parameters of the ANN. The error on the validation part is checked during the training stage. Normally, the validation error decreases during the initial stage of training and begins to increase when over-fitting starts. When the validation error increases for a specified number of iterations, training is stopped and the weights that produced the minimum error on the validation set are retrieved. In this study, the data were randomly divided: 60% were used for training, 20% for validation in order to stop training before over-fitting, and the remaining 20% as a completely independent test of the network generalization. The aim of this last part is to confirm the ANN accuracy by applying data not used in training to the model. The multilayer perceptron (MLP) program in MATLAB was used. The input portions of the program were modified for easy computation and handling of GIS data. The Levenberg–Marquardt algorithm was used to train the network, and the program was also used to construct the input files, scale (normalize) the data, and carry out the post-processing needed to obtain the model output (Pradhan and Lee 2010a). The number of epochs was set to 2,000, and the sum square error (SSE) value used as the stopping criterion was set to 0.001. The experimental data sets met the 0.001 SSE goal in the case of 0 slopes (ESM Fig. 1). If the SSE goal was not achieved, training was terminated at the maximum of 2,000 epochs. The change in SSE is displayed for the training set (blue line), the validation set (green line), and the test set (red line) (ESM Fig. 2). The validation and test lines in ESM Fig. 2 are very similar. As can be observed in ESM Fig. 2, the rate of change was very similar and the error decreased with increasing epochs, reaching 5.6498e-05 at epoch 1,811. This indicates a good fit, meaning that the ANN is well trained and can be used for prediction and modeling. Seven input nodes, each representing a flood causative parameter (rainfall, slope, elevation, soil, geology, flow accumulation, and land use), were used in the ANN modeling. These factors are denoted as I1, I2, I3, I4, I5, I6, and I7, respectively (Table 2). The 20 hidden nodes are denoted as HA1 to HA20 (Table 2, column 1). Table 2 shows the connection weights between the input and first hidden layers. From Table 2, it can be observed that there is little variation between the maximum and minimum connection weights between the input and hidden layer nodes, except for the rainfall parameter (I1). For the rainfall parameter (I1), the variation between the maximum and minimum connection weights is larger than for the other factors (e.g.


I2, I3, …, I7) in the corresponding input layers. This indicates that rainfall is the main factor in training the neural network compared to the other inputs. For the ANN modeling, we used a back-propagation algorithm to adjust and tune the connection weights between the input, first hidden, second hidden, and output layers (Pradhan et al. 2010a). In the algorithms used here, the error between the ANN outputs and the measured data (targets) is back-propagated through the ANN and minimized by updating the interconnection weights between the layers (Arora et al. 2004; Lee et al. 2004; Pradhan et al. 2010a, b; Pradhan and Lee 2010b). The best weight adjustment may happen between the first and second hidden layers. The ANN training results are shown in ESM Fig. 2, in which the observed river flow and the ANN training results are compared. High and low river flow values are predicted very well, and there is good agreement between the predicted flows and the observed data. Generally, the relationship between the observed and predicted data is shown by a regression plot. Since the ANN is perfectly trained, the ANN output equals the target data (ESM Fig. 3). The next step was to investigate this data point in order to determine whether it represents extrapolation (i.e., whether it lies outside the training data set). If so, it should be included in the training set, and additional data should be collected for use as the test set.
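The random 60/20/20 division of the samples and the early stopping rule described in this section can be sketched as follows. This is a simplified illustration rather than the MATLAB procedure used in the study: the function names `split_indices` and `early_stopping_epoch`, the `patience` parameter, the random seed, and the sequence of validation errors are all assumptions made for the example. The sample count of 267 is the number of flood events reported for the data set.

```python
import random

def split_indices(n_samples, rng, train=0.6, val=0.2):
    """Randomly divide sample indices into training, validation, and test parts."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def early_stopping_epoch(val_errors, patience):
    """Return (best_epoch, stop_epoch) for a sequence of validation errors.

    Training stops once the validation error has failed to improve for
    `patience` consecutive epochs; the weights from best_epoch are kept.
    """
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_errors) - 1

train_idx, val_idx, test_idx = split_indices(267, random.Random(42))

# Made-up validation errors: falling at first, then rising as over-fitting starts
best, stopped = early_stopping_epoch([0.9, 0.5, 0.3, 0.35, 0.4, 0.45, 0.2], patience=3)
```

In the made-up error sequence, the later improvement at the last epoch is never reached because training has already stopped, which is the usual trade-off of a patience-based rule.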

Testing the network

After the ANN training process was completed, different datasets (testing data) were used to determine the model accuracy. The network performance was evaluated using these new data, which had the same properties as the training data but had not been used during training. An important result of this test was that the ANN was able to reproduce all values as well as in the training stage. This yields an R² value of 1, which is an acceptable result and shows a high level of prediction. The simulated and ANN-predicted river flow and the regression plot are shown in ESM Figs. 4 and 5, respectively.

Flood map generated by ANN

Flood maps are helpful for disaster planning as well as for the actual emergency response to floods. Estimating the flood inundation area is the most important duty and highest priority for decision makers, and is most relevant for national and local governments. The outputs of the ANN model can be used in GIS to visualize the flood extent and the flood inundation areas. This map is the best tool for rapid assessment of potential impacts and for

Table 2 Input (I)-hidden layer 1 (HA) connection weights

| Node | I1 (rainfall) | I2 (slope) | I3 (elevation) | I4 (soil) | I5 (geology) | I6 (flow accumulation) | I7 (land use) |
|---|---|---|---|---|---|---|---|
| HA1 | 23.6799 | -9.9E-09 | 1.6E-09 | 1.6E-09 | -3.2E-09 | -1.3E-09 | -4.1E-09 |
| HA2 | 0.2785 | 1.4E-05 | -2.6E-06 | -2.3E-06 | 5.1E-06 | 1.8E-06 | 6.2E-06 |
| HA3 | -1.4384 | -5.2E-06 | 1.0E-06 | 8.4E-07 | -1.9E-06 | -6.2E-07 | -2.4E-06 |
| HA4 | -10.7464 | 1.7E-06 | -2.9E-07 | -2.6E-07 | 3.7E-07 | 2.1E-07 | 6.9E-07 |
| HA5 | 2.0906 | -2.1E-07 | 3.8E-08 | 3.2E-08 | -6.8E-08 | -2.3E-08 | -9.1E-08 |
| HA6 | 3.6231 | 1.0E-05 | -1.8E-06 | -1.6E-06 | 3.3E-06 | 1.3E-06 | 4.2E-06 |
| HA7 | 5.3119 | 4.4E-07 | -8.1E-08 | -6.9E-08 | 1.5E-07 | 5.5E-08 | 1.9E-07 |
| HA8 | 0.1299 | 3.6E-06 | -6.3E-07 | -5.9E-07 | 1.2E-06 | 4.9E-07 | 1.5E-06 |
| HA9 | -2.9412 | 1.8E-06 | -3.1E-07 | -2.8E-07 | 5.6E-07 | 2.3E-07 | 7.4E-07 |
| HA10 | -26.4989 | 2.5E-08 | -4.6E-09 | -3.8E-09 | 8.2E-09 | 2.7E-09 | 1.1E-08 |
| HA11 | -0.6151 | 2.1E-07 | -3.2E-08 | -3.2E-08 | 5.5E-08 | 2.8E-08 | 7.8E-08 |
| HA12 | 4.4757 | -1.2E-06 | 2.2E-07 | 1.9E-07 | -4.2E-07 | -1.4E-07 | -5.2E-07 |
| HA13 | 4.2550 | 1.8E-06 | -3.2E-07 | -2.9E-07 | 5.9E-07 | 2.4E-07 | 7.7E-07 |
| HA14 | -0.5180 | -6.8E-06 | 1.2E-06 | 1.1E-06 | -2.2E-06 | -8.2E-07 | -2.8E-06 |
| HA15 | 1.0745 | 5.4E-05 | -9.4E-06 | -8.6E-06 | 1.7E-05 | 6.9E-06 | 2.2E-05 |
| HA16 | 9.6419 | -2.6E-07 | 4.7E-08 | 4.2E-08 | -9.1E-08 | -3.0E-08 | -1.1E-07 |
| HA17 | 4.3609 | -3.1086 | 0.673 | 0.5908 | -1.213 | -0.421 | -1.634 |
| HA18 | -1.1144 | 6.0E-05 | -1.1E-05 | -8.9E-06 | 1.9E-05 | 6.0E-06 | 2.5E-05 |
| HA19 | 1.05169 | -3.7E-05 | 6.2E-06 | 5.2E-06 | -9.7E-06 | -3.3E-06 | -1.5E-05 |
| HA20 | -3.6817 | 3.3E-07 | -5.4E-08 | -5.3E-08 | 9.8E-08 | 4.6E-08 | 1.3E-07 |
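To illustrate how connection weights such as those in Table 2 enter the network, the following Python sketch computes the weighted input sum and an activation for three hidden nodes. The weights for HA4 to HA6 are taken from Table 2, with the first value read as the I1 weight. The input vector, the tanh activation, and the omission of bias terms are assumptions made only for illustration, since none of them is specified in this section.

```python
import math

# Weights for hidden nodes HA4-HA6 from Table 2; list positions are I1..I7.
WEIGHTS = {
    "HA4": [-10.7464, 1.7e-06, -2.9e-07, -2.6e-07, 3.7e-07, 2.1e-07, 6.9e-07],
    "HA5": [2.0906, -2.1e-07, 3.8e-08, 3.2e-08, -6.8e-08, -2.3e-08, -9.1e-08],
    "HA6": [3.6231, 1.0e-05, -1.8e-06, -1.6e-06, 3.3e-06, 1.3e-06, 4.2e-06],
}

def hidden_activations(inputs, weights=WEIGHTS):
    """Weighted sum of the seven inputs per node, passed through tanh.

    tanh and the absence of a bias term are assumptions, not the paper's
    stated configuration.
    """
    result = {}
    for node, w in weights.items():
        z = sum(wi * xi for wi, xi in zip(w, inputs))
        result[node] = math.tanh(z)
    return result

# Hypothetical normalized inputs for I1..I7 (not taken from the paper).
example_inputs = [0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
activations = hidden_activations(example_inputs)
```

With weights of this form, the first input dominates each node's sum because the remaining six weights are several orders of magnitude smaller.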


Comparison of the forecasted and observed river flow in Fig. 6 indicates that the model accuracy is good, especially for high river flows. Since topography is the main factor controlling the flood inundation extent, the DEM was used to determine the flood-prone area. The flood inundation area was derived from the DEM based on water levels in the river cross-section (Fig. 7).
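One simple way to picture the derivation of an inundation extent from a DEM and a water level is a threshold comparison. The Python sketch below is an illustration under that assumption only: the elevation values, the 30 m cell size, and the function names are hypothetical, the approach ignores whether a low cell is actually connected to the river, and the text does not state that this exact procedure was used.

```python
def inundation_mask(dem, water_level):
    """Flag cells whose elevation is at or below the given water level."""
    return [[elev <= water_level for elev in row] for row in dem]

def inundated_area(mask, cell_size):
    """Total area of flagged cells, in squared units of cell_size."""
    flagged = sum(cell for row in mask for cell in row)
    return flagged * cell_size ** 2

# Hypothetical 3 x 3 elevation grid in metres.
dem = [
    [3.1, 2.6, 2.4],
    [2.9, 2.0, 1.8],
    [3.4, 2.7, 2.2],
]
mask = inundation_mask(dem, water_level=2.7)
area = inundated_area(mask, cell_size=30.0)
```

In a GIS workflow the same comparison would be applied to the full raster, with the water level taken from the simulated river stage at each cross-section.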

Model performance assessments

Fig. 6 Comparison of simulated and observed flood hydrographs at the Kota Tinggi gauging station

rescue operations during any flood, as well as for estimating the type and number of buildings affected. To further assess model performance, the ANN model was used to simulate the floods of January 2007 in Johor state. The 2006–2007 Johor floods, caused by a couple of abnormally heavy rainfall events, cost an estimated USD 0.5 billion and are considered the most costly flood event in Malaysian history. At the peak of the flood, around 110,000 people were evacuated and sheltered in relief centers, and the death toll was 18. The simulated hydrograph is compared with the observed river flow for this event at Kota Tinggi station (Fig. 6). At this station, three critical flood levels are designated by DID, namely Alert (1.7 m), Warning (2.2 m), and Danger (2.7 m).
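The three critical levels quoted for Kota Tinggi station can be turned into a simple status lookup. In the following Python sketch the threshold values come from the text, while the "at or above" convention, the "Normal" label for lower stages, and the function name are assumptions made for illustration.

```python
# Critical water levels at Kota Tinggi station in metres, highest first.
LEVELS = [("Danger", 2.7), ("Warning", 2.2), ("Alert", 1.7)]

def flood_status(water_level_m):
    """Return the highest critical level reached by the given water level."""
    for name, threshold in LEVELS:
        if water_level_m >= threshold:
            return name
    return "Normal"  # assumed label for stages below the Alert level

statuses = [flood_status(h) for h in (1.0, 1.7, 2.5, 3.0)]
```

Applied to a simulated hydrograph, such a lookup marks the time steps at which each critical level would be reached.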

Model accuracy is described in terms of the forecasting error, that is, the difference between the observed and predicted values. Many performance measures are described in the literature, each with its own advantages and limitations. In this study, four widely used measures, namely the coefficient of determination (R2), sum of squared errors (SSE), mean squared error (MSE), and root mean squared error (RMSE), were used to check the performance of the ANN. Each measure was computed from the ANN-predicted values and the measured discharges (targets). The multilayer perceptron (MLP) used for forecasting flood events was evaluated on both the training and testing data (Table 3). The MLP forecasts agreed closely with the observed data, with R2 equal to 1 for both the training and testing data, and the SSE, MSE, and RMSE values were low. Overall, the errors were negligible.
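The four measures named above can be computed from paired observed and predicted values as in the following Python sketch. The sample numbers and the function name are hypothetical. R2 is computed here as 1 - SSE/SST, which is one common definition and is assumed because the text does not give the formula; the squared correlation coefficient is another.

```python
import math

def error_metrics(observed, predicted):
    """Return SSE, MSE, RMSE and R2 for paired observations and predictions."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)            # sum of squared errors
    mse = sse / n                                  # mean squared error
    rmse = math.sqrt(mse)                          # root mean squared error
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / sst                           # assumed R2 definition
    return {"SSE": sse, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical discharges, not data from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = error_metrics(observed, predicted)
```

Under these definitions MSE is SSE divided by the number of samples and RMSE is the square root of MSE, so the three error measures always move together.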

Fig. 7 Flood inundation area in January 2007 at Johor River Basin


Table 3 Comparison of model performance for MLP during training and testing

| Data | R2 | SSE | MSE | RMSE |
|---|---|---|---|---|
| Train | 1 | 6.5E-08 | 4.9E-12 | 2.5E-23 |
| Test | 1 | 6.4E-08 | 4.9E-12 | 2.4E-23 |

Table 4 Sensitivity analysis results for the input factors

| Factors | SSE | MSE | RMSE | R2 |
|---|---|---|---|---|
| Slope | 3.70 | 0.00028 | 0.01687 | 0.963 |
| Elevation | 4.13 | 0.00032 | 0.01782 | 0.931 |
| Soil | 2.37 | 0.00018 | 0.01350 | 0.988 |
| Geology | 5.28 | 0.00041 | 0.02015 | 0.990 |
| Flow accumulation | 6.38 | 0.00049 | 0.02215 | 0.988 |
| Land use | 4.19 | 0.00032 | 0.01795 | 0.986 |

Several researchers have used sensitivity analysis to improve hydrological models (Liu et al. 2003; Bahremand and De Smedt 2008; Pappenberger et al. 2008; Fernandez and Lutz 2010). Sensitivity analysis is a common tool for identifying important model factors, testing the model conceptualization, and developing the model structure (Sieber and Uhlenbrook 2005). In this study, a sensitivity analysis was applied to determine the relative importance of each input factor for the simulated river flow. Rainfall, as the main cause of floods, was excluded from the analysis. The remaining input factors (slope, elevation, soil, geology, flow accumulation, and land use) were considered in turn. Table 4 shows the performance of the ANN models with one input factor eliminated in each case. The R2 values range from 0.931 to 0.990, which indicates that all input causative factors influence the river flow. Among the input factors, elevation has the greatest influence on river flow and flooding (R2 = 0.931 when it is eliminated) and geology the least (R2 = 0.990).
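The reading of Table 4 described above, in which a lower R2 after eliminating a factor is taken to mean a greater influence of that factor, can be reproduced with a short Python sketch. The R2 values are those reported in Table 4; the function name and the handling of ties (kept in table order) are assumptions made for illustration.

```python
# R2 of the ANN when each factor is eliminated (values from Table 4).
R2_WITHOUT = {
    "Slope": 0.963,
    "Elevation": 0.931,
    "Soil": 0.988,
    "Geology": 0.990,
    "Flow accumulation": 0.988,
    "Land use": 0.986,
}

def rank_by_influence(r2_without):
    """Order factors from most to least influential.

    A lower R2 after removal is read as a higher influence; the sort is
    stable, so tied factors keep their table order.
    """
    return sorted(r2_without, key=r2_without.get)

ranking = rank_by_influence(R2_WITHOUT)
```

The resulting order begins with elevation, slope, and land use and ends with geology, in agreement with the interpretation given in the text.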

Discussion and conclusions

Over the last decade, ANNs have been used in many geohazard applications. The integration of GIS and neural network techniques in the field of water resources has opened new approaches in hydrological modeling, improved our ability to create more accurate flood models, and helped to present results in a spatial environment. Floods are affected by several factors such as rainfall, initial soil moisture, geology, land use, evaporation, watershed infiltration, and geomorphology. The relationships between these factors are complicated, and the factors significantly influence each other and the runoff. Understanding these factors and their interactions is necessary for hydrological modeling. This study gives a detailed selection of the most important flood causative factors and an understanding of the interactions between them.

The sensitivity analysis performed here shows that elevation is the most important factor for flood susceptibility mapping. Elevation has the highest weight (R2 = 0.931), followed by slope (R2 = 0.963) and then land use (R2 = 0.986). Scientifically derived weights and ratings are essential to flood susceptibility mapping.

The back-propagation training algorithm makes it difficult to follow the internal processes of the procedure. The method also involves a long execution time and a heavy computing load. Therefore, the thematic data layers were converted into ASCII format to speed up computation. The factor weights were computed and the ANN modeling was performed in MATLAB; the outputs were exported to GIS for map production and visual interpretation. The flood susceptibility map was analyzed qualitatively using an equal-area classification scheme.

A prototype MLP was developed and integrated with GIS. This approach applies different flood causative factors to model floods and presents the results in spatial form. The results showed that the model could simulate peak river flows as well as base flows. These results can be used as basic data to assist slope management and land use planning. The methods used in the study can also be used for generalized planning and assessment purposes, although they may be less useful at a site-specific scale, where local geomorphologic and geographic heterogeneities may prevail.
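The discussion mentions that thematic layers were converted into ASCII format before modeling but does not name the exact format. One common choice in GIS is the Esri ASCII raster grid, which is assumed in the following Python sketch. The layer values, corner coordinates, cell size, and function name are hypothetical.

```python
def to_ascii_grid(grid, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Serialize a 2D list of cell values as Esri ASCII raster text."""
    nrows = len(grid)
    ncols = len(grid[0])
    header = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    # One text line per raster row, values separated by single spaces.
    rows = [" ".join(str(value) for value in row) for row in grid]
    return "\n".join(header + rows) + "\n"

# Hypothetical 2 x 2 layer with one missing cell.
layer = [[12.5, 13.0], [11.0, -9999]]
text = to_ascii_grid(layer, xllcorner=0.0, yllcorner=0.0, cellsize=30.0)
```

A plain-text grid of this kind can be read as a numeric matrix by most numerical tools, which is one plausible reason for using it as an exchange format between GIS and a modeling environment.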
The study suggests that the model can be used to predict floods in the study area with acceptable accuracy. Integrating this model with a real-time warning system could provide a great advantage and significantly reduce flood damage.

Acknowledgments This article has greatly benefited from very helpful reviews by two anonymous reviewers and editorial comments by James W. LaMoreaux.

References

Abraham A (2005) Artificial neural networks. In: Peter H. Sydenham, Richard Thorn (eds) Handbook of measuring system design. John Wiley and Sons, London, pp 901–908

Arora MK, Das Gupta AS, Gupta RP (2004) An artificial neural network approach for landslide hazard zonation in the Bhagirathi (Ganga) Valley, Himalayas. Int J Remote Sens 25(3):559–572

ASCE Task Committee (2000) Artificial neural networks in hydrology I: preliminary concepts. J Hydrol Eng 5(2):115–123

Atkinson PM, Tatnall ARL (1997) Neural networks in remote sensing. Int J Remote Sens 18:699–709

Bahremand A, De Smedt F (2008) Distributed hydrological modeling and sensitivity analysis in Torysa Watershed, Slovakia. Water Resour Manag 22:393–408

Bishop CM (1994) Neural networks and their application. Rev Sci Instrum 65(6):1803–1830

Bishop CM (1995) Neural networks for pattern recognition. Clarendon Press, Oxford, UK

Blazkova S, Beven K (1997) Flood frequency prediction for data limited catchments in the Czech Republic using a stochastic rainfall model and TOPMODEL. J Hydrol 195(1–4):256–278

Cunderlik JM, Burn DH (2002) Analysis of the linkage between rain and flood regime and its application to regional flood frequency estimation. J Hydrol 261(1–4):115–131

Dixon B (2005) Applicability of neuro-fuzzy techniques in predicting ground-water vulnerability: a GIS-based sensitivity analysis. J Hydrol 309:17–38

Farajzadeh M (2001) The flood modeling using multiple regression analysis in Zohre & Khyrabad Basins. In: 5th International Conference of Geomorphology, August, Tokyo, Japan

Farajzadeh M (2002) Flood susceptibility zonation of drainage basins using remote sensing and GIS, case study area: Gaveh rod Iran. In: Proceeding of international symposium on geographic information systems, Istanbul, Turkey, 23–26 Sept 2002

Feng LH, Lu J (2010) The practical research on flood forecasting based on artificial neural networks. Expert Syst Appl 37:2974–2977

Fernandez DS, Lutz MA (2010) Urban flood hazard zoning in Tucuman Province, Argentina, using GIS and multicriteria decision analysis. Eng Geol 111:90–98

Flood I, Kartam N (1994) Neural networks in civil engineering. I: principles and understanding. J Comput Civil Eng 8(2):131–148

Gomez H, Kavzoglu T (2005) Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng Geol 78(1–2):11–27

Hassan AJ, Ghani AA (2006) Development of flood risk map using GIS for Sg. Selangor Basin. http://redac.eng.usm.my/html/publish/2006_11.pdf. Accessed 19 April 2008

Haykin S (1999) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall, New Jersey

Hess LL, Melack JM, Simonett DS (1990) Radar detection of flooding beneath the forest canopy: a review. Int J Remote Sens 11:1313–1325

Hess LL, Melack J, Filoso S, Wang Y (1995) Delineation of inundated area and vegetation along the Amazon floodplain with the SIR-C Synthetic Aperture Radar. IEEE T Geosci Remote 33:896–903

Holger RM, Dandy GC (1996) The use of artificial neural networks for the prediction of water quality parameters. Water Resour Res 32:1013–1022

Horritt MS, Bates PD (2002) Evaluation of 1D and 2D numerical models for predicting river flood inundation. J Hydrol 268:87–99

Islam MM, Sado K (2001) Flood damage and modeling using satellite remote sensing data with GIS: case study of Bangladesh. In: Jerry Ritchie et al (eds) Remote sensing and hydrology 2000. IAHS Publication, Oxford, pp 455–458

Islam MM, Sado K (2002) Development priority map for flood countermeasures by remote sensing data with geographic information system. J Hydrol Eng 9:346–355

Kingma NC (2002) Flood hazard assessment and zonation, Lecture Note. ITC, Enschede

Lee S, Ryu J, Won J, Park H (2004) Determination and application of the weights for landslide susceptibility mapping using an artificial neural network. Eng Geol 71:289–302

Lek S, Guégan JF (1999) Artificial neural networks as a tool in ecological modelling, an introduction. Ecol Model 120:65–73

Lek S, Delacoste M, Baran P, Dimopoulos I, Lauga J, Aulanier S (1996) Application of neural networks to modelling non-linear relationships in ecology. Ecol Model 90:39–52

Lin HS, McInnes KJ, Wilding LP, Hallmark CT (1999) Effects of soil morphology on hydraulic properties: I. Quantification of soil morphology. Soil Sci Soc Am J 63:948–953

Liu H, Chandrashekar V (2000) Classification of hydrometers based on polarimetric radar measurements: development of fuzzy logic and neuro-fuzzy systems and in situ verifications. J Atmos Ocean Tech 17:140–164

Liu YB, Gebremeskel S, De Smedt F, Hoffmann L, Pfister L (2003) A diffusive transport approach for flow routing in GIS-based flood modelling. J Hydrol 283:91–106

Lorrai M, Sechi GM (1995) Neural nets for modeling rainfall-runoff transformations. Int Ser Prog Water Res 9:299–313

Maidment DR (2002) Arc Hydro: GIS for water resources. ESRI Press, Redlands

Maier HR, Dandy GC (1996) The use of artificial neural networks for the prediction of water quality parameters. Water Resour Res 32(4):1013–1022

Mas JF (2004) Mapping land use/cover in a tropical coastal area using satellite sensor data, GIS and artificial neural networks. Estuar Coast Shelf S 59:219–230

Oh JJ, Pradhan B (2011) Application of a neuro-fuzzy model to landslide susceptibility mapping in a tropical hilly area. Comput Geosci 37(9):1264–1276. doi:10.1016/j.cageo.2010.10.012

Paola JD, Schowengerdt RA (1995) A review and analysis of backpropagation neural networks for classification of remotely sensed multi-spectral imagery. Int J Remote Sens 16:3033–3058

Pappenberger F, Beven KJ, Ratto M, Matgen P (2008) Multi-method global sensitivity analysis of flood inundation models. Adv Water Resour 31:1–14

Pirasteh S, Rizvi SMA, Ayazi MH, Mahmoodzadeh A (2010) Using microwave remote sensing for flood study in Bhuj Taluk, Kuchch District Gujarat, India. Int Geoinform Res Dev J 1(1):13–24

Pradhan B (2009) Groundwater potential zonation for basaltic watersheds using satellite remote sensing data and GIS techniques. Central Eur J Geosci 1(1):120–129. doi:10.2478/v10085-009-0008-5

Pradhan B (2010a) Flood susceptible mapping and risk area delineation using logistic regression, GIS and remote sensing. J Spatial Hydrol 9(2):1–18

Pradhan B (2010b) Landslide susceptibility mapping of a catchment area using frequency ratio, fuzzy logic and multivariate logistic regression approaches. J Indian Soc Remote Sens 38(2):301–320. doi:10.1007/s12524-010-0020-z

Pradhan B (2010c) Application of an advanced fuzzy logic model for landslide susceptibility analysis. Int J Comput Int Sys 3(3):370–381

Pradhan B (2011a) Manifestation of an advanced fuzzy logic model coupled with geoinformation techniques for landslide susceptibility analysis. Environ Ecol Stat 18(3):471–493. doi:10.1007/s10651-010-0147-7

Pradhan B (2011b) Use of GIS based fuzzy relations and its cross application to produce landslide susceptibility maps in three test areas in Malaysia. Environ Earth Sci 63(2):329–349. doi:10.1007/s12665-010-0705-1

Pradhan B, Buchroithner MF (2010) Comparison and validation of landslide susceptibility maps using an artificial neural network model for three test areas in Malaysia. Environ Eng Geosci 16(2):107–126. doi:10.2113/gseegeosci.16.2.107

Pradhan B, Lee S (2009) Landslide risk analysis using artificial neural network model focusing on different training sites. Int J Phys Sci 3(11):1–15

Pradhan B, Lee S (2010a) Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modeling. Environ Modell Softw 25:747–759. doi:10.1016/j.envsoft.2009.10.016

Pradhan B, Lee S (2010b) Delineation of landslide hazard areas using frequency ratio, logistic regression and artificial neural network model at Penang Island, Malaysia. Environ Earth Sci 60:1037–1054. doi:10.1007/s12665-009-0245-8

Pradhan B, Lee S (2010c) Regional landslide susceptibility analysis using backpropagation neural network model at Cameron Highland, Malaysia. Landslides 7(1):13–30. doi:10.1007/s10346-009-0183-2

Pradhan B, Pirasteh S (2010) Comparison between prediction capabilities of neural network and fuzzy logic techniques for landslide susceptibility mapping. Disaster Adv 3(2):26–34

Pradhan B, Shafie M (2009) Flood hazard assessment for cloud prone rainy areas in a typical tropical environment. Disaster Adv 2(2):7–15

Pradhan B, Youssef AM (2010) Manifestation of remote sensing data and GIS for landslide hazard analysis using spatial-based statistical models. Arab J Geosci 3(3):319–326. doi:10.1007/s12517-009-0089-2

Pradhan B, Youssef AM (2011) A 100-year maximum flood susceptibility mapping using integrated hydrological and hydrodynamic models: Kelantan River Corridor, Malaysia. J Flood Risk Manag 4:189–202. doi:10.1111/j.1753-318X.2011.01103.x

Pradhan B, Singh RP, Buchroithner MF (2006) Estimation of stress and its use in evaluation of landslide prone regions using remote sensing data. Adv Space Res 37:698–709. doi:10.1016/j.asr.2005.03.137

Pradhan B, Lee S, Buchroithner MF (2010a) A GIS-based backpropagation neural network model and its cross application and validation for landslide susceptibility analyses. Comput Environ Urban Sys 34:216–235. doi:10.1016/j.compenvurbsys.2009.12.004

Pradhan B, Lee S, Buchroithner M (2010b) Remote sensing and GIS-based landslide susceptibility analysis and its cross-validation in three test areas using a frequency ratio model. Photogramm Fernerkun 1:17–32. doi:10.1127/1432-8364/2010/0037

Pradhan B, Youssef AM, Varathrajoo R (2010c) Approaches for delineating landslide hazard areas using different training sites in an advanced artificial neural network model. Geospatial Inf Sci 13(2):93–102. doi:10.1007/s11806-010-0236-7

Pradhan B, Sezer E, Gokceoglu C, Buchroithner MF (2010d) Landslide susceptibility mapping by neuro-fuzzy approach in a landslide prone area (Cameron Highland, Malaysia). IEEE T Geosci Remote 48(12):4164–4177. doi:10.1109/TGRS.2010.2050328

Principe JC, Euliano NR, Lefebvre WC (1999) Neural and adaptive systems: fundamentals through simulations. John Wiley and Sons, New York

Rashid A, Aziz A, Wong KFV (1992) A neural network approach to the determination of aquifer parameters. Ground Water 30:164–166

Ray C, Klindworth KK (2000) Neural networks for agrichemical vulnerability assessment of rural private wells. J Hydrol Eng 4:162–171

Rogers SJ, Chen HC, Kopaska-Merkel DC, Fang JH (1995) Predicting permeability from porosity using artificial neural networks. AAPG Bull 79:1786–1797

Sarle WS (1994) Neural networks and statistical models. In: Proceedings of the nineteenth annual SAS users group international conference, SAS Institute, pp 1538–1550

Schaap MG, Leij FJ, VanGenuchten MT (1998) Neural network analysis for hierarchical prediction of soil hydraulic properties. Soil Sci Soc Am J 62:847–855

See L, Openshaw S (2000) A hybrid multi-model approach to river level forecasting. Hydrol Sci J 45:523–536

Sezer E, Pradhan B, Gokceoglu C (2011) Manifestation of an adaptive neuro-fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia. Expert Syst Appl 38(7):8208–8219. doi:10.1016/j.eswa.2010.12.167

Sieber A, Uhlenbrook S (2005) Sensitivity analyses of a distributed catchment model to verify the model structure. J Hydrol 310:216–235

Smith K, Ward R (1998) Floods: physical processes and human impacts. John Wiley and Sons Ltd, West Sussex, pp 3–33

Tamari S, Wosten JHM, Ruiz-Suarez JC (1996) Testing an artificial neural network for predicting soil hydraulic conductivity. Soil Sci Soc Am J 57:1088–1095

Tamura SI, Tateishi M (1997) Capabilities of a four-layered feedforward neural network: Four layers versus three. IEEE T Neural Netw 8(2):251–255

United Nations Environment Program (2002) Early warning, forecasting and operational flood risk monitoring in Asia (Bangladesh, China and India). http://www.unep.org/geo/geo3.asp. Accessed 21 Aug 2010

Varoonchotikul P (2003) Flood forecasting using artificial neural networks. Taylor & Francis, The Netherlands, p 102

World Meteorological Organisation (2008) Urban flood management: a tool for integrated flood management. http://www.wmo.int/pages/mediacentre/press_releases/pr_835_en.html. Accessed 15 July 2010

Woldt W, Dahab I, Bogardi C, Dou C (1996) Management of diffuse pollution in groundwater under imprecise conditions using fuzzy models. Water Sci Technol 33:249–257

Youssef AM, Pradhan B, Hassan AM (2011) Flash flood risk estimation along the St. Katherine road, southern Sinai, Egypt using GIS based morphometry and satellite imagery. Environ Earth Sci 62(3):611–623. doi:10.1007/s12665-010-0551-1

Zhu XY, SHi Xu, Zhu J-J, Zhou N-Q, Wu C-Y (1997) Study on the contamination of fracture karst water in Boshan District, China. Ground Water 35:538–545