Modeling destructive earthquake casualties based on ... - Springer Link

2 downloads 0 Views 961KB Size Report
Jan 24, 2014 - extracting information for prevention and risk reduction casualties after ... between casualty rate and related covariates based on earthquake ...
Nat Hazards (2014) 72:1093–1110 DOI 10.1007/s11069-014-1059-x ORIGINAL PAPER

Modeling destructive earthquake casualties based on a comparative study for Turkey ¨ zel S. Turkan • G. O

Received: 12 June 2013 / Accepted: 15 January 2014 / Published online: 24 January 2014  Springer Science+Business Media Dordrecht 2014

Abstract The statistical modeling of destructive earthquakes is an indispensable tool for extracting information for prevention and risk reduction casualties after destructive earthquakes in a seismic region. The linear regression (LR) model can reveal the relation between casualty rate and related covariates based on earthquake catalog. However, if some covariates affect the casualty rate parametrically and some of them nonparametrically, the LR model may entail serious bias and loss of power when estimating or making inference about the effect of parameters. We suggest that semi-parametric beta regression (SBR), semi-parametric additive regression (SAR), and beta regression (BR) models could provide a more suitable description than the LR model to analyze the observed casualties after destructive earthquakes. We support this argument using destructive earthquakes occurred in Turkey between 1900 and 2012 having surface wave magnitudes five or more. The LR, SAR, BR, and SBR models are compared within the context of this data. The data strongly support that the SBR and SAR models can lead to more precise results than the BR and LR models. Furthermore, the SBR is the best model for the earthquake data since the beta distribution provides a flexible model that can be used to analyze the data involving proportions or rates. The results from this model suggest that the casualty rate depends on energy, damaged buildings, and the number of aftershocks of a destructive earthquake. Keywords Casualty rate  Magnitude  Energy release  Beta regression  Semiparametric regression  Semi-parametric beta regression

¨ zel S. Turkan (&)  G. O Department of Statistics, Faculty of Science, Hacettepe University, 06800 Ankara, Turkey e-mail: [email protected] ¨ zel G. O e-mail: [email protected]

123

1094

Nat Hazards (2014) 72:1093–1110

1 Introduction Earthquakes become major societal risks when they impinge on vulnerable populations. According to the available worldwide data during the twentieth century, almost half a thousand of earthquakes demanded more than 1,615,000 human victims. In this sense, various studies have been conducted to investigate the regularities of human casualties during destructive earthquakes (Ohta et al. 1983; Samardjieva and Oike 1992; Tsai et al. 2001; Samardjieva and Badal 2002; Ashkenazi et al. 2005; Badal et al. 2005; Gao and Jia 2005; Gutie´rrez et al. 2005; Liu and Wu 2005; Liu et al. 2005; Wyss 2005; Koshimura et al. 2006; Lamontagne 2008; Zhao et al. 2008; Wu et al. 2009). Existing approaches are very efficient to estimate the death toll shortly after an earthquake, and the linear regression (LR) model is commonly used in the analysis of preventing human losses. The measures of the earthquake danger and the expectancy of maximum intensity from the data of historical earthquakes that occurred in Japan were proposed by Kawasumi (1951) using the LR model. Ohta et al. (1983) established the LR model of the number of victims caused by an earthquake as a function of the number of completely destroyed houses. The number of casualties in destructive earthquakes between years 1950 and 1980 for the world was investigated with the LR model by Christoskov and Samardjieva (1984), and this study completed by Samardjieva and Oike (1992) with worldwide and Japanese disastrous earthquakes that occurred during the period 1980–1990. Shebalin (1985) proposed a classification of earthquake danger by assuming that the expected number of deaths increases with the growth of the population in different nations in the world. More recently, the relation between the number of killed individuals and the earthquake magnitude has been studied using the LR model in Samardjieva and Badal (2002), Gutie´rrez et al. (2005), Wyss (2005), Koshimura et al. (2006), Zhao et al. (2008), and Wu et al. (2009). The LR model is not, however, appropriate, the casualty rate is restricted to the interval [0, 1], and some covariates can affect the casualty rate nonparametrically. The beta regression (BR) model has been shown to perform better than the LR model on a transformed variable for percentage data by Paolino (2001). So, it is more appropriate to use the BR model that assumes that the response variable, casualty rate, follows a continuous distribution with support in [0, 1] its lower and upper bounds. Furthermore, the semi-parametric additive regression (SAR) model can be more suitable than the BR model if the relation between casualty rate and some covariates is nonlinear. Besides, the semi-parametric beta regression (SBR) model proposed by Branscum et al. (2007) with applications on household expenditure data and genetic distance between foot-and-mouth disease viruses could be considered to analyze casualty rate. Since the relation between casualty rate and some covariates is nonlinear, the response variable is assumed to follow a beta distribution. As far as we know, these models have never been used for the analysis of the human losses. Besides, damaged buildings, depth, energy release, and aftershocks related to an earthquake may be important to analyze the casualty rate. Accordingly, a new model with these covariates for the ongoing revision of the literature is needed. Turkey ranks high among countries that have suffered centuries of loss of life and property due to earthquakes. In the twentieth century, earthquakes in Turkey caused over 110,000 deaths, 250,000 hospitalized injuries, and the destruction of 600,000 housing units. In recent decades, earthquake disaster risk for Turkey’s urban centers has increased, mainly due to high rates of urbanization, faulty land-use planning and construction, inadequate infrastructure and services, and environmental degradation. Erdik and Aydinoglu (2002) showed that the vulnerability of Turkish building stock was higher than that in California, which shared a comparable level of earthquake hazard. Therefore, our

123

Nat Hazards (2014) 72:1093–1110

1095

purpose is to put forward prognostic estimations of the size of the human losses in Turkey in the case of strong earthquakes. The paper is organized as follows. In Sect. 2, the LR, BR, SAR, and SBR models are introduced to derive new expressions from disastrous earthquakes in Turkey. Section 3 describes study area and includes the determination of the covariates related to the casualty rate. In Sect. 4, the results are obtained applying the technique to the destructive earthquakes occurred in Turkey. Some future research perspectives are discussed in Sect. 5.

2 Methods In this section, we briefly summarize some basic concepts related to the LR, BR, SAR, and SBR models. 2.1 Linear regression model The LR analysis is the most widely used statistical technique in researches and provides the basis for many other statistical methods. It is used for modeling the relationship between covariates and a response variable. The important aim of the LR analysis is to obtain predictions of the response variable. The LR model can be simply expressed as below yi ¼ b0 þ b1 xi1 þ b2 xi2 þ    þ bk xik þ ei where yi is response variable, xik’s are covariates, bk’s are unknown coefficients, and ei is error term. This model can be expressed as a matrix–vector form, y ¼ Xb þ e

ð1Þ

where X is an (nxp) full rank matrix of known constants, y is an (nx1) vector consisting of n observations, b is a (px1) vector of unknown parameters, and e is an (nx1) vector of random error with E(e) = 0 and V(e) = r2In. The ordinary least square (OLS) estimate of b is b^OLS ¼ ðXT XÞ1 XT y: The results of the LR analysis are not reliable if assumptions are not satisfied. In this situation, nonparametric or semi-parametric regression models could be used (Fox 1991). 2.2 Beta regression model The LR analysis is widely used in applications to analyze data that is considered to be related to other variables. It should, however, not be used, in models, where the response variable is restricted to the interval [0, 1]. A regression model is proposed by Ferrari and Cribari-Neto (2004) for the continuous variables that assume values in the standard unit interval, e.g., rates, proportions, or income concentration indices. Since the model is based on the assumption that the response is beta distributed, it is called as the BR model. Note that the chief motivation for the BR model lies in the flexibility delivered by the assumed beta law. The beta density can assume a number of different shapes depending on the combination of parameter values, including left- and right-skewed or the shape of the uniform density. As a generalized linear model (GLM), an underlying assumption of the BR is that a linear relationship exists between response variable and covariates through the

123

1096

Nat Hazards (2014) 72:1093–1110

link function (Christopher et al. 2011). The interval [0, 1] restriction on the response variable makes the BR analysis naturally adept for modeling percentages. So, the BR has been shown to perform better than the LR on a transformed variable for percentage data (Paolino 2001). We say that a random variable y follows a beta distribution with parameters p, q [ 0, denoted by B(p, q), if the distribution y admits the following density f ðy; p; qÞ ¼

Cðp þ qÞ p1 y ð1  yÞq1 ; CðpÞCðqÞ

y 2 ð0; 1Þ;

ð2Þ

where Cð:Þ is the gamma function. The beta distribution can be parameterized by its mean and variance similar to a normal distribution. However, unlike the normal distribution, the variance of a beta distribution is a function of its mean and a ‘precision’ parameter, a scaling measurement describing how tightly clustered the observed data are. This particular aspect of the beta distribution is used to model the association of covariates with changes in mean and variance simultaneously. The mean and variance of y are, respectively, p pq : and VðyÞ ¼ EðyÞ ¼ pþq ðp þ qÞ2 ðp þ q þ 1Þ Ferrari and Cribari-Neto (2004) defined a regression structure for the beta-distributed responses that differs from (1). Let l = p/(p ? q) and / = p ? q, i.e., p = l/ and q = (1 - l)/. Under this new parameterization, if y * b(p, q), then E(y) = l and VarðyÞ ¼ lð1  lÞ=ð1 þ uÞ. Under this parameterization, we will use the notation y * B(l, /) and the beta density in (2) is given by f ðy; l; /Þ ¼

Cð/Þ yl/1 ð1  yÞð1lÞ/1 ; y 2 ð0; 1Þ: Cðl/ÞCðð1  lÞ/Þ

ð3Þ

Then, the log-density is log f ðy; l; /Þ ¼ log Cð/Þ  log Cðl/Þ  log Cðð1  lÞ/Þ þ ðl/Þ log y þ fð1  lÞ/  1g logð1  yÞ; with 0 \ l \ 1 and / [ 0, since p, q [ 0. Let y1, …, yn be a random sample such that yi * B(li, /), i = 1, …, n. Here, the precision parameter, /, is constant for all observations. Hence, the BR model is defined as gðli Þ ¼ gi ¼ xTi bi ;

ð4Þ

where b ¼ ðb1 ; . . .; bk ÞT is a kx1 vector of unknown regression parameters (k \ n), gi is a linear predictor, and xi ¼ ðxi1 ; . . .; xik ÞT is the vector of k covariates. Here, gð:Þ : ð0; 1Þ ! R is a link function, which isPstrictly increasing and twice differentiable. The log-likelihood function of yi is ‘ðb; /Þ ¼ ni¼1 ‘i ðli ; /Þ where ‘i ðli ; /Þ ¼ log Cð/Þ  log Cðli /Þ  log Cðð1  li Þ/Þ þ ðli /Þ log yi þ fð1  li Þ/  1g logð1  yi Þ:

ð5Þ

Notice that li = g-1(xTi b) is a function of b, the vector of regression parameters. Parameter estimation is performed by the maximum likelihood (ML). In this model, the precision parameter / is constant for all observations. This implies a fixed dispersion BR model. The extension of the BR model introduced by Simas et al. (2010) is a variable

123

Nat Hazards (2014) 72:1093–1110

1097

dispersion BR model. The precision parameter is not constant in this model for all observations but modeled in a similar way as the mean parameter such as g1(li) = g1i = xTi bi and g2(/i) = g2i = zTi h. In this study, the fixed dispersion BR model is used (Cribari-Neto and Zeileis 2010). 2.3 Semi-parametric additive regression model Generalized additive models (GAMs) are nonparametric modifications of the GLMs where each predictor is included in the model as a nonparametric smoothing function (Hastie and Tibshirani 1990). A GAM can be written as y ¼ b0 þ f1 ðx1 Þ þ f2 ðx2 Þ þ    þ fp ðxp Þ þ e

ð6Þ

where fi, i = 1, …, p, are nonparametric functions estimated using smoothing techniques. Adding a parametric component to the additive model creates a SAR model, which provides the ability to estimate a wide variety of functional form. Assume that the effects of the covariates are additive, a SAR model where some of the covariates enter the model in a parametric fashion, while other variables enter as nonparametric terms. The SAR model allows the analyst to estimate standard parametric models with nonparametric estimates for the continuous predictor variables and for easy diagnosis of nonlinearity within the context of the familiar multiple regression model (Hastie and Tibshirani 1990). The SAR model that mixes parametric and nonparametric terms is expressed as yi ¼ b0 þ b1 z1i þ    þ bk zki þ f1 ðx1i Þ þ f2 ðx2i Þ þ    þ fp ðxpi Þ þ e:

ð7Þ

In the above model, the first k covariates are assumed to have linear effect on yi and the rest of the covariates enter into the model are assumed nonlinear effect on yi fitted with nonparametric smoothers. The model can be expressed as matrix–vector notation as y ¼ Zb þ

p X

fj þ e:

ð8Þ

j¼1

The model in (8) can also be sensed as additive regression model, because parametric part of (8) is summation of linear functions. If (8) has only one f component, the model is converted to the SAR model. Iterative algorithms are required for the estimation of both additive and semi-parametric regression models, and a variety of iterative algorithms have been implemented into various software programs. One method is to use the equivalence between smoothing splines and mixed models for a restricted maximum likelihood estimator that relies on a Newton–Raphson algorithm. The most well-known method of estimation for these models, however, is a backfitting algorithm proposed and implemented by Hastie and Tibshirani (1990). Using this backfitting algorithm, the estimation of f in (8) can be obtained as ! p k1 X X ðmþ1Þ ðmþ1Þ ðmÞ ; k ¼ 0; 1; 2; . . .; p; m ¼ 0; 1; 2; . . . ð9Þ ¼ Sk y  fj  fj fk j¼1

j¼kþ1

1

where S0 ¼ ZðZT ZÞ ZT is smoother matrix for parametric part. Also, f0 = Zb is predictor of parametric term (Omay and Aydın 2007). The SAR model also has an obvious advantage over a fully parametric model. In this semi-parametric framework, the inferential machinery of nonparametric regression is

123

1098

Nat Hazards (2014) 72:1093–1110

retained in the SAR model, which allows the analyst to test whether any nonparametric terms are needed in the model specification (Keel 2004). 2.4 Semi-parametric beta regression model The generalized additive model for location, scale, and shape (GAMLSS) is introduced by Rigby and Stasinopoulos (2005) as a way of overcoming some of the limitations associated with GLM and GAMs. GAMLSS is semi-parametric regression-type models, and one of the important sub-models of GAMLSS is the SBR model, which is introduced by Branscum et al. (2007). It is parametric in the sense that the response variable is assumed to follow a beta distribution, and at the same time, it is semi-parametric because one can model some conditioning effects through nonparametric functions. A GAMLSS model assumes independent observations yi for i = 1, 2, …, n with probability function f ðyi jhi Þ conditional on a vector of four distribution parameters, hi ¼ ðh1i; h2i ; h3i ; h4i Þ ¼ ðli ; /i ; ti ; si Þ, each of which can be a function to the covariates. The first two population distribution parameters li and /i are usually characterized as location and scale parameters, while the remaining parameters are characterized as shape parameters, e.g., skewness and kurtosis parameters (Florencio et al. 2012). Let yT ¼ ðy1 ; y2 ; . . .; yn Þ be the n length vector of the response variable and let gk(.), k = 1, 2, 3, 4, be known monotonic link functions relating the distribution parameters to covariates by gk ðhk Þ ¼ gk ¼ Xk bk þ

Jk X

Zjk cjk

ð10Þ

j¼1

where hk, k = 1, 2, 3, 4, represents l, u, m, and s distribution parameter vectors, respectively. Here, gk are also vectors of length n, bTk ¼ ðb1k ; . . .; bJk0 k Þ is a parameter vector of 0 length Jk0 , Xk is a fixed known design matrix (nxJk), Zjk is a fixed known (nxqjk) design matrix, and cjk is a qjk-dimensional random variable, which is assumed to be distributed as -1 cjk  Nqjk ð0; G1 is the inverse of a (qjkxqjk) symmetric matrix jk Þ, where Gjk Gjk ¼ Gjk ðkjk Þ, which may depend on a vector of hyperparameters kjk (Stasinopoulos and Rigby 2007). The model in (10) allows modeling each distribution parameter as a linear function of covariates. Let Zjk = In, where In is an nxn identity matrix, and cjk ¼ f jk ¼ fjk ðxjk Þ for all combinations j and k in (10), then semi-parametric additive formulation of GAMLSS is given by gk ðhk Þ ¼ gk ¼ Xk bk þ

Jk X

f jk ðxjk Þ

ð11Þ

j¼1

where the function f jk ¼ fjk ðxjk Þ is an unknown function of covariate xjk. The parameter vectors bk and the random effects parameters cjk are estimated for fixed values of the smoothing hyperparameters kjk’s by maximizing a penalized likelihood function ‘p given by 1 ‘p ¼ ‘  k 2

Z1

½f 00 ðtÞ 2 dt

1

where ‘ ¼

123

Pn

i¼1

i

log f ðyi jh Þ is the log-likelihood function.

ð12Þ

Nat Hazards (2014) 72:1093–1110

1099

In this study, RS algorithm which is used by Rigby and Stasinopoulos (1996a, b) for maximizing the penalized likelihood that is given in (12). From (5) and (12), the penalized likelihood function ‘sbeta of the SBR model is given by ‘sbeta ¼ log Cð/i Þ  log Cð/i li Þ  log C½/i ð1  li Þ þ ð/i li  1Þ log yi Z1 1 þ ½/i ð1  li Þ  1 logð1  yi Þ  k ½f 00 ðtÞ 2 dt: 2

ð13Þ

1

For the SBR model, the analysis is conducted using ‘GAMLSS’ package in R. An additive term function is to be fitted for fitting nonlinear or nonparametric (smooth) functions or random effects terms (Rigby and Stasinopoulos 2005). Cubic splines (cs) have been studied extensively in the literature, and the cubic spline function is based on the ‘smooth.spline()’ function of R. In this study, cs are used to implement the SBR model. The functions f(t) in the model (11) are arbitrary twice continuously differentiable functions and a penalized log-likelihood function is maximized given by ‘sbeta in (13) subject to R1 penalty terms of the form k 1 ½f 00 ðtÞ2 dt for the SBR model. The solution for the maximizing functions f(t) are all natural cs and is expressed as linear combinations of their natural cubic spline basis functions. (Weihua et al. 2012).

3 Application 3.1 Study area Turkey is a seismically active area within the complex zone of collision between the Eurasian Plate and both the African and Arabian Plates. Much of the country lies on the Anatolian Plate, a small plate bounded by two major strike-slip fault zones, the North Anatolian Fault and East Anatolian Fault. The western part of the country is also affected by the zone of extensional tectonics in the Aegean Sea caused by the southward migration of the Hellenic arc. The easternmost part of Turkey lies on the western end of the Zagros folds and thrust belt, which is dominated by thrust tectonics. Most earthquake-related deaths are caused by the collapse of structures, and the construction practices play a tremendous role in the death toll of an earthquake. North-western Turkey, the country’s most densely populated region and industrial heartland, has been struck by two massive earthquakes. On the basis of the current official earthquake hazard zoning map of turkey, 92 % of the total surface area and 95 % of the total population are situated in zones of varying degrees of seismic risk; 75 % of the industrial centers are located in these earthquake prone areas. Moreover, 53 % of the land, 50 % of the population, and 15 % of industry are situated in areas of first- and second-degree risks, liable to a violent earthquake any time. The first measured 7.4 on the Richter scale, on August 17, 1999. Izmit, an industrial city of one million in western Turkey, was nearest the epicenter. The official death toll stands at over 18,000, with some 44,000 people injured, nearly 300,000 homes either damaged or collapsed and more than 40,000 business premises similarly affected. The disaster was followed by more than 1,300 aftershocks, culminating in the second quake some 100 km to the east of Izmit on November 12, 1999, and rated 7.2 on the Richter scale. Another big earthquake hit the city of Bingol, Eastern part of Turkey, on May 1, 2003, killing 176 people with a Richter scale of 6.4. Recently on October 23, 2011, a 7.2-magnitude earthquake hit the province of Van in Eastern Anatolia, killing 644 people.

123

1100

Nat Hazards (2014) 72:1093–1110

In this study, destructive earthquakes with surface magnitude M C 5.0 in Turkey, between the north (39–42N) and the east (26–45E) coordinates, from 1900 to 2012 are investigated. To detect the significant seismic quiescence, it is necessary to decluster the earthquake catalog. Several declustering algorithms are proposed to identify and remove dependent events from a seismic data, among which the most standard are of Knopoff (1964), Gardner and Knopoff (1974), and Reasenberg (1985). In this study, the declustering algorithm proposed by Reasenberg (1985) is applied to the Turkish earthquake catalog. This algorithm removes all the dependent events from each cluster and substitutes them with a unique event, equivalent in energy to that of a whole series. It contains certain input parameters that allow the user to remove foreshocks and aftershocks in a smaller or larger time or space interval with respect to the main shock location. In this study, the following input parameters are used: • rfact = 8 (number of crack radii surrounding each earthquake to consider linking a new event to cluster), • xmeff = 2.00 (effective lower magnitude cutoff for catalog), • xk = 0.50 (increase in the lower magnitude cutoff during clusters xmeff = xmeff ? xkm, where m is the magnitude of the largest event of the cluster), • p1 = 0.95 (probability of detecting the next clustered event used to compute the look ahead of time s), • smin = 1.0 day (minimum value of the look-ahead time for building clusters, when the first event is not clustered), and • r = 0.011x100.4m km (interaction distance of dependent events). The resulting declustered catalog has 118 main shocks (5 %), 831 foreshocks (38 %), and 1,226 aftershocks (57 %). The epicentral distribution of the main shocks, the source of which is Kandilli observatory, is shown in Fig. 1.

Fig. 1 Study area and epicentral distribution of earthquakes with Ms C 5.0 (source Kandilli observatory 2013)

123

Nat Hazards (2014) 72:1093–1110

1101

3.2 Covariates related to the earthquake casualty rate After exploring the data, regression models are applied to the data to predict the casualty rate. Note that the casualty rate is defined as the number of killed people divided by the number of inhabitants of the affected region. These models deal with several kinds of covariates, namely depth of main shock, number of aftershocks, magnitude, and transformed energy of earthquake. It is now worthwhile to overview these covariates that are supposed to be statistically significant on the casualty rate. • Depth Earthquakes occurring at a depth of \70 km are classified as ‘shallow-focus’ earthquakes, while those with a focal depth between 70 and 300 km are commonly termed ‘mid-focus’ or ‘intermediate-depth’ earthquakes. In fact, the great majority of earthquake in Turkey is shallow. Shallow earthquakes often tend to cause serious damage. That is, the deeper earthquake is the less effect in the surface of the Earth. • Aftershock To detect the significant seismic quiescence, it is necessary to decluster the earthquake catalog. As mentioned above, Reasenberg’s declustering algorithm removes all the dependent events from each cluster and substitutes them with a unique event. The aftershocks of each earthquake with magnitude Ms C 4.0 within 1 month and shallower than 40 km are used. A strong aftershock can cause additional casualties from indirect effects of the earthquake. Aftershocks are smaller earthquakes that occur after a main earthquake, and in most cases, there are many of these (1,260 were measured after the 1964 Alaskan earthquake). Aftershocks occur because the main earthquake changes the stress pattern in areas around the epicenter, and the crust must adjust to these changes. Aftershocks are very dangerous because they cause further collapse of structures damaged by the main shock. Aftershocks are a secondary effect of earthquakes. • Surface magnitude (M) The most frequent way to represent the strength of an earthquake is by its magnitude. Surface wave magnitude is generally used for shallow earthquakes. It is estimated using the formula MS = log (A/t) ? 1.66 log D ? 3.3, where A is the maximum amplitude (in micrometers) of the Rayleigh waves, T is the period (usually about 20 s), and D is a distance (in degrees). In general, the larger the earthquake, the more intense is the shaking and the duration of the shaking. • Transformed energy The energy of an earthquake represents as measure of its strength. To calculate the energy given the magnitude, we have used the formula log ES = 11.8 ? 1.5Ms (Gutenberg 1956; Jarkov 1983). To reduce, however, the very broad variability range of the energy released in an earthquake, we have transformed this quantity through the formula Eitr ¼

ðEi  Emin Þ ; Etr 2 ½0; 1 ðEmax  Emin Þ i

where Emax and Emin are, respectively, the upper and lower bound of the observed values, and Ei is the energy of the ith event. The energy release of an earthquake closely correlates to its destructive power. The larger the magnitude, the greater the energy and hence amplitude of seismic waves and the more damage they may potentially cause. • Damaged buildings Casualties in earthquakes have been principally due to damaged buildings, and these buildings can be potentially an informative measure for the casualty rate.

123

1102

Nat Hazards (2014) 72:1093–1110

4 Results We begin with this section by presenting summary of covariates. Table 1 shows the summary of the covariates that are thought to affect the occurrence of an earthquake. In this study, the LR model, BR model with different link functions, and SAR and SBR models are used to investigate the relationship between the casualty rate and covariates. Before applying these models, the distribution of the casualty rate is also investigated by means of quantile–quantile (Q–Q) plots. A Q–Q plot consists of plots of the observed quantile, Q(j), j = 1, …, n, against the quantile predicted by the fitted model. Figure 2 presents Q–Q plots of the casualty rate and logarithmic transformation of the casualty rate. As shown in Fig. 2a, the distribution of the casualty rate is not normal, and hence, the logarithmic transformation could be applied. Figure 2b shows that the distribution of the casualty rate after logarithmic transformation is normal. Then, the scatter plots are presented in Fig. 3 to see relationships between the casualty rate and covariates. As shown from Fig. 3, transformed energy, damaged buildings, depth, and aftershock covariates are assumed to have linear effect on yi and magnitude covariate is assumed to have nonlinear effect on yi. We first analyze the relationship between casualty rate and covariates using the LR analysis. The distribution of response variable must be normal to use the LR model. In this study, the logarithmic transformation is used since the distribution of casualty rate is not normal. As shown from Fig. 2b, the distribution of casualty rate is normal after logarithmic transformation. To fit the LR model, the logarithm of casualty rate is considered as the response variable. The result of the LR analysis is reported in Table 2. The results in Table 2 suggest that the surface magnitude and damaged buildings affect the casualty rate since these covariates are statistically significant at the nominal levels. An R2 value of 0.529 means that 52.9 % of the variability in the casualty rate is predictable

Table 1 Summary of covariates related to the casualty rate in Turkey

Mean ± SE

Surface magnitude

Transformed energy

Damaged buildings

Depth

Number of aftershocks

6.36 ± 0.63

0.05 ± 0.15

8,091.4 ± 18,085.8

26.33 ± 20.43

7.63 ± 12.09

a

Fig. 2 a Q–Q plot of the casualty rate and b Q–Q plot of log casualty rate

123

b

Nat Hazards (2014) 72:1093–1110

1103

Fig. 3 a Scatter plot of magnitude, b scatter plot of transformed energy, c scatter plot of damaged buildings, d scatter plot of depth, and e scatter plot of aftershock

using the LR model. Other covariates are not statistically significant on the human losses after destructive earthquakes in Turkey. We now turn our attention to performing the BR model since the casualty rate is continuous and restricted to the interval [0, 1]. Before modeling the earthquake data with the BR model, the beta distribution assumption for the response variable, casualty rate, is assessed. Whether the distribution of the casualty rate fit or not to the beta distribution is investigated by Q–Q plot and the Q–Q plot of the casualty rate for Turkey is shown in Fig. 4. Figure 4 suggests that the distribution of the casualty rate fits the beta distribution. Then, the BR model is performed using different link functions since the selection of an appropriate link function can greatly improve the model fit. The best link function is

123

1104

Nat Hazards (2014) 72:1093–1110

Table 2 Result of the LR analysis Variable

Parameter estimate

SE

t value

Pr [ |t|

Intercept

-22.270

3.087

-7.212

0.000*

2.304

0.491

4.696

Transformed energy

-6.774

3.913

-1.731

Damaged buildings

-0.0000911

0.000033

2.781

0.007**

0.008473

0.011790

0.719

0.476

0.002770

0.022370

0.124

0.902

Magnitude

Depth Aftershock

0.000* 0.088

2

Multiple R 0.529, F-statistic 13.7 on 5 and 61 df, p value 0.000 ** 0.01, * 0.05

Fig. 4 Q–Q beta plot of the casualty rate for Turkey

determined that clearly improves pseudo R-squared of model. The pseudo R-squared values of the BR model with different link functions are given in Table 3. The results in Table 3 indicate that the log–log is the best link function compared to other link functions. Then, the BR analysis with log–log link function is performed, and the result of this model is given in Table 4. Table 4 shows that the precision parameter u is significant and the model is valid to infer results. The results in Table 4 indicate that magnitude, damaged buildings, and aftershock are statistically significant at the usual nominal levels. This means that magnitude and damaged buildings have significantly positive effect, i.e., as the magnitude, damaged buildings increases, the casualty rate increases. Large aftershocks may pose a substantial hazard to populated areas. According to the model, the number of aftershock has negative effect on the casualty rate. Since the people become more carefully after an aftershock, it is expected to sign of the aftershock is negative in the model.

123

Nat Hazards (2014) 72:1093–1110 Table 3 Comparison criterion for the BR analysis with several link functions

1105

Link function

Pseudo R2

Logit

0.509

Probit

0.575

Cloglog

0.506

Cauchit

0.029

Log

0.504

Loglog

0.657

Table 4 Results of the BR analysis with log–log link function Variable

Parameter estimate

Intercept

SE

z value

Pr [ |z| 0.000*

-2.3680

0.309

-7.666

Magnitude

0.1234

0.049

2.535

0.011*

Transformed energy

0.5263

0.412

1.276

0.202

Damaged buildings

0.00001

0.000003

3.747

0.000*

Depth

-0.00037

0.001

-0.311

0.756

Aftershock

-0.00657

0.002

-3.073

0.002*

4.143

0.000*

/ coefficients (precision model with identity link) /

42.91

10.36

2

Log-likelihood 312.1 on 7 df, pseudo R 0.657 ** 0.01, * 0.05

We consider the SAR model that combines both parametric and nonparametric components when it is believed that the response variable depends on some covariates in a linear way but is nonlinearly related to the remaining covariates. In this study, there is a nonlinear relation between the casualty rate and magnitude; however, there is a linear relation between casualty rate and the rest of covariates. It implies that the SAR model more appropriate than the other models. The result of the SAR model is presented in Table 5. Table 5 indicates that the transformed energy, damaged buildings, and the number of the aftershocks are statistically significant at the usual nominal levels on the casualty rate. According to the model, the transformed energy, damaged buildings increase the casualty rate after a destructive earthquake in Turkey. However, the number of aftershock decreases the casualty rate. The nonparametric variable in nonparametric part of the suitable SAR model can be only displayed graphically, since it cannot be expressed parametrically. The effect of this nonparametric variable is shown Fig. 5. Table 5 indicates that the transformed energy, damaged buildings, and the number of the aftershocks are statistically significant at the usual nominal levels on the casualty rate. According to the model, the transformed energy, damaged buildings increase the casualty rate after a destructive earthquake in Turkey. However, the number of aftershock decreases the casualty rate. Finally, the SBR model is considered which combines both parametric and nonparametric components since the response variable, casualty rate, has a beta distribution. As shown from Fig. 3a, there is a nonlinear relation between the casualty rate and magnitude;

123

1106

Nat Hazards (2014) 72:1093–1110

Table 5 Results of the SAR analysis Variable

Parameter estimate

SE

t value

Pr [ |t|

Intercept

0.000*

-0.086

0.018

-4.864

Transformed energy

2.134

0.348

6.129

0.000*

Damaged buildings

0.000002

0.0000004

4.561

0.000*

Depth

-0.00007

0.00015

-0.447

0.656

Aftershock

-0.00094

0.00034

-2.740

0.008*

Edf

Ref. df

F

p value

8.994

4.023

0.0004*

Approximate significance of smooth terms f (Magnitude)

8.872

2

Adjusted R 0.91, deviance explained 93 % GCV score 0.0007, Scale est. = 0.00056, n = 67 ** 0.01, * 0.05

Fig. 5 Changing of casualty with respect to magnitude and 95 % CI

however, there is a linear relation between casualty rate and the rest of covariates. Hence, it implies that the SBR model can be more appropriate than the other models. Then, the result of the SBR model is presented in Table 6. Table 6 indicates that the transformed energy, damaged buildings, and the number of the aftershocks are statistically significant at the usual nominal levels on the casualty rate. According to the SBR model, damaged buildings increase the casualty rate after a destructive earthquake in Turkey. However, the number of aftershock and the transformed energy decrease the casualty rate.

123

Nat Hazards (2014) 72:1093–1110

1107

In order to compare models used to determine covariates, which affect the casualty rate, we can also consider the R-square criteria, the mean-squared prediction error (MSPE), and absolute bias (AB). The MSPE and AB are defined as below: MSPE ¼

n X ðpredicted  observedÞ2

n

i¼1

; AB ¼

n X ðpredicted  observedÞ : n i¼1

When the models are compared, the largest value of R-square and the smallest value of MSPE show the best fitting. The R-square, MSPE, and AB values of LR, BR, SAR, and SBR models are presented in Table 7. The results given in Table 7 indicate that the SAR and SBR models are strong competitors to the LR and BR models used for fitting earthquake data for the casualty rates considering R-square criterion, MSPE, and AB. The MSPE is the smallest for the SBR model; hence, it means that the best model is the SBR. Furthermore, the SBR model provides unbiased predictions. Note that in the SAR model it is not possible to control predictions values outside the range (0, 1). However, the negative values of prediction are so close to zero, and these values could be considered as zero. The improvement of the model fits can also be brought out graphically in a display of predicted versus observed values. The graphs of predicted versus observed values are presented in Fig. 6. Figure 6 shows that the best prediction is obtained via the SBR model especially for the extreme observations.

Table 6 Results of the SBR analysis with logit link function Variable

Parameter estimate

SE

t value

Pr [ |t|

Intercept

-11.82

0.000*

cs (Magnitude) Transformed energy Damaged buildings

1.436

-8.233

1.108

0.225

4.919

-2.807

1.243

-2.257

0.00005

0.000012

4.505

0.000* 0.0277* 0.000*

Depth

-0.00419

0.0058

-0.722

0.473

Aftershock

-0.0159

0.007

-2.271

0.0268*

Estimate

SE

t value

Pr [ |t|

-1.795

0.0796

-22.55

0.000*

Sigma coefficients (Intercept) * 0.05

Table 7 Comparison criterion of the regression models

Model

R2 (%)

MSPE

AB

LR

53

3.3077

0.00000

BR with loglog link

65

0.00052

-0.000079

SAR

91

0.00044

0.00000

SBR

61

0.00041

-0.00014

123

1108

Nat Hazards (2014) 72:1093–1110

Fig. 6 a Predicted versus observed values for the LR model, b predicted versus observed values for the BR model diagnostic plots for the models, c predicted versus observed values for the SAR model, and d predicted versus observed values for the SBR model

5 Conclusion Turkey and its surrounding area are known as an excellent natural laboratory to study tectonic escape-related deformation, and the consequent structures that include fold and thrust belts, suture zones, active strike-slip faulting, and active normal faulting and the associated basin formation. Five destructive earthquakes have hit the populated areas in Turkey since 1998: 1998 Adana; 1999 Golcuk; 1999 Duzce; 2003 Bingol; and 2011 Van. These earthquakes killed 19,166 people and destroyed 148,586 buildings. For this reason, a comprehensive analysis of destructive earthquakes and casualty rate gains importance for the seismic risk analysis of Turkey. Since the pioneering work of (Ohta et al. 1983), there has been an increasing interest in understanding the dynamics of an earthquake using several statistical methods to determine the human losses after a strong earthquake. In this study, we show that the regression models can be an efficient tool for investigating the casualty rate of the earthquake activity. For this aim, we have analyzed Turkish earthquake catalog using the regression models for the period of 1900–2012. Due to its flexibility in terms of the variety of density shapes that

123

Nat Hazards (2014) 72:1093–1110

1109

can be accommodated, the beta distribution is a natural choice for modeling casualty rate that are restricted to the interval (0, 1). Since the some covariates have nonlinear relation to the casualty rate, the SBR model as an alternative to the LR, BR, and SAR models is considered in seismology for the first time, and these models seem to lead to results that are more precise. Diagnostic plots also show that the suggested SBR model is the most efficient model and gives us a useful tool for determining influence parameters of destructive earthquakes on the rate of killed people after strong earthquakes. To obtain a reliable evaluation, the best model combines the following parameters: transformed energy, damaged buildings, and the number of aftershocks. The results can put for the information about the number of victims caused by an earthquake as a function of these covariates. Acknowledgments The authors are very much thankful to the editor and the unknown referees for valuable suggestions and comments on earlier draft of the paper.

References Ashkenazi I, Isakovich B, Kluger Y, Alfici R, Kessel B, Better OS (2005) Prehospital management of earthquake casualties buried under rubble. Prehosp Disaster Med 20:122–133 ´ (2005) Preliminary quantitative assessment of earthquake casualties Badal J, Va´zquez-Prada M, Gonza´lez A and damages. Nat Hazards 34:353–374 Branscum AJ, Johnson WO, Thurmond MC (2007) Bayesian beta regression: applications to household expenditure data and genetic distance between foot- and mouth-disease viruses. Aust N Z J Stat 49(3):287–301 Christopher JS, Barbara CT, Robert JA, Zoran R, Joyce SN, Dipankar B, Robert FW (2011) Application of beta regression to analyze ıschemic stroke. Clin Trials Methods Neuroepidemiology 37:73–82 Christoskov L, Samardjieva E (1984) An approach for estimation of the possible number of casualties during strong earthquakes. Bulg Geophys J X(4):94–106 Cribari-Neto F, Zeileis A (2010) Beta regression in R. J Stat Softw 34(2):1–24. http://www.jstatsoft.org/v34/ i02/. Accessed 26 Dec 2013 Erdik M, Aydinoglu N (2002) Earthquake performance and vulnerability of buildings in Turkey. Report Prepared for World Bank Disaster Management Facility, Washington, DC Ferrari SLP, Cribari-Neto F (2004) Beta regression for modeling rates and proportions. J Appl Stat 31:799–815 Florencio L, Cribari-Neto F, Ospina R (2012) Real estate appraisal of land lots using GAMLSS models. Chil J Stat 3:75–91 Fox J (1991) Regression diagnostics. Quantitative applications in the social sciences series. Sage Publications, London Gao JG, Jia Y (2005) A study on the time of promulgating earthquake disaster—an index of earthquake rescue ability. J Catastrophol 20:31–35 Gardner JK, Knopoff L (1974) Is the sequence of earthquakes in southern California with aftershocks removed, Poissonian? Bull Seismol Soc Am 64:1363–1367 Gutenberg B (1956) The energy of earthquakes. Q J Geol Soc 112:1–14 Gutie´rrez E, Taucer F, Groeve TD, Al-Khudhairy DHA, Zaldivar JM (2005) Analysis of worldwide earthquake mortality using multivariate demographic and seismic data. Am J Epidemiol 161:1151–1158 Hastie TJ, Tibshirani RJ (1990) Generalized additive models. Chapman and Hall, New York Jarkov V (1983) Inner structure of the earth and the planet. Nauka, Moscow Kandilli Observatory and Earthquake Research Institute of University of Bogazici, Turkey, http://www. koeri.boun.edu.tr. Accessed 26 Dec 2013 Kawasumi H (1951) Measures of earthquake danger and expectancy of maximum intensity throughout Japan as inferred from seismic activity in historical times. Bull Earthq Res Inst 29:469–482 Keel L (2004) Semiparametric regression for the social sciences. Wiley, UK Knopoff L (1964) Rev Geophys 2:625–660. doi:10.1029/RG002i004p00625

123

1110

Nat Hazards (2014) 72:1093–1110

Koshimura S, Katada T, Mofjeld HO, Kawata Y (2006) A method for estimating casualties due to the tsunami inundation flow. Nat Hazards 39:265–274 Lamontagne M (2008) Casualties directly caused by an earthquake in Canada: first contemporaneous written accounts from the M6.5 Charlevoix, Quebec, earthquake of 20 October 1870. Bull Seismol Soc Am 98:1602–1606 Liu Z, Wu ZL (2005) A simple model of reported casualties during earthquakes and earthquake-generated tsunamis. Earthq Res China 21:526–529 Liu Z, Wu ZL, Cai MJ, Jiang CS (2005) Discussion on casualties during 2004 earthquake-generated tsunami in Indian Ocean. In: Li SC, Wang YJ, Huang P (eds) Progress in safety science and technology. Science Press, Beijing Ohta Y, Goto N, Ohashi H (1983) An empirical construction of equations for estimating number of victims by earthquake. Zisin II 36:463–466 Omay E, Aydın D (2007) A semiparametric additive model: investigate of house price in Eskisehir (Turkey). WSEAS Trans Math 3:494–499 Paolino P (2001) Maximum likelihood estimation of models with beta-distributed dependent variables. Polit Anal 9:325–346 Reasenberg P (1985) Second-order moment of central California seismicity. 1969–1982. J Geophys Res 90:5479–5495 Rigby RA, Stasinopoulos DM (1996a) A semi-parametric additive model for variance heterogeneity. Stat Comput 6:57–65 Rigby RA, Stasinopoulos DM (1996b) Mean and dispersion additive models. In: Hardle W, Schimek MG (eds) Statistical theory and computational aspects of smoothing 215–230. Physica, Heidelberg Rigby RA, Stasinopoulos DM (2005) Generalized additive models for location, scale and shape. J R Stat Soc Ser C Appl Stat 54:507–554 Samardjieva E, Badal J (2002) Estimation of the expected number of casualties caused by strong earthquakes. Bull Seismol Soc Am 92:2310–2322 Samardjieva E, Oike K (1992) Modelling the number of casualties from earthquakes. J Nat Disaster Sci 14:17–28 Shebalin NV (1985) Regularities of the natural disasters. Nauki O Zemle, Znanie Simas AB, Barreto-Souza W, Rocha AV (2010) Improved estimators for a general class of beta regression models. Comput Stat Data Anal 54:348–366 Stasinopoulos DM, Rigby RA (2007) Generalized additive models for location, scale and shape (GAMLSS) in R. J Stat Softw 23:1–46 Tsai YB, Yu TM, Chao HL, Lee CP (2001) Spatial distribution and age dependence of human-fatality rates from the Chi–Chi, Taiwan, earthquake of 21 September 1999. Bull Seismol Soc Am 91:1298–1309 Weihua Z, Zean L, Xiangjian X (2012) Influence analysis for semiparametric beta regression model with longitudinal data. Acta Math Sin 4:677–692 Wu XY, Gu JH, Wu HY (2009) A modified exponential model for reported casualties during earthquakes. Acta Seismol Sin 31:457–463 Wyss M (2005) Human losses expected in Himalayan earthquakes. Nat Hazards 34:305–314 Zhao YZ, Wu ZL, Li YT (2008) Casualty in earthquake and tsunami disasters: internet-based monitoring and early estimation of the final death toll. In: Proceedings of the 14th world conference on earthquake engineering, Beijing, October 12–17, 2008. Paper ID 09-01-0044

123