
Department of Civil and Environmental Engineering Stanford University

PROBABILISTIC SEISMIC LIFELINE RISK ASSESSMENT USING EFFICIENT SAMPLING AND DATA REDUCTION TECHNIQUES

By Nirmal Jayaram and Jack Baker

Report No. 175 May 2010

The John A. Blume Earthquake Engineering Center was established to promote research and education in earthquake engineering. Through its activities our understanding of earthquakes and their effects on mankind’s facilities and structures is improving. The Center conducts research, provides instruction, publishes reports and articles, conducts seminars and conferences, and provides financial support for students. The Center is named for Dr. John A. Blume, a well-known consulting engineer and Stanford alumnus.

Address:
The John A. Blume Earthquake Engineering Center
Department of Civil and Environmental Engineering
Stanford University
Stanford CA 94305-4020
(650) 723-4150
(650) 725-9755 (fax)
[email protected]
http://blume.stanford.edu

©20 The John A. Blume Earthquake Engineering Center

© Copyright by Nirmal Jayaram 2010

All Rights Reserved


Abstract

Lifelines are large, geographically-distributed systems that are essential support systems for any society. Probabilistic seismic risk assessment for lifelines is less straightforward than for individual structures, due to challenges in quantifying the ground-motion hazard over a region rather than at just a single site and in developing a risk assessment framework that deals with the heavy computational burden associated with lifeline performance evaluations.

Quantification of the regional ground-motion hazard requires information on the joint distribution of ground-motion intensities at multiple sites. Statistical tests are used here to examine the commonly-used assumptions of univariate normality of logarithmic intensities and multivariate normality of spatially-distributed logarithmic intensities. Further, observed and simulated ground-motion time histories are used to estimate the spatial correlation between intra-event residuals, which can be used to parameterize the joint distribution of the ground-motion intensities. Factors that affect the decay of the correlation with increasing separation distance are identified.

The study then develops a computationally-efficient lifeline risk assessment framework based on efficient sampling and data reduction techniques. The framework can be used for developing a small, but stochastically representative, catalog of spatially-correlated ground-motion intensity maps that can be used for performing lifeline risk assessments. The catalog is used to evaluate the exceedance rates of various travel-time delays on an aggregated (higher-scale) model of the San Francisco Bay Area transportation network. The risk estimates obtained are consistent with those obtained using conventional Monte Carlo simulation (MCS), which requires three orders of magnitude more ground-motion intensity maps. Therefore, the proposed technique can be used to drastically reduce the computational expense of an MCS-based risk assessment, without compromising the accuracy of the risk estimates.

Further, the catalog of ground-motion intensity maps is used in conjunction with a statistical learning technique termed Multiple Additive Regression Trees (MART) in order to obtain an approximate relationship between the ground-motion intensities at lifeline component locations and the lifeline performance. The lifeline performance predicted by this relationship can be used to advantage in place of the actual lifeline performance in problems whose computational demand stems from the need for repeated lifeline performance evaluations.

Even though the above-mentioned risk assessment framework facilitates the consideration of spatial correlation between ground-motion intensities, current ground-motion models (e.g., NGA ground-motion models) that are used to predict the distribution of ground-motion intensities at individual sites are fitted assuming independence between the intra-event residuals. This study proposes a method to consider the spatial correlation in the mixed-effects regression procedure used for fitting ground-motion models, and empirically shows that the risk estimates of spatially-distributed systems can be inaccurate when using ground-motion models fitted without the consideration of spatial correlation.

Finally, the study also investigates the extension of the seismic hazard and risk assessment concepts discussed earlier to hurricane hazard and risk modeling. The focus is on quantifying the uncertainties and the spatial correlation in hurricane wind fields (using the same techniques that are used to quantify these parameters in earthquake ground-motion fields), and evaluating their impact on the hurricane risk of spatially-distributed systems. The results show that the uncertainties and the spatial correlation in the wind fields must be modeled in order to avoid introducing errors into the risk calculations of spatially-distributed systems. The results also show that the tools developed in this thesis for seismic risk assessment are also applicable to risk assessments that consider other hazards.


Acknowledgments

This work was supported by the Stanford Graduate Fellowship and the U.S. Geological Survey (USGS) via External Research Program awards 07HQGR0031 and 07HQGR0032. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the USGS. The report was originally published as the Ph.D. dissertation of the first author. The authors would like to thank Professors Anne Kiremidjian, Sarah Billington, Kincho Law, Eric Dunham, Jerome Friedman and Dr. Paolo Bazzurro for providing constructive feedback on this work.


Contents

Abstract

Acknowledgments

1 Introduction
    1.1 Motivation
    1.2 Areas of contribution
        1.2.1 Multi-site hazard modeling
        1.2.2 Lifeline risk assessment
    1.3 Organization

2 Statistical Tests of the Joint Distribution of Spectral Acceleration Values
    2.1 Abstract
    2.2 Introduction
    2.3 Testing the univariate normality of residuals
        2.3.1 Results and discussion
    2.4 Testing the assumption of multivariate normality for random vectors using independent samples
        2.4.1 Henze-Zirkler test
        2.4.2 Mardia’s measures of kurtosis and skewness
        2.4.3
    2.5 Testing the assumption of multivariate normality for spatially distributed data
        2.5.1 Check for bivariate normality
        2.5.2 Results and discussion
    2.6 Conclusions
    2.7 Data source
    2.8 Appendix: Normal score transform

3 Correlation model for spatially distributed ground-motion intensities
    3.1 Abstract
    3.2 Introduction
    3.3 Modeling correlations using semivariograms
    3.4 Computation of semivariogram ranges for intra-event residuals using empirical data
        3.4.1 Construction of experimental semivariograms using empirical data
        3.4.2 1994 Northridge earthquake recordings
        3.4.3 1999 Chi-Chi earthquake
        3.4.4 Other earthquakes
        3.4.5 A predictive model for spatial correlations
    3.5 Isotropy of semivariograms
        3.5.1 Isotropy of intra-event residuals
        3.5.2 Construction of a directional semivariogram
        3.5.3 Test for anisotropy using Northridge ground motion data
    3.6 Comparison with previous research
    3.7 Conclusions

4 Spatial correlation between spectral accelerations using simulated ground-motion time histories
    4.1 Abstract
    4.2 Introduction
    4.3 Statistical estimation of spatial correlation
    4.4 Results and discussion
        4.4.1 Effect of ground-motion component orientation on the semivariogram range
        4.4.2 Testing the assumption of isotropy using directional semivariograms
        4.4.3 Testing the assumption of second-order stationarity
        4.4.4 Effect of directivity on spatial correlation
    4.5 Conclusions

5 Simulation of spatially-correlated ground-motion intensities with and without consideration of recorded intensity values
    5.1 Abstract
    5.2 Introduction
    5.3 Simulation of correlated residuals without consideration of recorded ground motion intensities
        5.3.1 Single-step simulation technique
        5.3.2 Sequential simulation technique
    5.4 Importance sampling of normalized intra-event residuals
    5.5 Sequential simulation of correlated residuals with consideration of recorded ground motion intensities
    5.6 Conclusions
    5.7 Appendix: The conditional sequential simulation of heteroscedastic normalized residuals

6 Efficient sampling and data reduction techniques for probabilistic seismic lifeline risk assessment
    6.1 Abstract
    6.2 Introduction
    6.3 Simulation of ground-motion intensity maps using importance sampling
        6.3.1 Importance sampling procedure
        6.3.2 Simulation of earthquake catalogs
        6.3.3 Simulation of normalized intra-event residuals
        6.3.4 Simulation of normalized inter-event residuals
    6.4 Lifeline risk assessment
        6.4.1 Risk assessment based on realizations from Monte Carlo simulation
        6.4.2 Risk assessment based on realizations from importance sampling
    6.5 Data reduction using K-means clustering
    6.6 Application: Seismic risk assessment of the San Francisco Bay Area transportation network
        6.6.1 Network data
        6.6.2 Transportation network loss measure
        6.6.3 Ground-motion hazard
        6.6.4
        6.6.5 Importance of modeling ground-motion uncertainties and spatial correlations
    6.7 Conclusions
    6.8 Appendix: Proof that the exceedance rates obtained using IS and K-means clustering are unbiased
    6.9 Appendix: Improving the computational efficiency of the K-means clustering method

7 Lifeline performance assessment using statistical learning techniques
    7.1 Abstract
    7.2 Introduction
    7.3 Brief introduction to ground-motion map sampling
        7.3.1 Conventional MCS of ground-motion maps
        7.3.2 Importance sampling of ground-motion maps
        7.3.3 K-means clustering
    7.4 Confidence intervals for lifeline risk estimates
        7.4.1 Network data
        7.4.2 Ground-motion hazard data
        7.4.3 Statistical description of the problem
        7.4.4 Confidence intervals using bootstrap
        7.4.5 Approximate loss estimation using non-parametric regression
        7.4.6 Bootstrap confidence intervals estimated using the exact and the approximate loss functions
    7.5 Conclusions

8 Seismic risk assessment of spatially distributed systems using ground motion models fitted considering spatial correlation
    8.1 Abstract
    8.2 Introduction
        8.2.1 Current regression algorithm
        8.2.2 Should spatial correlation be considered in the regression algorithm?
    8.3 Regression algorithm for mixed-effects models considering spatial correlation
        8.3.1 Covariance matrix for the total residuals
        8.3.2 Obtaining inter-event residuals from total residuals
        8.3.3 Algorithm summary
        8.3.4 Large sample standard errors of σ and τ
        8.3.5 Mixed-effects regression procedure in R
    8.4 Results and discussion
        8.4.1 Standard deviation of residuals as a function of period
        8.4.2 Estimates of spatial correlation
        8.4.3 Risk assessment for a hypothetical portfolio of buildings
    8.5 Conclusions

9 Hurricane risk assessment of spatially-distributed systems with consideration of wind-field uncertainties and spatial correlation
    9.1 Abstract
    9.2 Introduction
    9.3 Spatial correlation estimation methodology
    9.4 Results and Discussion
        9.4.1 Data source
        9.4.2 Hurricane Jeanne (2004)
        9.4.3 Hurricane Frances (2004)
        9.4.4 Hurricane risk assessment of a hypothetical portfolio of buildings
    9.5 Limitations and research needs
    9.6 Conclusions

10 Conclusions
    10.1 Contributions and practical implications
        10.1.1 Joint distribution of spectral acceleration values at different sites and/or different periods
        10.1.2 Spatial correlation model for spectral accelerations
        10.1.3 Lifeline seismic risk assessment using efficient sampling and data reduction techniques
        10.1.4 Lifeline performance assessment using statistical learning techniques
        10.1.5 Seismic risk assessment of spatially-distributed systems using ground-motion models fitted considering spatial correlation
        10.1.6 Extension of proposed ground-motion modeling approaches to hurricane risk assessment
    10.2 Limitations and future work
        10.2.1 Spatial correlation model for spectral accelerations
        10.2.2 Lifeline risk assessment
        10.2.3 Risk management
        10.2.4 Multi-hazard risk assessment
    10.3 Concluding remarks

A Characterizing spatial cross-correlation between ground-motion spectral accelerations at multiple periods
    A.1 Abstract
    A.2 Introduction
    A.3 Statistical Estimation of Spatial Cross-Correlation
    A.4 Sample Results and Discussion
    A.5 Conclusions

B Supporting details for the spatial correlation model developed in Chapter 3
    B.1 Semivariograms of residuals estimated using the Northridge earthquake ground motions
    B.2 Semivariograms of residuals estimated using Chi-Chi earthquake ground motions
        B.2.1 Exact versus approximate semivariogram fit
        B.2.2 Semivariograms of the residuals at seven periods ranging between 0 and 10s
    B.3 Semivariograms of residuals estimated using broadband simulations for scenario earthquakes on the Puente Hills thrust fault system
    B.4 Clustering of Vs30’s
    B.5 Correlation between near-fault ground-motion intensities
    B.6 Directional semivariograms estimated using the Northridge and the Chi-Chi earthquake records at various periods

C Deaggregation of lifeline risk: Insights for choosing deterministic scenario earthquakes
    C.1 Abstract
    C.2 Introduction
    C.3 Deaggregation of seismic loss
    C.4 Loss assessment for the San Francisco Bay Area transportation network
    C.5 Results and Discussion
        C.5.1 Contribution of magnitudes and faults to the lifeline losses
        C.5.2 Contribution of inter- and intra-event residuals to the lifeline loss
    C.6 Transportation network performance under sample scenario ground-motion maps
    C.7 Conclusions

Bibliography

List of Tables

2.1 Tests on normalized intra-event residuals computed at different periods
2.2 Tests on inter-event residuals computed at different periods
2.3 Tests on residuals corresponding to two orthogonal directions (fault-normal and fault-parallel directions)
8.1 Regression coefficients for estimating median Sa(1s)
8.2 Standard deviations of residuals corresponding to Sa(1s)

List of Figures 1.1

Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1.2

4

1999 Chi-Chi earthquake: (a) recorded PGAs (b) median PGAs predicted by the Boore and Atkinson [2008] ground-motion model (c) normalized total residuals computed using Equation 1.2. . . . . . . . . . . . . . . . . .

1.3

6

2004 hurricane Jeanne (The line indicates the hurricane track.): (a) recorded wind speeds (b) wind speeds predicted by Batts et al. [1980] wind-speed model (c) wind-speed residuals. . . . . . . . . . . . . . . . . . . . . . . . 11

1.4

Ground-motion intensity simulation for a magnitude 8 earthquake on the San Andreas fault: (a) median intensities obtained using the Boore and Atkinson [2008] ground-motion model (b) simulated values of the normalized total residuals (c) total intensities. . . . . . . . . . . . . . . . . . . . . 14

2.1

The normal Q-Q plots of the normalized intra-event residuals at four different periods. (a) T = 0.5 seconds (1560 samples) (b) T = 1.0 seconds (1548 samples) (c) T = 2.0 seconds (1498 samples) (d) T = 10.0 seconds (507 samples). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.2

The histogram of the 12,194 pooled normalized intra-event residuals computed at 10 periods, with the theoretical standard normal distribution superimposed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.3

The normal Q-Q plot of the pooled set of normalized intra-event residuals. . 35

2.4

The normal Q-Q plots of inter-event residuals at four different periods. (a) T = 0.5 seconds (64 samples) (b) T = 1.0 seconds (64 samples) (c) T = 2.0 seconds (62 samples) (d) T = 10.0 seconds (21 samples). . . . . . . . . . . 36 xv

2.5

Theoretical and empirical semivariograms for residuals computed at 2 seconds: (a) results for the 0.1 quantile of the residuals from the Chi-Chi data (b) results for the 0.25 quantile of the residuals from the Chi-Chi data (c) results for the 0.5 quantile of the residuals based from the Chi-Chi data (d) results for the 0.25 quantile of the residuals from the Northridge data. . . . 43

2.6

Theoretical and empirical semivariograms for the 0.25 quantile of the residuals: (a) results for the residuals computed at 0.5s from the Northridge data (b) results for the residuals computed at 0.5s from the Chi-Chi data (c) results for the residuals computed at 1s from the Chi-Chi earthquake data (d) results for the residuals computed at 5s from the Chi-Chi data. . . . . . . . 45

3.1

(a) Parameters of a semivariogram (b) Semivariograms fitted to the same data set using the manual approach and the method of least squares. . . . . 53

3.2

Range of semivariograms of ε˜ , as a function of the period at which ε˜ values are computed: (a) the residuals are obtained using the Northridge earthquake data (b) the residuals are obtained using the Chi-Chi earthquake data.

3.3

59

(a) Experimental semivariogram obtained using normalized Vs 30’s at the recording stations of the Northridge earthquake. No semivariogram is fitted on account of the extreme scatter (b) Experimental semivariogram obtained using normalized Vs 30’s at the recording stations of the Chi-Chi earthquake. The range of the fitted exponential semivariogram equals 25 km. 63

3.4

Range of semivariograms of ε˜ , as a function of the period at which ε˜ values are computed. The residuals are obtained using the: (a) Big Bear City earthquake data (b) Parkfield earthquake data; (c) Alum Rock earthquake data; (d) Anza earthquake data; (e) Chino Hills earthquake data. . . . . . . 65

3.5

Ranges of residuals computed using PGAs versus ranges of normalized Vs 30 values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

3.6

(a) Range of semivariograms of ε˜ , as a function of the period at which ε˜ values are computed. The residuals are obtained from six different sets of time histories as shown in the figure; (b) Range of semivariograms of ε˜ predicted by the proposed model as a function of the period. . . . . . . . . 67

xvi

3.7

(a) Parameters of a directional semivariogram. Subfigures (b), (c) and (d) show experimental directional semivariograms at discrete separations obtained using the Northridge earthquake ε˜ values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (b) azimuth = 0◦ (c) azimuth = 45◦ (d) azimuth = 90◦ . . . . . . . . . 72

3.8

Semivariogram obtained using residuals computed based on Chi-Chi earthquake peak ground velocities: (a) residuals from Annaka et al. [1997] and semivariogram model from Wang and Takada [2005] (b) residuals from Annaka et al. [1997] and semivariogram fitted to model the discrete values well at short separation distances (c) residuals from Annaka et al. [1997], considering random amplification factors. . . . . . . . . . . . . . . . . . . 74

4.1

Semivariogram computed using the Sa(T=2s) residuals. . . . . . . . . . . . 86

4.2

Ranges of semivariograms obtained using residuals computed from the (a) 1989 Loma Prieta simulations (b) recorded ground motions [Jayaram and Baker, 2009a]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

4.3

(a) Ranges are computed using residuals at different orientations (b) Omnidirectional (i.e., obtained using all pairs of points, irrespective of the azimuth) and directional semivariograms computed using residuals for Sa (2s). 89

4.4

(a) Ranges are computed using residuals from different spatial domains (b) Ranges are computed using pulse-like and non-pulse-like near fault ground motions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5.1

Ground-motion intensities map simulation: (a) median intensities (b) spatially correlated normalized total residuals and (c) total intensities. . . . . . 96

5.2

Illustration of the sequential step procedure. . . . . . . . . . . . . . . . . . 102

5.3

The alternate sampling distribution (marginal distribution) used for the importance sampling of residuals [Jayaram and Baker, 2010]. . . . . . . . . . 104

xvii

6.1

Importance sampling density functions for: (a) magnitude and (b) normalized intra-event residual; (c) recommended mean-shift as a function of the average number of sites and the average site-to-site distance normalized by the range of the spatial correlation model. . . . . . . . . . . . . . . . . . . 120

6.2

(a) San Francisco Bay Area transportation network (b) Aggregated network. 134

6.3

(a) Travel-time delay exceedance curves (b) Coefficient of variation of the annual exceedance rate (c) Comparison of the efficiency of MCS, IS and the combination of K-means and IS (d) Travel-time delay exceedance curve obtained using the K-means method. . . . . . . . . . . . . . . . . . . . . . 134

6.4

(a) Mean of travel-time delays within a cluster (b) Standard deviation of travel-time delays within a cluster. With both clustering methods, cluster numbers are assigned in order of increasing mean travel-time delay within the cluster for plotting purposes. . . . . . . . . . . . . . . . . . . . . . . . 138

6.5

Comparison of site hazard curves obtained at two sample sites using the sampling framework with that obtained using numerical integration. (a) Sample site 1 and (b) Sample site 2. . . . . . . . . . . . . . . . . . . . . . 138

6.6

Exceedance curves obtained using simplifying assumptions. . . . . . . . . 143

6.7

Travel-time delay exceedance curve obtained using the two-step clustering technique. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

7.1

Sample ground-motion map corresponding to an earthquake on the San Andreas fault. A map is a collection of ground movement levels (groundmotion intensities) at all the sites of interest. The sites of interest, in this case, are located in the San Francisco Bay Area. . . . . . . . . . . . . . . . 145

7.2

(a) Stratified sampling of earthquake magnitudes (b) Importance sampling of residuals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

7.3

Four simulated ground-motion maps, two of which are reasonably similar and grouped together into one cluster. . . . . . . . . . . . . . . . . . . . . 151

7.4

(a) The San Francisco Bay Area transportation network (b) Aggregated model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

7.5

Exceedance rates of travel-time delays. . . . . . . . . . . . . . . . . . . . . 154

xviii

7.6

(a) Predicted vs. exact delay values (b) Prediction residuals.
7.7 (a) A LOESS fit to the prediction residuals (b) Predicted and exact delay values after bias correction.
7.8 Two sample exceedance curves obtained using the exact and the approximate loss functions (after bias correction).
7.9 (a) Residuals from the prediction model (b) Residuals normalized (divided) by the predicted delays.
7.10 Normal Q-Q plot of the residuals.
7.11 MART model fitted using 150 MCS maps.
7.12 Methodology for estimating bootstrap confidence intervals for the loss curves.
7.13 1000 bootstrapped exceedance curves obtained using the (a) exact loss function (b) approximate loss function.
7.14 Bootstrap confidence intervals.
7.15 Bootstrap confidence intervals.
7.16 Balanced bootstrap confidence intervals.
8.1 Comparison of predicted median Sa(1s) values obtained using the CB08 model fitted with and without the consideration of spatial correlation: (a) linear scale (b) log scale.
8.2 Effect of spatial correlation on: (a) estimated intra-event residual standard deviation (σ), (b) estimated inter-event residual standard deviation (τ), (c) estimated total residual standard deviation. (d) Ratio of inter-event residual standard deviation to total residual standard deviation.
8.3 Risk assessment results for a hypothetical portfolio of buildings performed using ground-motion models developed with and without the proposed refinement.
9.1 Hurricane Jeanne: (a) Observed wind speeds (b) Predicted wind speeds (c) Residuals (d) Bias-corrected residuals.
9.2 Residuals and bias-corrected residuals versus closest distances from the hurricane track.


9.3 (a) Histogram of bias-corrected residuals estimated using the Hurricane Jeanne data (b) Normal QQ plot of normalized bias-corrected residuals from Hurricane Jeanne.
9.4 Semivariogram of bias-corrected residuals estimated using the Hurricane Jeanne data.
9.5 Bias-corrected residuals estimated using the Hurricane Frances data.
9.6 (a) Histogram of bias-corrected residuals estimated using the Hurricane Frances data (b) Normal QQ plot of normalized bias-corrected residuals from Hurricane Frances.
9.7 Semivariogram of bias-corrected residuals estimated using the Hurricane Frances data.
9.8 Portfolio of five residential buildings considered in the risk assessment.
9.9 Portfolio loss exceedance probabilities.
10.1 Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines.
A.1 (a) The San Francisco Bay Area transportation network and (b) Annual exceedance rates of various travel time delays on that network (results from Jayaram and Baker [2010]).
A.2 (a) Chi-Chi earthquake normalized residuals computed using spectral accelerations at 1 second (b) Chi-Chi earthquake normalized residuals computed using spectral accelerations at 2 seconds (c) Cross-semivariogram estimated using the 1s and 2s Chi-Chi earthquake residuals.
B.1 Semivariogram of ε̃ based on the peak ground accelerations observed during the Northridge earthquake data
B.2 Semivariogram of ε̃ computed at 0.5 seconds based on the Northridge earthquake data
B.3 Semivariogram of ε̃ computed at 1 second based on the Northridge earthquake data


B.4 Semivariogram of ε̃ computed at 2 seconds based on the Northridge earthquake data
B.5 Semivariogram of ε̃ computed at 5 seconds based on the Northridge earthquake data
B.6 Semivariogram of ε̃ computed at 7.5 seconds based on the Northridge earthquake data
B.7 Semivariogram of ε̃ computed at 10 seconds based on the Northridge earthquake data
B.8 Experimental semivariogram of ε̃ computed at 2 seconds based on the Chi-Chi earthquake data. Also shown in the figure are two fitted semivariogram models: (i) An accurate exponential + nugget model and (ii) An approximate exponential model
B.9 Semivariogram of ε̃ based on the peak ground accelerations observed during the Chi-Chi earthquake data
B.10 Semivariogram of ε̃ computed at 0.5 seconds based on the Chi-Chi earthquake data
B.11 Semivariogram of ε̃ computed at 1 second based on the Chi-Chi earthquake data
B.12 (Approximate) Semivariogram of ε̃ computed at 2 seconds based on the Chi-Chi earthquake data
B.13 Semivariogram of ε̃ computed at 5 seconds based on the Chi-Chi earthquake data
B.14 Semivariogram of ε̃ computed at 7.5 seconds based on the Chi-Chi earthquake data
B.15 Semivariogram of ε̃ computed at 10 seconds based on the Chi-Chi earthquake data
B.16 Experimental semivariogram of ε̃ computed at 5 seconds based on the simulated ground-motion data. Also shown in the figure are two fitted semivariogram models: (i) An accurate spherical model and (ii) An approximate exponential model


B.17 Range of semivariograms of ε̃, as a function of the period at which ε̃ values are computed. The residuals are obtained using the simulated ground-motion data
B.18 Simulated multivariate normal random fields. The correlation structure is defined using an exponential semivariogram with range equaling (a) 0 km (b) 20 km and (c) 40 km.
B.19 Comparison between the experimental semivariogram of ε̃’s computed using pulse-like ground motions and the experimental semivariogram of ε̃’s computed using all usable ground motions. The ε̃’s are computed from peak ground accelerations
B.20 Comparison between the experimental semivariogram of ε̃’s computed using pulse-like ground motions and the experimental semivariogram of ε̃’s computed using all usable ground motions. The ε̃’s are obtained from spectral accelerations computed at 0.5 seconds
B.21 Comparison between the experimental semivariogram of ε̃’s computed using pulse-like ground motions and the experimental semivariogram of ε̃’s computed using all usable ground motions. The ε̃’s are obtained from spectral accelerations computed at 1 second
B.22 Comparison between the experimental semivariogram of ε̃’s computed using pulse-like ground motions and the experimental semivariogram of ε̃’s computed using all usable ground motions. The ε̃’s are obtained from spectral accelerations computed at 2 seconds
B.23 Comparison between the experimental semivariogram of ε̃’s computed using pulse-like ground motions and the experimental semivariogram of ε̃’s computed using all usable ground motions. The ε̃’s are obtained from spectral accelerations computed at 5 seconds
B.24 Comparison between the experimental semivariogram of ε̃’s computed using pulse-like ground motions and the experimental semivariogram of ε̃’s computed using all usable ground motions. The ε̃’s are obtained from spectral accelerations computed at 7.5 seconds


B.25 Comparison between the experimental semivariogram of ε̃’s computed using pulse-like ground motions and the experimental semivariogram of ε̃’s computed using all usable ground motions. The ε̃’s are obtained from spectral accelerations computed at 10 seconds
B.26 Experimental directional semivariograms at discrete separations obtained using the Northridge earthquake ε̃ values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90
B.27 Experimental directional semivariograms at discrete separations obtained using the Chi-Chi earthquake ε̃ values computed at 1 second. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90
B.28 Experimental directional semivariograms at discrete separations obtained using the Chi-Chi earthquake ε̃ values computed at 7.5 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90
B.29 Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε̃ values are computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90
B.30 Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε̃ values are computed at 7.5 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90


B.31 Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε̃ values are computed at 7.5 seconds. Also shown in the figures is an anisotropic model that fits the four experimental semivariograms well (It is to be noted that an anisotropic semivariogram has different shapes in different directions.): (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90
C.1 The aggregated San Francisco Bay Area transportation network.
C.2 Recurrence curve for the travel time delay obtained using the simulation-based framework.
C.3 Joint likelihoods of magnitudes and faults given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.
C.4 Level of congestion in the network as indicated by the volume/capacity ratio.
C.5 Joint likelihoods of inter-event residual given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.
C.6 Joint likelihoods of inter-event residual given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.
C.7 Mean magnitude of earthquakes producing a travel time delay exceeding a specified threshold.
C.8 (a) Average of mean intra-event residual of earthquakes producing a travel time delay exceeding a specified threshold (b) Average of inter-event residual of earthquakes producing a travel time exceeding a specified threshold.
C.9 Recurrence curves obtained without completely accounting for inter-event and intra-event residuals.
C.10 Performance of the network under three different ground-motion scenarios corresponding to three different inter-event residuals. (a) η = 3.79, (b) η = -1.64 and (c) η = 0.


Chapter 1 Introduction

1.1 Motivation

Lifelines are large, geographically-distributed systems that are essential support systems for any society. Due to their known vulnerabilities, it is important to proactively assess and mitigate the seismic risk of lifelines. For instance, the Northridge earthquake caused over $1.5 billion in business interruption losses ascribed to transportation network damage [Chang, 2003]. The city of Los Angeles suffered a power blackout and $75 million of power-outage related losses as a result of the earthquake [e.g., Tanaka et al., 1997]. Lifeline seismic risk assessment is a systematic approach for quantifying the likelihood of observing such losses during future earthquakes (pre-event risk assessment) or in the immediate aftermath of an earthquake (post-event risk assessment). It is often the first step in the process of management of the lifeline seismic risk, and is useful for several applications including the prediction (or estimation after an earthquake) of quantities such as the monetary losses associated with structures and infrastructure owned by a corporation or insured by an insurance company, the number of injuries and casualties in a certain area and the probability that lifeline networks for power, water and transportation may be interrupted. This knowledge is useful for decision makers interested in seismic risk mitigation (e.g., lifeline retrofit), post-disaster management planning, post-earthquake decision making (e.g., opening and closing of facilities such as gas pipelines) and insurance modeling.


Lifeline seismic risk assessment is a multi-disciplinary problem that involves seismology to quantify the earthquake hazard, structural engineering to quantify the damage to infrastructure components, statistics to handle the numerous uncertainties that are present in the seismic environment and in the infrastructure performance, as well as tools and techniques from fields such as optimization, network flow modeling and economics. The analytical Pacific Earthquake Engineering Research Center (PEER) loss analysis framework has been used to perform the risk assessment for a single structure at a given site, by estimating the site ground-motion hazard and assessing probable losses using the hazard information [Cornell and Krawinkler, 2000, Deierlein, 2004]. The risk is measured as the exceedance rates of various loss levels, and is obtained as follows:

\[
\lambda(DV) = \int\!\!\int\!\!\int G(DV \mid DM)\, dG(DM \mid EDP)\, dG(EDP \mid IM)\, d\lambda(IM) \tag{1.1}
\]

where $\lambda(DV)$ is the exceedance rate of the decision variable (loss measure) denoted $DV$, $d\lambda(IM)$ is the derivative of the exceedance rate of a ground-motion intensity measure denoted $IM$ (e.g., spectral acceleration, peak ground acceleration), $dG(EDP \mid IM)$ is the derivative of the probability of exceedance of an engineering demand parameter ($EDP$) (e.g., inter-story drift ratio) given an $IM$, $dG(DM \mid EDP)$ is the derivative of the probability of exceedance of a damage measure ($DM$) (e.g., minor damage, severe damage) given an $EDP$ and $G(DV \mid DM)$ is the probability of exceedance of a decision variable ($DV$) (e.g., monetary loss) given a $DM$. It is to be noted that the parameters $IM$, $EDP$ and $DM$ can also be vectors. Often, numerical integration is sufficient to estimate $\lambda(DV)$ for a single structure. Lifeline risk assessment, however, is based on a large vector of ground-motion intensities (e.g., intensities at all bridge locations in a transportation network). In other words, the scalar $IM$ in Equation 1.1 is now replaced by a large vector of $IM$s, which adds considerable complexity to the integral. The intensities also show significant spatial correlation (i.e., dependence between the intensities at different sites), which needs to be carefully modeled in order to accurately assess the seismic risk [e.g., Park et al., 2007, Bazzurro and Luco, 2004]. Further, the link between the lifeline component damage measures and the performance of the lifeline (i.e., $G(DV \mid DM)$) is usually not available in closed form. For instance, the travel



time of vehicles in a transportation network, a commonly-used performance measure, is only obtained using an optimization procedure rather than being a closed-form function of the ground-motion intensities and the bridge damage states. These additional complexities make it difficult to use the PEER framework for lifeline risk assessment. There are some analytical approaches that are sometimes used for lifeline risk assessment [e.g., Kang et al., 2008], but those are generally applicable to only specific classes of lifeline reliability problems. Hence, many past research works use Monte Carlo simulation (MCS)-based approaches instead of analytical approaches for lifeline risk assessment [e.g., Chang et al., 2000, Campbell and Seligson, 2003, Werner et al., 2004, Crowley and Bommer, 2006, Kiremidjian et al., 2007, Shiraki et al., 2007]. Figure 1.1 illustrates the above-mentioned similarities and dissimilarities between the risk assessment frameworks for single structures and lifelines. (The bold font in the figure denotes a vector. It is also to be noted that the value of G(DM|IM) for a lifeline component can be computed using G(DM|EDP) and G(EDP|IM) for the component, if desired.) In a MCS-based approach, several possible future earthquakes are simulated and the losses sustained by the lifeline due to the ground-motion intensities during these earthquakes are evaluated. These losses are then probabilistically combined in order to obtain the exceedance rates of various loss levels. Basic MCS-based approaches necessitate performance evaluations of the lifeline under a large number of possible future earthquake scenarios and are therefore highly computationally demanding. The current study addresses these challenges and proposes a computationally-efficient MCS-based framework for assessing the seismic risk of lifelines, with full consideration of the uncertainties and correlations present in spatial ground-motion fields.
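As a rough illustration of the last step of an MCS-based approach (combining simulated losses into exceedance rates), the following Python sketch may help. It assumes that every simulated scenario is equally likely and that earthquakes occur in the region at a known total annual rate. The function name `exceedance_rates`, the parameter `event_rate` and the toy loss values are illustrative assumptions and are not taken from the framework developed in this thesis.

```python
import numpy as np

def exceedance_rates(losses, event_rate, thresholds):
    """Annual rate at which each loss threshold is exceeded.

    losses     : one simulated loss per Monte Carlo earthquake scenario
                 (assumed equally likely; hypothetical input)
    event_rate : assumed total annual rate of earthquakes in the region
    thresholds : loss levels of interest
    """
    losses = np.asarray(losses, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Fraction of simulated scenarios whose loss exceeds each threshold
    frac = (losses[None, :] > thresholds[:, None]).mean(axis=1)
    # Scale the conditional exceedance probability by the event rate
    return event_rate * frac

# Toy example with five simulated scenarios (made-up numbers)
losses = [0.0, 0.0, 2.0, 5.0, 12.0]
rates = exceedance_rates(losses, event_rate=0.5, thresholds=[1.0, 10.0])
```

With unequal scenario weights (for example, after importance sampling), the simple mean in the sketch would be replaced by a weighted sum.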

1.2 Areas of contribution

This thesis aims to address the challenges mentioned above. The major contributions of this work are summarized below.



Figure 1.1: Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines.


1.2.1 Multi-site hazard modeling

Challenges

Lifeline risk assessment requires knowledge about the joint distribution of a vector of spatially-distributed ground-motion intensities during probable future earthquakes. The distribution of the ground-motion intensity at a single site is typically predicted using a ground-motion model, which takes the following form [e.g., Boore and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]:

\[
\ln(Y_{ij}) = \ln(\bar{Y}_{ij}) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \tag{1.2}
\]

where $Y_{ij}$ denotes the ground-motion intensity parameter of interest (e.g., $S_a(T)$, the spectral acceleration at period $T$) at site $i$ during earthquake $j$; $\bar{Y}_{ij}$ denotes the predicted (by the ground-motion model) median ground-motion intensity which depends on parameters such as magnitude, distance, period and local-site conditions; $\varepsilon_{ij}$ denotes the normalized intra-event residual and $\eta_{ij}$ denotes the normalized inter-event residual. Both $\varepsilon_{ij}$ and $\eta_{ij}$ are univariate normal random variables with zero mean and unit standard deviation. $\sigma_{ij}$ and $\tau_{ij}$ are standard deviation terms that are estimated as part of the ground-motion model and are functions of the spectral period of interest, and in some models also functions of the earthquake magnitude and the distance of the site from the rupture. The term $\sigma_{ij}\varepsilon_{ij}$ is called the intra-event residual and the term $\tau_{ij}\eta_{ij}$ is called the inter-event residual. The inter-event residual is a constant across all the sites for a given earthquake. (It is to be noted that some chapters of this thesis describe the ground-motion model directly in terms of the inter-event and the intra-event residuals, rather than using the normalized forms.) The sum of the inter-event residual and the intra-event residual is called the total residual. Figures 1.2a-b show, for example, the observed (i.e., $Y_{ij}$) and predicted ($\bar{Y}_{ij}$) peak ground accelerations (PGA) during the 1999 Chi-Chi earthquake. Figure 1.2c shows the normalized total residuals (i.e., total residuals normalized by their standard deviation) computed using the Boore and Atkinson [2008] ground-motion model. While quantifying the hazard over two or more sites, the ground-motion model is used to predict the ground-motion intensity at each site of interest. For instance, the following



Figure 1.2: 1999 Chi-Chi earthquake: (a) recorded PGAs (b) median PGAs predicted by the Boore and Atkinson [2008] ground-motion model (c) normalized total residuals computed using Equation 1.2.



equations are used to predict the distribution of ground-motion intensity at sites $i$ and $i'$.

\[
\ln(Y_{ij}) = \ln(\bar{Y}_{ij}) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \tag{1.3}
\]

\[
\ln(Y_{i'j}) = \ln(\bar{Y}_{i'j}) + \sigma_{i'j}\varepsilon_{i'j} + \tau_{i'j}\eta_{i'j} \tag{1.4}
\]
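A minimal numerical sketch of Equations 1.3 and 1.4 is given below. It rests on several assumptions that are not established by the equations themselves: the two normalized intra-event residuals are taken to be bivariate normal with an assumed correlation `rho`, the intra-event and inter-event residuals are taken to be independent, and a single inter-event standard deviation `tau` is shared by both sites. The function name `simulate_two_sites` and all numerical values are hypothetical. Under these assumptions, the covariance between the two logarithmic intensities is `rho * sigma[0] * sigma[1] + tau**2`, which the sample covariance should approach.

```python
import numpy as np

def simulate_two_sites(ln_median, sigma, tau, rho, n, seed=0):
    """Simulate ln(Y) at two sites for n realizations of one earthquake.

    ln_median : log median intensities at the two sites (length 2)
    sigma     : intra-event standard deviations at the two sites (length 2)
    tau       : inter-event standard deviation (assumed equal at both sites)
    rho       : ASSUMED correlation between the normalized intra-event residuals
    """
    rng = np.random.default_rng(seed)
    ln_median = np.asarray(ln_median, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    corr = np.array([[1.0, rho], [rho, 1.0]])
    # Correlated normalized intra-event residuals, one pair per realization
    eps = rng.multivariate_normal(np.zeros(2), corr, size=n)
    # One normalized inter-event residual per realization, shared by both sites
    eta = rng.standard_normal(n)
    return ln_median + sigma * eps + tau * eta[:, None]

ln_y = simulate_two_sites(ln_median=[-1.0, -1.2], sigma=[0.5, 0.6],
                          tau=0.3, rho=0.4, n=200_000)
sample_cov = np.cov(ln_y, rowvar=False)[0, 1]
expected_cov = 0.4 * 0.5 * 0.6 + 0.3 ** 2
```

The sketch only illustrates how a shared inter-event residual and correlated intra-event residuals combine; it does not show that the residuals are in fact jointly normal.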

It is to be noted, however, that the above equations only provide information about the marginal distribution of the ground-motion intensity at sites $i$ and $i'$. Regional risk assessments require knowledge about the joint distribution of the ground-motion intensity at sites $i$ and $i'$ in order to capture possible dependencies between the ground-motion intensities at the two sites. Since the median predictions at sites $i$ and $i'$ are deterministic and the values of the inter-event residual at sites $i$ and $i'$ are equal, the only additional information required to quantify the joint distribution of intensities at sites $i$ and $i'$ is the joint distribution of $\varepsilon_{ij}$ and $\varepsilon_{i'j}$. While the inter-event residual and each of the intra-event residuals during an earthquake have been statistically seen to follow the univariate normal distribution marginally [Abrahamson, 1988], not much is known about the joint distribution of multiple spatially-distributed intra-event residuals. In the past, some research works have assumed that the intra-event residuals follow a multivariate normal distribution [e.g., Bazzurro and Cornell, 2002, Baker and Cornell, 2006, Kiremidjian et al., 2007], though this assumption has not been verified using recorded time history data. Once the nature of the distribution of the residuals (and equivalently, that of the intensities) is determined, the distribution needs to be parameterized so that it can be used for forward-predicting residuals and ultimately, ground-motion intensities from future earthquakes. One of the challenges in parameterizing the joint distribution of the intra-event residuals is that the intra-event residuals exhibit ‘spatial correlation’ [e.g., Boore et al., 2003]. The spatial correlation is a term that denotes the interdependency between the intra-event residuals located over a region during an earthquake. 
It arises due to several reasons including common-source effects (a part of this effect is captured by the inter-event residual) and similarity in local-site effects and propagation-path effects. The correlation is known to be large when the sites are close to one another, and decays with increase in separation between the sites [Boore et al., 2003]. Evidence of spatial correlation can be seen in Figure 1.2c, which shows clusters of large- and small-valued residuals (which indicates



dependence between closely-spaced residuals). The impact of this correlation on lifeline risk has only been recently studied, and has been seen to be significant [e.g., Park et al., 2007, Lee and Kiremidjian, 2007, Straub and Der Kiureghian, 2008, Rix et al., 2009]. The Sacramento delta levee system risk assessment project is a practical example where the spatial correlation was considered in the risk assessment process [Hanson et al., 2008, Bazzurro, 2010]. Straub and Der Kiureghian [2008] note that the presence of spatial correlation tends to increase the reliability of series systems and decrease the reliability of parallel systems. Irrespective of the nature of the lifeline (which is neither a series nor a parallel system), it is important to consider the spatial correlation in the risk assessment in order to obtain unbiased estimates of the probability of sustaining large losses (and small frequent losses). The ground-motion models (Equation 1.2) that quantify the distribution of intensities at a single site do not provide information about the spatial correlation between the intensities. Researchers, in the past, have computed these correlations using ground-motion time histories recorded during earthquakes [Goda and Hong, 2008, Wang and Takada, 2005, Boore et al., 2003]. Boore et al. [2003] used observations of PGA from the 1994 Northridge earthquake to compute the spatial correlations. Wang and Takada [2005] computed the correlations using observations of peak ground velocities (PGV) from several earthquakes in Japan and the 1999 Chi-Chi earthquake. Goda and Hong [2008] used the Northridge and Chi-Chi earthquake PGAs and spectral accelerations at three periods ranging between 0.3s and 3s. The results reported by these research works, however, differ in terms of the rate of decay of correlation with separation distance. For instance, while Boore et al. 
[2003] report that the correlation drops to essentially zero at a site separation distance of approximately 10 km, the non-zero correlations observed by Wang and Takada [2005] extend past 100 km. Further, Goda and Hong [2008] observe differences between the correlation decay rate estimated using the Northridge earthquake records and the correlation decay rate based on the Chi-Chi earthquake records. To date, no explanation for these differences has been identified. Additionally, the ground-motion models used in the development of the correlation models and for performing risk assessments are currently calibrated using regression analysis that assumes independence between the intra-event residuals [Abrahamson and Youngs,



1992]. Few works have verified the impact of considering the spatial correlation in the development of ground-motion models. One recent work is that of Hong et al. [2009], who investigated the influence of including spatial correlation in the regression analysis on the ground-motion models fitted using a two-stage regression algorithm and a one-stage algorithm of Joyner and Boore [1993]. They observed that the differences in the estimated ground-motion model coefficients (used for predicting the median intensity) obtained with and without the incorporation of spatial correlation were insignificant. They did not, however, investigate the impact on the variances predicted by the ground-motion models in detail.

Contributions

In the current study, statistical tests are used to verify the commonly-used assumptions of univariate normality of logarithmic intensities and multivariate normality of spatially-distributed logarithmic intensities. Further, observed and simulated ground-motion time histories are used to estimate the spatial correlation between intra-event residuals, which can be used to parameterize the joint distribution of the ground-motion intensities. Factors that affect the rate of decay of the correlation with separation distance are studied. Probable explanations for the differing correlation estimates reported in the literature are provided. Finally, the importance of considering spatial correlation in lifeline risk assessments is illustrated.

The study also investigates the impact of incorporating spatial correlation on the ground-motion model coefficients and on the variance of the predicted intensities. The commonly-used mixed-effects regression algorithm of Abrahamson and Youngs [1992] is modified to account for the spatial correlation. This modified algorithm is then used to refit a sample ground-motion model (the Campbell and Bozorgnia [2008] model) in order to study the impact of incorporating spatial correlation on ground-motion models and subsequently, on the lifeline risk estimates.

Additionally, the techniques described above for quantifying the seismic hazard over a region can be extended to other types of hazard and multi-hazard scenarios. This study investigates extension of the regional seismic risk framework to regional hurricane hazard modeling. Multi-site hurricane wind hazard assessment involves the simulation of possible



hurricane tracks (i.e., the point of origin, path and other properties such as the central pressure and velocity) and the prediction of the wind fields (peak wind speeds at all the sites of interest) associated with each track [Lee and Rosowsky, 2007, Vickery et al., 2009b, Legg et al., 2010]. This is analogous to the simulation of earthquake events and the prediction of associated ground-motion fields in the seismic hazard assessment framework. Most prediction models developed in the past for predicting hurricane wind fields are deterministic, however, and the uncertainties in wind fields have been rarely analyzed. To the author’s knowledge, the spatial correlation in hurricane wind fields has not been studied in the literature. The current study focuses on quantifying the uncertainties and the spatial correlation in hurricane wind fields (using the same techniques that were used to quantify these parameters in earthquake ground-motion fields), and evaluating their impact on the hurricane risk of spatially-distributed systems. Hurricane wind-speed predictions are obtained for two sample hurricanes, Hurricane Jeanne and Hurricane Frances, using the Batts et al. [1980] wind-speed model, and the uncertainties in these predictions are evaluated using actual wind-speed recordings. For instance, Figure 1.3a shows the track of the 2004 Hurricane Jeanne [Landsea et al., 2004] and the observed maximum wind speeds (maximum sustained one minute wind speed at 10 meter height) during the hurricane [Powell et al., 1998]. Figure 1.3b shows the corresponding wind speeds predicted by Batts et al. [1980] wind-speed model. Figure 1.3c shows the total residuals computed from the observed and the predicted wind speeds. The smoothness of the residuals in Figure 1.3c indicates the presence of spatial correlation between the residuals. This spatial correlation structure is estimated and modeled using geostatistical tools.
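One geostatistical tool commonly used to describe such a spatial correlation structure is the experimental semivariogram. The Python sketch below shows the classical method-of-moments estimator, in which the semivariogram at a given separation is half the average squared difference between residuals at site pairs separated by roughly that distance. It assumes planar site coordinates and simple distance bins; the function name `empirical_semivariogram`, the binning scheme and the toy data are illustrative assumptions and not the procedure used in this thesis.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Classical (method-of-moments) experimental semivariogram.

    coords    : (n, 2) planar site coordinates, e.g. in km (assumed)
    values    : (n,) residuals observed at those sites
    bin_edges : increasing separation-distance bin edges
    Returns one semivariogram value per bin (NaN for empty bins).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # All unordered site pairs
    i, j = np.triu_indices(len(values), k=1)
    h = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq_diff = 0.5 * (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        in_bin = (h >= bin_edges[k]) & (h < bin_edges[k + 1])
        if in_bin.any():
            gamma[k] = half_sq_diff[in_bin].mean()
    return gamma

# Toy example: three collinear sites with made-up residuals
gamma = empirical_semivariogram(
    coords=[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
    values=[0.0, 1.0, 3.0],
    bin_edges=[0.5, 1.5, 2.5, 3.5],
)
```

A parametric model (for example, an exponential model) would typically be fitted to the binned values afterwards; that step is omitted here.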

1.2.2 Lifeline risk assessment

Challenges

Lifelines are complex infrastructure systems with a large number of components. (For instance, there are over 1,000 bridges in the San Francisco Bay Area transportation network model used later in this thesis.) Estimating the performance of a lifeline during an earthquake scenario is often extremely computationally intensive. One important challenge in lifeline risk assessment is to devise methods to handle this large computational demand



Figure 1.3: 2004 Hurricane Jeanne (the line indicates the hurricane track): (a) recorded wind speeds (b) wind speeds predicted by the Batts et al. [1980] wind-speed model (c) wind-speed residuals.



[Der Kiureghian, 2009]. In the past, researchers have developed and used several techniques for lifeline risk assessment. Numerical integration-based techniques are used in some special cases where the components of the lifeline do not interact with each other. This is done, for instance, while evaluating the exceedance rates of monetary losses associated with structural damage to bridges in a transportation network [Stergiou and Kiremidjian, 2006]. This also arises in other situations involving spatially-distributed systems such as the evaluation of the exceedance rates of monetary losses to a portfolio of buildings [Wesson et al., 2009]. These works, however, ignore the spatial correlation between the ground-motion intensities in order to facilitate the use of numerical integration. Some research works use simplified lifeline performance measures in order to reduce the computational demand. Basoz and Kiremidjian [1996], Dueñas-Osorio et al. [2005], Kang et al. [2008] and Bensi et al. [2009b] use connectivity between the nodes of a lifeline (e.g., transportation network connectivity between a city and a hospital) as a measure of network performance. Not only does the use of a simplified connectivity-based measure (instead of a flow-based measure such as the travel-time delay in a transportation network) reduce the time required to evaluate the network performance under various earthquake scenarios, it enables the use of computationally-efficient analytical techniques such as the matrix-based system reliability (MSR) method of Kang et al. [2008] to evaluate the lifeline risk. It is to be noted that the MSR method can be extended to problems using flow-based performance measures, but is computationally expensive in such cases [Kang et al., 2008]. 
On account of the above-mentioned complications involved in modeling the hazard and the lifeline performance (particularly while using flow-based measures), many past research works use MCS-based approaches instead of analytical approaches for lifeline risk assessment [e.g., Werner et al., 2000, Chang et al., 2000, Campbell and Seligson, 2003, Crowley and Bommer, 2006, Kiremidjian et al., 2007, Shiraki et al., 2007]. One simple MCS-based approach used in the past involves studying the performance of lifelines under those earthquake scenarios that dominate the hazard in the region of interest [e.g., Adachi and Ellingwood, 2008, Kiremidjian et al., 2007, Dueñas-Osorio et al., 2005]. While this approach is more tractable, it does not adequately capture all the seismic hazard uncertainties.



A more comprehensive approach uses MCS to probabilistically generate ground-motion intensity maps, considering all possible earthquake scenarios that could occur in the region, and then uses these maps for the risk assessment [Crowley and Bommer, 2006]. Sample scenarios are probabilistically generated by first estimating the median intensities due to a particular earthquake using a ground-motion model, and by subsequently combining the median intensities with simulated values of residuals. Figure 1.4, for instance, illustrates the MCS of ground-motion intensities for a magnitude 8 earthquake on the San Andreas fault. The most basic form of MCS, the conventional MCS, is computationally inefficient because large-magnitude earthquakes and above-average ground-motion intensities are considerably more important to lifeline risks than small-magnitude earthquakes and small ground-motion intensities, yet they are infrequently sampled in conventional MCS. Kiremidjian et al. [2007] improved the MCS process by preferentially simulating large-magnitude events using importance sampling (IS). Werner et al. [2004] also implemented variance-reduction techniques in the software package REDARS (Risks from Earthquake Damage to Roadway Systems) in order to simulate fewer earthquakes. Chang et al. [2000] used an MCS-based approach to estimate earthquake-induced delays in a transportation network. They generated a catalog of 47 earthquakes and corresponding intensity maps for the Los Angeles area and assigned probabilities to these earthquakes such that the site hazard curves obtained using this catalog match the known local site hazard curves obtained from PSHA. In other words, the probabilities of the scenario earthquakes were chosen to make the catalog hazard consistent.
Only median PGAs were used to produce the ground-motion intensity maps corresponding to the scenario earthquakes, however, and variability about these medians was ignored, which can bias the resulting risk estimates [e.g., Grossi and Kunreuther, 2005]. While this approach is highly computationally efficient on account of the use of a small catalog of earthquakes, the selection of earthquakes is a somewhat subjective process, and the assignment of probabilities is based on hazard consistency rather than on actual event likelihoods. Campbell and Seligson [2003] proposed a more quantitative procedure to develop the hazard consistent scenarios, but the rest of the drawbacks were not resolved. Recently, Guikema [2009] proposed that the lifeline performance evaluations can be expedited by using an approximate regression relationship between the lifeline performance



Figure 1.4: Ground-motion intensity simulation for a magnitude 8 earthquake on the San Andreas fault: (a) median intensities obtained using the Boore and Atkinson [2008] ground-motion model; (b) simulated values of the normalized total residuals; (c) total intensities.



and the predictive hazard variables (e.g., ground-motion intensities at component locations) obtained using a statistical learning technique. He, however, did not provide any risk assessment examples. Another recent work is that of Bensi et al. [2009a], who explored the use of Bayesian network models, particularly for post-earthquake lifeline risk assessment. The computational feasibility of this approach, particularly while estimating the risk of large lifelines, needs further investigation.

Contributions

The current study develops a computationally-efficient lifeline risk assessment framework based on efficient sampling and data reduction techniques. The framework can be used for developing a small, but stochastically representative, catalog of spatially-correlated ground-motion intensity maps that can be used for performing lifeline risk assessments. This technique is seen to reduce the computational demand of complex risk assessments by more than three orders of magnitude, without compromising the accuracy of the risk estimates. The proposed framework is used to evaluate the exceedance rates of various travel-time delays on an aggregated (higher-scale) model of the San Francisco Bay Area transportation network. Lifeline risk deaggregation calculations are used to illustrate the need to consider uncertainties in the lifeline risk assessment process. Finally, the study also explores the use of a statistical learning technique called multivariate adaptive regression trees in order to expedite lifeline performance evaluation.

1.3

Organization

This thesis addresses several important issues related to the risk assessment of lifelines. Chapters 2, 3 and 4 deal with the joint distribution of spatially-distributed ground-motion intensities. Chapter 5 discusses systematic approaches to probabilistically sampling ground motion intensity fields. Chapters 6 and 7 present new computationally-efficient lifeline risk assessment techniques. Chapter 8 discusses the impact of considering spatial correlation on ground-motion models and subsequently, on lifeline seismic risk. Chapter 9 explores extending the probabilistic framework used for the seismic risk assessment of lifelines to



hurricane lifeline risk assessment. Chapter 2 deals with the important issue of quantifying the joint distribution of spectral accelerations, which is required for the risk assessment of lifelines. The chapter discusses statistical tests that are used to examine the commonly-used assumptions of univariate normality of logarithmic spectral acceleration values and multivariate normality of vectors of logarithmic spectral acceleration values computed at different sites and/or different periods. The statistical hypothesis tests carried out in this work indicate that these assumptions are reasonable. Chapter 3 presents a new spatial correlation model for spectral accelerations at a single period (and the related Appendix A describes the estimation of cross-correlations between spectral accelerations at two different periods), developed using recorded earthquake time histories. The correlation is expressed as a function of the site separation distance, the spectral acceleration period and the local soil conditions. The correlations predicted by the model, along with the means and the variances provided by the ground-motion models, can be used to completely parameterize the joint distribution of spatial spectral acceleration fields, which is necessary for lifeline risk calculations. Chapter 4 investigates the validity of commonly-used assumptions in spatial correlation models such as stationarity (invariance of correlation with spatial location) and isotropy (directional independence). Testing these assumptions, however, requires a large number of ground-motion time histories. Since real data are sparse, this chapter uses simulated ground-motion time histories instead. The chapter also takes advantage of the large simulated ground-motion database to carry out tests to identify whether the correlations between pulse-like ground motions that arise due to directivity effects are different from the correlations between non-pulse-like ground motions. 
Overall, this chapter tests and provides a basis for some of the subtle assumptions commonly used in spatial correlation models. Chapter 5 discusses techniques for simulating ground-motion intensity maps with and without the consideration of recorded ground-motion intensities. A ground-motion intensity map is generated by combining median intensity predictions from ground-motion models with realizations of inter-event and intra-event residuals that account for the uncertainty in the intensities. Intra-event residuals can be simulated as a correlated vector (using the correlation model presented in Chapter 3) of multivariate normal random variables, and the



inter-event residual can be simulated as a univariate Gaussian random variable (based on the discussion in Chapter 2). The chapter discusses two MCS techniques, termed single-step simulation and sequential simulation, for generating residuals in the absence of recorded ground-motion intensities. While both procedures are theoretically equivalent, it is possible to reduce computational expense by using the sequential simulation technique. The chapter also describes a sequential simulation technique for simulating residuals incorporating knowledge about recorded ground-motion intensities. This is useful for post-earthquake damage assessment and for determining optimal emergency response strategies. Chapter 6 presents a novel computationally-efficient MCS procedure, based on importance sampling and K-means clustering, that can be used for the seismic risk assessment of lifelines. The framework can be used for developing a small, but stochastically-representative, catalog of ground-motion intensity maps that can be used for performing lifeline risk assessments. The importance sampling technique is used to preferentially sample important ground-motion intensity maps (using the MCS techniques discussed in Chapter 5), and the K-means clustering technique is used to identify and combine redundant maps. It is shown theoretically and empirically that the risk estimates obtained using these techniques are unbiased. The proposed framework is used to compute the exceedance rates of travel-time delays (the chosen performance measure) on an aggregated form (coarse-scale model) of the San Francisco Bay Area transportation network. The exceedance rates of travel-time delays are obtained using a catalog of only 150 maps, and are shown to be in good agreement with those obtained using the conventional MCS method.
The proposed method is three orders of magnitude faster (computationally) than the conventional MCS, and therefore will potentially facilitate computationally intensive risk analysis of lifelines, with full consideration of the uncertainties and the spatial correlation in ground-motion intensity fields. The related Appendix C uses lifeline risk deaggregation calculations to illustrate the need to consider these uncertainties in the lifeline risk assessment process. Chapter 7 explores the use of statistical learning techniques to reduce the computational expense of the lifeline risk assessment problem. MCS and its variants are generally well suited for characterizing ground motions and computing resulting losses to lifelines. MCS-based methods are, however, highly computationally intensive, primarily because



they involve repeated evaluations of lifeline performance under a large number of simulated ground-motion intensity maps. In this study, a non-parametric statistical learning technique termed Multivariate Adaptive Regression Trees (MART) is used to obtain an approximate relationship between the ground-motion intensities at lifeline component locations and the lifeline performance. Non-parametric regression is used in place of classical regression since the number of predictor variables (ground-motion intensities at the component locations) far exceeds the number of available training data points. The lifeline performance predicted by this relationship can potentially be used in place of the actual lifeline performance (the evaluation of which is intensive) to expedite the computation of several lifeline risk-related parameters. The study illustrates this by developing a MART-based relationship between the ground-motion intensities at bridge locations and the network travel times in the San Francisco Bay Area transportation network, and using it for estimating confidence intervals for the risk estimates presented in Chapter 6. More generally, these approximate performance relationships can be used in several problems such as prioritizing lifeline retrofits, whose computational demand stems from the need for repeated performance evaluations. Even though the risk assessment framework described in Chapter 6 facilitates the consideration of spatial correlation between ground-motion intensities, current ground-motion models (e.g., NGA ground-motion models) that are used to predict the distribution of ground-motion intensities at individual sites are fitted assuming independence between the intra-event residuals. 
Chapter 8 proposes a method to consider the spatial correlation (discussed in Chapter 3) in the mixed-effects regression procedure used for fitting ground-motion models, and illustrates the impact of considering spatial correlation on the means and the variances predicted by the ground-motion models. It is shown using an illustrative example that the risk estimates of spatially-distributed systems can be inaccurate while using ground-motion models fitted without the consideration of spatial correlation. Frameworks for the risk assessment of structures and infrastructure systems under a variety of natural and man-made hazards share many similarities. It is conceivable, therefore, that the techniques developed for the risk assessment under one type of natural or man-made hazard will be applicable for the risk assessment under another hazard or multi-hazard scenario. Chapter 9 describes an exploratory study carried out to investigate the



extension of the seismic hazard and risk assessment concepts and techniques discussed in the earlier chapters to hurricane hazard and risk modeling. The study focuses on quantifying the uncertainties and the spatial correlation in hurricane wind fields (using techniques that are used to quantify these parameters in earthquake ground-motion fields), and evaluating their impact on the hurricane risk of spatially-distributed systems. Finally, Chapter 10 summarizes the important contributions and findings of this thesis, and discusses future extensions of this research. The chapters of this thesis are designed to be largely self-contained because they have been or will be published as individual journal articles. Because of this, there is some repetition of background material. In addition, notational conventions were chosen to be simple and clear for the topic of each chapter rather than for the thesis as a whole; because of this, the notational conventions may not be identical for each chapter. Apologies are made for any distraction this causes when reading the thesis as a continuous document.

Chapter 2

Statistical Tests of the Joint Distribution of Spectral Acceleration Values

N. Jayaram and J.W. Baker (2008). Statistical tests of the joint distribution of spectral acceleration values, Bulletin of the Seismological Society of America, 98(5), 2231-2243.

2.1

Abstract

Assessment of seismic hazard using conventional probabilistic seismic hazard analysis (PSHA) typically involves the assumption that the logarithmic spectral acceleration values follow a normal distribution marginally. There are, however, a variety of cases in which a vector of ground-motion intensity measures is considered for seismic hazard analysis. In such cases, assumptions regarding the joint distribution of the ground-motion intensity measures are required for analysis. In this article, statistical tests are used to examine the assumption of univariate normality of logarithmic spectral acceleration values and to verify that vectors of logarithmic spectral acceleration values computed at different sites and/or different periods follow a multivariate normal distribution. Multivariate normality of logarithmic spectral accelerations is verified by testing the multivariate normality of inter-event and intra-event residuals obtained from ground-motion models. The univariate normality tests indicate that both inter-event and intra-event residuals




can be well represented by normal distributions marginally. No evidence is found to support truncation of the normal distribution, as is sometimes done in PSHA. The tests for multivariate normality show that inter-event and intra-event residuals at a site, computed at different periods, follow multivariate normal distributions. It is also seen that spatially-distributed intra-event residuals can be well represented by the multivariate normal distribution. This study provides a sound statistical basis for assumptions regarding the marginal and the joint distribution of ground-motion parameters that must be made for a variety of seismic hazard calculations.

2.2

Introduction

Spectral acceleration values of earthquake ground motions are widely used in seismic hazard analysis. Conventional probabilistic seismic hazard analysis (PSHA) [e.g., Kramer, 1996] provides a framework for the probabilistic assessment of a single ground-motion parameter (such as the spectral acceleration computed at a single period). When implementing PSHA, it is typically assumed that the spectral acceleration follows a lognormal distribution marginally. There are, however, cases in which knowledge about the joint occurrence of several spectral acceleration values, corresponding to different periods, is required for hazard assessment [Bazzurro and Cornell, 2002]. Additionally, a single earthquake can cause severe damage over a large area. Hence, when assessing the impact of earthquakes on a portfolio of structures or a spatially-distributed infrastructure system, it is necessary to study the joint occurrence of spectral acceleration values at various sites in the region [Crowley and Bommer, 2006]. Moreover, the knowledge of a vector of ground-motion intensity measures is useful in other practical applications that involve computation of the seismic response of a structure dominated by more than one mode [Shome and Cornell, 1999, Vamvatsikos and Cornell, 2005], or that involve joint prediction of structural and non-structural seismic responses for loss estimation purposes, and prediction of multiple demand parameters such as displacement and hysteretic energy. In such cases, a vector of intensity measures needs to be considered and hence, it is necessary to study the joint distribution of these intensity measures in observed ground motions.



Various empirical ground-motion models have been developed for estimating the response spectrum of a given ground motion [e.g., Campbell and Bozorgnia, 2008, Boore and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008]. A typical ground-motion model has the form:

$$\ln(Y) = \ln(\bar{Y}) + \varepsilon + \eta \tag{2.1}$$

where Y denotes the ground-motion parameter of interest (e.g., Sa(T1), the spectral acceleration at period T1); Ȳ denotes the predicted (by the ground-motion model) median value of the ground-motion parameter (which depends on parameters such as magnitude, distance, period and local soil conditions); ε denotes the intra-event residual, which is a random variable with zero mean and a standard deviation of σ; and η denotes the inter-event residual, which is a random variable with zero mean and a standard deviation of τ. The standard deviations, σ and τ, are estimated during the derivation of the ground-motion model and are a function of the response period, and in some models a function of earthquake magnitude and distance from the rupture. Normalized intra-event residuals (ε̃) are obtained by dividing ε by σ. Similarly, η can be normalized using τ to obtain η̃. The logarithmic spectral acceleration at a site due to an earthquake is usually assumed to be well represented by the normal distribution marginally [e.g., Kramer, 1996]. Abrahamson [1988] performed rigorous statistical studies to verify the assumption that logarithmic peak ground acceleration (PGA) values follow the normal distribution marginally. Such rigorous studies have, however, not been performed on spectral accelerations. Moreover, the assumption of normality must be extended to the joint distribution of the logarithmic spectral accelerations, when performing vector-valued seismic hazard analysis [Bazzurro and Cornell, 2002, Baker and Cornell, 2006]. When multiple ground-motion parameters are considered (for instance, Y1 and Y2), the ground-motion model equations take the following form:

$$\begin{aligned} \ln(Y_1) &= \ln(\bar{Y}_1) + \varepsilon_1 + \eta_1 \\ \ln(Y_2) &= \ln(\bar{Y}_2) + \varepsilon_2 + \eta_2 \end{aligned} \tag{2.2}$$



where Ȳ1 and Ȳ2 denote the predicted median values of the ground-motion parameters; ε1 and ε2 denote the intra-event residuals corresponding to the two parameters; η1 and η2 denote the inter-event residuals (η1 equals η2 if Y1 and Y2 denote Sa(T) at two sites during the same earthquake). If Y1 and Y2 are spectral accelerations at two closely-spaced sites or spectral accelerations at two different periods at the same site, the residuals will not be independent [Baker and Jayaram, 2008, Baker and Cornell, 2006]. For such dependent residuals, univariate normality of each residual does not necessarily imply joint normality of the residuals. There is a paucity of research work that examines the validity of assuming multivariate normality. This chapter explores the validity of these assumptions using statistical tests for univariate and multivariate normality, and a large library of spectral acceleration values from recorded ground motions. The ground-motion model of Campbell and Bozorgnia [2008] is used in this study to compute the parameters shown in equations 2.1 and 2.2. The conclusions drawn from the work, however, did not change when the Boore and Atkinson [2008] ground-motion model was used instead. The spectral acceleration definition typically used in the NGA ground-motion models is ‘GMRotI50’ (also known as ‘GMRotI’). This is the 50th percentile of the set of geometric means of spectral accelerations at a given period, obtained by rotating the as-recorded orthogonal horizontal motions through all possible non-redundant rotation angles [Boore et al., 2006]. The residuals used in this work are obtained based on this definition of the spectral acceleration. The data for the analysis are obtained from the PEER NGA Database [2005]. In order to exclude records whose characteristics differ from those used by the ground-motion modelers for data analysis, only records used by the ground-motion model authors are considered in the tests for normality.
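The dependence structure implied by equation 2.2 can be illustrated with a small simulation. The following is a minimal sketch, not part of the original study: it assumes normally distributed residuals, and the values of σ, τ and the intra-event correlation ρ are illustrative assumptions rather than values from any ground-motion model. All function names are hypothetical. Under these assumptions, the correlation between the total residuals at two sites during the same earthquake is (ρσ² + τ²)/(σ² + τ²), so the shared inter-event residual alone induces dependence even when ρ = 0.

```python
import math
import random


def simulate_total_residuals(n_events, sigma, tau, rho, seed=0):
    """Simulate total residuals (epsilon + eta) at two sites for n_events earthquakes.

    sigma, tau and rho are illustrative inputs: intra-event standard deviation,
    inter-event standard deviation, and intra-event correlation between the sites.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_events):
        # One inter-event residual per earthquake, shared by both sites.
        eta = rng.gauss(0.0, tau)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps1 = sigma * z1
        # eps2 has standard deviation sigma and correlation rho with eps1.
        eps2 = sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((eps1 + eta, eps2 + eta))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (a, b) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs)
    v1 = sum((a - m1) ** 2 for a, _ in pairs)
    v2 = sum((b - m2) ** 2 for _, b in pairs)
    return cov / math.sqrt(v1 * v2)


def theoretical_correlation(sigma, tau, rho):
    """Correlation of the total residuals implied by the assumptions above."""
    return (rho * sigma ** 2 + tau ** 2) / (sigma ** 2 + tau ** 2)


if __name__ == "__main__":
    sigma, tau, rho = 0.5, 0.3, 0.4  # assumed values, for illustration only
    pairs = simulate_total_residuals(20000, sigma, tau, rho)
    print(sample_correlation(pairs), theoretical_correlation(sigma, tau, rho))
```

The sketch generates jointly normal residuals by construction, so it illustrates the dependence only; it says nothing about whether observed residuals are jointly normal, which is the question the tests in this chapter address.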

2.3

Testing the univariate normality of residuals

This section discusses tests performed on the assumption that logarithmic spectral accelerations at a site due to a given earthquake are well represented by the normal distribution, marginally. A practical way to test the univariate normality of a data set is to inspect the normal Q-Q plot obtained from the data set by plotting the quantiles of the data sample



against the corresponding quantiles of the theoretical normal distribution [e.g., Johnson and Wichern, 2007]. The following steps are involved in the construction of a normal Q-Q plot. Let x be a collection of n data values that need to be tested for normality. The data set is ordered (sorted in ascending order) to obtain x(1), x(2), ..., x(n) (such that x(1) ≤ x(2) ≤ · · · ≤ x(n)). When these sample quantiles x(k) are distinct (which is a reasonable assumption for continuously varying data), exactly k observations are less than or equal to x(k). The cumulative probability p(k) of each x(k) can be computed as k/n. It has been shown, however, that a continuity correction gives an improved p(k) estimate of (k − 3/8)/(n + 1/4) [Johnson and Wichern, 2007] and hence, this definition of p(k) is used in this work. The normal Q-Q plot is obtained by plotting the ordered data samples against the theoretical normal quantiles corresponding to each of the probabilities p(k). The theoretical normal quantile corresponding to probability p(k) is obtained as Φ⁻¹(p(k)), where Φ⁻¹ denotes the inverse of the cumulative normal distribution with the mean and the variance equaling the sample mean and the sample variance respectively. If the data sample follows a normal distribution, the normal Q-Q plot will form a straight line with a slope of 45°, passing through the origin.
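The construction steps described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `normal_qq_points` is hypothetical, and the reference distribution uses the sample mean and sample standard deviation as described in the text.

```python
from statistics import NormalDist, fmean, stdev


def normal_qq_points(data):
    """Return (theoretical_quantile, ordered_sample) pairs for a normal Q-Q plot."""
    ordered = sorted(data)  # x(1) <= x(2) <= ... <= x(n)
    n = len(ordered)
    # Reference normal distribution fitted with the sample mean and sample variance.
    reference = NormalDist(mu=fmean(ordered), sigma=stdev(ordered))
    points = []
    for k, x_k in enumerate(ordered, start=1):
        p_k = (k - 3.0 / 8.0) / (n + 1.0 / 4.0)  # continuity-corrected probability
        points.append((reference.inv_cdf(p_k), x_k))
    return points


if __name__ == "__main__":
    for q, x in normal_qq_points([-2.0, -1.0, 0.0, 1.0, 2.0]):
        print(round(q, 3), x)
```

Plotting the second element of each pair against the first gives the Q-Q plot; points close to the 45° line suggest that the normal distribution is a reasonable description of the sample.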

2.3.1

Results and discussion

Normality tests are performed on intra-event and inter-event residuals in order to verify the univariate normality of logarithmic spectral accelerations at a site due to an earthquake. The intra-event and the inter-event residuals provided to the authors by the ground-motion model authors are used in the normality tests.

Intra-event residuals

This section discusses results of the univariate standard normality tests performed on the normalized intra-event residuals (ε̃). As mentioned previously, ε̃ values are obtained by dividing the intra-event residuals (ε’s) by the standard deviations (σ’s) provided by the Campbell and Bozorgnia [2008] model. Figure 2.1 shows the normal Q-Q plots of ε̃ computed at four different periods ranging between 0.5 seconds and 10 seconds, with the theoretical quantiles derived from the



standard normal distribution (normal distribution with zero mean and unit variance). Long periods such as 10 seconds may not be used in practice as often as short periods. These long periods are considered in this work, however, in order to cover the entire range of periods in which the ground-motion model used is applicable. Also shown in the figures are 45° lines passing through the origin. Deviation of the normal Q-Q plot from the 45° line indicates deviation from standard normality. It can be seen from Figure 2.1 that the normal Q-Q plots match reasonably well with the 45° lines in all the four cases. This indicates that ε̃ can be considered to be univariate standard normal based on this data set. Note that while normality of ε̃ is assumed in PSHA, it is often assumed that the distribution is truncated. A typical decision would be to truncate the distribution at ε̃ = 2 or 3, and not allow any larger ε̃ values [Bommer and Abrahamson, 2006]. The tail of the marginal distribution needs to be studied in order to determine if this truncation of the normal distribution is reasonable. Figure 2.1 shows that ε̃ values larger than 2 are observed as often as would be expected from a non-truncated distribution. With the small data sets used, however, it is not possible to study the tail distribution beyond ε̃ = 3. A technique to obtain a larger number of samples at the tail of the distribution would be to pool the ε̃ values computed at different periods. The normalized residuals computed at various periods are shown to follow a standard normal distribution using the normal Q-Q plots in Figure 2.1. Hence, it can be inferred that quantiles of the pooled data set will match with the corresponding quantiles of a theoretical standard normal distribution. The pooled set has a larger number of data points in the tail and hence, it is preferable to study the tail properties using the pooled data set rather than the individual data sets. Hence, 12,194 ε̃ values computed at 10 periods ranging from 0.5 to 10 seconds are pooled together. The histogram of the pooled data set is shown in Figure 2.2 along with a scaled plot of the theoretical standard normal distribution. The figure shows that the data are in excellent agreement with the standard normal distribution, as expected based on the normal Q-Q plots shown in Figure 2.1. The normal Q-Q plot for the pooled data set is shown in Figure 2.3. It can be seen that the quantiles from the observed data match reasonably well with the theoretical quantiles up to ε̃ values of 3.5 or 4. Beyond ε̃ = ±4, there is no longer enough data to study possible truncation. This large data set thus contradicts claims that an ε̃ truncation at less than 4 is reasonable, and provides no evidence to support truncation



Figure 2.1: The normal Q-Q plots of the normalized intra-event residuals at four different periods. (a) T = 0.5 seconds (1560 samples) (b) T = 1.0 seconds (1548 samples) (c) T = 2.0 seconds (1498 samples) (d) T = 10.0 seconds (507 samples).



at a larger value. This is consistent with the findings of other researchers examining large data sets [Strasser et al., 2008, Abrahamson, 2006, Bommer et al., 2004].

Inter-event residuals

According to the ground-motion model of Campbell and Bozorgnia [2008], the standard deviation of the inter-event residuals (η) depends on the rock PGA at the sites. As a result, while the η values computed at any particular period are identical across all the sites during a given earthquake, the normalized inter-event residuals (η̃) vary across sites even during a single earthquake (because the standard deviation, τ, with which they are normalized varies from site to site). This makes it impossible to use η̃ for the normality study. It is seen, however, using the records in the PEER NGA Database [2005] that over 90% of the standard deviations of η’s (obtained using the ground-motion model of Campbell and Bozorgnia [2008]) lie within a reasonably narrow interval (with an approximate range of 0.04). Hence, homoscedasticity (i.e., constant variance) of η is considered to be reasonable and so the η values are used as such, without normalization. Figure 2.4 shows the normal Q-Q plots obtained using the η values corresponding to four different periods. The theoretical quantiles are obtained using a normal distribution with zero mean and a standard deviation that equals the sample standard deviation (which does not equal one since the η values are not normalized). It is seen from Figures 2.4a-d that the normal Q-Q plots match reasonably well with the 45° straight lines, thereby indicating the univariate normality of inter-event residuals.
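Returning to the truncation question for the pooled intra-event residuals, a short calculation shows why a sample of 12,194 values can resolve the tail up to roughly ε̃ = ±3.5 or ±4 but not beyond. The sketch below is an illustration added here, not a computation from the original study: it computes the number of values expected to exceed a threshold in absolute value if the sample were drawn independently from a non-truncated standard normal distribution. The function name is hypothetical, and independence of the pooled values is an assumption made only for this illustration.

```python
from statistics import NormalDist


def expected_exceedances(n_samples, threshold):
    """Expected number of samples with |z| > threshold under a standard normal distribution."""
    tail = 1.0 - NormalDist().cdf(threshold)  # one-sided tail probability
    return n_samples * 2.0 * tail             # both tails


if __name__ == "__main__":
    n_pooled = 12194  # size of the pooled data set reported in the text
    for threshold in (2.0, 3.0, 4.0):
        print(threshold, round(expected_exceedances(n_pooled, threshold), 2))
```

Under these assumptions, about 33 values are expected beyond ±3 but fewer than one beyond ±4, which is consistent with the statement that the data cannot inform a truncation level above 4.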

2.4

Testing the assumption of multivariate normality for random vectors using independent samples

In this section, several statistical tests are presented that can be used with observed ground-motion data to test the validity of the assumed multivariate normal distribution for logarithmic spectral accelerations. A given ground motion will have spectral acceleration values that vary stochastically as a function of period. Hence, for any d periods, T = [T1, T2, ..., Td], let the corresponding



Figure 2.2: The histogram of the 12,194 pooled normalized intra-event residuals computed at 10 periods, with the theoretical standard normal distribution (scaled) superimposed.



values of spectral acceleration at the sites be denoted by $S_a^j(T_i)$, where j is an index that denotes a given recording, while $T_i$ indicates a particular period. The mathematical procedures explained in this section can be used to test whether the random vectors of logarithmic spectral accelerations, $[\ln(S_a(T_1)), \ln(S_a(T_2)), \cdots, \ln(S_a(T_d))]$, are jointly normal. Testing for multivariate normality is much more complex than testing for univariate normality since there are many more properties in a multivariate distribution to be considered during the test. Among the many possible tests for multivariate normality of a given data set, eight are reviewed in detail by Mecklin and Mundfrom [2003]. They examined the power of the eight tests using a Monte Carlo study for several data sets that had pre-determined multivariate distributions. They recommend the use of the Henze-Zirkler test [Henze and Zirkler, 1990] as a formal test of multivariate normality, complemented by other test procedures such as Mardia’s skewness and kurtosis tests [Mardia, 1970]. Multivariate normality can also be tested using the Chi-square plot (also known as the gamma plot) [Johnson and Wichern, 2007], which is a multivariate equivalent of the normal Q-Q plot. The procedure to obtain the Chi-square plot is similar to that used for a normal Q-Q plot except that squared Mahalanobis distances [Mardia et al., 1979] of data samples are used in place of the data quantiles and a theoretical Chi-square distribution is used in place of the theoretical normal distribution. A departure from linearity indicates departure from multivariate normality. In this work, however, only the three more quantitative tests, namely, the Henze-Zirkler test and Mardia’s tests of skewness and of kurtosis, are used. These three tests are described in the following paragraphs.

2.4.1

Henze-Zirkler test

Henze and Zirkler [1990] proposed a class of invariant consistent tests for testing multivariate normality. The test procedure is based on the computation of a defined test statistic which is a function of the given data and whose asymptotic distribution is known if the data follows a multivariate normal distribution. The statistic can be compared to the asymptotic distribution to test whether the data set can be reasonably assumed to be normal. The Henze-Zirkler test statistic is defined as follows: Let X1 , X2 , ..., Xn be a set of n independent data samples (i.e., the X1 , X2 , ..., Xn are obtained from n independent records) each of


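dimension d (i.e., Xi = {Xi1, Xi2, ..., Xid}). It is to be noted that the variables $X_{i j_1}$ and $X_{i j_2}$ could be correlated.

$$T_{n,\beta} = \frac{1}{n}\sum_{k=1}^{n}\sum_{j=1}^{n}\exp\left(-\frac{\beta^2}{2}\left\|Y_j - Y_k\right\|^2\right) - 2\left(1+\beta^2\right)^{-d/2}\sum_{j=1}^{n}\exp\left(-\frac{\beta^2}{2\left(1+\beta^2\right)}\left\|Y_j\right\|^2\right) + n\left(1+2\beta^2\right)^{-d/2} \qquad (2.3)$$

where

$$\beta = \frac{1}{\sqrt{2}}\left(\frac{2d+1}{4}\right)^{\frac{1}{d+4}} n^{\frac{1}{d+4}}$$

$$\left\|Y_j - Y_k\right\|^2 = \left(X_j - X_k\right)' S^{-1} \left(X_j - X_k\right)$$

$$\left\|Y_j\right\|^2 = \left(X_j - \bar{X}_n\right)' S^{-1} \left(X_j - \bar{X}_n\right)$$

where $T_{n,\beta}$ is the test statistic; $\bar{X}_n$ is the sample mean vector of the n realizations X1, ..., Xn; and S is the sample covariance matrix defined as $S = \frac{1}{n}\sum_{j=1}^{n}\left(X_j - \bar{X}_n\right)\left(X_j - \bar{X}_n\right)'$.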

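Henze and Zirkler [1990] also approximated the limiting distribution of Tn,β (given the multivariate normality of X) with a lognormal distribution with the mean and the variance defined as follows:

$$E\left[T_\beta\right] = 1 - \left(1+2\beta^2\right)^{-d/2}\left[1 + \frac{d\beta^2}{1+2\beta^2} + \frac{d(d+2)\beta^4}{2\left(1+2\beta^2\right)^2}\right] \qquad (2.4)$$

$$Var\left[T_\beta\right] = 2\left(1+4\beta^2\right)^{-d/2} + 2\left(1+2\beta^2\right)^{-d}\left[1 + \frac{2d\beta^4}{\left(1+2\beta^2\right)^2} + \frac{3d(d+2)\beta^8}{4\left(1+2\beta^2\right)^4}\right] - 4\,w(\beta)^{-d/2}\left[1 + \frac{3d\beta^4}{2w(\beta)} + \frac{d(d+2)\beta^8}{2w(\beta)^2}\right] \qquad (2.5)$$

where $w(\beta) = \left(1+\beta^2\right)\left(1+3\beta^2\right)$.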

Based on the value of the statistic computed using the data and the asymptotic distribution of Tn,β , the p-value of the test of multivariate normality can be calculated. The p-value is the probability of obtaining a statistic value that is at least as extreme as the statistic computed from the data, if the null hypothesis of multivariate normality were true. The



smaller the p-value, the stronger the evidence against the null hypothesis. It is suggested that this test be used if the sample size n is at least 20 [Henze and Zirkler, 1990].
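The Henze-Zirkler procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `henze_zirkler` is hypothetical, NumPy is assumed to be available, and the p-value is taken as the upper-tail probability of the lognormal approximation obtained by matching the mean and variance of Equations 2.4 and 2.5.

```python
import math
from statistics import NormalDist

import numpy as np


def henze_zirkler(x):
    """Return (T, p_value) for an (n, d) array of independent samples.

    Hypothetical helper: follows Equations 2.3-2.5 of the text and
    approximates the null distribution of T with a lognormal.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    centered = x - x.mean(axis=0)
    s = centered.T @ centered / n            # covariance with 1/n, as in the text
    gram = centered @ np.linalg.inv(s) @ centered.T

    dj = np.diag(gram)                       # ||Y_j||^2
    djk = dj[:, None] + dj[None, :] - 2.0 * gram   # ||Y_j - Y_k||^2

    beta = (1.0 / math.sqrt(2.0)) * ((2 * d + 1) * n / 4.0) ** (1.0 / (d + 4))
    b2 = beta ** 2

    # Test statistic, Equation 2.3
    t = (np.exp(-0.5 * b2 * djk).sum() / n
         - 2.0 * (1 + b2) ** (-d / 2) * np.exp(-b2 * dj / (2 * (1 + b2))).sum()
         + n * (1 + 2 * b2) ** (-d / 2))
    t = float(t)

    # Mean and variance under the null, Equations 2.4 and 2.5
    a = 1 + 2 * b2
    w = (1 + b2) * (1 + 3 * b2)
    mean = 1 - a ** (-d / 2) * (1 + d * b2 / a + d * (d + 2) * b2 ** 2 / (2 * a ** 2))
    var = (2 * (1 + 4 * b2) ** (-d / 2)
           + 2 * a ** (-d) * (1 + 2 * d * b2 ** 2 / a ** 2
                              + 3 * d * (d + 2) * b2 ** 4 / (4 * a ** 4))
           - 4 * w ** (-d / 2) * (1 + 3 * d * b2 ** 2 / (2 * w)
                                  + d * (d + 2) * b2 ** 4 / (2 * w ** 2)))

    # Lognormal parameters matched to (mean, var); upper-tail p-value
    sigma2 = math.log(1 + var / mean ** 2)
    mu = math.log(mean) - sigma2 / 2
    z = (math.log(t) - mu) / math.sqrt(sigma2)
    return t, 1.0 - NormalDist().cdf(z)
```

A call such as `henze_zirkler(residuals)`, with `residuals` an array holding one row per independent record and one column per period, returns the statistic and its approximate p-value. Because the statistic depends on the data only through Mahalanobis distances, it is unchanged by any invertible linear rescaling of the columns.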

2.4.2

Mardia’s measures of kurtosis and skewness

Mardia [1970] extended the concepts of kurtosis and skewness from the univariate case to the multivariate case. Mardia [1970] also obtained the asymptotic distribution of the multivariate kurtosis and skewness parameters (which is needed to test the null hypothesis of multivariate normality).

Multivariate kurtosis

Mardia [1970] defined the multivariate kurtosis coefficient as follows:

$$K = E\left[\left\{\left(X - \mu\right)' \Sigma^{-1} \left(X - \mu\right)\right\}^2\right] \qquad (2.6)$$

where X = [X1, X2, ..., Xn] is the random vector whose distribution is tested; µ is the mean vector of X; (X − µ)' refers to the transpose of (X − µ); and Σ is the covariance matrix of X. In practice, the value of multivariate kurtosis can be computed from the sample data as follows:

$$k = \frac{1}{n}\sum_{i=1}^{n}\left[\left(X_i - \bar{X}_n\right)' S^{-1} \left(X_i - \bar{X}_n\right)\right]^2 \qquad (2.7)$$

Mardia [1970] also showed that the asymptotic distribution of the above-defined multivariate kurtosis parameter (k) can be obtained from the following equation, if X follows the multivariate normal distribution:

$$\frac{k - d(d+2)(n-1)/(n+1)}{\left(8d(d+2)/n\right)^{0.5}} \Rightarrow N(0,1) \qquad (2.8)$$

where N(0, 1) denotes the univariate standard normal distribution. The asymptotic distribution can be used to test if the sample data are from a multivariate normally distributed population, by allowing a p-value to be computed.


Multivariate skewness

Mardia [1970] and Mardia et al. [1979] defined the measure of multivariate skewness to be as follows:

$$S = E\left[\left\{\left(X_1 - \mu\right)' \Sigma^{-1} \left(X_2 - \mu\right)\right\}^3\right] \qquad (2.9)$$

where X = [X1, X2, ..., Xn] is the random vector whose distribution is tested. This parameter can be computed from the sample data as follows:

$$s = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left[\left(X_i - \bar{X}_n\right)' S^{-1} \left(X_j - \bar{X}_n\right)\right]^3 \qquad (2.10)$$

The asymptotic distribution of the multivariate skewness parameter (s) can be obtained from the following equation:

$$\frac{ns}{6} \Rightarrow \chi^2_{d(d+1)(d+2)/6} \qquad (2.11)$$

where $\chi^2_{d(d+1)(d+2)/6}$ is the Chi-square distribution with d(d + 1)(d + 2)/6 degrees of freedom. This asymptotic distribution can be used to test the null hypothesis of multivariate normality.

The above procedures can be used to test the multivariate normality of any random vector using a set of independent data samples. For instance, these tests can be used to verify the multivariate normality of intra-event residuals computed at multiple periods. In this case, in order to obtain a set of independent data samples, each random vector (comprising intra-event residuals computed at multiple periods) must be obtained from records that are independent of one another. A technique to obtain independent data samples is discussed in a subsequent section.
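The sample quantities in Equations 2.7, 2.8, 2.10 and 2.11 could be computed along the following lines. This is a minimal sketch, not the authors' code: the function name `mardia_statistics` is hypothetical, NumPy is assumed to be available, the kurtosis p-value is assumed to be two-sided, and the skewness p-value is left to a Chi-square survival function from a statistics library, which is not called here.

```python
import math
from statistics import NormalDist

import numpy as np


def mardia_statistics(x):
    """Mardia's sample kurtosis and skewness for an (n, d) sample array.

    Hypothetical helper. Returns a dict with the kurtosis k (Eq. 2.7),
    its standardized value z (Eq. 2.8) and an assumed two-sided p-value,
    the skewness s (Eq. 2.10), and the statistic n*s/6 with its
    Chi-square degrees of freedom (Eq. 2.11).
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    centered = x - x.mean(axis=0)
    s_cov = centered.T @ centered / n        # covariance with 1/n, as in the text
    g = centered @ np.linalg.inv(s_cov) @ centered.T

    k = float((np.diag(g) ** 2).mean())      # Equation 2.7
    s = float((g ** 3).sum() / n ** 2)       # Equation 2.10

    z = (k - d * (d + 2) * (n - 1) / (n + 1)) / math.sqrt(8 * d * (d + 2) / n)
    p_kurtosis = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

    return {
        "kurtosis": k,
        "kurtosis_z": z,
        "kurtosis_p": p_kurtosis,
        "skewness": s,
        "skewness_stat": n * s / 6.0,        # compare with Chi-square(dof)
        "skewness_dof": d * (d + 1) * (d + 2) // 6,
    }
```

The skewness p-value is the upper-tail probability of `skewness_stat` under a Chi-square distribution with `skewness_dof` degrees of freedom.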

2.4.3


As mentioned earlier, multivariate normality tests need to be performed on intra-event and inter-event residuals in order to verify multivariate normality of the logarithmic spectral accelerations. The intra-event residuals are normalized by the appropriate standard deviations before use, while the inter-event residuals are used without normalization, for reasons


mentioned previously.

Normalized intra-event residuals at different periods

Let ε̃(T) = [ε̃(T1), ε̃(T2), ..., ε̃(Td)] denote the random vector of normalized intra-event residuals computed at d different periods. During an earthquake, different sites experience different levels of ground motion based on their distance from the earthquake source, the local soil conditions and other factors. These ground motions can be used to compute samples (ẽj(T)) of the random vector ε̃(T) at site j. This section uses the samples ẽj(T) obtained at various sites to test whether ε̃(T) follows a multivariate normal distribution. The results presented in this work are based on data from the 1994 Northridge earthquake and the 1999 Chi-Chi earthquake. The PEER NGA Database [2005] is used to obtain the data and contains 160 records from the Northridge earthquake and 421 records from the Chi-Chi earthquake (the aftershock data are not used). From these records, only those used by the authors of the Campbell and Bozorgnia [2008] ground-motion model are included in the analysis. Even this reduced data set cannot be used as such because the samples will not be independent of one another on account of the spatial correlation of the ground motion during a given earthquake. It is known, however, that the correlation between ẽi(Tp) and ẽj(Tp) decreases with increasing separation distance between the sites i and j, where Tp denotes any particular period. It is seen from the literature that the correlation coefficient drops close to zero (i.e., the ε̃(Tp)'s are approximately uncorrelated) when the separation distance exceeds 10 km [Boore et al., 2003]. Moreover, it is shown subsequently in this chapter that the ε̃(Tp)'s obtained at different sites from a single earthquake follow a multivariate normal distribution.
Hence, approximately uncorrelated ε˜ (Tp ) values are also approximately independent, and, therefore, samples of random vectors obtained from recordings at mutually well-separated sites would be approximately independent and can be used in the tests described in the previous section. Therefore, in the current work, well-separated locations (with separation distances exceeding 20km) are identified for the Northridge earthquake and the Chi-Chi earthquake and the tests of normality are performed on the data set obtained by combining the Chi-Chi and the Northridge earthquake data. There are several possible combinations of recordings that would satisfy the constraints on the minimum separation distance and the minimum sample size (as defined in Section 2.4)



and hence the tests are carried out on the various allowable configurations. Though the test results vary slightly based on the configuration used, p-values from only a single data set are reported in this chapter. The combined data set has around 35 records at periods less than or equal to 2 seconds and close to 30 records at periods below 7.5 seconds, which are reasonable sample sizes for testing the hypothesis. At 10 seconds, however, the number of independent samples available is 22, which barely exceeds the threshold of 20, mentioned in Section 2.4. Hence, ε˜ values computed at 10 seconds are not used often in the tests. In order to strictly prove multivariate normality of ε˜ , one must evaluate multivariate normality of normalized residuals having all possible period combinations (i.e., all pairs, triplets etc.). For all practical purposes, however, it is sufficient to consider the joint distribution of ε˜ ’s computed at five periods. Incidentally, if multivariate normality can be established for such a case, it can be inferred that the lower-order combinations (i.e., subsets of the five periods that are used) also follow a multivariate normal distribution and do not have to be tested explicitly. This is because all subsets of a random vector X are multivariate normal if X is multivariate normal [Johnson and Wichern, 2007]. Results from a set of hypothesis test results are shown in Table 2.1 and explained in the following paragraphs. The table shows the set of periods at which the ε˜ values are computed and the p-values obtained based on the Henze-Zirkler test, the Mardia’s test of skewness and the Mardia’s test of kurtosis. Case 1 shown in the table corresponds to the bivariate normality tests on the ε˜ ’s obtained at 1 second and 2 seconds. The p-values reported by all three tests are statistically insignificant at the 5% significance level typically used for testing. In Case 2, five different periods ranging between 0.5 seconds and 2 seconds are chosen. 
The Henze-Zirkler test and the test of skewness report highly insignificant p-values, and the test of kurtosis reports a p-value of 0.05, which is marginally insignificant as well. The normality tests are also performed considering long periods. In Case 3, the periods are chosen over the 0.5-7.5 seconds range, as shown in Table 2.1. The p-values reported by all three tests are highly statistically insignificant. Finally, a test is carried out considering long periods exclusively (Case 4); the p-values obtained from all the tests are also statistically insignificant. Overall, there is little evidence to reject the null hypothesis that ε̃ computed at different periods follows a multivariate normal distribution.
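The selection of mutually well-separated recording sites described earlier in this section admits many valid configurations. One simple way to generate such a configuration is a randomized greedy pass, sketched below. This is an assumption for illustration only; the text does not state how the configurations were generated. The function name `select_well_separated`, the planar coordinates in kilometers and the `seed` argument are all hypothetical.

```python
import math
import random


def select_well_separated(sites, min_sep_km=20.0, seed=0):
    """Greedily pick site indices whose pairwise distances exceed min_sep_km.

    sites: list of (x_km, y_km) planar coordinates (hypothetical input format).
    Different seeds visit the sites in different orders and can therefore
    produce different allowable configurations.
    """
    order = list(range(len(sites)))
    random.Random(seed).shuffle(order)

    chosen = []
    for i in order:
        # Keep site i only if it is far enough from every site kept so far
        if all(math.dist(sites[i], sites[j]) > min_sep_km for j in chosen):
            chosen.append(i)
    return sorted(chosen)
```

Running the function with several seeds and keeping the configurations that meet the minimum sample size gives a set of alternative data sets on which the tests can be repeated.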



Figure 2.3: The normal Q-Q plot of the pooled set of normalized intra-event residuals.

Table 2.1: Tests on normalized intra-event residuals computed at different periods

Case   Periods (secs)               PHZ (a)   PSK (b)   PKT (c)
1      T={1.0,2.0}                  0.10      0.23      0.93
2      T={0.5,0.75,1.0,1.5,2.0}     0.49      0.92      0.05
3      T={0.5,1.0,2.0,5.0,7.5}      0.69      0.90      0.42
4      T={5.0,7.5,10.0}             0.19      0.14      0.62

Explanation of abbreviations used in the table
a PHZ: p-value obtained from the Henze-Zirkler test
b PSK: p-value obtained from Mardia's test of skewness
c PKT: p-value obtained from Mardia's test of kurtosis



Figure 2.4: The normal Q-Q plots of inter-event residuals at four different periods. (a) T = 0.5 seconds (64 samples) (b) T = 1.0 seconds (64 samples) (c) T = 2.0 seconds (62 samples) (d) T = 10.0 seconds (21 samples).



Inter-event residuals at different periods This section discusses tests carried out on inter-event residuals (η) at multiple periods. The number of inter-event residuals available for the tests ranges from 64 at 0.5 seconds to 40 at 7.5 seconds. Only 21 records are available, however, at 10 seconds. Table 2.2 shows the hypothesis test results based on η values. In Case 1, η values at two periods, 1 second and 2 seconds, are tested for bivariate normality. It can be seen that the p-values reported by all three tests are highly insignificant. In Case 2, five different periods are chosen ranging between 0.5 and 2 seconds. The table shows that the p-values reported by all three tests are statistically significant. The authors believe, however, that this is a result of the deviations from marginal normality due to the small sample size being carried over to the higher-order distributions (i.e., even if the true marginal distribution is normal, a sample from the distribution will not be exactly normal). In order to verify this, the η values are again computed at the same set of periods as in Case 2 and are transformed so that their marginal distributions are normal (in order to remove the deviations in the sample’s univariate distribution from the normal distribution), using the normal score transform procedure described by Deutsch and Journel [1998]. It is to be noted that the normal score transform (or any other monotonic transform) of the univariate distribution can not change the basic nature of the bivariate and the other multivariate distributions. Further, the marginal distribution of η has been shown to be normal in section 2.3 and hence, the transformation of the marginal distribution of the sampled data does not interfere with the tests for multivariate normality. This transformation procedure is described in Appendix 2.8. 
The tests are performed on the transformed data (Case 3) and the p-values corresponding to all three tests are seen to increase significantly, indicating that the statistically significant p-values in Case 2 are probably a result of the deviation of the sample's marginal distribution from a normal distribution rather than an indicator of non-normality in the joint distribution. Case 4 involves testing η values at five periods ranging from 0.5 to 7.5 seconds. The reported p-values are, again, found to be insignificant. In Case 5, η's at three long periods are tested for multivariate normality. The p-values reported by the three tests are highly statistically insignificant. It can, hence, be concluded from the results that it is reasonable to assume that the η's computed at different periods follow a multivariate normal distribution.

Since both the inter-event and intra-event residuals computed at multiple periods follow multivariate normal distributions, it is concluded that the logarithmic spectral accelerations computed at different periods, at a given site during a given earthquake, follow a multivariate normal distribution.

Spectral acceleration values at different orientations

This section describes tests carried out to verify whether spectral acceleration values corresponding to two different orientations at a site follow a bivariate normal distribution. The test procedures are identical to those described in section 2.4, except that the random vector would now be written as $\left[S_a^{H1}(T_1), S_a^{H2}(T_2)\right]$, where H1 and H2 refer to two orthogonal horizontal orientations (e.g., the fault-normal and the fault-parallel directions) and T1 and T2 denote the periods in consideration in the two orthogonal directions. In order to verify bivariate normality of the spectral accelerations corresponding to two different orientations, normality tests should be carried out on the inter-event and the intra-event residuals separately. The inter-event residuals in the fault-normal and the fault-parallel directions, however, are not known. As a result, an approximate test for bivariate normality of spectral accelerations in different orientations is carried out by performing tests on normalized total residuals. Total residuals are computed based on the following alternate formulation of the ground-motion equations:

$$\ln(Y) = \ln\left(\bar{Y}\right) + \delta \qquad (2.12)$$

where Y denotes the ground-motion parameter of interest; $\bar{Y}$ denotes the predicted median value of the ground-motion parameter; and δ refers to the total residual, which is a random variable that represents both the inter-event and the intra-event residuals. From equations 2.1 and 2.12, it can be inferred that δ has zero mean and standard deviation $\sqrt{\sigma^2 + \tau^2}$. Hence, normalized total residuals ($\tilde{\delta}$) can be obtained as $\tilde{\delta} = \delta / \sqrt{\sigma^2 + \tau^2}$.
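The stated standard deviation of the total residual can be checked in one line. The short derivation below is a sketch that assumes, as is usual for ground-motion models of this form, that the intra-event residual ε (standard deviation σ) and the inter-event residual η (standard deviation τ) are zero-mean and mutually independent, and that the total residual is their sum:

```latex
\begin{aligned}
\delta &= \varepsilon + \eta, \qquad E[\delta] = E[\varepsilon] + E[\eta] = 0,\\
\mathrm{Var}[\delta] &= \mathrm{Var}[\varepsilon] + \mathrm{Var}[\eta] + 2\,\mathrm{Cov}[\varepsilon,\eta]
 = \sigma^2 + \tau^2 \quad (\text{since } \mathrm{Cov}[\varepsilon,\eta] = 0),\\
\tilde{\delta} &= \frac{\delta}{\sqrt{\sigma^2+\tau^2}}
 \;\Rightarrow\; E[\tilde{\delta}] = 0,\quad \mathrm{Var}[\tilde{\delta}] = 1 .
\end{aligned}
```

If the two residuals were correlated, the cross term would not vanish and the normalization would need the additional covariance term.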

In this work, δ˜ values are computed using the fault-normal and the fault-parallel time histories observed during the Chi-Chi and the Northridge earthquakes [Chiou et al., 2008].


As mentioned earlier, the tests described in section 2.4 require independent data samples and hence, pairs of fault-normal and fault-parallel residuals are computed at well-separated sites (separation distances exceeding 20 km). Table 2.3 shows a sample of the multivariate normality test results obtained when δ̃ values are computed at different orientations (fault-normal and fault-parallel) and/or different periods. In Case 1, the δ̃ values corresponding to the fault-normal direction and the fault-parallel direction are computed at the same period (2 seconds). The three tests of multivariate normality report insignificant p-values in this case. In Case 2, the δ̃'s corresponding to the fault-normal and the fault-parallel directions are computed at two different periods. All three tests report insignificant p-values in Case 2 as well. Finally, it is intended to check if a larger separation in the periods affects the bivariate distributional properties. Hence, in Case 3, the fault-normal δ̃ values are computed at 0.5 seconds, while the fault-parallel δ̃ values are computed at 10 seconds. It can be seen from the table that the p-values are insignificant in this case as well.

2.5

Testing the assumption of multivariate normality for spatially distributed data

The tests that have been described so far are only valid for testing random vectors using independent samples. While testing spatially-distributed data from a given earthquake, ground-motion recordings at closely-separated sites should also be considered and hence, it is not possible to obtain independent samples using the techniques described in section 2.4. Hence, certain other tests are needed for testing the multivariate normality assumption for ground-motion intensities distributed over space. Multivariate normality can be ascertained by verifying univariate normality, bivariate normality, trivariate normality etc. Goovaerts [1997] and Deutsch and Journel [1998] described a procedure to test the assumption of bivariate normality of spatially-distributed data whose marginal distribution is standard normal. This test procedure can be used to verify whether pairs of residuals computed at two different sites during a single earthquake follow a bivariate normal distribution. The test is described in the following subsection, followed by test results from recorded ground



motions.

2.5.1

Check for bivariate normality

Let X(u) denote the random variable (for example, the residuals) in consideration at location u and let X(u + h) denote the random variable in consideration at location u + h (h denotes the spatial separation between the two locations). The procedure to test bivariate normality [Goovaerts, 1997, Deutsch and Journel, 1998] involves the comparison of the indicator semivariogram of the data (the experimental indicator semivariogram) to the theoretical indicator semivariogram obtained by assuming that (X(u), X(u + h)) follows a bivariate normal distribution. An indicator semivariogram is a measure of spatial variability and is defined as follows:

$$\gamma_I(h; x_p) = \frac{1}{2} E\left[\left\{I\left(X(u+h); x_p\right) - I\left(X(u); x_p\right)\right\}^2\right] \qquad (2.13)$$

where $x_p$ denotes the p-quantile of X, and $I(X(u); x_p) = 1$ if $X(u) \le x_p$; = 0 otherwise. The experimental indicator semivariogram is a regression-based relationship between $\gamma_I(h; x_p)$ and h. In this study, an exponential model is assumed as the form of the regression. Based on an exponential model, the experimental indicator semivariogram can be defined as follows:

$$\gamma_I(h; x_p) = a_{x_p}\left[1 - \exp\left(-3h / b_{x_p}\right)\right] \qquad (2.14)$$

where $a_{x_p}$ and $b_{x_p}$ are the sill and the range of the experimental indicator semivariogram respectively. The sill of a semivariogram equals the variance of X, while the range of a semivariogram is defined as the separation distance h at which $\gamma_I(h; x_p)$ equals 0.95 times the sill (for the exponential model). The range and the sill can be computed using nonlinear least squares regression based on observed values of $\gamma_I(h; x_p)$ and h. The values (observed) of $\gamma_I(h; x_p)$ for a given data set can be obtained as follows (based on Equation 2.13):

$$\gamma_I(h; x_p) = \frac{1}{2N(h)}\sum_{\alpha=1}^{N(h)}\left[I\left(X(u_\alpha + h); x_p\right) - I\left(X(u_\alpha); x_p\right)\right]^2 \qquad (2.15)$$


where N(h) is the number of pairs of data points separated by h (within some tolerance); and $(X(u_\alpha + h), X(u_\alpha))$ denotes the α-th such pair. Theoretically, if X(u) and X(u + h) follow a bivariate normal distribution, the indicator semivariogram is [Goovaerts, 1997]:

$$\gamma_I(h; x_p) = p - \left[p^2 + \frac{1}{2\pi}\int_0^{\sin^{-1} C_X(h)} \exp\left(\frac{-x_p^2}{1 + \sin(\theta)}\right) d\theta\right] \qquad (2.16)$$

where $C_X(h)$ denotes the covariance model of X, given as follows:

$$C_X(h) = \mathrm{Covariance}\left(X(u), X(u+h)\right) \qquad (2.17)$$
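The two quantities compared in this test, the experimental estimate of Equation 2.15 and the theoretical value of Equation 2.16, can be sketched in standard-library Python as follows. The function names, the planar coordinate format and the use of Simpson's rule for the integral are illustrative assumptions rather than details taken from the text. The theoretical function assumes X has a standard normal marginal distribution, so that the covariance passed in is a correlation strictly greater than -1.

```python
import math
from statistics import NormalDist


def experimental_indicator_semivariogram(coords, values, threshold, lag, tol):
    """Equation 2.15 for one lag: pairs whose separation is within tol of lag."""
    total, count = 0.0, 0
    n = len(values)
    for a in range(n):
        for b in range(a + 1, n):
            if abs(math.dist(coords[a], coords[b]) - lag) <= tol:
                ia = 1.0 if values[a] <= threshold else 0.0
                ib = 1.0 if values[b] <= threshold else 0.0
                total += (ia - ib) ** 2
                count += 1
    # No pairs at this lag: the estimate is undefined
    return total / (2.0 * count) if count else float("nan")


def theoretical_indicator_semivariogram(p, corr, steps=200):
    """Equation 2.16 for quantile level p and correlation corr = C_X(h).

    The integral is evaluated with composite Simpson's rule (steps must be even).
    """
    xp = NormalDist().inv_cdf(p)
    upper = math.asin(corr)
    width = upper / steps

    def integrand(theta):
        return math.exp(-xp * xp / (1.0 + math.sin(theta)))

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * width)
    integral = total * width / 3.0
    return p - (p * p + integral / (2.0 * math.pi))
```

At zero correlation the theoretical value reduces to p(1 − p), and at perfect correlation it falls to zero, which gives two quick sanity checks on any implementation.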

The null hypothesis that X(u) and X(u + h) follow a bivariate normal distribution is not rejected if the experimental indicator semivariogram compares well to the theoretical indicator semivariogram. As mentioned earlier, univariate and bivariate normality are not sufficient conditions for multivariate normality. For realistic data sets, however, the tests for trivariate normality and normality at other higher dimensions are impractical. This is because, for example, the trivariate normality test requires many triplets of data points that have the same geometric configuration (in terms of the spatial orientation of the three points), which are usually not available. Hence, in practice, if the sample statistics do not show a violation of the univariate and bivariate normalities, a multivariate normal model can be assumed for X [Goovaerts, 1997].
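The sill and range of the exponential model in Equation 2.14 are estimated in the text by nonlinear least squares. A lightweight stand-in for illustration is a separable least-squares search: for each candidate range on a grid, the best sill has a closed form, and the pair with the smallest squared error is kept. This swapped-in technique, the function name `fit_exponential_semivariogram` and the grid argument are assumptions for illustration, not the procedure used by the authors.

```python
import math


def fit_exponential_semivariogram(lags, gammas, range_grid):
    """Fit gamma(h) = a * (1 - exp(-3h / b)) by a grid search over b.

    For a fixed range b the model is linear in the sill a, so the
    least-squares sill is sum(gamma * g) / sum(g * g) with
    g = 1 - exp(-3h / b). Returns (sill, range) with the lowest squared error.
    """
    best = None
    for b in range_grid:
        g = [1.0 - math.exp(-3.0 * h / b) for h in lags]
        denom = sum(v * v for v in g)
        if denom == 0.0:
            continue
        a = sum(y * v for y, v in zip(gammas, g)) / denom
        sse = sum((y - a * v) ** 2 for y, v in zip(gammas, g))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    if best is None:
        raise ValueError("no usable range value in range_grid")
    return best[0], best[1]
```

The resolution of the fitted range is limited by the grid spacing, so a finer grid or a general nonlinear optimizer would be needed when a precise range estimate matters.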

2.5.2


If the spatially-distributed normalized intra-event residuals (ε˜ ) follow a multivariate normal distribution, it can be seen from equation 2.2 that the logarithmic spectral accelerations conditioned on the predicted median spectral accelerations will be multivariate normal as well. This is because the inter-event residuals at any particular period are constant across all sites, during any single earthquake. Hence, in this section, normality tests are carried out on the normalized intra-event residuals (ε˜ ) only.


It has been shown previously that the ε̃ values can be represented by a normal distribution marginally, and hence, only the bivariate normality test results are considered in this section. To prevent the deviations in the sample's univariate distribution from the normal distribution (which can arise even if the population actually follows a univariate normal distribution) from affecting the results of the bivariate normality test, the univariate distributions of ε̃ are transformed to the standard normal space using the normal score transform procedure. As mentioned earlier, the normal score transform of the univariate distribution does not change the basic nature of the bivariate distributions and hence, does not interfere with the test of bivariate normality. The procedure to test the bivariate normality of spatially-distributed data described by Goovaerts [1997] involves comparing the theoretical and the experimental indicator semivariograms obtained based on the ε̃ values computed at various periods and for all quantiles x_p (Equations 2.14 and 2.16). However, such an exhaustive test is practically impossible and so a few sample periods and quantiles are tested here. Based on the symmetry of the bivariate normal distribution, only values of p in the interval [0, 0.5] are needed. The authors present results corresponding to p = 0.1, 0.25 and 0.5, so as to cover the entire range. The periods chosen for the illustrations vary over the range of periods for which the ground-motion models are usually valid. Figures 2.5a-c show comparisons of the theoretical and the experimental indicator semivariograms obtained using the Chi-Chi data set, with the ε̃ values computed at a period of 2 seconds. It is to be noted that all records (that are usable at the chosen period) can be part of the sample data used for obtaining the experimental indicator semivariograms (unlike in section 2.4 where the sample data had to be independent of each other).
The theoretical and the experimental indicator semivariograms match reasonably well in all cases. Figure 2.5d shows the comparison of the theoretical and the experimental indicator semivariograms (p = 0.25) for the ε˜ values computed at T = 2 seconds based on the Northridge earthquake data set, and a reasonable match can be seen there as well. Similar plots are obtained using the Northridge and the Chi-Chi earthquake data sets and are shown in Figure 2.6. In obtaining this figure, the value of p is kept constant at 0.25, while the value of T is varied from as low as 0.5 seconds to as high as 5 seconds. A reasonably good match between the theoretical and the experimental semivariograms can be seen in these


[Figure 2.5, panels (a)-(d): experimental and theoretical indicator semivariograms, γI(h; x_p) plotted against distance (km).]

Figure 2.5: Theoretical and empirical semivariograms for residuals computed at 2 seconds: (a) results for the 0.1 quantile of the residuals from the Chi-Chi data (b) results for the 0.25 quantile of the residuals from the Chi-Chi data (c) results for the 0.5 quantile of the residuals from the Chi-Chi data (d) results for the 0.25 quantile of the residuals from the Northridge data.

figures as well. All these results suggest that bivariate normality can be safely assumed for spatially-distributed ε˜ ’s. Incidentally, it can be seen from Figures 2.5 and 2.6 that the sill of the indicator semivariograms equals p(1 − p), which is a consequence of the independence between well-separated intra-event residuals [Goovaerts, 1997].

2.6

Conclusions

Statistical tests have been used to test the assumption of joint normality of logarithmic spectral accelerations. Joint normality of logarithmic spectral accelerations was verified by testing the multivariate normality of inter-event and intra-event residuals. Univariate normality of inter-event and intra-event residuals was studied using normal Q-Q plots. The normal Q-Q plots showed strong linearity, indicating that the residuals are well represented by a normal distribution marginally. No evidence was found to support truncation of the marginal distribution of intra-event residuals as is sometimes done in PSHA. Using the Henze-Zirkler test, the Mardia’s test of skewness and the Mardia’s test of kurtosis, it was shown that inter-event and the intra-event residuals at a site, computed at different periods, follow multivariate normal distributions. The normality test of Goovaerts was used to illustrate that pairs of spatially-distributed intra-event residuals can be represented by the bivariate normal distribution. For a set of correlated spatially-distributed data, it is practically impossible to ascertain the trivariate normality and the normality at higher dimensions and hence, the presence of univariate and bivariate normalities is considered to indicate multivariate normality of the spatially-distributed intra-event residuals [Goovaerts, 1997]. The results reported in this study are based on the residuals computed using the ground-motion model of Campbell and Bozorgnia [2008], but similar results were obtained when using the Boore and Atkinson [2008] ground-motion model. This study provides a sound statistical basis for assumptions regarding the marginal and joint distribution of ground-motion parameters that must be made for a variety of seismic hazard calculations.


[Figure 2.6, panels (a)-(d): experimental and theoretical indicator semivariograms, γI(h; x_p) plotted against distance (km).]

Figure 2.6: Theoretical and empirical semivariograms for the 0.25 quantile of the residuals: (a) results for the residuals computed at 0.5 seconds from the Northridge data (b) results for the residuals computed at 0.5 seconds from the Chi-Chi data (c) results for the residuals computed at 1 second from the Chi-Chi earthquake data (d) results for the residuals computed at 5 seconds from the Chi-Chi data.


2.7


Data source

The data for all the ground motions studied here came from the PEER NGA Database [2005]. http://peer.berkeley.edu/nga (last accessed 18 May 2007).

2.8

Appendix: Normal score transform

The data sample can be transformed to have a standard normal distribution by a normal score transform. The transformation involves equating the various quantiles of the data to the corresponding quantiles of a standard normal distribution. Let z represent the given data set and let the empirical cumulative distribution function of the data be denoted by $\hat{F}(z)$. The $\hat{F}(z)$-quantile of the standard normal distribution is given by $\Phi^{-1}\left(\hat{F}(z)\right)$, where Φ represents the standard normal cumulative distribution function. Hence, for a given $z_k$, the corresponding normal score value ($y_k$) is computed as follows:

$$y_k = \Phi^{-1}\left(\hat{F}(z_k)\right) \qquad (2.18)$$
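A minimal standard-library sketch of Equation 2.18 is given below. The function name `normal_score_transform` is hypothetical. The empirical distribution function is taken as (rank − 0.5)/n, which is an assumption made here so that the largest observation does not map to an infinite score; the text does not specify a plotting position. Ties are not given special treatment and simply receive consecutive ranks.

```python
from statistics import NormalDist


def normal_score_transform(z):
    """Map each data value to the standard normal quantile of its rank.

    Uses F_hat(z_k) = (rank_k - 0.5) / n (an assumed plotting position)
    and y_k = Phi^{-1}(F_hat(z_k)) as in Equation 2.18.
    """
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])   # indices in ascending data order
    standard_normal = NormalDist()

    y = [0.0] * n
    for rank, i in enumerate(order, start=1):
        y[i] = standard_normal.inv_cdf((rank - 0.5) / n)
    return y
```

Because the mapping depends only on ranks, it is monotonic: the ordering of the original values is preserved in the transformed scores.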


Table 2.2: Tests on inter-event residuals computed at different periods

Case   Periods (secs)                         PHZ    PSK    PKT
1      T={1.0,2.0}                            0.85   0.20   0.35
2      T={0.5,0.75,1.0,1.5,2.0}               0.00   0.01   0.01
3      T={0.5,0.75,1.0,1.5,2.0; Norm. (a)}    0.24   0.11   0.11
4      T={0.5,1.0,2.0,5.0,7.5}                0.79   0.28   0.41
5      T={5.0,7.5,10.0}                       0.68   0.18   0.31

Explanation of abbreviations used in the table
a Norm.: Data transformed to the standard normal space

Table 2.3: Tests on residuals corresponding to two orthogonal directions (fault-normal and fault-parallel directions)

Case   Periods (secs)       PHZ    PSK    PKT
1      T1=2; T2=2           0.14   0.13   0.41
2      T1=1; T2=2           0.17   0.34   0.96
3      T1=0.5; T2=10        0.94   0.80   0.22

Chapter 3

Correlation model for spatially distributed ground-motion intensities

N. Jayaram and J.W. Baker (2009). Correlation model for spatially-distributed ground-motion intensities, Earthquake Engineering and Structural Dynamics, 38(15), 1687-1708.

3.1

Abstract

Risk assessment of spatially-distributed building portfolios or infrastructure systems requires quantification of the joint occurrence of ground-motion intensities at several sites, during the same earthquake. The ground-motion models that are used for site-specific hazard analysis do not provide information on the spatial correlation between ground-motion intensities, which is required for the joint prediction of intensities at multiple sites. Moreover, researchers who have previously computed these correlations using observed ground-motion recordings differ in their estimates of spatial correlation. In this chapter, ground motions observed during seven past earthquakes are used to estimate correlations between spatially-distributed spectral accelerations at various spectral periods. Geostatistical tools are used to quantify and express the observed correlations in a standard format. The estimated correlation model is also compared to previously published results, and apparent discrepancies among the previous results are explained.


CHAPTER 3. SPATIAL CORRELATION MODEL

The analysis shows that the spatial correlation reduces with increasing separation between the sites of interest. The rate of decay of correlation typically decreases with increasing spectral acceleration period. At periods longer than 2 seconds, the correlations were similar for all the earthquake ground motions considered. At shorter periods, however, the correlations were found to be related to the local-site conditions (as indicated by site Vs30 values) at the ground-motion recording stations. The research work also investigates the assumption of isotropy used in developing the spatial correlation models. It is seen using Northridge and Chi-Chi earthquake time histories that the isotropy assumption is reasonable at both long and short periods. Based on the factors identified as influencing the spatial correlation, a model is developed that can be used to select appropriate correlation estimates for use in practical risk assessment problems.

3.2 Introduction

The probabilistic assessment of ground-motion intensity measures (such as spectral acceleration) at an individual site is a well-researched topic. Several ground-motion models have been developed to predict median ground-motion intensities as well as dispersion about the median values [e.g., Boore and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]. Site-specific hazard analysis does not suffice, however, in many applications that require knowledge about the joint occurrence of ground-motion intensities at several sites, during the same earthquake. For instance, the risk assessment of portfolios of buildings or spatially-distributed infrastructure systems (such as transportation networks, oil and water pipeline networks and power systems) requires prediction of ground-motion intensities at multiple sites. Such joint predictions are possible, however, only if the correlation between ground-motion intensities at different sites is known [e.g., Lee and Kiremidjian, 2007, Bazzurro and Luco, 2004]. The correlation is known to be large when the sites are close to one another, and decays with increasing separation between the sites. Park et al. [2007] report that ignoring or underestimating these correlations overestimates frequent losses and underestimates rare ones, and hence, it is important that accurate ground-motion correlation models be developed for loss assessment purposes. The current work analyzes correlations between the ground-motion



intensities observed in recorded ground motions, in order to identify factors that affect these correlations, and to select a correlation model that can be used for the joint prediction of spatially-distributed ground-motion intensities in future earthquakes.

Ground-motion models that predict intensities at an individual site $i$ due to an earthquake $j$ take the following form:

$$\ln(Y_{ij}) = \ln(\bar{Y}_{ij}) + \varepsilon_{ij} + \eta_j \tag{3.1}$$

where $Y_{ij}$ denotes the ground-motion parameter of interest (e.g., $S_a(T)$, the spectral acceleration at period $T$); $\bar{Y}_{ij}$ denotes the predicted (by the ground-motion model) median ground-motion intensity (which depends on parameters such as magnitude, distance, period and local-site conditions); $\varepsilon_{ij}$ denotes the intra-event residual, which is a random variable with zero mean and standard deviation $\sigma_{ij}$; and $\eta_j$ denotes the inter-event residual, which is a random variable with zero mean and standard deviation $\tau_j$. The standard deviations, $\sigma_{ij}$ and $\tau_j$, are estimated as part of the ground-motion model and are a function of the spectral period of interest, and in some models also a function of the earthquake magnitude and the distance of the site from the rupture. During an earthquake, the inter-event residual $\eta_j$ computed at any particular period is a constant across all the sites.

Chapter 2 [Jayaram and Baker, 2008] showed that a vector of spatially-distributed intra-event residuals $\varepsilon_j = (\varepsilon_{1j}, \varepsilon_{2j}, \cdots, \varepsilon_{dj})$ follows a multivariate normal distribution. Hence, the distribution of $\varepsilon_j$ can be completely defined using the first two moments of the distribution, namely, the mean and variance of $\varepsilon_j$, and the correlation between all $\varepsilon_{i_1 j}$ and $\varepsilon_{i_2 j}$ pairs. (Alternately, the distribution can be defined using the mean and the covariance of $\varepsilon_j$, since the covariance completely specifies the variance and correlations.) Since the intra-event residuals are zero-mean random variables, the mean of $\varepsilon_j$ is the zero vector of dimension $d$. The covariance, however, is not entirely known from the ground-motion models since the models only provide the variances of the residuals, and not the correlation between residuals at two different sites.

Researchers, in the past, have computed these correlations using ground-motion time histories recorded during earthquakes [Goda and Hong, 2008, Wang and Takada, 2005, Boore et al., 2003]. Boore et al. [2003] used observations of peak ground acceleration



(PGA, which equals $S_a(0)$) from the 1994 Northridge earthquake to compute the spatial correlations. Wang and Takada [2005] computed the correlations using observations of peak ground velocities (PGV) from several earthquakes in Japan and the 1999 Chi-Chi earthquake. Goda and Hong [2008] used the Northridge and Chi-Chi earthquake ground-motion records to compute the correlation between PGA residuals, as well as the correlation between residuals computed from spectral accelerations at three periods between 0.3 seconds and 3 seconds. The results reported by these research works, however, differ in terms of the rate of decay of correlation with separation distance. For instance, while Boore et al. [2003] report that the correlation drops to zero at a site separation distance of approximately 10 km, the non-zero correlations observed by Wang and Takada [2005] extend past 100 km. Further, Goda and Hong [2008] observe differences between the correlation decay rate estimated using the Northridge earthquake records and the correlation decay rate based on the Chi-Chi earthquake records. To date, no explanation for these differences has been identified.

The current work uses observed ground motions to estimate correlations between spectral accelerations at the same period. (Appendix A describes the estimation of cross-correlations between spectral accelerations at two different periods.) Factors that affect the rate of decay in the correlation with separation distance are identified. The work also provides probable explanations for the differing results reported in the literature. In this study, an emphasis is placed on developing a standard correlation model that can be used for predicting spatially-distributed ground-motion intensities for risk assessment purposes.
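To make the role of the correlation concrete, the following minimal sketch draws one realization of the logarithmic intensities in equation 3.1 at three sites. It is an illustration under stated assumptions, not the procedure used in this report: the medians, standard deviations and correlation matrix are hypothetical placeholder values, the function names are invented for this example, and the small Cholesky routine merely stands in for what a numerical library would normally provide.

```python
import math
import random


def cholesky(matrix):
    """Return the lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            partial = sum(lower[i][m] * lower[k][m] for m in range(k))
            if i == k:
                lower[i][k] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][k] = (matrix[i][k] - partial) / lower[k][k]
    return lower


def simulate_ln_intensities(ln_median, sigma, tau, corr, rng):
    """Draw one realization of ln(Y_ij) at every site for one earthquake (equation 3.1)."""
    d = len(ln_median)
    # Covariance of the intra-event residuals: sigma_i * sigma_k * rho_ik.
    cov = [[sigma[i] * sigma[k] * corr[i][k] for k in range(d)] for i in range(d)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Correlated intra-event residuals, one per site.
    eps = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
    # A single inter-event residual shared by all sites.
    eta = rng.gauss(0.0, tau)
    return [ln_median[i] + eps[i] + eta for i in range(d)]


# Hypothetical inputs (not taken from any ground-motion model).
rng = random.Random(1)
ln_median = [math.log(0.30), math.log(0.25), math.log(0.10)]
sigma = [0.6, 0.6, 0.6]
tau = 0.3
corr = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.2], [0.1, 0.2, 1.0]]
ln_y = simulate_ln_intensities(ln_median, sigma, tau, corr, rng)
print([round(math.exp(v), 3) for v in ln_y])
```

The only input that a site-specific ground-motion model does not supply is `corr`, which is the quantity this chapter sets out to estimate.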

3.3 Modeling correlations using semivariograms

Geostatistical tools are widely used in several fields for modeling spatially-distributed random vectors (also called random functions) [Deutsch and Journel, 1998, Goovaerts, 1997]. The current research work takes advantage of this well-developed approach to model the correlation between spatially-distributed ground-motion intensities. The needed tools are briefly described in this section.

Let $Z = (Z_{u_1}, Z_{u_2}, \cdots, Z_{u_d})$ denote a spatially-distributed random function, where $u_i$ denotes the location of site $i$; $Z_{u_i}$ is the random variable of interest (in this case, $\varepsilon_{u_i j}$ from equation 3.1) at site location $u_i$; and $d$ denotes the total number of sites. The correlation structure of the random function $Z$ can be represented by a semivariogram, which is a measure of the average dissimilarity between the data [Goovaerts, 1997]. Let $u$ and $u'$ denote two sites separated by $\mathbf{h}$. The semivariogram ($\gamma(u, u')$) is computed as half the expected squared difference between $Z_u$ and $Z_{u'}$:

$$\gamma(u, u') = \frac{1}{2} E\left[\{Z_u - Z_{u'}\}^2\right] \tag{3.2}$$

The semivariogram defined in equation 3.2 is location-dependent and its inference requires repetitive realizations of $Z$ at locations $u$ and $u'$. Such repetitive measurements of $\{Z_u, Z_{u'}\}$ are, however, never available in practice (e.g., in the current application, one would need repeated observations of ground motions at every pair of sites of interest). Hence, it is typically assumed that the semivariogram does not depend on site locations $u$ and $u'$, but only on their separation $\mathbf{h}$. The stationary semivariogram ($\gamma(\mathbf{h})$) can then be obtained as follows:

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\{Z_u - Z_{u+\mathbf{h}}\}^2\right] \tag{3.3}$$

Equation 3.2 can be replaced with equation 3.3 if the random function ($Z$) is second-order stationary. Second-order stationarity implies that (i) the expected value of the random variable $Z_u$ is a constant across space and (ii) the two-point statistics (measures that depend on $Z_u$ and $Z_{u'}$) depend only on the separation between $u$ and $u'$, and not on the actual locations (i.e., the statistics depend on the separation vector $\mathbf{h}$ between $u$ and $u'$ and not on $u$ and $u'$ as such). A stationary semivariogram can be estimated from a data set as follows:

$$\hat{\gamma}(\mathbf{h}) = \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \{z_{u_\alpha} - z_{u_\alpha+\mathbf{h}}\}^2 \tag{3.4}$$

where $\hat{\gamma}(\mathbf{h})$ is the experimental stationary semivariogram (estimated from a data set); $z_u$ denotes the data value at location $u$; $N(\mathbf{h})$ denotes the number of pairs of sites separated by $\mathbf{h}$; and $\{z_{u_\alpha}, z_{u_\alpha+\mathbf{h}}\}$ denotes the $\alpha$'th such pair. A stationary semivariogram is said to be isotropic if it is a function of the separation distance ($h = \|\mathbf{h}\|$) rather than the separation vector $\mathbf{h}$.

Figure 3.1: (a) Parameters of a semivariogram (b) Semivariograms fitted to the same data set using the manual approach and the method of least squares.

The function $\hat{\gamma}(\mathbf{h})$ provides a set of experimental values for a finite number of separation vectors $\mathbf{h}$. A continuous function must be fitted based on these experimental values in order to deduce semivariogram values for any possible separation $\mathbf{h}$. A valid (permissible) semivariogram function needs to be conditionally negative definite so that the variances and conditional variances corresponding to this semivariogram are non-negative. In order to satisfy this condition, the semivariogram functions are usually chosen to be linear combinations of basic models that are known to be permissible. These include the exponential model, the Gaussian model, the spherical model and the nugget effect model. The exponential model, in an isotropic case (i.e., the vector distance $\mathbf{h}$ is replaced by a scalar separation length $\|\mathbf{h}\|$, also denoted as $h$), is expressed as follows:

$$\gamma(h) = a\left[1 - \exp\left(-3h/b\right)\right] \tag{3.5}$$

where a and b are the sill and the range of the semivariogram function respectively (Figure 3.1a). The sill of a semivariogram equals the variance of Zu , while the range is defined as the separation distance h at which γ(h) equals 0.95 times the sill of the exponential semivariogram.
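As a quick check of this definition of the range (a worked substitution, not part of the original derivation), setting $h = b$ in equation 3.5 gives:

```latex
\gamma(b) = a\left[1 - \exp\left(-3b/b\right)\right]
          = a\left[1 - e^{-3}\right]
          \approx 0.950\,a
```

The factor of 3 in the exponent is therefore what makes $b$ the distance at which the exponential model reaches 95% of its sill.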



The Gaussian model is as follows:

$$\gamma(h) = a\left[1 - \exp\left(-3h^2/b^2\right)\right] \tag{3.6}$$

The sill and the range of a Gaussian semivariogram are as defined for an exponential semivariogram. The spherical model is as follows:

$$\gamma(h) = \begin{cases} a\left[\dfrac{3}{2}\dfrac{h}{b} - \dfrac{1}{2}\left(\dfrac{h}{b}\right)^3\right] & \text{if } h \le b \\ a & \text{otherwise} \end{cases} \tag{3.7}$$

where $a$ and $b$ are again the sill and range of the semivariogram, respectively. The range of a spherical semivariogram is the separation distance at which $\gamma(h)$ equals $a$. The nugget effect model can be described as:

$$\gamma(h) = a\left[I(h > 0)\right] \tag{3.8}$$

where $I(h > 0)$ is an indicator variable that equals 1 when $h > 0$ and equals 0 otherwise. The covariance structure of $Z$ is completely specified by the semivariogram function and the sill and the range of the semivariogram. It can be theoretically shown that the following relationship holds [Goovaerts, 1997]:

$$\gamma(h) = a\left(1 - \rho(h)\right) \tag{3.9}$$

where $\rho(h)$ denotes the correlation coefficient between $Z_u$ and $Z_{u+h}$. It can also be shown that the sill of the semivariogram equals the variance of $Z_u$. Therefore, it would suffice to estimate the semivariogram of a random function in order to determine its covariance structure. Moreover, based on equations 3.5 (for instance) and 3.9, it can be seen that a large range implies a small rate of increase in $\gamma(h)$ and therefore, large correlations between $Z_u$ and $Z_{u+h}$. Further, it can be seen from equation 3.8 that the nugget effect model specifies zero correlation for all non-zero separation distances.

In the current work, correlations between ground-motion intensities at different sites are represented using semivariograms. Ground-motion recordings from past earthquakes are used to estimate ranges of semivariograms and to identify the factors that could affect the estimates. Throughout this work, the semivariograms are assumed to be second-order stationary. Second-order stationarity is assumed so that the data available over the entire region of interest can be pooled and used for estimating semivariogram sills and ranges. In the current work, like many other works involving spatial-correlation estimation, the semivariograms are also assumed to be isotropic. The assumptions of stationarity and isotropy are investigated in more detail subsequently in this chapter.
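The four basic models and the link between semivariogram and correlation can be written compactly in code. The sketch below is a direct transcription of equations 3.5 to 3.9 for the isotropic case; the function and argument names are chosen for this illustration only.

```python
import math


def exponential(h, sill, range_km):
    """Equation 3.5: reaches 95% of the sill at h equal to the range."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))


def gaussian(h, sill, range_km):
    """Equation 3.6."""
    return sill * (1.0 - math.exp(-3.0 * h ** 2 / range_km ** 2))


def spherical(h, sill, range_km):
    """Equation 3.7: reaches the sill exactly at h equal to the range."""
    if h <= range_km:
        ratio = h / range_km
        return sill * (1.5 * ratio - 0.5 * ratio ** 3)
    return sill


def nugget(h, sill):
    """Equation 3.8: zero at h = 0 and equal to the sill for any h > 0."""
    return sill if h > 0 else 0.0


def correlation(gamma_h, sill):
    """Equation 3.9 rearranged: rho(h) = 1 - gamma(h) / sill."""
    return 1.0 - gamma_h / sill


# Correlation implied by each model at a separation of 10 km (unit sill, 40 km range).
for name, model in (("exponential", exponential), ("gaussian", gaussian), ("spherical", spherical)):
    print(name, round(correlation(model(10.0, 1.0, 40.0), 1.0), 3))
print("nugget", correlation(nugget(10.0, 1.0), 1.0))
```

With a unit sill, `correlation` reduces to one minus the semivariogram value, which is the form used later in the chapter.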

3.4 Computation of semivariogram ranges for intra-event residuals using empirical data

As mentioned earlier, the covariance of intra-event residuals can be represented using a semivariogram, whose functional form (e.g., exponential model), sill and range need to be determined. This section discusses the semivariograms estimated based on observed ground-motion time histories. For a given earthquake, it can be seen from equation 3.1 that

$$\varepsilon_i + \eta = \ln(Y_i) - \ln(\bar{Y}_i) \tag{3.10}$$

Let $\grave{\varepsilon}_i$ denote the normalized intra-event residual at site $i$. (The subscript $j$ in equation 3.1 is no longer used since the residuals used in these calculations are observed during a single earthquake.) $\grave{\varepsilon}_i$ is computed as follows:

$$\grave{\varepsilon}_i = \frac{\varepsilon_i}{\sigma_i} \tag{3.11}$$

where $\sigma_i$ denotes the standard deviation of the intra-event residuals at site $i$. Further, let $\tilde{\varepsilon}_i$ denote the sum of the intra-event residual ($\varepsilon_i$) and inter-event residual ($\eta$) normalized by the standard deviation of the intra-event residual ($\sigma_i$). $\tilde{\varepsilon}_i$ can be computed as follows:

$$\tilde{\varepsilon}_i = \frac{\varepsilon_i + \eta}{\sigma_i} = \frac{\ln(Y_i) - \ln(\bar{Y}_i)}{\sigma_i} \tag{3.12}$$



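While assessing covariances, it is convenient to work with $\grave{\varepsilon}$'s rather than $\varepsilon$'s, since $\grave{\varepsilon}$'s are homoscedastic (i.e., constant variance) with unit variance, unlike the $\varepsilon$'s. Since the inter-event residual ($\eta$), computed at any particular period, is a constant across all the sites during a given earthquake, the experimental semivariogram function of $\grave{\varepsilon}$ can be obtained as follows (based on equation 3.4):

$$\begin{aligned}
\hat{\gamma}(h) &= \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left[\grave{\varepsilon}_{u_\alpha} - \grave{\varepsilon}_{u_\alpha+h}\right]^2 \\
&= \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left[\frac{\ln(Y_{u_\alpha}) - \ln(\bar{Y}_{u_\alpha}) - \eta}{\sigma_{u_\alpha}} - \frac{\ln(Y_{u_\alpha+h}) - \ln(\bar{Y}_{u_\alpha+h}) - \eta}{\sigma_{u_\alpha+h}}\right]^2 \\
&\approx \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left[\frac{\ln(Y_{u_\alpha}) - \ln(\bar{Y}_{u_\alpha})}{\sigma_{u_\alpha}} - \frac{\ln(Y_{u_\alpha+h}) - \ln(\bar{Y}_{u_\alpha+h})}{\sigma_{u_\alpha+h}}\right]^2 \\
&= \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left[\tilde{\varepsilon}_{u_\alpha} - \tilde{\varepsilon}_{u_\alpha+h}\right]^2
\end{aligned} \tag{3.13}$$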

where $\tilde{\varepsilon}$ is defined by equation 3.12; $(u_\alpha, u_\alpha + h)$ denotes the location of a pair of sites separated by $h$; $N(h)$ denotes the number of such pairs; $Y_{u_\alpha}$ denotes the ground-motion intensity at location $u_\alpha$; and $\sigma_{u_\alpha}$ is the standard deviation of the intra-event residual at location $u_\alpha$. The sill of the semivariogram of $\grave{\varepsilon}$ (i.e., the sill of $\hat{\gamma}(h)$) should equal 1 since the $\grave{\varepsilon}$'s have a unit variance. Hence, based on equation 3.9, it can be concluded that:

$$\hat{\gamma}(h) = 1 - \hat{\rho}(h) \tag{3.14}$$

where $\hat{\rho}(h)$ is the estimate of $\rho(h)$. Incidentally, equation 3.13 shows that the covariances of intra-event residuals can be estimated without having to account for the inter-event residual $\eta$. As indicated, equation 3.13 involves an approximation due to the mild assumption that $\eta / \sigma_{u_\alpha} = \eta / \sigma_{u_\alpha+h}$. The Boore and Atkinson [2008] model, which is used in the current work, suggests that the standard deviation of the intra-event residuals depends only on the period at which the residuals are computed, and hence, it can be inferred that this approximation is reasonable. Incidentally, though the current work only uses the Boore and Atkinson [2008] ground-motion model, the results obtained were found to be similar when an alternate model, namely, the Chiou



and Youngs [2008] model, was used.

The ground-motion databases typically report recordings in two orthogonal horizontal directions. For instance, the PEER NGA database [Chiou et al., 2008] provides the fault-normal and the fault-parallel components of the ground motions for each earthquake. In the current work, it was found that the correlations computed using both the fault-normal and the fault-parallel time-histories were similar. Hence, only results corresponding to the fault-normal orientation are reported here. In fact, Baker and Jayaram [2009] and Bazzurro et al. [2008] used several sets of recorded and simulated ground motions to show that the estimated correlations are independent of the ground-motion component used.
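The cancellation of the inter-event residual in equation 3.13 can be checked numerically. The sketch below uses made-up numbers throughout (medians, residuals, the value of the inter-event residual and the site pairs are all hypothetical) and assumes, as the Boore and Atkinson [2008] model suggests, that the intra-event standard deviation is the same at every site.

```python
import math
import random


def semivariogram_from_pairs(values, pairs):
    """Equation 3.4 for one separation bin: half the mean squared difference over site pairs."""
    return sum((values[a] - values[b]) ** 2 for a, b in pairs) / (2.0 * len(pairs))


rng = random.Random(0)
n_sites = 6
sigma = 0.55   # hypothetical intra-event standard deviation, same at every site
eta = 0.2      # hypothetical inter-event residual for this earthquake
ln_median = [math.log(0.1 + 0.05 * i) for i in range(n_sites)]   # made-up predictions
eps = [rng.gauss(0.0, sigma) for _ in range(n_sites)]            # made-up intra-event residuals
ln_observed = [ln_median[i] + eps[i] + eta for i in range(n_sites)]

# Normalized intra-event residuals (equation 3.11); these require eta to be known.
eps_normalized = [e / sigma for e in eps]
# Normalized total residuals (equation 3.12); these need only observations and predictions.
eps_total = [(ln_observed[i] - ln_median[i]) / sigma for i in range(n_sites)]

pairs = [(0, 1), (2, 3), (4, 5)]   # hypothetical site pairs falling in one distance bin
gamma_intra = semivariogram_from_pairs(eps_normalized, pairs)
gamma_total = semivariogram_from_pairs(eps_total, pairs)
print(gamma_intra, gamma_total)    # the two agree up to floating-point rounding
```

Because the same constant is added to both members of every pair, the squared differences are unchanged, so the semivariogram can be computed from observed and predicted intensities alone.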

3.4.1 Construction of experimental semivariograms using empirical data

Figure 3.1a shows a sample semivariogram constructed from empirical data. The first step in obtaining such a semivariogram is to compute site-to-site distances for all pairs of sites and place them in different bins based on the separation distances. For example, the bins could be centered at multiples of $h$ km with bin widths of $\delta h$ km ($\delta h \le h$). All pairs of sites that fall in the bin centered at $h$ km (i.e., the sites that are separated by a distance in $\left[h - \frac{\delta h}{2},\, h + \frac{\delta h}{2}\right]$) are used to compute $\hat{\gamma}(h)$ (based on equation 3.4). If $\delta h$ is chosen to be very small, it can result in few pairs of sites in the bins, which will affect the robustness of the results obtained. On the other hand, a large value of $\delta h$ will mix site pairs with differing distances, reducing the resolution of the experimental semivariograms. In the current work, experimental semivariograms are obtained using $\delta h = 2$ km (unless stated otherwise), since this was seen to be the smallest value that results in a reasonable number of site pairs in the bins.

The semivariogram shown in Figure 3.1a has an exponential form with a sill of 1 and a range of 40 km. This model can be expressed as follows (based on equation 3.5):

$$\gamma(h) = 1 - \exp\left(-3h/40\right) \tag{3.15}$$

The correlation function corresponding to this model equals $\rho(h) = 1 - \gamma(h) = \exp(-3h/40)$ (based on equation 3.14).

An easy and transparent method to determine the model and the model parameters is to fit the experimental semivariogram values obtained at discrete separation distances manually. Suppose that $\gamma(h)$ can be expressed as follows:

$$\gamma(h) = c_0 \gamma_0(h) + \sum_{n=1}^{N} c_n \gamma_n(h) \tag{3.16}$$

where $\gamma_0(h)$ is a pure nugget effect and $\gamma_n(h)$ is a spherical, exponential or Gaussian model (as defined in equations 3.5-3.8); $c_n$ is the contribution of the model $n$ to the semivariogram; and $N$ is the total number of models used (excluding the nugget effect). The ranges and the contributions of the models can be systematically varied to obtain the best fit to the experimental semivariogram values.

In the following sections, priority is placed on building models that fit the empirical data well at short distances, even if this requires some misfit with empirical data at large separation distances. Large separation distances are associated with low correlations, which have relatively little effect on joint distributions of ground-motion intensities. In addition to having low correlation, widely separated sites also have little impact on each other due to an effective 'screening' of their influence by more closely-located sites [Goovaerts, 1997]. (It is to be noted that in cases where there are fewer than 10 closely spaced points, the influence of farther away points will not be completely screened, according to Goovaerts [1997]. In such cases, the correlation model developed in this study might provide slightly inaccurate correlation estimates. This might, however, be mitigated by the fact that the large separation distances are associated with low correlations, which thus have relatively little effect on joint distributions of ground-motion intensities.)

Figure 3.1b shows sample semivariograms fitted to a data set using the manual approach and the method of least squares. It can be seen that, at small separations, the manually-fitted semivariogram is a better model than the one fitted using the method of least squares. More detailed discussion on the advantages of using manual fitting rather than least-squares fitting follows in section 3.6, where the proposed approach is also compared to approaches used in previous research on this topic.
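The binning procedure and the preference for short-distance fit can be sketched as follows. This is an illustration only: the manual fitting used in this work is a visual, judgment-based procedure, and the exponentially decaying weights in `fit_exponential_range` are a hypothetical stand-in for that judgment rather than anything taken from the source. Site coordinates are assumed to be planar and expressed in km, and all names are invented for the example.

```python
import math


def experimental_semivariogram(coords_km, values, bin_centers_km, bin_width_km):
    """Equation 3.4 with distance bins; returns {bin center: (gamma_hat, number of pairs)}."""
    sums = {center: 0.0 for center in bin_centers_km}
    counts = {center: 0 for center in bin_centers_km}
    n = len(values)
    for i in range(n):
        for k in range(i + 1, n):
            dist = math.dist(coords_km[i], coords_km[k])
            for center in bin_centers_km:
                # A pair belongs to the first bin whose half-width covers its separation.
                if abs(dist - center) <= bin_width_km / 2.0:
                    sums[center] += (values[i] - values[k]) ** 2
                    counts[center] += 1
                    break
    return {c: (sums[c] / (2.0 * counts[c]), counts[c]) for c in bin_centers_km if counts[c] > 0}


def fit_exponential_range(gamma_hat, candidate_ranges_km, sill=1.0, decay_km=10.0):
    """Grid search for the range of an exponential model, favoring short separations.

    The weights exp(-h / decay_km) are an assumption of this sketch, not the source's method.
    """
    def misfit(range_km):
        total = 0.0
        for h, (gamma, _) in gamma_hat.items():
            model = sill * (1.0 - math.exp(-3.0 * h / range_km))
            total += math.exp(-h / decay_km) * (gamma - model) ** 2
        return total
    return min(candidate_ranges_km, key=misfit)


# Tiny hypothetical data set: three collinear sites with made-up residuals.
coords = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
residuals = [0.0, 1.0, 3.0]
print(experimental_semivariogram(coords, residuals, [2.0, 4.0], 2.0))
```

An unweighted least-squares fit corresponds to replacing the weight with 1 for every bin, which lets the many long-distance bins dominate the misfit.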



Figure 3.2: Range of semivariograms of $\tilde{\varepsilon}$, as a function of the period at which $\tilde{\varepsilon}$ values are computed: (a) the residuals are obtained using the Northridge earthquake data (b) the residuals are obtained using the Chi-Chi earthquake data.

3.4.2 1994 Northridge earthquake recordings

This section discusses the ranges of semivariograms estimated using observed Northridge earthquake ground motions. The manual fitting approach described previously is used to compute ranges of the semivariograms of $\tilde{\varepsilon}$'s (obtained based on the Northridge earthquake time histories) computed at seven periods ranging between 0 seconds and 10 seconds. Of the three functional forms considered (equations 3.5-3.7), the exponential model is found to provide the 'best fit' (particularly at small separations) for experimental semivariograms obtained using $\tilde{\varepsilon}$'s computed at several different periods, based on recordings from different earthquakes. The constancy of the semivariogram function across periods makes it simpler to specify a standard correlation model for the $\tilde{\varepsilon}$'s. Moreover, the use of a single model enables a direct comparison of the correlations between residuals computed at different periods, using only the ranges of the semivariograms. The ranges of these estimated semivariograms are plotted against period in Figure 3.2a. The semivariogram fits corresponding to all the periods considered can be found in Appendix B. It can be observed from Figure 3.2a that the estimated range of the semivariogram tends to increase with period. Since a larger range implies larger correlations (section 3.3), it can be inferred that the $\tilde{\varepsilon}$ values at long periods show larger correlations than those at short periods. This is consistent with



comparable past studies of ground-motion coherency, which has been widely researched in the past. Coherency can be thought of as a measure of similarity in two spatially separated ground-motion time histories. Der Kiureghian [1996] reports that coherency is reduced by the scattering of waves during propagation, and that this reduction is greater for high-frequency waves. High-frequency waves, which have short wavelengths, tend to be more affected by small-scale heterogeneities in the propagation path, and as a result tend to be less coherent than long-period ground waves [Zerva and Zervas, 2002]. It is reasonable to expect highly coherent ground motions to exhibit correlated peak amplitudes (i.e., spectral accelerations) as well. Since the $\tilde{\varepsilon}$'s studied here, which quantify these peak amplitudes, tend to show the same correlation trend with period as previous coherency studies, it may be that a similar wave-scattering mechanism is partially responsible for the correlation trends observed here.

The Northridge earthquake data used for the above analysis are obtained from the NGA database. In order to exclude records whose characteristics differ from those used by the ground-motion modelers for data analysis, in most cases, only records used by the authors of the Boore and Atkinson [2008] ground-motion model are considered. For the purposes of this chapter, these records are denoted 'usable records'. The semivariograms of residuals computed at periods of 5, 7.5 and 10 seconds, however, are obtained using all available Northridge records in the NGA database. This is on account of the limited number of Northridge earthquake recordings at extremely long periods. At 5 seconds, the residuals can be computed using 158 total available records, while 66 of these are used by the ground-motion model authors. Since there is a reasonable number of records available in both cases, a semivariogram constructed using all 158 records (denoted SV1) can be compared to that estimated from the 66 usable records (denoted SV2; in this case, the bin size was increased to 4 km to compensate for the lack of available records). The ranges of the two semivariograms, SV1 and SV2, are 40 km and 30 km respectively. This shows that there is a slight difference in the estimated ranges, which could be due to the additional correlated systematic errors introduced by the extra records.

As mentioned in section 3.4, only correlations between intensities estimated using the fault-normal components are discussed in this chapter. This is because the correlations obtained using the fault-normal and the fault-parallel ground motions were found to be similar. For



example, the semivariogram of $\tilde{\varepsilon}$'s computed at 2 seconds, based on the fault-parallel ground motions recorded during the Northridge earthquake, was found to be reasonably modeled using an exponential function with a unit sill and a range of 36 km. The corresponding range for the semivariogram based on the fault-normal ground motions equals 42 km. Similar results were observed when the residuals were computed at other periods, and using other earthquake recordings.
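To see what a 36 km versus 42 km range means in terms of correlation, the two exponential models can be evaluated at a few separations using $\rho(h) = \exp(-3h/b)$, which follows from equations 3.5 and 3.14 with a unit sill. The separations chosen below are arbitrary illustration values.

```python
import math


def exponential_correlation(h_km, range_km):
    """rho(h) = exp(-3h / b), from equations 3.5 and 3.14 with a unit sill."""
    return math.exp(-3.0 * h_km / range_km)


for h_km in (5.0, 10.0, 20.0):
    rho_fp = exponential_correlation(h_km, 36.0)   # fault-parallel range reported above
    rho_fn = exponential_correlation(h_km, 42.0)   # fault-normal range reported above
    print(f"h = {h_km:4.0f} km: fault-parallel {rho_fp:.2f}, fault-normal {rho_fn:.2f}")
```

At a separation of 10 km, for instance, the two models give correlations of about 0.43 and 0.49, a difference that is small relative to the scatter typically seen in experimental semivariograms.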

3.4.3 1999 Chi-Chi earthquake

In this section, the semivariogram ranges of $\tilde{\varepsilon}$'s from the Chi-Chi earthquake recordings are presented. The Chi-Chi earthquake ground motions came from the NGA database. Only records used by the authors of the Boore and Atkinson [2008] ground-motion model are considered. The summary plot of the estimated ranges is shown in Figure 3.2b. (The semivariograms are shown in Appendix B.) The following can be observed from the figures: (a) As seen with the Northridge earthquake data, the range of the semivariogram typically increases with period. (An exception is observed when the peak ground accelerations (PGA) are considered, and this is explored further subsequently in this chapter.) (b) The ranges are higher, in general, than those observed based on the Northridge earthquake data (Figure 3.2a). This is consistent with observations made by other researchers considering Northridge and Chi-Chi earthquake data [e.g., Goda and Hong, 2008].

The large ranges obtained here, relative to the comparable results from Northridge, can be explained using the Vs30 values (average shear-wave velocities in the top 30 m of the soil) at the recording stations. (The author found an empirical link between the range and Vs30, but not between range and other earthquake- and site-related parameters such as magnitude and distance. Further research using bigger datasets is necessary to quantify such links.) The Vs30 values are commonly used in ground-motion models as indicators of the effects of local-site conditions on the ground motion. The $\tilde{\varepsilon}$'s are affected if the predicted ground-motion intensities are affected by inaccurate Vs30 values, or if the Vs30's are inadequate to capture the local-site effects entirely (i.e., the ground-motion models do not entirely capture the local-site effects using Vs30 values). Close to 70% of the Taiwan site Vs30 values are inferred from Geomatrix site classes,



while the rest of the Vs30 values are measured (NGA database). Since closely-spaced sites are likely to belong to the same site class and possess similar (and unknown) Vs30 values, errors in the inferred Vs30 values are likely to be correlated among sites that are close to each other. Such correlated Vs30 measurement errors will result in correlated prediction errors at all these closely-spaced sites, which will increase the range of the semivariograms.

The larger ranges of semivariograms estimated using the Chi-Chi earthquake ground motions may also be due to possible correlation between the true Vs30 values (and not just the correlation between the Vs30 errors). Larger correlation between the Vs30's indicates a more homogeneous soil (homogeneous in terms of properties that affect site effects but are not accounted for by the ground-motion models). In such cases, if a ground-motion model does not accurately capture the local-site effect at one site, it is likely to produce similar prediction errors in a cluster of closely-spaced sites (on account of the homogeneity). Castellaro et al. [2008] compared the site-dependent seismic amplification factors ($F_a$; the site amplification factor is defined as the amplification of the ground-motion spectral level at a site with respect to that at a reference ground condition [Borcherdt, 1994]) observed during the 1989 Loma Prieta earthquake to the corresponding site Vs30 values. They found substantial scatter in the plot of $F_a$ versus Vs30, and also found that this scatter was more pronounced at short periods (below 0.5 seconds) than at longer periods. This suggests that ground-motion intensity predictions based on Vs30 will have errors, particularly at periods below 0.5 seconds.
Figures 3.3a and 3.3b show semivariograms of the normalized Vs30 values (the Vs30 semivariogram is not to be confused with the $\tilde{\varepsilon}$ semivariogram) at the Northridge earthquake recording stations and the Chi-Chi earthquake recording stations respectively. (Normalization involves scaling the Vs30 values so that the normalized Vs30 values have a unit variance to enable a direct comparison of the semivariograms.) Figure 3.3a shows significant scatter at all separation distances, indicating zero correlation at all separations. In contrast, Figure 3.3b indicates that the Taiwan Vs30 values have significant spatial correlation. This suggests that $\tilde{\varepsilon}$'s may have additional spatial correlation in Taiwan, due to homogeneous site effects that cause correlated prediction errors.

Figure 3.3: (a) Experimental semivariogram obtained using normalized Vs30's at the recording stations of the Northridge earthquake. No semivariogram is fitted on account of the extreme scatter (b) Experimental semivariogram obtained using normalized Vs30's at the recording stations of the Chi-Chi earthquake. The range of the fitted exponential semivariogram equals 25 km.

As mentioned previously, one notable aberration in the plot of range versus period (Figure 3.2b) is the large range observed when the residuals are computed at 0 seconds as compared to some of the longer periods. This is not consistent with the coherency argument of the previous section. It can, however, be explained using the relationship between the range and the Vs30's described in the above paragraphs. The inaccuracies in ground-motion prediction based on Vs30's will be reflected in increased correlation between the residuals computed at nearby sites. These inaccuracies are larger at short periods (below 0.5 seconds) [Castellaro et al., 2008], which explains the larger correlation between the residuals (which ultimately results in the larger range observed) computed using PGAs.

One final test that was considered here was whether spatial correlations differed for near-fault ground motions experiencing directivity. Baker [2007b] identified pulse-like ground motions from the NGA database based on wavelet analysis. Thirty such pulses were identified in the fault-normal components of the Chi-Chi earthquake recordings. Experimental semivariograms of residuals were computed using these pulse-like ground motions, and their ranges were estimated. It was seen that the ranges were reasonably similar to those obtained using all usable ground motions (i.e., pulse-like and non-pulse-like). Since



the available pulse-like ground-motion data set is very small, however, the results obtained were not considered to be sufficiently reliable, and hence not considered further in this chapter. A more detailed analysis can be found in Appendix B and Bazzurro et al. [2008].

Based on the discussion in this section, it can be seen that the correlated Vs30 values and the correlated Vs30 measurement errors are possible reasons for the larger ranges estimated in section 3.4.3 than in section 3.4.2. Other factors, such as the size of the rupture areas, may also affect the correlations. These factors could not, however, be investigated with the limited data set available.

3.4.4 Other earthquakes

The correlations computed using data from the 2003 M5.4 Big Bear City earthquake, the 2004 M6.0 Parkfield earthquake, the 2005 M5.1 Anza earthquake, the 2007 M5.6 Alum Rock earthquake and the 2008 M5.4 Chino Hills earthquake are presented in this section. The time histories for these earthquakes were obtained from the CESMD database [2008]. The Vs 30 data used for these computations came from the CESMD database [2008] (for the Parkfield earthquake) and the U.S. Geological Survey Vs 30 maps (for the other earthquakes) [Global Vs30 map server, 2008]. Exponential models are fitted to experimental semivariograms of ε˜ ’s computed using the time histories from the above-mentioned earthquakes, at periods ranging from 0 - 10 seconds. Figure 3.4 shows plots of range versus period for the Big Bear City, Parkfield, Alum Rock, Anza and Chino Hills earthquake residuals respectively. The ranges of the semivariograms are generally seen to increase with period, which is consistent with findings from the Chi-Chi and the Northridge earthquake data. It can also be seen from the figure that, at short periods, the ranges obtained from the Anza earthquake data are larger than those from the other earthquakes considered. On the other hand, the ranges computed using the Parkfield earthquake data are fairly small at short periods. Semivariograms of the Vs 30’s at the recording stations for all five earthquakes of interest were computed. The semivariogram range computed using the Anza earthquake Vs 30’s was found to be the largest at 40 km, while the ranges computed from the Chino Hills, Big Bear City, Alum Rock and Parkfield earthquake data were smaller at 35, 30, 18 and approximately 0 km



Figure 3.4: Range of semivariograms of ε̃, as a function of the period at which ε̃ values are computed. The residuals are obtained using the: (a) Big Bear City earthquake data; (b) Parkfield earthquake data; (c) Alum Rock earthquake data; (d) Anza earthquake data; (e) Chino Hills earthquake data.



Figure 3.5: Ranges of residuals computed using PGAs versus ranges of normalized Vs 30 values.

respectively. The estimated ranges of the semivariograms of the residuals and of the Vs 30’s reinforce the argument made previously that clustering in the Vs 30 values (as indicated by a large range of the Vs 30 semivariogram) results in increased correlation among the residuals (the low PGA-based range estimated using the Chino Hills earthquake data seems to be an exception, however). This trend is seen in Figure 3.5, which shows the range of PGA-based residuals plotted against the range of the Vs 30’s, for the earthquakes considered in this work. This dependence on the Vs 30 range seems to be weaker at longer periods, which is in line with the observations of Castellaro et al. [2008] that the scatter in the plot of Fa versus Vs 30 is greater at short periods than at long periods. The authors hypothesize that the reduced dependence of range on Vs 30’s at long periods could also be because the long-period ranges are considerably influenced by factors other than Vs 30 values, such as coherency as explained in section 3.4.2 and prediction errors unrelated to Vs 30’s (which are likely since the ground-motion models are fitted using far fewer data points at long periods). Finally, an additional advantage of considering these five additional events is that earthquakes covering a range of magnitudes have been studied. No trends of range with magnitude were detected. A few research works studying spatial correlations use ground-motion recordings from



Figure 3.6: (a) Range of semivariograms of ε̃, as a function of the period at which ε̃ values are computed. The residuals are obtained from six different sets of time histories as shown in the figure; (b) Range of semivariograms of ε̃ predicted by the proposed model as a function of the period.

earthquakes in Japan, based on the data provided in the KiK Net [2007]. In this work, data from the 2004 Mid Niigata Prefecture earthquake and the 2005 Miyagi-Oki earthquake were explored. Though the number of sites at which the ground-motion recordings are available is fairly large, most recording stations are far away from one another. The KiK Net [2007] consists of 681 recording stations, among which only 19 pairs of stations are within 10 km of one another. As explained in section 3.4.2, it is important to accurately model the semivariogram at short separation distances, particularly at separation distances below 10 km. Hence, the recordings from the KiK Net [2007] were not considered further for studying the ranges of semivariograms.

3.4.5

A predictive model for spatial correlations

The above sections presented spatial correlations computed using recorded ground motions from several past earthquakes. In this section, these correlation estimates are used to develop a model that can be used to select appropriate correlation estimates for risk assessment purposes. Figure 3.6a shows the ranges computed using various earthquake data as a function of



period. From a practical perspective, despite the wide differences in the characteristics of the earthquakes considered, the ranges computed are quite similar, particularly at periods longer than 2 seconds. At short periods (below 2 seconds), however, there are considerable differences in the estimated ranges depending on the ground-motion time histories used. The previous sections suggested empirically that differences in the correlation of ε̃’s are in large part explained by the Vs 30 values at the recording stations for these earthquakes. Hence, the following cases can be considered for decision making:

Case 1: If the Vs 30 values do not show or are not expected to show clustering (i.e., the geologic condition of the soil varies widely over the region), the smaller ranges reported in Figure 3.6a will be appropriate. Clustering can be quantified by constructing the semivariogram of the Vs 30’s as explained previously or by using a simplified visual approach described in Section B.4.

Case 2: If the Vs 30 values show or are expected to show clustering (i.e., there are clusters of sites in which the geologic conditions of the soil are similar), the larger ranges reported in Figure 3.6a should be chosen.

Based on these conclusions, the following model was developed to predict a suitable range based on the period of interest:

At short periods (T < 1 second), for case 1:

$$ b = 8.5 + 17.2\,T \tag{3.17} $$

At short periods (T < 1 second), for case 2:

$$ b = 40.7 - 15.0\,T \tag{3.18} $$

At long periods (T ≥ 1 second), for both cases 1 and 2:

$$ b = 22.0 + 3.7\,T \tag{3.19} $$

where b denotes the range of the exponential semivariogram (equation 3.5), and T denotes the period. Based on this model, the correlation between normalized intra-event residuals separated by h km is obtained as follows (from equations 3.5 and 3.14):

$$ \rho(h) = \exp(-3h/b) \tag{3.20} $$
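The range model and the resulting correlation function can be sketched in a few lines of Python. This is only an illustration of equations 3.17 to 3.20; the function names and the boolean flag `vs30_clustered` (False for case 1, True for case 2) are hypothetical and do not come from the report.

```python
import math


def semivariogram_range_km(period_s: float, vs30_clustered: bool) -> float:
    """Range b (km) of the exponential semivariogram, equations 3.17-3.19."""
    if period_s < 0.0:
        raise ValueError("period must be non-negative")
    if period_s < 1.0:
        if vs30_clustered:
            return 40.7 - 15.0 * period_s   # case 2, equation 3.18
        return 8.5 + 17.2 * period_s        # case 1, equation 3.17
    return 22.0 + 3.7 * period_s            # both cases, equation 3.19


def correlation(h_km: float, period_s: float, vs30_clustered: bool) -> float:
    """Correlation between residuals at sites h km apart, equation 3.20."""
    b = semivariogram_range_km(period_s, vs30_clustered)
    return math.exp(-3.0 * h_km / b)
```

Note that the three expressions all give b = 25.7 km at T = 1 second, so the model is continuous at the boundary between short and long periods.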

Note that the correlations between intra-event residuals exactly equal the correlations between the normalized intra-event residuals defined above. The plot of the predicted range versus period is shown in Figure 3.6b. The model has been developed based on only seven earthquakes, but since the trends exhibited were found to be similar for these seven, it can be expected that the model will predict reasonable ranges for future earthquakes. The predictive model can be used for simulating correlated ground-motion fields for a particular earthquake as follows:

Step 1: Obtain median ground-motion values (denoted Ȳ_ij in equation 3.1) at the sites of interest using a ground-motion model.

Step 2: Probabilistically generate (simulate) the inter-event residual term (η_j in equation 3.1), which follows a univariate normal distribution. The mean of the inter-event residual is zero, and its standard deviation can be obtained using ground-motion models.

Step 3: Simulate the intra-event residuals (ε_ij in equation 3.1) using the standard deviations from the ground-motion models and the correlations from equations 3.17 to 3.20.

Step 4: Combine the three terms generated in Steps 1 to 3 using equation 3.1 to obtain simulated ground motions at the sites of interest.
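The four steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: equation 3.1 is assumed to be additive in log space (log median plus inter-event residual plus intra-event residual), the log medians and the standard deviations `sigma` and `tau` are taken as inputs that would come from a ground-motion model, and the function name and arguments are hypothetical.

```python
import numpy as np


def simulate_ln_intensities(ln_median, coords_km, sigma, tau, range_km, rng):
    """One realization of ln(Y) at n sites for a single earthquake."""
    # Step 1 (input): log of the median intensities from a ground-motion model.
    ln_median = np.asarray(ln_median, dtype=float)
    coords_km = np.asarray(coords_km, dtype=float)
    n = len(ln_median)

    # Pairwise site-to-site separation distances (km).
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))

    # Step 2: one inter-event residual, shared by all sites.
    inter = rng.normal(0.0, tau)

    # Step 3: spatially correlated intra-event residuals (equation 3.20).
    rho = np.exp(-3.0 * h / range_km)
    sig = np.broadcast_to(np.asarray(sigma, dtype=float), (n,))
    cov = np.outer(sig, sig) * rho
    intra = rng.multivariate_normal(np.zeros(n), cov)

    # Step 4: combine the terms (assumed additive form of equation 3.1).
    return ln_median + inter + intra
```

Calling the function repeatedly with the same generator, for example `rng = np.random.default_rng(1)`, yields a set of correlated ground-motion fields for the earthquake; `np.exp` of the result gives the intensities themselves.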

3.5

Isotropy of semivariograms

This section examines the assumption of isotropy of semivariograms using the ground motions discussed previously.

3.5.1

Isotropy of intra-event residuals

A stationary semivariogram (γ(h)) is said to be isotropic if it depends only on the separation distance h = ‖h‖, rather than the separation vector h. Anisotropy is said to be present when the semivariogram is also influenced by the orientation of the data locations. The presence of anisotropy can be studied using directional semivariograms [Goovaerts, 1997]. Directional semivariograms are obtained as shown in equation 3.4, except that the estimate is obtained using only pairs (z_{u_α}, z_{u_α+h}) for which the azimuth of the vector h equals the specified value. Since an isotropic semivariogram is independent of data orientation, the directional semivariograms obtained considering any specific azimuth will be identical to the isotropic semivariogram if the data are in fact isotropic. Differences between the directional semivariograms indicate one of two different forms of anisotropy, namely, geometric anisotropy and zonal anisotropy. Geometric anisotropy is said to be present if directional semivariograms with differing azimuths have differing ranges. Zonal anisotropy is indicated by a variation in the sill with azimuth.

3.5.2

Construction of a directional semivariogram

A directional semivariogram is specified by several parameters, as illustrated in Figure 3.7a. The parameters include the azimuth of the direction vector (the azimuth angle (φ) is measured from the North), the azimuth tolerance (δφ), the bin separation (h) and the bin width (δh). A semivariogram obtained using all pairs of points irrespective of the azimuth is known as the omni-directional semivariogram, and is an accurate measure of spatial correlation in the presence of isotropy. (The semivariograms that have been described in the previous sections are omni-directional semivariograms.) In determining the experimental semivariogram in any bin, only pairs of sites separated by distances in the range [h − δh/2, h + δh/2], and with azimuths in the range [φ − δφ, φ + δφ], are considered. For example, let α be a site located in a 2-dimensional region, as shown in Figure 3.7a. It is intended to construct a directional semivariogram with an azimuth of φ (as marked in the figure). The computation of the experimental semivariogram value (γ̂(h)) involves pairing up the data values at all sites falling within the hatched region (the region that satisfies the conditions on the separation distance and the azimuth, as mentioned above) with the data value at site α (i.e., u_α). The area of the hatched region is defined by the azimuth tolerance used and can be seen to increase with increase in separation distance (h) (Figure 3.7a). For large values of h, the area of the hatched region will be undesirably large and hence, in addition to placing constraints on the azimuth tolerance, a constraint is explicitly specified



on the bandwidth of the region of interest, as marked in the figure. It is usually difficult to compute experimental directional semivariograms on account of the need to obtain pairs of sites oriented along pre-specified directions. Hence, it is required that the bin width, the azimuth tolerance and the bandwidth be specified liberally while constructing directional semivariograms. The results reported in this chapter are obtained by considering a bin separation of 4 km, a bin width of 4 km, an azimuth tolerance of 10° and a bandwidth of 10 km. Directional semivariograms are plotted for azimuths of 0°, 45° and 90° in order to capture the effects of anisotropy, if any.
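The pair-selection rules described in this section can be sketched as follows. This is an illustrative sketch, not the code used for the reported results. It assumes site coordinates are given as (east, north) in km, that the azimuth is measured clockwise from North, and that the bandwidth constraint limits the perpendicular offset of a pair from the direction line; the function and argument names are hypothetical. The default parameter values are those quoted above.

```python
import numpy as np


def directional_semivariogram(coords_km, values, azimuth_deg, lags_km,
                              bin_width_km=4.0, az_tol_deg=10.0,
                              bandwidth_km=10.0):
    """Experimental directional semivariogram at the given bin separations."""
    coords = np.asarray(coords_km, dtype=float)
    z = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(z), k=1)            # every pair of sites once
    d_east = coords[j, 0] - coords[i, 0]
    d_north = coords[j, 1] - coords[i, 1]
    dist = np.hypot(d_east, d_north)
    az = np.degrees(np.arctan2(d_east, d_north))   # azimuth from North

    # Angular deviation from the target azimuth; a pair and its reverse
    # describe the same orientation, so angles are compared modulo 180.
    dev = np.abs(az - azimuth_deg) % 180.0
    dev = np.minimum(dev, 180.0 - dev)
    offset = dist * np.sin(np.radians(dev))        # distance from direction line
    keep = (dev <= az_tol_deg) & (offset <= bandwidth_km)

    sq_diff = (z[i] - z[j]) ** 2
    gamma = np.full(len(lags_km), np.nan)
    counts = np.zeros(len(lags_km), dtype=int)
    for k, lag in enumerate(lags_km):
        in_bin = keep & (np.abs(dist - lag) <= bin_width_km / 2.0)
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = 0.5 * sq_diff[in_bin].mean()
    return gamma, counts
```

Bins with no qualifying pairs are returned as NaN, which makes the sparsity of directional pairs (the difficulty noted above) visible in the output.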

3.5.3

Test for anisotropy using Northridge ground motion data

Figures 3.7b-d show the omni-directional and the three directional experimental semivariograms of the 2-second ε̃’s from the Northridge earthquake data. The semivariogram function shown in the figures is the exponential model with a unit sill and a range of 42 km. This exponential model (obtained assuming isotropy in section 3.4.2) fits all the experimental directional semivariograms reasonably well (at short separations, which are of interest). This is a good indication that the semivariogram is isotropic. Similar results were obtained at other periods and for other earthquakes (Appendix B and Bazzurro et al. [2008]).

3.6

Comparison with previous research

Researchers have previously computed the correlation between ground-motion intensities using observed peak ground accelerations, peak ground velocities and spectral accelerations. These works, however, differ widely in the estimated rate of decay of correlation with separation distance. This section compares the results observed in the current work to those in the literature and also discusses possible reasons for the apparent inconsistencies in the previous estimates. Wang and Takada [2005] used the ground-motion relationship of Annaka et al. [1997] to compute the normalized auto-covariance function of residuals computed using the Chi-Chi earthquake peak ground velocities (PGV). They used an exponential model to fit the discrete experimental covariance values and reported a result which is equivalent to the



Figure 3.7: (a) Parameters of a directional semivariogram. Subfigures (b), (c) and (d) show experimental directional semivariograms at discrete separations obtained using Northridge earthquake ε̃ values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (b) azimuth = 0°; (c) azimuth = 45°; (d) azimuth = 90°.



following semivariogram:

$$ \gamma(h) = 1 - \exp(-3h/83.4) \tag{3.21} $$

This semivariogram has a unit sill and a range of 83.4 km (from equation 3.5). The current work does not consider the spatial correlation between PGV-based residuals. The PGVs, however, are comparable to spectral accelerations computed at moderate periods (0.5 to 1 s), and hence, the semivariogram ranges of residuals computed from PGVs can be qualitatively compared to the corresponding ranges estimated in this work (Figure 3.6a). It can be seen that the range reported by Wang and Takada [2005] is substantially higher than the ranges observed in the current work. In order to explain this inconsistency, the correlations computed by Wang and Takada [2005] are recomputed in the current work using the Chi-Chi earthquake time histories available in the NGA database and the ground-motion model of Annaka et al. [1997]. The Annaka et al. [1997] ground-motion model does not explicitly capture the effect of local-site conditions. To account for the local-site effects, Wang and Takada [2005] amplified the predicted PGV at all sites by a factor of 2.0 and the same amplification is carried out here for consistency. The observed and the predicted PGVs are used to compute residuals, and the experimental semivariograms (at discrete separations) of these residuals are estimated (considering a bin size of 4 km) using the procedures discussed previously in this chapter. Figure 3.8a shows the experimental semivariogram obtained, along with an exponential semivariogram function having a unit sill and a range of 83.4 km (there are slight differences between this experimental semivariogram and the one shown in Wang and Takada [2005], possibly due to the differences in processing carried out on the raw data or the specific recordings used). It is clear from Figure 3.8a, as well as the results presented in Wang and Takada [2005], that the exponential model with a range of 83.4 km does not provide an accurate fit to the experimental semivariogram values at small separation distances.
This is because Wang and Takada [2005] minimized the fitting error over all distances to obtain their model. In the literature, several research works use the method of least squares (or visual methods that attempt to minimize the fitting error over all distances, which in effect, produces fits similar to the least-squares fit), to fit a model to an experimental semivariogram [Goda



Figure 3.8: Semivariogram obtained using residuals computed based on Chi-Chi earthquake peak ground velocities: (a) residuals from Annaka et al. [1997] and semivariogram model from Wang and Takada [2005] (b) residuals from Annaka et al. [1997] and semivariogram fitted to model the discrete values well at short separation distances (c) residuals from Annaka et al. [1997], considering random amplification factors.



and Hong, 2008, Hayashi et al., 2006, Wang and Takada, 2005]. There are three major drawbacks in using the method of least squares to fit an experimental semivariogram:

(a) As explained in section 3.4.2, it is more important to model the semivariogram structure well at short separation distances than at long separation distances. This is because of the low correlation between intensities at well-separated sites and the screening of a faraway site by more closely-located sites [Goovaerts, 1997]. It is, therefore, inefficient if a fit is obtained by assigning equal weights to the data points at all separation distances, as done in the method of least squares.

(b) The results provided by the method of least squares are highly sensitive to the presence of outliers (because differences between the observed and predicted γ(h)’s are squared, any observed γ(h) lying away from the general trend will have a disproportionate influence on the fit).

(c) The least-squares fit results can be sensitive to the maximum separation distance considered. This is of particular significance if the method of least squares is used to determine the sill of the semivariogram in addition to its range.

Some of these drawbacks can be corrected within the framework of the least-squares method. Drawback (a) can be partly overcome by assigning large weights to the data points at short separation distances. The presence of outliers can be checked rigorously using standard statistical techniques [Kutner et al., 2005] and the least-squares fit can be obtained after eliminating the outliers in order to overcome drawback (b). These procedures, however, add to the complexity of the approach. For this reason, experimental semivariograms are fitted manually rather than using the method of least squares in the current work [as recommended by Deutsch and Journel, 1998].
This approach allows one to overlook outliers and also to focus on the semivariogram model at distances that are of practical interest. Though this method is more subjective than the method of least squares, experience shows that the results obtained are reasonably robust. Figure 3.8b shows the experimental semivariogram (identical to the one shown in Figure 3.8a) along with an exponential function, which is manually fitted to model the experimental semivariogram values well at short separation distances. The range of this exponential model equals 55 km, which is much less than the range of 83.4 km mentioned earlier, and is closer to the results reported earlier for the Chi-Chi spectral accelerations.
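For readers who prefer an automated alternative to manual fitting, the weighting idea mentioned under drawback (a) can be sketched as a weighted least-squares search over candidate ranges. This is not the procedure used in this work (the fits here are manual); the weighting scheme, the candidate grid and the function name are all assumptions made for illustration, and the sill is fixed at one.

```python
import numpy as np


def fit_range_weighted(h_km, gamma_hat, weights, candidate_ranges_km):
    """Range of a unit-sill exponential model minimizing weighted squared error."""
    h = np.asarray(h_km, dtype=float)
    g = np.asarray(gamma_hat, dtype=float)
    w = np.asarray(weights, dtype=float)
    best_b, best_err = None, np.inf
    for b in candidate_ranges_km:
        model = 1.0 - np.exp(-3.0 * h / b)     # exponential semivariogram
        err = np.sum(w * (g - model) ** 2)     # weighted squared misfit
        if err < best_err:
            best_b, best_err = float(b), err
    return best_b
```

A weight such as 1/h (hypothetical choice) gives more influence to the short separation distances that matter most in practice; setting all weights to one recovers the ordinary least-squares fit discussed above.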



The large range reported in Wang and Takada [2005] may also be due to inaccuracies in modeling the local-site effects. As explained in section 3.4.2, errors in capturing the local-site effects will cause systematic errors in the predicted ground motions that will result in an increase in the range of the semivariogram. Using a constant amplification factor of 2.0 (without considering the actual local-site effects) will produce even larger systematic errors in the predicted ground motions than considered previously. Consider a complementary hypothetical example in which the ground-motion amplification factor for each site is considered to be an independent random variable, uniformly distributed between 1.0 and 2.0. Randomizing the ground-motion amplification will break up the correlation between the prediction errors in a cluster of closely-spaced sites. The semivariogram of residuals obtained considering such random amplification factors is shown in Figure 3.8c. The range of this semivariogram equals 43 km, which is less than the 55 km from Figure 3.8b. The true amplifications are neither constant at 2.0, nor totally random between 1.0 and 2.0. Hence, the range of the semivariogram is expected to lie between 43 km and 55 km, which is close to the range observed using short-period spectral accelerations in the current work. Boore et al. [2003] estimated correlations between the PGA residuals computed from the Northridge earthquake. They observed that the correlations dropped to zero when the inter-site separation distance was approximately 10 km. This matches the range of 10 km estimated in the current work using the Northridge earthquake PGAs (Figure 3.2a). Those results appear to be consistent with the results shown here (and it is interesting to note that the two efforts used different estimation procedures and data sets).
The observations in the current work are also consistent with those reported in Goda and Hong [2008] who reported a more rapid decrease in correlations with distance for the Northridge earthquake ground motions than for the Chi-Chi earthquake ground motions. They also reported that the decay of spatial correlation of the residuals computed from spectral accelerations is more gradual at longer periods, a feature observed and analyzed in the current research work. The current work adds plausible physical explanations for these empirically-observed trends.


3.7


Conclusions

Geostatistical tools have been used to quantify the correlation between spatially-distributed ground-motion intensities. The correlation is known to decrease with increase in the separation between the sites, and this correlation structure can be modeled using semivariograms. A semivariogram is a measure of the average dissimilarity between the data, whose functional form, sill and range uniquely identify the ground-motion correlation as a function of separation distance. Ground motions observed during the Northridge, Chi-Chi, Big Bear City, Parkfield, Alum Rock, Anza and Chino Hills earthquakes were used to compute the correlations between spatially-distributed spectral accelerations, at various spectral periods. The correlations were computed for normalized intra-event residuals, since the normalized intra-event residuals will be homoscedastic. The ground-motion model of Boore and Atkinson [2008] was used for the computations, but the results did not change when the Chiou and Youngs [2008] model was used instead. It was seen that the rate of decay of the correlation with separation typically decreases with increasing spectral period. It was reasoned that this could be because long period ground motions at two different sites tend to be more coherent than short period ground motions, on account of lesser wave scattering during propagation. It was also observed that, at periods longer than 2 seconds, the estimated correlations were similar for all the earthquake ground motions considered. At shorter periods, however, the correlations were found to be related to the site Vs 30 values. It was shown that the clustering of site Vs 30’s is likely to result in larger correlations between residuals. Based on these findings, a predictive model was developed that can be used to select appropriate correlation estimates for use in risk assessment of spatially-distributed building portfolios or infrastructure systems. 
The research work also investigates the effect of directivity on the correlations using pulse-like ground motions. The correlations obtained were similar to those estimated using all ground motions. The results, however, are not discussed in detail due to concerns about the reliability of the results on account of the small data set of pulse-like ground motions. The work also investigated the commonly-used assumption of isotropy in the correlation between residuals using directional semivariograms. If directional semivariograms



computed based on different azimuths are identical to the omni-directional semivariogram (which is obtained assuming isotropy), it can be concluded that the semivariograms (and therefore, the correlations) are isotropic. It was seen using empirical data that the correlations between Chi-Chi and Northridge earthquake intensities show isotropy at both short and long periods. The results obtained were also compared to those reported in the literature [Goda and Hong, 2008, Wang and Takada, 2005, Boore et al., 2003]. Wang and Takada [2005] report larger correlations using the PGVs computed using the Chi-Chi earthquake recordings than those reported in this work for spectral accelerations. It was shown that these larger correlations are a result of attempting to fit the experimental semivariogram reasonably well over the entire range of separation distances of interest (which is a typical result of using least-squares fits and eye-ball fits that produce results similar to least-squares fits), and of using a ground-motion model that does not account for the effect of local-site conditions. Typically, a semivariogram model should represent correlations accurately at small separations since ground motions at a site are more influenced by ground motions at nearby sites. The method of least squares assigns equal importance to all separation distances and is, therefore, inefficient. In the current research work, semivariogram models are fitted manually with emphasis on accurately modeling correlations at small separations. This study illustrates various factors that affect the spatial correlation between ground-motion intensities, and provides a basis to choose an appropriate model using empirical data. The proposed predictive model can be used for obtaining the joint distribution of spatially-distributed ground-motion intensities, which is necessary for a variety of seismic hazard calculations.

Chapter 4

Spatial correlation between spectral accelerations using simulated ground-motion time histories

Jayaram, N., Park, J., Bazzurro, P. and Tothong, P. (2010). Estimation of spatial correlation between spectral accelerations using simulated ground-motion time histories, 9th U.S. National and 10th Canadian Conference on Earthquake Engineering, Toronto, Canada.

4.1

Abstract

The impact of earthquakes on a region rather than on just a single property at a specific site is of interest to several public and private stakeholders, including government and relief organizations that are in charge of disaster mitigation and post-disaster response planning and management, and private organizations that insure and manage spatially-distributed assets. Regional earthquake impact assessment requires knowledge about the distribution of ground-motion intensities over the entire region. Ground-motion models that are used for quantifying the hazard at a single site do not provide information on the spatial correlation between ground-motion intensities, which is required for the joint prediction of intensities at multiple sites. Statistical models that describe the spatial correlation between intensity measures are available in the literature, and the mathematics behind models that estimate

CHAPTER 4. SPATIAL CORRELATION ESTIMATES FROM SIMULATIONS


the spatial correlation as a function of site separation distance has already been developed. This study investigates whether a more sophisticated model of spatial correlation that incorporates features such as non-stationarity (variation of correlation with spatial location), anisotropy (directional dependence) and directivity effects (different correlation models for pulse-like and non-pulse-like ground motions) is warranted. Testing the need for these additional features, however, requires a large number of ground-motion time histories. Since real data are sparse, the current study uses simulated ground-motion time histories instead. Overall, this study tests and provides a basis for some of the subtle assumptions commonly used in spatial correlation models.

4.2

Introduction

The impact of earthquakes on a region rather than on just a single property at a specific site is of interest to several public and private stakeholders. In the aftermath of a large event, public entities such as government agencies and relief organizations, and private entities such as corporations and utilities need to assess the potential damage on a regional scale in order to plan their emergency response in a timely manner. These organizations also need to assess regional risks from future earthquakes in order to determine risk mitigation strategies such as retrofitting and acquiring insurance coverage. Regional earthquake impact assessment requires knowledge about the joint ground-motion hazard at multiple sites of interest spread over the entire region. Predictive equations have been developed for estimating the distribution of the ground-motion intensity that an earthquake can cause at a single site [e.g., Boore and Atkinson, 2008]. Much less attention has been devoted, however, to estimating the statistical dependence (spatial correlation) between ground-motion intensities generated by an earthquake at multiple sites. The spatial correlation between ground-motion intensity measures arises from many factors including common source effects (e.g., a high stress-drop earthquake may generate ground-motion intensities that are, on average, higher than the median values from events of the same magnitude), common path effects (the seismic waves travel over a similar path from the source to two nearby sites) and common site effects (similar non-linear amplification at two nearby sites due to proximity). Modern ground-motion models implicitly account for



a part of the dependence via a specific inter-event error term, η_i, as follows [e.g., Boore and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]:

$$ \ln Y_{ij} = \ln \bar{Y}_{ij} + \sigma \varepsilon_{ij} + \tau \eta_i \tag{4.1} $$

where Y_ij denotes the ground-motion intensity parameter of interest (e.g., Sa(T), the spectral acceleration at period T) at site j during earthquake i; Ȳ_ij is the median value of Y_ij predicted by the ground-motion model at site j for earthquake i (which depends on parameters such as magnitude, distance of source from site, local site conditions); η_i is the normalized inter-event standard normal residual; ε_ij is the site-to-site normalized intra-event standard normal residual; τ and σ are the corresponding standard deviations of the two residuals. While the ground-motion model in Equation 4.1 partially accounts for the correlation of Y_ij at different sites via the common η_i, there is a significant amount of unaccounted correlation in the ε_ij’s, which is not quantified by the ground-motion models. It is of interest in this study to further explore the properties of this correlation. An alternative formulation for Equation 4.1, which was common in older prediction equations, is given by

$$ \ln Y_{ij} = \ln \bar{Y}_{ij} + \tilde{\sigma}\, \tilde{\varepsilon}_{ij} \tag{4.2} $$

where ε̃_ij is a random variable called the normalized total residual, which represents both the inter-event and the intra-event variability at site j from earthquake i. Comparing Equations 4.1 and 4.2, it is seen that

$$ \tilde{\sigma} = \sqrt{\sigma^2 + \tau^2} \tag{4.3} $$

$$ \tilde{\varepsilon}_{ij} = \frac{\tau \eta_i + \sigma \varepsilon_{ij}}{\tilde{\sigma}} \tag{4.4} $$

This study intends to empirically estimate the correlation between the intra-event residuals (ε_ij) using ground-motion time histories. Since the inter-event residual is a constant across all sites during a given earthquake, the correlation between ε_ij’s equals the correlation between ε̃_ij’s [Jayaram and Baker, 2009a] (Chapter 3 of this thesis). While estimating spatial correlations, it is convenient to directly work with total residuals (Equation 4.2) since the values of ε̃_ij can be directly computed from the ground-motion observations without the knowledge about η_i. In the past, researchers have estimated the spatial correlations between the total residuals using recorded ground-motion data [e.g., Wang and Takada, 2005, Jayaram and Baker, 2009a]. Using geostatistical tools, Jayaram and Baker [2009a] identified various factors influencing the extent of the spatial correlation, and developed a predictive model that can be used to select appropriate correlation estimates. While recorded ground motions represent the natural source for estimating the extent of correlation between ground-motion intensities at two sites, they do not suffice for investigating the validity of assumptions such as second-order stationarity (i.e., dependence of correlation on just the separation between sites, and not on the actual location of the sites) and isotropy (i.e., invariance of correlation with the orientation of the sites) that are commonly used in the spatial correlation models developed so far. This is on account of the scarcity of ground-motion recordings for any particular earthquake. This limitation can be partially overcome by using simulated ground motions. Although the simulations may not be complete substitutes for recorded data, they are still extremely useful for testing and refining existing correlation models (which requires large amounts of data). This chapter describes the tests carried out to verify the commonly-used assumptions of stationarity and isotropy using ground motions simulated by Dr. Brad Aagaard of the United States Geological Survey based on the 1989 Loma Prieta earthquake source model [Aagaard et al., 2008]. Further, tests carried out to verify whether pulse-like ground motions that arise due to directivity effects and non-pulse-like ground motions have similar correlation structures are also described. Information about tests carried out using other sets of simulated ground motions can be found in Bazzurro et al. [2008].


4.3


Statistical estimation of spatial correlation

The current work uses geostatistical tools previously used by Jayaram and Baker [2009a] to empirically estimate the spatial correlations of residuals from simulated ground-motion time histories. These tools are described briefly in this section; a detailed discussion can be found in, for example, Deutsch and Journel [1998] and Jayaram and Baker [2009a] (Chapter 3 of this thesis).

Let $\tilde{\varepsilon}$ denote the normalized total residuals distributed over space. The correlation structure of $\tilde{\varepsilon}$ (equivalently, that of $\varepsilon$) can be represented using a semivariogram, which is a measure of the dissimilarity between the residuals. Let $u$ and $u'$ denote two sites separated by distance vector $\mathbf{h}$. The semivariogram $\gamma(u, u')$ is defined as follows:

$$\gamma(u, u') = \frac{1}{2} E\left[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u'}\}^2\right] \tag{4.5}$$

The semivariogram defined in Equation 4.5 is location-dependent and its inference requires repetitive realizations of $\tilde{\varepsilon}$ at locations $u$ and $u'$. Such repetitive measurements are, however, never available in practice. Hence, it is typically assumed that the semivariogram does not depend on the site locations $u$ and $u'$, but only on their separation $\mathbf{h}$, to obtain a stationary semivariogram. The stationary semivariogram $\gamma(\mathbf{h})$ can then be estimated as follows:

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u+\mathbf{h}}\}^2\right] \tag{4.6}$$

A stationary semivariogram is said to be isotropic if it is a function of the separation distance ($h = \|\mathbf{h}\|$) rather than the separation vector $\mathbf{h}$. An isotropic, stationary semivariogram can be empirically estimated from a data set as follows:

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \{\tilde{\varepsilon}_{u_\alpha} - \tilde{\varepsilon}_{u_\alpha+h}\}^2 \tag{4.7}$$

where $\hat{\gamma}(h)$ is the experimental stationary isotropic semivariogram (estimated from a data set); $N(h)$ denotes the number of pairs of sites separated by $h$; and $\{\tilde{\varepsilon}_{u_\alpha}, \tilde{\varepsilon}_{u_\alpha+h}\}$ denotes the $\alpha$'th such pair.
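The estimator in Equation 4.7 can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the procedure used in this study: the function name, the input layout (planar site coordinates in km) and the use of fixed-width distance bins to group pairs "separated by $h$" are all hypothetical choices.

```python
import math


def experimental_semivariogram(coords, residuals, bin_width, max_dist):
    """Binned estimate of the isotropic semivariogram of Equation 4.7.

    coords    : list of (x, y) site coordinates in km (assumed layout)
    residuals : list of normalized total residuals, one per site
    bin_width : width of each separation-distance bin in km (assumption)
    max_dist  : pairs at or beyond this separation are ignored
    Returns a list of (bin-centre distance, gamma_hat, number of pairs).
    """
    n_bins = int(math.ceil(max_dist / bin_width))
    sq_diff_sum = [0.0] * n_bins
    pair_count = [0] * n_bins

    # Visit every pair of sites once and accumulate squared differences.
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            if h >= max_dist:
                continue
            k = int(h // bin_width)
            sq_diff_sum[k] += (residuals[i] - residuals[j]) ** 2
            pair_count[k] += 1

    # gamma_hat(h) = sum of squared differences / (2 N(h)); skip empty bins.
    result = []
    for k in range(n_bins):
        if pair_count[k] > 0:
            gamma_hat = sq_diff_sum[k] / (2.0 * pair_count[k])
            result.append(((k + 0.5) * bin_width, gamma_hat, pair_count[k]))
    return result
```

The double loop is quadratic in the number of sites, which is acceptable for a sketch but would need a spatial index or subsampling for a data set with tens of thousands of sites.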



When empirically estimated, $\hat{\gamma}(h)$ only provides semivariogram values at discrete values of $h$, and hence, a continuous function is usually fitted to the discrete values to obtain the semivariogram for continuous values of $h$. The exponential function shown below is commonly used for this purpose.

$$\hat{\gamma}(h) = a\left[1 - \exp(-3h/b)\right] \tag{4.8}$$

where $a$ denotes the 'sill' of the semivariogram (which equals the variance of the data) and $b$ denotes the 'range' of the semivariogram (which equals the separation distance $h$ at which $\hat{\gamma}(h)$ equals $0.95a$).

It can be theoretically shown that the spatial correlation function ($\rho(h)$) for normalized total residuals (and therefore, for normalized intra-event residuals) can be computed from the semivariogram function as follows:

$$\hat{\gamma}(h) = a\left(1 - \rho(h)\right) \tag{4.9}$$

Therefore, it can be seen that the correlations are completely defined by the semivariogram, which in turn, is a function only of the range. (The sill is known to equal 1, which is the variance of the normalized residuals for which the semivariogram is constructed.) Moreover, note from Equations 4.8 and 4.9 that a larger range implies a smaller rate of increase in $\hat{\gamma}(h)$ with $h$, and consequently, a smaller rate of decay of correlation with separation distance.
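The relationship between the range and the correlation can be checked numerically with the short Python sketch below. It simply evaluates Equations 4.8 and 4.9; the function names and the default values (a unit sill and a 30 km range) are illustrative assumptions.

```python
import math


def exponential_semivariogram(h, sill=1.0, range_km=30.0):
    """Exponential semivariogram model of Equation 4.8."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))


def correlation_from_semivariogram(h, sill=1.0, range_km=30.0):
    """Correlation implied by Equation 4.9: rho(h) = 1 - gamma(h) / a."""
    return 1.0 - exponential_semivariogram(h, sill, range_km) / sill
```

With a unit sill, the implied correlation reduces to $\exp(-3h/b)$, so at a fixed separation a larger range gives a larger correlation, and at $h = b$ the semivariogram reaches roughly 95% of the sill.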

4.4


This section describes the tests carried out to verify the commonly-used assumptions of stationarity and isotropy using ground motions simulated by Dr. Brad Aagaard of the United States Geological Survey for the 1989 Loma Prieta earthquake source model [Aagaard et al., 2008]. Further, tests carried out to verify whether pulse-like ground motions that arise due to directivity effects and non-pulse-like ground motions have similar correlation structures are also described. The simulated 1989 Loma Prieta data set contains ground-motion time histories at 35,547 sites. Soft soil sites with Vs30 ≤ 500 m/s are excluded



from the tests, due to concerns about the ability of the simulation methodology to capture nonlinear soil behavior. Also, the current limitations in the simulation procedure allow us to investigate the spatial correlation of spectral accelerations only at periods longer than 2s. The total residuals, $\tilde{\varepsilon}$'s, are computed from the fault normal Sa(T) values with T = 2s, 5s, 7.5s, and 10s using the Boore and Atkinson [2008] ground-motion model. Using the geostatistical procedure described in the previous section, discrete semivariogram values are estimated for these residuals, and an exponential function (Equation 4.8) is subsequently fitted to the discrete values.

Figure 4.1 shows a sample semivariogram obtained using the residuals corresponding to Sa(T = 2s). This semivariogram has a sill of 1 and a range of 30km. The ranges of the semivariograms obtained using the fault normal residuals at the four different periods are plotted in Figure 4.2a. As mentioned earlier, the range is an indicator of the extent of spatial correlation, and a larger range implies a larger amount of spatial correlation. Figure 4.2a shows that the range, and therefore the amount of spatial correlation, increases with oscillator period. This trend is expected because the coherency between the period components of the ground motion increases with period [Der Kiureghian, 1996].

Note that the ranges obtained from this simulated 1989 Loma Prieta data set are slightly larger than those from recorded ground motions computed by Jayaram and Baker [2009a] shown in Figure 4.2b. This means that this simulated ground-motion data set is more spatially correlated than the real, recorded data sets analyzed so far. While uncovering the reasons for this apparent discrepancy is beyond the scope of this study, this finding can perhaps be used to enhance the simulation technique. 
Despite this limitation, it is assumed that the large number of simulated ground-motions contains useful information for studying the isotropy and the second-order stationarity assumptions of spatial-correlation models. These tests can be performed irrespective of the actual extent of correlations measured.

4.4.1

Effect of ground-motion component orientation on the semivariogram range

In order to test whether the extent of spatial correlation is a function of the orientation of the ground-motion component, semivariograms of residuals are estimated using the fault



Figure 4.1: Semivariogram computed using the Sa(T=2s) residuals.

Figure 4.2: Ranges of semivariograms obtained using residuals computed from the (a) 1989 Loma Prieta simulations (b) recorded ground motions [Jayaram and Baker, 2009a].



normal, fault parallel, north-south and east-west components of the simulated ground motions. The ranges of these semivariograms are shown in Figure 4.3a. The range estimates are essentially identical for Sa at T =2s, and do not show a significant variation with the orientation at longer periods. Hence, most of the following analyses in this chapter are based on the fault normal components of the simulated ground motions.

4.4.2

Testing the assumption of isotropy using directional semivariograms

Directional semivariograms of residuals [Deutsch and Journel, 1998, Jayaram and Baker, 2009a] (illustrated in Chapter 3, Appendix B) are obtained as shown in Equation 4.6 except that the estimates are obtained using only pairs of $\{\tilde{\varepsilon}_{u_\alpha}, \tilde{\varepsilon}_{u_\alpha+h}\}$ such that the azimuth of the vector $\mathbf{h}$ is identical (or, strictly speaking, within a narrow band of azimuths) for all the pairs utilized. This study considers azimuth angles of 0°, 45° and 90°. If anisotropy is present in the data, the semivariograms along the pre-specified azimuths will differ from each other and from the omni-directional semivariogram (i.e., the semivariogram obtained using all pairs of points irrespective of the azimuth).

Figure 4.3b compares the omni-directional semivariogram with the semivariograms obtained by considering azimuths of 0°, 45° and 90° for residuals for Sa(T = 2s). All the semivariograms are almost identical for separation distances below 10km and are reasonably close for separation distances between 10km and 20km. Recall that during the characterization of the distribution of ground-motion intensities over a region, it is more important to capture the effects of the spatial correlation at short separation distances since the extent of spatial correlation decreases rapidly with separation distance. Also, in addition to having low correlation, widely separated sites have little impact on each other due to an effective 'screening' of their influence by more closely-located sites [Deutsch and Journel, 1998]. As a result, since the semivariograms in Figure 4.3b are nearly identical at short separation distances, it can be reasonably concluded that, at least for this data set, the spatial correlations can be adequately represented using an isotropic model. Tests carried out using this Loma Prieta simulated data set for residuals computed for Sa at longer periods showed similar results as well [Bazzurro et al., 2008].
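The pair selection behind a directional semivariogram can be sketched in Python as follows. This is a hedged illustration rather than the implementation used for Figure 4.3b: the azimuth convention (degrees clockwise from the +y axis, folded into [0°, 180°) because a pair has no preferred direction), the angular tolerance, the distance binning and all function names are assumptions.

```python
import math


def azimuth_deg(p, q):
    """Azimuth of the vector from p to q, in degrees clockwise from the +y
    axis and folded into [0, 180). The convention is an assumption."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.degrees(math.atan2(dx, dy)) % 180.0


def directional_semivariogram(coords, residuals, azimuth, tol,
                              bin_width, max_dist):
    """Binned semivariogram using only pairs whose azimuth lies within
    +/- tol degrees of the requested azimuth."""
    n_bins = int(math.ceil(max_dist / bin_width))
    sq_diff_sum = [0.0] * n_bins
    pair_count = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            if h == 0.0 or h >= max_dist:
                continue
            # Angular difference on a 180-degree circle.
            diff = abs(azimuth_deg(coords[i], coords[j]) - azimuth) % 180.0
            diff = min(diff, 180.0 - diff)
            if diff > tol:
                continue
            k = int(h // bin_width)
            sq_diff_sum[k] += (residuals[i] - residuals[j]) ** 2
            pair_count[k] += 1
    return [((k + 0.5) * bin_width,
             sq_diff_sum[k] / (2.0 * pair_count[k]),
             pair_count[k])
            for k in range(n_bins) if pair_count[k] > 0]
```

Setting `tol` to 90 degrees admits every pair and recovers the omni-directional semivariogram, which gives a convenient baseline for the comparison described above.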


4.4.3


Testing the assumption of second-order stationarity

A spatial random function $Z$ is said to be second-order stationary if the random variables $Z_u$ and $Z_v$ (i.e., the random variables that represent the values of $Z$ at locations $u$ and $v$, respectively) have constant means and second-order statistics (i.e., the covariances) that depend only on the distance vector between $u$ and $v$ and not on the actual locations. In other words, the covariance is the same between any two sites that are separated by the same distance and direction (direction is not a concern for isotropic semivariograms), no matter where the sites are located with respect to the causative fault. The assumption of second-order stationarity is convenient while developing correlation models since it allows the data available over the entire region of interest to be pooled together and because it considerably simplifies the application of the spatial correlation models.

We know that the means of the residuals equal zero irrespective of the location of the residuals. Therefore, second-order stationarity can be tested by comparing the spatial correlation estimates obtained using residuals located in different spatial domains (e.g., using data from two groups of sites, one close to the fault and one far from it). Similar semivariograms imply that the actual spatial location of the sites where the ground-motion intensities are measured does not matter. In the current work, seven spatial domains are defined based on the distance of the sites from the rupture: Domain 1 includes sites between 0-20km while Domains 2-7 consist of sites between 20-40km, 40-60km, 60-80km, 80-120km, 120-160km and 160-200km of the rupture, respectively. Note that, as with histograms, the selection of the distance bins is somewhat arbitrary. Very narrow bins may provide results that are both unstable because of scarcity of data and potentially influenced by local effects (e.g., a cluster of sites with large residuals). 
Conversely, very broad bins may not detect any trend in the data, even if there is one. Here, the width of the domains is selected judiciously to avoid both the above pitfalls. The 1989 Loma Prieta fault normal ground motions are used to compute ε˜ values at four different periods, namely, 2s, 5s, 7.5s and 10s. Semivariograms are constructed for each spatial domain using only the residuals at sites that belong to that domain, and the estimated ranges are reported in Figure 4.4a. It can be seen that the ranges estimated using residuals at sites within 20-160km of the rupture are reasonably close to the range estimated



Figure 4.3: (a) Ranges are computed using residuals at different orientations (b) Omnidirectional (i.e., obtained using all pairs of points, irrespective of the azimuth) and directional semivariograms computed using residuals for Sa (T = 2s).

Figure 4.4: (a) Ranges are computed using residuals from different spatial domains (b) Ranges are computed using pulse-like and non-pulse-like near fault ground motions.



using all fault normal residuals ('all-site ranges'). The ranges computed using residuals at sites that are very close to or very far away from the rupture, however, differ more significantly from the all-site ranges. Semivariograms computed using sites that are farther than 160 km from the rupture show significantly smaller ranges, as do the semivariograms computed using sites that are within 20 km of the rupture. The ground-motion intensities at sites farther than 160 km from the rupture are generally very small and, therefore, accounting for the reduced correlations at these extremely far-off sites is certainly not critical. It is, however, important to further analyze the smaller correlations observed at near-fault locations. Intuitively, it is reasonable to expect path effects and small-scale variations to reduce spatial correlation between ground motions at near-fault sites. At sites farther than 20km, the path effects and small-scale variations have less differential influence, thereby resulting in larger ranges and, therefore, larger correlations.

4.4.4

Effect of directivity on spatial correlation

Ground motions at near-fault sites are often influenced by directivity effects, resulting in large amplitude pulse-like ground motions in the forward-directivity region [Somerville et al., 1997]. Most ground-motion models, however, do not explicitly capture this effect. Therefore, the residuals in such cases may be more correlated because of the additional prediction errors at sites influenced by directivity that are not captured by the ground-motion model. This study intends to verify whether the spatial correlation between pulse-like ground motions is different from that between non-pulse-like ground motions.

Baker [2007a] developed a technique that uses wavelet analysis to identify ground motions with pulses. Although not all the pulses identified by this technique are due to directivity effects, this approach provides a reasonable data set for studying the potential impact of directivity. The wavelet analysis procedure of Baker [2007a] is used to identify 434 pulses in the fault normal components of the 1989 Loma Prieta simulations (incidentally, the wavelet analysis procedure also identified 121 pulses in the fault parallel direction, which are not utilized here). Residuals at four different periods are computed based on these ground motions and semivariograms of the residuals are developed. The estimated ranges (shown in Figure 4.4b) of these semivariograms are smaller than those estimated based



on all the fault normal residuals, but similar to those estimated based on ground motions at all the sites that are within 20 km from the rupture (Figure 4.4a). For comparison, Figure 4.4b also shows the ranges obtained using ground motions at all the sites that do not have pulse-like ground motions, but are within 20 km from the rupture (called near-fault non-pulse records in the legend). It is seen that the ranges obtained in this case are similar to the ranges obtained using pulses. This indicates that the effect of directivity does not substantially alter the ranges of the semivariograms. It needs to be verified whether similar observations can be made using recorded ground motions as well.

4.5

Conclusions

This study investigates the validity of assumptions commonly used in spatial correlation models, namely stationarity (no variation of correlation with location) and isotropy (no directional dependence). Testing whether non-stationary or anisotropic features are needed, however, requires a large number of ground-motion time histories. Since real data are sparse, the tests are instead performed using simulated ground motions. This chapter describes the tests performed using ground-motion time histories simulated by Dr. Brad Aagaard for the 1989 Loma Prieta earthquake source model. Other data sets were considered in Bazzurro et al. [2008].

Geostatistical tools were used to measure the extent of spatial correlation between spectral accelerations using the simulated ground-motion data set. The correlations were estimated using different orientations of the time histories, namely, fault normal, fault parallel, north-south and east-west, and were found to be similar in all four cases. The assumption of isotropy of spatial correlations was studied using directional semivariograms, and was found to be reasonable. The correlations were seen to be smaller than average between sites located extremely close to the fault rupture. Intuitively, it is reasonable to expect path effects and small-scale variations to reduce spatial correlation between ground motions at near-fault sites. Incidentally, the ground-motion intensities at sites very far away from the rupture were also found to be less spatially correlated than average, but this finding is not of much practical importance. It is important, however, to further investigate the smaller correlations seen at near-fault sites. The pulse-identification algorithm of Baker [2007a]



was used for identifying pulse-like ground motions, and the correlations between pulse-like and non-pulse-like ground motions were compared. The study, however, did not find significant differences between the correlations in these two cases. Although some additional investigation using recorded time histories is needed, this study tests and provides a basis for some of the subtle assumptions commonly used in spatial correlation models.

Chapter 5 Simulation of spatially-correlated ground-motion intensities with and without consideration of recorded intensity values

5.1

Abstract

Quantifying the distribution of ground-motion intensities that might exist over a spatially distributed region during a future earthquake is important for several practical applications such as risk assessment and risk mitigation of spatially-distributed systems. Analytically, this is more complicated than a comparable quantification for only a single site, due to the interdependence between the intensities at multiple sites. As a result, simulation-based techniques are often used to quantify this distribution using probabilistically generated representative ground-motion intensity maps for the region. This chapter discusses two techniques, namely, single-step simulation and sequential simulation, for generating such intensity maps. It may also be of interest to estimate likely ground-motion intensities over a region in the wake of an earthquake, when ground-motion intensities have been recorded at one or




more locations in the region. These intensity estimates are useful, for instance, in determining optimal post-earthquake response strategies. In such cases, it is possible to use ground-motion intensities recorded during the earthquake to improve the ground-motion intensity prediction at sites where recordings are not available. This chapter discusses a sequential simulation technique for generating ground-motion intensity maps incorporating the information about the recorded intensities.

5.2

Introduction

Quantifying the distribution of ground-motion intensities that might exist over a spatially-distributed region during a future earthquake is of great interest for several practical applications. This is important, for instance, to predict (or estimate after an earthquake) the damage to portfolios of buildings and lifelines and the number of injuries and casualties in a certain region. This is, however, more complicated than a comparable quantification at a single site on account of the spatial correlation between the ground-motion intensities at two different sites. Hence, the distribution of spatial ground-motion intensities is often quantified using simulation-based approaches that involve probabilistically generating representative ground-motion intensity maps (a collection of intensities at all the sites of interest) for future earthquakes. For a given earthquake, the intensities are predicted using ground-motion models which take the following form [e.g., Boore and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]:

$$\ln(Sa_i) = \ln(\bar{Sa}_i) + \sigma_i \varepsilon_i + \tau_i \eta_i \tag{5.1}$$

where Sai denotes the spectral acceleration (at the period of interest) at site i; S¯ai denotes the predicted (by the ground-motion model) median spectral acceleration which depends on parameters such as magnitude, distance, period and local-site conditions; εi denotes the normalized intra-event residual and ηi denotes the normalized inter-event residual. Both εi and ηi are univariate normal random variables with zero mean and unit standard deviation. σi and τi are standard deviation terms that are estimated as part of the ground-motion model and are functions of the spectral period of interest, and in some models also functions of the



earthquake magnitude and the distance of the site from the rupture. The term σi εi is called the intra-event residual and the term τi ηi is called the inter-event residual. The inter-event residual is a constant across all the sites for any particular earthquake event. The sum of the inter-event residual and the intra-event residual is called the total residual. For a given earthquake, ground-motion intensities can be predicted by combining the median intensity estimate with simulated values (realizations) of the normalized inter-event and intra-event residuals, in accordance with Equation 5.1. Past research has indicated that the normalized intra-event residuals at two different sites are correlated, and the extent of this correlation depends on the separation distance between the sites [e.g., Jayaram and Baker, 2009a, Wang and Takada, 2005, Boore et al., 2003] (Chapter 3 of this thesis). Any simulation of the normalized intra-event residuals must account for this spatial correlation in order to accurately quantify the regional ground-motion hazard [e.g., Jayaram and Baker, 2010, Park et al., 2007] (Chapter 6 of this thesis). For illustration, Figure 5.1 shows a sample simulated ground-motion intensity map (the intensity measure used here is Sa (1s), the spectral acceleration at a period of 1 second) for a magnitude 8 earthquake on the San Andreas fault. Figure 5.1a shows the median Sa (1s) values estimated using the Boore and Atkinson [2008] ground-motion model. Figure 5.1b shows a sample realization of the sum of the inter-event and the intra-event residuals, obtained considering spatial correlation. Figure 5.1c shows the ground-motion intensities over the region obtained by combining the median intensities and the simulated residuals. 
While the above simulation technique is used for generating ground-motion maps in the absence of any recorded intensities (say, for a future earthquake), it is often of interest to quantify the ground-motion intensities over a region following an earthquake (e.g., for determining the optimal post-earthquake emergency response strategy). Ground-motion intensity predictions in such cases can be significantly improved (in other words, the uncertainty in the predictions can be reduced) by utilizing the knowledge about the recorded intensities. This chapter primarily focuses on the simulation of correlated residuals with and without consideration of recorded ground-motion intensities. A single-step simulation technique and a sequential simulation technique are described for simulating residuals in the



Figure 5.1: Ground-motion intensities map simulation: (a) median intensities (b) spatially correlated normalized total residuals and (c) total intensities.



absence of recorded intensities. The sequential simulation technique is subsequently extended to incorporate information from recorded intensities. This chapter is organized as follows. Sections 5.3.1 and 5.3.2 describe procedures for simulating a vector of spatially-correlated intra-event residuals and the inter-event residual for future earthquakes. Section 5.4 describes an importance sampling procedure for spatially-correlated residuals (used by Jayaram and Baker [2010] for improving the computational efficiency of the lifeline risk assessment process). Section 5.5 describes a simulation procedure that uses information about recorded ground-motion intensities for simulating post-earthquake residuals.

5.3

Simulation of correlated normalized residuals without consideration of recorded ground-motion intensities

This section describes a single-step and a sequential simulation technique for simulating correlated normalized residuals.

5.3.1

Single-step simulation technique

Simulation of normalized intra-event residuals

Chapter 2 [Jayaram and Baker, 2008] showed that a vector of spatially-distributed normalized intra-event residuals $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_p)$ (where $p$ denotes the total number of sites of interest) follows a multivariate normal distribution. This distribution is solely defined by the mean and the variance of the marginal distributions (i.e., the mean and the variance of $\varepsilon_i$, which are zero and one respectively), and the correlation between all $\varepsilon_i$ and $\varepsilon_j$ pairs. The correlation between the residuals is typically a function of the separation distance between the residuals, and can be obtained from empirical spatial correlation models [e.g., Jayaram and Baker, 2009a] (Chapter 3). The single-step simulation technique makes use of this fact to simulate normalized residuals as a vector of correlated standard normal random variables. In practice, this is done using a computer function if available. For instance, the command 'mvnrnd' in



MATLAB accepts input mean and covariance matrices and outputs a vector of correlated normally-distributed random variables. The mean matrix in this case is a vector of $p$ zeros, expressed as follows:

$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{5.2}$$

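The covariance matrix of $\boldsymbol{\varepsilon}$, denoted $\boldsymbol{\Sigma}$, can be expressed as follows:

$$\boldsymbol{\Sigma} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix} \tag{5.3}$$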

where $\rho_{ij}$ is the correlation between $\varepsilon_i$ and $\varepsilon_j$. Chapter 3 [Jayaram and Baker, 2009a] expressed $\rho_{ij}$ as $\exp(-3h/b)$, where $h$ is the separation distance between sites $i$ and $j$, and $b$ is called the range parameter, which controls the rate of decay of correlation with distance.

The random variables can also, in principle, be simulated by first simulating independent standard normal random variables (for instance, using the Box-Muller transform as described in Law and Kelton [2007]), denoted $\mathbf{n} = [n_1, n_2, \cdots, n_p]$, and by subsequently inducing the desired correlation between the independent variables using the Cholesky triangle. The procedure used to induce this correlation is described below. $\boldsymbol{\Sigma}$ can be decomposed using the Cholesky decomposition [Law and Kelton, 2007] as follows:

$$\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^t \tag{5.4}$$

where $\mathbf{L}$ is a lower triangular matrix of size $p$ by $p$ and $(\cdot)^t$ denotes the transpose operation. The vector of independent standard normal variable realizations ($\mathbf{n}$) can be converted to a vector of correlated standard normal variables ($\mathbf{e} = [e_1, e_2, \cdots, e_p]$) as follows:

$$\mathbf{e}^t = \mathbf{L}\,\mathbf{n}^t \tag{5.5}$$

This vector $\mathbf{e}$ serves as a realization of $\boldsymbol{\varepsilon}$.

Simulation of the normalized inter-event residuals

Following standard conventions, since the inter-event residual is a constant across all the sites during a single earthquake [e.g., Abrahamson and Youngs, 1992], the simulated normalized inter-event residuals should satisfy the following relation (which does not assume that the $\tau_i$'s are equal in order to be compatible with ground-motion models such as that of Abrahamson and Silva [2008]):

$$\eta_i = \frac{\tau_1}{\tau_i}\,\eta_1 \tag{5.6}$$

Thus the normalized inter-event residuals can be simulated by first simulating $\eta_1$ from a univariate normal distribution with zero mean and unit standard deviation (using randn or mvnrnd in MATLAB for instance), and by subsequently evaluating the other normalized inter-event residuals using Equation 5.6. Incidentally, if all the $\tau_i$'s are equal, the $\eta_i$'s will be equal as well ($= \eta$). In this case, the value of $\eta$ can be simulated as a univariate normal variable with zero mean and unit standard deviation.

Summary of the steps involved

In summary, the steps involved in the single-step simulation procedure are as follows:

• Step 1: Estimate the mean (Equation 5.2) and the covariance (Equation 5.3) matrices of the residuals. The covariances can be computed from a spatial-correlation model such as that of Jayaram and Baker [2009a].
• Step 2: Use a computer function such as mvnrnd to generate $p$ jointly normally-distributed random variables using the mean and the covariance matrices. If a computer function is not available, the variables can be simulated by first simulating $p$ independent variables, and by subsequently inducing the correlation using the Cholesky triangle, as described earlier in the section.
• Step 3: Simulate a normalized inter-event residual $\eta_1$ from a univariate normal distribution with zero mean and unit standard deviation (using the same approach used in Step 2). Estimate the other $\eta_i$'s using Equation 5.6. If all the $\tau_i$'s are equal, all the $\eta_i$'s will equal $\eta_1$.
• Step 4: Obtain the spectral acceleration at all the sites by combining the medians and the normalized inter- and intra-event residuals according to Equation 5.1.
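The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it takes the Cholesky route of Equations 5.4 and 5.5 instead of calling mvnrnd, uses the exponential correlation model $\exp(-3h/b)$, and all function names, argument names and the planar coordinate layout are hypothetical.

```python
import math
import random


def correlation_matrix(coords, range_km):
    """Covariance matrix of Equation 5.3 with rho_ij = exp(-3 h / b)."""
    return [[math.exp(-3.0 * math.hypot(u[0] - v[0], u[1] - v[1]) / range_km)
             for v in coords] for u in coords]


def cholesky_lower(a):
    """Lower triangular L such that a = L L^t (Equation 5.4).
    Assumes a is positive definite, i.e. no two sites coincide."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_single_step(coords, ln_median, sigma, tau, range_km, rand=None):
    """One simulated intensity map. Returns (Sa, eps, eta) as lists."""
    rand = rand or random.Random()
    p = len(coords)
    # Steps 1-2: correlated standard normals, e^t = L n^t (Equation 5.5).
    low = cholesky_lower(correlation_matrix(coords, range_km))
    n = [rand.gauss(0.0, 1.0) for _ in range(p)]
    eps = [sum(low[i][k] * n[k] for k in range(i + 1)) for i in range(p)]
    # Step 3: one draw of eta_1, then Equation 5.6 for the other sites.
    eta_1 = rand.gauss(0.0, 1.0)
    eta = [tau[0] / tau[i] * eta_1 for i in range(p)]
    # Step 4: combine medians and residuals according to Equation 5.1.
    sa = [math.exp(ln_median[i] + sigma[i] * eps[i] + tau[i] * eta[i])
          for i in range(p)]
    return sa, eps, eta
```

Because the correlation matrix depends only on the site layout and the range, the Cholesky factor could be computed once and reused across many simulated maps; the sketch recomputes it on each call only to stay short.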

5.3.2

Sequential simulation technique

Sequential simulation of intra-event residuals

The single-step simulation technique described previously is computationally inefficient because the Cholesky decomposition (Equation 5.4) is an $O(p^3)$ operation (which is a problem when $p$ is large). One alternative to the single-step simulation technique is the sequential simulation technique [Goovaerts, 1997, Deutsch and Journel, 1998] that lends itself to performing computationally efficient simulations. In this technique, the residuals are simulated one at a time, conditioned on the residuals previously simulated. This conditioning ensures that correlation between the residuals is appropriately accounted for. The residuals can be simulated in any order as long as each residual is conditioned on every other previously simulated residual. The following paragraphs describe the sequential simulation technique for obtaining $p$ intra-event residuals.

First, obtain $e_1$ (a realization of $\varepsilon_1$) by sampling from a univariate normal distribution with zero mean and unit standard deviation. The other $e_i$'s can be obtained using the procedure described below for simulating $\varepsilon_i$ assuming that $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}$ have been previously simulated. Let $e_1, e_2, \cdots, e_{(i-1)}$ denote the simulated values of the normalized intra-event residuals $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}$. Since the $\varepsilon$'s follow a multivariate normal distribution, $\varepsilon_i$ conditioned on $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}$ follows a univariate normal distribution with the following conditional



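mean [Johnson and Wichern, 2007]:

$$E\left[\varepsilon_i \mid \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}\right] = \boldsymbol{\Sigma}_{iO}\,\boldsymbol{\Sigma}_{OO}^{-1}\,\mathbf{e}_O \tag{5.7}$$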

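where $\mathbf{e}_O = [e_1, e_2, \cdots, e_{(i-1)}]^t$, $\boldsymbol{\Sigma}_{OO}$ is the covariance matrix of $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}$, and $\boldsymbol{\Sigma}_{iO}$ is a row vector of covariances between $\varepsilon_i$ and $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}$. The symbol $O$ denotes the set of sites at which the residuals have been previously simulated. $\boldsymbol{\Sigma}_{iO}$ is thus defined as follows:

$$\boldsymbol{\Sigma}_{iO} = \begin{bmatrix} \rho_{i1} & \rho_{i2} & \cdots & \rho_{i(i-1)} \end{bmatrix} \tag{5.8}$$

and $\boldsymbol{\Sigma}_{OO}$ is defined as follows:

$$\boldsymbol{\Sigma}_{OO} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1(i-1)} \\ \rho_{21} & 1 & \cdots & \rho_{2(i-1)} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{(i-1)1} & \rho_{(i-1)2} & \cdots & 1 \end{bmatrix} \tag{5.9}$$

The variance of $\varepsilon_i$ conditioned on $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}$ is expressed as follows:

$$\mathrm{var}\left[\varepsilon_i \mid \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}\right] = 1 - \boldsymbol{\Sigma}_{iO}\,\boldsymbol{\Sigma}_{OO}^{-1}\,\boldsymbol{\Sigma}_{Oi} \tag{5.10}$$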

where $\Sigma_{Oi}$ is the transpose of $\Sigma_{iO}$. The realization $e_i$ is now obtained by sampling from a univariate normal distribution with the mean in Equation 5.7 and the variance in Equation 5.10. This simulation can be performed using the Box-Muller method [Law and Kelton, 2007] or using a computer function such as 'randn' in MATLAB.

As mentioned earlier, the primary reason for using the sequential simulation technique is to achieve higher computational efficiency. The basic sequential simulation technique as described above, however, does not provide much benefit since it requires the computation of the inverses of several $\Sigma_{OO}$ matrices, some of which will be large if the number of conditioning sites is large. Hence, in practice, the number of conditioning sites is always kept small even if a large number of residuals have previously been simulated. This is typically done by conditioning $\varepsilon_i$ on the $q$ closest $\varepsilon$'s (closest in terms of the Euclidean distance between the associated sites). This is reasonable because it has been observed in practice that $\varepsilon_i$ is screened by nearby $\varepsilon$'s from the effect of far-away $\varepsilon$'s [Goovaerts, 1997]. Due to this screening effect, the far-away residuals can be ignored without significantly affecting the statistical properties of the simulated residuals. The value of $q$ is typically chosen to be between 10 and 30 to balance accuracy and computational efficiency. Alternatively, $\varepsilon_i$ can be conditioned on only the residuals at sites that are within a distance $r$ from site $i$, as illustrated in Figure 5.2. A typical value for $r$ is 30 km.

Figure 5.2: Illustration of the sequential step procedure.

When the residuals are not conditioned on all other previously simulated residuals, Goovaerts [1997] reports that the order in which the residuals are simulated should be randomized during the simulation of each ground-motion intensity map to avoid any bias. This is, however, computationally inefficient since it necessitates the computation and inversion of many more $\Sigma_{iO}$ and $\Sigma_{OO}$ matrices (since the matrices now vary from simulated map to simulated map). For all practical purposes, the authors' experience shows that the use of a single fixed order causes negligible bias in the results, and hence can be used in order to save significant computational effort (noting that $\Sigma_{iO}$ and $\Sigma_{OO}^{-1}$ are identical across



simulations if a fixed order is assumed). The inter-event residual simulation is identical to that described in Section 5.3.1.

Summary of the steps involved

In summary, the steps involved in the sequential simulation technique are as follows:

• Step 1: Simulate $e_1$ (a realization of $\varepsilon_1$) from a univariate normal distribution with zero mean and unit standard deviation. Set variable $i = 2$.
• Step 2: Simulate $e_i$ conditioned on the previously simulated residuals $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}$ (or just the closest $q$ residuals, or the residuals that are within a distance $r$ from site $i$) from a univariate normal distribution with the mean in Equation 5.7 and the variance in Equation 5.10.
• Step 3: Increment $i$ by 1. If $i$ does not exceed $p$, go to Step 2; otherwise go to Step 4.
• Step 4: Simulate the normalized inter-event residuals as described in Section 5.3.1.
• Step 5: Obtain the spectral acceleration at all the sites by combining the medians and normalized inter- and intra-event residuals according to Equation 5.1.
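Steps 1 to 3 above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it conditions each residual on all previously simulated residuals (not on the q closest), it uses a small hand-written linear solver in place of a numerical library, and the 3-site correlation matrix `RHO` is a hypothetical example, not data from this report.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def conditional_moments(rho, i, e_prev):
    """Mean (Eq. 5.7) and variance (Eq. 5.10) of residual i given residuals 0..i-1."""
    sigma_oo = [[rho[a][b] for b in range(i)] for a in range(i)]
    sigma_io = [rho[i][b] for b in range(i)]
    # Sigma_OO is symmetric, so solving Sigma_OO w = Sigma_Oi
    # yields the weights w^t = Sigma_iO Sigma_OO^{-1}.
    w = solve_linear(sigma_oo, sigma_io)
    mean = sum(wk * ek for wk, ek in zip(w, e_prev))
    var = 1.0 - sum(wk * sk for wk, sk in zip(w, sigma_io))
    return mean, var


def simulate_sequential(rho, rng):
    """Simulate p normalized intra-event residuals one at a time (0-based sites)."""
    e = [rng.gauss(0.0, 1.0)]  # Step 1
    for i in range(1, len(rho)):  # Steps 2 and 3
        mean, var = conditional_moments(rho, i, e)
        e.append(rng.gauss(mean, math.sqrt(max(var, 0.0))))
    return e


# Hypothetical correlation matrix for three sites (illustration only).
RHO = [[1.0, 0.6, 0.3],
       [0.6, 1.0, 0.6],
       [0.3, 0.6, 1.0]]
residuals = simulate_sequential(RHO, random.Random(0))
```

Restricting the conditioning set to the q closest sites would only change which rows and columns of `rho` are passed into `conditional_moments`.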

5.4

Importance sampling of normalized intra-event residuals

Sometimes, it is of interest to preferentially sample ground-motion intensity maps with positive residuals in order to evaluate the performance of structures and lifelines under extreme events [e.g., Jayaram and Baker, 2010, 2009b]. In such cases, the normalized residuals can be sampled from an alternate distribution that produces a larger number of positive residuals. This procedure of using an alternate distribution for preferential sampling is known as importance sampling [Law and Kelton, 2007].



Figure 5.3: The alternate sampling distribution (marginal distribution) used for the importance sampling of residuals [Jayaram and Baker, 2010].

Jayaram and Baker [2010] used, as the alternate sampling distribution, a multivariate normal distribution whose marginal distributions for the normalized intra-event residuals have a positive mean (Figure 5.3), in order to preferentially generate positive residuals. This choice was based on the simplicity of the corresponding importance sampling weights, a quantity (discussed subsequently) that needs to be computed as part of the importance sampling procedure. There are minor differences between the sampling procedures using the original (zero-mean) and the alternate (positive-mean) sampling distributions, and these are listed below.

In the single-step simulation technique, Equation 5.5 is replaced by the following equation:

$$
\mathbf{e} = m\mathbf{1}_p + \mathbf{L}\mathbf{n}^t \tag{5.11}
$$

where $m$ is the mean of the alternate sampling distribution, and $\mathbf{1}_p$ denotes a column vector of ones of size $p$. Since the vector $\mathbf{n}$ is sampled from a zero-mean distribution, the mean of the sampled residuals $\mathbf{e}$ equals the mean of the alternate sampling distribution.



Equation 5.2 is replaced by

$$
\boldsymbol{\mu} = \begin{bmatrix} m \\ m \\ \vdots \\ m \end{bmatrix} \tag{5.12}
$$

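If the sequential simulation technique is used, Equation 5.7 is replaced by the following equation:

$$
E\left[\varepsilon_i \mid \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}\right] = m + \Sigma_{iO}\,\Sigma_{OO}^{-1}\left(\mathbf{e}_O - m\mathbf{1}_p\right) \tag{5.13}
$$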

It is to be noted that Equation 5.10 remains unaltered.

The rest of this section discusses the computation of the importance sampling weight for this choice of the alternate sampling distribution. The importance sampling weight can be viewed as a correction factor that accounts for the differences between the sampling distribution and the true distribution. Suppose that we are interested in using a simulation-based approach to compute the expected value of an arbitrary function of $\boldsymbol{\varepsilon}$ denoted $q(\boldsymbol{\varepsilon})$. Let $f(\boldsymbol{\varepsilon})$ denote the probability density function (PDF) of the normalized intra-event residuals, and $g(\boldsymbol{\varepsilon})$ denote the alternate PDF. The expected value of $q(\boldsymbol{\varepsilon})$ (denoted $H$) can be evaluated as follows:

$$
H = \int_D q(\mathbf{e})\, f(\mathbf{e})\, d\mathbf{e} \tag{5.14}
$$

where $D$ is the set of all values taken by $\mathbf{e}$. The integral can be rewritten as follows:

$$
H = \int_D q(\mathbf{e})\, \frac{f(\mathbf{e})}{g(\mathbf{e})}\, g(\mathbf{e})\, d\mathbf{e} \tag{5.15}
$$

Equation 5.15 shows that $H$ can be computed using samples from the alternate PDF in place of samples from the true PDF if the function $q(\mathbf{e})$ is multiplied by the correction factor $\frac{f(\mathbf{e})}{g(\mathbf{e})}$. This correction factor is called the importance sampling weight.

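In the specific application discussed in this chapter, the distributions $f(\mathbf{e})$ and $g(\mathbf{e})$ are known to be multivariate normal, and are expressed as follows:

$$
f(\mathbf{e}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}\,\mathbf{e}^t\,\Sigma^{-1}\,\mathbf{e}\right) \tag{5.16}
$$

where $\Sigma$ denotes the covariance matrix of $\boldsymbol{\varepsilon}$ (Equation 5.3).

$$
g(\mathbf{e}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}\left(\mathbf{e} - m\mathbf{1}_p\right)^t \Sigma^{-1} \left(\mathbf{e} - m\mathbf{1}_p\right)\right) \tag{5.17}
$$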

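When the single-step simulation technique is used, the importance sampling weight is estimated as follows:

$$
\frac{f(\mathbf{e})}{g(\mathbf{e})} = \exp\left(\frac{1}{2}\left(\mathbf{e} - m\mathbf{1}_p\right)^t \Sigma^{-1} \left(\mathbf{e} - m\mathbf{1}_p\right) - \frac{1}{2}\,\mathbf{e}^t\,\Sigma^{-1}\,\mathbf{e}\right) \tag{5.18}
$$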
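As a worked step that the text does not spell out, the quadratic form in Equation 5.18 can be expanded using the symmetry of $\Sigma^{-1}$. The terms in $\mathbf{e}^t\Sigma^{-1}\mathbf{e}$ cancel, which suggests why this choice of alternate distribution gives a simple weight:

$$
\begin{aligned}
\frac{f(\mathbf{e})}{g(\mathbf{e})}
&= \exp\left(\frac{1}{2}\left[\mathbf{e}^t\Sigma^{-1}\mathbf{e} - 2m\,\mathbf{1}_p^t\Sigma^{-1}\mathbf{e} + m^2\,\mathbf{1}_p^t\Sigma^{-1}\mathbf{1}_p\right] - \frac{1}{2}\,\mathbf{e}^t\Sigma^{-1}\mathbf{e}\right) \\
&= \exp\left(\frac{m^2}{2}\,\mathbf{1}_p^t\Sigma^{-1}\mathbf{1}_p - m\,\mathbf{1}_p^t\Sigma^{-1}\mathbf{e}\right)
\end{aligned}
$$

In this form the first term is a constant for a given set of sites, so only the linear term $\mathbf{1}_p^t\Sigma^{-1}\mathbf{e}$ needs to be evaluated for each simulated map.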

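When the sequential simulation technique is used, the importance sampling weight is computed using the following relationship:

$$
\frac{f(\mathbf{e})}{g(\mathbf{e})} = \frac{f(e_1)}{g(e_1)}\,\frac{f(e_2 \mid e_1)}{g(e_2 \mid e_1)} \cdots \frac{f(e_p \mid e_1, e_2, \cdots, e_{p-1})}{g(e_p \mid e_1, e_2, \cdots, e_{p-1})} \tag{5.19}
$$

From Equations 5.7, 5.10 and 5.13,

$$
f(e_i \mid e_1, e_2, \cdots, e_{i-1}) \sim N\left(\Sigma_{iO}\,\Sigma_{OO}^{-1}\,\mathbf{e}_O,\; \Sigma_{ii} - \Sigma_{iO}\,\Sigma_{OO}^{-1}\,\Sigma_{Oi}\right) \tag{5.20}
$$

$$
g(e_i \mid e_1, e_2, \cdots, e_{i-1}) \sim N\left(m + \Sigma_{iO}\,\Sigma_{OO}^{-1}\left(\mathbf{e}_O - m\mathbf{1}_p\right),\; \Sigma_{ii} - \Sigma_{iO}\,\Sigma_{OO}^{-1}\,\Sigma_{Oi}\right) \tag{5.21}
$$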
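Because the conditional densities in Equations 5.20 and 5.21 share the same variance, each ratio in Equation 5.19 reduces to an exponential of a difference of squares. The sketch below illustrates this; it assumes the caller has already computed the conditional means under f (Equation 5.7) and under g (Equation 5.13) and the common conditional variances (Equation 5.10). The function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density, used here only to cross-check the weight."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sequential_weight(e, means_f, means_g, variances):
    """Product of the conditional density ratios in Equation 5.19.

    e          simulated residuals e_1..e_p
    means_f    conditional means under the original density f
    means_g    conditional means under the alternate density g
    variances  conditional variances, identical under f and g
    """
    log_w = 0.0
    for x, mu_f, mu_g, var in zip(e, means_f, means_g, variances):
        # Equal variances make the normalizing constants cancel in f/g.
        log_w += ((x - mu_g) ** 2 - (x - mu_f) ** 2) / (2.0 * var)
    return math.exp(log_w)
```

Accumulating the logarithm of the weight and exponentiating once at the end is a design choice intended to limit underflow when p is large.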

While the above discussion focuses on intra-event residuals, the same importance sampling technique can also be used for preferentially sampling positive inter-event residuals.


5.5


Sequential simulation of correlated normalized residuals with consideration of recorded ground-motion intensities

In this section, a procedure is described for simulating normalized residuals conditioned on recorded ground-motion intensities. Here, it is assumed for simplicity that the standard deviations of the inter-event residual and of the intra-event residuals are constants (i.e., $\sigma_i = \sigma$ and $\tau_i = \tau$). Appendix 5.7 discusses the more general case that arises when this assumption is not true.

In the simulation techniques described in the previous sections, the inter-event residual and the intra-event residuals are simulated separately. This is because the screening effect is more effective when the intra-event residuals are simulated separately, as discussed in more detail subsequently. When we wish to utilize the recorded intensity information, however, it is preferable to simulate total residuals (sum of inter-event and intra-event residuals) directly, conditioned on total residuals computed from the recordings. This is because the ground-motion intensity recordings only provide us with information about the total residuals (computed as the difference between the observed logarithmic intensity and the predicted logarithmic intensity). If the residual terms are to be simulated separately, the recorded total residuals will first have to be split into the corresponding inter-event and intra-event terms, which leads to statistical errors.

The recorded normalized total residual $\varepsilon^{(t)}$ can be computed from the recorded ground-motion intensities as follows (using Equation 5.1):

$$
\varepsilon_i^{(t)} = \frac{\sigma\varepsilon_i + \tau\eta}{\sqrt{\sigma^2 + \tau^2}} = \frac{\ln(Sa_i) - \ln\left(\overline{Sa}_i\right)}{\sqrt{\sigma^2 + \tau^2}} \tag{5.22}
$$

where $Sa_i$, the observed spectral acceleration at site $i$, is the intensity measure considered, and $\overline{Sa}_i$, $\sigma$ and $\tau$ are parameters computed from the ground-motion model as described earlier. The normalizing factor in the above equation is $\sqrt{\sigma^2 + \tau^2}$ since the variance of the total residual equals the sum of the variances of the inter-event and the intra-event residual.
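The second equality in Equation 5.22 is what would be evaluated at a recording station. A minimal sketch follows; the function name and the numerical values in the usage line are hypothetical, and the median and standard deviations are assumed to come from whichever ground-motion model is in use.

```python
import math


def normalized_total_residual(sa_observed, sa_median, sigma, tau):
    """Equation 5.22: normalized total residual at a recording station.

    sa_observed  recorded spectral acceleration at the site
    sa_median    median spectral acceleration predicted by the ground-motion model
    sigma, tau   intra-event and inter-event standard deviations (log units)
    """
    total_std = math.sqrt(sigma ** 2 + tau ** 2)
    return (math.log(sa_observed) - math.log(sa_median)) / total_std


# Hypothetical values: a recording twice the predicted median.
example = normalized_total_residual(0.40, 0.20, sigma=0.6, tau=0.3)
```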



The sequential simulation technique for the normalized intra-event residuals described earlier in Section 5.3.2 can be used to simulate normalized total residuals as well. The following changes are necessary, however, since total residuals are simulated directly and since the recorded total residuals are now considered during the simulation procedure.

(a) Each $\varepsilon_i^{(t)}$ is conditioned on the $\varepsilon^{(t)}$'s previously simulated as well as the $\varepsilon^{(t)}$'s at the recording stations. In other words, from a simulation perspective, the recorded $\varepsilon^{(t)}$'s are treated as additional previously simulated $\varepsilon^{(t)}$'s.

(b) As mentioned earlier, it is reasonable to condition $\varepsilon_i^{(t)}$ on only the $q$ closest normalized total residuals (including recorded and previously simulated total residuals). It is to be noted, however, that the screening effect used as the basis for this simplification is slightly less effective when total residuals are directly simulated as compared to when intra-event residuals are simulated separately. This is because even though the spatial correlation reduces with distance, the minimum value of the spatial correlation between $\varepsilon_i^{(t)}$ and $\varepsilon_j^{(t)}$ equals $\frac{\tau^2}{\sigma^2 + \tau^2}$ and not zero (as before). Therefore, ignoring far-away residuals can cause slightly more bias in this case than when only intra-event residuals are simulated.

(c) The conditional mean and the conditional variance of $\varepsilon_i^{(t)}$ (analogous to the quantities in Equations 5.7 and 5.10) are now obtained as follows:

$$
E\left[\varepsilon_i^{(t)} \mid \varepsilon_1^{(t)}, \varepsilon_2^{(t)}, \cdots, \varepsilon_q^{(t)}\right] = \Sigma_{iO}^{(t)}\,\Sigma_{OO}^{(t)-1}\,\mathbf{e}_O^{(t)} \tag{5.23}
$$

$$
\mathrm{Var}\left[\varepsilon_i^{(t)} \mid \varepsilon_1^{(t)}, \varepsilon_2^{(t)}, \cdots, \varepsilon_q^{(t)}\right] = 1 - \Sigma_{iO}^{(t)}\,\Sigma_{OO}^{(t)-1}\,\Sigma_{Oi}^{(t)} \tag{5.24}
$$

where the set $\left[\varepsilon_1^{(t)}, \varepsilon_2^{(t)}, \cdots, \varepsilon_q^{(t)}\right]$ comprises the $q$ closest recorded and previously simulated normalized total residuals, and $\mathbf{e}_O^{(t)}$ denotes the realization (or recorded value as appropriate) of $\left[\varepsilon_1^{(t)}, \varepsilon_2^{(t)}, \cdots, \varepsilon_q^{(t)}\right]$.

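The covariance matrices $\Sigma_{iO}$, $\Sigma_{OO}$ and $\Sigma_{ii}$ in Equations 5.8 and 5.9 were defined for intra-event residuals only. The corresponding covariance matrices for the normalized total residuals are obtained as follows:

$$
\Sigma_{iO}^{(t)} = \begin{bmatrix}
\dfrac{\sigma^2\rho_{i1} + \tau^2}{\sigma^2 + \tau^2} & \dfrac{\sigma^2\rho_{i2} + \tau^2}{\sigma^2 + \tau^2} & \cdots & \dfrac{\sigma^2\rho_{iq} + \tau^2}{\sigma^2 + \tau^2}
\end{bmatrix} \tag{5.25}
$$

$$
\Sigma_{OO}^{(t)} = \begin{bmatrix}
1 & \dfrac{\sigma^2\rho_{12} + \tau^2}{\sigma^2 + \tau^2} & \cdots & \dfrac{\sigma^2\rho_{1q} + \tau^2}{\sigma^2 + \tau^2} \\
\dfrac{\sigma^2\rho_{21} + \tau^2}{\sigma^2 + \tau^2} & 1 & \cdots & \dfrac{\sigma^2\rho_{2q} + \tau^2}{\sigma^2 + \tau^2} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\sigma^2\rho_{q1} + \tau^2}{\sigma^2 + \tau^2} & \dfrac{\sigma^2\rho_{q2} + \tau^2}{\sigma^2 + \tau^2} & \cdots & 1
\end{bmatrix} \tag{5.26}
$$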

The rest of the sequential simulation technique is identical to the one described earlier. In particular, residual $\varepsilon_1^{(t)}$ is first simulated as a univariate normally-distributed random variable with zero mean and unit standard deviation. The residual $\varepsilon_i^{(t)}$ is then simulated conditioned on the previously simulated residuals from a normal distribution with the mean in Equation 5.23 and the variance in Equation 5.24.
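The entries of Equations 5.25 and 5.26, and the special case of Equations 5.23 and 5.24 with a single conditioning value (q = 1), can be sketched as follows. The function names and the numerical values are hypothetical; the intra-event correlation `rho_intra` is assumed to come from a spatial correlation model.

```python
import math
import random


def total_residual_correlation(rho_intra, sigma, tau):
    """One entry of Equations 5.25 and 5.26 (constant sigma and tau)."""
    return (sigma ** 2 * rho_intra + tau ** 2) / (sigma ** 2 + tau ** 2)


def condition_on_one_record(rho_total, e_recorded):
    """Equations 5.23 and 5.24 when only one residual is conditioned on (q = 1)."""
    mean = rho_total * e_recorded
    variance = 1.0 - rho_total ** 2
    return mean, variance


def simulate_near_station(rho_intra, sigma, tau, e_recorded, rng):
    """Draw a normalized total residual at a site given one recorded residual."""
    rho_total = total_residual_correlation(rho_intra, sigma, tau)
    mean, variance = condition_on_one_record(rho_total, e_recorded)
    return rng.gauss(mean, math.sqrt(variance))


# Even for fully uncorrelated intra-event residuals the total-residual
# correlation stays at tau^2 / (sigma^2 + tau^2), as noted in item (b).
floor = total_residual_correlation(0.0, sigma=0.6, tau=0.3)
draw = simulate_near_station(0.5, 0.6, 0.3, e_recorded=1.5, rng=random.Random(1))
```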

5.6

Conclusions

Quantifying the distribution of ground-motion intensities over a spatially-distributed region is an important task for several practical applications such as the risk assessment and post-earthquake damage assessment of spatially-distributed systems. Often, this is done using a simulation-based framework that involves generating probabilistic samples of representative ground-motion intensity maps. This chapter discussed techniques for simulating ground-motion intensity maps with and without the consideration of recorded ground-motion intensities. A ground-motion intensity map is generated by combining median intensity predictions from ground-motion models with realizations of inter-event and intra-event residuals that account for the uncertainty in the intensities. Intra-event residuals can be simulated as a correlated vector of normal random variables, and the inter-event residual can be simulated as a univariate normal random variable. The chapter discussed two simulation techniques, namely, single-step simulation and sequential simulation for generating residuals in the absence of recorded ground-motion intensities. While both procedures are theoretically equivalent, it is possible to achieve higher computational efficiency using the sequential simulation technique. The chapter also



described a sequential simulation technique for simulating residuals incorporating knowledge about recorded ground-motion intensities. This is useful for post-earthquake damage assessment and for determining optimal emergency response strategies.

5.7

Appendix: The conditional sequential simulation of normalized heteroscedastic residuals

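This section generalizes the results shown in Section 5.5 to the case where the residuals are heteroscedastic (i.e., $\sigma_i$ and $\tau_i$ are not constant across all sites). The normalized total residual is now defined as follows:

$$
\varepsilon_i^{(t)} = \frac{\sigma_i\varepsilon_i + \tau_i\eta_i}{\sqrt{\sigma_i^2 + \tau_i^2}} \tag{5.27}
$$

The simulation procedure is similar to that described in Section 5.5, with changes to the covariance matrices shown in Equations 5.25 and 5.26. The new matrices can be estimated as follows:

$$
\Sigma_{iO}^{(t)} = \begin{bmatrix}
\dfrac{\sigma_i\sigma_1\rho_{i1} + \tau_i\tau_1}{\sqrt{\sigma_i^2 + \tau_i^2}\sqrt{\sigma_1^2 + \tau_1^2}} &
\dfrac{\sigma_i\sigma_2\rho_{i2} + \tau_i\tau_2}{\sqrt{\sigma_i^2 + \tau_i^2}\sqrt{\sigma_2^2 + \tau_2^2}} &
\cdots &
\dfrac{\sigma_i\sigma_q\rho_{iq} + \tau_i\tau_q}{\sqrt{\sigma_i^2 + \tau_i^2}\sqrt{\sigma_q^2 + \tau_q^2}}
\end{bmatrix} \tag{5.28}
$$

$$
\Sigma_{OO}^{(t)} = \begin{bmatrix}
1 &
\dfrac{\sigma_1\sigma_2\rho_{12} + \tau_1\tau_2}{\sqrt{\sigma_1^2 + \tau_1^2}\sqrt{\sigma_2^2 + \tau_2^2}} &
\cdots &
\dfrac{\sigma_1\sigma_q\rho_{1q} + \tau_1\tau_q}{\sqrt{\sigma_1^2 + \tau_1^2}\sqrt{\sigma_q^2 + \tau_q^2}} \\
\dfrac{\sigma_2\sigma_1\rho_{21} + \tau_2\tau_1}{\sqrt{\sigma_2^2 + \tau_2^2}\sqrt{\sigma_1^2 + \tau_1^2}} &
1 &
\cdots &
\dfrac{\sigma_2\sigma_q\rho_{2q} + \tau_2\tau_q}{\sqrt{\sigma_2^2 + \tau_2^2}\sqrt{\sigma_q^2 + \tau_q^2}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\sigma_q\sigma_1\rho_{q1} + \tau_q\tau_1}{\sqrt{\sigma_q^2 + \tau_q^2}\sqrt{\sigma_1^2 + \tau_1^2}} &
\dfrac{\sigma_q\sigma_2\rho_{q2} + \tau_q\tau_2}{\sqrt{\sigma_q^2 + \tau_q^2}\sqrt{\sigma_2^2 + \tau_2^2}} &
\cdots &
1
\end{bmatrix} \tag{5.29}
$$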
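A single entry of Equations 5.28 and 5.29 can be written as a small function, which also makes it easy to check that the heteroscedastic form collapses to the entries of Equations 5.25 and 5.26 when the standard deviations are the same at both sites. The function name and numerical values are hypothetical.

```python
import math


def total_residual_correlation_hetero(rho_intra, sigma_a, tau_a, sigma_b, tau_b):
    """One entry of Equations 5.28 and 5.29 for sites a and b.

    rho_intra        intra-event correlation between the two sites
    sigma_a, tau_a   intra-event and inter-event standard deviations at site a
    sigma_b, tau_b   intra-event and inter-event standard deviations at site b
    """
    numerator = sigma_a * sigma_b * rho_intra + tau_a * tau_b
    denominator = math.sqrt(sigma_a ** 2 + tau_a ** 2) * math.sqrt(sigma_b ** 2 + tau_b ** 2)
    return numerator / denominator


# With equal standard deviations this matches (sigma^2 rho + tau^2) / (sigma^2 + tau^2).
same_site_stats = total_residual_correlation_hetero(0.5, 0.6, 0.3, 0.6, 0.3)
```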

Chapter 6

Efficient sampling and data reduction techniques for probabilistic seismic lifeline risk assessment

N. Jayaram and J.W. Baker (2010). Efficient sampling and data reduction techniques for probabilistic seismic lifeline risk assessment, Earthquake Engineering and Structural Dynamics (published online).

6.1

Abstract

Probabilistic seismic risk assessment for spatially-distributed lifelines is less straightforward than for individual structures. While procedures such as the ‘PEER framework’ have been developed for risk assessment of individual structures, these are not easily applicable to distributed lifeline systems, due to difficulties in describing ground-motion intensity (e.g., spectral acceleration) over a region (in contrast to ground-motion intensity at a single site, which is easily quantified using Probabilistic Seismic Hazard Analysis), and since the link between the ground-motion intensities and lifeline performance is usually not available in closed form. As a result, Monte Carlo simulation and its variants are well suited for characterizing ground motions and computing resulting losses to lifelines. This



chapter proposes a simulation-based framework for developing a small but stochastically representative catalog of earthquake ground-motion intensity maps that can be used for lifeline risk assessment. In this framework, Importance Sampling is used to preferentially sample 'important' ground-motion intensity maps, and K-Means Clustering is used to identify and combine redundant maps in order to obtain a small catalog. The effects of sampling and clustering are accounted for through a weighting on each remaining map, so the resulting catalog is still a probabilistically correct representation. The feasibility of the proposed simulation framework is illustrated by using it to assess the seismic risk of a simplified model of the San Francisco Bay Area transportation network. A catalog of just 150 intensity maps is generated to represent hazard at 1,038 sites from ten regional fault segments causing earthquakes with magnitudes between five and eight. The risk estimates obtained using these maps are consistent with those obtained using conventional Monte Carlo simulation utilizing orders of magnitude more ground-motion intensity maps. Therefore, the proposed technique can be used to drastically reduce the computational expense of a simulation-based risk assessment, without compromising the accuracy of the risk estimates. This will facilitate computationally intensive risk analysis of systems such as transportation networks. Finally, the study shows that the uncertainties in the ground-motion intensities and the spatial correlations between ground-motion intensities at various sites must be modeled in order to obtain unbiased estimates of lifeline risk.

6.2

Introduction

Lifelines are large, geographically-distributed systems that are essential support systems for any society. Due to their known vulnerabilities, it is important to proactively assess and mitigate the seismic risk of lifelines. For instance, the Northridge earthquake caused over $1.5 billion in business interruption losses ascribed to transportation network damage [Chang, 2003]. The city of Los Angeles suffered a power blackout and $75 million of power-outage related losses as a result of the earthquake [e.g., Tanaka et al., 1997]. Recently, the analytical Pacific Earthquake Engineering Research Center (PEER) loss analysis framework has been used to perform risk assessment for a single structure at a given site,



by estimating the site ground-motion hazard and assessing probable losses using the hazard information [e.g., McGuire, 2007]. Lifeline risk assessment, however, is based on a large vector of ground-motion intensities (e.g., spectral accelerations at all lifeline component locations). The intensities also show significant spatial correlation, which needs to be carefully modeled in order to accurately assess the seismic risk. Further, the link between the ground-motion intensities at the sites and the performance of the lifeline is usually not available in closed form. For instance, the travel time of vehicles in a transportation network, a commonly-used performance measure, is only obtained using an optimization procedure rather than being a closed-form function of the ground-motion intensities. These additional complexities make it difficult to use the PEER framework for lifeline risk assessment. There are some analytical approaches that are sometimes used for lifeline risk assessment [e.g., Kang et al., 2008, Dueñas-Osorio et al., 2005], but those are generally applicable to only specific classes of lifeline reliability problems. Hence, many past research works use simulation-based approaches instead of analytical approaches for lifeline risk assessment [e.g., Campbell and Seligson, 2003, Bazzurro and Luco, 2004, Crowley and Bommer, 2006, Kiremidjian et al., 2007, Shiraki et al., 2007]. One simple simulation-based approach involves studying the performance of lifelines under those earthquake scenarios that may dominate the hazard in the region of interest [e.g., Adachi and Ellingwood, 2008]. While this approach is more tractable, it does not capture seismic hazard uncertainties in the way a Probabilistic Seismic Hazard Analysis (PSHA)-based framework would. Further, it is not easy to identify the earthquake scenario that dominates the hazard at the loss levels of interest [Jayaram and Baker, 2009b]. 
(Appendix C uses lifeline loss deaggregation calculations to illustrate the difficulties involved in selecting a dominating earthquake scenario.) A more comprehensive approach uses Monte Carlo simulation (MCS) to probabilistically generate ground-motion intensity maps (also referred to as intensity maps in this chapter), considering all possible earthquake scenarios that could occur in the region, and then uses these maps for the risk assessment. Ground-motion intensities are generated using an existing ground-motion model, which is described below. The ground-motion intensity at a site is modeled as


$$
\ln(Sa_{ij}) = \ln\left(\overline{Sa}_{ij}\right) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \tag{6.1}
$$

where $Sa_{ij}$ denotes the spectral acceleration (at the period of interest) at site $i$ during earthquake $j$; $\overline{Sa}_{ij}$ denotes the predicted (by the ground-motion model) median spectral acceleration, which depends on parameters such as magnitude, distance, period and local-site conditions; $\varepsilon_{ij}$ denotes the normalized intra-event residual and $\eta_{ij}$ denotes the normalized inter-event residual. Both $\varepsilon_{ij}$ and $\eta_{ij}$ are univariate normal random variables with zero mean and unit standard deviation. $\sigma_{ij}$ and $\tau_{ij}$ are standard deviation terms that are estimated as part of the ground-motion model and are functions of the spectral period of interest, and in some models also functions of the earthquake magnitude and the distance of the site from the rupture. The term $\sigma_{ij}\varepsilon_{ij}$ is called the intra-event residual and the term $\tau_{ij}\eta_{ij}$ is called the inter-event residual. The inter-event residual is a constant across all the sites for a given earthquake.

Crowley and Bommer [2006] describe the following MCS approach to simulate intensity maps using Equation 6.1:

Step 1: Use Monte Carlo simulation to generate earthquakes of varying magnitudes on the active faults in the region, considering appropriate magnitude-recurrence relationships (e.g., the Gutenberg-Richter relationship).

Step 2: Using a ground-motion model (Equation 6.1), obtain the median ground-motion intensities ($\overline{Sa}_{ij}$) and the standard deviations of the intra-event and the inter-event residuals ($\sigma_{ij}$ and $\tau_{ij}$) at all the sites.

Step 3: Generate the normalized inter-event residual term ($\eta_{ij}$) by sampling from the univariate normal distribution.

Step 4: Simulate the normalized intra-event residuals ($\varepsilon_{ij}$'s) using the parameters predicted by the ground-motion model. Chapter 2 [Jayaram and Baker, 2008] showed that a vector of spatially-distributed normalized intra-event residuals $\boldsymbol{\varepsilon}_j = \left[\varepsilon_{1j}, \varepsilon_{2j}, \cdots, \varepsilon_{pj}\right]$ follows a multivariate normal distribution.
Hence, the distribution of ε j can be completely defined using the mean (zero) and standard deviation (one) of εi j , and the correlation between all εi1 j and εi2 j pairs. The correlations between the residuals can be obtained from a predictive model calibrated using past ground-motion intensity observations [Jayaram and



Baker, 2009a, Wang and Takada, 2005].

Step 5: Combine the median intensities, the normalized intra-event residuals and the normalized inter-event residual for each earthquake in accordance with Equation 6.1 to obtain ground-motion intensity maps (i.e., obtain $\mathbf{Sa}_j = \left[Sa_{1j}, Sa_{2j}, \cdots, Sa_{pj}\right]$).

Crowley and Bommer [2006] used the above-mentioned approach to generate multiple earthquake scenarios that were then used for the loss assessment of a portfolio of buildings. They found that the results differed significantly from those obtained using other approximate approaches (e.g., using PSHA to obtain individual site hazard and loss exceedance curves, which are then heuristically combined to obtain the overall portfolio loss exceedance curve). Crowley and Bommer [2006], however, ignored the spatial correlations of $\varepsilon_{ij}$'s when simulating intensity maps. Further, they used conventional MCS (i.e., brute-force MCS or random MCS), which is computationally inefficient because large magnitude events and above-average ground-motion intensities are considerably more important than small magnitude events and small ground-motion intensities while modeling lifeline risks, but these are infrequently sampled in conventional MCS. Kiremidjian et al. [2007] improved the simulation process by preferentially simulating large magnitudes using importance sampling (IS). The normalized residuals ($\varepsilon_{ij}$ and $\eta_{ij}$), however, were simulated using conventional MCS. Shiraki et al. [2007] also used an MCS-based approach to estimate earthquake-induced delays in a transportation network. They generated a catalog of 47 earthquakes and corresponding intensity maps for the Los Angeles area and assigned probabilities to these earthquakes such that the site hazard curves obtained using this catalog match the known local site hazard curves obtained from PSHA. In other words, the probabilities of the scenario earthquakes were made to be hazard consistent.
Only median peak ground accelerations were used to produce the ground-motion intensity maps corresponding to the scenario earthquakes, however, and the known variability about these medians was ignored. While this approach is highly computationally efficient on account of the use of a small catalog of earthquakes, the selection of earthquakes is a somewhat subjective process, and the assignment of probabilities is based on hazard consistency rather than on actual event likelihoods. Moreover, the procedure does not capture the effect of the uncertainties in ground-motion intensities.



The current research work develops an importance sampling-based framework to efficiently sample important magnitudes and ground-motion residuals. It is seen that the number of IS simulations is about two orders of magnitude smaller than the number of Monte Carlo simulations required to obtain equally accurate lifeline loss estimates. Despite this improvement with respect to the performance of the conventional MCS approach, the number of IS intensity maps required for risk assessment is still likely to be an inconveniently large number. As a result, the K-means clustering technique is used to further reduce the number of intensity maps required for risk assessment by over an order of magnitude. The feasibility of the proposed framework is illustrated by assessing the seismic risk of an aggregated form of the San Francisco Bay Area transportation network using a sampled catalog of 150 intensity maps. The resulting risk estimates are shown to be in good agreement with those obtained using the conventional MCS approach (the benchmark method).

6.3

Simulation of ground-motion intensity maps using importance sampling

This section provides a description of the importance sampling technique used in the current work to efficiently simulate ground-motion intensity maps. Importance sampling (IS) is a technique used to evaluate functions of random variables with a certain probability density function (PDF) using samples from an alternate density function [Fishman, 2006]. This technique is explained in more detail in section 6.3.1. Sections 6.3.2, 6.3.3 and 6.3.4 describe the application of IS to the simulation of ground-motion intensity maps, which involves probabilistically sampling a catalog of earthquake magnitudes and rupture locations (which are required for computing the median ground-motion intensities), the normalized inter-event residuals and the normalized intra-event residuals (Equation 6.1).


6.3.1


Importance sampling procedure

Let $f(x)$ be a PDF defined over domain $D$ for random variable $X$. Define an integral $H$ as follows:

$$
H = \int_D q(x)\, f(x)\, dx \tag{6.2}
$$

where $q(x)$ is an arbitrary function of $x$. The integral can be rewritten as follows:

$$
H = \int_D q(x)\, \frac{f(x)}{g(x)}\, g(x)\, dx \tag{6.3}
$$

where $g(x)$ is any probability density assuming non-zero values over the same domain $D$. The term $\frac{f(x)}{g(x)}$ is called the importance sampling weight.

Based on Equation 6.2, the integral $H$ can be estimated using conventional MCS as follows:

$$
\hat{H} = \frac{1}{n}\sum_{i=1}^{n} q(x_i) \tag{6.4}
$$

where $\hat{H}$ is an estimate of $H$ and $x_1, \ldots, x_n$ are $n$ realizations of the random variable $X$ obtained using $f(x)$. The IS procedure involves estimating the integral $H$ using the alternate density $g(x)$ as follows (based on Equation 6.3):

$$
\hat{H} = \frac{1}{r}\sum_{i=1}^{r} q(y_i)\, \frac{f(y_i)}{g(y_i)} \tag{6.5}
$$

where $y_1, \ldots, y_r$ are $r$ realizations from $g(y)$, and $\frac{f(y_i)}{g(y_i)}$ is a weighting function (the importance sampling weight) that accounts for the fact that the realizations are based on the alternate density $g(y)$ rather than the original density $f(y)$.

While Equations 6.4 and 6.5 provide two methods of estimating the same integral $H$, it can be shown that the variance of the estimate $\hat{H}$ obtained using Equation 6.5 can be made very small if an appropriate alternate density function $g(x)$ is chosen [Fishman, 2006]. As a result of this variance reduction, the required number of IS realizations ($r$) is much smaller than the required number of conventional MCS realizations ($n$) for an equally reliable (i.e., same variance) estimate $\hat{H}$.

Intuitively, the density $g(x)$ should be such that the samples from $g(x)$ are concentrated



in regions where the function q(x) is ‘rough’. This will ensure fine sampling in regions that ultimately determine the accuracy of the estimate and coarse sampling elsewhere. The challenge in implementing IS lies in choosing this alternate density g(x). Useful alternate densities for this application are provided in the following subsections.
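The two estimators in Equations 6.4 and 6.5 can be compared on a toy problem. The sketch below estimates a small tail probability of a standard normal variable, H = P(X > 3), which is not an example from this report; the alternate density (a unit-variance normal centred at the hypothetical value `SHIFT`) and all function names are assumptions chosen for illustration.

```python
import math
import random


def mcs_estimate(q, sample_f, n, rng):
    """Equation 6.4: average q over n draws from the original density f."""
    return sum(q(sample_f(rng)) for _ in range(n)) / n


def is_estimate(q, sample_g, weight, r, rng):
    """Equation 6.5: average q times f/g over r draws from the alternate density g."""
    total = 0.0
    for _ in range(r):
        y = sample_g(rng)
        total += q(y) * weight(y)
    return total / r


SHIFT = 3.0  # hypothetical mean of the alternate density g


def q(x):
    """Indicator of the rare event of interest."""
    return 1.0 if x > 3.0 else 0.0


def sample_f(rng):
    return rng.gauss(0.0, 1.0)


def sample_g(rng):
    return rng.gauss(SHIFT, 1.0)


def weight(y):
    """f(y)/g(y) for f = N(0, 1) and g = N(SHIFT, 1)."""
    return math.exp(-SHIFT * y + 0.5 * SHIFT ** 2)


exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
h_is = is_estimate(q, sample_g, weight, 20000, random.Random(7))
h_mcs = mcs_estimate(q, sample_f, 20000, random.Random(7))
```

With the same number of draws, roughly half of the IS samples fall in the region where q is non-zero, whereas conventional MCS sees only a handful of such samples, which is the variance reduction described above.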

6.3.2

Simulation of earthquake catalogs

Let $n_f$ denote the number of active faults in the region of interest and $\nu_j$ denote the annual recurrence rate of earthquakes on fault $j$ with magnitudes exceeding a minimum magnitude $m_{min}$. Let $f_j(m)$ denote the density function for magnitudes of earthquakes on fault $j$. Let $f(m)$ denote the density function for the magnitude of an earthquake on any of the $n_f$ faults (i.e., this density function models the distribution of earthquakes resulting from all the faults). Using the theorem of total probability, $f(m)$ can be computed as follows:

$$
f(m) = \frac{\sum_{j=1}^{n_f} \nu_j f_j(m)}{\sum_{j=1}^{n_f} \nu_j} \tag{6.6}
$$

In the event of an earthquake of magnitude $m$ on a random fault, let $P_j(m)$ denote the probability that the earthquake rupture lies on fault $j$. The $P_j(m)$'s can be calculated using Bayes' theorem as follows:

$$
P_j(m) = \frac{\nu_j f_j(m)}{\sum_{j=1}^{n_f} \nu_j f_j(m)} \tag{6.7}
$$
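Equations 6.6 and 6.7 translate directly into code. In the sketch below the per-fault magnitude densities are truncated exponential densities, which is an assumption made here for illustration (in the spirit of a Gutenberg-Richter relationship); the rates, the parameter `beta` and the maximum magnitudes are hypothetical values.

```python
import math


def truncated_exponential_pdf(m, beta, m_min, m_max):
    """Hypothetical per-fault magnitude density f_j(m), zero outside [m_min, m_max]."""
    if m < m_min or m > m_max:
        return 0.0
    norm = 1.0 - math.exp(-beta * (m_max - m_min))
    return beta * math.exp(-beta * (m - m_min)) / norm


def regional_density(m, rates, densities):
    """Equation 6.6: rate-weighted mixture of the per-fault densities."""
    return sum(nu * f(m) for nu, f in zip(rates, densities)) / sum(rates)


def fault_probabilities(m, rates, densities):
    """Equation 6.7: probability that a magnitude-m rupture lies on each fault.

    Assumes at least one fault can produce magnitude m (non-zero total).
    """
    contributions = [nu * f(m) for nu, f in zip(rates, densities)]
    total = sum(contributions)
    return [c / total for c in contributions]


# Two hypothetical faults with maximum magnitudes 7 and 8.
RATES = [0.10, 0.05]
DENSITIES = [
    lambda m: truncated_exponential_pdf(m, 2.0, 5.0, 7.0),
    lambda m: truncated_exponential_pdf(m, 2.0, 5.0, 8.0),
]
probs_m6 = fault_probabilities(6.0, RATES, DENSITIES)
probs_m75 = fault_probabilities(7.5, RATES, DENSITIES)
```

For the magnitude 7.5 event only the second fault has a non-zero density, so it receives all of the probability.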

A conventional MCS approach would use the density function $f(m)$ to simulate earthquake magnitudes, although this approach will result in a large number of small magnitude events since such events are considerably more probable than large magnitude events. This is not efficient since lifeline losses due to frequent small events are less important than those due to rare large events (although they are not negligible, so they cannot be ignored). It is desirable to improve the computational efficiency of the risk assessment process without compromising the accuracy of the estimates by using the importance sampling technique described in Section 6.3.1 to preferentially sample large events while still ensuring that the simulated events are 'stochastically representative'. In other words, the magnitudes are



simulated from a sampling distribution g(m) (rather than f (m)), which is chosen to have a high probability of producing large magnitude events. Let mmin and mmax denote the range of magnitudes of interest. This range [mmin , mmax ] can be stratified into nm partitions as follows: [mmin , mmax ] = [mmin , m2 ) ∪ [m2 , m3 ) ∪ · · · ∪ [mnm , mmax ]

(6.8)

In the current work, the partitions are chosen such that the width of the interval (i.e., $m_{k+1} - m_k$) is large at small magnitudes and small at large magnitudes (Figure 6.1a). A single magnitude is randomly sampled from each partition using the magnitude density function f(m), thereby obtaining $n_m$ realizations of the magnitudes. Since the partitions are chosen to have small widths at large magnitudes, there are naturally more realizations of large-magnitude events. In this case, the sampling distribution g(m) is not explicit, but rather is implicitly defined by the magnitude partitioning. This procedure, sometimes called stratified sampling, has the advantage of forcing the inclusion of specified subsets of the random variable while maintaining the probabilistic character of random sampling [Fishman, 2006]. The importance sampling weight f(m)/g(m) can be obtained by noting that the sampling distribution assigns equal weight ($1/n_m$) to all the chosen partitions, while the actual probability of a magnitude lying in a partition $(m_k, m_{k+1})$ is obtained by integrating the density function f(m). Hence, the importance sampling weight for a magnitude m chosen from the kth partition is computed as follows:

$$\frac{f(m)}{g(m)} = \frac{\int_{m_k}^{m_{k+1}} f(m)\,dm}{1/n_m} \tag{6.9}$$
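The stratified magnitude sampling and the weight of Equation 6.9 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the truncated exponential density standing in for f(m), the value of `beta`, the partition edges and all function names are assumptions chosen for the example.

```python
import numpy as np

def truncated_exponential_cdf(m, m_min, m_max, beta):
    """CDF of a hypothetical truncated exponential magnitude density f(m)."""
    num = 1.0 - np.exp(-beta * (m - m_min))
    den = 1.0 - np.exp(-beta * (m_max - m_min))
    return num / den

def inverse_cdf(p, m_min, m_max, beta):
    """Inverse of the CDF above, used to sample from f(m) inside a partition."""
    den = 1.0 - np.exp(-beta * (m_max - m_min))
    return m_min - np.log(1.0 - p * den) / beta

def sample_magnitudes(edges, beta, rng):
    """Draw one magnitude per partition and return (magnitudes, IS weights)."""
    edges = np.asarray(edges, dtype=float)
    m_min, m_max = edges[0], edges[-1]
    cdf = truncated_exponential_cdf(edges, m_min, m_max, beta)
    partition_prob = np.diff(cdf)           # integral of f(m) over each partition
    n_m = partition_prob.size
    # Sample from f(m) restricted to each partition via the inverse CDF.
    u = rng.uniform(size=n_m)
    magnitudes = inverse_cdf(cdf[:-1] + u * partition_prob, m_min, m_max, beta)
    weights = partition_prob / (1.0 / n_m)  # Equation 6.9
    return magnitudes, weights

rng = np.random.default_rng(0)
# Assumed partition edges: wide at small magnitudes, narrow at large ones.
edges = [5.0, 5.6, 6.2, 6.6, 7.0, 7.2, 7.4, 7.6, 7.8, 8.0]
magnitudes, weights = sample_magnitudes(edges, beta=2.0, rng=rng)
```

Because the partition probabilities sum to one, the weights average to one, which is a convenient sanity check on any chosen stratification.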

Once the magnitudes are sampled using IS, the rupture locations can be obtained by sampling faults using the fault probabilities $P_j(m)$ (Equation 6.7). Note that $P_j(m)$ is non-zero only if the maximum allowable magnitude on fault j exceeds m. Let $n_f(m)$ denote the number of faults with non-zero values of $P_j(m)$. If $n_f(m)$ is small (around 10), a more efficient sampling approach is to consider each of those $n_f(m)$ faults to be the source of the earthquake and to consider $n_f(m)$ different earthquakes of the same



Figure 6.1: Importance sampling density functions for: (a) magnitude and (b) normalized intra-event residual; (c) recommended mean-shift as a function of the average number of sites and the average site-to-site distance normalized by the range of the spatial correlation model.



simulated magnitude. This fault sampling procedure is similar to the importance sampling of magnitudes. The importance sampling weight for fault j chosen by this procedure is computed as follows:

$$\frac{f(j|m)}{g(j|m)} = \frac{P_j(m)}{1/n_f(m)} \tag{6.10}$$

where f ( j|m) and g( j|m) denote the original and the alternate (implicit) probability mass functions for fault j given an earthquake of magnitude m. Once a fault is sampled, the rupture is located randomly on the fault.
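As a small illustration of Equation 6.10, the following Python sketch enumerates every fault that can host a given magnitude and attaches the corresponding weight. The fault probabilities and the function name are hypothetical values chosen for the example.

```python
def enumerate_faults(fault_probs):
    """Return (fault index, IS weight) for every fault with P_j(m) > 0."""
    active = [(j, p) for j, p in enumerate(fault_probs) if p > 0.0]
    n_f = len(active)
    # Equation 6.10: f(j|m) / g(j|m) = P_j(m) / (1 / n_f(m))
    return [(j, p * n_f) for j, p in active]

# Hypothetical P_j(m) for four faults; fault 2 cannot produce this magnitude.
scenarios = enumerate_faults([0.5, 0.3, 0.0, 0.2])
```

Each returned pair corresponds to one earthquake of the same simulated magnitude placed on a different fault.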

6.3.3 Simulation of normalized intra-event residuals

The set of normalized intra-event residuals at p sites of interest, $\boldsymbol{\varepsilon}_j = (\varepsilon_{1j}, \varepsilon_{2j}, \cdots, \varepsilon_{pj})$, follows a multivariate normal distribution $f(\boldsymbol{\varepsilon}_j)$ [Jayaram and Baker, 2008]. The mean of $\boldsymbol{\varepsilon}_j$ is the zero vector of size p, while the variance of each $\varepsilon_{ij}$ equals one. The correlation between the residuals at two sites is a function of the separation between the sites, and can be obtained from a spatial correlation model. In this work, the correlation coefficient between the residuals at two sites $i_1$ and $i_2$ separated by h km is computed using the following equation, which was calibrated using empirical observations (Chapter 3) [Jayaram and Baker, 2009a]:

$$\rho_{\varepsilon_{i_1 j}, \varepsilon_{i_2 j}}(h) = \exp(-3h/R) \tag{6.11}$$

where R controls the rate of decay of spatial correlation and is called the ‘range’ of the correlation model. The range depends on the intensity measure being used. In this work, the intensity measure of interest is the spectral acceleration corresponding to a period of 1 second, and the corresponding value of R equals 26 km. While a conventional MCS approach can be used to obtain realizations of ε j using f (e) [Fishman, 2006], this will result in a large number of near-zero (i.e., near-mean) residuals and few realizations from the upper and the lower tails. This is inefficient since for the purposes of lifeline risk assessment it is often of interest to study the upper tail (i.e., the ε j values that produce large intensities), which is not sampled adequately in the conventional MCS approach. An efficient alternate sampling density g(e) is a multivariate



normal density with the same variance and correlation structure as f(e), but with positive means for all $\varepsilon_{ij}$'s (i.e., a positive mean for the marginal distribution of each intra-event residual). In other words, the mean vector of g(e) is the p-dimensional vector $\mathbf{ms}_{intra} = (ms_{intra}, ms_{intra}, \cdots, ms_{intra})$. Sampling normalized intra-event residuals from this distribution g(e), which has a positive mean, will produce more realizations of large normalized intra-event residuals. Figure 6.1b shows the original and sampling marginal distributions for one particular $\varepsilon_{ij}$. This particular choice of the sampling distribution results in importance sampling weights that are simple to estimate:

$$\frac{f(\mathbf{e})}{g(\mathbf{e})} = \exp\left( \frac{1}{2} (\mathbf{e} - \mathbf{ms}_{intra})' \Sigma^{-1} (\mathbf{e} - \mathbf{ms}_{intra}) - \frac{1}{2} \mathbf{e}' \Sigma^{-1} \mathbf{e} \right) \tag{6.12}$$

where Σ denotes the covariance matrix of ε j . The positive mean of g(e) will ensure that the realizations from g(e) will tend to be larger than the realizations from f (e). It is, however, important to choose a reasonable value of the mean-shift msintra to ensure adequate preferential sampling of large ε j ’s, while avoiding sets of extremely large normalized intra-event residuals that will make the simulated intensity map so improbable as to be irrelevant. The process of selecting a reasonable value of msintra is described below. The first step in fixing the value of msintra is to note that the preferred value depends predominantly on three factors, namely, the extent of spatial correlations (measured by the range parameter R in Equation 6.11), the average site-to-site separation distance in the lifeline network being studied and the number of sites in the network. If sites are close to one another and if the spatial correlations are significant, the correlations between the residuals permit a larger mean-shift as it is reasonably likely to observe simultaneously large values of positively-correlated random variables. Similarly, the presence of fewer sites permits larger mean-shifts since it is more likely to observe jointly large values of residuals over a few sites than over a large number of sites. Hence, it is intended to determine the preferred mean-shifts as a function of the number of sites and the average site-to-site separation distances normalized by the range parameter. This is done by simulating the normalized



intra-event residuals in hypothetical analysis cases with varying numbers of sites and varying average site separation distances, considering several feasible mean-shifts in each case. The feasibility of the resulting residuals (i.e., whether the simulated set of residuals is reasonably probable) is then studied using the resulting importance sampling weights. Based on extensive sensitivity analysis, the authors found that the best results are obtained when 30% of the importance sampling weights fall below 0.1, if exceedance rates larger than $10^{-6}$ are of interest. The preferred mean-shifts are determined for each case based on this criterion, and are plotted in Figure 6.1c. This figure enables users to avoid an extremely computationally expensive search for an appropriate sampling distribution in a given analysis case. Incidentally, the figure shows that the mean-shift increases with average site separation distance and decreases with the number of sites. This validates the above-mentioned statement that larger site separation distances and fewer sites permit larger mean-shifts.
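The mean-shifted sampling of this section can be illustrated with the following Python sketch, which builds the correlation matrix of Equation 6.11, draws residuals from g(e) and evaluates the weights of Equation 6.12. The site coordinates, the sample size and the function names are assumptions made for the example; only R = 26 km and the mean-shift of 0.3 are taken from this chapter.

```python
import numpy as np

def correlation_matrix(coords_km, range_km):
    """Equation 6.11: rho(h) = exp(-3 h / R) for all site pairs."""
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-3.0 * h / range_km)

def intra_event_weight(e, sigma, mean_shift):
    """Equation 6.12: likelihood ratio f(e)/g(e); e has shape (n_samples, p)."""
    mu = np.full(sigma.shape[0], mean_shift)
    sigma_inv = np.linalg.inv(sigma)
    quad_g = np.einsum("ni,ij,nj->n", e - mu, sigma_inv, e - mu)
    quad_f = np.einsum("ni,ij,nj->n", e, sigma_inv, e)
    return np.exp(0.5 * quad_g - 0.5 * quad_f)

def sample_intra_event(sigma, mean_shift, n_samples, rng):
    """Draw residuals from the mean-shifted density g(e) with their weights."""
    mu = np.full(sigma.shape[0], mean_shift)
    e = rng.multivariate_normal(mu, sigma, size=n_samples)
    return e, intra_event_weight(e, sigma, mean_shift)

# Hypothetical site coordinates in km (five sites).
coords = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 15.0], [25.0, 5.0], [40.0, 30.0]])
sigma = correlation_matrix(coords, range_km=26.0)
rng = np.random.default_rng(1)
e, weights = sample_intra_event(sigma, mean_shift=0.3, n_samples=20000, rng=rng)
```

Since the expected value of f(e)/g(e) under g(e) is one, the sample mean of `weights` should be close to one; a mean far from one, or a few very large weights dominating the sum, suggests that the chosen mean-shift is too aggressive for the given site configuration.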

6.3.4 Simulation of normalized inter-event residuals

Following standard conventions, since the inter-event residual is a constant across all the sites during a single earthquake [e.g., Abrahamson and Youngs, 1992], the simulated normalized inter-event residuals should satisfy the following relation (which does not assume that the $\tau_{ij}$'s are equal, in order to be compatible with ground-motion models such as that of Abrahamson and Silva [2008]):

$$\eta_{ij} = \frac{\tau_{1j}}{\tau_{ij}} \eta_{1j} \quad \forall j \tag{6.13}$$

Thus the normalized inter-event residuals can be simulated by first simulating $\eta_{1j}$ from a univariate normal distribution with zero mean and unit standard deviation, and by subsequently evaluating the other normalized inter-event residuals using Equation 6.13. The IS procedure for $\eta_{1j}$ is similar to that for $\boldsymbol{\varepsilon}_j$, except that the alternate sampling distribution is univariate normal rather than multivariate normal, and has unit standard deviation and a positive mean $ms_{inter}$. The likelihood ratio in this case is

$$\frac{f(t)}{g(t)} = \exp\left( \frac{1}{2} (t - ms_{inter})^2 - \frac{1}{2} t^2 \right) \tag{6.14}$$



where t denotes a realization of the normalized inter-event residual. The authors have found that values of msinter between 0.5 and 1.0 produce an appropriate number of normalized inter-event residuals from the tail of the distribution.
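A minimal Python sketch of the inter-event sampling, combining Equations 6.13 and 6.14, is given below. The τ values and function names are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def inter_event_weight(t, mean_shift):
    """Equation 6.14: f(t)/g(t) = exp(0.5 (t - ms)^2 - 0.5 t^2)."""
    return np.exp(0.5 * (t - mean_shift) ** 2 - 0.5 * t ** 2)

def sample_inter_event(tau, mean_shift, rng):
    """Sample eta_1 from N(mean_shift, 1) and propagate it with Equation 6.13."""
    tau = np.asarray(tau, dtype=float)
    eta_1 = rng.normal(loc=mean_shift, scale=1.0)
    weight = inter_event_weight(eta_1, mean_shift)
    eta = tau[0] / tau * eta_1      # eta_i = (tau_1 / tau_i) * eta_1
    return eta, weight

rng = np.random.default_rng(2)
tau = [0.30, 0.28, 0.33]            # hypothetical inter-event standard deviations
eta, weight = sample_inter_event(tau, mean_shift=1.0, rng=rng)
```

Only one random number is drawn per earthquake, so a single scalar weight applies to the whole map.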

6.4


In this chapter, it is intended to obtain the exceedance curve for a lifeline loss measure denoted L (e.g., travel-time delay in a transportation network) considering seismic hazard. The exceedance curve, which provides the annual exceedance rates of various values of L, is the product of the exceedance probability curve and the total recurrence rate of earthquakes exceeding the minimum considered magnitude on all faults:

$$\nu_{L \geq u} = \left( \sum_{j=1}^{n_f} \nu_j \right) P(L \geq u) \tag{6.15}$$

A simple way to compute the annual exceedance rates, while treating each fault separately, would be to compute $\sum_{j=1}^{n_f} \nu_j P(L_j \geq u)$, where $P(L_j \geq u)$ denotes the exceedance probability for fault j, and the $\nu_j$ values account for unequal recurrence rates across faults. That approach is not possible here because the importance sampling of Equation 6.9 makes separation by faults difficult. In Equation 6.15, $P(L \geq u)$ is the probability that the loss due to any earthquake event of interest (irrespective of the fault of occurrence) exceeds u. It can be computed using the simulated maps, and in that form already accounts for the individual $P(L_j \geq u)$ values and the $\nu_j$ values.

6.4.1 Risk assessment based on realizations from Monte Carlo simulation

If a catalog of n intensity maps obtained using the conventional MCS approach is used for the risk assessment, the empirical estimate of the exceedance probabilities ($\hat{P}(L \geq u)$) can be obtained as follows (from Equation 6.4):

$$\hat{P}(L \geq u) = \frac{1}{n} \sum_{i=1}^{n} I(l_i \geq u) \tag{6.16}$$

where $l_i$ is the loss level corresponding to intensity map i, and $I(l_i \geq u)$ is an indicator function which equals 1 if $l_i \geq u$ and 0 otherwise.
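For illustration, the empirical estimator of Equation 6.16 and the scaling of Equation 6.15 can be written in Python as follows. The loss values, thresholds and total recurrence rate are made-up numbers used only to show the mechanics.

```python
import numpy as np

def mcs_exceedance_probability(losses, thresholds):
    """Equation 6.16: fraction of maps whose loss is at least u."""
    losses = np.asarray(losses, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    return (losses[None, :] >= thresholds[:, None]).mean(axis=1)

losses = [0.0, 2.0, 5.0, 5.0, 12.0]      # hypothetical losses, one per map
prob = mcs_exceedance_probability(losses, thresholds=[0.0, 5.0, 10.0])
total_rate = 0.2                          # hypothetical sum of the fault rates
annual_rate = total_rate * prob           # Equation 6.15
```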

6.4.2 Risk assessment based on realizations from importance sampling

The summand in Equation 6.16 can be evaluated using the approach described in Section 6.3. Assuming that a catalog of r importance sampling-based intensity maps is used for evaluating the risk, the estimate of the exceedance probability curve can be obtained as follows (from Equation 6.5):

$$\hat{P}(L \geq u) = \frac{1}{r} \sum_{i=1}^{r} I(l_i \geq u) \frac{f_S(i)}{g_S(i)} \tag{6.17}$$

where $f_S(i)/g_S(i)$ is the importance sampling weight corresponding to scenario intensity map i, which can be evaluated as follows:

$$\frac{f_S(i)}{g_S(i)} = \frac{f(m)}{g(m)} \frac{f(j|m)}{g(j|m)} \frac{f(\mathbf{e})}{g(\mathbf{e})} \frac{f(t)}{g(t)} = \Lambda_i \tag{6.18}$$

where m, j, e and t denote the magnitude, fault, normalized intra-event residuals and normalized inter-event residual corresponding to map i, respectively. The terms in Equation 6.18 can be obtained from Equations 6.9, 6.10, 6.12 and 6.14. Equation 6.17 shows that the exceedance probability curve is obtained by weighting the indicator functions by the importance sampling weights for the maps. In the rest of the chapter, this weight is denoted $\Lambda_i$, as shown in Equation 6.18. Using this notation, Equation 6.17 can be rewritten as follows:

$$\hat{P}(L \geq u) = \frac{1}{r} \sum_{i=1}^{r} I(l_i \geq u) \Lambda_i = \frac{\sum_{i=1}^{r} I(l_i \geq u) \Lambda_i}{\sum_{i=1}^{r} \Lambda_i} \tag{6.19}$$
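The weighted estimator can be sketched in Python as shown below. The function uses the normalized (second) form of Equation 6.19 and also evaluates the variance expression given as Equation 6.20 in this section. The function name and the numerical values are illustrative assumptions.

```python
import numpy as np

def is_exceedance(losses, weights, u):
    """Return the estimate of Equation 6.19 (normalized form) and its variance."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    weighted = (losses >= u) * weights           # I(l_i >= u) * Lambda_i
    p_hat = weighted.sum() / total
    # Variance of the estimate, following Equation 6.20.
    variance = ((weighted - p_hat) ** 2).sum() / (total * (total - 1.0))
    return p_hat, variance

losses = [0.0, 2.0, 5.0, 5.0, 12.0]              # hypothetical losses
weights = [2.0, 1.0, 0.5, 0.25, 0.25]            # hypothetical Lambda_i values
p_hat, variance = is_exceedance(losses, weights, u=5.0)
```

With all weights equal to one, the function reduces to the MCS estimator of Equation 6.16, which is a useful consistency check.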


The second equality in Equation 6.19 comes from the fact that $\sum_{i=1}^{r} \Lambda_i = r$, as seen by substituting u = 0 in the equation and noting that $\hat{P}(L \geq 0) = 1$. The variance (var) of this estimate can be shown to be

$$\mathrm{var}\left[ \hat{P}(L \geq u) \right] = \frac{\sum_{i=1}^{r} \left[ I(l_i \geq u) \Lambda_i - \hat{P}(L \geq u) \right]^2}{\left( \sum_{i=1}^{r} \Lambda_i \right) \left( \sum_{i=1}^{r} \Lambda_i - 1 \right)} \tag{6.20}$$

6.5 Data reduction using K-means clustering

The use of importance sampling causes a significant improvement in the computational efficiency of the simulation procedure, but the number of required IS intensity maps is still large and may pose a heavy computational burden. K-means clustering [McQueen, 1967] is thus used as a data reduction technique in order to develop a smaller catalog of maps by ‘clustering’ simulated ground-motion intensity maps with similar properties (i.e., similar spectral acceleration values at the sites of interest). This data reduction procedure is also used in machine learning and signal processing, where it is called vector quantization [Gersho and Gray, 1991]. K-means clustering groups a set of observations into K clusters such that the dissimilarity between the observations (typically measured by the Euclidean distance) within a cluster is minimized [McQueen, 1967]. Let $\mathbf{Sa}_1, \mathbf{Sa}_2, \cdots, \mathbf{Sa}_r$ denote r maps generated using importance sampling to be clustered, where each map $\mathbf{Sa}_j$ is a p-dimensional vector defined by $\mathbf{Sa}_j = [Sa_{1j}, Sa_{2j}, \cdots, Sa_{pj}]$. The K-means method groups these maps into clusters by minimizing V, which is defined as follows:

$$V = \sum_{i=1}^{K} \sum_{\mathbf{Sa}_j \in S_i} \| \mathbf{Sa}_j - \mathbf{C}_i \|^2 \tag{6.21}$$

where K denotes the number of clusters, $S_i$ denotes the set of maps in cluster i, $\mathbf{C}_i = [C_{1i}, C_{2i}, \cdots, C_{pi}]$ is the cluster centroid obtained as the mean of all the maps in cluster i, and $\| \mathbf{Sa}_j - \mathbf{C}_i \|^2$ denotes the distance between the map $\mathbf{Sa}_j$ and the cluster centroid $\mathbf{C}_i$. If the Euclidean distance is adopted to measure dissimilarity, then the distance between $\mathbf{Sa}_j$ and $\mathbf{C}_i$ is computed as follows:

$$\| \mathbf{Sa}_j - \mathbf{C}_i \|^2 = \sum_{q=1}^{p} (Sa_{qj} - C_{qi})^2 \tag{6.22}$$

In its simplest version, the K-means algorithm is composed of the following four steps:

Step 1: Pick K maps to denote the initial cluster centroids. This selection can be done randomly.
Step 2: Assign each map to the cluster with the closest centroid.
Step 3: Recalculate the centroid of each cluster after the assignments.
Step 4: Repeat steps 2 and 3 until no more reassignments take place.

Once all the maps are clustered, the final catalog can be developed by selecting a single map from each cluster, which is used to represent all maps in that cluster on account of the similarity of the maps within a cluster. In other words, if the map selected from a cluster produces loss l, it is assumed that all other maps in the cluster produce the same loss l by virtue of similarity. The maps in this smaller catalog can be used in place of the maps generated using importance sampling for the risk assessment (i.e., for evaluating $\hat{P}(L \geq u)$), which results in a dramatic improvement in the computational efficiency. This is particularly useful in applications where it is practically impossible to compute the loss measure L using more than K maps (where K equals a few hundred). In such cases, the maps obtained using IS can be grouped using the K-means method into K clusters, and one map can be randomly selected from each cluster in order to obtain the catalog of intensity maps to be used for the risk assessment. This procedure allows us to select K strongly dissimilar intensity maps as part of the catalog (since the maps eliminated are similar to one of these K maps in the catalog), while ensuring that the catalog is ‘stochastically representative’. Because only one map from each cluster is now used, the total weight associated with the map should be equal to the sum of the weights of all the maps in that cluster ($\sum_{i \in c} \Lambda_i$).
It is to be noted that even though the maps within a cluster are expected to be similar, for probabilistic consistency, a map must be chosen from a cluster with a probability proportional to its weight. Equation 6.19 can then be used with these sampled maps and the total weights to compute an exceedance probability curve using the catalog as follows:

$$\hat{P}(L \geq u) = \frac{\sum_{c=1}^{K} I(l^{(c)} \geq u) \left( \sum_{i \in c} \Lambda_i \right)}{\sum_{c=1}^{K} \left( \sum_{i \in c} \Lambda_i \right)} \tag{6.23}$$

where $l^{(c)}$ denotes the loss measure associated with the map selected from cluster c. Appendix 6.8 shows that the exceedance probabilities obtained using Equation 6.23 will be unbiased. This, and the fact that all the random variables are accounted for appropriately, is the reason why the catalog selected is claimed to be stochastically representative. Incidentally, the computational efficiency of this procedure can be improved with minor modifications to the clustering approach, as described in Appendix 6.9.
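The clustering and catalog selection described in this section can be sketched in Python as follows. This is a plain NumPy illustration of the four-step algorithm, the weight-proportional map selection and Equation 6.23; it does not include the modifications of Appendix 6.9. The synthetic maps, weights, loss function and all function names are assumptions made for the example.

```python
import numpy as np

def kmeans(maps, k, rng, max_iter=100):
    """Plain K-means (the four steps listed earlier) with Euclidean distance."""
    centroids = maps[rng.choice(len(maps), size=k, replace=False)]  # Step 1
    labels = np.full(len(maps), -1)
    for _ in range(max_iter):
        dist = ((maps[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)                            # Step 2
        if np.array_equal(new_labels, labels):                      # Step 4
            break
        labels = new_labels
        for c in range(k):                                          # Step 3
            if np.any(labels == c):
                centroids[c] = maps[labels == c].mean(axis=0)
    return labels

def build_catalog(labels, weights, rng):
    """Pick one map per cluster with probability proportional to its weight."""
    picked, totals = [], []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        w = weights[members]
        picked.append(rng.choice(members, p=w / w.sum()))
        totals.append(w.sum())          # total cluster weight
    return np.array(picked), np.array(totals)

def catalog_exceedance(catalog_losses, totals, u):
    """Equation 6.23 evaluated with the catalog losses and cluster weights."""
    return ((catalog_losses >= u) * totals).sum() / totals.sum()

rng = np.random.default_rng(3)
maps = rng.lognormal(size=(200, 4))            # synthetic Sa maps at 4 sites
weights = rng.uniform(0.1, 2.0, size=200)      # synthetic Lambda_i values
labels = kmeans(maps, 10, rng)
picked, totals = build_catalog(labels, weights, rng)
losses = maps.sum(axis=1)                      # hypothetical stand-in for a loss model
p_hat = catalog_exceedance(losses[picked], totals, u=5.0)
```

In a real application the loss would be evaluated only for the `picked` maps, which is where the computational saving arises; here it is computed for all maps merely because the stand-in loss is trivial.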

6.6 Application: Seismic risk assessment of the San Francisco Bay Area transportation network

In this section, the San Francisco Bay Area transportation network is used to illustrate the feasibility of the proposed risk assessment framework. It is intended to show that the seismic risk estimated using the catalog of 150 intensity maps matches well with the seismic risk estimated using the conventional MCS framework and a much greater number of maps (which is the benchmark approach). The catalog size of 150 is chosen since it may be tractable to a real-life lifeline risk assessment problem. If reduced accuracy and reduced emphasis on very large losses is acceptable, the number of maps could be reduced even further. Alternately, a larger number of maps can be chosen if the computational demand remains tractable.

6.6.1 Network data

The San Francisco Bay Area transportation network data are obtained from Stergiou and Kiremidjian [2006]. Figure 6.2a shows the Metropolitan Transportation Commission (MTC) San Francisco Bay Area highway network, which includes 29,804 links (roads) and 10,647 nodes. The network also includes 1,125 bridges from the five counties of the Bay Area. Stergiou and Kiremidjian [2006] classified these bridges based on their structural properties in accordance with the HAZUS [1999] manual. (The HAZUS [1999] fragility functions are used here only for illustrative purposes, and more realistic fragility functions can be used if applicable.) This classification is useful for estimating the structural damage to bridges due to various simulated intensity maps. The Bay Area network consists of a total of 1,120 transportation analysis zones (TAZ), which are used to predict the trip demand in specific geographic areas. The origin-destination (OD) data provided by Stergiou and Kiremidjian [2006] were obtained from the 1990 MTC household survey [Purvis, 1999]. Analyzing the performance of a network as large and complex as the San Francisco Bay Area transportation network under a large number of scenarios is extremely computationally intensive. Therefore, an aggregated representation of the Bay Area network is used for this example application. The aggregated network consists predominantly of freeways and expressways, along with the ramps linking the freeways and expressways. The nodes are placed at locations where links intersect or change in characteristics (e.g., change in the number of lanes). The aggregated network comprises 586 links and 310 nodes, and is shown in Figure 6.2b. Of the 310 nodes, 46 are denoted centroidal nodes that act as origins and destinations for the traffic. These centroidal nodes are chosen from the centroidal nodes of the original network in such a way that they are spread out over the entire transportation network. The data from the 1990 MTC household survey are aggregated to obtain the traffic demands at each centroidal node. The aggregation involves assigning the traffic originating or culminating in any TAZ to its nearest centroidal node. Of the 1,125 bridges in the original network, 1,038 bridges lie on the links of the aggregated network and are considered in the risk assessment procedure.
While the performance of the aggregated network may or may not be similar to that of the full network, the aggregated network serves as a reasonably realistic and complex test case for the proposed framework, to demonstrate its feasibility. The goal is to demonstrate that the data reduction techniques proposed here produce the same exceedance curve as the more exhaustive MCS. The simplified network is simple enough that MCS is feasible, but still retains the spatial distribution and network effects that are characteristic of more complex models. If the proposed techniques can be shown to be effective for this simplified model, then they can be used with more complex models where validation using MCS is not feasible.


6.6.2 Transportation network loss measure

A popular measure of network performance is the travel-time delay experienced by passengers in a network after an earthquake [Stergiou and Kiremidjian, 2006, Shiraki et al., 2007]. The delay is computed as the difference between the total travel time in the network before and after an earthquake.

Estimating travel time in the network

The total travel time (T) in a network is estimated as follows:

$$T = \sum_{i \in \mathrm{links}} x_i t_i(x_i) \tag{6.24}$$

where $x_i$ denotes the traffic flow on link i and $t_i(x_i)$ denotes the travel time of an individual passenger on link i. The travel time on link i is obtained as follows [Bureau of Public Roads, 1964]:

$$t_i(x_i) = t_i^f \left[ 1 + \alpha \left( \frac{x_i}{c_i} \right)^{\beta} \right] \tag{6.25}$$

where $t_i^f$ denotes the free-flow link travel time (i.e., the travel time of a passenger if link i were to be empty), $c_i$ is the capacity of link i, and α and β are calibration parameters, taken as 0.15 and 4 respectively [Shiraki et al., 2007]. Travel times on transportation networks are usually computed using the user equilibrium principle [Beckman et al., 1956], which states that each individual user would follow the route that will minimize his or her travel time. Based on the user-equilibrium principle, the link flows in the network are obtained by solving the following optimization problem:

$$\min \sum_{i \in \{\mathrm{links}\}} \int_0^{x_i} t_i(u)\,du \tag{6.26}$$

subject to the following constraints:

$$\sum_{j \in \mathrm{paths}} f_j^{od} = Q_{od} \quad \forall o \in \{\mathrm{org}\}, d \in \{\mathrm{dest}\} \tag{6.27}$$

$$x_i = \sum_{o \in \mathrm{org}} \sum_{d \in \mathrm{dest}} \sum_{j \in \mathrm{paths}} f_j^{od} \delta_{ji}^{od} \quad \forall i \in \{\mathrm{links}\} \tag{6.28}$$

$$f_j^{od} \geq 0 \quad \forall o \in \{\mathrm{org}\}, d \in \{\mathrm{dest}\}, j \in \{\mathrm{paths}\} \tag{6.29}$$

where $f_j^{od}$ denotes the flow between origin o and destination d that passes through path j (here, a path denotes a set of links through which the flow between a specified origin and a specified destination occurs), $Q_{od}$ denotes the desired flow between o and d, $\delta_{ji}^{od}$ is an indicator variable that equals 1 if the link i lies on path j and 0 otherwise, org denotes the set of all origins and dest denotes the set of all destinations. The current research work uses a popular solution technique for this optimization problem provided by Frank and Wolfe [1956]. It is to be noted that there are also other travel time and traffic flow estimation techniques, such as the dynamic user equilibrium formulation [e.g., Friesz et al., 1993], that could incorporate the non-equilibrium conditions which might exist after an earthquake.

Post-earthquake network performance

The current work assumes for simplicity that the post-earthquake demands equal the pre-earthquake demands even though this is known not to be true [Kiremidjian et al., 2003]. The changes in network performance after an earthquake are assumed to be due only to the delay and rerouting of traffic caused by structural damage to bridges. The damage states of the bridges are computed considering only the ground shaking, and other possible damage mechanisms such as liquefaction are not considered. The bridge fragility curves provided by HAZUS [1999] are used to estimate the probability of a bridge being in or exceeding a particular damage state (no damage, minor damage etc.) based on the simulated ground-motion intensity (spectral acceleration at 1 second) at the bridge site. These damage state probabilities are then used to simulate the damage state of the bridge following the earthquake. Damaged bridges cause reduced capacity in the link containing the bridge.
The reduced capacities corresponding to the five different HAZUS damage states are 100% (no damage), 75% (slight damage/ moderate damage) and 50% (extensive damage/ collapse). The non-zero capacity corresponding to the bridge collapse damage state may seem surprising at first glance. This is based on the argument that there are alternate routes (apart from the freeways and highways considered in the model) that provide reduced access to



transportation services in the event of a freeway or a highway closure [Shiraki et al., 2007]. Such redundancies are prevalent in most transportation networks. A network can have several bridges in a single link, and in such cases, the link capacity is a function of the damage to all the bridges in the link. The current work assumes that the link capacity reduction equals the average of the capacity reductions attributable to each bridge in the link. This is a simplification, and further research is needed to handle the presence of multiple bridges in a link. The post-earthquake network performance is then computed by solving the user-equilibrium problem using the new set of link capacities, and a new estimate of the total travel time in the network is obtained. It is to be noted that the current work estimates the performance of the network only immediately after an earthquake. The changes in the performance with network component restorations are not considered here for simplicity.
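To make the loss measure concrete, the following Python sketch evaluates the travel-time delay for a toy network of two parallel links serving a single origin-destination pair. It uses the BPR function of Equation 6.25, a Frank-Wolfe iteration with a bisection line search for the user-equilibrium flows, and the averaging rule for links with several bridges. The network, demand, free-flow times, capacities and bridge damage factors are all hypothetical, and the restriction to parallel links is a simplification made for the example, not the network model used in this work.

```python
import numpy as np

ALPHA, BETA = 0.15, 4.0

def bpr_time(x, t_free, cap):
    """Equation 6.25: link travel time at flow x."""
    return t_free * (1.0 + ALPHA * (x / cap) ** BETA)

def link_capacity(cap, bridge_factors):
    """Average the capacity factors of all bridges on a link."""
    return cap * np.mean(bridge_factors) if len(bridge_factors) else cap

def frank_wolfe(demand, t_free, cap, n_iter=50):
    """User-equilibrium flows on parallel links between a single OD pair."""
    x = np.zeros_like(t_free)
    x[np.argmin(t_free)] = demand                  # initial all-or-nothing load
    for _ in range(n_iter):
        y = np.zeros_like(x)
        y[np.argmin(bpr_time(x, t_free, cap))] = demand   # auxiliary flows
        d = y - x
        # Derivative of the objective of Equation 6.26 along direction d.
        slope = lambda lam: np.dot(bpr_time(x + lam * d, t_free, cap), d)
        if slope(1.0) <= 0.0:
            lam = 1.0
        else:
            lo, hi = 0.0, 1.0
            for _ in range(60):      # bisection on the line-search derivative
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if slope(mid) < 0.0 else (lo, mid)
            lam = 0.5 * (lo + hi)
        x = x + lam * d
    return x

def total_travel_time(x, t_free, cap):
    """Equation 6.24."""
    return np.sum(x * bpr_time(x, t_free, cap))

demand = 3000.0
t_free = np.array([10.0, 15.0])
cap_pre = np.array([1000.0, 1000.0])
# Link 0 carries two damaged bridges (75% and 50% capacity); link 1 has none.
cap_post = np.array([link_capacity(1000.0, [0.75, 0.5]), link_capacity(1000.0, [])])
x_pre = frank_wolfe(demand, t_free, cap_pre)
x_post = frank_wolfe(demand, t_free, cap_post)
delay = total_travel_time(x_post, t_free, cap_post) - total_travel_time(x_pre, t_free, cap_pre)
```

At equilibrium, both used links have the same travel time, which provides a simple check on the solution. On a general network the all-or-nothing step would require a shortest-path computation for every OD pair.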

6.6.3 Ground-motion hazard

The San Francisco Bay Area seismicity information is obtained from USGS [2003]. Ten active faults and fault segments are considered. The characteristic magnitude-recurrence relationship of Youngs and Coppersmith [1985] is used to model f (m) with the distribution parameters specified by the USGS, and 5.0 considered to be the lower bound magnitude of interest. The flattening of this magnitude distribution towards the maximum magnitude value (Figure 6.1) is to account for the higher probability of occurrence of the characteristic earthquake on the fault [Youngs and Coppersmith, 1985]. The ground-motion model of Boore and Atkinson [2008] is used to obtain the median ground-motion intensities and the standard deviations of the residuals needed in Equation 6.1.

6.6.4


Risk assessment using importance sampling The IS framework requires that the parameters of the sampling distribution for the magnitude and the residuals be chosen reasonably in order to obtain reliable results efficiently. The set of parameters includes the appropriate stratification for magnitudes, the mean-shift



for normalized inter-event residuals (msinter ) and the mean-shift for normalized intra-event residuals (msintra ). The stratification of the range of magnitudes is carried out so as to obtain a desired histogram of magnitudes. The partition width is chosen to be 0.3 between 5.0 and 6.5, 0.15 between 6.5 and 7.3 and 0.05 beyond 7.3. The results obtained using the simulations are not significantly affected by moderate variations in the partitions, suggesting that the stratification will be effective as long as it is chosen to preferentially sample large magnitudes. Normalized inter-event residuals are sampled using an msinter of 1.0. Using the procedure described earlier, the value of msintra is fixed at 0.3. The loss measure of interest here is the travel-time delay (i.e., the variable L denoting loss measure in the previous section is the travel-time delay). Figure 6.3a shows the exceedance curve for travel-time delays obtained using the IS framework. This exceedance curve is obtained by sampling 25 magnitudes, each of which is then positioned on the active faults as described in Section 6.3.2, and 50 sets of inter and intra-event residuals for each magnitude-location pair (resulting in a total of 12,500 maps). To validate the IS, an exceedance curve is also estimated using the benchmark method (MCS). Strictly, the benchmark approach should use MCS to sample the magnitudes and the ground-motion residuals. This is computationally prohibitive, however, even for the aggregated network and hence the benchmark approach used in the current study uses IS for generating the magnitudes but MCS for the residuals. IS of a single random variable has been shown to be effective in a wide variety of applications including lifeline risk assessment [Kiremidjian et al., 2003], and so further validation is not needed. 
On the other hand, the simulation procedure for intra-event residuals involves the novel application of IS of a correlated vector of random variables, and hence, is the focus of the validation study described in this section. Figure 6.3a shows the exceedance curve obtained using IS for generating 25 magnitudes and MCS for generating 500 sets of inter and intra-event residuals per magnitude-location pair, resulting in a total of 125,000 maps. As seen from the figure, the exceedance curve obtained using the IS framework closely matches that obtained using the benchmark method, indicating the accuracy of the results obtained using IS. This is further substantiated by Figure 6.3b, which plots the estimated coefficient of variation (CoV) (computed using Equations 6.19 and 6.20) of the exceedance rates obtained using the IS approach and



Figure 6.2: (a) San Francisco Bay Area transportation network (b) Aggregated network.

Figure 6.3: (a) Travel-time delay exceedance curves (b) Coefficient of variation of the annual exceedance rate (c) Comparison of the efficiency of MCS, IS and the combination of K-means and IS (d) Travel-time delay exceedance curve obtained using the K-means method.



the benchmark approach. It can be seen from the figure that the CoV values corresponding to travel-time delays obtained using IS are comparable to those obtained using MCS even though the IS uses one-tenth the number of simulations required by the MCS. Further, it is also seen that using IS in place of MCS for simulating magnitudes typically reduces the computational expense of the risk assessment by a factor of 10, and hence, the overall IS framework reduces the number of computations required for the risk assessment by a factor of nearly 100. It is to be noted that IS produces unbiased risk estimates, and any minor deviation between the IS and the MCS curves in Figure 6.3a is due to the small variances in the risk estimates. Risk assessment using IS and K-means clustering The 12,500 maps obtained using IS are next grouped into 150 clusters using the K-means method. A catalog is then developed by randomly sampling one map from each cluster in accordance with the map weights as described in section 6.5. This catalog is used to estimate the travel-time delay exceedance curve based on Equation 6.23, and the curve is seen to match reasonably well with the exceedance curve obtained using the IS technique (Figure 6.3a). Based on the authors’ experience, the deviation of this curve from the IS curve at the large delay levels is a result of the variance of the exceedance rates rather than any systematic deviation. The variance in the exceedance curves is a consequence of the fact that the map sampled from each cluster is not identical to the other maps in the cluster (although they are similar). To ascertain the variance of the exceedance rates, the clustering and the map selection processes are repeated several times in order to obtain multiple catalogs of 150 representative ground-motion intensities, which are then used for obtaining multiple exceedance curves. 
The coefficients of variation of the exceedance rates are then computed from these multiple exceedance curves and are plotted in Figure 6.3b. It can be seen that the CoV values obtained using the 150 maps generated by the IS and K-means combination are about three times larger than those obtained using the 12,500 IS maps and the 125,000 MCS maps. This is to be expected, though, on account of the large reduction in the number of maps. The factor of three increase in the CoVs, however, is significantly smaller than what can be expected if IS and MCS are used to obtain the 150 maps directly. This can be



seen from Figure 6.3b, which shows the large CoV values of the exceedance rates obtained using 150 ground-motion maps selected directly using the IS and the MCS procedures. Alternately, the relative performances of the IS and K-means combination, the IS method and the MCS method can also be assessed by comparing the number of maps to be simulated using these methods in order to achieve the same CoVs. It is seen that 3,500 IS maps and 11,750 MCS maps are necessary to produce CoVs similar to those achieved using the 150 IS and K-means combination maps (Figure 6.3c). Finally, Figure 6.3d shows the mean exceedance rates, along with the empirical 95 percentile (point-wise) confidence interval obtained using the K-means method. Also shown in this figure is the exceedance curve obtained using the IS technique. The mean K-means curve and the IS curve match very closely, indicating that the sampling and data reduction procedure suggested in this work results in unbiased exceedance rates (this is also theoretically established in Appendix 6.8). The width of the confidence interval turns out to be reasonably small, especially considering that the exceedance rates have been obtained using only 150 intensity maps. If the K-means clustering procedure is effective, intensity maps in a cluster will be similar to each other. Therefore, the travel-time delays associated with all the maps in a cluster should be similar to one another, and different from the travel-time delays associated with the maps in other clusters. In other words, the mean travel-time delays computed using all the maps in one cluster should be different from the mean from other clusters, while the standard deviation of the travel-time delays in a cluster should be small as a result of the similarity within a cluster. Conversely, ‘random clustering’, in which the maps obtained from the IS are randomly placed in clusters irrespective of their properties, would be very inefficient.
Figure 6.4 compares the mean and the standard deviation of cluster travel-time delays obtained using K-means clustering and random clustering. The smoothly varying cluster means obtained using K-means, as compared to the nearly uniform means obtained using random clustering, show that K-means has been successful in separating dissimilar intensity maps. Similarly, the cluster standard deviations obtained using K-means are, for the most part, considerably smaller than the standard deviations obtained using random clustering (and are large for larger cluster numbers because all delays in these clusters are large). The occasional spikes in the standard deviations are a result of small sample sizes



in some clusters. In summary, the exceedance curves obtained and the results from the tests for the efficiency of K-means clustering indicate that the clustering method has been successful in identifying and grouping similar maps together. As a consequence, substantial computational savings can be achieved by eliminating redundant (similar) maps, without considerably affecting the accuracy of the exceedance rates. It is to be noted that this approach is primarily meant for modeling the upper tail of the risk curve accurately. A conventional Monte Carlo approach might be more appropriate when more frequently exceeded losses, such as the median loss, are of interest.

Hazard consistency

The proposed framework not only produces reasonably accurate loss estimates, but also intensity maps that are hazard consistent. In other words, the site hazard curves obtained based on the final catalog of intensity maps match the site ground-motion hazard curves obtained from the fault and the ground-motion model using numerical integration (i.e., traditional PSHA). Figures 6.5a and b show the site hazard curves at two different sites obtained using numerical integration, importance sampling (for magnitudes and residuals) and the combination of importance sampling and K-means clustering. It can be seen that the sampling and clustering framework reasonably reproduces the site ground-motion hazard obtained through numerical integration.

6.6.5

Importance of modeling ground-motion uncertainties and spatial correlations

The transportation network risk assessment is repeated assuming uncorrelated intra-event residuals, and the new exceedance curve obtained is plotted in Figure 6.6. It can be seen that the risk is considerably underestimated when the spatial correlations are ignored. Further, some past risk assessments have completely ignored the uncertainty in the ground-motion intensities (i.e., median intensity maps are used, and inter- and intra-event residuals are ignored). A risk assessment carried out this way, also plotted in Figure 6.6, shows that the risk is even more substantially underestimated in this case. This happens because the



Figure 6.4: (a) Mean of travel-time delays within a cluster (b) Standard deviation of travel-time delays within a cluster. With both clustering methods, cluster numbers are assigned in order of increasing mean travel-time delay within the cluster for plotting purposes.

Figure 6.5: Comparison of site hazard curves obtained at two sample sites using the sampling framework with that obtained using numerical integration. (a) Sample site 1 and (b) Sample site 2.



possibility of observing above-median ground-motion intensities during a given earthquake is not considered. Such simplifications clearly introduce significant errors into the risk calculations, and should thus be avoided.

6.7

Conclusions

An efficient simulation-based framework based on importance sampling and K-means clustering has been proposed, that can be used for the seismic risk assessment of lifelines. The framework can be used for developing a small, but stochastically-representative catalog of ground-motion intensity maps that can be used for performing lifeline risk assessments. The importance sampling technique is used to preferentially sample important ground-motion intensity maps, and the K-means clustering technique is used to identify and combine redundant maps. It is shown theoretically and empirically that the risk estimates obtained using these techniques are unbiased. The study proposes importance sampling schemes that can be used for sampling earthquake magnitudes, rupture locations, inter-event residuals and spatially correlated maps of intra-event residuals. Magnitudes are sampled by first stratifying the magnitude range of interest into smaller partitions and by selecting one magnitude from each partition. The partitions are made narrower at larger magnitudes to ensure that larger magnitudes are preferentially sampled. The normalized residuals are sampled from a normal distribution with a positive mean, rather than a zero mean, to sample more large positive residuals. Techniques are also suggested to estimate the optimal parameters of these alternate sampling density functions. The proposed framework was used to evaluate the exceedance rates of various travel-time delays on an aggregated form of the San Francisco Bay Area transportation network. Simplified transportation network analysis models were used to illustrate the feasibility of the proposed framework. The exceedance rates were obtained using a catalog of 150 maps generated using the combination of importance sampling and K-means clustering, and were shown to be in good agreement with those obtained using the conventional Monte Carlo simulation method. 
Therefore, the proposed techniques can reduce the computational expense of a simulation-based risk assessment by several orders of magnitude, making it practically feasible. The efficiency of the proposed technique was compared to that of conventional



techniques using the coefficient of variation (CoV) of the exceedance rates. It was shown that the CoVs achieved using the 150 maps obtained from the combination of importance sampling and K-means clustering can only be reproduced by 3,500 importance-sampling maps or 11,750 MCS maps (conventional MCS for residuals and importance sampling for magnitudes), thereby indicating the efficiency of the proposed technique. The study also showed that the proposed framework automatically produces intensity maps that are hazard consistent. Finally, the study showed that the uncertainties in ground-motion intensities and the spatial correlations between ground-motion intensities at multiple sites must be modeled in order to avoid introducing significant errors into the lifeline risk calculations. For the network considered in this work, ignoring spatial correlations results in about a 30% reduction in the estimated travel-time delays at small annual exceedance rates ($10^{-6}$/year), while ignoring uncertainties results in about a 70% reduction in the estimated travel-time delays at small exceedance rates.

6.8

Appendix: Proof that the exceedance rates obtained using IS and K-means clustering are unbiased

This section illustrates that the loss (e.g., travel-time delay) exceedance rates obtained using a catalog of ground-motion intensities generated by the IS and K-means framework are unbiased. Since the importance sampling procedure produces unbiased estimates [Fishman, 2006], it will suffice to establish that the exceedance rates obtained using the K-means clustered catalog of maps are unbiased estimators of the exceedance rates obtained using the IS maps. This proof will further support the empirical observation that the example exceedance rates from the different procedures are equivalent.

Let $l_1, l_2, \dots, l_r$ denote the loss measures (e.g., travel-time delay in a transportation network) corresponding to the $r$ intensity maps obtained using importance sampling. Let $\Lambda_1, \Lambda_2, \dots, \Lambda_r$ denote the weights corresponding to the maps as defined in Equation 6.18. Let $\hat{P}_{IS}$ denote the exceedance probability curve obtained using the IS maps (Equation 6.19). Assume that the $r$ maps are grouped into $K$ clusters. (This proof does not require



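knowledge about the clustering technique used.) Let $l_{(c)}$ be the travel-time delay in the network corresponding to the map selected from cluster $c$. The exceedance probability curve ($\hat{P}_{KM}(L \geq u)$) can be obtained from the catalog of $l_{(1)}, l_{(2)}, \dots, l_{(K)}$ based on Equation 6.23. Unbiasedness can be established by showing that the expected value of $\hat{P}_{KM}(L \geq u)$ equals $\hat{P}_{IS}(L \geq u)$. The expected value of $\hat{P}_{KM}(L \geq u)$ is computed using the law of iterated expectations, by first conditioning it on a possible grouping $G$ (i.e., a possible grouping of maps into clusters obtained using the clustering method), and then by computing the expectation over all possible groupings. The following equations describe this procedure:

\[
\begin{aligned}
E\left[\hat{P}_{KM}(L \geq u)\right]
&= E\left[\frac{\sum_{c=1}^{K} I\left(l_{(c)} \geq u\right) \sum_{i \in c} \Lambda_i}{\sum_{c=1}^{K} \sum_{i \in c} \Lambda_i}\right] \\
&= E\left[\frac{\sum_{c=1}^{K} I\left(l_{(c)} \geq u\right) \sum_{i \in c} \Lambda_i}{\sum_{i=1}^{r} \Lambda_i}\right] \\
&= E_G\left\{ E\left[\left.\frac{\sum_{c=1}^{K} I\left(l_{(c)} \geq u\right) \sum_{i \in c} \Lambda_i}{\sum_{i=1}^{r} \Lambda_i} \,\right|\, G\right]\right\} \\
&= \frac{1}{\sum_{i=1}^{r} \Lambda_i} E_G\left[\sum_{c=1}^{K} P\left(l_{(c)} \geq u \mid G\right) \sum_{i \in c} \Lambda_i\right] \\
&= \frac{1}{\sum_{i=1}^{r} \Lambda_i} E_G\left[\sum_{c=1}^{K} \frac{\sum_{j \in c} I\left(l_j \geq u\right) \Lambda_j}{\sum_{j \in c} \Lambda_j} \sum_{i \in c} \Lambda_i\right] \\
&= \frac{1}{\sum_{i=1}^{r} \Lambda_i} E_G\left[\sum_{c=1}^{K} \sum_{j \in c} I\left(l_j \geq u\right) \Lambda_j\right] \\
&= \frac{1}{\sum_{i=1}^{r} \Lambda_i} \sum_{c=1}^{K} \sum_{j \in c} I\left(l_j \geq u\right) \Lambda_j \\
&= \frac{\sum_{i=1}^{r} I\left(l_i \geq u\right) \Lambda_i}{\sum_{i=1}^{r} \Lambda_i} \\
&= \hat{P}_{IS}(L \geq u)
\end{aligned}
\tag{6.30}
\]

This shows that the exceedance rates obtained using the small catalog of ground-motion intensities are unbiased.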
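The identity in Equation 6.30 can also be checked numerically. The sketch below is only an illustration under stated assumptions: the losses, weights and grouping are synthetic, and the representative map of each cluster is assumed to be drawn with probability proportional to its weight (which is what the fifth line of Equation 6.30 implies). It computes the conditional expectation of the clustered estimator exactly and also approximates it by repeated random selection.

```python
import numpy as np

rng = np.random.default_rng(0)

# All values below are synthetic and purely illustrative.
r, K, u = 200, 10, 1.5                        # number of maps, clusters, loss threshold
losses = rng.lognormal(0.0, 1.0, size=r)      # l_1, ..., l_r
weights = rng.uniform(0.1, 1.0, size=r)       # Lambda_1, ..., Lambda_r
groups = rng.integers(0, K, size=r)           # an arbitrary grouping G (any method works)

indicator = (losses >= u).astype(float)
p_is = np.sum(indicator * weights) / np.sum(weights)   # IS estimate of P(L >= u)

# Exact conditional expectation of the clustered estimator given G, assuming the
# representative of cluster c is drawn with probability Lambda_j / sum_{i in c} Lambda_i.
expected_p_km = 0.0
for c in range(K):
    members = groups == c
    if not members.any():
        continue
    cluster_weight = weights[members].sum()
    select_prob = weights[members] / cluster_weight
    expected_p_km += np.sum(select_prob * indicator[members]) * cluster_weight
expected_p_km /= weights.sum()

def one_catalog_estimate():
    """Draw one representative per cluster and evaluate the clustered estimator."""
    total = 0.0
    for c in range(K):
        idx = np.flatnonzero(groups == c)
        if idx.size == 0:
            continue
        pick = rng.choice(idx, p=weights[idx] / weights[idx].sum())
        total += indicator[pick] * weights[idx].sum()
    return total / weights.sum()

# Monte Carlo average over many randomly drawn catalogs.
mc_mean = np.mean([one_catalog_estimate() for _ in range(2000)])
print(p_is, expected_p_km, mc_mean)
```

The exact expectation agrees with the IS estimate for any grouping, which mirrors the remark that the proof does not depend on the clustering technique.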


6.9


Appendix: Improving the computational efficiency of the K-means clustering method

Clustering a large number of intensity maps (e.g., 12,500) in a single step may be computationally prohibitive on computers with limited memory and processing ability, because clustering involves repetitive computations of the distance between each map and the cluster centroids. In such cases, the authors propose the following two-step clustering technique, in which the maps are preliminarily grouped into clusters using a simplified distance measure, followed by a rigorous final clustering step using the distance measure defined in Equation 6.22. This two-step process is described below.

In the preliminary clustering step, the intensity maps are grouped into a small number of preliminary clusters, with the distance between map $\mathbf{Sa}_j$ and centroid $\mathbf{C}_i$ computed as $\left(\sum_{q=1}^{p} Sa_{qj} - \sum_{q=1}^{p} C_{qi}\right)^2$. In other words, the distance measure is based on the sum of the intensities corresponding to the intensity map. The sum of the intensities is chosen as the basis for clustering since it has been seen in past research [Campbell and Seligson, 2003] and in the current research work to be a reasonable indicator of the risk associated with an intensity map. Further, the K-means method is extremely fast when the distance is based on a single parameter.

The final clustering step is used to refine the preliminary clusters, and involves further clustering within each preliminary cluster using the distance measure defined in Equation 6.22. If 50 preliminary clusters are used, each of these could be subdivided into 3 clusters using the K-means method (yielding 150 final clusters). Even though the more rigorous distance measure is used in this step, it is much faster because the final clustering is based on the far smaller number of maps stored within each preliminary cluster. Further, the memory demand in this case is much smaller than when clustering is carried out in a single step.
Figure 6.7 shows the (point-wise) confidence intervals of the travel-time delay exceedance curves obtained using the two-step clustering procedure, where 50 preliminary clusters are each subdivided into three final clusters. It can be seen from Figures 6.3d and 6.7 that the results obtained using the single-step and the two-step clustering approaches are essentially identical. For this application, the two-step clustering procedure is five times faster than the single-step clustering procedure.
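The two-step procedure can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the helper names (`kmeans`, `two_step_kmeans`) and the synthetic maps are hypothetical, and plain Euclidean distance is assumed as a stand-in for the distance measure of Equation 6.22.

```python
import numpy as np

def kmeans(data, k, rng, n_iter=100):
    """Plain K-means with Euclidean distance (an assumed stand-in); returns labels."""
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.full(len(data), -1)
    for _ in range(n_iter):
        dist = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                   # no reassignments: converged
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = data[labels == c].mean(axis=0)
    return labels

def two_step_kmeans(maps, n_prelim, n_sub, rng):
    """Preliminary clustering on summed intensities, then refinement per cluster."""
    # Step 1: one scalar per map (the sum of its intensities) drives the grouping.
    totals = maps.sum(axis=1, keepdims=True)
    prelim = kmeans(totals, n_prelim, rng)
    # Step 2: cluster the full intensity vectors within each preliminary cluster.
    final = np.empty(len(maps), dtype=int)
    next_label = 0
    for c in range(n_prelim):
        idx = np.flatnonzero(prelim == c)
        if idx.size == 0:
            continue                                # preliminary cluster ended up empty
        k = min(n_sub, idx.size)
        final[idx] = next_label + kmeans(maps[idx], k, rng)
        next_label += k
    return final

rng = np.random.default_rng(1)
maps = rng.lognormal(mean=-1.0, sigma=0.6, size=(2000, 20))   # synthetic intensity maps
labels = two_step_kmeans(maps, n_prelim=50, n_sub=3, rng=rng)
print(len(np.unique(labels)), "final clusters")
```

Because the second step only ever sees the maps of one preliminary cluster, the largest distance matrix held in memory is much smaller than in a single-step clustering of all maps.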



Figure 6.6: Exceedance curves obtained using simplifying assumptions.

Figure 6.7: Travel-time delay exceedance curve obtained using the two-step clustering technique.

Chapter 7

Lifeline performance assessment using statistical learning techniques

7.1

Abstract

Chapter 6 proposed a simulation-based method involving importance sampling and K-means clustering to efficiently generate a small catalog of stochastically-representative ground-motion maps that can be used for lifeline risk assessment. The current study focuses on the highly computationally demanding task of estimating the confidence interval for the risk estimates obtained using this simulation-based method. Estimating the confidence intervals is computationally intensive because it requires repetitive risk calculations (in order to estimate a variance for the risk estimates) that in turn involve numerous lifeline performance evaluations. In order to reduce the computational demand, the catalog of ground-motion maps generated in Chapter 6 is used in conjunction with a statistical learning technique called Multiple Additive Regression Trees (MART) to develop an approximate relationship between the lifeline performance and the ground-motion intensities during an earthquake. The lifeline performance predicted by this relationship can be used in place of the exact lifeline performance (the evaluation of which is computationally intensive) to expedite the computation of several lifeline risk-related parameters, including confidence intervals.


Figure 7.1: Sample ground-motion map corresponding to an earthquake on the San Andreas fault. A map is a collection of ground movement levels (ground-motion intensities) at all the sites of interest. The sites of interest, in this case, are located in the San Francisco Bay Area.

7.2

Introduction

Probabilistic seismic risk assessment for lifelines is less straightforward than for individual structures. Lifelines, by virtue of their large size and geographic spread, are affected by earthquakes that originate on several faults, which necessitates the consideration of numerous probable future earthquake scenarios. Further, lifeline risk assessment is based on a large vector of spatially-correlated ground-motion intensities. The link between the ground-motion intensities at the sites and the performance of the lifeline is usually not available in closed form. These complexities make it difficult to use analytical frameworks for lifeline risk assessment. As a result, Monte Carlo simulation (MCS)-based methods are commonly used for characterizing spatial ground motions and for estimating lifeline risk [e.g., Campbell and Seligson, 2003, Crowley and Bommer, 2006, Kiremidjian et al., 2007, Shiraki et al., 2007].


In the MCS approach, several possible future ground-motion maps (which are collections of ground-motion intensities at all the sites of interest) are probabilistically generated, and the performance of the lifeline is evaluated under each intensity map. (A sample ground-motion map due to a magnitude 8 earthquake on the San Andreas fault is shown in Figure 7.1. This particular map has been simulated without consideration of local-site effects purely for illustration purposes, but the studies carried out in this thesis include local-site effects.) This approach is, however, highly computationally intensive, primarily because it involves repeated evaluations of lifeline performance under a large number of simulated ground-motion intensity maps. In the past, researchers have used several simplifying assumptions (e.g., a single dominating scenario earthquake, deterministic ground-motion intensities, absence of spatial correlation) in order to reduce the required number of simulations. These simplifications can, however, lead to inaccuracies in the risk assessment results, as discussed elsewhere in the thesis.

Chapter 6 [Jayaram and Baker, 2010] proposed a simulation-based method involving importance sampling and K-means clustering to efficiently generate a small catalog of stochastically-representative ground-motion maps that can be used for lifeline risk assessment. Importance sampling is used to preferentially sample events with extreme ground-motion intensities that contribute to the lifeline risk. K-means clustering is used to eliminate redundant intensity maps (i.e., maps that are similar to other maps). They showed that the risk estimates obtained using this small catalog are in good agreement with those obtained using the conventional MCS that uses a much larger number of simulations.
The current study focuses on the highly computationally demanding task of estimating the confidence intervals for the risk estimates obtained using the above-described simulation-based method. Estimating the confidence intervals is computationally intensive because it requires repetitive risk calculations (in order to estimate a variance for the risk estimates) that involve numerous lifeline performance evaluations. In order to reduce the computational demand, the catalog of ground-motion maps generated in Chapter 6 is used in conjunction with a statistical learning technique called Multiple Additive Regression Trees (MART) [Friedman, 1999] to develop an approximate relationship between the lifeline performance and the ground-motion intensities during an earthquake. The lifeline performance predicted by this relationship can be used in place of the exact lifeline


performance (the evaluation of which is intensive) to expedite the computation of several lifeline risk-related parameters. One notable work in this regard is that of Guikema [2009] who proposed to use approximate regression relationships for evaluating the lifeline performance. That work, however, is purely conceptual and does not give concrete examples. Chapter 6 estimated the travel-time delay exceedance curves for the San Francisco Bay Area transportation network. In this study, the performance relationship developed using MART is used for estimating confidence intervals for these curves. It is seen that the confidence intervals obtained using MART match well with those obtained using the exact loss function.

7.3

Brief introduction to ground-motion map sampling

This section describes the conventional Monte Carlo ground-motion sampling procedure as well as the importance sampling and K-means clustering procedures used in Chapter 6.

7.3.1

Conventional MCS of ground-motion maps

The distribution of the ground-motion intensity at any particular site is predicted using a ground-motion model, which takes the following form [e.g., Boore and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]:

\[
\ln(Sa_{ij}) = \ln\left(\overline{Sa}_{ij}\right) + \sigma_{ij}\,\varepsilon_{ij} + \tau_{ij}\,\eta_{ij} \tag{7.1}
\]

where $Sa_{ij}$ denotes the spectral acceleration (at the period of interest) at site $i$ during earthquake $j$; $\overline{Sa}_{ij}$ denotes the predicted (by the ground-motion model) median spectral acceleration, which depends on parameters such as magnitude, distance, period and local-site conditions; $\varepsilon_{ij}$ denotes the normalized intra-event residual and $\eta_{ij}$ denotes the normalized inter-event residual. Both $\varepsilon_{ij}$ and $\eta_{ij}$ are univariate normal random variables with zero mean and unit standard deviation. $\sigma_{ij}$ and $\tau_{ij}$ are standard deviation terms that are estimated as part of the ground-motion model and are functions of the spectral period of interest, and in some models also functions of the earthquake magnitude and the distance of the site from the rupture.


Probabilistic sampling of ground-motion intensity at multiple sites involves the following steps [Crowley and Bommer, 2006, Jayaram and Baker, 2010]:

Step 1: Use MCS to generate earthquakes of varying magnitudes on the active faults in the region, considering appropriate magnitude-recurrence relationships (e.g., the Gutenberg-Richter relationship).

Step 2: Using a ground-motion model (Equation 7.1), obtain the median ground-motion intensities ($\overline{Sa}_{ij}$) and the standard deviations of the intra-event and the inter-event residuals ($\sigma_{ij}$ and $\tau_{ij}$) at all the sites.

Step 3: Generate the normalized inter-event residual term ($\eta_{ij}$) by sampling from the univariate normal distribution.

Step 4: Simulate the normalized intra-event residuals ($\varepsilon_{ij}$'s) using the parameters predicted by the ground-motion model. Chapter 2 [Jayaram and Baker, 2008] showed that a vector of spatially-distributed normalized intra-event residuals $\boldsymbol{\varepsilon}_j = (\varepsilon_{1j}, \varepsilon_{2j}, \dots, \varepsilon_{pj})$ follows a multivariate normal distribution. Hence, the distribution of $\boldsymbol{\varepsilon}_j$ can be completely defined using the mean (zero) and standard deviation (one) of $\varepsilon_{ij}$, and the correlation between all $\varepsilon_{i_1 j}$ and $\varepsilon_{i_2 j}$ pairs. The correlations between the residuals can be obtained from a predictive model calibrated using past ground-motion intensity observations [Jayaram and Baker, 2009a, Wang and Takada, 2005].

Step 5: Combine the median intensities, the normalized intra-event residuals and the normalized inter-event residual for each earthquake in accordance with Equation 7.1 to obtain ground-motion intensity maps (i.e., obtain $\mathbf{Sa}_j = (Sa_{1j}, Sa_{2j}, \dots, Sa_{pj})$).
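Steps 2 to 5 can be sketched for a single earthquake as follows. This is a minimal illustration, not the implementation used in this work: the site coordinates, median intensities and standard deviations are made-up numbers standing in for ground-motion model output, Step 1 (sampling the earthquake itself) is omitted, and the exponential correlation function with a 20 km range is an assumed stand-in for a calibrated spatial correlation model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical inputs for one earthquake j at p sites (Step 2 would normally
# obtain these from a ground-motion model).
p = 5
coords_km = rng.uniform(0.0, 50.0, size=(p, 2))   # site coordinates (km)
ln_median_sa = np.log(np.full(p, 0.2))            # ln of median Sa (g)
sigma = np.full(p, 0.55)                          # intra-event standard deviation
tau = np.full(p, 0.30)                            # inter-event standard deviation

# Assumed correlation model for the intra-event residuals: rho(h) = exp(-3h / range).
corr_range_km = 20.0
sep = np.linalg.norm(coords_km[:, None, :] - coords_km[None, :, :], axis=2)
corr = np.exp(-3.0 * sep / corr_range_km)

def simulate_map(rng):
    """Return one ground-motion intensity map (Sa at the p sites)."""
    eta = rng.standard_normal()                           # Step 3: one inter-event residual
    eps = rng.multivariate_normal(np.zeros(p), corr)      # Step 4: correlated intra-event residuals
    ln_sa = ln_median_sa + sigma * eps + tau * eta        # Step 5: Equation 7.1
    return np.exp(ln_sa)

maps = np.array([simulate_map(rng) for _ in range(1000)])
print(maps.shape)
```

Note that a single inter-event residual is shared by all sites in a map, while the intra-event residuals differ from site to site but are spatially correlated.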

7.3.2

Importance sampling of ground-motion maps

Most of the past research works use random MCS (based on the original distributions of magnitudes and residuals) for simulating ground-motion maps (with the notable exception of Kiremidjian et al. [2007] who used importance sampling for magnitudes). While small magnitude earthquakes and average values of residuals are highly probable, they are less interesting for risk assessment purposes, where we are interested in large values of these random variables. Hence, Chapter 6 proposed to sample these random variables preferentially from the tails of their distributions by sampling from alternate distributions.

Figure 7.2: (a) Stratified sampling of earthquake magnitudes (b) Importance sampling of residuals.

The magnitudes were simulated using stratified sampling, where the entire range of magnitudes was stratified into bins, with the bin width being large at small magnitudes and small at large magnitudes (Figure 7.2a), and one magnitude was selected from each bin. This ensures an adequate sampling of large magnitude events. The residuals were sampled from a multivariate normal distribution with positive means for the residuals rather than zero means (Figure 7.2b) (in order to sample large values of residuals). Overall, the large magnitude events combined with large positive residuals lead to large values of ground-motion intensities in the sampled maps. It was seen that the importance sampling procedure results in two orders of magnitude reduction in the number of samples needed for the risk assessment.
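The effect of sampling residuals from a distribution with a positive mean can be illustrated on a single normalized residual. The sketch below is a simplified, univariate illustration (the work itself samples a multivariate vector of residuals); the shift of 3, the threshold of 3 and the sample size are arbitrary choices made for this example. Each sample is weighted by the ratio of the original density to the sampling density so that the estimate remains unbiased.

```python
import math
import numpy as np

rng = np.random.default_rng(7)

# Illustrative values: estimate P(eps > 3) for a standard normal residual eps.
n, shift, threshold = 20_000, 3.0, 3.0

# Sample from the alternate density g = N(shift, 1) instead of f = N(0, 1).
x = rng.normal(loc=shift, scale=1.0, size=n)

# Importance sampling weight f(x) / g(x) = exp(-shift * x + shift^2 / 2).
weights = np.exp(-shift * x + 0.5 * shift**2)
p_is = np.mean(weights * (x > threshold))

# Reference value from the standard normal tail, 1 - Phi(threshold).
p_exact = 0.5 * math.erfc(threshold / math.sqrt(2.0))
print(p_is, p_exact)
```

With conventional sampling from N(0, 1), only about one sample in 740 would exceed the threshold, whereas about half of the shifted samples do, which is the source of the efficiency gain.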

7.3.3

K-means clustering

The use of importance sampling causes significant improvement in the computational efficiency of the simulation procedure, but the number of required IS intensity maps is still large and may pose a heavy computational burden. The K-means clustering [McQueen, 1967] was used in Chapter 6 as a data reduction technique in order to develop a smaller catalog of maps by ‘clustering’ simulated ground-motion intensity maps with similar properties (i.e., similar spectral acceleration values at the sites of interest), and subsequently using only one map from each cluster. The clustering was performed using the K-means


algorithm, which groups a set of observations into K clusters such that the dissimilarity between the observations (typically measured by the Euclidean distance) within a cluster is minimized [McQueen, 1967]. In its simplest version, the K-means algorithm comprises the following four steps:

Step 1: Pick K maps to denote the initial cluster centroids. (This selection can be done randomly.)

Step 2: Assign each map to the cluster with the closest centroid.

Step 3: Recalculate the centroid of each cluster after the assignments.

Step 4: Repeat steps 2 and 3 until no more reassignments take place.

For instance, Figure 7.3 shows four simulated ground-motion maps, two of which can be grouped together due to their similarity. Once all the maps are clustered, the final catalog is developed by selecting one map from each cluster, which is used to represent all maps in that cluster on account of the similarity of the maps within a cluster. In other words, if the map selected from a cluster produces loss l, it is assumed that all other maps in the cluster produce the same loss l (by virtue of similarity). The maps in this smaller catalog can be used in place of the maps generated using importance sampling for the loss assessment, which results in a dramatic improvement in the computational efficiency.

Both the importance sampling and the K-means clustering methods make the final set of maps unequiprobable (i.e., the maps are not all equally likely). Hence, suitable weights (e.g., importance sampling weights) are attributed to these maps so that risk estimates obtained using these maps are unbiased. The details of these weight calculations and a proof of unbiasedness can be found in Chapter 6.
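The four steps and the subsequent catalog selection can be sketched as follows. This is an illustrative sketch under stated assumptions: the maps and importance sampling weights are synthetic, the function names are hypothetical, Euclidean distance is used, and the representative of each cluster is assumed to be drawn with probability proportional to its weight and to carry the total weight of its cluster (consistent with Equation 6.30).

```python
import numpy as np

def kmeans(maps, k, rng, max_iter=200):
    """Simplest version of K-means; returns the cluster label of each map."""
    # Step 1: pick K maps at random as the initial centroids.
    centroids = maps[rng.choice(len(maps), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each map to the cluster with the closest centroid.
        dist = np.linalg.norm(maps[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        # Step 4: stop once no map changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its member maps.
        for c in range(k):
            members = labels == c
            if members.any():
                centroids[c] = maps[members].mean(axis=0)
    return labels

def build_catalog(labels, is_weights, rng):
    """Pick one representative per cluster; it carries the cluster's total weight."""
    picks, cluster_weights = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        w = is_weights[idx]
        picks.append(rng.choice(idx, p=w / w.sum()))
        cluster_weights.append(w.sum())
    return np.array(picks), np.array(cluster_weights)

rng = np.random.default_rng(3)
maps = rng.lognormal(-1.0, 0.5, size=(500, 10))   # synthetic intensity maps
is_weights = rng.uniform(0.5, 1.5, size=500)      # synthetic IS weights
labels = kmeans(maps, k=15, rng=rng)
picks, cluster_weights = build_catalog(labels, is_weights, rng)
print(len(picks), "maps in the reduced catalog")
```

The reduced catalog preserves the total weight of the original set, so probabilities computed from it remain on the same scale as those computed from all the importance-sampled maps.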

7.4

Confidence intervals for lifeline risk estimates

Chapter 6 used the catalog of maps generated using IS and K-means (described above) to obtain the travel-time delay exceedance curve (i.e., rates of exceedance of various travel-time delays) for the San Francisco Bay Area transportation network. In this work, it is of interest to obtain the confidence intervals for the exceedance rates in a computationally efficient manner.


Figure 7.3: Four simulated ground-motion maps, two of which are reasonably similar and grouped together into one cluster.


Figure 7.4: (a) The San Francisco Bay Area transportation network (b) Aggregated model.

7.4.1

Network data

This section describes the properties of the San Francisco Bay Area transportation network used as the sample lifeline in this work. The relevant network data were obtained from Stergiou and Kiremidjian [2006]. Figure 7.4a shows the Metropolitan Transportation Commission (MTC) San Francisco Bay Area highway network, which consists of 29,804 links (roads) and 10,647 nodes. The network also consists of 1,125 bridges from the five counties of the Bay Area. The traffic demand-supply data were obtained from the 1990 MTC household survey [Purvis, 1999]. Analyzing the performance of a network as large and complete as the San Francisco Bay Area transportation network under maps generated, in particular, by conventional MCS is extremely computationally intensive. Therefore, an aggregated representation of the Bay Area network is used for this example application. The aggregated network consists predominantly of freeways and expressways, along with the ramps linking the freeways and expressways. The nodes are placed at locations where links intersect or change in characteristics (e.g., change in the number of lanes). The aggregated network comprises 586 links and 310 nodes (Figure 7.4b). While the performance of the aggregated network


may or may not be similar to that of the full network, the aggregated network should serve as a reasonably realistic and complex test case for the proposed framework. If desired, the methods developed here can be applied to the complete network as well.

7.4.2

Ground-motion hazard data

The San Francisco Bay Area seismicity information is obtained from USGS [2003]. Ten active faults and fault segments are considered in the current work. The characteristic magnitude-recurrence relationship of Youngs and Coppersmith [1985] is used to model the density function for magnitudes with the distribution parameters specified by the USGS. The ground-motion model of Boore and Atkinson [2008] is used to obtain the median ground-motion intensities and the standard deviations of the residuals needed in Equation 7.1.

7.4.3

Statistical description of the problem

Let $\mathbf{X}$ denote the ground-motion intensities at all the sites of interest in one ground-motion map. The number of sites equals 1,125 (the number of bridges in the network) and hence, $\mathbf{X}$ is a reasonably large-dimensional vector. Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m$ denote various importance sampling realizations of $\mathbf{X}$. Assume that these realizations are segmented using K-means clustering into $K$ clusters. (The clustering attempts to minimize the sum of the Euclidean distances between the vectors in the clusters and the cluster centroids, as described in Section 6.5 of this thesis.) The lifeline losses are then computed using just one map sampled from each cluster, $\mathbf{x}_{(1)}, \mathbf{x}_{(2)}, \dots, \mathbf{x}_{(K)}$, in place of the $m$ original samples. The loss estimates are appropriately weighted (weights are denoted by $w_{(i)}$) in order to ensure statistical consistency. The complete details about the weights can be found in Chapter 6.

It is of interest to empirically estimate the exceedance curve of a loss function $L$ (e.g., travel-time delay), and the corresponding confidence interval (CI). The rate of exceedance of a loss value equals the rate of occurrence of earthquakes multiplied by the probability of exceedance of the loss value ($P(L > l)$). The probability of exceedance can be estimated as follows:

\[
\hat{P}(L > l) = \sum_{i=1}^{K} I\left[L\left(\mathbf{x}_{(i)}\right) > l\right] w_{(i)} \tag{7.2}
\]

where $I[\cdot]$ is an indicator variable and the $w$'s are the weights referred to earlier. A sample exceedance curve (which provides the rate of observing various levels of travel-time delays on the aggregated transportation network) is shown in Figure 7.5. The loss function $L(\mathbf{x})$ used in this case is the travel-time delay induced by ground-motion map $\mathbf{x}$. (The structural damage to the bridges increases the free-flow travel times in the roads, and increases the overall travel time in the network.) The network delays are computed using the static user-equilibrium framework [Frank and Wolfe, 1956]. This study intends to obtain a point-wise CI for the exceedance rates of losses.

Figure 7.5: Exceedance rates of travel-time delays.
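As a minimal sketch of Equation 7.2, the function below evaluates the weighted exceedance probability at several loss levels and converts it to an exceedance rate. The delays, weights and earthquake occurrence rate are made-up numbers, and the weights are assumed here to be normalized so that they sum to one.

```python
import numpy as np

def exceedance_probability(losses, weights, thresholds):
    """Equation 7.2: sum of the weights of the maps whose loss exceeds each threshold."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    exceeds = losses[None, :] > thresholds[:, None]     # indicator I[L(x_(i)) > l]
    return (exceeds * weights[None, :]).sum(axis=1)

# Hypothetical catalog of K = 4 maps.
losses = np.array([0.0, 10.0, 50.0, 200.0])      # travel-time delays L(x_(i))
weights = np.array([0.70, 0.20, 0.08, 0.02])     # w_(i), assumed to sum to one
thresholds = np.array([5.0, 100.0])              # loss levels l

prob = exceedance_probability(losses, weights, thresholds)

# Rate of exceedance = rate of occurrence of earthquakes x probability of exceedance.
earthquake_rate = 0.05                           # hypothetical annual rate
annual_rate = earthquake_rate * prob
print(prob, annual_rate)
```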

7.4.4

Confidence intervals using bootstrap

The confidence intervals (CI) for the risk estimates can be obtained by repeating the entire risk assessment process several times in order to obtain multiple exceedance curves, and by estimating the CIs as the quantiles of these exceedance curves. In other words, this procedure involves repeating the IS and the K-means clustering procedures multiple times to obtain multiple catalogs of 150 ground-motion maps each. Each catalog is used to obtain one exceedance curve, and the CIs are estimated as the quantiles of this set of exceedance


curves. Applying IS multiple times can be computationally inefficient; the current work therefore uses bootstrap resampling to simplify this procedure. For simplicity, denote the original set of importance-sampled maps, $(\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_m)$, as $\tilde{x}$. The first step in the procedure is to obtain $B$ bootstrap realizations of $\tilde{x}$ (denoted $\tilde{x}^{*b}$ for $b \in [1, B]$). A bootstrap realization of $\tilde{x}$ is a set of maps sampled with replacement from $(\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_m)$ [Efron and Tibshirani, 1997]. (In other words, the sets $\tilde{x}^{*b}$ are obtained by bootstrapping the original set, rather than by resampling using IS.) The second step is to cluster each $\tilde{x}^{*b}$ into 150 clusters, pick one map from each cluster, and compute $\hat{\theta}^{*}_{(b)}(l) = \hat{P}\left[L(\tilde{x}^{*}_{(b)}) > l\right]$ (where $\tilde{x}^{*}_{(b)}$ denotes the 150 maps obtained after clustering $\tilde{x}^{*b}$ and selecting one map from each cluster), for all $b$ and all $l$ values of interest. The collection of $\hat{\theta}^{*}_{(b)}(l)$'s at all values of $l$ denotes the probability of exceedance curve obtained using the bootstrapped and clustered set of ground-motion maps $\tilde{x}^{*}_{(b)}$. The point-wise bootstrap confidence interval is then estimated as the quantiles of the replicates (i.e., the $\hat{\theta}^{*}_{(b)}(l)$'s) for each value of $l$ [Davison and Hinkley, 1997]. In essence, this procedure involves repeating (using bootstrap) the simulation procedure several times, and obtaining the confidence intervals using quantiles of the collection of exceedance curves obtained.

The biggest hurdle in the above procedure is the computation of $\hat{P}\left[L(\tilde{x}^{*}_{(b)}) > l\right]$ $B$ times, given that it is computationally intensive to estimate this even once (which is the reason why the importance sampling and the clustering are used in the first place). This is not the case for the aggregated network used in this study, but is certainly true for real-life networks. Hence, the intent is to use an approximate loss estimate obtained from a non-parametric regression between the lifeline loss ($L$) and the ground-motion intensities ($\tilde{x}^{*b}$). This approximate loss function is used in place of the exact loss function for evaluating the $B$ values of $\hat{P}\left[L(\tilde{x}^{*}_{(b)}) > l\right]$. The procedure used for obtaining the approximate loss function is described in the next section.
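The point-wise bootstrap interval described above can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes equally weighted loss values (the importance-sampling weights and the K-means clustering step are omitted), and the function names `exceedance_curve`, `quantile` and `bootstrap_ci` are hypothetical.

```python
import random


def exceedance_curve(losses, thresholds):
    """Fraction of losses exceeding each threshold l, an estimate of P[L > l]."""
    n = len(losses)
    return [sum(1 for x in losses if x > l) / n for l in thresholds]


def quantile(sorted_vals, q):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_ci(losses, thresholds, n_boot=1000, level=0.95, seed=0):
    """Point-wise percentile CI of the exceedance curve from n_boot resamples."""
    rng = random.Random(seed)
    m = len(losses)
    curves = []
    for _ in range(n_boot):
        # One bootstrap realization: sample m losses with replacement.
        resample = [losses[rng.randrange(m)] for _ in range(m)]
        curves.append(exceedance_curve(resample, thresholds))
    alpha = (1.0 - level) / 2.0
    lower, upper = [], []
    for k in range(len(thresholds)):
        # Quantiles of the replicates at this loss level.
        replicates = sorted(curve[k] for curve in curves)
        lower.append(quantile(replicates, alpha))
        upper.append(quantile(replicates, 1.0 - alpha))
    return lower, upper
```

In the procedure of this section, each resample would additionally be clustered and reduced to 150 maps before the exceedance curve is computed.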


7.4.5

Approximate loss estimation using non-parametric regression

Application of MART to loss estimation

Multiple additive regression trees (MART) is a methodology for predictive data mining (regression and classification). For a set of input ground-motion maps $\mathbf{x} \in \tilde{x}$ and corresponding loss values $L$, the goal is to find a function $F(\mathbf{x})$ that maps $\mathbf{x}$ to $L$, such that over the joint distribution of all input-loss pairs, the expected value of the squared prediction error is minimized. MART is a gradient boosting algorithm [Friedman, 1999] that expresses this function $F$ as an additive expansion of the form

$$\hat{L} = F(\mathbf{x}) = \sum_{p=0}^{P} \beta_p \, h(\mathbf{x}; \mathbf{a}_p) \tag{7.3}$$

where $\hat{L}$ denotes the predicted loss value, and the functions $h(\mathbf{x}; \mathbf{a}_p)$ are called 'base learners', which are functions of $\mathbf{x}$ with parameters $\mathbf{a}_p$. In the case of MART, the base learners are regression trees [Breiman et al., 1983]. It is advantageous to use MART over other regression techniques for approximating the loss function for the following reasons: (a) there are considerably more input variables (1,125) than data points (150) and hence, it is infeasible to use classical regression for this purpose (regularized regression would be required), (b) using a non-parametric model allows for quicker model fitting, (c) MART is capable of modeling highly nonlinear behavior, and (d) MART is resistant to moderate to heavy contamination by bad measurements (outliers) of the predictors and/or the responses, to missing values, and to the inclusion of potentially large numbers of irrelevant predictor variables that have little or no effect on the response [Friedman, 2002].

The MART prediction model is developed based on the 150 intensity maps obtained using the IS and the K-means approaches. Figure 7.6a shows the comparison of the predicted losses and the exact losses for a cross-validation set of maps. It is to be noted that the cross-validation set is chosen to be different from the training set used to develop the model, in order to obtain an unbiased estimate of the accuracy of the model. The overall prediction accuracy is quite reasonable, but the predictions show small biases. The plot of residuals (computed as the predicted delay ($\hat{L}$) minus the exact delay ($L$)) versus predicted loss values (Figure 7.6b) shows that small losses are slightly over-predicted, while large losses are substantially under-predicted. In order to not adversely affect the prediction accuracy, a bias correction needs to be applied to the predictions from the MART model. The next subsection describes the bias correction procedure used in the current work.

Figure 7.6: (a) Predicted vs. exact delay values (b) Prediction residuals.

Bias correction using LOESS

The bias correction procedure involves estimating the residual (bias) as a function of the predicted delay, and subtracting it from the predicted value. The residual is fit as a function of the predicted delay using locally weighted scatterplot smoothing (LOESS) [Efron and Tibshirani, 1997], as shown in Figure 7.7a. As expected from the previous comparisons of exact and predicted losses, the residual is positive for small loss values and negative for large loss values, and the LOESS fit captures this effect well. The corrected loss predictions are obtained by subtracting the residual (provided by LOESS) from the MART loss prediction. A comparison of these corrected predictions against the exact values is shown in Figure 7.7b. The figure shows a significantly better match between the exact and the predicted values. For further validation, the loss exceedance curves are estimated using the exact and the approximate loss functions (after bias correction), and are shown in Figure 7.8. The figure shows a very good match between the two curves, illustrating the accuracy of the loss prediction model developed.
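The bias correction step can be sketched as follows. This is an illustrative stand-in rather than the LOESS implementation used in the study: it assumes a simple local linear smoother with tricube weights, and the names `local_linear_fit` and `bias_corrected` as well as the span fraction `frac` are hypothetical.

```python
def local_linear_fit(x0, xs, ys, frac=0.5):
    """Locally weighted linear fit of ys on xs, evaluated at x0 (tricube weights)."""
    n = len(xs)
    k = max(2, min(n, int(round(frac * n))))   # number of neighbours in the window
    dist = [abs(x - x0) for x in xs]
    h = sorted(dist)[k - 1]                    # bandwidth: k-th nearest distance
    w = [(1.0 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
    if sum(w) == 0.0:
        # Degenerate ties (all neighbours at distance h): use equal weights.
        w = [1.0 if d <= h else 0.0 for d in dist]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return my + slope * (x0 - mx)


def bias_corrected(pred_new, pred_cv, exact_cv, frac=0.5):
    """Subtract the smoothed residual (predicted minus exact) from a new prediction."""
    resid = [p - e for p, e in zip(pred_cv, exact_cv)]
    return pred_new - local_linear_fit(pred_new, pred_cv, resid, frac)
```

Here `pred_cv` and `exact_cv` play the role of the predicted and exact delays of the cross-validation maps, and `pred_new` is a new MART prediction to be corrected.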


Figure 7.7: (a) A LOESS fit to the prediction residuals (b) Predicted and exact delay values after bias correction.


Figure 7.8: Two sample exceedance curves obtained using the exact and the approximate loss functions (after bias correction).

Study of residuals

The predictions from MART and LOESS are not exact, as evidenced by the scatter around the predictions. Figure 7.9a shows the plot of residuals (i.e., observed value minus predicted value) versus the predicted loss values. While using the predictive model, it is important to appropriately account for this variability, particularly while estimating confidence intervals, since the smoothed predictions (obtained when the residuals are ignored) will result in an underestimation of the variance and the width of the CI of the risk estimates. Figure 7.9a shows that the residuals are heteroscedastic (i.e., the standard deviation of the residuals varies with the predicted value), and that the standard deviation of the residuals increases linearly with the predicted loss. In order to model these residuals, they are first normalized by the predicted losses (i.e., the residuals are divided by the predicted losses); these normalized residuals, shown in Figure 7.9b, are seen to be homoscedastic. A normal Q-Q plot of these normalized residuals, shown in Figure 7.10, indicates that the residuals can be reasonably assumed to follow a normal distribution (since the deviation from the 45° straight line is negligible). Further, the standard deviation of the normalized residuals is estimated to be 0.27 and hence, the residuals are modeled as follows:

$$\varepsilon \sim N\left(0,\; 0.27\,F(\mathbf{x})\right) \tag{7.4}$$

where $\varepsilon$ denotes the residual, $N(0, 0.27F(\mathbf{x}))$ denotes the normal distribution with mean 0 and standard deviation $0.27F(\mathbf{x})$, and $F(\mathbf{x})$ denotes the predicted loss for ground-motion map $\mathbf{x}$.

Figure 7.9: (a) Residuals from the prediction model (b) Residuals normalized (divided) by the predicted delays.

Figure 7.10: Normal Q-Q plot of the residuals.

Summary of the loss prediction procedure

For a given ground-motion map $\mathbf{x}$, the approximate loss is evaluated as follows:
(a) Use MART to obtain $\hat{F}_1(\mathbf{x})$, which is a biased estimate of the loss.
(b) Estimate the bias using the LOESS fit: $\hat{B}(\hat{F}_1(\mathbf{x}))$.
(c) Obtain the bias-corrected prediction: $\hat{F}_2(\mathbf{x}) = \hat{F}_1(\mathbf{x}) - \hat{B}(\hat{F}_1(\mathbf{x}))$.
(d) Simulate a residual ($e$) from the univariate normal distribution $N(0, 0.27\hat{F}_2(\mathbf{x}))$.
(e) Obtain the final estimate of the loss: $\hat{F}(\mathbf{x}) = \hat{F}_2(\mathbf{x}) + e$.

Discussion: Importance of data selection for training the MART model

This section illustrates why a reasonably good MART fit is obtained despite using only 150 training samples. The good fit is primarily because the importance sampling and the K-means clustering procedures that are used for selecting the training catalog of 150 maps select highly dissimilar maps that cover almost all the intensity values (even rare intensities) of interest to the decision maker. In other words, the 150 maps are fairly representative of the ground-motion hazard in the region. This does not happen, for instance, when the maps are selected using random MCS. In order to illustrate this, 150 maps were sampled using random MCS and used to fit a MART model. The comparison between the exact and the predicted losses from this new MART model for the cross-validation set used earlier in Section 7.4.5 is shown in Figure 7.11. The random MCS method samples a large number of ground-motion maps with small but frequently-observed ground-motion intensities that correspond to very small travel-time delays. As a result, the model performs very poorly while predicting the losses due to the large-intensity maps that are present in the cross-validation set but not in the training set.
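The five-step loss prediction procedure summarized earlier in this section can be sketched as below. The callables `mart_predict` and `loess_bias` are hypothetical placeholders for the fitted MART model and the LOESS bias curve; the absolute value on the standard deviation is a guard added for this sketch and is not part of the procedure in the text.

```python
import random


def predict_loss(x, mart_predict, loess_bias, cov=0.27, rng=random):
    """Approximate loss for map x following steps (a) to (e)."""
    f1 = mart_predict(x)                 # (a) biased MART estimate
    bias = loess_bias(f1)                # (b) bias from the LOESS fit
    f2 = f1 - bias                       # (c) bias-corrected prediction
    e = rng.gauss(0.0, cov * abs(f2))    # (d) residual with std. dev. 0.27 * f2
    return f2 + e                        # (e) final loss estimate
```

Passing a seeded `random.Random` instance as `rng` makes the simulated residuals reproducible; setting `cov=0.0` returns the smoothed prediction that ignores the residuals.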


Figure 7.11: MART model fitted using 150 MCS maps.

7.4.6

Bootstrap confidence intervals estimated using the exact and the approximate loss functions

Results and discussion

Bootstrap confidence intervals are estimated for the travel-time delay exceedance curve using the procedure described in Section 7.4.4. In summary, the maps obtained using importance sampling (12,500 importance-sampled maps were used in Chapter 6) are first bootstrapped (sampled with replacement) to obtain 1000 sets of 12,500 maps each (denoted as $\tilde{x}^{*b}$ in Section 7.4.4). Each of these 1000 sets is then segmented into 150 clusters using the K-means clustering procedure. 150 maps are drawn (one from each cluster) from each set (denoted as $\tilde{x}^{*}_{(b)}$ in Section 7.4.4), and are used for obtaining the exceedance curve for that set (denoted as $L(\tilde{x}^{*}_{(b)})$ in Section 7.4.4). Figures 7.13a and 7.13b show the 1000 exceedance curves obtained using the exact and the approximate (MART+LOESS) loss functions, respectively. The point-wise CIs for these curves are estimated as the quantiles of the 1000 loss curves at each loss level. This procedure is summarized in Figure 7.12. Figure 7.14a shows the CIs obtained using the exact and the approximate loss functions.


Figure 7.12: Methodology for estimating bootstrap confidence intervals for the loss curves.

Figure 7.13: 1000 bootstrapped exceedance curves obtained using the (a) exact loss function (b) approximate loss function.


Figure 7.14: Bootstrap confidence intervals.

The two CIs in Figure 7.14a do not match perfectly, but are reasonably close to one another. Given that it will be computationally almost impossible to estimate the CI using the exact loss function in practice, the CI obtained using MART+LOESS is a reasonable substitute. It was mentioned earlier that it is important to model the residuals in the MART+LOESS predictions in order to obtain accurate CIs. This is illustrated by Figure 7.14b, which shows the CIs obtained using the exact and the approximate loss functions, but without accounting for the residuals. The CI estimated using the predicted losses is considerably narrower than the CI estimated using the exact losses, indicating that the 'smoothed' prediction results in an underestimation of the variance of the risk estimates. (The additional jaggedness seen is due to the use of only 200 bootstrap samples while estimating the approximate CI.)

Sensitivity to the number of bootstrap samples

Let $B$ denote the number of bootstrap samples used for estimating the CI. Efron and Tibshirani [1997] recommend a $B$ value of 1000 for obtaining a robust CI. Figure 7.15 shows the CIs obtained using 20, 200 and 1000 bootstrap samples. It is seen that the CIs obtained using 20 and 200 samples are highly jagged. It is also seen (though not illustrated by Figure 7.15) that the CIs obtained using only 20 or 200 samples vary from one computation to another. Overall, the evidence supports using a $B$ value of 1000 for computing the CI.

Figure 7.15: Bootstrap confidence intervals.

Balanced bootstrap confidence interval

One of the techniques that can be adopted to reduce the number of bootstrap samples is the balanced bootstrap method [Davison et al., 1986]. This method involves simulating bootstrap samples such that each sample observation is used equally often. It has been seen in past works that balanced bootstrap improves on ordinary uniform resampling when employed to estimate distribution functions or quantiles [Hall, 2005], and hence is relevant while estimating CIs. In this study, 200 balanced bootstrap samples were generated (each with 12,500 maps) and used for estimating 200 exceedance curves. The point-wise confidence interval obtained from these curves is shown in Figure 7.16. It can be seen that this CI is less jagged than that obtained using 200 uniform bootstrap samples (though still not as good as the CI obtained using 1000 uniform samples).
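One common way to generate balanced bootstrap samples is to concatenate $B$ copies of the observation indices, permute the pooled list, and cut it into $B$ samples. The sketch below follows that construction; the function name and the seed argument are illustrative, and the study's own implementation may differ.

```python
import random


def balanced_bootstrap_indices(m, n_boot, seed=0):
    """Return n_boot index samples of size m in which every index
    0..m-1 appears exactly n_boot times across all samples."""
    rng = random.Random(seed)
    pool = list(range(m)) * n_boot      # each observation repeated n_boot times
    rng.shuffle(pool)                   # random permutation of the pooled indices
    return [pool[b * m:(b + 1) * m] for b in range(n_boot)]
```

Each returned list indexes one bootstrap realization of the map catalog; an individual realization can still contain repeated maps, but the overall usage of every map is equal.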


Figure 7.16: Balanced bootstrap confidence intervals.

7.5

Conclusions

The current study focused on the computationally demanding task of estimating confidence intervals for lifeline risk estimates. Estimating the confidence intervals is computationally intensive because it requires repeated risk calculations (in order to estimate a variance for the risk estimates), each of which involves numerous lifeline performance evaluations. In order to reduce the computational demand, the stochastically-representative catalog of ground-motion maps generated in Chapter 6 using importance sampling and K-means clustering was used in conjunction with a statistical learning technique called Multiple Additive Regression Trees (MART) to develop an approximate relationship between the lifeline performance and the ground-motion intensities during an earthquake. Prediction biases from the model were modeled using Locally Weighted Scatterplot Smoothing (LOESS), and were subtracted from the predictions to obtain unbiased performance estimates. The lifeline performances predicted by the combination of MART and LOESS were used in place of the exact lifeline performances (the evaluation of which is computationally intensive) to expedite the computation of the confidence intervals. It was seen that the exceedance curves and their confidence intervals obtained using the exact and the approximate performance measures match well.

Chapter 8

Seismic risk assessment of spatially-distributed systems using ground-motion models fitted considering spatial correlation

N. Jayaram and J.W. Baker (2010). Considering spatial correlation in mixed-effects regression, and impact on ground-motion models, Bulletin of the Seismological Society of America (in review).

8.1

Abstract

Ground-motion models are commonly used in earthquake engineering to predict the probability distribution of the ground-motion intensity at a given site due to a particular earthquake event. These models are often built using regression on observed ground-motion intensities, and are fitted using either the one-stage mixed-effects regression algorithm proposed by Abrahamson and Youngs [1992] or the two-stage algorithm of Joyner and Boore [1993]. In their current forms, these algorithms ignore the spatial correlation between intra-event residuals. This chapter theoretically motivates the importance of considering spatial correlation while fitting ground-motion models and proposes an extension to the Abrahamson and Youngs [1992] algorithm that allows the consideration of spatial correlation. By refitting the Campbell and Bozorgnia [2008] ground-motion model using the mixed-effects regression algorithm considering spatial correlation, it is seen that the variance of the total residuals and the ground-motion model coefficients used for predicting the median ground-motion intensity are not significantly different from the published values even after the incorporation of spatial correlation. It is, however, seen that there is an increase in the variance of the intra-event residual and a significant decrease in the variance of the inter-event residual. These changes have implications for risk assessments of spatially-distributed systems, because a smaller inter-event residual variance implies a lower likelihood of observing large ground-motion intensities at all sites in a region. An example risk assessment is performed on a hypothetical portfolio of buildings to demonstrate that neglecting the proposed refinement causes an overestimation of the recurrence rates of large losses.

8.2

Introduction

Ground-motion models are commonly used in earthquake engineering to predict the probability distribution of the ground-motion intensity at a given site due to a particular earthquake event. Typically, a ground-motion model takes the following form:

$$\ln Y_{ij} = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon_{ij} + \eta_i \tag{8.1}$$

where $Y_{ij}$ denotes the ground-motion intensity parameter of interest (e.g., $S_a(T)$, the spectral acceleration at period $T$) at site $j$ during earthquake $i$; $f(\mathbf{P}_{ij}, \boldsymbol{\theta})$ denotes the ground-motion prediction function with predictive parameters $\mathbf{P}_{ij}$ (e.g., magnitude, distance of source from site, site condition) and coefficient set $\boldsymbol{\theta}$; $\varepsilon_{ij}$ denotes the intra-event residual, which is a zero-mean random variable with standard deviation $\sigma_{ij}$; $\eta_i$ denotes the inter-event residual, which is a random variable with zero mean and standard deviation $\tau_{ij}$. The rest of this chapter assumes for simplicity that the residuals have a constant $\sigma$ (i.e., $\sigma_{ij} = \sigma$) and $\tau$ (i.e., $\tau_{ij} = \tau$) for any given ground-motion intensity parameter (i.e., the residuals are homoscedastic). This assumption is not true in some modern models [e.g., Abrahamson and Silva, 2008], in which case the concepts remain the same, but some of the equations are no longer directly applicable.

Ground-motion models are primarily fitted using two approaches: the two-stage regression algorithm of Joyner and Boore [1993] [e.g., Boore and Atkinson, 2008] and the one-stage mixed-effects model regression algorithm of Abrahamson and Youngs [1992] [e.g., Abrahamson and Silva, 2008, Campbell and Bozorgnia, 2008, Chiou and Youngs, 2008]. Joyner and Boore [1993] provide a detailed comparison of these two algorithms. Both these algorithms, in their current forms, assume that the intra-event residuals are independent of each other. The intra-event residuals, however, are known to be spatially correlated [Boore et al., 2003, Wang and Takada, 2005, Goda and Hong, 2008, Jayaram and Baker, 2009a]. Recently, Hong et al. [2009] investigated the influence of including spatial correlation in the regression analysis on the ground-motion models fitted using the two-stage regression algorithm and a one-stage algorithm of Joyner and Boore [1993]. They concluded that the influence of considering spatial correlation on the estimated ground-motion models is negligible based on insignificant changes to the coefficient set $\boldsymbol{\theta}$. Fitting ground-motion models considering correlation does, however, change the variances of the inter-event and the intra-event residuals (as observed by Hong et al. [2009] themselves).

This chapter provides a theoretical basis for such changes to the variance terms, and also discusses the impact of these changes on the estimated seismic risk of spatially-distributed systems. Further, a modified algorithm based on that of Abrahamson and Youngs [1992] is developed that accounts for the spatial correlation in the mixed-effects regression. This modified algorithm is used to refit the Campbell and Bozorgnia [2008] ground-motion model in order to illustrate the impact of incorporating spatial correlation.

8.2.1

Current regression algorithm

Brillinger and Preisler [1984a,b] first proposed regressing a ground-motion model as a fixed-effects model. In this approach, the ground-motion model takes the following form:

$$\ln Y_{ij} = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon^{(t)}_{ij} \tag{8.2}$$

where $\varepsilon^{(t)}_{ij}$ denotes the total residual term at site $j$ during earthquake $i$. Abrahamson and Youngs [1992] (henceforth referred to as AY92) subsequently developed a more stable algorithm for the regression by treating the ground-motion model as a mixed-effects model. The mixed-effects model differs from the fixed-effects model in its consideration of the error term as being the sum of an intra-event error term and an inter-event error term (Equation 8.1). The inter-event term helps partially account for the correlation between the ground-motion intensities recorded during any particular earthquake. The AY92 algorithm uses a combination of a fixed-effects regression algorithm and a likelihood maximization approach, and is described below in more detail.

In the first step of the algorithm, it is assumed that the random-effects terms $\eta_1, \eta_2, \cdots, \eta_M$ equal zero, in which case Equation 8.1 simplifies to $\ln Y_{ij} = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon_{ij}$. The coefficient set $\boldsymbol{\theta}$ is then estimated based on the observed $Y_{ij}$'s using a fixed-effects regression algorithm. In the next step, the standard deviations $\sigma$ (for the intra-event residuals) and $\tau$ (for the inter-event residuals) are computed using the likelihood maximization approach described below.

The total residuals (i.e., the sum of the inter-event and the intra-event residuals), denoted $\varepsilon^{(t)}_{ij}$, can be computed using the $\boldsymbol{\theta}$ estimated in the previous step as follows:

$$\varepsilon^{(t)}_{ij} = \varepsilon_{ij} + \eta_i = \ln(Y_{ij}) - f(\mathbf{P}_{ij}, \boldsymbol{\theta}) \tag{8.3}$$

It is known that the total residuals follow a multivariate normal distribution [Jayaram and Baker, 2008], and hence, the likelihood ($L_1$) of having observed the set of total residuals $\boldsymbol{\varepsilon}^{(t)} = \left[\varepsilon^{(t)}_{ij}\right]$ can be estimated as follows:

$$\ln(L_1) = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln|\mathbf{C}| - \frac{1}{2}\,\boldsymbol{\varepsilon}^{(t)\prime}\,\mathbf{C}^{-1}\,\boldsymbol{\varepsilon}^{(t)} \tag{8.4}$$

where $N$ is the total number of data points, $\mathbf{C}$ is the covariance matrix of the total residuals and $\boldsymbol{\varepsilon}^{(t)\prime}$ denotes the transpose of $\boldsymbol{\varepsilon}^{(t)}$. While estimating the model coefficients, AY92 assume that the intra-event residuals are independent of each other and of the inter-event residuals. Hence, the covariance matrix $\mathbf{C}$ can be written as follows:

$$\mathbf{C} = \sigma^2 \mathbf{I}_N + \tau^2 \sum_{i=1}^{M}{}^{+}\; \mathbf{1}_{n_i,n_i} \tag{8.5}$$

where $\mathbf{I}_N$ is the identity matrix of size $N$ by $N$, $\mathbf{1}_{n_i,n_i}$ is a matrix of ones of size $n_i$ by $n_i$, $\Sigma^{+}$ indicates a direct sum operation (using the notation of AY92), $M$ is the number of earthquake events and $n_i$ is the number of recordings for the $i$th event. The matrix $\mathbf{C}$ can be expanded as follows:

$$\mathbf{C} = \begin{bmatrix} \sigma^2 \mathbf{I}_{n_1} + \tau^2 \mathbf{1}_{n_1,n_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \sigma^2 \mathbf{I}_{n_2} + \tau^2 \mathbf{1}_{n_2,n_2} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \sigma^2 \mathbf{I}_{n_M} + \tau^2 \mathbf{1}_{n_M,n_M} \end{bmatrix} \tag{8.6}$$
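Because the covariance matrix of Equations 8.5 and 8.6 is block diagonal with blocks of the form $\sigma^2 \mathbf{I} + \tau^2 \mathbf{1}$, the log-likelihood of Equation 8.4 can be evaluated event by event. The sketch below relies on two standard matrix identities that are not stated in the text: the determinant of each block is $\sigma^{2(n_i-1)}(\sigma^2 + n_i\tau^2)$, and its inverse is $\frac{1}{\sigma^2}\left[\mathbf{I} - \frac{\tau^2}{\sigma^2 + n_i\tau^2}\mathbf{1}\right]$. The function name and the input layout (one list of total residuals per earthquake) are illustrative.

```python
import math


def log_likelihood(residuals_by_event, sigma, tau):
    """ln(L1) of Eq. 8.4 for the block-diagonal covariance of Eq. 8.5."""
    s2, t2 = sigma ** 2, tau ** 2
    n_total = sum(len(r) for r in residuals_by_event)
    log_det = 0.0   # ln|C|, accumulated block by block
    quad = 0.0      # eps' C^{-1} eps, accumulated block by block
    for r in residuals_by_event:
        n = len(r)
        log_det += (n - 1) * math.log(s2) + math.log(s2 + n * t2)
        sum_sq = sum(e * e for e in r)
        total = sum(r)
        quad += (sum_sq - t2 * total ** 2 / (s2 + n * t2)) / s2
    return -0.5 * n_total * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad
```

In the AY92 algorithm this quantity would be maximized numerically over $\sigma$ and $\tau$ for fixed $\boldsymbol{\theta}$; any general-purpose optimizer could be wrapped around this function.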

The maximum likelihood estimates of $\sigma$ and $\tau$ are those that maximize the likelihood function $L_1$, and are obtained using numerical optimization. Now, for given $\boldsymbol{\theta}$ and the maximum likelihood estimates of $\sigma$ and $\tau$, the random-effects term $\eta_i$ is estimated using the maximum likelihood approach as well. The maximum likelihood estimate of $\eta_i$ is obtained as follows [Abrahamson and Youngs, 1992]:

$$\eta_i = \frac{\tau^2 \sum_{j=1}^{n_i} \varepsilon^{(t)}_{ij}}{n_i \tau^2 + \sigma^2} \tag{8.7}$$

Finally, using the estimated value of $\eta_i$, a new set of coefficients $\boldsymbol{\theta}$ is obtained using a fixed-effects algorithm for $\ln(Y_{ij}) - \eta_i$ (i.e., considering $\ln Y_{ij} - \eta_i = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon_{ij}$). The new set $\boldsymbol{\theta}$ is then used to reestimate $\sigma$, $\tau$ and $\eta$, and this iterative algorithm is continued until the coefficient estimates converge.

In summary, the steps of the mixed-effects algorithm used by AY92 are as follows:
1. Estimate the model coefficients $\boldsymbol{\theta}$ using a fixed-effects regression algorithm assuming $\eta$ equals 0.
2. Using $\boldsymbol{\theta}$, solve for the variances of the residuals, $\sigma^2$ and $\tau^2$, by maximizing the likelihood function described in Equation 8.4.
3. Given $\boldsymbol{\theta}$, $\sigma^2$ and $\tau^2$, estimate $\eta_i$ using Equation 8.7.
4. Given $\eta_i$, estimate new coefficients ($\boldsymbol{\theta}$) using a fixed-effects regression algorithm for $\ln(Y_{ij}) - \eta_i$.
5. Repeat steps 2, 3 and 4 until the likelihood in step 2 is maximized and the estimates for the coefficient set converge.

One drawback of this algorithm is the assumption in Equation 8.5 that the intra-event residuals are independent of each other. It is known that the intra-event residuals are spatially correlated, with the correlation decreasing with increasing separation distance between the residuals [e.g., Jayaram and Baker, 2009a]. Before addressing that issue, the need to account for the spatial correlation in the regression algorithm is illustrated in the next section.
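Step 3 of the AY92 algorithm (Equation 8.7) amounts to shrinking the sum of an event's total residuals toward zero. A minimal sketch follows, with a hypothetical function name and a plain list of total residuals for one earthquake as input.

```python
def inter_event_estimate(total_residuals, sigma, tau):
    """Estimate eta_i from the total residuals of one event using Eq. 8.7."""
    n_i = len(total_residuals)
    t2, s2 = tau ** 2, sigma ** 2
    return t2 * sum(total_residuals) / (n_i * t2 + s2)
```

For an event with many recordings, $n_i\tau^2$ dominates $\sigma^2$ and the estimate approaches the mean total residual; for sparsely recorded events the estimate is pulled toward zero.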

8.2.2

Should spatial correlation be considered in the regression algorithm?

Consider the hypothetical case where the correlation between the intra-event residuals at any two different sites is a constant equal to $\rho$. In this case, the covariance matrix ($\mathbf{C}$) of the total residuals ($\varepsilon^{(t)}_{ij}$) is defined by the following equations:

$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij'}\right) = \rho\sigma^2 + \tau^2 \quad \forall\; i,\; j \neq j' \tag{8.8a}$$
$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij}\right) = \sigma^2 + \tau^2 \quad \forall\; i, j \tag{8.8b}$$
$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{i'j'}\right) = 0 \quad \forall\; j, j',\; i \neq i' \tag{8.8c}$$

In summary, the covariance matrix for the total residuals can be expressed as follows:

$$\mathbf{C} = (1 - \rho)\sigma^2 \mathbf{I}_N + (\tau^2 + \rho\sigma^2) \sum_{i=1}^{M}{}^{+}\; \mathbf{1}_{n_i,n_i} \tag{8.9}$$

Denoting $\sqrt{1 - \rho}\,\sigma$ by $\sigma'$ and $\sqrt{\tau^2 + \rho\sigma^2}$ by $\tau'$, Equation 8.9 can be rewritten as

$$\mathbf{C} = \sigma'^2 \mathbf{I}_N + \tau'^2 \sum_{i=1}^{M}{}^{+}\; \mathbf{1}_{n_i,n_i} \tag{8.10}$$

Comparing the forms of Equations 8.5 and 8.10, it can be seen that the algorithm of AY92 actually provides the estimates of $\sigma'$ and $\tau'$ rather than $\sigma$ and $\tau$. (If spatial correlations are absent, this is correct since $\sigma' = \sigma$ and $\tau' = \tau$.) Assume for simplicity that the set of coefficients $\boldsymbol{\theta}$ is not affected by the spatial correlation (this assumption is relaxed subsequently). Hence, the 'correct' estimates of $\sigma$ and $\tau$ can be obtained from the $\sigma'$ and $\tau'$ provided by AY92 as follows:

$$\sigma = \frac{\sigma'}{\sqrt{1 - \rho}} \tag{8.11a}$$
$$\tau = \sqrt{\tau'^2 - \rho\sigma^2} \tag{8.11b}$$

It is to be noted from the above discussion and Equation 8.11 that, for positive $\rho$, assuming independent intra-event residuals will underestimate $\sigma$ and overestimate $\tau$. This has implications for lifeline risk assessments since a larger $\tau$ implies a higher likelihood of observing large ground-motion intensities throughout the region of interest. Thus, it is important to determine whether fitting the ground-motion equations while considering correlated intra-event residuals changes the estimates of $\sigma$ and $\tau$ significantly.
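The relationship between the apparent and the 'correct' standard deviations in this constant-correlation case can be checked numerically. The sketch below simply codes Equations 8.9 to 8.11; the function names and the example values in the note that follows are illustrative, not results from the study.

```python
import math


def apparent_std(sigma, tau, rho):
    """sigma' and tau' recovered by AY92 when the true constant correlation is rho."""
    sigma_p = math.sqrt(1.0 - rho) * sigma
    tau_p = math.sqrt(tau ** 2 + rho * sigma ** 2)
    return sigma_p, tau_p


def corrected_std(sigma_p, tau_p, rho):
    """Invert the mapping using Eq. 8.11a and 8.11b (requires rho < 1)."""
    sigma = sigma_p / math.sqrt(1.0 - rho)
    tau = math.sqrt(tau_p ** 2 - rho * sigma ** 2)
    return sigma, tau
```

For example, with assumed values $\sigma = 0.6$, $\tau = 0.3$ and $\rho = 0.5$, `apparent_std` returns a $\sigma'$ smaller than 0.6 and a $\tau'$ larger than 0.3, and `corrected_std` recovers the original pair.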

8.3

Regression algorithm for mixed-effects models considering spatial correlation

This section describes an algorithm for fitting the mixed-effects model while accounting for spatial correlation between intra-event residuals. The algorithm described here differs from that of AY92 in the estimation of the likelihood function $L_1$ (used in step 2) and in the computation of the inter-event residual $\eta_i$ (step 3). Both these changes are necessary to account for the spatial correlation between intra-event residuals in the regression algorithm.


8.3.1

Covariance matrix for the total residuals

The covariance matrix for the total residuals shown in Equation 8.5 is based on the assumption of independence between spatially-distributed intra-event residuals. The covariance matrix in the presence of spatial correlation is described below. Let $\rho(d_{jj'})$ denote the spatial correlation between intra-event residuals at two sites $j$ and $j'$ as a function of $d_{jj'}$, the separation distance between $j$ and $j'$. Then,

$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij'}\right) = C\left(\varepsilon_{ij} + \eta_i,\; \varepsilon_{ij'} + \eta_i\right) = \rho(d_{jj'})\sigma^2 + \tau^2 \quad \forall\; i, j, j' \tag{8.12a}$$
$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{i'j'}\right) = 0 \quad \forall\; j, j',\; i \neq i' \tag{8.12b}$$

8.3.2

Obtaining inter-event residuals from total residuals

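The maximum likelihood approach is typically used to estimate a constant but unknown parameter from observed data. The parameter $\eta_i$ that is of interest here, however, is a random variable in itself, and hence the authors use a Bayesian framework rather than the method of maximum likelihood to estimate $\eta_i$.

The prior distribution of $\eta_i$ is $N(0, \tau^2)$. Conditional on the knowledge of $\eta_i$, the $\varepsilon^{(t)}_{ij}$'s marginally follow a normal distribution with mean $\eta_i$ and variance $\sigma^2$ (since $\varepsilon^{(t)}_{ij} = \varepsilon_{ij} + \eta_i$). Also, the correlation coefficient between $\varepsilon^{(t)}_{ij}$ and $\varepsilon^{(t)}_{ij'}$ conditional on $\eta_i$ is given by $\rho(d_{jj'})$. In other words, the conditional covariance matrix ($\mathbf{C}_c$) for the total residuals can be expressed as follows:

$$C_c\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij'}\right) = \rho(d_{jj'})\sigma^2 \quad \forall\; i, j, j' \tag{8.13a}$$
$$C_c\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{i'j'}\right) = 0 \quad \forall\; j, j',\; i \neq i' \tag{8.13b}$$

Hence the joint density of $\boldsymbol{\varepsilon}^{(t)}_i = \left[\varepsilon^{(t)}_{i1}, \varepsilon^{(t)}_{i2}, \cdots, \varepsilon^{(t)}_{in_i}\right]$ and $\eta_i$ is expressed as follows:

$$f\left(\boldsymbol{\varepsilon}^{(t)}_i, \eta_i\right) = f\left(\boldsymbol{\varepsilon}^{(t)}_i \mid \eta_i\right) f(\eta_i) \propto \exp\left[-\frac{1}{2}\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)' \mathbf{C}_c^{-1} \left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)\right] \exp\left[-\frac{1}{2\tau^2}\eta_i^2\right] \tag{8.14}$$

where $\boldsymbol{\varepsilon}^{(t)}_i = \left[\varepsilon^{(t)}_{i1}, \varepsilon^{(t)}_{i2}, \cdots, \varepsilon^{(t)}_{in_i}\right]$ is the collection of total residuals at all the sites during earthquake $i$, $f(\cdot)$ denotes the probability density function, $\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)'$ denotes the transpose of $\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)$, and $\mathbf{1}_{n_i,1}$ denotes a column matrix of ones of length $n_i$.

Noting that $f\left(\boldsymbol{\varepsilon}^{(t)}_i, \eta_i\right) = f\left(\boldsymbol{\varepsilon}^{(t)}_i\right) f\left(\eta_i \mid \boldsymbol{\varepsilon}^{(t)}_i\right)$, one possible approach to identify the posterior distribution of $\eta_i$ given $\boldsymbol{\varepsilon}^{(t)}_i$ is to divide the joint density into a function of just $\boldsymbol{\varepsilon}^{(t)}_i$ and a function that also contains $\eta_i$. Let $Q\left(\boldsymbol{\varepsilon}^{(t)}_i\right)$ denote any generic function of only $\boldsymbol{\varepsilon}^{(t)}_i$ not containing $\eta_i$. Hence,

$$\begin{aligned} f\left(\boldsymbol{\varepsilon}^{(t)}_i, \eta_i\right) &\propto \exp\left[-\frac{1}{2}\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)' \mathbf{C}_c^{-1} \left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)\right] \exp\left[-\frac{1}{2\tau^2}\eta_i^2\right] \\ &= Q\left(\boldsymbol{\varepsilon}^{(t)}_i\right) \exp\left[\frac{1}{2}\boldsymbol{\varepsilon}^{(t)\prime}_i \mathbf{C}_c^{-1} \eta_i \mathbf{1}_{n_i,1} + \frac{1}{2}\eta_i \mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \boldsymbol{\varepsilon}^{(t)}_i - \frac{1}{2}\eta_i^2 \mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \mathbf{1}_{n_i,1}\right] \exp\left[-\frac{1}{2\tau^2}\eta_i^2\right] \\ &= Q\left(\boldsymbol{\varepsilon}^{(t)}_i\right) \exp\left[-\frac{1}{2}\left(\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \mathbf{1}_{n_i,1}\right)\left(\eta_i - \frac{\mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \boldsymbol{\varepsilon}^{(t)}_i}{\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \mathbf{1}_{n_i,1}}\right)^2\right] \end{aligned} \tag{8.15}$$

From the above equation, it can be seen that $f\left(\eta_i \mid \boldsymbol{\varepsilon}^{(t)}_i\right)$ has a normal distribution with mean $\dfrac{\mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \boldsymbol{\varepsilon}^{(t)}_i}{\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \mathbf{1}_{n_i,1}}$ and variance $\dfrac{1}{\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \mathbf{1}_{n_i,1}}$. If the best estimator for $\eta_i$ is to be obtained under the squared-error loss criterion, then the Bayesian estimator of $\eta_i$ equals the posterior mean [Lehmann and Casella, 2003]:

$$\hat{\eta}_i = \frac{\mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \boldsymbol{\varepsilon}^{(t)}_i}{\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \mathbf{1}_{n_i,1}} \tag{8.16}$$

If the spatial correlation is absent, $\mathbf{C}_c$ is simply $\sigma^2$ times an identity matrix of size $n_i$ by $n_i$, in which case $\mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \mathbf{1}_{n_i,1}$ equals $n_i/\sigma^2$ and $\mathbf{1}'_{n_i,1} \mathbf{C}_c^{-1} \boldsymbol{\varepsilon}^{(t)}_i$ equals $\sum_{j=1}^{n_i} \varepsilon^{(t)}_{ij}/\sigma^2$, and Equation 8.16 becomes identical to Equation 8.7.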
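The posterior-mean estimator of Equation 8.16 can be sketched for a single earthquake as follows. This is an illustration under stated assumptions, not the authors' MATLAB implementation: `dist` is a hypothetical matrix of site-to-site separation distances, `rho` is any user-supplied correlation function with `rho(0) == 1`, and a small Gaussian elimination routine stands in for a library linear solver.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]     # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def inter_event_residual(total_resid, dist, sigma, tau, rho):
    """Posterior mean of eta_i (Eq. 8.16) for one event."""
    n = len(total_resid)
    # Conditional covariance of Eq. 8.13a: Cc[j][k] = rho(d_jk) * sigma^2.
    cc = [[rho(dist[j][k]) * sigma ** 2 for k in range(n)] for j in range(n)]
    u = solve(cc, [1.0] * n)                              # u = Cc^{-1} 1
    num = sum(uj * e for uj, e in zip(u, total_resid))    # 1' Cc^{-1} eps (Cc symmetric)
    den = 1.0 / tau ** 2 + sum(u)                         # 1/tau^2 + 1' Cc^{-1} 1
    return num / den
```

Passing a correlation function that is zero for all non-zero distances reproduces Equation 8.7, which offers a simple consistency check on any implementation.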

8.3.3

Algorithm summary

In summary, the steps of the modified mixed-effects algorithm are as follows:
1. Estimate the model coefficients $\boldsymbol{\theta}$ using a fixed-effects regression algorithm assuming $\eta$ equals 0.
2. Using $\boldsymbol{\theta}$, solve for the variances of the residuals, $\sigma^2$ and $\tau^2$, by maximizing the likelihood function described in Equation 8.4. The covariance $\mathbf{C}$ in Equation 8.4 is estimated using Equation 8.12.
3. Given $\boldsymbol{\theta}$, $\sigma^2$ and $\tau^2$, estimate $\eta_i$ using Equation 8.16.
4. Given $\eta_i$, estimate new coefficients ($\boldsymbol{\theta}$) using a fixed-effects regression algorithm for $\ln(Y_{ij}) - \eta_i$.
5. Repeat steps 2, 3 and 4 until the likelihood in step 2 is maximized and the estimates for the coefficient set converge.
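Step 2 of the modified algorithm requires the covariance matrix of Equation 8.12. One way to assemble it is sketched below, assuming one distance matrix per earthquake and a user-supplied correlation function `rho`; both function names are hypothetical.

```python
def event_covariance(dist, sigma, tau, rho):
    """Covariance block for one event: rho(d_jj') * sigma^2 + tau^2 (Eq. 8.12a)."""
    n = len(dist)
    return [[rho(dist[j][k]) * sigma ** 2 + tau ** 2 for k in range(n)]
            for j in range(n)]


def total_covariance(dists_by_event, sigma, tau, rho):
    """Block-diagonal covariance of all total residuals (Eq. 8.12a and 8.12b)."""
    blocks = [event_covariance(d, sigma, tau, rho) for d in dists_by_event]
    n_total = sum(len(b) for b in blocks)
    c = [[0.0] * n_total for _ in range(n_total)]   # zeros across different events
    offset = 0
    for b in blocks:
        for j, row in enumerate(b):
            for k, value in enumerate(row):
                c[offset + j][offset + k] = value
        offset += len(b)
    return c
```

The resulting matrix replaces the covariance of Equation 8.5 when the log-likelihood of Equation 8.4 is evaluated.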

8.3.4

Large sample standard errors of σ and τ

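If desired, the standard errors of the inter- and intra-event residual variances can be calculated based on the following results from Searle [1977]:

$$\mathrm{var}(\sigma^2) = 2\left[\mathrm{tr}\left(\mathbf{C}^{-1}\frac{\partial \mathbf{C}}{\partial(\sigma^2)}\right)^{2}\right]^{-1} \tag{8.17a}$$
$$\mathrm{var}(\tau^2) = 2\left[\mathrm{tr}\left(\mathbf{C}^{-1}\frac{\partial \mathbf{C}}{\partial(\tau^2)}\right)^{2}\right]^{-1} \tag{8.17b}$$

where $\mathbf{C}$ is the covariance matrix defined in Equation 8.12, $\frac{\partial \mathbf{C}}{\partial(\sigma^2)}$ denotes the partial derivative of $\mathbf{C}$ with respect to $\sigma^2$, $\frac{\partial \mathbf{C}}{\partial(\tau^2)}$ denotes the partial derivative of $\mathbf{C}$ with respect to $\tau^2$, tr denotes the trace of a matrix and var denotes variance. The partial derivatives, $\frac{\partial \mathbf{C}}{\partial(\sigma^2)}$ and $\frac{\partial \mathbf{C}}{\partial(\tau^2)}$, can be evaluated using numerical differentiation.

Alternately, the standard errors can also be evaluated using statistical techniques such as bootstrap [Efron, 1998].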

8.3.5

Mixed-effects regression procedure in R

While mixed-effects regression procedures that consider spatial correlation (referred to as ‘within-group correlation’ in statistical literature) are available in statistical programming languages such as R (e.g., the nlme package of Pinheiro and Bates [2000]), it is potentially more convenient for current users of the Abrahamson and Youngs [1992] algorithm to switch to the modified algorithm described in this chapter. Further, based on the authors’ experience, the nlme implementation in R suffers from numerical instabilities while fitting the over-parameterized ground-motion models, while the implementation of the proposed algorithm in MATLAB recovers from similar numerical instabilities potentially due to a more robust fixed-effects regression implementation in MATLAB.

8.4


In the current study, the algorithm described in the previous section is used to refit the Campbell and Bozorgnia [2008] ground-motion prediction model (henceforth referred to as the CB08 model) for illustration. First, in order to provide a baseline model for comparison, the coefficients of the CB08 model are reestimated while ignoring spatial correlation. For consistency, only records used by CB08 are used for estimating the coefficients. Table 8.1 shows the coefficients estimated in this study for predicting spectral accelerations at 1 second (denoted Sa (1s)) in the uncorrelated case. Also shown in the table for comparison are the corresponding published CB08 model coefficients. Documentation of how these coefficients are used to make predictions is provided by CB08. The estimates of the standard deviations of the intra-event residual and the inter-event residual (i.e., σ and τ respectively) are shown in Table 8.2. The value of the published intra-event residual standard deviation reported here corresponds to that at large Vs 30’s (the Vs 30 is set above a threshold


value beyond which the ground-motion model no longer considers soil non-linearity effects, wherein the intra-event residuals have a constant variance at any given period). The refitted coefficients and variance estimates obtained in this work are similar, but not identical, to those reported by CB08. These small discrepancies are likely due to the manual coefficient smoothing carried out by the authors of the CB08 model [Campbell, 2009]. For consistency, the refitted model coefficients are treated as the benchmark values for comparison with the model coefficients obtained considering spatial correlation. Note that the functional form of the CB08 model requires the A1100 value (the median estimate of PGA on a reference rock outcrop with Vs 30 = 1100 m/s) for the median prediction. For simplicity, this value is obtained directly using the coefficients of the CB08 model corresponding to PGA rather than by fitting a separate model for PGA. This is reasonable because the model coefficients used for predicting median values do not change significantly after incorporating spatial correlation, as shown subsequently in this chapter.

The model coefficients are then reestimated considering spatial correlation. The spatial correlation model is obtained from Jayaram and Baker [2009a], and is shown below.

$$\rho(h) = e^{-3h/b} \tag{8.18}$$

where h (km) denotes the separation distance between the sites of interest, and b denotes the ‘range’ parameter which determines the rate of decay of correlation. This range is a function of the spectral period, and equals 26 km when Sa (1s) is considered. The coefficient estimates (i.e., θ) obtained in this case are shown in Table 8.1. It can be seen from the table that the coefficients obtained by considering spatial correlation are similar to those obtained by ignoring spatial correlation. This is reinforced by a plot of the predicted medians at all the data sites using these two approaches (Figure 8.1). This matches with the observation of Hong et al. [2009] that the ground-motion model coefficients do not change significantly when considering spatial correlation. While the coefficients for the median predictions are found to be relatively insensitive to the incorporation of spatial correlation, significant changes are seen in the estimates of the variance of the residuals (Table 8.2). In particular, the value of σ increases from 0.578 to 0.654 and the value of τ decreases from 0.223 to 0.157 after incorporating the spatial correlation. This trend is to be expected based on the illustrative example shown in Section 8.2.

Table 8.1: Regression coefficients for estimating median Sa (1s)

Case   c0      c1     c2      c3      c4      c5     c6     c7
1      -6.406  1.196  -0.772  -0.314  -2.000  0.170  4.00   0.255
2      -6.487  1.181  -0.878  -0.379  -2.064  0.195  3.884  0.264
3      -6.942  1.297  -1.073  -0.182  -2.112  0.198  4.440  0.324

Case   c8      c9     c10    c11    c12    k1     k2      k3
1      0.000   0.490  1.571  0.150  1.000  400.0  -1.955  1.929
2      -0.110  0.897  1.577  0.122  0.871  400.0  -1.955  1.929
3      -0.093  0.796  1.565  0.093  0.865  400.0  -1.955  1.929

Case 1: Published CB08 results [Campbell and Bozorgnia, 2008]
Case 2: Estimated in this study without considering spatial correlation
Case 3: Estimated in this study considering spatial correlation

Table 8.2: Standard deviations of residuals corresponding to Sa (1s)

Case   σ      τ      √(σ² + τ²)
1      0.568  0.255  0.623
2      0.578  0.223  0.620
3      0.654  0.157  0.673

Case 1: Published CB08 results [Campbell and Bozorgnia, 2008]
Case 2: Estimated in this study without considering spatial correlation
Case 3: Estimated in this study considering spatial correlation
σ denotes the standard deviation of the intra-event residual
τ denotes the standard deviation of the inter-event residual
√(σ² + τ²) denotes the standard deviation of the total residual

Figure 8.1: Comparison of predicted median Sa (1s) values obtained using the CB08 model fitted with and without the consideration of spatial correlation: (a) linear scale (b) log scale.
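As a rough illustration of Equation 8.18, the sketch below evaluates the exponential correlation model for a small set of sites. The function names and the planar (x, y) coordinates in km are assumptions made for this example; only the functional form and the 26 km range for Sa (1s) come from the text.

```python
import math

def exponential_correlation(h_km, range_km=26.0):
    """Equation 8.18: rho(h) = exp(-3h/b), with b = 26 km for Sa(1s)."""
    return math.exp(-3.0 * h_km / range_km)

def correlation_matrix(sites_km, range_km=26.0):
    """Pairwise correlations for sites given as (x, y) coordinates in km."""
    return [[exponential_correlation(math.dist(p, q), range_km) for q in sites_km]
            for p in sites_km]

# Hypothetical example: three sites on a line, 20 km apart.
sites = [(0.0, 0.0), (20.0, 0.0), (40.0, 0.0)]
R = correlation_matrix(sites)
```

At a separation equal to the range (h = b), the correlation drops to exp(-3), i.e., about 0.05.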

8.4.1

Standard deviation of residuals as a function of period

The results presented in the previous section support the use of the published coefficients (i.e., θ) for predicting the median intensities. The values of σ and τ, however, must be obtained considering spatial correlation. This implies that the iterative mixed-effects algorithm described earlier in the chapter can be simplified to a computation of only the residual variances σ² and τ² (Step 3) using the published values of θ (i.e., the mixed-effects regression is now simply a random-effects regression procedure). Hence, in this work, the CB08 model coefficients are assumed to be the fixed-effects model coefficients, and the total residuals are computed using the records in the PEER NGA database (only those records used by the authors of the CB08 model are considered for compatibility) [Chiou et al., 2008]. The maximum likelihood estimates of σ and τ are then obtained at different spectral acceleration periods from the total residuals using the procedures described earlier. Figure 8.2a compares the estimates of σ obtained in this study to those reported by CB08. It can be seen that the values of σ obtained considering spatial correlation are mostly larger than the published σ’s (which have been estimated ignoring spatial correlations). Figure 8.2b shows that the values of τ, on the other hand, are considerably smaller when spatial correlations are considered. The values of σ and τ are then used to compute the standard deviations of the total residuals (computed as $\sqrt{\sigma^2 + \tau^2}$), and plotted in Figure 8.2c. It can be seen from this figure that considering spatial correlation does not significantly alter the total residual standard deviation. (Hong et al. [2010] noticed a small reduction in the total residual standard deviation when the spatial correlation was considered. The alteration in the total residual standard deviation could depend on the data set and the spatial correlation model used.) Though the current work only refits the CB08 model, the trends in the values of σ and τ are the same for the other recent NGA ground-motion models [e.g., Boore and Atkinson, 2008, Chiou and Youngs, 2008]. This can be seen from Figure 8.2d, which shows typical ratios of the inter-event residual standard deviation to the total residual standard deviation reported by these ground-motion models. It is seen that the ratios reported by the ground-motion modelers are generally much larger than those estimated in this work considering spatial correlation.

Figure 8.2: Effect of spatial correlation on: (a) estimated intra-event residual standard deviation (σ), (b) estimated inter-event residual standard deviation (τ), (c) estimated total residual standard deviation. (d) Ratio of inter-event residual standard deviation to total residual standard deviation.
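The random-effects step described above can be sketched as a likelihood evaluation over the total residuals. This is a minimal sketch, not the authors' implementation: it assumes homoscedastic residuals, assumes that the covariance of the total residuals within one earthquake is τ² on every entry plus σ²ρ(h) (with ρ from Equation 8.18), and replaces the estimation procedure of this chapter with a simple grid search. The function names and the `events` data layout are hypothetical.

```python
import numpy as np

def total_residual_covariance(dist_km, sigma, tau, range_km=26.0):
    """Covariance of total residuals within one earthquake (assumed form):
    tau^2 on every entry (shared inter-event term) plus sigma^2 * rho(h)."""
    rho = np.exp(-3.0 * np.asarray(dist_km) / range_km)  # Equation 8.18
    return tau**2 * np.ones_like(rho) + sigma**2 * rho

def log_likelihood(sigma, tau, events, range_km=26.0):
    """Gaussian log-likelihood of total residuals, summed over earthquakes.
    `events` is a list of (residuals, pairwise_distance_matrix) tuples."""
    total = 0.0
    for resid, dist_km in events:
        resid = np.asarray(resid, dtype=float)
        cov = total_residual_covariance(dist_km, sigma, tau, range_km)
        _, logdet = np.linalg.slogdet(cov)
        quad = resid @ np.linalg.solve(cov, resid)
        total += -0.5 * (resid.size * np.log(2.0 * np.pi) + logdet + quad)
    return total

def grid_search(events, sigmas, taus, range_km=26.0):
    """Return the (sigma, tau) pair on the grid with the largest log-likelihood."""
    return max(((s, t) for s in sigmas for t in taus),
               key=lambda st: log_likelihood(st[0], st[1], events, range_km))
```

Setting the range to a very small value makes ρ(h) vanish for distinct sites, which corresponds to the uncorrelated case used for the published estimates.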

8.4.2

Estimates of spatial correlation

The spatial correlation estimates (Equation 8.18) provided by Jayaram and Baker [2009a] are based on residuals computed using the published ground-motion models that assume independence between intra-event residuals. As discussed earlier, the consideration of spatial correlation while fitting the models does not change the median predictions, and therefore, the total residuals (Equation 8.1). Jayaram and Baker [2009a] also showed that the spatial correlation between intra-event residuals can be estimated directly from total residuals (exactly when the intra-event residuals are homoscedastic and approximately otherwise). Therefore, it can be inferred that the estimates of spatial correlation will be very similar when estimated using ground-motion models fitted with/ without consideration of spatial correlation. In other words, it is still appropriate to use the correlation models previously developed using the published ground-motion models.


8.4.3

Risk assessment for a hypothetical portfolio of buildings

Since ignoring spatial correlation while fitting the ground-motion model does not significantly affect the estimates of the ground-motion medians ($f(\boldsymbol{\theta})$) or the standard deviation of the total residuals (Figure 8.2c), hazard and loss analyses for single structures will produce accurate results if the existing ground-motion models are used. Risk assessments for spatially-distributed systems, however, are influenced by the standard deviations of the inter-event and the intra-event residuals and not just by the medians and the standard deviation of the total residuals (this is discussed in more detail later in this section). Therefore, risk assessments of such systems carried out using ground-motion models fitted with and without consideration of spatial correlation could result in different loss estimates. This is illustrated below using a risk assessment carried out on a hypothetical portfolio of buildings located in the San Francisco Bay Area.

Consider a hypothetical portfolio of 100 buildings in the San Francisco Bay Area located on a 10 by 10 grid with a grid spacing of 20 km. Each building in the portfolio is assumed to have a replacement value of $1,000,000. The seismic risk of this portfolio is estimated by modeling the seismic hazard due to 10 different faults and fault segments. (The source model is obtained from USGS [2003].) The risk assessment is carried out using a simulation-based procedure described in Crowley and Bommer [2006] and Jayaram and Baker [2010]. The steps involved in this procedure are summarized below.

Step 1: Simulate earthquakes of different magnitudes on the active faults in the region, using appropriate magnitude-recurrence relationships.

Step 2: Using the ground-motion model, compute the median ground-motion intensities ($f(\boldsymbol{\theta})$) and the standard deviations of the inter-event and the intra-event residuals (τ and σ respectively) at the sites of interest.

Step 3: Simulate the inter-event residual (i.e., $\eta_j$) by sampling from the univariate normal distribution with mean zero and standard deviation τ.

Step 4: Simulate the intra-event residuals (i.e., $\varepsilon_{ij}$’s) by sampling from a multivariate normal distribution with mean $\mathbf{0}_{p,1}$ (zero vector of size p) and covariance matrix given by Equation 8.12. Here, the spatial correlation ($\rho_{jj'}$) is defined by the exponential model in Equation 8.18 with a range of 26 km.

Step 5: Combine the medians, inter-event residuals and intra-event residuals using Equation 8.1 to obtain realizations of the ground-motion intensity at all sites of interest. In the rest of the chapter, each set of ground-motion intensities is referred to as a ground-motion intensity map. The collection of all simulated ground-motion intensity maps quantifies the total ground-motion hazard in the region.

Step 6: Simulate the damage to the buildings due to each ground-motion intensity map. Here, this is done using fragility functions which provide the probability of the building damage being in or exceeding various damage states (no damage, minor damage, moderate damage, extensive damage and collapse) as a function of the spectral acceleration at 1 second at the building location. The damage functions were assumed to be cumulative lognormal distribution functions with median values 0.4, 0.5, 0.7 and 0.9 for the minor, moderate, extensive and collapse damage states respectively. The lognormal standard deviation was assumed to be 0.6 in all these cases.

Step 7: Compute the total monetary loss associated with the damage to the portfolio due to each ground-motion intensity map. This is computed by assuming the damage ratio (ratio of repair cost to replacement cost) to be 0.03, 0.08, 0.25 and 1.00 for the minor, moderate, extensive and collapse damage states respectively.

Step 8: Obtain the loss exceedance curve which provides the annual rate of exceedance of various monetary loss values. The loss exceedance curve is obtained as the product of the recurrence rates of all earthquakes in the region and the probability of exceedance of various monetary loss values. The exceedance probabilities are calculated as follows:

$$P(L \geq l) = \frac{1}{n} \sum_{i=1}^{n} I(L_i \geq l) \tag{8.19}$$

where P(L ≥ l) is the probability that the loss exceeds l, n denotes the number of simulated ground-motion intensity maps, Li is the monetary loss associated with ground-motion intensity map i, and I(Li ≥ l) is an indicator variable that equals one if Li exceeds l and zero otherwise. The above-mentioned risk assessment process is carried out using the values of σ and τ provided by CB08 as well as with the σ and the τ estimated in this work by considering spatial correlations in the regression formulation (Figures 8.2a and 8.2b). In both cases, the


CB08 median model coefficients are used for estimating median intensities. The resulting loss exceedance curves are shown in Figure 8.3. It can be seen in Figure 8.3 that the recurrence rates of extreme losses are overestimated when the CB08 estimates are used. This is a result of the fact that the CB08 model overestimates τ and underestimates σ by ignoring spatial correlation. A large value of τ increases the likelihood of observing large positive inter-event residuals, which will simultaneously increase the ground-motion intensity at all the sites in the region. If spatial correlations are large, a large value of σ will have a similar effect and can result in large ground-motion intensities at multiple sites. In such a case, the effect of underestimating σ is compensated by the effect of overestimating τ. If the spatial correlations are small, however, underestimating σ and overestimating τ will have the net effect of jointly producing more extreme ground-motion intensities at multiple sites than is probable in reality. It can be inferred from Equation 8.18 that the spatial correlation will be small if h is large or if b is small. Therefore, when the components of a spatially-distributed system are well separated (large h) or if the correlation range is small, the ground-motion models fitted without considering spatial correlation will overestimate the likelihood of jointly observing extreme ground-motion intensities at multiple sites. It is to be noted that the separation between the buildings in the hypothetical portfolio considered in this work is substantial, which leads to significant differences between the loss curves obtained with and without consideration of spatial correlation. It is difficult to make general conclusions about the size of this effect, but it is clear that seismic risk analysis calculations using existing ground-motion model estimates of σ and τ will overestimate the chance of observing large losses.
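Steps 3 to 8 of the procedure can be sketched for a single earthquake scenario as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the study: the median intensity (0.3, in the same units as the fragility medians) is a hypothetical placeholder for the output of Steps 1 and 2, the intra-event covariance is taken as σ²ρ(h), one uniform random number per building is used to sample its damage state, and all function names are invented for this example.

```python
import math
import numpy as np

MEDIANS = [0.4, 0.5, 0.7, 0.9]                 # fragility medians: minor, moderate, extensive, collapse
BETA = 0.6                                     # lognormal standard deviation of each fragility
DAMAGE_RATIOS = [0.0, 0.03, 0.08, 0.25, 1.00]  # none, minor, moderate, extensive, collapse
REPLACEMENT_VALUE = 1_000_000.0

def grid_distances(n_side=10, spacing_km=20.0):
    """Pairwise distances (km) for an n_side x n_side grid of buildings."""
    xy = np.array([(i * spacing_km, j * spacing_km)
                   for i in range(n_side) for j in range(n_side)])
    return np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)

def simulate_ln_sa(ln_median, sigma, tau, dist_km, n_maps, rng, range_km=26.0):
    """Steps 3-5: ln Sa = ln median + inter-event residual + correlated intra-event residuals."""
    corr = np.exp(-3.0 * dist_km / range_km)           # Equation 8.18
    chol = np.linalg.cholesky(sigma**2 * corr)
    eta = tau * rng.standard_normal((n_maps, 1))       # one value per map, shared by all sites
    eps = rng.standard_normal((n_maps, dist_km.shape[0])) @ chol.T
    return ln_median + eta + eps

def simulate_losses(ln_sa, rng):
    """Steps 6-7: sample a damage state per building, then sum the repair costs per map."""
    phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
    # P(damage state >= k | Sa) for k = minor ... collapse, shape (4, n_maps, n_sites)
    p_exceed = np.stack([phi((ln_sa - math.log(m)) / BETA) for m in MEDIANS])
    u = rng.random(ln_sa.shape)
    state = (u[None, :, :] < p_exceed).sum(axis=0)     # 0 = no damage ... 4 = collapse
    return (np.asarray(DAMAGE_RATIOS)[state] * REPLACEMENT_VALUE).sum(axis=1)

def exceedance_probability(losses, threshold):
    """Equation 8.19: fraction of simulated maps with loss >= threshold."""
    return float(np.mean(np.asarray(losses) >= threshold))

rng = np.random.default_rng(0)
dist = grid_distances()
# Hypothetical scenario median; sigma and tau taken from Table 8.2, Case 3.
ln_sa = simulate_ln_sa(math.log(0.3), 0.654, 0.157, dist, n_maps=200, rng=rng)
losses = simulate_losses(ln_sa, rng)
p = exceedance_probability(losses, 5_000_000.0)
```

Repeating the run with the published values (σ = 0.568, τ = 0.255) gives the second loss distribution for comparison. A full analysis would loop over all simulated earthquakes and weight the exceedance probabilities by the recurrence rates, as described in Step 8.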

8.5

Conclusions

This work illustrated the impact of considering spatial correlation between intra-event residuals while developing ground-motion models. The mixed-effects algorithm of Abrahamson and Youngs [1992], which assumes independence between intra-event residuals, was modified to account for the spatial correlation between the intra-event residuals. This was done by changing the likelihood function used for estimating the inter-event and the intra-event residual variances given other model coefficients and changing the estimate of the inter-event residual given the total residuals at multiple sites. The modified algorithm was used to refit the Campbell and Bozorgnia [2008] ground-motion model, to illustrate the effect of this refinement. The variance of the total residuals and the model coefficients used for predicting the median ground-motion intensity were not significantly affected by the proposed refinement. Significant changes, however, were seen in the variance of the intra-event and the inter-event residuals. Incorporating spatial correlation was seen to increase the intra-event residual variance and to decrease the inter-event residual variance. These changes have implications for risk assessments of spatially-distributed systems because a smaller inter-event residual variance implies a lesser likelihood of simultaneously observing larger-than-median ground-motion intensities at all sites in a region. To demonstrate this effect, a risk assessment was performed for a hypothetical portfolio of buildings using the ground-motion models obtained with and without accounting for spatial correlation. The results showed that using the published variance estimates causes an overestimation of the exceedance rates of large losses.

Figure 8.3: Risk assessment results for a hypothetical portfolio of buildings performed using ground-motion models developed with and without the proposed refinement.

Chapter 9

Hurricane risk assessment of spatially-distributed systems with consideration of wind-field uncertainties and spatial correlation

9.1

Abstract

With a view toward extending the seismic risk assessment techniques developed in this work for risk assessment under other types of hazards, this exploratory study focuses on quantifying the uncertainties and the spatial correlation in hurricane wind fields (using techniques that were used for earthquake ground motion fields), and evaluating their impact on the hurricane risk of spatially-distributed systems. Hurricane wind-speed predictions are obtained for two sample hurricanes, Hurricane Jeanne and Hurricane Frances, using the Batts et al. [1980] wind-speed model, and the uncertainties in these predictions are evaluated using ‘actual’ wind-speed recordings. The spatial correlation of wind speeds is estimated and modeled using geostatistical tools. Finally, the impact of the wind-speed uncertainties and the spatial correlation on the hurricane risk of a spatially-distributed system is illustrated by a sample risk assessment of a hypothetical portfolio of buildings. The results of the risk assessment show that the uncertainties and the spatial correlations in the wind fields need to be modeled in order to avoid introducing errors into the risk calculations of spatially-distributed systems.

9.2

Introduction

Frameworks for the risk assessment of structures and infrastructure systems under natural and man-made hazards share many similarities. Broadly, they involve the quantification of a hazard intensity measure (e.g., ground-motion intensities during earthquakes, wind speeds during hurricanes) and the associated probable losses. The techniques developed in this thesis for seismic risk assessment can thus be applicable for the risk assessment under other types of hazards. This exploratory study extends the seismic hazard and risk assessment concepts and techniques discussed in the earlier chapters of this thesis to hurricane (chosen as a sample alternate hazard) hazard and risk modeling.

Vickery et al. [2000b] developed a hurricane wind-hazard model that forms the basis for the wind-speed contours in ASCE Standard 7-02 [2003] in the Southeast U.S. The following steps are involved in this hazard quantification approach:

• Step 1: Use historical hurricane records to develop probability density functions (PDF) for key hurricane parameters such as the location of origin, translation direction, translation speed, central pressure and radius of maximum wind.

• Step 2: Use the PDFs developed in Step 1 to Monte Carlo simulate probable future hurricanes.

• Step 3: Predict the peak wind speeds due to each simulated hurricane at the sites of interest using empirical or physics-based wind-speed models [e.g., Batts et al., 1980, Vickery et al., 2000a, 2008].

• Step 4: Develop a PDF for the peak wind speeds experienced at any particular site using the wind-speed information from Step 3.

In general, it can be seen that this process is similar to probabilistic seismic hazard analysis (PSHA), which is used for quantifying seismic ground-motion hazard at a given site [Cornell, 1968, Kramer, 1996].


The wind-hazard model described above can be used in combination with structural fragility curves to obtain the exceedance rates of different levels of structural losses using numerical integration. Alternately, the hazard information can be used in a structural reliability framework to estimate the failure probability of a structure under hurricane loading. For instance, Li and Ellingwood [2009] modeled the site wind speeds as a Weibull random variable (whose distribution is parameterized using the wind hazard information obtained from Vickery et al. [2000b]), and estimated the reliability of low-rise light-frame wood residential construction in the U.S. subjected to hurricane loading.

It is, however, difficult to use the two analytical risk assessment approaches described above for assessing the risk of spatially-distributed systems such as portfolios of buildings and lifelines. This is because the risk assessment of spatially-distributed systems is based on a large vector of correlated wind speeds (wind speeds at all component locations), which makes it difficult to use numerical integration and other analytical techniques. Hence, many past research works use Monte Carlo simulation (MCS) instead of analytical approaches for the risk assessment of spatially-distributed systems [e.g., Legg et al., 2010]. The basic MCS approach for the risk assessment involves the following steps:

• Step 1: Simulate probable future hurricanes using the PDFs of hurricane-related parameters developed in past research works such as that of Vickery et al. [2000b].

• Step 2: Predict the peak wind speeds due to each simulated hurricane using empirical or physics-based wind-speed models [e.g., Batts et al., 1980, Vickery et al., 2000a, 2008].

• Step 3: Monte Carlo simulate the total loss due to the wind speeds.

• Step 4: Estimate the probability of exceeding various loss levels using the loss estimates from Step 3.
Most hurricane wind-speed prediction models developed in the past are deterministic, and the uncertainties in wind fields have been analyzed in few research works. (In this chapter, the wind field denotes the collection of peak wind speeds (over the duration of the hurricane) at all the sites of interest. The peak wind speed at a site (similar to peak ground acceleration for earthquakes) is often the hurricane intensity measure used to estimate probable losses [e.g., Jarvinen et al., 1984, Li and Ellingwood, 2009].) One notable exception is the work of Vickery et al. [2009b], who computed the uncertainties in the wind fields (maximum peak gust wind speed) using observed and predicted (by the Vickery et al. [2008] wind-field model) wind speeds during 24 different hurricanes. They found that the ratio of the observed wind speeds to the predicted wind speeds has a mean of one and a coefficient of variation of 0.1. Chapter 5 of this thesis illustrated that ignoring the uncertainties in the earthquake ground motions can lead to inaccurate lifeline risk estimates. This study demonstrates the potential for comparable inaccuracies caused by ignoring the uncertainties in wind fields during hurricane risk assessments. The seismic risk assessments described in Chapter 5 of this thesis also illustrated the importance of considering spatial correlation in ground motion fields for obtaining accurate risk estimates. To the author’s knowledge, the spatial correlation in hurricane wind fields has not been studied in the literature.

Assume that a probabilistic hurricane wind-speed model takes the following form:

$$\ln(V_i) = \ln(\bar{V}_i) + \varepsilon_i \tag{9.1}$$

where $V_i$ denotes the observed peak wind speed at site i, $\bar{V}_i$ denotes the predicted (by the wind-field model) median peak wind speed at site i, and $\varepsilon_i$ denotes the residual (error term). For clarity, the spatial correlation mentioned in this chapter refers to the correlation between the residuals (ε’s) at two different sites. There is a significant amount of correlation between the wind speeds at two closely-spaced sites during a hurricane (which was considered by Legg et al. [2010]), but a large portion of this correlation is accounted for by the wind-speed model, which predicts similar wind speeds at sites close to one another. The residuals ($\varepsilon_i$’s) are correlated as well, and this correlation is of interest in this study. (Chapter 3 of this thesis discusses the comparable concept of spatial correlation for earthquake ground-motion fields in detail.) Causes of this correlation include common source effects and similarity in topography- and land friction-related effects.

It is of interest to quantify the uncertainties and the spatial correlation in wind fields. In this exploratory study, two sample hurricanes are used, with the primary goals of obtaining approximate estimates for these parameters and illustrating the tools and methods that can be used for the estimation. Further, a sample hurricane risk assessment is carried out for a hypothetical portfolio of buildings in order to illustrate the importance of considering the uncertainties and the spatial correlation in the risk assessment process.
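Rearranging Equation 9.1 gives the residual at each site as the difference of the logarithms of the observed and predicted wind speeds. The short sketch below shows this calculation together with the sample mean and standard deviation used to summarize the residuals. The wind speeds are made-up values for illustration, not data from the study, and the function name is hypothetical.

```python
import math
import statistics

def log_residuals(observed, predicted):
    """Equation 9.1 rearranged: eps_i = ln(V_i) - ln(Vbar_i)."""
    return [math.log(v) - math.log(v_bar) for v, v_bar in zip(observed, predicted)]

# Hypothetical peak wind speeds (m/s) at four sites; not data from the study.
observed = [32.0, 41.0, 28.0, 36.0]
predicted = [30.0, 44.0, 29.0, 33.0]
eps = log_residuals(observed, predicted)
bias = statistics.mean(eps)      # zero if the model is unbiased on average
spread = statistics.stdev(eps)   # sample standard deviation of the residuals
```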

9.3

Spatial correlation estimation methodology

In this chapter, the uncertainties and the spatial correlation in hurricane wind fields are empirically estimated using recorded hurricane wind speeds. The wind-field uncertainties are quantified using the mean and the variance of the residuals (ε’s). The residuals are computed from recorded hurricane wind speeds using Equation 9.1, where the wind-speed predictions are obtained from the Batts et al. [1980] model. This model is chosen in this work for its simplicity, and the analyses performed using this simple model can be repeated with a more rigorous model [e.g., Vickery et al., 2008] if desired. The spatial correlations between the residuals are estimated using well-established geostatistical tools [Deutsch and Journel, 1998, Goovaerts, 1997] that were previously used in Chapter 3 for quantifying the spatial correlation in ground motion fields. These tools are described briefly in this section to supplement the more detailed discussion in Chapter 3.

Let $\tilde{\varepsilon}$ denote the normalized residual, estimated as follows:

$$\tilde{\varepsilon}_i = \frac{\varepsilon_i}{\sigma} \tag{9.2}$$

where σ denotes the standard deviation of the residual. The correlation structure of $\tilde{\varepsilon}_i$ (equivalently, that of $\varepsilon_i$) can be represented using a semivariogram, which represents the dissimilarity between the $\tilde{\varepsilon}_i$’s. Let $u$ and $u'$ denote two sites separated by distance vector $\mathbf{h}$, and $\varepsilon_u$ denote the residual at site $u$. The semivariogram ($\gamma(u, u')$) is defined as follows:

$$\gamma(u, u') = \frac{1}{2} E\left[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u'}\}^2\right] \tag{9.3}$$

where E(.) denotes the expectation operator.

The semivariogram defined in Equation 9.3 is location-dependent, and its inference requires repetitive realizations of $\tilde{\varepsilon}_i$ at locations $u$ and $u'$. Such repetitive measurements are, however, never available in practice. Hence, it is typically assumed that the semivariogram does not depend on site locations $u$ and $u'$, but only on their separation $\mathbf{h}$ to obtain a stationary semivariogram. The stationary semivariogram ($\gamma(\mathbf{h})$) can then be estimated as follows:

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u+\mathbf{h}}\}^2\right] \tag{9.4}$$

A stationary semivariogram is said to be isotropic if it is a function of the separation distance ($h = \|\mathbf{h}\|$) rather than the separation vector $\mathbf{h}$. An isotropic, stationary semivariogram can be empirically estimated from a data set as follows:

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \{\tilde{\varepsilon}_{u_\alpha} - \tilde{\varepsilon}_{u_\alpha + h}\}^2 \tag{9.5}$$

where $\hat{\gamma}(h)$ is the experimental stationary isotropic semivariogram (estimated from a data set); $N(h)$ denotes the number of pairs of sites separated by h; and $\{\tilde{\varepsilon}_{u_\alpha}, \tilde{\varepsilon}_{u_\alpha + h}\}$ denotes the α’th such pair.

When empirically estimated, $\hat{\gamma}(h)$ only provides semivariogram values at discrete values of h, and hence, a continuous function is usually fitted to the discrete values to obtain the semivariogram for continuous values of h. There are only a few permissible continuous functions that ensure that the covariance matrices estimated using these semivariograms are positive definite [Goovaerts, 1997]. The current study uses the permissible Gaussian semivariogram (shown below), which is seen to provide the best fit to the empirical semivariogram values.

$$\hat{\gamma}(h) = a\left[1 - \exp\left(-3h^2/b^2\right)\right] \tag{9.6}$$

where a denotes the ‘sill’ of the semivariogram (which in this case equals one, the variance of the normalized residuals) and b denotes the ‘range’ of the semivariogram (which equals the separation distance h at which $\hat{\gamma}(h)$ equals 0.95a).
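Equations 9.5 and 9.6 can be sketched in a few lines. The code below is an illustration only: it assumes planar site coordinates in km and pools site pairs into distance bins of equal width (in practice, few pairs are separated by exactly h, so some binning or tolerance is needed; the specific binning here is an assumption). The function names are hypothetical.

```python
import math

def empirical_semivariogram(sites_km, residuals, bin_width_km, n_bins):
    """Equation 9.5 with distance bins: pairs whose separation falls in
    [k*w, (k+1)*w) are pooled and assigned to the bin midpoint."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(sites_km)):
        for j in range(i + 1, len(sites_km)):
            k = int(math.dist(sites_km[i], sites_km[j]) // bin_width_km)
            if k < n_bins:
                sums[k] += (residuals[i] - residuals[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * bin_width_km, sums[k] / (2.0 * counts[k]))
            for k in range(n_bins) if counts[k] > 0]

def gaussian_semivariogram(h_km, range_km, sill=1.0):
    """Equation 9.6: gamma(h) = a * (1 - exp(-3 h^2 / b^2))."""
    return sill * (1.0 - math.exp(-3.0 * h_km**2 / range_km**2))
```

The range b can then be chosen so that `gaussian_semivariogram` follows the empirical values returned by `empirical_semivariogram`, for example by trial and error or by least squares.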


It can be theoretically shown that the spatial correlation function ($\rho(h)$) for the normalized residuals can be computed from the semivariogram function as follows:

$$\hat{\gamma}(h) = a\,(1 - \rho(h)) \tag{9.7}$$

Hence, it can be seen that the correlations are completely defined by the semivariogram, which in turn, is a function only of the range. (The sill is known to equal 1, the variance of the normalized residuals for which the semivariogram is constructed.) Moreover, note from Equations 9.5 and 9.7 that a larger range implies a smaller rate of increase in $\hat{\gamma}(h)$ with h, and subsequently, a smaller rate of decay of correlation with separation distance.
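As a short worked step, substituting the Gaussian model of Equation 9.6 into Equation 9.7 gives the correlation function explicitly. The value at h = b is included as a check against the definition of the range.

```latex
\begin{align*}
\rho(h) &= 1 - \frac{\hat{\gamma}(h)}{a}
         = 1 - \left[1 - \exp\left(-\frac{3h^2}{b^2}\right)\right]
         = \exp\left(-\frac{3h^2}{b^2}\right), \\
\rho(b) &= e^{-3} \approx 0.05 .
\end{align*}
```

The correlation therefore falls to about 0.05 at a separation equal to the range, which is consistent with the semivariogram reaching 0.95a at h = b.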

9.4

Results and Discussion

9.4.1

Data source

The analyses performed in this study use ‘recorded’ wind-speed information from two hurricanes, namely, Hurricane Jeanne (2004) and Hurricane Frances (2004). In both cases, information about hurricane-related parameters such as central pressure, storm position, direction and translation speed is obtained from the six-hour position data provided by the HURDAT database [Jarvinen et al., 1984]. The ‘recorded’ wind-speed data are obtained from the Hurricane Research Division (HRD) H*Wind program [Powell et al., 1996]. The primary data come from the Air Force Reserves (AFRES) reconnaissance flight-level observations reduced from near 3 km to the surface with a boundary layer model [Powell, 1980]. Other data sources include ships, buoys, Coastal-Marine Automated Network (CMAN) observations, airport observations including Automated Surface Observing Stations (ASOS), and supplemental data collected after landfall from public and private sources [Powell and Houston, 1997]. Additional data over sea are collected by deploying ‘dropwindsondes’ from aircraft; these drift down on a parachute, measuring vertical profiles of pressure, temperature, humidity and wind as they fall [Aberson and Franklin, 1999]. The wind-speed data are quality controlled and processed to conform to a common framework for height of recording (10 m), exposure (open terrain) and averaging period (maximum sustained 1 minute wind speed) [Powell et al., 1996, Powell and Houston, 1996]. These data were then objectively analyzed with a technique based upon the spectral application of finite element representation (SAFER) method [Ooyama, 1987, Franklin et al., 1993] in order to obtain an interpolated grid of peak wind speeds. In past research, tests on the SAFER methodology have indicated that the technique correctly reproduced known surface wind fields based on the available wind observations [Houston et al., 1999].

9.4.2

Hurricane Jeanne (2004)

This section describes the uncertainties and the spatial correlation estimated using the recorded wind field from the 2004 Hurricane Jeanne. Hurricane Jeanne formed on September 13, 2004 and made its landfall and stayed over Florida on September 26. In this study, recorded hurricane and wind-field data collected between September 24-26 are used for the analysis. Figure 9.1a shows the observed maximum (over the duration of the hurricane) wind speeds during Hurricane Jeanne, and Figure 9.1b shows the maximum wind speeds predicted by the Batts et al. [1980] model. The HURDAT database only provides six hour hurricane-related data. In order to obtain a finer resolution of wind speeds over time, the six hour hurricane data are interpolated linearly to obtain 30 minute data, which are then used to predict (using the Batts et al. [1980] model) the wind speeds at every 30 minute interval and subsequently the peak wind speeds over the duration of the hurricane (Figure 9.1b). Figure 9.1c shows the residuals computed using Equation 9.1. As mentioned earlier, the Batts et al. [1980] model is chosen in this study primarily for its simplicity. It is, however, seen to predict wind speeds that are biased as a function of the closest distance of the site from the hurricane track (denoted di for site i). This is illustrated by Figure 9.2a, which shows the residuals as a function of the d’s. This plot indicates that, in general, the Batts et al. [1980] model under-predicts wind speeds at sites far away from the hurricane track, and over-predicts wind speeds at sites close to the hurricane track. The newer windspeed models have smaller biases on account of the availability of larger data sets and better model development techniques. Therefore, in order to prevent the Batts et al. [1980] model bias from affecting the uncertainty and the correlation estimates, a simple bias correction


Figure 9.1: Hurricane Jeanne: (a) Observed wind speeds (b) Predicted wind speeds (c) Residuals (d) Bias-corrected residuals.


Figure 9.2: Residuals and bias-corrected residuals versus closest distances from the hurricane track.

is performed before analyzing the residuals. Since the plot of the ε’s against the d’s (Figure 9.2a) shows a linear trend, a bias correction factor is obtained using a linear regression between ε and d. The bias correction is then added to the predicted wind speeds in order to eliminate the bias. Figure 9.2b shows the residuals obtained after the bias correction (denoted ε` in the rest of the chapter). Some minor local trends can still be seen between the residuals and the closest distances, but these are small compared to the overall trend seen in Figure 9.2a. (It might be possible to employ other bias correction techniques to completely eliminate the trends, but this is not done in this exploratory study. Further, this may not be necessary when using the more recent wind-speed models.) The scatter in Figure 9.2b is the bias-corrected ‘aleatory’ uncertainty in the wind-speed predictions. The histogram and the normal quantile-quantile (QQ) plot of the ε` values (the normal QQ plot is estimated after dividing the ε` values by their standard deviation) are shown in Figure 9.3. The figure shows that the residuals have a heavier upper tail than the normal distribution, but normality holds reasonably well up to a normalized ε` value of 2. In the rest of the chapter, the residuals are assumed to follow a normal distribution for simplicity during simulation, though this assumption should be verified using data from other recorded hurricanes and


Figure 9.3: (a) Histogram of bias-corrected residuals estimated using the Hurricane Jeanne data (b) Normal QQ plot of normalized bias-corrected residuals from Hurricane Jeanne.

particularly, using newer wind-speed models. The ε` values have mean zero (on account of the bias correction) and standard deviation 0.15, which roughly agrees with the coefficient of variation of 0.1 reported by Vickery et al. [2008] (noting that the coefficient of variation of the multiplicative error term defined by Vickery et al. [2008] is comparable to the standard deviation of ε, if the ε’s reasonably follow a normal distribution). This standard deviation is much smaller than that of the total residuals computed from ground-motion fields (∼0.6), but is not negligible. Figure 9.4 shows the semivariogram computed using the ε` values. The experimental semivariogram values are fitted using the Gaussian model shown in Equation 9.6, with a range of 170 km. The Gaussian fit is more appropriate here (as compared to the exponential fit used for earthquake ground-motion intensity semivariograms) on account of the smoothly-varying wind-speed residual field. The range of 170 km is chosen so that the fit is better at short distances (≤ 20 km), even if this requires some misfit with empirical data at large separation distances. As described in Chapter 3, this is because it is more important to model the semivariogram structure well at short separation distances. Large separation distances are associated with low correlations, which thus have relatively little effect on joint distributions of ground-motion intensities. In addition to having low correlation, widely separated sites also have little impact on each other due to an effective ‘screening’ of their


Figure 9.4: Semivariogram of bias-corrected residuals estimated using the Hurricane Jeanne data.

influence by more closely-located sites [Goovaerts, 1997]. It can be seen from Figure 9.4 that the extent of spatial correlation is much larger than that seen in the earthquake data [Jayaram and Baker, 2009a]. This is not surprising since hurricane wind speeds are less influenced by factors such as local heterogeneities that reduce the spatial correlation in ground-motion fields.
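The semivariogram calculation described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: Equation 9.6 is not reproduced in this section, so the sketch assumes the common geostatistical form of the Gaussian model, γ(h) = a[1 − exp(−3h²/b²)], with sill a and range b. The function names, the planar site coordinates and the binning scheme are hypothetical.

```python
import math

def gaussian_semivariogram(h, sill, range_km):
    """Assumed Gaussian model: gamma(h) = sill * (1 - exp(-3 h^2 / range^2))."""
    return sill * (1.0 - math.exp(-3.0 * h ** 2 / range_km ** 2))

def empirical_semivariogram(coords_km, residuals, bin_width_km, n_bins):
    """Method-of-moments estimate: half the mean squared difference of the
    residuals over all site pairs whose separation falls in each bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(residuals)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords_km[i], coords_km[j])  # planar distance (assumption)
            k = int(h // bin_width_km)
            if k < n_bins:
                sums[k] += (residuals[i] - residuals[j]) ** 2
                counts[k] += 1
    # None marks bins that contain no site pairs
    return [s / (2.0 * c) if c else None for s, c in zip(sums, counts)]

# Model values for the Jeanne fit: sill taken as the residual variance (0.15^2)
# and the 170 km range quoted in the text.
sill = 0.15 ** 2
model_values = [gaussian_semivariogram(h, sill, 170.0) for h in (0.0, 20.0, 170.0)]
```

In practice the range would be chosen by comparing the model curve with the binned empirical values, giving more weight to the short-distance bins as discussed above.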

9.4.3 Hurricane Frances (2004)

Hurricane Frances formed on August 24, 2004, and made landfall in Florida on September 4. In this study, recorded hurricane and wind-field data collected between September 4 and 6 are used for the analysis. The hurricane track data and the recorded wind speeds are obtained from the HURDAT database and the HRD, respectively. The peak wind speed predictions are obtained using the Batts et al. [1980] model. Figure 9.5 shows the bias-corrected residuals obtained from the Hurricane Frances recordings.
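The bias correction introduced in Section 9.4.2 and applied again here amounts to removing a fitted linear trend in distance from the residuals. The sketch below assumes that the residual enters Equation 9.1 additively (the equation is not reproduced in this section), so that adding the fitted trend to the prediction is equivalent to subtracting it from the residual. The function names and the numerical values are hypothetical and are not taken from the Jeanne or Frances data.

```python
def fit_linear_trend(d, eps):
    """Ordinary least squares fit of eps ~ intercept + slope * d."""
    n = len(d)
    d_mean = sum(d) / n
    e_mean = sum(eps) / n
    sxx = sum((x - d_mean) ** 2 for x in d)
    sxy = sum((x - d_mean) * (y - e_mean) for x, y in zip(d, eps))
    slope = sxy / sxx
    intercept = e_mean - slope * d_mean
    return intercept, slope

def bias_corrected_residuals(d, eps):
    """Subtract the fitted distance trend from each residual."""
    intercept, slope = fit_linear_trend(d, eps)
    return [y - (intercept + slope * x) for x, y in zip(d, eps)]

# Hypothetical closest distances to the track (km) and raw residuals:
# negative near the track (over-prediction), positive far from it.
distances_km = [10.0, 50.0, 100.0, 150.0, 200.0]
raw_residuals = [-0.20, -0.08, 0.01, 0.12, 0.25]
corrected = bias_corrected_residuals(distances_km, raw_residuals)
```

By construction, the corrected residuals have zero mean and no remaining linear trend with distance, which is why the text reports a mean of zero for the bias-corrected residuals.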


Figure 9.5: Bias-corrected residuals estimated using the Hurricane Frances data.

Figure 9.6: (a) Histogram of bias-corrected residuals estimated using the Hurricane Frances data (b) Normal QQ plot of normalized bias-corrected residuals from Hurricane Frances.


Figure 9.7: Semivariogram of bias-corrected residuals estimated using the Hurricane Frances data.

Figure 9.6 shows the histogram and the normal QQ plot of the ε` values. The QQ plot indicates that the residuals are heavier tailed than the normal distribution beyond plus or minus two standard deviations. The ε` values have mean zero (on account of the bias correction) and standard deviation 0.13. Figure 9.7 shows the semivariogram computed using the ε` values. The experimental semivariogram values are fitted using the Gaussian model shown in Equation 9.6, with a range of 130 km.

9.4.4 Hurricane risk assessment of a hypothetical portfolio of buildings

This section describes the simulation-based hurricane risk assessment of a hypothetical portfolio of buildings, and illustrates the importance of modeling the uncertainties and the spatial correlation in the wind fields for obtaining accurate risk estimates. The portfolio considered here consists of five two-story residential buildings (gable roof, 6d roof sheathing nails, shingle roof cover, wood frames, two-nailed roof/wall connections, no garage)


Figure 9.8: Portfolio of five residential buildings considered in the risk assessment.

located in Palm Bay, Florida, at coordinates [-80.6,28], [-80.7,28], [-80.5,27.9], [-80.6,27.9] and [-80.5,27.8] (Figure 9.8). The replacement value of each building is assumed to be $1,000,000. It is of interest in this study to evaluate the exceedance rates of post-Hurricane Jeanne losses to this portfolio. The steps involved in the risk assessment procedure are described below.

• Step 1: Using the wind-speed model of Batts et al. [1980], the parameters of Hurricane Jeanne obtained from HURDAT, and the bias correction of Section 9.4.2, the wind speeds are predicted at all the sites of interest.
• Step 2: The residuals (ε in Equation 9.1) at the sites of interest are assumed to follow a multivariate normal distribution with mean zero and standard deviation 0.15, based on the findings in this work. The spatial correlation is defined by the Gaussian model in Equation 9.6 with a range of 150 km. The residuals are simulated at the sites of interest using this distribution. The simulation approach is described in detail in Chapter 4.
• Step 4: The predicted wind speeds and the simulated residuals are combined using Equation 9.1 to obtain realizations of the wind speeds at all the sites of interest (i.e., a simulated wind field).
• Step 5: The building losses due to each simulated wind field are evaluated using damage functions provided by HAZUS [2006]. These damage functions provide an estimate of the mean damage ratio (ratio of loss sustained to the replacement cost) as a function of the peak gust wind speed experienced during the hurricane (in this case,


the simulated peak wind speed at the building site). It is to be noted that the damage functions provided by HAZUS [2006] are deterministic [Vickery et al., 2009a]. For the purposes of this exploratory study these deterministic damage functions are used, but more realistic damage functions can be used with this framework if desired.
• Step 6: Obtain the probability of exceedance of various monetary loss values. The exceedance probabilities are calculated as follows:

\[ P(L \geq l) = \frac{1}{n} \sum_{i=1}^{n} I(L_i \geq l) \tag{9.8} \]

where P(L ≥ l) is the probability that the loss equals or exceeds l, n denotes the number of simulated wind fields, Li is the monetary loss associated with wind field i, and I(Li ≥ l) is an indicator variable that equals one if Li equals or exceeds l and zero otherwise. It is to be noted that the steps described above do not include steps 1 and 2 listed in Section 3.2, since the risk assessment carried out post-Hurricane Jeanne does not require the simulation of hurricane paths and other hurricane-related parameters (the recorded Hurricane Jeanne parameters are used directly).

The exceedance probabilities obtained for the portfolio are shown in Figure 9.9. Also shown in the figure are the exceedance probabilities obtained by ignoring the spatial correlation between the residuals when performing the simulation in Step 2. It can be seen that ignoring the spatial correlation results in an overestimation of the probability of exceeding small losses and an underestimation of the probability of exceeding large losses. The extent of the overestimation and the underestimation will be smaller if the uncertainties in the damage function are considered, but the risk estimates will nevertheless be inaccurate. Several past risk assessments have completely ignored the uncertainty in the wind fields (i.e., predicted wind speeds are used, and the residuals are ignored). The loss estimate obtained in this deterministic case is shown by the vertical line in Figure 9.9. It is seen that the loss estimate at moderately large probabilities of exceedance can be significantly smaller than some of the probable loss estimates obtained when the residuals and the correlations are considered.
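The simulation steps and Equation 9.8 can be illustrated with a short standard-library Python sketch. Several ingredients are assumptions made only for illustration and are not taken from the study: the Gaussian correlation is taken as ρ(h) = exp(−3h²/b²), Equation 9.1 is assumed to combine prediction and residual multiplicatively (predicted speed times exp(ε)), the predicted gust speeds are invented, and `toy_damage_ratio` is a hypothetical stand-in for the HAZUS damage functions.

```python
import math
import random

# Building coordinates (lon, lat) from the text; each is valued at $1,000,000
SITES = [(-80.6, 28.0), (-80.7, 28.0), (-80.5, 27.9), (-80.6, 27.9), (-80.5, 27.8)]

def distance_km(p, q):
    """Great-circle (haversine) distance between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def correlation_matrix(sites, range_km):
    """Assumed Gaussian correlation model: rho(h) = exp(-3 h^2 / range^2)."""
    return [[math.exp(-3.0 * distance_km(p, q) ** 2 / range_km ** 2) for q in sites]
            for p in sites]

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def simulate_residuals(L, sigma, rng):
    """One realization of zero-mean, spatially correlated normal residuals."""
    z = [rng.gauss(0.0, 1.0) for _ in L]
    return [sigma * sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]

def toy_damage_ratio(speed):
    """Hypothetical damage function (NOT the HAZUS curve): linear from 30 to 70 m/s."""
    return min(1.0, max(0.0, (speed - 30.0) / 40.0))

def exceedance_probability(losses, threshold):
    """Equation 9.8: fraction of simulated losses that reach the threshold."""
    return sum(1 for x in losses if x >= threshold) / len(losses)

rng = random.Random(0)
corr = correlation_matrix(SITES, 150.0)
L_corr = cholesky(corr)
predicted = [45.0] * len(SITES)  # hypothetical predicted peak gusts (m/s)

losses = []
for _ in range(10000):
    eps = simulate_residuals(L_corr, 0.15, rng)
    speeds = [v * math.exp(e) for v, e in zip(predicted, eps)]  # assumed form of Eq. 9.1
    losses.append(sum(1_000_000 * toy_damage_ratio(v) for v in speeds))

p_exceed = exceedance_probability(losses, 3_000_000)
```

Replacing `L_corr` with the Cholesky factor of an identity matrix gives the uncorrelated case, which allows the two exceedance curves to be compared in the manner of Figure 9.9.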


Figure 9.9: Portfolio loss exceedance probabilities.

9.5 Limitations and research needs

This section lists some of the challenges and needs in hurricane risk assessment research, and discusses the limitations of the approach proposed in this chapter. One of the primary concerns in developing empirical models related to hurricane wind speeds is the availability of reliable wind speed data from past hurricanes for use in model development. The HRD H*Wind program partly alleviates these concerns by processing data from a multitude of sources, including low-flying aircraft, ships, buoys, airport observations and other public and private data sources. In addition, the use of dropwindsondes improves the overall data quality over sea. Nevertheless, boundary layer models are used to convert the collected data to a common framework for height of recording (10 m), exposure (open terrain) and averaging period (maximum sustained 1-minute wind speed), and interpolation algorithms (SAFER) are used to estimate wind speeds over a grid of points. Therefore, the wind fields developed by HRD are not entirely empirical, but rather involve the use of additional algorithms which can have an impact on the data quality. The current study uses recordings from only two hurricanes for quantifying the hurricane wind-speed uncertainties and spatial correlations. The Batts et al. [1980]


wind-speed model was chosen primarily for its simplicity, but it has known limitations. For instance, the model does not consider the reduction in wind speeds attributable to land friction, but rather assumes a constant 15% reduction in the wind speed when the hurricane enters the land from the sea. This is likely to result in correlated prediction errors at neighboring sites with similar levels of land friction, which will increase the estimated value of spatial correlation. In the future, the uncertainties and the spatial correlations need to be estimated using data from additional hurricanes, using a newer and more rigorous wind-speed model such as that of Vickery et al. [2008]. In this study, a deterministic damage function obtained from HAZUS [2006] was used for the illustrative hurricane risk assessment. A probabilistic damage function that captures the uncertainties in the losses during a hurricane should be used in future work. This will give a better estimate of the importance of considering wind-field uncertainties and spatial correlation in hurricane risk assessments. The illustrative risk assessment carried out in this work estimated the risk of a portfolio of buildings. Further research is required to estimate the hurricane-based risk of lifelines such as transportation networks. The risk assessment did not involve simulation of hurricane tracks, but rather only estimated the risk given that Hurricane Jeanne had occurred. The current research considered only the wind hazard during the hurricanes, and did not consider the flood and storm surge hazards.

9.6 Conclusions

An exploratory study was carried out to investigate the extension of the seismic hazard and risk assessment concepts and techniques discussed in the earlier chapters to hurricane hazard and risk modeling. The study focused on quantifying the uncertainties and the spatial correlation in hurricane wind fields (using techniques that were used to quantify these parameters in earthquake ground motion fields), and evaluating their impact on the hurricane risk of spatially-distributed systems. Hurricane wind-speed predictions were obtained for two sample hurricanes, Hurricane Jeanne and Hurricane Frances, using the Batts et al. [1980] wind-speed model, and the uncertainties in these predictions were evaluated using actual wind-speed recordings. The wind-speed residuals had a standard deviation of approximately 0.15, indicating that the uncertainties are not negligible. The wind-speed


uncertainties at two sites were seen to be correlated, with the correlation decaying as a Gaussian function of the separation between the sites. Finally, the impact of the wind-speed uncertainties and the spatial correlation on the hurricane risk of a spatially-distributed system was illustrated by a sample risk assessment of a hypothetical portfolio of buildings. It was seen that ignoring the uncertainties or the correlations results in an overestimation of the probability of exceedance of small losses and an underestimation of the probability of exceedance of large losses.

Chapter 10 Conclusions

This study focused on developing a computationally-efficient framework for the seismic risk assessment of lifelines (infrastructure systems). Two important challenges in the seismic risk assessment of a lifeline as compared to that of a single structure are the quantification of the ground-motion hazard over a region rather than at just a single site and the minimization of the computational burden associated with lifeline performance evaluations (Figure 10.1). Contributions have been made in both of these areas. The following subsections briefly summarize the important findings of this work, the limitations of this work and suggested future work related to this thesis.

10.1 Contributions and practical implications

10.1.1 Joint distribution of spectral acceleration values at different sites and/or different periods

Risk assessment of spatially-distributed building portfolios or infrastructure systems requires assumptions regarding the joint distribution of the ground-motion intensity measures at multiple sites during the same earthquake. Chapter 2 of this thesis discussed statistical tests that were used to examine the commonly-used assumptions of univariate normality of logarithmic spectral acceleration values and multivariate normality of vectors of logarithmic spectral acceleration values computed at different sites and/or different periods. Joint

Figure 10.1: Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines.


normality of logarithmic spectral accelerations was verified by testing the multivariate normality of inter-event and intra-event residuals. Univariate normality of inter-event and intra-event residuals was studied using normal Q-Q plots. The normal Q-Q plots showed strong linearity, indicating that the residuals are marginally well represented by a normal distribution. No evidence was found to support truncation of the marginal distribution of intra-event residuals, as is sometimes done in PSHA. Using the Henze-Zirkler test, Mardia’s test of skewness and Mardia’s test of kurtosis, it was shown that the inter-event and the intra-event residuals at a site, computed at different periods, follow multivariate normal distributions. The normality test of Goovaerts was used to illustrate that pairs of spatially-distributed intra-event residuals can be represented by the bivariate normal distribution. For a set of observed spatially-distributed data, it is practically impossible to ascertain trivariate normality and normality at higher dimensions; hence, the presence of univariate and bivariate normality was considered to indicate multivariate normality of the spatially-distributed intra-event residuals [Goovaerts, 1997].
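The Q-Q check of univariate normality summarized above can be sketched as follows. This is an illustrative calculation only: the sample is synthetic rather than drawn from recorded residuals, the plotting positions (i + 0.5)/n are one common convention among several, and the use of the correlation coefficient of the Q-Q points as a linearity measure is an assumption of this sketch, not a statement of the tests used in Chapter 2.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

def normal_qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(sample)
    mu, sd = mean(sample), pstdev(sample)
    observed = sorted((x - mu) / sd for x in sample)  # normalized order statistics
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed

def qq_correlation(theoretical, observed):
    """Pearson correlation of the Q-Q points; values near 1 indicate linearity."""
    mt, mo = mean(theoretical), mean(observed)
    num = sum((t - mt) * (o - mo) for t, o in zip(theoretical, observed))
    den = math.sqrt(sum((t - mt) ** 2 for t in theoretical)
                    * sum((o - mo) ** 2 for o in observed))
    return num / den

rng = random.Random(2)
sample = [rng.gauss(0.0, 0.6) for _ in range(2000)]  # synthetic stand-in for residuals
theoretical, observed = normal_qq_points(sample)
r = qq_correlation(theoretical, observed)
```

Departures from linearity in the tails of the Q-Q points, rather than the single summary value `r`, are what would reveal heavy tails or truncation.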

10.1.2 Spatial correlation model for spectral accelerations

The ground-motion models that are used for site-specific hazard analysis do not provide information on the spatial correlation between ground-motion intensities, which is required for the joint prediction of intensities at multiple sites. Chapter 3 described a spatial correlation model that has been developed from recorded ground-motion time histories using geostatistical tools. The correlation decreases with increasing separation between the sites, and this correlation structure can be modeled using semivariograms. A semivariogram is a measure of the average dissimilarity between the data, whose functional form, sill and range uniquely identify the ground-motion correlation as a function of separation distance. Ground motions observed during the Northridge, Chi-Chi, Big Bear City, Parkfield, Alum Rock, Anza and Chino Hills earthquakes were used to compute the correlations between spatially-distributed spectral accelerations, at various spectral periods. It was seen that the rate of decay of the correlation with separation typically decreases with increasing spectral period. It was reasoned that this could be because long period ground motions at two


different sites tend to be more coherent than short period ground motions, on account of lesser wave scattering during propagation. It was also observed that, at periods longer than 2 seconds, the estimated correlations were similar for all the earthquake ground motions considered. At shorter periods, however, the correlations were found to be related to the site Vs30 values. It was shown that the clustering of site Vs30 values suggests larger correlations between residuals. The work also investigated the commonly-used assumption of isotropy, and it was seen using the empirical data that the correlations between Chi-Chi and Northridge earthquake intensities are isotropic. Based on these findings, a predictive model was developed that can be used to select appropriate correlation estimates for use in risk assessment of spatially-distributed building portfolios or infrastructure systems.

Chapter 4 described additional tests that were carried out using simulated ground-motion time histories [Aagaard et al., 2008] to verify the validity of commonly-used assumptions in spatial correlation models such as stationarity (invariance of correlation with spatial location) and isotropy (directional independence). The correlations were estimated using different orientations of the time histories, namely, fault normal, fault parallel, north-south and east-west, and were found to be similar in all four cases. The assumption of isotropy of spatial correlations was studied using directional semivariograms, and was found to be reasonable. The correlations were seen to be smaller than average between sites located extremely close to the fault rupture. Intuitively, it is reasonable to expect path effects and small-scale variations to reduce spatial correlation between ground motions at near-fault sites. The pulse-identification algorithm of Baker [2007a] was used for identifying pulse-like ground motions, and the correlations between pulse-like and non-pulse-like ground motions were compared.
For the data set used, no significant differences were found between the correlations in these two cases.

10.1.3 Lifeline seismic risk assessment using efficient sampling and data reduction techniques

Chapter 6 discussed an efficient Monte Carlo simulation (MCS)-based framework based on importance sampling and K-means clustering that has been proposed for the seismic risk assessment of lifelines. The framework can be used for developing a small, but


stochastically-representative, catalog of ground-motion intensity maps that can be used for performing lifeline risk assessments. The importance sampling technique was used to preferentially sample important ground-motion intensity maps, and the K-means clustering technique was used to identify and combine redundant maps. It was shown theoretically and empirically that the risk estimates obtained using these techniques are unbiased. The proposed framework was used to evaluate the exceedance rates of various travel-time delays on an aggregated form of the San Francisco Bay Area transportation network. Simplified transportation network analysis models were used to illustrate the feasibility of the proposed framework. The exceedance rates were obtained using a catalog of 150 maps generated using the combination of importance sampling and K-means clustering, and were shown to be in good agreement with those obtained using the conventional MCS method. Therefore, the proposed techniques can potentially reduce the computational expense of an MCS-based risk assessment by several orders of magnitude, making it practically feasible. The study also showed that the proposed framework automatically produces intensity maps that are hazard consistent. Finally, the study showed that the uncertainties in ground-motion intensities and the spatial correlations between ground-motion intensities at multiple sites need to be modeled in order to avoid introducing errors into the lifeline risk calculations.

Appendix C described lifeline loss deaggregation calculations that were used to identify the ground-motion scenarios most likely to produce exceedance of a given loss threshold for a spatially-distributed lifeline system. Deaggregation calculations were performed to identify the likelihoods of earthquake events that cause various levels of travel-time delays (the lifeline loss measure) in an aggregated form of the San Francisco Bay Area transportation network.
The deaggregation calculations indicated that the ‘most-likely’ scenario depends on the loss level of interest, and is influenced by factors such as the seismicity of the region, the location of the lifeline with respect to the faults and the performance state of the various components of the lifeline under normal operating conditions. The calculations also showed that large losses are typically caused by moderately large magnitude events with large values of inter-event and intra-event residuals, indicating the importance of accounting for the residuals in the loss assessment framework. Loss assessments carried out without accounting for either the inter-event or the intra-event residuals produce biased


loss estimates.
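The idea behind the importance sampling step summarized in this subsection, namely sampling preferentially from the region that matters and then reweighting so that the estimate remains unbiased, can be shown with a one-dimensional toy problem. The sketch below is not the sampling scheme of Chapter 6: the standard normal variable is a hypothetical stand-in for a quantity such as a normalized residual, and the shifted sampling density is chosen only for illustration.

```python
import math
import random

def normal_pdf(x, mean=0.0):
    """Density of a unit-variance normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def importance_sampling_tail(threshold, shift, n, rng):
    """Estimate P(X >= threshold) for X ~ N(0, 1) using samples from N(shift, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)  # draw from the sampling density
        if x >= threshold:
            # importance weight = target density / sampling density
            total += normal_pdf(x) / normal_pdf(x, shift)
    return total / n

rng = random.Random(1)
estimate = importance_sampling_tail(3.0, 3.0, 100_000, rng)
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # closed-form tail probability
```

Because about half of the shifted samples land beyond the threshold, far fewer samples are needed than with direct sampling from N(0, 1), where only about one sample in 740 would exceed it.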

10.1.4 Lifeline performance assessment using statistical learning techniques

MCS and its variants are well suited for characterizing ground motions and computing resulting losses to lifelines, but are highly computationally intensive because they involve repeated evaluations of lifeline performance under a large number of simulated ground-motion intensity maps. Chapter 7 explored the use of a statistical learning technique termed Multivariate Adaptive Regression Trees (MART) to obtain an approximate relationship between the ground-motion intensities at lifeline component locations and the lifeline performance. The lifeline performance predicted by this relationship can be used in place of the actual lifeline performance (the evaluation of which is computationally intensive) to expedite the computation of several lifeline risk-related parameters. The study illustrated this approach by developing a MART-based relationship between the ground-motion intensities at bridge locations and the network travel times in the San Francisco Bay Area transportation network, and using it for estimating confidence intervals for the risk estimates presented in Chapter 6. It was seen that the confidence intervals obtained using the actual and the approximate performance measures match well. More generally, these approximate performance relationships can be used in other problems, such as prioritizing lifeline retrofits, whose computational demand stems from the need for repeated performance evaluations.

10.1.5 Seismic risk assessment of spatially-distributed systems using ground-motion models fitted considering spatial correlation

Even though the risk estimates in Chapter 6 were obtained considering spatial correlation, the ground-motion models that were used to predict the distribution of ground-motion intensities were fitted assuming independence between the intra-event residuals. Chapter 8 illustrated the impact of considering spatial correlation between intra-event residuals while developing ground-motion models. The mixed-effects algorithm of Abrahamson


and Youngs [1992], which assumes independence between intra-event residuals, was modified to account for the spatial correlation between the intra-event residuals. This was done by changing the likelihood function used for estimating the inter-event and the intra-event residual variances given other model coefficients and changing the estimate of the inter-event residual given the total residuals at multiple sites. The modified algorithm was used to refit the Campbell and Bozorgnia [2008] ground-motion model, to illustrate the effect of this refinement. The variance of the total residuals and the model coefficients used for predicting the median ground-motion intensity were not significantly affected by the proposed refinement. Significant changes, however, were seen in the variance of the intra-event and the inter-event residuals. Incorporating spatial correlation was seen to increase the intra-event residual variance and to decrease the inter-event residual variance. These changes have implications for risk assessments of spatially-distributed systems because a smaller inter-event residual variance implies a lesser likelihood of simultaneously observing larger-than-median ground-motion intensities at all sites in a region. To demonstrate this effect, a risk assessment was performed for a hypothetical portfolio of buildings using the ground-motion models obtained with and without accounting for spatial correlation. The results showed that using the published inter- and intra-event variance estimates causes an overestimation of the exceedance rates of large losses. Ground-motion hazard and seismic risk calculations at individual locations are unaffected by this issue.
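The link between the variance split and regional risk can be made explicit with a short derivation. It is a sketch under simplifying assumptions that are introduced here for illustration: the inter-event residual η (standard deviation τ) is common to all sites in an earthquake, the intra-event residuals ε_i (standard deviation σ, the same at all sites) are independent of η, and their spatial correlation is ρ(h_ij) at separation h_ij. The symbols η, τ, σ, ρ and ρ_T are notation of this sketch.

```latex
\begin{align}
\operatorname{Var}(\eta + \varepsilon_i) &= \tau^2 + \sigma^2 \\
\operatorname{Cov}(\eta + \varepsilon_i,\ \eta + \varepsilon_j) &= \tau^2 + \rho(h_{ij})\,\sigma^2 \\
\rho_T(h_{ij}) &= \frac{\tau^2 + \rho(h_{ij})\,\sigma^2}{\tau^2 + \sigma^2}
\end{align}
```

At a single site only the total variance τ² + σ² matters, which is consistent with single-site calculations being unaffected. Across sites, the total correlation ρ_T approaches τ²/(τ² + σ²) as ρ(h_ij) decays to zero, so for a fixed total variance a smaller τ² lowers the correlation between widely separated sites and hence the likelihood of jointly large intensities over a region.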

10.1.6 Extension of proposed ground-motion modeling approaches to hurricane risk assessment

Frameworks for the risk assessment of structures and infrastructure systems under a variety of natural and man-made hazards share many similarities. It is conceivable, therefore, that the techniques developed for the risk assessment under one type of natural or man-made hazard will be applicable for the risk assessment under another hazard or multi-hazard scenario. Chapter 9 described an exploratory study carried out to investigate the extension of the seismic hazard and risk assessment concepts and techniques discussed in the earlier chapters to hurricane hazard and risk modeling. The study focused on quantifying the uncertainties and the spatial correlation in hurricane wind fields (using techniques that were


used to quantify these parameters in earthquake ground motion fields), and evaluating their impact on the hurricane risk of spatially-distributed systems. Hurricane wind-speed predictions were obtained for two sample hurricanes, Hurricane Jeanne and Hurricane Frances, using the Batts et al. [1980] wind-speed model, and the uncertainties in these predictions were evaluated using actual wind-speed recordings. The wind-speed residuals had a standard deviation of approximately 0.15, indicating that the uncertainties are not negligible. The wind-speed uncertainties at two sites were seen to be correlated, with the correlation decaying as a Gaussian function of the separation between the sites. Finally, the impact of the wind-speed uncertainties and the spatial correlation on the hurricane risk of a spatially-distributed system was illustrated by a sample risk assessment of a hypothetical portfolio of buildings. It was seen that ignoring the uncertainties or the correlations results in an overestimation of the probability of exceedance of small losses and an underestimation of the probability of exceedance of large losses.

10.2 Limitations and future work

10.2.1 Spatial correlation model for spectral accelerations

Chapter 3 presented a spatial correlation model for spectral accelerations developed using recorded ground motions. The model was developed assuming stationarity (location independence) of correlations. Tests for stationarity carried out in Chapter 4 using simulated time histories showed that the correlation between spectral accelerations at two sites that are close to the rupture (within 10km) is smaller than the correlation between spectral accelerations at sites that are farther away from the rupture. This is probably because path effects and small-scale variations near the rupture reduce the spatial correlation between ground-motion intensities at near-fault sites. This implies that the assumption of stationarity does not completely hold. In the future, it is important to verify this observation using recorded ground-motion time histories, and develop a correlation model for near-fault sites, if required. The tests for isotropy indicated that the assumption of isotropy is reasonable on average. It might be possible that the correlations are stronger along certain directions in certain locations even if not on average. For instance, it might be reasonable to expect


strong correlation between residuals in the direction of propagation of waves, particularly at near-fault sites [e.g., Walling, 2009]. This needs to be investigated in more detail in the future.

While developing the correlation models, priority is placed on building models that fit the empirical data well at short distances, even if this requires some misfit with empirical data at large separation distances, because it is more important to model the semivariogram structure well at short separation distances. This is because widely separated sites have little impact on each other due to an effective ‘screening’ of their influence by more closely-located sites [Goovaerts, 1997]. In special cases where there are very few closely spaced points (fewer than 10, according to Goovaerts [1997]), the influence of farther away points will not be completely screened. In such cases, the correlation model developed in this study might provide slightly inaccurate correlation estimates. It is, however, to be noted that this is mitigated by the fact that large separation distances are associated with low correlations, which thus have relatively little effect on joint distributions of ground-motion intensities.

The correlation studies carried out in Chapters 3 and 4 treated the residuals as random variables. In reality, though, the residuals are related to physical effects that are unmeasured or (completely or partially) unaccounted for, such as directivity, basin effects and local site effects. If these physical effects are directly modeled in the risk assessments (as part of the mean ground-motion intensity prediction), the models for spatial variability and spatial correlation should be modified accordingly. The study in Chapter 3 identified an empirical link between the extent of spatial correlation and the local-site conditions.
The links between spatial correlation and other site- and earthquake-related parameters such as magnitude and faulting mechanism were not investigated on account of the limited availability of well-processed recorded ground-motion data sets. In the future, these links can possibly be investigated using simulated ground motions.

Chapters 3 and 4 focused on simultaneously estimating the spatial correlations between spectral accelerations at the same period. There are scenarios where spectral accelerations at multiple periods (or in general, multiple intensity measures) need to be used for assessing the lifeline risk, in which case the consideration of spatial cross-correlations (correlations between two different intensity measures) becomes important. This scenario arises, for instance, when the risk assessment is carried out for a portfolio comprising structures with different fundamental periods, in which case the risk assessment is based on spectral accelerations at multiple periods. For instance, damage to tall buildings is better predicted using spectral accelerations at a long period (say, $T_1$), whereas damage to short buildings due to ground shaking is more correlated with spectral accelerations at a short period (say, $T_2$). In such cases, the spatial cross-correlation between $\varepsilon_i(T_1)$ and $\varepsilon_j(T_2)$ should be considered in order to account for the likelihood of observing jointly large spectral accelerations at sites $i$ and $j$ (the same way spatial correlations should be considered when the spectral accelerations at a single period are used at both sites). Appendix A provides a brief summary of the technical framework behind the estimation of spatial cross-correlation. In the future, it is important to develop a cross-correlation model between spectral accelerations at different periods using recorded and simulated time histories.

Chapter 8 described a ground-motion model fitting algorithm that can be used for developing ground-motion models considering the spatial correlation. This algorithm estimates the inter-event and the intra-event residual variances using a maximum likelihood framework. Future work can focus on estimating spatial correlation and cross-correlation in addition to these variances while fitting ground-motion models, by extending the current maximum likelihood framework to consider the correlation as an unknown parameter as well.

10.2.2


The MCS-based lifeline seismic risk assessment framework proposed in this study was illustrated by assessing the seismic risk of an aggregated model of the San Francisco Bay Area transportation network. An aggregated network was used because analyzing the performance of a network as large and complex as the San Francisco Bay Area transportation network under a large number of scenarios (which would be required while implementing the benchmark conventional MCS framework) is extremely computationally intensive. Network aggregation has been used in several fields of research while assessing the performance of large complex networks such as social networks, the internet and transportation networks. The performance of an aggregated higher-scale network is then used for decision making on the actual lower-scale network. Future work by the author will further explore the opportunity to develop new methods to systematically aggregate networks, particularly for risk assessment purposes, for obtaining significant computational savings.

The importance sampling (IS) technique proposed in Chapter 6 involves sampling large values of inter-event and intra-event residuals in order to capture the upper tail of the lifeline loss curve accurately. The technique requires determination of the means (referred to as mean-shifts in Section 6.3.3) of the inter- and intra-event residual sampling distributions. Large values of the means will result in realizations of large values of inter- and intra-event residuals (i.e., realizations from the upper tail). It is, however, important to choose a reasonable value for the means to ensure adequate preferential sampling of large residuals, while avoiding sets of extremely large residuals that are so improbable as to be irrelevant. Chapter 6 also provided guidance on choosing the sampling means based on the network under consideration. Using larger or smaller than optimal means could increase the variance of the risk estimates, and additional research is required to investigate this effect in detail. It is to be noted that the K-means data reduction algorithm (applied after the importance sampling step as described in Section 6.5) eliminates redundant importance sampling maps and therefore has the potential to retain a good proportion of maps that are relevant to the risk assessment even if IS samples inefficiently.
This, however, needs to be verified in detail through additional research.

The current study did not consider the dependence between component performances, which may arise between two components constructed by the same contractor (similar workmanship and material quality) and/or subjected to similar material degradation due to natural environmental fluctuations over time [Lee and Kiremidjian, 2007]. The current study also did not consider the deterioration of the structural performance of lifeline components with time.

The risk assessment framework is reasonably general, and can potentially be used to estimate the risk of a variety of lifelines. The current work, however, only considered a single isolated infrastructure system (a transportation network). There has been significant research interest of late in the risk assessment of multiple infrastructure systems (such as power distribution networks and water distribution networks), with consideration of the interdependencies between the different systems. MCS-based risk assessment frameworks are conducive to modeling interdependencies, but additional research is required to investigate this further.

This study used a rate-independent earthquake hazard model provided by USGS [2003]. Background seismicity was not considered. Additional research is required to incorporate rate-dependent models and background seismicity in the ground-motion sampling procedure described in Chapter 6. Further, future studies by the author will use seismicity models provided in more recent USGS reports.

The primary objective of the transportation network risk assessment described in Chapter 6 is to illustrate the effectiveness of the proposed efficient risk assessment framework. A simplified transportation network model was used for evaluating pre-event and post-event network performances. These simplifications are identified below.

The current work assumed for simplicity that the post-earthquake demands equal the pre-earthquake demands, even though this is known not to be true [Kiremidjian et al., 2003]. The changes in network performance after an earthquake were assumed to be due only to the delay and the rerouting of traffic caused by structural damage to bridges. The damage states of the bridges were computed considering only the ground shaking, and other possible damage mechanisms such as liquefaction and landslides were not considered. The development of the cross-correlation model discussed earlier will allow the consideration of multiple types of intensity measures that are required for estimating the damage considering secondary hazards such as liquefaction and landslides.
The bridge fragility curves provided by HAZUS [1999] were used to estimate the probability of a bridge being in a particular damage state (no damage, slight damage, moderate damage, extensive damage and collapse) based on the simulated ground-motion intensity (spectral acceleration at 1 second) at the bridge site. It was assumed that the bridge fragility functions can be used to analyze even long-span bridges such as the Golden Gate Bridge, which may not provide realistic results. Ongoing work by the author focuses on developing a procedure to incorporate the use of ground-motion time histories (instead of only ground-motion intensities) in the risk assessment framework.

Damaged bridges cause reduced capacity in the link containing the bridge. The study assumed the reduced capacities corresponding to the five different HAZUS damage states to be 100% (no damage), 75% (slight damage/moderate damage) and 50% (extensive damage/collapse). (It is to be noted that the current study did not model the possible increase in the free-flow travel times on damaged links.) The non-zero capacity corresponding to the bridge collapse damage state may seem surprising at first glance. This was based on the argument that there are alternate routes (apart from the freeways and highways considered in the aggregated model used in this study) that provide reduced access to transportation services in the event of a freeway or a highway closure [Shiraki et al., 2007]. Such redundancies are prevalent in most transportation networks, but the precise impact of the redundancy on the capacity of the links in the aggregated model should be studied in more detail in the future.

A network can have several bridges in a single link, and in such cases, the link capacity is a function of the damage to all the bridges in the link. The current work assumed that the link capacity reduction equals the average of the capacity reductions attributable to each bridge in the link. This is a simplification, and further research is needed to handle the presence of multiple bridges in a link.

The post-earthquake network performance was then computed by solving the user-equilibrium problem using the new set of link capacities, and a new estimate of the total travel time in the network was obtained. It is to be noted that the current work estimated the performance of the network only immediately after an earthquake. The changes in the performance with network component restorations were not considered here for simplicity.
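The capacity assumptions described above can be sketched in a few lines of Python. This is only an illustration of the averaging rule stated in the text; the function name, the dictionary and the damage-state labels are hypothetical and are not taken from the study's implementation.

```python
# Hypothetical names throughout; only the percentages come from the text.
# Remaining capacity (fraction of pre-event capacity) per HAZUS damage state.
RESIDUAL_CAPACITY = {
    "none": 1.00,
    "slight": 0.75,
    "moderate": 0.75,
    "extensive": 0.50,
    "collapse": 0.50,
}


def link_residual_capacity(bridge_states):
    """Return the remaining capacity fraction of a link.

    The capacity reduction of the link is taken as the average of the
    reductions attributable to each bridge on the link. A link without
    bridges is assumed here to keep its full capacity.
    """
    if not bridge_states:
        return 1.0
    reductions = [1.0 - RESIDUAL_CAPACITY[state] for state in bridge_states]
    return 1.0 - sum(reductions) / len(reductions)


# Example: a link with three bridges in different damage states.
print(link_residual_capacity(["none", "slight", "collapse"]))
```

The resulting fractions would then scale the pre-event link capacities before the user-equilibrium problem is solved again.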
It is to be noted that the framework can be used with more accurate and rigorous transportation network models, if desired, but more work is needed to study and overcome challenges that may arise.
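The trade-off in choosing the importance sampling mean-shift, discussed earlier in this section, can be illustrated with a one-dimensional toy problem. The sketch below is not the procedure of Chapter 6: it assumes a single standard normal residual, a hypothetical threshold of 3, and hypothetical function and variable names, and it estimates a tail probability by sampling from a normal distribution with a shifted mean and reweighting each sample by the likelihood ratio.

```python
import math
import random


def is_tail_probability(threshold, mean_shift, n_samples, seed=0):
    """Estimate P(Z > threshold) for Z ~ N(0, 1) by importance sampling.

    Samples are drawn from N(mean_shift, 1) and reweighted by the ratio of
    the target density to the sampling density.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mean_shift, 1.0)
        if x > threshold:
            # phi(x) / phi(x - m) = exp(-m * x + m^2 / 2)
            total += math.exp(-mean_shift * x + 0.5 * mean_shift ** 2)
    return total / n_samples


# Exact tail probability of a standard normal, for comparison.
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))

# Compare no shift, a shift near the threshold, and an excessive shift.
for shift in (0.0, 3.0, 8.0):
    print(shift, is_tail_probability(3.0, shift, 20000), exact)
```

With no shift, very few samples fall in the tail. With an excessive shift, the samples that dominate the estimate carry extremely small weights. Both situations tend to inflate the variance of the estimate, which is the effect the text identifies as needing further study.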

10.2.3

Risk management

One of the important goals of lifeline risk assessment is risk management, by, for example, retrofitting lifeline components in order to reduce the adverse impact of earthquakes on the lifeline. Prioritizing lifeline retrofit is extremely computationally intensive due to the numerous components present in a lifeline, and on account of the need to evaluate the performance of each possible retrofit scheme under several possible future earthquake scenarios. The computational demand can be reduced using the efficient MCS-based framework proposed in this work (Chapter 6), in combination with the use of statistical learning techniques to efficiently model network performance (Chapter 7). This needs to be explored in the future.

10.2.4

Multi-hazard risk assessment

In order to illustrate the application of the proposed ground-motion modeling techniques to modeling other hazards, the seismic hazard and risk assessment concepts and techniques discussed in this thesis were applied to hurricane hazard and risk modeling in Chapter 9. (It is to be noted that the hurricane was the only alternate hazard considered in this work. It is possible that the challenges that arise in extending the seismic hazard concepts to modeling other hazards may vary from one hazard to another, and further research is needed to investigate this.) Many simplifying modeling assumptions were made in this exploratory study on hurricane risk assessment. The primary simplifying assumptions include the use of data from only two hurricanes, the use of the simplified Batts et al. [1980] model (which does not consider the reduction in wind speeds attributable to land friction) and the use of the deterministic HAZUS [2006] fragility function. There are also some concerns about the quality of the wind speed recordings provided by the NOAA Hurricane Research Division H*Wind program. A more detailed discussion of the limitations and potential future work connected to the current study can be found in Chapter 9.

10.3

Concluding remarks

The study quantified the distribution of earthquake ground-motion intensities over a region, which is required for the risk assessment of lifelines. A computationally-efficient Monte Carlo sampling technique was proposed to evaluate the lifeline seismic risk with full consideration of the uncertainties and the spatial correlation in ground-motion fields.


Given the effectiveness of the framework when applied to the simplified lifeline model used here, future research appears warranted to study its use with more realistic lifeline models, and extend it to quantify the risk of multiple interdependent infrastructure systems under other hazard and multi-hazard scenarios. Further research is also necessary to utilize the framework for prioritizing risk-mitigation solutions.

Appendix A

Characterizing spatial cross-correlation between ground-motion spectral accelerations at multiple periods

N. Jayaram and J.W. Baker (2010). Spatial cross-correlation between ground-motion intensities, 9th U.S. National and 10th Canadian Conference on Earthquake Engineering, Toronto, Canada.

A.1

Abstract

Quantifying ground-motion shaking over a spatially-distributed region rather than at just a single site is of interest for a variety of applications relating to risk of infrastructure or portfolios of properties. The risk assessment for a single structure can be easily performed using the available ground-motion models that predict the distribution of the ground-motion intensity at a single site due to a given earthquake. These models, however, do not provide information about the joint distribution of ground-motion intensities over a region, which is required to quantify the seismic hazard at multiple sites. In particular, the ground-motion models do not provide information on the correlation between the ground-motion intensities at different sites during a single event. Researchers have previously estimated the correlations between residuals of spectral
accelerations at the same spectral period at two different sites. But there is still not much knowledge about cross-correlations between residuals of spectral accelerations at different periods (or more generally between residuals of two different intensity measures) at two different sites, which becomes important, for instance, when assessing the risk of a portfolio of buildings with different fundamental periods. Spatial cross-correlations are also important when assessing the risk due to multiple ground-motion effects such as ground shaking and liquefaction, because this involves the use of multiple types of intensity measures. This manuscript summarizes recent research in ground-motion spatial cross-correlation estimation using geostatistical tools. Recorded ground-motion intensities are used to compute residuals at multiple periods, which are then used to estimate the spatial cross-correlation. These cross-correlation estimates can then be used in risk assessments of portfolios of structures with different fundamental periods, and in assessing the seismic risk under multiple ground-motion effects.

A.2

Introduction

Quantifying ground-motion shaking over a spatially-distributed region rather than at just a single site is of interest for a variety of applications relating to risk of infrastructure or portfolios of properties. For instance, knowledge of ground-motion shaking over a region is important to predict (or estimate after an earthquake) the monetary losses associated with structures insured by an insurance company, the number of casualties in a certain area and the probability that lifeline networks for power, water, and transportation may be interrupted.

The risk assessment for a single structure requires only the quantification of seismic hazard at a single site, which can be easily done using probabilistic seismic hazard analysis (PSHA). When the damage to a building is to be estimated, the hazard is typically measured in terms of an intensity measure such as the spectral acceleration corresponding to the building's fundamental period (the peak response of a simple single-degree-of-freedom (SDOF) oscillator with the same fundamental period as the real structure). Other ground-motion parameters such as the peak ground acceleration (PGA) or peak ground velocity (PGV) are used for other applications such as the prediction of liquefaction of saturated sandy soil or the response of buried pipelines. The hazard assessment procedure uses ground-motion models that have been developed to predict the distribution of the ground-motion intensity at a single site after a given earthquake. These models, however, do not provide information on the joint distribution of ground-motion intensities over a region, which is required to quantify the seismic hazard at multiple sites such as for lifeline risk assessment. In particular, the ground-motion models do not provide information on the correlation between the ground-motion intensities at different sites during a single event.

In general, the ground-motion intensities at two sites are expected to be correlated for a variety of reasons, such as a common source earthquake (whose unique properties may cause correlations in ground motions at many sites), similar locations to fault asperities, similar wave propagation paths, and similar local-site conditions. Modern ground-motion models partially account for the correlation via a specific inter-event term as follows:

$$\ln\left(S_{a_i}(T)\right) = \ln\left(\bar{S}_{a_i}(T)\right) + \sigma_i(T)\,\varepsilon_i(T) + \tau_i(T)\,\eta_i(T) \tag{A.1}$$

where $S_{a_i}(T)$ denotes the spectral acceleration at period $T$ at site $i$; $\bar{S}_{a_i}(T)$ denotes the predicted (by the ground-motion model) median spectral acceleration (which depends on parameters such as magnitude, distance, period and local-site conditions); $\varepsilon_i(T)$ denotes the normalized intra-event residual at site $i$ associated with $S_{a_i}(T)$; and $\eta_i(T)$ denotes the normalized inter-event residual associated with $S_{a_i}(T)$. Both $\varepsilon_i(T)$ and $\eta_i(T)$ are random variables with zero mean and unit standard deviation. The standard deviations, $\sigma_i(T)$ and $\tau_i(T)$, are estimated as part of the ground-motion model and are functions of the spectral period ($T$) of interest, and in some models also functions of the earthquake magnitude and the distance of the site from the rupture. The term $\sigma_i(T)\varepsilon_i(T)$ is called the intra-event residual, and the term $\tau_i(T)\eta_i(T)$ is called the inter-event residual.

Though the ground-motion models partly account for the correlation via $\eta_i$, the $\varepsilon_i$'s still show a significant amount of residual correlation. Researchers have previously estimated the correlations between residuals of spectral accelerations at the same spectral period (e.g., between $\varepsilon_i(T)$ and $\varepsilon_j(T)$) using recorded ground motions [e.g., Boore et al., 2003, Wang and Takada, 2005, Goda and Hong, 2008, Jayaram and Baker, 2009a]. These models have shown that the spatial correlation decays with the separation distance between sites $i$ and $j$, and that the rate of decay is a function of the spectral period. These works, however, do not investigate the nature of the spatial cross-correlation between residuals of two different intensity measures at two different sites (e.g., between $\varepsilon_i(T_1)$ and $\varepsilon_j(T_2)$).

Considering spatial correlation in risk analysis is important because correlation between residuals can lead to large ground-motion intensities over a spatially-extended area. Recent research has shown that ignoring spatial correlations can significantly underestimate the seismic risk of portfolios of buildings and of lifelines such as transportation networks [e.g., Park et al., 2007, Jayaram and Baker, 2010]. For instance, Figure A.1 shows the exceedance rates of earthquake-induced travel-time delays in the San Francisco Bay Area transportation network estimated by Jayaram and Baker [2010] while considering/ignoring spatial correlation. This figure shows that the likelihood of observing large delays gets significantly underestimated when spatial correlations are ignored. Spatial cross-correlations are equally important when multiple intensity measures are used for assessing the system risk. This arises, for instance, when predicting damage to a portfolio of structures whose individual damage states are predicted using spectral accelerations at multiple periods. Spatial cross-correlations are also important when secondary effects such as landslides and liquefaction are considered apart from ground shaking. For instance, according to HAZUS [1997], the susceptibility of soil to liquefy is a function of the peak ground acceleration (i.e., $S_a(0)$) at the site, which might be different from the primary intensity measure ($S_a(T)$) of interest.

This manuscript summarizes recent research in ground-motion spatial cross-correlation estimation using geostatistical tools. Recorded ground-motion intensities are used to compute residuals at multiple periods, which are then used to estimate the spatial cross-correlation.
These cross-correlation estimates can then be used in risk assessments of portfolios of structures with different fundamental periods, and in assessing the seismic risk under multiple earthquake effects.
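To make the role of the two residual terms in Equation A.1 concrete, the following sketch simulates logarithmic spectral accelerations at two sites for many hypothetical earthquakes. It is an illustration only: the values of the standard deviations and of the intra-event correlation are invented, both sites are assumed to share the same standard deviations, the residuals are assumed to be normal, and the function names are hypothetical. The inter-event residual is shared by both sites within an event, so the total residuals remain correlated even apart from the intra-event correlation.

```python
import math
import random


def simulate_event(ln_median, sigma, tau, rho, rng):
    """Draw ln(Sa) at two sites for one earthquake, following Eq. A.1.

    eta is common to both sites (inter-event residual); eps_1 and eps_2 are
    standard normal intra-event residuals with correlation rho.
    """
    eta = rng.gauss(0.0, 1.0)
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    eps_1 = z1
    eps_2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
    return (ln_median[0] + sigma * eps_1 + tau * eta,
            ln_median[1] + sigma * eps_2 + tau * eta)


def sample_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# All numerical values below are hypothetical.
SIGMA, TAU, RHO = 0.5, 0.3, 0.6
rng = random.Random(1)
draws = [simulate_event((0.0, 0.0), SIGMA, TAU, RHO, rng) for _ in range(20000)]
site_1 = [d[0] for d in draws]
site_2 = [d[1] for d in draws]

# Correlation of the total residuals implied by Eq. A.1 under these assumptions.
expected = (TAU ** 2 + RHO * SIGMA ** 2) / (TAU ** 2 + SIGMA ** 2)
print(sample_correlation(site_1, site_2), expected)
```

The simulated correlation should be close to the analytical value, which exceeds the intra-event correlation because of the shared inter-event term.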

A.3

Statistical Estimation of Spatial Cross-Correlation

In this study, geostatistical tools are used to estimate the spatial cross-correlations using recorded ground-motion data from the Pacific Earthquake Engineering Research (PEER) Center’s Next Generation Attenuation (NGA) ground-motion library.


Figure A.1: (a) The San Francisco Bay Area transportation network and (b) Annual exceedance rates of various travel time delays on that network (results from Jayaram and Baker [2010]).

The first step involved in developing an empirical cross-correlation model using recorded ground-motion time histories is to use the time histories to compute the corresponding ground-motion intensities ($\{\mathbf{S}_a(T_1), \mathbf{S}_a(T_2), \cdots, \mathbf{S}_a(T_m)\}$) and the associated normalized residuals ($\{\boldsymbol{\varepsilon}(T_1), \boldsymbol{\varepsilon}(T_2), \cdots, \boldsymbol{\varepsilon}(T_m)\}$) using a ground-motion model. The cross-correlation structure of the residuals can then be represented by a 'cross-semivariogram', which is a measure of the average dissimilarity between the data [Goovaerts, 1997]. Let $u$ and $u'$ denote two sites separated by $\mathbf{h}$. The cross-semivariogram ($\gamma(u, u')$) is defined as follows:

$$\gamma(u, u') = \frac{1}{2} E\left[\{\varepsilon_{u}(T_1) - \varepsilon_{u'}(T_1)\}\{\varepsilon_{u}(T_2) - \varepsilon_{u'}(T_2)\}\right] \tag{A.2}$$

The cross-semivariogram defined in Equation A.2 is location-dependent and its inference requires repetitive realizations of $\varepsilon(T_1)$ and $\varepsilon(T_2)$ at locations $u$ and $u'$. Such repetitive measurements are, however, never available in practice (e.g., in the current application, one would need repeated observations of ground-motion intensities at every pair of sites of interest). Hence, it is typically assumed that the cross-semivariogram does not depend on site locations $u$ and $u'$, but only on their separation $\mathbf{h}$ to obtain a stationary cross-semivariogram.


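The stationary cross-semivariogram ($\gamma(\mathbf{h})$) can then be estimated as follows:

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\{\varepsilon_{u}(T_1) - \varepsilon_{u+\mathbf{h}}(T_1)\}\{\varepsilon_{u}(T_2) - \varepsilon_{u+\mathbf{h}}(T_2)\}\right] \tag{A.3}$$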

A stationary cross-semivariogram is said to be isotropic if it is a function of the separation distance ($h = \|\mathbf{h}\|$) rather than the separation vector $\mathbf{h}$. An isotropic, stationary semivariogram can be empirically estimated from a data set as follows:

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \{\varepsilon_{u_\alpha}(T_1) - \varepsilon_{u_\alpha+h}(T_1)\}\{\varepsilon_{u_\alpha}(T_2) - \varepsilon_{u_\alpha+h}(T_2)\} \tag{A.4}$$

where $\hat{\gamma}(h)$ is the experimental stationary semivariogram (estimated from a data set); $N(h)$ denotes the number of pairs of sites separated by $h$; and $\{\varepsilon_{u_\alpha}(T), \varepsilon_{u_\alpha+h}(T)\}$ denotes the $\alpha$'th such pair.

The covariance structure of $\boldsymbol{\varepsilon}(T)$ is completely specified by the semivariogram function and the sill and the range of the cross-semivariogram. It can be theoretically shown that the following relationship can be used to estimate the cross-correlations from the cross-semivariograms:

$$\gamma(h) = \rho_{12}(0) - \rho_{12}(h) \tag{A.5}$$

where $\rho_{12}(0)$ denotes the cross-correlation between $\varepsilon_u(T_1)$ and $\varepsilon_u(T_2)$ at the same site $u$ and $\rho_{12}(h)$ denotes the cross-correlation between $\varepsilon_u(T_1)$ and $\varepsilon_{u+h}(T_2)$. Therefore, it would suffice to estimate the cross-semivariogram of the residuals in order to determine their cross-correlations. The correlation term $\rho_{12}(0)$ has been estimated in the past [e.g., Baker and Jayaram, 2008, Baker and Cornell, 2006], and this work extends these results to include the effects of differing locations as well.

Once the cross-semivariogram values are obtained at discrete values of $h$, they are then fitted using a continuous function of $h$ for prediction purposes. In this work, the discrete cross-semivariogram values are fitted with an exponential semivariogram which has the following form:

$$\gamma(h) = S\left[1 - e^{-3h/R}\right] \tag{A.6}$$

where $S$ and $R$ denote the sill and the range of the cross-semivariogram respectively. The value of the sill equals $\rho_{12}(0)$ (from Equations A.5 and A.6), and the range denotes the separation distance at which the cross-correlation decays to less than 5% of the sill. Since the values of $\rho_{12}(0)$ have been previously computed by Baker and Jayaram [2008], it will suffice to estimate the range $R$ to quantify the extent of spatial cross-correlation.
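The experimental estimator in Equation A.4 can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes planar site coordinates in kilometres, groups site pairs into distance bins of equal width (a common practical choice, since few pairs are separated by exactly $h$), and uses hypothetical function and argument names.

```python
import math


def cross_semivariogram(sites, eps_t1, eps_t2, bin_width, n_bins):
    """Experimental isotropic cross-semivariogram (Eq. A.4) with distance bins.

    sites   : list of (x, y) site coordinates in km
    eps_t1  : normalized residuals at period T1, one per site
    eps_t2  : normalized residuals at period T2, one per site
    Returns a list of (bin-centre distance, gamma_hat, number of pairs).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for a in range(len(sites)):
        for b in range(a + 1, len(sites)):
            h = math.dist(sites[a], sites[b])
            k = int(h // bin_width)
            if k >= n_bins:
                continue  # pair is farther apart than the largest bin
            sums[k] += (eps_t1[a] - eps_t1[b]) * (eps_t2[a] - eps_t2[b])
            counts[k] += 1
    result = []
    for k in range(n_bins):
        if counts[k] > 0:
            centre = (k + 0.5) * bin_width
            result.append((centre, sums[k] / (2 * counts[k]), counts[k]))
    return result


# Tiny made-up data set: three sites and their residuals at two periods.
sites = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
eps_t1 = [1.0, 0.0, -1.0]
eps_t2 = [0.5, 0.5, -0.5]
print(cross_semivariogram(sites, eps_t1, eps_t2, bin_width=3.5, n_bins=2))
```

The binned values would then be fitted with the exponential form of Equation A.6, with the sill fixed at $\rho_{12}(0)$ and only the range $R$ adjusted.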

A.4

Sample Results and Discussion

This section discusses some sample cross-correlation estimates obtained using recorded time histories from the 1999 Chi-Chi earthquake. In particular, spatial cross-correlation estimates are computed for the 1 second and the 2 second spectral acceleration residuals from the Chi-Chi earthquake ground motions using the geostatistical procedure described in the earlier section. These residuals are first computed from the recorded ground motions using the Boore and Atkinson [2008] ground-motion model, and are shown in Figure A.2a-b. Visually, the presence of spatial cross-correlation is indicated by the similarity between the nearby residuals across Figures A.2a and A.2b. Figure A.2c shows the cross-semivariogram estimated using the above-mentioned residuals. An exponential function is then fitted to the discrete cross-semivariogram values, the sill of which equals 0.7490 (which is the $\rho_{12}(0)$ value obtained from Baker and Jayaram [2008]). The range of the cross-semivariogram equals 47 km, and has been chosen to provide a good fit at short separation distances, although this compromises the quality of the fit at larger separation distances. This is because it is more important to model the cross-semivariogram structure well at short separation distances, since large separation distances are associated with low correlations, which thus have relatively little effect on joint distributions of ground-motion intensities. In addition to having low correlation, widely separated sites also have little impact on each other due to an effective 'screening' of their influence by more closely located sites [Goovaerts, 1997]. A more detailed discussion on the importance of fitting well at short separation distances can be found in Jayaram and Baker [2009a]. The sample cross-semivariogram in Figure A.2 shows that the extent of spatial cross-correlation is reasonably significant.
For instance, the value of the cross-correlation equals 0.4 for sites separated by 10 km and increases up to 0.75 for sites that are very close to one another. As a result, it will likely be important to consider spatial cross-correlations while studying multiple types of intensity measures distributed over a region. Currently, the author is in the process of developing a spatial cross-correlation model considering the residuals from multiple intensity measures using recordings from multiple earthquakes.
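The cross-correlation values quoted in this section follow from combining Equations A.5 and A.6 with the fitted sill (0.7490) and range (47 km). A short sketch of that arithmetic is given below; the function and constant names are hypothetical, while the numerical parameters are those reported above.

```python
import math


def cross_semivariogram_exp(h, sill, range_km):
    """Exponential cross-semivariogram of Eq. A.6."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))


def cross_correlation(h, sill, range_km):
    """Eq. A.5 rearranged: rho_12(h) = rho_12(0) - gamma(h), with rho_12(0) = sill."""
    return sill - cross_semivariogram_exp(h, sill, range_km)


SILL = 0.7490      # rho_12(0) for the 1 s and 2 s residuals
RANGE_KM = 47.0    # fitted range for the Chi-Chi residuals

for h in (0.0, 10.0, 47.0):
    print(h, round(cross_correlation(h, SILL, RANGE_KM), 3))
```

At a separation equal to the range, the cross-correlation has dropped to a fraction $e^{-3}$ (just under 5%) of the sill, which is consistent with the definition of the range given in Section A.3.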

A.5

Conclusions

This manuscript summarized recent research in ground-motion spatial cross-correlation estimation using geostatistical tools. Spatial cross-correlations become important while quantifying the distribution of different types of ground-motion intensity measures over a region. This work used cross-semivariograms to model the cross-correlation structure. A cross-semivariogram is a measure of dissimilarity between the data, whose functional form (e.g., exponential function), sill and range uniquely identify the ground-motion cross-correlation as a function of separation distance. In this work, recorded ground-motion spectral accelerations were used to compute residuals at multiple periods, which were then used to estimate the spatial cross-correlation. The manuscript showed sample cross-correlation estimates obtained using the 1 s and 2 s Chi-Chi earthquake residuals. The extent of the cross-correlation was found to be fairly significant, and hence, it will likely be important to consider spatial cross-correlations while studying the distribution of multiple types of intensity measures over a region. Currently, the authors are in the process of developing a spatial cross-correlation model considering the residuals from multiple intensity measures using recordings from multiple earthquakes. Once developed, these cross-correlation estimates can be used in risk assessments of portfolios of structures with different fundamental periods, and in assessing the seismic risk under multiple ground-motion effects.


Figure A.2: (a) Chi-Chi earthquake normalized residuals computed using spectral accelerations at 1 second (b) Chi-Chi earthquake normalized residuals computed using spectral accelerations at 2 seconds (c) Cross-semivariogram estimated using the 1s and 2s Chi-Chi earthquake residuals.

Appendix B

Supporting details for the spatial correlation model developed in Chapter 3

Excerpted from:

J.W. Baker and N. Jayaram (2009). Effects of spatial correlation of ground-motion parameters for multi-site risk assessment: Collaborative research with Stanford University and AIR. Technical report, Report for U.S. Geological Survey National Earthquake Hazards Reduction Program (NEHRP) External Research Program Award 07HQGR0031.

(Professor Baker was the first author of the above report as the Principal Investigator of this project, but all the results and the writing in this appendix were produced by the author of this thesis)

In Chapter 3, several statements were made about properties of spectral acceleration spatial correlations that were not explained in detail in the text. In this Appendix, details to support those statements are presented for interested readers.

B.1

Semivariograms of residuals estimated using the Northridge earthquake ground motions

Chapter 3 discussed the semivariogram ranges at seven periods ranging from 0 to 10s estimated using the Northridge earthquake recordings. Figures B.1-B.7 show the semivariograms and the exponential fits obtained in these cases.

Figure B.1: Semivariogram of $\tilde{\varepsilon}$ based on the peak ground accelerations observed during the Northridge earthquake

Figure B.2: Semivariogram of $\tilde{\varepsilon}$ computed at 0.5 seconds based on the Northridge earthquake data

Figure B.3: Semivariogram of $\tilde{\varepsilon}$ computed at 1 second based on the Northridge earthquake data


Figure B.4: Semivariogram of $\tilde{\varepsilon}$ computed at 2 seconds based on the Northridge earthquake data



Figure B.6: Semivariogram of $\tilde{\varepsilon}$ computed at 7.5 seconds based on the Northridge earthquake data



B.2

Semivariograms of residuals estimated using Chi-Chi earthquake ground motions

Chapter 3 discussed the semivariogram ranges at seven periods ranging from 0 to 10s estimated using the Chi-Chi earthquake recordings. This section shows the semivariograms and the exponential fits obtained in these cases.

B.2.1

Exact versus approximate semivariogram fit

Figure B.8 shows the experimental semivariogram values at discrete separation distances, obtained using the $\tilde{\varepsilon}$ values computed at 2 seconds. The most accurate model for the semivariogram function is a combination of a nugget effect with a contribution of 0.3 and an exponential semivariogram with a contribution of 0.7 and a range of 85 km, which is also shown in Figure B.8. This model can be expressed as follows:

$$\gamma(h) = 0.3\, I(h > 0) + 0.7\left(1 - \exp(-3h/85)\right) \tag{B.1}$$

where I(h > 0) is an indicator variable that equals 1 when h > 0 and equals 0 otherwise. The use of a single model for all semivariograms is highly desirable in order to facilitate development of a standard correlation model for use in future predictions. The exponential model is seen to be accurate in most cases and hence, an approximate exponential model is fitted even in cases where alternate accurate models are available. Hence, the semivariogram function for the ε˜ values computed at 2 seconds is approximated by an exponential model with a range of 36 km and a sill of 1, as shown in Figure B.8. This semivariogram function fits the data reasonably well at small separations.
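The two models discussed above can be compared numerically. The following is a minimal sketch, assuming separation distances in km; the function names are illustrative and do not come from the report.

```python
import math


def gamma_nugget_exponential(h, nugget=0.3, partial_sill=0.7, range_km=85.0):
    """Equation B.1: nugget effect plus exponential semivariogram."""
    if h <= 0:
        return 0.0  # the indicator I(h > 0) removes the nugget at zero separation
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / range_km))


def gamma_exponential(h, sill=1.0, range_km=36.0):
    """Approximate exponential semivariogram with a unit sill."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))


# Tabulate both models at a few separation distances (km).
for h in (1, 5, 10, 20, 50, 100):
    exact = gamma_nugget_exponential(h)
    approx = gamma_exponential(h)
    print(f"h = {h:>3} km  exact = {exact:.3f}  approximate = {approx:.3f}")
```

The factor of 3 in the exponent means that each exponential term reaches about 95% of its sill at a separation equal to the stated range.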

B.2.2

Semivariograms of the residuals at seven periods ranging between 0 and 10s

Figures B.9-B.15 show the semivariograms and the exponential fits obtained using the Chi-Chi earthquake records.


Figure B.8: Experimental semivariogram of ε̃ computed at 2 seconds based on the Chi-Chi earthquake data. Also shown in the figure are two fitted semivariogram models: (i) An accurate exponential + nugget model and (ii) An approximate exponential model

Figure B.9: Semivariogram of ε̃ based on the peak ground accelerations observed during the Chi-Chi earthquake

Figure B.10: Semivariogram of ε̃ computed at 0.5 seconds based on the Chi-Chi earthquake data

Figure B.11: Semivariogram of ε̃ computed at 1 second based on the Chi-Chi earthquake data

Figure B.12: (Approximate) Semivariogram of ε̃ computed at 2 seconds based on the Chi-Chi earthquake data

Figure B.13: Semivariogram of ε̃ computed at 5 seconds based on the Chi-Chi earthquake data

Figure B.14: Semivariogram of ε̃ computed at 7.5 seconds based on the Chi-Chi earthquake data

Figure B.15: Semivariogram of ε̃ computed at 10 seconds based on the Chi-Chi earthquake data


B.3

Semivariograms of residuals estimated using broadband simulations for scenario earthquakes on the Puente Hills thrust fault system

Chapter 3 discussed only the correlations estimated using recorded ground motions. This section describes the correlations between residuals computed from broadband ground-motion simulations for scenario earthquakes on the Puente Hills thrust fault system [Graves, 2006], which are not discussed in any of the earlier chapters. The simulated time histories are available for five different rupture scenarios that differ in the rupture velocity and the rise time. In this work, the analysis uses the ground motions from the rupture scenario defined by a rupture velocity equaling 80% of the shear wave velocity and a rise time of 1.4 seconds. The ground-motion time histories have been simulated at 648 sites covering the Los Angeles, San Fernando and San Gabriel basin regions. The time histories at locations with very low Vs30 values, however, were reported to be possibly inaccurate because the simulation algorithm does not yet fully account for non-linear site effects [Graves, 2007]. Hence, in the current work, only the time histories at sites with Vs30 values exceeding 300 m/s are considered for analysis.

Experimental semivariograms are obtained for ε̃ values computed at several different periods ranging from 0 to 10 seconds. The exponential model is found to provide a good fit at periods below 2 seconds. At longer periods, however, a spherical model provides a better fit than an exponential model. For example, Figure B.16 shows the experimental semivariogram and a fitted spherical model (unit sill and range equaling 32 km) based on residuals computed at 5 seconds:

\[ \gamma(h) = \begin{cases} \dfrac{3}{2}\left(\dfrac{h}{32}\right) - \dfrac{1}{2}\left(\dfrac{h}{32}\right)^{3} & \text{if } h \le 32 \\ 1 & \text{otherwise} \end{cases} \tag{B.2} \]

As explained earlier, for consistency with other results, exponential models that provide a reasonably good approximation at short separation distances (that are useful in practice) are used to model the semivariograms. For example, the experimental semivariogram can also be fitted with an exponential model which has a unit sill and a range of 60 km, as shown in Figure B.16. It can be seen from the figure that this exponential function models the correlations at small separations reasonably accurately. A plot of the range of semivariograms as a function of period is shown in Figure B.17. The trend of increasing range with period is seen in this figure as well. The computed ranges are reasonably similar to those seen from the Northridge earthquake data. It is to be noted, however, that the ground-motion simulations at short periods (periods ≤ 2 seconds) may not be entirely accurate, and hence, the ranges obtained using the Northridge and Chi-Chi earthquake data are more reliable estimates.

Figure B.16: Experimental semivariogram of ε̃ computed at 5 seconds based on the simulated ground-motion data. Also shown in the figure are two fitted semivariogram models: (i) An accurate spherical model and (ii) An approximate exponential model
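The closeness of the two fitted models at short separations can be checked with a short calculation. The sketch below assumes unit sills, distances in km, and the usual relation between correlation and a unit-sill semivariogram, ρ(h) = 1 − γ(h); the function names are illustrative.

```python
import math


def gamma_spherical(h, range_km=32.0):
    """Spherical semivariogram with unit sill (the form of Equation B.2)."""
    if h >= range_km:
        return 1.0
    r = h / range_km
    return 1.5 * r - 0.5 * r ** 3


def gamma_exponential(h, range_km=60.0):
    """Exponential semivariogram with unit sill."""
    return 1.0 - math.exp(-3.0 * h / range_km)


# Compare the implied correlations, rho(h) = 1 - gamma(h), at several separations.
for h in (2, 5, 10, 20, 32, 60):
    rho_sph = 1.0 - gamma_spherical(h)
    rho_exp = 1.0 - gamma_exponential(h)
    print(f"h = {h:>2} km  spherical rho = {rho_sph:.3f}  exponential rho = {rho_exp:.3f}")
```

The two models stay close for separations of a few kilometers and diverge as the separation approaches the spherical range, beyond which the spherical model implies zero correlation.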


Figure B.17: Range of semivariograms of ε̃, as a function of the period at which ε̃ values are computed. The residuals are obtained using the simulated ground-motion data

B.4

Clustering of Vs30’s

The semivariogram model described in Chapter 3 involves determining the presence of Vs30 clustering (Section 3.4.5). This is best done by computing the range of the Vs30 semivariogram and comparing it to the ranges shown in Figure 3.5. In order to provide additional guidance to users of the correlation model, Figure B.18 shows simulated multivariate normal random fields with three different levels of correlation, with mean and variance equaling those of the Vs30 values in the San Francisco Bay Area. The correlation structure in Figure B.18a is defined by an exponential semivariogram with a range of 0 km, and is an example of heterogeneous Vs30 conditions (Case 1 in Section 3.4.5). This is indicated by the lack of clustering of the Vs30 values in the figure. The correlation structure in Figure B.18b is defined by an exponential semivariogram with a range of 20 km, and is an example of low-to-moderately heterogeneous conditions (Case 1 in Section 3.4.5). Figure B.18c is an example of homogeneous Vs30 conditions, and has a correlation structure defined by an exponential semivariogram with a range of 40 km (Case 2 in Section 3.4.5). The clustering of the Vs30 field in the region of interest (where the region is defined as the collection of sites of interest) can be compared to that of the three maps to approximately determine the appropriate case to use.

Figure B.18: Simulated multivariate normal random fields. The correlation structure is defined using an exponential semivariogram with range equaling (a) 0 km, (b) 20 km and (c) 40 km.
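Random fields of this kind can be generated as follows. This is a minimal sketch, not the procedure used to produce Figure B.18: the grid, the mean of 400 m/s and the standard deviation of 150 m/s are placeholder assumptions, and the correlation implied by a unit-sill exponential semivariogram is taken as ρ(h) = exp(−3h/range).

```python
import numpy as np


def simulate_vs30_field(coords_km, range_km, mean, std, rng):
    """Draw one multivariate normal field with exponential spatial correlation."""
    n = len(coords_km)
    # Pairwise separation distances between all sites (km).
    dist = np.linalg.norm(coords_km[:, None, :] - coords_km[None, :, :], axis=-1)
    if range_km > 0:
        corr = np.exp(-3.0 * dist / range_km)
    else:
        corr = np.eye(n)  # a range of 0 km means spatially uncorrelated values
    # A small diagonal term guards against round-off in the factorization.
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(n))
    return mean + std * (chol @ rng.standard_normal(n))


# Hypothetical 15 x 15 grid of sites with 2 km spacing.
x, y = np.meshgrid(np.arange(15) * 2.0, np.arange(15) * 2.0)
coords = np.column_stack([x.ravel(), y.ravel()])
rng = np.random.default_rng(0)
for range_km in (0.0, 20.0, 40.0):
    field = simulate_vs30_field(coords, range_km, mean=400.0, std=150.0, rng=rng)
    print(f"range = {range_km:>4} km, simulated {field.size} values")
```

Because the field is normal rather than lognormal, individual simulated values can in principle be negative; that is acceptable for a visual comparison of clustering but would need care in any other use.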


B.5

Correlation between near-fault ground-motion intensities

Most currently available ground-motion models do not directly predict ground motions containing strong velocity pulses, such as those caused by near-fault directivity. As a result, the ground-motion intensities predicted by the models at sites that experience pulse-like ground motions will differ from the observed values. Such systematic prediction errors can increase the apparent correlation between the residuals computed at these sites. Hence, in this section, empirical data are used to verify whether the correlation between residuals at sites experiencing pulse-like ground motion is significantly different from the correlation between residuals at other sites.

Baker [2007a] used wavelet analysis to extract velocity pulses from ground motions and developed a quantitative criterion for classifying a ground motion as pulse-like. Ninety-one large-velocity pulses were found in the fault-normal components of the approximately 3500 strong ground-motion recordings in the PEER NGA Database [2005]. It should be noted that not all of these pulses may be due to directivity effects, but this provides a reasonable data set for studying the potential impact of directivity. Of these, 30 pulses were found in the fault-normal components of the Chi-Chi earthquake recordings, while the rest of the earthquakes have far fewer recordings with pulses. In the current work, the pulse-like ground motions from the Chi-Chi earthquake are used to compute ε̃ values at different periods. The semivariograms of the residuals are obtained and compared to those estimated using all usable records (Section 3.4.3). Figures B.19-B.25 compare experimental semivariograms of residuals (at seven different periods) computed using pulse-like ground motions to experimental semivariograms of residuals computed using all usable ground motions. The figures show the experimental semivariogram values at short separation distances, which are of interest in practice.

Because fewer records are available, the experimental semivariograms obtained using pulse-like ground motions are less clearly defined than those obtained using all usable ground motions. Hence, it is difficult to fit robust models to the experimental semivariograms obtained using the pulse-like ground motions. As a result, the experimental semivariograms are compared directly, rather than through their fitted models and ranges.


It can be seen from Figures B.19-B.25 that the experimental semivariogram values obtained using the pulse-like ground motions are slightly lower than those obtained using all usable ground motions, particularly at separation distances below 10 km and at long periods (7.5 and 10 seconds). In other words, the ε̃ values obtained using pulse-like ground motions show slightly larger correlations than those obtained using all usable ground motions. This is consistent with expectations: the pulses from this earthquake typically have periods of approximately 7 seconds, so this is the period range that is expected to be most strongly influenced by directivity. The difference in the correlations is typically around 0.1, with a maximum value of approximately 0.2.

While the increased correlation between the residuals at sites experiencing pulse-like ground motions is expected, the difference in the correlation seems reasonably small. Moreover, it is to be noted that the source of this additional correlation is the systematic prediction errors caused by the ground-motion models at sites experiencing pulse-like ground motions. Hence, if ground-motion models that accurately account for directivity effects are developed, the correlations between near-fault ground-motion intensities can be expected to be similar to the correlations between ground-motion intensities at other sites. That is, the directivity effects are best addressed through refinements to ground-motion models, rather than refinements to correlation models.


Figure B.19: Comparison between the experimental semivariogram of ε̃ values computed using pulse-like ground motions and the experimental semivariogram of ε̃ values computed using all usable ground motions. The ε̃ values are computed from peak ground accelerations

Figure B.20: Comparison between the experimental semivariogram of ε̃ values computed using pulse-like ground motions and the experimental semivariogram of ε̃ values computed using all usable ground motions. The ε̃ values are obtained from spectral accelerations computed at 0.5 seconds

Figure B.21: Comparison between the experimental semivariogram of ε̃ values computed using pulse-like ground motions and the experimental semivariogram of ε̃ values computed using all usable ground motions. The ε̃ values are obtained from spectral accelerations computed at 1 second

Figure B.22: Comparison between the experimental semivariogram of ε̃ values computed using pulse-like ground motions and the experimental semivariogram of ε̃ values computed using all usable ground motions. The ε̃ values are obtained from spectral accelerations computed at 2 seconds

Figure B.24: Comparison between the experimental semivariogram of ε̃ values computed using pulse-like ground motions and the experimental semivariogram of ε̃ values computed using all usable ground motions. The ε̃ values are obtained from spectral accelerations computed at 7.5 seconds



B.6

Directional semivariograms estimated using the Northridge and the Chi-Chi earthquake records at various periods

Chapter 3 showed that the directional semivariograms obtained using the Northridge earthquake 2s residuals match reasonably well, thereby indicating that the use of an isotropic correlation model is reasonable. This section provides more empirical evidence (directional semivariograms obtained using Chi-Chi and Northridge earthquake recordings, considering residuals at three different periods) to support the assumption of isotropy.
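For readers who wish to reproduce this kind of check, a directional experimental semivariogram can be estimated as sketched below. The estimator is the standard average of half squared differences within lag bins; the azimuth convention (degrees clockwise from north, folded to 0-180), the angular tolerance of 22.5 degrees and the tiny data set are assumptions made for illustration and are not taken from Chapter 3.

```python
import numpy as np


def experimental_semivariogram(coords_km, resid, lag_edges_km,
                               azimuth_deg=None, tol_deg=22.5):
    """Half the mean squared difference of residual pairs, per lag bin.

    coords_km holds (east, north) site coordinates. If azimuth_deg is None,
    all pairs are used (omni-directional); otherwise only pairs whose
    orientation lies within tol_deg of the azimuth are used.
    """
    i, j = np.triu_indices(len(resid), k=1)
    dx = coords_km[j, 0] - coords_km[i, 0]
    dy = coords_km[j, 1] - coords_km[i, 1]
    dist = np.hypot(dx, dy)
    sq_diff = (resid[j] - resid[i]) ** 2
    keep = np.ones(dist.shape, dtype=bool)
    if azimuth_deg is not None:
        # Pair orientation, clockwise from north; a pair has no sense of direction.
        pair_az = np.degrees(np.arctan2(dx, dy)) % 180.0
        delta = np.abs(pair_az - (azimuth_deg % 180.0))
        delta = np.minimum(delta, 180.0 - delta)
        keep = delta <= tol_deg
    gamma = np.full(len(lag_edges_km) - 1, np.nan)
    for k in range(len(lag_edges_km) - 1):
        in_bin = keep & (dist >= lag_edges_km[k]) & (dist < lag_edges_km[k + 1])
        if in_bin.any():
            gamma[k] = 0.5 * sq_diff[in_bin].mean()
    return gamma


# Tiny illustrative data set: three sites and made-up residuals.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
resid = np.array([0.0, 1.0, 3.0])
edges = np.array([0.5, 1.2, 2.0])
print(experimental_semivariogram(coords, resid, edges))
print(experimental_semivariogram(coords, resid, edges, azimuth_deg=0.0))
print(experimental_semivariogram(coords, resid, edges, azimuth_deg=90.0))
```

If the directional estimates agree with the omni-directional one to within sampling scatter, an isotropic model is a reasonable choice; lag bins that contain no pairs are returned as NaN.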


Figure B.26: Experimental directional semivariograms at discrete separations obtained using the Northridge earthquake ε̃ values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Figure B.27: Experimental directional semivariograms at discrete separations obtained using the Chi-Chi earthquake ε̃ values computed at 1 second. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Figure B.28: Experimental directional semivariograms at discrete separations obtained using the Chi-Chi earthquake ε̃ values computed at 7.5 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Figure B.29: Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε̃ values are computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Figure B.30: Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε̃ values are computed at 7.5 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Figure B.31: Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε̃ values are computed at 7.5 seconds. Also shown in the figures is an anisotropic model that fits the four experimental semivariograms well (It is to be noted that an anisotropic semivariogram has different shapes in different directions.): (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Appendix C

Deaggregation of lifeline risk: Insights for choosing deterministic scenario earthquakes

Jayaram, N. and Baker, J.W. (2009). Deaggregation of lifeline risk: Insights for choosing deterministic scenario earthquakes, Lifeline Earthquake Engineering in a Multihazard Environment, TCLEE, Oakland, California.

C.1

Abstract

Probabilistic seismic risk assessment for lifelines is less straightforward than for individual structures. Analytical risk assessment techniques such as the ‘PEER framework’ are insufficient for a probabilistic study of lifeline performance, due in large part to difficulties in describing ground-motion hazard over a region. As a result, Monte Carlo simulation and its variants appear to be the best approach for characterizing ground motions for lifelines. A challenge with Monte Carlo simulation is its large computational expense, and in situations where computing lifeline losses is extremely computationally demanding, assessments may consider only a single ‘interesting’ ground-motion scenario and a single associated map of resulting ground-motion intensities. In this paper, a probabilistic simulation-based risk assessment procedure is coupled with a deaggregation calculation to identify the ground-motion scenarios most likely to produce exceedance of a given loss threshold. The deaggregation calculations show that this ‘most-likely scenario’ depends on the loss level of interest, and is influenced by factors such as the seismicity of the region, the location of the lifeline with respect to the faults and the current performance state of the various components of the lifeline. It is seen that large losses are typically caused by moderately large magnitude events with large average values of inter-event and intra-event residuals, implying that the scenario ground motions should be obtained in a manner that accounts for ground-motion uncertainties. Explicit loss analysis calculations that exclude residuals demonstrate that the resulting loss estimates are highly biased.

C.2

Introduction

Probabilistic seismic risk assessment for lifelines is less straightforward than for individual structures. While procedures such as the ‘PEER framework’ have been developed for risk assessment of individual structures, these are not easily applicable to distributed lifeline systems, due in large part to difficulties in describing ground-motion hazard over a region (in contrast to ground-motion hazard at a single site, which is easily quantified using Probabilistic Seismic Hazard Analysis). In the past, researchers have used simplified approaches to tackle the problem of specifying ground motions over a region. In the simplest case, the uncertainties in the ground-motion intensities are ignored, and lifeline risks are studied using the median ground motions predicted by ground-motion models [e.g., Shiraki et al., 2007, Campbell and Seligson, 2003]. While this approach reduces the computational burden significantly, ignoring the uncertainties in the ground-motion intensities results in highly biased risk estimates, as shown later in this paper.

Sometimes, as a simplification, lifeline risks are assessed using only those earthquake scenarios that may dominate the ground-motion hazard in the region of interest [e.g., Adachi and Ellingwood, 2008]. This approach helps reduce computational expense, but suffers from several problems. First, it is difficult to identify the probability of actually incurring the computed losses resulting from a single ground-motion scenario. Second, the scenario earthquake is generally chosen in a somewhat ad hoc manner, and so there is no guarantee that the chosen scenario is the one that is most ‘interesting’ in terms of risk to the lifeline system.

Crowley and Bommer [2006] and, more recently, Jayaram and Baker [2010] proposed Monte Carlo simulation (MCS)-based frameworks to forward simulate ground-motion intensities in future earthquakes, which can then be used for the risk assessment of lifelines. The sampling frameworks are based on the form of existing ground-motion models, which is described below. The ground motion at a site is modeled as [e.g., Boore and Atkinson, 2008]

\[ \ln(Y_i) = \ln(\bar{Y}_i) + \varepsilon_i + \eta \tag{C.1} \]

where Y_i denotes the ground-motion parameter of interest (e.g., S_a(T), the spectral acceleration at period T); Ȳ_i denotes the median ground-motion intensity predicted by the ground-motion model (which depends on parameters such as magnitude, distance, period and local-site conditions); ε_i denotes the intra-event residual, which is a random variable with zero mean and standard deviation σ_i; and η denotes the inter-event residual, which is a random variable with zero mean and standard deviation τ. The standard deviations, σ_i and τ, are estimated as part of the ground-motion model and are a function of the spectral period of interest, and in some models also a function of the earthquake magnitude and the distance of the site from the rupture. The intra-event residuals at two sites i and j are correlated, and the correlation is a function of the separation distance between the sites. The extent of the correlation can be obtained from spatial correlation models such as those of Jayaram and Baker [2009a] and Wang and Takada [2005].

Crowley and Bommer [2006] describe the MCS approach used to probabilistically sample ground-motion maps. This approach involves simulating earthquakes of different magnitudes on various active faults in the region, followed by simulating the inter-event and the intra-event residuals at the sites of interest for each earthquake. The residuals are then combined with the median ground motions in accordance with Equation C.1 in order to obtain the ground motions at all the sites.

In the current work, the simulation approach described above is coupled with a deaggregation calculation that can identify the ground-motion scenario most likely to produce exceedance of a given loss threshold. The results show that the most-likely scenario depends on the loss level of interest, and is influenced by factors such as the seismicity of the region, the location of the lifeline with respect to the faults and the current performance state of the various components of the lifeline. It is also seen that large losses are most likely to be caused by moderately large magnitude earthquakes combined with large positive inter-event and intra-event residuals. The findings illustrate the importance of accounting for ground-motion uncertainty, and provide a basis for a decision maker to choose interesting scenario ground motions for lifeline risk assessment.
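The residual sampling step of Equation C.1 can be sketched as follows for a single earthquake. This is a minimal illustration under stated assumptions: the residuals are taken to be normally distributed, the intra-event correlation is taken as exp(−3h/range) for a unit-sill exponential semivariogram, and the medians, standard deviations, coordinates and range are made-up placeholder values rather than outputs of any particular ground-motion model.

```python
import numpy as np


def simulate_ln_intensity_map(ln_median, sigma, tau, coords_km, range_km, rng):
    """Sample ln(Y_i) = ln(median_i) + eps_i + eta at all sites for one event."""
    ln_median = np.asarray(ln_median, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = ln_median.size
    # Separation distances between sites (km) and the implied correlation matrix.
    dist = np.linalg.norm(coords_km[:, None, :] - coords_km[None, :, :], axis=-1)
    corr = np.exp(-3.0 * dist / range_km)
    cov = corr * np.outer(sigma, sigma)
    eps = rng.multivariate_normal(np.zeros(n), cov)  # spatially correlated intra-event residuals
    eta = rng.normal(0.0, tau)                       # one inter-event residual shared by all sites
    return ln_median + eps + eta


# Hypothetical inputs for three sites.
coords = np.array([[0.0, 0.0], [5.0, 0.0], [40.0, 10.0]])
ln_median = np.log([0.30, 0.25, 0.10])  # median intensities in g (placeholder values)
sigma = np.array([0.55, 0.55, 0.55])    # intra-event standard deviations (placeholder)
tau = 0.30                              # inter-event standard deviation (placeholder)

rng = np.random.default_rng(42)
ln_y = simulate_ln_intensity_map(ln_median, sigma, tau, coords, range_km=30.0, rng=rng)
print(np.exp(ln_y))
```

Repeating this for many simulated earthquakes (magnitudes and rupture locations) yields the catalog of ground-motion maps on which the loss and deaggregation calculations operate.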

C.3

Deaggregation of seismic loss

This section describes the fundamentals of the seismic loss deaggregation procedure which is used in the current study. Deaggregation is the process used to quantify the likelihood that various events could have produced the exceedance of a given loss threshold. For instance, if it is known that the seismic loss exceeds x units, the likelihood that an event of magnitude m could have caused the exceedance is given as follows:

\[ P(\text{Magnitude} = m \mid \text{Loss} > x) = \frac{P(\text{Loss} > x, \text{Magnitude} = m)}{P(\text{Loss} > x)} = \frac{\lambda(\text{Loss} > x, \text{Magnitude} = m)}{\lambda(\text{Loss} > x)} \tag{C.2} \]

where λ(Loss > x, Magnitude = m) denotes the recurrence rate of events of magnitude m causing a loss exceeding x, and λ(Loss > x) is the recurrence rate of events causing a loss exceeding x. These rates can be estimated using the simulation-based framework described in Section C.2. The likelihoods can also be computed considering multiple parameters such as magnitudes and faults as follows:

\[ P(\text{Magnitude} = m, \text{Fault} = f \mid \text{Loss} > x) = \frac{\lambda(\text{Loss} > x, \text{Magnitude} = m, \text{Fault} = f)}{\lambda(\text{Loss} > x)} \tag{C.3} \]

Such calculations are common practice when loss assessments are carried out for a single structure (though most deaggregation calculations estimate the contribution (likelihood) of various earthquake scenarios to ground-motion intensity exceedance rather than loss exceedance). Typical results from the single-site deaggregation computations include the joint likelihoods of magnitudes, rupture distances (distance of the structure from the rupture) and residuals (Equation C.1).

In the current work, it is of interest to identify the contributions of magnitudes, rupture locations and residuals (inter-event and intra-event) to lifeline losses. Deaggregation calculations for lifeline losses need to account for the fact that ground motions at multiple sites are of interest. This would mean that a specific distance to the rupture cannot be obtained as is commonly done when a single structure is involved. In the current work, this problem is overcome by specifying the fault on which the rupture lies rather than the distance to any particular site. Further, since each site of interest is associated with a different intra-event residual, deaggregation is used to compute the contribution of the mean intra-event residual (i.e., the average of the intra-event residuals at all sites) rather than the contribution of the intra-event residual at any particular site.

C.4

Loss assessment for the San Francisco Bay Area transportation network

The deaggregation computations in the current work are based on the loss estimates for an aggregated form of the San Francisco Bay Area transportation network provided by Jayaram and Baker [2009a]. This section describes the aggregated network and the performance measures considered in the loss assessment process. Figure C.1 shows the aggregated network along with the various important faults in the San Francisco Bay Area. The network consists predominantly of freeways and expressways, and has a total of 586 links, 310 nodes and 1,125 bridges. In this network, the traffic originates and terminates in 46 nodes denoted centroidal nodes.

Figure C.1: The aggregated San Francisco Bay Area transportation network.

Transportation network performance is usually measured in terms of the total travel time of the network [Shiraki et al., 2007, Stergiou and Kiremidjian, 2006]. The total travel time is obtained using the user-equilibrium principle, which states that, under equilibrium, each user chooses the path that minimizes his or her travel time [Beckman et al., 1956]. The user-equilibrium formulation is solved by the commonly-used solution technique provided by Frank and Wolfe [1956]. The changes in the network travel time after an earthquake are due to structural damage to bridges, which results in link closures and reductions in link capacities. (The current work considers only the change in the total network travel time, and omits monetary costs due to structural damage.) Thus, the loss assessment is carried out by accounting for the structural damage to bridges caused by each simulated ground-motion map (obtained using the simulation-based procedure described in Section C.2) and computing the network travel time in the damaged state. (In the current work, only peak-hour demands and travel times are considered.) Figure C.2 shows the loss estimates in the form of a recurrence curve, which shows the rate of exceeding various travel time delays. The current work uses these loss estimates (i.e., travel time delays) in the deaggregation calculations.
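A recurrence curve of this kind can be assembled from simulated losses as sketched below. The sketch assumes that each simulated ground-motion map carries an annual occurrence rate (for plain Monte Carlo simulation, the total rate of events divided by the number of maps); the delays and rates shown are made-up placeholder values, not results from this study.

```python
import numpy as np


def exceedance_rates(delays_hours, map_rates, thresholds_hours):
    """Annual rate of exceeding each travel time delay threshold.

    The rate of exceeding x is the sum of the occurrence rates of all
    simulated maps whose computed delay is larger than x.
    """
    delays = np.asarray(delays_hours, dtype=float)
    rates = np.asarray(map_rates, dtype=float)
    return np.array([rates[delays > x].sum() for x in thresholds_hours])


# Placeholder example: four simulated maps, each with an assumed rate of 0.25 per year.
delays = [0.0, 100.0, 6000.0, 12000.0]
rates = [0.25, 0.25, 0.25, 0.25]
thresholds = [0.0, 5000.0, 10000.0, 20000.0]
print(exceedance_rates(delays, rates, thresholds))
```

Plotting the returned rates against the thresholds on logarithmic axes gives a curve of the same type as Figure C.2.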


Figure C.2: Recurrence curve for the travel time delay obtained using the simulation-based framework.

C.5

Results and Discussion

This section presents the results from the deaggregation calculations, which include the contribution of magnitudes, faults, inter-event residuals and mean intra-event residuals to lifeline losses. The estimates are obtained using equations similar to Equations C.2 and C.3, where the required recurrence rates are obtained using the simulation-based loss assessment framework described in the previous sections. For instance, if 100 out of 15,000 simulated events involve an earthquake of magnitude 7 and a loss (i.e., travel time delay) exceeding 10,000 hours, then P(Loss > 10,000, Magnitude = 7) = 100/15,000.
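The counting behind these estimates can be sketched as follows. The event list is a made-up illustration (the fault names appear in this appendix, but the magnitudes, delays and rates are placeholders), and each event is assumed to carry an occurrence rate; with equal rates the calculation reduces to counting events, as in the example above.

```python
from collections import defaultdict


def deaggregate(events, threshold_hours):
    """Likelihood of each (magnitude bin, fault) pair given delay > threshold.

    events is an iterable of (magnitude_bin, fault, delay_hours, rate) tuples.
    The result follows the form of Equation C.3: the rate of exceedance from
    each pair divided by the total rate of exceedance.
    """
    joint = defaultdict(float)
    total = 0.0
    for magnitude_bin, fault, delay, rate in events:
        if delay > threshold_hours:
            joint[(magnitude_bin, fault)] += rate
            total += rate
    if total == 0.0:
        return {}  # no simulated event exceeds the threshold
    return {key: value / total for key, value in joint.items()}


# Placeholder events with equal rates.
events = [
    (7.0, "Hayward", 12000.0, 1.0),
    (8.0, "San Andreas", 15000.0, 1.0),
    (7.0, "Hayward", 11000.0, 1.0),
    (6.5, "Hayward", 2000.0, 1.0),
]
for key, likelihood in deaggregate(events, 10000.0).items():
    print(key, round(likelihood, 3))
```

The likelihoods returned for any threshold sum to one, since every exceeding event belongs to exactly one (magnitude bin, fault) pair.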

C.5.1

Contribution of magnitudes and faults to the lifeline losses

Figure C.3 shows the contribution (i.e., the likelihood term obtained from Equation C.3) of various magnitudes and faults to the probability of exceeding four different travel time delay thresholds, namely, 0 hours, 5,000 hours, 10,000 hours and 20,000 hours. (The total travel time in the network during normal operating conditions equals 73,000 hours.) In order to obtain the contributions of discrete magnitudes to the loss exceedance, earthquakes of different magnitudes need to be pooled into bins of select discrete magnitudes. In the current work, the bin size is chosen to be 0.5. For instance, all magnitudes between 7.75 and 8.25 are classified as magnitude 8.

Figure C.3: Joint likelihoods of magnitudes and faults given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.

Figure C.3 shows that, at small loss thresholds, small magnitude events contribute significantly to the loss, which is understandable since small magnitude events are significantly more probable than large magnitude events. Also, as seen from Figure C.3, the loss is typically dominated by events on the northern segment of the San Andreas Fault. This is because the rate of earthquake occurrence on the San Andreas Fault is much larger than that on other faults.

At moderate loss levels (5,000-10,000 hours), a significant portion of the contribution is shared by earthquake events on the Hayward and the San Andreas Faults. Events of magnitude close to 7 on the Hayward Fault and of magnitude around 8 on the San Andreas Fault are ‘characteristic events’ on the respective faults [USGS, 2003]. In other words, these earthquakes are known to occur on a fairly regular basis and hence are more likely than even some of the smaller magnitude events on these faults. It can be seen from Figure C.3 that the characteristic events contribute most to the moderate losses by virtue of their higher likelihoods of occurrence. Further, it is interesting to note that an event of magnitude 7 on the Hayward Fault has a slightly larger contribution than a much larger event (magnitude 8) on the San Andreas Fault. This is because the Hayward Fault runs through the middle of the network while the San Andreas Fault is on the western end. As a result, an event on the Hayward Fault causes moderate damage to all the links in the network, while the San Andreas event causes extensive damage to the west end of the network and much less damage to the east end. The overall effect is a nearly equal contribution to the losses by both the above-mentioned events. At large loss levels (20,000 hours), however, events on the San Andreas Fault again dominate.

Figure C.4: Level of congestion in the network as indicated by the volume/capacity ratio.

Of all the links present in the transportation network, the most congested ones under normal operating conditions are in the western portion of the network. This can be seen from Figure C.4, which shows the ratio of the traffic volume in each link to the link capacity. Large travel time delays are incurred if links that are congested (volume/capacity greater than 0.75) under normal conditions suffer damage, increasing the congestion even further. This happens when a moderate to large event occurs on the San Andreas Fault (which is adjacent to several congested links) and has large residuals, and hence such a scenario is the primary cause of large delays in the network.

It can be seen from the above discussion that the most-likely scenario depends on the loss level of interest, and is influenced by factors such as the seismicity of the region, the location of the lifeline with respect to the faults and the performance state of the various components of the lifeline under normal operating conditions. In fact, for certain loss levels, it may not even be possible to choose a single dominating event, as shown in Figures C.3b and C.3c, which show nearly equal contributions by events on the Hayward and the San Andreas Faults.

Figure C.5: Likelihoods of mean intra-event residual given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.
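The magnitude pooling used for Figure C.3 (bins of width 0.5, so that magnitudes between 7.75 and 8.25 map to magnitude 8) can be sketched as below. The treatment of magnitudes falling exactly on a bin edge is an assumption made here: the lower edge is included and the upper edge is excluded.

```python
import math


def magnitude_bin(magnitude, bin_width=0.5):
    """Pool a magnitude into the nearest bin center (a multiple of bin_width).

    Each bin covers [center - bin_width / 2, center + bin_width / 2).
    """
    return math.floor(magnitude / bin_width + 0.5) * bin_width


for m in (6.8, 7.6, 7.9, 8.05):
    print(m, "->", magnitude_bin(m))
```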


Figure C.6: Joint likelihoods of inter-event residual given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.

C.5.2

Contribution of inter- and intra-event residuals to the lifeline loss

Figures C.5 and C.6 show the contribution of mean intra-event and inter-event residuals, respectively, to the probability of exceeding four different travel time delay thresholds. As expected, events with residuals close to zero (the mean value) dominate small seismic losses. As the loss level increases, the contribution of large inter-event and large mean intra-event residuals increases rapidly. It can be seen from Figures C.5d and C.6d that, at a loss threshold of 20,000 hours, significant contributions are obtained from mean intra-event residuals between 0.3 and 0.5 and inter-event residuals between 1.5 and 3. These results are perhaps not surprising given the large effect that inter-event and intra-event residuals have on the resulting ground motions. Since the inter-event residual is constant across the entire region, a large positive value will increase the ground-motion intensity at every site in the region. As a consequence, appropriate consideration of the inter-event residual is even more important when assessing lifeline losses than when assessing the losses for a single structure.



Figure C.7: Mean magnitude of earthquakes producing a travel time delay exceeding a specified threshold.

Figure C.8: (a) Average of mean intra-event residual of earthquakes producing a travel time delay exceeding a specified threshold. (b) Average of inter-event residual of earthquakes producing a travel time delay exceeding a specified threshold.



Figure C.9: Recurrence curves obtained without completely accounting for inter-event and intra-event residuals.

Finally, Figures C.7 and C.8 summarize the findings from the deaggregation calculations, and illustrate the variation in the mean magnitude and the mean residuals of the ground-motion scenarios that contribute to the probability of exceeding various lifeline loss thresholds. For instance, the mean magnitude causing a travel time delay exceeding x hours is obtained by averaging the magnitudes of all earthquakes that produce a travel time delay greater than x hours. The figures show that the magnitude, inter-event residual and mean intra-event residual increase rapidly as the travel time delay threshold increases. (Some of the wiggles seen at large thresholds are due to small sample sizes at these thresholds.) It is interesting to note that most of the extremely large losses occur at magnitudes well below the maximum (the maximum is 8.05 in this source model), which indicates that large losses are typically caused by moderately large events combined with large values of residuals (Figure C.8), as explained previously. This result can be understood intuitively as follows: while ‘maximum magnitude’ events certainly cause large losses, they occur so infrequently that, in many cases, more common moderate magnitude events may be more important.
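The averaging described above can be sketched as follows. The function name `conditional_mean` and the sample magnitudes and delays are hypothetical, and the sketch takes a plain unweighted average, as the text describes; if the simulated maps carried unequal weights, a weighted average would be needed instead.

```python
import numpy as np

def conditional_mean(attribute, delays, threshold):
    """Mean of `attribute` over all scenarios whose delay exceeds `threshold`."""
    attribute = np.asarray(attribute, dtype=float)
    delays = np.asarray(delays, dtype=float)
    mask = delays > threshold
    if not mask.any():
        # No scenario exceeds the threshold, so the mean is undefined.
        return float("nan")
    return float(attribute[mask].mean())

# Hypothetical magnitudes and travel time delays (hours) for five scenarios.
magnitudes = [6.5, 7.0, 7.2, 7.8, 8.05]
delay_hours = [200, 3000, 12000, 22000, 9000]

for x in (0, 5000, 10000, 20000):
    print(x, conditional_mean(magnitudes, delay_hours, x))
```

The same function applies to the inter-event residual or the mean intra-event residual by passing that attribute in place of the magnitudes. With few scenarios above a large threshold the average becomes noisy, which is consistent with the wiggles noted in the text.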



To further emphasize the importance of residuals, the loss assessment for the aggregated network was repeated without considering one or both types of residuals (i.e., the inter-event and the intra-event residuals). The resulting recurrence curves are shown in Figure C.9. The figure shows that the loss is significantly underestimated if even one of the two types of residuals is not considered. This is to be expected given the previous observation that large losses typically come from events of moderately large magnitude with large positive residuals rather than from events of extremely large magnitude with zero residuals.
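A much simplified, single-site illustration of this effect is sketched below. It is not the network calculation behind Figure C.9. It assumes the common ground-motion model form ln Y = ln(median) + σε + τη with independent standard normal ε (intra-event) and η (inter-event), so that ln Y has standard deviation sqrt(σ² + τ²). The function name and all numerical values are hypothetical.

```python
import math

def exceedance_prob(y, median, sigma=0.0, tau=0.0):
    """P(Y > y) for lognormal Y with ln-median ln(median).

    sigma : intra-event standard deviation (0 drops the intra-event residual)
    tau   : inter-event standard deviation (0 drops the inter-event residual)
    """
    total_std = math.hypot(sigma, tau)  # sqrt(sigma^2 + tau^2)
    z = math.log(y / median)
    if total_std == 0.0:
        # No residuals at all: the intensity equals the median exactly.
        return 1.0 if z < 0.0 else 0.0
    return 0.5 * math.erfc(z / (total_std * math.sqrt(2.0)))

# Hypothetical values: median 0.3 g, threshold 0.9 g, sigma 0.5, tau 0.3.
median, y, sigma, tau = 0.3, 0.9, 0.5, 0.3
p_both = exceedance_prob(y, median, sigma, tau)
p_intra_only = exceedance_prob(y, median, sigma=sigma)
p_inter_only = exceedance_prob(y, median, tau=tau)
p_none = exceedance_prob(y, median)
print(p_both, p_intra_only, p_inter_only, p_none)
```

For a threshold above the median, removing either residual term shrinks the total standard deviation and therefore lowers the exceedance probability, which mirrors the direction of the bias reported for the network.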

C.6 Transportation network performance under sample scenario ground-motion maps

This section provides a graphical illustration of why residuals play an important part in determining the losses to the transportation network. The performance of the network is analyzed under three different ground-motion scenarios, namely, A, B and C. All three scenarios result from an earthquake of magnitude 8.1 on the northern segment of the San Andreas Fault, and have a mean intra-event residual of approximately zero. The value of the inter-event residual equals 3.79 in scenario A, -1.64 in scenario B and 0 in scenario C. Figure C.10 graphically shows the performance of the transportation network under the three ground-motion scenarios. Thicker lines indicate links experiencing larger increases in the travel times. It can be seen that the delays are much greater under scenario A than under scenarios B and C. In fact, the travel time delay in the network equals 32,600 hours under scenario A, 1,550 hours under scenario B and 4,580 hours under scenario C. The significant differences are a result of the differences in the inter-event residual, since the predicted median ground-motion intensities in all these three cases are identical.
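The region-wide effect of the inter-event residual can be sketched numerically. This is an illustration under stated assumptions rather than the calculation used for Figure C.10: it assumes the model form ln Y = ln(median) + σε + τη and sets the intra-event term to zero at every site, which is a simplification of the zero mean intra-event residual stated in the text. The site medians and the inter-event standard deviation `tau` are hypothetical, and only the three η values come from the text.

```python
import numpy as np

def apply_inter_event_residual(median_sa, eta, tau):
    """Scale every site's median intensity by the common factor exp(tau * eta)."""
    return np.asarray(median_sa, dtype=float) * np.exp(tau * eta)

# Hypothetical median intensities (g) at three sites, identical across scenarios.
median_sa = np.array([0.20, 0.35, 0.50])
tau = 0.3  # hypothetical inter-event standard deviation (natural-log units)

# Inter-event residuals of scenarios A, B and C from the text.
scenarios = {"A": 3.79, "B": -1.64, "C": 0.0}
maps = {name: apply_inter_event_residual(median_sa, eta, tau)
        for name, eta in scenarios.items()}

for name, sa in maps.items():
    print(name, np.round(sa, 3))
```

Because the same factor multiplies every site, a large positive η raises the intensity on all links at once, while η = 0 leaves the median map unchanged.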

C.7 Conclusions

Figure C.10: Performance of the network under three different ground-motion scenarios corresponding to three different inter-event residuals: (a) η = 3.79, (b) η = -1.64 and (c) η = 0.

In this paper, a probabilistic simulation-based loss assessment procedure is coupled with a deaggregation calculation that can identify the ground-motion scenarios most likely to produce exceedance of a given loss threshold for a spatially-distributed lifeline system. The deaggregation calculation quantifies the likelihood that various events (magnitudes, faults, inter-event and intra-event residuals) could have produced the exceedance of a given loss threshold. In the current work, deaggregation calculations are performed to identify the likelihoods of earthquake events that cause various levels of travel time delays (the lifeline loss measure) in an aggregated form of the San Francisco Bay Area transportation network. The deaggregation calculations indicate that the ‘most-likely’ scenario depends on the loss level of interest, and is influenced by factors such as the seismicity of the region, the location of the lifeline with respect to the faults and the performance state of the various components of the lifeline under normal operating conditions. In fact, for certain loss levels, two different events (different magnitudes and faults) can have similar contributions to the loss exceedance, making it impossible to identify a single most-likely scenario earthquake. The deaggregation calculations also show that large losses are typically caused by moderately large magnitude events with large values of inter-event and intra-event residuals, indicating that it is very important to appropriately account for the residuals in the loss assessment framework. Loss assessments carried out without accounting for either the inter-event or the intra-event residuals produce highly biased and incorrect loss estimates.
