PHYSICAL REVIEW E 86, 021106 (2012)

Probabilities for large events in driven threshold systems

John B. Rundle,1,2,3 James R. Holliday,1 William R. Graves,4 Donald L. Turcotte,2 Kristy F. Tiampo,5 and William Klein6

1 Department of Physics, University of California, Davis, California 95616
2 Department of Geology, University of California, Davis, California 95616
3 The Santa Fe Institute, Santa Fe, New Mexico 87501
4 Open Hazards Group, Davis, California 95616
5 Department of Earth Sciences, University of Western Ontario, London, ON, Canada N6A 3K7
6 Department of Physics, Boston University, Boston, Massachusetts 02215

(Received 22 August 2011; published 7 August 2012)

Many driven threshold systems display a spectrum of avalanche event sizes, often characterized by power-law scaling. An important problem is to compute probabilities of the largest events (“Black Swans”). We develop a data-driven approach to the problem by transforming to the event index frame, and relating this to Shannon information. For earthquakes, we find the 12-month probability for magnitude m > 6 earthquakes in California increases from about 30% after the last event, to 40%–50% prior to the next one. DOI: 10.1103/PhysRevE.86.021106

PACS number(s): 02.50.Tt, 91.30.Px, 05.45.−a, 91.30.pd

I. INTRODUCTION

Driven threshold systems are among the most important nonequilibrium systems in nature. Examples from physics include neural networks [1] and earthquakes [2,3], among others. Examples from economics and finance include stop-loss orders in stock trading and credit defaults in the bond markets [4]. In climate studies, one important threshold is the point at which a reversal of the global ocean circulation system (the "conveyor") might occur [5]. In ecology, regime shifts and population collapse (extinctions) are often modeled as threshold-dependent phenomena [6]. A critical problem with such systems is the reliable computation of forecast probabilities for the largest events, such as major stock market crashes or major earthquakes, many of which cluster in time [3]. These events are often called "Black Swan" events [7].

A driven threshold system [8] involves a persistent external or exogenous forcing, an order-parameter field defining the state of the system over a set of spatial cells, an interaction, and a threshold value of the order parameter at which a sudden transition in state occurs. A well-used example is an earthquake fault system [2,3], in which the plate tectonic dynamics apply forces (stress) to a weak fault zone. One of the notable features of these driven threshold systems is the appearance of power-law scaling [8]. More specifically, power laws are observed that describe the frequency of avalanche events as a function of the volume of the event, or as a function of the time of occurrence of small events following a large event. For earthquakes, the corresponding power laws are the Gutenberg-Richter frequency-moment relation (or equivalently the frequency-magnitude relation) and the Omori aftershock relation [9].


Other examples of the same type of dynamics have been observed in the financial markets, where it is known that large crashes can be described as the largest outliers of the statistical distribution of returns and drawdowns [10–14]. For example, Ref. [11] shows that power-law relaxations characterize events following financial crashes. Reference [12] shows that the probability density functions characterizing avalanches in self-organized criticality models of earthquakes have q-Gaussian (essentially leptokurtotic) shape. Reference [13] shows that it is possible, under some circumstances, for these finite self-organized criticality (SOC) earthquake models to demonstrate predictability of an extreme event in the next time step. Reference [14] shows that magnitudes cluster in space and time, a result known elsewhere as Båth's law, and deduces dynamical scaling relations for these events.

Summary of our results. We have developed a method for computing the statistical probabilities of the largest avalanche events in driven threshold systems. By counting the number of small events since the last large event, we are able to compute the conditional probability for the next large event. Using reliability and receiver operating characteristic tests, we find optimal values for the model parameters. We apply these methods to the computation of large earthquake probabilities in California and Nevada using online seismic data from the ANSS catalog [11]. We find that the 12-month probability of large earthquakes having magnitudes > 6 typically varies from about 30% just after the last such event to as much as 40%–50% just prior to the next such event. The forecast probability method we describe here is the natural time Weibull (NTW) method.

The frequency statistics of large avalanches in many driven threshold systems are often of the form



f_V = A (V/V_0)^{−γ},   (1)

where V is the volume of the event, γ is an exponent, and A and V_0 are constants. Equation (1) is often considered to be the tail of the distribution, so that f_V is the frequency of events having volume V larger than v, i.e., V > v. For earthquakes, V is usually identified with the seismic moment W [9], and Eq. (1) is often stated as the Gutenberg-Richter frequency-magnitude relation:

f_m = 10^{a} 10^{−bm},   (2)

where f_m is the frequency (e.g., number per year) of events having magnitudes larger than m, and a and b are constants. Here m is related to the moment W by the relation m = (2/3) log_10 W − 6.0 [9].

The point of departure for the present work is the observation that Eqs. (1) and (2) imply that on average there are many small events having volume V_S (or magnitude m_S) that correspond to one much larger event having volume V_L (or magnitude m_L), V_S ≪ V_L (or m_S ≪ m_L). Therefore the small events serve as a kind of "clock" that marks the "time" between the large events. This is a variation of the natural time hypothesis, and has been discussed previously in connection with earthquakes [15,16]. For example, it is easy to see that on average there are N′ = (V_L/V_S)^γ small events per large event according to Eq. (1), or N′ = 10^{b(m_L − m_S)} small earthquakes having magnitude m_S for every large earthquake having magnitude m_L, from Eq. (2). Alternatively, we can say that a system characterized by a statistically stable scaling distribution may temporarily develop a deficiency of large events, since the small events are far more frequent. Eventually, the distribution must be "filled in" by the occurrence of a large event, thereby rectifying the statistics.

To use this idea in constructing a conditional probability, we consider a Weibull probability law [17], which is evidently the most widely used probability law for reliability analysis and for time to failure in fracture models and experiments. An alternative distribution to be considered might be the Gumbel distribution [18,19], which has been used to describe the distribution of extreme magnitudes in fracture and earthquakes. The Gumbel distribution is also sometimes called the "log-Weibull" distribution. Here our goal is to compute the probability of a large event occurring within a time interval from the present; for that reason, the Weibull law is apparently the distribution of choice. On the other hand, if we wished to estimate the maximum time interval, or the maximum magnitude event that might occur, the Gumbel distribution might be more appropriate. In the Weibull law [20], the temporal scale constant is N = N′/Γ[(β + 1)/β], where β is the exponent and Γ[·] is the gamma function. Weibull probabilities are often used in failure and reliability analyses [20].

Consider the earthquake example. In a sample of n earthquakes selected at random, the cumulative (Weibull) probability of finding a large earthquake in the sample is taken to be

P(n) = 1 − exp[−(αn/N)^β].   (3)

Here we have inserted a parameter α that will later be used in optimizing the probability, expecting that α will be of order 1. Note that β = 1 corresponds to Poisson statistics.

We now apply Eq. (3) in a conditional sense. Given that there have been n(t) small earthquakes since the last large earthquake, which occurred a time interval t ago, we ask: What is the probability that another large earthquake will occur after an additional number Δn of small earthquakes have occurred? Thus we compute the conditional probability [14]:

P(Δn | n) = 1 − exp{−[α(n + Δn)/N]^β + (αn/N)^β}.   (4)

Equation (4) does not depend explicitly on time t, or on future (calendar) time Δt. To map this probability back to the dual (calendar) time domain, we make the reasonable assumption that, for a small time interval Δt,

Δn ≈ f_n(m_S) Δt,   (5)

where f_n(m_S) is the frequency (Poisson rate) of small events, computed from Eq. (2), having magnitudes larger than m_S. Using Eq. (5) in Eq. (4), we can estimate the conditional probability in the dual (calendar) time domain:

P(Δt | t) = 1 − exp{−[α(n + f_n(m_S) Δt)/N]^β + (αn/N)^β}.   (6)

We now consider the optimal choice of the parameters (α, β) via backtesting. A considerable literature has accumulated in the weather and financial communities relating to forecast validation [21]. Of particular interest are reliability/attributes (R/A) tests [22,23] and receiver operating characteristic (ROC) tests [23–27]. The R/A test is essentially a scatter plot in which the frequency of observed large events is plotted against the computed large-event probabilities. The ROC test plots successful forecasts (H, the "hit rate") against false alarms (F, the "false alarm rate"). In the R/A test, reliability is computed by a weighted measure known as the Brier skill score (BSS) and is displayed as a scatter plot of observed frequency versus forecast probability [28]. The BSS can be decomposed into a set of terms that represent, respectively, the reliability, the sample skill, and the resolution [28], which are all measures of forecast quality.
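Before turning to the data assimilation, the following minimal Python sketch (assuming NumPy) evaluates the conditional NTW probability of Eqs. (3)–(6). The b value, magnitude cutoffs, small-event rate, and the example numbers are illustrative placeholders rather than values taken from the catalog; α = 1.25 and β = 1.40 are the optimized values reported in Sec. IV. This is not the authors' code, only a sketch of the formulas above.

```python
import numpy as np
from math import gamma

def ntw_conditional_probability(n, dn, alpha, beta, b=1.0, m_L=6.0, m_S=3.5):
    """Conditional natural time Weibull probability, Eqs. (3)-(4).
    n  : small events counted since the last large event
    dn : additional small events in the forecast window
    alpha, beta : NTW parameters (beta = 1 recovers Poisson statistics)."""
    # Mean number of small events per large event, N' = 10^{b(m_L - m_S)} from Eq. (2).
    n_prime = 10.0 ** (b * (m_L - m_S))
    # Temporal scale constant N = N' / Gamma[(beta + 1)/beta].
    N = n_prime / gamma((beta + 1.0) / beta)
    return 1.0 - np.exp(-(alpha * (n + dn) / N) ** beta + (alpha * n / N) ** beta)

def ntw_probability_in_time(n, dt_years, rate_small, alpha, beta, b=1.0, m_L=6.0, m_S=3.5):
    """Eqs. (5)-(6): map the count forecast to calendar time via
    dn ~ f_n(m_S) * dt, where rate_small is the Poisson rate of small events."""
    return ntw_conditional_probability(n, rate_small * dt_years, alpha, beta, b, m_L, m_S)

# Illustrative numbers only: 150 small events since the last large event and a
# small-event rate of 120/yr, with the optimized (alpha, beta) reported in Sec. IV.
p = ntw_probability_in_time(n=150, dt_years=1.0, rate_small=120.0, alpha=1.25, beta=1.40)
print(f"Illustrative 12-month conditional probability: {p:.2f}")
```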

II. DATA ASSIMILATION

There are two parts to the data assimilation process: the first is optimization of the regional time dependence, and the second is localization and optimization in the spatial domain. The first is relatively straightforward using the forecast quality tests, but the second is computationally intensive and presents problems of a more detailed nature. In this paper we focus on the first of these problems, reserving consideration of the second for a future paper [29]. The method of data assimilation follows from the Brier score [28]:

Q = (1/K) Σ_{k=1}^{K} (r_k − d_k)^2.   (7)

Here K represents the number of forecasts made, k = 1, ..., K, r_k is the probability computed on the kth occasion, d_k = 1 if an event occurs within the forecast window, and d_k = 0 otherwise.


For our purposes, a forecast is made on every time step from t_0 to t, in time steps of about δt = 3.65 days, so K = (t − t_0)/δt. The reliability test is conditional: Given the forecast, what was the outcome? The Brier score can be decomposed into a series of terms that lead to measures of forecast reliability, sample skill, and resolution [22,28]:

Q = d̄(1 − d̄) + (1/K) Σ_{t=1}^{T} K_t (r_t − d̄_t)^2 − (1/K) Σ_{t=1}^{T} K_t (d̄_t − d̄)^2.   (8)

In Eq. (8), the probability density has been divided into T subsamples, where subsample t consists of the K_t forecasts for which r_k = r_t, and K = Σ_t K_t. Note that d̄_t is the relative frequency of occurrence of the event when r_k = r_t, and

d̄ = (1/K) Σ_t K_t d̄_t.   (9)

In Eq. (8), the first term is interpreted as the quadratic score for a static forecast having sample frequency d̄ over the K instances [28] and is generally denoted as Q* = d̄(1 − d̄). The second term is the average squared difference between the forecast probability r_t and the relative frequency; thus it is a measure of the reliability of the forecast. If r_t = d̄_t the forecast is perfectly reliable, and smaller values of this term represent better reliability. The third term is regarded as a measure of attribute resolution [27]. It measures the difference between the subsample frequency d̄_t and the overall frequency d̄, and thus the importance of subsample t. No resolution is indicated when d̄ = d̄_t, and larger values of the third term indicate better resolution. Resolution indicates a forecaster's ability to divide the sample into subsamples for which the forecast probabilities differ from the static probability. Following Ref. [28], we define the following indicators of forecast quality:

Reliability: (1/K) Σ_{t=1}^{T} K_t (r_t − d̄_t)^2,   (10)

Sample skill: (Q* − Q)/Q*,   (11)

Resolution: (1/K) Σ_{t=1}^{T} K_t (d̄_t − d̄)^2.   (12)
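As a compact illustration of how Eq. (7) and the terms of Eqs. (10)–(12) might be evaluated, consider the sketch below (Python/NumPy). The binning of the K forecasts into T subsamples on a uniform probability grid is an assumption; the text does not specify how the subsamples are formed.

```python
import numpy as np

def brier_decomposition(r, d, n_bins=20):
    """Brier score Q, Eq. (7), and the reliability/resolution terms of
    Eqs. (10)-(12).  r: forecast probabilities r_k in [0, 1]; d: outcomes d_k in {0, 1}."""
    r = np.asarray(r, float)
    d = np.asarray(d, float)
    K = r.size

    Q = np.mean((r - d) ** 2)                 # Brier score, Eq. (7)
    d_bar = d.mean()
    Q_star = d_bar * (1.0 - d_bar)            # static (climatology) score

    # Partition the K forecasts into T subsamples on a uniform probability grid
    # (an assumed binning scheme).
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(r, bins) - 1, 0, n_bins - 1)

    reliability = 0.0                         # Eq. (10)
    resolution = 0.0                          # Eq. (12)
    for t in range(n_bins):
        mask = idx == t
        K_t = mask.sum()
        if K_t == 0:
            continue
        r_t = r[mask].mean()                  # representative forecast value of subsample t
        d_bar_t = d[mask].mean()              # relative frequency of occurrence in subsample t
        reliability += K_t * (r_t - d_bar_t) ** 2
        resolution += K_t * (d_bar_t - d_bar) ** 2
    reliability /= K
    resolution /= K

    sample_skill = (Q_star - Q) / Q_star if Q_star > 0 else float("nan")   # Eq. (11)
    return {"Q": Q, "reliability": reliability,
            "resolution": resolution, "sample_skill": sample_skill}
```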

As an example, the reliability error is

(1/K) Σ_{s=1}^{S} (r_s − d̄_s)^2,   (13)

where we have divided the probability density into S subsamples, with subsample s consisting of the K_s forecasts for which r_k = r_s and K = Σ_s K_s.

A separate verification method is the ROC and the associated area skill score [23–27]. In this method, a threshold D is applied to a probabilistic forecast, transforming it into a binary ("yes/no") forecast. The forecasts are rank ordered from the highest probability value to the lowest, and the threshold D is decreased progressively from the highest to the lowest value. For values D ≤ r_k the event is considered to be predicted ("P = yes"), whereas for D > r_k it is not ("P = no"). During the forecast period, it is determined whether the event is observed ("O = yes") or not ("O = no").

A 2 × 2 contingency table is constructed for each value of the threshold D based on whether a forecast within a space-time increment is successful or not. The entries in the table are numbers a, b, c, d that represent the number of space-time forecasts having (a) P = yes, O = yes; (b) P = yes, O = no; (c) P = no, O = yes; and (d) P = no, O = no. The success or "hit" rate is defined to be H = a/(a + c). The false alarm rate is defined to be F = b/(b + d). The total number of forecasts is n = a + b + c + d. The ROC diagram is then constructed by plotting H against F as the threshold value D decreases. Both H and F vary between the limits [0,1]. A "good" forecast has H(F) > F. A random forecast is characterized by the condition H = F, which is represented on the ROC diagram by a diagonal line connecting the point (0,0) at the lower left corner to the point (1,1) at the upper right corner. The area skill score [23] is conventionally the area under the curve H(F), and thus takes on values from 0 to 1. Here we modify this definition slightly to be the area under the curve H(F) − F, which thus takes on values from −0.5 to 0.5. Values > 0 indicate a better forecast of event occurrence.
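The ROC construction and the modified area skill score described above might be sketched as follows. This is illustrative Python/NumPy operating on the same (r_k, d_k) pairs as the Brier sketch, not the authors' implementation.

```python
import numpy as np

def roc_and_area_skill(r, d):
    """ROC curve H(F) built from 2x2 contingency tables at decreasing
    thresholds D, and the modified area skill score (area under H(F) - F)."""
    r = np.asarray(r, float)
    d = np.asarray(d, int)
    thresholds = np.sort(np.unique(r))[::-1]          # highest to lowest forecast value
    H, F = [0.0], [0.0]
    for D in thresholds:
        predicted = r >= D                            # "P = yes" when the forecast meets the threshold
        a = np.sum(predicted & (d == 1))              # hits
        b = np.sum(predicted & (d == 0))              # false alarms
        c = np.sum(~predicted & (d == 1))             # misses
        n_corr = np.sum(~predicted & (d == 0))        # correct negatives
        H.append(a / (a + c) if (a + c) > 0 else 0.0)            # hit rate H = a/(a+c)
        F.append(b / (b + n_corr) if (b + n_corr) > 0 else 0.0)  # false alarm rate F = b/(b+d)
    H.append(1.0)
    F.append(1.0)
    H, F = np.array(H), np.array(F)
    area_skill = np.trapz(H - F, F)                   # area under H(F) - F, in [-0.5, 0.5]
    return F, H, area_skill
```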

III. ESTIMATION OF UNCERTAINTY

There are several ways to estimate uncertainty in the computation of reliability, skill, and resolution. One of these is via Monte Carlo methods [30], in which random models are generated and evaluated, with standard deviations computed from the deviations of the model parameters from the optimal values. By random models we mean models in which, for example, the rate f_n(m_S) in Eq. (6) is assigned a value by a random number generator, and the performance of the suite of models is used to define the envelope of uncertainty. A second method of estimating uncertainty uses bootstrap methods, in which the large earthquake times are sampled with replacement [31]. Estimates of reliability or other parameters are computed for suites of models having the optimal parameters, and the envelope of uncertainty is computed. Here we implement this method by resampling the m_L ≥ 6 occurrence times in 500 distinct trials and recomputing the verification measures as described above. The 500-member ensemble of values of these measures then defines the statistical sample that allows us to estimate sample uncertainties. Using the bootstrap method, we compute both the mean and the standard deviation of the statistical measures of reliability, sample skill, etc., for the 500 samples. For a measure of uncertainty, we choose the larger of (1) the computed standard deviation, or (2) the difference between the computed mean from the 500 samples and the computed value for the actual data.
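A sketch of the bootstrap procedure (500 resamplings of the large-event times) is given below. The `evaluate` callable is a hypothetical stand-in for the full forecast-and-verification pipeline, since re-running Eqs. (6) and (10)–(13) for each trial depends on catalog details not given here.

```python
import numpy as np

def bootstrap_uncertainty(large_event_times, evaluate, n_trials=500, seed=0):
    """Resample the m_L >= 6 occurrence times with replacement (500 trials, as
    in the text) and recompute a verification measure for each trial.
    `evaluate` is a hypothetical callable mapping a set of event times to a
    scalar measure (e.g., the reliability error)."""
    rng = np.random.default_rng(seed)
    times = np.asarray(large_event_times, float)
    values = np.empty(n_trials)
    for i in range(n_trials):
        resampled = np.sort(rng.choice(times, size=times.size, replace=True))
        values[i] = evaluate(resampled)
    actual = evaluate(times)
    sigma = values.std(ddof=1)
    # Uncertainty: the larger of the ensemble standard deviation and the bias
    # of the ensemble mean relative to the value for the actual catalog.
    return {"mean": values.mean(), "sigma": sigma,
            "uncertainty": max(sigma, abs(values.mean() - actual)),
            "actual": actual}
```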

IV. RESULTS

Applying these ideas to earthquakes in California, we use data from the area between latitudes 29.0° and 42.0° north and between longitudes −127° and −113° east. We consider the time interval from January 1, 1985 until April 26, 2011, using data from the ANSS catalog with a catalog completeness level of magnitude m_C ≥ 3.5. The parameters α and β are estimated using data from 1985 until April 4, 2010. The period from April 5, 2010 to April 26, 2011 represents a forecast. For data analysis and plotting purposes, we use a time step equal to 0.01 yr, about 3.65 days. Larger earthquakes closer together in time than this value will not appear independently in the time series plots, although they are treated as separate events in the verification analyses. Examples include the June 28, 1992 Landers-Big Bear events (M7.3 and M6.5, 3 h apart) and the November 23–24, 1987 Elmore Ranch-Superstition Hills events (M6.2 and M6.6, 12.3 h apart).

FIG. 1. (Color online) Optimized time series for earthquake probabilities in the California-Nevada region from 1997 to the present, where data quality is best. Top: Chance (in %) of an earthquake having magnitude m_L > 6 during the next 12 months, as a function of time. Bottom: Magnitudes of all events in the region as a function of time. Blue dots and vertical dashed lines are earthquakes in the region having m_L > 6. The × represents the current 12-month probability of 51%.

To make practical calculations, we first compute the Poisson rate of small earthquakes f_n(m_S) in the defined region. Using Eq. (6), we then compute probabilities P(Δt | t) for a range of values of (α, β) during the period 1985–present. For each specific set of parameters, we compute a reliability diagram and the associated reliability error, Eq. (13). The parameters (α_min, β_min) leading to the minimum value of the reliability error are then used in Eq. (6) to compute the forecast probabilities. We also compute contingency tables and an ROC diagram to verify that the forecast has skill. The results are shown in Figs. 1–3, which are plots, respectively, of probability as a function of time, reliability, and ROC. For the region under consideration, we find that α_min = 1.25 and β_min = 1.40.

Figure 1 shows the optimal forecast as a function of time in years over the period February 22, 1995–February 22, 2012. Figure 2 is a plot of the reliability diagram, and Fig. 3 is a plot of the receiver operating characteristic.

Both the reliability diagram and the receiver operating characteristic also show results for 500 random catalogs, as discussed previously, to indicate the general level of significance of the forecast. Note that for the backtesting process, we use data only up until 15:40 PDT on April 4, 2010, the time and date of the m = 7.2 El Mayor-Cucapah (Baja) earthquake. The reason is that any forecast arising from any of the data from April 4 until the present would automatically be counted as a false alarm, since, as of this writing, no m_L ≥ 6 earthquake has occurred in the study area since April 4, 2010. Therefore any forecast after April 4, 2010, for example in Fig. 1, can be considered an actual future forecast for m_L ≥ 6 earthquakes within the California-Nevada study area.

For example, to construct the reliability diagram, we use Eqs. (10) and (13). Consider Fig. 1. From February 22, 1995 to April 4, 2010 we have about 1500 samples at 0.01-yr (3.65-day) increments. Each of these samples represents a forecast at the time indicated, because there is a computed probability that at least one m_L ≥ 6 earthquake will occur within 1 yr of that particular time. This computed probability from Fig. 1 therefore defines r_t in Eq. (10). Thus there are K = 1500 distinct forecasts r_k. Now if an earthquake having m_L ≥ 6 occurs within 1 yr of the time corresponding to r_k, we set d_k = 1. We then divide the K forecasts into T subsamples and proceed to compute the reliability for each subsample. In the reliability diagram of Fig. 2, we use T = 20 subsamples, of which seven have computed forecast probabilities, leading to seven points on the diagram. To estimate the random error, we constructed 500 catalogs using the bootstrap method described above and applied the same procedure. These are plotted as the myriad of small crosses aligned in vertical groups.
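Putting the pieces together, the backtest just described (forecasts every 0.01 yr, outcomes scored within a 1-yr window, grid search over (α, β)) might look roughly like the sketch below. It reuses the hypothetical helpers from the earlier sketches, treats the catalog simply as two arrays of event times in decimal years, and uses the reliability term of Eq. (10) as a stand-in for the reliability error of Eq. (13); none of this is the authors' actual implementation.

```python
import numpy as np

def backtest_grid_search(small_times, large_times, t0, t1,
                         alphas=np.arange(0.5, 2.01, 0.05),
                         betas=np.arange(0.8, 2.01, 0.05),
                         dt=0.01, window=1.0):
    """Grid search over (alpha, beta): build the forecast series on a dt = 0.01 yr
    grid, label each forecast by whether an m_L >= 6 event occurs within `window`
    years, and keep the pair that minimizes the reliability term.  Event times are
    in decimal years; reuses ntw_probability_in_time() and brier_decomposition()."""
    small_times = np.asarray(small_times, float)
    large_times = np.asarray(large_times, float)
    grid = np.arange(t0, t1, dt)
    rate_small = small_times.size / (t1 - t0)     # Poisson rate f_n(m_S)

    # Outcome d_k = 1 if a large event occurs within `window` years of the forecast time.
    d = np.array([np.any((large_times > t) & (large_times <= t + window)) for t in grid], int)

    # Number of small events since the most recent large event, at each forecast time.
    def n_since_last_large(t):
        past = large_times[large_times <= t]
        t_last = past[-1] if past.size else t0
        return np.sum((small_times > t_last) & (small_times <= t))

    n_series = np.array([n_since_last_large(t) for t in grid], float)

    best = (None, None, np.inf)
    for alpha in alphas:
        for beta in betas:
            r = ntw_probability_in_time(n_series, window, rate_small, alpha, beta)
            rel = brier_decomposition(r, d)["reliability"]
            if rel < best[2]:
                best = (alpha, beta, rel)
    return best   # (alpha_min, beta_min, minimum reliability term)
```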

The receiver operating characteristic diagram was computed in an analogous manner. Starting at a threshold value of 100% in Fig. 1, the threshold was then lowered toward 0%. At each threshold value, a contingency table is constructed. Of the S samples of length 0.01 yr above the threshold, the number of samples that include a large earthquake, as well as those that do not, are tabulated, and the a, b, c, d values in the table are computed. From these values the ROC curve is computed. In this diagram, the diagonal line represents the no-skill (random guessing) ROC curve. It can be seen that the 500 ROC curves generated from the 500 random forecasts cluster around this no-skill line. The spread of the 500 ROC curves provides an indication of the magnitude of the random error, as well as of the likelihood that a random forecast might show skill just by chance.

FIG. 2. (Color online) Reliability (scatter plot) diagram for earthquake probabilities corresponding to the time series shown in Fig. 1 and to the data in Table I. In the figure, there are seven bins having forecast probability, and these are plotted against observed frequencies. A perfectly reliable forecast would have zero reliability error, and the blue dots would lie along the black dashed line having slope = 1. The red dashed line is the no-skill line, and the large red dot is the "climatology point," or average forecast. The light turquoise crosses are the reliability errors constructed by superposing the reliability diagrams for the 500 random forecasts associated with the bootstrap error analysis. The inset shows the histogram of forecast probability bins.

FIG. 3. (Color online) Temporal receiver operating characteristic diagram for the forecast of Fig. 1. The solid blue line is the ROC diagram for 1997 to the present, and the light turquoise crosses represent a superposition of the ROC diagrams for the 500 random forecasts.

V. SPATIAL LOCALIZATION

An important extension of the calculation is the localization of probability in space. Earthquakes outside the region of interest contribute to probabilities inside the region of interest, with a diminishing influence as their distance increases. The characteristic correlation length ξ controls the average distance over which the influence is exerted. The average magnitude of the influence depends on the correlation function, which for a mean-field system is C(r) = e^{−r/ξ}/r^{d−2+η} [32]. Typically η = 0 for mean-field systems such as elastic systems [32]. In addition, horizontal distances of interest are usually much larger than vertical distances (depths), so we take C(r) ≈ e^{−r/ξ}. This implies that small earthquakes outside the region of interest contribute a factor C(r) to the increase in probability, while large earthquakes contribute a factor C(r) to the decrease in probability inside the region. Calculations illustrating these effects will be presented in a separate publication [29].
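As a trivial illustration of the weighting implied by C(r) ≈ e^{−r/ξ}, consider the sketch below. The distance units and the way such a weight would enter the probability update are assumptions, since the spatial calculation itself is deferred to Ref. [29].

```python
import numpy as np

def correlation_weight(r, xi):
    """Simplified mean-field correlation weight C(r) ~ exp(-r/xi) for an event
    a distance r outside the target region (the 1/r^(d-2+eta) factor is dropped,
    with eta = 0, as in the text)."""
    return np.exp(-np.asarray(r, float) / xi)
```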

VI. SUMMARY

Black Swans are unexpected large events that are often associated with systems having power-law or fat-tailed frequency-size statistics. For such statistics, the variance is often much larger than the Gaussian variance; for fat-tailed statistics whose exponents are within certain ranges, the variance can in fact be infinite. We have presented a method for computing conditional probabilities of occurrence of the largest events in these systems based on failure and reliability analysis. We have applied these methods to earthquakes in the California-Nevada region, and we show optimal forecasts as well as the backtests that underpin them.

The model for earthquake probabilities described here has implications for the long-standing debate over whether earthquakes tend to cluster in time or to anticluster [33]. Are large earthquakes more probable during times of increased seismic activity (activation) or during times of decreased seismic activity (quiescence)? We have considered this question at length [34] and find no compelling evidence for either assumption. We concluded that earthquake probabilities computed from rates of activity are not statistically significant as compared to Poisson probabilities. These results indicate that the clustering model (activation or "attraction") for earthquake interoccurrence times is not preferred over the anticlustering model (quiescence or "repulsion"), and vice versa. Following the results in Ref. [34], we concluded that a new model for earthquake probabilities was needed, leading to the model described in this paper.

Financial markets and neural networks are known to exhibit phenomena similar to earthquakes. For example, the economic collapse and market crash of 2008–2009 have been compared to a major earthquake, including "foreshocks" and "aftershocks" [11–14,33,35]. We predict that the approach developed here will have significant applications to the markets. A market move in a stock or index can be associated with a "magnitude," the frequency distribution of which is found to have a power-law tail, similar to the Gutenberg-Richter relation in earthquakes [8–10]. Strong temporal clustering of large events is also observed, although with different characteristics; an example is the lack of a relatively regular renewal cycle of the type seen in Fig. 1. Consequently we predict that the exponent β will be characteristic of a stretched exponential distribution (β < 1) rather than the compressed (Weibull) distribution (β > 1) used above.

TABLE I. Summary of backtest statistics for the NTW model and for 500 random catalogs as computed by a statistical bootstrap method. Times for m_L ≥ 6 earthquakes are randomly sampled with replacement to produce estimates of reliability and other measures. Here we implement this method by resampling the large earthquake times in 500 distinct trials and recomputing the verification measures as described above. The 500-member ensemble of values of these measures then defines the statistical sample that allows us to estimate sample uncertainties. Using the bootstrap method, we compute both the mean and the standard deviation of the statistical measures of reliability, sample skill, etc., for the 500 samples. For a measure of uncertainty, we choose the larger of (1) the computed standard deviation, or (2) the difference between the computed mean from the 500 samples and the computed value for the actual data. Ranges for the various test measures are as follows (listing good values to bad values): reliability (0 to ∞), resolution (1 to 0), sample skill (1 to −∞), area skill (+0.5 to −0.5). Errors are ±1σ.

                  Reliability       Resolution        Sample skill        Area skill
NTW model         0.027             0.038             0.045               0.138
Random models     0.070 ± 0.031     0.066 ± 0.022     −0.019 ± 0.072      0.045 ± 0.122

ACKNOWLEDGMENT

Research by J.B.R. and J.R.H. was performed with funding from NASA NNX08AF69G to the University of California, Davis.

[1] J. J. Hopfield, Proc. Natl. Acad. Sci. USA 79, 2554 (1982).
[2] J. B. Rundle and D. D. Jackson, Bull. Seismol. Soc. Am. 67, 1363 (1977).
[3] J. B. Rundle and W. Klein, J. Stat. Phys. 72, 405 (1993).
[4] S. G. Hanson, M. H. Pesaran, and T. Schuermann, J. Empirical Finance 15, 583 (2008).
[5] S. Rahmstorf, Nature (London) 464, 681 (2010).
[6] J. Rockstrom, W. Steffen, K. Noone, A. Persson, F. S. Chapin, E. Lambin, T. M. Lenton, M. Scheffer, C. Folke, H. Schellnhuber et al., Ecology and Society 14, 32 (2009).
[7] N. N. Taleb, The Black Swan (Random House, New York, 2007).
[8] J. B. Rundle, K. F. Tiampo, W. Klein, and J. S. S. Martins, Proc. Natl. Acad. Sci. USA 99, 2514 (2002).
[9] C. H. Scholz, The Mechanics of Earthquakes and Faulting (Cambridge University Press, Cambridge, England, 2002).
[10] X. Gabaix, P. Gopikrishnan, V. Plerou, and H. Stanley, Nature (London) 423, 267 (2003).
[11] F. Lillo and R. N. Mantegna, Phys. Rev. E 68, 016119 (2003).
[12] F. Caruso, A. Pluchino, V. Latora, S. Vinciguerra, and A. Rapisarda, Phys. Rev. E 75, 055101 (2007).
[13] F. Caruso and H. Kantz, Eur. Phys. J. B 79, 7 (2011).
[14] E. Lippiello, L. de Arcangelis, and C. Godano, Phys. Rev. Lett. 100, 038501 (2008).
[15] P. A. Varotsos, N. V. Sarlis, H. K. Tanaka, and E. S. Skordas, Phys. Rev. E 71, 032102 (2005).
[16] J. R. Holliday, J. B. Rundle, D. L. Turcotte, W. Klein, K. F. Tiampo, and A. Donnellan, Phys. Rev. Lett. 97, 238501 (2006).
[17] M. Evans, N. Hastings, and B. Peacock, Statistical Distributions (Wiley, New York, 1993).
[18] E. J. Gumbel, Statistics of Extremes (Columbia University Press, New York, 1958).
[19] E. Castillo, Extreme Value Theory in Engineering (Academic, New York, 1988).
[20] M. Xie and C. D. Lai, Reliab. Eng. Syst. Saf. 52, 87 (1996).
[21] WWRP/WGNE Joint Working Group on Verification, Forecast Verification: Issues, Methods and FAQ, http://www.cawcr.gov.au/projects/verification/ (2010).
[22] A. H. Murphy and H. Daan, Probability, Statistics, and Decision Making in the Atmospheric Sciences (Westview Press, Boulder, 1985).
[23] I. T. Jolliffe and D. B. Stephenson, Forecast Verification (Wiley, Chichester, 2003).
[24] D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics (Wiley, New York, 1966).
[25] I. B. Mason, Aust. Met. Mag. 30, 291 (1982).
[26] A. H. Murphy and R. L. Winkler, Mon. Weather Rev. 115, 1330 (1987).
[27] V. V. Kharin and F. W. Zwiers, J. Clim. 16, 4145 (2003).
[28] W. Hsu and A. H. Murphy, Int. J. Forecasting 2, 285 (1986).
[29] P. R. Bevington and D. K. Robinson, Data Reduction and Error Analysis for the Physical Sciences (McGraw-Hill, New York, 2006).
[30] B. Efron and R. Tibshirani, An Introduction to the Bootstrap (Chapman and Hall/CRC, Boca Raton, 1993).
[31] J. B. Rundle (unpublished).
[32] N. Goldenfeld, Lectures on Phase Transitions and the Renormalization Group (Addison Wesley, Reading, MA, 1992).
[33] A. Corral, Phys. Rev. E 71, 017101 (2005).
[34] J. B. Rundle, J. R. Holliday, M. Yoder, M. K. Sachs, D. L. Turcotte, K. F. Tiampo, W. Klein, and L. H. Kellogg, Geophys. J. Int. 187, 225 (2011).
[35] R. B. Reich, Aftershock: The Next Economy and America's Future (Alfred A. Knopf-Random House, New York, 2010).
