Non-parametric estimation of historical volatility

1 downloads 0 Views 285KB Size Report
Statistics Research Associates Ltd, Wellington, New Zealand. E-mail: ... A new non-parametric procedure for estimating historical volatility is proposed based on ...
Non-parametric estimation of historical volatility J A Randal†, P J Thomson‡ and M T Lally† † Victoria University of Wellington, New Zealand ‡ Statistics Research Associates Ltd, Wellington, New Zealand E-mail: [email protected]

Non-parametric estimation of historical volatility

2

Non-parametric estimation of historical volatility

Abstract. Evolving volatility is a dominant feature observed in most financial time series and a key parameter used in option pricing and many other financial risk analyses. A number of methods for non-parametric scale estimation are reviewed and assessed with regard to the stylised features of financial time series. A new non-parametric procedure for estimating historical volatility is proposed based on local maximum likelihood estimation for the t–distribution. The performance of this procedure is assessed using simulated and real price data and is found to be the best among estimators we consider. We propose that it replaces the moving variance historical volatility estimator.

JEL classification: C1. Keywords: EM algorithm; financial return data; historical volatility; robust estimation; Student’s t–distribution. 1. Introduction Evolving volatility is a dominant feature observed in most financial time series and a key parameter used in option pricing and many other financial risk analyses. Although there is now an extensive literature on the estimation of parametric volatility models (see Engle (1982), Taylor (1986), Bollerslev, Chou and Kroner (1992), Harvey, Ruiz and Shephard (1994), Shephard (1996), Barndorff–Nielsen and Shephard (2001), and Andreou and Ghysels (2002) for example) less attention has been paid to simpler non–parametric alternatives. Exceptions include A¨ıt–Sahalia (1996), Anderson and Grier (1992), Andersen, Bollerslev, Diebold and Labys (2001), and Turner and Weigel (1992) for example. In this paper we present preliminary findings on the construction and properties of non–parametric estimators of time–varying volatility where volatility is assumed to be a measure of variance. Our focus is on financial data and, in particular, the daily returns of market prices such as equities, market indices, exchange rates etc., where daily returns are the first differences of the logarithms of the prices. Thus each daily return measures the continuously compounded rate of return on the asset, index etc. over the day concerned. Our objective is to construct non–parametric historical volatility estimators that have simple structure, are cheap to compute, and are tailored to the typically heavy– tailed distributions met in practice. In particular we seek procedures that are robust to distributional assumptions, resistant to outliers, and have a sound statistical basis with reasonable precision properties. The estimation procedures that we consider construct local robust scale estimates (not necessarily estimates of standard deviation) based on finite moving–averages of the squared deviations of the time series from its local level. The moving–average weights are selected with reference to a target family of heavy–tailed distributions exemplified by the t–distribution, and the span of the moving–average is chosen so that the volatility is approximately constant within the local time window concerned. Finally, a global correction factor is applied to the local scale estimates to provide estimates of time–varying volatility.

Non-parametric estimation of historical volatility

3

Smoothness assumption This assumes that volatility generally changes smoothly over time and, in particular, is locally constant over the local time windows within which estimation takes place. This seems a reasonable assumption for the most part; without it reliable volatility estimation would be difficult to achieve. It also accords with the long–memory dependence of squared daily returns and absolute returns observed in practice (see Ding, Granger and Engle (1993), Granger and Ding (1995), Ryd´en, Ter¨ asvirta and ˚ Asbrink (1998) for example). However this framework does not properly account for discontinuous breaks in volatility which may occur in practice (see Lamoureux and Lastrapes (1990), Hamilton and Susmel (1994), McConnell and Perez–Quiros (2000) for example) although our methodology could no doubt be adapted to better identify such changes. Heavy-tail assumption This assumes that the daily returns of financial time series have heavy–tailed distributions that are better approximated by a t–distribution with low degrees of freedom than a Gaussian distribution. There are many studies, Fama (1965) being the first, that support the general heavy–tailed hypothesis which would appear to be a ubiquitous feature of financial data. A number of candidate distributions have been proposed of which the t–distribution is a common choice. See, for example, Blattberg and Gonedes (1974), Harvey, Ruiz and Shephard (1994), Hurst and Platen (1997), Kon (1984), Liesenfeld and Jung (2000) and Barndorff–Nielsen and Shephard (2001) among many others. Typically the degrees of freedom ν of the t–distribution found in such studies range between 3 and 9. Note, however, that the tν distribution has infinite moments of order k when k ≥ ν. Thus ν ≥ 3 ensures finite variance and ν ≥ 5 ensures finite kurtosis. Global correction factor Although the heavy tailed distributions of the returns make the moving sample variance an unreliable local volatility estimate, particularly if the moving time window is small, we shall assume that the sample variance is a reliable estimator of the variance of the underlying heavy–tailed distribution in large samples of the order of the length of the data. In essence we assume that the extreme values of the heavy–tailed distribution will appear representatively in large samples where they will not distort the sample variance. However, in small samples of the order of the moving time window, such extreme observations will be over–represented when they occur and can severely distort the sample variance. With these points in mind we now choose to model a time series of (daily) returns Rt as Rt = σt t

(1)

where Rt = ln Pt −ln Pt−1 and Pt is the underlying time series of prices concerned. The t are assumed to be independently and identically distributed with mean zero and unit variance. Although the assumption of finite variance excludes potentially interesting specifications for t , the condition E(2t ) = 1 serves to identify the volatility σt2 which is assumed to be a strictly positive, smoothly–varying function of time so that Rt has mean zero and variance σt2 . In fact, Rt will almost always have a non-zero mean level that is typically very small in relation to σt t since it represents the mean return over just one day. However we shall assume that Rt has been corrected for such an evolving mean level if appropriate. As mentioned before, the Rt will typically be heavy–tailed so that the common distribution of the t will be heavy–tailed also.

Non-parametric estimation of historical volatility

4

As a general comment, we note that although non–parametric volatility estimators cannot be used to forecast volatility, they are particularly appropriate for extracting and understanding historical volatility prior to fitting any more sophisticated parametric model. They also provide robust benchmarks for testing the forecasting and in–sample performance of competing parametric procedures. In Section 2 we review methods for robust estimation of scale in the case where σt2 is a constant and in Section 3 we use this information to construct robust and resistant estimates of evolving volatility. Section 4 presents a preliminary assessment of these estimators using simulated data and also applies the estimators to stock price data. Section 6 concludes. 2. Robust scale estimation An excellent review of robust and resistant scale estimation methods is given in Iglewicz (1983) who builds on the Monte Carlo studies of Gross (1976) and Lax (1975). Modern alternatives to the estimators discussed by Iglewicz can be found in Rousseeuw and Croux (1993), Croux (1994) and Randal (2002). In a subsequent paper, Lax (1985) reported on a more extensive Monte Carlo study on the finite–sample performance of 17 robust scale estimators for heavy–tailed symmetric distributions. The 17 scale estimators considered had been identified as promising or commonly used from earlier studies. The general conclusion of the Lax (1985) study was that appropriately tuned A–estimators of scale were more robust for heavy–tailed symmetric distributions than the other estimators considered. In Sections 2.1 to 2.3 we review existing scale estimators, and in Section 2.4 we introduce a new non–parametric estimator of scale which will form the basis of a robust volatility estimator for financial time series. 2.1. The sample variance The sample standard variance is the optimal estimator of variability for normal samples, however it is not an efficient estimator for heavy–tailed data, and as such it is not a robust estimator. The moving sample variance has been prominent in empirical studies of stock returns at least since Officer (1973), to the extent where it is commonly called historical volatility for contemporaneously available data, and realized volatility for future data. It plays an important part of Figlewski’s (1997) analysis, where it is used to estimate both “historical” and “realized” volatility as a basis for evaluation of volatility forecasting techniques. This estimator implicitly makes the assumption that volatility is constant over the estimation window, as we do. However, it fails to take into account the heavy-tailed distribution of stock returns. 2.2. The median absolute deviation The median absolute deviation from the median (MAD) is a robust estimator in the sense that it is relatively efficient for heavy–tailed data. Its general formula is MAD = median{|Xt − medianXt |}.

(2)

For normally distributed data, the expected value of the MAD is the upper quartile of the distribution, or 0.6475σ. To measure volatility, we will consider squaring the MAD.

Non-parametric estimation of historical volatility

5

2.3. The A–estimator of scale Given n independent and identically distributed observations X1 , . . . , Xn with zero mean, the A–estimator of scale with biweight weight function is given by n k2 X It (1 − u2t )4 Xt2 (3) s2A = A n − 1 t=1 where (

n

−1 kA

1X = (1−u2t )(1−5u2t ), n t=1

ut = It Xt /(cs0 ),

and

It =

1 0

(|Xt | ≤ cs0 ) (|Xt | > cs0 ).

Here c is a positive tuning constant that depends on the choice of auxiliary scale estimate s0 > 0, and s0 is generally taken to be the MAD of the Xt . An A–estimator of scale is based on the sample analogue of the asymptotic variance of an M –estimator of location (see Iglewicz (1983) or Lax (1985) for further details). Like the sample variance, the A–estimator can be viewed as a weighted sum of squared observations, however it features robustness weights that adapt to the size of Xt by downweighting extreme observations. The extent of the downweighting is controlled through the auxiliary scale estimator, and the scaling constant c, and also by the shape of the biweight function. The estimator (3) with s0 = MAD and c = 9 performed best in the Lax (1985) Monte Carlo study. Note that for Gaussian data MAD/0.6745 is an unbiased estimator of the standard deviation so that, in this case, (3) with s0 = MAD and c = 9 will ignore observations that lie more than 6 standard deviations from zero. To better identify appropriate values for c in (3) we re–worked the Lax (1985) study for a selection of scale estimators of interest. Given the significant improvements in speed and accuracy of modern computing that have taken place since the mid 1980s, a larger scale, more accurate study was able to be undertaken (Randal (2003)). With the larger-scale simulation, we were able to confirm the choice of the A–estimator with c = 9 as a relatively efficient estimator of scale, although unlike Lax (1985), it was not the most efficient estimator, and it had a somewhat lower efficiency than that reported by Lax. Optimal estimators were taken as those whose logarithms had the largest triefficiency (see Iglewicz (1983)). As well as being a symmetrising transformation, the use of logarithms allows direct comparison between the reliability of scale estimators whose expected values differ. Here triefficiency is measured as the minimum of the efficiencies obtained in the three cases where the data follows the Gaussian, one–wild and slash distributions. These three “corner” distributions have varying degrees of heavy tail behaviour and are meant to encompass most situations met in practice. In particular, the slash distribution has infinite variance and heavier tails than the target family of tν distributions (ν ≥ 3) that we have in mind here. 2.4. The iterated t–estimator of scale We now develop an alternative estimator to (3) which is more closely tailored to the t–distribution. Let X1 , . . . , Xn denote the n independent and identically distributed random variables given by σZt Xt = √ (t = 1, . . . , n) St

Non-parametric estimation of historical volatility

6

where the Zt are independent N (0, 1) random variables that are independent of the St , the St are independent random variables on [0, ∞) each with distribution function FS (s), and E(Xt2 ) = σ 2 . We have in mind the case where St is proportional to a χ2ν random variable so that the Xt are scaled tν random variables with ν ≥ 3 to ensure finite variance. However, for the moment, we pursue the more general case of arbitrary distribution function FS (s) and consider estimating σ 2 using maximum likelihood. Note that, in almost all cases of practical interest, the maximum likelihood estimator of σ 2 will have to be obtained using iterative optimisation procedures and it is the moving–average equivalent of such procedures that we seek. Bearing these points in mind we now use the EM algorithm (Dempster, Laird and Rubin (1977)) to construct an iterative formula for estimating σ 2 . Ignoring additive terms that are not functions of the parameter σ 2 , the log-likelihood of the complete data X = (X1 , . . . , Xn ), S = (S1 , . . . , Sn ) is proportional to n X ˜ = − log σ 2 − 1 1 log L St Xt2 . σ 2 n t=1 ˜ Maximising E0 (log L|X) with respect to σ 2 yields n 1X σ ˆ2 = E0 (St |X)Xt2 n t=1

(4)

where the expectation is with respect to the conditional distribution evaluated at σ02 , a previous estimate of σ 2 . Note that (4) has the same structure as (3) since both involve averaging squared observations weighted using weights that are a function of an auxiliary or preliminary estimate of σ 2 . In this sense both estimators can be regarded as M –estimators of scale (see Huber (1981) and Croux (1994)). Now consider the case where the St are χ2ν /(ν − 2) random variables so that the Xt are scaled tν random variables with E(Xt2 ) = σ 2 . Evaluation of E0 (St |X) gives the recursive equation for the maximum likelihood estimator of σ 2 to be −1 n  Xt2 ν+1 1 X 1+ Xt2 . (5) σ ˆν2 = ν − 2 n t=1 (ν − 2)σ02 A proof is given in the Appendix. For given ν this estimator will need to be iterated to obtain the maximum likelihood estimator of σ 2 . It is the moving-average equivalent of this estimator that we choose to base our volatility estimation procedure on. 3. Local volatility estimation We now consider the time series of daily returns Rt (t = 1, . . . , T ) defined by (1) with heavy–tailed t and where the Rt have been corrected for evolving level if appropriate. We consider a general non–parametric historical volatility estimator given by the finite moving average of span 2r + 1 r X 2 σ ˆt2 = wj qt+j Rt+j (6) j=−r

Pr where the weights wj satisfy j=−r wj = 1 and control the precision and smoothness of σ ˆt2 as an estimator of the underlying time–varying volatility. The robustness weights qt ≥ 0 are used to downweight extreme observations and, as a consequence, may be a function of the Rt .

Non-parametric estimation of historical volatility

7

Within the estimation window, E(ˆ σt2 ) = σt2 /τ where τ typically depends on the estimator used, and the underlying distribution of the t . A focus of robust statistics, and here, is forming estimators which perform well for any underlying distribution, and so specification of a distribution will typically be avoided, even when the form of the estimator allows theoretical calculation of τ . This problem is evident in Turner and Weigel’s (1992) estimates of the volatility of the S&P 500 and Dow Jones indices. There, volatility based on the (robust) moving interquartile range has a very different scale to that based on a moving sample variance. It follows though, that within the estimation window Rt /ˆ σt ∼ (0, 1/τ ), and due to local estimation we can estimate 1/τ from the entire series using the sample variance. To correct any evolving volatility estimates σ ˆt2 , we multiply through (6) by τˆ where 2 T  1 X Rt . (7) τˆ = T t=1 σ ˆt This estimator is just the sample variance of all the scale adjusted (or standardised) returns Rt /ˆ σt and, although not a robust or reliable estimator in small samples, it can be expected to provide a reliable estimate of the variance for very large samples. As noted in Section 1, the extreme values of the heavy–tailed distribution are assumed to appear representatively in large samples of the order of the series length T , but will be over–represented when they occur in small samples of the order of the span 2r+1 of the moving estimation window. This correction ensures the series of standardised returns have unit variance, with the robust procedure dictating exactly how the volatility estimates react to returns as they pass through the estimation window. 3.1. The moving sample variance The commonly used non–parametric estimator of volatility is the historical volatility r X 1 Vt = R2 (8) 2r + 1 j=−r t+j which is the moving variance of the (zero-mean) returns. Comparing (8) with the general EM result (4) indicates that the moving variance is optimal for normally distributed returns, in which case St ≡ 1. Our objective is to improve this estimator by incorporating robustness weights and thus better reflect the typical properties of the financial data under study. The moving variance is a special case of the general finite moving average of span 2r + 1 in (6). Reconciling (8) with (6), we see the smoothness weights for the moving variance estimator are wj = 1/(2r +1) for all j, and the robustness weights are identically one for all t. Further, comparison with (4) suggests setting wj = 1/(2r + 1) and qt = E0 (St |X) in the general estimator. 3.2. The moving MAD A simple robust estimator of volatility can be based on a moving median absolute deviation (MAD). Like the interquartile range estimator used by Turner and Weigel (1992), the robustness weights given by the moving MAD depend on every return in the window, and as such, the moving MAD does not fit into the general framework (6). For each observation in the body of the series, i.e. Rt for t = r + 1, . . . , T − r, we calculate

Non-parametric estimation of historical volatility

8

the MAD (2) and square the resulting value to obtain an estimate proportional to volatility. 3.3. The moving A–estimator The moving MAD can be used as an input to a volatility estimator motivated by the A–estimator (3). The A–estimator of volatility has wj = 1/(2r + 1) for all j, and  4 2   1 − Rt |Rt | ≤ cst qt = (cst )2   0 |Rt | > cst where s2t is the uncorrected MAD volatility estimator, and c = 9 should be used for efficient estimation. Correction factors featuring in (3) but which are not present in wj or qt are incorporated into calculation of τˆ and can be ignored at the filtering stage. Research into robust scale estimation suggests that the A–estimator of volatility should be highly efficient for a wide range of underlying models for t . 3.4. The iterated t–estimator We finally present the iterated t–estimator, whose underpinning assumptions we feel most closely reflect the empirical properties of daily stock returns. As before, the smoothness weights are wj = 1/(2r + 1) for all j and in this case, the robustness weights are given by !−1 Rt2 ν+1 1+ qt = 2 ν−2 (ν − 2)ˆ σ0,t 2 where σ ˆt,0 is a previous estimate of volatility. The updated volatility estimate is !−1 r 2 X ν + 1 R 1 t 2 1+ Rt2 (9) σ ˆ1,t = 2 2r + 1 j=−r ν − 2 (ν − 2)ˆ σ0,t

and the theory of the EM algorithm states that the likelihood function evaluated using σ ˆ12 can not be less than its value at σ ˆ02 , i.e. we can’t get worse with an 2 additional iteration. Consequently, use of the A–estimate of volatility for σ ˆ0,t and a single additional filtration of the returns using (9) should yield an improved volatility estimate if specification of the t–distribution is appropriate. The fully iterated testimator based on repeated application of (9) will yield the true maximum likelihood estimator for t–distributed t , and the final volatility estimate will be based on the converged estimates with the correction (7). In the simulation and data analysis section that follows, a further alternative is considered which does not fit into the framework (6). This is the GARCH(1,1) estimator. Although it is a parametric estimator of (conditional) volatility, it is applied so generally and often to series which are not GARCH(1,1), that it is often viewed as a non-parametric estimator of historical volatility. Unlike (6), this estimator is based on the past returns alone, rather than on a symmetric smoothing window. However, it provides a useful benchmark to compare with the other contenders for estimation of σt2 .

Non-parametric estimation of historical volatility

9

4. Simulations Using simulated data, we now consider the relative performances of the local volatility estimators based on the standard deviation, MAD, biweight A–estimator, t–estimator, and GARCH(1,1) estimator respectively. We selected ν = 5 for the t–estimator since the t5 distribution has heaviest tails among the t–distributions with finite variance and kurtosis. In all cases the iterated t–estimator was initialised by the local sample variance and iterated a further 3 times to produce a final estimate. The GARCH(1,1) estimate of conditional volatility is obtained using maximum likelihood with conditionally Gaussian innovations. For the simulation study we simulated 270 returns from the scaled tν distribution with unit variance and with ν = 3, 5, 9 to represent varying degrees of heavy tailed behaviour and also from the standard Gaussian distribution (a scaled tν with ν = ∞), and the Laplace (double exponential) distribution with unit variance. The series length was chosen to roughly represent a calendar year of trading days allowing for end effects. Our estimators were based on moving windows of span 2r + 1 = 21 with uniform weights wj = 1/21. The latter were selected since our concern at this stage was with the precision of the estimators rather than their smoothness. The impact of the smoothness weights wj on the properties of the estimators, among other issues, remains to be investigated as part of a more extensive simulation study. We apply the volatility estimators to data generated with both deterministic and stochastic volatility. 4.1. Estimation for deterministic volatility In the deterministic case, the 270 simulated returns were generated by multiplying σt based on the smooth volatility function σt2 = 9esin(πt/125) .

(10)

by independent scaled innovations t from each of the five distributional choices: the standard Gaussian, the scaled t–distributions with 3, 5 and 9 degrees of freedom, and the unit variance Laplace distribution. The five estimates of σt2 were calculated and scaled so that the standardised returns had unit sample variance.P They were assessed by their mean absolute σt2 − σ proportionate error (MAPE = N1 ˆt2 /σt2 ) computed over the N = 250 volatility estimates available. Note that 10 observations of volatility are lost from each end of the series when a symmetric 21 day smoothing window is used. These statistics were then averaged over 5000 independent realisations of the time series, for each of the five distributions and the five estimates, to yield the results in Table 1. The results are self evident. The iterated t–estimator performed best, even in the cases where the underlying distribution was not the t5 distribution. As expected, the volatility estimator based on the moving standard deviation performed reasonably well for ν = 9 and ∞ (the latter case corresponding to the standard Gaussian distribution), but its performance deteriorated as ν decreased and for the Laplace distribution. Surprisingly, the moving sample variance was not best for the t∞ returns. The downweighting of extreme observations in the iterated t–estimator also served to produce a smoother volatility estimate and a lower average MAPE. The biweight A–estimator performed reasonably well in all cases. It might be expected that the A–estimator would have a comparable or better performance than the iterated t– estimator for heavy tail data not well–approximated by a t–distribution. However

Non-parametric estimation of historical volatility

10

this has not yet been verified. The GARCH(1,1) conditional volatility estimator lay between the A–estimator and the iterated t–estimator for each distribution. The MAPEs themselves are shown in Figure 1. We see that not only are the average MAPEs lower for the t–estimator, but also they are generally less variable. Due to the large simulation size (10000 series), all observed differences are highly significant. For both the t9 and Gaussian data, the moving standard deviation, A–estimator, t–estimator and GARCH(1,1) estimator are all of very similar quality. We also consider the performance of the t–estimator through time. Figure 2 shows the proportionate errors (ˆ σt2 −σt2 )/σt2 for t = 10, 30, . . . , 250 for 10000 simulated series. 2 The function σt is also shown for reference, with scale given by the right-hand axis. We see that there is generally a negative bias, and that this bias is largest when σt2 is high. This indicates that the smoothing window of 21 observations is too wide to accommodate the curvature of the volatility function concerned. As the t–estimator smooths the squared returns, the high downwards concavity of the volatility function around t = 50, and its strong non-linearity, induces this negative bias. 4.2. Estimation for stochastic volatility One potential criticism of the deterministic volatility function used in the simulation reported above is that it is too smooth. As a consequence, we also provide results for returns based on a stochastic volatility function. We use a standard discrete-time stochastic volatility specification σt2 = 9eht ht = ρht−1 + σt

(11)

where ρ = 0.97, σ = 0.20 and {t } is a Gaussian white noise process with unit variance. These parameter values have been taken from Liesenfeld and Jung (2000) to reflect typical values for individual stocks. Standardised returns are generated from the Laplace and tν distributions independently of the volatility process, and the actual returns obtained using (1). Once again, the five estimates of σt2 were calculated and scaled so that the standardised returns had unit sample variance. The MAPEs for each series were averaged over 5000 independent realisations of the time series, for each of the five distributions and the five estimates, to yield the results in Table 2. As for the deterministic volatility case, the iterated t–estimator is the best performing estimator for all five distributions. The standard deviation does poorly for the heavy tailed distributions, and the MAD does poorly for the t9 and Gaussian distributions. The A–estimator once again does a good job, and the GARCH(1,1) estimator improves on the A–estimator in all but the Gaussian case. As would be expected, the average MAPEs of the t–estimator are higher in the stochastic volatility case than in the deterministic case. The MAPEs themselves are shown in Figure 3 with similar features to the deterministic case. 4.3. Conclusion On the basis of this limited analysis, it is clear that the iterated t–estimator is the best of the estimators we have considered. Although it is unable to forecast volatility, is can be used for an important part of the forecasting process, in which we gain understanding into the past behaviour of volatility, and the distribution of the t .

Non-parametric estimation of historical volatility

11

5. Data analysis Finally, we turn to examine the performance of these estimators on actual price data: two individual stocks listed on the Australian Stock Exchange (ASX), and the S&P 500 Index. Coca-Cola Amatil Ltd (CCL) and The Broken Hill Proprietary Company Ltd (BHP) are among the largest and most actively traded companies listed on the ASX. Daily closing price data for these stocks, and for the 500 trading days following 18 September 1998, were analysed. A time series plot of the daily returns for each stock features periods of low volatility and periods of high volatility, and a small number of extreme returns. This latter phenomenon is more evident in the CCL returns. We again use a moving window of 2r + 1 = 21 observations. Calculation of evolving volatility for CCL using the moving variance produced volatility estimates which were badly affected not only by the large returns, but also by the many small returns. The resulting fluctuations in the volatility estimates gave a distribution of standardised returns that was not well approximated by the Gaussian distribution since it had a sharp peak and values outside four standard deviations from the mean. Estimating volatility using the A–estimator also resulted in standardised returns that were not well approximated by the normal distribution. However the distribution of these standardised returns had a smoother peak (the generally lower volatility estimates had not brought so many returns close to zero), and a distribution that was reasonably well–approximated by a tν distribution with ν = 5.55. On this basis, parametric volatility estimation based on the t5 distribution should improve the quality of the volatility estimates. A plot against time of the three estimates of evolving volatility based on the moving variance, A–estimator and iterated t–estimator is shown in Figure 4 and shows that the t–estimator is the most stable. It is clear that the t–estimator typically adopts a compromise position between the moving variance and the A–estimator, but closer to the A–estimator. The impact of extreme returns is clearly evident on the moving variance and the A–estimator often appears to discount such returns too heavily. The superior performance of the iterated t–estimator is to be expected given the results of the simulation study reported in Table 1 and the fact that the distribution of standardised returns was well–approximated by a t–distribution. In contrast, the distribution of the standardised residuals for BHP are well approximated by a normal distribution. The tails of this distribution decay quickly and all observations are within 3 standard deviations of the mean. While the use of the moving variance is appropriate for this data, the other two volatility estimators give almost identical volatility estimates as shown in Figure 5. Thus the t–estimator and A–estimator retain high efficiency in this situation also. Indeed, by using the t–estimator generally, it seems that we benefit in the case of long-tailed residuals, and maintain high efficiency with well-behaved data. The S&P 500 Index is a much studied financial time series (see for example Ding and Granger (1996), Ding, Granger and Engle (1993), Figlewski (1997), Taylor (1986) and Turner and Weigel (1992)). Volatility is calculated for this series using daily data for the period 1969–2001 inclusive using the moving variance and the iterated t–estimator. These estimates are often very similar, however the moving variance is grossly affected by the occasional extreme returns in the series. This is particularly evident around the October 1987 crash. Forming the standardised returns for this series using the t–estimator, we obtain a distribution which is well described by a t5 distribution around the mode.

Non-parametric estimation of historical volatility

12

Ding and Granger (1996) demonstrate the volatility persistence of the S&P 500 Index using the estimated autocorrelation function (correlogram) of the absolute returns. The correlogram for the absolute returns is shown in the top plot of Figure 6 and we see the very slow decay of the autocorrelations described in earlier studies. In contrast, the lower plot of Figure 6 has the correlogram for the absolute standardised returns |Rt |/ˆ σt where σ ˆt is provided by the iterated t–estimator. We see that the volatility estimate has accounted for almost all the significant autocorrelation in the absolute returns, and that which remains is limited to lags less than the width of the smoothing window (21 days). We conclude the iterated t–estimator is correctly identifying σt in the decomposition (1). 6. Conclusions This paper considers non–parametric estimation of evolving volatility in the context of heavy–tailed distributions of returns. A new robust time series estimation procedure based on finite moving averages and the t–distribution has also been introduced. Our preliminary findings indicate that local volatility estimation based on the biweight A–estimator with tuning parameter c = 9 is a reliable estimator in all situations. The iterated t–estimator performed best in the many cases where the distribution of returns is well–approximated by a t–distribution with between 3 and 9 degrees of freedom, as well as in the Gaussian and Laplace cases. Both our simulation and and data analyses indicate that the iterated t–estimator is a preferable estimator to the moving variance. In particular, the underlying assumptions of the t–estimator are no less restrictive than those of the moving variance, yet appear much more suitable for the observed return distributions. By improving estimates of “realized” volatility, use of the iterated t–estimator may result in better selection of parametric forecasting models on the basis of out-of-sample forecasting performance. More generally, the techniques we use in Section 2.4 could be used to develop other suitable estimators. Alternatives could be considered for the distribution of St that involve censoring to minimize the impact of outliers, mixed distributions with point mass at zero to account for sticky prices, and mixture distributions among other candidate distributions chosen to exemplify the target family of distributions under study. In some circumstances it may also be possible to infer the underlying distribution of St from procedures where E0 (St |X) has been specified. These remain topics for further research. Based on the robustness and resistance properties of the iterated t–estimator summarised in Section 4, we propose that it replace the moving variance historical volatility estimator in empirical analysis. The iterated t–estimator is not difficult to implement, and it better reflects the underlying properties of typical financial return data. Acknowledgments Special thanks to Clive Granger for comments on this work. Additional comments from an anonymous referee and an Associate Editor allowed us to greatly improve the paper.

Non-parametric estimation of historical volatility

13

References A¨ıt–Sahalia, Y. (1996). Nonparametric pricing of interest rate derivative securities. Econometrica 64, 527–560. Andersen, T. G., Bollerslev, T., Diebold, F.X. and Labys, P. (2001). The distribution of realized exchange rate volatility. Journal of the American Statistical Association 96, 42–55. Anderson, M. and Grier, D.A. (1992). Robust, non-parametric measures of exchange rate variability. Applied Economics 24, 951–958. Andreou, E. and Ghysels, E. (2002). Rolling-sample volatility estimators: some new theoretical, simulation and empirical results. Journal of Business and Economic Statistics 20, 363–376. Barndorff–Nielsen, O.E. and Shephard, N. (2001). Non–Gaussian Ornstein– Uhlenbeck–based models and some of their uses in financial economics. Journal of the Royal Statistical Society, Series B 63, 167–241. Blattberg, R. and Gonedes, N. (1974) A comparison of the stable and Student distributions as statistical models for stock prices. Journal of Business 47, 244– 280. Bollerslev, T., Chou, R. Y. and Kroner, K. F. (1992). ARCH modeling in finance: A review of the theory and empirical evidence. Journal of Econometrics 52, 5–59. Croux C. (1994). Efficient high-breakdown M –estimators of scale. Statistics and Probability Letters 19, 371–379. Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38. Ding, Z. and Granger, C. W. J. (1996). Modeling volatility persistence of speculative returns: A new approach. Journal of Econometrics 73, 185–215. Ding, Z., Granger, C. W. J. and Engle, R. F. (1993). A long memory property of stock market returns and a new model. Journal of Empirical Finance 1, 83–106. Engle, R. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of the United Kingdom inflation. Econometrica 50, 987–1007. Fama, E. (1965). The behaviour of stock market prices. Journal of Business 38, 34–105. Figlewski, S. (1997). Forecasting volatility. Financial Markets, Institutions & Instruments 6, 1–88. Granger, C. W. J. and Ding, Z. (1995). Some properties of absolute returns, an alternative measure of risk. Annals of Economic Statistics 40, 67–91. Gross, A. M. (1976). Confidence interval robustness with long-tailed symmetric distributions. Journal of the American Statistical Association 71, 409–416. Hamilton, J. D. and Susmel, R. (1994). Autoregressive conditional heteroskedasticity and changes in regime. Journal of Econometrics 64, 307–333. Harvey, A., Ruiz, E. and Shephard, N. (1994). Multivariate stochastic variance models. Review of Economic Studies 61, 247–264. Huber, P.J. (1981). Robust Statistics, John Wiley and Sons. Hurst, S. R. and Platen, E. (1997). The marginal distribution of returns and volatility. In L1 –Statistical Procedures and Related Topics (ed. Dodge, Y.), IMS Lecture Notes–Monograph Series 31, California. Iglewicz, B. (1983). Robust scale estimates. In Understanding Robust and Exploratory Data Analysis, John Wiley, New York.

Non-parametric estimation of historical volatility

14

Kon, S. J. (1984). Models of stock returns–A comparison. Journal of Finance 39, 147-165. Lamoureux, C. G. and Lastrapes, W. D. (1990). Persistence in variance, structural change, and the GARCH model. Journal of Business and Economic Statistics 8, 225–234. Lax, D. A. (1975). An interim report of a Monte Carlo study of robust estimators of width. Technical Report 93, Series 2, Department of Statistics, Princeton University. Lax, D. A. (1985). Robust estimators of scale: finite-sample performance in longtailed symmetric distributions. Journal of the American Statistical Association 80, 736–741. Liesenfeld, R. and Jung, R. C. (2000). Stochastic volatility models: conditional normality versus heavy-tailed distributions. Journal of Applied Econometrics 15, 137–160. McConnell, M. M. and Perez–Quiros, G. (2000). Output fluctuations in the United States: What has changed since the early 1980’s? American Economic Review 90, 1464–1476. Officer, R. R. (1973). The variability of the market factor of the New York Stock Exchange. Journal of Business 46, 434–453. Randal, J. A. (2003). A reinvestigation of robust scale estimation in finite samples. Technical Report 03-07, School of Mathematical and Computing Sciences, Victoria University of Wellington, New Zealand. Rousseeuw, P. J. and Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the American Statistical Association 88, 1273–1283. Ryd´en, T., Ter¨ asvirta, T. and ˚ Asbrink, S. (1998). Stylized facts of daily return series and the hidden Markov model. Journal of Applied Econometrics 13, 217–244. Shephard, N. (1996). Statistical aspects of ARCH and stochastic volatility. In Time Series Models in Econometrics, Finance and Other Fields (eds Cox, D.R., Hinkley, D.V. and Barndorff–Nielsen, O.E.), Chapman and Hall, London. Taylor, S. (1986). Modelling Financial Time Series. John Wiley, New York. Turner, A. L. and Weigel, E. J. (1992). Daily stock market volatility: 1928-1989. Management Science 38, 1586–1609.

Non-parametric estimation of historical volatility Estimator Biweight A–estimator GARCH(1,1) estimator Historical volatility Iterated t–estimator Median absolute deviation

Laplace 0.474 0.381 0.470 0.373 0.572

t3 0.552 0.543 0.801 0.521 0.625

15 t5 0.399 0.376 0.438 0.340 0.511

t9 0.349 0.326 0.344 0.293 0.482

Gaussian 0.307 0.293 0.282 0.263 0.468

Table 1. The average mean absolute proportionate error (MAPE) of five local volatility estimators estimating the smoothly varying volatility function (10) over 250 observations and moving windows of span 21. GARCH(1,1) is an exception, and this is based on the past observations with an additional 100 observation run-in period. The average is over 5000 simulated series, each with an individual MAPE. The simulated innovations are from the Laplace, scaled tν –distributions with ν = 3, 5, 9 and standard Gaussian distributions. The parameters of the GARCH(1,1) estimator are chosen using maximum likelihood.

Estimator Biweight A–estimator GARCH(1,1) estimator Historical volatility Iterated t–estimator Median absolute deviation

Laplace 0.536 0.476 0.543 0.455 0.619

t3 0.610 0.609 0.853 0.591 0.675

t5 0.467 0.470 0.509 0.423 0.559

t9 0.425 0.434 0.426 0.384 0.535

Gaussian 0.385 0.405 0.369 0.355 0.516

Table 2. The average mean absolute proportionate error (MAPE) of five local volatility estimators estimating the stochastic volatility (11) over 250 observations and moving windows of span 21. GARCH(1,1) is an exception, and this is based on the past observations with an additional 100 observation run-in period. The average is over 5000 simulated series, each with an individual MAPE. The simulated innovations are from the Laplace, scaled tν –distributions with ν = 3, 5, 9 and standard Gaussian distributions. The parameters of the GARCH(1,1) estimator are chosen using maximum likelihood.

Appendix A. Derivation of weights for the iterated t–estimator Implementation of (4) relies on specification of St and evaluation of E0 (St |X). The conditional distribution of St given X is √ 1 xt 2 dF (st |x) ∝ f (xt |st )dFS (st ) ∝ st e− 2 ( σ ) st dFS (st ), and under the assumption that the Xt are independent, we observe that R ∞ − 1 ( x )2 s 3 e 2 σ s 2 dFS (s) E0 (St |X) = E0 (St |Xt = x) = R0∞ − 1 ( x )2 s 1 e 2 σ s 2 dFS (s) 0 where the denominator is the normalising constant of the conditional distribution. In particular  2 ! 1 Xt E0 (St |X) = Q 2 σ0 where Q(t) = −d log M (t)/dt and M (t) is the Laplace transform Z ∞ √ M (t) = e−ts sdFS (s) (t ≥ 0). 0

16

1.0 0.5 0.0

Mean Absolute Proportionate Errors

1.5

Non-parametric estimation of historical volatility

laplace distribution

t(3) distribution

t(5) distribution

t(9) distribution

normal distribution

Source code: /home/john/Research/Npvol−im

ctory: /home/john/Research/Npvol−improved/Store A

g

hv

t

M

A

g

hv

t

M

A

g

hv

t

M

A

g

hv

t

M

A

g

hv

t

Estimator

Figure 1. 5000 realisations of the mean average proportionate errors for each estimator with true volatility given by (10), and A denoting the moving A– estimator, g, the GARCH(1,1) estimator, hv, the moving variance, t, the moving iterated t–estimator, and M, the moving median absolute deviation, for series of 250 returns. The five blocks represent simulated returns from the Laplace, and scaled tν –distributions with ν = 3, 5, 9, ∞ respectively.

Now consider the case where the St are independent χ2ν /(ν − 2) random variables so that the Xt are scaled tν random variables with E(Xt2 ) = σ 2 . St has density ν−2 fS (s) = Γ( ν2 )

  ν2 ν 1 ν 1 s 2 −1 (ν − 2) 2 −1 e− 2 s(ν−2) 2

√ and so, since sfS (s) is proportional to a gamma density with parameters 1 and 2 (ν − 2) respectively, we see that  M (t) =

ν−2 ν − 2 + 2t

 12 (ν+1)

 t > − 12 (ν − 2) .

Hence ν+1 Q(t) = ν−2



2t 1+ ν−2

−1

1 2 (ν

+ 1)

M

17

5

Non-parametric estimation of historical volatility

9 0

0 −1

R directory: /home/john/PHD/NpvolThesis

Source code: /home/john/PHD/NpvolThesis / 10

30

50

70

90

110

130

150

170

190

210

230

Time

Figure 2. Boxplots showing the proportionate bias (ˆ σt2 − σt2 )/σt2 through time, for 5000 simulated series. The simulated returns have a scaled t5 –distribution, and the function σt2 is shown for reference, with magnitude given by the right-hand axis.

and so σ ˆν2 = as required.

18

Volatility

2 1

Proportionate error

3

27

4

21 Days

−1 n  Xt2 ν+1 1 X 1+ Xt2 ν − 2 n t=1 (ν − 2)σ02

250

18

1.5 1.0 0.5 0.0

Mean Absolute Proportionate Errors

2.0

Non-parametric estimation of historical volatility

laplace distribution

t(3) distribution

t(5) distribution

t(9) distribution

normal distribution

Source code: /home/john/Research/Npvol−im

ctory: /home/john/Research/Npvol−improved/Store A

g

hv

t

M

A

g

hv

t

M

A

g

hv

t

M

A

g

hv

t

M

A

g

hv

t

Estimator

Figure 3. 5000 realisations of the mean average proportionate errors for each estimator with true volatility given by (11), and A denoting the moving A– estimator, g, the GARCH(1,1) estimator, hv, the moving variance, t, the moving iterated t–estimator, and M, the moving median absolute deviation, for series of 250 returns. The five blocks represent simulated returns from the Laplace, and scaled tν –distributions with ν = 3, 5, 9, ∞ respectively.

M

19

0.0015 0.0010 0.0005 0.0000

Volatility estimate

0.0020

0.0025

Non-parametric estimation of historical volatility

0

100

200

300

400

Time

Figure 4. Squared returns and volatility estimates for CCL for the 500 trading days preceding 11 September 2000. The volatility estimates are the squares of: moving variance shown by the dotted line, moving A–estimator shown in dark grey, and moving t5 estimator shown by the solid line. The squares of absolute returns greater than 5% are not shown in the plot area.

500

20

0.0008 0.0006 0.0004 0.0002 0.0000

Volatility estimate

0.0010

0.0012

Non-parametric estimation of historical volatility

0

100

200

300

400

Time

Figure 5. Squared returns and volatility estimates for BHP for the 500 trading days preceding 11 September 2000. The volatility estimates are the squares of: moving variance shown by the dotted line, moving A–estimator shown in dark grey, and moving t5 estimator shown by the solid line. The squares of absolute returns greater than 3.5% are not shown in the plot area, and the plot is not in the same scale as that in Figure 4.

500

21

0.08 0.04 0 −0.04 0.02

Absolute returns

−0.02

Sample autocorrelation

0.12

0.16

Non-parametric estimation of historical volatility

Absolute standardised returns 0

200

400

600

800

Lag

Figure 6. the estimated autocorrelation functions for the absolute S&P 500 returns, and for the absolute standardised returns. The top plot is for |Rt |, and the lower is for |Rt |/ˆ σt where σ ˆt is the volatility given by the iterated t-estimator. Approximate 95% confidence intervals for the autocorrelation estimates are shown. The two plots are on the same scale, and only ρ0 = 1 is not shown.

1000