Numerical Techniques for Maximum Likelihood Estimation of Continuous-Time Diffusion Processes
Author(s): Garland B. Durham and A. Ronald Gallant
Source: Journal of Business & Economic Statistics, Vol. 20, No. 3 (Jul., 2002), pp. 297-316
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/1392112


Editor's Note: This article is the JBES Invited paper presented at the 2001 Joint Statistical Meetings.

Numerical Techniques for Maximum Likelihood Estimation of Continuous-Time Diffusion Processes

Garland B. DURHAM
Department of Economics, University of Iowa, Iowa City, IA 52242-1000 ([email protected])

A. Ronald GALLANT

Department of Economics, University of North Carolina, Chapel Hill, NC 27599-3305 ([email protected])

Stochastic differential equations often provide a convenient way to describe the dynamics of economic and financial data, and a great deal of effort has been expended searching for efficient ways to estimate models based on them. Maximum likelihood is typically the estimator of choice; however, since the transition density is generally unknown, one is forced to approximate it. The simulation-based approach suggested by Pedersen (1995) has great theoretical appeal, but previously available implementations have been computationally costly. We examine a variety of numerical techniques designed to improve the performance of this approach. Synthetic data generated by a Cox-Ingersoll-Ross model with parameters calibrated to match monthly observations of the U.S. short-term interest rate are used as a test case. Since the likelihood function of this process is known, the quality of the approximations can be easily evaluated. On datasets with 1,000 observations, we are able to approximate the maximum likelihood estimator with negligible error in well under 1 min. This represents something on the order of a 10,000-fold reduction in computational effort as compared to implementations without these enhancements. With other parameter settings designed to stress the methodology, performance remains strong. These ideas are easily generalized to multivariate settings and (with some additional work) to latent variable models. To illustrate, we estimate a simple stochastic volatility model of the U.S. short-term interest rate.

KEY WORDS: Continuous-time estimation; Short-term interest rate; Simulated maximum likelihood; Stochastic volatility.

Stochastic differential equations (SDE's) often provide a convenient way to model economic and financial data, and their use has become increasingly common in recent years. Although the process specified by a stochastic differential equation is defined in continuous time, the data which are available are typically sampled at discrete time intervals. The resulting estimation problem turns out to be nontrivial, and considerable energy has been expended in developing computationally (and statistically) efficient estimation schemes.

In this article, we focus primarily on scalar, time-homogeneous processes. In particular, we consider the diffusion process generated by an SDE of the form

dX = μ(X; θ) dt + σ(X; θ) dW,  X(t_0) = X_0,  (1)

with parameter vector θ. Suppose that the sample {X_i = X(t_i), i = 0, ..., n} is available for analysis. The observations need not be equally spaced. Ideally, one would like to know the transition density, which would allow one to compute the maximum likelihood estimator with its usual optimality properties. Although exact transition densities are known in only a few isolated cases, several approaches toward approximating the transition density have been proposed. Lo (1988) suggests numerically solving the Fokker-Planck partial differential equation for each observation. Pedersen (1995b) suggests a simulation-based approach which involves integrating out unobserved states of the process at intermediate points between each pair of observations (see also Santa-Clara 1995; Brandt and Santa-Clara 2002). While this approach, commonly known as simulated maximum likelihood estimation (SMLE), is able to come arbitrarily close to the true transition density, previously available implementations have been computationally burdensome.

Other approaches have been proposed which are much less computationally costly. For example, the process described by (1) has a first-order approximation given by the discrete-time process

X_{i+1} = X_i + μ(X_i; θ)Δ_i + σ(X_i; θ)Δ_i^{1/2} ε_{i+1},  Δ_i = t_{i+1} − t_i,  ε_{i+1} ~ N(0, 1).  (2)
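For concreteness, the first-order approximation (2) and the quasi log-likelihood built from it can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (which was in FORTRAN); the CIR-style drift and diffusion at the end are hypothetical stand-ins for μ(x; θ) and σ(x; θ).

```python
import math

def euler_logpdf(x_next, x_curr, dt, mu, sigma):
    """Log of the Gaussian subdensity in Eq. (2):
    X_{i+1} | X_i ~ N(X_i + mu(X_i)*dt, sigma(X_i)^2 * dt)."""
    m = x_curr + mu(x_curr) * dt
    v = sigma(x_curr) ** 2 * dt
    return -0.5 * (math.log(2.0 * math.pi * v) + (x_next - m) ** 2 / v)

def euler_loglik(x, dt, mu, sigma):
    """Approximate log-likelihood of a discretely observed path (equal spacing)."""
    return sum(euler_logpdf(x[i + 1], x[i], dt, mu, sigma)
               for i in range(len(x) - 1))

# Hypothetical mean-reverting square-root specification for illustration.
theta1, theta2, theta3 = 0.06, 0.5, 0.15
mu = lambda x: theta2 * (theta1 - x)
sigma = lambda x: theta3 * math.sqrt(x)

print(euler_loglik([0.060, 0.062, 0.059], 1.0 / 12, mu, sigma))
```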

Under mild regularity conditions, the maximum likelihood estimator based on this approximation is known to converge to the true maximum likelihood estimator as the sampling interval goes to zero (Florens-Zmirou 1989). While this approach is very appealing from a computational viewpoint,


© 2002 American Statistical Association, Journal of Business & Economic Statistics, July 2002, Vol. 20, No. 3, DOI 10.1198/073500102288618397


the approximation may not be sufficiently accurate for the sampling frequencies at which reliable data are available.

There are various ways in which one might improve upon this idea. Elerian (1998) suggests replacing the Gaussian density in (2) by a noncentral chi-squared density which is derived from the Milstein scheme, an order 2.0 weak approximation to the true process. Shoji and Ozaki (1998) linearize the SDE, obtaining an approximating Ornstein-Uhlenbeck process (the exact transition density of an Ornstein-Uhlenbeck process is known). Kessler (1997) approximates the transition function by a Gaussian density with first and second moments obtained from higher order Ito-Taylor expansions. Aït-Sahalia (2001) approximates the transition density using a Hermite function with coefficients obtained using higher order Ito-Taylor expansions. Except for Aït-Sahalia (2001), these methods still require the sampling interval to go to zero to obtain convergence to the true transition density. While this requirement also holds for Aït-Sahalia's approach with a Hermite function and Ito-Taylor expansion of fixed order, Aït-Sahalia's approximation may be made arbitrarily accurate with fixed sampling frequency by using a Hermite function and Ito-Taylor expansion of sufficiently high order (given some regularity conditions).

Various method-of-moments approaches have also been proposed. Chan, Karolyi, Longstaff, and Sanders (1992) use moments based on Equation (2). Duffie and Singleton (1993), Gallant and Tauchen (1997), Bibby and Sorensen (1995), and Gourieroux, Monfort, and Renault (1993) compute expectations using simulation-based methods. Hansen and Scheinkman (1995) and Duffie and Glynn (1996) use moment conditions obtained from the infinitesimal generator. The simulation-based methods can be computationally costly, but have the advantage of being easily adapted to diffusions with unobserved state variables.
Stochastic volatility models and term structure models are important applications where these techniques have been found useful. The efficient method of moments proposed by Gallant and Tauchen (1996) approaches the efficiency of maximum likelihood asymptotically, and provides a convenient set of diagnostic measures for model specification. Markov chain Monte Carlo (MCMC) methods have been proposed by Eraker (2001), Jones (1999a), and Elerian, Chib, and Shephard (2001). There is a close relationship between MCMC methods and SMLE. For example, Elerian et al. point out that their importance sampler can also be used with the simulation-based approach of Pedersen (1995b) to substantially reduce the computational effort required to obtain reasonably accurate likelihood approximations.

In this article, we focus on the SMLE approach. The basic idea is quite simple. Suppose that one wishes to obtain the transition density p(x_t, t; x_s, s). The first-order approximation p^{(1)}(x_t, t; x_s, s) defined by (2) will be accurate if the interval [s, t] is sufficiently short. Otherwise, one may partition the interval s = τ_0 < τ_1 < ··· < τ_M = t such that the first-order approximation is sufficiently accurate on each subinterval. The random variables X(τ_1), ..., X(τ_{M−1}) are, of course, unobserved, and must be integrated out. Because the process is


Markovian, one obtains

p(x_t, t; x_s, s) ≈ p^{(M)}(x_t, t; x_s, s)  (3)

= ∫ ∏_{m=0}^{M−1} p^{(1)}(u_{m+1}, τ_{m+1}; u_m, τ_m) dλ(u_1, ..., u_{M−1}),  (4)

where λ denotes the Lebesgue measure, and we use the convention u_0 = x_s and u_M = x_t to conserve notation. Monte Carlo integration is generally the only feasible way to evaluate this integral.

The theoretical issues involved with this approach are already reasonably well understood. Sufficient conditions for the approximation in (3) to converge are known. While it is certainly of value to extend these conditions, we do not undertake this task here. The theories of Monte Carlo integration and maximum likelihood estimation have also been extensively studied. Nonetheless, although the simulation-based approach is attractive from a theoretical point of view, the computational burden associated with previous implementations has hindered its widespread use. We have found that it can be quite costly to attain even the degree of accuracy provided by the simple first-order approximation (2). It is this shortcoming which we seek to address.

We attack the problem of computational efficiency from two directions. We first seek to improve the approximation in Equation (3). This allows one to attain a given level of accuracy with fewer intermediate points. We consider extrapolation techniques and the use of alternatives to the first-order (Euler) approximation of the subtransition densities. Secondly, we examine techniques to accelerate the convergence of the Monte Carlo integration. We consider several importance samplers and random schemes. Finally, we consider transforming the model in such a way as to make the volatility function constant. Working with the transformed rather than the original model turns out to provide a useful improvement in both the accuracy of the approximation (3) as well as the performance of the Monte Carlo integration used to compute (4).

As a test case, we use the square-root specification proposed by Cox, Ingersoll, and Ross (1985) as a model for the short-term interest rate. Parameter settings are calibrated to match monthly observations of the U.S. short-term interest rate.
This model has the advantage that the transition density is available in closed form, which allows us to easily evaluate the accuracy of our approximations. We also tested our techniques using other parameter settings and models with similar results. On simulated datasets of 1,000 observations, we are able to obtain estimates in well under 1 min (running FORTRAN code on a 750 MHz PC) which differ negligibly from those obtained by maximizing the exact log-likelihood function. Achieving comparable accuracy without our acceleration techniques would require something on the order of a 10,000-fold increase in computational effort.

Much of the discussion in this article may be readily adapted to the multivariate setting. With some additional work, the ideas can also be extended to latent variable models. We outline an approach to approximating the transition density of a continuous-time stochastic volatility model, and illustrate by


estimating a simple model over weekly observations of the U.S. Treasury bill rate. We speculate that much carries over to the time-inhomogeneous case as well; however, we have not examined such extensions carefully. Although it should be possible to apply techniques similar to those considered here to jump diffusions, this is also beyond the scope of this article. A more extensive application illustrating the techniques discussed in this article may be found in Durham (2000). Further exploration of these and related techniques in multivariate and latent variable settings is underway.

The structure of this article is as follows. Section 1 introduces the notation and provides some theoretical results, Section 2 describes the benchmarks which we will use for evaluation of our techniques, Section 3 examines the performance of the simulation-based method without any of our acceleration techniques, Section 4 considers the issue of bias reduction, Section 5 considers the issue of variance reduction, Section 6 discusses the results of our numerical experiments, Section 7 extends these ideas to the stochastic volatility model, Section 8 provides an application, and Section 9 concludes.

1. BACKGROUND

To begin, we define some notation, and provide a brief discussion of the theoretical framework. Let (Ω, F, P) be a probability space, and let W be a Brownian motion defined on it. Let {F_t, t ≥ 0} be the filtration generated by W and augmented by the P-null sets of F. Let Θ be a compact subset of R^d. We are interested in the parameterized family of scalar diffusion processes {X(t; θ), θ ∈ Θ} generated by the time-homogeneous SDE

dX = μ(X; θ) dt + σ(X; θ) dW,  X(t_0; θ) = X_0.  (5)

Assumption 1. For each θ ∈ Θ, (5) has a nonexploding, unique weak solution.

By nonexploding, we mean that there is zero probability that the process diverges to infinity over any fixed time interval. Sufficient conditions ensuring Assumption 1 are well known (e.g., Karatzas and Shreve 1991, sec. 5.5). For example, it suffices that μ and σ satisfy global Lipschitz and linear growth conditions. A variety of extensions is also available. Explosiveness would preclude the existence of a transition density, and is thus disallowed. Note that stationarity is not required.

For s < t, suppose that X(t; θ)|X(s; θ) has a transition density p(x_t, t; x_s, s, θ), and let

p^{(1)}(x_t, t; x_s, s, θ) = φ(x_t; x_s + μ(x_s; θ)(t − s), σ²(x_s; θ)(t − s)),

where φ(x; μ, σ²) is the Gaussian density, be its first-order approximation. Let s = τ_0 < τ_1 < ··· < τ_M = t be a partition of the interval [s, t], and let

p^{(M)}(x_t, t; x_s, s, θ) = ∫ ∏_{m=1}^{M} p^{(1)}(u_m, τ_m; u_{m−1}, τ_{m−1}, θ) dλ(u_1, ..., u_{M−1}),  (6)

where u_0 = x_s, u_M = x_t, and λ denotes the Lebesgue measure. This will serve as our approximating density. For clarity, we will often refer to p^{(1)}(·) as a subtransition density (or occasionally simply subdensity) when used in this context.

Suppose that one has a set of observations {X_i = X(t_i; θ_0), i = 0, ..., n} of the process generated by (5) with unknown parameter vector θ_0, and let P_{θ_0,n} denote the probability measure induced by {X_0, ..., X_n}. Let l_n(θ) = Σ_{i=1}^{n} log p(X_i, t_i; X_{i−1}, t_{i−1}, θ) and l_n^{(M)}(θ) = Σ_{i=1}^{n} log p^{(M)}(X_i, t_i; X_{i−1}, t_{i−1}, θ) denote the log-likelihood functions associated with the exact and approximate densities, respectively.

Assumption 2. For all s < t, x_s in the support of X(s; θ_0), θ ∈ Θ, and M ≥ 1, the densities p(·, t; x_s, s, θ) and p^{(M)}(·, t; x_s, s, θ) exist.

Pedersen (1995a) provides sufficient conditions for Assumption 2 to hold, as well as regularity conditions ensuring that

lim_{M→∞} p^{(M)}(·, t; x_s, s, θ) = p(·, t; x_s, s, θ)  in L¹(λ).  (7)

We note that Pedersen's results are obtained for multivariate processes. Pedersen's Theorem 2 allows for time-inhomogeneous processes. While this theorem requires a constant diffusion function, we will see in Section 2 that, for scalar processes at least, this does not impose a material constraint. Pedersen's Theorem 3 allows for a variable diffusion function, but imposes other conditions. Although Pedersen's results assume Lipschitz and linear growth conditions on μ(·) and σ(·) that are not satisfied for many applications of economic interest (including notably the CIR square-root process), we speculate that suitable extensions should be possible using localization arguments along the lines of, for example, Karatzas and Shreve (1991, thm. 5.2.5). Similarly, we will examine subtransition densities other than the simple first-order approximation p^{(1)} (see Section 4) and alternative random number schemes (see Section 5) which are not covered by Pedersen's results. Again, these extensions seem plausible, but formal justification is left for future work. The goal of this article is practical rather than theoretical, and Pedersen's results will serve as a convenient starting point. In particular, we assume the following, which Pedersen's Theorem 4 shows to be an immediate consequence of (7).

Assumption 3. For each θ ∈ Θ,

lim_{M→∞} l_n^{(M)}(θ) = l_n(θ)  in probability under P_{θ_0,n}.

The difficulty is how to efficiently evaluate the integral in Equation (6). Monte Carlo integration is generally the only feasible approach. To perform Monte Carlo integration, one



requires an importance sampler. Fix s < t, x_s, x_t, θ, and M, and let q(u_1, ..., u_{M−1}) denote a probability density on R^{M−1}. This will be our importance sampler. Some techniques for constructing efficient importance samplers are discussed in Section 5. Let {u_k = (u_{k,1}, ..., u_{k,M−1}), k = 1, ..., K} be independent draws from q, and let

p^{(M,K)}(x_t, t; x_s, s, θ) = (1/K) Σ_{k=1}^{K} [∏_{m=1}^{M} p^{(1)}(u_{k,m}, τ_m; u_{k,m−1}, τ_{m−1}, θ)] / q(u_{k,1}, ..., u_{k,M−1}),  (8)

where u_{k,0} = x_s and u_{k,M} = x_t for all k. Then, given Assumption 4 below, the strong law of large numbers implies that

lim_{K→∞} p^{(M,K)}(x_t, t; x_s, s, θ) − p^{(M)}(x_t, t; x_s, s, θ) = 0  a.s.  (9)

A somewhat stronger condition provides 1/√K convergence [see Geweke (1989)].

Assumption 4. Let u_0 = x_s, u_M = x_t, θ ∈ Θ, and q be fixed, and let (U_1, ..., U_{M−1}) be a random vector with density q. Then

E [∏_{m=1}^{M} p^{(1)}(U_m, τ_m; U_{m−1}, τ_{m−1}, θ) / q(U_1, ..., U_{M−1})] < ∞.

2. BENCHMARKS

Our goal is to approximate log l_n(θ) for a given realization of the process. For this, it will suffice to be able to approximate p(x_t, t; x_s, s, θ) for arbitrary s < t, x_s, x_t, and θ. If we can do this with arbitrary precision, and if the log-likelihood function is continuous and Θ is compact, then we can evaluate the maximum likelihood estimator at this realization with any desired level of accuracy. We do not treat the estimator obtained by optimizing the approximate log-likelihood with a fixed setting of the tuning parameters as an object of independent interest.

As a test case, we use the square-root process of Cox, Ingersoll, and Ross (1985),

dX = θ_2(θ_1 − X) dt + θ_3 √X dW.  (10)

If Δ = t − s, c = 2θ_2/[θ_3²(1 − e^{−θ_2 Δ})], and Y = 2cX, then Y_t|Y_s is distributed as noncentral chi-squared with 4θ_2θ_1/θ_3² degrees of freedom and noncentrality parameter Y_s e^{−θ_2 Δ} or, equivalently,

p(x_t, t; x_s, s) = c e^{−u−v} (v/u)^{q/2} I_q(2√(uv)),  (11)

where u = c x_s e^{−θ_2 Δ}, v = c x_t, q = 2θ_2θ_1/θ_3² − 1, and I_q(·) is the modified Bessel function of the first kind of order q. For any experiments where synthetic data from the CIR model are required, we generate them directly using draws from the noncentral chi-squared.

Our base case uses the parameter settings θ_0 = (.06, .5, .15) and Δ = 1/12. These settings are identical to those used in Aït-Sahalia (2001) for ease of comparison, and are said to be calibrated to match monthly observations of the U.S. Treasury bill rate. We also test the methods discussed in this article with other models and parameter settings with similar results.

We have found that better results are often obtained if the SDE is first transformed to make the diffusion term of constant magnitude. With the CIR model, for example, setting Y = 2√X/θ_3 and applying Ito's lemma gives

dY = [(2θ_2θ_1/θ_3² − 1/2)(1/Y) − (θ_2/2)Y] dt + dW.

If p^Y(y_t, t; y_s, s) denotes the transition density of the transformed process, then the density of the original process is obtained in the usual manner by

p(x_t, t; x_s, s) = p^Y(y_t, t; y_s, s) |dy/dx|.

In general, the appropriate transformation is given by Y = G(X), where G satisfies G'(x) = 1/σ(x). The constant of integration is irrelevant. Ito's lemma then implies

dY = G'(X) dX + (1/2) G''(X) σ²(X) dt
   = [μ(X)/σ(X) − (1/2) σ'(X)] dt + dW
   = [μ(G⁻¹(Y))/σ(G⁻¹(Y)) − (1/2) σ'(G⁻¹(Y))] dt + dW.
In many cases, G can be obtained analytically; otherwise, it may require a numerical integration. This does not pose any serious difficulties. If the parameter vector enters into σ nonlinearly, the transformation will have to be recomputed for each candidate parameter, which may be inconvenient.

This transformation goes back to at least Doss (1977), and is also used by Shoji and Ozaki (1998) and Aït-Sahalia (2001). (In contrast to those papers, our methodology does not require that the model be transformed.) The reason underlying its effectiveness appears to be that it makes the process closer to Gaussian. This improves the performance of the approximation p^{(M)}, as well as that of the importance samplers.

We compare the effectiveness of the various approximation techniques using several different measures. First, we look at some density plots. We fix a value for x_s, and consider a range of values for x_t. For each value of x_t, we approximate the density p(x_t, t; x_s, s) a number of times using different seeds for the random number generator. We then compute the difference between the true and approximate log densities, and plot the median and interquartile range of the approximation errors for each x_t. The figures in this article are obtained using the process defined by (10), with Δ = t − s = 1/12, x_s = .10, x_t ∈ [.05, .15], and 1,024 repetitions. For reference, the exact transition density with these settings is shown in Figure 1.

Since the object of ultimate interest is the log-likelihood, it seems appropriate to examine the error in the log rather


Figure 1. True Transition Density and Log Transition Density for the CIR Model Given Δ = t − s = 1/12, x_s = .10, and θ = (.5, .06, .15).

than the level of the density. Suppose, for example, that p(x_t, t; x_s, s) = 10, and we are able to compute it with an error of ±.1. This term would contribute an error of ±.01 to the log-likelihood. On the other hand, if p(x_t, t; x_s, s) = .2, the contribution to the log-likelihood of the same approximation error would be in the range [−.7, .4]. If the approximation error is greater than the level of the density, the computed density can be negative. This is clearly catastrophic for the log-likelihood. These distinctions are obscured if one examines the approximation error for the level of the density.

The second measure which we examine is the root mean squared error (RMSE) of the log-density approximation. We approximate this by generating n = 100,000 simulated observations from the model, and computing a sample analog, that is,

RMSE = [∫ (log p̂(y|x) − log p(y|x))² p(y, x) dy dx]^{1/2}  (12)

≈ [(1/n) Σ_{i=1}^{n} (log p̂(x_{i+1}|x_i) − log p(x_{i+1}|x_i))²]^{1/2},  (13)

where we have denoted the approximate transition density by p̂. It is convenient to assume that the integral in (12) exists. At any rate, the sum in (13) certainly exists for a fixed realization {x_1, ..., x_n}, which is all we really need in order to compare across approximation techniques.

Finally, we are interested in the accuracy of the parameter estimates obtained by maximizing the approximate rather than the exact log-likelihood. To measure this, we generate J = 512 data sets of length n = 1,000, and compute θ̂ for each repetition using the exact log-likelihood and the various approximations. We compute the RMSE of the exact maximum likelihood estimates with respect to the parameter vector used to actually generate the data, and the RMSE of the simulated maximum likelihood estimates with respect to the exact maximum likelihood estimates, that is,

RMSE_TRUE-MLE = [Σ_{j=1}^{J} (θ̂_j^MLE − θ_0)² / J]^{1/2}

RMSE_MLE-SMLE = [Σ_{j=1}^{J} (θ̂_j^MLE − θ̂_j^SMLE)² / J]^{1/2}.

A reasonable goal might be to obtain an approximation error on the order of 1% of the error inherent in the MLE itself. We are able to easily attain this goal for our test case.

Virtually any method which one might reasonably consider should be able to approximate the log-likelihood function with arbitrary precision given sufficient time. The key issue is how quickly one is able to obtain sufficiently accurate results. Thus, we also report computational costs.

As a matter of implementation, variance in the Monte Carlo integral can result in a great deal of jaggedness in the likelihood surface, which will severely degrade the performance of the optimizer. However, this issue is easily addressed if, for each evaluation of the likelihood function, one uses the same seed for the random number generator used to draw samples for the Monte Carlo integration. This is especially critical if one is computing numerical derivatives. At any rate, for many of the methods which we examine, it is relatively straightforward to obtain analytical derivatives.

3. SIMULATION METHOD WITHOUT ACCELERATION TECHNIQUES

To establish a baseline, we begin by examining the simulation method as implemented by Pedersen (1995b), that is, without any of our acceleration techniques. The importance sampler used by Pedersen is constructed by simulating paths on each subdivided interval using the Euler scheme. Suppose that s < t, x_s = X(s), and x_t = X(t) are given. The importance sampler is defined by the mapping T^{(M)}: (W_1, ..., W_{M−1}; θ) → (u_1, ..., u_{M−1}) given by the recursion

u_{m+1} = u_m + μ(u_m; θ)δ + σ(u_m; θ)δ^{1/2} W_{m+1},  m = 0, ..., M − 2,  (14)

where u_0 = x_s, δ = (t − s)/M, and W = (W_1, ..., W_{M−1}) is a multivariate standard normal. In this case, Equation (8) simplifies considerably. Since the density of the importance sampler q is identical to the first M − 1 factors of the numerator, they cancel, and one is left with

p^{(M,K)}(x_t, t; x_s, s, θ) = (1/K) Σ_{k=1}^{K} p^{(1)}(x_t, t; u_{k,M−1}, τ_{M−1}, θ),  (15)


Figure 2. Approximation Error, log p̂(x_t, t; x_s, s) − log p(x_t, t; x_s, s), Using Pedersen's Method Given Δ = t − s = 1/12, x_s = .10, and θ = (.5, .06, .15). The median and interquartile range over 1,024 repetitions are plotted. The untransformed model is used in panels (a)-(c) [(a) M = 8, K = 256; (b) M = 8, K = 256, zoomed; (c) M = 32, K = 256], and the transformed model (M = 32, K = 256) is used in panel (d).

where the {Uk,M1, k = 1,...,K} are drawn from the readily apparent.It would take a great deal of effort even to of An alternate (M 1)st component q. interpretationof match the accuracyof the simple first-orderapproximation,at is to consider the Equation (15) right-handside as the sam- least for our test model. of ple analog E[p(')(xt, t; U_1, 7,_-1, 0)], where the expectation is over u and with respect to the distributioninduced by X(7rM_)IX(s) = x,.

Throughoutthis article, we use the method of antithetic variateswhen drawingrandomnumbers.This is a commonly used variance-reductiontechnique in simulation-basedmethods. To implement antithetic variates, one draws only K/2 samples from the multivariatenormal,and simulatestwo paths from each: T(M)(W)= T(M)(W) is as described above, and T(M)(W)= T(M)(-W) is its "mirrorimage." While we have found antitheticvariatesto provide only marginalbenefit, the cost is also negligible. Figure 2 illustrates the approximationerror which results from computingthe log density using this approach.The settings K = 256 and M = 8 or M = 32 are used (recall that K is the numberof sample paths and M is the numberof subintervals). Panels (a)-(c) use the untransformedmodel. Panel (d) uses the transformedmodel, which appears to provide little benefit in this case. IncreasingM reduces bias, but at the cost of greatervariance.Reducing the varianceis costly since it is

4. BIAS-REDUCTION TECHNIQUES There are two sources of approximationerror which we wish to address:bias due to the first-orderapproximationused in the constructionof p(M), and variance resulting from the Monte Carlo integration. We begin with the bias. While it is possible to drivethe bias to zero by partitioningthe intervalsbetween observationssufficiently finely, this can be computationallycostly. We examine two approachestoward reducing the number of subintervals required to obtain a given level of accuracy.The first is to replace the first-orderapproximationused in Equation(5) by a higher order method. There are several possibilities which one might try. Elerian (1998) suggests using a transitiondensity derived from a scheme due to Milstein (1978). If the volatilityfunction o(.) is constant, the density is identical to that of the firstorder approximation;otherwise, it is given by

CO(K-1/2).

Upon comparisonwith the tables and figures in Section 6, the reason why this approachhas not seen widespreaduse is

= PElerian(Xt,t; Xs, S)

exp AIN-_•x (

2

cosh(/Cz)

Durhamand Gallant:NumericalTechniquesfor MaximumLikelihoodEstimationof Continuous-TimeDiffusionProcesses

303

where

where k lz

Kessler (1997) suggests using a Gaussian transition density, but rather than using the first-order approximations for the mean and variance, he proposes using higher order Ito-Taylor approximations. We try a second-order implementation, that is,

p_Kessler(x_t, t; x_s, s) = φ(x_t; μ̂, σ̂²),

where

μ̂ = x_s + μ(x_s)Δ + [μ(x_s)μ′(x_s) + (σ²(x_s)/2) μ″(x_s)] Δ²/2,

σ̂² = x_s² + {2μ(x_s)x_s + σ²(x_s)}Δ
      + {2μ(x_s)[μ′(x_s)x_s + μ(x_s) + σ(x_s)σ′(x_s)]
      + σ²(x_s)[μ″(x_s)x_s + 2μ′(x_s) + σ′(x_s)² + σ(x_s)σ″(x_s)]} Δ²/2 − μ̂²,

and Δ = t − s. Note that σ′(·) denotes the derivative of σ. Notice that, for some models and parameter settings, it is possible to obtain σ̂² < 0. The code should include a check to watch out for this.

Shoji and Ozaki (1998) suggest a method which they refer to as local linearization. Their approach requires a model with constant volatility; however, as shown in Section 3, this results in little loss of generality. Given dX = μ(X) dt + σ dW (σ is constant) and fixed x_s, one begins with an application of Ito's lemma:

dμ(X) = (σ²/2) μ″(X) dt + μ′(X) dX.

Using the first-order Taylor expansion, we define

F(X_t) = μ(x_s) + (σ²/2) μ″(x_s)(t − s) + μ′(x_s)(X_t − x_s).

The approximate density will be obtained from

dX = F(X) dt + σ dW,

which is an Ornstein-Uhlenbeck process. One obtains

p_ShOz(x_t, t; x_s, s) = φ(x_t; x̃, σ̃²),

where

x̃ = x_s + [μ(x_s)/μ′(x_s)] K + [σ² μ″(x_s)/(2μ′(x_s)²)] [K − μ′(x_s)Δ],

σ̃² = σ² [e^{2μ′(x_s)Δ} − 1] / (2μ′(x_s)),

K = e^{μ′(x_s)Δ} − 1.

Nowman (1997) suggests a similar approach, but simply treating the volatility as if it were constant on each sampling interval rather than transforming the model so that it actually is constant. While he examines only the special case where the drift function is linear, a plausible extension would be to use a first-order Taylor expansion for the drift function as described above. The resulting approximation is analogous to that of Shoji and Ozaki, but replacing σ by σ(x_s).

While Elerian (1998) uses the Milstein density in the context of a simulation-based approach, Kessler (1997), Shoji and Ozaki (1998), and Nowman (1997) approximate the transition density between observations directly (i.e., without using intermediate points).

Another approach to obtaining higher order methods is extrapolation. Given, for example, a first-order method, one may construct a second-order method as follows:

p^(M) = p + KΔ + O(Δ²),
p^(2M) = p + KΔ/2 + O(Δ²),
p_E = 2 p^(2M) − p^(M) = p + O(Δ²),

where K is some unknown constant. If the approximate likelihoods are stochastic (i.e., computed by simulation), extrapolation reduces bias, but at the cost of greater variance. Since it is possible to obtain a negative value for the extrapolated density, any implementation of this technique should check for positivity, and fall back to the nonextrapolated value in case of trouble.

Extrapolation is a well-known bias-reduction technique for computing expectations of diffusion processes (see Kloeden and Platen 1992, sec. 15.3). That we are able to apply the technique in the present context is because our approach to approximating the transition density is essentially an expectation. For Pedersen's method, it is easy to see from Equation (15) that

p^(M)(x_t, t; x_s, s) = ∫ p^(1)(x_t, t; u, τ_{M−1}) dP_{τ_{M−1}}(u),   (16)

where P_{τ_{M−1}} is the measure induced by the Euler scheme at τ_{M−1}. In general, one obtains

p^(M)(x_t, t; x_s, s) = ∫ p^(1)(x_t, t; u, τ_{M−1}) [dP_{τ_{M−1}}/dQ_{τ_{M−1}}](u) dQ_{τ_{M−1}}(u),   (17)

where Q_{τ_{M−1}} is the measure induced by the (M − 1)st component of the importance sampler and dP_{τ_{M−1}}/dQ_{τ_{M−1}} is the Radon-Nikodym derivative of P_{τ_{M−1}} with respect to Q_{τ_{M−1}}. These expressions may also be derived directly from (6).
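As a concrete illustration, the Shoji-Ozaki density and the extrapolation step described above can be sketched as follows. This is a minimal sketch, not the authors' code; the function names and the drift/derivative callables (`mu`, `dmu`, `d2mu`) are hypothetical stand-ins for a concrete model.

```python
import math

def shoji_ozaki_density(xt, xs, dt, mu, dmu, d2mu, sigma):
    # Local-linearization (Shoji-Ozaki) Gaussian transition density;
    # requires a constant diffusion coefficient sigma.
    L = dmu(xs)                       # mu'(x_s)
    K = math.exp(L * dt) - 1.0        # K = exp(mu'(x_s) * Delta) - 1
    mean = (xs + mu(xs) / L * K
            + sigma ** 2 * d2mu(xs) / (2.0 * L * L) * (K - L * dt))
    var = sigma ** 2 * (math.exp(2.0 * L * dt) - 1.0) / (2.0 * L)
    return math.exp(-(xt - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def extrapolate(p_M, p_2M):
    # Second-order combination 2 p^(2M) - p^(M); fall back to the
    # non-extrapolated value if the result is not positive.
    p_E = 2.0 * p_2M - p_M
    return p_E if p_E > 0.0 else p_2M
```

A convenient sanity check: for a linear drift dX = b(a − X) dt + σ dW, μ″ = 0 and the local linearization reproduces the exact Ornstein-Uhlenbeck transition density.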


Journal of Business & Economic Statistics, July 2002

5. VARIANCE-REDUCTION TECHNIQUES

We examine two approaches to reducing the variance of the Monte Carlo integration shown in Equation (8): importance sampling and random number schemes. Some of the techniques are illustrated in Figure 3.

A basic principle of Monte Carlo integration is that one should draw points with higher probability in regions where the integrand is larger. Figure 4 illustrates why Pedersen's method performs so poorly. The paths in the figure are sampled using the Euler scheme with x_s = .08, Δ = t − s = 1/12, and the SDE given in Section 2. The terminal point of each path represents a draw from P^(M). The curve represents the integrand of the right-hand side of (16) as a function of u with x_t = .11. It is clear that most of the samples are drawn from regions where the integrand has little mass. The importance samplers discussed in this section are designed to address this shortcoming. Elerian et al. (2001) appear to have been the first to consider the idea of using efficient importance sampling in this context.
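This principle can be seen in a toy version of the same problem (not from the paper): estimating ∫ φ(u; 0, 1) φ(x; u, s²) du for an x far in the tail. Sampling u from the first factor puts almost all draws where the integrand is negligible; a proposal centered near x concentrates the draws where the mass is. All names below are illustrative.

```python
import math
import random

def phi(x, m, v):
    # Gaussian density with mean m and variance v
    return math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def naive_estimate(x, s2, K, rng):
    # Draw u ~ N(0, 1); most draws land where phi(x; u, s2) is negligible,
    # so the estimate is dominated by rare lucky draws.
    return sum(phi(x, rng.gauss(0.0, 1.0), s2) for _ in range(K)) / K

def importance_estimate(x, s2, K, rng):
    # Draw u ~ N(x, 4*s2), a proposal centered where the integrand has
    # mass, and reweight each draw by (target factor) / (proposal).
    q_var = 4.0 * s2
    total = 0.0
    for _ in range(K):
        u = rng.gauss(x, math.sqrt(q_var))
        total += phi(x, u, s2) * phi(u, 0.0, 1.0) / phi(u, x, q_var)
    return total / K
```

The exact value is φ(x; 0, 1 + s²); with x = 3 and s² = .01 the importance estimate is accurate with a few thousand draws, while the naive estimate is far noisier for the same cost.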

[Figure 3. Simulated Paths Drawn Using Various Importance Samplers and Random Schemes. Panels: Pedersen's sampler; Brownian bridge; modified bridge; modified bridge with normalized variates; modified bridge with binomial tree; Elerian-Chib-Shephard sampler.]

Durham and Gallant: Numerical Techniques for Maximum Likelihood Estimation of Continuous-Time Diffusion Processes

[Figure 4. Illustration of Equation (16). The terminal points of the sample paths represent draws from p^(M); the curve represents the integrand.]

The first importance sampler we consider is based on the Brownian bridge. A Brownian bridge is a Brownian motion started at x_s at time s and conditioned to terminate at x_t at time t. The sampler is constructed in a manner similar to the Euler scheme. In this case, the mapping T^(M): (W_1, …, W_{M−1}; ω) → (u_1, …, u_{M−1}) is defined by the recursion

u_{m+1} = u_m + μ̃(u_m, τ_m)δ + σ(u_m; θ) δ^{1/2} W_{m+1},

where the drift is given by

μ̃(x, τ) = (x_t − x)/(t − τ).

This is a true Brownian bridge if and only if σ is constant (which will be the case if we first transform the model as discussed in Section 2).

Although it is certainly possible to compute the approximate density directly from (8), there is an interesting interpretation of this sampler based on Girsanov's theorem. Consider the processes dX = μ(X) dt + σ(X) dW and dX̃ = μ̃(X̃) dt + σ(X̃) dW with initial condition X(s) = X̃(s) = x_s. Girsanov's theorem tells us that the Radon-Nikodym derivative of the probability measure generated by X with respect to that generated by X̃ is given by

dρ = ρ k(X̃) dW   (18)

with initial condition ρ(s) = 1 and k(x) = [μ(x) − μ̃(x, τ)]/σ(x). We can thus obtain the continuous-time expression

p(x_t, t; x_s, s) = ∫ p(x_t, t; u, τ_{M−1}) ρ_{τ_{M−1}}(u) dQ_{τ_{M−1}}(u),

where Q_{τ_{M−1}} is the probability measure induced by X̃(τ_{M−1}). It is easy to show by direct calculation of ρ^(M) that this expression is equivalent to (8). We have found it to be more stable to base the Euler scheme for ρ on

d(log ρ) = −(k²/2) dt + k dW

with initial condition log ρ(s) = 0 rather than (18). The integral is computed by generating samples {(u_{k,M−1}, ρ_{k,M−1}), k = 1, …, K} from the joint process using the Euler scheme, and then computing

p̂^(M,K)(x_t, t; x_s, s, θ) = (1/K) Σ_{k=1}^{K} p^(1)(x_t, t; u_{k,M−1}, τ_{M−1}) ρ_{k,M−1}.

The second importance sampler which we consider draws u_{m+1} from a Gaussian density based on the first-order approximation conditional on u_m and x_t. That is, treating u_m and u_M = x_t as fixed, one draws u_{m+1} from the density

p(u_{m+1}|u_m, u_M) = p(u_{m+1}|u_m) p(u_M|u_{m+1}) / p(u_M|u_m)
    ≈ φ(u_{m+1}; u_m + μδ, σ²δ) φ(u_M; u_{m+1} + μδ*, σ²δ*) / φ(u_M; u_m + μδ+, σ²δ+)
    = φ(u_{m+1}; u_m + μ̃_m δ, σ̃²_m δ),

where δ = (t − s)/M, δ* = t − τ_{m+1}, δ+ = t − τ_m, μ = μ(u_m), σ = σ(u_m), and

μ̃_m = (u_M − u_m)/(t − τ_m),

σ̃²_m = [(M − m − 1)/(M − m)] σ²(u_m).

Notice that this importance sampler turns out to be identical to the Brownian bridge sampler, except for the factor (M − m − 1)/(M − m) in the variance. While it is not entirely obvious that this should be the case, we will see that this modification results in much better performance. We will refer to this sampler as the modified Brownian bridge.

The third importance sampler which we consider was proposed by Elerian et al. (2001). The idea is to approximate the target density by a multivariate normal with mean and variance based on a second-order Taylor expansion of the log target density about the mode. The log target density is given by

log p^(M)(u) = Σ_{m=0}^{M−1} log p^(1)(u_{m+1}, τ_{m+1}; u_m, τ_m).

One samples u = (u_1, u_2, …, u_{M−1}) from N(μ*, Σ*), where

μ* = argmax_u log p(u|x_s, x_t),

Σ* = [−∂² log p(μ*|x_s, x_t)/∂u ∂u′]^{−1}.

In practice, one obtains μ* by starting with ū = (ū_1, …, ū_{M−1}), where ū_m = x_s + m(x_t − x_s)/M, and taking a single Newton step toward the maximum.
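A minimal sketch of the modified Brownian bridge with first-order (Euler) subtransition densities might look as follows. This is an illustration of the estimator p̂^(M,K), not the authors' code; the function names and the drift/diffusion callables are hypothetical stand-ins for a concrete model.

```python
import math
import random

def euler_logpdf(x1, x0, dt, mu, sig):
    # first-order (Euler) Gaussian subtransition log density
    v = sig(x0) ** 2 * dt
    return -0.5 * math.log(2.0 * math.pi * v) - (x1 - (x0 + mu(x0) * dt)) ** 2 / (2.0 * v)

def mod_bridge_density(xt, t, xs, s, mu, sig, M, K, rng):
    # Simulated transition density using the modified Brownian bridge
    # proposal, averaging K importance-weighted paths of M subintervals.
    dt = (t - s) / M
    acc = 0.0
    for _ in range(K):
        u, tau, logw = xs, s, 0.0
        for m in range(M - 1):
            mean = u + (xt - u) / (t - tau) * dt            # pull toward x_t
            var = sig(u) ** 2 * dt * (M - m - 1) / (M - m)  # variance correction
            u_new = rng.gauss(mean, math.sqrt(var))
            # accumulate log [ p^(1)(u_new | u) / q(u_new | u) ]
            logw += euler_logpdf(u_new, u, dt, mu, sig)
            logw -= (-0.5 * math.log(2.0 * math.pi * var)
                     - (u_new - mean) ** 2 / (2.0 * var))
            u, tau = u_new, tau + dt
        acc += math.exp(logw + euler_logpdf(xt, u, dt, mu, sig))
    return acc / K
```

For a linear-drift, constant-volatility model the exact transition density is Gaussian, which gives a direct check on the estimator with modest M and K.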


The derivatives of log p(u|x_s, x_t) are straightforward, but tedious to compute. The key feature of this sampler is that it draws paths in one shot rather than recursively. Implementing this sampler requires solving a system of M − 1 linear equations and computing a Cholesky decomposition. To obtain reasonable performance, it is essential that one take advantage of the tridiagonal nature of the relevant matrices.

As always with importance sampling, one should ensure that the tails of the sampling density are not too thin; otherwise, it will not be possible to drive down the variance of the Monte Carlo integral despite using a large number of sample paths. One way to address this problem is by using Student t rather than normal increments in the construction of the sample paths. One might also try the approach suggested by Geweke (1989).

The second category of variance-reduction techniques which we examine is random number schemes. The method of antithetic variates, as discussed in Section 4, is one such scheme, although our results suggest that it provides only marginal benefits in this context. Recall from Equations (16) and (17) that the density approximation may be thought of as an expectation. Kloeden and Platen (1992, sec. 14.1) suggest that, for computing expectations, the Gaussian increments (W_1, …, W_{M−1}) in Equation (14) (and similar expressions for the other importance samplers) may be replaced by other random variables satisfying appropriate moment conditions. One possibility is the random variable which takes on the values 1 and −1, each with probability 1/2. In addition to reducing variance, this scheme gives a speed increase, since generating normal deviates can be a significant fraction of the computational effort. Furthermore, if M is sufficiently small, it is possible to compute the Monte Carlo integral by summing over all possible branches of the binomial tree of length M − 1. For example, setting M = 8 would require the computation of 2^7 = 128 sample paths, which is entirely feasible. Using the techniques discussed in Section 4, it is possible to achieve low bias with small M. In particular, since this random number scheme produces a method with no variance, it is ideally suited for use together with extrapolation.

While expectations computed using the binomial tree scheme are known to converge under appropriate conditions, the properties of this scheme in the context of this article are uncertain. Therefore, we also consider a related scheme which provides much of the same benefit by less drastic means. The idea is to control the "jaggedness" of the sample paths by forcing each vector of increments (W_1, …, W_{M−1}) to have sample variance one. This may be accomplished simply by using the vector

(W̃_1, …, W̃_{M−1}) = (W_1, …, W_{M−1}) / [(1/(M−1)) Σ_{m=1}^{M−1} W_m²]^{1/2},

which may be thought of as a weakening of the two-point idea, which forces the sample variance of each individual increment to be equal to 1.
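The two random number schemes just described can be sketched as follows; the function names are illustrative, not from the paper.

```python
import math
import random
import itertools

def normalized_gaussians(n, rng):
    # Draw (W_1, ..., W_{M-1}) ~ N(0, 1) and rescale so the vector
    # has sample variance (about zero) exactly one.
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    s = math.sqrt(sum(x * x for x in w) / len(w))
    return [x / s for x in w]

def two_point_paths(n):
    # All 2^{M-1} increment vectors of the +/-1 two-point scheme;
    # summing an integrand over these paths gives a zero-variance
    # evaluation of the expectation under that scheme.
    return list(itertools.product((-1.0, 1.0), repeat=n))
```

With M = 8 the tree has 2^7 = 128 paths, matching the count in the text.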

6. NUMERICAL EXPERIMENTS

We first test our various techniques by approximating the transition density as described in Section 2. The settings used for the various approximations will be identified by "sampler-subdensity-M-K"; for example, the Brownian bridge sampler used together with the first-order (Euler) subtransition density, M = 8, and K = 256 will be identified as "bridge-euler-8-256." The RMSE is also computed for these approximation schemes (as described in Section 2). The results are summarized in Table 1.

Figure 5 illustrates the performance of the various subdensity methods when used to compute the transition density directly (i.e., M = 1, no simulation). While the error associated with the simple first-order approximation is moderately severe, a factor of 10 improvement is obtained if the model is transformed before applying the first-order approximation. The scheme proposed by Elerian (1998) comes close to obtaining this improvement without needing the transformation step. When applied to the transformed models, the techniques proposed by Shoji and Ozaki (1998) and Kessler (1997) provide an additional order of magnitude improvement over the Euler scheme. Although not shown in the figures, we have found these schemes to be of little benefit when used on the untransformed model. The technique proposed by Nowman (1997) provides nearly no benefit whatsoever.

Figure 6 illustrates the Brownian bridge and modified bridge samplers. The first-order approximation is used for the subtransition densities. The transformation step is not used. Using the Brownian bridge largely solves the main problem associated with Pedersen's method. The modified bridge provides a further dramatic reduction in variance. Notice that panels (c) and (d) of Figure 6 use only K = 8 sample paths, as compared to K = 256 for panels (a) and (b) and Figure 2(a)-(d) (Pedersen's method). We see that increasing the number of subintervals brings the expected reduction in bias.

Figure 7(a) and (b) illustrates the use of extrapolation and Elerian's subtransition density scheme, respectively, to reduce bias. Panel (c) shows the variance reduction due to normalized variates. Panel (d) demonstrates that one still obtains the expected reductions in bias and variance from increasing M and K, respectively. All panels in this figure use the untransformed model.

Figure 8 illustrates the Elerian-Chib-Shephard (ECS) sampler. This sampler seems to work well for M relatively small, but the variance goes up dramatically as the number of intermediate points increases. It was not possible to compute the RMSE of the log density approximation with the ECS sampler and M = 32 because the sampler often chose points below zero (i.e., outside the range of the model). We follow Elerian et al. (2001) by using the transformation Y = log X in this case.

While it is possible to obtain reasonably accurate results using the untransformed model, Figure 9 shows that first transforming the model provides significant benefits, especially when used with the subdensity scheme of Shoji and Ozaki (1998) and normalized variates. With these settings, we are able to obtain RMSE ≈ .0006 with only M = 8 and K = 8. The computational cost is about 16 s to approximate a likelihood with n = 100,000 observations using FORTRAN code



Table 1. RMSE of Log Density Approximations

Untransformed model
Sampler      Subdensity   M    K     RMSE     Time(c)
None         Euler        1    1     .13678   .2
None         Elerian      1    1     .03550   .2
None         Nowman       1    1     .14467   .2
Pedersen     Euler        8    256   .19353   539.5
Pedersen     Euler        8    256   .19353   538.3
Pedersen     Euler        32   256   .66310   2,169.8
Bridge       Euler        8    256   .05355   550.4
Bridge       Euler        32   256   .05570   2,211.2
Mod bridge   Euler        8    8     .03068   17.5
Mod bridge   Euler        8    8     .06103   26.3
Mod bridge   Euler        8    8     .02404   19.1
Mod bridge   Euler        32   32    .01180   299.3
Mod bridge   Euler        32   128   .00713   1,227.1
Mod bridge   Elerian      8    8     .03562   24.6
Mod bridge   Elerian      8    8     .01856   25.0
Mod bridge   Elerian      32   32    .01206   401.2
Mod bridge   Elerian      32   128   .00656   1,603.8
ECS          Elerian      8    8     .07207   30.4
ECS          Elerian      8    8     .05164   30.3
ECS(d)       Elerian      32   32    .46920   403.6
ECS(d)       Elerian      32   128   .25973   1,582.2

Transformed model
Sampler      Subdensity   M    K     RMSE     Time(c)
None         Euler        1    1     .03790   .2
None         Shoji        1    1     .01412   .3
None         Kessler      1    1     .00864   .2
Pedersen     Euler        8    256   .16500   338.2
Pedersen     Euler        32   256   .57507   1,254.5
Mod bridge   Euler        8    8     .00812   10.8
Mod bridge   Euler        64   8     .00255   80.7
Mod bridge   Euler        8    8     .01795   16.6
Mod bridge   Shoji        8    8     .00070   16.0
Mod bridge   Shoji        8    8     .00057   16.4
Mod bridge   Shoji        32   32    .00030   248.7
ECS          Shoji        8    8     .00072   25.3
ECS          Shoji        32   32    .00020   315.5

(a) An "x" in the NV column indicates normalized variates were used; otherwise, antithetic variates.
(b) An "x" in the Extrap. column indicates extrapolation was used.
(c) Computing time (in seconds) required to obtain the likelihood for n = 100,000 observations using FORTRAN code on a 750 MHz PC.
(d) With these settings, the sampler frequently chose points x_τ < 0; thus, following Elerian et al., we used the transformation Y = log X.
[The row-by-row NV and Extrap. flags could not be recovered from the source and are omitted.]

on a PC running at 750 MHz. It would be virtually impossible to obtain anywhere near this level of accuracy using Pedersen's method without our acceleration techniques.

Tables 2 and 3 show the errors which result from estimating parameters by maximizing the approximate rather than exact log-likelihood. Results are shown for several different settings of the model parameters. The errors are estimated by Monte Carlo simulation with 512 repetitions over synthetic datasets of n = 1,000 observations. The SMLE estimates are obtained using the modified bridge sampler, Shoji and Ozaki's subdensity, M = 16, and K = 16. For comparison, we also compute parameter estimates using the first-order Euler scheme approximation. The transformed model is used for all of the experiments shown in these tables. It should be noted that the Euler scheme approximations thus obtained can be expected to be significantly better than those typically obtained by practitioners without implementing the transformation (see Fig. 5).

Table 2(a) uses the baseline model settings, θ_0 = (.06, .5, .15) and Δ = 1/12. Recall that these are calibrated to monthly observations of the U.S. short-term interest rate. For these model settings, we also compute parameter estimates using Pedersen's method with M = 8 and K = 256. Pedersen's method is unable to match even the Euler scheme. On the other hand, the approximation errors of the SMLE estimates obtained using our techniques are negligible (compare with the sample distribution of θ̂_MLE − θ_0).

Panel (b) increases the volatility of the model. Note that computing the exact likelihood requires the evaluation of a Bessel function, which in turn requires 2θ_1θ_2/θ_3² > 1. Setting θ_0 = (.06, .5, .22) comes quite close to this boundary. For some samples, the constraint appears to bind when maximizing the likelihood. These samples are discarded. This model causes our methodology some difficulty, apparently because the data often venture close to the singularity at zero. The estimates are nonetheless quite good.

Panel (c) reduces the model's volatility to θ_3 = .03. Panel (d) sets the mean reversion parameter to θ_2 = .4. Again, the constraint 2θ_1θ_2/θ_3² > 1 comes into play. Neither of these tests presents any difficulty to our methodology.

Panel (e) increases the mean reversion parameter by a factor of 10 to θ_2 = 5.0. Panel (f) uses the baseline setting for θ_0, but stretches the sampling interval to two years. Both of these settings result in large biases for the first-order approximation, but pose little difficulty for the SMLE technique.


[Figure 5. Approximation error, log p̂(X_t, t; X_s, s) − log p(X_t, t; X_s, s), for various schemes without using simulation (i.e., M = 1, K = 1), given Δ = t − s = 1/12, X_s = .10, and θ = (.5, .06, .15). Panels: (a) Euler; (b) Euler, transformed model; (c) Elerian; (d) Shoji & Ozaki, transformed model; (e) Nowman; (f) Kessler, transformed model.]

7. STOCHASTIC VOLATILITY

While the previous sections have focused on techniques designed to efficiently approximate the likelihood function for scalar models, most of these ideas are easily generalized to the multivariate setting. With some work, they may also be applied to latent variable models. The short-term interest rate and many other financial time series are well known to exhibit properties such as fat tails and persistent volatility patterns which are inconsistent with time-homogeneous scalar models (e.g., Ghysels, Harvey, and Renault 1996). A variety of alternative models has been proposed. To illustrate our methodology, we will examine stochastic volatility (SV) models of the form

dX = μ_X(X) dt + σ_X(X) exp(H) dW_1,
dH = μ_H(X, H) dt + σ_H(X, H) dW_2.
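A model of this form can be simulated componentwise with the Euler scheme; a minimal sketch with independent W_1 and W_2 follows. The callables and function name are hypothetical placeholders for a concrete specification.

```python
import math
import random

def simulate_sv(x0, h0, mu_x, sig_x, mu_h, sig_h, dt, n, rng):
    # Euler scheme for dX = mu_x(X) dt + sig_x(X) exp(H) dW1,
    #                  dH = mu_h(X, H) dt + sig_h(X, H) dW2,
    # with W1 and W2 independent.
    xs, hs = [x0], [h0]
    x, h = x0, h0
    sdt = math.sqrt(dt)
    for _ in range(n):
        x_new = x + mu_x(x) * dt + sig_x(x) * math.exp(h) * sdt * rng.gauss(0.0, 1.0)
        h_new = h + mu_h(x, h) * dt + sig_h(x, h) * sdt * rng.gauss(0.0, 1.0)
        x, h = x_new, h_new
        xs.append(x)
        hs.append(h)
    return xs, hs
```

Note that both increments are computed from the current state before either component is updated, as the Euler scheme requires.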

[Figure 6. Approximation error, log p̂(X_t, t; X_s, s) − log p(X_t, t; X_s, s), for various simulated likelihood schemes given Δ = t − s = 1/12, X_s = .10, and θ = (.5, .06, .15). The median and interquartile range over 128 repetitions are plotted. The untransformed model is used. Panels: (a) bridge-euler-8-256; (b) bridge-euler-32-256; (c) modbridge-euler-8-8; (d) modbridge-euler-32-8.]

Such models have been examined by Gallant and Tauchen (1998), Andersen and Lund (1997), and Eraker (2001), among others. The second component, H, corresponds to an unobserved volatility factor. In order to obtain a likelihood, the unobserved factor must be integrated out. Several ways of going about this have been proposed for discrete-time models, for example, Danielsson and Richard (1993); Jacquier, Polson, and Rossi (1994); Richard and Zhang (2000); Sandmann and Koopman (1998); Kim, Shephard, and Chib (1998); and Pitt and Shephard (1999).

In the continuous-time context, it is less straightforward to integrate out the unobserved factor, and alternative approaches are used. The efficient method of moments approach has been used by Gallant and Tauchen (1998) and others. Methods based on the empirical characteristic function have been proposed by Chacko and Viceira (1998) and Singleton (1997). Markov chain Monte Carlo approaches have been proposed by Eraker (2001), Jones (1999b), and Elerian (1999). We now outline a technique to approximate the likelihood of continuous-time SV models. For simplicity, we will assume that W_1 and W_2 are independent; this is not an essential part of the methodology. We provide a small Monte Carlo study which demonstrates that the procedure is effective and reasonably fast. The methodology is used to estimate an SV model of the U.S. short-term interest rate in Section 8. Further refinements are undoubtedly possible; a more detailed study is currently underway.

The basic idea is relatively straightforward. We are interested in the process (X(t), H(t)), where X is observed at times t_0, t_1, …, t_n and H is latent. Let X_i = X(t_i), H_i = H(t_i), and F_i = σ(H_0, X_0, X_1, …, X_i) for i = 1, …, n. The goal is to obtain p(X_{i+1}|F_i). If we knew the distribution of H_i|F_i, we could use

p(X_{i+1}|F_i) = ∫ p(X_{i+1}|X_i, h_i) dP_{H_i|F_i}(h_i).

We will approximate H_i|F_i. Given the distribution of H_{i−1}|F_{i−1}, it can be propagated forward using

p(h_i|F_i) = p(X_i|F_{i−1}, h_i) p(h_i|F_{i−1}) / p(X_i|F_{i−1}),

p(h_i|F_{i−1}) = ∫ p(h_i|X_{i−1}, h_{i−1}) dP_{H_{i−1}|F_{i−1}}(h_{i−1}).
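The propagation and updating recursion above can be sketched as a generic particle filter (sequential importance resampling). This is an illustrative sketch, not the authors' implementation; `sample_h` and `obs_logpdf` are hypothetical stand-ins for the model-implied transition sampler and conditional observation density.

```python
import math
import random

def particle_filter_loglik(x, h0, sample_h, obs_logpdf, N, rng):
    # x: observed series (x_0, ..., x_n); h0: initial latent value.
    # sample_h(x_prev, h_prev, rng): draws h_i ~ p(h_i | x_{i-1}, h_{i-1}).
    # obs_logpdf(x_i, x_prev, h_i): log p(x_i | x_{i-1}, h_i).
    particles = [h0] * N
    loglik = 0.0
    for i in range(1, len(x)):
        # propagate p(h_{i-1} | F_{i-1}) forward through the transition,
        # then weight by the conditional density of the new observation
        particles = [sample_h(x[i - 1], h, rng) for h in particles]
        w = [math.exp(obs_logpdf(x[i], x[i - 1], h)) for h in particles]
        loglik += math.log(sum(w) / N)           # estimate of p(x_i | F_{i-1})
        particles = rng.choices(particles, weights=w, k=N)  # resample
    return loglik
```

When the observation density does not depend on h, every particle receives the same weight and the filter reproduces the exact log-likelihood, which makes a convenient check.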

It remains only to find H_0. The approach we take is to estimate H_0 as an unknown parameter, although one could equally well integrate it out using an appropriate importance sampler. This is the basic idea of a particle filter (see, e.g., Pitt and Shephard 1999 and the references therein). To implement this idea, we use the following procedure. Consider the model dZ = μ(Z) dt + σ(Z) dW


[Figure 7. Approximation error, log p̂(X_t, t; X_s, s) − log p(X_t, t; X_s, s), for various simulated likelihood schemes given Δ = t − s = 1/12, X_s = .10, and θ = (.5, .06, .15). The median and interquartile range over 128 repetitions are plotted. The untransformed model is used. Panels: (a) modbridge-euler-8-8 (extrapolated); (b) modbridge-elerian-8-8; (c) modbridge-elerian-8-8 (normalized variates); (d) modbridge-elerian-32-32 (normalized variates).]

where

X IL ( Z= ( H)

0X(Z) 0

(Z) a

0

HM(Z)

W= W

?H(Z) )

The problemis to determinethe density p(Xi+1I$i, 0), where Fi =- (Hno,Xo, XI1..... Xi). The first-orderapproximationis given by p(1)(Zt,

t;

Zs, S,

8) =

(z,;

-

Z,

+

A(Z,)(t

- s),

(z,s)(t

- S))

where 0 is the multivariateGaussian density and = coT'. Also, let s = r7 < •l < * *