IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 6, JUNE 2006


Quasi-Monte Carlo Filtering in Nonlinear Dynamic Systems

Dong Guo and Xiaodong Wang, Senior Member, IEEE

Abstract—We develop a new framework for Bayesian filtering in general nonlinear dynamic systems based on quasi-Monte Carlo (QMC) numerical techniques. We first propose a general approach to deterministic filtering called the quasi-Monte Carlo Kalman filter (QMC-KF), which unifies several existing advanced filtering methods in the literature, such as the unscented Kalman filter (UKF) and the quadrature Kalman filter (QKF). The computationally expensive step of calculating the Jacobian matrix involved in the extended Kalman filter (EKF) is avoided in the proposed QMC-KF approach. We also propose sequential quasi-Monte Carlo (SQMC) filtering techniques, which are analogous to the sequential Monte Carlo (SMC) or particle filtering methods in the literature. We show in particular how to effectively combine deterministic filtering and adaptive importance sampling schemes, which leads to powerful SQMC filtering strategies. The properties of the proposed SQMC and SQMC/IS methods in terms of almost sure convergence and numerical error propagation behavior are analyzed. Finally, numerical examples are provided to demonstrate the effectiveness of the proposed new QMC-based filtering algorithms.

Index Terms—Nonlinear dynamic systems, quasi-Monte Carlo (QMC), Kalman filter (KF), sequential Monte Carlo (SMC).

I. INTRODUCTION

THE need for accurate monitoring and analysis of sequential data arises in many scientific, industrial, and financial applications. Consider the following general nonlinear dynamic system, specified by a state equation and an observation equation:

$\mathbf{x}_t = f_t(\mathbf{x}_{t-1}) + \mathbf{u}_t$ (1)

$\mathbf{y}_t = h_t(\mathbf{x}_t) + \mathbf{v}_t$ (2)

where $\mathbf{x}_t$ and $\mathbf{y}_t$ denote, respectively, the state and observation variables, and $\mathbf{u}_t$ and $\mathbf{v}_t$ denote, respectively, the state noise and observation noise, which follow certain known distributions. The sequences $\{\mathbf{u}_t\}$ and $\{\mathbf{v}_t\}$ are assumed to be independent. We denote $\mathbf{y}_{1:t} \triangleq (\mathbf{y}_1, \ldots, \mathbf{y}_t)$ as the observations up to time $t$. Our aim in filtering such a nonlinear dynamic system is to calculate the posterior state distribution $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$ at each time $t$.

Manuscript received November 4, 2004; revised August 12, 2005. This work was supported in part by the U.S. National Science Foundation (NSF) under Grant DMS-0244583 and by the U.S. Office of Naval Research (ONR) under Grant N00014-03-1-0039. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Abdelhak M. Zoubir. D. Guo was with the Department of Electrical Engineering, Columbia University, New York, NY 10027 USA. He is now with Rutter Associates, New York, NY USA. X. Wang is with the Department of Electrical Engineering, Columbia University, New York, NY 10027 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TSP.2006.873585

The densities of interest can be updated recursively as follows:

$p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1}) = \int p(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})\, d\mathbf{x}_{t-1}$ (3)

$\tilde{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t}) = p(\mathbf{y}_t \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1})$ (4)

$p(\mathbf{x}_t \mid \mathbf{y}_{1:t}) = \frac{\tilde{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t})}{\int \tilde{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t})\, d\mathbf{x}_t}$ (5)

The above recursive equations reduce the computational cost, but the integrals involved are still typically intractable. There are primarily two paradigms for treating the above sequential inference problem. The first is the deterministic filtering approach, in which the posterior distributions are approximated as Gaussian or mixtures of Gaussians, and various strategies exist in the literature to recursively update the corresponding first- and second-order moments. For example, the extended Kalman filter (EKF) and its variants [6], [14] are based on linearization along the trajectories. The EKF has been successfully applied in numerous applications. However, due to the linear approximation error, the performance of the EKF is often unsatisfactory. In this paper, we propose a new family of deterministic filtering techniques for the general nonlinear dynamic system (1)–(2) that makes use of the quasi-Monte Carlo (QMC) numerical integration technique. One important feature of these algorithms is that they do not involve calculating the Jacobian matrix, which is the most computationally intensive component of the EKF. Furthermore, several well-known advanced deterministic filtering methods, such as the unscented Kalman filter (UKF) [18] and the Gaussian quadrature Kalman filter (QKF) [17], are special cases of the proposed new framework.

The second paradigm for filtering nonlinear dynamic systems is based on drawing random samples from the underlying posterior distributions, namely, the particle filters or sequential Monte Carlo (SMC) methods [2], [4], [5], [23]. In essence, these methods make use of Monte Carlo (MC) integration to approximate the integrals in (3)–(5). In this paper, by employing QMC integration techniques, which have been proven asymptotically more efficient than MC integration [20], [25]–[27], we develop novel sequential quasi-Monte Carlo (SQMC) algorithms for filtering nonlinear dynamic systems. The basic idea is to replace the conventional MC points by randomized QMC points. In particular, we develop a new filtering scheme that is based on sequential QMC and importance sampling (SQMC/IS). The performance and efficiency of



the SQMC/IS filters depend largely on the proposal distributions. Several proposal distributions are discussed within the SQMC/IS framework. The first one is similar to that used in the particle filter [11] and the lattice particle filter [21]; it depends only on the state equation, without exploiting the new measurement. The second proposal distribution exploits information in both the state process and the observations, and a conventional nonlinear filter is employed to construct the proposal distribution [13]. A split normal sampling method [10] is adopted to determine the QMC points based on the constructed proposal distribution. Finally, we propose to adopt a more sophisticated adaptive importance sampling scheme [31], [32] to successively refine the proposal distribution and, consequently, improve the efficiency of the algorithm.

The remainder of the paper is organized as follows. In Section II, we give a general background on QMC numerical integration techniques. In Section III, we develop the quasi-Monte Carlo Kalman filter (QMC-KF). In Section IV, we propose the SQMC algorithms. In Section V, we provide numerical examples to demonstrate the effectiveness of the proposed new filtering algorithms. Finally, Section VI contains the conclusion.

II. BACKGROUND ON QMC METHODS

In this section, we provide a brief introduction to QMC methods. Suppose we want to evaluate the integral of a real-valued function $f$ defined over the $d$-dimensional unit cube

$I(f) = \int_{[0,1)^d} f(\mathbf{u})\, d\mathbf{u}.$ (6)

A frequently used numerical approach is to choose a point set $\mathcal{P}_N = \{\mathbf{u}_1, \ldots, \mathbf{u}_N\}$ and take the average of $f$ over $\mathcal{P}_N$,

$\hat{I}(f) = \frac{1}{N} \sum_{i=1}^{N} f(\mathbf{u}_i)$ (7)

as an approximation of (6). When the dimension $d$ is small, the integral is well approximated by (7) using deterministic quadrature methods or sparse grids [9]. Note that for quadrature methods the approximation error is typically difficult to analyze for high-dimensional problems. When the dimension $d$ is large, MC integration is typically preferred over quadrature methods. In MC methods, we usually choose $\mathcal{P}_N$ as a set of independent and uniformly distributed vectors over $[0,1)^d$. Then $\hat{I}(f)$ is an unbiased estimator of $I(f)$ whose error can be approximated by invoking the central limit theorem and whose variance is $\sigma^2 / N$, with $\sigma^2 = \int_{[0,1)^d} \big(f(\mathbf{u}) - I(f)\big)^2\, d\mathbf{u}$, assuming $\sigma^2 < \infty$. Hence, the error associated with the MC method has the probabilistic order $O(N^{-1/2})$. A criticism of classical MC is that entirely random points tend to form gaps and clusters and, therefore, do not explore the sample space in the most uniform way.

QMC methods can be seen as a deterministic counterpart of the MC methods. They are based on the idea of using points $\mathcal{P}_N$ that are more regularly distributed than the random point set associated with MC. The fact that QMC methods are deterministic suggests that we can choose the points to provide the best possible spread in the sample space, avoiding to the extent possible the gaps and clusters that arise from random sampling. These deterministic points are also referred to as "low-discrepancy points." Theoretical analysis as well as empirical study suggests that the error $|\hat{I}(f) - I(f)|$ can be significantly lower for QMC methods than under entirely random sampling. Specifically, the error bounds of the best known low-discrepancy sequences have the asymptotic order $O(N^{-1}(\log N)^d)$, which for fixed dimension $d$ is a better asymptotic rate than the $O(N^{-1/2})$ rate associated with MC. Hence, we can expect QMC methods to approximate the integral with a smaller error than MC if the sample size $N$ is sufficiently large. In fact, empirical studies indicate that QMC can be far more accurate than MC even for practical values of $N$.

A. QMC Points

The two main families of low-discrepancy point sets are the digital nets and the lattice points [20]. Examples of digital nets include the Halton sequence, the Sobol sequence, and the Niederreiter sequence. Here we discuss the Halton sequence as an example. Given an integer base $b \geq 2$, the base-$b$ Halton sequence is constructed as follows. For $n = 1, 2, \ldots$: (a) write the integer $n$ in base $b$ as $n = \sum_{k \geq 0} a_k b^k$; (b) the $n$th element of the base-$b$ Halton sequence is obtained by reversing the order of these digits and putting a decimal point in front of them, i.e., $\phi_b(n) = \sum_{k \geq 0} a_k b^{-k-1}$. A multidimensional Halton sequence can be obtained by generating several one-dimensional (1-D) Halton sequences corresponding to different bases that are prime numbers. For example, we can take the first $d$ prime numbers $b_1, \ldots, b_d$ and generate the corresponding 1-D base-$b_j$ Halton sequences $\{\phi_{b_j}(n)\}$ for $j = 1, \ldots, d$. The $d$-dimensional Halton sequence is then given by $\mathbf{u}_n = (\phi_{b_1}(n), \ldots, \phi_{b_d}(n))$.
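For concreteness, the radical-inverse construction just described can be sketched in a few lines of Python; the function names and the default prime bases below are our illustration and are not part of the original development.

    import numpy as np

    def radical_inverse(n, base):
        # Reverse the base-b digits of n and place them after the radix point.
        inv, denom = 0.0, 1.0
        while n > 0:
            n, digit = divmod(n, base)
            denom *= base
            inv += digit / denom
        return inv

    def halton(n_points, primes=(2, 3, 5)):
        # One prime base per coordinate: point n is (phi_b1(n), ..., phi_bd(n)).
        return np.array([[radical_inverse(n, b) for b in primes]
                         for n in range(1, n_points + 1)])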

B. Randomization

One main practical disadvantage of QMC is that analyzing the accuracy of the approximation is typically very difficult. Moreover, since QMC methods are based on deterministic sequences, statistical procedures for error estimation do not apply. For these reasons, randomized QMC methods have been developed with the following two properties [15], [25], [26]: (a) every element of the sequence has a uniform distribution over the unit cube; and (b) the low-discrepancy property of the sequence is preserved under the randomization. In this paper, we consider two randomized QMC methods, namely, the randomized Halton sequence and randomized Latin hypercube sampling (LHS).

The randomized Halton sequences are generated with random starting points. That is, let $u_0$ be the starting point and $n_0$ be the corresponding integer, i.e., $u_0 = \phi_b(n_0)$. Then the $n$th point of the randomized sequence is $\phi_b(n_0 + n)$. In other words, the new sequence is the same as a Halton sequence with the first $n_0$ elements removed. A randomized multidimensional Halton sequence is constructed by letting each coordinate have its own random starting point.


As shown in [30], if the starting point is a uniform random vector, then each point in the randomized Halton sequence has a uniform distribution.

On the other hand, a randomized LHS can be defined as $u_n^{(j)} = (\pi_j(n) - 1 + \xi_n^{(j)}) / N$, where each $\pi_j$ is a random permutation of the sequence $\{1, \ldots, N\}$, and all of the permutations $\pi_j$ and the uniform random variables $\xi_n^{(j)}$ on $[0,1)$ are independent. For each coordinate $j$ and every interval of the form $[(i-1)/N, i/N)$, there is still one sample $u^{(j)}$ in that interval. With a finite number of samples $N$, the randomized LHS is never worse than the conventional MC [25].

Remark: The Gaussian quadrature points can be seen as a special case of QMC points. That is, the Gaussian quadrature methods are a family of deterministic numerical integration techniques that approximate the integral by a weighted sum of the function values at certain abscissas. However, the Gaussian quadrature methods suffer from the curse of dimensionality. Although sparse grids based on the Smolyak algorithm [9], [28] in principle allow the generalization of Gaussian quadrature points to high-dimensional integration, these methods are efficient only for low-dimensional problems. Another interesting group of points is the Julier points [18], which take only very sparse grid points on the axes and at the origin. It is clear that the Julier points can be considered as one specific type of QMC points with some weight on each point.
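The two randomizations above admit equally short sketches; the following Python code (an illustration under our naming, reusing the radical_inverse helper from the earlier sketch) implements the random-start Halton sequence and the randomized LHS.

    import numpy as np

    def randomized_halton(n_points, primes=(2, 3, 5), max_skip=10**6, rng=None):
        # Random-start Halton: per coordinate, skip a random number n0 of
        # initial elements, i.e., use phi_b(n0 + n) as the n-th point.
        rng = np.random.default_rng() if rng is None else rng
        starts = rng.integers(0, max_skip, size=len(primes))
        return np.array([[radical_inverse(int(n0) + n, b)
                          for n0, b in zip(starts, primes)]
                         for n in range(1, n_points + 1)])

    def latin_hypercube(n_points, dim, rng=None):
        # Randomized LHS: exactly one point in each slice [(i-1)/N, i/N)
        # per coordinate, with independent permutations and uniform jitter.
        rng = np.random.default_rng() if rng is None else rng
        u = np.empty((n_points, dim))
        for j in range(dim):
            u[:, j] = (rng.permutation(n_points) + rng.random(n_points)) / n_points
        return u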

C. Quasi-Gaussian Random Points

In this paper, we frequently encounter the approximation of a high-dimensional integral of the form $\int f(\mathbf{x})\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})\, d\mathbf{x}$, where $\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ is a Gaussian pdf. Thus, to allow the application of low-discrepancy point sets, we need to find a mapping that projects the integration domain onto the unit hypercube. We first treat the 1-D case and then extend it to the high-dimensional case. For a standard Gaussian distribution, the mapping can be defined as $u = \Phi(x)$, where $\Phi$ is the standard Gaussian cdf. Then the integral can be approximated using a low-discrepancy point set $\{u_1, \ldots, u_N\}$ as

$\int f(x)\, \mathcal{N}(x; 0, 1)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} f\big(\Phi^{-1}(u_i)\big)$ (8)

where the inverse mapping $\Phi^{-1}$ can be computed by a fast numerical algorithm [24]. Now suppose that $\{\mathbf{u}_1, \ldots, \mathbf{u}_N\}$ is a quasi-random sequence of vectors in the unit hypercube $[0,1)^d$. We can transform these vectors into a quasi-Gaussian sequence $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ by the following procedure: (a) perform a Cholesky decomposition on $\boldsymbol{\Sigma}$, i.e., $\boldsymbol{\Sigma} = \mathbf{C}\mathbf{C}^T$; (b) transform $\mathbf{u}_i$ to $\mathbf{z}_i$ via $\mathbf{z}_i = \big(\Phi^{-1}(u_{i,1}), \ldots, \Phi^{-1}(u_{i,d})\big)^T$; and (c) set $\mathbf{x}_i = \boldsymbol{\mu} + \mathbf{C}\mathbf{z}_i$. Then the integral with respect to a multidimensional Gaussian pdf can be approximated by

$\int f(\mathbf{x})\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})\, d\mathbf{x} \approx \frac{1}{N} \sum_{i=1}^{N} f(\mathbf{x}_i).$ (9)

The methods for generating quasi-random numbers for other distributions are discussed in [15].
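A minimal Python sketch of steps (a)–(c) is given below; it is our illustration, using SciPy's inverse normal cdf in place of the fast algorithm of [24].

    import numpy as np
    from scipy.stats import norm

    def quasi_gaussian(u, mean, cov):
        # u is an (N, d) array of low-discrepancy points in (0, 1)^d.
        C = np.linalg.cholesky(cov)      # (a) cov = C C^T
        z = norm.ppf(u)                  # (b) z_i = Phi^{-1}(u_i), componentwise
        return mean + z @ C.T            # (c) rows are the quasi-Gaussian points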

III. QMC-KF

Consider the dynamic system given by (1)–(2). Assume that $\mathbf{u}_t$ and $\mathbf{v}_t$ are independent and have zero-mean Gaussian distributions with covariance matrices $\mathbf{Q}_t$ and $\mathbf{R}_t$, respectively. The typical approximation made in deterministic filtering is that $p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})$ and $p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1})$ are Gaussian, i.e.,

$p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1}) \approx \mathcal{N}\big(\mathbf{x}_{t-1}; \hat{\mathbf{x}}_{t-1|t-1}, \mathbf{P}_{t-1|t-1}\big), \quad p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1}) \approx \mathcal{N}\big(\mathbf{x}_t; \hat{\mathbf{x}}_{t|t-1}, \mathbf{P}_{t|t-1}\big)$ (10)

where $\hat{\mathbf{x}}_{t|t-1}$ and $\mathbf{P}_{t|t-1}$ are given, respectively, by

$\hat{\mathbf{x}}_{t|t-1} = \int f_t(\mathbf{x}_{t-1})\, \mathcal{N}\big(\mathbf{x}_{t-1}; \hat{\mathbf{x}}_{t-1|t-1}, \mathbf{P}_{t-1|t-1}\big)\, d\mathbf{x}_{t-1}$ (11)

$\mathbf{P}_{t|t-1} = \int \big(f_t(\mathbf{x}_{t-1}) - \hat{\mathbf{x}}_{t|t-1}\big)\big(f_t(\mathbf{x}_{t-1}) - \hat{\mathbf{x}}_{t|t-1}\big)^T \mathcal{N}\big(\mathbf{x}_{t-1}; \hat{\mathbf{x}}_{t-1|t-1}, \mathbf{P}_{t-1|t-1}\big)\, d\mathbf{x}_{t-1} + \mathbf{Q}_t.$ (12)

Applying the QMC approximation (9) to (11) and (12), we have

$\hat{\mathbf{x}}_{t|t-1} \approx \frac{1}{N} \sum_{i=1}^{N} f_t(\mathbf{x}_i)$ (13)

$\mathbf{P}_{t|t-1} \approx \frac{1}{N} \sum_{i=1}^{N} \big(f_t(\mathbf{x}_i) - \hat{\mathbf{x}}_{t|t-1}\big)\big(f_t(\mathbf{x}_i) - \hat{\mathbf{x}}_{t|t-1}\big)^T + \mathbf{Q}_t$ (14)

where $\{\mathbf{x}_i\}$ are the quasi-Gaussian points with mean $\hat{\mathbf{x}}_{t-1|t-1}$ and covariance matrix $\mathbf{P}_{t-1|t-1}$, generated by the method discussed in Section II-C. Next, we approximate the conditional distribution of $\mathbf{y}_t$ given $\mathbf{y}_{1:t-1}$ as Gaussian, with mean and covariance given, respectively, by

$\hat{\mathbf{y}}_{t|t-1} \approx \frac{1}{N} \sum_{j=1}^{N} h_t(\mathbf{x}_j)$ (15)

$\mathbf{P}^{yy}_{t|t-1} \approx \frac{1}{N} \sum_{j=1}^{N} \big(h_t(\mathbf{x}_j) - \hat{\mathbf{y}}_{t|t-1}\big)\big(h_t(\mathbf{x}_j) - \hat{\mathbf{y}}_{t|t-1}\big)^T + \mathbf{R}_t$ (16)


where $\{\mathbf{x}_j\}$ are quasi-Gaussian points with mean $\hat{\mathbf{x}}_{t|t-1}$ and covariance $\mathbf{P}_{t|t-1}$. Then the mean and covariance of the Gaussian approximation of $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$ can be computed by

$\hat{\mathbf{x}}_{t|t} = \hat{\mathbf{x}}_{t|t-1} + \mathbf{K}_t \big(\mathbf{y}_t - \hat{\mathbf{y}}_{t|t-1}\big)$ (17)

$\mathbf{P}_{t|t} = \mathbf{P}_{t|t-1} - \mathbf{K}_t \mathbf{P}^{yy}_{t|t-1} \mathbf{K}_t^T$ (18)

where the Kalman filter gain $\mathbf{K}_t$ is defined by

$\mathbf{K}_t = \mathbf{P}^{xy}_{t|t-1} \big(\mathbf{P}^{yy}_{t|t-1}\big)^{-1}$ (19)

$\mathbf{P}^{xy}_{t|t-1} \approx \frac{1}{N} \sum_{j=1}^{N} \big(\mathbf{x}_j - \hat{\mathbf{x}}_{t|t-1}\big)\big(h_t(\mathbf{x}_j) - \hat{\mathbf{y}}_{t|t-1}\big)^T$ (20)

where $\{\mathbf{x}_j\}$ are the quasi-Gaussian points used in (15)–(16). Finally, the QMC-KF is summarized as follows.

Algorithm 1: [QMC-KF]
• Initialization: Set $\hat{\mathbf{x}}_{0|0}$ and $\mathbf{P}_{0|0}$.
The following steps are implemented at the $t$th recursion ($t = 1, 2, \ldots$):
• Prediction: Generate quasi-Gaussian points $\{\mathbf{x}_i\}$ with mean $\hat{\mathbf{x}}_{t-1|t-1}$ and covariance $\mathbf{P}_{t-1|t-1}$. Then compute $\hat{\mathbf{x}}_{t|t-1}$ and $\mathbf{P}_{t|t-1}$ by (13) and (14).
• Filtering: Generate quasi-Gaussian points $\{\mathbf{x}_j\}$ with mean $\hat{\mathbf{x}}_{t|t-1}$ and covariance $\mathbf{P}_{t|t-1}$. Then update $\hat{\mathbf{x}}_{t|t}$ and $\mathbf{P}_{t|t}$ by (17) and (18).
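One full QMC-KF recursion can be sketched as follows; this is an illustrative implementation under the additive-noise reading of (1)–(2), reusing the quasi_gaussian and latin_hypercube helpers from the earlier sketches, with f and h the state and measurement functions (both returning 1-D arrays).

    import numpy as np

    def qmc_kf_step(x_post, P_post, y, f, h, Q, R, n_points=500):
        # One QMC-KF recursion, implementing (13)-(20).
        d = len(x_post)
        # Prediction: quasi-Gaussian points around the previous posterior.
        pts = quasi_gaussian(latin_hypercube(n_points, d), x_post, P_post)
        fx = np.array([f(x) for x in pts])
        x_pred = fx.mean(axis=0)                                   # (13)
        P_pred = (fx - x_pred).T @ (fx - x_pred) / n_points + Q    # (14)
        # Filtering: quasi-Gaussian points around the predictive density.
        pts = quasi_gaussian(latin_hypercube(n_points, d), x_pred, P_pred)
        hx = np.array([h(x) for x in pts])
        y_pred = hx.mean(axis=0)                                   # (15)
        P_yy = (hx - y_pred).T @ (hx - y_pred) / n_points + R      # (16)
        P_xy = (pts - x_pred).T @ (hx - y_pred) / n_points         # (20)
        K = P_xy @ np.linalg.inv(P_yy)                             # (19)
        return x_pred + K @ (y - y_pred), P_pred - K @ P_yy @ K.T  # (17)-(18)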

Remark 1: The QMC strategy aforementioned can also be applied to the Gaussian sum filter [1], in which the conditional distribution is approximated by a mixture of Gaussian distributions, i.e., $p(\mathbf{x}_t \mid \mathbf{y}_{1:t}) \approx \sum_{m=1}^{M} w_t^{(m)} \mathcal{N}\big(\mathbf{x}_t; \hat{\mathbf{x}}_{t|t}^{(m)}, \mathbf{P}_{t|t}^{(m)}\big)$. Similarly, the conditional distributions $p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})$ and $p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1})$ are also approximated by mixture Gaussians, each component of which is updated independently. The weights $w_t^{(m)}$ are updated according to [1]

$w_t^{(m)} \propto w_{t-1}^{(m)}\, \mathcal{N}\big(\mathbf{y}_t; \hat{\mathbf{y}}_{t|t-1}^{(m)}, \mathbf{P}^{yy,(m)}_{t|t-1}\big).$ (21)

Remark 2: The QMC-KF discussed in this section can be rewritten in terms of the weighted specified points used by the Gaussian quadrature, sparse grids, and the Julier points. In other words, these numerical integration points can be considered as special QMC points, selected to minimize the numerical integration error for different classes of functions. Hence, the QMC-KF can be viewed as a framework unifying these methods. On the other hand, the QMC-KF can be used to solve very high-dimensional nonlinear problems, whereas the Gaussian quadrature method and sparse grids are efficient only for low-dimensional problems. The UKF is useful when the nonlinear functions are smooth, e.g., with second-order derivatives.

Remark 3: The Gaussian particle filter [19] is similar to the proposed method in that both assume a Gaussian approximation for the posterior distribution at time $t-1$. However, there are several major differences between the QMC-KF and the Gaussian particle filter. First, the QMC-KF replaces the standard MC points with QMC points. Second, given the approximated Gaussian conditional distribution $p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1})$, the QMC-KF computes the Gaussian approximation of $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$ the same way as in the usual Kalman filter, whereas the Gaussian particle filter simply propagates the particles with updated weights. Third, the Gaussian particle filter calculates the weight for each particle with a Gaussian approximation, hence making the scheme not a properly weighted one; in other words, almost sure convergence is not guaranteed.

A. Error Analysis

Next, we discuss the error bound for the algorithms discussed above. That is, we need to estimate the difference between the true posterior density $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$, which is computed by the recursive Bayesian formulas (3)–(5), and the approximated posterior density $\hat{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t})$, which is computed by the QMC-KF or the QMC Gaussian sum filter. For simplicity, we denote

$p(\mathbf{x}_t \mid \mathbf{y}_{1:t}) = b_t \circ a_t\big(p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})\big)$ and $\hat{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t}) = \hat{b}_t \circ \hat{a}_t\big(\hat{p}(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})\big)$ (22)

where $a_t$ and $b_t$ denote, respectively, the one-step prediction by (3) and the one-step update by (5) based on the conditional pdf $p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})$, whereas $\hat{a}_t$ and $\hat{b}_t$ are the Gaussian approximations by (13)–(14) to the operator $a_t$ and by (17)–(18) to the operator $b_t$, respectively. We further assume that the operators satisfy Lipschitz continuity conditions, i.e., for any probability density functions $p$ and $q$,

$\| a_t(p) - a_t(q) \| \leq A_t \| p - q \|, \qquad \| b_t(p) - b_t(q) \| \leq B_t \| p - q \|.$ (23)

The norm can be any norm, for example, the total variation norm. Then we have the following result.

Proposition 1: Under the Lipschitz continuity conditions (23), the accumulated error at time $t$ can be bounded by

$\big\| p(\mathbf{x}_t \mid \mathbf{y}_{1:t}) - \hat{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t}) \big\| \leq \delta_0 \prod_{s=1}^{t} A_s B_s + \sum_{s=1}^{t} (B_s \alpha_s + \beta_s) \prod_{r=s+1}^{t} A_r B_r$ (24)

where $\delta_0$, $\alpha_t$, and $\beta_t$ are the initial error, the error caused by the approximation of the operator $a_t$, and the error brought by $\hat{b}_t$, respectively, i.e.,

$\delta_0 = \| p(\mathbf{x}_0) - \hat{p}(\mathbf{x}_0) \|$ (25)

$\alpha_t = \| a_t(\hat{p}) - \hat{a}_t(\hat{p}) \|$ (26)

$\beta_t = \| b_t(\hat{a}_t(\hat{p})) - \hat{b}_t(\hat{a}_t(\hat{p})) \|.$ (27)


Proof: Denote $\delta_t = \| p(\mathbf{x}_t \mid \mathbf{y}_{1:t}) - \hat{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t}) \|$. We have

$\delta_t = \big\| b_t \circ a_t(p) - \hat{b}_t \circ \hat{a}_t(\hat{p}) \big\|$
$\leq \big\| b_t \circ a_t(p) - b_t \circ a_t(\hat{p}) \big\| + \big\| b_t \circ a_t(\hat{p}) - b_t \circ \hat{a}_t(\hat{p}) \big\| + \big\| b_t \circ \hat{a}_t(\hat{p}) - \hat{b}_t \circ \hat{a}_t(\hat{p}) \big\|$
$\leq A_t B_t\, \delta_{t-1} + B_t \big\| a_t(\hat{p}) - \hat{a}_t(\hat{p}) \big\| + \beta_t = A_t B_t\, \delta_{t-1} + B_t \alpha_t + \beta_t.$

Iterating this recursion down to $t = 0$ yields (24).

Fig. 1. Illustration of the sequential QMC algorithm.

Hence, the error propagation is closely related to the transfer operators and the initialization process. Furthermore, the error propagation will not blow up abruptly.

IV. SQMC ALGORITHMS

A. The SQMC Algorithm

In this section, we apply the QMC integration directly to the recursive formulas (3)–(5), without resorting to the Gaussian assumption as in the QMC-KF discussed in the previous section. Suppose that the probability mass of $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$ is negligible outside an interval $[\mathbf{a}_t, \mathbf{b}_t]$. Then we have

$\int f(\mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{y}_{1:t})\, d\mathbf{x}_t \approx \frac{V_t}{N} \sum_{i=1}^{N} f(\mathbf{x}_i)\, p(\mathbf{x}_i \mid \mathbf{y}_{1:t}), \qquad V_t = \prod_{k=1}^{d} (b_{t,k} - a_{t,k})$ (28)

$\mathbf{x}_i = \mathbf{a}_t + (\mathbf{b}_t - \mathbf{a}_t) \odot \mathbf{u}_i$ (29)

where $\{\mathbf{u}_i\}$ are the QMC points over $[0,1)^d$ and $\odot$ denotes the pointwise vector product. At each step of Bayesian filtering, if we can predecide the ranges of the integrations, then we can approximate (3)–(5) by QMC as follows:

$p(\mathbf{x}_i \mid \mathbf{y}_{1:t-1}) \approx \frac{V_{t-1}}{N} \sum_{j=1}^{N} p\big(\mathbf{x}_i \mid \mathbf{x}_j^{t-1}\big)\, p\big(\mathbf{x}_j^{t-1} \mid \mathbf{y}_{1:t-1}\big)$ (30)

$\tilde{p}(\mathbf{x}_i \mid \mathbf{y}_{1:t}) = p(\mathbf{y}_t \mid \mathbf{x}_i)\, p(\mathbf{x}_i \mid \mathbf{y}_{1:t-1})$ (31)

$p(\mathbf{x}_i \mid \mathbf{y}_{1:t}) \approx \frac{\tilde{p}(\mathbf{x}_i \mid \mathbf{y}_{1:t})}{\frac{V_t}{N} \sum_{j=1}^{N} \tilde{p}(\mathbf{x}_j \mid \mathbf{y}_{1:t})}$ (32)

where $\{\mathbf{x}_j^{t-1}\}$ are the QMC points over $[\mathbf{a}_{t-1}, \mathbf{b}_{t-1}]$ and $\{\mathbf{x}_i\}$ are the QMC points over $[\mathbf{a}_t, \mathbf{b}_t]$. The SQMC algorithm is summarized below and illustrated in Fig. 1.

Algorithm 2: [SQMC algorithm]
• Initialization: Generate QMC points in $[0,1)^d$ and transform them to $[\mathbf{a}_0, \mathbf{b}_0]$ by (29). Then compute the prior density values.
The following steps are implemented at the $t$th recursion ($t = 1, 2, \ldots$):
• Prediction:
— Select the interval $[\mathbf{a}_t, \mathbf{b}_t]$ on which $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$ has its support; this choice is discussed below.
— Generate QMC points in $[0,1)^d$ and transform them to $[\mathbf{a}_t, \mathbf{b}_t]$ by (29).
— Compute the predictive density values by (30).
• Filtering: Compute the posterior density values by (31)–(32).

Remark 1: Obviously, the main difficulty associated with the above SQMC algorithm is the choice of the interval $[\mathbf{a}_t, \mathbf{b}_t]$. The simplest way is to keep the interval invariant, i.e., $\mathbf{a}_t = \mathbf{a}_0$ and $\mathbf{b}_t = \mathbf{b}_0$. This is possible only when the state variables are stationary. In general, the choice of the interval has to be made at each time step in order for the filter to keep track of the varying state vector. For example, from the predictive density $p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1})$, we can roughly estimate the interval where $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$ has most of its mass. Another possible approach is to employ an auxiliary EKF to obtain estimates of $\hat{\mathbf{x}}_{t|t}$ and $\mathbf{P}_{t|t}$, based on which we can obtain an estimate of the interval.

Remark 2: We can reduce the complexity of the evaluation of (31) by adopting some simple schemes. For example, the computation in (30), which employs all $N$ samples of $p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})$, can be approximated by the computation with only the $K$ samples closest to $\mathbf{x}_i$. In other words, we choose only those samples of the predictive distribution that significantly contribute to the value $p(\mathbf{x}_i \mid \mathbf{y}_{1:t-1})$. Because $K \ll N$, this scheme is found very effective in reducing complexity, while the performance does not deteriorate much.
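A 1-D sketch of one SQMC recursion is given below; it is an illustration under our naming, where trans_pdf and lik_pdf are hypothetical vectorized callables for the transition density $p(x_t \mid x_{t-1})$ and the likelihood $p(y_t \mid x_t)$.

    import numpy as np

    def sqmc_step(x_prev, p_prev, vol_prev, interval, trans_pdf, lik_pdf, y, u):
        # One SQMC recursion (1-D). x_prev, p_prev: previous QMC points and
        # posterior density values; vol_prev: length of [a_{t-1}, b_{t-1}];
        # interval: current [a_t, b_t]; u: QMC points in (0, 1).
        a, b = interval
        x = a + (b - a) * u                                        # (29)
        n = len(x_prev)
        # (30): QMC approximation of the prediction integral (3).
        p_pred = np.array([vol_prev / n * np.sum(trans_pdf(xi, x_prev) * p_prev)
                           for xi in x])
        # (31)-(32): Bayes update, then QMC normalization over [a_t, b_t].
        p_post = lik_pdf(y, x) * p_pred
        p_post /= (b - a) / len(x) * np.sum(p_post)
        return x, p_post, b - a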


B. The SQMC/IS Algorithm

The importance sampling (IS) technique is another effective method to improve the performance of MC integration and is the cornerstone of the sequential Monte Carlo (SMC) methodologies for filtering nonlinear dynamic systems [22]. We next propose to combine the SQMC and IS techniques to further improve the performance of the Bayesian nonlinear filters. More precisely, a proposal density $q_t(\mathbf{x}_t)$, which can easily be sampled from, is chosen such that the support of $q_t$ contains that of $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$. Then QMC points $\{\mathbf{x}_i\}$ with distribution $q_t$ are obtained, and each point is given a weight $w_t^{(i)} \propto p(\mathbf{x}_i \mid \mathbf{y}_{1:t}) / q_t(\mathbf{x}_i)$. A discrete distribution is obtained by placing a probability mass $w_t^{(i)}$ at each point $\mathbf{x}_i$; this discrete density will converge in distribution to $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$ for large $N$.

We have the following general SQMC/IS recurrence:

$\hat{p}(\mathbf{x}_i \mid \mathbf{y}_{1:t-1}) = \sum_{j=1}^{N} w_{t-1}^{(j)}\, p\big(\mathbf{x}_i \mid \mathbf{x}_j^{t-1}\big)$ (33)

$\tilde{w}_t^{(i)} = \frac{p(\mathbf{y}_t \mid \mathbf{x}_i)\, \hat{p}(\mathbf{x}_i \mid \mathbf{y}_{1:t-1})}{q_t(\mathbf{x}_i)}$ (34)

$w_t^{(i)} = \frac{\tilde{w}_t^{(i)}}{\sum_{j=1}^{N} \tilde{w}_t^{(j)}}.$ (35)

The SQMC/IS algorithm is summarized as follows.

Algorithm 3: [SQMC/IS algorithm]
• Initialization: Generate QMC points in $[0,1)^d$ and transform them to $[\mathbf{a}_0, \mathbf{b}_0]$ by (29). Then compute the prior density values.
The following steps are implemented at the $t$th recursion ($t = 1, 2, \ldots$):
• Prediction:
— Compute the sampling density $q_t$, which is discussed in Sections IV-B-1–3.
— Generate QMC points $\{\mathbf{u}_i\}$ and transform them to $\{\mathbf{x}_i\}$ with distribution $q_t$.
— Compute the predictive density values by (33).
• Filtering: Compute the posterior density values (weights) by (34)–(35).

The performance of importance sampling depends on the sample size $N$ and on how close the trial distribution $q_t$ is to the target distribution $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$. If they are very different, then the weights $\{w_t^{(i)}\}$ will have a large variance, with only a few points having nonnegligible values, which in turn necessitates increasing the sample size $N$. In what follows, we discuss three forms of the trial distribution $q_t$.

1) Sequential Importance Sampling: A straightforward choice of the trial distribution is based on the state equation (1), i.e.,

$q_t(\mathbf{x}_t) = \sum_{j=1}^{N} w_{t-1}^{(j)}\, p\big(\mathbf{x}_t \mid \mathbf{x}_j^{t-1}\big)$ (36)

from which a sample $\mathbf{x}_i$ is drawn, and its associated weight is computed by

$\tilde{w}_t^{(i)} = \frac{p(\mathbf{y}_t \mid \mathbf{x}_i)\, \hat{p}(\mathbf{x}_i \mid \mathbf{y}_{1:t-1})}{q_t(\mathbf{x}_i)} = p(\mathbf{y}_t \mid \mathbf{x}_i)$ (37)

where the last equality follows from (33) and (36). To sample from (36), we use the following procedure.
• Draw the stream assignments from the multinomial distribution with probabilities $\big(w_{t-1}^{(1)}, \ldots, w_{t-1}^{(N)}\big)$.
The following steps are implemented for $j = 1, \ldots, N$:
• Find the set of draws assigned to stream $j$ and let $m_j$ denote the number of elements in it.
• Generate $m_j$ QMC points in $[0,1)^d$.
• Transform these points to samples of $p\big(\mathbf{x}_t \mid \mathbf{x}_j^{t-1}\big)$ according to the mapping of Section II-C.

Remark: The algorithm described above is similar to the auxiliary particle filter [2], which combines the propagation, reweighting, and resampling steps into one step. The lattice particle filter discussed in [21], the quasi-random sampling for condensation [29], and the regularized particle filter in [7] are closely related to the proposed method. However, the lattice particle filter and the quasi-random sampling for condensation generate the whole set of QMC points and employ one point per stream $j$ to obtain the new sample, which may destroy the low-discrepancy property of the QMC points to some degree. The regularized particle filter also takes a sampling scheme similar to the auxiliary particle filter, but it still generates the whole set of QMC points as the lattice particle filter does, which may also destroy the low-discrepancy property of the QMC points. On the other hand, the proposed algorithm strives to keep the low-discrepancy property of the QMC points in the mapping from $[0,1)^d$ to the new samples. In other words, the proposed method is at least as good as the lattice particle filter. If we sample the points randomly from a uniform distribution, the algorithm becomes the sequential importance sampler discussed in [11]. Therefore, both the lattice particle filter and the sequential importance sampler are special cases of the general SQMC/IS framework proposed here.

Although the above algorithm is easy to implement, oftentimes such a proposal distribution is not efficient, since it does not exploit the information in the observation $\mathbf{y}_t$. Note that the information on $\mathbf{x}_t$ comes from two sources: the previous state through the state equation (1) and the observation $\mathbf{y}_t$ through the observation equation (2). When the information in $\mathbf{y}_t$ is strong, it is important to exploit it in generating the importance samples to gain efficiency.
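The per-stream sampling procedure for the SIS proposal (36) can be sketched as follows; this illustration assumes additive Gaussian state noise, so that sampling the transition kernel reduces to the quasi-Gaussian map of Section II-C (helpers from the earlier sketches).

    import numpy as np

    def sis_mixture_proposal(x_prev, w_prev, f, Q, rng=None):
        # Sample N points from the mixture proposal (36): multinomial choice
        # of streams, then a fresh QMC point set per stream mapped through
        # the transition kernel x_t = f(x_{t-1}) + u_t, u_t ~ N(0, Q).
        rng = np.random.default_rng() if rng is None else rng
        n, dim = x_prev.shape
        counts = rng.multinomial(n, w_prev)
        new_pts = []
        for j, m in enumerate(counts):
            if m == 0:
                continue
            u = latin_hypercube(m, dim, rng)
            new_pts.append(quasi_gaussian(u, f(x_prev[j]), Q))
        return np.vstack(new_pts)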


2) Split Normal Importance Sampling: To fully exploit the information in both the state equation and the observation, we suggest the following trial distribution:

$q_t(\mathbf{x}_t) = \hat{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t}).$ (38)

However, rarely is it possible or practical to obtain samples whose distribution is the same as the true posterior distribution. Here we approximate the posterior distribution using a localized linear approximation or the nonlinear Kalman filter discussed in Section III. Since such an importance sampling density may be inaccurate, we make use of the split normal importance sampling strategy [10], in which the importance sampling is taken along each coordinate and adjusted to reduce the mean of the weights.

Let the approximated posterior be a Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, and let $\mathbf{e}_j$ be an indicator vector with 1 in the $j$th position and 0 elsewhere. Following [10], define the positive-side and negative-side scale factors $\delta_j^{+}$ and $\delta_j^{-}$ along each coordinate direction $\mathbf{e}_j$:

(39)

(40)

In practice, carrying out the evaluations for a few candidate scales seems to suffice. With the chosen $\delta_j^{+}$ and $\delta_j^{-}$, the split normal sampling procedure is readily described as follows.
• Generate $N$ randomized QMC points $\{\mathbf{z}_i\}$ with the standard Gaussian distribution $\mathcal{N}(\mathbf{0}, \mathbf{I})$.
• Obtain $\boldsymbol{\zeta}_i$ with each component given by

$\zeta_{ij} = \delta_j^{+}\, z_{ij}\, \mathbf{1}\{z_{ij} \geq 0\} + \delta_j^{-}\, z_{ij}\, \big(1 - \mathbf{1}\{z_{ij} \geq 0\}\big)$ (41)

where $\mathbf{1}\{\cdot\}$ is the indicator function of nonnegative numbers.
• Transform $\boldsymbol{\zeta}_i$ to $\mathbf{x}_i$ according to

$\mathbf{x}_i = \boldsymbol{\mu} + \mathbf{C}\, \boldsymbol{\zeta}_i$ (42)

where $\mathbf{C}$ is the Cholesky factor of $\boldsymbol{\Sigma}$.

Starting with the known prior distribution $p(\mathbf{x}_0)$, we have the initial samples and the Gaussian approximation $\mathcal{N}(\hat{\mathbf{x}}_{0|0}, \mathbf{P}_{0|0})$ to the prior distribution. Then the posterior distribution at time $t$ can be calculated recursively. The first step is to compute the Gaussian approximation to the posterior density function using the QMC-KF discussed in Section III. The next step is the prediction step by (33), employing the QMC points generated by the split normal sampling scheme. The last step is the correction step by (34)–(35), given the new observation. The algorithm is summarized as follows.

Algorithm 4: [SQMC/split normal IS algorithm]
• Initialization: Compute the Gaussian approximation $\mathcal{N}(\hat{\mathbf{x}}_{0|0}, \mathbf{P}_{0|0})$ to the prior distribution $p(\mathbf{x}_0)$.
The following steps are implemented at the $t$th recursion ($t = 1, 2, \ldots$):
• Prediction:
— Compute $\hat{\mathbf{x}}_{t|t}$ and $\mathbf{P}_{t|t}$ by the QMC-KF given $\hat{\mathbf{x}}_{t-1|t-1}$ and $\mathbf{P}_{t-1|t-1}$.
— Generate QMC points $\{\mathbf{x}_i\}$ based on $\mathcal{N}(\hat{\mathbf{x}}_{t|t}, \mathbf{P}_{t|t})$ by the split normal sampling method.
— Compute the values of the predictive density by (33).
• Filtering: Compute the weights (i.e., the posterior density) by (34)–(35).

To get a more accurate approximation to the true posterior distribution, we can employ a mixture Gaussian distribution to approximate $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$ and then employ the split normal sampling on each Gaussian component.
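The split normal transform of (41)–(42) can be sketched as follows; since the explicit expressions (39)–(40) for the scale factors are not reproduced here, delta_plus and delta_minus are taken as given per-coordinate arrays in this illustration (see [10] for their construction).

    import numpy as np
    from scipy.stats import norm

    def split_normal_sample(u, mean, cov, delta_plus, delta_minus):
        # Standard quasi-Gaussian points, coordinatewise scaling by delta+
        # (nonnegative part) or delta- (negative part) as in (41), then the
        # affine map (42) through the Cholesky factor of cov.
        z = norm.ppf(u)
        zeta = np.where(z >= 0.0, delta_plus, delta_minus) * z    # (41)
        C = np.linalg.cholesky(cov)
        return mean + zeta @ C.T                                  # (42)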

3) Adaptive Importance Sampling: Adaptive importance sampling was proposed in [31] to adaptively refine the posterior distribution using a mixture of Gaussian or $t$-distributions. The basic idea is to start from an initial guess of the posterior, say $q_t^{(0)}$, computed by the QMC-KF, and obtain $N_0$ QMC points $\{\mathbf{x}_i\}$ from it. Based on these samples, a weighted kernel estimate is constructed as

$q_t^{(1)}(\mathbf{x}_t) = \sum_{i=1}^{N_0} w^{(i)}\, \mathcal{N}\big(\mathbf{x}_t; \mathbf{x}_i, h^2 \mathbf{V}\big)$ (43)

where the weight $w^{(i)}$ is computed by (34)–(35) with $q_t = q_t^{(0)}$, and the covariance matrix $\mathbf{V}$ is defined as

$\mathbf{V} = \sum_{i=1}^{N_0} w^{(i)} \big(\mathbf{x}_i - \bar{\mathbf{x}}\big)\big(\mathbf{x}_i - \bar{\mathbf{x}}\big)^T$ with $\bar{\mathbf{x}} = \sum_{i=1}^{N_0} w^{(i)} \mathbf{x}_i$ (44)

and the bandwidth $h$ is a slowly decreasing function of the sample size $N_0$ that depends on the dimension $d$ of the state variable [31]. The weighted kernel estimate (43) is used as a second guess, namely $q_t^{(1)}$. Repeating this procedure, with a possibly different sample size $N_1$ from $N_0$, we can revise the mixture importance sampling density to $q_t^{(2)}$, and further if required. At the final stage, a relatively small number of samples may be sufficient for the desired accuracy. This representation can be used in the next iteration in the mixture QMC-KF. The adaptive importance sampling step at the $t$th recursion is summarized as follows.
• Compute an initial guess $q_t^{(0)}$ using the QMC-KF or the split normal sampling method.
For $l = 1, 2, \ldots$, do the following:
• Generate $N_{l-1}$ QMC points from $q_t^{(l-1)}$.
• Calculate the weights $\{w^{(i)}\}$ by (34)–(35).
• Construct the proposal distribution $q_t^{(l)}$ by (43).
The final posterior distribution is then given by (43) with the samples corresponding to the last iteration.
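One refinement step of (43)–(44) can be sketched as follows; because the paper's exact bandwidth rule is elided in the source, the standard kernel-smoothing rate $h = N^{-1/(d+4)}$ is used here purely as a stand-in assumption.

    import numpy as np

    def refine_proposal(samples, weights):
        # Build the weighted Gaussian-kernel mixture (43)-(44) from the
        # current weighted sample; the result serves as the next proposal.
        w = weights / np.sum(weights)
        x_bar = w @ samples
        centered = samples - x_bar
        V = (centered * w[:, None]).T @ centered      # (44)
        n, d = samples.shape
        h = n ** (-1.0 / (d + 4))                     # assumed bandwidth rule
        # Mixture components N(x_i, h^2 V) with weights w, as in (43).
        return samples, w, h ** 2 * V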


C. Analysis

1) Almost Sure Convergence: We next consider the asymptotic properties of the QMC filters proposed in the previous sections. By the theory of low-discrepancy sequences, a QMC estimate for numerical integration yields an error bound of order $O(N^{-1}(\log N)^d)$ in terms of $N$, the number of QMC points, and this holds under a very weak smoothness condition on the function $f$. In particular, the smoothness is measured using the total variation in the sense of Hardy and Krause, intuitively the length of the monotone segments of $f$. For a 1-D function with a continuous first derivative, it is simply

$V(f) = \int_0^1 |f'(u)|\, du.$ (45)

In high dimensions, the Hardy–Krause variation may be defined in terms of the integral of partial derivatives [20], [25]. In this section, we will first show that, as the number of the QMC points grows, the SQMC algorithm converges to the optimal filter almost surely in the sense of Hardy and Krause. Then we will show that the same result holds for the SQMC/IS algorithm. Before we analyze the convergence of the proposed algorithms, we give some definitions.

Definition 1 (Almost sure convergence): If $(\mu^N)$ is a sequence of probability measures, then $\mu^N$ converges (weakly) to $\mu$ if for any continuous bounded measurable function $f$,

$\lim_{N \to \infty} \int f\, d\mu^N = \int f\, d\mu.$ (46)

For the QMC schemes, we have to relax this definition, because we have to consider the estimation error for bounded variation functions in the Hardy–Krause sense.

Definition 2 (Almost sure convergence in the Hardy–Krause sense): If $(\mu^N)$ is a sequence of probability measures, then $\mu^N$ converges (weakly) to $\mu$ in the Hardy–Krause sense if $\lim_{N \to \infty} \int f\, d\mu^N = \int f\, d\mu$ for any continuous function $f$ of bounded variation in the sense of Hardy and Krause.

We next show that the QMC schemes have the property of almost sure convergence in the sense of Hardy–Krause. We need the following three lemmas.

Lemma 1 [3]: Let $(a^N)$ and $(b^N)$ be two sequences of continuous functions on the space of probability measures converging to $a$ and $b$, respectively, and let $c^N$ and $c$ be defined as $c^N = b^N \circ a^N$ and $c = b \circ a$, where the operator "$\circ$" denotes the composition of functions, i.e., $c(\mu) = b(a(\mu))$ for any probability measure $\mu$. Furthermore, let the functions $a^N$ and $b^N$ be perturbed by an operator $S^N$ that maps a measure to another measure supported on samples of size $N$. Then $b^N \circ S^N \circ a^N(\mu^N) \to b \circ a(\mu)$ if $S^N$ satisfies $S^N(\mu^N) \to \mu$ for all sequences of probability measures $\mu^N \to \mu$.

To consider the properties of the QMC algorithms, we need to extend the above lemma to almost sure convergence in the sense of Hardy–Krause. We have the following.

Lemma 2: Under the conditions of Lemma 1, if $S^N$ satisfies $S^N(\mu^N) \to \mu$ in the sense of Hardy–Krause for all probability measures $\mu^N \to \mu$, then $b^N \circ S^N \circ a^N(\mu^N) \to b \circ a(\mu)$ in the sense of Hardy–Krause.

Proof: As $b$ and $a$ are continuous, we have, by assumption, $S^N(a^N(\mu^N)) \to a(\mu)$ in the sense of Hardy–Krause. That is, the inner approximation converges in the sense of Hardy–Krause, which in turn implies $b^N(S^N(a^N(\mu^N))) \to b(a(\mu))$ in the sense of Hardy–Krause.

In the context of the QMC filtering schemes, $a$ is the mapping defined by the prediction step (3), i.e.,

$a(p)(\mathbf{x}_t) = \int p(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, p(\mathbf{x}_{t-1})\, d\mathbf{x}_{t-1}$ (47)

whereas $b(p)$ is the probability measure defined by the update (4)–(5), i.e., $b(p)(\mathbf{x}_t) \propto p(\mathbf{y}_t \mid \mathbf{x}_t)\, p(\mathbf{x}_t)$. It is natural to assume that $a$ and $b$ are continuous. This means that a slight variation in the input conditional distribution at time $t-1$ will not result in a large variation in the output conditional distribution at time $t$. That is, for the function $a$, the transition kernel of the signal is Feller, i.e., it has the property that for a continuous bounded function $f$, the function transferred by the kernel is also a continuous bounded function; while for $b$, the conditional likelihood function $p(\mathbf{y}_t \mid \mathbf{x}_t)$ is a continuous bounded positive function.

On the other hand, the perturbation $S^N$ in the QMC algorithm is not a fully random measure as in the particle filter scheme, but a predetermined quasi-random one. That is, the perturbation on a probability measure $\mu$ can be defined as $S^N(\mu) = \sum_{i=1}^{N} w_i\, \delta_{\mathbf{x}_i}$, where $\delta_{\mathbf{x}_i}$ is the point mass at the QMC point $\mathbf{x}_i$ and the weight $w_i$ is determined by the probability measure $\mu$ as in (31)–(32).

Lemma 3: If $S^N$ is defined as above, then it almost surely satisfies $S^N(\mu^N) \to \mu$ in the sense of Hardy–Krause whenever $\mu^N \to \mu$.

Proof: Let $(\mu^N)$ and $\mu$ be such that $\mu^N \to \mu$. Then for any function $f$ of bounded variation in the Hardy–Krause sense, the Koksma–Hlawka inequality gives

$\left| \int f\, dS^N(\mu^N) - \int f\, d\mu^N \right| \leq V(f)\, D_N^*(\mathbf{x}_1, \ldots, \mathbf{x}_N)$


where $V(f)$ is the variation of $f$ in the Hardy–Krause sense and $D_N^*$ is the star discrepancy of the points $\{\mathbf{x}_i\}$. Since $D_N^*$ goes to zero as $N$ goes to infinity and $V(f)$ is bounded, we have the almost sure convergence of $S^N(\mu^N)$ in the sense of Hardy–Krause.

By Lemmas 2 and 3, we have the following result.

Proposition 2: Assuming that the transition kernel is Feller and that the likelihood function is a bounded continuous positive function, the SQMC filter converges to the optimal filter almost surely in the sense of Hardy–Krause.

For the SQMC/IS schemes, we have the following result.

Proposition 3: Under the conditions of Proposition 2, and assuming that the trial sampling density $q_t$ is positive almost everywhere on the support of the posterior, the SQMC/IS filter converges to the optimal filter almost surely in the Hardy–Krause sense.

Proof: We first consider the integral $\int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} = \int f(\mathbf{x})\, \frac{p(\mathbf{x})}{q(\mathbf{x})}\, q(\mathbf{x})\, d\mathbf{x}$, with $p$ the target density and $q$ the trial density. Note that the integrand $f p / q$ remains of bounded variation whenever $f$ is, provided $q$ is positive on the support of $p$. Therefore, for any bounded variation function $f$, the weighted QMC approximation based on points drawn from $q$ converges as in Lemma 3. Then, following arguments similar to those in the proof of Proposition 2, we have almost sure convergence in the Hardy–Krause sense.

2) Error Bound: It is instructive to gain some insight into how the numerical error propagates with a fixed number of QMC points. Following [4], we define the numerical error $\varepsilon_t$ presented in $\hat{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t})$ as

$\varepsilon_t(\mathbf{x}_t) \triangleq \log \frac{\hat{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t})}{c_t\, p(\mathbf{x}_t \mid \mathbf{y}_{1:t})}$ (48)

where $\varepsilon_t$ is the accumulated error in the log likelihood function, and $p$ and $\hat{p}$ are defined in (5) and (32) [or (35)], respectively. The purpose of the factor $c_t$ is to avoid normalization. Suppose now that $\delta_t$ is the smallest number for which

$|\varepsilon_t(\mathbf{x}_t)| \leq \delta_t \quad \text{for all } \mathbf{x}_t$ (49)

i.e., $\delta_t$ indicates the relative error in $\hat{p}(\mathbf{x}_t \mid \mathbf{y}_{1:t})$. Assume that the samples are such that the error introduced at each recursion satisfies

$\delta_t \leq \gamma\, \delta_{t-1} + \epsilon, \qquad 0 \leq \gamma < 1.$ (50)

Then we have the following result.

Proposition 4: Under (49) and (50), the relative error $\delta_t$ is bounded by $\delta_t \leq \gamma^t \delta_0 + \epsilon\, (1 - \gamma^t)/(1 - \gamma)$.

Proof: By definition, we write $\delta_t \leq \gamma \delta_{t-1} + \epsilon$. Moreover, by (50) we have, iterating the recursion, $\delta_t \leq \gamma^t \delta_0 + \epsilon \sum_{s=0}^{t-1} \gamma^s$, which implies that $\delta_t \leq \gamma^t \delta_0 + \epsilon\, (1 - \gamma^t)/(1 - \gamma)$. Therefore, $\limsup_t \delta_t \leq \epsilon / (1 - \gamma)$, which completes the proof.

V. NUMERICAL RESULTS

In this section, we illustrate the performance of the proposed QMC-based algorithms via two numerical examples.

A. An Illustrative Example

We first consider the following 1-D nonlinear dynamic model:

(51)

(52)

where $u_t$ and $v_t$ are mutually independent white Gaussian noise sequences. We run each algorithm on data generated according to the above model and calculate the corresponding root mean-square error (RMSE) for the state variable $x_t$. The RMSE results of the various algorithms, averaged over 500 independent simulations, as well as their running times per simulation and the numbers of samples used, are listed in Table I. In Table I, the algorithms considered are divided into four groups. The first group consists of the variants of the Kalman filter, including the EKF, the UKF, the QKF, and the QMC-KF


TABLE I: THE RMSE PERFORMANCE OF VARIOUS NONLINEAR FILTERING ALGORITHMS FOR THE 1-D DYNAMIC SYSTEM GIVEN BY (51)–(52). THE RUNNING TIME IS MEASURED ON A DELL PC WITH A PENTIUM 4 AT 2.66 GHz.

[cf. Section III] with 500 randomized Halton points or 500 Latin hypercube points (LHS). It is seen that in this group, the QMC-KF with randomized Halton points has the best performance, and the QMC-KF with LHS offers similar performance. Although the performance of the UKF and the QKF is worse than that of the QMC-KF, they both outperform the EKF significantly. The second group in Table I consists of the two SQMC algorithms [cf. Section IV-A], in which the range of the state is either predetermined (SQMC-1) or determined by an auxiliary UKF (SQMC-2). It is seen that the latter algorithm (SQMC-2) outperforms the former one (SQMC-1). It is also seen that these two SQMC algorithms offer significant performance improvement over the algorithms in the first group. The third group in Table I consists of the particle filter (PF) and the SQMC/IS algorithm with the sequential importance sampling scheme (SQMC/IS/SIS) [cf. Section IV-B-1], grouped together because they have nearly the same complexity. It is seen that the performance of the PF can be slightly improved using the QMC points. Note that the performance of the SQMC/IS/SIS algorithm is not affected much by the choice of the type of the QMC points. The last group consists of the two SQMC/IS algorithms employing advanced trial sampling densities (SQMC/IS/split and SQMC/IS/adaptive) [cf. Sections IV-B-2 and IV-B-3]. It is seen that by employing the advanced importance sampling strategies, the performance can be significantly improved.

B. Target Tracking in a Sensor Network

Target tracking in sensor networks has received significant recent attention. Many methods have been proposed, among which the SMC approach is one of the very promising solutions. In target tracking, models that are linear in the state dynamics and nonlinear in the measurements are typically considered [8], [12], [16]:


$\mathbf{x}_t = \mathbf{F}\, \mathbf{x}_{t-1} + \mathbf{u}_t$ (53)

$y_t = h(\mathbf{x}_t) + v_t$ (54)

where (53) is the state equation, with the state noise $\mathbf{u}_t$ being white and stationary; and (54) is the sensing model, with $h(\cdot)$ being the nonlinear measurement function and the measurement noise $v_t$ being white and stationary, and independent of the state noise $\mathbf{u}_t$. Specifically, we consider a "point object" on the 2-D plane. In the context of a slowly maneuvering target, we consider a nearly constant-velocity model [8], [16]. That is, the state vector represents the coordinates and the velocities on the 2-D plane: $\mathbf{x}_t = (x_{1,t}, x_{2,t}, \dot{x}_{1,t}, \dot{x}_{2,t})^T$. The discretized state equation associated with the sampling interval $\Delta$ is

$\mathbf{x}_t = \begin{bmatrix} \mathbf{I}_2 & \Delta\, \mathbf{I}_2 \\ \mathbf{0} & \mathbf{I}_2 \end{bmatrix} \mathbf{x}_{t-1} + \mathbf{u}_t$ (55)

where $\mathbf{I}_2$ is the $2 \times 2$ identity matrix and $\mathbf{u}_t$ is a zero-mean Gaussian vector with covariance matrix $\mathbf{Q}$. Moreover, the relative angle model [12], [16] is used as the measurement model. That is, the sensors can measure the relative angle of an object with respect to their own positions, given by

$y_t = \arctan\!\left(\frac{x_{2,t} - s_2}{x_{1,t} - s_1}\right) + v_t$ (56)

where $(s_1, s_2)$ is the position of the sensor that is driven to sense. Once a sensor completes sensing, it selects which sensor is to sense at the next time instant and then transmits the belief state to that sensor. In the simulations, we use the following simple rule to select the next sensor. Let $\mathcal{S}$ be the set of all possible sensor nodes to which the current sensor node, the leader node $l$, will hand off; we take $\mathcal{S}$ to be all nodes that can communicate directly with the leader node $l$. Let $\hat{\mathbf{z}}_{t+1}$ be the predicted target location at time $t+1$. Then we take the sensor node nearest to this predicted location as the new leader. That is, the next sensor is given by

$l_{t+1} = \arg\min_{s \in \mathcal{S}} \big\| \boldsymbol{\zeta}_s - \hat{\mathbf{z}}_{t+1} \big\|$ (57)

where $\boldsymbol{\zeta}_s$ is the coordinate of sensor $s$.
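The three model components (55)–(57) can be sketched in Python as follows; this is an illustration under our naming and does not fix the noise covariance, which is left unspecified above.

    import numpy as np

    def cv_transition_matrix(dt):
        # Nearly constant-velocity model (55) for x = (p1, p2, v1, v2).
        return np.block([[np.eye(2), dt * np.eye(2)],
                         [np.zeros((2, 2)), np.eye(2)]])

    def bearing(x, sensor_pos):
        # Relative-angle measurement (56) seen from a sensor at sensor_pos.
        return np.arctan2(x[1] - sensor_pos[1], x[0] - sensor_pos[0])

    def next_leader(sensor_coords, predicted_pos):
        # Hand-off rule (57): the sensor nearest the predicted target location.
        return int(np.argmin(np.linalg.norm(sensor_coords - predicted_pos, axis=1)))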


TABLE II: THE RMSE PERFORMANCE AND RUNNING TIMES OF VARIOUS NONLINEAR FILTERING ALGORITHMS FOR THE TARGET TRACKING EXAMPLE.

Fig. 2. Actual and the estimated target trajectory using the QMC-KF, SQMC/IS/SIS, and SQMC/IS/adaptive algorithms. The dots represent the randomly deployed sensor nodes.

Our goal is to perform online estimation of the a posteriori distribution of the target position $\mathbf{x}_t$ at time $t$, based on the measurements $y_{1:t}$ collected over a densely deployed sensor field, and to test the performance of the QMC-based algorithms proposed in this paper. In the simulations, the sampling interval $\Delta$ is fixed. There are 1000 sensor nodes, represented by the dots in Fig. 2, randomly deployed on a rectangular field. The target trajectory shown in Fig. 2 is generated according to the state model (55) and then fixed for all simulations. The measurement noise is Gaussian with a fixed standard deviation (in radians). The prior of the target state is Gaussian with specified mean and covariance. We have implemented both the conventional nonlinear filtering algorithms (including the EKF, UKF, QKF, and PF) and the new QMC-based filtering algorithms developed in this paper (i.e., QMC-KF, SQMC, SQMC/IS/SIS, SQMC/IS/split, and SQMC/IS/adaptive). We run each algorithm over a simulated target trajectory of 600 time instants and calculate the corresponding RMSE for each component of the state vector $\mathbf{x}_t$. The

results are listed in Table II. The QMC points are simply taken as the LHS points, because the performance with different types of QMC points is similar. The relative performance among these algorithms shown in Table II is similar to that shown in Table I. It is seen that the best performance is achieved by the SQMC/IS scheme with the advanced importance sampling strategy, although the complexity of this technique is also the highest. Finally, one realization of mobility tracking is shown in Fig. 2. The dotted line is the actual trajectory; the solid line, the dashed line, and the dash-dotted line are the online estimated target trajectories using the QMC-KF, SQMC/IS/SIS, and SQMC/IS/adaptive algorithms, respectively. The trajectory predicted by the SQMC/IS/adaptive algorithm follows the actual trajectory closely. The dashed line produced by the SQMC/IS/SIS algorithm is somewhat farther from the actual trajectory, while the solid line predicted by the QMC-KF cannot closely follow the actual trajectory. Moreover, all of the algorithms have good convergence properties.

VI. CONCLUSION

We have developed QMC-based Bayesian filtering techniques for general nonlinear dynamic systems. Specifically, we have proposed two general frameworks employing QMC points to sequentially estimate the unobserved state variable. The first scheme, the QMC-KF, sequentially approximates the true posterior state distribution with Gaussian distributions or mixtures of Gaussian distributions, based on QMC integration. The second framework, namely, the SQMC filters, can be used in conjunction with the IS strategy, resulting in a family of SQMC/IS filters. In particular, under this general framework, we proposed three proposal distributions for importance sampling, based on sequential importance sampling (SIS), split normal importance sampling, and adaptive importance sampling, respectively. Numerical results show that, compared with existing filtering algorithms, the proposed QMC-based algorithms offer superior performance. Furthermore, the error propagation behavior and the convergence properties of the SQMC and SQMC/IS algorithms have been analyzed.


REFERENCES

[1] D. L. Alspach and H. W. Sorenson, "Nonlinear Bayesian estimation using Gaussian sum approximations," IEEE Trans. Autom. Control, vol. 17, pp. 439–448, 1972.
[2] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Trans. Signal Process., vol. 50, no. 2, pp. 174–188, Feb. 2002.
[3] D. Crisan and A. Doucet, "A survey of convergence results on particle filtering methods for practitioners," IEEE Trans. Signal Process., vol. 50, no. 3, pp. 736–746, Mar. 2002.
[4] A. Doucet, J. F. G. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag, 2001.
[5] A. Doucet, S. J. Godsill, and C. Andrieu, "On sequential simulation-based methods for Bayesian filtering," Stat. Comput., vol. 10, no. 3, pp. 197–208, 2000.
[6] H. W. Sorenson, Ed., Kalman Filtering: Theory and Application. New York: IEEE Press, 1985.
[7] P. Fearnhead, "Using random quasi-Monte Carlo within particle filters, with application to financial time series," J. Comput. Graph. Stat., vol. 14, pp. 751–769, 2005.
[8] T. E. Fortmann, Y. Bar-Shalom, and M. Scheffe, "Sonar tracking of multiple targets using joint probabilistic data association," IEEE J. Ocean. Eng., vol. OE-8, pp. 173–184, Jul. 1983.
[9] T. Gerstner and M. Griebel, "Numerical integration using sparse grids," Numer. Algorithms, vol. 18, no. 4, pp. 209–232, 1998.
[10] J. Geweke, "Bayesian inference in econometric models using Monte Carlo integration," Econometrica, vol. 57, no. 6, pp. 1317–1339, 1989.
[11] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," Inst. Elect. Eng. Proc. F: Radar Signal Process., vol. 140, pp. 107–113, 1993.
[12] D. Guo, X. Wang, and R. Chen, "New sequential Monte Carlo methods for nonlinear dynamic systems," Stat. Comput., vol. 15, no. 2, pp. 135–147, 2005.
[13] J. E. Handschin, "Monte Carlo techniques for prediction and filtering of non-linear stochastic processes," Automatica, vol. 6, pp. 555–563, 1970.
[14] A. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge, U.K.: Cambridge Univ. Press, 1989.
[15] S. Henderson, R. Chiera, and R. Cooke, "Generating 'dependent' quasi-random numbers," in Proc. 2000 Winter Simulation Conf., J. Joines, R. Barton, and P. Fishwick, Eds., 2000, pp. 527–536.
[16] C. Hue, J.-P. Le Cadre, and P. Pérez, "Sequential Monte Carlo methods for multiple target tracking and data fusion," IEEE Trans. Signal Process., vol. 50, no. 2, pp. 309–325, Feb. 2002.
[17] K. Ito and K. Xiong, "Gaussian filters for nonlinear filtering problems," IEEE Trans. Autom. Control, vol. 45, no. 5, pp. 910–927, May 2000.
[18] S. J. Julier and H. F. Durrant-Whyte, "A new method for the nonlinear transformation of means and covariances in filters and estimators," IEEE Trans. Autom. Control, vol. 45, no. 3, pp. 477–482, Mar. 2000.
[19] J. Kotecha and P. M. Djurić, "Gaussian sum particle filtering," IEEE Trans. Signal Process., vol. 51, no. 10, pp. 2602–2612, Oct. 2003.
[20] P. L'Ecuyer and C. Lemieux, "Recent advances in randomized quasi-Monte Carlo methods," in Modeling Uncertainty: An Examination of Stochastic Theory, Methods, and Applications. Boston, MA: Kluwer Academic, 2002.
[21] D. Ormoneit, C. Lemieux, and D. Fleet, "Lattice particle filters," in Proc. 17th Annu. Conf. Uncertainty in Artificial Intelligence, 2001, pp. 395–402.
[22] J. S. Liu, Monte Carlo Strategies in Scientific Computing. New York: Springer-Verlag, 2001.
[23] J. S. Liu and R. Chen, "Sequential Monte Carlo methods for dynamic systems," J. Amer. Statist. Assoc., vol. 93, pp. 1032–1044, 1998.
[24] G. Marsaglia and A. Zaman, "Rapid evaluation of the inverse of the normal distribution function," Statist. Probab. Lett., vol. 19, pp. 259–266, 1994.
[25] A. Owen, "Monte Carlo variance of scrambled net quadrature," SIAM J. Numer. Anal., vol. 34, no. 5, pp. 1884–1910, 1997.
[26] A. Owen, "Latin supercube sampling for very high-dimensional simulations," ACM Trans. Model. Comput. Simul., vol. 8, no. 2, pp. 71–102, 1998.
[27] A. Papageorgiou, "Fast convergence of quasi-Monte Carlo for a class of isotropic integrals," Math. Comput., vol. 70, pp. 297–306, 2001.
[28] K. Petras, "Fast calculation of coefficients in the Smolyak algorithm," Numer. Algorithms, vol. 26, no. 2, pp. 93–109, 2001.
[29] V. Philomin, R. Duraiswami, and L. S. Davis, "Quasi-random sampling for condensation," in Proc. Eur. Conf. Computer Vision, 2000, pp. 134–149.
[30] X. Wang and F. Hickernell, "Randomized Halton sequences," Math. Comput. Model., vol. 32, pp. 887–899, 2000.
[31] M. West, "Mixture models, Monte Carlo, Bayesian updating and dynamic models," Comput. Sci. Stat., vol. 24, pp. 325–333, 1993.
[32] M. West and J. Harrison, Bayesian Forecasting and Dynamic Models. New York: Springer-Verlag, 1989.

Dong Guo received the B.S. degree in geophysics and computer science from the China University of Mining and Technology (CUMT), Xuzhou, in 1993, the M.S. degree in geophysics from the Graduate School of Research Institute of Petroleum Exploration and Development (RIPED), Beijing, China, in 1996, and the Ph.D. degree in applied mathematics from Beijing University, Beijing, in 1999. In 2004, he received a second Ph.D. degree from the Department of Electrical Engineering, Columbia University, New York. From 2004 to present, he has been a Research Associate with Rutter Associates, New York. His current research interests include Monte Carlo-based statistical signal processing, time series analysis, and their applications in finance.

Xiaodong Wang (S’98–M’98 –SM’04) received the B.S. degree from Shanghai Jiao Tong University, Shanghai, China; the M.S. degree from Purdue University, West Lafayette, IN; and the Ph.D. degree from Princeton University, Princeton, NJ, all in electrical engineering. From July 1998 to December 2001, he was with the Department of Electrical Engineering, Texas A&M University. Since January 2002, he has been with the Department of Electrical Engineering, Columbia University, New York. His research interests fall in the general areas of computing, signal processing, and communications. He has published extensively in these areas. Among his publications is a recent book entitled Wireless Communication Systems: Advanced Techniques for Signal Reception (Englewood Cliffs, NJ: Prentice Hall, 2003). His current research interests include wireless communications, statistical signal processing, and genomic signal processing. Dr. Wang received the 1999 NSF CAREER Award, and the 2001 IEEE Communications Society and Information Theory Society Joint Paper Award. He currently serves as an Associate Editor for the IEEE TRANSACTIONS ON COMMUNICATIONS, the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, the IEEE TRANSACTIONS ON SIGNAL PROCESSING, and the IEEE TRANSACTIONS ON INFORMATION THEORY.