IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 56, NO. 10, OCTOBER 2008


Maximum Likelihood Estimation of Transition Probabilities of Jump Markov Linear Systems Umut Orguner, Member, IEEE, and Mübeccel Demirekler, Member, IEEE

Abstract—This paper describes an online maximum likelihood estimator for the transition probabilities associated with a jump Markov linear system (JMLS). The maximum likelihood estimator is derived using the reference probability method, which exploits a hypothetical probability measure to find recursions for complex expectations. An expectation-maximization (EM) procedure is utilized for maximizing the likelihood function. In order to avoid the exponential increase in the number of statistics of the optimal EM algorithm, we make interacting multiple model (IMM)-type approximations. The resulting method needs the mode weights of an IMM filter with $n^3$ components, where $n$ is the number of models in the JMLS. The algorithm can also supply base-state estimates and covariances as a by-product. The performance of the estimator is illustrated on two simulated examples and compared to a recently proposed alternative.

Index Terms—Interacting multiple model (IMM), jump Markov linear system (JMLS), maximum likelihood (ML), transition probability matrix (TPM).

I. INTRODUCTION

Jump Markov linear systems (JMLSs) are used as a natural modeling framework for many problems in control theory, signal processing, and telecommunications (see [1] and the references therein). Specifically, in maneuvering target tracking, they are extensively used for modeling the motion of a target that can jump between a finite number of dynamics [2], [3]. The evolution of the model parameters of a jump Markov linear system follows a finite-state Markov chain whose transition probabilities (TPs) are usually assumed to be known a priori. The state estimation literature for JMLSs is vast [1], [4]–[6] and is still expanding [7], [8]. The existing algorithms range from the classical generalized pseudo-Bayesian (GPB) [9], [10] and the interacting multiple model (IMM) [3], [11] filters to stochastic sampling-based methods [12]–[14]. With the advent of particle filters, there has also been a considerable amount of research in the state estimation of jump Markov (nonlinear) systems [15]–[17]. The use of heuristically selected, diagonally dominant transition probability matrices (TPMs) is generally believed to yield satisfactory results in most of the research mentioned above.

The uncertainty caused by the unknown TPs associated with a JMLS attracted the attention of researchers as early as the 1970s, when a Bayesian solution to the problem of TP estimation, involving a numerical integration, was presented in [18]. In this work, Sawaragi et al. also considered the adaptive estimation of the state using these estimates in a system having interrupted measurements, which actually was a special case of a JMLS. At that time, this kind of system was quite popular, and later, Tugnait and Haddad investigated the asymptotic behavior of this Bayesian solution when the unknown TPM can take values from a finite set in [19]. For the same case (where the unknown TPM can take values from a finite set), after Tugnait and Haddad presented a similar solution for linear systems with Markovian jump noise parameters [20], Tugnait proposed an approximate maximum likelihood (ML) approach for general JMLSs [10]. Even when the TPM belongs to a finite set, it is concluded in [10] that the standard ML estimation is not computationally feasible. Much later, an (approximate) expectation-maximization procedure that maximizes a lower bound on the log-likelihood was given in [21] for switching state-space models, where the states of uncoupled state-space systems that have constant dynamics are switched in the output. This type of system, as described in [21], differs from JMLSs although some researchers use a similar phrase, "switching systems," for JMLSs (e.g., [22]). Therefore, until 2004, when Jilkov and Li addressed the problem of online minimum mean-square error (MMSE) estimation of the TPs [23], the literature seemed to lack a complete approach to TP identification for JMLSs where the TPs belong to a continuous-valued set (rather than to a finite set). There have been, on the other hand, studies that tried to make the state estimation more robust to the unknown TPs.

Manuscript received October 29, 2007; revised October 24, 2008. First published July 25, 2008; current version published September 17, 2008. The associate editor coordinating the review of this paper and approving it for publication was Dr. Thierry Blu. U. Orguner was with the Department of Electrical and Electronics Engineering, Middle East Technical University, 06531 Ankara, Turkey. He is now with the Division of Automatic Control, Department of Electrical Engineering, Linköping University, 581 83 Linköping, Sweden (e-mail: [email protected]). M. Demirekler is with the Department of Electrical and Electronics Engineering, Middle East Technical University, 06531 Ankara, Turkey (e-mail: [email protected]). Digital Object Identifier 10.1109/TSP.2008.928936
Namely, Doucet and Ristic in [24] integrate out the unknown TPs from the posterior densities using a Bayesian methodology with Dirichlet priors for the TPs. A completely different approach is to design state estimators that are robust to uncertain TPs (see [25] and the references therein). It has been clearly shown by [23] that it is possible to boost the performance of multiple model estimation algorithms by estimating the TPs of JMLSs in an online fashion. Therefore, there has been increasing interest in the online estimation of TPs associated with JMLSs. In [23], the authors propose suboptimal MMSE estimators for the TPs. The multiple model estimation problem with unknown TPs is attacked in [26] using Bayesian sampling algorithms. Quite recently, a simple online TP estimator, named the recursive Kullback–Leibler (RKL) algorithm, was derived in [27] that approximately minimizes the Kullback–Leibler divergence between the likelihood of measurements given the TPs and the true likelihood. A very interesting approach is given in [17], which models the mode probability distribution as a Markov process with a Dirichlet distribution. Although the TPs are not explicitly considered, this approach [17] provides flexibility and adaptiveness against the uncertainty caused by the unknown TPs.

As mentioned above, the ML estimation algorithm was one of the first approaches applied to the TP estimation of JMLSs. However, the existing ML-based algorithms suffer from the following drawbacks.
• They consider only the cases where the unknown TPMs are in a finite set [10].
• They do not apply to JMLSs [21].
In addition to these, the convenient approximation scheme of the IMM filter, which can be used to avoid the infeasibility of the optimal ML approach for JMLSs, was not yet available to Tugnait in [10]. The favorable position that the IMM filter occupies on the performance-versus-computation curve makes it a useful candidate for approximating the infeasible ML algorithms.

In this paper, using the motivation above, we propose an ML estimation-based solution to the TP estimation problem in JMLSs. Our algorithm is new in that it assumes that the unknown TPM takes values in a continuous-valued set and it is derived specifically for JMLSs. The algorithm can be used either online or offline. In accordance with the difficulty of the problem, the approximate solution turns out to be quite computationally demanding, requiring the mode weights of an $n^3$-component IMM filter, where $n$ is the number of models in the JMLS. We use the well-known reference probability method [28], [29], in which a new probability measure is defined and exploited, to derive the TP estimation algorithm; this makes it easy to identify the approximations required to form a feasible method.

This paper is organized as follows. In Section II, the problem definition and an introduction to the reference probability method are given.

1053-587X/$25.00 © 2008 IEEE
Application of the ML algorithm to JMLSs is presented in Section III, which concludes that the algorithm should calculate the estimates of the number of jumps between the states of the Markov chain recursively. The main results are presented in Section IV, where the related recursive calculation procedure is derived. The formulas for the final TP estimates utilizing the estimates of the number of jumps are given in Section V. Section VI presents a brief summary of the derived algorithm along with some modifications and implementation issues. Moreover, Section VI also outlines how base-state estimation can be achieved with the derived algorithm without any additional base-state estimator. The TP estimation performance of the algorithm is illustrated in Section VII using the same examples as those in [27]. A comparison with the RKL method of [27] is also made in Section VII. The paper ends with conclusions and some related future work in Section VIII.

II. PROBLEM DEFINITION

The following jump Markov linear system model defined in a probability space is considered in this paper:

$$x_{k+1} = A_k(r_k)x_k + B_k(r_k)w_k \quad (1)$$
$$y_k = C_k(r_k)x_k + D_k(r_k)v_k \quad (2)$$

where
• $\{x_k\}$ is the continuous-valued base-state sequence with initial distribution $x_0 \sim \mathcal{N}(x;\bar{x}_0,\Sigma_0)$, where the notation $\mathcal{N}(x;\bar{x},\Sigma)$ stands for a Gaussian probability density function for dummy variable $x$ (or evaluated at $x$), which has mean $\bar{x}$ and covariance $\Sigma$;
• $\{r_k\}$ is the unknown discrete-valued modal-state sequence;
• $\{y_k\}$ is the noisy observation sequence;
• $\{w_k\}$ is a white process noise sequence with distribution $w_k \sim \mathcal{N}(w;0,Q)$;
• $\{v_k\}$ is a white measurement noise sequence independent from the process noise with distribution $v_k \sim \mathcal{N}(v;0,R)$.

The discrete-valued modal state $r_k \in \{e_1, e_2, \dots, e_n\}$ is assumed to be a first-order finite-state homogeneous Markov chain with fixed but unknown TPM $\Pi \triangleq [\pi_{ij}]$. Here, the variable $e_i$ denotes the canonical unit vector with unity at the $i$th position and zeros elsewhere. The initial probability distribution of $r_0$ is given as $\nu_i \triangleq P(r_0 = e_i)$. This type of state-space selection for the Markov chain, which has been taken from [28], [29], enables us to write a so-called semimartingale representation

$$r_{k+1} = \Pi^T r_k + m_{k+1} \quad (3)$$

where $m_{k+1}$ is a martingale increment, i.e., $E[m_{k+1} \mid \mathcal{R}_k] = 0$, where we adopted, for any signal $\{a_k\}$, the notation $\mathcal{A}_k$ for the (complete) filtration generated by $\{a_0, a_1, \dots, a_k\}$. Another merit of the state-space representation is that indicator functions about $r_k$ can be written in terms of inner products of $r_k$ as a linear operation. Consider the indicator function $I_{\{e_i\}}(r_k)$, where the notation $I_A(\cdot)$ is defined for a set $A$ as

$$I_A(x) = \begin{cases} 1, & x \in A \\ 0, & \text{otherwise} \end{cases} \quad (4)$$

i.e., $I_A(\cdot)$ denotes the indicator function of the set $A$. The function $I_{\{e_i\}}(r_k)$ is equal to unity when $r_k = e_i$ and zero otherwise. Such a function is actually a random variable and, with the state-space selection described above, can be expressed as $\langle r_k, e_i \rangle$, where the notation $\langle \cdot, \cdot \rangle$ denotes a standard inner product in $\mathbb{R}^n$, i.e., $\langle a, b \rangle \triangleq a^T b$ with $a, b \in \mathbb{R}^n$. Since $r_k$ can have values only in the set $\{e_1, \dots, e_n\}$, the random variable $\langle r_k, e_i \rangle$ is unity when $r_k = e_i$ and zero otherwise. Hence, $E[\langle r_k, e_i \rangle] = P(r_k = e_i)$. The following properties easily follow from the properties of the inner product and are used in the mathematical derivations presented in this paper:

$$\langle r_k, e_i \rangle \langle r_k, e_j \rangle = \delta_{ij} \langle r_k, e_i \rangle \quad (5)$$
$$\sum_{i=1}^{n} \langle r_k, e_i \rangle = 1 \quad (6)$$

for all $i$, $j$, and $k$. The basic variables $x_0$, $\{w_k\}$, $\{v_k\}$, and the modal-state sequence $\{r_k\}$ are assumed to be mutually independent for all $k$. The time-varying matrices $A_k(\cdot)$, $B_k(\cdot)$, $C_k(\cdot)$, and $D_k(\cdot)$ are assumed to be known for each value of their argument. The aim of our study reported in this paper is to find a possibly approximate online ML estimator for the unknown fixed TPs of the jump Markov linear system defined above.
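The model (1)–(2) is straightforward to simulate for experimentation. The following sketch uses hypothetical two-mode scalar parameters (not the paper's examples) and assumes unit-covariance noises; `simulate_jmls`, the matrices, and the TPM values are all illustrative:

```python
import numpy as np

def simulate_jmls(A, B, C, D, Pi, x0, T, rng):
    """Simulate a jump Markov linear system.
    A, B, C, D: lists of per-mode matrices; Pi[i, j] = P(r_{k+1}=j | r_k=i)."""
    n = Pi.shape[0]
    r = int(rng.integers(n))                     # initial mode (uniform, for simplicity)
    x = np.array(x0, dtype=float)
    modes, xs, ys = [], [], []
    for _ in range(T):
        w = rng.standard_normal(B[r].shape[1])   # process noise ~ N(0, I)
        v = rng.standard_normal(D[r].shape[1])   # measurement noise ~ N(0, I)
        y = C[r] @ x + D[r] @ v
        modes.append(r); xs.append(x.copy()); ys.append(y)
        x = A[r] @ x + B[r] @ w                  # base-state transition, eq. (1)
        r = int(rng.choice(n, p=Pi[r]))          # modal jump according to row r of the TPM
    return np.array(modes), np.array(xs), np.array(ys)

rng = np.random.default_rng(0)
A = [np.array([[0.9]]), np.array([[0.5]])]
B = [np.array([[1.0]]), np.array([[1.0]])]
C = [np.array([[1.0]]), np.array([[0.0]])]   # second mode: measurement carries no state info
D = [np.array([[0.1]]), np.array([[0.1]])]
Pi = np.array([[0.95, 0.05], [0.10, 0.90]])
modes, xs, ys = simulate_jmls(A, B, C, D, Pi, [0.0], 500, rng)
```

Such simulated runs are the natural test bed for the estimators discussed in the remainder of the paper.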


A. Reference Probability Method

The reference probability method [28], [29] is a way of constructing information states [30] for taking complicated expected values by making use of a hypothetical probability measure denoted as $\bar{P}$. The probability measure $\bar{P}$ is defined such that, under $\bar{P}$, we have the following.
• The observations $\{y_k\}$ form a sequence of independently distributed (i.d.) random variables, which are Gaussian distributed with zero mean and known covariance (we assume the relevant covariance matrices are invertible so that the densities exist). The sequence $\{y_k\}$ in this case is independent of both of the sequences $\{x_k\}$ and $\{r_k\}$.
• The base states $\{x_k\}$ form a sequence of i.d. random variables, which are Gaussian distributed with zero mean and known covariance. The sequence $\{x_k\}$ in this case is independent of both of the sequences $\{y_k\}$ and $\{r_k\}$.
• The distribution of the modal-state sequence $\{r_k\}$ under $\bar{P}$ is the same as it is under $P$.

The aim in defining such a probability measure is to express every expectation under the probability measure $P$ in terms of expectations under the probability measure $\bar{P}$. The probability measure $P$ is the nominal measure under which our results are required. However, generally, finding complex expectations and recursions under $P$ is very difficult. The expectations under the reference probability measure $\bar{P}$ can be taken much more easily than those under $P$ thanks to the independence properties. To define a relationship between $P$ and $\bar{P}$, we consider a function $f$ of the system variables. Assuming that the corresponding joint probability density functions exist under $P$ and $\bar{P}$, respectively, we have

$$E[f] = \bar{E}[\Lambda_k f] \quad (7)$$

where we defined the random variable $\Lambda_k$ as the ratio of the joint densities under $P$ and $\bar{P}$. The identity (7) is the main relation between the unconditional expectations $E[\cdot]$ and $\bar{E}[\cdot]$. Now, consider a conditional expectation

(8)
(9)
(10)
(11)
(12)
(13)

Thus, we have the basic identity for the conditional expectations as

$$E[f \mid \mathcal{Y}_k] = \frac{\bar{E}[\Lambda_k f \mid \mathcal{Y}_k]}{\bar{E}[\Lambda_k \mid \mathcal{Y}_k]} \quad (14)$$

We now concentrate on the random variable $\Lambda_k$, which is given as the ratio of the joint densities. Since the distribution of $\{r_k\}$ is the same under both $P$ and $\bar{P}$, we have

(15)

We decompose the density function in the following form:

(16)

where the dependence of the densities on their arguments is suppressed for simplicity. The joint density can also be decomposed in the same way. Noting that, under $\bar{P}$, this decomposition factorizes due to the independence properties, it can be written as

(17)

Therefore, taking the ratio of the two decompositions, we can write the random variable $\Lambda_k$ in a multiplicative form as

$$\Lambda_k = \prod_{l=0}^{k} \lambda_l \quad (18)$$

Similar random variables are also used in [8] and [31]. In measure theoretical terms, the random variable $\Lambda_k$ is a restriction of the Radon–Nikodym derivative of the probability measure $P$ with respect to the probability measure $\bar{P}$ (shown as $dP/d\bar{P}$) to the complete filtration generated by the system variables up to time $k$.
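The measure-change identity (7), with expectations under the nominal measure rewritten as reference-measure expectations weighted by the density ratio, can be sanity-checked numerically. The sketch below is a scalar toy setting, with two illustrative Gaussians standing in for $P$ and $\bar{P}$ (these are not the measures of the paper):

```python
import numpy as np

def gauss_pdf(x, mean, std):
    """Scalar Gaussian probability density."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(1)
N = 200_000
# Reference measure P-bar: x ~ N(0, 2^2); nominal measure P: x ~ N(1, 1).
x_bar = rng.normal(0.0, 2.0, N)
lam = gauss_pdf(x_bar, 1.0, 1.0) / gauss_pdf(x_bar, 0.0, 2.0)  # Radon-Nikodym weights
est = np.mean(lam * x_bar ** 2)   # E-bar[Lambda * f] with f(x) = x^2
exact = 2.0                       # E_P[x^2] = mean^2 + var = 1 + 1
```

With this many samples, the weighted estimate matches $E_P[x^2] = 2$ to within a few percent, illustrating why expectations under the easy reference measure suffice.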


In the following, and especially in the proofs in the Appendixes, for the sake of simplicity, we are going to use the following definitions for the conditional probability density functions of (18) in the cases where the arguments of the densities are changed frequently with the dummy integration variables:

(19)
(20)
(21)
(22)

III. MAXIMUM LIKELIHOOD ESTIMATION FOR TRANSITION PROBABILITIES

Maximum likelihood estimation is a powerful tool whose properties are well appreciated in the parameter estimation community [32]. The ML estimation procedure outlined in this section is primarily based on those given in [28] and [29] for hidden Markov models (HMMs). Suppose that we have a family $\{P_\theta\}$ of parameterized probability measures on a measurable space, where $\theta$ denotes the parameter vector, all absolutely continuous with respect to a fixed probability measure $\bar{P}$. The likelihood function for obtaining an estimate of $\theta$ based on the information available in $\mathcal{Y}_k$ is defined as

$$L_k(\theta) = \bar{E}\!\left[\left.\frac{dP_\theta}{d\bar{P}}\right|\mathcal{Y}_k\right] \quad (23)$$

where $\bar{E}[\cdot]$ denotes the expectation with respect to the probability measure $\bar{P}$ and $dP_\theta/d\bar{P}$ is the Radon–Nikodym derivative of the probability measure $P_\theta$ with respect to $\bar{P}$. The maximum likelihood estimate is then given as

$$\hat{\theta} = \arg\max_{\theta} L_k(\theta) \quad (24)$$

In the previous section, we have seen that, when the joint densities of the processes exist under two probability measures, the restriction of the Radon–Nikodym derivative of one of the measures with respect to the other is given by the ratio of the densities. In terms of the densities under $P_\theta$ and $\bar{P}$, this means

(25)

where $\mathcal{Y}_k$ denotes the complete filtration generated by $\{y_k\}$. Using the definition (23), we can write

(26)
(27)
(28)
(29)

where the dependence of the densities on their arguments is suppressed for the sake of simplicity. Since the last factor is independent of $\theta$, the maximum likelihood estimate is also given as

(30)

which is the classical definition. The measure theoretical definition (24) using the likelihood (23) is, thus, a generalization of (30), and we are going to use it in the following parts. In many problems, like in ours, the calculation of $L_k(\theta)$ in (23) is impossible or computationally costly. In this case, the well-known expectation-maximization (EM) algorithm comes into the picture to provide an iterative solution. The EM method, given here as a generalization of the classical EM algorithm in the sense that (24) is a generalization of (30), has the following steps.
Step 1) Set the iteration index $j = 0$ and choose an initial estimate $\hat{\theta}^{(0)}$.
Step 2) Initialize the inner EM iteration with the current estimate.
Step 3) (E-Step) Compute

$$Q(\theta, \hat{\theta}^{(j)}) = E_{\hat{\theta}^{(j)}}\!\left[\left.\log\frac{dP_\theta}{dP_{\hat{\theta}^{(j)}}}\right|\mathcal{Y}_k\right] \quad (31)$$

Step 4) (M-Step) Set

$$\hat{\theta}^{(j+1)} = \arg\max_{\theta} Q(\theta, \hat{\theta}^{(j)}) \quad (32)$$

Step 5)
— If the stopping criterion is satisfied, keep the last estimate, advance the time index, and go to Step 2).
— If the stopping criterion is not satisfied, advance the iteration index and go to Step 3).

Since our algorithm is required to be able to work in an online fashion, we are going to use a predetermined finite number of iterations per time step in our algorithm. Here, we illustrate the case of a single EM iteration per time step, which gives the worst performance among these algorithms. The steps of the resulting algorithm turn out to be the following.
Step 1) Set $k = 0$ and choose $\hat{\theta}_0$.
Step 2) (E-Step) Compute

$$Q(\theta, \hat{\theta}_k) = E_{\hat{\theta}_k}\!\left[\left.\log\frac{dP_\theta}{dP_{\hat{\theta}_k}}\right|\mathcal{Y}_{k+1}\right] \quad (33)$$

Step 3) (M-Step) Set

$$\hat{\theta}_{k+1} = \arg\max_{\theta} Q(\theta, \hat{\theta}_k) \quad (34)$$

Step 4) Set $k \leftarrow k + 1$ and go to Step 2).

In our problem, the unknown parameter vector $\theta$ is composed of the TPs $\pi_{ij}$. The TPs must also satisfy the constraints

$$\sum_{j=1}^{n} \pi_{ij} = 1, \qquad \pi_{ij} \ge 0 \quad (35)$$



for $i, j = 1, \dots, n$. Thus, the maximization (34) should be done by considering these constraints as well. We, as is done in [29] for HMMs, carry out the maximization as follows:

(36)

where $\{\lambda_i\}$ is the set of Lagrange multipliers. In order to calculate the required Radon–Nikodym derivative, we again consider the two probability measures. Under these two measures, the Markov chain has different TPMs, and the processes $\{x_k\}$ and $\{y_k\}$ should depend on the Markov chain for both measures in the same way as they do in Section II-A. Assuming that the related densities exist, we need to calculate the ratio of the densities as the restriction of the Radon–Nikodym derivative. We can decompose both densities as

(37)

Since the terms that do not depend on the TPM are canceled out when we divide the densities, the ratio of the densities can then be written as

(38)

where we abbreviated the mode-conditioned quantities for the sake of simplicity. A more compact and explicit way of writing the same fact is

(39)

Hence, we can equivalently write (38) using (39) as

(40)

Now, we can write the E-step function in (33) as follows:

(41)
(42)

where the modified equality sign means "equal up to an additive constant." We neglect the additive constants that do not matter for the maximization problem using this sign. Defining the quantities of interest as

(43)
(44)

we obtain

(45)

Note that the quantity defined in (44) is the number of jumps of the Markov chain from state $e_i$ to state $e_j$ until time $k$. We substitute (45) into the maximization problem (36) to get

(46)

Taking the derivative of the right-hand side of (46) with respect to $\pi_{ij}$, equating it to zero, and then solving for $\pi_{ij}$, we get

(47)

If we sum both sides of (47) for $j = 1, \dots, n$, we obtain the Lagrange multiplier as

(48)

When we substitute (48) into (47), the TP estimates are found to be

(49)

for $i, j = 1, \dots, n$. Since, as we mentioned above, the quantity in (44) is the number of transitions of the Markov chain from state $e_i$ to state $e_j$ until time $k$, its conditional expectation is the expected number of such jumps, as the definition (44) suggests. The quantity in the denominator of (49) is then the expected number of sampling instants that the Markov chain is at the state $e_i$, i.e.,

(50)


Thus, the estimate (49) is the ratio of the expected number of times the Markov chain jumped from the $i$th state to the $j$th state (up to time $k$) to the expected number of times the Markov chain was at the $i$th state (up to time $k$), which is intuitively quite meaningful. An important property satisfied by these quantities can be observed as

We can see that, if $g$ is any test function (i.e., a measurable function with compact support), the following equalities are satisfied:

(57)
(58)

which can be obtained by just multiplying both sides of (54) and (55) by $g$ and then by integrating. Selecting suitable test functions in (57) and (58) and using the property (5), we can write

(59)

(51)

which is going to be used in subsequent parts. In Section IV, we are going to show how the estimates of the number of jumps can be calculated recursively.
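In the special case where the mode sequence is fully observed, the expectations in (49) reduce to plain counts and the estimate becomes exactly the ratio of jump counts to occupancy counts. The following minimal sketch is illustrative only (it is not the filtered estimator derived in Section IV, which works with expected counts); `tpm_from_modes` is a hypothetical helper name:

```python
import numpy as np

def tpm_from_modes(modes, n):
    """ML estimate of the TPM from a fully observed mode sequence:
    counts[i, j] = number of jumps i -> j; each row is normalized by the
    number of sampling instants spent at state i (cf. the ratio in (49))."""
    counts = np.zeros((n, n))
    for prev, curr in zip(modes[:-1], modes[1:]):
        counts[prev, curr] += 1.0
    occupancy = counts.sum(axis=1, keepdims=True)
    # Rows never visited are left as NaN instead of dividing by zero.
    return np.divide(counts, occupancy,
                     out=np.full((n, n), np.nan), where=occupancy > 0)

est = tpm_from_modes([0, 0, 1, 1, 0], 2)  # each row normalizes to [0.5, 0.5] here
```

The hard part addressed by the paper is computing the conditional expectations of these counts when the modes are not observed.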

IV. RECURSIVE ESTIMATION OF NUMBER OF JUMPS

In the previous section, it has been seen that the maximum likelihood estimation of the TPs requires the calculation of the estimated number of jumps defined as

(60)

These, considering (53), show that the densities defined in (54) and (55) form a sufficient statistic for the estimation of the number of jumps and hence the TPs. In fact, we can conclude that a subset of these densities alone is a sufficient statistic by noting that

(52)

In this section, we are going to find recursive update formulas for the estimated number of jumps using the reference probability method outlined in Section II-A. Using (14), we see that

(61)

which is derived using the property (51) after summing both sides of (54) for $j = 1, \dots, n$. The remaining densities will only be used for notational simplicity.
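The unnormalized densities used here play the role of the forward variables of HMM filtering; normalizing them yields modal probabilities (cf. (56)). A sketch in the simpler discrete-observation HMM setting, with hypothetical two-mode parameters, where the Kalman predictive likelihoods of the JMLS case are replaced by a fixed emission table:

```python
import numpy as np

def forward_mode_probs(Pi, emis, obs, p0):
    """Unnormalized forward recursion alpha <- (Pi^T alpha) * b(y),
    returning the normalized mode probabilities at each step."""
    alpha = np.array(p0, dtype=float)
    probs = []
    for y in obs:
        alpha = (Pi.T @ alpha) * emis[:, y]   # prediction, then measurement weighting
        probs.append(alpha / alpha.sum())     # normalization recovers mode probabilities
    return np.array(probs)

Pi = np.array([[0.9, 0.1], [0.2, 0.8]])
emis = np.array([[0.8, 0.2],    # P(y | mode 0), hypothetical
                 [0.3, 0.7]])   # P(y | mode 1), hypothetical
probs = forward_mode_probs(Pi, emis, [0, 0, 1], [0.5, 0.5])
```

In the JMLS case, each alpha entry carries a full (mixture) Gaussian in the base state rather than a scalar, which is the source of the growth discussed below.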

(53)

A. Recursion

For the recursive calculation of the estimated number of jumps, the following density definitions are required:

(54)
(55)

Note that these densities can be easily calculated (approximated) by using multiple model estimation algorithms such as IMM or GPB2 filters because the normalized version of (54) is equal to the modal densities calculated in these filters, i.e.,

(56)

Similar facts are derived for the IMM case in [33] and [34, Section 3.3].

Theorem 1: The densities defined above satisfy the recursion

(62)

where the dependence of the TPs on the current estimates is shown explicitly for $i, j = 1, \dots, n$. Proof: The proof is given in Appendix I for the sake of clarity.

Remark 1: Since the recursion (62) is linear, once the initial densities are (unnormalized) Gaussian densities (which is the case in our model), the densities become unnormalized Gaussian mixtures with an ever increasing number of components as $k$ increases.
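The mixture growth noted in Remark 1 is what the IMM-type approximations of Section IV-C suppress: each mixture is repeatedly collapsed to a single (unnormalized) Gaussian by matching total weight, mean, and covariance. A generic moment-matching sketch (standard mixture collapse with illustrative numbers, not the paper's exact recursion):

```python
import numpy as np

def collapse(weights, means, covs):
    """Moment-match a weighted Gaussian mixture to a single Gaussian:
    total weight, overall mean, and overall covariance (including the
    spread-of-the-means term) are preserved."""
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    p = w / total                                   # normalized mixture weights
    mean = sum(pi * np.asarray(m, float) for pi, m in zip(p, means))
    cov = np.zeros_like(np.asarray(covs[0], float))
    for pi, m, P in zip(p, means, covs):
        d = (np.asarray(m, float) - mean).reshape(-1, 1)
        cov += pi * (np.asarray(P, float) + d @ d.T)  # within + between component spread
    return total, mean, cov

total, mean, cov = collapse([0.5, 0.5],
                            [np.array([-1.0]), np.array([1.0])],
                            [np.eye(1), np.eye(1)])
print(total, mean, cov)   # → 1.0 [0.] [[2.]]
```

Collapsing after every step keeps the number of stored statistics constant, at the cost of the approximation error inherent to IMM-type filters.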


Corollary 1: The densities

satisfy the recursion


,

the initial unnormalized densities are all equal to zero, i.e.,

,

(68)

Remark 3: An important point about the recursion (63) is that its right-hand side has a 0/0 indeterminate form at the initial step; therefore, for that step, the recursion (62) is to be used.

(63)

which is in terms of only the densities introduced above. Proof: The proof is given simply by the substitution of the relationship (61) into the recursion (62). The recursion (63) is the main result of this paper, which yields a maximum likelihood estimator for the TPs associated with JMLSs.

Remark 2: Let us define an intermediate density

(64) Then, (63) will read

C. Approximations

For the TP estimation at time $k$, i.e., for the calculation of the TP estimates, we need the current densities due to the identity (49). Hence, we need to calculate these densities recursively. The recursion (63), in order to calculate these densities, requires the corresponding densities of the previous time step. However, at time $k-1$, these densities are not available. Instead, the densities computed with the previous TP estimates are available, since those densities must have been used for computing the TP estimates at time $k-1$. This problem can be iterated back until the initial time, which makes the reprocessing of all measurements necessary. In order to obtain an online recursive estimation mechanism, we therefore make the approximations

(69)

(65)

Now, consider the general Bayesian density recursion for general Markov systems

(66)

Note the similarity between (65) and (66) except for the constants in the denominators on their right-hand sides. The Bayesian recursion in (66) can be interpreted as follows.
• The integral after a multiplication by the system dynamics in (66) represents the prediction (time) update. When both densities involved are Gaussian, the corresponding update is a Kalman prediction update.
• The multiplication by the conditional measurement likelihood in (66) represents the measurement update. When the multiplied densities are Gaussian, the corresponding update is a Kalman measurement update.
These facts will be used in the following quite often, but they also have the following interpretation in terms of the recursion (65): the density on its left-hand side is, up to a constant multiplier, the time and measurement updated version of the previous density with the model parameters corresponding to the active mode.

B. Initial Densities

Since the initial number of jumps

is given as (67)

for all $i$, $j$. This assumption exists in almost all system identification procedures [32] and basically amounts to assuming that the parameter estimates do not change much between consecutive time instants. Using (63), the resulting recursion for the densities becomes

(70)

for all $i$, $j$. In MMSE multiple model estimation, it is well known that the number of statistics to be kept for calculating the densities grows exponentially [2]. The optimal recursion (62) shows that the situation is worse for the densities considered here due to the fact that the recursions are excited by additional densities. Even if the densities are approximated by single Gaussian densities as in the IMM or GPB2 filters, the number of statistics to be kept still grows exponentially. This extreme growth is observed more clearly in the recursions (63) and (70). This situation, therefore, makes additional approximations necessary. Here, we are going to make IMM-type approximations (see the derivation of the IMM filter in [33] or [34, Section 3.3]) to keep a single Gaussian density for each term in (70). This means that, before the time and measurement updates are done, we approximate the input mixture of (70) by a single Gaussian, which is called mixing in the IMM filter [2], [33]. In other words, assuming that each previous

density is approximated by a single Gaussian, i.e.,

(71)

the Gaussian mixture is approximated by a single Gaussian, i.e.,

(72)

Noting that the density is given as

(79)

we see that the integral in (78) is a Kalman prediction update that uses the model corresponding to the active mode. A more descriptive derivation of this result is given in Appendix III-B. Hence

where the Kalman predicted estimate and covariance are given as

(73)

(80) and covariance

(81) (82) Considering the fact that (83)

(74)

We see that the multiplication in (80) is a Kalman measurement update up to a multiplicative constant, which is the predictive measurement likelihood (see Appendix III-A for this result on the multiplication of Gaussian densities). Hence

(84) (85) where the Kalman measurement updated quantities are given as (86)

(75)

(87)

where the squared-expression notation stands for the product of the expression with its own transpose. The quantities above are obtained by just equating the integral, mean, and covariance of both sides in (72), respectively. With these approximations, the densities are automatically kept in the form

(88) (89) (90)
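The updates in (79)–(90) are the standard Kalman prediction and measurement steps together with the predictive measurement likelihood that scales the corresponding weight. A generic sketch (not tied to the paper's notation; in a JMLS the mode-dependent matrices would simply be selected before each call):

```python
import numpy as np

def kalman_predict(m, P, A, B, Q):
    """Time update: propagate mean and covariance through x' = A x + B w."""
    return A @ m, A @ P @ A.T + B @ Q @ B.T

def kalman_update(m, P, y, C, R):
    """Measurement update; also returns the predictive likelihood N(y; C m, S)
    that multiplies the corresponding mode weight."""
    S = C @ P @ C.T + R                       # innovation covariance
    K = P @ C.T @ np.linalg.inv(S)            # Kalman gain
    innov = y - C @ m
    m_new = m + K @ innov
    P_new = (np.eye(P.shape[0]) - K @ C) @ P
    lik = np.exp(-0.5 * innov.T @ np.linalg.inv(S) @ innov) / \
          np.sqrt(np.linalg.det(2.0 * np.pi * S))
    return m_new, P_new, float(lik)

m, P = np.array([0.0]), np.array([[1.0]])
m, P = kalman_predict(m, P, np.eye(1), np.eye(1), np.eye(1))   # now m=[0.], P=[[2.]]
m, P, lik = kalman_update(m, P, np.array([3.0]), np.eye(1), np.eye(1))
```

One such predict/update pair is run per mixed component, and the returned likelihood is what keeps the coefficient recursions consistent.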

(76)

where the coefficients, means, and covariances are the time and measurement updated versions of the corresponding "mixed" quantities in (72) (with the corresponding model). We now derive this more explicitly. When the approximation (72) is substituted into the recursion (70), we obtain

(77)
(78)

D. Initialization

The update formulas given in the previous section assume that the previous densities at each estimation step are represented by a single unnormalized Gaussian. The initial densities given by (68) can be expressed as unnormalized Gaussians with arbitrary means and covariances and with zero weights. However, this causes a 0/0 indeterminate form on the right-hand side of the recursion (63) at the first step. This can be avoided by using the recursion (62), which requires the densities introduced earlier. The initial densities can be calculated as follows:

(91)


where

for

,


When we combine this with (59), we get

(104)
(92)
(93)
(94)

Here, the quantities are as defined in Section II. The proof of (91) is given in Appendix II for the sake of clarity. The recursion (62) for the first step gives the densities as

for , . Since the densities are approximated by a single weighted Gaussian density, we have , which gives

for

(105)

(95) where we have (96) otherwise Arbitrary Arbitrary

(97)

otherwise

(98)

otherwise

While obtaining (97) and (98), we do not need to use the approximation (69) because there is only a single nonzero component on the right-hand side of (62). Note that the remaining densities are zero for all values due to (67), and hence (68). The gains, estimates, and covariances in (97) and (98) are given as

(99)

Remark 4: It is important to note that, for the calculation of the TP estimates, all that matters about the coefficients is their relative magnitudes (and not their absolute magnitudes). Therefore, at any time step, one can multiply the coefficients by a common constant without affecting the output estimate. In the cases where the coefficients get too small or too big to handle in the computer, one can normalize them accordingly without affecting the performance. Moreover, the common term in the coefficient update equations, which can cause a division by zero in the computer, can be discarded safely since it is a common factor to all of the coefficients.

VI. SUMMARY AND MODIFICATIONS

In this section, we summarize the algorithm outlined in the previous sections.
• Initialization at $k = 0$:
— Set the coefficients (mode weights), estimates

(100)

, and covariances

as

(101) (102) and

otherwise

is the set of initial TP estimates before we get any

measurements. They are generally selected as equal values for all $i$, $j$ if we do not have any prior information about them. Note that some of the estimates and covariances can be given arbitrary values. Their values do not affect the subsequent iterations of the algorithm since their corresponding mode weights are initialized as zero.

Arbitrary

otherwise

Arbitrary for

,

otherwise , where (106) (107)

V. CALCULATION OF TRANSITION PROBABILITY ESTIMATES

(108)

In this section, we give the formulas for the TP estimates assuming that the densities have been calculated. Substituting (53) into (49), we obtain

(103)
(109)

Note that the quantities required for the equations above are given by

(110)
(111)


• Measurement update

(112) (113)

(120)

— Calculate the TP estimates as

(121) (122) (114) (123)

for $i, j = 1, \dots, n$.
• Update for $k \ge 1$:
— Mixing: Calculate the mixed coefficients, estimates, and covariances

— Estimate calculation: The new TP estimates are calculated as

(124)

for

, as for

,

.

A. Modifications (115)

The algorithm summarized above calculates the number of jumps at each step using the previously estimated TPs and then updates the TP estimates with new

(116)

(117)

— Kalman filtering: Using the mixed coefficients, estimates, and covariances for $i, j = 1, \dots, n$ as the initial conditions, run Kalman filters to obtain updated mode weights, estimates, and covariances for $i, j = 1, \dots, n$. The required Kalman filtering equations are given as:
• Prediction update

(118)
(119)

ones. The algorithm therefore works in a closed estimation loop. Since an insufficient number of jumps between the states is observed in the first few sampling instants to estimate the TPs, most probability estimates erratically jump to either zero or unity in the unmodified version of the algorithm. It is therefore desirable not to close the estimation loop for a few samples at the initial part of the estimation. During this period, the algorithm uses the initial TPs and does not update the TP estimates. After a sufficient number of sampling periods, the loop can be safely closed to update the TP estimates. It is observed during the simulations that the ever-decaying characteristics of the coefficients in (115)–(117) reduce the convergence speed significantly. Therefore, it is desirable to limit this decay. For this purpose, the coefficient is replaced by a lower-bounded version, where the bound is called the decay limit factor. This modification makes the algorithm converge faster and track the possible transition rate changes at the expense of making it slightly more prone to noise.

B. Joint Base-State Estimation

The approximate ML algorithm derived above can provide mode-conditioned state estimates and covariances along with an overall approximate MMSE estimate and covariance as a by-product. Therefore, if the base-state estimates and covariances of the JMLS are also required, one need not execute an additional state estimator (like IMM or GPB2) that uses the online calculated TPs. The mode-conditioned state estimates, covariances, and mode probabilities defined as

(125)

ORGUNER AND DEMIREKLER: MAXIMUM LIKELIHOOD ESTIMATION OF TRANSITION PROBABILITIES OF JMLSs

5103

(126) (127) , covarican be calculated using the ML estimator states ances , and mode weights . We illustrate the idea, by substituting (61) into (56) to obtain

(128)

(129)

from which the quantities as

,

, and

can be calculated . Note that this system corresponds to a

(130)

(131)
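The mode-conditioned estimates and mode probabilities above are eventually combined by the standard Gaussian-mixture moment-matching rule. As a hedged, minimal NumPy sketch (the function and array names are assumptions, not the paper's notation):

```python
import numpy as np

def collapse_mixture(weights, means, covs):
    """Moment-match a Gaussian mixture to a single Gaussian.

    weights: (M,) nonnegative values summing to one
    means:   (M, n) component means
    covs:    (M, n, n) component covariances
    Returns the matched mean (n,) and covariance (n, n).
    """
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    P = np.asarray(covs, dtype=float)
    mean = w @ mu                       # weighted mean of the component means
    cov = np.zeros_like(P[0])
    for wi, mi, Pi in zip(w, mu, P):
        d = (mi - mean)[:, None]
        cov += wi * (Pi + d @ d.T)      # component covariance + spread-of-means term
    return mean, cov

# Example: a dominant mode at 0 and a light mode at 5 (scalar case).
mean, cov = collapse_mixture([0.9, 0.1], [[0.0], [5.0]], [np.eye(1), np.eye(1)])
# The matched mean is pulled away from the dominant mode at 0,
# which is the kind of shift that can bias the estimates.
```

The same collapse is also the mixing step of the IMM filter; matching the first two moments is exact for the mean and covariance, but all higher-order structure of the mixture is discarded.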

The overall MMSE state estimates and covariances, defined as

(132)

(133)

can then be calculated using the IMM (or GPB2) final output calculation formulas [2].

VII. SIMULATION RESULTS

This section illustrates the performance of the maximum likelihood estimator on two simulated scenarios. The first scenario is originally taken from [23], where the following system model is considered:

(134)

(135)

where

Fig. 1. Average TP estimation performance of the modified maximum likelihood estimation algorithm.

the noise processes and the initial state are mutually independent. The model sequence is a first-order two-state homogeneous Markov process with the TPM given as

Note that this system corresponds to a system with frequent measurement failures, the modal state corresponding to the case of failure. The modified ML algorithm is run on the simulated measurements of this system with the given initial TPs. A total of 1000 MC runs are made where, in each of the runs, a different realization of the state and measurement processes is used. The length of the open-loop estimation period is selected as 20 samples, with a corresponding choice for the decay limit factor. The average estimation performance of the modified ML algorithm, i.e., the average (mean) of the estimated probabilities over the MC runs, is given in Fig. 1. Note that during the open-loop period, which corresponds to the first 20 sampling instants, the initial TP estimates are not changed. Once the estimation loop is closed, the transition probability estimates jump to values near their corresponding true values and then continue to converge towards the true values.

The second example contains a hypothetical scalar jump Markov linear system that has three models. The parameters of the system are specified for each of the three models; the true TPM of the mode sequence and the initial TPM estimate are selected as

(136)

The modified algorithm works with an open estimation loop for the first 100 sampling instants, and the decay limit factor is selected accordingly. The average estimation performance over 1000 Monte Carlo runs is shown in Fig. 2. In this plot, there


Fig. 2. Average TP estimation performance of the modified maximum likelihood estimation algorithm.

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 56, NO. 10, OCTOBER 2008

Fig. 3. Average TP estimation performances of the modified ML estimation algorithm and the RKL algorithm. The thicker and thinner lines denote the ML and RKL estimation results, respectively.

exists some bias in the estimated probabilities, as in [23]. This can be attributed to the approximations involved in the derivation. In particular, the mixing approximation in the IMM filter, which combines many mixture components into a single Gaussian by weighting (averaging) their means and covariances, can cause the position of the dominant mode in the mixture to shift slightly towards the nondominant modes and thereby bias the estimates. This property can explain the small biases that exist even in the probability corresponding to a relatively high-probability event. Selecting the dominant modes instead of averaging (mixing) could be a solution; however, this can result in poor performance for the low probability values, which are observed less frequently.

A. Comparison With RKL Algorithm

In this section, we compare the estimation performance of the ML method to that of the RKL method presented in [27]. The RKL method runs fewer Kalman filters (i.e., time and measurement updates) per measurement than our ML algorithm, which is therefore computationally more expensive. For the purpose of comparison, we plot the estimation performances of the two methods on the same plot for the first example given above. For the RKL method, the constant step-size sequence is selected as in [27]. The average TP estimation and typical single-run estimation performances of the methods are shown in Figs. 3 and 4, respectively. According to the average performance plot, the initial convergence rate of the ML algorithm is much faster than that of the RKL algorithm. However, in the long term, the estimation performances become similar to each other. This suggests using the ML algorithm at the beginning of the estimation for fast convergence and then switching to the lower-cost RKL algorithm for the rest of the estimation. The single-run comparison is also worth mentioning.
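Both estimators ultimately approximate the TP estimate one would form if the mode sequence were directly observed, namely, the row-normalized count of jumps between modes. A hypothetical sketch of that idealized, fully observed estimator (the names and the toy sequence are illustrative only):

```python
import numpy as np

def count_based_tpm(modes, n_modes):
    """Estimate a TPM from an observed mode sequence by jump counting.

    This is the idealized (fully observed) counterpart of the online
    estimators compared here; the real algorithms only have access to
    expected jump counts inferred from the measurements.
    """
    counts = np.zeros((n_modes, n_modes))
    for i, j in zip(modes[:-1], modes[1:]):
        counts[i, j] += 1.0                  # one observed jump from mode i to mode j
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0.0] = 1.0                  # avoid dividing by zero for unvisited modes
    return counts / rows                     # row-normalize: each row sums to one

modes = [0, 0, 1, 0, 0, 1, 1, 0]
tpm = count_based_tpm(modes, 2)
# With only a few samples the estimates are coarse, which is why the
# online estimation loop is kept open at the beginning.
```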
Fig. 4 shows that the probability estimate trajectories of the ML method are less noise-prone than those of the RKL method. The noise susceptibility of the RKL method can be reduced by decreasing the step size, but this would, in turn, further reduce the convergence speed of the RKL method significantly.

Fig. 4. Single-run TP estimation performances of the modified ML estimation algorithm and the RKL algorithm. The thicker and thinner lines denote the ML and RKL estimation results, respectively.

This low-variance property is further confirmed by the root-mean-square (rms) estimation error curves shown in Figs. 5 and 6, corresponding to the first and second examples, respectively. In many of the subfigures, the ML-based algorithm reduces the estimation variance to as little as half of that obtained by the RKL algorithm. This suggests that the ML algorithm is suitable especially for applications requiring high noise-rejection capabilities. A final issue to point out is that neither estimator can reduce the estimation variance to zero, due to the constant step sizes utilized for a fast convergence rate. If this is a critical issue for the problem at hand, then the modification of the ML algorithm that limits the decay of the coefficients should be used more carefully. A slightly more complex modification in this case is to limit the decay only temporarily or to reduce the decay rate.
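The rms error curves in Figs. 5 and 6 are computed pointwise over the Monte Carlo runs; a hedged sketch of that computation (assuming one estimate trajectory per run, names illustrative):

```python
import math

def rms_error_curve(runs, true_value):
    """Pointwise rms estimation error across Monte Carlo runs.

    runs: list of equal-length trajectories of a TP estimate,
          one trajectory per Monte Carlo run
    true_value: the true transition probability
    Returns a list with the rms error at each sampling instant.
    """
    n_runs = len(runs)
    n_steps = len(runs[0])
    curve = []
    for k in range(n_steps):
        # mean squared error over the runs at instant k
        mse = sum((run[k] - true_value) ** 2 for run in runs) / n_runs
        curve.append(math.sqrt(mse))
    return curve

# Two toy "runs" converging to a true TP of 0.9 from different sides.
runs = [[0.5, 0.8, 0.9], [0.7, 0.9, 0.9]]
curve = rms_error_curve(runs, 0.9)  # decreasing rms error over time
```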


Fig. 5. RMS TP estimation errors of the modified ML estimation algorithm and the RKL algorithm. The thicker and thinner lines denote the ML and RKL estimation results, respectively. Notice that the error curves for the two entries of each row of the TPM coincide, because the entries of each row sum to unity.

Fig. 6. RMS TP estimation errors of the modified ML estimation algorithm and the RKL algorithm. The thicker and thinner lines denote the ML and RKL estimation results, respectively.

VIII. CONCLUSION AND FUTURE WORK

This paper proposes an approximate ML estimator for the TPs associated with JMLSs. The estimator, which requires the mode weights of an IMM filter whose number of components grows as the cube of the number of models to calculate the TPs, can also supply the base-state estimates and covariances as a by-product. The derived algorithm involves two main approximations: 1) an EM-type approximation to the original ML problem; and 2) an IMM approximation to the infeasible optimal EM algorithm. The effects of the EM-type approximation can be alleviated by using more EM iterations per measurement. The IMM-type approximations, on the other hand, can be crucial especially in systems of higher dimension than those considered in the simulation section of this paper. Alternative approximation schemes, such as adaptive pruning techniques instead of averaging (mixing), can be considered. The simulation studies show that the estimator is faster (at least in the initial phases of the estimation) and less noise-prone than the previously proposed RKL algorithm. In a general parameter-estimation scenario, starting the estimation with a computationally costly ML procedure like the one presented above and then switching to lower-cost algorithms like RKL can be a suitable choice for combining the strengths of the different algorithms.

APPENDIX I
PROOF OF THEOREM 1

The proofs in this and the next appendix are based on the following simple result.

Lemma 1: Let any (Borel) measurable function with compact support be given (we call such functions test functions from this point on), and let two (Borel) measurable and square-integrable functions be given. If

(137)

holds for all test functions, then the two functions are equal almost everywhere.

Let any test function be given; then

(138)

(139)

(140)

where (138) makes use of (57). Using the corresponding definition and substituting into (140), we get

(141)

Using the properties of the inner product, we obtain an identity that can be used in the second expectation on the right-hand side of (141). Since the


quantity is deterministic, it can be taken outside the expectation to yield

(142)

(143)

where we used the state-space representation (3) of the Markov chain while passing to (143) and have shown the dependence of the TPM on the parameter explicitly. The second and the fourth expectations on the right-hand side of (143) are zero due to the following facts.
• The inner product operation is linear.
• The relevant process is a martingale increment.
• Under the reference probability measure, the processes involved are mutually independent.

Now, using the identity

(144)

(143) becomes

(145)

Next, we can take the expectation with respect to the corresponding variable in both expectations of (145), because it is independent of the remaining processes under the reference measure. Then, we obtain

(146)

Now notice that, since the conditioning quantity is already given, the quantities in the expectations are functions of only the remaining variables. Therefore, we can use the identities (57) and (58) to calculate the expectations of (146) as

(147)

(148)

(149)

Since this equation is satisfied for all test functions, the recursion (62) holds by Lemma 1.

APPENDIX II
PROOF OF (91)

Let any test function be given; then

(150)


(151)

(152)

where (151) uses (58). Since the processes involved are all mutually independent under the reference measure, we can separate the expectations in (152) as

(153)

(154)

Since the equality holds for all test functions, the initial density is given by Lemma 1 as

(155)

Since the two densities involved are given as

(156)

(157)

we have

(158)

which is the same as (91), where the definitions of the corresponding quantities are given explicitly in (92)–(94).

APPENDIX III
RESULTS ON MULTIPLICATION AND INTEGRAL OF GAUSSIAN DENSITIES

This appendix gives brief results on the multiplication and integral of Gaussian densities, which are used in the derivations of this paper.

A. Multiplication of Two Gaussian Densities

The Kalman filter measurement update implements the Bayesian density equation

(159)

when all the related densities are Gaussian. Rearranging (159), we get the relation

(160)

This gives the following identity for the multiplication of Gaussian densities when we replace all the densities with their Gaussian versions:

(161)

where

(162)

(163)

B. Integral of Two Gaussian Densities

The Kalman filter prediction update implements the Bayesian density equation

(164)

when all the related densities are Gaussian. The multiplication inside the integral represents a Kalman measurement update; hence, using the result of Appendix III-A, the integral can be evaluated. Replacing the corresponding densities in (164) with specific Gaussian densities, we get the corresponding relation, in which the covariance matrices involved are positive definite.

REFERENCES

[1] A. Logothetis and V. Krishnamurthy, “Expectation maximization algorithms for MAP estimation of jump Markov linear systems,” IEEE Trans. Signal Process., vol. 47, no. 8, pp. 2139–2156, Aug. 1999. [2] Y. Bar-Shalom and X. R. Li, Estimation and Tracking: Principles, Techniques, and Software. Norwell, MA: Artech House, 1993. [3] E. Mazor, A. Averbuch, Y. Bar-Shalom, and J. Dayan, “Interacting multiple model methods in target tracking: A survey,” IEEE Trans. Aerosp. Electron Syst., vol. 34, pp. 103–123, Jan. 1998. [4] O. Costa, “Linear minimum mean square error estimation for discretetime Markovian jump linear systems,” IEEE Trans. Autom. Control, vol. 39, pp. 1685–1689, Aug. 1994. [5] L. Johnston and V. Krishnamurthy, “An improvement to the interacting multiple model (IMM) algorithm,” IEEE Trans. Signal Process., vol. 49, no. 12, pp. 2909–2923, Dec. 2001. [6] Q. Zhang, “Optimal filtering of discrete-time hybrid systems,” J. Optim. Theory Applications, vol. 100, no. 1, pp. 123–144, Jan. 1999. [7] T. Ho and B. Chen, “Novel extended Viterbi-based multiple-model algorithms for state estimation of discrete-time systems with Markov jump parameters,” IEEE Trans. Signal Process., vol. 54, no. 2, pp. 393–404, Feb. 2006. [8] R. J. Elliott, F. Dufour, and W. P. Malcolm, “State and mode estimation for discrete-time jump Markov systems,” SIAM J. Contr. Optim., vol. 44, no. 3, pp. 1081–1104, 2005. [9] G. Ackerson and K. Fu, “On state estimation in switching environments,” IEEE Trans. Autom. Control, vol. AC-15, pp. 10–17, Aug. 1970. [10] J. K. Tugnait, “Adaptive estimation and identification for discrete systems with Markov jump parameters,” IEEE Trans. Autom. Control, vol. AC-27, pp. 1054–1065, Oct. 1982. [11] H. A. P. Blom and Y. Bar-Shalom, “The interacting multiple model algorithm for systems with Markov switching coefficients,” IEEE Trans. Autom. Control, vol. 33, pp. 780–783, Aug. 1988. [12] A. Doucet, N. Gordon, and V. 
Krishnamurthy, “Particle filters for state estimation of jump Markov linear systems,” IEEE Trans. Signal Process., vol. 49, no. 3, pp. 613–624, Mar. 2001. [13] A. Doucet, A. Logothetis, and V. Krishnamurthy, “Stochastic sampling algorithms for state estimation of jump Markov linear systems,” IEEE Trans. Autom. Control, vol. 45, no. 2, pp. 188–202, Feb. 2000.


[14] Y. Boers and J. Driessen, “Interacting multiple model particle filter,” Proc. Inst. Electr. Eng. Radar Sonar Navig., vol. 150, no. 5, pp. 344–349, Oct. 2003. [15] H. Driessen and Y. Boers, “Efficient particle filter for jump Markov nonlinear systems,” Proc. Inst. Elect. Eng. Radar Sonar Navig., vol. 152, no. 5, pp. 323–326, Oct. 2005. [16] C. Andrieu, M. Davy, and A. Doucet, “Efficient particle filtering for jump Markov systems. Application to time-varying autoregressions,” IEEE Trans. Signal Process., vol. 51, no. 7, pp. 1762–1770, Jul. 2003. [17] F. Caron, M. Davy, E. Duflos, and P. Vanheeghe, “Particle filtering for multisensor data fusion with switching observation models: Application to land vehicle positioning,” IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2703–2719, Jun. 2007. [18] Y. Sawaragi, T. Katayama, and S. Fujishige, “Adaptive estimation for a linear system with interrupted observation,” IEEE Trans. Autom. Control, vol. AC-18, pp. 152–154, Apr. 1973. [19] J. K. Tugnait and A. H. Haddad, “State estimation under uncertain observations with unknown statistics,” IEEE Trans. Autom. Control, vol. 24, pp. 201–210, Apr. 1979. [20] J. K. Tugnait and A. H. Haddad, “Adaptive estimation in linear systems with unknown Markovian noise statistics,” IEEE Trans. Inf. Theory, vol. IT-26, pp. 66–78, Jan. 1980. [21] Z. Ghahramani and G. E. Hinton, “Variational learning for switching state-space models,” Neural Comput., vol. 12, no. 4, pp. 831–864, Apr. 2000. [22] E. K. Boukas, Stochastic Switching Systems: Analysis and Design. Boston, MA: Birkhäuser, 2006. [23] V. P. Jilkov and X. R. Li, “Online Bayesian estimation of transition probabilities for Markovian jump systems,” IEEE Trans. Signal Process., vol. 52, no. 6, pp. 1620–1630, Jun. 2004. [24] A. Doucet and B. Ristic, “Recursive state estimation for multiple switching models with unknown transition probabilities,” IEEE Trans. Aerosp. Electron. Syst., vol. 38, pp. 1098–1104, Jul. 2002. 
[25] J. Xiong and J. Lam, “Fixed-order robust H∞ filter design for Markovian jump systems with uncertain switching probabilities,” IEEE Trans. Signal Process., vol. 54, no. 4, pp. 1421–1430, Apr. 2006. [26] V. P. Jilkov, X. R. Li, and D. S. Angelova, “Estimation of Markovian jump systems with unknown transition probabilities through Bayesian sampling,” in Proc. Revised Papers 5th Int. Conf. Numer. Methods Applicat. (NMA ’02), London, U.K., 2003, pp. 307–315. [27] U. Orguner and M. Demirekler, “An online sequential algorithm for the estimation of transition probabilities for jump Markov linear systems,” Automatica, vol. 42, no. 10, pp. 1735–1744, Oct. 2006. [28] L. Aggoun and R. J. Elliott, Measure Theory and Filtering: Introduction with Applications. Cambridge, U.K.: Cambridge Univ. Press, 2004.

[29] R. J. Elliott, L. Aggoun, and J. B. Moore, Hidden Markov Models: Estimation and Control. New York: Springer-Verlag, 1994. [30] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control. Englewood Cliffs, NJ: Prentice-Hall, 1986. [31] R. J. Elliott, F. Dufour, and D. D. Sworder, “Exact hybrid filters in discrete time,” IEEE Trans. Autom. Control, vol. 41, pp. 1807–1810, Dec. 1996. [32] L. Ljung, System Identification: Theory for the User. Upper Saddle River, NJ: Prentice-Hall, 1999. [33] U. Orguner and M. Demirekler, “Risk-sensitive filtering for jump Markov linear systems,” Automatica, vol. 44, no. 1, pp. 109–118, Jan. 2008. [34] U. Orguner, “Improved state estimation for jump Markov linear systems,” Ph.D. dissertation, Middle East Technical Univ., Ankara, Turkey, 2006.

Umut Orguner (M’00) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Middle East Technical University, Ankara, Turkey in 1999, 2002, and 2006, respectively. Between 1999 and 2007, he was with the Department of Electrical and Electronics Engineering, Middle East Technical University, as a Teaching and Research Assistant. Since January 2007, he has been a Postdoctoral Associate in the Division of Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden. His research interests include estimation theory, multiple-model estimation, target tracking, and information fusion.


Mübeccel Demirekler (M’78) received the Ph.D. degree in electrical engineering from Middle East Technical University, Ankara, Turkey, in 1979. She is currently a Professor in the Department of Electrical and Electronics Engineering, Middle East Technical University. Her research interests are target tracking systems, on which she has conducted several projects, information fusion, and speech processing.