INT. J. CONTROL, 20 MAY 2004, VOL. 77, NO. 8, 713–722

Trajectory planning for systems with a multiplicative stochastic uncertainty

ULF T. JÖNSSON†*, CLYDE MARTIN‡ and YISHAO ZHOU§

† Optimization and Systems Theory, Royal Institute of Technology, SE-100 44 Stockholm, Sweden.
‡ Department of Mathematics and Statistics, Texas Tech University, Lubbock, Texas, USA.
§ Department of Mathematics, Stockholm University, SE-106 91 Stockholm, Sweden.
* Author for correspondence. e-mail: [email protected]

Received 1 October 2003. Revised and accepted 23 April 2004.

A trajectory planning problem for linear stochastic differential equations is considered in this paper. The aim is to control the system such that the expected value of the output interpolates given points at given times, while the variance and an integral cost of the control effort are minimized. The solution is obtained by using the dynamic programming technique, which allows for several possible generalizations. The results of this paper can be used for control of systems with a multiplicative stochastic disturbance on the state vector and for systems with a stochastic growth rate; such systems occur frequently in investment problems, biomathematics and control theory.

1. Introduction

The theory of control theoretic splines is concerned with controlling the state, or some measured quantity of the state, to given values at discrete times. That is, given a system of the form
$$\dot x = f(x) + u_1 g_1(x) + \cdots + u_k g_k(x), \qquad y = Cx,$$
where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^p$, we look for a system of controls that drive the output through, or close to, a prescribed sequence of set points $\{y(t_i) = \alpha_i : i = 1, \ldots, N\}$. We call the generated output function an interpolating spline if it exactly interpolates these points; otherwise it is called a smoothing spline. The term control theoretic spline refers to the way the spline function is constructed, i.e. as the output of a dynamical system driven by a control function. The dynamics can either be a model of a system to be controlled or an auxiliary model that helps to define the form of the constructed spline function. Much of the literature surveyed below considers the case when the dynamics is linear and the spline function is generated as the

solution of an optimal control problem. In this paper we consider a version of this problem where there is a multiplicative stochastic uncertainty in the dynamics. It is then no longer possible to obtain exact interpolation; instead we let the expected value interpolate the given points, at the same time as the variance and the expected value of an integral quadratic cost along the trajectory are minimized. By minimizing the variance at the interpolation points we minimize the effect of uncertainty due to the diffusion term in the system equation.

Interpolating splines have been important since the 1960s as a tool in numerical analysis. A quick review of the literature shows a tremendous jump in the number of papers related to splines around 1970, which corresponds to the availability of computers for numerical calculations. Smoothing splines were first used in the 1960s, but it was not until Grace Wahba carried out a systematic development that they became important in statistics. Wahba's book (Wahba 1990) is an important reference that is basic to understanding the myriad applications of smoothing splines as a tool in non-parametric statistics. The first appearance of control theoretic interpolating splines was an application to trajectory planning for aircraft in a seminal paper of Crouch and Jackson (1990). Wahba's development of smoothing splines was very much in the spirit of control theoretic smoothing splines, but she was only interested in polynomial splines. To our knowledge, the first work on general control theoretic smoothing splines was Sun et al. (2000). Control theoretic splines have been applied to trajectory planning for aircraft (Crouch and Jackson 1990), to wildlife tracking (Egerstedt and Martin 2003), to trajectory planning for robots (Egerstedt and Martin 2000) and, of course, to many problems in curve estimation and many different problems in statistics. Interpolating splines are useful when the data are known to be exact and the curve is required to pass exactly through a specific set of points. In most applications exactness is not required and it suffices

for the data to be close to the constructed curve. Smoothing splines were designed for exactly this purpose. In statistics the general object is to fit a curve to data in such a manner that the error between the data and the constructed curve has nice statistical properties, for example being normally distributed. For control theoretic splines the object was, and is, slightly different. Here we want a trajectory that is close to the given data points, but we seek a trade-off between exactness and cost. For example, in aircraft applications exactness often comes at the price of large accelerations and hence increased fuel usage. In this paper we take a different approach. We place the uncertainty in the dynamics and require exactness in terms of the expected value. Thus both this approach and the approach of smoothing splines have the effect of filtering the noisy data. For many problems the approach of this paper is the most natural.

While there is a great deal of literature on the problem of constructing splines using deterministic methods, there is little if any literature on the construction of splines using stochastic dynamics. However, there are many problems, ranging from biological to economic to trajectory planning, where the dynamics can naturally be considered stochastic. This is the motivation for writing this paper.

In earlier work by Zhou, Martin and other collaborators, the problem of deterministic splines was attacked by obtaining explicit solutions for the output of the system and then optimizing the cost over the control; see for example the techniques in Martin et al. (2001). In this paper a more powerful technique is used for the optimization: dynamic programming. This is a necessary step for finding the solution, and it allows one to use standard techniques of solving the problem locally and showing the explicit dependence of the solution on the Riccati equation. It has the additional advantage that it directly leads to feedback solutions of the corresponding control problem. We also solve the problem with inequality constraints on the output, and we present a complete example of using the method for solving the trajectory planning problem for a mobile robot. Here the combined effect of surface irregularities, friction and other disturbances is modelled as a multiplicative stochastic disturbance. Further examples and several possible generalizations are discussed in the concluding remarks.

2. Stochastic trajectory planning

We consider a trajectory planning problem where a stochastic disturbance enters the differential equation.

As our basic problem we consider
$$J_0(x_0) = \min_{u\in M(t_0,t_N)}\ \sum_{k=0}^{N-1} w_{k+1}\,\mathrm{Var}\{CX(t_{k+1})\} + E^{t_0,x_0}\Big\{\int_{t_0}^{t_N}|u|^2\,dt\Big\}$$
subject to
$$dX = (AX + Bu)\,dt + G(X)\,dZ,\quad X(t_0) = x_0,$$
$$E^{t_0,x_0}\{CX(t_{k+1})\} = \alpha_{k+1},\quad k = 0,\ldots,N-1,\tag{1}$$
where $w_k \ge 0$ and $(A,B)$ is a controllable pair. The last term in the stochastic differential equation corresponds to a multiplicative noise on the state vector, defined by an $m$-dimensional Brownian motion $Z_t$. In other words, $Z_t$ is a process with zero mean and independent increments, i.e. $E\{dZ_t\,dZ_s^T\} = \delta(t-s)I\,dt$, where $\delta$ denotes the Dirac function. We assume that $G(x)$ is a linear function of the form
$$G(x) = \sum_{i=1}^n x_i G_i$$
where $G_i \in \mathbb{R}^{n\times m}$. In (1), $E^{t_0,x_0}$ is the expectation operator given that $X(t_0) = x_0$, and $\mathrm{Var}\{CX(t_k)\} = E^{t_0,x_0}\{(CX(t_k) - E^{t_0,x_0}\{CX(t_k)\})^2\}$. We consider optimization over all Markov controls, i.e. feedback controls of the form $u(t,\omega) = \mu(t, X(t,\omega))$. We let $M(t_0,t_N)$ denote the set of Markov controls on an interval $(t_0,t_N)$. It can be shown that our optimal solution is also optimal over all $\mathcal{F}_t$-adapted processes, where $\mathcal{F}_t$ denotes the $\sigma$-algebra generated by $Z_s$ for $s\le t$ (see, e.g. Øksendal 1998).
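Sample paths of the controlled diffusion in (1) are straightforward to generate, which is useful both for validating a computed control law and for Monte Carlo estimates of the variance terms in the objective. The following minimal sketch (our illustration, not part of the paper; the function name is ours) simulates $dX = (AX + Bu)\,dt + G(X)\,dZ$ with the linear diffusion $G(x) = \sum_i x_i G_i$ by the Euler–Maruyama method.

```python
import numpy as np

def euler_maruyama(A, B, G_list, u, x0, t0, tN, n_steps, rng):
    """Simulate dX = (A X + B u(t, X)) dt + G(X) dZ by Euler-Maruyama.

    G_list holds the matrices G_i of the linear diffusion
    G(x) = sum_i x_i G_i, so each G_i is n-by-m.
    """
    n, m = len(x0), G_list[0].shape[1]
    dt = (tN - t0) / n_steps
    X = np.array(x0, dtype=float)
    path = [X.copy()]
    t = t0
    for _ in range(n_steps):
        dZ = rng.normal(scale=np.sqrt(dt), size=m)    # Brownian increments
        Gx = sum(X[i] * G_list[i] for i in range(n))  # G(X) = sum_i X_i G_i
        X = X + (A @ X + B @ u(t, X)) * dt + Gx @ dZ
        t += dt
        path.append(X.copy())
    return np.array(path)
```

For instance, a scalar system with a stochastic growth rate, $dX = (aX + u)\,dt + \sigma X\,dZ$, corresponds to `A = np.array([[a]])`, `B = np.array([[1.0]])` and `G_list = [np.array([[sigma]])]`.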

It turns out that there will be linear and constant terms in the value function due to the variance term in the objective function. It is therefore no essential addition in complexity to consider a more general path planning problem, where we allow the dynamics to be time-varying and different from stage to stage. We also consider integral costs that penalize the state vector. This gives the following generalization of (1):
$$J_0(x_0) = \min_{u\in M(t_0,t_N)} E^{t_0,x_0}\Big\{\sum_{k=0}^{N-1}\Big(w_{k+1}|C_{1,k+1}X(t_{k+1}) - \beta_{k+1}|^2 + \int_{t_k}^{t_{k+1}}\Lambda_k(t,X,u)\,dt\Big)\Big\}$$
subject to
$$dX = (A_k(t)X + B_k(t)u)\,dt + G_k(t,X)\,dZ,\quad t\in[t_k,t_{k+1}],\quad X(t_0) = x_0,$$
$$E^{t_0,x_0}\{C_{2,k+1}X(t_{k+1})\} = \alpha_{k+1},\quad k = 0,\ldots,N-1,\tag{2}$$
where
$$\Lambda_k(t,x,u) = x^TQ_k(t)x + 2x^TS_k(t)u + u^TR_k(t)u + 2q_k(t)^Tx + 2r_k(t)^Tu + \varrho_k(t)$$
and
$$G_k(t,x) = \sum_{i=1}^n x_i G_{k,i}(t)$$
and where everything else is defined in a similar way as in (1). Here $A_k, B_k, Q_k, \ldots, \varrho_k$ and $G_{k,i}$ are piecewise continuous functions of time, all pairs $(A_k(t), B_k(t))$ are assumed to be completely controllable, and the cost is strictly convex, which is the case if for all $t\in[t_k,t_{k+1}]$, $k = 0,\ldots,N-1$, we have
$$R_k(t) > 0 \quad\text{and}\quad \begin{bmatrix} Q_k(t) & S_k(t)\\ S_k(t)^T & R_k(t)\end{bmatrix} \ge 0. \tag{3}$$
Note that if $C_1 = C_2 = C$ and $\beta_k = \alpha_k$, then $E^{t_0,x_0}\{|CX(t_k) - \beta_k|^2\} = \mathrm{Var}\{CX(t_k)\}$, since the interpolation constraint enforces $E^{t_0,x_0}\{CX(t_k)\} = \alpha_k$.

Let us define the cost-to-go functions
$$J_k(x) = \min_{u\in M(t_k,t_N)} E^{t_k,x}\Big\{\sum_{i=k}^{N-1} w_{i+1}|C_{1,i+1}X(t_{i+1}) - \beta_{i+1}|^2 + \int_{t_i}^{t_{i+1}}\Lambda_i(t,X,u)\,dt\Big\}$$
subject to
$$dX = (A_i(t)X + B_i(t)u)\,dt + G_i(t,X)\,dZ,\quad t\in[t_i,t_{i+1}],\quad X(t_k) = x,$$
$$E^{t_k,x}\{C_{2,i+1}X(t_{i+1})\} = \alpha_{i+1},\quad i = k,\ldots,N-1, \tag{4}$$
with $J_N(x) = 0$.

Due to the Markov property of the stochastic differential equation we can use dynamic programming to solve (1). We next state two propositions and then the main result that solves (1). The first states the dynamic programming recursion.

Proposition 1: The optimal cost satisfies the following dynamic programming equation:
$$J_k(x) = \min_{u\in M(t_k,t_{k+1})} E^{t_k,x}\Big\{\int_{t_k}^{t_{k+1}}\Lambda_k(t,X,u)\,dt + w_{k+1}|C_{1,k+1}X(t_{k+1}) - \beta_{k+1}|^2 + J_{k+1}(X(t_{k+1}))\Big\}$$
subject to
$$dX = (A_k(t)X + B_k(t)u)\,dt + G_k(t,X)\,dZ,\quad X(t_k) = x,\qquad E^{t_k,x}\{C_{2,k+1}X(t_{k+1})\} = \alpha_{k+1},$$
with $J_N(x) = 0$.

Proof: The proof is based on a standard dynamic programming argument and is given in the appendix for completeness. □

The next proposition states the solution to the stochastic optimal control problem in (5) below. It shows that $J_k(x)$ is a quadratic function, which is instrumental in solving the dynamic programming iteration. Let
$$V(x_0,\alpha,t_0,t_f) = \min_{u\in M(t_0,t_f)} E^{t_0,x_0}\Big\{\int_{t_0}^{t_f}\Lambda(t,X(t),u(t))\,dt + X(t_f)^TQ_0X(t_f) + 2q_0^TX(t_f) + \varrho_0\Big\}$$
subject to
$$dX = (A(t)X + B(t)u)\,dt + G(t,X)\,dZ,\quad X(t_0) = x_0,\qquad E^{t_0,x_0}\{CX(t_f)\} = \alpha,\tag{5}$$
where $\Lambda(t,x,u) = x^TQ(t)x + 2x^TS(t)u + u^TR(t)u + 2q(t)^Tx + 2r(t)^Tu + \varrho(t)$ and where $Q$, $R$ and $S$ satisfy the conditions in (3).

Proposition 2: We have
$$V(x_0,\alpha,t_0,t_f) = x_0^TP(t_0)x_0 + 2p(t_0)^Tx_0 + \pi(t_0) + (N(t_0)^Tx_0 + m(t_0))^TW(t_0)^{-1}(N(t_0)^Tx_0 + m(t_0))$$
where
$$\begin{aligned}
&\dot P + A^TP + PA + Q + \Pi(P) = (PB+S)R^{-1}(PB+S)^T, && P(t_f) = Q_0\\
&\dot N + (A - BR^{-1}(PB+S)^T)^TN = 0, && N(t_f) = C^T\\
&\dot p + (A - BR^{-1}(PB+S)^T)^Tp + q = (PB+S)R^{-1}r, && p(t_f) = q_0\\
&\dot W + N^TBR^{-1}B^TN = 0, && W(t_f) = 0\\
&\dot m = N^TBR^{-1}(B^Tp + r), && m(t_f) = -\alpha\\
&\dot\pi + \varrho = (r + B^Tp)^TR^{-1}(r + B^Tp), && \pi(t_f) = \varrho_0
\end{aligned}\tag{6}$$
and where $\Pi(P)$ is a linear matrix function with elements $\Pi(P)_{k,l} = \frac12\operatorname{tr}(G_k^TPG_l)$. The optimal control is
$$u^* = -R^{-1}(PB+S)^TX - R^{-1}B^TNv - R^{-1}(B^Tp + r)$$
with $v = W(t_0)^{-1}(N(t_0)^Tx_0 + m(t_0))$.

Proof: The proof is done by Lagrangian relaxation of the linear constraint. See the appendix for the complete details. □
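The coupled equations (6) are integrated backward from $t_f$: the Riccati equation for $P$ is nonlinear, while the equations for $N$, $p$, $W$, $m$ (and $\pi$) are linear once $P$ is known. A minimal numerical sketch (ours, not from the paper) using explicit Euler steps is given below; a production implementation would use an adaptive ODE solver, and the constant term $\pi$ is omitted since it does not affect the control. The terminal condition `m = -alpha` follows the statement of Proposition 2.

```python
import numpy as np

def Pi(P, G_list):
    """Linear map with entries Pi(P)[k, l] = 0.5 * tr(G_k^T P G_l)."""
    n = len(G_list)
    return np.array([[0.5 * np.trace(G_list[k].T @ P @ G_list[l])
                      for l in range(n)] for k in range(n)])

def solve_eqs_6(A, B, C, Q, S, R, q, r, Q0, q0, alpha, G_list, t0, tf, n_steps):
    """Integrate equations (6) backward from tf to t0 with Euler steps."""
    dt = (tf - t0) / n_steps
    Rinv = np.linalg.inv(R)
    P, N, W = Q0.copy(), C.T.copy(), np.zeros((C.shape[0], C.shape[0]))
    p, m = q0.copy(), -np.asarray(alpha, dtype=float)
    for _ in range(n_steps):
        K = Rinv @ (P @ B + S).T     # feedback gain R^{-1}(PB + S)^T
        Acl = A - B @ K              # closed-loop dynamics
        # Stepping backward in time: X(t - dt) ~ X(t) - dt * Xdot(t).
        P = P + dt * (A.T @ P + P @ A + Q + Pi(P, G_list) - (P @ B + S) @ K)
        N = N + dt * (Acl.T @ N)
        W = W + dt * (N.T @ B @ Rinv @ B.T @ N)
        p = p + dt * (Acl.T @ p + q - (P @ B + S) @ Rinv @ r)
        m = m - dt * (N.T @ B @ Rinv @ (B.T @ p + r))
    return P, N, W, p, m
```

With the returned quantities, the optimal control of Proposition 2 is $u^* = -R^{-1}(PB+S)^TX - R^{-1}B^TNv - R^{-1}(B^Tp + r)$ with $v = W(t_0)^{-1}(N(t_0)^Tx_0 + m(t_0))$.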

Remark 1: If $Q$, $R$ and $S$ satisfy condition (3), then the linearly perturbed Riccati equation in (6) has a unique absolutely continuous positive semidefinite solution (Wonham 1968). All other differential equations in (6) are linear with bounded piecewise continuous coefficients, which ensures the existence of unique absolutely continuous solutions. Note also that we have $W(t) > 0$ for $t\in[t_0,t_f)$ by the complete controllability of the pair $(A(t),B(t))$.

Remark 2: If the stochastic term in (1) and (2) is removed then we obtain a deterministic trajectory planning problem. The solution to the deterministic problem is obtained by omitting the term $\Pi(P)$ in the Riccati equation above.

In order to obtain a compact notation we introduce
$$\hat Q_0 = \begin{bmatrix} Q_0 & q_0\\ q_0^T & \varrho_0\end{bmatrix},\quad \hat Q = \begin{bmatrix} Q & q\\ q^T & \varrho\end{bmatrix},\quad \hat S = \begin{bmatrix} S\\ r^T\end{bmatrix},\quad \hat R = R,\quad \hat P = \begin{bmatrix} P & p\\ p^T & \pi\end{bmatrix},\quad \hat N = \begin{bmatrix} N\\ m^T\end{bmatrix},\quad \hat W = W,$$
$$\hat A = \begin{bmatrix} A & 0\\ 0 & 0\end{bmatrix},\quad \hat B = \begin{bmatrix} B\\ 0\end{bmatrix},\quad \hat C(\alpha) = [C\ \ -\alpha],\quad \hat G(t,x) = \begin{bmatrix} G(t,x)\\ 0\end{bmatrix},\quad \hat\Pi(\hat P) = \begin{bmatrix}\Pi(P) & 0\\ 0 & 0\end{bmatrix}.\tag{7}$$
If we finally let $\hat X = [X^T\ 1]^T$ and $\hat x_0 = [x_0^T\ 1]^T$ then the optimization problem (5) can be written
$$V(x_0,\alpha,t_0,t_f) = \min_{u\in M(t_0,t_f)} E^{t_0,x_0}\Big\{\int_{t_0}^{t_f}[\hat X^T\hat Q\hat X + 2\hat X^T\hat Su + u^T\hat Ru]\,dt + \hat X(t_f)^T\hat Q_0\hat X(t_f)\Big\}$$
subject to
$$d\hat X = (\hat A(t)\hat X + \hat B(t)u)\,dt + \hat G(t,X)\,dZ,\quad \hat X(t_0) = \hat x_0,\qquad E^{t_0,x_0}\{\hat C(\alpha)\hat X(t_f)\} = 0,\tag{8}$$
and the optimal solution can be written
$$V(x_0,\alpha,t_0,t_f) = \hat x_0^T\big[\hat P(t_0) + \hat N(t_0)\hat W(t_0)^{-1}\hat N(t_0)^T\big]\hat x_0$$
where
$$\begin{aligned}
&\dot{\hat P} + \hat A^T\hat P + \hat P\hat A + \hat Q + \hat\Pi(\hat P) = (\hat P\hat B + \hat S)\hat R^{-1}(\hat P\hat B + \hat S)^T, && \hat P(t_f) = \hat Q_0\\
&\dot{\hat N} + (\hat A - \hat B\hat R^{-1}(\hat P\hat B + \hat S)^T)^T\hat N = 0, && \hat N(t_f) = \hat C(\alpha)^T\\
&\dot{\hat W} + \hat N^T\hat B\hat R^{-1}\hat B^T\hat N = 0, && \hat W(t_f) = 0.
\end{aligned}$$
The optimal control is
$$u^* = -\hat R^{-1}(\hat P\hat B + \hat S)^T\hat X - \hat R^{-1}\hat B^T\hat N\hat W(t_0)^{-1}\hat N(t_0)^T\hat x_0.$$
There is an equivalent feedback form given as
$$u^* = -\hat R^{-1}\big((\hat P + \hat N\hat W^{-1}\hat N^T)\hat B + \hat S\big)^T\hat X.$$

We can now state the solution of the general stochastic trajectory planning problem in (2).

Proposition 3: Consider the optimal control problem in (2), where condition (3) holds. The optimal Markov control in each time interval $[t_k,t_{k+1}]$ is (all variables are defined analogously with (7))
$$u^*(t) = -\hat R_k(t)^{-1}(\hat P_k(t)\hat B_k(t) + \hat S_k(t))^T\hat X(t) - \hat R_k(t)^{-1}\hat B_k(t)^T\hat N_k(t)\hat W_k(t_k)^{-1}\hat N_k(t_k)^T\hat X(t_k)$$
where for $k = N-1,\ldots,0$
$$\begin{aligned}
&\dot{\hat P}_k + \hat A_k^T\hat P_k + \hat P_k\hat A_k + \hat Q_k + \hat\Pi_k(\hat P_k) = (\hat P_k\hat B_k + \hat S_k)\hat R_k^{-1}(\hat P_k\hat B_k + \hat S_k)^T\\
&\dot{\hat N}_k + (\hat A_k - \hat B_k\hat R_k^{-1}(\hat P_k\hat B_k + \hat S_k)^T)^T\hat N_k = 0, && \hat N_k(t_{k+1}) = \hat C_{2,k+1}(\alpha_{k+1})^T\\
&\dot{\hat W}_k + \hat N_k^T\hat B_k\hat R_k^{-1}\hat B_k^T\hat N_k = 0, && \hat W_k(t_{k+1}) = 0
\end{aligned}$$
and where $\hat P_{N-1}(t_N) = w_N\hat C_{1,N}(\beta_N)^T\hat C_{1,N}(\beta_N)$ and for $k = N-2, N-3,\ldots,0$
$$\hat P_k(t_{k+1}) = \hat P_{k+1}(t_{k+1}) + \hat N_{k+1}(t_{k+1})\hat W_{k+1}(t_{k+1})^{-1}\hat N_{k+1}(t_{k+1})^T + w_{k+1}\hat C_{1,k+1}(\beta_{k+1})^T\hat C_{1,k+1}(\beta_{k+1}).$$
The cost-to-go is
$$J_k(x) = \hat x^T\big[\hat P_k(t_k) + \hat N_k(t_k)\hat W_k(t_k)^{-1}\hat N_k(t_k)^T\big]\hat x.$$

Proof: By dynamic programming. See details in the appendix. □

The formulation of Proposition 3 in the compact notation (7) gives appealing formulas, but in a numerical implementation it is more efficient to perform the computations using a system of the form (6). For the basic trajectory planning problem in (1) this reduces to the following result.

Corollary 1: Consider the optimal control problem in (1), where the pair $(A,B)$ is controllable. The optimal Markov control in each time interval $[t_k,t_{k+1}]$ is
$$u^*(t) = -R^{-1}B^TP_k(t)X(t) - R^{-1}B^TN_k(t)v_k - R^{-1}B^Tp_k(t)$$

with $v_k = W_k(t_k)^{-1}(N_k(t_k)^TX(t_k) + m_k(t_k))$, where
$$\begin{aligned}
&\dot P_k + A^TP_k + P_kA + \Pi(P_k) = P_kBR^{-1}B^TP_k\\
&\dot N_k + (A - BR^{-1}B^TP_k)^TN_k = 0\\
&\dot p_k + (A - BR^{-1}B^TP_k)^Tp_k = 0\\
&\dot W_k + N_k^TBR^{-1}B^TN_k = 0\\
&\dot m_k = N_k^TBR^{-1}B^Tp_k\\
&\dot\pi_k = p_k^TBR^{-1}B^Tp_k
\end{aligned}\tag{9}$$
where $\Pi(P_k)$ is a linear matrix function with elements $\Pi(P)_{k,l} = \frac12\operatorname{tr}(G_k^TPG_l)$, and the boundary conditions are
$$\begin{aligned}
P_k(t_{k+1}) &= P_{k+1}(t_{k+1}) + N_{k+1}(t_{k+1})W_{k+1}(t_{k+1})^{-1}N_{k+1}(t_{k+1})^T + w_{k+1}C^TC\\
p_k(t_{k+1}) &= p_{k+1}(t_{k+1}) + N_{k+1}(t_{k+1})W_{k+1}(t_{k+1})^{-1}m_{k+1}(t_{k+1}) - w_{k+1}C^T\alpha_{k+1}\\
\pi_k(t_{k+1}) &= \pi_{k+1}(t_{k+1}) + m_{k+1}(t_{k+1})^TW_{k+1}(t_{k+1})^{-1}m_{k+1}(t_{k+1}) + w_{k+1}\alpha_{k+1}^T\alpha_{k+1}
\end{aligned}$$
and $N_k(t_{k+1}) = C^T$, $m_k(t_{k+1}) = -\alpha_{k+1}$ and $W_k(t_{k+1}) = 0$ (for $k = N-1$ these conditions are interpreted with $P_N = 0$, $p_N = 0$, $\pi_N = 0$ and without the $W_{k+1}^{-1}$ terms, since $J_N(x) = 0$). The optimal cost-to-go is
$$J_k(x) = x^TP_k(t_k)x + 2p_k(t_k)^Tx + \pi_k(t_k) + (N_k(t_k)^Tx + m_k(t_k))^TW_k(t_k)^{-1}(N_k(t_k)^Tx + m_k(t_k)).$$

3. Path planning for a mobile robot

We consider the problem of steering a robot from rest at the initial position $(-5, 1)$ to rest at the final position $(-1, 5)$ in such a way that it stays inside the corridor in the upper left part of figure 1. The dynamic model of a mobile robot with the centre of the wheel axis as a reference point is non-linear and nonholonomic. However, by moving the reference point to an off-axis point it is possible to feedback linearize the dynamics (see e.g. Laumond 1998). We use a feedback linearization of a unicycle model derived in Lawton et al. (2003):
$$\ddot y = u + e.$$
Here we have added a noise signal $e$ that takes into account friction, irregularities in the floor, and other error sources. If we let the components of the noise be modelled as $de_i = \dot y_i\,dW_i$, where $W$ is a two-dimensional Brownian motion, then the robot dynamics can be modelled by the stochastic system
$$dX = (AX + Bu)\,dt + G(X)\,dW,\qquad Y = CX$$
where, with the state $X = (y_1, \dot y_1, y_2, \dot y_2)$,
$$A = \begin{bmatrix} 0&1&0&0\\ 0&0&0&0\\ 0&0&0&1\\ 0&0&0&0\end{bmatrix},\qquad B = \begin{bmatrix} 0&0\\ 1&0\\ 0&0\\ 0&1\end{bmatrix},\qquad G(x) = \begin{bmatrix} 0&0\\ x_2&0\\ 0&0\\ 0&x_4\end{bmatrix},\qquad C = \begin{bmatrix} 1&0&0&0\\ 0&0&1&0\end{bmatrix}.$$

Let us use the design equation
$$\min_{u} E^{0,x_0}\Big\{w_1|C_1X(3) - \alpha_1|^2 + w_2|C_3X(6) - \alpha_3|^2 + \int_0^6|u|^2\,dt\Big\}$$
subject to
$$dX = (AX + Bu)\,dt + G(X)\,dW,\quad X(0) = x_0,\qquad E^{0,x_0}\{C_2X(3)\} = \alpha_2,\quad E^{0,x_0}\{C_3X(6)\} = \alpha_3,\tag{10}$$
where
$$C_1 = \begin{bmatrix} 1&0&0&0\\ 0&0&1&0\end{bmatrix},\quad \alpha_1 = \begin{bmatrix} -1\\ 1\end{bmatrix},\qquad C_2 = \begin{bmatrix} 1&0&1&0\end{bmatrix},\quad \alpha_2 = 0,\qquad C_3 = I_4,\quad \alpha_3 = \begin{bmatrix} -1\\ 0\\ 5\\ 0\end{bmatrix}.$$

The idea behind the optimization problem is to divide the control problem into two stages. First, we steer the robot to the switching surface $C_2x = \alpha_2$ in such a way that the expected deviation from the point $C_1x = \alpha_1$ is small. Then in the next stage we steer to the final position $x = \alpha_3$ in such a way that the variance of the deviation from this point is small. The integral cost is introduced to keep the expected control energy small. With the weights $w_1 = 7$ and $w_2 = 1$ we get the result in figure 1. We see from the lower right plot that the expected path of the robot stays well inside the corridor, as desired. It is possible to push the trajectory further toward the middle of the corridor by adding an integral cost $E^{0,x_0}\{\int_0^6 q|Cx - y_0(t)|^2\,dt\}$, where $q \ge 0$ and $y_0(t)$ is some nominal trajectory near the middle of the corridor. The corresponding optimization problem still belongs to the problem class considered in this paper.
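The two-stage structure just described is an instance of the backward stage recursion of Corollary 1. A sketch of that recursion is given below (ours, not from the paper); it treats the basic problem (1), where the same $C$ is used at every stage, while the robot problem (10) uses stage-dependent matrices but follows the same pattern. Here `integrate_back` is a hypothetical callable that integrates the differential equations (9) backward over one interval, for instance along the lines of the earlier ODE sketch, and the terminal stage is initialized according to $J_N(x) = 0$.

```python
import numpy as np

def plan_stages(times, alphas, weights, C, n, integrate_back):
    """Backward stage recursion of Corollary 1.

    times   -- [t0, t1, ..., tN]
    alphas  -- interpolation values alpha_1, ..., alpha_N
    weights -- variance weights w_1, ..., w_N
    integrate_back(P, N, W, p, m, t_start, t_end) is assumed to integrate
    equations (9) backward from t_end to t_start and return the updated
    tuple (P, N, W, p, m); it stands in for an ODE solver.
    """
    n_stages = len(times) - 1
    P, p = np.zeros((n, n)), np.zeros(n)   # terminal stage: J_N = 0
    stages = []
    for k in range(n_stages - 1, -1, -1):
        a, w = np.asarray(alphas[k], dtype=float), weights[k]
        P = P + w * C.T @ C                # jump condition at t_{k+1}
        p = p - w * C.T @ a
        Nk, Wk, mk = C.T.copy(), np.zeros((C.shape[0], C.shape[0])), -a
        P, Nk, Wk, p, mk = integrate_back(P, Nk, Wk, p, mk,
                                          times[k], times[k + 1])
        stages.append((P.copy(), Nk, Wk, p.copy(), mk))
        # Fold the interpolation-constraint block into P and p before
        # the next jump, as in the boundary conditions of Corollary 1.
        Winv = np.linalg.inv(Wk)
        P = P + Nk @ Winv @ Nk.T
        p = p + Nk @ Winv @ mk
    stages.reverse()
    return stages
```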

[Figure 1]

Figure 1. The upper left figure shows the initial and final positions for the robot. The upper right figure shows one realization of the optimal path of the robot and the lower left figure shows the corresponding control signals, where $u_1$ corresponds to the solid line and $u_2$ to the dashed line. The weights $w_1 = 7$ and $w_2 = 1$ were used in the optimization problem (10). Finally, the lower right figure shows an estimate of the expected path obtained by averaging over 100 stochastic simulations.
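The estimate of the expected path in figure 1 was obtained by averaging stochastic simulations. The sketch below (ours) assembles the model matrices of Section 3 and averages 100 Euler–Maruyama realizations; the PD control used here is only a placeholder to exercise the model, not the optimal law of Corollary 1.

```python
import numpy as np

# Feedback-linearized unicycle with state X = (y1, y1_dot, y2, y2_dot).
A = np.array([[0., 1., 0., 0.],
              [0., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 0., 0.]])
B = np.array([[0., 0.],
              [1., 0.],
              [0., 0.],
              [0., 1.]])
C = np.array([[1., 0., 0., 0.],
              [0., 0., 1., 0.]])
# G(x) dW puts velocity-proportional noise on the two accelerations.
G2 = np.zeros((4, 2)); G2[1, 0] = 1.0
G4 = np.zeros((4, 2)); G4[3, 1] = 1.0

def simulate(u, x0, T=6.0, n_steps=600, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    dt = T / n_steps
    X = np.array(x0, dtype=float)
    path = [C @ X]
    for k in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt), size=2)
        Gx = X[1] * G2 + X[3] * G4
        X = X + (A @ X + B @ u(k * dt, X)) * dt + Gx @ dW
        path.append(C @ X)
    return np.array(path)

# Placeholder PD law toward the final rest state (-1, 0, 5, 0).
goal = np.array([-1., 0., 5., 0.])
u_pd = lambda t, X: -(C @ (X - goal)) - 2.0 * np.array([X[1], X[3]])

# Average realizations to estimate the expected output path.
paths = [simulate(u_pd, [-5., 0., 1., 0.], rng=np.random.default_rng(s))
         for s in range(100)]
mean_path = np.mean(paths, axis=0)
```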

4. Interpolation with inequality constraints

In this section we consider the case with interpolation constraints of the form
$$\underline\alpha_k \le E^{t_0,x_0}\{C_{2,k}X(t_k)\} \le \bar\alpha_k.$$
This means that we consider
$$J_0(x_0) = \min_{u\in M(t_0,t_N)} E^{t_0,x_0}\Big\{\sum_{k=0}^{N-1} w_{k+1}|C_{1,k+1}X(t_{k+1}) - \beta_{k+1}|^2 + \int_{t_k}^{t_{k+1}}\Lambda_k(t,X,u)\,dt\Big\}$$
subject to
$$dX = (A_k(t)X + B_k(t)u)\,dt + G_k(t,X)\,dZ,\quad t\in[t_k,t_{k+1}],\quad X(t_0) = x_0,$$
$$\underline\alpha_{k+1} \le E^{t_0,x_0}\{C_{2,k+1}X(t_{k+1})\} \le \bar\alpha_{k+1},\quad k = 0,\ldots,N-1.\tag{11}$$

The assumptions on the system matrices are the same as in the previous section. In the next result we use the compact notation of the previous section, with the exception that
$$\hat C_{2,k}(\alpha_k) = \begin{bmatrix} C_{2,k} & -\bar\alpha_k\\ -C_{2,k} & \underline\alpha_k\end{bmatrix}.$$
We also use the notation $\lambda_k = [\bar\lambda_k^T\ \underline\lambda_k^T]^T$ for the Lagrange multipliers corresponding to the upper and lower interpolation inequalities, and let $\Lambda_N = \emptyset$ and $\Lambda_k = [\lambda_N^T \ldots \lambda_{k+1}^T]^T$.

Proposition 4: For $k = N-1, N-2, \ldots, 0$ let
$$\begin{aligned}
&\dot{\hat P}_k + \hat A_k^T\hat P_k + \hat P_k\hat A_k + \hat Q_k + \hat\Pi_k(\hat P_k) = (\hat P_k\hat B_k + \hat S_k)\hat R_k^{-1}(\hat P_k\hat B_k + \hat S_k)^T\\
&\dot{\hat N}_k + (\hat A_k - \hat B_k\hat R_k^{-1}(\hat P_k\hat B_k + \hat S_k)^T)^T\hat N_k = 0, && \hat N_k(t_{k+1},\alpha_{k+1}) = \hat C_{2,k+1}(\alpha_{k+1})^T\\
&\dot{\hat W}_k + \hat N_k^T\hat B_k\hat R_k^{-1}\hat B_k^T\hat N_k = 0, && \hat W_k(t_{k+1}) = 0
\end{aligned}$$
where $\hat P_{N-1}(t_N,\Lambda_N) = w_N\hat C_{1,N}(\beta_N)^T\hat C_{1,N}(\beta_N)$ and
$$\hat P_k(t_{k+1},\Lambda_{k+1}) = \bar P_{k+1}(t_{k+1},\Lambda_{k+1}) + w_{k+1}\hat C_{1,k+1}(\beta_{k+1})^T\hat C_{1,k+1}(\beta_{k+1})$$
$$\bar P_k(t_k,\Lambda_k) = \hat P_k(t_k,\Lambda_{k+1}) + e_{n+1}\lambda_{k+1}^T\hat N_k(t_k,\Lambda_{k+1})^T + \hat N_k(t_k,\Lambda_{k+1})\lambda_{k+1}e_{n+1}^T - \lambda_{k+1}^T\hat W_k(t_k)\lambda_{k+1}\,e_{n+1}e_{n+1}^T$$
and where $e_{n+1} = [0\ \cdots\ 0\ 1]^T \in \mathbb{R}^{n+1}$. Note that $\hat P_k$ and $\hat N_k$ depend quadratically, respectively linearly, on $\lambda_{k+1}$, while $\hat W_k$ is independent of $\lambda$. The optimal cost function is obtained as the solution to the following quadratic optimization problem:
$$J_0(x_0) = \max_{\Lambda_0\ge 0}\ \hat x_0^T\bar P_0(t_0,\Lambda_0)\hat x_0.\tag{12}$$
If $\lambda^* = [(\lambda_N^*)^T \ldots (\lambda_1^*)^T]^T$ is the maximizing argument in (12) then the optimal control in each time interval $[t_k,t_{k+1}]$ is given as
$$u^*(t) = -\hat R_k(t)^{-1}\big(\bar P_k(t,\lambda^*)\hat B_k(t) + \hat S_k(t)\big)^T\hat X(t)$$
where
$$\bar P_k(t,\lambda^*) = \hat P_k(t,\lambda^*) + e_{n+1}(\lambda_{k+1}^*)^T\hat N_k(t,\lambda^*)^T + \hat N_k(t,\lambda^*)\lambda_{k+1}^*e_{n+1}^T - (\lambda_{k+1}^*)^T\hat W_k(t)\lambda_{k+1}^*\,e_{n+1}e_{n+1}^T.$$

Proof: See the appendix. □
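Since $\hat P_k$ is quadratic and $\hat N_k$ linear in the multipliers, (12) is a concave quadratic maximization over the nonnegative orthant, so any bound-constrained solver applies. A generic sketch follows (ours; `p_bar_0` is a hypothetical callable that evaluates $\bar P_0(t_0,\Lambda_0)$ by running the backward recursion of Proposition 4 for a given multiplier vector).

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual(p_bar_0, x0_hat, n_lambda):
    """Maximize the concave dual function (12) over lambda >= 0."""
    # Maximizing J(lam) is done by minimizing -J(lam) under bound
    # constraints; gradients are approximated by finite differences.
    obj = lambda lam: -(x0_hat @ p_bar_0(lam) @ x0_hat)
    res = minimize(obj, x0=np.zeros(n_lambda), method="L-BFGS-B",
                   bounds=[(0.0, None)] * n_lambda)
    return res.x, -res.fun   # optimal multipliers and optimal cost
```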

5. Concluding remarks

In this paper we considered and solved the trajectory planning problem in (1) and some generalizations of it. Such problems occur in a variety of settings and there are many important application areas in which the techniques of this paper are relevant. We have already mentioned applications to trajectory planning for robots and aircraft. Other possible applications are in population studies and investment problems. We considered a simple example of a model for housing investments with variable interest rates in Jönsson et al. (2002).

There are many possible generalizations of this work. An important area is to consider problems of mixed type. In the simplest such examples we can assume that some data must be interpolated and some must be smoothed. These problems can be attacked using the methods of this paper. The potential applications of the techniques presented in this paper to problems in economics and finance are enormous. In this area it is interesting to consider generalizations to models where the stochastic uncertainty multiplies the control signal (Lim and Zhou 1999). Another extension is to consider the construction of splines when the dynamics are governed by two-point boundary value problems (see, e.g., Adams et al. (1984 a, b) and Krener (1978) for techniques and examples). An interesting problem that remains open is to solve the problem when there is a constraint such as
$$E\{\dot y(t)\} \ge 0 \quad\text{for } 0 \le t \le T,$$
where $T$ is the final time. This particular problem was solved by Zhou et al. (2001) in the deterministic case. In both the deterministic and the stochastic cases the problem remains open for constraints such as $f(t) \le E\{y(t)\} \le g(t)$ for $a < t < b$. The current paper represents an important first step in extending control theoretic splines to the stochastic setting where these problems arise. Dynamic programming is a powerful tool and fits the needs of stochastic splines very well, as demonstrated in this paper.

Acknowledgement

The research was supported by grants from the Swedish Research Council, by the EC within the RECSYS project, and by the NSF.

Appendix

Proof of Proposition 1: Let $J_k^*(x)$ denote the optimal cost-to-go function in (4). We will show that it satisfies the stated dynamic programming iteration. Obviously, we have $J_N(x) = J_N^*(x) = 0$. Assume by induction that $J_{k+1}(x) = J_{k+1}^*(x)$. For compactness of notation we introduce new notation for the cost function and constraints. First, let $\bar\alpha_n = (\alpha_n,\ldots,\alpha_{N-1},\alpha_N)$ and $\bar\beta_n = (\beta_n,\ldots,\beta_{N-1},\beta_N)$. Then let†
$$C_n(u,\beta_{n+1}) = \int_{t_n}^{t_{n+1}}\Lambda_n(t,X,u)\,dt + w_{n+1}|C_{1,n+1}X(t_{n+1}) - \beta_{n+1}|^2$$
$$\bar C_n(u,\bar\beta_{n+1}) = \sum_{k=n}^{N-1}\int_{t_k}^{t_{k+1}}\Lambda_k(t,X,u)\,dt + w_{k+1}|C_{1,k+1}X(t_{k+1}) - \beta_{k+1}|^2$$
$$\bar U_n(x,\bar\alpha_{n+1}) = \Big\{u\in M(t_n,t_N):\ dX = (A_kX + B_ku)\,dt + G_k(X)\,dZ,\ t\in[t_k,t_{k+1}],\ X(t_n) = x,\ E^{t_n,x}\{C_{2,k+1}X(t_{k+1})\} = \alpha_{k+1},\ k = n,\ldots,N-1\Big\}$$
and finally let $U_n(x,\alpha_{n+1})$ be defined similarly but only on the time interval $[t_n,t_{n+1}]$.

† In the definition of $C_n(u,\beta_{n+1})$ and $\bar C_n(u,\bar\beta_{n+1})$ we implicitly assume that $X(t)$ satisfies the stochastic differential equations in the definition of $U_n(x,\alpha_{n+1})$ and $\bar U_n(x,\bar\alpha_{n+1})$, respectively.

We have
$$\begin{aligned}
J_n^*(x) &= \min_{u\in\bar U_n(x,\bar\alpha_{n+1})} E^{t_n,x}\,\bar C_n(u,\bar\beta_{n+1})\\
&= \min_{u_1\in U_n(x,\alpha_{n+1})}\ \min_{u_2\in\bar U_{n+1}(X(t_{n+1}),\bar\alpha_{n+2})} E^{t_n,x}\big\{C_n(u_1,\beta_{n+1}) + \bar C_{n+1}(u_2,\bar\beta_{n+2})\big\}\\
&= \min_{u_1\in U_n(x,\alpha_{n+1})} E^{t_n,x}\Big\{C_n(u_1,\beta_{n+1}) + \min_{u_2\in\bar U_{n+1}(X(t_{n+1}),\bar\alpha_{n+2})} E^{t_{n+1},X(t_{n+1})}\big\{\bar C_{n+1}(u_2,\bar\beta_{n+2})\big\}\Big\}\\
&= \min_{u\in U_n(x,\alpha_{n+1})} E^{t_n,x}\big\{C_n(u,\beta_{n+1}) + J_{n+1}(X(t_{n+1}))\big\}\\
&= J_n(x)
\end{aligned}$$
where the Markov property was used in the third equality and the induction hypothesis (together with $J_{n+1}^* = J_{n+1}$) in the fourth. □

Proof of Proposition 2: We use the compact notation introduced in (7) and (8). If we apply the Lagrange multiplier rule to (8), with multiplier $\lambda$ for the terminal constraint, then we obtain
$$V(x_0,\alpha,t_0,t_f) = \max_{\lambda}\ \min_{u\in M(t_0,t_f)} E^{t_0,x_0}\Big\{\int_{t_0}^{t_f}[\hat X^T\hat Q\hat X + 2\hat X^T\hat Su + u^T\hat Ru]\,dt + \hat X(t_f)^T\hat Q_0\hat X(t_f) + 2\lambda^T\hat C(\alpha)\hat X(t_f)\Big\}$$
subject to $d\hat X = (\hat A(t)\hat X + \hat B(t)u)\,dt + \hat G(t,X)\,dZ$, $\hat X(t_0) = \hat x_0$.

The solution to the inner optimization can be obtained from the Hamilton–Jacobi–Bellman equation (HJBE)
$$-\frac{\partial V}{\partial t} = \min_{u\in\mathbb{R}^m}\Big\{\hat x^T\hat Q\hat x + 2\hat x^T\hat Su + u^T\hat Ru + \frac{\partial V}{\partial\hat x}(\hat A\hat x + \hat Bu) + \frac12\sum_{i=1}^n\sum_{j=1}^n a_{ij}\frac{\partial^2 V}{\partial x_i\partial x_j}\Big\}$$
$$V(x,t_f) = \hat x^T\hat Q_0\hat x + 2\lambda^T\hat C(\alpha)\hat x$$
where
$$a_{ij} = \Big(\Big(\sum_k x_kG_k\Big)\Big(\sum_l x_lG_l^T\Big)\Big)_{ij} = \sum_{k,l} x_kx_l\,G_{ki}G_{lj}^T$$
where $G_{ki}$ is the $i$th row of $G_k$. With the value function $V(\hat x,t) = \hat x^T\bar P(t)\hat x$ we get
$$\frac12\sum_{i=1}^n\sum_{j=1}^n a_{ij}\frac{\partial^2 V}{\partial x_i\partial x_j} = \frac12\sum_{k=1}^n\sum_{l=1}^n x_kx_l\operatorname{tr}(G_k^T\bar PG_l) = \hat x^T\hat\Pi(\bar P)\hat x.$$
If we plug $V(\hat x,t) = \hat x^T\bar P(t)\hat x$ into the HJBE we get the optimal control $u = -\hat R^{-1}(\bar P\hat B + \hat S)^T\hat x$, and that the following Riccati equation must hold:
$$\dot{\bar P} + \hat A^T\bar P + \bar P\hat A + \hat Q + \hat\Pi(\bar P) = (\bar P\hat B + \hat S)\hat R^{-1}(\bar P\hat B + \hat S)^T$$
with boundary condition $\bar P(t_f) = \hat Q_0 + e_{n+1}\lambda^T\hat C(\alpha) + \hat C(\alpha)^T\lambda e_{n+1}^T$, where $e_{n+1} = [0\ \cdots\ 0\ 1]^T\in\mathbb{R}^{n+1}$. The optimal cost becomes
$$V(x_0,\alpha,t_0,t_f) = \max_\lambda\ \hat x_0^T\bar P(\lambda,t_0)\hat x_0.\tag{13}$$
To perform the optimization with respect to $\lambda$ we need an explicit expression for $\bar P$ as a function of $\lambda$. It turns out that we can use
$$\bar P = \hat P + \hat N\lambda e_{n+1}^T + e_{n+1}\lambda^T\hat N^T - \lambda^T\hat W\lambda\,e_{n+1}e_{n+1}^T$$
where $\hat P$, $\hat N$ and $\hat W$ are given in (7). This follows since
$$\bar P(t_f) = \hat P(t_f) + \hat N(t_f)\lambda e_{n+1}^T + e_{n+1}\lambda^T\hat N(t_f)^T - \lambda^T\hat W(t_f)\lambda\,e_{n+1}e_{n+1}^T = \hat Q_0 + e_{n+1}\lambda^T\hat C(\alpha) + \hat C(\alpha)^T\lambda e_{n+1}^T$$
and
$$\begin{aligned}
\dot{\bar P} &= -\hat A^T\hat P - \hat P\hat A - \hat Q - \hat\Pi(\hat P) + (\hat P\hat B + \hat S)\hat R^{-1}(\hat P\hat B + \hat S)^T\\
&\quad - (\hat A - \hat B\hat R^{-1}(\hat B^T\hat P + \hat S^T))^T\hat N\lambda e_{n+1}^T - e_{n+1}\lambda^T\hat N^T(\hat A - \hat B\hat R^{-1}(\hat B^T\hat P + \hat S^T))\\
&\quad + \lambda^T\hat N^T\hat B\hat R^{-1}\hat B^T\hat N\lambda\,e_{n+1}e_{n+1}^T\\
&= -\hat A^T\bar P - \bar P\hat A - \hat Q - \hat\Pi(\bar P) + (\bar P\hat B + \hat S)\hat R^{-1}(\bar P\hat B + \hat S)^T
\end{aligned}$$
which follows since
$$\hat A^T(e_{n+1}\lambda^T\hat N^T + \lambda^T\hat W\lambda\,e_{n+1}e_{n+1}^T) = 0,\qquad \hat B^T(e_{n+1}\lambda^T\hat N^T + \lambda^T\hat W\lambda\,e_{n+1}e_{n+1}^T) = 0.$$

The optimization in (13) thus becomes
$$\max_\lambda\ \hat x_0^T\big(\hat P(t_0) + \hat N(t_0)\lambda e_{n+1}^T + e_{n+1}\lambda^T\hat N(t_0)^T - \lambda^T\hat W(t_0)\lambda\,e_{n+1}e_{n+1}^T\big)\hat x_0 = \hat x_0^T\big(\hat P(t_0) + \hat N(t_0)\hat W(t_0)^{-1}\hat N(t_0)^T\big)\hat x_0$$
and the optimal Lagrange multiplier is $\lambda^* = \hat W(t_0)^{-1}\hat N(t_0)^T\hat x_0$. Finally, the optimal control becomes
$$u^* = -\hat R^{-1}\big(\hat B^T(\hat P + \hat N\hat W(t_0)^{-1}\hat N(t_0)^T\hat x_0e_{n+1}^T) + \hat S^T\big)\hat x = -\hat R^{-1}(\hat P\hat B + \hat S)^T\hat x - \hat R^{-1}\hat B^T\hat N\hat W(t_0)^{-1}\hat N(t_0)^T\hat x_0.$$
It is straightforward to show that $E^{t_0,x_0}\{\hat C(\alpha)\hat X(t_f)\} = 0$ when this control is used, i.e. the constraint is satisfied. This proves that we have obtained an optimal solution (Øksendal 1998). Considering the optimal control problem from the 'initial point' $(t,x(t))$ gives the following feedback formulation of the optimal control: $u^* = -\hat R^{-1}((\hat P + \hat N\hat W^{-1}\hat N^T)\hat B + \hat S)^T\hat X$.

We note that under the conditions of the proposition there exist solutions $\bar P$ and $\hat P$ to the Riccati equations involved in the proof. Indeed, the special structure of the system matrices implies that only the upper left blocks of $\bar P$ and $\hat P$ satisfy nonlinear equations, which are identical to the first equation in (6). The other blocks correspond to $p$ and $\pi$ in (6). Hence, the existence of $\bar P$ and $\hat P$ follows from Remark 1. □

Proof of Proposition 3: The dynamic programming recursion is
$$J_k(x) = \min_u E^{t_k,x}\Big\{\int_{t_k}^{t_{k+1}}[\hat X^T\hat Q_k\hat X + 2\hat X^T\hat S_ku + u^T\hat R_ku]\,dt + w_{k+1}|\hat C_{1,k+1}(\beta_{k+1})\hat X(t_{k+1})|^2 + J_{k+1}(X(t_{k+1}))\Big\}$$
subject to
$$d\hat X = (\hat A_k(t)\hat X + \hat B_k(t)u)\,dt + \hat G_k(t,X)\,dZ,\quad \hat X(t_k) = \hat x,\qquad E^{t_k,x}\{\hat C_{2,k+1}(\alpha_{k+1})\hat X(t_{k+1})\} = 0,$$
with $J_N(\hat x) = 0$. It follows from the discussion preceding the proposition that
$$J_{N-1}(x) = \hat x^T\big(\hat P_{N-1}(t_{N-1}) + \hat N_{N-1}(t_{N-1})\hat W_{N-1}(t_{N-1})^{-1}\hat N_{N-1}(t_{N-1})^T\big)\hat x$$
where all variables are defined as in the proposition and $\hat P_{N-1}(t_N) = w_N\hat C_{1,N}(\beta_N)^T\hat C_{1,N}(\beta_N)$. In the next recursion we get an analogous result except that now the boundary condition will be
$$\hat P_{N-2}(t_{N-1}) = w_{N-1}\hat C_{1,N-1}(\beta_{N-1})^T\hat C_{1,N-1}(\beta_{N-1}) + \hat P_{N-1}(t_{N-1}) + \hat N_{N-1}(t_{N-1})\hat W_{N-1}(t_{N-1})^{-1}\hat N_{N-1}(t_{N-1})^T.$$
It is now obvious that the recursion continues as in the proposition statement. □

Proof of Proposition 4: Lagrange relaxation of the inequality constraints gives, in our compact notation,
$$J_0(x_0) = \max_{\Lambda_0\ge 0}\ \min_{u\in M(t_0,t_N)} E^{t_0,x_0}\Big\{\sum_{k=0}^{N-1} w_{k+1}|C_{1,k+1}X(t_{k+1}) - \beta_{k+1}|^2 + \lambda_{k+1}^T\hat C_{2,k+1}(\alpha_{k+1})\hat X(t_{k+1}) + \int_{t_k}^{t_{k+1}}[\hat X^T\hat Q_k\hat X + 2\hat X^T\hat S_ku + u^T\hat R_ku]\,dt\Big\}$$
subject to
$$d\hat X = (\hat A_k(t)\hat X + \hat B_k(t)u)\,dt + \hat G_k(t,X)\,dZ,\quad t\in[t_k,t_{k+1}],\quad \hat X(t_0) = \hat x_0.$$
We use a dynamic programming iteration with $J_N(x) = 0$. The derivation in the proof of Proposition 2 gives at $t = t_{N-1}$
$$\begin{aligned}
J_{N-1}(x,\Lambda_{N-1}) &= \max_{\lambda_N\ge 0}\ \hat x^T\big(\hat P_{N-1}(t_{N-1},\Lambda_N) + \hat N_{N-1}(t_{N-1},\Lambda_N)\lambda_Ne_{n+1}^T\\
&\qquad + e_{n+1}\lambda_N^T\hat N_{N-1}(t_{N-1},\Lambda_N)^T - \lambda_N^T\hat W_{N-1}(t_{N-1})\lambda_N\,e_{n+1}e_{n+1}^T\big)\hat x\\
&=: \max_{\Lambda_{N-1}\ge 0}\ \hat x^T\bar P_{N-1}(t_{N-1},\Lambda_{N-1})\hat x.
\end{aligned}$$
In the next iteration we get
$$\begin{aligned}
J_{N-2}(x,\Lambda_{N-2}) &= \max_{\lambda_{N-1}\ge 0}\ \min_{u\in M(t_{N-2},t_{N-1})} E^{t_{N-2},x}\Big\{w_{N-1}|\hat C_{1,N-1}(\beta_{N-1})\hat X(t_{N-1})|^2 + \lambda_{N-1}^T\hat C_{2,N-1}(\alpha_{N-1})\hat X(t_{N-1})\\
&\qquad + \int_{t_{N-2}}^{t_{N-1}}[\hat X^T\hat Q_k\hat X + 2\hat X^T\hat S_ku + u^T\hat R_ku]\,dt + \max_{\Lambda_{N-1}\ge 0}\hat X(t_{N-1})^T\bar P_{N-1}(t_{N-1},\Lambda_{N-1})\hat X(t_{N-1})\Big\}\\
&= \max_{\Lambda_{N-1}\ge 0}\ \min_{u\in M(t_{N-2},t_{N-1})} E^{t_{N-2},x}\Big\{w_{N-1}|\hat C_{1,N-1}(\beta_{N-1})\hat X(t_{N-1})|^2 + \hat X(t_{N-1})^T\bar P_{N-1}(t_{N-1},\Lambda_{N-1})\hat X(t_{N-1})\\
&\qquad + \lambda_{N-1}^T\hat C_{2,N-1}(\alpha_{N-1})\hat X(t_{N-1}) + \int_{t_{N-2}}^{t_{N-1}}[\hat X^T\hat Q_k\hat X + 2\hat X^T\hat S_ku + u^T\hat R_ku]\,dt\Big\}\\
&=: \max_{\Lambda_{N-2}\ge 0}\ \hat x^T\bar P_{N-2}(t_{N-2},\Lambda_{N-2})\hat x
\end{aligned}$$
where
$$\bar P_{N-2}(t_{N-2},\Lambda_{N-2}) = \hat P_{N-2}(t_{N-2},\Lambda_{N-1}) + \hat N_{N-2}(t_{N-2},\Lambda_{N-1})\lambda_{N-1}e_{n+1}^T + e_{n+1}\lambda_{N-1}^T\hat N_{N-2}(t_{N-2},\Lambda_{N-1})^T - \lambda_{N-1}^T\hat W_{N-2}(t_{N-2})\lambda_{N-1}\,e_{n+1}e_{n+1}^T.$$
The second equality, where $\min_{u\in M(t_{N-2},t_{N-1})}$ and $\max_{\lambda_{N-1}}$ are permuted, follows from the Karush–Kuhn–Tucker theorem (Balakrishnan 1976). Indeed, the optimization problem involves a convex cost and convex constraints, so the Lagrange function satisfies a saddle-point condition, which implies that the min and the max commute. In the third equality we used the same arguments as in the proof of Proposition 2. If we continue the recursion we obtain (12). □
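The identity behind $\hat\Pi$ in the proof of Proposition 2, namely $x^T\Pi(P)x = \tfrac12\operatorname{tr}(G(x)^TPG(x))$ for $G(x) = \sum_i x_iG_i$ and $\Pi(P)_{k,l} = \tfrac12\operatorname{tr}(G_k^TPG_l)$, follows from bilinearity of the trace expression; a small numerical check for random data (ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2
G_list = [rng.standard_normal((n, m)) for _ in range(n)]
M = rng.standard_normal((n, n))
P = M + M.T                       # symmetric P
x = rng.standard_normal(n)

# Pi(P)[k, l] = 0.5 * tr(G_k^T P G_l), as defined in Proposition 2.
Pi = np.array([[0.5 * np.trace(G_list[k].T @ P @ G_list[l])
                for l in range(n)] for k in range(n)])
Gx = sum(x[i] * G_list[i] for i in range(n))   # G(x) = sum_i x_i G_i

assert np.isclose(x @ Pi @ x, 0.5 * np.trace(Gx.T @ P @ Gx))
```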

References

Adams, M. B., Willsky, A. S., and Levy, B. C., 1984a, Linear estimation of boundary value stochastic processes. I. The role and construction of complementary models. IEEE Transactions on Automatic Control, 29, 803–811.
Adams, M. B., Willsky, A. S., and Levy, B. C., 1984b, Linear estimation of boundary value stochastic processes. II. 1-D smoothing problems. IEEE Transactions on Automatic Control, 29, 811–821.
Balakrishnan, A. V., 1976, Applied Functional Analysis (New York: Springer).
Crouch, P., and Jackson, J. W., 1990, Dynamic interpolation for linear systems. In: Proceedings of the 29th IEEE Conference on Decision and Control, Hawaii, USA.
Egerstedt, M., and Martin, C., 2000, Optimal trajectory planning and smoothing splines. Automatica, 37, 1057–1064.
Egerstedt, M., and Martin, C., 2003, Smoothing splines on the sphere with applications to wildlife telemetry. Submitted to Automatica.
Jönsson, U., Martin, C. F., and Zhou, Y., 2002, Trajectory planning under a stochastic uncertainty. In: Fifteenth International Symposium on Mathematical Theory of Networks and Systems.
Krener, A. J., 1978, Boundary value linear systems. In: Systems Analysis (Conference, Bordeaux, 1978), Astérisque, 75–76 (Paris: Société Mathématique de France, 1980), pp. 149–165.
Laumond, J.-P. (Ed.), 1998, Robot Motion Planning and Control. Lecture Notes in Control and Information Sciences 229 (London: Springer).
Lawton, J., Beard, R., and Young, B., 2003, A decentralized approach to formation maneuvers. IEEE Transactions on Robotics and Automation, 19(6), 933–941.
Lim, A. E. B., and Zhou, X. Y., 1999, Stochastic optimal control with integral quadratic constraints and indefinite control weights. IEEE Transactions on Automatic Control, 44, 1359–1369.
Martin, C., Sun, S., and Egerstedt, M., 2001, Optimal control, statistics and path planning. Mathematical and Computer Modelling, 33, 237–253.
Martin, R. F., 2002, Consumption, durable goods, and transaction costs. Dissertation, University of Chicago.
Øksendal, B., 1998, Stochastic Differential Equations: An Introduction with Applications, 5th edition (Berlin-Heidelberg: Springer).
Sun, S., Egerstedt, M. B., and Martin, C. F., 2000, Control theoretic smoothing splines. IEEE Transactions on Automatic Control, 45, 2271–2279.
Wahba, G., 1990, Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics, 59 (Philadelphia, PA: SIAM).
Wonham, W. M., 1968, On a matrix Riccati equation of stochastic control. SIAM Journal on Control, 6, 681–697.
Zhang, Z., Tomlinson, J., and Martin, C., 1997, Splines and linear control theory. Acta Applicandae Mathematicae, 49, 1–34.
Zhou, Y., Egerstedt, M., and Martin, C., 2001, Optimal approximation of functions. Communications in Information and Systems, 1, 101–112.

Jo« nsson, U., Martin, C. F., and Zhou, Y., 2002, Trajectory planning under a stochastic uncertainty. In: Fifteenth International Symposium on Mathematical Theory of Networks and Systems. Krener, A. J., 1978, Boundary value linear systems. Systems analysis (Conference, Bordeaux, France, 1978), pp. 149–165, Astrisque, 75–76, Soc. Math. France, Paris, 1980. Laumond, J.-P. (Ed.), 1998, Robot Motion Planning and Control Lectures. Notes in Control and Information Sciences 229, (London: Springer), ISBN 3-540-76219-1. Lawton, J., Beard, R., and Young, B., 2003, A decentralized approach to formation maneuvers. In: Proceedings for IEEE International Conference on Robotics and Automation, December, volume 19(6), 933–941. Lim, A. E. B., and Zhou, X. Y., 1999, Stochastic optimal control with integral quadratic constraints and indefinite control weights. IEEE Transactions on Automatic Control, 44, 1359–1369. Martin, C., Sun, S., and Egerstedt, M., 2001, Optimal control, statistics and path planning. Mathematical and Computer Modelling, 33, 237–253. Martin, R. F., 2002, Consumption, durable goods, and transaction costs. Dissertation, University of Chicago. Øksendal, B., 1998, Stochastic Differential Equations, An Introduction with Applications, 5th Edition (BerlinHeidelberg, Springer). Sun, S., Egerstedt, M. B., and Martin, C. F., 2000, Control theoretic smoothing splines. IEEE Transactions on Automatic Control, 45, 2271–2279. Wahba, G., 1990, Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics, 59 (Philadelphia, PA: SIAM). Wonham, W. M., 1968, On a matrix Riccati equation of stochastic control. SIAM Journal of Control Optimization, 6, 681–697. Zhang, Z., Tomlinson, J., and Martin, C., 1997, Splines and linear control theory. Acta Applicandae Mathematicae, 49, 1–34. Zhou, Y., Egerstedt, M., and Martin, C., 2001, Optimal approximation of functions. Communications in Information and Systems, no. 1, 101–112.