
WeC08.3

43rd IEEE Conference on Decision and Control December 14-17, 2004 Atlantis, Paradise Island, Bahamas

Optimization of Nonlinear Stochastic Uncertain Relaxed Controlled Systems: Entropy Rate Functionals and Robustness Farzad Rezaei, Charalambos D. Charalambous, and Andreas Kyprianou

Abstract: This paper is concerned with nonlinear stochastic uncertain relaxed controlled diffusions, in which the pay-off is described by the relative entropy between the nominal measure and the uncertain measure, when the uncertain measure satisfies certain energy inequality constraints. With respect to this formulation two problems are defined. The first seeks to minimize the relative entropy over the set of unknown measures which satisfy the inequality constraints. The second seeks to maximize, over the set of admissible relaxed control laws, the minimum value of the relative entropy induced by the uncertain measures among those which satisfy the inequality constraints. The second problem is equivalent to a minimax problem, while the first is an optimization problem with respect to a fixed control law. Certain monotonicity properties of the optimal solution are discussed, and relations to the well-known Cramer's theorem of large deviations are introduced. In addition, the implications of these results for minimax games for fully observable stochastic systems in which the strategies are measures are delineated, and relations to risk-sensitive control problems are investigated.

Key Words: Nonlinear Uncertain Stochastic Systems, Relaxed Controls, Large Deviations, Relative Entropy, Minimax Games, Duality Properties.

I. INTRODUCTION

A. Abstract Formulation

The abstract formulation of the problems considered is the following. Let $(\Sigma, d)$ denote a complete separable metric space, and $(\Sigma, B(\Sigma))$ the corresponding measurable space in which $B(\Sigma)$ is identified with the Borel sets generated by the open sets in $\Sigma$. Let $M(\Sigma)$ denote the set of probability measures on $(\Sigma, B(\Sigma))$, $U_{ad}$ the set of admissible controls, and $B(\Sigma; \Re)$ the set of bounded real-valued measurable functions $\ell^u : \Sigma \to \Re$ for a given $u \in U_{ad}$. Here, $M(\Sigma)$ denotes the set of all possible measures induced by the stochastic systems, while $\ell^u \in B(\Sigma; \Re)$ denotes the energy function or fidelity criterion associated with a given choice of the control law $u \in U_{ad}$.

This work was supported by the Natural Sciences and Engineering Research Council of Canada under an operating grant.
F. Rezaei is with the School of Information Technology and Engineering, University of Ottawa, 800 King Edward Ave., Ottawa, Ontario K1N 6N5, Canada, [email protected]
C. D. Charalambous is with the School of Information Technology and Engineering, University of Ottawa, 161 Louis Pasteur, A519, Ottawa, Ontario K1N 6N5, Canada. Also with the Electrical and Computer Engineering Department, University of Cyprus, 75 Kallipoleos Avenue, Nicosia, Cyprus, [email protected]
A. Kyprianou is with the Mechanical and Manufacturing Engineering Department, University of Cyprus, 75 Kallipoleos Avenue, Nicosia, Cyprus, [email protected]

0-7803-8682-5/04/$20.00 ©2004 IEEE

Given a nominal measure $\mu^u \in M(\Sigma)$ induced by the nominal stochastic system, the problem is to find a control law $u^* \in U_{ad}$ and a probability measure $\nu^{u^*,*} \in M(\Sigma)$ which solve the following constrained optimization problem.

$$J(u^*, \nu^{u^*,*}) = \sup_{u \in U_{ad}} \ \inf_{\nu^u \in M_\mu} H(\nu^u|\mu^u) \tag{I.1}$$

Subject to fidelity

$$E_{\nu^u}(\ell^u) = \int_\Sigma \ell^u \, d\nu^u \le \gamma \tag{I.2}$$

Or subject to fidelity

$$E_{\nu^u}(\ell^u) = \int_\Sigma \ell^u \, d\nu^u \ge \gamma \tag{I.3}$$

where $M_\mu = \{\nu^u \in M(\Sigma);\ \nu^u$ [...] $\gamma$ will correspond to the optimistic scenario (emphasizing the best cases) in which the strategies are risk-seeking, while the case (I.1), (I.3), with $m < \gamma$ will correspond to the pessimistic scenario (emphasizing the worst cases) in which the strategies are risk-averse.

B. Summary of Main Conclusions

Disturbance Attenuation in Robustness. For a given $u \in U_{ad}$ let $L^2(\nu^u; H) = \{\phi^u : \Sigma \to H;\ \phi^u \text{ is a measurable random variable such that } \int_\Sigma \|\phi^u\|_H^2 \, d\nu^u < \infty\}$ denote the Hilbert space of $H$-valued random variables. Let $L^2(\nu^u; Z)$ and $L^2(\nu^u; D)$ denote the Hilbert spaces of tracking signals and disturbance signals, respectively. For a given $u \in U_{ad}$, let $T^u : D \to Z$ be a bounded linear operator with


induced norm defined by

$$J(u) = \|T^u\| = \sup_{\|d\|_{L^2(\nu^u;D)} \neq 0} \frac{\|z\|^2_{L^2(\nu^u;Z)}}{\|d\|^2_{L^2(\nu^u;D)}} \tag{I.4}$$

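The sub-optimal disturbance attenuation problem is to ensure that $J(u) \le \frac{1}{2s}$, $s > 0$, for all $u \in U_{ad}$, which is equivalent to

$$J^s(u) = \sup_{d \in L^2(\nu^u;D)} \Big\{ s \int_\Sigma \|z\|_Z^2 \, d\nu^u - \frac{1}{2} \int_\Sigma \|d\|_D^2 \, d\nu^u \Big\} = - \inf_{d \in L^2(\nu^u;D)} \Big\{ \frac{1}{2} \int_\Sigma \|d\|_D^2 \, d\nu^u - s \int_\Sigma \|z\|_Z^2 \, d\nu^u \Big\} \tag{I.5}$$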

and ensuring that the pay-off is non-positive. When $\nu^u$ is absolutely continuous with respect to $\mu^u$, then for stochastic diffusion processes [9], [10] it can be shown that $H(\nu^u|\mu^u) = \frac{1}{2} \int_\Sigma \|d\|_D^2 \, d\nu^u$. Therefore, the dual functional associated with converting the primal problem (I.1), (I.3) into the equivalent unconstrained optimization

$$J^{s,\gamma}(u, \nu^{u,*}) = \inf_{\nu^u \in M_\mu} \Big\{ H(\nu^u|\mu^u) - s \big( E_{\nu^u}(\ell^u) - \gamma \big) \Big\} \tag{I.6}$$

is equivalent to the sub-optimal disturbance attenuation problem (I.5) (let $\ell^u = \|z\|_Z^2$). Moreover, larger values of $s$ imply higher attenuation and hence higher dissipation. An application of the above results to general nonlinear uncertain relaxed control systems is discussed in subsequent sections.

Legendre-Fenchel or Cramer Transform. In the context of large deviations, the dual functionals associated with converting the primal problems (I.1)-(I.3) into equivalent unconstrained optimization problems are equal to the Legendre-Fenchel or Cramer transforms of $\ell^u$ defined by

$$\Psi(\gamma) = \sup_{s \in \Re} \Big\{ s\gamma - \log E_{\mu^u}\, e^{s \ell^u} \Big\} = \sup_{s \in \Re} \ \inf_{\nu^u \in M_\mu} \Big\{ H(\nu^u|\mu^u) - s \big( E_{\nu^u}(\ell^u) - \gamma \big) \Big\}$$

hence it represents an optimistic scenario. For (I.1), (I.3) it is shown in Theorem (3.2) in [12] that the average energy constraint with respect to all uncertain measures $\nu^u$ [...] $\gamma$;
Case 2. $m = E_{\mu^u}(\ell^u) = \int_\Sigma \ell^u \, d\mu^u < \gamma$;

2) Find $\nu^{u,*} \in M(\Sigma)$ which solves

$$J(u, \nu^{u,*}) = \inf_{\{\nu^u \in M_\mu;\ \int_\Sigma \ell^u \, d\nu^u \ge \gamma\}} H(\nu^u|\mu^u) \tag{II.9}$$

for the following two cases.
Case 1. $m = E_{\mu^u}(\ell^u) = \int_\Sigma \ell^u \, d\mu^u < \gamma$;
Case 2. $m = E_{\mu^u}(\ell^u) = \int_\Sigma \ell^u \, d\mu^u > \gamma$;

A. Dual Functional

For every $s \in \Re$, define the Lagrangian

$$J^{s,\gamma}(u, \nu^u) = H(\nu^u|\mu^u) - s \big( E_{\nu^u}(\ell^u) - \gamma \big) \tag{II.10}$$

and its associated dual functional

$$J^{s,\gamma}(u, \nu^{u,*}) = \inf_{\nu^u \in M_\mu} J^{s,\gamma}(u, \nu^u) \tag{II.11}$$
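As an illustration only, the dual functional can be evaluated in closed form when $\Sigma$ is a finite set, which is an assumption made here and not the setting of the paper. In that case the infimum of the Lagrangian over measures is attained by the exponentially tilted measure $\nu_i \propto \mu_i e^{s \ell_i}$ (the standard Gibbs variational principle), and its value is $s\gamma - \log E_\mu e^{s\ell}$. The three-point space, the values of `mu`, `ell`, `s`, `gamma`, and all function names below are hypothetical.

```python
import math

def relative_entropy(nu, mu):
    """H(nu|mu) = sum_i nu_i log(nu_i / mu_i), with 0 log 0 = 0.
    Returns +inf when nu is not absolutely continuous w.r.t. mu."""
    total = 0.0
    for n, m in zip(nu, mu):
        if n > 0.0:
            if m == 0.0:
                return math.inf
            total += n * math.log(n / m)
    return total

def lagrangian(nu, mu, ell, s, gamma):
    """Finite-space analogue of the Lagrangian: H(nu|mu) - s (E_nu(ell) - gamma)."""
    mean = sum(n * l for n, l in zip(nu, ell))
    return relative_entropy(nu, mu) - s * (mean - gamma)

def tilted_measure(mu, ell, s):
    """Exponentially tilted measure nu_i proportional to mu_i exp(s ell_i)."""
    weights = [m * math.exp(s * l) for m, l in zip(mu, ell)]
    z = sum(weights)
    return [w / z for w in weights]

def dual_functional(mu, ell, s, gamma):
    """Closed form of the infimum over nu: s gamma - log E_mu exp(s ell)."""
    log_mgf = math.log(sum(m * math.exp(s * l) for m, l in zip(mu, ell)))
    return s * gamma - log_mgf

# Hypothetical three-point example (values chosen for illustration only).
mu = [0.5, 0.3, 0.2]      # nominal measure
ell = [0.0, 1.0, 2.0]     # energy function
s, gamma = 0.7, 1.0

nu_star = tilted_measure(mu, ell, s)
closed_form = dual_functional(mu, ell, s, gamma)
attained = lagrangian(nu_star, mu, ell, s, gamma)

print("closed form:", closed_form)
print("Lagrangian at tilted measure:", attained)
print("Lagrangian at nominal measure:", lagrangian(mu, mu, ell, s, gamma))
```

The tilted measure reproduces the closed-form value, while any other measure, such as the nominal one, gives a Lagrangian value that is at least as large.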

In addition, define the quantity

$$\varphi^{s^*}(u, \gamma) = \sup_{s \in \Re} J^{s,\gamma}(u, \nu^{u,*}) \tag{II.12}$$

which may or may not exist. Many properties of the dual functional and $\varphi^{s^*}(u, \gamma)$ are derived in Lemma (3.1) in [12]. Also the equivalence between the unconstrained and constrained problems is established in Theorem (3.2) in [12].

Remark 2.2: The following are important observations about both problems discussed in Theorem (3.2) in [12].
1) The minimizing measures $\nu^{u,*} \in M_\mu$ occur on the boundary of the constraints. In 1), corresponding to $s \le 0$, the strategy of the minimizing measure is optimistic (risk-seeking), while in 2), corresponding to $s \ge 0$, the strategy of the minimizing measure is pessimistic (risk-averse).
2) The pessimistic scenario leads to the following inequality, reminiscent of the 2nd law of thermodynamics,

$$E_{\nu^{u,*}}(\ell^u)\big|_{s=s^*} \le \frac{1}{s^*} H(\nu^{u,*}|\mu^u)\big|_{s=s^*} + \gamma \tag{II.13}$$

which for dynamical systems is equivalent to a dissipation inequality.
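The observations in Remark 2.2 can be checked numerically on a finite space. The sketch below is a toy illustration under assumptions that are not in the paper: $\Sigma$ has three points, the nominal measure is strictly positive, and the pessimistic case $m < \gamma$ of problem 2) is considered. It searches by bisection for a multiplier $s^* \ge 0$ at which the exponentially tilted measure meets the constraint with equality, then checks the boundary property and inequality (II.13). All names and numerical values are hypothetical.

```python
import math

def tilt(mu, ell, s):
    """Exponentially tilted measure nu_i proportional to mu_i exp(s ell_i)."""
    weights = [m * math.exp(s * l) for m, l in zip(mu, ell)]
    z = sum(weights)
    return [w / z for w in weights]

def mean(nu, ell):
    """E_nu(ell) on a finite space."""
    return sum(n * l for n, l in zip(nu, ell))

def rel_entropy(nu, mu):
    """H(nu|mu); assumes every mu_i > 0 so that nu << mu holds."""
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0.0)

def solve_multiplier(mu, ell, gamma, lo=0.0, hi=50.0, iters=200):
    """Bisection for s* in [lo, hi] with E_{nu_s}(ell) = gamma.
    Relies on the tilted mean being nondecreasing in s."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(mu, ell, mid), ell) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data: nominal mean m = 0.7 lies below gamma = 1.2.
mu = [0.5, 0.3, 0.2]
ell = [0.0, 1.0, 2.0]
gamma = 1.2

m_nominal = mean(mu, ell)
s_star = solve_multiplier(mu, ell, gamma)
nu_star = tilt(mu, ell, s_star)
h_star = rel_entropy(nu_star, mu)
log_mgf = math.log(sum(p * math.exp(s_star * l) for p, l in zip(mu, ell)))

# Another feasible measure (mean 1.2) for comparison.
nu_alt = [0.3, 0.2, 0.5]
h_alt = rel_entropy(nu_alt, mu)

print("nominal mean m:", m_nominal)
print("multiplier s*:", s_star)
print("constraint value E_nu*(ell):", mean(nu_star, ell))
print("H(nu*|mu):", h_star, " dual value:", s_star * gamma - log_mgf)
print("H(nu_alt|mu):", h_alt)
```

In this toy case the multiplier is positive, the constraint is active at the minimizer, the minimal relative entropy equals the dual value $s^*\gamma - \log E_\mu e^{s^*\ell}$, and the other feasible measure has larger relative entropy.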

It can be shown that if $U$ is compact then $M(U \times [0, T])$ is tight when endowed with the weak convergence topology, which is metrizable [11]. By a), $\pi$ admits a derivative, and hence it can be represented by $\pi(du, dt) = \pi_t(du)dt$, where $\pi_t(B)$ is an $\{F_{0,t}\}_{t \ge 0}$-adapted process for each $B \in B(U)$. For a relaxed control $\pi(\cdot) \in M(U \times [0, T])$, and for any $(s, x) \in [0, T) \times \Re^n$, the nominal state process $\{x(t)\}_{t \ge s}$ is given by the following Ito stochastic differential equation (SDE):

$$\begin{cases} dx(t) = \int_U f(x(t), u)\, \pi_t(du)\, dt + \int_U \sigma(x(t), u)\, \pi_t(du)\, dw(t) \\ x(0) = x \end{cases} \tag{III.14}$$

Here $x(\cdot)$ is the solution of (III.14) corresponding to $(s, x)$, $\pi(\cdot)$. Given the nominal measure $P^\pi \in M(\Omega)$, find a relaxed control $\pi^* \in M(U \times [0, T])$ and a probability measure $Q^{\pi^*,*} \in M(\Omega)$ which solve the following constrained optimization problem.

$$J(\pi^*, Q^{\pi^*,*}) = \sup_{\pi \in M(U \times [0,T])} \ \inf_{Q^\pi \in M(P^\pi)} H(Q^\pi|P^\pi) \tag{III.15}$$

Subject to fidelity

$$E_{Q^\pi} \Big\{ \int_0^T \int_U \lambda(x(t), u)\, \pi_t(du)\, dt + \kappa(x(T)) \Big\} \le \gamma$$

or subject to fidelity

$$E_{Q^\pi} \Big\{ \int_0^T \int_U \lambda(x(t), u)\, \pi_t(du)\, dt + \kappa(x(T)) \Big\} \ge \gamma$$

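where $\gamma \in \Re$ and $M(P^\pi) = \{Q^\pi \in M(\Omega);\ Q^\pi$ [...] $\gamma$;
Case 2. $m = \langle \kappa, \nu_T^\pi \rangle < \gamma$;

where $\phi \in C_b(\Re^n)$. (III.17)

Moreover, $\mu_t^\pi(\phi)(x)$ satisfies the following equation.

2)

$$J(\pi^*, \nu^{\pi,*}) = \sup_{\pi \in M(U \times [0,T])} \ \inf_{\nu^\pi \in C} H(\nu^\pi|\mu^\pi) \tag{III.20}$$

$C = \{\nu^\pi \in M(\Re^n);\ \int \kappa(z)\, d\nu^\pi(T, z|x) \ge \gamma,\ \nu^\pi$ [...] $< \gamma$;
Case 2. $m = \langle \kappa, \nu_T^\pi \rangle > \gamma$;

Similarly to the previous section, the above problems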


can be reformulated using the dual functional as follows. For every $s \in \Re$ define the Lagrangian

$$J^{s,\gamma}(\pi, \nu^\pi) = H(\nu^\pi|\mu^\pi) - s \big( \langle \kappa, N_T^{y,u} \rangle - \gamma \big)$$

the Dirac delta function concentrated at $u = \alpha$. Using ordinary controls one can derive a Hamilton-Jacobi-Bellman equation for computing the optimal control strategy.

and its associated dual functional

$$J^{s,\gamma}(\pi, \nu^{\pi,*}) = \inf_{\nu^\pi \in M_{\mu^\pi}} J^{s,\gamma}(\pi, \nu^\pi) \tag{III.21}$$

where Mµπ = {ν π ∈ M(n ); ν π