Leader-Follower Strategies for Multilevel Systems





IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. AC-23, NO. 2, APRIL 1978



Abstract-Sequential strategies for dynamic systems with multiple decision makers and multiple performance indices are surveyed and reviewed. These strategies are generalizations of Stackelberg or leader-follower strategies for two-person games. The review includes structures with one coordinator and several second-level decision makers, and linear hierarchical structures with only one decision maker at each level. Several information structures are considered.


THE purpose of this paper is to survey and interrelate recent results on the utilization of leader-follower or Stackelberg strategy concepts in the control structuring of interconnected systems. These control methodologies are appropriate for classes of system problems where there are multiple criteria, multiple decision makers, decentralized information, and a natural hierarchy of decision-making levels. The basic leader-follower strategy was originally suggested for static duopoly by von Stackelberg [1]. This concept has been generalized to dynamic nonzero-sum two-person games by Chen and Cruz [2] and Simaan and Cruz [3], [4], to two groups of players by Simaan and Cruz [6], and to stochastic games by Castanon and Athans [8] and Castanon [14].

Manuscript received August 22, 1977; revised November 8, 1977. This work was supported in part by the National Science Foundation under Grant ENG-74-20091, in part by the U.S. Energy Research and Development Administration, Electric Energy Systems Division, under Contract EX-76-C-01-2088, and in part by the Joint Services Electronics Program under Contract DAAB-07-72-C-0259. The author is with the Decision and Control Laboratory of the Coordinated Science Laboratory and the Department of Electrical Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801.

0018-9286/78/0400-0244$00.75 © 1978 IEEE

Static Two-Person Games

The basic idea of a leader-follower strategy for a static two-person game is rather simple. Consider two players. Player 1 chooses control $u_1 \in R$ and Player 2 chooses control $u_2 \in R$. The scalar cost function associated with Player 1 is $J_1(u_1, u_2)$ and the scalar cost function associated with Player 2 is $J_2(u_1, u_2)$. Designate Player 1 as leader and Player 2 as follower. For each control $u_1$ chosen by Player 1, Player 2 chooses $u_2 = T_2(u_1)$, where $T_2$ is a mapping from $u_1$ to $u_2$ such that

$$J_2(u_1, T_2(u_1)) \le J_2(u_1, u_2) \tag{1}$$

for all $u_2$. For simplicity, we assume that for each $u_1$, $T_2(u_1)$ yields a unique $u_2$. The leader chooses $u_1^{s1}$ such that

$$J_1(u_1^{s1}, T_2(u_1^{s1})) \le J_1(u_1, T_2(u_1)) \tag{2}$$

for all $u_1$. The strategy $u_1^{s1}$ is the Stackelberg strategy for Player 1 and $u_2^{s1} = T_2(u_1^{s1})$ is the Stackelberg strategy for Player 2 when the leader is Player 1. Similarly, when Player 1 is follower and Player 2 is leader,

$$J_1(T_1(u_2), u_2) \le J_1(u_1, u_2) \quad \text{for each } u_2 \text{ and for all } u_1, \tag{3}$$

$$J_2(T_1(u_2^{s2}), u_2^{s2}) \le J_2(T_1(u_2), u_2) \quad \text{for all } u_2 \tag{4}$$

where $T_1$ is a mapping from $u_2$ to $u_1$, $u_2^{s2}$ is the leader Stackelberg strategy, and $u_1^{s2} = T_1(u_2^{s2})$ is the follower Stackelberg strategy. In comparison, a Nash strategy pair $(u_{1N}, u_{2N})$, which may not be unique, is defined by

$$J_1(u_{1N}, u_{2N}) \le J_1(u_1, u_{2N}) \quad \text{for all } u_1 \tag{5}$$

and

$$J_2(u_{1N}, u_{2N}) \le J_2(u_{1N}, u_2) \quad \text{for all } u_2. \tag{6}$$

Clearly, from (6)

$$J_2(u_{1N}, u_{2N}) = J_2(u_{1N}, T_2(u_{1N})) \tag{7}$$

and from (2) and (7)

$$J_1(u_1^{s1}, T_2(u_1^{s1})) \le J_1(u_{1N}, u_{2N}). \tag{8}$$

Similarly, from (5)

$$J_1(u_{1N}, u_{2N}) = J_1(T_1(u_{2N}), u_{2N}) \tag{9}$$

and from (4) and (9)

$$J_2(T_1(u_2^{s2}), u_2^{s2}) \le J_2(u_{1N}, u_{2N}). \tag{10}$$

Thus, for the leader a Stackelberg strategy is at least as good as any Nash strategy. For the follower, the Stackelberg strategy may or may not be preferable to a Nash strategy. It is assumed that the leader knows the cost function mapping of the follower, but the follower might not know the cost function mapping of the leader. However, the follower knows the control strategy of the leader and takes this into account in computing his strategy. This reaction behavior of the follower is known to the leader, who optimizes his choice of control $u_1$.

Open-Loop Stackelberg Strategies for Dynamic Games

Consider a dynamic system

$$\dot{x} = f(x, u_1, u_2, t) \tag{11}$$

where $x \in R^n$ is the state, $u_1 \in R^{m_1}$ and $u_2 \in R^{m_2}$ are the controls, and $f$ is a piecewise continuous function from $R^n \times R^{m_1} \times R^{m_2}$ to $R^n$. In the dynamic system case, it is necessary to specify what type of information is available to each player. Suppose no state measurements are available. In this case we consider open-loop strategies. Associated with each player is a scalar cost function

$$J_i = K_i[x(T)] + \int_{t_0}^{T} L_i(x, u_1, u_2, t)\, dt, \quad i = 1, 2. \tag{12}$$

With Player 1 as leader, the necessary conditions for $(u_1, u_2)$ to be an open-loop Stackelberg strategy pair are given in [2], [4]. Explicit solutions in terms of matrix Riccati equations are given for the linear-quadratic problem in [2], [3].

Necessary conditions for the closed-loop Stackelberg strategy are extremely difficult to characterize [4]. Simplifications are possible when the structure of the control law is constrained, e.g., by restricting the control law to be linear, and when the effects of random initial conditions are averaged [12].

The open-loop strategy for the leader for the entire duration of the game is declared in advance. If the follower minimizes his cost function, he obtains his follower Stackelberg strategy, which is the optimal reaction to the declared leader strategy. By declaring his strategy in advance, the leader influences the follower to react in a manner which, of course, minimizes the follower's cost function, but more importantly, in a manner which is favorable to the leader. This is a direct interpretation of the definition of the leader's strategy in (2). Similarly, for closed-loop strategies where the state is available for measurement, the leader has to declare his control law for the entire duration of the game.

In situations where either player might be a leader, both cases should be examined, because both players may insist on leader strategies, in which case there may be disequilibrium, or both may play follower strategies and a stalemate may occur [5]. The stability of these disequilibrium strategies has been examined by Okuguchi [17].

One of the disadvantages of using Stackelberg strategies is that for the leader the principle of optimality does not hold, and hence dynamic programming cannot be applied. For example, if the open-loop Stackelberg strategies for a discrete-time game in the interval $[t_0, t_f]$ are applied in the interval $[t_0, t_i]$, where $t_0 < t_i < t_f$, and if the open-loop Stackelberg strategies for the same game but for the interval $[t_i, t_f]$ are computed, the new strategies will generally not coincide with the continuation of the old strategies for $[t_i, t_f]$. Similarly, the principle of optimality does not generally hold for closed-loop Stackelberg strategies. The leader-follower solution concept assumes a commitment by the leader to implement his announced strategy. This commitment is for a game over the interval $[t_0, t_f]$. If the actual interval were different, the committed strategy generally would not coincide with the Stackelberg strategy for the new interval, but the leader would be obliged to use the nonoptimal strategy.

Feedback Stackelberg Strategy

A modification of the Stackelberg strategy concept, which requires that the strategies for the remaining time-to-go after each stage should be optimal in the extended Stackelberg sense to be defined, is presented in [4]. We briefly review this extended strategy here. We consider a multistage discrete-time game where state measurements are available to both players. To distinguish the extended strategy from the closed-loop Stackelberg strategy, it is called feedback Stackelberg strategy. Other information structures may be considered, and to distinguish the extended Stackelberg strategy from the basic one, it is called equilibrium Stackelberg strategy [7]. Consider a discrete-time system

$$x(j+1) = f[x(j), u_1(j), u_2(j)] \tag{22}$$

where $x(j) \in R^n$, $u_1(j) \in R^{m_1}$, $u_2(j) \in R^{m_2}$ for $j = 0, \ldots, N-1$. A cost functional

$$J_i[x(k), u_1, u_2] = K_i[x(N)] + \sum_{j=k}^{N-1} L_i[x(j), u_1(j), u_2(j)] \tag{23}$$

is associated with each player $i$, for $i = 1, 2$, where $u_i = \{u_i(k), \ldots, u_i(N-1)\}$. Suppose that Player 1 is the leader. Denote the feedback Stackelberg strategies, with Player 1 as leader, for a game starting at time $k$, by $u_{1,k}^{f}$ and $u_{2,k}^{f}$. These are sequences of functions of the state at each stage from time $k$ to time $N-1$. Denote the resulting cost functions using these feedback Stackelberg controls by $V_i[x(k), k]$. A key defining property of feedback Stackelberg strategies is that if $u_{i,k}^{f}$ is a feedback Stackelberg strategy for a game starting at time $k$ and ending at time $N$, then its continuation starting from time $k+1$ is a feedback Stackelberg strategy $u_{i,k+1}^{f}$ for a game starting at time $k+1$ and ending at time $N$. Thus, for a game starting at time $k$, we consider only those control sequences whose continuations are $u_{i,k+1}^{f}$. The resulting cost functions are

$$J_i = L_i[x(k), u_1(k), u_2(k)] + V_i[x(k+1), k+1]. \tag{24}$$
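The stagewise construction behind (24) can be made concrete with a scalar linear-quadratic sketch. Everything in it is an illustrative assumption rather than material from the surveyed papers: the dynamics $x(k+1) = a\,x(k) + b_1 u_1(k) + b_2 u_2(k)$, stage costs $L_i = \tfrac{1}{2}(q_i x^2 + r_i u_i^2)$, terminal costs $\tfrac{1}{2} k_{f,i} x^2(N)$, and the function names. At each stage the follower reacts to the leader's current control, the leader minimizes its version of (24) given that reaction, and the cost-to-go coefficients are propagated backward.

```python
def feedback_stackelberg_scalar(a, b1, b2, q, r, kf, N):
    """Backward recursion for a scalar two-player game with Player 1 leading.

    Returns per-stage gains (g1, g2), with u_i(k) = -g_i(k) * x(k), and the
    coefficients (s1, s2) at k = 0, where V_i[x, 0] = 0.5 * s_i * x**2.
    All symbols are assumptions of this sketch, not the paper's notation.
    """
    s1, s2 = kf  # V_i[x, N] = 0.5 * kf_i * x**2
    gains = [None] * N
    for k in range(N - 1, -1, -1):
        # Follower reaction to w = a*x + b1*u1, using V_2 at stage k+1: u2 = -l2 * w.
        l2 = b2 * s2 / (r[1] + b2 ** 2 * s2)
        c = 1.0 - b2 * l2  # x(k+1) = c * w once the follower has reacted
        # Leader minimizes its stage cost plus V_1 at stage k+1, given that reaction.
        g1 = s1 * c ** 2 * b1 * a / (r[0] + s1 * c ** 2 * b1 ** 2)
        g2 = l2 * (a - b1 * g1)
        a_cl = c * (a - b1 * g1)  # closed-loop dynamics x(k+1) = a_cl * x(k)
        s1 = q[0] + r[0] * g1 ** 2 + s1 * a_cl ** 2
        s2 = q[1] + r[1] * g2 ** 2 + s2 * a_cl ** 2
        gains[k] = (g1, g2)
    return gains, (s1, s2)


def simulate_costs(a, b1, b2, q, r, kf, gains, x0):
    """Accumulate both players' costs along the closed-loop trajectory."""
    x, J1, J2 = x0, 0.0, 0.0
    for g1, g2 in gains:
        u1, u2 = -g1 * x, -g2 * x
        J1 += 0.5 * (q[0] * x ** 2 + r[0] * u1 ** 2)
        J2 += 0.5 * (q[1] * x ** 2 + r[1] * u2 ** 2)
        x = a * x + b1 * u1 + b2 * u2
    return J1 + 0.5 * kf[0] * x ** 2, J2 + 0.5 * kf[1] * x ** 2


# Hypothetical numbers, chosen only to exercise the recursion.
params = dict(a=1.1, b1=0.5, b2=0.8, q=(1.0, 2.0), r=(1.0, 0.5), kf=(1.0, 1.0))
gains5, s0 = feedback_stackelberg_scalar(N=5, **params)
gains3, _ = feedback_stackelberg_scalar(N=3, **params)
```

In this sketch the gains for the last three stages of the five-stage game coincide with the gains of the three-stage game, which is the continuation property that defines the feedback Stackelberg strategy.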

If there are no constraints on the controls, necessary conditions (25) and (26) hold for $i = 1, 2$ [4]. The boundary conditions for (25) and (26) are

$$V_i[x(N), N] = K_i[x(N)], \quad i = 1, 2. \tag{27}$$

From the definition, the optimality of the feedback Stackelberg strategy does not depend on the number of stages in the game. Continuations of feedback Stackelberg strategies are optimal in the feedback Stackelberg sense for any number of remaining stages. On the other hand, the Stackelberg strategy, open-loop or closed-loop, is tuned to a specific number of stages and to a specific starting time. For such a fixed interval and fixed starting time, the leader's cost corresponding to a feedback Stackelberg strategy may not be as low as that corresponding to the closed-loop Stackelberg strategy. However, the leader's cost associated with the remaining stages-to-go corresponding to the closed-loop Stackelberg strategy may be higher than that corresponding to the feedback Stackelberg strategy. This is because the continuation of the original closed-loop strategy is generally not a closed-loop Stackelberg strategy for the remaining stages-to-go. In contrast, for the last stage, the feedback Stackelberg strategy is also the optimal closed-loop strategy for a one-stage game. The feedback Stackelberg strategy for the next-to-last stage is chosen under the constraint that the strategy for the last stage is a feedback Stackelberg strategy. The feedback control law is computed backward in time in this fashion, as indicated in (25), (26), and (27).

The application of the Stackelberg concept to the cost function in (24) is not the only way we can define a feedback Stackelberg strategy. Suppose that the number of stages is even. Then we might consider that the continuation of any strategy two stages later has the same optimality property in the sense to be defined by the feedback Stackelberg strategy. That is, if $u_{i,k}^{f}$ is a feedback Stackelberg strategy, we might want to constrain the admissible control strategies so that the continuation two stages later is equal to $u_{i,k+2}^{f}$, which is the feedback Stackelberg strategy for a game starting at $k+2$. The resulting cost function to be optimized is

$$J_i = L_i[x(k), u_1(k), u_2(k)] + L_i[x(k+1), u_1(k+1), u_2(k+1)] + V_i[x(k+2), k+2]. \tag{28}$$

Thus, $J_2$ is to be minimized with respect to $u_2(k)$ and $u_2(k+1)$, and $J_1$ is to be minimized with respect to $u_1(k)$ and $u_1(k+1)$ subject to the constraint that $J_2$ is minimized with respect to $u_2(k)$ and $u_2(k+1)$. The resulting control law would be different from the previously defined feedback Stackelberg strategy. To differentiate these feedback concepts, we call the previous one Type 1 feedback Stackelberg and the second one Type 2 feedback Stackelberg. Type $n$ feedback Stackelberg strategies may be similarly considered. For a conventional optimal control problem, all these types yield the same control and they are obtainable by dynamic programming and the principle of optimality. It does not matter whether we minimize a cost function such as in (24) or (28). But in a Stackelberg game situation, each choice yields a different feedback Stackelberg strategy. If $n$ is taken as $N - k$, the Type $n$ feedback Stackelberg strategy becomes the closed-loop Stackelberg strategy.

The leader-follower strategy for two-person games may be extended to multilevel control of large-scale systems. The basic approach has been outlined in [9]. A class of two-level systems has been considered in [13]. In the following sections we will review recent results pertaining to specific classes of linear systems in two types of hierarchy. One hierarchy consists of two levels of decision makers, where the first level is for coordination carried out by a leader, and the second level is occupied by $M$ decision makers behaving as followers who use a Nash strategy with respect to each other. Another hierarchy is a linear $M$-level structure where each decision maker, except the first and last ones, is a leader with respect to succeeding decision makers, but a follower with respect to preceding ones.
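As a numerical check on the static definitions (1), (2), (5), (6), and the comparison (8), the following sketch computes the Stackelberg pair (Player 1 leading) and a Nash pair for one pair of quadratic costs. The costs, the search interval, and the helper `argmin_1d` are assumptions made for illustration. For these costs the follower's reaction is $T_2(u_1) = u_1/2$, so the analytic Stackelberg pair is $(0.8, 0.4)$ and the Nash pair is $(1, 0.5)$.

```python
def argmin_1d(g, lo=-10.0, hi=10.0, iters=100):
    """Ternary search for the minimizer of a unimodal function g on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if g(a) < g(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


# Hypothetical cost functions for the two players.
def J1(u1, u2):
    return (u1 - 1.0) ** 2 + u2 ** 2


def J2(u1, u2):
    return (u2 - u1) ** 2 + u2 ** 2


def T2(u1):
    """Follower reaction of (1): minimize J2 over u2 for the given u1."""
    return argmin_1d(lambda u2: J2(u1, u2))


def T1(u2):
    """Player 1 best response, needed only for the Nash pair."""
    return argmin_1d(lambda u1: J1(u1, u2))


# Stackelberg pair with Player 1 as leader, as in (2).
u1_s = argmin_1d(lambda u1: J1(u1, T2(u1)))
u2_s = T2(u1_s)

# Nash pair of (5) and (6), found here by alternating best responses.
u1_n, u2_n = 0.0, 0.0
for _ in range(50):
    u1_n = T1(u2_n)
    u2_n = T2(u1_n)

leader_stackelberg_cost = J1(u1_s, u2_s)
leader_nash_cost = J1(u1_n, u2_n)
```

The leader's Stackelberg cost is no larger than its Nash cost, in agreement with (8). Alternating best responses converge for this example, but that is not guaranteed for arbitrary cost functions.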




Basic Coordination Concept

Let us consider the basic concept of coordination in a static system with two decision makers, each with a scalar performance index, and each controlling a separate variable, $u_1$ and $u_2$, respectively. Let us suppose that each of the two scalar performance indices is also affected by a third variable $u_0$, which is chosen by a third decision maker called the coordinator. Denote these scalar performance indices by $J_1(u_0, u_1, u_2)$ and $J_2(u_0, u_1, u_2)$. For each value of $u_0$, the controls $u_1$ and $u_2$ are chosen according to a game solution concept appropriate for a particular situation. For example, if $u_1$ and $u_2$ are chosen as Nash equilibrium solutions,

$$J_1(u_0, T_1(u_0), T_2(u_0)) \le J_1(u_0, u_1, T_2(u_0))$$

$$J_2(u_0, T_1(u_0), T_2(u_0)) \le J_2(u_0, T_1(u_0), u_2)$$

where $u_1 = T_1(u_0)$ and $u_2 = T_2(u_0)$ are Nash solutions for the given $u_0$. The coordinator chooses a value for $u_0$ such that a scalar performance index $J_0(u_0, u_1, u_2)$ is minimized subject to the condition that $u_1 = T_1(u_0)$ and $u_2 = T_2(u_0)$. Thus, the coordinator acts as the leader and the two other decision makers act as followers in the Stackelberg sense. The coordinator chooses $u_0^s$ such that

$$J_0[u_0^s, T_1(u_0^s), T_2(u_0^s)] \le J_0[u_0, T_1(u_0), T_2(u_0)]$$

for all $u_0$ in the admissible set. The coordinator performance index could represent a composite function reflecting the welfare of the entire system. For example, the index $J_0$ might be a convex linear combination of $J_1$ and $J_2$:

$$J_0(u_0, u_1, u_2) = \alpha_1 J_1(u_0, u_1, u_2) + \alpha_2 J_2(u_0, u_1, u_2)$$

where $\alpha_1 > 0$, $\alpha_2 > 0$, and $\alpha_1 + \alpha_2 = 1$.

In this case, $u_1$ and $u_2$ might be chosen as Nash equilibrium solutions when the two decision makers cannot be guaranteed to cooperate. However, the introduction of a coordinator which chooses a third control variable enforces a restricted Pareto optimality in the sense that

$$\alpha_1 J_1(u_0^s, u_1^s, u_2^s) + \alpha_2 J_2(u_0^s, u_1^s, u_2^s) \le \alpha_1 J_1(u_0, u_1, u_2) + \alpha_2 J_2(u_0, u_1, u_2)$$

for all admissible $u_0, u_1, u_2$. In the case of the Stackelberg coordination of the Nash decision makers, however, the allowed controls for $u_1$ and $u_2$ are $T_1(u_0)$ and $T_2(u_0)$ for all $u_0$. Without coordination, the variable $u_0$ is assumed to take a nominal value $\bar{u}_0$. With coordination, $u_0$ is chosen as $u_0^s$. Thus, in cases where the controls $u_1$ and $u_2$ have to be chosen without cooperation, a limited type of Pareto optimality can still be achieved by introducing a coordinator.

We consider a linear stochastic discrete-time system with one coordinator at the first level and $M$ decision makers at the second level. For simplicity, we take $M = 2$. The system is represented by

$$x(k+1) = A(k)x(k) + B^0(k)u^0(k) + B^1(k)u^1(k) + B^2(k)u^2(k) + v(k) \tag{29}$$

where $x(k) \in R^n$ is the state, $u^0(k) \in R^{m_0}$ is the control of the coordinator, $u^1(k) \in R^{m_1}$ and $u^2(k) \in R^{m_2}$ are the controls of the two decision makers at the second level, and $v(k)$ is a vector noise disturbance. The quantities $x(0)$ and $v(k)$ are Gaussian random vectors with zero mean and covariance $P(0)$ and $\Lambda(k)$, and the measurement of each decision maker is

$$z^i(k) = H^i(k)x(k) + \xi^i(k) \tag{30}$$

where $\xi^i(k)$ is a Gaussian random vector with zero mean and covariance $\Xi^i(k)$. It is assumed that $x(0)$, $v(k)$, and $\xi^i(k)$ are mutually independent. The cost function for each $i$ is

$$J^i(u^i) = \tfrac{1}{2} x'(N)K^i(N)x(N) + \tfrac{1}{2} \sum_{k=0}^{N-1} \left[ x'(k)Q^i(k)x(k) + (u^i)'(k)R^i(k)u^i(k) \right], \quad i = 0, 1, 2. \tag{31}$$

The matrices $A(k)$, $B^i(k)$, $H^i(k)$, $K^i(N)$, $Q^i(k)$, and $R^i(k)$ are known to all decision makers. It is assumed that $R^i$ and $K^i$ are positive definite and $Q^i$ is positive semidefinite. The problem of finding the feedback Stackelberg strategy for the two-level hierarchy where Decision Makers 1 and 2 play Nash between themselves was recently considered for several information structures by Glankwamdee and Cruz [15]. In this section we summarize some of these results.
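Before the dynamic results, the static coordination concept described at the start of this section can be illustrated numerically. The sketch below uses hypothetical quadratic indices, equal weights $\alpha_1 = \alpha_2 = 0.5$, a nominal value $\bar{u}_0 = 0$, and a grid search over $u_0$; none of these choices come from the paper. For each $u_0$ the two followers' Nash reactions $T_1(u_0)$ and $T_2(u_0)$ are found by alternating their best responses, and the coordinator then minimizes $J_0$ along that reaction curve.

```python
ALPHA1, ALPHA2 = 0.5, 0.5  # assumed weights of the convex combination


# Hypothetical performance indices of the two second-level decision makers.
def J1(u0, u1, u2):
    return (u1 - u0) ** 2 + (u1 - u2) ** 2


def J2(u0, u1, u2):
    return (u2 - 1.0) ** 2 + (u2 - u1) ** 2


def J0(u0, u1, u2):
    """Coordinator index: convex combination of J1 and J2."""
    return ALPHA1 * J1(u0, u1, u2) + ALPHA2 * J2(u0, u1, u2)


def nash_reaction(u0, sweeps=100):
    """Nash solutions (T1(u0), T2(u0)) by alternating best responses."""
    u1, u2 = 0.0, 0.0
    for _ in range(sweeps):
        u1 = 0.5 * (u0 + u2)   # minimizes J1 over u1 for fixed u0, u2
        u2 = 0.5 * (1.0 + u1)  # minimizes J2 over u2 for fixed u1
    return u1, u2


# The coordinator searches a grid of admissible u0 values.
grid = [-2.0 + 0.01 * i for i in range(401)]
u0_s = min(grid, key=lambda u0: J0(u0, *nash_reaction(u0)))

u0_nominal = 0.0
cost_nominal = J0(u0_nominal, *nash_reaction(u0_nominal))
cost_coordinated = J0(u0_s, *nash_reaction(u0_s))
```

For these indices the reaction curve is $T_1(u_0) = (2u_0 + 1)/3$, $T_2(u_0) = (u_0 + 2)/3$, so $J_0 = 2(1 - u_0)^2/9$ along it. The coordinated choice $u_0^s = 1$ drives the composite index to zero, compared with $2/9$ at the nominal value.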

Perfect Information

Here it is assumed that all decision makers have perfect knowledge of the state through their measurements

$$z^0(k) = z^1(k) = z^2(k) = x(k). \tag{32}$$

We seek coordinator strategies which are functions of the state, and follower strategies which are functions of the state and the coordinator control strategy. Denote the resulting expected cost-to-go at stage $k$ by

$$V^i(k) = \tfrac{1}{2} x'(k)S^i(k)x(k) + \tfrac{1}{2}\gamma^i(k), \quad i = 0, 1, 2 \tag{33}$$

for some deterministic matrix $S^i(k)$ and scalar function $\gamma^i(k)$ when feedback Stackelberg strategies are applied. Using the solution concept of Type 1 feedback Stackelberg strategy discussed in the previous section, we have

$$V^i(k) = \min_{u^i(k)} \left[ \tfrac{1}{2} x'(k)Q^i(k)x(k) + \tfrac{1}{2}(u^i)'(k)R^i(k)u^i(k) + E\{V^i(k+1)\} \right], \quad i = 1, 2. \tag{34}$$

For a given feedback control law for the coordinator, the two minimizations in (34) define the Nash game between Decision Makers 1 and 2. Substituting the expression from (33) with $k$ replaced by $k+1$ into (34) and using the state equation in (29), the minimizations yield expressions for $u^1(k)$ and $u^2(k)$ in terms of $A(k)$, $B^i(k)$, $S^i(k+1)$, $Q^i(k)$, $R^i(k)$, $u^0(k)$, and $x(k)$ in the form [15]

$$u^i(k) = -\Delta^i(k)\left[ A(k)x(k) + B^0(k)u^0(k) \right], \quad i = 1, 2. \tag{35}$$

For the coordinator we have

$$V^0(k) = \min_{u^0(k)} \left[ \tfrac{1}{2} x'(k)Q^0(k)x(k) + \tfrac{1}{2}(u^0)'(k)R^0(k)u^0(k) + E\{V^0(k+1)\} \right]. \tag{36}$$

Before performing the minimization in (36), we express $V^0(k+1)$ in terms of $S^0(k+1)$ and $\gamma^0(k+1)$ from (33), the state equation of (29), and the follower control laws from (35). The resulting minimization yields a coordinator control law of the form [15]

$$u^0(k) = -L^0(k)x(k). \tag{37}$$

The matrix gains $L^0(k)$ and $\Delta^i(k)$ are computed from a set of recursive equations backward in time starting with $k = N-1$. The coordinator's control law, i.e., $L^0(k)$, is known in advance to all the $M$ second-level decision makers. The recursive equations are

$$L^i(k) = \left[ R^i(k) + (B^i)'(k)S^i(k+1)B^i(k) \right]^{-1} (B^i)'(k)S^i(k+1), \quad i = 1, 2 \tag{38}$$

$$\Delta^i(k) = \left[ I - L^i(k)B^j(k)L^j(k)B^i(k) \right]^{-1} \left( L^i(k) - L^i(k)B^j(k)L^j(k) \right), \quad i = 1, 2,\ j = 1, 2,\ i \ne j \tag{39}$$

$$\hat{A}(k) = A(k) - B^1(k)\Delta^1(k)A(k) - B^2(k)\Delta^2(k)A(k) \tag{40}$$

$$\hat{B}(k) = B^0(k) - B^1(k)\Delta^1(k)B^0(k) - B^2(k)\Delta^2(k)B^0(k) \tag{41}$$

$$L^0(k) = \left[ R^0(k) + \hat{B}'(k)S^0(k+1)\hat{B}(k) \right]^{-1} \hat{B}'(k)S^0(k+1)\hat{A}(k) \tag{42}$$

$$S^0(k) = Q^0(k) + \hat{A}'(k)S^0(k+1)\hat{A}(k) - (L^0)'(k)\left[ R^0(k) + \hat{B}'(k)S^0(k+1)\hat{B}(k) \right] L^0(k) \tag{43}$$

$$S^i(k) = Q^i(k) + \left[ A(k) - B^0(k)L^0(k) \right]'(\Delta^i)'(k)R^i(k)\Delta^i(k)\left[ A(k) - B^0(k)L^0(k) \right] + \left[ \hat{A}(k) - \hat{B}(k)L^0(k) \right]' S^i(k+1) \left[ \hat{A}(k) - \hat{B}(k)L^0(k) \right], \quad i = 1, 2. \tag{44}$$

Equations (38)-(44) are solved in the sequence presented, starting with $k = N-1$, with boundary conditions

$$S^i(N) = K^i(N), \quad i = 0, 1, 2. \tag{45}$$

The calculations are repeated for $k = N-2$, $k = N-3$, and so forth, until the specified initial time is reached. The $\gamma^i(k)$ in the cost function, $i = 0, 1, 2$, are obtained from an accompanying backward recursion [15].
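The backward pass over (38)-(44) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of [15]: the matrices are taken as time-invariant, the function name `coordinator_recursion` and the two-state example data are hypothetical, and the noise-dependent scalars $\gamma^i(k)$ are not computed.

```python
import numpy as np


def coordinator_recursion(A, B, Q, R, K, N):
    """Backward pass over (38)-(44) with time-invariant data.

    B, Q, R, K are lists indexed by decision maker i = 0 (coordinator), 1, 2.
    Returns the per-stage gains L0[k] and Delta[k][i], and the matrices S^i(0).
    """
    S = [Ki.copy() for Ki in K]  # boundary conditions S^i(N) = K^i(N)
    L0_seq, Delta_seq = [None] * N, [None] * N
    for k in range(N - 1, -1, -1):
        # (38): follower gains computed from S^i(k+1)
        L = {i: np.linalg.solve(R[i] + B[i].T @ S[i] @ B[i], B[i].T @ S[i])
             for i in (1, 2)}
        # (39): Nash reaction gains of the two followers
        Delta = {}
        for i, j in ((1, 2), (2, 1)):
            M = np.eye(B[i].shape[1]) - L[i] @ B[j] @ L[j] @ B[i]
            Delta[i] = np.linalg.solve(M, L[i] - L[i] @ B[j] @ L[j])
        # (40), (41): dynamics as seen by the coordinator
        A_hat = A - B[1] @ Delta[1] @ A - B[2] @ Delta[2] @ A
        B_hat = B[0] - B[1] @ Delta[1] @ B[0] - B[2] @ Delta[2] @ B[0]
        # (42): coordinator gain, u^0(k) = -L0 x(k)
        G = R[0] + B_hat.T @ S[0] @ B_hat
        L0 = np.linalg.solve(G, B_hat.T @ S[0] @ A_hat)
        # (43), (44): cost-to-go matrices at stage k
        A_cl = A_hat - B_hat @ L0
        W = A - B[0] @ L0
        S_new = [Q[0] + A_hat.T @ S[0] @ A_hat - L0.T @ G @ L0]
        for i in (1, 2):
            S_new.append(Q[i] + W.T @ Delta[i].T @ R[i] @ Delta[i] @ W
                         + A_cl.T @ S[i] @ A_cl)
        S = S_new
        L0_seq[k], Delta_seq[k] = L0, Delta
    return L0_seq, Delta_seq, S


# Hypothetical two-state example with one scalar control per decision maker.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = [np.array([[0.0], [0.1]]),
     np.array([[0.1], [0.0]]),
     np.array([[0.05], [0.05]])]
Q = [np.eye(2) for _ in range(3)]
R = [np.eye(1) for _ in range(3)]
K = [np.eye(2) for _ in range(3)]
L0_seq, Delta_seq, S0 = coordinator_recursion(A, B, Q, R, K, N=5)
```

In the noise-free case, simulating the closed loop with $u^0(k) = -L^0(k)x(k)$ and the follower laws (35) should reproduce the costs $\tfrac{1}{2}x'(0)S^i(0)x(0)$, which gives a simple consistency check on the recursion. The linear systems are solved with `numpy.linalg.solve` rather than by forming explicit inverses.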



As in the two-person game discussed in the previous section, the feedback Stackelberg strategy for coordination is an equilibrium strategy in the sense that the continuation strategy after one stage is an optimal feedback Stackelberg strategy for the remainder of the game.

Similar solutions are obtainable for another special information structure, namely, when $z^1(k) = z^2(k)$ and the coordinator's measurement consists of at least $z^1(k)$. This nested information structure includes three subcases: 1) when the lower level decision makers have identical noisy measurements and the coordinator has perfect knowledge of the state; 2) when the lower level decision makers have no measurements and the coordinator has some measurement; and 3) when all measurements are identical. For this nested information structure case, and using the feedback Stackelberg concept, the optimal cost functions for the lower level subsystems can be expressed as a quadratic form in the conditional expectation of the state, given the measurement of the lower level subsystem, plus a term analogous to $\gamma^i(k)$ in the perfect measurement case. For the coordinator, the optimal cost function is expressible as a quadratic form in the conditional expectation of the state, given the coordinator's measurement, and in the difference between the two conditional expectations, given the two different measurements, plus a term analogous to $\gamma^0(k)$. The lower level controls are linear in the conditional expectation of the state given their measurement, and linear in the given coordinator control. The gain matrices are identical to those for perfect measurement, so that a separation principle applies to the lower level decision makers. The coordinator control is linear in the two conditional expectations of the state. These conditional expectations are obtained from Kalman filters. The procedure is analogous to that presented in [8]. Details of the recursive equations and their derivation are given in [15].

Nonnested Information Structure

When the leader does not know all the measurements of the second-level decision makers and/or when the second-level decision makers do not have the same measurement, it is extremely difficult to formulate an optimum feedback Stackelberg problem. However, when the structure of the individual control laws is specified, e.g., when it is constrained to be linear, necessary conditions can be derived. We consider control laws of the form

$$u^i(k) = F^i(k)z^i(k), \quad i = 0, 1, 2 \tag{48}$$

where $F^i(k)$ is to be determined so that the control law is feedback Stackelberg. Necessary conditions for the determination of $F^i(k)$ have been derived in [15]. Using the control (48), where $z^i(k)$ is given in (30), and defining

$$P(k) = E\{x(k)x'(k)\} \tag{49}$$

we obtain

$$P(k+1) = \left[ A(k) + \sum_{i=0}^{2} B^i(k)F^i(k)H^i(k) \right] P(k) \left[ A(k) + \sum_{i=0}^{2} B^i(k)F^i(k)H^i(k) \right]' + \sum_{i=0}^{2} B^i(k)F^i(k)\Xi^i(k)(F^i)'(k)(B^i)'(k) + \Lambda(k). \tag{50}$$

It is assumed that $P(0)$ is given. Given a set of feedback matrix sequences $\{F^i(j)\}$, the second-level cost-to-go expressions may be written as

$$E[J^i(k)] = \tfrac{1}{2}\,\mathrm{tr}\left[ S^i(k)P(k) \right] + \tfrac{1}{2} \sum_{l=k+1}^{N} \mathrm{tr}\left[ S^i(l)\Lambda(l-1) \right] + \tfrac{1}{2} \sum_{l=k+1}^{N} \mathrm{tr}\left[ (F^i)'(l-1)\left( R^i(l-1) + (B^i)'(l-1)S^i(l)B^i(l-1) \right) F^i(l-1)\Xi^i(l-1) \right] \tag{51}$$

where

$$S^i(k) = Q^i(k) + (H^i)'(k)(F^i)'(k)R^i(k)F^i(k)H^i(k) + \left[ A(k) + \sum_{j=0}^{2} B^j(k)F^j(k)H^j(k) \right]' S^i(k+1) \left[ A(k) + \sum_{j=0}^{2} B^j(k)F^j(k)H^j(k) \right], \quad i = 1, 2 \tag{52}$$

$$S^i(N) = K^i(N), \quad i = 1, 2. \tag{53}$$

For the coordinator, $E[J^0(k)]$ is written in the same form,

$$E[J^0(k)] = \tfrac{1}{2}\,\mathrm{tr}\left[ S^0(k)P(k) \right] + \tfrac{1}{2} \sum_{l=k+1}^{N} \mathrm{tr}\left[ S^0(l)\Lambda(l-1) \right] + \tfrac{1}{2} \sum_{l=k+1}^{N} \mathrm{tr}\left[ (F^0)'(l-1)\left( R^0(l-1) + (B^0)'(l-1)S^0(l)B^0(l-1) \right) F^0(l-1)\Xi^0(l-1) \right] \tag{54}$$

where

$$S^0(k) = Q^0(k) + (H^0)'(k)(F^0)'(k)R^0(k)F^0(k)H^0(k) + \left[ A(k) + B^0(k)F^0(k)H^0(k) + B^1(k)F^1(k)H^1(k) + B^2(k)F^2(k)H^2(k) \right]' S^0(k+1) \left[ A(k) + B^0(k)F^0(k)H^0(k) + B^1(k)F^1(k)H^1(k) + B^2(k)F^2(k)H^2(k) \right] \tag{55}$$

$$S^0(N) = K^0(N). \tag{56}$$

In accordance with the feedback Stackelberg concept, we consider strategies whose continuations after one stage are feedback Stackelberg strategies for the remaining stages. Thus, we write

$$E\{J^i(k)\} = E\left\{ \tfrac{1}{2}\left[ x'(k)Q^i(k)x(k) + (u^i)'(k)R^i(k)u^i(k) \right] \right\} + E\{J^i(k+1)\}, \quad i = 0, 1, 2 \tag{57}$$

where $E\{J^i(k+1)\}$ is obtained from (51) with $k$ replaced by $k+1$, and $\{F^i(j)\}$ from $j = k+1$ to $j = N-1$ are the feedback Stackelberg matrices. Then $E\{J^1(k)\}$ is minimized with respect to $F^1(k)$ and $E\{J^2(k)\}$ is minimized with respect to $F^2(k)$. These minimizations yield expressions for $F^1(k)$ and $F^2(k)$ in terms of $F^0(k)$ and the other matrices that appear in (51). These matrices $F^1(k)$ and $F^2(k)$ are substituted in (57) for $i = 0$, and the resulting expression for $E\{J^0(k)\}$ is minimized with respect to $F^0(k)$. This yields an expression for $F^0(k)$ in terms of the matrices appearing in (57) and (51) except $F^1(k)$ and $F^2(k)$. Combining these equations, we can obtain coupled difference equations in $S^i(k)$, for $i = 0, 1, 2$, and $P(k)$, with boundary conditions $P(0)$ and $S^i(N)$. Thus, the feedback Stackelberg matrices $F^i(k)$ are expressed in terms of solutions of a two-point boundary value problem. We note that even in the case of a standard stochastic optimal control problem where the control law is constrained to be a linear function of the measurement, as in (48), a two-point boundary value problem arises [18]. The game problem cannot be expected to be simpler. Details of the two-point boundary value problem are in [15].

The resulting feedback Stackelberg matrices $\{F^i(k),\ k = 0, 1, \ldots, N-1;\ i = 0, 1, 2\}$ are functions of $P(0) = E\{x(0)x'(0)\} = m_0 m_0' + \mathrm{cov}[x(0)]$, where $m_0$ is the mean of $x(0)$ and $\mathrm{cov}[x(0)]$ is the covariance of $x(0)$. Thus, these feedback matrix sequences are based on data at the start of the game, $k = 0$. Since measurements are obtained at each sampling instant, updated estimates of $P(k)$ might be available. For example, suppose that at time $k = r$, there is a new estimate of $P(r)$. A new set of feedback Stackelberg matrices $\{F^i(k),\ k = r, r+1, \ldots, N-1;\ i = 0, 1, 2\}$ could be computed. These new sequences are functions of $P(r)$. In principle, an updated set of feedback Stackelberg sequences for the remaining stages-to-go could be considered at each stage $r$ when a new estimate of $P(r)$ is available.

The method described above may be extended to dynamic output feedback controllers of specified order. Represent the $i$th subsystem controller by





(58) where w iE RS' is the state vector of the controllers used. Then

in this section. We model the interconnected system by i = f ( x , u i ; i = O , 1;.




where x E R is the state, uj E R nz is the control of the ith decisionmaker, and the cost function for each decision maker is

The index i = O corresponds to the coordmator. The time instants to and tf are fixed. State measurements are made at r discrete instants of time {ti E [to,tf), i = O , l , . ,r - 1}. The controls are allowed to be functions of time f and the latest state measurement. Thus, for all i, uj= ui(r,x(tj)), for $ 9 r < I . Before time to, the coordinator announces h ~ s control law u,(t,r;i.> for f~[tj,tf], forj=O, I;. , r - 1. The second-level declslon makers take this given coordinator strategy into account in computing their individual sampled data strategy based on a Nash solution concept amongthemselves.The leader, talung intoaccount the reaction strategies of all the second-level decision makers, determines a sampled data strategy to minimize his own cost function, subject to the constraint that the remaining strategy starting from the nextsampling instant is also optimal in the feedback Stackelberg sense. This permits us to relate the optimum cost-to-go, in the feedback Stackelbergsense, at any sampling time $ to the optimal costto-go at sampling time Let the sampled data feedback Stackelberg costs to go at time $ bedenotedby K(x(tj), tj), i =0,1,. . ,m. By definition, for the interval [$$+ 1) tj+



vi (X,,$)= minu,

{ V,i(Xj+~,tj+~) + [""L,(x,u,;

ui(k)=Ni(k)wi(k)+~'(k)zi(k), i=O,1,2 (59) where zi(k) is the measurement (30). For a given s i (09s i < n), the matrices D ' ( k ) , M'(k), N i ( k ) , and F i ( k ) are to be found so that the controls are optimal in the feedback Stackelberg sense. For si=O, the problemis identical to the one considered previously. By augmenting the state space and byaugmenting the measurement z i with wi, the problem may be transformed to the same type considered previously. Details are given in [ 151.


k=O, 1;.

. ,m)dr) (62)


and where u,, for k # i are at their optimal values. For k = 1, . ,m, and for each u,, (62) is the usual condition for a Nash equilibrium solution. For i=O, the minimization of (62) is camed out under the constraint that the other controls u,, k f O are chosen to satisfy (62) for all i#O, for each u,. For each ~ . ( X ~ + ~ , $the + J problem , 111.SAMPLED DATACOORDINATEDCONTROL OF posed above is an open-loop Stackelberg problem with INTERCOhmmD Commwous-Tm SYSTEMS one leader and several followers for a game in the interval In this section we consider the two-level control of where the followersplay a Nashgame among interconnected continuous-time systems,where the first- themselves [6]. The problem is much more complex, howlevel decision maker is a coordinator and the second-level ever, because V,(x,+ I , $ + l ) is also to be determined. The decision makers are followers in the Stackelberg sense, sampled-data feedback Stackelberg concept is similar to and where the decisionmakershavesampled data state the feedback Stackelberg concept for discrete systems in as in (62) measurements. This problem has been examined in [I61 the sense that y.(x,,tj)is related to V,(xj+ where necessary conditions have been derived, and where or (28). However, for the sampled data case, an open-loop efficient solution algorithms havebeen derived for the control time function between sampling times is required. linear quadratic case. We briefly review the results in [16] The usual dynamic programming approach is not applica-

25 1


ble to suchproblems, but a variational method can be used. For each second-level decision maker define a Hamiltonian

H;(x,~;,u~;~~,u,)=L~(x,u~; k=O,l,-..,m) +pif(x,uk; k = 0 , 1 , - - - , m ) .


For any given uo, the necessary conditions for optimality for i = l , . - - , m are i=f(x,ui; i=O, 1;

* *


X($)= X,



aH; O= au;

The necessary conditions, for t E[5, $+ 1; ,m are

where $y_i(t_j^-) = \lim_{t \to t_j^-} y_i(t)$ for $y_i$ defined on the $(j-1)$st interval $[t_{j-1}, t_j)$ and $y_i(t_j^+) = y_i(t_j)$ defined on the $j$th interval. An efficient procedure for solving this complicated $(r+1)$-point boundary value problem is given in [16] for the linear-quadratic case. The controls are expressible in the form

$$u_i(t, x_j) = K_i(t)\,x(t_j) \tag{73}$$

for $j = 0, \ldots, r$, for $t \in [t_j, t_{j+1}]$. The gain matrices $K_i(t)$ are obtained by integration of a set of linear differential equations over one sampling interval. A matrix inversion of dimension $n$ is needed at each sampling instant. For details see [16]. For each interval, the solution procedure is the same as that for open-loop Stackelberg strategies, except that the boundary conditions are in terms of optimal cost-to-go functions, which are reminiscent of feedback Stackelberg strategies for discrete-time games. The sampled-data Stackelberg strategy thus has features of both open-loop Stackelberg strategies for continuous-time games and feedback Stackelberg strategies for discrete-time games.

The basic Stackelberg strategy for two-level sequential decision making can be generalized to three or more levels, as outlined in [9]. For simplicity, we consider only one decision maker at each level, thus yielding a linear hierarchy. Three cases have been treated recently: 1) open-loop multilevel Stackelberg strategies for continuous-time systems [10], 2) closed-loop multilevel Stackelberg strategies for continuous-time systems [12], and 3) feedback multilevel Stackelberg strategies for discrete-time systems [19]. These results are briefly summarized here.

Open-Loop Multilevel Stackelberg Strategies

Consider a three-level Stackelberg problem for a linear system

$$\dot{x} = Ax + B_1u_1 + B_2u_2 + B_3u_3 \tag{74}$$

with associated cost function

$$J_i(u_1, u_2, u_3, x_0) = \tfrac{1}{2}\int_{t_0}^{T}\Big(x'Q_ix + \sum_{j=1}^{3} u_j'R_{ij}u_j\Big)\,dt + \tfrac{1}{2}\,x(T)'F_ix(T) \tag{75}$$

for each decision maker $P_i$. $P_1$ is the follower at the bottom of the linear hierarchy. He knows the controls $u_2$ and $u_3$ of the other decision makers. $P_2$ is the middle decision maker who knows $u_3$, but he knows that $P_1$ reacts according to declared functions $u_2$ and $u_3$. $P_3$ is the leader who knows that $P_2$ reacts according to his declared control $u_3$, and who takes into account the reaction of $P_1$ to declared controls $u_2$ and $u_3$. Necessary conditions for this problem are derived in [10]. For $P_1$ the necessary conditions are (74),

$$\dot{p}_1 = -Q_1x - A'p_1, \qquad p_1(T) = F_1x(T) \tag{76}$$

and

$$0 = R_{11}u_1 + B_1'p_1. \tag{77}$$

Assuming that $R_{11}$ is positive definite, control $u_1$ may be expressed as

$$u_1 = -R_{11}^{-1}B_1'p_1. \tag{78}$$
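The follower's conditions (74), (76), and (78) reduce to an ordinary linear-quadratic problem once the higher-level controls are fixed. The sketch below is only an illustration under stated assumptions: it takes $u_2 = u_3 \equiv 0$ and assumes the linear relation $p_1 = Kx$, which turns (74), (76), and (78) into the Riccati equation $-\dot{K} = A'K + KA - KS_1K + Q_1$, $K(T) = F_1$, with $S_1 = B_1R_{11}^{-1}B_1'$. The function name `follower_riccati_gain` and the fixed-step Runge-Kutta integrator are hypothetical choices, not taken from [10].

```python
import numpy as np

def follower_riccati_gain(A, B1, Q1, R11, F1, t0, T, steps=400):
    """Integrate -dK/dt = A'K + KA - K S1 K + Q1, K(T) = F1, backward to t0.

    Assumes u2 = u3 = 0 and p1 = K x (illustrative simplifications).
    Returns K(t0) and the feedback gain G with u1(t0) = -G x(t0).
    """
    S1 = B1 @ np.linalg.solve(R11, B1.T)       # S1 = B1 R11^{-1} B1'

    def rhs(K):
        # Derivative with respect to reversed time s = T - t.
        return A.T @ K + K @ A - K @ S1 @ K + Q1

    h = (T - t0) / steps
    K = np.array(F1, dtype=float)              # start from the terminal condition
    for _ in range(steps):                     # classical fourth-order Runge-Kutta
        k1 = rhs(K)
        k2 = rhs(K + 0.5 * h * k1)
        k3 = rhs(K + 0.5 * h * k2)
        k4 = rhs(K + h * k3)
        K = K + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    gain = np.linalg.solve(R11, B1.T @ K)      # R11^{-1} B1' K(t0), cf. (78)
    return K, gain

# Scalar check: A = 0, B1 = Q1 = R11 = 1, F1 = 0 gives K(t) = tanh(T - t).
one = np.array([[1.0]])
zero = np.zeros((1, 1))
K0, gain0 = follower_riccati_gain(zero, one, one, one, zero, 0.0, 1.0)
print(K0[0, 0], np.tanh(1.0))
```

In the scalar case the Riccati equation has the closed-form solution $K(t) = \tanh(T - t)$, which gives a quick sanity check on the integrator. With nonzero $u_2$ and $u_3$ the costate $p_1$ also depends on the declared controls, so this sketch does not cover the full hierarchical problem.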





Notice that the controls $u_2$ and $u_3$ influence the costate vector $p_1$ and, hence, $u_1$ depends on $u_2$ and $u_3$. Substituting (78) in (74), we have

$$\dot{x} = Ax - S_1p_1 + B_2u_2 + B_3u_3, \qquad S_1 = B_1R_{11}^{-1}B_1', \qquad x(t_0) = x_0. \tag{79}$$

Substituting (78) in (75) for $i = 2$, we have

$$J_2 = \tfrac{1}{2}\int_{t_0}^{T}\big(x'Q_2x + p_1'S_{21}p_1 + u_2'R_{22}u_2 + u_3'R_{23}u_3\big)\,dt + \tfrac{1}{2}\,x(T)'F_2x(T) \tag{80}$$

where $S_{21} = B_1R_{11}^{-1}R_{21}R_{11}^{-1}B_1'$. The necessary conditions that characterize $u_2$ minimizing (80) under the constraints (79) and (76) are (79), (76), and

$$\dot{p}_2 = -Q_2x - A'p_2 + Q_1n_1, \qquad p_2(T) = F_2x(T) - F_1n_1(T) \tag{81}$$

$$\dot{n}_1 = -S_{21}p_1 + S_1p_2 + An_1, \qquad n_1(t_0) = 0 \tag{82}$$

and, assuming that $R_{22}$ is positive definite,

$$u_2 = -R_{22}^{-1}B_2'p_2. \tag{83}$$

Substituting $u_2$ from (83) in (79), we have

$$\dot{x} = Ax - S_1p_1 - S_2p_2 + B_3u_3, \qquad S_2 = B_2R_{22}^{-1}B_2', \qquad x(t_0) = x_0. \tag{84}$$

Equation (77) represents the reaction of $P_1$ to a given $u_2$ and $u_3$, and (81) and (82) represent the reaction of $P_2$ to a given $u_3$. Equation (84) is the state equation for a given $u_3$ using the reactions of $P_1$ and $P_2$. Substituting the controls $u_1$ from (78) and $u_2$ from (83) in (75) for $i = 3$, we have

$$J_3 = \tfrac{1}{2}\int_{t_0}^{T}\big(x'Q_3x + p_1'S_{31}p_1 + p_2'S_{32}p_2 + u_3'R_{33}u_3\big)\,dt + \tfrac{1}{2}\,x(T)'F_3x(T) \tag{85}$$

with $S_{31}$ and $S_{32}$ defined analogously to $S_{21}$. The necessary conditions for $u_3$ minimizing (85) are

$$\dot{x} = Ax - S_1p_1 - S_2p_2 - S_3p_3, \qquad S_3 = B_3R_{33}^{-1}B_3', \qquad x(t_0) = x_0 \tag{86}$$

$$\dot{p}_3 = -Q_3x - A'p_3 + Q_1n_2 + Q_2n_3, \qquad p_3(T) = F_3x(T) - F_1n_2(T) - F_2n_3(T) \tag{87}$$

$$\dot{n}_2 = -S_{31}p_1 + S_1p_3 + An_2 + S_{21}w, \qquad n_2(t_0) = 0 \tag{88}$$

$$\dot{n}_3 = -S_{32}p_2 + S_2p_3 + An_3 - S_1w, \qquad n_3(t_0) = 0 \tag{89}$$

$$\dot{w} = -Q_1n_3 - A'w, \qquad w(T) = F_1n_3(T) \tag{90}$$

and, assuming that $R_{33}$ is positive definite,

$$u_3 = -R_{33}^{-1}B_3'p_3. \tag{91}$$

By using the relations

$$p_i = K_ix \tag{92}$$

$$n_i = P_ix \tag{93}$$

$$w = Wx \tag{94}$$

for $i = 1, 2, 3$, one obtains coupled quadratic matrix differential equations in $K_i$, $P_i$, and $W$ with boundary conditions at $t_0$ and $T$ [10]. The open-loop Stackelberg strategies are

$$u_i = -R_{ii}^{-1}B_i'K_i\,\phi(t, t_0)\,x_0, \qquad i = 1, 2, 3 \tag{95}$$

where $\phi(t, t_0)$ is the fundamental matrix of the system

$$\dot{x} = (A - S_1K_1 - S_2K_2 - S_3K_3)\,x, \qquad x(t_0) = x_0. \tag{96}$$

In [10] it is shown that the two-point boundary value problem can be converted to a higher order matrix Riccati differential equation with a given terminal condition. The coefficient matrices of this higher order Riccati equation do not possess the symmetry and positive semidefiniteness of usual optimal control problems. However, in [10] it is shown that if the $4n \times 4n$ solution of the Riccati equation is partitioned into four $2n \times 2n$ matrices, the block off-diagonal matrices are symmetric and the block diagonal matrices are transposes of each other. Furthermore, if a solution exists, then the block off-diagonal matrices are positive semidefinite.

Closed-Loop Multilevel Stackelberg Strategies

The determination of necessary conditions for optimality in the closed-loop Stackelberg sense is very difficult, even for linear-quadratic problems. In [12], linear closed-loop Stackelberg strategies are considered for linear systems with quadratic cost functions, where it is shown that the optimal closed-loop Stackelberg strategies for such problems are nonlinear functions of the initial state and the state. By assuming that the initial state is random and taking the expectation of the original cost function as a new cost function, linear closed-loop strategies may be optimal closed-loop Stackelberg strategies provided certain matrix differential equations have bounded solutions. For simplicity, it is assumed that the strategies are of the form

$$u_i = -L_i(t)\,x \tag{97}$$

where the feedback matrices $L_i(t)$ are bounded. When the linear controls in (97) are substituted in (74) and (75), it is clear that $J_i$ can be expressed as

$$J_i = \tfrac{1}{2}\,x_0'M_i(t_0)\,x_0 \tag{98}$$

where $M_i(t)$ satisfies the Lyapunov equation

$$\dot{M}_i + A_c'M_i + M_iA_c + \sum_{j=1}^{3} L_j'R_{ij}L_j + Q_i = 0, \qquad M_i(T) = F_i \tag{99}$$

$$A_c = A - \sum_{j=1}^{3} B_jL_j. \tag{100}$$
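The cost evaluation in (98) through (100) can be sketched numerically. The example below is a minimal illustration, assuming time-invariant matrices and constant feedback gains $L_j$ (the text allows time-varying $L_j(t)$); the function names `closed_loop_cost_matrix` and `expected_cost` are hypothetical. It integrates the Lyapunov equation (99) backward from $M_i(T) = F_i$ and then forms $\tfrac{1}{2}\operatorname{tr} M_i(t_0)$, the expected cost for a zero-mean initial state with unity covariance.

```python
import numpy as np

def closed_loop_cost_matrix(A, Bs, Ls, R_i, Q_i, F_i, t0, T, steps=400):
    """Integrate the Lyapunov equation (99) backward from T to t0.

    Bs, Ls: lists of input matrices B_j and constant gains L_j (assumed constant).
    R_i:    list of weights R_ij appearing in decision maker i's cost.
    Returns M_i(t0), so that J_i = 0.5 * x0' M_i(t0) x0 as in (98).
    """
    A_c = A - sum(B @ L for B, L in zip(Bs, Ls))                 # closed-loop matrix (100)
    Q_tot = Q_i + sum(L.T @ R @ L for L, R in zip(Ls, R_i))      # state plus control weights

    def rhs(M):
        # dM/ds with reversed time s = T - t, from (99).
        return A_c.T @ M + M @ A_c + Q_tot

    h = (T - t0) / steps
    M = np.array(F_i, dtype=float)
    for _ in range(steps):                                       # fourth-order Runge-Kutta
        k1 = rhs(M)
        k2 = rhs(M + 0.5 * h * k1)
        k3 = rhs(M + 0.5 * h * k2)
        k4 = rhs(M + h * k3)
        M = M + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return M

def expected_cost(M_t0):
    """Expected cost 0.5 * tr M_i(t0) for zero-mean x0 with unity covariance."""
    return 0.5 * np.trace(M_t0)

# Scalar check with one player: A = 0, B = L = R = Q = 1, F = 0, T = 1.
# Then A_c = -1 and M(t) = 1 - exp(-2 (T - t)).
one = np.array([[1.0]])
M0 = closed_loop_cost_matrix(np.zeros((1, 1)), [one], [one], [one], one,
                             np.zeros((1, 1)), 0.0, 1.0)
print(M0[0, 0], 1.0 - np.exp(-2.0))
```

Evaluating the cost this way is what allows the gains $L_j$ to be treated as the decision variables: each candidate set of gains maps to a matrix $M_i(t_0)$ without simulating individual trajectories.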



The initial state $x_0$ is assumed to be random with zero mean and unity covariance matrix. The new cost function is

$$\bar{J}_i = E\{J_i\} = \tfrac{1}{2}\operatorname{tr} M_i(t_0)\,E\{x_0x_0'\} = \tfrac{1}{2}\operatorname{tr} M_i(t_0). \tag{101}$$

The linear closed-loop Stackelberg problem can now be restated as an open-loop Stackelberg problem where the matrices $L_i$ are the controls, the cost functions are $\bar{J}_i$ in (101), and the matrix differential equations in (99) are the constraints. The lowest level decision maker $P_1$ chooses $L_1$ and the highest level decision maker $P_3$ chooses $L_3$. $\bar{J}_1$ can be minimized with respect to $L_1$, given $L_2$ and $L_3$, subject to (99) using the matrix minimum principle or a standard variational approach, yielding

$$\dot{M}_1 + A_c'M_1 + M_1A_c + M_1S_{11}M_1 + L_2'R_{12}L_2 + L_3'R_{13}L_3 + Q_1 = 0 \tag{102}$$

$$M_1(T) = F_1 \tag{103}$$

where $S_{11} = B_1R_{11}^{-1}B_1'$ and

$$L_1 = R_{11}^{-1}B_1'M_1. \tag{105}$$

With $L_1$ chosen as in (105), minimization of $\bar{J}_2$ with respect to $L_2$ with the constraints (102) and (99) for $i = 2$ yields a set of matrix Riccati-type differential equations and one algebraic matrix equation which is linear in $L_2$. Finally, we minimize $\bar{J}_3$ with respect to $L_3$ subject to the constraints (99) for $i = 3$, (102), (103), (105), and the additional Riccati-type equations from the minimization of $\bar{J}_2$ with respect to $L_2$. This yields more Riccati-type differential equations and one algebraic matrix equation which is linear in $L_3$. Thus, a large set of matrix Riccati-type differential equations must be solved with boundary conditions at $t = t_0$ and $t = T$. These necessary conditions have been derived in [12]. When the matrices in (74) and (75) are time-invariant and when $T \to \infty$, these differential equations are replaced by algebraic equations which are obtained by deleting the time derivative terms. An algorithm for this problem is suggested in [12].

Feedback Multilevel Stackelberg Strategies

Consider the linear discrete-time system

$$x(k+1) = A(k)\,x(k) + \sum_{i=1}^{M} B_i(k)\,u_i(k), \qquad x(0) = x_0. \tag{106}$$

Each decision maker has his own cost function. We consider the decision makers in a linear hierarchy where the top decision maker is $P_1$, who chooses $u_1$, and the last decision maker $P_M$ chooses $u_M$. We examine feedback Stackelberg strategies where the controls are functions of the state, and where we require that continuations of the strategies starting at time $j$, for $j > 0$, are also feedback Stackelberg strategies for games starting at time $j$. The solution for this problem is given in [19], where Riccati-type equations for the feedback gain matrices are derived. The $i$th control can be expressed as a feedback of the state whose gain matrices satisfy recursions indexed by $i = 1, \ldots, M-1$ in (109) and by $i = 1, \ldots, M$ in (110); the explicit expressions are given in [19].

The matrix product notation used above is defined by

$$\prod_{i=1}^{n} G_i = G_n G_{n-1} \cdots G_1. \tag{114}$$

Sufficient conditions for the existence of the inverses and minima of the cost functions are a positive definite terminal weighting at stage $N$, $Q_i(j) \ge 0$, and $R_{ii}(j) > 0$, $i = 1, \ldots, M$; $j = 0, \ldots, N-1$. After all the feedback controls are substituted in (106), the state equation becomes

[l] H.von Stackelberg, The Theory of the Market Economy. Oxford, England: Oxford Univ. Press, 1952. 121 C. I. Chen and J. B. Cruz. Jr.. “Stackelbere solution for two-wrson games -with biased inforktion patterns,” IEEE Trans. AAomat. Contr. vol. AC-17, pp. 791-798, 1972. M. Simaan and J. B; Cruz, Jr., “On the Stackelberg strategy in nonzerc-sumgames, J. Opt. Theory Appf., vol.11, no. 5, pp. 533-555,1973. -‘‘Additi:nd aspects of theStackelbergstrategy in nonzerosum games, J. Opt. Theory Appl., vol.11, pp.613-626, no. 6, 1977 I, e.,

x ( k + l)=+(k)x(k).

(1 15)

From the form of +(k) in (113), it isseen that each decision maker, starting with the top, shifts the eigenvalues of A ( k ) , and the final shifted A matrix after all decision makers have acted, is as noted in [19].



T. Basar, “On the relativeleadershipproperty of Stackelberg strategies,” J. Opt. 7’heory Appl., vol. 11, pp. 655-661, June 1973. M. Simaan and J. B. Cruz, Jr., “A Stackelberg strategy for games with many players,” IEEE Trans. Automat. Contr. vol. AG18, no. 3, pp. 322-324,1973. J. B. Cruz,Jr.,“Survey of Nash and Stackelbergequilibrium strategies in dynamicgames,” Annals of Economic and Social Measurement, vol. 4, no. 2, pp. 339-344, 1975. D. Castanon and M. Athans, “On stochastic dynamic Stackelberg strategies,” Automatica, vol. 12, pp. 177-183,1976. J. B. Cruz, Jr., “Stackelberg strategies for multilevel systems,” in Directions in Large Scale @stems, Y. C. Ho and S. K. Mtter, Eds. New York: Plenum, 1976, pp. 139-147. J.Medanic and D. Radojevic, “On themultilevelStackelberg strategies in linear quadratic systems,” J. Opt. Theory AppL, vol. 24, 1978. [I]] B. F. Gardner, Jr. and J. B. Cruz, Jr., “Feedback Stackelberg strategy for a two player game,” IEEE Trans. Automat. C o w . vol. AC-22, pp. 270-271, Apr. 1977. [12] J. Medanic, -closed-loop Stackelberg strategies in h e a r - q u h a t i c problems,” inProc. I977 JACC, San Francisco, CA, June 1977, pp. 1324-1329. [13] M. Simaan, “Stackelberg optimizationof two-level systems,” IEEE Tram.Syst., Man, Cybem, vol.SMC-7, pp. 554-557, July 1977. [14]D. Castanon, “Equilibria in stochastic dynamic games of Stackelberg type,” ElectronicSyst.Lab.,M.I.T.,Rep.ESL-R-662,May 1976. [15] S. Glankwamdee and J. B. Cruz,Jr.,“DecentralizedStackelberg strategies for interconnectedstochasticdynamicsystems,” to be presented at the 7th Triennial World Congr. of IFAC, June 1978, Helsinki, Finland, June, 1978; also, UIUC, Decision and Contr. Lab., Rep. DC-1, Mar. 1977. [16] P. M. Walsh and J. B. Cruz,Jr., “A sampled data Stackelberg coordination scheme for themulticontrollerproblem,” in Proc. I977 IEEE Con$ on Decision and Control, pp. 108-114,New Orleans, L A ; also UIUC, Decision and Contr. Lab., Rep. DC-3, Apr. 1977. [I71 K. 
Okugucl,i, and in o~gopoly in Lecture Notes in Economics and Mathematical Systems, Mathematical Economics, vol.138.New York: Springer-Verlag, 1976. [18]C.M. Enner and V. D. VandeLinde, ‘‘Outxt feedback gains for a linear-discrete stochastic control problem, IEEE Tram. Automat. Contr. vol. AC-18, pp. 154-157, Apr. 1973.“ [19] B. F. Gardner, Jr. and J. B. Cruz, Jr.,FeedbackStackelberg strategy for M-level herarchid games,” IEEETrans.Automaf. Conrr., vol. AC-23, June 1978, to be published.

In large-sale system where there is a Of decisionmakers and whereeach decision maker has a different Performance god, it is natural to consider the control problem as a differential gameproblem. In t h s paper we have reviewed Some Of the recent work On Stackelberg strategies which are relevantwhen sequential decision making is appropriate and desirable. In general, the leader or top decisionmaker has the mostcomplicated optimization problembecausehe has to consider the optimal reactions of alldecisionmakerswho act after him. The decision makerwho acts last in a linear hierarchy has an ordinary optimal control problem. Although the followers optimize their own performance indexgiven the controls of previous decision makers, the leader chooses a control which optimizes his own performanceindex considering that the followerswill react optimally- In a sense, the leader influences the followers to choose controls which are beneficial to the leader. decision structure is One where there are severaldecisionmakerswho act simultaneously at a given level, and there is more than one level where level actions are sequential. In the paper we considered the case when there are only two levels. The first level has only one decision maker, called the coordinator, who acts first, and Jose B. Cmz, Jr. (S’56-M57SM61-F68) rethe second level, as a group, reacts to the action of the ceived the B.S.E.E. degree (summa cum laude) from the University of the Philippines, Diliman, coordinator. The second-level decision makers act simulin 1953, the S.M. degree from the Massachusetts taneously according to some game solution concept, such lnstitute of Technology, Cambridge, in 1956, as the Nash equilibrium treated in the paper. The and the Ph.D. 
degree fromUniversity the of Illinois, Urbana, in 1959, all in electricalenleader-follower strategy concept treated in the paper 953 to 1954 he taught at the University could provide a basis for the study of coordination in a : of thePhilippines.He was aResearchAssistant large-scale system. Although the decision makers who act in the M.I.T. Research Laboratory of Electronsimultaneously~and in a decentralized manner,adopt ics,Cambridge,from1954 to 1956. Since 1956he has been with the noncooperative strategies, the introduction of a coordina- Department of Electrical Engineering University of Illinois, wherehe tor who chooses additional control variables could alter Was an Instructor until 1959, an AssistantProfessorfrom 1959 to 1961, the framework in which the noncooperahg decision makers act. This influence could be exploited to improve overall system perfomance allowing the other’decisionmakers to continue pursuing their individual original objectives. was he

an AssociateProfessorfrom 1961 to 1965, and Professor since 1965. Also, he currently a R m c h Professor at the Coordinated Science Laboratory, University of Illinois, where he is Ihrector of the Decision and Contrh Laborat&. In 1964 he was a VisitingAssociate Professor at the University of California, Berkeley, and in 1967 he was an Associate of the Center for Advanced university of Illinois. In the Fall of 1973 Professor Visiting a at M.I.T. at and Harvard University.




His areas of research are hierarchical control of multiple goal systems, decentralized control of large-scale systems, sensitivity analysis, and stochastic control of systems with uncertain parameters. He has written more than 90 papers in technical journals, coauthored three textbooks, and has served as Editor for two books. Within the IEEE Control Systems Society, Dr. Cruz served as a member of the Administrative Committee, Chairman of the Linear Systems Committee, Chairman of the Awards Committee, Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, a member of the Information Dissemination Committee, Chairman of the Finance Committee, General Chairman of the 1975 IEEE Conference on Decision and Control, and Vice President for Financial and Administrative Activities. He is President-Elect for 1978 of the IEEE Control Systems Society. At the Institute level, he served as a member of the IEEE Fellow Committee, member of the IEEE Education Activities Board, and Chairman of a committee for the revision of the IEEE Guidelines for ECPD Accreditation of Electrical Engineering Curricula in the United States. Presently he is a member of the Meetings Committee of the IEEE Technical Activities Board and a member of the IEEE Education Medal Committee. In 1972 he received the Curtis W. McGraw Research Award of the American Society for Engineering Education. He is a member of Phi Kappa Phi, Sigma Xi, and Eta Kappa Nu. He is listed in American Men and Women of Science, Who's Who in America, and Who's Who in Engineering. Dr. Cruz is a Registered Professional Engineer in the State of Illinois.

Specific Structures for Large-Scale State Estimation Algorithms Having Information Exchange

CHARLES W. SANDERS, MEMBER, IEEE, EDGAR C. TACKER, SENIOR MEMBER, IEEE, THOMAS D. LINTON, AND ROBERT Y.-S. LING

Abstract: This paper considers the design and evaluation of large-scale state estimation algorithms having specific structures which allow the subsystems to exchange information over noisy channels. The specific structures which are presented are first motivated by considering the relative performance between the surely locally unbiased filter and a global dynamics filter. The role of the surely locally unbiased filter in evaluating the tradeoffs between the cost of information transfer and filter performance is examined and a theorem is presented which forms the basis for an algorithm for calculating channel noise crossover levels. The theoretical results are illustrated via an application to a power system model.

I. INTRODUCTION

INCREASINGLY complex processes together with an increasingly broad spectrum of available hardware have necessitated a reexamination of the tradeoffs between the various information structures on which system monitoring and control is based [1]-[3]. In addition to the fundamental information-theoretic aspects of decentralized structures examined in [4]-[6], the problem of system stabilizability for these structures has been treated in [7]-[9]. State estimation techniques which are compatible with completely decentralized information structures have been considered in [12]-[14] and it has been shown [12], [13] that effective algorithms from this class can be developed. The primary objective of the present paper is to motivate and explore some specific state estimation algorithms for the case in which information exchange is permitted over noisy communication channels.

After identifying the class of systems under consideration and the relevant assumptions being employed, two singular information patterns which can be used to bound the performance improvement attainable through information exchange are discussed in Section II. It is shown that the information exchange residing in the subsystem interaction measurements can be used to obtain effective local filters. Interaction measurement noise crossover levels which provide a measure of effectiveness of the interaction measurements are introduced and a theorem is presented which suggests an algorithm for computing these levels. The results of Section II are then used in Section III to motivate two specific structures having information exchange and a numerical example illustrating the performance capability of the resulting algorithms is presented.

Consider a system which can be modeled as a given collection $\{S_i : i = 1, 2, \ldots, N\}$ of $N$ interconnected dynamical subsystems $S_i$. On the time interval $[0, \infty)$ each $S_i$

Manuscript received March 15, 1977; revised September 7, 1977. This work was supported in part by the National Science Foundation under Grant ENG 75-13399. The authors are with the Department of Electrical Engineering and the Systems Engineering Program, University of Houston, Houston, TX 77004.



© 1978 IEEE
