Multistage Stochastic Linear Programming: Aggregation, Approximation, and Some Open Problems*

Peter Fúsek†   Peter Kall†   János Mayer†   Suvrajeet Sen‡   Simon Siegrist†

*Presented at the Conference on Stochastic Optimization, Gainesville, FL (Feb. 2000)
†IOR, University of Zurich
‡SIE, University of Arizona, Tucson

Abstract. The purpose of this paper is to investigate the possibility of computationally approximating multistage stochastic linear programs having arbitrary underlying probability distributions by programs with finite discrete distributions; to begin with, we treat just the special case in which only the right-hand side is random.

1  Introduction to MSLP

Given a probability space $(\Omega, \mathcal{G}, P)$, random vectors $\xi_t : \Omega \to \mathbb{R}^{r_t}$, and the probability distribution of $\xi = (\xi_2^T, \cdots, \xi_T^T)^T$, with $\vec{\xi}_t = (\xi_2^T, \cdots, \xi_t^T)^T$, the multistage stochastic linear program (MSLP) reads as

$$
(1.1)\qquad
\begin{cases}
\min\Big\{ c_1^T x_1 + E \sum_{t=2}^{T} c_t^T(\vec{\xi}_t)\, x_t(\vec{\xi}_t) \Big\} \\
A_{11} x_1 = b_1 \\
A_{t1}(\vec{\xi}_t)\, x_1 + \sum_{\tau=2}^{t} A_{t\tau}(\vec{\xi}_t)\, x_\tau(\vec{\xi}_\tau) = b_t(\vec{\xi}_t) \quad \text{a.s.},\ t = 2, \cdots, T, \\
x_1 \ge 0,\quad x_t(\vec{\xi}_t) \ge 0 \quad \text{a.s.},\ t = 2, \cdots, T,
\end{cases}
$$

where, with $\mathcal{F}_t = \sigma(\vec{\xi}_t)$, $t = 1, \cdots, T$, the $\sigma$-algebras in $\Omega$ generated at stage $t$ by $\vec{\xi}_t$ (where $\xi_1$ is assumed to be single-valued, hence $\mathcal{F}_1 = \{\Omega, \emptyset\}$), we have that $\mathcal{F}_t \subset \mathcal{F}_{t+1}$ for $t = 1, \cdots, T-1$, and $x_t(\vec{\xi}_t)$ is to be $\mathcal{F}_t$-measurable (i.e. the policy $\{x_t(\vec{\xi}_t),\ t = 1, \cdots, T\}$ is assumed to be $\mathbb{F}$-adapted or nonanticipative). To have problem (1.1) well defined, and for computational purposes, we make the following assumptions:

– $\xi_t \in \mathcal{L}_2(\Omega, \mathbb{R}^{r_t})\ \forall t$;

– $x_t \in \mathcal{L}_2(\Omega, \mathbb{R}^{n_t})\ \forall t$;

– $A_{t\tau}(\cdot)$, $b_t(\cdot)$, $c_t(\cdot)$ linear affine in $\vec{\xi}_t$ (i.e. $\mathcal{F}_t$-measurable), where $A_{t\tau}(\cdot)$ is an $m_t \times n_\tau$ matrix.

Sometimes the following reformulation of (1.1) may be convenient:

– given a probability space $(\Omega, \mathcal{G}, P)$;

– $\mathcal{F}_t$, $t = 1, \cdots, T$, being $\sigma$-algebras such that $\mathcal{F}_t \subset \mathcal{G}$ and $\mathcal{F}_t \subset \mathcal{F}_{t+1}\ \forall t$ (i.e. $\{\mathcal{F}_t \mid t = 1, \cdots, T\}$ being a filtration);

– $\mathbb{F} := \{\mathcal{F}_1, \cdots, \mathcal{F}_T\}$, where possibly $\mathcal{F}_T = \mathcal{G}$;

– $X_t$ a linear subspace of $\mathcal{L}_2(\Omega, \mathbb{R}^{n_t})$ (with respect to $(\Omega, \mathcal{G}, P)$), including the set of $\mathcal{F}_t$-simple functions;

– $M_t$ the set of $\mathcal{F}_t$-measurable functions $\Omega \to \mathbb{R}^{n_t}$, and hence $X_t \cap M_t$ being a closed linear subspace of $X_t$;

problem (1.1) may be restated as

$$
(1.2)\qquad
\begin{cases}
\min E\Big\{ \sum_{t=1}^{T} c_t^T x_t \Big\} \\
\sum_{\tau=1}^{t} A_{t\tau} x_\tau = b_t \quad \text{a.s.},\ t = 1, \cdots, T, \\
x_t \ge 0 \quad \text{a.s.},\ t = 1, \cdots, T, \\
x_t \in X_t \cap M_t\ \forall t,
\end{cases}
$$

with $A_{t\tau}, b_t, c_t$ assumed to be $\mathcal{F}_t$-measurable for $1 \le \tau \le t$, $t = 1, \cdots, T$, and to have finite second moments (in particular $\mathcal{F}_1 = \{\Omega, \emptyset\}$, and $A_{11}, b_1, c_1$ being constant on $\Omega$).

2  Aggregation

Following S.E. Wright [24], various aggregated problems may be derived from (1.2) by using coarser information structures, i.e. subfiltrations $\hat{\mathbb{F}} = \{\hat{\mathcal{F}}_t\}$, $\hat{\mathcal{F}}_t \subset \hat{\mathcal{F}}_{t+1}$, such that $\hat{\mathcal{F}}_t \subseteq \mathcal{F}_t\ \forall t$, instead of the original filtration $\mathbb{F} = \{\mathcal{F}_t\}$, $\mathcal{F}_t \subset \mathcal{F}_{t+1}$, $t = 1, \cdots, T-1$. Denoting problem (1.2) as $\mathcal{P}(\mathbb{F}, \mathbb{F})$, we may distinguish among others

– the constraint aggregated problem $\mathcal{P}(\hat{\mathbb{F}}, \mathbb{F})$,

$$
(2.1)\qquad
\begin{cases}
\min E\Big\{ \sum_{t=1}^{T} c_t^T x_t \Big\} \quad (\text{for } x\ \mathbb{F}\text{-adapted}) \\
E\Big\{ \sum_{\tau=1}^{t} A_{t\tau} x_\tau \,\Big|\, \hat{\mathcal{F}}_t \Big\} = E\big\{ b_t \mid \hat{\mathcal{F}}_t \big\} \quad \text{a.s.},\ t = 1, \cdots, T, \\
x_t \ge 0 \quad \text{a.s.},\ t = 1, \cdots, T, \\
x_t \in X_t \cap M_t\ \forall t,
\end{cases}
$$

i.e. $x$ is $\mathbb{F}$-adapted as before, and the constraints are stated in conditional expectation given $\hat{\mathcal{F}}_t$;

– and the fully aggregated problem $\mathcal{P}(\hat{\mathbb{F}}, \hat{\mathbb{F}})$, defined as

$$
(2.2)\qquad
\begin{cases}
\min E\Big\{ \sum_{t=1}^{T} E[c_t \mid \hat{\mathcal{F}}_t]^T x_t \Big\} \quad (\text{for } x\ \hat{\mathbb{F}}\text{-adapted}) \\
E\Big\{ \sum_{\tau=1}^{t} A_{t\tau} x_\tau \,\Big|\, \hat{\mathcal{F}}_t \Big\} = E\big\{ b_t \mid \hat{\mathcal{F}}_t \big\} \quad \text{a.s.}\ \forall t, \\
x_t \ge 0 \quad \text{a.s.},\ t = 1, \cdots, T, \\
x_t \in X_t \cap \hat{M}_t\ \forall t,
\end{cases}
$$

where $\hat{M}_t$ is the set of $\hat{\mathcal{F}}_t$-measurable functions $\Omega \to \mathbb{R}^{n_t}$, i.e. $x = (x_1^T, \cdots, x_T^T)^T$ is $\hat{\mathbb{F}}$-adapted.

Remark 2.1 Concerning the fully aggregated problem (2.2) we have the following facts:

• Given that $\mathbb{F}$ is infinite and $\hat{\mathbb{F}}$ is finite, $\mathcal{P}(\hat{\mathbb{F}}, \hat{\mathbb{F}})$, having finitely many constraints and variables, is clearly simpler to deal with than $\mathcal{P}(\mathbb{F}, \mathbb{F})$;

• for a sequence $\{\hat{\mathbb{F}}^\nu\}$ of (finite) filtrations with successive refinements, i.e. $\hat{\mathcal{F}}_t^\nu \subseteq \hat{\mathcal{F}}_t^{\nu+1}\ \forall t$, under appropriate assumptions, e.g. for a corresponding sequence of measures $P_\nu$ on $\hat{\mathcal{F}}_T^\nu$ converging weakly to $P$ (see Billingsley [1]), we may expect convergence of the optimal values of (2.2) to that of (1.2);

• although in general the optimal value of (2.2) cannot be expected to be a lower bound of the optimal value of (1.2), as remarked by Wright [24], this can happen under special assumptions, as described in Corollary 3 and Theorem 9 of [24];

• in particular, for the two-stage case, with only $b_2$ and $A_{21}$ being random and $A_{22}$ a complete recourse matrix, for $\hat{\mathcal{F}}_2$ generated by finitely many convex polyhedra (e.g. cubes or simplices), due to Jensen's inequality [7] the optimal value of (2.2) yields a lower bound for (1.2), and an upper bound can be determined by solving LP's on all the convex polyhedra (see Dupačová [3, 4]) or, in particular for the polyhedra being cubes, by applying the Edmundson-Madansky (E-M) inequality [5, 14, 15]. Convergence statements for the two-stage case may be found e.g. in Frauendorfer and Kall [6], Kall [8, 9], and Wets [23].

Against the background of the latter results for two-stage problems (for more details we refer e.g. to Birge and Louveaux [2], Kall and Wallace [13], Mayer [16], and Prékopa [21]), as well as motivated by our numerical experience in solving them, a natural goal would be to design a similar approach for solving multistage problems, at least under special assumptions such as, among others, only the right-hand sides $b_t$, $t \ge 2$, being random, with stagewise stochastically independent random vectors $\xi_t$, $t = 2, \cdots, T$. To this end it seems appropriate to recall the basic ideas of an approximation scheme for the two-stage case.

3  The approach for two-stage problems

Consider the two-stage problem

$$
(3.1)\qquad
\begin{cases}
\varphi := \min\big\{ c_1^T x_1 + E[c_2^T x_2(\xi_2)] \big\} \\
A_{11} x_1 = b_1 \\
A_{21}(\xi_2)\, x_1 + A_{22}\, x_2(\xi_2) = b_2(\xi_2) \quad \text{a.s.} \\
x_1 \ge 0,\quad x_2(\xi_2) \ge 0 \quad \text{a.s.}
\end{cases}
$$

with $A_{22}$ a complete fixed recourse matrix (i.e. $\{y \mid A_{22} y = z,\ y \ge 0\} \ne \emptyset\ \forall z$ and, in addition, $\{u \mid A_{22}^T u \le c_2\} \ne \emptyset$), $A_{21}(\xi_2)$, $b_2(\xi_2)$ being linear affine in $\xi_2$, and $x_2(\xi_2)$ required to be measurable w.r.t. $\mathcal{F}_2 = \sigma(\xi_2)$. Defining the recourse function $Q$ by

$$
(3.2)\qquad
Q(x_1, \xi_2) := \min\big\{ c_2^T x_2 \;\big|\; A_{22} x_2 = b_2(\xi_2) - A_{21}(\xi_2)\, x_1,\ x_2 \ge 0 \big\},
$$

problem (3.1) may be reformulated as

$$
(3.3)\qquad
\begin{cases}
\min\ c_1^T x_1 + E\{Q(x_1, \xi_2)\} \\
A_{11} x_1 = b_1 \\
x_1 \ge 0.
\end{cases}
$$

Assume that $(\Omega, \mathcal{G}, P) = (\mathbb{R}^{r_2}, \mathcal{B}, P)$, with $\mathcal{B}$ the Borel $\sigma$-algebra in $\mathbb{R}^{r_2}$, that $\xi_2(\cdot)$ is the identity, and that $\Xi := \operatorname{supp} P$ is an interval (cube). Due to the above assumptions on $A_{22}$ (complete recourse and dual feasibility) the function $Q(x_1, \cdot) : \Xi \to \mathbb{R}$ is finite and convex $\forall x_1$. Moreover, since the optimal value function $Q(x_1, \cdot)$ is for all $x_1$ piecewise linear in $\xi_2$ and $\operatorname{supp} P$ is bounded by assumption, there follows the existence of $E[\xi_2]$ as well as of $E[Q(x_1, \xi_2)]$ for all $x_1$. Due to Jensen's inequality we have

$$
Q(x_1, E[\xi_2]) \le \int_{\Xi} Q(x_1, \xi_2)\, P(d\xi_2),
$$

and with the appropriate finite discrete distribution $\tilde{P}$ on $\operatorname{extr} \Xi$, the vertices of $\Xi$, satisfying $E_{\tilde{P}}[\xi_2] = E_P[\xi_2]$, the Edmundson-Madansky inequality yields

$$
\int_{\Xi} Q(x_1, \xi_2)\, P(d\xi_2) \le \int_{\operatorname{extr} \Xi} Q(x_1, \xi_2)\, \tilde{P}(d\xi_2).
$$

Hence,

$$
\varphi^0 := \min\big\{ c_1^T x_1 + Q(x_1, E[\xi_2]) \;\big|\; A_{11} x_1 = b_1,\ x_1 \ge 0 \big\}
$$

and

$$
\psi^0 := \min\big\{ c_1^T x_1 + E_{\tilde{P}}[Q(x_1, \xi_2)] \;\big|\; A_{11} x_1 = b_1,\ x_1 \ge 0 \big\}
$$

are lower and upper bounds, respectively, for the optimal value of (3.3). Analogously, for $\Xi^k := \{\Xi_1^k, \cdots, \Xi_{\nu_k}^k\}$ being a partition of $\Xi$ into subintervals, generating the $\sigma$-algebra $\hat{\mathcal{F}}_2^k$, and for the discrete distribution $P^k$, assigning $p_\mu = P(\Xi_\mu^k)$ to $E[\xi_2 \mid \Xi_\mu^k]$, $\mu = 1, \cdots, \nu_k$, we get the Jensen lower bound

$$
(3.4)\qquad
\varphi^k := \min\big\{ c_1^T x_1 + E_{P^k}\big[ Q(x_1, E[\xi_2 \mid \hat{\mathcal{F}}_2^k]) \big] \;\big|\; A_{11} x_1 = b_1,\ x_1 \ge 0 \big\}.
$$

Finally, with $\tilde{P}_\mu^k$ denoting the Edmundson-Madansky distribution on any particular subinterval $\Xi_\mu^k$ with respect to $E[\xi_2 \mid \Xi_\mu^k]$, we get the E-M upper bound

$$
(3.5)\qquad
\psi^k := \min\big\{ c_1^T x_1 + E_{P^k}\big[ E_{\tilde{P}_\mu^k}[Q(x_1, \xi_2)] \big] \;\big|\; A_{11} x_1 = b_1,\ x_1 \ge 0 \big\}.
$$

By refining the partitions $\Xi^k$ successively in an appropriate way, we know from [8, 6]:

Proposition 3.1 If the gridwidth of the successive partitions satisfies $\operatorname{grid} \Xi^k \longrightarrow 0$, then for the monotonically increasing sequence $\{\varphi^k\}$ and the monotonically decreasing sequence $\{\psi^k\}$ it follows that $\varphi^k \longrightarrow \varphi \longleftarrow \psi^k$.

A procedure DAPPROX based on this statement has been implemented in SLP-IOR (see Kall and Mayer [10, 11]), using some heuristic rules for partitioning the subintervals to improve efficiency.
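To make the bounding scheme concrete, here is a minimal numeric sketch, entirely our own illustration: the instance (first-stage cost $0.5$, recourse costs $(1,2)$, recourse row $(1,-1)$, $\xi_2 \sim \mathcal{U}[0,1]$) and all names are ours, not from the paper. Each bound $\varphi^k$, $\psi^k$ of (3.4)-(3.5) is obtained by solving the deterministic-equivalent LP of the corresponding discretized problem.

```python
# Jensen / Edmundson-Madansky bounds (3.4)-(3.5) for a toy two-stage problem:
# min c1*x1 + E Q(x1, xi),  Q(x1, xi) = min{ y1 + 2*y2 | y1 - y2 = xi - x1, y >= 0 },
# with xi ~ U[0, 1].  Illustrative data only.
import numpy as np
from scipy.optimize import linprog

c1 = 0.5
c2 = np.array([1.0, 2.0])
W = np.array([1.0, -1.0])                  # complete recourse row: y1 - y2

def solve_discretized(cells):
    """Deterministic equivalent for atoms [(p_mu, xibar_mu), ...]:
    min c1*x1 + sum_mu p_mu * c2^T y_mu   s.t.  x1 + W y_mu = xibar_mu, all >= 0."""
    K = len(cells)
    c = np.concatenate([[c1]] + [p * c2 for p, _ in cells])
    A_eq = np.zeros((K, 1 + 2 * K))
    A_eq[:, 0] = 1.0                       # A21 = 1: xi hits the rhs linearly
    for mu in range(K):
        A_eq[mu, 1 + 2 * mu: 3 + 2 * mu] = W
    b_eq = np.array([xibar for _, xibar in cells])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (1 + 2 * K))
    return res.fun

for K in (1, 2, 4, 8, 16):
    # Jensen: one atom per cell at the conditional mean (midpoint, xi uniform)
    jensen = solve_discretized([(1.0 / K, (mu + 0.5) / K) for mu in range(K)])
    # E-M: atoms at both cell endpoints, weighted 1/2 each to preserve the mean
    em = solve_discretized([(0.5 / K, mu / K) for mu in range(K)] +
                           [(0.5 / K, (mu + 1.0) / K) for mu in range(K)])
    print(K, round(jensen, 4), round(em, 4))
```

For this instance the true optimal value works out to $33/72 \approx 0.458$; the printed Jensen values increase and the E-M values decrease toward it under refinement, mirroring Proposition 3.1.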


4  MSLP with discrete distributions

Consider the case of $\xi_t$, $t = 1, \cdots, T$, and hence $\vec{\xi}_t$, $t = 1, \cdots, T$, having finite discrete distributions. Then the filtration $\mathbb{F} = \{\mathcal{F}_1, \cdots, \mathcal{F}_T\}$ is finite, and the data $\{[A_{t\tau}(\vec{\xi}_t),\ \tau = 1, \cdots, t],\ b_t(\vec{\xi}_t),\ c_t(\vec{\xi}_t);\ t = 1, \cdots, T\}$ have finite discrete distributions as well.

The finite $\sigma$-algebras $\mathcal{F}_t = \sigma(\vec{\xi}_t)$ are generated by finite partitions of $\Omega$ given as

$$
(4.1)\qquad
\Omega = \bigcup_{i=1}^{k_t} B_i^t \quad \text{with } B_j^t \cap B_\ell^t = \emptyset \text{ for } j \ne \ell.
$$

If we have, analogously,

$$
(4.2)\qquad
\Omega = \bigcup_{i=1}^{k_{t+1}} C_i^{t+1} \quad \text{with } C_j^{t+1} \cap C_\ell^{t+1} = \emptyset \text{ for } j \ne \ell,
$$

then the filtration property $\mathcal{F}_t \subseteq \mathcal{F}_{t+1}$ implies

$$
(4.3)\qquad
\forall i : 1 \le i \le k_t\ \ \exists J_i^{t+1} \subset \{1, \cdots, k_{t+1}\} :\quad B_i^t = \bigcup_{k \in J_i^{t+1}} C_k^{t+1}.
$$

Obviously (since $\mathcal{F}_t = \sigma(\vec{\xi}_t)$) we have

$$
\vec{\xi}_t(\omega) \equiv \text{const} \quad \text{for } \omega \in B_i^t,\ i = 1, \cdots, k_t;\ t = 1, \cdots, T,
$$

and for a policy $x = \{x_t(\vec{\xi}_t),\ t = 1, \cdots, T\} = \{x_1(\omega), \cdots, x_T(\omega)\}$ it holds:

$$
(4.4)\qquad
x \text{ is } \mathbb{F}\text{-adapted} \iff x_t(\omega) = \text{const}\ \forall \omega \in B_i^t,\ \forall i, t.
$$

The filtration $\mathbb{F}$ may be represented by a scenario tree, i.e. a rooted tree with nodes associated to the components $B_i^t$ of the partitions (4.1), for $i = 1, \cdots, k_t$, $t = 1, \cdots, T$. The construction of a scenario tree is indicated in Fig. 1, where e.g. node $i$ in stage $t$ corresponds to $B_i^t$ in the partition (4.1), and the nodes $J_i^{t+1} = \{k_1, \cdots, k_4\}$ in stage $t+1$ correspond to $C_k^{t+1}$, $k \in J_i^{t+1}$, according to (4.3). Hence, the arcs of the tree reflect exactly the successive subpartitions described in (4.3), i.e. node $i \sim B_i^t$ is connected to $k \sim C_k^{t+1}$ if and only if $k \in J_i^{t+1}$. For the scenario tree we have the following notations:

[Figure 1: Scenario tree. Node $i$ in stage $t$ is connected to its children $k_1, \cdots, k_4$ in stage $t+1$; stages run from $t = 1$ to $T$.]

• $\mathcal{T} = (\mathcal{N}, \mathcal{A})$, the scenario tree, is a rooted tree, the nodes $n \in \mathcal{N}$ in hierarchical levels $t_n$ corresponding to stages, where
  – $n = 1$ with $t_1 = 1$ is the (only) root,
  – the nodes $\{n \mid t_n = T\}$ are the leaves, and
  – the unique path from the root $n = 1$ to any leaf $n$ with $t_n = T$ is a scenario;

• $\mathcal{S}$ is the set of scenarios (root-to-leaf paths);

• $p_s$ is the probability of scenario $s \in \mathcal{S}$;

• $\mathcal{S}_n$ is the set of scenarios passing through node $n \in \mathcal{N}$;

• $p_n = \sum_{s \in \mathcal{S}_n} p_s$ is the probability to reach node $n$ at all;

• $\mathcal{H}_n \subseteq \mathcal{N}$ are the predecessor nodes of node $n$ (history);

• $\bar{\mathcal{H}}_n = \mathcal{H}_n \cup \{n\}$;

• $h_n$ is the parent node of node $n \in \mathcal{N}$, $n \ge 2$ (immediate predecessor);

• $\mathcal{F}_{n,s} \subseteq \mathcal{N}$ are the successor nodes of node $n$ along scenario $s \in \mathcal{S}$ (future of $n$ on $s$);

• $\mathcal{F}_n = \bigcup_{s \in \mathcal{S}_n} \mathcal{F}_{n,s}$ are all successor nodes of node $n$ (future of $n \in \mathcal{N}$);

• $\bar{\mathcal{F}}_n = \mathcal{F}_n \cup \{n\}$;

• $\mathcal{C}_n \subset \mathcal{N}$ are the immediate successor nodes of node $n$ (children of $n$, or immediate future of $n$);

• $q_{n,m} = \dfrac{p_m}{p_n}$, $m \in \mathcal{F}_n$, is the conditional probability to reach node $m$ given that node $n$ has been reached (provided that $p_n > 0$); and finally

• $B^n \subseteq \Omega$ is the event in $\mathcal{F}_{t_n}$ belonging to node $n$.

Obviously, for any $n \in \mathcal{N}$ the random vector $\vec{\xi}_{t_n}(\omega)$ has the constant realization $\xi^n$ on $B^n$, with probability $p_n = P(B^n)\ \forall n \in \mathcal{N}$. Observing that the constraints are required a.s., and that $p_m = 0$ for some $m \in \mathcal{H}_n$ implies $p_n = 0$, it follows that the MSLP (1.2) may be written as

$$
(4.5)\qquad
\begin{cases}
\min \sum_{n \in \mathcal{N}:\, p_n > 0} p_n\, c_{t_n}(\xi^n)^T x_n \\
\sum_{m \in \bar{\mathcal{H}}_n:\, p_m > 0} A_{t_n t_m}(\xi^n)\, x_m = b_{t_n}(\xi^n) \quad \forall n : p_n > 0 \\
x_n \ge 0 \quad \forall n : p_n > 0,
\end{cases}
$$

or, with an obvious simplification of notation, as

$$
(4.6)\qquad
\begin{cases}
\min \sum_{n \in \mathcal{N}:\, p_n > 0} p_n\, c_{t_n}(n)^T x_n \\
\sum_{m \in \bar{\mathcal{H}}_n:\, p_m > 0} A_{t_n t_m}(n)\, x_m = b_n \quad \forall n \in \mathcal{N} : p_n > 0 \\
x_n \ge 0 \quad \forall n \in \mathcal{N} : p_n > 0,
\end{cases}
$$

the $x_m$ being associated to node $m$, i.e. to the $B^m$, and, due to their measurability w.r.t. $\mathcal{F}_{t_m}$, being constant on $B^m$. Further, from (4.3) and (4.1) it follows immediately that

$$
p_n = \sum_{k \in \mathcal{C}_n} p_k \quad \forall n : t_n < T
$$

and

$$
\sum_{n \in \mathcal{N},\, t_n = t} p_n = 1 \quad \forall t \in \{1, \cdots, T\},
$$

such that the $p_n$ yield in any particular stage a probability distribution on the nodes of this stage. In particular, the $p_n$ yield a probability distribution on the set of leaves $\{n \mid t_n = T\}$, which, in turn, are in a one-to-one relation with the root-to-leaf paths, i.e. with the scenarios; hence we also have a probability distribution on the set of scenarios, which may be understood as sample paths of a stochastic process on a finite state space in discrete time. Finally, for any $n \in \mathcal{N}$, $n > 1$, the filtration property according to (4.3) yields

$$
\bigcap_{k \in \mathcal{H}_n} B^k = B^{h_n} \quad \text{and therefore} \quad P\Big( \bigcap_{k \in \mathcal{H}_n} B^k \Big) = P(B^{h_n}),
$$

implying, for $P(B^{h_n}) > 0$,

$$
(4.7)\qquad
P(B^n \mid B^k,\ k \in \mathcal{H}_n) = \frac{P\big( B^n \cap \bigcap_{k \in \mathcal{H}_n} B^k \big)}{P\big( \bigcap_{k \in \mathcal{H}_n} B^k \big)} = \frac{P(B^n)}{P(B^{h_n})} = q_{h_n n}.
$$
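This bookkeeping translates directly into a small tree data structure. The sketch below is our own illustration (all names are ours): it stores $p_n$ at each node, derives child probabilities from the conditional probabilities as in (4.7), and enumerates the scenarios with their probabilities $p_s$.

```python
# Minimal scenario-tree bookkeeping: node probabilities p_n, conditional
# probabilities q_{n,m}, and root-to-leaf scenarios.  Illustrative sketch only.
from dataclasses import dataclass, field

@dataclass
class Node:
    stage: int                                    # t_n
    prob: float                                   # p_n = P(B^n)
    parent: "Node | None" = None                  # h_n (None at the root)
    children: "list[Node]" = field(default_factory=list)   # C_n

def add_child(parent: Node, q: float) -> Node:
    """Attach a child m with conditional probability q = q_{parent,m};
    then p_m = q * p_parent, cf. (4.7)."""
    child = Node(stage=parent.stage + 1, prob=q * parent.prob, parent=parent)
    parent.children.append(child)
    return child

def scenarios(node: Node, path=()):
    """Enumerate root-to-leaf paths (the set S) with probabilities p_s;
    a leaf's p_n equals the probability of the scenario ending there."""
    path = path + (node,)
    if not node.children:
        yield path, node.prob
    else:
        for c in node.children:
            yield from scenarios(c, path)

root = Node(stage=1, prob=1.0)                    # xi_1 is deterministic
for q1 in (0.5, 0.5):
    n = add_child(root, q1)
    for q2 in (0.25, 0.75):
        add_child(n, q2)
# scenario probabilities sum to 1, as do the p_n within every stage
assert abs(sum(p for _, p in scenarios(root)) - 1.0) < 1e-12
```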

5  Discrete Approximation in MSLP

From now on, for the MSLP under consideration we make the following assumptions:

– only the right-hand sides $b_t$, linear affine in $\vec{\xi}_t$, are random;

– the random vectors $\xi_2, \cdots, \xi_T$ are stochastically independent;

– there are hypercubes $\Xi^t \subset \mathbb{R}^{r_t}$ such that $\Xi^t \supseteq \operatorname{range} \xi_t$;

– the $A_{tt}$ are complete fixed recourse matrices $\forall t$, which requires that for all $t$: $\{y \mid A_{tt} y = z,\ y \ge 0\} \ne \emptyset\ \forall z$.

Assume now that any finite subfiltration $\hat{\mathbb{F}}$ and the corresponding scenario tree are generated as follows:

– To $\Xi^1$ in stage 1 assign node $n = 1$, $\mathcal{C}_0 := \{1\}$ and $\xi^1 = \xi_1$ ($= \hat{\eta}_1 = \text{const}$ on $\Xi^1$ with $p_1 = P(\Xi^1) = 1$).

– For $\nu = 1, \cdots, T-1$ repeat: For each node $n$ in stage $\nu$ (i.e. $t_n = \nu$) define a finite set $\mathcal{C}_n$ of children of $n$, such that $\operatorname{card} \mathcal{C}_n \ge 1$ and $\mathcal{C}_n \cap \mathcal{C}_k = \emptyset$, $k \ne n$, for all $\mathcal{C}_k$ introduced so far, and associate with each node set $\mathcal{C}_n$ in stage $t_n + 1 = \nu + 1$ a partition $\{\Xi_\kappa^{t_n+1} \mid \kappa \in \mathcal{C}_n\}$ of $\Xi^{t_n+1}$ (such that $\Xi_\kappa^{t_n+1} = \Xi_\kappa^{t_\kappa}$ is assigned to $\kappa \in \mathcal{C}_n$).

– To each node $n \ne 1$, with $\bar{\mathcal{H}}_n = \{\ell_1 = 1, \cdots, \ell_{t_n - 1}, \ell_{t_n} = n\}$ and $\vec{\xi}_k := (\xi_1, \cdots, \xi_k)$, assign, observing the stagewise independence,

$$
(5.1)\qquad
\xi^n = E\big[ \vec{\xi}_{t_n} \,\big|\, \vec{\xi}_{t_n} \in \Xi_{\ell_1}^{1} \times \cdots \times \Xi_{\ell_{t_n}}^{t_n} \big]
= \begin{pmatrix} E[\xi_1 \mid \xi_\mu \in \Xi_{\ell_\mu}^{\mu},\ \mu = 1, \cdots, t_n] \\ \vdots \\ E[\xi_{t_n} \mid \xi_\mu \in \Xi_{\ell_\mu}^{\mu},\ \mu = 1, \cdots, t_n] \end{pmatrix}
= \begin{pmatrix} E[\xi_1 \mid \xi_1 \in \Xi_{\ell_1}^{1}] \\ \vdots \\ E[\xi_{t_n} \mid \xi_{t_n} \in \Xi_{\ell_{t_n}}^{t_n}] \end{pmatrix},
$$

with

$$
p_n = P\big( \Xi_{\ell_1}^{1} \times \cdots \times \Xi_{\ell_{t_n}}^{t_n} \big) = \prod_{\nu=2}^{t_n} P(\Xi_{\ell_\nu}^{\nu}).
$$
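For stagewise independent $\xi_t$ the node data of (5.1) are particularly easy to generate: the conditional means factor by stage, and $p_n$ is the product of the per-stage cell probabilities. A small illustration of our own (uniform marginals assumed, so conditional means are cell midpoints; the two marginals chosen here reappear in the example of Section 7):

```python
# Stage-wise discretization under independence, cf. (5.1): each cell of a
# stage partition carries its probability and conditional mean; a leaf's p_n
# is the product of the cell probabilities along its history.  Sketch only.
import itertools

def uniform_cells(lo: float, hi: float, k: int):
    """Split Xi^t = [lo, hi] into k equal cells; for xi_t ~ U[lo, hi] each
    cell has probability 1/k and conditional mean at its midpoint."""
    w = (hi - lo) / k
    return [(1.0 / k, lo + (i + 0.5) * w) for i in range(k)]

# e.g. xi_2 ~ U[0, 6] split into 3 cells, xi_3 ~ U[1, 1.5] split into 2 cells
stages = [uniform_cells(0.0, 6.0, 3), uniform_cells(1.0, 1.5, 2)]
for cells in itertools.product(*stages):      # one combination per leaf node
    p_n = 1.0
    for q, _ in cells:
        p_n *= q                              # product over stages, cf. (5.1)
    print("xi^n =", [m for _, m in cells], " p_n =", p_n)
```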

Proposition 5.1 Under the above assumptions it follows that, for any finite subfiltration $\hat{\mathbb{F}}$ constructed as above, the optimal value of (4.5) is a lower bound of the optimal value of the original problem (1.2).

Proof: As is well known (see Olsen and Rockafellar–Wets [18, 22]), our problem (1.2) may be solved recursively. According to the above construction of the scenario tree it follows that any node $n$ in some stage $t$ (i.e. $t_n = t$) corresponds to one particular subset $\Xi_n^t$ of a partition $\{\Xi_\mu^t \mid \mu \in \mathcal{C}_{h_n}\}$ of $\Xi^t$ ($h_n$ the immediate predecessor of node $n$ and $\mathcal{C}_{h_n}$ the children of $h_n$).

Denoting realizations of $\vec{\xi}_t$ by $\vec{\eta}_t$, and in particular, for node $n$ in stage $t$ (see the tree construction), $E[\xi_t \mid \xi_t \in \Xi_n^t]$ by $\eta_t^n$, in the above mentioned recursion with $\Phi_{T+1}(\vec{x}_T; \vec{\eta}_T) \equiv 0\ \forall \vec{x}_T, \vec{\eta}_T$ we determine iteratively, for $t = T, T-1, \cdots, 2$ and all nodes $\nu$ with $t_\nu = t$,

$$
(5.2)\qquad
\begin{cases}
r_t(\vec{x}_{t-1}; \vec{\eta}_t) := \min_{x_t}\ \{ c_t^T x_t + \Phi_{t+1}(\vec{x}_t; \vec{\eta}_t) \} \\
\qquad \text{s.t. } A_{tt} x_t = b_t(\vec{\eta}_t) - \sum_{\tau=1}^{t-1} A_{t\tau} x_\tau,\quad x_t \ge 0 \\[4pt]
\Phi_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}) := E_{\xi_t}\big[ r_t(\vec{x}_{t-1}; \vec{\xi}_t) \,\big|\, \vec{\xi}_{t-1} = \vec{\eta}_{t-1} \big] \\
\qquad\qquad\qquad\ \ = E_{\xi_t}\big[ r_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \xi_t) \big] \\
\qquad\qquad\qquad\ \ = \sum_{\mu \in \mathcal{C}_{h_\nu}} P(\Xi_\mu^t)\, E_{\xi_t}\big[ r_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \xi_t) \,\big|\, \xi_t \in \Xi_\mu^t \big],
\end{cases}
$$

using the assumed stagewise independence, yielding finally

$$
r_1 := \min_{x_1}\ \{ c_1^T x_1 + \Phi_2(x_1; \hat{\eta}_1) \} \quad \text{s.t. } A_{11} x_1 = b_1(\hat{\eta}_1) \equiv b_1,\ x_1 \ge 0,
$$

the optimal value of (1.2), with $\hat{\eta}_1$ the constant realization of $\xi_1$.

Here, if $\Phi_{t+1}(\vec{x}_t; \vec{\eta}_t)$ is jointly convex in $(\vec{x}_t; \vec{\eta}_t)$ (as is trivially true for $\Phi_{T+1}$), it follows immediately that

$$
r_t(\vec{x}_{t-1}; \vec{\eta}_t) = \min_{x_t}\ \{ c_t^T x_t + \Phi_{t+1}(\vec{x}_t; \vec{\eta}_t) \}
\quad \text{s.t. } A_{tt} x_t = b_t(\vec{\eta}_t) - \sum_{\tau=1}^{t-1} A_{t\tau} x_\tau,\ x_t \ge 0
$$

is jointly convex in $(\vec{x}_{t-1}; \vec{\eta}_t)$, implying by (5.2) that

$$
\Phi_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}) = E_{\xi_t}\big[ r_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \xi_t) \big]
$$

is jointly convex in $(\vec{x}_{t-1}; \vec{\eta}_{t-1})$ as well. Hence, by Jensen's inequality, we have

$$
(5.3)\qquad
r_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, E[\xi_t]) \le E_{\xi_t}\big[ r_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \xi_t) \big] = \Phi_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}).
$$

If, instead of (5.2), with $\Psi_{T+1} \equiv 0$, we define for $t = T, T-1, \cdots, 2$ and all nodes $\nu$ with $t_\nu = t$ the iteration

$$
(5.4)\qquad
\begin{cases}
q_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \eta_t^\nu) := \min_{x_t}\ \{ c_t^T x_t + \Psi_{t+1}(\vec{x}_t; \vec{\eta}_{t-1}, \eta_t^\nu) \} \\
\qquad \text{s.t. } A_{tt} x_t = b_t(\vec{\eta}_{t-1}, \eta_t^\nu) - \sum_{\tau=1}^{t-1} A_{t\tau} x_\tau,\quad x_t \ge 0 \\[4pt]
\Psi_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}) := \sum_{\mu \in \mathcal{C}_{h_\nu}} P(\Xi_\mu^t)\, q_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \eta_t^\mu),
\end{cases}
$$

we get, with

$$
q_1 := \min_{x_1}\ \{ c_1^T x_1 + \Psi_2(x_1; \hat{\eta}_1) \} \quad \text{s.t. } A_{11} x_1 = b_1(\hat{\eta}_1) \equiv b_1,\ x_1 \ge 0,
$$

the optimal value of (4.5).

Now, if $\Psi_{t+1}(\vec{x}_t; \vec{\eta}_t) \le \Phi_{t+1}(\vec{x}_t; \vec{\eta}_t)$ (as is obviously true for $t = T$), it follows immediately from (5.2) and (5.4), and from Jensen's inequality (for conditional expectations), that

$$
q_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \eta_t^\nu) \le r_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \eta_t^\nu)
\le E_{\xi_t}\big[ r_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \xi_t) \,\big|\, \xi_t \in \Xi_\nu^t \big],
$$

and hence

$$
\Psi_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}) = \sum_{\mu \in \mathcal{C}_{h_\nu}} P(\Xi_\mu^t)\, q_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \eta_t^\mu)
\le \sum_{\mu \in \mathcal{C}_{h_\nu}} P(\Xi_\mu^t)\, E_{\xi_t}\big[ r_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}, \xi_t) \,\big|\, \xi_t \in \Xi_\mu^t \big]
= \Phi_t(\vec{x}_{t-1}; \vec{\eta}_{t-1}),
$$

and therefore finally

$$
q_1 := \min_{x_1 \in B}\ \{ c_1^T x_1 + \Psi_2(x_1; \hat{\eta}_1) \} \le \min_{x_1 \in B}\ \{ c_1^T x_1 + \Phi_2(x_1; \hat{\eta}_1) \} =: r_1
$$

with $B := \{ x_1 \mid A_{11} x_1 = b_1(\hat{\eta}_1) \equiv b_1,\ x_1 \ge 0 \}$. $\Box$

The independence of the random vectors $\xi_2, \cdots, \xi_T$ is essential for Prop. 5.1: in general, by (5.1), for some stage $\mu$ the conditional expectation of the corresponding random vector $\xi_\mu$ (as contained in some $\vec{\xi}_{t_n}$) may vary if computed e.g. for some successor $m$ of node $n$, due to the change in the conditioning sets, whereas by (5.1) this cannot happen in the case of stagewise independence. This is demonstrated by the following example:

Example 5.1 Consider the 3-stage problem

$$
\begin{aligned}
V^\star = \min\ & 2x + E[y_1(\xi_2) + 2 y_2(\xi_2)] + E[z_1(\xi_2, \xi_3) + z_2(\xi_2, \xi_3)] \\
\text{s.t. }\ & x + y_1(\xi_2) - y_2(\xi_2) = \xi_2 \\
& x + y_1(\xi_2) - y_2(\xi_2) + z_1(\xi_2, \xi_3) - z_2(\xi_2, \xi_3) = \xi_3 \\
& x,\ y_1,\ y_2,\ z_1,\ z_2 \ge 0.
\end{aligned}
$$

For $(\xi_2, \xi_3)^T$ assume the density

$$
f(\xi_2, \xi_3) = \begin{cases}
1 + \varepsilon & \text{for } 0 \le \xi_2, \xi_3 \le 0.5 \\
1 + \varepsilon & \text{for } 0.5 \le \xi_2, \xi_3 \le 1 \\
1 - \varepsilon & \text{for } 0 \le \xi_2 < 0.5 < \xi_3 \le 1 \\
1 - \varepsilon & \text{for } 0 \le \xi_3 < 0.5 < \xi_2 \le 1 \\
0 & \text{else,}
\end{cases}
$$

with some constant $\varepsilon \in (-1, +1)$. Obviously, for $\varepsilon \ne 0$ we have stochastic dependence of $\xi_2, \xi_3$, whereas the marginal distributions of $\xi_2$ and $\xi_3$ are $\mathcal{U}[0,1]$ (the uniform distribution) each. By straightforward computations (see Appendix A) we get

$$
V^\star = \frac{5 - \varepsilon}{6}.
$$

Discretizing with

$$
\bar{\xi}_2 = E[\xi_2] = \frac{1}{2}
$$

and

$$
\bar{\xi}_3^1 = E\big[ \xi_3 \,\big|\, \xi_3 \in [0, \tfrac{1}{2}],\ \xi_2 \in [0, 1] \big] = E\big[ \xi_3 \,\big|\, \xi_3 \in [0, \tfrac{1}{2}] \big] = \frac{1}{4},
$$
$$
\bar{\xi}_3^2 = E\big[ \xi_3 \,\big|\, \xi_3 \in [\tfrac{1}{2}, 1],\ \xi_2 \in [0, 1] \big] = E\big[ \xi_3 \,\big|\, \xi_3 \in [\tfrac{1}{2}, 1] \big] = \frac{3}{4},
$$

we get, with $p^2 = P(\xi_2 \in [0, 1]) = 1$ and $p^{31} = P(\xi_3 \in [0, \tfrac{1}{2}]) = p^{32} = P(\xi_3 \in [\tfrac{1}{2}, 1]) = \tfrac{1}{2}$, the discretized problem

$$
\begin{aligned}
V = \min\ & 2x + y_1 + 2y_2 + \tfrac{1}{2}(z_{11} + z_{21}) + \tfrac{1}{2}(z_{12} + z_{22}) \\
\text{s.t. }\ & x + y_1 - y_2 = \tfrac{1}{2} \\
& x + y_1 - y_2 + z_{11} - z_{21} = \tfrac{1}{4} \\
& x + y_1 - y_2 + z_{12} - z_{22} = \tfrac{3}{4} \\
& x,\ y_1,\ y_2,\ z_{11},\ z_{21},\ z_{12},\ z_{22} \ge 0
\end{aligned}
$$

with the optimal value $V = \dfrac{3}{4}$, for which it obviously holds that

$$
V \ \begin{cases} \le V^\star & \text{if } \varepsilon \le \tfrac{1}{2}, \\ > V^\star & \text{if } \varepsilon > \tfrac{1}{2}. \end{cases}
$$

Hence, for $\varepsilon > \tfrac{1}{2}$ the discretized value fails to be a lower bound of $V^\star$. $\Box$
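As a quick numerical cross-check (ours, not part of the paper), the discretized problem is a small LP that any solver reproduces; a sketch with scipy:

```python
# Verify V = 3/4 for the discretized problem of Example 5.1.
# Variables: x, y1, y2, z11, z21, z12, z22.
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 1.0, 2.0, 0.5, 0.5, 0.5, 0.5])
A_eq = np.array([[1., 1., -1., 0.,  0., 0.,  0.],
                 [1., 1., -1., 1., -1., 0.,  0.],
                 [1., 1., -1., 0.,  0., 1., -1.]])
b_eq = np.array([0.5, 0.25, 0.75])
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 7)
print(res.fun)   # 0.75 = 3/4, independent of epsilon
# compare with V* = (5 - eps)/6: a lower bound only for eps <= 1/2
```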

6  Refinement of partitions

Given the coarse subfiltration $\hat{\mathbb{F}}$ with $\hat{\mathcal{F}}_t$ generated by the elementary events $\{\xi_\tau^{-1}[\Xi^\tau], \emptyset \mid \tau = 1, \cdots, t\}$, consider the fully aggregated problem

$$
(6.1)\qquad
\begin{cases}
\min E\Big\{ \sum_{t=1}^{T} c_t^T x_t \Big\} \\
\sum_{\tau=1}^{t} A_{t\tau} x_\tau = E\big[ b_t \mid \hat{\mathcal{F}}_t \big] = E[b_t] \quad \forall t \\
x_t \ge 0 \quad \forall t.
\end{cases}
$$

This may trivially be represented by a basic tree (see Fig. 2), the nodes corresponding to the stages and carrying as further information (among others) the respective sets $\Xi^t$.

[Figure 2: Basic tree. A single path of nodes $1, 2, \cdots, t-1, t, t+1, t+2, \cdots, T$, one node per stage.]

Partitioning a particular $\Xi^t$ into two subintervals $\Xi^{t1}$ and $\Xi^{t2}$ then yields the new tree shown in Fig. 3. Here the two nodes in stage $t$ reflect (among others) the subintervals $\Xi^{t1}, \Xi^{t2}$, respectively, and the corresponding subfiltration is now refined to $\hat{\hat{\mathbb{F}}}$ with $\hat{\hat{\mathcal{F}}}_r = \hat{\mathcal{F}}_r\ \forall r < t$, $\hat{\hat{\mathcal{F}}}_t$ generated by $\{\xi_t^{-1}[\Xi^{t1}], \xi_t^{-1}[\Xi^{t2}], \emptyset\}$ and, for $s > t$, $\hat{\hat{\mathcal{F}}}_s$ generated by $\{\xi_t^{-1}[\Xi^{t1}], \xi_t^{-1}[\Xi^{t2}], \xi_s^{-1}[\Xi^s], \emptyset\}$. Due to the assumed stagewise independence of the $\xi_\tau$, it follows that

$$
E\big[ b_r \mid \hat{\hat{\mathcal{F}}}_r \big]
\begin{cases}
\equiv E[b_r] & \text{for } r < t \\
= E[b_t \mid \xi_t \in \Xi^{tk}] & \text{for } r = t \text{ and } k = 1, 2 \\
\equiv E[b_r \mid \xi_t \in \Xi^{tk}] & \text{for } r > t \text{ and } k = 1, 2.
\end{cases}
$$

[Figure 3: First partition. The single path splits at stage $t$: after nodes $1, \cdots, t-1$, two stage-$t$ nodes carry $\Xi^{t1}$ and $\Xi^{t2}$, each followed by its own copy of the stages $t+1, t+2, \cdots, T$, with leaves $A$ and $B$.]

Denoting the two scenarios by $s_A = \{1, \cdots, A\}$ and $s_B = \{1, \cdots, B\}$, we have $p_{s_A} = P(\{\xi_t \in \Xi^{t1}\})$ and $p_{s_B} = P(\{\xi_t \in \Xi^{t2}\})$ and, for the probability to reach any node $n$ at all,

$$
p_n = \begin{cases}
1 & \text{for } t_n < t \\
p_{s_A} & \text{for } t_n \ge t \text{ and } n \in s_A \\
p_{s_B} & \text{for } t_n \ge t \text{ and } n \in s_B.
\end{cases}
$$

Assume now that we have, for some (finite) subfiltration $\hat{\mathbb{F}}$, a scenario tree $\mathcal{T} = (\mathcal{N}, \mathcal{A})$, and denote by

• $\Xi_n^{t_n} \subseteq \Xi^{t_n}$ the subinterval of $\Xi^{t_n}$ belonging to node $n$;

• $b_n = E\big( b_{t_n} \,\big|\, \vec{\xi}_{t_n} \in \Xi_{\ell_1}^{1} \times \cdots \times \Xi_{\ell_{t_n}}^{t_n} \big) = b_{t_n}(\xi^n)$, with $\{\ell_1, \cdots, \ell_{t_n}\} = \bar{\mathcal{H}}_n$, the conditional expectation of the right-hand side in stage $t_n$ associated with node $n$.

To this tree belongs the fully aggregated problem (see (4.6))

To this tree belongs the fully aggregated problem (see (4.6))

(6.2)

 X  pn cT min  tn xn    n∈N : p >0  n   X Atn tm xm =      m∈H n : pm >0   

bn ∀n ∈ N : pn > 0

xn ≥ 0

∀n ∈ N : pn > 0

and—observing that for pn = 0 the n-th constraint is missing in (6.2) and hence un = 0, implying also um = 0 ∀m ∈ Fn since pm ≤ pn ∀m ∈ Fn —its dual

(6.3)

X   max bT  n un    n∈N : pn >0 X   AT  tm tn um  

≤ pn ctn ∀n ∈ N : pn > 0.

m∈ F n : pm >0

Replacing the dual variables un by un = pn πn ∀n and dividing the n-th constraint of (6.3) by pn yields (with qn,n = 1)

(6.4)

X   max p n bT  n πn    n∈N : pn >0 X   qn,m AT  tm tn πm  

≤ ctn ∀n ∈ N : pn > 0.

m∈F n : pm >0

Then, with (xn , π m ) being a primal-dual pair of solutions, the complementarity conditions (6.5)

(ctn −

X

T qn,m AT tm tn π m ) xn = 0 ∀n ∈ N : pn > 0

m ∈ Fn : pm > 0

(with qn,n = 1) have to hold.

6.1  Refinement for optimality

For any fixed node $n$ with $p_n > 0$, and with $\{x_m \mid m \in \mathcal{H}_n\}$, $\{\pi_m \mid m \in \bar{\mathcal{F}}_n\}$ being parts of solutions to (6.2) and (6.4), respectively, consider the linear program

$$
(6.6)\qquad
\begin{cases}
\min \Big( c_{t_n} - \sum_{m \in \bar{\mathcal{F}}_n:\, p_m > 0} q_{n,m}\, A_{t_m t_n}^T \pi_m \Big)^T x_n \\
A_{t_n t_n} x_n = b_n - \sum_{m \in \mathcal{H}_n:\, p_m > 0} A_{t_n t_m} x_m \\
x_n \ge 0.
\end{cases}
$$

Then, since $\{x_k,\ k \in \mathcal{N} : p_k > 0\}$ solves (6.2), $x_n$ is feasible in (6.6). Furthermore, the $\{\pi_\ell,\ \ell \in \mathcal{N} : p_\ell > 0\}$ being feasible and optimal in (6.4), and $x_n \ge 0$ due to (6.6), we have for the objective value of (6.6), observing (6.5),

$$
0 \le \Big( c_{t_n} - \sum_{m \in \bar{\mathcal{F}}_n:\, p_m > 0} q_{n,m}\, A_{t_m t_n}^T \pi_m \Big)^T x_n = 0.
$$

Hence $x_n$ solves the linear program (6.6).

Remark 6.1 The complete recourse property of the matrices $A_{tt}\ \forall t$, and hence their full row rank, implies, provided (6.2) is solvable at all, that we may choose a solution in such a way that $x_n$ is, for each node $n$, a basic solution with respect to a basis matrix $B(n)$ out of $A_{t_n t_n}$. Solving (6.6) (with the $\pi_m$ being solutions of (6.4)) then yields the basic solution $x_n$ with the vector $\tilde{x}_n(\xi^n)$ of basic variables

$$
\tilde{x}_n(\xi^n) = B(n)^{-1} \Big( b_n(\xi^n) - \sum_{m \in \mathcal{H}_n:\, p_m > 0} A_{t_n t_m}\, x_m(\xi^m) \Big),
$$

the $x_m(\xi^m)$ being basic solutions with respect to the matrices $A_{t_m t_m}$ as well. This way we get the solution $\{x_n(\xi^n),\ n \in \mathcal{N}\}$ as a linear affine transform of $\xi^n$ for all $n \in \mathcal{N}$.

Let us now consider, for any node $n$ with $p_n > 0$, instead of the fixed right-hand side $b_n = b_{t_n}(\xi^n)$, the right-hand side $\tilde{b}_n(\xi_{t_n}) := b_{t_n}(\xi^{h_n}, \xi_{t_n})$ depending on the random $\xi_{t_n}$, and the convex function in $\xi_{t_n}$

$$
(6.7)\qquad
k_n(\tilde{b}_n(\xi_{t_n})) := \begin{cases}
\min \Big( c_{t_n} - \sum_{m \in \bar{\mathcal{F}}_n:\, p_m > 0} q_{n,m}\, A_{t_m t_n}^T \pi_m \Big)^T x_n \\
A_{t_n t_n} x_n = \tilde{b}_n(\xi_{t_n}) - \sum_{m \in \mathcal{H}_n:\, p_m > 0} A_{t_n t_m} x_m \\
x_n \ge 0
\end{cases}
$$

and its conditional expectation

$$
(6.8)\qquad
I_n = E\big[ k_n(\tilde{b}_n(\xi_{t_n})) \,\big|\, \Xi_n^{t_n} \big] = E\big[ k_n(b_{t_n}(\xi^{h_n}, \xi_{t_n})) \,\big|\, \Xi_n^{t_n} \big],
$$

bounded below by Jensen's bound $L_n := k_n(b_n) = 0$ (the optimal value of (6.6)) and above by $U_n$, the E-M bound with respect to $P(\cdot \mid \Xi_n^{t_n})$. Based on some tolerance $\Delta > 0$ we may therefore decide to split node $i$ if $\delta_i := U_i - L_i = U_i \ge \Delta$, according to the following procedure (see Fig. 4; a code sketch follows after the figure).

Cut and paste

S1 Subdivide $\Xi_i^{t_i}$ into two disjoint subintervals $\Xi^1$ and $\Xi^2$; compute

$$
\tilde{p}_1 = P(\xi_{t_i} \in \Xi^1),\quad \tilde{p}_2 = P(\xi_{t_i} \in \Xi^2),\quad
r_1 = \frac{\tilde{p}_1}{\bar{p}_i},\quad r_2 = \frac{\tilde{p}_2}{\bar{p}_i},
\quad \text{with } \bar{p}_i = P(\Xi_i^{t_i}),
$$

and

$$
b^1 = E[\tilde{b}_i(\xi_{t_i}) \mid \Xi^1],\quad b^2 = E[\tilde{b}_i(\xi_{t_i}) \mid \Xi^2],
$$

such that $r_1 + r_2 = 1$ and $r_1 b^1 + r_2 b^2 = b_i$.

S2 Let $\mathcal{T}_1 = (\mathcal{N}_1, \mathcal{A}_1)$ with $\mathcal{N}_1 \subset \mathcal{N}$, $\mathcal{A}_1 \subset \mathcal{A}$ be the maximal subtree of $\mathcal{T} = (\mathcal{N}, \mathcal{A})$ rooted at $i \in \mathcal{N}$. Let $\mathcal{T}_2 = (\mathcal{N}_2, \mathcal{A}_2)$ be a copy of $\mathcal{T}_1$, its root denoted as $j \notin \mathcal{N}$, and all other node labels modified such that $\mathcal{N}_2 \cap \mathcal{N} = \emptyset$, $\mathcal{A}_2 \cap \mathcal{A} = \emptyset$. Associate with the nodes of $\mathcal{T}_2$ the same quantities as correspond to the respective nodes of $\mathcal{T}_1$.

S3 Update the values of these subtrees as follows (with $\mathcal{H}_i$ the history of $i \in \mathcal{N}$ in $\mathcal{T}$, and $\tilde{\mathcal{H}}_n$ the history of $n$ in $\mathcal{T}_1$ and $\mathcal{T}_2$, respectively):

$\mathcal{T}_1$: Set $b_i = b^1$, and for $n \in \mathcal{F}_i$

$$
b_n^1 := E\Big[ b_{t_n} \,\Big|\, {\textstyle\prod_{\mu \in \mathcal{H}_i}} \Xi_{\ell_\mu}^{\mu} \times \Xi^1 \times {\textstyle\prod_{\mu \in \tilde{\mathcal{H}}_n \setminus \{i\}}} \Xi_{\ell_\mu}^{\mu} \Big];
$$

multiply node probabilities by $r_1$.

$\mathcal{T}_2$: Set $b_j = b^2$, and for $n \in \mathcal{F}_j$

$$
b_n^2 := E\Big[ b_{t_n} \,\Big|\, {\textstyle\prod_{\mu \in \mathcal{H}_i}} \Xi_{\ell_\mu}^{\mu} \times \Xi^2 \times {\textstyle\prod_{\mu \in \tilde{\mathcal{H}}_n \setminus \{j\}}} \Xi_{\ell_\mu}^{\mu} \Big];
$$

multiply node probabilities by $r_2$.

S4 Denote the parent node of $i$ by $n_p$. Introduce a new edge from $n_p$ to $j$ and thus paste $\mathcal{T}_2$ to $\mathcal{T}$, yielding the new tree (see Fig. 4)

$$
\mathcal{T}^+ = (\mathcal{N}^+, \mathcal{A}^+), \quad \text{with } \mathcal{N}^+ = \mathcal{N} \cup \mathcal{N}_2 \text{ and } \mathcal{A}^+ = \mathcal{A} \cup \mathcal{A}_2 \cup \{(n_p, j)\}.
$$

Observe that the updates of S1 and S3 can take advantage of the assumed stagewise stochastic independence of the $\xi_t$ and of $b_t(\vec{\xi}_t)$ being linear affine.

[Figure 4: Cut and paste. The subtree rooted at node $i$ (reached from its parent $n_p$) is duplicated; the copy, rooted at the new node $j$, is attached to $n_p$, so that the branches through $i$ and $j$ carry $\Xi^1$ and $\Xi^2$, respectively.]
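The "Cut and paste" steps map directly onto the tree structure. The following sketch is ours (a dict-based tree with hypothetical names): it duplicates the subtree, rescales the node probabilities by $r_1$ and $r_2$, and pastes the copy next to the original; recomputing the conditional means $b_n^1, b_n^2$ of the copied futures (part of S3) depends on the underlying distribution and is only marked as a hook.

```python
# Sketch of "Cut and paste" (S1-S4) on a dict-based scenario tree:
# tree[n] = {'prob': p_n, 'b': b_n, 'parent': h_n, 'children': [...]}
import itertools

_fresh = itertools.count(10_000)      # fresh labels for the copied nodes (S2)

def cut_and_paste(tree, i, r1, r2, b1, b2):
    def clone(n, parent):             # S2: copy the maximal subtree rooted at n
        j = next(_fresh)
        tree[j] = {'prob': tree[n]['prob'], 'b': tree[n]['b'],
                   'parent': parent, 'children': []}
        for c in tree[n]['children']:
            tree[j]['children'].append(clone(c, j))
        return j

    def rescale(n, r):                # S3: multiply node probabilities by r
        tree[n]['prob'] *= r
        for c in tree[n]['children']:
            rescale(c, r)

    j = clone(i, tree[i]['parent'])
    tree[tree[i]['parent']]['children'].append(j)   # S4: new edge (n_p, j)
    rescale(i, r1)
    rescale(j, r2)
    tree[i]['b'], tree[j]['b'] = b1, b2   # S1/S3: conditional means b^1, b^2
    # (updating b_n for n in the two copied futures would go here, cf. S3)
    return j
```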

Following an argument used already in [12], we get

Proposition 6.1 With $V$ being the optimal value of the fully aggregated problem (6.2) on $\mathcal{T}$ and $V^+$ the optimal value of the respective problem on $\mathcal{T}^+$, it follows that $V^+ \ge V$.

Proof: Let $\{u_n,\ n \in \mathcal{N}\}$ be a solution to the dual program (6.3) associated with $\mathcal{T}$. To any node $n \in \mathcal{N}_2$ assign the vector $u_n$ belonging to the corresponding node $n \in \mathcal{N}_1$. Now define, for $n \in \mathcal{N}^+$,

$$
\tilde{u}_n = \begin{cases}
r_1 u_n & \text{if } n \in \mathcal{N}_1 \\
r_2 u_n & \text{if } n \in \mathcal{N}_2 \\
u_n & \text{else.}
\end{cases}
$$

We show that $\{\tilde{u}_n,\ n \in \mathcal{N}^+\}$ is a feasible solution to the dual program (6.3) associated with $\mathcal{T}^+$:

1) $n \in \mathcal{N}_1$:

$$
A_{t_n t_n}^T \tilde{u}_n + \sum_{m \in \mathcal{F}_n:\, p_m > 0} A_{t_m t_n}^T \tilde{u}_m
= r_1 \Big( A_{t_n t_n}^T u_n + \sum_{m \in \mathcal{F}_n:\, p_m > 0} A_{t_m t_n}^T u_m \Big)
\le r_1\, p_n c_{t_n} = p_n^+ c_{t_n},
$$

with $p_n^+$ the updated node probability in $\mathcal{N}_1$.

2) $n \in \mathcal{N}_2$: analogously.

3) $n \in \mathcal{N}^+ \setminus (\mathcal{N}_1 \cup \mathcal{N}_2) =: \Delta\mathcal{N}$:

$$
A_{t_n t_n}^T \tilde{u}_n + \sum_{m \in \mathcal{F}_n:\, p_m > 0} A_{t_m t_n}^T \tilde{u}_m
= A_{t_n t_n}^T u_n + \sum_{m \in \mathcal{F}_n \cap \Delta\mathcal{N}:\, p_m > 0} A_{t_m t_n}^T u_m
+ \sum_{m \in \mathcal{F}_n \cap \mathcal{N}_1:\, p_m > 0} \underbrace{(r_1 + r_2)}_{=1} A_{t_m t_n}^T u_m
\le p_n c_{t_n} = p_n^+ c_{t_n}.
$$

For the dual objective value of this feasible solution we get

$$
\sum_{n \in \mathcal{N}^+} \tilde{b}_n^T \tilde{u}_n
= \sum_{n \in \Delta\mathcal{N}} b_n^T u_n
+ \underbrace{(r_1 b^1 + r_2 b^2)}_{= b_i}{}^T u_i
+ \sum_{n \in \mathcal{N}_1 \setminus \{i\}} \underbrace{(r_1 b_n^1 + r_2 b_n^2)}_{= b_n}{}^T u_n
= \sum_{n \in \mathcal{N}} b_n^T u_n,
$$

such that the objective value of the feasible solution $\{\tilde{u}_n,\ n \in \mathcal{N}^+\}$ for $\mathcal{T}^+$ coincides with the optimal value for $\mathcal{T}$, and hence $V^+ \ge V$. $\Box$

Summarizing these results, the following steps are suggested:

Node splitting I (for optimality)

S1 Initialize: For the coarse subfiltration $\hat{\mathbb{F}}$ with $\hat{\mathcal{F}}_t$ generated by $\{\xi_t^{-1}[\Xi^t], \emptyset\}$, define the basic tree of Fig. 2, $\mathcal{T}^1 = (\mathcal{N}^1, \mathcal{A}^1)$, consisting of a single path, $\mathcal{S}^1$ being the single scenario, with $p_{\mathcal{S}^1} = 1$, $p_n^1 = 1\ \forall n \in \mathcal{N}^1$, $q_{nm}^1 = 1\ \forall n < T,\ \forall m \in \mathcal{F}_n$, $b_n^1 = E[b_{t_n}]$, and $k := 1$.

S2 Solve the aggregated problem: Solve the fully aggregated LP (6.2) associated with the tree $\mathcal{T}^k$, yielding the primal-dual pair of solutions $\{x_n^k, u_m^k \mid n, m \in \mathcal{N}^k\}$ and the optimal value $V^k$.

S3 Bounds on nodes: Compute for each $n \in \mathcal{N}^k$ the E-M bound $U_n^k$ for (6.8), using DAPPROX, and determine $\delta_n^k := U_n^k\ \forall n \in \mathcal{N}^k$.

S4 Splitting rule I: Compute $\Delta^k = p_{n_0}^k \delta_{n_0}^k = \max_{n \in \mathcal{N}^k} p_n^k \delta_n^k$. If (for some choice of $\varepsilon > 0$) $\Delta^k > \varepsilon$, then apply the "Cut and paste" procedure described before at node $i = n_0$, yielding the new scenario tree $\mathcal{T}^{k+1}$; with $k := k + 1$ return to step S2. Else, if $\Delta^k \le \varepsilon$, go to "Node splitting II" (next subsection).

According to Prop. 6.1, $\{V^k\}$ is a monotonically increasing sequence which, under our assumptions, according to Prop. 5.1 is bounded above by the optimal value $\hat{V}$ of (1.2). Hence, suppressing in S4 the last exit to the yet unknown step "Node splitting II", $\{V^k\}$ would converge to some value $V^\star \le \hat{V}$. This reduced method was implemented within our system SLP-IOR and worked fairly well with many randomly generated test examples. However, it may indeed happen that this procedure stops before reaching the optimal value of (1.2), i.e. with $V^\star < \hat{V}$, as e.g. in the following example.

7  An example

Consider the following example:

$$
\begin{aligned}
\min\ & x_1 + x_2 + E\{y_1 + y_2 + z_1 + z_2\} \\
\text{s.t. }\ & x_1 - x_2 = 0 \\
& x_1 + 2x_2 + 3y_1 - 3y_2 = \xi_2 \\
& x_1 + 3x_2 + y_1 - y_2 + 4z_1 - 4z_2 = \xi_3 \\
& x_i,\ y_i,\ z_i \ge 0,
\end{aligned}
$$

where $\xi_2 \sim \mathcal{U}[0, 6]$ and $\xi_3 \sim \mathcal{U}[1, 1.5]$, with $\mathcal{U}$ being the uniform distribution. For the fully aggregated problem with $E[\xi_2] = 3$ and $E[\xi_3] = 1.25$ as right-hand side we get

$$
(x_1, x_2, \bar{y}_1, \bar{y}_2, \bar{z}_1, \bar{z}_2) = \big( 0,\ 0,\ 1,\ 0,\ \tfrac{1}{16},\ 0 \big)
$$

with the optimal value

$$
V = \frac{17}{16}
$$

and the dual solution

$$
\pi = \Big( \frac{1}{4},\ \frac{1}{4},\ \frac{1}{4} \Big)^T.
$$

Considering the node functions $k_n(\xi_n)$ defined in (6.7) as

$$
k_n(\xi_n) := \begin{cases}
\min \Big( c_{t_n} - \sum_{m \in \bar{\mathcal{F}}_n:\, p_m > 0} q_{n,m}\, A_{t_m t_n}^T \pi_m \Big)^T x_n \\
A_{t_n t_n} x_n = \xi_n - \sum_{m \in \mathcal{H}_n:\, p_m > 0} A_{t_n t_m} x_m \\
x_n \ge 0,
\end{cases}
$$

for $n = 2$, with

$$
c_{t_2} = \begin{pmatrix} 1 \\ 1 \end{pmatrix},\quad q_{22} = q_{23} = 1,\quad
A_{t_2 t_2}^T = \begin{pmatrix} 3 \\ -3 \end{pmatrix},\quad
A_{t_3 t_2}^T = \begin{pmatrix} 1 \\ -1 \end{pmatrix},\quad
\pi_2 = \pi_3 = \frac{1}{4},
$$

we get

$$
k_2(\xi_2) = \min\ \{ 0 \cdot y_1 + 2 y_2 \mid 3 y_1 - 3 y_2 = \xi_2,\ y_i \ge 0 \},
$$

yielding $k_2(\xi_2) \equiv 0$ for $\xi_2 \in [0, 6]$, i.e. $k_2$ is linear on $\operatorname{supp} \xi_2$ and hence $U_2 - L_2 = 0$. Analogously, for

$$
k_3(\xi_3) = \min\{ 0 \cdot z_1 + 2 z_2 \mid 4 z_1 - 4 z_2 = \xi_3 - 1,\ z_i \ge 0 \}
$$

we get $k_3(\xi_3) \equiv 0$ for $\xi_3 \in [1, 1.5]$, i.e. also $k_3$ is linear on $\operatorname{supp} \xi_3$ and hence $U_3 - L_3 = 0$, such that, due to the stopping rule, the above procedure would end here.

However, subdividing the interval $[0, 6]$ into $[0, 3]$ and $[3, 6]$ yields the problem

$$
\begin{aligned}
\min\ & x_1 + x_2 + \tfrac{1}{2}(y_1^1 + y_2^1 + y_1^2 + y_2^2) + \tfrac{1}{2}(z_1^1 + z_2^1 + z_1^2 + z_2^2) \\
\text{s.t. }\ & x_1 - x_2 = 0 \\
& x_1 + 2x_2 + 3y_1^1 - 3y_2^1 = \tfrac{3}{2} \\
& x_1 + 2x_2 + 3y_1^2 - 3y_2^2 = \tfrac{9}{2} \\
& x_1 + 3x_2 + y_1^1 - y_2^1 + 4z_1^1 - 4z_2^1 = \tfrac{5}{4} \\
& x_1 + 3x_2 + y_1^2 - y_2^2 + 4z_1^2 - 4z_2^2 = \tfrac{5}{4} \\
& x_i,\ y_i^k,\ z_i^k \ge 0,
\end{aligned}
$$

having the solution

$$
x_1 = x_2 = 0,\quad y_1^1 = \tfrac{1}{2},\ y_2^1 = 0,\quad y_1^2 = \tfrac{3}{2},\ y_2^2 = 0,\quad
z_1^1 = \tfrac{3}{16},\ z_2^1 = z_1^2 = 0,\ z_2^2 = \tfrac{1}{16},
$$

and hence the optimal value

$$
V^+ = \frac{18}{16} > V,
$$

such that the stopping rule for this example did not imply (near-)optimality. Similarly, leaving $\operatorname{supp} \xi_2 = [0, 6]$ unchanged and splitting $\operatorname{supp} \xi_3 = [1, 1.5]$ into the two subintervals $[1, 1.25]$ and $[1.25, 1.5]$ yields again an optimal value of

$$
V^{++} = \frac{18}{16} > V,
$$

in contrast to our stopping rule.

Appendix A: Computational details for Example 5.1

Given the data of Ex. 5.1, for any given first stage solution $x \ge 0$ the second stage solution $y_i(\xi_2) \ge 0$, $i = 1, 2$, minimizing the second stage objective $y_1(\xi_2) + 2 y_2(\xi_2)$, is given according to:

a) $\xi_2 < x \ \Rightarrow\ y_1(\xi_2) = 0,\ y_2(\xi_2) = x - \xi_2$;

b) $\xi_2 \ge x \ \Rightarrow\ y_1(\xi_2) = \xi_2 - x,\ y_2(\xi_2) = 0$.

In both cases $x + y_1(\xi_2) - y_2(\xi_2) = \xi_2$. Then for the third stage, minimizing the third stage objective yields, for both of the cases a) and b) above:

$$
x + y_1(\xi_2) - y_2(\xi_2) \le \xi_3 \ \Rightarrow\ z_1(\xi_2, \xi_3) = \xi_3 - \xi_2,\ z_2(\xi_2, \xi_3) = 0;
$$
$$
x + y_1(\xi_2) - y_2(\xi_2) > \xi_3 \ \Rightarrow\ z_1(\xi_2, \xi_3) = 0,\ z_2(\xi_2, \xi_3) = \xi_2 - \xi_3.
$$

[Figure 5: Density $f(\xi_2, \xi_3)$ on $\operatorname{supp}(\xi_2, \xi_3) = \Xi^2 \times \Xi^3 = \Xi^2 \times (\Xi^{31} \cup \Xi^{32})$: value $1 + \varepsilon$ on the two diagonal squares $[0, \tfrac{1}{2}]^2$ and $[\tfrac{1}{2}, 1]^2$, value $1 - \varepsilon$ on the two off-diagonal squares.]

Hence the objective value is, for $0 \le x \le 1$,

$$
V(x) = 2x + \int_{\xi_2 = 0}^{x} 2(x - \xi_2)\, d\xi_2 + \int_{\xi_2 = x}^{1} (\xi_2 - x)\, d\xi_2
+ \int_{\Xi^2 \times \Xi^3} |\xi_3 - \xi_2|\, f(\xi_2, \xi_3)\, d\xi_2\, d\xi_3
$$
$$
= 2x + x^2 + \frac{x^2}{2} - x + \frac{1}{2} + \int_{\Xi^2 \times \Xi^3} |\xi_3 - \xi_2|\, f(\xi_2, \xi_3)\, d\xi_2\, d\xi_3.
$$

For the last integral we get

$$
\int_{\Xi^2 \times \Xi^3} |\xi_3 - \xi_2|\, f(\xi_2, \xi_3)\, d\xi_2\, d\xi_3
= \underbrace{\int_{\xi_2 = 0}^{1} \int_{\xi_3 = \xi_2}^{1} (\xi_3 - \xi_2)\, f(\xi_2, \xi_3)\, d\xi_3\, d\xi_2}_{A}
+ \underbrace{\int_{\xi_3 = 0}^{1} \int_{\xi_2 = \xi_3}^{1} (\xi_2 - \xi_3)\, f(\xi_2, \xi_3)\, d\xi_2\, d\xi_3}_{B},
$$

where $A = B$ for symmetry reasons (see Fig. 5):

$$
\begin{aligned}
A &= \int_{\xi_2 = 0}^{1/2} \int_{\xi_3 = \xi_2}^{1/2} (\xi_3 - \xi_2)\, f(\xi_2, \xi_3)\, d\xi_3\, d\xi_2
+ \int_{\xi_2 = 0}^{1/2} \int_{\xi_3 = 1/2}^{1} (\xi_3 - \xi_2)\, f(\xi_2, \xi_3)\, d\xi_3\, d\xi_2 \\
&\quad + \int_{\xi_2 = 1/2}^{1} \int_{\xi_3 = \xi_2}^{1} (\xi_3 - \xi_2)\, f(\xi_2, \xi_3)\, d\xi_3\, d\xi_2 \\
&= \frac{1}{2} \cdot \frac{1}{24}\, (1 + \varepsilon) + \frac{1}{2} \cdot \frac{1}{4}\, (1 - \varepsilon) + \frac{1}{2} \cdot \frac{1}{24}\, (1 + \varepsilon)
= \frac{2 - \varepsilon}{12},
\end{aligned}
$$

such that

$$
A + B = \frac{2 - \varepsilon}{6}
$$

and

$$
V(x) = \frac{3}{2} x^2 + x + \frac{1}{2} + \frac{2 - \varepsilon}{6} = \frac{3}{2} x^2 + x + \frac{5 - \varepsilon}{6},
$$

and therefore

$$
V^\star = \min_{x \ge 0} V(x) = \frac{5 - \varepsilon}{6}.
$$
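Since the optimal recourse decisions above are available in closed form, the per-scenario cost for a given $x$ is $2x + (\xi_2 - x)^+ + 2(x - \xi_2)^+ + |\xi_3 - \xi_2|$, so $V(x)$ and $V^\star$ can also be cross-checked by simulation. A sketch of ours (the sampler exploits that $f$ is piecewise constant on four squares):

```python
# Monte Carlo cross-check of V(x) = 1.5*x^2 + x + (5 - eps)/6 for Example 5.1.
import numpy as np

rng = np.random.default_rng(0)

def sample_xi(eps, n):
    """Draw (xi2, xi3) from f: mass (1+eps)/4 on each diagonal square
    [0,.5]^2 and [.5,1]^2, mass (1-eps)/4 on each off-diagonal square."""
    probs = np.array([1 + eps, 1 + eps, 1 - eps, 1 - eps]) / 4.0
    corners = np.array([[0.0, 0.0], [0.5, 0.5], [0.0, 0.5], [0.5, 0.0]])
    idx = rng.choice(4, size=n, p=probs)
    return corners[idx] + rng.uniform(0.0, 0.5, size=(n, 2))

def V_hat(x, xi):
    """Sample mean of the closed-form per-scenario cost at first stage x."""
    xi2, xi3 = xi[:, 0], xi[:, 1]
    cost = (2 * x + np.maximum(xi2 - x, 0) + 2 * np.maximum(x - xi2, 0)
            + np.abs(xi3 - xi2))
    return cost.mean()

eps = 0.4
xi = sample_xi(eps, 200_000)
print(V_hat(0.3, xi), 1.5 * 0.3**2 + 0.3 + (5 - eps) / 6)   # V(0.3)
print(min(V_hat(x, xi) for x in np.linspace(0, 1, 101)),     # approx. V*
      (5 - eps) / 6)
```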

References

[1] P. Billingsley. Convergence of Probability Measures. Wiley, New York, 1968.

[2] J.R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer Series in Operations Research. Springer-Verlag, Berlin/Heidelberg, 1997.

[3] J. Dupačová. Minimax stochastic programs with nonconvex nonseparable penalty functions. In A. Prékopa, editor, Progress in Operations Research, pages 303–316. North-Holland Publ. Co., 1976.

[4] J. Dupačová. Minimax stochastic programs with nonseparable penalties. In K. Iracki, K. Malanowski, and S. Walukiewicz, editors, Optimization Techniques, Part I, volume 22 of Lecture Notes in Contr. Inf. Sci., pages 157–163, Berlin, 1980. Springer-Verlag.

[5] H.P. Edmundson. Bounds on the expectation of a convex function of a random variable. Technical Report Paper 982, The RAND Corporation, 1956.

[6] K. Frauendorfer and P. Kall. A solution method for SLP recourse problems with arbitrary multivariate distributions – the independent case. Probl. Contr. Inf. Theory, 17:177–205, 1988.

[7] J.L. Jensen. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Math., 30:173–177, 1906.

[8] P. Kall. On approximations and stability in stochastic programming. In J. Guddat, H. Th. Jongen, B. Kummer, and F. Nožička, editors, Parametric Optimization and Related Topics, pages 387–407. Akademie-Verlag, Berlin, 1987.

[9] P. Kall. Stochastic programming with recourse: Upper bounds and moment problems – a review. In J. Guddat, B. Bank, H. Hollatz, P. Kall, D. Klatte, B. Kummer, K. Lommatzsch, K. Tammer, M. Vlach, and K. Zimmermann, editors, Advances in Mathematical Optimization (Dedicated to Prof. Dr. Dr. hc. F. Nožička), pages 86–103. Akademie-Verlag, Berlin, 1988.

[10] P. Kall and J. Mayer. A model management system for stochastic linear programming. In P. Kall, editor, System Modelling and Optimization (15th IFIP TC7 Conference, Zurich, September 1991), volume 180 of LN Contr. Inform. Sci., pages 580–587. Springer-Verlag, 1992.

[11] P. Kall and J. Mayer. SLP-IOR: An interactive model management system for stochastic linear programs. In A. King, editor, Approximation and Computation in Stochastic Programming, volume 75 of Math. Prog. B, pages 221–240. 1996.

[12] P. Kall and D. Stoyan. Solving stochastic programming problems with recourse including error bounds. Math. Operationsforsch. Statist., Ser. Opt., 13:431–447, 1982.

[13] P. Kall and S.W. Wallace. Stochastic Programming. John Wiley & Sons, Chichester, 1994.

[14] A. Madansky. Bounds on the expectation of a convex function of a multivariate random variable. Ann. Math. Statist., 30:743–746, 1959.

[15] A. Madansky. Inequalities for stochastic linear programming problems. Management Sci., 6:197–204, 1960.

[16] J. Mayer. Stochastic Linear Programming Algorithms: A Comparison Based on a Model Management System. Gordon and Breach Science Publishers, 1998. (Habilitationsschrift, Wirtschaftswiss. Fakultät, Universität Zürich, 1996.)

[17] P. Olsen. Multistage stochastic programming with recourse: The equivalent deterministic problem. SIAM J. Contr. Opt., 14:495–517, 1976.

[18] P. Olsen. When is a multistage stochastic programming problem well defined? SIAM J. Contr. Opt., 14:518–527, 1976.

[19] P. Olsen. Multistage stochastic programming as mathematical programming in an $L_p$ space. SIAM J. Contr. Opt., 14:528–537, 1976.

[20] P. Olsen. Discretizations of multistage stochastic programming problems. Math. Prog. Study, 6:111–124, 1976.

[21] A. Prékopa. Stochastic Programming. Kluwer Academic Publ., 1995.

[22] R.T. Rockafellar and R.J.-B. Wets. Nonanticipativity and $L^1$-martingales in stochastic optimization problems. Math. Prog. Study, 6:170–186, 1976.

[23] R. Wets. Stochastic programming: Solution techniques and approximation schemes. In A. Bachem, M. Grötschel, and B. Korte, editors, Mathematical Programming: The State-of-the-Art, Bonn 1982, pages 566–603. Springer-Verlag, Berlin, 1983.

[24] S.E. Wright. Primal-dual aggregation and disaggregation for stochastic linear programs. Math. Oper. Res., 19:893–908, 1994.