Dynamic programming

Course IE 515 Dynamic Programming
J.B.G. Frenk, G052, E-mail: [email protected]

Literature
1. Chapters 1-3 of: Eric V. Denardo, Dynamic Programming, Dover Publications, 2003
2. Chapter 3 of: Eugene Lawler, Combinatorial Optimization (Networks and Matroids), Dover Books on Mathematics, 2011
3. J. Bather, Decision Theory (An Introduction to Dynamic Programming and Sequential Decisions), Wiley, New York


Dynamic programming and its use in finite dimensional optimization.

In these course notes we use the sets N := {1, 2, 3, ...}, Z+ := {0, 1, 2, ...} and Z := {..., −1, 0, 1, ...}.

Introduction. To position Dynamic Programming within deterministic finite dimensional optimization we first give a short overview of the different subfields within finite dimensional optimization and the fundamental problems related to those subfields. The following optimization problems occur frequently in practice. We start with the most general one.

Nonlinear Programming. A nonlinear program is given by

υ(P) = inf{f(x1, ..., xn) : (x1, ..., xn) ∈ FP}

with FP some subset of Rn.

In the above formulation it is assumed that an optimal solution exists, so we can replace inf by min. In vector notation this reads

υ(P) = min{f(x) : x ∈ FP}  (P)

with x⊤ = (x1, ..., xn). In most cases the so-called feasible set FP is given by

FP = {x ∈ Rn : gi(x) ≤ bi, i = 1, ..., m}

with gi : Rn → R and bi ∈ R. In general it is not possible to guarantee that a solution returned by an algorithm is indeed an optimal solution. Hence, depending on the interpretation of the problem, heuristics are used.

Literature
R. Horst, P.M. Pardalos, N.V. Thoai, Introduction to Global Optimization, Kluwer Academic Publishers, 1995
D.P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont

J. Nocedal, S.J. Wright, Numerical Optimization, Springer Verlag, 1999

Nonlinear programs are discussed in the course IE 509, Nonlinear Programming.

Basic ideas of nonlinear programming. Before discussing the basic ideas of nonlinear programming we introduce the next definition.

Definition 1. Let f : Rn → R be a given function. The function f has a directional derivative f′(x, d) at x in the direction d ∈ Rn if the values

ft(x; d) := (f(x + td) − f(x))/t, t > 0  (1)

converge to a limit as t ↓ 0, i.e.

f′(x, d) := lim_{t↓0} ft(x; d).  (2)

An important result is now given by the following.

Lemma 1. If the function f : Rn → R is continuously differentiable with gradient ∇f(x), then for any x ∈ Rn and direction d ∈ Rn the directional derivative f′(x, d) exists and satisfies

f′(x, d) = ∇f(x)⊤d.  (3)
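Lemma 1 can be illustrated numerically: for small t > 0 the difference quotient in relation (1) should be close to ∇f(x)⊤d. A minimal sketch in Python; the quadratic test function, the point x, the direction d and the tolerances are illustrative assumptions, not taken from the notes:

```python
# Numerical check of Lemma 1: f'(x, d) = grad f(x)^T d for a smooth f.
# Illustrative test function f(x) = x1^2 + 3*x1*x2.

def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def grad_f(x):
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]

def difference_quotient(f, x, d, t):
    """f_t(x; d) = (f(x + t d) - f(x)) / t, see relation (1)."""
    xt = [xi + t * di for xi, di in zip(x, d)]
    return (f(xt) - f(x)) / t

def directional_derivative(f, x, d, t=1e-7):
    """Approximate f'(x, d) by the quotient at a small t > 0."""
    return difference_quotient(f, x, d, t)

x, d = [1.0, 2.0], [0.5, -1.0]
exact = sum(g * di for g, di in zip(grad_f(x), d))   # grad f(x)^T d
approx = directional_derivative(f, x, d)
print(abs(approx - exact) < 1e-5)  # True
```

The error of the quotient is of order t times the curvature of f along d, which is why a small but not too small t suffices.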

We consider first the unconstrained optimization problem

min{f(x) : x ∈ Rn}  (P)

and assume that the function f is continuously differentiable with gradient ∇f(x).


Definition 2. A vector x* ∈ Rn is called an optimal solution of optimization problem (P) if f(x) ≥ f(x*) for every x ∈ Rn.

If we introduce the so-called Euclidean norm of the vector x given by

∥x∥ := (∑_{j=1}^n xj²)^{1/2},  (4)

then the next algebraic characterization of an optimal solution of optimization problem (P) is well known.

Lemma 2. (First and second order characterization of an optimal solution)
1. If x* is an optimal solution of optimization problem (P), then ∇f(x*) = 0.
2. If the function f : Rn → R is twice continuously differentiable and x* is an optimal solution of optimization problem (P), then its Hessian ∇²f(x*) := (∂²f/∂xi∂xj (x*)) satisfies

d⊤∇²f(x*)d ≥ 0  (5)

for every d ∈ Rn (the Hessian matrix is positive semidefinite).

Proof. For the direction d = −∇f(x*) and x* an optimal solution it follows by Lemma 1 that

−∥∇f(x*)∥² = −∇f(x*)⊤∇f(x*) = f′(x*, d) = lim_{t↓0} (f(x* − t∇f(x*)) − f(x*))/t ≥ 0,

where the last inequality holds since x* is optimal. As clearly −∥∇f(x*)∥² ≤ 0, we have shown that ∥∇f(x*)∥² = 0. This shows the first result.

To verify the second result we observe, using ∇f(x*) = 0 and the Taylor series expansion of the one-dimensional function t ↦ f(x* + td), that

f(x* + td) = f(x*) + t∇f(x*)⊤d + (t²/2) d⊤∇²f(x*)d + o(t²) = f(x*) + (t²/2) d⊤∇²f(x*)d + o(t²)

with o : R → R a function satisfying lim_{t→0} o(t)t⁻¹ = 0. This shows for every d and t > 0 that

d⊤∇²f(x*)d = 2(f(x* + td) − f(x*))/t² − 2o(t²)/t² ≥ −2o(t²)/t²

and taking t ↓ 0 we obtain d⊤∇²f(x*)d ≥ 0.

Remark 1. For a detailed overview of derivatives and basic calculus the reader is referred to Chapter 3 of Ortega and Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, SIAM (Classics in Applied Mathematics), 2000.

First basic idea in the construction of an algorithm. The main idea of any unconstrained optimization algorithm is to select iteratively vectors xk ∈ Rn in such a way that the function value decreases at the next iteration point. This means that

xk+1 = xk + tk dk  (6)

with the descent direction dk chosen (if possible) in such a way that

∇f(xk)⊤dk < 0  (7)

and tk ≥ 0 satisfying

f(xk + tk dk) < f(xk).  (8)


All algorithms selecting the next iteration point in this way are called gradient type algorithms. A possible way to select tk (but not the only possible way, see for example Section 1.3 of D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, 1982) is

f(xk + tk dk) = min_{t≥0} f(xk + tdk).

This part of the algorithm is called the selection of the step size in the direction dk. The following pseudocode, with the accuracy ϵ > 0 selected by the optimizer, holds for any gradient type algorithm.

Pseudocode gradient type algorithm.
Step 1. Choose some x0 ∈ Rn, set k := 0 and go to Step 2.
Step 2. If ∥∇f(xk)∥ ≤ ϵ stop and go to Step 3. Otherwise let dk be any direction satisfying

∇f(xk)⊤dk < 0  (9)

and select the step size tk > 0 in such a way that

f(xk + tk dk) < f(xk).  (10)

Set xk+1 = xk + tk dk, k := k + 1 and repeat Step 2.
Step 3. Output xk and f(xk) and select xk as an approximation of the optimal solution.

A well known choice for the direction dk is given by the steepest descent direction

dk := −∇f(xk).  (11)
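The pseudocode above can be sketched as follows. The convex quadratic test function is an illustrative assumption, and instead of an exact line search the sketch simply halves the step size until the descent condition (10) holds:

```python
# Sketch of a gradient type algorithm with the steepest descent direction
# d_k = -grad f(x_k) and a step-size halving rule satisfying relation (10).

def f(x):
    # illustrative strictly convex quadratic: f(x) = (x1 - 1)^2 + 2*(x2 + 2)^2
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2

def grad_f(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]

def norm(v):
    return sum(vi * vi for vi in v) ** 0.5

def steepest_descent(f, grad_f, x0, eps=1e-8, max_iter=10_000):
    x = list(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        if norm(g) <= eps:              # Step 2 stopping rule
            break
        d = [-gi for gi in g]           # steepest descent direction (11)
        t = 1.0
        while f([xi + t * di for xi, di in zip(x, d)]) >= f(x):
            t *= 0.5                    # shrink until f decreases, relation (10)
        x = [xi + t * di for xi, di in zip(x, d)]
    return x

x_star = steepest_descent(f, grad_f, [5.0, 5.0])
print(x_star)  # approximately [1.0, -2.0], the unique minimizer
```

For this test function the limit point is the global minimizer; as the notes stress, in general only convergence to a stationary point can be guaranteed.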

Analysis of the steepest descent algorithm. As a representative of a gradient type algorithm we now discuss the theoretical behavior of the steepest descent algorithm.

Lemma 3. If the function f is continuously differentiable, the direction dk = −∇f(xk) is chosen as the steepest descent direction and

f(xk + tk dk) = min_{t≥0} f(xk + tdk),

then lim_{k↑∞} xk = x∞ implies ∇f(x∞) = 0.

Proof. If after k iterations it follows that dk = −∇f(xk) = 0, then by relation (6) it follows that xn = xk for n ≥ k and we obtain the desired result. Therefore we only need to analyze the case dk = −∇f(xk) ≠ 0 for every k ∈ N. Since ∇f(xk)⊤dk < 0, and so f(xk + tdk) − f(xk) < 0 for some small positive t, the minimum tk of the optimization problem min_{t≥0} f(xk + tdk) is attained in the interval (0, ∞). This shows by the first part of Lemma 2 that

0 = df/dt (xk + tk dk) = ∇f(xk + tk dk)⊤dk = −∇f(xk+1)⊤∇f(xk)

and we obtain for every k


0 ≤ ∥∇f(xk)∥² = ∇f(xk)⊤(∇f(xk) − ∇f(xk+1)).  (12)

Since by assumption lim_{k↑∞} xk = x∞ and f is continuously differentiable, it follows that lim_{k↑∞} ∇f(xk) = ∇f(x∞) and so lim_{k↑∞} (∇f(xk) − ∇f(xk+1)) = 0. By relation (12) this implies

0 ≤ ∥∇f(x∞)∥² = lim_{k↑∞} ∥∇f(xk)∥² = lim_{k↑∞} ∇f(xk)⊤(∇f(xk) − ∇f(xk+1)) = 0

and hence we have shown that ∇f(x∞) = 0.

Second basic idea for constructing an algorithm. Another idea is to replace the difficult twice continuously differentiable function f in a neighborhood of xk by its easier second order Taylor approximation

fk(x) = f(xk) + ∇f(xk)⊤(x − xk) + (1/2)(x − xk)⊤∇²f(xk)(x − xk)

with ∇²f(xk) the Hessian, and solve instead of the original problem the approximated problem

min{fk(x) : x ∈ Rn}.  (Pap)

In case the matrix ∇²f(xk) = (∂²f/∂xi∂xj (xk)) is positive definite (see Lemma 2) and hence invertible, it follows by part 1 of Lemma 2 that an optimal solution of optimization problem (Pap) must satisfy

∇f(xk) + ∇²f(xk)(xk+1 − xk) = 0

and so the ONLY solution of this system is

xk+1 = xk − ∇²f(xk)⁻¹∇f(xk).

This must therefore be the optimal solution of optimization problem (Pap), and we take it as the next iteration point xk+1. A generalization of this is given by

xk+1 = xk − tk ∇²f(xk)⁻¹∇f(xk)  (13)

with tk > 0 a properly selected step size in the Newton direction −∇²f(xk)⁻¹∇f(xk). All these algorithms are examples of the so-called Newton method. In general Newton type methods converge faster to a stationary point, but they require much more computational power since they involve the computation of the inverse of the Hessian. Due to this, so-called quasi-Newton methods were developed in the literature, in which one tries to approximate the inverse of the Hessian by a simple updating formula. A basic reference to Newton type methods in nonlinear optimization is

J.M. Ortega, W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.

Conclusion. If we apply an algorithm to an unconstrained optimization problem and we know that the sequence xk of generated points converges to a limit x∞, then for most of those algorithms we may conclude, under certain regularity conditions, that ∇f(x∞) = 0. Hence the sequence of generated points converges to a so-called stationary point, and so there is NO GUARANTEE that these algorithms find an optimal solution. This is a fundamental problem in nonlinear optimization. Therefore, without any additional structure on the function, the field of nonlinear programming constructs algorithms which in general converge to a local minimum.
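Returning to the Newton iteration above, a minimal sketch for a function of two variables; the test function, the explicit 2×2 inverse formula and the pure step size tk = 1 are illustrative assumptions:

```python
# Sketch of the pure (t_k = 1) Newton method for a function of two variables.
# Assumes the Hessian is positive definite so the 2x2 inverse exists.

import math

def grad_f(x):
    # gradient of the illustrative function f(x) = x1^4 + x2^2 - 2*x1
    return [4.0 * x[0] ** 3 - 2.0, 2.0 * x[1]]

def hess_f(x):
    return [[12.0 * x[0] ** 2, 0.0], [0.0, 2.0]]

def newton(grad_f, hess_f, x0, eps=1e-10, max_iter=100):
    x = list(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        if math.hypot(g[0], g[1]) <= eps:
            break
        (a, b), (c, d) = hess_f(x)
        det = a * d - b * c                     # invert the 2x2 Hessian
        step = [(d * g[0] - b * g[1]) / det,
                (-c * g[0] + a * g[1]) / det]   # H^{-1} grad f(x)
        x = [x[0] - step[0], x[1] - step[1]]    # Newton update, (13) with t_k = 1
    return x

x_star = newton(grad_f, hess_f, [1.0, 1.0])
# stationary point: 4*x1^3 = 2, i.e. x1 = (1/2)^(1/3), and x2 = 0
```

For larger dimensions one would solve the linear system ∇²f(xk) s = ∇f(xk) instead of forming an explicit inverse, which is exactly the cost that quasi-Newton methods try to avoid.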


In case we consider the optimization problem

υ(P) = min{f(x) : gi(x) ≤ bi, i = 1, ..., m}  (P)

then an algebraic characterization of an optimal solution, and hence a generalization of Lemma 2, is given by the following result. Before discussing this lemma we need the next definition.

Definition 2.
1. A vector x ∈ Rn is called a Fritz-John point (abbreviated FJ point) if there exist scalars λi ≥ 0, i = 0, ..., m satisfying λ = (λ0, ..., λm) ≠ 0 such that the vector x satisfies the system of equations

λ0∇f(x) + ∑_{i=1}^m λi∇gi(x) = 0  (14)

and

λi(gi(x) − bi) = 0, i = 1, ..., m.  (15)


2. A vector x ∈ Rn is called a Karush-Kuhn-Tucker point (abbreviated KKT point) if there exist scalars λi ≥ 0, i = 1, ..., m such that the vector x satisfies the system of equations

∇f(x) + ∑_{i=1}^m λi∇gi(x) = 0  (16)

and

λi(gi(x) − bi) = 0, i = 1, ..., m.  (17)

Remark 2.
1. In the definition of a KKT or FJ point the relations (15) and (17) are called the complementary slackness conditions.
2. Without loss of generality we may assume in the definition of a FJ point that the vector λ belongs to the so-called simplex ∆m+1 given by

∆m+1 := {(λ0, ..., λm) : λi ≥ 0, ∑_{i=0}^m λi = 1}.  (18)

Hence an equivalent definition of a FJ point x is given by the existence of some λ ∈ ∆m+1 such that x satisfies the system of equations

λ0∇f(x) + ∑_{i=1}^m λi∇gi(x) = 0  (19)

and

λi(gi(x) − bi) = 0, i = 1, ..., m.  (20)

3. It is obvious that any KKT point is also a FJ point. In general the set of FJ points is much larger than the set of KKT points.


To show that any optimal solution of the constrained optimization problem (P) is a FJ point we first need the following result.

Lemma 3. If x* is an optimal solution of the constrained optimization problem (P) and J(x*) := {i ∈ {1, ..., m} : gi(x*) = bi} is the set of the so-called active constraints at x*, then it follows for every d ∈ Rn that

max{sf(d), ∇f(x*)⊤d} ≥ 0  (21)

with

sf(d) := −∞ if J(x*) is empty, and sf(d) := max_{i∈J(x*)} ∇gi(x*)⊤d otherwise.

Proof. Prove this by contradiction and assume that for a given d ∈ Rn it holds that

max{sf(d), ∇f(x*)⊤d} < 0.

Use this to contradict that x* is an optimal solution.

To give a so-called dual interpretation of Lemma 3 we need the next result, related to Farkas' lemma. Observe that the strong duality theorem of linear programming used in the next proof can be shown constructively by means of the simplex method.

Lemma 4. Let ci ∈ Rn, i = 1, ..., s be a set of given vectors. The following conditions are equivalent:
1. For every d ∈ Rn it holds that max_{1≤i≤s} ci⊤d ≥ 0.
2. There exists some λ ∈ ∆s, with ∆s defined analogously to relation (18), satisfying ∑_{i=1}^s λici = 0.

Proof. If condition 1 holds then clearly

0 = min_{d∈Rn} max_{1≤i≤s} ci⊤d = min{z : ci⊤d − z ≤ 0, i = 1, ..., s, d ∈ Rn}.

This problem is a linear programming problem, and by the strong duality theorem of linear programming (for an extensive discussion see the course notes Tools of Optimization) we can construct its dual problem, which satisfies

0 = min_{d∈Rn} max_{1≤i≤s} ci⊤d = max{0⊤λ : ∑_{i=1}^s λici = 0, ∑_{i=1}^s λi = 1, λi ≥ 0, i = 1, ..., s}.

Hence the feasible region of the dual is not empty and we have verified part 2. To show the reverse implication we observe, for the scalars λi ≥ 0, ∑_{i=1}^s λi = 1 in part 2, that

max_{1≤i≤s} ci⊤d ≥ ∑_{i=1}^s λici⊤d = (∑_{i=1}^s λici)⊤d = 0.


Combining now Lemmas 3 and 4 we obtain the result we want to prove. Observe that the next lemma is, for constrained optimization problems, the analogue of Lemma 2 for unconstrained optimization problems.

Lemma 5. If x* is an optimal solution of optimization problem (P), then x* is a Fritz-John point.

Proof. By Lemmas 3 and 4 it follows immediately that an optimal solution x* satisfies relation (14) by setting λi = 0 for i ∉ J(x*). Since x* is also feasible, and therefore satisfies gi(x*) ≤ bi, i = 1, ..., m, and λi = 0 for i ∉ J(x*), it also follows that x* satisfies relation (15).

To show that any optimal solution is a KKT point we need to assume a so-called constraint qualification. A well known constraint qualification is introduced in the next definition.

Definition 3. Let x ∈ FP and introduce the set of active constraints at x given by J(x) := {i ∈ {1, ..., m} : gi(x) = bi}. If the set J(x) is nonempty, then the Mangasarian-Fromovitz (MF) constraint qualification for problem (P) is satisfied at x if there exists some d0 ∈ Rn satisfying max_{i∈J(x)} ∇gi(x)⊤d0 < 0.

One can now show the following result.

Lemma 6. If x* is an optimal solution of optimization problem (P) and at x* the Mangasarian-Fromovitz constraint qualification is satisfied, then x* is a Karush-Kuhn-Tucker point.


Proof. Since x* is an optimal solution we know by Lemma 5 that x* is a FJ point. Consider now the following two mutually exclusive cases.

If the set J(x*) is empty, then for every i = 1, ..., m it follows that gi(x*) < bi. This shows, by the complementary slackness conditions in relation (15), that λi = 0 for every i = 1, ..., m, and so by relation (14) it must follow that λ0 > 0. Dividing the FJ conditions in relation (14) by λ0 we obtain the relations (16) for a KKT point.

If the set J(x*) is nonempty and we assume by contradiction that λ0 = 0 in relation (14), then we obtain

∑_{i∈J(x*)} λi∇gi(x*) = 0.

Without loss of generality (see Remark 2) we may assume that ∑_{i∈J(x*)} λi = 1. This implies that

max_{i∈J(x*)} ∇gi(x*)⊤d0 ≥ ∑_{i∈J(x*)} λi∇gi(x*)⊤d0 = 0

and this contradicts the Mangasarian-Fromovitz constraint qualification. Hence it must follow that λ0 > 0, and dividing relation (14) by λ0 > 0 we obtain relation (16).

If a constraint qualification is not satisfied, the next counterexample shows that even for a convex optimization problem an optimal solution might not be a KKT point.

Example 1. It might happen, even for a convex optimization problem (to be introduced in the next subsection), that an optimal solution is not a KKT point. Observe that an optimal solution is always a FJ point, but it might not be a

KKT point, as shown by the following example:

min x1
(x1 − 1)² + (x2 − 1)² ≤ 1
(x1 − 1)² + (x2 + 1)² ≤ 1

The only feasible solution (1, 0) is of course optimal, and one can easily verify that this feasible solution does not satisfy relations (16) and (17). Hence an optimal solution of an optimization problem is not necessarily a KKT point. Since it can be shown for convex programming problems that any feasible KKT point is an optimal solution, we observe for the above convex optimization problem that the set of feasible points which are KKT points is empty. This means that even for convex programs the KKT system might not have a feasible solution.

Remark 3. For general nonlinear optimization problems it does not hold that any FJ point or KKT point is an optimal solution.
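The claim in Example 1 can be checked by evaluating the gradients at x* = (1, 0); a minimal sketch (the multiplier choice λ0 = 0, λ1 = λ2 = 1/2 is one of many that work):

```python
# Check Example 1 at x* = (1, 0) for f(x) = x1,
# g1(x) = (x1-1)^2 + (x2-1)^2 <= 1, g2(x) = (x1-1)^2 + (x2+1)^2 <= 1.

x = (1.0, 0.0)
grad_f = (1.0, 0.0)
grad_g1 = (2.0 * (x[0] - 1.0), 2.0 * (x[1] - 1.0))   # = (0, -2)
grad_g2 = (2.0 * (x[0] - 1.0), 2.0 * (x[1] + 1.0))   # = (0,  2)

# FJ point: lambda0 = 0, lambda1 = lambda2 = 1/2 satisfies relation (14);
# both constraints are active, so complementary slackness (15) holds.
fj = tuple(0.0 * a + 0.5 * b + 0.5 * c
           for a, b, c in zip(grad_f, grad_g1, grad_g2))
print(fj)  # (0.0, 0.0): the FJ system is satisfied with lambda0 = 0

# KKT would require grad_f + l1*grad_g1 + l2*grad_g2 = 0, but the first
# component reads 1 + l1*0 + l2*0 = 1 != 0 for ANY l1, l2 >= 0.
first_component = grad_f[0] + 0.0 * grad_g1[0] + 0.0 * grad_g2[0]
print(first_component)  # 1.0, so no KKT multipliers exist
```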

Again, for constrained optimization problems all so-called methods of feasible directions (see Chapter 10 of Bazaraa, Sherali and Shetty) are iterative procedures given by

xk+1 = T(xk)

with T : Rn → Rn a properly chosen mapping. If the sequence xk converges to a limit point x∞, the best we can show is that x∞ is a FJ point, or if we are lucky a KKT point. However, by the previous observation this result does not imply that the limit point is an optimal solution, and so we do not have ANY GUARANTEE about the quality of the selected limit point. A very nice book discussing these general ideas is W.I. Zangwill, Nonlinear Programming (A Unified Approach), Prentice Hall, 1969. See also M.S. Bazaraa, H.D. Sherali, C.M. Shetty, Nonlinear Programming (Theory and Algorithms), Chapter 7, Wiley, New York, 1993.

Convex optimization. To introduce convex optimization we first introduce the following class of functions.

Definition 4. A set D ⊆ Rn is called a convex set if for every x1, x2 ∈ D the convex combination

αx1 + (1 − α)x2 ∈ D  (22)

for every 0 < α < 1. A function f : D → R with D a convex set is called convex on D if

f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2)  (23)

for any x1, x2 ∈ D and 0 < α < 1. The function f is called concave on D if −f is convex on D.

We first list some well-known properties of finite valued convex functions.

Lemma 7. If f : Rn → R is a convex function and for every t > 0

the function ft : R2n → R is given by (see also relation (1))

ft(x; d) := (f(x + td) − f(x))/t,

then the function t ↦ ft(x; d) is nondecreasing on (0, ∞) and the function d ↦ ft(x; d) is convex.

Proof. Clearly for every t1 > t2 > 0 it follows that

x + t2d = (t2/t1)(x + t1d) + (1 − t2/t1)x.

This shows by the convexity of f that

f(x + t2d) ≤ (t2/t1) f(x + t1d) + (1 − t2/t1) f(x)

and so we obtain

ft2(x; d) = (f(x + t2d) − f(x))/t2 ≤ (t2/t1)(f(x + t1d) − f(x))/t2 = ft1(x; d).


Also it is easy to verify, using the convexity of f, that for every 0 < α < 1

ft(x; αd1 + (1 − α)d2) ≤ αft(x; d1) + (1 − α)ft(x; d2).

A direct consequence of Lemma 7 is the following important observation.

Lemma 8. The directional derivative f′(x; d) of a convex function exists and satisfies f′(x; d) ≤ ft(x; d) for every t > 0. In particular, if the convex function f is differentiable, then for every x and d

∇f(x)⊤d ≤ f(x + d) − f(x).  (24)

Proof. Since by Lemma 7 the function t ↦ ft(x; d) is nondecreasing on (0, ∞), the first part of this result follows by relation (2). The second part follows by Lemma 1.
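The subgradient inequality (24) can be illustrated numerically; the convex test function and the sampling range below are illustrative assumptions:

```python
# Check the subgradient inequality grad f(x)^T d <= f(x + d) - f(x)
# for the convex function f(x) = exp(x1) + x2^2 at random points.

import math
import random

def f(x):
    return math.exp(x[0]) + x[1] ** 2

def grad_f(x):
    return [math.exp(x[0]), 2.0 * x[1]]

random.seed(0)
ok = True
for _ in range(1000):
    x = [random.uniform(-2, 2), random.uniform(-2, 2)]
    d = [random.uniform(-2, 2), random.uniform(-2, 2)]
    lhs = sum(g * di for g, di in zip(grad_f(x), d))
    rhs = f([xi + di for xi, di in zip(x, d)]) - f(x)
    ok = ok and lhs <= rhs + 1e-12   # small slack for rounding
print(ok)  # True: the first order approximation never overestimates f
```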

The inequality in relation (24) is called the subgradient inequality. It states that for a differentiable convex function its first order Taylor approximation is always below the function. In general it is difficult to check computationally whether a given function is convex. The following closure properties are well known and often used to show that a given function is convex.

1. If the functions f, g : Rn → R are convex, then the function αf + βg is convex for every α, β ≥ 0.
2. If the functions fi : Rn → R, i ∈ I are convex, then the function fsup : Rn → (−∞, ∞] given by fsup(x) = sup_{i∈I} fi(x) is convex.
3. If the functions fn : Rn → R, n ∈ N are convex and f(x) = lim_{n↑∞} fn(x) for every x, then the function f is convex.

4. If the function f : X × Y → R is convex with X ⊆ Rn and Y ⊆ Rm convex sets, then the function g : Y → [−∞, ∞) given by g(y) = inf_{x∈X} f(x, y) is convex.

A convex programming problem is given by

v(P) = min{f(x) : x ∈ F}

with the feasible region F given by

F = {x ∈ Rn : gi(x) ≤ bi, i = 1, ..., m}

and the functions f, gi convex on Rn.


Consider now for a continuously differentiable convex function f : Rn → R the unconstrained optimization problem

v(P) = min{f(x) : x ∈ Rn}.  (P)

An extremely useful implication of the subgradient inequality in Lemma 8 is given by the following result.

Lemma 9. The vector x* is an optimal solution of optimization problem (P) if and only if ∇f(x*) = 0.

Proof. We already proved that if x* is an optimal solution of optimization problem (P), then ∇f(x*) = 0. To show the reverse implication, apply the subgradient inequality in relation (24) with x replaced by x*.

Another important result showing the usefulness of convex optimization problems is given by the following.

Lemma 10. If the functions gi, i = 1, ..., m and f are convex, x* belongs to FP and x* is a KKT point satisfying relations (16) and (17), then x* is an optimal solution of the optimization problem (P).

Proof. Since x* satisfies the KKT system we obtain

∇f(x*) = −∑_{i∈J(x*)} λi∇gi(x*)

with J(x*) = {i ∈ {1, ..., m} : gi(x*) = bi} the set of active constraints at x*. Observe that by the complementary slackness conditions it follows that λi = 0 if i ∉ J(x*). This shows, for every x ∈ FP and choosing d := x − x*, that

∇f(x*)⊤d = −∑_{i∈J(x*)} λi∇gi(x*)⊤d.  (25)

By the subgradient inequality it follows for i ∈ J(x*), using x ∈ FP, that

∇gi(x*)⊤d = ∇gi(x*)⊤d + gi(x*) − bi ≤ gi(x) − bi ≤ 0.

Applying relation (25) and λi ≥ 0, this shows ∇f(x*)⊤d ≥ 0, and by the subgradient inequality we obtain

f(x) ≥ f(x*) + ∇f(x*)⊤d ≥ f(x*).

This shows x* is an optimal solution.

By Lemmas 9 and 10 we observe that all algorithms for convex programming try to find a KKT point (for constrained optimization) or a stationary point (for unconstrained optimization), and by doing so the selected solution is indeed an optimal solution.

Literature
S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004
D.P. Bertsekas, Nonlinear Programming, Athena Scientific

Convex programming problems are discussed in more detail in the course IE 509, Nonlinear Programming.


Separable optimization problems. A so-called separable optimization problem is given by

v(P) = min{∑_{j=1}^n fj(xj) : x ∈ F}  (26)

with the feasible region F given by

F := {x ∈ Rn : ∑_{j=1}^n gij(xj) ≤ bi, i = 1, ..., m}.

The feasible region can also be given by the discrete set

F := {x ∈ Z+n : ∑_{j=1}^n gij(xj) ≤ bi, i = 1, ..., m}.

In this case the functions fj, gij : R → R are arbitrary functions without any additional structure like convexity. For this particular class of discrete or continuous optimization problems we may use Dynamic Programming to find an optimal solution. However, it might be computationally very expensive on a computer to do this. How do we do this?

Analysis. Introduce for every y ∈ Rm and k = 1, ..., n the (possibly empty) feasible region

Fk(y) := {x ∈ Rn−k+1 : ∑_{j=k}^n gij(xj) ≤ yi, 1 ≤ i ≤ m}

and introduce for k = 1, ..., n the functions wk : Rm → [−∞, ∞] given by

wk(y) = min{∑_{j=k}^n fj(xj) : x ∈ Fk(y)} if Fk(y) ≠ ∅, and wk(y) = ∞ if Fk(y) = ∅.  (27)

It is assumed that the one-dimensional parametric optimization problems given by

wn(y) = min{fn(xn) : gin(xn) ≤ yi, i = 1, ..., m}

can be easily solved. It follows by the definition of w1 that w1(b) = υ(P).

The main question is now how to compute the value w1(b) = υ(P) and obtain an optimal solution. Introducing the vector-valued function gk : R → Rm given by

gk(x) := (g1k(x), ..., gmk(x)),

one can show the following recurrence relation between wk and wk+1.

Lemma 11. For every k = 1, ..., n − 1 and y ∈ Rm it follows that

wk(y) = min_{xk} {fk(xk) + wk+1(y − gk(xk))}.  (28)

Proof. For notational convenience let I := {1, ..., m} and introduce the sets Fk(y, xk) given by

Fk(y, xk) := {x ∈ Rn−k : ∑_{j=k+1}^n gij(xj) ≤ yi − gik(xk), i ∈ I}.

By the definition of wk in relation (27) we obtain

wk(y) = min_{xk,...,xn} {∑_{j=k}^n fj(xj) : x ∈ Fk(y)} = min_{xk} {min_{xk+1,...,xn} {∑_{j=k}^n fj(xj) : x ∈ Fk(y, xk)}}.

Also for every xk ∈ R it follows by the separability of the functions that

min_{xk+1,...,xn} {∑_{j=k}^n fj(xj) : x ∈ Fk(y, xk)} = fk(xk) + min_{xk+1,...,xn} {∑_{j=k+1}^n fj(xj) : x ∈ Fk(y, xk)} = fk(xk) + wk+1(y − gk(xk)).

Combining both relations yields the recurrence relation

wk(y) = min_{xk} {fk(xk) + wk+1(y − gk(xk))}

given in relation (28).


Using Lemma 11 we could in theory apply the following algorithm to compute w1(b) = v(P).

Dynamic Programming algorithm.
Step 1. Compute wn(y) for every y ∈ Rm.
Step 2. Evaluate iteratively for k = n − 1 down to 1

wk(y) = min_{xk} {fk(xk) + wk+1(y − gk(xk))}.

Step 3. Output w1(b).

Remark 4. The above DP formulation is called the backward DP formulation. Since w1(b) can be obtained via a one-dimensional minimization problem over the decision variable x1, we should select the optimal x1 achieving this minimum. Now going forward, given the selected x1 we know how much capacity is allocated to this variable, and we should consider w2(b − g1(x1)).

Again this function value was obtained as a one-dimensional optimization problem over the decision variable x2, and we now should select the optimal x2 achieving this minimum. By continuing we obtain values for all the decision variables, and this constructed solution is an optimal solution. The space Rm in the above formulation is called the state space of the DP formulation, and the total number of different iteration steps is called the number of stages of the DP formulation.

A different approach is given by the following. Introduce the functions wk : Rm → [−∞, ∞) given by

wk(y) = min{∑_{j=1}^k fj(xj) : x ∈ Fk(y)}

with

Fk(y) := {x ∈ Rk : ∑_{j=1}^k gij(xj) ≤ yi, i = 1, ..., m}.

In this case we obtain by a similar proof the recurrence relation

wk+1(y) = min_{xk+1} {fk+1(xk+1) + wk(y − gk+1(xk+1))}.

This DP approach is called the forward DP formulation.

Remark 5. Observe that the above recurrence relations hold due to the separable structure of the functions and the property

min_{x,y} f(x, y) = min_x min_y f(x, y).

The above recurrence relation is called Bellman's equation, and in each iteration we only have to solve a one-dimensional optimization problem. However, since the state space is Rm, with m the number of restrictions, it is impossible on a computer to compute the value of the function wk for every argument. We need to discretize the state space, and even then, due to the dimension m of the state space, it is computationally very costly to evaluate the function.
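Bellman's recursion (28) can be sketched for a separable problem with a single constraint (m = 1), so that the discretized state space is one-dimensional. The instance data (quadratic stage costs, unit resource use, integer decisions in {0, ..., U}) are illustrative assumptions; the stages are indexed 0, ..., n−1 instead of 1, ..., n:

```python
# Backward DP sketch for min sum f_j(x_j) s.t. sum g_j(x_j) <= b, x_j in {0,...,U},
# with one constraint (m = 1) and an integer state y in {0,...,b}.

import math

def backward_dp(f, g, b, U):
    """f[j](x), g[j](x): stage cost and resource use of decision x at stage j."""
    n = len(f)
    INF = math.inf
    # w[k][y] = min cost over stages k,...,n-1 with remaining capacity y, cf. (27)
    w = [[INF] * (b + 1) for _ in range(n + 1)]
    w[n] = [0.0] * (b + 1)                      # no stages left costs nothing
    for k in range(n - 1, -1, -1):              # Step 2: k = n-1 down to 0
        for y in range(b + 1):
            for x in range(U + 1):
                use = g[k](x)
                if use <= y:                    # feasible decision at state y
                    w[k][y] = min(w[k][y], f[k](x) + w[k + 1][y - use])
    return w[0][b]                              # w_1(b) = v(P)

# Illustrative instance: f_j(x) = (x - t_j)^2 with targets t_j, g_j(x) = x, b = 4.
targets = [3, 2, 2]
f = [lambda x, t=t: float((x - t) ** 2) for t in targets]
g = [lambda x: x for _ in targets]
print(backward_dp(f, g, b=4, U=4))  # 3.0: take (x1, x2, x3) = (2, 1, 1)
```

With m constraints the table would need (b1+1)···(bm+1) states per stage, which is exactly the curse of dimensionality discussed below.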

Only if the dimension of the state space is very small could one apply the above algorithmic procedure on a computer. This fundamental problem related to the dimensionality of the state space is called the curse of dimensionality. Therefore one only encounters in the literature DP based solution procedures where the number of restrictions is small. A special case of convex programming, and also of separable programming, is given by


Linear programming problems. A linear programming problem in standard form is given by

max ∑_{j=1}^n cj xj
∑_{j=1}^n aij xj ≤ bi, 1 ≤ i ≤ m
xj ≥ 0, 1 ≤ j ≤ n  (LP)

In matrix notation this reads

max{c⊤x : Ax ≤ b, x ≥ 0}.

In general the theory of linear programming problems is well understood (it is discussed in IE 501 and IE 301), and we use the simplex method or an interior point method to solve linear programming problems, never DP. We now introduce integer valued separable optimization problems different from linear programming problems.


Binary linear programming problems. If our decision variables are of the yes/no type (for example, accept a given investment or do not accept it), introduce the set B = {0, 1} representing these two options. A binary linear programming problem is then defined by

max ∑_{j=1}^n cj xj
∑_{j=1}^n aij xj ≤ bi, 1 ≤ i ≤ m
xj ∈ B, 1 ≤ j ≤ n  (BP)

In matrix notation this reads

max{c⊤x : Ax ≤ b, x ∈ Bn}

with Cn := {(x1, ..., xn) : xj ∈ C for every 1 ≤ j ≤ n}. The set Cn is called the n-fold Cartesian product of the set C.
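Since Bn is finite, a small instance of (BP) can be solved by enumerating all 2^n binary vectors; the instance data below are illustrative assumptions:

```python
# Brute force solver for max{c^T x : A x <= b, x in B^n}, feasible only for small n.

from itertools import product

def solve_bp(c, A, b):
    best_val, best_x = None, None
    for x in product((0, 1), repeat=len(c)):            # all 2^n binary vectors
        if all(sum(aij * xj for aij, xj in zip(row, x)) <= bi
               for row, bi in zip(A, b)):               # check A x <= b
            val = sum(cj * xj for cj, xj in zip(c, x))
            if best_val is None or val > best_val:
                best_val, best_x = val, x
    return best_val, best_x

# illustrative instance with n = 3 variables and m = 2 constraints
c = [5, 4, 3]
A = [[2, 3, 1],
     [4, 1, 2]]
b = [4, 5]
print(solve_bp(c, A, b))  # (7, (0, 1, 1))
```

The 2^n enumeration illustrates why structure (such as the single constraint of the knapsack problem below) is needed to solve larger instances.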


Integer linear programming problems. If our decision variables are of the discrete type (for example, the number of produced items), we encounter a so-called integer linear programming problem. This is given by

max ∑_{j=1}^n cj xj
∑_{j=1}^n aij xj ≤ bi, 1 ≤ i ≤ m
xj ∈ Z+, 1 ≤ j ≤ n

In matrix notation

max{c⊤x : Ax ≤ b, x ∈ Z+n}.

In this section we list some important particular cases of integer programming problems.


Linear assignment problem. Suppose there are n tasks to be executed and there are n persons who can execute each of these tasks. Assume now that each person must execute exactly one task and

cij = cost of executing task j by person i.  (29)

The goal now is to minimize the total cost of executing all the tasks.

Modelling. To introduce the decision variables we first give a graphical representation as shown in the next figure.


[Figure: bipartite graph with nodes for persons on one side, nodes for tasks on the other, and arcs indicating a feasible assignment.]

Fig 1: representation of a feasible assignment.

Since the decision variables should describe the existence of an arc connecting a person to a task, it is natural to introduce the following decision variables:

xij = 1 if an arc connects person i to task j, and xij = 0 otherwise.

To describe the feasible region we observe that exactly one arc should leave each node representing a person and exactly one arc should arrive at each node representing a task. This means

∑_{j=1}^n xij = 1 for each person i, 1 ≤ i ≤ n

and

∑_{i=1}^n xij = 1 for each task j, 1 ≤ j ≤ n.

The above set of 2n constraints are called the assignment constraints. Moreover, for a given feasible assignment x = (xij) it follows by relation (29) that the cost f(x) of this assignment is given by

f(x) = ∑_{i=1}^n ∑_{j=1}^n cij xij

and so the linear assignment problem has the mathematical programming formulation

min ∑_{i=1}^n ∑_{j=1}^n cij xij
∑_{j=1}^n xij = 1, 1 ≤ i ≤ n
∑_{i=1}^n xij = 1, 1 ≤ j ≤ n
xij ∈ B, 1 ≤ i, j ≤ n  (LAP)

The set of feasible solutions is finite and contains n! elements (hint: for person 1 there are n possible tasks available, and given an assigned task for person 1 there are n − 1 possible tasks for person 2, etc.). By Stirling's formula we know

lim_{n↑∞} n! / (√(2π) n^{n+1/2} exp(−n)) = 1

and so for large n this is an astronomically large finite number. Since it is well known that the LP relaxation of this problem yields integer solutions, one can solve the corresponding LP relaxation. For a more efficient algorithm, formulating the problem as a minimum cost flow problem, see Lawler, Combinatorial Optimization, page 129.
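Since each feasible point of (LAP) corresponds to a permutation, small instances can be solved by enumerating all n! assignments; the cost matrix below is an illustrative assumption:

```python
# Brute force for the linear assignment problem: enumerate all n! assignments.
# Person i executes task perm[i]; the cost matrix c is an illustrative example.

from itertools import permutations

def solve_lap(c):
    n = len(c)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):          # each perm is a feasible assignment
        cost = sum(c[i][perm[i]] for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm

c = [[4, 2, 8],
     [4, 3, 7],
     [3, 1, 6]]
print(solve_lap(c))  # minimal total cost is 12
```

Already for n = 15 this enumeration is hopeless (15! ≈ 1.3 · 10^12), which is why the LP relaxation or a minimum cost flow formulation is used instead.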


Knapsack problem A hiker faces the following problem. For his travel he can possibly pack n items in his knapsack and each item j, 1 ≤ j ≤ n has volume aj . However his knapsack has a limited volume given by b and it soon turns out that the hiker cannot take all the items with him due to the limited volume capacity of his knapsack. Therefore he assigns to each item j a utility value cj and faces the problem of finding the selection of items yielding the highest utility value. To formulate this problem as a mathematical programming problem we observe that the decision variable of selecting an item is actually a yes\no decision. Therefore it is natural to introduce {

xj =

1 if item j is selected 0 otherwise

51

Now the constraint representing the knapsack capacity is given by ∑_{j=1}^n aj xj ≤ b, and the objective function f for a given selection vector x has the form f(x) = ∑_{j=1}^n cj xj.

Hence the knapsack problem can be given by the following formulation:

max ∑_{j=1}^n cj xj                    (KP)
     ∑_{j=1}^n aj xj ≤ b
     xj ∈ B    1 ≤ j ≤ n

The feasible set contains a finite number of elements, and this number is bounded above by 2^n (for each decision variable we have 2 possibilities!). This problem is in general NP-hard and, due to its single constraint, suitable for a DP formulation. For a detailed overview of knapsack-related problems see S. Martello, P. Toth, Knapsack Problems (Algorithms and Computer Implementations), Wiley, 1990.
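The 2^n bound can be made concrete by brute force: enumerate every selection vector x ∈ B^n and keep the best feasible one. The item data below are illustrative assumptions; the exponential loop is precisely what the DP formulation avoids.

```python
from itertools import product

def knapsack_brute_force(c, a, b):
    """Enumerate all 2^n selections x in {0,1}^n and return the best
    feasible objective value of (KP)."""
    return max(sum(cj * xj for cj, xj in zip(c, x))
               for x in product((0, 1), repeat=len(c))
               if sum(aj * xj for aj, xj in zip(a, x)) <= b)

# illustrative data: utilities c, volumes a, capacity b
print(knapsack_brute_force([6, 5, 4], [3, 2, 2], 4))
```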

Set covering problem. A transportation company faces the following problem. Every working day this company needs to service different customers, and so, depending on today's customers, every working day a set of routes has to be determined to serve those customers. The transportation company has a fixed set N = {1, ..., n} of routes, and clearly the subset of routes selected for today should satisfy that each of the m present customers is on at least one of the selected routes. Also, each route j, 1 ≤ j ≤ n, has a known cost cj (related to, for example, the total number of miles of a route and the total duration of the given route), and so additionally the company wants this selection to have minimal cost. Selecting a route j for today is again a yes/no decision, and it is therefore natural to introduce for each route j, 1 ≤ j ≤ n, of the fixed set N the decision variable

xj = 1 if route j is selected, and xj = 0 otherwise.

To determine today's feasible region we first need to determine, for any route j belonging to the set N, which of today's customers i, 1 ≤ i ≤ m, are on this route. To achieve this, construct the m × n matrix A = (aij) (columns are the n different routes, rows are the m customers) given by

aij = 1 if customer i is on route j, and aij = 0 otherwise.

Now for a vector x = (x1, ..., xn) and today's customer i, 1 ≤ i ≤ m, the value ∑_{j=1}^n aij xj yields the number of selected routes on which customer i appears. Hence the decision vector x should satisfy for each customer i, 1 ≤ i ≤ m, the restriction

∑_{j=1}^n aij xj ≥ 1.

Since the total cost f(x) of a selection x is clearly given by f(x) = ∑_{j=1}^n cj xj, the mathematical programming formulation of the so-called set covering problem has the form

min ∑_{j=1}^n cj xj                    (SP)
     ∑_{j=1}^n aij xj ≥ 1    1 ≤ i ≤ m
     xj ∈ B                  1 ≤ j ≤ n

The feasible set contains a finite number of elements, and this finite number is bounded above by 2^n. Again this problem is NP-hard. Due to the huge number of restrictions it is not advisable to use a DP formulation.

Dynamic programming. In this section we will discuss dynamic programming by means of examples (Chapters 1-3 of Denardo).

Shortest paths in acyclic graphs. Consider a directed acyclic graph D = (V, A), with V denoting the set of nodes and A the set of directed arcs. Each directed arc a has a nonnegative value ca representing the distance along this arc. Observe that acyclic means that there are no directed cycles within the graph.

Fig 11: example of a directed acyclic graph

Question: How do we construct paths of minimal length in acyclic graphs? Suppose we have found a shortest path from node s to v and this shortest path goes through node v1. Clearly the part s → v1 of the shortest path from s to v is itself a shortest path from s to v1 (see next figure).

Fig 12: optimality of subpaths

If this were not the case, we could find a shorter path from s to v1 and concatenate it with the part from v1 to v of the shortest path from s to v. This newly constructed path has a smaller length than the original shortest path, and so we obtain a contradiction.

Observation. A shortest path from s to v consists of shortest subpaths.

Computation of shortest path lengths. Let L(v), v ∈ V, denote the length of a shortest path from s to v. Clearly L(s) = 0, and to determine for v ≠ s the shortest length L(v) introduce the set

V−(v) := {w ∈ V : w is an immediate predecessor of v}.

By an immediate predecessor we mean that there is a directed arc awv = (w, v) with starting point w and endpoint v. Now the following recurrent relation for the shortest lengths must hold.

Lemma 12. For every v ≠ s it follows that

L(v) = min{L(w) + cwv : w ∈ V−(v)}.     (30)

Proof. For any immediate predecessor w of node v we can take a shortest path from s to w and concatenate it with the directed arc awv of length cwv. Hence we have constructed a path from s to v of length L(w) + cwv, and so L(v) ≤ L(w) + cwv for every w ∈ V−(v). This shows that

L(v) ≤ min{L(w) + cwv : w ∈ V−(v)}.     (31)

Since a shortest path from s to v must pass through some node w0 ∈ V−(v), and by our observation the subpath from s to w0 must be a shortest path from s to w0 having length L(w0), we obtain

L(v) = L(w0) + cw0v.     (32)

Combining relations (31) and (32) yields relation (30).

How do we compute the actual shortest path? After computing the shortest lengths, consider the end node t (the sink) and the value L(t).

In relation (30) there is an immediate predecessor node w0 achieving the minimum. This is the last node on the shortest path from s to t before reaching t. Consider now L(w0) and determine the predecessor w1 of w0 achieving the minimum in relation (30). This is the second-to-last node on the shortest path. Continue this procedure until we first meet s. The shortest path is then given by s → wk → wk−1 → ... → w0 → t. This procedure to recover the shortest path is called backtracking. Clearly the above recursive procedure does not work for cyclic graphs (graphs containing a cycle), since in that case some node can be both a predecessor and a successor of a given node (make a picture!). We will now discuss a DP formulation for a network with arbitrary weights (not necessarily positive) and nonnegative cycles.
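The recursion (30), together with backtracking, can be sketched as follows, assuming the nodes are supplied in topological order (every arc goes from an earlier to a later node in the list). The node names and arc lengths are illustrative, not taken from Fig 11.

```python
def shortest_paths_dag(nodes, pred, length, source):
    """L(v) = min{L(w) + c_wv : w in V^-(v)}, evaluated in topological order.

    pred[v] is the predecessor set V^-(v) and length[(w, v)] the arc
    length c_wv. Returns the lengths L and, for backtracking, the
    predecessor achieving the minimum in (30) for each node."""
    L = {source: 0}
    best_pred = {}
    for v in nodes:
        if v == source:
            continue
        # min over tuples picks the smallest length and records its w
        L[v], best_pred[v] = min((L[w] + length[(w, v)], w) for w in pred[v])
    return L, best_pred

nodes = ["s", "1", "2", "t"]                      # topological order
pred = {"1": {"s"}, "2": {"s", "1"}, "t": {"1", "2"}}
length = {("s", "1"): 3, ("s", "2"): 1, ("1", "2"): 1,
          ("1", "t"): 6, ("2", "t"): 3}
L, best_pred = shortest_paths_dag(nodes, pred, length, "s")
print(L["t"])  # length of a shortest s -> t path
```

Backtracking then follows `best_pred` from t back to s, exactly the procedure described above.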

Shortest paths in graphs with arbitrary weights and nonnegative cycles. To solve a shortest path problem for possibly cyclic graphs having arbitrary (positive and negative) arc lengths and containing only cycles of nonnegative length, we can use the following successive dynamic programming procedure due to Bellman-Ford. To introduce this procedure we assume for notational convenience that

cwv = ∞ ⟺ no directed arc awv exists.

All other arcs have finite weights cwv. Consider now the (possibly cyclic) graph D = (V, A) with nonnegative cycles and introduce for every m the value

L(m)(v) := the length of a shortest path from s to v among the set of paths s → v containing no more than m directed arcs.

Since between every two nodes there is now an arc weight (possibly ∞, corresponding to the absence of an arc), it follows by the above definition of L(m)(v) that for every v ∈ V

L(1)(v) = 0 if v = s, and L(1)(v) = csv if v ≠ s.

Moreover, for arbitrary m one can show the following recursive relation.

Lemma 13. It follows for every v ∈ V and m ∈ N that

L(m+1)(v) = min{L(m)(v), min_{w∈V, w≠v} {L(m)(w) + cwv}}.     (33)

Proof. To show the above recurrent relation, observe that a shortest path from s to v among the class of paths from s to v containing at most m + 1 arcs either contains no more than m arcs, or it contains exactly m + 1 arcs and has some final arc, say aw0v. In the first case its length is given by L(m)(v). In the second case the subpath from s to w0 has exactly m arcs, and since this must be a minimal length path containing no more than m arcs, its length equals L(m)(w0). Hence the total length of this path is given by

L(m)(w0) + cw0v = min_{w∈V, w≠v} {L(m)(w) + cwv}.

Considering these two possibilities we obtain relation (33). By the interpretation of L(m)(v), or using relation (33), we obtain

L(1)(v) ≥ L(2)(v) ≥ ....

If the graph contains only cycles of nonnegative length, there exists a shortest path with no repeated nodes, and hence a shortest path contains at most n nodes if the graph contains n nodes in total. Hence we obtain that L(n)(v) is the length of the shortest path from s to v. The equation in relation (33) therefore has to be evaluated for at most m = 1, ..., n, and each evaluation takes O(n^2) operations (additions and minimizations). Hence the overall computational time is O(n^3). Again, by backtracking as for acyclic graphs, we can recover the shortest path.

Remark 6. Observe that the computationally faster, well-known algorithm of Dijkstra (complexity O(n^2)) can only be applied to graphs with nonnegative weights (see Lawler). For more shortest path problems and their solution by dynamic programming see E. Lawler, Chapter 3, Combinatorial Optimization: Networks and Matroids, Holt, Rinehart and Winston, 1976.
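The Bellman-Ford recursion (33) can be sketched as follows, representing a missing arc awv by cwv = ∞ exactly as in the convention above. The example graph, which contains a negative arc, is an illustrative assumption.

```python
import math

def bellman_ford(V, c, s):
    """Iterate L(m+1)(v) = min(L(m)(v), min_{w != v} L(m)(w) + c_wv).

    V is the node set, c a dict with c[(w, v)] the arc length (missing
    keys mean no arc, i.e. weight infinity), s the source."""
    L = {v: (0 if v == s else c.get((s, v), math.inf)) for v in V}   # L(1)
    for _ in range(len(V) - 1):     # a shortest path uses at most n - 1 arcs
        # the dict comprehension evaluates fully from the old L, so the
        # update from L(m) to L(m+1) is synchronous as in (33)
        L = {v: min(L[v], min((L[w] + c.get((w, v), math.inf)
                               for w in V if w != v), default=math.inf))
             for v in V}
    return L

V = {"s", "a", "b", "t"}
c = {("s", "a"): 4, ("s", "b"): 2, ("b", "a"): -1,
     ("a", "t"): 2, ("b", "t"): 6}
print(bellman_ford(V, c, "s")["t"])  # shortest s -> t length despite the negative arc
```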

Finally we remark that finding shortest paths or alternatives is nowadays very important, and computing them should take almost no time even in large networks (think of GPS systems such as road navigation equipment).

Allocation problems. A merchant has a quantity b ∈ N of items in stock and wants to distribute part or all of this quantity among n customers. The net return for sending an amount a to customer j is given by Rj(a). It is assumed that Rj is a strictly increasing nonnegative function. The merchant now has to decide how many items he will send to each of the customers. Therefore his decision variable is given by

xj := number of items allocated to customer j

and so the corresponding mathematical programming formulation has the form

v(P) = max ∑_{j=1}^n Rj(xj)
            ∑_{j=1}^n xj ≤ b
            xj ∈ Z+    1 ≤ j ≤ n

To solve the above problem we introduce for every y ∈ {0, 1, 2, ..., b}

wk(y) := max{∑_{j=1}^k Rj(xj) : ∑_{j=1}^k xj ≤ y, xj ∈ Z+}.

Clearly wk(y) denotes the optimal revenue obtained from the first k customers if we assign at most y items to them. By this interpretation it follows that the optimal objective function value is given by wn(b). To derive a recurrent relation as in Lemma 4 we observe by the monotonicity of R1(.) that for every y ∈ {0, 1, 2, ..., b}

w1(y) = R1(y)

and (copy the proof of Lemma 4)

wk+1(y) = max_{x∈{0,1,...,y}} {Rk+1(x) + wk(y − x)}.     (34)

To give an alternative proof of relation (34), assume that the customers can be served in any order. Given some available stock y, the merchant argues that he will first decide how much he should allocate to the first k customers (consider the first k customers as one mega customer) and how much to the (k + 1)th customer, with k < n (see Fig 12).

Fig 12: construction of the dynamic recurrence relation (allocate y − x to the first k customers and x to the (k + 1)th customer)

Once he has decided to allocate a certain amount y − x to the first k customers, he will do this in an optimal way. Clearly the profit gained from these k customers equals wk(y − x), and so the total profit of this particular allocation is given by wk(y − x) + Rk+1(x). Since this allocation is feasible for the problem with k + 1 customers and stock y, this implies

wk+1(y) ≥ max_{x∈{0,1,...,y}} {Rk+1(x) + wk(y − x)}.

Since there are only a finite number of different allocations, and in an optimal one the merchant reserves some amount y − x for the first k customers if y products are available, we obtain relation (34). If the merchant has only one customer and has y ∈ {1, ..., b} units in stock, it follows, using that R1 is strictly increasing, that

w1(y) = R1(y).     (35)

Hence the dynamic programming algorithm to solve the above allocation problem is given as follows.

Step 1. Evaluate w1(y) = R1(y) for y ∈ {0, 1, ..., b}.
Step 2. Evaluate for k = 1 up to n − 1 and y ∈ {0, 1, ..., b} the value
wk+1(y) = max_{x∈{0,1,...,y}} {wk(y − x) + Rk+1(x)}.
Step 3. Evaluate wn(b) and use backtracking to obtain the optimal allocation.

Applications of allocation problems.
1. Allocation of seats among the different fare classes (tickets with different conditions and hence a different price) in a plane (field of revenue management). Observe that in this case we also encounter uncertainty in the demand for those different fare classes. Think of buying a ticket on the internet.
2. Optimization of utility subject to a budget constraint (field of microeconomics).
3. Allocation of resources over machines (field of production planning).

A related problem is given by the following.

70

Knapsack problem. If we consider a knapsack problem given by

v(P) = max{∑_{j=1}^n cj xj : ∑_{j=1}^n aj xj ≤ b, xj ∈ B}

with cj, aj, 1 ≤ j ≤ n, b ∈ N, then by a similar reasoning as for allocation problems we can use a dynamic programming approach. Introduce for every y ∈ {0, 1, ..., b} the optimization problem

wk(y) := max{∑_{j=1}^k cj xj : ∑_{j=1}^k aj xj ≤ y, xj ∈ B}.

Observe the above problem describes the optimal selection among the first k items for a knapsack with capacity y. We now obtain (prove yourself!) the following recurrent dynamic programming relations:

  0

v1(y) =

if y ∈ {0, 1, .., a1 − 1}

 c 1 otherwise

and with wk (y) := ∞ for y negative wk+1(y) = max{wk (y), wk (y − ak+1) + ck+1} for y ∈ {0, ..., b} (The first term represents the case that in an optimal solution the (k + 1)th item is not selected , while the second term represents the case that in an optimal solution this item is selected.) An alternative representation of the above recurrence relation similar to the representation (34) of the dynamic programming recurrence relation for an allocation problem is given by wk+1(y) = max {wk (y−xak+1)+xck+1} (36) x∈{0,1}

72

This means that every time we consider an item we can choose between the actions select or not select. Clearly we need to evaluate wn(b), and this can be done by the following algorithm.

Computation of an optimal solution of the knapsack problem.
Step 1. Evaluate w1(y) for y ∈ {0, 1, ..., b}.
Step 2. For k = 1 up to n − 1 and y ∈ {0, 1, ..., b} compute
wk+1(y) = max{wk(y), wk(y − ak+1) + ck+1}.
Step 3. Evaluate wn(b) and apply backtracking to find the optimal solution.

Clearly the complexity of the above algorithm is O(nb), and so this complexity depends on the capacity b of the knapsack.
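The O(nb) algorithm can be sketched as follows; the item data are illustrative assumptions, and the backtracking rule "select item k+1 iff it changed the optimum" is one valid way to recover an optimal solution (when both choices tie, not selecting is also optimal).

```python
def knapsack(c, a, b):
    """Maximize sum c_j x_j subject to sum a_j x_j <= b, x_j in {0, 1}.

    w[k][y] is w_{k+1}(y) in the notes' notation; c_j > 0 is assumed."""
    n = len(c)
    # Step 1: w_1(y) = 0 for y < a_1 and c_1 otherwise
    w = [[0 if y < a[0] else c[0] for y in range(b + 1)]]
    # Step 2: w_{k+1}(y) = max(w_k(y), w_k(y - a_{k+1}) + c_{k+1})
    for k in range(1, n):
        w.append([max(w[k - 1][y],
                      w[k - 1][y - a[k]] + c[k]) if y >= a[k] else w[k - 1][y]
                  for y in range(b + 1)])
    # Step 3: backtracking
    x, y = [0] * n, b
    for k in range(n - 1, 0, -1):
        if w[k][y] != w[k - 1][y]:      # item k+1 improved the optimum
            x[k] = 1
            y -= a[k]
    x[0] = 1 if w[0][y] > 0 else 0
    return w[n - 1][b], x

print(knapsack([6, 5, 4], [3, 2, 2], 4))  # optimal value w_n(b) and selection
```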

Equipment replacement model. In most cases dynamic programming is applied to problems having a time dimension. As an example, consider the following equipment replacement model over a time span of N periods (time units), in which we take decisions at the times n, 0 ≤ n ≤ N − 1. At time 0 we start with a machine of age x ∈ Z+. At each time n, 0 ≤ n ≤ N − 1, we need to decide whether to keep the present machine for one additional time unit or replace it by a new one.

Fig 13: N time units (replace or not replace? new machine installed?)

To make at time n, 0 ≤ n ≤ N − 1, the decision replacement, represented by the action an = 1, or no replacement, represented by the action an = 0, we first introduce the following data. Consider the functions fc : Z+ → R, fr : Z+ → R, fs : Z+ → R, with

fr(x) := revenue of a machine of age x over the next period,
fc(x) := cost of a machine of age x over the next period,
fs(x) := salvage value of a machine of age x.

It seems reasonable to assume that the functions fr, fs are decreasing, while fc is increasing. The cost of buying a new machine equals p. If we decide at time n, 0 ≤ n ≤ N − 1, to replace (as already observed, denoted by the action an = 1) a machine of age x, then the direct revenue over the next period (n, n + 1] is given by

r(x, 1) := fr(0) + fs(x) − fc(0) − p.     (37)

If at time n, 0 ≤ n ≤ N − 1, a machine of age x is not replaced, the direct revenue over the next period (n, n + 1] is given by

r(x, 0) := fr(x) − fc(x).     (38)

Observe the function r is defined on the Cartesian product S × A with S = Z+ the so-called state space of the DP problem and A the so-called action set. In this case the action space is given by A = {0, 1}, and the action space is independent of the state we observe. If we decide not to use machines beyond a certain age b, and so always replace a machine of age b, we can model this as follows. The action space is now dependent on the state: introduce the adapted state space S = {0, 1, ..., b} and

A(x) = {0, 1} if x = 0, 1, ..., b − 1, and A(x) = {1} if x = b.

Another way to model this is to keep S = Z+ and A = {0, 1} and introduce the following next-period revenues:

r(x, 1) := fr(0) + fs(x) − fc(0) − p if x ≤ b − 1, and r(x, 1) := −∞ if x ≥ b,     (39)

and

r(x, 0) := fr(x) − fc(x) if x ≤ b − 1, and r(x, 0) := −∞ if x ≥ b.     (40)

Hence in this case the next-period revenue function is given by some function r : S × A → [−∞, ∞) with S = Z+ and A = {0, 1}. From a theoretical point of view this formulation is easier to handle, and so in the general theory you will encounter extended real valued one-period revenue functions. The problem is now to come up with an optimal sequence of actions (replace or not replace) at the times n, 0 ≤ n ≤ N − 1 (called a policy) which maximizes the total revenue over the given N periods.

To formulate this optimization problem we observe that any policy π can be represented by an N-dimensional vector π = (a0, a1, ..., aN−1) with

an = 1 if at time n the machine is replaced, and an = 0 otherwise.

Since at time 0 we start with a machine of age x ∈ Z+, we also allow in our model a decision a0 at time 0. Moreover, at time N we stop and receive a salvage value. To describe, for a given policy π = (a0, ..., aN−1), the age of the machine at time n starting with a machine of age x at time 0, we introduce x0 = x and

xn+1 = g(xn, an)     (41)

for every 0 ≤ n ≤ N − 1 with

g(xn, an) := xn + 1 if an = 0, and g(xn, an) := 1 if an = 1.     (42)

The sequence {xn}_{n=0}^{N−1} ⊆ S in relation (41) clearly represents the age of the machine at time n if we start at time 0 with a machine of age x and use the policy π = (a0, ..., aN−1). This shows that the revenue of a policy π starting with a machine of age x0 = x can be represented by the function υN,π : S → [−∞, ∞) given by

υN,π(x) := ∑_{n=0}^{N−1} r(xn, an) + fs(g(xN−1, aN−1)).     (43)

To determine the optimal policy we therefore need to solve the optimization problem

υN(x) := max_{π∈A^N} υN,π(x)     (P)

and determine the policy π* ∈ A^N satisfying for every x ∈ S

υN,π*(x) = υN(x).

A mathematical programming formulation of the above optimization problem is given by

max ∑_{n=0}^{N−1} r(xn, an) + fs(xN)
     x0 = x
     xn+1 = g(xn, an)    n = 0, ..., N − 1
     an ∈ B              n = 0, ..., N − 1     (44)

Solution. To solve the optimization problem (P) we introduce for k = 0 the function w0 : S → R given by

w0(x) := fs(x)     (45)

and for k = 1, ..., N

wk(x) = max_{a∈Fk(x)} {∑_{n=N−k}^{N−1} r(xn, an) + fs(xN)}     (46)

with

Fk(x) = {a : xN−k = x, xn+1 = g(xn, an) and an ∈ B for n = N − k, ..., N − 1}.

By the definition of the functions wk, k = 0, ..., N, in relation (46) it is clear that

wk(x) = maximum revenue from time N − k up to time N if the machine at time N − k has age x.     (47)

For the above constructed functions wk, k = 0, ..., N, it follows by relation (47) that wN(x) = υN(x) for every x ∈ S. Moreover, to compute wN(x) one should use the following recurrent relation connecting wk+1 and wk.

Lemma 14. For every x ∈ S

w0(x) = fs(x)     (48)

and for every k = 0, ..., N − 1 and x ∈ S

wk+1(x) = max_{a∈{0,1}} {r(x, a) + wk(g(x, a))}.     (49)

Proof. The first part is clear by the definition of w0. To derive the above recurrent relation we observe the following. Suppose the machine has age x at time N − (k + 1) and we need to evaluate at time N − (k + 1) the revenue of the decisions replace or not replace. If we decide not to replace at time N − (k + 1), this

machine will have age x + 1 at time N − k, and so the optimal revenue from time N − (k + 1) up to time N associated with the decision not to replace equals

r(x, 0) + wk(x + 1).     (50)

If we decide to replace at time N − (k + 1), the new machine has age 1 at time N − k, and so the optimal revenue from time N − (k + 1) up to time N associated with the decision replace equals

r(x, 1) + wk(1).     (51)

Hence by evaluating these two possible decisions and their revenues given by relations (50) and (51), we obtain by the interpretation of wk+1

wk+1(x) = max_{a∈{0,1}} {r(x, a) + wk(g(x, a))}.     (52)

To compute the values υN(x) we need to execute the following dynamic programming algorithm.
Step 1. Compute w0(x) = fs(x) for x ∈ S.
Step 2. For k = 0 up to N − 1 and every x ∈ S compute
wk+1(x) = max_{a∈{0,1}} {r(x, a) + wk(g(x, a))}.     (53)
Step 3. Output wN(x) and backtrack the set of optimal actions.

To obtain the optimal sequence of decisions, consider for every x ∈ S the value w1(x) and determine the action in relation (53) which yields w1(x). This means that if we encounter state x at time N − 1 it is optimal to take that action.
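Steps 1-3 for the bounded-age model with S = {0, ..., b} can be sketched as follows. The data functions fr, fc, fs and the price p in the example are illustrative assumptions; at age b only the action replace is allowed, as in the adapted action sets A(x) above.

```python
def replacement_policy(N, b, fr, fc, fs, p):
    """Compute w_k on S = {0,...,b} via w_{k+1}(x) = max_a r(x,a) + w_k(g(x,a)).

    Returns w_N and, for backtracking, the optimal action table."""
    S = range(b + 1)
    def r(x, a):       # direct revenues (37) and (38)
        return fr(0) + fs(x) - fc(0) - p if a == 1 else fr(x) - fc(x)
    def g(x, a):       # age transition (42)
        return 1 if a == 1 else x + 1
    w = [fs(x) for x in S]                           # Step 1: w_0 = f_s
    action = []
    for _ in range(N):                               # Step 2
        new_w, act = [], []
        for x in S:
            v, a_star = max((r(x, a) + w[g(x, a)], a)
                            for a in ((0, 1) if x < b else (1,)))  # must replace at age b
            new_w.append(v)
            act.append(a_star)
        w, action = new_w, [act] + action            # prepend: earlier times first
    return w, action   # action[n][x]: optimal action at time n in state x

w, action = replacement_policy(3, 3, lambda x: 10 - 2 * x,
                               lambda x: 1 + x, lambda x: max(0, 4 - x), 8)
print(w)  # w_N(x) for each starting age x
```

The returned `action` table is exactly the matrix described below, with rows indexed by time and columns by state.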

Suppose now we encounter state x at time N − 2. In that case we determine the action which yields the maximum in relation (53) for k + 1 = 2, and this is the action we should take at time N − 2 if we face state x at that time. We continue in this way until we have evaluated wN(x). In case of a finite set S we can therefore construct a matrix with rows representing the number of time periods still to go and columns the states we may encounter; the corresponding element in the matrix is the action to take.

Extension to a stochastic replacement model. To introduce a stochastic version of the above replacement problem, we first introduce a stochastic process X = {Xn : n = 0, 1, 2, ..., N} with a finite state space S (representing the different states of the machine in use) with

Xn := state of the machine at time n.

This stochastic process describing the state of the machine in use is a time-homogeneous Markov chain with one-step transition probabilities pij, i, j ∈ S, and so

pij = P(Xn+1 = j | Xn = i) = P(Xn+1 = j | X0 = i0, ..., Xn = i).

If we replace a machine at time n, it is assumed that this replacement does not cost any time, and so the new machine starts at time n in state 0, representing the new state. Moreover, at each integer point in time we either replace or do not replace the machine. In case we observe at time n, n = 0, ..., N − 1, state s = i of the machine in use and we select action a, it is assumed that the direct revenue in period (n, n + 1] equals r(i, a). We assume that the same cost structure holds as for the deterministic model.

The cost of operating and maintenance depends on the state s = i of the machine, and so, with 0 denoting the new state,

r(i, 1) := fr(0) + fs(i) − fc(0) − p     (54)

and

r(i, 0) := fr(i) − fc(i).     (55)

If at time n state s = i of the machine is observed and action a is selected, it is assumed that with probability pij^(a) the machine will be in state s = j at time n + 1. (The effect of an action and the transition from one state to another is not deterministic but is given by a transition probability depending on the chosen action a and the state.) In this particular case it follows that

pij^(1) = p0j and pij^(0) = pij.     (56)

As for the deterministic case discussed in the previous example, we will first write down the expected revenue of a given policy. The policy π = (A0, ..., AN−1)

consists now of a sequence of Bernoulli random variables. Observe that for each realization of the stochastic process X describing the state of the machine in use, we assume that at each time n, n = 0, ..., N − 1, the considered policies depend only on the state Xn of the machine just observed, and so for each realization of the stochastic process we might take a different action. An example of such a policy is given by the rule that we only replace a machine if the state of the machine enters for the first time a given set A ⊆ S. Such a policy can be represented by the following table. Take S = {0, 1, 2, 3, 4} and N = 5:

n\s  0  1  2  3  4
1    0  0  0  1  1
2    0  0  0  1  1
3    0  0  0  1  1
4    0  0  0  1  1

This means that the sequence of decisions might differ at the times n, n = 0, ..., N − 1, due to different realizations of the stochastic process X, and so the policy π consists of a sequence of random variables. Technically speaking, we only consider policies π = (A0, ..., AN−1) for which the stochastic process (Xn, An), n = 0, ..., N − 1, is a time homogeneous or inhomogeneous Markov chain. Now for each policy π = (A0, ..., AN−1) belonging to this class M (a so-called Markov policy) the expected revenue is given by

υN,π(x) = Ex(∑_{n=0}^{N−1} r(Xn, An) + fs(g(XN−1, AN−1)))

with

Ex(Y) := E(Y | X0 = x).

In this particular case the function g : S × A → S is given by

g(Xn(ω), An(ω)) = Xn+1^(1)(ω) if An(ω) = 1, and Xn+1^(0)(ω) if An(ω) = 0,

where the random variables Xn+1^(1) and Xn+1^(0) have distributions

P(Xn+1^(1) = j | Xn = i, An = 1) = pij^(1), j ∈ S

and

P(Xn+1^(0) = j | Xn = i, An = 0) = pij^(0), j ∈ S.

We now need to solve the problem

υN(x) = max_{π∈M} υN,π(x) for every x ∈ S

and determine the optimal Markov policy π* satisfying

υN,π*(x) = υN(x).

Solution. As for the deterministic case we introduce for k = 0, ..., N the functions wk : S → R given by

wk(x) = max_{π∈M_{N−k}} E(∑_{n=N−k}^{N−1} r(Xn, An) + fs(XN) | XN−k = x)     (57)

with M_{N−k} the set of all Markov policies restricted to the times N − k, ..., N − 1. Clearly, by the definition of the functions wk : S → R, it follows that

wk(s) = maximal expected revenue from time N − k up to time N if the machine is in state s at time N − k, just before selecting an action.     (58)

By the above interpretation, or relation (57), we obtain wN(x) = υN(x) for every x ∈ S. Moreover, to compute wN(x) one should use the following recurrent relation.

Lemma 15. It follows for every x ∈ S that

w0(x) = fs(x)     (59)

and for every k = 0, ..., N − 1 that

wk+1(x) = max_{a∈{0,1}} {r(x, a) + ∑_{y∈S} pxy^(a) wk(y)}.     (60)

Proof. The first part is clear by the definition of w0. To derive the above recurrent relation we observe the following. Suppose at time N − (k + 1) we observe state x of the machine and we need to evaluate at time N − (k + 1) the decision replace or not replace. If we do not replace at time N − (k + 1), this machine will at time N − k be in state y with probability pxy^(0), and so the optimal expected revenue from time N − (k + 1) up to time N, using the interpretation of wk, will be

r(x, 0) + ∑_{y∈S} pxy^(0) wk(y).     (61)

If we replace the machine at time N − (k + 1), the new machine will at time N − k be in state y with probability pxy^(1), and using the interpretation of wk the optimal expected revenue from time N − (k + 1) up to time N is given by

r(x, 1) + ∑_{y∈S} pxy^(1) wk(y).     (62)

Hence by evaluating these two possible decisions and their expected revenues we obtain, using the interpretation of wk+1, that

wk+1(x) = max_{a∈{0,1}} {r(x, a) + ∑_{y∈S} pxy^(a) wk(y)},

and this shows the desired result. Another equivalent way to represent the above recurrent relation is to introduce for every x, a and k the event

Ak,x,a := {XN−k = x, AN−k = a}

and write

wk+1(x) = max_{a∈{0,1}} {r(x, a) + E[wk(XN−k) | Ak+1,x,a]}.

As for the deterministic model, one can construct the optimal Markov policy by backtracking. The above model is a classical example of a so-called finite horizon Markov decision model with a finite action space. In the next section we will discuss another famous problem which can be solved by dynamic programming. For a complete overview of so-called lot sizing models and extensions of those models, the reader is referred to Yves Pochet, L.A. Wolsey, Production Planning by Mixed Integer Programming, Springer Verlag, New York, 2006.
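The backward recursion (59)-(60) can be sketched as follows for a finite state space. The two-state example (0 = good, 1 = worn), the transition matrices and the revenue numbers are illustrative assumptions, not data from the notes.

```python
def finite_horizon_mdp(N, S, P, r, fs):
    """P[a][x][y] = p_{xy}^{(a)}, r(x, a) the one-period expected revenue,
    fs(x) the salvage value. Returns w_N and the optimal action table."""
    w = [fs(x) for x in S]                                   # w_0 = f_s (59)
    policy = []
    for _ in range(N):
        new_w, act = [], []
        for x in S:
            # recursion (60): maximize over the two actions
            v, a_star = max((r(x, a) + sum(P[a][x][y] * w[y] for y in S), a)
                            for a in (0, 1))
            new_w.append(v)
            act.append(a_star)
        w, policy = new_w, [act] + policy                    # earlier times first
    return w, policy       # policy[n][x] = optimal action at time n in state x

S = range(2)                                # 0 = good, 1 = worn
P = {0: [[0.7, 0.3], [0.0, 1.0]],           # keep: the machine may wear out
     1: [[0.9, 0.1], [0.9, 0.1]]}           # replace: rows identical (new machine)
rev = {(0, 0): 6, (1, 0): 1, (0, 1): 2, (1, 1): 3}
w, policy = finite_horizon_mdp(2, S, P, lambda x, a: rev[(x, a)],
                               lambda x: 2 - 2 * x)
print(policy)  # optimal action per (time, state)
```

The returned `policy` is the time-by-state action matrix described earlier for the deterministic model.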

Uncapacitated lot sizing problem. Consider the following deterministic N-period inventory production model. At time 0 no inventory is available, and in each period 1 ≤ n ≤ N demand occurs, the size dn ∈ N of which is known in advance. At the beginning of each period 1 ≤ n ≤ N one needs to decide on producing xn ∈ Z+ items to fulfill the demand in period n. The cost of producing xn items is given by an increasing so-called production cost function pn : Z+ → R+ satisfying pn(0) = 0, and there are no capacity restrictions. Also, if at the end of period 1 ≤ n ≤ N the inventory level is given by sn, then an inventory cost of hn(sn) needs to be paid, with hn : Z+ → R+ an increasing so-called inventory cost function satisfying hn(0) = 0. Finally, if at the beginning of period 1 ≤ n ≤ N a production is started, one incurs a production setup cost of fn > 0. The purpose now is to determine a production schedule which satisfies the occurring demand in time (no backordering allowed) and has minimal production, inventory and setup costs.

To model this problem we introduce for n = 1, ..., N the following decision variables:

xn = number of items produced in period n, decided at time n − 1,
yn = 1 if production starts in period n, and yn = 0 otherwise,
sn = inventory level at the end of period n.

Clearly an optimal production schedule should satisfy sN = 0, and in each period one will neither produce more than M = ∑_{n=1}^N dn items nor leave more than M items in inventory. The feasible region F of the above optimization problem is given by

F = { (xn, sn, yn) : sn−1 + xn = dn + sn,  1 ≤ n ≤ N
                     xn ≤ M yn,            1 ≤ n ≤ N
                     yn ∈ B,               1 ≤ n ≤ N
                     xn, sn ∈ Z+,          1 ≤ n ≤ N
                     s0 = sN = 0,  sn ≤ M }

and we need to solve the problem (observe there are only a finite number of feasible production schedules!)

min {∑_{n=1}^N (pn(xn) + hn(sn) + fn yn) : (xn, sn, yn) ∈ F}.

A special instance of the above problem is given by the classical uncapacitated lot sizing problem, in which the production cost and inventory cost functions are linear: pn(xn) = pn xn and hn(sn) = hn sn. Another important instance is given by the so-called concave uncapacitated lot sizing problem, in which the production cost and inventory cost functions pn, hn are discrete concave.

In this problem we incorporate economies of scale. We will now give a dynamic programming formulation to solve the above problem. In the remainder we only discuss the so-called forward formulation; the backward formulation is given as an exercise.

Solution. Introduce the state space S = {0, 1, 2, ..., M} representing the inventory level and consider for any s ∈ S the feasible regions Fk(s), k = 1, ..., N, given by

Fk(s) := { (xn, sn, yn) : sn−1 + xn = dn + sn,  1 ≤ n ≤ k
                          xn ≤ M yn,            1 ≤ n ≤ k
                          yn ∈ B,               1 ≤ n ≤ k
                          xn, sn ∈ Z+,          1 ≤ n ≤ k
                          s0 = 0,  sk = s,  sn ≤ M }     (63)

Also introduce the functions wk : S → R+ given by

wk(s) = min {∑_{n=1}^k (pn(xn) + hn(sn) + fn yn) : (xn, sn, yn)_{n=1}^k ∈ Fk(s)}.     (64)

Since there are only a finite number of elements in each feasible region Fk(s), the optimization problem associated with wk(s) has an optimal solution, and so we may use min. In the forward DP formulation the state is given by the left-over inventory level at the end of the k-period horizon. By the above definition of the functions wk, k = 1, ..., N, it follows that

wk(s) = cost of an optimal production schedule from period 1 to period k among all schedules satisfying sk = s at the end of period k.

This implies wN(0) = υ(P). In the next lemma we derive a recurrent relation relating the functions wk and wk+1.

Lemma 16. Introduce for k = 1, ..., N the one-period cost functions ck : S × Z+ → R in period k given by

ck(s, x) = hk(s) if x = 0, and ck(s, x) = pk(x) + fk + hk(s) if x ∈ N.

Then w1(s) = c1(s, s + d1) and for k = 1, ..., N − 1

wk+1(s) = min_{x∈B} {ck+1(s, x) + wk(s + dk+1 − x)}

with B = {x ∈ Z+ : 0 ≤ s + dk+1 − x ≤ M}.

Proof. Clearly for k = 1, ending up with inventory level s1 = s ≥ 0 at the end of period 1 and having zero inventory at the beginning of period 1, we must have produced s + d1 at the beginning of period 1. This shows

w1(s) = p1(s + d1) + f1 + h1(s) = c1(s, s + d1).     (65)

If we are now at the end of period k + 1 facing sk+1 = s, then at the beginning of period k + 1 we may have taken one of the following decisions:

1. No production, xk+1 = 0. If this decision was taken at the beginning of period k + 1 and at the end of period k + 1 we observe inventory level sk+1 = s, then at the end of period k the inventory level must have been sk = s + dk+1.

The associated minimal cost of not producing in period k + 1 is therefore given by

hk+1(s) + wk(s + dk+1) = ck+1(s, 0) + wk(s + dk+1).

2. Production in period k + 1 with production size xk+1 = x ∈ N. If this decision was taken at the beginning of period k + 1 and at the end of period k + 1 we observe inventory level sk+1 = s, then at the end of period k we must have had inventory level sk = s + dk+1 − x with 0 ≤ sk ≤ M. The associated minimal cost of producing x units in period k + 1 and observing at the end of period k + 1 inventory level sk+1 = s is therefore given by

pk+1(x) + fk+1 + hk+1(s) + wk(s + dk+1 − x) = ck+1(s, x) + wk(s + dk+1 − x).

Combining both cases, the minimum cost schedule with objective value wk+1(s) among all schedules which yield at the end of period k + 1 the inventory level sk+1 = s satisfies

wk+1(s) = min_{x∈B} {ck+1(s, x) + wk(s + dk+1 − x)}

with B = {x ∈ Z+ : 0 ≤ s + dk+1 − x ≤ M}. This shows the desired recurrence relation.

Using Lemma 16 we can now apply the following algorithm to determine an optimal production schedule.

Algorithm solving uncapacitated lot sizing model.
Step 1. Compute for every s ∈ {0, ..., M} the values w1(s) = c1(s, s + d1) and go to step 2.
Step 2. For k = 1, ..., N − 1 compute iteratively for every s ∈ {0, ..., M} the values

wk+1(s) = min_{x∈B} {ck+1(s, x) + wk(s + dk+1 − x)}

and go to step 3.
Step 3. Output wN(0) and use backtracking to determine an optimal production schedule.

Exercise 1. Formulate the backward DP formulation of the

uncapacitated lotsizing problem and define everything in a proper way (hint: compare with the maintenance model!).

We will now introduce a property of optimal production schedules in the uncapacitated concave lotsizing problem which yields a more efficient DP formulation.

Definition 5. A feasible production schedule (xn, sn−1), n = 1, ..., N, of the uncapacitated lotsizing model, with xn ∈ Z+ denoting the production in period n and sn ∈ Z+ the inventory level at the end of period n, satisfies the zero inventory property if

xn sn−1 = 0 for every 1 ≤ n ≤ N.

The schedules satisfying the zero-inventory property are called zero-inventory property schedules.
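The forward DP of Lemma 16 and the algorithm above can be sketched in Python. This is a minimal illustration, not part of the course notes: the function name and the representation of the cost data (the functions pk and hk passed as callables, so that concave production costs fit in) are my own choices.

```python
def solve_forward_dp(d, p, f, h, M):
    """Forward DP for the lot sizing model of Lemma 16.

    Lists are 0-based: d[k], p[k], f[k], h[k] give the demand, the
    production cost function, the setup cost and the holding cost
    function of period k + 1.  M bounds the inventory level.
    Returns (w, choice) with w[k][s] = w_{k+1}(s) and the minimizing
    production sizes for backtracking.
    """
    N = len(d)

    def c(k, s, x):
        # one-period cost c_k(s, x) of Lemma 16 (0-based period index)
        return h[k](s) if x == 0 else p[k](x) + f[k] + h[k](s)

    # Step 1: w_1(s) = c_1(s, s + d_1)
    w = [[c(0, s, s + d[0]) for s in range(M + 1)]]
    choice = [[s + d[0] for s in range(M + 1)]]
    # Step 2: w_{k+1}(s) = min_{x in B} { c_{k+1}(s, x) + w_k(s + d_{k+1} - x) }
    for k in range(1, N):
        wk, xk = [], []
        for s in range(M + 1):
            best, best_x = None, None
            for x in range(s + d[k] + 1):   # B: 0 <= s + d_{k+1} - x <= M
                prev = s + d[k] - x
                if prev > M:
                    continue
                val = c(k, s, x) + w[-1][prev]
                if best is None or val < best:
                    best, best_x = val, x
            wk.append(best)
            xk.append(best_x)
        w.append(wk)
        choice.append(xk)
    return w, choice
```

An optimal schedule is recovered by backtracking: start from state s = 0 in period N, read off xN = choice[N − 1][0], move to the previous state s + dN − xN, and repeat.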

Remark. Since we may assume without loss of generality that the demands in each period are positive (otherwise we combine periods!), it follows for a zero-inventory property schedule that production occurs at the beginning of period n if and only if the inventory level at the end of period n − 1 equals 0. We now show the following important property, useful in finding an optimal solution.

Lemma 17. For any k-period uncapacitated concave lotsizing problem there exists an optimal production schedule among the set of zero-inventory property schedules.

Proof. Consider an optimal solution (xn, sn−1), n = 1, ..., k, of the k-period uncapacitated concave lotsizing model satisfying xn0 sn0−1 > 0 for some 1 ≤ n0 ≤ k.

Since we start with inventory 0 at the beginning of period 1 and d1 > 0, it must follow that x1 > 0 and so x1 s0 = 0. Hence in any optimal schedule the zero inventory property holds at the beginning of period 1. Consider now the first period in which a production starts with positive inventory at the beginning of that period. Without any loss of generality we may assume this happens in period n0. We will now construct a feasible solution of the same problem satisfying the zero-inventory property at n0 without changing the objective function value. Since xn0 sn0−1 > 0 we obtain that xn0 > 0 and sn0−1 > 0. This shows that

ϵ := min{xn0, sn0−1} > 0.

Since we have a positive inventory at the end of period n0 − 1 and we started with inventory 0, there should have been a production before period n0.

Look now at the last production before period n0 and assume this is in period n1 < n0. Since by the definition of period n0 the zero inventory property holds in period n1 and we have a positive inventory ≥ ϵ at the end of period n0 − 1, the production in period n1 must have been at least ϵ. We now consider the following two possibilities (see next Figure).

Figure 1.6 Construction of different schedules (the production is shifted by ±ϵ between periods n1 and n0).

Increase the production in period n1 by ϵ (scenario 1) or decrease it by ϵ (scenario 2). In case we increase (decrease) the production at n1 by ϵ we decrease (increase) the production at n0 by ϵ.

In scenario 1 we obtain at the end of period n the inventory level sn + ϵ for every n1 ≤ n ≤ n0 − 1 and the production xn0 − ϵ in period n0. In scenario 2 we obtain the inventory level sn − ϵ for every n1 ≤ n ≤ n0 − 1 and the production xn0 + ϵ in period n0. Since we already observed that the production in period n1 is bigger than or equal to ϵ, both scenarios are feasible solutions. By the optimality of the solution (xn, sn, yn), n = 1, ..., k, this shows that

copt ≤ (1/2) c(scenario 1) + (1/2) c(scenario 2)

with copt the optimal objective value and c(·) the cost associated with the given scenarios. By the concavity of the production cost function we obtain at n = n1 that

pn1(xn1) ≥ (1/2) pn1(xn1 − ϵ) + (1/2) pn1(xn1 + ϵ)

and for n = n0

pn0(xn0) ≥ (1/2) pn0(xn0 + ϵ) + (1/2) pn0(xn0 − ϵ).

Moreover, by the concavity of the holding cost function we obtain

hn(sn) ≥ (1/2) hn(sn − ϵ) + (1/2) hn(sn + ϵ)

for every n1 ≤ n ≤ n0 − 1. Since by our construction of the two scenarios the costs only change between periods n1 and n0, and since both scenarios are feasible, we obtain by the concavity inequalities that

copt ≥ (1/2) c(scenario 1) + (1/2) c(scenario 2).

Hence it follows that

copt = (1/2) c(scenario 1) + (1/2) c(scenario 2)

and so both proposed scenarios must be optimal. By construction one of those scenarios has the zero inventory property in period n0 (check yourself which scenario, depending on whether xn0 = ϵ or sn0−1 = ϵ!). Applying the same procedure to all periods later than n0 violating the zero-inventory property, we finally obtain a zero-inventory property schedule with the same optimal value and we are finished.

Remark. By the above result we know that for the concave uncapacitated lot sizing model we may restrict our search for an optimal schedule to the class of zero-inventory property schedules, and this simplifies our DP procedure computationally. In case we consider the capacitated lot sizing model, the zero inventory property of an optimal schedule might not hold anymore and so we cannot use this restricted search.

By Lemma 17 it is sufficient to identify an optimal production schedule within the set of all zero-inventory property production schedules. This result also yields a fast dynamic programming approach to find such a schedule. Introduce now for every 1 ≤ k ≤ N the value (observe that we start with zero inventory at the beginning of period 1 and that at the end of period k the inventory also equals 0)

w(k) = value of a minimal cost schedule from period 1 up to period k among the set of all zero-inventory property schedules up to period k.    (66)

Clearly w(0) := 0, and by Lemma 17 we obtain that w(N) is the optimal objective value of our uncapacitated N-period lotsizing model. One can now derive the following recurrence relation.

Derivation of the recurrence relation for the lotsizing model with linear costs. Consider a k-period model ending at time k with sk = 0 and assume that at time m ≤ k − 1 the last production occurs. Since in each k-period model there is a positive total demand, we should have had a production order due to s0 = 0, and so there is a last production order before time k.

As already observed, this happens at time 0 ≤ m ≤ k − 1, i.e. at the beginning of period m + 1, and its order size is by definition given by xm+1. To determine this order size we obtain by the zero inventory property that the inventory level at the end of period m satisfies sm = 0. Also, since the total demand from time m up to time k equals ∑_{n=m+1}^{k} dn, it must follow that

xm+1 = ∑_{n=m+1}^{k} dn.

Hence the total inventory, production and setup costs associated with this last production order equal

c(m, k) := pm+1 dm+1,k + fm+1 + ∑_{n=m+1}^{k−1} hn dn+1,k    (67)

with

di,k := ∑_{n=i}^{k} dn

the total demand from time i up to time k.

Observe that the last term in relation (67) corresponds to the total inventory costs from time m up to time k. This shows that the associated cost of the k-period model having a last production order at time m is given by the optimal cost w(m) over the first m periods (this is actually an m-period model starting with inventory level zero and ending with zero inventory level) plus the cost c(m, k).
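Relation (67) can be turned into a small helper for later use in the DP recursion. This is a minimal sketch assuming linear production costs pn per unit as in this derivation; the function names (`total_demand`, `batch_cost`) and the 1-based list convention (index 0 unused) are illustrative choices, not part of the course notes.

```python
def total_demand(d, i, k):
    # d_{i,k} = d_i + ... + d_k, with 1-based periods (d[0] is a dummy)
    return sum(d[i:k + 1])

def batch_cost(d, p, f, h, m, k):
    # relation (67): cost c(m, k) of a last production order at the
    # beginning of period m + 1 covering the demands of periods m+1,...,k
    production = p[m + 1] * total_demand(d, m + 1, k) + f[m + 1]
    holding = sum(h[n] * total_demand(d, n + 1, k) for n in range(m + 1, k))
    return production + holding
```

For example, with demands (2, 3, 4), a last order at the beginning of period 2 (so m = 1) for a 3-period horizon covers d2 + d3 = 7 units.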

Figure: timeline of a 9-period example in which the last production occurs at the beginning of period 7, so that the total cost decomposes into the optimal costs w(6) over the first 6 periods plus the costs c(6, 9) of the last production order.

Hence it follows for 0 ≤ m ≤ k − 1 that

w(k) ≤ w(m) + c(m, k).

Since any minimal cost zero-inventory schedule in a k-period model must have a last production at some time m ≤ k − 1 and at the end of period m such a schedule has zero inventory, we obtain

w(k) = min_{0≤m≤k−1} {w(m) + c(m, k)}.

A dynamic programming procedure for the N-period classical uncapacitated lotsizing model is now given by the following algorithm.

Algorithm solving classical uncapacitated lot sizing model.
Step 1. Set w(0) = 0 and go to step 2.

Step 2. For k = 1 up to N compute

w(k) = min_{0≤m≤k−1} {w(m) + c(m, k)}    (68)

and go to step 3.
Step 3. Output w(N).

Again by backtracking one can compute an optimal production schedule: consider w(N) and find within relation (68) the value of m which achieves the minimum. The last production then takes place at the beginning of period m + 1; next consider w(m) and continue in this way. Unfortunately, for the capacitated lotsizing problem (capacities on production in each period) the zero-inventory property does not hold. However, it is still possible to solve this problem by dynamic programming (see exercises!), but this solution procedure is much slower.
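As a sketch, Steps 1 to 3 together with the backtracking can be implemented as follows. The function name, the 1-based list convention (a dummy entry at index 0) and the sample data in the usage note are my own illustrative assumptions; linear production costs pn per unit are assumed as in the derivation of relation (67).

```python
def solve_zero_inventory_dp(d, p, f, h):
    """DP of relation (68) for the classical uncapacitated lotsizing model
    with linear production costs p[n] per unit.

    All lists are 1-based: index 0 is a dummy entry, and d[n], p[n],
    f[n], h[n] belong to period n.
    Returns (w(N), sorted production periods of an optimal schedule)."""
    N = len(d) - 1

    def dem(i, k):
        # d_{i,k}: total demand from period i up to period k
        return sum(d[i:k + 1])

    def c(m, k):
        # relation (67): cost of a last production at the beginning of
        # period m + 1 covering the demands of periods m + 1, ..., k
        return (p[m + 1] * dem(m + 1, k) + f[m + 1]
                + sum(h[n] * dem(n + 1, k) for n in range(m + 1, k)))

    w = [0] * (N + 1)          # Step 1: w(0) = 0
    arg = [0] * (N + 1)
    for k in range(1, N + 1):  # Step 2: w(k) = min_m { w(m) + c(m, k) }
        w[k], arg[k] = min((w[m] + c(m, k), m) for m in range(k))
    # Step 3: backtracking; every minimizing m found along the optimal
    # path corresponds to a production at the beginning of period m + 1
    periods, k = [], N
    while k > 0:
        periods.append(arg[k] + 1)
        k = arg[k]
    return w[N], sorted(periods)
```

For instance, with demands (2, 3), unit costs (2, 2), setup costs (5, 5) and holding costs (1, 1), producing both demands in period 1 costs 2·5 + 5 + 1·3 = 18, which the recursion indeed prefers to paying two setups.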