A STOCHASTIC ALGORITHM FOR DETERMINISTIC MULTISTAGE OPTIMIZATION PROBLEMS

arXiv:1810.12870v1 [math.OC] 30 Oct 2018

MARIANNE AKIAN∗ , JEAN-PHILIPPE CHANCELIER† , AND BENOˆIT TRAN† Abstract. Several attempt to dampen the curse of dimensionnality problem of the Dynamic Programming approach for solving multistage optimization problems have been investigated. One popular way to address this issue is the Stochastic Dual Dynamic Programming method (SDDP) introduced by Perreira and Pinto in 1991 for Markov Decision Processes. Assuming that the value function is convex (for a minimization problem), one builds a non-decreasing sequence of lower (or outer) convex approximations of the value function. Those convex approximations are constructed as a supremum of affine cuts. On continuous time deterministic optimal control problems, assuming that the value function is semiconvex, Zheng Qu, inspired by the work of McEneaney, introduced in 2013 a stochastic max-plus scheme that builds upper (or inner) non-increasing approximations of the value function. In this note, we build a common framework for both the SDDP and a discrete time version of Zheng Qu’s algorithm to solve deterministic multistage optimization problems. Our algorithm generates monotone approximations of the value functions as a pointwise supremum, or infimum, of basic (affine or quadratic for example) functions which are randomly selected. We give sufficient conditions on the way basic functions are selected in order to ensure almost sure convergence of the approximations to the value function on a set of interest. Key words. Deterministic multistage optimization problem, min-plus algebra, tropical algebra, Stochastic Dual Dynamic Programming, Dynamic Programming.

Introduction. Throughout this paper we aim to solve an optimization problem involving a dynamic system in discrete time. Informally, given a time t and a state xt , one can apply a control ut and the next state is given by the dynamic ft , that is xt+1 = ft (xt , ut ). Then one wants to minimize the sum of costs ct (xt , ut ) induced by our controls starting from a given state x0 and a given time horizon T . Furthermore, one can add some final restrictions on the states at time T which will be modeled by a function ψ only depending on the final state xT . As in [1] we will call such optimization problems, multistage (optimization) problems:

T −1 X

min

x=(x0 ,...,xT ) u=(u0 ,...uT −1 ) t=0

(0.1)

( s.t.

ct (xt , ut ) + ψ(xT )

x0 ∈ X is given, ∀t ∈ [[0, T − 1]], xt+1 = ft (xt , ut ).

One can solve the multistage problem (0.1) by Dynamic Programming as introduced by Richard Bellman around 1950 [1, 5]. This method breaks the multistage problem (0.1) into T sub-problems that one can solve by backward recursion over the time t ∈ [[0, T ]]. More precisely, denoting by X the set of states and given some X X operators Bt : R → R from the set of functionals that may take infinite values to itself, one can show (see for example [3]) that solving problem (0.1) is equivalent to solving the following system of sub-problems: ∗ INRIA

´ Saclay Ile-de-France and CMAP, Ecole Polytechnique, CNRS, France. ´ Ecole des Ponts ParisTech, France. Email : [email protected]

† CERMICS,

1

2

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

( (0.2)

VT = ψ ∀t ∈ [[0, T − 1]], Vt : x ∈ X 7→ Bt (Vt+1 )(x) ∈ R.

We will call each operator Bt the Bellman operator at time t and each equation in (0.2) will be called the Bellman equation at time t. Lastly, the functions Vt defined in (0.2) will be called the (Bellman) value function at time t. Note that by solving the system (0.2) we mean that we want to compute the value function V0 at point x0 , that is V0 (x0 ). We will state several assumptions on these operators in section 1 under which we will devise an algorithm to solve the system of Bellman equations 0.2 (also called the Dynamic Programming formulation of the multistage problem). Let us stress on the fact that although we want to solve the multistage problem 0.1, we will mostly focus on its (equivalent) Dynamic Programming formulation given by (0.2). One issue of the Dynamic Programming approach to solve multistage problems is the so-called curse of dimensionality [2], that is, grid-based methods to compute the value functions have a complexity exponential in the dimension of the state space. One popular algorithm (see [8, 9, 10, 14, 19, 20]) that aims to dampen the curse of dimensionality is the Stochastic Dual Dynamic Programming algorithm (or SDDP for short) introduced by Pereira and Pinto in 1991. Assuming that the cost functions ct are convex and the dynamics ft are linear, the value functions defined in the Dynamic Programming formulation (0.2) are convex [8]. The SDDP algorithm aims to build lower (or outer) approximations of the value functions as a supremum of affine functions and thus, doesn’t rely on a discretisation of the state space in order to compute (approximations of) the value functions. In the aforementioned references, this approach is used to solve stochastic multistage problems, however in this article we will restrict our study to deterministic multistage problems, that is, the above formulation (0.1). Still, the SDDP algorithm can be applied to our framework. One of the main drawback of the SDDP algorithm is the lack of an efficient stopping criterion: it builds lower approximations of the value functions but upper (or inner) approximations are built through a Monte-Carlo scheme that is costly. During her thesis [15], Zheng Qu devised an algorithm [16] which builds upper approximations of the value functions in an infinite horizon and continuous time framework where the set of controls is both discrete and continuous. This work was inspired by the work of McEneaney [12, 13] using techniques coming from tropical algebra or also called min-plus techniques. Assume that for each fixed discrete control the cost functions are convex quadratic and the dynamics are linear. If the set of discrete controls is finite, then exploiting the min-plus linearity of the Bellman operators, one can show that the value functions can be computed as a finite pointwise infimum of convex quadratic functions: Vt = inf φt , φt ∈Ft

where Ft is a finite set of convex quadratic forms. Moreover, in this framework, the elements of Ft can be explicitly computed through the Discrete Algebraic Riccati Equation (DARE, [11]). Thus an approximation scheme that computes non-decreasing subsets Ftk k∈N of Ft yields an algorithm that converges after a finite number of

3 improvements Vtk := inf φt ≈ inf φt = Vt . φt ∈Ftk

φt ∈Ft

However the size of the set of functions Ft that need to be computed is growing exponentially with t. Informally, in order to address this issue, Qu introduced a probabilistic scheme that adds to Ftk the best (given the current approximations) element of Ft at some point drawn on the unit sphere (assuming the space of states to be Euclidean). Our work aims to build a general algorithm that encompasses both a deterministic version of the SDDP algorithm and an adaptation of Qu’s work to a discrete time and finite horizon framework. This paper is divided in 3 sections. In the first section we make several assumptions on the Bellman operators Bt and define an algorithm which builds approximations of the value functions as a pointwise optimum (i.e. either a pointwise infimum or a pointwise supremum) of basic functions in order to solve the associated Dynamic Programming formulation (0.2) of the multistage problem (0.1). At each iteration, the so-called basic function that is added to the current approximation will have to satisfy two key properties at a point randomly drawn, namely, tightness and validity. A key feature of our algorithm is that it can yield either upper or lower approximations, for example: — If the basic functions are affine, then approximating the value functions by a pointwise supremum of affine functions will yield the SDDP algorithm. — If the basic functions are quadratic convex, then approximating the value functions by a pointwise infimum of convex quadratic functions will yield an adaptation of Qu’s algorithm. In the following section we study the convergence of the approximations of the value functions generated by our algorithm at a given time t ∈ [[0, T ]]. Under the previous assumptions our approximating sequence converges almost surely (over the draws) to the value function on a set of interest (that will be specified). Finally on the last section we will specify our algorithm to the two special cases mentioned in the first section. The convergence result of section 2 specified to these two cases will be new for (an adaptation of) Qu’s algorithm and will be the same as in the literature for the SDDP algorithm. It will be a step toward addressing the issue of computing efficient upper approximations for the SDDP algorithm and opens another way to devise algorithms for a broader class of multistage problems. 1. Notations and definitions. Notations 1.1. — Denote by X := Rn the set of states endowed with its euclidean structure and its Borel σ-algebra. — Denote by T a finite integer that we’ll call the horizon. — Denote by opt an operation that is either the pointwise infimum or the pointwise supremum of functions which we will call the pointwise optimum. Note that once a choice of which operation is associated with opt, it remains the same for the remainder of this article. — Denote by R the extended reals endowed with the operations +∞ − ∞ = −∞ + ∞ = +∞. X — For every t ∈ [[0, T ]], fix Ft and Ft two subsets of R the set of functionals on X such that Ft ⊂ Ft .

4

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Fig. 1. The lower approximations Vt k will be built as a supremum of basic functions (here k

affine functions) that will always be below Vt . Likewise, the upper approximations Vt will be built as an infimum of some other basic functions (here quadratic functions) that will always be above Vt .

— We will say that a functional φ is a basic function if it’s an element of Ft for some t ∈ [[0, T ]]. — For every set X ⊂ X, denote by δX the function equal to 0 on X and +∞ elsewhere. — For every t ∈ [[0, T ]] and every set of basic function Ft ⊂ Ft we denote by VFt its pointwise optimum, VFt := optφ∈Ft φ, i.e. (1.1)

VF : X x

−→ 7−→

R opt {φ(x) | φ ∈ Ft } . X

X

— Denote by (Bt )t∈[[0,T −1]] a sequence of T operators from R to R , that we will call the Bellman operators. — Fix a functional ψ : X → R. We define a sequence of functions (Vt )t∈[[0,T ]] , called the value functions, by the system of Bellman equations: ( VT = ψ (1.2) ∀t ∈ [[0, T − 1]], Vt : x ∈ X 7→ Bt (Vt+1 )(x) ∈ R. We first make several assumptions on the structure of problem (1.2). Those assumptions will be satisfied in the examples of section 3. Informally, we want some regularities on the Bellman operators so as to propagate, backward in time, good behaviours of the value function at the final time t = T to the value function at the initial time t = 0. Moreover, at each time t, we ask that the basic functions that build our approximations are such that their pointwise optimum share a common regularity. Assumptions 1.2 (Structural assumptions). — Common regularity: for every t ∈ [[0, T ]], there exist a common (local) modulus of continuity for every φ ∈ Ft , i.e. for every x ∈ X, there exist ωx : R+ ∪{+∞} → R+ ∪{+∞} which is increasing, equal to 0 and continuous at 0 such that for every φ ∈ Ft and for every x0 ∈ X we have that |f (x) − f (x0 )| ≤ ωx (|x − x0 |) .

5 — Final condition: the value function at time T is a pointwise optimum for some given subset FT of FT as in (1.1), that is VT := VFT . — Stability by the Bellman operators: for every t ∈ [[0, T − 1]], if φ ∈ Ft+1 then Bt (φ) belongs to Ft . — Stability by pointwise optimum: for every t ∈ [[0, T ]], if Ft ⊂ Ft then VFT ∈ Ft . — Order preserving operators: the operators Bt , 0 ≤ t ≤ T − 1 are order preserving, i.e. if φ, ϕ ∈ Ft+1 are such that φ ≤ ϕ, then Bt (φ) ≤ Bt (ϕ). — Existence of the value functions: there exists a solution (Vt )t∈[[0,T ]] to the Bellman equations (0.2) that never takes the value −∞ and which is not equal to +∞. — Existence of optimal sets: for every t ∈ [[0, T − 1]] and every compact set Xt ⊂ dom (Vt ), there exist a compact set Xt+1 ⊂ dom (Vt+1 ) such that for every functional φ ∈ Ft+1 we have Bt φ + δXt+1 ≤ Bt (φ) + δXt . — Additively M -subhomogeneous operators: the operators Bt , 0 ≤ t ≤ T − 1, are additively M -subhomogeneous, i.e. there exist a constant M ≥ 0 such that for every positive constant functional λ and every functional φ ∈ Ft+1 we have that Bt (φ + λ) + δdom(Vt ) ≤ Bt (φ) + M λ + δdom(Vt ) . From a set of basic functions Ft ⊂ Ft , one can build its pointwise optimum VFt = optφ∈Ft φ. We build monotone approximations of the value functions as an optimum of basic functions which will be computed through a compatible selection function as defined below. We illustrate this definition in Figure 2. Definition 1.3 (Compatible selection function). Let t ∈ [[0, T − 1]] be fixed. A compatible selection function is a function φ]t from 2Ft+1 × X to Ft satisfying the following properties — Validity: for all set of basic functions Ft+1 ⊂ Ft+1 and every x ∈ X we have φ]t (Ft+1 , x) ≤ Bt VFt+1 (resp. φ]t (Ft+1 , x) ≥ Bt VFt+1 ) when opt = sup (resp. opt = inf). — Tightness: for all set of basic functions Ft+1 ⊂ Ft+1 and every x ∈ X the functions φ]t (Ft+1 , x) and Bt VFt+1 coincide at point x, that is φ]t (Ft+1 , x) (x) = Bt VFt+1 (x). For t = T , we say that φ]T : 2FT × X → FT is a compatible selection function if function φ]T is valid in the sense that for every FT ⊂ FT and x ∈ X, the function φ]T (FT , x) remains above (resp. below) the value function at time T when opt = inf (resp. opt = sup). Moreover, we say that function φ]T is tight if it coincides with the value function at point x, that is for every FT ⊂ FT and x ∈ X we have φ]T (FT , x) (x) = VT (x). Note that the tightness assumption only asks for equality at the point x between the functions φ]t (Ft+1 , x) and Bt VFt+1 and not necessarily everywhere. The only global property between the functions φ]t (Ft+1 , x) and Bt VFt+1 is an inequality given by the validity assumption. In Algorithm 1.1 we will generate for every time t a sequence of random points of crucial importance as they will be the ones where the selection functions will be

6

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Fig. 2. In section 3 we will specify two selection functions, φQu and φSDDP that will respectively t t yield upper and lower approximations of the value functions. In both cases, at the the selection function is equal to the image by the Bellman operator of the current approximation, that is the tightness assumption. Moreover it remains above the image by the Bellman operator of the current approximation, that is the validity assumption.

evaluated, given the set Ftk which characterizes the current approximation. In order to generate those points, we will assume that we have at our disposition an Oracle that given T +1 set of functions (characterizing the current approximations) computes T + 1 compact sets and a probability law of support equal to those compact sets. This Oracle will have to follow the following conditions on its output. Assumptions 1.4 (Oracle assumptions). The Oracle takes as input T + 1 sets of functions included in F1 × . . . × FT . Its output is T +1 compact sets K0 , . . . , KT each included in X and a probability measure µ on XT +1 (where X = Rn is endowed with its borelian σ-algebra) such that: — For every t ∈ [[0, T ]], Kt ⊂ dom (Vt ). — For every t ∈ [[0, T ]], there exist a function gt : R∗+ → (0, 1) such that for every η > 0 and every x ∈ Kt , µ (B (x, η) ∩ Kt ) ≥ gt (η) . An example of such Oracle would be one that outputs T + 1 union of N singletons in dom (Vt ) (for some positive integer N ) and the product measure of µt , 0 ≤ t ≤ T where µt is the uniform measure over the N singletons. Then for every t ∈ [[0, T ]] the constant function gt equal to n1 satisfies Assumptions 1.4. For every time t ∈ [[0, T ]], we construct a sequence of functionals Vtk k∈N of Ft as follows. For every time t ∈ [[0, T ]] and for every k ≥ 0, we build a subset Ftk of the set Ft and define the sequence of functionals by pointwise optimum Vtk := VFtk = optφFtk φ. As described here, the functionals are just byproducts of Algorithm 1.1 which only describes the way the sets Ftk are defined. As Algorithm 1.1 was inspired by Qu’s work which uses tropical algebra techniques, we will call this algorithm Tropical Dynamic Programming. 2. Almost sure convergence on the set of accumulation points.

7 Algorithm 1.1 Tropical Dynamic Programming (TDP) Input: For every t ∈ [[0, T ]], φ]t a compatible selection function, an Oracle satisfying Assumptions 1.4, T + 1 compact sets K00 × . . . × KT0 and a probability measure µ0 on XT +1 of support equal to those T + 1 compact sets. Output: For every t ∈ [[0, T ]], a sequence of sets Ftk k∈N . Define for every t ∈ [[0, T ]], Ft0 := ∅. for k ≥ 1 do Draw some points xk−1 over K0k−1 × K1k−1 × . . . × KTk−1 according to t t∈[[0,T ]] 0 µk−1 , independently from previous draws at iterations k < k . Compute φkT := φ]T FTk−1 , xk−1 . T Define FTk := FTk−1 ∪ φkT . for t from T − 1 to 0 do k Compute φkt := φ]t Ft+1 , xk−1 . t Define Ftk := Ftk−1 ∪ φkt . end for Compute K0k × . . . KTk , µk := Oracle F0k , . . . , FTk . end for

First, we state several simple but crucial properties of the approximation functions Vtk k∈N generated by Algorithm 1.1. They are direct consequences of the facts that the Bellman operators are order preserving and that the basic functions building our approximations are computed through compatible selection functions. Lemma 2.1. 1. Let (F k )k∈N be a non-decreasing (for the inclusion) sequence of set of functionals on X. Then the sequence (VF k )k∈N is monotone. More precisely, when opt = inf then (VF k )k∈N is non-increasing and when opt = sup then (VF k )k∈N is non-decreasing. 2. Monotone approximations: for every indices k < k 0 we have that 0

Vtk ≥ Vtk ≥ Vt

(2.1)

when opt = inf

0

and Vtk ≤ Vtk ≤ Vt when opt = sup. 3. For every k ∈ N and every t ∈ [[0, T − 1]] we have that k (2.2) Bt Vt+1 ≤ Vtk when opt = inf k and Bt Vt+1 ≥ Vtk when opt = sup. 4. For every k ≥ 1, we have k−1 k (2.3) Bt Vt+1 xt = Vtk xk−1 . t Proof. We prove each point succesively when opt = inf, as the case opt = sup is similar. 0 1. Let F k ⊂ F k be two set of functionals. When opt = inf for every x ∈ X we have that VF k0 (x) := inf 0 φ(x) ≤ inf φ(x) =: VF k (x). φ∈F k

φ∈F k

2. By construction of Algorithm 1.1, the sequence of sets Ftk k∈N is non0 decreasing, thus for every indices k < k 0 we have that Vtk ≥ Vtk when 0 opt = inf (and Vtk ≤ Vtk when opt = sup).

8

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Now we show that Vtk ≥ Vt when opt = inf, the case opt = sup is analogous. Fix k ∈ N, we show this by backward recursion on t ∈ [[0, T ]]. For t = T , by validity of the selection functions Definition 1.3, for every φ ∈ FTk we have that φ ≥ VT . Thus VTk = VFTk = inf φ∈FTk φ ≥ VT . Now, suppose that for k some t ∈ [[0, T − 1]] we have Vt+1 ≥ Vt+1 . Applying the Bellman operator, using the definition of the value function (0.2) and as the Bellman operators are order preserving, we get the desired result. 3. We prove the assertion by induction on k ∈ N in the case opt = inf. For k = 0, as Ft0 = ∅ we have Vt0 = +∞. Thus the assertion is true for k = 0. Assume that for some k ∈ N we have k (2.4) Bt Vt+1 ≤ Vtk . k+1 k By (2.1) we have that Vt+1 ≤ Vt+1 . Thus, as the Bellman operators are k+1 k order preserving we have that Bt Vt+1 ≤ Bt Vt+1 . Thus by induction hypothesis (2.4) we get k+1 (2.5) Bt Vt+1 ≤ Vtk .

Moreover as the selection function is valid, we have that : k+1 (2.6) Bt Vt+1 ≤ φk+1 . t , Finally, by construction of Algorithm 1.1 we have that Vtk+1 = inf Vtk , φk+1 t so using (2.5) and (2.6) we deduce the desired result k+1 Bt Vt+1 ≤ Vtk+1 . 4. As the selection function φ]t is tight in the sense of Definition 1.3 we have by construction of Algorithm 1.1 that k−1 k Bt Vt+1 xt = φkt xk−1 . t Combining it with (2.2) (or its variant when opt = sup) and the definition of Vtk , one gets the desired equality. In the followingk two propositions, we state that the sequences of functionals Vtk k∈N and Bt Vt+1 converge uniformly on any compact included in the dok∈N main of Vt . The limit functional of Vtk k∈N , noted Vt∗ , will be our natural candidate to be equal to the value function Vt . Moreover, the convergence will be µ-almost sure where (see [6, pages 257-259]) µ is the unique probability measure over the countable cartesian product XT +1 × . . . × XT +1 × . . . endowed with the product σ-algebra such that for every borelian Xi ⊂ XT +1 , 1 ≤ i ≤ k, Y µ X1 × . . . × Xk × XT +1 = µ1 (X1 ) × . . . × µk (Xk ) , i≥k+1

where µk k∈N is the sequence of probability measures generated by Algorithm 1.1 through the Oracle. Proposition 2.2 (Existenceof an approximating limit). Let t ∈ [[0, T ]] be fixed. The sequence of functionals Vtk k∈N defined as Vtk := VFtk (where the sets Ftk are generated by Algorithm 1.1) µ-a.s. converges uniformly on every compact set included in the domain of Vt to a functional Vt∗ .

9 Proof. Let t ∈ [[0, T ]] be fixed and let Kt be a given compact set included in the domain of Vt . We denote by Vtk k∈N a sequence of approximations generated by Algorithm 1.1. The proof relies on the Arzel`a-Ascoli Theorem [18, Theorem 2.13.30 p.347]. More precisely, First, by Assumptions 1.2 each functional in Ft have a common modulus of continuity. Thus as Ftk ⊂ Ft , the family of functionals Vtk k≥0 is equicontinuous. Now, by Lemma 2.1, the sequence of functionals Vtk k≥0 is monotone. Now for every x ∈ Kt , the set Vtk (x) k≥1 is bounded by Vt (x), which is finite since we assumed Kt ⊂ dom (Vt ), and Vt1 (x). Hence the set Vtk (x) k≥1 is a bounded subset of R and thus relatively compact. By Arzel` a-Ascoli Theorem, the sequence of functions Vtk k≥1 is a relatively compact set of C (Kt , R) for the topology of the uniform convergence, i.e. there exists a subsequence of Vtk k≥1 converging uniformly to a function Vt∗ ∈ C (Kt , R) . Finally, as Vtk k≥0 is a monotone sequence of functions, we conclude that the sequence Vtk k≥0 converges uniformly on the compact Kt to Vt∗ ∈ C (Kt , R). Proposition 2.3. Let t ∈ [[0, T − 1]] be fixed and Vt∗ be the function defined in k Proposition 2.2. The sequence Bt Vt+1 µ-a.s. converges uniformly to the continuous ∗ function Bt Vt+1 on every compact sets included in the domain of Vt . Proof. We will stick to the case when opt = inf and leave the other case to the reader. Let Kt be a given compact set included in dom (Vt ). First, as the sek quence Vt+1 is non-increasing and using the fact that the operator Bt is order k∈N k preserving, the sequence Bt Vt+1 is also non-increasing. By stability of the k∈N k Bellman operator Bt (see Assumptions 1.2),we have that the function Bt Vt+1 is k in Ft for every k ∈ N and thus the family Bt Vt+1 is equicontinuous using k∈N the common regularity assumption in Assumptions 1.2. Moreover, given x ∈ Kt , the k set Bt Vt+1 (x) k≥1 is bounded by Vt1 (x) and Vt (x) which take finite values on dom (Vt ). Thus, using againArzel` a-Ascoli Theorem, there exists a continuous funck tional φ such that Bt Vt+1 converges uniformly to φ on any compact included k∈N in dom (Vt ). ∗ We now show that the functional φ is equal to Bt Vt+1 on the given compact ∗ Kt or equivalently we show that φ + δKt = Bt Vt+1 + δKt . As already shown in k ∗ Proposition 2.2, the sequence Vt+1 is lower bounded by Vt+1 . We thus have that k∈N k ∗ Vt+1 ≥ Vt+1 , which combined with the fact that the operator Bt is order preserving, gives, for every k ≥ 1, that k ∗ Bt Vt+1 ≥ Bt Vt+1 . Adding, on both side of the previous inequality, the mapping δKt and taking the limit as k goes to infinity, we have that ∗ φ + δKt ≥ Bt Vt+1 + δKt . For the converse inequality, by the existence of optimal sets (see Assumptions 1.2), there exists a compact set Kt+1 ⊂ dom (Vt+1 ) such that ∗ ∗ (2.7) Bt Vt+1 + δKt+1 ≤ Bt Vt+1 + δKt . k By Proposition 2.2, the non-increasing sequence Vt+1 converges uniformly to k∈N ∗ Vt+1 on the compact set Kt+1 . Thus, for any fixed > 0, there exists an integer

10

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

k0 ∈ N such that we have k k ∗ Vt+1 ≤ Vt+1 + δKt+1 ≤ Vt+1 + + δKt+1 ,

for all k ≥ k0 . By Assumptions 1.2, the operator Bt is order preserving and additively M-subhomogeneous, thus we get k k ∗ Bt Vt+1 ≤ Bt Vt+1 + δKt+1 ≤ Bt Vt+1 + δKt+1 + M , which, combined with (2.7) gives that k ∗ Bt Vt+1 + δKt ≤ Bt Vt+1 + M + δKt , for every k ≥ k0 . Taking the limit when k goes to infinity we obtain that ∗ φ + δKt ≤ Bt Vt+1 + δKt + M . ∗ The result is proved for all > 0 and we have thus shown that φ = Bt Vt+1 on k the compact set Kt . We conclude that Bt Vt+1 converges uniformly to the k∈N ∗ functional Bt Vt+1 on the compact set Kt . We want to exploit that our approximations of the final cost function are exact in the sense that we have equality between VTk and VT at the points drawn in Algorithm 1.1, that is, the tightness assumption of the selection function is much stronger at time T than for times t < T . Thus we want to propagate the information backward in time: starting from time t = T we want to deduce information on the approximations for times t < T . In order to show that Vt = Vt∗ on some set Xt , a dissymmetry between upper and lower approximations is emphasized. We introduce the notion of optimal sets (Xt )t∈[[0,T ]] with respect to a sequence of functionals (φt )t∈[[0,T ]] as a condition on the sets (Xt )t∈[[0,T ]] such that if one wants to compute the restriction of Bt (φt+1 ) to Xt , ones only need to know φt+1 on the set Xt+1 . The Figure 3 illustrates this notion. Definition 2.4 (Optimal sets). Let (φt )t∈[[0,T ]] be T + 1 functionals on X. A sequence of sets (Xt )t∈[[0,T ]] is said to be (φt )-optimal or optimal with respect to (φt )t∈[[0,T ]] , if for every t ∈ [[0, T − 1]] we have (2.8)

Bt φt+1 + δXt+1 + δXt = Bt (φt+1 ) + δXt .

When approximating from below, the optimality of sets is only needed for the functions (Vt∗ )t∈[[0,T ]] whereas when approximating from above, one needs the optimality of sets with respect to (Vt )t∈[[0,T ]] . It seems easier to ensure the optimality of sets for (Vt∗ )t∈[[0,T ]] than for (Vt )t∈[[0,T ]] as the functional Vt∗ is known through the se quence Vtk k∈N whereas the function Vt is, a priori, unknown. This fact is discussed in section 3. Lemma 2.5 which is — optimal — optimal If the sequence Equations:

(Unicity in Bellman Equation). Let (Xt )t∈[[0,T ]] be a sequence of sets with respect to (Vt )t∈[[0,T ]] when opt = inf, with respect to (Vt∗ )t∈[[0,T ]] when opt = sup. of functionals (Vt∗ )t∈[[0,T ]] satisfies the following modified Bellman

11 j

k

l

m

Fig. 3. The optimality of the sets (Xt )t∈[[0,T ]] means that in order to compute the restriction of Bt (φt+1 ) to Xt , one only needs to know the values of φt+1 on the set Xt+1 .

( (2.9)

VT∗ + δXT = ψ + δXT ∗ ∀t ∈ [[0, T − 1]], Bt Vt+1 + δXt = Vt∗ + δXt .

Then for every t ∈ [[0, T ]] and every x ∈ Xt we have that Vt∗ (x) = Vt (x). Proof. We prove the lemma by backward recursion on the time t ∈ [[0, T ]], first for the case opt = inf. For time t = T , since by definition of VT = ψ (see (0.2)) we have VT∗ + δXT = ψ + δXT = VT + δXT . Now, let time t ∈ [[0, T − 1]] be fixed and ∗ assume that we have Vt+1 (x) = Vt+1 (x) for every x ∈ Xt+1 i.e. (2.10)

∗ Vt+1 + δXt+1 = Vt+1 + δXt+1 .

Using Lemma 2.1, the sequence of functions Vtk is lower bounded by Vt . Taking the limit in k, we obtain that Vt∗ ≥ Vt , we have thus only to prove that Vt∗ ≤ Vt on Xt , that is Vt∗ + δXt ≤ Vt + δXt . We successively have: (by (2.9))

∗ Vt∗ + δXt = Bt Vt+1 + δ Xt

(by induction assumption (2.10))

∗ ≤ Bt Vt+1 + δXt+1 + δXt = Bt Vt+1 + δXt+1 + δXt

((Xt )t∈[[0,T ]] is (Vt )-optimal)

= Bt (Vt+1 ) + δXt

(by (0.2))

= Vt + δXt ,

(Bt is order preserving)

which concludes the proof in the case of opt = inf. Now we briefly prove the case opt = sup by backward recursion on the time t ∈ [[0, T ]]. As for the case opt = inf, at time t = T , one has VT∗ + δXT = VT + δXT . Now assume that for some t ∈ [[0, T − 1]] one has Vt+1 + δXt+1 = Vt+1 + δXt+1 . By Lemma 2.1, the sequence of functions Vtk is upper bounded by Vt . Thus, taking the limit in k we obtain that Vt∗ ≤ Vt and we

12

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

only need to prove that Vt∗ + δXt ≥ Vt + δXt . We successively have: Vt + δXt = Bt (Vt+1 ) + δXt

(by (0.2))

((Xt )t∈[[0,T ]] is (Vt∗ )-optimal)

≤ Bt Vt+1 + δXt+1 + δXt ∗ = Bt Vt+1 + δXt+1 + δXt ∗ = Bt Vt+1 + δ Xt

(by (2.9))

= Vt∗ + δXt ,

(Bt is order preserving) (by induction assumption (2.10))

In the general case, one cannot hope for the limit object Vt∗ to be equal to the value function Vt everywhere. However, one can expect an (almost sure over the draws) equality between the two functions Vt and Vt∗ on all possible cluster points of sequences (yk )k∈N with yk ∈ Ktk for all k ∈ N, that is, on the set lim sup Ktk (see [17, Definition 4.1 p. 109]). Theorem 2.6 (Convergence of Tropical Dynamic Programming). Define Kt∗ := lim supk Ktk , for every time t ∈ [[0, T ]]. Assume that, µ-a.s the sets (Kt∗ )t∈[[0,T ]] are (Vt )-optimal when opt = inf (resp. (Vt∗ )-optimal when opt = sup). Then, µ-a.s. for every t ∈ [[0, T ]] the functional Vt∗ defined in Proposition 2.2 is equal to the value function Vt on Kt∗ . Proof. We will only study the case opt = inf as the case opt = sup is analogous. We will show that (2.9) holds with Xt = Kt∗ . By tightness of the selection function ∗ at time T , we get that VT∗ = VT on K fix t ∈ [[0, T − 1]]. T . Now, ∗ — First, we show that Bt Vt+1 ≤ Vt∗ . By Lemma 2.1, for every k ∈ N we have that k Bt Vt+1 ≤ Vtk . ∗ k Moreover, by Lemma 2.1 we have Vt+1 ≤ Vt+1 thus, as the operator Bt is order preserving we have ∗ k Bt Vt+1 ≤ Bt Vt+1 ≤ Vtk −→ Vt∗ (by Proposition 2.2). k→+∞

∗ — Now, we will show by contradiction that Bt Vt+1 ≥ Vt∗ on Kt∗ . Suppose ∗ that there exist x ∈ Kt such that ∗ (2.11) Bt Vt+1 (x) < Vt∗ (x). ∗ Define h := Vt∗ (x) − Bt Vt+1 (x) > 0. As illustrated in Figure 4, we will ∗ (x) show that there is an index k such that Vtk+1 (x) will be closer to Bt Vt+1 than Vt∗ (x), which will contradict the fact that the sequence Vtk (x) k∈N is non-increasing. To this end, we state several facts : – By equicontinuity of the sequence of fonctionals Vtk k∈N there exist η > 0 such that for every y ∈ B (x, η), for every index k ∈ N we have (2.12)

|Vtk (y) − Vtk (x)| ≤

h . 4

– By Lemma 2.1, for every index k ∈ N the k-th draw at time t, noted xkt is such that k k+1 (2.13) Bt Vt+1 (xt ) = Vtk+1 (xkt ).

13

∗ Fig. 4. If there exist x ∈ Kt∗ such that Bt Vt+1 (x) < Vt∗ (x) then there is an index k such k+1 ∗ that Vt (x) will be closer to Bt Vt+1 (x) than Vt∗ (x) as the selection function is tight. This will contradict the fact that the sequence Vtk (x) k∈N is non-increasing.

k – By Proposition 2.3, the sequence Bt Vt+1 converges uniformly to k∈N ∗ the continuous functional Bt Vt+1 on arbitrary compact sets included in dom (V t ). Hence, it converges pointwise to the continuous function ∗ Bt Vt+1 on dom (Vt ). Thus, we get the following inequality: for any y ∈ dom (Vt ), there exist a rank k0 ∈ N such that if k ≥ k0 we have that h ∗ k |Bt Vt+1 (y) − Bt Vt+1 (y)| ≤ . 4 ∗ 0 – By continuity of Bt Vt+1 at x, there exist η > 0 such that η 0 < η and for every y ∈ B (x, η 0 ) we have (2.14)

(2.15)

h ∗ ∗ |Bt Vt+1 (y) − Bt Vt+1 (x)| ≤ . 4

– As x is in Kt∗ := lim supk Ktk , by definition of the limsup, there exist an σ(k) σ(k) increasing function σ : N → N and a sequence of points yt ∈ Kt σ(k) −→ x. Thus, there exist a rank k1 ≥ k0 such that if such that yt k→+∞ 0 0 σ(k) σ(k) k ≥ k1 then yt ∈ B x, η2 and a fortiori B yt , η2 ⊂ B (x, η 0 ). Let (Xkt )k∈N be the sequence of random variables where for each k ∈ N, Xkt is the t-th marginal of a random variable of probability law µk . We have that for every k ≥ k1 , σ(k) σ(k) σ(k) P Xt ∈ B (x, η 0 ) ≥ P Xt ∈ B yt , η 0 /2 σ(k) σ(k) σ(k) ≥ P Xt ∈ B yt , η 0 /2 ∩ Kt σ(k) σ(k) = µσ(k) B yt , η 0 /2 ∩ Kt 0 η ≥g (by Assumptions 1.4). 2

14

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Hence, the probability that the subsequence never draw a point in the ball B (x, η 0 ) after the rank σ (k1 ) is bounded from above by Y (1 − g (η 0 /2)) = 0. | {z } k≥k1

∈(0,1)

Therefore, µ-almost surely, there exists an index k2 ≥ k1 such that the σ(k2 )-th draw at time t in Algorithm 1.1 satisfies: xkt ∈ B (x, η 0 ) ,

(2.16)

where k is a simplified notation for σ(k2 ), k := σ(k2 ). By (2.16), the state xkt satisfies both (2.12) and (2.15). Thus, we can conclude ∗ that Vtk+1 (x) is closer to Bt Vt+1 (x) than Vt∗ (x) as detailed now: ∗ |Vtk+1 (x) − Bt Vt+1 (x) | ≤ |Vtk+1 (x) − Vtk+1 xkt | {z } | ≤h/4 by (2.12)

k+1 + |Vtk+1 xkt − Bt Vt+1 xkt | | {z } =0 by (2.13)

+ |Bt | +

k+1 Vt+1

∗ |Bt Vt+1

| ≤

k ∗ xkt − Bt Vt+1 xt | {z } ≤h/4 by (2.14)

∗ xkt − Bt Vt+1 (x) | {z }

≤h/4 by (2.15)

3h . 4

Hence we have that ∗ ∗ Vtk+1 (x) = Vtk+1 (x) − Bt Vt+1 (x) + Bt Vt+1 (x) 3h ∗ ≤ Bt Vt+1 (x) + . 4 And finally we get ∗ ∗ (x) − Vt∗ (x) − Bt Vt+1 (x) Vtk+1 (x) − Vt∗ (x) = Vtk+1 (x) − Bt Vt+1 3h −h 4 < 0, ≤

which contradicts the fact that the sequence Vtk k∈N is non-increasing (Lemma 2.1). Hence, there is no x ∈ Kt such that (2.11) holds. We conclude that the sequence (Vt∗ )k∈N satisfies the modified Bellman Equation (2.9) with the sequence (Kt∗ )k∈N . The conclusion follows from the Unicity Lemma Lemma 2.5. 3. The multistage framework and examples of selection functions.

15 3.1. SDDP selection function: lower approximations in the linearconvex framework. We will show that our framework contains a similar framework of (the deterministic version of) the SDDP algorithm as described in [8] and yields the same result of convergence. Let X = Rn be a continuous state space and U = Rm a continuous control space. We want to solve the following problem min

T −1 X

x=(x0 ,...,xT ) u=(u0 ,...uT −1 ) t=0

(3.1)

ct (xt , ut ) + ψ(xT )

x0 ∈ X is given, ∀t ∈ [[0, T ]], x ∈ X ⊂ X, t t s.t. ∀t ∈ [[0, T − 1]], u ∈ Ut ⊂ U, t ∀t ∈ [[0, T − 1]], xt+1 = ft (xt , ut ).

We make similar assumptions as in [8], one can look at this article for more in-depth comments about them. Assumptions 3.1. For all t ∈ [[0, T − 1]] we assume that: — The set Xt ⊂ X and XT ⊂ X are convex and compacts with non-empty relative interior. — The set Ut is non-empty, convex and compact. — The dynamic ft : X × U −→ X is linear: ft (x, u) = At x + Bt u, for some given matrices At and Bt of compatible dimensions. — The cost function ct : X × U −→ R is a proper convex lower semicontinuous (lsc) function and is a Ct -Lipschitz continuous function on Xt × Ut . — The final cost function ψ : X −→ R is a proper convex lsc function and is a CT -Lipschitz continuous function on XT . — Relatively Complete Recourse (RCR). For every x ∈ Xt we have that ft (x, Ut ) ∩ ri (Xt+1 ) 6= ∅. For every time step t ∈ [[0, T − 1]], we define the Bellman operator Bt for every functional φ : X → R by: Bt (φ) := inf ct (·, u) + φ (ft (·, u)) . u∈U

Moreover, for every functional φ : X → R and every (x, u) ∈ X × U we define Btu (φ) (x) := ct (x, u) + φ (ft (·, u)) ∈ R. The Dynamic Programming principle using the Bellman’s operators Bt yields: ( VT = ψ (3.2) ∀t ∈ [[0, T − 1]], Vt : x ∈ X 7→ Bt (Vt+1 )(x) ∈ R. Using a generalization of Hoffman’s Lemma [4, Theorem 9] that bounds from above the distance between the image by a linear mapping of a point and a convex set, we show that the image of a Lipschitz continuous function by a Bellman operator will also be Lipschitz continuous, with an explicit (conservative) constant.

16

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Proposition 3.2. Under Assumptions 3.1, for every t ∈ [[0, T − 1]] there exist a constant Lt > 0 such that if φ : X → R is convex and Lt+1 -Lipschitz continuous on Xt+1 then Bt (φ) is Lt -Lipschitz continuous on Xt . Proof. Fix t ∈ [[0, T − 1]] and let φ : X → R be a convex and Lt+1 -Lipschitz continuous function on Xt+1 . Let xt ∈ Xt be arbitrary. By the RCR Assumption 3.1, there exist ut ∈ Ut such that ft (xt , ut ) ∈ ri (Xt+1 ). Thus, we have min ct (xt , u) + φ (ft (xt , u)) ≤ ct (xt , ut ) + φ (ft (xt , ut )) < +∞.

u∈Ut

Moreover, by compacity of Ut and lower semicontinuity of u 7→ ct (x, u), φ and ft , the above minimum is attained. We have shown that dom (Bt (φ)) includes Xt . Now the function x 7→ minu∈Ut ct (x, u) + φ (ft (x, u)) = Bt (φ) (x) is convex on Xt as a marginal of a jointly convex function. We finally show that the function Bt (φ) is Lipschitz on Xt with a constant Lt > 0 detailled below. Let x be in Xt and note ux an optimal control at x, that is ux satisfies Btux (φ) (x) = Bt (φ) (x). Note that for every x0 ∈ Xt , by the RCR assumption, there exist a control u such that fy (u) := At y + Bt u is an element of ri Xt+1 . Thus, by Hoffman’s Lemma (see [4, Theorem 9]) there exist a constant γ > 0 such that (3.3)

inf u s.t. fx0 (u)∈Xt+1

kux − uk ≤ γ dist (fx0 (ux ) , Xt+1 ) .

We want to bound from above inf u s.t. fx0 (u)∈Xt+1 kux −uk by a constant times kx−x0 k. As (−At x) ∈ Bt ux − Xt+1 , by triangle inequality we have that dist (Bt ux − (−At x0 ) , Xt+1 ) ≤ dist (At x0 + Bt ux , At x + Bt ux ) = dist (At x0 , At x) (3.4)

≤ kAt x0 − At xk 1/2 dist (fx0 (ux ) , Xt+1 ) ≤ λmax ATt At kx − x0 k.

Setting κ1 := γ λmax AT A (3.5)

1/2

, by (3.3) and (3.4) we have shown that

inf u s.t. fx0 (u)∈Xt+1

kux − uk ≤ κ1 kx − x0 k. u

Similarly, denoting ux0 a control such that Bt x0 (φ) (x0 ) = Bt (φ) (x0 ), there exists a constant κ2 > 0 such that we have (3.6)

inf u s.t. fx (u)∈Xt+1

kux0 − uk ≤ κ2 kx − x0 k.

Now for every u such that fx0 (u) ∈ Xt+1 , as ct is Ct -Lipschitz continuous, φ is Lt+1 -Lipschitz continuous and ft is linear, we have that Bt (φ) (x0 ) ≤ Bt (φ) (x) + Btu (φ) (x0 ) − Bt (φ) (x) = Bt (φ) (x) + ct (x0 , u) + φ (ft (x0 , u)) − ct (x, ux ) − φ (ft (x, ux )) ≤ Bt (φ) (x) + Ct (kx − x0 k + kux − uk) + Lt+1 λmax ATt At kx − x0 k + λmax BtT Bt ku − ux k .

17 So, taking the infimum over the u such that fx0 (u) ∈ Xt+1 , setting κ := max (κ1 , κ2 ) and Lt := Ct (1 + κ) + Lt+1 λmax ATt At + λmax BtT Bt κ one has by (3.6) that (3.7)

Bt (φ) (x0 ) − Bt (φ) (x) ≤ Lt kx − x0 k.

Similarly, one can show that (3.8)

Bt (φ) (x) − Bt (φ) (x0 ) ≤ Lt kx − x0 k.

The result follows from both (3.7) and (3.8). As lower semicontinuous proper convex functions can be approximated by a supremum of affine function, for every t ∈ [[0, T ]] we define FSDDP to be the set of affine t functions φ : x ∈ X 7→ ha, xi + b ∈ R, a ∈ X and b ∈ R with kak2 ≤ Lt . Moreover we’ll denote by FSDDP the set of convex functionals φ : X 7→ R which are Lt -Lipschitz t continuous on Xt and proper. Proposition 3.3. Under Assumptions 3.1, the problem (3.1) and the Bellman operators defined in (3.2) satisfy the structural assumptions Assumptions 1.2. Proof. — Common regularity: By construction, for all t ∈ [[0, T ]], every element of is Lt -Lipschitz continuous. FSDDP t — Final condition. As ψ is convex proper and LT -Lipschitz continuous on XT , it is a countable (as Rn is separable) supremum of LT -Lipschitz affine functions. — Stability by the Bellman operators. This has been shown in Proposition 3.2. — Stability by pointwise optimum. Recall that we are here on the case opt = sup. Fix t ∈ [[0, T ]] and let F ⊂ FSDDP be a set of affine Lt -Lipschitz t continuous functionals. For every x, x0 ∈ dom (Vt ), we have that |VF (x) − VF (x0 ) | = | sup φ(x) − sup φ(x0 )| ≤ sup |φ(x) − φ(x0 )| ≤ Lt kx − x0 k. φ∈F

φ∈F

φ∈F

Thus the function VF is Lt -Lipschitz continuous. As a supremum of affine functions is convex, VF is convex and finite valued. We have shown that VF . is an element of FSDDP t — Order preserving operators. Let φ1 and φ2 be two functionals over X such that φ1 ≤ φ2 i.e. for every x ∈ X we have φ1 (x) ≤ φ2 (x). We want to show that Bt (φ1 ) ≤ Bt (φ2 ). Let x ∈ X, we have : Bt (φ1 ) (x) = inf xT Ct x + uT Dt u + φ1 (ft (x, u)) u∈U

≤ inf xT Ct x + uT Dt u + φ2 (ft (x, u)) u∈U

= Bt (φ2 ) (x). — Existence of the value functions. By backward recursion on the time step t ∈ [[0, T ]] and by Proposition 3.2, for every time step t ∈ [[0, T ]] the function Vt defined by the Dynamic Programming equation (3.2) is convex and Lt -Lipschitz continuous on Xt . — Existence of optimal sets. Fix an arbitrary element φ ∈ FSDDP . We will t show that for every compact set Kt ⊂ dom (Vt ), there exist a compact set Kt+1 ⊂ dom (Vt+1 ) such that (3.9) Bt φ + δKt+1 + δKt = Bt (φ) + δKt ,

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

18

which will imply the desired result. Now equation (3.9) is equivalent to the fact that for every state xt ∈ Kt , there exist a control ut ∈ Ut such that ft (xt , ut ) ∈ Kt+1 ut ∈ arg min ct (xt , u) + φ (ft (xt , u)) . {z } u∈Ut | u =Bt (φ)(x)=:B(u)

Now as the function B : u ∈ Ut 7→ B(u) ∈ R is lower semicontinuous proper and convex (as the sum of two convex functions) we have that B attains its minimum on the compact Ut and that its set of minimizers, that we will note Ut∗ , is bounded. As it is also a closed set of Rn , Ut∗ is a compact set of Rn . Thus setting Kt+1 := ft (Xt , Ut∗ ) we have that equation (3.9) is satisfied and that Kt+1 is compact as ft is continuous and by compactness of Xt and Ut∗ . Lastly, by the RCR Assumption, we have that Kt+1 ⊂ dom (Vt+1 ). — Additively M -subhomogeneous operators. We will show that Bt is additively homogeneous, that is M = 1. Let λ : X → R be a positive constant function, we will identify λ with its value. Let φ be a functional over X. For any x ∈ X we have : Bt (λ + φ) (x) = inf ct (x, u) + (λ + φ) (ft (x, u)) u∈U

= inf ct (x, u) + λ + φ (ft (x, u)) u∈U

= λ + inf ct (x, u) + φ (ft (x, u)) u∈U

= λ + Bt (φ) (x). Note that the constraint ft (x0 , u) restricts the current approximation at time t+1 to the set Xt+1 included in the domain of the true value function Vt+1 . Now we define a compatible selection function. Let t ∈ [[0, T − 1]], for any F ⊂ FSDDP and x ∈ X, define the following optimization problem t " (3.10)

# 0

0

min ct (x , u) + sup φ (ft (x , u))

x0 ∈X u∈U

s.t. x0 = x, ft (x0 , u) ∈ Xt+1 .

φ∈F

If we denote by b its optimal value and by a, a Lagrange multiplier of the constraint x0 − x = 0 at the optimum, then we define φSDDP (F, x) := ha, x0 − xi + b. t Finally, at time t = T , for any F ∈ FSDDP and x ∈ X we define φSDDP (F, x) := ha, x0 − xi + VT (x) , T where a ∈ ∂VT (x). Proposition 3.4. For every time t ∈ [[0, T ]], the function φSDDP is a compatible t selection function in the sense of Definition 1.3.

19 Proof. Fix t ∈ [[0, T − 1]], F ⊂ FSDDP and x ∈ X. We have Bt (VF ) (x) = b t by definition of Bt , thus the function φSDDP (F, x) is tight and it is valid as a is a t subgradient of the convex function Bt (VF ) at x. For t = T , the selection function φSDDP is tight and valid by convexity of VT . T If we want to apply the convergence result from Theorem 2.6, as we approximate from below the value functions (opt = sup) then one has to make the draws according to some sets Ktk such that the sets Kt∗ := lim supk∈N Ktk are Vt∗ optimal. As done in the litterature of the Stochastic Dual Dynamic Programming algorithm (see for example [8], [20] or [14]), one can study the case when the draws are made along the optimal trajectories of the current approximations. More precisely, fix k ∈ N we define a sequence (xk0 , xk1 , . . . , xkT ) by ( k x0 := x0 ∀t ∈ [[0, T − 1]], xkt+1 := ft xkt , ukt , where ukt ∈ arg minu Btu Vtk xkt . We say that such a sequence (xk0 , xk1 , . . . , xkT ) is an optimal trajectory for the k-th approximations starting from x0 . Proposition 3.5 (Optimality of trajectories). quence of singletons: ( k K0 := {x0 }

For every k ∈ N define a se-

k ∀t ∈ [[0, T − 1]], Kt+1 := xkt+1 , where (xk0 , xk1 , . . . , xkT ) is an optimal trajectory for the k-th approximations starting from x0 . Lastly, for every t ∈ [[0, T ]] define Kt∗ := lim supk Ktk . Then the sequence of sets (Kt∗ )t∈[[0,T ]] is optimal with respect to (Vt∗ )t∈[[0,T ]] in the sense of Definition 2.4. Proof. — First we show that for every k ∈ N the sets Ktk t∈[[0,T ]] are Vtk -optimal. Let k ∈ N and t ∈ [[0, T − 1]] be fixed. We want to show that k k Bt Vt+1 + δKt+1 + δKtk = Bt Vt+1 + δKtk , k which is equivalent to prove that for every xkt ∈ Ktk we have k k k (3.11) Bt Vt+1 + δKt+1 xkt = Bt Vt+1 (xt ). k Now using the definition of the Bellman Operators (0.2), the equation (3.11) is satisfied if, and only if, there exist a control ukt ∈ U such that (3.12) k k ut ∈ arg min Btk (u) where Btk (u) := ct xkt , u + Vt+1 ft xkt , u u∈Ut

k ft xkt , ukt ∈ Kt+1 .

Which is true by construction of the sets Ktk t∈[[0,T ]] . — Now we can deduce that the sets (Kt∗ )t∈[[0,T ]] are Vt∗ -optimal. First, as for (3.12), the compability relation ∗ ∗ ∗ Bt Vt+1 + δKt+1 + δKt∗ = Bt Vt+1 + δKt∗ ,

20

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

is satisfied if, and only if, for every x∗t ∈ Kt∗ there exist a control u∗t ∈ Ut such that ∗ u∗t ∈ arg min Bt∗ (u) with Bt∗ (u) = ct (x∗t , u) + Vt+1 (ft (x∗t , u)) u (3.13) f (x∗ , u∗ ) ∈ K ∗ . t t t t+1 k For every k ∈ N denote by ukt the control defining the set Kt+1 . As Ut is k compact, one can extract from the sequence of controls ut k∈N a converging subsequence to a control u∗t that we will also note ukt k∈N . Let x∗t be an element of Kt∗ , by definition of Kt∗ := lim supk Ktk , one can find a sequence of points xkt ∈ Ktk such that one of its subsequence converges to x∗t . Thus, extracting subsequences if needed, one can consider that we have simultaneously k ∗ xt −→ xt k→+∞

ukt

−→ u∗t .

k→+∞

Which implies, by continuity of ft , that the sequence ft xkt , ukt converges ∗ to a point in Kt+1 . 0 Let k, k ∈ N be two indices, as for every u ∈ U the function x 7→ ct xkt , u is k Ct -Lipschitz continuous, Vt+1 ∈ Fkt+1 is Lt+1 Lipschitz and for every u ∈ Ut 1/2 the linear function x 7→ ft (x, u) is λmax ATt At -Lipschitz continuous, so for every u ∈ U we have that k 0 0 T sup |Btk (u) − Btk (u)| ≤ Ct + Lt+1 λ1/2 kxt − xkt k −→ 0. max A A u∈Ut

k→+∞

Thus we have shown that the sequence of functions Btk k∈N converges uniformly on the compact Ut . Moreover as it converges pointwise to Bt∗ , the sequence Btk k∈N converges uniformly to Bt∗ on Ut . Finally, as for every k ∈ N, ukt ∈ arg minu∈Ut Btk (u) we have that u∗t ∈ arg minu∈Ut Bt∗ (u). As the structural assumptions Assumptions 1.2 are satisfied, as the functions , 0 ≤ t ≤ T are compatible selections and the sets (Kt∗ )t∈[[0,T ]] are Vt∗ -optimal φSDDP t (case opt = sup) by Theorem 2.6 we have the following convergence result, which is analogous to the ones in the literature. Theorem 3.6 (Lower (outer) approximations of the value functions). Under Assumptions 3.1, for every t ∈ [[0, T ]], denote by Vtk k∈N the sequence of functionals generated by Tropical Dynamic Programming with the selection function φSDDP and t k the draws made uniformly over the sets K defined in Proposition 3.5. t Then the sequence Vtk k∈N is nondecreasing, bounded from above by Vt , and converges uniformly to Vt∗ on every compact set included in dom (Vt ). Moreover, almost surely over the draws, Vt∗ = Vt on lim supk∈N Ktk . 3.2. A min-plus selection function: upper approximations in the linearquadratic framework with both continuous and discrete controls. In this example, the cost functions will be quadratic without mixing terms, in the sense of the definition below. We will explain at the end of the section why we don’t lose generality (if we increase the dimension of the state space by 1) when studying

21 such specific quadratic forms instead of general ones. In particular this allows us to restrict our study the (compact) unit sphere as explained below. We will denote by Sn the set of n × n symmetric real matrices. Definition 3.7 (Pure quadratic form). We say that a functional q : X → R is a pure quadratic form if there exist a symmetric matrix M ∈ Sn such that for every x ∈ X we have q(x) = xT M x. Similarly, a functional q : X × U → R is a pure quadratic form if there exist two symmetric matrices M1 ∈ Sn and M2 ∈ Sm such that for every x ∈ X we have q(x, u) = xT M1 x + uT M2 u. Let X = Rn be a continuous state space (endowed with its euclidean structure), U = Rm a continuous control space and V a finite set of discrete (or switching) controls. We want to solve the following optimization problem T −1 X

min

(3.14)

(xt )t ∈XT t=0 (ut ,vt )t ∈(U×V)T

( s.t.

cvt t (xt , ut ) + ψ(xT )

x0 ∈ X is given, ∀t ∈ [[0, T − 1]], xt+1 = ftvt (xt , ur ) .

Assumptions 3.8. Let t ∈ [[0, T − 1]] and v ∈ V be arbitrary. — The dynamic ftv : X × U −→ X is linear: ftv (x, u) = Avt x + Btv u, for some given matrices Avt and Btv of compatible dimensions. — The cost function cvt : X × U −→ R is a pure convex quadratic form, cvt (x, u) = xT Ctv x + uT Dtv u, where the matrix Ctv is symmetric semidefinite positive and the matrix Dtv is symmetric definite positive. — The final cost function ψ := inf i∈I ψi is a finite infimum of pure convex quadratric form, of matrix Mi ∈ Sn with i ∈ I a finite set, such that there exists a constant αT ≥ 0 satisfying for every i ∈ I 0 Mi αT I. T

— The maximal eigenvalue of the symmetric semidefinite matrix (Avt ) Avt is less than 1: T λmax (Avt ) Avt < 1. One can write the Dynamic Programming principle for (3.14): ( VT = ψ (3.15) ∀t ∈ [[0, T − 1]], Vt : x ∈ X 7→ inf inf cvt (x, u) + φ (ftv (x, u)) . v∈V u∈U

The following result is crucial in order to study this example: the value functions are 2-homogeneous, allowing us to restrict their study to the unit sphere.

22

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Proposition 3.9. For every time step t ∈ [[0, T ]], the value function Vt , solution of (3.15) is 2-homogeneous, i.e. for every x ∈ X and every λ ∈ R we have Vt (λx) = λ2 Vt (x) . Proof. We proceed by backward recursion on the time step t ∈ [[0, T ]]. For t = T it is true by Assumptions 3.8. Assume that it is true for some t ∈ [[1, T ]]. Fix λ ∈ R, then by definition of Vt−1 , for every x ∈ X we have v Vt−1 (λx) = min min cvt−1 (λx, u) + Vt ft−1 (λx, u) , v∈V u∈U

v which yields the result by 2-homogeneity of x 7→ cvt−1 (x, u), linearity of ft−1 and 2-homogenity of Vt .

Thus, in order to compute Vt , one only needs to know its values on the unit x 2 (euclidean) sphere S as for every non-zero x ∈ X, Vt (x) = kxk Vt kxk . We will study a dynamic programming formulation that exploits this fact. For every time t ∈ [[0, T − 1]] and every switching control v ∈ V we define the Bellman operator with fixed switching control Btv for every functional φ : X → R by: Btv (φ) := inf cvt (·, u) + kftv (·, u)k2 φ u∈U

ftv (·, u) kftv (·, u)k

.

For every time t ∈ [[0, T − 1]] we define the Bellman operator Bt for every functional φ : X → R by: Bt (φ) := inf Btv (φ) . v∈V

Then one can rewrite the Dynamic Programming equation (3.15) as ( (3.16)

VT = ψ ∀t ∈ [[0, T − 1]], Vt = Bt (Vt+1 ) .

The point of writing the Dynamic Programming equation this way is to be able to compute the image by the Bellman operator of a functional by only knowing this functional on the unit (euclidean) sphere S. This will ensure that the unit sphere S is Vt -optimal in the sense of Definition 2.4. Now in order to apply the Tropical Dynamic Programming algorithm to (3.16) we need to check Assumptions 1.2. Under Assumptions 3.8, there exist an interval in the cone of symmetric semidefinite matrices which is stable by every Bellman operator Bt in the sense of the proposition below. We will consider the Loewner order on the cone of (real) symmetric semidefinite matrices, i.e. for every couple of matrices of symmetric matrices (M1 , M2 ) we say that M1 M2 if, and only if, M2 − M1 is semidefinite positive. Moreover we’ll identify a pure quadratic form with its symmetric matrix, thus when we write an infimum over symmetric matrices, we mean the pointwise infimum over their associated pure quadratic forms. Proposition 3.10 (Existence of a stable interval). Fix t ∈ [[0, T − 1]]. Under Assumptions 3.8, there exists a constant αt > 0 such that for every β ≥ αt we have: (3.17)

0 M βI ⇒ 0 Bt (M ) βI.

23 Proof. We want to show that if M 0 then Bt (M ) 0. First, as in Proposition 3.3 one can show that the Bellman operator Bt is order preserving. Therefore, if M 0 then Bt (M ) Bt (0). Hence it is enough to show that Bt (0) 0. But by formula (A.2), we have that Bt (0) = minv∈V Ctv 0 (by Assumptions 3.8) hence the result follows. Now, as Bt is order preserving, if M βI then Bt (M ) Bt (βI). Hence it suffices to find β > 0 such that Bt (βI) βI. By formula (A.2) we have:

(3.18)

T

Btv (βI) βI ⇔ β (Avt )

−1 −1 T I + βBtv (Dtv ) (Btv ) Avt + Ctv βI.

Now, using propositions Proposition B.1, Proposition B.2 and using the notations introduced in those propositions, finding β > 0 satisfying equation (3.18) is equivalent to find β > 0 such that T βλmax (Avt ) Avt + λmax (Ctv ) ≤ β. −1 T v v v λmin I + βBt (Dt ) (Bt ) T −1 ≥ 1, it suffices to find β ≥ 0 such that Noting that as λmin I + βBtv (Dt ) (Btv ) T βλmax (Avt ) Avt + λmax (Ctv ) ≤ β, which, under Assumptions 3.8, is equivalent to (3.19)

β≥

λmax (Ctv ) . T 1 − λmax (Avt ) Avt

Finally, by setting αt := max v∈V

λmax (Ctv ) > 0, T 1 − λmax (Avt ) Avt

every β ≥ αt satisfies (3.19) which concludes the proof. For every time step t ∈ [[0, T ]], we define (3.20)

βt := max αs , s∈[[t,T ]]

the basic functions FQu will be pure quadratic convex forms bounded in the Loewner t sense by 0 and βt I, FQu := φ : x ∈ X 7→ xT M x ∈ R | M ∈ Sn , 0 M βt I , t and we define the following class of functions which will be stable by pointwise infimum of elements in FQu t , n o (3.21) FQu := VF | F ⊂ FQu . t t Proposition 3.11. Under Assumptions 3.8, the Problem (3.14) and the Bellman operators defined in (3.15) satisfy the structural assumptions given in Assumptions 1.2.

24

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Proof. — Common regularity. We will show that every element of FQu is 2βt Lipschitz continuous on S. Let F = {φi }i∈N ⊂ FQu with N ⊂ N and φ ∈ FQu i t t with associated symmetric matrix Mi . Fix x, y ∈ S, we have |VF (x) − VF (y)| = | inf xT Mi x − inf y T Mi y| i∈N

i∈N T

T

≤ max |x Mi x − y Mi y| i∈N

≤ max |xT Mi (x − y) + y T Mi (x − y) | i∈N

(M T = M )

≤ max |hx + y, Mi (x − y)i|

(Cauchy-Schwarz)

≤ kx + yk · max kMi (x − y) k

i∈N

i∈N

≤ kx + yk · max kMi k kx − yk i∈N | {z } ≤βt

≤ βt kx + yk ·kx − yk | {z } ≤2

|VF (x) − VF (y)| ≤ 2βt kx − yk2 . — Final condition. By Assumptions 3.8, the final cost function ψ is an element of FQu t . — Stability by the Bellman operators. Let φ be in FQu t+1 . By construction, the symmetric matrix M associated with the pure quadratic form φ is such that 0 M βt+1 I. By (3.20) we have αt ≤ βt+1 ≤ βt , thus 0 M βt I and by Proposition 3.10 we have that 0 Bt (M ) βt I. Qu We have shown that if φ is in FQu t+1 , then Bt (φ) is in Ft . — Stability by pointwise optimum. This is true by construction of FQu t (3.21). — Order preserving operators. Proceed as in Proposition 3.3. — Existence of the value functions. This is a consequence of (A.1) and the fact that for every functionals φ1 , φ2 : X → R we have that

Bt (inf (φ1 , φ2 )) = inf (Bt (φ1 ) , Bt (φ2 )) . — Existence of optimal sets. For every time step t ∈ [[0, T −1]], every compact Xt ⊂ dom(Vt ) (= X) and every functional φ ∈ FQu the unit sphere S satisfies t Bt (φ + δS ) = Bt (φ) , which implies the desired inequality. — Additively M -subhomogeneous operators. Set M ≥ 0 to be M :=

inf

sup kftv (x, u) k2 .

(u,v)∈U×V x∈S

Then M is finite as for every (u, v) ∈ U×V the function ftv (·, u) is continuous over the compact S. Let λ : X → R be a positive constant function, we will

25 Qu identify λ with its value. Let φ be an element of Ft+1 . For any x ∈ S we have : v ft (x, u) Bt (λ + φ) (x) = inf cvt (x, u) + kftv (x, u) k2 (λ + φ) kftv (x, u) k (u,v)∈U×V

≤ λM + Bt (φ) (x). Thus Bt is additively M -subhomogeneous. We now define the selection functions φQu t . At time t ∈ [[0, T − 1]] we define for Qu any F ⊂ Ft and x ∈ X ! φQu t (F, x) ∈ Bt

arg min Bt (φ) (x) . φ∈F

At time t = T we define for any F ⊂ FQu T and x ∈ X we set φQu T (F, x) = arg min ψi (x). ψi

Proposition 3.12. For every time t ∈ [[0, T ]], the function φQu is a compatible t selection function in the sense of Definition 1.3. Proof. Fix t = T . The function φQu is tight and valid as VT = ψ. Now fix T t ∈ [[0, T − 1]]. Let F ⊂ FQu and x ∈ X be arbitrary. We have t Bt (VF ) (x) = Bt inf φ (x) φ∈F v ft (x, u) = inf cvt (x, u) + kftv (x, u) k2 inf φ φ∈F kf v (x, u) k (u,v)∈U×V tv ft (x, u) = inf inf cvt (x, u) + kftv (x, u) k2 φ φ∈F (u,v)∈U×V kftv (x, u) k = inf Bt (φ) (x) φ∈F

= φQu t (F, x) (x). Thus φQu is tight. By similar arguments we have for every x0 ∈ X that t 0 Bt (VF ) (x0 ) = inf Bt (φ) (x0 ) ≥ φQu t (F, x) (x ). φ∈F

As the structural assumptions, Assumptions 1.2 are satisfied, as the functions φQu t , 0 ≤ t ≤ T are compatible selections and the unit sphere S is Vt -optimal (case opt = inf) we have by Theorem 2.6 the following convergence result Theorem 3.13 (Upper (inner) approximations of the value functions). Under Assumptions 3.8, for every t ∈ [[0, T ]], denote by Vtk k∈N the sequence of functionals generated by Tropical Dynamic Programming with the selection function φQu and the t draws made uniformly overthe sphere Kt := S. Then the sequence Vtk k∈N is nonincreasing, bounded from below by Vt and converges uniformly to Vt∗ on S. Moreover, almost surely over the draws, Vt∗ = Vt on S.

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

26

Finally, we briefly explain why, by adding another dimension to the state variable, there’s no loss of generality induced by the choice of studying pure quadratic forms in (3.14) instead of quadratic forms, nor is there one for studying linear dynamics instead of affine dynamics. First, we define the operator θ that maps any functional φ : X → R to a 2-homogeneous functional φθ

θ: R

(3.22)

X

φ

X×R

−→

R

7−→

φθ : (x, y) 7→ y 2 φ

x y

if y 6= 0, 0 otherwise.

When φ is affine, abusively denote by θ again the operator that maps any affine functional φ to the linear functional φθ ,

(3.23)

θ: R

X

φ

X×R

−→

R

7−→

φθ : (x, y) 7→ yφ

x y

if y 6= 0, 0 otherwise.

Now consider (Bt )t∈[[0,T −1]] the Bellman operators associated to an analogous optimization problem as (3.14) but where we allow the costs and final cost functions to be general quadratic forms and the dynamic to be affine. Furthermore, one can turn this non-homogeneous optimization problem into a homogeneous problem by applying θ to the costs functions, final cost function and dynamics of the previous optimization problem. We note by Btθ t∈[[0,T −1]] the Bellman operators of the homogenized optimization problem. Then one can deduce Bt from Btθ θ, meaning that if one can solve the (homogenized) optimization problem (3.14) where the costs are pure quadratics forms and the dynamics linear, then one can solve the (non-homogeneous) problem where the costs are quadratic forms and the dynamics are affine. Proposition 3.14. Define for every t ∈ [[0, T − 1]], two operators Bt :

Btθ

R φ

X

−→ 7−→

R x 7→ minu∈U ct (x, u) + φ (ft (x, u)) ,

X×R

−→ 7−→

R (x, y) 7→ minu∈U cθt (x, y, u) + φ ftθ (x, y, u) .

R

φ

X

X×R

For every functional φ : X → R, for every x ∈ X and y ∈ R∗ we have that Btθ (θφ) (x, y) = θBt (φ) (x, y)

and

Bt (φ) (x) = Btθ (θφ) (x, 1) .

Proof. First, remark that if the first equality holds, then as for every x ∈ X one has θBt (φ) (x, 1) = Bt (φ) (x), setting y = 1 one gets the second equality. Now fix a

27 6.5 6 5.5 5 4.5 4 3.5 3 2.5 2 1.5 1 -4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

2.5

3

0.5 -2.8

-2.6

-2.4

-2.2

-2

-1.8

-1.6

-1.4

-1.2

-1

-0.8

-0.6

-0.4

-0.2

0

Fig. 5. Illustration of the multistage (two stages here) optimization problem studied in Example 3.15. On the right we have the final cost function (infimum of ψ1 and ψ2 ) and on the left we have the image of ψ1 and ψ2 by the Bellman operator B. The image of ψ2 is strictly greater than the image of ψ1 . At the final step t = 1, the best function at the point −1 is ψ2 . The image by the k-th optimal dynamic of x0 = −2 is x1 = −1.

functional φ : X → R, x ∈ X and y ∈ R∗ , we have that Btθ (θφ) (x, y) = min cθt (x, y, u) + φθ ftθ (x, y, u) u∈U x u x u = min y 2 ct , + φθ yct , u∈U y y y y x u u x = y 2 min ct , , + φ ct u∈U y y y y x 0 x 0 , u + φ c , u = y 2 min c t t u0 ∈U y y x = y 2 Bt (φ) y = θBt (φ) (x). Example 3.15. We show an example where approximating from above fails to converge when the points are drawn accordingly to optimal trajectories for the current approximations (as done in subsection 3.1 where we approximate from below). As shown by Proposition 3.14 there’s no loss of generality in considering the framework of this section but with non-homogeneous functions. We consider a (non-homogeneous) problem with only two time steps, that is T = 1 and t ∈ {0, 1} such that — The state space X and the control space U are equal to R. — The linear dynamic is f (x, u) = x + u. — The quadratic cost is c(x, u) = x2 + u2 . — The final cost function is the infimum between two quadratics, ψ1 (x) = (x + 2)2 + 1 and ψ2 (x) = x2 i.e. ψ = inf (ψ1 , ψ2 ) . Here the Bellman operator B of this multistage optimization problem is defined

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

28

for every φ : X → R and every x ∈ X by B (φ) (x) = min x2 + u2 + φ (x + u) = x2 + min u2 + φ (x + u) . u∈U

u∈U

2

For the case where φa,b (·) = (· + a) + b with a, b ∈ R one has for every x ∈ R B (φa,b ) (x) =

(3.24)

3 2 x + ax + b. 2

Fix x0 = xk0 = −2 for every k ∈ N. As described in Algorithm 1.1, the approximations of the value functions V1 and V0 are initialized to +∞. Thus every control u ∈ U is optimal in the sense that u ∈ arg minu0 ∈U x2 + (u0 )2 + φ (x + u0 ). Hence if we set x01 := −1 = f (x0 , 1) then (x0 , x01 ) is an optimal trajectory as described in Proposition 3.5. We deduce from (3.24) the following facts, illustrated in Figure 5. 1. The image of ψ2 is strictly greater than the image of ψ1 by the Bellman operator B, i.e. B (ψ2 ) (−2) > B (ψ1 ) (−2). 2. The image by the k-th optimal dynamic of −2 is −1, i.e. setting uk0 := arg minu0 ∈U (−2)2 + (u0 )2 + V1k (−2 + u0 ) (the arg min is here a singleton) one has f −2, uk0 = −1. 3. At the final step t = 1, the best function at the point −1 is ψ2 , i.e. ψ(−1) = inf (ψ1 (−1), ψ2 (−2)) = ψ2 (−2). From those three facts, one can deduce starting x0 = −2 and x1 = −1, the optimal trajectory for the current approximations will always be sent to x1 = −1. But, as shown in the proof of Proposition 3.12 one can show that the image by B of an infimum is the infimum of the images by B: V0 (−2) = B (inf (ψ1 , ψ2 )) (−2) = inf (B (ψ1 ) (−2), B (ψ1 ) (−2)) . Thus for every k ∈ N, V0 (−2) = B (ψ1 ) (−2) < B (ψ1 ) (−2) = V0k (−2). Conclusion. In this article we have devised an algorithm, Tropical Dynamic Programming, that encompases both a discrete time version of Qu’s work and the SDDP algorithm in the deterministic case. We have shown in the last section that Tropical Dynamic Programming can be applied to two natural frameworks: one for Qu’s adaptation and one for SDDP. In the case where both framework intersects, one could apply Tropical Dynamic Programming with the selection functions φQu t and get non-increasing upper approximations of the value function. Simultaneously, by applying Tropical Dynamic Programming with the selection function φSDDP , one t would get non-decreasing lower approximations of the value function. Moreover, we have shown that the upper approximations are, almost surely, asymptotically equal to the value function on the whole space of states X and that the lower approximations are, almost surely, asymptotically equal to the value function on a set of interest. Thus, in those particular cases we get converging bounds for V0 (x0 ), which is the value of the multistage optimization problem (0.1), along with asymptotically exact minimizing policies. In those cases, we have shown a possible way to address the issue

29 of computing efficient upper bounds when running the SDDP algorithm by simultaneously running another algorithm (namely TDP with Qu’s selection functions). This claim has yet to be tested numerically: the results presented here act as a safeguard and are not proofs of efficiency. In future works, we would like to extend the current framework to risk-averse multistage stochastic optimization problems, explicitly give a way to generate deterministic converging upper and lower bounds and to provide numerical experiments. REFERENCES [1] R. Bellman, The theory of dynamic programming, Bulletin of the American Mathematical Society, 60 (1954), pp. 503–515. [2] R. Bellman, Dynamic Programming, Princeton Univ. Pr, Princeton, NJ, 1984. OCLC: 830865530. [3] D. P. Bertsekas, Dynamic Programming and Optimal Control. Vol. 1: ..., no. 3 in Athena scientific optimization and computation series, Athena Scientific, Belmont, Mass, fourth ed ed., 2016. [4] J. V. Burke and P. Tseng, A Unified Analysis of Hoffman’s Bound via Fenchel Duality, SIAM Journal on Optimization, 6 (1996), pp. 265–282, https://doi.org/10.1137/0806015. [5] S. Dreyfus, Richard Bellman on the Birth of Dynamic Programming, Operations Research, 50 (2002), pp. 48–51, https://doi.org/10.1287/opre.50.1.48.17791. [6] R. M. Dudley, Real Analysis and Probability, no. 74 in Cambridge studies in advanced mathematics, Cambridge Univ. Press, Cambridge, 2. ed ed., 2003. OCLC: 249706200. [7] W. Fleming, Functions of Several Variables, Undergraduate Texts in Mathematics, Springer New York, New York, NY, 1977, https://doi.org/10.1007/978-1-4684-9461-7. [8] P. Girardeau, V. Leclere, and A. B. Philpott, On the Convergence of Decomposition Methods for Multistage Stochastic Convex Programs, Mathematics of Operations Research, 40 (2015), pp. 130–145, https://doi.org/10.1287/moor.2014.0664. [9] V. Guigues, SDDP for some interstage dependent risk-averse problems and application to hydro-thermal planning, Computational Optimization and Applications, 57 (2014), pp. 167–203, https://doi.org/10.1007/s10589-013-9584-1. ¨ misch, Sampling-Based Decomposition Methods for Multistage Stochas[10] V. Guigues and W. Ro tic Programs Based on Extended Polyhedral Risk Measures, SIAM Journal on Optimization, 22 (2012), pp. 286–312, https://doi.org/10.1137/100811696. [11] P. Lancaster and L. Rodman, Algebraic Riccati Equations, Oxford science publications, Clarendon Press ; Oxford University Press, Oxford : New York, 1995. [12] W. McEneaney, Max-Plus Methods for Nonlinear Control and Estimation, Systems & Control: Foundations & Applications, Birkh¨ auser-Verlag, Boston, 2006, https://doi.org/10.1007/ 0-8176-4453-9. [13] W. M. McEneaney, H. Kaise, and S. H. Han, Idempotent Method for Continuous-Time Stochastic Control and Complexity Attenuation, IFAC Proceedings Volumes, 44 (2011), pp. 3216–3221, https://doi.org/10.3182/20110828-6-IT-1002.03343. [14] M. V. F. Pereira and L. M. V. G. Pinto, Multi-stage stochastic optimization applied to energy planning, Mathematical Programming, 52 (1991), pp. 359–375, https://doi.org/10. 1007/BF01582895. [15] Z. Qu, Nonlinear Perron-Frobenius Theory and Max-plus Numerical Methods for HamiltonJacobi Equations, PhD thesis, Ecole Polytechnique X, Oct. 2013. [16] Z. Qu, A max-plus based randomized algorithm for solving a class of HJB PDEs, in 53rd IEEE Conference on Decision and Control, Dec. 2014, pp. 1575–1580, https://doi.org/10.1109/ CDC.2014.7039624. [17] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, no. 317 in Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen, Springer, Dordrecht, corr. 3. print ed., 2009. OCLC: 845635447. [18] L. Schwartz, Analyse. 1: Th´ eorie Des Ensembles et Topologie, no. 42 in Collection Enseignement des sciences, Hermann, Paris, nouv. tirage ed., 1995. OCLC: 247315724. [19] A. Shapiro, Analysis of stochastic dual dynamic programming method, European Journal of Operational Research, 209 (2011), pp. 63–72, https://doi.org/10.1016/j.ejor.2010.08.007. [20] J. Zou, S. Ahmed, and X. A. Sun, Stochastic dual dynamic integer programming, Mathematical Programming, (2018), https://doi.org/10.1007/s10107-018-1249-5.

30

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Appendix A. Algebraic Riccati Equation. This section gives complementary results for subsection 3.2. We use the same framework and notations introduced in subsection 3.2. Proposition A.1. Fix a discrete control v ∈ V and a time step t ∈ [[0, T − 1]]. — The operator Btv restricted to the pure quadratic forms (identified with Sn the space of the symmetric semidefinite positive matrices) is given by the discrete time algebraic Riccati equation (A.1) Btv : Sn −→ S+ n T M 7−→ Ctv + (Avt ) M Avt −1 T T T − (Avt ) M Btv Dtv + Btv M (Btv ) (Btv ) M Avt . — We can rewrite the discrete time algebraic Riccati equation Appendix B (A.1). For every M ∈ S+ n we have: −1 T −1 T (A.2) Btv (M ) = (Avt ) M I + Btv (Dtv ) (Btv ) M Avt + Ctv . Proof. — First note that if M is symmetric, then Btv (M ) is also symmetric. Let t ∈ {T − 1, T − 2, . . . , 0} and M ∈ S+ n . Let x ∈ X, we have: T v ftv (x, u) ft (x, u) v v 2 M Bt (M )(x) = inf ct (x, u) + kft (x, u)k2 u∈U kftv (x, u)k2 kftv (x, u)k2 T

= inf xT Ctv x + uT Dtv u + ftv (x, u) M ftv (x, u) u∈U

(A.3) Btv (M )(x) = xT Ctv x + inf uT Dtv u + ftv (x, u)T M ftv (x, u). u∈U

As u 7→ ft (x, u) is linear, Dtv 0 and M 0 we have that g : u ∈ U 7→ uT Dtv u + ftv (x, u)T M ftv (x, u) ∈ R is convex, hence is minimal when ∇g(u) = 0 i.e. for u ∈ U such that : T T (A.4) Dtv + (Btv ) M Btv u + (Btv ) M (Avt ) x = 0. T

Now we will show that Dtv + (Btv ) M Btv is invertible. As M is symmetric semidefinite positive, for every u ∈ U we have : T T uT Dtv + (Btv ) M Btv u = uT Dtv u + (Btv u) M (Btv u) {z } | ≥0

T

Dtv u

≥u | {z } >0

> 0. T

We have shown that the symmetric matrix Dtv + (Btv ) M Btv is definite positive and thus invertible. We conclude that equation (A.4) is equivalent to: −1 T T (A.5) u = − Dtv + (Btv ) M Btv (Btv ) M (Avt ) x.

31 Finally replacing (A.5) in equation (A.3) we get after simplifications that T Btv (M )(x) = xT Ctv + (Avt ) M Avt −1 T T T − (Avt ) M Btv Dtv + (Btv ) M Btv (Btv ) M Avt x, which gives (A.1). — See [11, Proposition 12.1.1 page 271]. Appendix B. Smallest and greatest eigenvalues. Here we recall some formulas on the lowest and greatest eigenvalues of a matrix. Proposition B.1. Let A and B two symmetric real matrices. Denote the smallest eigenvalue of a symmetric real matrix M by λmin (M ) (every eigenvalue of M is real) and by λmax (M ) its greatest eigenvalue. We have the following inequalities : 1. λmin (A + B) ≥ λmin (A) + λmin (B), 2. λmax (A + B) ≤ λmax (A) + λmax (B). Proposition B.2. Let A be a real matrix and B be a symmetric real matrix. Denote by λmin (M ) the smallest eigenvalue of a symmetric real matrix M (every eigenvalue of M is real) and by λmax (M ) its greatest eigenvalue. We have the following inequalities : 1. λmin (AT BA) ≥ λmin (AT A)λmin (B), 2. λmax (AT BA) ≤ λmax (AT A)λmax (B). Moreover if A and B are symmetric definite positive matrices, then we have: 1. λmin (AB) ≥ λmin (A)λmin (B), 2. λmax (AB) ≤ λmax (A)λmax (B). Acknowledgments.

arXiv:1810.12870v1 [math.OC] 30 Oct 2018

MARIANNE AKIAN∗ , JEAN-PHILIPPE CHANCELIER† , AND BENOˆIT TRAN† Abstract. Several attempt to dampen the curse of dimensionnality problem of the Dynamic Programming approach for solving multistage optimization problems have been investigated. One popular way to address this issue is the Stochastic Dual Dynamic Programming method (SDDP) introduced by Perreira and Pinto in 1991 for Markov Decision Processes. Assuming that the value function is convex (for a minimization problem), one builds a non-decreasing sequence of lower (or outer) convex approximations of the value function. Those convex approximations are constructed as a supremum of affine cuts. On continuous time deterministic optimal control problems, assuming that the value function is semiconvex, Zheng Qu, inspired by the work of McEneaney, introduced in 2013 a stochastic max-plus scheme that builds upper (or inner) non-increasing approximations of the value function. In this note, we build a common framework for both the SDDP and a discrete time version of Zheng Qu’s algorithm to solve deterministic multistage optimization problems. Our algorithm generates monotone approximations of the value functions as a pointwise supremum, or infimum, of basic (affine or quadratic for example) functions which are randomly selected. We give sufficient conditions on the way basic functions are selected in order to ensure almost sure convergence of the approximations to the value function on a set of interest. Key words. Deterministic multistage optimization problem, min-plus algebra, tropical algebra, Stochastic Dual Dynamic Programming, Dynamic Programming.

Introduction. Throughout this paper we aim to solve an optimization problem involving a dynamic system in discrete time. Informally, given a time t and a state xt , one can apply a control ut and the next state is given by the dynamic ft , that is xt+1 = ft (xt , ut ). Then one wants to minimize the sum of costs ct (xt , ut ) induced by our controls starting from a given state x0 and a given time horizon T . Furthermore, one can add some final restrictions on the states at time T which will be modeled by a function ψ only depending on the final state xT . As in [1] we will call such optimization problems, multistage (optimization) problems:

T −1 X

min

x=(x0 ,...,xT ) u=(u0 ,...uT −1 ) t=0

(0.1)

( s.t.

ct (xt , ut ) + ψ(xT )

x0 ∈ X is given, ∀t ∈ [[0, T − 1]], xt+1 = ft (xt , ut ).

One can solve the multistage problem (0.1) by Dynamic Programming as introduced by Richard Bellman around 1950 [1, 5]. This method breaks the multistage problem (0.1) into T sub-problems that one can solve by backward recursion over the time t ∈ [[0, T ]]. More precisely, denoting by X the set of states and given some X X operators Bt : R → R from the set of functionals that may take infinite values to itself, one can show (see for example [3]) that solving problem (0.1) is equivalent to solving the following system of sub-problems: ∗ INRIA

´ Saclay Ile-de-France and CMAP, Ecole Polytechnique, CNRS, France. ´ Ecole des Ponts ParisTech, France. Email : [email protected]

† CERMICS,

1

2

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

( (0.2)

VT = ψ ∀t ∈ [[0, T − 1]], Vt : x ∈ X 7→ Bt (Vt+1 )(x) ∈ R.

We will call each operator Bt the Bellman operator at time t and each equation in (0.2) will be called the Bellman equation at time t. Lastly, the functions Vt defined in (0.2) will be called the (Bellman) value function at time t. Note that by solving the system (0.2) we mean that we want to compute the value function V0 at point x0 , that is V0 (x0 ). We will state several assumptions on these operators in section 1 under which we will devise an algorithm to solve the system of Bellman equations 0.2 (also called the Dynamic Programming formulation of the multistage problem). Let us stress on the fact that although we want to solve the multistage problem 0.1, we will mostly focus on its (equivalent) Dynamic Programming formulation given by (0.2). One issue of the Dynamic Programming approach to solve multistage problems is the so-called curse of dimensionality [2], that is, grid-based methods to compute the value functions have a complexity exponential in the dimension of the state space. One popular algorithm (see [8, 9, 10, 14, 19, 20]) that aims to dampen the curse of dimensionality is the Stochastic Dual Dynamic Programming algorithm (or SDDP for short) introduced by Pereira and Pinto in 1991. Assuming that the cost functions ct are convex and the dynamics ft are linear, the value functions defined in the Dynamic Programming formulation (0.2) are convex [8]. The SDDP algorithm aims to build lower (or outer) approximations of the value functions as a supremum of affine functions and thus, doesn’t rely on a discretisation of the state space in order to compute (approximations of) the value functions. In the aforementioned references, this approach is used to solve stochastic multistage problems, however in this article we will restrict our study to deterministic multistage problems, that is, the above formulation (0.1). Still, the SDDP algorithm can be applied to our framework. One of the main drawback of the SDDP algorithm is the lack of an efficient stopping criterion: it builds lower approximations of the value functions but upper (or inner) approximations are built through a Monte-Carlo scheme that is costly. During her thesis [15], Zheng Qu devised an algorithm [16] which builds upper approximations of the value functions in an infinite horizon and continuous time framework where the set of controls is both discrete and continuous. This work was inspired by the work of McEneaney [12, 13] using techniques coming from tropical algebra or also called min-plus techniques. Assume that for each fixed discrete control the cost functions are convex quadratic and the dynamics are linear. If the set of discrete controls is finite, then exploiting the min-plus linearity of the Bellman operators, one can show that the value functions can be computed as a finite pointwise infimum of convex quadratic functions: Vt = inf φt , φt ∈Ft

where Ft is a finite set of convex quadratic forms. Moreover, in this framework, the elements of Ft can be explicitly computed through the Discrete Algebraic Riccati Equation (DARE, [11]). Thus an approximation scheme that computes non-decreasing subsets Ftk k∈N of Ft yields an algorithm that converges after a finite number of

3 improvements Vtk := inf φt ≈ inf φt = Vt . φt ∈Ftk

φt ∈Ft

However the size of the set of functions Ft that need to be computed is growing exponentially with t. Informally, in order to address this issue, Qu introduced a probabilistic scheme that adds to Ftk the best (given the current approximations) element of Ft at some point drawn on the unit sphere (assuming the space of states to be Euclidean). Our work aims to build a general algorithm that encompasses both a deterministic version of the SDDP algorithm and an adaptation of Qu’s work to a discrete time and finite horizon framework. This paper is divided in 3 sections. In the first section we make several assumptions on the Bellman operators Bt and define an algorithm which builds approximations of the value functions as a pointwise optimum (i.e. either a pointwise infimum or a pointwise supremum) of basic functions in order to solve the associated Dynamic Programming formulation (0.2) of the multistage problem (0.1). At each iteration, the so-called basic function that is added to the current approximation will have to satisfy two key properties at a point randomly drawn, namely, tightness and validity. A key feature of our algorithm is that it can yield either upper or lower approximations, for example: — If the basic functions are affine, then approximating the value functions by a pointwise supremum of affine functions will yield the SDDP algorithm. — If the basic functions are quadratic convex, then approximating the value functions by a pointwise infimum of convex quadratic functions will yield an adaptation of Qu’s algorithm. In the following section we study the convergence of the approximations of the value functions generated by our algorithm at a given time t ∈ [[0, T ]]. Under the previous assumptions our approximating sequence converges almost surely (over the draws) to the value function on a set of interest (that will be specified). Finally on the last section we will specify our algorithm to the two special cases mentioned in the first section. The convergence result of section 2 specified to these two cases will be new for (an adaptation of) Qu’s algorithm and will be the same as in the literature for the SDDP algorithm. It will be a step toward addressing the issue of computing efficient upper approximations for the SDDP algorithm and opens another way to devise algorithms for a broader class of multistage problems. 1. Notations and definitions. Notations 1.1. — Denote by X := Rn the set of states endowed with its euclidean structure and its Borel σ-algebra. — Denote by T a finite integer that we’ll call the horizon. — Denote by opt an operation that is either the pointwise infimum or the pointwise supremum of functions which we will call the pointwise optimum. Note that once a choice of which operation is associated with opt, it remains the same for the remainder of this article. — Denote by R the extended reals endowed with the operations +∞ − ∞ = −∞ + ∞ = +∞. X — For every t ∈ [[0, T ]], fix Ft and Ft two subsets of R the set of functionals on X such that Ft ⊂ Ft .

4

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Fig. 1. The lower approximations Vt k will be built as a supremum of basic functions (here k

affine functions) that will always be below Vt . Likewise, the upper approximations Vt will be built as an infimum of some other basic functions (here quadratic functions) that will always be above Vt .

— We will say that a functional φ is a basic function if it’s an element of Ft for some t ∈ [[0, T ]]. — For every set X ⊂ X, denote by δX the function equal to 0 on X and +∞ elsewhere. — For every t ∈ [[0, T ]] and every set of basic function Ft ⊂ Ft we denote by VFt its pointwise optimum, VFt := optφ∈Ft φ, i.e. (1.1)

VF : X x

−→ 7−→

R opt {φ(x) | φ ∈ Ft } . X

X

— Denote by (Bt )t∈[[0,T −1]] a sequence of T operators from R to R , that we will call the Bellman operators. — Fix a functional ψ : X → R. We define a sequence of functions (Vt )t∈[[0,T ]] , called the value functions, by the system of Bellman equations: ( VT = ψ (1.2) ∀t ∈ [[0, T − 1]], Vt : x ∈ X 7→ Bt (Vt+1 )(x) ∈ R. We first make several assumptions on the structure of problem (1.2). Those assumptions will be satisfied in the examples of section 3. Informally, we want some regularities on the Bellman operators so as to propagate, backward in time, good behaviours of the value function at the final time t = T to the value function at the initial time t = 0. Moreover, at each time t, we ask that the basic functions that build our approximations are such that their pointwise optimum share a common regularity. Assumptions 1.2 (Structural assumptions). — Common regularity: for every t ∈ [[0, T ]], there exist a common (local) modulus of continuity for every φ ∈ Ft , i.e. for every x ∈ X, there exist ωx : R+ ∪{+∞} → R+ ∪{+∞} which is increasing, equal to 0 and continuous at 0 such that for every φ ∈ Ft and for every x0 ∈ X we have that |f (x) − f (x0 )| ≤ ωx (|x − x0 |) .

5 — Final condition: the value function at time T is a pointwise optimum for some given subset FT of FT as in (1.1), that is VT := VFT . — Stability by the Bellman operators: for every t ∈ [[0, T − 1]], if φ ∈ Ft+1 then Bt (φ) belongs to Ft . — Stability by pointwise optimum: for every t ∈ [[0, T ]], if Ft ⊂ Ft then VFT ∈ Ft . — Order preserving operators: the operators Bt , 0 ≤ t ≤ T − 1 are order preserving, i.e. if φ, ϕ ∈ Ft+1 are such that φ ≤ ϕ, then Bt (φ) ≤ Bt (ϕ). — Existence of the value functions: there exists a solution (Vt )t∈[[0,T ]] to the Bellman equations (0.2) that never takes the value −∞ and which is not equal to +∞. — Existence of optimal sets: for every t ∈ [[0, T − 1]] and every compact set Xt ⊂ dom (Vt ), there exist a compact set Xt+1 ⊂ dom (Vt+1 ) such that for every functional φ ∈ Ft+1 we have Bt φ + δXt+1 ≤ Bt (φ) + δXt . — Additively M -subhomogeneous operators: the operators Bt , 0 ≤ t ≤ T − 1, are additively M -subhomogeneous, i.e. there exist a constant M ≥ 0 such that for every positive constant functional λ and every functional φ ∈ Ft+1 we have that Bt (φ + λ) + δdom(Vt ) ≤ Bt (φ) + M λ + δdom(Vt ) . From a set of basic functions Ft ⊂ Ft , one can build its pointwise optimum VFt = optφ∈Ft φ. We build monotone approximations of the value functions as an optimum of basic functions which will be computed through a compatible selection function as defined below. We illustrate this definition in Figure 2. Definition 1.3 (Compatible selection function). Let t ∈ [[0, T − 1]] be fixed. A compatible selection function is a function φ]t from 2Ft+1 × X to Ft satisfying the following properties — Validity: for all set of basic functions Ft+1 ⊂ Ft+1 and every x ∈ X we have φ]t (Ft+1 , x) ≤ Bt VFt+1 (resp. φ]t (Ft+1 , x) ≥ Bt VFt+1 ) when opt = sup (resp. opt = inf). — Tightness: for all set of basic functions Ft+1 ⊂ Ft+1 and every x ∈ X the functions φ]t (Ft+1 , x) and Bt VFt+1 coincide at point x, that is φ]t (Ft+1 , x) (x) = Bt VFt+1 (x). For t = T , we say that φ]T : 2FT × X → FT is a compatible selection function if function φ]T is valid in the sense that for every FT ⊂ FT and x ∈ X, the function φ]T (FT , x) remains above (resp. below) the value function at time T when opt = inf (resp. opt = sup). Moreover, we say that function φ]T is tight if it coincides with the value function at point x, that is for every FT ⊂ FT and x ∈ X we have φ]T (FT , x) (x) = VT (x). Note that the tightness assumption only asks for equality at the point x between the functions φ]t (Ft+1 , x) and Bt VFt+1 and not necessarily everywhere. The only global property between the functions φ]t (Ft+1 , x) and Bt VFt+1 is an inequality given by the validity assumption. In Algorithm 1.1 we will generate for every time t a sequence of random points of crucial importance as they will be the ones where the selection functions will be

6

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Fig. 2. In section 3 we will specify two selection functions, φQu and φSDDP that will respectively t t yield upper and lower approximations of the value functions. In both cases, at the the selection function is equal to the image by the Bellman operator of the current approximation, that is the tightness assumption. Moreover it remains above the image by the Bellman operator of the current approximation, that is the validity assumption.

evaluated, given the set Ftk which characterizes the current approximation. In order to generate those points, we will assume that we have at our disposition an Oracle that given T +1 set of functions (characterizing the current approximations) computes T + 1 compact sets and a probability law of support equal to those compact sets. This Oracle will have to follow the following conditions on its output. Assumptions 1.4 (Oracle assumptions). The Oracle takes as input T + 1 sets of functions included in F1 × . . . × FT . Its output is T +1 compact sets K0 , . . . , KT each included in X and a probability measure µ on XT +1 (where X = Rn is endowed with its borelian σ-algebra) such that: — For every t ∈ [[0, T ]], Kt ⊂ dom (Vt ). — For every t ∈ [[0, T ]], there exist a function gt : R∗+ → (0, 1) such that for every η > 0 and every x ∈ Kt , µ (B (x, η) ∩ Kt ) ≥ gt (η) . An example of such Oracle would be one that outputs T + 1 union of N singletons in dom (Vt ) (for some positive integer N ) and the product measure of µt , 0 ≤ t ≤ T where µt is the uniform measure over the N singletons. Then for every t ∈ [[0, T ]] the constant function gt equal to n1 satisfies Assumptions 1.4. For every time t ∈ [[0, T ]], we construct a sequence of functionals Vtk k∈N of Ft as follows. For every time t ∈ [[0, T ]] and for every k ≥ 0, we build a subset Ftk of the set Ft and define the sequence of functionals by pointwise optimum Vtk := VFtk = optφFtk φ. As described here, the functionals are just byproducts of Algorithm 1.1 which only describes the way the sets Ftk are defined. As Algorithm 1.1 was inspired by Qu’s work which uses tropical algebra techniques, we will call this algorithm Tropical Dynamic Programming. 2. Almost sure convergence on the set of accumulation points.

7 Algorithm 1.1 Tropical Dynamic Programming (TDP) Input: For every t ∈ [[0, T ]], φ]t a compatible selection function, an Oracle satisfying Assumptions 1.4, T + 1 compact sets K00 × . . . × KT0 and a probability measure µ0 on XT +1 of support equal to those T + 1 compact sets. Output: For every t ∈ [[0, T ]], a sequence of sets Ftk k∈N . Define for every t ∈ [[0, T ]], Ft0 := ∅. for k ≥ 1 do Draw some points xk−1 over K0k−1 × K1k−1 × . . . × KTk−1 according to t t∈[[0,T ]] 0 µk−1 , independently from previous draws at iterations k < k . Compute φkT := φ]T FTk−1 , xk−1 . T Define FTk := FTk−1 ∪ φkT . for t from T − 1 to 0 do k Compute φkt := φ]t Ft+1 , xk−1 . t Define Ftk := Ftk−1 ∪ φkt . end for Compute K0k × . . . KTk , µk := Oracle F0k , . . . , FTk . end for

First, we state several simple but crucial properties of the approximation functions Vtk k∈N generated by Algorithm 1.1. They are direct consequences of the facts that the Bellman operators are order preserving and that the basic functions building our approximations are computed through compatible selection functions. Lemma 2.1. 1. Let (F k )k∈N be a non-decreasing (for the inclusion) sequence of set of functionals on X. Then the sequence (VF k )k∈N is monotone. More precisely, when opt = inf then (VF k )k∈N is non-increasing and when opt = sup then (VF k )k∈N is non-decreasing. 2. Monotone approximations: for every indices k < k 0 we have that 0

Vtk ≥ Vtk ≥ Vt

(2.1)

when opt = inf

0

and Vtk ≤ Vtk ≤ Vt when opt = sup. 3. For every k ∈ N and every t ∈ [[0, T − 1]] we have that k (2.2) Bt Vt+1 ≤ Vtk when opt = inf k and Bt Vt+1 ≥ Vtk when opt = sup. 4. For every k ≥ 1, we have k−1 k (2.3) Bt Vt+1 xt = Vtk xk−1 . t Proof. We prove each point succesively when opt = inf, as the case opt = sup is similar. 0 1. Let F k ⊂ F k be two set of functionals. When opt = inf for every x ∈ X we have that VF k0 (x) := inf 0 φ(x) ≤ inf φ(x) =: VF k (x). φ∈F k

φ∈F k

2. By construction of Algorithm 1.1, the sequence of sets Ftk k∈N is non0 decreasing, thus for every indices k < k 0 we have that Vtk ≥ Vtk when 0 opt = inf (and Vtk ≤ Vtk when opt = sup).

8

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Now we show that Vtk ≥ Vt when opt = inf, the case opt = sup is analogous. Fix k ∈ N, we show this by backward recursion on t ∈ [[0, T ]]. For t = T , by validity of the selection functions Definition 1.3, for every φ ∈ FTk we have that φ ≥ VT . Thus VTk = VFTk = inf φ∈FTk φ ≥ VT . Now, suppose that for k some t ∈ [[0, T − 1]] we have Vt+1 ≥ Vt+1 . Applying the Bellman operator, using the definition of the value function (0.2) and as the Bellman operators are order preserving, we get the desired result. 3. We prove the assertion by induction on k ∈ N in the case opt = inf. For k = 0, as Ft0 = ∅ we have Vt0 = +∞. Thus the assertion is true for k = 0. Assume that for some k ∈ N we have k (2.4) Bt Vt+1 ≤ Vtk . k+1 k By (2.1) we have that Vt+1 ≤ Vt+1 . Thus, as the Bellman operators are k+1 k order preserving we have that Bt Vt+1 ≤ Bt Vt+1 . Thus by induction hypothesis (2.4) we get k+1 (2.5) Bt Vt+1 ≤ Vtk .

Moreover as the selection function is valid, we have that : k+1 (2.6) Bt Vt+1 ≤ φk+1 . t , Finally, by construction of Algorithm 1.1 we have that Vtk+1 = inf Vtk , φk+1 t so using (2.5) and (2.6) we deduce the desired result k+1 Bt Vt+1 ≤ Vtk+1 . 4. As the selection function φ]t is tight in the sense of Definition 1.3 we have by construction of Algorithm 1.1 that k−1 k Bt Vt+1 xt = φkt xk−1 . t Combining it with (2.2) (or its variant when opt = sup) and the definition of Vtk , one gets the desired equality. In the followingk two propositions, we state that the sequences of functionals Vtk k∈N and Bt Vt+1 converge uniformly on any compact included in the dok∈N main of Vt . The limit functional of Vtk k∈N , noted Vt∗ , will be our natural candidate to be equal to the value function Vt . Moreover, the convergence will be µ-almost sure where (see [6, pages 257-259]) µ is the unique probability measure over the countable cartesian product XT +1 × . . . × XT +1 × . . . endowed with the product σ-algebra such that for every borelian Xi ⊂ XT +1 , 1 ≤ i ≤ k, Y µ X1 × . . . × Xk × XT +1 = µ1 (X1 ) × . . . × µk (Xk ) , i≥k+1

where µk k∈N is the sequence of probability measures generated by Algorithm 1.1 through the Oracle. Proposition 2.2 (Existenceof an approximating limit). Let t ∈ [[0, T ]] be fixed. The sequence of functionals Vtk k∈N defined as Vtk := VFtk (where the sets Ftk are generated by Algorithm 1.1) µ-a.s. converges uniformly on every compact set included in the domain of Vt to a functional Vt∗ .

9 Proof. Let t ∈ [[0, T ]] be fixed and let Kt be a given compact set included in the domain of Vt . We denote by Vtk k∈N a sequence of approximations generated by Algorithm 1.1. The proof relies on the Arzel`a-Ascoli Theorem [18, Theorem 2.13.30 p.347]. More precisely, First, by Assumptions 1.2 each functional in Ft have a common modulus of continuity. Thus as Ftk ⊂ Ft , the family of functionals Vtk k≥0 is equicontinuous. Now, by Lemma 2.1, the sequence of functionals Vtk k≥0 is monotone. Now for every x ∈ Kt , the set Vtk (x) k≥1 is bounded by Vt (x), which is finite since we assumed Kt ⊂ dom (Vt ), and Vt1 (x). Hence the set Vtk (x) k≥1 is a bounded subset of R and thus relatively compact. By Arzel` a-Ascoli Theorem, the sequence of functions Vtk k≥1 is a relatively compact set of C (Kt , R) for the topology of the uniform convergence, i.e. there exists a subsequence of Vtk k≥1 converging uniformly to a function Vt∗ ∈ C (Kt , R) . Finally, as Vtk k≥0 is a monotone sequence of functions, we conclude that the sequence Vtk k≥0 converges uniformly on the compact Kt to Vt∗ ∈ C (Kt , R). Proposition 2.3. Let t ∈ [[0, T − 1]] be fixed and Vt∗ be the function defined in k Proposition 2.2. The sequence Bt Vt+1 µ-a.s. converges uniformly to the continuous ∗ function Bt Vt+1 on every compact sets included in the domain of Vt . Proof. We will stick to the case when opt = inf and leave the other case to the reader. Let Kt be a given compact set included in dom (Vt ). First, as the sek quence Vt+1 is non-increasing and using the fact that the operator Bt is order k∈N k preserving, the sequence Bt Vt+1 is also non-increasing. By stability of the k∈N k Bellman operator Bt (see Assumptions 1.2),we have that the function Bt Vt+1 is k in Ft for every k ∈ N and thus the family Bt Vt+1 is equicontinuous using k∈N the common regularity assumption in Assumptions 1.2. Moreover, given x ∈ Kt , the k set Bt Vt+1 (x) k≥1 is bounded by Vt1 (x) and Vt (x) which take finite values on dom (Vt ). Thus, using againArzel` a-Ascoli Theorem, there exists a continuous funck tional φ such that Bt Vt+1 converges uniformly to φ on any compact included k∈N in dom (Vt ). ∗ We now show that the functional φ is equal to Bt Vt+1 on the given compact ∗ Kt or equivalently we show that φ + δKt = Bt Vt+1 + δKt . As already shown in k ∗ Proposition 2.2, the sequence Vt+1 is lower bounded by Vt+1 . We thus have that k∈N k ∗ Vt+1 ≥ Vt+1 , which combined with the fact that the operator Bt is order preserving, gives, for every k ≥ 1, that k ∗ Bt Vt+1 ≥ Bt Vt+1 . Adding, on both side of the previous inequality, the mapping δKt and taking the limit as k goes to infinity, we have that ∗ φ + δKt ≥ Bt Vt+1 + δKt . For the converse inequality, by the existence of optimal sets (see Assumptions 1.2), there exists a compact set Kt+1 ⊂ dom (Vt+1 ) such that ∗ ∗ (2.7) Bt Vt+1 + δKt+1 ≤ Bt Vt+1 + δKt . k By Proposition 2.2, the non-increasing sequence Vt+1 converges uniformly to k∈N ∗ Vt+1 on the compact set Kt+1 . Thus, for any fixed > 0, there exists an integer

10

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

k0 ∈ N such that we have k k ∗ Vt+1 ≤ Vt+1 + δKt+1 ≤ Vt+1 + + δKt+1 ,

for all k ≥ k0 . By Assumptions 1.2, the operator Bt is order preserving and additively M-subhomogeneous, thus we get k k ∗ Bt Vt+1 ≤ Bt Vt+1 + δKt+1 ≤ Bt Vt+1 + δKt+1 + M , which, combined with (2.7) gives that k ∗ Bt Vt+1 + δKt ≤ Bt Vt+1 + M + δKt , for every k ≥ k0 . Taking the limit when k goes to infinity we obtain that ∗ φ + δKt ≤ Bt Vt+1 + δKt + M . ∗ The result is proved for all > 0 and we have thus shown that φ = Bt Vt+1 on k the compact set Kt . We conclude that Bt Vt+1 converges uniformly to the k∈N ∗ functional Bt Vt+1 on the compact set Kt . We want to exploit that our approximations of the final cost function are exact in the sense that we have equality between VTk and VT at the points drawn in Algorithm 1.1, that is, the tightness assumption of the selection function is much stronger at time T than for times t < T . Thus we want to propagate the information backward in time: starting from time t = T we want to deduce information on the approximations for times t < T . In order to show that Vt = Vt∗ on some set Xt , a dissymmetry between upper and lower approximations is emphasized. We introduce the notion of optimal sets (Xt )t∈[[0,T ]] with respect to a sequence of functionals (φt )t∈[[0,T ]] as a condition on the sets (Xt )t∈[[0,T ]] such that if one wants to compute the restriction of Bt (φt+1 ) to Xt , ones only need to know φt+1 on the set Xt+1 . The Figure 3 illustrates this notion. Definition 2.4 (Optimal sets). Let (φt )t∈[[0,T ]] be T + 1 functionals on X. A sequence of sets (Xt )t∈[[0,T ]] is said to be (φt )-optimal or optimal with respect to (φt )t∈[[0,T ]] , if for every t ∈ [[0, T − 1]] we have (2.8)

Bt φt+1 + δXt+1 + δXt = Bt (φt+1 ) + δXt .

When approximating from below, the optimality of sets is only needed for the functions (Vt∗ )t∈[[0,T ]] whereas when approximating from above, one needs the optimality of sets with respect to (Vt )t∈[[0,T ]] . It seems easier to ensure the optimality of sets for (Vt∗ )t∈[[0,T ]] than for (Vt )t∈[[0,T ]] as the functional Vt∗ is known through the se quence Vtk k∈N whereas the function Vt is, a priori, unknown. This fact is discussed in section 3. Lemma 2.5 which is — optimal — optimal If the sequence Equations:

(Unicity in Bellman Equation). Let (Xt )t∈[[0,T ]] be a sequence of sets with respect to (Vt )t∈[[0,T ]] when opt = inf, with respect to (Vt∗ )t∈[[0,T ]] when opt = sup. of functionals (Vt∗ )t∈[[0,T ]] satisfies the following modified Bellman

11 j

k

l

m

Fig. 3. The optimality of the sets (Xt )t∈[[0,T ]] means that in order to compute the restriction of Bt (φt+1 ) to Xt , one only needs to know the values of φt+1 on the set Xt+1 .

( (2.9)

VT∗ + δXT = ψ + δXT ∗ ∀t ∈ [[0, T − 1]], Bt Vt+1 + δXt = Vt∗ + δXt .

Then for every t ∈ [[0, T ]] and every x ∈ Xt we have that Vt∗ (x) = Vt (x). Proof. We prove the lemma by backward recursion on the time t ∈ [[0, T ]], first for the case opt = inf. For time t = T , since by definition of VT = ψ (see (0.2)) we have VT∗ + δXT = ψ + δXT = VT + δXT . Now, let time t ∈ [[0, T − 1]] be fixed and ∗ assume that we have Vt+1 (x) = Vt+1 (x) for every x ∈ Xt+1 i.e. (2.10)

∗ Vt+1 + δXt+1 = Vt+1 + δXt+1 .

Using Lemma 2.1, the sequence of functions Vtk is lower bounded by Vt . Taking the limit in k, we obtain that Vt∗ ≥ Vt , we have thus only to prove that Vt∗ ≤ Vt on Xt , that is Vt∗ + δXt ≤ Vt + δXt . We successively have: (by (2.9))

∗ Vt∗ + δXt = Bt Vt+1 + δ Xt

(by induction assumption (2.10))

∗ ≤ Bt Vt+1 + δXt+1 + δXt = Bt Vt+1 + δXt+1 + δXt

((Xt )t∈[[0,T ]] is (Vt )-optimal)

= Bt (Vt+1 ) + δXt

(by (0.2))

= Vt + δXt ,

(Bt is order preserving)

which concludes the proof in the case of opt = inf. Now we briefly prove the case opt = sup by backward recursion on the time t ∈ [[0, T ]]. As for the case opt = inf, at time t = T , one has VT∗ + δXT = VT + δXT . Now assume that for some t ∈ [[0, T − 1]] one has Vt+1 + δXt+1 = Vt+1 + δXt+1 . By Lemma 2.1, the sequence of functions Vtk is upper bounded by Vt . Thus, taking the limit in k we obtain that Vt∗ ≤ Vt and we

12

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

only need to prove that Vt∗ + δXt ≥ Vt + δXt . We successively have: Vt + δXt = Bt (Vt+1 ) + δXt

(by (0.2))

((Xt )t∈[[0,T ]] is (Vt∗ )-optimal)

≤ Bt Vt+1 + δXt+1 + δXt ∗ = Bt Vt+1 + δXt+1 + δXt ∗ = Bt Vt+1 + δ Xt

(by (2.9))

= Vt∗ + δXt ,

(Bt is order preserving) (by induction assumption (2.10))

In the general case, one cannot hope for the limit object Vt∗ to be equal to the value function Vt everywhere. However, one can expect an (almost sure over the draws) equality between the two functions Vt and Vt∗ on all possible cluster points of sequences (yk )k∈N with yk ∈ Ktk for all k ∈ N, that is, on the set lim sup Ktk (see [17, Definition 4.1 p. 109]). Theorem 2.6 (Convergence of Tropical Dynamic Programming). Define Kt∗ := lim supk Ktk , for every time t ∈ [[0, T ]]. Assume that, µ-a.s the sets (Kt∗ )t∈[[0,T ]] are (Vt )-optimal when opt = inf (resp. (Vt∗ )-optimal when opt = sup). Then, µ-a.s. for every t ∈ [[0, T ]] the functional Vt∗ defined in Proposition 2.2 is equal to the value function Vt on Kt∗ . Proof. We will only study the case opt = inf as the case opt = sup is analogous. We will show that (2.9) holds with Xt = Kt∗ . By tightness of the selection function ∗ at time T , we get that VT∗ = VT on K fix t ∈ [[0, T − 1]]. T . Now, ∗ — First, we show that Bt Vt+1 ≤ Vt∗ . By Lemma 2.1, for every k ∈ N we have that k Bt Vt+1 ≤ Vtk . ∗ k Moreover, by Lemma 2.1 we have Vt+1 ≤ Vt+1 thus, as the operator Bt is order preserving we have ∗ k Bt Vt+1 ≤ Bt Vt+1 ≤ Vtk −→ Vt∗ (by Proposition 2.2). k→+∞

∗ — Now, we will show by contradiction that Bt Vt+1 ≥ Vt∗ on Kt∗ . Suppose ∗ that there exist x ∈ Kt such that ∗ (2.11) Bt Vt+1 (x) < Vt∗ (x). ∗ Define h := Vt∗ (x) − Bt Vt+1 (x) > 0. As illustrated in Figure 4, we will ∗ (x) show that there is an index k such that Vtk+1 (x) will be closer to Bt Vt+1 than Vt∗ (x), which will contradict the fact that the sequence Vtk (x) k∈N is non-increasing. To this end, we state several facts : – By equicontinuity of the sequence of fonctionals Vtk k∈N there exist η > 0 such that for every y ∈ B (x, η), for every index k ∈ N we have (2.12)

|Vtk (y) − Vtk (x)| ≤

h . 4

– By Lemma 2.1, for every index k ∈ N the k-th draw at time t, noted xkt is such that k k+1 (2.13) Bt Vt+1 (xt ) = Vtk+1 (xkt ).

13

∗ Fig. 4. If there exist x ∈ Kt∗ such that Bt Vt+1 (x) < Vt∗ (x) then there is an index k such k+1 ∗ that Vt (x) will be closer to Bt Vt+1 (x) than Vt∗ (x) as the selection function is tight. This will contradict the fact that the sequence Vtk (x) k∈N is non-increasing.

k – By Proposition 2.3, the sequence Bt Vt+1 converges uniformly to k∈N ∗ the continuous functional Bt Vt+1 on arbitrary compact sets included in dom (V t ). Hence, it converges pointwise to the continuous function ∗ Bt Vt+1 on dom (Vt ). Thus, we get the following inequality: for any y ∈ dom (Vt ), there exist a rank k0 ∈ N such that if k ≥ k0 we have that h ∗ k |Bt Vt+1 (y) − Bt Vt+1 (y)| ≤ . 4 ∗ 0 – By continuity of Bt Vt+1 at x, there exist η > 0 such that η 0 < η and for every y ∈ B (x, η 0 ) we have (2.14)

(2.15)

h ∗ ∗ |Bt Vt+1 (y) − Bt Vt+1 (x)| ≤ . 4

– As x is in Kt∗ := lim supk Ktk , by definition of the limsup, there exist an σ(k) σ(k) increasing function σ : N → N and a sequence of points yt ∈ Kt σ(k) −→ x. Thus, there exist a rank k1 ≥ k0 such that if such that yt k→+∞ 0 0 σ(k) σ(k) k ≥ k1 then yt ∈ B x, η2 and a fortiori B yt , η2 ⊂ B (x, η 0 ). Let (Xkt )k∈N be the sequence of random variables where for each k ∈ N, Xkt is the t-th marginal of a random variable of probability law µk . We have that for every k ≥ k1 , σ(k) σ(k) σ(k) P Xt ∈ B (x, η 0 ) ≥ P Xt ∈ B yt , η 0 /2 σ(k) σ(k) σ(k) ≥ P Xt ∈ B yt , η 0 /2 ∩ Kt σ(k) σ(k) = µσ(k) B yt , η 0 /2 ∩ Kt 0 η ≥g (by Assumptions 1.4). 2

14

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Hence, the probability that the subsequence never draw a point in the ball B (x, η 0 ) after the rank σ (k1 ) is bounded from above by Y (1 − g (η 0 /2)) = 0. | {z } k≥k1

∈(0,1)

Therefore, µ-almost surely, there exists an index k2 ≥ k1 such that the σ(k2 )-th draw at time t in Algorithm 1.1 satisfies: xkt ∈ B (x, η 0 ) ,

(2.16)

where k is a simplified notation for σ(k2 ), k := σ(k2 ). By (2.16), the state xkt satisfies both (2.12) and (2.15). Thus, we can conclude ∗ that Vtk+1 (x) is closer to Bt Vt+1 (x) than Vt∗ (x) as detailed now: ∗ |Vtk+1 (x) − Bt Vt+1 (x) | ≤ |Vtk+1 (x) − Vtk+1 xkt | {z } | ≤h/4 by (2.12)

k+1 + |Vtk+1 xkt − Bt Vt+1 xkt | | {z } =0 by (2.13)

+ |Bt | +

k+1 Vt+1

∗ |Bt Vt+1

| ≤

k ∗ xkt − Bt Vt+1 xt | {z } ≤h/4 by (2.14)

∗ xkt − Bt Vt+1 (x) | {z }

≤h/4 by (2.15)

3h . 4

Hence we have that ∗ ∗ Vtk+1 (x) = Vtk+1 (x) − Bt Vt+1 (x) + Bt Vt+1 (x) 3h ∗ ≤ Bt Vt+1 (x) + . 4 And finally we get ∗ ∗ (x) − Vt∗ (x) − Bt Vt+1 (x) Vtk+1 (x) − Vt∗ (x) = Vtk+1 (x) − Bt Vt+1 3h −h 4 < 0, ≤

which contradicts the fact that the sequence Vtk k∈N is non-increasing (Lemma 2.1). Hence, there is no x ∈ Kt such that (2.11) holds. We conclude that the sequence (Vt∗ )k∈N satisfies the modified Bellman Equation (2.9) with the sequence (Kt∗ )k∈N . The conclusion follows from the Unicity Lemma Lemma 2.5. 3. The multistage framework and examples of selection functions.

15 3.1. SDDP selection function: lower approximations in the linearconvex framework. We will show that our framework contains a similar framework of (the deterministic version of) the SDDP algorithm as described in [8] and yields the same result of convergence. Let X = Rn be a continuous state space and U = Rm a continuous control space. We want to solve the following problem min

T −1 X

x=(x0 ,...,xT ) u=(u0 ,...uT −1 ) t=0

(3.1)

ct (xt , ut ) + ψ(xT )

x0 ∈ X is given, ∀t ∈ [[0, T ]], x ∈ X ⊂ X, t t s.t. ∀t ∈ [[0, T − 1]], u ∈ Ut ⊂ U, t ∀t ∈ [[0, T − 1]], xt+1 = ft (xt , ut ).

We make similar assumptions as in [8], one can look at this article for more in-depth comments about them. Assumptions 3.1. For all t ∈ [[0, T − 1]] we assume that: — The set Xt ⊂ X and XT ⊂ X are convex and compacts with non-empty relative interior. — The set Ut is non-empty, convex and compact. — The dynamic ft : X × U −→ X is linear: ft (x, u) = At x + Bt u, for some given matrices At and Bt of compatible dimensions. — The cost function ct : X × U −→ R is a proper convex lower semicontinuous (lsc) function and is a Ct -Lipschitz continuous function on Xt × Ut . — The final cost function ψ : X −→ R is a proper convex lsc function and is a CT -Lipschitz continuous function on XT . — Relatively Complete Recourse (RCR). For every x ∈ Xt we have that ft (x, Ut ) ∩ ri (Xt+1 ) 6= ∅. For every time step t ∈ [[0, T − 1]], we define the Bellman operator Bt for every functional φ : X → R by: Bt (φ) := inf ct (·, u) + φ (ft (·, u)) . u∈U

Moreover, for every functional φ : X → R and every (x, u) ∈ X × U we define Btu (φ) (x) := ct (x, u) + φ (ft (·, u)) ∈ R. The Dynamic Programming principle using the Bellman’s operators Bt yields: ( VT = ψ (3.2) ∀t ∈ [[0, T − 1]], Vt : x ∈ X 7→ Bt (Vt+1 )(x) ∈ R. Using a generalization of Hoffman’s Lemma [4, Theorem 9] that bounds from above the distance between the image by a linear mapping of a point and a convex set, we show that the image of a Lipschitz continuous function by a Bellman operator will also be Lipschitz continuous, with an explicit (conservative) constant.

16

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Proposition 3.2. Under Assumptions 3.1, for every t ∈ [[0, T − 1]] there exist a constant Lt > 0 such that if φ : X → R is convex and Lt+1 -Lipschitz continuous on Xt+1 then Bt (φ) is Lt -Lipschitz continuous on Xt . Proof. Fix t ∈ [[0, T − 1]] and let φ : X → R be a convex and Lt+1 -Lipschitz continuous function on Xt+1 . Let xt ∈ Xt be arbitrary. By the RCR Assumption 3.1, there exist ut ∈ Ut such that ft (xt , ut ) ∈ ri (Xt+1 ). Thus, we have min ct (xt , u) + φ (ft (xt , u)) ≤ ct (xt , ut ) + φ (ft (xt , ut )) < +∞.

u∈Ut

Moreover, by compacity of Ut and lower semicontinuity of u 7→ ct (x, u), φ and ft , the above minimum is attained. We have shown that dom (Bt (φ)) includes Xt . Now the function x 7→ minu∈Ut ct (x, u) + φ (ft (x, u)) = Bt (φ) (x) is convex on Xt as a marginal of a jointly convex function. We finally show that the function Bt (φ) is Lipschitz on Xt with a constant Lt > 0 detailled below. Let x be in Xt and note ux an optimal control at x, that is ux satisfies Btux (φ) (x) = Bt (φ) (x). Note that for every x0 ∈ Xt , by the RCR assumption, there exist a control u such that fy (u) := At y + Bt u is an element of ri Xt+1 . Thus, by Hoffman’s Lemma (see [4, Theorem 9]) there exist a constant γ > 0 such that (3.3)

inf u s.t. fx0 (u)∈Xt+1

kux − uk ≤ γ dist (fx0 (ux ) , Xt+1 ) .

We want to bound from above inf u s.t. fx0 (u)∈Xt+1 kux −uk by a constant times kx−x0 k. As (−At x) ∈ Bt ux − Xt+1 , by triangle inequality we have that dist (Bt ux − (−At x0 ) , Xt+1 ) ≤ dist (At x0 + Bt ux , At x + Bt ux ) = dist (At x0 , At x) (3.4)

≤ kAt x0 − At xk 1/2 dist (fx0 (ux ) , Xt+1 ) ≤ λmax ATt At kx − x0 k.

Setting κ1 := γ λmax AT A (3.5)

1/2

, by (3.3) and (3.4) we have shown that

inf u s.t. fx0 (u)∈Xt+1

kux − uk ≤ κ1 kx − x0 k. u

Similarly, denoting ux0 a control such that Bt x0 (φ) (x0 ) = Bt (φ) (x0 ), there exists a constant κ2 > 0 such that we have (3.6)

inf u s.t. fx (u)∈Xt+1

kux0 − uk ≤ κ2 kx − x0 k.

Now for every u such that fx0 (u) ∈ Xt+1 , as ct is Ct -Lipschitz continuous, φ is Lt+1 -Lipschitz continuous and ft is linear, we have that Bt (φ) (x0 ) ≤ Bt (φ) (x) + Btu (φ) (x0 ) − Bt (φ) (x) = Bt (φ) (x) + ct (x0 , u) + φ (ft (x0 , u)) − ct (x, ux ) − φ (ft (x, ux )) ≤ Bt (φ) (x) + Ct (kx − x0 k + kux − uk) + Lt+1 λmax ATt At kx − x0 k + λmax BtT Bt ku − ux k .

17 So, taking the infimum over the u such that fx0 (u) ∈ Xt+1 , setting κ := max (κ1 , κ2 ) and Lt := Ct (1 + κ) + Lt+1 λmax ATt At + λmax BtT Bt κ one has by (3.6) that (3.7)

Bt (φ) (x0 ) − Bt (φ) (x) ≤ Lt kx − x0 k.

Similarly, one can show that (3.8)

Bt (φ) (x) − Bt (φ) (x0 ) ≤ Lt kx − x0 k.

The result follows from both (3.7) and (3.8). As lower semicontinuous proper convex functions can be approximated by a supremum of affine function, for every t ∈ [[0, T ]] we define FSDDP to be the set of affine t functions φ : x ∈ X 7→ ha, xi + b ∈ R, a ∈ X and b ∈ R with kak2 ≤ Lt . Moreover we’ll denote by FSDDP the set of convex functionals φ : X 7→ R which are Lt -Lipschitz t continuous on Xt and proper. Proposition 3.3. Under Assumptions 3.1, the problem (3.1) and the Bellman operators defined in (3.2) satisfy the structural assumptions Assumptions 1.2. Proof. — Common regularity: By construction, for all t ∈ [[0, T ]], every element of is Lt -Lipschitz continuous. FSDDP t — Final condition. As ψ is convex proper and LT -Lipschitz continuous on XT , it is a countable (as Rn is separable) supremum of LT -Lipschitz affine functions. — Stability by the Bellman operators. This has been shown in Proposition 3.2. — Stability by pointwise optimum. Recall that we are here on the case opt = sup. Fix t ∈ [[0, T ]] and let F ⊂ FSDDP be a set of affine Lt -Lipschitz t continuous functionals. For every x, x0 ∈ dom (Vt ), we have that |VF (x) − VF (x0 ) | = | sup φ(x) − sup φ(x0 )| ≤ sup |φ(x) − φ(x0 )| ≤ Lt kx − x0 k. φ∈F

φ∈F

φ∈F

Thus the function VF is Lt -Lipschitz continuous. As a supremum of affine functions is convex, VF is convex and finite valued. We have shown that VF . is an element of FSDDP t — Order preserving operators. Let φ1 and φ2 be two functionals over X such that φ1 ≤ φ2 i.e. for every x ∈ X we have φ1 (x) ≤ φ2 (x). We want to show that Bt (φ1 ) ≤ Bt (φ2 ). Let x ∈ X, we have : Bt (φ1 ) (x) = inf xT Ct x + uT Dt u + φ1 (ft (x, u)) u∈U

≤ inf xT Ct x + uT Dt u + φ2 (ft (x, u)) u∈U

= Bt (φ2 ) (x). — Existence of the value functions. By backward recursion on the time step t ∈ [[0, T ]] and by Proposition 3.2, for every time step t ∈ [[0, T ]] the function Vt defined by the Dynamic Programming equation (3.2) is convex and Lt -Lipschitz continuous on Xt . — Existence of optimal sets. Fix an arbitrary element φ ∈ FSDDP . We will t show that for every compact set Kt ⊂ dom (Vt ), there exist a compact set Kt+1 ⊂ dom (Vt+1 ) such that (3.9) Bt φ + δKt+1 + δKt = Bt (φ) + δKt ,

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

18

which will imply the desired result. Now equation (3.9) is equivalent to the fact that for every state xt ∈ Kt , there exist a control ut ∈ Ut such that ft (xt , ut ) ∈ Kt+1 ut ∈ arg min ct (xt , u) + φ (ft (xt , u)) . {z } u∈Ut | u =Bt (φ)(x)=:B(u)

Now as the function B : u ∈ Ut 7→ B(u) ∈ R is lower semicontinuous proper and convex (as the sum of two convex functions) we have that B attains its minimum on the compact Ut and that its set of minimizers, that we will note Ut∗ , is bounded. As it is also a closed set of Rn , Ut∗ is a compact set of Rn . Thus setting Kt+1 := ft (Xt , Ut∗ ) we have that equation (3.9) is satisfied and that Kt+1 is compact as ft is continuous and by compactness of Xt and Ut∗ . Lastly, by the RCR Assumption, we have that Kt+1 ⊂ dom (Vt+1 ). — Additively M -subhomogeneous operators. We will show that Bt is additively homogeneous, that is M = 1. Let λ : X → R be a positive constant function, we will identify λ with its value. Let φ be a functional over X. For any x ∈ X we have : Bt (λ + φ) (x) = inf ct (x, u) + (λ + φ) (ft (x, u)) u∈U

= inf ct (x, u) + λ + φ (ft (x, u)) u∈U

= λ + inf ct (x, u) + φ (ft (x, u)) u∈U

= λ + Bt (φ) (x). Note that the constraint ft (x0 , u) restricts the current approximation at time t+1 to the set Xt+1 included in the domain of the true value function Vt+1 . Now we define a compatible selection function. Let t ∈ [[0, T − 1]], for any F ⊂ FSDDP and x ∈ X, define the following optimization problem t " (3.10)

# 0

0

min ct (x , u) + sup φ (ft (x , u))

x0 ∈X u∈U

s.t. x0 = x, ft (x0 , u) ∈ Xt+1 .

φ∈F

If we denote by b its optimal value and by a, a Lagrange multiplier of the constraint x0 − x = 0 at the optimum, then we define φSDDP (F, x) := ha, x0 − xi + b. t Finally, at time t = T , for any F ∈ FSDDP and x ∈ X we define φSDDP (F, x) := ha, x0 − xi + VT (x) , T where a ∈ ∂VT (x). Proposition 3.4. For every time t ∈ [[0, T ]], the function φSDDP is a compatible t selection function in the sense of Definition 1.3.

19 Proof. Fix t ∈ [[0, T − 1]], F ⊂ FSDDP and x ∈ X. We have Bt (VF ) (x) = b t by definition of Bt , thus the function φSDDP (F, x) is tight and it is valid as a is a t subgradient of the convex function Bt (VF ) at x. For t = T , the selection function φSDDP is tight and valid by convexity of VT . T If we want to apply the convergence result from Theorem 2.6, as we approximate from below the value functions (opt = sup) then one has to make the draws according to some sets Ktk such that the sets Kt∗ := lim supk∈N Ktk are Vt∗ optimal. As done in the litterature of the Stochastic Dual Dynamic Programming algorithm (see for example [8], [20] or [14]), one can study the case when the draws are made along the optimal trajectories of the current approximations. More precisely, fix k ∈ N we define a sequence (xk0 , xk1 , . . . , xkT ) by ( k x0 := x0 ∀t ∈ [[0, T − 1]], xkt+1 := ft xkt , ukt , where ukt ∈ arg minu Btu Vtk xkt . We say that such a sequence (xk0 , xk1 , . . . , xkT ) is an optimal trajectory for the k-th approximations starting from x0 . Proposition 3.5 (Optimality of trajectories). quence of singletons: ( k K0 := {x0 }

For every k ∈ N define a se-

k ∀t ∈ [[0, T − 1]], Kt+1 := xkt+1 , where (xk0 , xk1 , . . . , xkT ) is an optimal trajectory for the k-th approximations starting from x0 . Lastly, for every t ∈ [[0, T ]] define Kt∗ := lim supk Ktk . Then the sequence of sets (Kt∗ )t∈[[0,T ]] is optimal with respect to (Vt∗ )t∈[[0,T ]] in the sense of Definition 2.4. Proof. — First we show that for every k ∈ N the sets Ktk t∈[[0,T ]] are Vtk -optimal. Let k ∈ N and t ∈ [[0, T − 1]] be fixed. We want to show that k k Bt Vt+1 + δKt+1 + δKtk = Bt Vt+1 + δKtk , k which is equivalent to prove that for every xkt ∈ Ktk we have k k k (3.11) Bt Vt+1 + δKt+1 xkt = Bt Vt+1 (xt ). k Now using the definition of the Bellman Operators (0.2), the equation (3.11) is satisfied if, and only if, there exist a control ukt ∈ U such that (3.12) k k ut ∈ arg min Btk (u) where Btk (u) := ct xkt , u + Vt+1 ft xkt , u u∈Ut

k ft xkt , ukt ∈ Kt+1 .

Which is true by construction of the sets Ktk t∈[[0,T ]] . — Now we can deduce that the sets (Kt∗ )t∈[[0,T ]] are Vt∗ -optimal. First, as for (3.12), the compability relation ∗ ∗ ∗ Bt Vt+1 + δKt+1 + δKt∗ = Bt Vt+1 + δKt∗ ,

20

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

is satisfied if, and only if, for every x∗t ∈ Kt∗ there exist a control u∗t ∈ Ut such that ∗ u∗t ∈ arg min Bt∗ (u) with Bt∗ (u) = ct (x∗t , u) + Vt+1 (ft (x∗t , u)) u (3.13) f (x∗ , u∗ ) ∈ K ∗ . t t t t+1 k For every k ∈ N denote by ukt the control defining the set Kt+1 . As Ut is k compact, one can extract from the sequence of controls ut k∈N a converging subsequence to a control u∗t that we will also note ukt k∈N . Let x∗t be an element of Kt∗ , by definition of Kt∗ := lim supk Ktk , one can find a sequence of points xkt ∈ Ktk such that one of its subsequence converges to x∗t . Thus, extracting subsequences if needed, one can consider that we have simultaneously k ∗ xt −→ xt k→+∞

ukt

−→ u∗t .

k→+∞

Which implies, by continuity of ft , that the sequence ft xkt , ukt converges ∗ to a point in Kt+1 . 0 Let k, k ∈ N be two indices, as for every u ∈ U the function x 7→ ct xkt , u is k Ct -Lipschitz continuous, Vt+1 ∈ Fkt+1 is Lt+1 Lipschitz and for every u ∈ Ut 1/2 the linear function x 7→ ft (x, u) is λmax ATt At -Lipschitz continuous, so for every u ∈ U we have that k 0 0 T sup |Btk (u) − Btk (u)| ≤ Ct + Lt+1 λ1/2 kxt − xkt k −→ 0. max A A u∈Ut

k→+∞

Thus we have shown that the sequence of functions Btk k∈N converges uniformly on the compact Ut . Moreover as it converges pointwise to Bt∗ , the sequence Btk k∈N converges uniformly to Bt∗ on Ut . Finally, as for every k ∈ N, ukt ∈ arg minu∈Ut Btk (u) we have that u∗t ∈ arg minu∈Ut Bt∗ (u). As the structural assumptions Assumptions 1.2 are satisfied, as the functions , 0 ≤ t ≤ T are compatible selections and the sets (Kt∗ )t∈[[0,T ]] are Vt∗ -optimal φSDDP t (case opt = sup) by Theorem 2.6 we have the following convergence result, which is analogous to the ones in the literature. Theorem 3.6 (Lower (outer) approximations of the value functions). Under Assumptions 3.1, for every t ∈ [[0, T ]], denote by Vtk k∈N the sequence of functionals generated by Tropical Dynamic Programming with the selection function φSDDP and t k the draws made uniformly over the sets K defined in Proposition 3.5. t Then the sequence Vtk k∈N is nondecreasing, bounded from above by Vt , and converges uniformly to Vt∗ on every compact set included in dom (Vt ). Moreover, almost surely over the draws, Vt∗ = Vt on lim supk∈N Ktk . 3.2. A min-plus selection function: upper approximations in the linearquadratic framework with both continuous and discrete controls. In this example, the cost functions will be quadratic without mixing terms, in the sense of the definition below. We will explain at the end of the section why we don’t lose generality (if we increase the dimension of the state space by 1) when studying

21 such specific quadratic forms instead of general ones. In particular this allows us to restrict our study the (compact) unit sphere as explained below. We will denote by Sn the set of n × n symmetric real matrices. Definition 3.7 (Pure quadratic form). We say that a functional q : X → R is a pure quadratic form if there exist a symmetric matrix M ∈ Sn such that for every x ∈ X we have q(x) = xT M x. Similarly, a functional q : X × U → R is a pure quadratic form if there exist two symmetric matrices M1 ∈ Sn and M2 ∈ Sm such that for every x ∈ X we have q(x, u) = xT M1 x + uT M2 u. Let X = Rn be a continuous state space (endowed with its euclidean structure), U = Rm a continuous control space and V a finite set of discrete (or switching) controls. We want to solve the following optimization problem T −1 X

min

(3.14)

(xt )t ∈XT t=0 (ut ,vt )t ∈(U×V)T

( s.t.

cvt t (xt , ut ) + ψ(xT )

x0 ∈ X is given, ∀t ∈ [[0, T − 1]], xt+1 = ftvt (xt , ur ) .

Assumptions 3.8. Let t ∈ [[0, T − 1]] and v ∈ V be arbitrary. — The dynamic ftv : X × U −→ X is linear: ftv (x, u) = Avt x + Btv u, for some given matrices Avt and Btv of compatible dimensions. — The cost function cvt : X × U −→ R is a pure convex quadratic form, cvt (x, u) = xT Ctv x + uT Dtv u, where the matrix Ctv is symmetric semidefinite positive and the matrix Dtv is symmetric definite positive. — The final cost function ψ := inf i∈I ψi is a finite infimum of pure convex quadratric form, of matrix Mi ∈ Sn with i ∈ I a finite set, such that there exists a constant αT ≥ 0 satisfying for every i ∈ I 0 Mi αT I. T

— The maximal eigenvalue of the symmetric semidefinite matrix (Avt ) Avt is less than 1: T λmax (Avt ) Avt < 1. One can write the Dynamic Programming principle for (3.14): ( VT = ψ (3.15) ∀t ∈ [[0, T − 1]], Vt : x ∈ X 7→ inf inf cvt (x, u) + φ (ftv (x, u)) . v∈V u∈U

The following result is crucial in order to study this example: the value functions are 2-homogeneous, allowing us to restrict their study to the unit sphere.

22

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Proposition 3.9. For every time step t ∈ [[0, T ]], the value function Vt , solution of (3.15) is 2-homogeneous, i.e. for every x ∈ X and every λ ∈ R we have Vt (λx) = λ2 Vt (x) . Proof. We proceed by backward recursion on the time step t ∈ [[0, T ]]. For t = T it is true by Assumptions 3.8. Assume that it is true for some t ∈ [[1, T ]]. Fix λ ∈ R, then by definition of Vt−1 , for every x ∈ X we have v Vt−1 (λx) = min min cvt−1 (λx, u) + Vt ft−1 (λx, u) , v∈V u∈U

v which yields the result by 2-homogeneity of x 7→ cvt−1 (x, u), linearity of ft−1 and 2-homogenity of Vt .

Thus, in order to compute Vt , one only needs to know its values on the unit x 2 (euclidean) sphere S as for every non-zero x ∈ X, Vt (x) = kxk Vt kxk . We will study a dynamic programming formulation that exploits this fact. For every time t ∈ [[0, T − 1]] and every switching control v ∈ V we define the Bellman operator with fixed switching control Btv for every functional φ : X → R by: Btv (φ) := inf cvt (·, u) + kftv (·, u)k2 φ u∈U

ftv (·, u) kftv (·, u)k

.

For every time t ∈ [[0, T − 1]] we define the Bellman operator Bt for every functional φ : X → R by: Bt (φ) := inf Btv (φ) . v∈V

Then one can rewrite the Dynamic Programming equation (3.15) as ( (3.16)

VT = ψ ∀t ∈ [[0, T − 1]], Vt = Bt (Vt+1 ) .

The point of writing the Dynamic Programming equation this way is to be able to compute the image by the Bellman operator of a functional by only knowing this functional on the unit (euclidean) sphere S. This will ensure that the unit sphere S is Vt -optimal in the sense of Definition 2.4. Now in order to apply the Tropical Dynamic Programming algorithm to (3.16) we need to check Assumptions 1.2. Under Assumptions 3.8, there exist an interval in the cone of symmetric semidefinite matrices which is stable by every Bellman operator Bt in the sense of the proposition below. We will consider the Loewner order on the cone of (real) symmetric semidefinite matrices, i.e. for every couple of matrices of symmetric matrices (M1 , M2 ) we say that M1 M2 if, and only if, M2 − M1 is semidefinite positive. Moreover we’ll identify a pure quadratic form with its symmetric matrix, thus when we write an infimum over symmetric matrices, we mean the pointwise infimum over their associated pure quadratic forms. Proposition 3.10 (Existence of a stable interval). Fix t ∈ [[0, T − 1]]. Under Assumptions 3.8, there exists a constant αt > 0 such that for every β ≥ αt we have: (3.17)

0 M βI ⇒ 0 Bt (M ) βI.

23 Proof. We want to show that if M 0 then Bt (M ) 0. First, as in Proposition 3.3 one can show that the Bellman operator Bt is order preserving. Therefore, if M 0 then Bt (M ) Bt (0). Hence it is enough to show that Bt (0) 0. But by formula (A.2), we have that Bt (0) = minv∈V Ctv 0 (by Assumptions 3.8) hence the result follows. Now, as Bt is order preserving, if M βI then Bt (M ) Bt (βI). Hence it suffices to find β > 0 such that Bt (βI) βI. By formula (A.2) we have:

(3.18)

T

Btv (βI) βI ⇔ β (Avt )

−1 −1 T I + βBtv (Dtv ) (Btv ) Avt + Ctv βI.

Now, using propositions Proposition B.1, Proposition B.2 and using the notations introduced in those propositions, finding β > 0 satisfying equation (3.18) is equivalent to find β > 0 such that T βλmax (Avt ) Avt + λmax (Ctv ) ≤ β. −1 T v v v λmin I + βBt (Dt ) (Bt ) T −1 ≥ 1, it suffices to find β ≥ 0 such that Noting that as λmin I + βBtv (Dt ) (Btv ) T βλmax (Avt ) Avt + λmax (Ctv ) ≤ β, which, under Assumptions 3.8, is equivalent to (3.19)

β≥

λmax (Ctv ) . T 1 − λmax (Avt ) Avt

Finally, by setting αt := max v∈V

λmax (Ctv ) > 0, T 1 − λmax (Avt ) Avt

every β ≥ αt satisfies (3.19) which concludes the proof. For every time step t ∈ [[0, T ]], we define (3.20)

βt := max αs , s∈[[t,T ]]

the basic functions FQu will be pure quadratic convex forms bounded in the Loewner t sense by 0 and βt I, FQu := φ : x ∈ X 7→ xT M x ∈ R | M ∈ Sn , 0 M βt I , t and we define the following class of functions which will be stable by pointwise infimum of elements in FQu t , n o (3.21) FQu := VF | F ⊂ FQu . t t Proposition 3.11. Under Assumptions 3.8, the Problem (3.14) and the Bellman operators defined in (3.15) satisfy the structural assumptions given in Assumptions 1.2.

24

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Proof. — Common regularity. We will show that every element of FQu is 2βt Lipschitz continuous on S. Let F = {φi }i∈N ⊂ FQu with N ⊂ N and φ ∈ FQu i t t with associated symmetric matrix Mi . Fix x, y ∈ S, we have |VF (x) − VF (y)| = | inf xT Mi x − inf y T Mi y| i∈N

i∈N T

T

≤ max |x Mi x − y Mi y| i∈N

≤ max |xT Mi (x − y) + y T Mi (x − y) | i∈N

(M T = M )

≤ max |hx + y, Mi (x − y)i|

(Cauchy-Schwarz)

≤ kx + yk · max kMi (x − y) k

i∈N

i∈N

≤ kx + yk · max kMi k kx − yk i∈N | {z } ≤βt

≤ βt kx + yk ·kx − yk | {z } ≤2

|VF (x) − VF (y)| ≤ 2βt kx − yk2 . — Final condition. By Assumptions 3.8, the final cost function ψ is an element of FQu t . — Stability by the Bellman operators. Let φ be in FQu t+1 . By construction, the symmetric matrix M associated with the pure quadratic form φ is such that 0 M βt+1 I. By (3.20) we have αt ≤ βt+1 ≤ βt , thus 0 M βt I and by Proposition 3.10 we have that 0 Bt (M ) βt I. Qu We have shown that if φ is in FQu t+1 , then Bt (φ) is in Ft . — Stability by pointwise optimum. This is true by construction of FQu t (3.21). — Order preserving operators. Proceed as in Proposition 3.3. — Existence of the value functions. This is a consequence of (A.1) and the fact that for every functionals φ1 , φ2 : X → R we have that

Bt (inf (φ1 , φ2 )) = inf (Bt (φ1 ) , Bt (φ2 )) . — Existence of optimal sets. For every time step t ∈ [[0, T −1]], every compact Xt ⊂ dom(Vt ) (= X) and every functional φ ∈ FQu the unit sphere S satisfies t Bt (φ + δS ) = Bt (φ) , which implies the desired inequality. — Additively M -subhomogeneous operators. Set M ≥ 0 to be M :=

inf

sup kftv (x, u) k2 .

(u,v)∈U×V x∈S

Then M is finite as for every (u, v) ∈ U×V the function ftv (·, u) is continuous over the compact S. Let λ : X → R be a positive constant function, we will

25 Qu identify λ with its value. Let φ be an element of Ft+1 . For any x ∈ S we have : v ft (x, u) Bt (λ + φ) (x) = inf cvt (x, u) + kftv (x, u) k2 (λ + φ) kftv (x, u) k (u,v)∈U×V

≤ λM + Bt (φ) (x). Thus Bt is additively M -subhomogeneous. We now define the selection functions φQu t . At time t ∈ [[0, T − 1]] we define for Qu any F ⊂ Ft and x ∈ X ! φQu t (F, x) ∈ Bt

arg min Bt (φ) (x) . φ∈F

At time t = T we define for any F ⊂ FQu T and x ∈ X we set φQu T (F, x) = arg min ψi (x). ψi

Proposition 3.12. For every time t ∈ [[0, T ]], the function φQu is a compatible t selection function in the sense of Definition 1.3. Proof. Fix t = T . The function φQu is tight and valid as VT = ψ. Now fix T t ∈ [[0, T − 1]]. Let F ⊂ FQu and x ∈ X be arbitrary. We have t Bt (VF ) (x) = Bt inf φ (x) φ∈F v ft (x, u) = inf cvt (x, u) + kftv (x, u) k2 inf φ φ∈F kf v (x, u) k (u,v)∈U×V tv ft (x, u) = inf inf cvt (x, u) + kftv (x, u) k2 φ φ∈F (u,v)∈U×V kftv (x, u) k = inf Bt (φ) (x) φ∈F

= φQu t (F, x) (x). Thus φQu is tight. By similar arguments we have for every x0 ∈ X that t 0 Bt (VF ) (x0 ) = inf Bt (φ) (x0 ) ≥ φQu t (F, x) (x ). φ∈F

As the structural assumptions, Assumptions 1.2 are satisfied, as the functions φQu t , 0 ≤ t ≤ T are compatible selections and the unit sphere S is Vt -optimal (case opt = inf) we have by Theorem 2.6 the following convergence result Theorem 3.13 (Upper (inner) approximations of the value functions). Under Assumptions 3.8, for every t ∈ [[0, T ]], denote by Vtk k∈N the sequence of functionals generated by Tropical Dynamic Programming with the selection function φQu and the t draws made uniformly overthe sphere Kt := S. Then the sequence Vtk k∈N is nonincreasing, bounded from below by Vt and converges uniformly to Vt∗ on S. Moreover, almost surely over the draws, Vt∗ = Vt on S.

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

26

Finally, we briefly explain why, by adding another dimension to the state variable, there’s no loss of generality induced by the choice of studying pure quadratic forms in (3.14) instead of quadratic forms, nor is there one for studying linear dynamics instead of affine dynamics. First, we define the operator θ that maps any functional φ : X → R to a 2-homogeneous functional φθ

θ: R

(3.22)

X

φ

X×R

−→

R

7−→

φθ : (x, y) 7→ y 2 φ

x y

if y 6= 0, 0 otherwise.

When φ is affine, abusively denote by θ again the operator that maps any affine functional φ to the linear functional φθ ,

(3.23)

θ: R

X

φ

X×R

−→

R

7−→

φθ : (x, y) 7→ yφ

x y

if y 6= 0, 0 otherwise.

Now consider (Bt )t∈[[0,T −1]] the Bellman operators associated to an analogous optimization problem as (3.14) but where we allow the costs and final cost functions to be general quadratic forms and the dynamic to be affine. Furthermore, one can turn this non-homogeneous optimization problem into a homogeneous problem by applying θ to the costs functions, final cost function and dynamics of the previous optimization problem. We note by Btθ t∈[[0,T −1]] the Bellman operators of the homogenized optimization problem. Then one can deduce Bt from Btθ θ, meaning that if one can solve the (homogenized) optimization problem (3.14) where the costs are pure quadratics forms and the dynamics linear, then one can solve the (non-homogeneous) problem where the costs are quadratic forms and the dynamics are affine. Proposition 3.14. Define for every t ∈ [[0, T − 1]], two operators Bt :

Btθ

R φ

X

−→ 7−→

R x 7→ minu∈U ct (x, u) + φ (ft (x, u)) ,

X×R

−→ 7−→

R (x, y) 7→ minu∈U cθt (x, y, u) + φ ftθ (x, y, u) .

R

φ

X

X×R

For every functional φ : X → R, for every x ∈ X and y ∈ R∗ we have that Btθ (θφ) (x, y) = θBt (φ) (x, y)

and

Bt (φ) (x) = Btθ (θφ) (x, 1) .

Proof. First, remark that if the first equality holds, then as for every x ∈ X one has θBt (φ) (x, 1) = Bt (φ) (x), setting y = 1 one gets the second equality. Now fix a

27 6.5 6 5.5 5 4.5 4 3.5 3 2.5 2 1.5 1 -4

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

2.5

3

0.5 -2.8

-2.6

-2.4

-2.2

-2

-1.8

-1.6

-1.4

-1.2

-1

-0.8

-0.6

-0.4

-0.2

0

Fig. 5. Illustration of the multistage (two stages here) optimization problem studied in Example 3.15. On the right we have the final cost function (infimum of ψ1 and ψ2 ) and on the left we have the image of ψ1 and ψ2 by the Bellman operator B. The image of ψ2 is strictly greater than the image of ψ1 . At the final step t = 1, the best function at the point −1 is ψ2 . The image by the k-th optimal dynamic of x0 = −2 is x1 = −1.

functional φ : X → R, x ∈ X and y ∈ R∗ , we have that Btθ (θφ) (x, y) = min cθt (x, y, u) + φθ ftθ (x, y, u) u∈U x u x u = min y 2 ct , + φθ yct , u∈U y y y y x u u x = y 2 min ct , , + φ ct u∈U y y y y x 0 x 0 , u + φ c , u = y 2 min c t t u0 ∈U y y x = y 2 Bt (φ) y = θBt (φ) (x). Example 3.15. We show an example where approximating from above fails to converge when the points are drawn accordingly to optimal trajectories for the current approximations (as done in subsection 3.1 where we approximate from below). As shown by Proposition 3.14 there’s no loss of generality in considering the framework of this section but with non-homogeneous functions. We consider a (non-homogeneous) problem with only two time steps, that is T = 1 and t ∈ {0, 1} such that — The state space X and the control space U are equal to R. — The linear dynamic is f (x, u) = x + u. — The quadratic cost is c(x, u) = x2 + u2 . — The final cost function is the infimum between two quadratics, ψ1 (x) = (x + 2)2 + 1 and ψ2 (x) = x2 i.e. ψ = inf (ψ1 , ψ2 ) . Here the Bellman operator B of this multistage optimization problem is defined

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

28

for every φ : X → R and every x ∈ X by B (φ) (x) = min x2 + u2 + φ (x + u) = x2 + min u2 + φ (x + u) . u∈U

u∈U

2

For the case where φa,b (·) = (· + a) + b with a, b ∈ R one has for every x ∈ R B (φa,b ) (x) =

(3.24)

3 2 x + ax + b. 2

Fix x0 = xk0 = −2 for every k ∈ N. As described in Algorithm 1.1, the approximations of the value functions V1 and V0 are initialized to +∞. Thus every control u ∈ U is optimal in the sense that u ∈ arg minu0 ∈U x2 + (u0 )2 + φ (x + u0 ). Hence if we set x01 := −1 = f (x0 , 1) then (x0 , x01 ) is an optimal trajectory as described in Proposition 3.5. We deduce from (3.24) the following facts, illustrated in Figure 5. 1. The image of ψ2 is strictly greater than the image of ψ1 by the Bellman operator B, i.e. B (ψ2 ) (−2) > B (ψ1 ) (−2). 2. The image by the k-th optimal dynamic of −2 is −1, i.e. setting uk0 := arg minu0 ∈U (−2)2 + (u0 )2 + V1k (−2 + u0 ) (the arg min is here a singleton) one has f −2, uk0 = −1. 3. At the final step t = 1, the best function at the point −1 is ψ2 , i.e. ψ(−1) = inf (ψ1 (−1), ψ2 (−2)) = ψ2 (−2). From those three facts, one can deduce starting x0 = −2 and x1 = −1, the optimal trajectory for the current approximations will always be sent to x1 = −1. But, as shown in the proof of Proposition 3.12 one can show that the image by B of an infimum is the infimum of the images by B: V0 (−2) = B (inf (ψ1 , ψ2 )) (−2) = inf (B (ψ1 ) (−2), B (ψ1 ) (−2)) . Thus for every k ∈ N, V0 (−2) = B (ψ1 ) (−2) < B (ψ1 ) (−2) = V0k (−2). Conclusion. In this article we have devised an algorithm, Tropical Dynamic Programming, that encompases both a discrete time version of Qu’s work and the SDDP algorithm in the deterministic case. We have shown in the last section that Tropical Dynamic Programming can be applied to two natural frameworks: one for Qu’s adaptation and one for SDDP. In the case where both framework intersects, one could apply Tropical Dynamic Programming with the selection functions φQu t and get non-increasing upper approximations of the value function. Simultaneously, by applying Tropical Dynamic Programming with the selection function φSDDP , one t would get non-decreasing lower approximations of the value function. Moreover, we have shown that the upper approximations are, almost surely, asymptotically equal to the value function on the whole space of states X and that the lower approximations are, almost surely, asymptotically equal to the value function on a set of interest. Thus, in those particular cases we get converging bounds for V0 (x0 ), which is the value of the multistage optimization problem (0.1), along with asymptotically exact minimizing policies. In those cases, we have shown a possible way to address the issue

29 of computing efficient upper bounds when running the SDDP algorithm by simultaneously running another algorithm (namely TDP with Qu’s selection functions). This claim has yet to be tested numerically: the results presented here act as a safeguard and are not proofs of efficiency. In future works, we would like to extend the current framework to risk-averse multistage stochastic optimization problems, explicitly give a way to generate deterministic converging upper and lower bounds and to provide numerical experiments. REFERENCES [1] R. Bellman, The theory of dynamic programming, Bulletin of the American Mathematical Society, 60 (1954), pp. 503–515. [2] R. Bellman, Dynamic Programming, Princeton Univ. Pr, Princeton, NJ, 1984. OCLC: 830865530. [3] D. P. Bertsekas, Dynamic Programming and Optimal Control. Vol. 1: ..., no. 3 in Athena scientific optimization and computation series, Athena Scientific, Belmont, Mass, fourth ed ed., 2016. [4] J. V. Burke and P. Tseng, A Unified Analysis of Hoffman’s Bound via Fenchel Duality, SIAM Journal on Optimization, 6 (1996), pp. 265–282, https://doi.org/10.1137/0806015. [5] S. Dreyfus, Richard Bellman on the Birth of Dynamic Programming, Operations Research, 50 (2002), pp. 48–51, https://doi.org/10.1287/opre.50.1.48.17791. [6] R. M. Dudley, Real Analysis and Probability, no. 74 in Cambridge studies in advanced mathematics, Cambridge Univ. Press, Cambridge, 2. ed ed., 2003. OCLC: 249706200. [7] W. Fleming, Functions of Several Variables, Undergraduate Texts in Mathematics, Springer New York, New York, NY, 1977, https://doi.org/10.1007/978-1-4684-9461-7. [8] P. Girardeau, V. Leclere, and A. B. Philpott, On the Convergence of Decomposition Methods for Multistage Stochastic Convex Programs, Mathematics of Operations Research, 40 (2015), pp. 130–145, https://doi.org/10.1287/moor.2014.0664. [9] V. Guigues, SDDP for some interstage dependent risk-averse problems and application to hydro-thermal planning, Computational Optimization and Applications, 57 (2014), pp. 167–203, https://doi.org/10.1007/s10589-013-9584-1. ¨ misch, Sampling-Based Decomposition Methods for Multistage Stochas[10] V. Guigues and W. Ro tic Programs Based on Extended Polyhedral Risk Measures, SIAM Journal on Optimization, 22 (2012), pp. 286–312, https://doi.org/10.1137/100811696. [11] P. Lancaster and L. Rodman, Algebraic Riccati Equations, Oxford science publications, Clarendon Press ; Oxford University Press, Oxford : New York, 1995. [12] W. McEneaney, Max-Plus Methods for Nonlinear Control and Estimation, Systems & Control: Foundations & Applications, Birkh¨ auser-Verlag, Boston, 2006, https://doi.org/10.1007/ 0-8176-4453-9. [13] W. M. McEneaney, H. Kaise, and S. H. Han, Idempotent Method for Continuous-Time Stochastic Control and Complexity Attenuation, IFAC Proceedings Volumes, 44 (2011), pp. 3216–3221, https://doi.org/10.3182/20110828-6-IT-1002.03343. [14] M. V. F. Pereira and L. M. V. G. Pinto, Multi-stage stochastic optimization applied to energy planning, Mathematical Programming, 52 (1991), pp. 359–375, https://doi.org/10. 1007/BF01582895. [15] Z. Qu, Nonlinear Perron-Frobenius Theory and Max-plus Numerical Methods for HamiltonJacobi Equations, PhD thesis, Ecole Polytechnique X, Oct. 2013. [16] Z. Qu, A max-plus based randomized algorithm for solving a class of HJB PDEs, in 53rd IEEE Conference on Decision and Control, Dec. 2014, pp. 1575–1580, https://doi.org/10.1109/ CDC.2014.7039624. [17] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, no. 317 in Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen, Springer, Dordrecht, corr. 3. print ed., 2009. OCLC: 845635447. [18] L. Schwartz, Analyse. 1: Th´ eorie Des Ensembles et Topologie, no. 42 in Collection Enseignement des sciences, Hermann, Paris, nouv. tirage ed., 1995. OCLC: 247315724. [19] A. Shapiro, Analysis of stochastic dual dynamic programming method, European Journal of Operational Research, 209 (2011), pp. 63–72, https://doi.org/10.1016/j.ejor.2010.08.007. [20] J. Zou, S. Ahmed, and X. A. Sun, Stochastic dual dynamic integer programming, Mathematical Programming, (2018), https://doi.org/10.1007/s10107-018-1249-5.

30

MARIANNE AKIAN, JEAN-PHILIPPE CHANCELIER AND BENOˆIT TRAN

Appendix A. Algebraic Riccati Equation. This section gives complementary results for subsection 3.2. We use the same framework and notations introduced in subsection 3.2. Proposition A.1. Fix a discrete control v ∈ V and a time step t ∈ [[0, T − 1]]. — The operator Btv restricted to the pure quadratic forms (identified with Sn the space of the symmetric semidefinite positive matrices) is given by the discrete time algebraic Riccati equation (A.1) Btv : Sn −→ S+ n T M 7−→ Ctv + (Avt ) M Avt −1 T T T − (Avt ) M Btv Dtv + Btv M (Btv ) (Btv ) M Avt . — We can rewrite the discrete time algebraic Riccati equation Appendix B (A.1). For every M ∈ S+ n we have: −1 T −1 T (A.2) Btv (M ) = (Avt ) M I + Btv (Dtv ) (Btv ) M Avt + Ctv . Proof. — First note that if M is symmetric, then Btv (M ) is also symmetric. Let t ∈ {T − 1, T − 2, . . . , 0} and M ∈ S+ n . Let x ∈ X, we have: T v ftv (x, u) ft (x, u) v v 2 M Bt (M )(x) = inf ct (x, u) + kft (x, u)k2 u∈U kftv (x, u)k2 kftv (x, u)k2 T

= inf xT Ctv x + uT Dtv u + ftv (x, u) M ftv (x, u) u∈U

(A.3) Btv (M )(x) = xT Ctv x + inf uT Dtv u + ftv (x, u)T M ftv (x, u). u∈U

As u 7→ ft (x, u) is linear, Dtv 0 and M 0 we have that g : u ∈ U 7→ uT Dtv u + ftv (x, u)T M ftv (x, u) ∈ R is convex, hence is minimal when ∇g(u) = 0 i.e. for u ∈ U such that : T T (A.4) Dtv + (Btv ) M Btv u + (Btv ) M (Avt ) x = 0. T

Now we will show that Dtv + (Btv ) M Btv is invertible. As M is symmetric semidefinite positive, for every u ∈ U we have : T T uT Dtv + (Btv ) M Btv u = uT Dtv u + (Btv u) M (Btv u) {z } | ≥0

T

Dtv u

≥u | {z } >0

> 0. T

We have shown that the symmetric matrix Dtv + (Btv ) M Btv is definite positive and thus invertible. We conclude that equation (A.4) is equivalent to: −1 T T (A.5) u = − Dtv + (Btv ) M Btv (Btv ) M (Avt ) x.

31 Finally replacing (A.5) in equation (A.3) we get after simplifications that T Btv (M )(x) = xT Ctv + (Avt ) M Avt −1 T T T − (Avt ) M Btv Dtv + (Btv ) M Btv (Btv ) M Avt x, which gives (A.1). — See [11, Proposition 12.1.1 page 271]. Appendix B. Smallest and greatest eigenvalues. Here we recall some formulas on the lowest and greatest eigenvalues of a matrix. Proposition B.1. Let A and B two symmetric real matrices. Denote the smallest eigenvalue of a symmetric real matrix M by λmin (M ) (every eigenvalue of M is real) and by λmax (M ) its greatest eigenvalue. We have the following inequalities : 1. λmin (A + B) ≥ λmin (A) + λmin (B), 2. λmax (A + B) ≤ λmax (A) + λmax (B). Proposition B.2. Let A be a real matrix and B be a symmetric real matrix. Denote by λmin (M ) the smallest eigenvalue of a symmetric real matrix M (every eigenvalue of M is real) and by λmax (M ) its greatest eigenvalue. We have the following inequalities : 1. λmin (AT BA) ≥ λmin (AT A)λmin (B), 2. λmax (AT BA) ≤ λmax (AT A)λmax (B). Moreover if A and B are symmetric definite positive matrices, then we have: 1. λmin (AB) ≥ λmin (A)λmin (B), 2. λmax (AB) ≤ λmax (A)λmax (B). Acknowledgments.