Cybernetics and Systems Analysis, Vol. 41, No. 2, 2005

MINORANT METHODS OF STOCHASTIC GLOBAL OPTIMIZATION

UDC 519.853.4

V. I. Norkin and B. O. Onishchenko

Generalizations of the branch and bound method and of the Piyavskii method for the solution of stochastic global optimization problems are considered. These methods employ the concept of a tangent minorant of an objective function as a source of global information about the function. A calculus of tangent minorants is developed.

Keywords: stochastic global optimization, nonconvex stochastic programming, branch and bound algorithm, Piyavskii method, stochastic bounds, tangent minorants.

INTRODUCTION

In this paper, we consider numerical methods for solving stochastic global optimization problems that employ, in one way or another, the concept of a tangent minorant (a stochastic tangent minorant) of the objective function. In particular, the well-known Piyavskii method of deterministic global optimization and the branch and bound algorithm are generalized to stochastic global optimization problems (for expectation and probability functions).

The Piyavskii method [1-3] has been "rediscovered" repeatedly and is one of the popular methods of deterministic global optimization [4]. It has two equivalent forms: for optimization of maximum functions and for optimization of functions that admit so-called tangent minorants [5]. Tangent minorants are the key concept of this method. The main difficulty of the Piyavskii method in the multidimensional case is the solution of auxiliary approximating multiextremal problems.

The branch and bound algorithm is one of the basic methods of discrete and deterministic global optimization [4]. It is characterized by a partition of the initial admissible set (for example, into parallelepipeds, simplexes, etc.), by estimates of the optimal value of the objective function on the subsets (relaxation of constraints, dual estimates, etc.), and by a refinement strategy. Variants of the method differ mainly in how lower estimates of the optimal value of the objective function on an element of the search-area partition are derived.

The stochastic optimization problem is to minimize an expectation or probability function. The difficulty of this problem is that its objective function cannot be calculated precisely: only statistical estimates of its values and, possibly, of its gradients are available. The task is to find local and global minima of the problem using these estimates. The literature on solving convex stochastic optimization problems is quite extensive [6, 7].
Problems and methods of finding local minima in nonconvex stochastic optimization problems are considered in [8]. Some problems of stochastic global optimization and a stochastic branch and bound algorithm for their solution are studied in [9-11], where branches (subproblems) are estimated by means of interchange relaxation, i.e., by interchanging the operations of minimization and integration (taking the expectation or probability). In [12-14], this stochastic branch and bound algorithm is applied to global optimization of probability functions, with applications to the control of environmental contamination, and in [15, 16] to problems of optimal routing and project control. As an example, we will solve a nonconvex nonsmooth allocation problem by methods of stochastic global optimization.

V. M. Glushkov Institute of Cybernetics, National Academy of Sciences of Ukraine, Kiev, Ukraine, [email protected]; [email protected] Translated from Kibernetika i Sistemnyi Analiz, No. 2, pp. 56-70, March-April 2005. Original article submitted June 10, 2004. 1060-0396/05/4102-0203


The main result of this paper, generalizing [17-19], is the extension of the deterministic Piyavskii method and of the classical branch and bound algorithm to stochastic global optimization problems (expectation and probability functions). A common feature of these methods is the use of tangent minorants of the objective function as a source of global information about this function. Thus, the ways of constructing tangent minorants are important. For example, such tangent minorants may be tangent cones, which use values of the function and its Lipschitz constant, or tangent paraboloids, which use values of the function, its gradients, and the Lipschitz constant of the gradients. When tangent paraboloids rather than cones are applied, the efficiency of the method increases considerably [18]. A calculus of tangent minorants for complex nonconvex objective functions is developed in [5].

In the present paper, we discuss new methods for the calculation of tangent minorants, in particular, for functions of minimum and maximum, and stochastic minorants for expectation and probability functions. The introduction of stochastic minorants is similar to the generalization of the gradient method for deterministic problems to the stochastic quasigradient method for stochastic programming problems. It may be problematic to derive deterministic minorants, as well as deterministic gradients, of expectation functions; however, it is quite possible to calculate and apply stochastic minorants and stochastic quasigradients. Another common feature of the considered methods of stochastic global optimization is the use of a sequence of uniform approximations of the objective function and of tangent minorants of these approximations. Thus, we obtain new modifications of the Piyavskii method and of the branch and bound algorithm for the solution of so-called limiting extremum problems, where the objective function is optimized through a sequence of approximating functions.
Auxiliary approximating problems in the multidimensional Piyavskii method are solved approximately, by dividing the search area into subsimplexes and searching for a subsimplex with the least lower estimate of the approximating function. This variant of the method then turns into the branch and bound algorithm with minorant estimates of branches. An important feature of the classical branch and bound algorithm is the possibility of rejecting unpromising branches. However, this cannot be done when stochastic estimates of branches are used in stochastic programming problems, since the global extremum might be lost. In one of the modifications of the branch and bound algorithm, we do not reject branches (subsets of the partition) with poor estimates but aggregate them, i.e., come back to a coarser partition of the search area, doing this no more than a finite number of times.

1. STOCHASTIC TANGENT MINORANTS OF FUNCTIONS

Let us consider the stochastic global optimization problem

$$\min_{x \in X} \, [F(x) = \mathbf{E} f(x, \theta)] \tag{1}$$

or

$$\min_{x \in X} \, [P(x) = \mathbf{P}\{f(x, \theta) \ge 0\}], \tag{2}$$

where $\theta$ is a random parameter; $\mathbf{E}$ is the symbol of expectation in $\theta$; $f(x, \theta)$ is a function continuous in $x$ and integrable in $\theta$; $\theta \in \Theta$; $(\Theta, \Sigma, \mathbf{P})$ is the probability space of the problem; $\mathbf{P}\{\cdot\}$ is the symbol of probability; and $X$ is a continuous or discrete set. We assume that for each $\theta$ the functions $f(\cdot, \theta)$ admit minorants $\varphi(x, y, \theta)$ tangent at points $y$, and thus we implicitly assume [5] that the functions $f(\cdot, \theta)$ are maximum functions: $f(x, \theta) = \max_{y \in X} \varphi(x, y, \theta)$. In effect, we consider global optimization problems of the form

$$\min_{x \in X} \, \Big[F(x) = \mathbf{E} \max_{y \in Y} \psi(x, y, \theta)\Big] \quad \Big(\min_{x \in X} \, \Big[F(x) = \mathbf{E} \min_{y \in Y} \psi(x, y, \theta)\Big]\Big), \tag{3}$$

$$\min_{x \in X} \, \Big[P(x) = \mathbf{P}\{\max_{y \in Y} \psi(x, y, \theta) \ge 0\}\Big] \quad \Big(\min_{x \in X} \, \Big[P(x) = \mathbf{P}\{\min_{y \in Y} \psi(x, y, \theta) \ge 0\}\Big]\Big), \tag{4}$$

where $Y$ is a finite or infinite set. These are stochastic minimax (minimin) problems. Their applications and some methods of their local optimization are considered in [6, 20].


Definition 1 [5]. Let $X$ be a topological space and let the functions $F(x)$, $x \in X$, and $\varphi(x, y)$, $x \in X$, $y \in X$, be related by the conditions: (a) $F(x) \ge \varphi(x, y)$ for all $x \in X$, $y \in X$; (b) $F(y) = \varphi(y, y)$ for all $y \in X$; (c) the function $\varphi(x, y)$ is continuous in $x$ equipotentially in $y$. Then the functions $\{\varphi(\cdot, y), \ y \in X\}$ are called minorants of $F(x)$ tangent at the points $y$.

Definition 2. The function $\varphi(x, y)$ is called continuous in $x \in X$ equipotentially in $y \in Y$ if for any $\varepsilon > 0$ there exists $\delta > 0$ (independent of $y$) such that $|\varphi(x_1, y) - \varphi(x_2, y)| \le \varepsilon$ for any $y \in Y$ and $x_1, x_2 \in X$ with $\|x_1 - x_2\| \le \delta$.

LEMMA 1. If the function $\varphi(x, y)$ is jointly continuous in the variables $(x, y)$, then it is uniformly continuous in $(x, y)$ and, therefore, continuous in $x$ equipotentially in $y$. Thus, if $\lim_{k \to \infty} \varphi(x^k, y^k) = \varphi(x, y)$ for any sequences $x^k \to x$, $y^k \to y$, then the function $\varphi(x, y)$ is continuous in $x$ equipotentially in $y$.

Remark 1. Tangent majorants can be defined similarly. S. A. Piyavskii [1] considered minorants continuous in $(x, y)$, V. I. Norkin [5] analyzed minorants continuous in $x$ equipotentially in $y$, and O. Khamisov [21] considered concave minorants, possibly discontinuous in $x$.

Tangent minorants are closely related to maximum functions. On the one hand, tangent minorants can easily be constructed for maximum functions; on the other hand, functions admitting tangent minorants are maximum functions.

LEMMA 2 [5]. If $f(x) = \max_{z \in Z} \psi(x, z) = \psi(x, z(x))$ is a maximum function, where $\psi(x, z)$ is continuous in $x$ equipotentially in $z \in Z$, then the function $\varphi(x, y) = \psi(x, z(y))$ is obviously a minorant of $f(x)$ tangent at the point $y$.

THEOREM 1 [5]. If $\{\varphi(\cdot, y)\}_{y \in X}$ is a family of tangent minorants of the function $f(\cdot)$ in the sense of Definition 1, then $f(\cdot)$ is continuous and can be represented as a maximum function: $f(x) = \sup_{y \in X} \varphi(x, y)$.
To solve the stochastic programming problem, let us introduce the concept of stochastic tangent minorants.

Definition 3. The functions $\{\phi(\cdot, y, \theta), \ y \in X, \ \theta \in \Theta\}$, where $\Theta$ is the carrier of some probability space $(\Theta, \Sigma, \mathbf{P})$, are called stochastic tangent minorants of $F(x)$ if the functions $\phi(x, y, \theta)$ are measurable in $\theta$ and the expectations $\varphi(x, y) = \mathbf{E}\phi(x, y, \theta)$ are finite and are minorants of $F(x)$ tangent at the point $y$, for each $y \in X$ (in the sense of Definition 1).

The following lemma shows that one may take tangent minorants of the integrand $f(x, \theta)$ as stochastic tangent minorants of the expectation function $F(x) = \mathbf{E} f(x, \theta)$. Here the situation is similar to the calculation of stochastic gradients of the expectation function [6].

LEMMA 3 [17]. Assume that the functions $f(\cdot, \theta)$ admit tangent minorants $\phi(x, y, \theta)$ at the points $y \in X$, i.e., the following holds for almost all $\theta$: 1) $f(x, \theta) \ge \phi(x, y, \theta)$ for all $x \in X$, $y \in X$; 2) $f(y, \theta) = \phi(y, y, \theta)$ for all $y \in X$; 3) the function $\phi(x, y, \theta)$ is continuous in $(x, y)$ for almost all $\theta$; 4) $\phi(x, y, \theta)$ is measurable in $\theta$ for any $x, y \in X$; 5) $|\phi(x, y, \theta)| \le M(\theta)$ for all $x, y \in X$ with an integrable function $M(\theta)$. Then the functions $\varphi(x, y) = \mathbf{E}\phi(x, y, \theta)$ are continuous and are tangent minorants of the expectation function $F(x) = \mathbf{E} f(x, \theta)$.

Proof. Conditions (a) and (b) of Definition 1 follow from conditions 1) and 2). Continuity of $\varphi(x, y)$ and, thus, condition (c) of Definition 1 follow from conditions 3) and 4) by the Lebesgue dominated convergence theorem and by Lemma 1.

Remark 2. Tangent minorants of the probability function $P(x) = \mathbf{P}\{f(x, \theta) \ge 0\}$ can be constructed similarly; namely, one may take as the minorant of $P(x)$ tangent at the point $y$ the function $\varphi(x, y) = \mathbf{P}\{\phi(x, y, \theta) \ge 0\}$, where $\phi(x, y, \theta)$ is the minorant of the function $f(x, \theta)$ tangent at the point $y$.
Assume now that the integrand in (1) can be represented explicitly as a maximum function. The following theorem is a stochastic analog of Lemma 2.

THEOREM 2. Let $F(x) = \mathbf{E} f(x, \theta)$. Assume that the following holds for each $\theta \in \Theta$:

1) $f(x, \theta) = \sup_{z \in Z} \psi(x, z, \theta) = \psi(x, z(x, \theta), \theta)$, $x \in X$, where $Z$ is a compact set and $z(x, \theta)$ is a single-valued $(x, \theta)$-measurable selection of the mapping $Z(x, \theta) = \arg\max_{z \in Z} \psi(x, z, \theta)$;

2) $\psi(x, z, \theta)$ is continuous in $(x, z)$ and measurable in $\theta$;

3) $|\psi(x, z, \theta)| \le M(\theta)$ for any $(x, z)$ with an integrable function $M(\theta)$.

Then the functions $\{\phi(x, y, \theta) = \psi(x, z(y, \theta), \theta)\}_{y \in X}$ form a family of minorants of $f(x, \theta)$ tangent at the points $y$, and, therefore, the functions $\varphi(x, y) = \mathbf{E}\phi(x, y, \theta)$ form tangent minorants of $F(x) = \mathbf{E} f(x, \theta)$.

Proof. Continuity of $f(x, \theta)$ for each $\theta$ follows from condition 2) and Theorem 1. Existence of an $(x, \theta)$-measurable selection $z(x, \theta) \in Z(x, \theta)$ follows from measurability of the mapping $Z(x, \theta)$ [22, § 8.1]. Then the function $\phi(x, y, \theta) = \psi(x, z(y, \theta), \theta)$ is measurable in $\theta$. Continuity of $F(x)$ follows from the Lebesgue theorem on passage to the limit under the integral sign. Conditions (a) and (b) follow from the relations

$$f(x, \theta) = \psi(x, z(x, \theta), \theta) \ge \psi(x, z(y, \theta), \theta) = \phi(x, y, \theta),$$
$$f(y, \theta) = \psi(y, z(y, \theta), \theta) = \phi(y, y, \theta).$$

Let us check condition (c). Let $x^k \to x$ and $y^k \to y$. Since $Z(y, \theta)$ is upper semicontinuous and $Z$ is compact, we may assume that $z(y^k, \theta) \to z \in Z(y, \theta)$; therefore,

$$\phi(x^k, y^k, \theta) = \psi(x^k, z(y^k, \theta), \theta) \to \psi(x, z, \theta) = \psi(x, z(y, \theta), \theta) = \phi(x, y, \theta).$$

Thus, the functions $\phi(x, y, \theta) = \psi(x, z(y, \theta), \theta)$ are continuous in $(x, y)$ and, therefore, by Lemma 1, are continuous in $x$ equipotentially in $y$.

Theorem 2 specifies how to construct stochastic tangent minorants for maximum functions, including functions of a discrete maximum. In particular, if $f(x, \theta) = |\psi(x, \theta)| = \max\{-\psi(x, \theta), \psi(x, \theta)\}$ and the functions $\psi(x, \theta)$ have appropriate tangent minorants and majorants, then minorants of $f(x, \theta)$ can be constructed according to this theorem.
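The construction of Lemma 2 and Theorem 2 (freezing a maximizer $z(y)$ attained at the tangency point) can be sketched numerically. The following minimal Python example uses a hypothetical finite family of parabolas, not a function from the paper, and checks properties (a) and (b) of Definition 1 on a grid:

```python
import numpy as np

# f(x) = max_{z in Z} psi(x, z) over a finite set Z; the tangent minorant at y
# freezes a maximizer z(y) attained at y (the Lemma 2 construction).
Z = np.array([-1.0, 0.0, 1.0])

def psi(x, z):
    return -(x - z) ** 2 + z          # each psi(., z) is a concave parabola

def f(x):
    return max(psi(x, z) for z in Z)  # a (discrete) maximum function

def minorant(y):
    z_y = max(Z, key=lambda z: psi(y, z))   # a maximizer z(y)
    return lambda x: psi(x, z_y)            # phi(., y) = psi(., z(y))

xs = np.linspace(-2.0, 2.0, 201)
for y in (-1.5, 0.3, 1.2):
    phi = minorant(y)
    assert all(phi(x) <= f(x) + 1e-12 for x in xs)  # property (a)
    assert abs(phi(y) - f(y)) < 1e-12               # property (b), tangency
```

Since each frozen branch $\psi(\cdot, z(y))$ is one term of the maximum, it lies below $f$ everywhere and touches it at $y$.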
Let us consider other ways of constructing stochastic tangent minorants.

Tangent Cones. If the functions $f(x, \theta)$ are Lipschitz (Hölder) with a Lipschitz constant $L(\theta)$ integrable in $\theta$ and index $\alpha$:

$$|f(x_1, \theta) - f(x_2, \theta)| \le L(\theta)\|x_1 - x_2\|^{\alpha} \quad \forall x_1, x_2 \in X, \quad 0 < \alpha \le 1,$$

then it is possible to take the function

$$\phi(x, y, \theta) = f(y, \theta) - L(\theta)\|x - y\|^{\alpha}$$

as a minorant of $f(x, \theta)$ tangent at the point $y$. Note that different norms are possible here; for example, one may use the norm $\|z\| = \big(\sum_{i=1}^{n} z_i^2\big)^{1/2}$ or $\|z\| = \max_{1 \le i \le n} |z_i|$ for an $n$-dimensional vector $z$.
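As a concrete illustration of the cone construction, the following Python sketch (with an illustrative test function and Lipschitz constant, not taken from the paper) builds cone minorants at several tangency points and their pointwise-maximum lower envelope:

```python
import numpy as np

def cone_minorant(f, y, L, alpha=1.0):
    """Tangent cone minorant phi(., y) = f(y) - L * |x - y|**alpha."""
    fy = f(y)
    return lambda x: fy - L * np.abs(x - y) ** alpha

# Illustration on a 1-D Lipschitz function (L = 1 works for sin).
f = np.sin
L = 1.0
ys = [0.5, 2.0, 4.0]                        # tangency points y
minorants = [cone_minorant(f, y, L) for y in ys]

xs = np.linspace(0.0, 5.0, 501)
envelope = np.max([m(xs) for m in minorants], axis=0)   # lower envelope

assert np.all(envelope <= f(xs) + 1e-12)    # minorant property (a)
for y, m in zip(ys, minorants):
    assert abs(m(y) - f(y)) < 1e-12         # tangency property (b)
```

The envelope is exactly the function $\phi(x) = \max_{y \in Z} \phi(x, y)$ used later to bound the optimal value from below.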

Tangent Paraboloids. As stochastic tangent minorants for functions $f(x, \theta)$ smooth in $x$ with a Lipschitz gradient (with constant $L_1(\theta)$), it is possible to use paraboloids tangent to the graph of $f(\cdot, \theta)$ at the points $y$:

$$\phi(x, y, \theta) = f(y, \theta) + \frac{1}{2L_1(\theta)}\|\nabla f(y, \theta)\|^2 - \frac{L_1(\theta)}{2}\Big\|x - y - \frac{1}{L_1(\theta)}\nabla f(y, \theta)\Big\|^2 .$$

Let us consider some more ways of constructing tangent minorants.

Minorants of Composite Functions. Let $f(x) = f_0(f_1(x), \ldots, f_m(x))$, $x \in X$, where $X$ is a topological space and $f_0(z)$ is a monotonically increasing continuous function on the set $Y = \{(f_1(x), \ldots, f_m(x)) \in R^m \mid x \in X\}$. Let the functions $f_i(\cdot)$, $i = 1, \ldots, m$, have tangent minorants $\varphi_i(x, y)$, $i = 1, \ldots, m$. Then the functions $\{\varphi(\cdot, y) = f_0(\varphi_1(\cdot, y), \ldots, \varphi_m(\cdot, y))\}_{y \in X}$ are tangent minorants of $f(x)$ [5].

Minorants of a Difference of Convex Functions. If the function $f(x)$ can be represented as a difference $f(x) = f_1(x) - f_2(x)$ of two functions $f_1(x)$ and $f_2(x)$ convex on a compact set $X \subset R^n$, then the functions $\{\varphi(x, y) = f_1(y) + \langle g(y), x - y\rangle - f_2(x)\}_{y \in X}$, where $g(y)$ is a generalized gradient of the function $f_1(\cdot)$ at the point $y$, are concave tangent minorants of $f(x)$ on $X$ [5].
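The paraboloid formula above is the completed-square form of the standard descent-lemma bound $f(x) \ge f(y) + \langle \nabla f(y), x - y\rangle - \frac{L_1}{2}\|x - y\|^2$. A Python sketch on an illustrative smooth function (not from the paper) checks both the minorant property and the term-by-term equivalence of the two forms:

```python
import numpy as np

def paraboloid_minorant(f, grad, y, L1):
    """Tangent paraboloid f(y) + <grad f(y), x - y> - (L1/2)||x - y||^2,
    valid when grad f is Lipschitz with constant L1."""
    fy, gy = f(y), grad(y)
    return lambda x: fy + gy * (x - y) - 0.5 * L1 * (x - y) ** 2

f, grad, L1 = np.sin, np.cos, 1.0    # |(sin)''| <= 1, so L1 = 1 suffices
y = 1.0
phi = paraboloid_minorant(f, grad, y, L1)

xs = np.linspace(-3.0, 5.0, 801)
assert np.all(phi(xs) <= f(xs) + 1e-12)     # minorant property (a)
assert abs(phi(y) - f(y)) < 1e-12           # tangency property (b)

# The completed-square form in the text agrees with the descent-lemma form:
alt = lambda x: f(y) + grad(y) ** 2 / (2 * L1) \
                - 0.5 * L1 * (x - y - grad(y) / L1) ** 2
assert np.allclose(phi(xs), alt(xs))
```

Because the paraboloid matches both the value and the gradient at $y$, it hugs the function much more tightly near $y$ than a cone, which explains the efficiency gain reported in [18].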


Minorants of the Minimum Function. Let $f(x) = \inf_{z \in Z} \psi(x, z)$ and let the functions $\psi(\cdot, z)$ for all $z \in Z$ admit (concave) minorants $\phi(x, y, z)$ tangent at the points $y$. Then the function $\varphi(x, y) = \inf_{z \in Z} \phi(x, y, z)$ is a (concave) minorant of $f(x)$ tangent at $y$.

2. SOME MODIFICATIONS OF THE PIYAVSKII METHOD

We restrict ourselves to the problem without general constraints:

$$F(x) \to \min_{x \in X}. \tag{5}$$

Global optimization problems with general constraints are considered in [4, 5].

2.1. The Piyavskii Method for Solving a Limiting Extremum Problem. Assume that the objective function $F(x)$ in problem (5) is not known precisely, but there is a sequence $\{F_k(x)\}$ that converges to $F(x)$ uniformly on $X$, and tangent minorants $\varphi_k(x, y)$ of the functions $F_k(x)$ are known. For the approximate solution of problem (5), one may select a sufficiently large $k_0$ and find the global minimum of the function $F_{k_0}(x)$. To refine the solution obtained, one must take a greater $k_1 > k_0$ and solve the global optimization problem again for the function $F_{k_1}(x)$, etc. However, it is desirable to have a procedure that improves the approximate solution using the results of calculations performed at the previous steps. To this end, let us construct an analog of the Piyavskii method for the solution of problem (5) that uses tangent minorants $\varphi_k(x, y)$ of the sequence of functions $F_k(x)$. Later on, we will apply this approach to solve the stochastic global optimization problem.

Algorithm 1. The points $y^0, \ldots, y^{k_0} \in X$ are arbitrary. Let the points $y^0, \ldots, y^k \in X$ have been constructed. Find the point $y^{k+1}$, $k \ge k_0$, as the solution of the auxiliary extremum problem

$$\varphi_k(x) = \max_{0 \le i \le k} \varphi_k(x, y^i) \to \min_{x \in X}. \tag{6}$$
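A minimal one-dimensional sketch of Algorithm 1 with cone minorants follows; the test function, Lipschitz constant, and grid-based solution of the auxiliary problem (6) are illustrative assumptions, not the authors' implementation (in higher dimensions, (6) is itself a multiextremal problem):

```python
import numpy as np

def piyavskii(f, L, a, b, iters=30):
    """1-D Piyavskii iteration with cone minorants phi(x, y) = f(y) - L|x - y|;
    the auxiliary problem (6) is solved approximately on a dense grid."""
    xs = np.linspace(a, b, 2001)
    ys = [a, b]                                   # initial points y^0, y^1
    for _ in range(iters):
        envelope = np.max([f(y) - L * np.abs(xs - y) for y in ys], axis=0)
        ys.append(xs[np.argmin(envelope)])        # y^{k+1} = argmin of (6)
    return min(ys, key=f)                         # best point found

# Illustrative multiextremal function on [-3, 3]; |d/dx| <= 3.5.
f = lambda x: np.sin(3 * x) + 0.5 * x
x_star = piyavskii(f, L=3.5, a=-3.0, b=3.0)
```

Each new point lowers the gap between the envelope minimum (a valid lower bound) and the best function value seen, which is the mechanism behind the convergence statements below.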

Thus, only the sequence of points $\{y^i, \ i = 0, 1, \ldots, k\}$ remains in problem (6) from the previous iterations $i = 0, 1, \ldots, k$, and the new approximation $\varphi_k(x)$ of the objective function $F(x)$ may differ significantly from the old approximation $\varphi_{k-1}(x) = \max_{0 \le i \le k-1} \varphi_{k-1}(x, y^i)$.

THEOREM 3 [18]. Assume that: (a) the sequence of functions $\{F_k(x)\}$ converges uniformly to the continuous function $F(x)$; (b) the tangent minorants $\varphi(x, y)$ of the function $F(x)$ are continuous in $x$ equipotentially in $y$; (c) the tangent minorants $\varphi_k(x, y)$ of the functions $F_k(x)$ converge uniformly on $X \times X$ to the tangent minorants $\varphi(x, y)$ of the initial function $F(x)$. Then all limit points of the sequence $\{y^k\}$ are points of global minimum of the limiting extremum problem (5).

The proof is similar to that of Theorem 3 from [5]. Modifications of the Piyavskii method with nontangent minorants were also considered in [18].

2.2. A Stochastic Analog of the Piyavskii Method. Let us approximate problem (1) by an observed mean:

$$\Big[F_k(x) = \frac{1}{k}\sum_{i=1}^{k} f(x, \theta^i)\Big] \to \min_{x \in X}, \tag{7}$$

where $\theta^i$ are independent observations of the random parameter $\theta$. If the functions $F_k(x)$ converge uniformly to $F(x) = \mathbf{E} f(x, \theta)$ as $k \to \infty$, then it is possible to solve the approximate problem (7) instead of the initial problem (1). The following statements justify this approach.
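The empirical construction can be sketched as follows; the integrand, noise model, and Lipschitz constant are illustrative assumptions, not the paper's test problem. The sample-average $F_k$ from (7) is formed together with the averaged cone minorants, and the tangency and minorant properties are checked on a grid:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical integrand f(x, theta) with theta ~ N(0, 1);
# |df/dx| <= 3.5 + 0.1*|theta|, so the averaged function F_k is
# Lipschitz with constant about 3.5 + 0.1*E|theta| < 3.6.
def f(x, theta):
    return np.sin(3 * x) + 0.5 * x + 0.1 * theta * np.cos(x)

thetas = rng.standard_normal(2000)            # observations theta^1..theta^k

def F_k(x):
    return np.mean(f(x, thetas))              # empirical mean (7)

def phi_k(x, y, L=3.6):
    # averaged cone minorants: (1/k) sum_i [f(y, theta^i) - L|x - y|]
    return F_k(y) - L * abs(x - y)

xs = np.linspace(-3.0, 3.0, 121)
y = 1.0
assert all(phi_k(x, y) <= F_k(x) + 1e-9 for x in xs)   # minorant of F_k
assert abs(phi_k(y, y) - F_k(y)) < 1e-12               # tangency at y
```

Running Algorithm 1 on $F_k$ with these minorants gives exactly the stochastic analog analyzed in Theorem 5 below.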


THEOREM 4 [23] (on convergence of empirical approximations). Let the function $f(x, \theta)$ be continuous in $x$ for almost all $\theta \in \Theta$ and measurable in $\theta$ for all $x \in X$, with $X$ a compact set in $R^n$. Assume that the family $\{f(x, \cdot), \ x \in X\}$ is uniformly integrable, i.e.,

$$\lim_{c \to \infty} \sup_{x \in X} \int_{\{\theta :\, f(x, \theta) \ge c\}} |f(x, \theta)| \, \mathbf{P}(d\theta) = 0.$$

Then the functions $F_k(x)$ converge to $F(x) = \mathbf{E} f(x, \theta)$ uniformly on $X$ with probability one.

Remark 3. If $|f(x, \theta)| \le M(\theta)$ for any $x \in X$ with an integrable function $M(\theta)$, then the family of functions $\{f(x, \cdot), \ x \in X\}$ is uniformly integrable.

Let the functions $\phi(x, y, \theta)$ be tangent minorants of the random functions $f(x, \theta)$. Obviously, the functions

$$\varphi_k(x, y) = \frac{1}{k}\sum_{i=1}^{k} \phi(x, y, \theta^i) \tag{8}$$

are tangent minorants of the random functions $F_k(x)$. We can then apply the Piyavskii method to the solution of the approximate deterministic problem (7). Alternatively, the initial problem (1) can be solved through the sequence of uniform approximations (7) by means of Algorithm 1.

THEOREM 5 (on convergence of the stochastic analog of the Piyavskii method). Let the objective function $F(x)$ of problem (1) be approximated by the sequence of empirical functions $F_k(x)$ from (7) with tangent minorants (8). Then, under the conditions of Lemma 3 and Theorem 4, the sequence $\{y^k\}$ generated by Algorithm 1 converges to the set of global minima of $F(x)$ on $X$ with probability one.

Proof. It is necessary to verify the convergence conditions of Theorem 3. Obviously, functions (8) are tangent minorants of $F_k(x)$. Condition (a) of Theorem 3 is fulfilled with probability one by virtue of Theorem 4. Condition (b) for the functions $\varphi(x, y) = \mathbf{E}\phi(x, y, \theta)$ follows from Lemma 3. Condition (c) for the functions $\varphi_k(x, y)$ and $\varphi(x, y) = \mathbf{E}\phi(x, y, \theta)$ is fulfilled with probability one by virtue of Theorem 4. Thus, the statement of the theorem follows from Theorem 3.

3. MINORANT BOUNDS ON OPTIMAL VALUES

3.1. Minorant Bounds on the Optimal Value of the Objective Function in Deterministic Problems. Denote $F_* = \min_{x \in X} F(x)$. Let $\{\varphi(x, y)\}_{y \in X}$ be a family of minorants of $F(x)$ tangent at the points $y \in X$, and let $\{y \in Z \subseteq X\}$ be a finite or infinite set of points from $X$. Let us consider some estimates of the optimal value $F_*$ constructed on the basis of the information contained in the family of tangent minorants $\{\varphi(x, y)\}_{y \in Z}$. We will use these estimates in Section 4 in the branch and bound algorithm for finding the global minimum of $F(x)$ on $X$.

Obviously, the function $\phi(x) = \max_{y \in Z} \varphi(x, y)$ is a minorant of $F(x)$ tangent at all points $y \in Z$, and the quantity $F_1 = \min_{x \in X} \phi(x)$ is a lower estimate of $F_*$.
Let us consider the quantity

$$F_2 = \max_{x \in X} \min_{y \in Z} \varphi(x, y).$$

LEMMA 4 [19]. Assume that the convex hull $\mathrm{co}\, Z$ of a finite set of points $Z \subset X$ coincides with the compact set $X$, the minorants have the form $\varphi(x, y) = \psi(y, \|x - y\|)$, $y \in Z$, and $\psi(y, \cdot)$ is monotonically decreasing and continuous in its second argument. Then $F_2 \le F_1 \le F_*$.

Under the conditions of Lemma 4, the quantity $F_3 = \min_{y \in Z} \varphi(\bar{x}, y) \le F_2$ for any $\bar{x} \in X$ is a lower estimate of $F_*$. The closer $\bar{x}$ is to the optimal point $x_*$, the better the estimate $F_3$. Clearly, $F_4 = \max_{y \in Z} \min_{x \in X} \varphi(x, y) \le F_*$.

If $X$ is a polyhedron (for example, a parallelepiped or a simplex) and $\varphi(x, y)$ is concave or quasiconcave in $x$, then $\min_{x \in X} \varphi(x, y)$ is achieved at the vertices of the polyhedron $X$. If, in addition, the set $Z$ is finite, then $F_4$ is easy to calculate. The various bounds $F_1, F_2, \ldots$ are compared in [19] on numerical examples of solving the global optimization problem by the branch and bound algorithm.

3.2. Minorant Bounds on the Optimal Values of the Expectation Function $F(x) = \mathbf{E} f(x, \theta)$. If the probability measure is discrete, then the operation of taking the expectation reduces to summation: $F(x) = \sum_{\theta \in \Theta} p_{\theta} f(x, \theta)$, where $p_{\theta}$ are the probabilities of the elementary events $\theta$. In the case of a general probability measure, we approximate problem (1) by the observed mean (7).

The function $\varphi(x, y) = \mathbf{E}\phi(x, y, \theta)$ is a minorant of $F(x) = \mathbf{E} f(x, \theta)$ tangent at the point $y$. It may be used to construct analogs of the bounds $F_1, \ldots, F_4$, for example:

$$\bar{F}_1 = \min_{x \in X} \max_{y \in X} \mathbf{E}\phi(x, y, \theta),$$
$$\bar{F}_2 = \max_{x \in X} \min_{y \in X} \mathbf{E}\phi(x, y, \theta),$$
$$\bar{F}_3 = \min_{y \in X} \mathbf{E}\phi(\bar{x}, y, \theta) \le \bar{F}_2, \quad \bar{x} \in X,$$
$$\bar{F}_4 = \max_{y \in X} \min_{x \in X} \mathbf{E}\phi(x, y, \theta).$$

If $\phi(x, y, \theta) = \psi(y, \theta, \|x - y\|)$ and $\psi$ is monotonically decreasing in its last argument, for example, $\phi(x, y, \theta) = f(y, \theta) - L(\theta)\|x - y\|$, then the function $\varphi(x, y) = \mathbf{E}\phi(x, y, \theta)$ satisfies the conditions of Lemma 4, and thus $\bar{F}_2 \le \bar{F}_1$.

It is expedient to use the bounds $\bar{F}_1, \ldots, \bar{F}_4$ if the minorants $\phi(x, y, \theta)$ are concave in $x$; in this case, the expectations $\mathbf{E}\phi(x, y, \theta)$ remain concave in $x$. Then the calculation of $\bar{F}_2$ reduces to maximization of the concave function $\mathbf{E}\phi(x, y, \theta)$ on the set $X$, and the calculation of $\bar{F}_4$ to the enumeration of the values of the concave functions $\mathbf{E}\phi(x, y, \theta)$, $y \in Z$, at the vertices of the polyhedron $X$.

The following bounds are specific to the expectation function:

$$\bar{F}_5 = \min_{x \in X} \mathbf{E}\max_{y \in X} \phi(x, y, \theta) \ge \bar{F}_1,$$
$$\bar{F}_6 = \mathbf{E}\min_{x \in X} \max_{y \in X} \phi(x, y, \theta) \le \bar{F}_5,$$
$$\bar{F}_7 = \mathbf{E}\max_{y \in X} \min_{x \in X} \phi(x, y, \theta).$$

If $X$ is a polyhedron and the minorant $\phi(\cdot, y, \theta)$ is concave or quasiconcave in $x$ for each $\theta$, then $\min_{x \in X} \phi(x, y, \theta)$ is achieved at the vertices of $X$. Thus, it is expedient to use the estimate $\bar{F}_7$ if the minorants $\phi(x, y, \theta)$ are not concave but only quasiconcave in $x$. For special minorants of the form $\phi(x, y, \theta) = \psi(y, \theta, \|x - y\|)$ that satisfy the conditions of Lemma 4 for each $\theta$, we have

$$\min_{x \in X} \max_{y \in X} \phi(x, y, \theta) \ge \max_{x \in X} \min_{y \in X} \phi(x, y, \theta);$$

therefore, the following estimate holds:

$$\bar{F}_7 = \mathbf{E}\max_{x \in X} \min_{y \in X} \phi(x, y, \theta) \le \bar{F}_6.$$
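The ordering $\bar{F}_2 \le \bar{F}_1 \le F_*$ can be checked numerically. The sketch below uses an assumed test integrand (the same hypothetical one as earlier, not the authors' example), empirical expectations in place of $\mathbf{E}$, and a grid discretization of $X$ with cone minorants anchored at a finite set $Z$ whose convex hull covers $X$, as Lemma 4 requires:

```python
import numpy as np

rng = np.random.default_rng(1)
thetas = rng.standard_normal(1000)           # sample for the empirical mean

def f(x, theta):                             # hypothetical integrand
    return np.sin(3 * x) + 0.5 * x + 0.1 * theta * np.cos(x)

L = 3.6                                      # valid Lipschitz constant here
xs = np.linspace(-3.0, 3.0, 241)             # discretized X
Zs = np.linspace(-3.0, 3.0, 7)               # tangency points Z, co Z = X

F = np.array([np.mean(f(x, thetas)) for x in xs])          # F_k on the grid
# E phi(x, y, .) ~ F_k(y) - L|x - y| for x in X, y in Z:
phi = np.array([[np.mean(f(y, thetas)) - L * abs(x - y) for y in Zs]
                for x in xs])

F1 = phi.max(axis=1).min()                   # min_x max_y E phi
F2 = phi.min(axis=1).max()                   # max_x min_y E phi
F_star = F.min()                             # empirical optimal value

assert F2 <= F1 <= F_star + 1e-9             # ordering of Lemma 4
```

$F_1$ needs the upper envelope of the minorants, while $F_2$ needs only the lower one, which is why $F_2$ is cheaper but weaker.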

3.3. Bounds on the Optimal Values of the Probability Function. We consider the problem of global maximization of $P(x) = \mathbf{P}\{f(x, \theta) \le 0\}$ on a convex set $X$. If $\phi(x, y, \theta)$ is a minorant of the function $f(x, \theta)$ tangent at the point $y$, then the function $\varphi(x, y) = \mathbf{P}\{\phi(x, y, \theta) \le 0\}$ is a majorant of $P(x)$ tangent at the point $y$, i.e., $P(x) \le \varphi(x, y)$ and $P(y) = \varphi(y, y)$. If the function $\phi(x, y, \theta)$ is continuous in $(x, y)$ for almost all $\theta$ and $\mathbf{P}\{\phi(x, y, \theta) = 0\} = 0$ for any $x, y \in X$, then the function $\varphi(x, y) = \mathbf{P}\{\phi(x, y, \theta) \le 0\}$ is continuous in $x$ and $y$. Hence, by virtue of Theorem 1, the function $P(x)$ is also continuous.

As stochastic tangent majorants, we may take the indicator functions

$$\psi(x, y, \theta) = \begin{cases} 1, & \phi(x, y, \theta) \le 0, \\ 0, & \phi(x, y, \theta) > 0. \end{cases}$$

Obviously, $\varphi(x, y) = \mathbf{E}\psi(x, y, \theta)$. If $\phi(x, y, \theta)$ is quasiconcave in its first argument, then $\psi(x, y, \theta)$ is quasiconvex in $x$. Similarly to the case of expectations, we can construct upper estimates of the optimal value of the probability function $P(x)$ on the set $X$ using a finite subset of (tangency) points $Z$, for example,

$$\bar{P}_1 = \mathbf{E}\min_{y \in Z} \max_{x \in X} \psi(x, y, \theta).$$

The maximization of the quasiconvex function $\psi(x, y, \theta)$ on a polyhedron $X$ reduces to the enumeration of the values of this function at the vertices of the polyhedron.

4. THE BRANCH AND BOUND ALGORITHM WITH MINORANT ESTIMATES

Let us consider the problem of global minimization of a continuous function $F(x)$ on a set $X \subset R^n$. Let there be a sequence of functions $F_N(x)$ converging uniformly to $F(x)$ as $N \to \infty$. For example, for the stochastic global optimization problem, the sequence $F_N(x) = \frac{1}{N}\sum_{k=1}^{N} f(x, \theta^k)$ converges uniformly, under the conditions of Theorem 4, to $F(x) = \mathbf{E} f(x, \theta)$ with probability one.

For the global optimization of the function $F(x)$ on $X$ in terms of the sequence $F_N(x)$, we apply the branch and bound algorithm, using the estimates from Section 3 of the optimal values of the functions $F_N(x)$ on subsets $X' \subset X$ of special form (simplexes, parallelepipeds, etc.). In essence, the branch and bound algorithm with minorant estimates may be treated as the Piyavskii method in which the auxiliary nonconvex problems (6) are solved approximately, by finding a record set $Y_N$.

4.1. Algorithm with Branch Deletion. In the branch and bound algorithm, there is a current partition $S_N$ of the initial set $X$ at each iteration $N$. For each element $Y \in S_N$, there are current upper $U_N(Y)$ and lower $L_N(Y)$ estimates of the optimal value $F_N^* = \min_{y \in Y} F_N(y)$, i.e., $L_N(Y) \le F_N^* \le U_N(Y)$. Denote by $Y_N$ a record set of the partition $S_N$, i.e., $L_N(Y_N) = \min_{Y \in S_N} L_N(Y)$. The algorithm consists of the sequence of the following operations.

Step 1 (branching). The record set $Y_N$ is selected for branching, i.e., it is represented as a union of several subsets $Y_N^k \subset Y_N$, $k = 1, 2, \ldots$, such that $Y_N = \cup_k Y_N^k$. Thus, the new partition has the form $S_{N+1} = (S_N \setminus Y_N) \cup (\cup_k Y_N^k)$.

Step 2 (estimating). For the elements $Y \in S_{N+1}$ of the new partition $S_{N+1}$, calculate the estimates $L_{N+1}(Y)$ and $U_{N+1}(Y)$ of the optimal value $F_{N+1}^* = \min_{y \in Y} F_{N+1}(y)$ of the function $F_{N+1}(y)$ on the element $Y$.

Step 3 (deletion). If $N + 1 \ge M$, then unpromising sets $Y'$ such that $L_{N+1}(Y') > \min_{Y \in S_{N+1}} U_{N+1}(Y) + \varepsilon$, etc., are deleted from the partition $S_{N+1}$.

This is the standard branch and bound algorithm, except that at each iteration the estimates of the subsets are based not on the initial function $F(x)$ but on the approximate functions $F_N(x)$. Denote $e_M = \sup_{N \ge M} \sup_{x \in X} |F(x) - F_N(x)|$. Obviously, by virtue of the uniform convergence of $F_N(x)$ to $F(x)$ on $X$, $\lim_{M \to \infty} e_M = 0$.

THEOREM 6 (on convergence) [19]. Let $F(x)$ be a continuous function, let the sequence $\{F_N(x), \ N = 1, 2, \ldots\}$ converge uniformly to $F(x)$ on $X$, and let $e_M \le \varepsilon/2$. Assume that the estimates of the subsets possess the following property: $\lim_{N \to \infty} (U_N(Z_N) - L_N(Z_N)) = 0$ for any sequence of sets $\{Z_N\}$ such that $\mathrm{diam}(Z_N) \to 0$ as $N \to \infty$. Assume also that the partitions are such that the diameters of the resulting sets tend to zero under an unlimited number of divisions. Then the sequence $\{Y_N\}$ is infinite, and for any limit point $y$ of the record sets $Y_N$ we have $F(y) - \min_{x \in X} F(x) \le 2 e_M$.

4.2. Algorithm with Union of Branches. In this algorithm, calculations are reduced owing to a group estimation of subsets. It differs from the algorithm of Section 4.1 only in Step 3, where unpromising subsets are not deleted but joined into larger sets.

Step 3 (union of subsets). The subsets $Y_i' \in S_{N+1}$ such that $L_{N+1}(Y_i') > \min_{Y \in S_{N+1}} U_{N+1}(Y) + \varepsilon$ and $\cup_i Y_i' \in S_N$ are aggregated into one set $Y''$, i.e., the subsets $Y_i'$ are deleted from $S_{N+1}$, and the joined set $Y''$ is added to $S_{N+1}$.

By a partition (branching) tree we mean a tree-shaped graph whose nodes correspond to partition subsets, the nodes subordinate to a given node corresponding to the partition subsets of the corresponding parent subset.
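A compact sketch of the Section 4.1 scheme on a one-dimensional interval follows. The test function, Lipschitz constant, and the particular interval bounds are illustrative assumptions: on an element $Y = [a, b]$ we take $U(Y) = \min(f(a), f(b))$ (function values already seen) and the cone-minorant lower bound $L(Y) = \min(f(a), f(b)) - \mathrm{Lip}\,(b - a)/2$:

```python
import numpy as np

def branch_and_bound(f, Lip, a, b, iters=100, eps=1e-3):
    """Best-first branch and bound with cone-minorant interval bounds."""
    parts = [(a, b)]
    for _ in range(iters):
        lo = lambda s: min(f(s[0]), f(s[1])) - Lip * (s[1] - s[0]) / 2
        up = lambda s: min(f(s[0]), f(s[1]))
        record = min(parts, key=lo)              # record set Y_N
        a0, b0 = record
        parts.remove(record)
        m = 0.5 * (a0 + b0)
        parts += [(a0, m), (m, b0)]              # Step 1: branching
        best_u = min(up(s) for s in parts)       # Step 2: estimates
        parts = [s for s in parts                # Step 3: delete unpromising
                 if lo(s) <= best_u + eps]
    return min(min(f(s[0]), f(s[1])) for s in parts)

f = lambda x: np.sin(3 * x) + 0.5 * x            # illustrative; |f'| <= 3.5
val = branch_and_bound(f, Lip=3.5, a=-3.0, b=3.0)
# val approaches the global minimum of f on [-3, 3]
```

Deletion is safe here because the cone bound is a valid lower estimate, so the element containing the global minimizer always satisfies the retention test; with stochastic estimates this guarantee is lost, which motivates the union variant of Section 4.2.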


THEOREM 7 (on convergence). Let the assumptions of Theorem 6 hold. Let also each subset of the partition tree participate in the union procedure no more than a fixed number of times, depending only on its position (depth) in the partition tree. Then $\mathrm{diam}(Y_N) \to 0$ and $F(y) = \min_{x \in X} F(x)$ for any limit point $y$ of the record sets $Y_N$.

Proof. Obviously, $\mathrm{diam}(Y_N) \to 0$ as $N \to \infty$. Denote by $Y_N^*$ a partition subset containing a global minimum $x^*$; then

$$L_N(Y_N^*) \le \min_{x \in Y_N^*} F_N(x) \le \min_{x \in Y_N^*} F(x) + e_N = F(x^*) + e_N.$$
Let $\lim_{k \to \infty} Y_{N_k} = y$. Let us show that $y$ is an optimal point of the problem. Assume the opposite: let $F(y) - \min_{x \in X} F(x) = e > 0$. Since $L_N(Y_N) \le \min_{x \in Y_N} F_N(x) \le U_N(Y_N)$ and $U_N(Y_N) - L_N(Y_N) \to 0$, by virtue of the uniform convergence $F_N \Rightarrow F$ we have $\lim_{k \to \infty} L_{N_k}(Y_{N_k}) = F(y)$ and, therefore, $L_{N_k}(Y_{N_k}) \ge F(y) - e/2 \ge \min_{x \in X} F(x) + e/2$ for sufficiently large $k$. Thus, for sufficiently large $k$, we have $L_{N_k}(Y_{N_k}) \ge \min_{x \in X} F(x) + e/2 \ge L_{N_k}(Y_{N_k}^*) + e/2 - e_{N_k}$, which contradicts the definition of $Y_{N_k}$ as a record subset. The contradiction obtained proves the theorem.

5. NUMERICAL EXPERIMENTS

Let us use the allocation problem as an example to illustrate the proposed approach to solving stochastic global optimization problems.

Example (the problem of allocation of service centers) [10]. Let $P(d\omega)$ be the distribution of customers of some service within a domain $\Omega \subset R^m$. Let the function $c(x_i, \omega)$ specify the cost of servicing a client located at the point $\omega \in \Omega$ from a service center located at the point $x_i \in X \subset R^m$, for example, $c(x_i, \omega) = c(\|x_i - \omega\|)$; in particular,

$$c(x_i, \omega) = \frac{\|x_i - \omega\|^{a}}{g + \|x_i - \omega\|^{b}}, \quad a \ge b > 0, \quad g > 0. \tag{9}$$

If there are n service centers at the points x_1, ..., x_n ∈ X and each client selects the center nearest to him, then the cost of servicing client ω is specified by the function

    f(x, ω) = min_{1≤i≤n} c(x_i, ω),   x = (x_1, ..., x_n).   (10)

The problem is to allocate the n service centers so that the total expected cost of servicing all the customers is minimal:

    F(x) = ∫_Ω min_{1≤i≤n} c(x_i, ω) P(dω) → min_{x∈D},   (11)
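For a finite customer distribution, as in the experiments below, the expectation in (11) reduces to a weighted sum: F(x) = Σ_j p_j · min_i c(x_i, ω_j). A minimal sketch, assuming a 1-D layout and the cost (9) with the paper's experimental parameters α = β = 2, γ = 0.1; the function names are illustrative.

```python
# Objective (11) for a discrete distribution:
# F(x) = sum_j p_j * min_i c(x_i, w_j).

def cost(xi, w, alpha=2.0, beta=2.0, gamma=0.1):
    """Service cost (9) in 1-D: |x - w|^alpha / (gamma + |x - w|^beta)."""
    d = abs(xi - w)
    return d ** alpha / (gamma + d ** beta)

def objective(x, points, probs):
    """Expected cost of serving each customer from its nearest center."""
    return sum(p * min(cost(xi, w) for xi in x)
               for w, p in zip(points, probs))
```

For instance, a single center placed exactly on the only customer gives zero expected cost, while `objective([0.0], [1.0], [1.0])` equals 1/(γ + 1).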

where D is the set of admissible positions of the centers. The coordinates of the centers may satisfy additional constraints; for example, the centers may be ordered with respect to a direction d ∈ R^m, in which case D = {x = (x_1, ..., x_n) | x_1 ∈ X, ..., x_n ∈ X; d·x_i ≤ d·x_{i+1}, i = 1, ..., n − 1}.

A local minimum in (11) can be found by the following iterative algorithm. Denote the initial positions of the service centers by x^0 = (x^0_1, ..., x^0_n). Let the vector x^k = (x^k_1, ..., x^k_n) specify the positions of the service centers at the kth iteration. Distribute the customers among the service centers (x^k_1, ..., x^k_n) so that each is serviced by the nearest center, i.e., divide Ω into n non-intersecting subsets Ω^k_i, i = 1, ..., n, such that i = arg min_{1≤i≤n} c(x^k_i, ω) for all ω ∈ Ω^k_i. Find the next approximation x^{k+1} = (x^{k+1}_1, ..., x^{k+1}_n) as the (approximate) solution of the convex programming problem

    F^k(x) = Σ_{i=1}^n ∫_{Ω^k_i} c(x_i, ω) P(dω) → min_{x∈D}.
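The alternating scheme above (assign each customer to its nearest center, then re-optimize each center on its own cluster) can be sketched as follows for a discrete 1-D distribution. All names are hypothetical, and the inner subproblem is solved by plain grid search purely for illustration, not by the convex programming step of the paper.

```python
# Alternating local-improvement scheme for the allocation problem
# (discrete 1-D customer distribution).

def cost(xi, w, gamma=0.1):
    """Service cost (9) with alpha = beta = 2 in 1-D."""
    d = abs(xi - w)
    return d * d / (gamma + d * d)

def assign(x, points):
    """Index of the nearest (cheapest) center for each customer."""
    return [min(range(len(x)), key=lambda i: cost(x[i], w)) for w in points]

def recenter(cluster_pts, cluster_probs, grid_size=2001):
    """Grid-search minimizer of the cluster's expected cost on [0, 1]."""
    best_x, best_val = 0.0, float("inf")
    for k in range(grid_size):
        xg = k / (grid_size - 1)
        val = sum(p * cost(xg, w) for w, p in zip(cluster_pts, cluster_probs))
        if val < best_val:
            best_x, best_val = xg, val
    return best_x

def local_iterations(x, points, probs, iters=20):
    """Alternate nearest-center assignment and per-cluster re-centering."""
    for _ in range(iters):
        labels = assign(x, points)
        new_x = []
        for i in range(len(x)):
            pts = [w for w, l in zip(points, labels) if l == i]
            prs = [p for p, l in zip(probs, labels) if l == i]
            # Leave a center in place if its cluster is empty.
            new_x.append(recenter(pts, prs) if pts else x[i])
        x = new_x
    return x

# Toy data: two groups of customers around 0.15 and 0.85.
points = [0.1, 0.2, 0.8, 0.9]
probs = [0.25, 0.25, 0.25, 0.25]
centers = local_iterations([0.4, 0.6], points, probs, iters=5)
```

As with k-means-type schemes, the iteration converges only to a local minimum, which is why the paper embeds it in a global branch and bound search.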


TABLE 1. Distribution of Customers

    Value of ω_i   Probability p_{ω_i}  |  Value of ω_i   Probability p_{ω_i}
    0.037991       2.15658E-02          |  0.595109       7.57130E-02
    0.100437       3.55347E-03          |  0.606194       8.86943E-02
    0.193362       8.26707E-02          |  0.646899       4.32473E-02
    0.260176       9.97986E-02          |  0.720796       1.66506E-02
    0.326991       2.58563E-02          |  0.747336       3.97192E-02
    0.442882       1.20332E-01          |  0.835398       2.55506E-02
    0.453538       1.42203E-01          |  0.847773       1.52854E-02
    0.480874       9.61612E-02          |  0.885398       8.41436E-03
    0.518605       7.31521E-02          |  0.933185       1.32620E-03
    0.518865       2.01054E-02          |  0.999739       7.08675E-08

TABLE 2. The Results of the Algorithm

    Dimension n | Type of minorants | Number of iterations | Number of the left simplexes | X*
    1           | C                 | 2091                 | 1301                         | (0.485959)
    1           | P                 | 18                   | 5                            | (0.486084)
    1           | A                 | 19                   | 4                            | (0.486084)
    2           | C                 | 349550               | 261436                       | (0.227470; 0.528999)
    2           | P                 | 329                  | 23                           | (0.227783; 0.529541)
    2           | A                 | 865                  | 111                          | (0.228043; 0.529785)
    3           | P                 | 9289                 | 577                          | (0.227539; 0.469727; 0.657715)
    3           | A                 | 13592                | 1128                         | (0.227417; 0.469604; 0.657654)
    4           | P                 | 588145               | 34777                        | (0.227539; 0.469727; 0.610352; 0.790527)
    4           | A                 | 668906               | 13106                        | (0.227661; 0.469604; 0.610229; 0.790649)

Let the cost functions c(x, ω) of servicing a fixed customer ω from a center at the point x admit tangent minorants φ(x, y, ω). Then the functions

    ϕ(x, y, ω) = min_{1≤i≤n} φ(x_i, y_i, ω)   (12)

are tangent minorants of the minimum function f(x, ω) = min_{1≤i≤n} c(x_i, ω), and the functions Φ(x, y) = ∫_Ω ϕ(x, y, ω) P(dω) are tangent minorants of F(x).

If c(x, ω) is a function smooth in x with a Lipschitz gradient, then tangent paraboloids may be taken as the tangent minorants φ(x, y, ω). Note also that function (9) can be represented as the difference of two convex functions,

    c(x, ω) = (1/γ) ||x − ω||^α − ||x − ω||^{α+β} / (γ (γ + ||x − ω||^β)).

Therefore, it is possible to take the concave function

    φ(x, y, ω) = (1/γ) ||y − ω||^α + (α/γ) ||y − ω||^{α−2} ⟨y − ω, x − y⟩ − ||x − ω||^{α+β} / (γ (γ + ||x − ω||^β))   (13)
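The minorant property of (13) can also be checked numerically. For α = β = 2 in 1-D, the gap c(x, ω) − φ(x, y, ω) reduces to (1/γ)(x − y)², so φ lies below c everywhere and touches it at x = y. The small check below is illustrative, not from the paper.

```python
# Numerical check that the concave function (13) minorizes the cost (9)
# and is tangent to it at x = y (1-D case, the paper's parameters).

ALPHA, BETA, GAMMA = 2.0, 2.0, 0.1

def c(x, w):
    """Cost (9) in 1-D."""
    d = abs(x - w)
    return d ** ALPHA / (GAMMA + d ** BETA)

def phi(x, y, w):
    """Concave tangent minorant (13) of c(., w) at the point y (1-D)."""
    dx, dy = abs(x - w), abs(y - w)
    return (dy ** ALPHA / GAMMA
            + (ALPHA / GAMMA) * dy ** (ALPHA - 2.0) * (y - w) * (x - y)
            - dx ** (ALPHA + BETA) / (GAMMA * (GAMMA + dx ** BETA)))
```

The first two terms of (13) linearize the convex part (1/γ)||x − ω||^α at y, which keeps the expression below c, while the concave part is carried over unchanged.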

as its minorant tangent at the point y.

For the numerical experiment, take in function (9) the parameters α = β = 2 and γ = 0.1. The customer locations ω_1, ..., ω_m are points sampled uniformly from the interval [0, 1] and carrying the probabilities p_{ω_1}, ..., p_{ω_m}, respectively. For m = 20, the values of ω_i and p_{ω_i} are presented in Table 1.

The branch and bound algorithm involves a lower estimate of the value of the objective function (11) of the form [19]

    L(Y) = (1/2) (E max_{y∈Z} ϕ(x, y, ω) + E min_{y∈Z} ϕ(x, y, ω)),
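The estimate above can be sketched for n = 1 in 1-D, where the minorant of (10) coincides with the minorant (13) of the cost (9). This is an illustrative reconstruction with a small made-up distribution rather than the Table 1 data. Since ϕ(x, y, ω) never exceeds f(x, ω), both expectations, and hence their average, bound E f(x, ω) from below at the chosen x ∈ Y.

```python
# Sketch of the interval estimate L(Y) for n = 1 in 1-D:
# L = (E max_{y in Z} phi + E min_{y in Z} phi) / 2,
# Z a finite set of nodes in Y, phi the tangent minorant (13).

ALPHA, BETA, GAMMA = 2.0, 2.0, 0.1

def c(x, w):
    """Cost (9) in 1-D."""
    d = abs(x - w)
    return d ** ALPHA / (GAMMA + d ** BETA)

def phi(x, y, w):
    """Tangent minorant (13) of c(., w) at y (1-D)."""
    dx, dy = abs(x - w), abs(y - w)
    return (dy ** ALPHA / GAMMA
            + (ALPHA / GAMMA) * dy ** (ALPHA - 2.0) * (y - w) * (x - y)
            - dx ** (ALPHA + BETA) / (GAMMA * (GAMMA + dx ** BETA)))

def minorant_estimate(x, Z, omegas, probs):
    """Average of E max_{y in Z} phi and E min_{y in Z} phi at a point x."""
    e_max = sum(p * max(phi(x, y, w) for y in Z)
                for w, p in zip(omegas, probs))
    e_min = sum(p * min(phi(x, y, w) for y in Z)
                for w, p in zip(omegas, probs))
    return 0.5 * (e_max + e_min)

# Made-up three-point distribution and nodes Z inside Y = [0.4, 0.6].
omegas, probs = [0.2, 0.5, 0.8], [0.3, 0.4, 0.3]
estimate = minorant_estimate(0.5, [0.4, 0.5, 0.6], omegas, probs)
```

In the algorithm proper, the expectations are replaced by empirical averages, which makes L(Y) a stochastic lower estimate of the optimal value over Y.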

where Z is a finite set of points (nodes) from Y, x ∈ Y, and ϕ(x, y, ω) are minorants of function (10) tangent at the points y, calculated in the form (12); the tangent minorants φ(x, y, ω) for the service cost function (9) are taken as tangent cones (C), tangent paraboloids (P), and also in the form (13) (an alternative to the cone and the paraboloid, (A)). The Lipschitz constant of the function c(x, ω) is l = (3/8)√(3/γ) ≈ 2.053959, and that of its gradient is L = 2/γ = 20. In the algorithm, the initial set [0, 1]^n is divided into simplexes, which, during branching, are bisected along the longest edge. The computation accuracy is ε = 10^{-6}. The iterative process terminates when the chosen accuracy is achieved or when 2,100,000 iterations have been made. The results of the algorithm are presented in Table 2.

CONCLUSIONS

Here, the Piyavskii method of global optimization and the branch and bound algorithm have been modified to apply to a stochastic global optimization problem with an expectation-type objective function. The concept of (stochastic) tangent minorants of the functions of the problem is the basis for the modifications. In particular, a technique for calculating tangent minorants of the expectation function is presented: one may take the expectation of tangent minorants of the random integrands as minorants. The situation here is similar to that arising in calculating gradients of integral functionals: it is rather difficult to calculate a gradient or a minorant of an integral functional, but it is quite easy to calculate a (stochastic) gradient or a (stochastic) minorant of the integrand. It is proposed to approximate the initial objective function by empirical approximations. Knowledge of stochastic tangent minorants makes it easy to construct tangent minorants for the approximations and to obtain bounds on the optimal values of the approximations.
This, in turn, allows us to apply the Piyavskii method and the branch and bound algorithm to global minimization of the initial function via a sequence of approximations.

REFERENCES

1. S. A. Piyavskii, "An algorithm of searching for an absolute minimum of functions," in: Theory of Optimal Solutions [in Russian], Part 2, Inst. Kibern. AN USSR, Kiev (1967), pp. 13–24.
2. Yu. M. Danilin and S. A. Piyavskii, "An algorithm of searching for an absolute minimum," in: Theory of Optimal Solutions [in Russian], Part 2, Inst. Kibern. AN USSR, Kiev (1967), pp. 25–37.
3. S. A. Piyavskii, "An algorithm of searching for an absolute minimum of functions," Zh. Vych. Mat. Mat. Fiz., 12, No. 4, 888–896 (1972).
4. R. Horst and H. Tuy, Global Optimization (Deterministic Approaches), 3rd revised and enlarged edition, Springer-Verlag, Berlin (1996).
5. V. I. Norkin, "The Piyavskii method for the solution of the general global optimization problem," Zh. Vych. Mat. Mat. Fiz., 32, No. 7, 992–1007 (1992).
6. Yu. M. Ermol'ev, Problems and Methods of Stochastic Programming [in Russian], Nauka, Moscow (1976).
7. A. Ruszczynski and A. Shapiro (eds.), Handbooks in Operations Research and Management Science, 10: Stochastic Programming, North-Holland, Amsterdam (2003).
8. Yu. M. Ermol'ev and V. I. Norkin, "Solution of nonconvex nonsmooth stochastic optimization problems," Kibern. Sist. Analiz, 39, No. 5, 701–715 (2003).
9. V. Norkin, Yu. M. Ermoliev, and A. Ruszczynski, "On optimal allocation of indivisibles under uncertainty," Working Paper WP-94-021, Int. Inst. for Appl. Syst. Analysis, Laxenburg, Austria (1994).
10. V. Norkin, G. Ch. Pflug, and A. Ruszczynski, "A branch and bound method for stochastic global optimization," Math. Progr., 83, 425–450 (1998).
11. V. Norkin, Yu. M. Ermoliev, and A. Ruszczynski, "On optimal allocation of indivisibles under uncertainty," Oper. Res., 46, No. 3, 381–395 (1998).
12. V. I. Norkin, "Global optimization of probabilities by the stochastic branch and bound method," in: Stochastic Optimization: Numerical Methods and Technical Applications, Proc. 3rd GAMM/IFIP Workshop (Neubiberg/Munich, June 17–20, 1996), Lecture Notes in Economics and Mathematical Systems, 458, Springer, Berlin (1998), pp. 186–201.
13. B. J. Lence and A. Ruszczynski, "Managing water quality under uncertainty: application of a new stochastic branch and bound method," Working Paper WP-96-066, Int. Inst. for Appl. Syst. Analysis, Laxenburg, Austria (1996).
14. K. Hägglöf, "The implementation of the stochastic branch and bound method for applications in river basin water quality management," Working Paper WP-96-89, Int. Inst. for Appl. Syst. Analysis, Laxenburg, Austria (1996).
15. W. J. Gutjahr, A. Hellmayr, and G. C. Pflug, "Optimal stochastic single-machine tardiness scheduling by stochastic branch-and-bound," Eur. J. Oper. Res., 117, No. 2, 396–413 (1999).
16. W. J. Gutjahr, C. Strauss, and E. Wagner, "A stochastic branch-and-bound approach to activity crashing in project management," J. Computing, 12, No. 2, 125–135 (2000).
17. V. I. Norkin, "Global stochastic optimization: the method of branches and probabilistic boundaries," in: Methods of Control and Decision-Making under Risk and Uncertainty Conditions [in Russian], Inst. Kibern. AN Ukr., Kiev (1993), pp. 3–12.
18. V. I. Norkin and B. O. Onishchenko, "A stochastic analog of the Piyavskii method of global optimization," in: Theory of Optimal Solutions [in Russian], Issue 2, Inst. Kibern. NAN Ukr., Kiev (2003), pp. 61–67.
19. V. I. Norkin and B. O. Onishchenko, "The branch and bound algorithm with minorant estimates for solving the stochastic global optimization problem," in: Komp. Matem. [in Russian], Issue 1, Inst. Kibern. NAN Ukr., Kiev (2004), pp. 91–101.
20. E. A. Nurminskii, Methods of the Solution of Stochastic Minimax Problems [in Russian], Naukova Dumka, Kiev (1979).
21. O. Khamisov, "On optimization properties of functions with a concave minorant," J. Global Optim., 14, No. 1, 79–101 (1999).
22. A. D. Ioffe and V. M. Tikhomirov, Theory of Extremum Problems [in Russian], Nauka, Moscow (1974).
23. L. Le Cam, "On some asymptotic properties of maximum likelihood estimates and related Bayes' estimates," Univ. California Publ. Statist., 1, 227–330 (1953).