Quadratic programs with quadratic constraint: characterization of KKT points and equivalence with an unconstrained problem

Stefano Lucidi, Laura Palagi, Massimo Roma

October 1994

This work was partially supported by the Agenzia Spaziale Italiana, Roma, Italy. Authors' address: Università di Roma "La Sapienza", Dipartimento di Informatica e Sistemistica, via Buonarroti, 12 - 00185 Roma, Italy, and Gruppo Nazionale per l'Analisi Funzionale e le sue Applicazioni del Consiglio Nazionale delle Ricerche.

Abstract

In this paper we consider the problem of minimizing a quadratic function with a quadratic constraint. We point out some new properties of the problem. In particular, in the first part of the paper, we show that (i) the number of values of the objective function at KKT points is bounded by $3n+1$, where $n$ is the dimension of the problem; (ii) given a KKT point that is not a global minimizer, it is immediate to find a "better" feasible point; (iii) strict complementarity holds at the local-nonglobal minimum point. In the second part, we show that the original constrained problem is equivalent to the unconstrained minimization of a piecewise quartic exact merit function. Using the unconstrained formulation we give, in the nonconvex case, a new second order necessary condition for global minimum points. A possible algorithmic application of the preceding results is briefly outlined.

Key words: quadratic function, quadratic constraint, merit function. AMS subject classification: 90C30, 65K05.


1 Introduction

In this paper we study the problem of minimizing a general quadratic function $q : \mathbb{R}^n \to \mathbb{R}$ subject to an ellipsoidal constraint, that is

$$\min\{q(x) : x^T H x \le a^2\}, \qquad (1)$$

where $H$ is a symmetric positive definite $n \times n$ matrix and $a$ is a given positive scalar. The interest in this problem started in the context of trust region methods for solving unconstrained optimization problems. In fact, such methods require at each iteration an approximate solution of Problem (1), where $q(x)$ is a local quadratic model of the objective function over a restricted ellipsoidal region centered about the current iterate. Problems with the same structure as (1) also play an important role in the field of constrained and combinatorial optimization. In fact, the solution of Problem (1) is at the basis of algorithms for solving general constrained nonlinear problems (e.g. [4]), convex quadratic programming (e.g. [41, 21]), nonconvex quadratic programming and integer programming (e.g. [22, 40, 23, 32]).

Many papers have been devoted to pointing out the specific features of Problem (1). Among the most relevant results are the necessary and sufficient conditions for a point $x^*$ to be a global minimizer, due to Gay [13] and Sorensen [36], and the characterization of local-nonglobal minimizers due to Martinez [30]. The particular structure of Problem (1) has led to the development of algorithms for finding its global solution. The first algorithms proposed in the literature were those of Gay and Sorensen ([13, 36]). Moré and Sorensen ([27]) developed an algorithm that produces an approximate global minimum point in a finite number of steps. More recently, it has been proved that an approximation of the global solution can be computed in polynomial time (see for example [38, 37, 39, 40, 22]). Furthermore, Moré [26] has considered a more general case by allowing in Problem (1) a general quadratic constraint, and has extended the results of [13, 36, 27]. All the preceding algorithms are designed to find just a global (possibly approximate) minimum point of Problem (1).
However, as indicated in [30] (see also [16]), in some methods proposed in the field of constrained optimization it is necessary to locate also the local minimum points of Problem (1). This situation occurs, for example, in some trust-region algorithms for nonlinear programming problems, or in algorithms based on a trust-region strategy for the minimization of a general differentiable function with a ball constraint [3, 42, 34, 6, 7, 29, 11, 31]. Martinez [30] has proposed an algorithm for the computation of the local-nonglobal minimum point that, unfortunately, requires the knowledge of the first two eigenvalues of the matrix $Q$.

In conclusion, due to the growing use of problems with the same structure as (1) in tackling difficult and large scale optimization problems, in particular those coming from combinatorial problems, there is still much interest in studying it. In our opinion, this motivates further research on this topic that could be exploited for defining efficient methods to locate its local and global minimum points.

The aim of this paper is to further characterize the features of Problem (1). In particular, our research develops along two lines: the study of some properties of its Karush-Kuhn-Tucker (KKT) points, and its transformation into an unconstrained minimization problem. In Section 3 we show that (i) the objective function $q(x)$ can assume at KKT points at most $3n+1$ different values; (ii) given a KKT point $\bar x$ which is not a global minimum, it is possible to find a new feasible point $\hat x$ such that the objective function is strictly decreased, i.e. $q(\hat x) < q(\bar x)$; (iii) the strict complementarity condition holds at the local-nonglobal minimum point, hence in the nonconvex case strict complementarity holds at local and global minimum points. In Section 4 we show that there is a one to one correspondence between KKT (global minimum) points of Problem (1) and stationary (global minimum) points of a piecewise quartic merit function $P : \mathbb{R}^n \to \mathbb{R}$. Therefore, Problem (1) is transformed into the following unconstrained problem:

$$\min_{x \in \mathbb{R}^n} P(x). \qquad (2)$$

In Section 5 we briefly outline possible applications of the results of Section 3 and Section 4. In particular, we give a new second order necessary condition for global minimum points of Problem (1), and we sketch possible algorithmic applications for defining algorithms both for local and global optimization. In Section 6 we consider the particular case of the $\ell_2$-norm constraint, namely the case in which the matrix $H$ is the identity matrix.

In the sequel we use the following notation. Given a vector $x \in \mathbb{R}^n$, we denote by $\|x\|$ the $\ell_2$ norm on $\mathbb{R}^n$. The $\ell_2$ norm of an $n \times n$ matrix $Q$ is defined by $\|Q\| = \sup\{\|Qx\| : \|x\| = 1\}$.

2 Preliminaries

We consider the following expression for the quadratic objective function $q : \mathbb{R}^n \to \mathbb{R}$:

$$q(x) = \tfrac{1}{2} x^T Q x + c^T x, \qquad (3)$$

where $Q$ is an $n \times n$ symmetric matrix and $c \in \mathbb{R}^n$. We denote by $\mathcal{F}$ the feasible set of Problem (1), that is

$$\mathcal{F} = \left\{x \in \mathbb{R}^n : x^T H x \le a^2\right\}.$$

We note that $H^{1/2}$ can be viewed as a nonsingular scaling matrix of the variables. Its use is dictated not only by the growing interest in scaling and preconditioning techniques in trust region methods [25], but also by the interest, in the field of combinatorial optimization, in finding the best ellipsoidal approximation of the hypercube ([20]).

The Lagrangian function associated with Problem (1) is the function

$$L(x, \lambda) = \tfrac{1}{2} x^T Q x + c^T x + \lambda\left(x^T H x - a^2\right).$$

A Karush-Kuhn-Tucker (KKT) point for Problem (1) is a pair $(\bar x, \bar\lambda) \in \mathbb{R}^n \times \mathbb{R}$ such that

$$(Q + 2\bar\lambda H)\bar x = -c, \qquad \bar\lambda\left(\bar x^T H \bar x - a^2\right) = 0, \qquad \bar\lambda \ge 0, \qquad \bar x^T H \bar x \le a^2. \qquad (4)$$

Furthermore, we say that strict complementarity holds at a KKT pair $(\bar x, \bar\lambda)$ if $\bar\lambda > 0$ when $\bar x^T H \bar x = a^2$.
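In numerical work it is convenient to test conditions (4) directly. The following sketch is our own illustration (the function name and the tolerance are not from the paper):

```python
import numpy as np

def is_kkt_pair(Q, H, c, a, x, lam, tol=1e-8):
    """Numerical check of the KKT conditions (4):
    (Q + 2*lam*H) x = -c,  lam*(x^T H x - a^2) = 0,  lam >= 0,  x^T H x <= a^2."""
    g = x @ H @ x - a**2                                        # constraint value
    return (np.linalg.norm((Q + 2.0 * lam * H) @ x + c) <= tol  # stationarity
            and abs(lam * g) <= tol                             # complementarity
            and lam >= -tol and g <= tol)                       # feasibility
```

For instance, when the unconstrained minimizer of a convex $q$ lies inside the ellipsoid, the pair formed by that minimizer and the multiplier $0$ passes the check.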

It is well known that it is possible to completely characterize the global solution of Problem (1) without requiring any convexity assumption on the objective function. In fact, the following result, due to Gay [13] and Sorensen [36], holds (see also Vavasis [37]).

Proposition 2.1 A point $x^*$ such that $x^{*T} H x^* \le a^2$ is an optimal solution of Problem (1) if and only if there exists a unique $\lambda^* \ge 0$ such that the pair $(x^*, \lambda^*)$ satisfies the KKT conditions

$$(Q + 2\lambda^* H)x^* = -c, \qquad \lambda^*\left(x^{*T} H x^* - a^2\right) = 0,$$

and the matrix $(Q + 2\lambda^* H)$ is positive semidefinite. If $(Q + 2\lambda^* H)$ is positive definite, then Problem (1) has a unique global solution.
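Proposition 2.1 also suggests a practical procedure: on the half-line of multipliers where $Q + 2\lambda H$ is positive definite, the quantity $x(\lambda)^T H x(\lambda)$ with $x(\lambda) = -(Q + 2\lambda H)^{-1}c$ is monotonically decreasing, so $\lambda^*$ can be bracketed and bisected. The sketch below is entirely our own illustration, not an algorithm from the paper; the degenerate "hard case" (where $c$ is orthogonal to the critical eigenspace) is deliberately not handled.

```python
import numpy as np

def global_min_qcqp(Q, H, c, a, iters=200):
    """Bisection sketch based on Proposition 2.1: search for lam >= max(0, -mu1/2),
    where mu1 is the smallest generalized eigenvalue of (Q, H), so that
    Q + 2*lam*H stays positive (semi)definite.  Hard case not handled."""
    L = np.linalg.cholesky(H)
    Linv = np.linalg.inv(L)
    mu1 = np.linalg.eigvalsh(Linv @ Q @ Linv.T).min()

    def x_of(lam):
        return np.linalg.solve(Q + 2.0 * lam * H, -c)

    if mu1 > 0:                                # Q positive definite: try the
        x = x_of(0.0)                          # unconstrained minimizer first
        if x @ H @ x <= a**2:
            return x, 0.0
    lo = max(0.0, -mu1 / 2.0) + 1e-12          # just inside the PD region
    hi = lo + 1.0
    while x_of(hi) @ H @ x_of(hi) > a**2:      # bracket the boundary multiplier
        hi = 2.0 * hi + 1.0
    for _ in range(iters):                     # x(lam)^T H x(lam) is monotone
        mid = 0.5 * (lo + hi)
        if x_of(mid) @ H @ x_of(mid) > a**2:
            lo = mid
        else:
            hi = mid
    return x_of(hi), hi
```

Bisection is chosen here only for transparency; Newton-type iterations on the secular equation are the standard refinement.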

Martinez [30] gave a characterization of the local-nonglobal minimum points of Problem (1). For the sake of simplicity we report the result in the case $H = I$; the extension to the ellipsoidal case can be simply obtained by considering the transformation $y = H^{1/2} x$.

Proposition 2.2 There exists at most one local-nonglobal minimum point $\bar x$ of Problem (1). Moreover we have $\|\bar x\|^2 = a^2$, and the KKT necessary conditions (for the case $H = I$) hold with $\bar\lambda \in (-\lambda_2, -\lambda_1)$, where $\lambda_1 < \lambda_2$ are the first two eigenvalues of $Q$.

3 Further features of KKT points

In this section we give some properties of the KKT points of Problem (1). Our interest in the characterization of KKT points is due to the fact that, in general, algorithms for the solution of constrained problems converge towards KKT points. First, we prove that the number of sets of KKT points with different values of the objective function is bounded from above by a linear polynomial in the dimension of the problem. We state a preliminary result.

Lemma 3.1 Let $(\hat x, \bar\lambda)$ and $(\bar x, \bar\lambda)$ be KKT pairs for Problem (1) with the same KKT multiplier. Then $q(\hat x) = q(\bar x)$.

Proof We observe that the function $q(x)$ can be rewritten at every KKT pair $(x, \lambda)$ as follows:

$$q(x) = \tfrac{1}{2} c^T x - \lambda\, x^T H x.$$

By using the KKT conditions we obtain

$$q(\hat x) = \tfrac{1}{2} c^T \hat x - \bar\lambda\, \hat x^T H \hat x = -\tfrac{1}{2}\, \bar x^T (Q + 2\bar\lambda H)\, \hat x - \bar\lambda a^2 = \tfrac{1}{2} c^T \bar x - \bar\lambda\, \bar x^T H \bar x = q(\bar x).$$

Now, we can prove the following proposition.

Proposition 3.2 The number of distinct values of the objective function $q(x)$ at KKT points is bounded from above by $2n + m + 1$, where $m$ is the number of distinct negative eigenvalues of $Q$.

Proof Since $H$ is a positive definite matrix and $Q$ is symmetric, there exists an $n \times n$ nonsingular matrix $V$ such that (see [19], Corollary 7.6.5)

$$V^T H V = I, \qquad V^T Q V = \mathrm{diag}_{i=1,\dots,n}\{\lambda_i\}.$$

We can express $x$ as $x = V\gamma$, so that we can write the KKT conditions (4) as follows:

$$(Q + 2\lambda H)V\gamma = -c; \qquad \gamma^T\gamma < a^2 \text{ and } \lambda = 0, \quad \text{or} \quad \gamma^T\gamma = a^2 \text{ and } \lambda \ge 0. \qquad (5)$$

First we observe that at every KKT point $(x, \lambda)$ such that $\gamma^T\gamma < a^2$ the value of the objective function $q$ is constant. This easily follows from Lemma 3.1, by observing that all these pairs are characterized by the fact that $\lambda = 0$. Now, we consider the values of the function $q(x)$ at all the points such that $\gamma^T\gamma = a^2$. We premultiply the first equation of (5) by $V^T$ and we obtain

$$\left(\mathrm{diag}_{i=1,\dots,n}\{\lambda_i\} + 2\lambda I\right)\gamma = -V^T c.$$

Now, let us distinguish the following cases:

(a) $\lambda = -\dfrac{\lambda_i}{2}$ for some index $i$;

(b) $\lambda \ne -\dfrac{\lambda_i}{2}$ for all $i$.

First, let us consider case (a). In this case, as the matrix $(\mathrm{diag}_i\{\lambda_i\} + 2\lambda I)$ does not have full rank, conditions (5) do not uniquely characterize the vector $\gamma$, and hence there may exist different vectors $\gamma$ which satisfy (5). These vectors have the same KKT multiplier and hence, by Lemma 3.1, they are characterized by the same function value. Since $\lambda \ge 0$, case (a) can occur only in correspondence with negative eigenvalues; therefore, the number of distinct values that the objective function can assume at KKT points that fall under case (a) is at most the number of distinct negative eigenvalues of $V^T Q V$; since $V$ is nonsingular, $V^T Q V$ has the same number of distinct negative eigenvalues as $Q$.

Now we consider case (b), that is $\lambda \ne -\frac{\lambda_i}{2}$ for all $i$; in this case, since the matrix $(\mathrm{diag}_i\{\lambda_i\} + 2\lambda I)$ is nonsingular, the vector $\gamma$ is uniquely characterized by (5). Hence, we can write

$$(\lambda_i + 2\lambda)\,\gamma_i = -\hat c_i, \qquad i = 1, \dots, n, \qquad (6)$$

$$\sum_{i=1}^n \gamma_i^2 = a^2, \qquad (7)$$

where $\hat c_i = (V^T c)_i$. From (6) we obtain

$$\gamma_i = -\frac{\hat c_i}{\lambda_i + 2\lambda}.$$

By replacing this value of $\gamma_i$ in (7), we have that the multiplier $\lambda$ must satisfy

$$\sum_{i=1}^n \frac{\hat c_i^2}{(\lambda_i + 2\lambda)^2} = a^2,$$

which can be rewritten as

$$a^2 \prod_{j=1}^n (\lambda_j + 2\lambda)^2 \;-\; \sum_{i=1}^n \hat c_i^2 \prod_{\substack{j=1 \\ j \ne i}}^n (\lambda_j + 2\lambda)^2 = 0. \qquad (8)$$

From (8), it follows that the values of $\lambda$ satisfying (5) are, at most, as many as the number of zeroes of the polynomial (8), whose degree is $2n$. Therefore,

the number of KKT points which fall under case (b) is at most $2n$, and this also bounds the number of possible distinct values assumed by the objective function at these points. Finally, by summarizing all the possible cases, we can conclude that the number of distinct values that the objective function can assume at KKT points is bounded above by $2n + m + 1$.

Remark Since $m \le n$, we note that the number of distinct values of the objective function is bounded above by $3n + 1$.

The peculiarity of Problem (1) allows us to show another interesting property of its KKT points: it is possible to escape from the KKT points that are not global solutions. In the following proposition we show that, whenever we have a KKT point $\bar x$, either $\bar x$ is a global minimum point of Problem (1), or it is possible to compute the expression of a feasible point with a strictly lower value of the objective function. In fact, if $\bar x$ is not a global minimum point, Proposition 2.1 ensures that there exists a direction $z$ such that $z^T(Q + 2\bar\lambda H)z < 0$, and by using this direction it is immediate to determine a feasible point $\hat x$ such that $q(\hat x) < q(\bar x)$.

Proposition 3.3 Let $(\bar x, \bar\lambda)$ be a KKT pair for Problem (1) such that $\bar x$ is not a global minimum point, and let $z \in \mathbb{R}^n$ be a vector such that $z^T(Q + 2\bar\lambda H)z < 0$. Let us define the point $\hat x$ in the following manner:

(i) if $\bar x^T H \bar x < a^2$, $\hat x = \bar x + \beta z$ with

$$0 < \beta \le \frac{-z^T H \bar x + \left[(z^T H \bar x)^2 + (a^2 - \bar x^T H \bar x)\, z^T H z\right]^{1/2}}{z^T H z};$$

(ii) if $\bar x^T H \bar x = a^2$ and $\bar x^T H z \ne 0$,

$$\hat x = \bar x - 2\,\frac{\bar x^T H z}{z^T H z}\, z;$$

(iii) if $\bar x^T H \bar x = a^2$, $\bar x^T H z = 0$ and $c^T \bar x > 0$, $\hat x = -\bar x$;

(iv) if $\bar x^T H \bar x = a^2$, $\bar x^T H z = 0$ and $c^T \bar x \le 0$,

$$\hat x = \bar x - 2\,\frac{\bar x^T H (\bar x + \alpha z)}{(\bar x + \alpha z)^T H (\bar x + \alpha z)}\,(\bar x + \alpha z)$$

with

$$\alpha > \frac{-c^T z + \left[(c^T z)^2 + |c^T \bar x|\,|z^T(Q + 2\bar\lambda H)z|\right]^{1/2}}{|z^T(Q + 2\bar\lambda H)z|}.$$

Then we have $q(\hat x) < q(\bar x)$ and $\hat x^T H \hat x \le a^2$.

Proof Since $\bar x$ is not a global minimum, by Proposition 2.1 there exists a vector $z \in \mathbb{R}^n$ such that $z^T(Q + 2\bar\lambda H)z < 0$.

In case (i), we have by the KKT conditions that $\bar\lambda = 0$; hence $z$ is a direction of negative curvature for $q(x)$ and $\nabla q(\bar x) = Q\bar x + c = 0$. Therefore, for every $\beta > 0$ the point $\hat x = \bar x + \beta z$ satisfies the inequality

$$q(\bar x + \beta z) = q(\bar x) + \tfrac{1}{2}\beta^2 z^T Q z < q(\bar x).$$

In particular, if we take $\beta \le \tilde\beta$ with

$$\tilde\beta = \frac{-z^T H \bar x + \left[(z^T H \bar x)^2 + |\bar x^T H \bar x - a^2|\, z^T H z\right]^{1/2}}{z^T H z},$$

we have that $\hat x^T H \hat x \le a^2$.

Now, let us consider case (ii). Let $\hat x$ be the vector defined as follows:

$$\hat x = \bar x - 2\,\frac{\bar x^T H z}{z^T H z}\, z,$$

and consider

$$L(x, \bar\lambda) = \tfrac{1}{2}\, x^T (Q + 2\bar\lambda H)\, x + c^T x. \qquad (9)$$

We note that $\hat x^T H \hat x = a^2$ and that $z$ is a direction of negative curvature for the quadratic function $L(x, \bar\lambda)$. By a simple calculation, taking into account that $(Q + 2\bar\lambda H)\bar x = -c$, we get

$$L(\hat x, \bar\lambda) = L(\bar x, \bar\lambda) + 2\,\frac{|\bar x^T H z|^2}{(z^T H z)^2}\; z^T (Q + 2\bar\lambda H)\, z,$$

and hence $L(\hat x, \bar\lambda) < L(\bar x, \bar\lambda)$. Recalling the expression (9), we can write

$$q(\hat x) < q(\bar x) + \bar\lambda\left(\bar x^T H \bar x - \hat x^T H \hat x\right) = q(\bar x).$$

Hence we get the result for case (ii).

Let us consider case (iii). We can take $\hat x = -\bar x$. In fact, $\hat x$ is still feasible and

$$q(\hat x) = \tfrac{1}{2}\hat x^T Q \hat x + c^T \hat x = \tfrac{1}{2}\bar x^T Q \bar x - |c^T \bar x| = q(\bar x) - 2|c^T \bar x| < q(\bar x).$$

Now consider case (iv). Let us define the vector $s = \bar x + \alpha z$ with $\alpha > 0$. We can find a value of $\alpha$ such that $s$ is a direction of negative curvature for $L(x, \bar\lambda)$ and $s^T H \bar x \ne 0$, so that we can proceed as in case (ii). In fact, by a simple calculation we have

$$s^T H \bar x = (\bar x + \alpha z)^T H \bar x = \bar x^T H \bar x = a^2,$$

and, by using the KKT conditions,

$$s^T (Q + 2\bar\lambda H)\, s = \bar x^T (Q + 2\bar\lambda H)\bar x + \alpha^2 z^T (Q + 2\bar\lambda H)z + 2\alpha\, \bar x^T (Q + 2\bar\lambda H)z = -\alpha^2 \left|z^T (Q + 2\bar\lambda H)z\right| - 2\alpha\, c^T z + |c^T \bar x|.$$

By solving the quadratic inequality with respect to $\alpha$, we get that for all $\alpha > \bar\alpha$, $s^T (Q + 2\bar\lambda H)s < 0$, where

$$\bar\alpha = \frac{-c^T z + \left[(c^T z)^2 + |c^T \bar x|\,|z^T (Q + 2\bar\lambda H)z|\right]^{1/2}}{|z^T (Q + 2\bar\lambda H)z|}.$$

Hence, by proceeding as in case (ii), we get the result by introducing the point

$$\hat x = \bar x - 2\,\frac{\bar x^T H (\bar x + \alpha z)}{(\bar x + \alpha z)^T H (\bar x + \alpha z)}\,(\bar x + \alpha z)$$

with $\alpha > \bar\alpha$.

From the numerical point of view, the computation of a direction of negative curvature is an easy task. In fact, we can obtain such a direction by using, for example, the Bunch-Parlett decomposition [2, 28], some modified Cholesky factorizations [35] or, for large scale problems, some methods based on the Lanczos algorithm [5].

Now, as the last result of this section, we investigate a regularity property of the local and global minimum points. In particular, we focus our attention on the strict complementarity property, which roughly speaking indicates that these points are "really constrained". This property can be interesting from an algorithmic point of view. We prove that at a KKT point that is the local-nonglobal minimum point the strict complementarity condition holds. For the sake of simplicity we state the result in the case $H = I$, so that we can use directly the result of Proposition 2.2. The extension to the ellipsoidal case can be obtained by observing that, by Corollary 7.6.5 of [19], there exists an $n \times n$ nonsingular matrix $V$ such that $V^T H V = I$, $V^T Q V = \mathrm{diag}_{i=1,\dots,n}\{\lambda_i\}$, and the matrix $\mathrm{diag}_{i=1,\dots,n}\{\lambda_i\}$ has the same inertia as $Q$.

Proposition 3.4 At a KKT point that is the local-nonglobal minimum point of Problem (1), the strict complementarity condition holds.

Proof Since $\bar x$ is a local minimum point, the KKT conditions (4) hold. Moreover, the second order necessary conditions require that

$$z^T (Q + 2\bar\lambda I)\, z \ge 0 \quad \text{for all } z : z^T \bar x = 0. \qquad (10)$$

By Proposition 2.2 we have that $\bar\lambda \in (-\lambda_2, -\lambda_1)$ and $\|\bar x\|^2 = a^2$. Obviously, if $\lambda_1, \lambda_2 \ge 0$ there is no local-nonglobal minimum point. Furthermore, if $\lambda_1 < 0$ and $\lambda_2 \le 0$, necessarily $\bar\lambda > 0$. So we can restrict ourselves to the case $\lambda_1 < 0$, $\lambda_2 > 0$, since in this case $0 \in (-\lambda_2, -\lambda_1)$. Let us assume by contradiction that $\bar\lambda = 0$. From (4) and Proposition 2.2 we have that

$$Q\bar x = -c, \qquad z^T Q z \ge 0 \text{ for all } z : z^T \bar x = 0, \qquad \bar\lambda = 0, \qquad \|\bar x\|^2 = a^2.$$

Since $\bar x$ is not a global minimum point, by Proposition 2.1 there exists a direction $y$ such that $y^T Q y < 0$, and from the second order necessary conditions $y^T \bar x \ne 0$. We assume, without loss of generality, that $y^T \bar x < 0$. Let us consider the point $x(\beta) = \bar x + \beta y$ with $\beta > 0$. We prove that, for sufficiently small values of $\beta$, the point $x(\beta)$ is feasible and produces a smaller value of the objective function, thus contradicting the assumption of local optimality. In fact, we have

$$\|x(\beta)\|^2 = \|\bar x\|^2 + 2\beta\, y^T \bar x + \beta^2 \|y\|^2,$$

and hence for $\beta < 2\,|y^T \bar x| / \|y\|^2$ we obtain $\|x(\beta)\|^2 < a^2$. Moreover, since $\nabla q(\bar x) = Q\bar x + c = 0$,

$$q(x(\beta)) = q(\bar x) + \beta\,\nabla q(\bar x)^T y + \tfrac{1}{2}\beta^2 y^T Q y = q(\bar x) + \tfrac{1}{2}\beta^2 y^T Q y < q(\bar x).$$

By this proposition and by Proposition 2.1 we directly obtain the following result.

Proposition 3.5 In the nonconvex case, strict complementarity holds at every local or global minimum point.

4 Unconstrained formulation

In this section we show that Problem (1) is equivalent to the unconstrained minimization of a piecewise quartic merit function. We construct this function following the classical exact penalty approach (see for example [10]). The distinguishing features of our approach consist in exploiting the particular structure of Problem (1), in the same spirit of [15] and [24]. This allows us to define a continuously differentiable penalty function that enjoys the following properties:

- it is globally exact (in the sense of [10]) without requiring any shifted barrier term, and this significantly simplifies its expression;

- it is known, a priori, for which values of the penalty parameter the correspondence between the constrained and the unconstrained problem holds.

As a first step we recall the Hestenes-Powell-Rockafellar augmented Lagrangian function ([33, 17, 18])

$$L_a(x, \lambda; \varepsilon) = q(x) + \frac{\varepsilon}{4}\left\{\left[\max\left(0,\; \frac{2}{\varepsilon}\left(x^T H x - a^2\right) + \lambda\right)\right]^2 - \lambda^2\right\},$$

where $\lambda \in \mathbb{R}$ and $\varepsilon$ is a given positive parameter. Now, according to the classical approach, we must replace the multiplier $\lambda$ in the function $L_a(x, \lambda; \varepsilon)$ with a multiplier function $\lambda(x) : \mathbb{R}^n \to \mathbb{R}$, which yields an estimate of the multiplier associated with Problem (1) as a function of the variables $x$. In the literature, different multiplier functions have been proposed ([12, 14, 9, 10, 24]). Here we define a new, simpler multiplier function which exploits the particular structure of Problem (1), whose expression is the following:

$$\lambda(x) = -\frac{1}{2a^2}\left(x^T Q x + c^T x\right). \qquad (11)$$

The properties of the multiplier function are summarized in the following proposition.

Proposition 4.1

(i) $\lambda(x)$ is continuously differentiable with gradient

$$\nabla\lambda(x) = -\frac{1}{2a^2}\left(2Qx + c\right).$$

(ii) If $(\bar x, \bar\lambda)$ is a KKT point for Problem (1), then $\lambda(\bar x) = \bar\lambda$.

(iii) For every $x \in \mathbb{R}^n$ we have

$$x^T \nabla_x L(x, \lambda(x)) = 2\lambda(x)\left(x^T H x - a^2\right). \qquad (12)$$

Proof Part (i) easily follows from the definition of the multiplier function (11). As regards part (ii), from (4) we have that a KKT pair $(\bar x, \bar\lambda)$ satisfies

$$\bar x^T Q \bar x + c^T \bar x + 2\bar\lambda\, \bar x^T H \bar x = 0. \qquad (13)$$

It is easy to see that if $\bar x^T H \bar x = a^2$, (13) corresponds exactly to the definition of the multiplier function (11). Otherwise, if $\bar x^T H \bar x < a^2$, (4) implies that $\bar\lambda = 0$, and hence we get from the expression (11) that $\lambda(\bar x) = 0$.

Now let us consider part (iii). By simple calculations we have

$$x^T \nabla_x L(x, \lambda(x)) = x^T Q x + c^T x - \frac{x^T H x}{a^2}\left(x^T Q x + c^T x\right) = -\frac{1}{2a^2}\left(x^T Q x + c^T x\right)\cdot 2\left(x^T H x - a^2\right) = 2\lambda(x)\left(x^T H x - a^2\right).$$

On the basis of the previous considerations, we can replace the multiplier $\lambda$ in the function $L_a$ with the multiplier function $\lambda(x)$. Furthermore, as regards the penalty parameter $\varepsilon$, we can select a priori an interval of suitable values depending on the problem data $Q, H, c, a$. Therefore, we are now ready to define our merit function $P(x) = L_a(x, \lambda(x); \varepsilon(Q, H, c, a))$, that is

$$P(x) = q(x) + \frac{\varepsilon}{4}\left\{\left[\max\left(0,\; \frac{2}{\varepsilon}\left(x^T H x - a^2\right) + \lambda(x)\right)\right]^2 - \lambda(x)^2\right\}, \qquad (14)$$

where $\lambda(x)$ is the quadratic function given by (11) and $\varepsilon$ is any parameter that satisfies the following inequality:

$$0 < \varepsilon < \frac{16\, a^4 \sigma_1}{a^2\left(8\|Q\| + 3\right) + \|c\|^2\left(2\sigma_1 + 3\|H\|\right)}, \qquad (15)$$

where $\sigma_1$ denotes the minimum eigenvalue of the positive definite matrix $H$. First, we show some immediate properties of the merit function $P$.

Proposition 4.2

(i) $P(x)$ is continuously differentiable with gradient

$$\nabla P(x) = Qx + c - \frac{\varepsilon}{2}\lambda(x)\nabla\lambda(x) + \frac{\varepsilon}{2}\max\left\{0,\; \frac{2}{\varepsilon}\left(x^T H x - a^2\right) + \lambda(x)\right\}\left[\frac{4}{\varepsilon}Hx + \nabla\lambda(x)\right]; \qquad (16)$$

(ii) $P(x)$ is twice continuously differentiable except at points where

$$\frac{2}{\varepsilon}\left(x^T H x - a^2\right) + \lambda(x) = 0;$$

(iii) for every $x$ such that $x^T H x \le a^2$ we have $P(x) \le q(x)$;

(iv) the penalty function $P(x)$ is coercive, and hence it admits a global minimum point.

Proof Parts (i) and (ii) follow directly from the expression of the penalty function $P$. As regards part (iii), we distinguish the cases (a) $\frac{2}{\varepsilon}(x^T H x - a^2) + \lambda(x) \le 0$ and (b) $\frac{2}{\varepsilon}(x^T H x - a^2) + \lambda(x) > 0$. In case (a) we have that

$$P(x) = q(x) - \frac{\varepsilon}{4}\lambda(x)^2,$$

and hence we get the result. In case (b), the expression of the penalty function becomes

$$P(x) = q(x) + \frac{1}{\varepsilon}\left(x^T H x - a^2\right)^2 + \lambda(x)\left(x^T H x - a^2\right).$$

Taking into account that $x^T H x \le a^2$, in case (b) we have

$$0 < x^T H x - a^2 + \frac{\varepsilon}{2}\lambda(x) \le \frac{x^T H x - a^2}{2} + \frac{\varepsilon}{2}\lambda(x),$$

and hence

$$\left(x^T H x - a^2\right)^2 + \varepsilon\lambda(x)\left(x^T H x - a^2\right) \le 0.$$

As regards part (iv), we want to show that $P(x) \to \infty$ as $\|x\| \to \infty$. First, we observe that

$$\frac{2}{\varepsilon}\left(x^T H x - a^2\right) + \lambda(x) \ge \left(\frac{2}{\varepsilon}\sigma_1 - \frac{1}{2a^2}\|Q\|\right)\|x\|^2 - \frac{1}{2a^2}\|c\|\|x\| - \frac{2}{\varepsilon}a^2;$$

hence, recalling that by (15) we have $\varepsilon \le \frac{4a^2\sigma_1}{\|Q\|}$, the leading term is strictly positive, and for sufficiently large values of $\|x\|$ we can assume that

$$\max\left\{0,\; \frac{2}{\varepsilon}\left(x^T H x - a^2\right) + \lambda(x)\right\} = \frac{2}{\varepsilon}\left(x^T H x - a^2\right) + \lambda(x).$$

Hence, by a simple calculation, for sufficiently large values of $\|x\|$ the expression of the penalty function becomes

$$P(x) = \frac{1}{2}x^T Q x + c^T x + \frac{1}{\varepsilon}\left(x^T H x - a^2\right)^2 + \lambda(x)\left(x^T H x - a^2\right).$$

The following inequalities hold:

$$P(x) \ge -\frac{1}{2}\|Q\|\|x\|^2 - \|c\|\|x\| + \frac{1}{\varepsilon}\left[\left(x^T H x\right)^2 - 2a^2\, x^T H x + a^4\right] - \frac{1}{2a^2}\left(\|Q\|\|x\|^2 + \|c\|\|x\|\right)\left(x^T H x + a^2\right)$$

$$\ge x^T H x\left[\left(\frac{1}{\varepsilon}\sigma_1 - \frac{1}{2a^2}\|Q\|\right)\|x\|^2 - \frac{\|c\|}{2a^2}\|x\| - \frac{2a^2}{\varepsilon}\right] - \|Q\|\|x\|^2 - \frac{3}{2}\|c\|\|x\| + \frac{a^4}{\varepsilon}.$$

By (15) we have $\varepsilon \le \frac{2a^2\sigma_1}{\|Q\|}$, so that the coefficient of $\|x\|^2$ inside the square brackets is positive, and hence we get

$$\lim_{\|x\| \to \infty} P(x) = +\infty.$$

The existence of a global minimum point immediately follows from the continuity of $P$ and the compactness of its level sets.

Before studying the correspondence between stationary points of the function $P$ and KKT points of Problem (1), we report the following well known result, whose proof is immediate.

Lemma 4.3

$$\max\left\{\bar x^T H \bar x - a^2,\; -\frac{\varepsilon}{2}\lambda(\bar x)\right\} = 0$$

if and only if

$$\bar x^T H \bar x \le a^2, \qquad \lambda(\bar x) \ge 0, \qquad \lambda(\bar x)\left(\bar x^T H \bar x - a^2\right) = 0.$$

Now, we prove the first result about the exactness properties of the penalty function $P$.

Proposition 4.4 A point $\bar x \in \mathbb{R}^n$ is a stationary point of $P(x)$ if and only if $(\bar x, \lambda(\bar x))$ is a KKT pair for Problem (1). Furthermore, at this point we have $P(\bar x) = q(\bar x)$.

Proof First, we note that we can rewrite the gradient (16) as follows:

$$\nabla P(x) = \nabla_x L(x, \lambda(x)) + \nabla\lambda(x)\max\left\{x^T H x - a^2,\; -\frac{\varepsilon}{2}\lambda(x)\right\} + \frac{4}{\varepsilon}Hx\,\max\left\{x^T H x - a^2,\; -\frac{\varepsilon}{2}\lambda(x)\right\}. \qquad (17)$$

Only if part. Assume that $(\bar x, \lambda(\bar x))$ is a KKT pair for Problem (1); then, by recalling Lemma 4.3, the result easily follows by observing that the gradient $\nabla P(\bar x)$ can be rewritten in the form (17).

If part. In order to prove this part, it is enough to show that

$$\max\left\{\bar x^T H \bar x - a^2,\; -\frac{\varepsilon}{2}\lambda(\bar x)\right\} = 0. \qquad (18)$$

In fact, from the expression (17) of the gradient of $P$ we then have immediately that

$$0 = \nabla P(\bar x) = \nabla_x L(\bar x, \lambda(\bar x)),$$

which, together with Lemma 4.3, implies that the pair $(\bar x, \lambda(\bar x))$ is a KKT pair of Problem (1).

We turn to prove (18). We distinguish the cases $\bar x = 0$ and $\bar x \ne 0$. The point $\bar x = 0$ is a stationary point of $P$ if and only if $c = 0$; on the other hand, the point $\bar x = 0$ is a KKT point for Problem (1) if and only if $c = 0$, and in this case (18) trivially holds. Now, we assume $\bar x \ne 0$. By (16) we can write

$$\varepsilon\, x^T \nabla P(x) = \varepsilon\, x^T \nabla_x L(x, \lambda(x)) + \left[\varepsilon\, x^T \nabla\lambda(x) + 4\, x^T H x\right]\max\left\{x^T H x - a^2,\; -\frac{\varepsilon}{2}\lambda(x)\right\}$$

$$= 2\varepsilon\lambda(x)\left(x^T H x - a^2\right) + \left[\varepsilon\, x^T \nabla\lambda(x) + 4\, x^T H x\right]\max\left\{x^T H x - a^2,\; -\frac{\varepsilon}{2}\lambda(x)\right\}.$$

Hence, taking into account that

$$2\varepsilon\lambda(x)\left(x^T H x - a^2\right) = \left[2\varepsilon\lambda(x) - 4\left(x^T H x - a^2\right)\right]\max\left\{x^T H x - a^2,\; -\frac{\varepsilon}{2}\lambda(x)\right\} + 4\max\left\{x^T H x - a^2,\; -\frac{\varepsilon}{2}\lambda(x)\right\}^2,$$

it easily follows that

$$\varepsilon\, x^T \nabla P(x) = \max\left\{x^T H x - a^2,\; -\frac{\varepsilon}{2}\lambda(x)\right\}\, M(x; \varepsilon), \qquad (19)$$

where

$$M(x; \varepsilon) = \varepsilon\, x^T \nabla\lambda(x) + 2\varepsilon\lambda(x) + 4a^2 + 4\max\left\{x^T H x - a^2,\; -\frac{\varepsilon}{2}\lambda(x)\right\}. \qquad (20)$$

Now, our aim is to show that $M(x; \varepsilon)$ is strictly positive for every $x \in \mathbb{R}^n$.

First, we consider the case $\max\left\{x^T H x - a^2, -\frac{\varepsilon}{2}\lambda(x)\right\} = x^T H x - a^2$, that is

$$x^T H x - a^2 \ge \frac{\varepsilon}{4a^2}\left(x^T Q x + c^T x\right).$$

By a simple calculation we get the inequality

$$\|x\|^2 \ge \frac{8a^4 - \varepsilon\|c\|^2}{8a^2\|H\| + \varepsilon\left(2\|Q\| + 1\right)}. \qquad (21)$$

In this case we have

$$M(x; \varepsilon) = 4\, x^T H x + \varepsilon\, x^T \nabla\lambda(x) + 2\varepsilon\lambda(x) = 4\, x^T H x - \frac{\varepsilon}{2a^2}\left(4\, x^T Q x + 3\, c^T x\right)$$

$$\ge 4\sigma_1\|x\|^2 - \frac{\varepsilon}{2a^2}\left[4\|Q\|\|x\|^2 + \frac{3}{2}\left(\|x\|^2 + \|c\|^2\right)\right] = \frac{1}{4a^2}\left\{\|x\|^2\left[16a^2\sigma_1 - \varepsilon\left(8\|Q\| + 3\right)\right] - 3\|c\|^2\varepsilon\right\}.$$

Recalling that by (15) the term $16a^2\sigma_1 - \varepsilon(8\|Q\| + 3)$ is positive, and by using (21), we can write the following inequality:

$$M(x; \varepsilon) \ge \frac{\|c\|^2\|Q\|\varepsilon^2 - 4a^2 p_1 \varepsilon + 64a^6\sigma_1}{2a^2\left[8a^2\|H\| + \varepsilon\left(2\|Q\| + 1\right)\right]}, \qquad (22)$$

where

$$p_1 = a^2\left(8\|Q\| + 3\right) + \left(2\sigma_1 + 3\|H\|\right)\|c\|^2. \qquad (23)$$

The numerator of the right-hand side of (22) is a quadratic function of $\varepsilon$ which assumes positive values in the interval $(0, \varepsilon_1)$, where

$$\varepsilon_1 = b\left(p_1 - \left[p_1^2 - q\right]^{1/2}\right), \qquad \text{with} \qquad q = 16a^2\sigma_1\|c\|^2\|Q\|, \qquad b = \frac{2a^2}{\|c\|^2\|Q\|}. \qquad (24)$$

Now, we note the following relationships:

$$\varepsilon_1 = b\left(p_1 - \left[p_1^2 - q\right]^{1/2}\right) = \frac{bq}{p_1 + \left[p_1^2 - q\right]^{1/2}} \ge \frac{bq}{2p_1} = \frac{16a^4\sigma_1}{a^2\left(8\|Q\| + 3\right) + \|c\|^2\left(2\sigma_1 + 3\|H\|\right)},$$

which, by the choice (15) of the parameter $\varepsilon$, imply that $\varepsilon \in (0, \varepsilon_1)$. Therefore, for all $x$ such that $\max\left\{x^T H x - a^2, -\frac{\varepsilon}{2}\lambda(x)\right\} = x^T H x - a^2$ and for all $\varepsilon$ satisfying (15), we get

$$M(x; \varepsilon) > 0.$$

Now, let us consider the case $\max\left\{x^T H x - a^2, -\frac{\varepsilon}{2}\lambda(x)\right\} = -\frac{\varepsilon}{2}\lambda(x)$, that is

$$x^T H x - a^2 \le \frac{\varepsilon}{4a^2}\left(x^T Q x + c^T x\right).$$

By a simple calculation, and by (15), we have that $8a^2\sigma_1 - \varepsilon(2\|Q\| + 1)$ is positive, and hence we have the inequality

$$\|x\|^2 \le \frac{8a^4 + \varepsilon\|c\|^2}{8a^2\sigma_1 - \varepsilon\left(2\|Q\| + 1\right)}. \qquad (25)$$

In this case we have that

$$M(x; \varepsilon) = \varepsilon\, x^T \nabla\lambda(x) + 4a^2 = -\frac{\varepsilon}{2a^2}\left(2\, x^T Q x + c^T x\right) + 4a^2 \ge -\frac{\varepsilon}{4a^2}\left[\left(4\|Q\| + 1\right)\|x\|^2 + \|c\|^2\right] + 4a^2 \ge \frac{-\|c\|^2\|Q\|\varepsilon^2 - 4a^2 p_2 \varepsilon + 64a^6\sigma_1}{2a^2\left[8a^2\sigma_1 - \varepsilon\left(2\|Q\| + 1\right)\right]}, \qquad (26)$$

where

$$p_2 = a^2\left(8\|Q\| + 3\right) + \|c\|^2\sigma_1.$$

The numerator of the last term of (26) is a quadratic function of $\varepsilon$ which assumes positive values in the interval $(0, \varepsilon_2)$, where

$$\varepsilon_2 = b\left(-p_2 + \left[p_2^2 + q\right]^{1/2}\right),$$

and $q, b$ are defined in (24). Let us observe that $p_1, p_2 > 0$ and $p_1 = p_2 + \|c\|^2\left(\sigma_1 + 3\|H\|\right)$, where $p_1$ is given by (23); it is easily seen that $p_2^2 + q \le p_1^2$, hence the following inequality holds:

$$\varepsilon_2 = b\left(-p_2 + \left[p_2^2 + q\right]^{1/2}\right) = \frac{bq}{p_2 + \left[p_2^2 + q\right]^{1/2}} \ge \frac{bq}{2p_1}.$$

Now, the choice (15) of the parameter $\varepsilon$ implies that $\varepsilon \in (0, \varepsilon_2)$, and hence for every $x$ such that $\max\left\{x^T H x - a^2, -\frac{\varepsilon}{2}\lambda(x)\right\} = -\frac{\varepsilon}{2}\lambda(x)$ we have

$$M(x; \varepsilon) > 0.$$

Therefore, if $\bar x$ is a point such that $\nabla P(\bar x) = 0$, recalling (19) we necessarily have that

$$\max\left\{\bar x^T H \bar x - a^2,\; -\frac{\varepsilon}{2}\lambda(\bar x)\right\} = 0,$$

which is exactly condition (18). Hence, we have proved the first part of the proposition. Now, in order to complete the proof, we have to show that $P(\bar x) = q(\bar x)$. This easily follows from Lemma 4.3, Proposition 4.1 (ii) and by observing that the penalty function can be rewritten as

$$P(x) = q(x) + \lambda(x)\max\left\{x^T H x - a^2,\; -\frac{\varepsilon}{2}\lambda(x)\right\} + \frac{1}{\varepsilon}\max\left\{x^T H x - a^2,\; -\frac{\varepsilon}{2}\lambda(x)\right\}^2.$$

Now we prove the correspondence between global minimum points of Problem (1) and global minimum points of the penalty function $P$.

Proposition 4.5 Every global minimum point of Problem (1) is a global minimum point of $P(x)$, and conversely.

Proof By Proposition 4.2 (iv), the penalty function $P$ admits a global minimum point $\hat x$, which is obviously a stationary point of $P$, and hence by the preceding proposition we have that $P(\hat x) = q(\hat x)$. On the other hand, if $x^*$ is a global minimum point of Problem (1), it is also a KKT point, and hence the preceding proposition implies again that $P(x^*) = q(x^*)$. Now, we proceed by contradiction. Assume that a global minimum point $\hat x$ of $P(x)$ is not a global minimum point of Problem (1); then there should exist a point $x^*$, global minimum of Problem (1), such that

$$P(\hat x) = q(\hat x) > q(x^*) = P(x^*),$$

which contradicts the assumption that $\hat x$ is a global minimum point of $P$. The converse follows by analogous considerations.

In order to complete the correspondence between the solutions of Problem (1) and the unconstrained minimization of the penalty function $P$, we show that every local unconstrained minimum point corresponds to a local minimum point of Problem (1).

Proposition 4.6 If $\bar x$ is a local unconstrained minimum point of $P(x)$, then $\bar x$ is a local minimum point of Problem (1) and $\lambda(\bar x)$ is the associated KKT multiplier.

Proof If $\bar x$ is a local unconstrained minimum point of $P(x)$, the pair $(\bar x, \lambda(\bar x))$ satisfies the KKT conditions for Problem (1). Moreover, by Proposition 4.4, we have that $P(\bar x) = q(\bar x)$, and hence, since $\bar x$ is a local minimum point of $P$, there exists a neighbourhood $\Omega(\bar x)$ of $\bar x$ such that

$$q(\bar x) = P(\bar x) \le P(x) \quad \text{for all } x \in \Omega(\bar x).$$

Thus, by using (iii) of Proposition 4.2, we obtain

$$q(\bar x) \le P(x) \le q(x) \quad \text{for all } x \in \Omega(\bar x) \cap \mathcal{F}, \qquad (27)$$

and hence we get the result.

Corollary 4.7 The function $P(x)$ admits at most one local-nonglobal minimum point.

Proof The proof easily follows from Proposition 2.2 and Proposition 4.6.

We now report some additional results based on a second order analysis.

Proposition 4.8 Let $(\bar x, \bar\lambda)$ be a KKT pair for Problem (1) and assume that strict complementarity holds at $(\bar x, \bar\lambda)$. Then:

(i) the function $P(x)$ is twice continuously differentiable in a neighborhood of $\bar x$, and the Hessian matrix evaluated at $\bar x$ is given by:

if $\bar x^T H \bar x < a^2$,

$$\nabla^2 P(\bar x) = Q - \frac{\varepsilon}{8a^4}\, c\, c^T;$$

if $\bar x^T H \bar x = a^2$,

$$\nabla^2 P(\bar x) = Q + 2\lambda(\bar x)H + \frac{1}{a^2}\left(c\,\bar x^T H + H\bar x\, c^T\right) + 8\left(\frac{\lambda(\bar x)}{a^2} + \frac{1}{\varepsilon}\right)H\bar x\,\bar x^T H;$$

(ii) if $\nabla^2 P(\bar x)$ is positive semidefinite (definite), then $(\bar x, \lambda(\bar x))$ is a KKT pair for Problem (1) that satisfies the second order necessary (sufficient) conditions for Problem (1).

Proof The proof of part (i) can be found in [8]. As regards part (ii), we assume that $z^T \nabla^2 P(\bar x) z \ge 0$ (respectively, $z^T \nabla^2 P(\bar x) z > 0$) for every $z \in \mathbb{R}^n$. If $\bar x^T H \bar x < a^2$, by part (i) we have that

$$z^T \nabla^2 P(\bar x)\, z = z^T Q z - \frac{\varepsilon}{8a^4}\left(c^T z\right)^2. \qquad (28)$$

By the complementarity condition $\lambda(\bar x) = 0$, and hence the second order necessary (sufficient) conditions for Problem (1) reduce to requiring that $z^T Q z \ge 0$ ($z^T Q z > 0$) for every $z \in \mathbb{R}^n$; but this obviously follows from (28). If the constraint is binding, i.e. $\bar x^T H \bar x = a^2$, we will prove that

$$z^T\left(Q + 2\lambda(\bar x)H\right)z \ge 0 \quad \text{for every } z : z^T H \bar x = 0$$

(respectively, $z^T\left(Q + 2\lambda(\bar x)H\right)z > 0$ for every $z : z^T H \bar x = 0$). Let us consider the expression of $\nabla^2 P(\bar x)$. For every $z$ such that $z^T H \bar x = 0$ we have

$$z^T \nabla^2 P(\bar x)\, z = z^T\left(Q + 2\lambda(\bar x)H\right)z,$$

and hence, by the assumption made, we get the result.

5 A hint at the applications of the unconstrained formulation

Besides their own theoretical interest, the results given in Section 3 and Section 4 can be combined to state new theoretical properties or to define new algorithms for determining global and/or local minimum points of Problem (1). In this section we give a hint of the possible applications of the results of Section 3 and Section 4.

As regards possible theoretical applications, we propose a new second order necessary optimality condition for Problem (1).

Proposition 5.1 Assume that $Q$ is not positive semidefinite. If $\bar{x}$ is a global minimum point of Problem (1), then there exists a unique multiplier $\bar{\lambda} > 0$ such that the KKT conditions (4) hold and the matrix
$$Q + 2\bar{\lambda} H + \frac{1}{a^2}\left( c \bar{x}^T H + H \bar{x} c^T \right) + 8\left( \frac{\bar{\lambda}}{a^2} + \frac{2}{\varepsilon} \right) H \bar{x} \bar{x}^T H$$
is positive semidefinite for every $\varepsilon$ satisfying (15).

Proof If $\bar{x}$ is a global minimum point of Problem (1), by Proposition 2.1 we have that $\bar{\lambda} > 0$ at every global minimum point, and hence $\bar{x}^T H \bar{x} = a^2$. Then there exists a neighborhood $\Omega(\bar{x})$ of $\bar{x}$ such that
$$\frac{2}{\varepsilon}\left( x^T H x - a^2 \right) + \lambda(x) \ne 0 \quad \text{for every } x \in \Omega(\bar{x}).$$
Thus, by (ii) of Proposition 4.2, the function $P(x)$ is twice continuously differentiable in $\Omega(\bar{x})$. By Proposition 4.5, $\bar{x}$ is also a global minimum point of $P(x)$. Therefore $\bar{x}$ satisfies the second order necessary conditions to be a global unconstrained minimum point of $P$. Then the result follows from (i) of Proposition 4.8.

We point out that the matrix
$$\frac{1}{a^2}\left( c \bar{x}^T H + H \bar{x} c^T \right) + 8\left( \frac{\bar{\lambda}}{a^2} + \frac{2}{\varepsilon} \right) H \bar{x} \bar{x}^T H$$
is not necessarily positive semidefinite. A similar result was given in [1].

The results of Section 3 and Section 4 can be appealing also from the algorithmic point of view. In fact, in the nonconvex case, by Proposition 3.5 and Proposition 4.8 we have

Proposition 5.2 If $Q$ is not positive semidefinite, then the function $P(x)$ is twice continuously differentiable in a neighborhood of every local and global minimum point.

On the basis of the preceding proposition, it is possible to minimize the function $P$ by using superlinearly (or quadratically) convergent Newton-type algorithms. A different possibility is to use unconstrained algorithms globally converging to points that satisfy the second order necessary conditions for Problem (2). In this case, Proposition 4.4 and part (ii) of Proposition 4.8 ensure that the points obtained by this class of algorithms also satisfy the second order necessary conditions for the constrained Problem (1).

Although the study of a numerical algorithm for the solution of Problem (1) is beyond the aim of this paper, we give here an idea of an algorithm to find the global minimum of Problem (1). In particular, in Proposition 3.3 we have shown that, given a KKT point which is not a global solution of Problem (1), we are able to find a new feasible point with a lower value of the objective function. In Proposition 3.2 we proved that the number of KKT points with different values of the objective function is bounded from above by $3n + 1$. Therefore we can determine the global minimum point of Problem (1) by applying at most $3n + 1$ times an algorithm that, given a starting point, locates a KKT point with a lower value of the objective function.

A possible way to realize such an algorithm is to use an unconstrained algorithm for minimizing the function $P$. In fact, starting from a point $x_0$, we can obtain a stationary point $\bar{x}$ of $P$ such that $P(\bar{x}) \le P(x_0)$. Then Proposition 4.4 ensures that $\bar{x}$ is a KKT point of Problem (1) and that $P(\bar{x}) = q(\bar{x})$. On the other hand, if $x_0$ is a feasible point, part (iii) of Proposition 4.2 yields that
$$q(\bar{x}) = P(\bar{x}) \le P(x_0) \le q(x_0).$$
In conclusion, by using this unconstrained algorithm, we get a KKT point of Problem (1) with a value of the objective function lower than the value at the starting point. All these considerations can be summarized in a formal way in the following algorithmic scheme.


Algorithmic scheme

Step 0. Let $x_0$ be a given (feasible) point.

Step 1. Find a stationary point $\bar{x}$ of the penalty function $P(x)$ starting from the point $x_0$.

Step 2. If the point $\bar{x}$ is a global minimum point of Problem (1), then stop; else go to Step 3.

Step 3. By using Proposition 3.3 and part (iii) of Proposition 4.2, find a new feasible point $\hat{x}$ such that
$$P(\hat{x}) \le q(\hat{x}) < q(\bar{x}) = P(\bar{x}).$$

Step 4. Set $x_0 = \hat{x}$ and go to Step 1.
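A minimal sketch of Steps 0-1 for the ball case $H = I$ follows. The multiplier function $\lambda(x)$ of (11) is not reproduced in this excerpt, so the code uses the illustrative choice $\lambda(x) = -x^T(Qx+c)/(2a^2)$, which coincides with the KKT multiplier at KKT points of Problem (29); the data $Q$, $c$, $a^2$, $\varepsilon$ are our own.

```python
import numpy as np

# Sketch of Steps 0-1 of the algorithmic scheme for H = I.
Q = np.diag([2.0, -2.0])        # indefinite, so the problem is nonconvex
c = np.array([-2.0, 0.0])
a2 = 4.0                        # a^2
eps = 1e-2

def q(x):
    return 0.5 * x @ Q @ x + c @ x

def lam(x):
    # Illustrative multiplier function (the paper's Eq. (11) may differ).
    return -(x @ (Q @ x + c)) / (2 * a2)

def grad_lam(x):
    return -(2 * Q @ x + c) / (2 * a2)

def P(x):
    u = (2 / eps) * (x @ x - a2) + lam(x)
    return q(x) + (eps / 4) * (max(0.0, u) ** 2 - lam(x) ** 2)

def grad_P(x):
    u = (2 / eps) * (x @ x - a2) + lam(x)
    g = Q @ x + c - (eps / 2) * lam(x) * grad_lam(x)
    if u > 0:
        g = g + (eps / 2) * u * ((4 / eps) * x + grad_lam(x))
    return g

def descend(x, iters=500):
    """Step 1: Armijo backtracking gradient descent on P."""
    for _ in range(iters):
        g = grad_P(x)
        t = 1.0
        while P(x - t * g) > P(x) - 1e-4 * t * (g @ g):
            t *= 0.5
            if t < 1e-12:
                return x
        x = x - t * g
    return x

x0 = np.zeros(2)                 # Step 0: a feasible starting point
xbar = descend(x0)
# Part (iii) of Proposition 4.2: P <= q at feasible points, hence the chain
assert P(xbar) <= P(x0) <= q(x0) + 1e-12
```

Starting from a feasible point, monotone descent on $P$ gives the chain $P(\bar{x}) \le P(x_0) \le q(x_0)$ invoked in the text; Step 3 would then apply Proposition 3.3 to escape a non-global KKT point.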

The preceding algorithmic model may be the basis for the definition of efficient algorithms that find an approximate solution of the global minimum point of Problem (1) in a finite number of steps. This will be the topic of future research.
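As a small numerical illustration of the remark following Proposition 5.1: with $H = I$, $n = 2$, and data of our own choosing, the correction matrix $\frac{1}{a^2}(c\bar{x}^T + \bar{x}c^T) + 8(\bar{\lambda}/a^2 + 2/\varepsilon)\,\bar{x}\bar{x}^T$ is indeed indefinite.

```python
import numpy as np

# With H = I, n = 2, the correction matrix appearing in Proposition 5.1,
#   (1/a^2)(c xbar^T + xbar c^T) + 8 (lam/a^2 + 2/eps) xbar xbar^T,
# need not be positive semidefinite.  Data chosen for illustration only.
a2, lam, eps = 1.0, 0.5, 0.1
xbar = np.array([1.0, 0.0])
c = np.array([0.0, 1.0])

M = ((np.outer(c, xbar) + np.outer(xbar, c)) / a2
     + 8 * (lam / a2 + 2 / eps) * np.outer(xbar, xbar))

eigs = np.linalg.eigvalsh(M)         # eigenvalues in ascending order
assert eigs[0] < 0 < eigs[1]         # indefinite: here det(M) = -1
```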

6 The $\ell_2$-norm constraint

In this section we discuss the case in which the matrix $H$ is the identity matrix. In this case the feasible set becomes
$$F_t = \{ x \in \mathbb{R}^n : \|x\|^2 \le a^2 \}$$
and the problem under consideration is
$$\min\{ q(x) : \|x\|^2 \le a^2 \}, \qquad (29)$$
where $q(x)$ is given by (3). We can restate all the results of Section 3, and we get the following:

Proposition 6.1 Let $(\bar{x}, \bar{\lambda})$ be a KKT point for Problem (29) such that $\bar{x}$ is not a global minimum point, and let $z \in \mathbb{R}^n$ be a vector such that
$$z^T ( Q + 2 \bar{\lambda} I ) z < 0.$$
Let us define the point $\hat{x}$ in the following way:

(i) if $\|\bar{x}\|^2 < a^2$,
$$\hat{x} = \bar{x} + \sigma z$$
with
$$0 < \sigma \le \frac{ -z^T \bar{x} + \left[ (z^T \bar{x})^2 + (a^2 - \|\bar{x}\|^2) \|z\|^2 \right]^{1/2} }{ \|z\|^2 };$$

(ii) if $\|\bar{x}\|^2 = a^2$ and $\bar{x}^T z \ne 0$,
$$\hat{x} = \bar{x} - 2\, \frac{\bar{x}^T z}{\|z\|^2}\, z;$$

(iii) if $\|\bar{x}\|^2 = a^2$, $\bar{x}^T z = 0$ and $c^T \bar{x} > 0$,
$$\hat{x} = -\bar{x};$$

(iv) if $\|\bar{x}\|^2 = a^2$, $\bar{x}^T z = 0$ and $c^T \bar{x} \le 0$,
$$\hat{x} = \bar{x} - 2\, \frac{\bar{x}^T (\bar{x} + \sigma z)}{\|\bar{x} + \sigma z\|^2}\, (\bar{x} + \sigma z)$$
with
$$\sigma > \frac{ -c^T z + \left[ (c^T z)^2 + |c^T \bar{x}|\, |z^T (Q + 2\bar{\lambda} I) z| \right]^{1/2} }{ |z^T (Q + 2\bar{\lambda} I) z| }.$$

Then we have $q(\hat{x}) < q(\bar{x})$ and $\|\hat{x}\|^2 \le a^2$.

Obviously, Proposition 3.2 still holds. The expression of the exact penalty function, in which $\lambda(x)$ is given by (11), becomes

$$P_t(x) = q(x) + \frac{\varepsilon}{4} \left[ \max\left\{ 0,\ \frac{2}{\varepsilon} \left( \|x\|^2 - a^2 \right) + \lambda(x) \right\}^2 - \lambda(x)^2 \right],$$
where $\varepsilon$ satisfies the following inequality:
$$0 < \varepsilon < \frac{16 a^4}{ a^2 \left( 8 \|Q\| + 3 \right) + 5 \|c\|^2 },$$
simply obtained from (15) by observing that, in this case, $\lambda_1 = 1$ and $\|I\| = 1$. The penalty function is continuously differentiable with gradient
$$\nabla P_t(x) = Qx + c - \frac{\varepsilon}{2}\, \lambda(x) \nabla \lambda(x) + \frac{\varepsilon}{2} \max\left\{ 0,\ \frac{2}{\varepsilon} \left( \|x\|^2 - a^2 \right) + \lambda(x) \right\} \left( \frac{4}{\varepsilon}\, x + \nabla \lambda(x) \right).$$
Obviously, all the results obtained in Section 4 concerning the exactness properties of the penalty function still hold.
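The escape steps of Proposition 6.1 are straightforward to implement. The sketch below covers cases (i)-(iii) (case (iv) is omitted) and checks, on two small examples of our own construction, that the new point is feasible and strictly decreases $q$:

```python
import numpy as np

def improved_point(xbar, lam, z, Q, c, a2):
    """Cases (i)-(iii) of Proposition 6.1: from a non-global KKT point
    (xbar, lam) of Problem (29) and a direction z with
    z^T (Q + 2 lam I) z < 0, build a feasible point with smaller q."""
    nx2 = xbar @ xbar
    if nx2 < a2:                                   # case (i), largest step
        s = (-(z @ xbar)
             + np.sqrt((z @ xbar) ** 2 + (a2 - nx2) * (z @ z))) / (z @ z)
        return xbar + s * z
    if xbar @ z != 0:                              # case (ii): reflection
        return xbar - 2 * (xbar @ z) / (z @ z) * z
    if c @ xbar > 0:                               # case (iii): antipode
        return -xbar
    raise NotImplementedError("case (iv) omitted in this sketch")

def q(x, Q, c):
    return 0.5 * x @ Q @ x + c @ x

# Case (i): interior KKT point (lam = 0, Q xbar + c = 0), Q indefinite.
Q1, c1, a2_1 = np.diag([2.0, -2.0]), np.array([-2.0, 0.0]), 4.0
xb1, z1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
xh1 = improved_point(xb1, 0.0, z1, Q1, c1, a2_1)
assert q(xh1, Q1, c1) < q(xb1, Q1, c1) and xh1 @ xh1 <= a2_1 + 1e-9

# Case (ii): boundary KKT point with xbar^T z != 0.
Q2, c2, a2_2 = np.diag([1.0, -1.0]), np.array([-1.5, 0.0]), 1.0
xb2, lam2 = np.array([1.0, 0.0]), 0.25   # (Q2 + 2*lam2*I) xb2 + c2 = 0
z2 = np.array([0.2, 1.0])
assert z2 @ (Q2 + 2 * lam2 * np.eye(2)) @ z2 < 0
xh2 = improved_point(xb2, lam2, z2, Q2, c2, a2_2)
assert q(xh2, Q2, c2) < q(xb2, Q2, c2) and abs(xh2 @ xh2 - a2_2) < 1e-9
```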

Remark We note that the problem of minimizing a quadratic function subject to a general strictly convex quadratic constraint,
$$\min\{ q(x) : x^T H x + 2 b^T x \le a^2 \},$$
can easily be reduced to a problem of the form (1) (and hence, when $H = I$, of the form (29)). In fact, the quadratic form $x^T H x + 2 b^T x$ has its unique global minimizer (the center) at the point $\bar{x} = -H^{-1} b$. We can therefore consider the change of coordinates that puts the origin at the center of the constraint quadratic, that is $\tilde{x} = x + H^{-1} b$. Hence, we get the minimization problem
$$\min\ \tilde{q}(\tilde{x}) = \frac{1}{2} \tilde{x}^T Q \tilde{x} + \tilde{c}^T \tilde{x} + d \qquad \text{s.t.}\quad \tilde{x}^T H \tilde{x} \le \tilde{a}^2,$$
where
$$\tilde{c} = c - Q H^{-1} b, \qquad d = \frac{1}{2} b^T H^{-1} Q H^{-1} b - c^T H^{-1} b, \qquad \tilde{a}^2 = a^2 + b^T H^{-1} b.$$
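The change of coordinates in the remark can be verified numerically. The sketch below assumes the constraint quadratic carries a factor 2 on its linear term, $x^T H x + 2 b^T x \le a^2$, so that its center is exactly $-H^{-1} b$; all data are randomly generated:

```python
import numpy as np

# Sanity check of the change of coordinates xtilde = x + H^{-1} b, under
# the assumption that the constraint reads x^T H x + 2 b^T x <= a^2
# (so that the constraint quadratic has its center at -H^{-1} b).
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
Q = (A + A.T) / 2
B = rng.standard_normal((n, n))
H = B @ B.T + n * np.eye(n)          # symmetric positive definite
b, c = rng.standard_normal(n), rng.standard_normal(n)
Hinv_b = np.linalg.solve(H, b)

c_t = c - Q @ Hinv_b                            # shifted linear term
d = 0.5 * Hinv_b @ Q @ Hinv_b - c @ Hinv_b      # constant term

for _ in range(10):
    x = rng.standard_normal(n)
    xt = x + Hinv_b
    # the objective value is preserved:
    q_x = 0.5 * x @ Q @ x + c @ x
    q_t = 0.5 * xt @ Q @ xt + c_t @ xt + d
    assert abs(q_x - q_t) < 1e-9
    # the constraint becomes a centered ellipsoid with enlarged radius:
    g_x = x @ H @ x + 2 * b @ x
    assert abs((xt @ H @ xt - b @ Hinv_b) - g_x) < 1e-9
```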

References

[1] A. Bagchi and B. Kalantari. New optimality conditions and algorithms for homogeneous and polynomial optimization over spheres. Technical Report 40-90, RUTCOR, Rutgers University, New Brunswick, NJ 08903, 1990.

[2] J. R. Bunch and B. N. Parlett. Direct methods for solving symmetric indefinite systems of linear equations. SIAM Journal on Numerical Analysis, 8:639-655, 1971.

[3] M. R. Celis, J. E. Dennis, and R. A. Tapia. A trust region strategy for nonlinear equality constrained optimization. In P. T. Boggs, R. Byrd, and R. Schnabel, editors, Numerical Optimization, pages 71-82. Society for Industrial and Applied Mathematics, Philadelphia, 1984.

[4] T. F. Coleman and C. Hempel. Computing a trust region step for a penalty function. SIAM J. on Sci. Statist. Comput., 11:180-201, 1990.

[5] J. K. Cullum and R. A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computations. Birkhäuser, 1985.

[6] J. E. Dennis, D. M. Gay, and R. E. Welsch. An adaptive nonlinear least-squares algorithm. ACM Trans. Math. Software, 7:348-368, 1981.

[7] J. E. Dennis, H. J. Martínez, and R. A. Tapia. A convergence theory for the structured BFGS secant method with an application to nonlinear least squares. Technical Report 87-15, Mathematical Sciences Dept., Rice University, 1988.

[8] G. Di Pillo and L. Grippo. A continuously differentiable exact penalty function for nonlinear programming problems with inequality constraints. SIAM Journal on Control and Optimization, 23:72-84, 1985.

[9] G. Di Pillo and L. Grippo. An exact penalty method with global convergence properties for nonlinear programming problems. Mathematical Programming, 36:1-18, 1986.

[10] G. Di Pillo and L. Grippo. Exact penalty functions in constrained optimization. SIAM Journal on Control and Optimization, 27(6):1333-1360, 1989.

[11] J. R. Engels and H. J. Martínez. Local and superlinear convergence for partially known quasi-Newton methods. SIAM Journal on Optimization, 1:42-56, 1991.

[12] R. Fletcher. A class of methods for nonlinear programming with termination and convergence properties. In J. Abadie, editor, Integer and Nonlinear Programming, pages 157-173, Amsterdam, 1970. North-Holland.

[13] D. M. Gay. Computing optimal locally constrained steps. SIAM J. on Sci. Statist. Comput., 2(2):186-197, 1981.

[14] T. Glad and E. Polak. A multiplier method with automatic limitation of penalty growth. Mathematical Programming, 17:140-155, 1979.

[15] L. Grippo and S. Lucidi. A differentiable exact penalty function for bound constrained quadratic programming problems. Optimization, 22:557-578, 1991.

[16] M. Heinkenschloss. On the solution of a two ball trust region subproblem. Mathematical Programming, 64:249-276, 1994.

[17] M. R. Hestenes. Multiplier and gradient methods. In L. A. Zadeh, L. W. Neustadt, and A. V. Balakrishnan, editors, Computing Methods in Optimization Problems, pages 143-164. Academic Press, New York, 1969.

[18] M. R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4:303-320, 1969.

[19] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

[20] A. Kamath and N. Karmarkar. A continuous approach to compute upper bounds in quadratic maximization problems with integer constraints. In C. A. Floudas and P. M. Pardalos, editors, Recent Advances in Global Optimization, pages 125-140, Princeton University, 1991. Princeton University Press.

[21] S. Kapoor and P. Vaidya. Fast algorithms for convex quadratic programming and multicommodity flows. In Proc. 18th Annual ACM Symp. Theory Comput., pages 147-159, 1986.

[22] N. Karmarkar. An interior-point approach to NP-complete problems. In Proceedings of the Mathematical Programming Society Conference on Integer Programming and Combinatorial Optimization, pages 351-366, 1990.

[23] N. Karmarkar, M. G. C. Resende, and K. G. Ramakrishnan. An interior point algorithm to solve computationally difficult set covering problems. Mathematical Programming, 52:597-618, 1991.

[24] W. Li. A differentiable piecewise quadratic exact penalty function for quadratic programs with simple bound constraints. Technical report, Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529, 1994.

[25] J. J. Moré. Recent developments in algorithms and software for trust region methods. In A. Bachem, M. Grötschel, and B. Korte, editors, Mathematical Programming: The State of the Art, pages 258-287, Bonn, 1983.

[26] J. J. Moré. Generalization of the trust region problem. Optimization Methods and Software, 2:189-209, 1993.

[27] J. J. Moré and D. C. Sorensen. Computing a trust region step. SIAM J. on Sci. Statist. Comput., 4(3):553-572, 1983.

[28] J. J. Moré and D. C. Sorensen. On the use of directions of negative curvature in a modified Newton method. Mathematical Programming, 16:1-20, 1979.

[29] J. M. Martínez. Local convergence theory of inexact Newton methods based on structured least change updates. Math. Comput., 55:143-168, 1990.

[30] J. M. Martínez. Local minimizers of quadratic functions on Euclidean balls and spheres. SIAM Journal on Optimization, 4(1):159-176, 1994.

[31] J. M. Martínez and S. A. Santos. Trust region algorithms on arbitrary domains. In TIMS-SOBRAPO, Rio de Janeiro, 1991.

[32] P. M. Pardalos, Y. Ye, and C.-G. Han. Algorithms for the solution of quadratic knapsack problems. Linear Algebra Appl., 152:69-91, 1991.

[33] M. J. D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, pages 283-298. Academic Press, New York, 1969.

[34] M. J. D. Powell and Y. Yuan. A trust region algorithm for equality constrained optimization. Mathematical Programming, 49:189-211, 1991.

[35] T. Schlick. Modified Cholesky factorization for sparse preconditioners. SIAM Journal on Scientific Computing, 14:424-445, 1993.

[36] D. C. Sorensen. Newton's method with a model trust region modification. SIAM Journal on Numerical Analysis, 19(2):409-426, 1982.

[37] S. A. Vavasis. Nonlinear Optimization. Oxford University Press, 1991.

[38] S. A. Vavasis and R. Zippel. Proving polynomial-time for sphere-constrained quadratic programming. Technical Report 90-1182, Department of Computer Science, Cornell University, Ithaca, New York, 1990.

[39] Y. Ye. A new complexity result on minimization of a quadratic function with a sphere constraint. In C. A. Floudas and P. M. Pardalos, editors, Recent Advances in Global Optimization, pages 19-31, Princeton University, 1991. Princeton University Press.

[40] Y. Ye. On affine scaling algorithms for nonconvex quadratic programming. Mathematical Programming, 56:285-300, 1992.

[41] Y. Ye and E. Tse. An extension of Karmarkar's projective algorithm for convex quadratic programming. Mathematical Programming, 44:157-179, 1989.

[42] Y. Yuan. On a subproblem of trust region algorithms for constrained optimization. Technical Report NA10, DAMTP, University of Cambridge, 1988.