Università degli Studi di Roma "La Sapienza"
Dipartimento di Informatica e Sistemistica

S. Lucidi

L. Palagi

M. Roma

On some properties of quadratic programs with a convex quadratic constraint

Tech. Rep. 33-96

Stefano Lucidi, Laura Palagi, Massimo Roma

Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", via Buonarroti 12 - 00185 Roma, Italy. E-mail: [email protected], [email protected], [email protected]

This work was partially supported by Agenzia Spaziale Italiana, Roma, Italy.

On some properties of quadratic programs with a convex quadratic constraint†

Stefano Lucidi

Laura Palagi

Massimo Roma‡

† This work was partially supported by Agenzia Spaziale Italiana, Roma, Italy.
‡ Università di Roma "La Sapienza", Dipartimento di Informatica e Sistemistica, via Buonarroti 12 - 00185 Roma, Italy, and Gruppo Nazionale per l'Analisi Funzionale e le sue Applicazioni del Consiglio Nazionale delle Ricerche.

Abstract

In this paper we consider the problem of minimizing a (possibly nonconvex) quadratic function with a quadratic constraint. We point out some new properties of the problem. In particular, in the first part of the paper, we show that (i) given a KKT point that is not a global minimizer, it is easy to find a "better" feasible point; (ii) strict complementarity holds at the local-nonglobal minimizer. In the second part, we show that the original constrained problem is equivalent to the unconstrained minimization of a piecewise quartic merit function. Using the unconstrained formulation we give, in the nonconvex case, a new second order necessary condition for global minimizers. In the third part, algorithmic applications of the preceding results are briefly outlined and some preliminary numerical experiments are reported.

Key words: quadratic function, $\ell_2$-norm constraint, merit function. AMS subject classification: 90C30, 65K05

1 Introduction

In this paper we study the problem of minimizing a general quadratic function $q: \mathbb{R}^n \to \mathbb{R}$ subject to an ellipsoidal constraint, that is

$\min\{q(x) : x^T H x \le a^2\},$   (1)

where $H$ is a symmetric positive definite $n \times n$ matrix and $a$ is a positive scalar. The interest in this problem initially arose in the context of trust region methods for solving unconstrained optimization problems. In fact, such methods require at each iteration an approximate solution of Problem (1), where $q(x)$ is a local quadratic model of the objective function over a restricted ellipsoidal region centered around the current iterate. However, recently, it has been shown that problems with the same structure as

(1) play an important role not only in the field of unconstrained minimization. In fact, the solution of Problem (1) is at the basis of algorithms for solving general constrained nonlinear problems (e.g. [3, 42, 20, 27]) and integer programming problems (e.g. [21, 41, 22, 31, 19]).

Many papers have been devoted to pointing out the specific features of Problem (1). Among the most important results are the necessary and sufficient conditions for a point to be a global minimizer, due to Gay [12] and Sorensen [35], and the characterization and uniqueness of the local-nonglobal minimizer due to Martínez [26]. The particular structure of Problem (1) has led to the development of algorithms for finding a global solution. The first algorithms proposed in the literature were those of Gay and Sorensen [12, 35]. Moré and Sorensen [29] developed an algorithm that produces an approximate global minimizer in a finite number of steps. More recently, it has been proved that an approximation to the global solution can be computed in polynomial time (see for example [39, 38, 40, 41, 21]). Furthermore, Moré [28] has considered a more general case by allowing in Problem (1) a general quadratic constraint, and has extended the results of [12, 35, 29].

In spite of all these results, there is still interest in studying Problem (1). In fact, as we mentioned before, there is a growing use of Problem (1) as a tool for tackling large nonlinear programming problems and combinatorial optimization problems. This leads to the necessity of solving large scale problems of the type (1) more and more efficiently, and motivates further research on the theoretical properties of Problem (1) and on the definition of efficient methods for locating its global minimizers. Recently, some interesting algorithms for tackling large scale trust region problems have been proposed in [36, 34, 33]. The basic idea behind these algorithms is to recast the trust region problem in terms of a parametrized eigenvalue problem and then to adjust the parameter to find an optimal solution.

In this paper we point out further theoretical properties of Problem (1). In particular, our research develops along two lines: the study of some new properties of its Karush-Kuhn-Tucker (KKT) points, and its equivalence to an unconstrained minimization problem. Besides their own theoretical interest, these results allow us to define new classes of algorithms for solving large scale trust region problems. These algorithms use only matrix-vector products and do not require the solution of an eigenvalue problem at each iteration (see [24] for details).

The paper is organized as follows. In Section 2 we recall some preliminary results. In Section 3 we show that (i) given a KKT point $x$ which is not a global minimizer, it is possible to find a new feasible point $\hat{x}$ such that the objective function is strictly decreased, i.e. $q(\hat{x}) < q(x)$; (ii) the strict complementarity condition holds at the local-nonglobal minimizer, hence, in the nonconvex case, strict complementarity holds at local and global minimizers. In Section 4 we show that there is a one-to-one correspondence between KKT points (global minimizers) of Problem (1) and stationary points (global minimizers) of

a piecewise quartic merit function $P: \mathbb{R}^n \to \mathbb{R}$. Therefore, Problem (1) is equivalent to the unconstrained minimization of $P$ over $\mathbb{R}^n$. In Section 5, by exploiting some results of the preceding sections, we give a new second order necessary condition for global minimizers of Problem (1). Finally, in Section 6, we sketch some possible applications of the results of Section 3 and Section 4 for defining new classes of algorithms for solving large scale trust region problems.

In the sequel we use the following notation. Given a vector $x \in \mathbb{R}^n$, we denote by $\|x\|$ the $\ell_2$-norm on $\mathbb{R}^n$. The $\ell_2$-norm of an $n \times n$ matrix $Q$ is defined by $\|Q\| = \sup\{\|Qx\| : \|x\| = 1\}$. Moreover, we denote by $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ the eigenvalues of $Q$.

2 Preliminaries

Without loss of generality we can assume that the feasible set $F$ is defined by

$F = \{x \in \mathbb{R}^n : \|x\|^2 \le a^2\},$

so that the problem under consideration is

$\min\{q(x) : \|x\|^2 \le a^2\},$   (2)

where $q: \mathbb{R}^n \to \mathbb{R}$ is given by

$q(x) = \tfrac{1}{2} x^T Q x + c^T x,$   (3)

and $Q$ is an $n \times n$ symmetric matrix and $c \in \mathbb{R}^n$. In fact, since $H$ is positive definite, we can reduce Problem (1) to the form (2) by employing the transformation $y = H^{1/2} x$ (however, we refer to [25] for the direct treatment of Problem (1)). The Lagrangian function associated with Problem (2) is the function

$L(x, \lambda) = \tfrac{1}{2} x^T Q x + c^T x + \lambda(\|x\|^2 - a^2).$

A Karush-Kuhn-Tucker (KKT) point for Problem (2) is a pair $(x, \lambda) \in \mathbb{R}^n \times \mathbb{R}$ such that

$(Q + 2\lambda I)x = -c, \qquad \lambda(\|x\|^2 - a^2) = 0, \qquad \lambda \ge 0, \qquad \|x\|^2 \le a^2.$   (4)

Furthermore, we say that strict complementarity holds at a KKT pair $(x, \lambda)$ if $\lambda > 0$ when $\|x\|^2 = a^2$. It is well known that it is possible to completely characterize the global solutions of Problem (2) without requiring any convexity assumption on the objective function. In fact, the following result due to Gay [12] and Sorensen [35] holds (see also Vavasis [38]):

Proposition 2.1 A point $x^*$ such that $\|x^*\|^2 \le a^2$ is a global solution of Problem (2) if and only if there exists a unique $\lambda^* \ge 0$ such that the pair $(x^*, \lambda^*)$ satisfies the KKT conditions

$(Q + 2\lambda^* I)x^* = -c, \qquad \lambda^*(\|x^*\|^2 - a^2) = 0,$

and the matrix $(Q + 2\lambda^* I)$ is positive semidefinite. If $(Q + 2\lambda^* I)$ is positive definite, then Problem (2) has a unique global solution.
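Proposition 2.1 also makes global optimality easy to verify computationally: one checks the KKT system together with the positive semidefiniteness of $Q + 2\lambda^* I$. The following minimal sketch (Python with NumPy, our own illustration and not part of the original development; the paper's experiments use MATLAB, and all names and tolerances below are assumptions) performs this check:

    import numpy as np

    def is_global_solution(Q, c, a, x, lam, tol=1e-8):
        """Check the conditions of Proposition 2.1 for a candidate pair (x, lam)."""
        n = len(x)
        M = Q + 2.0 * lam * np.eye(n)
        stationarity = np.linalg.norm(M @ x + c)        # (Q + 2*lam*I)x = -c
        complementarity = abs(lam * (x @ x - a**2))     # lam*(||x||^2 - a^2) = 0
        feasible = x @ x <= a**2 + tol
        min_eig = np.linalg.eigvalsh(M).min()           # psd test on Q + 2*lam*I
        return (stationarity <= tol and complementarity <= tol
                and lam >= -tol and feasible and min_eig >= -tol)

    # Example: a convex instance whose unconstrained minimizer is feasible.
    Q = np.diag([2.0, 4.0])
    c = np.array([-1.0, 0.0])
    a = 10.0
    x = np.linalg.solve(Q, -c)        # interior stationary point, lam = 0
    print(is_global_solution(Q, c, a, x, lam=0.0))      # True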

Moreover, Martínez [26] gave the following characterization of the local-nonglobal minimizer for Problem (2).

Proposition 2.2 There exists at most one local-nonglobal minimizer $\bar{x}$ of Problem (2). Moreover, we have $\|\bar{x}\|^2 = a^2$ and the KKT necessary conditions hold with $\bar{\lambda} \in (-\lambda_2/2, -\lambda_1/2)$, where $\lambda_1 < \lambda_2$ are the two smallest eigenvalues of $Q$.

3 Further features of KKT points

In this section we give some new properties of the KKT points of Problem (2). Our interest in the characterization of KKT points is due to the fact that, in general, algorithms for the solution of constrained problems converge towards KKT points. We show that the number of different values that the objective function can take at KKT points is bounded from above in terms of the number of negative eigenvalues of the matrix $Q$. First we state a preliminary result that extends one given in [38].

Lemma 3.1 Let $(\hat{x}, \lambda)$ and $(\bar{x}, \lambda)$ be KKT pairs for Problem (2) with the same KKT multiplier $\lambda$. Then $q(\hat{x}) = q(\bar{x})$.

Proof We observe that the function $q(x)$ can be rewritten at every KKT pair $(x, \lambda)$ as follows:

$q(x) = \tfrac{1}{2} c^T x - \lambda \|x\|^2.$

By using the KKT conditions (in particular, $c = -(Q + 2\lambda I)\hat{x} = -(Q + 2\lambda I)\bar{x}$ and, by complementarity, $\lambda\|\hat{x}\|^2 = \lambda a^2 = \lambda\|\bar{x}\|^2$) we obtain

$q(\hat{x}) = \tfrac{1}{2} c^T \hat{x} - \lambda \|\hat{x}\|^2 = -\tfrac{1}{2} \bar{x}^T (Q + 2\lambda I)\hat{x} - \lambda a^2 = \tfrac{1}{2} c^T \bar{x} - \lambda \|\bar{x}\|^2 = q(\bar{x}).$

Now, we can state the following proposition, whose proof follows from a result of Forsythe and Golub [11] on the number of stationary values of a second degree polynomial on the unit sphere. For the sake of completeness we give a sketch of the proof.

Proposition 3.2 There exist at most $\min\{2m + 2, 2n + 1\}$ KKT points with distinct multipliers $\lambda$, where $m$ is the number of distinct negative eigenvalues of $Q$.

Proof First we observe that at every KKT point $(x, \lambda)$ such that $\|x\|^2 < a^2$ the value of the objective function $q$ is constant. This easily follows from Lemma 3.1 by observing that all these pairs are characterized by the fact that $\lambda = 0$. Now, we consider the values of the function $q(x)$ at all the points such that $\|x\|^2 = a^2$. Since $Q$ is symmetric, there exists an orthogonal matrix $V$ such that $V^T Q V = \mathrm{diag}_{i=1,\dots,n}\{\lambda_i\}$, where $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ are the eigenvalues of $Q$. By considering the transformation $x = V\xi$ we can write the first equation of the KKT conditions (4) (premultiplied by $V^T$) as follows:

$\left(\mathrm{diag}_{i=1,\dots,n}\{\lambda_i\} + 2\lambda I\right)\xi = -\gamma,$   (5)

with $\gamma = V^T c$. Hence, recalling that $\|x\|^2 = \|V\xi\|^2 = a^2$, we have that the KKT multipliers must satisfy the system

$g(\lambda) = a^2, \qquad \lambda \ge 0,$   (6)

where

$g(\lambda) = \sum_{i=1}^{n} \frac{\gamma_i^2}{(\lambda_i + 2\lambda)^2}.$   (7)

The function $g(\lambda)$ has poles at $-\lambda_i/2$ and it is convex on the subintervals $(-\lambda_i/2, -\lambda_{i-1}/2)$. Thus there exist at most 2 roots of $g(\lambda) = a^2$ in each subinterval. Moreover, since $g(\lambda) \to 0$ as $\lambda \to \infty$, the equation $g(\lambda) = a^2$ has one root in each extreme subinterval. If all eigenvalues $\lambda_i$ are positive there exists at most one nonnegative root; if all the eigenvalues are negative there are at most $2n$ nonnegative roots; in the case of $m < n$ negative eigenvalues, there are at most $2m + 1$ nonnegative roots. Hence the number of solutions of system (6) is at most $\min\{2m + 1, 2n\}$. Finally, by summarizing the two cases (the interior KKT points, which share $\lambda = 0$, and the boundary roots of (6)), we can conclude that the number of distinct KKT multipliers is bounded above by $\min\{2m + 1, 2n\} + 1$.

Recalling Lemma 3.1, we get directly the following corollary.

Corollary 3.3 The number of distinct values of the objective function $q(x)$ at KKT points is bounded from above by $\min\{2m + 2, 2n + 1\}$.
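As a numerical illustration of the counting argument (again our own sketch, not part of the original development), one can sample the secular function $g(\lambda)$ of (7) on a grid and count the sign changes of $g(\lambda) - a^2$ for $\lambda \ge 0$; such a crude grid test can miss tangential (double) roots, and the grid size and test data below are arbitrary choices:

    import numpy as np

    def count_boundary_multipliers(Q, c, a, grid=200000, lam_max=None):
        eigval, V = np.linalg.eigh(Q)          # Q = V diag(eigval) V^T
        gamma = V.T @ c                        # as in the proof of Proposition 3.2
        if lam_max is None:
            lam_max = 10.0 * (abs(eigval).max() + 1.0)   # heuristic right end
        lam = np.linspace(0.0, lam_max, grid)
        denom = eigval[None, :] + 2.0 * lam[:, None]
        with np.errstate(divide="ignore"):
            g = np.sum((gamma[None, :] / denom) ** 2, axis=1)   # g(lambda), eq. (7)
        h = g - a**2
        return int(np.sum(np.sign(h[:-1]) != np.sign(h[1:])))

    Q = np.diag([-3.0, -1.0, 2.0])             # m = 2 distinct negative eigenvalues
    c = np.array([1.0, 1.0, 1.0])
    print(count_boundary_multipliers(Q, c, a=1.0))   # at most 2m + 1 = 5 roots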

Now we can state the main result of this section. In particular, we show that the peculiarity of Problem (2) can be exploited to escape from KKT points that are not global solutions, in the sense that, whenever we have a KKT point $\bar{x}$, either $\bar{x}$ is a global minimizer of Problem (2), or it is possible to compute explicitly a feasible point with a strictly lower value of the objective function. This result is very appealing from a computational point of view, as discussed in Section 6.

Proposition 3.4 Let $(\bar{x}, \bar{\lambda})$ be a KKT pair for Problem (2). Let us define the point $\hat{x}$ in the following way:

(a) If $c^T \bar{x} > 0$, then

$\hat{x} = -\bar{x}.$

(b) If $c^T \bar{x} \le 0$ and a vector $z \in \mathbb{R}^n$ such that $z^T (Q + 2\bar{\lambda} I) z < 0$ exists, then

(i) if $\|\bar{x}\|^2 < a^2$, $\hat{x} = \bar{x} + \alpha z$ with

$0 < \alpha \le \frac{-z^T \bar{x} + \left[(z^T \bar{x})^2 + (a^2 - \|\bar{x}\|^2)\|z\|^2\right]^{1/2}}{\|z\|^2};$

(ii) if $\|\bar{x}\|^2 = a^2$ and $\bar{x}^T z \ne 0$,

$\hat{x} = \bar{x} - 2\frac{\bar{x}^T z}{\|z\|^2}\, z;$

(iii) if $\|\bar{x}\|^2 = a^2$ and $\bar{x}^T z = 0$,

$\hat{x} = \bar{x} - \frac{2a^2}{a^2 + \alpha^2\|z\|^2}(\bar{x} + \alpha z)$

with

$\alpha > \frac{c^T z - \left[(c^T z)^2 + (c^T \bar{x})(z^T (Q + 2\bar{\lambda} I) z)\right]^{1/2}}{z^T (Q + 2\bar{\lambda} I) z}.$

Then we have $q(\hat{x}) < q(\bar{x})$ and $\|\hat{x}\|^2 \le a^2$.

Proof In case (a), the point $\hat{x}$ is still feasible and

$q(\hat{x}) = \tfrac{1}{2}\hat{x}^T Q \hat{x} + c^T \hat{x} = \tfrac{1}{2}\bar{x}^T Q \bar{x} - |c^T \bar{x}| = q(\bar{x}) - 2|c^T \bar{x}| < q(\bar{x}).$

Now consider case (b). In case (i) we have by the KKT conditions that $\bar{\lambda} = 0$, so that $\nabla q(\bar{x}) = 0$ and $z$ is a direction of negative curvature for $q$. Therefore, for every $\alpha > 0$ the point $\hat{x} = \bar{x} + \alpha z$ satisfies the inequality

$q(\bar{x} + \alpha z) = q(\bar{x}) + \tfrac{1}{2}\alpha^2 z^T Q z < q(\bar{x}).$

In particular, if we take $\alpha \le \tilde{\alpha}$ with

$\tilde{\alpha} = \frac{-z^T \bar{x} + \left[(z^T \bar{x})^2 + (a^2 - \|\bar{x}\|^2)\|z\|^2\right]^{1/2}}{\|z\|^2},$

we have that $\|\hat{x}\|^2 \le a^2$.

Now, let us consider case (ii). Let $\hat{x}$ be the vector defined as

$\hat{x} = \bar{x} - 2\frac{\bar{x}^T z}{\|z\|^2}\, z,$

and consider the quadratic function

$L(x, \bar{\lambda}) = \tfrac{1}{2} x^T (Q + 2\bar{\lambda} I) x + c^T x.$   (8)

We note that $\|\hat{x}\|^2 = a^2$ and that $z$ is a negative curvature direction for the quadratic function $L(x, \bar{\lambda})$. By a simple calculation, taking into account that $(Q + 2\bar{\lambda} I)\bar{x} = -c$, we get

$L(\hat{x}, \bar{\lambda}) = L(\bar{x}, \bar{\lambda}) + 2\frac{|\bar{x}^T z|^2}{\|z\|^4}\, z^T (Q + 2\bar{\lambda} I) z,$

and hence $L(\hat{x}, \bar{\lambda}) < L(\bar{x}, \bar{\lambda})$. Hence, recalling the expression (8), we can write

$q(\hat{x}) < q(\bar{x}) + \bar{\lambda}\left(\|\bar{x}\|^2 - \|\hat{x}\|^2\right) = q(\bar{x}).$

Hence we get the result for case (ii).

Let us consider case (iii). Let us define the vector $s = \bar{x} + \alpha z$ with $\alpha > 0$. We can find a value of $\alpha$ such that $s$ is a negative curvature direction for $L(x, \bar{\lambda})$ and $s^T \bar{x} \ne 0$, so that we can proceed as in case (ii). In fact, by a simple calculation we have

$s^T \bar{x} = (\bar{x} + \alpha z)^T \bar{x} = \bar{x}^T \bar{x} = a^2,$

and, by using the KKT conditions,

$s^T (Q + 2\bar{\lambda} I) s = \bar{x}^T (Q + 2\bar{\lambda} I)\bar{x} + \alpha^2 z^T (Q + 2\bar{\lambda} I) z + 2\alpha\, \bar{x}^T (Q + 2\bar{\lambda} I) z = \alpha^2 z^T (Q + 2\bar{\lambda} I) z - 2\alpha\, c^T z + |c^T \bar{x}|.$

By solving the quadratic inequality with respect to $\alpha$ we get that $s^T (Q + 2\bar{\lambda} I)s < 0$ for all $\alpha > \bar{\alpha}$, where

$\bar{\alpha} = \frac{-c^T z + \left[(c^T z)^2 + |c^T \bar{x}|\,|z^T (Q + 2\bar{\lambda} I) z|\right]^{1/2}}{|z^T (Q + 2\bar{\lambda} I) z|}.$

Hence, by proceeding as in case (ii), we get the result by introducing the point

$\hat{x} = \bar{x} - 2\frac{\bar{x}^T(\bar{x} + \alpha z)}{\|\bar{x} + \alpha z\|^2}(\bar{x} + \alpha z)$

with $\alpha > \bar{\alpha}$.

Remark We note that the local-nonglobal minimizer can correspond either to case (a) with $\|\bar{x}\|^2 = a^2$ or to case (b)(ii).
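Since the point $\hat{x}$ of Proposition 3.4 is given in closed form, the escape step can be transcribed almost literally into code. The sketch below is our own illustration (the names are assumptions, and the direction $z$ is obtained from a full eigendecomposition, which is adequate only for small $n$; cheaper alternatives are recalled in the paragraph that follows):

    import numpy as np

    def q(Q, c, x):
        return 0.5 * x @ Q @ x + c @ x

    def escape(Q, c, a, x, lam, z=None):
        """Return a feasible x_hat with q(x_hat) < q(x), per Proposition 3.4."""
        n = len(x)
        if c @ x > 0:                                   # case (a)
            return -x
        M = Q + 2.0 * lam * np.eye(n)
        if z is None:                                   # negative curvature of M
            w, V = np.linalg.eigh(M)
            if w[0] >= 0:                               # M psd: x is global (Prop. 2.1)
                return x
            z = V[:, 0]
        if x @ x < a**2:                                # case (b)(i): here lam = 0
            alpha = (-z @ x + np.sqrt((z @ x)**2 + (a**2 - x @ x) * (z @ z))) / (z @ z)
            return x + alpha * z
        if abs(x @ z) > 1e-12:                          # case (b)(ii): reflection in z
            return x - 2.0 * (x @ z) / (z @ z) * z
        # case (b)(iii): tilt z by alpha > alpha_bar, then reflect in s = x + alpha*z
        zMz = z @ M @ z
        alpha = (-(c @ z) + np.sqrt((c @ z)**2 + abs(c @ x) * abs(zMz))) / abs(zMz) + 1.0
        s = x + alpha * z
        return x - 2.0 * (x @ s) / (s @ s) * s

    Q = np.diag([-1.0, 1.0]); c = np.zeros(2); a = 1.0
    x = np.zeros(2)                       # a KKT point (lam = 0) that is not global
    x_new = escape(Q, c, a, x, lam=0.0)
    print(q(Q, c, x_new) < q(Q, c, x), x_new @ x_new <= a**2)   # True True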

The preceding proposition shows that, if the KKT point $\bar{x}$ is not a global minimizer, it is possible to determine a feasible point $\hat{x}$ such that $q(\hat{x}) < q(\bar{x})$ by computing at most one direction $z$ such that $z^T (Q + 2\bar{\lambda} I) z < 0$. The existence of such a direction is guaranteed by Proposition 2.1, and, from the numerical point of view, its computation is not an expensive task. In fact, we can obtain such a direction by using, for example, the Bunch-Parlett decomposition [2, 30], modified Cholesky factorizations [10] or, for large scale problems, methods based on the Lanczos algorithm [4].

Now, as the last result of this section, we investigate a regularity property of the local and global minimizers. In particular, we focus our attention on the strict complementarity property, which, roughly speaking, indicates that these points are "really constrained". This property too can be interesting from an algorithmic point of view.

Proposition 3.5 At the local-nonglobal minimizer of Problem (2) the strict complementarity condition holds.

Proof Since $\bar{x}$ is a local minimizer, the KKT conditions (4) hold. Moreover, the second order necessary conditions require that

$z^T (Q + 2\bar{\lambda} I) z \ge 0$ for all $z$ such that $z^T \bar{x} = 0$.   (9)

By Proposition 2.2 we have that $\bar{\lambda} \in (-\lambda_2/2, -\lambda_1/2)$ and $\|\bar{x}\|^2 = a^2$. Obviously, if $\lambda_1, \lambda_2 \ge 0$ there is no local-nonglobal minimizer. Furthermore, if $\lambda_1 < 0$ and $\lambda_2 \le 0$, necessarily $\bar{\lambda} > 0$. So we can restrict ourselves to the case $\lambda_1 < 0$, $\lambda_2 > 0$, since in this case $0 \in (-\lambda_2/2, -\lambda_1/2)$. Let us assume by contradiction that $\bar{\lambda} = 0$. From (4) and Proposition 2.2 we have that

$\nabla q(\bar{x}) = 0, \qquad z^T Q z \ge 0$ for all $z$ such that $z^T \bar{x} = 0, \qquad \bar{\lambda} = 0, \qquad \|\bar{x}\|^2 = a^2.$

Since $\bar{x}$ is not a global minimizer, by Proposition 2.1 there exists a direction $y$ such that $y^T Q y < 0$, and from the second order necessary conditions $y^T \bar{x} \ne 0$. We assume, without loss of generality, that $y^T \bar{x} < 0$. Let us consider the point $x(\alpha) = \bar{x} + \alpha y$ with $\alpha > 0$. We prove that for sufficiently small values of $\alpha$ the point $x(\alpha)$ is feasible and produces a smaller value of the objective function, thus contradicting the assumption of local optimality. In fact, we have

$\|x(\alpha)\|^2 = \|\bar{x}\|^2 + 2\alpha\, y^T \bar{x} + \alpha^2\|y\|^2,$

and hence for $\alpha < 2|y^T \bar{x}|/\|y\|^2$ we obtain $\|x(\alpha)\|^2 < a^2$. Moreover,

$q(x(\alpha)) = q(\bar{x}) + \alpha \nabla q(\bar{x})^T y + \tfrac{1}{2}\alpha^2 y^T Q y = q(\bar{x}) + \tfrac{1}{2}\alpha^2 y^T Q y < q(\bar{x}).$

By this proposition and by Proposition 2.1 we directly obtain the following result.

Proposition 3.6 In the nonconvex case, the strict complementarity condition holds at every local or global minimizer.

4 Unconstrained formulation

In this section we show that Problem (2) is equivalent to the unconstrained minimization of a piecewise quartic merit function. A general constrained optimization problem can be transformed into an unconstrained problem by defining a continuously differentiable exact penalty function following, for example, the approach proposed in [6, 7]. However, in the special case of the minimization of a quadratic function with box constraints, it has been shown in [16] and [23] that it is possible to define simpler penalty functions by exploiting the particular structure of the problem. In the same spirit of these papers, we show that also for Problem (2) it is possible to construct a particular continuously differentiable penalty function. This new penalty function takes full advantage of the peculiarities of the trust region problem and enjoys distinguishing features that make its unconstrained minimization significantly simpler in comparison with the unconstrained minimization of the penalty functions proposed in [6, 7]. The main properties of the penalty function proposed in this section are:

- it is globally exact according to the definition of [7];
- it does not require any shifted barrier term, hence it is defined on the whole space;
- it has a very simple expression (it is piecewise quartic);
- it is known, a priori, for which values of the penalty parameter the correspondence between the constrained problem and the unconstrained one holds.

As a first step towards the definition of the exact penalty function, we recall the Hestenes-Powell-Rockafellar augmented Lagrangian function [32, 18]

$L_a(x, \lambda, \varepsilon) = q(x) + \frac{\varepsilon}{4}\left\{\left[\max\left(0,\ \frac{2}{\varepsilon}(\|x\|^2 - a^2) + \lambda\right)\right]^2 - \lambda^2\right\},$

where $\lambda \in \mathbb{R}$ and $\varepsilon$ is a given positive parameter. Now, according to the classical approach, we replace the multiplier $\lambda$ in the function $L_a(x, \lambda, \varepsilon)$ with a multiplier function $\lambda(x): \mathbb{R}^n \to \mathbb{R}$, which yields an estimate of the multiplier associated with Problem (2) as a function of the variables $x$. In the literature different multiplier functions have been proposed (see e.g. [9, 13, 6, 7, 23]). However, the multiplier functions given in [9, 13, 6, 7] are not defined at the origin. Here we define a new, simpler multiplier function that is defined on the whole space $\mathbb{R}^n$, whose expression is the following:

$\lambda(x) = -\frac{1}{2a^2}\left(x^T Q x + c^T x\right).$   (10)

Its properties are summarized in the following proposition.

Proposition 4.1

(i) $\lambda(x)$ is continuously differentiable with gradient

$\nabla\lambda(x) = -\frac{1}{2a^2}(2Qx + c).$

(ii) If $(\bar{x}, \bar{\lambda})$ is a KKT pair for Problem (2), then we have $\lambda(\bar{x}) = \bar{\lambda}$.

(iii) For every $x \in \mathbb{R}^n$ we have

$x^T \nabla q(x) + 2\lambda(x)\|x\|^2 = 2\lambda(x)\left(\|x\|^2 - a^2\right).$

Proof Part (i) easily follows from the definition (10) of the multiplier function. As regards part (ii), from (4) we have that a KKT pair $(\bar{x}, \bar{\lambda})$ satisfies

$\bar{x}^T Q \bar{x} + c^T \bar{x} + 2\bar{\lambda}\|\bar{x}\|^2 = 0.$   (11)

It is easy to see that if $\|\bar{x}\|^2 = a^2$, (11) corresponds exactly to the definition (10) of the multiplier function. Otherwise, if $\|\bar{x}\|^2 < a^2$, (4) implies that $\bar{\lambda} = 0$ and hence, by comparing (11) and (10), that $\lambda(\bar{x}) = 0$. Now let us consider part (iii). By simple calculations we have

$x^T \nabla q(x) + 2\lambda(x)\|x\|^2 = x^T Q x + c^T x - \frac{\|x\|^2}{a^2}\left(x^T Q x + c^T x\right) = -\frac{1}{a^2}\left(x^T Q x + c^T x\right)\left(\|x\|^2 - a^2\right) = 2\lambda(x)\left(\|x\|^2 - a^2\right).$

On the basis of the previous considerations, we can replace the multiplier $\lambda$ in the function $L_a$ with the multiplier function $\lambda(x)$. Furthermore, as regards the penalty parameter $\varepsilon$, we can select, a priori, an interval of suitable values depending on the problem data $Q, c, a$. Therefore, we are now ready to define our merit function $P(x) = L_a(x, \lambda(x), \varepsilon(Q, c, a))$, that is

$P(x) = q(x) + \frac{\varepsilon}{4}\left\{\left[\max\left(0,\ \frac{2}{\varepsilon}(\|x\|^2 - a^2) + \lambda(x)\right)\right]^2 - \lambda^2(x)\right\},$   (12)

where $\lambda(x)$ is the quadratic function given by (10) and $\varepsilon$ is any parameter that satisfies the following inequality:

$0 < \varepsilon < \frac{16 a^4}{a^2(8\|Q\| + 3) + 5\|c\|^2}.$   (13)
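For concreteness, (10), (12) and (13) transcribe directly into code; the following minimal sketch is our own (function names are assumptions). It also checks, on a small instance, that $P$ and $q$ coincide at a KKT point, as established in Proposition 4.3 below:

    import numpy as np

    def lam(Q, c, a, x):
        """Multiplier function (10)."""
        return -(x @ Q @ x + c @ x) / (2.0 * a**2)

    def eps_max(Q, c, a):
        """Right-hand bound of (13); any eps in (0, eps_max) is admissible."""
        return 16.0 * a**4 / (a**2 * (8.0 * np.linalg.norm(Q, 2) + 3.0) + 5.0 * (c @ c))

    def P(Q, c, a, x, eps):
        """Piecewise quartic merit function (12)."""
        m = lam(Q, c, a, x)
        t = max(0.0, (2.0 / eps) * (x @ x - a**2) + m)
        return 0.5 * x @ Q @ x + c @ x + (eps / 4.0) * (t**2 - m**2)

    Q = np.diag([-1.0, 1.0]); c = np.zeros(2); a = 1.0
    x = np.array([1.0, 0.0])              # global minimizer, with multiplier 1/2
    eps = 0.5 * eps_max(Q, c, a)
    print(P(Q, c, a, x, eps), 0.5 * x @ Q @ x + c @ x)   # both equal -0.5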


First, we show some immediate properties of the merit function P .

Proposition 4.2

(i) $P(x)$ is continuously differentiable with gradient

$\nabla P(x) = Qx + c - \frac{\varepsilon}{2}\lambda(x)\nabla\lambda(x) + \frac{\varepsilon}{2}\max\left(0,\ \frac{2}{\varepsilon}(\|x\|^2 - a^2) + \lambda(x)\right)\left(\frac{4}{\varepsilon}x + \nabla\lambda(x)\right);$

(ii) $P(x)$ is twice continuously differentiable except at points where

$\frac{2}{\varepsilon}(\|x\|^2 - a^2) + \lambda(x) = 0;$

(iii) $P(x)$ is twice continuously differentiable in a neighborhood of a KKT point $\bar{x}$ where strict complementarity holds;

(iv) for every $x$ such that $\|x\|^2 \le a^2$ we have that $P(x) \le q(x)$;

(v) the penalty function $P(x)$ is coercive and hence it admits a global minimizer.

Proof Parts (i), (ii) and (iii) directly follow from the expression of the penalty function $P$. As regards part (iv), it follows from a classical result on penalty functions (see Theorem 2 of [7]). As regards part (v), we want to show that $P(x)$ goes to infinity as $\|x\| \to \infty$. First, we observe that

$\frac{2}{\varepsilon}(\|x\|^2 - a^2) + \lambda(x) \ge \left(\frac{2}{\varepsilon} - \frac{1}{2a^2}\|Q\|\right)\|x\|^2 - \frac{1}{2a^2}\|c\|\|x\| - \frac{2a^2}{\varepsilon};$

hence for sufficiently large values of $\|x\|$ the leading term of the preceding inequality is strictly positive since, recalling that $\varepsilon$ satisfies (13), we have $\varepsilon \le 4a^2/\|Q\|$. Then, for sufficiently large values of $\|x\|$, we can assume that

$\max\left(0,\ \frac{2}{\varepsilon}(\|x\|^2 - a^2) + \lambda(x)\right) = \frac{2}{\varepsilon}(\|x\|^2 - a^2) + \lambda(x).$

By a simple calculation, the expression of the penalty function becomes in this case

$P(x) = \tfrac{1}{2} x^T Q x + c^T x + \frac{1}{\varepsilon}\left(\|x\|^2 - a^2\right)^2 + \lambda(x)\left(\|x\|^2 - a^2\right),$

and the following inequality holds:

$P(x) \ge \left(\frac{1}{\varepsilon} - \frac{1}{2a^2}\|Q\|\right)\|x\|^4 - \frac{\|c\|}{2a^2}\|x\|^3 - \left(\frac{2a^2}{\varepsilon} + \|Q\|\right)\|x\|^2 - \frac{3}{2}\|c\|\|x\| + \frac{a^4}{\varepsilon}.$

As $\varepsilon$ satisfies (13), we have $\varepsilon \le 2a^2/\|Q\|$ and hence we get $\lim_{\|x\|\to\infty} P(x) = \infty$. The existence of the global minimizer immediately follows from the continuity of $P$ and the compactness of its level sets.

Now, we state the first result about the exactness properties of the penalty function $P$. Since its proof is technical and lengthy, we report it in the Appendix.

Proposition 4.3 A point $\bar{x} \in \mathbb{R}^n$ is a stationary point of $P(x)$ if and only if $(\bar{x}, \lambda(\bar{x}))$ is a KKT pair for Problem (2). Furthermore, at this point we have $P(\bar{x}) = q(\bar{x})$.

Now we prove that there is a one-to-one correspondence between global minimizers of Problem (2) and global minimizers of the penalty function $P$.

Proposition 4.4 Every global minimizer of Problem (2) is a global minimizer of $P(x)$, and conversely.

Proof By part (v) of Proposition 4.2, the penalty function $P$ admits a global minimizer $\hat{x}$, which is obviously a stationary point of $P$, and hence by the preceding proposition we have that

$P(\hat{x}) = q(\hat{x}).$

On the other hand, if $x^*$ is a global minimizer of Problem (2), it is also a KKT point, and hence the preceding proposition implies again that $P(x^*) = q(x^*)$. Now, we proceed by contradiction. Assume that a global minimizer $\hat{x}$ of $P(x)$ is not a global minimizer of Problem (2); then there should exist a point $x^*$, global minimizer of Problem (2), such that

$P(\hat{x}) = q(\hat{x}) > q(x^*) = P(x^*),$

which contradicts the assumption that $\hat{x}$ is a global minimizer of $P$. The converse is true by analogous considerations.

In order to complete the correspondence between the solutions of Problem (2) and the unconstrained minimization of the penalty function $P$, we prove the following result concerning the correspondence between local minimizers.

Proposition 4.5 The function $P(x)$ admits at most one local-nonglobal minimizer $\bar{x}$, which is a local minimizer of Problem (2), and $\lambda(\bar{x})$ is the associated KKT multiplier.

Proof We first observe that if $\bar{x}$ is a local minimizer of $P(x)$, then $\bar{x}$ is a stationary point of $P$, so that by Proposition 4.3 the pair $(\bar{x}, \lambda(\bar{x}))$ satisfies the KKT conditions for Problem (2). Moreover, again by Proposition 4.3, we have that $P(\bar{x}) = q(\bar{x})$ and hence, since $\bar{x}$ is a local minimizer of $P$, there exists a neighbourhood $\Omega(\bar{x})$ of $\bar{x}$ such that

$q(\bar{x}) = P(\bar{x}) \le P(x)$ for all $x \in \Omega(\bar{x})$.

Thus, by using (iv) of Proposition 4.2, we obtain

$q(\bar{x}) \le P(x) \le q(x)$ for all $x \in \Omega(\bar{x}) \cap F$,   (14)

and hence $\bar{x}$ is a local minimizer for Problem (2). The proof can be easily completed by recalling Proposition 2.2.

5 A new second order optimality condition

The results given in Section 3 and Section 4 can be combined to state new theoretical properties of Problem (2). In this section we introduce a new second order necessary optimality condition for Problem (2) in the nonconvex case, which follows from the unconstrained formulation.

Proposition 5.1 Assume that $Q$ is not positive semidefinite. If $\bar{x}$ is a global (local) minimizer of Problem (2), then there exists a unique $\bar{\lambda} > 0$ such that the KKT conditions (4) hold and the matrix

$Q + 2\bar{\lambda} I + \frac{1}{a^2}\left(c\bar{x}^T + \bar{x}c^T\right) + \left(\frac{8}{a^2}\bar{\lambda} + \frac{8}{\varepsilon}\right)\bar{x}\bar{x}^T$

is positive semidefinite for every $\varepsilon$ satisfying (13).

Proof If $\bar{x}$ is a global minimizer of Problem (2), by Proposition 3.6 we have that $\bar{\lambda} > 0$ and, hence, $\|\bar{x}\|^2 = a^2$. Then, by strict complementarity, there exists a neighborhood $\Omega(\bar{x})$ of $\bar{x}$ such that

$\frac{2}{\varepsilon}(\|x\|^2 - a^2) + \lambda(x) \ne 0$ for all $x \in \Omega(\bar{x})$.

Thus, by (ii) of Proposition 4.2, the function $P(x)$ is twice continuously differentiable in $\Omega(\bar{x})$, and the Hessian matrix evaluated at $\bar{x}$ is given by

$\nabla^2 P(\bar{x}) = Q + 2\lambda(\bar{x}) I + \frac{1}{a^2}\left(c\bar{x}^T + \bar{x}c^T\right) + \left(\frac{8}{a^2}\lambda(\bar{x}) + \frac{8}{\varepsilon}\right)\bar{x}\bar{x}^T.$

By Proposition 4.4, $\bar{x}$ is also a global minimizer of $P(x)$ and therefore $\bar{x}$ satisfies the second order necessary conditions for an unconstrained global minimizer of $P$, that is, $\nabla^2 P(\bar{x})$ is positive semidefinite. Then the result follows.

Recalling point (a) of Proposition 3.4, we have that at a global minimizer $\bar{x}$ it results that $c^T \bar{x} \le 0$. Hence, the matrix

$\frac{1}{a^2}\left(c\bar{x}^T + \bar{x}c^T\right) + \left(\frac{8}{a^2}\bar{\lambda} + \frac{8}{\varepsilon}\right)\bar{x}\bar{x}^T$

is not necessarily positive semidefinite. A similar second order necessary condition was given in [1], where it has been proved, without requiring any assumption on the matrix $Q$, that if the global minimum is on the boundary, then the matrix

$\nabla^2 L(\bar{x}, \bar{\lambda}) + \frac{1}{a^2}\left(c\bar{x}^T + \bar{x}c^T\right) - \frac{1}{a^4}\left(c^T \bar{x}\right)\bar{x}\bar{x}^T$

is positive semidefinite, where again the matrix $\frac{1}{a^2}(c\bar{x}^T + \bar{x}c^T) - \frac{1}{a^4}(c^T \bar{x})\bar{x}\bar{x}^T$ is not necessarily positive semidefinite.
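The condition of Proposition 5.1 is also straightforward to test numerically. In the sketch below (our own construction: the instance, the bisection on the multiplier and all tolerances are assumptions) the boundary KKT pair of a small nonconvex instance is computed and the matrix of Proposition 5.1 is checked for positive semidefiniteness:

    import numpy as np

    Q = np.diag([-2.0, 1.0]); c = np.array([0.5, 0.5]); a = 1.0

    def x_of(lam):
        return np.linalg.solve(Q + 2.0 * lam * np.eye(2), -c)

    # For lam > -lambda_1/2 = 1 the matrix Q + 2*lam*I is positive definite and
    # ||x(lam)|| decreases; bisect on lam until ||x(lam)||^2 = a^2.
    lo, hi = 1.0 + 1e-9, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if x_of(mid) @ x_of(mid) > a**2 else (lo, mid)
    lam_star = 0.5 * (lo + hi)
    x = x_of(lam_star)

    eps = 0.9 * 16.0 * a**4 / (a**2 * (8.0 * np.linalg.norm(Q, 2) + 3.0) + 5.0 * c @ c)
    H = (Q + 2.0 * lam_star * np.eye(2)
         + (np.outer(c, x) + np.outer(x, c)) / a**2
         + (8.0 * lam_star / a**2 + 8.0 / eps) * np.outer(x, x))
    print(np.linalg.eigvalsh(H).min() >= -1e-8)   # True at the global minimizer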

6 Algorithmic applications

Besides their own theoretical interest, the results of the preceding sections are appealing also from a computational point of view. Although the study of a numerical algorithm for the solution of Problem (2) is beyond the aim of this paper, in this section we give a hint of possible algorithmic applications of the results of Section 3 and Section 4. We recall that Proposition 3.4 ensures that, given a KKT point which is not a global solution of Problem (2), it is possible to find a new feasible point with a lower value of the objective function, and that Proposition 3.2 states that the number of KKT points with different values of the objective function is finite. These results indicate a new possibility for tackling large scale trust region problems. In fact, they show that a global minimum point of Problem (2) could be efficiently computed by applying a finite number of times a constrained optimization algorithm that presents the following features:

(i) given a feasible starting point, it is able to locate a KKT point with a lower value of the objective function;

(ii) it presents a "good" (at least superlinear) rate of convergence;

(iii) it does not require a heavy computational burden.

A possibility to ensure property (i) is to use any feasible method that forces the decrease of the objective function, following, for example, the approach of [37, 17]. Another possibility is to exploit the unconstrained reformulation of Problem (2) described in Section 4, which allows us to use any unconstrained method for the minimization of the penalty function $P$. In fact, starting from a point $x_0$, any such algorithm obtains a stationary point $\bar{x}$ of $P$ such that

$P(\bar{x}) < P(x_0).$

Then, Proposition 4.3 ensures that $\bar{x}$ is a KKT point of Problem (2) and that $P(\bar{x}) = q(\bar{x})$. On the other hand, if $x_0$ is a feasible point, part (iv) of Proposition 4.2 yields

$q(\bar{x}) = P(\bar{x}) < P(x_0) \le q(x_0).$

In conclusion, by using an unconstrained algorithm we get a KKT point of Problem (2) with a value of the objective function lower than the value at the starting point. Furthermore, the possibility of transforming the trust region problem into an unconstrained one seems to be quite appealing also as regards properties (ii) and (iii). In fact, Proposition 3.6 and (iii) of Proposition 4.2 guarantee that, in the nonconvex case, the penalty function is twice continuously differentiable at every local and global minimizer of the problem. Therefore, in this case, any unconstrained truncated Newton algorithm (see for example [5, 37, 15]) can be easily adapted in order to define globally convergent methods which show a superlinear rate of convergence in a neighbourhood of every global or local minimizer.
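As a quick illustration of the decrease property above (our own sketch; SciPy's BFGS merely stands in for a generic unconstrained method and is not the algorithm of [24]), minimizing $P$ from a feasible starting point returns a KKT point with a lower value of $q$:

    import numpy as np
    from scipy.optimize import minimize

    Q = np.diag([-1.0, 1.0]); c = np.array([0.1, 0.1]); a = 1.0
    eps = 0.9 * 16.0 * a**4 / (a**2 * (8.0 * np.linalg.norm(Q, 2) + 3.0) + 5.0 * c @ c)

    def q(x):
        return 0.5 * x @ Q @ x + c @ x

    def P(x):                                    # merit function (12)
        m = -(x @ Q @ x + c @ x) / (2.0 * a**2)
        t = max(0.0, (2.0 / eps) * (x @ x - a**2) + m)
        return q(x) + (eps / 4.0) * (t**2 - m**2)

    x0 = np.array([0.2, 0.2])                    # feasible starting point
    res = minimize(P, x0, method="BFGS")         # gradient by finite differences
    print(q(res.x) < q(x0), res.x @ res.x <= a**2 + 1e-6)   # True True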

Nevertheless, we can define algorithms with a superlinear rate of convergence without requiring that the penalty function be twice continuously differentiable in a neighbourhood of the points of interest, that is, without requiring strict complementarity at these points. In fact, we can draw our inspiration from the results in [8]. In particular, we can define a search direction $d_k$ as follows:

(i) if $\|x_k\|^2 - a^2 \ge -\frac{\varepsilon}{2}\lambda(x_k)$, then $d_k$ solves

$\begin{pmatrix} Q + 2\lambda(x_k) I & 2x_k \\ 2x_k^T & 0 \end{pmatrix}\begin{pmatrix} d_k \\ z_k \end{pmatrix} = -\begin{pmatrix} Qx_k + c \\ \|x_k\|^2 - a^2 \end{pmatrix};$   (15)

(ii) if $\|x_k\|^2 - a^2 < -\frac{\varepsilon}{2}\lambda(x_k)$, then $d_k$ solves

$(Q + 2\lambda(x_k) I)\, d_k = -(Qx_k + c).$   (16)
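A minimal sketch of this direction computation follows (our own transcription of (15)-(16), with $\lambda(x)$ from (10); we use dense factorizations for clarity, whereas the large scale setting of [24] solves the systems only approximately by conjugate gradient iterations):

    import numpy as np

    def direction(Q, c, a, x, eps):
        n = len(x)
        m = -(x @ Q @ x + c @ x) / (2.0 * a**2)    # multiplier function (10)
        A = Q + 2.0 * m * np.eye(n)
        if x @ x - a**2 >= -(eps / 2.0) * m:       # case (i): bordered system (15)
            K = np.block([[A, 2.0 * x[:, None]],
                          [2.0 * x[None, :], np.zeros((1, 1))]])
            rhs = -np.concatenate([Q @ x + c, [x @ x - a**2]])
            d = np.linalg.solve(K, rhs)[:n]        # discard the auxiliary z_k
        else:                                      # case (ii): Newton-type step (16)
            d = np.linalg.solve(A, -(Q @ x + c))
        return d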

The results of [8] ensure that the algorithm $x_{k+1} = x_k + d_k$ is locally superlinearly convergent without requiring strict complementarity. Following the approach of truncated Newton methods (see for example [5, 15]), in [24] it is shown that an approximate solution $\tilde{d}_k$ of (15)-(16) is able to preserve the local superlinear rate of convergence of the algorithmic scheme. Furthermore, it is also proved that this direction $\tilde{d}_k$ satisfies suitable descent conditions with respect to the penalty function $P$. This strict connection between the direction $\tilde{d}_k$ and the penalty function $P(x)$ allows us to define globally and superlinearly convergent algorithms of the type

$x_{k+1} = x_k + \alpha_k \tilde{d}_k,$   (17)

where $\alpha_k$ can be determined by any stabilization technique and $\tilde{d}_k$ is computed by using a conjugate gradient based iterative method for solving approximately the linear system (15)-(16). The paper [24] is devoted to a complete description of this approach, with the analysis of its theoretical properties and the definition of an efficient algorithm. Here, in order to get only a preliminary idea of the viability of this unconstrained approach for solving Problem (2), we have performed some numerical experiments with a rough implementation of algorithm (17), where $\alpha_k$ is determined by the line-search technique of [14] and $\tilde{d}_k$ is computed by a conjugate gradient algorithm similar to the one proposed in [5]. We coded the algorithm in MATLAB and ran the experiments on an IBM RISC 6000. We ran two sets of randomly generated problems taken from the collection of [34]. We solved ten related problems for each of the two classes, both in the easy and in the hard case. According to [34], the hard case occurs when the vector $c$ is orthogonal to the eigenspace of the smallest eigenvalue of the matrix $Q$. In Table 1 we report the results in terms of the average number of iterations for problems of increasing dimension ($n = 100, 256, 324$). We also ran a set of near hard-case problems (with $n = 100, 256, 324$), that is, with $c$ nearly orthogonal to the eigenspace of the smallest eigenvalue of $Q$. The results are reported in Table 2.

            FIRST SET               SECOND SET
   n    EASY CASE  HARD CASE   EASY CASE  HARD CASE
  100     11.3       21.9        10.7       25.6
  256     11.0       23.0        11.9       27.9
  324     12.9       23.6        12.9       29.9

Table 1: Average number of iterations

  mult. of min     1      5      10
  n = 100         8.8   10.7    9.9
  n = 256         9.1    6.9    9.1
  n = 324        11.0    9.4    9.2

Table 2: Average number of iterations (near hard case)

We tested the invariance with respect to the multiplicity of the smallest eigenvalue (multiplicity equal to 1, 5, 10). The results obtained are encouraging. The number of iterations is almost constant as the dimension increases. This feature is appealing when solving large scale problems, taking into account that, at each iteration, the main effort is due to the approximate solution of a linear system of dimension $n$ or $n - 1$ that requires only matrix-vector products. Furthermore, the efficiency of the algorithm seems not to be seriously affected by the occurrence of the hard case, while it is completely insensitive to the near-hard case. Of course, even if no final conclusion can be drawn from these limited numerical experiments, the results obtained encourage further research in defining new algorithms for solving large scale trust region problems which use the results described in this paper. In particular, as we said before, the possibility of defining efficient algorithms based on the unconstrained reformulation is investigated in [24].

Acknowledgments We wish to thank S. Santos, D. Sorensen, F. Rendl and H. Wolkowicz for providing us with their MATLAB codes and test problems. Moreover, we thank the anonymous referees for their helpful suggestions, which helped to improve the paper.

References

[1] A. Bagchi and B. Kalantari. New optimality conditions and algorithms for homogeneous and polynomial optimization over spheres. Technical Report 40-90, RUTCOR, Rutgers University, New Brunswick, NJ 08903, 1990.

[2] J.R. Bunch and B.N. Parlett. Direct methods for solving symmetric indefinite systems of linear equations. SIAM Journal on Numerical Analysis, 8:639–655, 1971.

[3] T. F. Coleman and C. Hempel. Computing a trust region step for a penalty function. SIAM Journal on Scientific and Statistical Computing, 11:180–201, 1990.

[4] J. K. Cullum and R. A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computation. Birkhäuser, 1985.

[5] R.S. Dembo and T. Steihaug. Truncated-Newton algorithms for large-scale unconstrained optimization. Mathematical Programming, 26:190–212, 1983.

[6] G. Di Pillo and L. Grippo. An exact penalty method with global convergence properties for nonlinear programming problems. Mathematical Programming, 36:1–18, 1986.

[7] G. Di Pillo and L. Grippo. Exact penalty functions in constrained optimization. SIAM Journal on Control and Optimization, 27(6):1333–1360, 1989.

[8] F. Facchinei and S. Lucidi. Quadratically and superlinearly convergent algorithms for the solution of inequality constrained optimization problems. Journal of Optimization Theory and Applications, 85(2), 1995.

[9] R. Fletcher. A class of methods for nonlinear programming with termination and convergence properties. In J. Abadie, editor, Integer and Nonlinear Programming, pages 157–173, Amsterdam, 1979. North-Holland.

[10] A. Forsgren, P.E. Gill, and W. Murray. Computing modified Newton directions using a partial Cholesky factorization. SIAM Journal on Scientific Computing, 16:139–150, 1995.

[11] G. E. Forsythe and G. H. Golub. On the stationary values of a second-degree polynomial on the unit sphere. J. Soc. Indust. Appl. Math., 13(4):1050–1068, 1965.

[12] D. M. Gay. Computing optimal locally constrained steps. SIAM Journal on Scientific and Statistical Computing, 2(2):186–197, 1981.

[13] T. Glad and E. Polak. A multiplier method with automatic limitation of penalty growth. Mathematical Programming, 17:140–155, 1979.

[14] L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis, 23:707–716, 1986.

[15] L. Grippo, F. Lampariello, and S. Lucidi. A truncated Newton method with nonmonotone line search for unconstrained optimization. Journal of Optimization Theory and Applications, 60:401–419, 1989.

[16] L. Grippo and S. Lucidi. A differentiable exact penalty function for bound constrained quadratic programming problems. Optimization, 22:557–578, 1991.

[17] M. Heinkenschloss. On the solution of a two ball trust region subproblem. Mathematical Programming, 64:249–276, 1994.

[18] M.R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4:303–320, 1969.

[19] A. Kamath and N. Karmarkar. A continuous approach to compute upper bounds in quadratic maximization problems with integer constraints. In C. A. Floudas and P. M. Pardalos, editors, Recent Advances in Global Optimization, pages 125–140, Princeton University, 1991. Princeton University Press.

[20] S. Kapoor and P. Vaidya. Fast algorithms for convex quadratic programming and multicommodity flows. In Proc. 18th Annual ACM Symp. Theory Comput., pages 147–159, 1986.

[21] N. Karmarkar. An interior-point approach to NP-complete problems. In Proceedings of the Mathematical Programming Society Conference on Integer Programming and Combinatorial Optimization, pages 351–366, 1990.

[22] N. Karmarkar, M. G. C. Resende, and K.G. Ramakrishnan. An interior point algorithm to solve computationally difficult set covering problems. Mathematical Programming, 52:597–618, 1991.

[23] W. Li. A differentiable piecewise quadratic exact penalty function for quadratic programs with simple bound constraints. Technical report, Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529, 1994.

[24] S. Lucidi and L. Palagi. A class of algorithms for large scale trust region subproblems. Technical Report (preliminary draft), Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Roma, Italy, 1996.

[25] S. Lucidi, L. Palagi, and M. Roma. Quadratic programs with quadratic constraint: characterization of KKT points and equivalence with an unconstrained problem. Technical Report 24.94, Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Roma, Italy, 1994.

[26] J. M. Martínez. Local minimizers of quadratic functions on Euclidean balls and spheres. SIAM Journal on Optimization, 4(1):159–176, 1994.

[27] J. M. Martínez and S. A. Santos. Trust region algorithms on arbitrary domains. Mathematical Programming, 68:267–302, 1995.

[28] J. J. Moré. Generalization of the trust region problem. Optimization Methods and Software, 2:189–209, 1993.

[29] J. J. Moré and D. C. Sorensen. Computing a trust region step. SIAM Journal on Scientific and Statistical Computing, 4(3):553–572, 1983.

[30] J.J. Moré and D.C. Sorensen. On the use of directions of negative curvature in a modified Newton method. Mathematical Programming, 16:1–20, 1979.

[31] P. M. Pardalos, Y. Ye, and C.-G. Han. Algorithms for the solution of quadratic knapsack problems. Linear Algebra Appl., 25:69–91, 1991.

[32] M. J. D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, pages 283–298. Academic Press, New York, 1969.

[33] F. Rendl and H. Wolkowicz. A semidefinite framework for trust region subproblems with applications to large scale minimization. Technical Report CORR 94-32, University of Waterloo, Dept. of Combinatorics and Optimization, 1994.

[34] S. A. Santos and D. C. Sorensen. A new matrix-free algorithm for the large scale trust region subproblem. Technical Report TR95-20, Rice University, Houston, TX, 1995.

[35] D. C. Sorensen. Newton's method with a model trust region modification. SIAM Journal on Numerical Analysis, 19(2):409–426, 1982.

[36] D. C. Sorensen. Minimization of a large scale quadratic function subject to an ellipsoidal constraint. Technical Report TR94-27, Rice University, Houston, TX, 1994.

[37] Ph.L. Toint. Towards an efficient sparsity exploiting Newton method for minimization. In I. S. Duff, editor, Sparse Matrices and Their Uses, pages 57–88, London, 1981. Academic Press.

[38] S. A. Vavasis. Nonlinear Optimization. Oxford University Press, 1991.

[39] S. A. Vavasis and R. Zippel. Proving polynomial-time for sphere-constrained quadratic programming. Technical Report 90-1182, Department of Computer Science, Cornell University, Ithaca, New York, 1990.

[40] Y. Ye. A new complexity result on minimization of a quadratic function with a sphere constraint. In C. A. Floudas and P. M. Pardalos, editors, Recent Advances in Global Optimization, pages 19–31, Princeton University, 1991. Princeton University Press.

[41] Y. Ye. On affine scaling algorithms for nonconvex quadratic programming. Mathematical Programming, 56:285–300, 1992.

[42] Y. Ye and E. Tse. An extension of Karmarkar's projective algorithm for convex quadratic programming. Mathematical Programming, 44:157–179, 1989.

Appendix

Before giving the proof of Proposition 4.3, we report the following well known result, whose proof is immediate.

Lemma A.1 We have

$\max\left(\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right) = 0$

if and only if

$\|x\|^2 \le a^2, \qquad \lambda(x) \ge 0, \qquad \lambda(x)\left(\|x\|^2 - a^2\right) = 0.$

Proof of Proposition 4.3 First, we note that we can rewrite the gradient of $P$ as follows:

$\nabla P(x) = \nabla q(x) + 2\lambda(x)x + \left[\nabla\lambda(x) + \frac{4}{\varepsilon}x\right]\max\left(\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right).$   (18)

If part. Assume that $(\bar{x}, \bar{\lambda})$ is a KKT pair for Problem (2); then, by recalling Lemma A.1, the result easily follows from the expression (18) of the gradient $\nabla P(x)$.

Only if part. In order to prove this part, it is enough to show that

$\max\left(\|\bar{x}\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(\bar{x})\right) = 0.$   (19)

In fact, from the expression (18) of the gradient of $P$ we then have immediately that

$0 = \nabla P(\bar{x}) = \nabla q(\bar{x}) + 2\lambda(\bar{x})\bar{x},$

which, together with Lemma A.1, implies that $(\bar{x}, \lambda(\bar{x}))$ is a KKT pair of Problem (2). We turn to prove (19). We distinguish the cases $\bar{x} = 0$ and $\bar{x} \ne 0$. The point $\bar{x} = 0$ is a stationary point of $P$ if and only if $c = 0$; on the other hand, the point $\bar{x} = 0$ is a KKT point for Problem (2) if and only if $c = 0$, and at $\bar{x} = 0$ condition (19) holds trivially. Now, we assume $\bar{x} \ne 0$. By (18) we can write, for every $x \ne 0$,

$\varepsilon x^T \nabla P(x) = \varepsilon x^T \nabla q(x) + 2\varepsilon\lambda(x)\|x\|^2 + \left[\varepsilon x^T \nabla\lambda(x) + 4\|x\|^2\right]\max\left(\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right) = 2\varepsilon\lambda(x)\left(\|x\|^2 - a^2\right) + \left[\varepsilon x^T \nabla\lambda(x) + 4\|x\|^2\right]\max\left(\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right),$

where the last equality follows from (iii) of Proposition 4.1. Hence, taking into account that

$2\varepsilon\lambda(x)\left(\|x\|^2 - a^2\right) = \left(2\varepsilon\lambda(x) - 4(\|x\|^2 - a^2)\right)\max\left(\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right) + 4\left[\max\left(\|x\|^2 - a^2,\ -\frac{\varepsilon}{2}\lambda(x)\right)\right]^2,$

"xT rP (x) = max



kxk2 ? a2; ? " (x) 2

where

M (x; ") = "xT r(x) + 2"(x) + 4a2+4 max





M (x; ")

kxk2 ? a2; ? " (x)

(20) 

: (21) 2 Now, our aim is to show that M (x;n") is strictly positiveo for every x 2 IRn . First, we consider the case max kxk2 ? a2 ; ? 2" (x) = kxk2 ? a2 that is   kxk2 ? a2  4"a2 xT Qx + cT x : By simple calculation we get the inequality 4 ? "kck2 kxk2  8a28+a "(2 (22) kQk + 1) : In this case we have   M (x; ") = 4 kxk2 + "xT r (x) + 2"(x) = 4 kxk2 ? 2"a2 4xT Qx + 3cT x    4 kxk2 ? 2"a2 4 kQk kxk2 + 23 kxk2 + kck2 h   i = 41a2 kxk2 16a2 ? "(8 kQk + 3) ? 3 kck2 " Since " satis es (13), the term 16a2 ? "(8 kQk + 3) is positive, and by using (22), we can write the following inequality: 2 2 4a2p1 " + 64a6 ; M (x; ")  " 2kac2k(8kaQ2 k+?" (2 (23) kQk + 1)) where p1 = a2 (8 kQk + 3) + 5 kck2 : (24) The numerator of the right term of (23) is a quadratic function in " which assume positive values in the interval (0; "1) where 

"1 = b p1 ? with

q = 16a2 kck2 kQk ;



p21 ? q

1 

b=

2

2a2 : kck2 kQk

Now, we note the following relationships 16a4 bq bq "1 = ? 2  1  2p = a2 (8kQk + 3) + 5kck2 ; 1 p1 + p1 ? q 2 22

(25)

which, by the choice ", imply that " 2 (0; "1). Therefore, for all o n (13) for the parameter 2 " 2 x such that max kxk ? a ; ? 2 (x) = kxk2 ? a2 and for all " satisfying (13), we get

M (x; ") > 0: o

n

Now, let us consider the case max kxk2 ? a2 ; ? 2" (x) = ? 2" (x), that is   kxk2 ? a2  4a" 2 xT Qx + cT x : By simple calculation, and by the fact that " satis es (13), we have that 8a2 ? " (2 kQk + 1) is positive and hence we have the inequality 2 4 kxk2  8a2 "?k"ck(2+kQ8ka + 1) :

(26)

  M (x; ") = "xT r(x) + 4a2 = ? 2a" 2 2xT Qx + cT x + 4a2 h i  ? 4a" 2 (4 kQk + 1) kxk2 + kck2 + 4a2 2 2 2 6  ? k2cak2 (8kQa2k +" "?(24akQpk2"++1))64a

(27)

In this case we have that

where

p2 = a2 (8 kQk + 3) + kck2 : The numerator of the last term of (27) is a quadratic function in " which assume positive values in the interval (0; "2) where 



"2 = b ?p2 + p22 + q

1  2

and q; b are already de ned by (25). Let us observe that p1 ; p2 > 0, p1 = p2 + 4 kck2 where p1 is given by (24), and it is easily seen that p22 + q  p21, hence the following inequality holds  1 q q ?p2 + p22 + q 2 = ? 2  1  2p : 1 p2 + p2 + q 2 Now, the choicen (13) for the parameter ", imply that " 2 (0; "2) and hence for every x o 2 " 2 such that max kxk ? a ; ? 2 (x) = ? 2" (x) we have

M (x; ") > 0: Therefore, if x is a point osuch that rP (x) = 0, recalling (20) we necessarily have n that max kxk2 ? a2 ; ? 2" (x) = 0; that is equivalent to the conditions (19). Hence, we have proved the rst part of the proposition. 23

Now, in order to complete the proof we have to show that P (x) = q (x). This easily follows from Lemma A.1, Proposition 4.1 (ii) and by observing that the penalty function can be rewritten as   2  P (x) = q(x) + (x) max kxk2 ? a2 ; ? 2" (x) + 1" max kxk2 ? a2 ; ? 2" (x) :

24