The Projective Method for Solving Linear Matrix Inequalities

Arkadi Nemirovski†    Pascal Gahinet‡

† Faculty of Industrial Engineering and Management, Technion, Technion City, Haifa 32000, Israel. E-mail: [email protected]
‡ INRIA Rocquencourt, Domaine de Voluceau, BP 105, 78153 Le Chesnay Cedex, France. E-mail: [email protected]

In Math. Programming Series B, Special Issue on SDP, 77 (1997), pp. 163-190

Abstract

Numerous problems in control and systems theory can be formulated in terms of linear matrix inequalities (LMI). Since solving an LMI amounts to a convex optimization problem, such formulations are known to be numerically tractable. However, the interest in LMI-based design techniques has really surged with the introduction of efficient interior-point methods for solving LMIs with a polynomial-time complexity. This paper describes one particular method called the Projective Method. Simple geometrical arguments are used to clarify the strategy and convergence mechanism of the Projective algorithm. A complexity analysis is provided, and applications to two generic LMI problems (feasibility and linear objective minimization) are discussed.

1 Introduction

Linear matrix inequality (LMI) techniques are attracting growing interest in the control community and are emerging as powerful numerical tools for the analysis and design of control systems [4]. An LMI is any matrix inequality of the form A(x) > 0 where A(x) is a symmetric matrix that depends affinely on the entries of the vector x ∈ R^n. In other words, A(x) can be written as

    A(x) = x_1 A_1 + . . . + x_n A_n + B

where

• A_1, . . . , A_n and B are given symmetric matrices,

• x = (x_1, . . . , x_n) is a vector of real scalar variables,

• > 0 stands for positive definite.
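As a concrete illustration (an addition to the text, not part of the original paper), the following Python/NumPy sketch evaluates A(x) for given data A_1, . . . , A_n, B and tests the constraint A(x) > 0 by attempting a Cholesky factorization; the toy data at the end are made up for the example.

    import numpy as np

    def evaluate_lmi(x, A_list, B):
        """Return A(x) = x_1*A_1 + ... + x_n*A_n + B for symmetric A_i, B."""
        Ax = B.copy()
        for xi, Ai in zip(x, A_list):
            Ax = Ax + xi * Ai
        return Ax

    def is_strictly_feasible(x, A_list, B):
        """Test A(x) > 0 by attempting a Cholesky factorization."""
        try:
            np.linalg.cholesky(evaluate_lmi(x, A_list, B))
            return True
        except np.linalg.LinAlgError:
            return False

    # made-up toy example: A(x) = x1*I + x2*[[0,1],[1,0]] + diag(1,-1)
    A_list = [np.eye(2), np.array([[0., 1.], [1., 0.]])]
    B = np.array([[1., 0.], [0., -1.]])
    print(is_strictly_feasible([2.0, 0.0], A_list, B))   # True:  A(x) = diag(3, 1)
    print(is_strictly_feasible([0.0, 0.0], A_list, B))   # False: A(x) = diag(1, -1)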


The entries of x are often called the decision variables. Note that the system of LMIs

    L_1(x) > 0, . . . , L_P(x) > 0

is equivalent to the single LMI L(x) := Diag(L_1(x), . . . , L_P(x)) > 0. Hence the discussion below readily extends to systems of LMIs. Two of the three generic LMI problems are discussed in this paper:

• the strict feasibility problem:

    Find x ∈ R^n such that A(x) > 0.                                   (1.1)

  In other words, find values of the decision variables x_1, . . . , x_n that satisfy the LMI constraint A(x) > 0.

• the linear objective minimization under LMI constraints:

    Minimize c^T x subject to A(x) ≥ 0.                                (1.2)

This problem only makes sense when the LMI constraint A(x) ≥ 0 is feasible. Both problems are convex due to the affine dependence on x. The third generic problem is the generalized eigenvalue minimization problem discussed in [16, 3].

Optimization under LMI constraints is a very attractive field for the application of modern polynomial-time interior-point methods. These methods are rooted in the seminal paper of Karmarkar [12] where the first method of this type was proposed for linear programming. Various extensions to semi-definite programming and optimization under LMI constraints can be found in [14, 16, 1, 11, 15, 3, 19, 13, 6] and references therein. See also [18, 17] for alternative approaches to this class of problems.

This paper focuses on the so-called Projective Method. This interior-point method has several interesting features:

• as a polynomial-time algorithm, it is guaranteed to find, for any ε > 0, an ε-solution to the problem within a finite number of steps bounded by O(m) log(C/ε), where m is the total row size of the LMIs and C is some scaling factor (see Theorems 3.1 and 4.1 below for details);

• the Projective Method is applicable to a variety of LMI problems including linear programs under LMI constraints and fractional problems of the form

    min λ subject to A(x) ≤ λ B(x),   B(x) > 0,   C(x) ≥ 0;

• no initial feasible solution is required for problem (1.2), and no separate Phase-I algorithm needs to be run to generate such a solution.

The Projective Method was first described in [15, 16] in the more abstract context of the theory of self-concordant barriers. The present paper takes a different and more straightforward perspective and provides simple geometrical insight into the algorithm. Besides its tutorial value, the paper also contains details on the numerical implementation and statistics on the computational overhead. Note that a version of the Projective Method optimized for LMI problems with a block-matrix structure (see Section 5) is implemented in the LMI Control Toolbox for use with Matlab [8, 9].

The paper is organized as follows. Section 2 summarizes the notation and reviews a few instrumental concepts. Section 3 describes the Projective Algorithm for the strict feasibility problem. This includes a detailed proof of convergence as well as a complexity analysis. Section 4 shows how this basic algorithm can be adapted to solve the linear objective minimization problem. Finally, Section 5 discusses the implementation aspects with an emphasis on efficiency and numerical stability. The results of numerical tests are also reported.

2 Notation and Preliminaries

The following notation is used in the sequel.

• Tr(X) and Det(X) denote the trace and the determinant of the square matrix X, and log(x) stands for the natural logarithm of the positive scalar x.

• S is the space of symmetric matrices of size m. This is a subspace of R^{m×m} of dimension

    M := dim(S) = m(m + 1)/2.                                          (2.1)

• K denotes the open cone of positive definite matrices in S, and K̄ denotes the closure of K, that is, the cone of positive semi-definite matrices of S.

• S is endowed with the family of scalar products ⟨., .⟩_P defined for P > 0 by ⟨X, Y⟩_P = Tr(P X P Y). The corresponding induced norms are

    ‖X‖_P = (Tr(P X P X))^{1/2}.

  The case P = I yields the usual Frobenius norm; the corresponding scalar product is denoted by ⟨., .⟩_Fro. The origin and interpretation of the metrics ⟨., .⟩_P are clarified in the next subsection.

2.1 Logarithmic barriers

As an interior-point method, the Projective Method generates a sequence of matrices that remain in the open cone K. To preserve positive definiteness, the optimization criterion includes a logarithmic barrier that is defined on K and tends to +∞ when approaching a boundary point of K. This barrier is defined for X > 0 by

    f(X) = − log Det(X)                                                (2.2)

and constitutes a special case of the "self-concordant" barriers discussed in [16]. It is easily verified that

    f'(X) = −X^{-1},    [f''(X)](Y) = X^{-1} Y X^{-1}

(here we identify linear and bilinear forms with their matrix representation). Consequently, the Hessian metric is given by

    ⟨[f''(X)](Y), Z⟩_Fro = Tr(X^{-1} Y X^{-1} Z) = ⟨Y, Z⟩_{X^{-1}}.

This explains the relevance of the inner products ⟨., .⟩_P introduced above.
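As a quick sanity check of these formulas (an illustrative sketch added here, not from the original), the gradient −X^{-1} and the Hessian metric can be compared against finite differences of f(X) = −log Det(X):

    import numpy as np

    rng = np.random.default_rng(0)
    m = 5
    M = rng.standard_normal((m, m))
    X = M @ M.T + m * np.eye(m)                          # a point X > 0
    Y = rng.standard_normal((m, m)); Y = (Y + Y.T) / 2   # a symmetric direction

    f = lambda Z: -np.log(np.linalg.det(Z))
    t = 1e-4

    # first-order term: <f'(X), Y>_Fro = -Tr(X^{-1} Y)
    num_grad = (f(X + t * Y) - f(X - t * Y)) / (2 * t)
    ana_grad = -np.trace(np.linalg.solve(X, Y))
    print(abs(num_grad - ana_grad))                      # small (finite-difference error)

    # second-order term: <[f''(X)](Y), Y>_Fro = Tr(X^{-1} Y X^{-1} Y) = <Y, Y>_{X^{-1}}
    num_hess = (f(X + t * Y) - 2 * f(X) + f(X - t * Y)) / t**2
    XinvY = np.linalg.solve(X, Y)
    ana_hess = np.trace(XinvY @ XinvY)
    print(abs(num_hess - ana_hess))                      # small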

2.2 Dikin ellipsoid

Instrumental to the updating of the current "best" solution is the notion of Dikin ellipsoid. Given X > 0, consider the ellipsoid centered at X:

    Ω(X) = { Y : ‖Y − X‖²_{X^{-1}} = Tr{X^{-1}(Y − X) X^{-1}(Y − X)} < 1 }
         = { Y : ‖X^{-1/2} Y X^{-1/2} − I‖_Fro < 1 }.                   (2.3)

Figure 2.1: Dikin ellipsoid.

Lemma 2.1  The Dikin ellipsoid Ω(X) is contained in K.

Proof:  Let Z := X^{-1/2} Y X^{-1/2} and denote by {λ_i}_{1≤i≤m} the eigenvalues of Z. It suffices to show that Z is positive definite whenever ‖Z − I‖_Fro < 1. Now

    ‖Z − I‖²_Fro = Σ_{i=1}^{m} (λ_i − 1)² < 1

implies that |λ_i − 1| < 1 for all i. Consequently, λ_i(Z) > 0 holds for all i if ‖Z − I‖_Fro < 1 and the proof is complete.

The importance of the Dikin ellipsoid comes from the fact that it delimits a region around X where we can move without leaving the cone K, i.e., losing positive definiteness. Note that the Dikin ellipsoid is exactly the open unit ball in the metric ‖.‖_{X^{-1}}.
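A quick numerical illustration of Lemma 2.1 (an added sketch, not part of the original proof): random symmetric perturbations satisfying ‖X^{-1/2} Y X^{-1/2} − I‖_Fro < 1 indeed remain positive definite.

    import numpy as np

    rng = np.random.default_rng(1)
    m = 6
    M = rng.standard_normal((m, m))
    X = M @ M.T + np.eye(m)                              # X > 0

    w, V = np.linalg.eigh(X)
    X_half = V @ np.diag(np.sqrt(w)) @ V.T               # symmetric square root of X

    for _ in range(1000):
        D = rng.standard_normal((m, m)); D = (D + D.T) / 2
        D *= 0.999 / np.linalg.norm(D, 'fro')            # so that ||D||_Fro < 1
        Y = X_half @ (np.eye(m) + D) @ X_half            # then X^{-1/2} Y X^{-1/2} = I + D
        assert np.linalg.eigvalsh(Y).min() > 0           # Y stays in the open cone K
    print("all sampled points of the Dikin ellipsoid are positive definite")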

2.3 Orthogonal projection in the metric ⟨., .⟩_P

An important tool in the Projective Method is the orthogonal projection onto a subspace E with respect to the metric ⟨., .⟩_P. In our case, E will be the range of the linear map A associated with the LMI A(x) > 0 (see Section 3). Given X ∈ S, its projection X^+ onto E is defined as the unique solution of the least-squares problem

    min_{Y ∈ E} ‖Y − X‖_P = min_{x ∈ R^n} ‖Ax − X‖_P.                  (2.4)


The gradient g(Y) of the function Y ↦ ‖Y − X‖²_P is easily obtained from the first-order variation

    ⟨g(Y), δY⟩_Fro = δ(‖Y − X‖²_P) = 2 Tr(P (Y − X) P δY),

which yields

    g(Y) = 2 P (Y − X) P.                                              (2.5)

The projection X^+ is characterized by the optimality condition

    ∀Y ∈ E,   ⟨g(X^+), Y⟩_Fro = 2 ⟨X^+ − X, Y⟩_P = 0.                   (2.6)

This is nothing else than saying that X^+ − X should be orthogonal to E with respect to the inner product ⟨., .⟩_P, or equivalently, that P(X^+ − X)P should be orthogonal to E in the usual Frobenius metric ⟨., .⟩_Fro. In the case E = Range(A), the orthogonality condition (2.6) can be rewritten in terms of x^+ such that X^+ = Ax^+ as:

    ∀x ∈ R^n,   ⟨Ax^+ − X, Ax⟩_P = ⟨P(Ax^+ − X)P, Ax⟩_Fro = 0.          (2.7)
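The projection (2.4) thus reduces to a small linear system in x^+. The sketch below (an illustrative addition with ad hoc helper names) forms the normal equations H x^+ = q, with H_ij = ⟨A_i, A_j⟩_P and q_i = ⟨A_i, X⟩_P, and verifies the optimality condition (2.7):

    import numpy as np

    def project_onto_range(A_list, X, P):
        """Solution x+ of the least-squares problem min_x ||A x - X||_P, P > 0 (see (2.4))."""
        n = len(A_list)
        PA = [P @ Ai @ P for Ai in A_list]               # the matrices P A_i P, reused below
        H = np.array([[np.trace(PA[i] @ A_list[j]) for j in range(n)] for i in range(n)])
        q = np.array([np.trace(PA[i] @ X) for i in range(n)])
        return np.linalg.solve(H, q)

    rng = np.random.default_rng(2)
    m, n = 5, 3
    sym = lambda M: (M + M.T) / 2
    A_list = [sym(rng.standard_normal((m, m))) for _ in range(n)]
    X = sym(rng.standard_normal((m, m)))
    W = rng.standard_normal((m, m)); P = W @ W.T + np.eye(m)   # some P > 0

    x_plus = project_onto_range(A_list, X, P)
    X_plus = sum(xi * Ai for xi, Ai in zip(x_plus, A_list))
    # optimality condition (2.7): P (X+ - X) P is Frobenius-orthogonal to every A_i
    print(max(abs(np.trace(P @ (X_plus - X) @ P @ Ai)) for Ai in A_list))   # ~ 0 (roundoff)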

3 The Strict Feasibility Problem

We first consider the strict feasibility problem (1.1). A vector x is called strictly feasible if A(x) > 0. The affine function A(x) can be decomposed as

    A(x) = Ax + B = x_1 A_1 + . . . + x_n A_n + B                      (3.1)

where the A_i's and B are given matrices in S, and Ax is the homogeneous part of A(x).

Without loss of generality, we can assume B = 0 for simplicity. To see this, introduce an extra variable τ and define

    x̃ := (x; τ),   Ã x̃ := Diag(Ax + τB, τ).

Clearly, if x solves Ax + B > 0, then x̃ := (x; 1) solves Ã x̃ > 0. Conversely, given a solution x̃ = (x; τ) of Ã x̃ > 0, τ is positive and x/τ satisfies Ax + B > 0. Hence the two strict feasibility problems are equivalent.

Up to redefining A, x, and n as Ã, x̃, and n + 1, respectively, we can therefore concentrate on the homogeneous strict feasibility problem:

    Find x ∈ R^n such that Ax > 0                                      (3.2)

where A is a linear map from R^n to S. Henceforth, the range of this map is denoted by E := Range(A). The geometrical interpretation of this problem is as follows: Given an open cone K with its vertex at the origin, and a plane E passing through the origin, find a matrix X in the intersection of E and K. Throughout the sequel, we require that

    Nondegeneracy assumption:  Ax is nonzero whenever x is nonzero.    (3.3)

In other words, A must have full column rank (dim E = n). The next lemma captures an instrumental geometric property of K [2].

Lemma 3.1  The open cone K intersects the subspace E = Range(A) if and only if E^⊥ ∩ K̄ = {0} (here E^⊥ denotes the orthogonal complement of E for the Frobenius inner product). Consequently, strict feasibility is characterized by E ∩ K ≠ ∅ while the absence of strict feasibility corresponds to E^⊥ ∩ K̄ ≠ {0}.

Proof:  This lemma follows from a general result on convex sets. Specifically, if E ∩ K = ∅ then K and E can be separated by a hyperplane of equation ⟨S, X⟩_Fro = 0 (with S fixed and nonzero) such that

    ⟨S, X⟩_Fro > 0   for all X ∈ K,                                    (3.4)
    ⟨S, X⟩_Fro ≤ 0   for all X ∈ E.                                    (3.5)

From (3.4) it is immediate that S ∈ K̄. Now, E being a linear subspace, (3.5) is only possible if in fact ⟨S, X⟩_Fro = 0 for all X ∈ E, i.e., if S ∈ E^⊥. Consequently S ∈ E^⊥ ∩ K̄.

The converse is proved by contradiction. Suppose that there exist X ∈ E ∩ K and a nonzero Y ∈ E^⊥ ∩ K̄. Then ⟨X, Y⟩_Fro = 0 because X ∈ E and Y ∈ E^⊥, while ⟨X, Y⟩_Fro > 0 because X ∈ K, Y ∈ K̄, and Y ≠ 0. This is clearly a contradiction.

Remark 3.2  The matrix S characterizing the separating hyperplane in the proof above is the slack variable for the following dual problem of (3.2):

    Find S in the intersection of E^⊥ and K̄.

The primal problem is strictly feasible if and only if S = 0 is the only solution of the dual problem.

A two-dimensional interpretation of Lemma 3.1 is that the angle at the vertex of the cone K is exactly 90°, since ⟨X, Y⟩_Fro ≥ 0 for all nonzero X, Y ∈ K̄, with possible equality if both X and Y are on the boundary (see Figure 3.1). For tractability in the interior-point framework, we rule out the limit case where Ax ≥ 0 admits nonzero solutions x, yet is not strictly feasible. This corresponds to the case when both E ∩ K and E^⊥ ∩ K are empty.
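A quick numerical illustration of the inequality ⟨X, Y⟩_Fro ≥ 0 underlying this geometric picture (an added sketch, not part of the original argument):

    import numpy as np

    rng = np.random.default_rng(3)
    m = 5
    worst = np.inf
    for _ in range(2000):
        U = rng.standard_normal((m, m)); X = U @ U.T     # X >= 0
        V = rng.standard_normal((m, m)); Y = V @ V.T     # Y >= 0
        worst = min(worst, np.trace(X @ Y))              # <X, Y>_Fro = Tr(X Y)
    print(worst)                                         # never negative: the cone opens by at most 90 degrees

    # equality requires boundary points, e.g. two rank-one matrices with orthogonal ranges
    e1, e2 = np.eye(m)[:, :1], np.eye(m)[:, 1:2]
    print(np.trace((e1 @ e1.T) @ (e2 @ e2.T)))           # 0.0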

3.1 The Projective Algorithm

To find a strictly feasible vector x, the Projective Method relies on the following simple strategy. Given some X in the open cone K, test whether the Dikin ellipsoid centered at X intersects E. If it does, this provides a strictly feasible point since Ω(X) ⊂ K. Otherwise, update X to increase the "chances" that this intersection be nonempty. More precisely, the Projective algorithm proceeds as follows.

Algorithm 3.3 (Projective Algorithm for strict feasibility problems)
Given an arbitrary initial point X_0 ∈ K (e.g., X_0 = I), generate a sequence of positive definite matrices X_k as follows. To update X_k into X_{k+1},

1. Compute the orthogonal projection X_k^+ of X_k onto E in the metric ⟨., .⟩_{X_k^{-1}}.

2. If X_k^+ > 0, terminate since then X_k^+ ∈ K ∩ E. A strictly feasible point x ∈ R^n is then obtained by solving Ax = X_k^+.

3. Otherwise, update X_k to X_{k+1} via the formula

    X_{k+1}^{-1} = X_k^{-1} − γ_k X_k^{-1} (X_k^+ − X_k) X_k^{-1}       (3.6)

   where the step size γ_k is selected to make X_{k+1}^{-1} positive definite and "larger" than X_k^{-1} by some fixed factor. Specifically, γ_k should be chosen such that

    Det(X_{k+1}^{-1}) ≥ κ Det(X_k^{-1})                                 (3.7)

   where κ > 1 is a fixed number (both iteration and problem independent).

The key of the Projective Method is of course the existence of a step size γ_k that enforces (3.7) whenever Ω(X_k) ∩ E = ∅. This is established in the next subsection and for now we restrict our attention to the convergence mechanism of Algorithm 3.3.

We first give an intuitive geometric explanation of the convergence mechanism. Assume that Ax > 0 is strictly feasible, i.e., that E^⊥ ∩ K̄ = {0} as shown in Lemma 3.1. By construction, the size of X_k^{-1} grows exponentially with the number of iterations since, from (3.7),

    Det(X_k^{-1}) ≥ κ^k Det(X_0^{-1}),   κ > 1.                         (3.8)

Meanwhile, the updating direction

    ζ(X_k) := X_k^{-1} (X_k^+ − X_k) X_k^{-1}                           (3.9)

remains orthogonal to E in virtue of (2.6) and of the discussion in Subsection 2.3. Consequently, the sequence X_k^{-1} moves toward infinity parallel to E^⊥ as the algorithm progresses. But since E^⊥ ∩ K̄ = {0}, no sequence of positive definite matrices can go to infinity along E^⊥ without leaving K. This is due to the geometry of the problem as illustrated by Figure 3.1, and in particular to the geometry of the cone K (see Lemma 3.1). As a result, termination must occur in finitely many iterations, the number of iterations depending only on (1) the growth rate κ, (2) the position of K with respect to E^⊥, and (3) the initial point X_0.

More rigorously, assume strict feasibility, consider a strictly feasible vector x_f ∈ R^n, and let X_f := Ax_f ∈ E ∩ K. From (3.8), we have

    log Det(X_k^{-1}) ≥ k log κ + log Det(X_0^{-1})

while the arithmetic-geometric mean inequality applied to the eigenvalues of X_k^{-1} yields

    log( Tr(X_k^{-1}) / m ) ≥ log(Det(X_k^{-1})) / m.

Moreover Tr(X_k^{-1} X_f) ≥ α Tr(X_k^{-1}) for some α > 0 since X_f > 0. The combination of these three inequalities shows the existence of some positive constant τ such that

    Tr(X_k^{-1} X_f) ≥ τ e^{k log(κ)/m}.                                (3.10)

Now, since X_k^{-1} is always updated along directions ζ(X_k) orthogonal to E for ⟨., .⟩_Fro, we have at all iterations k

    Tr(X_k^{-1} X_f) = Tr(X_0^{-1} X_f).                                (3.11)

Hence

    Tr(X_0^{-1} X_f) ≥ τ e^{k log(κ)/m}                                 (3.12)

must hold as long as X_k^+ fails to be positive definite. Recalling that log κ > 0 since κ > 1, and observing that the left-hand side is independent of k, this shows that X_k^+ must become positive definite after a finite number of iterations k. In other words, the algorithm will terminate in finitely many steps, the polynomial-time complexity resulting from (3.12) (see Subsection 3.3 below for details).

In case of infeasibility, by contrast, the algorithm will in principle run forever. In practice, we force termination when the size of X_k^{-1} exceeds some tolerance. Recalling that X_k approaches E, a large norm of X_k^{-1} indicates that for all feasible vectors x (if any), Ax is nearly singular. In other words, problems where this occurs are only marginally feasible or even infeasible up to rounding errors.

The convergence mechanism is illustrated in Figures 3.1 and 3.2. Note that these two-dimensional pictures are by essence simplistic since nonscalar problems are at least three-dimensional and the geometry of K is fairly complex. Through these figures we only mean to give insight into the workings of the algorithm, not into the problem geometry. Note that the algorithm actually works toward proving infeasibility, and solves the feasibility problem only by failing to do so.

Figure 3.1: Convergence mechanism: feasible problem.

Figure 3.2: Convergence mechanism: infeasible problem.

Computing the orthogonal projection X_k^+ amounts to solving the least-squares problem:

    min_{x ∈ R^n} ‖Ax − X_k‖_{X_k^{-1}}.                                (3.13)

Even though this task turns out to be the most expensive in terms of arithmetic operations, it poses no theoretical or practical difficulty (see Subsection 5.2 below for more details). Consequently, the main challenge is to find a scalar γ_k that enforces the growth rate (3.7). This central issue is now addressed in detail.

3.2 Determination of the step size γ_k

Now that the convergence mechanism is understood, we are left with proving that X_k can be updated in such a way that Det(X_k^{-1}) grows by at least some fixed factor κ > 1. Recall that this updating is performed as long as the orthogonal projection X_k^+ of X_k onto E (for ⟨., .⟩_{X_k^{-1}}) fails to be positive definite. Throughout this subsection, we therefore assume that X_k^+ is not positive definite.

As a by-product, we know that the Dikin ellipsoid Ω(X_k) does not intersect E. Indeed, if Y ∈ Ω(X_k) ∩ E, then ‖Y − X_k‖_{X_k^{-1}} < 1 by definition and ‖X_k^+ − X_k‖_{X_k^{-1}} ≤ ‖Y − X_k‖_{X_k^{-1}} since X_k^+ is the point of E closest to X_k. As a result, X_k^+ must be in Ω(X_k), whence X_k^+ > 0 by Lemma 2.1, a contradiction. That Ω(X_k) ∩ E = ∅ proves critical in the subsequent derivation of an adequate step size γ_k.

The shorthand notation X and X^+ is used in place of X_k and X_k^+ for the rest of the section. Define

    ζ(X) := X^{-1} (X^+ − X) X^{-1}.                                    (3.14)

Clearly

    ρ := ‖X^+ − X‖_{X^{-1}} ≥ 1                                         (3.15)

since X^+ ∉ Ω(X). To derive an adequate γ_k, it is convenient to work with the barrier F(S) = − log Det(S) evaluated at S = X^{-1}, rather than with Det(X^{-1}) itself. Our goal is then to find γ ∈ R such that

    X^{-1} − γ ζ(X) > 0   and   F(X^{-1}) − F(X^{-1} − γ ζ(X)) ≥ θ > 0  (3.16)

for some θ independent of X, or equivalently to show that F can be decreased by some fixed amount in the direction −ζ(X). This naturally leads to studying the function

    π(γ) := F(X^{-1}) − F(X^{-1} − γ ζ(X)) = log Det(I − γ ψ)           (3.17)

where

    ψ := X^{1/2} ζ(X) X^{1/2} = X^{-1/2} (X^+ − X) X^{-1/2}.            (3.18)

Two important properties of the symmetric matrix ψ are stated in the next lemma.

Lemma 3.4  Consider X > 0 such that Ω(X) ∩ E = ∅ and let ρ := ‖X^+ − X‖_{X^{-1}}. With this notation, the matrix ψ defined by (3.18) satisfies

    Tr(ψ²) = ρ²,                                                        (3.19)
    Tr(ψ)  = −ρ².                                                       (3.20)

Proof:  See Appendix A.

Given (3.19)-(3.20), π(γ) can be evaluated as follows. Let λ_1, . . . , λ_m denote the real eigenvalues of the symmetric matrix ψ, and introduce

    ρ_∞ := max_{1≤i≤m} |λ_i|.                                           (3.21)

Clearly I − γψ is positive definite whenever

    0 ≤ γ ρ_∞ < 1.                                                      (3.22)

In this range of values, the next lemma provides a lower bound on π(γ) and establishes the existence of an adequate γ.

Lemma 3.5  For all γ satisfying (3.22),

    π(γ) ≥ ρ² ( γ + ρ_∞^{-2} (log(1 − γ ρ_∞) + γ ρ_∞) )                 (3.23)

and the right-hand side is maximized for γ* = 1/(1 + ρ_∞), so that

    π(γ*) ≥ π* := (ρ²/ρ_∞²) (ρ_∞ − log(1 + ρ_∞)) ≥ 1 − log 2.

Proof:  See Appendix B.

Selecting γ := γ* = 1/(1 + ρ_∞) > 0 therefore guarantees that

    log Det(X_{k+1}^{-1}) − log Det(X_k^{-1}) ≥ 1 − log 2

or equivalently that

    Det(X_{k+1}^{-1}) ≥ κ Det(X_k^{-1})                                 (3.24)

where κ := exp(1 − log(2)) = e/2 ≈ 1.36 > 1.

When implementing this updating, it is more efficient to directly maximize π(γ) via a line search. To evaluate π(γ) and its first and second derivatives at a low cost, ψ is first reduced to tridiagonal form via orthogonal similarity. The cost of each evaluation of π(γ) is then of order m. The main advantage of the line search is that it generally yields a faster growth rate for Det(X_k^{-1}), hence a faster convergence of the algorithm. A two-dimensional illustration of the sequence generated by this updating appears in Figure 3.3.
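For concreteness, a small sketch of the step-size selection follows (an illustrative addition; it uses a full eigenvalue decomposition and a coarse grid search instead of the tridiagonal reduction and exact line search described above):

    import numpy as np

    def step_size(X, X_plus, n_grid=200):
        """Pick gamma for the update (3.6), based on psi = X^{-1/2}(X+ - X)X^{-1/2}."""
        w, V = np.linalg.eigh(X)
        X_mhalf = V @ np.diag(w ** -0.5) @ V.T
        psi = X_mhalf @ (X_plus - X) @ X_mhalf
        lam = np.linalg.eigvalsh(psi)
        rho = np.sqrt((lam ** 2).sum())                  # rho     = ||X+ - X||_{X^{-1}}, see (3.19)
        rho_inf = np.abs(lam).max()                      # rho_inf = max_i |lambda_i|,    see (3.21)
        gamma0 = 1.0 / (1.0 + rho_inf)                   # guarantees pi(gamma0) >= 1 - log 2 (Lemma 3.5)
        # crude search for a larger decrease of the barrier on (0, 1/rho_inf)
        grid = np.linspace(gamma0, 0.999 / rho_inf, n_grid)
        pi = np.array([np.log(1.0 - g * lam).sum() for g in grid])   # pi(gamma) = log Det(I - gamma*psi)
        return grid[pi.argmax()], rho, rho_inf

    # usage: gamma_k, rho, rho_inf = step_size(X_k, X_k_plus)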

Figure 3.3: Updating of X_k.
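Putting the pieces together, a minimal sketch of Algorithm 3.3 is given below (an illustrative addition, not the optimized implementation of Section 5; the projection and step-size computations follow the earlier sketches):

    import numpy as np

    def is_pd(M, tol=1e-10):
        return np.linalg.eigvalsh((M + M.T) / 2).min() > tol

    def projective_feasibility(A_list, max_iter=200):
        """Algorithm 3.3: look for x with sum_i x_i A_i > 0."""
        m = A_list[0].shape[0]
        X = np.eye(m)                                    # X_0 = I
        for _ in range(max_iter):
            P = np.linalg.inv(X)                         # metric <.,.>_{X_k^{-1}}
            PA = [P @ Ai @ P for Ai in A_list]
            H = np.array([[np.trace(PAi @ Aj) for Aj in A_list] for PAi in PA])
            q = np.array([np.trace(PAi @ X) for PAi in PA])
            x = np.linalg.solve(H, q)                    # projection X_k^+ = A x
            X_plus = sum(xi * Ai for xi, Ai in zip(x, A_list))
            if is_pd(X_plus):
                return x                                 # strictly feasible point found
            # step 3: update X_k^{-1} along zeta(X_k), see (3.6)
            w, V = np.linalg.eigh(X)
            Xm = V @ np.diag(w ** -0.5) @ V.T
            psi = Xm @ (X_plus - X) @ Xm
            gamma = 1.0 / (1.0 + np.abs(np.linalg.eigvalsh(psi)).max())   # ensures (3.7) with kappa = e/2
            X = np.linalg.inv(P - gamma * (P @ (X_plus - X) @ P))
        return None                                      # treated as marginally feasible / infeasible

    # toy example: x1*diag(2,-1) + x2*diag(-1,2) > 0 has solutions (e.g. x = (1,1))
    A1 = np.array([[2., 0.], [0., -1.]])
    A2 = np.array([[-1., 0.], [0., 2.]])
    x = projective_feasibility([A1, A2])
    print(x, np.linalg.eigvalsh(x[0] * A1 + x[1] * A2))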

3.3 Complexity

Summarizing the previous argument, we come to the following complexity estimate.

Theorem 3.1  Assume that the strict feasibility problem (3.2) is solvable, and let m denote the total row size of Ax. Then the Projective Method started at X_0 > 0 will find a strictly feasible point in a finite number N of iterations bounded by

    N − 1 ≤ ( m / (1 − log 2) ) inf_{X_f ∈ E, X_f > 0} log Cond_{X_0}(X_f)

where the condition number Cond_{X_0}(X_f) of X_f with respect to X_0 is defined by:

    Cond_{X_0}(X_f) := min{t : t X_0 ≥ X_f} / max{t : t X_0 ≤ X_f}
                     = λ_max(X_0^{-1/2} X_f X_0^{-1/2}) / λ_min(X_0^{-1/2} X_f X_0^{-1/2}).

Proof:  Assume that the method has not yet terminated after k steps and let

    t_min = min{t : t X_0 ≥ X_f},   t_max = max{t : t X_0 ≤ X_f}.

As seen above in (3.11),

    Tr(X_{k+1}^{-1} X_f) = Tr(X_0^{-1} X_f)

and consequently

    t_max Tr(X_0 X_{k+1}^{-1}) ≤ t_min Tr(I) = m t_min.                 (3.25)

Meanwhile, recall from Subsection 3.2 that

    Det(X_0 X_{k+1}^{-1}) ≥ κ^k

where log κ = 1 − log 2. This together with (3.25) implies that the positive definite symmetric matrix Y = X_0^{1/2} X_{k+1}^{-1} X_0^{1/2} satisfies the inequalities:

    Cond_{X_0}(X_f) ≥ Tr(Y)/m,   Det(Y) ≥ κ^k.

Since Tr(Y)/m ≥ Det^{1/m}(Y), we infer that

    log Cond_{X_0}(X_f) ≥ (k/m) log κ ≥ (k/m) (1 − log 2)

and the desired upper bound on N readily follows.

4 Minimization of a Linear Objective under LMI Constraints

This section shows how the Projective Method described in Section 3 can be adapted to the problem

    Minimize c^T x subject to A(x) = Ax + B ≥ 0                         (4.1)

where Ax = x_1 A_1 + · · · + x_n A_n. To simplify the forthcoming analysis, we assume that

    Solvability assumption:  Problem (4.1) is strictly feasible and the optimal solution set is nonempty.

As in Section 3, we first turn (4.1) into a homogeneous problem. Consider the auxiliary problem

    Minimize c^T y / τ over y, τ subject to Ay + τ B ≥ 0, τ > 0.

This problem and (4.1) are equivalent in the change of variable x = y/τ and share the same global minimum. Defining

    x̃ := (x; τ),   Ã x̃ := Diag(Ax + τ B, τ),   c̃ := (c; 0),   d̃ := (0; 1),

we can therefore replace the original problem by the homogeneous problem:

    Minimize c̃^T x̃ / d̃^T x̃ subject to Ã x̃ ≥ 0, d̃^T x̃ ≠ 0.               (4.2)

For the sake of clarity, all tildes are dropped in the sequel, bearing in mind that we are dealing with the transformed problem (4.2). As before, E denotes the range of the linear map A.

Remark 4.1  Note that in the homogeneous reformulation introduced above, d̃^T x̃ > 0 whenever Ã x̃ > 0, that is, whenever x is strictly feasible.

The first task consists of finding a feasible vector x. The objective

    Θ(x) := c^T x / d^T x                                               (4.3)

plays no role at this stage and Algorithm 3.3 can be applied until a strictly feasible vector x_0 is found; without loss of generality, we can assume that the τ-component of this vector is 1. Let X_0 = Ax_0 > 0 denote the corresponding value of the LMI, and let θ_0 := c^T x_0 / d^T x_0. The level sets of the function Θ are the hyperplanes

    {x ∈ R^n : Θ(x) = θ} = {x ∈ R^n : (c − θ d)^T x = 0}

and they are mapped by A to

    E(θ) := { Ax ∈ E : (c − θ d)^T x = 0 }.                             (4.4)

We are now ready to outline the Projective Method for linear objective minimization.

Algorithm 4.2 (Projective Algorithm for linear objective minimization)
Given an initial strictly feasible X_0 = Ax_0 ∈ K ∩ E computed by Algorithm 3.3, generate a sequence of matrices X_k > 0 and a sequence θ_k^* of objective values at strictly feasible solutions x_k^* as follows. To update X_k, θ_k^*, x_k^* to X_{k+1}, θ_{k+1}^*, x_{k+1}^*:

1. Compute the orthogonal projection X_k^+ = Ax_k of X_k onto E (for the metric ⟨., .⟩_{X_k^{-1}}) and check its positive definiteness. If X_k^+ > 0, call the step productive and go to 2. Otherwise, call the step unproductive, set

    θ_k^* := θ_{k-1}^*,   x_k^* := x_{k-1}^*,   Y_k := X_k^+ − X_k,

   and go directly to 3.

2. Decrease θ starting from θ_k = c^T x_k / d^T x_k until the orthogonal projection X_k^+(θ) of X_k onto E(θ) (still in the metric ⟨., .⟩_{X_k^{-1}}) is about to leave K. More precisely, find θ ≤ θ_k such that

    ‖X_k − X_k^+(θ)‖_{X_k^{-1}} ≥ 0.99   and   X_k^+(θ) > 0.            (4.5)

   Let θ_k^* be the resulting value, x_k^* be the strictly feasible vector such that X_k^+(θ_k^*) = Ax_k^*, and set Y_k := X_k^+(θ_k^*) − X_k.

3. Update X_k to X_{k+1} according to

    X_{k+1}^{-1} = X_k^{-1} − γ_k X_k^{-1} Y_k X_k^{-1}                 (4.6)

   where the step size γ_k is selected such that X_{k+1}^{-1} > 0 and

    Det(X_{k+1}^{-1}) ≥ κ Det(X_k^{-1})                                 (4.7)

   for some fixed κ > 1. Go back to 1.

Note that the first iteration is always productive since x_0 is strictly feasible and consequently X_0 = X_0^+ ∈ E. Hence x_k^* is well-defined for all k ≥ 1. Moreover, x_k^* is by construction a strictly feasible solution of Ax ≥ 0 with corresponding objective value Θ(x_k^*) = θ_k^*.

Remark 4.3  Note that the objective remains unchanged when the iteration is unproductive (see Step 1). Thus the method does not necessarily improve the objective at each iteration, even though it still achieves some progress of its own in unproductive steps.

Before discussing this modified algorithm and its convergence properties, we give a graphical illustration of the algorithm progress. The first step is depicted in Figure 4.1.

Figure 4.1: Progress of the algorithm: first step. Given the initial feasible solution X_0 = Ax_0 and the corresponding objective value θ_0, the objective value θ is decreased in a way ensuring (4.5) (see Section 4.2). This yields a current best value θ_0^* of the objective as well as a matrix X_0^+(θ_0^*) in E(θ_0^*) ∩ Ω(X_0). The latter is used to update X_0 according to Step 3.

The following steps proceed in a similar fashion (see Figure 4.2). First the matrix X_k is updated until its projection X_k^+ = Ax_k onto E regains positive definiteness. The value θ_k = Θ(x_k) is then a feasible value of the objective, and we can decrease θ down from the value θ_k until (4.5) is satisfied. Denoting by θ_k^* the resulting objective value, the projection X_k^+(θ_k^*) of X_k onto E(θ_k^*) is used to update X_k to X_{k+1}.

Figure 4.2: Progress of the algorithm: following steps.

4.1 Step size γ_k

First consider Step 3 and the determination of γ_k such that (4.7) holds. In unproductive steps, the situation is completely similar to that in Algorithm 3.3, and from Subsection 3.2 there exists an adequate γ_k that results in κ = exp(1 − log 2).

Next consider the case of a productive step. Here again the set-up parallels that of Algorithm 3.3 with E replaced by E(θ_k^*). By picking θ_k^* such that X_k^+(θ_k^*) is outside the ellipsoid

    Ω_{0.99}(X_k) = { Y : ‖Y − X_k‖_{X_k^{-1}} < 0.99 },                (4.8)

the existence of some adequate γk is established as in Subsection 3.2, except that the resulting κ is now exp(0.99 − log(1.99)) instead of exp(1 − log(2)). This difference is immaterial since exp(0.99 − log(1.99)) > 1 as well. Hence Step 3 causes no additional difficulty.

4.2 Computation of θ_k^*

Next we turn to the computation of θ_k^* in productive steps. Since A is a bijection from R^n to E, there exist two matrices C_k, D_k ∈ E such that for all X = Ax ∈ E,

    ⟨C_k, X⟩_{X_k^{-1}} = c^T x,   ⟨D_k, X⟩_{X_k^{-1}} = d^T x.          (4.9)

Consequently, E(θ) defined in (4.4) is a hyperplane of E with normal vector C_k − θ D_k and equation

    E(θ) := {X ∈ E : ⟨C_k − θ D_k, X⟩_S = 0},   S := X_k^{-1}.          (4.10)

Using Pythagoras' theorem in the triangle (X_k, X_k^+, X_k^+(θ)) and the fact that X_k^+(θ) is the orthogonal projection of X_k^+ onto the hyperplane E(θ) with normal vector C_k − θ D_k, the squared distance from X_k to X_k^+(θ) is given by:

    δ²(θ) := ‖X_k − X_k^+(θ)‖²_S
           = ‖X_k − X_k^+‖²_S + ‖X_k^+ − X_k^+(θ)‖²_S
           = ‖X_k − X_k^+‖²_S + ⟨C_k − θ D_k, X_k^+⟩²_S / ‖C_k − θ D_k‖²_S.

From the previous subsection, we must find θ_k^* such that (i) X_k^+(θ_k^*) is positive definite and (ii) X_k^+(θ_k^*) lies outside of the ellipsoid Ω_{0.99}(X_k) defined in (4.8). Since the step in question is productive, X_k^+ = X_k^+(θ_k) is positive definite. If X_k^+ already lies outside Ω_{0.99}(X_k), simply take θ_k^* = θ_k. Otherwise,

• ‖X_k − X_k^+‖_S < 0.99 implies that

    δ(θ_k) < 0.99                                                       (4.11)

  since ⟨C_k − θ_k D_k, X_k^+⟩_S = (c − θ_k d)^T x_k = 0 by definition of θ_k;

• x_k(θ) must become infeasible as θ approaches −∞. Otherwise, the objective would be unbounded from below since the objective value at x_k(θ) is exactly θ, and this would contradict our Solvability Assumption. As a result, X_k^+(θ) = Ax_k(θ) must lie outside of the Dikin ellipsoid Ω(X_k) for small enough θ, whence

    lim_{θ → −∞} δ(θ) ≥ 1.                                              (4.12)

By continuity, (4.11)-(4.12) prove the existence of θ_k^* ≤ θ_k such that δ(θ_k^*) = 0.99. Observing that δ²(θ) is the ratio of two quadratic expressions, this value θ_k^* is given by a simple explicit formula. Note that by construction X_k^+(θ_k^*) belongs to the boundary of the ellipsoid Ω_{0.99}(X_k) and is therefore positive definite (see Lemma 2.1).

Remark 4.4  In the actual implementation of the Algorithm, the value θ_k^* derived above is further decreased by a line search down to the value of θ at which X_k^+(θ) leaves the cone K.
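For concreteness (an added sketch; the numbers at the end are made up merely to exercise the formula), the explicit formula for θ_k^* amounts to solving a quadratic equation:

    import numpy as np

    def theta_star(a, u, v, c0, c1, c2, theta_k, r=0.99):
        """Largest theta <= theta_k with delta(theta) = r, where
           delta^2(theta) = a + (u - theta*v)^2 / (c0 - 2*theta*c1 + theta^2*c2),
           a  = ||Xk - Xk+||_S^2,  u = <Ck, Xk+>_S,  v = <Dk, Xk+>_S,
           c0 = ||Ck||_S^2,  c1 = <Ck, Dk>_S,  c2 = ||Dk||_S^2,  S = Xk^{-1}."""
        b = r * r - a                                    # > 0 when Xk+ lies inside Omega_r(Xk)
        coeffs = [v * v - b * c2, -2.0 * u * v + 2.0 * b * c1, u * u - b * c0]
        roots = np.roots(coeffs)                         # delta^2(theta) = r^2 is quadratic in theta
        roots = roots[np.isreal(roots)].real
        roots = roots[roots <= theta_k + 1e-12]
        return roots.max() if roots.size else theta_k    # fallback: keep theta_k

    # made-up numbers consistent with delta(theta_k) < 0.99 and (c - theta_k d)^T x_k = 0
    print(theta_star(a=0.5, u=0.1, v=1.0, c0=2.0, c1=0.3, c2=1.5, theta_k=0.1))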

4.3 Convergence and complexity

Denote by θ_opt the optimal value of the objective Θ(x) and by x_opt an optimal solution of the problem (that is, such that A(x_opt) ≥ 0 and Θ(x_opt) = θ_opt); without loss of generality, we assume that the τ-component of x_opt is 1.

What is meant by polynomial-time convergence in the context of linear objective minimization? We say that the algorithm converges in polynomial time if for some fixed "scale factor" R > 0 and for any ε ∈ (0, 1), the objective value θ_opt + Rε is attained within a finite number N_ε of iterations bounded from above by

    N_ε ≤ O(π(m, n)) log(C/ε)                                           (4.13)

where m is the size of A(x), n the number of scalar variables x_i, π(., .) is a polynomial, and C is a data-dependent factor. For our purpose, it is convenient to take R := Θ(x_0) − θ_opt and C := min{t : t X_0 ≥ X_opt} where X_0 = Ax_0 > 0 and X_opt = Ax_opt ≥ 0.

To prove polynomial-time convergence, fix ε ∈ (0, 1) and introduce

    x_ε = (1 − ε) x_opt + ε x_0,   X_ε = Ax_ε.                          (4.14)

Clearly x_ε is a strictly feasible vector such that

    Θ(x_ε) ≤ θ_opt + Rε                                                 (4.15)

(recall that the τ-components of both x_0 and x_opt are equal to 1, so that Θ(x) = c^T x for x = x_0, x_opt, x_ε). Let N_ε be the number of iterations performed until θ_k^* ≤ Θ(x_ε).

To bound N_ε from above, we use an argument similar to that of Subsection 3.3. In Algorithm 3.3, the convergence proof relied critically on the following fact: given any strictly feasible X = Ax ∈ K, the quantity Tr(X_k^{-1} X) remained constant throughout the updating of X_k, that is, Tr(X_0^{-1} X) = Tr(X_k^{-1} X) for all k. Even though this property no longer holds for the modified algorithm, the next lemma shows that for X_ε introduced above, we have Tr(X_{k+1}^{-1} X_ε) ≤ Tr(X_k^{-1} X_ε) as long as θ_k^* > Θ(x_ε), that is, for all k ≤ N_ε. As a result, Tr(X_ε X_k^{-1}) is always bounded from above by Tr(X_ε X_0^{-1}), which allows us to adapt the derivation of Subsection 3.3 to derive an upper bound for N_ε.

Lemma 4.5  Let x_ε and X_ε = Ax_ε be defined by (4.14). Then

    Tr(X_{k+1}^{-1} X_ε) ≤ Tr(X_k^{-1} X_ε)                             (4.16)

holds as long as θ_k^* > Θ(x_ε).

Proof:  Consider some iteration k < N_ε where θ_k^* > Θ(x_ε). If this iteration is unproductive, the updating is identical to that of Algorithm 3.3 and consequently Tr(X_{k+1}^{-1} X_ε) = Tr(X_k^{-1} X_ε). Assume from now on that the iteration is productive.

We first show that

    X_k^+(θ_k^*) − X_k^+ = −α (C_k − θ_k^* D_k)   with   α ≥ 0          (4.17)

with C_k, D_k defined by (4.9). To this end, consider x_k and x_k^* such that

    X_k^+ = Ax_k,   X_k^+(θ_k^*) = Ax_k^*.

Recalling that X_k^+(θ_k^*) is the orthogonal projection of X_k^+ onto E(θ_k^*), and that E(θ_k^*) is a hyperplane of E with normal vector C_k − θ_k^* D_k, the vector X_k^+(θ_k^*) − X_k^+ must be collinear to C_k − θ_k^* D_k.

In addition,

    ⟨C_k − θ_k^* D_k, X_k^+(θ_k^*) − X_k^+⟩_{X_k^{-1}}
        = (c − θ_k^* d)^T x_k^* − (c − θ_k^* d)^T x_k
        = (Θ(x_k^*) − θ_k^*) d^T x_k^* − (Θ(x_k) − θ_k^*) d^T x_k
        = −(θ_k − θ_k^*) d^T x_k ≤ 0

since Θ(x_k) = θ_k ≥ θ_k^* = Θ(x_k^*) and d^T x_k > 0 from Ax_k > 0. This establishes (4.17).

Back to the proof of (4.16), we have from (4.6):

    γ_k Tr( (X_k^{-1} − X_{k+1}^{-1}) X_ε ) = Tr( X_k^{-1} (X_k^+(θ_k^*) − X_k) X_k^{-1} X_ε )
                                            = ⟨X_k^+(θ_k^*) − X_k, X_ε⟩_{X_k^{-1}}
                                            = ⟨X_k^+(θ_k^*) − X_k^+, X_ε⟩_{X_k^{-1}} + ⟨X_k^+ − X_k, X_ε⟩_{X_k^{-1}}.

Now,

• ⟨X_k^+ − X_k, X_ε⟩_{X_k^{-1}} = 0 since X_k^+ − X_k is orthogonal to E while X_ε ∈ E;

• from (4.17) it follows that

    ⟨X_k^+(θ_k^*) − X_k^+, X_ε⟩_{X_k^{-1}} = −α ⟨C_k − θ_k^* D_k, X_ε⟩_{X_k^{-1}}
                                           = −α (c − θ_k^* d)^T x_ε = −α (Θ(x_ε) − θ_k^*) d^T x_ε.

Summing up, we have established the identity

    γ_k Tr( (X_k^{-1} − X_{k+1}^{-1}) X_ε ) = −α (Θ(x_ε) − θ_k^*) d^T x_ε.

This completes the proof of (4.16) upon recalling that (1) Θ(x_ε) < θ_k^* whenever k < N_ε, (2) γ_k > 0, and (3) α and d^T x_ε are nonnegative scalars.

The argument used to prove Theorem 3.1 is readily adapted to derive an upper bound on N_ε from (4.16). Here X_ε plays the role of X_f and consequently, the resulting upper bound on N_ε will be 1 + m log(Cond_{X_0}(X_ε))/log κ with log κ ≥ 0.99 − log 1.99. Since X_ε = (1 − ε) X_opt + ε X_0 and X_0, X_ε are positive definite, we clearly have Cond_{X_0}(X_ε) ≤ C/ε. This leads to the following result.

Theorem 4.1  Let Problem (4.1) satisfy the Solvability Assumption, and let x_0 be a strictly feasible point (Ax_0 > 0). For any ε ∈ (0, 1), the number of iterations performed by Algorithm 4.2 before a feasible solution with objective value less than θ_opt + ε (Θ(x_0) − θ_opt) is found does not exceed the quantity

    1 + m log(C/ε) / (0.99 − log 1.99),

where C := min{t : t Ax_0 ≥ Ax_opt}, and x_opt is the optimal solution of the problem.


5 Implementation and Numerical Results

From the previous discussion, solving the least-squares problem

    min_{x ∈ R^n} ‖Ax − X_k‖_{X_k^{-1}}                                 (5.1)

is the main computational effort involved in each iteration of Algorithms 3.3 and 4.2. This section shows that for control-oriented applications, much can be gained by exploiting the specific block-matrix structure of each problem. Numerical stability issues near the optimum are also addressed. Finally, experimental data on the running time and computational overhead for a typical control application are reported. The details given next pertain to the implementation of the Projective Method available in release 1.0 of the LMI Control Toolbox [8, 9].

5.1 Structured representation of LMIs

Henceforth, the discussion is implicitly specialized to the case of a single homogeneous LMI constraint

    Ax = Σ_{i=1}^{n} x_i A_i > 0.                                       (5.2)

All arguments readily extend to non-homogeneous LMIs and to the case of multiple LMI constraints. The canonical representation (5.2) is generic, and without further information on the LMI structure, the LMI is best described by storing the upper triangles of the symmetric matrices A_i. In most control applications, however, LMI constraints are heavily structured, which renders the canonical representation quite inefficient. Specifically, most control-like LMIs are of the form

    Ax = Σ_{r=1}^{n_t} ( L_r Z_r R_r^T + R_r Z_r^T L_r^T ),             (5.3)

where L_r, R_r are given matrices and Z_r is a particular instance in a collection Y_1, . . . , Y_V of structured matrix variables, e.g., symmetric, block-diagonal, Toeplitz, etc. Here the decision variables x_1, . . . , x_n are the free entries of Y_1, . . . , Y_V (taking structure into account). From now on, (5.2) is referred to as the "unstructured" representation while (5.3) is called the "structured" representation.

To illustrate the benefits of the structured representation, we use the following simple example:

    A Y E^T + E Y A^T < 0                                               (5.4)

where A, E ∈ R^{p×p} are given and Y is a symmetric matrix to be determined. This LMI would be called a generalized Lyapunov inequality. Note that this example is only chosen for its tutorial value, and that in the particular case of Lyapunov inequalities, additional properties could be used to further boost efficiency [19]. However, while the speed-ups obtained in [19] are limited to Lyapunov or Riccati inequalities, the benefits of the structured representation and rank-one linear algebra discussed below pertain to the general class of structured problems (5.3).

To clarify the subsequent argument, we summarize the dimensional parameters involved in Problem (5.4):

• p denotes the number of states in the problem, that is, the row dimension of A;

• n = p(p + 1)/2 is the number of decision variables (free entries of Y when accounting for symmetry);

• M = p(p + 1)/2 is the dimension of the image space S where A Y E^T + E Y A^T takes values;

• n_t denotes the number of terms in LMIs of the form (5.3). It is typically small compared to n, and n_t = 1 for the LMI (5.4).

In the flop counts and storage estimates given below, we only keep the dominant term so that, e.g., p(p + 1)/2 + 3p is approximated by p²/2.

An immediate benefit of the structured representation is in terms of storage requirement and cost of evaluating Ax. Take the example (5.4). With the unstructured representation (5.2), we need to store all A_i, which uses p⁴/4 storage since there are n ≈ p²/2 decision variables. By comparison, in the structured representation it suffices to store the matrices A and E, which uses 2p² storage. Similarly, evaluating Ax in the unstructured case costs p⁴/2 flops vs. only 2p³ with the structured representation. Thus, the structured representation is significantly cheaper in terms of storage requirement and evaluation cost.

More importantly, the computational burden attached to solving the least-squares problem (5.1) can also be significantly reduced. This claim is justified next.
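To make the two representations concrete on example (5.4) (an illustrative sketch; the basis ordering of the free entries of Y is one arbitrary choice), the code below builds the canonical matrices A_i of (5.2) from the structured data (A, E) and checks that both representations evaluate the same map:

    import numpy as np

    def sym_basis(p):
        """Canonical basis of p x p symmetric matrices, one element per free entry of Y."""
        basis = []
        for i in range(p):
            for j in range(i, p):
                M = np.zeros((p, p)); M[i, j] = M[j, i] = 1.0
                basis.append(M)
        return basis                                     # n = p(p+1)/2 matrices

    rng = np.random.default_rng(4)
    p = 6
    A = rng.standard_normal((p, p)); E = rng.standard_normal((p, p))

    # unstructured representation (5.2): one A_i per free entry of Y (about p^4/4 storage)
    A_list = [A @ M @ E.T + E @ M @ A.T for M in sym_basis(p)]

    # structured representation (5.3): only A and E are stored (2 p^2 storage), n_t = 1 term
    lmi_structured = lambda Y: A @ Y @ E.T + E @ Y @ A.T

    Y = rng.standard_normal((p, p)); Y = (Y + Y.T) / 2
    x = np.array([Y[i, j] for i in range(p) for j in range(i, p)])   # free entries of Y
    print(np.linalg.norm(sum(xi * Ai for xi, Ai in zip(x, A_list)) - lmi_structured(Y)))   # ~ 0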

5.2 Rank-one linear algebra

The main computational effort in the Projective Method, as in any other interior-point method for this type of problem, is the one required to project a given symmetric X onto the range space of A in the metric ⟨·, ·⟩_S where S > 0. That is, to compute the solution x* ∈ R^n of the least-squares problem

    min_{x ∈ R^n} ‖Ax − X‖_S.

Let S = L L^T be a Cholesky factorization of S. There are two main linear algebra techniques to compute x*:

• Cholesky-based approach: x* is the solution of the normal equation

    H x* = q                                                            (5.5)

  where H and q are the matrix and vector with entries

    H_ij = Tr(A_i S A_j S),   q_i = Tr(A_i S X S).                      (5.6)

  To solve (5.5), we compute a Cholesky factorization H = R^T R of the positive definite matrix H, and x* is then obtained by solving two triangular linear systems. Note that H must be assembled as a preliminary to the Cholesky factorization.

• QR-based approach: introducing the linear mapping L defined on S by L : Z ↦ L^T Z L, the least-squares problem is equivalent to

    min_{x ∈ R^n} ‖L A x − L X_k‖_Fro.

  To solve this problem with the QR approach, we first compute the M × n matrix B associated with the mapping L A. Then, by an orthogonal transformation Q, we reduce B to upper-triangular form:

    Q B = ( R ),   Q vec(L X_k) = ( ζ_1 ).
          ( 0 )                   ( ζ_2 )

  The least-squares solution x* is then given by solving

    R x* = ζ_1.                                                         (5.7)

Note that the n × n matrix R is nothing but the Cholesky factor of H, i.e., H = R^T R. In the unstructured case, there is no incentive at all for using the Cholesky approach since (1) the cost of assembling H is comparable to the cost of the QR factorization of L A, and (2) the normal equation (5.5) is worse conditioned than (5.7) since κ(H) = κ(R)² [10]. In the case of LMIs with structure (5.3), however, assembling the Hessian becomes cheap, which makes the Cholesky approach very appealing as long as the least-squares problem remains relatively well-conditioned.

To evaluate the speed-up attached with the Cholesky approach, consider again the simple example (5.4). The cost of the QR approach is easily estimated as follows: p⁵/3 flops to form the matrix B (i.e., compute L^T A_i L for i = 1, . . . , n) and 2M n² = p⁶/4 to perform the actual QR factorization. In comparison, the Cholesky approach costs n³/3 = p⁶/24 to factor the Hessian matrix H, to which we must add the cost of assembling H. As mentioned earlier, this cost is comparable to that of the QR factorization of B in general since H = B^T B. However, it drops by orders of magnitude for LMIs of the form (5.3). To see this, denote by {e_i}_{i=1,...,n} the standard orthonormal basis of the design space R^n, and consider the term-oriented representation (5.3). Then H_ij can be decomposed as

    H_ij = Σ_{r=1}^{n_t} Σ_{s=1}^{n_t} H_ij^{rs}

where

    H_ij^{rs} = Tr[ S (L_r E_i R_r^T + R_r E_i^T L_r^T) S (L_s E_j R_s^T + R_s E_j^T L_s^T) ]
              = 2 Tr[ E_i (R_r^T S L_s) E_j (R_s^T S L_r) ] + 2 Tr[ E_i^T (L_r^T S L_s) E_j (R_s^T S R_r) ]   (5.8)

and E_i, E_j denote the values of Z_r and Z_s for x = e_i and x = e_j, respectively. For typical matrix variable structures, E_i and E_j are low-rank matrices, i.e.,

    E_i = Σ_{h=1}^{α} ε_{hi} δ_{hi}^T,   E_j = Σ_{l=1}^{β} ε_{lj} δ_{lj}^T

where α, β are small integers (0, 1, or 2 in most cases) and ε, δ are canonical vectors of the appropriate space. Using this low-rank decomposition of E_i and E_j, we obtain

    H_ij^{rs} = 2 Σ_{h=1}^{α} Σ_{l=1}^{β} { [δ_{hi}^T (R_r^T S L_s) ε_{lj}] × [δ_{lj}^T (R_s^T S L_r) ε_{hi}]
                                          + [ε_{hi}^T (L_r^T S L_s) ε_{lj}] × [δ_{lj}^T (R_s^T S R_r) δ_{hi}] }.   (5.9)

Now, this expression can be evaluated in O(1) flops since

• all matrix products R_r^T S L_s, R_s^T S L_r, . . . can be computed beforehand (once for all H_ij) at a negligible cost;

• the scalar products δ_{hi}^T (R_r^T S L_s) ε_{lj}, δ_{lj}^T (R_s^T S L_r) ε_{hi}, . . . simply amount to selecting particular entries of these pre-computed matrices.

As a result, each entry H_ij of H can be assembled in O(n_t²) flops by exploiting the structure, and H is therefore assembled in O(n_t² n²) flops. We call this assembling scheme rank-one linear algebra.

In example (5.4), where n_t = 1 and the rank of E_i (value of the matrix variable Y for x = e_i) is at most two, the precise cost of assembling H is

• 3p³ flops to compute the matrices E^T S E, A^T S A, and E^T S A (once and for all);

• 8 flops to evaluate each H_ij via (5.9), yielding a total of p⁴ flops since there are p⁴/8 entries to be evaluated.

The overall cost of the Cholesky approach is therefore p⁶/24. The next table summarizes the previous analysis. The flop counts are given for a system of N Lyapunov inequalities of type (5.4).

                     Cholesky approach               QR approach
                 Assembling    Factorization            Total
                  O(N p⁴)         p⁶/24                 N p⁶/4

Table 5.1: Relative expense of the Cholesky and QR approaches.

These figures are clearly in favor of the Cholesky approach. Note that the cost of QR grows linearly with the number of LMI constraints, while the dominant term in the Cholesky approach is invariant since it only depends on the number of variables in the problem. The same rank-one assembling scheme applies in the general case of block-matrix LMIs, such as those with blocks of the form A^T X + XA + C^T C and XB.
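As an illustration of the rank-one assembling scheme on example (5.4) (an added sketch, not the Toolbox implementation), the code below assembles H_ij via (5.9) from the three pre-computed products E^T S A, A^T S A, E^T S E and checks the result against the direct formula H_ij = Tr(A_i S A_j S):

    import numpy as np

    rng = np.random.default_rng(5)
    p = 4
    A = rng.standard_normal((p, p)); E = rng.standard_normal((p, p))
    W = rng.standard_normal((p, p)); S = W @ W.T + np.eye(p)         # current metric, S = X_k^{-1}
    I = np.eye(p)

    # free entries of Y indexed by pairs (a, b), a <= b; E_i = e_a e_b^T + e_b e_a^T (or e_a e_a^T)
    pairs = [(a, b) for a in range(p) for b in range(a, p)]
    def dyads(a, b):                                     # E_i = sum_h eps_h delta_h^T, at most two dyads
        return [(I[:, a], I[:, b])] if a == b else [(I[:, a], I[:, b]), (I[:, b], I[:, a])]

    # pre-computed p x p products (computed once for the whole Hessian), with L = A and R = E in (5.3)
    M1 = E.T @ S @ A                                     # R^T S L
    M2 = A.T @ S @ A                                     # L^T S L
    M3 = E.T @ S @ E                                     # R^T S R

    def H_rank_one(i, j):                                # formula (5.9): O(1) flops per entry
        total = 0.0
        for eps_h, dlt_h in dyads(*pairs[i]):
            for eps_l, dlt_l in dyads(*pairs[j]):
                total += (dlt_h @ M1 @ eps_l) * (dlt_l @ M1 @ eps_h) \
                       + (eps_h @ M2 @ eps_l) * (dlt_l @ M3 @ dlt_h)
        return 2.0 * total

    def H_direct(i, j):                                  # H_ij = Tr(A_i S A_j S), A_i = A E_i E^T + E E_i A^T
        def Amap(a, b):
            Ei = np.outer(I[:, a], I[:, b])
            Ei = Ei + Ei.T if a != b else Ei
            return A @ Ei @ E.T + E @ Ei @ A.T
        return np.trace(Amap(*pairs[i]) @ S @ Amap(*pairs[j]) @ S)

    n = len(pairs)
    err = max(abs(H_rank_one(i, j) - H_direct(i, j)) for i in range(n) for j in range(n))
    print(err)                                           # ~ 0: both assembling schemes agree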

Here the variables are the two p × p symmetric matrices R and S and the scalar γ. Accordingly, the dimension of the design space is n = p(p + 1) + 1. This linear objective minimization problem was solved for various values of the state-space dimension p and with the additional constraint ‖x‖₂ ≤ 10⁷. Two experiments were conducted:

1. optimization using only rank-one linear algebra;

2. optimization with a required relative accuracy of 10⁻⁴ on the optimal γ.¹

¹ There is an additional built-in test to detect when the required accuracy is achieved, which is as follows: at a productive iteration k, when an improved value of the objective θ_k^* is obtained, we compute the matrix X̃_k = X_k^+(θ_k^* − δ), δ being the required absolute accuracy in terms of the objective value, and check whether the matrix Z_k = X_k − X̃_k is positive semidefinite. If it is the case, then the actual optimal value is ≥ θ' ≡ θ_k^* − δ, since the positive semidefinite matrix X_k^{-1} Z_k X_k^{-1} is orthogonal (in the Frobenius Euclidean structure) to the plane E(θ') and therefore gives a separator of the cone of positive semidefinite matrices and the plane. Thus, the indicated test provides us with a sufficient condition for detecting that the required accuracy is already achieved.

All problems were randomly generated and can therefore be considered well conditioned. Since such problems can also be solved by direct linear algebra techniques [5], we compared the optimal value obtained by LMI optimization with the Riccati-based optimum to derive the final relative accuracy estimates. The results of these experiments appear in Table 5.2. The CPU times are for a DEC Alpha 200 4/166 workstation. The number of states, the total number of scalar variables, and the dimension of the image space S are denoted by p, n, and M, respectively.

                        First experiment (rank-one only)       Second experiment (10⁻⁴)
     p     n     M     # iter.  relative error  CPU time     # iter.  relative error  CPU time
     8    73    292      18       2 × 10⁻⁴         2"           23       2 × 10⁻⁶         4"
    12   157    680      18       2 × 10⁻⁴         7"           24       2 × 10⁻⁵        23"
    16   273   1178      18       2 × 10⁻²        20"           25       3 × 10⁻⁵      1'57"
    20   421   1750      19       6 × 10⁻³      1'00"           25       1 × 10⁻⁶      6'20"
    24   601   2436      19       2 × 10⁻³      2'44"           24       5 × 10⁻⁵     16'00"
    28   813   3236      19       2 × 10⁻³      7'00"           24       1 × 10⁻⁵     39'45"
    32  1057   4242      20       4 × 10⁻³     15'00"           26       3 × 10⁻⁵   1h30'00"

Table 5.2: Experimental results.

The next table indicates the distribution of the CPU time (in %) between the various linear algebra tasks. These figures are relative to the second experiment and confirm the high cost of the final few QR-based iterations.

     p    Assembling H    Cholesky    QR fact.    Other
     8        20              2          50         28
    12        15              2          69         14
    16         8              3          82          7
    20         7              5          84          4
    24         7              6          83          4
    28         5              8          82          5
    32         4              7          85          4

Table 5.3: Distribution of CPU time.

6 Conclusions

A first-principles and comprehensive description of the Projective Method for solving LMI problems has been given. As a modern interior-point method, this algorithm has polynomial-time complexity, and extensive practical experience confirms its high performance on LMI problems. In the context of control-oriented applications, the Projective Method can be implemented in a highly efficient manner so as to minimize the computational overhead per iteration. In addition, experimental results suggest that the number of iterations grows very slowly with the size of the problem.


Appendix A

Proof of Lemma 3.4:  Let S := X^{-1}. The first identity is immediate from the definition of ψ:

    Tr{ψ²} = Tr{S^{1/2} (X^+ − X) S (X^+ − X) S^{1/2}} = Tr{S (X^+ − X) S (X^+ − X)}
           = ‖X^+ − X‖²_S =: ρ².

To derive the second identity, observe that

    ‖X^+ − X‖²_S = Tr{S (X^+ − X) S (X^+ − X)} = Tr{S (X^+ − X) S X^+} − Tr{S (X^+ − X) S X}
                 = ⟨X^+ − X, X^+⟩_S − Tr{S (X^+ − X)}.

Now, ⟨X^+ − X, X^+⟩_S = 0 since X^+ is the orthogonal projection of X onto E for the inner product ⟨., .⟩_S. Consequently,

    ρ² = − Tr{S (X^+ − X)} = − Tr{S^{1/2} (X^+ − X) S^{1/2}} = − Tr(ψ).

Appendix B

Proof of Lemma 3.5:  Using a series expansion of the log function, we obtain

    π(γ) = log Det(I − γψ) = Σ_{i=1}^{m} log(1 − γ λ_i) = − Σ_{k=1}^{∞} Σ_{i=1}^{m} γ^k λ_i^k / k
         = − Σ_{i=1}^{m} γ λ_i − Σ_{k=2}^{∞} Σ_{i=1}^{m} γ^k λ_i^k / k
         = γ ρ² − Σ_{k=2}^{∞} Σ_{i=1}^{m} γ^k λ_i^k / k.                (B.1)

In the last identity we used that Tr(ψ) = −ρ² (see Lemma 3.4). Now, for k ≥ 2 we have

    | Σ_{i=1}^{m} λ_i^k | ≤ ( Σ_{i=1}^{m} λ_i² ) ( max_{1≤i≤m} |λ_i| )^{k−2} = ρ² ρ_∞^{k−2},

using this time Tr(ψ²) = ρ². From the expression (B.1) of π(γ), it follows that

    π(γ) ≥ γ ρ² − Σ_{k=2}^{∞} γ^k ρ² ρ_∞^{k−2} / k = ρ² ( γ + (1/ρ_∞²) [log(1 − γ ρ_∞) + γ ρ_∞] )

which is exactly (3.23). Elementary calculus shows that the right-hand side is maximized for γ* = 1/(1 + ρ_∞). Finally, to establish π* ≥ 1 − log 2, first observe that the function f(t) ≡ t^{-2} (t − log(1 + t)) is monotonically decreasing for t > 0 since

    f(t) = t^{-2} ∫_0^t (t − τ)(1 + τ)^{-2} dτ = ∫_0^1 (1 − s)(1 + ts)^{-2} ds.

Two cases must now be distinguished:

• if ρ_∞ ≤ 1, then π* ≥ ρ² f(1) = ρ² (1 − log 2) ≥ 1 − log 2 (recalling that ρ ≥ 1 from (3.15) since we assumed that Ω(X) ∩ E = ∅);

• otherwise, we have π* ≥ ρ² ρ_∞^{-2} (1 − log 2) since t ↦ t − log(1 + t) is monotonically increasing. Now, ρ ≥ ρ_∞ since from Lemma 3.4:

    ρ² = Tr(ψ²) = Σ_{i=1}^{m} λ_i² ≥ max_{1≤i≤m} λ_i² = ρ_∞².

Hence π* ≥ 1 − log 2 also holds in this case.

References

[1] Alizadeh, F., "Optimization over the Positive-Semidefinite Cone: Interior-Point Methods and Combinatorial Applications," SIAM J. Opt., 5 (1995).

[2] Berman, A., Cones, Matrices, and Mathematical Programming, Lecture Notes in Economics and Mathematical Systems, Vol. 79, Springer Verlag, 1973.

[3] Boyd, S.P., and L. El Ghaoui, "Method of Centers for Minimizing Generalized Eigenvalues," Lin. Alg. & Applic., 188 (1993), pp. 63-111.

[4] Boyd, S.P., L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in Systems and Control Theory, SIAM Studies in Applied Mathematics, 1994.

[5] Doyle, J.C., K. Glover, P. Khargonekar, and B. Francis, "State-Space Solutions to Standard H2 and H∞ Control Problems," IEEE Trans. Aut. Contr., AC-34 (1989), pp. 831-847.

[6] Fan, M.K.H., "A Second-Order Interior Point Method for Solving Linear Matrix Inequality Problems," submitted to SIAM J. Contr. Opt., 1993.

[7] Gahinet, P., and P. Apkarian, "A Linear Matrix Inequality Approach to H∞ Control," Int. J. Robust and Nonlinear Contr., 4 (1994), pp. 421-448.

[8] Gahinet, P., A. Nemirovski, A.J. Laub, and M. Chilali, "The LMI Control Toolbox," in Proc. Conf. Dec. Contr., 1994, pp. 2038-2041.

[9] Gahinet, P., A. Nemirovski, A.J. Laub, and M. Chilali, LMI Control Toolbox, The MathWorks Inc., 1995.

[10] Golub, G.H., and C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, 1983.

[11] Jarre, F., "An Interior-Point Method for Minimizing the Maximum Eigenvalue of a Linear Combination of Matrices," SIAM J. Opt., 31 (1993).

[12] Karmarkar, N., "A New Polynomial-Time Algorithm for Linear Programming," Combinatorica, 4 (1984), pp. 373-395.

[13] Kojima, M., S. Shindoh, and S. Hara, "Interior-Point Methods for the Monotone Linear Complementarity Problem in Symmetric Matrices," Technical Report B-282, Dept. of Information Sciences, Tokyo Institute of Technology, 1994.

[14] Nesterov, Yu., and A. Nemirovski, "Polynomial-Time Barrier Methods in Convex Programming" (in Russian), Ekonomika i Matem. Metody, 24 (1988) (translated as Matekon).

[15] Nesterov, Yu., and A. Nemirovski, "An Interior-Point Method for Generalized Linear-Fractional Problems," to appear in Math. Programming Series B.

[16] Nesterov, Yu., and A. Nemirovski, Interior Point Polynomial Methods in Convex Programming: Theory and Applications, SIAM Studies in Applied Mathematics, SIAM, Philadelphia, 1994.

[17] Overton, M.L., "Large-Scale Optimization of Eigenvalues," SIAM J. Opt., 2 (1992), pp. 88-120.

[18] Rendl, F., and H. Wolkowicz, "Applications of Parametric Programming and Eigenvalue Maximization to the Quadratic Assignment Problem," Math. Program. Series B, 53 (1992), pp. 63-78.

[19] Vandenberghe, L., and S. Boyd, "Primal-Dual Potential Reduction Method for Problems Involving Matrix Inequalities," Math. Program. Series B, 69 (1995), pp. 205-236.
