On the condition number of linear least squares problems in Frobenius norm

Serge Gratton

CERFACS Technical Report TR/PA/94/27 - December 1995 - Submitted to BIT

Abstract

Let A be an m × n (m ≥ n) full rank real matrix and b a real vector of size m. We give in this paper an explicit formula for the condition number of the Linear Least Squares Problem (LLSP) defined by:

    min_{x ∈ R^n} ‖Ax - b‖_2.

Let α and β be two positive real numbers. We choose the weighted Frobenius norm ‖[αA, βb]‖_F on the data and the usual Euclidean norm on the solution. A straightforward generalization of the backward error of [6] to this norm is also provided. This allows us to carry out a first-order estimate of the forward error for the LLSP with this norm. This complete backward error analysis seems new. Finally, some numerical results are presented in the last section on matrices from the collection of [4]. Three algorithms have been tested:
- the QR factorization,
- the Normal Equations (NE),
- the Semi-Normal Equations (SNE).
For simplicity of the proofs, we only present results on real matrices. The results also hold, without any change, for complex matrices.

1 Introduction

Notation: In this paper, R^{m×n} is the set of m × n real matrices, A^T is the transpose of A, tr(A) is the trace of A and A^+ is the Moore-Penrose inverse of A. We will use the following property of the Frobenius norm: ‖A‖_F^2 = tr(A^T A). Throughout this paper, A is assumed to be a full rank matrix, so that A^+ = (A^T A)^{-1} A^T. The function

    F : R^{m×n} × R^m → R^n,  (A, b) ↦ F(A, b) = A^+ b = x    (1)

gives the solution x to the LLSP. The vector r = Ax - b is the residual vector of the LLSP at the solution x. We denote by K_2(A) = ‖A‖_2 ‖A^+‖_2 the generalized condition number of the

CERFACS, 42 av. G. Coriolis, 31057 Toulouse cedex, France. Email : [email protected]


matrix A. The perturbation theory of the LLSP is very rich, but to our knowledge there is no explicit formula for the condition number of this problem. A comprehensive overview of existing results can be found in [1]; there, the perturbation of matrices is usually measured in the spectral norm, and only upper bounds on the condition number are provided. Moreover, these bounds hold only if K_2(A) is small enough compared with the size of the perturbation.
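To make the notation concrete, here is a small Python sketch (ours, not part of the original report) that computes x = A^+ b = (A^T A)^{-1} A^T b for a made-up full-rank 3×2 matrix via the normal equations, and checks that the residual r = Ax - b satisfies A^T r = 0.

```python
# Illustrative sketch: solve a tiny LLSP through the normal equations,
# so x = A^+ b for full-rank A, and verify that A^T r = 0 at the solution.
# The 3x2 matrix and right-hand side below are made-up example data.

A = [[1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
b = [1.0, 2.0, 2.0]

# Form A^T A (2x2) and A^T b (2-vector).
ata = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
atb = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]

# Solve the 2x2 system (A^T A) x = A^T b by Cramer's rule;
# for full-rank A this is exactly x = (A^T A)^{-1} A^T b = A^+ b.
det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
x = [(atb[0] * ata[1][1] - ata[0][1] * atb[1]) / det,
     (ata[0][0] * atb[1] - atb[0] * ata[1][0]) / det]

# Residual r = Ax - b; the normal equations say A^T r = 0.
r = [sum(A[k][j] * x[j] for j in range(2)) - b[k] for k in range(3)]
atr = [sum(A[k][i] * r[k] for k in range(3)) for i in range(2)]
print(x)    # the least squares solution
print(atr)  # ~ [0, 0]: r is orthogonal to the range of A
```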

2 Condition number of the LLSP

In this section, we give the definitions of the absolute and relative condition numbers of a continuously differentiable function. We justify our choice of the weighted Frobenius norm, and finally give the formula for the relative condition number, which is useful for characterizing ill-conditioned LLSP.

2.1 Definition of the absolute and relative condition number

Let X and Y be two normed subspaces equipped with the norms ‖·‖_X and ‖·‖_Y. We denote by ‖·‖ the operator norm induced by the choice of ‖·‖_X and ‖·‖_Y. If F is a continuously differentiable function

    F : R^{m×n} × R^m → R^n,  x ↦ F(x),

the absolute condition number of F at x is the scalar ‖F'(x)‖. The relative condition number of F at x is

    ‖F'(x)‖ ‖x‖_X / ‖y‖_Y,  where y = F(x).

More advanced definitions and properties of the condition number can be found in [2].
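As a small illustration of these definitions (ours, not the paper's), take the scalar map F(x) = x^3: here F'(x) = 3x^2, so at x = 2 the absolute condition number is 12, while the relative condition number |F'(x)||x|/|F(x)| equals 3 for any nonzero x.

```python
# Illustrative sketch: absolute vs. relative condition number of the
# scalar map F(x) = x**3, with ||F'(x)|| estimated by a finite difference.

def F(x):
    return x ** 3

x = 2.0
h = 1e-6

# Finite-difference estimate of ||F'(x)||, the absolute condition number.
abs_cond = abs(F(x + h) - F(x)) / h

# Relative condition number ||F'(x)|| * ||x|| / ||F(x)||.
rel_cond = abs_cond * abs(x) / abs(F(x))

print(abs_cond)  # ~ 3 * x**2 = 12
print(rel_cond)  # ~ 3
```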

2.2 Choice of a norm for the data

The backward error analysis gives a hint on the role played by the norm. Indeed, to carry out a backward error analysis, we first have to define the data to be perturbed and a norm to measure the perturbation. Then the computed solution to the LLSP via an algorithm is considered as the exact solution of a nearby perturbed problem. The size of the perturbation, measured with the chosen norm, is the backward error ([2] for instance). Multiplying this backward error by the condition number of the problem gives a first-order estimate of the forward error. The choice of norm is appropriate if this estimate of the forward error is close to the one observed in numerical tests. Our parameterized Frobenius norm ‖[αA, βb]‖_F has been chosen for its flexibility. For instance, taking large values of α allows one to perturb b only, as shown in theorem 2.1, and therefore permits addressing problems, like parameter estimation, where the right-hand side b arises from physical measurements (see [2]). Similarly, the preliminary numerical results obtained in the last section indicate that the choice α = 1 and β = 1 is appropriate for estimating the forward error obtained with the normal equations.

2.3 Condition number

In this section, we derive the formula for the condition number of the LLSP using the weighted norm.

Theorem 2.1 The absolute condition number of the LLSP with the norm ‖[αA, βb]‖_F on the data and the norm ‖x‖_2 on the solution is

    C = ‖A^+‖_2 √( ‖x‖_2^2/α^2 + ‖A^+‖_2^2 ‖r‖_2^2/α^2 + 1/β^2 ).    (2)
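Formula (2) can be checked numerically. The sketch below (ours, not from the paper) uses a tiny full-rank matrix whose smallest singular value is 1, so that ‖A^+‖_2 = 1 can be read off directly, takes α = β = 1, and compares C against the effect of one small made-up perturbation (E, f).

```python
# Illustrative sketch of formula (2) with alpha = beta = 1:
#   C = ||A^+||_2 * sqrt(||x||^2 + ||A^+||_2^2 * ||r||^2 + 1).
# A, b, E and f below are made-up example data.
from math import sqrt

A = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]   # singular values 2 and 1
b = [1.0, 2.0, 3.0]

def lls_solve(A, b):
    """Solve min ||Ax - b||_2 for a full-rank 3x2 A via the normal equations."""
    ata = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
    atb = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    return [(atb[0] * ata[1][1] - ata[0][1] * atb[1]) / det,
            (ata[0][0] * atb[1] - atb[0] * ata[1][0]) / det]

x = lls_solve(A, b)                                                    # x = [1, 1]
r = [sum(A[k][j] * x[j] for j in range(2)) - b[k] for k in range(3)]   # r = [0, 0, -3]
norm_Ainv = 1.0   # ||A^+||_2 = 1 / (smallest singular value of A) = 1
C = norm_Ainv * sqrt(sum(v * v for v in x) + norm_Ainv**2 * sum(v * v for v in r) + 1)
print(C)  # sqrt(12) ~ 3.4641

# First-order check: a small perturbation (E, f) changes x by at most
# about C * ||[E, f]||_F, up to second-order terms.
t = 1e-7
E = [[t, -t], [t, t], [t, 0.0]]
f = [t, -t, t]
xp = lls_solve([[A[k][j] + E[k][j] for j in range(2)] for k in range(3)],
               [b[k] + f[k] for k in range(3)])
dx = sqrt(sum((xp[j] - x[j]) ** 2 for j in range(2)))
pert = sqrt(sum(v * v for row in E for v in row) + sum(v * v for v in f))
print(dx <= C * pert)  # True for this perturbation direction
```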

Proof. Since A is full rank, F(A, b) = (A^T A)^{-1} A^T b. Using the rules of composition of derivatives, we obtain

    F'(A, b).(E, f) = -A^+ E x + A^+ A^{+T} E^T r + A^+ f.    (3)

Obviously, AA^+ (resp. I - AA^+) is an orthogonal projection onto Im A (resp. (Im A)^⊥). Let E = E_A + E_A⊥ with E_A = AA^+ E and E_A⊥ = (I - AA^+) E. From ‖E‖_F^2 = tr(E^T E) we get, using A^+ = (A^T A)^{-1} A^T, that

    ‖E‖_F^2 = tr(E_A^T E_A) + tr(E_A⊥^T E_A⊥) = ‖E_A‖_F^2 + ‖E_A⊥‖_F^2.

Let ‖·‖ be the operator norm induced by the choice of ‖[αA, βb]‖_F on the data and ‖x‖_2 on the solution. Then

    ‖F'(A, b).(E, f)‖_2 = ‖A^+ (A^{+T} E_A⊥^T r - E_A x + f)‖_2
                        ≤ ‖A^+‖_2 ( ‖A^+‖_2 ‖E_A⊥‖_F ‖r‖_2 + ‖E_A‖_F ‖x‖_2 + ‖f‖_2 ).

The norm of the linear map F'(A, b) is the supremum of ‖F'(A, b).(E, f)‖_2 on the unit ball of R^{m×n} × R^m, that is, with our choice of norm, the supremum over the set of points (E, f) such that ‖[αE, βf]‖_F^2 = α^2 ‖E‖_F^2 + β^2 ‖f‖_2^2 = 1. One then gets

    ‖F'(A, b)‖ = sup_{α^2 ‖E‖_F^2 + β^2 ‖f‖_2^2 = 1} ‖A^+ (A^{+T} E_A⊥^T r - E_A x + f)‖_2
               ≤ sup_{α^2 ‖E_A‖_F^2 + α^2 ‖E_A⊥‖_F^2 + β^2 ‖f‖_2^2 = 1} ‖A^+‖_2 ( ‖A^+‖_2 ‖E_A⊥‖_F ‖r‖_2 + ‖E_A‖_F ‖x‖_2 + ‖f‖_2 ).

Let us define z = [ α‖E_A⊥‖_F, α‖E_A‖_F, β‖f‖_2 ]^T. Then

    ‖F'(A, b)‖ ≤ ‖A^+‖_2 sup_{‖z‖_2 = 1} [ ‖A^+‖_2 ‖r‖_2/α, ‖x‖_2/α, 1/β ] z = ‖A^+‖_2 ‖[ ‖A^+‖_2 ‖r‖_2/α, ‖x‖_2/α, 1/β ]‖_2.

Therefore,

    ‖F'(A, b)‖ ≤ ‖A^+‖_2 √( ‖x‖_2^2/α^2 + ‖A^+‖_2^2 ‖r‖_2^2/α^2 + 1/β^2 ).    (4)

We now show that this upper bound is attained. We consider for that purpose the singular value decomposition of A^+: A^+ = U Σ V^T. We define u and v respectively as the left and right singular vectors associated with the largest singular value ‖A^+‖_2 of A^+, and we set

    δ = √( ‖x‖_2^2/α^2 + ‖A^+‖_2^2 ‖r‖_2^2/α^2 + 1/β^2 ),
    E = (1/(δα^2)) ( -v x^T + ‖A^+‖_2 r u^T ),   f = (1/(δβ^2)) v.

By definition, A^+ v = ‖A^+‖_2 u and u^T A^+ = ‖A^+‖_2 v^T. The normal equations give A^T (Ax - b) = 0, so A^+ r = (A^T A)^{-1} A^T (Ax - b) = 0, and hence v^T r = u^T A^+ r / ‖A^+‖_2 = 0. We first prove that the perturbation (E, f) is feasible, that is, that α^2 ‖E‖_F^2 + β^2 ‖f‖_2^2 = 1. We have

    α^2 ‖E‖_F^2 = (1/(δ^2 α^2)) tr( (-v x^T + ‖A^+‖_2 r u^T)^T (-v x^T + ‖A^+‖_2 r u^T) )
                = (1/(δ^2 α^2)) ( tr(x x^T) + ‖A^+‖_2^2 ‖r‖_2^2 tr(u u^T) - 2 ‖A^+‖_2 tr(x v^T r u^T) )
                = (1/(δ^2 α^2)) ( ‖x‖_2^2 + ‖A^+‖_2^2 ‖r‖_2^2 ),  since v^T r = 0 and ‖u‖_2 = ‖v‖_2 = 1.

Further, β^2 ‖f‖_2^2 = 1/(δ^2 β^2), so that α^2 ‖E‖_F^2 + β^2 ‖f‖_2^2 = (1/δ^2)( ‖x‖_2^2/α^2 + ‖A^+‖_2^2 ‖r‖_2^2/α^2 + 1/β^2 ) = 1. We then compute the quantity F'(A, b).(E, f) for our perturbation. Using A^+ r = 0 and v^T r = 0, one gets

    F'(A, b).(E, f) = (‖x‖_2^2/(δα^2)) A^+ v + (‖A^+‖_2 ‖r‖_2^2/(δα^2)) A^+ A^{+T} u + (1/(δβ^2)) A^+ v.

Moreover, A^+ v = ‖A^+‖_2 u and A^{+T} u = ‖A^+‖_2 v, thus

    F'(A, b).(E, f) = (‖A^+‖_2 ‖x‖_2^2/(δα^2)) u + (‖A^+‖_2^3 ‖r‖_2^2/(δα^2)) u + (‖A^+‖_2/(δβ^2)) u
                    = (‖A^+‖_2/δ) ( ‖x‖_2^2/α^2 + ‖A^+‖_2^2 ‖r‖_2^2/α^2 + 1/β^2 ) u = ‖A^+‖_2 δ u.

Consequently,

    ‖F'(A, b).(E, f)‖_2 = ‖A^+‖_2 √( ‖x‖_2^2/α^2 + ‖A^+‖_2^2 ‖r‖_2^2/α^2 + 1/β^2 ),

with α^2 ‖E‖_F^2 + β^2 ‖f‖_2^2 = 1, so that

    ‖F'(A, b)‖ ≥ ‖A^+‖_2 √( ‖x‖_2^2/α^2 + ‖A^+‖_2^2 ‖r‖_2^2/α^2 + 1/β^2 ),

and inequality (4) completes the proof. □

Corollary 2.2 Taking α = 1 and β = 1 in the condition number C of Theorem 2.1 gives the case where both A and b are perturbed. Letting α tend to infinity (resp. β tend to infinity), no perturbation of the matrix A (resp. of the right-hand side) is permitted.

Proof. The condition number is the supremum of ‖F'(A, b).(E, f)‖_2 over all E and f satisfying the constraint α^2 ‖E‖_F^2 + β^2 ‖f‖_2^2 = 1. Therefore, α → ∞ implies E → 0 and, similarly, β → ∞ implies f → 0. □

2.4 Remarks

In [3], Geurts obtained the same condition number for perturbations of A only (β = +∞). The relative condition number with perturbations on both A and b (α = β = 1) is

    C = ( ‖A^+‖_2 ‖[A, b]‖_F / ‖x‖_2 ) √( ‖x‖_2^2 + ‖A^+‖_2^2 ‖r‖_2^2 + 1 ).

From C, we can derive the following classes of ill-conditioned LLSP:
- problems where ‖x‖_2 is small or ‖b‖_2 is large,
- problems where r = Ax - b is large,
- problems where K_2(A) is large, since ‖A‖_2 ≤ ‖A‖_F ≤ ‖[A, b]‖_F.

3 Forward error estimate for the LLSP

Let x̃ be a purported solution to the LLSP of minimizing ‖Ax - b‖_2, and let

    C = { (E, f) : ‖b + f - (A + E)x̃‖_2 = min_x ‖b + f - (A + E)x‖_2 }.

In [6], Waldén, Karlson and Sun give a formula for the LLSP backward error BE_WKS(θ) using the norm ‖[A, θb]‖_F. They actually prove that

    BE_WKS(θ) = min_{(E,f) ∈ C} ‖[E, θf]‖_F = min{ θ‖r̃‖_2 / √(1 + θ^2 ‖x̃‖_2^2), σ_min([A, R]) },

where r̃ = Ax̃ - b and R is defined by

    R = ( θ‖r̃‖_2 / √(1 + θ^2 ‖x̃‖_2^2) ) ( I - r̃ r̃^T / ‖r̃‖_2^2 ).

We now obtain a straightforward corollary of this result for the norm ‖[αA, βb]‖_F.

Corollary 3.1 The backward error using the weighted Frobenius norm is

    min_{(E,f) ∈ C} ‖[αE, βf]‖_F = α BE_WKS(β/α).    (5)

Proof. Immediate, using ‖[αE, βf]‖_F = α ‖[E, (β/α) f]‖_F. □

The main interest of this new formulation is that the backward error for perturbations on b only can be obtained by setting a high value for α. In the same way, β → ∞ gives the backward error for perturbations on A only, as has already been pointed out in [6].
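The scaling identity used in the proof of Corollary 3.1 is easy to verify directly. The sketch below (ours, on an arbitrary made-up pair (E, f)) checks that ‖[αE, βf]‖_F = α ‖[E, (β/α) f]‖_F.

```python
# Illustrative check of the identity behind Corollary 3.1:
#   ||[alpha*E, beta*f]||_F = alpha * ||[E, (beta/alpha)*f]||_F.
# E, f, alpha and beta below are arbitrary made-up data.
from math import sqrt

def frob(M):
    """Frobenius norm of a matrix given as a list of rows."""
    return sqrt(sum(v * v for row in M for v in row))

E = [[0.3, -1.2], [2.0, 0.7], [-0.5, 1.1]]
f = [0.9, -0.4, 2.2]
alpha, beta = 2.0, 5.0

# Stack [alpha*E, beta*f] as a 3x3 block matrix and take its Frobenius norm.
lhs = frob([[alpha * E[k][0], alpha * E[k][1], beta * f[k]] for k in range(3)])
rhs = alpha * frob([[E[k][0], E[k][1], (beta / alpha) * f[k]] for k in range(3)])
print(abs(lhs - rhs) < 1e-9)  # the two norms agree
```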


4 Numerical experiments

To validate the choice of norms on the data, we carry out tests on three least squares methods implemented in Matlab ([5]):
- the Householder QR factorization,
- the normal equations solved with the Cholesky factorization,
- the semi-normal equations approach.
We have used the three following matrices in our tests:
- the Vandermonde matrix Vand(25, 10) from [4],
- the Läuchli matrix Lau(5, 6) from [4], well known for illustrating the numerical hazard of forming A^T A explicitly,
- the matrix M(25, 10) = (m_ij) defined by m_ij = 1 if j > 25 - i and m_ij = 0 otherwise.
The right-hand side b is equal to (√1, √2, …, √m)^T. The experiments reported in Tables 1, 2 and 3 have been done for α = β = 1. We have computed the relative condition number C (whose value is given in the captions) and the relative backward error. The exact solutions to our problems were computed with the software Mathematica (see [7]); we have used these exact values to compute the forward error for each algorithm.
Tables 1, 2 and 3 report, for each algorithm (QR, NE, SNE), the backward error BE, the first-order estimate C·BE and the forward error FE:

Table 1: Vand(25, 15), C = 4·10^5.

Table 2: Lau(5, 6), C = 2·10^9.

Table 3: M(25, 10), C = 76.

Comparing C·BE with the forward error, we see that in every case we obtain an upper bound of the forward error. Furthermore, our choice of norm gives a rather good estimate in the case of the normal equations and semi-normal equations.
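The numerical hazard of the normal equations noted above for the Läuchli matrix can be reproduced in a few lines. In this sketch (ours, with a made-up ε rather than the paper's Lau(5, 6) data), ε^2 falls below the double-precision unit roundoff, so the computed A^T A is exactly singular although A has full rank.

```python
# Illustrative sketch: why forming A^T A explicitly is hazardous.
# For a Lauchli-type matrix with eps below sqrt(machine epsilon), the
# computed A^T A is exactly singular even though A has full rank.
eps = 1e-8  # eps**2 = 1e-16 is lost when added to 1.0 in double precision

A = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, eps]]

ata = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
print(ata)  # [[1.0, 1.0], [1.0, 1.0]] -- the eps**2 terms have vanished
print(det)  # 0.0: the normal equations are numerically singular
```

A QR factorization of A, by contrast, works on A itself and never forms the ill-conditioned cross-product matrix.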

5 Conclusion

To our knowledge, this paper establishes the first complete model for the forward error analysis of the LLSP. Nevertheless, to investigate further the properties of the resulting forward error estimate, it would be interesting to experiment with this estimate for more values of α and β. By taking m = n, the results described in this paper also hold for linear systems, and we obtain condition numbers and backward errors expressed in the proposed family of weighted norms.

Acknowledgments

The author is indebted to F. Chaitin-Chatelin and V. Fraysse for critical comments which considerably improved this paper.

References

[1] Å. Björck. Least Squares Methods. May 1991.

[2] F. Chaitin-Chatelin and V. Fraysse. Lectures on Finite Precision Computations. SIAM, Philadelphia, 1996. In press.

[3] A. J. Geurts. A contribution to the theory of condition. Numer. Math., 39:85-96, 1982.

[4] N. J. Higham. A collection of test matrices in MATLAB. ACM Trans. Math. Software, 17:289-305, September 1991.

[5] The MathWorks. MATLAB Reference Guide. The MathWorks Inc., Natick, MA, 1992.

[6] B. Waldén, R. Karlson, and J. Sun. Optimal backward perturbation bounds for the linear least squares problem. J. Numer. Linear Algebra Appl., 2:271-286, 1995.

[7] S. Wolfram. Mathematica, a System for Doing Mathematics by Computer. Addison-Wesley, 1988.
