Finite-Dimensional Linear Algebra

Solutions to selected odd-numbered exercises

Mark S. Gockenbach

June 19, 2012

Errata for the first printing

The following corrections will be made in the second printing of the text, expected in 2011. These solutions are written as if they have already been made.

Page 65, Exercise 14: belongs in Section 2.7.
Page 65, Exercise 16: should read “(cf. Exercise 2.3.21)”, not “(cf. Exercise 2.2.21)”.
Page 71, Exercise 9(b): Z54 should be Z45.
Page 72, Exercise 11: “over V” should be “over F”.
Page 72, Exercise 15: “i = 1, 2, . . . , k” should be “j = 1, 2, . . . , k” (twice).
Page 79, Exercise 1: “x3 = 2” should be “x3 = 3”.
Page 82, Exercise 14(a): “Each Ai and Bi has degree 2n + 1” should read “Ai, Bi ∈ P2n+1 for all i = 0, 1, . . . , n”.
Page 100, Exercise 11: “K : C[a, b] → C[a, b]” should be “K : C[c, d] → C[a, b]”.
Page 114, Line 9: “L : F n → Rm” should be “L : F n → F m”.
Page 115, Exercise 8:
S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} X = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}.
should be
S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, X = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}.
Page 116, Exercise 17(b): “F mn” should be “F mn”.
Page 121, Exercise 3: “T : R4 → R3” should be “T : R4 → R4”.
Page 124, Exercise 15: “T : X/ker(L) → R(U)” should be “T : X/ker(L) → R(L)”.
Page 124, Exercise 15: “T([x]) = T(x) for all [x] ∈ X/ker(L)” should be “T([x]) = L(x) for all [x] ∈ X/ker(L)”.
Page 129, Exercise 4(b): Period is missing at the end of the sentence.
Page 130, Exercise 8: “L : Z33 → Z33” should read “L : Z35 → Z35”.
Page 130, Exercise 13(b): “T defines . . . ” should be “S defines . . . ”.
Page 131, Exercise 15: “K : C[a, b] × C[c, d] → C[a, b]” should be “K : C[c, d] → C[a, b]”.
Page 138, Exercise 7(b): “define” should be “defined”.
Page 139, Exercise 12: In the last line, “sp{x1, x2, . . . , xn}” should be “sp{x1, x2, . . . , xk}”.
Page 139, Exercise 12: The proposed plan for the proof is not valid. Instead, the instructions should read: Choose vectors x1, . . . , xk ∈ X such that {T(x1), . . . , T(xk)} is a basis for R(T), and choose a basis {y1, . . . , yℓ} for ker(T). Prove that {x1, . . . , xk, y1, . . . , yℓ} is a basis for X. (Hint: First show that ker(T) ∩ sp{x1, . . . , xk} is trivial.)
Page 140, Exercise 15: In the displayed equation, |Aii should be |Aii|.
Page 168: Definition 132 defines the adjacency matrix of a graph, not the incidence matrix (which is something different). The correct term (adjacency matrix) is used throughout the rest of the section. (Change “incidence” to “adjacency” in three places: the title of Section 3.10.1, Page 168 line -2, Page 169 line 1.)
Page 199, Equation (3.41d): “x1, x2 ≤ 0” should be “x1, x2 ≥ 0”.


Page 204, Exercise 10: “α1, . . . , αk ∈ R” should be “α1, . . . , αk ≥ 0”. Also, C should not be boldface in the displayed formula.
Page 221, Exercise 9: “m > n” should be “m < n”.
Page 242, Corollary 194: “for each i = 1, 2, . . . , t” should be “for each i = 1, 2, . . . , m”.
Page 251, Exercise 18(e):
w = [ 0 ]
    [ v ],
should be
w = [ 0 ]
    [ v ].
(That is, the comma should be a period.)
Page 256, Exercise 13: First line should read “Let X be a finite-dimensional vector space over C with basis. . . ”. References in part (b) to F n×n, F k×k, F k×ℓ, F ℓ×ℓ should be replaced with Cn×n, etc. Also, in part (b), “Prove that [T]X” should be replaced with “Prove that [T]X,X”.
Page 264, Exercise 3: Add “Assume {p, q} is linearly independent.”
Page 271, Exercise 3: “. . . we introduced the incidence matrix . . . ” should be “. . . we introduced the adjacency matrix . . . ”.
Page 282, Exercise 6: S = sp{(1, 3, −3, 2), (3, 7, −11, −4)} should be S = sp{(1, 4, −1, 3), (4, 7, −19, 3)}.
Page 282, Exercise 7(b): “N(A) ∩ col(A)” should be “N(A) ∩ col(A) = {0}”.
Page 283, Exercise 12: “Lemma 5.1.2” should be “Lemma 229”.
Page 306, Example 252: “B = {p0, D(p0), D2(p0)} = {x2, 2x, 2}” should be “B = {D2(p0), D(p0), p0} = {2, 2x, x2}”. Also, “[T]B,B” should be “[D]B,B” (twice). Similarly, A should be defined as {2, −1 + 2x, 1 − x + x2} and “[T]A,A” should be “[D]A,A”.
Page 308, Exercise 3: “Suppose X is a vector space. . . ” should be “Suppose X is a finite-dimensional vector space. . . ”.
Page 311, Line 7: “corresponding to λ” should be “corresponding to λi”.
Page 316, Exercise 6(f): Should end with a “;” instead of a “.”.
Page 317, Exercise 15: “ker((T − λI)2) = ker(A − λI)” should be “ker((T − λI)2) = ker(T − λI)”.
Page 322, displayed equation (5.21): The last line should read vr′ = λvr.
Page 325, Exercise 9: “If U(t0) is singular, say U(t)c = 0 for some c ∈ Cn, c ≠ 0” should be “If U(t0) is singular, say U(t0)c = 0 for some c ∈ Cn, c ≠ 0”.
Page 331, Line 16: “. . . is at least t + 1” should be “. . . is at least s + 1”.
Page 356, Exercise 9: “. . . such that {x1, x2, x3, x4}.” should be “. . . such that {x1, x2, x3, x4} is an orthogonal basis for R4.”
Page 356, Exercise 13: “. . . be a linearly independent subset of V” should be “. . . be an orthogonal subset of V”.
Page 356, Exercise 14: “. . . be a linearly independent subset of V” should be “. . . be an orthogonal subset of V”.
Pages 365–368: Miscellaneous exercises 1–21 should be numbered 2–22.
Page 365, Exercise 6 (should be 7): “. . . under the L2(0, 1) norm” should be “. . . under the L2(0, 1) inner product”.
Page 383, Line 1: “col(T)” should be “col(A)” and “col(T)⊥” should be “col(A)⊥”.
Page 383, Exercise 3: “. . . a basis for R4” should be “. . . a basis for R3”.
Page 384, Exercise 6: “basis” should be “bases”.
Page 385, Exercise 14: “Exercise 6.4.13” should be “Exercise 6.4.1”. “That exercise also” should be “Exercise 6.4.13”.
Page 385, Exercise 15: “See Exercise 6.4” should be “See Exercise 6.4.14”.
Page 400, Exercise 4:
f̃(x) = f(a + ((b − a)/2)(t + 1))
should be
f̃(x) = f(a + ((b − a)/2)(x + 1)).
Page 410, Exercise 1: The problem should specify ℓ = 1, k(x) = x + 1, f(x) = −4x − 1.
Page 411, Exercise 6: “u(ℓ) = 0.” should be “u(ℓ) = 0” (i.e. there should not be a period after 0).
Page 424, Exercise 1: “. . . prove (1)” should be “. . . prove (6.50)”.
Page 432, Exercise 9: “G−1/2 is the inverse of G−1/2” should be “G−1/2 is the inverse of G1/2”.
Page 433, Exercise 16: “. . . so we will try to estimate the values u(x1), u(x2), . . . , u(xn)” should be “. . . so we will try to estimate the values u(x1), u(x2), . . . , u(xn−1)”.
Page 438, Exercise 3: “. . . define T : Rn → F n” should be “. . . define T : F n → F n”.
Page 448, Exercise 8: In the formula for f, −200x21 x2 should be −200x21 x2. Also, (−1.2, 1) should be (1, 1).
Page 453, Exercise 6: Add: “Assume ∇g(x(0)) has full rank.”
Page 475, Exercise 10: “A = GH” should be “A = GQ”.
Page 476, Exercise 15(a): the displayed formula
‖A‖F = sqrt( Σ_{i=1}^m Σ_{j=1}^n |A²ij )  for all A ∈ Cm×n
should be
‖A‖F = sqrt( Σ_{i=1}^m Σ_{j=1}^n |Aij|² )  for all A ∈ Cm×n.
Page 476, Exercise 15: No need to define ‖C‖F again.
Page 501, last paragraph: The text fails to define k ≡ ℓ (mod p) for general k, ℓ ∈ Z. The following text should be added: “In general, for k, ℓ ∈ Z, we say that k ≡ ℓ (mod p) if p divides k − ℓ, that is, if there exists m ∈ Z with k = ℓ + mp. It is easy to show that, if r is the congruence class of k ∈ Z, then p divides k − r, and hence this is consistent with the earlier definition. Moreover, it is a straightforward exercise to show that k ≡ ℓ (mod p) if and only if k and ℓ have the same congruence class modulo p.”
Page 511, Theorem 381: “A^(k)ij = Aij” should be “M^(k)ij = Aij”.
Page 516, Exercise 8: “A(1), A(2), . . . , A(n−1)” should be “M(1), M(2), . . . , M(n−1)”.
Page 516, Exercise 10: n²/2 − n/2 should be n² − n.
Page 523, Exercise 6(b): “. . . the columns of AP are. . . ” should be “. . . the columns of AP^T are. . . ”.
Page 535, Theorem 401: ‖A‖1 should be ‖A‖∞ (twice).
Page 536, Exercise 1: “. . . be any matrix norm. . . ” should be “. . . be any induced matrix norm. . . ”.
Page 554, Exercise 4: “. . . b is consider the data . . . ” should be “. . . b is considered the data . . . ”.
Page 554, Exercise 5: “. . . b is consider the data . . . ” should be “. . . b is considered the data . . . ”.
Page 563, Exercise 7: “Let v ∈ Rm be given and define α = ±‖x‖2, let {u1, u2, . . . , um} be an orthonormal basis for Rm, where u1 = x/‖x‖2 . . . ” should be “Let v ∈ Rm be given, define α = ±‖v‖2, x = αe1 − v, u1 = x/‖x‖2, and let {u1, u2, . . . , um} be an orthonormal basis for Rm, . . . ”.
Page 571, Exercise 3: “Prove that the angle between A^k v0 and x1 converges to zero as k → ∞” should be “Prove that the angle between A^k v0 and sp{x1} = EA(λ1) converges to zero as k → ∞”.
Page 575, line 15: 3n² − n should be 3n² + 2n − 5.
Page 575, line 16: “n square roots” should be “n − 1 square roots”.
Page 580, Exercise 3: “. . . requires 3n² − n arithmetic operations, plus the calculation of n square roots, . . . ” should be “. . . requires 3n² + 2n − 5 arithmetic operations, plus the calculation of n − 1 square roots, . . . ”.
Page 585, line 19: “original subsequence” should be “original sequence”.
Page 585, line 20: “original subsequence” should be “original sequence”.
Page 604, Exercise 4: “Theorem 4” should be “Theorem 451”.
Page 608, line 18: “. . . exists a real number. . . ” should be “. . . exists as a real number. . . ”.

Chapter 2

Fields and vector spaces

2.1 Fields

3. Let F be a field and let α ∈ F be nonzero. We wish to show that the multiplicative inverse of α is unique. Suppose β ∈ F satisfies αβ = 1. Then, multiplying both sides of the equation by α−1, we obtain α−1(αβ) = α−1 · 1, or (α−1α)β = α−1, or 1 · β = α−1. It follows that β = α−1, and thus α has a unique multiplicative inverse.

7. Let F be a field and let α, β be elements of F. We wish to show that the equation α + x = β has a unique solution. The proof has two parts. First, if x satisfies α + x = β, then adding −α to both sides shows that x must equal −α + β = β − α. This shows that the equation has at most one solution. On the other hand, x = −α + β is a solution since α + (−α + β) = (α − α) + β = 0 + β = β. Therefore, α + x = β has a unique solution, namely, x = −α + β.

13. Let F = {(α, β) : α, β ∈ R}, and define addition and multiplication on F by (α, β) + (γ, δ) = (α + γ, β + δ), (α, β) · (γ, δ) = (αγ, βδ). With these definitions, F is not a field because multiplicative inverses do not exist. It is straightforward to verify that (0, 0) is an additive identity and (1, 1) is a multiplicative identity. Then (1, 0) ≠ (0, 0), yet (1, 0) · (α, β) = (α, 0) ≠ (1, 1) for all (α, β) ∈ F. Since F contains a nonzero element with no multiplicative inverse, F is not a field.

15. Suppose F is a set on which are defined two operations, addition and multiplication, such that all the properties of a field are satisfied except that addition is not assumed to be commutative. We wish to show that, in fact, addition must be commutative, and therefore F must be a field. We first note that it is possible to prove that 0 · γ = 0, −1 · γ = −γ, and −(−γ) = γ for all γ ∈ F without invoking commutativity of addition. Moreover, for all α, β ∈ F, −β + (−α) = −(α + β) since (α + β) + (−β + (−α)) = ((α + β) + (−β)) + (−α) = (α + (β + (−β))) + (−α) = (α + 0) + (−α) = α + (−α) = 0. We therefore conclude that −1 · (α + β) = −β + (−α) for all α, β ∈ F. But, by the distributive property, −1 · (α + β) = −1 · α + (−1) · β = −α + (−β), and therefore −α + (−β) = −β + (−α) for all α, β ∈ F. Applying this property to −α, −β in place of α, β, respectively, yields α + β = β + α for all α, β ∈ F, which is what we wanted to prove.

19. Let F be a finite field.

(a) Consider the elements 1, 1 + 1, 1 + 1 + 1, . . . in F. Since F contains only finitely many elements, there must exist two terms in this sequence that are equal, say 1 + 1 + · · · + 1 (ℓ terms) and 1 + 1 + · · · + 1 (k terms), where k > ℓ. We can then add −1 to both sides ℓ times to show that 1 + 1 + · · · + 1 (k − ℓ terms) equals 0 in F. Since at least one term of the sequence 1, 1 + 1, 1 + 1 + 1, . . . equals 0, we can define n to be the smallest integer greater than 1 such that 1 + 1 + · · · + 1 = 0 (n terms). We call n the characteristic of the field.

(b) Given that the characteristic of F is n, for any α ∈ F, we have α + α + · · · + α = α(1 + 1 + · · · + 1) = α · 0 = 0 if the sum has n terms.

(c) We now wish to show that the characteristic n is prime. Suppose, by way of contradiction, that n = kℓ, where 1 < k, ℓ < n. Define α = 1 + 1 + · · · + 1 (k terms) and β = 1 + 1 + · · · + 1 (ℓ terms). Then αβ = 1 + 1 + · · · + 1 (n terms), so that αβ = 0. But this implies that α = 0 or β = 0, which contradicts the definition of the characteristic n. This contradiction shows that n must be prime.

2.2 Vector spaces

7. (a) The elements of P1 (Z2 ) are the polynomials 0, 1, x, 1 + x, which define distinct functions on Z2 . We have 0 + 0 = 0, 0 + 1 = 1, 0 + x = x, 0 + (1 + x) = 1 + x, 1 + 1 = 0, 1 + x = 1 + x, 1 + (1 + x) = x, x+x = (1+1)x = 0x = 0, x+(1+x) = 1+(x+x) = 1, (1+x)+(1+x) = (1+1)+(x+x) = 0+0 = 0. (b) Nominally, the elements of P2 (Z2 ) are 0, 1, x, 1 + x, x2 , 1 + x2 , x + x2 , 1 + x + x2 . However, since these elements are interpreted as functions mapping Z2 into Z2 , it turns out that the last four functions equal the first four. In particular, x2 = x (as functions), since 02 = 0 and 12 = 1. Then 1 + x2 = 1 + x, x + x2 = x + x = 0, and 1 + x + x2 = 1 + 0 = 1. Thus we see that the function spaces P2 (Z2 ) and P1 (Z2 ) are the same.

(c) Let V be the vector space consisting of all functions from Z2 into Z2 . To specify f ∈ V means to specify the two values f (0) and f (1). There are exactly four ways to do this: f (0) = 0, f (1) = 0 (so f (x) = 0); f (0) = 1, f (1) = 1 (so f (x) = 1); f (0) = 0, f (1) = 1 (so f (x) = x); and f (0) = 1, f (1) = 0 (so f (x) = 1 + x). Thus we see that V = P1 (Z2 ).

9. Let V = R2 with the usual scalar multiplication and the following nonstandard vector addition: u ⊕ v = (u1 + v1, u2 + v2 + 1) for all u, v ∈ R2. It is easy to check that commutativity and associativity of ⊕ hold, that (0, −1) is an additive identity, and that each u = (u1, u2) has an additive inverse, namely, (−u1, −u2 − 2). Also, α(βu) = (αβ)u for all u ∈ V, α, β ∈ R (since scalar multiplication is defined in the standard way). However, if α ∈ R, then α(u ⊕ v) = α(u1 + v1, u2 + v2 + 1) = (αu1 + αv1, αu2 + αv2 + α), while αu ⊕ αv = (αu1, αu2) ⊕ (αv1, αv2) = (αu1 + αv1, αu2 + αv2 + 1), and these are unequal if α ≠ 1. Thus the first distributive property fails to hold, and V is not a vector space over R. (In fact, the second distributive property also fails.)

15. Suppose U and V are vector spaces over a field F, and define addition and scalar multiplication on U × V by (u, v) + (w, z) = (u + w, v + z), α(u, v) = (αu, αv). We wish to prove that U × V is a vector space over F. In fact, the verifications of all the defining properties of a vector space are straightforward. For instance, (u, v) + (w, z) = (u + w, v + z) = (w + u, z + v) = (w, z) + (u, v) (using the commutativity of addition in U and V), and therefore addition in U × V is commutative. Note that the additive identity in U × V is (0, 0), where the first 0 is the zero vector in U and the second is the zero vector in V. We will not verify the remaining properties here.

2.3 Subspaces

3. Let V be a vector space over R, and let v ∈ V be nonzero. We wish to prove that S = {0, v} is not a subspace of V. If S were a subspace, then 2v would lie in S. But 2v ≠ 0 by Theorem 5, and 2v ≠ v (since otherwise adding −v to both sides would imply that v = 0). Hence 2v ∉ S, and therefore S is not a subspace of V.

7. Define S = {x ∈ R2 : ax1 + bx2 = 0}, where a, b ∈ R are constants. We will show that S is a subspace of R2. First, (0, 0) ∈ S, since a · 0 + b · 0 = 0. Next, suppose x ∈ S and α ∈ R. Then ax1 + bx2 = 0, and therefore a(αx1) + b(αx2) = α(ax1 + bx2) = α · 0 = 0. This shows that αx ∈ S, and therefore S is closed under scalar multiplication. Finally, suppose x, y ∈ S, so that ax1 + bx2 = 0 and ay1 + by2 = 0. Then a(x1 + y1) + b(x2 + y2) = (ax1 + bx2) + (ay1 + by2) = 0 + 0 = 0, which shows that x + y ∈ S, and therefore that S is closed under addition. This completes the proof.

11. Let R be regarded as a vector space over R. We wish to prove that R has no proper subspaces. It suffices to prove that if S is a nontrivial subspace of R, then S = R. So suppose S is a nontrivial subspace,

which means that there exists x ≠ 0 belonging to S. But then, given any y ∈ R, y = (yx−1)x belongs to S because S is closed under scalar multiplication. Thus R ⊂ S, and hence S = R.

17. Let S = {u ∈ C[a, b] : ∫_a^b u(x) dx = 0}. We will show that S is a subspace of C[a, b]. First, since the integral of the zero function is zero, we see that the zero function belongs to S. Next, suppose u ∈ S and α ∈ R. Then ∫_a^b (αu)(x) dx = ∫_a^b αu(x) dx = α ∫_a^b u(x) dx = α · 0 = 0, and therefore αu ∈ S. Finally, suppose u, v ∈ S. Then ∫_a^b (u + v)(x) dx = ∫_a^b (u(x) + v(x)) dx = ∫_a^b u(x) dx + ∫_a^b v(x) dx = 0 + 0 = 0. This shows that u + v ∈ S, and we have proved that S is a subspace of C[a, b].

19. Let V be a vector space over a field F, and let X and Y be subspaces of V.

(a) We will show that X ∩ Y is also a subspace of V. First of all, since 0 ∈ X and 0 ∈ Y, it follows that 0 ∈ X ∩ Y. Next, suppose x ∈ X ∩ Y and α ∈ F. Then, by definition of intersection, x ∈ X and x ∈ Y. Since X and Y are subspaces, both are closed under scalar multiplication and therefore αx ∈ X and αx ∈ Y, from which it follows that αx ∈ X ∩ Y. Thus X ∩ Y is closed under scalar multiplication. Finally, suppose x, y ∈ X ∩ Y. Then x, y ∈ X and x, y ∈ Y. Since X and Y are closed under addition, we have x + y ∈ X and x + y ∈ Y, from which we see that x + y ∈ X ∩ Y. Therefore, X ∩ Y is closed under addition, and we have proved that X ∩ Y is a subspace of V.

(b) It is not necessarily the case that X ∪ Y is a subspace of V. For instance, let V = R2, and define X = {x ∈ R2 : x2 = 0}, Y = {x ∈ R2 : x1 = 0}. Then (1, 0) ∈ X ⊂ X ∪ Y and (0, 1) ∈ Y ⊂ X ∪ Y; however, (1, 0) + (0, 1) = (1, 1) ∉ X ∪ Y. Thus X ∪ Y is not closed under addition, and hence is not a subspace of R2.

2.4 Linear combinations and spanning sets

3. Let S = sp{1 + 2x + 3x2 , x − x2 } ⊂ P2 . (a) There is a (unique) solution α1 = 2, α2 = 1 to α1 (1 + 2x + 3x2 ) + α2 (x − x2 ) = 2 + 5x + 5x2 . Therefore, 2 + 5x + 5x2 ∈ S.

(b) There is no solution α1, α2 to α1(1 + 2x + 3x2) + α2(x − x2) = 1 − x + x2. Therefore, 1 − x + x2 ∉ S.

7. Let u = (1, 1, −1), v = (1, 0, 2) be vectors in R3. We wish to show that S = sp{u, v} is a plane in R3. First note that if S = {x ∈ R3 : ax1 + bx2 + cx3 = 0}, then (taking x = u, x = v) we see that a, b, c must satisfy a + b − c = 0, a + 2c = 0. One solution is a = 2, b = −3, c = −1. We will now prove that S = {x ∈ R3 : 2x1 − 3x2 − x3 = 0}. First, suppose x ∈ S. Then there exist α, β ∈ R such that x = αu + βv = α(1, 1, −1) + β(1, 0, 2) = (α + β, α, −α + 2β), and 2x1 − 3x2 − x3 = 2(α + β) − 3α − (−α + 2β) = 2α + 2β − 3α + α − 2β = 0. Therefore, x ∈ {x ∈ R3 : 2x1 − 3x2 − x3 = 0}. Conversely, suppose x ∈ {x ∈ R3 : 2x1 − 3x2 − x3 = 0}. If we solve the equation αu + βv = x, we see that it has the solution α = x2, β = x1 − x2, and therefore x ∈ S. (Notice that x2(1, 1, −1) + (x1 − x2)(1, 0, 2) = (x1, x2, 2x1 − 3x2), and the assumption 2x1 − 3x2 − x3 = 0 implies that 2x1 − 3x2 = x3.) This completes the proof.

11. Let S = sp{(−1, −3, 3), (−1, −4, 3), (−1, −1, 4)} ⊂ R3. We wish to determine if S = R3 or if S is a proper subspace of R3. Given an arbitrary x ∈ R3, we solve α1(−1, −3, 3) + α2(−1, −4, 3) + α3(−1, −1, 4) = (x1, x2, x3) and find that there is a unique solution, namely, α1 = −13x1 + x2 − 3x3, α2 = 9x1 − x2 + 2x3, α3 = 3x1 + x3. This shows that every x ∈ R3 lies in S, and therefore S = R3.

15. Let V be a vector space over a field F, and let u ∈ V, u ≠ 0, α ∈ F. We wish to prove that sp{u} = sp{u, αu}. First, if x ∈ sp{u}, then x = βu for some β ∈ F, in which case we can write x = βu + 0(αu), which shows that x also belongs to sp{u, αu}. Conversely, if x ∈ sp{u, αu}, then there exist scalars β, γ ∈ F such that x = βu + γ(αu). But then x = (β + γα)u, and therefore x ∈ sp{u}. Thus sp{u} = sp{u, αu}.
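For Exercise 11, the claimed coefficient formulas can be double-checked numerically. The following sketch (using numpy, and not part of the original solution) verifies that the matrix encoding the formulas is the inverse of the matrix whose columns are the three spanning vectors.

```python
import numpy as np

# Columns are the spanning vectors from Exercise 2.4.11.
M = np.array([[-1, -1, -1],
              [-3, -4, -1],
              [ 3,  3,  4]], dtype=float)

# Rows encode the claimed formulas for alpha1, alpha2, alpha3 in terms of x1, x2, x3.
N = np.array([[-13,  1, -3],
              [  9, -1,  2],
              [  3,  0,  1]], dtype=float)

print(np.allclose(M @ N, np.eye(3)))   # True, so every x in R^3 lies in S
```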

2.5 Linear independence

3. Let V be a vector space over a field F, and let u1, . . . , un ∈ V. Suppose ui = 0 for some i, 1 ≤ i ≤ n, and define scalars α1, . . . , αn ∈ F by αk = 0 if k ≠ i, αi = 1. Then α1u1 + · · · + αnun = 0 · u1 + · · · + 0 · ui−1 + 1 · 0 + 0 · ui+1 + · · · + 0 · un = 0, and hence there is a nontrivial solution to α1u1 + · · · + αnun = 0. This shows that {u1, . . . , un} is linearly dependent.

9. We wish to show that {1, x, x2 } is linearly dependent in P2 (Z2 ). The equation α1 · 1 + α2 x + α3 x2 = 0 has the nontrivial solution α1 = 0, α2 = 1, α3 = 1. To verify this, we must simply verify that x + x2 is the zero function in P2 (Z2 ). Substituting x = 0, we obtain 0 + 02 = 0 + 0 = 0, and with x = 1, we obtain 1 + 12 = 1 + 1 = 0.

13. We wish to show that {p1, p2, p3}, where p1(x) = 1 − x2, p2(x) = 1 + x − 6x2, p3(x) = 3 − 2x2, is linearly independent and spans P2. We first verify that the set is linearly independent by solving α1(1 − x2) + α2(1 + x − 6x2) + α3(3 − 2x2) = 0. This equation is equivalent to the system α1 + α2 + 3α3 = 0, α2 = 0, −α1 − 6α2 − 2α3 = 0, and a direct calculation shows that the only solution is α1 = α2 = α3 = 0. To show that the set spans P2, we take an arbitrary p ∈ P2, say p(x) = c0 + c1x + c2x2, and solve α1(1 − x2) + α2(1 + x − 6x2) + α3(3 − 2x2) = c0 + c1x + c2x2. This is equivalent to the system α1 + α2 + 3α3 = c0, α2 = c1, −α1 − 6α2 − 2α3 = c2. There is a unique solution: α1 = −2c0 − 16c1 − 3c2, α2 = c1, α3 = c0 + 5c1 + c2. This shows that p ∈ sp{p1, p2, p3}, and, since p was arbitrary, that {p1, p2, p3} spans all of P2.

17. (a) Let V be a vector space over R, and suppose {x, y, z} is a linearly independent subset of V. We wish to show that {x + y, y + z, x + z} is also linearly independent. Let α1, α2, α3 ∈ R satisfy α1(x + y) + α2(y + z) + α3(x + z) = 0. This equation is equivalent to (α1 + α3)x + (α1 + α2)y + (α2 + α3)z = 0. Since {x, y, z} is linearly independent, it follows that α1 + α3 = α1 + α2 = α2 + α3 = 0. This system can be solved directly to show that α1 = α2 = α3 = 0, which proves that {x + y, y + z, x + z} is linearly independent.

(b) We now show, by example, that the previous result is not necessarily true if V is a vector space over some field F ≠ R. Let V = Z32, and define x = (1, 0, 0), y = (0, 1, 0), and z = (0, 0, 1). Obviously {x, y, z} is linearly independent. On the other hand, we have (x + y) + (y + z) + (x + z) = (1, 1, 0) + (0, 1, 1) + (1, 0, 1) = (1 + 0 + 1, 1 + 1 + 0, 0 + 1 + 1) = (0, 0, 0), which shows that {x + y, y + z, x + z} is linearly dependent.

21. Let V be a vector space over a field F, and suppose {u1, u2, . . . , un} is linearly dependent. We wish to prove that, given any i, 1 ≤ i ≤ n, either ui is a linear combination of u1, . . . , ui−1, ui+1, . . . , un or these vectors form a linearly dependent set. By assumption, there exist scalars α1, . . . , αn ∈ F, not all zero, such that α1u1 + · · · + αiui + · · · + αnun = 0. We now consider two cases. If αi ≠ 0, then we can solve the latter equation for ui to obtain ui = −αi^{−1}α1u1 − · · · − αi^{−1}αi−1ui−1 − αi^{−1}αi+1ui+1 − · · · − αi^{−1}αnun. In this case, ui is a linear combination of the remaining vectors. The second case is that αi = 0, in which case at least one of α1, . . . , αi−1, αi+1, . . . , αn is nonzero, and we have α1u1 + · · · + αi−1ui−1 + αi+1ui+1 + · · · + αnun = 0. This shows that {u1, . . . , ui−1, ui+1, . . . , un} is linearly dependent.

2.6 Basis and dimension

3. We now repeat the previous exercise for the vectors v1 = (−1, 3, −1), v2 = (1, −2, −2), v3 = (−1, 7, −13). If we try to solve α1v1 + α2v2 + α3v3 = x for an arbitrary x ∈ R3, we find that this equation is equivalent to the following system:

−α1 + α2 − α3 = x1,
α2 + 4α3 = 3x1 + x2,
0 = 8x1 + 3x2 + x3.

Since this system is inconsistent for most x ∈ R3 (the system is consistent only if x happens to satisfy 8x1 + 3x2 + x3 = 0), {v1 , v2 , v3 } does not span R3 and therefore is not a basis.

7. Consider the subspace S = sp{p1, p2, p3, p4, p5} of P3, where

p1(x) = −1 + 4x − x2 + 3x3,
p2(x) = 2 − 8x + 2x2 − 5x3,
p3(x) = 3 − 11x + 3x2 − 8x3,
p4(x) = −2 + 8x − 2x2 − 3x3,
p5(x) = 2 − 8x + 2x2 + 3x3.

(a) The set {p1, p2, p3, p4, p5} is linearly dependent (by Theorem 34) because it contains five elements and the dimension of P3 is only four.

(b) As illustrated in Example 39, we begin by solving α1p1(x) + α2p2(x) + α3p3(x) + α4p4(x) + α5p5(x) = 0; this is equivalent to the system

−α1 + 2α2 + 3α3 − 2α4 + 2α5 = 0,
4α1 − 8α2 − 11α3 + 8α4 − 8α5 = 0,
−α1 + 2α2 + 3α3 − 2α4 + 2α5 = 0,
3α1 − 5α2 − 8α3 − 3α4 + 3α5 = 0,

which reduces to

α1 = 16α4 − 16α5, α2 = 9α4 − 9α5, α3 = 0.

Since there are nontrivial solutions, {p1, p2, p3, p4, p5} is linearly dependent (which we already knew), but we can deduce more than that. By taking α4 = 1, α5 = 0, we see that α1 = 16, α2 = 9, α3 = 0, α4 = 1, α5 = 0 is one solution, which means that 16p1(x) + 9p2(x) + p4(x) = 0 ⇒ p4(x) = −16p1(x) − 9p2(x). This shows that p4 ∈ sp{p1, p2} ⊂ sp{p1, p2, p3}. Similarly, taking α4 = 0, α5 = 1, we find that −16p1(x) − 9p2(x) + p5(x) = 0 ⇒ p5(x) = 16p1(x) + 9p2(x), and hence p5 ∈ sp{p1, p2} ⊂ sp{p1, p2, p3}. It follows from Lemma 19 that sp{p1, p2, p3, p4, p5} = sp{p1, p2, p3}. Our calculations above show that {p1, p2, p3} is linearly independent (if α4 = α5 = 0, then also α1 = α2 = α3 = 0). Therefore, {p1, p2, p3} is a linearly independent spanning set of S and hence a basis for S.

13. Suppose V is a vector space over a field F, and S, T are two n-dimensional subspaces of V. We wish to prove that if S ⊂ T, then in fact S = T. Let {s1, s2, . . . , sn} be a basis for S. Since S ⊂ T, this implies that {s1, s2, . . . , sn} is a linearly independent subset of T. We will now show that {s1, s2, . . . , sn} also spans T. Let t ∈ T be arbitrary. Since T has dimension n, the set {s1, s2, . . . , sn, t} is linearly dependent by Theorem 34. But then, by Lemma 33, t must be a linear combination of s1, s2, . . . , sn (since no sk is a linear combination of s1, s2, . . . , sk−1). This shows that t ∈ sp{s1, s2, . . . , sn}, and hence we have shown that {s1, s2, . . . , sn} is a basis for T. But then T = sp{s1, s2, . . . , sn} = S, as desired.

2.7 Properties of bases

1. Consider the following vectors in R3: v1 = (1, 5, 4), v2 = (1, 5, 3), v3 = (17, 85, 56), v4 = (1, 5, 2), v5 = (3, 16, 13).

(a) We wish to show that {v1, v2, v3, v4, v5} spans R3. Given an arbitrary x ∈ R3, the equation α1v1 + α2v2 + α3v3 + α4v4 + α5v5 = x is equivalent to the system α1 + α2 + 17α3 + α4 + 3α5 = x1, 5α1 + 5α2 + 85α3 + 5α4 + 16α5 = x2, 4α1 + 3α2 + 56α3 + 2α4 + 13α5 = x3. Applying Gaussian elimination, this system reduces to

α1 = 17x1 − 4x2 + x3 − 5α3 + α4,
α2 = x2 − x1 − x3 − 12α3 − 2α4,
α5 = x2 − 5x1.

This shows that there are solutions regardless of the value of x; that is, each x ∈ R3 can be written as a linear combination of v1, v2, v3, v4, v5. Therefore, {v1, v2, v3, v4, v5} spans R3.

(b) Now we wish to find a subset of {v1 , v2 , v3 , v4 , v5 } that is a basis for R3 . According to the calculations given above, each x ∈ R3 can be written as a linear combination of {v1 , v2 , v5 } (just take α3 = α4 = 0 in the system solved above). Since dim(R3 ) = 3, any three vectors spanning R3 form a basis for R3 (by Theorem 45). Hence {v1 , v2 , v5 } is a basis for R3 . 5. Let u1 = (1, 4, 0, −5, 1), u2 = (1, 3, 0, −4, 0), u3 = (0, 4, 1, 1, 4) be vectors in R5 . (a) To show that {u1 , u2 , u3 } is linearly independent, we solve the equation α1 u1 + α2 u2 + α3 u3 = 0, which is equivalent to the system α1 + α2 = 0, 4α1 + 3α2 + 4α3 = 0, α3 = 0, −5α1 − 4α2 + α3 = 0,

α1 + 4α3 = 0.

A direct calculation shows that this system has only the trivial solution. (b) To extend {u1 , u2 , u3 } to a basis for R5 , we need two more vectors. We will try u4 = (0, 0, 0, 1, 0) and u5 = (0, 0, 0, 0, 1). We solve α1 u1 + α2 u2 + α3 u3 + α4 u4 + α5 u5 = 0 and find that the only solution is the trivial one. This implies that {u1 , u2 , u3 , u4 , u5 } is linearly independent and hence, by Theorem 45, a basis for R5 . 9. Consider the vectors u1 = (3, 1, 0, 4) and u2 = (1, 1, 1, 4) in Z45 . (a) It is obvious that {u1 , u2 } is linearly independent, since neither vector is a multiple of the other.

(b) To extend {u1 , u2 } to a basis for Z45 , we must find vectors u3 , u4 such that {u1 , u2 , u3 , u4 } is linearly independent. We try u3 = (0, 0, 1, 0) and u4 = (0, 0, 0, 1). A direct calculation then shows that α1 u1 + α2 u2 + α3 u3 + α4 u4 = 0 has only the trivial solution. Therefore {u1 , u2 , u3 , u4 } is linearly independent and hence, since dim(Z45 ) = 4, it is a basis for Z45 .

15. Let V be a vector space over a field F, and let {u1, . . . , un} be a basis for V. Let v1, . . . , vk be vectors in V, and suppose vj = α1,ju1 + · · · + αn,jun, j = 1, 2, . . . , k. Define the vectors x1, . . . , xk in F n by xj = (α1,j, . . . , αn,j), j = 1, 2, . . . , k.

(a) We first prove that {v1, . . . , vk} is linearly independent if and only if {x1, . . . , xk} is linearly independent. We will do this by showing that c1v1 + · · · + ckvk = 0 in V is equivalent to c1x1 + · · · + ckxk = 0 in F n. Then the first equation has only the trivial solution if and only if the second equation does, and the result follows. The proof is a direct manipulation, for which summation notation is convenient:

Σ_{j=1}^k cj vj = 0 ⇔ Σ_{j=1}^k cj ( Σ_{i=1}^n αij ui ) = 0
⇔ Σ_{j=1}^k Σ_{i=1}^n cj αij ui = 0
⇔ Σ_{i=1}^n Σ_{j=1}^k cj αij ui = 0
⇔ Σ_{i=1}^n ( Σ_{j=1}^k cj αij ) ui = 0.

Since {u1, . . . , un} is linearly independent, the last equation is equivalent to

Σ_{j=1}^k cj αij = 0, i = 1, 2, . . . , n,

which, by definition of xj and of addition in F n, is equivalent to

Σ_{j=1}^k cj xj = 0.

This completes the proof.

(b) Now we show that {v1, . . . , vk} spans V if and only if {x1, . . . , xk} spans F n. Since each vector in V can be represented uniquely as a linear combination of u1, . . . , un, there is a one-to-one correspondence between V and F n:

w = c1u1 + · · · + cnun ∈ V ←→ x = (c1, . . . , cn) ∈ F n.

Mimicking the manipulations in the first part of the exercise, we see that

Σ_{j=1}^k cj vj = w ⇔ Σ_{j=1}^k cj xj = x.

Thus the first equation has a solution for every w ∈ V if and only if the second equation has a solution for every x ∈ F n. The result follows.

2.8 Polynomial interpolation and the Lagrange basis

1. (a) The Lagrange polynomials for the interpolation nodes x0 = 1, x1 = 2, x2 = 3 are

L0(x) = (x − 2)(x − 3)/((1 − 2)(1 − 3)) = (1/2)(x − 2)(x − 3),
L1(x) = (x − 1)(x − 3)/((2 − 1)(2 − 3)) = −(x − 1)(x − 3),
L2(x) = (x − 1)(x − 2)/((3 − 1)(3 − 2)) = (1/2)(x − 1)(x − 2).

(b) The quadratic polynomial interpolating (1, 0), (2, 2), (3, 1) is

p(x) = 0L0(x) + 2L1(x) + L2(x)
     = −2(x − 1)(x − 3) + (1/2)(x − 1)(x − 2)
     = −(3/2)x2 + (13/2)x − 5.

5. We wish to write p(x) = 2 + x − x2 as a linear combination of the Lagrange polynomials constructed on the nodes x0 = −1, x1 = 1, x2 = 3. The graph of p passes through the points (−1, p(−1)), (1, p(1)), (3, p(3)), that is, (−1, 0), (1, 2), (3, −4). The Lagrange polynomials are

L0(x) = (x − 1)(x − 3)/((−1 − 1)(−1 − 3)) = (1/8)(x − 1)(x − 3),
L1(x) = (x + 1)(x − 3)/((1 + 1)(1 − 3)) = −(1/4)(x + 1)(x − 3),
L2(x) = (x + 1)(x − 1)/((3 + 1)(3 − 1)) = (1/8)(x + 1)(x − 1),

and therefore,

p(x) = 0L0(x) + 2L1(x) − 4L2(x)
     = −(1/2)(x + 1)(x − 3) − (1/2)(x + 1)(x − 1).

11. Consider a secret sharing scheme in which five individuals will receive information about the secret, and any two of them, working together, will have access to the secret. Assume that the secret is a two-digit integer, and that p is chosen to be 101. The degree of the polynomial will be one, since then the polynomial will be uniquely determined by two data points. Let us suppose that the secret is N = 42 and we choose the polynomial to be p(x) = N + c1x, where c1 = 71 (recall that c1 is chosen at random). We also choose the five interpolation nodes at random to obtain x1 = 9, x2 = 14, x3 = 39, x4 = 66, and x5 = 81. We then compute

y1 = p(x1) = 42 + 71 · 9 = 75,
y2 = p(x2) = 42 + 71 · 14 = 26,
y3 = p(x3) = 42 + 71 · 39 = 84,
y4 = p(x4) = 42 + 71 · 66 = 82,
y5 = p(x5) = 42 + 71 · 81 = 36

(notice that all arithmetic is done modulo 101). The data points, to be distributed to the five individuals, are (9, 75), (14, 26), (39, 84), (66, 82), (81, 36).
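As a quick illustration (not part of the original solution), the following Python sketch recomputes the five shares modulo 101 and shows how any two participants can recover the secret by fitting the line through their two points; pow(·, −1, p) for the modular inverse assumes Python 3.8 or later.

```python
p = 101
N, c1 = 42, 71                        # secret and randomly chosen coefficient
nodes = [9, 14, 39, 66, 81]

shares = [(x, (N + c1 * x) % p) for x in nodes]
print(shares)                         # [(9, 75), (14, 26), (39, 84), (66, 82), (81, 36)]

# Any two shares determine the degree-one polynomial; its value at 0 is the secret.
(x1, y1), (x2, y2) = shares[0], shares[3]
slope = (y2 - y1) * pow(x2 - x1, -1, p) % p
print((y1 - slope * x1) % p)          # 42
```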

2.9 Continuous piecewise polynomial functions

1. The following table shows the maximum errors obtained in approximating f(x) = ex on the interval [0, 1] by polynomial interpolation and by piecewise linear interpolation, each on a uniform grid with n nodes.

 n   Poly. interp. err.    PW linear interp. err.
 1   2.1187 · 10^{-1}      2.1187 · 10^{-1}
 2   1.4420 · 10^{-2}      6.6617 · 10^{-2}
 3   9.2390 · 10^{-4}      3.2055 · 10^{-2}
 4   5.2657 · 10^{-5}      1.8774 · 10^{-2}
 5   2.6548 · 10^{-6}      1.2312 · 10^{-2}
 6   1.1921 · 10^{-7}      8.6902 · 10^{-3}
 7   4.8075 · 10^{-9}      6.4596 · 10^{-3}
 8   1.7565 · 10^{-10}     4.9892 · 10^{-3}
 9   5.8575 · 10^{-12}     3.9692 · 10^{-3}
10   1.8119 · 10^{-13}     3.2328 · 10^{-3}

For this example, polynomial interpolation is very effective.
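A sketch like the following (not from the text) can reproduce errors of this kind; it assumes that the grid for the value n consists of n + 1 equally spaced points on [0, 1] (so the polynomial interpolant has degree n), and it estimates the maximum error on a fine sample grid.

```python
import numpy as np

f = np.exp
xs = np.linspace(0.0, 1.0, 2001)          # fine grid for estimating the max error

for n in range(1, 11):
    nodes = np.linspace(0.0, 1.0, n + 1)  # assumed grid: n + 1 equally spaced nodes
    coeffs = np.polyfit(nodes, f(nodes), n)                    # degree-n interpolant
    poly_err = np.max(np.abs(f(xs) - np.polyval(coeffs, xs)))
    pw_err = np.max(np.abs(f(xs) - np.interp(xs, nodes, f(nodes))))  # piecewise linear
    print(f"{n:2d}   {poly_err:.4e}   {pw_err:.4e}")
```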

Chapter 3

Linear operators

3.1 Linear operators

1. Let m, b be real numbers, and let f : R → R be defined by f(x) = mx + b. If b is nonzero, then this function is not linear. For example,

f(2 · 1) = f(2) = 2m + b,
2f(1) = 2(m + b) = 2m + 2b.

Since b ≠ 0, 2m + b ≠ 2m + 2b, which shows that f is not linear. On the other hand, if b = 0, then f is linear: f(x + y) = m(x + y) = mx + my = f(x) + f(y) for all x, y ∈ R, and f(ax) = m(ax) = a(mx) = af(x) for all a, x ∈ R. Thus we see that f(x) = mx + b is linear if and only if b = 0.

5. We wish to determine which of the following real-valued functions defined on Rn is linear.

(a) f : Rn → R, f(x) = Σ_{i=1}^n xi.
(b) g : Rn → R, g(x) = Σ_{i=1}^n |xi|.
(c) h : Rn → R, h(x) = Π_{i=1}^n xi.

The function f is linear:

f(x + y) = Σ_{i=1}^n (x + y)i = Σ_{i=1}^n (xi + yi) = Σ_{i=1}^n xi + Σ_{i=1}^n yi = f(x) + f(y),
f(αx) = Σ_{i=1}^n (αx)i = Σ_{i=1}^n αxi = α Σ_{i=1}^n xi = αf(x).

However, g and h are both nonlinear. For instance, if x ≠ 0, then g(−x) ≠ −g(x) (in fact, g(−x) = g(x) for all x ∈ Rn). Also, if no component of x is zero, then h(2x) = 2^n h(x) ≠ 2h(x) (of course, we are assuming n > 1).

9. (a) If A ∈ C2×3, x ∈ C3 are defined by

A = [ 1+i   1−i    2i ]
    [ 2−i   1+2i   3  ],
x = (3, 2+i, 1−3i),

then Ax = (12 + 4i, 9 − 7i).

(b) If A ∈ Z2^{3×3}, x ∈ Z2^3 are defined by

A = [ 1 1 0 ]
    [ 1 0 1 ]
    [ 0 1 1 ],
x = (1, 1, 1),

then Ax = (0, 0, 0).

13. We now wish to give a formula for the ith row of AB, assuming A ∈ F m×n, B ∈ F n×p. As pointed out in the text, we have a standard notation for the columns of a matrix, but no standard notation for the rows. Let us suppose that the rows of B are r1, r2, . . . , rn. Then

(AB)ij = Σ_{k=1}^n Aik(rk)j = ( Σ_{k=1}^n Aik rk )j

(once again using the componentwise definition of the operations). This shows that the ith row of AB is

Σ_{k=1}^n Aik rk,

that is, the linear combination of the rows of B, with the weights taken from the ith row of A.
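A small numerical check of the row formula (illustrative only; the matrices below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 4)).astype(float)
B = rng.integers(-5, 5, size=(4, 2)).astype(float)

i = 1
row_i = sum(A[i, k] * B[k, :] for k in range(B.shape[0]))   # sum_k A_ik * (row k of B)
print(np.allclose((A @ B)[i, :], row_i))                    # True
```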

3.2 More properties of linear operators

3. Consider the linear operator mapping R2 into itself that sends each vector (x, y) to its projection onto the x-axis, namely, (x, 0). We see that e1 is mapped to itself and e2 = (0, 1) is mapped to (0, 0). Therefore, the matrix representing the linear operator is

A = [ 1 0 ]
    [ 0 0 ].

9. Let x ∈ RN be denoted as x = (x0, x1, . . . , xN−1). Given x, y ∈ RN, the convolution of x and y is the vector x ∗ y ∈ RN defined by

(x ∗ y)n = Σ_{m=0}^{N−1} xm yn−m, n = 0, 1, . . . , N − 1.

In this formula, y is regarded as defining a periodic vector of period N; therefore, if n − m < 0, we take yn−m = yN+n−m. The linearity of the convolution operator is obvious:

((x + z) ∗ y)n = Σ_{m=0}^{N−1} (x + z)m yn−m = Σ_{m=0}^{N−1} (xm + zm)yn−m
             = Σ_{m=0}^{N−1} (xm yn−m + zm yn−m)
             = Σ_{m=0}^{N−1} xm yn−m + Σ_{m=0}^{N−1} zm yn−m
             = (x ∗ y)n + (z ∗ y)n,

((αx) ∗ y)n = Σ_{m=0}^{N−1} (αx)m yn−m = Σ_{m=0}^{N−1} (αxm)yn−m = α Σ_{m=0}^{N−1} xm yn−m = α(x ∗ y)n.

Therefore, if y is fixed and F : RN → RN is defined by F(x) = x ∗ y, then F(x + z) = F(x) + F(z) for all x, z ∈ RN and F(αx) = αF(x) for all x ∈ RN, α ∈ R. This proves that F is linear. Next, notice that, if ek is the kth standard basis vector, then

F(ek)n = yn−k, n = 0, 1, . . . , N − 1.

It follows that

F(e0) = (y0, y1, . . . , yN−1),
F(e1) = (yN−1, y0, . . . , yN−2),
F(e2) = (yN−2, yN−1, . . . , yN−3),
...
F(eN−1) = (y1, y2, . . . , y0).

Therefore, F(x) = Ax for all x ∈ RN, where

A = [ y0     yN−1   yN−2   · · ·   y1 ]
    [ y1     y0     yN−1   · · ·   y2 ]
    [ y2     y1     y0     · · ·   y3 ]
    [ ...                          ...]
    [ yN−1   yN−2   yN−3   · · ·   y0 ].
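The circulant structure can be checked numerically; the sketch below (illustrative, using numpy and arbitrary test vectors) builds A with entries A[n, k] = y[(n − k) mod N] and compares Ax with the convolution sum.

```python
import numpy as np

def circular_convolve(x, y):
    N = len(x)
    return np.array([sum(x[m] * y[(n - m) % N] for m in range(N)) for n in range(N)])

rng = np.random.default_rng(1)
N = 6
x, y = rng.standard_normal(N), rng.standard_normal(N)

# Column k of A is F(e_k): y shifted cyclically by k, i.e., A[n, k] = y[(n - k) mod N].
A = np.array([[y[(n - k) % N] for k in range(N)] for n in range(N)])

print(np.allclose(circular_convolve(x, y), A @ x))   # True
```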

3.3 Isomorphic vector spaces

1. (a) The function f : R → R, f(x) = 2x + 1 is invertible since f(x) = y has a unique solution for each y ∈ R:

f(x) = y ⇔ 2x + 1 = y ⇔ x = (y − 1)/2.

We see that f−1(y) = (y − 1)/2.

(b) The function f : R → (0, ∞), f(x) = ex is also invertible, and the inverse is f−1(y) = ln(y): e^{ln(y)} = y for all y ∈ (0, ∞), ln(ex) = x for all x ∈ R.

(c) The function f : R2 → R2, f(x) = (x1 + x2, x1 − x2) is invertible since f(x) = y has a unique solution for each y ∈ R2:

f(x) = y ⇔ { x1 + x2 = y1, x1 − x2 = y2 } ⇔ x1 = (y1 + y2)/2, x2 = (y1 − y2)/2.

We see that f−1(y) = ((y1 + y2)/2, (y1 − y2)/2).

(d) The function f : R2 → R2, f(x) = (x1 − 2x2, −2x1 + 4x2) fails to be invertible, and in fact is neither injective nor surjective. For example, f(0) = 0 and also f((2, 1)) = (0, 0) = 0, which shows that f is not injective. The equation f(x) = (1, 1) has no solution:

f(x) = (1, 1) ⇔ { x1 − 2x2 = 1, −2x1 + 4x2 = 1 } ⇔ { x1 − 2x2 = 1, 0 = 3 }.

This shows that f is not surjective. 5. Let X, Y , and Z be sets, and suppose f : X → Y , g : Y → Z are given functions. (a) If f and g ◦ f are invertible, then g is invertible. We can prove this using the previous exercise and the fact that f −1 is invertible. We have g = (g ◦ f ) ◦ f −1 : ((g ◦ f ) ◦ f −1 )(y) = g(f (f −1 (y))) = g(y) for all y ∈ Y. Therefore, since g ◦ f and f −1 are invertible, the previous exercise implies that g is invertible.

(b) Similarly, if g and g ◦ f are invertible, then f is invertible since f = g −1 ◦ (g ◦ f ).

(c) If we merely know that g ◦ f is invertible, we cannot conclude that either f or g is invertible. For example, let f : R → R2, g : R2 → R be defined by f(x) = (x, 0) for all x ∈ R and g(y) = y1 + y2 for all y ∈ R2, respectively. Then (g ◦ f)(x) = g(f(x)) = g((x, 0)) = x + 0 = x, and it is obvious that g ◦ f is invertible (in fact, (g ◦ f)−1 = g ◦ f). However, f is not surjective and g is not injective, so neither is invertible.

13. Let X be the basis for Z2^3 from the previous exercise, let A ∈ Z2^{3×3} be defined by

A = [ 1 1 0 ]
    [ 1 0 1 ]
    [ 0 1 1 ],

and define L : Z2^3 → Z2^3 by L(x) = Ax. We wish to find [L]X,X. The columns of this matrix are [L(x1)]X, [L(x2)]X, [L(x3)]X. We have

[L(x1)]X = [(1, 1, 0)]X = (0, 1, 0),
[L(x2)]X = [(0, 1, 1)]X = (1, 0, 1),
[L(x3)]X = [(0, 0, 0)]X = (0, 0, 0),

and hence

[L]X,X = [ 0 1 0 ]
         [ 1 0 0 ]
         [ 0 1 0 ].

19. We wish to determine if the operator D defined in Example 79 is an isomorphism. In fact, D is not an isomorphism since it is not injective. For example, if p(x) = x and q(x) = x + 1, then p ≠ q but D(p) = D(q).

3.4 Linear operator equations

1. Suppose L : R3 → R3 is linear, b ∈ R3 is given, and y = (1, 0, 1), z = (1, 1, −1) are two solutions to L(x) = b. We are asked to find two more solutions to L(x) = b. We know that y − z = (0, −1, 2) satisfies L(y − z) = 0. Therefore, z + α(y − z) satisfies L(z + α(y − z)) = L(z) + αL(y − z) = b + α · 0 = b for all α ∈ R. Thus two more solutions of L(x) = b are z + 2(y − z) = (1, 1, −1) + (0, −2, 4) = (1, −1, 3), z + 3(y − z) = (1, 1, −1) + (0, −3, 6) = (1, −2, 5).

5. Let L : R3 → R3 satisfy ker(L) = sp{(1, 1, 1)} and L(u) = v, where u = (1, 1, 0) and v = (2, −1, 2). A vector x satisfies L(x) = v if and only if there exists α such that x = u + αz (where z = (1, 1, 1)), that is, if and only if x − u = αz for some α ∈ R.

(a) For x = (1, 2, 1), x − u = (0, 1, 1), which is not a multiple of z. Thus L(x) ≠ v.

(b) For x = (3, 3, 2), x − u = (2, 2, 2) = 2z. Thus L(x) = v.

(c) For x = (−3, −3, −2), x − u = (−4, −4, −2), which is not a multiple of z. Thus L(x) ≠ v.

7. Let A ∈ Z2^{3×3}, b ∈ Z2^3 be defined by

A = [ 1 1 0 ]
    [ 1 0 1 ]
    [ 0 1 1 ],
b = (1, 0, 1).

If we solve Ax = 0 directly, we obtain two solutions: x = (0, 0, 0) or x = (1, 1, 1). Thus the solution space of Ax = 0 is {(0, 0, 0), (1, 1, 1)}. If we solve Ax = b, we also obtain two solutions: x = (0, 1, 0) or x = (1, 0, 1). If L : Z32 → Z32 is the linear operator defined by L(x) = Ax for all x ∈ Z32 , then ker(L) = {(0, 0, 0), (1, 1, 1)} and the solution set of L(x) = b is {(0, 1, 0), (1, 0, 1)} = (0, 1, 0) + ker(L), as predicted by Theorem 90.
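Because Z2^3 has only eight elements, the kernel and the solution set can also be found by exhaustive search; a small illustrative script:

```python
from itertools import product

A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
b = (1, 0, 1)

def apply(A, x):                      # matrix-vector product over Z_2
    return tuple(sum(a * xi for a, xi in zip(row, x)) % 2 for row in A)

vectors = list(product((0, 1), repeat=3))
print([x for x in vectors if apply(A, x) == (0, 0, 0)])   # [(0, 0, 0), (1, 1, 1)]
print([x for x in vectors if apply(A, x) == b])           # [(0, 1, 0), (1, 0, 1)]
```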

3.5 Existence and uniqueness of solutions

3. Each part of this exercise describes an operator with certain properties. We wish to determine if such an operator can exist. (a) We wish to find a linear operator T : R3 → R2 such that T (x) = b has a solution for all b ∈ R2 . Any surjective operator will do, such as T : R3 → R2 defined by T (x) = (x1 , x2 ).

(b) No linear operator T : R2 → R3 has the property that T (x) = b has a solution for all b ∈ R3 . This would imply that T is surjective, but then Theorem 99 would imply that dim(R3 ) ≤ dim(R2 ), which is obviously not true.

(c) Every linear operator T : R3 → R2 has the property that, for some b ∈ R2 , the equation T (x) = b has infinitely many solutions. In fact, since dim(R2 ) < dim(R3 ), any such T : R3 → R2 fails to be injective and hence has a nontrivial kernel. Therefore, T (x) = 0 necessarily has infinitely many solutions. (d) We wish to construct a linear operator T : R2 → R3 such that, for some b ∈ R3 , the equation T (x) = b has infinitely many solutions. We will take T defined by T (x) = (x1 − x2 , x1 − x2 , 0). Then, for every α ∈ R, x = (α, α) satisfies T (x) = 0. (e) We wish to find a linear operator T : R2 → R3 with the property that T (x) = b does not have a solution for all b ∈ R3 , but when there is a solution, it is unique. Any nonsingular T will do, since R(T ) is necessarily a proper subspace of R3 (that is, T : R2 → R3 cannot be surjective), and nonsingularity implies that T (x) = b has a unique solution for each b ∈ R(T ) (by Definition 96 and Theorem 92). For example, T : R2 → R3 defined by T (x) = (x1 , x2 , 0) has the desired properties.

7. Define S : Pn → Pn by S(p)(x) = p(2x + 1). We wish to find the rank and nullity of S. We first find the kernel of S. We have that S(p) = 0 if and only if p(2x + 1) = 0 for all x ∈ R. Consider an arbitrary y ∈ R and define x = (y − 1)/2. Then

p(2x + 1) = 0 ⇒ p(2 · (y − 1)/2 + 1) = 0 ⇒ p(y) = 0.

Since p(y) = 0 for all y ∈ R, it follows that p must be the zero polynomial, and hence the kernel of S is trivial. Thus nullity(S) = 0. Now consider any q ∈ Pn, and define p ∈ Pn by p(x) = q((x − 1)/2). Then

(S(p))(x) = p(2x + 1) = q((2x + 1 − 1)/2) = q(x).

This shows that S is surjective, and hence rank(S) = dim(Pn) = n + 1.

13. (a) Suppose X and U are finite-dimensional vector spaces over a field F and T : X → U is an injective linear operator. Then R(T) is a subspace of U, and we can define T1 : X → R(T) by T1(x) = T(x) for all x ∈ X. Obviously T1 is surjective, and it is injective since T is injective. Thus T1 is an isomorphism between X and R(T).

(b) Consider the operator S : Pn−1 → Pn of Example 100 (and the previous exercise), which we have seen to be injective. By the previous part of this exercise, S defines an isomorphism between Pn−1 and R(S) = sp{x, x2, . . . , xn} ⊂ Pn.

3.6 The fundamental theorem; inverse operators

3. Let F be a field, let A ∈ F n×n , and let T : F n → F n be defined by T (x) = Ax. We first wish to show that A is invertible if and only if T is invertible. We begin by noting that if M ∈ F n×n and M x = x for all x ∈ F n , then M = I. This follows because each linear operator mapping F n into F n is represented by a unique matrix (with respect to the standard basis on F n ). The condition M x = x for all x ∈ F n shows that M represents the identity operator, as does the identity matrix I; hence M = I. Now suppose that T is invertible, and let B be the matrix of T −1 under the standard basis. We then have (AB)x = A(Bx) = T (T −1 (x)) = x for all x ∈ F n ⇒ AB = I and

(BA)x = B(Ax) = T −1 (T (x)) = x for all x ∈ F n ⇒ BA = I.

This shows that A is invertible and that B = A−1 .

Conversely, suppose that A is invertible, and define S : F n → F n by S(x) = A−1 x. Then S(T (x)) = A−1 (Ax) = (A−1 A)x = Ix = x for all x ∈ F n and

T (S(x)) = A(A−1 x) = (AA−1 )x = Ix = x for all x ∈ F n .

This shows that T is invertible and that S = T −1 .

Notice that the above also shows that if A is invertible, then A−1 is the matrix defining T −1.

7. We repeat the previous exercise for the operators defined below.

(a) M : R2 → R3 defined by M(x) = Ax, where

A = [ 1 1 ]
    [ 1 0 ]
    [ 0 1 ].

Since the dimension of the domain is less than the dimension of the co-domain, Theorem 99 implies that M is not surjective. The range of M is spanned by the columns of A (which are linearly independent), so

R(M) = sp{(1, 1, 0), (1, 0, 1)}

and rank(M) = 2. By the fundamental theorem, we see that nullity(M) = 0; thus ker(M) is trivial and M is injective.

(b) M : R3 → R2 defined by M(x) = Ax, where

A = [ 1 2  1 ]
    [ 1 0 −1 ].

Since the dimension of the domain is greater than the dimension of the co-domain, Theorem 93 implies that M cannot be injective. A direct calculation shows that ker(M) = sp{(1, −1, 1)}, and hence nullity(M) = 1. By the fundamental theorem, we see that rank(M) = 2 = dim(R2). Hence M must be surjective.

13. Let F be a field and suppose A ∈ F m×n, B ∈ F n×p. We wish to show that rank(AB) ≤ rank(A). For all x ∈ F p, we have (AB)x = A(Bx), where Bx ∈ F n. It follows that (AB)x ∈ col(A) for all x ∈ F p, which shows that col(AB) ⊂ col(A). Hence, by Exercise 2.6.13, dim(col(AB)) ≤ dim(col(A)), that is, rank(AB) ≤ rank(A).

15. Suppose A ∈ Cn×n is strictly diagonally dominant:

|Aii| > Σ_{j=1, j≠i}^n |Aij|, i = 1, 2, . . . , n.

We wish to prove that A is nonsingular. Suppose x ∈ Cn satisfies x ≠ 0, Ax = 0. Let i be an index with the property that |xi| ≥ |xk| for all k = 1, 2, . . . , n. Then

(Ax)i = Σ_{j=1}^n Aij xj = Aii xi + Σ_{j=1, j≠i}^n Aij xj.

Since (Ax)i = 0, we obtain

Aii xi = − Σ_{j=1, j≠i}^n Aij xj.

But

| Σ_{j=1, j≠i}^n Aij xj | ≤ Σ_{j=1, j≠i}^n |Aij||xj| ≤ Σ_{j=1, j≠i}^n |Aij||xi| = |xi| Σ_{j=1, j≠i}^n |Aij| < |xi||Aii|,

since |xi| > 0. This is a contradiction, which shows that x cannot be nonzero. Thus the only solution of Ax = 0 is x = 0, which shows that A must be nonsingular.

3.7 Gaussian elimination

3. The matrix A has no inverse; its rank is only 2. 7. The matrix A is not invertible; its rank is only 2.

3.8 Newton's method

1. The two solutions of (3.13) are

( √((−1 + √5)/2), (−1 + √5)/2 ) and ( −√((−1 + √5)/2), (−1 + √5)/2 ).

3. We apply Newton's method to solve F(x) = 0, where F : R3 → R3 is defined by

F(x) = ( x1² + x2² + x3² − 1, x1² + x2² − x3, 3x1 + x2 + 3x3 ).

There are two solutions:

(−0.721840, 0.311418, 0.6180340), (−0.390621, −0.682238, 0.6180340) (to six digits).
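A short Newton iteration (an illustrative sketch, not from the text) converges to the two solutions quoted above from nearby starting points:

```python
import numpy as np

def F(x):
    x1, x2, x3 = x
    return np.array([x1**2 + x2**2 + x3**2 - 1,
                     x1**2 + x2**2 - x3,
                     3*x1 + x2 + 3*x3])

def J(x):                             # Jacobian matrix of F
    x1, x2, x3 = x
    return np.array([[2*x1, 2*x2, 2*x3],
                     [2*x1, 2*x2, -1.0],
                     [ 3.0,  1.0,  3.0]])

for x in (np.array([-0.7, 0.3, 0.6]), np.array([-0.4, -0.7, 0.6])):
    for _ in range(20):
        x = x - np.linalg.solve(J(x), F(x))
    print(np.round(x, 6))
```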

3.9 Linear ordinary differential equations

1. If x1(t) = ert, x2(t) = tert, where r ∈ R, then the Wronskian matrix at t0 is

W = [ e^{r t0}     t0 e^{r t0}              ]
    [ r e^{r t0}   e^{r t0} + r t0 e^{r t0} ].

Choosing t0 = 0, we obtain

W = [ 1 0 ]
    [ r 1 ],

which is obviously nonsingular. Thus {ert, tert} is a linearly independent subset of C(R).

7. Consider the set {x1, x2, x3} ⊂ C(R), where x1(t) = t, x2(t) = t2, x3(t) = t3.

(a) The Wronskian matrix of x1, x2, x3 at t0 = 1 is

W = [ 1 1 1 ]
    [ 1 2 3 ]
    [ 0 2 6 ].

A direct calculation shows that ker(W) = {0}, and hence W is nonsingular. By Theorem 129, this implies that {x1, x2, x3} is linearly independent.

(b) The Wronskian matrix of x1, x2, x3 at t0 = 0 is

W = [ 0 0 0 ]
    [ 1 0 0 ]
    [ 0 2 0 ],

which is obviously singular.

Theorem 130 states that, if x1 , x2 , x3 are all solutions of a third-order linear ODE, then W is nonsingular for all t0 if and only if {x1 , x2 , x3 } is linearly independent. In this example, {x1 , x2 , x3 } is linearly independent, but W is singular for t0 = 0. This does not violate Theorem 130, because x1 , x2 , x3 are not solutions of a common third-order ODE.

3.10 Graph theory

1. Let G be a graph, and let vi, vj be two nodes in VG. Since (AG^ℓ)ij is the number of walks of length ℓ joining vi and vj, the distance between vi and vj is the smallest ℓ (ℓ = 1, 2, 3, . . .) such that (AG^ℓ)ij ≠ 0.

3. Let G be a graph, and let AG be the adjacency matrix of G. We wish to prove that (AG^2)ii is the degree of vi for each i = 1, 2, . . . , n. Since AG is symmetric ((AG)ij = (AG)ji for all i, j = 1, 2, . . . , n), we have

(AG^2)ii = Σ_{j=1}^n (AG)ij².

Now, (AG)ij² is 1 if an edge joins vi and vj, and it equals 0 otherwise. Thus (AG^2)ii counts the number of edges having vi as an endpoint.
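A quick check on a small made-up graph (illustrative only; the graph is arbitrary):

```python
import numpy as np

# Adjacency matrix of an example graph on 4 nodes with edges {1,2}, {1,3}, {2,3}, {3,4}.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

degrees = A.sum(axis=1)
print(np.array_equal(np.diag(A @ A), degrees))   # True: diagonal of A^2 gives degrees
```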

3.11 Coding theory

3. The following message is received.

010110 000101 010110 010110 010001 100100 010110 010001

It is known that the code of Example 141 is used. Let B be the 8 × 6 binary matrix whose rows are the above codewords. If we try to solve MG = B, we obtain

M = [ 0 1 0 1 ]
    [ x x x x ]
    [ 0 1 0 1 ]
    [ 0 1 0 1 ]
    [ 0 1 0 0 ]
    [ 1 0 0 1 ]
    [ 0 1 0 1 ]
    [ 0 1 0 0 ],

where the second “codeword” cannot be decoded because it is not a true codeword (that is mG = b2 is inconsistent, where b2 = 000101). In fact, both 000111 (= mG for m = 0001) and 001101 (= mG for m = 0011) are distance 1 from b2 . This means that the first ASCII character, with code 0101xxxx could be 01010001 (Q) or 01010011 (S). The remaining characters are 01010101 (U), 01001001 (I), 01010100 (T), so the message is either “QUIT” or “SUIT”.

3.12 Linear programming

5. The LP is unbounded; every point of the form (x1 , x2 ) = (3 + t, t/3), t ≥ 0, is feasible, and for such points z increases without bound as t → ∞. 11. If we apply the simplex method to the LP of Example 158, using the smallest subscript rule to choose both the entering and leaving variables, the method terminates after 7 iterations with an optimal solution of x = (1, 0, 1, 0) and z = 1. The basic variables change as follows: {x5 , x6 , x7 } → {x1 , x6 , x7 } → {x1 , x2 , x7 } → {x3 , x2 , x7 } → {x3 , x4 , x7 } → {x5 , x4 , x7 } → {x5 , x1 , x7 } → {x5 , x1 , x3 }.

Chapter 4

Determinants and eigenvalues

4.1 The determinant function

3. Define

A = [ a 0 0 0 ]
    [ 0 b 0 0 ]
    [ 0 0 c 0 ]
    [ 0 0 0 d ],

B = [ a b c d ]
    [ 0 e f g ]
    [ 0 0 h i ]
    [ 0 0 0 j ].

Then

det(A) = det(ae1, be2, ce3, de4) = a det(e1, be2, ce3, de4) = ab det(e1, e2, ce3, de4) = abc det(e1, e2, e3, de4) = abcd det(e1, e2, e3, e4) = abcd.

Here we have repeatedly used the second part of the definition of the determinant. To compute det(B), we note that we can add any multiple of one column to another without changing the determinant (the third part of the definition of the determinant). We have

det(B) = det(ae1, be1 + ee2, ce1 + fe2 + he3, de1 + ge2 + ie3 + je4)
       = a det(e1, be1 + ee2, ce1 + fe2 + he3, de1 + ge2 + ie3 + je4)
       = a det(e1, ee2, fe2 + he3, ge2 + ie3 + je4)
       = ae det(e1, e2, fe2 + he3, ge2 + ie3 + je4)
       = ae det(e1, e2, he3, ie3 + je4)
       = aeh det(e1, e2, e3, ie3 + je4)
       = aeh det(e1, e2, e3, je4)
       = aehj det(e1, e2, e3, e4) = aehj.

5. Consider the permutation τ = (4, 3, 2, 1) ∈ S4. We can write

τ = [2, 3][1, 4],
τ = [3, 4][2, 4][2, 3][1, 4][1, 3][1, 2].

Since τ is the product of an even number of transpositions, we see that σ(τ) = 1.

11. Let n be a positive integer, and let i and j be integers satisfying 1 ≤ i, j ≤ n, i ≠ j. For any τ ∈ Sn, define τ′ by τ′ = τ[i, j] (that is, τ′ is the composition of τ and the transposition [i, j]). Finally, define f : Sn → Sn by f(τ) = τ′. We wish to prove that f is a bijection. First, let γ ∈ Sn, and define θ ∈ Sn by θ = γ[i, j]. Then f(θ) = (γ[i, j])[i, j] = γ([i, j][i, j]). It is obvious that [i, j][i, j] is the identity permutation, and hence f(θ) = γ. This shows that f is surjective. Similarly, if θ1, θ2 ∈ Sn and f(θ1) = f(θ2), then θ1[i, j] = θ2[i, j]. But then θ1[i, j] = θ2[i, j] ⇒ (θ1[i, j])[i, j] = (θ2[i, j])[i, j] ⇒ θ1([i, j][i, j]) = θ2([i, j][i, j]) ⇒ θ1 = θ2. This shows that f is injective, and hence bijective.

4.2 Further properties of the determinant function

3. Let F be a field and let A ∈ F n×n. We wish to show that ATA is singular if and only if A is singular. We have det(ATA) = det(AT)det(A) = det(A)2, since det(AT) = det(A). It follows that det(ATA) = 0 if and only if det(A) = 0, that is, ATA is singular if and only if A is singular.

9. Let A ∈ F m×n, B ∈ F n×m, where m < n. We will show by example that both det(AB) = 0 and det(AB) ≠ 0 are possible. First, note that

A = [ 1 2 1 ]      B = [ 1 1 ]
    [ 1 0 1 ],         [ 1 1 ]
                       [ 1 1 ]

gives

AB = [ 4 4 ]
     [ 2 2 ],

and det(AB) = 4 · 2 − 2 · 4 = 0. On the other hand,

A = [ 1 2 1 ]      B = [ 1 1 ]
    [ 1 0 1 ],         [ 2 0 ]
                       [ 1 1 ]

gives

AB = [ 6 2 ]
     [ 2 2 ],

and det(AB) = 6 · 2 − 2 · 2 = 8.
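Both examples are easy to confirm numerically (an illustrative check):

```python
import numpy as np

A  = np.array([[1.0, 2.0, 1.0],
               [1.0, 0.0, 1.0]])
B1 = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
B2 = np.array([[1.0, 1.0], [2.0, 0.0], [1.0, 1.0]])

print(A @ B1, np.linalg.det(A @ B1))   # [[4. 4.] [2. 2.]], det ≈ 0
print(A @ B2, np.linalg.det(A @ B2))   # [[6. 2.] [2. 2.]], det ≈ 8
```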

4.3 Practical computation of det(A)

7. Suppose A ∈ Rn×n is invertible and has integer entries, and assume det(A) = ±1. Notice that the determinant of a matrix with integer entries is obviously an integer. We can compute the jth column of A−1 by solving Ax = ej. By Cramer's rule,

(A−1)ij = det(Ai(ej))/det(A) = ±det(Ai(ej)), i = 1, . . . , n.

Since det(Ai(ej)) is an integer, so is (A−1)ij. This holds for all i, j, and hence A−1 has integer entries.

8. The solution is x = (2, 1, 1).

4.5 Eigenvalues and the characteristic polynomial

1. (a) The eigenvalues are λ1 = 1 (algebraic multiplicity 2) and λ2 = −1 (algebraic multiplicity 1). Bases for the eigenspaces are {(0, 1, 0), (1, 0, 2)} and {(4, 0, 7)}, respectively. (b) The eigenvalues are λ1 = 2 (algebraic multiplicity 2) and λ2 = 1 (algebraic multiplicity 1). Bases for the eigenspaces are {(1, 1, −2)} and {(5, 5, −9)}, respectively.

5. Suppose A ∈ Rn×n has a real eigenvalue λ and a corresponding eigenvector z ∈ Cn. We wish to show that either the real or imaginary part of z is an eigenvector of A. We are given that Az = λz, z ≠ 0. Write z = x + iy, x, y ∈ Rn. Then Az = λz ⇒ A(x + iy) = λ(x + iy) ⇒ Ax + iAy = λx + iλy ⇒ Ax = λx and Ay = λy. Since z ≠ 0, it follows that x ≠ 0 or y ≠ 0. If x ≠ 0, then the real part of z is an eigenvector for A corresponding to λ; otherwise, y ≠ 0 and the imaginary part of z is an eigenvector.

9. Let q(r) = r^n + cn−1 r^{n−1} + · · · + c0 be an arbitrary polynomial with coefficients in a field F, and let

A = [ 0 0 0 · · · 0 −c0   ]
    [ 1 0 0 · · · 0 −c1   ]
    [ 0 1 0 · · · 0 −c2   ]
    [ ...            ...   ]
    [ 0 0 · · ·   1 −cn−1 ].

We wish to prove that pA(r) = q(r). We argue by induction on n. The case n = 1 is trivial, since A is 1 × 1 in that case: A = [−c0], |rI − A| = r + c0 = q(r). Suppose the result holds for polynomials of degree n − 1, let q(r) = r^n + cn−1 r^{n−1} + · · · + c0, and let A be defined as above. Then

rI − A = [  r  0  0 · · ·  c0       ]
         [ −1  r  0 · · ·  c1       ]
         [  0 −1  r · · ·  c2       ]
         [ ...              ...      ]
         [  0  0 · · · −1  r + cn−1 ],

and cofactor expansion along the first row yields

|rI − A| = r M11 + (−1)^{n+1} c0 M1n,

where M11 is the determinant of the (n − 1) × (n − 1) submatrix obtained by deleting the first row and first column (a matrix of the same form, built from c1, . . . , cn−1), and M1n is the determinant of the (n − 1) × (n − 1) submatrix obtained by deleting the first row and last column (a triangular matrix with −1 in every diagonal entry). By the induction hypothesis, the first determinant is

M11 = c1 + c2 r + · · · + cn−1 r^{n−2} + r^{n−1},

and the second is simply M1n = (−1)^{n−1}. Thus

pA(r) = |rI − A| = r (c1 + c2 r + · · · + cn−1 r^{n−2} + r^{n−1}) + (−1)^{n+1} c0 (−1)^{n−1} = c1 r + c2 r^2 + · · · + cn−1 r^{n−1} + r^n + c0 = q(r).

This completes the proof by induction.

4.6

Diagonalization

3. A is diagonalizable: A = XDX −1 , where √   √  √  0 1−i 2 i 2 −i 2 √ , X= . D= 1 1 0 1+i 2 7. A is diagonalizable: A = XDX −1 , where D=



0 0

0 1



, X=



0 1

1 1



.

11. Let F be a finite field. We will show that F is not algebraically closed by constructing a polynomial p(x) with coefficients in F such that p(x) 6= 0 for all x ∈ F . Let the elements of F be α1 , α2 , . . . , αq . Define p(x) = (x − α1 )(x − α2 ) · · · (x − αq ) + 1. Then p is a polynomial of degree q, and p(x) = 1 for all x ∈ F . Thus p(x) has no roots, which shows that F is not algebraically closed.

4.7

Eigenvalues of linear operators

1. Let T : R3 → R3 be defined by T (x) = (ax1 + bx2 , bx1 + ax2 + bx3 , bx2 + ax3 ), where a, b ∈ R are constants. Notice that T (x) = Ax for all x ∈ R3 , where   a b 0 A =  b a b . 0 b a

√ √ The eigenvalues of A are a, a+ 2b, a− 2b. If b = 0, then A is already diagonal, which means that [T ]X ,X is diagonal if X is the standard basis. If b 6= 0, then A (and hence T ) has three distinct eigenvalues, and hence three linearly independent eigenvectors. It follows that there exists a basis X such that [T ]X ,X is diagonal.

9. Let L : C3 → C3 be defined by

L(z) = (z1 , 2z2 , z1 + z3 ).

Then L(z) = Az for all z ∈ C3 , where 

1 A= 0 1

 0 0 2 0 . 0 1

The eigenvalues of A are the diagonal entries, λ1 = 1 (with algebraic multiplicity 2) and λ2 = 2. A straightforward calculation shows that EA (1) = sp{(0, 0, 1)}, and hence A is defective and not diagonalizable. Notice that A = [L]S,S , where S is the standard basis for C3 . Since [L]X ,X is either diagonalizable for every basis X or diagonalizable for no basis X , we see that [L]X ,X is not diagonalizable for every basis X of C3 .

29

4.8. SYSTEMS OF LINEAR ODES 11. Let T : Z22 → Z22 be defined by T (x) = (0, x1 + x2 ). Then T (x) = Ax for all x ∈ Z2 , where   0 0 A= . 1 1

The eigenvalues of A are the diagonal entries, λ1 = 0 and λ2 = 1. Corresponding eigenvectors are x1 = (1, 1) and x2 = (0, 1), respectively. It follows that if X = {x1 , x2 }, then [T ]X ,X is diagonal:   0 0 [T ]X ,X = . 0 1

4.8

Systems of linear ODEs

5. Let A ∈ R2×2 be defined by A=



1 9

4 1



.

We wish to solve the IVP u′ = Au, u(0) = v, where v = (1, 2). The eigenvalues of A are λ1 = 7 and λ2 = −5. Corresponding eigenvectors are (2, 3) and (2, −3), respectively. Therefore, the general solution of the ODE u′ = Au is     2 2 −5t 7t . + c2 e x(t) = c1 e −3 3 We solve x(0) = (1, 2) to obtain c1 = 7/12, c2 = −1/12, and thus the solution of the IVP is     1 −5t 7 7t 2 2 − e . e x(t) = 3 −3 12 12 7. etA =

4.9



et cos (2t) et sin (2t) −et sin (2t) et cos (2t)



.

Integer programming

1. Each of the matrices is totally unimodular by Theorem 219. The sets S1 and S2 are given below. (a) S1 = {1, 2}, S2 = {3, 4}.

(b) S1 = {1, 2, 3, 4}, S2 = ∅.

(c) S1 = {2, 3}, S2 = {1, 4}.

Chapter 5

The Jordan canonical form 5.1

Invariant subspaces

1. (a) S is not invariant under A (in fact, A does not map either basis vector into S). (b) T is invariant under A. 5. Let A ∈ R3×3 be defined by



3 0 A =  −6 1 2 0

 −1 3 , 0

and let S = sp{s1 , s2 }, where s1 = (0, 1, 0), s2 = (1, 0, 1)}. A direct calculation shows that As1 = s1 , As2 = −3s1 + 2s2 . It follows that S is invariant under A. We extend {s1 , s2 } to a basis {s1 , s2 , s3 } of R3 by defining s3 = (0, 0, 1), and define X = [s1 |s2 |s3 ]. Then   1 −3 3 X −1 AX =  0 2 −1  . 0 0 1

is block upper triangular (in fact, simply upper triangular).

9. Let U be a finite-dimensional vector space over a field F , and let T : U → U be a linear operator. Let U = {u1 , u2 , . . . , un } be a basis for U and define A = [T ]U ,U . Suppose X ∈ F n×n is an invertible matrix, and define J = X −1 AX. Finally, define vj =

n X

Xij ui , j = 1, 2, . . . , n.

i=1

(a) We wish to show that V = {v1 , v2 , . . . , vn } is a basis for U . Since n = dim(V ), it suffices to prove that V is linearly independent. First notice that ! n X n n X n n n n X X X X X Xij cj ui Xij cj ui = Xij ui = cj cj vj = j=1

j=1

i=1 j=1

j=1 i=1

i=1

= =

n n X X

j=1 n X j=1

31

Xij cj

i=1

(Xc)i ui .

!

ui

32

CHAPTER 5. THE JORDAN CANONICAL FORM Also,

Pn

j=1 (Xc)i ui

= 0 implies that Xc = 0 since U is linearly independent. Therefore, n X j=1

cj vj = 0 ⇒

n X j=1

(Xc)i ui = 0 ⇒ Xc = 0 ⇒ c = 0,

where the last step follows from the fact that X is invertible. Thus V is linearly independent.

(b) Now we wish to prove that [T ]V,V = J. The calculation above shows that if [v]V = c, then [v]U = Xc, that is, [v]U = X[v]V . Therefore, for all v ∈ V , [T ]V,V [v]V = [T (v)]V ⇔ [T ]V,V X −1 [v]U = X −1 [T (v)]U  ⇔ X[T ]V,V X −1 [v]U = [T (v)]U ,

which shows that [T ]U ,U = X[T ]V,V X −1 , or [T ]V,V = X −1 [T ]U ,U X = X −1 AX = J, as desired.

5.2

Generalized eigenspaces

5. For the given A ∈ R5×5 , we have pA (r) = (r − 1)3 (r + 1)2 , so the eigenvalues are λ1 = 1 and λ2 = −1. Direct calculation shows that dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N ((A − I)3 )) = 3, dim(N ((A − I)4 )) = 3 and dim(N (A + I)) = 1, dim(N ((A + I)2 )) = 2, dim(N ((A + I)3 )) = 2. These results show that the generalized eigenspaces are N ((A − I)3 ) and N ((A + I)2 ). Bases for these subspaces are N ((A − I)3 ) = sp{(0, 1, 0, 0, 0), (0, 0, 1, 0, 0), (0, 0, 0, 1, 0)},

N ((A + I)2 ) = sp{(−1, 0, 12, 4, 0), (1, 4, 0, 0, 2)}.

A direct calculation shows that the union of the two bases is linearly independent and hence a basis for R5 , and it then follows from Theorem 226 that R5 = N ((A − I)3 ) + N ((A + I)2 ). 9. Let F be a field, let λ ∈ F be an eigenvalue of A ∈ F n×n , and suppose that the algebraic and geometric multiplicities of λ are equal, say to m. We wish to show that N ((A − λI)2 ) = N (A − λI). By Theorem 235, there exists a positive integer k such that dim(N ((A − λI)k+1 )) = dim(N ((A − λI)k )) and dim(N ((A − λI)k )) = m. We know that N (A − λI) ⊂ N ((A − λI)2 ) ⊂ · · · ⊂ N ((A − λI)k ), and, by hypothesis, dim(N (A − λI)) = m = dim(N ((A − λI)k )). This implies that N (A − λI) = N ((A − λI)k ), and hence that N ((A − λI)2 ) = N (A − λI) (since N (A − λI) ⊂ N ((A − λI)2 ) ⊂ N ((A − λI)k )). 13. Let F be a field and suppose A ∈ F n×n has distinct eigenvalues λ1 , . . . , λt . We wish to show that A is diagonalizable if and only if mA (r) = (r − λ1 ) · · · (r − λt ).

First, suppose A is diagonalizable and write p(r) = (r − λ1 ) · · · (r − λt ). Every vector x ∈ Rn can be written as a linear combination of eigenvectors of A, from which it is easy to prove that p(A)x = 0 for all x ∈ F n (recall that the factors (A − λi I) commute with one another). Hence p(A) is the zero matrix.

33

5.3. NILPOTENT OPERATORS

It follows from Theorem 239 that mA (r) divides p(r). But every eigenvalue of A is a root of mA (r), and hence p(r) divides mA (r). Since both p(r) and mA (r) are monic, it follows that mA (r) = p(r), as desired. Conversely, suppose the minimal polynomial of A is mA (r) = (r − λ1 ) · · · (r − λt ). But then Corollary 244 implies that F n is the direct sum of the subspaces N (A − λ1 I), . . . , N (A − λt I), which are the eigenspaces EA (λ1 ), . . . , EA (λt ). Since F n is the direct sum of the eigenspaces, it follows that there is a basis of F n consisting of eigenvectors of A (see the proof of Theorem 226 given in Exercise 5.1.10), and hence A is diagonalizable.

5.3

Nilpotent operators

5. Let A ∈ Cn×n be nilpotent. We wish to prove that the index of nilpotency of A is at most n. By Exercise 2, the only eigenvalue of A is λ = 0, which means that the characteristic polynomial of A must be pA (r) = rn . By the Cayley-Hamilton theorem, pA (A) = 0, and hence An = 0. It follows that the index of nilpotency of A is at most n. 9. Suppose A ∈ Rn×n has 0 as its only eigenvalue, so that it must be nilpotent of index k for some k satisfying 1 ≤ k ≤ 5. The possibilities are • If k = 1, then A = 0 and dim(N (A)) = 5. • If k = 2, there is at least one chain x1 , Ax1 of nonzero vectors. There could be a second such chain, x2 , Ax2 , in which case the fifth basis vector must come from N (A), we have dim(N (A)) = 3, dim(N (A2 )) = 5, and A is similar to      

1 0 0 0 0

1 1 0 0 0

0 0 1 0 0

It is also possible that there is only one such chain, and A is similar to  1 1 0  0 1 0   0 0 1   0 0 0 0 0 0

0 0 1 1 0

0 0 0 0 1



  .  

in which case dim(N (A)) = 4, dim(N (A2 )) = 5, 0 0 0 1 0

0 0 0 0 1



  .  

• If k = 3, there is a chain x1 , Ax1 , A2 x1 of nonzero vectors. There are two possibilities for the other vectors needed for the basis. There could be a chain x2 , Ax2 (with A2 x2 0), in which case dim(N (A)) = 2, dim(N (A2 )) = 4, and dim(N (A3 )) = 5, and A is similar to      

1 0 0 0 0

1 1 0 0 0

0 1 1 0 0

0 0 0 1 0

0 0 0 1 1



  .  

34

CHAPTER 5. THE JORDAN CANONICAL FORM Altenatively, there could be two more independent vectors in N (A), in which case dim(N (A)) = 3, dim(N (A2 )) = 4, dim(N (A3 )) = 5, and A is similar to   1 1 0 0 0  0 1 1 0 0     0 0 1 0 0 .    0 0 0 1 0  0 0 0 0 1

• If k = 4, then there is a chain x1 , Ax1 , A2 x1 , A3 x1 of nonzero vectors, and there must be a second independent vector in N (A). Then dim(N (A)) = 2, dim(N (A2 )) = 3, dim(N (A3 )) = 4, dim(N (A4 )) = 5, and A is similar to   1 1 0 0 0  0 1 1 0 0     0 0 1 1 0 .    0 0 0 1 0  0 0 0 0 1

• If k = 5, then there is a chain x1 , Ax1 , A2 x1 , A3 x1 , A4 x1 of nonzero vectors, dim(N (A)) = 1, dim(N (A2 )) = 2, dim(N (A3 )) = 3, dim(N (A4 )) = 4, dim(N (A5 )) = 5, and A is similar to   1 1 0 0 0  0 1 1 0 0     0 0 1 1 0 .    0 0 0 1 1  0 0 0 0 1

15. Let F be a field and suppose A ∈ F n×n is nilpotent. We wish to prove that det(I + A) = 1. By Theorem 251 and the following discussion, we know there exists an invertible matrix X ∈ F n such that X −1 AX is upper triangular with zeros on the diagonal. But then X −1 (I + A)X = I + X −1 AX is upper triangular with ones on the diagonal. From this, we conclude that det(X −1 (I + A)X) = 1. Since I + A is similar to X −1 (I + A)X, it has the same determinant, and hence det(I + A) = 1.

5.4

The Jordan canonical form of a matrix

1. Let



1 A= 0 0

 1 1 1 1  ∈ R3×3 . 0 1

The only eigenvalue of A is λ = 1, and dim(N (A − I)) = 1. Therefore, there must be a single chain of the form (A − I)2 x1 , (A − I)x1 , x1 , where x1 ∈ N ((A − I)3 ) \ N ((A − I)2 ). We have   0 0 1 2 (A − I) =  0 0 0  , 0 0 0 and thus it is easy to see that x1 = (0, 0, 1) 6∈ N ((A − I)2 ). We define   1 1 0 X = [(A − I)2 x1 |(A − I)x1 |x1 ] =  0 1 0  , 0 0 1

35

5.5. THE MATRIX EXPONENTIAL and then

5. Let A ∈ R4×4 be defined by



1 1 X −1 AX =  0 1 0 0

 0 1 . 1



 −3 1 −4 −4  −17 1 −17 −38   A=  −4 −1 −3 −14  . 4 0 4 10

Then pA (r) = (r − 1)3 (r − 2). A direct calculation shows that dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N ((A − I)3 )) = 3 (and, of course, dim(N (A − 2I)) = 1. Therefore, the Jordan canonical form of A is   1 1 0 0  0 1 1 0     0 0 1 0 . 0 0 0 2

7. Let A ∈ R5×5 be defined by



−7 1 24 4 7 −9 4 21 3 6 −2 −1 11 2 3 −7 13 −18 −6 −8 3 −5 6 3 5

  A=  



  .  

Then pA (r) = (r −1)3 (r −2)2 . A direct calculation shows that dim(N (A−I)) = 2, dim(N ((A−I)2 )) = 3, dim(N (A − 2I)) = 1, and dim(N ((A − 2I)2 )) = 2. Therefore, the Jordan canonical form of A is   1 1 0 0 0  0 1 0 0 0     0 0 1 0 0     0 0 0 2 1  0 0 0 0 2

5.5

The matrix exponential

3. We wish to find the matrix exponential etA for the matrix given formation defined by  1 20 −9 0  0 1 0 −4 X=  −1 −20 0 −2 0 0 4 1 puts A in Jordan canonical form:

We have



1 1 0 0

tet et 0 0

t2 t 2e t

1  0 −1 X AX = J =   0 0 etJ



et  0 =  0 0

te et 0

0 1 1 0

in Exercise 5.4.5. The similarity trans   

 0 0  . 0  2

 0 0  , 0  e2t

36

CHAPTER 5. THE JORDAN CANONICAL FORM and etA = XetJ X −1 is 2 6 6 6 6 6 6 4

« 2 1 − 4t − t 2 et (16 − t) − « 16e2t „ 2 − 8e2t et 8 − 4t + t 2 et



« „ 2 −4t − t 2 et (16 − t) − « 16e2t „ 2 et 9 + 4t + t − 8e2t 2

tet

et

et −tet

−4et + 4e2t

−4et + 4e2t

0

et



−4t − t2



et (36 − 2t) − 36e2t “ ” et 18 + 4t + t2 − 18e2t −8et + 4e2t

3

7 7 7 7. 7 7 5

5. Let A, B ∈ Cn×n . (a) We first show that if A and B commute, then so do etA and B. We define U (t) = etA B − BetA . Then U ′ (t) = AetA B − BAetA = AetA B − ABetA  = A etA B − BetA = AU (t).

Also, U (0) = IB − BI = B − B = 0. Thus U satisfies U ′ = AU , U (0) = 0. But the unique solution of this IVP is U (t) = 0; hence etA B = BetA . (b) Use the preceding result to show that if A and B commute, the et(A+B) = etA etB holds. We define U (t) = etA etB . Then U ′ (t) = AetA etB + etA BetB = AetA etB + BetA etB = (A + B)etA etB (notice how we used the first part of the exercise), and U (0) = I. But et(A+B) also satisfies the IVP U ′ = (A + B)U , U (0) = I. Hence et(A+B) and etA etB must be equal. (c) Let A= Then e and eA+B

tA

=

"



0 1

1 0

1+e2 2 −1+e2 2e



−1+e2 2e 1+e2 2e

#

1 2

It is easy to verify that eA eB 6= eA+B .

5.6

0 0

, e

tB

, B=

 √ √  e 2 + e− 2  √ √  = 1 2 − 2 √ e − e 2 2 



1 0

=

 

.

1 0

1 1



 √ √  2 − 2 √1 − e e 2 √ √  1 2 − 2 e + e 2



.

Graphs and eigenvalues

3. The adjacency matrix is 

    AG =     

0 1 1 0 0 0 0

1 0 1 0 0 0 0

1 1 0 0 0 0 0

0 0 0 0 1 0 1

0 0 0 1 0 1 0

0 0 0 0 1 0 1

0 0 0 1 0 1 0



    ,    

and its eigenvalues are 0, ±1, ±2. Notice that the adjacency matrix shows that every vertex has degree 2, and hence G is 2-regular. Theorem 263 states that the largest eigenvalue of G should be 2, as indeed it is. Also, since the multiplicity of λ = 2 is 2, G must have two connected components. It does; the vertices v1 , v2 , v3 form one connected component, and the vertices v4 , v5 , v6 , v7 form the other.

Chapter 6

Orthogonality and best approximation 6.1

Norms and inner products

1. Let V be a normed vector space over R, and suppose u, v ∈ V with v = αu, α ≥ 0. Then ku + vk = ku + auk = k(1 + a)uk = (1 + a)kuk = kuk + akuk

= kuk + kauk = kuk + kvk

(note the repeated use of the second property of a norm from Definition 265). Thus, if v = au, a ≥ 0, then equality holds in the triangle inequality (ku + vk = kuk + kvk). 5. We wish to derive relationships among the L1 (a, b), L2 (a, b), and L∞ (a, b) norms. Three such relationships exist: kf k1 ≤ (b − a)kf k∞ for all f ∈ L∞ (a, b), √ kf k2 ≤ b − akf k∞ for all f ∈ L∞ (a, b), √ kf k1 ≤ b − akf k2 for all f ∈ L2 (a, b). The first two are simple; we have |f (x)| ≤ kf k∞ for (almost) all x ∈ [a, b], and hence kf k1 = kf k22 =

Z

Z

a

a b

b

|f (x)| dx ≤ |f (x)|2 dx ≤

Z

Z

b

kf k∞ dx = (b − a)kf k∞ ,

a b a

kf k2∞ dx = (b − a)kf k2∞ .

To prove the third relationship, define v(x) = 1 for all x ∈ [a, b]. Then Z

a

b

|f (x)| dx =

Z

a

b

|v(x)||f (x)| dx = h|v|, |f |i2 ≤ kvk2 kf k2 ,

where the last step follows from the Cauchy-Schwarz inequality. Since kvk2 = follows.



b − 1, the desired result

Note that it is not possible to bound kf k∞ by a multiple of either kf k1 or kf k2 , nor to bound kf k2 by a multiple of kf k1 . 9. The following graph shows the (boundaries of the) unit balls in the ℓ2 norm (solid curve), the ℓ1 norm (dotted curve), and the ℓ∞ norm (dashed curve). 37

38

CHAPTER 6. ORTHOGONALITY AND BEST APPROXIMATION

Notice that ℓ1 unit ball is contained in the other two, which is consistent with kxk1 ≥ kxk2 , kxk∞ , while the ℓ2 unit ball is contained in the ℓ∞ unit ball, consistent with kxk∞ ≤ kxk2 . 11. Suppose V is an inner product space and k · k is the norm defined by the inner product h·, ·i on V . Then, for all u, v ∈ V , ku + vk2 + ku − vk2 = hu + v, u + vi + hu − v, u − vi

= hu, ui + 2 hu, vi + hv, vi + hu, ui − 2 hu, vi + hv, vi = 2 hu, ui + 2 hv, vi = 2kuk2 + 2kvk2 .

Thus the parallelogram law holds in V . If u = (1, 1) and v = (1, −1) in R2 , then a direct calculation shows that ku + vk21 + ku − vk21 = 8, 2kuk21 + 2kvk21 = 16, and hence the parallelogram law does not hold for k · k1 . Therefore, k · k1 cannot be defined by an inner product. For the same u and v, we have ku + vk2∞ + ku − vk2∞ = 8, 2kuk2∞ + 2kvk2∞ = 4, and hence the parallelogram law does not hold for k · k∞. Therefore, k · k∞ cannot be defined by an inner product.

6.2

The adjoint of a linear operator

9. Let M : P2 → P3 be defined by M (p) = q, where q(x) = xp(x). We wish to find M ∗ , assuming that the L2 (0, 1) inner product is imposed on both P2 and P3 . Following the technique of Example 277 (and the previous exercise), we compute the matrix N , defined by Nij = hM (pi ), qj iP3 , where S1 = {p1 , p2 , p3 } = {1, x, x2 } and S2 = {q1 , q2 , q3 , q4 } = {1, x, x2 , x3 }:    N =

1 2 1 3 1 4

1 3 1 4 1 5

1 4 1 5 1 6

1 5 1 6 1 7

 .

The Gram matrix is the same as in the previous exercise, and we obtain   3 1 0 0 20 35   [M ]S3 ,S2 = G−1 N =  1 0 − 53 − 32 35  . 3 12 0 1 2 7

This shows that M ∗ maps a0 + a1 x + a2 x2 + a3 x3 to       3 3 3 32 12 1 a2 + a3 + a0 − a2 − a3 x + a1 + a2 + a3 x2 . 20 35 5 35 2 7

39

6.3. ORTHOGONAL VECTORS AND BASES

11. Let X and U be finite-dimensional inner product spaces over R, and suppose T : X → U is linear. Define S : R(T ∗ ) → R(T ) by S(x) = T (x) for all x ∈ R(T ∗ ). We will prove that S is an isomorphism between R(T ∗ ) and R(T ). (a) First, suppose x1 , x2 ∈ R(T ∗ ) satisfy S(x1 ) = S(x2 ). Since x1 , x2 ∈ R(T ∗ ), there exist u1 , u2 ∈ U such that x1 = T ∗ (u1 ), x2 = T ∗ (u2 ). Then we have S(x1 ) = S(x2 ) ⇒ T (x1 ) = T (x2 )

⇒ T (T ∗ (u1 )) = T (T ∗(u2 )) ⇒ T (T ∗ (u1 − u2 )) = 0

⇒ hu1 − u2 , T (T ∗ (u1 − u2 ))iU = 0

⇒ hT ∗ (u1 − u2 ), T ∗ (u1 − u2 )iX = 0 ⇒ T ∗ (u1 − u2 ) = 0 ⇒ T ∗ (u1 ) = T ∗ (u2 ) ⇒ x1 = x2 .

Therefore, S is injective. (b) Since S injective, Theorem 93 implies that dim(R(T )) ≥ dim(R(T ∗ )). This results holds for any linear operator mapping one finite-dimensional inner product space to another, and hence it applies to the operator T ∗ . Hence dim(R(T ∗ )) ≥ dim(R((T ∗ )∗ )). Since (T ∗ )∗ = T , it follows that dim(R(T ∗ )) ≥ dim(R(T )), and therefore that dim(R(T ∗ )) ≥ dim(R(T )) (that is, rank(T ∗ ) = rank(T )). (c) Now we see that S is an injective linear operator mapping one finite-dimensional vector space to another of the same dimension. It follows from Corollary 105 that S is also surjective and hence is an isomorphism.

6.3

Orthogonal vectors and bases

3. The equation α1 p1 + α2 p2 + α3 p3 = q is equivalent to   1 1 α1 − α2 + α3 + (α2 − α3 )x + α3 x2 = 3 + 2x + x2 , 2 6 which in turn is equivalent to the system 1 1 α1 − α2 + α3 = 3, 2 6 α2 − α3 = 2, α3 = 1.

The solution is α = (13/3, 3/1), and thus q(x) =

13 p1 (x) + 3p2 (x) + p3 (x). 3

7. Consider the functions ex and e−x to be elements of C[0, 1], and regard C[0, 1] as an inner product space under the L2 (0, 1) inner product. Define S = sp{ex , e−x }. We wish to find an orthogonal basis {f1 , f2 } for S. We will take f1 (x) = ex . The function f2 must be of the form f (x) = c1 ex + c2 e−x and satisfy Z

0

1

f (x)ex dx = 0.

40

CHAPTER 6. ORTHOGONALITY AND BEST APPROXIMATION This last condition leads to the equation c1

Z

1

e2x dx + c2

0

Z

0

1

1 dx = 0 ⇔

1 2 (e − 1)c1 + c2 = 0. 2

One solution is c1 = 2, c2 = 1 − e2 . Thus if f2 (x) = 2ex + (1 − e2 )e−x , then {f1 , f2 } is an orthonal basis for S. 15. Let V be an inner product space over R, and let u, v be vectors in V . (a) Assume u and v are nonzero. We wish to prove that v ∈ sp{u} if and only if | hu, vi | = kukkvk. First suppose v ∈ sp{u}. Then, by Exercise 13, v=



hv, ui | hv, ui | | hv, ui | hv, ui

= kuk = u ⇒ kvk = u .

2 hu, ui hu, ui kuk kuk

This yields kukkvk = | hv, ui |, as desired. On the other hand, if v 6∈ sp{u}, then, by the previous exercise,

hv, ui | hv, ui |

u = , kvk > hu, ui kuk

and hence kukkvk > | hv, ui |. Thus v ∈ sp{u} if and only if kukkvk = | hv, ui |. (b) If u and v are nonzero, then the first part of the exercise shows that equality holds in the CauchySchwarz inequality if and only if v ∈ sp{u}, that is, if and only if v is a multiple of u. If v = 0 or u = 0, then equality trivially holds in the Cauchy-Schwarz inequality (both sides are zero). Thus we can say that equality holds in the Cauchy-Schwarz inequality if and only if u = 0 or v is a multiple of u.

6.4

The projection theorem

1. Let A ∈ Rm×n . (a) We wish to prove that N (AT A) = N (A). First, if x ∈ N (A), then Ax = 0, which implies that AT Ax = 0, and hence that x ∈ N (AT A). Thus N (A) ⊂ N (AT A). Conversely, suppose x ∈ N (AT A). Then AT Ax = 0 ⇒ x · AT Ax = 0 ⇒ (Ax) · (Ax) = 0 ⇒ Ax = 0. Therefore, x ∈ N (A), and we have shown that N (AT A) ⊂ N (A). This completes the proof. (b) If A has full rank, then the null space of A is trivial (by the fundamental theorem of linear algebra) and hence so is the null space of N (AT A). Since AT A is square, this shows that AT A is invertible. (c) Thus, if A has full rank, then AT A is invertible and, for any y ∈ Rm there is a unique solution x = (AT A)−1 AT y of the normal equations AT Ax = AT y. Thus, by Theorem 291, there is a unique least-squares solution to Ax = y.

9. Consider the following data points: (0, 3.1), (1, 1.4), (2, 1.0), (3, 2.2), (4, 5.2), (5, 15.0). We wish to find the function of the form f (x) = a1 ex + a2 e−x that fits the data as nearly as possible in the least-squares sense. We wish to solve the equations a1 exi + a2 e−xi = yi , i = 1, 2, 3, 4, 5, 6

41

6.5. THE GRAM-SCHMIDT PROCESS in the least-squares sense. These equations are equivalent to the system M a = y, where     x 3.1 e 1 e−x1  1.4   ex2 e−x2      x  1.0   e 3 e−x3     , y = M =  2.2  .  ex4 e−x4      x  5.2   e 5 e−x5  ex6 e−x6 15.0

. The solution is a = (0.10013, 2.9878), and the approximating function is 0.10013ex + 2.9878e−x.

15. Let A ∈ Rm×n , where m < n and rank(A) = m. Let y ∈ Rm . (a) Since rank(A) = dim(col(A)), the fact that rank(A) = m proves that col(A) = Rm . Thus Ax = y has a solution for all y ∈ Rm . Moreover, by Theorem 93, N (A) is nontrivial (a linear operator cannot be injective unless the dimension of the co-domain is at least as large as the dimension of the domain), and hence Ax = y has infinitely many solutions. (b) Consider the matrix AAT ∈ Rm×m . We know from Exercise 1(a) (applied to AT ) that N (AAT ) = N (AT ). By Exercise 6.2.11, rank(AT ) = rank(A) = m, and hence nullity(AT ) = 0 by the fundamental theorem of linear algebra. Since N (AAT ) = N (AT ) is trivial, this proves that the square matrix AAT is invertible. −1 y, then (c) If x = AT AAT  −1 −1  Ax = A AT AAT y = y, y = (AAT ) AAT and hence x is a solution of Ax = y.

6.5

The Gram-Schmidt process

5. (a) The best cubic approximation, in the L2 (−1, 1) norm, to the function f (x) = ex , is the polynomial q(x) = α1 +α2 x+α3 x2 +α4 x3 , where Gα = b. G is the Gram matrix, and b is defined by bi = hf, p1 i2 , where pi (x) = xi−1 , i = 1, 2, 3, 4:     2 0 32 0 e − e−1   0 2 0 2   2e−1 3 5    G=  2 0 2 0  , b =  e − 5e−1  . 3 5 16e−1 − 2e 0 25 0 27 We obtain



  α= 

3e 33 4e − 4 105e 765 4 − 4e 15e 105 4 − 4e 175e 1295 4e − 4



  . 

(b) Applying the Gram-Schmidt process to the standard basis {1, x, x2 , x3 }, we obtain the orthogonal basis   1 3 3 2 1, x, x − , x − x . 3 5 (c) Using the orthogonal basis for P3 , we compute the best approximation to f (x) = ex from P3 (relative to the L2 (−1, 1) norm) to be       e − e−1 1 3 1295 175e 3 15e 105 x2 − + x3 − x . + x+ − − 2 e 4 4e 3 4e 4 5

42

CHAPTER 6. ORTHOGONALITY AND BEST APPROXIMATION 9. Let P be the plane in R3 defined by the equation 3x − y − z = 0. (a) A basis for P is {(1, 3, 0), (1, 0, 3)}; applying the Gram-Schmidt process to this basis yields the orthogonal basis {(1, 3, 0), (9/10, −3/10, 3)}. (b) The projection of u = (1, 1, 1) onto P is (8/11, 12/11, 12/11).

13. Define an inner product on C[0, 1] by hf, gi =

Z

1

(1 + x)f (x)g(x) dx.

0

(a) We will first verify that h·, ·i really does define an inner product on C[0, 1]. For all f, g ∈ C[0, 1], we have Z 1 Z 1 (1 + x)g(x)f (x) dx = hg, f i , (1 + x)f (x)g(x) dx = hf, gi = 0

0

and thus the first property of an inner product is satisfied. If f, g, h ∈ C[0, 1] and α, β ∈ R, then Z 1 hαf + βg, hi = (1 + x)(αf (x) + βg(x))h(x) dx 0

=

Z

1

0



Z

{α(1 + x)f (x)h(x) + β(1 + x)g(x)h(x)} dx 1

(1 + x)f (x)h(x) dx + β 0

Z

1

(1 + x)g(x)h(x) dx

0

= α hf, hi + β hg, hi .

This verifies the second property of an inner product. Finally, for any f ∈ C[0, 1], Z 1 hf, f i = (1 + x)f (x)2 dx ≥ 0 0

2

(since (1 + x)f (x) ≥ 0 for all x ∈ [0, 1]). Also, if hf, f i = 0, then (1 + x)f (x)2 = 0 for all x ∈ [0, 1]. Since 1 + x > 0 for all x ∈ [0, 1], this implies that f (x)2 ≡ 0, or f (x) ≡ 0. Therefore, hf, f i = 0 if and only if f = 0, and we have verified that h·, ·i defines an inner product on C[0, 1]. (b) Applying the Gram-Schmidt process to the standard basis yields the orthogonal basis   5 5 2 68 . 1, x − , x − x + 9 65 26

6.6

Orthogonal complements

1. Let S = sp{(1, 2, 1, −1), (1, 1, 2, 0)}. We wish to find a basis for S ⊥ . Let A ∈ R2×4 be the matrix whose rows are the given vectors (the basis vectors for S). Then x ∈ S ⊥ if and only if Ax = 0; that is, S ⊥ = N (A). A direct calculation shows that S ⊥ = N (A) = {(−3, 1, 1, 0), (−1, 1, 0, 1)}. 5. (a) Since N (A) is orthogonal to col(AT ), it suffices to orthogonalize the basis of col(AT ) by (a single step of) the Gram-Schmidt process, yielding {(1, 4, −4), (−16/33, −97/33, −101/33)}. Then {(24, −5, 1), (1, 4, −4), (−16/33, −97/33, −101/33)} is an orthogonal basis for R3 . (b) A basis for N (AT ) is {(1, −1, 1, 0), (−2, 1, 0, 1)} and a basis for col(A) is {(1, 1, 0, 1), (4, 3, −1, 5)}. Applying the Gram-Schmidt process to each of the bases individually yields {(1, −1, 1, 0), (−1, 0, 1, 1)} and {(1, 1, 0, 1), (0, −1, −1, 1)}, respectively. The union of these two bases, {(1, 1, 0, 1), (−1, 0, 1, 1), (1, 1, 0, 1), (0, −1, −1, 1)}, is an orthogonal basis for R4 .

43

6.7. COMPLEX INNER PRODUCT SPACES 9. (a) Let A ∈ Rn×n be symmetric. Then N (A)⊥ = col(A) and col(A)⊥ = N (A).

(b) It follows that y ∈ col(A) if and only if y ∈ N (A)⊥ ; in other words, Ax = y has a solution if and only if Az = 0 ⇒ y · z = 0.

6.7

Complex inner product spaces

1. The projection of v onto S is w=



 2 4 4 2 1 4 − i, + i, + i . 3 9 9 3 3 9

5. The best approximation to f from P2 is     2i 24 60i(π 2 − 12) 1 1 2 + . x −x+ − 2 x− π π 2 π3 6 9. (a) Let u = (1, 1), v = (1, −1 + i) ∈ C2 . Then a direct calculation shows that ku + vk22 = kuk22 + kvk22 and hu, vi2 = −i 6= 0. (b) Suppose V is a complex inner product space. If u, v ∈ V and ku + vk2 = kuk2 + kvk2 , then hu, ui + hv, vi + hu, vi + hv, ui = hu, ui + hv, vi

⇔ hu, vi + hv, ui = 0

⇔ hu, vi + hu, vi = 0.

For any z = x + iy ∈ C, z + z = 0 is equivalent to 2x = 0 or, equivalently, x = 0. Thus ku + vk2 = kuk2 + kvk2 holds if and only if the real part of hu, vi is zero.

6.8

More on polynomial approximation

1. (a) The best quadratic approximation to f in the (unweighted) L2 (−1, 1) norm is   1 3 15e − 105e−1 e − e−1 x2 − . + x+ 2 e 4 3 (b) The best quadratic approximation to f in the weighted L2 (−1, 1) norm is (approximately) 1.2660659 + 1.1303182x + 0.27149534(2x2 − 1) (note that the integrals were computed numerically). The following graph shows the error in the L2 approximation (solid curve) and the error in the weighted L2 approximation (dashed) curve. 0.1 0.08 0.06 0.04 0.02 0 −1

−0.5

0

0.5

1

We see that the weighted L2 approximation has the smaller maximum error.

44

CHAPTER 6. ORTHOGONALITY AND BEST APPROXIMATION 3. (a) The orthogonal basis for P3 on [−1, 1] is   1 3 3 2 1, x, x − , x − x . 3 5 Transforming this to the interval [0, π], we obtain the following orthogonal basis for P3 as a subspace of L2 (0, π):   4 2 4 2 8 3 12 2 24 2 2 . t− 1, t − 1, 2 t − t + , 3 t − 2 t + π π π 3 π π 5π 5 The best cubic approximation to f (t) = sin (t) on [0, π] is 2 15(π 2 − 12) + π π3

p(t) =



4 2 4 2 t − t+ 2 π π 3



.

(b) The orthogonal basis for P3 on [−1, 1], in the weighted L2 inner product, is  1, x, 2x2 − 1, 4x3 − 3x .

Transforming this to the interval [0, π], we obtain the following orthogonal basis for P3 as a subspace of L2 (0, π):   8 48 18 8 32 2 1, t − 1, 2 t2 − t + 1, 3 t3 − 2 t2 + t − 1 . π π π π π π The best cubic approximation to f (t) = sin (t) on [0, π], in the weighted L2 norm, is (approximately)   8 2 8 t − t+1 q(t) = 0.47200122 − 0.49940326 π2 π

(the integrals were computed numerically). (c) The following graph shows the error in the ordinary L2 approximation (solid curve) and the error in the weighted L2 approximation (dashed curve): 0.06 0.05 0.04 0.03 0.02 0.01 0 0

1

2

3

4

The second approximation has a smaller maximum error.

6.9

The energy inner product and Galerkin’s method

5. The variational form of the BVP is to find u ∈ V such that  Z ℓ Z ℓ dv du f (x)v(x) dx for all v ∈ V. k(x) (x) (x) + p(x)u(x)v(x) dx = dx dx 0 0 If the basis for Vn is {φ1 , . . . , φn }, then Galerkin’s method results in the system (K + M )U = F , where K and F are the same stiffness matrix and load vector as before, and M is the mass matrix: Mij =

Z

0



p(x)φj (x)φi (x) dx, i, j = 1, . . . , n.

45

6.10. GAUSSIAN QUADRATURE

7. The projection of f onto V can be computed just as described by the projection theorem. The projection Pn−1 is i=1 Vi φi , where V ∈ Rn−1 satisfies the system M V = B. The Gram matrix is called the mass matrix in this context (see the solution of Exercise 5): Z ℓ Mij = φj (x)φi (x) dx, i, j = 1, . . . , n − 1. 0

The vector B is defined by Bi =

6.10

Rℓ 0

f (x)φi (x) dx, i = 1, . . . , n − 1.

Gaussian quadrature

1. We wish to find the Gaussian quadrature rule with n = 3 quadrature nodes (on the reference interval [−1, 1]). We know p that the nodes are theproots of the third orthogonal polynomial, p3 (x). Therefore, the nodes are x1 = − 3/5, x2 = 0, x3 = 3/5. To find the weights, we need the Lagrange polynomials defined by these nodes, which are p p 5 5 5 L1 (x) = x(x − 3/5), L2 (x) = − x2 + 1, L3 (x) = x(x + 3/5). 6 3 6 Then Z 1 5 w1 = L1 (x) dx = , 9 −1 Z 1 5 w2 = L2 (x) dx = , 9 −1 Z 1 8 L3 (x) dx = . w3 = 9 −1 √ 3. Let w(x) = 1/ 1 − x2 . We wish to find the weighted Gaussian quadrature rule Z 1 n . X wi f (xi ), w(x)f (x) dx = −1

i=1

where n = 3. The nodes are the roots of the third orthogonal polynomial under p p this weight function, which is T3 (x) = 4x3 − 3x. The nodes are thus x1 = − 3/4, x2 = 0, x3 = 3/4. The corresponding Lagrange polynomials are p p 4 2 2 L1 (x) = x(x − 3/4), L2 (x) = − x2 + 1, L3 (x) = x(x + 3/4). 3 3 3 The weights are then Then Z 1 π w1 = w(x)L1 (x) dx = , 3 −1 Z 1 π w2 = w(x)L2 (x) dx = , 3 −1 Z 1 π w3 = w(x)L3 (x) dx = . 3 −1

6.11

The Helmholtz decomposition

1. Let Ω be a domain in R3 , and let φ, u be a scalar field and a vector field, respectively, defined on Ω. We have  X 3 3  3 3 X X X ∂φ ∂ui ∂ ∂φ ∂ui = (φui ) = ui + φ ui + φ = ∇φ · u + φ∇ · u. ∇ · (φu) = ∂x ∂x ∂x ∂x ∂x i i i i i i=1 i=1 i=1 i=1

46

CHAPTER 6. ORTHOGONALITY AND BEST APPROXIMATION 3. Let φ : Ω → R be a smooth scalar field. Then ∇ · ∇φ =

  X 3 3 X ∂2φ ∂2φ ∂2φ ∂2φ ∂φ ∂ = = + + . 2 ∂xi ∂xi ∂xi ∂x21 ∂x22 ∂x23 i=1 i=1

Chapter 7

The spectral theory of symmetric matrices 7.1

The spectral theorem for symmetric matrices

1. Let A ∈ Rm×n . Then (AT A)T = AT (AT )T = AT A and, for every x ∈ Rn , x · AT Ax = (Ax) · (Ax) = kAxk22 ≥ 0. Therefore, AT A is symmetric and positive semidefinite. 5. Suppose A ∈ Rn×n satisfies

(Ax) · (Ay) = x · y for all x, y ∈ Rn .

Then x · (AT Ay) = x · y for all x, y ∈ Rn

⇒ AT Ay = y for all y ∈ Rn (by Corollary 275)

⇒ AT A = I.

The last step follows from the uniqueness of the matrix representing a linear operator on Rn . Since AT A = I, we see that A is orthogonal. 9. Let X be a finite-dimensional inner product space over R with basis X = {x1 , . . . , xn }, and assume that T : X → X is a self-adjoint linear operator (T ∗ = T ). Define A = [T ]X ,X . Let G be the Gram matrix for the basis X , and define B ∈ Rn×n by B = G1/2 AG−1/2 , where G1/2 is the square root of G (see Exercise 7) and G−1/2 is the inverse of G1/2 . (a) We wish to prove that B is symmetric. Let x, yP ∈ X be given, and define α = [x]X , β = [y]X . Then, n since [T (x)]X = A[x]X = Aα, we have T (x) = i=1 (Aα)i xi . Therefore, + * n n X n n X X X (Aα)i βj hxi , xj i βj xj = (Aα)i xi , hT (x), yi = i=1

i=1 j=1

j=1

=

n X i=1



(Aα)i 

n X j=1



Gij βj 

= (Aα) · Gβ = α · (AT G)β. 47

48

CHAPTER 7. THE SPECTRAL THEORY OF SYMMETRIC MATRICES Similarly, [T (y)]X = Aβ, and an analogous calculation shows that hx, T (y)i = α · GAβ. Since hT (x), yi = hx, T (y)i, it follows that α · (AT G)β = α · GAβ for all α, β ∈ Rn , and hence GA = AT G. This implies that (GA)T = AT GT = AT G = GA (using the fact that G is symmetric), and hence GA is symmetric. (Thus the fact that T is self-adjoint does not imply that A is symmetric, but rather than GA is symmetric.) Now, it is easy to see that G−1/2 is symmetric, and if C is symmetric, then so is XCX T for any square matrix X. Therefore, G−1/2 GA(G−1/2 )T = G−1/2 GAG−1/2 = G1/2 AG−1/2 is symmetric, which is what we wanted to prove. (b) If λ, u is an eigenvalue/eigenvector pair of B, then Bu = λu ⇒ G1/2 AG−1/2 u = λu ⇒ AG−1/2 u = λG−1/2 u. Since u 6= 0 and G−1/2 is nonsingular, G−1/2 u is also nonzero, and hence λ, G−1/2 u is an eigenpair of A. (c) Since B is symmetric, there exists an orthonormal basis {u1 , . . . , un } of Rn consisting of eigenvectors of B. Let λ1 , . . . , λn be the corresponding eigenvalues. From above, we know that {G−1/2 u1 , . . . , G−1/2 un } is a basis of Rn consisting of eigenvectors of A. Define {y1 , . . . , yn } ⊂ X by [yi ]X = G−1/2 ui , that is, n   X yi = G−1/2 ui xk , i = 1, . . . , n. k

k=1

Notice that

hyi , yj i = =

*

n  X

−1/2

G

k=1 n X n  X k=1 ℓ=1

ui



k

G−1/2 ui

n   X xk , G−1/2 uj xℓ ℓ

ℓ=1

+

   G−1/2 uj hxk , xℓ i k



    = G−1/2 ui · G G−1/2 uj   = ui · G−1/2 GG−1/2 uj  1, i = j, = ui · uj = 0, i 6= j.

This shows that {y1 , . . . , yn } is an orthonormal basis for X. Also, [T (yi )]X = A[yi ]X = AG−1/2 ui = λi G−1/2 ui = λi [yi ]X = [λi yi ]X , which proves that T (yi ) = λi yi . Thus each yi is an eigenvector of T , and we have proved that there exists an orthonormal basis of X consisting of eigenvectors of T .

7.2

The spectral theorem for normal matrices

1. The matrix A is normal since AT A = AAT = 2I. We have A = U DU ∗ , where # "   √i √i − 1−i 0 2 , D = . U = √12 √1 0 1+i 2 2

49

7.3. OPTIMIZATION AND THE HESSIAN MATRIX 7. Let A ∈ Rn×n be skew-symmetric.

(a) Since AT A = AAT = −A2 , it follows that A is normal (b) Hence there exist a unitary matrix X ∈ Cn×n and a diagonal matrix D ∈ Cn×n such that A = XDX ∗ , or, equivalently, D = X ∗ AX. We then have D∗ = X ∗ A∗ X = X ∗ (−A)X = −X ∗ AX = −D. Therefore, each diagonal entry λ of D, that is, each eigenvalue of A, satisfies λ = −λ. This implies that λ is of the form iθ, θ ∈ R. Thus a skew-symmetric matrix has only purely imaginary eigenvalues. 11. Suppose A, B ∈ Cn×n are normal and commute (AB = BA). Then, by Exercise 5.4.18, A and B are simultaneously diagonalizable; that is, there exists a unitary matrix X ∈ Cn×n such that X ∗ AX = D and X ∗ BX = C are both diagonal matrices in Cn×n . It follows that A + B = XDX ∗ + XCX ∗ = X(D + C)X ∗ , (A + B)∗ = X(D + C)∗ X ∗ . Since A + B and (A + B)∗ are simultaneously diagonalizable, it follows that they commute and hence A + B is normal. 15. Let A ∈ F m×n and B ∈ F n×p , where F represents R or C. We wish to find a formula for the product AB in terms of outer products of the columns of A and the rows of B. Let A = [c1 | · · · |cn ],   r1   B =  ...  . rn Then

AB =

n X

ck rkT .

k=1

To verify this, notice that (ck rkT )ij = Aik Bkj , and hence ! n n n X X X  T ck rk Aik Bkj = (AB)ij . = ck rkT ij = k=1

ij

k=1

k=1

This holds for all i, j, and thus verifies the formula above. It follows that the linear operator defined by AB can be written as n X ck ⊗ rk . k=1

7.3

Optimization and the Hessian matrix

 1. Suppose A ∈ Rn×n and define Asym = (1/2) A + AT . We have

T  1  1 1 T A + AT = A +A = A + AT = Asym , 2 2 2 is symmetric. Also, for any x ∈ Rn ,     1 1 x · Asym x = x · x = x · Ax + AT x A + AT 2 2 1 1 = x · Ax + x · AT x 2 2 1 1 = x · Ax + Ax · x 2 2 1 1 = x · Ax + x · Ax = x · Ax. 2 2 (Asym )T =

and hence Asym

50

CHAPTER 7. THE SPECTRAL THEORY OF SYMMETRIC MATRICES 5. The eigenvalues of A are λ = −1, λ = 3. Since A is indefinite, q has no global minimizer (or global maximizer). 7. The eigenvalues of A are λ = 0, λ = 5, and therefore A is positive semidefinite and singular. The vector b belongs to col(A), and therefore, by Exercise 2, every vector in x∗ + N (A), where x∗ is any solution of Ax = −b, is a global minimizer of q. (In other words, every solution of Ax = −b is a global minimizer.) We can take x∗ = (−1, 0), and N (A) = sp{(2, −1)}.

7.4

Lagrange multipliers

. . 3. The maximizer is x = (−0.058183, 0.73440, −0.67622), with Lagrange multiplier λ = (3.1883, −5.7454) . . and f (x) = 17.923, while the minimizer is x = (0.67622, −0.73440, 0.058183), with Lagrange multiplier . . λ = (0.81171, −5.7454) and f (x) = 14.077. 7. The minimizer, and associated Lagrange multiplier, of f (x) subject to g(x) = u is x(u) =

√ √ √   √ 1+u 1+u 1+u 3 , λ(u) = − √ ,− √ ,− √ . − √ 2 1+u 3 3 3

√ √ We have p(u) = f (x(u)) = − 3 1 + u. Therefore, by direct calculation, √ √ 3 3 . ∇p(u) = − √ ⇒ ∇p(0) = − 2 2 1+u

√ On the other hand, the Lagrange multiplier associated with x∗ = x(0) is λ∗ = λ(0) = − 3/2. Thus ∇p(0) = λ∗ , as implied by the previous exercise.

7.5

Spectral methods for differential equations

1. The exact solution is u(x) = (x − x2 )/2, and the solution obtained by the method of Fourier series is ∞ X 2(1 − (−1)n ) u(x) = sin (nπx). n3 π 3 n=1

The following graph shows the exact solution u (the solid curve), together with the partial Fourier series with 1 (the dashed curve) and 4 terms (the dotted curve). 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 0

0.2

0.4

0.6

0.8

1

Notice that the partial Fourier series with 4 terms is already indistinguishable from the exact solution on this scale. 2 3. Consider the operator M : CD [0, ℓ] → C[0, ℓ] defined by M (u) = −u′′ + u.

51

7.5. SPECTRAL METHODS FOR DIFFERENTIAL EQUATIONS 2 (a) For any u, v ∈ CD [0, ℓ], we have

hM (u), vi2 =

Z

1

(−u′′ (x) + u(x))v(x) dx

0

=−

1

Z

u′′ (x)v(x) dx +

0

1

1

Z

1

Z

0 1

= u(x)v ′ (x)|0 − =− =

Z

1

u′ (x)v ′ (x) dx +

0

u′ (x)v ′ (x) dx +

Z

1

Z

Z

1

u(x)v(x) dx

0

Z

1

u(x)v(x) dx (since v(0) = v(1) = 0) 0

u(x)v ′′ (x) dx +

0

Z

1

u(x)v(x) dx

0

′′

u(x)v (x) dx +

0

1

u(x)v(x) dx

0

= − u′ (x)v(x)| 0 + =

1

Z

Z

1

u(x)v(x) dx (since u(0) = u(1) = 0)

0

u(x)(−v ′′ (x) + v(x)) dx

0

= hu, M (v)i2 . This shows that M is a symmetric operator. Also, notice from the above calculation that hM (u), ui2 =

Z

1



2

(u (x)) dx +

0

Z

1

(u(x))2 dx,

0

2 which shows that hM (u), ui2 > 0 for all u ∈ CD [0, ℓ], u 6= 0. Then, if λ is an eigenvalue of M and u is a corresponding eigenfunction with hu, ui2 = 1, then

λ = λ hu, ui2 = hλu, ui2 = hM (u), ui2 > 0, which shows that all the eigenvalues of M are positive. (b) It is easy to show that the eigenvalues of M are n2 π 2 + 1, n = 1, 2, . . ., with corresponding eigenfunctions sin (nπx) (the calculation is essentially the same as in Section 7.5.1).

Chapter 8

The singular value decomposition 8.1

Introduction to the SVD

3. The SVD of A is U ΣV T , where      √ √1 √1 0 4 3 √ 0 0 2 2    U =  √12 0 − √12  , Σ =  0 11 0  , V =  0 0 0 0 −1 0 The outer product form simplifies to   1  A = 2 1  1 0

2 1



√1 6 √2 6 √1 6



 0  +  0  −1 −1 −1

− √111 − √111 √3 11

3



√7 66 − √466 √1 66



 .

.

  7. If the columns of A are A1 , . . . , An , U = kA1 k−1 A1 | · · · |kAn k−1 An , V = I, and Σ is the diagonal matrix with diagonal entries kA1 k, . . . , kAn k, then A = U ΣV T is the SVD of A. 11. Suppose A ∈ Cn×n is invertible and A = U ΣV ∗ is the SVD of A. (a) We have (U ΣV ∗ )∗ = V Σ∗ U ∗ = V ΣU ∗ (since Σ is a real, diagonal matrix), and hence A∗ = V ΣU ∗ is the SVD of A. (b) Since A is invertible, all the diagonal entries of Σ are positive, and hence Σ−1 exists. We have (U ΣV ∗ )−1 = V Σ−1 U ∗ ; however, the diagonal entries of Σ−1 , σ1−1 , . . . , σn−1 , are ordered from smallest to largest. We obtain the SVD of A−1 by re-ordering. Define W = [vn | · · · |v1 ], Z = [un | · · · |u1 ], and let T be the diagonal matrix with diagonal entries σn−1 , . . . , σ1−1 . Then A−1 = W T Z ∗ is the SVD of A−1 . (c) The SVD of A−∗ is ZT W ∗ , where W , T , and Z are defined above. In outer product form, A∗ =

n X i=1

8.2

σi (vi ⊗ ui ), A−1 =

n X i=1

σi−1 (vi ⊗ ui ), A−∗ =

n X i=1

σi−1 (ui ⊗ vi ).

The SVD for general matrices

3. We have projcol(A) b = (1, 5/2, 5/2, 4) , projN (AT ) b = (0, −1/2, 1/2, 0) . 53

54

CHAPTER 8. THE SINGULAR VALUE DECOMPOSITION 5. Referring to the solution of Exercise 8.1.4, if U = [u1 |u2 |u3 |u4 ], V = [v1 |v2 |v3 |v4 ], then {u1 , u2 , u3 } is a basis for col(A), {u4 } is a basis for N (AT ), {v1 , v2 , v3 } is a basis for col(AT ), and {v4 } is a basis for N (A). 9. Let A ∈ √ Rm×n be nonsingular. We wish to compute min{kAxk2 : x ∈ Rn , kxk2 = 1}. Note that kAxk2 = x · AT Ax. By Exercise 7.4.5, the minimum value of x · AT Ax, where kxk2 = 1, is the smallest eigenvalue of AT A, which is σn2 . Therefore, min{kAxk2 : x ∈ Rn , kxk2 = 1} = σn , and the value of x yielding the minimum is vn , the right singular vector corresponding to σn , the smallest singular value.

15. (a) Let U ∈ Cm×m , V ∈ Cn×n be unitary. We wish to prove that kU AkF = kAkF and kAV kF = kAkF for all A ∈ Cm×n . We begin with two preliminary observations. By definition of the Frobenius norm, for any A ∈ Cm×n , ! n m n X X X 2 2 kAj k22 , = |Aij | kAkF = j=1

i=1

j=1

where Aj is the jth column of A. Also, it is obvious that kAT kF = kAkF for all A ∈ Cm×n . We thus have n n n X X X 2 2 2 kAj k2F = kAk2F kU Aj kF = k(U A)j kF = kU AkF = j=1

j=1

j=1

and kAV kF = kV T AT kF = kAT kF = kAkF , as desired. (b) Let A ∈ Cm×n be given, and let r be a positive integer with r < rank(A). We wish to find the matrix B ∈ Cm×n of rank r such that kA − BkF is as small as possible. If we define B = U Σr V T , where Σr ∈ Rm×n is the diagonal matrix with diagonal entries σ1 , . . . , σr , 0, . . . , 0, then kA − BkF = kU ΣV ∗ − U Σr V ∗ kF = kU (Σ − Σr )V ∗ kF = k(Σ − Σr )V ∗ kF = kΣ − Σr kF q 2 = σr+1 + · · · + σtw ,

where t ≤ min{m, n} is the rank of A. Notice that the rank of a matrix is the number q of positive singular values, so rank(B) = r, as desired. Thus we can make kA−BkF as small as

Moreover, for any B ∈ C

m×n

, we have

kA − Bk2F = kΣ − U ∗ BV k2F n  m X 2 X Σij − (U ∗ BV )ij =

2 σr+1 + · · · + σtw .

i=1 j=1

=

t X i=1

2

(σi − (U ∗ BV )ii ) +

m X i=1

t X

j=1 j 6= i

(U ∗ BV )2ij +

m X n X

(U ∗ BV )2ij .

i=1 j=t+1

Now, we are free to choose all the entries of U ∗ BV (since U ∗ and V are invertible, given any C ∈ Cm×n , there exists a unique B ∈ Cm×n with U ∗ BV = C) to make the above sum as small as possible. Since all three summations are nonnegative, we should choose (U ∗ BV )ij = 0 for i = 1, . . . , m, j = 1, . . . , n, j 6= i for i = 1, . . . , t. This causes the second two summations to vanish, and yields t X 2 2 (σi − (U ∗ BV )ii ) . kA − BkF = i=1

55

8.3. SOLVING LEAST-SQUARES PROBLEMS USING THE SVD The rank of U ∗ BV (and hence the rank of B) is the number of nonzero diagonal entries (U ∗ BV )11 , . . . , (U ∗ BV )tt .

Since the rank of B must be r, it is clear that U ∗ BV = Σr , where Σr is defined above, will make kA − BkF as small as possible. This shows that B = U Σr V ∗ is the desired matrix.

8.3

Solving least-squares problems using the SVD

3. The matrix A has two positive singular values, σ1 = 6 and σ2 vectors are    2 − √12 3    v1 =  23  , v2 =  √12 1 0 3 and the left singular vectors are



  u1 =  

1 2 1 2 1 2 1 2

= 2. The corresponding right singular   

 − √12   0      , u2 =  1  .   √2  0 



The minimum-norm least-squares solution to Ax = b is then given by x=

u2 · b u1 · b v1 + v2 . σ1 σ2

(a) x = (2/3, −1/3, 1/12);

(b) x = (13/18, −5/18, 1/9);

(c) x = (0, 0, 0) (b is orthogonal to col(A)).

7. Let A ∈ Rm×n have rank r, and let σ1 , . . . , σr be the positive singular values of A, with corresponding right singular vectors v1 , . . . , vr ∈ Rn and left singular vectors u1 , . . . , ur ∈ Rm . Then, for all x ∈ Rn , Ax =

r X i=1

and, for all b ∈ Rm , A† b =

σi (vi · x)ui

r X ui · b i=1

8.4

σi

vi .

The SVD and linear inverse problems

1. (a) Let A ∈ Rm×n be given, let I ∈ Rn×n be the identity matrix, and let ǫ be a positive number. For any ˆb ∈ Rm , we can solve the equation     ˆb A x= (8.1) ǫI 0 in the least-square sense, which is equivalent to minimizing



   

A ˆb 2 Ax − ˆb



ǫI x − 0 = ǫx 2

 2

.

2

56

CHAPTER 8. THE SINGULAR VALUE DECOMPOSITION Now, for any Euclidean vector w ∈ Rk , partitioned as w = (u, v), u ∈ Rp , v ∈ Rq , p + q = k, we have kwk22 = kuk22 + kvk22 , and hence



Ax − ˆb

ǫx

 2

= kAx − ˆbk22 + kǫxk22 = kAx − ˆbk22 + ǫ2 kxk22 .

2

Thus solving (8.1) in the least-squares sense is equivalent to choosing x ∈ Rn to minimize kAx − ˆbk2 + ǫ2 kxk2 . 2 2

(b) We have



A ǫI

T

=

which implies that 

A ǫI

T 

A ǫI





AT

= AT A + ǫ2 I,



ǫI



A ǫI

T 

,

ˆb 0



= AT ˆb.

Hence the normal equations for (8.1) take the form (AT A + ǫ2 I)x = AT ˆb. (c) We have 

A ǫI



x=0 ⇒



Ax ǫx



=



0 0



,

which yields ǫx = 0, that is, x = 0. Thus the matrix is nonsingular and hence, by the fundamental theorem, it has full rank. It follows from Exercise 6.4.1 that AT A + ǫ2 I is invertible, and hence there is a unique solution xǫ to (AT A + ǫ2 I)x = AT ˆb. 3. With the errors drawn from a normal distribution with mean zero and standard deviation 10−4 : • truncated SVD works best with k = 3 singular values/vectors; • Tikhonov regularization works best with ǫ around 10−3 .

8.5

The Smith normal form of a matrix

1. We have A = U SV , where 

4 2 U = 6 1 3 2

3. We have A = U SV , where 

  −1 1 0 , S =  0 −1 0

  4 0 1 1 0 1 , S =  0 U = 5 7 −1 0 0

  0 0 1 3 0 , V =  0 0 15 0 0 3 0

 0 −1 1 4 . 0 1

  0 2 1 0 , V =  1 0 0 0 0

 4 7 . 1

Chapter 9

Matrix factorizations and numerical linear algebra 9.1

The LU factorization

1. 

1 0 0  3 1 0 L=  −1 −4 1 −2 0 5

  0 1  0 0  , U =   0 0  1 0

3 2 1 −1 0 1 0 0

 4 3  . −1  1

7. Let L=



1 ℓ

0 1



LU =



u v ℓu ℓv + w

, U=



u v 0 w



,

and notice that 

.

(a) We wish to show that there do not exist matrices L, U of the above forms such that LU = A, where A=



0 1

1 1



.

This is straightforward, since LU = A implies u = 0 (comparing the 1, 1 entries) and also ℓu = 1 (comparing the 2, 1 entries). No choice of ℓ, u, v, w can make both of these true, and hence there do not exist such L and U . (b) If A=



0 0

1 1



,

then LU = A is equivalent to u = 0, v = 1, and ℓ + w = 1. There are infinitely many choices of ℓ and w that will work, and hence there exist infinitely many L, U satisfying LU = A. 11. Computing A−1 is equivalent to solving Ax = ej for j = 1, 2, . . . , n. If we compute the LU factorization and then solve the n systems LU = ej , j = 1, . . . , n, the operation count is  8 1 7 2 3 1 2 1 n − n − n + n 2n2 − n = n3 + n2 − n. 3 2 6 3 2 6 57

58

CHAPTER 9. MATRIX FACTORIZATIONS AND NUMERICAL LINEAR ALGEBRA We can reduce the above operation count by taking advantage of the zeros in the vectors ej . Solving Lc = ej takes the following form: ci = 0, i = 1, . . . , j − 1, cj = 1, ci = −

i−1 X

Lik ck , i = j + 1, . . . , n.

k=j

Pn Thus solving Lc − ej requires i=j+1 2(i − j) = (n − j)2 + (n − j) operations. The total for solving all n of the lower triangular systems Lc = ej , j = 1, . . . , n, is n X  1 1 (n − j)2 + (n − j) = n3 − n 3 3 j=1

(instead of n(n2 − n) = n3 − n if we ignore the structure of the right-hand side). We still need n3 operations to solve the n upper triangular systems U Aj = L−1 ej (since we perform back substition, there is no simplification from the fact that the first j − 1 entries in L−1 ej are zero). Hence the total is 1 1 3 3 2 3 1 2 1 n − n − n + n3 − n + n3 = 2n3 + n2 − n. 3 2 6 3 3 2 2 Notice the reduction in the leading term from (8/3)n3 to 2n3 .

9.2

Partial pivoting

1. The solution is x = (2, 1, −1); partial pivoting requires interchanging rows 1 and 2 on the first step, and interchanging rows 2 and 3 on the second step. 5. Suppose A ∈ Rn×n has an LU decomposition, A = LU . We know that the determinant of a square matrix is the product of the eigenvalues, and also that the determinant of a triangular matrix is the product of the diagonal entries. We therefore have det(A) = λ1 λ2 · · · λn , where λ1 , λ2 , . . . , λn are the eigenvalues of A (listed according to multiplicity), and also det(A) = det(LU ) = det(L)det(U ) = (1 1 · · · 1)(U11 U22 · · · Unn ) = U11 U22 · · · Unn . This shows that λ1 λ2 · · · λn = U11 U22 · · · Unn . 9. On the first step, partial pivoting requires that rows 1 and 3 be interchanged. No interchange is necessary on step 2, and rows 3 and 4 must be interchanged on step 3. Thus   0 0 1 0  0 1 0 0   P =  0 0 0 1 . 1 0 0 0 The LU factorization of P A is P A = LU , where  1 0 0  0.5 1 0 L=  0.25 −0.5 1 −0.25 −0.1 0.25

  0 6  0 0  , U =   0 0  1 0

 2 −1 4 3 1 −2  . 0 2 1  0 0 1

59

9.3. THE CHOLESKY FACTORIZATION

9.3

The Cholesky factorization

1. (a) The Cholesky factorization is A = RT R, where   1 −3 2 2 −2  . R= 0 0 0 2 (b) We also have A = LDLT , where L entries of R:  1 D= 0 0

= (D−1/2 R)T and the diagonal entries of D are the diagonal    1 0 0 0 √0  2 0  , L =  −3 √2 √0 . 0 2 2 − 2 2

Alternatively, we can write A = LU , where U  1 0 1 L =  −3 2 −1

= DR and L = (D−1 R)T :    0 1 −3 2 0 , U =  0 4 −4  . 1 0 0 4

5. Let A ∈ Rn×n be SPD. The algorithm described on pages 527–528 of the text (that is, the equations derived on those pages) shows that there is a unique upper triangular matrix R with positive diagonal entries such that RT R = A. (The only freedom in solving those equations for the entries of R lies in choosing the positive or negative square root when computing Rii . If Rii is constrained to be positive, then the entries of R are uniquely determined.)

9.4

Matrix norms

1. Let k · k be any induced matrix norm on Rn×n . If λ ∈ Cn×n , x ∈ Cn , x 6= 0, is an eigenvalue/eigenvector pair of A, then kAxk ≤ kAkkxk ⇒ kλxk ≤ kAkkxk ⇒ |λ|kxk ≤ kAkkxk ⇒ |λ| ≤ kAk (the last step follows from the fact that x 6= 0). Since this holds for every eigenvalue of A, it follows that ρ(A) = max{|λ| : λ is an eigenvalue of A} ≤ kAk. 7. Let A ∈ Rm×n . We wish to prove that kAT k2 = kAk p from Theorem 403 and p2 . This follows immediately Exercise 4.5.14. Theorem 403 implies that kAk2 = λmax (AT A), kAT k2 = λmax (AAT ). By Exercise 4.5.14, AT A and AAT have the same nonzero eigenvalues, and hence λmax (AT A) = λmax (AAT ), from which it follows that kAk2 = kAT k2 .

9.5

The sensitivity of linear systems to errors

1. Let A ∈ Rn×n be nonsingular. (a) Suppose b ∈ Rn is given. Choose c ∈ Rn such that  −1  kA xk kA−1 ck = sup : x ∈ Rn , x 6= 0 = kA−1 k, kck kxk and notice that kA−1 ck = kA−1 kkck. (Such a c exists. For the common norms—the ℓ1 , ℓ∞ , and Euclidean norms—we have seen in the text how to compute such a c; for an arbitrary norm, it

60

CHAPTER 9. MATRIX FACTORIZATIONS AND NUMERICAL LINEAR ALGEBRA can be shown that such a c exists, although some results from analysis concerning continuity and compactness are needed.) Define ˆb = b + c and x = A−1 b, x ˆ = A−1 (b + c). Then kˆ x − xk = kA−1 (b + c) − A−1 bk = kA−1 ck = kA−1 kkck = kA−1 kkˆb − bk. Notice that, for the Euclidean norm k · k2 and induced matrix norm, c should be chosen to be a right singular vector corresponding to the smallest singular value of A. (b) Let A ∈ Rn×n be given, and let x, c ∈ Rn be chosen so that   kAyk kAxk = sup : y ∈ Rn , y 6= 0 = kAk, kxk kyk  −1  kA yk kA−1 ck = sup : y ∈ Rn , y 6= 0 = kA−1 k. kck kyk Define b = Ax, ˆb = b + c, and x ˆ = A−1 (b + c) = x + A−1 c. We then have kbk = kAxk = kAkkxk ⇒ kxk = and It follows that

5. Let A ∈ R norm.

n×n

kbk kAk

kˆ x − xk = kA−1 ck = kA−1 kkck = kA−1 kkˆb − bk. kˆ x − xk kA−1 kkˆb − bk kˆb − bk = . = kAkkA−1 k kbk kxk kbk kAk

be invertible, and let k · k denote any norm on Rn and the corresponding induced matrix

(a) Let B ∈ Rn×n be any singular matrix. We have seen (Exercise 9.4.9) that kAxk ≥ kxk/kA−1 k for all x ∈ Rn . Let x ∈ N (B) with kxk = 1. Then kA − Bk ≥ k(A − B)xk = kAxk ≥

kxk 1 = . kA−1 k kA−1 k

(b) It follows that  kA − Bk : B ∈ Rn×n , det(B) = 0 kAk   1/kA−1 k ≥ inf : B ∈ Rn×n , det(B) = 0 kAk 1 = . cond(A) inf



(c) Consider the special case of the Euclidean norm on Rn and induced norm k · k2 on Rn×n . Let A = U ΣV T be the SVD of A, and define A′ = U Σ′ V T , where Σ′ is the diagonal matrix with diagonal entries σ1 , . . . , σn−1 , 0 (σ1 ≥ · · · σn−1 ≥ σn > 0 are the singular values of A). Then kA − A′ k2 = kU ΣV T − U Σ′ V T k2 = kU (Σ − Σ′ )V T k2 = kΣ − Σ′ k2 1 = σn = kA−1 k

(kΣ − Σ′ k2 is the largest singular value of Σ − Σ′ , which is a diagonal matrix with a single nonzero entry, σn ). It follows that kA − A′ k2 1 1 = = . −1 kAk2 kAk2 kA k2 cond2 (A)

Hence the inequality derived in part (b) is an equality in this case.

61

9.6. NUMERICAL STABILITY

9.6

Numerical stability

1. (a) Suppose x and y are two real numbers and xˆ and yˆ are perturbations of x and y, respectively. We have xˆyˆ − xy = x ˆyˆ − xˆy + x ˆy − xy = xˆ(ˆ y − y) + (ˆ x − x)y, which implies that

|ˆ xyˆ − xy| ≤ |ˆ x||ˆ y − y| + |ˆ x − x||y|.

This gives a bound on the absolute error in approximating xy by xˆyˆ. Dividing by xy yields y − y| |ˆ |ˆ x| |ˆ x − x| |ˆ xyˆ − xy| ≤ + . |xy| |x| |y| |x| This yields a bound on the relative error in xˆyˆ in terms of the relative errors in x ˆ and yˆ, although it would be preferable if the bound did not contain x ˆ (except in the expression for the relative error in x ˆ). We can manipulate the bound as follows:   y − y| |ˆ |ˆ x| |ˆ x − x| |ˆ y − y| |ˆ x − x| |ˆ xyˆ − xy| |ˆ x| |ˆ y − y| ≤ + = + + −1 |xy| |x| |y| |x| |y| |x| |x| |y| |ˆ |ˆ y − y| |ˆ x − x| |ˆ x| y − y| ≤ + + − 1 |y| |x| |x| |y| |ˆ y − y| |ˆ y − y| x − x| ||ˆ x| − |x|| |ˆ = + + |y| |x| |x| |y| y − y| x − x| |ˆ x − x| |ˆ |ˆ y − y| |ˆ + + . ≤ |y| |x| |x| |y| When the relative errors in xˆ and yˆ are small (that is, much less than 1), then their product is much smaller, and we see that the relative error in x ˆyˆ as an approximation to xy is approximately bounded by the sum of the errors in x ˆ and yˆ. (b) If x and y are floating point numbers, then fl(xy) = xy(1 + ǫ), where |ǫ| ≤ u. Therefore, fl(xy) is the exact product of x ˜ and y˜, where x ˜ = x and y˜ = y(1 + ǫ). This shows that the computed product is the exact product of nearby numbers, and therefore that floating point multiplication is backward stable.

9.7

The sensitivity of the least-squares problem

3. Let A ∈ Rm×n , b ∈ Rm be given, and let x be a least-squares solution of Ax = b. We have b = Ax + (b − Ax) ⇒ b · Ax = (Ax) · (Ax) + (b − Ax) · Ax = kAxk22 , since (b − Ax) · Ax = 0 (b − Ax is orthogonal to col(A)). It follows that kbk2 kAxk2 cos (θ) = kAxk22 ⇒ kAxk2 = kbk2 cos (θ), where θ is the angle between Ax and b. Also, since Ax, b − Ax are orthogonal, the Pythagorean theorem implies kbk22 = kAxk22 + kb − Axk22 .

Dividing both sides by kbk22 yields 1= Therefore,

kb − Axk22 kb − Axk22 kAxk22 + ⇒ 1 = cos2 (θ) + . 2 2 kbk2 kbk2 kbk22

kb − Axk22 = sin2 (θ) ⇒ kb − Axk2 = kbk2 sin (θ). kbk22

62

CHAPTER 9. MATRIX FACTORIZATIONS AND NUMERICAL LINEAR ALGEBRA

9.8

The QR factorization

1. Let x, y ∈ R3 be defined by x = (1, 2, 1) and y = (2, 1, 1). We define x−y = u= kx − yk2 Then Qx = y.



  0 1 1 − √ , √ , 0 , Q = I − 2uuT =  1 2 2 0

1 0 0

 0 0 . 1

3. We have A = QR, where 

 −0.42640 0.65970 0.61885 Q =  −0.63960 −0.70368 0.30943  , 0.63960 −0.26388 0.72199   −4.6904 −3.8376 2.7716 0 2.0671 −2.1110  R= 0 0 9.2828

(correct to the digits shown). The Householder vectors are

u1 = (0.84451, 0.37868, −0.37868), u2 = (−0.99987, 0.015968).

9.9

Eigenvalues and simultaneous iteration

1. We apply n iterations of the power method, normalizing the approximate eigenvector in the Euclidean norm at each iteration and estimating the eigenvalue by λ = (x · Ax)/(x · x) = x · Ax. We use a random starting vector. With n = 10, we have

x0 = (0.37138, −0.22558, 1.1174),
x10 = (0.40779, −0.8165, 0.40871),
λ ≈ 3.9999991538580488.

With n = 20, we obtain

x0 = (−1.0891, 0.032557, 0.55253),
x20 = (−0.40825, 0.8165, −0.40825),
λ ≈ 3.9999999999593752.

It seems clear that the dominant eigenvalue is λ = 4.
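A sketch of this computation follows. The matrix A below is a hypothetical stand-in chosen to be consistent with the reported results (dominant eigenvalue 4, eigenvector proportional to (1, −2, 1)); the exercise's actual matrix is given in the text and is not reproduced here.

```python
import numpy as np

# Hypothetical matrix consistent with the reported eigenpair; not claimed to be the text's A.
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  3.0, -1.0],
              [ 0.0, -1.0,  2.0]])

rng = np.random.default_rng(1)
x = rng.standard_normal(3)
x /= np.linalg.norm(x)                 # random unit starting vector

for _ in range(20):                    # 20 power-method steps
    x = A @ x
    x /= np.linalg.norm(x)             # normalize in the Euclidean norm

lam = x @ (A @ x)                      # Rayleigh-quotient estimate (x is a unit vector)
print(x)                               # approximately +/- (1, -2, 1)/sqrt(6)
print(lam)                             # approximately 4, as in the solution above
```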

5. Let A ∈ Rn×n. We wish to prove that there exists an orthogonal matrix Q such that QᵀAQ is block upper triangular, with each diagonal block of size 1 × 1 or 2 × 2. If all the eigenvalues of A are real, then the proof of Theorem 413 can be given using only real numbers and vectors, and the result is immediate. Therefore, let λ, λ̄ be a complex conjugate pair of eigenvalues of A. By an induction argument similar to that in the proof of Theorem 413, it suffices to prove that there exists an orthogonal matrix Q̂ ∈ Rn×n such that

Q̂ᵀAQ̂ =
    T  B
    0  C

where T ∈ R2×2 has eigenvalues λ, λ̄. Suppose z = x + iy, z ≠ 0, x, y ∈ Rn, satisfies Az = λz. In the solution of the previous exercise, we saw that {x, y} is linearly independent, so define S = sp{x, y} ⊂ Rn and Ŝ = sp{x, y} ⊂ Cn (Ŝ = sp{z, z̄}). Let {q1, q2} be an orthonormal basis for S, and extend it to an orthonormal basis {q1, q2, …, qn} for Rn. Define Q̂1 = [q1 | q2], Q̂2 = [q3 | ⋯ | qn], and Q̂ = [Q̂1 | Q̂2]. We then have

Q̂ᵀAQ̂ = [ Q̂1ᵀ ; Q̂2ᵀ ][ AQ̂1 | AQ̂2 ] =
    Q̂1ᵀAQ̂1   Q̂1ᵀAQ̂2
    Q̂2ᵀAQ̂1   Q̂2ᵀAQ̂2

Since both columns of AQ̂1 belong to S and each column of Q̂2 belongs to S⊥, it follows that Q̂2ᵀAQ̂1 = 0. Thus it remains only to prove that the eigenvalues of T = Q̂1ᵀAQ̂1 are λ, λ̄. We see that η ∈ C, u ∈ C2 form an eigenpair of T if and only if

Tu = ηu  ⇔  Q̂1ᵀAQ̂1u = ηu  ⇔  A(Q̂1u) = η(Q̂1u).

Since both z and z̄ can be written as Q̂1u for some u ∈ C2, it follows that both λ and λ̄ are eigenvalues of T; moreover, since T is 2 × 2, these are the only eigenvalues of T. This completes the proof.

9.10 The QR algorithm

1. Two steps are required to reduce A to upper Hessenberg form. The result is

H =
    −2.0000      4.8507 · 10⁻¹   −1.1021 · 10⁻¹    8.6750 · 10⁻¹
    −4.1231      3.1765           1.3851           −7.6145 · 10⁻¹
     0          −1.4240           1.9481            1.6597 · 10⁻¹
     0           0                8.9357 · 10⁻¹    −3.1246

and the vectors defining the two Householder transformations are

u1 = (8.6171 · 10⁻¹, −2.8146 · 10⁻¹, 4.2219 · 10⁻¹),   u2 = (9.3689 · 10⁻¹, 3.4964 · 10⁻¹).
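Such a reduction can be reproduced with a few lines of code (a sketch using an arbitrary 4 × 4 matrix, since the exercise's A is given in the text and not reproduced here; SciPy's sign conventions may differ from those used above):

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))          # arbitrary 4x4 example matrix

H, Q = hessenberg(A, calc_q=True)        # H = Q^T A Q with H upper Hessenberg, Q orthogonal

print(np.allclose(Q.T @ A @ Q, H))       # True
print(np.allclose(np.tril(H, -2), 0.0))  # True: zeros below the first subdiagonal
```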

5. The inequality

|λk+1 − µ|/|λk+1| < |λk − µ|/|λk|

is equivalent to

|λk+1 − µ|/|λk − µ| < |λk+1|/|λk|,     (9.1)

and hence (9.1) holds if and only if the relative error in µ as an estimate of λk+1 is less than the relative error in µ as an estimate of λk.

Chapter 10

Analysis in vector spaces

10.1 Analysis in Rn

3. Let ‖·‖ and ‖·‖∗ be two norms on Rn. Since ‖·‖ and ‖·‖∗ are equivalent, there exist positive constants c1, c2 such that c1‖x‖ ≤ ‖x‖∗ ≤ c2‖x‖ for all x ∈ Rn. Suppose that xk → x under ‖·‖, and let ε > 0. Then there exists a positive integer N such that ‖xk − x‖ < ε/c2 for all k ≥ N. It follows that

‖xk − x‖∗ ≤ c2‖xk − x‖ < c2(ε/c2) = ε

for all k ≥ N. Therefore, xk → x under ‖·‖∗.

Conversely, if xk → x under ‖·‖∗ and ε > 0 is given, there exists a positive integer N such that ‖xk − x‖∗ < c1ε for all k ≥ N. It follows that

‖xk − x‖ ≤ c1⁻¹‖xk − x‖∗ < c1⁻¹(c1ε) = ε

for all k ≥ N. Therefore, xk → x under ‖·‖.

7. Let ‖·‖ and ‖·‖∗ be two norms on Rn, let S be a nonempty subset of Rn, let f : S → Rn be a function, and let y be an accumulation point of S. Since ‖·‖ and ‖·‖∗ are equivalent, there exist positive constants c1, c2 such that c1‖x‖ ≤ ‖x‖∗ ≤ c2‖x‖ for all x ∈ Rn. Suppose first that limx→y f(x) = L under ‖·‖, and let ε > 0 be given. Then there exists δ > 0 such that if x ∈ S and ‖x − y‖ < δ, then |f(x) − L| < ε. But then

‖x − y‖∗ < c1δ  ⇒  ‖x − y‖ ≤ c1⁻¹‖x − y‖∗ < c1⁻¹(c1δ) = δ.

Therefore, if x ∈ S and ‖x − y‖∗ < c1δ, it follows that ‖x − y‖ < δ, and hence that |f(x) − L| < ε. This shows that limx→y f(x) = L under ‖·‖∗.

Conversely, suppose that limx→y f(x) = L under ‖·‖∗, and let ε > 0 be given. Then there exists δ > 0 such that if x ∈ S and ‖x − y‖∗ < δ, then |f(x) − L| < ε. But then

‖x − y‖ < c2⁻¹δ  ⇒  ‖x − y‖∗ ≤ c2‖x − y‖ < c2(c2⁻¹δ) = δ.

Therefore, if x ∈ S and ‖x − y‖ < c2⁻¹δ, it follows that ‖x − y‖∗ < δ, and hence that |f(x) − L| < ε. This shows that limx→y f(x) = L under ‖·‖.

11. Let ‖·‖ and ‖·‖∗ be two norms on Rn, and let {xk} be a sequence in Rn. Since ‖·‖ and ‖·‖∗ are equivalent, there exist positive constants c1, c2 such that c1‖x‖ ≤ ‖x‖∗ ≤ c2‖x‖ for all x ∈ Rn. Suppose first that {xk} is Cauchy under ‖·‖, and let ε > 0 be given. Then there exists a positive integer N such that m, n ≥ N implies that ‖xm − xn‖ < c2⁻¹ε. But then m, n ≥ N implies that

‖xm − xn‖∗ ≤ c2‖xm − xn‖ < c2(c2⁻¹ε) = ε,

and hence {xk} is Cauchy under ‖·‖∗.

Conversely, suppose {xk} is Cauchy under ‖·‖∗, and let ε > 0 be given. Then there exists a positive integer N such that m, n ≥ N implies that ‖xm − xn‖∗ < c1ε. But then m, n ≥ N implies that

‖xm − xn‖ ≤ c1⁻¹‖xm − xn‖∗ < c1⁻¹(c1ε) = ε,

and hence {xk} is Cauchy under ‖·‖.
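For a concrete instance of such equivalence constants (an illustrative assumption, not part of the exercise), on Rn the norms ‖·‖∞ and ‖·‖1 satisfy ‖x‖∞ ≤ ‖x‖1 ≤ n‖x‖∞, so convergence and the Cauchy property transfer between them exactly as in the arguments above:

```python
import numpy as np

n = 5
rng = np.random.default_rng(3)
x = rng.standard_normal(n)

# Equivalence constants c1 = 1 and c2 = n:  ||x||_inf <= ||x||_1 <= n * ||x||_inf.
print(np.linalg.norm(x, np.inf) <= np.linalg.norm(x, 1) <= n * np.linalg.norm(x, np.inf))

# A sequence converging to x in one norm converges to x in the other.
xs = [x + rng.standard_normal(n) / 10.0**k for k in range(1, 6)]
print([float(np.linalg.norm(v - x, np.inf)) for v in xs])   # -> 0
print([float(np.linalg.norm(v - x, 1)) for v in xs])        # -> 0
```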



10.2 Infinite-dimensional vector spaces

3. Suppose {fk} is a Cauchy sequence in C[a, b] (under the L∞ norm) that converges pointwise to f : [a, b] → R. We wish to prove that fk → f in the L∞ norm. By Theorem 442, C[a, b] is complete under ‖·‖∞, and hence there exists a function g ∈ C[a, b] such that ‖fk − g‖∞ → 0 as k → ∞. By the previous exercise, {fk} converges uniformly to g and hence, in particular, g(x) = limk→∞ fk(x) for all x ∈ [a, b] (cf. the discussion on page 593 in the text). However, by assumption, f(x) = limk→∞ fk(x) for all x ∈ [a, b]. This proves that g(x) = f(x) for all x ∈ [a, b], that is, that g = f. Thus ‖fk − f‖∞ → 0 as k → ∞.

10.3 Functional analysis

3. Let V be a normed vector space. We wish to prove that V∗ is complete. Let {fk} be a Cauchy sequence in V∗, let v ∈ V, v ≠ 0, be fixed, and let ε > 0 be given. Then, since {fk} is Cauchy, there exists a positive integer N such that ‖fn − fm‖V∗ < ε/‖v‖ for all m, n ≥ N. By definition of ‖·‖V∗, it follows that |fn(v) − fm(v)| < ε for all m, n ≥ N. This proves that {fk(v)} is a Cauchy sequence of real numbers and hence converges. We define f(v) = limk→∞ fk(v). Since v was an arbitrary nonzero element of V (and fk(0) = 0 for every k, so the limit exists trivially and equals 0 when v = 0), this defines f : V → R. Moreover, it is easy to show that f is linear:

f(αv) = limk→∞ fk(αv) = limk→∞ αfk(v) = α limk→∞ fk(v) = αf(v),

f(u + v) = limk→∞ fk(u + v) = limk→∞ (fk(u) + fk(v)) = limk→∞ fk(u) + limk→∞ fk(v) = f(u) + f(v).

We can also show that f is bounded. Since {fk} is Cauchy under ‖·‖V∗, it is easy to show that {‖fk‖V∗} is a bounded sequence of real numbers, that is, that there exists M > 0 such that ‖fk‖V∗ ≤ M for all k. Therefore, if v ∈ V, ‖v‖ ≤ 1, then

|f(v)| = |limk→∞ fk(v)| = limk→∞ |fk(v)| ≤ M‖v‖ ≤ M,

since |fk(v)| ≤ ‖fk‖V∗‖v‖ ≤ M‖v‖ for every k. Thus f is bounded, and hence f ∈ V∗. Finally, we must show that fk → f under ‖·‖V∗, that is, that ‖fk − f‖V∗ → 0 as k → ∞. Let ε > 0 be given. Since {fk} is Cauchy under ‖·‖V∗, there exists a positive integer N such that ‖fm − fn‖V∗ < ε/2 for all m, n ≥ N. We will show that |fn(v) − f(v)| < ε for all v ∈ V, ‖v‖ ≤ 1, and all n ≥ N, which then implies that ‖fn − f‖V∗ ≤ ε for all n ≥ N and completes the proof. For any v ∈ V, ‖v‖ ≤ 1, and all n, m ≥ N, we have

|fn(v) − f(v)| ≤ |fn(v) − fm(v)| + |fm(v) − f(v)| ≤ ‖fn − fm‖V∗‖v‖ + |fm(v) − f(v)| < ε/2 + |fm(v) − f(v)|.

Since fm(v) → f(v) as m → ∞, we can choose m ≥ N (depending on v) so that |fm(v) − f(v)| < ε/2, and it follows that |fn(v) − f(v)| < ε, as desired.
10.4 Weak convergence

Let V be a normed vector space, let x ∈ V and r > 0 be given, and let S = {y ∈ V : ‖y − x‖ < r}. Then, for any y, z ∈ S and α, β ∈ [0, 1] with α + β = 1, we have

‖αy + βz − x‖ = ‖αy + βz − αx − βx‖ = ‖α(y − x) + β(z − x)‖ ≤ α‖y − x‖ + β‖z − x‖ < αr + βr = r.

This shows that αy + βz ∈ S, and hence that S is convex.

7. Let f : Rn → R be convex and continuously differentiable, and let x, y ∈ Rn be given. By the previous exercise,

f(x) ≥ f(y) + ∇f(y) · (x − y),

f(y) ≥ f(x) + ∇f(x) · (y − x).

Adding these two inequalities yields

f(x) + f(y) ≥ f(y) + f(x) + ∇f(y) · (x − y) + ∇f(x) · (y − x).

Canceling the common terms and rearranging yields

(∇f(x) − ∇f(y)) · (x − y) ≥ 0.
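A quick numerical illustration of this monotonicity property of the gradient (a sketch; the convex function below is an arbitrary choice, not one taken from the text):

```python
import numpy as np

# Convex example: f(x) = ||x||^4 has gradient 4*||x||^2 * x (an arbitrary illustrative choice).
def grad_f(x):
    return 4.0 * (x @ x) * x

rng = np.random.default_rng(4)
for _ in range(5):
    x = rng.standard_normal(3)
    y = rng.standard_normal(3)
    print((grad_f(x) - grad_f(y)) @ (x - y) >= 0)   # True for every pair
```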