Elementary Linear Algebra, 10th Edition

About The Author Howard Anton obtained his B.A. from Lehigh University, his M.A. from the University of Illinois, and his Ph.D. from the Polytechnic University of Brooklyn, all in mathematics. In the early 1960s he worked for Burroughs Corporation and Avco Corporation at Cape Canaveral, Florida, where he was involved with the manned space program. In 1968 he joined the Mathematics Department at Drexel University, where he taught full time until 1983. Since then he has devoted the majority of his time to textbook writing and activities for mathematical associations. Dr. Anton was president of the EPADEL Section of the Mathematical Association of America (MAA), served on the Board of Governors of that organization, and guided the creation of the Student Chapters of the MAA. In addition to various pedagogical articles, he has published numerous research papers in functional analysis, approximation theory, and topology. He is best known for his textbooks in mathematics, which are among the most widely used in the world. There are currently more than 150 versions of his books, including translations into Spanish, Arabic, Portuguese, Italian, Indonesian, French, Japanese, Chinese, Hebrew, and German. For relaxation, Dr. Anton enjoys travel and photography.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

Preface This edition of Elementary Linear Algebra gives an introductory treatment of linear algebra that is suitable for a first undergraduate course. Its aim is to present the fundamentals of linear algebra in the clearest possible way—sound pedagogy is the main consideration. Although calculus is not a prerequisite, there is some optional material that is clearly marked for students with a calculus background. If desired, that material can be omitted without loss of continuity. Technology is not required to use this text, but for instructors who would like to use MATLAB, Mathematica, Maple, or calculators with linear algebra capabilities, we have posted some supporting material that can be accessed at either of the following Web sites: www.howardanton.com www.wiley.com/college/anton

Summary of Changes in this Edition This edition is a major revision of its predecessor. In addition to including some new material, some of the old material has been streamlined to ensure that the major topics can all be covered in a standard course. These are the most significant changes: • Vectors in 2-space, 3-space, and n-space Chapters 3 and 4 of the previous edition have been combined into a single chapter. This has enabled us to eliminate some duplicate exposition and to juxtapose concepts in n-space with those in 2-space and 3-space, thereby conveying more clearly how n-space ideas generalize those already familiar to the student. • New Pedagogical Elements Each section now ends with a Concept Review and a Skill Mastery that provide the student a convenient reference to the main ideas in that section. • New Exercises Many new exercises have been added, including a set of True/False exercises at the end of most sections. • Earlier Coverage of Eigenvalues and Eigenvectors The chapter on eigenvalues and eigenvectors, which was Chapter 7 in the previous edition, is Chapter 5 in this edition. • Complex Vector Spaces The chapter entitled Complex Vector Spaces in the previous edition has been completely revised. The most important ideas are now covered in Section 5.3 and Section 7.5 in the context of matrix diagonalization. A brief review of complex numbers is included in the Appendix. • Quadratic Forms This material has been extensively rewritten to focus more precisely on the most important ideas. • New Chapter on Numerical Methods In the previous edition an assortment of topics appeared in the last chapter. That chapter has been replaced by a new chapter that focuses exclusively on numerical methods of linear algebra. We achieved this by moving those topics not concerned with numerical methods elsewhere in the text. • Singular-Value Decomposition In recognition of its growing importance, a new section on Singular-Value Decomposition has been added to the chapter on numerical methods.

• Internet Search and the Power Method A new section on the Power Method and its application to Internet search engines has been added to the chapter on numerical methods. • Applications There is an expanded version of this text by Howard Anton and Chris Rorres entitled Elementary Linear Algebra: Applications Version, 10th (ISBN 9780470432051), whose purpose is to supplement this version with an extensive body of applications. However, to accommodate instructors who asked us to include some applications in this version of the text, we have done so. These are generally less detailed than those appearing in the Anton/Rorres text and can be omitted without loss of continuity.

Hallmark Features • Relationships Among Concepts One of our main pedagogical goals is to convey to the student that linear algebra is a cohesive subject and not simply a collection of isolated definitions and techniques. One way in which we do this is by using a crescendo of Equivalent Statements theorems that continually revisit relationships among systems of equations, matrices, determinants, vectors, linear transformations, and eigenvalues. To get a general sense of how we use this technique see Theorems 1.5.3, 1.6.4, 2.3.8, 4.8.10, 4.10.4 and then Theorem 5.1.6, for example. • Smooth Transition to Abstraction Because the transition from Rn to general vector spaces is difficult for many students, considerable effort is devoted to explaining the purpose of abstraction and helping the student to “visualize” abstract ideas by drawing analogies to familiar geometric ideas. • Mathematical Precision When reasonable, we try to be mathematically precise. In keeping with the level of student audience, proofs are presented in a patient style that is tailored for beginners. There is a brief section in the Appendix on how to read proof statements, and there are various exercises in which students are guided through the steps of a proof and asked for justification. • Suitability for a Diverse Audience This text is designed to serve the needs of students in engineering, computer science, biology, physics, business, and economics as well as those majoring in mathematics. • Historical Notes To give the students a sense of mathematical history and to convey that real people created the mathematical theorems and equations they are studying, we have included numerous Historical Notes that put the topic being studied in historical perspective.

About the Exercises • Graded Exercise Sets Each exercise set begins with routine drill problems and progresses to problems with more substance. • True/False Exercises Most exercise sets end with a set of True/False exercises that are designed to check conceptual understanding and logical reasoning. To avoid pure guessing, the students are required to justify their responses in some way. • Supplementary Exercise Sets Most chapters end with a set of supplementary exercises that tend to be more challenging and force the student to draw on ideas from the entire chapter rather than a specific section.

Supplementary Materials for Students • Student Solutions Manual This supplement provides detailed solutions to most theoretical exercises and to at least one nonroutine exercise of every type (ISBN 9780470458228). • Technology Exercises and Data Files The technology exercises that appeared in the previous edition have been moved to the Web site that accompanies this text. Those exercises are designed to be solved using MATLAB, Mathematica, or Maple and are accompanied by data files in all three formats. The exercises and data can be downloaded from either of the following Web sites. www.howardanton.com www.wiley.com/college/anton

Supplementary Materials for Instructors • Instructor's Solutions Manual This supplement provides worked-out solutions to most exercises in the text (ISBN 9780470458235). • WileyPLUS™ This is Wiley's proprietary online teaching and learning environment that integrates a digital version of this textbook with instructor and student resources to fit a variety of teaching and learning styles. WileyPLUS will help your students master concepts in a rich and structured environment that is available to them 24/7. It will also help you to personalize and manage your course more effectively with student assessments, assignments, grade tracking, and other useful tools. • Your students will receive timely access to resources that address their individual needs and will receive immediate feedback and remediation resources when needed. • There are also self-assessment tools that are linked to the relevant portions of the text that will enable your students to take control of their own learning and practice. • WileyPLUS will help you to identify those students who are falling behind and to intervene in a timely manner without waiting for scheduled office hours. More information about WileyPLUS can be obtained from your Wiley representative.

A Guide for the Instructor Although linear algebra courses vary widely in content and philosophy, most courses fall into two categories —those with about 35–40 lectures and those with about 25–30 lectures. Accordingly, we have created long and short templates as possible starting points for constructing a course outline. Of course, these are just guides, and you will certainly want to customize them to fit your local interests and requirements. Neither of these sample templates includes applications. Those can be added, if desired, as time permits.

                                                       Long Template    Short Template
Chapter 1: Systems of Linear Equations and Matrices      7 lectures       6 lectures
Chapter 2: Determinants                                   3 lectures       2 lectures
Chapter 3: Euclidean Vector Spaces                        4 lectures       3 lectures
Chapter 4: General Vector Spaces                         10 lectures      10 lectures
Chapter 5: Eigenvalues and Eigenvectors                   3 lectures       3 lectures
Chapter 6: Inner Product Spaces                           3 lectures       1 lecture
Chapter 7: Diagonalization and Quadratic Forms            4 lectures       3 lectures
Chapter 8: Linear Transformations                         3 lectures       2 lectures
Total:                                                   37 lectures      30 lectures

Acknowledgements I would like to express my appreciation to the following people whose helpful guidance has greatly improved the text.

Reviewers and Contributors Don Allen, Texas A&M University John Alongi, Northwestern University John Beachy, Northern Illinois University Przemyslaw Bogacki, Old Dominion University Robert Buchanan, Millersville University of Pennsylvania Ralph Byers, University of Kansas Evangelos A. Coutsias, University of New Mexico Joshua Du, Kennesaw State University Fatemeh Emdad, Michigan Technological University Vincent Ervin, Clemson University Anda Gadidov, Kennesaw State University Guillermo Goldsztein, Georgia Institute of Technology Tracy Hamilton, California State University, Sacramento Amanda Hattway, Wentworth Institute of Technology Heather Hulett, University of Wisconsin—La Crosse David Hyeon, Northern Illinois University Matt Insall, Missouri University of Science and Technology Mic Jackson, Earlham College Anton Kaul, California Polytechnic Institute, San Luis Obispo

Harihar Khanal, Embry-Riddle University Hendrik Kuiper, Arizona State University Kouok Law, Georgia Perimeter College James McKinney, California State University, Pomona Eric Schmutz, Drexel University Qin Sheng, Baylor University Adam Sikora, State University of New York at Buffalo Allan Silberger, Cleveland State University Dana Williams, Dartmouth College

Mathematical Advisors Special thanks are due to a number of talented teachers and mathematicians who provided pedagogical guidance, provided help with answers and exercises, or provided detailed checking or proofreading: John Alongi, Northwestern University Scott Annin, California State University, Fullerton Anton Kaul, California Polytechnic State University Sarah Streett Cindy Trimble, C Trimble and Associates Brad Davis, C Trimble and Associates

The Wiley Support Team David Dietz, Senior Acquisitions Editor Jeff Benson, Assistant Editor Pamela Lashbrook, Senior Editorial Assistant Janet Foxman, Production Editor Maddy Lesure, Senior Designer Laurie Rosatone, Vice President and Publisher Sarah Davis, Senior Marketing Manager Diana Smith, Marketing Assistant Melissa Edwards, Media Editor Lisa Sabatini, Media Project Manager Sheena Goldstein, Photo Editor Carol Sawyer, Production Manager Lilian Brady, Copyeditor

Special Contributions The talents and dedication of many individuals are required to produce a book such as this, and I am fortunate to have benefited from the expertise of the following people: David Dietz — my editor, for his attention to detail, his sound judgment, and his unwavering faith in me. Jeff Benson — my assistant editor, who did an unbelievable job in organizing and coordinating the many threads required to make this edition a reality. Carol Sawyer — of The Perfect Proof, who coordinated the myriad of details in the production process. It will be a pleasure to finally delete from my computer the hundreds of emails we exchanged in the course of working together on this book. Scott Annin — California State University at Fullerton, who critiqued the previous edition and provided valuable ideas on how to improve the text. I feel fortunate to have had the benefit of Prof. Annin's teaching expertise and insights. Dan Kirschenbaum — of The Art of Arlene and Dan Kirschenbaum, whose artistic and technical expertise resolved some difficult and critical illustration issues. Bill Tuohy — who read parts of the manuscript and whose critical eye for detail had an important influence on the evolution of the text. Pat Anton — who proofread manuscript, when needed, and shouldered the burden of household chores to free up time for me to work on this edition. Maddy Lesure — our text and cover designer whose unerring sense of elegant design is apparent in the pages of this book. Rena Lam — of Techsetters, Inc., who did an absolutely amazing job of wading through a nightmare of author edits, scribbles, and last-minute changes to produce a beautiful book. John Rogosich — of Techsetters, Inc., who skillfully programmed the design elements of the book and resolved numerous thorny typesetting issues. Lilian Brady — my copyeditor of many years, whose eye for typography and whose knowledge of language never ceases to amaze me. The Wiley Team — There are many other people at Wiley who worked behind the scenes and to whom I owe a debt of gratitude: Laurie Rosatone, Ann Berlin, Dorothy Sinclair, Janet Foxman, Sarah Davis, Harry Nolan, Sheena Goldstein, Melissa Edwards, and Norm Christiansen. Thanks to you all.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

CHAPTER

1

Systems of Linear Equations and Matrices

CHAPTER CONTENTS 1.1. Introduction to Systems of Linear Equations 1.2. Gaussian Elimination 1.3. Matrices and Matrix Operations 1.4. Inverses; Algebraic Properties of Matrices 1.5. Elementary Matrices and a Method for Finding A⁻¹ 1.6. More on Linear Systems and Invertible Matrices 1.7. Diagonal, Triangular, and Symmetric Matrices 1.8. Applications of Linear Systems • Network Analysis (Traffic Flow) • Electrical Circuits • Balancing Chemical Equations • Polynomial Interpolation 1.9. Leontief Input-Output Models

INTRODUCTION Information in science, business, and mathematics is often organized into rows and columns to form rectangular arrays called “matrices” (plural of “matrix”). Matrices often appear as tables of numerical data that arise from physical observations, but they occur in various mathematical contexts as well. For example, we will see in this chapter that all of the information required to solve a system of equations such as

is embodied in the matrix

and that the solution of the system can be obtained by performing appropriate operations on this matrix. This is particularly important in developing computer programs for solving systems of equations because computers are well suited for manipulating arrays of numerical information. However, matrices are not simply a notational tool for solving systems of equations; they can be viewed as mathematical objects in their own right, and there is a rich and important theory associated with them that has a multitude of practical applications. It is the study of matrices and related topics that forms the mathematical field that we call “linear algebra.” In this chapter we will begin our study of matrices.
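To make the idea concrete, here is a minimal sketch, not taken from the text, of how a small system's coefficients and constants might be stored as an augmented matrix in code and solved with a general-purpose library routine; the 2-by-2 system used below is hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical system used only for illustration (not the system from the text):
#   x + 2y = 5
#   3x - y = 1
# Its augmented matrix stores all of the information needed to solve it.
augmented = np.array([[1.0,  2.0, 5.0],
                      [3.0, -1.0, 1.0]])

A = augmented[:, :-1]   # coefficient part
b = augmented[:, -1]    # right-hand sides

solution = np.linalg.solve(A, b)   # library routine; later sections do this by row operations
print(solution)                    # [1. 2.]  ->  x = 1, y = 2
```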

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

1.1 Introduction to Systems of Linear Equations Systems of linear equations and their solutions constitute one of the major topics that we will study in this course. In this first section we will introduce some basic terminology and discuss a method for solving such systems.

Linear Equations Recall that in two dimensions a line in a rectangular xy-coordinate system can be represented by an equation of the form

a1x + a2y = b

and in three dimensions a plane in a rectangular xyz-coordinate system can be represented by an equation of the form

a1x + a2y + a3z = b

These are examples of “linear equations,” the first being a linear equation in the variables x and y and the second a linear equation in the variables x, y, and z. More generally, we define a linear equation in the n variables x1, x2, …, xn to be one that can be expressed in the form

a1x1 + a2x2 + … + anxn = b    (1)

where a1, a2, …, an and b are constants, and the a's are not all zero. In the special cases where n = 2 or n = 3, we will often use variables without subscripts and write linear equations as

a1x + a2y = b    (2)

or

a1x + a2y + a3z = b    (3)

In the special case where b = 0, Equation 1 has the form

a1x1 + a2x2 + … + anxn = 0    (4)

which is called a homogeneous linear equation in the variables x1, x2, …, xn.

E X A M P L E 1 Linear Equations Observe that a linear equation does not involve any products or roots of variables. All variables occur only to the first power and do not appear, for example, as arguments of trigonometric, logarithmic, or exponential functions. The following are linear equations:

The following are not linear equations:

A finite set of linear equations is called a system of linear equations or, more briefly, a linear system. The variables are called unknowns. For example, system 5 that follows has unknowns x and y, and system 6 has unknowns x1, x2, and x3. (5)

(6)

The double subscripting on the coefficients of the unknowns gives their location in the system—the first subscript indicates the equation in which the coefficient occurs, and the second indicates which unknown it multiplies. Thus, a12 is in the first equation and multiplies x2.

A general linear system of m equations in the n unknowns x1, x2, …, xn can be written as

a11x1 + a12x2 + … + a1nxn = b1
a21x1 + a22x2 + … + a2nxn = b2
   ⋮
am1x1 + am2x2 + … + amnxn = bm    (7)

A solution of a linear system in n unknowns x1, x2, …, xn is a sequence of n numbers s1, s2, …, sn for which the substitution

x1 = s1,  x2 = s2,  …,  xn = sn

makes each equation a true statement. For example, the system in 5 has a solution consisting of values of x and y, and the system in 6 has a solution consisting of values of x1, x2, and x3. These solutions can be written more succinctly as ordered lists of numbers in which the names of the variables are omitted. This notation allows us to interpret these solutions geometrically as points in two-dimensional and three-dimensional space. More generally, a solution of a linear system in n unknowns can be written as

(s1, s2, …, sn)

which is called an ordered n-tuple. With this notation it is understood that all variables appear in the same order in each equation. If n = 2, then the n-tuple is called an ordered pair, and if n = 3, then it is called an ordered triple.
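As a small illustration of this definition, the following sketch checks whether a given n-tuple is a solution of a system; the code is not from the text, and the system it tests, stored as a coefficient matrix and a right-hand-side vector, is hypothetical.

```python
import numpy as np

def is_solution(A, b, candidate, tol=1e-9):
    """Return True if the n-tuple `candidate` satisfies every equation of the
    system with coefficient matrix A and right-hand-side vector b."""
    residual = A @ np.asarray(candidate, dtype=float) - b
    return bool(np.all(np.abs(residual) < tol))

# Hypothetical system, for illustration only:  x + y + 2z = 9  and  2x - y + z = 3
A = np.array([[1.0,  1.0, 2.0],
              [2.0, -1.0, 1.0]])
b = np.array([9.0, 3.0])

print(is_solution(A, b, (1, 2, 3)))   # True:  1 + 2 + 6 = 9  and  2 - 2 + 3 = 3
print(is_solution(A, b, (0, 0, 0)))   # False
```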

Linear Systems with Two and Three Unknowns Linear systems in two unknowns arise in connection with intersections of lines. For example, consider the linear system

a1x + b1y = c1
a2x + b2y = c2

in which the graphs of the equations are lines in the xy-plane. Each solution (x, y) of this system corresponds to a point of intersection of the lines, so there are three possibilities (Figure 1.1.1): 1. The lines may be parallel and distinct, in which case there is no intersection and consequently no solution. 2. The lines may intersect at only one point, in which case the system has exactly one solution. 3. The lines may coincide, in which case there are infinitely many points of intersection (the points on the common line) and consequently infinitely many solutions.

Figure 1.1.1 In general, we say that a linear system is consistent if it has at least one solution and inconsistent if it has no solutions. Thus, a consistent linear system of two equations in two unknowns has either one solution or infinitely many solutions—there are no other possibilities. The same is true for a linear system of three equations in three unknowns

a1x + b1y + c1z = d1
a2x + b2y + c2z = d2
a3x + b3y + c3z = d3

in which the graphs of the equations are planes. The solutions of the system, if any, correspond to points where all three planes intersect, so again we see that there are only three possibilities—no solutions, one solution, or infinitely many solutions (Figure 1.1.2).

Figure 1.1.2 We will prove later that our observations about the number of solutions of linear systems of two equations in two unknowns and linear systems of three equations in three unknowns actually hold for all linear systems. That is: Every system of linear equations has zero, one, or infinitely many solutions. There are no other possibilities.

E X A M P L E 2 A Linear System with One Solution Solve the linear system

Solution We can eliminate x from the second equation by adding −2 times the first equation to the second. This yields the simplified system

From the second equation we obtain the value of y, and on substituting this value in the first equation we obtain the value of x. Thus, the system has a unique solution.

Geometrically, this means that the lines represented by the equations in the system intersect at a single point. We leave it for you to check this by graphing the lines.

E X A M P L E 3 A Linear System with No Solutions Solve the linear system

Solution We can eliminate x from the second equation by adding −3 times the first equation to the second equation. This yields the simplified system

The second equation is contradictory, so the given system has no solution. Geometrically, this means that the lines corresponding to the equations in the original system are parallel and distinct. We leave it for you to check this by graphing the lines or by showing that they have the same slope but different y-intercepts.

E X A M P L E 4 A Linear System with Infinitely Many Solutions Solve the linear system

In Example 4 we could have also obtained parametric equations for the solutions by solving 8 for y in terms of x, and letting x = t be the parameter. The resulting parametric equations would look different but would define the same solution set.

Solution We can eliminate x from the second equation by adding −4 times the first equation to the second. This yields the simplified system

The second equation does not impose any restrictions on x and y and hence can be omitted. Thus, the solutions of the system are those values of x and y that satisfy the single equation

(8) Geometrically, this means the lines corresponding to the two equations in the original system coincide. One way to describe the solution set is to solve this equation for x in terms of y to obtain and then assign an arbitrary value t (called a parameter) to y. This allows us to express the solution by the pair of equations (called parametric equations)

We can obtain specific numerical solutions from these equations by substituting numerical values for the parameter t; each such value yields a solution of the system. You can confirm that these are solutions by substituting the coordinates into the given equations.

E X A M P L E 5 A Linear System with Infinitely Many Solutions Solve the linear system

Solution This system can be solved by inspection, since the second and third equations are multiples of the first. Geometrically, this means that the three planes coincide and that those values of x, y, and z that satisfy the equation (9) automatically satisfy all three equations. Thus, it suffices to find the solutions of 9. We can do this by first solving 9 for x in terms of y and z, then assigning arbitrary values r and s (parameters) to these two variables, and then expressing the solution by the three parametric equations. Specific solutions can be obtained by choosing numerical values for the parameters r and s. For example, one such choice of r and s yields the solution (6, 1, 0).
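The same parametric idea is easy to carry out by machine. The sketch below is not from the text and assumes a hypothetical single equation in x, y, and z (it is not the equation of Example 5); it generates solutions by assigning values to the two parameters and verifies each one.

```python
# A minimal sketch of parametric solutions for a single linear equation in x, y, z.
# Hypothetical equation, for illustration only:
#   x - y + 2z = 5   ->   solve for the leading variable x:  x = 5 + y - 2z
def solutions(r_values, s_values):
    """Assign parameter r to y and parameter s to z; return the resulting (x, y, z) triples."""
    triples = []
    for r in r_values:
        for s in s_values:
            x = 5 + r - 2 * s
            triples.append((x, r, s))
    return triples

for x, y, z in solutions([0, 1], [0, 1]):
    # every generated triple satisfies the original equation
    assert x - y + 2 * z == 5
    print((x, y, z))
```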

Augmented Matrices and Elementary Row Operations As the number of equations and unknowns in a linear system increases, so does the complexity of the algebra involved in finding solutions. The required computations can be made more manageable by simplifying notation and standardizing procedures. For example, by mentally keeping track of the location of the +'s, the x's, and the ='s in the linear system

we can abbreviate the system by writing only the rectangular array of numbers

This is called the augmented matrix for the system. (As noted in the introduction to this chapter, the term “matrix” is used in mathematics to denote a rectangular array of numbers. In a later section we will study matrices in detail, but for now we will only be concerned with augmented matrices for linear systems.) For example, the augmented matrix for the system of equations

The basic method for solving a linear system is to perform appropriate algebraic operations on the system that do not alter the solution set and that produce a succession of increasingly simpler systems, until a point is reached where it can be ascertained whether the system is consistent, and if so, what its solutions are. Typically, the algebraic operations are as follows: 1. Multiply an equation through by a nonzero constant. 2. Interchange two equations. 3. Add a constant times one equation to another. Since the rows (horizontal lines) of an augmented matrix correspond to the equations in the associated system, these three operations correspond to the following operations on the rows of the augmented matrix: 1. Multiply a row through by a nonzero constant. 2. Interchange two rows. 3. Add a constant times one row to another. These are called elementary row operations on a matrix. In the following example we will illustrate how to use elementary row operations and an augmented matrix to solve a linear system in three unknowns. Since a systematic procedure for solving linear systems will be developed in the next section, do not worry about how the steps in the example were chosen. Your objective here should be simply to understand the computations.
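For readers who want to experiment, here is a rough sketch, not the textbook's own code, of the three elementary row operations acting in place on an augmented matrix stored as a list of rows; the matrix used in the demonstration is hypothetical.

```python
def scale_row(M, i, c):
    """Multiply row i through by the nonzero constant c."""
    if c == 0:
        raise ValueError("a row may only be multiplied by a nonzero constant")
    M[i] = [c * entry for entry in M[i]]

def swap_rows(M, i, j):
    """Interchange rows i and j."""
    M[i], M[j] = M[j], M[i]

def add_multiple_of_row(M, src, c, dest):
    """Add c times row src to row dest."""
    M[dest] = [d + c * s for d, s in zip(M[dest], M[src])]

# Hypothetical augmented matrix, for illustration only:
M = [[1.0, 1.0,  2.0, 9.0],
     [2.0, 4.0, -3.0, 1.0],
     [3.0, 6.0, -5.0, 0.0]]
add_multiple_of_row(M, 0, -2.0, 1)   # add -2 times row 1 to row 2
print(M[1])                          # [0.0, 2.0, -7.0, -17.0]
```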

E X A M P L E 6 Using Elementary Row Operations In the left column we solve a system of linear equations by operating on the equations in the system, and in the right column we solve the same system by operating on the rows of the augmented matrix.

Add −2 times the first equation to the second to obtain

Add −2 times the first row to the second to obtain

Add −3 times the first equation to the third to obtain

Add −3 times the first row to the third to obtain

Multiply the second equation by

Multiply the second row by

to obtain

to obtain

Add −3 times the second equation to the third to obtain

Add −3 times the second row to the third to obtain

Multiply the third equation by −2 to obtain

Multiply the third row by −2 to obtain

Add −1 times the second equation to the first to obtain

Add −1 times the second row to the first to obtain

Add suitable multiples of the third equation to the first and to the second to obtain

Add the same multiples of the third row to the first and to the second to obtain

The solution is now evident.

Maxime Bôcher (1867–1918) Historical Note The first known use of augmented matrices appeared between 200 B.C. and 100 B.C. in a Chinese manuscript entitled Nine Chapters of Mathematical Art. The coefficients were arranged in columns rather than in rows, as today, but remarkably the system was solved by performing a succession of operations on the columns. The actual use of the term augmented matrix appears to have been introduced by the American mathematician Maxime Bôcher in his book Introduction to Higher Algebra, published in 1907. In addition to being an outstanding research mathematician and an expert in Latin, chemistry, philosophy, zoology, geography, meteorology, art, and music, Bôcher was an outstanding expositor of mathematics whose elementary textbooks were greatly appreciated by students and are still in demand today. [Image: Courtesy of the American Mathematical Society]

Concept Review • Linear equation • Homogeneous linear equation • System of linear equations • Solution of a linear system • Ordered n-tuple • Consistent linear system • Inconsistent linear system • Parameter • Parametric equations • Augmented matrix • Elementary row operations

Skills • Determine whether a given equation is linear. • Determine whether a given n-tuple is a solution of a linear system. • Find the augmented matrix of a linear system. • Find the linear system corresponding to a given augmented matrix. • Perform elementary row operations on a linear system and on its corresponding augmented matrix. • Determine whether a linear system is consistent or inconsistent. • Find the set of solutions to a consistent linear system.

Exercise Set 1.1 1. In each part, determine whether the equation is linear in x1, x2, and x3.

(a) (b) (c) (d) (e) (f) Answer: (a), (c), and (f) are linear equations; (b), (d) and (e) are not linear equations 2. In each part, determine whether the equations form a linear system.

(a)

(b) (c) (d)

3. In each part, determine whether the equations form a linear system. (a) (b)

(c)

(d) Answer: (a) and (d) are linear systems; (b) and (c) are not linear systems 4. For each system in Exercise 2 that is linear, determine whether it is consistent. 5. For each system in Exercise 3 that is linear, determine whether it is consistent. Answer: (a) and (d) are both consistent 6. Write a system of linear equations consisting of three equations in three unknowns with (a) no solutions. (b) exactly one solution. (c) infinitely many solutions. 7. In each part, determine whether the given vector is a solution of the linear system

(a) (3, 1, 1) (b) (3, −1, 1)

(c) (13, 5, 2) (d) (e) (17, 7, 5) Answer: (a), (d), and (e) are solutions; (b) and (c) are not solutions 8. In each part, determine whether the given vector is a solution of the linear system

(a) (b) (c) (5, 8, 1) (d) (e) 9. In each part, find the solution set of the linear equation by using parameters as necessary. (a) (b) Answer: (a)

(b)

10. In each part, find the solution set of the linear equation by using parameters as necessary. (a) (b) 11. In each part, find a system of linear equations corresponding to the given augmented matrix (a)

(b)

(c) (d)

Answer: (a)

(b)

(c) (d)

12. In each part, find a system of linear equations corresponding to the given augmented matrix. (a)

(b) (c)

(d)

13. In each part, find the augmented matrix for the given system of linear equations.

(a)

(b) (c)

(d) Answer: (a)

(b) (c)

(d) 14. In each part, find the augmented matrix for the given system of linear equations. (a)

(b)

(c)

(d)

15. The curve

shown in the accompanying figure passes through the points

. Show that the coefficients a, b, and c are a solution of the system of linear equations whose augmented matrix is

Figure Ex-15 16. Explain why each of the three elementary row operations does not affect the solution set of a linear system. 17. Show that if the linear equations have the same solution set, then the two equations are identical (i.e.,

and

).

True-False Exercises In parts (a)–(h) determine whether the statement is true or false, and justify your answer. (a) A linear system whose equations are all homogeneous must be consistent. Answer: True (b) Multiplying a linear equation through by zero is an acceptable elementary row operation. Answer: False (c) The linear system

cannot have a unique solution, regardless of the value of k. Answer: True (d) A single linear equation with two or more unknowns must always have infinitely many solutions. Answer: True (e) If the number of equations in a linear system exceeds the number of unknowns, then the system must be inconsistent. Answer: False

(f) If each equation in a consistent linear system is multiplied through by a constant c, then all solutions to the new system can be obtained by multiplying solutions from the original system by c. Answer: False (g) Elementary row operations permit one equation in a linear system to be subtracted from another. Answer: True (h) The linear system with corresponding augmented matrix

is consistent. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

1.2 Gaussian Elimination In this section we will develop a systematic procedure for solving systems of linear equations. The procedure is based on the idea of performing certain operations on the rows of the augmented matrix for the system that simplifies it to a form from which the solution of the system can be ascertained by inspection.

Considerations in Solving Linear Systems When considering methods for solving systems of linear equations, it is important to distinguish between large systems that must be solved by computer and small systems that can be solved by hand. For example, there are many applications that lead to linear systems in thousands or even millions of unknowns. Large systems require special techniques to deal with issues of memory size, roundoff errors, solution time, and so forth. Such techniques are studied in the field of numerical analysis and will only be touched on in this text. However, almost all of the methods that are used for large systems are based on the ideas that we will develop in this section.

Echelon Forms In Example 6 of the last section, we solved a linear system in the unknowns x, y, and z by reducing the augmented matrix to the form

from which the solution , , became evident. This is an example of a matrix that is in reduced row echelon form. To be of this form, a matrix must have the following properties: 1. If a row does not consist entirely of zeros, then the first nonzero number in the row is a 1. We call this a leading 1. 2. If there are any rows that consist entirely of zeros, then they are grouped together at the bottom of the matrix. 3. In any two successive rows that do not consist entirely of zeros, the leading 1 in the lower row occurs farther to the right than the leading 1 in the higher row. 4. Each column that contains a leading 1 has zeros everywhere else in that column. A matrix that has the first three properties is said to be in row echelon form. (Thus, a matrix in reduced row echelon form is of necessity in row echelon form, but not conversely.)
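The four properties above translate directly into a mechanical test. The following rough checker is an illustration, not part of the text; it accepts a matrix as a list of rows and reports whether the matrix is in reduced row echelon form.

```python
def is_reduced_row_echelon(M, tol=1e-12):
    """Check the four reduced-row-echelon-form properties described above."""
    leading_cols = []
    seen_zero_row = False
    for row in M:
        nonzero = [j for j, v in enumerate(row) if abs(v) > tol]
        if not nonzero:
            seen_zero_row = True              # property 2: zero rows stay at the bottom
            continue
        if seen_zero_row:
            return False
        lead = nonzero[0]
        if abs(row[lead] - 1.0) > tol:        # property 1: first nonzero entry is a leading 1
            return False
        if leading_cols and lead <= leading_cols[-1]:
            return False                      # property 3: leading 1's move to the right
        leading_cols.append(lead)
    for j in leading_cols:                    # property 4: a pivot column is zero elsewhere
        column = [row[j] for row in M]
        if sum(abs(v) > tol for v in column) != 1:
            return False
    return True

print(is_reduced_row_echelon([[1, 0, 4], [0, 1, 7], [0, 0, 0]]))  # True
print(is_reduced_row_echelon([[1, 2, 0], [0, 1, 3]]))             # False (property 4 fails)
```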

E X A M P L E 1 Row Echelon and Reduced Row Echelon Form The following matrices are in reduced row echelon form.

The following matrices are in row echelon form but not reduced row echelon form.

E X A M P L E 2 More on Row Echelon and Reduced Row Echelon Form As Example 1 illustrates, a matrix in row echelon form has zeros below each leading 1, whereas a matrix in reduced row echelon form has zeros below and above each leading 1. Thus, with any real numbers substituted for the *'s, all matrices of the following types are in row echelon form:

All matrices of the following types are in reduced row echelon form:

If, by a sequence of elementary row operations, the augmented matrix for a system of linear equations is put in reduced row echelon form, then the solution set can be obtained either by inspection or by converting certain linear equations to parametric form. Here are some examples. In Example 3 we could, if desired, express the solution more succinctly as the 4-tuple (3, −1, 0, 5).

E X A M P L E 3 Unique Solution Suppose that the augmented matrix for a linear system in the unknowns x1, x2, x3, and x4 has been reduced by elementary row operations to

This matrix is in reduced row echelon form and corresponds to the equations

Thus, the system has a unique solution, namely,

,

,

,

.

E X A M P L E 4 Linear Systems in Three Unknowns In each part, suppose that the augmented matrix for a linear system in the unknowns x, y, and z has been reduced by elementary row operations to the given reduced row echelon form. Solve the system.

Solution (a) The equation that corresponds to the last row of the augmented matrix is Since this equation is not satisfied by any values of x, y, and z, the system is inconsistent. (b) The equation that corresponds to the last row of the augmented matrix is This equation can be omitted since it imposes no restrictions on x, y, and z; hence, the linear system corresponding to the augmented matrix is

Since x and y correspond to the leading 1's in the augmented matrix, we call these the leading variables. The remaining variables (in this case z) are called free variables. Solving for the leading variables in terms of the free variables gives

From these equations we see that the free variable z can be treated as a parameter and assigned an arbitrary value, t, which then determines values for x and y. Thus, the solution set can be represented by the parametric equations By substituting various values for t in these equations we can obtain various solutions of the system. For example, setting yields the solution and setting

yields the solution

(c) As explained in part (b), we can omit the equations corresponding to the zero rows, in which case the linear system associated with the augmented matrix consists of the single equation (1) from which we see that the solution set is a plane in three-dimensional space. Although 1 is a valid form of the solution set, there are many applications in which it is preferable to express the solution set in parametric form. We can convert 1 to parametric form by solving for the leading variable x in terms of the free variables y and z to obtain From this equation we see that the free variables can be assigned arbitrary values, say and which then determine the value of x. Thus, the solution set can be expressed parametrically as

,

(2)

We will usually denote parameters in a general solution by the letters r, s, t,…, but any letters that do not conflict with the names of the unknowns can be used. For systems with more than three unknowns, subscripted letters such as t1, t2, t3,… are convenient.

Formulas, such as 2, that express the solution set of a linear system parametrically have some associated terminology.

DEFINITION 1 If a linear system has infinitely many solutions, then a set of parametric equations from which all solutions can be obtained by assigning numerical values to the parameters is called a general solution of the system.

Elimination Methods We have just seen how easy it is to solve a system of linear equations once its augmented matrix is in reduced row echelon form. Now we will give a step-by-step elimination procedure that can be used to reduce any matrix to reduced row echelon form. As we state each step in the procedure, we illustrate the idea by reducing the following matrix to reduced row echelon form.

Step 1. Locate the leftmost column that does not consist entirely of zeros.

Step 2. Interchange the top row with another row, if necessary, to bring a nonzero entry to the top of the column found in Step 1.

Step 3. If the entry that is now at the top of the column found in Step 1 is a, multiply the first row by 1/a in order to introduce a leading 1.

Step 4. Add suitable multiples of the top row to the rows below so that all entries below the leading 1 become zeros.

Step 5. Now cover the top row in the matrix and begin again with Step 1 applied to the submatrix that remains. Continue in this way until the entire matrix is in row echelon form.

The entire matrix is now in row echelon form. To find the reduced row echelon form we need the following additional step. Step 6. Beginning with the last nonzero row and working upward, add suitable multiples of each row to the rows above to introduce zeros above the leading 1's.

The last matrix is in reduced row echelon form. The procedure (or algorithm) we have just described for reducing a matrix to reduced row echelon form is called Gauss-Jordan elimination. This algorithm consists of two parts, a forward phase in which zeros are introduced below the leading 1's and then a backward phase in which zeros are introduced above the leading 1's. If only the forward phase is used, then the procedure produces a row echelon form only and is called Gaussian elimination. For example, in the preceding computations a row echelon form was obtained at the end of Step 5.
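The steps above translate almost line for line into code. The sketch below is not the textbook's own program; it is a compact variant of Gauss-Jordan elimination that clears entries both below and above each leading 1 as it goes, rather than in a separate backward phase, and it uses exact fractions to sidestep roundoff. The augmented matrix in the demonstration is hypothetical.

```python
from fractions import Fraction

def reduced_row_echelon_form(rows):
    """Reduce a matrix (given as a list of rows) to reduced row echelon form."""
    M = [[Fraction(x) for x in row] for row in rows]
    nrows, ncols = len(M), len(M[0])
    pivot_row = 0
    for col in range(ncols):
        # Steps 1-2: find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, nrows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]      # interchange rows
        a = M[pivot_row][col]
        M[pivot_row] = [x / a for x in M[pivot_row]]         # Step 3: create the leading 1
        for r in range(nrows):                               # Steps 4 and 6: clear the column
            if r != pivot_row and M[r][col] != 0:
                c = M[r][col]
                M[r] = [x - c * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == nrows:
            break
    return M

# Hypothetical augmented matrix, for illustration only:
for row in reduced_row_echelon_form([[1, 1, 2, 9], [2, 4, -3, 1], [3, 6, -5, 0]]):
    print([str(x) for x in row])   # rows [1 0 0 | 1], [0 1 0 | 2], [0 0 1 | 3]
```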

Carl Friedrich Gauss (1777–1855)

Wilhelm Jordan (1842–1899) Historical Note Although versions of Gaussian elimination were known much earlier, the power of the method was not recognized until the great German mathematician Carl Friedrich Gauss used it to compute the orbit of the asteroid Ceres from limited data. What happened was this: On January 1, 1801 the Sicilian astronomer Giuseppe Piazzi (1746–1826) noticed a dim celestial object that he believed might be a “missing planet.” He named the object Ceres and made a limited number of positional observations but then lost the object as it neared the Sun. Gauss undertook the problem of computing the orbit from the limited data using least squares and the procedure that we now call Gaussian elimination. The work of Gauss caused a sensation when Ceres reappeared

a year later in the constellation Virgo at almost the precise position that Gauss predicted! The method was further popularized by the German engineer Wilhelm Jordan in his handbook on geodesy (the science of measuring Earth shapes) entitled Handbuch der Vermessungskunde and published in 1888. [Images: Granger Collection (Gauss); wikipedia (Jordan)]

E X A M P L E 5 Gauss-Jordan Elimination Solve by Gauss-Jordan elimination.

Solution The augmented matrix for the system is

Adding −2 times the first row to the second and fourth rows gives

Multiplying the second row by −1 and then adding −5 times the new second row to the third row and −4 times the new second row to the fourth row gives

Interchanging the third and fourth rows and then multiplying the third row of the resulting matrix by gives the row echelon form

Adding −3 times the third row to the second row and then adding 2 times the second row of the resulting matrix to the first row yields the reduced row echelon form

The corresponding system of equations is

(3)

Note that in constructing the linear system in 3 we ignored the row of zeros in the corresponding augmented matrix. Why is this justified? Solving for the leading variables we obtain

Finally, we express the general solution of the system parametrically by assigning the free variables x2, x4, and x5 arbitrary values r, s, and t, respectively. This yields

Homogeneous Linear Systems A system of linear equations is said to be homogeneous if the constant terms are all zero; that is, the system has the form

a11x1 + a12x2 + … + a1nxn = 0
a21x1 + a22x2 + … + a2nxn = 0
   ⋮
am1x1 + am2x2 + … + amnxn = 0

Every homogeneous system of linear equations is consistent because all such systems have x1 = 0, x2 = 0, …, xn = 0 as a solution. This solution is called the trivial solution; if there are other solutions, they are called nontrivial solutions. Because a homogeneous linear system always has the trivial solution, there are only two possibilities for its solutions:
• The system has only the trivial solution.
• The system has infinitely many solutions in addition to the trivial solution.
In the special case of a homogeneous linear system of two equations in two unknowns, say

a1x + b1y = 0
a2x + b2y = 0

the graphs of the equations are lines through the origin, and the trivial solution corresponds to the point of intersection at the origin (Figure 1.2.1).

Figure 1.2.1 There is one case in which a homogeneous system is assured of having nontrivial solutions—namely, whenever the system involves more unknowns than equations. To see why, consider the following example of four equations in six unknowns.

E X A M P L E 6 A Homogeneous System Use Gauss-Jordan elimination to solve the homogeneous linear system

(4)

Solution Observe first that the coefficients of the unknowns in this system are the same as those in Example 5; that is, the two systems differ only in the constants on the right side. The augmented matrix for the given homogeneous system is

(5)

which is the same as the augmented matrix for the system in Example 5, except for zeros in the last column. Thus, the reduced row echelon form of this matrix will be the same as that of the augmented matrix in Example 5, except for the last column. However, a moment's reflection will make it evident that a column of zeros is not changed by an elementary row operation, so the reduced row echelon form of 5 is

(6)

The corresponding system of equations is

Solving for the leading variables we obtain (7) If we now assign the free variables x2, x4, and x5 arbitrary values r, s, and t, respectively, then we can

express the solution set parametrically as

Note that the trivial solution results when r = s = t = 0.

Free Variables in Homogeneous Linear Systems Example 6 illustrates two important points about solving homogeneous linear systems: 1. Elementary row operations do not alter columns of zeros in a matrix, so the reduced row echelon form of the augmented matrix for a homogeneous linear system has a final column of zeros. This implies that the linear system corresponding to the reduced row echelon form is homogeneous, just like the original system. 2. When we constructed the homogeneous linear system corresponding to augmented matrix 6, we ignored the row of zeros because the corresponding equation does not impose any conditions on the unknowns. Thus, depending on whether or not the reduced row echelon form of the augmented matrix for a homogeneous linear system has any rows of zeros, the linear system corresponding to that reduced row echelon form will either have the same number of equations as the original system or it will have fewer. Now consider a general homogeneous linear system with n unknowns, and suppose that the reduced row echelon form of the augmented matrix has r nonzero rows. Since each nonzero row has a leading 1, and since each leading 1 corresponds to a leading variable, the homogeneous system corresponding to the reduced row echelon form of the augmented matrix must have r leading variables and n − r free variables. Thus, this system is of the form

(8)

where in each equation the remaining expression denotes a sum that involves the free variables, if any [see 7, for example]. In summary, we have the following result.

THEOREM 1.2.1 Free Variable Theorem for Homogeneous Systems If a homogeneous linear system has n unknowns, and if the reduced row echelon form of its augmented matrix has r nonzero rows, then the system has n - r free variables.
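As a small illustration of the theorem, and not code from the text, the count of free variables can be read off an augmented matrix in reduced row echelon form by subtracting the number of nonzero rows from the number of unknowns.

```python
def free_variable_count(rref_augmented, tol=1e-12):
    """n - r, where n is the number of unknowns (columns minus the constants column)
    and r is the number of nonzero rows. Assumes the augmented matrix of a homogeneous
    system, already in reduced row echelon form."""
    n = len(rref_augmented[0]) - 1
    r = sum(any(abs(v) > tol for v in row) for row in rref_augmented)
    return n - r

# Hypothetical reduced row echelon form of a homogeneous system in 4 unknowns:
R = [[1, 0,  2, 0, 0],
     [0, 1, -1, 0, 0],
     [0, 0,  0, 1, 0],
     [0, 0,  0, 0, 0]]
print(free_variable_count(R))   # 1 free variable (the third unknown)
```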

Note that Theorem 1.2.2 applies only to homogeneous systems—a nonhomogeneous system with more unknowns than equations need not be consistent. However, we will prove later that if a nonhomogeneous system with more unknowns than equations is consistent, then it has infinitely many solutions.

Theorem 1.2.1 has an important implication for homogeneous linear systems with more unknowns than equations. Specifically, if a homogeneous linear system has m equations in n unknowns, and if m < n, then it must also be true that r < n (why?). This being the case, the theorem implies that there is at least one free variable, and this implies in turn that the system has infinitely many solutions. Thus, we have the following result.

THEOREM 1.2.2 A homogeneous linear system with more unknowns than equations has infinitely many solutions.

In retrospect, we could have anticipated that the homogeneous system in Example 6 would have infinitely many solutions since it has four equations in six unknowns.
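Theorem 1.2.2 can also be checked numerically: a homogeneous system whose coefficient matrix is wider than it is tall always has nonzero vectors in its null space. The sketch below is an illustration rather than part of the text; it uses a singular value decomposition to produce one nontrivial solution for a hypothetical coefficient matrix of 2 equations in 4 unknowns.

```python
import numpy as np

# Hypothetical coefficient matrix: 2 equations in 4 unknowns (more unknowns than equations).
A = np.array([[1.0, 2.0, -1.0, 3.0],
              [2.0, 4.0,  1.0, 0.0]])

# Any vector in the null space of A solves the homogeneous system Ax = 0. The rows of Vt
# beyond the nonzero singular values span that null space.
_, s, Vt = np.linalg.svd(A)
null_vectors = Vt[len(s[s > 1e-12]):]
x = null_vectors[0]                    # a nontrivial solution
print(x)
print(np.allclose(A @ x, 0))           # True
```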

Gaussian Elimination and Back-Substitution For small linear systems that are solved by hand (such as most of those in this text), Gauss-Jordan elimination (reduction to reduced row echelon form) is a good procedure to use. However, for large linear systems that require a computer solution, it is generally more efficient to use Gaussian elimination (reduction to row echelon form) followed by a technique known as back-substitution to complete the process of solving the system. The next example illustrates this technique.

E X A M P L E 7 Example 5 Solved by Back-Substitution From the computations in Example 5, a row echelon form of the augmented matrix is

To solve the corresponding system of equations

we proceed as follows: Step 1. Solve the equations for the leading variables.

Step 2. Beginning with the bottom equation and working upward, successively substitute each equation into all the equations above it. Substituting

into the second equation yields

Substituting

into the first equation yields

Step 3. Assign arbitrary values to the free variables, if any. If we now assign x2, x4, and x5 the arbitrary values r, s, and t, respectively, the general solution is given by the formulas

This agrees with the solution obtained in Example 5.
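The back-substitution idea is simple to code in the special case in which every variable is a leading variable, so the row echelon form is upper triangular with no free variables. The following sketch uses a hypothetical system, not the one from Example 7, and works from the bottom equation upward.

```python
#   x + 2y +  z = 4
#        y + 3z = 5
#             z = 1
def back_substitute(U, b):
    """Solve an upper-triangular system Ux = b by working from the bottom row up."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - known) / U[i][i]
    return x

U = [[1.0, 2.0, 1.0],
     [0.0, 1.0, 3.0],
     [0.0, 0.0, 1.0]]
b = [4.0, 5.0, 1.0]
print(back_substitute(U, b))   # [-1.0, 2.0, 1.0]  ->  x = -1, y = 2, z = 1
```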

E X A M P L E 8 Suppose that the matrices below are augmented matrices for linear systems in the unknowns x1, x2, x3, and x4. These matrices are all in row echelon form but not reduced row echelon form. Discuss the existence and uniqueness of solutions to the corresponding linear systems.

Solution (a) The last row corresponds to the equation from which it is evident that the system is inconsistent. (b) The last row corresponds to the equation which has no effect on the solution set. In the remaining three equations the variables x1, x2, and x3 correspond to leading 1's and hence are leading variables. The variable x4 is a free variable. With a little algebra, the leading variables can be expressed in terms of the free variable, and the free variable can be assigned an arbitrary value. Thus, the system must have infinitely many solutions. (c) The last row corresponds to the equation which gives us a numerical value for x4. If we substitute this value into the third equation, namely, we obtain . You should now be able to see that if we continue this process and substitute the known values of x3 and x4 into the equation corresponding to the second row, we will obtain a unique numerical value for x2; and if, finally, we substitute the known values of x4, x3, and x2 into the

equation corresponding to the first row, we will produce a unique numerical value for x1. Thus, the system has a unique solution.

Some Facts About Echelon Forms There are three facts about row echelon forms and reduced row echelon forms that are important to know but we will not prove: 1. Every matrix has a unique reduced row echelon form; that is, regardless of whether you use Gauss-Jordan elimination or some other sequence of elementary row operations, the same reduced row echelon form will result in the end.* 2. Row echelon forms are not unique; that is, different sequences of elementary row operations can result in different row echelon forms. 3. Although row echelon forms are not unique, all row echelon forms of a matrix A have the same number of zero rows, and the leading 1's always occur in the same positions in the row echelon forms of A. Those are called the pivot positions of A. A column that contains a pivot position is called a pivot column of A.

E X A M P L E 9 Pivot Positions and Columns Earlier in this section (immediately after Definition 1) we found a row echelon form of

to be

The leading 1's occur in positions (row 1, column 1), (row 2, column 3), and (row 3, column 5). These are the pivot positions. The pivot columns are columns 1, 3, and 5.
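Reading off pivot columns is mechanical once a row echelon form is in hand. The helper below is an illustration, not from the text; it records the column of the first nonzero entry of each nonzero row, and the matrix it is applied to is hypothetical.

```python
def pivot_columns(echelon_matrix, tol=1e-12):
    """Return the pivot columns of a matrix assumed to be in row echelon form."""
    cols = []
    for row in echelon_matrix:
        for j, value in enumerate(row):
            if abs(value) > tol:
                cols.append(j)
                break
    return cols

# Hypothetical row echelon form with leading 1's in columns 0, 2, and 4:
R = [[1, 2, 0, 3, 0],
     [0, 0, 1, 4, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]
print(pivot_columns(R))   # [0, 2, 4]
```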

Roundoff Error and Instability There is often a gap between mathematical theory and its practical implementation—Gauss-Jordan elimination and Gaussian elimination being good examples. The problem is that computers generally approximate numbers, thereby introducing roundoff errors, so unless precautions are taken, successive calculations may degrade an answer to a degree that makes it useless. Algorithms (procedures) in which this happens are called unstable. There are various techniques for minimizing roundoff error and instability. For example, it can be shown that for large linear systems Gauss-Jordan elimination involves roughly 50% more operations than Gaussian elimination, so most computer algorithms are based on the latter method. Some of these matters will be considered in Chapter 9.
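The instability described here is easy to demonstrate. In the sketch below, a standard illustration rather than an example from the text, a tiny pivot makes naive elimination produce a badly wrong answer, while a library routine that interchanges rows to avoid small pivots recovers the expected solution.

```python
import numpy as np

# Hypothetical system in which the tiny pivot 1e-20 forces an enormous multiplier:
#     1e-20 x + y = 1
#           x + y = 2        exact solution is approximately x = 1, y = 1
A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])
b = np.array([1.0, 2.0])

# Naive elimination using the tiny entry as the pivot:
m = A[1, 0] / A[0, 0]                              # multiplier of size 1e20
y = (b[1] - m * b[0]) / (A[1, 1] - m * A[0, 1])
x = (b[0] - A[0, 1] * y) / A[0, 0]
print(x, y)                                        # x is badly wrong (typically 0.0)

# A library solver, which pivots, recovers the expected answer:
print(np.linalg.solve(A, b))                       # approximately [1. 1.]
```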

Concept Review • Reduced row echelon form • Row echelon form • Leading 1 • Leading variables • Free variables • General solution to a linear system • Gaussian elimination • Gauss-Jordan elimination • Forward phase • Backward phase • Homogeneous linear system • Trivial solution • Nontrivial solution • Dimension Theorem for Homogeneous Systems • Back-substitution

Skills • Recognize whether a given matrix is in row echelon form, reduced row echelon form, or neither. • Construct solutions to linear systems whose corresponding augmented matrices are in row echelon form or reduced row echelon form. • Use Gaussian elimination to find the general solution of a linear system. • Use Gauss-Jordan elimination in order to find the general solution of a linear system. • Analyze homogeneous linear systems using the Free Variable Theorem for Homogeneous Systems.

Exercise Set 1.2 1. In each part, determine whether the matrix is in row echelon form, reduced row echelon form, both, or neither. (a)

(b)

(c)

(d)

(e)

(f)

(g)

Answer: (a) Both (b) Both (c) Both (d) Both (e) Both (f) Both (g) Row echelon 2. In each part, determine whether the matrix is in row echelon form, reduced row echelon form, both, or neither. (a)

(b)

(c)

(d)

(e)

(f)

(g) 3. In each part, suppose that the augmented matrix for a system of linear equations has been reduced by row operations to the given reduced row echelon form. Solve the system.

(a)

(b)

(c)

(d)

Answer: (a) (b) (c) (d) Inconsistent 4. In each part, suppose that the augmented matrix for a system of linear equations has been reduced by row operations to the given reduced row echelon form. Solve the system. (a)

(b)

(c)

(d)

In Exercises 5–8, solve the linear system by Gauss-Jordan elimination. 5.

Answer:

6.

7.

Answer:

8.

In Exercises 9–12, solve the linear system by Gaussian elimination. 9. Exercise 5 Answer:

10. Exercise 6 11. Exercise 7 Answer:

12. Exercise 8 In Exercises 13–16, determine whether the homogeneous system has nontrivial solutions by inspection (without pencil and paper). 13.

Answer: Has nontrivial solutions 14.

15.

Answer: Has nontrivial solutions 16.

In Exercises 17–24, solve the given homogeneous linear system by any method. 17.

Answer:

18.

19.

Answer:

20.

21.

Answer:

22.

23.

Answer:

24.

In Exercises 25–28, determine the values of a for which the system has no solutions, exactly one solution, or infinitely

many solutions. 25.

Answer: If , there are infinitely many solutions; if solution.

, there are no solutions; if

, there is exactly one

, there are no solutions; if

, there is exactly one

26.

27.

Answer: If , there are infinitely many solutions; if solution. 28.

In Exercises 29–30, solve the following systems, where a, b, and c are constants. 29.

Answer:

30.

31. Find two different row echelon forms of

This exercise shows that a matrix can have multiple row echelon forms. Answer: and 32. Reduce

are possible answers.

to reduced row echelon form without introducing fractions at any intermediate stage. 33. Show that the following nonlinear system has 18 solutions if

[Hint: Begin by making the substitutions

,

,

, and

, and

.

.]

34. Solve the following system of nonlinear equations for the unknown angles α, β, and γ, where , and .

,

35. Solve the following system of nonlinear equations for x, y, and z.

[Hint: Begin by making the substitutions

,

,

.]

Answer:

36. Solve the following system for x, y, and z.

37. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is the graph of the equation .

Figure Ex-37 Answer:

38. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is given by the equation .

Figure Ex-38 39. If the linear system

has only the trivial solution, what can be said about the solutions of the following system?

Answer: The nonhomogeneous system will have exactly one solution.
40. (a) If A is a  matrix, then what is the maximum possible number of leading 1's in its reduced row echelon form?
(b) If B is a  matrix whose last column has all zeros, then what is the maximum possible number of parameters in the general solution of the linear system with augmented matrix B?
(c) If C is a  matrix, then what is the minimum possible number of rows of zeros in any row echelon form of C?

41. (a) Prove that if , then the reduced row echelon form of

(b) Use the result in part (a) to prove that if , then the linear system

has exactly one solution.
42. Consider the system of equations

Discuss the relative positions of the lines , , and  when (a) the system has only the trivial solution, and (b) the system has nontrivial solutions.

43. Describe all possible reduced row echelon forms of (a)

(b)

True-False Exercises In parts (a)–(i) determine whether the statement is true or false, and justify your answer. (a) If a matrix is in reduced row echelon form, then it is also in row echelon form. Answer: True (b) If an elementary row operation is applied to a matrix that is in row echelon form, the resulting matrix will still be in row echelon form. Answer: False (c) Every matrix has a unique row echelon form. Answer: False (d) A homogeneous linear system in n unknowns whose corresponding augmented matrix has a reduced row echelon form with r leading 1's has n − r free variables. Answer: True (e) All leading 1's in a matrix in row echelon form must occur in different columns. Answer: True (f) If every column of a matrix in row echelon form has a leading 1 then all entries that are not leading 1's are zero. Answer: False (g) If a homogeneous linear system of n equations in n unknowns has a corresponding augmented matrix with a reduced row echelon form containing n leading 1's, then the linear system has only the trivial solution. Answer: True

(h) If the reduced row echelon form of the augmented matrix for a linear system has a row of zeros, then the system must have infinitely many solutions. Answer: False (i) If a linear system has more unknowns than equations, then it must have infinitely many solutions. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

1.3 Matrices and Matrix Operations Rectangular arrays of real numbers arise in contexts other than as augmented matrices for linear systems. In this section we will begin to study matrices as objects in their own right by defining operations of addition, subtraction, and multiplication on them.

Matrix Notation and Terminology In Section 1.2 we used rectangular arrays of numbers, called augmented matrices, to abbreviate systems of linear equations. However, rectangular arrays of numbers occur in other contexts as well. For example, the following rectangular array with three rows and seven columns might describe the number of hours that a student spent studying three subjects during a certain week:

            Mon.  Tues.  Wed.  Thurs.  Fri.  Sat.  Sun.
Math          2     3     2      4      1     4     2
History       0     3     1      4      3     2     2
Language      4     1     3      1      0     0     2

If we suppress the headings, then we are left with the following rectangular array of numbers with three rows and seven columns, called a “matrix”:

More generally, we make the following definition.

DEFINITION 1 A matrix is a rectangular array of numbers. The numbers in the array are called the entries in the matrix.

A matrix with only one column is called a column vector or a column matrix, and a matrix with only one row is called a row vector or a row matrix. In Example 1, the matrix is a column vector, the matrix is a row vector, and the matrix is both a row vector and a column vector.

E X A M P L E 1 Examples of Matrices Some examples of matrices are

The size of a matrix is described in terms of the number of rows (horizontal lines) and columns (vertical lines) it contains. For example, the first matrix in Example 1 has three rows and two columns, so its size is 3 by 2 (written 3 × 2). In a size description, the first number always denotes the number of rows, and the second denotes the number of columns. The remaining matrices in Example 1 have sizes , , , and , respectively. We will use capital letters to denote matrices and lowercase letters to denote numerical quantities; thus we might write

When discussing matrices, it is common to refer to numerical quantities as scalars. Unless stated otherwise, scalars will be real numbers; complex scalars will be considered later in the text. Matrix brackets are often omitted from 1 × 1 matrices, making it impossible to tell, for example, whether the symbol 4 denotes the number "four" or the matrix [4]. This rarely causes problems because it is usually possible to tell which is meant from the context.

The entry that occurs in row i and column j of a matrix A will be denoted by aij. Thus a general 3 × 4 matrix might be written as

and a general m × n matrix as

(1)

When a compact notation is desired, the preceding matrix can be written as [aij]m×n or [aij], the first notation being used when it is important in the discussion to know the size, and the second being used when the size need not be emphasized. Usually, we will match the letter denoting a matrix with the letter denoting its entries; thus, for a matrix B we would generally use bij for the entry in row i and column j, and for a matrix C we would use the notation cij. The entry in row i and column j of a matrix A is also commonly denoted by the symbol (A)ij. Thus, for matrix (1) above, we have

and for the matrix

we have , , and .

Row and column vectors are of special importance, and it is common practice to denote them by boldface lowercase letters rather than capital letters. For such matrices, double subscripting of the entries is unnecessary. Thus a general row vector a and a general column vector b would be written as

A matrix A with n rows and n columns is called a square matrix of order n, and the shaded entries in 2 are said to be on the main diagonal of A.

(2)

Operations on Matrices So far, we have used matrices to abbreviate the work in solving systems of linear equations. For other applications, however, it is desirable to develop an “arithmetic of matrices” in which matrices can be added, subtracted, and multiplied in a useful way. The remainder of this section will be devoted to developing this arithmetic.

DEFINITION 2 Two matrices are defined to be equal if they have the same size and their corresponding entries are equal.

The equality of two matrices of the same size can be expressed either by writing or by writing where it is understood that the equalities hold for all values of i and j.

E X A M P L E 2 Equality of Matrices Consider the matrices

If , then , but for all other values of x the matrices A and B are not equal, since not all of their corresponding entries are equal. There is no value of x for which A = C, since A and C have different sizes.

DEFINITION 3 If A and B are matrices of the same size, then the sum is the matrix obtained by adding the entries of B to the corresponding entries of A, and the difference is the matrix obtained by subtracting the entries of B from the corresponding entries of A. Matrices of different sizes cannot be added or subtracted. In matrix notation, if

and

have the same size, then

E X A M P L E 3 Addition and Subtraction Consider the matrices

Then

The expressions , and  are undefined.

DEFINITION 4 If A is any matrix and c is any scalar, then the product cA is the matrix obtained by multiplying each entry of the matrix A by c. The matrix cA is said to be a scalar multiple of A.

In matrix notation, if

, then

E X A M P L E 4 Scalar Multiples For the matrices

we have

It is common practice to denote (− 1)B by −B.
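The entrywise operations of Definitions 3 and 4 correspond directly to array arithmetic in NumPy. The snippet below is an illustrative sketch; the matrices are made up and are not those of Examples 3 and 4.

```python
import numpy as np

A = np.array([[2, 1, 0],
              [3, 4, 5]])
B = np.array([[1, 1, 1],
              [2, 0, 3]])

print(A + B)        # sum: add corresponding entries
print(A - B)        # difference: subtract corresponding entries
print(4 * A)        # scalar multiple: multiply every entry by 4
print((-1) * B)     # (-1)B, commonly written -B
```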

Thus far we have defined multiplication of a matrix by a scalar but not the multiplication of two matrices. Since matrices are added by adding corresponding entries and subtracted by subtracting corresponding entries, it would seem natural to define multiplication of matrices by multiplying corresponding entries. However, it turns out that such a definition would not be very useful for most problems. Experience has led mathematicians to the following more useful definition of matrix multiplication.

DEFINITION 5 If A is an matrix and B is an matrix, then the product AB is the matrix whose entries are determined as follows: To find the entry in row i and column j of AB, single out row i from the matrix A and column j from the matrix B. Multiply the corresponding entries from the row and column together, and then add up the resulting products.

E X A M P L E 5 Multiplying Matrices Consider the matrices

Since A is a matrix and B is a matrix, the product AB is a matrix. To determine, for example, the entry in row 2 and column 3 of AB, we single out row 2 from A and column 3 from B. Then, as illustrated below, we multiply corresponding entries together and add up these products.

The entry in row 1 and column 4 of AB is computed as follows:

The computations for the remaining entries are

The definition of matrix multiplication requires that the number of columns of the first factor A be the same as the number of rows of the second factor B in order to form the product AB. If this condition is not satisfied, the product is undefined. A convenient way to determine whether a product of two matrices is defined is to write down the size of the first factor and, to the right of it, write down the size of the second factor. If, as in 3, the inside numbers are the same, then the product is defined. The outside numbers then give the size of the product.

(3)
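As a rough illustration of the row-column rule and the "inside numbers" size check, here is a small Python/NumPy sketch. The helper product_size and the sample matrices are assumptions made for this example, not part of the text.

```python
import numpy as np

def product_size(a_shape, b_shape):
    """Return the size of AB, or None if the 'inside numbers' differ."""
    (m, r), (r2, n) = a_shape, b_shape
    return (m, n) if r == r2 else None

A = np.array([[1, 2, 4],
              [2, 6, 0]])          # a 2 x 3 matrix
B = np.array([[4, 1, 4, 3],
              [0, -1, 3, 1],
              [2, 7, 5, 2]])       # a 3 x 4 matrix

print(product_size(A.shape, B.shape))   # (2, 4): inside numbers 3 and 3 match
print(A @ B)                            # row-column rule, computed by NumPy

# Entry in row 2 and column 3 of AB, computed directly from Definition 5:
print(A[1, :] @ B[:, 2])                # 2*4 + 6*3 + 0*5 = 26
```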

Gotthold Eisenstein (1823–1852) Historical Note The concept of matrix multiplication is due to the German mathematician Gotthold Eisenstein, who introduced the idea around 1844 to simplify the process of making substitutions in linear systems. The idea was then expanded on and formalized by Cayley in his Memoir on the Theory of Matrices that was published in 1858. Eisenstein was a pupil of Gauss, who ranked him as the equal of Isaac Newton and Archimedes. However, Eisenstein, suffering from bad health his entire life, died at age 30, so his potential was never realized. [Image: wikipedia]

E X A M P L E 6 Determining Whether a Product Is Defined Suppose that A, B, and C are matrices with the following sizes:

Then by (3), AB is defined and is a  matrix; BC is defined and is a  matrix; and CA is defined and is a  matrix. The products AC, CB, and BA are all undefined.

In general, if A = [aij] is an m × r matrix and B = [bij] is an r × n matrix, then, as illustrated by the shading in (4),

(4)

the entry (AB)ij in row i and column j of AB is given by

(AB)ij = ai1b1j + ai2b2j + ⋯ + airbrj    (5)

Partitioned Matrices A matrix can be subdivided or partitioned into smaller matrices by inserting horizontal and vertical rules between selected rows and columns. For example, the following are three possible partitions of a general matrix A—the first is a partition of A into four submatrices A11, A12, A21, and A22; the second is a partition of A into its row vectors r1, r2, and r3; and the third is a partition of A into its column vectors c1, c2, c3, and c4:

Matrix Multiplication by Columns and by Rows Partitioning has many uses, one of which is for finding particular rows or columns of a matrix product AB without computing the entire product. Specifically, the following formulas, whose proofs are left as exercises, show how individual column vectors of AB can be obtained by partitioning B into column vectors and how individual row vectors of AB can be obtained by partitioning A into row vectors.

(6)

(7)

In words, these formulas state that

jth column vector of AB = A[ jth column vector of B ]    (8)

ith row vector of AB = [ ith row vector of A ]B    (9)
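Formulas (8) and (9) can be checked numerically. In the hedged sketch below, the matrices are the same illustrative pair used above, and NumPy's @ operator stands in for matrix multiplication.

```python
import numpy as np

A = np.array([[1, 2, 4],
              [2, 6, 0]])
B = np.array([[4, 1, 4, 3],
              [0, -1, 3, 1],
              [2, 7, 5, 2]])

# Formula (8): the jth column of AB is A times the jth column of B.
j = 1                                   # second column (index 1)
col_j = A @ B[:, j]
print(col_j)

# Formula (9): the ith row of AB is the ith row of A times B.
i = 0                                   # first row (index 0)
row_i = A[i, :] @ B
print(row_i)

print(np.allclose(col_j, (A @ B)[:, j]), np.allclose(row_i, (A @ B)[i, :]))
```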

E X A M P L E 7 Example 5 Revisited If A and B are the matrices in Example 5, then from 8 the second column vector of AB can be obtained by the computation

and from 9 the first row vector of AB can be obtained by the computation

Matrix Products as Linear Combinations We have discussed three methods for computing a matrix product AB—entry by entry, column by column, and row by row. The following definition provides yet another way of thinking about matrix multiplication.

DEFINITION 6 If A1, A2, …, Ar are matrices of the same size, and if c1, c2, …, cr are scalars, then an expression of the form

c1A1 + c2A2 + ⋯ + crAr

is called a linear combination of A1, A2, …, Ar with coefficients c1, c2, …, cr.

To see how matrix products can be viewed as linear combinations, let A be an m × n matrix and x an n × 1 column vector, say

Then

(10)

This proves the following theorem.

THEOREM 1.3.1 If A is an m × n matrix, and if x is an n × 1 column vector, then the product Ax can be expressed as a linear combination of the column vectors of A in which the coefficients are the entries of x.
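A quick numerical check of Theorem 1.3.1, using made-up values for A and x (not those of the examples):

```python
import numpy as np

A = np.array([[-1, 3, 2],
              [ 1, 2, -3],
              [ 2, 1, -2]])
x = np.array([2, -1, 3])

# Theorem 1.3.1: Ax is x1*(column 1) + x2*(column 2) + x3*(column 3)
combo = x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2]
print(A @ x)       # matrix-vector product
print(combo)       # same vector, built as a linear combination of the columns
```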

E X A M P L E 8 Matrix Products as Linear Combinations The matrix product

can be written as the following linear combination of column vectors

E X A M P L E 9 Columns of a Product AB as Linear Combinations We showed in Example 5 that

It follows from Formula 6 and Theorem 1.3.1 that the j th column vector of AB can be expressed as a linear combination of the column vectors of A in which the coefficients in the linear combination are the entries from the j th column of B. The computations are as follows:

Matrix Form of a Linear System Matrix multiplication has an important application to systems of linear equations. Consider a system of m linear equations in n unknowns:

Since two matrices are equal if and only if their corresponding entries are equal, we can replace the m equations in this system by the single matrix equation

The m × 1 matrix on the left side of this equation can be written as a product to give

If we designate these matrices by A, x, and b, respectively, then the original system of m equations in n unknowns can be replaced by the single matrix equation Ax = b. The matrix A in this equation is called the coefficient matrix of the system. The augmented matrix for the system is obtained by adjoining b to A as the last column; thus the augmented matrix is [A | b].

The vertical bar in [A|b] is a convenient way to separate A from b visually; it has no mathematical significance.
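The passage above translates directly into code: build the coefficient matrix A, the constant column b, and the augmented matrix [A | b]. The system below is an illustrative assumption, and np.linalg.solve is used only to confirm that the matrix equation Ax = b has the expected solution.

```python
import numpy as np

# An illustrative system:
#   x1 + 2x2 +  x3 =  3
#  3x1 -  x2 - 3x3 = -1
#  2x1 + 3x2 +  x3 =  4
A = np.array([[1,  2,  1],
              [3, -1, -3],
              [2,  3,  1]], dtype=float)   # coefficient matrix
b = np.array([3, -1, 4], dtype=float)      # column of constants

augmented = np.column_stack([A, b])        # the augmented matrix [A | b]
print(augmented)

x = np.linalg.solve(A, b)                  # solve Ax = b
print(x, np.allclose(A @ x, b))            # check that Ax reproduces b
```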

Transpose of a Matrix We conclude this section by defining two matrix operations that have no analogs in the arithmetic of real numbers.

DEFINITION 7 If A is any m × n matrix, then the transpose of A, denoted by AT, is defined to be the n × m matrix that results by interchanging the rows and columns of A; that is, the first column of AT is the first row of A, the second column of AT is the second row of A, and so forth.

E X A M P L E 1 0 Some Transposes The following are some examples of matrices and their transposes.

Observe that not only are the columns of AT the rows of A, but the rows of AT are the columns of A. Thus the entry in row i and column j of AT is the entry in row j and column i of A; that is,

(AT)ij = (A)ji    (11)

Note the reversal of the subscripts. In the special case where A is a square matrix, the transpose of A can be obtained by interchanging entries that are symmetrically positioned about the main diagonal. In (12) we see that AT can also be obtained by "reflecting" A about its main diagonal.

(12)

DEFINITION 8 If A is a square matrix, then the trace of A, denoted by tr(A), is defined to be the sum of the entries on the main diagonal of A. The trace of A is undefined if A is not a square matrix.
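Both operations of Definitions 7 and 8 are one-liners in NumPy; the matrices below are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.T)                      # transpose: rows and columns interchanged
print(A.T[2, 0] == A[0, 2])     # (A^T)_ij equals (A)_ji, as in Formula (11)

B = np.array([[-1, 2, 7],
              [ 3, 5, -8],
              [ 1, 2, 4]])
print(np.trace(B))              # trace: sum of main-diagonal entries = -1 + 5 + 4 = 8
```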

James Sylvester (1814–1897)

Arthur Cayley (1821–1895) Historical Note The term matrix was first used by the English mathematician (and lawyer) James Sylvester, who defined the term in 1850 to be an “oblong arrangement of terms.” Sylvester communicated his work on matrices to a fellow English mathematician and lawyer named Arthur Cayley, who then introduced some of the basic operations on matrices in a book entitled Memoir on the Theory of Matrices that was published in 1858. As a matter of interest, Sylvester, who was Jewish, did not get his college degree because he refused to sign a required oath to the Church of England. He was appointed to a chair at the University of Virginia in the United States but resigned after swatting a student with a stick because he was reading a newspaper in class.

Sylvester, thinking he had killed the student, fled back to England on the first available ship. Fortunately, the student was not dead, just in shock! [Images: The Granger Collection, New York]

E X A M P L E 11 Trace of a Matrix The following are examples of matrices and their traces.

In the exercises you will have some practice working with the transpose and trace operations.

Concept Review
• Matrix
• Entries
• Column vector (or column matrix)
• Row vector (or row matrix)
• Square matrix
• Main diagonal
• Equal matrices
• Matrix operations: sum, difference, scalar multiplication
• Linear combination of matrices
• Product of matrices (matrix multiplication)
• Partitioned matrices
• Submatrices
• Row-column method
• Column method
• Row method
• Coefficient matrix of a linear system
• Transpose
• Trace

Skills

• Determine the size of a given matrix.
• Identify the row vectors and column vectors of a given matrix.
• Perform the arithmetic operations of matrix addition, subtraction, scalar multiplication, and multiplication.
• Determine whether the product of two given matrices is defined.
• Compute matrix products using the row-column method, the column method, and the row method.
• Express the product of a matrix and a column vector as a linear combination of the columns of the matrix.
• Express a linear system as a matrix equation, and identify the coefficient matrix.
• Compute the transpose of a matrix.
• Compute the trace of a square matrix.

Exercise Set 1.3 1. Suppose that A, B, C, D, and E are matrices with the following sizes: A

B

C

D

E

In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of the resulting matrix. (a) BA (b) (c) (d) (e) (f) E(AC) (g) ETA (h) Answer: (a) Undefined (b) (c) Undefined (d) Undefined (e) (f) (g) Undefined (h) 2. Suppose that A, B, C, D, and E are matrices with the following sizes:

A

B

C

D

E

In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of the resulting matrix. (a) EA (b) ABT (c) (d) (e) (f) (g) (h) 3. Consider the matrices

In each part, compute the given expression (where possible). (a) (b) (c) 5A (d) (e) (f) (g) (h) (i) tr(D) (j) (k) 4 tr(7B) (l) tr(A) Answer: (a)

(b)

(c)

(d) (e) Undefined (f)

(g)

(h)

(i) 5 (j) (k) 168 (l) Undefined 4. Using the matrices in Exercise 3, in each part compute the given expression (where possible). (a) (b) (c) (d) (e) (f) (g) (h) (i) (CD)E (j) C(BA) (k) tr(DET) (l) tr(BC) 5. Using the matrices in Exercise 3, in each part compute the given expression (where possible). (a) AB (b) BA (c) (3E)D (d) (AB)C (e) A(BC) (f) CCT

(g) (DA)T (h) (i) tr(DDT) (j) (k) (l)

Answer: (a)

(b) Undefined (c)

(d)

(e)

(f) (g) (h)

(i) 61 (j) 35 (k) 28 (l) 99 6. Using the matrices in Exercise 3, in each part compute the given expression (where possible). (a) (b) (c) (d)

(e) (f) 7. Let

Use the row method or column method (as appropriate) to find (a) the first row of AB. (b) the third row of AB. (c) the second column of AB. (d) the first column of BA. (e) the third row of AA. (f) the third column of AA. Answer: (a) (b) (c)

(d)

(e) (f)

8. Referring to the matrices in Exercise 7, use the row method or column method (as appropriate) to find (a) the first column of AB. (b) the third column of BB. (c) the second row of BB. (d) the first column of AA. (e) the third column of AB. (f) the first row of BA. 9. Referring to the matrices A and B in Exercise 7, and Example 9, (a) express each column vector of AA as a linear combination of the column vectors of A. (b) express each column vector of BB as a linear combination of the column vectors of B. Answer:

(a)

(b)

10. Referring to the matrices A and B in Exercise 7, and Example 9, (a) express each column vector of AB as a linear combination of the column vectors of A. (b) express each column vector of BA as a linear combination of the column vectors of B. 11. In each part, find matrices A, x, and b that express the given system of linear equations as a single matrix equation , and write out this matrix equation. (a)

(b)

Answer: (a)

(b)

12. In each part, find matrices A, x, and b that express the given system of linear equations as a single matrix equation , and write out this matrix equation. (a)

(b)

13. In each part, express the matrix equation as a system of linear equations. (a)

(b)

Answer: (a)

(b)

14. In each part, express the matrix equation as a system of linear equations. (a)

(b)

In Exercises 15–16, find all values of k, if any, that satisfy the equation. 15.

Answer: 16.

In Exercises 17–18, solve the matrix equation for a, b, c, and d. 17. Answer:

18.
19. Let A be any  matrix and let 0 be the  matrix each of whose entries is zero. Show that if , then  or .
20. (a) Show that if AB and BA are both defined, then AB and BA are square matrices.
(b) Show that if A is an  matrix and A(BA) is defined, then B is an  matrix.
21. Prove: If A and B are  matrices, then .

22. (a) Show that if A has a row of zeros and B is any matrix for which AB is defined, then AB also has a row of zeros. (b) Find a similar result involving a column of zeros. 23. In each part, find a matrix [aij] that satisfies the stated condition. Make your answers as general as possible by using letters rather than specific numbers for the nonzero entries. (a) (b) (c) (d) Answer: (a)

(b)

(c)

(d)

24. Find the  matrix  whose entries satisfy the stated condition.
(a)
(b)
(c)

25. Consider the function

defined for

matrices x by

, where

Plot f(x) together with x in each case below. How would you describe the action of f? (a) (b) (c) (d)

Answer:

(a)

(b)

(c)

(d)

26. Let I be the  matrix whose entry in row i and column j is

Show that  for every  matrix A.
27. How many  matrices A can you find such that

for all choices of x, y, and z?
Answer: One; namely,
28. How many  matrices A can you find such that

for all choices of x, y, and z?
29. A matrix B is said to be a square root of a matrix A if .
(a) Find two square roots of .
(b) How many different square roots can you find of ?
(c) Do you think that every  matrix has at least one square root? Explain your reasoning.

Answer:
(a)
(b) Four;
30. Let 0 denote a  matrix, each of whose entries is zero.
(a) Is there a  matrix A such that  and ? Justify your answer.
(b) Is there a  matrix A such that  and ? Justify your answer.

True-False Exercises In parts (a)–(o) determine whether the statement is true or false, and justify your answer. (a)

The matrix

has no main diagonal.

Answer: True (b) An

matrix has m column vectors and n row vectors.

Answer: False (c) If A and B are

matrices, then

.

Answer: False (d) The i th row vector of a matrix product AB can be computed by multiplying A by the ith row vector of B. Answer: False (e) For every matrix A, it is true that

.

Answer: True (f) If A and B are square matrices of the same order, then

.

Answer: False (g) If A and B are square matrices of the same order, then

.

Answer: False (h) For every square matrix A, it is true that

.

Answer: True (i) If A is a Answer:

matrix and B is an

matrix such that BTAT is a

matrix, then

and

.

True (j) If A is an

matrix and c is a scalar, then

.

Answer: True (k) If A, B, and C are matrices of the same size such that

, then

.

Answer: True (l) If A, B, and C are square matrices of the same order such that

, then

Answer: False (m) If

is defined, then A and B are square matrices of the same size.

Answer: True (n) If B has a column of zeros, then so does AB if this product is defined. Answer: True (o) If B has a column of zeros, then so does BA if this product is defined. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.


1.4 Inverses; Algebraic Properties of Matrices In this section we will discuss some of the algebraic properties of matrix operations. We will see that many of the basic rules of arithmetic for real numbers hold for matrices, but we will also see that some do not.

Properties of Matrix Addition and Scalar Multiplication The following theorem lists the basic algebraic properties of the matrix operations.

THEOREM 1.4.1 Properties of Matrix Arithmetic Assuming that the sizes of the matrices are such that the indicated operations can be performed, the following rules of matrix arithmetic are valid. (a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) (m)

To prove any of the equalities in this theorem we must show that the matrix on the left side has the same size as that on the right and that the corresponding entries on the two sides are the same. Most of the proofs follow the same pattern, so we will prove part (d) as a sample. The proof of the associative law for multiplication is more complicated than the rest and is outlined in the exercises. There are three basic ways to prove that two matrices of the same size are equal—prove that corresponding entries are the same, prove that corresponding row vectors are the same, or prove that corresponding column vectors are the same.

Proof (d) We must show that A(B + C) and AB + AC have the same size and that corresponding entries are equal. To form A(B + C), the matrices B and C must have the same size, say m × n, and the matrix A must then have m columns, so its size must be of the form r × m. This makes A(B + C) an r × n matrix. It follows that AB + AC is also an r × n matrix and, consequently, A(B + C) and AB + AC have the same size.

Suppose that A = [aij], B = [bij], and C = [cij]. We want to show that corresponding entries of A(B + C) and AB + AC are equal; that is,

[A(B + C)]ij = [AB + AC]ij

for all values of i and j. But from the definitions of matrix addition and matrix multiplication, we have

Remark Although the operations of matrix addition and matrix multiplication were defined for pairs of matrices, associative laws (b) and (c) enable us to denote sums and products of three matrices as A + B + C and ABC without inserting any parentheses. This is justified by the fact that no matter how parentheses are inserted, the associative laws guarantee that the same end result will be obtained. In general, given any sum or any product of matrices, pairs of parentheses can be inserted or deleted anywhere within the expression without affecting the end result.

E X A M P L E 1 Associativity of Matrix Multiplication As an illustration of the associative law for matrix multiplication, consider

Then

Thus

and

so (AB)C = A(BC), as guaranteed by Theorem 1.4.1(c).

Properties of Matrix Multiplication Do not let Theorem 1.4.1 lull you into believing that all laws of real arithmetic carry over to matrix arithmetic. For example, you know that in real arithmetic it is always true that , which is called the commutative law for multiplication. In matrix arithmetic, however, the equality of AB and BA can fail for three possible reasons: 1. AB may be defined and BA may not (for example, if A is

and B is

).

2. AB and BA may both be defined, but they may have different sizes (for example, if A is ).

and B is

3. AB and BA may both be defined and have the same size, but the two matrices may be different (as illustrated in the next example). Do not read too much into Example 2—it does not rule out the possibility that AB and BA may be equal in certain cases, just that they are not equal in all cases. If it so happens that , then we say that AB and BA commute.

E X A M P L E 2 Order Matters in Matrix Multiplication Consider the matrices

Multiplying gives

Thus,

.

Zero Matrices A matrix whose entries are all zero is called a zero matrix. Some examples are

We will denote a zero matrix by 0 unless it is important to specify its size, in which case we will denote the zero matrix by .

It should be evident that if A and 0 are matrices with the same size, then A + 0 = 0 + A = A. Thus, 0 plays the same role in this matrix equation that the number 0 plays in the numerical equation a + 0 = 0 + a = a. The following theorem lists the basic properties of zero matrices. Since the results should be self-evident, we will omit the formal proofs.

THEOREM 1.4.2 Properties of Zero Matrices If c is a scalar, and if the sizes of the matrices are such that the operations can be performed, then:
(a)
(b)
(c)
(d)
(e) If cA = 0, then c = 0 or A = 0.

Since we know that the commutative law of real arithmetic is not valid in matrix arithmetic, it should not be surprising that there are other rules that fail as well. For example, consider the following two laws of real arithmetic:
• If ab = ac and a ≠ 0, then b = c. [The cancellation law]
• If ab = 0, then at least one of the factors on the left is 0.

The next two examples show that these laws are not universally true in matrix arithmetic.

E X A M P L E 3 Failure of the Cancellation Law Consider the matrices

We leave it for you to confirm that

Although , canceling A from both sides of the equation would lead to the incorrect conclusion that . Thus, the cancellation law does not hold, in general, for matrix multiplication.

E X A M P L E 4 A Zero Product with Nonzero Factors

Here are two matrices for which AB = 0, but A ≠ 0 and B ≠ 0:

Identity Matrices A square matrix with 1's on the main diagonal and zeros elsewhere is called an identity matrix. Some examples are

An identity matrix is denoted by the letter I. If it is important to emphasize the size, we will write In for the n × n identity matrix. To explain the role of identity matrices in matrix arithmetic, let us consider the effect of multiplying a general matrix A on each side by an identity matrix. Multiplying on the right by the identity matrix yields

and multiplying on the left by the

identity matrix yields

The same result holds in general; that is, if A is any

matrix, then

Thus, the identity matrices play the same role in these matrix equations that the number 1 plays in the numerical equation . As the next theorem shows, identity matrices arise naturally in studying reduced row echelon forms of square matrices.

THEOREM 1.4.3 If R is the reduced row echelon form of an n × n matrix A, then either R has a row of zeros or R is the identity matrix In.

Proof Suppose that the reduced row echelon form of A is

Either the last row in this matrix consists entirely of zeros or it does not. If not, the matrix contains no zero rows, and consequently each of the n rows has a leading entry of 1. Since these leading 1's occur progressively farther to the right as we move down the matrix, each of these 1's must occur on the main diagonal. Since the other entries in the same column as one of these 1's are zero, R must be In. Thus, either R has a row of zeros or .

Inverse of a Matrix In real arithmetic every nonzero number a has a reciprocal a⁻¹ = 1/a with the property

a a⁻¹ = a⁻¹ a = 1

The number a⁻¹ is sometimes called the multiplicative inverse of a. Our next objective is to develop an analog of this result for matrix arithmetic. For this purpose we make the following definition.

DEFINITION 1 If A is a square matrix, and if a matrix B of the same size can be found such that AB = BA = I, then A is said to be invertible (or nonsingular) and B is called an inverse of A. If no such matrix B can be found, then A is said to be singular.

Remark The relationship AB = BA = I is not changed by interchanging A and B, so if A is invertible and B is an inverse of A, then it is also true that B is invertible, and A is an inverse of B. Thus, when AB = BA = I we say that A and B are inverses of one another.

E X A M P L E 5 An Invertible Matrix Let

Then

Thus, A and B are invertible and each is an inverse of the other.

E X A M P L E 6 Class of Singular Matrices In general, a square matrix with a row or column of zeros is singular. To help understand why this is so, consider the matrix

To prove that A is singular we must show that there is no matrix B such that . For this purpose let be the column vectors of A. Thus, for any matrix B we can express the product BA as The column of zeros shows that

and hence that A is singular.

Properties of Inverses It is reasonable to ask whether an invertible matrix can have more than one inverse. The next theorem shows that the answer is no—an invertible matrix has exactly one inverse.

THEOREM 1.4.4 If B and C are both inverses of the matrix A, then B = C.

Proof Since B is an inverse of A, we have BA = I. Multiplying both sides on the right by C gives (BA)C = IC = C. But it is also true that (BA)C = B(AC) = BI = B, so B = C.

As a consequence of this important result, we can now speak of "the" inverse of an invertible matrix. If A is invertible, then its inverse will be denoted by the symbol A⁻¹. Thus,

AA⁻¹ = I  and  A⁻¹A = I    (1)

The inverse of A plays much the same role in matrix arithmetic that the reciprocal a⁻¹ plays in the numerical relationships aa⁻¹ = 1 and a⁻¹a = 1.

In the next section we will develop a method for computing the inverse of an invertible matrix of any size. For now we give the following theorem that specifies conditions under which a 2 × 2 matrix is invertible and provides a simple formula for its inverse.

THEOREM 1.4.5 The matrix

A = [ a  b ]
    [ c  d ]

is invertible if and only if ad − bc ≠ 0, in which case the inverse is given by the formula

A⁻¹ = 1/(ad − bc) [  d  −b ]
                  [ −c   a ]    (2)

We will omit the proof, because we will study a more general version of this theorem later. For now, you should at least confirm the validity of Formula 2 by showing that .

Historical Note The formula for given in Theorem 1.4.5 first appeared (in a more general form) in Arthur Cayley's 1858 Memoir on the Theory of Matrices. The more general result that Cayley discovered will be studied later.

The quantity ad − bc in Theorem 1.4.5 is called the determinant of the 2 × 2 matrix A and is denoted by det(A) or alternatively by

Remark Figure 1.4.1 illustrates that the determinant of a matrix A is the product of the entries on its main diagonal minus the product of the entries off its main diagonal. In words, Theorem 1.4.5 states that a matrix A is invertible if and only if its determinant is nonzero, and if invertible, then its inverse can be obtained by interchanging its diagonal entries, reversing the signs of its off-diagonal entries, and multiplying the entries by the reciprocal of the determinant of A.

Figure 1.4.1
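The formula of Theorem 1.4.5 is short enough to code directly. In this sketch the function name inverse_2x2 and the sample matrix are assumptions; the check at the end multiplies A by the computed inverse to confirm that the product is the identity.

```python
import numpy as np

def inverse_2x2(A):
    """Theorem 1.4.5: invert a 2 x 2 matrix when its determinant ad - bc is nonzero."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0, so the matrix is singular")
    return (1.0 / det) * np.array([[ d, -b],
                                   [-c,  a]])

A = np.array([[6, 1],
              [5, 2]])
A_inv = inverse_2x2(A)
print(A_inv)
print(A @ A_inv)     # should be (close to) the 2 x 2 identity matrix
```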

E X A M P L E 7 Calculating the Inverse of a 2 × 2 Matrix In each part, determine whether the matrix is invertible. If so, find its inverse. (a) (b)

Solution (a) The determinant of A is , which is nonzero. Thus, A is invertible, and its inverse is

We leave it for you to confirm that AA⁻¹ = A⁻¹A = I.

(b) The matrix is not invertible since .

E X A M P L E 8 Solution of a Linear System by Matrix Inversion A problem that arises in many applications is to solve a pair of equations of the form

for x and y in terms of u and v. One approach is to treat this as a linear system of two equations in the unknowns x and y and use Gauss–Jordan elimination to solve for x and y. However, because the coefficients of the unknowns are literal rather than numerical, this procedure is a little clumsy. As an alternative approach, let us replace the two equations by the single matrix equation

which we can rewrite as

If we assume that the matrix is invertible (i.e., the left by the inverse and rewrite the equation as

), then we can multiply through on

which simplifies to

Using Theorem 1.4.5, we can rewrite this equation as

from which we obtain

The next theorem is concerned with inverses of matrix products.

THEOREM 1.4.6 If A and B are invertible matrices with the same size, then AB is invertible and

(AB)⁻¹ = B⁻¹A⁻¹

Proof We can establish the invertibility and obtain the stated formula at the same time by showing that

(AB)(B⁻¹A⁻¹) = I  and  (B⁻¹A⁻¹)(AB) = I

But (AB)(B⁻¹A⁻¹) = A(BB⁻¹)A⁻¹ = AIA⁻¹ = AA⁻¹ = I, and similarly, (B⁻¹A⁻¹)(AB) = I.

Although we will not prove it, this result can be extended to three or more factors: A product of any number of invertible matrices is invertible, and the inverse of the product is the product of the inverses in the reverse order.

E X A M P L E 9 The Inverse of a Product

Consider the matrices

We leave it for you to show that

and also that

Thus,

as guaranteed by Theorem 1.4.6.

Powers of a Matrix If A is a square matrix, then we define the nonnegative integer powers of A to be

and if A is invertible, then we define the negative integer powers of A to be

Because these definitions parallel those for real numbers, the usual laws of nonnegative exponents hold; for example,

If a product of matrices is singular, then at least one of the factors must be singular. Why?

In addition, we have the following properties of negative exponents.

THEOREM 1.4.7 If A is invertible and n is a nonnegative integer, then:
(a) A⁻¹ is invertible and (A⁻¹)⁻¹ = A.
(b) Aⁿ is invertible and (Aⁿ)⁻¹ = A⁻ⁿ = (A⁻¹)ⁿ.
(c) kA is invertible for any nonzero scalar k, and (kA)⁻¹ = k⁻¹A⁻¹.

We will prove part (c) and leave the proofs of parts (a) and (b) as exercises. Proof (c) Properties (c) and (m) in Theorem 1.4.1 imply that

and similarly,

. Thus, kA is invertible and

.

E X A M P L E 1 0 Properties of Exponents Let A and

be the matrices in Example 9; that is,

Then

Also,

so, as expected from Theorem 1.4.7(b),

E X A M P L E 11 The Square of a Matrix Sum In real arithmetic, where we have a commutative law for multiplication, we can write

However, in matrix arithmetic, where we have no commutative law for multiplication, the best we can do is to write

It is only in the special case where A and B commute (i.e., AB = BA) that we can go a step further and write

Matrix Polynomials If A is a square matrix, say n × n, and if

p(x) = a0 + a1x + a2x² + ⋯ + amx^m

is any polynomial, then we define the n × n matrix p(A) to be

p(A) = a0I + a1A + a2A² + ⋯ + amA^m    (3)

where I is the n × n identity matrix; that is, p(A) is obtained by substituting A for x and replacing the constant term a0 by the matrix a0I. An expression of form (3) is called a matrix polynomial in A.
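Evaluating a matrix polynomial amounts to accumulating powers of A. The sketch below is illustrative (the function name and sample matrix are assumptions, not taken from Example 12); it evaluates p(A) for p(x) = x² − 5x + 6, and for the matrix chosen p(A) happens to be the zero matrix.

```python
import numpy as np

def matrix_polynomial(coeffs, A):
    """Evaluate p(A) = a0*I + a1*A + a2*A^2 + ... for coefficients [a0, a1, a2, ...]."""
    n = A.shape[0]
    result = np.zeros((n, n))
    power = np.eye(n)                 # A^0 = I
    for a in coeffs:
        result = result + a * power
        power = power @ A             # next power of A
    return result

A = np.array([[2, 0],
              [1, 3]], dtype=float)
# p(x) = x^2 - 5x + 6, i.e. coefficients a0 = 6, a1 = -5, a2 = 1
print(matrix_polynomial([6, -5, 1], A))
```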

E X A M P L E 1 2 A Matrix Polynomial Find

for

Solution

or more briefly,

.

Remark It follows from the laws of exponents that powers of a square matrix commute, and since a matrix polynomial in A is built up from powers of A, any two matrix polynomials in A also commute; that is, for any polynomials p1 and p2 we have

(4)

Properties of the Transpose

The following theorem lists the main properties of the transpose.

THEOREM 1.4.8 If the sizes of the matrices are such that the stated operations can be performed, then:
(a) (AT)T = A
(b) (A + B)T = AT + BT
(c) (A − B)T = AT − BT
(d) (kA)T = kAT
(e) (AB)T = BTAT

If you keep in mind that transposing a matrix interchanges its rows and columns, then you should have little trouble visualizing the results in parts (a)–(d). For example, part (a) states the obvious fact that interchanging rows and columns twice leaves a matrix unchanged; and part (b) states that adding two matrices and then interchanging the rows and columns produces the same result as interchanging the rows and columns before adding. We will omit the formal proofs. Part (e) is less obvious, but for brevity we will omit its proof as well. The result in that part can be extended to three or more factors and restated as: The transpose of a product of any number of matrices is the product of the transposes in the reverse order.

The following theorem establishes a relationship between the inverse of a matrix and the inverse of its transpose.

THEOREM 1.4.9 If A is an invertible matrix, then AT is also invertible and

(AT)⁻¹ = (A⁻¹)T

Proof We can establish the invertibility and obtain the formula at the same time by showing that

But from part (e) of Theorem 1.4.8 and the fact that

, we have

which completes the proof.

E X A M P L E 1 3 Inverse of a Transpose Consider a general

invertible matrix and its transpose:

Since A is invertible, its determinant

is nonzero. But the determinant of AT is also

(verify), so AT is also invertible. It follows from Theorem 1.4.5 that

which is the same matrix that results if A⁻¹ is transposed (verify). Thus,

(AT)⁻¹ = (A⁻¹)T

as guaranteed by Theorem 1.4.9.

Concept Review
• Commutative law for matrix addition
• Associative law for matrix addition
• Associative law for matrix multiplication
• Left and right distributive laws
• Zero matrix
• Identity matrix
• Inverse of a matrix
• Invertible matrix
• Nonsingular matrix
• Singular matrix
• Determinant
• Power of a matrix
• Matrix polynomial

Skills
• Know the arithmetic properties of matrix operations.
• Be able to prove arithmetic properties of matrices.
• Know the properties of zero matrices.
• Know the properties of identity matrices.
• Be able to recognize when two square matrices are inverses of each other.
• Be able to determine whether a 2 × 2 matrix is invertible.
• Be able to solve a linear system of two equations in two unknowns whose coefficient matrix is invertible.
• Be able to prove basic properties involving invertible matrices.
• Know the properties of the matrix transpose and its relationship with invertible matrices.

Exercise Set 1.4 1. Let

Show that (a) (b) (c) (d) 2. Using the matrices and scalars in Exercise 1, verify that (a) (b) (c) (d) 3. Using the matrices and scalars in Exercise 1, verify that (a) (b) (c) (d)

In Exercises 4–7 use Theorem 1.4.5 to compute the inverses of the following matrices. 4. 5. Answer:

6. 7. Answer:

8. Find the inverse of

9. Find the inverse of

Answer:

10. Use the matrix A in Exercise 4 to verify that

.

11. Use the matrix B in Exercise 5 to verify that

.

12. Use the matrices A and B in 4 and 5 to verify that

.

13. Use the matrices A, B, and C in Exercises 4–6 to verify that In Exercises 14–17, use the given information to find A. 14. 15.

Answer:

16. 17.

Answer:

18. Let A be the matrix

In each part, compute the given quantity. (a) (b) (c) (d)

, where

(e)

, where

(f)

, where

19. Repeat Exercise 18 for the matrix

Answer:

.

(a) (b) (c) (d) (e) (f) 20. Repeat Exercise 18 for the matrix

21. Repeat Exercise 18 for the matrix

Answer: (a)

(b)

(c)

(d)

(e)

(f)

In Exercises 22–24, let

, and

. Show that

for the given matrix. 22. The matrix A in Exercise 18. 23. The matrix A in Exercise 21. 24. An arbitrary square matrix A. 25. Show that if

then

and

.

26. Show that if

then

and

.

27. Consider the matrix

where

. Show that A is invertible and find its inverse.

Answer:

28. Show that if a square matrix A satisfies

, then

29. (a) Show that a matrix with a row of zeros cannot have an inverse. (b) Show that a matrix with a column of zeros cannot have an inverse. 30. Assuming that all matrices are

and invertible, solve for D.

.

31. Assuming that all matrices are

and invertible, solve for D.

Answer:

32. If A is a square matrix and n is a positive integer, is it true that

? Justify your answer.

33. Simplify:

Answer:

34. Simplify:

In Exercises 35–37, determine whether A is invertible, and if so, find the inverse. [Hint: Solve by equating corresponding entries on the two sides.] 35.

Answer:

36.

37.

Answer:

for X

38. Prove Theorem 1.4.2. In Exercises 39–42, use the method of Example 8 to find the unique solution of the given linear system. 39.

Answer:

40. 41.

Answer:

42. 43. Prove part (a) of Theorem 1.4.1. 44. Prove part (c) of Theorem 1.4.1. 45. Prove part (f) of Theorem 1.4.1. 46. Prove part (b) of Theorem 1.4.2. 47. Prove part (c) of Theorem 1.4.2. 48. Verify Formula 4 in the text by a direct calculation. 49. Prove part (d) of Theorem 1.4.8. 50. Prove part (e) of Theorem 1.4.8. 51. (a) Show that if A is invertible and

, then

.

(b) Explain why part (a) and Example 3 do not contradict one another. 52. Show that if A is invertible and k is any nonzero scalar, then 53. (a) Show that if A, B, and

for all integer values of n.

are invertible matrices with the same size, then

(b) What does the result in part (a) tell you about the matrix

?

54. A square matrix A is said to be idempotent if (a) Show that if A is idempotent, then so is (b) Show that if A is idempotent, then 55. Show that if A is a square matrix such that invertible and

. . is invertible and is its own inverse. for some positive integer k, then the matrix A is

True-False Exercises In parts (a)–(k) determine whether the statement is true or false, and justify your answer. (a) Two

matrices, A and B, are inverses of one another if and only if

.

Answer: False (b) For all square matrices A and B of the same size, it is true that

.

Answer: False (c) For all square matrices A and B of the same size, it is true that

.

Answer: False (d) If A and B are invertible matrices of the same size, then AB is invertible and

.

Answer: False (e) If A and B are matrices such that AB is defined, then it is true that Answer: False (f) The matrix

is invertible if and only if Answer: True

.

.

(g) If A and B are matrices of the same size and k is a constant, then Answer: True (h) If A is an invertible matrix, then so is

.

Answer: True (i) If

and I is an identity matrix, then .

Answer: False (j) A square matrix containing a row or column of zeros cannot be invertible. Answer: True (k) The sum of two invertible matrices of the same size must be invertible. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.


1.5 Elementary Matrices and a Method for Finding A⁻¹ In this section we will develop an algorithm for finding the inverse of a matrix, and we will discuss some of the basic properties of invertible matrices.
In Section 1.1 we defined three elementary row operations on a matrix A:
1. Multiply a row by a nonzero constant c.
2. Interchange two rows.
3. Add a constant c times one row to another.
It should be evident that if we let B be the matrix that results from A by performing one of the operations in this list, then the matrix A can be recovered from B by performing the corresponding operation in the following list:
1. Multiply the same row by 1/c.
2. Interchange the same two rows.
3. If B resulted by adding c times row r1 of A to row r2, then add −c times r1 to r2.
It follows that if B is obtained from A by performing a sequence of elementary row operations, then there is a second sequence of elementary row operations, which when applied to B recovers A (Exercise 43). Accordingly, we make the following definition.

DEFINITION 1 Matrices A and B are said to be row equivalent if either (hence each) can be obtained from the other by a sequence of elementary row operations.

Our next goal is to show how matrix multiplication can be used to carry out an elementary row operation.

DEFINITION 2 An n × n matrix is called an elementary matrix if it can be obtained from the n × n identity matrix In by performing a single elementary row operation.

E X A M P L E 1 Elementary Matrices and Row Operations Listed below are four elementary matrices and the operations that produce them.

The following theorem, whose proof is left as an exercise, shows that when a matrix A is multiplied on the left by an elementary matrix E, the effect is to perform an elementary row operation on A.

THEOREM 1.5.1 Row Operations by Matrix Multiplication If the elementary matrix E results from performing a certain row operation on Im and if A is an m × n matrix, then the product EA is the matrix that results when this same row operation is performed on A.

E X A M P L E 2 Using Elementary Matrices Consider the matrix

and consider the elementary matrix

which results from adding 3 times the first row of

to the third row. The product EA is

which is precisely the same matrix that results when we add 3 times the first row of A to the third row. Theorem 1.5.1 will be a useful tool for developing new results about matrices, but as a practical matter it is usually preferable to perform row operations directly.
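Theorem 1.5.1 is easy to verify numerically: build E by applying a row operation to an identity matrix, then compare EA with the result of applying the same operation to A directly. The matrices below are illustrative choices mirroring the "add 3 times row 1 to row 3" operation, not necessarily those of Example 2.

```python
import numpy as np

A = np.array([[1, 0, 2, 3],
              [2, -1, 3, 6],
              [1, 4, 4, 0]])

E = np.eye(3)
E[2, 0] = 3          # elementary matrix: add 3 times row 1 of I3 to row 3

print(E @ A)         # same result as performing the row operation directly on A

A_direct = A.astype(float).copy()
A_direct[2] = A_direct[2] + 3 * A_direct[0]
print(np.array_equal(E @ A, A_direct))   # True
```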

We know from the discussion at the beginning of this section that if E is an elementary matrix that results from performing an elementary row operation on an identity matrix I, then there is a second elementary row operation, which when applied to E, produces I back again. Table 1 lists these operations. The operations on the right side of the table are called the inverse operations of the corresponding operations on the left.

Table 1
Row Operation on I That Produces E     Row Operation on E That Reproduces I
Multiply row i by c ≠ 0                Multiply row i by 1/c
Interchange rows i and j               Interchange rows i and j
Add c times row i to row j             Add −c times row i to row j

E X A M P L E 3 Row Operations and Inverse Row Operations In each of the following, an elementary row operation is applied to the identity matrix to obtain an elementary matrix E, then E is restored to the identity matrix by applying the inverse row operation.

The next theorem is a key result about invertibility of elementary matrices. It will be a building block for many results that follow.

THEOREM 1.5.2 Every elementary matrix is invertible, and the inverse is also an elementary matrix.

Proof If E is an elementary matrix, then E results by performing some row operation on I. Let E0 be the matrix that results when the inverse of this operation is performed on I. Applying Theorem 1.5.1 and using the fact that inverse row operations cancel the effect of each other, it follows that

E0E = I  and  EE0 = I

Thus, the elementary matrix E0 is the inverse of E.

Equivalence Theorem One of our objectives as we progress through this text is to show how seemingly diverse ideas in linear algebra are related. The following theorem, which relates results we have obtained about invertibility of matrices, homogeneous linear systems, reduced row echelon forms, and elementary matrices, is our first step in that direction. As we study new topics, more statements will be added to this theorem.

THEOREM 1.5.3 Equivalent Statements If A is an n × n matrix, then the following statements are equivalent, that is, all true or all false.
(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row echelon form of A is In.
(d) A is expressible as a product of elementary matrices.

The logic of our proof of Theorem 1.5.3 may be made more apparent by writing the implications

(a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a)

This makes it evident visually that the validity of any one statement implies the validity of all the others, and hence that the falsity of any one implies the falsity of the others.

Proof We will prove the equivalence by establishing the chain of implications: (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a).

(a) ⇒ (b) Assume A is invertible and let x0 be any solution of Ax = 0. Multiplying both sides of this equation by the matrix A⁻¹ gives A⁻¹(Ax0) = A⁻¹0, or (A⁻¹A)x0 = 0, or Ix0 = 0, or x0 = 0. Thus, Ax = 0 has only the trivial solution.

(b) ⇒ (c) Let Ax = 0 be the matrix form of the system

(1)

and assume that the system has only the trivial solution. If we solve by Gauss-Jordan elimination, then the system of equations corresponding to the reduced row echelon form of the augmented matrix will be

(2)

Thus the augmented matrix

for 1 can be reduced to the augmented matrix

for 2 by a sequence of elementary row operations. If we disregard the last column (all zeros) in each of these matrices, we can conclude that the reduced row echelon form of A is . Assume that the reduced row echelon form of A is , so that A can be reduced to by a finite sequence of elementary row operations. By Theorem 1.5.1, each of these operations can be accomplished by multiplying on the left by an appropriate elementary matrix. Thus we can find elementary matrices such that (3)

By Theorem 1.5.2, by

are invertible. Multiplying both sides of Equation 3 on the left successively we obtain (4)

By Theorem 1.5.2, this equation expresses A as a product of elementary matrices. If A is a product of elementary matrices, then from Theorem 1.4.7 and Theorem 1.5.2, the matrix A is a product of invertible matrices and hence is invertible.

A Method for Inverting Matrices As a first application of Theorem 1.5.3, we will develop a procedure (or algorithm) that can be used to tell whether a given matrix is invertible, and if so, produce its inverse. To derive this algorithm, assume for the moment that A is an invertible n × n matrix. In Equation 3, the elementary matrices execute a sequence of row operations that reduce A to In. If we multiply both sides of this equation on the right by A⁻¹ and simplify, we obtain

But this equation tells us that the same sequence of row operations that reduces A to In will transform In to A⁻¹. Thus, we have established the following result.

Inversion Algorithm To find the inverse of an invertible matrix A, find a sequence of elementary row operations that reduces A to the identity and then perform that same sequence of operations on In to obtain A⁻¹.

A simple method for carrying out this procedure is given in the following example.

E X A M P L E 4 Using Row Operations to Find A−1 Find the inverse of

Solution We want to reduce A to the identity matrix by row operations and simultaneously apply these operations to I to produce A⁻¹. To accomplish this we will adjoin the identity matrix to the right side of A, thereby producing a partitioned matrix of the form [A | I]. Then we will apply row operations to this matrix until the left side is reduced to I; these operations will convert the right side to A⁻¹, so the final matrix will have the form [I | A⁻¹].

The computations are as follows:

Thus,

Often it will not be known in advance if a given matrix A is invertible. However, if it is not, then by parts (a) and (c) of Theorem 1.5.3 it will be impossible to reduce A to by elementary row operations. This will be signaled by a row of zeros appearing on the left side of the partition at some stage of the inversion algorithm. If this occurs, then you can stop the computations and conclude that A is not invertible.

E X A M P L E 5 Showing That a Matrix Is Not Invertible Consider the matrix

Applying the procedure of Example 4 yields

Since we have obtained a row of zeros on the left side, A is not invertible.
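The inversion algorithm can be sketched in a few lines of code: adjoin I to A and row reduce. The sketch below is an illustration, not the book's procedure verbatim (it adds partial pivoting, which the text does not discuss, for numerical stability), and the two test matrices are simply an invertible example and a singular one.

```python
import numpy as np

def invert_by_row_reduction(A, tol=1e-12):
    """Row reduce [A | I]; if the left block becomes I, the right block is the inverse.
    A zero pivot (a row of zeros on the left) means A is not invertible."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])               # the partitioned matrix [A | I]
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if abs(M[pivot, col]) < tol:
            return None                         # singular: a zero row appears on the left
        M[[col, pivot]] = M[[pivot, col]]       # interchange rows
        M[col] /= M[col, col]                   # make the leading entry 1
        for r in range(n):
            if r != col:
                M[r] -= M[r, col] * M[col]      # clear the rest of the column
    return M[:, n:]                             # right half is the inverse

A = np.array([[1, 2, 3],
              [2, 5, 3],
              [1, 0, 8]])
print(invert_by_row_reduction(A))               # compare with np.linalg.inv(A)

B = np.array([[1, 6, 4],
              [2, 4, -1],
              [-1, 2, 5]])
print(invert_by_row_reduction(B))               # None: B is not invertible
```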

E X A M P L E 6 Analyzing Homogeneous Systems Use Theorem 1.5.3 to determine whether the given homogeneous system has nontrivial solutions. (a)

(b)

Solution From parts (a) and (b) of Theorem 1.5.3 a homogeneous linear system has only the trivial solution if and only if its coefficient matrix is invertible. From Example 4 and Example 5 the coefficient matrix of system (a) is invertible and that of system (b) is not. Thus, system (a) has only the trivial solution whereas system (b) has nontrivial solutions.
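Theorem 1.5.3 also gives a quick computational test for Example 6-style questions: a homogeneous square system has only the trivial solution exactly when its coefficient matrix is invertible, which by statement (c) means the matrix has full rank. The helper below and its sample matrices are illustrative assumptions.

```python
import numpy as np

def has_only_trivial_solution(A):
    """Theorem 1.5.3 (a)-(c): Ax = 0 has only the trivial solution exactly when
    the square coefficient matrix A is invertible, i.e., has full rank."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    return np.linalg.matrix_rank(A) == n

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]      # invertible coefficient matrix
B = [[1, 6, 4], [2, 4, -1], [-1, 2, 5]]    # singular coefficient matrix

print(has_only_trivial_solution(A))   # True: only the trivial solution
print(has_only_trivial_solution(B))   # False: nontrivial solutions exist
```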

Concept Review
• Row equivalent matrices
• Elementary matrix
• Inverse operations
• Inversion algorithm

Skills
• Determine whether a given square matrix is an elementary matrix.
• Determine whether two square matrices are row equivalent.
• Apply the inverse of a given elementary row operation to a matrix.
• Apply elementary row operations to reduce a given square matrix to the identity matrix.
• Understand the relationships between statements that are equivalent to the invertibility of a square matrix (Theorem 1.5.3).
• Use the inversion algorithm to find the inverse of an invertible matrix.
• Express an invertible matrix as a product of elementary matrices.

Exercise Set 1.5 1. Decide whether each matrix below is an elementary matrix. (a) (b) (c)

(d)

Answer: (a) Elementary (b) Not elementary (c) Not elementary (d) Not elementary 2. Decide whether each matrix below is an elementary matrix. (a) (b)

(c)

(d)

3. Find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to

the identity matrix. (a) (b)

(c)

(d)

Answer: (a)

Add 3 times row 2 to row 1:

(b) Multiply row 1 by

:

(c) Add 5 times row 1 to row 3: (d) Swap rows 1 and 3:

4. Find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to the identity matrix. (a) (b)

(c)

(d)

5. In each part, an elementary matrix E and a matrix A are given. Write down the row operation corresponding to E and show that the product EA results from applying the row operation to A. (a) (b)

(c)

Answer: (a)

Swap rows 1 and 2:

(b) Add

times row 2 to row 3:

(c) Add 4 times row 3 to row 1: 6. In each part, an elementary matrix E and a matrix A are given. Write down the row operation corresponding to E and show that the product EA results from applying the row operation to A. (a) (b)

(c)

In Exercises 7–8, use the following matrices.

7. Find an elementary matrix E that satisfies the equation. (a) (b) (c) (d) Answer: (a)

(b)

(c)

(d)

8. Find an elementary matrix E that satisfies the equation. (a) (b) (c) (d) In Exercises 9–24, use the inversion algorithm to find the inverse of the given matrix, if the inverse exists. 9. Answer:

10. 11. Answer:

12. 13.

Answer:

14.

15.

Answer: No inverse 16.

17.

Answer:

18.

19.

Answer:

20.

21.

Answer:

22.

23.

Answer:

24.

In Exercises 25–26, find the inverse of each of the following all nonzero. 25. (a)

(b)

Answer: (a)

matrices, where

, and k are

(b)

26. (a)

(b)

In Exercise 27–Exercise 28, find all values of c, if any, for which the given matrix is invertible. 27.

Answer:

28.

In Exercises 29–32, write the given matrix as a product of elementary matrices. 29. Answer:

30. 31.

Answer:

32.

In Exercises 33–36, write the inverse of the given matrix as a product of elementary matrices. 33. The matrix in Exercise 29. Answer:

34. The matrix in Exercise 30. 35. The matrix in Exercise 31. Answer:

36. The matrix in Exercise 32. In Exercises 37–38, show that the given matrices A and B are row equivalent, and find a sequence of elementary row operations that produces B from A. 37.

Answer: Add times the first row to the second row. Add times the first row to the third row. Add the second row to the first row. Add the second row to the third row. 38.

39. Show that if

is an elementary matrix, then at least one entry in the third row must be a zero.

times

40. Show that

is not invertible for any values of the entries. 41. Prove that if A and B are matrices, then A and B are row equivalent if and only if A and B have the same reduced row echelon form. 42. Prove that if A is an invertible matrix and B is row equivalent to A, then B is also invertible. 43. Show that if B is obtained from A by performing a sequence of elementary row operations, then there is a second sequence of elementary row operations, which when applied to B recovers A.

True-False Exercises In parts (a)–(g) determine whether the statement is true or false, and justify your answer. (a) The product of two elementary matrices of the same size must be an elementary matrix. Answer: False (b) Every elementary matrix is invertible. Answer: True (c) If A and B are row equivalent, and if B and C are row equivalent, then A and C are row equivalent. Answer: True (d) If A is an

matrix that is not invertible, then the linear system Ax = 0 has infinitely many solutions.

Answer: True (e) If A is an matrix that is not invertible, then the matrix obtained by interchanging two rows of A cannot be invertible. Answer: True

(f) If A is invertible and a multiple of the first row of A is added to the second row, then the resulting matrix is invertible. Answer: True (g) An expression of the invertible matrix A as a product of elementary matrices is unique. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

1.6 More on Linear Systems and Invertible Matrices In this section we will show how the inverse of a matrix can be used to solve a linear system and we will develop some more results about invertible matrices.

Number of Solutions of a Linear System In Section 1.1 we made the statement (based on Figures 1.1.1 and 1.1.2) that every linear system has either no solutions, has exactly one solution, or has infinitely many solutions. We are now in a position to prove this fundamental result.

THEOREM 1.6.1 A system of linear equations has zero, one, or infinitely many solutions. There are no other possibilities.

Proof If Ax = b is a system of linear equations, exactly one of the following is true: (a) the system has no solutions, (b) the system has exactly one solution, or (c) the system has more than one solution. The proof will be complete if we can show that the system has infinitely many solutions in case (c). Assume that Ax = b has more than one solution, and let x0 = x1 − x2, where x1 and x2 are any two distinct solutions. Because x1 and x2 are distinct, the matrix x0 is nonzero; moreover,

Ax0 = A(x1 − x2) = Ax1 − Ax2 = b − b = 0

If we now let k be any scalar, then

A(x1 + kx0) = Ax1 + A(kx0) = Ax1 + k(Ax0) = b + k0 = b + 0 = b

But this says that x1 + kx0 is a solution of Ax = b. Since x0 is nonzero and there are infinitely many choices for k, the system Ax = b has infinitely many solutions.

Solving Linear Systems by Matrix Inversion Thus far we have studied two procedures for solving linear systems–Gauss–Jordan elimination and Gaussian elimination. The following theorem provides an actual formula for the solution of a linear system of n equations in n unknowns in the case where the coefficient matrix is invertible.

THEOREM 1.6.2 If A is an invertible n × n matrix, then for each n × 1 matrix b, the system of equations Ax = b has exactly one solution, namely, x = A⁻¹b.

Proof Since A(A⁻¹b) = b, it follows that x = A⁻¹b is a solution of Ax = b. To show that this is the only solution, we will assume that x0 is an arbitrary solution and then show that x0 must be the solution A⁻¹b. If x0 is any solution of Ax = b, then Ax0 = b. Multiplying both sides of this equation by A⁻¹, we obtain x0 = A⁻¹b.

E X A M P L E 1 Solution of a Linear System Using A−1 Consider the system of linear equations

In matrix form this system can be written as Ax = b, where

In Example 4 of the preceding section, we showed that A is invertible and

By Theorem 1.6.2, the solution of the system is x = A⁻¹b, or

Keep in mind that the method of Example 1 only applies when the system has as many equations as unknowns and the coefficient matrix is invertible.
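For readers who want to check such computations with software, here is a minimal Python/NumPy sketch of the formula in Theorem 1.6.2; the matrix and right-hand side below are hypothetical stand-ins, not the data of Example 1.

import numpy as np

# Hypothetical invertible coefficient matrix and right-hand side
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 5.0, 3.0],
              [1.0, 0.0, 8.0]])
b = np.array([5.0, 3.0, 17.0])

x_inv   = np.linalg.inv(A) @ b     # x = A^(-1) b, the formula of Theorem 1.6.2
x_solve = np.linalg.solve(A, b)    # same solution without forming the inverse

print(x_inv, x_solve)
print(np.allclose(A @ x_inv, b))   # verifies that Ax = b

In practice np.linalg.solve is preferred over forming the inverse explicitly, but both give the unique solution guaranteed by the theorem when A is invertible.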

Linear Systems with a Common Coefficient Matrix Frequently, one is concerned with solving a sequence of systems

Ax = b1,  Ax = b2,  …,  Ax = bk

each of which has the same square coefficient matrix A. If A is invertible, then the solutions

x1 = A⁻¹b1,  x2 = A⁻¹b2,  …,  xk = A⁻¹bk

can be obtained with one matrix inversion and k matrix multiplications. An efficient way to do this is to form the partitioned matrix

[A | b1 | b2 | … | bk]   (1)

in which the coefficient matrix A is "augmented" by all k of the matrices b1, b2,…,bk, and then reduce (1) to reduced row echelon form by Gauss-Jordan elimination. In this way we can solve all k systems at once. This method has the added advantage that it applies even when A is not invertible.
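The idea of handling several systems with one coefficient matrix at once can be sketched in Python/NumPy as follows; the matrices are hypothetical.

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 5.0]])           # common coefficient matrix
B = np.array([[1.0, 4.0],
              [2.0, 9.0]])           # columns are the right-hand sides b1 and b2

M = np.hstack([A, B])                # the partitioned matrix [A | b1 | b2] of Formula (1)
X = np.linalg.solve(A, B)            # column j of X solves A x = bj

print(M)
print(X)
print(np.allclose(A @ X, B))

NumPy has no built-in reduced row echelon form, so this sketch uses np.linalg.solve in place of Gauss-Jordan elimination on the partitioned matrix; when A is invertible the columns of X are the same solutions that row reduction of [A | b1 | b2] would produce.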

E X A M P L E 2 Solving Two Linear Systems at Once Solve the systems (a)

(b)

Solution The two systems have the same coefficient matrix. If we augment this coefficient matrix with the columns of constants on the right sides of these systems, we obtain

Reducing this matrix to reduced row echelon form yields (verify)

It follows from the last two columns that the solution of system (a) is , , .

,

,

and the solution of system (b) is

Properties of Invertible Matrices Up to now, to show that an n × n matrix A is invertible, it has been necessary to find an n × n matrix B such that

AB = I  and  BA = I

The next theorem shows that if we produce an n × n matrix B satisfying either condition, then the other condition holds automatically.

THEOREM 1.6.3 Let A be a square matrix.
(a) If B is a square matrix satisfying BA = I, then B = A⁻¹.
(b) If B is a square matrix satisfying AB = I, then B = A⁻¹.

We will prove part (a) and leave part (b) as an exercise.

Proof (a) Assume that BA = I. If we can show that A is invertible, the proof can be completed by multiplying BA = I on both sides by A⁻¹ to obtain

BAA⁻¹ = IA⁻¹  or  BI = IA⁻¹  or  B = A⁻¹

To show that A is invertible, it suffices to show that the system Ax = 0 has only the trivial solution (see Theorem 1.5.3). Let x0 be any solution of this system. If we multiply both sides of Ax0 = 0 on the left by B, we obtain BAx0 = B0 or Ix0 = 0 or x0 = 0. Thus, the system of equations Ax = 0 has only the trivial solution.

Equivalence Theorem We are now in a position to add two more statements to the four given in Theorem 1.5.3.

THEOREM 1.6.4 Equivalent Statements If A is an n × n matrix, then the following are equivalent.
(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row echelon form of A is In.
(d) A is expressible as a product of elementary matrices.
(e) Ax = b is consistent for every n × 1 matrix b.
(f) Ax = b has exactly one solution for every n × 1 matrix b.

It follows from the equivalency of parts (e) and (f) that if you can show that Ax = b has at least one solution for every n × 1 matrix b, then you can conclude that it has exactly one solution for every n × 1 matrix b.

Proof Since we proved in Theorem 1.5.3 that (a), (b), (c), and (d) are equivalent, it will be sufficient to prove that (a) ⇒ (f) ⇒ (e) ⇒ (a).

(a) ⇒ (f) This was already proved in Theorem 1.6.2.

(f) ⇒ (e) This is self-evident, for if Ax = b has exactly one solution for every n × 1 matrix b, then Ax = b is consistent for every n × 1 matrix b.

(e) ⇒ (a) If the system Ax = b is consistent for every n × 1 matrix b, then, in particular, this is so for the systems

Ax = e1,  Ax = e2,  …,  Ax = en

where e1, e2,…,en denote the successive columns of the identity matrix In. Let x1, x2,…,xn be solutions of the respective systems, and let us form an n × n matrix C having these solutions as columns. Thus C has the form

C = [x1 | x2 | … | xn]

As discussed in Section 1.3, the successive columns of the product AC will be Ax1, Ax2,…,Axn [see Formula 8 of Section 1.3]. Thus,

AC = [Ax1 | Ax2 | … | Axn] = [e1 | e2 | … | en] = In

By part (b) of Theorem 1.6.3, it follows that C = A⁻¹. Thus, A is invertible.

We know from earlier work that invertible matrix factors produce an invertible product. Conversely, the following theorem shows that if the product of square matrices is invertible, then the factors themselves must be invertible.

THEOREM 1.6.5 Let A and B be square matrices of the same size. If AB is invertible, then A and B must also be invertible.

In our later work the following fundamental problem will occur frequently in various contexts.

A Fundamental Problem Let A be a fixed m × n matrix. Find all m × 1 matrices b such that the system of equations Ax = b is consistent.

If A is an invertible n × n matrix, Theorem 1.6.2 completely solves this problem by asserting that for every n × 1 matrix b, the linear system Ax = b has the unique solution x = A⁻¹b. If A is not square, or if A is square but not invertible, then Theorem 1.6.2 does not apply. In these cases the matrix b must usually satisfy certain conditions in order for Ax = b to be consistent. The following example illustrates how the methods of Section 1.2 can be used to determine such conditions.

E X A M P L E 3 Determining Consistency by Elimination What conditions must b1, b2, and b3 satisfy in order for the system of equations

to be consistent? Solution The augmented matrix is

which can be reduced to row echelon form as follows:

It is now evident from the third row in the matrix that the system has a solution if and only if b1, b2, and b3 satisfy the condition To express this condition another way,

is consistent if and only if b is a matrix of the form

where b1 and b2 are arbitrary.
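Conditions of this kind can also be found with symbolic row reduction. The following SymPy sketch uses a hypothetical singular coefficient matrix (not necessarily the one in Example 3, whose entries are not reproduced above) and carries the symbols b1, b2, b3 through the elimination.

import sympy as sp

b1, b2, b3 = sp.symbols('b1 b2 b3')
M = sp.Matrix([[1, 1, 2, b1],
               [1, 0, 1, b2],
               [2, 1, 3, b3]])       # augmented matrix [A | b] with symbolic b

M[1, :] = M[1, :] - M[0, :]          # subtract row 1 from row 2
M[2, :] = M[2, :] - 2*M[0, :]        # subtract 2 times row 1 from row 3
M[2, :] = M[2, :] - M[1, :]          # subtract row 2 from row 3
print(M)

# The last row is [0, 0, 0, b3 - b1 - b2], so for this particular matrix the
# system is consistent if and only if b3 = b1 + b2.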

E X A M P L E 4 Determining Consistency by Elimination What conditions must b1, b2, and b3 satisfy in order for the system of equations

to be consistent? Solution The augmented matrix is

Reducing this to reduced row echelon form yields (verify) (2) In this case there are no restrictions on b1, b2, and b3, so the system has the unique solution (3) for all values of b1, b2, and b3.

What does the result in Example 4 tell you about the coefficient matrix of the system?

Skills • Determine whether a linear system of equations has no solutions, exactly one solution, or infinitely many solutions. • Solve a linear system by inverting its coefficient matrix. • Solve multiple linear systems with the same coefficient matrix simultaneously.

• Be familiar with the additional conditions of invertibility stated in the Equivalence Theorem.

Exercise Set 1.6 In Exercises 1–8, solve the system by inverting the coefficient matrix and using Theorem 1.6.2. 1.

Answer:

2. 3.

Answer:

4.

5.

Answer:

6.

7.

Answer:

8.

In Exercises 9–12, solve the linear systems together by reducing the appropriate augmented matrix. 9. (i) (ii) Answer: (i) (ii)

10.

(i) (ii) 11. (i) (ii) (iii) (iv) Answer: (i) (ii) (iii) (iv) 12.

(i) (ii) (iii) In Exercises 13–17, determine conditions on the bi's, if any, in order to guarantee that the linear system is consistent. 13.

Answer: No conditions on

and

14. 15.

Answer:

16.

17.

Answer:

18. Consider the matrices

(a) Show that the equation (b) Solve

can be rewritten as

and use this result to solve

for x.

.

In Exercises 19–20, solve the given matrix equation for X. 19.

Answer:

20.

21. Let Ax = 0 be a homogeneous system of n linear equations in n unknowns that has only the trivial solution. Show that if k is any positive integer, then the system Aᵏx = 0 also has only the trivial solution.
22. Let Ax = 0 be a homogeneous system of n linear equations in n unknowns, and let Q be an invertible n × n matrix. Show that Ax = 0 has just the trivial solution if and only if (QA)x = 0 has just the trivial solution.
23. Let Ax = b be any consistent system of linear equations, and let x1 be a fixed solution. Show that every solution to the system can be written in the form x = x1 + x0, where x0 is a solution to Ax = 0. Show also that every matrix of this form is a solution.

24. Use part (a) of Theorem 1.6.3 to prove part (b).

True-False Exercises In parts (a)–(g) determine whether the statement is true or false, and justify your answer. (a) It is impossible for a linear system of linear equations to have exactly two solutions. Answer: True (b) If the linear system

has a unique solution, then the linear system

also must have a unique solution.

Answer: True (c) If A and B are

matrices such that

, then

.

Answer: True (d) If A and B are row equivalent matrices, then the linear systems

and

have the same solution set.

Answer: True
(e) If A is an matrix and S is an invertible matrix, then if x is a solution to the linear system , then Sx is a solution to the linear system .
Answer: True
(f) Let A be an matrix. The linear system has a unique solution if and only if A is an invertible matrix.
Answer: True
(g) Let A and B be matrices. If A or B (or both) are not invertible, then neither is AB.
Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

1.7 Diagonal, Triangular, and Symmetric Matrices In this section we will discuss matrices that have various special forms. These matrices arise in a wide variety of applications and will also play an important role in our subsequent work.

Diagonal Matrices A square matrix in which all the entries off the main diagonal are zero is called a diagonal matrix. Here are some examples:

A general

diagonal matrix D can be written as

(1)

A diagonal matrix is invertible if and only if all of its diagonal entries are nonzero; in this case the inverse of (1) is

(2)

Confirm Formula (2) by showing that DD⁻¹ = D⁻¹D = I.

Powers of diagonal matrices are easy to compute; we leave it for you to verify that if D is the diagonal matrix (1) and k is a positive integer, then

(3)

E X A M P L E 1 Inverses and Powers of Diagonal Matrices If

then

Matrix products that involve diagonal factors are especially easy to compute. For example,

In words, to multiply a matrix A on the left by a diagonal matrix D, one can multiply successive rows of A by the successive diagonal entries of D, and to multiply A on the right by D, one can multiply successive columns of A by the successive diagonal entries of D.
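These facts about diagonal matrices are easy to confirm numerically. A small Python/NumPy sketch, with a hypothetical diagonal matrix D:

import numpy as np

d = np.array([2.0, -3.0, 5.0])            # nonzero diagonal entries
D = np.diag(d)

D_inv = np.diag(1.0 / d)                  # Formula (2): reciprocals on the diagonal
print(np.allclose(D @ D_inv, np.eye(3)))  # D times its inverse gives I

D_cubed = np.diag(d ** 3)                 # Formula (3) with k = 3
print(np.allclose(D_cubed, np.linalg.matrix_power(D, 3)))

A = np.arange(12.0).reshape(3, 4)
print(np.allclose(D @ A, d[:, None] * A)) # DA multiplies the rows of A by the diagonal entries
B = np.arange(12.0).reshape(4, 3)
print(np.allclose(B @ D, B * d[None, :])) # BD multiplies the columns of B by the diagonal entries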

Triangular Matrices A square matrix in which all the entries above the main diagonal are zero is called lower triangular, and a square matrix in which all the entries below the main diagonal are zero is called upper triangular. A matrix that is either upper triangular or lower triangular is called triangular.

E X A M P L E 2 Upper and Lower Triangular Matrices

Remark Observe that diagonal matrices are both upper triangular and lower triangular since they have zeros below and above the main diagonal. Observe also that a square matrix in row echelon form is upper triangular since it has zeros below the main diagonal.

Properties of Triangular Matrices

Example 2 illustrates the following four facts about triangular matrices that we will state without formal proof.
• A square matrix A = [aij] is upper triangular if and only if all entries to the left of the main diagonal are zero; that is, if aij = 0 for i > j (Figure 1.7.1).
• A square matrix A = [aij] is lower triangular if and only if all entries to the right of the main diagonal are zero; that is, if aij = 0 for i < j (Figure 1.7.1).
• A square matrix A = [aij] is upper triangular if and only if the ith row starts with at least i − 1 zeros for every i.
• A square matrix A = [aij] is lower triangular if and only if the jth column starts with at least j − 1 zeros for every j.

Figure 1.7.1 The following theorem lists some of the basic properties of triangular matrices.

THEOREM 1.7.1 (a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular. (b) The product of lower triangular matrices is lower triangular, and the product of upper triangular matrices is upper triangular. (c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero. (d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an invertible upper triangular matrix is upper triangular.

Part (a) is evident from the fact that transposing a square matrix can be accomplished by reflecting the entries about the main diagonal; we omit the formal proof. We will prove (b), but we will defer the proofs of (c) and (d) to the next chapter, where we will have the tools to prove those results more efficiently.

Proof (b) We will prove the result for lower triangular matrices; the proof for upper triangular matrices is similar. Let A = [aij] and B = [bij] be n × n lower triangular matrices, and let C = [cij] be the product C = AB. We can prove that C is lower triangular by showing that cij = 0 for i < j. But from the definition of matrix multiplication,

cij = ai1b1j + ai2b2j + … + ainbnj

If we assume that i < j, then the terms in this expression can be grouped as follows:

cij = (ai1b1j + ai2b2j + … + ai,j−1bj−1,j) + (aijbjj + … + ainbnj)

In the first grouping all of the b factors are zero since B is lower triangular, and in the second grouping all of the a factors are zero since A is lower triangular. Thus, cij = 0, which is what we wanted to prove.
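The closure properties in parts (a), (b), and (d) are easy to spot-check numerically; here is a small Python/NumPy sketch with two hypothetical lower triangular matrices.

import numpy as np

L1 = np.array([[2.0,  0.0, 0.0],
               [1.0,  3.0, 0.0],
               [4.0, -1.0, 5.0]])
L2 = np.array([[ 1.0, 0.0, 0.0],
               [-2.0, 4.0, 0.0],
               [ 0.0, 6.0, 2.0]])

def is_lower_triangular(M):
    # True when every entry above the main diagonal is zero
    return np.allclose(M, np.tril(M))

print(is_lower_triangular(L1 @ L2))            # the product is lower triangular
print(is_lower_triangular(np.linalg.inv(L1)))  # the inverse is lower triangular
print(np.allclose(L1.T, np.triu(L1.T)))        # the transpose is upper triangular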

E X A M P L E 3 Computations with Triangular Matrices

Consider the upper triangular matrices

It follows from part (c) of Theorem 1.7.1 that the matrix A is invertible but the matrix B is not. Moreover, the theorem also tells us that A⁻¹, AB, and BA must be upper triangular. We leave it for you to confirm these three statements by showing that

Symmetric Matrices

DEFINITION 1 A square matrix A is said to be symmetric if A = Aᵀ.

It is easy to recognize a symmetric matrix by inspection: The entries on the main diagonal have no restrictions, but mirror images of entries across the main diagonal must be equal. Here is a picture using the second matrix in Example 4:

All diagonal matrices, such as the third matrix in Example 4, obviously have this property.

E X A M P L E 4 Symmetric Matrices The following matrices are symmetric, since each is equal to its own transpose (verify).

Remark It follows from Formula 11 of Section 1.3 that a square matrix A = [aij] is symmetric if and only if

aij = aji   (4)

for all values of i and j. The following theorem lists the main algebraic properties of symmetric matrices. The proofs are direct consequences of Theorem 1.4.8 and are omitted.

THEOREM 1.7.2 If A and B are symmetric matrices with the same size, and if k is any scalar, then:
(a) Aᵀ is symmetric.
(b) A + B and A − B are symmetric.
(c) kA is symmetric.

It is not true, in general, that the product of symmetric matrices is symmetric. To see why this is so, let A and B be symmetric matrices with the same size. Then it follows from part (e) of Theorem 1.4.8 and the symmetry of A and B that

(AB)ᵀ = BᵀAᵀ = BA

Thus, AB is symmetric if and only if AB = BA, that is, if and only if A and B commute. In summary, we have the following result.

THEOREM 1.7.3 The product of two symmetric matrices is symmetric if and only if the matrices commute.

E X A M P L E 5 Products of Symmetric Matrices The first of the following equations shows a product of symmetric matrices that is not symmetric, and the second shows a product of symmetric matrices that is symmetric. We conclude that the factors in the first equation do not commute, but those in the second equation do. We leave it for you to verify that this is so.
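A short NumPy check of Theorem 1.7.3, using hypothetical symmetric matrices rather than the matrices of Example 5:

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])
B = np.array([[0.0, 5.0],
              [5.0, 1.0]])

AB = A @ B
print(np.allclose(AB, AB.T))      # False: this product of symmetric matrices is not symmetric
print(np.allclose(A @ B, B @ A))  # False: A and B do not commute, consistent with Theorem 1.7.3

D1 = np.diag([1.0, 4.0])
D2 = np.diag([7.0, 2.0])          # diagonal matrices are symmetric and commute
print(np.allclose(D1 @ D2, (D1 @ D2).T))   # True: their product is symmetric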

Invertibility of Symmetric Matrices In general, a symmetric matrix need not be invertible. For example, a diagonal matrix with a zero on the main diagonal is

symmetric but not invertible. However, the following theorem shows that if a symmetric matrix happens to be invertible, then its inverse must also be symmetric.

THEOREM 1.7.4 If A is an invertible symmetric matrix, then A⁻¹ is symmetric.

Proof Assume that A is symmetric and invertible. From Theorem 1.4.9 and the fact that A = Aᵀ, we have

(A⁻¹)ᵀ = (Aᵀ)⁻¹ = A⁻¹

which proves that A⁻¹ is symmetric.

Products AAᵀ and AᵀA Matrix products of the form AAᵀ and AᵀA arise in a variety of applications. If A is an m × n matrix, then Aᵀ is an n × m matrix, so the products AAᵀ and AᵀA are both square matrices—the matrix AAᵀ has size m × m, and the matrix AᵀA has size n × n. Such products are always symmetric since

(AAᵀ)ᵀ = (Aᵀ)ᵀAᵀ = AAᵀ  and  (AᵀA)ᵀ = Aᵀ(Aᵀ)ᵀ = AᵀA

E X A M P L E 6 The Product of a Matrix and Its Transpose Is Symmetric Let A be the

matrix

Then

Observe that ATA and AAT are symmetric as expected.
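The symmetry of these products is easy to confirm for any matrix; a brief NumPy check with a hypothetical 2 × 3 matrix:

import numpy as np

A = np.array([[1.0, -2.0,  4.0],
              [3.0,  0.0, -5.0]])

G1 = A @ A.T      # 2 x 2
G2 = A.T @ A      # 3 x 3
print(np.allclose(G1, G1.T))   # True
print(np.allclose(G2, G2.T))   # True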

Later in this text, we will obtain general conditions on A under which AAT and ATA are invertible. However, in the special case where A is square, we have the following result.

THEOREM 1.7.5 If A is an invertible matrix, then AAT and ATA are also invertible.

Proof Since A is invertible, so is AT by Theorem 1.4.9. Thus AAT and ATA are invertible, since they are the products of invertible matrices.

Concept Review • Diagonal matrix • Lower triangular matrix • Upper triangular matrix • Triangular matrix • Symmetric matrix

Skills • Determine whether a diagonal matrix is invertible with no computations. • Compute matrix products involving diagonal matrices by inspection. • Determine whether a matrix is triangular. • Understand how the transpose operation affects diagonal and triangular matrices. • Understand how inversion affects diagonal and triangular matrices. • Determine whether a matrix is a symmetric matrix.

Exercise Set 1.7 In Exercises 1–4, determine whether the given matrix is invertible. 1. Answer:

2.

3.

Answer:

4.

In Exercises 5–8, determine the product by inspection. 5.

Answer:

6.

7.

Answer:

8.

In Exercises 9–12, find , , and (where k is any integer) by inspection.
9.
Answer:

10.

11.

Answer:

12.

In Exercises 13–19, decide whether the given matrix is symmetric. 13. Answer: Not symmetric 14. 15. Answer: Symmetric 16. 17.

Answer: Not symmetric 18.

19.

Answer:

Not symmetric In Exercises 20–22, decide by inspection whether the given matrix is invertible. 20.

21.

Answer: Not invertible 22.

In Exercises 23–24, find all values of the unknown constant(s) in order for A to be symmetric. 23. Answer: 24.

In Exercises 25–26, find all values of x in order for A to be invertible. 25.

Answer:

26.

In Exercises 27–28, find a diagonal matrix A that satisfies the given condition. 27.

Answer:

28.

29. Verify Theorem 1.7.1(b) for the product AB, where

30. Verify Theorem 1.7.1(d) for the matrices A and B in Exercise 29. 31. Verify Theorem 1.7.4 for the given matrix A. (a) (b)

32. Let A be an n × n symmetric matrix. (a) Show that A² is symmetric. (b) Show that is symmetric.

33. Prove: If , then A is symmetric and .
34. Find all diagonal matrices A that satisfy .
35. Let A = [aij] be an n × n matrix. Determine whether A is symmetric.

(a) (b) (c) (d) Answer: (a) Yes (b) No (unless

)

(c) Yes (d) No (unless

)

36. On the basis of your experience with Exercise 35, devise a general test that can be applied to a formula for aij to determine whether A = [aij] is symmetric.
37. A square matrix A is called skew-symmetric if Aᵀ = −A. Prove:
(a) If A is an invertible skew-symmetric matrix, then A⁻¹ is skew-symmetric.
(b) If A and B are skew-symmetric matrices, then so are Aᵀ, A + B, A − B, and kA for any scalar k.
(c) Every square matrix A can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix. [Hint: Note the identity A = ½(A + Aᵀ) + ½(A − Aᵀ).]

In Exercises 38–39, fill in the missing entries (marked with ×) to produce a skew-symmetric matrix. 38.

39.

Answer:

40. Find all values of a, b, c, and d for which A is skew-symmetric.

41. We showed in the text that the product of symmetric matrices is symmetric if and only if the matrices commute. Is the product of commuting skew-symmetric matrices skew-symmetric? Explain. [Note: See Exercise 37 for the definition of skew-symmetric.]
42. If the matrix A can be expressed as A = LU, where L is a lower triangular matrix and U is an upper triangular matrix, then the linear system Ax = b can be expressed as LUx = b and can be solved in two steps:
Step 1. Let Ux = y, so that LUx = b can be expressed as Ly = b. Solve this system.
Step 2. Solve the system Ux = y for x.

In each part, use this two-step method to solve the given system. (a)

(b)

43. Find an upper triangular matrix that satisfies

Answer:

True-False Exercises In parts (a)–(m) determine whether the statement is true or false, and justify your answer. (a) The transpose of a diagonal matrix is a diagonal matrix.

Answer: True (b) The transpose of an upper triangular matrix is an upper triangular matrix. Answer: False (c) The sum of an upper triangular matrix and a lower triangular matrix is a diagonal matrix. Answer: False (d) All entries of a symmetric matrix are determined by the entries occurring on and above the main diagonal. Answer: True (e) All entries of an upper triangular matrix are determined by the entries occurring on and above the main diagonal. Answer: True (f) The inverse of an invertible lower triangular matrix is an upper triangular matrix. Answer: False (g) A diagonal matrix is invertible if and only if all of its diagonal entries are positive. Answer: False (h) The sum of a diagonal matrix and a lower triangular matrix is a lower triangular matrix. Answer: True (i) A matrix that is both symmetric and upper triangular must be a diagonal matrix. Answer: True (j) If A and B are

matrices such that is symmetric, then A and B are symmetric.
Answer: False
(k) If A and B are matrices such that is upper triangular, then A and B are upper triangular.
Answer: False
(l) If A² is a symmetric matrix, then A is a symmetric matrix.
Answer: False
(m) If kA is a symmetric matrix for some nonzero scalar k, then A is a symmetric matrix.
Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

1.8 Applications of Linear Systems In this section we will discuss some relatively brief applications of linear systems. These are but a small sample of the wide variety of real-world problems to which our study of linear systems is applicable.

Network Analysis The concept of a network appears in a variety of applications. Loosely stated, a network is a set of branches through which something “flows.” For example, the branches might be electrical wires through which electricity flows, pipes through which water or oil flows, traffic lanes through which vehicular traffic flows, or economic linkages through which money flows, to name a few possibilities. In most networks, the branches meet at points, called nodes or junctions, where the flow divides. For example, in an electrical network, nodes occur where three or more wires join, in a traffic network they occur at street intersections, and in a financial network they occur at banking centers where incoming money is distributed to individuals or other institutions. In the study of networks, there is generally some numerical measure of the rate at which the medium flows through a branch. For example, the flow rate of electricity is often measured in amperes, the flow rate of water or oil in gallons per minute, the flow rate of traffic in vehicles per hour, and the flow rate of European currency in millions of Euros per day. We will restrict our attention to networks in which there is flow conservation at each node, by which we mean that the rate of flow into any node is equal to the rate of flow out of that node. This ensures that the flow medium does not build up at the nodes and block the free movement of the medium through the network. A common problem in network analysis is to use known flow rates in certain branches to find the flow rates in all of the branches. Here is an example.

E X A M P L E 1 Network Analysis Using Linear Systems Figure 1.8.1 shows a network with four nodes in which the flow rate and direction of flow in certain branches are known. Find the flow rates and directions of flow in the remaining branches.

Figure 1.8.1 Solution As illustrated in Figure 1.8.2, we have assigned arbitrary directions to the unknown flow rates , and . We need not be concerned if some of the directions are incorrect, since an incorrect direction will be signaled by a negative value for the flow rate when we solve for the unknowns.

Figure 1.8.2 It follows from the conservation of flow at node A that Similarly, at the other nodes we have

These four conditions produce the linear system

which we can now try to solve for the unknown flow rates. In this particular case the system is sufficiently simple that it can be solved by inspection (work from the bottom up). We leave it for you to confirm that the solution is The fact that is negative tells us that the direction assigned to that flow in Figure 1.8.2 is incorrect; that is, the flow in that branch is into node A.
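Conservation equations like these can also be handed to a computer algebra system. The following SymPy sketch uses a small hypothetical network (its flow rates are illustrative and are not those of Figure 1.8.1); note that one conservation equation is redundant because the total flow into the network equals the total flow out, so the balanced flows form a one-parameter family, much as in the traffic example that follows.

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
eqs = [sp.Eq(x1 + x2, 50),    # node A: 50 units flow in and split into branches 1 and 2
       sp.Eq(x1, x3 + 20),    # node B: branch 1 in; branch 3 and 20 units out
       sp.Eq(x2 + x3, 30)]    # node C: branches 2 and 3 in; 30 units out

print(sp.linsolve(eqs, [x1, x2, x3]))   # {(x3 + 20, 30 - x3, x3)}: one free flow rate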

E X A M P L E 2 Design of Traffic Patterns The network in Figure 1.8.3 shows a proposed plan for the traffic flow around a new park that will house the Liberty Bell in Philadelphia, Pennsylvania. The plan calls for a computerized traffic light at the north exit on Fifth Street, and the diagram indicates the average number of vehicles per hour that are expected to flow in and out of the streets that border the complex. All streets are one-way. (a) How many vehicles per hour should the traffic light let through to ensure that the average number of vehicles per hour flowing into the complex is the same as the average number of vehicles flowing out? (b) Assuming that the traffic light has been set to balance the total flow in and out of the complex, what can you say about the average number of vehicles per hour that will flow along the streets that border the complex?

Figure 1.8.3 Solution (a) If, as indicated in Figure 1.8.3b we let x denote the number of vehicles per hour that the traffic light must let through, then the total number of vehicles per hour that flow in and out of the complex will be

Equating the flows in and out shows that the traffic light should let x = 600 vehicles per hour pass through. (b) To avoid traffic congestion, the flow in must equal the flow out at each intersection. For this to happen, the following conditions must be satisfied: Intersection

   Flow In   Flow Out
A             =
B             =
C             =
D             =

Thus, with the value of x computed in part (a), we obtain the following linear system:

We leave it for you to show that the system has infinitely many solutions and that these are given by the parametric equations (1) However, the parameter t is not completely arbitrary here, since there are physical constraints to be considered. For example, the average flow rates must be nonnegative since we have assumed the streets to be one-way, and a negative flow rate would indicate a flow in the wrong direction. This being the case, we see from 1 that t can be any real number that satisfies , which implies that the average flow rates along the streets will fall in the ranges

Electrical Circuits Next, we will show how network analysis can be used to analyze electrical circuits consisting of batteries and resistors. A battery is a source of electric energy, and a resistor, such as a lightbulb, is an element that dissipates electric energy. Figure 1.8.4 shows a schematic diagram of a circuit with one battery (represented by the symbol ), one resistor (represented by the symbol ), and a switch. The battery has a positive pole (+) and a negative pole (−). When the switch is closed, electrical current is considered to flow from the positive pole of the battery, through the resistor, and back to the negative pole (indicated by the arrowhead in the figure).

Figure 1.8.4 Electrical current, which is a flow of electrons through wires, behaves much like the flow of water through pipes. A battery acts like a pump that creates “electrical pressure” to increase the flow rate of electrons, and a resistor acts like a restriction in a pipe that reduces the flow rate of electrons. The technical term for electrical pressure is electrical potential; it is commonly measured in volts (V). The degree to which a resistor reduces the electrical potential is called its resistance and is commonly measured in ohms (Ω). The rate of flow of electrons in a wire is called current and is commonly measured in amperes (also called amps) (A). The precise effect of a resistor is given by the following law:

Ohm's Law If a current of I amperes passes through a resistor with a resistance of R ohms, then there is a resulting drop of E volts in electrical potential that is the product of the current and resistance; that is,

A typical electrical network will have multiple batteries and resistors joined by some configuration of wires. A point at which three or more wires in a network are joined is called a node (or junction point). A branch is a wire connecting two nodes, and a closed loop is a succession of connected branches that begin and end at the same node. For example, the electrical network in Figure 1.8.5 has two nodes and three closed loops— two inner loops and one outer loop. As current flows through an electrical network, it undergoes increases and decreases in electrical potential, called voltage rises and voltage drops, respectively. The behavior of the current at the nodes and around closed loops is governed by two fundamental laws:

Figure 1.8.5

Kirchhoff's Current Law The sum of the currents flowing into any node is equal to the sum of the currents flowing out.

Kirchhoff's Voltage Law In one traversal of any closed loop, the sum of the voltage rises equals the sum of the voltage drops.

Kirchhoff's current law is a restatement of the principle of flow conservation at a node that was stated for general networks. Thus, for example, the currents at the top node in Figure 1.8.6 satisfy the equation .

Figure 1.8.6 In circuits with multiple loops and batteries there is usually no way to tell in advance which way the currents are flowing, so the usual procedure in circuit analysis is to assign arbitrary directions to the current flows in the branches and let the mathematical computations determine whether the assignments are correct. In addition to assigning directions to the current flows, Kirchhoff's voltage law requires a direction of travel for each closed loop. The choice is arbitrary, but for consistency we will always take this direction to be clockwise (Figure 1.8.7). We also make the following conventions: • A voltage drop occurs at a resistor if the direction assigned to the current through the resistor is the same as the direction assigned to the loop, and a voltage rise occurs at a resistor if the direction assigned to the current through the resistor is the opposite to that assigned to the loop. • A voltage rise occurs at a battery if the direction assigned to the loop is from − to + through the battery, and a voltage drop occurs at a battery if the direction assigned to the loop is from + to − through the battery. If you follow these conventions when calculating currents, then those currents whose directions were assigned correctly will have positive values and those whose directions were assigned incorrectly will have negative values.

Figure 1.8.7

E X A M P L E 3 A Circuit with One Closed Loop

Determine the current I in the circuit shown in Figure 1.8.8.

Figure 1.8.8 Solution Since the direction assigned to the current through the resistor is the same as the direction of the loop, there is a voltage drop at the resistor. By Ohm's law this voltage drop is . Also, since the direction assigned to the loop is from − to + through the battery, there is a voltage rise of 6 volts at the battery. Thus, it follows from Kirchhoff's voltage law that

from which we conclude that the current is . Since I is positive, the direction assigned to the current flow is correct.

E X A M P L E 4 A Circuit with Three Closed Loops Determine the currents

, and

in the circuit shown in Figure 1.8.9.

Figure 1.8.9 Solution Using the assigned directions for the currents, Kirchhoff s current law provides one equation for each node: Node Current In

Current Out
A    =
B    =

However, these equations are really the same, since both can be expressed as (2)

Gustav Kirchhoff (1824-1887) Historical Note The German physicist Gustav Kirchhoff was a student of Gauss. His work on Kirchhoff's laws, announced in 1854, was a major advance in the calculation of currents, voltages, and resistances of electrical circuits. Kirchhoff was severely disabled and spent most of his life on crutches or in a wheelchair. Image: © SSPL/The Image Works]

To find unique values for the currents we will need two more equations, which we will obtain from Kirchhoff's voltage law. We can see from the network diagram that there are three closed loops, a left inner loop containing the 50 V battery, a right inner loop containing the 30 V battery, and an outer loop that contains both batteries. Thus, Kirchhoff's voltage law will actually produce three equations. With a clockwise traversal of the loops, the voltage rises and drops in these loops are as follows: Voltage Rises Left Inside Loop Right Inside Loop

Voltage Drops

50 0

Outside Loop These conditions can be rewritten as (3) However, the last equation is superfluous, since it is the difference of the first two. Thus, if we combine 2 and the first two equations in 3, we obtain the following linear system of three equations in the three unknown currents:

We leave it for you to solve this system and show that , and . The fact that is negative tells us that the direction of this current is opposite to that indicated in Figure 1.8.9.

Balancing Chemical Equations

Chemical compounds are represented by chemical formulas that describe the atomic makeup of their molecules. For example, water is composed of two hydrogen atoms and one oxygen atom, so its chemical formula is H2O; and stable oxygen is composed of two oxygen atoms, so its chemical formula is O2. When chemical compounds are combined under the right conditions, the atoms in their molecules rearrange to form new compounds. For example, when methane burns, the methane (CH4) and stable oxygen (O2) react to form carbon dioxide (CO2) and water (H2O). This is indicated by the chemical equation

CH4 + O2 → CO2 + H2O   (4)

The molecules to the left of the arrow are called the reactants and those to the right the products. In this equation the plus signs serve to separate the molecules and are not intended as algebraic operations. However, this equation does not tell the whole story, since it fails to account for the proportions of molecules required for a complete reaction (no reactants left over). For example, we can see from the right side of 4 that to produce one molecule of carbon dioxide and one molecule of water, one needs three oxygen atoms for each carbon atom. However, from the left side of 4 we see that one molecule of methane and one molecule of stable oxygen have only two oxygen atoms for each carbon atom. Thus, on the reactant side the ratio of methane to stable oxygen cannot be one-to-one in a complete reaction. A chemical equation is said to be balanced if for each type of atom in the reaction, the same number of atoms appears on each side of the arrow. For example, the balanced version of Equation 4 is

CH4 + 2O2 → CO2 + 2H2O   (5)

by which we mean that one methane molecule combines with two stable oxygen molecules to produce one carbon dioxide molecule and two water molecules. In theory, one could multiply this equation through by any positive integer. For example, multiplying through by 2 yields the balanced chemical equation

2CH4 + 4O2 → 2CO2 + 4H2O

However, the standard convention is to use the smallest positive integers that will balance the equation. Equation 4 is sufficiently simple that it could have been balanced by trial and error, but for more complicated chemical equations we will need a systematic method. There are various methods that can be used, but we will give one that uses systems of linear equations. To illustrate the method let us reexamine Equation 4. To balance this equation we must find positive integers x1, x2, x3, and x4 such that

x1 (CH4) + x2 (O2) → x3 (CO2) + x4 (H2O)   (6)

For each of the atoms in the equation, the number of atoms on the left must be equal to the number of atoms on the right. Expressing this in tabular form we have

            Left Side        Right Side
Carbon      x1            =  x3
Hydrogen    4x1           =  2x4
Oxygen      2x2           =  2x3 + x4

from which we obtain the homogeneous linear system

x1 − x3 = 0
4x1 − 2x4 = 0
2x2 − 2x3 − x4 = 0

The augmented matrix for this system is

We leave it for you to show that the reduced row echelon form of this matrix is

from which we conclude that the general solution of the system is

x1 = (1/2)t,  x2 = t,  x3 = (1/2)t,  x4 = t

where t is arbitrary. The smallest positive integer values for the unknowns occur when we let t = 2, so the equation can be balanced by letting x1 = 1, x2 = 2, x3 = 1, x4 = 2. This agrees with our earlier conclusions, since substituting these values into Equation 6 yields Equation 5.
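The same balancing computation can be done by finding the null space of the coefficient matrix. A short SymPy sketch for the methane reaction above:

import sympy as sp

# Columns correspond to x1 (CH4), x2 (O2), x3 (CO2), x4 (H2O);
# rows are the carbon, hydrogen, and oxygen balance equations.
A = sp.Matrix([[1, 0, -1,  0],
               [4, 0,  0, -2],
               [0, 2, -2, -1]])

v = A.nullspace()[0]                      # the solution space is one-dimensional
coeffs = v / min(abs(c) for c in v)       # rescale; for this reaction this gives integers
print(coeffs.T)                           # Matrix([[1, 2, 1, 2]]), i.e. CH4 + 2 O2 -> CO2 + 2 H2O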

E X A M P L E 5 Balancing Chemical Equations Using Linear Systems Balance the chemical equation

Solution Let

, and

be positive integers that balance the equation (7)

Equating the number of atoms of each type on the two sides yields

from which we obtain the homogeneous linear system

We leave it for you to show that the reduced row echelon form of the augmented matrix for this system is

from which we conclude that the general solution of the system is

where t is arbitrary. To obtain the smallest positive integers that balance the equation, we let , in which case we obtain , and . Substituting these values in 7 produces the balanced equation

Polynomial Interpolation An important problem in various applications is to find a polynomial whose graph passes through a specified set of points in the plane; this is called an interpolating polynomial for the points. The simplest example of such a problem is to find a linear polynomial (8) whose graph passes through two known distinct points, and , in the xy-plane (Figure 1.8.10). You have probably encountered various methods in analytic geometry for finding the equation of a line through two points, but here we will give a method based on linear systems that can be adapted to general polynomial interpolation.

Figure 1.8.10 The graph of (8) is the line y = ax + b, and for this line to pass through the points (x1, y1) and (x2, y2), we must have

Therefore, the unknown coefficients a and b can be obtained by solving the linear system

We don't need any fancy methods to solve this system—the value of a can be obtained by subtracting the equations to eliminate b, and then the value of a can be substituted into either equation to find b. We leave it as an exercise for you to find a and b and then show that they can be expressed in the form (9) provided

. Thus, for example, the line

can be obtained by taking

Therefore, the equation of the line is

that passes through the points and

, in which case 9 yields

(Figure 1.8.11).

Figure 1.8.11 Now let us consider the more general problem of finding a polynomial whose graph passes through n points with distinct x-coordinates (10) Since there are n conditions to be satisfied, intuition suggests that we should begin by looking for a polynomial of the form (11) since a polynomial of this form has n coefficients that are at our disposal to satisfy the n conditions. However, we want to allow for cases where the points may lie on a line or have some other configuration that would make it possible to use a polynomial whose degree is less than ; thus, we allow for the possibility that and other coefficients in 11 may be zero. The following theorem, which we will prove later in the text, is the basic result on polynomial interpolation.

THEOREM 1.8.1 Polynomial Interpolation Given any n points in the xy-plane that have distinct x-coordinates, there is a unique polynomial of degree n − 1 or less whose graph passes through those points.

Let us now consider how we might go about finding the interpolating polynomial 11 whose graph passes through the points in 10. Since the graph of this polynomial is the graph of the equation (12) it follows that the coordinates of the points must satisfy

(13)

In these equations the values of x's and y's are assumed to be known, so we can view this as a linear system in the unknowns . From this point of view the augmented matrix for the system is

(14)

and hence the interpolating polynomial can be found by reducing this matrix to reduced row echelon form (Gauss-Jordan elimination).
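In Python/NumPy the system (13) can be built and solved directly; the sample points below are hypothetical.

import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 5.0])

V = np.vander(xs, increasing=True)   # rows [1, x_i, x_i^2], as in the augmented matrix (14)
a = np.linalg.solve(V, ys)           # interpolating coefficients a0, a1, a2

print(a)                             # [ 2. -2.  1.]  ->  p(x) = 2 - 2x + x^2
print(np.allclose(V @ a, ys))        # the polynomial passes through all three points

Because the x-coordinates are distinct, Theorem 1.8.1 guarantees that this square system has exactly one solution.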

E X A M P L E 6 Polynomial Interpolation by Gauss-Jordan Elimination Find a cubic polynomial whose graph passes through the points

Solution Since there are four points, we will use an interpolating polynomial of degree polynomial by

. Denote this

and denote the x- and y-coordinates of the given points by Thus, it follows from 14 that the augmented matrix for the linear system in the unknowns is

We leave it for you to confirm that the reduced row echelon form of this matrix is

from which it follows that

. Thus, the interpolating polynomial is

The graph of this polynomial and the given points are shown in Figure 1.8.12.

, and

Figure 1.8.12

Remark Later we will give a more efficient method for finding interpolating polynomials that is better suited for problems in which the number of data points is large. C A L C U L U S A N D C A L C U L AT I N G U T I L I T Y R E Q U I R E D

E X A M P L E 7 Approximate Integration There is no way to evaluate the integral

directly since there is no way to express an antiderivative of the integrand in terms of elementary functions. This integral could be approximated by Simpson's rule or some comparable method, but an alternative approach is to approximate the integrand by an interpolating polynomial and integrate the approximating polynomial. For example, let us consider the five points that divide the interval [0, 1] into four equally spaced subintervals. The values of

at these points are approximately The interpolating polynomial is (verify) (15) and (16) As shown in Figure 1.8.13, the graphs of f and p match very closely over the interval [0, 1], so the approximation is quite good.

Figure 1.8.13
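The same idea is easy to reproduce numerically. The sketch below uses f(x) = exp(-x^2) purely as a stand-in integrand (the specific integrand of Example 7 is not reproduced above): sample it at the five equally spaced points, interpolate by a polynomial of degree 4, and integrate that polynomial over [0, 1].

import numpy as np

f = lambda x: np.exp(-x**2)                     # stand-in integrand for illustration
xs = np.linspace(0.0, 1.0, 5)
ys = f(xs)

a = np.linalg.solve(np.vander(xs, increasing=True), ys)   # coefficients a0, ..., a4
integral = sum(c / (k + 1) for k, c in enumerate(a))      # integral of sum a_k x^k over [0, 1]
print(integral)   # about 0.7468 for this stand-in f, very close to the exact value of its integral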

Concept Review • Network • Branches • Nodes • Flow conservation • Electrical circuits: battery, resistor, poles (positive and negative), electrical potential, Ohm's law, Kirchhoff's current law, Kirchhoff's voltage law • Chemical equations: reactants, products, balanced equation • Interpolating polynomial

Skills • Find the flow rates and directions of flow in branches of a network. • Find the amount of current flowing through parts of an electrical circuit. • Write a balanced chemical equation for a given chemical reaction. • Find an interpolating polynomial for a graph passing through a given collection of points.

Exercise Set 1.8 1. The accompanying figure shows a network in which the flow rate and direction of flow in certain branches are known. Find the flow rates and directions of flow in the remaining branches.

Figure Ex-1 Answer:

2. The accompanying figure shows known flow rates of hydrocarbons into and out of a network of pipes at an oil refinery. (a) Set up a linear system whose solution provides the unknown flow rates.

(b) Solve the system for the unknown flow rates. (c) Find the flow rates and directions of flow if

and

.

Figure Ex-2 3. The accompanying figure shows a network of one-way streets with traffic flowing in the directions indicated. The flow rates along the streets are measured as the average number of vehicles per hour. (a) Set up a linear system whose solution provides the unknown flow rates. (b) Solve the system for the unknown flow rates. (c) If the flow along the road from A to B must be reduced for construction, what is the minimum flow that is required to keep traffic flowing on all roads?

Figure Ex-3 Answer: (a) (b) (c) For all rates to be nonnegative, we need

cars per hour, so

4. The accompanying figure shows a network of one-way streets with traffic flowing in the directions indicated. The flow rates along the streets are measured as the average number of vehicles per hour. (a) Set up a linear system whose solution provides the unknown flow rates. (b) Solve the system for the unknown flow rates. (c) Is it possible to close the road from A to B for construction and keep traffic flowing on the other streets? Explain.

Figure Ex-4 In Exercises 5–8, analyze the given electrical circuits by finding the unknown currents. 5.

Answer:

6.

7.

Answer:

8.

In Exercises 9–12, write a balanced equation for the given chemical reaction. 9. Answer: , and 10. 11.

; the balanced equation is

Answer: ; the balanced equation is 12. 13. Find the quadratic polynomial whose graph passes through the points (1, 1), (2, 2), and (3, 5). Answer:

14. Find the quadratic polynomial whose graph passes through the points (0, 0), (−1, 1), and (1, 1). 15. Find the cubic polynomial whose graph passes through the points (−1, −1), (0, 1), (1, 3), (4, −1). Answer:

16. The accompanying figure shows the graph of a cubic polynomial. Find the polynomial.

Figure Ex-16 17. (a) Find an equation that represents the family of all second-degree polynomials that pass through the points (0, 1) and (1,2). [Hint: The equation will involve one arbitrary parameter that produces the members of the family when varied.] (b) By hand, or with the help of a graphing utility, sketch four curves in the family. Answer: (a) Using

as a parameter,

(b) The graphs for

where

.

, and 3 are shown.

18. In this section we have selected only a few applications of linear systems. Using the Internet as a search tool, try to find some more real-world applications of such systems. Select one that is of interest to you, and write a paragraph about it.

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a) In any network, the sum of the flows out of a node must equal the sum of the flows into a node. Answer: True (b) When a current passes through a resistor, there is an increase in the electrical potential in a circuit. Answer: False (c) Kirchhoff's current law states that the sum of the currents flowing into a node equals the sum of the currents flowing out of the node. Answer: True (d) A chemcial equation is called balanced if the total number of atoms on each side of the equation is the same. Answer: False (e) Given any n points in the xy-plane, there is a unique polynomial of degree those points. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

or less whose graph passes through

1.9 Leontief Input-Output Models In 1973 the economist Wassily Leontief was awarded the Nobel prize for his work on economic modeling in which he used matrix methods to study the relationships between different sectors in an economy. In this section we will discuss some of the ideas developed by Leontief.

Inputs and Outputs in an Economy One way to analyze an economy is to divide it into sectors and study how the sectors interact with one another. For example, a simple economy might be divided into three sectors—manufacturing, agriculture, and utilities. Typically, a sector will produce certain outputs but will require inputs from the other sectors and itself. For example, the agricultural sector may produce wheat as an output but will require inputs of farm machinery from the manufacturing sector, electrical power from the utilities sector, and food from its own sector to feed its workers. Thus, we can imagine an economy to be a network in which inputs and outputs flow in and out of the sectors; the study of such flows is called input-output analysis. Inputs and outputs are commonly measured in monetary units (dollars or millions of dollars, for example) but other units of measurement are also possible. The flows between sectors of a real economy are not always obvious. For example, in World War II the United States had a demand for 50,000 new airplanes that required the construction of many new aluminum manufacturing plants. This produced an unexpectedly large demand for certain copper electrical components, which in turn produced a copper shortage. The problem was eventually resolved by using silver borrowed from Fort Knox as a copper substitute. In all likelihood modern input-output analysis would have anticipated the copper shortage. Most sectors of an economy will produce outputs, but there may exist sectors that consume outputs without producing anything themselves (the consumer market, for example). Those sectors that do not produce outputs are called open sectors. Economies with no open sectors are called closed economies, and economies with one or more open sectors are called open economies (Figure 1.9.1). In this section we will be concerned with economies with one open sector, and our primary goal will be to determine the output levels that are required for the productive sectors to sustain themselves and satisfy the demand of the open sector.

Figure 1.9.1

Leontief Model of an Open Economy Let us consider a simple open economy with one open sector and three product-producing sectors: manufacturing, agriculture, and utilities. Assume that inputs and outputs are measured in dollars and that the inputs required by the

productive sectors to produce one dollar's worth of output are in accordance with Table 1. Table 1 Income Required per Dollar Output

Provider         Manufacturing    Agriculture    Utilities
Manufacturing    $ 0.50           $ 0.10         $ 0.10
Agriculture      $ 0.20           $ 0.50         $ 0.30
Utilities        $ 0.10           $ 0.30         $ 0.40

Wassily Leontief (1906–1999) Historical Note It is somewhat ironic that it was the Russian-born Wassily Leontief who won the Nobel prize in 1973 for pioneering the modern methods for analyzing free-market economies. Leontief was a precocious student who entered the University of Leningrad at age 15. Bothered by the intellectual restrictions of the Soviet system, he was put in jail for anti-Communist activities, after which he headed for the University of Berlin, receiving his Ph.D. there in 1928. He came to the United States in 1931, where he held professorships at Harvard and then New York University. [Image: © Bettmann/©Corbis]

Usually, one would suppress the labeling and express this matrix as

C =
[ 0.50  0.10  0.10 ]
[ 0.20  0.50  0.30 ]   (1)
[ 0.10  0.30  0.40 ]

This is called the consumption matrix (or sometimes the technology matrix) for the economy. The column vectors c1, c2, and c3 in C list the inputs required by the manufacturing, agricultural, and utilities sectors, respectively, to produce $1.00 worth of output. These are called the consumption vectors of the sectors. For example, c1 tells us that to produce $1.00 worth of output the manufacturing sector needs $0.50 worth of manufacturing output, $0.20 worth of agricultural output, and $0.10 worth of utilities output. What is the economic significance of the row sums of the consumption matrix?

Continuing with the above example, suppose that the open sector wants the economy to supply it manufactured goods, agricultural products, and utilities with dollar values: dollars of manufactured goods dollars of agricultural products dollars of utilities The column vector d that has these numbers as successive components is called the outside demand vector. Since the product-producing sectors consume some of their own output, the dollar value of their output must cover their own needs plus the outside demand. Suppose that the dollar values required to do this are dollars of manufactured goods dollars of agricultural products dollars of utilities The column vector x that has these numbers as successive components is called the production vector for the economy. For the economy with consumption matrix 1, that portion of the production vector x that will be consumed by the three productive sectors is

The vector Cx is called the intermediate demand vector for the economy. Once the intermediate demand is met, the portion of the production that is left to satisfy the outside demand is x − Cx. Thus, if the outside demand vector is d, then x must satisfy the equation

x − Cx = d

which we will find convenient to rewrite as

(I − C)x = d   (2)

The matrix I − C is called the Leontief matrix and (2) is called the Leontief equation.

E X A M P L E 1 Satisfying Outside Demand Consider the economy described in Table 1. Suppose that the open sector has a demand for $7900 worth of manufacturing products, $3950 worth of agricultural products, and $1975 worth of utilities. (a) Can the economy meet this demand? (b) If so, find a production vector x that will meet it exactly. Solution The consumption matrix, production vector, and outside demand vector are (3) To meet the outside demand, the vector x must satisfy the Leontief equation 2, so the problem reduces to solving the linear system

(4)

(if consistent). We leave it for you to show that the reduced row echelon form of the augmented matrix for this system is

This tells us that 4 is consistent, and the economy can satisfy the demand of the open sector exactly by producing $27,500 worth of manufacturing output, $33,750 worth of agricultural output, and $24,750 worth of utilities output.
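The computation in this example is easy to reproduce, since the consumption matrix comes from Table 1 and the outside demand is given above. A short NumPy sketch:

import numpy as np

C = np.array([[0.50, 0.10, 0.10],
              [0.20, 0.50, 0.30],
              [0.10, 0.30, 0.40]])        # consumption matrix from Table 1
d = np.array([7900.0, 3950.0, 1975.0])    # outside demand of Example 1

x = np.linalg.solve(np.eye(3) - C, d)     # Leontief equation (I - C)x = d
print(x)                                  # [27500. 33750. 24750.], matching the text
print(np.all(np.linalg.inv(np.eye(3) - C) >= 0))   # True: (I - C)^(-1) is nonnegative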

Productive Open Economies In the preceding discussion we considered an open economy with three product-producing sectors; the same ideas apply to an open economy with n product-producing sectors. In this case, the consumption matrix, production vector, and outside demand vector have the form

where all entries are nonnegative and

cij = the monetary value of the output of the ith sector that is needed by the jth sector to produce one unit of output
xi = the monetary value of the output of the ith sector
di = the monetary value of the output of the ith sector that is required to meet the demand of the open sector

Remark Note that the jth column vector of C contains the monetary values that the jth sector requires of the other sectors to produce one monetary unit of output, and the ith row vector of C contains the monetary values required of the ith sector by the other sectors for each of them to produce one monetary unit of output.

As discussed in our example above, a production vector x that meets the demand d of the outside sector must satisfy the Leontief equation

(I − C)x = d

If the matrix I − C is invertible, then this equation has the unique solution

x = (I − C)⁻¹d   (5)

for every demand vector d. However, for x to be a valid production vector it must have nonnegative entries, so the problem of importance in economics is to determine conditions under which the Leontief equation has a solution with nonnegative entries. It is evident from the form of (5) that if I − C is invertible, and if (I − C)⁻¹ has nonnegative entries, then for every demand vector d the corresponding x will also have nonnegative entries, and hence will be a valid production vector for the economy. Economies for which (I − C)⁻¹ has nonnegative entries are said to be productive. Such economies are desirable because demand can always be met by some level of production. The following theorem, whose proof can be found in many books on economics, gives conditions under which open economies are productive.

THEOREM 1.9.1 If C is the consumption matrix for an open economy, and if all of the column sums are less than 1, then the matrix I − C is invertible, the entries of (I − C)⁻¹ are nonnegative, and the economy is productive.

Remark The jth column sum of C represents the total dollar value of input that the jth sector requires to produce $1 of output, so if the jth column sum is less than 1, then the jth sector requires less than $1 of input to produce $1 of output; in this case we say that the jth sector is profitable. Thus, Theorem 1.9.1 states that if all product-producing sectors of an open economy are profitable, then the economy is productive. In the exercises we will ask you to show that an open economy is productive if all of the row sums of C are less than 1 (Exercise 11). Thus, an open economy is productive if either all of the column sums or all of the row sums of C are less than 1.
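The profitability condition of Theorem 1.9.1 and the nonnegativity of (I − C)⁻¹ are both easy to check numerically. A minimal sketch (the consumption matrix below is the same assumed one used earlier):

```python
import numpy as np

def all_sectors_profitable(C):
    """True if every column sum of C is less than 1 (hypothesis of Theorem 1.9.1)."""
    return bool(np.all(C.sum(axis=0) < 1))

def leontief_inverse(C):
    """(I - C)^(-1); nonnegative entries mean the economy is productive."""
    return np.linalg.inv(np.eye(C.shape[0]) - C)

C = np.array([[0.5, 0.1, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.4]])

print(all_sectors_profitable(C))          # True
print(np.all(leontief_inverse(C) >= 0))   # True, so the economy is productive
```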

E X A M P L E 2 An Open Economy Whose Sectors Are All Profitable The column sums of the consumption matrix C in (1) are less than 1, so (I − C)⁻¹ exists and has nonnegative entries. Use a calculating utility to confirm this, and use this inverse to solve Equation (4) in Example 1. Solution We leave it for you to show that

This matrix has nonnegative entries, and

which is consistent with the solution in Example 1.

Concept Review • Sectors • Inputs • Outputs • Input-output analysis • Open sector • Economies: open, closed

• Consumption (technology) matrix • Consumption vector • Outside demand vector • Production vector • Intermediate demand vector • Leontief matrix • Leontief equation

Skills • Construct a consumption matrix for an economy. • Understand the relationships among the vectors of a sector of an economy: consumption, outside demand, production, and intermediate demand.

Exercise Set 1.9 1. An automobile mechanic (M) and a body shop (B) use each other's services. For each $1.00 of business that M does, it uses $0.50 of its own services and $0.25 of B's services, and for each $1.00 of business that B does it uses $0.10 of its own services and $0.25 of M's services. (a) Construct a consumption matrix for this economy. (b) How much must M and B each produce to provide customers with $7000 worth of mechanical work and $14,000 worth of body work? Answer: (a) (b)

2. A simple economy produces food (F) and housing (H). The production of $1.00 worth of food requires $0.30 worth of food and $0.10 worth of housing, and the production of $1.00 worth of housing requires $0.20 worth of food and $0.60 worth of housing. (a) Construct a consumption matrix for this economy. (b) What dollar value of food and housing must be produced for the economy to provide consumers $130,000 worth of food and $130,000 worth of housing? 3. Consider the open economy described by the accompanying table, where the input is in dollars needed for $1.00 of output. (a) Find the consumption matrix for the economy. (b) Suppose that the open sector has a demand for $1930 worth of housing, $3860 worth of food, and $5790 worth of utilities. Use row reduction to find a production vector that will meet this demand exactly. Table Ex-3 Income Required per Dollar Output

Provider    Housing   Food    Utilities
Housing     $0.10     $0.60   $0.40
Food        $0.30     $0.20   $0.30
Utilities   $0.40     $0.10   $0.20

Answer: (a)

(b)

4. A company produces Web design, software, and networking services. View the company as an open economy described by the accompanying table, where input is in dollars needed for $1.00 of output. (a) Find the consumption matrix for the company. (b) Suppose that the customers (the open sector) have a demand for $5400 worth of Web design, $2700 worth of software, and $900 worth of networking. Use row reduction to find a production vector that will meet this demand exactly. Table Ex-4 Income Required per Dollar Output

Provider     Web Design   Software   Networking
Web Design   $0.40        $0.20      $0.45
Software     $0.30        $0.35      $0.30
Networking   $0.15        $0.10      $0.20

In Exercises 5–6, use matrix inversion to find the production vector x that meets the demand d for the consumption matrix C. 5. Answer:

6. 7. Consider an open economy with consumption matrix

(a) Show that the economy can meet a demand of units from the first sector and units from the second sector, but it cannot meet a demand of units from the first sector and unit from the second sector. (b) Give both a mathematical and an economic explanation of the result in part (a). 8. Consider an open economy with consumption matrix

If the open sector demands the same dollar value from each product-producing sector, which such sector must produce the greatest dollar value to meet the demand? 9. Consider an open economy with consumption matrix

Show that the Leontief equation

has a unique solution for every demand vector d if

.

10. (a) Consider an open economy with a consumption matrix C whose column sums are less than 1, and let x be the production vector that satisfies an outside demand d; that is, . Let be the demand vector that is obtained by increasing the jth entry of d by 1 and leaving the other entries fixed. Prove that the production vector that meets this demand is

(b) In words, what is the economic significance of the jth column vector of ? [Hint: Look at .]

11. Prove: If C is an matrix whose entries are nonnegative and whose row sums are less than 1, then is invertible and has nonnegative entries. [Hint: for any invertible matrix A.]

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a) Sectors of an economy that produce outputs are called open sectors. Answer: False (b) A closed economy is an economy that has no open sectors. Answer: True (c) The rows of a consumption matrix represent the outputs in a sector of an economy. Answer: False (d) If the column sums of the consumption matrix are all less than 1, then the Leontief matrix is invertible. Answer: True (e) The Leontief equation relates the production vector for an economy to the outside demand vector. Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

Chapter 1 Supplementary Exercises In Exercises 1–4 the given matrix represents an augmented matrix for a linear system. Write the corresponding set of linear equations for the system, and use Gaussian elimination to solve the linear system. Introduce free parameters as necessary. 1. Answer:

2.

3.

Answer:

4.

5. Use Gauss–Jordan elimination to solve for x′ and y′ in terms of x and y.

Answer:

6. Use Gauss–Jordan elimination to solve for x′ and y′ in terms of x and y.

7. Find positive integers that satisfy

Answer:

8. A box containing pennies, nickels, and dimes has 13 coins with a total value of 83 cents. How many coins of each type are in the box? 9. Let

be the augmented matrix for a linear system. Find for what values of a and b the system has (a) a unique solution. (b) a one-parameter solution. (c) a two-parameter solution. (d) no solution. Answer: (a) (b) (c) (d) 10. For which value(s) of a does the following system have zero solutions? One solution? Infinitely many solutions?

11. Find a matrix K such that

given that

Answer:

12. How should the coefficients a, b, and c be chosen so that the system

has the solution

, and

?

13. In each part, solve the matrix equation for X. (a)

(b) (c)

Answer: (a) (b) (c)

14. Let A be a square matrix. (a) Show that

if

.

(b) Show that if

.

15. Find values of a, b, and c such that the graph of the polynomial passes through the points (1, 2), (−1, 6), and (2, 3). Answer:

16. (Calculus required) Find values of a, b, and c such that the graph of the polynomial passes through the point (−1, 0) and has a horizontal tangent at (2, −9). 17. Let Jn be the

matrix each of whose entries is 1. Show that if

, then

18. Show that if a square matrix A satisfies then so does

.

19. Prove: If B is invertible, then 20. Prove: If A is invertible, then 21. Prove: If A is an

where

if and only if and

matrix and B is the

.

are both invertible or both not invertible. matrix each of whose entries is 1/n, then

is the average of the entries in the ith row of A.

22. (Calculus required) If the entries of the matrix

are differentiable functions of x, then we define

Show that if the entries in A and B are differentiable functions of x and the sizes of the matrices are such that the stated operations can be performed, then (a) (b) (c) 23. (Calculus required) Use part (c) of Exercise 22 to show that

State all the assumptions you make in obtaining this formula. 24. Assuming that the stated inverses exist, prove the following equalities.

(a) (b) (c)

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

CHAPTER

2

Determinants

CHAPTER CONTENTS 2.1. Determinants by Cofactor Expansion 2.2. Evaluating Determinants by Row Reduction 2.3. Properties of Determinants; Cramer's Rule

INTRODUCTION In this chapter we will study “determinants” or, more precisely, “determinant functions.” Unlike real-valued functions, such as , that assign a real number to a real variable x, determinant functions assign a real number to a matrix variable A. Although determinants first arose in the context of solving systems of linear equations, they are no longer used for that purpose in real-world applications. While they can be useful for solving very small linear systems (say, two or three unknowns), our main interest in them stems from the fact that they link together various concepts in linear algebra and provide a useful formula for the inverse of a matrix.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

2.1 Determinants by Cofactor Expansion In this section we will define the notion of a “determinant.” This will enable us to give a specific formula for the inverse of an invertible matrix, whereas up to now we have had only a computational procedure for finding it. This, in turn, will eventually provide us with a formula for solutions of certain kinds of linear systems. Recall from Theorem 1.4.5 that the

matrix

WARNING It is important to keep in mind that whereas A is a matrix.

is a number,

is invertible if and only if and that the expression that this determinant is denoted by writing

is called the determinant of the matrix A. Recall also

(1) and that the inverse of A can be expressed in terms of the determinant as (2)

Minors and Cofactors One of our main goals in this chapter is to obtain an analog of Formula 2 that is applicable to square matrices of all orders. For this purpose we will find it convenient to use subscripted entries when writing matrices or determinants. Thus, if we denote a matrix as

then the two equations in 1 take the form (3)

We define the determinant of a as

matrix

The following definition will be key to our goal of extending the definition of a determinant to higher order matrices.

DEFINITION 1 If A is a square matrix, then the minor of entry aᵢⱼ is denoted by Mᵢⱼ and is defined to be the determinant of the submatrix that remains after the ith row and jth column are deleted from A. The number (−1)ⁱ⁺ʲMᵢⱼ is denoted by Cᵢⱼ and is called the cofactor of entry aᵢⱼ.
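As a computational companion to Definition 1 (a sketch, not part of the text), the minor is the determinant of the submatrix with one row and one column deleted, and the cofactor attaches the sign (−1)ⁱ⁺ʲ. Note that the indices below are 0-based, as is usual in code, while the definition uses 1-based indices.

```python
import numpy as np

def minor(A, i, j):
    """M_ij: determinant of A with row i and column j deleted (0-based indices)."""
    sub = np.delete(np.delete(np.asarray(A, dtype=float), i, axis=0), j, axis=1)
    return np.linalg.det(sub)

def cofactor(A, i, j):
    """C_ij = (-1)**(i + j) * M_ij."""
    return (-1) ** (i + j) * minor(A, i, j)
```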

E X A M P L E 1 Finding Minors and Cofactors Let

WARNING We have followed the standard convention of using capital letters to denote minors and cofactors even though they are numbers, not matrices. The minor of entry

The cofactor of

is

is

Similarly, the minor of entry

The cofactor of

is

is

Historical Note The term determinant was first introduced by the German mathematician Carl Friedrich Gauss in 1801 (see p. 15), who used them to “determine” properties of certain kinds of functions. Interestingly, the term matrix is derived from a Latin word for “womb” because it was viewed as a container of determinants.

Historical Note The term minor is apparently due to the English mathematician James Sylvester (see p. 34), who wrote the following in a paper published in 1850: “Now conceive any one line and any one column be struck out, we get… a square, one term less in breadth and depth than the original square; and by varying in every possible selection of the line and column excluded, we obtain, supposing the original square to consist of n lines and n columns, such minor squares, each of which will represent what I term a “First Minor Determinant” relative to the principal or complete determinant.”

Remark Note that a minor Mᵢⱼ and its corresponding cofactor Cᵢⱼ are either the same or negatives of each other and that the relating sign (−1)ⁱ⁺ʲ is either +1 or −1 in accordance with the pattern in the “checkerboard” array

For example, and so forth. Thus, it is never really necessary to calculate (−1)ⁱ⁺ʲ to calculate Cᵢⱼ—you can simply compute the minor Mᵢⱼ and then adjust the sign in accordance with the checkerboard pattern. Try this in Example 1.

E X A M P L E 2 Cofactor Expansions of a 2 × 2 Matrix The checkerboard pattern for a 2 × 2 matrix is

so that

We leave it for you to use Formula 3 to verify that det(A) can be expressed in terms of cofactors in the following four ways:

(4)

Each of the last four equations is called a cofactor expansion of det(A). In each cofactor expansion the entries and cofactors all come from the same row or same column of A. For example, in the first equation the entries and cofactors all come from the first row of A, in the second they all come from the second row of A, in the third they all come from the first column of A, and in the fourth they all come from the second column of A.

Definition of a General Determinant Formula 4 is a special case of the following general result, which we will state without proof.

THEOREM 2.1.1 If A is an matrix, then regardless of which row or column of A is chosen, the number obtained by multiplying the entries in that row or column by the corresponding cofactors and adding the resulting products is always the same.

This result allows us to make the following definition.

DEFINITION 2 If A is an matrix, then the number obtained by multiplying the entries in any row or column of A by the corresponding cofactors and adding the resulting products is called the determinant of A, and the sums themselves are called cofactor expansions of A. That is, (5) and (6)

E X A M P L E 3 Cofactor Expansion Along the First Row Find the determinant of the matrix

by cofactor expansion along the first row. Solution

E X A M P L E 4 Cofactor Expansion Along the First Column Let A be the matrix in Example 3, and evaluate

by cofactor expansion along the first column of A.

Solution

This agrees with the result obtained in Example 3.

Note that in Example 4 we had to compute three cofactors, whereas in Example 3 only two were needed because the third was multiplied by zero. As a rule, the best strategy for cofactor expansion is to expand along a row or column with the most zeros.
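The cofactor expansion itself can be coded directly for small matrices. The sketch below expands along the first row, as in Definition 2; it is meant only as an illustration, since the number of operations grows explosively with the size of the matrix.

```python
def det_by_cofactors(A):
    """Determinant of a square matrix (list of lists) by expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Submatrix obtained by deleting row 0 and column j.
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_by_cofactors(sub)
    return total

# An arbitrary 3 x 3 test matrix.
print(det_by_cofactors([[3, 1, 0], [-2, -4, 3], [5, 4, -2]]))   # -1
```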

Charles Lutwidge Dodgson (Lewis Carroll) (1832–1898) Historical Note Cofactor expansion is not the only method for expressing the determinant of a matrix in terms of determinants of lower order. For example, although it is not well known, the English mathematician Charles Dodgson, who was the author of Alice's Adventures in Wonderland and Through the Looking Glass under the pen name of Lewis Carroll, invented such a method, called “condensation.” That method has recently been resurrected from obscurity because of its suitability for parallel processing on computers. [Image: Time & Life Pictures/Getty Images, Inc.]

E X A M P L E 5 Smart Choice of Row or Column If A is the matrix

then to find det(A) it will be easiest to use cofactor expansion along the second column, since it has the most zeros.

E X A M P L E 6 Determinant of an Upper Triangular Matrix

The following computation shows that the determinant of a upper triangular matrix is the product of its diagonal entries. Each part of the computation uses a cofactor expansion along the first row.

The method illustrated in Example 6 can be easily adapted to prove the following general result.

THEOREM 2.1.2 If A is an n × n triangular matrix (upper triangular, lower triangular, or diagonal), then det(A) is the product of the entries on the main diagonal of the matrix; that is, det(A) = a₁₁a₂₂ ⋯ aₙₙ.
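A quick numerical check of Theorem 2.1.2, using an arbitrary upper triangular matrix:

```python
import numpy as np

A = np.array([[2.0, 7.0, -3.0],
              [0.0, 5.0,  1.0],
              [0.0, 0.0,  4.0]])   # upper triangular

print(np.prod(np.diag(A)))   # 40.0, the product of the diagonal entries
print(np.linalg.det(A))      # 40.0 as well (up to round-off)
```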

A Useful Technique for Evaluating 2 × 2 and 3 × 3 Determinants Determinants of 2 × 2 and 3 × 3 matrices can be evaluated very efficiently using the pattern suggested in Figure 2.1.1.

Figure 2.1.1 In the 2 × 2 case, the determinant can be computed by forming the product of the entries on the rightward arrow and subtracting the product of the entries on the leftward arrow. In the 3 × 3 case we first recopy the first and second columns as shown in the figure, after which we can compute the determinant by summing the products of the entries on the rightward arrows and subtracting the products on the leftward arrows. These procedures execute the computations

WARNING The arrow technique only works for determinants of 2 × 2 and 3 × 3 matrices.

which agrees with the cofactor expansions along the first row.
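The 3 × 3 arrow pattern (often called the rule of Sarrus) translates directly into code. A minimal sketch, assuming a is a 3 × 3 array or nested list:

```python
def det3_arrow(a):
    """3 x 3 determinant by the arrow pattern: rightward products minus leftward products."""
    return (a[0][0] * a[1][1] * a[2][2]
            + a[0][1] * a[1][2] * a[2][0]
            + a[0][2] * a[1][0] * a[2][1]
            - a[0][2] * a[1][1] * a[2][0]
            - a[0][0] * a[1][2] * a[2][1]
            - a[0][1] * a[1][0] * a[2][2])

print(det3_arrow([[3, 1, 0], [-2, -4, 3], [5, 4, -2]]))   # -1, matching the cofactor expansion
```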

E X A M P L E 7 A Technique for Evaluating 2 × 2 and 3 × 3 Determinants

Concept Review • Determinant • Minor • Cofactor • Cofactor expansion

Skills • Find the minors and cofactors of a square matrix. • Use cofactor expansion to evaluate the determinant of a square matrix. • Use the arrow technique to evaluate the determinant of a 2 × 2 or 3 × 3 matrix. • Use the determinant of a 2 × 2 invertible matrix to find the inverse of that matrix.

• Find the determinant of an upper triangular, lower triangular, or diagonal matrix by inspection.

Exercise Set 2.1 In Exercises 1–2, find all the minors and cofactors of the matrix A. 1.

Answer:

2.

3. Let

Find (a) (b) (c) (d) Answer: (a) (b) (c) (d) 4. Let

Find (a) (b) (c) (d) In Exercises 5–8, evaluate the determinant of the given matrix. If the matrix is invertible, use Equation 2 to find its inverse. 5. Answer:

6. 7. Answer:

8.

In Exercises 9–14, use the arrow technique to evaluate the determinant of the given matrix. 9. Answer:

10.

11.

Answer: 12.

13.

Answer: 14.

In Exercises 15–18, find all values of λ for which det(A) = 0.

15. Answer: 16.

17. Answer: 18.

19. Evaluate the determinant of the matrix in Exercise 13 by a cofactor expansion along (a) the first row. (b) the first column. (c) the second row. (d) the second column. (e) the third row. (f) the third column. Answer:

20. Evaluate the determinant of the matrix in Exercise 12 by a cofactor expansion along (a) the first row. (b) the first column. (c) the second row. (d) the second column. (e) the third row. (f) the third column. In Exercises 21–26, evaluate 21.

Answer: 22.

by a cofactor expansion along a row or column of your choice.

23.

Answer: 0 24.

25.

Answer: 26.

In Exercises 27–32, evaluate the determinant of the given matrix by inspection. 27.

Answer: 28.

29.

Answer: 0 30.

31.

Answer: 6 32.

33. Show that the value of the following determinant is independent of θ.

Answer: The determinant is

.

34. Show that the matrices

commute if and only if

35. By inspection, what is the relationship between the following determinants?

Answer:

36. Show that

for every

matrix A.

37. What can you say about an nth-order determinant all of whose entries are 1? Explain your reasoning. 38. What is the maximum number of zeros that a matrix can have without having a zero determinant? Explain your reasoning. 39. What is the maximum number of zeros that a matrix can have without having a zero determinant? Explain your reasoning.

40. Prove that

,

, and

are collinear points if and only if

41. Prove that the equation of the line through the distinct points

and

can be written as

42. Prove that if A is upper triangular and is upper triangular if .

is the matrix that results when the ith row and jth column of A are deleted, then

True-False Exercises In parts (a)–(i) determine whether the statement is true or false, and justify your answer. (a)

The determinant of the

matrix

is

.

Answer: False (b) Two square matrices A and B can have the same determinant only if they are the same size. Answer: False (c) The minor

is the same as the cofactor

if and only if

is even.

Answer: True (d) If A is a

symmetric matrix, then

for all i and j.

Answer: True (e) The value of a cofactor expansion of a matrix A is independent of the row or column chosen for the expansion. Answer: True (f) The determinant of a lower triangular matrix is the sum of the entries along its main diagonal. Answer: False (g) For every square matrix A and every scalar c, we have

.

Answer: False (h) For all square matrices A and B, we have Answer: False

.

(i) For every

matrix A, we have

Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

.

2.2 Evaluating Determinants by Row Reduction In this section we will show how to evaluate a determinant by reducing the associated matrix to row echelon form. In general, this method requires less computation than cofactor expansion and hence is the method of choice for large matrices.

A Basic Theorem We begin with a fundamental theorem that will lead us to an efficient procedure for evaluating the determinant of a square matrix of any size.

THEOREM 2.2.1 Let A be a square matrix. If A has a row of zeros or a column of zeros, then det(A) = 0.

Proof Since the determinant of A can be found by a cofactor expansion along any row or column, we can use the row or column of zeros. Thus, if we let denote the cofactors of A along that row or column, then it follows from Formula 5 or 6 in Section 2.1 that

The following useful theorem relates the determinant of a matrix and the determinant of its transpose.

THEOREM 2.2.2 Let A be a square matrix. Then det(A) = det(Aᵀ).

Because transposing a matrix changes its columns to rows and its rows to columns, almost every theorem about the rows of a determinant has a companion version about columns, and vice versa. Proof Since transposing a matrix changes its columns to rows and its rows to columns, the cofactor expansion of A along any row is the same as the cofactor expansion of Aᵀ along the corresponding column. Thus, both have the same determinant.

Elementary Row Operations The next theorem shows how an elementary row operation on a square matrix affects the value of its determinant. In

place of a formal proof we have provided a table to illustrate the ideas in the

case (see Table 1).

THEOREM 2.2.3 Let A be an n × n matrix.

(a) If B is the matrix that results when a single row or single column of A is multiplied by a scalar k, then det(B) = k det(A).

(b) If B is the matrix that results when two rows or two columns of A are interchanged, then det(B) = −det(A).

(c) If B is the matrix that results when a multiple of one row of A is added to another row or when a multiple of one column is added to another column, then det(B) = det(A).
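All three parts of the theorem are easy to confirm numerically. A small sketch with an arbitrary 3 × 3 matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
d = np.linalg.det(A)                         # -3.0 for this matrix

B1 = A.copy(); B1[1] *= 5.0                  # multiply a row by k = 5
B2 = A.copy(); B2[[0, 2]] = B2[[2, 0]]       # interchange two rows
B3 = A.copy(); B3[2] += 4.0 * B3[0]          # add a multiple of one row to another

print(np.isclose(np.linalg.det(B1), 5 * d))  # True, part (a)
print(np.isclose(np.linalg.det(B2), -d))     # True, part (b)
print(np.isclose(np.linalg.det(B3), d))      # True, part (c)
```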

The first panel of Table 1 shows that you can bring a common factor from any row (column) of a determinant through the determinant sign. This is a slightly different way of thinking about part (a) of Theorem 2.2.3. Table 1

We will verify the first equation in Table 1 and leave the other two for you. To start, note that the determinants on the two sides of the equation differ only in the first row, so these determinants have the same cofactors, , , , along that row (since those cofactors depend only on the entries in the second two rows). Thus, expanding the left side by cofactors along the first row yields

Elementary Matrices It will be useful to consider the special case of Theorem 2.2.3 in which is the identity matrix and E (rather than B) denotes the elementary matrix that results when the row operation is performed on . In this special case Theorem 2.2.3 implies the following result.

THEOREM 2.2.4 Let E be an n × n elementary matrix.

(a) If E results from multiplying a row of Iₙ by a nonzero number k, then det(E) = k.

(b) If E results from interchanging two rows of Iₙ, then det(E) = −1.

(c) If E results from adding a multiple of one row of Iₙ to another, then det(E) = 1.

E X A M P L E 1 Determinants of Elementary Matrices The following determinants of elementary matrices, which are evaluated by inspection, illustrate Theorem 2.2.4.

Observe that the determinant of an elementary matrix cannot be zero.

Matrices with Proportional Rows or Columns If a square matrix A has two proportional rows, then a row of zeros can be introduced by adding a suitable multiple of one

of the rows to the other. Similarly for columns. But adding a multiple of one row or column to another does not change the determinant, so from Theorem 2.2.1, we must have det(A) = 0. This proves the following theorem.

THEOREM 2.2.5 If A is a square matrix with two proportional rows or two proportional columns, then det(A) = 0.

E X A M P L E 2 Introducing Zero Rows The following computation shows how to introduce a row of zeros when there are two proportional rows.

Each of the following matrices has two proportional rows or columns; thus, each has a determinant of zero.

Evaluating Determinants by Row Reduction We will now give a method for evaluating determinants that involves substantially less computation than cofactor expansion. The idea of the method is to reduce the given matrix to upper triangular form by elementary row operations, then compute the determinant of the upper triangular matrix (an easy computation), and then relate that determinant to that of the original matrix. Here is an example.

E X A M P L E 3 Using Row Reduction to Evaluate a Determinant Evaluate

where

Solution We will reduce A to row echelon form (which is upper triangular) and then apply Theorem 2.1.2.

Even with today's fastest computers it would take millions of years to calculate a determinant by cofactor expansion, so

methods based on row reduction are often used for large determinants. For determinants of small size (such as those in this text), cofactor expansion is often a reasonable choice.
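A sketch of the row-reduction method: reduce the matrix to upper triangular form, keeping track of row interchanges (each flips the sign), and then multiply the diagonal entries. Partial pivoting is used here only so that a zero pivot never forces a division by zero.

```python
import numpy as np

def det_by_row_reduction(A):
    """Determinant via reduction to upper triangular form by row operations."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        # Choose the largest pivot in column k (a row interchange flips the sign).
        p = k + int(np.argmax(np.abs(U[k:, k])))
        if U[p, k] == 0.0:
            return 0.0                     # no nonzero pivot in this column: det = 0
        if p != k:
            U[[k, p]] = U[[p, k]]
            sign = -sign
        # Adding multiples of the pivot row to the rows below leaves det unchanged.
        for i in range(k + 1, n):
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    return sign * float(np.prod(np.diag(U)))

print(det_by_row_reduction([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3.0 (up to round-off)
```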

E X A M P L E 4 Using Column Operations to Evaluate a Determinant Compute the determinant of

Solution This determinant could be computed as above by using elementary row operations to reduce A to row echelon form, but we can put A in lower triangular form in one step by adding −3 times the first column to the fourth to obtain

Example 4 points out that it is always wise to keep an eye open for column operations that can shorten

computations.

Cofactor expansion and row or column operations can sometimes be used in combination to provide an effective method for evaluating determinants. The following example illustrates this idea.

E X A M P L E 5 Row Operations and Cofactor Expansion Evaluate

where

Solution By adding suitable multiples of the second row to the remaining rows, we obtain

Skills • Know the effect of elementary row operations on the value of a determinant. • Know the determinants of the three types of elementary matrices. • Know how to introduce zeros into the rows or columns of a matrix to facilitate the evaluation of its determinant. • Use row reduction to evaluate the determinant of a matrix. • Use column operations to evaluate the determinant of a matrix. • Combine the use of row reduction and cofactor expansion to evaluate the determinant of a matrix.

Exercise Set 2.2 In Exercises 1–4, verify that 1.

.

2. 3.

4.

In Exercises 5–9, find the determinant of the given elementary matrix by inspection. 5.

Answer: 6.

7.

Answer: 8.

9.

Answer: 1 In Exercises 10–17, evaluate the determinant of the given matrix by reducing the matrix to row echelon form. 10.

11.

Answer: 5 12.

13.

Answer: 33 14.

15.

Answer: 6 16.

17.

Answer: 18. Repeat Exercises 10–13 by using a combination of row reduction and cofactor expansion. 19. Repeat Exercises 14–17 by using a combination of row operations and cofactor expansion. Answer: Exercise 14: 39; Exercise 15: 6; Exercise 16:

; Exercise 17:

In Exercises 20–27, evaluate the determinant, given that

20.

21.

Answer: 22.

23.

Answer: 72 24.

25.

Answer: 26.

27.

Answer: 18 28. Show that

(a)

(b)

29. Use row reduction to show that

In Exercises 30–33, confirm the identities without evaluating the determinants directly. 30.

31.

32.

33.

34. Find the determinant of the following matrix.

In Exercises 35–36, show that

without directly evaluating the determinant.

35.

36.

True-False Exercises In parts (a)–(f) determine whether the statement is true or false, and justify your answer.

(a) If A is a rows, then

matrix and B is obtained from A by interchanging the first two rows and then interchanging the last two .

Answer: True (b) If A is a by , then

matrix and B is obtained from A by multiplying the first column by 4 and multiplying the third column .

Answer: True (c) If A is a then

matrix and B is obtained from A by adding 5 times the first row to each of the second and third rows, .

Answer: False (d) If A is an

matrix and B is obtained from A by multiplying each row of A by its row number, then

Answer: False (e) If A is a square matrix with two identical columns, then det(A) = 0.

Answer: True (f) If the sum of the second and fourth row vectors of a matrix A is equal to the last row vector, then det(A) = 0. Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

2.3 Properties of Determinants; Cramer's Rule In this section we will develop some fundamental properties of matrices, and we will use these results to derive a formula for the inverse of an invertible matrix and formulas for the solutions of certain kinds of linear systems.

Basic Properties of Determinants Suppose that A and B are n × n matrices and k is any scalar. We begin by considering possible relationships between det(kA), det(A + B), and det(AB). Since a common factor of any row of a matrix can be moved through the determinant sign, and since each of the n rows in kA has a common factor of k, it follows that

(1) det(kA) = kⁿ det(A)

For example,

Unfortunately, no simple relationship exists among det(A), det(B), and det(A + B). In particular, we emphasize that det(A + B) will usually not be equal to det(A) + det(B). The following example illustrates this fact.

E X A M P L E 1 det(A + B) ≠ det(A) + det(B) Consider

We have

,

, and

; thus

In spite of the previous example, there is a useful relationship concerning sums of determinants that is applicable when the matrices involved are the same except for one row (column). For example, consider the following two matrices that differ only in the second row:

Calculating the determinants of A and B we obtain

Thus

This is a special case of the following general result.

THEOREM 2.3.1 Let A, B, and C be matrices that differ only in a single row, say the rth, and assume that the rth row of C can be obtained by adding corresponding entries in the rth rows of A and B. Then The same result holds for columns.

E X A M P L E 2 Sums of Determinants We leave it to you to confirm the following equality by evaluating the determinants.

Determinant of a Matrix Product Considering the complexity of the formulas for determinants and matrix multiplication, it would seem unlikely that a simple relationship should exist between them. This is what makes the simplicity of our next result so surprising. We will show that if A and B are square matrices of the same size, then

(2) det(AB) = det(A) det(B)

The proof of this theorem is fairly intricate, so we will have to develop some preliminary results first. We begin with the special case of (2) in which A is an elementary matrix. Because this special case is only a prelude to (2), we call it a lemma.

LEMMA 2.3.2 If B is an n × n matrix and E is an n × n elementary matrix, then

det(EB) = det(E) det(B)

Proof We will consider three cases, each in accordance with the row operation that produces the matrix E.

Case 1 If E results from multiplying a row of Iₙ by k, then by Theorem 1.5.1, EB results from B by multiplying the corresponding row by k; so from Theorem 2.2.3(a) we have det(EB) = k det(B). But from Theorem 2.2.4(a) we have det(E) = k, so

det(EB) = det(E) det(B)

Case 2 and 3 The proofs of the cases where E results from interchanging two rows of Iₙ or from adding a multiple of one row to another follow the same pattern as Case 1 and are left as exercises.

Remark It follows by repeated applications of Lemma 2.3.2 that if B is an n × n matrix and E₁, E₂, …, Eᵣ are n × n elementary matrices, then

(3) det(E₁E₂ ⋯ EᵣB) = det(E₁) det(E₂) ⋯ det(Eᵣ) det(B)

Determinant Test for Invertibility Our next theorem provides an important criterion for determining whether a matrix is invertible. It also takes us a step closer to establishing Formula 2.

THEOREM 2.3.3 A square matrix A is invertible if and only if det(A) ≠ 0.

Proof Let R be the reduced row echelon form of A. As a preliminary step, we will show that and are both zero or both nonzero: Let be the elementary matrices that correspond to the elementary row operations that produce R from A. Thus

and from 3,

(4) We pointed out in the margin note that accompanies Theorem 2.2.4 that the determinant of an elementary matrix is nonzero. Thus, it follows from Formula 4 that det(A) and det(R) are either both zero or both nonzero, which sets the stage for the main part of the proof. If we assume first that A is invertible, then it follows from Theorem 1.6.4 that R = Iₙ and hence that det(R) = 1 ≠ 0. This, in turn, implies that det(A) ≠ 0, which is what we wanted to show.

It follows from Theorems 2.3.3 and Theorem 2.2.5 that a square matrix with two proportional rows or two proportional columns is not invertible.

Conversely, assume that det(A) ≠ 0. It follows from this that det(R) ≠ 0, which tells us that R cannot have a row of zeros. Thus, it follows from Theorem 1.4.3 that R = Iₙ and hence that A is invertible by Theorem 1.6.4.

E X A M P L E 3 Determinant Test for Invertibility Since the first and third rows of

are proportional,

. Thus A is not invertible.

We are now ready for the main result concerning products of matrices.

THEOREM 2.3.4 If A and B are square matrices of the same size, then det(AB) = det(A) det(B).

Proof We divide the proof into two cases that depend on whether or not A is invertible. If the matrix A is not invertible, then by Theorem 1.6.5 neither is the product AB. Thus, from Theorem Theorem 2.3.3, we have and , so it follows that .

Augustin Louis Cauchy (1789–1857) Historical Note In 1815 the great French mathematician Augustin Cauchy published a landmark paper in which he gave the first systematic and modern treatment of determinants. It was in that paper that Theorem 2.3.4 was stated and proved in full generality for the first time. Special cases of the theorem had been stated and proved earlier, but it was Cauchy who made the final jump. [Image: The Granger Collection, New York]

Now assume that A is invertible. By Theorem 1.6.4, the matrix A is expressible as a product of elementary matrices, say (5) so Applying 3 to this equation yields and applying 3 again yields which, from 5, can be written as

.

E X A M P L E 4 Verifying That det(AB) = det(A) det(B) Consider the matrices

We leave it for you to verify that Thus

, as guaranteed by Theorem 2.3.4.

The following theorem gives a useful relationship between the determinant of an invertible matrix and the determinant of its inverse.

THEOREM 2.3.5 If A is invertible, then

det(A⁻¹) = 1 / det(A)

Proof Since A⁻¹A = I, it follows that det(A⁻¹A) = det(I). Therefore, we must have det(A⁻¹) det(A) = 1. Since det(A) ≠ 0, the proof can be completed by dividing through by det(A).

Adjoint of a Matrix In a cofactor expansion we compute by multiplying the entries in a row or column by their cofactors and adding the resulting products. It turns out that if one multiplies the entries in any row by the corresponding cofactors from a different row, the sum of these products is always zero. (This result also holds for columns.) Although we omit the general proof, the next example illustrates the idea of the proof in a special case.

It follows from Theorems 2.3.5 and 2.1.2 that

Moreover, by using the adjoint formula it is possible to show that

are actually the successive diagonal entries of (compare A and in Example 3 of Section 1.7 ).

E X A M P L E 5 Entries and Cofactors from Different Rows Let

Consider the quantity that is formed by multiplying the entries in the first row by the cofactors of the corresponding entries in the third row and adding the resulting products. We can show that this quantity is equal to zero by the following trick: Construct a new matrix by replacing the third row of A with another copy of the first row. That is,

Let , , be the cofactors of the entries in the third row of and are the same, and since the computations of , , , entries from the first two rows of A and , it follows that Since

. Since the first two rows of A , , and involve only

has two identical rows, it follows from 3 that (6)

On the other hand, evaluating

by cofactor expansion along the third row gives (7)

From 6 and 7 we obtain

DEFINITION 1 If A is any

matrix and

is the cofactor of

, then the matrix

is called the matrix of cofactors from A. The transpose of this matrix is called the adjoint of A and is denoted by adj(A).
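A computational sketch of this definition: build the matrix of cofactors entry by entry and transpose it. Dividing adj(A) by det(A) then reproduces A⁻¹, as the theorem that follows shows; the matrix used below is an arbitrary invertible example.

```python
import numpy as np

def adjoint(A):
    """Transpose of the matrix of cofactors of a square matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(sub)
    return cof.T

A = np.array([[3.0, 2.0, -1.0],
              [1.0, 6.0,  3.0],
              [2.0, -4.0, 0.0]])
print(adjoint(A) / np.linalg.det(A))   # agrees with np.linalg.inv(A)
print(np.linalg.inv(A))
```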

E X A M P L E 6 Adjoint of a 3 × 3 Matrix Let

The cofactors of A are

so the matrix of cofactors is

and the adjoint of A is

Leonard Eugene Dickson (1874–1954) Historical Note The use of the term adjoint for the transpose of the matrix of cofactors appears to have been introduced by the American mathematician L. E. Dickson in a research paper that he published in 1902. [Image: Courtesy of the American Mathematical Society]

In Theorem 1.4.5 we gave a formula for the inverse of a result to invertible matrices.

invertible matrix. Our next theorem extends that

THEOREM 2.3.6 Inverse of a Matrix Using Its Adjoint If A is an invertible matrix, then

(8) A⁻¹ = (1 / det(A)) adj(A)

Proof We show first that

Consider the product

The entry in the ith row and jth column of the product

is (9)

(see the shaded lines above). If , then 9 is the cofactor expansion of along the ith row of A (Theorem 2.1.1), and if a's and the cofactors come from different rows of A, so the value of 9 is zero. Therefore,

, then the

(10)

Since A is invertible,

. Therefore, Equation 10 can be rewritten as

Multiplying both sides on the left by

yields

E X A M P L E 7 Using the Adjoint to Find an Inverse Matrix Use 8 to find the inverse of the matrix A in Example 6. Solution We leave it for you to check that

. Thus

Cramer's Rule Our next theorem uses the formula for the inverse of an invertible matrix to produce a formula, called Cramer's rule, for the solution of a linear system of n equations in n unknowns in the case where the coefficient matrix A is invertible (or, equivalently, when det(A) ≠ 0).

THEOREM 2.3.7 Cramer's Rule If Ax = b is a system of n linear equations in n unknowns such that det(A) ≠ 0, then the system has a unique solution. This solution is

where

, then the system has a

is the matrix obtained by replacing the entries in the jth column of A by the entries in the matrix

Proof If , then A is invertible, and by Theorem 1.6.2, Therefore, by Theorem 2.3.6 we have

is the unique solution of

.

Multiplying the matrices out gives

The entry in the jth row of x is therefore (11) Now let

Since differs from A only in the jth column, it follows that the cofactors of entries in same as the cofactors of the corresponding entries in the jth column of A. The cofactor expansion of along the jth column is therefore

are the

Substituting this result in 11 gives

E X A M P L E 8 Using Cramer's Rule to Solve a Linear System Use Cramer's rule to solve

Gabriel Cramer (1704–1752) Historical Note Variations of Cramer's rule were fairly well known before the Swiss mathematician discussed it in work he published in 1750. It was Cramer's superior notation that popularized the method and led mathematicians to attach his name to it. [Image: Granger Collection]

Solution

For n > 3, it is usually more efficient to solve a linear system with n equations in n unknowns by Gauss–Jordan elimination than by Cramer's rule. Its main use is for obtaining properties of solutions of a linear system without actually solving the system.

Therefore,
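Cramer's rule is also easy to state as a procedure. A minimal sketch (for larger systems Gauss–Jordan elimination is more efficient, as noted above; the 2 × 2 system at the bottom is just an arbitrary test case):

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule; requires det(A) != 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("Cramer's rule does not apply: det(A) = 0")
    x = np.empty(A.shape[0])
    for j in range(A.shape[0]):
        Aj = A.copy()
        Aj[:, j] = b                  # replace the jth column of A by b
        x[j] = np.linalg.det(Aj) / d
    return x

A = [[2.0, 1.0], [3.0, -4.0]]
b = [7.0, -6.0]
print(cramer(A, b))               # [2. 3.]
print(np.linalg.solve(A, b))      # same solution
```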

Equivalence Theorem In Theorem 1.6.4 we listed five results that are equivalent to the invertibility of a matrix A. We conclude this section by merging Theorem 2.3.3 with that list to produce the following theorem that relates all of the major topics we have studied thus far.

THEOREM 2.3.8 Equivalent Statements If A is an

matrix, then the following statements are equivalent.

(a) A is invertible. (b)

has only the trivial solution.

(c) The reduced row echelon form of A is

.

(d) A can be expressed as a product of elementary matrices. (e)

is consistent for every

(f)

has exactly one solution for every

(g)

matrix b. matrix b.

.

OPTIONAL

We now have all of the machinery necessary to prove the following two results, which we stated without proof in Theorem 1.7.1: • Theorem 1.7.1(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero.

• Theorem 1.7.1(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an invertible upper triangular matrix is upper triangular. Proof of Theorem 1.7.1(c) Let

be a triangular matrix, so that its diagonal entries are

From Theorem 2.1.2, the matrix A is invertible if and only if is nonzero, which is true if and only if the diagonal entries are all nonzero. Proof of Theorem 1.7.1(d) We will prove the result for upper triangular matrices and leave the lower triangular case for you. Assume that A is upper triangular and invertible. Since

we can prove that is upper triangular by showing that is upper triangular or, equivalently, that the matrix of cofactors is lower triangular. We can do this by showing that every cofactor with (i.e., above the main diagonal) is zero. Since

it suffices to show that each minor with the ith row and jth column of A are deleted, so

is zero. For this purpose, let

be the matrix that results when

(12) From the assumption that , it follows that is upper triangular (see Figure Figure 1.7.1). Since A is upper triangular, its -st row begins with at least i zeros. But the ith row of is the -st row of A with the entry in the jth column removed. Since , none of the first i zeros is removed by deleting the jth column; thus the ith row of starts with at least i zeros, which implies that this row has a zero on the main diagonal. It now follows from Theorem 2.1.2 that and from 12 that .

Concept Review • Determinant test for invertibility • Matrix of cofactors • Adjoint of a matrix • Cramer's rule • Equivalent statements about an invertible matrix

Skills • Know how determinants behave with respect to basic arithmetic operations, as given in Equation 1, Theorem 2.3.1, Lemma 2.3.2, and Theorem 2.3.4. • Use the determinant to test a matrix for invertibility.

• Know how

and

are related.

• Compute the matrix of cofactors for a square matrix A. • Compute

for a square matrix A.

• Use the adjoint of an invertible matrix to find its inverse. • Use Cramer's rule to solve linear systems of equations. • Know the equivalent characterizations of an invertible matrix given in Theorem 2.3.8.

Exercise Set 2.3 In Exercises 1–4, verify that

.

1. 2. 3.

4.

In Exercises 5–6, verify that

and determine whether the equality holds.

5.

6.

In Exercises 7–14, use determinants to decide whether the given matrix is invertible. 7.

Answer: Invertible

8.

9.

Answer: Invertible 10.

11.

Answer: Not invertible 12.

13.

Answer: Invertible 14.

In Exercises 15–18, find the values of k for which A is invertible. 15. Answer:

16.

17.

Answer: 18.

In Exercises 19–23, decide whether the given matrix is invertible, and if so, use the adjoint method to find its inverse. 19.

Answer:

20.

21.

Answer:

22.

23.

Answer:

In Exercises 24–29, solve by Cramer's rule, where it applies. 24. 25.

Answer:

26.

27.

Answer:

28.

29.

Answer: Cramer's rule does not apply. 30. Show that the matrix

is invertible for all values of θ; then find

using Theorem 2.3.6.

31. Use Cramer's rule to solve for y without solving for the unknowns x, z, and w.

Answer:

32. Let

be the system in Exercise 31.

(a) Solve by Cramer's rule. (b) Solve by Gauss–Jordan elimination. (c) Which method involves fewer computations? 33. Prove that if

and all the entries in A are integers, then all the entries in

are integers.

34. Let be a system of n linear equations in n unknowns with integer coefficients and integer constants. Prove that if , the solution x has integer entries. 35. Let

Assuming that

, find

(a) (b) (c) (d) (e)

Answer: (a) (b) (c) (d) (e) 7 36. In each part, find the determinant given that A is a (a)

matrix for which

(b) (c) (d) 37. In each part, find the determinant given that A is a

matrix for which

(a) (b) (c) (d) Answer: (a) 189 (b) (c) (d) 38. Prove that a square matrix A is invertible if and only if 39. Show that if A is a square matrix, then

is invertible. .

True-False Exercises In parts (a)–(l) determine whether the statement is true or false, and justify your answer. (a) If A is a

matrix, then

.

Answer: False (b) If A and B are square matrices of the same size such that Answer: False (c) If A and B are square matrices of the same size and A is invertible, then

Answer: True

, then

.

(d) A square matrix A is invertible if and only if

.

Answer: False (e) The matrix of cofactors of A is precisely

.

Answer: True (f) For every

matrix A, we have

Answer: True (g) If A is a square matrix and the linear system

has multiple solutions for x, then

.

Answer: True (h) If A is an matrix and there exists an matrix b such that the linear system then the reduced row echelon form of A cannot be In.

has no solutions,

Answer: True (i) If E is an elementary matrix, then

has only the trivial solution.

Answer: True (j) If A is an invertible matrix, then the linear system system has only the trivial solution. Answer: True (k) If A is invertible, then

must also be invertible.

Answer: True (l) If A has a row of zeros, then so does Answer: False

.

has only the trivial solution if and only if the linear

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

Chapter 2 Supplementary Exercises In Exercises 1–8, evaluate the determinant of the given matrix by (a) cofactor expansion and (b) using elementary row operations to introduce zeros into the matrix. 1. Answer: 2. 3.

Answer: 24 4.

5.

Answer: 6.

7.

Answer: 329 8.

9. Evaluate the determinants in Exercises 3–6 by using the arrow technique (see Example 7 in Section 2.1). Answer: Exercise 3: 24; Exercise 4: 0; Exercise 5:

; Exercise 6:

10. (a) Construct a matrix whose determinant is easy to compute using cofactor expansion but hard to evaluate using elementary row operations. (b) Construct a matrix whose determinant is easy to compute using elementary row operations but hard to evaluate using cofactor expansion. 11. Use the determinant to decide whether the matrices in Exercises 1–4 are invertible. Answer: The matrices in Exercises 1–3 are invertible, the matrix in Exercise 4 is not. 12. Use the determinant to decide whether the matrices in Exercises 5–8 are invertible. In Exercises 13–15, find the determinant of the given matrix by any method. 13. Answer:

14.

15.

Answer: 16. Solve for x.

In Exercises 17–24, use the adjoint method (Theorem 2.3.6) to find the inverse of the given matrix, if it exists. 17. The matrix in Exercise 1.

Answer:

18. The matrix in Exercise 2. 19. The matrix in Exercise 3. Answer:

20. The matrix in Exercise 4. 21. The matrix in Exercise 5. Answer:

22. The matrix in Exercise 6. 23. The matrix in Exercise 7. Answer:

24. The matrix in Exercise 8. 25. Use Cramer's rule to solve for

and

in terms of x and y.

Answer:

26. Use Cramer's rule to solve for

and

in terms of x and y.

27. By examining the determinant of the coefficient matrix, show that the following system has a nontrivial solution if and only if .

28. Let A be a

matrix, each of whose entries is 1 or 0. What is the largest possible value for

29. (a) For the triangle in the accompanying figure, use trigonometry to show that

and then apply Cramer's rule to show that

(b) Use Cramer's rule to obtain similar formulas for

and

.

Figure Ex-29 Answer: (b) 30. Use determinants to show that for all real values of λ, the only solution of

is

,

.

?

31. Prove: If A is invertible, then

32. Prove: If A is an

is invertible and

matrix, then

33. Prove: If the entries in each row of an matrix A add up to zero, then the determinant of A is zero. [Hint: Consider the product , where X is the matrix, each of whose entries is one. 34. (a) In the accompanying figure, the area of the triangle Use this and the fact that the area of a trapezoid equals

can be expressed as the altitude times the sum of the parallel

sides to show that

[Note: In the derivation of this formula, the vertices are labeled such that the triangle is traced counterclockwise proceeding from to to . For a clockwise orientation, the determinant above yields the negative of the area.] (b) Use the result in (a) to find the area of the triangle with vertices (3, 3), (4, 0), (−2, −1).

Figure Ex-34 35. Use the fact that 21,375, 38,798, 34,162, 40,223, and 79,154 are all divisible by 19 to show that

is divisible by 19 without directly evaluating the determinant. 36. Without directly evaluating the determinant, show that

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

CHAPTER

3

Euclidean Vector Spaces

CHAPTER CONTENTS 3.1. Vectors in 2-Space, 3-Space, and n-Space 3.2. Norm, Dot Product, and Distance in Rn 3.3. Orthogonality 3.4. The Geometry of Linear Systems 3.5. Cross Product

INTRODUCTION Engineers and physicists distinguish between two types of physical quantities—scalars, which are quantities that can be described by a numerical value alone, and vectors, which are quantities that require both a number and a direction for their complete physical description. For example, temperature, length, and speed are scalars because they can be fully described by a number that tells “how much”—a temperature of 20°C, a length of 5 cm, or a speed of 75 km/h. In contrast, velocity and force are vectors because they require a number that tells “how much” and a direction that tells “which way”—say, a boat moving at 10 knots in a direction 45° northeast, or a force of 100 lb acting vertically. Although the notions of vectors and scalars that we will study in this text have their origins in physics and engineering, we will be more concerned with using them to build mathematical structures and then applying those structures to such diverse fields as genetics, computer science, economics, telecommunications, and environmental science.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

3.1 Vectors in 2-Space, 3-Space, and n-Space Linear algebra is concerned with two kinds of mathematical objects, “matrices” and “vectors.” We are already familiar with the basic ideas about matrices, so in this section we will introduce some of the basic ideas about vectors. As we progress through this text we will see that vectors and matrices are closely related and that much of linear algebra is concerned with that relationship.

Geometric Vectors Engineers and physicists represent vectors in two dimensions (also called 2-space) or in three dimensions (also called 3-space) by arrows. The direction of the arrowhead specifies the direction of the vector and the length of the arrow specifies the magnitude. Mathematicians call these geometric vectors. The tail of the arrow is called the initial point of the vector and the tip the terminal point (Figure 3.1.1).

Figure 3.1.1 In this text we will denote vectors in boldface type such as a, b, v, w, and x, and we will denote scalars in lowercase italic type such as a, k, v, w, and x. When we want to indicate that a vector v has initial point A and terminal point B, then, as shown in Figure 3.1.2, we will write

Figure 3.1.2 Vectors with the same length and direction, such as those in Figure 3.1.3, are said to be equivalent. Since we want a vector to be determined solely by its length and direction, equivalent vectors are regarded to be the same vector even though they may be in different positions. Equivalent vectors are also said to be equal, which we indicate by writing

Figure 3.1.3 The vector whose initial and terminal points coincide has length zero, so we call this the zero vector and denote it by 0. The zero vector has no natural direction, so we will agree that it can be assigned any direction that is convenient for the problem at hand.

Vector Addition There are a number of important algebraic operations on vectors, all of which have their origin in laws of physics.

Parallelogram Rule for Vector Addition If v and w are vectors in 2-space or 3-space that are positioned so their initial points coincide, then the two vectors form adjacent sides of a parallelogram, and the sum is the vector represented by the arrow from the common initial point of and to the opposite vertex of the parallelogram (Figure 3.1.4a).

Figure 3.1.4

Here is another way to form the sum of two vectors.

Triangle Rule for Vector Addition If and are vectors in 2-space or 3-space that are positioned so the initial point of is at the terminal point of , then the sum is represented by the arrow from the initial point of to the terminal point of (Figure 3.1.4b).

In Figure 3.1.4c we have constructed the sums it evident that

and

by the triangle rule. This construction makes

(1) and that the sum obtained by the triangle rule is the same as the sum obtained by the parallelogram rule. Vector addition can also be viewed as a process of translating points.

Vector Addition Viewed as Translation If , and are positioned so their initial points coincide, then the terminal point of be viewed in two ways:

can

1. The terminal point of is the point that results when the terminal point of the direction of by a distance equal to the length of (Figure 3.1.5a).

is translated in

2. The terminal point of is the point that results when the terminal point of the direction of by a distance equal to the length of (Figure 3.1.5b).

is translated in

Accordingly, we say that

is the translation of

by

or, alternatively, the translation of

by .

Figure 3.1.5

Vector Subtraction In ordinary arithmetic we can write There is an analogous idea in vector arithmetic.

, which expresses subtraction in terms of addition.

Vector Subtraction The negative of a vector , denoted by , is the vector that has the same length as but is oppositely directed (Figure 3.1.6a), and the difference of from , denoted by , is taken to be

the sum (2)

Figure 3.1.6 The difference of from can be obtained geometrically by the parallelogram method shown in Figure 3.1.6b, or more directly by positioning and so their initial points coincide and drawing the vector from the terminal point of to the terminal point of (Figure 3.1.6c).

Scalar Multiplication Sometimes there is a need to change the length of a vector or change its length and reverse its direction. This is accomplished by a type of multiplication in which vectors are multiplied by scalars. As an example, the product denotes the vector that has the same direction as but twice the length, and the product denotes the vector that is oppositely directed to and has twice the length. Here is the general result.

Scalar Multiplication If is a nonzero vector in 2-space or 3-space, and if k is a nonzero scalar, then we define the scalar product of by to be the vector whose length is times the length of and whose direction is the same as that of if k is positive and opposite to that of if k is negative. If or , then we define to be .

Figure 3.1.7 shows the geometric relationship between a vector and some of its scalar multiples. In particular, observe that has the same length as but is oppositely directed; therefore, (3)

Figure 3.1.7

Parallel and Collinear Vectors Suppose that and are vectors in 2-space or 3-space with a common initial point. If one of the vectors is a scalar multiple of the other, then the vectors lie on a common line, so it is reasonable to say that they are collinear (Figure 3.1.8a). However, if we translate one of the vectors, as indicated in Figure 3.1.8b, then the vectors are parallel but no longer collinear. This creates a linguistic problem because translating a vector does not change it. The only way to resolve this problem is to agree that the terms parallel and collinear mean the same thing when applied to vectors. Although the vector has no clearly defined direction, we will regard it to be parallel to all vectors when convenient.

Figure 3.1.8

Sums of Three or More Vectors Vector addition satisfies the associative law for addition, meaning that when we add three vectors, say u, , and , it does not matter which two we add first; that is, It follows from this that there is no ambiguity in the expression no matter how the vectors are grouped.

because the same result is obtained

A simple way to construct is to place the vectors “tip to tail” in succession and then draw the vector from the initial point of u to the terminal point of (Figure 3.1.9a). The tip-to-tail method also works for four or more vectors (Figure 3.1.9b). The tip-to-tail method also makes it evident that if u, , and are vectors in 3-space with a common initial point, then is the diagonal of the parallelepiped that has the three vectors as adjacent sides (Figure 3.1.9c).

Figure 3.1.9

Vectors in Coordinate Systems Up until now we have discussed vectors without reference to a coordinate system. However, as we will soon see, computations with vectors are much simpler to perform if a coordinate system is present to work with. The component forms of the zero vector are 0 = (0, 0) in 2-space and 0 = (0, 0, 0) in 3-space.

If a vector v in 2-space or 3-space is positioned with its initial point at the origin of a rectangular coordinate system, then the vector is completely determined by the coordinates of its terminal point (Figure 3.1.10). We call these coordinates the components of v relative to the coordinate system. We will write v = (v1, v2) to denote a vector in 2-space with components (v1, v2), and v = (v1, v2, v3) to denote a vector in 3-space with components (v1, v2, v3).

Figure 3.1.10 It should be evident geometrically that two vectors in 2-space or 3-space are equivalent if and only if they have the same terminal point when their initial points are at the origin. Algebraically, this means that two vectors are equivalent if and only if their corresponding components are equal. Thus, for example, the vectors v = (v1, v2, v3) and w = (w1, w2, w3) in 3-space are equivalent if and only if v1 = w1, v2 = w2, and v3 = w3.

Remark It may have occurred to you that an ordered pair (v1, v2) can represent either a vector with components v1 and v2 or a point with coordinates v1 and v2 (and similarly for ordered triples). Both are valid geometric interpretations, so the appropriate choice will depend on the geometric viewpoint that we want to emphasize (Figure 3.1.11).

Figure 3.1.11 The ordered pair (v1, v2) can represent a point or a vector.

Vectors Whose Initial Point Is Not at the Origin It is sometimes necessary to consider vectors whose initial points are not at the origin. If P1P2 denotes the vector with initial point P1(x1, y1) and terminal point P2(x2, y2), then the components of this vector are given by the formula

P1P2 = (x2 − x1, y2 − y1)   (4)

That is, the components of P1P2 are obtained by subtracting the coordinates of the initial point from the coordinates of the terminal point. For example, in Figure 3.1.12 the vector P1P2 is the difference of the vectors OP2 and OP1, so

P1P2 = OP2 − OP1 = (x2, y2) − (x1, y1) = (x2 − x1, y2 − y1)

As you might expect, the components of a vector in 3-space that has initial point P1(x1, y1, z1) and terminal point P2(x2, y2, z2) are given by

P1P2 = (x2 − x1, y2 − y1, z2 − z1)   (5)

Figure 3.1.12

E X A M P L E 1 Finding the Components of a Vector The components of the vector v = P1P2 with initial point P1 and terminal point P2 are obtained by subtracting the coordinates of P1 from the corresponding coordinates of P2.
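Formulas 4 and 5 are purely mechanical, so they translate directly into a short computation. The following Python sketch is only an illustration; the point coordinates are made-up values, not the ones used in Example 1.

```python
def components(initial, terminal):
    """Components of the vector from `initial` to `terminal` (Formulas 4 and 5)."""
    return tuple(t - i for i, t in zip(initial, terminal))

# Hypothetical points chosen only to illustrate the formula
P1 = (2, -1, 4)   # initial point
P2 = (7, 5, -8)   # terminal point
print(components(P1, P2))   # (5, 6, -12)
```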

n-Space The idea of using ordered pairs and triples of real numbers to represent points in two-dimensional space and three-dimensional space was well known in the eighteenth and nineteenth centuries. By the dawn of the twentieth century, mathematicians and physicists were exploring the use of "higher-dimensional" spaces in mathematics and physics. Today, even the layman is familiar with the notion of time as a fourth dimension, an idea used by Albert Einstein in developing his theory of relativity. Today, physicists working in the field of "string theory" commonly use 11-dimensional space in their quest for a unified theory that will explain how the fundamental forces of nature work. Much of the remaining work in this section is concerned with extending the notion of space to n dimensions. To explore these ideas further, we start with some terminology and notation. The set of all real numbers can be viewed geometrically as a line. It is called the real line and is denoted by R or R^1. The superscript reinforces the intuitive idea that a line is one-dimensional. The set of all ordered pairs of real numbers (called 2-tuples) and the set of all ordered triples of real numbers (called 3-tuples) are denoted by R^2 and R^3, respectively. The superscript reinforces the idea that the ordered pairs correspond to points in the plane (two-dimensional) and ordered triples to points in space (three-dimensional). The following definition extends this idea.

DEFINITION 1 If n is a positive integer, then an ordered n-tuple is a sequence of n real numbers (v1, v2, ..., vn). The set of all ordered n-tuples is called n-space and is denoted by R^n.

Remark You can think of the numbers in an n-tuple as either the coordinates of a generalized point or the components of a generalized vector, depending on the geometric image you want to bring to mind—the choice makes no difference mathematically, since it is the algebraic properties of n-tuples that are of concern. Here are some typical applications that lead to n-tuples.

• Experimental Data A scientist performs an experiment and makes n numerical measurements each time the experiment is performed. The result of each experiment can be regarded as a vector y = (y1, y2, ..., yn) in R^n in which y1, y2, ..., yn are the measured values.
• Storage and Warehousing A national trucking company has 15 depots for storing and servicing its trucks. At each point in time the distribution of trucks in the service depots can be described by a 15-tuple x = (x1, x2, ..., x15) in which x1 is the number of trucks in the first depot, x2 is the number in the second depot, and so forth.
• Electrical Circuits A certain kind of processing chip is designed to receive four input voltages and produces three output voltages in response. The input voltages can be regarded as vectors in R^4 and the output voltages as vectors in R^3. Thus, the chip can be viewed as a device that transforms an input vector in R^4 into an output vector in R^3.
• Graphical Images One way in which color images are created on computer screens is by assigning each pixel (an addressable point on the screen) three numbers that describe the hue, saturation, and brightness of the pixel. Thus, a complete color image can be viewed as a set of 5-tuples of the form v = (x, y, h, s, b) in which x and y are the screen coordinates of a pixel and h, s, and b are its hue, saturation, and brightness.
• Economics One approach to economic analysis is to divide an economy into sectors (manufacturing, services, utilities, and so forth) and measure the output of each sector by a dollar value. Thus, in an economy with 10 sectors the economic output of the entire economy can be represented by a 10-tuple s = (s1, s2, ..., s10) in which the numbers s1, s2, ..., s10 are the outputs of the individual sectors.
• Mechanical Systems Suppose that six particles move along the same coordinate line so that at time t their coordinates are x1, x2, ..., x6 and their velocities are v1, v2, ..., v6, respectively. This information can be represented by the vector v = (x1, x2, ..., x6, v1, v2, ..., v6, t) in R^13. This vector is called the state of the particle system at time t.

Albert Einstein (1879-1955) Historical Note The German-born physicist Albert Einstein immigrated to the United States in 1933, settling in Princeton, New Jersey, at the Institute for Advanced Study. Einstein spent the last three decades of his life working unsuccessfully to produce a unified field theory that would establish an underlying link between the forces of gravity and electromagnetism. Recently, physicists have made progress on the problem using a framework known as string theory. In this theory the smallest, indivisible components of the Universe are not particles but loops that behave like vibrating strings. Whereas Einstein's space-time universe was four-dimensional, strings reside in an 11-dimensional world that is the focus of current research. [Image: © Bettmann/© Corbis]

Operations on Vectors in Rn Our next goal is to define useful operations on vectors in R^n. These operations will all be natural extensions of the familiar operations on vectors in R^2 and R^3. We will denote a vector v in R^n using the notation

v = (v1, v2, ..., vn)

and we will call

0 = (0, 0, ..., 0)

the zero vector.

We noted earlier that in R^2 and R^3 two vectors are equivalent (equal) if and only if their corresponding components are the same. Thus, we make the following definition.

DEFINITION 2 Vectors v = (v1, v2, ..., vn) and w = (w1, w2, ..., wn) in R^n are said to be equivalent (also called equal) if

v1 = w1,  v2 = w2,  ...,  vn = wn

We indicate this by writing v = w.

E X A M P L E 2 Equality of Vectors Two vectors in R^n are equal if and only if each pair of corresponding components is equal.

Our next objective is to define the operations of addition, subtraction, and scalar multiplication for vectors in R^n. To motivate these ideas, we will consider how these operations can be performed on vectors in R^2 using components. By studying Figure 3.1.13 you should be able to deduce that if u = (u1, u2) and v = (v1, v2), then

u + v = (u1 + v1, u2 + v2)   (6)

kv = (kv1, kv2)   (7)

In particular, it follows from 7 that

−v = (−1)v = (−v1, −v2)   (8)

and hence that

u − v = u + (−v) = (u1 − v1, u2 − v2)   (9)

Figure 3.1.13 Motivated by Formulas 6–9, we make the following definition.

DEFINITION 3 If u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) are vectors in R^n, and if k is any scalar, then we define

u + v = (u1 + v1, u2 + v2, ..., un + vn)   (10)

u − v = (u1 − v1, u2 − v2, ..., un − vn)   (11)

kv = (kv1, kv2, ..., kvn)   (12)

−v = (−v1, −v2, ..., −vn)   (13)

In words, vectors are added (or subtracted) by adding (or subtracting) their corresponding components, and a vector is multiplied by a scalar by multiplying each component by that scalar.
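Because the operations of Definition 3 act componentwise, they are easy to express in code. The following minimal Python sketch illustrates the idea; the sample vectors are arbitrary values chosen for illustration, not the vectors of Example 3 below.

```python
def add(u, v):        return tuple(a + b for a, b in zip(u, v))
def subtract(u, v):   return tuple(a - b for a, b in zip(u, v))
def scale(k, v):      return tuple(k * a for a in v)

u = (1, -2, 0, 3)     # arbitrary sample vectors in R^4
v = (4, 1, -1, 2)

print(add(u, v))        # (5, -1, -1, 5)
print(subtract(u, v))   # (-3, -3, 1, 1)
print(scale(3, u))      # (3, -6, 0, 9)
```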

E X A M P L E 3 Algebraic Operations Using Components If

and

, then

The following theorem summarizes the most important properties of vector operations.

THEOREM 3.1.1 If u, v, and w are vectors in R^n, and if k and m are scalars, then:

(a) u + v = v + u
(b) (u + v) + w = u + (v + w)
(c) u + 0 = 0 + u = u
(d) u + (−u) = 0
(e) k(u + v) = ku + kv
(f) (k + m)u = ku + mu
(g) k(mu) = (km)u
(h) 1u = u

We will prove part (b) and leave some of the other proofs as exercises.

Proof (b) Let u = (u1, u2, ..., un), v = (v1, v2, ..., vn), and w = (w1, w2, ..., wn). Then

(u + v) + w = ((u1 + v1) + w1, ..., (un + vn) + wn) = (u1 + (v1 + w1), ..., un + (vn + wn)) = u + (v + w)

where the middle equality uses the associative law for addition of real numbers.

The following additional properties of vectors in R^n can be deduced easily by expressing the vectors in terms of components (verify).

THEOREM 3.1.2 If v is a vector in R^n and k is a scalar, then:

(a) 0v = 0
(b) k0 = 0
(c) (−1)v = −v

Calculating Without Components One of the powerful consequences of Theorems 3.1.1 and 3.1.2 is that they allow calculations to be performed without expressing the vectors in terms of components. For example, suppose that x, a, and b are vectors in R^n, and we want to solve the vector equation x + a = b for the vector x without using components. We could proceed as follows:

x + a = b
(x + a) + (−a) = b + (−a)   [Add the negative of a to both sides]
x + (a + (−a)) = b − a   [Part (b) of Theorem 3.1.1]
x + 0 = b − a   [Part (d) of Theorem 3.1.1]
x = b − a   [Part (c) of Theorem 3.1.1]

While this method is obviously more cumbersome than computing with components in R^n, it will become important later in the text where we will encounter more general kinds of vectors.

Linear Combinations Addition, subtraction, and scalar multiplication are frequently used in combination to form new vectors. For example, if v1, v2, and v3 are vectors in R^n, then sums of scalar multiples of these vectors are formed in this way. In general, we make the following definition.

DEFINITION 4 If w is a vector in R^n, then w is said to be a linear combination of the vectors v1, v2, ..., vr in R^n if it can be expressed in the form

w = k1v1 + k2v2 + ... + krvr   (14)

where k1, k2, ..., kr are scalars. These scalars are called the coefficients of the linear combination. In the case where r = 1, Formula 14 becomes w = k1v1, so that a linear combination of a single vector is just a scalar multiple of that vector.

Note that this definition of a linear combination is consistent with that given in the context of matrices (see Definition 6 in Section 1.3).
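Since a linear combination as in Formula 14 is just a sum of scalar multiples, it can be computed by chaining the componentwise operations defined above. A minimal Python sketch follows; the coefficients and vectors are made-up values used only to illustrate the computation.

```python
def linear_combination(coefficients, vectors):
    """Return k1*v1 + k2*v2 + ... + kr*vr, computed componentwise."""
    n = len(vectors[0])
    result = [0] * n
    for k, v in zip(coefficients, vectors):
        for i in range(n):
            result[i] += k * v[i]
    return tuple(result)

v1, v2, v3 = (1, 0, 2), (0, 1, -1), (3, 1, 1)          # sample vectors in R^3
print(linear_combination([2, -1, 4], [v1, v2, v3]))    # (14, 3, 9)
```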

Application of Linear Combinations to Color Models Colors on computer monitors are commonly based on what is called the RGB color model. Colors in this system are created by adding together percentages of the primary colors red (R), green (G), and blue (B). One way to do this is to identify the primary colors with the vectors

in and to create all other colors by forming linear combinations of , and b using coefficients between 0 and 1, inclusive; these coefficients represent the percentage of each pure color in the mix. The set of all such color vectors is called RGB space or the RGB color cube (Figure 3.1.14). Thus, each color vector c in this cube is expressible as a linear combination of the form

where . As indicated in the figure, the corners of the cube represent the pure primary colors together with the colors black, white, magenta, cyan, and yellow. The vectors along the diagonal running from black to white correspond to shades of gray.

Figure 3.1.14
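In code, mixing a color in the RGB cube is exactly such a linear combination with coefficients between 0 and 1. The coefficient values in the sketch below are arbitrary choices made only to illustrate the idea.

```python
r, g, b = (1, 0, 0), (0, 1, 0), (0, 0, 1)   # pure red, green, blue

def mix(c1, c2, c3):
    """Color vector c = c1*r + c2*g + c3*b, with 0 <= ci <= 1."""
    return tuple(c1 * x + c2 * y + c3 * z for x, y, z in zip(r, g, b))

print(mix(1.0, 0.0, 1.0))   # (1.0, 0.0, 1.0), the magenta corner of the cube
print(mix(0.5, 0.5, 0.5))   # (0.5, 0.5, 0.5), a shade of gray on the diagonal
```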

Alternative Notations for Vectors Up to now we have been writing vectors in R^n using the notation

v = (v1, v2, ..., vn)   (15)

We call this the comma-delimited form. However, since a vector in R^n is just a list of its n components in a specific order, any notation that displays those components in the correct order is a valid way of representing the vector. For example, the vector in 15 can be written as

v = [v1  v2  ...  vn]   (16)

which is called row-matrix form, or as the column matrix whose successive entries are v1, v2, ..., vn,   (17)

which is called column-matrix form. The choice of notation is often a matter of taste or convenience, but sometimes the nature of a problem will suggest a preferred notation. Notations 15, 16, and 17 will all be used at various places in this text.

Concept Review • Geometric vector • Direction • Length • Initial point • Terminal point • Equivalent vectors

• Zero vector • Vector addition: parallelogram rule and triangle rule • Vector subtraction • Negative of a vector • Scalar multiplication • Collinear (i.e., parallel) vectors • Components of a vector • Coordinates of a point • n-tuple • n-space • Vector operations in n-space: addition, subtraction, scalar multiplication • Linear combination of vectors

Skills • Perform geometric operations on vectors: addition, subtraction, and scalar multiplication. • Perform algebraic operations on vectors: addition, subtraction, and scalar multiplication. • Determine whether two vectors are equivalent. • Determine whether two vectors are collinear. • Sketch vectors whose initial and terminal points are given. • Find components of a vector whose initial and terminal points are given. • Prove basic algebraic properties of vectors (Theorems 3.1.1 and 3.1.2).

Exercise Set 3.1 In Exercises 1–2, draw a coordinate system (as in Figure 3.1.10) and locate the points whose coordinates are given. 1. (a) (3, 4, 5) (b) (−3, 4, 5) (c) (3, −4, 5) (d) (3, 4, −5) (e) (−3, −4, 5) (f) (−3, 4, −5) Answer:

(a)

(b)

(c)

(d)

(e)

(f)

2. (a) (0,3,−3) (b) (3,−3,0) (c) (−3, 0, 0) (d) (3, 0, 3) (e) (0, 0, −3) (f) (0, 3, 0) In Exercises 3–4, sketch the following vectors with the initial points located at the origin. 3. (a) (b) (c)

(d) (e) (f) Answer: (a)

(b)

(c)

(d)

(e)

(f)

4. (a) (b) (c) (d) (e)

(f) In Exercises 5–6, sketch the following vectors with the initial points located at the origin. 5. (a) (b) (c) Answer: (a)

(b)

(c)

6. (a) (b) (c) (d) In Exercises 7–8, find the components of the vector

.

7. (a) (b) Answer: (a) (b) 8. (a) (b) 9. (a) Find the terminal point of the vector that is equivalent to .

and whose initial point is

(b) Find the initial point of the vector that is equivalent to .

and whose terminal point is

Answer: (a) The terminal point is B(2, 3). (b) The initial point is

.

10. (a) Find the initial point of the vector that is equivalent to .

and whose terminal point is

(b) Find the terminal point of the vector that is equivalent to . 11. Find a nonzero vector u with terminal point

and whose initial point is

such that

(a) u has the same direction as

.

(b) u is oppositely directed to

.

Answer: (a)

is one possible answer.

(b)

is one possible answer.

12. Find a nonzero vector u with initial point

such that

(a) u has the same direction as

.

(b) u is oppositely directed to 13. Let

,

.

, and

. Find the components of

(a) (b) (c) (d) (e) (f) Answer: (a) (b) (c) (d) (e) (f) 14. Let

,

, and

. Find the components of

(a) (b) (c) (d) (e) (f) 15. Let

,

, and

. Find the components of

(a) (b) (c) (d) (e) (f) Answer: (a) (b) (c) (d) (e) (f) 16. Let u, v, and w be the vectors in Exercise 15. Find the vector x that satisfies 17. Let components of (a) (b) (c) (d) (e) (f) Answer: (a) (b) (c) (d)

,

, and

. . Find the

(e) (f) 18. Let

and

. Find the components of

and

. Find the components

(a) (b) (c) 19. Let of (a) (b) (c) Answer: (a) (b) (c) 20. Let u, v, and w be the vectors in Exercise 18. Find the components of the vector x that satisfies the equation . 21. Let u, v, and w be the vectors in Exercise 19. Find the components of the vector x that satisfies the equation . Answer:

22. For what value(s) of t, if any, is the given vector parallel to

?

(a) (b) (c) 23. Which of the following vectors in (a) (b) (c) Answer: (a) Not parallel (b) Parallel

are parallel to

?

(c) Parallel 24. Let

and

Find scalars a and b so that .

25. Let

and

. Find scalars a and b so that

.

Answer:

26. Find all scalars

,

, and

such that

27. Find all scalars

,

, and

such that

,

, and

such that

Answer:

28. Find all scalars 29. Let ,

, , and

,

, and

such that

. Find scalars .

Answer:

30. Show that there do not exist scalars

,

, and

such that

31. Show that there do not exist scalars

,

, and

such that

32. Consider Figure 3.1.12. Discuss a geometric interpretation of the vector

33. Let P be the point

and Q the point

.

(a) Find the midpoint of the line segment connecting P and Q. (b) Find the point on the line segment connecting P and Q that is Answer: (a) (b)

of the way from P to Q.

,

34. Let P be the point Q, what is Q?

. If the point

is the midpoint of the line segment connecting P and

35. Prove parts (a), (c), and (d) of Theorem 3.1.1. 36. Prove parts (e)–(h) of Theorem 3.1.1. 37. Prove parts (a)–(c) of Theorem 3.1.2.

True-False Exercises In parts (a)–(k) determine whether the statement is true or false, and justify your answer. (a) Two equivalent vectors must have the same initial point. Answer: False (b) The vectors

and

are equivalent.

Answer: False (c) If k is a scalar and v is a vector, then v and kv are parallel if and only if

.

Answer: False (d) The vectors

and

are the same.

Answer: True (e) If

, then

.

Answer: True (f) If a and b are scalars such that

, then u and v are parallel vectors.

Answer: False (g) Collinear vectors with the same length are equal. Answer: False (h) If

, then

must be the zero vector.

Answer: True (i) If k and m are scalars and u and v are vectors, then

Answer: False (j) If the vectors v and w are given, then the vector equation can be solved for x. Answer: True (k) The linear combinations

and

Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

can only be equal if

and

.

3.2 Norm, Dot Product, and Distance in Rn In this section we will be concerned with the notions of length and distance as they relate to vectors. We will first discuss these ideas in R^2 and R^3 and then extend them algebraically to R^n.

Norm of a Vector In this text we will denote the length of a vector v by the symbol ||v||, which is read as the norm of v, the length of v, or the magnitude of v (the term "norm" being a common mathematical synonym for length). As suggested in Figure 3.2.1a, it follows from the Theorem of Pythagoras that the norm of a vector v = (v1, v2) in R^2 is

||v|| = √(v1² + v2²)   (1)

Similarly, for a vector v = (v1, v2, v3) in R^3, it follows from Figure 3.2.1b and two applications of the Theorem of Pythagoras that

||v||² = v1² + v2² + v3²

and hence that

||v|| = √(v1² + v2² + v3²)   (2)

Motivated by the pattern of Formulas 1 and 2 we make the following definition.

DEFINITION 1 If v = (v1, v2, ..., vn) is a vector in R^n, then the norm of v (also called the length of v or the magnitude of v) is denoted by ||v||, and is defined by the formula

||v|| = √(v1² + v2² + ... + vn²)   (3)

E X A M P L E 1 Calculating Norms It follows from Formula 2 that the norm of the vector

and it follows from Formula 3 that the norm of the vector

in

is

in

is

Figure 3.2.1
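Formula 3 is straightforward to compute. The following Python sketch is only an illustration; the sample vectors are arbitrary and are not the ones used in Example 1.

```python
import math

def norm(v):
    """Euclidean norm of a vector in R^n (Formula 3)."""
    return math.sqrt(sum(x * x for x in v))

print(norm((3, 4)))          # 5.0
print(norm((1, -2, 2, 0)))   # 3.0
```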

Our first theorem in this section will generalize to R^n the following three familiar facts about vectors in R^2 and R^3:

• Distances are nonnegative.
• The zero vector is the only vector of length zero.
• Multiplying a vector by a scalar multiplies its length by the absolute value of that scalar.

It is important to recognize that just because these results hold in R^2 and R^3 does not guarantee that they hold in R^n; their validity in R^n must be proved using algebraic properties of n-tuples.

THEOREM 3.2.1 If v is a vector in R^n, and if k is any scalar, then:

(a) ||v|| ≥ 0
(b) ||v|| = 0 if and only if v = 0
(c) ||kv|| = |k| ||v||

We will prove part (c) and leave (a) and (b) as exercises.

Proof (c) If v = (v1, v2, ..., vn), then kv = (kv1, kv2, ..., kvn), so

||kv|| = √((kv1)² + (kv2)² + ... + (kvn)²) = |k| √(v1² + v2² + ... + vn²) = |k| ||v||

Unit Vectors A vector of norm 1 is called a unit vector. Such vectors are useful for specifying a direction when length is not relevant to the problem at hand. You can obtain a unit vector in a desired direction by choosing any nonzero vector v in that direction and multiplying v by the reciprocal of its length. For example, if v is a vector of length 2 in R^2 or R^3, then (1/2)v is a unit vector in the same direction as v. More generally, if v is any nonzero vector in R^n, then

u = (1/||v||) v   (4)

defines a unit vector that is in the same direction as v. We can confirm that 4 is a unit vector by applying part (c) of Theorem 3.2.1 with k = 1/||v|| to obtain

||u|| = ||kv|| = |k| ||v|| = (1/||v||) ||v|| = 1

The process of multiplying a nonzero vector v by the reciprocal of its length to obtain a unit vector is called normalizing v.

WARNING Sometimes you will see Formula 4 expressed as

u = v / ||v||

This is just a more compact way of writing that formula and is not intended to convey that v is being divided by ||v||.

E X A M P L E 2 Normalizing a Vector Find the unit vector u that has the same direction as Solution The vector v has length

Thus, from 4

.

As a check, you may want to confirm that

.
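Normalizing is a one-line operation once the norm is available. Here is a minimal sketch; the sample vector is an arbitrary choice, not necessarily the vector used in Example 2.

```python
import math

def normalize(v):
    """Return the unit vector (1/||v||)v in the same direction as a nonzero v (Formula 4)."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

u = normalize((2, 2, -1))                  # sample vector of length 3
print(u)                                   # (0.666..., 0.666..., -0.333...)
print(math.sqrt(sum(x * x for x in u)))    # 1.0, confirming that u is a unit vector
```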

The Standard Unit Vectors When a rectangular coordinate system is introduced in R^2 or R^3, the unit vectors in the positive directions of the coordinate axes are called the standard unit vectors. In R^2 these vectors are denoted by

i = (1, 0) and j = (0, 1)

and in R^3 by

i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1)

(Figure 3.2.2). Every vector v = (v1, v2) in R^2 and every vector v = (v1, v2, v3) in R^3 can be expressed as a linear combination of standard unit vectors by writing

v = (v1, v2) = v1 i + v2 j   (5)

v = (v1, v2, v3) = v1 i + v2 j + v3 k   (6)

Moreover, we can generalize these formulas to R^n by defining the standard unit vectors in R^n to be

e1 = (1, 0, 0, ..., 0), e2 = (0, 1, 0, ..., 0), ..., en = (0, 0, 0, ..., 1)   (7)

in which case every vector v = (v1, v2, ..., vn) in R^n can be expressed as

v = v1 e1 + v2 e2 + ... + vn en   (8)

E X A M P L E 3 Linear Combinations of Standard Unit Vectors

Figure 3.2.2

Distance in Rn If P1 and P2 are points in R^2 or R^3, then the length of the vector P1P2 is equal to the distance d between the two points (Figure 3.2.3). Specifically, if P1(x1, y1) and P2(x2, y2) are points in R^2, then Formula 4 of Section 3.1 implies that

d = ||P1P2|| = √((x2 − x1)² + (y2 − y1)²)   (9)

This is the familiar distance formula from analytic geometry. Similarly, the distance between the points P1(x1, y1, z1) and P2(x2, y2, z2) in 3-space is

d = √((x2 − x1)² + (y2 − y1)² + (z2 − z1)²)   (10)

Motivated by Formulas 9 and 10, we make the following definition.

DEFINITION 2 If u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) are points in R^n, then we denote the distance between u and v by d(u, v) and define it to be

d(u, v) = ||u − v|| = √((u1 − v1)² + (u2 − v2)² + ... + (un − vn)²)   (11)

Figure 3.2.3

We noted in the previous section that n-tuples can be viewed either as vectors or points in . In Definition 2 we chose to describe them as points, as that seemed the more natural interpretation.

E X A M P L E 4 Calculating Distance in Rn If then the distance between u and v is

Dot Product Our next objective is to define a useful multiplication operation on vectors in R^2 and R^3 and then extend that operation to R^n. To do this we will first need to define exactly what we mean by the "angle" between two vectors in R^2 or R^3. For this purpose, let u and v be nonzero vectors in R^2 or R^3 that have been positioned so that their initial points coincide. We define the angle between u and v to be the angle θ determined by u and v that satisfies the inequalities 0 ≤ θ ≤ π (Figure 3.2.4).

DEFINITION 3 If u and v are nonzero vectors in R^2 or R^3, and if θ is the angle between u and v, then the dot product (also called the Euclidean inner product) of u and v is denoted by u · v and is defined as

u · v = ||u|| ||v|| cos θ   (12)

If u = 0 or v = 0, then we define u · v to be 0.

Figure 3.2.4 The sign of the dot product reveals information about the angle θ that we can obtain by rewriting Formula 12 as

cos θ = (u · v) / (||u|| ||v||)   (13)

Since 0 ≤ θ ≤ π, it follows from Formula 13 and properties of the cosine function studied in trigonometry that

• θ is acute if u · v > 0.
• θ is obtuse if u · v < 0.
• θ = π/2 if u · v = 0.

E X A M P L E 5 Dot Product Find the dot product of the vectors shown in Figure 3.2.5.

Figure 3.2.5 Solution The lengths of the vectors are and the cosine of the angle θ between them is

Thus, it follows from Formula 12 that

E X A M P L E 6 A Geometry Problem Solved Using Dot Product Find the angle between a diagonal of a cube and one of its edges. Solution Let k be the length of an edge and introduce a coordinate system as shown in Figure 3.2.6. If we let u1 = (k, 0, 0), u2 = (0, k, 0), and u3 = (0, 0, k), then the vector d = (k, k, k) = u1 + u2 + u3 is a diagonal of the cube. It follows from Formula 13 that the angle θ between d and the edge u1 satisfies

cos θ = (u1 · d) / (||u1|| ||d||) = k² / (k(√3 k)) = 1/√3

With the help of a calculator we obtain θ = cos⁻¹(1/√3) ≈ 54.74°

Figure 3.2.6

Note that the angle θ obtained in Example 6 does not involve k. Why was this to be expected?

Component Form of the Dot Product For computational purposes it is desirable to have a formula that expresses the dot product of two vectors in terms of components. We will derive such a formula for vectors in 3-space; the derivation for vectors in 2-space is similar.

Let u = (u1, u2, u3) and v = (v1, v2, v3) be two nonzero vectors. If, as shown in Figure 3.2.7, θ is the angle between u and v, then the law of cosines yields

||PQ||² = ||u||² + ||v||² − 2||u|| ||v|| cos θ   (14)

where PQ is the vector from the terminal point of u to the terminal point of v. Since PQ = v − u, we can rewrite 14 as

||u|| ||v|| cos θ = (1/2)(||u||² + ||v||² − ||v − u||²)

or

u · v = (1/2)(||u||² + ||v||² − ||v − u||²)

Substituting

||u||² = u1² + u2² + u3²,  ||v||² = v1² + v2² + v3²,  ||v − u||² = (v1 − u1)² + (v2 − u2)² + (v3 − u3)²

we obtain, after simplifying,

u · v = u1v1 + u2v2 + u3v3   (15)

Josiah Willard Gibbs (1839-1903) Historical Note The dot product notation was first introduced by the American physicist and mathematician J. Willard Gibbs in a pamphlet distributed to his students at Yale University in the 1880s. The product was originally written on the baseline, rather than centered as today, and was referred to as the direct product. Gibbs's pamphlet was eventually incorporated into a book entitled Vector Analysis that was published in 1901 and coauthored with one of his students. Gibbs made major contributions to the fields of thermodynamics and electromagnetic theory and is generally regarded as the greatest American physicist of the nineteenth century. [Image: The Granger Collection, New York]

Although we derived Formula 15 and its 2-space companion under the assumption that u and v are nonzero, it turns out that these formulas are also applicable if u = 0 or v = 0 (verify). The companion formula for vectors in 2-space is

u · v = u1v1 + u2v2   (16)

Motivated by the pattern in Formulas 15 and 16, we make the following definition.

DEFINITION 4 If u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) are vectors in R^n, then the dot product (also called the Euclidean inner product) of u and v is denoted by u · v and is defined by

u · v = u1v1 + u2v2 + ... + unvn   (17)

In words, to calculate the dot product (Euclidean inner product) multiply corresponding components and add the resulting products.

E X A M P L E 7 Calculating Dot Products Using Components (a) Use Formula 15 to compute the dot product of the vectors u and v in Example 5. (b) Calculate

for the following vectors in

:

Solution (a) The component forms of the vectors are

and

which agrees with the result obtained geometrically in Example 5. (b)

. Thus,

Figure 3.2.7
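Formula 17 is equally direct to implement. In the sketch below the vectors are arbitrary sample values (not those of Example 7); the sign of the result then tells us whether the angle between the vectors is acute, obtuse, or a right angle.

```python
def dot(u, v):
    """Dot product in R^n: multiply corresponding components and add (Formula 17)."""
    return sum(a * b for a, b in zip(u, v))

u = (-1, 3, 5, 7)     # arbitrary sample vectors in R^4
v = (-3, -4, 1, 0)

print(dot(u, v))      # -4, so the angle between u and v is obtuse
```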

Algebraic Properties of the Dot Product In the special case where u = v in Definition 4, we obtain the relationship

v · v = v1² + v2² + ... + vn² = ||v||²   (18)

This yields the following formula for expressing the length of a vector in terms of a dot product:

||v|| = √(v · v)   (19)

Dot products have many of the same algebraic properties as products of real numbers.

THEOREM 3.2.2 If u, v, and w are vectors in R^n, and if k is a scalar, then:

(a) u · v = v · u   [Symmetry property]
(b) u · (v + w) = u · v + u · w   [Distributive property]
(c) k(u · v) = (ku) · v   [Homogeneity property]
(d) v · v ≥ 0, and v · v = 0 if and only if v = 0   [Positivity property]

We will prove parts (c) and (d) and leave the other proofs as exercises.

Proof (c) Let u = (u1, u2, ..., un) and v = (v1, v2, ..., vn). Then

k(u · v) = k(u1v1 + u2v2 + ... + unvn) = (ku1)v1 + (ku2)v2 + ... + (kun)vn = (ku) · v

Proof (d) The result follows from parts (a) and (b) of Theorem 3.2.1 and the fact that

v · v = v1² + v2² + ... + vn² = ||v||²

THEOREM 3.2.3 If u, v, and w are vectors in R^n, and if k is a scalar, then:

(a) 0 · v = v · 0 = 0
(b) (u + v) · w = u · w + v · w
(c) u · (v − w) = u · v − u · w
(d) (u − v) · w = u · w − v · w
(e) k(u · v) = u · (kv)

We will show how Theorem 3.2.2 can be used to prove part (b) without breaking the vectors into components. The other proofs are left as exercises.

Proof (b) (u + v) · w = w · (u + v) = w · u + w · v = u · w + v · w, where the first and last equalities use the symmetry property and the middle equality uses the distributive property.

Formulas 18 and 19 together with Theorems 3.2.2 and 3.2.3 make it possible to manipulate expressions involving dot products using familiar algebraic techniques.

E X A M P L E 8 Calculating with Dot Products

Cauchy-Schwarz Inequality and Angles in Rn Our next objective is to extend to R^n the notion of "angle" between nonzero vectors u and v. We will do this by starting with the formula

θ = cos⁻¹( (u · v) / (||u|| ||v||) )   (20)

which we previously derived for nonzero vectors in R^2 and R^3. Since dot products and norms have been defined for vectors in R^n, it would seem that this formula has all the ingredients to serve as a definition of the angle θ between two vectors, u and v, in R^n. However, there is a fly in the ointment, the problem being that the inverse cosine in Formula 20 is not defined unless its argument satisfies the inequalities

−1 ≤ (u · v) / (||u|| ||v||) ≤ 1   (21)

Fortunately, these inequalities do hold for all nonzero vectors in R^n as a result of the following fundamental result known as the Cauchy-Schwarz inequality.

THEOREM 3.2.4 Cauchy-Schwarz Inequality If u = (u1, u2, ..., un) and v = (v1, v2, ..., vn) are vectors in R^n, then

|u · v| ≤ ||u|| ||v||   (22)

or in terms of components

|u1v1 + u2v2 + ... + unvn| ≤ (u1² + u2² + ... + un²)^(1/2) (v1² + v2² + ... + vn²)^(1/2)   (23)

We will omit the proof of this theorem because later in the text we will prove a more general version of which this will be a special case. Our goal for now will be to use this theorem to prove that the inequalities in 21 hold for all nonzero vectors in R^n. Once that is done we will have established all the results required to use Formula 20 as our definition of the angle between nonzero vectors u and v in R^n.

To prove that the inequalities in 21 hold for all nonzero vectors in R^n, divide both sides of Formula 22 by the product ||u|| ||v|| to obtain

|u · v| / (||u|| ||v||) ≤ 1,  that is,  −1 ≤ (u · v) / (||u|| ||v||) ≤ 1

from which 21 follows.
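With the Cauchy-Schwarz inequality in hand, Formula 20 can safely be used to compute angles in R^n. A minimal sketch (with arbitrary sample vectors) that checks inequality 22 numerically and then evaluates the angle:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(u, v):
    """Angle (in radians) between nonzero vectors u and v in R^n, via Formula 20."""
    return math.acos(dot(u, v) / (norm(u) * norm(v)))

u, v = (1, 0, 1, 2), (2, 1, 0, -1)                 # arbitrary sample vectors
print(abs(dot(u, v)) <= norm(u) * norm(v))         # True (Cauchy-Schwarz, Formula 22)
print(angle(u, v))                                 # about 1.571 radians for these vectors
```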

Hermann Amandus Schwarz (1843-1921)

Viktor Yakovlevich Bunyakovsky (1804-1889) Historical Note The Cauchy—Schwarz inequality is named in honor of the French mathematician Augustin Cauchy (see p. 109) and the German mathematician Hermann Schwarz. Variations of this inequality occur in many different settings and under various names. Depending on the context in which the inequality occurs, you may find it called Cauchy's inequality, the Schwarz inequality, or sometimes even the Bunyakovsky inequality, in recognition of the Russian mathematician who published his version of the inequality in 1859, about 25 years before Schwarz. [Images: wikipedia (Schwarz); wikipedia (Bunyakovsky)]

Geometry in Rn Earlier in this section we extended various concepts to R^n with the idea that familiar results that we can visualize in R^2 and R^3 might be valid in R^n as well. Here are two fundamental theorems from plane geometry whose validity extends to R^n:

• The sum of the lengths of two sides of a triangle is at least as large as the third (Figure 3.2.8).
• The shortest distance between two points is a straight line (Figure 3.2.9).

The following theorem generalizes these theorems to R^n.

THEOREM 3.2.5 If u, v, and w are vectors in R^n, and if k is any scalar, then:

(a) ||u + v|| ≤ ||u|| + ||v||   [Triangle inequality for vectors]
(b) d(u, w) ≤ d(u, v) + d(v, w)   [Triangle inequality for distances]

Proof (a)

||u + v||² = (u + v) · (u + v) = ||u||² + 2(u · v) + ||v||² ≤ ||u||² + 2|u · v| + ||v||² ≤ ||u||² + 2||u|| ||v|| + ||v||² = (||u|| + ||v||)²

where the last inequality uses the Cauchy-Schwarz inequality. Taking square roots of both sides yields (a).

Proof (b) It follows from part (a) and Formula 11 that

d(u, w) = ||u − w|| = ||(u − v) + (v − w)|| ≤ ||u − v|| + ||v − w|| = d(u, v) + d(v, w)

Figure 3.2.8

Figure 3.2.9 It is proved in plane geometry that for any parallelogram the sum of the squares of the diagonals is equal to the sum of the squares of the four sides (Figure 3.2.10). The following theorem generalizes that result to R^n.

THEOREM 3.2.6 Parallelogram Equation for Vectors If u and v are vectors in R^n, then

||u + v||² + ||u − v||² = 2(||u||² + ||v||²)   (24)

Proof

||u + v||² + ||u − v||² = (u + v) · (u + v) + (u − v) · (u − v) = 2(u · u) + 2(v · v) = 2(||u||² + ||v||²)

Figure 3.2.10 We could state and prove many more theorems from plane geometry that generalize to R^n, but the ones already given should suffice to convince you that R^n is not so different from R^2 and R^3 even though we cannot visualize it directly. The next theorem establishes a fundamental relationship between the dot product and norm in R^n.

THEOREM 3.2.7 If u and v are vectors in R^n with the Euclidean inner product, then

u · v = (1/4)||u + v||² − (1/4)||u − v||²   (25)

Proof

||u + v||² = (u + v) · (u + v) = ||u||² + 2(u · v) + ||v||²
||u − v||² = (u − v) · (u − v) = ||u||² − 2(u · v) + ||v||²

from which 25 follows by simple algebra.

Note that Formula 25 expresses the dot product in terms of norms.

Dot Products as Matrix Multiplication There are various ways to express the dot product of vectors using matrix notation. The formulas depend on whether the vectors are expressed as row matrices or column matrices. Here are the possibilities. If A is an n × n matrix and u and v are n × 1 matrices, then it follows from the first row in Table 1 and properties of the transpose that

Au · v = v^T(Au) = (v^T A)u = (A^T v)^T u = u · A^T v
u · Av = (Av)^T u = v^T A^T u = A^T u · v

The resulting formulas

Au · v = u · A^T v   (26)

u · Av = A^T u · v   (27)

provide an important link between multiplication by an n × n matrix A and multiplication by A^T.

E X A M P L E 9 Verifying That Au · v = u · A^T v

Suppose that

Then

from which we obtain Au · v = u · A^T v, as guaranteed by Formula 26. We leave it for you to verify that Formula 27 also holds.
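Formulas 26 and 27 are easy to spot-check numerically. The matrix and vectors below are arbitrary sample values, not the ones used in Example 9; NumPy is used here only for the matrix-vector products.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])        # arbitrary 2 x 2 matrix
u = np.array([1, -1])                 # arbitrary vectors in R^2
v = np.array([2, 5])

print(np.dot(A @ u, v))               # Au . v
print(np.dot(u, A.T @ v))             # u . A^T v, the same value (Formula 26)
print(np.dot(u, A @ v))               # u . Av
print(np.dot(A.T @ u, v))             # A^T u . v, the same value (Formula 27)
```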

Table 1

Form: u a column matrix and v a column matrix. Dot product: u · v = u^T v = v^T u
Form: u a row matrix and v a column matrix. Dot product: u · v = uv
Form: u a column matrix and v a row matrix. Dot product: u · v = vu
Form: u a row matrix and v a row matrix. Dot product: u · v = u v^T = v u^T

A Dot Product View of Matrix Multiplication Dot products provide another way of thinking about matrix multiplication. Recall that if A = [aij] is an m × r matrix and B = [bij] is an r × n matrix, then the entry of AB in row i and column j is

ai1 b1j + ai2 b2j + ... + air brj

which is the dot product of the ith row vector of A

(ai1, ai2, ..., air)

and the jth column vector of B

(b1j, b2j, ..., brj)

Thus, if the row vectors of A are r1, r2, ..., rm and the column vectors of B are c1, c2, ..., cn, then the matrix product AB can be expressed as the matrix whose entry in row i and column j is ri · cj:

AB = [ri · cj]   (28)

Application of Dot Products to ISBN Numbers Although the system has recently changed, most books published in the last 25 years have been assigned a unique 10-digit number called an International Standard Book Number or ISBN. The first nine digits of this number are split into three groups: the first group representing the country or group of countries in which the book originates, the second identifying the publisher, and the third assigned to the book title itself. The tenth and final digit, called a check digit, is computed from the first nine digits and is used to ensure that an electronic transmission of the ISBN, say over the Internet, occurs without error.

To explain how this is done, regard the first nine digits of the ISBN as a vector b in R^9, and let a be the vector

a = (1, 2, 3, 4, 5, 6, 7, 8, 9)

Then the check digit c is computed using the following procedure:

1. Form the dot product a · b.

2. Divide a · b by 11, thereby producing a remainder c that is an integer between 0 and 10, inclusive. The check digit is taken to be c, with the proviso that c = 10 is written as X to avoid double digits.

For example, the ISBN of the brief edition of Calculus, sixth edition, by Howard Anton has a check digit of 9. This is consistent with the first nine digits of the ISBN, since a · b = 152. Dividing 152 by 11 produces a quotient of 13 and a remainder of 9, so the check digit is c = 9. If an electronic order is placed for a book with a certain ISBN, then the warehouse can use the above procedure to verify that the check digit is consistent with the first nine digits, thereby reducing the possibility of a costly shipping error.
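The check-digit computation is literally a dot product followed by a remainder. Here is a minimal sketch of the procedure just described; the nine digits used in the example call are made-up values for illustration, not an actual ISBN.

```python
def isbn_check_digit(first_nine):
    """Check digit for a 10-digit ISBN: (a . b) mod 11, written as 'X' when it equals 10."""
    a = range(1, 10)                                   # the weight vector (1, 2, ..., 9)
    c = sum(w * d for w, d in zip(a, first_nine)) % 11
    return "X" if c == 10 else str(c)

digits = [0, 3, 0, 6, 1, 0, 9, 0, 2]   # made-up digits, for illustration only
print(isbn_check_digit(digits))        # '6' for these digits
```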

Concept Review • Norm (or length or magnitude) of a vector • Unit vector • Normalized vector • Standard unit vectors • Distance between points in R^n • Angle between two vectors in R^n • Dot product (or Euclidean inner product) of two vectors in R^n • Cauchy-Schwarz inequality • Triangle inequality • Parallelogram equation for vectors

Skills
• Compute the norm of a vector in R^n.
• Determine whether a given vector in R^n is a unit vector.
• Normalize a nonzero vector in R^n.
• Determine the distance between two vectors in R^n.
• Compute the dot product of two vectors in R^n.
• Compute the angle between two nonzero vectors in R^n.
• Prove basic properties pertaining to norms and dot products (Theorems 3.2.1–3.2.3 and 3.2.5–3.2.7).

Exercise Set 3.2 In Exercises 1–2, find the norm of v, a unit vector that has the same direction as v, and a unit vector that is oppositely directed to v. 1. (a) (b) (c) Answer: (a) (b) (c) 2. (a) (b) (c) In Exercises 3–4, evaluate the given expression with . 3. (a) (b) (c) (d) Answer: (a) (b) (c) (d) 4. (a) (b) (c)

and

(d) In Exercises 5–6, evaluate the given expression with

and

5. (a) (b) (c) Answer: (a) (b) (c) 6. (a) (b) (c) 7. Let

. Find all scalars k such that

.

Answer:

8. Let In Exercises 9–10, find

. Find all scalars k such that and

.

9. (a) (b) Answer: (a) (b) 10. (a) (b) In Exercises 11–12, find the Euclidean distance between u and v. 11. (a) (b) (c)

.

Answer: (a) (b) (c) 12. (a) (b) (c) 13. Find the cosine of the angle between the vectors in each part of Exercise 11, and then state whether the angle is acute, obtuse, or 90°. Answer: (a) (b) (c)

; θ is acute ; θ is obtuse ; θ is obtuse

14. Find the cosine of the angle between the vectors in each part of Exercise 12, and then state whether the angle is acute, obtuse, or 90°. 15. Suppose that a vector a in the xy-plane has a length of 9 units and points in a direction that is 120° counterclockwise from the positive x-axis, and a vector b in that plane has a length of 5 units and points in the positive y-direction. Find . Answer:

16. Suppose that a vector a in the xy-plane points in a direction that is 47° counterclockwise from the positive x-axis, and a vector b in that plane points in a direction that is 43° clockwise from the positive x-axis. What can you say about the value of ? In Exercises 17–18, determine whether the expression makes sense mathematically. If not, explain why. 17. (a) (b) (c) (d) Answer:

(a) (b) (c) (d)

does not make sense because

is a scalar.

makes sense. does not make sense because the quantity inside the norm is a scalar. makes sense since the terms are both scalars.

18. (a) (b) (c) (d) 19. Find a unit vector that has the same direction as the given vector. (a) (b) (c) (d) Answer: (a) (b) (c)

(d)

20. Find a unit vector that is oppositely directed to the given vector. (a) (b) (c) (d) 21. State a procedure for finding a vector of a specified length m that points in the same direction as a given vector v. 22. If and , what are the largest and smallest values possible for explanation of your results. 23. Find the cosine of the angle θ between u and v. (a) (b) (c)

? Give a geometric

(d) Answer: (a) (b) (c) (d) 24. Find the radian measure of the angle θ (with (a) (b)

) between u and v.

and and

(c)

and

(d)

and

In Exercises 25–26, verify that the Cauchy-Schwarz inequality holds. 25. (a) (b) (c) Answer: (a) (b) (c) 26. (a) (b) (c) 27. Let

and

. Describe the set of all points

for which

.

Answer: A sphere of radius 1 centered at 28. (a) Show that the components of the vector .

. in Figure Ex-28a are

and

(b) Let u and v be the vectors in Figure Ex-28b. Use the result in part (a) to find the components of .

Figure Ex-28 29. Prove parts (a) and (b) of Theorem 3.2.1. 30. Prove parts (a) and (c) of Theorem 3.2.3. 31. Prove parts (d) and (e) of Theorem 3.2.3. 32. Under what conditions will the triangle inequality (Theorem 3.2.5a) be an equality? Explain your answer geometrically. 33. What can you say about two nonzero vectors, u and v, that satisfy the equation

?

34. (a) What relationship must hold for the point to be equidistant from the origin and the xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and c. (b) What relationship must hold for the point to be farther from the origin than from the xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and c

True-False Exercises In parts (a)–(j) determine whether the statement is true or false, and justify your answer. (a) If each component of a vector in

is doubled, the norm of that vector is doubled.

Answer: True (b) In , the vectors of norm 5 whose initial points are at the origin have terminal points lying on a circle of radius 5 centered at the origin. Answer: True (c) Every vector in Answer: False

has a positive norm.

(d) If v is a nonzero vector in

, there are exactly two unit vectors that are parallel to v.

Answer: True (e) If

,

, and

, then the angle between u and v is

radians.

Answer: True (f) The expressions

and

are both meaningful and equal to each other.

Answer: False (g) If

, then

.

Answer: False (h) If

, then either

or

.

Answer: False (i) In

, if u lies in the first quadrant and v lies in the third quadrant, then

Answer: True (j) For all vectors u, v, and w in

, we have

Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

cannot be positive.

3.3 Orthogonality In the last section we defined the notion of "angle" between vectors in R^n. In this section we will focus on the notion of "perpendicularity." Perpendicular vectors in R^n play an important role in a wide variety of applications.

Orthogonal Vectors Recall from Formula 20 in the previous section that the angle θ between two nonzero vectors u and v in R^n is defined by the formula

θ = cos⁻¹( (u · v) / (||u|| ||v||) )

It follows from this that θ = π/2 if and only if u · v = 0. Thus, we make the following definition.

DEFINITION 1 Two nonzero vectors u and v in R^n are said to be orthogonal (or perpendicular) if u · v = 0. We will also agree that the zero vector in R^n is orthogonal to every vector in R^n. A nonempty set of vectors in R^n is called an orthogonal set if all pairs of distinct vectors in the set are orthogonal. An orthogonal set of unit vectors is called an orthonormal set.

E X A M P L E 1 Orthogonal Vectors (a) Show that (b) Show that the set

and

are orthogonal vectors in

of standard unit vectors is an orthogonal set in

. .

Solution (a) The vectors are orthogonal since (b) We must show that all pairs of distinct vectors are orthogonal, that is, This is evident geometrically (Figure 3.2.2), but it can be seen as well from the computations

In Example 1 there is no need to check that since this follows from computations in the example and the symmetry property of the dot product.

Lines and Planes Determined by Points and Normals

One learns in analytic geometry that a line in R^2 is determined uniquely by its slope and one of its points, and that a plane in R^3 is determined uniquely by its "inclination" and one of its points. One way of specifying slope and inclination is to use a nonzero vector n, called a normal, that is orthogonal to the line or plane in question. For example, Figure 3.3.1 shows the line through the point P0(x0, y0) that has normal n = (a, b) and the plane through the point P0(x0, y0, z0) that has normal n = (a, b, c). Both the line and the plane are represented by the vector equation

n · P0P = 0   (1)

where P is either an arbitrary point (x, y) on the line or an arbitrary point (x, y, z) in the plane. The vector P0P can be expressed in terms of components as (x − x0, y − y0) for the line and (x − x0, y − y0, z − z0) for the plane, so Equation 1 can be written as

a(x − x0) + b(y − y0) = 0   (2)

a(x − x0) + b(y − y0) + c(z − z0) = 0   (3)

These are called the point-normal equations of the line and plane.

E X A M P L E 2 Point-Normal Equations It follows from 2 that in

the equation

represents the line through the point represents the plane through the point

with normal

; and it follows from 3 that in

with normal

the equation

.

Figure 3.3.1 When convenient, the terms in Equations 2 and 3 can be multiplied out and the constants combined. This leads to the following theorem.

THEOREM 3.3.1

(a) If a and b are constants that are not both zero, then an equation of the form

ax + by + c = 0   (4)

represents a line in R^2 with normal n = (a, b).

(b) If a, b, and c are constants that are not all zero, then an equation of the form

ax + by + cz + d = 0   (5)

represents a plane in R^3 with normal n = (a, b, c).

E X A M P L E 3 Vectors Orthogonal to Lines and Planes Through the Origin (a) The equation represents a line through the origin in . Show that the vector formed from the coefficients of the equation is orthogonal to the line, that is, orthogonal to every vector along the line. (b) The equation represents a plane through the origin in . Show that the vector formed from the coefficients of the equation is orthogonal to the plane, that is, orthogonal to every vector that lies in the plane. Solution We will solve both problems together. The two equations can be written as or, alternatively, as These equations show that is orthogonal to every vector vector in the plane (Figure 3.3.1).

on the line and that

is orthogonal to every

Recall that equations of the form ax + by = 0 and ax + by + cz = 0 are called homogeneous equations. Example 3 illustrates that homogeneous equations in two or three unknowns can be written in the vector form

n · x = 0   (6)

where n is the vector of coefficients and x is the vector of unknowns. In R^2 this is called the vector form of a line through the origin, and in R^3 it is called the vector form of a plane through the origin. Referring to Table 1 of Section 3.2, in what other ways can you write 6 if n and x are expressed in matrix form?

Orthogonal Projections In many applications it is necessary to "decompose" a vector u into a sum of two terms, one term being a scalar multiple of a specified nonzero vector a and the other term being orthogonal to a. For example, if u and a are vectors in R^2 that are positioned so their initial points coincide at a point Q, then we can create such a decomposition as follows (Figure 3.3.2):

• Drop a perpendicular from the tip of u to the line through a.
• Construct the vector w1 from Q to the foot of the perpendicular.
• Construct the vector w2 = u − w1.

Figure 3.3.2 In parts (b) through (d), w1 + w2 = u, where w1 is parallel to a and w2 is orthogonal to a.

Since u = w1 + w2, we have decomposed u into a sum of two orthogonal vectors, the first term being a scalar multiple of a and the second being orthogonal to a. The following theorem shows that the foregoing results, which we illustrated using vectors in R^2, apply as well in R^n.

THEOREM 3.3.2 Projection Theorem If u and a are vectors in R^n, and if a ≠ 0, then u can be expressed in exactly one way in the form u = w1 + w2, where w1 is a scalar multiple of a and w2 is orthogonal to a.

Proof Since the vector w1 is to be a scalar multiple of a, it must have the form

w1 = ka   (7)

Our goal is to find a value of the scalar k and a vector w2 that is orthogonal to a such that

u = w1 + w2   (8)

We can determine k by using 7 to rewrite 8 as

u = ka + w2

and then applying Theorems 3.2.2 and 3.2.3 to obtain

u · a = (ka + w2) · a = k||a||² + (w2 · a)   (9)

Since w2 is to be orthogonal to a, the last term in 9 must be 0, and hence k must satisfy the equation

u · a = k||a||²

from which we obtain

k = (u · a) / ||a||²

as the only possible value for k. The proof can be completed by rewriting 8 as

w2 = u − w1 = u − ((u · a) / ||a||²) a

and then confirming that w2 is orthogonal to a by showing that w2 · a = 0 (we leave the details for you).

The vectors w1 and w2 in the Projection Theorem have associated names: the vector w1 is called the orthogonal projection of u on a or sometimes the vector component of u along a, and the vector w2 is called the vector component of u orthogonal to a. The vector w1 is commonly denoted by the symbol proj_a u, in which case it follows from 8 that w2 = u − proj_a u. In summary,

proj_a u = ((u · a) / ||a||²) a   (vector component of u along a)   (10)

u − proj_a u = u − ((u · a) / ||a||²) a   (vector component of u orthogonal to a)   (11)

E X A M P L E 4 Orthogonal Projection on a Line Find the orthogonal projections of the vectors the positive x-axis in .

and

Solution As illustrated in Figure 3.3.3, to find the orthogonal projection of along a. Since

on the line L that makes an angle θ with

is a unit vector along the line L, so our first problem is

it follows from Formula 10 that this projection is

Similarly, since

, it follows from Formula 10 that

E X A M P L E 5 Vector Component of u Along a Let orthogonal to a.

and

. Find the vector component of u along a and the vector component of u

Solution

Thus the vector component of u along a is

and the vector component of u orthogonal to a is

As a check, you may wish to verify that the vectors u − proj_a u and a are perpendicular by showing that their dot product is zero.

Figure 3.3.3
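Formulas 10 and 11 translate directly into code. The sketch below computes both components of the decomposition for arbitrary sample vectors (not the vectors of Example 5) and then confirms that the second component is orthogonal to a.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(u, a):
    """Return (proj_a u, u - proj_a u) as in Formulas 10 and 11; a must be nonzero."""
    k = dot(u, a) / dot(a, a)
    w1 = tuple(k * x for x in a)                    # vector component of u along a
    w2 = tuple(x - y for x, y in zip(u, w1))        # vector component of u orthogonal to a
    return w1, w2

u, a = (6, 2, 4), (1, 2, 0)                         # arbitrary sample vectors
w1, w2 = project(u, a)
print(w1, w2)          # (2.0, 4.0, 0.0) (4.0, -2.0, 4.0)
print(dot(w2, a))      # 0.0, confirming orthogonality
```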

Sometimes we will be more interested in the norm of the vector component of u along a than in the vector component itself. A formula for this norm can be derived as follows:

||proj_a u|| = || ((u · a) / ||a||²) a || = |(u · a) / ||a||²| ||a|| = (|u · a| / ||a||²) ||a||

where the second equality follows from part (c) of Theorem 3.2.1 and the third from the fact that ||a||² > 0. Thus,

||proj_a u|| = |u · a| / ||a||   (12)

If θ denotes the angle between u and a, then u · a = ||u|| ||a|| cos θ, so 12 can also be written as

||proj_a u|| = ||u|| |cos θ|   (13)

(Verify.) A geometric interpretation of this result is given in Figure 3.3.4.

Figure 3.3.4

The Theorem of Pythagoras In Section 3.2 we found that many theorems about vectors in R^2 and R^3 also hold in R^n. Another example of this is the following generalization of the Theorem of Pythagoras (Figure 3.3.5).

THEOREM 3.3.3 Theorem of Pythagoras in Rn If u and v are orthogonal vectors in R^n with the Euclidean inner product, then

||u + v||² = ||u||² + ||v||²   (14)

Proof Since u and v are orthogonal, we have u · v = 0, from which it follows that

||u + v||² = (u + v) · (u + v) = ||u||² + 2(u · v) + ||v||² = ||u||² + ||v||²

E X A M P L E 6 Theorem of Pythagoras in R4 We showed in Example 1 that the vectors are orthogonal. Verify the Theorem of Pythagoras for these vectors. Solution We leave it for you to confirm that

Thus,

Figure 3.3.5 OPTIONAL

Distance Problems We will now show how orthogonal projections can be used to solve the following three distance problems:

Problem 1. Find the distance between a point and a line in R^2.

Problem 2. Find the distance between a point and a plane in R^3.

Problem 3. Find the distance between two parallel planes in R^3.

A method for solving the first two problems is provided by the next theorem. Since the proofs of the two parts are similar, we will prove part (b) and leave part (a) as an exercise.

THEOREM 3.3.4

(a) In R^2 the distance D between the point P0(x0, y0) and the line ax + by + c = 0 is

D = |ax0 + by0 + c| / √(a² + b²)   (15)

(b) In R^3 the distance D between the point P0(x0, y0, z0) and the plane ax + by + cz + d = 0 is

D = |ax0 + by0 + cz0 + d| / √(a² + b² + c²)   (16)

Proof (b) Let Q(x1, y1, z1) be any point in the plane. Position the normal n = (a, b, c) so that its initial point is at Q. As illustrated in Figure 3.3.6, the distance D is equal to the length of the orthogonal projection of QP0 on n. Thus, it follows from Formula 12 that

D = ||proj_n QP0|| = |QP0 · n| / ||n||

But

QP0 = (x0 − x1, y0 − y1, z0 − z1)
QP0 · n = a(x0 − x1) + b(y0 − y1) + c(z0 − z1)
||n|| = √(a² + b² + c²)

Thus

D = |a(x0 − x1) + b(y0 − y1) + c(z0 − z1)| / √(a² + b² + c²)   (17)

Since the point Q(x1, y1, z1) lies in the given plane, its coordinates satisfy the equation of that plane; thus

ax1 + by1 + cz1 + d = 0   or   d = −ax1 − by1 − cz1

Substituting this expression in 17 yields 16.

E X A M P L E 7 Distance Between a Point and a Plane Find the distance D between the point

and the plane

.

Solution Since the distance formulas in Theorem 3.3.4 require that the equations of the line and plane be written with zero on the right side, we first need to rewrite the equation of the plane as from which we obtain

Figure 3.3.6 The third distance problem posed above is to find the distance between two parallel planes in R^3. As suggested in Figure 3.3.7, the distance between a plane V and a plane W can be obtained by finding any point P0 in one of the planes and computing the distance between that point and the other plane. Here is an example.

Figure 3.3.7 The distance between the parallel planes V and W is equal to the distance between P0 and W.

E X A M P L E 8 Distance Between Parallel Planes The planes are parallel since their normals, planes.

and

, are parallel vectors. Find the distance between these

Solution To find the distance D between the planes, we can select an arbitrary point in one of the planes and compute its distance to the other plane. By setting in the equation , we obtain the point in this plane. From 16, the distance between and the plane is
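Formula 16 is simple to apply mechanically, and the same routine solves the parallel-plane problem once a point on one of the planes has been chosen, as in Example 8. The sketch below uses a hypothetical plane and point chosen only for illustration, not the ones from Examples 7 and 8.

```python
import math

def distance_point_plane(point, a, b, c, d):
    """Distance from (x0, y0, z0) to the plane ax + by + cz + d = 0 (Formula 16)."""
    x0, y0, z0 = point
    return abs(a * x0 + b * y0 + c * z0 + d) / math.sqrt(a * a + b * b + c * c)

# Hypothetical plane 2x - y + 2z - 4 = 0 and hypothetical point (1, 2, 3)
print(distance_point_plane((1, 2, 3), 2, -1, 2, -4))   # 0.666..., that is, 2/3
```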

Concept Review • Orthogonal (perpendicular) vectors • Orthogonal set of vectors • Normal to a line • Normal to a plane • Point-normal equations • Vector form of a line • Vector form of a plane • Orthogonal projection of u on a • Vector component of u along a • Vector component of u orthogonal to a • Theorem of Pythagoras

Skills • Determine whether two vectors are orthogonal. • Determine whether a given set of vectors forms an orthogonal set. • Find equations for lines (or planes) by using a normal vector and a point on the line (or plane). • Find the vector form of a line or plane through the origin. • Compute the vector component of u along a and orthogonal to a.

• Find the distance between a point and a line in R^2 or R^3.
• Find the distance between two parallel planes in R^3.
• Find the distance between a point and a plane.

Exercise Set 3.3 In Exercises 1–2, determine whether u and v are orthogonal vectors. 1. (a) (b) (c) (d) Answer: (a) Orthogonal (b) Not orthogonal (c) Not orthogonal (d) Not orthogonal 2. (a) (b) (c) (d) In Exercises 3–4, determine whether the vectors form an orthogonal set. 3. (a)

,

(b)

,

(c)

,

(d)

, ,

,

Answer: (a) Not an orthogonal set (b) Orthogonal set (c) Orthogonal set (d) Not an orthogonal set 4. (a)

,

(b)

,

(c)

,

(d)

, ,

,

5. Find a unit vector that is orthogonal to both Answer:

and

6. (a) Show that

and

are orthogonal vectors.

(b) Use the result in part (a) to find two vectors that are orthogonal to (c) Find two unit vectors that are orthogonal to 7. Do the points

,

.

.

, and

form the vertices of a right triangle? Explain your answer.

Answer: Yes 8. Repeat Exercise 7 for the points

,

, and

.

In Exercises 9–12, find a point-normal form of the equation of the plane passing through P and having n as a normal. 9. Answer:

10. 11. Answer: 12. In Exercises 13–16, determine whether the given planes are parallel. 13.

and Answer: Not parallel

14.

and

15.

and Answer: Parallel

16.

and

In Exercises 17–18, determine whether the given planes are perpendicular. 17. Answer: Not perpendicular 18. In Exercises 19–20, find 19. (a) (b)

.

Answer: (a) (b) 20. (a) (b) In Exercises 21–28, find the vector component of u along a and the vector component of u orthogonal to a. 21. Answer:

22. 23. Answer:

24. 25. Answer:

26. 27. Answer:

28. In Exercises 29–32, find the distance between the point and the line. 29. Answer: 1 30. 31. Answer:

32. In Exercises 33–36, find the distance between the point and the plane.

33. Answer:

34. 35. Answer:

36. In Exercises 37–40, find the distance between the given parallel planes. 37.

and Answer:

38.

and

39.

and Answer: 0 (The planes coincide.)

40.

and

41. Let i, j, and k be unit vectors along the positive x, y, and z axes of a rectangular coordinate system in 3-space. If is a nonzero vector, then the angles α, β, and γ between v and the vectors i, j, and k, respectively, are called the direction angles of v (Figure Ex-41), and the numbers , and are called the direction cosines of v. (a) Show that (b) Find

. and

.

(c) Show that (d) Show that

. .

Figure Ex-41 Answer: (b) 42. Use the result in Exercise 41 to estimate, to the nearest degree, the angles that a diagonal of a box with dimensions

makes with the edges of the box. 43. Show that if v is orthogonal to both

and

, then v is orthogonal to

44. Let u and v be nonzero vectors in 2- or 3-space, and let angle between u and v.

for all scalars

and

and

. Show that the vector

45. Prove part (a) of Theorem 3.3.4. 46. Is it possible to have Explain your reasoning.

True-False Exercises In parts (a)–(g) determine whether the statement is true or false, and justify your answer. (a) The vectors

and

are orthogonal.

Answer: True (b) If u and v are orthogonal vectors, then for all nonzero scalars k and m,

and

are orthogonal vectors.

Answer: True (c) The orthogonal projection of u along a is perpendicular to the vector component of u orthogonal to a. Answer: True (d) If a and b are orthogonal vectors, then for every nonzero vector u, we have

Answer: True (e) If a and u are nonzero vectors, then

Answer: True (f) If the relationship holds for some nonzero vector a, then Answer: False (g) For all vectors u and v, it is true that

Answer: False

.

. bisects the

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

3.4 The Geometry of Linear Systems In this section we will use parametric and vector methods to study general systems of linear equations. This work will enable us to interpret solution sets of linear systems with n unknowns as geometric objects in R^n just as we interpreted solution sets of linear systems with two and three unknowns as points, lines, and planes in R^2 and R^3.

Vector and Parametric Equations of Lines in R2 and R3 In the last section we derived equations of lines and planes that are determined by a point and a normal vector. However, there are other useful ways of specifying lines and planes. For example, a unique line in R^2 or R^3 is determined by a point x0 on the line and a nonzero vector v parallel to the line, and a unique plane in R^3 is determined by a point x0 in the plane and two noncollinear vectors v1 and v2 parallel to the plane. The best way to visualize this is to translate the vectors so their initial points are at x0 (Figure 3.4.1).

Figure 3.4.1 Let us begin by deriving an equation for the line L that contains the point x0 and is parallel to v. If x is a general point on such a line, then, as illustrated in Figure 3.4.2, the vector x − x0 will be some scalar multiple of v, say

x − x0 = tv   or, equivalently,   x = x0 + tv

As the variable t (called a parameter) varies from −∞ to +∞, the point x traces out the line L. Accordingly, we have the following result.

THEOREM 3.4.1 Let L be the line in that is parallel to

or is

that contains the point

and is parallel to the nonzero vector . Then the equation of the line through

(1) If

, then the line passes through the origin and the equation has the form (2)

Although it is not stated explicitly, it is understood in Formulas 1 and 2 that the parameter t varies from to . This applies to all vector and parametric equations in this text except where stated otherwise.

Figure 3.4.2

Vector and Parametric Equations of Planes in R3 Next we will derive an equation for the plane W that contains the point and is parallel to the noncollinear vectors and . As shown in Figure 3.4.3, if x is any point in the plane, then by forming suitable scalar multiples of and , say and , we can create a parallelogram with diagonal and adjacent sides and . Thus, we have

Figure 3.4.3 As the variables and (called parameters) vary independently from we make the following definition.

to

, the point x varies over the entire plane W. Accordingly,

THEOREM 3.4.2 Let W be the plane in that contains the point and is parallel to the noncollinear vectors plane through that is parallel to and is given by

and

. Then an equation of the

(3)

If

, then the plane passes through the origin and the equation has the form (4)

Remark Observe that the line through represented by Equation 1 is the translation by of the line through the origin represented by Equation 2 and that the plane through represented by Equation 3 is the translation by of the plane through the origin represented by Equation 4 (Figure 3.4.4).

Figure 3.4.4

Motivated by the forms of Formulas 1 to 4, we can extend the notions of line and plane to

by making the following definitions.

DEFINITION 1 If

and v are vectors in

, and if v is nonzero, then the equation (5)

defines the line through

that is parallel to . In the special case where

, the line is said to pass through the origin.

DEFINITION 2 If

and

are vectors in

, and if

and

are not collinear, then the equation (6)

defines the plane through origin.

that is parallel to

and

. In the special case where

, the plane is said to pass through the

Equations 5 and 6 are called vector forms of a line and plane in . If the vectors in these equations are expressed in terms of their components and the corresponding components on each side are equated, then the resulting equations are called parametric equations of the line and plane. Here are some examples.

E X A M P L E 1 Vector and Parametric Equations of Lines in R2 and R3
(a) Find a vector equation and parametric equations of the line in R^2 that passes through the origin and is parallel to a given vector v.
(b) Find a vector equation and parametric equations of the line in R^3 that passes through a given point x0 and is parallel to a given vector v.
(c) Use the vector equation obtained in part (b) to find two points on the line that are different from x0.

Solution (a) It follows from 5 with x0 = 0 that a vector equation of the line is x = tv. If we let x = (x, y), then this equation can be expressed in vector form, and equating corresponding components on the two sides of the equation yields the parametric equations of the line.

(b) It follows from 5 that a vector equation of the line is x = x0 + tv. If we let x = (x, y, z) and substitute the given point and vector, then this equation can be expressed in vector form as

(7)

Equating corresponding components on the two sides of this equation yields the parametric equations of the line.

(c) A point on the line represented by Equation 7 can be obtained by substituting a specific numerical value for the parameter t. However, since t = 0 produces x = x0, which is the given point, this value of t does not serve our purpose. Taking, say, t = 1 produces one point different from x0, and taking t = −1 produces another. Any other distinct values for t (except t = 0) would work just as well.
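The short Python sketch below illustrates the same idea. Since the point and vector of this example are not reproduced above, the values of x0 and v are hypothetical placeholders, not the data of Example 1.

```python
# Sketch: generating points on the line x = x0 + t*v of Theorem 3.4.1.
# The point x0 and direction v below are hypothetical placeholders.
x0 = (1.0, 2.0, -3.0)   # a point on the line
v = (4.0, 0.0, 1.0)     # a nonzero vector parallel to the line

def point_on_line(t):
    """Return x0 + t*v, the point on the line with parameter t."""
    return tuple(p + t * d for p, d in zip(x0, v))

for t in (-1, 0, 1):
    print(t, point_on_line(t))   # t = 0 reproduces the point x0 itself
```

Choosing any two parameter values other than t = 0 produces two points on the line different from x0, exactly as in part (c).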

E X A M P L E 2 Vector and Parametric Equations of a Plane in R3
Find vector and parametric equations of the given plane.

Solution We will find the parametric equations first. We can do this by solving the equation of the plane for any one of the variables in terms of the other two and then using those two variables as parameters. For example, solving for x in terms of y and z yields

(8)

and then using y and z as parameters t1 and t2, respectively, yields the parametric equations of the plane.

We would have obtained different parametric and vector equations in Example 2 had we solved 8 for y or z rather than x. However, one can show that the same plane results in all three cases as the parameters vary from −∞ to +∞.

To obtain a vector equation of the plane we rewrite these parametric equations as a single equation for the vector x = (x, y, z) or, equivalently, split the right side into the form x = x0 + t1 v1 + t2 v2, a constant vector plus terms involving the parameters.

E X A M P L E 3 Vector and Parametric Equations of Lines and Planes in R4
(a) Find vector and parametric equations of the line through the origin of R^4 that is parallel to a given vector v.
(b) Find vector and parametric equations of the plane in R^4 that passes through a given point x0 and is parallel to two given vectors v1 and v2.

Solution (a) If we let x = (x1, x2, x3, x4), then the vector equation x = tv can be expressed component by component. Equating corresponding components yields the parametric equations of the line.

(b) The vector equation x = x0 + t1 v1 + t2 v2 can be expressed component by component, which yields the parametric equations of the plane.

Lines Through Two Points in Rn If x0 and x1 are distinct points in R^n, then the line determined by these points is parallel to the vector v = x1 − x0 (Figure 3.4.5), so it follows from 5 that the line can be expressed in vector form as

x = x0 + t(x1 − x0)   (9)

or, equivalently, as

x = (1 − t)x0 + t x1   (10)

These are called the two-point vector equations of a line in R^n.

E X A M P L E 4 A Line Through Two Points in R2
Find vector and parametric equations for the line in R^2 that passes through two given points P and Q.

Solution We will see below that it does not matter which point we take to be x0 and which we take to be x1, so let us choose x0 = P and x1 = Q. It follows that x1 − x0 can be computed from the given points, and hence that a vector equation of the line is

(11)

which we can rewrite in parametric form. Had we reversed our choices and taken x0 = Q and x1 = P, then the resulting vector equation would have been

(12)

with correspondingly different parametric equations (verify). Although 11 and 12 look different, they both represent the same line, whose equation in rectangular coordinates is shown in Figure 3.4.6. This can be seen by eliminating the parameter t from the parametric equations (verify).
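The following sketch makes the same point numerically. The points P and Q below are hypothetical placeholders, not the points of Example 4.

```python
# Sketch: two-point form x = x0 + t*(x1 - x0), Equation 9, for hypothetical
# points P and Q.  Swapping the roles of the two points reparametrizes the
# same line: the point with parameter t in one version has parameter 1 - t
# in the other.
P = (2.0, 1.0)
Q = (5.0, 7.0)

def line_point(a, b, t):
    """Point on the line through a and b with parameter t: a + t*(b - a)."""
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))

for t in (0.0, 0.25, 1.0):
    p1 = line_point(P, Q, t)
    p2 = line_point(Q, P, 1.0 - t)
    print(all(abs(a - b) < 1e-12 for a, b in zip(p1, p2)))   # expected: True
```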

Figure 3.4.5

Figure 3.4.6

The point x = x0 + t(x1 − x0) in Equations 9 and 10 traces an entire line in R^n as the parameter t varies over the interval (−∞, +∞). If, however, we restrict the parameter to vary from t = 0 to t = 1, then x will not trace the entire line but rather just the line segment joining the points x0 and x1. The point x will start at x0 when t = 0 and end at x1 when t = 1. Accordingly, we make the following definition.

DEFINITION 3 If x0 and x1 are vectors in R^n, then the equation

x = x0 + t(x1 − x0)   (0 ≤ t ≤ 1)   (13)

defines the line segment from x0 to x1. When convenient, Equation 13 can be written as

x = (1 − t)x0 + t x1   (0 ≤ t ≤ 1)   (14)

E X A M P L E 5 A Line Segment from One Point to Another in R2
It follows from 13 and 14 that the line segment in R^2 from a given point x0 to a given point x1 can be represented either by the equation

x = x0 + t(x1 − x0)   (0 ≤ t ≤ 1)

or by

x = (1 − t)x0 + t x1   (0 ≤ t ≤ 1)
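A minimal sketch of Definition 3, using hypothetical endpoints since those of Example 5 are not reproduced above:

```python
# Sketch: points on the line segment x = (1 - t)*x0 + t*x1 (Equation 14),
# 0 <= t <= 1.  The endpoints below are hypothetical placeholders.
x0 = (1.0, -3.0)
x1 = (5.0, 6.0)

def segment_point(t):
    assert 0.0 <= t <= 1.0, "the segment corresponds to 0 <= t <= 1"
    return tuple((1.0 - t) * a + t * b for a, b in zip(x0, x1))

print(segment_point(0.0))   # the starting point x0
print(segment_point(0.5))   # the midpoint of the segment
print(segment_point(1.0))   # the ending point x1
```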

Dot Product Form of a Linear System Our next objective is to show how to express linear equations and linear systems in dot product notation. This will lead us to some important results about orthogonality and linear systems. Recall that a linear equation in the variables x1, x2, …, xn has the form

a1x1 + a2x2 + ⋯ + anxn = b   (15)

and that the corresponding homogeneous equation is

a1x1 + a2x2 + ⋯ + anxn = 0   (16)

These equations can be rewritten in vector form by letting

a = (a1, a2, …, an)   and   x = (x1, x2, …, xn)

in which case Formula 15 can be written as

a · x = b   (17)

and Formula 16 as

a · x = 0   (18)

Except for a notational change from n to a, Formula 18 is the extension to R^n of Formula 6 in Section 3.3. This equation reveals that each solution vector x of a homogeneous equation is orthogonal to the coefficient vector a. To take this geometric observation a step further, consider the homogeneous system

a11x1 + a12x2 + ⋯ + a1nxn = 0
a21x1 + a22x2 + ⋯ + a2nxn = 0
⋮
am1x1 + am2x2 + ⋯ + amnxn = 0

If we denote the successive row vectors of the coefficient matrix by r1, r2, …, rm, then we can rewrite this system in dot product form as

r1 · x = 0,  r2 · x = 0,  …,  rm · x = 0   (19)

from which we see that every solution vector x is orthogonal to every row vector of the coefficient matrix. In summary, we have the following result.

THEOREM 3.4.3 If A is an m × n matrix, then the solution set of the homogeneous linear system Ax = 0 consists of all vectors in R^n that are orthogonal to every row vector of A.

E X A M P L E 6 Orthogonality of Row Vectors and Solution Vectors
We showed in Example 6 of Section 1.2 that the general solution of the homogeneous linear system considered there can be rewritten in vector form as a linear combination of fixed vectors with the parameters as coefficients.

According to Theorem 3.4.3, the vector x must be orthogonal to each of the row vectors of the coefficient matrix. We will confirm that x is orthogonal to the first row vector r1, and leave it for you to verify that x is orthogonal to the other three row vectors as well. The dot product of r1 and x is zero, which establishes the orthogonality.
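The same check can be carried out numerically. The coefficient matrix below is a hypothetical placeholder, not the system of Example 6; the sketch only illustrates Theorem 3.4.3.

```python
import numpy as np

# Hypothetical coefficient matrix of a homogeneous system Ax = 0.
A = np.array([[1.0, 3.0, -2.0, 0.0],
              [2.0, 6.0, -5.0, -2.0]])

# Rows of Vt beyond the rank of A form an orthonormal basis of the null space.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:]

# Every row vector of A is orthogonal to every solution of Ax = 0.
print(np.allclose(A @ null_basis.T, 0.0))   # expected: True
```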

The Relationship Between Ax = 0 and Ax = b We will conclude this section by exploring the relationship between the solutions of a homogeneous linear system Ax = 0 and the solutions (if any) of a nonhomogeneous linear system Ax = b that has the same coefficient matrix. These are called corresponding linear systems. To motivate the result we are seeking, let us compare the solutions of the corresponding linear systems

We showed in Example 5 and Example 6 of Section 1.2 that the general solutions of these linear systems can be written in parametric form as

which we can then rewrite in vector form as

By splitting the vectors on the right apart and collecting terms with like parameters, we can rewrite these equations as (20)

(21)

Formulas 20 and 21 reveal that each solution of the nonhomogeneous system can be obtained by adding a fixed vector (a particular solution of the nonhomogeneous system) to the corresponding solution of the homogeneous system. This is a special case of the following general result.

THEOREM 3.4.4 The general solution of a consistent linear system Ax = b can be obtained by adding any specific solution of Ax = b to the general solution of Ax = 0.

Proof Let x0 be any specific solution of Ax = b, let W denote the solution set of Ax = 0, and let x0 + W denote the set of all vectors that result by adding x0 to each vector in W. We must show that if x is a vector in x0 + W, then x is a solution of Ax = b, and conversely, that every solution of Ax = b is in the set x0 + W.

Assume first that x is a vector in x0 + W. This implies that x is expressible in the form x = x0 + w, where Ax0 = b and Aw = 0. Thus,

Ax = A(x0 + w) = Ax0 + Aw = b + 0 = b

which shows that x is a solution of Ax = b.

Conversely, let x be any solution of Ax = b. To show that x is in the set x0 + W we must show that x is expressible in the form

x = x0 + w   (22)

where w is in W (i.e., Aw = 0). We can do this by taking w = x − x0. This vector obviously satisfies 22, and it is in W since

Aw = A(x − x0) = Ax − Ax0 = b − b = 0

which completes the proof.

Figure 3.4.7 The solution set of Ax = b is a translation of the solution space of Ax = 0.

Remark Theorem 3.4.4 has a useful geometric interpretation that is illustrated in Figure 3.4.7. If, as discussed in Section 3.1, we interpret vector addition as translation, then the theorem states that if x0 is any specific solution of Ax = b, then the entire solution set of Ax = b can be obtained by translating the solution set of Ax = 0 by the vector x0.
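A numeric sketch of Theorem 3.4.4 follows. The small system below is a hypothetical placeholder (not the system from Section 1.2 referred to above); the point is only that a particular solution plus any null-space vector again solves Ax = b.

```python
import numpy as np

# Hypothetical consistent system Ax = b (placeholder data).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 7.0]])
b = np.array([6.0, 13.0])

# A particular solution (least squares returns an exact one here,
# since the system is consistent).
x_p, *_ = np.linalg.lstsq(A, b, rcond=None)

# One vector in the solution space of Ax = 0 (the null space of A).
_, s, Vt = np.linalg.svd(A)
w = Vt[int(np.sum(s > 1e-12)):][0]

# x_p + t*w solves Ax = b for every scalar t.
for t in (0.0, 1.0, -2.5):
    print(np.allclose(A @ (x_p + t * w), b))   # expected: True each time
```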

Concept Review
• Parameters
• Parametric equations of lines
• Parametric equations of planes
• Two-point vector equations of a line
• Vector equation of a line
• Vector equation of a plane

Skills
• Express the equations of lines in R^2 and R^3 using either vector or parametric equations.
• Express the equations of planes in R^3 using either vector or parametric equations.
• Express the equation of a line containing two given points in R^2 or R^3 using either vector or parametric equations.
• Find equations of a line and a line segment.
• Verify the orthogonality of the row vectors of a linear system of equations and a solution vector.
• Use a specific solution to the nonhomogeneous linear system Ax = b and the general solution of the corresponding linear system Ax = 0 to obtain the general solution to Ax = b.

Exercise Set 3.4 In Exercises 1–4, find vector and parametric equations of the line containing the point and parallel to the vector. 1. Point:

; vector:

Answer: Vector equation: parametric equations:

;

2. Point:

; vector:

3. Point:

; vector:

Answer: Vector equation:

;

parametric equations: 4. Point:

; vector:

In Exercises 5–8, use the given equation of a line to find a point on the line and a vector parallel to the line. 5. Answer: Point:

; parallel vector:

6. 7. Answer: Point: (4, 6); parallel vector: 8. In Exercises 9–12, find vector and parametric equations of the plane containing the given point and parallel vectors. 9. Point:

; vectors:

and

Answer: Vector equation: parametric equations: 10. Point:

; vectors:

and

11. Point:

; vectors:

and

Answer: Vector equation: parametric equations: 12. Point:

; vectors:

and

In Exercises 13–14, find vector and parametric equations of the line in

that passes through the origin and is orthogonal to v.

13. Answer: A possible answer is vector equation:

;

parametric equations: 14. In Exercises 15–16, find vector and parametric equations of the plane in 15.

that passes through the origin and is orthogonal to v.

[Hint: Construct two nonparallel vectors orthogonal to v in Answer:

].

A possible answer is vector equation:

;

parametric equations: 16. In Exercises 17–20, find the general solution to the linear system and confirm that the row vectors of the coefficient matrix are orthogonal to the solution vectors. 17.

Answer:

18. 19.

Answer:

20. 21. (a) The equation can be viewed as a linear system of one equation in three unknowns. Express a general solution of this equation as a particular solution plus a general solution of the associated homogeneous system. (b) Give a geometric interpretation of the result in part (a). Answer: (a) (b) a plane in

passing through P(1, 0, 0) and parallel to

and

22. (a) The equation can be viewed as a linear system of one equation in two unknowns. Express a general solution of this equation as a particular solution plus a general solution of the associated homogeneous system. (b) Give a geometric interpretation of the result in part (a). 23. (a) Find a homogeneous linear system of two equations in three unknowns whose solution space consists of those vectors in orthogonal to and .

that are

(b) What kind of geometric object is the solution space? (c) Find a general solution of the system obtained in part (a), and confirm that Theorem 3.4.3 holds. Answer: (a) (b) a line through the origin in (c) 24. (a) Find a homogeneous linear system of two equations in three unknowns whose solution space consists of those vectors in orthogonal to and . (b) What kind of geometric object is the solution space?

that are

(c) Find a general solution of the system obtained in part (a), and confirm that Theorem 3.4.3 holds. 25. Consider the linear systems

and

(a) Find a general solution of the homogeneous system. (b) Confirm that

is a solution of the nonhomogeneous system.

(c) Use the results in parts (a) and (b) to find a general solution of the nonhomogeneous system. (d) Check your result in part (c) by solving the nonhomogeneous system directly. Answer: a. c. 26. Consider the linear systems

and

(a) Find a general solution of the homogeneous system. (b) Confirm that

is a solution of the nonhomogeneous system.

(c) Use the results in parts (a) and (b) to find a general solution of the nonhomogeneous system. (d) Check your result in part (c) by solving the nonhomogeneous system directly. In Exercises 27–28, find a general solution of the system, and use that solution to find a general solution of the associated homogeneous system and a particular solution of the given system. 27.

Answer: ; The general solution of the associated homogeneous system is . A particular solution of the given system is 28.

True-False Exercises In parts (a)–(f) determine whether the statement is true or false, and justify your answer.

.

(a) The vector equation of a line can be determined from any point lying on the line and a nonzero vector parallel to the line. Answer: True (b) The vector equation of a plane can be determined from any point lying in the plane and a nonzero vector parallel to the plane. Answer: False (c) The points lying on a line through the origin in

or

are all scalar multiples of any nonzero vector on the line.

Answer: True (d) All solution vectors of the linear system

are orthogonal to the row vectors of the matrix A if and only if

.

Answer: True (e) The general solution of the nonhomogeneous linear system homogeneous linear system .

can be obtained by adding b to the general solution of the

Answer: False (f) If and are two solutions of the nonhomogeneous linear system homogeneous linear system. Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

, then

is a solution of the corresponding

3.5 Cross Product This optional section is concerned with properties of vectors in 3-space that are important to physicists and engineers. It can be omitted, if desired, since subsequent sections do not depend on its content. Among other things, we define an operation that provides a way of constructing a vector in 3-space that is perpendicular to two given vectors, and we give a geometric interpretation of determinants.

Cross Product of Vectors In Section 3.2 we defined the dot product of two vectors u and v in n-space. That operation produced a scalar as its result. We will now define a type of vector multiplication that produces a vector as the result but which is applicable only to vectors in 3-space.

DEFINITION 1 If u = (u1, u2, u3) and v = (v1, v2, v3) are vectors in 3-space, then the cross product u × v is the vector defined by

u × v = (u2v3 − u3v2, u3v1 − u1v3, u1v2 − u2v1)

or, in determinant notation,

u × v = ( det[u2 u3; v2 v3], −det[u1 u3; v1 v3], det[u1 u2; v1 v2] )   (1)

where det[a b; c d] denotes the 2 × 2 determinant ad − bc.

Remark Instead of memorizing 1, you can obtain the components of u × v as follows:
• Form the 2 × 3 matrix whose first row contains the components of u and whose second row contains the components of v.
• To find the first component of u × v, delete the first column and take the determinant; to find the second component, delete the second column and take the negative of the determinant; and to find the third component, delete the third column and take the determinant.

E X A M P L E 1 Calculating a Cross Product
Find u × v for the given vectors u and v.

Solution From either 1 or the mnemonic in the preceding remark, we obtain the three components of u × v by evaluating the three 2 × 2 determinants.
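Because the vectors of Example 1 are not reproduced above, the following sketch computes a cross product for hypothetical vectors, both directly from Definition 1 and with NumPy's built-in routine.

```python
import numpy as np

def cross(u, v):
    """Cross product from Definition 1:
    (u2*v3 - u3*v2, u3*v1 - u1*v3, u1*v2 - u2*v1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2*v3 - u3*v2, u3*v1 - u1*v3, u1*v2 - u2*v1)

# Hypothetical vectors (not those of Example 1).
u = (1.0, 2.0, -2.0)
v = (3.0, 0.0, 1.0)

print(cross(u, v))                         # component formula
print(np.cross(np.array(u), np.array(v)))  # NumPy agrees
```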

The following theorem gives some important relationships between the dot product and cross product and also shows that u × v is orthogonal to both u and v.

Historical Note The cross product notation was introduced by the American physicist and mathematician J. Willard Gibbs (see p. 134) in a series of unpublished lecture notes for his students at Yale University. It appeared in a published work for the first time in the second edition of the book Vector Analysis by Edwin Wilson (1879–1964), a student of Gibbs. Gibbs originally referred to u × v as the "skew product."

THEOREM 3.5.1 Relationships Involving Cross Product and Dot Product
If u, v, and w are vectors in 3-space, then
(a) u · (u × v) = 0   (u × v is orthogonal to u)
(b) v · (u × v) = 0   (u × v is orthogonal to v)
(c) ||u × v||^2 = ||u||^2 ||v||^2 − (u · v)^2   (Lagrange's identity)
(d) u × (v × w) = (u · w)v − (u · v)w   (relationship between cross and dot products)
(e) (u × v) × w = (u · w)v − (v · w)u   (relationship between cross and dot products)

Proof (a) Let u = (u1, u2, u3) and v = (v1, v2, v3). Then

u · (u × v) = u1(u2v3 − u3v2) + u2(u3v1 − u1v3) + u3(u1v2 − u2v1) = 0

Proof (b) Similar to (a).

Proof (c) Since

||u × v||^2 = (u2v3 − u3v2)^2 + (u3v1 − u1v3)^2 + (u1v2 − u2v1)^2   (2)

and

||u||^2 ||v||^2 − (u · v)^2 = (u1^2 + u2^2 + u3^2)(v1^2 + v2^2 + v3^2) − (u1v1 + u2v2 + u3v3)^2   (3)

the proof can be completed by "multiplying out" the right sides of 2 and 3 and verifying their equality.

Proof (d) and (e) See Exercises 38 and 39.

E X A M P L E 2 u × v Is Perpendicular to u and to v
Consider the vectors u and v of Example 1, for which we computed u × v. Since u · (u × v) = 0 and v · (u × v) = 0, the vector u × v is orthogonal to both u and v, as guaranteed by Theorem 3.5.1.

Joseph Louis Lagrange (1736-1813) Historical Note Joseph Louis Lagrange was a French-Italian mathematician and astronomer. Although his father wanted him to become a lawyer, Lagrange was attracted to mathematics and astronomy after reading a memoir by the astronomer Halley. At age 16 he began to study mathematics on his own and by age 19 was appointed to a professorship at the Royal Artillery School in Turin. The following year he solved some famous problems using new methods that eventually blossomed into a branch of mathematics called the calculus of variations. These methods and Lagrange's applications of them to problems in celestial mechanics were so monumental that by age 25 he was regarded by many of his contemporaries as the greatest living mathematician. One of Lagrange's most famous works is a memoir, Mécanique Analytique, in which he reduced the theory of mechanics to a few general formulas from which all other necessary equations could be derived. Napoleon was a great admirer of Lagrange and showered him with many honors. In spite of his fame, Lagrange was a shy and modest man. On his death, he was buried with honor in the Pantheon. [Image: ©SSPL/The Image Works]

The main arithmetic properties of the cross product are listed in the next theorem.

THEOREM 3.5.2 Properties of Cross Product
If u, v, and w are any vectors in 3-space and k is any scalar, then:
(a) u × v = −(v × u)
(b) u × (v + w) = (u × v) + (u × w)
(c) (u + v) × w = (u × w) + (v × w)
(d) k(u × v) = (ku) × v = u × (kv)
(e) u × 0 = 0 × u = 0
(f) u × u = 0

The proofs follow immediately from Formula 1 and properties of determinants; for example, part (a) can be proved as follows.

Proof (a) Interchanging u and v in 1 interchanges the rows of the three determinants on the right side of 1 and hence changes the sign of each component in the cross product. Thus u × v = −(v × u). The proofs of the remaining parts are left as exercises.

E X A M P L E 3 Standard Unit Vectors
Consider the vectors i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1). These vectors each have length 1 and lie along the coordinate axes (Figure 3.5.1). They are called the standard unit vectors in 3-space. Every vector v = (v1, v2, v3) in 3-space is expressible in terms of i, j, and k since we can write v = v1i + v2j + v3k. From 1 we obtain, for example, i × j = (0·0 − 0·1, 0·0 − 1·0, 1·1 − 0·0) = (0, 0, 1) = k.

Figure 3.5.1 The standard unit vectors

You should have no trouble obtaining the following results:

i × i = 0    j × j = 0    k × k = 0
i × j = k    j × k = i    k × i = j
j × i = −k   k × j = −i   i × k = −j

Figure 3.5.2 is helpful for remembering these results. Referring to this diagram, the cross product of two consecutive vectors going clockwise is the next vector around, and the cross product of two consecutive vectors going counterclockwise is the negative of the next vector around.

Figure 3.5.2

Determinant Form of Cross Product It is also worth noting that a cross product can be represented symbolically in the form

u × v = det[ i  j  k ; u1 u2 u3 ; v1 v2 v3 ] = det[u2 u3; v2 v3] i − det[u1 u3; v1 v3] j + det[u1 u2; v1 v2] k   (4)

in which the 3 × 3 "determinant" is expanded along its first row. For example, applying 4 to the vectors u and v of Example 1 reproduces the result obtained there.

WARNING It is not true in general that u × (v × w) = (u × v) × w. For example,

i × (j × j) = i × 0 = 0   and   (i × j) × j = k × j = −i

and so

i × (j × j) ≠ (i × j) × j

We know from Theorem 3.5.1 that u × v is orthogonal to both u and v. If u and v are nonzero vectors, it can be shown that the direction of u × v can be determined using the following "right-hand rule" (Figure 3.5.3): Let θ be the angle between u and v, and suppose u is rotated through the angle θ until it coincides with v. If the fingers of the right hand are cupped so that they point in the direction of rotation, then the thumb indicates (roughly) the direction of u × v.

Figure 3.5.3

You may find it instructive to practice this rule with the products i × j = k, j × k = i, and k × i = j.

Geometric Interpretation of Cross Product If u and v are vectors in 3-space, then the norm of u × v has a useful geometric interpretation. Lagrange's identity, given in Theorem 3.5.1, states that

||u × v||^2 = ||u||^2 ||v||^2 − (u · v)^2   (5)

If θ denotes the angle between u and v, then u · v = ||u|| ||v|| cos θ, so 5 can be rewritten as

||u × v||^2 = ||u||^2 ||v||^2 − ||u||^2 ||v||^2 cos^2 θ = ||u||^2 ||v||^2 (1 − cos^2 θ) = ||u||^2 ||v||^2 sin^2 θ

Since 0 ≤ θ ≤ π, it follows that sin θ ≥ 0, so this can be rewritten as

||u × v|| = ||u|| ||v|| sin θ   (6)

But ||v|| sin θ is the altitude of the parallelogram determined by u and v (Figure 3.5.4). Thus, from 6, the area A of this parallelogram is given by

A = (base)(altitude) = ||u|| ||v|| sin θ = ||u × v||

This result is even correct if u and v are collinear, since the parallelogram determined by u and v has zero area and from 6 we have u × v = 0 because θ = 0 in this case. Thus we have the following theorem.

THEOREM 3.5.3 Area of a Parallelogram If u and v are vectors in 3-space, then ||u × v|| is equal to the area of the parallelogram determined by u and v.

E X A M P L E 4 Area of a Triangle
Find the area of the triangle determined by three given points P1, P2, and P3 in 3-space (Figure 3.5.5).

Solution The area A of the triangle is 1/2 the area of the parallelogram determined by the vectors P1P2 and P1P3. Using the method discussed in Example 1 of Section 3.1, we compute P1P2 and P1P3 from the given points; it then follows that P1P2 × P1P3 can be evaluated (verify) and consequently that

A = (1/2) ||P1P2 × P1P3||
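The same computation is easy to carry out numerically. The vertices below are hypothetical placeholders, not the points of Example 4.

```python
import numpy as np

# Hypothetical vertices of a triangle in 3-space (placeholders).
P1 = np.array([1.0, 0.0, 0.0])
P2 = np.array([0.0, 2.0, 0.0])
P3 = np.array([0.0, 0.0, 3.0])

edge1 = P2 - P1                    # vector from P1 to P2
edge2 = P3 - P1                    # vector from P1 to P3
area = 0.5 * np.linalg.norm(np.cross(edge1, edge2))
print(area)                        # half the area of the parallelogram
```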

DEFINITION 2 If u, v, and w are vectors in 3-space, then u · (v × w) is called the scalar triple product of u, v, and w.

Figure 3.5.4

Figure 3.5.5


The scalar triple product of u = (u1, u2, u3), v = (v1, v2, v3), and w = (w1, w2, w3) can be calculated from the formula

u · (v × w) = det[ u1 u2 u3 ; v1 v2 v3 ; w1 w2 w3 ]   (7)

This follows from Formula 4 since

u · (v × w) = u1 det[v2 v3; w2 w3] − u2 det[v1 v3; w1 w3] + u3 det[v1 v2; w1 w2]

which is the cofactor expansion of the determinant in 7 along its first row.

E X A M P L E 5 Calculating a Scalar Triple Product
Calculate the scalar triple product u · (v × w) of the given vectors u, v, and w.

Solution From 7, the scalar triple product is the 3 × 3 determinant whose successive rows are the components of u, v, and w.
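As a quick numeric check of Formula 7 (the vectors below are hypothetical placeholders, not those of Example 5), the determinant value agrees with a direct u · (v × w) computation:

```python
import numpy as np

# Hypothetical vectors (not those of Example 5).
u = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 4.0])
w = np.array([5.0, 6.0, 0.0])

triple_via_det = np.linalg.det(np.vstack([u, v, w]))   # Formula 7
triple_direct = np.dot(u, np.cross(v, w))              # u . (v x w)
print(triple_via_det, triple_direct)                   # equal up to round-off
```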

Remark The symbol (u · v) × w makes no sense because we cannot form the cross product of a scalar and a vector. Thus, no ambiguity arises if we write u · v × w rather than u · (v × w). However, for clarity we will usually keep the parentheses. It follows from 7 that

u · (v × w) = w · (u × v) = v · (w × u)

since the determinants that represent these products can be obtained from one another by two row interchanges. (Verify.) These relationships can be remembered by moving the vectors u, v, and w clockwise around the vertices of the triangle in Figure 3.5.6.

Figure 3.5.6

Geometric Interpretation of Determinants The next theorem provides a useful geometric interpretation of 2 × 2 and 3 × 3 determinants.

THEOREM 3.5.4
(a) The absolute value of the determinant

det[ u1 u2 ; v1 v2 ]

is equal to the area of the parallelogram in 2-space determined by the vectors u = (u1, u2) and v = (v1, v2). (See Figure 3.5.7a.)

(b) The absolute value of the determinant

det[ u1 u2 u3 ; v1 v2 v3 ; w1 w2 w3 ]

is equal to the volume of the parallelepiped in 3-space determined by the vectors u = (u1, u2, u3), v = (v1, v2, v3), and w = (w1, w2, w3). (See Figure 3.5.7b.)

Figure 3.5.7

Proof (a) The key to the proof is to use Theorem 3.5.3. However, that theorem applies to vectors in 3-space, whereas u = (u1, u2) and v = (v1, v2) are vectors in 2-space. To circumvent this "dimension problem," we will view u and v as vectors in the xy-plane of an xyz-coordinate system (Figure 3.5.7c), in which case these vectors are expressed as u = (u1, u2, 0) and v = (v1, v2, 0). Thus

u × v = (u1v2 − u2v1)k

It now follows from Theorem 3.5.3 and the fact that ||k|| = 1 that the area A of the parallelogram determined by u and v is

A = ||u × v|| = ||(u1v2 − u2v1)k|| = |u1v2 − u2v1| = |det[ u1 u2 ; v1 v2 ]|

which completes the proof.

Proof (b) As shown in Figure 3.5.8, take the base of the parallelepiped determined by u, v, and w to be the parallelogram determined by v and w. It follows from Theorem 3.5.3 that the area of the base is ||v × w|| and, as illustrated in Figure 3.5.8, the height h of the parallelepiped is the length of the orthogonal projection of u on v × w. Therefore, by Formula 12 of Section 3.3,

h = |u · (v × w)| / ||v × w||

It follows that the volume V of the parallelepiped is

V = (area of base) · (height) = ||v × w|| · |u · (v × w)| / ||v × w|| = |u · (v × w)|

so from 7,

V = |det[ u1 u2 u3 ; v1 v2 v3 ; w1 w2 w3 ]|   (8)

which completes the proof.

Figure 3.5.8

Remark If V denotes the volume of the parallelepiped determined by vectors u, v, and w, then it follows from Formulas 7 and 8 that

V = |u · (v × w)|   (9)

From this result and the discussion immediately following Definition 3 of Section 3.2, we can conclude that

u · (v × w) = ±V

where the + or − results depending on whether u makes an acute or an obtuse angle with v × w.

Formula 9 leads to a useful test for ascertaining whether three given vectors lie in the same plane. Since three vectors not in the same plane determine a parallelepiped of positive volume, it follows from 9 that |u · (v × w)| = 0 if and only if the vectors u, v, and w lie in the same plane. Thus we have the following result.

THEOREM 3.5.5 If the vectors u = (u1, u2, u3), v = (v1, v2, v3), and w = (w1, w2, w3) have the same initial point, then they lie in the same plane if and only if

u · (v × w) = det[ u1 u2 u3 ; v1 v2 v3 ; w1 w2 w3 ] = 0
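A short numeric sketch of Theorem 3.5.5 with hypothetical vectors: the scalar triple product is (numerically) zero exactly when the three vectors are coplanar.

```python
import numpy as np

def coplanar(u, v, w, tol=1e-12):
    """Theorem 3.5.5: u, v, w (same initial point) lie in a plane iff u . (v x w) = 0."""
    return abs(np.dot(u, np.cross(v, w))) < tol

u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
print(coplanar(u, v, u + 3 * v))                  # True: third vector lies in span{u, v}
print(coplanar(u, v, np.array([0.0, 0.0, 1.0])))  # False: parallelepiped has positive volume
```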

Concept Review
• Cross product of two vectors
• Determinant form of cross product
• Scalar triple product

Skills
• Compute the cross product of two vectors u and v in 3-space.
• Know the geometric relationship between u × v and the vectors u and v.
• Know the properties of the cross product (listed in Theorem 3.5.2).
• Compute the scalar triple product of three vectors in 3-space.
• Know the geometric interpretation of the scalar triple product.
• Compute the areas of triangles and parallelograms determined by two vectors or three points in 2-space or 3-space.
• Use the scalar triple product to determine whether three given vectors in 3-space lie in the same plane.

Exercise Set 3.5 In Exercises 1–2, let 1. (a) (b) (c) Answer:

and

. Compute the indicated vectors.

(a) (b) (c) 2. (a) (b) (c) In Exercises 3–6, use the cross product to find a vector that is orthogonal to both u and v. 3. Answer:

4. 5. Answer:

6. In Exercises 7–10, find the area of the parallelogram determined by the given vectors u and v. 7. Answer:

8. 9. Answer:

10. In Exercises 11–12, find the area of the parallelogram with the given vertices. 11. Answer: 3 12. In Exercises 13–14, find the area of the triangle with the given vertices. 13. Answer:

7 14. In Exercises 15–16, find the area of the triangle in 3-space that has the given vertices. 15. Answer:

16. In Exercises 17–18, find the volume of the parallelepiped with sides u, v, and w. 17. Answer: 16 18. In Exercises 19–20, determine whether u, v, and w lie in the same plane when positioned so that their initial points coincide. 19. Answer: The vectors do not lie in the same plane. 20. In Exercises 21–24, compute the scalar triple product 21. Answer: 22. 23. Answer: abc 24. In Exercises 25–26, suppose that 25. (a) (b) (c)

. Find

.

Answer: (a) (b) 3 (c) 3 26. (a) (b) (c) 27. (a) Find the area of the triangle having vertices

,

, and

.

(b) Use the result of part (a) to find the length of the altitude from vertex C to side AB. Answer: (a) (b) 28. Use the cross product to find the sine of the angle between the vectors 29. Simplify

and

.

.

Answer:

30. Let

,

,

, and

. Show that

31. Let u, v, and w be nonzero vectors in 3-space with the same initial point, but such that no two of them are collinear. Show that (a)

lies in the plane determined by v and w.

(b)

lies in the plane determined by u and v.

32. Prove the following identities. (a) (b) 33. Prove: If a, b, c, and d lie in the same plane, then 34. Prove: If θ is the angle between u and v and 35. Show that if u, v, and w are vectors in

. , then

no two of which are collinear, then

. lies in the plane

determined by v and w. 36. It is a theorem of solid geometry that the volume of a tetrahedron is to prove that the volume of a tetrahedron whose sides are the vectors a, b, and c is accompanying figure).

. Use this result (see the

Figure Ex-36 37. Use the result of Exercise 26 to find the volume of the tetrahedron with vertices P, Q, R, S. (a) (b) Answer: (a) (b) 38. Prove part (d) of Theorem 3.5.1. [Hint: First prove the result in the case where , and then when . Finally, prove it for an arbitrary vector by writing .]

, then when

39. Prove part (e) of Theorem 3.5.1. [Hint: Apply part (a) of Theorem 3.5.2 to the result in part (d) of Theorem 3.5.1.] 40. Prove: (a) Prove (b) of Theorem 3.5.2. (b) Prove (c) of Theorem 3.5.2. (c) Prove (d) of Theorem 3.5.2. (d) Prove (e) of Theorem 3.5.2. (e) Prove (f) of Theorem 3.5.2.

True-False Exercises In parts (a)–(f) determine whether the statement is true or false, and justify your answer. (a) The cross product of two nonzero vectors u and v is a nonzero vector if and only if u and v are not parallel. Answer: True (b) A normal vector to a plane can be obtained by taking the cross product of two nonzero and noncollinear vectors lying in the plane. Answer: True (c) The scalar triple product of u, v, and w determines a vector whose length is equal to the volume of the parallelepiped determined by u, v, and w.

Answer: False (d) If u and v are vectors in 3-space, then

is equal to the area of the parallelogram determined by u and v.

Answer: True (e) For all vectors u, v, and w in 3-space, the vectors

and

are the same.

Answer: False (f) If u, v, and w are vectors in

, where u is nonzero and

Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

, then

.

Chapter 3 Supplementary Exercises 1. Let

,

, and

. Compute

(a) (b) (c) the distance between

and

(d) (e)

}

(f) Answer: (a) (b) (c) (d) (e) (f) 2. Repeat Exercise 1 for the vectors 3. Repeat parts (a)–(d) of Exercise 1 for the vectors .

,

, and

.

,

, and

Answer: (a) (b) (c) (d) 4. Repeat parts (a)–(d) of Exercise 1 for the vectors .

,

, and

In Exercises 5–6, determine whether the given set of vectors forms an orthogonal set. If so, normalize each vector to form an orthonormal set. 5.

, Answer: Not an orthogonal set

,

6.

,

,

7. (a) The set of all vectors in

that are orthogonal to a nonzero vector is what kind of geometric object?

(b) The set of all vectors in

that are orthogonal to a nonzero vector is what kind of geometric object?

(c) The set of all vectors in object?

that are orthogonal to two noncollinear vectors is what kind of geometric

(d) The set of all vectors in object?

that are orthogonal to two noncollinear vectors is what kind of geometric

Answer: (a) A line through the origin, perpendicular to the given vector. (b) A plane through the origin, perpendicular to the given vector. (c) {0} (the origin) (d) A line through the origin, perpendicular to the plane containing the two noncollinear vectors. 8. Show that which

and

are orthonormal vectors, and find a third vector

for

is an orthonormal set.

9. True or False: If u and v are nonzero vectors such that

, then u and v are

orthogonal. Answer: True 10. True or False: If u is orthogonal to

, then u is orthogonal to v and w.

11. Consider the points , , and component is and such that is parallel to .

. Find the point S in

whose first

Answer:

12. Consider the points , , and third component is 6 and such that is parallel to .

. Find the point S in

13. Using the points in Exercise 11, find the cosine of the angle between the vectors

and

.

and

.

Answer:

14. Using the points in Exercise 12, find the cosine of the angle between the vectors 15. Find the distance between the point Answer:

and the plane

.

whose

16. Show that the planes between the planes.

and

are parallel, and find the distance

In Exercises 17–22, find vector and parametric equations for the line or plane in question. 17. The plane in

that contains the points

,

, and

.

Answer: Vector equation:

;

parametric equations: 18. The line in

that contains the point

and is orthogonal to the plane

19. The line in

that is parallel to the vector

.

and contains the point

.

Answer: Vector equation:

;

parametric equations: 20. The plane in 21. The line in

that contains the point with equation

and parallel to the plane

.

.

Answer: A possible answer is vector equation: 22. The plane in

with equation

; parametric equations: .

In Exercises 23–25, find a point-normal equation for the given plane. 23. The plane that is represented by the vector equation . Answer:

24. The plane that contains the point , , and . 25. The plane that passes through the points Answer:

and is orthogonal to the line with parametric equations ,

, and

.

26. Suppose that and all i and j. Prove that if are orthogonal.

are two sets of vectors such that are any scalars, then the vectors

27. Prove that if two vectors u and v in multiples of each other.

are orthogonal to a nonzero vector w in

28. Prove that

and

are orthogonal for and , then u and v are scalar

if and only if u and v are parallel vectors.

29. The equation represents a line through the origin in does this equation represent in if you think of it as Answer: A plane

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

if A and B are not both zero. What ? Explain.

CHAPTER 4
General Vector Spaces

CHAPTER CONTENTS
4.1. Real Vector Spaces
4.2. Subspaces
4.3. Linear Independence
4.4. Coordinates and Basis
4.5. Dimension
4.6. Change of Basis
4.7. Row Space, Column Space, and Null Space
4.8. Rank, Nullity, and the Fundamental Matrix Spaces
4.9. Matrix Transformations from R^n to R^m
4.10. Properties of Matrix Transformations
4.11. Geometry of Matrix Operators on R^2
4.12. Dynamical Systems and Markov Chains

INTRODUCTION Recall that we began our study of vectors by viewing them as directed line segments (arrows). We then extended this idea by introducing rectangular coordinate systems, which enabled us to view vectors as ordered pairs and ordered triples of real numbers. As we developed properties of these vectors we noticed patterns in various formulas that enabled us to extend the notion of a vector to an n-tuple of real numbers. Although n-tuples took us outside the realm of our "visual experience," they gave us a valuable tool for understanding and studying systems of linear equations. In this chapter we will extend the concept of a vector yet again by using the most important algebraic properties of vectors in R^n as axioms. These axioms, if satisfied by a set of objects, will enable us to think of those objects as vectors.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

4.1 Real Vector Spaces In this section we will extend the concept of a vector by using the basic properties of vectors in R^n as axioms, which, if satisfied by a set of objects, guarantee that those objects behave like familiar vectors.

Vector Space Axioms The following definition consists of ten axioms, eight of which are properties of vectors in R^n that were stated in Theorem 3.1.1. It is important to keep in mind that one does not prove axioms; rather, they are assumptions that serve as the starting point for proving theorems. Vector space scalars can be real numbers or complex numbers. Vector spaces with real scalars are called real vector spaces and those with complex scalars are called complex vector spaces. For now we will be concerned exclusively with real vector spaces. We will consider complex vector spaces later.

DEFINITION 1 Let V be an arbitrary nonempty set of objects on which two operations are defined: addition, and multiplication by scalars. By addition we mean a rule for associating with each pair of objects u and v in V an object u + v, called the sum of u and v; by scalar multiplication we mean a rule for associating with each scalar k and each object u in V an object ku, called the scalar multiple of u by k. If the following axioms are satisfied by all objects u, v, w in V and all scalars k and m, then we call V a vector space and we call the objects in V vectors.
1. If u and v are objects in V, then u + v is in V.
2. u + v = v + u
3. u + (v + w) = (u + v) + w
4. There is an object 0 in V, called a zero vector for V, such that 0 + u = u + 0 = u for all u in V.
5. For each u in V, there is an object −u in V, called a negative of u, such that u + (−u) = (−u) + u = 0.
6. If k is any scalar and u is any object in V, then ku is in V.
7. k(u + v) = ku + kv
8. (k + m)u = ku + mu
9. k(mu) = (km)u
10. 1u = u

Observe that the definition of a vector space does not specify the nature of the vectors or the operations. Any kind of object can be a vector, and the operations of addition and scalar multiplication need not have any relationship to those on R^n. The only requirement is that the ten vector space axioms be satisfied. In the examples that follow we will use four basic steps to show that a set with two operations is a vector space.

To Show that a Set with Two Operations is a Vector Space Step 1 Identify the set V of objects that will become vectors.

Step 2 Identify the addition and scalar multiplication operations on V. Step 3 Verify Axioms 1 and 6; that is, adding two vectors in V produces a vector in V, and multiplying a vector in V by a scalar also produces a vector in V. Axiom 1 is called closure under addition, and Axiom 6 is called closure under scalar multiplication. Step 4 Confirm that Axioms 2, 3, 4, 5, 7, 8, 9, and 10 hold.

Hermann Günther Grassmann (1809-1877) Historical Note The notion of an “abstract vector space” evolved over many years and had many contributors. The idea crystallized with the work of the German mathematician H. G. Grassmann, who published a paper in 1862 in which he considered abstract systems of unspecified elements on which he defined formal operations of addition and scalar multiplication. Grassmann's work was controversial, and others, including Augustin Cauchy (p. 137), laid reasonable claim to the idea. [Image: (c)Sueddeutsche Zeitung Photo/The Image Works]

Our first example is the simplest of all vector spaces in that it contains only one object. Since Axiom 4 requires that every vector space contain a zero vector, the object will have to be that vector.

E X A M P L E 1 The Zero Vector Space
Let V consist of a single object, which we denote by 0, and define 0 + 0 = 0 and k0 = 0 for all scalars k. It is easy to check that all the vector space axioms are satisfied. We call this the zero vector space.

Our second example is one of the most important of all vector spaces—the familiar space R^n. It should not be surprising that the operations on R^n satisfy the vector space axioms because those axioms were based on known properties of operations on R^n.

E X A M P L E 2 Rn Is a Vector Space
Let V = R^n, and define the vector space operations on V to be the usual operations of addition and scalar multiplication of n-tuples; that is,

u + v = (u1 + v1, u2 + v2, …, un + vn)
ku = (ku1, ku2, …, kun)

The set V = R^n is closed under addition and scalar multiplication because the foregoing operations produce n-tuples as their end result, and these operations satisfy Axioms 2, 3, 4, 5, 7, 8, 9, and 10 by virtue of Theorem 3.1.1.

Our next example is a generalization of R^n in which we allow vectors to have infinitely many components.

E X A M P L E 3 The Vector Space of Infinite Sequences of Real Numbers
Let V consist of objects of the form u = (u1, u2, …, un, …) in which u1, u2, …, un, … is an infinite sequence of real numbers. We define two infinite sequences to be equal if their corresponding components are equal, and we define addition and scalar multiplication componentwise by

u + v = (u1 + v1, u2 + v2, …, un + vn, …)
ku = (ku1, ku2, …, kun, …)

We leave it as an exercise to confirm that V with these operations is a vector space. We will denote this vector space by the symbol R^∞.

In the next example our vectors will be matrices. This may be a little confusing at first because matrices are composed of rows and columns, which are themselves vectors (row vectors and column vectors). However, here we will not be concerned with the individual rows and columns but rather with the properties of the matrix operations as they relate to the matrix as a whole. Note that Equation 1 involves three different addition operations: the addition operation on vectors, the addition operation on matrices, and the addition operation on real numbers.

E X A M P L E 4 A Vector Space of 2 × 2 Matrices
Let V be the set of 2 × 2 matrices with real entries, and take the vector space operations on V to be the usual operations of matrix addition and scalar multiplication; that is, matrices are added entry by entry, and a scalar multiplies every entry.   (1)

The set V is closed under addition and scalar multiplication because the foregoing operations produce 2 × 2 matrices as the end result. Thus, it remains to confirm that Axioms 2, 3, 4, 5, 7, 8, 9, and 10 hold. Some of these are standard properties of matrix operations. For example, Axiom 2 follows from Theorem 1.4.1a since u + v = v + u for matrix addition.

Similarly, Axioms 3, 7, 8, and 9 follow from parts (b), (h), (j), and (e), respectively, of that theorem (verify). This leaves Axioms 4, 5, and 10 that remain to be verified.

To confirm that Axiom 4 is satisfied, we must find a 2 × 2 matrix 0 in V for which 0 + u = u + 0 = u for all 2 × 2 matrices u in V. We can do this by taking 0 to be the 2 × 2 matrix with every entry zero. With this definition, 0 + u = u and similarly u + 0 = u. To verify that Axiom 5 holds, we must show that each object u in V has a negative −u in V such that u + (−u) = 0 and (−u) + u = 0. This can be done by defining −u to be the matrix whose entries are the negatives of the corresponding entries of u. With this definition, u + (−u) = 0 and similarly (−u) + u = 0. Finally, Axiom 10 holds because 1u = u.

E X A M P L E 5 The Vector Space of m × n Matrices
Example 4 is a special case of a more general class of vector spaces. You should have no trouble adapting the argument used in that example to show that the set V of all m × n matrices with the usual matrix operations of addition and scalar multiplication is a vector space. We will denote this vector space by the symbol M_mn. Thus, for example, the vector space in Example 4 is denoted as M_22.

In Example 6 the functions were defined on the entire interval (−∞, ∞). However, the arguments used in that example apply as well on all subintervals of (−∞, ∞), such as a closed interval [a, b] or an open interval (a, b). We will denote the vector spaces of functions on these intervals by F[a, b] and F(a, b), respectively.

E X A M P L E 6 The Vector Space of Real-Valued Functions
Let V be the set of real-valued functions that are defined at each x in the interval (−∞, ∞). If f = f(x) and g = g(x) are two functions in V and if k is any scalar, then define the operations of addition and scalar multiplication by

(f + g)(x) = f(x) + g(x)   (2)

(kf)(x) = kf(x)   (3)

One way to think about these operations is to view the numbers f(x) and g(x) as "components" of f and g at the point x, in which case Equations 2 and 3 state that two functions are added by adding corresponding components, and a function is multiplied by a scalar by multiplying each component by that scalar—exactly as in R^n and R^∞. This idea is illustrated in parts (a) and (b) of Figure 4.1.1. The set V with these operations is denoted by the symbol F(−∞, ∞). We can prove that this is a vector space as follows:

Axioms 1 and 6 These closure axioms require that sums and scalar multiples of functions that are defined at each x in the interval (−∞, ∞) are also defined at each x in the interval (−∞, ∞). This follows from Formulas 2 and 3.

Axiom 4 This axiom requires that there exists a function 0 in F(−∞, ∞), which when added to any other function f in F(−∞, ∞) produces f back again as the result. The function whose value at every point x in the interval (−∞, ∞) is zero has this property. Geometrically, the graph of the function 0 is the line that coincides with the x-axis.

Axiom 5 This axiom requires that for each function f in F(−∞, ∞) there exists a function −f in F(−∞, ∞), which when added to f produces the function 0. The function defined by (−f)(x) = −f(x) has this property. The graph of −f can be obtained by reflecting the graph of f about the x-axis (Figure 4.1.1c).

Axioms 2, 3, 7, 8, 9, 10 The validity of each of these axioms follows from properties of real numbers. For example, if f and g are functions in F(−∞, ∞), then Axiom 2 requires that f + g = g + f. This follows from the computation

(f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x)

in which the first and last equalities follow from 2, and the middle equality is a property of real numbers. We will leave the proofs of the remaining parts as exercises.

Figure 4.1.1

It is important to recognize that you cannot impose any two operations on any set V and expect the vector space axioms to hold. For example, if V is the set of n-tuples with positive components, and if the standard operations from R^n are used, then V is not closed under scalar multiplication, because if u is a nonzero n-tuple in V, then (−1)u has at least one negative component and hence is not in V. The following is a less obvious example in which only one of the ten vector space axioms fails to hold.

E X A M P L E 7 A Set That Is Not a Vector Space
Let V = R^2 and define addition and scalar multiplication operations as follows: If u = (u1, u2) and v = (v1, v2), then define

u + v = (u1 + v1, u2 + v2)

and if k is any real number, then define

ku = (ku1, 0)

The addition operation is the standard one from R^2, but the scalar multiplication is not. In the exercises we will ask you to show that the first nine vector space axioms are satisfied. However, Axiom 10 fails to hold for certain vectors. For example, if u = (u1, u2) is such that u2 ≠ 0, then

1u = (1·u1, 0) = (u1, 0) ≠ u

Thus, V is not a vector space with the stated operations.
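A minimal sketch of this failure, assuming the operations are as reconstructed above (standard addition together with the nonstandard rule k(u1, u2) = (k·u1, 0)):

```python
# Sketch of Example 7's operations as reconstructed above (an assumption):
# standard addition, but scalar multiplication zeroes the second component.
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(k, u):
    return (k * u[0], 0.0)      # the nonstandard scalar multiplication

u = (3.0, 4.0)                  # any vector with a nonzero second component
print(scale(1, u))              # (3.0, 0.0), which is not u
print(scale(1, u) == u)         # False: Axiom 10 (1u = u) fails
```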

Our final example will be an unusual vector space that we have included to illustrate how varied vector spaces can be. Since the objects in this space will be real numbers, it will be important for you to keep track of which operations are intended as vector operations and which ones as ordinary operations on real numbers.

E X A M P L E 8 An Unusual Vector Space
Let V be the set of positive real numbers, and define the operations on V to be

u + v = uv   [vector addition is numerical multiplication]
ku = u^k   [scalar multiplication is numerical exponentiation]

Thus, for example, the vector sum of 2 and 3 is 6, and the scalar multiple of 2 by the scalar 3 is 8—strange indeed, but nevertheless the set V with these operations satisfies the 10 vector space axioms and hence is a vector space. We will confirm Axioms 4, 5, and 7, and leave the others as exercises.

• Axiom 4—The zero vector in this space is the number 1 (i.e., 0 = 1) since u + 1 = u · 1 = u.
• Axiom 5—The negative of a vector u is its reciprocal (i.e., −u = 1/u) since u + (1/u) = u(1/u) = 1 = 0.
• Axiom 7—k(u + v) = (uv)^k = u^k v^k = (ku) + (kv).
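A small numeric sketch of these operations (vector addition as numerical multiplication, scalar multiplication as exponentiation, as reconstructed above) spot-checks Axioms 4, 5, and 7:

```python
# Example 8's operations on the set of positive real numbers:
# "addition" is ordinary multiplication, "scalar multiplication" is exponentiation.
def vadd(u, v):
    return u * v

def vscale(k, u):
    return u ** k

u, v, k = 2.0, 5.0, 3.0
zero = 1.0                                   # the zero vector is the number 1
print(vadd(u, zero) == u)                    # Axiom 4: u "+" 0 = u
print(vadd(u, 1.0 / u) == zero)              # Axiom 5: u "+" (-u) = 0, with -u = 1/u
print(vscale(k, vadd(u, v)) == vadd(vscale(k, u), vscale(k, v)))  # Axiom 7
```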

Some Properties of Vectors The following is our first theorem about general vector spaces. As you will see, its proof is very formal with each step being justified by a vector space axiom or a known property of real numbers. There will not be many rigidly formal proofs of this type in the text, but we have included these to reinforce the idea that the familiar properties of vectors can all be derived from the vector space axioms.

THEOREM 4.1.1 Let V be a vector space, u a vector in V, and k a scalar; then:
(a) 0u = 0
(b) k0 = 0
(c) (−1)u = −u
(d) If ku = 0, then k = 0 or u = 0.

We will prove parts (a) and (c) and leave proofs of the remaining parts as exercises.

Proof (a) We can write

0u + 0u = (0 + 0)u = 0u

By Axiom 5 the vector 0u has a negative, −0u. Adding this negative to both sides above yields

[0u + 0u] + (−0u) = 0u + (−0u)

or

0u + [0u + (−0u)] = 0u + (−0u), that is, 0u + 0 = 0, so 0u = 0

Proof (c) To prove that (−1)u = −u, we must show that u + (−1)u = 0. The proof is as follows:

u + (−1)u = 1u + (−1)u = (1 + (−1))u = 0u = 0

A Closing Observation This section of the text is very important to the overall plan of linear algebra in that it establishes a common thread between such diverse mathematical objects as geometric vectors, vectors in R^n, infinite sequences, matrices, and real-valued functions, to name a few. As a result, whenever we discover a new theorem about general vector spaces, we will at the same time be discovering a theorem about geometric vectors, vectors in R^n, sequences, matrices, real-valued functions, and about any new kinds of vectors that we might discover. To illustrate this idea, consider what the rather innocent-looking result in part (a) of Theorem 4.1.1 says about the vector space in Example 8. Keeping in mind that the vectors in that space are positive real numbers, that scalar multiplication means numerical exponentiation, and that the zero vector is the number 1, the equation 0u = 0 is a statement of the fact that if u is a positive real number, then u^0 = 1.

Concept Review • Vector space • Closure under addition • Closure under scalar multiplication • Examples of vector spaces

Skills • Determine whether a given set with two operations is a vector space. • Show that a set with two operations is not a vector space by demonstrating that at least one of the vector space axioms fails.

Exercise Set 4.1

1. Let V be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations on and : (a) Compute

and ku for

,

and

.

(b) In words, explain why V is closed under addition and scalar multiplication. (c) Since addition on V is the standard addition operation on known to hold for . Which axioms are they?

, certain vector space axioms hold for V because they are

(d) Show that Axioms 7, 8, and 9 hold. (e) Show that Axiom 10 fails and hence that V is not a vector space under the given operations. Answer: (a) (c) Axioms 1–5 2. Let V be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations on and : (a) Compute

and ku for

(b) Show that

.

(c) Show that

,

, and

.

.

(d) Show that Axiom 5 holds by producing an ordered pair

such that

for

.

(e) Find two vector space axioms that fail to hold. In Exercises 3–12, determine whether each set equipped with the given operations is a vector space. For those that are not vector spaces identify the vector space axioms that fail. 3. The set of all real numbers with the standard operations of addition and multiplication. Answer: The set is a vector space with the given operations. 4. The set of all pairs of real numbers of the form (x, 0) with the standard operations on 5. The set of all pairs of real numbers of the form (x, y), where

.

, with the standard operations on

.

Answer: Not a vector space, Axioms 5 and 6 fail. 6. The set of all n-tuples of real numbers that have the form

with the standard operations on

.

7. The set of all triples of real numbers with the standard vector addition but with scalar multiplication defined by

Answer: Not a vector space. Axiom 8 fails. 8. The set of all

invertible matrices with the standard matrix addition and scalar multiplication.

9. The set of all

matrices of the form

with the standard matrix addition and scalar multiplication. Answer: The set is a vector space with the given operations. 10. The set of all real-valued functions f defined everywhere on the real line and such that Example 6.

with the operations used in

11. The set of all pairs of real numbers of the form (1, x) with the operations

Answer: The set is a vector space with the given operations. 12. The set of polynomials of the form

with the operations

and 13. Verify Axioms 3, 7, 8, and 9 for the vector space given in Example 4. 14. Verify Axioms 1, 2, 3, 7, 8, 9, and 10 for the vector space given in Example 6. 15. With the addition and scalar multiplication operations defined in Example 7, show that

satisfies Axioms 1-9.

16. Verify Axioms 1, 2, 3, 6, 8, 9, and 10 for the vector space given in Example 8. 17. Show that the set of all points in lying on a line is a vector space with respect to the standard operations of vector addition and scalar multiplication if and only if the line passes through the origin. 18. Show that the set of all points in lying in a plane is a vector space with respect to the standard operations of vector addition and scalar multiplication if and only if the plane passes through the origin. In Exercises 19–21, prove that the given set with the stated operations is a vector space. 19. The set

with the operations of addition and scalar multiplication given in Example 1.

20. The set of all infinite sequences of real numbers with the operations of addition and scalar multiplication given in Example 3. 21. The set

of all

matrices with the usual operations of addition and scalar multiplication.

22. Prove part (d) of Theorem 4.1.1. 23. The argument that follows proves that if u, v, and w are vectors in a vector space V such that (the cancellation law for vector addition). As illustrated, justify the steps by filling in the blanks.

24. Let v be any vector in a vector space V. Prove that

, then

.

25. Below is a seven-step proof of part (b) of Theorem 4.1.1. Justify each step either by stating that it is true by hypothesis or by specifying which of the ten vector space axioms applies. Hypothesis: Let u be any vector in a vector space V, let 0 be the zero vector in V, and let k be a scalar. Conclusion: Then

.

Proof:
(1) k0 + ku = k(0 + u)
(2)          = ku
(3) Since ku is in V, −ku is in V.
(4) Therefore, (k0 + ku) + (−ku) = ku + (−ku).

k0 + (ku + (−ku)) = ku + (−ku)

(6)

k0 + 0 = 0

(7)

k0 = 0

26. Let v be any vector in a vector space V. Prove that

.

27. Prove: If u is a vector in a vector space V and k a scalar such that , then either or that if and , then . The result then follows as a logical consequence of this.]

. [Suggestion: Show

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a) A vector is a directed line segment (an arrow). Answer: False (b) A vector is an n-tuple of real numbers. Answer: False (c) A vector is any element of a vector space. Answer: True (d) There is a vector space consisting of exactly two distinct vectors. Answer: False (e) The set of polynomials with degree exactly 1 is a vector space under the operations defined in Exercise 12. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

4.2 Subspaces It is possible for one vector space to be contained within another. In this section we will explore this idea, discuss how to recognize such vector spaces, and give a variety of examples that will be used in our later work. We will begin with some terminology.

DEFINITION 1 A subset W of a vector space V is called a subspace of V if W is itself a vector space under the addition and scalar multiplication defined on V.

In general, to show that a nonempty set W with two operations is a vector space one must verify the ten vector space axioms. However, if W is a subspace of a known vector space V, then certain axioms need not be verified because they are "inherited" from V. For example, it is not necessary to verify that u + v = v + u holds in W because it holds for all vectors in V, including those in W. On the other hand, it is necessary to verify that W is closed under addition and scalar multiplication since it is possible that adding two vectors in W or multiplying a vector in W by a scalar produces a vector in V that is outside of W (Figure 4.2.1).

Figure 4.2.1 The vectors u and v are in W, but the vectors u + v and ku are not

Those axioms that are not inherited by W are Axiom 1—Closure of W under addition Axiom 4—Existence of a zero vector in W Axiom 5—Existence of a negative in W for every vector in W Axiom 6—Closure of W under scalar multiplication so these must be verified to prove that it is a subspace of V. However, the following theorem shows that if Axiom 1 and Axiom 6 hold in W, then Axioms 4 and 5 hold in W as a consequence and hence need not be verified.

THEOREM 4.2.1

If W is a set of one or more vectors in a vector space V, then W is a subspace of V if and only if the following conditions hold.
(a) If u and v are vectors in W, then u + v is in W.

(b) If k is any scalar and u is any vector in W, then ku is in W.

In words, Theorem 4.2.1 states that W is a subspace of V if and only if it is closed under addition and scalar multiplication.

Proof If W is a subspace of V, then all the vector space axioms hold in W, including Axioms 1 and 6, which are precisely conditions (a) and (b). Conversely, assume that conditions (a) and (b) hold. Since these are Axioms 1 and 6, and since Axioms 2, 3, 7, 8, 9, and 10 are inherited from V, we only need to show that Axioms 4 and 5 hold in W. For this purpose, let u be any vector in W. It follows from condition (b) that ku is a vector in W for every scalar k. In particular, 0u = 0 and (−1)u = −u are in W, which shows that Axioms 4 and 5 hold in W.

Note that every vector space has at least two subspaces, itself and its zero subspace.

E X A M P L E 1 The Zero Subspace
If V is any vector space, and if W = {0} is the subset of V that consists of the zero vector only, then W is closed under addition and scalar multiplication since 0 + 0 = 0 and k0 = 0 for any scalar k. We call W the zero subspace of V.

E X A M P L E 2 Lines Through the Origin Are Subspaces of R2 and of R3
If W is a line through the origin of either R^2 or R^3, then adding two vectors on the line W or multiplying a vector on the line W by a scalar produces another vector on the line W, so W is closed under addition and scalar multiplication (see Figure 4.2.2 for an illustration).

Figure 4.2.2

E X A M P L E 3 Planes Through the Origin Are Subspaces of R3
If u and v are vectors in a plane W through the origin of R^3, then it is evident geometrically that u + v and ku lie in the same plane W for any scalar k (Figure 4.2.3). Thus W is closed under addition and scalar multiplication.

Figure 4.2.3 The vectors u + v and ku both lie in the same plane as u and v

Table 1 that follows gives a list of subspaces of R^2 and of R^3 that we have encountered thus far. We will see later that these are the only subspaces of R^2 and of R^3.

Table 1

Subspaces of R^2:
• {0}
• Lines through the origin
• R^2

Subspaces of R^3:
• {0}
• Lines through the origin
• Planes through the origin
• R^3

E X A M P L E 4 A Subset of R2 That Is Not a Subspace

Let W be the set of all points (x, y) in R^2 for which x ≥ 0 and y ≥ 0 (the shaded region in Figure 4.2.4). This set is not a subspace of R^2 because it is not closed under scalar multiplication. For example, if v is any nonzero vector in W, then (−1)v has a negative component and hence is not in W.

Figure 4.2.4 W is not closed under scalar multiplication

E X A M P L E 5 Subspaces of Mnn
We know from Theorem 1.7.2 that the sum of two symmetric matrices is symmetric and that a scalar multiple of a symmetric matrix is symmetric. Thus, the set of symmetric n × n matrices is closed under addition and scalar multiplication and hence is a subspace of M_nn. Similarly, the sets of upper triangular matrices, lower triangular matrices, and diagonal matrices are subspaces of M_nn.
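A quick NumPy check of the closure claims in Example 5, using arbitrary symmetric matrices as placeholders: the sum and any scalar multiple of symmetric matrices are again symmetric.

```python
import numpy as np

# Two arbitrary symmetric matrices (placeholders) and a scalar.
A = np.array([[1.0, 2.0], [2.0, 5.0]])
B = np.array([[0.0, -3.0], [-3.0, 4.0]])
k = 2.5

def is_symmetric(M):
    return np.array_equal(M, M.T)

print(is_symmetric(A + B))   # closed under addition
print(is_symmetric(k * A))   # closed under scalar multiplication
```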

E X A M P L E 6 A Subset of Mnn That Is Not a Subspace
The set W of invertible n × n matrices is not a subspace of M_nn, failing on two counts—it is not closed under addition and not closed under scalar multiplication. We will illustrate this with an example in M_22 that you can readily adapt to M_nn. Consider two particular invertible 2 × 2 matrices U and V chosen so that their sum has a column of zeros. The matrix 0U is the 2 × 2 zero matrix and hence is not invertible, and the matrix U + V has a column of zeros, so it also is not invertible.

CALCULUS REQUIRED

E X A M P L E 7 The Subspace C(−∞, ∞)
There is a theorem in calculus which states that a sum of continuous functions is continuous and that a constant times a continuous function is continuous. Rephrased in vector language, the set of continuous functions on (−∞, ∞) is a subspace of F(−∞, ∞). We will denote this subspace by C(−∞, ∞).

CALCULUS REQUIRED

E X A M P L E 8 Functions with Continuous Derivatives A function with a continuous derivative is said to be continuously differentiable. There is a theorem in calculus which states that the sum of two continuously differentiable functions is continuously differentiable and that a constant times a continuously differentiable function is continuously differentiable. Thus, the functions that are continuously differentiable on (−∞, ∞) form a subspace of C(−∞, ∞). We will denote this subspace by C¹(−∞, ∞), where the superscript 1 emphasizes that the first derivative is continuous. To take this a step further, the set of functions with m continuous derivatives on (−∞, ∞) is a subspace of C(−∞, ∞), as is the set of functions with derivatives of all orders on (−∞, ∞). We will denote these subspaces by Cᵐ(−∞, ∞) and C^∞(−∞, ∞), respectively.

E X A M P L E 9 The Subspace of All Polynomials Recall that a polynomial is a function that can be expressed in the form

p(x) = a₀ + a₁x + ⋯ + aₙxⁿ   (1)

where a₀, a₁, …, aₙ are constants. It is evident that the sum of two polynomials is a polynomial and that a constant times a polynomial is a polynomial. Thus, the set W of all polynomials is closed under addition and scalar multiplication and hence is a subspace of F(−∞, ∞). We will denote this space by P∞.

E X A M P L E 1 0 The Subspace of Polynomials of Degree ≤ n Recall that the degree of a polynomial is the highest power of the variable that occurs with a nonzero coefficient. Thus, for example, if aₙ ≠ 0 in Formula 1, then that polynomial has degree n. It is not true that the set W of polynomials of a fixed positive degree n is a subspace of F(−∞, ∞), because that set is not closed under addition. For example, two polynomials may both have degree 2 and yet have a sum of degree 1. What is true, however, is that for each nonnegative integer n the polynomials of degree n or less form a subspace of F(−∞, ∞). We will denote this space by Pₙ.

In this text we regard all constants as polynomials of degree zero. Be aware, however, that some authors do not assign a degree to the constant 0.

The Hierarchy of Function Spaces It is proved in calculus that polynomials are continuous functions and have continuous derivatives of all orders on (−∞, ∞). Thus, it follows that P∞ is not only a subspace of F(−∞, ∞), as previously observed, but is also a subspace of C^∞(−∞, ∞). We leave it for you to convince yourself that the vector spaces discussed in Example 7 to Example 10 are “nested” one inside the other as illustrated in Figure 4.2.5.

Figure 4.2.5

Remark In our previous examples, and as illustrated in Figure 4.2.5, we have only considered functions that are defined at all points of the interval (−∞, ∞). Sometimes we will want to consider functions that are only defined on some subinterval of (−∞, ∞), say the closed interval [a, b] or the open interval (a, b). In such cases we will make an appropriate notation change. For example, C[a, b] is the space of continuous functions on [a, b] and C(a, b) is the space of continuous functions on (a, b).

Building Subspaces The following theorem provides a useful way of creating a new subspace from known subspaces.

THEOREM 4.2.2 If W₁, W₂, …, Wᵣ are subspaces of a vector space V, then the intersection of these subspaces is also a subspace of V.

Proof Let W be the intersection of the subspaces W₁, W₂, …, Wᵣ. This set is not empty because each of these subspaces contains the zero vector of V, and hence so does their intersection. Thus, it remains to show that W is closed under addition and scalar multiplication. To prove closure under addition, let u and v be vectors in W. Since W is the intersection of W₁, W₂, …, Wᵣ, it follows that u and v also lie in each of these subspaces. Since these subspaces are all closed under addition, they all contain the vector u + v and hence so does their intersection W. This proves that W is closed under addition. We leave the proof that W is closed under scalar multiplication to you.

Note that the first step in proving Theorem 4.2.2 was to establish that W contained at least one vector. This is important, for otherwise the subsequent argument might be logically correct but meaningless.

Sometimes we will want to find the “smallest” subspace of a vector space V that contains all of the vectors in some set of interest. The following definition, which generalizes Definition 4 of Section 3.1, will help us to do that.

DEFINITION 2 If w is a vector in a vector space V, then w is said to be a linear combination of the vectors v₁, v₂, …, vᵣ in V if w can be expressed in the form

w = k₁v₁ + k₂v₂ + ⋯ + kᵣvᵣ   (2)

where k₁, k₂, …, kᵣ are scalars. These scalars are called the coefficients of the linear combination.

If r = 1, then Equation 2 has the form w = k₁v₁, in which case the linear combination is just a scalar multiple of v₁.

THEOREM 4.2.3 If S = {w₁, w₂, …, wᵣ} is a nonempty set of vectors in a vector space V, then:

(a) The set W of all possible linear combinations of the vectors in S is a subspace of V.

(b) The set W in part (a) is the “smallest” subspace of V that contains all of the vectors in S in the sense that any other subspace that contains those vectors contains W.

Proof (a) Let W be the set of all possible linear combinations of the vectors in S. We must show that W is closed under addition and scalar multiplication. To prove closure under addition, let

u = c₁w₁ + c₂w₂ + ⋯ + cᵣwᵣ  and  v = k₁w₁ + k₂w₂ + ⋯ + kᵣwᵣ

be two vectors in W. It follows that their sum can be written as

u + v = (c₁ + k₁)w₁ + (c₂ + k₂)w₂ + ⋯ + (cᵣ + kᵣ)wᵣ

which is a linear combination of the vectors in S. Thus, W is closed under addition. We leave it for you to prove that W is also closed under scalar multiplication and hence is a subspace of V.

Proof (b) Let W′ be any subspace of V that contains all of the vectors in S. Since W′ is closed under addition and scalar multiplication, it contains all linear combinations of the vectors in S and hence contains W.

The following definition gives some important notation and terminology related to Theorem 4.2.3.

DEFINITION 3 The subspace of a vector space V that is formed from all possible linear combinations of the vectors in a nonempty set S is called the span of S, and we say that the vectors in S span that subspace. If S = {w₁, w₂, …, wᵣ}, then we denote the span of S by span{w₁, w₂, …, wᵣ} or span(S).

E X A M P L E 11 The Standard Unit Vectors Span Rn Recall that the standard unit vectors in Rⁿ are

e₁ = (1, 0, 0, …, 0), e₂ = (0, 1, 0, …, 0), …, eₙ = (0, 0, 0, …, 1)

These vectors span Rⁿ since every vector v = (v₁, v₂, …, vₙ) in Rⁿ can be expressed as

v = v₁e₁ + v₂e₂ + ⋯ + vₙeₙ

which is a linear combination of e₁, e₂, …, eₙ. Thus, for example, the vectors i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1) span R³ since every vector v = (a, b, c) in this space can be expressed as

v = (a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) = ai + bj + ck

E X A M P L E 1 2 A Geometric View of Spanning in R2 and R3 (a) If v is a nonzero vector in R² or R³ that has its initial point at the origin, then span{v}, which is the set of all scalar multiples of v, is the line through the origin determined by v. You should be able to visualize this from Figure 4.2.6a by observing that the tip of the vector kv can be made to fall at any point on the line by choosing the value of k appropriately.

George William Hill (1838-1914) Historical Note The terms linearly independent and linearly dependent were introduced by Maxime Bôcher (see p. 7) in his book Introduction to Higher Algebra, published in 1907. The term linear combination is due to the American mathematician G. W. Hill, who introduced it in a research paper on planetary motion published in 1900. Hill was a “loner” who preferred to work out of his home in West Nyack, New York, rather than in academia, though he did try lecturing at Columbia University for a few years. Interestingly, he apparently returned the teaching salary, indicating that he did not need the money and did not want to be bothered looking after it. Although technically a mathematician, Hill had little interest in modern developments of mathematics and worked almost entirely on the theory of planetary orbits. [Image: Courtesy of the American Mathematical Society]

(b) If v₁ and v₂ are nonzero vectors in R³ that have their initial points at the origin, then span{v₁, v₂}, which consists of all linear combinations of v₁ and v₂, is the plane through the origin determined by these two vectors. You should be able to visualize this from Figure 4.2.6b by observing that the tip of the vector k₁v₁ + k₂v₂ can be made to fall at any point in the plane by adjusting the scalars k₁ and k₂ to lengthen, shorten, or reverse the directions of the vectors k₁v₁ and k₂v₂ appropriately.

Figure 4.2.6

E X A M P L E 1 3 A Spanning Set for Pn The polynomials 1, x, x², …, xⁿ span the vector space Pₙ defined in Example 10 since each polynomial p in Pₙ can be written as

p = a₀ + a₁x + ⋯ + aₙxⁿ

which is a linear combination of 1, x, x², …, xⁿ. We can denote this by writing

Pₙ = span{1, x, x², …, xⁿ}

The next two examples are concerned with two important types of problems:

• Given a set S of vectors in Rⁿ and a vector v in Rⁿ, determine whether v is a linear combination of the vectors in S.

• Given a set S of vectors in Rⁿ, determine whether the vectors span Rⁿ.

E X A M P L E 1 4 Linear Combinations Consider the vectors u and v in R³. Show that a given vector w is a linear combination of u and v and that a given vector w′ is not a linear combination of u and v.

Solution In order for w to be a linear combination of u and v, there must be scalars k₁ and k₂ such that w = k₁u + k₂v. Equating corresponding components gives a linear system in k₁ and k₂. Solving this system using Gaussian elimination yields values for k₁ and k₂, so w = k₁u + k₂v.

Similarly, for w′ to be a linear combination of u and v, there must be scalars k₁ and k₂ such that w′ = k₁u + k₂v. Equating corresponding components again gives a linear system, but this system of equations is inconsistent (verify), so no such scalars k₁ and k₂ exist. Consequently, w′ is not a linear combination of u and v.
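The vectors in this example appear as images in the original, so the following is a minimal computational sketch of the same test using hypothetical vectors u, v, and w (not the textbook's data). It assembles the matrix whose columns are u and v and checks whether the system has an exact solution.

    import numpy as np

    # Hypothetical vectors in R^3, used only to illustrate the method of Example 14.
    u = np.array([1.0, 2.0, -1.0])
    v = np.array([6.0, 4.0, 2.0])
    w = np.array([9.0, 2.0, 7.0])

    # Writing w = k1*u + k2*v componentwise gives the linear system A @ k = w,
    # where the columns of A are u and v.
    A = np.column_stack([u, v])

    # Solve in the least-squares sense; a zero residual means w lies in span{u, v}.
    k, *_ = np.linalg.lstsq(A, w, rcond=None)
    residual = np.linalg.norm(A @ k - w)

    if residual < 1e-10:
        print(f"w = {k[0]:g} u + {k[1]:g} v")
    else:
        print("w is not a linear combination of u and v")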

E X A M P L E 1 5 Testing for Spanning Determine whether the vectors v₁, v₂, and v₃ span the vector space R³.

Solution We must determine whether an arbitrary vector b = (b₁, b₂, b₃) in R³ can be expressed as a linear combination

b = k₁v₁ + k₂v₂ + k₃v₃

of the vectors v₁, v₂, and v₃. Expressing this equation in terms of components and equating corresponding components gives a linear system in the unknowns k₁, k₂, and k₃. Thus, our problem reduces to ascertaining whether this system is consistent for all values of b₁, b₂, and b₃. One way of doing this is to use parts (e) and (g) of Theorem 2.3.8, which state that the system is consistent for every choice of b₁, b₂, and b₃ if and only if its coefficient matrix A has a nonzero determinant. But this is not the case here; we leave it for you to confirm that det(A) = 0, so v₁, v₂, and v₃ do not span R³.
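Since the specific vectors here are typeset as images, the sketch below illustrates the determinant test of Example 15 with hypothetical vectors; it is not the textbook's data, only the method.

    import numpy as np

    # Hypothetical vectors in R^3, chosen only to illustrate the determinant test.
    v1 = np.array([1.0, 1.0, 2.0])
    v2 = np.array([1.0, 0.0, 1.0])
    v3 = np.array([2.0, 1.0, 3.0])

    # The system k1*v1 + k2*v2 + k3*v3 = b has the coefficient matrix whose columns
    # are v1, v2, v3; it is consistent for every b (the vectors span R^3) exactly
    # when that matrix has a nonzero determinant.
    A = np.column_stack([v1, v2, v3])
    d = np.linalg.det(A)
    print("det(A) =", d)
    print("the vectors span R^3" if abs(d) > 1e-10 else "the vectors do not span R^3")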

Solution Spaces of Homogeneous Systems The solutions of a homogeneous linear system Ax = 0 of m equations in n unknowns can be viewed as vectors in Rⁿ. The following theorem provides a useful insight into the geometric structure of the solution set.

THEOREM 4.2.4 The solution set of a homogeneous linear system Ax = 0 in n unknowns is a subspace of Rⁿ.

Proof Let W be the solution set for the system. The set W is not empty because it contains at least the trivial solution x = 0. To show that W is a subspace of Rⁿ, we must show that it is closed under addition and scalar multiplication. To do this, let x₁ and x₂ be vectors in W. Since these vectors are solutions of Ax = 0, we have

Ax₁ = 0  and  Ax₂ = 0

It follows from these equations and the distributive property of matrix multiplication that

A(x₁ + x₂) = Ax₁ + Ax₂ = 0 + 0 = 0

so W is closed under addition. Similarly, if k is any scalar, then

A(kx₁) = kAx₁ = k0 = 0

so W is also closed under scalar multiplication.

Because the solution set of a homogeneous system in n unknowns is actually a subspace of Rⁿ, we will generally refer to it as the solution space of the system.

E X A M P L E 1 6 Solution Spaces of Homogeneous Systems Consider the linear systems (a)

(b)

(c)

(d)

Solution (a) We leave it for you to verify that the solutions satisfy a single linear equation in x, y, and z. This is the equation of a plane through the origin that has the vector of coefficients of that equation as a normal.

(b) We leave it for you to verify that the solutions can be written as parametric equations for the line through the origin that is parallel to a fixed vector.

(c) We leave it for you to verify that the only solution is x = 0, y = 0, z = 0, so the solution space is {0}.

(d) This linear system is satisfied by all real values of x, y, and z, so the solution space is all of R³.
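For readers who want to experiment, here is a short SymPy sketch (with a hypothetical coefficient matrix, since the systems above are shown as images) that produces a basis for the solution space of a homogeneous system; the number of basis vectors is the dimension of that space.

    from sympy import Matrix

    # Hypothetical homogeneous system A x = 0; nullspace() returns a basis
    # for the solution space of the system.
    A = Matrix([[1, -2, 3],
                [2, -4, 6]])        # the second equation is a multiple of the first

    basis = A.nullspace()           # list of column vectors spanning the solution space
    print("dimension of the solution space:", len(basis))
    for v in basis:
        print(v.T)                  # each basis vector, printed as a row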

Remark Whereas the solution set of every homogeneous system of m equations in n unknowns is a subspace of Rⁿ, it is never true that the solution set of a nonhomogeneous system of m equations in n unknowns is a subspace of Rⁿ. There are two possible scenarios: first, the system may not have any solutions at all, and second, if there are solutions, then the solution set will not be closed under either addition or scalar multiplication (Exercise 18).

A Concluding Observation It is important to recognize that spanning sets are not unique. For example, any nonzero vector on the line in Figure 4.2.6a will span that line, and any two noncollinear vectors in the plane in Figure 4.2.6b will span that plane. The following theorem, whose proof we leave as an exercise, states conditions under which two sets of vectors will span the same space.

THEOREM 4.2.5 If S = {v₁, v₂, …, vᵣ} and S′ = {w₁, w₂, …, wₖ} are nonempty sets of vectors in a vector space V, then

span{v₁, v₂, …, vᵣ} = span{w₁, w₂, …, wₖ}

if and only if each vector in S is a linear combination of those in S′, and each vector in S′ is a linear combination of those in S.

Concept Review • Subspace

• Zero subspace • Examples of subspaces • Linear combination • Span • Solution space

Skills
• Determine whether a subset of a vector space is a subspace.
• Show that a subset of a vector space is a subspace.
• Show that a nonempty subset of a vector space is not a subspace by demonstrating that the set is either not closed under addition or not closed under scalar multiplication.
• Given a set S of vectors in Rⁿ and a vector v in Rⁿ, determine whether v is a linear combination of the vectors in S.
• Given a set S of vectors in Rⁿ, determine whether the vectors in S span Rⁿ.
• Determine whether two nonempty sets of vectors in a vector space V span the same subspace of V.

Exercise Set 4.2
1. Use Theorem 4.2.1 to determine which of the following are subspaces of R³.

(a) All vectors of the form (a, 0, 0). (b) All vectors of the form (a, 1, 1). (c) All vectors of the form (a, b, c), where

.

(d) All vectors of the form (a, b, c), where

.

(e) All vectors of the form (a, b, 0). Answer: (a), (c), (e) 2. Use Theorem 4.2.1 to determine which of the following are subspaces of (a) The set of all diagonal

matrices.

(b) The set of all

matrices A such that

(c) The set of all

matrices A such that

(d) The set of all symmetric

. .

matrices.

(e) The set of all

matrices A such that

(f) The set of all

matrices A for which

(g) The set of all

matrices A such that

. has only the trivial solution. for some fixed

3. Use Theorem 4.2.1 to determine which of the following are subspaces of (a) All polynomials

.

for which

.

matrix B. .

(b) All polynomials

for which

.

(c) All polynomials of the form

in which

(d) All polynomials of the form

, where

and

,

,

, and

are integers.

are real numbers.

Answer: (a), (b), (d) 4. Which of the following are subspaces of

?

(a) All functions f in

for which

.

(b) All functions f in

for which

.

(c) All functions f in

for which

.

(d) All polynomials of degree 2. 5. Which of the following are subspaces of

?

(a) All sequences v in

of the form

.

(b) All sequences v in

of the form

.

(c) All sequences v in

of the form

(d) All sequences in

.

whose components are 0 from some point on.

Answer: (a), (c), (d) 6. A line L through the origin in can be represented by parametric equations of the form , , and . Use these equations to show that L is a subspace of by showing that if and are points on L and k is any real number, then k and are also points on L. 7. Which of the following are linear combinations of

and

?

(a) (2,2,2) (b) (3,1,5) (c) (0, 4, 5) (d) (0, 0, 0) Answer: (a), (b), (d) 8. Express the following as linear combinations of (a) (b) (6,11,6) (c) (0,0,0) (d) (7,8,9) 9. Which of the following are linear combinations of

,

, and

.

(a) (b) (c) (d)

Answer: (a), (b), (c) 10. In each part express the vector as a linear combination of .

,

, and

(a) (b) (c) 0 (d) 11. In each part, determine whether the given vectors span (a)

,

(b)

, ,

(c)

,

(d)

,

.

, , ,

, ,

Answer: (a) The vectors span (b) The vectors do not span (c) The vectors do not span (d) The vectors span 12. Suppose that vectors are in

,

, and ?

(a) (b) (0, 0, 0, 0) (c) (1,1, 1, 1) (d) 13. Determine whether the following polynomials span

.

. Which of the following

Answer: The polynomials do not span 14. Let

and

. Which of the following lie in the space spanned by f and g?

(a) (b) (c) 1 (d) (e) 0 15. Determine whether the solution space of the system is a line through the origin, a plane through the origin, or the origin only. If it is a plane, find an equation for it. If it is a line, find parametric equations for it. (a)

(b)

(c)

(d)

(e)

(f)

Answer: (a) Line; (b) Line; (c) Origin (d) Origin

(e) Line; (f) Plane; 16. (Calculus required) Show that the following sets of functions are subspaces of (a) All continuous functions on

.

.

(b) All differentiable functions on

.

(c) All differentiable functions on

that satisfy

.

17. (Calculus required) Show that the set of continuous functions

on [a, b] such that

is a subspace of C[a, b].
18. Show that the solution vectors of a consistent nonhomogeneous system of m linear equations in n unknowns do not form a subspace of Rⁿ.
19. Prove Theorem 4.2.5.
20. Use Theorem 4.2.5 to show that the vectors v₁, v₂ and the vectors w₁, w₂ span the same subspace of R³.

True-False Exercises In parts (a)–(k) determine whether the statement is true or false, and justify your answer. (a) Every subspace of a vector space is itself a vector space. Answer: True (b) Every vector space is a subspace of itself. Answer: True (c) Every subset of a vector space V that contains the zero vector in V is a subspace of V. Answer: False (d) The set

is a subspace of

.

Answer: False
(e) The solution set of a consistent linear system Ax = b of m equations in n unknowns is a subspace of Rⁿ.
Answer:

False (f) The span of any finite set of vectors in a vector space is closed under addition and scalar multiplication. Answer: True (g) The intersection of any two subspaces of a vector space V is a subspace of V. Answer: True (h) The union of any two subspaces of a vector space V is a subspace of V. Answer: False (i) Two subsets of a vector space V that span the same subspace of V must be equal. Answer: False (j) The set of upper triangular

matrices is a subspace of the vector space of all

Answer: True (k) The polynomials

,

, and

Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

span

.

matrices.

4.3 Linear Independence In this section we will consider the question of whether the vectors in a given set are interrelated in the sense that one or more of them can be expressed as a linear combination of the others. This is important to know in applications because the existence of such relationships often signals that some kind of complication is likely to occur.

Extraneous Vectors In a rectangular xy-coordinate system every vector in the plane can be expressed in exactly one way as a linear combination of the standard unit vectors. For example, the only way to express the vector (3, 2) as a linear combination of i = (1, 0) and j = (0, 1) is

(3, 2) = 3i + 2j   (1)

(Figure 4.3.1). Suppose, however, that we were to introduce a third coordinate axis that makes an angle of 45° with the x-axis. Call it the w-axis. As illustrated in Figure 4.3.2, the unit vector along the w-axis is

w = (√2/2, √2/2)

Whereas Formula 1 shows the only way to express the vector (3, 2) as a linear combination of i and j, there are infinitely many ways to express this vector as a linear combination of i, j, and w. Three possibilities are

In short, by introducing a superfluous axis we created the complication of having multiple ways of assigning coordinates to points in the plane. What makes the vector w superfluous is the fact that it can be expressed as a linear combination of the vectors i and j, namely,

w = (√2/2)i + (√2/2)j

Thus, one of our main tasks in this section will be to develop ways of ascertaining whether one vector in a set S is a linear combination of other vectors in S.

Figure 4.3.1

Figure 4.3.2

Linear Independence and Dependence We will often apply the terms linearly independent and linearly dependent to the vectors themselves rather than to the set.

DEFINITION 1 If S = {v₁, v₂, …, vᵣ} is a nonempty set of vectors in a vector space V, then the vector equation

k₁v₁ + k₂v₂ + ⋯ + kᵣvᵣ = 0

has at least one solution, namely, k₁ = 0, k₂ = 0, …, kᵣ = 0. We call this the trivial solution. If this is the only solution, then S is said to be a linearly independent set. If there are solutions in addition to the trivial solution, then S is said to be a linearly dependent set.

E X A M P L E 1 Linear Independence of the Standard Unit Vectors in Rn The most basic linearly independent set in Rⁿ is the set of standard unit vectors

e₁ = (1, 0, 0, …, 0), e₂ = (0, 1, 0, …, 0), …, eₙ = (0, 0, 0, …, 1)

For notational simplicity, we will prove the linear independence in R³ of

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

The linear independence or linear dependence of these vectors is determined by whether there exist nontrivial solutions of the vector equation

k₁i + k₂j + k₃k = 0   (2)

Since the component form of this equation is (k₁, k₂, k₃) = (0, 0, 0), it follows that k₁ = 0, k₂ = 0, k₃ = 0. This implies that 2 has only the trivial solution and hence that the vectors are linearly independent.

E X A M P L E 2 Linear Independence in R3 Determine whether the vectors v₁, v₂, and v₃ are linearly independent or linearly dependent in R³.

Solution The linear independence or linear dependence of these vectors is determined by whether there exist nontrivial solutions of the vector equation

k₁v₁ + k₂v₂ + k₃v₃ = 0   (3)

Equating corresponding components on the two sides yields the homogeneous linear system (4) in the unknowns k₁, k₂, and k₃. Thus, our problem reduces to determining whether this system has nontrivial solutions. There are various ways to do this; one possibility is to simply solve the system, which yields nontrivial solutions (we omit the details). This shows that the system has nontrivial solutions and hence that the vectors are linearly dependent. A second method for obtaining the same result is to compute the determinant of the coefficient matrix A and use parts (b) and (g) of Theorem 2.3.8. We leave it for you to verify that det(A) = 0, from which it follows that 3 has nontrivial solutions and the vectors are linearly dependent.

In Example 2, what relationship do you see between the components of v₁, v₂, and v₃ and the columns of the coefficient matrix A?
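As a computational aside (with hypothetical vectors, since the textbook's are shown as images), the rank test below carries out the same check as Example 2: the vectors are linearly independent exactly when the matrix having them as columns has rank equal to the number of vectors.

    import numpy as np

    # Hypothetical vectors in R^3, used only to illustrate the method.
    v1 = np.array([1.0, -2.0, 3.0])
    v2 = np.array([5.0, 6.0, -1.0])
    v3 = np.array([3.0, 2.0, 1.0])

    # The homogeneous system k1*v1 + k2*v2 + k3*v3 = 0 has only the trivial
    # solution exactly when the matrix with the vectors as columns has full
    # column rank.
    A = np.column_stack([v1, v2, v3])
    rank = np.linalg.matrix_rank(A)
    print("linearly independent" if rank == A.shape[1] else "linearly dependent")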

E X A M P L E 3 Linear Independence in R4 Determine whether the vectors v₁, v₂, and v₃ in R⁴ are linearly dependent or linearly independent.

Solution The linear independence or linear dependence of these vectors is determined by whether there exist nontrivial solutions of the vector equation

k₁v₁ + k₂v₂ + k₃v₃ = 0

Equating corresponding components on the two sides yields a homogeneous linear system in k₁, k₂, and k₃. We leave it for you to show that this system has only the trivial solution

k₁ = 0, k₂ = 0, k₃ = 0

from which you can conclude that v₁, v₂, and v₃ are linearly independent.

E X A M P L E 4 An Important Linearly Independent Set in Pn Show that the polynomials

1, x, x², …, xⁿ

form a linearly independent set in Pₙ.

Solution For convenience, let us denote the polynomials as

p₀ = 1, p₁ = x, p₂ = x², …, pₙ = xⁿ

We must show that the vector equation

a₀p₀ + a₁p₁ + a₂p₂ + ⋯ + aₙpₙ = 0   (5)

has only the trivial solution

a₀ = a₁ = a₂ = ⋯ = aₙ = 0

But 5 is equivalent to the statement that

a₀ + a₁x + a₂x² + ⋯ + aₙxⁿ = 0   (6)

for all x in (−∞, ∞), so we must show that this holds if and only if each coefficient in 6 is zero. To see that this is so, recall from algebra that a nonzero polynomial of degree n has at most n distinct roots. That being the case, each coefficient in 6 must be zero, for otherwise the left side of the equation would be a nonzero polynomial with infinitely many roots. Thus, 5 has only the trivial solution.

The following example shows that the problem of determining whether a given set of vectors in Pₙ is linearly independent or linearly dependent can be reduced to determining whether a certain set of vectors in Rⁿ⁺¹ is linearly dependent or independent.

E X A M P L E 5 Linear Independence of Polynomials Determine whether the polynomials p₁, p₂, and p₃ are linearly dependent or linearly independent in P₂.

Solution The linear independence or linear dependence of these vectors is determined by whether there exist nontrivial solutions of the vector equation

k₁p₁ + k₂p₂ + k₃p₃ = 0   (7)

Writing this equation out in terms of the given polynomials (Equation 8) and collecting like powers of x yields a polynomial identity in x whose coefficients involve k₁, k₂, and k₃. Since this equation must be satisfied by all x in (−∞, ∞), each coefficient must be zero (as explained in the previous example). Thus, the linear dependence or independence of the given polynomials hinges on whether the following linear system has a nontrivial solution:

(9)

We leave it for you to show that this linear system has nontrivial solutions, either by solving it directly or by showing that the coefficient matrix has determinant zero. Thus, the set is linearly dependent.

In Example 5, what relationship do you see between the coefficients of the given polynomials and the column vectors of the coefficient matrix of system 9?
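The margin question above points at the key computational fact: the coefficient vectors of the polynomials become the columns of the coefficient matrix of system 9. The sketch below, with hypothetical polynomials (the textbook's are shown as images), builds that matrix and applies the determinant test.

    from sympy import symbols, Matrix

    x = symbols('x')

    # Hypothetical polynomials in P2, used only to illustrate the method.
    p1 = 1 - x
    p2 = 5 + 3*x - 2*x**2
    p3 = 1 + 3*x - x**2

    # Column j holds the coefficients (constant, x, x^2) of the j-th polynomial;
    # the polynomials are linearly independent in P2 exactly when these
    # coefficient vectors are linearly independent in R^3.
    A = Matrix([[p.coeff(x, k) for p in (p1, p2, p3)] for k in range(3)])

    print("det =", A.det())          # zero determinant means nontrivial solutions
    print("independent" if A.rank() == 3 else "dependent")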

An Alternative Interpretation of Linear Independence The terms linearly dependent and linearly independent are intended to indicate whether the vectors in a given set are interrelated in some way. The following theorem, whose proof is deferred to the end of this section, makes this idea more precise.

THEOREM 4.3.1 A set S with two or more vectors is (a) Linearly dependent if and only if at least one of the vectors in S is expressible as a linear combination of the other vectors in S. (b) Linearly independent if and only if no vector in S is expressible as a linear combination of the other vectors in S.

E X A M P L E 6 Example 1 Revisited In Example 1 we showed that the standard unit vectors in R³ are linearly independent. Thus, it follows from Theorem 4.3.1 that none of these vectors is expressible as a linear combination of the other two. To illustrate this in R³, suppose, for example, that

k = k₁i + k₂j

or, in terms of components, that

(0, 0, 1) = (k₁, k₂, 0)

Since this equation cannot be satisfied by any values of k₁ and k₂, there is no way to express k as a linear combination of i and j. Similarly, i is not expressible as a linear combination of j and k, and j is not expressible as a linear combination of i and k.

E X A M P L E 7 Example 2 Revisited In Example 2 we saw that the vectors v₁, v₂, and v₃ are linearly dependent. Thus, it follows from Theorem 4.3.1 that at least one of these vectors is expressible as a linear combination of the other two. We leave it for you to confirm that these vectors satisfy a dependence equation of the form

k₁v₁ + k₂v₂ + k₃v₃ = 0

with coefficients that are not all zero, from which it follows, for example, that one of the vectors can be solved for as a linear combination of the other two.

Sets with One or Two Vectors The following basic theorem is concerned with the linear independence and linear dependence of sets with one or two vectors and sets that contain the zero vector.

THEOREM 4.3.2 (a) A finite set that contains 0 is linearly dependent. (b) A set with exactly one vector is linearly independent if and only if that vector is not 0. (c) A set with exactly two vectors is linearly independent if and only if neither vector is a scalar multiple of the other.

Józef Hoëné de Wroński (1778–1853) Historical Note The Polish-French mathematician Józef Hoëné de Wroński was born Józef Hoëné and adopted the name Wroński after he married. Wroński's life was fraught with controversy and conflict, which some say was due to his psychopathic tendencies and his exaggeration of the importance of his own work. Although Wroński's work was dismissed as rubbish for many years, and much of it was indeed erroneous, some of his ideas contained hidden brilliance and have survived. Among other things, Wroński designed a caterpillar vehicle to compete with trains (though it was never manufactured) and did research on the famous problem of determining the longitude of a ship at sea. His final years were spent in poverty. [Image: wikipedia]

We will prove part (a) and leave the rest as exercises.

Proof (a) For any vectors v₁, v₂, …, vᵣ, the set S = {v₁, v₂, …, vᵣ, 0} is linearly dependent since the equation

0v₁ + 0v₂ + ⋯ + 0vᵣ + 1(0) = 0

expresses 0 as a linear combination of the vectors in S with coefficients that are not all zero.

E X A M P L E 8 Linear Independence of Two Functions The functions f₁ = x and f₂ = sin x are linearly independent vectors in F(−∞, ∞) since neither function is a scalar multiple of the other. On the other hand, the two functions g₁ = sin 2x and g₂ = sin x cos x are linearly dependent because the trigonometric identity sin 2x = 2 sin x cos x reveals that g₁ and g₂ are scalar multiples of each other.

A Geometric Interpretation of Linear Independence Linear independence has the following useful geometric interpretations in R² and R³:

• Two vectors in R² or R³ are linearly independent if and only if they do not lie on the same line when they have their initial points at the origin. Otherwise one would be a scalar multiple of the other (Figure 4.3.3).

Figure 4.3.3

• Three vectors in R³ are linearly independent if and only if they do not lie in the same plane when they have their initial points at the origin. Otherwise at least one would be a linear combination of the other two (Figure 4.3.4).

Figure 4.3.4

At the beginning of this section we observed that a third coordinate axis in R² is superfluous by showing that a unit vector along such an axis would have to be expressible as a linear combination of unit vectors along the positive x- and y-axes. That result is a consequence of the next theorem, which shows that there can be at most n vectors in any linearly independent set in Rⁿ. It follows from Theorem 4.3.3, for example, that a set in R² with more than two vectors is linearly dependent and a set in R³ with more than three vectors is linearly dependent.

THEOREM 4.3.3 Let S = {v₁, v₂, …, vᵣ} be a set of vectors in Rⁿ. If r > n, then S is linearly dependent.

Proof Suppose that

v₁ = (v₁₁, v₁₂, …, v₁ₙ), v₂ = (v₂₁, v₂₂, …, v₂ₙ), …, vᵣ = (vᵣ₁, vᵣ₂, …, vᵣₙ)

and consider the equation

k₁v₁ + k₂v₂ + ⋯ + kᵣvᵣ = 0

If we express both sides of this equation in terms of components and then equate the corresponding components, we obtain a homogeneous system of n equations in the r unknowns k₁, k₂, …, kᵣ. Since r > n, it follows from Theorem 1.2.2 that the system has nontrivial solutions. Therefore, S = {v₁, v₂, …, vᵣ} is a linearly dependent set.

CALCULUS REQUIRED

Linear Independence of Functions Sometimes linear dependence of functions can be deduced from known identities. For example, the functions f₁ = sin²x, f₂ = cos²x, and f₃ = 5 form a linearly dependent set in F(−∞, ∞), since the equation

5f₁ + 5f₂ − f₃ = 5 sin²x + 5 cos²x − 5 = 5(sin²x + cos²x) − 5 = 0

expresses 0 as a linear combination of f₁, f₂, and f₃ with coefficients that are not all zero.
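A quick symbolic check of an identity-based dependence of this kind (using the functions assumed above, which are a reconstruction rather than a verified transcription of the text's example) can be done with SymPy: the stated combination should simplify to the zero function.

    from sympy import symbols, sin, cos, simplify, S

    x = symbols('x')
    f1, f2, f3 = sin(x)**2, cos(x)**2, S(5)   # assumed data for this illustration

    # 5*f1 + 5*f2 - f3 is identically zero by the identity sin^2 x + cos^2 x = 1,
    # which is exactly a nontrivial dependence relation among f1, f2, and f3.
    print(simplify(5*f1 + 5*f2 - f3))         # prints 0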

Unfortunately, there is no general method that can be used to determine whether a set of functions is linearly independent or linearly dependent. However, there does exist a theorem that is useful for establishing linear independence in certain circumstances. The following definition will be useful for discussing that theorem.

DEFINITION 2 If f₁ = f₁(x), f₂ = f₂(x), …, fₙ = fₙ(x) are functions that are n − 1 times differentiable on the interval (−∞, ∞), then the determinant

W(x) = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}

is called the Wronskian of f₁, f₂, …, fₙ.

Suppose for the moment that f₁, f₂, …, fₙ are linearly dependent vectors in C⁽ⁿ⁻¹⁾(−∞, ∞). This implies that the vector equation

k₁f₁ + k₂f₂ + ⋯ + kₙfₙ = 0

has a nontrivial solution, or equivalently that for certain values of the coefficients, not all zero, the equation

k₁f₁(x) + k₂f₂(x) + ⋯ + kₙfₙ(x) = 0

is satisfied for all x in (−∞, ∞). Using this equation together with those that result by differentiating it n − 1 times yields the linear system

k₁f₁(x) + k₂f₂(x) + ⋯ + kₙfₙ(x) = 0
k₁f₁′(x) + k₂f₂′(x) + ⋯ + kₙfₙ′(x) = 0
⋮
k₁f₁⁽ⁿ⁻¹⁾(x) + k₂f₂⁽ⁿ⁻¹⁾(x) + ⋯ + kₙfₙ⁽ⁿ⁻¹⁾(x) = 0

Thus, the linear dependence of f₁, f₂, …, fₙ implies that this linear system (10) has a nontrivial solution for every x in (−∞, ∞). But this implies that the determinant of the coefficient matrix of 10 is zero for every such x. Since this determinant is the Wronskian of f₁, f₂, …, fₙ, we have established the following result.

THEOREM 4.3.4 If the functions f₁, f₂, …, fₙ have n − 1 continuous derivatives on the interval (−∞, ∞), and if the Wronskian of these functions is not identically zero on (−∞, ∞), then these functions form a linearly independent set of vectors in C⁽ⁿ⁻¹⁾(−∞, ∞).

In Example 8 we showed that x and sin x are linearly independent functions by observing that neither is a scalar multiple of the other. The following example shows how to obtain the same result using the Wronskian (though it is a more complicated procedure in this particular case).

E X A M P L E 9 Linear Independence Using the Wronskian Use the Wronskian to show that f₁ = x and f₂ = sin x are linearly independent.

Solution The Wronskian is

W(x) = \begin{vmatrix} x & \sin x \\ 1 & \cos x \end{vmatrix} = x\cos x - \sin x

This function is not identically zero on the interval (−∞, ∞) since, for example, W(π/2) = −1. Thus, the functions are linearly independent.

WARNING The converse of Theorem 4.3.4 is false. If the Wronskian of f₁, f₂, …, fₙ is identically zero on (−∞, ∞), then no conclusion can be reached about the linear independence of {f₁, f₂, …, fₙ}; this set of vectors may be linearly independent or linearly dependent.

E X A M P L E 1 0 Linear Independence Using the Wronskian Use the Wronskian to show that the functions f₁, f₂, and f₃ are linearly independent.

Solution The Wronskian is the 3 × 3 determinant whose rows are the three functions, their first derivatives, and their second derivatives. We leave it for you to compute it; the resulting function is obviously not identically zero on (−∞, ∞), so f₁, f₂, and f₃ form a linearly independent set.
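For experimentation, here is a small SymPy sketch that computes Wronskians; the two-function case reproduces Example 9, and the three-function case uses hypothetical functions (the ones in Example 10 appear as images in the original).

    from sympy import symbols, sin, exp, Matrix, simplify, S

    x = symbols('x')

    def wronskian(funcs, x):
        """Determinant of the matrix whose i-th row holds the i-th derivatives."""
        n = len(funcs)
        M = Matrix(n, n, lambda i, j: funcs[j].diff(x, i))
        return simplify(M.det())

    print(wronskian([x, sin(x)], x))               # x*cos(x) - sin(x), as in Example 9
    print(wronskian([S(1), exp(x), exp(2*x)], x))  # hypothetical triple; gives 2*exp(3*x)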

OPTIONAL

We will close this section by proving part (a) of Theorem 4.3.1. We will leave the proof of part (b) as an exercise.

Proof of Theorem 4.3.1 (a) Let S = {v₁, v₂, …, vᵣ} be a set with two or more vectors. If we assume that S is linearly dependent, then there are scalars k₁, k₂, …, kᵣ, not all zero, such that

k₁v₁ + k₂v₂ + ⋯ + kᵣvᵣ = 0   (11)

To be specific, suppose that k₁ ≠ 0. Then 11 can be rewritten as

v₁ = (−k₂/k₁)v₂ + ⋯ + (−kᵣ/k₁)vᵣ

which expresses v₁ as a linear combination of the other vectors in S. Similarly, if kⱼ ≠ 0 in 11 for some j = 2, 3, …, r, then vⱼ is expressible as a linear combination of the other vectors in S.

Conversely, let us assume that at least one of the vectors in S is expressible as a linear combination of the other vectors. To be specific, suppose that

v₁ = c₂v₂ + c₃v₃ + ⋯ + cᵣvᵣ

so

v₁ − c₂v₂ − c₃v₃ − ⋯ − cᵣvᵣ = 0

It follows that S is linearly dependent since the equation is satisfied by

k₁ = 1, k₂ = −c₂, …, kᵣ = −cᵣ

which are not all zero. The proof in the case where some vector other than v₁ is expressible as a linear combination of the other vectors in S is similar.

Concept Review • Trivial solution • Linearly independent set • Linearly dependent set • Wronskian

Skills • Determine whether a set of vectors is linearly independent or linearly dependent. • Express one vector in a linearly dependent set as a linear combination of the other vectors in the set. • Use the Wronskian to show that a set of functions is linearly independent.

Exercise Set 4.3 1. Explain why the following are linearly dependent sets of vectors. (Solve this problem by inspection.) (a)

and

(b)

,

(c)

in ,

in

and

(d)

in

and

in

Answer: (a)

is a scalar multiple of

.

(b) The vectors are linearly dependent by Theorem 4.3.3. (c)

is a scalar multiple of

.

(d) B is a scalar multiple of A. 2. Which of the following sets of vectors in (a)

are linearly dependent?

(b) (c) (d) 3. Which of the following sets of vectors in (a) (b)

,

are linearly dependent?

,

,

,

,

(c)

,

(d)

,

,

,

,

,

Answer: None 4. Which of the following sets of vectors in (a) (b)

,

are linearly dependent?

,

,

,

(c) (d)

,

,

,

5. Assume that , , and are vectors in that have their initial points at the origin. In each part, determine whether the three vectors lie in a plane. (a)

,

,

(b)

,

,

Answer: (a) They do not lie in a plane. (b) They do lie in a plane. 6. Assume that , , and are vectors in that have their initial points at the origin. In each part, determine whether the three vectors lie on the same line. (a)

,

(b)

,

(c)

,

7. (a) Show that the three vectors linearly dependent set in .

, , , ,

, and

(b) Express each vector in part (a) as a linear combination of the other two. Answer: (b)

form a

8. (a) Show that the three vectors linearly dependent set in .

,

, and

form a

(b) Express each vector in part (a) as a linear combination of the other two. 9. For which real values of

do the following vectors form a linearly dependent set in

?

Answer:

10. Show that if

is a linearly independent set of vectors, then so are , and .

11. Show that if subset of S.

is a linearly independent set of vectors, then so is every nonempty

12. Show that if is a linearly dependent set of vectors in a vector space V, and vector in V that is not in S, then is also linearly dependent. 13. Show that if

is any

is a linearly dependent set of vectors in a vector space V, and if are any vectors in V that are not in S, then is also linearly

dependent. 14. Show that in

every set with more than three vectors is linearly dependent.

15. Show that if is linearly independent and linearly independent.

does not lie in

16. Prove: For any vectors u, v, and w in a vector space V, the vectors linearly dependent set. 17. Prove: The space spanned by two vectors in the origin itself.

, then ,

, and

is form a

is a line through the origin, a plane through the origin, or

18. Under what conditions is a set with one vector linearly independent? 19. Are the vectors , , and those in part (b)? Explain.

in part (a) of the accompanying figure linearly independent? What about

Figure Ex-19

Answer:
(a) They are linearly independent since v₁, v₂, and v₃ do not lie in the same plane when they are placed with their initial points at the origin.
(b) They are not linearly independent since v₁, v₂, and v₃ lie in the same plane when they are placed with their initial points at the origin.

20. By using appropriate identities, where required, determine which of the following sets of vectors in are linearly dependent. (a) (b) (c) (d) (e) (f) 21. The functions and are linearly independent in because neither function is a scalar multiple of the other. Confirm the linear independence using Wroński's test. Answer: for some x. 22. The functions and are linearly independent in because neither function is a scalar multiple of the other. Confirm the linear independence using Wroński's test. 23. (Calculus required) Use the Wronskian to show that the following sets of vectors are linearly independent. (a) (b) Answer: (a) (b) 24. Show that the functions 25. Show that the functions

,

, and ,

Answer: for some x. 26. Use part (a) of Theorem 4.3.1 to prove part (b).

, and

are linearly independent. are linearly independent.

27. Prove part (b) of Theorem 4.3.2. 28. (a) In Example 1 we showed that the mutually orthogonal vectors i, j, and k form a linearly independent set of vectors in . Do you think that every set of three nonzero mutually orthogonal vectors in is linearly independent? Justify your conclusion with a geometric argument. (b) Justify your conclusion with an algebraic argument. [Hint: Use dot products.]

True-False Exercises In parts (a)–(h) determine whether the statement is true or false, and justify your answer. (a) A set containing a single vector is linearly independent. Answer: False (b) The set of vectors

is linearly dependent for every scalar k.

Answer: True (c) Every linearly dependent set contains the zero vector. Answer: False (d) If the set of vectors is linearly independent, then independent for every nonzero scalar k.

is also linearly

Answer: True (e) If are linearly dependent nonzero vectors, then at least one vector combination of

is a unique linear

Answer: True (f) The set of

matrices that contain exactly two 1's and two 0's is a linearly independent set in

Answer: False (g) The three polynomials Answer: True

, and

are linearly independent.

.

(h) The functions

and

are linearly dependent if there is a real number x so that for some scalars and .

Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

4.4 Coordinates and Basis We usually think of a line as being one-dimensional, a plane as two-dimensional, and the space around us as three-dimensional. It is the primary goal of this section and the next to make this intuitive notion of dimension precise. In this section we will discuss coordinate systems in general vector spaces and lay the groundwork for a precise definition of dimension in the next section.

Coordinate Systems in Linear Algebra In analytic geometry we learned to use rectangular coordinate systems to create a one-to-one correspondence between points in 2-space and ordered pairs of real numbers and between points in 3-space and ordered triples of real numbers (Figure 4.4.1). Although rectangular coordinate systems are common, they are not essential. For example, Figure 4.4.2 shows coordinate systems in 2-space and 3-space in which the coordinate axes are not mutually perpendicular.

Figure 4.4.1

Figure 4.4.2 In linear algebra coordinate systems are commonly specified using vectors rather than coordinate axes. For example, in Figure 4.4.3 we have recreated the coordinate systems in Figure 4.4.2 by using unit vectors to identify the positive directions and then attaching coordinates to a point P using the scalar coefficients in the equations

Figure 4.4.3

Units of measurement are essential ingredients of any coordinate system. In geometry problems one tries to use the same unit of measurement on all axes to avoid distorting the shapes of figures. This is less important in applications where coordinates represent physical quantities with diverse units (for example, time in seconds on one axis and temperature in degrees Celsius on another axis). To allow for this level of generality, we will relax the requirement that unit vectors be used to identify the positive directions and require only that those vectors be linearly independent. We will refer to these as the “basis vectors” for the coordinate system. In summary, it is the directions of the basis vectors that establish the positive directions, and it is the lengths of the basis vectors that establish the spacing between the integer points on the axes (Figure 4.4.4).

Figure 4.4.4

Basis for a Vector Space The following definition will make the preceding ideas more precise and will enable us to extend the concept of a coordinate system to general vector spaces. Note that in Definition 1 we have required a basis to have finitely many vectors. Some authors call this a finite basis, but we will not use this terminology.

DEFINITION 1 If V is any vector space and S = {v₁, v₂, …, vₙ} is a finite set of vectors in V, then S is called a basis for V if the following two conditions hold:

(a) S is linearly independent.

(b) S spans V.

If you think of a basis as describing a coordinate system for a vector space V, then part (a) of this definition guarantees that there is no interrelationship between the basis vectors, and part (b) guarantees that there are enough basis vectors to provide coordinates for all vectors in V. Here are some examples.

E X A M P L E 1 The Standard Basis for Rn Recall from Example 11 of Section 4.2 that the standard unit vectors e₁, e₂, …, eₙ span Rⁿ and from Example 1 of Section 4.3 that they are linearly independent. Thus, they form a basis for Rⁿ that we call the standard basis for Rⁿ. In particular, S = {i, j, k} is the standard basis for R³.

E X A M P L E 2 The Standard Basis for Pn Show that S = {1, x, x², …, xⁿ} is a basis for the vector space Pₙ of polynomials of degree n or less.

Solution We must show that the polynomials in S are linearly independent and span Pₙ. Let us denote these polynomials by

p₀ = 1, p₁ = x, p₂ = x², …, pₙ = xⁿ

We showed in Example 13 of Section 4.2 that these vectors span Pₙ and in Example 4 of Section 4.3 that they are linearly independent. Thus, they form a basis for Pₙ that we call the standard basis for Pₙ.

E X A M P L E 3 Another Basis for R3 Show that the vectors v₁, v₂, and v₃ form a basis for R³.

Solution We must show that these vectors are linearly independent and span R³. To prove linear independence we must show that the vector equation

c₁v₁ + c₂v₂ + c₃v₃ = 0   (1)

has only the trivial solution; and to prove that the vectors span R³ we must show that every vector b = (b₁, b₂, b₃) in R³ can be expressed as

c₁v₁ + c₂v₂ + c₃v₃ = b   (2)

By equating corresponding components on the two sides, these two equations can be expressed as the linear systems (3) (verify). Thus, we have reduced the problem to showing that in 3 the homogeneous system has only the trivial solution and that the nonhomogeneous system is consistent for all values of b₁, b₂, and b₃. But the two systems have the same coefficient matrix A, so it follows from parts (b), (e), and (g) of Theorem 2.3.8 that we can prove both results at the same time by showing that det(A) ≠ 0. We leave it for you to confirm that det(A) ≠ 0, which proves that the vectors v₁, v₂, and v₃ form a basis for R³.

E X A M P L E 4 The Standard Basis for Mmn Show that the matrices

M₁ = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, M₂ = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, M₃ = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, M₄ = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}

form a basis for the vector space M₂₂ of 2 × 2 matrices.

Solution We must show that the matrices are linearly independent and span M₂₂. To prove linear independence we must show that the equation

c₁M₁ + c₂M₂ + c₃M₃ + c₄M₄ = 0   (4)

has only the trivial solution, where 0 is the 2 × 2 zero matrix; and to prove that the matrices span M₂₂ we must show that every 2 × 2 matrix

B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

can be expressed as

c₁M₁ + c₂M₂ + c₃M₃ + c₄M₄ = B   (5)

The matrix forms of Equations 4 and 5 can be rewritten as

\begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}   and   \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

Since the first equation has only the trivial solution c₁ = c₂ = c₃ = c₄ = 0, the matrices are linearly independent, and since the second equation has the solution c₁ = a, c₂ = b, c₃ = c, c₄ = d, the matrices span M₂₂. This proves that the matrices M₁, M₂, M₃, M₄ form a basis for M₂₂. More generally, the mn different matrices whose entries are zero except for a single entry of 1 form a basis for Mₘₙ called the standard basis for Mₘₙ.
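As a small computational aside (not part of the text), linear independence of matrices can be checked by flattening each matrix into a vector; the sketch below does this for the standard basis of M22.

    import numpy as np

    # The four standard basis matrices of M22: a single entry of 1, zeros elsewhere.
    M = [np.array([[1, 0], [0, 0]]),
         np.array([[0, 1], [0, 0]]),
         np.array([[0, 0], [1, 0]]),
         np.array([[0, 0], [0, 1]])]

    # Flatten each matrix into a vector in R^4; the matrices are linearly
    # independent exactly when these vectors are, i.e. when the rank is 4.
    A = np.column_stack([m.flatten() for m in M])
    print(np.linalg.matrix_rank(A))   # 4, so the matrices are linearly independent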

Some writers define the empty set to be a basis for the zero vector space, but we will not do so.

It is not true that every vector space has a basis in the sense of Definition 1. The simplest example is the zero vector space, which contains no linearly independent sets and hence no basis. The following is an example of a nonzero vector space that has no basis in the sense of Definition 1 because it cannot be spanned by finitely many vectors.

E X A M P L E 5 A Vector Space That Has No Finite Spanning Set Show that the vector space P∞ of all polynomials with real coefficients has no finite spanning set.

Solution If there were a finite spanning set, say S = {p₁, p₂, …, pᵣ}, then the degrees of the polynomials in S would have a maximum value, say n; and this in turn would imply that any linear combination of the polynomials in S would have degree at most n. Thus, there would be no way to express the polynomial xⁿ⁺¹ as a linear combination of the polynomials in S, contradicting the fact that the vectors in S span P∞.

For reasons that will become clear shortly, a vector space that cannot be spanned by finitely many vectors is said to be infinite-dimensional, whereas those that can are said to be finite-dimensional.

E X A M P L E 6 Some Finite- and Infinite-Dimensional Spaces

In Example 1, Example 2, and Example 4 we found bases for Rⁿ, Pₙ, and M₂₂, so these vector spaces are finite-dimensional. We showed in Example 5 that the vector space P∞ is not spanned by finitely many vectors and hence is infinite-dimensional. In the exercises of this section and the next we will ask you to show that the function spaces F(−∞, ∞), C(−∞, ∞), C¹(−∞, ∞), Cᵐ(−∞, ∞), and C^∞(−∞, ∞) are infinite-dimensional.

Coordinates Relative to a Basis Earlier in this section we drew an informal analogy between basis vectors and coordinate systems. Our next goal is to make this informal idea precise by defining the notion of a coordinate system in a general vector space. The following theorem will be our first step in that direction.

THEOREM 4.4.1 Uniqueness of Basis Representation If S = {v₁, v₂, …, vₙ} is a basis for a vector space V, then every vector v in V can be expressed in the form v = c₁v₁ + c₂v₂ + ⋯ + cₙvₙ in exactly one way.

Proof Since S spans V, it follows from the definition of a spanning set that every vector in V is expressible as a linear combination of the vectors in S. To see that there is only one way to express a vector as a linear combination of the vectors in S, suppose that some vector v can be written as

v = c₁v₁ + c₂v₂ + ⋯ + cₙvₙ

and also as

v = k₁v₁ + k₂v₂ + ⋯ + kₙvₙ

Subtracting the second equation from the first gives

0 = (c₁ − k₁)v₁ + (c₂ − k₂)v₂ + ⋯ + (cₙ − kₙ)vₙ

Since the right side of this equation is a linear combination of vectors in S, the linear independence of S implies that

c₁ − k₁ = 0, c₂ − k₂ = 0, …, cₙ − kₙ = 0

that is,

c₁ = k₁, c₂ = k₂, …, cₙ = kₙ

Thus, the two expressions for v are the same.

Figure 4.4.5

Sometimes it will be desirable to write a coordinate vector as a column matrix, in which case we will denote it using square brackets as [v]_S. We will refer to [v]_S as a coordinate matrix and reserve the terminology coordinate vector for the comma-delimited form (v)_S = (c₁, c₂, …, cₙ).

We now have all of the ingredients required to define the notion of “coordinates” in a general vector space V. For motivation, observe that in R³, for example, the coordinates (a, b, c) of a vector v are precisely the coefficients in the formula

v = ai + bj + ck

that expresses v as a linear combination of the standard basis vectors for R³ (see Figure 4.4.5). The following definition generalizes this idea.

DEFINITION 2 If S = {v₁, v₂, …, vₙ} is a basis for a vector space V, and

v = c₁v₁ + c₂v₂ + ⋯ + cₙvₙ

is the expression for a vector v in terms of the basis S, then the scalars c₁, c₂, …, cₙ are called the coordinates of v relative to the basis S. The vector (c₁, c₂, …, cₙ) in Rⁿ constructed from these coordinates is called the coordinate vector of v relative to S; it is denoted by

(v)_S = (c₁, c₂, …, cₙ)   (6)

Remark Recall that two sets are considered to be the same if they have the same members, even if those members are written in a different order. However, if S = {v₁, v₂, …, vₙ} is a set of basis vectors, then changing the order in which the vectors are written would change the order of the entries in (v)_S, possibly producing a different coordinate vector. To avoid this complication, we will make the convention that in any discussion involving a basis S the order of the vectors in S remains fixed. Some authors call a set of basis vectors with this restriction an ordered basis. However, we will use this terminology only when emphasis on the order is required for clarity. Observe that (v)_S is a vector in Rⁿ, so that once a basis S is given for a vector space V, Theorem 4.4.1 establishes a one-to-one correspondence between vectors in V and vectors in Rⁿ (Figure 4.4.6).

Figure 4.4.6

E X A M P L E 7 Coordinates Relative to the Standard Basis for Rn In the special case where V = Rⁿ and S is the standard basis, the coordinate vector (v)_S and the vector v are the same; that is,

(v)_S = v

For example, in R³ the representation of a vector v = (a, b, c) as a linear combination of the vectors in the standard basis S = {i, j, k} is

v = ai + bj + ck

so the coordinate vector relative to this basis is (v)_S = (a, b, c), which is the same as the vector v.

E X A M P L E 8 Coordinate Vectors Relative to Standard Bases
(a) Find the coordinate vector for a polynomial p = c₀ + c₁x + ⋯ + cₙxⁿ relative to the standard basis for the vector space Pₙ.
(b) Find the coordinate vector of a 2 × 2 matrix B relative to the standard basis for M₂₂.

Solution (a) The given formula for p expresses this polynomial as a linear combination of the standard basis vectors S = {1, x, …, xⁿ}. Thus, the coordinate vector for p relative to S is

(p)_S = (c₀, c₁, …, cₙ)

(b) We showed in Example 4 that the representation of a matrix

B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

as a linear combination of the standard basis vectors is

B = aM₁ + bM₂ + cM₃ + dM₄

so the coordinate vector of B relative to the standard basis S is

(B)_S = (a, b, c, d)

E X A M P L E 9 Coordinates in R3
(a) We showed in Example 3 that the vectors v₁, v₂, and v₃ form a basis S = {v₁, v₂, v₃} for R³. Find the coordinate vector of a given vector v relative to the basis S.
(b) Find the vector v in R³ whose coordinate vector relative to S is given.

Solution (a) To find (v)_S we must first express v as a linear combination of the vectors in S; that is, we must find values of c₁, c₂, and c₃ such that

v = c₁v₁ + c₂v₂ + c₃v₃

Equating corresponding components gives a linear system in c₁, c₂, and c₃. Solving this system we obtain the values of c₁, c₂, and c₃ (verify). Therefore, (v)_S = (c₁, c₂, c₃).

(b) Using the definition of (v)_S, we obtain v = c₁v₁ + c₂v₂ + c₃v₃, where (c₁, c₂, c₃) is the given coordinate vector.
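Because the data of Example 9 is shown as images, the sketch below uses hypothetical basis vectors to illustrate both parts: finding (v)_S means solving a linear system whose columns are the basis vectors, and recovering v from (v)_S is just the matching linear combination.

    import numpy as np

    # Hypothetical basis vectors for R^3 and a hypothetical target vector v.
    v1 = np.array([1.0, 2.0, 1.0])
    v2 = np.array([2.0, 9.0, 0.0])
    v3 = np.array([3.0, 3.0, 4.0])
    v  = np.array([5.0, -1.0, 9.0])

    # (a) Coordinates of v relative to S = {v1, v2, v3}: solve [v1 v2 v3] c = v.
    A = np.column_stack([v1, v2, v3])
    c = np.linalg.solve(A, v)
    print("(v)_S =", c)

    # (b) Recover a vector from its coordinate vector: v = c1*v1 + c2*v2 + c3*v3.
    print("reconstructed v =", A @ c)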

Concept Review
• Basis
• Standard bases for Rⁿ, Pₙ, and Mₘₙ
• Finite-dimensional
• Infinite-dimensional
• Coordinates
• Coordinate vector

Skills • Show that a set of vectors is a basis for a vector space. • Find the coordinates of a vector relative to a basis. • Find the coordinate vector of a vector relative to a basis.

Exercise Set 4.4 1. In words, explain why the following sets of vectors are not bases for the indicated vector spaces. (a)

,

,

(b)

,

(c)

,

(d)

,

for for for ,

,

,

Answer: (a) A basis for

has two linearly independent vectors.

(b) A basis for

has three linearly independent vectors.

(c) A basis for

has three linearly independent vectors.

(d) A basis for

has four linearly independent vectors.

2. Which of the following sets of vectors are bases for

?

(a) (b) (c) (d) 3. Which of the following sets of vectors are bases for (a) (b) (c)

?

, for

(d) Answer: (a), (b) 4. Which of the following form bases for

?

(a) (b) (c) (d) 5. Show that the following matrices form a basis for

.

6. Let V be the space spanned by

,

,

(a) Show that

.

is not a basis for V.

(b) Find a basis for V. 7. Find the coordinate vector of w relative to the basis (a)

,

(b) (c)

for

.

; ,

;

,

;

Answer: (a) (b) (c) 8. Find the coordinate vector of w relative to the basis (a)

,

;

(b)

,

;

(c)

,

;

of

9. Find the coordinate vector of v relative to the basis (a) (b) Answer: (a) (b)

;

, ;

. ,

,

.

,

10. Find the coordinate vector of p relative to the basis (a)

;

(b)

,

.

,

;

,

,

11. Find the coordinate vector of A relative to the basis

.

Answer:

In Exercises 12–13, show that basis vectors.

is a basis for

, and express A as a linear combination of the

12. 13. Answer:

In Exercises 14–15, show that vectors. 14.

is a basis for

,

15.

,

, ,

, and express p as a linear combination of the basis

; ;

Answer:

16. The accompanying figure shows a rectangular xy-coordinate system and an skewed axes. Assuming that 1-unit scales are used on all the axes, find the whose xy-coordinates are given. (a) (1, 1) (b) (1, 0) (c) (0, 1) (d) (a b)

-coordinate system with -coordinates of the points

Figure Ex-16 17. The accompanying figure shows a rectangular xy-coordinate system determined by the unit basis vectors i and j and an -coordinate system determined by unit basis vectors and . Find the -coordinates of the points whose xy-coordinates are given. (a) (b) (1, 0) (c) (0, 1) (d) (a, b)

Figure Ex-17 Answer: (a) (2, 0) (b) (c) (0, 1) (d)

18. The basis that we gave for in Example 4 consisted of noninvertible matrices. Do you think that there is a basis for consisting of invertible matrices? Justify your answer. 19. Prove that

is infinite-dimensional.

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a) If

, then

is a basis for V.

Answer: False (b) Every linearly independent subset of a vector space V is a basis for V. Answer: False (c) If combination of

is a basis for a vector space V, then every vector in V can be expressed as a linear

Answer: True (d) The coordinate vector of a vector x in

relative to the standard basis for

Answer: True (e) Every basis of

contains at least one polynomial of degree 3 or less.

Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

is x.

4.5 Dimension We showed in the previous section that the standard basis for Rⁿ has n vectors and hence that the standard basis for R³ has three vectors, the standard basis for R² has two vectors, and the standard basis for R¹ has one vector. Since we think of space as three-dimensional, a plane as two-dimensional, and a line as one-dimensional, there seems to be a link between the number of vectors in a basis and the dimension of a vector space. We will develop this idea in this section.

Number of Vectors in a Basis Our first goal in this section is to establish the following fundamental theorem.

THEOREM 4.5.1 All bases for a finite-dimensional vector space have the same number of vectors.

To prove this theorem we will need the following preliminary result, whose proof is deferred to the end of the section.

THEOREM 4.5.2 Let V be a finite-dimensional vector space, and let S = {v₁, v₂, …, vₙ} be any basis.

(a) If a set has more than n vectors, then it is linearly dependent.

(b) If a set has fewer than n vectors, then it does not span V.

Some writers regard the empty set to be a basis for the zero vector space. This is consistent with our definition of dimension, since the empty set has no vectors and the zero vector space has dimension zero.

We can now see rather easily why Theorem 4.5.1 is true; for if S = {v₁, v₂, …, vₙ} is an arbitrary basis for V, then Theorem 4.5.2 implies that any set in V with more than n vectors is linearly dependent and any set in V with fewer than n vectors does not span V. Thus, unless a set in V has exactly n vectors it cannot be a basis.

We noted in the introduction to this section that for certain familiar vector spaces the intuitive notion of dimension coincides with the number of vectors in a basis. The following definition makes this idea precise. Engineers often use the term degrees of freedom as a synonym for dimension.

DEFINITION 1 The dimension of a finite-dimensional vector space V is denoted by dim(V) and is defined to be the number of vectors in a basis for V. In addition, the zero vector space is defined to have dimension zero.

E X A M P L E 1 Dimensions of Some Familiar Vector Spaces

dim(Rⁿ) = n   [The standard basis has n vectors.]
dim(Pₙ) = n + 1   [The standard basis has n + 1 vectors.]
dim(Mₘₙ) = mn   [The standard basis has mn vectors.]

E X A M P L E 2 Dimension of Span(S) If S = {v₁, v₂, …, vᵣ} is a linearly independent set in a vector space V, then S is automatically a basis for span(S) (why?), and this implies that

dim[span(S)] = r

In words, the dimension of the space spanned by a linearly independent set of vectors is equal to the number of vectors in that set.

E X A M P L E 3 Dimension of a Solution Space Find a basis for and the dimension of the solution space of the homogeneous system

Solution We leave it for you to solve this system by Gauss-Jordan elimination and show that its general solution is

which can be written in vector form as or, alternatively, as This shows that the vectors and span the solution space. Since neither vector is a scalar multiple of the other, they are linearly independent and hence form a basis for the solution space. Thus, the solution space has dimension 2.

E X A M P L E 4 Dimension of a Solution Space Find a basis for and the dimension of the solution space of the homogeneous system

Solution In Example 6 of Section 1.2 we found the solution of this system, which can be written in vector form with the three free variables as parameters. This shows that three vectors span the solution space. We leave it for you to check that these vectors are linearly independent by showing that none of them is a linear combination of the other two (but see the remark that follows). Thus, the solution space has dimension 3.
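As in Section 4.2, a basis for the solution space can be produced mechanically; the sketch below (with a hypothetical coefficient matrix, since the system above is an image) shows that the dimension equals the number of unknowns minus the rank of the coefficient matrix.

    from sympy import Matrix

    # Hypothetical coefficient matrix of a homogeneous system in six unknowns.
    A = Matrix([[1, 3, -2, 0, 2, 0],
                [2, 6, -5, -2, 4, -3],
                [0, 0, 5, 10, 0, 15],
                [2, 6, 0, 8, 4, 18]])

    basis = A.nullspace()
    print("rank =", A.rank())
    print("dimension of the solution space =", len(basis))   # = 6 - rank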

Remark It can be shown that for a homogeneous linear system, the method of the last example always produces a basis for the solution space of the system. We omit the formal proof.

Some Fundamental Theorems We will devote the remainder of this section to a series of theorems that reveal the subtle interrelationships among the concepts of linear independence, basis, and dimension. These theorems are not simply exercises in mathematical theory—they are essential to the understanding of vector spaces and the applications that build on them.

We will start with a theorem (proved at the end of this section) that is concerned with the effect on linear independence and spanning if a vector is added to or removed from a given nonempty set of vectors. Informally stated, if you start with a linearly independent set S and adjoin to it a vector that is not a linear combination of those in S, then the enlarged set will still be linearly independent. Also, if you start with a set S of two or more vectors in which one of the vectors is a linear combination of the others, then that vector can be removed from S without affecting span(S) (Figure 4.5.1).

Figure 4.5.1

THEOREM 4.5.3 Plus/Minus Theorem Let S be a nonempty set of vectors in a vector space V.
(a) If S is a linearly independent set, and if v is a vector in V that is outside of span(S), then the set S ∪ {v} that results by inserting v into S is still linearly independent.
(b) If v is a vector in S that is expressible as a linear combination of other vectors in S, and if S − {v} denotes the set obtained by removing v from S, then S and S − {v} span the same space; that is, span(S) = span(S − {v}).

E X A M P L E 5 Applying the Plus/Minus Theorem Show that the three given vectors are linearly independent.

Solution The set is linearly independent, since neither vector in S is a scalar multiple of the other. Since the vector cannot be expressed as a linear combination of the vectors in S (why?), it can be adjoined to S to produce a linearly independent set .

In general, to show that a set of vectors {v1, v2, ..., vn} is a basis for a vector space V, we must show that the vectors are linearly independent and span V. However, if we happen to know that V has dimension n (so that the set contains the right number of vectors for a basis), then it suffices to check either linear independence or spanning; the remaining condition will hold automatically. This is the content of the following theorem.

THEOREM 4.5.4 Let V be an n-dimensional vector space, and let S be a set in V with exactly n vectors. Then S is a basis for V if and only if S spans V or S is linearly independent.

Proof Assume that S has exactly n vectors and spans V. To prove that S is a basis, we must show that S is a linearly independent set. But if this is not so, then some vector v in S is a linear combination of the remaining vectors. If we remove this vector from S, then it follows from Theorem 4.5.3b that the remaining set of vectors still spans V. But this is impossible, since it follows from Theorem 4.5.2b that no set with fewer than n vectors can span an n-dimensional vector space. Thus S is linearly independent. Assume that S has exactly n vectors and is a linearly independent set. To prove that S is a basis, we must show that S spans V. But if this is not so, then there is some vector v in V that is not in span(S). If we insert this vector into S, then it follows from Theorem 4.5.3a that this set of vectors is still linearly independent. But this is impossible, since Theorem 4.5.2a states that no set with more than n vectors in an n-dimensional vector space can be linearly independent. Thus S spans V.

E X A M P L E 6 Bases by Inspection
(a) By inspection, explain why the two given vectors form a basis for R^2.
(b) By inspection, explain why the three given vectors form a basis for R^3.

Solution (a) Since neither vector is a scalar multiple of the other, the two vectors form a linearly independent set in the two-dimensional space R^2, and hence they form a basis by Theorem 4.5.4.
(b) The first two vectors form a linearly independent set in the xz-plane (why?). The third vector is outside of the xz-plane, so the set of all three vectors is also linearly independent. Since R^3 is three-dimensional, Theorem 4.5.4 implies that these three vectors form a basis for R^3.
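A quick computational counterpart of Theorem 4.5.4: if we already know that dim(V) = n, then n vectors form a basis exactly when the matrix having them as columns has rank n, that is, when they are linearly independent. A small sketch with NumPy, using illustrative vectors in R^3 of our own choosing (not necessarily those of Example 6):

    import numpy as np

    # Three illustrative vectors in R^3 (hypothetical values).
    v1 = np.array([2.0, 0.0, -1.0])
    v2 = np.array([4.0, 0.0,  7.0])
    v3 = np.array([-1.0, 1.0, 4.0])

    A = np.column_stack([v1, v2, v3])
    # In an n-dimensional space, n vectors form a basis iff they are
    # linearly independent, i.e. iff this matrix has rank n.
    print(np.linalg.matrix_rank(A) == 3)   # True -> {v1, v2, v3} is a basis of R^3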

The next theorem (whose proof is deferred to the end of this section) reveals two important facts about the vectors in a finite-dimensional vector space V: 1. Every spanning set for a subspace is either a basis for that subspace or has a basis as a subset. 2. Every linearly independent set in a subspace is either a basis for that subspace or can be extended to a basis for it.

THEOREM 4.5.5 Let S be a finite set of vectors in a finite-dimensional vector space V. (a) If S spans V but is not a basis for V, then S can be reduced to a basis for V by removing appropriate vectors from S. (b) If S is a linearly independent set that is not already a basis for V, then S can be enlarged to a basis for V by inserting appropriate vectors into S.

We conclude this section with a theorem that relates the dimension of a vector space to the dimensions of its subspaces.

THEOREM 4.5.6 If W is a subspace of a finite-dimensional vector space V, then:
(a) W is finite-dimensional.
(b) dim(W) ≤ dim(V).
(c) W = V if and only if dim(W) = dim(V).

Proof (a) We will leave the proof of this part for the exercises.

Proof (b) Part (a) shows that W is finite-dimensional, so it has a basis S = {w1, w2, ..., wm}. Either S is also a basis for V or it is not. If so, then dim(V) = m, which means that dim(W) = dim(V). If not, then because S is a linearly independent set it can be enlarged to a basis for V by part (b) of Theorem 4.5.5. But this implies that dim(W) < dim(V), so we have shown that dim(W) ≤ dim(V) in all cases.

Proof (c) Assume that dim(W) = dim(V) and that S = {w1, w2, ..., wn} is a basis for W. If S is not also a basis for V, then, being linearly independent, S can be extended to a basis for V by part (b) of Theorem 4.5.5. But this would mean that dim(V) > dim(W), which contradicts our hypothesis. Thus S must also be a basis for V, which means that W = V. Figure 4.5.2 illustrates the geometric relationship between the subspaces of R^3 in order of increasing dimension.

Figure 4.5.2

OPTIONAL

We conclude this section with optional proofs of Theorem 4.5.2, Theorem 4.5.3, and Theorem 4.5.5.

Proof of Theorem 4.5.2(a) Let S′ = {w1, w2, ..., wm} be any set of m vectors in V, where m > n. We want to show that S′ is linearly dependent. Since S = {v1, v2, ..., vn} is a basis, each wi can be expressed as a linear combination of the vectors in S, say

w1 = a11 v1 + a21 v2 + ··· + an1 vn
w2 = a12 v1 + a22 v2 + ··· + an2 vn        (1)
...
wm = a1m v1 + a2m v2 + ··· + anm vn

To show that S′ is linearly dependent, we must find scalars k1, k2, ..., km, not all zero, such that

k1 w1 + k2 w2 + ··· + km wm = 0        (2)

Using the equations in 1, we can rewrite 2 as

(k1 a11 + k2 a12 + ··· + km a1m) v1 + (k1 a21 + k2 a22 + ··· + km a2m) v2 + ··· + (k1 an1 + k2 an2 + ··· + km anm) vn = 0

Thus, from the linear independence of S, the problem of proving that S′ is a linearly dependent set reduces to showing there are scalars k1, k2, ..., km, not all zero, that satisfy

a11 k1 + a12 k2 + ··· + a1m km = 0
a21 k1 + a22 k2 + ··· + a2m km = 0        (3)
...
an1 k1 + an2 k2 + ··· + anm km = 0

But 3 has more unknowns than equations, so the proof is complete since Theorem 1.2.2 guarantees the existence of nontrivial solutions.

Proof of Theorem 4.5.2(b) Let S′ = {w1, w2, ..., wm} be any set of m vectors in V, where m < n. We want to show that S′ does not span V. We will do this by showing that the assumption that S′ spans V leads to a contradiction of the linear independence of S = {v1, v2, ..., vn}. If S′ spans V, then every vector in V is a linear combination of the vectors in S′. In particular, each basis vector vi is a linear combination of the vectors in S′, say

v1 = a11 w1 + a21 w2 + ··· + am1 wm
v2 = a12 w1 + a22 w2 + ··· + am2 wm        (4)
...
vn = a1n w1 + a2n w2 + ··· + amn wm

To obtain our contradiction, we will show that there are scalars k1, k2, ..., kn, not all zero, such that

k1 v1 + k2 v2 + ··· + kn vn = 0        (5)

But 4 and 5 have the same form as 1 and 2 except that m and n are interchanged and the w′s and v′s are interchanged. Thus, the computations that led to 3 now yield

a11 k1 + a12 k2 + ··· + a1n kn = 0
a21 k1 + a22 k2 + ··· + a2n kn = 0
...
am1 k1 + am2 k2 + ··· + amn kn = 0

This linear system has more unknowns than equations and hence has nontrivial solutions by Theorem 1.2.2.

Proof of Theorem 4.5.3(a) Assume that S = {v1, v2, ..., vr} is a linearly independent set of vectors in V, and v is a vector in V outside of span(S). To show that S′ = {v1, v2, ..., vr, v} is a linearly independent set, we must show that the only scalars that satisfy

k1 v1 + k2 v2 + ··· + kr vr + k_{r+1} v = 0        (6)

are k1 = k2 = ··· = kr = k_{r+1} = 0. But it must be true that k_{r+1} = 0, for otherwise we could solve 6 for v as a linear combination of v1, v2, ..., vr, contradicting the assumption that v is outside of span(S). Thus, 6 simplifies to

k1 v1 + k2 v2 + ··· + kr vr = 0        (7)

which, by the linear independence of {v1, v2, ..., vr}, implies that k1 = k2 = ··· = kr = 0.

Proof of Theorem 4.5.3(b) Assume that S = {v1, v2, ..., vr} is a set of vectors in V, and (to be specific) suppose that vr is a linear combination of v1, v2, ..., v_{r−1}, say

vr = c1 v1 + c2 v2 + ··· + c_{r−1} v_{r−1}        (8)

We want to show that if vr is removed from S, then the remaining set of vectors {v1, v2, ..., v_{r−1}} still spans span(S); that is, we must show that every vector w in span(S) is expressible as a linear combination of {v1, v2, ..., v_{r−1}}. But if w is in span(S), then w is expressible in the form

w = k1 v1 + k2 v2 + ··· + k_{r−1} v_{r−1} + kr vr

or, on substituting 8,

w = k1 v1 + k2 v2 + ··· + k_{r−1} v_{r−1} + kr (c1 v1 + c2 v2 + ··· + c_{r−1} v_{r−1})

which expresses w as a linear combination of v1, v2, ..., v_{r−1}.

Proof of Theorem 4.5.5(a) If S is a set of vectors that spans V but is not a basis for V, then S is a linearly dependent set. Thus some vector v in S is expressible as a linear combination of the other vectors in S. By the Plus/Minus Theorem (4.5.3b), we can remove v from S, and the resulting set S′ will still span V. If S′ is linearly independent, then S′ is a basis for V, and we are done. If S′ is linearly dependent, then we can remove some appropriate vector from S′ to produce a set S″ that still spans V. We can continue removing vectors in this way until we finally arrive at a set of vectors in S that is linearly independent and spans V. This subset of S is a basis for V.

Proof of Theorem 4.5.5(b) Suppose that dim(V) = n. If S is a linearly independent set that is not already a basis for V, then S fails to span V, so there is some vector v in V that is not in span(S). By the Plus/Minus Theorem (4.5.3a), we can insert v into S, and the resulting set S′ will still be linearly independent. If S′ spans V, then S′ is a basis for V, and we are finished. If S′ does not span V, then we can insert an appropriate vector into S′ to produce a set S″ that is still linearly independent. We can continue inserting vectors in this way until we reach a set with n linearly independent vectors in V. This set will be a basis for V by Theorem 4.5.4.

Concept Review • Dimension • Relationships among the concepts of linear independence, basis, and dimension

Skills • Find a basis for and the dimension of the solution space of a homogeneous linear system. • Use dimension to determine whether a set of vectors is a basis for a finite-dimensional vector space. • Extend a linearly independent set to a basis.

Exercise Set 4.5 In Exercises 1–6, find a basis for the solution space of the homogeneous linear system, and find the dimension of that space. 1.

Answer: Basis: (1, 0, 1); dimension = 1

2. 3.

Answer: Basis:

;

4.

5.

Answer: No basis; 6.

7. Find bases for the following subspaces of (a) The plane (b) The plane

.

. .

(c) The line

.

(d) All vectors of the form

, where

.

Answer: (a) (b) (1, 1, 0), (0, 0, 1) (c) (d) (1, 1, 0), (0, 1, 1) 8. Find the dimensions of the following subspaces of

.

(a) All vectors of the form

.

(b) All vectors of the form

, where

and

(c) All vectors of the form

, where

.

9. Find the dimension of each of the following vector spaces. (a) The vector space of all diagonal

matrices.

.

(b) The vector space of all symmetric

matrices.

(c) The vector space of all upper triangular

matrices.

Answer: (a) n (b) (c) 10. Find the dimension of the subspace of .

consisting of all polynomials

11. (a) Show that the set W of all polynomials in

such that

for which

is a subspace of

.

(b) Make a conjecture about the dimension of W. (c) Confirm your conjecture by finding a basis for W. 12. Find a standard basis vector for

that can be added to the set

to produce a basis for

.

(a) (b) 13. Find standard basis vectors for

that can be added to the set

to produce a basis for

.

Answer: Any two of (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1) can be used. 14. Let

be a basis for a vector space V. Show that , and .

15. The vectors for .

and

is also a basis, where

are linearly independent. Enlarge

, to a basis

Answer: with 16. The vectors to a basis for

and

are linearly independent. Enlarge

.

17. (a) Show that for every positive integer n, one can find . [Hint: Look for polynomials.] (b) Use the result in part (a) to prove that (c) Prove that

linearly independent vectors in

is infinite- dimensional. , and

18. Let S be a basis for an n-dimensional vector space V. Show that if independent set of vectors in V, then the coordinate vectors independent set in , and conversely.

are infinite-dimensional vector spaces. form a linearly form a linearly

19. Using the notation from Exercise 18, show that if the vectors vectors span , and conversely. 20. Find a basis for the subspace of (a) (b) (c)

spanned by the given vectors.

, ,

,

span V, then the coordinate

,9 ,

,

,

[Hint: Let S be the standard basis for 18 and 19.]

, and work with the coordinate vectors relative to S as in Exercises

21. Prove: A subspace of a finite-dimensional vector space is finite-dimensional. 22. State the two parts of Theorem 4.5.2 in contrapositive form.

True-False Exercises In parts (a)–(j) determine whether the statement is true or false, and justify your answer. (a) The zero vector space has dimension zero. Answer: True (b) There is a set of 17 linearly independent vectors in

.

Answer: True (c) There is a set of 11 vectors that span

.

Answer: False (d) Every linearly independent set of five vectors in

is a basis for

Answer: True (e) Every set of five vectors that spans

is a basis for

.

Answer: True (f) Every set of vectors that spans Answer: True

contains a basis for

.

.

(g) Every linearly independent set of vectors in

is contained in some basis for

.

Answer: True (h) There is a basis for

consisting of invertible matrices.

Answer: True (i)

If A has size

and

are distinct matrices, then

dependent. Answer: True (j) There are at least two distinct three-dimensional subspaces of Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

.

is linearly

4.6 Change of Basis A basis that is suitable for one problem may not be suitable for another, so it is a common process in the study of vector spaces to change from one basis to another. Because a basis is the vector space generalization of a coordinate system, changing bases is akin to changing coordinate axes in and . In this section we will study problems related to change of basis.

Coordinate Maps If S = {v1, v2, ..., vn} is a basis for a finite-dimensional vector space V, and if

(v)_S = (c1, c2, ..., cn)

is the coordinate vector of v relative to S, then, as observed in Section 4.4, the mapping

v → (v)_S        (1)

creates a connection (a one-to-one correspondence) between vectors in the general vector space V and vectors in the familiar vector space R^n. We call 1 the coordinate map from V to R^n. In this section we will find it convenient to express coordinate vectors in the matrix form

[v]_S = [c1 c2 ··· cn]^T        (2)

where the square brackets emphasize the matrix notation (Figure 4.6.1).

Figure 4.6.1

Change of Basis There are many applications in which it is necessary to work with more than one coordinate system. In such cases it becomes important to know how the coordinates of a fixed vector relative to each coordinate system are related. This leads to the following problem.

The Change-of-Basis Problem If v is a vector in a finite-dimensional vector space V, and if we change the basis for V from a basis B to a basis B′, how are the coordinate vectors [v]_B and [v]_B′ related?

Remark To solve this problem, it will be convenient to refer to B as the “old basis” and B′ as the “new basis.” Thus, our objective is to find a relationship between the old and new coordinates of a fixed vector v in V. For simplicity, we will solve this problem for two-dimensional spaces. The solution for n-dimensional spaces is similar. Let

B = {u1, u2} and B′ = {u1′, u2′}

be the old and new bases, respectively. We will need the coordinate vectors for the new basis vectors relative to the old basis. Suppose they are

[u1′]_B = [a b]^T and [u2′]_B = [c d]^T        (3)

That is,

u1′ = a u1 + b u2
u2′ = c u1 + d u2        (4)

Now let v be any vector in V, and let

[v]_B′ = [k1 k2]^T        (5)

be the new coordinate vector, so that

v = k1 u1′ + k2 u2′        (6)

In order to find the old coordinates of v, we must express v in terms of the old basis B. To do this, we substitute 4 into 6. This yields

v = k1 (a u1 + b u2) + k2 (c u1 + d u2)

or

v = (k1 a + k2 c) u1 + (k1 b + k2 d) u2

Thus, the old coordinate vector for v is

[v]_B = [k1 a + k2 c   k1 b + k2 d]^T

which, by using 5, can be written as [v]_B = P [v]_B′, where P is the 2 × 2 matrix whose first column is [a b]^T and whose second column is [c d]^T. This equation states that the old coordinate vector [v]_B results when we multiply the new coordinate vector [v]_B′ on the left by the matrix P.

Since the columns of this matrix are the coordinates of the new basis vectors relative to the old basis [see 3] we have the following solution of the change-of-basis problem.

Solution of the Change-of-Basis Problem If we change the basis for a vector space V from an old basis B = {u1, u2, ..., un} to a new basis B′ = {u1′, u2′, ..., un′}, then for each vector v in V, the old coordinate vector [v]_B is related to the new coordinate vector [v]_B′ by the equation

[v]_B = P [v]_B′        (7)

where the columns of P are the coordinate vectors of the new basis vectors relative to the old basis; that is, the column vectors of P are

[u1′]_B, [u2′]_B, ..., [un′]_B        (8)

Transition Matrices The matrix P in Equation 7 is called the transition matrix from B′ to B. For emphasis, we will often denote it by P_B′→B. It follows from 8 that this matrix can be expressed in terms of its column vectors as

P_B′→B = [ [u1′]_B | [u2′]_B | ··· | [un′]_B ]        (9)

Similarly, the transition matrix from B to B′ can be expressed in terms of its column vectors as

P_B→B′ = [ [u1]_B′ | [u2]_B′ | ··· | [un]_B′ ]        (10)

Remark There is a simple way to remember both of these formulas using the terms “old basis” and “new basis” defined earlier in this section: In Formula 9 the old basis is B′ and the new basis is B, whereas in Formula 10 the old basis is B and the new basis is B′. Thus, both formulas can be restated as follows:

The columns of the transition matrix from an old basis to a new basis are the coordinate vectors of the old basis relative to the new basis.

E X A M P L E 1 Finding Transition Matrices Consider the bases

and

for

(a) Find the transition matrix

from

(b) Find the transition matrix

from B to

, where

to B. .

Solution (a) Here the old basis vectors are and and the new basis vectors are and . We want to find the coordinate matrices of the old basis vectors and relative to the new basis vectors and . To do this, first we observe that

from which it follows that

and hence that

(b) Here the old basis vectors are and and the new basis vectors are and . As in part (a), we want to find the coordinate matrices of the old basis vectors and relative to the new basis vectors and . To do this, observe that

from which it follows that

and hence that

Suppose now that B and B′ are bases for a finite-dimensional vector space V. Since multiplication by P_B′→B maps coordinate vectors relative to the basis B′ into coordinate vectors relative to the basis B, and P_B→B′ maps coordinate vectors relative to B into coordinate vectors relative to B′, it follows that for every vector v in V we have

[v]_B = P_B′→B [v]_B′        (11)

[v]_B′ = P_B→B′ [v]_B        (12)

E X A M P L E 2 Computing Coordinate Vectors Let B and B′ be the bases in Example 1. Use an appropriate formula to find [v]_B, given [v]_B′.

Solution To find [v]_B we need to make the transition from B′ to B. It follows from Formula 11 and part (a) of Example 1 that

Invertibility of Transition Matrices If B and B′ are bases for a finite-dimensional vector space V, then

P_B′→B P_B→B′ = P_B→B

because multiplication by P_B′→B P_B→B′ first maps the B-coordinates of a vector into B′-coordinates, and then maps those B′-coordinates back into the original B-coordinates. Since the net effect of the two operations is to leave each coordinate vector unchanged, we are led to conclude that P_B→B must be the identity matrix, that is,

P_B′→B P_B→B′ = I        (13)

(we omit the formal proof). For example, for the transition matrices obtained in Example 1, you can verify that the product P_B′→B P_B→B′ is the identity matrix.

It follows from 13 that P_B→B′ is invertible and that its inverse is P_B′→B. Thus, we have the following theorem.

THEOREM 4.6.1 If P is the transition matrix from a basis to a basis B for a finite-dimensional vector space V, then P is invertible and is the transition matrix from B to .
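The relationship in Theorem 4.6.1 is easy to check numerically. The sketch below uses the SymPy library with two illustrative bases for R^2 of our own choosing (not the bases of Example 1); it builds both transition matrices column by column, following Formulas 9 and 10, and confirms that their product is the identity.

    from sympy import Matrix, eye

    # Illustrative bases for R^2 (hypothetical values).
    B_old = [Matrix([1, 1]), Matrix([1, -1])]    # old basis B
    B_new = [Matrix([2, 1]), Matrix([3, 2])]     # new basis B'

    def coords(basis, v):
        # Coordinate vector of v relative to the given basis.
        M = Matrix.hstack(*basis)
        return M.solve(v)            # solves M * c = v

    # Formula 9: columns are the new basis vectors in old coordinates.
    P_new_to_old = Matrix.hstack(*[coords(B_old, u) for u in B_new])
    # Formula 10: columns are the old basis vectors in new coordinates.
    P_old_to_new = Matrix.hstack(*[coords(B_new, u) for u in B_old])

    print(P_new_to_old * P_old_to_new == eye(2))   # True, as in Equation 13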

An Efficient Method for Computing Transition Matrices for Rn Our next objective is to develop an efficient procedure for computing transition matrices between bases for R^n. As illustrated in Example 1, the first step in computing a transition matrix is to express each new basis vector as a linear combination of the old basis vectors. For R^n this involves solving n linear systems of n equations in n unknowns, each of which has the same coefficient matrix (why?). An efficient way to do this is by the method illustrated in Example 2 of Section 1.6, which is as follows:

A Procedure for Computing PB → B′
Step 1 Form the matrix [B′ | B].
Step 2 Use elementary row operations to reduce the matrix in Step 1 to reduced row echelon form.
Step 3 The resulting matrix will be [I | PB → B′].
Step 4 Extract the matrix PB → B′ from the right side of the matrix in Step 3.

This procedure is captured in the following diagram.

[new basis | old basis]  →  row operations  →  [I | transition from old to new]        (14)
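A sketch of this procedure in SymPy follows; the basis vectors below are illustrative placeholders rather than bases taken from the text. We form [B′ | B], row reduce, and read the transition matrix off the right-hand block.

    from sympy import Matrix

    # Illustrative old and new bases for R^2, written as columns (hypothetical values).
    B_old = Matrix([[1, 2],
                    [1, 3]])      # columns are the old basis vectors
    B_new = Matrix([[1, 0],
                    [1, 1]])      # columns are the new basis vectors

    # Step 1: form [ new basis | old basis ].
    M = Matrix.hstack(B_new, B_old)
    # Step 2: reduce to reduced row echelon form.
    R, _pivots = M.rref()
    # Steps 3-4: the left block is I, the right block is the transition matrix.
    n = B_old.shape[1]
    P_old_to_new = R[:, n:]
    print(P_old_to_new)

    # Check against Formula 12: P maps old coordinates to new coordinates.
    v = Matrix([3, 5])
    assert P_old_to_new * B_old.solve(v) == B_new.solve(v)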

E X A M P L E 3 Example 1 Revisited
In Example 1 we considered the bases B = {u1, u2} and B′ = {u1′, u2′} for R^2.
(a) Use Formula 14 to find the transition matrix from B′ to B.
(b) Use Formula 14 to find the transition matrix from B to B′.

Solution (a) Here B′ is the old basis and B is the new basis, so [new basis | old basis] = [B | B′]. Since the left side is already the identity matrix, no reduction is needed. We see by inspection that the transition matrix is the right-hand block, which agrees with the result in Example 1.

(b) Here B is the old basis and B′ is the new basis, so [new basis | old basis] = [B′ | B]. By reducing this matrix so that the left side becomes the identity, we obtain (verify) the transition matrix from the right-hand block, which also agrees with the result in Example 1.

Transition to the Standard Basis for Rn Note that in part (a) of the last example the column vectors of the matrix that made the transition from the basis to the standard basis turned out to be the vectors in written in column form. This illustrates the following general result.

THEOREM 4.6.2 Let B = {u1, u2, ..., un} be any basis for the vector space R^n and let S = {e1, e2, ..., en} be the standard basis for R^n. If the vectors in these bases are written in column form, then

PB → S = [u1 | u2 | ··· | un]        (15)

It follows from this theorem that if A = [u1 | u2 | ··· | un] is any invertible n × n matrix, then A can be viewed as the transition matrix from the basis {u1, u2, ..., un} for R^n to the standard basis for R^n. Thus, for example, the matrix

which was shown to be invertible in Example 4 of Section 1.5, is the transition matrix from the basis to the basis
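The sketch below (NumPy, with an arbitrary invertible matrix chosen purely for illustration) echoes this point: if the columns of an invertible matrix A are taken as a basis B, then multiplying the B-coordinate vector of v by A recovers v, so A acts as the transition matrix from B to the standard basis.

    import numpy as np

    # An arbitrary invertible matrix; its columns are taken as the basis B (illustrative values).
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 2.0]])

    v = np.array([3.0, 1.0, 4.0])          # a vector in standard coordinates
    coords_B = np.linalg.solve(A, v)       # coordinates of v relative to B
    print(np.allclose(A @ coords_B, v))    # True: A maps B-coordinates to standard coordinates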

Concept Review • Coordinate map • Change-of-basis problem • Transition matrix

Skills • Find coordinate vectors relative to a given basis directly. • Find the transition matrix from one basis to another. • Use the transition matrix to compute coordinate vectors.

Exercise Set 4.6 1. Find the coordinate vector for w relative to the basis (a) (b) (c) Answer: (a) (b)

for

.

(c)

2. Find the coordinate vector for v relative to the basis (a)

.

for

.

;

(b)

;

3. Find the coordinate vector for p relative to the basis (a) (b)

for

; ;

Answer: (a)

(b)

4. Find the coordinate vector for A relative to the basis

5. Consider the coordinate vectors

(a) Find w if S is the basis in Exercise 2(a). (b) Find q if S is the basis in Exercise 3(a). (c) Find B if S is the basis in Exercise 4. Answer: (a) (b) (c)

for

.

6. Consider the bases

and

(a) Find the transition matrix from

for

to B.

(b) Find the transition matrix from B to (c) Compute the coordinate vector

and use 10 to compute

, where

. , where

.

(d) Check your work by computing

directly.

7. Repeat the directions of Exercise 6 with the same vector w but with

Answer: (a)

(b)

(c)

8. Consider the bases

and

(a) Find the transition matrix from B to (b) Compute the coordinate vector

for

. , where

, where

and use 12 to compute

.

(c) Check your work by computing

directly.

9. Repeat the directions of Exercise 8 with the same vector w, but with

Answer: (a)

(b)

10. Consider the bases

and

(a) Find the transition matrix from

for

to B.

(b) Find the transition matrix from B to (c) Compute the coordinate vector

. , where

(d) Check your work by computing 11. Let V be the space spanned by (a) Show that

and and

(c) Find the transition matrix from B to (d) Compute the coordinate vector (e) Check your work by computing

(b)

, and use 12 to compute

.

directly. . form a basis for V.

(b) Find the transition matrix from

Answer:

where

to

.

. , where directly.

, and use 12 to obtain

.

(c)

(d) 12. Let S be the standard basis for (a) Find the transition matrix

, and let

be the basis in which

and

by inspection.

(b) Use Formula 14 to find the transition matrix (c) Confirm that

and

are inverses of one another.

(d) Let

Find

and then use Formula 11 to compute

(e) Let

Find

and then use Formula 12 to compute

13. Let S be the standard basis for , and (a) Find the transition matrix

, and let .

be the basis in which

by inspection.

(b) Use Formula 14 to find the transition matrix

.

(c) Confirm that

and

are inverses of one another.

(d) Let

. Find

and then use Formula 11 to compute

.

(e) Let

. Find

and then use Formula 12 to compute

.

Answer: (a)

(b)

(d)

(e)

14. Let

and

be the bases for and

in which .

(a) Use Formula 14 to find the transition matrix

.

(b) Use Formula 14 to find the transition matrix

.

(c) Confirm that

and

are inverses of one another.

,

(d) Let

. Find

and then use the matrix

to compute

from

.

(e) Let

. Find

and then use the matrix

to compute

from

.

15. Let

and

be the bases for

, and

in which

,

,

.

(a) Use Formula 14 to find the transition matrix

.

(b) Use Formula 14 to find the transition matrix

.

(c) Confirm that

and

are inverses of one another.

(d) Let

. Find

and then use the matrix

to compute

from

.

(e) Let

. Find

and then use the matrix

to compute

from

.

Answer: (a) (b) (d) (e) 16. Let

and , .

(a) Find the transition matrix (b) Let compute

,

be the bases for ,

in which

. Find and then use the transition matrix obtained in part (a) to by matrix multiplication. directly.

17. Follow the directions of Exercise 16 with the same vector w but with , and

(a)

, and

.

(c) Check the result in part (b) by computing

Answer:

,

, .

(b)

18. Let be the standard basis for , and let vectors in S are reflected about the line . (a) Find the transition matrix (b) Let

be the basis that results when the

.

and show that

.

19. Let be the standard basis for , and let be the basis that results when the vectors in S are reflected about the line that makes an angle with the positive x-axis. (a) Find the transition matrix (b) Let

.

and show that

.

Answer: (a) 20. If

,

, and

are bases for

then

, and if

.

21. If P is the transition matrix from a basis to a basis B, and Q is the transition matrix from B to a basis C, what is the transition matrix from to C? What is the transition matrix from C to ? 22. To write the coordinate vector for a vector, it is necessary to specify an order for the vectors in the basis. If P is the transition matrix from a basis to a basis B, what is the effect on P if we reverse the order of vectors in B from to ? What is the effect on P if we reverse the order of vectors in both and B? 23. Consider the matrix

(a) P is the transition matrix from what basis B to the standard basis (b) P is the transition matrix from the standard basis Answer: (a) (b)

for

?

to what basis B for

?

24. The matrix

is the transition matrix from what basis B to the basis

for

?

25. Let B be a basis for only if the vectors

. Prove that the vectors form a linearly independent set in form a linearly independent set in .

26. Let B be a basis for

. Prove that the vectors span .

27. If

holds for all vectors

in

span

if and

if and only if the vectors

, what can you say about the basis B?

True-False Exercises In parts (a)–(f) determine whether the statement is true or false, and justify your answer. (a) If

and

are bases for a vector space V, then there exists a transition matrix from

to

.

Answer: True (b) Transition matrices are invertible. Answer: True (c) If B is a basis for a vector space

, then

is the identity matrix.

Answer: True (d) If

is a diagonal matrix, then each vector in

is a scalar multiple of some vector in

.

Answer: True (e) If each vector in

is a scalar multiple of some vector in

, then

is a diagonal matrix.

Answer: False (f) If A is a square matrix, then Answer: False

for some bases

and

for

.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

4.7 Row Space, Column Space, and Null Space In this section we will study some important vector spaces that are associated with matrices. Our work here will provide us with a deeper understanding of the relationships between the solutions of a linear system and properties of its coefficient matrix.

Row Space, Column Space, and Null Space Recall that vectors can be written in comma-delimited form or in matrix form as either row vectors or column vectors. In this section we will use the latter two.

DEFINITION 1 For an m × n matrix A, the vectors r1, r2, ..., rm in R^n that are formed from the rows of A are called the row vectors of A, and the vectors c1, c2, ..., cn in R^m formed from the columns of A are called the column vectors of A.

E X A M P L E 1 Row and Column Vectors of a 2 × 3 Matrix Let

The row vectors of A are and the column vectors of A are

The following definition defines three important vector spaces associated with a matrix.

DEFINITION 2 If A is an m × n matrix, then the subspace of R^n spanned by the row vectors of A is called the row space of A, and the subspace of R^m spanned by the column vectors of A is called the column space of A. The solution space of the homogeneous system of equations Ax = 0, which is a subspace of R^n, is called the null space of A.

In this section and the next we will be concerned with two general questions:

Question 1. What relationships exist among the solutions of a linear system Ax = b and the row space, column space, and null space of the coefficient matrix A?

Question 2. What relationships exist among the row space, column space, and null space of a matrix?

Starting with the first question, suppose that A is an m × n matrix and that x is an n × 1 column vector with entries x1, x2, ..., xn. It follows from Formula 10 of Section 1.3 that if c1, c2, ..., cn denote the column vectors of A, then the product Ax can be expressed as a linear combination of these vectors with coefficients from x; that is,

Ax = x1 c1 + x2 c2 + ··· + xn cn        (1)

Thus, a linear system, Ax = b, of m equations in n unknowns can be written as

x1 c1 + x2 c2 + ··· + xn cn = b        (2)

from which we conclude that Ax = b is consistent if and only if b is expressible as a linear combination of the column vectors of A. This yields the following theorem.

THEOREM 4.7.1 A system of linear equations Ax = b is consistent if and only if b is in the column space of A.

E X A M P L E 2 A Vector b in the Column Space of A
Let Ax = b be the linear system. Show that b is in the column space of A by expressing it as a linear combination of the column vectors of A.

Solution Solving the system by Gaussian elimination yields the solution (verify), and it follows from this and Formula 2 that b is the corresponding linear combination of the column vectors of A.
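One way to test the condition in Theorem 4.7.1 computationally: Ax = b is consistent, that is, b lies in the column space of A, exactly when appending b to A does not increase the rank. A sketch with SymPy, using a small illustrative system of our own rather than the one in Example 2:

    from sympy import Matrix, linsolve, symbols

    # Illustrative data (hypothetical values).
    A = Matrix([[1, 2],
                [2, 4],
                [1, 1]])
    b = Matrix([3, 6, 2])

    in_col_space = A.rank() == Matrix.hstack(A, b).rank()
    print(in_col_space)          # True here: b is in the column space of A

    if in_col_space:
        x1, x2 = symbols('x1 x2')
        # Any solution of Ax = b gives the coefficients that express b
        # as a linear combination of the column vectors of A.
        print(linsolve((A, b), x1, x2))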

Recall from Theorem 3.4.4 that the general solution of a consistent linear system Ax = b can be obtained by adding any specific solution of this system to the general solution of the corresponding homogeneous system Ax = 0. Keeping in mind that the null space of A is the same as the solution space of Ax = 0, we can rephrase that theorem in the following vector form.

THEOREM 4.7.2 If x0 is any solution of a consistent linear system Ax = b, and if S = {v1, v2, ..., vk} is a basis for the null space of A, then every solution of Ax = b can be expressed in the form

x = x0 + c1 v1 + c2 v2 + ··· + ck vk        (3)

Conversely, for all choices of scalars c1, c2, ..., ck, the vector x in this formula is a solution of Ax = b.

Equation 3 gives a formula for the general solution of Ax = b. The vector x0 in that formula is called a particular solution of Ax = b, and the remaining part of the formula is called the general solution of Ax = 0. In words, this formula tells us that:

The general solution of a consistent linear system can be expressed as the sum of a particular solution of that system and the general solution of the corresponding homogeneous system.

Geometrically, the solution set of Ax = b can be viewed as the translation by x0 of the solution space of Ax = 0 (Figure 4.7.1).

Figure 4.7.1

E X A M P L E 3 General Solution of a Linear System Ax = b In the concluding subsection of Section 3.4 we compared solutions of the linear systems

and deduced that the general solution of the nonhomogeneous system and the general solution of the corresponding homogeneous system (when written in column-vector form) are related as in Formula 3.

Recall from the Remark following Example 4 of Section 4.5 that the vectors in that formula form a basis for the solution space of Ax = 0.

Bases for Row Spaces, Column Spaces, and Null Spaces We first developed elementary row operations for the purpose of solving linear systems, and we know from that work that performing an elementary row operation on an augmented matrix does not change the solution set of the corresponding linear system. It follows that applying an elementary row operation to a matrix A does not change the solution set of the corresponding linear system , or, stated another way, it does not change the null space of A. Thus we have the following theorem.

THEOREM 4.7.3 Elementary row operations do not change the null space of a matrix.

The following theorem, whose proof is left as an exercise, is a companion to Theorem 4.7.3.

THEOREM 4.7.4 Elementary row operations do not change the row space of a matrix.

Theorems 4.7.3 and 4.7.4 might tempt you into incorrectly believing that elementary row operations do not change the column space of a matrix. To see why this is not true, compare the matrices

The matrix B can be obtained from A by adding −2 times the first row to the second. However, this operation has changed the column space of A, since that column space consists of all scalar multiples of

whereas the column space of B consists of all scalar multiples of

and the two are different spaces.

E X A M P L E 4 Finding a Basis for the Null Space of a Matrix Find a basis for the null space of the matrix

Solution The null space of A is the solution space of the homogeneous linear system Ax = 0, which, as shown in Example 3, has the basis

Remark Observe that the basis vectors , , and in the last example are the vectors that result by successively setting one of the parameters in the general solution equal to 1 and the others equal to 0. The following theorem makes it possible to find bases for the row and column spaces of a matrix in row echelon form by inspection.

THEOREM 4.7.5 If a matrix R is in row echelon form, then the row vectors with the leading 1′s (the nonzero row vectors) form a basis for the row space of R, and the column vectors with the leading 1′s of the row vectors form a basis for the column space of R.

The proof involves little more than an analysis of the positions of the 0′s and 1′s of R. We omit the details.

E X A M P L E 5 Bases for Row and Column Spaces The matrix

is in row echelon form. From Theorem 4.7.5, the vectors

form a basis for the row space of R, and the vectors

form a basis for the column space of R.

E X A M P L E 6 Basis for a Row Space by Row Reduction Find a basis for the row space of the matrix

Solution Since elementary row operations do not change the row space of a matrix, we can find a basis for the row space of A by finding a basis for the row space of any row echelon form of A. Reducing A to row echelon form, we obtain (verify)

By Theorem 4.7.5, the nonzero row vectors of R form a basis for the row space of R and hence form a basis for the row space of A. These basis vectors are

The problem of finding a basis for the column space of a matrix A in Example 6 is complicated by the fact that an elementary row operation can alter its column space. However, the good news is that elementary row operations do not alter dependence relationships among the column vectors. To make this more precise, suppose that w1, w2, ..., wk are linearly dependent column vectors of A, so there are scalars c1, c2, ..., ck that are not all zero and such that

c1 w1 + c2 w2 + ··· + ck wk = 0        (4)

If we perform an elementary row operation on A, then these vectors will be changed into new column vectors w1′, w2′, ..., wk′. At first glance it would seem possible that the transformed vectors might be linearly independent. However, this is not so, since it can be proved that these new column vectors will be linearly dependent and, in fact, related by an equation that has exactly the same coefficients as 4. It follows from the fact that elementary row operations are reversible that they also preserve linear independence among column vectors (why?). The following theorem summarizes all of these results.

THEOREM 4.7.6 If A and B are row equivalent matrices, then: (a) A given set of column vectors of A is linearly independent if and only if the corresponding column vectors of B are linearly independent.

(b) A given set of column vectors of A forms a basis for the column space of A if and only if the corresponding column vectors of B form a basis for the column space of B.

E X A M P L E 7 Basis for a Column Space by Row Reduction Find a basis for the column space of the matrix

Solution We observed in Example 6 that the matrix

is a row echelon form of A. Keeping in mind that A and R can have different column spaces, we cannot find a basis for the column space of A directly from the column vectors of R. However, it follows from Theorem 4.7.6b that if we can find a set of column vectors of R that forms a basis for the column space of R, then the corresponding column vectors of A will form a basis for the column space of A. Since the first, third, and fifth columns of R contain the leading 1′s of the row vectors, the vectors

form a basis for the column space of R. Thus, the corresponding column vectors of A, which are

form a basis for the column space of A.
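The step used in Example 7 (reduce A, locate the pivot columns, then take the corresponding columns of the original A) can be sketched in SymPy as follows, again with an illustrative matrix rather than the one in the example:

    from sympy import Matrix

    # Illustrative matrix (hypothetical values).
    A = Matrix([[1, 2, 0, 1],
                [2, 4, 1, 4],
                [3, 6, 1, 5]])

    R, pivot_cols = A.rref()                               # pivot_cols marks the leading 1's
    basis_for_col_space = [A[:, j] for j in pivot_cols]    # columns of A, not of R
    print(pivot_cols)                                      # e.g. (0, 2)
    for c in basis_for_col_space:
        print(c.T)

    # SymPy's built-in shortcut gives the same basis:
    assert basis_for_col_space == A.columnspace()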

Up to now we have focused on methods for finding bases associated with matrices. Those methods can readily be adapted to the more general problem of finding a basis for the space spanned by a set of vectors in .

E X A M P L E 8 Basis for a Vector Space Using Row Operations Find a basis for the subspace of

spanned by the vectors

Solution The space spanned by these vectors is the row space of the matrix

Reducing this matrix to row echelon form, we obtain

The nonzero row vectors in this matrix are These vectors form a basis for the row space and consequently form a basis for the subspace of spanned by , , , and .

Bases Formed from Row and Column Vectors of a Matrix In all of the examples we have considered thus far we have looked for bases in which no restrictions were imposed on the individual vectors in the basis. We now want to focus on the problem of finding a basis for the row space of a matrix A consisting entirely of row vectors from A and a basis for the column space of A consisting entirely of column vectors of A. Looking back on our earlier work, we see that the procedure followed in Example 7 did, in fact, produce a basis for the column space of A consisting of column vectors of A, whereas the procedure used in Example 6 produced a basis for the row space of A, but that basis did not consist of row vectors of A. The following example shows how to adapt the procedure from Example 7 to find a basis for the row space of a matrix that is formed from its row vectors.

E X A M P L E 9 Basis for the Row Space of a Matrix Find a basis for the row space of

consisting entirely of row vectors from A. Solution We will transpose A, thereby converting the row space of A into the column space of A^T; then we will use the method of Example 7 to find a basis for the column space of A^T; and then we will transpose again to convert column vectors back to row vectors. Transposing A yields

Reducing this matrix to row echelon form yields

The first, second, and fourth columns contain the leading 1′s, so the corresponding column vectors in form a basis for the column space of ; these are

Transposing again and adjusting the notation appropriately yields the basis vectors and for the row space of A.

Next, we will give an example that adapts the methods we have developed above to solve the following general problem in :

PROBLEM Given a set of vectors S = {v1, v2, ..., vk} in R^n, find a subset of these vectors that forms a basis for span(S), and express those vectors that are not in that basis as a linear combination of the basis vectors.

E X A M P L E 1 0 Basis and Linear Combinations (a) Find a subset of the vectors

that forms a basis for the space spanned by these vectors. (b) Express each vector not in the basis as a linear combination of the basis vectors. Solution

(a) We begin by constructing a matrix that has

as its column vectors:

(5)

The first part of our problem can be solved by finding a basis for the column space of this matrix. Reducing the matrix to reduced row echelon form and denoting the column vectors of the resulting matrix by , , , , and yields

(6)

The leading 1′s occur in columns 1, 2, and 4, so by Theorem 4.7.5, is a basis for the column space of 6, and consequently, is a basis for the column space of 5. (b) We will start by expressing and as linear combinations of the basis vectors , , . The simplest way of doing this is to express and in terms of basis vectors with smaller subscripts. Accordingly, we will express as a linear combination of and , and we will express as a linear combination of , , and . By inspection of 6, these linear combinations are

We call these the dependency equations. The corresponding relationships in 5 are

The following is a summary of the steps that we followed in our last example to solve the problem posed above.

Basis for Span(S)
Step 1. Form the matrix A having the vectors in S = {v1, v2, ..., vk} as column vectors.
Step 2. Reduce the matrix A to reduced row echelon form R.
Step 3. Denote the column vectors of R by w1, w2, ..., wk.

Step 4. Identify the columns of R that contain the leading 1′s. The corresponding column vectors of A form a basis for span(S). This completes the first part of the problem. Step 5. Obtain a set of dependency equations by expressing each column vector of R that does not contain a leading 1 as a linear combination of preceding column vectors that do contain leading 1′s.

Step 6. Replace the column vectors of R that appear in the dependency equations by the corresponding column vectors of A. This completes the second part of the problem.
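A compact sketch of these six steps in SymPy follows; the vectors below are illustrative placeholders, not those of Example 10. The pivot columns of R identify the basis vectors, and each non-pivot column of R holds the coefficients of its dependency equation.

    from sympy import Matrix

    # Step 1: illustrative vectors (hypothetical values), placed as the columns of A.
    S = [Matrix([1, 2, 1]), Matrix([2, 4, 2]), Matrix([0, 1, 1]), Matrix([1, 3, 2])]
    A = Matrix.hstack(*S)

    # Steps 2-3: reduced row echelon form and its columns.
    R, pivots = A.rref()

    # Step 4: the columns of A in pivot positions form a basis for span(S).
    basis = [S[j] for j in pivots]
    print("basis columns:", pivots)

    # Steps 5-6: each non-pivot column of R expresses the corresponding vector
    # of S as a combination of the basis vectors (its dependency equation).
    for j in range(len(S)):
        if j not in pivots:
            coeffs = [R[i, j] for i in range(len(pivots))]
            print(f"v{j+1} =", " + ".join(f"({c})*v{p+1}" for c, p in zip(coeffs, pivots)))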

Concept Review • Row vectors • Column vectors • Row space • Column space • Null space • General solution • Particular solution • Relationships among linear systems and row spaces, column spaces, and null spaces • Relationships among the row space, column space, and null space of a matrix • Dependency equations

Skills • Determine whether a given vector is in the column space of a matrix; if it is, express it as a linear combination of the column vectors of the matrix. • Find a basis for the null space of a matrix. • Find a basis for the row space of a matrix. • Find a basis for the column space of a matrix. • Find a basis for the span of a set of vectors in

.

Exercise Set 4.7 1. List the row vectors and column vectors of the matrix

Answer: ;

2. Express the product

as a linear combination of the column vectors of A.

(a) (b)

(c)

(d)

3. Determine whether of A.

is in the column space of A, and if so, express

(a) (b)

(c)

(d)

(e)

Answer: (a) (b) b is not in the column space of A. (c)

(d)

(e)

as a linear combination of the column vectors

4. Suppose that , , , the solution set of the homogeneous system

is a solution of a nonhomogeneous linear system is given by the formulas

(a) Find a vector form of the general solution of

.

(b) Find a vector form of the general solution of

.

5. In parts (a)–(d), find the vector form of the general solution of the given linear system find the vector form of the general solution of . (a) (b)

(c)

(d)

Answer: (a) (b)

(c)

(d)

6. Find a basis for the null space of A. (a)

(b)

and that

; then use that result to

(c)

(d)

(e)

7. In each part, a matrix in row echelon form is given. By inspection, find bases for the row and column spaces of A. (a)

(b)

(c)

(d)

Answer: (a)

(b)

(c)

,

(d)

8. For the matrices in Exercise 6, find a basis for the row space of A by reducing the matrix to row echelon form. 9. By inspection, find a basis for the row space and a basis for the column space of each matrix. (a)

(b)

(c)

(d)

Answer: (a)

(b)

(c)

(d)

10. For the matrices in Exercise 6, find a basis for the row space of A consisting entirely of row vectors of A. 11. Find a basis for the subspace of (a)

,

(b)

,

spanned by the given vectors. , ,

(c) Answer: (a) (b) (c) (1, 1, 0, 0), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1) 12. Find a subset of the vectors that forms a basis for the space spanned by the vectors; then express each vector that is not in the basis as a linear combination of the basis vectors. (a) (b) (c) 13. Prove that the row vectors of an

invertible matrix A form a basis for

.

14. Construct a matrix whose null space consists of all linear combinations of the vectors

15. (a) Let

Show that relative to an -coordinate system in 3-space the null space of A consists of all points on the z-axis and that the column space consists of all points in the xy-plane (see the accompanying figure). (b) Find a

matrix whose null space is the x-axis and whose column space is the yz-plane.

Figure Ex-15 Answer: (b)

16. Find a

matrix whose null space is

(a) a point. (b) a line. (c) a plane. 17. (a) Find all

matrices whose null space is the line

(b) Sketch the null spaces of the following matrices:

Answer: (a)

for all real numbers a, b not both 0.

(b) Since A and B are invertible, their null spaces are the origin. The null space of C is the line space of D is the entire xy-plane.

. The null

18. The equation can be viewed as a linear system of one equation in three unknowns. Express its general solution as a particular solution plus the general solution of the corresponding homogeneous system. [Suggestion: Write the vectors in column form.] 19. Suppose that A and B are matrices and A is invertible. Invent and prove a theorem that describes how the row spaces of and B are related.

True-False Exercises In parts (a)–(j) determine whether the statement is true or false, and justify your answer. (a) The span of

is the column space of the matrix whose column vectors are

.

Answer: True (b) The column space of a matrix A is the set of solutions of

.

Answer: False (c) If R is the reduced row echelon form of A, then those column vectors of R that contain the leading 1′s form a basis for the column space of A.

Answer: False (d) The set of nonzero row vectors of a matrix A is a basis for the row space of A. Answer: False (e) If A and B are

matrices that have the same row space, then A and B have the same column space.

Answer: False (f) If E is an of A.

elementary matrix and A is an

matrix, then the null space of E A is the same as the null space

elementary matrix and A is an

matrix, then the row space of E A is the same as the row space

elementary matrix and A is an

matrix, then the column space of E A is the same as the column

Answer: True (g) If E is an of A. Answer: True (h) If E is an space of A. Answer: False (i) The system

is inconsistent if and only if

is not in the column space of A.

Answer: True (j) There is an invertible matrix A and a singular matrix B such that the row spaces of A and B are the same. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

4.8 Rank, Nullity, and the Fundamental Matrix Spaces In the last section we investigated relationships between a system of linear equations and the row space, column space, and null space of its coefficient matrix. In this section we will be concerned with the dimensions of those spaces. The results we obtain will provide a deeper insight into the relationship between a linear system and its coefficient matrix.

Row and Column Spaces Have Equal Dimensions In Examples 6 and 7 of Section 4.7 we found that the row and column spaces of the matrix

both have three basis vectors and hence are both three-dimensional. The fact that these spaces have the same dimension is not accidental, but rather a consequence of the following theorem.

THEOREM 4.8.1 The row space and column space of a matrix A have the same dimension.

Proof Let R be any row echelon form of A. It follows from Theorem 4.7.4 and Theorem 4.7.6 b that

so it suffices to show that the row and column spaces of R have the same dimension. But the dimension of the row space of R is the number of nonzero rows, and by Theorem 4.7.5 the dimension of the column space of R is the number of leading 1′s. Since these two numbers are the same, the row and column space have the same dimension.

Rank and Nullity The dimensions of the row space, column space, and null space of a matrix are such important numbers that there is some notation and terminology associated with them.

DEFINITION 1 The common dimension of the row space and column space of a matrix A is called the rank of A and is denoted by rank(A); the dimension of the null space of A is called the nullity of A and is denoted by nullity(A).

The proof of Theorem 4.8.1 shows that the rank of A can be interpreted as the number of leading 1′s in any row echelon form of A.

E X A M P L E 1 Rank and Nullity of a 4 × 6 Matrix Find the rank and nullity of the matrix

Solution The reduced row echelon form of A is

(1)

(verify). Since this matrix has two leading 1′s, its row and column spaces are two-dimensional and rank(A) = 2. To find the nullity of A, we must find the dimension of the solution space of the linear system Ax = 0. This system can be solved by reducing its augmented matrix to reduced row echelon form. The resulting matrix will be identical to 1, except that it will have an additional last column of zeros, and hence the corresponding system of equations will be

Solving these equations for the leading variables yields (2) from which we obtain the general solution

or in column vector form

(3)

Because the four vectors on the right side of 3 form a basis for the solution space, nullity(A) = 4.

E X A M P L E 2 Maximum Value for Rank
What is the maximum possible rank of an m × n matrix A that is not square?

Solution Since the row vectors of A lie in R^n and the column vectors in R^m, the row space of A is at most n-dimensional and the column space is at most m-dimensional. Since the rank of A is the common dimension of its row and column space, it follows that the rank is at most the smaller of m and n. We denote this by writing

rank(A) ≤ min(m, n)

in which min(m, n) is the minimum of m and n.

The following theorem establishes an important relationship between the rank and nullity of a matrix.

THEOREM 4.8.2 Dimension Theorem for Matrices If A is a matrix with n columns, then

rank(A) + nullity(A) = n        (4)

Proof Since A has n columns, the homogeneous linear system Ax = 0 has n unknowns (variables). These fall into two distinct categories: the leading variables and the free variables. Thus,

[number of leading variables] + [number of free variables] = n

But the number of leading variables is the same as the number of leading 1′s in the reduced row echelon form of A, which is the rank of A; and the number of free variables is the same as the number of parameters in the general solution of Ax = 0, which is the nullity of A. This yields Formula 4.

E X A M P L E 3 The Sum of Rank and Nullity The matrix

has 6 columns, so rank(A) + nullity(A) = 6. This is consistent with Example 1, where we showed that rank(A) = 2 and nullity(A) = 4.
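The Dimension Theorem is easy to confirm in software. A sketch with SymPy, using an illustrative 4 × 6 matrix of our own choosing (not the matrix of Example 1):

    from sympy import Matrix

    # Illustrative 4 x 6 matrix (hypothetical entries).
    A = Matrix([[1, 2, 0, 1, 3, 1],
                [2, 4, 1, 3, 7, 2],
                [0, 0, 1, 1, 1, 0],
                [3, 6, 1, 4, 10, 3]])

    n = A.shape[1]
    rank = A.rank()
    nullity = len(A.nullspace())
    print(rank, nullity, rank + nullity == n)   # rank + nullity equals the number of columns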

The following theorem, which summarizes results already obtained, interprets rank and nullity in the context of a homogeneous linear system.

THEOREM 4.8.3 If A is an m × n matrix, then
(a) rank(A) = the number of leading variables in the general solution of Ax = 0.
(b) nullity(A) = the number of parameters in the general solution of Ax = 0.

E X A M P L E 4 Number of Parameters in a General Solution
Find the number of parameters in the general solution of Ax = 0 if A is a matrix with seven columns and rank 3.

Solution From 4, nullity(A) = n − rank(A) = 7 − 3 = 4. Thus there are four parameters.

Equivalence Theorem In Theorem 2.3.8 we listed seven results that are equivalent to the invertibility of a square matrix A. We are now in a position to add eight more results to that list to produce a single theorem that summarizes most of the topics we have covered thus far.

THEOREM 4.8.4 Equivalent Statements If A is an n × n matrix, then the following statements are equivalent.
(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row echelon form of A is I_n.
(d) A is expressible as a product of elementary matrices.
(e) Ax = b is consistent for every n × 1 matrix b.
(f) Ax = b has exactly one solution for every n × 1 matrix b.
(g) det(A) ≠ 0.
(h) The column vectors of A are linearly independent.
(i) The row vectors of A are linearly independent.
(j) The column vectors of A span R^n.
(k) The row vectors of A span R^n.
(l) The column vectors of A form a basis for R^n.
(m) The row vectors of A form a basis for R^n.
(n) A has rank n.
(o) A has nullity 0.

Proof The equivalence of (h) through (m) follows from Theorem 4.5.4 (we omit the details). To complete the proof we will show that (b), (n), and (o) are equivalent by proving the chain of implications (b) ⇒ (o) ⇒ (n) ⇒ (b).

(b) ⇒ (o) If Ax = 0 has only the trivial solution, then there are no parameters in that solution, so nullity(A) = 0 by Theorem 4.8.3b.
(o) ⇒ (n) Theorem 4.8.2.
(n) ⇒ (b) If A has rank n, then Theorem 4.8.3a implies that there are n leading variables (hence no free variables) in the general solution of Ax = 0. This leaves the trivial solution as the only possibility.

Overdetermined and Underdetermined Systems In many applications the equations in a linear system correspond to physical constraints or conditions that must be satisfied. In general, the most desirable systems are those that have the same number of constraints as unknowns, since such systems often have a unique solution. Unfortunately, it is not always possible to match the number of constraints and unknowns, so researchers are often faced with linear systems that have more constraints than unknowns, called overdetermined systems, or with fewer constraints than unknowns, called underdetermined systems. The following two theorems will help us to analyze both overdetermined and underdetermined systems.

In engineering and other applications, the occurrence of an overdetermined or underdetermined linear system often signals that one or more variables were omitted in formulating the problem or that extraneous variables were included. This often leads to some kind of undesirable physical result.

THEOREM 4.8.5 If Ax = b is a consistent linear system of m equations in n unknowns, and if A has rank r, then the general solution of the system contains n − r parameters.

Proof It follows from Theorem 4.7.2 that the number of parameters is equal to the nullity of A, which, by Theorem 4.8.2, is n − r.

THEOREM 4.8.6 Let A be an m × n matrix.
(a) (Overdetermined Case) If m > n, then the linear system Ax = b is inconsistent for at least one vector b in R^m.
(b) (Underdetermined Case) If m < n, then for each vector b in R^m the linear system Ax = b is either inconsistent or has infinitely many solutions.

Proof (a) Assume that m > n, in which case the column vectors of A cannot span R^m (fewer vectors than the dimension of R^m). Thus, there is at least one vector b in R^m that is not in the column space of A, and for that b the system Ax = b is inconsistent by Theorem 4.7.1.

Proof (b) Assume that m < n. For each vector b in R^m there are two possibilities: either the system Ax = b is consistent or it is inconsistent. If it is inconsistent, then the proof is complete. If it is consistent, then Theorem 4.8.5 implies that the general solution has n − r parameters, where r = rank(A). But rank(A) is at most the smaller of m and n, so

n − r ≥ n − m > 0

This means that the general solution has at least one parameter and hence there are infinitely many solutions.

E X A M P L E 5 Overdetermined and Underdetermined Systems

(a) What can you say about the solutions of an overdetermined system unknowns in which A has rank ? (b) What can you say about the solutions of an underdetermined system unknowns in which A has rank ?

of 7 equations in 5 of 5 equations in 7

Solution (a) The system is consistent for some vector in the general solution is .

, and for any such

the number of parameters in

(b) The system may be consistent or inconsistent, but if it is consistent for the vector general solution has parameters.

in

, then the

E X A M P L E 6 An Overdetermined System The linear system

is overdetermined, so it cannot be consistent for all possible values of b1, b2, b3, b4, and b5. Exact conditions under which the system is consistent can be obtained by solving the linear system by Gauss–Jordan elimination. We leave it for you to show that the augmented matrix is row equivalent to

(5)

Thus, the system is consistent if and only if

,

,

,

, and

satisfy the conditions

Solving this homogeneous linear system yields where r and s are arbitrary.

Remark The coefficient matrix for the linear system in the last example has n = 2 columns, and it has rank r = 2 because there are two nonzero rows in its reduced row echelon form. This implies that when the system is consistent its general solution will contain n − r = 0 parameters; that is, the solution will be unique. With a moment's thought, you should be able to see that this is so from 5.

The Fundamental Spaces of a Matrix There are six important vector spaces associated with a matrix A and its transpose A^T:

row space of A        row space of A^T
column space of A     column space of A^T
null space of A       null space of A^T

However, transposing a matrix converts row vectors into column vectors and conversely, so except for a difference in notation, the row space of A^T is the same as the column space of A, and the column space of A^T is the same as the row space of A. Thus, of the six spaces listed above, only the following four are distinct:

row space of A        column space of A
null space of A       null space of A^T

If A is an m × n matrix, then the row space and null space of A are subspaces of R^n, and the column space of A and the null space of A^T are subspaces of R^m. These are called the fundamental spaces of a matrix A. We will conclude this section by discussing how these four subspaces are related. Let us focus for a moment on the matrix A^T. Since the row space and column space of a matrix have the same dimension, and since transposing a matrix converts its columns to rows and its rows to columns, the following result should not be surprising.

THEOREM 4.8.7 If A is any matrix, then rank(A) = rank(A^T).

Proof rank(A) = dim(row space of A) = dim(column space of A^T) = rank(A^T).

This result has some important implications. For example, if A is an m × n matrix, then applying Formula 4 to the matrix A^T and using the fact that this matrix has m columns yields

rank(A^T) + nullity(A^T) = m

which, by virtue of Theorem 4.8.7, can be rewritten as

rank(A) + nullity(A^T) = m        (6)

This alternative form of Formula 4 in Theorem 4.8.2 makes it possible to express the dimensions of all four fundamental spaces in terms of the size and rank of A. Specifically, if rank(A) = r, then

dim[row space of A] = r          dim[column space of A] = r
dim[null space of A] = n − r     dim[null space of A^T] = m − r        (7)

The four formulas in 7 provide an algebraic relationship between the size of a matrix and the dimensions of its fundamental spaces. Our next objective is to find a geometric relationship between the fundamental spaces themselves. For this purpose recall from Theorem 3.4.3 that if A is an m × n matrix, then the null space of A consists of those vectors that are orthogonal to each of the row vectors of A. To develop that idea in more detail, we make the following definition.
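A sketch in SymPy, using an illustrative matrix, that checks Theorem 4.8.7 and the four dimension formulas in 7:

    from sympy import Matrix

    # Illustrative m x n matrix (hypothetical entries).
    A = Matrix([[1, 0, 2, 1],
                [2, 1, 5, 1],
                [1, 1, 3, 0]])
    m, n = A.shape
    r = A.rank()

    assert A.T.rank() == r                      # Theorem 4.8.7: rank(A^T) = rank(A)
    assert len(A.rowspace()) == r               # dim(row space of A)      = r
    assert len(A.columnspace()) == r            # dim(column space of A)   = r
    assert len(A.nullspace()) == n - r          # dim(null space of A)     = n - r
    assert len(A.T.nullspace()) == m - r        # dim(null space of A^T)   = m - r
    print(m, n, r)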

DEFINITION 2 If W is a subspace of R^n, then the set of all vectors in R^n that are orthogonal to every vector in W is called the orthogonal complement of W and is denoted by the symbol W⊥.

The following theorem lists three basic properties of orthogonal complements. We will omit the formal proof because a more general version of this theorem will be given later in the text.

THEOREM 4.8.8 If W is a subspace of R^n, then:

(a) W⊥ is a subspace of R^n.

(b) The only vector common to W and W⊥ is 0.

(c) The orthogonal complement of W⊥ is W.

E X A M P L E 7 Orthogonal Complements In R^2 the orthogonal complement of a line W through the origin is the line through the origin that is perpendicular to W (Figure 4.8.1a); and in R^3 the orthogonal complement of a plane W through the origin is the line through the origin that is perpendicular to that plane (Figure 4.8.1b).

Figure 4.8.1

Explain why _____ and _____ are orthogonal complements.

A Geometric Link Between the Fundamental Spaces The following theorem provides a geometric link between the fundamental spaces of a matrix. Part (a) is essentially a restatement of Theorem 3.4.3 in the language of orthogonal complements, and part (b), whose proof is left as an exercise, follows from part (a). The essential idea of the theorem is illustrated in Figure 4.8.2.

THEOREM 4.8.9 If A is an m × n matrix, then:

(a) The null space of A and the row space of A are orthogonal complements in R^n.

(b) The null space of A^T and the column space of A are orthogonal complements in R^m.

Figure 4.8.2
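The orthogonality asserted in Theorem 4.8.9 can be verified numerically. The sketch below (Python/NumPy; the matrix is an illustrative choice of ours) computes bases for null(A) and null(A^T) from the singular value decomposition and checks that they are orthogonal to the rows and columns of A, respectively.

import numpy as np

A = np.array([[1., 2., 0., 1., 3.],
              [2., 4., 1., 3., 7.],
              [3., 6., 1., 4., 10.]])
r = np.linalg.matrix_rank(A)

# Basis for null(A): rows of Vt corresponding to zero singular values.
_, _, Vt = np.linalg.svd(A)
null_A = Vt[r:].T                      # columns span the null space of A

# Theorem 4.8.9(a): every null-space vector is orthogonal to every row of A.
print(np.allclose(A @ null_A, 0))      # True

# Theorem 4.8.9(b): null(A^T) is orthogonal to the column space of A.
_, _, Wt = np.linalg.svd(A.T)
null_AT = Wt[r:].T
print(np.allclose(A.T @ null_AT, 0))   # True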

More on the Equivalence Theorem

As our final result in this section, we will add two more statements to Theorem 4.8.4. We leave the proof that those statements are equivalent to the rest as an exercise.

THEOREM 4.8.10 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent.

(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row echelon form of A is I_n.
(d) A is expressible as a product of elementary matrices.
(e) Ax = b is consistent for every n × 1 matrix b.
(f) Ax = b has exactly one solution for every n × 1 matrix b.
(g) det(A) ≠ 0.
(h) The column vectors of A are linearly independent.
(i) The row vectors of A are linearly independent.
(j) The column vectors of A span R^n.
(k) The row vectors of A span R^n.
(l) The column vectors of A form a basis for R^n.
(m) The row vectors of A form a basis for R^n.
(n) A has rank n.
(o) A has nullity 0.
(p) The orthogonal complement of the null space of A is R^n.
(q) The orthogonal complement of the row space of A is {0}.
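As a quick numerical illustration of several of these equivalences, the following Python/NumPy sketch (the invertible matrix is an arbitrary example of ours) confirms statements (g), (n), (o), and (f) for one matrix.

import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
n = A.shape[0]

print(abs(np.linalg.det(A)) > 1e-12)       # (g) det(A) != 0
print(np.linalg.matrix_rank(A) == n)       # (n) rank n
print(n - np.linalg.matrix_rank(A) == 0)   # (o) nullity 0

b = np.array([1., 2., 3.])
x = np.linalg.solve(A, b)                  # (f) the unique solution of Ax = b
print(np.allclose(A @ x, b))               # True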

Applications of Rank The advent of the Internet has stimulated research on finding efficient methods for transmitting large amounts of digital data over communications lines with limited bandwidths. Digital data are commonly stored in matrix form, and many techniques for improving transmission speed use the rank of a matrix in some way. Rank plays a role because it measures the "redundancy" in a matrix in the sense that if A is an m × n matrix of rank k, then n − k of the column vectors and m − k of the row vectors can be expressed in terms of k linearly independent column or row vectors. The essential idea in many data compression schemes is to approximate the original data set by a data set with smaller rank that conveys nearly the same information, then eliminate redundant vectors in the approximating set to speed up the transmission time.

Concept Review • Rank • Nullity • Dimension Theorem • Overdetermined system • Underdetermined system • Fundamental spaces of a matrix • Relationships among the fundamental spaces • Orthogonal complement • Equivalent characterizations of invertible matrices

Skills • Find the rank and nullity of a matrix. • Find the dimension of the row space of a matrix.

Exercise Set 4.8 1. Verify that

.

Answer:

2. Find the rank and nullity of the matrix; then verify that the values obtained satisfy Formula 4 in the Dimension Theorem. (a)

(b)

(c)

(d)

(e)

3. In each part of Exercise 2, use the results obtained to find the number of leading variables and the number of parameters in the solution of Ax = 0 without solving the system.
Answer: (a) 2; 1 (b) 1; 2 (c) 2; 2 (d) 2; 3 (e) 3; 2

4. In each part, use the information in the table to find the dimension of the row space of A, column space of A, null space of A, and null space of A^T.

            (a)   (b)   (c)   (d)   (e)   (f)   (g)
Size of A
Rank(A)      3     2     1     2     2     0     2

5. In each part, find the largest possible value for the rank of A and the smallest possible value for the nullity of A. (a) A is (b) A is (c) A is Answer: (a) (b) (c) 6. If A is an nullity?

matrix, what is the largest possible value for its rank and the smallest possible value for its

7. In each part, use the information in the table to determine whether the linear system Ax = b is consistent. If so, state the number of parameters in its general solution.

             (a)   (b)   (c)   (d)   (e)   (f)   (g)
Size of A
Rank(A)       3     2     1     2     2     0     2
Rank[A | b]   3     3     1     2     3     0     2

Answer: (a) Yes, 0 (b) No (c) Yes, 2 (d) Yes, 7 (e) No (f) Yes, 4 (g) Yes, 0 8. For each of the matrices in Exercise 7, find the nullity of A, and determine the number of parameters in the general solution of the homogeneous linear system . 9. What conditions must be satisfied by

,

,

,

, and

for the overdetermined linear system

to be consistent? Answer:

10. Let

Show that A has

if and only if one or more of the determinants

is nonzero. 11. Suppose that A is a matrix whose null space is a line through the origin in 3-space. Can the row or column space of A also be a line through the origin? Explain. Answer: No 12. Discuss how the rank of A varies with t. (a)

(b)

13. Are there values of r and s for which

has rank 1? Has rank 2? If so, find those values. Answer: Rank is 2 if

and

; the rank is never 1.

14. Use the result in Exercise 10 to show that the set of points

has

is the curve with parametric equations

15. Prove: If

,

in

,

for which the matrix

.

, then A and kA have the same rank.

16. (a) Give an example of a

matrix whose column space is a plane through the origin in 3-space.

(b) What kind of geometric object is the null space of your matrix? (c) What kind of geometric object is the row space of your matrix? 17. (a) If A is a matrix, then the number of leading 1′s in the reduced row echelon form of A is at most _________ . Why? (b) If A is a matrix, then the number of parameters in the general solution of _________ . Why?

is at most

(c) If A is a matrix, then the number of leading 1′s in the reduced row echelon form of A is at most _________ . Why? (d) If A is a matrix, then the number of parameters in the general solution of _________ . Why? Answer: (a) 3 (b) 5 (c) 3 (d) 3 18. (a) If A is a

matrix, then the rank of A is at most _________ . Why?

(b) If A is a

matrix, then the nullity of A is at most _________ . Why?

(c) If A is a

matrix, then the rank of

(d) If A is a

matrix, then the nullity of

19. Find matrices A and B for which rank Answer:

is at most _________ . Why? is at most _________ . Why? , but rank

.

is at most

20. Prove: If a matrix A is not square, then either the row vectors or the column vectors of A are linearly dependent.

True-False Exercises In parts (a)–(j) determine whether the statement is true or false, and justify your answer. (a) Either the row vectors or the column vectors of a square matrix are linearly independent. Answer: False (b) A matrix with linearly independent row vectors and linearly independent column vectors is square. Answer: True (c) The nullity of a nonzero

matrix is at most m.

Answer: False (d) Adding one additional column to a matrix increases its rank by one. Answer: False (e) The nullity of a square matrix with linearly dependent rows is at least one. Answer: True (f) If A is square and

is inconsistent for some vector , then the nullity of A is zero.

Answer: False (g) If a matrix A has more rows than columns, then the dimension of the row space is greater than the dimension of the column space. Answer: False (h) If rank

, then A is square.

Answer: False (i) There is no

matrix whose row space and null space are both lines in 3-space.

Answer: True (j) If V is a subspace of

and W is a subspace of V, then

Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

is a subspace of

.

4.9 Matrix Transformations from R^n to R^m

In this section we will study functions of the form w = f(x), where the independent variable x is a vector in R^n and the dependent variable w is a vector in R^m. We will concentrate on a special class of such functions called "matrix transformations." Such transformations are fundamental in the study of linear algebra and have important applications in physics, engineering, social sciences, and various branches of mathematics.

Functions and Transformations

Recall that a function is a rule f that associates with each element of a set A one and only one element in a set B. If f associates the element b with the element a, then we write b = f(a) and we say that b is the image of a under f or that f(a) is the value of f at a. The set A is called the domain of f and the set B the codomain of f (Figure 4.9.1). The subset of the codomain that consists of all images of points in the domain is called the range of f.

Figure 4.9.1 For many common functions the domain and codomain are sets of real numbers, but in this text we will be concerned with functions for which the domain and codomain are vector spaces.

DEFINITION 1 If V and W are vector spaces, and if f is a function with domain V and codomain W, then we say that f is a transformation from V to W or that f maps V to W, which we denote by writing f: V → W. In the special case where V = W, the transformation is also called an operator on V.

In this section we will be concerned exclusively with transformations from R^n to R^m; transformations of general vector spaces will be considered in a later section. To illustrate one way in which such transformations can arise, suppose that f1, f2, ..., fm are real-valued functions of n variables, say

w1 = f1(x1, x2, ..., xn)
w2 = f2(x1, x2, ..., xn)
...
wm = fm(x1, x2, ..., xn)   (1)

These m equations assign a unique point (w1, w2, ..., wm) in R^m to each point (x1, x2, ..., xn) in R^n and thus define a transformation from R^n to R^m. If we denote this transformation by T, then T: R^n → R^m and

T(x1, x2, ..., xn) = (w1, w2, ..., wm)

Matrix Transformations

In the special case where the equations in 1 are linear, they can be expressed in the form

w1 = a11 x1 + a12 x2 + ... + a1n xn
w2 = a21 x1 + a22 x2 + ... + a2n xn
...
wm = am1 x1 + am2 x2 + ... + amn xn   (2)

which we can write in matrix notation as

[ w1 ]   [ a11  a12  ...  a1n ][ x1 ]
[ w2 ] = [ a21  a22  ...  a2n ][ x2 ]
[ .. ]   [ ...                ][ .. ]
[ wm ]   [ am1  am2  ...  amn ][ xn ]   (3)

or more briefly as

w = Ax   (4)

Although we could view this as a linear system, we will view it instead as a transformation that maps the column vector x in R^n into the column vector w in R^m by multiplying x on the left by A. We call this a matrix transformation (or matrix operator if m = n), and we denote it by T_A: R^n → R^m. With this notation, Equation 4 can be expressed as

T_A(x) = Ax   (5)

The matrix transformation T_A: R^n → R^m is called multiplication by A, and the matrix A is called the standard matrix for the transformation.

We will also find it convenient, on occasion, to express 5 in the schematic form

x --T_A--> w   (6)

which is read "T_A maps x into w."

E X A M P L E 1 A Matrix Transformation from R4 to R3 The matrix transformation

defined by the equations (7)

can be expressed in matrix form as

(8)

so the standard matrix for T is

The image of a point can be computed directly from the defining equations 7 or from 8 by matrix multiplication. For example, if then substituting in 7 yields

,

,

(verify), or alternatively from 8,

Some Notational Matters

Sometimes we will want to denote a matrix transformation without giving a name to the matrix itself. In such cases we will denote the standard matrix for T by the symbol [T]. Thus, the equation

T(x) = [T]x   (9)

is simply the statement that T is a matrix transformation with standard matrix [T], and the image of x under this transformation is the product of the matrix [T] and the column vector x.

Properties of Matrix Transformations The following theorem lists four basic properties of matrix transformations that follow from properties of matrix multiplication.

THEOREM 4.9.1 For every matrix A the matrix transformation T_A: R^n → R^m has the following properties for all vectors u and v in R^n and for every scalar k:

(a) T_A(0) = 0

(b) T_A(ku) = kT_A(u)

(c) T_A(u + v) = T_A(u) + T_A(v)

(d) T_A(u − v) = T_A(u) − T_A(v)

Proof All four parts are restatements of familiar properties of matrix multiplication:

A0 = 0,   A(ku) = k(Au),   A(u + v) = Au + Av,   A(u − v) = Au − Av

It follows from Theorem 4.9.1 that a matrix transformation maps linear combinations of vectors in R^n into the corresponding linear combinations in R^m in the sense that

T_A(k1 u1 + k2 u2 + ... + kr ur) = k1 T_A(u1) + k2 T_A(u2) + ... + kr T_A(ur)   (10)

Depending on whether n-tuples and m-tuples are regarded as vectors or points, the geometric effect of a matrix transformation T_A: R^n → R^m is to map each vector (point) in R^n into a vector (point) in R^m (Figure 4.9.2).

Figure 4.9.2

The following theorem states that if two matrix transformations from R^n to R^m have the same image at each point of R^n, then the matrices themselves must be the same.

THEOREM 4.9.2 If T_A: R^n → R^m and T_B: R^n → R^m are matrix transformations, and if T_A(x) = T_B(x) for every vector x in R^n, then A = B.

Proof To say that T_A(x) = T_B(x) for every vector x in R^n is the same as saying that

Ax = Bx

for every vector x in R^n. This is true, in particular, if x is any of the standard basis vectors e1, e2, ..., en for R^n; that is,

Ae_j = Be_j   (j = 1, 2, ..., n)   (11)

Since every entry of e_j is 0 except for the jth, which is 1, it follows from Theorem 1.3.1 that Ae_j is the jth column of A and Be_j is the jth column of B. Thus, it follows from 11 that corresponding columns of A and B are the same, and hence that A = B.

E X A M P L E 2 Zero Transformations
If 0 is the m × n zero matrix, then

T_0(x) = 0x = 0

so multiplication by zero maps every vector in R^n into the zero vector in R^m. We call T_0 the zero transformation from R^n to R^m.

E X A M P L E 3 Identity Operators
If I is the n × n identity matrix, then

T_I(x) = Ix = x

so multiplication by I maps every vector in R^n into itself. We call T_I the identity operator on R^n.

A Procedure for Finding Standard Matrices

There is a way of finding the standard matrix for a matrix transformation from R^n to R^m by considering the effect of that transformation on the standard basis vectors for R^n. To explain the idea, suppose that A is unknown and that e1, e2, ..., en are the standard basis vectors for R^n. Suppose also that the images of these vectors under the transformation T_A are

T_A(e1) = Ae1,   T_A(e2) = Ae2, ...,   T_A(en) = Aen

It follows from Theorem 1.3.1 that Ae_j is a linear combination of the columns of A in which the successive coefficients are the entries of e_j. But all entries of e_j are zero except the jth, so the product Ae_j is just the jth column of the matrix A. Thus,

A = [T_A(e1) | T_A(e2) | ... | T_A(en)]   (12)

In summary, we have the following procedure for finding the standard matrix for a matrix transformation:

Finding the Standard Matrix for a Matrix Transformation
Step 1. Find the images of the standard basis vectors e1, e2, ..., en for R^n in column form.

Step 2. Construct the matrix that has the images obtained in Step 1 as its successive columns. This matrix is the standard matrix for the transformation.
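The two-step procedure is easy to carry out in code. The sketch below (Python/NumPy; the transformation T is a hypothetical linear map of our own choosing) builds the standard matrix column by column from the images of the standard basis vectors and checks that multiplication by it reproduces T.

import numpy as np

def T(x):
    # A hypothetical linear transformation from R^3 to R^2 (our own example):
    # T(x1, x2, x3) = (x1 + 2*x3, 3*x1 - x2)
    x1, x2, x3 = x
    return np.array([x1 + 2*x3, 3*x1 - x2])

# Step 1: images of the standard basis vectors e1, e2, e3 in column form.
n = 3
images = [T(e) for e in np.eye(n)]

# Step 2: use those images as the successive columns of the standard matrix.
A = np.column_stack(images)
print(A)                            # [[1. 0. 2.], [3. -1. 0.]]

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(A @ x, T(x)))     # True: multiplication by A reproduces T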

Reflection Operators

Some of the most basic matrix operators on and are those that map each point into its symmetric image about a fixed line or a fixed plane; these are called reflection operators. Table 1 shows the standard matrices for the reflections about the coordinate axes in , and Table 2 shows the standard matrices for the reflections about the coordinate planes in . In each case the standard matrix was obtained by finding the images of the standard basis vectors, converting those images to column vectors, and then using those column vectors as successive columns of the standard matrix. Table 1 Operator

Illustration

Images of e1 and e2

Standard Matrix

Reflection about the y-axis

Reflection about the x-axis

Reflection about the line

Table 2 Operator Reflection about the xy-plane

Reflection about the xz-plane

Reflection about the yz-plane

Illustration

e1, e2, e3

Standard Matrix

Projection Operators Matrix operators on and that map each point into its orthogonal projection on a fixed line or plane are called projection operators (or more precisely, orthogonal projection operators). Table 3 shows the standard matrices for the orthogonal projections on the coordinate axes in , and Table 4 shows the standard matrices for the orthogonal projections on the coordinate planes in . Table 3 Operator

Illustration

Images of e1 and e2

Standard Matrix

Orthogonal projection on the x-axis

Orthogonal projection on the y-axis

Table 4 Operator Orthogonal projection on the xy-plane

Orthogonal projection on the xz-plane

Orthogonal projection on the yz-plane

Rotation Operators

Illustration

Images of e1, e2, e3

Standard Matrix

Matrix operators on R^2 and R^3 that move points along circular arcs are called rotation operators. Let us consider how to find the standard matrix for the rotation operator T: R^2 → R^2 that moves points counterclockwise about the origin through an angle θ (Figure 4.9.3). As illustrated in Figure 4.9.3, the images of the standard basis vectors are

T(e1) = (cos θ, sin θ)   and   T(e2) = (−sin θ, cos θ)

so the standard matrix for T is

[T] = [T(e1) | T(e2)] = [ cos θ  −sin θ ]
                        [ sin θ   cos θ ]

Figure 4.9.3

In keeping with common usage we will denote this operator by R_θ and call

R_θ = [ cos θ  −sin θ ]
      [ sin θ   cos θ ]   (13)

the rotation matrix for R^2. If x = (x, y) is a vector in R^2, and if w = (w1, w2) is its image under the rotation, then the relationship w = R_θ x can be written in component form as

w1 = x cos θ − y sin θ
w2 = x sin θ + y cos θ   (14)

These are called the rotation equations for R^2. These ideas are summarized in Table 5.

Table 5 Operator: rotation through an angle θ (illustration, rotation equations, and standard matrix).

In the plane, counterclockwise angles are positive and clockwise angles are negative. The rotation matrix for a clockwise rotation of θ radians can be obtained by replacing θ by −θ in 13. After simplification this yields

R_(−θ) = [  cos θ  sin θ ]
         [ −sin θ  cos θ ]
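The rotation equations can be checked numerically. The following Python/NumPy sketch (the angle and test vector are our own choices) builds R_θ from Formula 13, applies it to a vector, and confirms that rotating through −θ undoes the rotation.

import numpy as np

def rotation_matrix(theta):
    # Standard matrix R_theta of Formula 13 (counterclockwise rotation by theta).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

theta = np.pi / 6                       # 30 degrees (an arbitrary test angle)
x = np.array([2.0, 1.0])                # an arbitrary test vector

w = rotation_matrix(theta) @ x          # rotation equations (14) in matrix form
print(w)

# Rotating through -theta reverses the rotation.
print(np.allclose(rotation_matrix(-theta) @ w, x))   # True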

E X A M P L E 4 A Rotation Operator Find the image of

under a rotation of

Solution It follows from 13 with

radians

about the origin.

that

or in comma-delimited notation,

.

Rotations in R3 A rotation of vectors in is usually described in relation to a ray emanating from the origin, called the axis of rotation. As a vector revolves around the axis of rotation, it sweeps out some portion of a cone (Figure 4.9.4a). The angle of rotation, which is measured in the base of the cone, is described as “clockwise” or “counterclockwise” in relation to a viewpoint that is along the axis of rotation looking toward the origin. For example, in Figure 4.9.4a the vector results from rotating the vector counterclockwise around the axis l through an angle . As in , angles are positive if they are generated by counterclockwise rotations and negative if they are generated by clockwise rotations.

Figure 4.9.4 The most common way of describing a general axis of rotation is to specify a nonzero vector that runs along the axis of rotation and has its initial point at the origin. The counterclockwise direction for a rotation about the axis can then be determined by a “right-hand rule” (Figure 4.9.4b): If the thumb of the right hand points in the direction of , then the cupped fingers point in a counterclockwise direction. A rotation operator on is a matrix operator that rotates each vector in about some rotation axis through a fixed angle . In Table 6 we have described the rotation operators on whose axes of rotation are the positive coordinate axes. For each of these rotations one of the components is unchanged, and the relationships between the other components can be derived by the same procedure used to derive 14. For example, in the rotation about the z-axis, the z-components of and are the same, and the x- and y-components are related as in 14. This yields the rotation equation shown in the last row of Table 6.

Table 6

For completeness, we note that the standard matrix for a counterclockwise rotation through an angle θ about an axis in R^3, which is determined by an arbitrary unit vector u = (a, b, c) that has its initial point at the origin, is

[ a²(1 − cos θ) + cos θ      ab(1 − cos θ) − c sin θ    ac(1 − cos θ) + b sin θ ]
[ ab(1 − cos θ) + c sin θ    b²(1 − cos θ) + cos θ      bc(1 − cos θ) − a sin θ ]   (15)
[ ac(1 − cos θ) − b sin θ    bc(1 − cos θ) + a sin θ    c²(1 − cos θ) + cos θ   ]

The derivation can be found in the book Principles of Interactive Computer Graphics, by W. M. Newman and R. F. Sproull (New York: McGraw-Hill, 1979). You may find it instructive to derive the results in Table 6 as special cases of this more general result.
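For readers who want to experiment, here is a Python/NumPy sketch of Formula 15 (the axis and angle used in the check are our own test values); it also confirms that with u along the positive z-axis the formula reduces to the z-axis rotation of Table 6.

import numpy as np

def rotation_about_axis(u, theta):
    # Standard matrix of Formula 15: counterclockwise rotation by theta
    # about the axis through the origin determined by the unit vector u = (a, b, c).
    a, b, c = u
    C, S, V = np.cos(theta), np.sin(theta), 1 - np.cos(theta)
    return np.array([
        [a*a*V + C,   a*b*V - c*S, a*c*V + b*S],
        [a*b*V + c*S, b*b*V + C,   b*c*V - a*S],
        [a*c*V - b*S, b*c*V + a*S, c*c*V + C  ],
    ])

theta = np.radians(40)                       # arbitrary test angle

# With u along the positive z-axis, Formula 15 reduces to the z-axis rotation.
Rz = rotation_about_axis((0.0, 0.0, 1.0), theta)
expected = np.array([[np.cos(theta), -np.sin(theta), 0.],
                     [np.sin(theta),  np.cos(theta), 0.],
                     [0.,             0.,            1.]])
print(np.allclose(Rz, expected))             # True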

Dilations and Contractions

If k is a nonnegative scalar, then the operator T(x) = kx on R^2 or R^3 has the effect of increasing or decreasing the length of each vector by a factor of k. If 0 ≤ k < 1 the operator is called a contraction with factor k, and if k > 1 it is called a dilation with factor k (Figure 4.9.5). If k = 1, then T is the identity operator and can be regarded either as a contraction or a dilation. Tables 7 and 8 illustrate these operators.

Figure 4.9.5 Table 7 Operator

Illustration

Effect on the Standard Basis

Standard Matrix

Contraction with factor k on

Dilation with factor k on

Table 8

Yaw, Pitch, and Roll In aeronautics and astronautics, the orientation of an aircraft or space shuttle relative to an xyz-coordinate system is often described in terms of angles called yaw, pitch, and roll. If, for example, an aircraft is flying along the y-axis and the xy-plane defines the horizontal, then the aircraft's angle of rotation about the z-axis is called the yaw, its angle of rotation about the x-axis is called the pitch, and its angle of rotation about the y-axis is called the roll. A combination of yaw, pitch, and roll can be achieved by a single rotation about some axis through the origin. This is, in fact, how a space shuttle makes attitude adjustments—it doesn't perform each rotation separately; it calculates one axis, and rotates about that axis to get the correct orientation. Such rotation maneuvers are used to align an antenna, point the nose toward a celestial object, or position a payload bay for docking.

Expansions and Compressions In a dilation or contraction of R^2 or R^3, all coordinates are multiplied by a factor k. If only one of the coordinates is multiplied by k, then the resulting operator is called an expansion or compression with factor k. This is illustrated in Table 9 for R^2. You should have no trouble extending these results to R^3.

Illustration

Effect on the Standard Basis

Standard Matrix

Illustration

Effect on the Standard Basis

Standard Matrix

Compression of in the x-direction with factor k

Expansion of in the x-direction with factor k

Operator Compression of in the y-direction with factor k

Operator

Illustration

Effect on the Standard Basis

Standard Matrix

Expansion of in the y-direction with factor k

Shears A matrix operator of the form T(x, y) = (x + ky, y) translates a point (x, y) in the xy-plane parallel to the x-axis by an amount ky that is proportional to the y-coordinate of the point. This operator leaves the points on the x-axis fixed (since y = 0), but as we progress away from the x-axis, the translation distance increases. We call this operator the shear in the x-direction with factor k. Similarly, a matrix operator of the form T(x, y) = (x, y + kx) is called the shear in the y-direction with factor k. Table 10 illustrates the basic information about shears in R^2.

Effect on the Standard Basis

Shear of factor k

in the x-direction with

Shear of factor k

in the y-direction with

Standard Matrix

E X A M P L E 5 Some Basic Matrix Operators on R2
In each part describe the matrix operator corresponding to the given matrix, and show its effect on the unit square.

(a) A1 = [ 1  2 ]      (b) A2 = [ 2  0 ]      (c) A3 = [ 2  0 ]
         [ 0  1 ]               [ 0  2 ]               [ 0  1 ]

Solution By comparing the forms of these matrices to those in Tables 7, 9, and 10, we see that the matrix A1 corresponds to a shear in the x-direction with factor 2, the matrix A2 corresponds to a dilation with factor 2, and A3 corresponds to an expansion in the x-direction with factor 2. The effects of these operators on the unit square are shown in Figure 4.9.6.

Figure 4.9.6
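A quick way to reproduce Figure 4.9.6 numerically is to apply each matrix to the corner points of the unit square. The following Python/NumPy sketch uses the three matrices identified in the solution above.

import numpy as np

# Corners of the unit square as columns: (0,0), (1,0), (1,1), (0,1).
square = np.array([[0., 1., 1., 0.],
                   [0., 0., 1., 1.]])

A1 = np.array([[1., 2.], [0., 1.]])   # shear in the x-direction with factor 2
A2 = np.array([[2., 0.], [0., 2.]])   # dilation with factor 2
A3 = np.array([[2., 0.], [0., 1.]])   # expansion in the x-direction with factor 2

for name, A in [("shear", A1), ("dilation", A2), ("expansion", A3)]:
    print(name)
    print(A @ square)                 # images of the four corner points, as columns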

OPTIONAL

Orthogonal Projections on Lines Through the Origin

In Table 3 we listed the standard matrices for the orthogonal projections on the coordinate axes in R^2. These are special cases of the more general operator T: R^2 → R^2 that maps each point into its orthogonal projection on a line L through the origin that makes an angle θ with the positive x-axis (Figure 4.9.7). In Example 4 of Section 3.3 we used Formula 10 of that section to find the orthogonal projections of the standard basis vectors for R^2 on that line. Expressed in matrix form, we found those projections to be

T(e1) = [ cos²θ       ]    and    T(e2) = [ sin θ cos θ ]
        [ sin θ cos θ ]                   [ sin²θ       ]

Figure 4.9.7

Thus, the standard matrix for T is

[T] = [ cos²θ        sin θ cos θ ]
      [ sin θ cos θ  sin²θ       ]

In keeping with common usage, we will denote this operator by P_θ and write

P_θ = [ cos²θ        sin θ cos θ ]  =  (1/2)[ 1 + cos 2θ   sin 2θ     ]   (16)
      [ sin θ cos θ  sin²θ       ]          [ sin 2θ       1 − cos 2θ ]

We have included two versions of Formula 16 because both are commonly used. Whereas the first version involves only the angle θ, the second involves both θ and 2θ.

E X A M P L E 6 Orthogonal Projection on a Line Through the Origin Use Formula 16 to find the orthogonal projection of the vector that makes an angle of Solution Since

on the line through the origin

with the x-axis. and

, it follows from 16 that the standard matrix

for this projection is

Thus,

or in comma-delimited notation,

Reflections About Lines Through the Origin

In Table 1 we listed the reflections about the coordinate axes in R^2. These are special cases of the more general operator H_θ: R^2 → R^2 that maps each point into its reflection about a line L through the origin that makes an angle θ with the positive x-axis (Figure 4.9.8). We could find the standard matrix for H_θ by finding the images of the standard basis vectors, but instead we will take advantage of our work on orthogonal projections by using Formula 16 for P_θ to find a formula for H_θ.

Figure 4.9.8

You should be able to see from Figure 4.9.9 that for every vector x in R^2

P_θ x − x = (1/2)(H_θ x − x)   or, equivalently,   H_θ x = (2P_θ − I)x

Figure 4.9.9

Thus, it follows from Theorem 4.9.2 that

H_θ = 2P_θ − I   (17)

and hence from 16 that

H_θ = [ cos 2θ   sin 2θ ]
      [ sin 2θ  −cos 2θ ]   (18)
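Formulas 16, 17, and 18 are easy to verify numerically. The Python/NumPy sketch below (the angle and vector match the data of Example 7, which follows) checks that H_θ = 2P_θ − I and computes a projection and a reflection.

import numpy as np

def P(theta):
    # Formula 16: orthogonal projection onto the line through the origin at angle theta.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c*c, s*c],
                     [s*c, s*s]])

def H(theta):
    # Formula 18: reflection about the same line.
    c2, s2 = np.cos(2*theta), np.sin(2*theta)
    return np.array([[c2,  s2],
                     [s2, -c2]])

theta = np.radians(30)
print(np.allclose(H(theta), 2*P(theta) - np.eye(2)))   # Formula 17 holds: True

x = np.array([1.0, 5.0])
print(P(theta) @ x)    # orthogonal projection of x on the line
print(H(theta) @ x)    # reflection of x about the line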

E X A M P L E 7 Reflection About a Line Through the Origin
Find the reflection of the vector x = (1, 5) about the line through the origin that makes an angle of π/6 (= 30°) with the x-axis.

Solution Since sin 2θ = sin 60° = √3/2 and cos 2θ = cos 60° = 1/2, it follows from 18 that the standard matrix for this reflection is

H_(π/6) = [ 1/2    √3/2 ]
          [ √3/2  −1/2  ]

Thus,

H_(π/6) x = [ 1/2    √3/2 ][ 1 ]  =  [ (1 + 5√3)/2 ]
            [ √3/2  −1/2  ][ 5 ]     [ (√3 − 5)/2  ]

or in comma-delimited notation, H_(π/6)(1, 5) = ((1 + 5√3)/2, (√3 − 5)/2) ≈ (4.83, −1.63).

Show that the standard matrices in Tables 1 and 3 are special cases of 18 and 16.

Concept Review • Function • Image • Value • Domain • Codomain • Transformation • Operator • Matrix transformation • Matrix operator • Standard matrix • Properties of matrix transformations • Zero transformation • Identity operator • Reflection operator • Projection operator • Rotation operator • Rotation matrix • Rotation equations • Axis of rotation in 3-space • Angle of rotation in 3-space • Expansion operator • Compression operator • Shear • Dilation • Contraction

Skills • Find the domain and codomain of a transformation, and determine whether the transformation is linear. • Find the standard matrix for a matrix transformation. • Describe the effect of a matrix operator on the standard basis in Rn.

Exercise Set 4.9 In Exercises 1–2, find the domain and codomain of the transformation 1. (a) A has size

.

(b) A has size

.

(c) A has size

.

(d) A has size

.

Answer:

(a) Domain:

; codomain:

(b) Domain:

; codomain:

(c) Domain:

; codomain:

(d) Domain:

; codomain:

2. (a) A has size

.

(b) A has size

.

(c) A has size

.

(d) A has size

.

3. If the image of

, then the domain of T is _________ , the codomain of T is _________ , and under T is _________ .

Answer:

4. If and the image of

, then the domain of T is _________ , the codomain of T is _________ , under T is _________ .

5. In each part, find the domain and codomain of the transformation defined by the equations, and determine whether the transformation is linear. (a) (b)

(c)

(d)

Answer: (a) Linear; (b) Nonlinear; (c) Linear; (d) Nonlinear; 6. In each part, determine whether T is a matrix transformation. (a) (b) (c) (d)

(e) 7. In each part, determine whether T is a matrix transformation. (a) (b) (c) (d) (e) Answer: (a) and (c) are matrix transformations; (b), (d), and (e) are not matrix transformations. 8. Find the standard matrix for the transformation defined by the equations. (a) (b)

(c)

(d)

9. Find the standard matrix for the operator

and then calculate

defined by

by directly substituting in the equations and also by matrix multiplication.

Answer: ; 10. Find the standard matrix for the operator T defined by the formula. (a) (b) (c) (d) 11. Find the standard matrix for the transformation T defined by the formula. (a) (b)

(c) (d) Answer: (a)

(b)

(c)

(d)

12. In each part, find

, and express the answer in matrix form.

(a) (b)

(c)

(d)

13. In each part, use the standard matrix for T to find (a)

; then check the result by calculating

;

(b)

;

Answer: (a) (b) 14. Use matrix multiplication to find the reflection of (a) the x-axis.

about

directly.

(b) the y-axis. (c) the line

.

15. Use matrix multiplication to find the reflection of

about

(a) the xy-plane. (b) the xz-plane. (c) the yz-plane. Answer: (a) (b) (2, 5, 3) (c) 16. Use matrix multiplication to find the orthogonal projection of

on

(a) the x-axis. (b) the y-axis. 17. Use matrix multiplication to find the orthogonal projection of

on

(a) the xy-plane. (b) the xz-plane. (c) the yz-plane. Answer: (a) (b) (c) (0, 1, 3) 18. Use matrix multiplication to find the image of the vector (a)

.

(b)

.

(c)

.

(d)

.

19. Use matrix multiplication to find the image of the vector (a) 30° about the x-axis. (b) 45° about the y-axis. (c) 90° about the z-axis. Answer: (a)

(b) (c)

when it is rotated through an angle of

if it is rotated

20. Find the standard matrix for the operator that rotates a vector in

through an angle of

about

(a) the x-axis. (b) the y-axis. (c) the z-axis. 21. Use matrix multiplication to find the image of the vector (a)

about the x-axis.

(b)

about the y-axis.

(c)

about the z-axis.

if it is rotated

Answer: (a)

(b) (c) (1, 2, 2) 22. In

the orthogonal projections on the x-axis, y-axis, and z-axis are defined by

respectively. (a) Show that the orthogonal projections on the coordinate axes are matrix operators, and find their standard matrices. (b) Show that if , the vectors

and

(c) Make a sketch showing

is an orthogonal projection on one of the coordinate axes, then for every vector are orthogonal. and

in

in the case where T is the orthogonal projection on the x-axis.

23. Use Formula 15 to derive the standard matrices for the rotations about the x-axis, y-axis, and z-axis in

.

24. Use Formula 15 to find the standard matrix for a rotation of radians about the axis determined by the vector . [Note: Formula 15 requires that the vector defining the axis of rotation have length 1.] 25. Use Formula 15 to find the standard matrix for a rotation of 180° about the axis determined by the vector . [Note: Formula 15 requires that the vector defining the axis of rotation have length 1.] Answer:

26. It can be proved that if A is a matrix with orthonormal column vectors and for which multiplication by A is a rotation through some angle . Verify that

, then

satisfies the stated conditions and find the angle of rotation. 27. The result stated in Exercise 26 can be extended to ; that is, it can be proved that if A is a matrix with orthonormal column vectors and for which , then multiplication by A is a rotation about some axis through some angle . Use Formula 15 to show that the angle of rotation satisfies the equation

28. Let A be a shown that if

matrix (other than the identity matrix) satisfying the conditions stated in Exercise 27. It can be is any nonzero vector in

, then the vector

determines an axis of

rotation when is positioned with its initial point at the origin. [See “The Axis of Rotation: Analysis, Algebra, Geometry,” by Dan Kalman, Mathematics Magazine, Vol. 62, No. 4, October 1989.] (a) Show that multiplication by

is a rotation. (b) Find a vector of length 1 that defines an axis for the rotation. (c) Use the result in Exercise 27 to find the angle of rotation about the axis obtained in part (b). 29. In words, describe the geometric effect of multiplying a vector

by the matrix A.

(a) (b)

Answer: (a) Twice the orthogonal projection on the x-axis. (b) Twice the reflection about the x-axis. 30. In words, describe the geometric effect of multiplying a vector

by the matrix A.

(a) (b)

31. In words, describe the geometric effect of multiplying a vector

by the matrix

Answer: Rotation through the angle

.

32. If multiplication by A rotates a vector ? Explain your reasoning.

in the xy-plane through an angle θ, what is the effect of multiplying

by

33. Let

be a nonzero column vector in , and suppose that is the transformation defined by the formula , where is the standard matrix of the rotation of about the origin through the angle θ. Give a geometric description of this transformation. Is it a matrix transformation? Explain. Answer: Rotation through the angle θ and translation by

; not a matrix transformation since

is nonzero.

34. A function of the form is commonly called a “linear function” because the graph of a line. Is f a matrix transformation on R? 35. Let be a line in , and let be a matrix operator on the image of this line under the operator T? Explain your reasoning.

is

. What kind of geometric object is

Answer: A line in

.

True-False Exercises In parts (a)–(i) determine whether the statement is true or false, and justify your answer. (a) If A is a

matrix, then the domain of the transformation

is

.

Answer: False (b) If A is an

matrix, then the codomain of the transformation

is

.

Answer: False (c) If

and

, then T is a matrix transformation.

Answer: False (d) If and T is a matrix transformation.

for all scalars

and

and all vectors

and

in

, then

Answer: True (e) There is only one matrix transformation

such that

for every vector

in

.

Answer: False (f) There is only one matrix transformation

such that

for all vectors

and

Answer: True (g) If

is a nonzero vector in

, then

is a matrix operator on

.

Answer: False (h) The matrix

is the standard matrix for a rotation.

Answer: False (i)

The standard matrices of the reflections about the coordinate axes in 2-space have the form . Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

, where

in

.

4.10 Properties of Matrix Transformations In this section we will discuss properties of matrix transformations. We will show, for example, that if several matrix transformations are performed in succession, then the same result can be obtained by a single matrix transformation that is chosen appropriately. We will also explore the relationship between the invertibility of a matrix and properties of the corresponding transformation.

Compositions of Matrix Transformations

Suppose that T_A is a matrix transformation from R^n to R^k and T_B is a matrix transformation from R^k to R^m. If x is a vector in R^n, then T_A maps this vector into a vector T_A(x) in R^k, and T_B, in turn, maps that vector into the vector T_B(T_A(x)) in R^m. This process creates a transformation from R^n to R^m that we call the composition of T_B with T_A and denote by the symbol

T_B ∘ T_A

which is read "T_B circle T_A". As illustrated in Figure 4.10.1, the transformation T_A in the formula is performed first; that is,

(T_B ∘ T_A)(x) = T_B(T_A(x))   (1)

This composition is itself a matrix transformation since

(T_B ∘ T_A)(x) = T_B(T_A(x)) = B(Ax) = (BA)x

which shows that it is multiplication by BA. This is expressed by the formula

T_B ∘ T_A = T_BA   (2)

WARNING Just as it is not true, in general, that AB = BA, so it is not true, in general, that T_B ∘ T_A = T_A ∘ T_B. That is, order matters when matrix transformations are composed.

Figure 4.10.1 Compositions can be defined for any finite succession of matrix transformations whose domains and ranges have

the appropriate dimensions. For example, to extend Formula 2 to three factors, consider the matrix transformations

T_A: R^n → R^k,   T_B: R^k → R^l,   T_C: R^l → R^m

We define the composition (T_C ∘ T_B ∘ T_A): R^n → R^m by

(T_C ∘ T_B ∘ T_A)(x) = T_C(T_B(T_A(x)))

As above, it can be shown that this is a matrix transformation whose standard matrix is CBA and that

T_C ∘ T_B ∘ T_A = T_CBA   (3)

As in Formula 9 of Section 4.9, we can use square brackets to denote a matrix transformation without referencing a specific matrix. Thus, for example, the formula

[T_B ∘ T_A] = [T_B][T_A]   (4)

is a restatement of Formula 2 which states that the standard matrix for a composition is the product of the standard matrices in the appropriate order. Similarly,

[T_C ∘ T_B ∘ T_A] = [T_C][T_B][T_A]   (5)

is a restatement of Formula 3.
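The bracket formulas can be confirmed numerically. The following Python/NumPy sketch (the angles and test vector are our own choices) verifies Formula 2 for two rotation operators and previews the angle-addition fact used in Example 1 below.

import numpy as np

def R(theta):
    # rotation matrix R_theta from Formula 13 of Section 4.9
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

t1, t2 = np.radians(25), np.radians(40)   # arbitrary test angles
A, B = R(t1), R(t2)

x = np.array([1.0, 2.0])                  # an arbitrary test vector

# Formula 2: the composition T_B o T_A is multiplication by BA.
print(np.allclose(B @ (A @ x), (B @ A) @ x))      # True

# For rotations, the composition is rotation through t1 + t2.
print(np.allclose(B @ A, R(t1 + t2)))             # True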

E X A M P L E 1 Composition of Two Rotations
Let T1: R^2 → R^2 and T2: R^2 → R^2 be the matrix operators that rotate vectors through the angles θ1 and θ2, respectively. Thus the operation

(T2 ∘ T1)(x) = T2(T1(x))

first rotates x through the angle θ1, then rotates T1(x) through the angle θ2. It follows that the net effect of T2 ∘ T1 is to rotate each vector in R^2 through the angle θ1 + θ2 (Figure 4.10.2). Thus, the standard matrices for these matrix operators are

[T1] = [ cos θ1  −sin θ1 ]   [T2] = [ cos θ2  −sin θ2 ]   [T2 ∘ T1] = [ cos(θ1 + θ2)  −sin(θ1 + θ2) ]
       [ sin θ1   cos θ1 ]          [ sin θ2   cos θ2 ]               [ sin(θ1 + θ2)   cos(θ1 + θ2) ]

These matrices should satisfy 4. With the help of some basic trigonometric identities, we can confirm that this is so as follows:

[T2][T1] = [ cos θ2  −sin θ2 ][ cos θ1  −sin θ1 ]
           [ sin θ2   cos θ2 ][ sin θ1   cos θ1 ]

         = [ cos θ1 cos θ2 − sin θ1 sin θ2   −(sin θ1 cos θ2 + cos θ1 sin θ2) ]
           [ sin θ1 cos θ2 + cos θ1 sin θ2     cos θ1 cos θ2 − sin θ1 sin θ2  ]

         = [ cos(θ1 + θ2)  −sin(θ1 + θ2) ] = [T2 ∘ T1]
           [ sin(θ1 + θ2)   cos(θ1 + θ2) ]

Figure 4.10.2

E X A M P L E 2 Composition Is Not Commutative
Let T1: R^2 → R^2 be the reflection about the line y = x, and let T2: R^2 → R^2 be the orthogonal projection on the y-axis. Figure 4.10.3 illustrates graphically that T1 ∘ T2 and T2 ∘ T1 have different effects on a vector x. This same conclusion can be reached by showing that the standard matrices for T1 and T2 do not commute:

[T2 ∘ T1] = [T2][T1] = [ 0  0 ][ 0  1 ] = [ 0  0 ]
                       [ 0  1 ][ 1  0 ]   [ 1  0 ]

[T1 ∘ T2] = [T1][T2] = [ 0  1 ][ 0  0 ] = [ 0  1 ]
                       [ 1  0 ][ 0  1 ]   [ 0  0 ]

so [T2 ∘ T1] ≠ [T1 ∘ T2].

Figure 4.10.3

E X A M P L E 3 Composition of Two Reflections
Let T1: R^2 → R^2 be the reflection about the y-axis, and let T2: R^2 → R^2 be the reflection about the x-axis. In this case T1 ∘ T2 and T2 ∘ T1 are the same; both map every vector x = (x, y) into its negative −x = (−x, −y) (Figure 4.10.4):

(T1 ∘ T2)(x, y) = T1(x, −y) = (−x, −y)
(T2 ∘ T1)(x, y) = T2(−x, y) = (−x, −y)

The equality of T1 ∘ T2 and T2 ∘ T1 can also be deduced by showing that the standard matrices for T1 and T2 commute:

[T1 ∘ T2] = [T1][T2] = [ −1  0 ][ 1   0 ] = [ −1   0 ]
                       [  0  1 ][ 0  −1 ]   [  0  −1 ]

[T2 ∘ T1] = [T2][T1] = [ 1   0 ][ −1  0 ] = [ −1   0 ]
                       [ 0  −1 ][  0  1 ]   [  0  −1 ]

The operator T(x) = −x on R^2 or R^3 is called the reflection about the origin. As the foregoing computations show, the standard matrix for this operator on R^2 is

[ −1   0 ]
[  0  −1 ]

Figure 4.10.4

E X A M P L E 4 Composition of Three Transformations Find the standard matrix for the operator that first rotates a vector counterclockwise about the z-axis through an angle , then reflects the resulting vector about the -plane, and then projects that vector orthogonally onto the -plane. Solution The operator T can be expressed as the composition

where is the rotation about the z-axis, is the reflection about the yz-plane, and is the orthogonal projection on the xy-plane. From Tables 6, 2, and 4 of Section 4.9 , the standard matrices for these operators are

Thus, it follows from 5 that the standard matrix for T is

One-to-One Matrix Transformations Our next objective is to establish a link between the invertibility of a matrix A and properties of the corresponding matrix transformation .

DEFINITION 1 A matrix transformation T_A: R^n → R^m is said to be one-to-one if T_A maps distinct vectors (points) in R^n into distinct vectors (points) in R^m.

(See Figure 4.10.5). This idea can be expressed in various ways. For example, you should be able to see that the following are just restatements of Definition 1:

1. T_A is one-to-one if for each vector b in the range of A there is exactly one vector x in R^n such that T_A(x) = b.

2. T_A is one-to-one if the equality T_A(u) = T_A(v) implies that u = v.

Figure 4.10.5

Rotation operators on R^2 are one-to-one since distinct vectors that are rotated through the same angle have distinct images (Figure 4.10.6). In contrast, the orthogonal projection of R^3 on the xy-plane is not one-to-one because it maps distinct points on the same vertical line into the same point (Figure 4.10.7).

Figure 4.10.6 Distinct vectors u and v are rotated into distinct vectors T(u) and T(v)

Figure 4.10.7 The distinct points P and Q are mapped into the same point M The following theorem establishes a fundamental relationship between the invertibility of a matrix and properties of the corresponding matrix transformation.

THEOREM 4.10.1 If A is an n × n matrix and T_A: R^n → R^n is the corresponding matrix operator, then the following statements are equivalent.

(a) A is invertible.

(b) The range of T_A is R^n.

(c) T_A is one-to-one.

Proof We will establish the chain of implications (a) ⇒ (b) ⇒ (c) ⇒ (a).

(a) ⇒ (b) Assume that A is invertible. By parts (a) and (e) of Theorem 4.8.10, the system Ax = b is consistent for every n × 1 matrix b in R^n. This implies that T_A maps some x into the arbitrary vector b in R^n, which in turn implies that the range of T_A is all of R^n.

(b) ⇒ (c) Assume that the range of T_A is R^n. This implies that for every vector b in R^n there is some vector x in R^n for which T_A(x) = b, and hence that the linear system Ax = b is consistent for every vector b in R^n. But the equivalence of parts (e) and (f) of Theorem 4.8.10 implies that Ax = b has a unique solution for every vector b in R^n, and hence that for every vector b in the range of T_A there is exactly one vector x in R^n such that T_A(x) = b.

(c) ⇒ (a) Assume that T_A is one-to-one. Thus, if b is a vector in the range of T_A, there is a unique vector x in R^n for which T_A(x) = b. We leave it for you to complete the proof using Exercise 30.

in

E X A M P L E 5 Properties of a Rotation Operator
As indicated in Figure 4.10.6, the operator T: R^2 → R^2 that rotates vectors in R^2 through an angle θ is one-to-one. Confirm that [T] is invertible in accordance with Theorem 4.10.1.

Solution From Table 5 of Section 4.9 the standard matrix for T is

[T] = [ cos θ  −sin θ ]
      [ sin θ   cos θ ]

This matrix is invertible because

det[T] = cos²θ + sin²θ = 1 ≠ 0

E X A M P L E 6 Properties of a Projection Operator
As indicated in Figure 4.10.7, the operator T: R^3 → R^3 that projects each vector in R^3 orthogonally on the xy-plane is not one-to-one. Confirm that [T] is not invertible in accordance with Theorem 4.10.1.

Solution From Table 4 of Section 4.9 the standard matrix for T is

[T] = [ 1  0  0 ]
      [ 0  1  0 ]
      [ 0  0  0 ]

This matrix is not invertible since det[T] = 0.
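A short Python/NumPy check of Examples 5 and 6 (the angle and the pair of test points are our own choices): the rotation matrix has nonzero determinant, while the projection matrix has determinant zero and maps distinct points to the same image.

import numpy as np

theta = np.radians(50)                        # an arbitrary angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
P = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 0.]])

print(np.isclose(np.linalg.det(R), 1.0))      # rotation: det = 1, invertible, one-to-one
print(np.isclose(np.linalg.det(P), 0.0))      # projection: det = 0, not invertible

# Two distinct points on the same vertical line have the same projection.
p, q = np.array([1., 2., 3.]), np.array([1., 2., 7.])
print(np.allclose(P @ p, P @ q))              # True: the projection is not one-to-one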

Inverse of a One-to-One Matrix Operator

If T_A: R^n → R^n is a one-to-one matrix operator, then it follows from Theorem 4.10.1 that A is invertible. The matrix operator T_{A^{-1}}: R^n → R^n that corresponds to A^{-1} is called the inverse operator or (more simply) the inverse of T_A. This terminology is appropriate because T_A and T_{A^{-1}} cancel the effect of each other in the sense that if x is any vector in R^n, then

T_A(T_{A^{-1}}(x)) = AA^{-1}x = x
T_{A^{-1}}(T_A(x)) = A^{-1}Ax = x

or, equivalently,

T_A ∘ T_{A^{-1}} = T_{AA^{-1}} = T_I
T_{A^{-1}} ∘ T_A = T_{A^{-1}A} = T_I

From a more geometric viewpoint, if w is the image of x under T_A, then T_{A^{-1}} maps w back into x, since

T_{A^{-1}}(w) = T_{A^{-1}}(T_A(x)) = x

(Figure 4.10.8).

Figure 4.10.8

Before considering examples, it will be helpful to touch on some notational matters. If T: R^n → R^n is a one-to-one matrix operator, and if T^{-1}: R^n → R^n is its inverse, then the standard matrices for these operators are related by the equation

[T_A^{-1}] = A^{-1}   (6)

In cases where it is preferable not to assign a name to the matrix, we will write this equation as

[T^{-1}] = [T]^{-1}   (7)

E X A M P L E 7 Standard Matrix for T−1
Let T: R^2 → R^2 be the operator that rotates each vector in R^2 through the angle θ, so from Table 5 of Section 4.9,

[T] = [ cos θ  −sin θ ]
      [ sin θ   cos θ ]   (8)

It is evident geometrically that to undo the effect of T, one must rotate each vector in R^2 through the angle −θ. But this is exactly what the operator T^{-1} does, since the standard matrix for T^{-1} is

[T^{-1}] = [T]^{-1} = [  cos θ  sin θ ]
                      [ −sin θ  cos θ ]

(verify), which is the standard matrix for a rotation through the angle −θ.

E X A M P L E 8 Finding T−1 Show that the operator

is one-to-one, and find

defined by the equations

.

Solution The matrix form of these equations is

so the standard matrix for T is

This matrix is invertible (so T is one-to-one) and the standard matrix for

is

Thus

from which we conclude that

Linearity Properties

Up to now we have focused exclusively on matrix transformations from R^n to R^m. However, these are not the only kinds of transformations from R^n to R^m. For example, if f1, f2, ..., fm are any functions of the n variables x1, x2, ..., xn, then the equations

w1 = f1(x1, ..., xn),  w2 = f2(x1, ..., xn), ...,  wm = fm(x1, ..., xn)

define a transformation T: R^n → R^m that maps the vector x = (x1, ..., xn) into the vector w = (w1, ..., wm). But it is only in the case where these equations are linear that T is a matrix transformation. The question that we

will now consider is this:

Question Are there algebraic properties of a transformation T: R^n → R^m that can be used to determine whether T is a matrix transformation?

The answer is provided by the following theorem.

THEOREM 4.10.2 T: R^n → R^m is a matrix transformation if and only if the following relationships hold for all vectors u and v in R^n and for every scalar k:

(i) T(u + v) = T(u) + T(v)

(ii) T(ku) = kT(u)

Proof If T is a matrix transformation, then properties (i) and (ii) follow respectively from parts (c) and (b) of Theorem 4.9.1.

Conversely, assume that properties (i) and (ii) hold. We must show that there exists an m × n matrix A such that

T(x) = Ax

for every vector x in R^n. As a first step, recall from Formula (10) of Section 4.9 that the additivity and homogeneity properties imply that

T(k1 x1 + k2 x2 + ... + kn xn) = k1 T(x1) + k2 T(x2) + ... + kn T(xn)   (9)

for all scalars k1, k2, ..., kn and all vectors x1, x2, ..., xn in R^n. Let A be the matrix

A = [T(e1) | T(e2) | ... | T(en)]

in which e1, e2, ..., en are the standard basis vectors for R^n. It follows from Theorem 1.3.1 that Ax is a linear combination of the columns of A in which the successive coefficients are the entries x1, x2, ..., xn of x. That is,

Ax = x1 T(e1) + x2 T(e2) + ... + xn T(en)

Using 9 we can rewrite this as

Ax = T(x1 e1 + x2 e2 + ... + xn en) = T(x)

which completes the proof.

The additivity and homogeneity properties in Theorem 4.10.2 are called linearity conditions, and a transformation that satisfies these conditions is called a linear transformation. Using this terminology Theorem

4.10.2 can be restated as follows.

THEOREM 4.10.3 Every linear transformation from R^n to R^m is a matrix transformation, and conversely, every matrix transformation from R^n to R^m is a linear transformation.
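The linearity conditions of Theorem 4.10.2 can be spot-checked numerically. The following Python/NumPy sketch (both transformations are our own examples, and random testing is only a plausibility check, not a proof) accepts a linear map and rejects a nonlinear one.

import numpy as np

def is_plausibly_linear(T, n, trials=100):
    # Numerically spot-check the linearity conditions of Theorem 4.10.2:
    # T(u + v) = T(u) + T(v) and T(k u) = k T(u) for random u, v, k.
    rng = np.random.default_rng(0)
    for _ in range(trials):
        u, v = rng.normal(size=n), rng.normal(size=n)
        k = rng.normal()
        if not (np.allclose(T(u + v), T(u) + T(v)) and np.allclose(T(k*u), k*T(u))):
            return False
    return True

T1 = lambda x: np.array([x[0] + 2*x[1], 3*x[0] - x[1]])   # linear (our example)
T2 = lambda x: np.array([x[0]**2, x[1] + 1])              # not linear (our example)

print(is_plausibly_linear(T1, 2))   # True
print(is_plausibly_linear(T2, 2))   # False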

More on the Equivalence Theorem As our final result in this section, we will add parts (b) and (c) of Theorem 4.10.1 to Theorem 4.8.10.

THEOREM 4.10.4 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent.

(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row echelon form of A is I_n.
(d) A is expressible as a product of elementary matrices.
(e) Ax = b is consistent for every n × 1 matrix b.
(f) Ax = b has exactly one solution for every n × 1 matrix b.
(g) det(A) ≠ 0.
(h) The column vectors of A are linearly independent.
(i) The row vectors of A are linearly independent.
(j) The column vectors of A span R^n.
(k) The row vectors of A span R^n.
(l) The column vectors of A form a basis for R^n.
(m) The row vectors of A form a basis for R^n.
(n) A has rank n.
(o) A has nullity 0.
(p) The orthogonal complement of the null space of A is R^n.
(q) The orthogonal complement of the row space of A is {0}.
(r) The range of T_A is R^n.
(s) T_A is one-to-one.

Concept Review • Composition of matrix transformations • Reflection about the origin • One-to-one transformation • Inverse of a matrix operator • Linearity conditions • Linear transformation • Equivalent characterizations of invertible matrices

Skills • Find the standard matrix for a composition of matrix transformations. • Determine whether a matrix operator is one-to-one; if it is, then find the inverse operator. • Determine whether a transformation is a linear transformation.

Exercise Set 4.10 In Exercises 1–2, let for and

and

be the operators whose standard matrices are given. Find the standard matrices

1.

Answer:

2.

3. Let (a) Find the standard matrices for (b) Find the standard matrices for

and and

. and

(c) Use the matrices obtained in part (b) to find formulas for Answer:

and

(a) (b) (c)

,

4. Let

and

(a) Find the standard matrices for (b) Find the standard matrices for

and

.

. and

.

(c) Use the matrices obtained in part (b) to find formulas for 5. Find the standard matrix for the stated composition in

and

.

.

(a) A rotation of 90°, followed by a reflection about the line

.

(b) An orthogonal projection on the y-axis, followed by a contraction with factor (c) A reflection about the x-axis, followed by a dilation with factor

.

.

Answer: (a) (b)

(c) 6. Find the standard matrix for the stated composition in

.

(a) A rotation of 60°, followed by an orthogonal projection on the x-axis, followed by a reflection about the line . (b) A dilation with factor

, followed by a rotation of 45°, followed by a reflection about the y-axis.

(c) A rotation of 15°, followed by a rotation of 105°, followed by a rotation of 60°. 7. Find the standard matrix for the stated composition in

.

(a) A reflection about the yz-plane, followed by an orthogonal projection on the xz-plane. (b) A rotation of 45° about the y-axis, followed by a dilation with factor

.

(c) An orthogonal projection on the xy-plane, followed by a reflection about the yz-plane. Answer: (a)

(b)

(c)

8. Find the standard matrix for the stated composition in

.

(a) A rotation of 30° about the x-axis, followed by a rotation of 30° about the z-axis, followed by a contraction with factor . (b) A reflection about the xy-plane, followed by a reflection about the xz-plane, followed by an orthogonal projection on the yz-plane. (c) A rotation of 270° about the x-axis, followed by a rotation of 90° about the y-axis, followed by a rotation of 180° about the z-axis. 9. Determine whether (a)

.

is the orthogonal projection on the x-axis, and

is the orthogonal projection on

the y-axis. (b)

is the rotation through an angle

, and

is the rotation through an angle

(c)

is the orthogonal projection on the x-axis, and

.

is the rotation through an angle

. Answer: (a) (b) (c) 10. Determine whether (a)

.

is a dilation by a factor k, and

is the rotation about the z-axis through an angle

. (b)

is the rotation about the x-axis through an angle the z-axis through an angle .

11. By inspection, determine whether the matrix operator is one-to-one. (a) the orthogonal projection on the x-axis in (b) the reflection about the y-axis in (c) the reflection about the line

in

(d) a contraction with factor

in

(e) a rotation about the z-axis in (f) a reflection about the xy-plane in (g) a dilation with factor

in

, and

is the rotation about

Answer: (a) Not one-to-one (b) One-to-one (c) One-to-one (d) One-to-one (e) One-to-one (f) One-to-one (g) One-to-one 12. Find the standard matrix for the matrix operator defined by the equations, and use Theorem 4.10.4 to determine whether the operator is one-to-one. (a) (b) (c)

(d)

13. Determine whether the matrix operator standard matrix for the inverse operator, and find (a) (b) (c) (d)

Answer: (a) One-to-one;

(b) Not one-to-one (c)

One-to-one;

defined by the equations is one-to-one; if so, find the .

(d) Not one-to-one 14. Determine whether the matrix operator

defined by the equations is one-to-one; if so, find the

standard matrix for the inverse operator, and find

.

(a)

(b)

(c)

(d)

15. By inspection, find the inverse of the given one-to-one matrix operator. (a) The reflection about the x-axis in

.

(b) The rotation through an angle of (c) The dilation by a factor of 3 in

in

.

.

(d) The reflection about the yz-plane in

.

(e) The contraction by a factor of

.

in

Answer: (a) Reflection about the x-axis (b) Rotation through the angle (c) Contraction by a factor of (d) Reflection about the yz-plane (e) Dilation by a factor of 5 In Exercises 16—17, use Theorem 4.10.2 to determine whether 16. (a) (b) (c) (d) 17. (a) (b)

is a matrix operator.

(c) (d) Answer: (a) Matrix operator (b) Not a matrix operator (c) Matrix operator (d) Not a matrix operator In Exercises 18–19, use Theorem 4.10.2 to determine whether

is a matrix transformation.

18. (a) (b) 19. (a) (b) Answer: (a) Matrix transformation (b) Matrix transformation 20. In each part, use Theorem 4.10.3 to find the standard matrix for the matrix operator from the images of the standard basis vectors. (a) The reflection operators on

in Table 1 of Section 4.9 .

(b) The reflection operators on

in Table 2 of Section 4.9 .

(c) The projection operators on

in Table 3 of Section 4.9 .

(d) The projection operators on

in Table 4 of Section 4.9 .

(e) The rotation operators on

in Table 5 of Section 4.9 .

(f) The dilation and contraction operators on

in Table 8 of Section 4.9 .

21. Find the standard matrix for the given matrix operator. (a)

projects a vector orthogonally onto the x-axis and then reflects that vector about the y-axis.

(b)

reflects a vector about the line

(c)

dilates a vector by a factor of 3, then reflects that vector about the line projects that vector orthogonally onto the y-axis.

Answer: (a)

and then reflects that vector about the x-axis. , and then

(b) (c) 22. Find the standard matrix for the given matrix operator. (a)

reflects a vector about the xz-plane and then contracts that vector by a factor of

.

(b)

projects a vector orthogonally onto the xz-plane and then projects that vector orthogonally onto the xy-plane.

(c)

reflects a vector about the xy-plane, then reflects that vector about the xz-plane, and then reflects that vector about the yz-plane.

23. Let

and let (a)

be multiplication by

,

, and ,

be the standard basis vectors for

. Find the following vectors by inspection.

, and

(b) (c) Answer: (a) (b) (c) 24. Determine whether multiplication by A is a one-to-one matrix transformation. (a)

(b) (c)

25. (a) Is a composition of one-to-one matrix transformations one-to-one? Justify your conclusion. (b) Can the composition of a one-to-one matrix transformation and a matrix transformation that is not one-to-one be one-to-one? Account for both possible orders of composition and justify your conclusion. Answer:

(a) Yes (b) Yes 26. Show that

defines a matrix operator on

but

27. (a) Prove: If is a matrix transformation, then into the zero vector in .

does not. ; that is, T maps the zero vector in

(b) The converse of this is not true. Find an example of a function that satisfies transformation.

but is not a matrix

Answer: (b) 28. Prove: An every vector

matrix A is invertible if and only if the linear system in for which the system is consistent.

29. Let A be an

matrix such that

, and let

has exactly one solution for

be multiplication by A.

(a) What can you say about the range of the matrix T? Give an example that illustrates your conclusion. (b) What can you say about the number of vectors that T maps into ? Answer: (a) The range of T is a proper subset of

.

(b) T must map infinitely many vectors to 0. 30. Prove: If the matrix transformation

is one-to-one, then A is invertible.

True-False Exercises In parts (a)–(f) determine whether the statement is true or false, and justify your answer. (a) If

and

, then T is a matrix transformation.

Answer: False (b) If and , then T is a matrix transformation.

for all scalars

and

and all vectors

and

in

Answer: True (c) If

is a one-to-one matrix transformation, then there are no distinct vectors .

Answer: True

and

for which

(d) If

is a matrix transformation and

, then T is one-to-one.

is a matrix transformation and

, then T is one-to-one.

is a matrix transformation and

, then T is one-to-one.

Answer: False (e) If Answer: False (f) If Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

4.11 Geometry of Matrix Operators on R^2

In this optional section we will discuss matrix operators on R^2 in a little more depth. The ideas that we will develop here have important applications to computer graphics.

Transformations of Regions

In Section 4.9 we focused on the effect that a matrix operator has on individual vectors in R^2 and R^3. However, it is also important to understand how such operators affect the shapes of regions. For example, Figure 4.11.1 shows a famous picture of Albert Einstein and three computer-generated modifications of that image that result from matrix operators on R^2. The original picture was scanned and then digitized to decompose it into a rectangular array of pixels. The pixels were then transformed as follows:
• The program MATLAB was used to assign coordinates and a gray level to each pixel.
• The coordinates of the pixels were transformed by matrix multiplication.
• The pixels were then assigned their original gray levels to produce the transformed picture.

Figure 4.11.1

The overall effect of a matrix operator on R^2 can often be ascertained by graphing the images of the vertices (0, 0), (1, 0), (1, 1), and (0, 1) of the unit square (Figure 4.11.2). Table 1 shows the effect that some of the matrix operators studied in Section 4.9 have on the unit square. For clarity, we have shaded a portion of the original square and its corresponding image.

Figure 4.11.2

Table 1

E X A M P L E 1 Transforming with Diagonal Matrices
Suppose that the xy-plane first is compressed or expanded by a factor of k1 in the x-direction and then is compressed or expanded by a factor of k2 in the y-direction. Find a single matrix operator that performs both operations.

Solution The standard matrices for the two operations are

[ k1  0 ]   (x-operation)       [ 1  0  ]   (y-operation)
[ 0   1 ]                       [ 0  k2 ]

Thus, the standard matrix for the composition of the x-operation followed by the y-operation is

[ 1  0  ][ k1  0 ] = [ k1  0  ]   (1)
[ 0  k2 ][ 0   1 ]   [ 0   k2 ]

This shows that multiplication by a diagonal matrix compresses or expands the plane in the x-direction and also in the y-direction. In the special case where k1 and k2 are the same, say k1 = k2 = k, Formula 1 simplifies to

[ k  0 ]
[ 0  k ]

which is a contraction or a dilation (Table 7 of Section 4.9).

E X A M P L E 2 Finding Matrix Operators
(a) Find the standard matrix for the operator on R^2 that first shears by a factor of 2 in the x-direction and then reflects the result about the line y = x. Sketch the image of the unit square under this operator.
(b) Find the standard matrix for the operator on R^2 that first reflects about y = x and then shears by a factor of 2 in the x-direction. Sketch the image of the unit square under this operator.
(c) Confirm that the shear and the reflection in parts (a) and (b) do not commute.

Solution (a) The standard matrix for the shear is

A1 = [ 1  2 ]
     [ 0  1 ]

and for the reflection is

A2 = [ 0  1 ]
     [ 1  0 ]

Thus, the standard matrix for the shear followed by the reflection is

A2 A1 = [ 0  1 ][ 1  2 ] = [ 0  1 ]
        [ 1  0 ][ 0  1 ]   [ 1  2 ]

(b) The standard matrix for the reflection followed by the shear is

A1 A2 = [ 1  2 ][ 0  1 ] = [ 2  1 ]
        [ 0  1 ][ 1  0 ]   [ 1  0 ]

(c) The computations in Solutions (a) and (b) show that A1 A2 ≠ A2 A1, so the standard matrices, and hence the operators, do not commute. The same conclusion follows from Figures 4.11.3 and 4.11.4, since the two operators produce different images of the unit square.

Figure 4.11.3

Figure 4.11.4

Geometry of One-to-One Matrix Operators We will now turn our attention to one-to-one matrix operators on , which are important because they map distinct points into distinct points. Recall from Theorem 4.10.4 (the Equivalence Theorem) that a matrix transformation is one-to-one if and only if A can be expressed as a product of elementary matrices. Thus, we can analyze the effect of any one-to-one transformation by first factoring the matrix A into a product of elementary matrices, say and then expressing

as the composition (2)

The following theorem explains the geometric effect of matrix operators corresponding to elementary matrices.

THEOREM 4.11.1 If E is a $2 \times 2$ elementary matrix, then multiplication by E is one of the following:
(a) A shear along a coordinate axis.
(b) A reflection about the line $y = x$.

(c) A compression along a coordinate axis. (d) An expansion along a coordinate axis. (e) A reflection about a coordinate axis. (f) A compression or expansion along a coordinate axis followed by a reflection about a coordinate axis.

Proof Because a $2 \times 2$ elementary matrix results from performing a single elementary row operation on the $2 \times 2$ identity matrix, such a matrix must have one of the following forms (verify):
$$\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},\quad \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$$
The first two matrices represent shears along coordinate axes, and the third represents a reflection about the line $y = x$. If $k > 0$, the last two matrices represent compressions or expansions along coordinate axes, depending on whether $0 < k < 1$ or $k > 1$. If $k < 0$, and if we express k in the form $k = -k_1$, where $k_1 > 0$, then the last two matrices can be written as
$$\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} \tag{3}$$
$$\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -k_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & k_1 \end{bmatrix} \tag{4}$$
Since $k_1 > 0$, the product in (3) represents a compression or expansion along the x-axis followed by a reflection about the y-axis, and (4) represents a compression or expansion along the y-axis followed by a reflection about the x-axis. In the case where $k = -1$ (that is, $k_1 = 1$), transformations (3) and (4) are simply reflections about the y-axis and x-axis, respectively.
Since every invertible matrix is a product of elementary matrices, the following result follows from Theorem 4.11.1 and Formula (2).

THEOREM 4.11.2 If $T_A$ is multiplication by an invertible matrix A, then the geometric effect of $T_A$ is the same as an appropriate succession of shears, compressions, expansions, and reflections.

E X A M P L E 3 Analyzing the Geometric Effect of a Matrix Operator
Assuming that $k_1$ and $k_2$ are positive, express the diagonal matrix
$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}$$
as a product of elementary matrices, and describe the geometric effect of multiplication by A in terms of compressions and expansions.
Solution From Example 1 we have
$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix}\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix}$$
which shows that multiplication by A has the geometric effect of compressing or expanding by a factor of $k_1$ in the x-direction and then compressing or expanding by a factor of $k_2$ in the y-direction.

E X A M P L E 4 Analyzing the Geometric Effect of a Matrix Operator Express

as a product of elementary matrices, and then describe the geometric effect of multiplication by A in terms of shears, compressions, expansions, and reflections. Solution A can be reduced to I as follows:

The three successive row operations can be performed by multiplying A on the left successively by

Inverting these matrices and using Formula 4 of Section 1.5 yields

Reading from right to left and noting that

it follows that the effect of multiplying by A is equivalent to 1. shearing by a factor of 2 in the x-direction, 2. then expanding by a factor of 2 in the y-direction, 3. then reflecting about the x-axis, 4. then shearing by a factor of 3 in the y-direction.
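A sketch of the factorization idea follows. The matrix used, A = [[1, 2], [3, 4]], is an assumption chosen only because it is consistent with the four operations listed in the solution (the matrix itself is not legible in this copy); the script simply multiplies the corresponding elementary matrices in the stated order.

```python
import numpy as np

E1 = np.array([[1.0, 2.0], [0.0, 1.0]])    # 1. shear by a factor of 2 in the x-direction
E2 = np.array([[1.0, 0.0], [0.0, 2.0]])    # 2. expansion by a factor of 2 in the y-direction
E3 = np.array([[1.0, 0.0], [0.0, -1.0]])   # 3. reflection about the x-axis
E4 = np.array([[1.0, 0.0], [3.0, 1.0]])    # 4. shear by a factor of 3 in the y-direction

# Applying the operations in the listed order corresponds to the product E4 E3 E2 E1.
A = E4 @ E3 @ E2 @ E1
print(A)     # [[1. 2.], [3. 4.]] under the stated assumption about A
```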

Images of Lines Under Matrix Operators Many images in computer graphics are constructed by connecting points with line segments. The following theorem, some of whose parts are proved in the exercises, is helpful for understanding how matrix operators transform such figures.

THEOREM 4.11.3 If

is multiplication by an invertible matrix, then:

(a) The image of a straight line is a straight line. (b) The image of a straight line through the origin is a straight line through the origin. (c) The images of parallel straight lines are parallel straight lines. (d) The image of the line segment joining points P and Q is the line segment joining the images of P and Q. (e) The images of three points lie on a line if and only if the points themselves lie on a line.

Note that it follows from Theorem 4.11.3 that if A is an invertible matrix, then multiplication by A maps triangles into triangles and parallelograms into parallelograms.

E X A M P L E 5 Image of a Square
Sketch the image of the square with vertices (0, 0), (1, 0), (0, 1), and (1, 1) under multiplication by A.
Solution Multiplying A by each vertex (written as a column vector) gives the images of the vertices, so the image of the square is a parallelogram with those image points as its vertices (Figure

Figure 4.11.5

E X A M P L E 6 Image of a Line
According to Theorem 4.11.3, an invertible matrix A maps a line into another line. Find an equation of the image line.
Solution Let $(x, y)$ be a point on the given line, and let $(x', y')$ be its image under multiplication by A. Then
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = A\begin{bmatrix} x \\ y \end{bmatrix} \quad\text{so}\quad \begin{bmatrix} x \\ y \end{bmatrix} = A^{-1}\begin{bmatrix} x' \\ y' \end{bmatrix}$$
Substituting the resulting expressions for x and y in the equation of the given line yields an equation that $(x', y')$ satisfies, which is the equation we want.
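The substitution method just described can be carried out symbolically. The sketch below uses SymPy with a hypothetical invertible matrix and a hypothetical line y = 2x + 1 standing in for the ones in the example; the symbols xp and yp play the role of x′ and y′.

```python
import sympy as sp

# Hypothetical data for illustration: an invertible matrix and the line y = 2x + 1.
A = sp.Matrix([[3, 1],
               [2, 1]])
xp, yp = sp.symbols("xp yp")      # xp, yp stand for x' and y', the image coordinates

# (x', y') = A(x, y), so (x, y) = A^{-1}(x', y')
x_expr, y_expr = A.inv() * sp.Matrix([xp, yp])

# Substitute these expressions into the equation y = 2x + 1 of the original line.
image_line = sp.Eq(y_expr, 2 * x_expr + 1)
print(sp.simplify(image_line))    # an equation in xp and yp: the image is again a line
```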

Concept Review • Effect of a matrix operator on the unit square • Geometry of one-to-one matrix operators • Images of lines under matrix operators

Skills • Find standard matrices for geometric transformations of

.

• Describe the geometric effect of an invertible matrix operator. • Find the image of the unit square under a matrix operator. • Find the image of a line under a matrix operator.

Exercise Set 4.11 1. Find the standard matrix for the operator (a) its reflection about the line

that maps a point

into

.

(b) its reflection through the origin. (c) its orthogonal projection on the x-axis. (d) its orthogonal projection on the y-axis. Answer: (a) (b) (c) (d) 2. For each part of Exercise 1, use the matrix you have obtained to compute by plotting the points and .

. Check your answers geometrically

3. Find the standard matrix for the operator

into

(a) its reflection through the xy-plane. (b) its reflection through the xz-plane. (c) its reflection through the yz-plane. Answer: (a)

that maps a point

(b)

(c)

4. For each part of Exercise 3, use the matrix you have obtained to compute geometrically by plotting the points and . 5. Find the standard matrix for the operator

. Check your answers

that

(a) rotates each vector 90° counterclockwise about the z-axis (looking along the positive z-axis toward the origin). (b) rotates each vector 90° counterclockwise about the x-axis (looking along the positive x-axis toward the origin). (c) rotates each vector 90° counterclockwise about the y-axis (looking along the positive y-axis toward the origin). Answer: (a)

(b)

(c)

6. Sketch the image of the rectangle with vertices

,

,

, and

under

(a) a reflection about the x-axis. (b) a reflection about the y-axis. (c) a compression of factor (d) an expansion of factor

in the y-direction. in the x-direction.

(e) a shear of factor

in the x-direction.

(f) a shear of factor

in the y-direction.

7. Sketch the image of the square with vertices

Answer: Rectangle with vertices at (0, 0), 8. Find the matrix that rotates a point (a) 45° (b) 90° (c) 180° (d) 270°

about the origin

, and

under multiplication by

(e) −30° 9. Find the matrix that shears by (a) a factor of

in the y-direction.

(b) a factor of

in the x-direction.

Answer: (a) (b) 10. Find the matrix that compresses or expands by (a) a factor of

in the y-direction.

(b) a factor of 6 in the x-direction. 11. In each part, describe the geometric effect of multiplication by A. (a) (b) (c)

Answer: (a) Expansion by a factor of 3 in the x-direction (b) Expansion by a factor of 5 in the y-direction and reflection about the x-axis (c) Shearing by a factor of 4 in the x-direction 12. In each part, express the matrix as a product of elementary matrices, and then describe the effect of multiplication by A in terms of compressions, expansions, reflections, and shears. (a) (b) (c) (d) 13. In each part, find a single matrix that performs the indicated succession of operations. (a) Compresses by a factor of

in the x-direction, then expands by a factor of 5 in the y-direction.

(b) Expands by a factor of 5 in the y-direction, then shears by a factor of 2 in the y-direction. (c) Reflects about Answer:

, then rotates through an angle of 180° about the origin.

(a)

(b) (c) 14. In each part, find a single matrix that performs the indicated succession of operations. (a) Reflects about the y-axis, then expands by a factor of 5 in the x-direction, and then reflects about (b) Rotates through 30° about the origin, then shears by a factor of factor of 3 in the y-direction.

.

in the y-direction, and then expands by a

15. Use matrix inversion to show the following. (a) The inverse transformation for a reflection about

is a reflection about

.

(b) The inverse transformation for a compression along an axis is an expansion along that axis. (c) The inverse transformation for a reflection about a coordinate axis is a reflection about that axis. (d) The inverse transformation for a shear along a coordinate axis is a shear along that axis. 16. Find an equation of the image of the line

under multiplication by

17. In parts (a) through (e), find an equation of the image of the line

under

(a) a shear of factor 3 in the x-direction. (b) a compression of factor (c) a reflection about

in the y-direction. .

(d) a reflection about the y-axis. (e) a rotation of 60° about the origin. Answer: (a) (b) (c) (d) (e)

18. Find the matrix for a shear in the x-direction that transforms the triangle with vertices a right triangle with the right angle at the origin. 19. (a) Show that multiplication by

maps each point in the plane onto the line

.

, and

into

(b) It follows from part (a) that the noncollinear points violate part (e) of Theorem 4.11.3?

are mapped onto a line. Does this

Answer: (b) No 20. Prove part (a) of Theorem 4.11.3. [Hint: A line in the plane has an equation of the form , where A and B are not both zero. Use the method of Example 6 to show that the image of this line under multiplication by the invertible matrix

has the equation

, where

and Then show that

and

are not both zero to conclude that the image is a line.]

21. Use the hint in Exercise 20 to prove parts (b) and (c) of Theorem 4.11.3. 22. In each part of the accompanying figure, find the standard matrix for the operator described.

Figure Ex-22 23. In the shear in the xy-direction with factor k is the matrix transformation that moves each point to the xy-plane to the new position . (See the accompanying figure.)

parallel

(a) Find the standard matrix for the shear in the xy-direction with factor k. (b) How would you define the shear in the xz-direction with factor k and the shear in the yz-direction with factor k? Find the standard matrices for these matrix transformations.

Figure Ex-23

Answer: (a)

(b) Shear in the xz-direction with

factor k maps (x, y, z) to

:

.

Shear in the yz-direction with factor k maps (x, y, z) to

:

.

True-False Exercises In parts (a)–(g) determine whether the statement is true or false, and justify your answer. (a) The image of the unit square under a one-to-one matrix operator is a square. Answer: False (b) A invertible matrix operator has the geometric effect of a succession of shears, compressions, expansions, and reflections. Answer: True (c) The image of a line under a one-to-one matrix operator is a line. Answer: True (d) Every reflection operator on

is its own inverse.

Answer: True (e)

The matrix

represents reflection about a line.

Answer: False (f)

The matrix Answer:

represents a shear.

False (g)

The matrix

represents an expansion.

Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

4.12 Dynamical Systems and Markov Chains In this optional section we will show how matrix methods can be used to analyze the behavior of physical systems that evolve over time. The methods that we will study here have been applied to problems in business, ecology, demographics, sociology, and most of the physical sciences.

Dynamical Systems A dynamical system is a finite set of variables whose values change with time. The value of a variable at a point in time is called the state of the variable at that time, and the vector formed from these states is called the state of the dynamical system at that time. Our primary objective in this section is to analyze how the state of a dynamical system changes with time. Let us begin with an example.

E X A M P L E 1 Market Share as a Dynamical System Suppose that two competing television channels, channel 1 and channel 2, each have 50% of the viewer market at some initial point in time. Assume that over each one-year period channel 1 captures 10% of channel 2's share, and channel 2 captures 20% of channel 1's share (see Figure 4.12.1). What is each channel's market share after one year?

Figure 4.12.1
Solution Let us begin by introducing the time-dependent variables
$$x_1(t) = \text{fraction of the market held by channel 1 at time } t$$
$$x_2(t) = \text{fraction of the market held by channel 2 at time } t$$
and the column vector
$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$
The variables $x_1(t)$ and $x_2(t)$ form a dynamical system whose state at time t is the vector $\mathbf{x}(t)$. If we take $t = 0$ to be the starting point at which the two channels had 50% of the market, then the state of the system at that time is
$$\mathbf{x}(0) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \tag{1}$$
Now let us try to find the state of the system at time $t = 1$ (one year later). Over the one-year period, channel 1 retains 80% of its initial 50%, and it gains 10% of channel 2's initial 50%. Thus,
$$x_1(1) = 0.8(0.5) + 0.1(0.5) = 0.45 \tag{2}$$
Similarly, channel 2 gains 20% of channel 1's initial 50%, and retains 90% of its initial 50%. Thus,
$$x_2(1) = 0.2(0.5) + 0.9(0.5) = 0.55 \tag{3}$$
Therefore, the state of the system at time $t = 1$ is
$$\mathbf{x}(1) = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} \tag{4}$$

E X A M P L E 2 Evolution of Market Share over Five Years
Track the market shares of channels 1 and 2 in Example 1 over a five-year period.
Solution To solve this problem suppose that we have already computed the market share of each channel at time t and we are interested in using the known values of $x_1(t)$ and $x_2(t)$ to compute the market shares $x_1(t+1)$ and $x_2(t+1)$ one year later. The analysis is exactly the same as that used to obtain Equations (2) and (3). Over the one-year period, channel 1 retains 80% of its starting fraction $x_1(t)$ and gains 10% of channel 2's starting fraction $x_2(t)$. Thus,
$$x_1(t+1) = 0.8x_1(t) + 0.1x_2(t) \tag{5}$$
Similarly, channel 2 gains 20% of channel 1's starting fraction $x_1(t)$ and retains 90% of its own starting fraction $x_2(t)$. Thus,
$$x_2(t+1) = 0.2x_1(t) + 0.9x_2(t) \tag{6}$$
Equations (5) and (6) can be expressed in matrix form as
$$\mathbf{x}(t+1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\mathbf{x}(t) \tag{7}$$
which provides a way of using matrix multiplication to compute the state of the system at time $t + 1$ from the state at time t. For example, using (1) and (7) we obtain
$$\mathbf{x}(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}$$
which agrees with (4). Similarly,
$$\mathbf{x}(2) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} = \begin{bmatrix} 0.415 \\ 0.585 \end{bmatrix}$$
We can now continue this process, using Formula (7) to compute $\mathbf{x}(3)$ from $\mathbf{x}(2)$, then $\mathbf{x}(4)$ from $\mathbf{x}(3)$, and so on. This yields (verify)
$$\mathbf{x}(3) = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix},\quad \mathbf{x}(4) = \begin{bmatrix} 0.37335 \\ 0.62665 \end{bmatrix},\quad \mathbf{x}(5) = \begin{bmatrix} 0.361345 \\ 0.638655 \end{bmatrix} \tag{8}$$
Thus, after five years, channel 1 will hold about 36% of the market and channel 2 will hold about 64% of the market.

If desired, we can continue the market analysis in the last example beyond the five-year period and explore what happens to the market share over the long term. We did so, using a computer, and obtained the following state vectors (rounded to six decimal places): (9) All subsequent state vectors, when rounded to six decimal places, are the same as , so we see that the market shares eventually stabilize with channel 1 holding about one-third of the market and channel 2 holding about two-thirds. Later in this section, we will explain why this stabilization occurs.
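A quick numerical sketch of this long-term behavior can be obtained by iterating Formula (7) directly; the script below uses the transition data stated in Example 1 (80%/20% and 90%/10%) and the 50/50 starting shares.

```python
import numpy as np

# Transition data from Example 1: channel 1 keeps 80% of its share and gains 10% of
# channel 2's share each year; channel 2 keeps 90% and gains 20% of channel 1's share.
P = np.array([[0.80, 0.10],
              [0.20, 0.90]])
x = np.array([0.50, 0.50])         # both channels start with half the market

for year in range(1, 11):
    x = P @ x                      # state vector one year later, as in Formula (7)
    print(year, np.round(x, 6))
# The printed vectors approach (1/3, 2/3), matching the stabilization described above.
```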

Markov Chains In many dynamical systems the states of the variables are not known with certainty but can be expressed as probabilities; such dynamical systems are called stochastic processes (from the Greek word stochastikos, meaning “proceeding by guesswork”). A detailed study of stochastic processes requires a precise definition of the term probability, which is outside the scope of this course. However, the following interpretation will suffice for our present purposes: Stated informally, the probability that an experiment or observation will have a certain outcome is approximately the fraction of the time that the outcome would occur if the experiment were to be repeated many times under constant conditions—the greater the number of repetitions, the more accurately the probability describes the fraction of occurrences.

For example, when we say that the probability of tossing heads with a fair coin is

, we mean that if the coin were

tossed many times under constant conditions, then we would expect about half of the outcomes to be heads. Probabilities are often expressed as decimals or percentages. Thus, the probability of tossing heads with a fair coin can also be expressed as 0.5 or 50%. If an experiment or observation has n possible outcomes, then the probabilities of those outcomes must be nonnegative fractions whose sum is 1. The probabilities are nonnegative because each describes the fraction of occurrences of an outcome over the long term, and the sum is 1 because they account for all possible outcomes. For example, if a box containing 10 balls has one red ball, three green balls, and six yellow balls, and if a ball is drawn at random from the box, then the probabilities of the various outcomes are

Each probability is a nonnegative fraction and In a stochastic process with n possible states, the state vector at each time t has the form

The entries in this vector must add up to 1 since they account for all n possibilities. In general, a vector with nonnegative entries that add up to 1 is called a probability vector.

E X A M P L E 3 Example 1 Revisited from the Probability Viewpoint Observe that the state vectors in Example 1 and Example 2 are all probability vectors. This is to be expected since the entries in each state vector are the fractional market shares of the channels, and together they account for the entire market. In practice, it is preferable to interpret the entries in the state vectors as probabilities rather than exact market fractions, since market information is usually obtained by statistical sampling procedures with intrinsic uncertainties. Thus, for example, the state vector

which we interpreted in Example 1 to mean that channel 1 has 45% of the market and channel 2 has 55%, can also be interpreted to mean that an individual picked at random from the market will be a channel 1 viewer with probability 0.45 and a channel 2 viewer with probability 0.55.

A square matrix, each of whose columns is a probability vector, is called a stochastic matrix. Such matrices commonly occur in formulas that relate successive states of a stochastic process. For example, the state vectors and in 7 are related by an equation of the form in which (10) is a stochastic matrix. It should not be surprising that the column vectors of P are probability vectors, since the entries in each column provide a breakdown of what happens to each channel's market share over the year—the entries in column 1 convey that each year channel 1 retains 80% of its market share and loses 20%; and the entries in column 2 convey that each year channel 2 retains 90% of its market share and loses 10%. The entries in 10 can also be viewed as probabilities:

Example 1 is a special case of a large class of stochastic processes, called Markov chains.

Andrei Andreyevich Markov (1856–1922) Historical Note Markov chains are named in honor of the Russian mathematician A. A. Markov, a lover of poetry, who used them to analyze the alternation of vowels and consonants in the poem Eugene Onegin by Pushkin. Markov believed that the only applications of his chains were to the analysis of literary works, so he would be astonished to learn that his discovery is used today in the social sciences, quantum theory, and genetics! [Image: wikipedia]

DEFINITION 1 A Markov chain is a dynamical system whose state vectors at a succession of time intervals are probability vectors and for which the state vectors at successive time intervals are related by an equation of the form
$$\mathbf{x}(k+1) = P\,\mathbf{x}(k)$$
in which $P = [p_{ij}]$ is a stochastic matrix and $p_{ij}$ is the probability that the system will be in state i at time $t = k + 1$ if it is in state j at time $t = k$. The matrix P is called the transition matrix for the system.

Remark Note that in this definition the row index i corresponds to the later state and the column index j to the earlier state (Figure 4.12.2).

Figure 4.12.2

E X A M P L E 4 Wildlife Migration as a Markov Chain Suppose that a tagged lion can migrate over three adjacent game reserves in search of food, reserve 1, reserve 2, and reserve 3. Based on data about the food resources, researchers conclude that the monthly migration pattern of the lion can be modeled by a Markov chain with transition matrix

(see Figure 4.12.3). That is,

Assuming that t is in months and the lion is released in reserve 2 at time locations over a six-month period.

, track its probable

Figure 4.12.3 Solution Let respectively, at time

, and , and let

be the probabilities that the lion is in reserve 1, 2, or 3,

be the state vector at that time. Since we know with certainty that the lion is in reserve 2 at time initial state vector is

, the

We leave it for you to show that the state vectors over a six-month period are

As in Example 2, the state vectors here seem to stabilize over time with a probability of approximately 0.504 that the lion is in reserve 1, a probability of approximately 0.227 that it is in reserve 2, and a probability of approximately 0.269 that it is in reserve 3.

Markov Chains in Terms of Powers of the Transition Matrix
In a Markov chain with an initial state of $\mathbf{x}(0)$, the successive state vectors are
$$\mathbf{x}(1) = P\mathbf{x}(0),\quad \mathbf{x}(2) = P\mathbf{x}(1),\quad \mathbf{x}(3) = P\mathbf{x}(2),\quad \mathbf{x}(4) = P\mathbf{x}(3),\ \ldots$$
For brevity, it is common to denote $\mathbf{x}(k)$ by $\mathbf{x}_k$, which allows us to write the successive state vectors more briefly as
$$\mathbf{x}_1 = P\mathbf{x}_0,\quad \mathbf{x}_2 = P\mathbf{x}_1,\quad \mathbf{x}_3 = P\mathbf{x}_2,\quad \mathbf{x}_4 = P\mathbf{x}_3,\ \ldots \tag{11}$$
Alternatively, these state vectors can be expressed in terms of the initial state vector $\mathbf{x}_0$ as
$$\mathbf{x}_1 = P\mathbf{x}_0,\quad \mathbf{x}_2 = P(P\mathbf{x}_0) = P^2\mathbf{x}_0,\quad \mathbf{x}_3 = P(P^2\mathbf{x}_0) = P^3\mathbf{x}_0,\ \ldots$$
from which it follows that
$$\mathbf{x}_k = P^k\mathbf{x}_0 \tag{12}$$
Note that Formula (12) makes it possible to compute the state vector $\mathbf{x}_k$ without first computing the earlier state vectors, as required in Formula (11).

E X A M P L E 5 Finding a State Vector Directly from x0
Use Formula (12) to find the state vector $\mathbf{x}_5$ in Example 2.
Solution From (1) and (7), the initial state vector and transition matrix are
$$\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$$
We leave it for you to calculate $P^5$ and show that
$$\mathbf{x}_5 = P^5\mathbf{x}_0 = \begin{bmatrix} 0.361345 \\ 0.638655 \end{bmatrix}$$
which agrees with the result in (8).
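A one-line numerical illustration of Formula (12) for this example, using NumPy's built-in matrix power:

```python
import numpy as np

P = np.array([[0.80, 0.10],
              [0.20, 0.90]])
x0 = np.array([0.50, 0.50])

# Formula (12): x_5 = P^5 x_0, computed without the intermediate state vectors.
x5 = np.linalg.matrix_power(P, 5) @ x0
print(x5)     # approximately [0.361345, 0.638655]
```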

Long-Term Behavior of a Markov Chain We have seen two examples of Markov chains in which the state vectors seem to stabilize after a period of time. Thus, it is reasonable to ask whether all Markov chains have this property. The following example shows that this is not the case.

E X A M P L E 6 A Markov Chain That Does Not Stabilize
The matrix
$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$
is stochastic and hence can be regarded as the transition matrix for a Markov chain. A simple calculation shows that $P^2 = I$, from which it follows that
$$I = P^2 = P^4 = \cdots \quad\text{and}\quad P = P^3 = P^5 = \cdots$$
Thus, the successive states in the Markov chain with initial vector $\mathbf{x}_0$ are
$$\mathbf{x}_0,\ P\mathbf{x}_0,\ \mathbf{x}_0,\ P\mathbf{x}_0,\ \ldots$$
which oscillate between $\mathbf{x}_0$ and $P\mathbf{x}_0$. Thus, the Markov chain does not stabilize unless both components of $\mathbf{x}_0$ are $\tfrac{1}{2}$ (verify).

A precise definition of what it means for a sequence of numbers or vectors to stabilize is given in calculus; however, that level of precision will not be needed here. Stated informally, we will say that a sequence of vectors approaches a limit or that it converges to if all entries in can be made as close as we like to the corresponding entries in the vector by taking k sufficiently large. We denote this by writing as . We saw in Example 6 that the state vectors of a Markov chain need not approach a limit in all cases. However, by imposing a mild condition on the transition matrix of a Markov chain, we can guarantee that the state vectors will approach a limit.

DEFINITION 2 A stochastic matrix P is said to be regular if P or some positive power of P has all positive entries, and a Markov chain whose transition matrix is regular is said to be a regular Markov chain.

E X A M P L E 7 Regular Stochastic Matrices

The transition matrices in Example 2 and Example 4 are regular because their entries are positive. The matrix

is regular because

has positive entries. The matrix P in Example 6 is not regular because P and every positive power of P have some zero entries (verify).

The following theorem, which we state without proof, is the fundamental result about the long-term behavior of Markov chains.

THEOREM 4.12.1 If P is the transition matrix for a regular Markov chain, then:
(a) There is a unique probability vector $\mathbf{q}$ such that $P\mathbf{q} = \mathbf{q}$.
(b) For any initial probability vector $\mathbf{x}_0$, the sequence of state vectors
$$\mathbf{x}_0,\ P\mathbf{x}_0,\ \ldots,\ P^k\mathbf{x}_0,\ \ldots$$
converges to $\mathbf{q}$.
The vector $\mathbf{q}$ in this theorem is called the steady-state vector of the Markov chain. It can be found by rewriting the equation in part (a) as
$$(I - P)\mathbf{q} = \mathbf{0}$$
and then solving this equation for $\mathbf{q}$ subject to the requirement that $\mathbf{q}$ be a probability vector. Here are some examples.

E X A M P L E 8 Example 1 and Example 2 Revisited
The transition matrix for the Markov chain in Example 2 is
$$P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$$
Since the entries of P are positive, the Markov chain is regular and hence has a unique steady-state vector $\mathbf{q}$. To find $\mathbf{q}$ we will solve the system $(I - P)\mathbf{q} = \mathbf{0}$, which we can write as
$$\begin{aligned} 0.2q_1 - 0.1q_2 &= 0 \\ -0.2q_1 + 0.1q_2 &= 0 \end{aligned}$$
The general solution of this system is $q_1 = 0.5s,\ q_2 = s$ (verify), which we can write in vector form as
$$\mathbf{q} = \begin{bmatrix} 0.5s \\ s \end{bmatrix} \tag{13}$$
For $\mathbf{q}$ to be a probability vector, we must have $q_1 + q_2 = 1$, which implies that $s = \tfrac{2}{3}$. Substituting this value in (13) yields the steady-state vector
$$\mathbf{q} = \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \end{bmatrix}$$
which is consistent with the numerical results obtained in (9).
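A computational sketch of this steady-state calculation: it solves (I − P)q = 0 together with the condition that the entries of q sum to 1, using a least-squares solve of the stacked (consistent) system. The matrix P is the one from Examples 1 and 2.

```python
import numpy as np

P = np.array([[0.80, 0.10],
              [0.20, 0.90]])
n = P.shape[0]

# Stack the equations (I - P) q = 0 with the extra equation q_1 + ... + q_n = 1,
# then solve the consistent overdetermined system in the least-squares sense.
M = np.vstack([np.eye(n) - P, np.ones((1, n))])
b = np.zeros(n + 1)
b[-1] = 1.0

q, *_ = np.linalg.lstsq(M, b, rcond=None)
print(q)     # approximately [1/3, 2/3], the steady-state vector
```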

E X A M P L E 9 Example 4 Revisited The transition matrix for the Markov chain in Example 4 is

Since the entries of P are positive, the Markov chain is regular and hence has a unique steady-state vector To find we will solve the system , which we can write (using fractions) as

(14)

(We have converted to fractions to avoid roundoff error in this illustrative example.) We leave it for you to confirm that the reduced row echelon form of the coefficient matrix is

and that the general solution of 14 is (15) For

to be a probability vector we must have

, from which it follows that

(verify). Substituting this value in 15 yields the steady-state vector

(verify), which is consistent with the results obtained in Example 4.

Concept Review • Dynamical system • State of a variable • State of a dynamical system • Stochastic process • Probability • Probability vector • Stochastic matrix • Markov chain • Transition matrix • Regular stochastic matrix • Regular Markov chain • Steady-state vector

Skills • Determine whether a matrix is stochastic. • Compute the state vectors from a transition matrix and an initial state. • Determine whether a stochastic matrix is regular. • Determine whether a Markov chain is regular. • Find the steady-state vector for a regular transition matrix.

Exercise Set 4.12 In Exercises 1–2, determine whether A is a stochastic matrix. If A is not stochastic, then explain why not. 1. (a) (b)

(c)

(d)

Answer: (a) Stochastic (b) Not stochastic (c) Stochastic (d) Not stochastic 2. (a) (b) (c)

(d)

In Exercises 3–4, use Formulas 11 and 12 to compute the state vector 3.

; Answer:

4. In Exercises 5–6, determine whether P is a regular stochastic matrix.

in two different ways.

5. (a)

(b)

(c)

Answer: (a) Regular (b) Not regular (c) Regular 6. (a)

(b)

(c)

In Exercises 7–10, verify that P is a regular stochastic matrix, and find the steady-state vector for the associated Markov chain. 7.

Answer:

8.

9.

Answer:

10.

11. Consider a Markov process with transition matrix

(a) What does the entry 0.2 represent? (b) What does the entry 0.1 represent? (c) If the system is in state 1 initially, what is the probability that it will be in state 2 at the next observation? (d) If the system has a 50% chance of being in state 1 initially, what is the probability that it will be in state 2 at the next observation? Answer: (a) Probability that something in state 1 stays in state 1 (b) Probability that something in state 2 moves to state 1 (c) 0.8 (d) 0.85 12. Consider a Markov process with transition matrix

(a) What does the entry

represent?

(b) What does the entry 0 represent?

(c) If the system is in state 1 initially, what is the probability that it will be in state 1 at the next observation? (d) If the system has a 50% chance of being in state 1 initially, what is the probability that it will be in state 2 at the next observation? 13. On a given day the air quality in a certain city is either good or bad. Records show that when the air quality is good on one day, then there is a 95% chance that it will be good the next day, and when the air quality is bad on one day, then there is a 45% chance that it will be bad the next day. (a) Find a transition matrix for this phenomenon. (b) If the air quality is good today, what is the probability that it will be good two days from now? (c) If the air quality is bad today, what is the probability that it will be bad three days from now? (d) If there is a 20% chance that the air quality will be good today, what is the probability that it will be good tomorrow? Answer: (a) (b) 0.93 (c) 0.142 (d) 0.63 14. In a laboratory experiment, a mouse can choose one of two food types each day, type I or type II. Records show that if the mouse chooses type I on a given day, then there is a 75% chance that it will choose type I the next day, and if it chooses type II on one day, then there is a 50% chance that it will choose type II the next day. (a) Find a transition matrix for this phenomenon. (b) If the mouse chooses type I today, what is the probability that it will choose type I two days from now? (c) If the mouse chooses type II today, what is the probability that it will choose type II three days from now? (d) If there is a 10% chance that the mouse will choose type I today, what is the probability that it will choose type I tomorrow? 15. Suppose that at some initial point in time 100,000 people live in a certain city and 25,000 people live in its suburbs. The Regional Planning Commission determines that each year 5% of the city population moves to the suburbs and 3% of the suburban population moves to the city. (a) Assuming that the total population remains constant, make a table that shows the populations of the city and its suburbs over a five-year period (round to the nearest integer). (b) Over the long term, how will the population be distributed between the city and its suburbs? Answer: (a) Year

1

2

3

4

5

City

95,750

91,840

88,243

84,933

81,889

Suburbs

29,250

33,160

36,757

40,067

43,111

(b) City

46,875

Suburbs

78,125

16. Suppose that two competing television stations, station 1 and station 2, each have 50% of the viewer market at some initial point in time. Assume that over each one-year period station 1 captures 5% of station 2's market share and station 2 captures 10% of station 1's market share. (a) Make a table that shows the market share of each station over a five-year period. (b) Over the long term, how will the market share be distributed between the two stations? 17. Suppose that a car rental agency has three locations, numbered 1, 2, and 3. A customer may rent a car from any of the three locations and return it to any of the three locations. Records show that cars are rented and returned in accordance with the following probabilities: Rented from Location 1

2

3

1 Returned to Location 2 3 (a) Assuming that a car is rented from location 1, what is the probability that it will be at location 1 after two rentals? (b) Assuming that this dynamical system can be modeled as a Markov chain, find the steady-state vector. (c) If the rental agency owns 120 cars, how many parking spaces should it allocate at each location to be reasonably certain that it will have enough spaces for the cars over the long term? Explain your reasoning. Answer: (a) (b)

(c) 35, 50, 35 18. Physical traits are determined by the genes that an offspring receives from its parents. In the simplest case a trait in the offspring is determined by one pair of genes, one member of the pair inherited from the male parent and the other from the female parent. Typically, each gene in a pair can assume one of two forms, called alleles, denoted by A and a. This leads to three possible pairings: called genotypes (the pairs Aa and aA determine the same trait and hence are not distinguished from one another). It is shown in the study of heredity that if a parent of known genotype is crossed with a random parent of unknown genotype, then the offspring will have the genotype probabilities given in the following table, which can be viewed as a transition matrix for a Markov process:

Genotype of Parent AA

Aa

aa

AA Genotype of Offspring

0

Aa aa

0

Thus, for example, the offspring of a parent of genotype AA that is crossed at random with a parent of unknown genotype will have a 50% chance of being AA, a 50% chance of being Aa, and no chance of being aa. (a) Show that the transition matrix is regular. (b) Find the steady-state vector, and discuss its physical interpretation. 19. Fill in the missing entries of the stochastic matrix

and find its steady-state vector. Answer:

20. If P is an

stochastic matrix, and if M is a

matrix whose entries are all 1's, then

21. If P is a regular stochastic matrix with steady-state vector , what can you say about the sequence of products as

?

Answer: for every positive integer k 22. (a) If P is a regular stochastic matrix with steady-state vector , and if vectors in column form, what can you say about the behavior of the sequence as

for each

are the standard unit

?

(b) What does this tell you about the behavior of the column vectors of

as

?

23. Prove that the product of two stochastic matrices is a stochastic matrix. [Hint: Write each column of the product as a linear combination of the columns of the first factor.

24. Prove that if P is a stochastic matrix whose entries are all greater than or equal to , then the entries of greater than or equal to .

are

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a) The vector

is a probability vector.

Answer: True (b)

The matrix

is a regular stochastic matrix.

Answer: True (c) The column vectors of a transition matrix are probability vectors. Answer: True (d) A steady-state vector for a Markov chain with transition matrix P is any solution of the linear system Answer: False (e) The square of every regular stochastic matrix is stochastic. Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

.

Chapter 4 Supplementary Exercises 1. Let V be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations on and : (a) Compute

and

for

,

, and

.

(b) In words, explain why V is closed under addition and scalar multiplication. (c) Since the addition operation on V is the standard addition operation on , certain vector space axioms hold for V because they are known to hold for . Which axioms in Definition 1 of Section 4.1 are they? (d) Show that Axioms 7, 8, and 9 hold. (e) Show that Axiom 10 fails for the given operations. Answer: (a) (c) Axioms 1–5 2. In each part, the solution space of the system is a subspace of and so must be a line through the origin, a plane through the origin, all of , or the origin only. For each system, determine which is the case. If the subspace is a plane, find an equation for it, and if it is a line, find parametric equations. (a) (b)

(c)

(d)

3. For what values of s is the solution space of

the origin only, a line through the origin, a plane through the origin, or all of Answer:

?

If

the solution space is the origin. If , the solution space is a plane through the origin. If , the solution space is a line through the origin.

4. (a) Express

as a linear combination of

(b) Express

and

.

as a linear combination of

and

. (c) Express

as a linear combination of three nonzero vectors.

5. Let W be the space spanned by

and

.

(a) Show that for any value of , (b) Show that

and

6. (a) Express different ways.

and

are vectors in W.

form a basis for W. as a linear combination of

,

, and

in two

(b) Explain why this does not violate Theorem 4.4.1. 7. Let A be an matrix, and let matrices. What must be true about A for

be linearly independent vectors in expressed as to be linearly independent?

Answer: A must be invertible

8. Must a basis for

contain a polynomial of degree k for each

? Justify your answer.

9. For the purpose of this exercise, let us define a “checkerboard matrix” to be a square matrix such that

Find the rank and nullity of the following checkerboard matrices. (a) The

checkerboard matrix.

(b) The

checkerboard matrix.

(c) The

checkerboard matrix.

Answer: (a) (b) (c) 10. For the purpose of this exercise, let us define an “X-matrix” to be a square matrix with an odd number of rows and columns that has 0's everywhere except on the two diagonals where it has 1's. Find the rank and nullity of the following X-matrices. (a)

(b)

(c) the X-matrix of size 11. In each part, show that the stated set of polynomials is a subspace of (a) All polynomials in

such that

(b) All polynomials in

such that

and find a basis for it.

. .

Answer: (a)

where

if n is even and

if n is odd.

(b) 12. (Calculus required) Show that the set of all polynomials in subspace of . Find a basis for this subspace. 13. (a) Find a basis for the vector space of all (b) Find a basis for the vector space of all

that have a horizontal tangent at

is a

symmetric matrices. skew-symmetric matrices.

Answer: (a)

(b)

14. Various advanced texts in linear algebra prove the following determinant criterion for rank: The rank of a matrix A is r if and only if A has some submatrix with a nonzero determinant, and all square submatrices of larger size have determinant zero. [Note: A submatrix of A is any matrix obtained by deleting rows or columns of A. The matrix A itself is also considered to be a submatrix of A.] In each part, use this criterion to find the rank of the matrix. (a) (b) (c)

(d)

15. Use the result in Exercise 14 above to find the possible ranks for matrices of the form

Answer: Possible ranks are 2, 1, and 0. 16. Prove: If S is a basis for a vector space , then for any vectors relationships hold. (a) (b)

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

and

in V and any scalar k, the following

CHAPTER

5

Eigenvalues and Eigenvectors

CHAPTER CONTENTS 5.1. Eigenvalues and Eigenvectors 5.2. Diagonalization 5.3. Complex Vector Spaces 5.4. Differential Equations

INTRODUCTION In this chapter we will focus on classes of scalars and vectors known as “eigenvalues” and “eigenvectors,” terms derived from the German word eigen, meaning “own,” “peculiar to,” “characteristic,” or “individual.” The underlying idea first appeared in the study of rotational motion but was later used to classify various kinds of surfaces and to describe solutions of certain differential equations. In the early 1900s it was applied to matrices and matrix transformations, and today it has applications in such diverse fields as computer graphics, mechanical vibrations, heat flow, population dynamics, quantum mechanics, and economics to name just a few.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

5.1 Eigenvalues and Eigenvectors In this section we will define the notions of “eigenvalue” and “eigenvector” and discuss some of their basic properties.

Definition of Eigenvalue and Eigenvector We begin with the main definition in this section.

DEFINITION 1 If A is an operator

) if

matrix, then a nonzero vector x in is a scalar multiple of x; that is,

is called an eigenvector of A (or of the matrix

for some scalar . The scalar is called an eigenvalue of A (or of eigenvector corresponding to .

), and x is said to be an

The requirement that an eigenvector be nonzero is imposed to avoid the unimportant case $A\mathbf{0} = \lambda\mathbf{0}$, which holds for every A and every $\lambda$.

In general, the image of a vector x under multiplication by a square matrix A differs from x in both magnitude and direction. However, in the special case where x is an eigenvector of A, multiplication by A leaves the direction unchanged. For example, in $R^2$ or $R^3$, multiplication by A maps each eigenvector x of A (if any) along the same line through the origin as x. Depending on the sign and magnitude of the eigenvalue $\lambda$ corresponding to x, the operation compresses or stretches x by a factor of $|\lambda|$, with a reversal of direction in the case where $\lambda$ is negative (Figure 5.1.1).

Figure 5.1.1

E X A M P L E 1 Eigenvector of a 2 × 2 Matrix

The vector

is an eigenvector of

corresponding to the eigenvalue

, since

Geometrically, multiplication by A has stretched the vector x by a factor of 3 (Figure 5.1.2).

Figure 5.1.2

Computing Eigenvalues and Eigenvectors Our next objective is to obtain a general procedure for finding eigenvalues and eigenvectors of an $n \times n$ matrix A. We will begin with the problem of finding the eigenvalues of A. Note first that the equation $A\mathbf{x} = \lambda\mathbf{x}$ can be rewritten as $A\mathbf{x} = \lambda I\mathbf{x}$, or equivalently, as $(\lambda I - A)\mathbf{x} = \mathbf{0}$. For $\lambda$ to be an eigenvalue of A this equation must have a nonzero solution for x. But it follows from parts (b) and (g) of Theorem 4.10.4 that this is so if and only if the coefficient matrix $\lambda I - A$ has a zero determinant. Thus, we have the following result.

THEOREM 5.1.1 If A is an $n \times n$ matrix, then $\lambda$ is an eigenvalue of A if and only if it satisfies the equation
$$\det(\lambda I - A) = 0 \tag{1}$$

This is called the characteristic equation of A.
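A small computational sketch of this result follows, using a hypothetical 2 × 2 matrix (not necessarily the one used in the examples): the roots of the characteristic polynomial agree with the eigenvalues returned by a numerical eigenvalue routine.

```python
import numpy as np

# Hypothetical 2 x 2 matrix used only for illustration.
A = np.array([[3.0, 0.0],
              [8.0, -1.0]])

# For a 2 x 2 matrix, det(lambda*I - A) = lambda^2 - tr(A)*lambda + det(A).
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]

print(np.roots(coeffs))        # roots of the characteristic polynomial
print(np.linalg.eigvals(A))    # eigenvalues computed directly; the two lists agree
```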

E X A M P L E 2 Finding Eigenvalues In Example 1 we observed that

is an eigenvalue of the matrix

but we did not explain how we found it. Use the characteristic equation to find all eigenvalues of this matrix. Solution It follows from Formula 1 that the eigenvalues of A are the solutions of the equation , which we can write as

from which we obtain (2) This shows that the eigenvalues of A are and . Thus, in addition to the eigenvalue noted in Example 1, we have discovered a second eigenvalue

.

When the determinant that appears on the left side of 1 is expanded, the result is a polynomial of degree n that is called the characteristic polynomial of A. For example, it follows from 2 that the characteristic polynomial of the matrix A in Example 2 is which is a polynomial of degree 2. In general, the characteristic polynomial of an in which the coefficient of follows that the equation

matrix has the form

is 1 (Exercise 17). Since a polynomial of degree n has at most n distinct roots, it

(3) has at most n distinct solutions and consequently that an matrix has at most n distinct eigenvalues. Since some of these solutions may be complex numbers, it is possible for a matrix to have complex eigenvalues, even if that matrix itself has real entries. We will discuss this issue in more detail later, but for now we will focus on examples in which the eigenvalues are real numbers.

E X A M P L E 3 Eigenvalues of a 3 × 3 Matrix Find the eigenvalues of

Solution The characteristic polynomial of A is

The eigenvalues of A must therefore satisfy the cubic equation (4)

To solve this equation, we will begin by searching for integer solutions. This task can be simplified by exploiting the fact that all integer solutions (if there are any) of a polynomial equation with integer coefficients

In applications involving large matrices it is often not feasible to compute the characteristic equation directly so other methods must be used to find eigenvalues. We will consider such methods in Chapter 9. must be divisors of the constant term, . Thus, the only possible integer solutions of 4 are the divisors of , that is, , , . Successively substituting these values in 4 shows that is an integer solution. As a consequence, must be a factor of the left side of 4. Dividing into shows that 4 can be rewritten as

Thus, the remaining solutions of 4 satisfy the quadratic equation which can be solved by the quadratic formula. Thus the eigenvalues of A are

E X A M P L E 4 Eigenvalues of an Upper Triangular Matrix Find the eigenvalues of the upper triangular matrix

Solution Recalling that the determinant of a triangular matrix is the product of the entries on the main diagonal (Theorem 2.1.2), we obtain

Thus, the characteristic equation is and the eigenvalues are which are precisely the diagonal entries of A.

The following general theorem should be evident from the computations in the preceding example.

THEOREM 5.1.2 If A is an triangular matrix (upper triangular, lower triangular, or diagonal), then the eigenvalues of A are the entries on the main diagonal of A.

E X A M P L E 5 Eigenvalues of a Lower Triangular Matrix By inspection, the eigenvalues of the lower triangular matrix

are

,

, and

.

Had Theorem 5.1.2 been available earlier, we could have anticipated the result obtained in Example 2.

THEOREM 5.1.3 If A is an $n \times n$ matrix, the following statements are equivalent.
(a) $\lambda$ is an eigenvalue of A.
(b) The system of equations $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has nontrivial solutions.
(c) There is a nonzero vector x such that $A\mathbf{x} = \lambda\mathbf{x}$.
(d) $\lambda$ is a solution of the characteristic equation $\det(\lambda I - A) = 0$.

Finding Eigenvectors and Bases for Eigenspaces Now that we know how to find the eigenvalues of a matrix, we will consider the problem of finding the corresponding eigenvectors. Since the eigenvectors corresponding to an eigenvalue $\lambda$ of a matrix A are the nonzero vectors that satisfy the equation $A\mathbf{x} = \lambda\mathbf{x}$, these eigenvectors are the nonzero vectors in the null space of the matrix $\lambda I - A$. We call this null space the eigenspace of A corresponding to $\lambda$. Stated another way, the eigenspace of A corresponding to the eigenvalue $\lambda$ is the solution space of the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$. Notice that $\mathbf{x} = \mathbf{0}$ is in every eigenspace even though it is not an eigenvector. Thus, it is the nonzero vectors in an eigenspace that are the eigenvectors.
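A sketch of this procedure using SymPy: the eigenspace for each eigenvalue is obtained as the null space of λI − A. The matrix is a hypothetical example, not necessarily the one worked in the text.

```python
import sympy as sp

# Hypothetical matrix used only for illustration.
A = sp.Matrix([[3, 0],
               [8, -1]])
lam = sp.symbols("lam")

# Eigenvalues: roots of the characteristic equation det(lam*I - A) = 0.
char_poly = (lam * sp.eye(2) - A).det()
eigenvalues = sp.solve(sp.Eq(char_poly, 0), lam)

# Eigenspace for each eigenvalue: a basis is given by the null space of lam*I - A.
for ev in eigenvalues:
    basis = (ev * sp.eye(2) - A).nullspace()
    print(ev, basis)
```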

E X A M P L E 6 Bases for Eigenspaces Find bases for the eigenspaces of the matrix

Solution In Example 1 we found the characteristic equation of A to be from which we obtained the eigenvalues and of A, one corresponding to each of these eigenvalues. By definition,

. Thus, there are two eigenspaces

is an eigenvector of A corresponding to an eigenvalue of , that is, of

If

if and only if x is a nontrivial solution

, then this equation becomes

whose general solution is

(verify) or in matrix form,

Thus,

is a basis for the eigenspace corresponding to . We leave it as an exercise for you to follow the pattern of these computations and show that

is a basis for the eigenspace corresponding to

.

Historical Note Methods of linear algebra are used in the emerging field of computerized face recognition. Researchers are working with the idea that every human face in a racial group is a combination of a few dozen primary shapes. For example, by analyzing three-dimensional scans of many faces, researchers at Rockefeller University have produced both an average head shape in the

Caucasian group—dubbed the meanhead (top row left in the figure to the left)—and a set of standardized variations from that shape, called eigenheads (15 of which are shown in the picture). These are so named because they are eigenvectors of a certain matrix that stores digitized facial information. Face shapes are represented mathematically as linear combinations of the eigenheads. [Image: Courtesy Dr. Joseph Atick, Dr. Norman Redlich, and Dr. Paul Griffith]

E X A M P L E 7 Eigenvectors and Bases for Eigenspaces Find bases for the eigenspaces of

Solution The characteristic equation of A is

, or in factored form,

(verify). Thus, the distinct eigenvalues of A are

and

, so there

are two eigenspaces of A. By definition,

is an eigenvector of A corresponding to , or in matrix form,

if and only if x is a nontrivial solution of

(5)

In the case where

, Formula 5 becomes

Solving this system using Gaussian elimination yields (verify) Thus, the eigenvectors of A corresponding to

Since

are the nonzero vectors of the form

are linearly independent (why?), these vectors form a basis for the eigenspace corresponding to . If

, then 5 becomes

Solving this system yields (verify) Thus, the eigenvectors corresponding to

is a basis for the eigenspace corresponding to

are the nonzero vectors of the form

.

Powers of a Matrix Once the eigenvalues and eigenvectors of a matrix A are found, it is a simple matter to find the eigenvalues and eigenvectors of any positive integer power of A; for example, if is an eigenvalue of A and x is a corresponding eigenvector, then which shows that following result.

is an eigenvalue of

and that x is a corresponding eigenvector. In general, we have the

THEOREM 5.1.4 If k is a positive integer, $\lambda$ is an eigenvalue of a matrix A, and x is a corresponding eigenvector, then $\lambda^k$ is an eigenvalue of $A^k$ and x is a corresponding eigenvector.
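A quick numerical check of Theorem 5.1.4 for a hypothetical matrix and k = 7 (the matrix and exponent are illustrative choices only):

```python
import numpy as np

# Hypothetical matrix used only for illustration.
A = np.array([[3.0, 0.0],
              [8.0, -1.0]])
k = 7

vals, vecs = np.linalg.eig(A)
Ak = np.linalg.matrix_power(A, k)

print(np.sort(np.linalg.eigvals(Ak)))   # eigenvalues of A^k ...
print(np.sort(vals ** k))               # ... equal lambda^k for each eigenvalue lambda of A

# Each eigenvector of A is also an eigenvector of A^k:
for i in range(A.shape[0]):
    x = vecs[:, i]
    print(np.allclose(Ak @ x, (vals[i] ** k) * x))
```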

E X A M P L E 8 Powers of a Matrix

In Example 7 we showed that the eigenvalues of

are and , so from Theorem 5.1.4 both . We also showed that

and

are eigenvalues of

are eigenvectors of A corresponding to the eigenvalue , so from Theorem 5.1.4 they are also eigenvectors of corresponding to . Similarly, the eigenvector

of A corresponding to the eigenvalue .

is also an eigenvector of

corresponding to

Eigenvalues and Invertibility The next theorem establishes a relationship between eigenvalues and the invertibility of a matrix.

THEOREM 5.1.5 A square matrix A is invertible if and only if $\lambda = 0$ is not an eigenvalue of A.

Proof Assume that A is an $n \times n$ matrix and observe first that $\lambda = 0$ is a solution of the characteristic equation
$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0$$
if and only if the constant term $c_n$ is zero. Thus, it suffices to prove that A is invertible if and only if $c_n \neq 0$. But
$$\det(\lambda I - A) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n$$
or, on setting $\lambda = 0$,
$$\det(-A) = c_n \quad\text{or}\quad (-1)^n\det(A) = c_n$$
It follows from the last equation that $\det(A) = 0$ if and only if $c_n = 0$, and this in turn implies that A is

E X A M P L E 9 Eigenvalues and Invertibility The matrix A in Example 7 is invertible since it has eigenvalues is zero. We leave it for you to check this conclusion by showing that

and

, neither of which .

More on the Equivalence Theorem As our final result in this section, we will use Theorem 5.1.5 to add one additional part to Theorem 4.10.4.

THEOREM 5.1.6 Equivalent Statements If A is an $n \times n$ matrix, then the following statements are equivalent.
(a) A is invertible.
(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
(c) The reduced row echelon form of A is $I_n$.
(d) A is expressible as a product of elementary matrices.
(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix b.
(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix b.
(g) $\det(A) \neq 0$.
(h) The column vectors of A are linearly independent.
(i) The row vectors of A are linearly independent.
(j) The column vectors of A span $R^n$.
(k) The row vectors of A span $R^n$.
(l) The column vectors of A form a basis for $R^n$.
(m) The row vectors of A form a basis for $R^n$.
(n) A has rank n.
(o) A has nullity 0.
(p) The orthogonal complement of the null space of A is $R^n$.
(q) The orthogonal complement of the row space of A is $\{\mathbf{0}\}$.
(r) The range of $T_A$ is $R^n$.
(s) $T_A$ is one-to-one.
(t) $\lambda = 0$ is not an eigenvalue of A.

This theorem relates all of the major topics we have studied thus far.

Concept Review • Eigenvector • Eigenvalue • Characteristic equation • Characteristic polynomial • Eigenspace • Equivalence Theorem

Skills • Find the eigenvalues of a matrix. • Find bases for the eigenspaces of a matrix.

Exercise Set 5.1 In Exercises 1–2, confirm by multiplication that x is an eigenvector of A, and find the corresponding eigenvalue. 1.

Answer: 5 2.

3. Find the characteristic equations of the following matrices: (a) (b)

(c) (d) (e) (f)

Answer: (a) (b) (c) (d) (e) (f) 4. Find the eigenvalues of the matrices in Exercise 3 5. Find bases for the eigenspaces of the matrices in Exercise 3 Answer: (a)

(b)

Basis for eigenspace corresponding to

; basis for eigenspace corresponding to

Basis for eigenspace corresponding to

(c) Basis for eigenspace corresponding to

(d) There are no eigenspaces. (e) (f)

Basis for eigenspace corresponding to Basis for eigenspace corresponding to

; basis for eigenspace corresponding to

6. Find the characteristic equations of the following matrices: (a)

(b)

(c)

(d)

(e)

(f)

7. Find the eigenvalues of the matrices in Exercise 6. Answer: (a) 1, 2, 3 (b) (c) (d) 2 (e) 2 (f) 8. Find bases for the eigenspaces of the matrices in Exercise 6. 9. Find the characteristic equations of the following matrices: (a)

(b)

Answer: (a) (b) 10. Find the eigenvalues of the matrices in Exercise 9. 11. Find bases for the eigenspaces of the matrices in Exercise 9. Answer: (a)

(b)

12. By inspection, find the eigenvalues of the following matrices: (a) (b)

(c)

13. Find the eigenvalues of

Answer:

for

14. Find the eigenvalues and bases for the eigenspaces of

for

15. Let A be a matrix, and call a line through the origin of invariant under A if Ax lies on the line when x does. Find equations for all lines in , if any, that are invariant under the given matrix. (a) (b) (c)

Answer: (a)

and

(b) No lines (c) 16. Find

given that A has

as its characteristic polynomial.

(a) (b) [Hint: See the proof of Theorem 5.1.5.] 17. Let A be an

matrix.

(a) Prove that the characteristic polynomial of A has degree n. (b) Prove that the coefficient of

in the characteristic polynomial is 1.

18. Show that the characteristic equation of a where

matrix A can be expressed as

is the trace of A.

19. Use the result in Exercise 18 to show that if

then the solutions of the characteristic equation of A are

Use this result to show that A has (a) two distinct real eigenvalues if (b) two repeated real eigenvalues if (c) complex conjugate eigenvalues if

. . .

,

20. Let A be the matrix in Exercise 19. Show that if

, then

are eigenvectors of A that correspond, respectively, to the eigenvalues

and

21. Use the result of Exercise 18 to prove that if then .

is the characteristic polynomial of a

22. Prove: If a, b, c, and d are integers such that

has integer eigenvalues—namely,

matrix A,

, then

and

.

23. Prove: If is an eigenvalue of an invertible matrix A, and x is a corresponding eigenvector, then an eigenvalue of , and x is a corresponding eigenvector. 24. Prove: If is an eigenvalue of A, x is a corresponding eigenvector, and s is a scalar, then eigenvalue of , and x is a corresponding eigenvector. 25. Prove: If is an eigenvalue of A and x is a corresponding eigenvector, then every scalar s, and x is a corresponding eigenvector.

is

is an

is an eigenvalue of

for

26. Find the eigenvalues and bases for the eigenspaces of

and then use Exercises 23 and 24 to find the eigenvalues and bases for the eigenspaces of (a) (b) (c) 27. (a) Prove that if A is a square matrix, then A and characteristic equation .]

have the same eigenvalues. [Hint: Look at the

(b) Show that A and need not have the same eigenspaces. [Hint: Use the result in Exercise 20 to find a matrix for which A and have different eigenspaces.] 28. Suppose that the characteristic polynomial of some matrix A is found to be . In each part, answer the question and explain your reasoning. (a) What is the size of A? (b) Is A invertible? (c) How many eigenspaces does A have?

29. The eigenvectors that we have been studying are sometimes called right eigenvectors to distinguish them from left eigenvectors, which are column matrices x that satisfy the equation for some scalar . What is the relationship, if any, between the right eigenvectors and corresponding eigenvalues of A and the left eigenvectors and corresponding eigenvalues of A?

True-False Exercises In parts (a)–(g) determine whether the statement is true or false, and justify your answer. (a) If A is a square matrix and

for some nonzero scalar , then x is an eigenvector of A.

Answer: False (b) If

is an eigenvalue of a matrix A, then the linear system

has only the trivial solution.

Answer: False (c) If the characteristic polynomial of a matrix A is

, then A is invertible.

Answer: True (d) If is an eigenvalue of a matrix A, then the eigenspace of A corresponding to of A corresponding to .

is the set of eigenvectors

Answer: False (e) If 0 is an eigenvalue of a matrix A, then

is singular.

Answer: True (f) The eigenvalues of a matrix A are the same as the eigenvalues of the reduced row echelon form of A. Answer: False (g) If 0 is an eigenvalue of a matrix A, then the set of columns of A is linearly independent. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

5.2 Diagonalization In this section we will be concerned with the problem of finding a basis for that consists of eigenvectors of an matrix A. Such bases can be used to study geometric properties of A and to simplify various numerical computations. These bases are also of physical significance in a wide variety of applications, some of which will be considered later in this text.

The Matrix Diagonalization Problem Our first objective in this section is to show that the following two seemingly different problems are equivalent.

Problem 1 Given an

matrix A, does there exist an invertible matrix P such that

Problem 2 Given an

matrix A, does A have n linearly independent eigenvectors?

is diagonal?

Similarity The matrix product that appears in Problem 1 is called a similarity transformation of the matrix A. Such products are important in the study of eigenvectors and eigenvalues, so we will begin with some terminology about them.

DEFINITION 1 If A and B are square matrices, then we say that B is similar to A if there is an invertible matrix P such that $B = P^{-1}AP$.
Note that if B is similar to A, then it is also true that A is similar to B, since we can express A as $A = Q^{-1}BQ$ by taking $Q = P^{-1}$. This being the case, we will usually say that A and B are similar matrices if either is similar to the other.

Similarity Invariants Similar matrices have many properties in common. For example, if $B = P^{-1}AP$, then it follows that A and B have the same determinant, since
$$\det(B) = \det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \frac{1}{\det(P)}\det(A)\det(P) = \det(A)$$

In general, any property that is shared by all similar matrices is called a similarity invariant or is said to be invariant under similarity. Table 1 lists the most important similarity invariants. The proofs of some of these results are given as exercises. Table 1 Similarity Invariants Property

Description

Determinant

A and

Invertibility

A is invertible if and only if

Rank

A and

have the same rank.

Nullity

A and

have the same nullity.

Trace

A and

have the same trace.

Characteristic polynomial

A and

have the same characteristic polynomial.

Eigenvalues

A and

have the same eigenvalues.

Eigenspace dimension

If is an eigenvalue of A and hence of corresponding to and the eigenspace of dimension.

have the same determinant. is invertible.

, then the eigenspace of A corresponding to have the same
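To make Table 1 concrete, here is a small NumPy sketch (not from the text) that checks several of these invariants numerically; the matrices A and P are hypothetical choices used only for illustration.

import numpy as np

# Hypothetical 3x3 matrix A and invertible matrix P chosen for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
B = np.linalg.inv(P) @ A @ P   # B is similar to A

# Determinant, rank, trace, and eigenvalues are similarity invariants.
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))          # True
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))    # True
print(np.isclose(np.trace(A), np.trace(B)))                    # True
print(np.allclose(np.sort(np.linalg.eigvals(A)),
                  np.sort(np.linalg.eigvals(B))))              # True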

Expressed in the language of similarity, Problem 1 posed above is equivalent to asking whether the matrix A is similar to a diagonal matrix. If so, the diagonal matrix will have all of the similarity-invariant properties of A, but will have a simpler form, making it easier to analyze and work with. This important idea has some associated terminology.

DEFINITION 2 A square matrix A is said to be diagonalizable if it is similar to some diagonal matrix; that is, if there exists an invertible matrix P such that P⁻¹AP is diagonal. In this case the matrix P is said to diagonalize A.

The following theorem shows that Problems 1 and 2 posed above are actually two different forms of the same mathematical problem.

THEOREM 5.2.1 If A is an n × n matrix, then the following statements are equivalent.

(a) A is diagonalizable. (b) A has n linearly independent eigenvectors.

Part (b) of Theorem 5.2.1 is equivalent to saying that there is a basis for consisting of eigenvectors of A. Why?

Proof (a)⇒ (b) Since A is assumed to be diagonalizable, it follows that there exists an invertible matrix P and a diagonal matrix D such that or, equivalently,

(1) If we denote the column vectors of P by , and if we assume that the diagonal entries of D are , then by Formula 6 of Section 1.3 the left side of 1 can be expressed as and, as noted in the comment following Example 1 of Section 1.7, the right side of 1 can be expressed as Thus, it follows from 1 that (2) Since P is invertible, we know from Theorem 5.1.6 that its column vectors are linearly independent (and hence nonzero). Thus, it follows from 2 that these n column vectors are eigenvectors of A. Proof (b)⇒ (a) Assume that A has n linearly independent eigenvectors, the corresponding eigenvalues. If we let

and if we let D be the diagonal matrix that has

, and that

are

as its successive diagonal entries, then

Since the column vectors of P are linearly independent, it follows from Theorem 5.1.6 that P is invertible, so that this last equation can be rewritten as , which shows that A is diagonalizable.

Procedure for Diagonalizing a Matrix The preceding theorem guarantees that an matrix A with n linearly independent eigenvectors is diagonalizable, and the proof suggests the following method for diagonalizing A.

Procedure for Diagonalizing a Matrix
Step 1. Confirm that the matrix is actually diagonalizable by finding n linearly independent eigenvectors. One way to do this is by finding a basis for each eigenspace and merging these basis vectors into a single set S. If this set has fewer than n vectors, then the matrix is not diagonalizable.
Step 2. Form the matrix P that has the vectors in S as its column vectors.
Step 3. The matrix P⁻¹AP will be diagonal, and its successive diagonal entries will be the eigenvalues corresponding to the successive column vectors of P.
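The three-step procedure can be sketched in NumPy as follows. This is an illustrative implementation, not the text's own algorithm: it relies on numpy.linalg.eig to produce candidate eigenvectors and uses the rank of P as a proxy for linear independence; the example matrix is hypothetical.

import numpy as np

def diagonalize(A, tol=1e-10):
    """Attempt to diagonalize A: find n independent eigenvectors, use them
    as the columns of P, and return P together with D = P^(-1) A P."""
    eigenvalues, P = np.linalg.eig(A)        # columns of P are eigenvectors
    if np.linalg.matrix_rank(P, tol=tol) < A.shape[0]:
        raise ValueError("A does not have n linearly independent eigenvectors")
    D = np.linalg.inv(P) @ A @ P             # should be diagonal
    return P, D

# Hypothetical example: this matrix has three linearly independent eigenvectors.
A = np.array([[4.0, 0.0, 1.0],
              [2.0, 3.0, 2.0],
              [1.0, 0.0, 4.0]])
P, D = diagonalize(A)
print(np.round(D, 10))   # off-diagonal entries are (numerically) zero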

E X A M P L E 1 Finding a Matrix P That Diagonalizes a Matrix A Find a matrix P that diagonalizes

Solution In Example 7 of the preceding section we found the characteristic equation of A to be and we found the following bases for the eigenspaces:

There are three basis vectors in total, so the matrix

diagonalizes A. As a check, you should verify that

In general, there is no preferred order for the columns of P. Since the ith diagonal entry of is an eigenvalue for the ith column vector of P, changing the order of the columns of P just changes the order of the eigenvalues on the diagonal of . Thus, had we written

in the preceding example, we would have obtained

E X A M P L E 2 A Matrix That Is Not Diagonalizable Find a matrix P that diagonalizes

Solution The characteristic polynomial of A is

so the characteristic equation is . Thus, the distinct eigenvalues of A are and . We leave it for you to show that bases for the eigenspaces are . Since A is a 3 × 3 matrix and there are only two basis vectors in total, A is not diagonalizable.

Alternative Solution If you are concerned only with determining whether a matrix is diagonalizable and not with actually finding a diagonalizing matrix P, then it is not necessary to compute bases for the eigenspaces; it suffices to find the dimensions of the eigenspaces. For this example, the eigenspace corresponding to is the solution space of the system

Since the coefficient matrix has rank 2 (verify), the nullity of this matrix is 1 by Theorem 4.8.2, and hence the eigenspace corresponding to is one-dimensional. The eigenspace corresponding to

is the solution space of the system

This coefficient matrix also has rank 2 and nullity 1 (verify), so the eigenspace corresponding to is also one-dimensional. Since the eigenspaces produce a total of two basis vectors, and since three are needed, the matrix A is not diagonalizable.

There is an assumption in Example 1 that the column vectors of P, which are made up of basis vectors from the various eigenspaces of A, are linearly independent. The following theorem, proved at the end of this section, shows that this is so.

THEOREM 5.2.2 If

are eigenvectors of a matrix A corresponding to distinct eigenvalues, then is a linearly independent set.

Remark Theorem 5.2.2 is a special case of a more general result: Suppose that are distinct eigenvalues and that we choose a linearly independent set in each of the corresponding eigenspaces. If we then merge all these vectors into a single set, the result will still be a linearly independent set. For example, if we choose three linearly independent vectors from one eigenspace and two linearly independent vectors from another eigenspace, then the five vectors together form a linearly independent set. We omit the proof. As a consequence of Theorem 5.2.2, we obtain the following important result.

THEOREM 5.2.3 If an

matrix A has n distinct eigenvalues, then A is diagonalizable.

Proof If p₁, p₂, …, pₙ are eigenvectors corresponding to the distinct eigenvalues λ₁, λ₂, …, λₙ, then by Theorem 5.2.2 these eigenvectors are linearly independent. Thus, A is diagonalizable by Theorem 5.2.1.

E X A M P L E 3 Using Theorem 5.2.3 We saw in Example 3 of the preceding section that

has three distinct eigenvalues: and

,

, and

. Therefore, A is diagonalizable

for some invertible matrix P. If needed, the matrix P can be found using the method shown in Example 1 of this section.

E X A M P L E 4 Diagonalizability of Triangular Matrices From Theorem 5.1.2, the eigenvalues of a triangular matrix are the entries on its main diagonal. Thus, a triangular matrix with distinct entries on the main diagonal is diagonalizable. For example,

is a diagonalizable matrix with eigenvalues

,

,

,

.

Computing Powers of a Matrix There are many applications in which it is necessary to compute high powers of a square matrix A. We will show next that if A happens to be diagonalizable, then the computations can be simplified by diagonalizing A. To start, suppose that A is a diagonalizable n × n matrix, that P diagonalizes A, and that
P⁻¹AP = D
Squaring both sides of this equation yields
(P⁻¹AP)² = D²
We can rewrite the left side of this equation as
(P⁻¹AP)² = P⁻¹APP⁻¹AP = P⁻¹A²P
from which we obtain the relationship P⁻¹A²P = D², which we can rewrite as A² = PD²P⁻¹. More generally, if k is a positive integer, then a similar computation will show that
Aᵏ = PDᵏP⁻¹
(3)

Formula 3 reveals that raising a diagonalizable matrix A to a positive integer power has the effect of raising its eigenvalues to that power.

Note that computing the right side of this formula involves only three matrix multiplications and the powers of the diagonal entries of D. For matrices of large size and high powers of , this involves substantially fewer operations than computing directly.
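A short NumPy check of Formula 3, using a hypothetical diagonalizable matrix; the comparison against numpy.linalg.matrix_power is only a sanity check, not part of the text.

import numpy as np

# A hypothetical diagonalizable matrix; P and the eigenvalues come from np.linalg.eig.
A = np.array([[1.0, 3.0],
              [4.0, 2.0]])
eigenvalues, P = np.linalg.eig(A)

k = 13
# Formula 3: A^k = P D^k P^(-1); D^k just raises each diagonal entry to the k-th power.
D_k = np.diag(eigenvalues ** k)
A_k = P @ D_k @ np.linalg.inv(P)
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))   # True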

E X A M P L E 5 Power of a Matrix Use 3 to find

, where

Solution We showed in Example 1 that the matrix A is diagonalized by

and that

Thus, it follows from 3 that

(4)

Remark With the method in the preceding example, most of the work is in diagonalizing A. Once that work is done, it can be used to compute any power of A. Thus, to compute A¹⁰⁰⁰ we need only change the exponents from 13 to 1000 in 4.

Eigenvalues of Powers of a Matrix Once the eigenvalues and eigenvectors of any square matrix A are found, it is a simple matter to find the eigenvalues and eigenvectors of any positive integer power of A. For example, if λ is an eigenvalue of A and x is a corresponding eigenvector, then
A²x = A(Ax) = A(λx) = λ(Ax) = λ(λx) = λ²x
which shows not only that λ² is an eigenvalue of A² but that x is a corresponding eigenvector. In general, we have the following result.

Note that diagonalizability is not a requirement in Theorem 5.2.4.

THEOREM 5.2.4 If is an eigenvalue of a square matrix A and x is a corresponding eigenvector, and if k is any positive integer, then is an eigenvalue of and x is a corresponding eigenvector.

Some problems that use this theorem are given in the exercises.

Geometric and Algebraic Multiplicity Theorem 5.2.3 does not completely settle the diagonalizability question since it only guarantees that a square matrix with n distinct eigenvalues is diagonalizable, but does not preclude the possibility that there may exist diagonalizable matrices with fewer than n distinct eigenvalues. The following example shows that this is indeed the case.

E X A M P L E 6 The Converse of Theorem 5.2.3 Is False Consider the matrices

It follows from Theorem 5.1.2 that both of these matrices have only one distinct eigenvalue, namely λ = 1, and hence only one eigenspace. We leave it as an exercise for you to solve the characteristic equations with λ = 1 and show that for I the eigenspace is three-dimensional (all of R³) and for J it is one-dimensional, consisting of all scalar multiples of a single vector.

This shows that the converse of Theorem 5.2.3 is false, since we have produced two matrices with fewer than three distinct eigenvalues, one of which is diagonalizable and the other of which is not.

A full excursion into the study of diagonalizability is left for more advanced courses, but we will touch on one theorem that is important to a fuller understanding of diagonalizability. It can be proved that if λ₀ is an eigenvalue of A, then the dimension of the eigenspace corresponding to λ₀ cannot exceed the number of times that (λ − λ₀) appears as a factor of the characteristic polynomial of A. For example, in Example 1 and Example 2 the characteristic polynomial is . Thus, the eigenspace corresponding to is at most (hence exactly) one-dimensional, and the eigenspace corresponding to is at most two-dimensional. In Example 1 the eigenspace corresponding to actually had dimension 2, resulting in diagonalizability, but in Example 2 the eigenspace corresponding to had only dimension 1, resulting in nondiagonalizability.

There is some terminology that is related to these ideas. If λ₀ is an eigenvalue of an n × n matrix A, then the dimension of the eigenspace corresponding to λ₀ is called the geometric multiplicity of λ₀, and the number of times that (λ − λ₀) appears as a factor in the characteristic polynomial of A is called the algebraic multiplicity of λ₀. The following theorem, which we state without proof, summarizes the preceding discussion.

THEOREM 5.2.5 Geometric and Algebraic Multiplicity If A is a square matrix, then: (a) For every eigenvalue of A, the geometric multiplicity is less than or equal to the algebraic multiplicity. (b) A is diagonalizable if and only if the geometric multiplicity of every eigenvalue is equal to the algebraic multiplicity.
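The following sketch computes both multiplicities numerically and applies the criterion in part (b) of Theorem 5.2.5. The matrix and the rounding used to group numerically equal eigenvalues are assumptions made for illustration.

import numpy as np
from collections import Counter

def multiplicities(A, tol=1e-8):
    """Return {eigenvalue: (geometric, algebraic)} for the square matrix A,
    grouping numerically equal eigenvalues together."""
    n = A.shape[0]
    eigenvalues = np.linalg.eigvals(A)
    # Algebraic multiplicity: how often the eigenvalue occurs as a root.
    algebraic = Counter(np.round(eigenvalues, 8))
    result = {}
    for lam, alg in algebraic.items():
        # Geometric multiplicity: nullity of (lam*I - A) = n - rank(lam*I - A).
        geo = n - np.linalg.matrix_rank(lam * np.eye(n) - A, tol=tol)
        result[lam] = (geo, alg)
    return result

# A hypothetical non-diagonalizable matrix (a 2x2 Jordan-type block).
J = np.array([[2.0, 1.0],
              [0.0, 2.0]])
m = multiplicities(J)
print(m)                                            # roughly {2.0: (1, 2)}
print(all(geo == alg for geo, alg in m.values()))   # False, so J is not diagonalizable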

OPTIONAL

We will complete this section with an optional proof of Theorem 5.2.2. Proof of Theorem 5.2.2 Let be eigenvectors of A corresponding to distinct eigenvalues . We will assume that are linearly dependent and obtain a contradiction. We can then conclude that are linearly independent. Since an eigenvector is nonzero by definition, is linearly independent. Let r be the largest integer such that is linearly independent. Since we are assuming that is linearly dependent, r satisfies . Moreover, by the definition of r, is linearly dependent. Thus, there are scalars , not all zero, such that

.

(5) Multiplying both sides of 5 by A and using the fact that we obtain (6) If we now multiply both sides of 5 by Since

and subtract the resulting equation from 6 we obtain

is a linearly independent set, this equation implies that

and since

are assumed to be distinct, it follows that (7)

Substituting these values in 5 yields Since the eigenvector

is nonzero, it follows that (8)

But equations 7 and 8 contradict the fact that

are not all zero so the proof is complete.

Concept Review • Similarity transformation • Similarity invariant • Similar matrices • Diagonalizable matrix • Geometric multiplicity • Algebraic multiplicity

Skills • Determine whether a square matrix A is diagonalizable. • Diagonalize a square matrix A. • Find powers of a matrix using similarity. • Find the geometric multiplicity and the algebraic multiplicity of an eigenvalue.

Exercise Set 5.2 In Exercises 1–4, show that A and B are not similar matrices. 1. Answer: Possible reason: Determinants are different. 2.

,

3. ,

Answer: Possible reason: Ranks are different. 4. , 5. Let A be a

matrix with characteristic equation

. What are the possible dimensions

for eigenspaces of A? Answer:

6. Let

(a) Find the eigenvalues of A. (b) For each eigenvalue , find the rank of the matrix

.

(c) Is A diagonalizable? Justify your conclusion. In Exercises 7–11, use the method of Exercise 6 to determine whether the matrix is diagonalizable. 7. Answer: Not diagonalizable

8. 9.

Answer: Not diagonalizable 10.

11.

Answer: Not diagonalizable In Exercises 12–15, find a matrix P that diagonalizes A, and compute

.

12. 13. Answer:

14.

15.

Answer:

In Exercises 16–21, find the geometric and algebraic multiplicity of each eigenvalue of the matrix A, and determine whether A is diagonalizable. If A is diagonalizable, then find a matrix P that diagonalizes A, and find .

16.

17.

Answer:

18.

19.

Answer:

20.

21.

Answer:

22. Use the method of Example 5 to compute

, where

23. Use the method of Example 5 to compute

, where

Answer:

24. In each part, compute the stated power of

25. Find

if n is a positive integer and

Answer:

26. Let

Show that (a) A is diagonalizable if

.

(b) A is not diagonalizable if

.

[Hint: See Exercise 19 of Section 5.1.] 27. In the case where the matrix A in Exercise 26 is diagonalizable, find a matrix P that diagonalizes A. [Hint: See Exercise 20 of Section 5.1.] Answer: One possibility is

where

28. Prove that similar matrices have the same rank. 29. Prove that similar matrices have the same nullity.

and

are as in Exercise 20 of Section 5.1.

30. Prove that similar matrices have the same trace. 31. Prove that if A is diagonalizable, then so is

for every positive integer k.

32. Prove that if A is a diagonalizable matrix, then the rank of A is the number of nonzero eigenvalues of A. 33. Suppose that the characteristic polynomial of some matrix A is found to be

.

In each part, answer the question and explain your reasoning. (a) What can you say about the dimensions of the eigenspaces of A? (b) What can you say about the dimensions of the eigenspaces if you know that A is diagonalizable? (c) If is a linearly independent set of eigenvectors of A all of which correspond to the same eigenvalue of A, what can you say about the eigenvalue? Answer: (a) (b) Dimensions will be exactly 1, 2, and 3. (c) 34. This problem will lead you through a proof of the fact that the algebraic multiplicity of an eigenvalue of an matrix A is greater than or equal to the geometric multiplicity. For this purpose, assume that is an eigenvalue with geometric multiplicity k. (a) Prove that there is a basis eigenspace corresponding to

for

in which the first k vectors of B form a basis for the

.

(b) Let P be the matrix having the vectors in B as columns. Prove that the product

can be expressed as

[Hint: Compare the first k column vectors on both sides.] (c) Use the result in part (b) to prove that A is similar to

and hence that A and C have the same characteristic polynomial. (d) By considering prove that the characteristic polynomial of C (and hence A) contains the factor at least k times, thereby proving that the algebraic multiplicity of is greater than or equal to the geometric multiplicity k.

True-False Exercises In parts (a)–(h) determine whether the statement is true or false, and justify your answer. (a) Every square matrix is similar to itself. Answer: True (b) If A, B, and C are matrices for which A is similar to B and B is similar to C, then A is similar to C.

Answer: True (c) If A and B are similar invertible matrices, then

and

are similar.

Answer: True (d) If A is diagonalizable, then there is a unique matrix P such that

is diagonal.

Answer: False (e) If A is diagonalizable and invertible, then

is diagonalizable.

Answer: True (f) If A is diagonalizable, then

is diagonalizable.

Answer: True (g) If there is a basis for

consisting of eigenvectors of an

matrix A, then A is diagonalizable.

Answer: True (h) If every eigenvalue of a matrix A has algebraic multiplicity 1, then A is diagonalizable. Answer: True


5.3 Complex Vector Spaces Because the characteristic equation of any square matrix can have complex solutions, the notions of complex eigenvalues and eigenvectors arise naturally, even within the context of matrices with real entries. In this section we will discuss this idea and apply our results to study symmetric matrices in more detail. A review of the essentials of complex numbers appears in the back of this text.

Review of Complex Numbers Recall that if z = a + bi is a complex number, then:
• Re(z) = a and Im(z) = b are called the real part of z and the imaginary part of z, respectively,
• |z| = √(a² + b²) is called the modulus (or absolute value) of z,
• z̄ = a − bi is called the complex conjugate of z,
• the angle φ in Figure 5.3.1 is called an argument of z,
• z = |z|(cos φ + i sin φ) is called the polar form of z.

Figure 5.3.1

Complex Eigenvalues In Formula 3 of Section 5.1 we observed that the characteristic equation of a general

matrix A has the form (1)

in which the highest power of λ has a coefficient of 1. Up to now we have limited our discussion to matrices in which the solutions of 1 are real numbers. However, it is possible for the characteristic equation of a matrix A with real entries to have imaginary solutions; for example, the characteristic equation of the matrix

is

which has a conjugate pair of imaginary solutions. To deal with this case we will need to explore the notion of a complex vector space and some related ideas.

Vectors in Cn A vector space in which scalars are allowed to be complex numbers is called a complex vector space. In this section we will be concerned only with the following complex generalization of the real vector space .

DEFINITION 1 If n is a positive integer, then a complex n-tuple is a sequence of n complex numbers . The set of all complex n-tuples is called complex n-space and is denoted by . Scalars are complex numbers, and the operations of addition, subtraction, and scalar multiplication are performed componentwise.

The terminology used for n-tuples of real numbers applies to complex n-tuples without change. Thus, if are complex numbers, then we call a vector in and its components. Some examples of vectors in are

Every vector in

can be split into real and imaginary parts as

which we also denote as where The vector is called the complex conjugate of v and can be expressed in terms of

and

as (2)

It follows that the vectors in Rⁿ can be viewed as those vectors in Cⁿ whose imaginary part is zero; or, stated another way, a vector v in Cⁿ is in Rⁿ if and only if v = v̄.

In this section we will also need to consider matrices with complex entries, so henceforth we will call a matrix A a real matrix if its entries are required to be real numbers and a complex matrix if its entries are allowed to be complex numbers. The standard operations on real matrices carry over to complex matrices without change, and all of the familiar properties of matrices continue to hold. If A is a complex matrix, then Re(A) and Im(A) are the matrices formed from the real and imaginary parts of the entries of A, and is the matrix formed by taking the complex conjugate of each entry in A.

E X A M P L E 1 Real and Imaginary Parts of Vectors and Matrices Let

Then

Algebraic Properties of the Complex Conjugate The next two theorems list some properties of complex vectors and matrices that we will need in this section. Some of the proofs are given as exercises.

THEOREM 5.3.1 If u and v are vectors in

, and if k is a scalar, then:

(a) (b) (c) (d)

THEOREM 5.3.2 If A is an

complex matrix and B is a

complex matrix, then:

(a) (b) (c)
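A brief NumPy illustration of these operations, together with a check of part (c) of Theorem 5.3.2 (the product rule for conjugates, which is used later in the proof of Theorem 5.3.4); the matrices are hypothetical.

import numpy as np

# Hypothetical complex matrices; NumPy exposes real part, imaginary part,
# and entrywise complex conjugate directly.
A = np.array([[1 + 2j, 3j],
              [4,      5 - 1j]])
B = np.array([[2 - 1j, 0],
              [1j,     1 + 1j]])
print(A.real)      # Re(A)
print(A.imag)      # Im(A)
print(A.conj())    # the conjugate of A, formed by conjugating each entry

# Product rule for conjugates: conj(AB) = conj(A) conj(B)
print(np.allclose((A @ B).conj(), A.conj() @ B.conj()))   # True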

The Complex Euclidean Inner Product The following definition extends the notions of dot product and norm to

.

DEFINITION 2 If u = (u₁, u₂, …, uₙ) and v = (v₁, v₂, …, vₙ) are vectors in Cⁿ, then the complex Euclidean inner product of u and v (also called the complex dot product) is denoted by u · v and is defined as
u · v = u₁v̄₁ + u₂v̄₂ + ⋯ + uₙv̄ₙ
(3)
We also define the Euclidean norm on Cⁿ to be
‖v‖ = √(v · v) = √(|v₁|² + |v₂|² + ⋯ + |vₙ|²)
(4)

As in the real case, we call v a unit vector in Cⁿ if ‖v‖ = 1, and we say two vectors u and v are orthogonal if u · v = 0.

The complex conjugates in 3 ensure that ‖v‖ is a real number, for without them the quantity v · v in 4 might be imaginary.
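A NumPy sketch of Formulas 3 and 4 for hypothetical vectors. Note that NumPy's plain dot product does not conjugate anything, so the conjugate is applied to the second vector explicitly to match the convention used here.

import numpy as np

u = np.array([1 + 1j, 2j, 3])
v = np.array([2 - 1j, 1 + 1j, 1j])

# Complex Euclidean inner product as in 3: conjugates go on the second vector.
dot_uv = np.dot(u, v.conj())
norm_v = np.sqrt(np.dot(v, v.conj()).real)     # Euclidean norm, a real number

# Antisymmetry: u . v equals the conjugate of v . u
print(np.isclose(dot_uv, np.dot(v, u.conj()).conj()))   # True
print(norm_v)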

E X A M P L E 2 Complex Euclidean Inner Product and Norm Find

,

,

, and

for the vectors

Solution

Recall from Table 1 of Section 3.2 that if u and v are column vectors in The analogous formulas in

, then their dot product can be expressed as

are (verify) (5)

Example 2 reveals a major difference between the dot product on Rⁿ and the complex dot product on Cⁿ. For the dot product on Rⁿ we always have u · v = v · u (the symmetry property), but for the complex dot product the corresponding relationship is given by u · v = conj(v · u), which is called its antisymmetry property. The following theorem is an analog of Theorem 3.2.2.

THEOREM 5.3.3 If u, v, and w are vectors in properties: (a) (b) (c)

, and if k is a scalar, then the complex Euclidean inner product has the following

(d) (e)

Parts (c) and (d) of this theorem state that a scalar multiplying a complex Euclidean inner product can be regrouped with the first vector, but to regroup it with the second vector you must first take its complex conjugate. We will prove part (d), and leave the others as exercises. Proof (d)

To complete the proof, substitute

for k and use the fact that

.

Vector Concepts in Cn Except for the use of complex scalars, the notions of linear combination, linear independence, subspace, spanning, basis, and dimension carry over without change to . Is

a subspace of

? Explain.

Eigenvalues and eigenvectors are defined for complex matrices exactly as for real matrices. If A is an matrix with complex entries, then the complex roots of the characteristic equation are called complex eigenvalues of A. As in the real case, is a complex eigenvalue of A if and only if there exists a nonzero vector x in such that . Each such x is called a complex eigenvector of A corresponding to λ. The complex eigenvectors of A corresponding to λ are the nonzero solutions of the linear system , and the set of all such solutions is a subspace of , called the eigenspace of A corresponding to λ. The following theorem states that if a real matrix has complex eigenvalues, then those eigenvalues and their corresponding eigenvectors occur in conjugate pairs.

THEOREM 5.3.4 If λ is an eigenvalue of a real matrix A, and if x is a corresponding eigenvector, then λ̄ is also an eigenvalue of A, and x̄ is a corresponding eigenvector.

Proof Since λ is an eigenvalue of A and x is a corresponding eigenvector, we have

(6) However,

, since A has real entries, so it follows from part (c) of Theorem 5.3.2 that (7)

Equations 6 and 7 together imply that
A x̄ = λ̄ x̄
in which x̄ ≠ 0 (why?); this tells us that λ̄ is an eigenvalue of A and x̄ is a corresponding eigenvector.
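A quick numerical illustration of Theorem 5.3.4, using a hypothetical real matrix (a 90-degree rotation) whose eigenvalues are not real.

import numpy as np

# A hypothetical real matrix whose characteristic equation has no real roots.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # 90-degree rotation matrix
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)     # a conjugate pair (i and -i), as Theorem 5.3.4 predicts
# The corresponding eigenvectors (the columns) are also complex conjugates of
# one another, up to a scalar multiple.
print(eigenvectors)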

E X A M P L E 3 Complex Eigenvalues and Eigenvectors Find the eigenvalues and bases for the eigenspaces of

Solution The characteristic polynomial of A is

so the eigenvalues of A form a conjugate pair. Note that this is exactly what Theorem 5.3.4 guarantees.

To find the eigenvectors we must solve the system

with

and then with

. With

, this system becomes (8)

We could solve this system by reducing the augmented matrix (9) to reduced row echelon form by Gauss-Jordan elimination, though the complex arithmetic is somewhat tedious. A simpler procedure here is first to observe that the reduced row echelon form of 9 must have a row of zeros because 8 has nontrivial solutions. This being the case, each row of 9 must be a scalar multiple of the other, and hence the first row can be made into a row of zeros by adding a suitable multiple of the second row to it. Accordingly, we can simply set the entries in the first row to zero, then interchange the rows, and then multiply the new first row by to obtain the reduced row echelon form

Thus, a general solution of the system is

This tells us that the eigenspace corresponding to multiples of the basis vector

is one-dimensional and consists of all complex scalar

(10) As a check, let us confirm that

. We obtain

We could find a basis for the eigenspace corresponding to

in a similar way, but the work is unnecessary,

since Theorem 5.3.4 implies that (11) must be a basis for this eigenspace. The following computations confirm that corresponding to :

is an eigenvector of A

Since a number of our subsequent examples will involve matrices with real entries, it will be useful to discuss some general results about the eigenvalues of such matrices. Observe first that the characteristic polynomial of the matrix

is

We can express this in terms of the trace and determinant of A as
λ² − tr(A)λ + det(A)
(12)
from which it follows that the characteristic equation of A is
λ² − tr(A)λ + det(A) = 0
(13)
Now recall from algebra that if
aλ² + bλ + c = 0
is a quadratic equation with real coefficients, then the discriminant b² − 4ac determines the nature of the roots: two distinct real roots if b² − 4ac > 0, one repeated real root if b² − 4ac = 0, and two complex conjugate roots if b² − 4ac < 0. Applying this to 13 with a = 1, b = −tr(A), and c = det(A) yields the following theorem.

Olga Taussky-Todd (1906–1995) Historical Note Olga Taussky-Todd was one of the pioneering women in matrix analysis and the first woman appointed to the faculty at the California Institute of Technology. She worked at the National Physical Laboratory in London during World War II, where she was assigned to study flutter in supersonic aircraft. While there, she realized that some results about the eigenvalues of a certain complex matrix could be used to answer key questions about the flutter problem that would otherwise have required laborious calculation. After World War II Olga Taussky-Todd continued her work on matrix-related subjects and helped to draw many known but disparate results about matrices into the coherent subject that we now call matrix theory. [Image: Courtesy of the Archives, California Institute of Technology]

THEOREM 5.3.5 If A is a 2 × 2 matrix with real entries, then the characteristic equation of A is λ² − tr(A)λ + det(A) = 0, and
(a) A has two distinct real eigenvalues if tr(A)² − 4 det(A) > 0;
(b) A has one repeated real eigenvalue if tr(A)² − 4 det(A) = 0;
(c) A has two complex conjugate eigenvalues if tr(A)² − 4 det(A) < 0.

E X A M P L E 4 Eigenvalues of a 2 × 2 Matrix In each part, use Formula 13 for the characteristic equation to find the eigenvalues of (a) (b) (c)

Solution (a) We have

and

Factoring yields (b) We have

, so the characteristic equation of A is , so the eigenvalues of A are

and

Factoring this equation yields

and

, so the characteristic equation of A is

, so

is the only eigenvalue of A; it has algebraic

multiplicity 2. (c) We have

and

.

, so the characteristic equation of A is

Solving this equation by the quadratic formula yields

Thus, the eigenvalues of A are

and

.

Symmetric Matrices Have Real Eigenvalues Our next result, which is concerned with the eigenvalues of real symmetric matrices, is important in a wide variety of applications. The key to its proof is to think of a real symmetric matrix as a complex matrix whose entries have an imaginary part of zero.

THEOREM 5.3.6 If A is a real symmetric matrix, then A has real eigenvalues.

Proof Suppose that is an eigenvalue of A and x is a corresponding eigenvector, where we allow for the possibility that λ is complex and x is in . Thus,

where

. If we multiply both sides of this equation by

and use the fact that

then we obtain

Since the denominator in this expression is real, we can prove that λ is real by showing that (14) But, A is symmetric and has real entries, so it follows from the second equality in 14 and properties of the conjugate that
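A numerical spot check of Theorem 5.3.6 on a hypothetical real symmetric matrix.

import numpy as np

# A hypothetical real symmetric matrix; Theorem 5.3.6 says its eigenvalues are real.
S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
eigenvalues = np.linalg.eigvals(S)
print(np.allclose(eigenvalues, eigenvalues.real))   # True: no imaginary parts
# eigvalsh assumes a symmetric (Hermitian) matrix and returns real eigenvalues directly.
print(np.linalg.eigvalsh(S))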

A Geometric Interpretation of Complex Eigenvalues The following theorem is the key to understanding the geometric significance of complex eigenvalues of real

matrices.

THEOREM 5.3.7 The eigenvalues of the real matrix
C = [a  −b; b  a]
(15)
are λ = a ± bi. If a and b are not both zero, then this matrix can be factored as
[a  −b; b  a] = [|λ|  0; 0  |λ|] [cos φ  −sin φ; sin φ  cos φ]
(16)
where φ is the angle from the positive x-axis to the ray that joins the origin to the point (a, b) (Figure 5.3.2).

Figure 5.3.2

Geometrically, this theorem states that multiplication by a matrix of form 15 can be viewed as a rotation through the angle φ followed by a scaling with factor (Figure 5.3.3).

Figure 5.3.3

Proof The characteristic equation of C is

(verify), from which it follows that the eigenvalues of C are

. Assuming that a and b are not both zero, let φ be the angle from the positive x-axis to the ray that joins the origin to the point . The angle φ is an argument of the eigenvalue , so we see from Figure 5.3.2 that

It follows from this that the matrix in 15 can be written as

The following theorem, whose proof is considered in the exercises, shows that every real 2 × 2 matrix with complex eigenvalues is similar to a matrix of form 15.

THEOREM 5.3.8 Let A be a real 2 × 2 matrix with complex eigenvalues λ = a ± bi (where b ≠ 0). If x is an eigenvector of A corresponding to λ = a − bi, then the matrix P = [Re(x)  Im(x)] is invertible and
A = P [a  −b; b  a] P⁻¹
(17)

E X A M P L E 5 A Matrix Factorization Using Complex Eigenvalues Factor the matrix in Example 3 into form 17 using the eigenvalue that was given in 11.

and the corresponding eigenvector

Solution For consistency with the notation in Theorem 5.3.8, let us denote the eigenvector in 11 that corresponds to by x (rather than as before). For this λ and x we have

Thus,

so A can be factored in form 17 as

You may want to confirm this by multiplying out the right side.
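The factorization in Theorem 5.3.8 can also be reproduced numerically. The sketch below uses a hypothetical matrix (not the one from Example 3, whose entries are not reproduced here) and selects the eigenvector for the eigenvalue with negative imaginary part, as the theorem requires.

import numpy as np

# A hypothetical real 2x2 matrix with complex eigenvalues a ± bi.
A = np.array([[1.0, -2.0],
              [1.0,  3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Pick the eigenvalue lambda = a - bi (negative imaginary part) and its eigenvector x.
i = np.argmin(eigenvalues.imag)
lam, x = eigenvalues[i], eigenvectors[:, i]
a, b = lam.real, -lam.imag               # lam = a - bi, so b = -Im(lam)

P = np.column_stack([x.real, x.imag])    # P = [Re(x)  Im(x)]
C = np.array([[a, -b],
              [b,  a]])                  # the rotation-scaling form 15
print(np.allclose(A, P @ C @ np.linalg.inv(P)))   # True, as Theorem 5.3.8 asserts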

A Geometric Interpretation of Theorem 5.3.8 To clarify what Theorem 5.3.8 says geometrically, let us denote the matrices on the right side of 16 by S and Rφ, respectively, and then use 16 to rewrite 17 as
A = PSRφP⁻¹
(18)
If we now view P as the transition matrix from the basis B = {Re(x), Im(x)} to the standard basis, then 18 tells us that computing a product Ax can be broken down into a three-step process:
Step 1 Map x from standard coordinates into B-coordinates by forming the product P⁻¹x.
Step 2 Rotate and scale the vector P⁻¹x by forming the product SRφP⁻¹x.
Step 3 Map the rotated and scaled vector back to standard coordinates to obtain Ax = PSRφP⁻¹x.

Power Sequences There are many problems in which one is interested in how successive applications of a matrix transformation affect a specific vector. For example, if A is the standard matrix for an operator on and is some fixed vector in , then one might be interested in the behavior of the power sequence

For example, if

then with the help of a computer or calculator one can show that the first four terms in the power sequence are

With the help of MATLAB or a computer algebra system one can show that if the first 100 terms are plotted as ordered pairs , then the points move along the elliptical path shown in Figure 5.3.4a.
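A sketch of such a computation in NumPy, using a hypothetical matrix whose complex eigenvalues have modulus 1 (so the orbit neither spirals inward nor outward); this is not the matrix used in the text.

import numpy as np

# A hypothetical 2x2 matrix with det(A) = 1 and tr(A)^2 - 4 det(A) < 0, so its
# complex eigenvalues have modulus 1 and the power sequence stays on an ellipse.
A = np.array([[0.8, -0.2],
              [0.6,  1.1]])
x0 = np.array([1.0, 0.0])

orbit = [x0]
for _ in range(100):
    orbit.append(A @ orbit[-1])
orbit = np.array(orbit)

print(np.abs(np.linalg.eigvals(A)))   # both moduli are 1 (up to roundoff)
print(orbit[:4])                      # first few terms; plotting orbit traces an elliptical path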

Figure 5.3.4 To understand why the points move along an elliptical path, we will need to examine the eigenvalues and eigenvectors of A. We leave it for you to show that the eigenvalues of A are and that the corresponding eigenvectors are

If we take

and

in 17 and use the fact that

, then we obtain the factorization

(19)

where

is a rotation about the origin through the angle φ whose tangent is

The matrix P in 19 is the transition matrix from the basis

to the standard basis, and is the transition matrix from the standard basis to the basis B (Figure 5.3.5). Next, observe that if n is a positive integer, then 19 implies that

so the product can be computed by first mapping into the point rotate this point about the origin through the angle , and then multiplying

in B-coordinates, then multiplying by to by P to map the resulting point back to

standard coordinates. We can now see what is happening geometrically: In B-coordinates each successive multiplication by A causes the point to advance through an angle φ, thereby tracing a circular orbit about the origin. However, the basis B is skewed (not orthogonal), so when the points on the circular orbit are transformed back to standard coordinates, the effect is to distort the circular orbit into the elliptical orbit traced by (Figure 5.3.4b). Here are the computations for the first step (successive steps are illustrated in Figure 5.3.4c):

Figure 5.3.5

Concept Review • Real part of z • Imaginary part of z • Modulus of z • Complex conjugate of z • Argument of z • Polar form of z • Complex vector space • Complex n-tuple • Complex n-space • Real matrix • Complex matrix • Complex Euclidean inner product • Euclidean norm on

• Antisymmetry property • Complex eigenvalue • Complex eigenvector • Eigenspace in • Discriminant

Skills • Find the real part, imaginary part, and complex conjugate of a complex matrix or vector. • Find the determinant of a complex matrix. • Find complex inner products and norms of complex vectors. • Find the eigenvalues and bases for the eigenspaces of complex matrices. • Factor a

real matrix with complex eigenvalues into a product of a scaling matrix and a rotation matrix.

Exercise Set 5.3 In Exercises 1–2, find ,

,

, and

.

1. Answer:

2. In Exercises 3–4, show that u, v, and k satisfy Theorem 5.3.1. 3. 4. 5. Solve the equation

for x, where u and v are the vectors in Exercise 3.

Answer:

6. Solve the equation In Exercises 7–8, find ,

for x, where u and v are the vectors in Exercise 4. ,

,

, and

.

7. Answer:

8. 9. Let A be the matrix given in Exercise 7, and let B be the matrix

Confirm that these matrices have the properties stated in Theorem 5.3.2. 10. Let A be the matrix given in Exercise 8, and let B be the matrix

Confirm that these matrices have the properties stated in Theorem 5.3.2. In Exercises 11–12, compute of Theorem 5.3.3.

,

, and

, and show that the vectors satisfy Formula 5 and parts ( a), ( b), and ( c)

11. Answer:

12. 13. Compute

for the vectors u, v, and w in Exercise 11.

Answer: 14. Compute

for the vectors u, v, and w in Exercise 12.

In Exercises 15–18, find the eigenvalues and bases for the eigenspaces of A. 15. Answer:

16. 17. Answer:

18. In Exercises 19–22, each matrix C has form 15. Theorem 5.3.7 implies that C is the product of a scaling matrix with factor and a rotation matrix with angle φ. Find and φ for which . 19. Answer:

20.

21.

Answer:

22.

In Exercises 23–26, find an invertible matrix P and a matrix C of form 15 such that

.

23. Answer:

24. 25. Answer:

26. 27. Find all complex scalars k, if any, for which u and v are orthogonal in

.

(a) (b) Answer: (a) (b) None 28. Show that if A is a real

matrix and x is a column vector in

, then

and

.

29. The matrices

called Pauli spin matrices, are used in quantum mechanics to study particle spin. The Dirac matrices, which are also used in quantum mechanics, are expressed in terms of the Pauli spin matrices and the identity matrix as

(a) Show that

.

(b) Matrices A and B for which anticommutative.

are said to be anticommutative. Show that the Dirac matrices are

30. If k is a real scalar and v is a vector in is a complex scalar and v is a vector in

, then Theorem 3.2.1 states that ? Justify your answer.

. Is this relationship also true if k

31. Prove part ( c) of Theorem 5.3.1. 32. Prove Theorem 5.3.2. 33. Prove that if u and v are vectors in

, then

34. It follows from Theorem 5.3.7 that the eigenvalues of the rotation matrix

are . Prove that if x is an eigenvector corresponding to either eigenvalue, then and are orthogonal and have the same length. [Note: This implies that is a real scalar multiple of an orthogonal matrix.] 35. The two parts of this exercise lead you through a proof of Theorem 5.3.8. (a) For notational simplicity, let

and let

and

, so

. Show that the relationship

implies that

and then equate real and imaginary parts in this equation to show that

(b) Show that P is invertible, thereby completing the proof, since the result in part (a) implies that P is not invertible, then one of its column vectors is a real scalar multiple of the other, say the equations and obtained in part (a), and show that that this leads to a contradiction, thereby proving that P is invertible.] 36. In this problem you will prove the complex analog of the Cauchy-Schwarz inequality. (a) Prove: If k is a complex number, and u and v are vectors in

, then

(b) Use the result in part (a) to prove that

(c) Take

in part (b) to prove that

True-False Exercises In parts (a)–(f) determine whether the statement is true or false, and justify your answer. (a) There is a real

matrix with no real eigenvalues.

. [Hint: If . Substitute this into . Finally, show

Answer: False (b) The eigenvalues of a

complex matrix are the solutions of the equation

.

Answer: True (c) Matrices that have the same complex eigenvalues with the same algebraic multiplicities have the same trace. Answer: False (d) If λ is a complex eigenvalue of a real matrix A with a corresponding complex eigenvector v, then eigenvalue of A and is a complex eigenvector of A corresponding to .

is a complex

Answer: True (e) Every eigenvalue of a complex symmetric matrix is real. Answer: False (f) If a real matrix A has complex eigenvalues and on an ellipse. Answer: False


is a vector in

, then the vectors

,

,

lie

5.4 Differential Equations Many laws of physics, chemistry, biology, engineering, and economics are described in terms of “differential equations”—that is, equations involving functions and their derivatives. In this section we will illustrate one way in which linear algebra, eigenvalues, and eigenvectors can be applied to solving systems of differential equations. Calculus is a prerequisite for this section.

Terminology Recall from calculus that a differential equation is an equation involving unknown functions and their derivatives. The order of a differential equation is the order of the highest derivative it contains. The simplest differential equations are the first-order equations of the form
y′ = ay
(1)
where y = y(t) is an unknown differentiable function to be determined, y′ = dy/dt is its derivative, and a is a constant. As with most differential equations, this equation has infinitely many solutions; they are the functions of the form
y = ce^(at)
(2)
where c is an arbitrary constant. That every function of this form is a solution of 1 follows from the computation
y′ = ace^(at) = ay
and that these are the only solutions is shown in the exercises. Accordingly, we call 2 the general solution of 1. As an example, the general solution of the differential equation is (3) Often, a physical problem that leads to a differential equation imposes some conditions that enable us to isolate one particular solution from the general solution. For example, if we require that solution 3 of the equation satisfy the added condition (4) (that is,

when

), then on substituting these values in 3, we obtain

from which we conclude

that is the only solution

that satisfies 4.

A condition such as 4, which specifies the value of the general solution at a point, is called an initial condition, and the problem of solving a differential equation subject to an initial condition is called an initial-value problem.

First-Order Linear Systems

In this section we will be concerned with solving systems of differential equations of the form

(5)

where , matrix notation, 5 can be written as

are functions to be determined, and the

's are constants. In

or, more briefly, as
y′ = Ay
(6)
where the notation y′ denotes the vector obtained by differentiating each component of y. A system of differential equations of form 5 is called a first-order linear system.

E X A M P L E 1 Solution of a Linear System with Initial Conditions (a) Write the following system in matrix form: (7)

(b) Solve the system. (c) Find a solution of the system that satisfies the initial conditions .

,

, and

Solution (a) (8)

or (9)

(b) Because each equation in 7 involves only one unknown function, we can solve the equations individually. It follows from 2 that these solutions are

or, in matrix notation,

(10)

(c) From the given initial conditions, we obtain

so the solution satisfying these conditions is or, in matrix notation,

Solution by Diagonalization What made the system in Example 1 easy to solve was the fact that each equation involved only one of the unknown functions, so its matrix formulation, , had a diagonal coefficient matrix A [Formula 9]. A more complicated situation occurs when some or all of the equations in the system involve more than one of the unknown functions, for in this case the coefficient matrix is not diagonal. Let us now consider how we might solve such a system. The basic idea for solving a system whose coefficient matrix A is not diagonal is to introduce a new unknown vector u that is related to the unknown vector y by an equation of the form in which P is an invertible matrix that diagonalizes A. Of course, such a matrix may or may not exist, but if it does then we can rewrite the equation as or alternatively as

Since P is assumed to diagonalize A, this equation has the form

where D is diagonal. We can now solve this equation for u using the method of Example 1, and then obtain y by matrix multiplication using the relationship . In summary, we have the following procedure for solving a system

in the case were A is diagonalizable.

A Procedure for Solving y′ = Ay if A Is Diagonalizable
Step 1. Find a matrix P that diagonalizes A.
Step 2. Make the substitutions y = Pu and y′ = Pu′ to obtain a new “diagonal system” u′ = Du, where D = P⁻¹AP.
Step 3. Solve u′ = Du.
Step 4. Determine y from the equation y = Pu.
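The four steps can be carried out numerically, as in the following sketch. The coefficient matrix and the initial condition are hypothetical, and the result is checked with a finite-difference approximation of the derivative rather than with the text's examples.

import numpy as np

# A hypothetical diagonalizable coefficient matrix and initial condition.
A = np.array([[1.0, 4.0],
              [2.0, 3.0]])
y0 = np.array([1.0, 0.0])

eigenvalues, P = np.linalg.eig(A)     # Step 1: P diagonalizes A
c = np.linalg.solve(P, y0)            # constants fixed by the initial condition y(0) = P u(0)

def y(t):
    # Steps 3-4: u_i(t) = c_i * exp(lambda_i * t), then y = P u.
    return P @ (c * np.exp(eigenvalues * t))

# Spot check: the derivative of y should equal A y (finite-difference comparison).
t, h = 0.7, 1e-6
print(np.allclose((y(t + h) - y(t - h)) / (2 * h), A @ y(t), atol=1e-4))   # True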

E X A M P L E 2 Solution Using Diagonalization (a) Solve the system

(b) Find the solution that satisfies the initial conditions

,

.

Solution (a) The coefficient matrix for the system is

As discussed in Section 5.2, A will be diagonalized by any matrix P whose columns are linearly independent eigenvectors of A. Since

the eigenvalues of A are

and

is an eigenvector of A corresponding to

If

. By definition,

if and only if x is a nontrivial solution of

, this system becomes

Solving this system yields

so

Thus,

is a basis for the eigenspace corresponding to

. Similarly, you can show that

is a basis for the eigenspace corresponding to

. Thus,

diagonalizes A, and

Thus, as noted in Step 2 of the procedure stated above, the substitution yields the “diagonal system”

From 2 the solution of this system is

so the equation

yields, as the solution for y,

or (11) (b) If we substitute the given initial conditions in 11, we obtain

Solving this system, we obtain the initial conditions is

so it follows from 11 that the solution satisfying

Remark Keep in mind that the method of Example 2 works because the coefficient matrix of the system can be diagonalized. In cases where this is not so, other methods are required. These are typically discussed in books devoted to differential equations.

Concept Review • Differential equation • Order of a differential equation • General solution • Particular solution • Initial condition • Initial-value problem • First-order linear system

Skills • Find the matrix form of a system of linear differential equations. • Find the general solution of a system of linear differential equations by diagonalization. • Find the particular solution of a system of linear differential equations satisfying an initial condition.

Exercise Set 5.4 1. (a) Solve the system

(b) Find the solution that satisfies the initial conditions Answer: (a)

(b) 2. (a) Solve the system

,

.

(b) Find the solution that satisfies the conditions

,

.

3. (a) Solve the system

(b) Find the solution that satisfies the initial conditions

,

,

.

Answer: (a)

(b)

4. Solve the system

5. Show that every solution of [Hint: Let

has the form

.

be a solution of the equation, and show that

is constant.]

6. Show that if A is diagonalizable and

is a solution of the system

, then each

is a linear combination of

where

are the eigenvalues of A. 7. Sometimes it is possible to solve a single higher-order linear differential equation with constant coefficients by expressing it as a system and applying the methods of this section. For the differential equation , show that the substitutions

and

lead to the system

Solve this system, and use the result to solve the original differential equation.

Answer:

8. Use the procedure in Exercise 7 to solve

.

9. Explain how you might use the procedure in Exercise 7 to solve

. Use your

procedure to solve the equation. Answer:

10. (a) By rewriting 11 in matrix form, show that the solution of the system in Example 2 can be expressed as

This is called the general solution of the system. (b) Note that in part (a), the vector in the first term is an eigenvector corresponding to the eigenvalue , and the vector in the second term is an eigenvector corresponding to the eigenvalue This is a special case of the following general result:

Theorem. If the coefficient matrix A of the system

is diagonalizable, then the general

solution of the system can be expressed as where

are the eigenvalues of A, and

is an eigenvector of A corresponding to

Prove this result by tracing through the four-step procedure preceding Example 2 with

11. Consider the system of differential equations

, where A is a

matrix. For what values of

do the component solutions tend to zero as true about the determinant and the trace of A for this to happen?

? In particular, what must be

12. Solve the nondiagonalizable system

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a) Every system of differential equations

has a solution.

Answer: False (b) If

and

, then

and

, then

.

Answer: False (c) If

for all scalars c and d.

Answer: True (d) If A is a square matrix with distinct real eigenvalues, then it is possible to solve Answer: True (e) If A and P are similar matrices, then Answer: False


and

have the same solutions.

by diagonalization.

Chapter 5 Supplementary Exercises 1. (a) Show that if

, then

has no eigenvalues and consequently no eigenvectors. (b) Give a geometric explanation of the result in part (a). Answer: (b) The transformation rotates vectors through the angle ; therefore, if is transformed into a vector in the same or opposite direction.

, then no nonzero vector

2. Find the eigenvalues of

3. (a) Show that if D is a diagonal matrix with nonnegative entries on the main diagonal, then there is a matrix S such that . (b) Show that if A is a diagonalizable matrix with nonnegative eigenvalues, then there is a matrix S such that . (c) Find a matrix S such that

, given that

Answer: (c)

4. Prove: If A is a square matrix, then A and

have the same characteristic polynomial.

5. Prove: If A is a square matrix and is the characteristic polynomial of A, then the coefficient of in is the negative of the trace of A. 6. Prove: If

, then

is not diagonalizable. 7. In advanced linear algebra, one proves the Cayley—Hamilton Theorem, which states that a square matrix

A satisfies its characteristic equation; that is, if is the characteristic equation of A, then Verify this result for

In Exercises 8–10, use the Cayley—Hamilton Theorem, stated in Exercise 7. 8. (a) Use Exercise 18 of Section 5.1 to prove the Cayley—Hamilton Theorem for (b) Prove the Cayley—Hamilton Theorem for

matrices.

diagonalizable matrices.

9. The Cayley—Hamilton Theorem provides a method for calculating powers of a matrix. For example, if A is a matrix with characteristic equation then

, so

Multiplying through by A yields , which expresses in terms of and A, and multiplying through by yields , which expresses in terms of and . Continuing in this way, we can calculate successive powers of A by expressing them in terms of lower powers. Use this procedure to calculate and for

Answer:

10. Use the method of the preceding exercise to calculate

11. Find the eigenvalues of the matrix

Answer:

and

for

12. (a) It was shown in Exercise 17 of Section 5.1 that if A is an matrix, then the coefficient of in the characteristic polynomial of A is 1. (A polynomial with this property is called monic.) Show that the matrix

has characteristic polynomial This shows that every monic polynomial is the characteristic polynomial of some matrix. The matrix in this example is called the companion matrix of . [Hint: Evaluate all determinants in the problem by adding a multiple of the second row to the first to introduce a zero at the top of the first column, and then expanding by cofactors along the first column.] (b) Find a matrix with characteristic polynomial

13. A square matrix A is called nilpotent if eigenvalues of a nilpotent matrix?

for some positive integer n. What can you say about the

Answer: They are all 0. 14. Prove: If A is an 15. Find a

matrix and n is odd, then A has at least one real eigenvalue.

matrix A that has eigenvalues

, and

with corresponding eigenvectors

respectively. Answer:

16. Suppose that a

matrix A has eigenvalues

,

(a) Use the method of Exercise 16 of Section 5.1 to find (b) Use Exercise 5 above to find 17. Let A be a square matrix such that

,

, and

.

.

. . What can you say about the eigenvalues of A?

Answer: They are all 0, 1, or

.

18. (a) Solve the system

(b) Find the solution satisfying the initial conditions


and

.

CHAPTER

6

Inner Product Spaces

CHAPTER CONTENTS 6.1. Inner Products 6.2. Angle and Orthogonality in Inner Product Spaces 6.3. Gram–Schmidt Process; QR-Decomposition 6.4. Best Approximation; Least Squares 6.5. Least Squares Fitting to Data 6.6. Function Approximation; Fourier Series

INTRODUCTION In Chapter 3 we defined the dot product of vectors in , and we used that concept to define notions of length, angle, distance, and orthogonality. In this chapter we will generalize those ideas so they are applicable in any vector space, not just . We will also discuss various applications of these ideas.


6.1 Inner Products In this section we will use the most important properties of the dot product on as axioms, which, if satisfied by the vectors in a vector space V, will enable us to extend the notions of length, distance, angle, and perpendicularity to general vector spaces.

General Inner Products In Definition 4 of Section 3.2 we defined the dot product of two vectors in , and in Theorem 3.2.2 we listed four fundamental properties of such products. Our first goal in this section is to extend the notion of a dot product to general real vector spaces by using those four properties as axioms. We make the following definition. Note that Definition 1 applies only to real vector spaces. A definition of inner products on complex vector spaces is given in the exercises. Since we will have little need for complex vector spaces from this point on, you can assume that all vector spaces under discussion are real, even though some of the theorems are also valid in complex vector spaces.

DEFINITION 1 An inner product on a real vector space V is a function that associates a real number ⟨u, v⟩ with each pair of vectors in V in such a way that the following axioms are satisfied for all vectors u, v, and w in V and all scalars k.
1. ⟨u, v⟩ = ⟨v, u⟩ [Symmetry axiom]
2. ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ [Additivity axiom]
3. ⟨ku, v⟩ = k⟨u, v⟩ [Homogeneity axiom]
4. ⟨v, v⟩ ≥ 0 and ⟨v, v⟩ = 0 if and only if v = 0 [Positivity axiom]

A real vector space with an inner product is called a real inner product space.

Because the axioms for a real inner product space are based on properties of the dot product, these inner product space axioms will be satisfied automatically if we define the inner product of two vectors u and v in Rⁿ to be ⟨u, v⟩ = u · v. This inner product is commonly called the Euclidean inner product (or the standard inner product) on Rⁿ to distinguish it from other possible inner products that might be defined on Rⁿ. We call Rⁿ with the Euclidean inner product Euclidean n-space. Inner products can be used to define notions of norm and distance in a general inner product space just as we did with dot products in Rⁿ. Recall from Formulas 11 and 19 of Section 3.2 that if u and v are vectors in Euclidean n-space, then norm and distance can be expressed in terms of the dot product as
‖u‖ = √(u · u) and d(u, v) = ‖u − v‖ = √((u − v) · (u − v))
Motivated by these formulas we make the following definition.

DEFINITION 2 If V is a real inner product space, then the norm (or length) of a vector v in V is denoted by ‖v‖ and is defined by
‖v‖ = √⟨v, v⟩
and the distance between two vectors u and v is denoted by d(u, v) and is defined by
d(u, v) = ‖u − v‖ = √⟨u − v, u − v⟩

A vector of norm 1 is called a unit vector.

The following theorem, which we state without proof, shows that norms and distances in real inner product spaces have many of the properties that you might expect.

THEOREM 6.1.1 If u and v are vectors in a real inner product space V, and if k is a scalar, then:
(a) ‖v‖ ≥ 0 with equality if and only if v = 0.
(b) ‖kv‖ = |k| ‖v‖.
(c) d(u, v) = d(v, u).
(d) d(u, v) ≥ 0 with equality if and only if u = v.

Although the Euclidean inner product is the most important inner product on Rⁿ, there are various applications in which it is desirable to modify it by weighting each term differently. More precisely, if w₁, w₂, …, wₙ are positive real numbers, which we will call weights, and if u = (u₁, u₂, …, uₙ) and v = (v₁, v₂, …, vₙ) are vectors in Rⁿ, then it can be shown that the formula
⟨u, v⟩ = w₁u₁v₁ + w₂u₂v₂ + ⋯ + wₙuₙvₙ
(1)
defines an inner product on Rⁿ that we call the weighted Euclidean inner product with weights w₁, w₂, …, wₙ.

Note that the standard Euclidean inner product is the special case of the weighted Euclidean inner product in which all the weights are 1.

E X A M P L E 1 Weighted Euclidean Inner Product Let

and

be vectors in

. Verify that the weighted Euclidean inner product (2)

satisfies the four inner product axioms. Solution Axiom 1: Interchanging u and v in Formula 2 does not change the sum on the right side, so

. Axiom 2: If

, then

Axiom 3:

Axiom 4:

with equality if and only if

and only if

; that is, if

.

In Example 1, we are using subscripted w's to denote the components of thevector w, not the weights. The weights are the numbers 3 and 2 in Formula 2.

An Application of Weighted Euclidean Inner Products To illustrate one way in which a weighted Euclidean inner product can arise, suppose that some physical experiment has n possible numerical outcomes and that a series of m repetitions of the experiment yields these values with various frequencies. Specifically, suppose that occurs times, occurs times, and so forth. Since there are a total of m repetitions of the experiment, it follows that Thus, the arithmetic average of the observed numerical values (denoted by ) is (3) If we let

then 3 can be expressed as the weighted Euclidean inner product

E X A M P L E 2 Using a Weighted Euclidean Inner Product

It is important to keep in mind that norm and distance depend on the inner product being used. If the inner product is changed, then the norms and distances between vectors also change. For example, for the vectors and in with the Euclidean inner product we have

and

but if we change to the weighted Euclidean inner product we have

and

Unit Circles and Spheres in Inner Product Spaces If V is an inner product space, then the set of points in V that satisfy is called the unit sphere or sometimes the unit circle in V.

E X A M P L E 3 Unusual Unit Circles in R2 (a) Sketch the unit circle in an xy-coordinate system in .

using the Euclidean inner product

(b) Sketch the unit circle in an xy-coordinate system in

using the weighted Euclidean inner product

. Solution (a) If

, then

, so the equation of the unit circle is

squaring both sides, As expected, the graph of this equation is a circle of radius 1 centered at the origin (Figure 6.1.1 a). (b) If

, then

, so the equation of the unit circle is , or, on squaring both sides,

The graph of this equation is the ellipse shown in Figure 6.1.1b.

, or, on

Figure 6.1.1

Remark It may seem odd that the “unit circle” in the second part of the last example turned out to have an elliptical shape. This will make more sense if you think of circles and spheres in general vector spaces algebraically rather than geometrically. The change in geometry occurs because the norm, not being Euclidean, has the effect of distorting the space that we are used to seeing through “Euclidean eyes.”

Inner Products Generated by Matrices The Euclidean inner product and the weighted Euclidean inner products are special cases of a general class of inner products on $R^n$ called matrix inner products. To define this class of inner products, let u and v be vectors in $R^n$ that are expressed in column form, and let A be an invertible $n \times n$ matrix. It can be shown (Exercise 31) that if $u \cdot v$ is the Euclidean inner product on $R^n$, then the formula
$\langle u, v\rangle = Au \cdot Av$ (4)
also defines an inner product; it is called the inner product on Rn generated by A. Recall from Table 1 of Section 3.2 that if u and v are in column form, then $u \cdot v$ can be written as $v^Tu$, from which it follows that 4 can be expressed as
$\langle u, v\rangle = (Av)^TAu$
or, equivalently, as
$\langle u, v\rangle = v^TA^TAu$ (5)

E X A M P L E 4 Matrices Generating Weighted Euclidean Inner Products
The standard Euclidean and weighted Euclidean inner products are examples of matrix inner products. The standard Euclidean inner product on $R^n$ is generated by the $n \times n$ identity matrix, since setting $A = I$ in Formula 4 yields
$\langle u, v\rangle = Iu \cdot Iv = u \cdot v$
and the weighted Euclidean inner product
$\langle u, v\rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n$ (6)
is generated by the matrix
$A = \begin{bmatrix} \sqrt{w_1} & 0 & \cdots & 0 \\ 0 & \sqrt{w_2} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sqrt{w_n} \end{bmatrix}$ (7)
This can be seen by first observing that $A^TA$ is the diagonal matrix whose diagonal entries are the weights $w_1, w_2, \dots, w_n$ and then observing that 5 simplifies to 6 when A is the matrix in Formula 7.
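A brief numerical check of this idea, under the assumption that the weights are 3 and 2 as in Example 1; the test vectors are hypothetical.

```python
import numpy as np

# The inner product generated by A is <u, v> = Au . Av = v^T A^T A u (Formulas 4 and 5).
# For A = diag(sqrt(3), sqrt(2)) this should reproduce the weighted product with weights 3, 2.
A = np.diag([np.sqrt(3.0), np.sqrt(2.0)])
u = np.array([1.0, 2.0])
v = np.array([4.0, -1.0])

generated = (A @ v) @ (A @ u)                 # Au . Av
weighted  = 3*u[0]*v[0] + 2*u[1]*v[1]
print(np.isclose(generated, weighted))        # True
```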

E X A M P L E 5 Example 1 Revisited
The weighted Euclidean inner product $\langle u, v\rangle = 3u_1v_1 + 2u_2v_2$ discussed in Example 1 is the inner product on $R^2$ generated by
$A = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \end{bmatrix}$
since $A^TA = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$. More generally, every diagonal matrix with positive diagonal entries generates a weighted inner product. Why?

Other Examples of Inner Products So far, we have only considered examples of inner products on $R^n$. We will now consider examples of inner products on some of the other kinds of vector spaces that we discussed earlier.

E X A M P L E 6 An Inner Product on Mnn
If U and V are $n \times n$ matrices, then the formula
$\langle U, V\rangle = \operatorname{tr}(U^TV)$ (8)
defines an inner product on the vector space $M_{nn}$ (see Definition 8 of Section 1.3 for a definition of trace). This can be proved by confirming that the four inner product space axioms are satisfied, but you can visualize why this is so by computing 8 for the $2 \times 2$ matrices
$U = \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix}$ and $V = \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}$
This yields
$\langle U, V\rangle = \operatorname{tr}(U^TV) = u_1v_1 + u_2v_2 + u_3v_3 + u_4v_4$
which is just the dot product of the corresponding entries in the two matrices. The norm of a matrix U relative to this inner product is
$\|U\| = \langle U, U\rangle^{1/2} = \sqrt{\operatorname{tr}(U^TU)}$
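The following short sketch illustrates Formula 8 numerically; the two $2 \times 2$ matrices are hypothetical examples, not the ones used in the text.

```python
import numpy as np

U = np.array([[1.0, 2.0], [3.0, 4.0]])
V = np.array([[-1.0, 0.0], [3.0, 2.0]])

trace_form = np.trace(U.T @ V)           # <U, V> = tr(U^T V)
entrywise  = np.sum(U * V)               # dot product of corresponding entries
print(trace_form, entrywise)             # both 16.0, as Formula 8 suggests

norm_U = np.sqrt(np.trace(U.T @ U))      # ||U|| relative to this inner product
print(norm_U)                            # sqrt(1 + 4 + 9 + 16) = sqrt(30)
```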

E X A M P L E 7 The Standard Inner Product on Pn
If
$p = a_0 + a_1x + \cdots + a_nx^n$ and $q = b_0 + b_1x + \cdots + b_nx^n$
are polynomials in $P_n$, then the following formula defines an inner product on $P_n$ (verify) that we will call the standard inner product on this space:
$\langle p, q\rangle = a_0b_0 + a_1b_1 + \cdots + a_nb_n$ (9)
The norm of a polynomial p relative to this inner product is
$\|p\| = \langle p, p\rangle^{1/2} = \sqrt{a_0^2 + a_1^2 + \cdots + a_n^2}$

E X A M P L E 8 The Evaluation Inner Product on Pn
If $p = p(x)$ and $q = q(x)$ are polynomials in $P_n$, and if $x_0, x_1, \dots, x_n$ are distinct real numbers (called sample points), then the formula
$\langle p, q\rangle = p(x_0)q(x_0) + p(x_1)q(x_1) + \cdots + p(x_n)q(x_n)$ (10)
defines an inner product on $P_n$ called the evaluation inner product at $x_0, x_1, \dots, x_n$. Algebraically, this can be viewed as the dot product in $R^{n+1}$ of the (n + 1)-tuples
$(p(x_0), p(x_1), \dots, p(x_n))$ and $(q(x_0), q(x_1), \dots, q(x_n))$
and hence the first three inner product axioms follow from properties of the dot product. The fourth inner product axiom follows from the fact that
$\langle p, p\rangle = [p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2 \geq 0$
with equality holding if and only if $p(x_0) = p(x_1) = \cdots = p(x_n) = 0$. But a nonzero polynomial of degree n or less can have at most n distinct roots, so it must be that $p = 0$, which proves that the fourth inner product axiom holds.

The norm of a polynomial p relative to the evaluation inner product is
$\|p\| = \langle p, p\rangle^{1/2} = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2}$ (11)
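A small sketch of Formulas 10 and 11; the sample points and the polynomials below are assumed examples chosen for illustration.

```python
import numpy as np

def eval_inner(p, q, points):
    """Evaluation inner product: sum of p(x_i) * q(x_i) over the sample points."""
    return sum(p(x) * q(x) for x in points)

points = [-2.0, 0.0, 2.0]                  # distinct sample points (assumed)
p = np.poly1d([1, 0, 0])                   # p(x) = x^2
q = np.poly1d([1, 1])                      # q(x) = x + 1

print(eval_inner(p, q, points))            # p(-2)q(-2) + p(0)q(0) + p(2)q(2) = -4 + 0 + 12 = 8
print(np.sqrt(eval_inner(p, p, points)))   # ||p|| = sqrt(16 + 0 + 16) = sqrt(32)
```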

E X A M P L E 9 Working with the Evaluation Inner Product
Let $P_2$ have the evaluation inner product at the sample points $x_0$, $x_1$, and $x_2$. Compute $\langle p, q\rangle$ and $\|q\|$ for polynomials p and q in $P_2$.

Solution It follows from 10 and 11 that both quantities are obtained by evaluating p and q at the three sample points.

CALCULUS REQUIRED

E X A M P L E 1 0 An Inner Product on C[a, b]
Let $f = f(x)$ and $g = g(x)$ be two functions in $C[a, b]$ and define
$\langle f, g\rangle = \int_a^b f(x)g(x)\,dx$ (12)
We will show that this formula defines an inner product on $C[a, b]$ by verifying the four inner product axioms for functions $f$, $g$, and $h$ in $C[a, b]$:
1. $\langle f, g\rangle = \int_a^b f(x)g(x)\,dx = \int_a^b g(x)f(x)\,dx = \langle g, f\rangle$
which proves that Axiom 1 holds.
2. $\langle f + g, h\rangle = \int_a^b [f(x) + g(x)]h(x)\,dx = \int_a^b f(x)h(x)\,dx + \int_a^b g(x)h(x)\,dx = \langle f, h\rangle + \langle g, h\rangle$
which proves that Axiom 2 holds.
3. $\langle kf, g\rangle = \int_a^b kf(x)g(x)\,dx = k\int_a^b f(x)g(x)\,dx = k\langle f, g\rangle$
which proves that Axiom 3 holds.
4. If $f = f(x)$ is any function in $C[a, b]$, then
$\langle f, f\rangle = \int_a^b f^2(x)\,dx \geq 0$ (13)
since $f^2(x) \geq 0$ for all x in the interval $[a, b]$. Moreover, because f is continuous on $[a, b]$, the equality holds in Formula 13 if and only if the function f is identically zero on $[a, b]$, that is, if and only if $f = 0$; and this proves that Axiom 4 holds.
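Formula 12 can be approximated numerically, which is often a useful sanity check. The sketch below uses a simple midpoint-rule approximation (not exact integration) on an assumed pair of functions.

```python
import numpy as np

def integral_inner(f, g, a, b, n=200000):
    """Approximate <f, g> = integral_a^b f(x) g(x) dx by the midpoint rule."""
    h = (b - a) / n
    x = a + h * (np.arange(n) + 0.5)          # midpoints of n subintervals
    return float(np.sum(f(x) * g(x)) * h)

a, b = 0.0, np.pi
print(integral_inner(np.sin, np.cos, a, b))              # ~0: sin and cos are orthogonal on [0, pi]
print(np.sqrt(integral_inner(np.sin, np.sin, a, b)))     # ||sin|| ~ sqrt(pi/2)
```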

CALCULUS REQUIRED

E X A M P L E 11 Norm of a Vector in C[a, b]
If $C[a, b]$ has the inner product that was defined in Example 10, then the norm of a function $f = f(x)$ relative to this inner product is
$\|f\| = \langle f, f\rangle^{1/2} = \sqrt{\int_a^b f^2(x)\,dx}$ (14)
and the unit sphere in this space consists of all functions f in $C[a, b]$ that satisfy the equation
$\int_a^b f^2(x)\,dx = 1$

Remark Note that the vector space $P_n$ is a subspace of $C[a, b]$ because polynomials are continuous functions. Thus, Formula 12 defines an inner product on $P_n$.

Remark Recall from calculus that the arc length of a curve $y = f(x)$ over an interval $[a, b]$ is given by the formula
$L = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx$ (15)
Do not confuse this concept of arc length with $\|f\|$, which is the length (norm) of f when f is viewed as a vector in $C[a, b]$. Formulas 14 and 15 are quite different.

Algebraic Properties of Inner Products The following theorem lists some of the algebraic properties of inner products that follow from the inner product axioms. This result is a generalization of Theorem 3.2.3, which applied only to the dot product on $R^n$.

THEOREM 6.1.2 If u, v, and w are vectors in a real inner product space V, and if k is a scalar, then
(a) $\langle 0, v\rangle = \langle v, 0\rangle = 0$
(b) $\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle$
(c) $\langle u, v - w\rangle = \langle u, v\rangle - \langle u, w\rangle$
(d) $\langle u - v, w\rangle = \langle u, w\rangle - \langle v, w\rangle$
(e) $k\langle u, v\rangle = \langle u, kv\rangle$

Proof We will prove part (b) and leave the proofs of the remaining parts as exercises. By the symmetry and additivity axioms,
$\langle u, v + w\rangle = \langle v + w, u\rangle = \langle v, u\rangle + \langle w, u\rangle = \langle u, v\rangle + \langle u, w\rangle$

The following example illustrates how Theorem 6.1.2 and the defining properties of inner products can be used to perform algebraic computations with inner products. As you read through the example, you will find it instructive to justify the steps.

E X A M P L E 1 2 Calculating with Inner Products

Concept Review • Inner product axioms • Euclidean inner product • Euclidean n-space • Weighted Euclidean inner product • Unit circle (sphere) • Matrix inner product • Norm in an inner product space • Distance between two vectors in an inner product space • Examples of inner products • Properties of inner products

Skills • Compute the inner product of two vectors. • Find the norm of a vector. • Find the distance between two vectors.

• Show that a given formula defines an inner product. • Show that a given formula does not define an inner product by demonstrating that at least one of the inner product space axioms fails.

Exercise Set 6.1 1. Let be the Euclidean inner product on following.

, and let

,

,

, and

. Compute the

(a) (b) (c) (d) (e) (f) Answer: (a) 5 (b) (c) (d) (e) (f) 2. Repeat Exercise 1 for the weighted Euclidean inner product 3. Let be the Euclidean inner product on following.

, and let

. ,

,

, and

. Verify the

(a) (b) (c) (d) (e) Answer: (a) 2 (b) 11 (c) (d) (e) 0 4. Repeat Exercise 3 for the weighted Euclidean inner product 5.

Let following.

be the inner product on

generated by

, and let

. ,

,

. Compute the

(a) (b) (c) (d) (e) (f) Answer: (a) (b) 1 (c) (d) 1 (e) 1 (f) 1 6.

Repeat Exercise 5 for the inner product on

7. Compute

generated by

.

using the inner product in Example 6.

(a) (b)

Answer: (a) 3 (b) 56 8. Compute

using the inner product in Example 7.

(a)

,

(b)

,

9. (a) Use Formula 4 to show that

(b) Use the inner product in part (a) to compute Answer: (b) 29 10. (a) Use Formula 4 to show that is the inner product on

generated by

is the inner product on

if

and

generated by

.

(b) Use the inner product in part (a) to compute 11. Let generates it.

and

if

and

.

. In each part, the given expression is an inner product on

. Find a matrix that

(a) (b) Answer: (a)

(b)

12. Let

have the inner product in Example 7. In each part, find

.

(a) (b) 13. Let

have the inner product in Example 6. In each part, find

.

(a) (b)

Answer: (a) (b) 0 14. Let

have the inner product in Example 7. Find

15. Let

have the inner product in Example 6. Find

.

.

(a) (b)

Answer: (a) (b) 16. Let

have the inner product of Example 9, and let

(a) (b) (c) 17. Let

have the evaluation inner product at the sample points

and

. Compute the following.

Find

and

for

and

.

Answer:

18. In each part, use the given inner product on

to find

, where

.

(a) the Euclidean inner product (b) the weighted Euclidean inner product

, where

and

(c) the inner product generated by the matrix

19. Use the inner products in Exercise 18 to find

for

Answer: (a) (b) (c) 20. Suppose that u, v, and w are vectors such that

Evaluate the given expression. (a) (b) (c) (d) (e) (f) 21. Sketch the unit circle in (a) (b) Answer: (a)

(b)

using the given inner product.

and

.

22. Find a weighted Euclidean inner product on

for which the unit circle is the ellipse shown in the accompanying figure.

Figure Ex-22 23. Let axioms hold.

and

. Show that the following are inner products on

by verifying that the inner product

(a) (b) Answer: For

, then

, so Axiom 4 fails.

24. Let and not, list the axioms that do not hold.

. Determine which of the following are inner products on

. For those that are

(a) (b) (c) (d) 25. Show that the following identity holds for vectors in any inner product space.

Answer: (a) (b) 0 26. Show that the following identity holds for vectors in any inner product space.

27. Let

and

. Show that

28. Calculus required Let the vector space

have the inner product

is not an inner product on

.

(a) Find

for

(b) Find

,

, and

if

and

. .

29. Calculus required Use the inner product

on

, to compute

.

(a) (b)

, ,

30. Calculus required In each part, use the inner product

on

to compute

.

(a) (b) (c) 31. Prove that Formula 4 defines an inner product on

.

32. The definition of a complex vector space was given in the first margin note in Section 4.1. The definition of a complex inner product on a complex vector space V is identical to Definition 1 except that scalars are allowed to be complex numbers, and Axiom 1 is replaced by . The remaining axioms are unchanged. A complex vector space with a complex inner product is called a complex inner product space. Prove that if V is a complex inner product space then .

True-False Exercises In parts (a)–(g) determine whether the statement is true or false, and justify your answer. (a) The dot product on

is an example of a weighted inner product.

Answer: True (b) The inner product of two vectors cannot be a negative real number. Answer: False (c)

. Answer: True

(d)

. Answer: True

(e) If

, then

or

.

Answer: False (f) If

, then

.

Answer: True (g) If A is an

matrix, then

Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

defines an inner product on

.

6.2 Angle and Orthogonality in Inner Product Spaces In Section 3.2 we defined the notion of "angle" between vectors in Rn. In this section we will extend this idea to general vector spaces. This will enable us to extend the notion of orthogonality as well, thereby setting the groundwork for a variety of new applications.

Cauchy–Schwarz Inequality Recall from Formula 20 of Section 3.2 that the angle between two vectors u and v in $R^n$ is
$\theta = \cos^{-1}\!\left(\dfrac{u \cdot v}{\|u\|\|v\|}\right)$ (1)
We were assured that this formula was valid because it followed from the Cauchy–Schwarz inequality (Theorem 3.2.4) that
$-1 \leq \dfrac{u \cdot v}{\|u\|\|v\|} \leq 1$ (2)
as required for the inverse cosine to be defined. The following generalization of Theorem 3.2.4 will enable us to define the angle between two vectors in any real inner product space.

THEOREM 6.2.1 Cauchy–Schwarz Inequality If u and v are vectors in a real inner product space V, then
$|\langle u, v\rangle| \leq \|u\|\,\|v\|$ (3)

Proof We warn you in advance that the proof presented here depends on a clever trick that is not easy to motivate. In the case where $u = 0$ the two sides of 3 are equal since $\langle u, v\rangle$ and $\|u\|\,\|v\|$ are both zero. Thus, we need only consider the case where $u \neq 0$. Making this assumption, let
$a = \langle u, u\rangle$, $b = 2\langle u, v\rangle$, $c = \langle v, v\rangle$
and let t be any real number. Since the positivity axiom states that the inner product of any vector with itself is nonnegative, it follows that
$0 \leq \langle tu + v, tu + v\rangle = \langle u, u\rangle t^2 + 2\langle u, v\rangle t + \langle v, v\rangle = at^2 + bt + c$
This inequality implies that the quadratic polynomial $at^2 + bt + c$ has either no real roots or a repeated real root. Therefore, its discriminant must satisfy the inequality $b^2 - 4ac \leq 0$. Expressing the coefficients $a$, $b$, and c in terms of the vectors u and v gives
$4\langle u, v\rangle^2 - 4\langle u, u\rangle\langle v, v\rangle \leq 0$
or, equivalently,
$\langle u, v\rangle^2 \leq \langle u, u\rangle\langle v, v\rangle$
Taking square roots of both sides and using the fact that $\langle u, u\rangle$ and $\langle v, v\rangle$ are nonnegative yields
$|\langle u, v\rangle| \leq \langle u, u\rangle^{1/2}\langle v, v\rangle^{1/2} = \|u\|\,\|v\|$
which completes the proof.

The following two alternative forms of the Cauchy–Schwarz inequality are useful to know:
$\langle u, v\rangle^2 \leq \langle u, u\rangle\langle v, v\rangle$ (4)
$\langle u, v\rangle^2 \leq \|u\|^2\|v\|^2$ (5)
The first of these formulas was obtained in the proof of Theorem 6.2.1, and the second is a variation of the first.

Angle Between Vectors Our next goal is to define what is meant by the "angle" between vectors in a real inner product space. As the first step, we leave it for you to use the Cauchy–Schwarz inequality to show that
$-1 \leq \dfrac{\langle u, v\rangle}{\|u\|\|v\|} \leq 1$ (6)
This being the case, there is a unique angle $\theta$ in radian measure for which
$\cos\theta = \dfrac{\langle u, v\rangle}{\|u\|\|v\|}$ and $0 \leq \theta \leq \pi$ (7)
(Figure 6.2.1). This enables us to define the angle θ between u and v to be
$\theta = \cos^{-1}\!\left(\dfrac{\langle u, v\rangle}{\|u\|\|v\|}\right)$ (8)

Figure 6.2.1

E X A M P L E 1 Cosine of an Angle Between Two Vectors in R4
Let $R^4$ have the Euclidean inner product. Find the cosine of the angle between the vectors u and v.

Solution We leave it for you to verify the values of $\|u\|$, $\|v\|$, and $\langle u, v\rangle$, from which it follows that
$\cos\theta = \dfrac{\langle u, v\rangle}{\|u\|\|v\|}$
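As a brief numerical sketch of Formula 8, here is the same computation carried out in NumPy for two hypothetical vectors in $R^4$ (not the vectors of Example 1).

```python
import numpy as np

u = np.array([4.0, 3.0, 1.0, -2.0])       # assumed example vectors
v = np.array([-2.0, 1.0, 2.0, 3.0])

cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))   # Formula 7
theta = np.arccos(cos_theta)                                     # Formula 8
print(cos_theta, theta)
```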

Properties of Length and Distance in General Inner Product Spaces In Section 3.2 we used the dot product to extend the notions of length and distance to , and we showed that various familiar theorems remained valid (see Theorem 3.2.5, Theorem 3.2.6, and Theorem 3.2.7). By making only minor adjustments to the proofs of those theorems, we can show that they remain valid in any real inner product space. For example, here is the generalization of Theorem 3.2.5 (the triangle inequalities).

THEOREM 6.2.2 If u, v, and w are vectors in a real inner product space V, then:
(a) $\|u + v\| \leq \|u\| + \|v\|$ [Triangle inequality for vectors]
(b) $d(u, v) \leq d(u, w) + d(w, v)$ [Triangle inequality for distances]

Proof (a)
$\|u + v\|^2 = \langle u + v, u + v\rangle = \langle u, u\rangle + 2\langle u, v\rangle + \langle v, v\rangle \leq \langle u, u\rangle + 2|\langle u, v\rangle| + \langle v, v\rangle \leq \langle u, u\rangle + 2\|u\|\|v\| + \langle v, v\rangle = \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 = (\|u\| + \|v\|)^2$
Taking square roots gives $\|u + v\| \leq \|u\| + \|v\|$.

Proof (b) Identical to the proof of part (b) of Theorem 3.2.5.

Orthogonality Although Example 1 is a useful mathematical exercise, there is only an occasional need to compute angles in vector spaces other than $R^2$ and $R^3$. A problem of more interest in general vector spaces is ascertaining whether the angle between vectors is $\pi/2$. You should be able to see from Formula 8 that if u and v are nonzero vectors, then the angle between them is $\pi/2$ if and only if $\langle u, v\rangle = 0$. Accordingly, we make the following definition (which is applicable even if one or both of the vectors is zero).

DEFINITION 1 Two vectors u and v in an inner product space are called orthogonal if $\langle u, v\rangle = 0$.

As the following example shows, orthogonality depends on the inner product in the sense that for different inner products two vectors can be orthogonal with respect to one but not the other.

E X A M P L E 2 Orthogonality Depends on the Inner Product
The vectors $u = (1, 1)$ and $v = (1, -1)$ are orthogonal with respect to the Euclidean inner product on $R^2$, since
$u \cdot v = (1)(1) + (1)(-1) = 0$
However, they are not orthogonal with respect to the weighted Euclidean inner product $\langle u, v\rangle = 3u_1v_1 + 2u_2v_2$, since
$\langle u, v\rangle = 3(1)(1) + 2(1)(-1) = 1 \neq 0$

E X A M P L E 3 Orthogonal Vectors in M22 If

has the inner product of Example 6 in the preceding section, then the matrices

are orthogonal, since

CALCULUS REQUIRED

E X A M P L E 4 Orthogonal Vectors in P2
Let $P_2$ have the inner product
$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx$
and let $p = x$ and $q = x^2$. Then
$\|p\| = \langle p, p\rangle^{1/2} = \left(\int_{-1}^{1} x^2\,dx\right)^{1/2} = \sqrt{\tfrac{2}{3}}$
$\|q\| = \langle q, q\rangle^{1/2} = \left(\int_{-1}^{1} x^4\,dx\right)^{1/2} = \sqrt{\tfrac{2}{5}}$
Because $\langle p, q\rangle = \int_{-1}^{1} x^3\,dx = 0$, the vectors $p = x$ and $q = x^2$ are orthogonal relative to the given inner product.

In Section 3.3 we proved the Theorem of Pythagoras for vectors in Euclidean n-space. The following theorem extends this result to vectors in any real inner product space.

THEOREM 6.2.3 Generalized Theorem of Pythagoras If u and v are orthogonal vectors in an inner product space, then
$\|u + v\|^2 = \|u\|^2 + \|v\|^2$

Proof The orthogonality of u and v implies that $\langle u, v\rangle = 0$, so
$\|u + v\|^2 = \langle u + v, u + v\rangle = \|u\|^2 + 2\langle u, v\rangle + \|v\|^2 = \|u\|^2 + \|v\|^2$

CALCULUS REQUIRED

E X A M P L E 5 Theorem of Pythagoras in P2
In Example 4 we showed that $p = x$ and $q = x^2$ are orthogonal with respect to the inner product
$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx$
on $P_2$. It follows from Theorem 6.2.3 that
$\|p + q\|^2 = \|p\|^2 + \|q\|^2$
Thus, from the computations in Example 4, we have
$\|p + q\|^2 = \tfrac{2}{3} + \tfrac{2}{5} = \tfrac{16}{15}$
We can check this result by direct integration:
$\|p + q\|^2 = \langle p + q, p + q\rangle = \int_{-1}^{1}(x + x^2)^2\,dx = \int_{-1}^{1} x^2\,dx + 2\int_{-1}^{1} x^3\,dx + \int_{-1}^{1} x^4\,dx = \tfrac{2}{3} + 0 + \tfrac{2}{5} = \tfrac{16}{15}$

Orthogonal Complements In Section 4.8 we defined the notion of an orthogonal complement for subspaces of , and we used that definition to establish a geometric link between the fundamental spaces of a matrix. The following definition extends that idea to general inner product spaces.

DEFINITION 2 If W is a subspace of an inner product space V, then the set of all vectors in V that are orthogonal to every vector in W is called the orthogonal complement of W and is denoted by the symbol $W^\perp$.

In Theorem 4.8.8 we stated three properties of orthogonal complements in . The following theorem generalizes parts (a) and (b) of that theorem to general inner product spaces.

THEOREM 6.2.4 If W is a subspace of an inner product space V, then:
(a) $W^\perp$ is a subspace of V.
(b) $W \cap W^\perp = \{0\}$.

Proof (a) The set $W^\perp$ contains at least the zero vector, since $\langle 0, w\rangle = 0$ for every vector w in W. Thus, it remains to show that $W^\perp$ is closed under addition and scalar multiplication. To do this, suppose that u and v are vectors in $W^\perp$, so that for every vector w in W we have $\langle u, w\rangle = 0$ and $\langle v, w\rangle = 0$. It follows from the additivity and homogeneity axioms of inner products that
$\langle u + v, w\rangle = \langle u, w\rangle + \langle v, w\rangle = 0 + 0 = 0$
$\langle ku, w\rangle = k\langle u, w\rangle = k(0) = 0$
which proves that $u + v$ and $ku$ are in $W^\perp$.

Proof (b) If v is any vector in both W and $W^\perp$, then v is orthogonal to itself; that is, $\langle v, v\rangle = 0$. It follows from the positivity axiom for inner products that $v = 0$.

The next theorem, which we state without proof, generalizes part (c) of Theorem 4.8.8. Note, however, that this theorem applies only to finite-dimensional inner product spaces, whereas Theorem 6.2.4 does not have this restriction.

THEOREM 6.2.5 If W is a subspace of a finite-dimensional inner product space V, then the orthogonal complement of $W^\perp$ is W; that is,
$(W^\perp)^\perp = W$

Theorem 6.2.5 implies that in a finite-dimensional inner product space orthogonal complements occur in pairs, each being orthogonal to the other (Figure 6.2.2).

Figure 6.2.2 Each vector in W is orthogonal to each vector in W⊥ and conversely

In our study of the fundamental spaces of a matrix in Section 4.8 we showed that the row space and null space of a matrix are orthogonal complements with respect to the Euclidean inner product on (Theorem 4.8.9). The following example takes advantage of that fact.

E X A M P L E 6 Basis for an Orthogonal Complement Let W be the subspace of

spanned by the vectors

Find a basis for the orthogonal complement of W. Solution The space W is the same as the row space of the matrix

Since the row space and null space of A are orthogonal complements, our problem reduces to finding a basis for the null space of this matrix. In Example 4 of Section 4.7 we showed that

form a basis for this null space. Expressing these vectors in comma-delimited form (to match that of the spanning vectors of W), we obtain the required basis for $W^\perp$. You may want to check that these vectors are orthogonal to the spanning vectors of W by computing the necessary dot products.
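A short computational sketch of this idea: with the rows of a matrix spanning W, a basis for $W^\perp$ can be read off from the null space. The spanning vectors below are hypothetical, not those of Example 6.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 0.0, -1.0],
              [0.0, 1.0, 1.0,  1.0]])     # rows span W (assumed example)

W_perp = null_space(A)                    # columns form an orthonormal basis for W-perp
print(W_perp.shape)                       # (4, 2): two basis vectors
print(np.allclose(A @ W_perp, 0))         # each basis vector is orthogonal to every row of A
```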

Concept Review • Cauchy–Schwarz inequality • Angle between vectors • Orthogonal vectors • Orthogonal complement

Skills


• Find the angle between two vectors in an inner product space. • Determine whether two vectors in an inner product space are orthogonal. • Find a basis for the orthogonal complement of a subspace of an inner product space.

Exercise Set 6.2 1. Let , and v.

, and

have the Euclidean inner product. In each part, find the cosine of the angle between u

(a) (b) (c) (d) (e) (f) Answer: (a) (b) (c) 0 (d) (e) (f) 2. Let

have the inner product in Example 7 of Section 6.1 . Find the cosine of the angle between pand q.

(a) (b) 3. Let B. (a) (b)

have the inner product in Example 6 of Section 6.1 . Find the cosine of the angle between A and

Answer: (a) (b) 0 4. In each part, determine whether the given vectors are orthogonal with respect to the Euclidean inner product. (a) (b) (c) (d) (e) (f) 5. Show that

and

are orthogonal with respect to the inner product in Exercise

2. 6. Let

Which of the following matrices are orthogonal to A with respect to the inner product in Exercise 3? (a) (b) (c) (d) 7. Do there exist scalars k and l such that the vectors , mutually orthogonal with respect to the Euclidean inner product?

, and

are

Answer: No 8. Let have the Euclidean inner product, and suppose that value of k for which . 9. Let (a) (b)

and

have the Euclidean inner product. For which values of k are u and v orthogonal?

. Find a

Answer: (a) (b) 10. Let have the Euclidean inner product. Find two unit vectors that are orthogonal to all three of the vectors , , and . 11. In each part, verify that the Cauchy–Schwarz inequality holds for the given vectors using the Euclidean inner product. (a) (b) (c) (d) 12. In each part, verify that the Cauchy–Schwarz inequality holds for the given vectors. (a)

and

using the inner product of Example 1 of Section 6.1 .

(b)

using the inner product in Example 6 of Section 6.1 .

(c)

and

using the inner product given in Example 7 of Section 6.1 .

13. Let have the Euclidean inner product, and let orthogonal to the subspace spanned by the vectors .

. Determine whether the vector u is , , and

Answer: No In Exercises 14–15, assume that 14. Let W be the line in

has the Euclidean inner product.

with equation

15. (a) Let W be the plane in (b) Let W be the line in Find an equation for

. Find an equation for

with equation with parametric equations .

(c) Let W be the intersection of the two planes in Answer: (a)

. Find an equation for

.

.

. Find parametric equations for

.

(b) (c) 16. Find a basis for the orthogonal complement of the subspace of (a)

,

(b)

,

(c)

,

(d)

spanned by the vectors.

, , ,

,

,

17. Let V be an inner product space. Show that if u and v are orthogonal unit vectors in V, then . 18. Let V be an inner product space. Show that if w is orthogonal to both and , then it is orthogonal to for all scalars and . Interpret this result geometrically in the case where V is with the Euclidean inner product. 19. Let V be an inner product space. Show that if w is orthogonal to each of the vectors is orthogonal to every vector in span .

, then it

20. Let be a basis for an inner product space V. Show that the zero vector is the only vector in V that is orthogonal to all of the basis vectors. 21. Let be a basis for a subspace W of V. Show that orthogonal to every basis vector. 22. Prove the following generalization of Theorem 6.2.3: If an inner product space V, then

23. Prove: If u and v are

matrices and A is an

consists of all vectors in V that are are pairwise orthogonal vectors in

matrix, then

24. Use the Cauchy–Schwarz inequality to prove that for all real values of a, b, and ,

25. Prove: If are any two vectors in

are positive real numbers, and if , then

and

26. Show that equality holds in the Cauchy–Schwarz inequality if and only if u and v are linearly dependent. 27. Use vector methods to prove that a triangle that is inscribed in a circle so that it has a diameter for a side must be a right triangle. [Hint: Express the vectors and in the accompanying figure in terms of u andv.]

Figure Ex-27 28. As illustrated in the accompanying figure, the vectors

and

have norm 2 and

an angle of 60° between them relative to the Euclidean inner product. Find a weighted Euclidean inner product with respect to which u and v are orthogonal unit vectors.

Figure Ex-28 29. Calculus required Let

and

be continuous functions on

. Prove:

(a)

(b)

[Hint: Use the Cauchy–Schwarz inequality.] 30. Calculus required Let

and let 31. (a) Let W be the line

have the inner product

. Show that if

, then

in an xy-coordinate system in

(b) Let W be the y-axis in an xyz-coordinate system in (c) Let W be the yz-plane of an xyz-coordinate system in

and

are orthogonal vectors.

. Describe the subspace . Describe the subspace

. .

. Describe the subspace

Answer: (a) The line (b) The xz-plane (c) The x-axis 32. Prove that Formula 4 holds for all nonzero vectors u and v in an inner product space V.

.

True-False Exercises In parts (a)–(f) determine whether the statement is true or false, and justify your answer. (a) If u is orthogonal to every vector of a subspace W, then

.

Answer: False (b) If u is a vector in both W and

, then

.

Answer: True (c) If u and v are vectors in

, then

is in

.

Answer: True (d) If u is a vector in

and k is a real number, then

is in

Answer: True (e) If u and v are orthogonal, then

.

Answer: False (f) If u and v are orthogonal, then Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

.

.

6.3 Gram–Schmidt Process; QR-Decomposition In many problems involving vector spaces, the problem solver is free to choose any basis for the vector space that seems appropriate. In inner product spaces, the solution of a problem is often greatly simplified by choosing a basis in which the vectors are orthogonal to one another. In this section we will show how such bases can be obtained.

Orthogonal and Orthonormal Sets Recall from Section 6.2 that two vectors in an inner product space are said to be orthogonal if their inner product is zero. The following definition extends the notion of orthogonality to sets of vectors in an inner product space.

DEFINITION 1 A set of two or more vectors in a real inner product space is said to be orthogonal if all pairs of distinct vectors in the set are orthogonal. An orthogonal set in which each vector has norm 1 is said to be orthonormal.

E X A M P L E 1 An Orthogonal Set in R3 Let and assume that

has the Euclidean inner product. It follows that the set of vectors is orthogonal since .

If v is a nonzero vector in an inner product space, then it follows from Theorem 6.1.1b with $k = 1/\|v\|$ that
$\left\|\dfrac{1}{\|v\|}v\right\| = \left|\dfrac{1}{\|v\|}\right|\|v\| = 1$
from which we see that multiplying a nonzero vector by the reciprocal of its norm produces a vector of norm 1. This process is called normalizing v. It follows that any orthogonal set of nonzero vectors can be converted to an orthonormal set by normalizing each of its vectors.

E X A M P L E 2 Constructing an Orthonormal Set The Euclidean norms of the vectors in Example 1 are Consequently, normalizing

,

, and

yields

We leave it for you to verify that the set

is orthonormal by showing that

In $R^2$ any two nonzero perpendicular vectors are linearly independent because neither is a scalar multiple of the other; and in $R^3$ any three nonzero mutually perpendicular vectors are linearly independent because no one lies in the plane of the other two (and hence is not expressible as a linear combination of the other two). The following theorem generalizes these observations.

THEOREM 6.3.1 If $S = \{v_1, v_2, \dots, v_n\}$ is an orthogonal set of nonzero vectors in an inner product space, then S is linearly independent.

Proof Assume that
$k_1v_1 + k_2v_2 + \cdots + k_nv_n = 0$ (1)
To demonstrate that $S = \{v_1, v_2, \dots, v_n\}$ is linearly independent, we must prove that $k_1 = k_2 = \cdots = k_n = 0$.
For each $v_i$ in S, it follows from 1 that
$\langle k_1v_1 + k_2v_2 + \cdots + k_nv_n, v_i\rangle = \langle 0, v_i\rangle = 0$
or, equivalently,
$k_1\langle v_1, v_i\rangle + k_2\langle v_2, v_i\rangle + \cdots + k_n\langle v_n, v_i\rangle = 0$
From the orthogonality of S it follows that $\langle v_j, v_i\rangle = 0$ when $j \neq i$, so this equation reduces to
$k_i\langle v_i, v_i\rangle = 0$
Since the vectors in S are assumed to be nonzero, it follows from the positivity axiom for inner products that $\langle v_i, v_i\rangle \neq 0$. Thus, the preceding equation implies that each $k_i$ in Equation 1 is zero, which is what we wanted to prove.
Since an orthonormal set is orthogonal, and since its vectors are nonzero (norm 1), it follows from Theorem 6.3.1 that every orthonormal set is linearly independent.

In an inner product space, a basis consisting of orthonormal vectors is called an orthonormal basis, and a basis

consisting of orthogonal vectors is called an orthogonal basis. A familiar example of an orthonormal basis is the standard basis for with the Euclidean inner product:

E X A M P L E 3 An Orthonormal Basis In Example 2 we showed that the vectors

form an orthonormal set with respect to the Euclidean inner product on $R^3$. By Theorem 6.3.1, these vectors form a linearly independent set, and since $R^3$ is three-dimensional, it follows from Theorem 4.5.4 that they form an orthonormal basis for $R^3$.

Coordinates Relative to Orthonormal Bases One way to express a vector u as a linear combination of basis vectors is to convert the vector equation to a linear system and solve for the coefficients . However, if the basis happens to be orthogonal or orthonormal, then the following theorem shows that the coefficients can be obtained more simply by computing appropriate inner products.

THEOREM 6.3.2
(a) If $S = \{v_1, v_2, \dots, v_n\}$ is an orthogonal basis for an inner product space V, and if u is any vector in V, then
$u = \dfrac{\langle u, v_1\rangle}{\|v_1\|^2}v_1 + \dfrac{\langle u, v_2\rangle}{\|v_2\|^2}v_2 + \cdots + \dfrac{\langle u, v_n\rangle}{\|v_n\|^2}v_n$ (2)
(b) If $S = \{v_1, v_2, \dots, v_n\}$ is an orthonormal basis for an inner product space V, and if u is any vector in V, then
$u = \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2 + \cdots + \langle u, v_n\rangle v_n$ (3)

Proof (a) Since $S = \{v_1, v_2, \dots, v_n\}$ is a basis for V, every vector u in V can be expressed in the form
$u = c_1v_1 + c_2v_2 + \cdots + c_nv_n$
We will complete the proof by showing that
$c_i = \dfrac{\langle u, v_i\rangle}{\|v_i\|^2}$ (4)
for $i = 1, 2, \dots, n$. To do this, observe first that
$\langle u, v_i\rangle = \langle c_1v_1 + c_2v_2 + \cdots + c_nv_n, v_i\rangle = c_1\langle v_1, v_i\rangle + c_2\langle v_2, v_i\rangle + \cdots + c_n\langle v_n, v_i\rangle$
Since S is an orthogonal set, all of the inner products in the last equality are zero except the ith, so we have
$\langle u, v_i\rangle = c_i\langle v_i, v_i\rangle = c_i\|v_i\|^2$
Solving this equation for $c_i$ yields 4, which completes the proof.

Proof (b) In this case, $\|v_1\| = \|v_2\| = \cdots = \|v_n\| = 1$, so Formula 2 simplifies to Formula 3.

Using the terminology and notation from Definition 2 of Section 4.4, it follows from Theorem 6.3.2 that the coordinate vector of a vector u in V relative to an orthogonal basis $S = \{v_1, v_2, \dots, v_n\}$ is
$(u)_S = \left(\dfrac{\langle u, v_1\rangle}{\|v_1\|^2}, \dfrac{\langle u, v_2\rangle}{\|v_2\|^2}, \dots, \dfrac{\langle u, v_n\rangle}{\|v_n\|^2}\right)$ (5)
and relative to an orthonormal basis $S = \{v_1, v_2, \dots, v_n\}$ is
$(u)_S = (\langle u, v_1\rangle, \langle u, v_2\rangle, \dots, \langle u, v_n\rangle)$ (6)
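A quick numerical sketch of Formula 6: for an orthonormal basis of $R^3$, the coordinates of a vector are just its inner products with the basis vectors. The basis and vector below are assumed examples.

```python
import numpy as np

v1 = np.array([0.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0]) / np.sqrt(2)
v3 = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)      # v1, v2, v3 orthonormal (assumed example)

u = np.array([2.0, -1.0, 3.0])
coords = np.array([u @ v1, u @ v2, u @ v3])       # (u)_S from Formula 6
print(coords)
print(np.allclose(coords[0]*v1 + coords[1]*v2 + coords[2]*v3, u))   # reconstructs u
```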

E X A M P L E 4 A Coordinate Vector Relative to an Orthonormal Basis Let

It is easy to check that Express the vector .

is an orthonormal basis for with the Euclidean inner product. as a linear combination of the vectors in S, and find the coordinate vector

Solution We leave it for you to verify that

Therefore, by Theorem 6.3.2 we have

that is,

Thus, the coordinate vector of u relative to S is

E X A M P L E 5 An Orthonormal Basis from an Orthogonal Basis (a) Show that the vectors form an orthogonal basis for with the Euclidean inner product, and use that basis to find an orthonormal basis by normalizing each vector. (b) Express the vector in part (a).

as a linear combination of the orthonormal basis vectors obtained

Solution (a) The given vectors form an orthogonal set since It follows from Theorem 6.3.1 that these vectors are linearly independent and hence form a basis for by Theorem 4.5.4. We leave it for you to calculate the norms of , and and then obtain the orthonormal basis

(b) It follows from Formula 3 that We leave it for you to confirm that

and hence that

Orthogonal Projections Many applied problems are best solved by working with orthogonal or orthonormal basis vectors. Such bases are typically found by starting with some simple basis (say a standard basis) and then converting that basis into an

orthogonal or orthonormal basis. To explain exactly how that is done will require some preliminary ideas about orthogonal projections. In Section 3.3 we proved a result called the Projection Theorem (see Theorem 3.3.2) which dealt with the problem of decomposing a vector u in $R^n$ into a sum of two terms, $w_1$ and $w_2$, in which $w_1$ is the orthogonal projection of u on some nonzero vector a and $w_2$ is orthogonal to $w_1$ (Figure 3.3.2). That result is a special case of the following more general theorem.

THEOREM 6.3.3 Projection Theorem If W is a finite-dimensional subspace of an inner product space V, then every vector u in V can be expressed in exactly one way as
$u = w_1 + w_2$ (7)
where $w_1$ is in W and $w_2$ is in $W^\perp$.

The vectors $w_1$ and $w_2$ in Formula 7 are commonly denoted by
$w_1 = \mathrm{proj}_W\,u$ and $w_2 = \mathrm{proj}_{W^\perp}\,u$ (8)
They are called the orthogonal projection of u on W and the orthogonal projection of u on $W^\perp$, respectively. The vector $w_2$ is also called the component of u orthogonal to W. Using the notation in 8, Formula 7 can be expressed as
$u = \mathrm{proj}_W\,u + \mathrm{proj}_{W^\perp}\,u$ (9)
(Figure 6.3.1). Moreover, since $\mathrm{proj}_{W^\perp}\,u = u - \mathrm{proj}_W\,u$, we can also express Formula 9 as
$u = \mathrm{proj}_W\,u + (u - \mathrm{proj}_W\,u)$ (10)

Figure 6.3.1 The following theorem provides formulas for calculating orthogonal projections.

THEOREM 6.3.4 Let W be a finite-dimensional subspace of an inner product space V.
(a) If $\{v_1, v_2, \dots, v_r\}$ is an orthogonal basis for W, and u is any vector in V, then
$\mathrm{proj}_W\,u = \dfrac{\langle u, v_1\rangle}{\|v_1\|^2}v_1 + \dfrac{\langle u, v_2\rangle}{\|v_2\|^2}v_2 + \cdots + \dfrac{\langle u, v_r\rangle}{\|v_r\|^2}v_r$ (11)
(b) If $\{v_1, v_2, \dots, v_r\}$ is an orthonormal basis for W, and u is any vector in V, then
$\mathrm{proj}_W\,u = \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2 + \cdots + \langle u, v_r\rangle v_r$ (12)

Proof (a) It follows from Theorem 6.3.3 that the vector u can be expressed in the form $u = w_1 + w_2$, where $w_1 = \mathrm{proj}_W\,u$ is in W and $w_2$ is in $W^\perp$; and it follows from Theorem 6.3.2 that the component $w_1 = \mathrm{proj}_W\,u$ can be expressed in terms of the basis vectors for W as
$\mathrm{proj}_W\,u = w_1 = \dfrac{\langle w_1, v_1\rangle}{\|v_1\|^2}v_1 + \dfrac{\langle w_1, v_2\rangle}{\|v_2\|^2}v_2 + \cdots + \dfrac{\langle w_1, v_r\rangle}{\|v_r\|^2}v_r$ (13)
Since $w_2$ is orthogonal to W, it follows that
$\langle w_1, v_i\rangle = \langle w_1 + w_2, v_i\rangle = \langle u, v_i\rangle \qquad (i = 1, 2, \dots, r)$
so we can rewrite 13 as
$\mathrm{proj}_W\,u = \dfrac{\langle u, v_1\rangle}{\|v_1\|^2}v_1 + \dfrac{\langle u, v_2\rangle}{\|v_2\|^2}v_2 + \cdots + \dfrac{\langle u, v_r\rangle}{\|v_r\|^2}v_r$
or, equivalently, as Formula 11.

Proof (b) In this case, $\|v_1\| = \|v_2\| = \cdots = \|v_r\| = 1$, so Formula 11 simplifies to Formula 12.

E X A M P L E 6 Calculating Projections
Let $R^3$ have the Euclidean inner product, and let W be the subspace spanned by the orthonormal vectors $v_1$ and $v_2$. From Formula 12 the orthogonal projection of a vector u in $R^3$ on W is
$\mathrm{proj}_W\,u = \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2$
The component of u orthogonal to W is
$\mathrm{proj}_{W^\perp}\,u = u - \mathrm{proj}_W\,u$
Observe that $\mathrm{proj}_{W^\perp}\,u$ is orthogonal to both $v_1$ and $v_2$, so this vector is orthogonal to each vector in the space W spanned by $v_1$ and $v_2$, as it should be.
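The sketch below carries out the projection of Formula 12 for an assumed orthonormal pair in $R^3$ and checks that the leftover component really is orthogonal to W.

```python
import numpy as np

v1 = np.array([0.0, 1.0, 0.0])
v2 = np.array([-4.0, 0.0, 3.0]) / 5.0      # orthonormal spanning vectors for W (assumed example)
u  = np.array([1.0, 1.0, 1.0])

proj = (u @ v1) * v1 + (u @ v2) * v2       # proj_W u, Formula 12
perp = u - proj                            # component of u orthogonal to W
print(proj)
print(np.isclose(perp @ v1, 0), np.isclose(perp @ v2, 0))   # True True
```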

A Geometric Interpretation of Orthogonal Projections If W is a one-dimensional subspace of an inner product space V, say span term

, then Formula 11 has only the one

In the special case where V is with the Euclidean inner product, this is exactly Formula 10 of Section 3.3 for the orthogonal projection of u along a. This suggests that we can think of 11 as the sum of orthogonal projections on “axes” determined by the basis vectors for the subspace W (Figure 6.3.2).

Figure 6.3.2

The Gram–Schmidt Process We have seen that orthonormal bases exhibit a variety of useful properties. Our next theorem, which is the main result in this section, shows that every nonzero finite-dimensional vector space has an orthonormal basis. The proof of this result is extremely important, since it provides an algorithm, or method, for converting an arbitrary basis into an orthonormal basis.

THEOREM 6.3.5 Every nonzero finite-dimensional inner product space has an orthonormal basis.

Proof Let W be any nonzero finite-dimensional subspace of an inner product space, and suppose that $\{u_1, u_2, \dots, u_r\}$ is any basis for W. It suffices to show that W has an orthogonal basis, since the vectors in that basis can be normalized to obtain an orthonormal basis. The following sequence of steps will produce an orthogonal basis $\{v_1, v_2, \dots, v_r\}$ for W:

Step 1. Let $v_1 = u_1$.

Step 2. As illustrated in Figure 6.3.3, we can obtain a vector $v_2$ that is orthogonal to $v_1$ by computing the component of $u_2$ that is orthogonal to the space $W_1$ spanned by $v_1$. Using Formula 11 to perform this computation we obtain
$v_2 = u_2 - \mathrm{proj}_{W_1}u_2 = u_2 - \dfrac{\langle u_2, v_1\rangle}{\|v_1\|^2}v_1$
Of course, if $v_2 = 0$, then $v_2$ is not a basis vector. But this cannot happen, since it would then follow from the above formula for $v_2$ that
$u_2 = \dfrac{\langle u_2, v_1\rangle}{\|v_1\|^2}v_1 = \dfrac{\langle u_2, v_1\rangle}{\|u_1\|^2}u_1$
which implies that $u_2$ is a multiple of $u_1$, contradicting the linear independence of the basis $\{u_1, u_2, \dots, u_r\}$.

Figure 6.3.3

Step 3. To construct a vector $v_3$ that is orthogonal to both $v_1$ and $v_2$, we compute the component of $u_3$ orthogonal to the space $W_2$ spanned by $v_1$ and $v_2$ (Figure 6.3.4). Using Formula 11 to perform this computation we obtain
$v_3 = u_3 - \mathrm{proj}_{W_2}u_3 = u_3 - \dfrac{\langle u_3, v_1\rangle}{\|v_1\|^2}v_1 - \dfrac{\langle u_3, v_2\rangle}{\|v_2\|^2}v_2$
As in Step 2, the linear independence of $\{u_1, u_2, \dots, u_r\}$ ensures that $v_3 \neq 0$. We leave the details for you.

Figure 6.3.4

Step 4. To determine a vector $v_4$ that is orthogonal to $v_1$, $v_2$, and $v_3$, we compute the component of $u_4$ orthogonal to the space $W_3$ spanned by $v_1$, $v_2$, and $v_3$. From 11,
$v_4 = u_4 - \mathrm{proj}_{W_3}u_4 = u_4 - \dfrac{\langle u_4, v_1\rangle}{\|v_1\|^2}v_1 - \dfrac{\langle u_4, v_2\rangle}{\|v_2\|^2}v_2 - \dfrac{\langle u_4, v_3\rangle}{\|v_3\|^2}v_3$

Continuing in this way we will produce an orthogonal set of vectors $\{v_1, v_2, \dots, v_r\}$ after r steps. Since orthogonal sets of nonzero vectors are linearly independent, this set will be an orthogonal basis for the r-dimensional space W. By normalizing these basis vectors we can obtain an orthonormal basis.

The step-by-step construction of an orthogonal (or orthonormal) basis given in the foregoing proof is called the Gram–Schmidt process. For reference, we provide the following summary of the steps.

The Gram–Schmidt Process
To convert a basis $\{u_1, u_2, \dots, u_r\}$ into an orthogonal basis $\{v_1, v_2, \dots, v_r\}$, perform the following computations:
Step 1. $v_1 = u_1$
Step 2. $v_2 = u_2 - \dfrac{\langle u_2, v_1\rangle}{\|v_1\|^2}v_1$
Step 3. $v_3 = u_3 - \dfrac{\langle u_3, v_1\rangle}{\|v_1\|^2}v_1 - \dfrac{\langle u_3, v_2\rangle}{\|v_2\|^2}v_2$
Step 4. $v_4 = u_4 - \dfrac{\langle u_4, v_1\rangle}{\|v_1\|^2}v_1 - \dfrac{\langle u_4, v_2\rangle}{\|v_2\|^2}v_2 - \dfrac{\langle u_4, v_3\rangle}{\|v_3\|^2}v_3$
(continue for r steps)
Optional Step. To convert the orthogonal basis into an orthonormal basis, normalize the orthogonal basis vectors.
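Here is a minimal sketch of the steps above for the Euclidean inner product on $R^n$, including the optional normalization; the basis used to test it is a hypothetical example.

```python
import numpy as np

def gram_schmidt(U):
    """Columns of U are a basis; returns a matrix with orthonormal columns."""
    U = np.asarray(U, dtype=float)
    Q = []
    for u in U.T:
        v = u.copy()
        for q in Q:                        # subtract projections onto earlier vectors
            v -= (u @ q) * q
        Q.append(v / np.linalg.norm(v))    # optional step: normalize
    return np.column_stack(Q)

U = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])            # basis vectors as columns (assumed example)
Q = gram_schmidt(U)
print(np.allclose(Q.T @ Q, np.eye(3)))     # columns are orthonormal
```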

E X A M P L E 7 Using the Gram–Schmidt Process Assume that the vector space to transform the basis vectors into an orthogonal basis orthonormal basis Solution Step 1. Step 2.

has the Euclidean inner product. Apply the Gram–Schmidt process

, and then normalize the orthogonal basis vectors to obtain an .

Step 3.

Thus,

form an orthogonal basis for

. The norms of these vectors are

so an orthonormal basis for

is

Remark In the last example we normalized at the end to convert the orthogonal basis into an orthonormal basis. Alternatively, we could have normalized each orthogonal basis vector as soon as it was obtained, thereby producing an orthonormal basis step by step. However, that procedure generally has the disadvantage in hand calculation of producing more square roots to manipulate. A more useful variation is to “scale” the orthogonal basis vectors at each step to eliminate some of the fractions. For example, after Step 2 above, we could have multiplied by 3 to produce as the second orthogonal basis vector, thereby simplifying the calculations in Step 3.

Erhardt Schmidt (1875–1959) Historical Note Schmidt was a German mathematician who studied for his doctoral degree at Göttingen University under David Hilbert, one of the giants of modern mathematics. For most of his life he taught at Berlin University where, in addition to making important contributions to many branches of mathematics, he fashioned some of Hilbert's ideas into a general concept, called a Hilbert space—a fundamental idea in the study of infinite-dimensional vector spaces. He first described the process that bears his name in a paper on integral equations that he published in 1907. [Image: Archives of the Mathematisches Forschungsinst]

Jørgen Pedersen Gram (1850–1916) Historical Note Gram was a Danish actuary whose early education was at village schools supplemented by private tutoring. He obtained a doctorate degree in mathematics while working for the Hafnia Life Insurance Company, where he specialized in the mathematics of accident insurance. It was in his dissertation that his contributions to the Gram–Schmidt process were formulated. He eventually became interested in abstract mathematics and received a gold medal from the Royal Danish Society of Sciences and Letters in recognition of his work. His lifelong interest in applied mathematics never wavered, however, and he produced a variety of treatises on Danish forest management. [Image: wikipedia]

CALCULUS REQUIRED

E X A M P L E 8 Legendre Polynomials Let the vector space

have the inner product

Apply the Gram–Schmidt process to transform the standard basis orthogonal basis Solution Take Step 1. Step 2. We have

. ,

, and

.

for

into an

so

Step 3. We have

so

Thus, we have obtained the orthogonal basis

,

,

in which

Remark The orthogonal basis vectors in the foregoing example are often scaled so all three functions have a value of 1 at $x = 1$. The resulting polynomials

which are known as the first three Legendre polynomials, play an important role in a variety of applications. The scaling does not affect the orthogonality.

Extending Orthonormal Sets to Orthonormal Bases Recall from part (b) of Theorem 4.5.5 that a linearly independent set in a finite-dimensional vector space can be enlarged to a basis by adding appropriate vectors. The following theorem is an analog of that result for orthogonal and orthonormal sets in finite-dimensional inner product spaces.

THEOREM 6.3.6 If W is a finite-dimensional inner product space, then: (a) Every orthogonal set of nonzero vectors in W can be enlarged to an orthogonal basis for W. (b) Every orthonormal set in W can be enlarged to an orthonormal basis for W.

We will prove part (b) and leave part (a) as an exercise. Proof (b) Suppose that us that we can enlarge S to some basis

is an orthonormal set of vectors in W. Part (b) of Theorem 4.5.5 tells

for W. If we now apply the Gram–Schmidt process to the set since they are already orthonormal, and the resulting set

, then the vectors

, will not be affected

will be an orthonormal basis for W.

OPTIONAL

QR-Decomposition In recent years a numerical algorithm based on the Gram–Schmidt process, and known as QR-decomposition, has assumed growing importance as the mathematical foundation for a wide variety of numerical algorithms, including those for computing eigenvalues of large matrices. The technical aspects of such algorithms are discussed in textbooks that specialize in the numerical aspects of linear algebra. However, we will discuss some of the underlying ideas here. We begin by posing the following problem.

Problem If A is an matrix with linearly independent column vectors, and if Q is the matrix that results by applying the Gram–Schmidt process to the column vectors of A, what relationship, if any, exists between A and Q?

To solve this problem, suppose that the column vectors of A are $u_1, u_2, \dots, u_n$ and the orthonormal column vectors of Q are $q_1, q_2, \dots, q_n$. Thus, A and Q can be written in partitioned form as
$A = [u_1 \mid u_2 \mid \cdots \mid u_n]$ and $Q = [q_1 \mid q_2 \mid \cdots \mid q_n]$
It follows from Theorem 6.3.2b that $u_1, u_2, \dots, u_n$ are expressible in terms of the vectors $q_1, q_2, \dots, q_n$ as
$u_1 = \langle u_1, q_1\rangle q_1 + \langle u_1, q_2\rangle q_2 + \cdots + \langle u_1, q_n\rangle q_n$
$u_2 = \langle u_2, q_1\rangle q_1 + \langle u_2, q_2\rangle q_2 + \cdots + \langle u_2, q_n\rangle q_n$
$\vdots$
$u_n = \langle u_n, q_1\rangle q_1 + \langle u_n, q_2\rangle q_2 + \cdots + \langle u_n, q_n\rangle q_n$
Recalling from Section 1.3 (Example 9) that the jth column vector of a matrix product is a linear combination of the column vectors of the first factor with coefficients coming from the jth column of the second factor, it follows that these relationships can be expressed in matrix form as
$[u_1 \mid u_2 \mid \cdots \mid u_n] = [q_1 \mid q_2 \mid \cdots \mid q_n]\,R$
or more briefly as
$A = QR$ (14)
where R is the second factor in the product. However, it is a property of the Gram–Schmidt process that for $j \geq 2$, the vector $q_j$ is orthogonal to $u_1, u_2, \dots, u_{j-1}$. Thus, all entries below the main diagonal of R are zero, and R has the form
$R = \begin{bmatrix} \langle u_1, q_1\rangle & \langle u_2, q_1\rangle & \cdots & \langle u_n, q_1\rangle \\ 0 & \langle u_2, q_2\rangle & \cdots & \langle u_n, q_2\rangle \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \langle u_n, q_n\rangle \end{bmatrix}$ (15)
We leave it for you to show that R is invertible by showing that its diagonal entries are nonzero. Thus, Equation 14 is a factorization of A into the product of a matrix Q with orthonormal column vectors and an invertible upper triangular matrix R. We call Equation 14 the QR-decomposition of A. In summary, we have the following theorem.

THEOREM 6.3.7 QR-Decomposition If A is an $m \times n$ matrix with linearly independent column vectors, then A can be factored as
$A = QR$
where Q is an $m \times n$ matrix with orthonormal column vectors, and R is an $n \times n$ invertible upper triangular matrix.

It is common in numerical linear algebra to say that a matrix with linearly independent columns has full column rank.

Recall from Theorem 5.1.6 (the Equivalence Theorem) that a square matrix has linearly independent column vectors if and only if it is invertible. Thus, it follows from the foregoing theorem that every invertible matrix has a QR-decomposition.

E X A M P L E 9 QR-Decomposition of a 3 × 3 Matrix Find the QR-decomposition of

Solution The column vectors of A are

Applying the Gram–Schmidt process with normalization to these column vectors yields the

orthonormal vectors (see Example 7)

Thus, it follows from Formula 15 that R is

Show that the matrix Q in Example 9 has the property $Q^TQ = I$, and show that every $m \times n$ matrix with orthonormal column vectors has this property. With Q and R computed as above, it follows that the QR-decomposition of A is $A = QR$.
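In practice a QR-decomposition is usually obtained from a library routine rather than by hand; the sketch below uses NumPy on an assumed example matrix. (A library Q and R may differ from a hand Gram–Schmidt computation by the signs of columns of Q and rows of R.)

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])            # assumed example with independent columns

Q, R = np.linalg.qr(A)                     # Q has orthonormal columns, R is upper triangular
print(np.allclose(Q @ R, A))               # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))     # True: columns of Q are orthonormal
print(np.allclose(R, np.triu(R)))          # True: R is upper triangular
```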

Concept Review • Orthogonal and orthonormal sets • Normalizing a vector • Orthogonal projections • Gram–Schmidt process • QR-decomposition

Skills


• Determine whether a set of vectors is orthogonal (or orthonormal). • Compute the coordinates of a vector with respect to an orthogonal (or orthonormal) basis. • Find the orthogonal projection of a vector onto a subspace. • Use the Gram–Schmidt process to construct an orthogonal (or orthonormal) basis for an inner product space. • Find the QR-decomposition of an invertible matrix.

Exercise Set 6.3 1. Which of the following sets of vectors are orthogonal with respect to the Euclidean inner product on

?

(a) (b)

,

(c)

,

(d) Answer: (a), (b), (d) 2. Which of the sets in Exercise 1 are orthonormal with respect to the Euclidean inner product on

?

3. Which of the following sets of vectors are orthogonal with respect to the Euclidean inner product on (a)

,

(b) (c) (d)

?

,

,

,

, ,

Answer: (b), (d) 4. Which of the sets in Exercise 3 are orthonormal with respect to the Euclidean inner product on 5. Which of the following sets of polynomials are orthonormal with respect to the inner product on Example 7 of Section 6.1 ? (a)

,

? discussed in

(b)

Answer: (a) 6. Which of the following sets of matrices are orthonormal with respect to the inner product on Example 6 of Section 6.1 ?

discussed in

(a)

(b) 7. Verify that the given vectors form an orthogonal set with respect to the Euclidean inner product; then convert it to an orthonormal set by normalizing the vectors. (a)

,

(b) (c)

, ,

, ,

Answer: (a) (b) (c)

8. Verify that the set of vectors is orthogonal with respect to the inner product on ; then convert it to an orthonormal set by normalizing the vectors. 9. Verify that the vectors

form an orthonormal basis for with the Euclidean inner product; then use Theorem 6.3.2b to express each of the following as linear combinations of , , and . (a) (b) (c) Answer:

(a) (b) (c) 10. Verify that the vectors

form an orthogonal basis for with the Euclidean inner product; then use Theorem 6.3.2a to express each of the following as linear combinations of , and . (a) (b) (c) 11. (a) Show that the vectors

form an orthogonal basis for

with the Euclidean inner product.

(b) Use Theorem 6.3.2a to express

as a linear combination of the vectors in part (a).

Answer: (b) In Exercises 12–13, an orthonormal basis with respect to the Euclidean inner product is given. Use Theorem 6.3.2b to find the coordinate vector of w with respect to that basis. 12. (a) (b)

;

,

,

13. (a) (b)

Answer: (a)

, ,

,

(b)

In Exercises 14–15, the given vectors are orthogonal with respect to the Euclidean inner product. Find where and W is the subspace of spanned by the vectors. 14. (a)

,

(b) 15. (a)

,

, ,

,

(b)

,

,

Answer: (a) (b) In Exercises 16–17, the given vectors are orthonormal with respect to the Euclidean inner product. Use Theorem 6.3.4b to find , where and W is the subspace of spanned by the vectors. 16. (a)

,

(b) 17. (a) (b)

, ,

, ,

Answer: (a) (b) 18. In Example 6 of Section 4.9 we found the orthogonal projection of the vector onto the line through the origin making an angle of radians with the x-axis. Solve that same problem using Theorem 6.3.4. 19. Find the vectors (a) Exercise 14(a). (b) Exercise 15(a). Answer:

in W and

in

such that

, where x and W are as given in

(a) (b) 20. Find the vectors

in W and

in

such that

, where x and W are as given in

(a) Exercise 16(a). (b) Exercise 17(a). 21. Let have the Euclidean inner product. Use the Gram–Schmidt process to transform the basis an orthonormal basis. Draw both sets of basis vectors in the xy-plane. (a) (b) Answer: (a)

(b)

22. Let have the Euclidean inner product. Use theGram–Schmidt process to transform the basis into an orthonormal basis. (a)

,

,

(b)

,

,

23. Let

have the Euclidean inner product. Use the Gram–Schmidt process to transform the basis into an orthonormal basis.

into

Answer:

24. Let

have the Euclidean inner product. Find an orthonormal basis for the subspace spanned by , .

25. Let

have the inner product

Use the Gram–Schmidt process to transform basis.

,

,

,

into an orthonormal

Answer:

26. Let R3 have the Euclidean inner product. The subspace of

spanned by the vectors

is aplane passing through the origin. Express lies in the plane and is perpendicular to the plane. 27. Repeat Exercise 26 with

and

in the form

and , where

.

Answer:

28. Let have the Euclidean inner product. Express the vector where is in the space W spanned by and 29. Find the (a) (b)

(c)

(d)

-decomposition of the matrix, where possible.

in the form , , and is orthogonal to W.

(e)

(f)

Answer: (a)

(b)

(c)

(d)

(e)

(f) Columns not linearly independent

30. In Step 3 of the proof of Theorem 6.3.5, it was stated that “the linear independence of that .” Prove this statement.

ensures

31. Prove that the diagonal entries of R in Formula 15 are nonzero. 32. Calculus required Use Theorem 6.3.2a to express the following polynomials as linear combinations of the first three Legendre polynomials (see the Remark following Example 8). (a) (b) (c) 33. Calculus required Let

have the inner product

Apply the Gram–Schmidt process to transform the standard basis

into an orthonormal basis.

Answer:

34. Find vectors x and y in that are orthonormal with respect to the inner product are not orthonormal with respect to the Euclidean inner product.

True-False Exercises In parts (a)–(f) determine whether the statement is true or false, and justify your answer. (a) Every linearly independent set of vectors in an inner product space is orthogonal. Answer: False (b) Every orthogonal set of vectors in an inner product space is linearly independent. Answer: False (c) Every nontrivial subspace of

has an orthonormal basis with respect to the Euclidean inner product.

Answer: True (d) Every nonzero finite-dimensional inner product space has an orthonormal basis. Answer: True (e)

is orthogonal to every vector of W. Answer:

but

False (f) If A is an

matrix with a nonzero determinant, then A has a QR-decomposition.

Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

6.4 Best Approximation; Least Squares In this section we will be concerned with linear systems that cannot be solved exactly and for which an approximate solution is needed. Such systems commonly occur in applications where measurement errors “perturb” the coefficients of a consistent system sufficiently to produce inconsistency.

Least Squares Solutions of Linear Systems Suppose that $Ax = b$ is an inconsistent linear system of m equations in n unknowns in which we suspect the inconsistency to be caused by measurement errors in the coefficients of A. Since no exact solution is possible, we will look for a vector x that comes as "close as possible" to being a solution in the sense that it minimizes $\|b - Ax\|$ with respect to the Euclidean inner product on $R^m$. You can think of $Ax$ as an approximation to b and $\|b - Ax\|$ as the error in that approximation—the smaller the error, the better the approximation. This leads to the following problem.

Least Squares Problem Given a linear system $Ax = b$ of m equations in n unknowns, find a vector x that minimizes $\|b - Ax\|$ with respect to the Euclidean inner product on $R^m$. We call such an x a least squares solution of the system, we call $b - Ax$ the least squares error vector, and we call $\|b - Ax\|$ the least squares error.

To clarify the above terminology, suppose that the matrix form of $b - Ax$ is the column vector with entries $e_1, e_2, \dots, e_m$. The term "least squares solution" results from the fact that minimizing $\|b - Ax\|$ also minimizes
$\|b - Ax\|^2 = e_1^2 + e_2^2 + \cdots + e_m^2$
which is a sum of squares.

Best Approximation Suppose that b is a fixed vector in an inner product space V that we would like to approximate by a vector w that is required to lie in some subspace W of V. Unless b happens to be in W, then any such approximation will result in an "error vector" $b - w$ that cannot be made equal to 0 no matter how w is chosen (Figure 6.4.1a). However, by choosing $w = \mathrm{proj}_W\,b$ we can make the length of the error vector $\|b - w\|$ as small as possible (Figure 6.4.1b).

Figure 6.4.1

These geometric ideas suggest the following general theorem.

THEOREM 6.4.1 Best Approximation Theorem If W is a finite-dimensional subspace of an inner product space V, and if b is a vector in V, then $\mathrm{proj}_W\,b$ is the best approximation to b from W in the sense that
$\|b - \mathrm{proj}_W\,b\| < \|b - w\|$
for every vector w in W that is different from $\mathrm{proj}_W\,b$.

Proof For every vector w in W, we can write
$b - w = (b - \mathrm{proj}_W\,b) + (\mathrm{proj}_W\,b - w)$ (1)
But $\mathrm{proj}_W\,b - w$, being a difference of vectors in W, is itself in W; and since $b - \mathrm{proj}_W\,b$ is orthogonal to W, the two terms on the right side of 1 are orthogonal. Thus, it follows from the Theorem of Pythagoras (Theorem 6.2.3) that
$\|b - w\|^2 = \|b - \mathrm{proj}_W\,b\|^2 + \|\mathrm{proj}_W\,b - w\|^2$
Since $w \neq \mathrm{proj}_W\,b$, it follows that the second term in this sum is positive, and hence that
$\|b - \mathrm{proj}_W\,b\|^2 < \|b - w\|^2$
Since norms are nonnegative, it follows (from a property of inequalities) that
$\|b - \mathrm{proj}_W\,b\| < \|b - w\|$

Least Squares Solutions of Linear Systems One way to find a least squares solution of $Ax = b$ is to calculate the orthogonal projection $\mathrm{proj}_W\,b$ on the column space W of the matrix A and then solve the equation
$Ax = \mathrm{proj}_W\,b$ (2)
However, we can avoid the need to calculate the projection by rewriting 2 as
$b - Ax = b - \mathrm{proj}_W\,b$
and then multiplying both sides of this equation by $A^T$ to obtain
$A^T(b - Ax) = A^T(b - \mathrm{proj}_W\,b)$ (3)
Since $b - \mathrm{proj}_W\,b$ is the component of b that is orthogonal to the column space of A, it follows from Theorem 4.8.9b that this vector lies in the null space of $A^T$, and hence that
$A^T(b - \mathrm{proj}_W\,b) = 0$
Thus, 3 simplifies to
$A^T(b - Ax) = 0$
which we can rewrite as
$A^TAx = A^Tb$ (4)
This is called the normal equation or the normal system associated with $Ax = b$. When viewed as a linear system, the individual equations are called the normal equations associated with $Ax = b$.
In summary, we have established the following result.

THEOREM 6.4.2 For every linear system $Ax = b$, the associated normal system
$A^TAx = A^Tb$ (5)
is consistent, and all solutions of 5 are least squares solutions of $Ax = b$. Moreover, if W is the column space of A, and x is any least squares solution of $Ax = b$, then the orthogonal projection of b on W is
$\mathrm{proj}_W\,b = Ax$ (6)

If a linear system is consistent, then its exact solutions are the same as its least squares solutions, in which case the error is zero.
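The following sketch solves the normal system of Formula 5 for a small hypothetical inconsistent system (the data are invented for illustration) and forms the least squares error vector.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [1.0, 3.0]])                 # hypothetical coefficient matrix
b = np.array([1.0, 0.0, 2.0])              # hypothetical right-hand side

x = np.linalg.solve(A.T @ A, A.T @ b)      # least squares solution of A^T A x = A^T b
error_vec = b - A @ x
print(x, np.linalg.norm(error_vec))        # solution and least squares error
print(np.allclose(A.T @ error_vec, 0))     # error vector lies in the null space of A^T
```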

E X A M P L E 1 Least Squares Solution (a) Find all least squares solutions of the linear system

(b) Find the error vector and the error. Solution (a) It will be convenient to express the system in the matrix form

It follows that

so the normal system

is

Solving this system yields a unique least squares solution, namely,

, where

(b) The error vector is

and the error is

E X A M P L E 2 Orthogonal Projection on a Subspace Find the orthogonal projection of the vector

on the subspace of

spanned by the vectors

Solution We could solve this problem by first using the Gram–Schmidt process to convert into an orthonormal basis and then applying the method used in Example 6 of Section 6.3 . However, the following method is more efficient. The subspace W of

spanned by

,

, and

is the column space of the matrix

Thus, if u is expressed as a column vector, we can find the orthogonal projection of u on W by finding a least squares solution of the system and then calculating from the least squares solution. The computations are as follows: The system is

so

The normal system

Solving this system yields

in this case is

as the least squares solution of

(verify), so

or, in comma-delimited notation,

.

Uniqueness of Least Squares Solutions In general, least squares solutions of linear systems are not unique. Although the linear system in Example 1 turned out to have a unique least squares solution, that occurred only because the coefficient matrix of the system happened to satisfy certain conditions that guarantee uniqueness. Our next theorem will show what those conditions are.

THEOREM 6.4.3 If A is an $m \times n$ matrix, then the following are equivalent.
(a) A has linearly independent column vectors.
(b) $A^TA$ is invertible.

Proof We will prove that (a) implies (b) and leave the proof that (b) implies (a) as an exercise.
Assume that A has linearly independent column vectors. The matrix $A^TA$ has size $n \times n$, so we can prove that this matrix is invertible by showing that the linear system $A^TAx = 0$ has only the trivial solution. But if x is any solution of this system, then $Ax$ is in the null space of $A^T$ and also in the column space of A. By Theorem 4.8.9b these spaces are orthogonal complements, so part (b) of Theorem 6.2.4 implies that $Ax = 0$. But A is assumed to have linearly independent column vectors, so $x = 0$ by Theorem 1.3.1.

As an exercise, try using Formula 7 to solve the problem in part (a) of Example 1.

The next theorem, which follows directly from Theorem 6.4.2 and Theorem 6.4.3, gives an explicit formula for the least squares solution of a linear system in which the coefficient matrix has linearly independent column vectors.

THEOREM 6.4.4 If A is an $m \times n$ matrix with linearly independent column vectors, then for every $m \times 1$ matrix b, the linear system $Ax = b$ has a unique least squares solution. This solution is given by
$x = (A^TA)^{-1}A^Tb$ (7)
Moreover, if W is the column space of A, then the orthogonal projection of b on W is
$\mathrm{proj}_W\,b = Ax = A(A^TA)^{-1}A^Tb$ (8)

OPTIONAL

The Role of QR-Decomposition in Least Squares Problems Formulas 7 and 8 have theoretical use but are not well suited for numerical computation. In practice, least squares solutions of are typically found by using some variation of Gaussian elimination to solve the normal equations or by using QR-decomposition and the following theorem.

THEOREM 6.4.5 If A is an $m \times n$ matrix with linearly independent column vectors, and if A = QR is a QR-decomposition of A (see Theorem 6.3.7), then for each b in $R^m$ the system $Ax = b$ has a unique least squares solution given by
$x = R^{-1}Q^Tb$ (9)

A proof of this theorem and a discussion of its use can be found in many books on numerical methods of linear algebra. However, you can obtain Formula 9 by making the substitution $A = QR$ in 7 and using the fact that $Q^TQ = I$ to obtain
$x = ((QR)^TQR)^{-1}(QR)^Tb = (R^TQ^TQR)^{-1}R^TQ^Tb = (R^TR)^{-1}R^TQ^Tb = R^{-1}(R^T)^{-1}R^TQ^Tb = R^{-1}Q^Tb$
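A brief sketch of Formula 9 using the same kind of hypothetical system as before; solving the triangular system is preferred in practice to inverting R explicitly.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [1.0, 3.0]])                     # hypothetical data
b = np.array([1.0, 0.0, 2.0])

Q, R = np.linalg.qr(A)                         # reduced QR: Q is 3x2, R is 2x2
x_qr = np.linalg.solve(R, Q.T @ b)             # same as R^{-1} Q^T b (Formula 9)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)   # normal-equation solution (Formula 5)
print(np.allclose(x_qr, x_normal))             # True: both give the least squares solution
```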

Orthogonal Projections on Subspaces of Rm In Section 4.8 we showed how to compute orthogonal projections on the coordinate axes of a rectangular coordinate system in $R^2$ and $R^3$ and more generally on lines through the origin of $R^n$. We will now consider the problem of finding orthogonal projections on subspaces of $R^m$. We begin with the following definition.

DEFINITION 1 If W is a subspace of $R^m$, then the linear transformation $P\colon R^m \to R^m$ that maps each vector x in $R^m$ into its orthogonal projection $\mathrm{proj}_W\,x$ in W is called the orthogonal projection of $R^m$ on W.

It follows from Formula 7 that the standard matrix for the transformation P is
$[P] = A(A^TA)^{-1}A^T$ (10)
where A is constructed using any basis for W as its column vectors.
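Before the worked example, here is a small numerical sketch of Formula 10; the basis chosen for W is hypothetical. The checks confirm two properties any orthogonal projection matrix should have.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                 # columns form an assumed basis for a subspace W of R^3

P = A @ np.linalg.inv(A.T @ A) @ A.T       # standard matrix of the projection, Formula 10
print(np.allclose(P @ P, P))               # idempotent: projecting twice changes nothing
print(np.allclose(P, P.T))                 # symmetric
print(np.allclose(P @ A, A))               # P fixes every vector already in W
```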

E X A M P L E 3 The Standard Matrix for an Orthogonal Projection on a Line
We showed in Formula 16 of Section 4.9 that
$[P] = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}$
is the standard matrix for the orthogonal projection on the line W through the origin of $R^2$ that makes an angle θ with the positive x-axis. Derive this result using Formula 10.

Solution The column vectors of A can be formed from any basis for W. Since W is one-dimensional, we can take $w = (\cos\theta, \sin\theta)$ as the basis vector (Figure 6.4.2), so
$A = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$
We leave it for you to show that $A^TA = [\cos^2\theta + \sin^2\theta] = [1]$ is the $1 \times 1$ identity matrix. Thus, Formula 10 simplifies to
$[P] = AA^T = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}$

Figure 6.4.2

Another View of Least Squares Recall from Theorem 4.8.9 that the null space and row space of an m × n matrix A are orthogonal complements, as are the null space of A^T and the column space of A. Thus, given a linear system Ax = b in which A is an m × n matrix, the Projection Theorem (6.3.3) tells us that the vectors x and b can each be decomposed into sums of orthogonal terms as

x = x_row + x_null and b = b_col + b_null

where x_row and x_null are the orthogonal projections of x on the row space of A and the null space of A, and the vectors b_null and b_col are the orthogonal projections of b on the null space of A^T and the column space of A.

In Figure 6.4.3 we have represented the fundamental spaces of A by perpendicular lines in R^n and R^m on which we indicated the orthogonal projections of x and b. (This, of course, is only pictorial since the fundamental spaces need not be one-dimensional.) The figure shows Ax̂ as a point in the column space of A and conveys that Ax̂ = proj_col(A) b is the point in col(A) that is closest to b. This illustrates that the least squares solutions of Ax = b are the exact solutions of the equation

Ax = proj_col(A) b

Figure 6.4.3

More on the Equivalence Theorem As our final result in the main part of this section we will add one additional part to Theorem 5.1.6.

THEOREM 6.4.6 Equivalent Statements If A is an n × n matrix, then the following statements are equivalent.

(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row echelon form of A is I_n.
(d) A is expressible as a product of elementary matrices.
(e) Ax = b is consistent for every n × 1 matrix b.
(f) Ax = b has exactly one solution for every n × 1 matrix b.
(g) det(A) ≠ 0.
(h) The column vectors of A are linearly independent.
(i) The row vectors of A are linearly independent.
(j) The column vectors of A span R^n.
(k) The row vectors of A span R^n.
(l) The column vectors of A form a basis for R^n.
(m) The row vectors of A form a basis for R^n.
(n) A has rank n.
(o) A has nullity 0.
(p) The orthogonal complement of the null space of A is R^n.
(q) The orthogonal complement of the row space of A is {0}.
(r) The range of T_A is R^n.
(s) T_A is one-to-one.
(t) λ = 0 is not an eigenvalue of A.
(u) A^T A is invertible.

The proof of part (u) follows from part (h) of this theorem and Theorem 6.4.3 applied to square matrices.

OPTIONAL

We now have all the ingredients needed to prove Theorem 6.3.3 in the special case where V is the vector space R^m.

Proof of Theorem 6.3.3 We will leave the case where W = {0} as an exercise, so assume that W ≠ {0}. Let {v_1, v_2, ..., v_k} be any basis for W, and form the matrix M that has these basis vectors as successive columns. This makes W the column space of M and hence W^⊥ the null space of M^T. We will complete the proof by showing that every vector u in R^m can be written in exactly one way as

u = w_1 + w_2

where w_1 is in the column space of M and w_2 is in its orthogonal complement. However, to say that w_1 is in the column space of M is equivalent to saying that w_1 = Mx for some vector x in R^k, and to say that w_2 = u − Mx is in the null space of M^T is equivalent to saying that M^T(u − Mx) = 0. Thus, if we can show that the equation

M^T(u − Mx) = 0 (11)

has a unique solution for x, then w_1 = Mx and w_2 = u − Mx will be uniquely determined vectors with the required properties. To do this, let us rewrite 11 as

M^T Mx = M^T u

Since the matrix M has linearly independent column vectors, the matrix M^T M is invertible by Theorem 6.4.3, and hence the matrix equation has a unique solution x = (M^T M)^{-1} M^T u, as required to complete the proof.

Concept Review • Least squares problem • Least squares solution • Least squares error vector • Least squares error • Best approximation • Normal equation • Orthogonal projection

Skills • Find the least squares solution of a linear system. • Find the error and error vector associated with a least squares solution to a linear system. • Use the techniques developed in this section to compute orthogonal projections. • Find the standard matrix of an orthogonal projection.

Exercise Set 6.4 1. Find the normal system associated with the given linear system. (a)

(b)

Answer: (a) (b)

In Exercises 2–4, find the least squares solution of the linear equation Ax = b.

2. (a) ; (b) ; 3. (a)

(b)

Answer: (a) (b) 4. (a)

(b)

In Exercises 5–6, find the least squares error vector b − Ax̂ resulting from the least squares solution x̂, and verify that it is orthogonal to the column space of A.

5. (a) A and b are as in Exercise 3(a). (b) A and b are as in Exercise 3(b).

Answer:

(a)

(b)

6. (a) A and b are as in Exercise 4(a). (b) A and b are as in Exercise 4(b).

7. Find all least squares solutions of Ax = b, and confirm that all of the solutions have the same error vector. Compute the least squares error.

(a) ; (b) ; (c) ;

Answer: (a) Solution:

; least squares error:

(b) Solution:

(t a real number); least squares error:

(c) Solution:

(t a real number); least squares error:

8. Find the orthogonal projection of u on the subspace of

spanned by the vectors

and

spanned by the vectors

,

.

(a) (b) 9. Find the orthogonal projection of u on the subspace of (a) (b)

;

, ;

, and

.

, ,

,

Answer: (a) (7, 2, 9, 5) (b)

10. Find the orthogonal projection of u on the solution space of the given homogeneous linear system.

11. In each part, find A^T A, and apply Theorem 6.4.3 to determine whether A has linearly independent column vectors.

(a)

(b)

Answer: (a)

A does not have linearly independent column vectors.

(b)

A does not have linearly independent column vectors.

12. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection of R^2 onto
(a) the x-axis.
(b) the y-axis.
[Note: Compare your results to Table 3 of Section 4.9.]
13. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection of R^3 onto
(a) the xz-plane.
(b) the yz-plane.
[Note: Compare your results to Table 4 of Section 4.9.]
Answer:
(a)

(b)

14. Show that if is

is a nonzero vector, then the standard matrix for the orthogonal projection of

15. Let W be the plane with equation

on the line

.

(a) Find a basis for W.
(b) Use Formula 10 to find the standard matrix for the orthogonal projection on W.
(c) Use the matrix obtained in part (b) to find the orthogonal projection of a point on W.
(d) Find the distance between the point and the plane W, and check your result using Theorem 3.3.4.
Answer:
(a)
(b)
(c)
(d)
16. Let W be the line with parametric equations
(a) Find a basis for W.
(b) Use Formula 10 to find the standard matrix for the orthogonal projection on W.
(c) Use the matrix obtained in part (b) to find the orthogonal projection of a point on W.
(d) Find the distance between the point and the line W.

17. In R^3, consider the line l given by the equations

and the line m given by the equations

Let P be a point on l, and let Q be a point on m. Find the values of t and s that minimize the distance between the lines by minimizing the squared distance ||P − Q||^2.
Answer:
18. Prove: If A has linearly independent column vectors, and if Ax = b is consistent, then the least squares solution of Ax = b and the exact solution of Ax = b are the same.
19. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A, then the least squares solution of Ax = b is x̂ = 0.
20. Let P: R^m → R^m be the orthogonal projection of R^m onto a subspace W.
(a) Prove that [P]^2 = [P].
(b) What does the result in part (a) imply about the composition P ∘ P?
(c) Show that [P] is symmetric.
21. Let A be an m × n matrix with linearly independent row vectors. Find a standard matrix for the orthogonal projection of R^n onto the row space of A. [Hint: Start with Formula 10.]
Answer:
22. Prove the implication (b) ⇒ (a) of Theorem 6.4.3.

True-False Exercises In parts (a)–(h) determine whether the statement is true or false, and justify your answer.
(a) If A is an m × n matrix, then A^T A is a square matrix.
Answer: True
(b) If A^T A is invertible, then A is invertible.
Answer: False
(c) If A is invertible, then A^T A is invertible.
Answer: True
(d) If Ax = b is a consistent linear system, then A^T Ax = A^T b is also consistent.
Answer: True
(e) If Ax = b is an inconsistent linear system, then A^T Ax = A^T b is also inconsistent.
Answer: False
(f) Every linear system has a least squares solution.
Answer: True
(g) Every linear system has a unique least squares solution.
Answer: False
(h) If A is an m × n matrix with linearly independent columns and b is in R^m, then Ax = b has a unique least squares solution.
Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

6.5 Least Squares Fitting to Data In this section we will use results about orthogonal projections in inner product spaces to obtain a technique for fitting a line or other polynomial curve to a set of experimentally determined points in the plane.

Fitting a Curve to Data A common problem in experimental work is to obtain a mathematical relationship between two variables x and y by "fitting" a curve to points in the plane corresponding to various experimentally determined values of x and y, say

(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)

On the basis of theoretical considerations or simply by observing the pattern of the points, the experimenter decides on the general form of the curve to be fitted. Some possibilities are (Figure 6.5.1)
(a) A straight line: y = a + bx
(b) A quadratic polynomial: y = a + bx + cx^2
(c) A cubic polynomial: y = a + bx + cx^2 + dx^3
Because the points are obtained experimentally, there is often some measurement "error" in the data, making it impossible to find a curve of the desired form that passes through all the points. Thus, the idea is to choose the curve (by determining its coefficients) that "best" fits the data. We begin with the simplest and most common case: fitting a straight line to data points.

Figure 6.5.1

Least Squares Fit of a Straight Line Suppose we want to fit a straight line y = a + bx to the experimentally determined points

(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)

If the data points were collinear, the line would pass through all n points, and the unknown coefficients a and b would satisfy the equations

a + bx_1 = y_1
a + bx_2 = y_2
⋮
a + bx_n = y_n

We can write this system in matrix form as

[ 1 x_1 ; 1 x_2 ; ⋮ ; 1 x_n ] [ a ; b ] = [ y_1 ; y_2 ; ⋮ ; y_n ]

or more compactly as

Mv = y (1)

where

M = [ 1 x_1 ; 1 x_2 ; ⋮ ; 1 x_n ], v = [ a ; b ], y = [ y_1 ; y_2 ; ⋮ ; y_n ] (2)

If the data points are not collinear, then it is impossible to find coefficients a and b that satisfy system 1 exactly; that is, the system is inconsistent. In this case we will look for a least squares solution

v* = [ a* ; b* ]

We call a line y = a* + b*x whose coefficients come from a least squares solution a regression line or a least squares straight line fit to the data. To explain this terminology, recall that a least squares solution of 1 minimizes

||y − Mv|| (3)

If we express the square of 3 in terms of components, we obtain

||y − Mv||^2 = (y_1 − a − bx_1)^2 + (y_2 − a − bx_2)^2 + ⋯ + (y_n − a − bx_n)^2 (4)

If we now let

d_i = |y_i − a − bx_i|

then 4 can be written as

||y − Mv||^2 = d_1^2 + d_2^2 + ⋯ + d_n^2 (5)

As illustrated in Figure 6.5.2, the number d_i can be interpreted as the vertical distance between the line y = a + bx and the data point (x_i, y_i). This distance is a measure of the "error" at the point x_i resulting from the inexact fit of y = a + bx to the data points, the assumption being that the x_i are known exactly and that all the error is in the measurement of the y_i. Since 3 and 5 are minimized by the same vector v*, the least squares straight line fit minimizes the sum of the squares of the estimated errors d_i, hence the name least squares straight line fit.

Figure 6.5.2

measures the vertical error in the least squares straight line.

Normal Equations Recall from Theorem 6.4.2 that the least squares solutions of 1 can be obtained by solving the associated normal system

M^T Mv = M^T y

the equations of which are called the normal equations. In the exercises it will be shown that the column vectors of M are linearly independent if and only if the n data points do not all lie on a vertical line in the xy-plane. In this case it follows from Theorem 6.4.4 that the least squares solution is unique and is given by

v* = (M^T M)^{-1} M^T y

In summary, we have the following theorem.

THEOREM 6.5.1 Uniqueness of the Least Squares Solution Let (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) be a set of two or more data points, not all lying on a vertical line, and let

M = [ 1 x_1 ; 1 x_2 ; ⋮ ; 1 x_n ], y = [ y_1 ; y_2 ; ⋮ ; y_n ]

Then there is a unique least squares straight line fit

y = a* + b*x

to the data points. Moreover,

v* = [ a* ; b* ]

is given by the formula

v* = (M^T M)^{-1} M^T y (6)

which expresses the fact that v* is the unique solution of the normal equations

M^T Mv = M^T y (7)
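The formula in Theorem 6.5.1 is easy to evaluate numerically. The sketch below applies Formula 6 to a small illustrative data set; the numbers are made up and are not the data of Example 1.

```python
import numpy as np

# Illustrative data points (x_i, y_i); not the data used in Example 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.9, 5.1, 7.0])

# Build M and solve the normal equations M^T M v = M^T y (Formulas 6 and 7).
M = np.column_stack([np.ones_like(x), x])
a_star, b_star = np.linalg.solve(M.T @ M, M.T @ y)

print(f"least squares line: y = {a_star:.3f} + {b_star:.3f} x")
```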

E X A M P L E 1 Least Squares Straight Line Fit Find the least squares straight line fit to the four given data points. (See Figure 6.5.3.)

Figure 6.5.3

Solution We have

so the desired line is

E X A M P L E 2 Spring Constant Hooke's law in physics states that the length x of a uniform spring is a linear function of the force y applied to it. If we express this relationship as y = a + bx, then the coefficient b is called the spring constant. Suppose a particular unstretched spring has a measured length of 6.1 inches (i.e., x = 6.1 when y = 0). Forces of 2 pounds, 4 pounds, and 6 pounds are then applied to the spring, and the corresponding lengths are found to be 7.6 inches, 8.7 inches, and 10.4 inches (see Figure 6.5.4). Find the spring constant.

Figure 6.5.4 Solution We have

and

where the numerical values have been rounded to one decimal place. Thus, the estimated value of the spring constant is pounds/inch.

Historical Note On October 5, 1991 the Magellan spacecraft entered the atmosphere of Venus and transmitted the temperature T in kelvins (K) versus the altitude h in kilometers (km) until its signal was lost at an altitude of about 34 km. Discounting the initial erratic signal, the data strongly suggested a linear relationship, so a least squares straight line fit was used on the linear part of the data to obtain the equation

By setting h = 0 in this equation, the surface temperature of Venus was estimated at

K.

Least Squares Fit of a Polynomial The technique described for fitting a straight line to data points can be generalized to fitting a polynomial of specified degree to data points. Let us attempt to fit a polynomial of fixed degree m

y = a_0 + a_1 x + ⋯ + a_m x^m (8)

to n points

(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)

Substituting these n values of x and y into 8 yields the n equations

or, in matrix form,

Mv = y (9)

where

M = [ 1 x_1 x_1^2 ⋯ x_1^m ; 1 x_2 x_2^2 ⋯ x_2^m ; ⋮ ; 1 x_n x_n^2 ⋯ x_n^m ], v = [ a_0 ; a_1 ; ⋮ ; a_m ], y = [ y_1 ; y_2 ; ⋮ ; y_n ] (10)

As before, the solutions of the normal equations

M^T Mv = M^T y

determine the coefficients of the polynomial, and the vector v minimizes ||y − Mv||. Conditions that guarantee the invertibility of M^T M are discussed in the exercises (Exercise 7). If M^T M is invertible, then the normal equations have a unique solution v*, which is given by

v* = (M^T M)^{-1} M^T y (11)

E X A M P L E 3 Fitting a Quadratic Curve to Data According to Newton's second law of motion, a body near the Earth's surface falls vertically downward according to the equation

s = s_0 + v_0 t + (1/2)gt^2 (12)

where
s = vertical displacement downward relative to some fixed point
s_0 = initial displacement at time t = 0
v_0 = initial velocity at time t = 0
g = acceleration of gravity at the Earth's surface

Suppose that a laboratory experiment is performed to evaluate g from Equation 12 by releasing a weight with unknown initial displacement and velocity and measuring the distance it has fallen at certain times relative to a fixed reference point. Suppose it is found that at times , and .5 seconds the weight has fallen , and 3.73 feet, respectively, from the reference point. Find an approximate value of g using these data.

Solution The mathematical problem is to fit a quadratic curve

s = a_0 + a_1 t + a_2 t^2 (13)

to the five data points: With the appropriate adjustments in notation, the matrices M and y in 10 are

Thus, from 11,

From 12 and 13, we have g = 2a_2, so the estimated value of g is

If desired, we can also estimate the initial displacement and initial velocity of the weight:

In Figure 6.5.5 we have plotted the five data points and the approximating polynomial.

Figure 6.5.5
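The same normal-equation approach used for straight lines carries over to this quadratic fit. The sketch below repeats the procedure of Example 3 on illustrative measurements; the times and distances are made up and are not the data quoted in the example.

```python
import numpy as np

# Illustrative time/displacement measurements in seconds and feet (not Example 3's data).
t = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
s = np.array([0.55, 1.15, 2.03, 3.28, 4.79])

# Fit s = a0 + a1*t + a2*t^2 by solving the normal equations (Formulas 9 through 11).
M = np.column_stack([np.ones_like(t), t, t**2])
a0, a1, a2 = np.linalg.solve(M.T @ M, M.T @ s)

# Comparing 12 and 13, the coefficient of t^2 is g/2, so g is estimated as 2*a2.
print("estimated g =", 2 * a2)
```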

Concept Review • Least squares straight line fit • Regression line • Least squares polynomial fit

Skills

• Find the least squares straight line fit to a set of data points. • Find the least squares polynomial fit to a set of data points. • Use the techniques of this section to solve applied problems.

Exercise Set 6.5 1. Find the least squares straight line fit to the three points

,

, and

.

Answer:

2. Find the least squares straight line fit to the four points 3. Find the quadratic polynomial that best fits the four points

,

, ,

, and

.

,

, and

.

Answer:

4. Find the cubic polynomial that best fits the five points .

,

,

,

, and

5. Show that the matrix M in Equation 2 has linearly independent columns if and only if at least two of the numbers are distinct. 6. Show that the columns of the at least of the numbers most m distinct roots.]

matrix M in Equation 10 are linearly independent if and are distinct. [Hint: A nonzero polynomial of degreem has at

7. Let M be the matrix in Equation 10. Using Exercise 6, show that a sufficient condition for the matrix to be invertible is that and that at least of the numbers are distinct. 8. The owner of a rapidly expanding business finds that for the first five months of the year the sales (in thousands) are , and $8.0. The owner plots these figures on a graph and conjectures that for the rest of the year, the sales curve can be approximated by a quadratic polynomial. Find the least squares quadratic polynomial fit to the sales curve, and use it to project the sales for the twelfth month of the year. 9. A corporation obtains the following data relating the number of sales representatives on its staff to annual sales:

Explain how you might use least squares methods to estimate the annual sales with 45 representatives, and discuss the assumptions that you are making. (You need not perform the actual computations.)

10. Pathfinder is an experimental, lightweight,remotely piloted,solar-powered aircraft that was used in aseries of experiments by NASA to determine the feasibilityof applyingsolar power for long-duration,highaltitude flight. In August 1997 Pathfinder recordedthe data in the accompanying table relating altitude H and temperature T. Show that a linear model is reasonable by plotting the data, and then find theleast squares line of best fit. Table Ex-10

11. Find a curve of the form that best fits the data points , , by making the substitution . Draw the curve and plot the data points in the same coordinate system. Answer:

True-False Exercises In parts (a)–(d) determine whether the statement is true or false, and justify your answer. (a) Every set of data points has a unique least squares straight line fit. Answer: False (b) If the data points

are not collinear, then 1 is an inconsistent system.

Answer: True (c) If

Answer:

is the least squares line fit to the data points is minimal for every .

, then

False (d) If

is the least squares line fit to the data points is minimal.

Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

, then

6.6 Function Approximation; Fourier Series In this section we will show how orthogonal projections can be used to approximate certain types of functions by simpler functions that are easier to work with. The ideas explained here have important applications in engineering and science. Calculus is required.

Best Approximations All of the problems that we will study in this section will be special cases of the following general problem.

APPROXIMATION PROBLEM Given a function f that is continuous on an interval [a, b], find the "best possible approximation" to f using only functions from a specified subspace W of C[a, b].

Here are some examples of such problems: (a) Find the best possible approximation to

over

(b) Find the best possible approximation to .

by a polynomial of the form

over

.

by a function of the form

(c) Find the best possible approximation to x over

by a function of the form .

In the first example W is the subspace of subspace of spanned by , spanned by

,

,

spanned by , and ; in the second example W is the , and ; and in the third example W is the subspace of , and

.

Measurements of Error To solve approximation problems of the preceding types, we first need to make the phrase "best approximation over [a, b]" mathematically precise. To do this we will need some way of quantifying the error that results when one continuous function is approximated by another over an interval [a, b]. If we were to approximate f(x) by g(x), and if we were concerned only with the error in that approximation at a single point x_0, then it would be natural to define the error to be

|f(x_0) − g(x_0)|

sometimes called the deviation between f and g at x_0 (Figure 6.6.1). However, we are not concerned simply with measuring the error at a single point but rather with measuring it over the entire interval [a, b]. The problem is that an approximation may have small deviations in one part of the interval and large deviations in another. One possible way of accounting for this is to integrate the deviation |f(x) − g(x)| over the interval and define the error over the interval to be

error = ∫_a^b |f(x) − g(x)| dx (1)

Geometrically, 1 is the area between the graphs of y = f(x) and y = g(x) over the interval [a, b] (Figure 6.6.2); the greater the area, the greater the overall error.

Figure 6.6.1 The deviation between f and g x0

Figure 6.6.2 The area between the graphs of f and g over [a, b] measures the error in approximating f by g over [a, b] Although 1 is natural and appealing geometrically, most mathematicians and scientists generally favor the following alternative measure of error, called the mean square error:

Mean square error emphasizes the effect of larger errors because of the squaring and has the added advantage that it allows us to bring to bear the theory of inner product spaces. To see how, suppose that f is a continuous function on [a, b] that we want to approximate by a function g from a subspace W of C[a, b], and suppose that C[a, b] is given the inner product

⟨f, g⟩ = ∫_a^b f(x)g(x) dx

It follows that

||f − g||^2 = ⟨f − g, f − g⟩ = ∫_a^b [f(x) − g(x)]^2 dx = mean square error

so minimizing the mean square error is the same as minimizing ||f − g||^2. Thus the approximation problem

posed informally at the beginning of this section can be restated more precisely as follows.

Least Squares Approximation

LEAST SQUARES APPROXIMATION PROBLEM Let f be a function that is continuous on an interval [a, b], let C[a, b] have the inner product

⟨f, g⟩ = ∫_a^b f(x)g(x) dx

and let W be a finite-dimensional subspace of C[a, b]. Find a function g in W that minimizes

||f − g||^2 = ∫_a^b [f(x) − g(x)]^2 dx

Since ||f − g||^2 and ||f − g|| are minimized by the same function g, this problem is equivalent to looking for a function g in W that is closest to f. But we know from Theorem 6.4.1 that g = proj_W f is such a function (Figure 6.6.3).

Figure 6.6.3

Thus, we have the following result.

THEOREM 6.6.1 If f is a continuous function on [a, b], and W is a finite-dimensional subspace of C[a, b], then the function g in W that minimizes the mean square error

∫_a^b [f(x) − g(x)]^2 dx

is g = proj_W f, where the orthogonal projection is relative to the inner product

⟨f, g⟩ = ∫_a^b f(x)g(x) dx

The function g = proj_W f is called the least squares approximation to f from W.

Fourier Series A function of the form

T(x) = c_0 + c_1 cos x + c_2 cos 2x + ⋯ + c_n cos nx + d_1 sin x + d_2 sin 2x + ⋯ + d_n sin nx (2)

is called a trigonometric polynomial; if c_n and d_n are not both zero, then T(x) is said to have order n. For example,

is a trigonometric polynomial of order 4 with

It is evident from 2 that the trigonometric polynomials of order n or less are the various possible linear combinations of

1, cos x, cos 2x, ..., cos nx, sin x, sin 2x, ..., sin nx (3)

It can be shown that these 2n + 1 functions are linearly independent and thus form a basis for a (2n + 1)-dimensional subspace of C[0, 2π].

Let us now consider the problem of finding the least squares approximation of a continuous function f(x) over the interval [0, 2π] by a trigonometric polynomial of order n or less. As noted above, the least squares approximation to f from W is the orthogonal projection of f on W. To find this orthogonal projection, we must find an orthonormal basis g_0, g_1, ..., g_{2n} for W, after which we can compute the orthogonal projection on W from the formula

proj_W f = ⟨f, g_0⟩ g_0 + ⟨f, g_1⟩ g_1 + ⋯ + ⟨f, g_{2n}⟩ g_{2n} (4)

(see Theorem 6.3.4b). An orthonormal basis for W can be obtained by applying the Gram–Schmidt process to the basis vectors in 3 using the inner product

⟨f, g⟩ = ∫_0^{2π} f(x)g(x) dx

This yields the orthonormal basis

g_0 = 1/√(2π), g_1 = (1/√π) cos x, ..., g_n = (1/√π) cos nx, g_{n+1} = (1/√π) sin x, ..., g_{2n} = (1/√π) sin nx (5)

(see Exercise 6). If we introduce the notation

a_0 = (2/√(2π)) ⟨f, g_0⟩, a_k = (1/√π) ⟨f, g_k⟩, b_k = (1/√π) ⟨f, g_{n+k}⟩ (k = 1, 2, ..., n) (6)

then on substituting 5 in 4, we obtain

proj_W f = a_0/2 + [a_1 cos x + ⋯ + a_n cos nx] + [b_1 sin x + ⋯ + b_n sin nx] (7)

where

a_0 = (1/π) ∫_0^{2π} f(x) dx

In short,

a_k = (1/π) ∫_0^{2π} f(x) cos kx dx, b_k = (1/π) ∫_0^{2π} f(x) sin kx dx (8)

The numbers a_0, a_1, ..., a_n, b_1, ..., b_n are called the Fourier coefficients of f.
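These coefficients are easy to evaluate numerically. The following sketch computes the first few Fourier coefficients of f(x) = x on [0, 2π] by numerical integration, using Formula 8; the use of SciPy's quad routine is an implementation choice for this illustration, not something prescribed by the text.

```python
import numpy as np
from scipy.integrate import quad

# Fourier coefficients of f(x) = x on [0, 2*pi], from Formula (8).
f = lambda x: x

a0 = quad(f, 0, 2 * np.pi)[0] / np.pi
print("a_0 =", a0)                     # approximately 2*pi

for k in range(1, 4):
    ak = quad(lambda x: f(x) * np.cos(k * x), 0, 2 * np.pi)[0] / np.pi
    bk = quad(lambda x: f(x) * np.sin(k * x), 0, 2 * np.pi)[0] / np.pi
    print(f"a_{k} = {ak:+.6f},  b_{k} = {bk:+.6f}")   # a_k near 0, b_k near -2/k
```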

E X A M P L E 1 Least Squares Approximations Find the least squares approximation of f(x) = x on [0, 2π] by

(a) a trigonometric polynomial of order 2 or less;
(b) a trigonometric polynomial of order n or less.

Solution (a)

a_0 = (1/π) ∫_0^{2π} x dx = 2π (9a)

For k = 1, 2, ..., integration by parts yields (verify)

a_k = (1/π) ∫_0^{2π} x cos kx dx = 0 (9b)

b_k = (1/π) ∫_0^{2π} x sin kx dx = −2/k (9c)

Thus, the least squares approximation to x on [0, 2π] by a trigonometric polynomial of order 2 or less is

x ≈ a_0/2 + a_1 cos x + a_2 cos 2x + b_1 sin x + b_2 sin 2x

or, from (9a), (9b), and (9c),

x ≈ π − 2 sin x − sin 2x

(b) The least squares approximation to x on [0, 2π] by a trigonometric polynomial of order n or less is

x ≈ a_0/2 + [a_1 cos x + ⋯ + a_n cos nx] + [b_1 sin x + ⋯ + b_n sin nx]

or, from (9a), (9b), and (9c),

x ≈ π − 2(sin x + (sin 2x)/2 + (sin 3x)/3 + ⋯ + (sin nx)/n)

The graphs of

and some of these approximations are shown in Figure 6.6.4.

Figure 6.6.4

It is natural to expect that the mean square error will diminish as the number of terms in the least squares approximation

a_0/2 + Σ_{k=1}^{n} (a_k cos kx + b_k sin kx)

increases. It can be proved that for functions f in C[0, 2π], the mean square error approaches zero as n → +∞; this is denoted by writing

f(x) = a_0/2 + Σ_{k=1}^{∞} (a_k cos kx + b_k sin kx)

The right side of this equation is called the Fourier series for f over the interval [0, 2π]. Such series are of major importance in engineering, science, and mathematics.

Jean Baptiste Fourier (1768–1830) Historical Note Fourier was a French mathematician and physicist who discovered the Fourier series and related ideas while working on problems of heat diffusion. This discovery was one of the most influential in the history of mathematics; it is the cornerstone of many fields of mathematical research and a basic tool in many branches of engineering. Fourier, a political activist during the French revolution, spent time in jail for his defense of many victims during the Terror. He later became a favorite of Napoleon and was named a baron. [Image: The Granger Collection, New York]

Concept Review • Approximation of functions • Mean square error • Least squares approximation • Trigonometric polynomial • Fourier coefficients • Fourier series

Skills • Find the least squares approximation of a function. • Find the mean square error of the least squares approximation of a function. • Compute the Fourier series of a function.

.

Exercise Set 6.6 1. Find the least squares approximation of

over the interval

by

(a) a trigonometric polynomial of order 2 or less. (b) a trigonometric polynomial of order n or less. Answer: (a) (b) 2. Find the least squares approximation of

over the interval

by

(a) a trigonometric polynomial of order 3 or less. (b) a trigonometric polynomial of order n or less. 3. (a) Find the least squares approximation of x over the interval

by a function of the form

(b) Find the mean square error of the approximation. Answer: (a) (b) 4. (a) Find the least squares approximation of .

over the interval

by a polynomial of the form

(b) Find the mean square error of the approximation. 5. (a) Find the least squares approximation of .

over the interval [−1, 1] by a polynomial of the form

(b) Find the mean square error of the approximation. Answer: (a) (b) 6. Use the Gram–Schmidt process to obtain the orthonormal basis 5 from the basis 3. 7. Carry out the integrations indicated in Formulas 9a, 9b, and 9c. 8. Find the Fourier series of

over the interval

.

.

9. Find the Fourier series of

and

,

over the interval

.

Answer:

10. What is the Fourier series of

?

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a) If a function f in is approximated by the function g, then the mean square error is the same as the area between the graphs of and over the interval . Answer: False (b) Given a finite-dimensional subspace W of error.

, the function g = projW f minimizes the mean square

Answer: True (c)

is an orthogonal subset of the vector space inner product

with respect to the

.

Answer: True (d)

is an orthonormal subset of the vector space inner product

with respect to the

.

Answer: False (e)

is a linearly independent subset of Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

.

Chapter 6 Supplementary Exercises 1. Let

have the Euclidean inner product.

(a) Find a vector in angles with

that is orthogonal to and

and

and makes equal

.

(b) Find a vector of length 1 that is orthogonal to and above and such that the cosine of the angle between x and is twice the cosine of the angle between x and . Answer: (a)

with

(b)

2. Prove: If

is the Euclidean inner product on

[Hint: Use the fact that 3. Let

, and if A is an

matrix, then

.]

have the inner product

that was defined in Example 6 of

Section 6.1 . Describe the orthogonal complement of (a) the subspace of all diagonal matrices. (b) the subspace of symmetric matrices. Answer: (a) The subspace of all matrices in

with only zeros on the diagonal.

(b) The subspace of all skew-symmetric matrices in 4. Let

.

be a system of m equations in n unknowns. Show that

is a solution of this system if and only if the vector of A with respect to the Euclidean inner product on 5. Use the Cauchy–Schwarz inequality to show that if

is orthogonal to every row vector . are positive real numbers, then

6. Show that if x and y are vectors in an inner product space and c is any scalar, then

7. Let have the Euclidean inner product. Find two vectors of length 1 that are orthogonal to all three of the vectors , , and . Answer:

8. Find a weighted Euclidean inner product on

such that the vectors

form an orthonormal set. 9. Is there a weighted Euclidean inner product on orthonormal set? Justify your answer.

for which the vectors

and

form an

Answer: No 10. If u and v are vectors in an inner product space , then u, v, and can be regarded as sides of a “triangle” in V (see the accompanying figure). Prove that the law of cosines holds for any such triangle; that is, where is the angle between u and v.

Figure Ex-10 11. (a) As shown in Figure 3.2.6, the vectors (k, 0, 0), (0, k, 0), and (0, 0, k) form the edges of a cube in with diagonal . Similarly, the vectors can be regarded as edges of a “cube” in with diagonal edges makes an angle of θ with the diagonal, where

. Show that each of the above .

(b) Calculus required What happens to the angle θ inpart (a) as the dimension of

approaches

?

Answer: (b)

approaches

12. Let u and v be vectors in an inner product space. (a) Prove that

if and only if

and

are orthogonal.

(b) Give a geometric interpretation of this result in

with the Euclidean inner product.

13. Let u be a vector in an inner product space V, and let Show that if is the angle between u and , then

14. Prove: If

and

be an orthonormal basis for V.

are two inner products on a vector space V, then the quantity is also an inner product.

15. Prove Theorem 6.2.5. 16. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A,then the least squares solution of is . 17. Is there any value of s for which system?

and

is the leastsquares solution of the following linear

Explain your reasoning. Answer: No 18. Show that if p and q are distinct positive integers, then the functions orthogonal with respect to the inner product

19. Show that if p and q are positive integers, then the functions orthogonal with respect to the inner product

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

and

and

are

are

CHAPTER

7

Diagonalization and Quadratic Forms

CHAPTER CONTENTS 7.1. Orthogonal Matrices 7.2. Orthogonal Diagonalization 7.3. Quadratic Forms 7.4. Optimization Using Quadratic Forms 7.5. Hermitian, Unitary, and Normal Matrices

INTRODUCTION In Section 5.2 we found conditions that guaranteed the diagonalizability of an matrix, but we did not consider what class or classes of matrices might actually satisfy those conditions. In this chapter we will show that every symmetric matrix is diagonalizable. This is an extremely important result because many applications utilize it in some essential way.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

7.1 Orthogonal Matrices In this section we will discuss the class of matrices whose inverses can be obtained by transposition. Such matrices occur in a variety of applications and arise as well as transition matrices when one orthonormal basis is changed to another.

Orthogonal Matrices We begin with the following definition.

DEFINITION 1 A square matrix A is said to be orthogonal if its transpose is the same as its inverse, that is, if

A^{-1} = A^T

or, equivalently, if

AA^T = A^T A = I (1)

Recall from Theorem 1.6.3 that if either product in 1 holds, then so does the other. Thus, A is orthogonal if either AA^T = I or A^T A = I.

E X A M P L E 1 A 3 × 3 Orthogonal Matrix The matrix

is orthogonal since

E X A M P L E 2 Rotation and Reflection Matrices are Orthogonal Recall from Table 5 of Section 4.9 that the standard matrix for the counterclockwise rotation of R^2 through an angle θ is

This matrix is orthogonal for all choices of θ since

We leave it for you to verify that the reflection matrices in Tables 1 and 2 and the rotation matrices in Table 6 of Section 4.9 are all orthogonal.

Observe that for the orthogonal matrices in Example 1 and Example 2, both the row vectors and the column vectors form orthonormal sets with respect to the Euclidean inner product. This is a consequence of the following theorem.

THEOREM 7.1.1 The following are equivalent for an n × n matrix A.

(a) A is orthogonal.
(b) The row vectors of A form an orthonormal set in R^n with the Euclidean inner product.
(c) The column vectors of A form an orthonormal set in R^n with the Euclidean inner product.

Proof We will prove the equivalence of (a) and (b) and leave the equivalence of (a) and (c) as an exercise. (a) ⇔ (b) The entry in the ith row and jth column of the matrix product is the dot product of the ith row vector of A and the jth column vector of (see Formula 5 of Section 1.3). But except for a difference in form, the jth column vector of is the jth row vector of A. Thus, if the row vectors of A are , then the matrix product can be expressed as

[see Formula 28 of Section 3.2]. Thus, it follows that

if and only if

and which are true if and only if

is an orthonormal set in

.

WARNING Note that an orthogonal matrix is one with orthonormal rows and columns—not simply orthogonal rows and columns.
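As a quick numerical check of Theorem 7.1.1, the sketch below verifies that a rotation matrix is orthogonal; the angle chosen is arbitrary and is used only for illustration.

```python
import numpy as np

theta = 0.7   # an arbitrary angle, in radians
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(A.T @ A, np.eye(2)))      # A^T A = I, so A is orthogonal
print(np.allclose(np.linalg.inv(A), A.T))   # the inverse equals the transpose
print(abs(np.linalg.det(A)))                # 1.0, consistent with Theorem 7.1.2(c)
```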

The following theorem lists three more fundamental properties of orthogonal matrices. The proofs are all straightforward and are left as exercises.

THEOREM 7.1.2
(a) The inverse of an orthogonal matrix is orthogonal.
(b) A product of orthogonal matrices is orthogonal.
(c) If A is orthogonal, then det(A) = 1 or det(A) = −1.

E X A M P L E 3 det(A) = ±1 for an Orthogonal Matrix A The matrix

is orthogonal since its row (and column) vectors form orthonormal sets in with the Euclidean inner product. We leave it for you to verify that and that interchanging the rows produces an orthogonal matrix whose determinant is .

Orthogonal Matrices as Linear Operators We observed in Example 2 that the standard matrices for the basic reflection and rotation operators on will explain why this is so.

and

are orthogonal. The next theorem

THEOREM 7.1.3 If A is an n × n matrix, then the following are equivalent.

(a) A is orthogonal.
(b) ||Ax|| = ||x|| for all x in R^n.
(c) Ax · Ay = x · y for all x and y in R^n.

Proof We will prove the sequence of implications (a) ⇒ (b) ⇒ (c) ⇒ (a). (a) ⇒ (b) Assume that A is orthogonal, so that

(b) ⇒ (c) Assume that

(c) ⇒ (a) Assume that

which can be rewritten as

Since this equation holds for all x in

for all x in

. It follows from Formula 26 of Section 3.2 that

. From Theorem 3.2.7 we have

for all x and y in

. It follows from Formula 26 of Section 3.2 that

or as

, it holds in particular if

, so

Thus, it follows from the positivity axiom for inner products that

Since this equation is satisfied by every vector y in orthogonal.

, it must be that

is the zero matrix (why?) and hence that

. Thus, A is

Theorem 7.1.3 has a useful geometric interpretation when considered from the viewpoint of matrix transformations: If A is an orthogonal matrix and is multiplication by A, then we will call an orthogonal operator on . It follows from parts (a) and (b) of Theorem 7.1.3 that the orthogonal operators on are precisely those operators that leave the lengths of all vectors unchanged. This explains why, in Example 2, we found the standard matrices for the basic reflections and rotations of and to be orthogonal. Parts (a) and (c) of Theorem 7.1.3 imply that orthogonal operators leave the angle between two vectors unchanged. Why?
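The norm-preserving and angle-preserving behavior described above is easy to observe numerically. The sketch below checks parts (b) and (c) of Theorem 7.1.3 for a rotation matrix and two arbitrary vectors; the specific numbers are illustrative only.

```python
import numpy as np

theta = 1.2
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # an orthogonal matrix

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
y = rng.standard_normal(2)

# Multiplication by A preserves lengths and dot products (hence angles).
print(np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x)))
print(np.isclose((A @ x) @ (A @ y), x @ y))
```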

Change of Orthonormal Basis Orthonormal bases for inner product spaces are convenient because, as the following theorem shows, many familiar formulas hold for such bases. We leave the proof as an exercise.

THEOREM 7.1.4 If S is an orthonormal basis for an n-dimensional inner product space

and if

then: (a) (b) (c)

Remark Note that the three parts of Theorem 7.1.4 can be expressed as

where the norm, distance, and inner product on the left sides are relative to the inner product on V and on the right sides are relative to the Euclidean inner product on .

Transitions between orthonormal bases for an inner product space are of special importance in geometry and various applications. The following theorem, whose proof is deferred to the end of this section, is concerned with transitions of this type.

THEOREM 7.1.5 Let V be a finite-dimensional inner product space. If P is the transition matrix from one orthonormal basis for V to another orthonormal basis for V, then P is an orthogonal matrix.

E X A M P L E 4 Rotation of Axes in 2-Space In many problems a rectangular xy-coordinate system is given, and a new -coordinate system is obtained by rotating the xy-system counterclockwise about the origin through an angle θ. When this is done, each point Q in the plane has two sets of coordinates—coordinates relative to the xy-system and coordinates relative to the -system (Figure 7.1.1a).

Figure 7.1.1 By introducing unit vectors

and

along the positive x- and y-axes and unit vectors

we can regard this rotation as a change from an old basis coordinates

and the old coordinates

to a new basis

and

along the positive

- and

-axes,

(Figure 7.1.1b). Thus, the new

of a point Q will be related by (2)

where P is the transition from B′ to B. To find P we must determine the coordinate matrices of the new basis vectors relative to the old basis. As indicated in Figure 7.1.1c, the components of in the old basis are cos θ and sin θ, so

Similarly, from Figure 7.1.1d, we see that the components of , so

in the old basis are

and

and

Thus the transition matrix from B′ to B is

P = [ cos θ  −sin θ ; sin θ  cos θ ] (3)

Observe that P is an orthogonal matrix, as expected, since B and B′ are orthonormal bases. Thus

P^{-1} = P^T = [ cos θ  sin θ ; −sin θ  cos θ ]

so 2 yields

[ x′ ; y′ ] = [ cos θ  sin θ ; −sin θ  cos θ ] [ x ; y ] (4)

or, equivalently,

x′ = x cos θ + y sin θ
y′ = −x sin θ + y cos θ (5)

These are sometimes called the rotation equations for R^2.
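As a small numerical illustration of the rotation equations (the point and angle below are made up and are not those of Example 5):

```python
import numpy as np

theta = np.pi / 6          # rotate the coordinate axes through 30 degrees (illustrative)
x, y = 3.0, 4.0            # old xy-coordinates of a point Q (illustrative)

# Equations (5): coordinates of Q in the rotated x'y'-system.
x_new =  np.cos(theta) * x + np.sin(theta) * y
y_new = -np.sin(theta) * x + np.cos(theta) * y
print(x_new, y_new)
```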

E X A M P L E 5 Rotation of Axes in 2-Space Use form 4 of the rotation equations for to find the new coordinates of the point coordinate system are rotated through an angle of .

if the coordinate axes of a rectangular

Solution Since

the equation in 4 becomes

Thus, if the old coordinates of a point Q are

so the new coordinates of Q are

, then

.

Remark Observe that the coefficient matrix in 4 is the same as the standard matrix for the linear operator that rotates the vectors of through the angle (see margin note for Table 5 of Section 4.9). This is to be expected since rotating the coordinate axes through the angle θ with the vectors of kept fixed has the same effect as rotating the vectors in through the angle with the axes kept fixed.

E X A M P L E 6 Application to Rotation of Axes in 3-Space Suppose that a rectangular xyz-coordinate system is rotated around its z-axis counterclockwise (looking down the positive z-axis) through an angle θ (Figure 7.1.2). If we introduce unit vectors , , and along the positive x-, y-, and z-axes and unit vectors , , and along the positive -, -, and -axes, we can regard the rotation as a change from the old basis to the new basis

Moreover, since

. In light of Example 4, it should be evident that

extends 1 unit up the positive -axis,

Figure 7.1.2 It follows that the transition matrix from B′ to B is

and the transition matrix from B to B′ is

(verify). Thus, the new coordinates

of a point Q can be computed from its old coordinates

by

OPTIONAL

We conclude this section with an optional proof of Theorem 7.1.5. Proof of Theorem 7.1.5 Assume that V is an n-dimensional inner product space and that P is the transition matrix from an orthonormal basis B′ to an orthonormal basis B. We will denote the norm relative to the inner product on V by the symbol to distinguish it from the norm relative to the Euclidean inner product on , which we will denote by .

Recall that denotes a coordinate vector expressed in comma-delimited form whereas denotes a coordinate vector expressed in column form. To prove that P is orthogonal, we will use Theorem 7.1.3 and show that for every vector x in . As a first step in this direction, recall from Theorem 7.1.4a that for any orthonormal basis for V the norm of any vector u in V is the same as the norm of its coordinate vector with respect to the Euclidean inner product, that is

or (6) Now let x be any vector in 6,

, and let u be the vector in V whose coordinate vector with respect to the basis B′ is x; that is,

which proves that P is orthogonal.

Concept Review • Orthogonal matrix • Orthogonal operator • Properties of orthogonal matrices. • Geometric properties of an orthogonal operator • Properties of transition matrices from one orthonormal basis to another.

Skills • Be able to identify an orthogonal matrix. • Know the possible values for the determinant of an orthogonal matrix. • Find the new coordinates of a point resulting from a rotation of axes.

Exercise Set 7.1 1. (a) Show that the matrix

is orthogonal in three ways: by calculating (b) Find the inverse of the matrix A in part (a). Answer: (b)

2. (a) Show that the matrix

is orthogonal.

, by using part (b) of Theorem 7.1.1, and by using part (c) of Theorem 7.1.1.

. Thus, from

(b) Let on

be multiplication by the matrix A in part (a). Find , verify that .

for the vector

3. Determine which of the following matrices are orthogonal. For those that are orthogonal, find the inverse. (a) (b)

(c)

(d)

(e)

(f)

Answer: (a) (b)

(d)

. Using the Euclidean inner product

(e)

4. Prove that if A is orthogonal, then

is orthogonal.

5. Verify that the reflection matrices in Tables Table 1 and Table 2 of Section 4.9 are orthogonal. 6. Let a rectangular

-coordinate system be obtained by rotating a rectangular xy-coordinate system counterclockwise through the angle

. (a) Find the

-coordinates of the point whose xy-coordinates are

(b) Find the xy-coordinates of the point whose 7. Repeat Exercise 6 with

-coordinates are

. .

.

Answer: (a) (b) 8. Let a rectangular

-coordinate system be obtained by rotating a rectangular xyz-coordinate system counterclockwise about the z-axis

(looking down the z-axis) through the angle (a) Find the

.

-coordinates of the point whose xyz-coordinates are

(b) Find the xyz-coordinates of the point whose 9. Repeat Exercise 8 for a rotation of

-coordinates are

. .

counterclockwise about the y-axis (looking along the positive y-axis toward the origin).

Answer: (a) (b) 10. Repeat Exercise 8 for a rotation of 11. (a) A rectangular

counterclockwise about the x-axis (looking along the positive x-axis toward the origin).

-coordinate system is obtained by rotating an xyz-coordinate system counterclockwise about the y-axis through an

angle θ (looking along the positive y-axis toward the origin). Find a matrix A such that

where

and

are the coordinates of the same point in the

- and

-systems, respectively.

(b) Repeat part (a) for a rotation about the x-axis. Answer: (a)

(b)

12. A rectangular

-coordinate system is obtained by first rotating a rectangular xyz-coordinate system 60° counterclockwise about the

z-axis (looking down the positive z-axis) to obtain an

-coordinate system, and then rotating the

-coordinate system 45°

counterclockwise about the

where

and

-axis (looking along the positive

are the xyz- and

-axis toward the origin). Find a matrix A such that

-coordinates of the same point.

13. What conditions must a and b satisfy for the matrix

to be orthogonal? Answer:

14. Prove that a

where

orthogonal matrix A has only one of two possible forms:

. [Hint: Start with a general

matrix

, and use the fact that the column vectors form an orthonormal set in

15. (a) Use the result in Exercise 14 to prove that multiplication by a rotation about the x-axis. (b) Prove that multiplication by Ais a rotation if

.]

orthogonal matrix is either a reflection or a reflection followed by a

and that a reflection followed by a rotation if

.

16. Use the result in Exercise 15 to determine whether multiplication by A is a reflection or a reflection followed by a rotation about the x-axis. Find the angle of rotation in either case. (a)

(b)

17. Find a, b, and c for which the matrix

is orthogonal. Are the values of a, b, and c unique? Explain. Answer: The only possibilities are

or

.

18. The result in Exercise 15 has an analog for orthogonal matrices: It can be proved that multiplication by a orthogonal matrix A is a rotation about some axis if and is a rotation about some axis followed by a reflection about some coordinate plane if . Determine whether multiplication by A is a rotation or a rotation followed by a reflection. (a)

(b)

19. Use the fact stated in Exercise 18 and part (b) of Theorem 7.1.2 to show that a composition of rotations can always be accomplished by a single rotation about some appropriate axis. 20. Prove the equivalence of statements (a) and (c) in Theorem 7.1.1. 21. A linear operator on is called rigid if it does not change the lengths of vectors, and it is called angle preserving if it does not change the angle between nonzero vectors. (a) Name two different types of linear operators that are rigid. (b) Name two different types of linear operators that are angle preserving. (c) Are there any linear operators on

that are rigid and not angle preserving? Angle preserving and not rigid? Justify your answer.

Answer: (a) Rotations about the origin, reflections about any line through the origin, and any combination of these (b) Rotation about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these (c) No; dilations and contractions

True-False Exercises In parts (a)–(h) determine whether the statement is true or false, and justify your answer. (a) The matrix

is orthogonal.

Answer: False (b)

The matrix

is orthogonal.

Answer: False (c) An

matrix A is orthogonal if

.

Answer: False (d) A square matrix whose columns form an orthogonal set is orthogonal. Answer: False (e) Every orthogonal matrix is invertible. Answer: True (f) If A is an orthogonal matrix, then

is orthogonal and

Answer: True (g) Every eigenvalue of an orthogonal matrix has absolute value 1.

.

Answer: True (h) If A is a square matrix and

for all unit vectors u, then A is orthogonal.

Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

7.2 Orthogonal Diagonalization In this section we will be concerned with the problem of diagonalizing a symmetric matrix A. As we will see, this problem is closely related to that of finding an orthonormal basis for that consists of eigenvectors of A. Problems of this type are important because many of the matrices that arise in applications are symmetric.

The Orthogonal Diagonalization Problem In Definition 1 of Section 5.2 we defined two square matrices, A and B, to be similar if there is an invertible matrix P such that P^{-1}AP = B. In this section we will be concerned with the special case in which it is possible to find an orthogonal matrix P for which this relationship holds. We begin with the following definition.

DEFINITION 1 If A and B are square matrices, then we say that A and B are orthogonally similar if there is an orthogonal matrix P such that P^T AP = B.

If A is orthogonally similar to some diagonal matrix, say

P^T AP = D

then we say that A is orthogonally diagonalizable and that P orthogonally diagonalizes A. Our first goal in this section is to determine what conditions a matrix must satisfy to be orthogonally diagonalizable. As a first step, observe that there is no hope of orthogonally diagonalizing a matrix that is not symmetric. To see why this is so, suppose that

P^T AP = D (1)

where P is an orthogonal matrix and D is a diagonal matrix. Multiplying the left side of 1 by P, the right side by P^T, and then using the fact that PP^T = P^T P = I, we can rewrite this equation as

A = PDP^T (2)

Now transposing both sides of this equation and using the fact that a diagonal matrix is the same as its transpose, we obtain

A^T = (PDP^T)^T = (P^T)^T D^T P^T = PDP^T = A

so A must be symmetric.

Conditions for Orthogonal Diagonalizability The following theorem shows that every symmetric matrix is, in fact, orthogonally diagonalizable. In this theorem, and for the remainder of this section, orthogonal will mean orthogonal with respect to the Euclidean inner product on .

THEOREM 7.2.1 If A is an n × n matrix, then the following are equivalent.

(a) A is orthogonally diagonalizable.
(b) A has an orthonormal set of n eigenvectors.
(c) A is symmetric.

Proof (a) ⇒ (b) Since A is orthogonally diagonalizable, there is an orthogonal matrix P such that is diagonal. As shown in the proof of Theorem 5.2.1, the n column vectors of P are eigenvectors of A. Since P is orthogonal, these column vectors are orthonormal, so A has n orthonormal eigenvectors. (b) ⇒ (a) Assume that A has an orthonormal set of n eigenvectors . As shown in the proof of Theorem 5.2.1, the matrix P with these eigenvectors as columns diagonalizes A. Since these eigenvectors are orthonormal, P is orthogonal and thus orthogonally diagonalizes A. (a) ⇒ (c) In the proof that (a) ⇒ (b) we showed that an orthogonally diagonalizable matrix A is orthogonally diagonalized by an matrix P whose columns form an orthonormal set of eigenvectors of A. Let D be the diagonal matrix from which it follows that Thus,

which shows that A is symmetric. (c) ⇒ (a) The proof of this part is beyond the scope of this text and will be omitted.

Properties of Symmetric Matrices Our next goal is to devise a procedure for orthogonally diagonalizing a symmetric matrix, but before we can do so, we need the following critical theorem about eigenvalues and eigenvectors of symmetric matrices.

THEOREM 7.2.2 If A is a symmetric matrix, then: (a) The eigenvalues of A are all real numbers. (b) Eigenvectors from different eigenspaces are orthogonal.

Part (a), which requires results about complex vector spaces, will be discussed in Section 7.5.

Proof (b) Let v_1 and v_2 be eigenvectors corresponding to distinct eigenvalues λ_1 and λ_2 of the matrix A. We want to show that v_1 · v_2 = 0. Our proof of this involves the trick of starting with the expression Av_1 · v_2. It follows from Formula 26 of Section 3.2 and the symmetry of A that

Av_1 · v_2 = v_1 · A^T v_2 = v_1 · Av_2 (3)

But v_1 is an eigenvector of A corresponding to λ_1, and v_2 is an eigenvector of A corresponding to λ_2, so 3 yields the relationship

λ_1 (v_1 · v_2) = λ_2 (v_1 · v_2)

which can be rewritten as

(λ_1 − λ_2)(v_1 · v_2) = 0 (4)

But λ_1 − λ_2 ≠ 0, since λ_1 and λ_2 were assumed distinct. Thus, it follows from 4 that v_1 · v_2 = 0.

Theorem 7.2.2 yields the following procedure for orthogonally diagonalizing a symmetric matrix.

Orthogonally Diagonalizing an n × n Symmetric Matrix Step 1 Find a basis for each eigenspace of A. Step 2 Apply the Gram-Schmidt process to each of these bases to obtain an orthonormal basis for each eigenspace. Step 3 Form the matrix P whose columns are the vectors constructed in Step 2. This matrix will orthogonally diagonalize A, and the eigenvalues on the diagonal of will be in the same order as their corresponding eigenvectors in P.

Remark The justification of this procedure should be clear: Theorem 7.2.2 ensures that eigenvectors from different eigenspaces are orthogonal, and applying the Gram-Schmidt process ensures that the eigenvectors within the same eigenspace are orthonormal. It follows that the entire set of eigenvectors obtained by this procedure will be orthonormal.
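In practice this procedure is carried out numerically; NumPy's eigh routine, for example, returns real eigenvalues and an orthonormal set of eigenvectors for a symmetric matrix, which amounts to Steps 1 through 3 above. A minimal sketch (the matrix is illustrative and is not the matrix of Example 1):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])     # an illustrative symmetric matrix

# eigh returns real eigenvalues and an orthogonal matrix P whose columns are eigenvectors.
eigenvalues, P = np.linalg.eigh(A)

D = P.T @ A @ P                 # diagonal, with the eigenvalues on the diagonal
print(eigenvalues)
print(np.allclose(D, np.diag(eigenvalues)))   # True
print(np.allclose(P.T @ P, np.eye(2)))        # True: P is orthogonal
```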

E X A M P L E 1 Orthogonally Diagonalizing a Symmetric Matrix Find an orthogonal matrix P that diagonalizes

Solution We leave it for you to verify that the characteristic equation of A is

Thus, the distinct eigenvalues of A are can be shown that

and

. By the method used in Example 7 of Section 5.1, it

(5) form a basis for the eigenspace corresponding to . Applying the Gram-Schmidt process to yields the following orthonormal eigenvectors (verify):

(6)

The eigenspace corresponding to

has

as a basis. Applying the Gram-Schmidt process to

Finally, using

,

, and

(i.e., normalizing

) yields

as column vectors, we obtain

which orthogonally diagonalizes A. As a check, we leave it for you to confirm that

Spectral Decomposition If A is a symmetric matrix that is orthogonally diagonalized by

P = [ u_1  u_2  ⋯  u_n ]

and if λ_1, λ_2, ..., λ_n are the eigenvalues of A corresponding to the unit eigenvectors u_1, u_2, ..., u_n, then we know that D = P^T AP, where D is a diagonal matrix with the eigenvalues in the diagonal positions. It follows from this that the matrix A can be expressed as A = PDP^T. Multiplying out, we obtain the formula

A = λ_1 u_1 u_1^T + λ_2 u_2 u_2^T + ⋯ + λ_n u_n u_n^T (7)

which is called a spectral decomposition of A.* Note that each term in the spectral decomposition of A has the form λ u u^T, where u is a unit eigenvector of A in column form, and λ is an eigenvalue of A corresponding to u. Since u has size n × 1, it follows that the product u u^T has size n × n. It can be proved (though we will not do it) that u u^T is the standard matrix for the orthogonal projection of R^n on the subspace spanned by the vector u. Accepting this to be so, the spectral decomposition of A tells us that the image of a vector x under multiplication by a symmetric matrix A can be obtained by projecting x orthogonally on the lines (one-dimensional subspaces) determined by the eigenvectors of A, then scaling those projections by the eigenvalues, and then adding the scaled projections. Here is an example.

E X A M P L E 2 A Geometric Interpretation of a Spectral Decomposition The matrix

has eigenvalues

and

with corresponding eigenvectors

(verify). Normalizing these basis vectors yields

so a spectral decomposition of A is

(8)

where, as noted above, the matrices on the right side of 8 are the standard matrices for the orthogonal projections onto the eigenspaces corresponding to and , respectively. Now let us see what this spectral decomposition tells us about the image of the vector multiplication by A. Writing x in column form, it follows that

under

(9) and from 8 that

(10)

Formulas 9 and 10 provide two different ways of viewing the image of the vector (1, 1) under multiplication by A: Formula 9 gives that image directly, whereas Formula 10 tells us that it can also be obtained by projecting (1, 1) onto the eigenspaces corresponding to λ_1 and λ_2, then scaling the two projections by the eigenvalues, and then adding the scaled vectors (see Figure 7.2.1).

Figure 7.2.1
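The spectral decomposition itself is easy to verify numerically. The sketch below rebuilds a symmetric matrix from Formula 7; the matrix used is illustrative and is not necessarily the matrix of Example 2.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -2.0]])    # an illustrative symmetric matrix

eigenvalues, P = np.linalg.eigh(A)   # columns of P are orthonormal eigenvectors u_i

# Formula (7): A = lambda_1 u_1 u_1^T + ... + lambda_n u_n u_n^T
A_rebuilt = sum(lam * np.outer(u, u) for lam, u in zip(eigenvalues, P.T))
print(np.allclose(A_rebuilt, A))   # True
```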

The Nondiagonalizable Case If A is an matrix that is not orthogonally diagonalizable, it may still be possible to achieve considerable simplification in the form of by choosing the orthogonal matrix P appropriately. We will consider two theorems (without proof) that illustrate this. The first, due to the German mathematician Isaai Schur, states that every square matrix A is orthogonally similar to an upper triangular matrix that has the eigenvalues of A on the main diagonal.

THEOREM 7.2.3 Schur's Theorem If A is an n × n matrix with real entries and real eigenvalues, then there is an orthogonal matrix P such that P^T AP is an upper triangular matrix of the form

P^T AP = [ λ_1 × × ⋯ × ; 0 λ_2 × ⋯ × ; 0 0 λ_3 ⋯ × ; ⋮ ; 0 0 0 ⋯ λ_n ] (11)

in which λ_1, λ_2, ..., λ_n are the eigenvalues of the matrix A repeated according to multiplicity.

Issai Schur (1875–1941) Historical Note The life of the German mathematician Issai Schur is a sad reminder of the effect that Nazi policies had on Jewish intellectuals during the 1930s. Schur was a brilliant mathematician and a popular lecturer who attracted many students and researchers to the University of Berlin, where he worked and taught. His lectures sometimes attracted so many students that opera glasses were needed to see him from the back row. Schur's life became increasingly difficult under Nazi rule, and in April of 1933 he was forced to “retire” from the university under a law that prohibited non-Aryans from holding “civil service” positions. There was an outcry from many of his students and colleagues who respected and liked him, but it did not stave off his complete dismissal in 1935. Schur, who thought of himself as a loyal German never understood the persecution and humiliation he received at Nazi hands. He left Germany for Palestine in 1939, a broken man. Lacking in financial resources, he had to sell his beloved mathematics books and lived in poverty until his death in 1941. [Image: Courtesy Electronic Publishing Services, Inc., New York City]

It is common to denote the upper triangular matrix in 11 by S (for Schur), in which case that equation can be rewritten as

A = PSP^T (12)

which is called a Schur decomposition of A. The next theorem, due to the German mathematician and engineer Karl Hessenberg (1904–1959), states that every square matrix with real entries is orthogonally similar to a matrix in which each entry below the first subdiagonal is zero (Figure 7.2.2). Such a matrix is said to be in upper Hessenberg form.

Figure 7.2.2

THEOREM 7.2.4 Hessenberg's Theorem If A is an n × n matrix, then there is an orthogonal matrix P such that P^TAP is a matrix of the form

(13)

Note that unlike those in 11, the diagonal entries in 13 are usually not the eigenvalues of A.

It is common to denote the upper Hessenberg matrix in 13 by H (for Hessenberg), in which case that equation can be rewritten as

A = PHP^T    (14)

which is called an upper Hessenberg decomposition of A.

Remark In many numerical algorithms the initial matrix is first converted to upper Hessenberg form to reduce the amount of computation in subsequent parts of the algorithm. Many computer packages have built-in commands for finding Schur and Hessenberg decompositions.
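For readers using Python, SciPy exposes both factorizations mentioned in the remark. The sketch below is an illustration only (it assumes NumPy and SciPy are available and uses a randomly generated matrix); it computes a real Schur decomposition and an upper Hessenberg decomposition and checks them.

import numpy as np
from scipy.linalg import schur, hessenberg

A = np.random.rand(4, 4)                 # an arbitrary real matrix

# Real Schur form: S is (quasi-)upper triangular, P is orthogonal, A = P S P^T
S, P = schur(A, output='real')
print(np.allclose(A, P @ S @ P.T))       # True

# Upper Hessenberg form: zeros below the first subdiagonal, Q orthogonal
H, Q = hessenberg(A, calc_q=True)
print(np.allclose(A, Q @ H @ Q.T))       # True
print(np.allclose(np.tril(H, -2), 0))    # entries below the first subdiagonal are zero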

Concept Review • Orthogonally similar matrices

• Orthogonally diagonalizable matrix • Spectral decomposition (or eigenvalue decomposition) • Schur decomposition • Subdiagonal • Upper Hessenberg form • Upper Hessenberg decomposition

Skills • Be able to recognize an orthogonally diagonalizable matrix. • Know that eigenvalues of symmetric matrices are real numbers. • Know that for a symmetric matrix eigenvectors from different eigenspaces are orthogonal. • Be able to orthogonally diagonalize a symmetric matrix. • Be able to find the spectral decomposition of a symmetric matrix. • Know the statement of Schur's Theorem. • Know the statement of Hessenberg's Theorem.

Exercise Set 7.2 1. Find the characteristic equation of the given symmetric matrix, and then by inspection determine the dimensions of the eigenspaces. (a) (b)

(c)

(d)

(e)

(f)

Answer: (a) (b)

one-dimensional; one-dimensional;

one-dimensional two-dimensional

(c) (d) (e) (f)

one-dimensional;

two-dimensional

two-dimensional; three-dimensional;

one-dimensional

one-dimensional two-dimensional;

two-dimensional

In Exercises 2–9, find a matrix P that orthogonally diagonalizes A, and determine 2. 3.

Answer:

4. 5.

Answer:

6.

7.

Answer:

.

8.

9.

Answer:

10. Assuming that

, find a matrix that orthogonally diagonalizes

11. Prove that if A is any

matrix, then

12. (a) Show that if v is any

has an orthonormal set of n eigenvectors.

matrix and I is the

identity matrix, then

(b) Find a matrix P that orthogonally diagonalizes

is orthogonally diagonalizable.

if

13. Use the result in Exercise 19 of Section 5.1 to prove Theorem 7.2.2a for

symmetric matrices.

14. Does there exist a

,

symmetric matrix with eigenvalues

If so, find such a matrix; if not, explain why not. 15. Is the converse of Theorem 7.2.2b true? Explain. Answer: No 16. Find the spectral decomposition of each matrix. (a) (b) (c)

and corresponding eigenvectors

(d)

17. Show that if A is a symmetric orthogonal matrix, then 1 and 18. (a) Find a symmetric matrix whose eigenvalues are eigenvectors are , , (b) Is there a

are the only possible eigenvalues. ,

and for which the corresponding

.

symmetric matrix with eigenvalues , , , , ? Explain your reasoning.

and corresponding eigenvectors

19. Let A be a diagonalizable matrix with the property that eigenvectors from distinct eigenvalues are orthogonal. Must A be symmetric? Explain your reasoning. Answer: Yes 20. Prove: If

is an orthonormal basis for

then A is symmetric and has eigenvalues

, and if A can be expressed as

.

21. In this exercise we will establish that a matrix A is orthogonally diagonalizable if and only if it is symmetric. We have shown that an orthogonally diagonalizable matrix is symmetric. The harder part is to prove that a symmetric matrix A is orthogonally diagonalizable. We will proceed in two steps: first we will show that A is diagonalizable, and then we will build on that result to show that A is orthogonally diagonalizable. (a) Assume that A is a symmetric matrix. One way to prove that A is diagonalizable is to show that for each eigenvalue the geometric multiplicity is equal to the algebraic multiplicity. For this purpose, assume that the geometric multiplicity of is k, let be an orthonormal basis for the eigenspace corresponding to , extend this to an orthonormal basis for , and let P be the matrix having the vectors of B as columns. As shown in Exercise 34(b) of Section 5.2, the product can be written as

Use the fact that B is an orthonormal basis to prove that

[a zero matrix of size

.

(b) It follows from part (a) and Exercise 34(c) of Section 5.2 that A has the same characteristic polynomial as

Use this fact and Exercise 34(d) of Section 5.2 to prove that the algebraic multiplicity of geometric multiplicity of . This establishes that A is diagonalizable.

is the same as the

(c) Use Theorem 7.2.2(b) and the fact that A is diagonalizable to prove that A is orthogonally diagonalizable.

True-False Exercises In parts (a)–(g) determine whether the statement is true or false, and justify your answer. (a) If A is a square matrix, then Answer: True

and

are orthogonally diagonalizable.

(b) If

and

are eigenvectors from distinct eigenspaces of a symmetric matrix, then

.

Answer: True (c) Every orthogonal matrix is orthogonally diagonalizable. Answer: False (d) If A is both invertible and orthogonally diagonalizable, then

is orthogonally diagonalizable.

Answer: True (e) Every eigenvalue of an orthogonal matrix has absolute value 1. Answer: True (f) If A is an orthogonally diagonalizable matrix, then there exists an orthonormal basis for eigenvectors of A. Answer: False (g) If A is orthogonally diagonalizable, then A has real eigenvalues. Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

consisting of

7.3 Quadratic Forms In this section we will use matrix methods to study real-valued functions of several variables in which each term is either the square of a variable or the product of two variables. Such functions arise in a variety of applications, including geometry, vibrations of mechanical systems, statistics, and electrical engineering.

Definition of a Quadratic Form Expressions of the form a1x1 + a2x2 + ⋯ + anxn occurred in our study of linear equations and linear systems. If a1, a2, …, an are treated as fixed constants, then this expression is a real-valued function of the n variables x1, x2, …, xn and is called a linear form on R^n. All variables in a linear form occur to the first power and there are no products of variables. Here we will be concerned with quadratic forms on R^n, which are functions of the form

The terms that involve a product x_i x_j of two distinct variables are called cross product terms. It is common to combine the cross product terms involving x_i x_j with those involving x_j x_i to avoid duplication. Thus, a general quadratic form on R^2 would typically be expressed as

(1)

and a general quadratic form on R^3 as

(2)

If, as usual, we do not distinguish between the number a and the 1 × 1 matrix [a], and if we let x be the column vector of variables, then 1 and 2 can be expressed in matrix form as

(verify). Note that the matrix A in these formulas is symmetric, that its diagonal entries are the coefficients of the squared terms, and that its off-diagonal entries are half the coefficients of the cross product terms. In general, if A is a symmetric n × n matrix and x is an n × 1 column vector of variables, then we call the function

Q_A(x) = x^TAx    (3)

the quadratic form associated with A. When convenient, 3 can be expressed in dot product notation as

x^TAx = x · Ax = Ax · x    (4)

In the case where A is a diagonal matrix, the quadratic form x^TAx has no cross product terms; for example, if A has diagonal entries λ1, λ2, …, λn, then

x^TAx = λ1x1² + λ2x2² + ⋯ + λnxn²

E X A M P L E 1 Expressing Quadratic Forms in Matrix Notation In each part, express the quadratic form in the matrix notation

, where A is symmetric.

(a) (b) Solution The diagonal entries of A are the coefficients of the squared terms, and the off-diagonal entries are half the coefficients of the cross product terms, so
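As a concrete illustration of the rule used in Example 1, the following NumPy sketch is offered (it is not part of the text, and the particular quadratic form 2x1² + 6x1x2 − 5x2² is simply an invented example): it builds the symmetric matrix A and checks that x^TAx reproduces the polynomial.

import numpy as np

# Hypothetical quadratic form Q(x1, x2) = 2*x1^2 + 6*x1*x2 - 5*x2^2
# Diagonal entries of A: coefficients of the squared terms.
# Off-diagonal entries: half the coefficient of the cross product term.
A = np.array([[2.0, 3.0],
              [3.0, -5.0]])

def Q(x):
    return x @ A @ x                      # the quadratic form x^T A x

x = np.array([1.5, -2.0])
direct = 2*x[0]**2 + 6*x[0]*x[1] - 5*x[1]**2
print(np.isclose(Q(x), direct))           # True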

Change of Variable in a Quadratic Form There are three important kinds of problems that occur in applications of quadratic forms:

Problem 1 If x^TAx is a quadratic form on R^2 or R^3, what kind of curve or surface is represented by the equation x^TAx = k?

Problem 2 If x^TAx is a quadratic form on R^n, what conditions must A satisfy for x^TAx to have positive values for x ≠ 0?

Problem 3 If x^TAx is a quadratic form on R^n, what are its maximum and minimum values if x is constrained to satisfy ||x|| = 1?

We will consider the first two problems in this section and the third problem in the next section. Many of the techniques for solving these problems are based on simplifying the quadratic form x^TAx by making a substitution

x = Py    (5)

that expresses the variables x1, x2, …, xn in terms of new variables y1, y2, …, yn. If P is invertible, then we call 5 a change of variable, and if P is orthogonal, then we call 5 an orthogonal change of variable. If we make the change of variable x = Py in the quadratic form x^TAx, then we obtain

x^TAx = (Py)^TA(Py) = y^TP^TAPy    (6)

Since the matrix P^TAP is symmetric (verify), the effect of the change of variable is to produce a new quadratic form in the variables y1, y2, …, yn. In particular, if we choose P to orthogonally diagonalize A, then the new quadratic form will be y^TDy, where D is a diagonal matrix with the eigenvalues of A on the main diagonal; that is,

y^TDy = λ1y1² + λ2y2² + ⋯ + λnyn²

Thus, we have the following result, called the principal axes theorem.

THEOREM 7.3.1 The Principal Axes Theorem If A is a symmetric n × n matrix, then there is an orthogonal change of variable that transforms the quadratic form x^TAx into a quadratic form y^TDy with no cross product terms. Specifically, if P orthogonally diagonalizes A, then making the change of variable x = Py in the quadratic form x^TAx yields the quadratic form

x^TAx = y^TDy = λ1y1² + λ2y2² + ⋯ + λnyn²

in which λ1, λ2, …, λn are the eigenvalues of A corresponding to the eigenvectors that form the successive columns of P.

E X A M P L E 2 An Illustration of the Principal Axes Theorem Find an orthogonal change of variable that eliminates the cross product terms in the quadratic form , and express Q in terms of the new variables. Solution The quadratic form can be expressed in matrix notation as

The characteristic equation of the matrix A is

so the eigenvalues are are

,

. We leave it for you to show that orthonormal bases for the three eigenspaces

Thus, a substitution

that eliminates the cross product terms is

This produces the new quadratic form

in which there are no cross product terms.
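The computation in Example 2 can be mimicked numerically. The sketch below is only an illustration (the symmetric matrix A is an arbitrary stand-in, not the matrix of the example): it orthogonally diagonalizes A with numpy.linalg.eigh and confirms that the substitution x = Py turns x^TAx into a sum of λi yi² terms.

import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])           # an arbitrary symmetric matrix

lams, P = np.linalg.eigh(A)               # columns of P are orthonormal eigenvectors
y = np.array([1.0, -2.0, 0.5])            # any choice of the new variables
x = P @ y                                 # the change of variable x = P y

lhs = x @ A @ x                           # x^T A x
rhs = np.sum(lams * y**2)                 # lambda_1*y1^2 + ... + lambda_n*yn^2
print(np.isclose(lhs, rhs))               # True: no cross product terms remain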

Remark If A is a symmetric matrix, then the quadratic form x^TAx is a real-valued function whose range is the set of all possible values of x^TAx as x varies over R^n. It can be shown that an orthogonal change of variable does not alter the range of a quadratic form; that is, the set of all values of x^TAx as x varies over R^n is the same as the set of all values of y^TDy as y varies over R^n.

Quadratic Forms in Geometry Recall that a conic section or conic is a curve that results by cutting a double-napped cone with a plane (Figure 7.3.1). The most important conic sections are ellipses, hyperbolas, and parabolas, which result when the cutting plane does not pass through the vertex. Circles are special cases of ellipses that result when the cutting plane is perpendicular to the axis of symmetry of the cone. If the cutting plane passes through the vertex, then the resulting intersection is called a degenerate conic. The possibilities are a point, a pair of intersecting lines, or a single line.

Figure 7.3.1

Figure 7.3.2

Quadratic forms in R^2 arise naturally in the study of conic sections. For example, it is shown in analytic geometry that an equation of the form

ax² + 2bxy + cy² + dx + ey + f = 0    (7)

in which a, b, and c are not all zero, represents a conic section.* If d = e = 0 in 7, then there are no linear terms, so the equation becomes

ax² + 2bxy + cy² + f = 0    (8)

and is said to represent a central conic. These include circles, ellipses, and hyperbolas, but not parabolas. Furthermore, if b = 0 in 8, then there is no cross product term (i.e., no term involving xy), and the equation

ax² + cy² + f = 0    (9)

is said to represent a central conic in standard position. The most important conics of this type are shown in Table 1.

Table 1

If we take the constant f in Equations 8 and 9 to the right side and let k = −f, then we can rewrite these equations in matrix form as

(10)

The first of these corresponds to Equation 8 in which there is a cross product term 2bxy, and the second corresponds to Equation 9 in which there is no cross product term. Geometrically, the existence of a cross product term signals that the graph of the quadratic form is rotated about the origin, as in Figure 7.3.2. The three-dimensional analogs of the equations in 10 are

(11)

If a, b, and c are not all zero, then the graphs of these equations in R^3 are called central quadrics in standard position.

Identifying Conic Sections We are now ready to consider the first of the three problems posed earlier, identifying the curve or surface represented by an equation in two or three variables. We will focus on the two-variable case. We noted above that an equation of the form

(12) represents a central conic. If b = 0, then the conic is in standard position, and if b ≠ 0, it is rotated. It is an easy matter to identify central conics in standard position by matching the equation with one of the standard forms. For example, the equation can be rewritten as

which, by comparison with Table 1, is the ellipse shown in Figure 7.3.3.

Figure 7.3.3 If a central conic is rotated out of standard position, then it can be identified by first rotating the coordinate axes to put it in standard position and then matching the resulting equation with one of the standard forms in Table 1. To find a rotation that eliminates the cross product term in the equation

(13)

it will be convenient to express the equation in the matrix form

(14)

and look for a change of variable x = Px′ that diagonalizes A and for which det(P) = 1. Since we saw in Example 4 of Section 7.1 that the transition matrix

(15)

has the effect of rotating the xy-axes of a rectangular coordinate system through an angle θ, our problem reduces to finding an angle θ for which P diagonalizes A, thereby eliminating the cross product term in 13. If we make this change of variable, then in the x′y′-coordinate system Equation 14 will become

(16)

where λ1 and λ2 are the eigenvalues of A. The conic can now be identified by writing 16 in the form

(17)

and performing the necessary algebra to match it with one of the standard forms in Table 1. For example, if λ1, λ2, and k are positive, then 17 represents an ellipse with an axis of length 2√(k/λ1) in the x′-direction and 2√(k/λ2) in the y′-direction. The first column vector of P, which is a unit eigenvector corresponding to λ1, is along the positive x′-axis; and the second column vector of P, which is a unit eigenvector corresponding to λ2, is a unit vector along the y′-axis. These are called the principal axes of the ellipse, which explains why Theorem 7.3.1 is called “the principal axes theorem.” (See Figure 7.3.4.)

Figure 7.3.4

E X A M P L E 3 Identifying a Conic by Eliminating the Cross Product Term (a) Identify the conic whose equation is

by rotating the xy-axes to put the conic in

standard position. (b) Find the angle θ through which you rotated the xy-axes in part (a). Solution (a) The given equation can be written in the matrix form where

The characteristic polynomial of A is

so the eigenvalues are are

and

. We leave it for you to show that orthonormal bases for the eigenspaces

Thus, A is orthogonally diagonalized by

(18)

Had it turned out that det(P) = −1, then we would have interchanged the columns to reverse the sign.

Moreover, it happens by chance that det(P) = 1, so we are assured that the substitution x = Px′ performs a rotation of axes. It follows from 16 that the equation of the conic in the x′y′-coordinate system is

which we can write as

We can now see from Table 1 that the conic is an ellipse whose axes have lengths 6 and 4, one along the x′-axis and the other along the y′-axis.

(b) It follows from 15 that

which implies that

Thus,

(Figure 7.3.5).

Figure 7.3.5

Remark In the exercises we will ask you to show that if b ≠ 0, then the cross product term in the equation

can be eliminated by a rotation through an angle θ that satisfies

cot 2θ = (a − c)/2b    (19)

We leave it for you to confirm that this is consistent with part (b) of the last example.

Positive Definite Quadratic Forms We will now consider the second of the two problems posed earlier, determining conditions under which x^TAx > 0 for all nonzero values of x. We will explain why this is important shortly, but first we introduce some terminology.

The terminology in Definition 1 also applies to the matrix A; that is, A is positive definite, negative definite, or indefinite in accordance with whether the associated quadratic form has that property.

DEFINITION 1 A quadratic form x^TAx is said to be

positive definite if x^TAx > 0 for x ≠ 0

negative definite if x^TAx < 0 for x ≠ 0

indefinite if x^TAx has both positive and negative values

The following theorem, whose proof is deferred to the end of the section, provides a way of using eigenvalues to determine whether a matrix A and its associated quadratic form are positive definite, negative definite, or indefinite.

THEOREM 7.3.2 If A is a symmetric matrix, then:

(a) x^TAx is positive definite if and only if all eigenvalues of A are positive.

(b) x^TAx is negative definite if and only if all eigenvalues of A are negative.

(c) x^TAx is indefinite if and only if A has at least one positive eigenvalue and at least one negative eigenvalue.

Remark The three classifications in Definition 1 do not exhaust all of the possibilities. For example, a quadratic form for which x^TAx ≥ 0 if x ≠ 0 is called positive semidefinite, and one for which x^TAx ≤ 0 if x ≠ 0 is called negative semidefinite. Every positive definite form is positive semidefinite, but not conversely, and every negative definite form is negative semidefinite, but not conversely (why?). By adjusting the proof of Theorem 7.3.2 appropriately, one can prove that x^TAx is positive semidefinite if and only if all eigenvalues of A are nonnegative and is negative semidefinite if and only if all eigenvalues of A are nonpositive.

E X A M P L E 4 Positive Definite Quadratic Forms It is not usually possible to tell from the signs of the entries in a symmetric matrix A whether that matrix is positive definite, negative definite, or indefinite. For example, the entries of the matrix

are nonnegative, but the matrix is indefinite since its eigenvalues are , 4, and  (verify). To see this another way, let us write out the quadratic form as

We can now see, for example, that the quadratic form takes both positive and negative values for appropriately chosen x.

Positive definite and negative definite matrices are invertible. Why?
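Theorem 7.3.2 translates directly into a small numerical test. The following sketch is an illustration only (the sample matrices are made up, and the function name classify_definiteness is hypothetical); it classifies a symmetric matrix by the signs of its eigenvalues.

import numpy as np

def classify_definiteness(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues (Theorem 7.3.2)."""
    lams = np.linalg.eigvalsh(A)
    if np.all(lams > tol):
        return "positive definite"
    if np.all(lams < -tol):
        return "negative definite"
    if np.any(lams > tol) and np.any(lams < -tol):
        return "indefinite"
    return "positive semidefinite" if np.all(lams >= -tol) else "negative semidefinite"

print(classify_definiteness(np.array([[2.0, -1.0], [-1.0, 2.0]])))   # positive definite
print(classify_definiteness(np.array([[0.0, 1.0], [1.0, 0.0]])))     # indefinite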

Classifying Conic Sections Using Eigenvalues If x^TAx = k is the equation of a conic, and if k ≠ 0, then we can divide through by k and rewrite the equation in the form

(20)

If we now rotate the coordinate axes to eliminate the cross product term (if any) in this equation, then the equation of the conic in the new coordinate system will be of the form

(21)

in which λ1 and λ2 are the eigenvalues of A. The particular type of conic represented by this equation will depend on the signs of these eigenvalues. For example, you should be able to see from 21 that:

• 21 represents an ellipse if λ1 > 0 and λ2 > 0.

• 21 has no graph if λ1 < 0 and λ2 < 0.

• 21 represents a hyperbola if λ1 and λ2 have opposite signs.

In the case of the ellipse, Equation 21 can be rewritten as

(22)

so the axes of the ellipse have lengths 2/√λ1 and 2/√λ2 (Figure 7.3.6).

Figure 7.3.6 The following theorem is an immediate consequence of this discussion and Theorem 7.3.2.

THEOREM 7.3.3 If A is a symmetric 2 × 2 matrix, then:

(a) x^TAx = 1 represents an ellipse if A is positive definite.

(b) x^TAx = 1 has no graph if A is negative definite.

(c) x^TAx = 1 represents a hyperbola if A is indefinite.

In Example 3 we performed a rotation to show that the equation  represents an ellipse with a major axis of length 6 and a minor axis of length 4. This conclusion can also be obtained by rewriting the equation in the form  and showing that the associated matrix  has positive eigenvalues, so the matrix A is positive definite and the equation represents an ellipse. Moreover, it follows from 21 that the axes of the ellipse have lengths 6 and 4, which is consistent with Example 3.
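The eigenvalue test above is easy to automate. The function below is a minimal sketch (not from the text; the function name and the sample coefficients are invented) that classifies the central conic ax² + 2bxy + cy² = 1 by the signs of the eigenvalues of the associated symmetric matrix, in the spirit of Theorem 7.3.3.

import numpy as np

def classify_central_conic(a, b, c):
    """Classify a*x^2 + 2*b*x*y + c*y^2 = 1 via the eigenvalues of [[a, b], [b, c]]."""
    lam1, lam2 = np.linalg.eigvalsh(np.array([[a, b], [b, c]]))
    if lam1 > 0 and lam2 > 0:
        return "ellipse"                   # positive definite
    if lam1 < 0 and lam2 < 0:
        return "no graph"                  # negative definite
    if lam1 * lam2 < 0:
        return "hyperbola"                 # indefinite
    return "degenerate or inconclusive"    # a zero eigenvalue

print(classify_central_conic(5, -2, 8))    # ellipse (eigenvalues 4 and 9)
print(classify_central_conic(1, 2, 1))     # hyperbola (eigenvalues 3 and -1)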

Identifying Positive Definite Matrices Positive definite matrices are the most important symmetric matrices in applications, so it will be useful to learn a little more about them. We already know that a symmetric matrix is positive definite if and only if its eigenvalues are all positive; now we will give a criterion that can be used to determine whether a symmetric matrix is positive definite without finding the eigenvalues. For this purpose we define the kth principal submatrix of an n × n matrix A to be the submatrix consisting of the first k rows and columns of A. For example, here are the principal submatrices of a general matrix:

The following theorem, which we state without proof, provides a determinant test for ascertaining whether a symmetric matrix is positive definite.

THEOREM 7.3.4 A symmetric matrix A is positive definite if and only if the determinant of every principal submatrix is positive.

E X A M P L E 5 Working with Principal Submatrices

The matrix

is positive definite since the determinants

are all positive. Thus, we are guaranteed that all eigenvalues of A are positive and x^TAx > 0 for x ≠ 0.
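The determinant test of Theorem 7.3.4 is also easy to carry out by machine. Below is a minimal sketch (not part of the text; the matrix and helper name are invented) that computes the determinants of the principal submatrices.

import numpy as np

def principal_minors(A):
    """Return det(A_k) for k = 1, ..., n, where A_k is the kth principal submatrix."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

minors = principal_minors(A)
print(minors)                              # approximately [2.0, 3.0, 4.0]
print(all(m > 0 for m in minors))          # True, so A is positive definite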

OPTIONAL

We conclude this section with an optional proof of Theorem 7.3.2.

Proofs of Theorem 7.3.2(a) and (b) It follows from the principal axes theorem (Theorem 7.3.1) that there is an orthogonal change of variable x = Py for which

x^TAx = λ1y1² + λ2y2² + ⋯ + λnyn²    (23)

where the λ's are the eigenvalues of A. Moreover, it follows from the invertibility of P that y ≠ 0 if and only if x ≠ 0, so the values of x^TAx for x ≠ 0 are the same as the values of y^TDy for y ≠ 0. Thus, it follows from 23 that x^TAx > 0 for x ≠ 0 if and only if all of the λ's in that equation are positive, and that x^TAx < 0 for x ≠ 0 if and only if all of the λ's are negative. This proves parts (a) and (b).

Proof (c) Assume that A has at least one positive eigenvalue and at least one negative eigenvalue, and to be specific, suppose that λ1 > 0 and λ2 < 0 in 23. Then choosing y with y1 = 1 and all other components 0 gives x^TAx = λ1 > 0, and choosing y with y2 = 1 and all other components 0 gives x^TAx = λ2 < 0, which proves that x^TAx is indefinite. Conversely, if x^TAx > 0 for some x, then y^TDy > 0 for some y, so at least one of the λ's in 23 must be positive. Similarly, if x^TAx < 0 for some x, then y^TDy < 0 for some y, so at least one of the λ's in 23 must be negative, which completes the proof.

Concept Review • Linear form • Quadratic form • Cross product term • Quadratic form associated with a matrix • Change of variable • Orthogonal change of variable • Principal Axes Theorem • Conic section • Degenerate conic • Central conic • Standard position of a central conic • Standard form of a central conic • Central quadric • Principal axes of an ellipse • Positive definite quadratic form • Negative definite quadratic form • Indefinite quadratic form • Positive semidefinite quadratic form • Negative semidefinite quadratic form • Principal submatrix

Skills • Express a quadratic form in the matrix notation

, where A is a symmetric matrix.

• Find an orthogonal change of variable that eliminates the cross product terms in a quadratic form, and express the quadratic form in terms of the new variables. • Identify a conic section from an equation by rotating axes to place the conic in standard position, and find the angle of rotation. • Identify a conic section using eigenvalues. • Classify matrices and quadratic forms as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.

Exercise Set 7.3 In Exercises 1–2, express the quadratic form in the matrix notation 1. (a) (b) (c) Answer: (a) (b) (c)

2. (a) (b)

, where A is a symmetric matrix.

(c) In Exercises 3–4, find a formula for the quadratic form that does not use matrices. 3.

Answer:

4.

In Exercises 5–8, find an orthogonal change of variables that eliminates the cross product terms in the quadratic form Q, and express Q in terms of the new variables. 5. Answer:

6. 7. Answer:

8. In Exercises 9–10, express the quadratic equation in the matrix form quadratic form and K is an appropriate matrix. 9. (a) (b) Answer: (a)

, where

is the associated

(b) 10. (a) (b) In Exercises 11–12, identify the conic section represented by the equation. 11. (a) (b) (c) (d) Answer: (a) ellipse (b) hyperbola (c) parabola (d) circle 12. (a) (b) (c) (d) In Exercises 13–16, identify the conic section represented by the equation by rotating axes to place the conic in standard position. Find an equation of the conic in the rotated coordinates, and find the angle of rotation. 13. Answer: Hyperbola: 14. 15. Answer: Hyperbola: 16. In Exercises 17–18, determine by inspection whether the matrix is positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite. 17. (a) (b)

(c) (d) (e)

Answer: (a) Positive definite (b) Negative definite (c) Indefinite (d) Positive semidefinite (e) Negative semidefinite 18. (a) (b) (c) (d) (e) In Exercise 19–24, classify the quadratic form as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite. 19. Answer: Positive definite 20. 21. Answer: Positive semidefinite 22. 23. Answer: Indefinite 24. In Exercises 25–26, show that the matrix A is positive definite first by using Theorem 7.3.2 and second by using Theorem 7.3.4. 25.

(a) (b)

26. (a) (b)

In Exercises 27–28, find all values of k for which the quadratic form is positive definite. 27. Answer: 28. 29. Let

be a quadratic form in the variables

(a) Show that

, and define

by

.

.

(b) Show that 30. Express the quadratic form

in the matrix notation

, where A is symmetric.

31. In statistics, the quantities

and

are called, respectively, the sample mean and sample variance of (a) Express the quadratic form (b) Is

in the matrix notation

.

, where A is symmetric.

a positive definite quadratic form? Explain.

Answer: (a)

(b) Yes 32. The graph in an xyz-coordinate system of an equation of form

in which a, b, and c are positive is a

surface called a central ellipsoid in standard position (see the accompanying figure). This is the three-dimensional generalization of the ellipse in the xy-plane. The intersections of the ellipsoid with the

coordinate axes determine three line segments called the axes of the ellipsoid. If a central ellipsoid is rotated about the origin so two or more of its axes do not coincide with any of the coordinate axes, then the resulting equation will have one or more cross product terms. (a) Show that the equation

represents an ellipsoid, and find the lengths of its axes. [Suggestion: Write the equation in the form an orthogonal change of variable to eliminate the cross product terms. (b) What property must a symmetric

matrix have in order for the equation

and make

to represent an ellipsoid?

Figure Ex-32 33. What property must a symmetric

matrix A have for

to represent a circle?

Answer: A must have a positive eigenvalue of multiplicity 2.

34. Prove: If

, then the cross product term can be eliminated from the quadratic form

by rotating the

coordinate axes through an angle θ that satisfies the equation

35. Prove that if A is an

symmetric matrix all of whose eigenvalues are nonnegative, then

.

True-False Exercises In parts (a)–(l) determine whether the statement is true or false, and justify your answer. (a) A symmetric matrix with positive definite eigenvalues is positive definite. Answer: True (b)

is a quadratic form. Answer: False

(c)

is a quadratic form. Answer: True

(d) A positive definite matrix is invertible. Answer:

for all nonzero x in

True (e) A symmetric matrix is either positive definite, negative definite, or indefinite. Answer: False (f) If A is positive definite, then

is negative definite.

Answer: True (g)

is a quadratic form for all x in

.

Answer: True (h) If

is a positive definite quadratic form, then so is

.

Answer: True (i) If A is a matrix with only positive eigenvalues, then

is a positive definite quadratic form.

Answer: False (j) If A is a

symmetric matrix with positive entries and

, then A is positive definite.

Answer: True (k) If

is a quadratic form with no cross product terms, then A is a diagonal matrix.

Answer: False (l) If

is a positive definite quadratic form in two variables and

Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

, then the graph of the equation

is an ellipse.

7.4 Optimization Using Quadratic Forms Quadratic forms arise in various problems in which the maximum or minimum value of some quantity is required. In this section we will discuss some problems of this type.

Constrained Extremum Problems Our first goal in this section is to consider the problem of finding the maximum and minimum values of a quadratic form x^TAx subject to the constraint ||x|| = 1. Problems of this type arise in a wide variety of applications. To visualize this problem geometrically in the case where z = x^TAx is a quadratic form on R^2, view z = x^TAx as the equation of some surface in a rectangular xyz-coordinate system and view ||x|| = 1 as the unit circle centered at the origin of the xy-plane. Geometrically, the problem of finding the maximum and minimum values of x^TAx subject to the requirement ||x|| = 1 amounts to finding the highest and lowest points on the intersection of the surface with the right circular cylinder determined by the circle (Figure 7.4.1).

Figure 7.4.1 The following theorem, whose proof is deferred to the end of the section, is the key result for solving problems of this type.

THEOREM 7.4.1 Constrained Extremum Theorem Let A be a symmetric n × n matrix whose eigenvalues in order of decreasing size are λ1 ≥ λ2 ≥ ⋯ ≥ λn. Then:

(a) the quadratic form x^TAx attains a maximum value and a minimum value on the set of vectors for which ||x|| = 1;

(b) the maximum value attained in part (a) occurs at a unit vector corresponding to the eigenvalue λ1;

(c) the minimum value attained in part (a) occurs at a unit vector corresponding to the eigenvalue λn.

Remark The condition ||x|| = 1 in this theorem is called a constraint, and the maximum or minimum value of x^TAx subject to the constraint is called a constrained extremum. This constraint can also be expressed as x^Tx = 1 or as x · x = 1, when convenient.

E X A M P L E 1 Finding Constrained Extrema Find the maximum and minimum values of the quadratic form subject to the constraint

.

Solution The quadratic form can be expressed in matrix notation as

We leave it for you to show that the eigenvalues of A are eigenvectors are

and

and that corresponding

Normalizing these eigenvectors yields

(1)

Thus, the constrained extrema are

Remark Since the negatives of the eigenvectors in 1 are also unit eigenvectors, they too produce the maximum and minimum values of z; that is, the constrained maximum also occurs at the point  and the constrained minimum at .
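Numerically, Theorem 7.4.1 amounts to an eigenvalue computation. The sketch below is an illustration only (the symmetric matrix is an arbitrary stand-in, not the matrix from Example 1); it finds the constrained maximum and minimum of x^TAx on the unit circle and the unit vectors at which they occur.

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])                # an arbitrary symmetric matrix

lams, U = np.linalg.eigh(A)               # ascending eigenvalues, orthonormal eigenvectors
min_value, max_value = lams[0], lams[-1]
min_vector, max_vector = U[:, 0], U[:, -1]

print(max_value, max_vector)              # constrained maximum and where it occurs
print(min_value, min_vector)              # constrained minimum and where it occurs

# Sanity check against many unit vectors on the circle
thetas = np.linspace(0, 2*np.pi, 1000)
vals = [np.array([np.cos(t), np.sin(t)]) @ A @ np.array([np.cos(t), np.sin(t)]) for t in thetas]
print(max(vals) <= max_value + 1e-9, min(vals) >= min_value - 1e-9)   # True True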

E X A M P L E 2 A Constrained Extremum Problem A rectangle is to be inscribed in the ellipse , as shown in Figure 7.4.2. Use

, as shown in Figure 7.4.2.Use

eigenvalue methods to find nonnegative values of x and y that produce the inscribed rectangle with maximum area.

Figure 7.4.2 A rectangle inscribed in the ellipse

.

Solution The area z of the inscribed rectangle is given by

, so the problem is to maximize

the quadratic form

. In this problem, the graph of

subject to the constraint

the constraint equation is an ellipse rather than the unit circle as required in Theorem 7.4.1, but we can remedy this problem by rewriting the constraint as

and defining new variables,

and

, by the equations

This enables us to reformulate the problem as follows: subject to the constraint To solve this problem, we will write the quadratic form

We now leave it for you to show that the largest eigenvalue of A is corresponding unit eigenvector with nonnegative entries is

Thus, the maximum area is

as

and that the only

, and this occurs when

Constrained Extrema and Level Curves A useful way of visualizing the behavior of a function f(x, y) of two variables is to consider the curves in the xy-plane along which f(x, y) is constant. These curves have equations of the form

f(x, y) = k

and are called the level curves of f (Figure 7.4.3). In particular, the level curves of a quadratic form x^TAx on R^2 have equations of the form

x^TAx = k    (2)

so the maximum and minimum values of x^TAx subject to the constraint ||x|| = 1 are the largest and smallest values of k for which the graph of 2 intersects the unit circle. Typically, such values of k produce level curves that just touch the unit circle (Figure 7.4.4), and the coordinates of the points where the level curves just touch produce the vectors that maximize or minimize x^TAx subject to the constraint ||x|| = 1.

Figure 7.4.3

Figure 7.4.4

E X A M P L E 3 Example 1 Revisited Using Level Curves In Example 1 (and its following remark) we found the maximum and minimum values of the quadratic form subject to the constraint

. We showed that the constrained maximum is

, and this is

attained at the points (3) and that the constrained minimum

, and this is attained at the points (4)

Geometrically, this means that the level curve

should just touch the unit

circle at the points in 3, and the level curve

should just touch it at the points

in 4. All of this is consistent with Figure 7.4.5.

Figure 7.4.5

CALCULUS REQUIRED

Relative Extrema of Functions of Two Variables We will conclude this section by showing how quadratic forms can be used to study characteristics of real-valued functions of two variables. Recall that if a function f(x, y) has first-order partial derivatives, then its relative maxima and minima, if any, occur at points where

f_x(x, y) = 0 and f_y(x, y) = 0

These are called critical points of f. The specific behavior of f at a critical point (x0, y0) is determined by the sign of

f(x, y) − f(x0, y0)    (5)

at points (x, y) that are close to, but different from, (x0, y0):

• If f(x, y) − f(x0, y0) > 0 at points (x, y) that are sufficiently close to, but different from, (x0, y0), then f(x0, y0) < f(x, y) at such points and f is said to have a relative minimum at (x0, y0) (Figure 7.4.6a).

• If f(x, y) − f(x0, y0) < 0 at points (x, y) that are sufficiently close to, but different from, (x0, y0), then f(x0, y0) > f(x, y) at such points and f is said to have a relative maximum at (x0, y0) (Figure 7.4.6b).

• If f(x, y) − f(x0, y0) has both positive and negative values inside every circle centered at (x0, y0), then there are points (x, y) that are arbitrarily close to (x0, y0) at which f(x, y) > f(x0, y0) and points that are arbitrarily close to (x0, y0) at which f(x, y) < f(x0, y0). In this case we say that f has a saddle point at (x0, y0) (Figure 7.4.6c).

Figure 7.4.6 In general, it can be difficult to determine the sign of 5 directly. However, the following theorem, which is proved in calculus, makes it possible to analyze critical points using derivatives.

THEOREM 7.4.2 Second Derivative Test Suppose that is a critical point of derivatives in some circular region centered at

and that f has continuous second-order partial . Then:

(a) f has a relative minimum at

if

(b) f has a relative maximum at

if

(c) f has a saddle point at

if

(d) The test is inconclusive if

Our interest here is in showing how to reformulate this theorem using properties of symmetric matrices. For this purpose we consider the symmetric matrix

which is called the Hessian or Hessian matrix of f in honor of the German mathematician and scientist Ludwig Otto Hesse (1811–1874). The notation emphasizes that the entries in the matrix depend on x and y. The Hessian is of interest because

is the expression that appears in Theorem 7.4.2. We can now reformulate the second derivative test as follows.

THEOREM 7.4.3 Hessian Form of the Second Derivative Test Suppose that is a critical point of derivatives in some circular region centered at

and that f has continuous second-order partial . If is the Hessian of f at

(a) f has a relative minimum at

if

is positive definite.

(b) f has a relative maximum at

if

is negative definite.

(c) f has a saddle point at

if

, then:

is indefinite.

(d) The test is inconclusive otherwise.

We will prove part (a). The proofs of the remaining parts will be left as exercises. Proof (a) If is positive definite, then Theorem 7.3.4 implies that the principal submatrices of have positive determinants. Thus,

and so f has a relative minimum at

by part (a) of Theorem 7.4.2.

E X A M P L E 4 Using the Hessian to Classify Relative Extrema Find the critical points of the function

and use the eigenvalues of the Hessian matrix at those points to determine which of them, if any, are relative maxima, relative minima, or saddle points. Solution To find both the critical points and the Hessian matrix we will need to calculate the first and second partial derivatives of f. These derivatives are

Thus, the Hessian matrix is

To find the critical points we set

and

equal to zero. This yields the equations

Solving the second equation yields or . Substituting in the first equation and solving for y yields or ; and substituting into the first equation and solving for x yields or . Thus, we have four critical points: Evaluating the Hessian matrix at these points yields

We leave it for you to find the eigenvalues of these matrices and deduce the following classifications of the stationary points:

Critical Point (x0, y0)    λ1    λ2    Classification
(0, 0)                      8    −8    Saddle point
(0, 8)                      8    −8    Saddle point
(4, 4)                      8     8    Relative minimum
(−4, 4)                    −8    −8    Relative maximum
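A table like the one above can be produced mechanically once the Hessian is known. The sketch below is only an illustration (the function used, f(x, y) = x³ − 3xy + y³, is a hypothetical stand-in rather than the function of Example 4); it evaluates the Hessian at a critical point and classifies the point from the eigenvalue signs, as in Theorem 7.4.3.

import numpy as np

def hessian(x, y):
    # Hessian of the hypothetical function f(x, y) = x^3 - 3*x*y + y^3
    return np.array([[6*x, -3.0],
                     [-3.0, 6*y]])

def classify(x, y, tol=1e-12):
    lams = np.linalg.eigvalsh(hessian(x, y))
    if np.all(lams > tol):
        return "relative minimum"          # Hessian positive definite
    if np.all(lams < -tol):
        return "relative maximum"          # Hessian negative definite
    if lams[0] < -tol and lams[-1] > tol:
        return "saddle point"              # Hessian indefinite
    return "test inconclusive"

# The critical points of this particular f are (0, 0) and (1, 1)
print(classify(0, 0))                      # saddle point
print(classify(1, 1))                      # relative minimum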

OPTIONAL

We conclude this section with an optional proof of Theorem 7.4.1. Proof of Theorem 7.4.1 The first step in the proof is to show that has constrained maximum and minimum values for . Since A is symmetric, the principal axes theorem (Theorem 7.3.1) implies that there is an orthogonal change of variable such that

(6) in which are the eigenvalues of A. Let us assume that (which are unit eigenvectors of A) have been ordered so that

and that the column vectors of P

(7) Since the matrix P is orthogonal, multiplication by P is length preserving, so that

; that is,

It follows from this equation and 7 that

and hence from 6 that This shows that all values of for which lie between the largest and smallest eigenvalues of A. Now let x be a unit eigenvector corresponding to . Then

which shows that of A corresponding to

so to

has as a constrained maximum and that this maximum occurs if x is a unit eigenvector . Similarly, if x is a unit eigenvector corresponding to , then

has as a constrained minimum and this minimum occurs if x is a unit eigenvector of A corresponding . This completes the proof.

Concept Review • Constraint • Constrained extremum • Level curve • Critical point • Relative minimum • Relative maximum • Saddle point • Second derivative test • Hessian matrix

Skills • Find the maximum and minimum values of a quadratic form subject to a constraint. • Find the critical points of a real-valued function of two variables, and use the eigenvalues of the Hessian matrix at the critical points to classify them as relative maxima, relative minima, or saddle points.

Exercise Set 7.4 In Exercises 1–4, find the maximum and minimum values of the given quadratic form subject to the constraint , and determine the values of x and y at which the maximum and minimum occur. 1. Answer: Maximum: 5 at

and

; minimum:

at

and

2. xy 3. Answer: Maximum: 7 at (0, 1) and (0, −1); minimum: 3 at (1, 0) and (−1, 0) 4. In Exercises 5–6, find the maximum and minimum values of the given quadratic form subject to the constraint and determine the values of x, y, and z at which the maximum and minimum occur. 5.

Answer: Maximum: 9 at (1, 0, 0) and (−1, 0, 0); minimum: 3 at (0, 0, 1) and (0, 0, −1) 6. 7. Use the method of Example 2 to find the maximum and minimum values of xy subject to the constraint . Answer: Maximum:

at

and

minimum:

at

and 8. Use the method of Example 2 to find the maximum and minimum values of constraint

subject to the

.

In Exercises 9–10, draw the unit circle and the level curves corresponding to the given quadratic form. Show that the unit circle intersects each of these curves in exactly two places, label the intersection points, and verify that the constrained extrema occur at those points. 9. Answer:

10. xy 11. (a)

Show that the function

has critical points at (0, 0), (1, 1), and

.

(b) Use the Hessian form of the second derivative test to show f has relative maxima at (1, 1) and and a saddle point at (0, 0). 12. (a)

Show that the function

has critical points at (0, 0) and

(b) Use the Hessian form of the second derivative test to show f has a relative maximum at saddle point at (0, 0).

. and a

In Exercises 10–13, find the critical points of f, if any, and classify them as relative maxima, relative minima, or saddle points.

13. Answer: Critical points: (−1, 1), relative maximum; (0, 0), saddle point 14. 15. Answer: Critical points: (0, 0), relative minimum; (2, 1) and (−2, 1), saddle points 16. 17. A rectangle whose center is at the origin and whose sides are parallel to the coordinate axes is to be inscribed in the ellipse . Use the method of Example 2 to find nonnegative values of x and y that produce the inscribed rectangle with maximum area. Answer: Corner points: 18. Suppose that the temperature at a point

on a metal plate is

. An ant, walking

on the plate, traverses a circle of radius 5 centered at the origin. What are the highest and lowest temperatures encountered by the ant? 19. (a) Show that the functions

have a critical point at (0, 0) but the second derivative test is inconclusive at that point. (b) Give a reasonable argument to show that f has a relative minimum at (0, 0) and g has a saddle point at (0, 0). 20. Suppose that the Hessian matrix of a certain quadratic form

is

What can you say about the location and classification of the critical points of f? 21. Suppose that A is an

symmetric matrix and

where x is a vector in that is expressed in column form. What can you say about the value of q if x is a unit eigenvector corresponding to an eigenvalue λ of A? Answer:

22. Prove: If is a quadratic form whose minimum and maximum values subject to the constraint are m and M, respectively, then for each number c in the interval , there is a unit vector . [Hint: In the case where , let and be unit eigenvectors of A such that and , and let

Show that

such that

.]

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a) A quadratic form must have either a maximum or minimum value. Answer: False (b) The maximum value of a quadratic form corresponding to the largest eigenvalue of A.

subject to the constraint

occurs at a unit eigenvector

Answer: True (c) The Hessian matrix of a function f with continuous second-order partial derivatives is a symmetric matrix. Answer: True (d) If is a critical point of a function f and the Hessian of f at maximum nor a relative minimum at .

is 0, then f has neither a relative

Answer: False (e) If A is a symmetric matrix and negative.

, then the minimum of

Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

subject to the constraint

is

7.5 Hermitian, Unitary, and Normal Matrices We know that every real symmetric matrix is orthogonally diagonalizable and that the real symmetric matrices are the only orthogonally diagonalizable matrices. In this section we will consider the diagonalization problem for complex matrices.

Hermitian and Unitary Matrices The transpose operation is less important for complex matrices than for real matrices. A more useful operation for complex matrices is given in the following definition.

DEFINITION 1 If A is a complex matrix, then the conjugate transpose of A, denoted by

, is defined by (1)

Remark Since part (b) of Theorem 5.3.2 states that conjugation operations are performed in computing has real entries we have

, so

, the order in which the transpose and does not matter. Moreover, in the case where A

is the same as

for real matrices.

E X A M P L E 1 Conjugate Transpose Find the conjugate transpose

of the matrix

Solution We have
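In NumPy the conjugate transpose is obtained by combining conjugation with transposition. The short illustration below is not part of the text, and the complex matrix used is an arbitrary example.

import numpy as np

A = np.array([[1 + 1j, -2j, 3],
              [4, 5 - 1j, 1j]])

A_star = A.conj().T                        # the conjugate transpose of A

print(A_star.shape)                        # (3, 2): transposing swaps the size
print(np.allclose(A_star.conj().T, A))     # taking the conjugate transpose twice returns A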

The following theorem, parts of which are given as exercises, shows that the basic algebraic properties of the conjugate transpose operation are similar to those of the transpose (compare to Theorem 1.4.8).

THEOREM 7.5.1 If k is a complex scalar, and if A, B, and C are complex matrices whose sizes are such that the stated operations can be performed, then: (a) (b) (c) (d) (e)

Remark Note that the relationship

in Formula 5 of Section 5.3 can be expressed in terms of the

conjugate transpose as

(2)

We are now ready to define two new classes of matrices that will be important in our study of diagonalization in .

DEFINITION 2 A square complex matrix A is said to be unitary if (3) and is said to be Hermitian * if (4)

Note that a unitary matrix can also be defined

as a square complex matrix A for which

If A is a real matrix, then , in which case 3 becomes and 4 becomes . Thus, the unitary matrices are complex generalizations of the real orthogonal matrices and Hermitian matrices are complex generalizations of the real symmetric matrices.

E X A M P L E 2 Recognizing Hermitian Matrices Hermitian matrices are easy to recognize because their diagonal entries are real (why?), and the entries that are symmetrically positioned across the main diagonal are complex conjugates. Thus, for example, we can tell by inspection that

is Hermitian.

The fact that real symmetric matrices have real eigenvalues is a special case of the following more general result about Hermitian matrices, the proof of which is left for the exercises.

THEOREM 7.5.2 The eigenvalues of a Hermitian matrix are real numbers.

The fact that eigenvectors from different eigenspaces of a real symmetric matrix are orthogonal is a special case of the following more general result about Hermitian matrices.

THEOREM 7.5.3 If A is a Hermitian matrix, then eigenvectors from different eigenspaces are orthogonal.

Proof Let and and the facts that

be eigenvectors of A corresponding to distinct eigenvalues , and we can write

and

. Using Formula 2

This implies that

and hence that

.

E X A M P L E 3 Eigenvalues and Eigenvectors of a Hermitian Matrix Confirm that the Hermitian matrix

has real eigenvalues and that eigenvectors from different eigenspaces are orthogonal. Solution The characteristic polynomial of A is

so the eigenvalues of A are by solving the linear system

with and with systems are

and

, which are real. Bases for the eigenspaces of A can be obtained

. We leave it for you to do this and to show that the general solutions of these

Thus, bases for these eigenspaces are

The vectors

and

are orthogonal since

and hence all scalar multiples of them are also orthogonal.

Unitary matrices are not usually easy to recognize by inspection. However, the following analog of Theorems 7.1.1 and 7.1.3, part of which is proved in the exercises, provides a way of ascertaining whether a matrix is

unitary without computing its inverse.

THEOREM 7.5.4 If A is an n × n matrix with complex entries, then the following are equivalent.

(a) A is unitary.

(b) ||Ax|| = ||x|| for all x in C^n.

(c) Ax · Ay = x · y for all x and y in C^n.

(d) The column vectors of A form an orthonormal set in C^n with respect to the complex Euclidean inner product.

(e) The row vectors of A form an orthonormal set in C^n with respect to the complex Euclidean inner product.

E X A M P L E 4 A Unitary Matrix Use Theorem 7.5.4 to show that

is unitary, and then find

.

Solution We will show that the row vectors

are orthonormal. The relevant computations are

Since we now know that A is unitary, it follows that


You can confirm the validity of this result by showing that

.
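The row/column test of Theorem 7.5.4 and the relation between the inverse and the conjugate transpose of a unitary matrix are easy to verify numerically. The sketch below is an illustration only; the unitary matrix used (a rotation combined with complex phase factors) is an arbitrary example, not the matrix of Example 4.

import numpy as np

# An example unitary matrix: complex phase factors times a plane rotation
c, s = np.cos(0.3), np.sin(0.3)
A = np.diag([np.exp(1j*0.7), np.exp(-1j*0.2)]) @ np.array([[c, -s], [s, c]])

A_star = A.conj().T

print(np.allclose(A_star @ A, np.eye(2)))          # A is unitary
print(np.allclose(np.linalg.inv(A), A_star))       # its inverse equals its conjugate transpose
# Each column is a unit vector with respect to the complex Euclidean inner product
print([np.isclose(np.vdot(A[:, j], A[:, j]), 1) for j in range(2)])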

Unitary Diagonalizability Since unitary matrices are the complex analogs of the real orthogonal matrices, the following definition is a natural generalization of orthogonal diagonalizability for real matrices.

DEFINITION 3 A square complex matrix A is said to be unitarily diagonalizable if there is a unitary matrix P such that P*AP is a complex diagonal matrix. Any such matrix P is said to unitarily diagonalize A.

Recall that a real symmetric matrix A has an orthonormal set of n eigenvectors and is orthogonally diagonalized by any matrix whose column vectors are an orthonormal set of eigenvectors of A. Here is the complex analog of that result.

THEOREM 7.5.5 Every n × n Hermitian matrix A has an orthonormal set of n eigenvectors and is unitarily diagonalized by any n × n matrix P whose column vectors form an orthonormal set of eigenvectors of A.

The procedure for unitarily diagonalizing a Hermitian matrix A is exactly the same as that for orthogonally diagonalizing a symmetric matrix:

Unitarily Diagonalizing a Hermitian Matrix Step 1. Find a basis for each eigenspace of A. Step 2. Apply the Gram-Schmidt process to each of these bases to obtain orthonormal bases for the eigenspaces.

Step 3. Form the matrix P whose column vectors are the basis vectors obtained in Step 2. This will be a unitary matrix (Theorem 7.5.4) and will unitarily diagonalize A.

E X A M P L E 5 Unitary Diagonalization of a Hermitian Matrix Find a matrix P that unitarily diagonalizes the Hermitian matrix

Solution We showed in Example 3 that the eigenvalues of A are for the corresponding eigenspaces are

and

and that bases

Since each eigenspace has only one basis vector, the Gram-Schmidt process is simply a matter of normalizing these basis vectors. We leave it for you to show that

Thus, A is unitarily diagonalized by the matrix

Although it is a little tedious, you may want to check this result by showing that P*AP is a diagonal matrix with the eigenvalues of A on its main diagonal.
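The three-step procedure can be carried out by numpy.linalg.eigh, which handles complex Hermitian matrices directly. The sketch below is an illustration only; the Hermitian matrix used is an arbitrary example rather than the matrix of Example 5.

import numpy as np

A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])              # an arbitrary Hermitian matrix

lams, P = np.linalg.eigh(A)                # real eigenvalues; orthonormal eigenvector columns

print(np.allclose(P.conj().T @ P, np.eye(2)))        # P is unitary
D = P.conj().T @ A @ P
print(np.allclose(D, np.diag(lams)))                 # P*AP is diagonal with real entries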

Skew-Symmetric and Skew-Hermitian Matrices In Exercise 37 of Section 1.7 we defined a square matrix with real entries to be skew-symmetric if A^T = −A. A skew-symmetric matrix must have zeros on the main diagonal (why?), and each entry off the main diagonal must be the negative of its mirror image about the main diagonal. Here is an example.

We leave it for you to confirm that

.

The complex analogs of the skew-symmetric matrices are the matrices for which A* = −A. Such matrices are said to be skew-Hermitian.

Since a skew-Hermitian matrix A has the property A* = −A, it must be that A has zeros or pure imaginary numbers on the main diagonal (why?), and that the complex conjugate of each entry off the main diagonal is the negative of its mirror image about the main diagonal. Here is an example.

Normal Matrices Hermitian matrices enjoy many, but not all, of the properties of real symmetric matrices. For example, we know that real symmetric matrices are orthogonally diagonalizable and Hermitian matrices are unitarily diagonalizable. However, whereas the real symmetric matrices are the only orthogonally diagonalizable matrices, the Hermitian matrices do not constitute the entire class of unitarily diagonalizable complex matrices; that is, there exist unitarily diagonalizable matrices that are not Hermitian. Specifically, it can be proved that a square complex matrix A is unitarily diagonalizable if and only if AA* = A*A. Matrices with this property are said to be normal. Normal matrices include the Hermitian, skew-Hermitian, and unitary matrices in the complex case and the symmetric, skew-symmetric, and orthogonal matrices in the real case. The nonzero skew-symmetric matrices are particularly interesting because they are examples of real matrices that are not orthogonally diagonalizable but are unitarily diagonalizable.
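The defining condition for normality is a one-line check in code. The small sketch below is an illustration only (the matrices and the helper name is_normal are invented examples).

import numpy as np

def is_normal(A, tol=1e-10):
    A_star = A.conj().T
    return np.allclose(A @ A_star, A_star @ A, atol=tol)   # tests A A* = A* A

skew_symmetric = np.array([[0.0, 2.0], [-2.0, 0.0]])        # real skew-symmetric: normal
not_normal = np.array([[1.0, 1.0], [0.0, 1.0]])             # not normal

print(is_normal(skew_symmetric))          # True: unitarily (though not orthogonally) diagonalizable
print(is_normal(not_normal))              # False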

A Comparison of Eigenvalues We have seen that Hermitian matrices have real eigenvalues. In the exercises we will ask you to show that the eigenvalues of a skew-Hermitian matrix are either zero or purely imaginary (have real part of zero) and that the eigenvalues of unitary matrices have modulus 1. These ideas are illustrated schematically in Figure 7.5.1.

Figure 7.5.1

Concept Review • Conjugate transpose • Unitary matrix • Hermitian matrix • Unitarily diagonalizable matrix • Skew-symmetric matrix • Skew-Hermitian matrix • Normal matrix

Skills • Find the conjugate transpose of a matrix. • Be able to identify Hermitian matrices. • Find the inverse of a unitary matrix. • Find a unitary matrix that diagonalizes a Hermitian matrix.

Exercise Set 7.5 In Exercises 1–2, find 1.

Answer:

.

2. In Exercises 3–4, substitute numbers for the ×'s so that A is Hermitian. 3.

Answer:

4.

In Exercises 5–6, show that A is not Hermitian for any choice of the ×'s. 5. (a)

(b)

Answer: (a) (b) 6. (a)

(b)

In Exercises 7–8, verify that the eigenvalues of the Hermitian matrix A are real and that eigenvectors from different eigenspaces are orthogonal (see Theorem 7.5.3). 7. 8.

In Exercises 9–12, show that A is unitary, and find

.

9.

Answer:

10.

11.

Answer:

12.

In Exercises 13–18, find a unitary matrix P that diagonalizes the Hermitian matrix A, and determine 13. Answer:

.

14. 15. Answer:

16. 17.

Answer:

18.

In Exercises 19–20, substitute numbers for the ×'s so that A is skew-Hermitian. 19.

Answer:

20.

In Exercises 21–22, show that A is not skew-Hermitian for any choice of the ×'s. 21. (a)

(b)

Answer: (a) (b) 22. (a)

(b)

In Exercises 23–24, verify that the eigenvalues of the skew-Hermitian matrix A are pure imaginary numbers. 23. 24. In Exercises 25–26, show that A is normal. 25.

26.

27. Show that the matrix

is unitary for all real values of θ. [Note: See Formula 17 in Appendix B for the definition of

.]

28. Prove that each entry on the main diagonal of a skew-Hermitian matrix is either zero or a pure imaginary number. 29. Let A be any

matrix with complex entries, and define the matrices B and C to be

(a) Show that B and C are Hermitian. (b) Show that

and

.

(c) What condition must B and C satisfy for A to be normal? Answer: (c) B and C must commute. 30. Show that if A is an in column form, then

matrix with complex entries, and if u and v are vectors in

31. Show that if A is a unitary matrix, then so is

that are expressed

.

32. Show that the eigenvalues of a skew-Hermitian matrix are either zero or purely imaginary. 33. Show that the eigenvalues of a unitary matrix have modulus 1. 34. Show that if u is a nonzero vector in 35. Show that if u is a unit vector in unitary.

that is expressed in column form, then that is expressed in column form, then

is Hermitian. is Hermitian and

36. What can you say about the inverse of a matrix A that is both Hermitian and unitary? 37. Find a

matrix that is both Hermitian and unitary and whose entries are not all real numbers.

Answer:

38. Under what conditions is the following matrix normal?

39. What geometric interpretations might you reasonably give to multiplication by the matrices

and

in Exercises 34 and 35? Answer: Multiplication of x by P corresponds to

times the orthogonal projection of x onto

, then multiplications of x by

. If

corresponds to reflection of x about the hyperplane

. 40. Prove that if A is an invertible matrix, then 41. (a) Prove that

is invertible, and

.

.

(b) Use the result in part (a) and the fact that a square matrix and its transpose have the same determinant to prove that

.

42. Use part (b) of Exercise 41 to prove: (a) If A is Hermitian, then det(A) is real. (b) If A is unitary, then

.

43. Use properties of the transpose and complex conjugate to prove parts (a) and (e) of Theorem 7.5.1. 44. Use properties of the transpose and complex conjugate to prove parts (b) and (d) of Theorem 7.5.1. 45. Prove that an matrix with complex entries is unitary if and only if the columns of A form an orthonormal set in . 46. Prove that the eigenvalues of a Hermitian matrix are real.

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a)

The matrix

is Hermitian.

Answer: False (b)

The matrix

is unitary.

Answer: False (c) The conjugate transpose of a unitary matrix is unitary.

Answer: True (d) Every unitarily diagonalizable matrix is Hermitian. Answer: False (e) A positive integer power of a skew-Hermitian matrix is skew-Hermitian. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

Chapter 7 Supplementary Exercises 1. Verify that each matrix is orthogonal, and find its inverse. (a)

(b)

Answer: (a)

(b)

2. Prove: If Q is an orthogonal matrix, then each entry of Q is the same as its cofactor if the negative of its cofactor if . 3. Prove that if A is a positive definite symmetric matrix, and if u and v vectors in

is an inner product on

in column form, then

.

4. Find the characteristic polynomial and the dimensions of the eigenspaces of the symmetric matrix

5. Find a matrix P that orthogonally diagonalizes

and determine the diagonal matrix

.

and is

Answer:

6. Express each quadratic form in the matrix notation

.

(a) (b) 7. Classify the quadradic form as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite. Answer: positive definite 8. Find an orthogonal change of variable that eliminates the cross product terms in each quadratic form, and express the quadratic form in terms of the new variables. (a) (b) 9. Identify the type of conic section represented by each equation. (a) (b) Answer: (a) parabola (b) parabola 10. Find a unitary matrix U that diagonalizes

and determine the diagonal matrix 11. Show that if U is an then the product

unitary matrix and

.

is also unitary. 12. Suppose that

.

(a) Show that iA is Hermitian. (b) Show that A is unitarily diagonalizable and has pure imaginary eigenvalues.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

CHAPTER

8

Linear Transformations

CHAPTER CONTENTS 8.1. General Linear Transformations 8.2. Isomorphism 8.3. Compositions and Inverse Transformations 8.4. Matrices for General Linear Transformations 8.5. Similarity

INTRODUCTION In Section 4.9 and Section 4.10 we studied linear transformations from to . In this chapter we will define and study linear transformations from a general vector space V to a general vector space W. The results we obtain here have important applications in physics, engineering, and various branches of mathematics.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

8.1 General Linear Transformations Up to now our study of linear transformations has focused on transformations from to . In this section we will turn our attention to linear transformations involving general vector spaces. We will illustrate ways in which such transformations arise, and we will establish a fundamental relationship between general n-dimensional vector spaces and .

Definitions and Terminology In Section 4.9 we defined a matrix transformation

to be a mapping of the form

in which A is an matrix. We subsequently established in Theorem 4.10.2 and Theorem 4.10.3 that the matrix transformations are precisely the linear transformations from to , that is, the transformations with the linearity properties We will use these two properties as the starting point for defining more general linear transformations.

DEFINITION 1 If is a function from a vector space V to a vector space W, then T is called a linear transformation from V to W if the following two properties hold for all vectors u and v in V and for all scalars k: (i)

[Homogeneity property]

(ii)

[Additivity property]

In the special case where V = W, the linear transformation T is called a linear operator on the vector space V.

The homogeneity and additivity properties of a linear transformation can be used in combination to show that if and are vectors in V and and are any scalars, then

More generally, if are vectors in V and are any scalars, then

(1)
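To make the homogeneity and additivity properties of Definition 1 concrete, here is a small numerical check (a sketch, assuming NumPy is available; the matrix A, the vector b, and the test vectors are illustrative choices, not taken from the text). A matrix transformation passes both checks, while a translation by a nonzero vector fails them (translation is revisited in Example 8).

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])   # illustrative matrix
b = np.array([1.0, -1.0])                # illustrative fixed nonzero vector

T_matrix = lambda x: A @ x               # a matrix transformation (linear)
T_translate = lambda x: x + b            # translation by b (not linear)

u, v, k = np.array([1.0, 4.0]), np.array([-2.0, 5.0]), 3.0

for name, T in [("matrix transformation", T_matrix), ("translation", T_translate)]:
    homogeneous = np.allclose(T(k * u), k * T(u))   # property (i)
    additive = np.allclose(T(u + v), T(u) + T(v))   # property (ii)
    print(name, "- homogeneous:", homogeneous, "additive:", additive)
```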

The following theorem is an analog of parts (a) and (d) of Theorem 4.9.1.

THEOREM 8.1.1 If

is a linear transformation, then:

(a)

.

(b)

for all u and v in V.

Proof Let u be any vector in V. Since

, it follows from the homogeneity property in Definition 1 that

which proves (a). We can prove part (b) by rewriting

as

We leave it for you to justify each step. Use the two parts of Theorem 8.1.1 to prove that for all v in V.

E X A M P L E 1 Matrix Transformations Because we have based the definition of a general linear transformation on the homogeneity and additivity properties of matrix transformations, it follows that a matrix transformation is also a linear transformation in this more general sense with and .

E X A M P L E 2 The Zero Transformation Let V and W be any two vector spaces. The mapping such that for every v in V is a linear transformation called the zero transformation. To see that T is linear, observe that Therefore,

E X A M P L E 3 The Identity Operator Let V be any vector space. The mapping defined by is called the identity operator on V. We will leave it for you to verify that I is linear.

E X A M P L E 4 Dilation and Contraction Operators

If V is a vector space and k is any scalar, then the mapping given by is a linear operator on V, for if c is any scalar and if u and v are any vectors in V, then

If , then T is called the contraction of V with factor k, and if , it is called the dilation of V with factor k (Figure 8.1.1).

Figure 8.1.1

E X A M P L E 5 A Linear Transformation from Pn to Pn + 1 Let

be a polynomial in

, and define the transformation

by

This transformation is linear because for any scalar k and any polynomials

and

in

we have

and

E X A M P L E 6 A Linear Transformation Using an Inner Product Let V be an inner product space, let

be any fixed vector in V, and let

be the transformation

that maps a vector x into its inner product with . This transformation is linear, for if k is any scalar, and if u and v are any vectors in V, then it follows from properties of inner products that

E X A M P L E 7 Transformations on Matrix Spaces

Let linear.

be the vector space of

matrices. In each part determine whether the transformation is

(a) (b) Solution (a) It follows from parts (b) and (d) of Theorem 1.4.8 that

so

is linear.

(b) It follows from Formula 1 of Section 2.3 that

Thus, is not homogeneous and hence not linear if . Note that additivity also fails because we showed in Example 1 of Section 2.3 that and are not generally equal.

E X A M P L E 8 Translation Is Not Linear Part (a) of Theorem 8.1.1 states that a linear transformation maps 0 to 0. This property is useful for identifying transformations that are not linear. For example, if is a fixed nonzero vector in , then the transformation has the geometric effect of translating each point x in a direction parallel to through a distance of (Figure 8.1.2). This cannot be a linear transformation since , so T does not map 0 to 0.

Figure 8.1.2

translates each point x along a line parallel to through a distance .

E X A M P L E 9 The Evaluation Transformation

Let V be a subspace of

, let

be distinct real numbers, and let

be the transformation (2)

that associates with f the n-tuple of function values at . We call this the evaluation transformation on V at . Thus, for example, if and if , then

The evaluation transformation in 2 is linear, for if k is any scalar, and if f and g are any functions in V, then

and
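As a computational illustration of an evaluation transformation (a sketch, assuming NumPy; the sample points and the functions below are arbitrary choices used only to demonstrate the linearity checks):

```python
import numpy as np

points = np.array([-1.0, 0.0, 2.0])   # the fixed points x1, x2, x3 (illustrative)

def evaluate(f):
    # Evaluation transformation: f  ->  (f(x1), f(x2), f(x3))
    return np.array([f(x) for x in points])

f = lambda x: x**2 + 1
g = lambda x: 3 * x - 2
k = 5.0

print(np.allclose(evaluate(lambda x: k * f(x)), k * evaluate(f)))                # homogeneity
print(np.allclose(evaluate(lambda x: f(x) + g(x)), evaluate(f) + evaluate(g)))   # additivity
```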

Finding Linear Transformations from Images of Basis Vectors We saw in Formula (12) of Section 4.9 that if is a matrix transformation, say multiplication by A, and if are the standard basis vectors for , then A can be expressed as

It follows from this that the image of any vector in under multiplication by A can be expressed as

This formula tells us that for a matrix transformation the image of any vector is expressible as a linear combination of the images of the standard basis vectors. This is a special case of the following more general result.

THEOREM 8.1.2 Let

be a linear transformation, where V is finite dimensional. If

is a basis

for V, then the image of any vector v in V can be expressed as (3) where

are the coefficients required to express v as a linear combination of the vectors in S.

Proof Express v as

and use the linearity of T.

E X A M P L E 1 0 Computing with Images of Basis Vectors Consider the basis Let

for

, where

be the linear transformation for which

Find a formula for

, and then use that formula to compute

Solution We first need to express as a linear combination of , , and . If we write

then on equating corresponding components, we obtain

which yields

,

,

, so

Thus

From this formula, we obtain
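The calculation in Example 10 amounts to solving a small linear system for the coordinates of x and then combining the images of the basis vectors, as in Theorem 8.1.2. A sketch of the same procedure in NumPy (the basis vectors, their images, and the test vector below are illustrative stand-ins, not necessarily those of the example):

```python
import numpy as np

# Columns of V are the basis vectors v1, v2, v3 of R^3 (illustrative).
V = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0]]).T

# Columns of W are the images T(v1), T(v2), T(v3) in R^2 (illustrative).
W = np.array([[1.0, 0.0],
              [2.0, -1.0],
              [4.0, 3.0]]).T

x = np.array([2.0, -1.0, 5.0])

c = np.linalg.solve(V, x)   # coordinates of x relative to the basis
Tx = W @ c                  # T(x) = c1*T(v1) + c2*T(v2) + c3*T(v3)   [Formula (3)]
print("coordinates:", c)
print("T(x):", Tx)
```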

CALCULUS REQUIRED

E X A M P L E 11 A Linear Transformation from C1(−∞, ∞) to F(−∞, ∞) Let

be the vector space of functions with continuous first derivatives on

be the vector space of all real-valued functions defined on transformation that maps a function into its derivative—that is,

. Let

, and let be the

From the properties of differentiation, we have Thus, D is a linear transformation. CALCULUS REQUIRED

E X A M P L E 1 2 An Integral Transformation Let

be the vector space of continuous functions on the interval be the vector space of functions with continuous first derivatives on

let

, let , and

be the transformation that maps a function f in V into

For example, if

, then

The transformation is linear, for if k is any constant, and if f and g are any functions in V, then properties of the integral imply that

Kernel and Range Recall that if A is an matrix, then the null space of A consists of all vectors x in such that , and by Theorem 4.7.1 the column space of A consists of all vectors b in for which there is at least one vector x in such that . From the viewpoint of matrix transformations, the null space of A consists of all vectors in that multiplication by A maps into 0, and the column space of A consists of all vectors in that are images of at least one vector in under multiplication by A. The following definition extends these ideas to general linear transformations.

DEFINITION 2 If is a linear transformation, then the set of vectors in V that T maps into 0 is called the kernel of T and is denoted by . The set of all vectors in W that are images under T of at least one vector in V is called the range of T and is denoted by .

E X A M P L E 1 3 Kernel and Range of a Matrix Transformation If is multiplication by the matrix A, then, as discussed above, the kernel of is the null space of A, and the range of is the column space of A.
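For a computational illustration (a sketch, assuming SymPy is available; the matrix is an arbitrary example), the kernel and range of a matrix transformation can be obtained from the null space and column space of A:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 1, 1]])          # illustrative matrix

kernel_basis = A.nullspace()      # basis for ker(T_A), the null space of A
range_basis = A.columnspace()     # basis for R(T_A), the column space of A

print("kernel basis:", kernel_basis)
print("range basis:", range_basis)
```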

E X A M P L E 1 4 Kernel and Range of the Zero Transformation Let

be the zero transformation. Since T maps every vector in V into 0, it follows that . Moreover, since 0 is the only image under T of vectors in V, it follows that

.

E X A M P L E 1 5 Kernel and Range of the Identity Operator Let be the identity operator. Since for all vectors in V, every vector in V is the image of some vector (namely, itself); thus . Since the only vector that I maps into 0 is 0, it follows that .

E X A M P L E 1 6 Kernel and Range of an Orthogonal Projection As illustrated in Figure 8.1.3a, the points that T maps into are precisely those on the z-axis, so is the set of points of the form . As illustrated in Figure 8.1.3b, T maps the points in to the xy-plane, where each point in that plane is the image of each point on the vertical line above it. Thus, is the set of points of the form .

Figure 8.1.3

E X A M P L E 1 7 Kernel and Range of a Rotation Let be the linear operator that rotates each vector in the xy-plane through the angle (Figure 8.1.4). Since every vector in the xy-plane can be obtained by rotating some vector through the angle , it follows that . Moreover, the only vector that rotates into 0 is 0, so .

Figure 8.1.4 CALCULUS REQUIRED

E X A M P L E 1 8 Kernel of a Differentiation Transformation Let be the vector space of functions with continuous first derivatives on , let be the vector space of all real-valued functions defined on , and let be the differentiation transformation . The kernel of D is the set of functions in V with derivative zero. From calculus, this is the set of constant functions on .

Properties of Kernel and Range In all of the preceding examples, and turned out to be subspaces. In Example 14, Example 15, and Example 17 they were either the zero subspace or the entire vector space. In Example 16 the kernel was a line through the origin, and the range was a plane through the origin, both of which are subspaces of . All of this is a consequence of the following general theorem.

THEOREM 8.1.3 If

is a linear transformation, then:

(a) The kernel of T is a subspace of V. (b) The range of T is a subspace of W.

Proof (a) To show that is a subspace, we must show that it contains at least one vector and is closed under addition and scalar multiplication. By part (a) of Theorem 8.1.1, the vector 0 is in , so the kernel contains at least one vector. Let and be vectors in , and let k be any scalar. Then

so

is in

. Also,

so

is in

.

Proof (b) To show that is a subspace of W, we must show that it contains at least one vector and is closed under addition and scalar multiplication. However, it contains at least the zero vector of W since by part (a) of Theorem 8.1.1. To prove that it is closed under addition and scalar multiplication, we must show that if and are vectors in , and if k is any scalar, then there exist vectors a and b in V for which

(4) But the fact

and

are in

tells us that there exist vectors

and

in V such that

The following computations complete the proof by showing that the vectors equations in 4:

and

satisfy the

CALCULUS REQUIRED

E X A M P L E 1 9 Application to Differential Equations Differential equations of the form (5) arise in the study of vibrations. The set of all solutions of this equation on the interval kernel of the linear transformation

is the

, given by

It is proved in standard textbooks on differential equations that the kernel is a two-dimensional subspace of

, so that if we can find two linearly independent solutions of 5, then all other solutions

can be expressed as linear combinations of those two. We leave it for you to confirm by differentiating that are solutions of 5. These functions are linearly independent since neither is a scalar multiple of the other, and thus (6) is a “general solution” of 5 in the sense that every choice of and produces a solution, and every solution is of this form.

Rank and Nullity of Linear Transformations In Definition 1 of Section 4.8 we defined the notions of rank and nullity for an matrix, and in Theorem 4.8.2, which we called the Dimension Theorem, we proved that the sum of the rank and nullity is n. We will show next that this result is a special case of a more general result about linear transformations. We start with the following definition.

DEFINITION 3 Let be a linear transformation. If the range of T is finite-dimensional, then its dimension is called the rank of T; and if the kernel of T is finite-dimensional, then its dimension is called the nullity of T. The rank of T is denoted by and the nullity of T by .

The following theorem, whose proof is optional, generalizes Theorem 4.8.2.

THEOREM 8.1.4 Dimension Theorem for Linear Transformations If

is a linear transformation from an n-dimensional vector space V to a vector space W, then (7)

In the special case where A is an matrix and is multiplication by A, the kernel of is the null space of A, and the range of is the column space of A. Thus, it follows from Theorem 8.1.4 that
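A quick numerical check of this rank plus nullity relationship (a sketch, assuming NumPy; the random integer matrix is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6
A = rng.integers(-3, 4, size=(m, n)).astype(float)   # arbitrary 4 x 6 matrix

rank = np.linalg.matrix_rank(A)                       # dim of the column space (range of T_A)

# Nullity computed independently by counting the (numerically) zero singular values.
s = np.linalg.svd(A, compute_uv=False)
tol = max(m, n) * np.finfo(float).eps * s[0]
nullity = int(n - np.sum(s > tol))

print("rank =", rank, " nullity =", nullity, " rank + nullity =", rank + nullity, " n =", n)
```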

OPTIONAL

Proof of Theorem 8.1.4 We must show that

We will give the proof for the case where . The cases where and are left as exercises. Assume , and let be a basis for the kernel. Since is linearly independent, Theorem 4.5.5b states that there are vectors, , such that the extended set is a basis for V. To complete the proof, we will show that the vectors in the set form a basis for the range of T. It will then follow that

First we show that S spans the range of T. If b is any vector in the range of T, then for some vector v in V. Since is a basis for V, the vector v can be written in the form Since

lie in the kernel of T, we have

, so

Thus S spans the range of T. Finally, we show that S is a linearly independent set and consequently forms a basis for the range of T. Suppose that some linear combination of the vectors in S is zero; that is, (8) We must show that

. Since T is linear, 8 can be rewritten as

which says that combination of the basis vectors

is in the kernel of T. This vector can therefore be written as a linear , say

Thus, Since is linearly independent, all of the k's are zero; in particular, , which completes the proof.

Concept Review • Linear transformation • Linear operator • Zero transformation • Identity operator • Contraction • Dilation • Evaluation transformation • Kernel • Range • Rank • Nullity

Skills • Determine whether a function is a linear transformation. • Find a formula for a linear transformation given the values of T on a basis for V. • Find a basis for the kernel of a linear transformation. • Find a basis for the range of a linear transformation. • Find the rank of a linear transformation. • Find the nullity of a linear transformation.

Exercise Set 8.1 In Exercises 1–8, determine whether the function is a linear transformation. Justify your answer. 1.

, where V is an inner product space, and

.

Answer: Nonlinear 2.

, where

3.

is a fixed vector in

, where B is a fixed

and

.

matrix and

.

Answer: Linear 4.

, where

5.

.

, where

.

Answer: Linear 6.

, where (a) (b)

7.

, where (a) (b) Answer: (a) Linear (b) Nonlinear

8.

, where (a) (b)

9. Consider the basis operator for which Find a formula for

for

, where

, and use that formula to find

and

, and let

.

be the linear

Answer:

10. Consider the basis linear transformation such that Find a formula for

for

, where

and

, and use that formula to find

11. Consider the basis for , where be the linear operator for which

Find a formula for

, and let

be the

. ,

, and use that formula to find

, and

, and let

, and

, and let

.

Answer:

12. Consider the basis for , where be the linear transformation for which Find a formula for 13. Let

,

Find

, and

, and use that formula to find be vectors in a vector space V, and let

,

. be a linear transformation for which

.

Answer:

14. Let

be the linear operator given by the formula

Which of the following vectors are in

?

(a) (b) (c) 15. Let (a) (b) (c) Answer: (a)

be the linear operator in Exercise 14. Which of the following vectors are in

?

16. Let

be the linear transformation given by the formula

Which of the following are in

?

(a) (b) (c) 17. Let

be the linear transformation in Exercise 16. Which of the following are in

?

(a) (b) (c) Answer: (a) 18. Let

be the linear transformation defined by

. Which of the following are in

? (a) (b) (c) 19. Let

be the linear transformation in Exercise 18. Which of the following are in

(a) (b) (c) Answer: (a) 20. Find a basis for the kernel of (a) the linear operator in Exercise 14. (b) the linear transformation in Exercise 16. (c) the linear transformation in Exercise 18. 21. Find a basis for the range of (a) the linear operator in Exercise 14. (b) the linear transformation in Exercise 16. (c) the linear transformation in Exercise 18. Answer: (a) (b)

?

(c) 22. Verify Formula 7 of the dimension theorem for (a) the linear operator in Exercise 14. (b) the linear transformation in Exercise 16. (c) the linear transformation in Exercise 18. In Exercises 23–26, let T be multiplication by the matrix A. Find (a) a basis for the range of T. (b) a basis for the kernel of T. (c) the rank and nullity of T. (d) the rank and nullity of A. 23.

Answer: (a)

(b)

(c) Rank

nullity

(d) Rank

nullity

24.

25. Answer: (a) (b)

(c) Rank (d) Rank

26.

27. Describe the kernel and range of (a) the orthogonal projection on the

-plane.

(b) the orthogonal projection on the

-plane.

(c) the orthogonal projection on the plane defined by the equation

.

Answer: (a) Kernel: y-axis; range: xz-plane (b) Kernel: x-axis; range: yz-plane (c) Kernel: the line through the origin perpendicular to the plane 28. Let V be any vector space, and let

be defined by

; range: plane .

(a) What is the kernel of T? (b) What is the range of T? 29. In each part, use the given information to find the nullity of the linear transformation T. (a)

has rank 3.

(b)

has rank 1.

(c) The range of (d)

is

.

has rank 3.

Answer: (a) Nullity (b) Nullity (c) Nullity (d) Nullity 30. Let A be a matrix such that A. Find the rank and nullity of T. 31. Let A be a

has only the trivial solution, and let

matrix with rank 4.

(a) What is the dimension of the solution space of (b) Is

be multiplication by

consistent for all vectors b in

?

? Explain.

Answer: (a) 3 (b) No 32. Let

be a linear transformation from

to any vector space. Give a geometric description of

.

33. Let

be a linear transformation from any vector space to

. Give a geometric description of

.

Answer: A line through the origin, a plane through the origin, the origin only, or all of 34. Let

be multiplication by

(a) Show that the kernel of T is a line through the origin, and find parametric equations for it. (b) Show that the range of T is a plane through the origin, and find an equation for it. 35. (a) Show that if

,

,

, and

defines a linear operator on

are any scalars, then the formula .

(b) Does the formula

define a linear operator on

? Explain.

Answer: (b) No 36. Let

be a basis for a vector space V, and let

be a linear transformation. Show that if

then T is the zero transformation. 37. Let

be a basis for a vector space V, and let

be a linear operator. Show that if

then T is the identity transformation on V. 38. For a positive integer , let be the linear transformation defined by an matrix with real entries. Determine the dimension of . 39. Prove: If is a basis for V and there exists a linear transformation such that 40. (Calculus required) Let be the transformation defined by

, where A is

are vectors in W, not necessarily distinct, then

be the vector space of functions continuous on

, and let

Is T a linear operator? 41. (Calculus required) Let

be the differentiation transformation

D? Answer: ker(D) consists of all constant polynomials.

. What is the kernel of

42.

(Calculus required) Let

be the integration transformation

. What is the kernel

of J? 43. (Calculus required) Let V be the vector space of real-valued functions with continuous derivatives of all orders on the interval , and let be the vector space of real-valued functions defined on . (a) Find a linear transformation

whose kernel is

.

(b) Find a linear transformation

whose kernel is

.

Answer: (a) (b) 44. If A is an matrix, and if the linear system say about the range of ?

is consistent for every vector b in

, what can you

True-False Exercises In parts (a)–(i) determine whether the statement is true or false, and justify your answer. (a) If linear transformation.

for all vectors

and

in V and all scalars

and

, then T is a

Answer: True (b) If v is a nonzero vector in V, then there is exactly one linear transformation .

such that

Answer: False (c) There is exactly one linear transformation V.

for which

for all vectors u and v in

Answer: True (d) If

is a nonzero vector in V, then the formula

Answer: False (e) The kernel of a linear transformation is a vector space. Answer: True

defines a linear operator on V.

(f) The range of a linear transformation is a vector space. Answer: True (g) If

is a linear transformation, then the nullity of T is 3.

Answer: False (h) The function

defined by

is a linear transformation.

Answer: False (i) The linear transformation

has rank 1. Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

defined by

8.2 Isomorphism In this section we will establish a fundamental connection between real finite-dimensional vector spaces and the Euclidean space . This connection is not only important theoretically, but it has practical applications in that it allows us to perform vector computations in general vector spaces by working with the vectors in .

One-to-One and Onto Although many of the theorems in this text have been concerned exclusively with the vector space , this is not as limiting as it might seem. As we will show, the vector space is the “mother” of all real n-dimensional vector spaces in the sense that any such space might differ from in the notation used to represent vectors, but not in its algebraic structure. To explain what we mean by this, we will need two definitions, the first of which is a generalization of Definition 1 in Section 4.10. (See Figure 8.2.1).

DEFINITION 1 If is a linear transformation from a vector space V to a vector space W, then T is said to be one-to-one if T maps distinct vectors in V into distinct vectors in W.

DEFINITION 2 If is a linear transformation from a vector space V to a vector space W, then T is said to be onto (or onto W) if every vector in W is the image of at least one vector in V.

Figure 8.2.1 The following theorem provides a useful way of telling whether a linear transformation is one-to-one by examining its kernel.

THEOREM 8.2.1 If

is a linear transformation, then the following statements are equivalent.

(a) T is one-to-one. (b)

Proof (a) ⇒ (b) Since T is linear, we know that other vectors in V that map into 0, so .

by Theorem 8.1.1a. Since T is one-to-one, there can be no

(b) ⇒ (a) Assume that . If u and v are distinct vectors in V, then for otherwise would contain a nonzero vector. Since T is linear, it follows that

. This implies that

so T maps distinct vectors in V into distinct vectors in W and hence is one-to-one. In the special case where V is finite-dimensional and T is a linear operator on V, then we can add a third statement to those in Theorem 8.2.1.

THEOREM 8.2.2 If V is a finite-dimensional vector space, and if equivalent.

is a linear operator, then the following statements are

(a) T is one-to-one. (b)

.

(c) T is onto [i.e.,

]

Proof We already know that (a) and (b) are equivalent by Theorem 8.2.1, so it suffices to show that (b) and (c) are equivalent. We leave it for you to do this by assuming that and applying Theorem 8.1.4.

E X A M P L E 1 Dilations and Contractions Are One-to-One and Onto Show that if V is a finite-dimensional vector space and c is any nonzero scalar, then the linear operator defined by is one-to-one and onto. Solution The operator T is onto (and hence one-to-one) for if v is any vector in V then that vector is the image of the vector .

E X A M P L E 2 Matrix Operators If

is the matrix operator , then it follows from parts (r) and (s) of Theorem 5.1.6 that is one-to-one and onto if and only if A is invertible.
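A quick numerical illustration of this criterion (a sketch, assuming NumPy; the two matrices are illustrative): the operator given by multiplication by A is one-to-one and onto exactly when A is invertible, that is, when det(A) ≠ 0.

```python
import numpy as np

A_invertible = np.array([[1.0, 2.0], [3.0, 7.0]])   # det = 1
A_singular   = np.array([[1.0, 2.0], [2.0, 4.0]])   # det = 0 (second row is twice the first)

for A in (A_invertible, A_singular):
    det = np.linalg.det(A)
    print("det =", round(det, 3), " one-to-one and onto:", not np.isclose(det, 0.0))
```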

E X A M P L E 3 Shifting Operators

,

Let be the sequence space discussed in Example 3 of Section 4.1, and consider the linear “shifting operators” on V defined by

(a) Show that

is one-to-one but not onto.

(b) Show that

is onto but not one to one.

Solution (a) The operator is one-to-one because distinct sequences in obviously have distinct images. This operator is not onto because no vector in maps into the sequence , for example. (b) The operator

is not one-to-one because, for example, the vectors and both map into . This operator is onto because every possible sequence of real numbers can be obtained with an appropriate choice of the numbers

Why does Example 3 not violate Theorem 8.2.2?

E X A M P L E 4 Basic Transformations That Are One-to-One and Onto The linear transformations

and

defined by

are both one-to-one and onto (verify by showing that their kernels contain only the zero vector).

E X A M P L E 5 A One-to-One Linear Transformation Let

be the linear transformation

discussed in Example 5 of Section 8.1. If are distinct polynomials, then they differ in at least one coefficient. Thus,

also differ in at least one coefficient. It follows that T is one-to-one since it maps distinct polynomials p and q into distinct polynomials and . CALCULUS REQUIRED

E X A M P L E 6 A Transformation That Is Not One-to-One Let

.

be the differentiation transformation discussed in Example 11 of Section 8.1. This linear transformation is not one-to-one because it maps functions that differ by a constant into the same function. For example,

Dimension and Linear Transformations In the exercises we will ask you to prove the following two important facts about a linear transformation in the case where V and W are finite-dimensional:

1. If , then T cannot be one-to-one.

2. If , then T cannot be onto.

Stated informally, if a linear transformation maps a “bigger” space to a “smaller” space, then some points in the “bigger” space must have the same image; and if a linear transformation maps a “smaller” space to a “bigger” space, then there must be points in the “bigger” space that are not images of any points in the “smaller” space.

Remark These observations tell us, for example, that any linear transformation from to must map some distinct points of into the same point in , and they also tell us that there is no linear transformation that maps onto all of .

Isomorphism Our next definition paves the way for the main result in this section.

DEFINITION 3 If a linear transformation is both one-to-one and onto, then T is said to be an isomorphism, and the vector spaces V and W are said to be isomorphic.

The word isomorphic is derived from the Greek words iso, meaning “identical,” and morphe, meaning “form.” This terminology is appropriate because, as we will now explain, isomorphic vector spaces have the same “algebraic form,” even though they may consist of different kinds of objects. To illustrate this idea, examine Table 1 in which we have shown how the isomorphism

matches up vector operations in

and

. Table 1

Operation in P2

Operation in R3

Operation in P2

Operation in R3

The following theorem, which is one of the most important results in linear algebra, reveals the fundamental importance of the vector space .

THEOREM 8.2.3 Every real n-dimensional vector space is isomorphic to

.

Theorem 8.2.3 tells us that a real n-dimensional vector space may differ from in notation, but its algebraic structure will be the same. Proof Let V be a real n-dimensional vector space. To prove that V is isomorphic to transformation that is one-to-one and onto. For this purpose, let

we must find a linear

be any basis for V, let (1) be the representation of a vector u in V as a linear combination of the basis vectors, and define the transformation by (2) We will show that T is an isomorphism (linear, one-to-one, and onto). To prove the linearity, let u and v be vectors in V, let c be a scalar, and let (3) be the representations of u and v as linear combinations of the basis vectors. Then it follows from 1 that

and it follows from 2 that

which shows that T is linear. To show that T is one-to-one, we must show that if u and v are distinct vectors in V, then so are their images in . But if , and if the representations of these vectors in terms of the basis vectors are as in 3, then we

must have

for at least one i. Thus,

which shows that u and v have distinct images under T. Finally, the transformation T is onto, for if is any vector in

, then it follows from 2 that w is the image under T of the vector

Remark Note that the isomorphism T in Formula 2 of the foregoing proof is the coordinate map

that maps u into its coordinate vector with respect to the basis . Since there are generally many possible bases for a given vector space V, there are generally many possible isomorphisms between V and , one for each different basis.
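To make the coordinate-map isomorphism concrete, here is a sketch (assuming NumPy) for the space of polynomials of degree at most 2 with the basis {1, x, x²}; the specific polynomials and scalar are arbitrary illustrative choices. Vector operations performed on polynomials match the same operations performed on their coordinate vectors in R³.

```python
import numpy as np

# Represent p(x) = a0 + a1*x + a2*x^2 by its coefficient tuple (a0, a1, a2).
def coord(p):
    # Coordinate map relative to the basis {1, x, x^2}
    return np.array(p, dtype=float)

p = (1, 2, 3)    # p(x) = 1 + 2x + 3x^2
q = (0, -1, 4)   # q(x) = -x + 4x^2
k = 2.0

p_plus_q = tuple(pi + qi for pi, qi in zip(p, q))
k_times_p = tuple(k * pi for pi in p)

print(np.allclose(coord(p_plus_q), coord(p) + coord(q)))   # addition corresponds
print(np.allclose(coord(k_times_p), k * coord(p)))         # scalar multiplication corresponds
```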

E X A M P L E 7 The Natural Isomorphism from Pn − 1 to Rn We leave it for you to verify that the mapping

from to is one-to-one, onto, and linear. This is called the natural isomorphism from to because, as the following computations show, it maps the natural basis for into the standard basis for :

E X A M P L E 8 The Natural Isomorphism from M22 to R4 The matrices

form a basis for the vector space of matrices. An isomorphism can be constructed by first writing a matrix A in in terms of the basis vectors as

and then defining T as

Thus, for example,

More generally, this idea can be used to show that the vector space of matrices with real entries is isomorphic to .

E X A M P L E 9 Differentiation by Matrix Multiplication Consider the differentiation transformation on the vector space of polynomials of degree three or less. If we map and into and , respectively, by the natural isomorphisms, then the transformation D produces a corresponding matrix transformation from to . Specifically, the derivative transformation

produces the matrix transformation

Thus, for example, the derivative

can be calculated as the matrix product

This idea is useful for constructing numerical algorithms to perform derivative calculations.
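One possible realization of this idea as code (a sketch, assuming NumPy; the coefficient ordering and the 3×4 matrix below are my own conventions for the natural isomorphisms, not formulas quoted from the text):

```python
import numpy as np

# Represent a0 + a1*x + a2*x^2 + a3*x^3 in P3 by (a0, a1, a2, a3), and similarly for P2.
# With these conventions, differentiation P3 -> P2 corresponds to this 3 x 4 matrix:
D = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3]], dtype=float)

p = np.array([5, -1, 4, 2], dtype=float)   # p(x) = 5 - x + 4x^2 + 2x^3 (illustrative)
dp = D @ p                                  # coordinates of p'(x)
print(dp)                                   # [-1.  8.  6.]  i.e.  p'(x) = -1 + 8x + 6x^2
```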

Inner Product Space Isomorphisms In the case where V is a real n-dimensional inner product space, both V and have, in addition to their algebraic structure, a geometric structure arising from their respective inner products. Thus, it is reasonable to inquire if there exists an isomorphism from V to that preserves the geometric structure as well as the algebraic structure. For example, we would want orthogonal vectors in V to have orthogonal counterparts in , and we would want orthonormal sets in V to correspond to orthonormal sets in . In order for an isomorphism to preserve geometric structure, it obviously has to preserve inner products, since notions of length, angle, and orthogonality are all based on the inner product. Thus, if V and W are inner product spaces, then we call an isomorphism an inner product space isomorphism if

It can be proved that if V is any real n-dimensional inner product space and has the Euclidean inner product (the dot product), then there exists an inner product space isomorphism from V to . Under such an isomorphism, the inner product space V has the same algebraic and geometric structure as . In this sense, every n-dimensional inner product space is a “carbon copy” of with the Euclidean inner product that differs only in the notation used to represent vectors.

E X A M P L E 1 0 An Inner Product Space Isomorphism

Let be the vector space of real n-tuples in comma-delimited form, let be the vector space of real matrices, let have the Euclidean inner product , and let have the inner product

in which u and v are expressed in column form. The mapping

defined by

is an inner product space isomorphism, so the distinction between the inner product space and the inner product space is essentially notational, a fact that we have used many times in this text.

Concept Review • One-to-one • Onto • Isomorphism • Isomorphic vector spaces • Natural isomorphism • Inner product space isomorphism

Skills • Determine whether a linear transformation is one-to-one. • Determine whether a linear transformation is onto. • Determine whether a linear transformation is an isomorphism.

Exercise Set 8.2 1. In each part, find

, and determine whether the linear transformation T is one-to-one.

(a)

, where

(b)

, where

(c)

, where

(d)

, where

(e)

, where

(f)

, where

Answer: (a) (b) (c)

T is one-to-one T is not one-to-one T is one-to-one

(d)

T is one-to-one

(e)

; T is not one-to-one

(f)

; T is not one-to-one

2. Which of the transformations in Exercise 1 are onto? 3. In each part, determine whether multiplication by A is a one-to-one linear transformation. (a)

(b)

(c)

Answer: (a) Not one-to-one (b) Not one-to-one (c) One-to-one 4. Which of the transformations in Exercise 3 are onto? 5. As indicated in the accompanying figure, let

be the orthogonal projection on the line

.

(a) Find the kernel of T. (b) Is T one-to-one? Justify your conclusion.

Figure Ex-5 Answer: (a) ker (b) T is not one-to-one since

.

6. As indicated in the accompanying figure, let (a) Find the kernel of T. (b) Is T one-to-one? Justify your conclusion.

be the linear operator that reflects each point about the y-axis.

Figure Ex-6 7. In each part, use the given information to determine whether the linear transformation T is one-to-one. (a) (b) (c) (d) Answer: (a) T is one-to-one (b) T is not one-to-one (c) T is not one-to-one (d) T is one-to-one 8. In each part, determine whether the linear transformation T is one-to-one. (a)

, where

(b)

, where

9. Prove: If V and W are finite-dimensional vector spaces such that transformation .

, then there is no one-to-one linear

10. Prove: There can be an onto linear transformation from V to W only if 11. (a) Find an isomorphism between the vector space of all

.

symmetric matrices and

(b) Find two different isomorphisms between the vector space of all

matrices and

. .

(c) Find an isomorphism between the vector space of all polynomials of degree at most 3 such that (d) Find an isomorphism between the vector spaces Answer: (a)

(b)

(c)

and

.

and

.

(d)

12.

(Calculus required) Let

be the integration transformation

. Determine whether J is

one-to-one. Justify your conclusion. 13. (Calculus required) Let V be the vector space

and let

be defined by

Verify that T is a linear transformation. Determine whether T is one-to-one, and justify your conclusion. Answer: T is not one-to-one since, for example,

is in its kernel.

14. (Calculus required) Devise a method for using matrix multiplication to differentiate functions in the vector space . Use your method to find the derivative of . 15. Does the formula

define a one-to-one linear transformation from

to

? Explain your

reasoning. Answer: Yes; it is one-to-one 16. Let E be a fixed elementary matrix. Does the formula Explain your reasoning. 17. Let a be a fixed vector in reasoning.

. Does the formula

define a one-to-one linear operator on define a one-to-one linear operator on

?

? Explain your

Answer: T is not one-to-one since, for example a is in its kernel.

18. Prove that an inner product space isomorphism preserves angles and distances—that is, the angle between u and v in V is equal to the angle between and in W, and . 19. Does an inner product space isomorphism map orthonormal sets to orthonormal sets? Justify your answer. Answer: Yes 20. Find an inner product space isomorphism between

and

.

True-False Exercises In parts (a)–(f) determine whether the statement is true or false, and justify your answer. (a) The vector spaces

and

are isomorphic.

Answer: False (b) If the kernel of a linear transformation

is

, then T is an isomorphism.

Answer: True (c) Every linear transformation from

to

is an isomorphism.

Answer: False (d) There is a subspace of

that is isomorphic to

.

Answer: True (e) There is a

matrix P such that

defined by

is an isomorphism.

Answer: False (f) There is a linear transformation Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

such that the kernel of T is isomorphic to the range of T.

8.3 Compositions and Inverse Transformations In Section 4.10 we discussed compositions and inverses of matrix transformations. In this section we will extend some of those ideas to general linear transformations.

Composition of Linear Transformations The following definition extends Formula 1 of Section 4.10 to general linear transformations. Note that the word “with” establishes the order of the operations in a composition. The composition of with is whereas the composition of

with

is

DEFINITION 1 If denoted by

and

are linear transformations, then the composition of with (which is read “ circle ”), is the function defined by the formula

,

(1) where u is a vector in U.

Remark Observe that this definition requires that the domain of (which is V) contain the range of This is essential for the formula to make sense (Figure 8.3.1).

Figure 8.3.1 The composition of

with

.

.

Our first theorem shows that the composition of two linear transformations is itself a linear transformation.

THEOREM 8.3.1 If and transformation.

are linear transformations, then

is also a linear

Proof If u and v are vectors in U and c is a scalar, then it follows from 1 and the linearity of

and

that

and

Thus,

satisfies the two requirements of a linear transformation.

E X A M P L E 1 Composition of Linear Transformations Let

and

be the linear transformations given by the formulas

Then the composition In particular, if

is given by the formula , then

E X A M P L E 2 Composition with the Identity Operator If is any linear operator, and if is the identity operator (Example 3 of Section 8.1), then for all vectors v in V, we have

It follows that and are the same as T; that is, (2)

As illustrated in Figure 8.3.2, compositions can be defined for more than two linear transformations. For example, if are linear transformations, then the composition

is defined by (3)

Figure 8.3.2 The composition of three linear transformations.

Inverse Linear Transformations In Theorem 4.10.1 we showed that a matrix operator is one-to-one if and only if the matrix A is invertible, in which case the inverse operator is . We then showed that if w is the image of a vector x under the operator , then x is the image under of the vector w (see Figure 4.10.8). Our next objective is to extend the notion of invertibility to general linear transformations. Recall that if is a linear transformation, then the range of T, denoted by , is the subspace of W consisting of all images under T of vectors in V. If T is one-to-one, then each vector w in is the image of a unique vector v in V. This uniqueness allows us to define a new function, called the inverse of T and denoted by , that maps w back into v (Figure 8.3.3).

Figure 8.3.3 The inverse of T maps It can be proved (Exercise 19) that definition of

back into v.

is a linear transformation. Moreover, it follows from the

that (4)

(5)

so that T and

, when applied in succession in either order, cancel the effect of each other.

Remark It is important to note that if is a one-to-one linear transformation, then the domain of is the range of T, where the range may or may not be all of W. However, in the special case where is a one-to-one linear operator and V is n-dimensional, then it follows from Theorem 8.2.2 that T must also be onto, so the domain of is all of V.

E X A M P L E 3 An Inverse Transformation In Example 5 of Section 8.2 we showed that the linear transformation

given by

is one-to-one; thus, T has an inverse. In this case the range of T is not all of but rather the subspace of consisting of polynomials with a zero constant term. This is evident from the formula for T:

It follows that

is given by the formula

For example, in the case where

,

E X A M P L E 4 An Inverse Transformation Let

be the linear operator defined by the formula

Determine whether T is one-to-one; if so, find

.

Solution It follows from Formula 12 of Section 4.9 that the standard matrix for T is

(verify). This matrix is invertible, and from Formula 7 of Section 4.10 the standard matrix for is

It follows that

Expressing this result in horizontal notation yields

Composition of One-To-One Linear Transformations The following theorem shows that a composition of one-to-one linear transformations is one-to-one, and it relates the inverse of a composition to the inverses of its individual linear transformations.

THEOREM 8.3.2 If (a)

and

are one-to-one linear transformations, then

is one-to-one.

(b)

.

Proof (a) We want to show that maps distinct vectors in U into distinct vectors in W. But if u and v are distinct vectors in U, then and are distinct vectors in V since is one-to-one. This and the fact that is one-to-one imply that

are also distinct vectors. But these expressions can also be written as so

maps u and v into distinct vectors in W.

Proof (b) We want to show that

for every vector w in the range of

. For this purpose, let (6)

so our goal is to show that

But it follows from 6 that or, equivalently, Now, taking

of each side of this equation, then taking

of each side of the result, and then using 4

yields (verify)

or, equivalently,

In words, part (b) of Theorem 8.3.2 states that the inverse of a composition is the composition of the inverses in the reverse order. This result can be extended to compositions of three or more linear transformations; for example, (7) In the case where

,

, and

are matrix operators on

, Formula 7 can be written as

or alternatively as (8)

Note the order of the subscripts on the two sides of Formula 8.
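For matrix operators the reversal of order in Formula 8 can be verified directly (a sketch, assuming NumPy; the invertible matrices are illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[0.0, 1.0],
              [-1.0, 3.0]])

lhs = np.linalg.inv(B @ A)                   # inverse of the composition
rhs = np.linalg.inv(A) @ np.linalg.inv(B)    # composition of the inverses, reverse order
print(np.allclose(lhs, rhs))                 # True
```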

Concept Review • Composition of linear transformations • Inverse of a linear transformation

Skills • Find the domain and range of the composition of two linear transformations. • Find the composition of two linear transformations. • Determine whether a linear transformation has an inverse. • Find the inverse of a linear transformation.

Exercise Set 8.3 1. Find

.

(a)

,

(b)

,

(c)

,

(d)

,

Answer: (a) (b) (c) (d) 2. Find

.

(a)

,

(b)

,

,

3. Let

,

and

be the linear transformations given by

and

. (a)

Find

, where

(b) Can you find

? Explain.

Answer: (a) (b)

does not exist since

4. Let

and . Find

5. Let

is not a

matrix.

be the linear operators given by and

be the dilation

. Find a linear operator

and . such that

. Answer:

6. Suppose that the linear transformations

and

are given by the formulas

and

and 7. Let

. Find

be a fixed polynomial of degree m, and define a function T with domain . Show that T is a linear transformation.

8. Use the definition of (a)

by the formula

given by Formula 3 to prove that

is a linear transformation.

(b)

.

(c)

.

9. Let

.

be the orthogonal projection of

10. In each part, let

onto the xy-plane. Show that

.

be multiplication by A. Determine whether T has an inverse; if so, find

(a) (b) (c) 11. In each part, let

(a)

(b)

(c)

(d)

Answer: (a) T has no inverse.

be multiplication by A. Determine whether T has an inverse; if so, find

(b)

(c)

(d)

12. In each part, determine whether the linear operator

is one-to-one; if so, find

. (a) (b) (c) 13. Let

be the linear operator defined by the formula

where

are constants.

(a) Under what conditions will T have an inverse? (b) Assuming that the conditions determined in part (a) are satisfied, find a formula for . Answer: (a)

for

(b) 14. Let

(a) Show that

and

and

be the linear operators given by the formulas

are one-to-one.

(b) Find formulas for

(c) Verify that

.

15. Let

and

(a) Find formulas for

be the linear transformations given by the formulas ,

(b) Verify that

, and

.

.

Answer: (a) 16. Let the

, , and be the reflections about the xy-plane, the -plane, respectively. Verify Formula 8 for these linear operators.

17. Let

-plane, and

be the function defined by the formula

(a) Find

.

(b) Show that T is a linear transformation. (c) Show that T is one-to-one. (d) Find

, and sketch its graph.

Answer: (a) (d) 18. Let be the linear operator given by the formula one-to-one and that for every real value of k. 19. Prove: If

. Show that T is

is a one-to-one linear transformation, then

is a one-to-one linear

transformation. In Exercises 20–21, determine whether 20. (a)

.

is the orthogonal projection on the x-axis, and

is the orthogonal projection

on the y-axis. (b)

is the rotation about the origin through an angle about the origin through an angle .

, and

is the rotation

(c)

is the rotation about the x-axis through an angle about the z-axis through an angle .

, and

is the rotation

21. (a) (b)

is the reflection about the x-axis, and is the orthogonal projection on the x-axis, and rotation through an angle .

is the reflection about the y-axis. is the counterclockwise

(c)

is a dilation by a factor k, and z-axis through an angle .

is the counterclockwise rotation about the

Answer: (a) (b) (c) 22. (Calculus required) Let

be the linear transformations in Examples 11 and 12 of Section 8.1. Find

for

(a) (b) (c) 23. (Calculus required) The Fundamental Theorem of Calculus implies that integration and differentiation reverse the actions of each other. Define a transformation by , and define

by

(a) Show that D and J are linear transformations. (b) Explain why J is not the inverse transformation of D. (c) Can the domains and/or codomains of D and J be restricted so they are inverse linear transformations?

True-False Exercises In parts (a)–(f) determine whether the statement is true or false, and justify your answer. (a) The composition of two linear transformations is also a linear transformation. Answer: True (b) If

and

are any two linear operators, then

Answer: False (c) The inverse of a linear transformation is a linear transformation. Answer:

.

False (d) If a linear transformation T has an inverse, then the kernel of T is the zero subspace. Answer: True (e) If is the orthogonal projection onto the x-axis, then x-axis onto a line that is perpendicular to the x-axis.

maps each point on the

Answer: False (f) If

and

are linear transformations, and if

. Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

is not one-to-one, then neither is

8.4 Matrices for General Linear Transformations In this section we will show that a general linear transformation from any n-dimensional vector space V to any m-dimensional vector space W can be performed using an appropriate matrix transformation from to . This idea is used in computer computations since computers are well suited for performing matrix computations.

Matrices of Linear Transformations Suppose that V is an n-dimensional vector space, W is an m-dimensional vector space, and that is a linear transformation. Suppose further that B is a basis for V, that is a basis for W, and that for each vector x in V, the coordinate matrices for x and are and , respectively (Figure 8.4.1).

Figure 8.4.1 It will be our goal to find an matrix A such that multiplication by A maps the vector into the vector for each x in V (Figure 8.4.2a). If we can do so, then, as illustrated in Figure 8.4.2 b, we will be able to execute the linear transformation T by using matrix multiplication and the following indirect procedure:

Finding T(x) Indirectly

Step 1. Compute the coordinate vector .

Step 2. Multiply on the left by A to produce .

Step 3. Reconstruct from its coordinate vector .

Figure 8.4.2

The key to executing this plan is to find an

matrix A with the property that (1)

For this purpose, let

be a basis for the n-dimensional space V and

a basis for

the m-dimensional space W. Since Equation 1 must hold for all vectors in V, it must hold, in particular, for the basis vectors in B; that is, (2) But

so

Substituting these results into 2 yields

which shows that the successive columns of A are the coordinate vectors of with respect to the basis

. Thus, the matrix A that completes the link in Figure 8.4.2a is (3)

We will call this the matrix for T relative to the bases B and B′ and will denote it by the symbol . Using this notation, Formula 3 can be written as

(4)

and from 1, this matrix has the property

(5)

We leave it as an exercise to show that in the special case where is multiplication by A, and where B and are the standard bases for and , respectively, then

(6)

Remark Observe that in the notation the right subscript is a basis for the domain of T, and the left subscript is a basis for the image space of T (Figure 8.4.3). Moreover, observe how the subscript B seems to “cancel out” in Formula 5 (Figure 8.4.4).

Figure 8.4.3

Figure 8.4.4

E X A M P L E 1 Matrix for a Linear Transformation Let

be the linear transformation defined by

Find the matrix for T with respect to the standard bases where

Solution From the given formula for T we obtain

By inspection, the coordinate vectors for

and

relative to

are

Thus, the matrix for T with respect to B and

is

E X A M P L E 2 The Three-Step Procedure Let be the linear transformation in Example 1, and use the three-step procedure described in the following figure to perform the computation

Solution Step 1. The coordinate matrix for

Step 2. Multiplying

Step 3. Reconstructing

relative to the basis

by the matrix

found in Example 1 we obtain

from

we obtain

Although Example 2 is simple, the procedure that it illustrates is applicable to problems of great complexity.
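The indirect procedure is easy to mechanize once the matrix has been found. Below is a sketch (assuming NumPy) for an assumed transformation T: P1 → P2 given by T(p(x)) = x·p(x) with the standard bases, used here only as an illustration; the matrix follows from T(1) = x and T(x) = x².

```python
import numpy as np

# Matrix for T relative to B = {1, x} and B' = {1, x, x^2}:
# columns are the B'-coordinate vectors of T(1) = x and T(x) = x^2.
A = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def T_indirect(p_coeffs):
    x_B = np.array(p_coeffs, dtype=float)   # Step 1: coordinate vector [p]_B
    y_Bprime = A @ x_B                       # Step 2: multiply by the matrix for T
    return y_Bprime                          # Step 3: [T(p)]_B' gives T(p)

print(T_indirect([5.0, -2.0]))   # p(x) = 5 - 2x  ->  T(p) = 5x - 2x^2, coordinates (0, 5, -2)
```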

E X A M P L E 3 Matrix for a Linear Transformation Let

is

be the linear transformation defined by

Find the matrix for the transformation T with respect to the bases for , where

for

and

Solution From the formula for T,

Expressing these vectors as linear combinations of

,

, and

, we obtain (verify)

Thus,

so

Remark Example 3 illustrates that a fixed linear transformation generally has multiple representations, each depending on the bases chosen. In this case the matrices

both represent the transformation T, the first relative to the standard bases for B and stated in the example.

and

, the second relative to the bases

Matrices of Linear Operators In the special case where (so that is a linear operator), it is usual to take when constructing a matrix for T. In this case the resulting matrix is called the matrix for T relative to the basis B and is usually denoted by rather than . If , then Formulas 4 and 5 become Phrased informally, Formulas 7 and 8 state that the matrix for T, when multiplied by the coordinate vector for x, produces the coordinate vector for .

(7)

(8) In the special case where then Formula 7 simplifies to

is a matrix operator, say multiplication by A, and B is the standard basis for

,

(9)

Matrices of Identity Operators Recall that the identity operator maps every vector in V into itself, that is, for every vector x in . The following example shows that if V is n-dimensional, then the matrix for I relative to any basis B for V is the identity matrix.

E X A M P L E 4 Matrices of Identity Operators If is a basis for a finite-dimensional vector space , and if is the identity operator on , then

Therefore,

E X A M P L E 5 Linear Operator on P2 Let be the linear operator defined by

that is,

(a) Find relative to the basis .

(b) Use the indirect procedure to compute .

(c) Check the result in (b) by computing directly.

Solution

(a) From the formula for T,

so

Thus,

(b)

Step 1. The coordinate matrix for

Step 2. Multiplying

by the matrix

Step 3. Reconstructing

relative to the basis

is

found in part (a) we obtain

from

we obtain

(c) By direct computation,

which agrees with the result in (b).

Matrices of Compositions and Inverse Transformations We will conclude this section by mentioning two theorems without proof that are generalizations of Formulas 4 and 7 of Section 4.10.

THEOREM 8.4.1 If and are linear transformations, and if B, , and are bases for U, , and W, respectively, then

(10)

THEOREM 8.4.2 If

is a linear operator, and if B is a basis for V, then the following are equivalent.

(a) T is one-to-one. (b)

is invertible.

Moreover, when these equivalent conditions hold, (11)
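A quick numerical check of Formula 11 in the simplest setting (a sketch, assuming NumPy; the operator is multiplication by an illustrative invertible matrix and B is the standard basis, so the matrix for T is the matrix itself):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [5.0, 3.0]])      # [T]_B for an invertible operator on R^2 (standard basis)

M_inv = np.linalg.inv(M)        # by Theorem 8.4.2, this is [T^{-1}]_B

x = np.array([7.0, -4.0])
print(np.allclose(M_inv @ (M @ x), x))   # T^{-1}(T(x)) = x  ->  True
print(np.allclose(M @ (M_inv @ x), x))   # T(T^{-1}(x)) = x  ->  True
```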

Remark In 10, observe how the interior subscript (the basis for the intermediate space V) seems to “cancel out,” leaving only the bases for the domain and image space of the composition as subscripts (Figure 8.4.5). This cancellation of interior subscripts suggests the following extension of Formula 10 to compositions of three linear transformations (Figure 8.4.6):

(12)

Figure 8.4.5

Figure 8.4.6 The following example illustrates Theorem 8.4.1.

E X A M P L E 6 Composition Let be the linear transformation defined by

and let be the linear operator defined by

Then the composition

is given by

Thus, if

, then (13)

In this example, plays the role of U in Theorem 8.4.1, and plays the roles of both V and W; thus we can take in 10 so that the formula simplifies to

(14)

Let us choose

to be the basis for

and choose

to be the basis for

. We

showed in Examples 1 and 5 that

Thus, it follows from 14 that (15)

As a check, we will calculate Formula 4 with and

directly from Formula 4. Since

, it follows from

that (16)

Using 13 yields

From this and the fact that

Substituting in 16 yields

which agrees with 15.

, it follows that

Concept Review • Matrix for a linear transformation relative to bases • Matrix for a linear operator relative to a basis • The three-step procedure for finding

Skills • Find the matrix for a linear transformation relative to bases of V and W. • For a linear transformation , find using the matrix for T relative to bases of V and W.

Exercise Set 8.4 1. Let

be the linear transformation defined by

.

(a) Find the matrix for T relative to the standard bases where

(b) Verify that the matrix .

obtained in part (a) satisfies Formula 5 for every vector

in

Answer: (a)

2. Let

be the linear transformation defined by

(a) Find the matrix for T relative to the standard bases (b) Verify that the matrix . 3. Let

for

and

.

obtained in part (a) satisfies Formula 5 for every vector

in

be the linear operator defined by

(a) Find the matrix for T relative to the standard basis (b) Verify that the matrix Answer:

and

for

.

obtained in part (a) satisfies Formula 8 for every vector

in

.

(a)

4. Let

be the linear operator defined by

and let

(a) Find

be the basis for which

.

(b) Verify that Formula 8 holds for every vector x in 5. Let

.

be defined by

(a) Find the matrix

relative to the bases

and

(b) Verify that Formula 5 holds for every vector in

, where

.

Answer: (a)

6. Let

be the linear operator defined by

(a) Find the matrix for T with respect to the basis

, where

(b) Verify that Formula 8 holds for every vector (c) Is T one-to-one? If so, find the matrix of 7. Let

(a) Find

in

with respect to the basis B.

be the linear operator defined by

with respect to the basis

.

, that is,

.

(b) Use the three-step procedure illustrated in Example 2 to compute (c) Check the result obtained in part (b) by computing

. directly.

Answer: (a)

(b) 8. Let

be the linear transformation defined by

(a) Find

relative to the bases

, that is,

and

.

(b) Use the three-step procedure illustrated in Example 2 to compute (c) Check the result obtained in part (b) by computing 9.

Let

and

(a) Find (b) Find (c) (d)

(a) (b) (c)

(d)

relative to the basis and

and

. .

Find a formula for Use the formula obtained in (c) to compute

Answer:

directly.

and let

be the matrix for

.

.

10. Let

be the matrix for

relative to the bases

and

, where

(a) Find

,

(b) Find

,

, ,

, and

, and

.

.

(c) Find a formula for

(d) Use the formula obtained in (c) to compute

11. Let

be the matrix for ,

Find

,

(a) Find

,

with respect to the basis

, , and , and

(b) Find a formula for

. .

. .

(c) Use the formula obtained in (c) to compute Answer: (a)

(b) (c) (d) 12. Let and let

be the linear transformation defined by be the linear operator defined by

.

, where

Let

and

be the standard bases for

(a) Find

,

, and

and

.

.

(b) State a formula relating the matrices in part (a). (c) Verify that the matrices in part (a) satisfy the formula you stated in part(b). 13. Let

be the linear transformation defined by

and let

be the linear transformation defined by

Let

,

(a) Find

, and ,

.

, and

.

(b) State a formula relating the matrices in part (a). (c) Verify that the matrices in part (a) satisfy the formula you stated in part(b). Answer: (a)

(b) 14. Show that if matrix.

is the zero transformation, then the matrix for T with respect to any bases for V and W is a zero

15. Show that if is a contraction or a dilation of V (Example 4 of Section 8.1), then the matrix for T relative to any basis for V is a positive scalar multiple of the identity matrix. 16. Let defined by

be a basis for a vector space V. Find the matrix with respect to B of the linear operator , , , .

17. Prove that if B and are the standard bases for and , respectively, then the matrix for a linear transformation relative to the bases B and is the standard matrix for T. 18. (Calculus required) Let

be the differentiation operator

matrix of D relative to the basis

. In parts (a) and (b), find the

.

(a) (b) (c) Use the matrix in part (a) to compute

.

(d) Repeat the directions for part (c) for the matrix in part (b). 19. (Calculus required) In each part, suppose that is a basis for a subspace V of the vector space of real-valued functions defined on the real line. Find the matrix with respect to B for differentiation operator (a) (b) (c)

.

(d) Use the matrix in part (c) to compute

.

Answer: (a)

(b)

(c)

(d) since 20. Let V be a four-dimensional vector space with basis B, let W be a seven-dimensional vector space with basis , and let be a linear transformation. Identify the four vector spaces that contain the vectors at the corners of the accompanying diagram.

Figure Ex-20 21. In each part, fill in the missing part of the equation. (a) (b) Answer: (a) (b)

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a)

If the matrix of a linear transformation vector x in V such that Answer: False

.

relative to some bases of V and W is

, then there is a nonzero

(b)

If the matrix of a linear transformation vector x in V such that

relative to bases for V and W is

, then there is a nonzero

.

Answer: False (c)

If the matrix of a linear transformation

relative to certain bases for V and W is

, then T is one-to-one.

Answer: True (d) If

and

are linear operators and B is a basis for V, then the matrix of

relative to B is

. Answer: False (e) If

is an invertible linear operator and B is a basis for V, then the matrix for

Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

relative to B is

.

8.5 Similarity The matrix for a linear operator T: V→V depends on the basis selected for V. One of the fundamental problems of linear algebra is to choose a basis for V that makes the matrix for T as simple as possible—a diagonal or a triangular matrix, for example. In this section we will study this problem.

Simple Matrices for Linear Operators Standard bases do not necessarily produce the simplest matrices for linear operators. For example, consider the matrix operator whose standard matrix is (1) and view

as the matrix for T relative to the standard basis

T relative to the basis

for

for

. Let us compare this to the matrix for

in which (2)

Since

it follows that

so the matrix for T relative to the basis

is

This matrix, being diagonal, has a simpler form than the matrix in 1 and conveys clearly that the operator T scales one of the new basis vectors by a factor of 2 and the other by a factor of 3, information that is not immediately evident from 1.

One of the major themes in more advanced linear algebra courses is to determine the “simplest possible form” that can be obtained for the matrix of a linear operator by choosing the basis appropriately. Sometimes it is possible to obtain a diagonal matrix (as above, for example), whereas other times one must settle for a triangular matrix or some other form. We will only be able to touch on this important topic in this text. The problem of finding a basis that produces the simplest possible matrix for a linear operator can be attacked by first finding a matrix for T relative to any basis, typically a standard basis, where applicable, and then changing the basis in a way that simplifies the matrix. Before pursuing this idea, it will be helpful to revisit some concepts about changing bases.
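To see numerically how a well-chosen basis simplifies the matrix of an operator, the following sketch diagonalizes a hypothetical 2×2 standard matrix by changing to a basis of eigenvectors. The matrix A is an assumption for illustration only, not the matrix denoted 1 above.

```python
import numpy as np

# Hypothetical standard matrix for a matrix operator T(x) = Ax on R^2.
A = np.array([[3.0, -1.0],
              [-1.0, 3.0]])

# Take the new basis vectors to be eigenvectors of A; NumPy returns them
# as the columns of P.
eigenvalues, P = np.linalg.eig(A)

# Matrix for T relative to the eigenvector basis: P^{-1} A P.
A_new = np.linalg.inv(P) @ A @ P

print(np.round(A_new, 10))   # diagonal, with the eigenvalues on the diagonal
print(eigenvalues)           # scaling factors of T along the new basis vectors
```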

A New View of Transition Matrices Recall from Formulas 7 and 8 of Section 4.6 that if space V, then the transition matrices from B to

and from

and to B are

are bases for a vector

(3)

(4) where the matrices and that if is any vector in V, then

are inverses of each other. We also showed in Formulas 9 and 10 of that section

(5)

(6) The following theorem shows that transition matrices in Formulas 3 and 4 can be viewed as matrices for identity operators.

THEOREM 8.5.1 If B and

are bases for a finite-dimensional vector space V, and if

Proof Suppose that

and

is the identity operator on V, then

are bases for V. Using the fact that

for all

in V, it follows from Formula 4 of Section 8.4 that

The proof that

is similar.

Effect of Changing Bases on Matrices of Linear Operators We are now ready to consider the main problem in this section.

PROBLEM If B and are two bases for a finite-dimensional vector space V, and if relationship, if any, exists between the matrices and ?

is a linear operator, what

The answer to this question can be obtained by considering the composition of the three linear operators on V pictured in Figure 8.5.1.

Figure 8.5.1

In this figure, is first mapped into itself by the identity operator, then is mapped into by T, and then is mapped into itself by the identity operator. All four vector spaces involved in the composition are the same (namely, V), but the bases for the spaces vary. Since the starting vector is and the final vector is , the composition produces the same result as applying T directly; that is, (7) If, as illustrated in Figure 8.5.1, the first and last vector spaces are assigned the basis B′ and the middle two spaces are assigned the basis B, then it follows from 7 and Formula 12 of Section 8.4 (with an appropriate adjustment to the names of the bases) that (8) or, in simpler notation, (9) We can simplify this formula even further by using Theorem 8.5.1 to rewrite it as (10) In summary, we have the following theorem.

THEOREM 8.5.2 Let

be a linear operator on a finite-dimensional vector space V, and let B and

be bases for V. Then (11)

where

and

.

Warning When applying Theorem 8.5.2, it is easy to forget whether (correct) or (incorrect). It may help to use the diagram in Figure 8.5.2 and observe that the exterior subscripts of the transition matrices match the subscript of the matrix they enclose.

Figure 8.5.2

In the terminology of Definition 1 of Section 5.2, Theorem 8.5.2 tells us that matrices representing the same linear operator relative to different bases must be similar. The following theorem is a rephrasing of Theorem 8.5.2 in the language of similarity.

THEOREM 8.5.3 Two matrices, A and B, are similar if and only if they represent the same linear operator. Moreover, if B = P⁻¹AP, then P is the transition matrix from the basis relative to matrix B to the basis relative to matrix A.

E X A M P L E 1 Similar Matrices Represent the Same Linear Operator We showed at the beginning of this section that the matrices

represent the same linear operator which .

. Verify that these matrices are similar by finding a matrix P for

Solution We need to find the transition matrix

where

is the basis for

inspection that

from which it follows that

Thus,

We leave it for you to verify that

and hence that

given by 2 and

is the standard basis for

. We see by

Similarity Invariants Recall from Section 5.2 that a property of a square matrix is called a similarity invariant if that property is shared by all similar matrices. In Table 1 of that section (table reproduced below), we listed the most important similarity invariants. Since we know from Theorem 8.5.3 that two matrices are similar if and only if they represent the same linear operator, it follows that if B and B′ are bases for V, then every similarity invariant property of the matrix for T relative to B is also a similarity invariant property of the matrix for T relative to any other basis for V. For example, for any two bases B and B′ we must have

det([T]_B) = det([T]_B′)

It follows from this equation that the value of the determinant depends on T, but not on the particular basis that is used to obtain the matrix for T. Thus, the determinant can be regarded as a property of the linear operator T; indeed, if V is a finite-dimensional vector space, then we can define the determinant of the linear operator T to be

det(T) = det([T]_B)   (12)

where B is any basis for V.

Table 1 Similarity Invariants

Property                    Description
Determinant                 A and P⁻¹AP have the same determinant.
Invertibility               A is invertible if and only if P⁻¹AP is invertible.
Rank                        A and P⁻¹AP have the same rank.
Nullity                     A and P⁻¹AP have the same nullity.
Trace                       A and P⁻¹AP have the same trace.
Characteristic polynomial   A and P⁻¹AP have the same characteristic polynomial.
Eigenvalues                 A and P⁻¹AP have the same eigenvalues.
Eigenspace dimension        If λ is an eigenvalue of A and P⁻¹AP, then the eigenspace of A corresponding to λ and the eigenspace of P⁻¹AP corresponding to λ have the same dimension.
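The invariants in Table 1 are easy to confirm numerically for any particular similar pair. The sketch below is not from the text; the matrices A and P are illustrative assumptions (P must be invertible). It forms B = P⁻¹AP and checks several entries of the table.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 4.0]])
P = np.array([[1.0, 1.0],
              [1.0, 2.0]])

B = np.linalg.inv(P) @ A @ P   # B is similar to A

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))        # same determinant
print(np.isclose(np.trace(A), np.trace(B)))                   # same trace
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))   # same rank
print(np.allclose(np.sort(np.linalg.eigvals(A)),
                  np.sort(np.linalg.eigvals(B))))             # same eigenvalues
```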

E X A M P L E 2 Determinant of a Linear Operator At the beginning of this section we showed that the matrices

represent the same linear operator relative to different bases, the first relative to the standard basis for and the second relative to the basis for which

This means that and must be similar matrices and hence must have the same similarity invariant properties. In particular, they must have the same determinant. We leave it for you to verify that

E X A M P L E 3 Eigenvalues and Bases for Eigenspaces Find the eigenvalues and bases for the eigenspaces of the linear operator

defined by

Solution We leave it for you to show that the matrix for T with respect to the standard basis is

The eigenvalues of T are and eigenspace of corresponding to

(Example 7 of Section 5.1). Also from that example, the has the basis , where

and the eigenspace of

corresponding to

The matrices

are the coordinate matrices relative to B of

,

, and

Thus, the eigenspace of T corresponding to

and that corresponding to

has the basis

, where

has the basis

has the basis

As a check, you can use the given formula for T to verify that

Concept Review • Similarity of matrices representing a linear operator • Similarity invariant • Determinant of a linear operator

Skills • Show that two matrices A and B represent the same linear operator, and find a transition matrix P so that .

• Find the eigenvalues and bases for the eigenspaces of a linear operator on a finite-dimensional vector space.

Exercise Set 8.5 In Exercises 1–7, find the matrix for T relative to the basis B, and use Theorem 8.5.2 to compute the matrix for T relative to the basis . 1.

is defined by

and

and

, where

Answer:

2.

is defined by

and

3.

and

, where

is the rotation about the origin through an angle of 45°; B and Answer:

4.

is defined by

and B is the standard basis for

and

, where

are the bases in Exercise 1.

5.

is the orthogonal projection on the

-plane, and B and

are as in Exercise 4.

Answer:

6.

is defined by

7.

is defined by ,

,

,

, and B and

are the bases in Exercise 2. , and

and

, where

.

Answer:

8. Find

.

(a)

, where

(b)

, where

(c)

, where

9. Prove that the following are similarity invariants: (a) rank (b) nullity (c) invertibility 10. Let

be the linear operator given by the formula

.

(a) Find a matrix for T relative to some convenient basis, and then use it to find the rank and nullity of T. (b) Use the result in part (a) to determine whether T is one-to-one. 11. In each part, find a basis for

relative to which the matrix for T is diagonal.

(a) (b)

Answer: (a) (b)

12. In each part, find a basis for

relative to which the matrix for T is diagonal.

(a)

(b)

(c)

13. Let

be defined by

(a) Find the eigenvalues of T. (b) Find bases for the eigenspaces of T. Answer: (a) (b) Basis for eigenspace corresponding to

14. Let

; basis for eigenspace corresponding to

be defined by

(a) Find the eigenvalues of T. (b) Find bases for the eigenspaces of T. 15. Let be an eigenvalue of a linear operator nonzero vectors in the kernel of .

. Prove that the eigenvectors of T corresponding to

16. (a) Prove that if A and B are similar matrices, then A² and B² are also similar. More generally, prove that Aᵏ and Bᵏ are similar if k is any positive integer. (b) If A² and B² are similar, must A and B be similar? Explain.
17. Let C and D be m×n matrices, and let B be a basis for a vector space V. Show that if C[x]_B = D[x]_B for all x in V, then C = D.
18. Find two nonzero matrices of the same size that are not similar, and explain why they are not.

matrices that are not similar, and explain why they are not.

19. Complete the proof below by justifying each step. Hypothesis: A and B are similar matrices. Conclusion: A and B have the same characteristic polynomial. Proof: 1.

are the and

2. 3. 4. 5. 6. 20. If A and B are similar matrices, say , then it follows from Exercise 19 that A and B have the same eigenvalues. Suppose that is one of the common eigenvalues and x is a corresponding eigenvector of A. See if you can find an eigenvector of B corresponding to (expressed in terms of , and P). 21. Since the standard basis for

is so simple, why would one want to represent a linear operator on

in another basis?

Answer: The choice of an appropriate basis can yield a better understanding of the linear operator. 22. Prove that trace is a similarity invariant.

True-False Exercises In parts (a)—(h) determine whether the statement is true or false, and justify your answer. (a) A matrix cannot be similar to itself. Answer: False (b) If A is similar to B, and B is similar to C, then A is similar to C. Answer: True (c) If A and B are similar and B is singular, then A is singular. Answer: True (d) If A and B are invertible and similar, then

and

are similar.

Answer: True (e) If for

and then

are linear operators, and if for every vector x in

with respect to two bases B and

Answer: True (f) If Answer:

is a linear operator, and if

with respect to two bases B and

for

, then

.

False (g) If on

is a linear operator, and if

with respect to some basis B for

, then T is the identity operator

.

Answer: True (h) If operator on

is a linear operator, and if .

Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

with respect to two bases B and

for

, then T is the identity

Chapter 8 Supplementary Exercises 1. Let A be an

matrix, B a nonzero matrix, and x a vector in a linear operator on ? Justify your answer.

expressed in matrix notation. Is

Answer: No.

, and if

, then

2. Let

(a) Show that

(b) Based on your answer to part (a), make a guess at the form of the matrix

for any positive integer n.

(c) By considering the geometric effect of multiplication by A, obtain the result in part (b) geometrically. 3. Let

be defined by

4. Let

. Show that T is not a linear operator on V.

be fixed vectors in , and let be the function defined by , where is the Euclidean inner product on .

(a) Show that T is a linear transformation. (b) Show that the matrix with row vectors 5. Let which

be the standard basis for

is the standard matrix for T. , and let

be the linear transformation for

(a) Find bases for the range and kernel of T. (b) Find the rank and nullity of T. Answer: (a)

and any two of for the kernel.

, and

form bases for the range;

is a basis

(b) 6. Suppose that vectors in

are denoted by

matrices, and define

by

(a) Find a basis for the kernel of T. (b) Find a basis for the range of T. 7. Let which

be a basis for a vector space V, and let

be the linear operator for

(a) Find the rank and nullity of T. (b) Determine whether T is one-to-one. Answer: (a) (b) T is not one-to-one. 8. Let V and W be vector spaces, let T, Define new transformations,

(a) Show that

, and be linear transformations from V to W, and let k be a scalar. and , by the formulas

and

are both linear transformations.

(b) Show that the set of all linear transformations from V to W with the operations in part (a) is a vector space. 9. Let A and B be similar matrices. Prove: (a)

and

are similar.

(b) If A and B are invertible, then

and

are similar.

10. Fredholm Alternative Theorem Let be a linear operator on an n-dimensional vector space. Prove that exactly one of the following statements holds: (i) The equation (ii) Nullity of 11. Let

has a solution for all vectors b in V. . be the linear operator defined by

Find the rank and nullity of T. Answer:

12. Prove: If A and B are similar matrices, and if B and C are also similar matrices, then A and C are similar matrices.

13. Let

be the linear operator that is defined by

respect to the standard basis for

. Find the matrix for L with

.

Answer:

14. Let

and

be the transition matrix from

be bases for a vector space V, and let

to B.

(a) Express

,

,

as linear combinations of

,

,

.

(b) Express

,

,

as linear combinations of

,

,

.

15. Let

be a basis for a vector space V, and let

Find

, where

be a linear operator for which

is the basis for V defined by

Answer:

16. Show that the matrices

are similar but that

are not. 17. Suppose that

Find

.

is a linear operator, and B is a basis for V for which

Answer:

18. Let

be a linear operator. Prove that T is one-to-one if and only if

.

19. (Calculus required) (a) Show that if

is twice differentiable, then the function

defined by

is a linear transformation.

(b) Find a basis for the kernel of D. (c) Show that the set of functions satisfying the equation , and find a basis for this subspace. Answer: (b) (c) 20. Let

be the function defined by the formula

(a) Find

.

(b) Show that T is a linear transformation. (c) Show that T is one-to-one. (d) Find

.

(e) Sketch the graph of the polynomial in part (d). 21. Let

,

and let

, and

be distinct real numbers such that be the function defined by the formula

(a) Show that T is a linear transformation. (b) Show that T is one-to-one.

is a two-dimensional subspace of

(c) Verify that if

,

, and

are any real numbers, then

where

(d) What relationship exists between the graph of the function and the points

,

, and

?

Answer: (b) The points are on the graph. 22. (Calculus required) Let and be continuous functions, and let V be the subspace of consisting of all twice differentiable functions. Define by

(a) Show that L is a linear transformation. (b) Consider the special case where

and

is in the kernel of L for all real values of 23. Calculus required Let relative to the basis

and

. Show that the function .

be the differentiation operator

. Show that the matrix for D

is

24. Calculus required It can be shown that for any real number c, the vectors

form a basis for basis. 25. Calculus required

. Find the matrix for the differentiation operator of Exercise 23 with respect to this be the integration transformation defined by

where .

. Find the matrix for J with respect to the standard bases for

Answer:

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

and

CHAPTER

9

Numerical Methods

CHAPTER CONTENTS 9.1. LU-Decompositions 9.2. The Power Method 9.3. Internet Search Engines 9.4. Comparison of Procedures for Solving Linear Systems 9.5. Singular Value Decomposition 9.6. Data Compression Using Singular Value Decomposition

INTRODUCTION This chapter is concerned with “numerical methods” of linear algebra, an area of study that encompasses techniques for solving large-scale linear systems and for finding numerical approximations of various kinds. It is not our objective to discuss algorithms and technical issues in fine detail, since there are many excellent books on the subject. Rather, we will be concerned with introducing some of the basic ideas and exploring important contemporary applications that rely heavily on numerical ideas—singular value decomposition and data compression. A computing utility such as MATLAB, Mathematica, or Maple is recommended for Section 9.2 to Section 9.6 .

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

9.1 LU-Decompositions Up to now, we have focused on two methods for solving linear systems, Gaussian elimination (reduction to row echelon form) and Gauss–Jordan elimination (reduction to reduced row echelon form). While these methods are fine for the small-scale problems in this text, they are not suitable for large-scale problems in which computer roundoff error, memory usage, and speed are concerns. In this section we will discuss a method for solving a linear system of n equations in n unknowns that is based on factoring its coefficient matrix into a product of lower and upper triangular matrices. This method, called “LU-decomposition,” is the basis for many computer algorithms in common use.

Solving Linear Systems by Factoring Our first goal in this section is to show how to solve a linear system factoring the coefficient matrix A into a product

of n equations in n unknowns by

(1) where L is lower triangular and U is upper triangular. Once we understand how to do this, we will discuss how to obtain the factorization itself. Assuming that we have somehow obtained the factorization in 1, the linear system following procedure, called LU-decomposition.

can be solved by the

The Method of LU-Decomposition Step 1. Rewrite the system

as (2)

Step 2. Define a new

matrix y by (3)

Step 3. Use 3 to rewrite 2 as

and solve this system for y.

Step 4. Substitute y in 3 and solve for .

This procedure, which is illustrated in Figure 9.1.1, replaces the single linear system systems

by a pair of linear

that must be solved in succession. However, since each of these systems has a triangular coefficient matrix, it generally turns out to involve no more computation to solve the two systems than to solve the original system

directly.

Figure 9.1.1
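A minimal sketch of Steps 3 and 4 in Python with NumPy (not part of the text): forward substitution solves the lower triangular system from the top down, and back substitution solves the upper triangular system from the bottom up. It assumes L and U are square arrays with nonzero diagonal entries.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve Ly = b for y, working from the top equation down."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve Ux = y for x, working from the bottom equation up."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

def solve_lu(L, U, b):
    """Solve Ax = b given the factorization A = LU."""
    y = forward_substitution(L, b)   # Step 3: solve Ly = b
    return back_substitution(U, y)   # Step 4: solve Ux = y
```

Once L and U are known, solve_lu(L, U, b) returns the same solution as solving Ax = b directly.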

E X A M P L E 1 Solving Ax = b by LU-Decomposition Later in this section we will derive the factorization

(4)

Use this result to solve the linear system

From 4 we can rewrite this system as

(5)

Historical Note In 1979 an important library of machine-independent linear algebra programs called LINPACK was developed at Argonne National Laboratories. Many of the programs in that library use the decomposition methods that we will study in this section. Variations of the LINPACK routines are used in many computer programs, including MATLAB, Mathematica, and Maple.

As specified in Step 2 above, let us define

,

, and

by the equation

(6)

which allows us to rewrite 5 as

(7)

or equivalently as

This system can be solved by a procedure that is similar to back substitution, except that we solve the equations from the top down instead of from the bottom up. This procedure, called forward substitution, yields (verify). As indicated in Step 4 above, we substitute these values into 6, which yields the linear system

or, equivalently,

Solving this system by back substitution yields (verify).

Alan Mathison Turing (1912–1954) Historical Note Although the ideas were known earlier, credit for popularizing the matrix formulation of the LU-decomposition is often given to the British mathematician Alan Turing for his work on the subject in 1948. Turing, one of the great geniuses of the twentieth century, is the founder of the field of artificial intelligence. Among his many accomplishments in that field, he developed the concept of an internally programmed computer before the practical technology had reached the point where the construction of

such a machine was possible. During World War II Turing was secretly recruited by the British government's Code and Cypher School at Bletchley Park to help break the Nazi Enigma codes; it was Turing's statistical approach that provided the breakthrough. In addition to being a brilliant mathematician, Turing was a world-class runner who competed successfully with Olympic-level competition. Sadly, Turing, a homosexual, was tried and convicted of “gross indecency” in 1952, in violation of the then-existing British statutes. Depressed, he committed suicide at age 41 by eating an apple laced with cyanide. [Image: Time & Life Pictures/Getty Images, Inc.]

Finding LU-Decompositions Example 1 makes it clear that after A is factored into lower and upper triangular matrices, the system be solved by one forward substitution and one back substitution. We will now show how to obtain such factorizations. We begin with some terminology.

can

DEFINITION 1 A factorization of a square matrix A as A = LU, where L is lower triangular and U is upper triangular, is called an LU-decomposition (or LU-factorization) of A.

Not every square matrix has an LU-decomposition. However, we will see that if it is possible to reduce a square matrix A to row echelon form by Gaussian elimination without performing any row interchanges, then A will have an LU-decomposition, though it may not be unique. To see why this is so, assume that A has been reduced to a row echelon form U using a sequence of row operations that does not include row interchanges. We know from Theorem 1.5.1 that these operations can be accomplished by multiplying A on the left by an appropriate sequence of elementary matrices; that is, there exist elementary matrices such that (8) Since elementary matrices are invertible, we can solve 8 for A as or more briefly as (9) where (10)

We now have all of the ingredients to prove the following result.

THEOREM 9.1.1 If A is a square matrix that can be reduced to a row echelon form U by Gaussian elimination without row interchanges, then A can be factored as , where L is a lower triangular matrix.

Proof Let L and U be the matrices in Formulas 10 and 8, respectively. The matrix U is upper triangular because it is a row echelon form of a square matrix (so all entries below its main diagonal are zero). To prove that L is lower triangular, it suffices to prove that each factor on the right side of 10 is lower triangular, since Theorem 1.7.1b will then imply that L itself is lower triangular. Since row interchanges are excluded, each results either by adding a scalar multiple of one row of an identity matrix to a row below or by multiplying one row of an identity matrix by a nonzero scalar. In either case, the resulting matrix is lower triangular and hence so is by Theorem 1.7.1d. This completes the proof.

E X A M P L E 2 An LU-Decomposition Find an LU-decomposition of

Solution To obtain an LU-decomposition, , we will reduce A to a row echelon form U using Gaussian elimination and then calculate L from 10. The steps are as follows:

and, from 10,

so

is an LU-decomposition of A.

Bookkeeping As Example 2 shows, most of the work in constructing an LU-decomposition is expended in calculating L. However, all this work can be eliminated by some careful bookkeeping of the operations used to reduce A to U. Because we are assuming that no row interchanges are required to reduce A to U, there are only two types of operations involved—multiplying a row by a nonzero constant, and adding a scalar multiple of one row to another. The first operation is used to introduce the leading 1's and the second to introduce zeros below the leading 1's. In Example 2, a multiplier of

was needed in Step 1 to introduce a leading 1 in the first row, and a multiplier of

was needed in Step 5 to introduce a leading 1 in the third row. No actual multiplier was required to introduce a leading 1 in the second row because it was already a 1 at the end of Step 2, but for convenience let us say that the multiplier was 1. Comparing these multipliers with the successive diagonal entries of L, we see that these diagonal entries are precisely the reciprocals of the multipliers used to construct U: (11) Also observe in Example 2 that to introduce zeros below the leading 1 in the first row, we used the operations

and to introduce the zero below the leading 1 in the second row, we used the operation Now note in 12 that in each position below the main diagonal of L, the entry is the negative of the multiplier in the operation that introduced the zero in that position in U: (12) This suggests the following procedure for constructing an LU-decomposition of a square matrix A, assuming that this matrix can be reduced to row echelon form without row interchanges.

Procedure for Constructing an LU-Decomposition Step 1. Reduce A to a row echelon form U by Gaussian elimination without row interchanges, keeping track of the multipliers used to introduce the leading 1's and the multipliers used to introduce the zeros below the leading 1's. Step 2. In each position along the main diagonal of L, place the reciprocal of the multiplier that introduced the leading 1 in that position in U. Step 3. In each position below the main diagonal of L, place the negative of the multiplier used to introduce the zero in that position in U. Step 4. Form the decomposition

.
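The four-step bookkeeping procedure translates directly into code. The sketch below is not from the text; it reduces A to a row echelon form U while recording the multipliers in L, so that the diagonal entries of L are the reciprocals of the leading-1 multipliers and the entries below the diagonal are the negatives of the elimination multipliers. It assumes that no row interchanges are required.

```python
import numpy as np

def lu_decompose(A):
    """LU-decomposition by the bookkeeping procedure above.
    Assumes all pivots are nonzero (no row interchanges needed)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.zeros((n, n))
    for k in range(n):
        pivot = U[k, k]
        if pivot == 0:
            raise ValueError("zero pivot; a row interchange would be required")
        L[k, k] = pivot            # reciprocal of the multiplier 1/pivot
        U[k, :] = U[k, :] / pivot  # introduce the leading 1 in row k
        for i in range(k + 1, n):
            m = U[i, k]            # the operation is: add (-m) times row k to row i
            L[i, k] = m            # negative of the multiplier -m
            U[i, :] = U[i, :] - m * U[k, :]
    return L, U
```

A quick check such as np.allclose(L @ U, A) confirms the factorization for a particular matrix.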

E X A M P L E 3 Constructing an LU-Decomposition Find an LU-decomposition of

Solution We will reduce A to a row echelon form U and at each step we will fill in an entry of L in accordance with the four-step procedure above.

Thus, we have constructed the LU-decomposition

We leave it for you to confirm this end result by multiplying the factors.

LU-Decompositions Are Not Unique In the absence of restrictions, LU-decompositions are not unique. For example, if

and L has nonzero diagonal entries, then we can shift the diagonal entries from the left factor to the right factor by writing

which is another LU-decomposition of A.

LDU-Decompositions The method we have described for computing LU-decompositions may result in an “asymmetry” in that the matrix U has 1's on the main diagonal but L need not. However, if it is preferred to have 1's on the main diagonal of the lower triangular factor, then we can “shift” the diagonal entries of L to a diagonal matrix D and write L as where is a lower triangular matrix with 1's on the main diagonal. For example, a general matrix with nonzero entries on the main diagonal can be factored as

lower triangular

Note that the columns of are obtained by dividing each entry in the corresponding column of L by the diagonal entry in the column. Thus, for example, we can rewrite 4 as

One can prove that if A is a square matrix that can be reduced to row echelon form without row interchanges, then A can be factored uniquely as where L is a lower triangular matrix with 1's on the main diagonal, D is a diagonal matrix, and U is an upper triangular matrix with 1's on the main diagonal. This is called the LDU-decomposition (or LDU-factorization) of A.
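Shifting the diagonal of L into a diagonal factor D is a one-line computation. The sketch below (not from the text) converts an LU-decomposition produced as above, with the pivots on the diagonal of L and 1's on the diagonal of U, into the LDU form.

```python
import numpy as np

def ldu_from_lu(L, U):
    """Write A = LU as A = L1 * D * U, where L1 is unit lower triangular.
    Assumes L has nonzero diagonal entries and U has 1's on its diagonal."""
    d = np.diag(L)
    D = np.diag(d)
    L1 = L / d          # divide each column of L by its diagonal entry
    return L1, D, U
```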

PLU-Decompositions Many computer algorithms for solving linear systems perform row interchanges to reduce roundoff error, in which

case the existence of an LU-decomposition is not guaranteed. However, it is possible to work around this problem by “preprocessing” the coefficient matrix A so that the row interchanges are performed prior to computing the LU-decomposition itself. More specifically, the idea is to create a matrix Q (called a permutation matrix) by multiplying, in sequence, those elementary matrices that produce the row interchanges and then execute them by computing the product QA. This product can then be reduced to row echelon form without row interchanges, so it is assured to have an LU-decomposition (13) Because the matrix Q is invertible (being a product of elementary matrices), the systems will have the same solutions. But it follows from 13 that the latter system can be rewritten as can be solved using LU-decomposition.

and and hence

It is common to see Equation 13 expressed as (14) in which

. This is called a PLU-decomposition (or PLU-factorization) of A.
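A sketch of the preprocessing idea (not from the text): build the permutation matrix Q by applying partial-pivoting row interchanges to an identity matrix while eliminating, so that QA = LU. In the notation of Formula 14, P = Q⁻¹ and A = PLU. The function below is a simplified illustration; production routines of the kind mentioned earlier handle this much more carefully.

```python
import numpy as np

def plu_decompose(A):
    """Partial-pivoting factorization QA = LU, with Q a permutation matrix,
    L unit lower triangular, and U upper triangular."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    Q = np.eye(n)
    for k in range(n - 1):
        # Pivot row: largest absolute entry in column k on or below the diagonal.
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:                           # apply the same swap to U, Q, and
            U[[k, p], :] = U[[p, k], :]      # the already-computed part of L
            Q[[k, p], :] = Q[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]
        for i in range(k + 1, n):
            m = U[i, k] / U[k, k]
            L[i, k] = m
            U[i, :] = U[i, :] - m * U[k, :]
    return Q, L, U
```

For a particular matrix, np.allclose(Q @ A, L @ U) should return True.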

Concept Review • LU-decomposition • LDU-decomposition • PLU-decomposition

Skills • Determine whether a square matrix has an LU-decomposition. • Find an LU-decomposition of a square matrix. • Use the method of LU-decomposition to solve linear systems. • Find the LDU-decomposition of a square matrix. • Find a PLU-decomposition of a square matrix.

Exercise Set 9.1 1. Use the method of Example 1 and the LU-decomposition

to solve the system

Answer:

2. Use the method of Example 1 and the LU-decomposition

to solve the system

In Exercises 3–10, find an LU-decomposition of the coefficient matrix, and then use the method of Example 1 to solve the system. 3. Answer:

4. 5.

Answer:

6.

7.

Answer:

8.

9.

Answer:

10.

11. Let

(a) Find an LU-decomposition of A. (b) Express A in the form , where is lower triangular with 1's along the main diagonal, upper triangular, and D is a diagonal matrix. (c) Express A in the form upper triangular.

, where

is lower triangular with 1's along the main diagonal and

Answer: (a)

(b)

(c)

In Exercises 12–13, find an LDU-decomposition of A 12. 13.

Answer:

14.

is is

(a) Show that the matrix

has no LU-decomposition. (b) Find a PLU-decomposition of this matrix. In Exercises 15–16, use the given PLU-decomposition of A to solve the linear system and solving this system by LU-decomposition.

by rewriting it as

15.

Answer:

16.

In Exercises 17–18, find a PLU-decomposition of A, and use it to solve the linear system of Exercises 15 and 16. 17.

Answer:

18.

19. Let

by the method

(a) Prove: If

, then the matrix A has a unique LU-decomposition with 1's along the main diagonal of L.

(b) Find the LU-decomposition described in part (a). Answer: (b)

20. Let be a linear system of n equations in n unknowns, and assume that A is an invertible matrix that can be reduced to row-echelon form without row interchanges. How many additions and multiplications are required to solve the system by the method of Example 1? 21. Prove: If A is any matrix, then A can be factored as , where L is lower triangular, U is upper triangular, and P can be obtained by interchanging the rows of appropriately. [Hint: Let U be a row echelon form of A, and let all row interchanges required in the reduction of A to U be performed first.]

True-False Exercises In parts (a)–(e) determine whether the statement is true or false, and justify your answer. (a) Every square matrix has an LU-decomposition. Answer: False (b) If a square matrix A is row equivalent to an upper triangular matrix U, then A has an LU-decomposition. Answer: False (c) If

are

lower triangular matrices, then the product

is lower triangular.

Answer: True (d) If a square matrix A has an LU-decomposition, then A has a unique LDU-decomposition. Answer: True (e) Every square matrix has a PLU-decomposition. Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

9.2 The Power Method The eigenvalues of a square matrix can, in theory, be found by solving the characteristic equation. However, this procedure has so many computational difficulties that it is almost never used in applications. In this section we will discuss an algorithm that can be used to approximate the eigenvalue with greatest absolute value and a corresponding eigenvector. This particular eigenvalue and its corresponding eigenvectors are important because they arise naturally in many iterative processes. The methods we will study in this section have recently been used to create Internet search engines such as Google. We will discuss this application in the next section.

The Power Method There are many applications in which some vector sequence

in

is multiplied repeatedly by an

matrix A to produce a

We call a sequence of this form a power sequence generated by A. In this section we will be concerned with the convergence of power sequences and how such sequences can be used to approximate eigenvalues and eigenvectors. For this purpose, we make the following definition.

DEFINITION 1 If the distinct eigenvalues of a matrix A are λ₁, λ₂, …, λₖ, and if |λ₁| is larger than |λ₂|, …, |λₖ|, then λ₁ is called a dominant eigenvalue of A. Any eigenvector corresponding to a dominant eigenvalue is called a dominant eigenvector of A.

E X A M P L E 1 Dominant Eigenvalues Some matrices have dominant eigenvalues and some do not. For example, if the distinct eigenvalues of a matrix are then is dominant since is greater than the absolute values of all the other eigenvalues; but if the distinct eigenvalues of a matrix are then so there is no eigenvalue whose absolute value is greater than the absolute value of all the other eigenvalues.

The most important theorems about convergence of power sequences apply to matrices with n linearly independent eigenvectors (symmetric matrices, for example), so we will limit our discussion to this case in this section.

THEOREM 9.2.1

Let A be a symmetric matrix with a positive* dominant eigenvalue If is a unit vector in not orthogonal to the eigenspace corresponding to then the normalized power sequence

that is

(1) converges to a unit dominant eigenvector, and the sequence (2) converges to the dominant eigenvalue λ.

Remark In the exercises we will ask you to show that 1 can also be expressed as

(3) This form of the power sequence expresses each iterate in terms of the starting vector predecessor.

rather than in terms of its

We will not prove Theorem 9.2.1, but we can make it plausible geometrically in the case where A is a symmetric matrix with distinct positive eigenvalues, and one of which is dominant. To be specific, assume that is dominant and Since we are assuming that A is symmetric and has distinct eigenvalues, it follows from Theorem 7.2.2 that the eigenspaces corresponding to and are perpendicular lines through the origin. Thus, the assumption that is a unit vector that is not orthogonal to the eigenspace corresponding to implies that does not lie in the eigenspace corresponding to To see the geometric effect of multiplying by A, it will be useful to split into the sum (4) where

and

are the orthogonal projections of

on the eigenspaces of

and

respectively (Figure 9.2.1a).

Figure 9.2.1 This enables us to express

as (5)

which tells us that multiplying by A “scales” the terms and in 4 by and respectively. However, is larger than , so the scaling is greater in the direction of than in the direction of Thus, multiplying by A “pulls” toward the eigenspace of and normalizing produces a vector , which is on the unit circle and is closer to the eigenspace of than (Figure 9.2.1b). Similarly, multiplying by A and normalizing produces a unit vector that is closer to the eigenspace of than . Thus, it seems reasonable that by repeatedly multiplying by A and normalizing we will produce a sequence of vectors that lie on the unit circle and converge to a unit vector in the eigenspace of (Figure 9.2.1c). Moreover, if converges to then it also seems reasonable that will converge to which is the dominant eigenvalue of A.

The Power Method with Euclidean Scaling Theorem 9.2.1 provides us with an algorithm for approximating the dominant eigenvalue and a corresponding unit eigenvector of a symmetric matrix provided the dominant eigenvalue is positive. This algorithm, called the power method with Euclidean scaling, is as follows:

The Power Method with Euclidean Scaling Step 1. Choose an arbitrary nonzero vector and normalize it, if need be, to obtain a unit vector Step 2. Compute Compute

and normalize it to obtain the first approximation to a dominant unit eigenvector. to obtain the first approximation to the dominant eigenvalue.

Step 3. Compute and normalize it to obtain the second approximation to a dominant unit eigenvector. Compute to obtain the second approximation to the dominant eigenvalue. Step 4. Compute and normalize it to obtain the third approximation to a dominant unit eigenvector. Compute to obtain the third approximation to the dominant eigenvalue. Continuing in this way will usually generate a sequence of better and better approximations to the dominant eigenvalue and a corresponding unit eigenvector.*

E X A M P L E 2 The Power Method with Euclidean Scaling Apply the power method with Euclidean scaling to

Stop at and compare the resulting approximations to the exact values of the dominant eigenvalue and eigenvector. Solution We will leave it for you to show that the eigenvalues of A are and and that the eigenspace corresponding to the dominant eigenvalue is the line represented by the parametric equations , , which we can write in vector form as (6)

Setting

yields the normalized dominant eigenvector

(7)

Now let us see what happens when we use the power method, starting with the unit vector

Thus, approximates the dominant eigenvalue to five decimal place accuracy and dominant eigenvector in 7 correctly to three decimal place accuracy.

.

approximates the

It is accidental that (the fifth approximation) produced five decimal place accuracy. In general, n iterations need not produce n decimal place accuracy.

The Power Method with Maximum Entry Scaling There is a variation of the power method in which the iterates, rather than being normalized at each stage, are scaled to make the maximum entry 1. To describe this method, it will be convenient to denote the maximum absolute value of the entries in a vector by . Thus, for example, if

then

. We will need the following variation of Theorem 9.2.1.

THEOREM 9.2.2 Let A be a symmetric matrix with a positive dominant* eigenvalue is not orthogonal to the eigenspace corresponding to then the sequence

If

is a nonzero vector in

that

(8) converges to an eigenvector corresponding to λ, and the sequence (9) converges to λ.

Remark In the exercises we will ask you to show that 8 can be written in the alternative form

(10) which expresses the iterates in terms of the initial vector We will omit the proof of this theorem, but if we accept that 8 converges to an eigenvector of A, then it is not hard to see why 9 converges to the dominant eigenvalue. For this purpose we note that each term in 9 is of the form (11) which is called a Rayleigh quotient of A. In the case where λ is an eigenvalue of A and the Rayleigh quotient is

Thus, if

converges to a dominant eigenvector

is a corresponding eigenvector,

then it seems reasonable that

which is the dominant eigenvalue. Theorem 9.2.2 produces the following algorithm, called the power method with maximum entry scaling.

The Power Method with Maximum Entry Scaling

Step 1. Choose an arbitrary nonzero vector Step 2. Compute and multiply it by the factor dominant eigenvector. Compute the Rayleigh quotient of dominant eigenvalue.

to obtain the first approximation to obtain the first approximation to the

to a

Step 3. Compute and scale it by the factor dominant eigenvector. Compute the Rayleigh quotient of dominant eigenvalue.

to obtain the second approximation to a to obtain the second approximation to the

Step 4. Compute and scale it by the factor dominant eigenvector. Compute the Rayleigh quotient of dominant eigenvalue.

to obtain the third approximation to a to obtain the third approximation to the

Continuing in this way will generate a sequence of better and better approximations to the dominant eigenvalue and a corresponding eigenvector.
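The same loop with maximum entry scaling is sketched below (not from the text): each iterate is divided by its largest absolute entry, and the eigenvalue approximation is the Rayleigh quotient 11.

```python
import numpy as np

def power_method_max_entry(A, x0, iterations=5):
    """Power method with maximum entry scaling."""
    x = np.array(x0, dtype=float)
    lam = (A @ x) @ x / (x @ x)
    for _ in range(iterations):
        Ax = A @ x
        x = Ax / np.max(np.abs(Ax))      # scale so the largest absolute entry is 1
        lam = (A @ x) @ x / (x @ x)      # Rayleigh quotient approximation
    return lam, x
```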

John William Strutt Rayleigh (1842–1919) Historical Note The British mathematical physicist John Rayleigh won the Nobel prize in physics in 1904 for his discovery of the inert gas argon. Rayleigh also made fundamental discoveries in acoustics and optics, and his work in wave phenomena enabled him to give the first accurate explanation of why the sky is blue. [Image: The Granger Collection, New York]

E X A M P L E 3 Example 2 Revisited Using Maximum Entry Scaling Apply the power method with maximum entry scaling to

Stop at and compare the resulting approximations to the exact values and to the approximations obtained in Example 2. Solution We leave it for you to confirm that

Thus, approximates the dominant eigenvalue correctly to five decimal places and approximates the dominant eigenvector

that results by taking

closely

in 6.

Whereas the power method with Euclidean scaling produces a sequence that approaches a unit dominant eigenvector, maximum entry scaling produces a sequence that approaches an eigenvector whose largest component is 1.

Rate of Convergence If A is a symmetric matrix whose distinct eigenvalues can be arranged so that then the “rate” at which the Rayleigh quotients converge to the dominant eigenvalue depends on the ratio ; that is, the convergence is slow when this ratio is near 1 and rapid when it is large—the greater the ratio, the more rapid the convergence. For example, if A is a symmetric matrix, then the greater the ratio the greater the

disparity between the scaling effects of and in Figure 9.2.1, and hence the greater the effect that multiplication by A has on pulling the iterates toward the eigenspace of . Indeed, the rapid convergence in Example 3 is due to the fact that which is considered to be a large ratio. In cases where the ratio is close to 1, the convergence of the power method may be so slow that other methods must be used.

Stopping Procedures If λ is the exact value of the dominant eigenvalue, and if a power method produces the approximation iteration, then we call

at the kth

(12) the relative error in example, if

. If this is expressed as a percentage, then it is called the percentage error in

and the approximation after three iterations is

For

then

In applications one usually knows the relative error E that can be tolerated in the dominant eigenvalue, so the goal is to stop computing iterates once the relative error in the approximation to that eigenvalue is less than E. However, there is a problem in computing the relative error from 12 in that the eigenvalue λ is unknown. To circumvent this problem, it is usual to estimate λ by and stop the computations when (13) The quantity on the left side of 13 is called the estimated relative error in estimated percentage error in .

and its percentage form is called the

E X A M P L E 4 Estimated Relative Error For the computations in Example 3, find the smallest value of k for which the estimated percentage error in is less than 0.1%. Solution The estimated percentage errors in the approximations in Example 3 are as follows:

Thus,

is the first approximation whose estimated percentage error is less than 0.1%.

Remark A rule for deciding when to stop an iterative process is called a stopping procedure. In the exercises, we will discuss stopping procedures for the power method that are based on the dominant eigenvector rather than the dominant eigenvalue.

Concept Review • Power sequence • Dominant eigenvalue • Dominant eigenvector • Power method with Euclidean scaling • Rayleigh quotient • Power method with maximum entry scaling • Relative error • Percentage error • Estimated relative error • Estimated percentage error • Stopping procedure

Skills • Identify the dominant eigenvalue of a matrix. • Use the power methods described in this section to approximate a dominant eigenvector. • Find the estimated relative and percentage errors associated with the power methods.

Exercise Set 9.2 In Exercises 1–2, the distinct eigenvalues of a matrix are given. Determine whether A has a dominant eigenvalue, and if so, find it. 1. (a) (b) Answer: (a)

dominant

(b) No dominant eigenvalue 2. (a) (b)

In Exercises 3–4, apply the power method with Euclidean scaling to the matrix A, starting with and stopping at Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding unit eigenvector.

.

3. Answer: ; dominant eigenvalue:

;

dominant eigenvector: 4.

In Exercises 5–6, apply the power method with maximum entry scaling to the matrix A, starting with and stopping at . Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding scaled eigenvector. 5. Answer:

dominant eigenvalue:

dominant eigenvector:

6.

7. Let

;

(a) Use the power method with maximum entry scaling to approximate a dominant eigenvector of A. Start with round off all computations to three decimal places, and stop after three iterations.

,

(b) Use the result in part (a) and the Rayleigh quotient to approximate the dominant eigenvalue of A. (c) Find the exact values of the eigenvector and eigenvalue approximated in parts (a) and (b). (d) Find the percentage error in the approximation of the dominant eigenvalue. Answer: (a) (b) (c)

Dominant eigenvalue:

; dominant eigenvector:

(d) 0.1% 8. Repeat the directions of Exercise 7 with

In Exercises 9–10, a matrix A with a dominant eigenvalue and a sequence , 9 and 10 to approximate the dominant eigenvalue and a corresponding eigenvector.

are given. Use Formulas

9.

Answer:

10.

11. Consider matrices

where is a unit vector and . Show that even though the matrix A is symmetric and has a dominant eigenvalue, the power sequence 1 in Theorem 9.2.1 does not converge. This shows that the requirement in that theorem that the dominant eigenvalue be positive is essential. 12. Use the power method with Euclidean scaling to approximate the dominant eigenvalue and a corresponding eigenvector of A. Choose your own starting vector, and stop when the estimated percentage error in the eigenvalue approximation is less than 0.1%.

(a)

(b)

13. Repeat Exercise 12, but this time stop when all corresponding entries in two successive eigenvector approximations differ by less than 0.01 in absolute value. Answer: (a) Starting with

, it takes 8 iterations.

Starting with

, it takes 8 iterations.

(b)

14. Repeat Exercise 12 using maximum entry scaling. 15. Prove: If A is a nonzero

matrix, then

and

have positive dominant eigenvalues.

16. (For readers familiar with proof by induction) Let A be an the sequence by

Prove by induction that

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

be a unit vector in

, and define

matrix, let

be a nonzero vector in

.

17. (For readers familiar with proof by induction) Let A be an define the sequence by

Prove by induction that

matrix, let

, and

9.3 Internet Search Engines Early search engines on the Internet worked by examining key words and phrases in pages and titles of posted documents. Today's most popular search engines use algorithms based on the power method to analyze hyperlinks (references) between documents. In this section we will discuss one of the ways in which this is done. Google, the most widely used engine for searching the Internet, was developed in 1996 by Larry Page and Sergey Brin while both were graduate students at Stanford University. Google uses a procedure known as the PageRank algorithm to analyze how documents at relevant sites reference one another. It then assigns to each site a PageRank score, stores those scores as a matrix, and uses the components of the dominant eigenvector of that matrix to establish the relative importance of the sites to the search. Google starts by using a standard text-based search engine to find an initial set of sites containing relevant pages. Since words can have multiple meanings, the set will typically contain irrelevant sites and miss others of relevance. To compensate for this, the set is expanded to a larger set S by adjoining all sites referenced by the pages in the sites of . The underlying assumption is that S will contain the most important sites relevant to the search. This process is then repeated a number of times to refine the search information still further. To be more specific, suppose that the search set S contains n sites, and define the adjacency matrix for S to be the

matrix

in which aᵢⱼ = 1 if site i references site j, and aᵢⱼ = 0 otherwise.

We will assume that no site references itself, so the diagonal entries of A will all be zero.

E X A M P L E 1 Adjacency Matrices Here is a typical adjacency matrix for a search set with four sites:

(1)

Thus, Site 1 references Sites 3 and 4, Site 2 references Site 1, and so forth.

There are two basic roles that a site can play in the search process—the site may be a hub, meaning that it references many other sites, or it may be an authority, meaning that it is referenced by many other sites. A given site will typically have both hub and authority properties in that it will both reference and be referenced. Historical Note The term google is a variation of the word googol, which stands for the number (1 followed by 100 zeros). This term was invented by the American mathematician Edward Kasner (1878–1955) in 1938, and the story goes that it came about when Kasner asked his eight-year-old nephew to give a name to a really big number—he responded with “googol.” Kasner then went on to define a googolplex to be (1 followed by googol zeros).

In general, if A is an adjacency matrix for n sites, then the column sums of A measure the authority aspect of the sites and the row sums of A measure their hub aspect. For example, the column sums of the matrix in 1 are 3, 1, 2, and 2, which means that Site 1 is referenced by three other sites, Site 2 is referenced by one other site, and so forth. Similarly, the row sums of the matrix in 1 are 2, 1, 2, and 3, so Site 1 references two other sites, Site 2 references one other site, and so forth. Accordingly, if A is an adjacency matrix, then we call the vector of row sums of A the initial hub vector of A, and we call the vector of column sums of A the initial authority vector of A. Alternatively, we can think of as the vector of row sums of , which turns out to be more convenient for computations. The entries in the hub vector are called hub weights and those in the authority vector authority weights.

E X A M P L E 2 Initial Hub and Authority Vectors of an Adjacency Matrix Find the initial hub and authority vectors for the adjacency matrix A in Example 1. Solution The row sums of A yield the initial hub vector

(2)

and the row sums of

(the column sums of A) yield the initial authority vector

(3)

The link counting in Example 2 suggests that Site 4 is the major hub and Site 1 is the greatest authority. However, counting links does not tell the whole story; for example, it seems reasonable that if Site 1 is to be considered the greatest authority, then more weight should be given to hubs that link to that site, and if Site 4 is to be considered a major

hub, then more weight should be given to sites to which it links. Thus, there is an interaction between hubs and authorities that needs to be accounted for in the search process. Accordingly, once the search engine has calculated the initial authority vector , it then uses the information in that vector to create new hub and authority vectors and using the formulas (4) The numerators in these formulas do the weighting, and the normalization serves to control the size of the entries. To understand how the numerators accomplish the weighting, view the product as a linear combination of the column vectors of A with coefficients from . For example, with the adjacency matrix in Example 1 and the authority vector calculated in Example 2 we have

Thus, we see that the links to each referenced site are weighted by the authority values in updated hub vector

The new hub vector

To control the size of the entries, the search engine normalizes

can now be used to update the authority vector using Formula 4. The product

Once the updated hub and authority vectors, generating the interrelated sequences

and

to produce the

performs the weighting, and the normalization controls the size:

, are obtained, the search engine repeats the process and computes a succession of hub and authority vectors, thereby

(5)

(6) However, each of these is a power sequence in disguise. For example, if we substitute the expression for

into the expression for

, then we obtain

which means that we can rewrite 6 as (7) Similarly, we can rewrite 5 as (8)

Remark In Exercise 15 of Section 9.2 you were asked to show that and both have positive dominant eigenvalues. That being the case, Theorem 9.2.1 ensures that 7 and 8 converge to the dominant eigenvectors of and , respectively. The entries in those eigenvectors are the authority and hub weights that Google uses to rank the search sites in order of importance as hubs and authorities.
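The following sketch (not from the text) carries out the iterations 7 and 8 for a given adjacency matrix, starting from the normalized vectors of column sums and row sums as in Example 3 below.

```python
import numpy as np

def hub_authority_weights(A, iterations=20):
    """Iterates 7 and 8: the authority vector is repeatedly multiplied by
    A^T A and the hub vector by A A^T, normalizing at each step."""
    a = A.sum(axis=0).astype(float)          # initial authority vector (column sums of A)
    a = a / np.linalg.norm(a)
    h = A.sum(axis=1).astype(float)          # initial hub vector (row sums of A)
    h = h / np.linalg.norm(h)
    for _ in range(iterations):
        a = A.T @ A @ a
        a = a / np.linalg.norm(a)            # authority iterate (Formula 7)
        h = A @ A.T @ h
        h = h / np.linalg.norm(h)            # hub iterate (Formula 8)
    return h, a
```

Applying np.argsort(-a) to the returned authority vector lists the site indices in decreasing order of authority (with 0-based indexing).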

E X A M P L E 3 A Ranking Procedure Suppose that a search engine produces 10 Internet sites in its search set and that the adjacency matrix for those sites is

Use Formula 7 to rank the sites in decreasing order of authority.

Solution We will take $\mathbf{a}_0$ to be the normalized vector of column sums of A and then compute the iterates in 7 until the authority vectors seem to stabilize. Carrying out these computations produces a succession of authority iterates whose entries change less and less from one iterate to the next. The small changes between the last two iterates suggest that they have stabilized near a dominant eigenvector of $A^TA$. From the entries in that eigenvector we conclude that Sites 1, 6, 7, and 9 are probably irrelevant to the search and that the remaining sites should be searched in decreasing order of importance as indicated by their authority weights.

Concept Review
• Adjacency matrix
• Hub vector
• Authority vector
• Hub weights
• Authority weights

Skills
• Find the initial hub and authority vectors of an adjacency matrix.
• Use the method of Example 3 to rank sites.

Exercise Set 9.3 In Exercises 1–2, find the initial hub and authority vectors for the given adjacency matrix A. 1.

Answer:

2.

In Exercises 3–4, find the updated hub and authority vectors $\mathbf{h}_1$ and $\mathbf{a}_1$ for the given adjacency matrix A.

3. The matrix in Exercise 1. Answer:

4. The matrix in Exercise 2. In Exercises 5–8, the adjacency matrix A of an Internet search engine is given. Use the method of Example 3 to rank the sites in decreasing order of authority. 5.

Answer: Sites 1 and 2 (tie); sites 3 and 4 are irrelevant 6.

7.

Answer: Site 2, site 3, site 4; sites 1 and 5 are irrelevant 8.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

9.4 Comparison of Procedures for Solving Linear Systems There is an old saying that "time is money." This is especially true in industry, where the cost of solving a linear system is generally determined by the time it takes for a computer to perform the required computations. This typically depends both on the speed of the computer processor and on the number of operations required by the algorithm. Thus, choosing the right algorithm has important financial implications in an industrial or research setting. In this section we will discuss some of the factors that affect the choice of algorithms for solving large-scale linear systems.

Flops and the Cost of Solving a Linear System
In computer jargon, an arithmetic operation ($+$, $-$, $\times$, $\div$) on two real numbers is called a flop, which is an acronym for "floating-point operation." The total number of flops required to solve a problem, which is called the cost of the solution, provides a convenient way of choosing between various algorithms for solving the problem. When needed, the cost in flops can be converted to units of time or money if the speed of the computer processor and the financial aspects of its operation are known. For example, many of today's personal computers are capable of performing in excess of 10 gigaflops per second (1 gigaflop $= 10^9$ flops). Thus, an algorithm that costs 1,000,000 flops would be executed in 0.0001 seconds on such a machine.

To illustrate how costs (in flops) can be computed, let us count the number of flops required to solve a linear system of n equations in n unknowns by Gauss–Jordan elimination. For this purpose we will need the following formulas for the sum of the first n positive integers and the sum of the squares of the first n positive integers:

$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \tag{1}$$

$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{2}$$

Let $A\mathbf{x} = \mathbf{b}$ be a linear system of n equations in n unknowns to be solved by Gauss–Jordan elimination (or, equivalently, by Gaussian elimination with back substitution). For simplicity, let us assume that A is invertible and that no row interchanges are required to reduce the augmented matrix to row echelon form. The diagrams that accompany the following analysis provide a convenient way of counting the operations required to introduce a leading 1 in the first row and then zeros below it. In our operation counts, we will lump divisions and multiplications together as "multiplications," and we will lump additions and subtractions together as "additions."

Step 1. It requires n flops (multiplications) to introduce the leading 1 in the first row.

Step 2. It requires n multiplications and n additions to introduce a zero below the leading 1, and there are $n - 1$ rows below the leading 1, so the number of flops required to introduce zeros below the leading 1 is $2n(n-1)$.

Column 1. Combining Steps 1 and 2, the number of flops required for column 1 is
$$n + 2n(n-1) = 2n^2 - n$$

Column 2. The procedure for column 2 is the same as for column 1, except that now we are dealing with one less row and one less column. Thus, the number of flops required to introduce the leading 1 in row 2 and the zeros below it can be obtained by replacing n by $n-1$ in the flop count for the first column. Thus, the number of flops required for column 2 is
$$2(n-1)^2 - (n-1)$$

Column 3. By the argument for column 2, the number of flops required for column 3 is
$$2(n-2)^2 - (n-2)$$

Total for all columns. The pattern should now be clear. The total number of flops required to create the n leading 1's and the associated zeros is
$$(2n^2 - n) + \left[2(n-1)^2 - (n-1)\right] + \left[2(n-2)^2 - (n-2)\right] + \cdots + (2\cdot 1^2 - 1)$$
which we can rewrite as
$$2\left(1^2 + 2^2 + \cdots + n^2\right) - \left(1 + 2 + \cdots + n\right)$$
or, on applying Formulas 1 and 2, as
$$\frac{n(n+1)(2n+1)}{3} - \frac{n(n+1)}{2} = \frac{2n^3}{3} + \frac{n^2}{2} - \frac{n}{6}$$

Next, let us count the number of operations required to complete the backward phase (the back substitution).

Column n. It requires $n-1$ multiplications and $n-1$ additions to introduce zeros above the leading 1 in the nth column, so the total number of flops required for that column is $2(n-1)$.

Column n − 1. The procedure is the same as for the nth column, except that now we are dealing with one less row. Thus, the number of flops required for the $(n-1)$st column is $2(n-2)$.

Column n − 2. By the argument for column $n-1$, the number of flops required for column $n-2$ is $2(n-3)$.

Total. The pattern should now be clear. The total number of flops required to complete the backward phase is
$$2(n-1) + 2(n-2) + \cdots + 2(1) = 2\left[(n-1) + (n-2) + \cdots + 1\right]$$
which we can rewrite using Formula 1 as
$$2\cdot\frac{(n-1)n}{2} = n^2 - n$$

In summary, we have shown that for Gauss–Jordan elimination the number of flops required for the forward and backward phases is

$$\text{forward phase:}\quad \frac{2n^3}{3} + \frac{n^2}{2} - \frac{n}{6} \tag{3}$$

$$\text{backward phase:}\quad n^2 - n \tag{4}$$

Thus, the total cost of solving a linear system by Gauss–Jordan elimination is

$$\left(\frac{2n^3}{3} + \frac{n^2}{2} - \frac{n}{6}\right) + \left(n^2 - n\right) = \frac{2n^3}{3} + \frac{3n^2}{2} - \frac{7n}{6} \tag{5}$$

Cost Estimates for Solving Large Linear Systems
It is a property of polynomials that for large values of the independent variable the term of highest power makes the major contribution to the value of the polynomial. Thus, for large linear systems we can use 3 and 4 to approximate the number of flops in the forward and backward phases as

$$\text{forward phase:}\quad \frac{2n^3}{3} + \frac{n^2}{2} - \frac{n}{6} \approx \frac{2n^3}{3} \tag{6}$$

$$\text{backward phase:}\quad n^2 - n \approx n^2 \tag{7}$$

This shows that it is more costly to execute the forward phase than the backward phase for large linear systems.

Indeed, the cost difference between the forward and backward phases can be enormous, as the next example shows.

E X A M P L E 1 Cost of Solving a Large Linear System
Approximate the time required to execute the forward and backward phases of Gauss–Jordan elimination for a system of 10,000 ($= 10^4$) equations in 10,000 unknowns using a computer that can execute 10 gigaflops per second.

Solution We have $n = 10^4$ for the given system, so from 6 and 7 the number of flops required for the forward and backward phases is approximately
$$\text{forward phase:}\quad \tfrac{2}{3}(10^4)^3 \approx 6.67 \times 10^{11} \text{ flops} \approx 667 \text{ gigaflops}$$
$$\text{backward phase:}\quad (10^4)^2 = 10^8 \text{ flops} = 0.1 \text{ gigaflop}$$
Thus, at 10 gigaflops/s the execution times for the forward and backward phases are approximately
$$\text{forward phase:}\quad \frac{6.67 \times 10^{11}}{10^{10}} \approx 66.7 \text{ s}, \qquad \text{backward phase:}\quad \frac{10^8}{10^{10}} = 0.01 \text{ s}$$
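Calculations like those in Example 1 are easy to automate. The following Python sketch simply evaluates the approximations 6 and 7; the function name and the processor speed are illustrative assumptions, not part of the text.

```python
def gauss_jordan_times(n, flops_per_second):
    """Approximate forward- and backward-phase times using Formulas (6) and (7)."""
    forward_flops = (2 / 3) * n**3    # Formula (6)
    backward_flops = n**2             # Formula (7)
    return forward_flops / flops_per_second, backward_flops / flops_per_second

# Example 1: n = 10,000 on a machine executing 10 gigaflops per second (10^10 flops/s)
t_forward, t_backward = gauss_jordan_times(10_000, 10e9)
print(f"forward phase:  {t_forward:.1f} s")    # about 66.7 s
print(f"backward phase: {t_backward:.4f} s")   # about 0.01 s
```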

We leave it as an exercise for you to confirm the results in Table 1.

Table 1 Approximate Cost for an $n \times n$ Matrix A with Large n

Algorithm | Cost in Flops
Gauss–Jordan elimination (forward phase) | $\approx \tfrac{2}{3}n^3$
Gauss–Jordan elimination (backward phase) | $\approx n^2$
LU-decomposition of A | $\approx \tfrac{2}{3}n^3$
Forward substitution to solve $L\mathbf{y} = \mathbf{b}$ | $\approx n^2$
Backward substitution to solve $U\mathbf{x} = \mathbf{y}$ | $\approx n^2$
$A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$ | $\approx 2n^3$
Compute $A^{-1}\mathbf{b}$ | $\approx 2n^3$

Considerations in Choosing an Algorithm for Solving a Linear System
For a single linear system of n equations in n unknowns, the methods of LU-decomposition and Gauss–Jordan elimination differ in bookkeeping but otherwise involve the same number of flops. Thus, neither method has a cost advantage over the other. However, LU-decomposition has other advantages that make it the method of choice:

• Gauss–Jordan elimination and Gaussian elimination both use the augmented matrix $[A \mid \mathbf{b}]$, so $\mathbf{b}$ must be known. In contrast, LU-decomposition uses only the matrix A, so once that decomposition is known it can be used with as many right-hand sides as are required, one at a time (see the sketch following this list).

• The LU-decomposition that is computed to solve $A\mathbf{x} = \mathbf{b}$ can be used to compute $A^{-1}$, if needed, with little additional work.

• For large linear systems in which computer memory is at a premium, one can dispense with the storage of the 1's and zeros that appear on or below the main diagonal of U, since those entries are known from the form of U. The space that this opens up can then be used to store the entries of L, thereby reducing the amount of memory required to solve the system.

• If A is a large matrix consisting mostly of zeros, and if the nonzero entries are concentrated in a "band" around the main diagonal, then there are techniques that can be used to reduce the cost of LU-decomposition, giving it an advantage over Gauss–Jordan elimination.

The cost in flops for Gaussian elimination is the same as that for the forward phase of Gauss–Jordan elimination.
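The first advantage listed above (reusing one factorization for many right-hand sides) is easy to see in code. The sketch below uses SciPy's `lu_factor`/`lu_solve`; the matrix and right-hand sides are made-up data used only for illustration.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Hypothetical coefficient matrix and two different right-hand sides
A = np.array([[4.0, 3.0, 0.0],
              [3.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b1 = np.array([24.0, 30.0, -24.0])
b2 = np.array([1.0, 0.0, 2.0])

lu, piv = lu_factor(A)        # the expensive step (about 2n^3/3 flops) is done once
x1 = lu_solve((lu, piv), b1)  # each additional solve costs only about 2n^2 flops
x2 = lu_solve((lu, piv), b2)
print(x1, x2)
```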

Concept Review
• Flop
• Formula for the sum of the first n positive integers
• Formula for the sum of the squares of the first n positive integers
• Cost in flops for solving large linear systems by various methods
• Cost in flops for inverting a matrix by row reduction
• Issues to consider when choosing an algorithm to solve a large linear system

Skills
• Compute the cost of solving a linear system by Gauss–Jordan elimination.
• Approximate the time required to execute the forward and backward phases of Gauss–Jordan elimination.
• Approximate the time required to find an LU-decomposition of a matrix.
• Approximate the time required to find the inverse of an invertible matrix.

Exercise Set 9.4
1. A certain computer can execute 10 gigaflops per second. Use Formula 5 to find the time required to solve the system using Gauss–Jordan elimination.
(a) A system of 1000 equations in 1000 unknowns.
(b) A system of 10,000 equations in 10,000 unknowns.
(c) A system of 100,000 equations in 100,000 unknowns.

Answer:
(a) approximately 0.067 s
(b) approximately 66.7 s
(c) approximately $6.67 \times 10^4$ s, or about 18.5 hours

2. A certain computer can execute 100 gigaflops per second. Use Formula 5 to find the time required to solve the system using Gauss–Jordan elimination.
(a) A system of 10,000 equations in 10,000 unknowns.
(b) A system of 100,000 equations in 100,000 unknowns.
(c) A system of 1,000,000 equations in 1,000,000 unknowns.

3. Today's personal computers can execute 70 gigaflops per second. Use Table 1 to estimate the time required to perform the following operations on the invertible 10,000 × 10,000 matrix A.
(a) Execute the forward phase of Gauss–Jordan elimination.
(b) Execute the backward phase of Gauss–Jordan elimination.
(c) Find an LU-decomposition of A.
(d) Find $A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$.

Answer:
(a) approximately 9.5 s
(b) approximately 0.0014 s
(c) approximately 9.5 s
(d) approximately 28.6 s

4. The IBM Roadrunner computer can operate at speeds in excess of 1 petaflop per second (1 petaflop $= 10^{15}$ flops). Use Table 1 to estimate the time required to perform the following operations on the invertible matrix A.
(a) Execute the forward phase of Gauss–Jordan elimination.
(b) Execute the backward phase of Gauss–Jordan elimination.
(c) Find an LU-decomposition of A.
(d) Find $A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$.

5. (a) Approximate the time required to execute the forward phase of Gauss–Jordan elimination for a system of 100,000 equations in 100,000 unknowns using a computer that can execute 1 gigaflop per second. Do the same for the backward phase. (See Table 1.)
(b) How many gigaflops per second must a computer be able to execute to find the LU-decomposition of a 10,000 × 10,000 matrix in less than 0.5 s? (See Table 1.)

Answer:
(a) approximately $6.67 \times 10^5$ s for the forward phase, 10 s for the backward phase
(b) 1334

6. About how many teraflops per second must a computer be able to execute to find the inverse of a matrix of size          in less than 0.5 s? (1 teraflop $= 10^{12}$ flops.)

In Exercises 7–10, A and B are $n \times n$ matrices and c is a real number.

7. How many flops are required to compute          ?
Answer:

8. How many flops are required to compute          ?

9. How many flops are required to compute          ?
Answer:

10. If A is a diagonal matrix and k is a positive integer, how many flops are required to compute          ?

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

9.5 Singular Value Decomposition In this section we will discuss an extension of the diagonalization theory for symmetric matrices to general matrices. The results that we will develop in this section have applications to compression, storage, and transmission of digitized information and form the basis for many of the best computational algorithms that are currently available for solving linear systems.

Decompositions of Square Matrices
We saw in Formula 2 of Section 7.2 that every symmetric matrix A can be expressed as
$$A = PDP^T \tag{1}$$
where P is an orthogonal matrix of eigenvectors of A, and D is the diagonal matrix whose diagonal entries are the eigenvalues corresponding to the column vectors of P. In this section we will call 1 an eigenvalue decomposition of A (abbreviated EVD of A).

If an $n \times n$ matrix A is not symmetric, then it does not have an eigenvalue decomposition, but it does have a Hessenberg decomposition $A = PHP^T$ in which P is an orthogonal matrix and H is in upper Hessenberg form (Theorem 7.2.4). Moreover, if A has real eigenvalues, then it has a Schur decomposition $A = PSP^T$ in which P is an orthogonal matrix and S is upper triangular (Theorem 7.2.3).

The eigenvalue, Hessenberg, and Schur decompositions are important in numerical algorithms not only because the matrices D, H, and S have simpler forms than A, but also because the orthogonal matrices that appear in these factorizations do not magnify roundoff error. To see why this is so, suppose that $\mathbf{x}$ is a column vector whose entries are known exactly and that $\tilde{\mathbf{x}}$ is the vector that results when roundoff error is present in the entries of $\mathbf{x}$. If P is an orthogonal matrix, then the length-preserving property of orthogonal transformations implies that
$$\|P\tilde{\mathbf{x}} - P\mathbf{x}\| = \|\tilde{\mathbf{x}} - \mathbf{x}\|$$
which tells us that the error in approximating $P\mathbf{x}$ by $P\tilde{\mathbf{x}}$ has the same magnitude as the error in approximating $\mathbf{x}$ by $\tilde{\mathbf{x}}$.

There are two main paths that one might follow in looking for other kinds of decompositions of a general square matrix A: One might look for decompositions of the form
$$A = PJP^{-1}$$
in which P is invertible but not necessarily orthogonal, or one might look for decompositions of the form
$$A = U\Sigma V^T$$
in which U and V are orthogonal but not necessarily the same. The first path leads to decompositions in which J is either diagonal or a certain kind of block diagonal matrix, called a Jordan canonical form in honor of the French mathematician Camille Jordan (see p. 510). Jordan canonical forms, which we will not consider in this text, are important theoretically and in certain applications, but they are of lesser importance numerically because of the roundoff difficulties that result from the lack of orthogonality in P. In this section we will focus on the second path.

Singular Values
Since matrix products of the form $A^TA$ will play an important role in our work, we will begin with two basic theorems about them.

THEOREM 9.5.1
If A is an $m \times n$ matrix, then:
(a) A and $A^TA$ have the same null space.
(b) A and $A^TA$ have the same row space.
(c) $A^T$ and $A^TA$ have the same column space.
(d) A and $A^TA$ have the same rank.

We will prove part (a) and leave the remaining proofs for the exercises.

Proof (a) We must show that every solution of $A\mathbf{x} = \mathbf{0}$ is a solution of $A^TA\mathbf{x} = \mathbf{0}$, and conversely. If $\mathbf{x}_0$ is any solution of $A\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is also a solution of $A^TA\mathbf{x} = \mathbf{0}$ since
$$A^TA\mathbf{x}_0 = A^T(A\mathbf{x}_0) = A^T\mathbf{0} = \mathbf{0}$$
Conversely, if $\mathbf{x}_0$ is any solution of $A^TA\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is in the null space of $A^TA$ and hence is orthogonal to all vectors in the row space of $A^TA$ by part (q) of Theorem 4.8.10. However, $A^TA$ is symmetric, so $\mathbf{x}_0$ is also orthogonal to every vector in the column space of $A^TA$. In particular, $\mathbf{x}_0$ must be orthogonal to the vector $A^TA\mathbf{x}_0$; that is,
$$\mathbf{x}_0 \cdot (A^TA\mathbf{x}_0) = 0$$
Using the first formula in Table 1 of Section 3.2 and properties of the transpose operation, we can rewrite this as
$$\mathbf{x}_0^TA^TA\mathbf{x}_0 = (A\mathbf{x}_0)^T(A\mathbf{x}_0) = \|A\mathbf{x}_0\|^2 = 0$$
which implies that $A\mathbf{x}_0 = \mathbf{0}$, thereby proving that $\mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{0}$.

THEOREM 9.5.2
If A is an $m \times n$ matrix, then:
(a) $A^TA$ is orthogonally diagonalizable.
(b) The eigenvalues of $A^TA$ are nonnegative.

Proof (a) The matrix $A^TA$, being symmetric, is orthogonally diagonalizable by Theorem 7.2.1.

Proof (b) Since $A^TA$ is orthogonally diagonalizable, there is an orthonormal basis for $R^n$ consisting of eigenvectors of $A^TA$, say $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$. If we let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the corresponding eigenvalues, then for $1 \le i \le n$ we have
$$\|A\mathbf{v}_i\|^2 = A\mathbf{v}_i \cdot A\mathbf{v}_i = \mathbf{v}_i \cdot A^TA\mathbf{v}_i = \mathbf{v}_i \cdot \lambda_i\mathbf{v}_i = \lambda_i(\mathbf{v}_i \cdot \mathbf{v}_i) = \lambda_i\|\mathbf{v}_i\|^2 = \lambda_i$$
It follows from this relationship that $\lambda_i \ge 0$.

DEFINITION 1 If A is an $m \times n$ matrix, and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A^TA$, then the numbers
$$\sigma_1 = \sqrt{\lambda_1},\quad \sigma_2 = \sqrt{\lambda_2},\quad \ldots,\quad \sigma_n = \sqrt{\lambda_n}$$
are called the singular values of A.

We will assume throughout this section that the eigenvalues of $A^TA$ are named so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and hence that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$.

E X A M P L E 1 Singular Values
Find the singular values of the given matrix A.

Solution The first step is to find the eigenvalues of the matrix $A^TA$. The characteristic polynomial of $A^TA$ is computed and factored to obtain the eigenvalues $\lambda_1$ and $\lambda_2$ of $A^TA$; the singular values of A, in order of decreasing size, are then $\sigma_1 = \sqrt{\lambda_1}$ and $\sigma_2 = \sqrt{\lambda_2}$.
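The procedure of Example 1 (form $A^TA$, find its eigenvalues, take square roots) is easy to check numerically. The matrix in the sketch below is a made-up example, not the matrix of Example 1.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 2.0]])                # made-up matrix, not the one in Example 1

eigvals = np.linalg.eigvalsh(A.T @ A)      # eigenvalues of the symmetric matrix A^T A
singular_values = np.sqrt(np.sort(eigvals)[::-1])   # sigma_i = sqrt(lambda_i), decreasing

print(singular_values)
print(np.linalg.svd(A, compute_uv=False))  # should agree with the values above
```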

Singular Value Decomposition
Before turning to the main result in this section, we will find it useful to extend the notion of a "main diagonal" to matrices that are not square. We define the main diagonal of an $m \times n$ matrix to be the line of entries shown in Figure 9.5.1—it starts at the upper left corner and extends diagonally as far as it can go. We will refer to the entries on the main diagonal as the diagonal entries.

Figure 9.5.1

We are now ready to consider the main result in this section, which is concerned with a specific way of factoring a general $m \times n$ matrix A. This factorization, called the singular value decomposition (abbreviated SVD), will be given in two forms, a brief form that captures the main idea, and an expanded form that spells out the details. The proof is given at the end of this section.

THEOREM 9.5.3 Singular Value Decomposition
If A is an $m \times n$ matrix, then A can be expressed in the form
$$A = U\Sigma V^T$$
where U and V are orthogonal matrices and $\Sigma$ is an $m \times n$ matrix whose diagonal entries are the singular values of A and whose other entries are zero.

Harry Bateman (1882–1946) Historical Note The term singular value is apparently due to the British-born mathematician Harry Bateman, who used it in a research paper published in 1908. Bateman emigrated to the United States in 1910, teaching at Bryn Mawr College, Johns Hopkins University, and finally at the California Institute of Technology. Interestingly, he was awarded his Ph.D. in 1913 by Johns Hopkins at which point in time he was already an eminent mathematician with 60 publications to his name. [Image: Courtesy of the Archives, California Institute of Technology]

THEOREM 9.5.4 Singular Value Decomposition (Expanded Form)
If A is an $m \times n$ matrix of rank k, then A can be factored as
$$A = U\Sigma V^T = \big[\,\mathbf{u}_1\ \cdots\ \mathbf{u}_k \mid \mathbf{u}_{k+1}\ \cdots\ \mathbf{u}_m\,\big] \begin{bmatrix} D & 0\\ 0 & 0 \end{bmatrix} \big[\,\mathbf{v}_1\ \cdots\ \mathbf{v}_k \mid \mathbf{v}_{k+1}\ \cdots\ \mathbf{v}_n\,\big]^T$$
where $D = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k)$, in which U, $\Sigma$, and V have sizes $m \times m$, $m \times n$, and $n \times n$, respectively, and in which:

(a) $V = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_n]$ orthogonally diagonalizes $A^TA$.

(b) The nonzero diagonal entries of $\Sigma$ are $\sigma_1 = \sqrt{\lambda_1},\ \sigma_2 = \sqrt{\lambda_2},\ \ldots,\ \sigma_k = \sqrt{\lambda_k}$, where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the nonzero eigenvalues of $A^TA$ corresponding to the column vectors of V.

(c) The column vectors of V are ordered so that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0$.

(d) $\mathbf{u}_i = \dfrac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \dfrac{1}{\sigma_i}A\mathbf{v}_i \quad (i = 1, 2, \ldots, k)$

(e) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ is an orthonormal basis for col(A).

(f) $\{\mathbf{u}_1, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_m\}$ is an extension of $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ to an orthonormal basis for $R^m$.

The vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m$ are called the left singular vectors of A, and the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are called the right singular vectors of A.
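The relationships in Theorem 9.5.4 can be verified numerically. The sketch below uses NumPy's SVD routine on an arbitrary made-up matrix (not the matrix of the examples in this section) and checks that $A = U\Sigma V^T$ and that V diagonalizes $A^TA$.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])              # arbitrary 3 x 2 example matrix

U, s, Vt = np.linalg.svd(A)              # s holds the singular values, largest first
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))                     # A = U Sigma V^T
print(np.allclose(Vt @ A.T @ A @ Vt.T, np.diag(s**2)))    # V^T (A^T A) V = diag(sigma_i^2)
```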

E X A M P L E 2 Singular Value Decomposition if A Is Not Square Find a singular value decomposition of the matrix

Solution We showed in Example 1 that the eigenvalues of $A^TA$ are $\lambda_1$ and $\lambda_2$ and that the corresponding singular values of A are $\sigma_1$ and $\sigma_2$. We leave it for you to verify that $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors corresponding to $\lambda_1$ and $\lambda_2$, respectively, and that $V = [\mathbf{v}_1\ \mathbf{v}_2]$ orthogonally diagonalizes $A^TA$. From part (d) of Theorem 9.5.4, the vectors
$$\mathbf{u}_1 = \frac{1}{\sigma_1}A\mathbf{v}_1 \quad\text{and}\quad \mathbf{u}_2 = \frac{1}{\sigma_2}A\mathbf{v}_2$$
are two of the three column vectors of U. Note that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthonormal, as expected. We could extend the set $\{\mathbf{u}_1, \mathbf{u}_2\}$ to an orthonormal basis for $R^3$. However, the computations will be easier if we first remove the messy radicals by multiplying $\mathbf{u}_1$ and $\mathbf{u}_2$ by appropriate scalars. Thus, we will look for a unit vector $\mathbf{u}_3$ that is orthogonal to both of the resulting vectors. To satisfy these two orthogonality conditions, the vector $\mathbf{u}_3$ must be a solution of the homogeneous linear system obtained from those conditions.

We leave it for you to show that a general solution of this system is

Normalizing the vector on the right yields

Thus, the singular value decomposition of A is

You may want to confirm the validity of this equation by multiplying out the matrices on the right side.

Eugenio Beltrami (1835–1900)

Camille Jordan (1838–1922)

Herman Klaus Weyl (1885–1955)

Gene H. Golub (1932–) Historical Note The theory of singular value decompositions can be traced back to the work of five people: the Italian mathematician Eugenio Beltrami, the French mathematician Camille Jordan, the English mathematician James Sylvester (see p. 34), and the German mathematicians Erhard Schmidt (see p. 360) and the mathematician Herman Weyl. More recently, the pioneering efforts of the American mathematician Gene Golub produced a stable and efficient algorithm for computing it. Beltrami and Jordan were the progenitors of the decomposition—Beltrami gave a proof of the result for real, invertible matrices with distinct singular values in 1873. Subsequently, Jordan refined the theory and eliminated the unnecessary restrictions imposed by Beltrami. Sylvester, apparently unfamiliar with the work of Beltrami and Jordan, rediscovered the result in 1889 and suggested its importance. Schmidt was the first person to show that the singular value decomposition could be used to approximate a matrix by another matrix with lower rank, and, in so doing, he transformed it from a mathematical curiosity to an important practical tool. Weyl showed how to find the lower rank approximations in the presence of error. [Images: wikipedia (Beltrami); The Granger Collection, New York (Jordan); Courtesy Electronic Publishing Services, Inc., New York City (Weyl; wikipedia (Golub)]

OPTIONAL

We conclude this section with an optional proof of Theorem 9.5.4.

Proof of Theorem 9.5.4 For notational simplicity we will prove this theorem in the case where A is an $n \times n$ matrix. To modify the argument for an $m \times n$ matrix you need only make the notational adjustments required to account for the possibility that $m > n$ or $m < n$.

The matrix $A^TA$ is symmetric, so it has an eigenvalue decomposition
$$A^TA = VDV^T$$
in which the column vectors of $V = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_n]$ are unit eigenvectors of $A^TA$, and D is a diagonal matrix whose successive diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A^TA$ corresponding in succession to the column vectors of V. Since A is assumed to have rank k, it follows from Theorem 9.5.1 that $A^TA$ also has rank k. It follows as well that D has rank k, since it is similar to $A^TA$ and rank is a similarity invariant. Thus, D can be expressed in the form
$$D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_k, 0, \ldots, 0) \tag{2}$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > 0$. Now let us consider the set of image vectors
$$\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n\} \tag{3}$$
This is an orthogonal set, for if $i \ne j$, then the orthogonality of $\mathbf{v}_i$ and $\mathbf{v}_j$ implies that
$$A\mathbf{v}_i \cdot A\mathbf{v}_j = \mathbf{v}_i \cdot A^TA\mathbf{v}_j = \mathbf{v}_i \cdot \lambda_j\mathbf{v}_j = \lambda_j(\mathbf{v}_i \cdot \mathbf{v}_j) = 0$$
Moreover, the first k vectors in 3 are nonzero since we showed in the proof of Theorem 9.5.2b that $\|A\mathbf{v}_i\|^2 = \lambda_i$ for $i = 1, 2, \ldots, n$, and we have assumed that the first k diagonal entries in 2 are positive. Thus,
$$S = \{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_k\}$$
is an orthogonal set of nonzero vectors in the column space of A. But the column space of A has dimension k since $\operatorname{rank}(A) = k$, and hence S, being a linearly independent set of k vectors, must be an orthogonal basis for col(A). If we now normalize the vectors in S, we will obtain an orthonormal basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ for col(A) in which
$$\mathbf{u}_i = \frac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \frac{1}{\sigma_i}A\mathbf{v}_i \quad (i = 1, 2, \ldots, k)$$
or, equivalently, in which
$$A\mathbf{v}_i = \sigma_i\mathbf{u}_i \quad (i = 1, 2, \ldots, k) \tag{4}$$
It follows from Theorem 6.3.6 that we can extend this to an orthonormal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_n\}$ for $R^n$. Now let U be the orthogonal matrix
$$U = [\mathbf{u}_1\ \mathbf{u}_2\ \cdots\ \mathbf{u}_n]$$
and let $\Sigma$ be the diagonal matrix
$$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$$
It follows from 4, and the fact that $A\mathbf{v}_i = \mathbf{0}$ for $i > k$, that
$$AV = U\Sigma$$
which we can rewrite using the orthogonality of V as $A = U\Sigma V^T$.

Concept Review
• Eigenvalue decomposition
• Hessenberg decomposition
• Schur decomposition
• Magnification of roundoff error
• Properties that A and $A^TA$ have in common
• $A^TA$ is orthogonally diagonalizable
• Eigenvalues of $A^TA$ are nonnegative
• Singular values
• Diagonal entries of a matrix that is not square
• Singular value decomposition

Skills
• Find the singular values of an $m \times n$ matrix.
• Find a singular value decomposition of an $m \times n$ matrix.

Exercise Set 9.5

In Exercises 1–4, find the distinct singular values of A.
1.
Answer:

2. 3. Answer:

4.

In Exercises 5–12, find a singular value decomposition of A. 5. Answer:

6. 7. Answer:

8. 9.

Answer:

10. 11.

Answer:

12.

13. Prove: If A is an $m \times n$ matrix, then $A^TA$ and $AA^T$ have the same rank.

14. Prove part (d) of Theorem 9.5.1 by using part (a) of the theorem and the fact that A and $A^TA$ have n columns.

15. (a) Prove part (b) of Theorem 9.5.1 by first showing that row($A^TA$) is a subspace of row(A).
(b) Prove part (c) of Theorem 9.5.1 by using part (b).

16. Let $T: R^n \to R^m$ be a linear transformation whose standard matrix A has the singular value decomposition $A = U\Sigma V^T$, and let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ and $\mathbf{u}_1, \ldots, \mathbf{u}_m$ be the column vectors of V and U, respectively. Show that $T(\mathbf{v}_i) = \sigma_i\mathbf{u}_i$ for $i = 1, 2, \ldots, k$.

17. Show that the singular values of $A^TA$ are the squares of the singular values of A.

18. Show that if $A = U\Sigma V^T$ is a singular value decomposition of A, then U orthogonally diagonalizes $AA^T$.

True-False Exercises In parts (a)–(g) determine whether the statement is true or false, and justify your answer. (a) If A is an Answer:

matrix, then

is an

matrix

.

False (b) If A is an

matrix, then

is a symmetric matrix.

Answer: True (c) If A is an

matrix, then the eigenvalues of

are positive real numbers.

Answer: False (d) If A is an

matrix, then A is orthogonally diagonalizable.

Answer: False (e) If A is an

matrix, then

is orthogonally diagonalizable.

Answer: True (f) The eigenvalues of

are the singular values of A.

Answer: False (g) Every

matrix has a singular value decomposition.

Answer: True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

9.6 Data Compression Using Singular Value Decomposition Efficient transmission and storage of large quantities of digital data has become a major problem in our technological world. In this section we will discuss the role that singular value decomposition plays in compressing digital data so that it can be transmitted more rapidly and stored in less space. We assume here that you have read Section 9.5 .

Reduced Singular Value Decomposition
Algebraically, the zero rows and columns of the matrix $\Sigma$ in Theorem 9.5.4 are superfluous and can be eliminated by multiplying out the expression $U\Sigma V^T$ using block multiplication and the partitioning shown in that formula. The products that involve zero blocks as factors drop out, leaving

$$A = [\mathbf{u}_1\ \mathbf{u}_2\ \cdots\ \mathbf{u}_k] \begin{bmatrix} \sigma_1 & 0 & \cdots & 0\\ 0 & \sigma_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_k \end{bmatrix} \begin{bmatrix} \mathbf{v}_1^T\\ \mathbf{v}_2^T\\ \vdots\\ \mathbf{v}_k^T \end{bmatrix} \tag{1}$$

which is called a reduced singular value decomposition of A. In this text we will denote the matrices on the right side of 1 by $U_1$, $\Sigma_1$, and $V_1^T$, respectively, and we will write this equation as

$$A = U_1\Sigma_1V_1^T \tag{2}$$

Note that the sizes of $U_1$, $\Sigma_1$, and $V_1$ are $m \times k$, $k \times k$, and $n \times k$, respectively, and that the matrix $\Sigma_1$ is invertible, since its diagonal entries are positive. If we multiply out the right side of 1 using the column-row rule, then we obtain

$$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T \tag{3}$$

which is called a reduced singular value expansion of A. This result applies to all matrices, whereas the spectral decomposition [Formula 7 of Section 7.2] applies only to symmetric matrices.

Remark It can be proved that an $m \times n$ matrix M has rank 1 if and only if it can be factored as $M = \mathbf{u}\mathbf{v}^T$, where $\mathbf{u}$ is a column vector in $R^m$ and $\mathbf{v}$ is a column vector in $R^n$. Thus, a reduced singular value expansion expresses a matrix A of rank k as a linear combination of k rank 1 matrices.
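Both 1 and 3 can be formed from a full SVD by discarding the columns and rows that correspond to zero singular values. The following NumPy sketch illustrates this on a made-up rank-deficient matrix; the tolerance used to decide which singular values count as zero is an arbitrary choice.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])          # made-up matrix of rank 2

U, s, Vt = np.linalg.svd(A)
k = np.sum(s > 1e-10)                     # numerical rank

# Reduced SVD factors as in (1) and (2)
U1, S1, V1t = U[:, :k], np.diag(s[:k]), Vt[:k, :]
print(np.allclose(A, U1 @ S1 @ V1t))

# Reduced singular value expansion (3): a sum of k rank-1 matrices
expansion = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
print(np.allclose(A, expansion))
```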

E X A M P L E 1 Reduced Singular Value Decomposition Find a reduced singular value decomposition and a reduced singular value expansion of the matrix

Solution In Example 2 of Section 9.5 we found the singular value decomposition

(4)

Since A has rank 2 (verify), it follows from 1 with $k = 2$ that the reduced singular value decomposition of A corresponding to 4 is

This yields the reduced singular value expansion

Note that the matrices in the expansion have rank 1, as expected.

Data Compression and Image Processing Singular value decompositions can be used to “compress” visual information for the purpose of reducing its required storage space and speeding up its electronic transmission. The first step in compressing a visual image is to represent it as a numerical matrix from which the visual image can be recovered when needed. For example, a black and white photograph might be scanned as a rectangular array of pixels (points) and then stored as a matrix A by assigning each pixel a numerical value in accordance with its gray level. If 256 different gray levels are used (0 = white to 255 = black), then the entries in the matrix would be integers between 0 and 255. The image can be recovered from the matrix A by printing or displaying the pixels with their assigned gray levels.

Historical Note In 1924 the U.S. Federal Bureau of Investigation (FBI) began collecting fingerprints and handprints and now has more than 30 million such prints in its files. To reduce the storage cost, the FBI began working with the Los Alamos National Laboratory, the National Bureau of Standards, and other groups in 1993 to devise rank based compression methods for storing prints in digital form. The following figure shows an original fingerprint and a reconstruction from digital data that was compressed at a ratio of 26:1.

If the matrix A has size $m \times n$, then one might store each of its mn entries individually. An alternative procedure is to compute the reduced singular value decomposition

$$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T \tag{5}$$

in which $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0$, and store the $\sigma$'s, the $\mathbf{u}$'s, and the $\mathbf{v}$'s.

When needed, the matrix A (and hence the image it represents) can be reconstructed from 5. Since each $\mathbf{u}_j$ has m entries and each $\mathbf{v}_j$ has n entries, this method requires storage space for $k(m + n + 1)$ numbers. Suppose, however, that the singular values $\sigma_{r+1}, \ldots, \sigma_k$ are sufficiently small that dropping the corresponding terms in 5 produces an acceptable approximation

$$A_r = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^T \tag{6}$$

to A and the image that it represents. We call 6 the rank r approximation of A. This matrix requires storage space for only $r(m + n + 1)$ numbers, compared to mn numbers required for entry-by-entry storage of A. For example, the rank 100 approximation of a $1000 \times 1000$ matrix A requires storage for only $100(1000 + 1000 + 1) = 200{,}100$ numbers, compared to the 1,000,000 numbers required for entry-by-entry storage of A—a compression of almost 80%. Figure 9.6.1 shows some approximations of a digitized mandrill image obtained using 6.
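Formula 6 is the basis for simple image-compression experiments. The following is a minimal sketch, assuming a grayscale image has already been loaded into a NumPy array named `image` (a hypothetical name not defined in the text).

```python
import numpy as np

def rank_r_approximation(image, r):
    """Return the rank-r approximation (6) of a grayscale image matrix."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Storage comparison for a hypothetical 1000 x 1000 image and r = 100
m, n, r = 1000, 1000, 100
print("entry-by-entry storage:", m * n)            # 1,000,000 numbers
print("rank-r storage:        ", r * (m + n + 1))  # 200,100 numbers
```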

Figure 9.6.1

Concept Review • Reduced singular value decomposition • Reduced singular value expansion • Rank of an approximation

Skills
• Find the reduced singular value decomposition of an $m \times n$ matrix.
• Find the reduced singular value expansion of an $m \times n$ matrix.

Exercise Set 9.6 In Exercises 1–4, find a reduced singular value decomposition of A. [Note: Each matrix appears in Exercise Set 9.5, where you were asked to find its (unreduced) singular value decomposition.] 1.

Answer:

2. 3.

Answer:

4.

In Exercises 5–8, find a reduced singular value expansion of A. 5. The matrix A in Exercise 1. Answer:

6. The matrix A in Exercise 2. 7. The matrix A in Exercise 3. Answer:

8. The matrix A in Exercise 4.

9. Suppose A is a          matrix. How many numbers must be stored in the rank 100 approximation of A? Compare this with the number of entries of A.

Answer: 70,100 numbers must be stored; A has 100,000 entries

True-False Exercises In parts (a)—(c) determine whether the statement is true or false, and justify your answer. Assume that value decomposition of an (a)

has size

matrix of rank k.

.

Answer: True (b)

has size

.

Answer: True (c)

has size

.

Answer: False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

is a reduced singular

Chapter 9 Supplementary Exercises 1.

Find an LU-decomposition of

.

Answer:

2. Find the LDU-decomposition of the matrix A in Exercise 1. 3. Find an LU-decomposition of

.

Answer:

4. Find the LDU-decomposition of the matrix A in Exercise 3. 5.

Let

and

.

(a) Identify the dominant eigenvalue of A and then find the corresponding dominant unit eigenvector with positive entries. (b) Apply the power method with Euclidean scaling to A and to the eigenvector found in part (a).

, stopping at

(c) Apply the power method with maximum entry scaling to A and result with the eigenvector Answer: (a)

(b) (c) 6. Consider the symmetric matrix

.

. Compare your value of

, stopping at

. Compare your

Discuss the behavior of the power sequence with Euclidean scaling for a general nonzero vector observed behavior?

What is it about the matrix that causes the

7. Suppose that a symmetric matrix A has distinct eigenvalues , What can you say about the convergence of the Rayleigh quotients? 8.

Find a singular value decomposition of

.

Find a singular value decomposition of

.

,

, and

9.

Answer:

10. Find a reduced singular value decomposition and a reduced singular value expansion of the matrix A in Exercise 9. 11. Find the reduced singular value decomposition of the matrix whose singular value decomposition is

Answer:

.

12. Do orthogonally similar matrices have the same singular values? Justify your answer. 13. If P is the standard matrix for the orthogonal projection of the singular values of P?

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

onto a subspace W, what can you say about

CHAPTER

10

Applications of Linear Algebra

CHAPTER CONTENTS 10.1.

Constructing Curves and Surfaces Through Specified Points

10.2.

Geometric Linear Programming

10.3.

The Earliest Applications of Linear Algebra

10.4.

Cubic Spline Interpolation

10.5.

Markov Chains

10.6.

Graph Theory

10.7.

Games of Strategy

10.8.

Leontief Economic Models

10.9.

Forest Management

10.10. Computer Graphics 10.11. Equilibrium Temperature Distributions 10.12. Computed Tomography 10.13. Fractals 10.14. Chaos 10.15. Cryptography 10.16. Genetics 10.17. Age-Specific Population Growth 10.18. Harvesting of Animal Populations 10.19. A Least Squares Model for Human Hearing 10.20. Warps and Morphs

INTRODUCTION This chapter consists of 20 applications of linear algebra. With one clearly marked

exception, each application is in its own independent section, so sections can be deleted or permuted as desired. Each topic begins with a list of linear algebra prerequisites. Because our primary objective in this chapter is to present applications of linear algebra, proofs are often omitted. Whenever results from other fields are needed, they are stated precisely, with motivation where possible, but usually without proof.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.1 Constructing Curves and Surfaces Through Specified Points In this section we describe a technique that uses determinants to construct lines, circles, and general conic sections through specified points in the plane. The procedure is also used to pass planes and spheres in 3-space through fixed points.

Prerequisites Linear Systems Determinants Analytic Geometry

The following theorem follows from Theorem 2.3.8.

THEOREM 10.1.1 A homogeneous linear system with as many equations as unknowns has a nontrivial solution if and only if the determinant of the coefficient matrix is zero.

We will now show how this result can be used to determine equations of various curves and surfaces through specified points.

A Line Through Two Points
Suppose that $(x_1, y_1)$ and $(x_2, y_2)$ are two distinct points in the plane. There exists a unique line
$$c_1x + c_2y + c_3 = 0 \tag{1}$$
that passes through these two points (Figure 10.1.1). Note that $c_1$, $c_2$, and $c_3$ are not all zero and that these coefficients are unique only up to a multiplicative constant. Because $(x_1, y_1)$ and $(x_2, y_2)$ lie on the line, substituting them in 1 gives the two equations
$$c_1x_1 + c_2y_1 + c_3 = 0 \tag{2}$$
$$c_1x_2 + c_2y_2 + c_3 = 0 \tag{3}$$

Figure 10.1.1

The three equations, 1, 2, and 3, can be grouped together and rewritten as
$$\begin{aligned} xc_1 + yc_2 + c_3 &= 0\\ x_1c_1 + y_1c_2 + c_3 &= 0\\ x_2c_1 + y_2c_2 + c_3 &= 0 \end{aligned}$$
which is a homogeneous linear system of three equations for $c_1$, $c_2$, and $c_3$. Because $c_1$, $c_2$, and $c_3$ are not all zero, this system has a nontrivial solution, so the determinant of the coefficient matrix of the system must be zero. That is,
$$\begin{vmatrix} x & y & 1\\ x_1 & y_1 & 1\\ x_2 & y_2 & 1 \end{vmatrix} = 0 \tag{4}$$
Consequently, every point $(x, y)$ on the line satisfies 4; conversely, it can be shown that every point $(x, y)$ that satisfies 4 lies on the line.

E X A M P L E 1 Equation of a Line Find the equation of the line that passes through the two points

and

.

Solution Substituting the coordinates of the two points into Equation 4 gives

The cofactor expansion of this determinant along the first row then gives
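Computations such as the cofactor expansion in Example 1 can be delegated to a computer algebra system. The SymPy sketch below applies Equation 4 to two hypothetical points; the points of Example 1 are not reproduced here, so the specific output is illustrative only.

```python
import sympy as sp

x, y = sp.symbols('x y')
x1, y1 = 1, 2          # hypothetical first point
x2, y2 = 3, 7          # hypothetical second point

# Equation (4): the determinant must vanish for every point (x, y) on the line
M = sp.Matrix([[x,  y,  1],
               [x1, y1, 1],
               [x2, y2, 1]])
print(sp.expand(M.det()))   # prints -5*x + 2*y + 1, i.e., the line -5x + 2y + 1 = 0
```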

A Circle Through Three Points
Suppose that there are three distinct points in the plane, $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$, not all lying on a straight line. From analytic geometry we know that there is a unique circle, say,
$$c_1(x^2 + y^2) + c_2x + c_3y + c_4 = 0 \tag{5}$$
that passes through them (Figure 10.1.2). Substituting the coordinates of the three points into this equation gives
$$c_1(x_1^2 + y_1^2) + c_2x_1 + c_3y_1 + c_4 = 0 \tag{6}$$
$$c_1(x_2^2 + y_2^2) + c_2x_2 + c_3y_2 + c_4 = 0 \tag{7}$$
$$c_1(x_3^2 + y_3^2) + c_2x_3 + c_3y_3 + c_4 = 0 \tag{8}$$
As before, Equations 5 through 8 form a homogeneous linear system with a nontrivial solution for $c_1$, $c_2$, $c_3$, and $c_4$. Thus the determinant of the coefficient matrix is zero:
$$\begin{vmatrix} x^2 + y^2 & x & y & 1\\ x_1^2 + y_1^2 & x_1 & y_1 & 1\\ x_2^2 + y_2^2 & x_2 & y_2 & 1\\ x_3^2 + y_3^2 & x_3 & y_3 & 1 \end{vmatrix} = 0 \tag{9}$$
This is a determinant form for the equation of the circle.

Figure 10.1.2

E X A M P L E 2 Equation of a Circle Find the equation of the circle that passes through the three points

,

, and

Solution Substituting the coordinates of the three points into Equation 9 gives

which reduces to In standard form this is

.

Thus the circle has center

and radius 5.

A General Conic Section Through Five Points
In his monumental work Principia Mathematica, Isaac Newton posed and solved the following problem (Book I, Proposition 22, Problem 14): "To describe a conic that shall pass through five given points." Newton solved this problem geometrically, as shown in Figure 10.1.3, in which he passed an ellipse through the points A, B, D, P, C; however, the methods of this section can also be applied.

Figure 10.1.3

The general equation of a conic section in the plane (a parabola, hyperbola, or ellipse, or degenerate forms of these curves) is given by
$$c_1x^2 + c_2xy + c_3y^2 + c_4x + c_5y + c_6 = 0$$
This equation contains six coefficients, but we can reduce the number to five if we divide through by any one of them that is not zero. Thus only five coefficients must be determined, so five distinct points in the plane are sufficient to determine the equation of the conic section (Figure 10.1.4). As before, the equation can be put in determinant form (see Exercise 7):

$$\begin{vmatrix} x^2 & xy & y^2 & x & y & 1\\ x_1^2 & x_1y_1 & y_1^2 & x_1 & y_1 & 1\\ x_2^2 & x_2y_2 & y_2^2 & x_2 & y_2 & 1\\ x_3^2 & x_3y_3 & y_3^2 & x_3 & y_3 & 1\\ x_4^2 & x_4y_4 & y_4^2 & x_4 & y_4 & 1\\ x_5^2 & x_5y_5 & y_5^2 & x_5 & y_5 & 1 \end{vmatrix} = 0 \tag{10}$$

Figure 10.1.4

E X A M P L E 3 Equation of an Orbit
An astronomer who wants to determine the orbit of an asteroid about the Sun sets up a Cartesian coordinate system in the plane of the orbit with the Sun at the origin. Astronomical units of measurement are used along the axes (1 astronomical unit = distance from the Earth to the Sun ≈ 93 million miles). By Kepler's first law, the orbit must be an ellipse, so the astronomer makes five observations of the asteroid at five different times and finds five points along the orbit. Find the equation of the orbit.
Solution Substituting the coordinates of the five given points into 10 and rounding to three decimal places give

The cofactor expansion of this determinant along the first row yields Figure 10.1.5 is an accurate diagram of the orbit, together with the five given points.

Figure 10.1.5

A Plane Through Three Points
In Exercise 8 we ask you to show the following: The plane in 3-space with equation
$$c_1x + c_2y + c_3z + c_4 = 0$$
that passes through three noncollinear points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, and $(x_3, y_3, z_3)$ is given by the determinant equation
$$\begin{vmatrix} x & y & z & 1\\ x_1 & y_1 & z_1 & 1\\ x_2 & y_2 & z_2 & 1\\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0 \tag{11}$$

E X A M P L E 4 Equation of a Plane The equation of the plane that passes through the three noncollinear points and is

,

,

which reduces to

A Sphere Through Four Points
In Exercise 9 we ask you to show the following: The sphere in 3-space with equation
$$c_1(x^2 + y^2 + z^2) + c_2x + c_3y + c_4z + c_5 = 0$$
that passes through four noncoplanar points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$, and $(x_4, y_4, z_4)$ is given by the following determinant equation:
$$\begin{vmatrix} x^2 + y^2 + z^2 & x & y & z & 1\\ x_1^2 + y_1^2 + z_1^2 & x_1 & y_1 & z_1 & 1\\ x_2^2 + y_2^2 + z_2^2 & x_2 & y_2 & z_2 & 1\\ x_3^2 + y_3^2 + z_3^2 & x_3 & y_3 & z_3 & 1\\ x_4^2 + y_4^2 + z_4^2 & x_4 & y_4 & z_4 & 1 \end{vmatrix} = 0 \tag{12}$$

E X A M P L E 5 Equation of a Sphere

The equation of the sphere that passes through the four points and is

,

,

,

This reduces to which in standard form is

Exercise Set 10.1 1. Find the equations of the lines that pass through the following points: (a) (b) Answer: (a) (b) 2. Find the equations of the circles that pass through the following points: (a) (b) Answer: (a) (b)

or or

3. Find the equation of the conic section that passes through the points .

,

Answer: (a parabola) 4. Find the equations of the planes in 3-space that pass through the following points: (a)

,

,

, and

(b) Answer: (a) (b) 5. (a) Alter Equation 11 so that it determines the plane that passes through the origin and is parallel to the plane that passes through three specified noncollinear points. (b) Find the two planes described in part (a) corresponding to the triplets of points in Exercises 4(a) and 4(b). Answer: (a)

(b)

;

6. Find the equations of the spheres in 3-space that pass through the following points: (a) (b) Answer: (a) (b)

or or

7. Show that Equation 10 is the equation of the conic section that passes through five given distinct points in the plane. 8. Show that Equation 11 is the equation of the plane in 3-space that passes through three given noncollinear points. 9. Show that Equation 12 is the equation of the sphere in 3-space that passes through four given noncoplanar points. 10. Find a determinant equation for the parabola of the form that passes through three given noncollinear points in the plane. Answer:

11. What does Equation 9 become if the three distinct points are collinear? Answer: The equation of the line through the three collinear points 12. What does Equation 11 become if the three distinct points are collinear? Answer: 13. What does Equation 12 become if the four points are coplanar? Answer: The equation of the plane through the four coplanar points

Section 10.1 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. The general equation of a quadric surface is given by Given nine points on this surface, it may be possible to determine its equation. (a) Show that if the nine points for lie on this surface, and if they determine uniquely the equation of this surface, then its equation can be written in determinant form as

(b) Use the result in part (a) to determine the equation of the quadric surface that passes through the points , , , , , , , , and . T2. (a) A hyperplane in the n-dimensional Euclidean space where which

,

has an equation of the form

, are constants, not all zero, and

,

, are variables for

A point lies on this hyperplane if Given that the n points , , lie on this hyperplane and that they uniquely determine the equation of the hyperplane, show that the equation of the hyperplane can be written in determinant form as

(b) Determine the equation of the hyperplane in

that goes through the following nine points:

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.2 Geometric Linear Programming In this section we describe a geometric technique for maximizing or minimizing a linear expression in two variables subject to a set of linear constraints.

Prerequisites Linear Systems Linear Inequalities

Linear Programming The study of linear programming theory has expanded greatly since the pioneering work of George Dantzig in the late 1940s. Today, linear programming is applied to a wide variety of problems in industry and science. In this section we present a geometric approach to the solution of simple linear programming problems. Let us begin with some examples.

E X A M P L E 1 Maximizing Sales Revenue
A candy manufacturer has 130 pounds of chocolate-covered cherries and 170 pounds of chocolate-covered mints in stock. He decides to sell them in the form of two different mixtures. One mixture will contain half cherries and half mints by weight and will sell for $2.00 per pound. The other mixture will contain one-third cherries and two-thirds mints by weight and will sell for $1.25 per pound. How many pounds of each mixture should the candy manufacturer prepare in order to maximize his sales revenue?

Mathematical Formulation Let the mixture of half cherries and half mints be called mix A, and let $x_1$ be the number of pounds of this mixture to be prepared. Let the mixture of one-third cherries and two-thirds mints be called mix B, and let $x_2$ be the number of pounds of this mixture to be prepared. Since mix A sells for $2.00 per pound and mix B sells for $1.25 per pound, the total sales z (in dollars) will be
$$z = 2.00x_1 + 1.25x_2$$
Since each pound of mix A contains $\tfrac{1}{2}$ pound of cherries and each pound of mix B contains $\tfrac{1}{3}$ pound of cherries, the total number of pounds of cherries used in both mixtures is
$$\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2$$
Similarly, since each pound of mix A contains $\tfrac{1}{2}$ pound of mints and each pound of mix B contains $\tfrac{2}{3}$ pound of mints, the total number of pounds of mints used in both mixtures is
$$\tfrac{1}{2}x_1 + \tfrac{2}{3}x_2$$
Because the manufacturer can use at most 130 pounds of cherries and 170 pounds of mints, we must have
$$\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 \le 130 \qquad \tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 \le 170$$
Furthermore, since $x_1$ and $x_2$ cannot be negative numbers, we must have
$$x_1 \ge 0 \quad\text{and}\quad x_2 \ge 0$$
The problem can therefore be formulated mathematically as follows: Find values of $x_1$ and $x_2$ that maximize
$$z = 2.00x_1 + 1.25x_2$$
subject to
$$\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 \le 130, \quad \tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 \le 170, \quad x_1 \ge 0, \quad x_2 \ge 0$$
Later in this section we will show how to solve this type of mathematical problem geometrically.

E X A M P L E 2 Maximizing Annual Yield
A woman has up to $10,000 to invest. Her broker suggests investing in two bonds, A and B. Bond A is a rather risky bond with an annual yield of 10%, and bond B is a rather safe bond with an annual yield of 7%. After some consideration, she decides to invest at most $6000 in bond A, to invest at least $2000 in bond B, and to invest at least as much in bond A as in bond B. How should she invest her money in order to maximize her annual yield?

Mathematical Formulation Let $x_1$ be the number of dollars to be invested in bond A, and let $x_2$ be the number of dollars to be invested in bond B. Since each dollar invested in bond A earns $.10 per year and each dollar invested in bond B earns $.07 per year, the total dollar amount z earned each year by both bonds is
$$z = 0.10x_1 + 0.07x_2$$
The constraints imposed can be formulated mathematically as follows:
$$x_1 + x_2 \le 10{,}000, \quad x_1 \le 6000, \quad x_2 \ge 2000, \quad x_1 \ge x_2$$
We also have the implicit assumption that $x_1$ and $x_2$ are nonnegative:
$$x_1 \ge 0, \quad x_2 \ge 0$$
Thus the complete mathematical formulation of the problem is as follows: Find values of $x_1$ and $x_2$ that maximize
$$z = 0.10x_1 + 0.07x_2$$
subject to
$$x_1 + x_2 \le 10{,}000, \quad x_1 \le 6000, \quad x_2 \ge 2000, \quad x_1 - x_2 \ge 0, \quad x_1 \ge 0, \quad x_2 \ge 0$$

E X A M P L E 3 Minimizing Cost A student desires to design a breakfast of cornflakes and milk that is as economical as possible. On the basis of what he eats during his other meals, he decides that his breakfast should supply him with at least 9 grams of protein, at least the recommended daily allowance (RDA) of vitamin D, and at least

the RDA of calcium. He finds the following nutrition and cost

information on the milk and cornflakes containers:

In order not to have his mixture too soggy or too dry, the student decides to limit himself to mixtures that contain 1 to 3 ounces of cornflakes per cup of milk, inclusive. What quantities of milk and cornflakes should he use to minimize the cost of his breakfast? Mathematical Formulation Let

be the quantity of milk used (measured in

-cup units),

and let be the quantity of cornflakes used (measured in 1-ounce units). Then if z is the cost of the breakfast in cents, we may write the following.

As before, we also have the implicit assumption that and . Thus the complete mathematical formulation of the problem is as follows: Find values of and that minimize subject to

Geometric Solution of Linear Programming Problems Each of the preceding three examples is a special case of the following problem.

Problem Find values of $x_1$ and $x_2$ that either maximize or minimize
$$z = a_1x_1 + a_2x_2 \tag{1}$$
subject to
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 &\;(\le)(=)(\ge)\; b_1\\ a_{21}x_1 + a_{22}x_2 &\;(\le)(=)(\ge)\; b_2\\ &\;\;\vdots\\ a_{m1}x_1 + a_{m2}x_2 &\;(\le)(=)(\ge)\; b_m \end{aligned} \tag{2}$$
and
$$x_1 \ge 0, \quad x_2 \ge 0 \tag{3}$$
In each of the m conditions of 2, any one of the symbols $\le$, $=$, $\ge$ may be used.

The problem above is called the general linear programming problem in two variables. The linear function z in 1 is called the objective function. Equations 2 and 3 are called the constraints; in particular, the conditions in 3 are called the nonnegativity constraints on the variables $x_1$ and $x_2$.

We will now show how to solve a linear programming problem in two variables graphically. A pair of values $(x_1, x_2)$ that satisfies all of the constraints is called a feasible solution. The set of all feasible solutions determines a subset of the $x_1x_2$-plane called the feasible region. Our desire is to find a feasible solution that maximizes the objective function. Such a solution is called an optimal solution.

To examine the feasible region of a linear programming problem, let us note that each constraint of the form
$$a_1x_1 + a_2x_2 = b$$
defines a line in the $x_1x_2$-plane, whereas each constraint of the form
$$a_1x_1 + a_2x_2 \le b \quad\text{or}\quad a_1x_1 + a_2x_2 \ge b$$
defines a half-plane that includes its boundary line $a_1x_1 + a_2x_2 = b$. Thus the feasible region is always an intersection of finitely many lines and half-planes. For example, the four constraints
$$\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 \le 130, \quad \tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 \le 170, \quad x_1 \ge 0, \quad x_2 \ge 0$$
of Example 1 define the half-planes illustrated in parts (a), (b), (c), and (d) of Figure 10.2.1. The feasible region of this problem is thus the intersection of these four half-planes, which is illustrated in Figure 10.2.1e.

Figure 10.2.1

It can be shown that the feasible region of a linear programming problem has a boundary consisting of a finite number of straight line segments. If the feasible region can be enclosed in a sufficiently large circle, it is called bounded (Figure 10.2.1e); otherwise, it is called unbounded (see Figure 10.2.5). If the feasible region is empty (contains no points), then the constraints are inconsistent and the linear programming problem has no solution (see Figure 10.2.6). Those boundary points of a feasible region that are intersections of two of the straight line boundary segments are called extreme points. (They are also called corner points and vertex points.) For example, in Figure 10.2.1e, we see that the feasible region of Example 1 has four extreme points:
$$(0, 0), \quad (0, 255), \quad (180, 120), \quad (260, 0) \tag{4}$$

The importance of the extreme points of a feasible region is shown by the following theorem.

THEOREM 10.2.1 Maximum and Minimum Values If the feasible region of a linear programming problem is nonempty and bounded, then the objective function attains both a maximum and a minimum value, and these occur at extreme points of the feasible region. If the feasible region is unbounded, then the objective function may or may not attain a maximum or minimum value; however, if it attains a maximum or minimum value, it does so at an extreme point.

Figure 10.2.2 suggests the idea behind the proof of this theorem. Since the objective function of a linear programming problem is a linear function of $x_1$ and $x_2$, its level curves (the curves along which z has constant values) are straight lines. As we move in a direction perpendicular to these level curves, the objective function either increases or decreases monotonically. Within a bounded feasible region, the maximum and minimum values of z must therefore occur at extreme points, as Figure 10.2.2 indicates.

Figure 10.2.2 In the next few examples we use Theorem 10.2.1 to solve several linear programming problems and illustrate the variations in the nature of the solutions that may occur.

E X A M P L E 4 Example 1 Revisited
Figure 10.2.1e shows that the feasible region of Example 1 is bounded. Consequently, from Theorem 10.2.1 the objective function attains both its minimum and maximum values at extreme points. The four extreme points and the corresponding values of z are given in the following table.

Extreme point $(x_1, x_2)$ | Value of $z = 2.00x_1 + 1.25x_2$
(0, 0) | 0
(0, 255) | 318.75
(180, 120) | 510.00
(260, 0) | 520.00

We see that the largest value of z is 520.00 and the corresponding optimal solution is $(260, 0)$. Thus the candy manufacturer attains maximum sales of $520 when he produces 260 pounds of mixture A and none of mixture B.
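Problems like Example 1 can also be checked with a numerical solver. The sketch below uses SciPy's `linprog`, which minimizes rather than maximizes, so the objective is negated; the data are those of Example 1, and the use of SciPy here is an illustration rather than part of the text's geometric method.

```python
from scipy.optimize import linprog

# Example 1: maximize z = 2.00*x1 + 1.25*x2
# subject to (1/2)x1 + (1/3)x2 <= 130 (cherries), (1/2)x1 + (2/3)x2 <= 170 (mints), x1, x2 >= 0
c = [-2.00, -1.25]                      # linprog minimizes, so negate the objective
A_ub = [[0.5, 1 / 3],
        [0.5, 2 / 3]]
b_ub = [130, 170]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(result.x, -result.fun)            # expected: approximately [260, 0] and 520
```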

E X A M P L E 5 Using Theorem 10.2.1 Find values of

and

that maximize

subject to

Solution In Figure 10.2.3 we have drawn the feasible region of this problem. Since it is bounded, the maximum value of z is attained at one of the five extreme points. The values of the objective function at the five extreme points are given in the following table.

Figure 10.2.3

From this table, the maximum value of z is 21, which is attained at

E X A M P L E 6 Using Theorem 10.2.1

and

.

Find values of

and

that maximize

subject to

Solution The constraints in this problem are identical to the constraints in Example 5, so the feasible region of this problem is also given by Figure 10.2.3. The values of the objective function at the extreme points are given in the following table.

We see that the objective function attains a maximum value of 48 at two adjacent extreme points, and . This shows that an optimal solution to a linear programming problem need not be unique. As we ask you to show in Exercise 10, if the objective function has the same value at two adjacent extreme points, it has the same value at all points on the straight line boundary segment connecting the two extreme points. Thus, in this example the maximum value of z is attained at all points on the straight line segment connecting the extreme points and .

E X A M P L E 7 The Feasible Region Is a Line Segment Find values of

and

that minimize

subject to

Solution In Figure 10.2.4 we have drawn the feasible region of this problem. Because one of the constraints is an equality constraint, the feasible region is a straight line segment with two extreme points. The values of z at the two extreme points are given in the following table.

Figure 10.2.4

The minimum value of z is thus 4 and is attained at

and

.

E X A M P L E 8 Using Theorem 10.2.1 Find values of

and

that maximize

subject to

Solution The feasible region of this linear programming problem is illustrated in Figure 10.2.5. Since it is unbounded, we are not assured by Theorem 10.2.1 that the objective function attains a maximum value. In fact, it is easily seen that since the feasible region contains points for which both and are arbitrarily large and positive, the objective function can be made arbitrarily large and positive. This problem has no optimal solution. Instead, we say the problem has an unbounded solution.

Figure 10.2.5

E X A M P L E 9 Using Theorem 10.2.1 Find values of

and

that maximize

subject to

Solution The above constraints are the same as those in Example 8, so the feasible region of this problem is also given by Figure 10.2.5. In Exercise 11 we ask you to show that the objective function of this problem attains a maximum within the feasible region. By Theorem 10.2.1, this maximum must be attained at an extreme point. The values of z at the two extreme points of the feasible region are given in the following table.

The maximum value of z is thus 1 and is attained at the extreme point

E X A M P L E 1 0 Inconsistent Constraints

,

.

Find values of

and

that minimize

subject to

Solution As can be seen from Figure 10.2.6, the intersection of the five half-planes defined by the five constraints is empty. This linear programming problem has no feasible solutions since the constraints are inconsistent.

Figure 10.2.6 There are no points common to all five shaded half-planes.

Exercise Set 10.2 1. Find values of

and

that maximize

subject to

Answer: ,

; maximum value of

2. Find values of

and

that minimize

subject to

Answer: No feasible solutions 3. Find values of

and

that minimize

subject to

Answer: Unbounded solution 4. Solve the linear programming problem posed in Example 2. Answer: Invest $6000 in bond A and $4000 in bond B; the annual yield is $880. 5. Solve the linear programming problem posed in Example 3. Answer: cup of milk,

ounces of corn flakes; minimum

6. In Example 5 the constraint is said to be nonbinding because it can be removed from the problem without affecting the solution. Likewise, the constraint is said to be binding because removing it will change the solution. (a) Which of the remaining constraints are nonbinding and which are binding? (b) For what values of the right-hand side of the nonbinding constraint become binding? For what values will the resulting feasible set be empty? (c) For what values of the right-hand side of the binding constraints nonbinding? For what values will the resulting feasible set be empty?

will this constraint

will this constraint become

Answer: (a)

and

(b) (c)

are nonbinding; for

for

is binding

is binding and for is nonbinding and for

yields the empty set. yields the empty set.

7. A trucking firm ships the containers of two companies, A and B. Each container from company A weighs 40 pounds and is 2 cubic feet in volume. Each container from company B weighs 50 pounds and is 3 cubic feet in volume. The trucking firm charges company A $2.20 for each container shipped and charges company B $3.00 for each container shipped. If one of the firm's trucks cannot carry more than 37,000 pounds and cannot hold more than 2000 cubic feet, how many containers from companies A and B should a truck carry to maximize the shipping charges?
Answer: 550 containers from company A and 300 containers from company B; maximum shipping charge of $2110

8. Repeat Exercise 7 if the trucking firm raises its price for shipping a container from company A to $2.50.
Answer: 925 containers from company A and no containers from company B; maximum shipping charge of $2312.50

9. A manufacturer produces sacks of chicken feed from two ingredients, A and B. Each sack is to contain at least 10 ounces of the first nutrient, at least 8 ounces of the second nutrient, and at least 12 ounces of the third nutrient. Each pound of ingredient A contains 2 ounces of the first nutrient, 2 ounces of the second, and 6 ounces of the third. Each pound of ingredient B contains 5 ounces of the first nutrient, 3 ounces of the second, and 4 ounces of the third. If ingredient A costs 8 cents per pound and ingredient B costs 9 cents per pound, how much of each ingredient should the manufacturer use in each sack of feed to minimize his costs?
Answer: 0.4 pound of ingredient A and 2.4 pounds of ingredient B; minimum cost of 24.8 cents per sack

10. If the objective function of a linear programming problem has the same value at two adjacent extreme points, show that it has the same value at all points on the straight line segment connecting the two extreme points. [Hint: If $(x_1, y_1)$ and $(x_2, y_2)$ are any two points in the plane, a point $(x, y)$ lies on the straight line segment connecting them if $x = tx_1 + (1 - t)x_2$ and $y = ty_1 + (1 - t)y_2$, where t is a number in the interval $[0, 1]$.]

11. Show that the objective function in Example 9 attains a maximum value in the feasible set. [Hint: Examine the level curves of the objective function.]
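Exercises like 7, 8, and 9 above are also natural candidates for the technology utilities described below. As one illustration, the following short Python sketch (our own, not part of the exercise set) sets up the trucking problem of Exercise 7 for SciPy's linprog routine; the names a and b for the numbers of containers are ours.

# Exercise 7 as a linear program: maximize 2.20*a + 3.00*b
# subject to 40*a + 50*b <= 37000 (weight) and 2*a + 3*b <= 2000 (volume).
# linprog minimizes, so the objective coefficients are negated.
from scipy.optimize import linprog

c = [-2.20, -3.00]                       # negated shipping charges per container
A_ub = [[40, 50], [2, 3]]                # weight and volume coefficients
b_ub = [37000, 2000]                     # truck capacity limits

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
a, b = result.x
print(a, b, -result.fun)                 # expect about 550, 300, and 2110.00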

Section 10.2 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. Consider the feasible region consisting of

the points satisfying the given set of inequalities. Maximize the given objective function over this region for each of the parameter values listed in parts (a) through (k). (l) Next, maximize this objective function using the given nonlinear feasible region. (m) Let the results of parts (a) through (k) form a sequence of values. Do these values approach the value determined in part (l)? Explain.

T2. Repeat Exercise T1 using the second objective function given.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.3 The Earliest Applications of Linear Algebra Linear systems can be found in the earliest writings of many ancient civilizations. In this section we give some examples of the types of problems that these civilizations used linear systems to solve.

Prerequisites Linear Systems

The practical problems of early civilizations included the measurement of land, the distribution of goods, the tracking of resources such as wheat and cattle, and taxation and inheritance calculations. In many cases, these problems led to linear systems of equations since linearity is one of the simplest relationships that can exist among variables. In this section we present examples from five diverse ancient cultures illustrating how they used and solved systems of linear equations. We restrict ourselves to examples before A.D. 500. These examples consequently predate the development of the field of algebra by Islamic/Arab mathematicians, a field that ultimately led in the nineteenth century to the branch of mathematics now called linear algebra.

E X A M P L E 1 Egypt (about 1650 B.C.)

Problem 40 of the Ahmes Papyrus The Ahmes (or Rhind) Papyrus is the source of most of our information about ancient Egyptian mathematics. This 5-meter-long papyrus contains 84 short mathematical problems, together with their solutions, and dates from about 1650 B.C. Problem 40 in this papyrus is the following: Divide 100 hekats of barley among five men in arithmetic progression so that the sum of the two smallest is one-seventh the sum of the three largest. Let a be the least amount that any man obtains, and let d be the common difference of the terms in the arithmetic progression. Then the other four men receive $a + d$, $a + 2d$, $a + 3d$, and $a + 4d$ hekats. The two conditions of the problem require that

$a + (a + d) + (a + 2d) + (a + 3d) + (a + 4d) = 100$
$7[a + (a + d)] = (a + 2d) + (a + 3d) + (a + 4d)$

These equations reduce to the following system of two equations in two unknowns:

$5a + 10d = 100$
$11a - 2d = 0$   (1)

The solution technique described in the papyrus is known as the method of false position or false assumption. It begins by assuming some convenient value of a (in our case $a = 1$), substituting that value into the second equation, and obtaining $d = 11/2$. Substituting $a = 1$ and $d = 11/2$ into the left-hand side of the first equation gives 60, whereas the right-hand side is 100. Adjusting the initial guess for a by multiplying it by $100/60 = 5/3$ leads to the correct value $a = 5/3$. Substituting $a = 5/3$ into the second equation then gives $d = 55/6$, so the quantities of barley received by the five men are $5/3$, $65/6$, $20$, $175/6$, and $230/6$ hekats. This technique of guessing a value of an unknown and later adjusting it has been used by many cultures throughout the ages.
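The false-position calculation is easy to mimic with exact rational arithmetic. The following short sketch is our own illustration (not from the papyrus or the text) and uses the reduced system $5a + 10d = 100$, $11a = 2d$ obtained above:

# Method of false position for Problem 40 of the Ahmes Papyrus.
from fractions import Fraction

a_guess = Fraction(1)                  # convenient false assumption for a
d_guess = Fraction(11, 2) * a_guess    # second equation 11a = 2d gives d = 11a/2

lhs = 5 * a_guess + 10 * d_guess       # first equation evaluates to 60, not 100
scale = Fraction(100) / lhs            # adjust the guess by the factor 100/60 = 5/3

a = scale * a_guess                    # a = 5/3
d = Fraction(11, 2) * a / 1            # d = 55/6
shares = [a + k * d for k in range(5)]
print(shares, sum(shares))             # the five shares sum to 100 hekats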

E X A M P L E 2 Babylonia (1900–1600 B.C.)

Babylonian clay tablet Ca MLA 1950 The Old Babylonian Empire flourished in Mesopotamia between 1900 and 1600 B.C. Many clay tablets containing mathematical tables and problems survive from that period, one of which (designated Ca MLA 1950) contains the next problem. The statement of the problem is a bit muddled because of the condition of the tablet, but the diagram and the solution on the tablet indicate that the problem is as follows:

A trapezoid with an area of 320 square units is cut off from a right triangle by a line parallel to one of its sides. The other side has length 50 units, and the height of the trapezoid is 20 units. What are the upper and the lower widths of the trapezoid? Let x be the lower width of the trapezoid and y its upper width. The area of the trapezoid is its height times its average width, so $20\left(\frac{x + y}{2}\right) = 320$. Using similar triangles, we also have a second relation between x and y. The solution on the tablet uses these relations to generate the linear system

(2)

Adding and subtracting these two equations then gives the solution $x = 20$ and $y = 12$.

E X A M P L E 3 China (A.D. 263)

Chiu Chang Suan Shu in Chinese characters The most important treatise in the history of Chinese mathematics is the Chiu Chang Suan Shu, or “The Nine Chapters of the Mathematical Art.” This treatise, which is a collection of 246 problems and their solutions, was assembled in its final form by Liu Hui in A.D. 263. Its contents, however, go back to at least the beginning of the Han dynasty in the second century B.C. The eighth of its nine chapters, entitled “The Way of Calculating by Arrays,” contains 18 word problems that lead to linear systems in three to six unknowns. The general solution procedure described is almost identical to the Gaussian elimination technique developed in

Europe in the nineteenth century by Carl Friedrich Gauss. The first problem in the eighth chapter is the following: There are three classes of corn, of which three bundles of the first class, two of the second, and one of the third make 39 measures. Two of the first, three of the second, and one of the third make 34 measures. And one of the first, two of the second, and three of the third make 26 measures. How many measures of grain are contained in one bundle of each class?

Let x, y, and z be the measures of the first, second, and third classes of corn. Then the conditions of the problem lead to the following linear system of three equations in three unknowns:

$3x + 2y + z = 39$
$2x + 3y + z = 34$
$x + 2y + 3z = 26$   (3)

The solution described in the treatise represented the coefficients of each equation by an appropriate number of rods placed within squares on a counting table. Positive coefficients were represented by black rods, negative coefficients were represented by red rods, and the squares corresponding to zero coefficients were left empty. The counting table was laid out as follows so that the coefficients of each equation appear in columns with the first equation in the rightmost column:

Next, the numbers of rods within the squares were adjusted to accomplish the following two steps: (1) two times the numbers of the third column were subtracted from three times the numbers in the second column and (2) the numbers in the third column were subtracted from three times the numbers in the first column. The result was the following array:

In this array, four times the numbers in the second column were subtracted from five times the numbers in the first column, yielding

This last array is equivalent to the linear system

This triangular system was solved by a method equivalent to back substitution to obtain $z = 2\tfrac{3}{4}$, $y = 4\tfrac{1}{4}$, and $x = 9\tfrac{1}{4}$ measures.
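The column reductions just described can be reproduced mechanically. The following sketch is our own illustration; it carries out the same combinations on the rows of the augmented equations (modern rows standing in for the rod columns) and then back-substitutes. The helper name combine is ours.

# The "Way of Calculating by Arrays" applied to the corn problem.
from fractions import Fraction

E1 = [3, 2, 1, 39]   # 3x + 2y + z = 39
E2 = [2, 3, 1, 34]   # 2x + 3y + z = 34
E3 = [1, 2, 3, 26]   #  x + 2y + 3z = 26

def combine(p, P, q, Q):
    # returns the entrywise combination p*P - q*Q
    return [p * a - q * b for a, b in zip(P, Q)]

R2 = combine(3, E2, 2, E1)   # 3*(second) - 2*(first)        -> [0, 5, 1, 24]
R3 = combine(3, E3, 1, E1)   # 3*(third)  - 1*(first)        -> [0, 4, 8, 39]
R3 = combine(5, R3, 4, R2)   # 5*(new third) - 4*(new second) -> [0, 0, 36, 99]

z = Fraction(R3[3], R3[2])                   # 36z = 99, so z = 2 3/4
y = (R2[3] - R2[2] * z) / R2[1]              # 5y + z = 24, so y = 4 1/4
x = (E1[3] - E1[1] * y - E1[2] * z) / E1[0]  # 3x + 2y + z = 39, so x = 9 1/4
print(x, y, z)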

E X A M P L E 4 Greece (third century B.C.)

Archimedes c. 287–212 B.C. Perhaps the most famous system of linear equations from antiquity is the one associated with the first part of Archimedes' celebrated Cattle Problem. This problem supposedly was posed by Archimedes as a challenge to his colleague Eratosthenes. No solution has come down to us from ancient times, so that it is not known how, or even whether, either of these two geometers solved it. If thou art diligent and wise, O stranger, compute the number of cattle of the Sun, who once upon a time grazed on the fields of the Thrinacian isle of Sicily, divided into four herds of different colors, one milk white, another glossy black, a third yellow, and the last dappled. In each herd were bulls, mighty in number according to these proportions: Understand, stranger, that the white bulls were equal to a half and a third of the black together with the whole of the yellow, while the black were equal to the fourth part of the dappled and a fifth, together with, once more, the whole of the yellow. Observe further that the remaining bulls, the dappled, were equal to a sixth part of the white and a seventh, together with all of the yellow. These were the proportions of the cows: The white were precisely equal to the third part and a fourth of the whole herd of the black; while the black were equal to the fourth part once more of the dappled and with it a

fifth part, when all, including the bulls, went to pasture together. Now the dappled in four parts were equal in number to a fifth part and a sixth of the yellow herd. Finally the yellow were in number equal to a sixth part and a seventh of the white herd. If thou canst accurately tell, O stranger, the number of cattle of the Sun, giving separately the number of well-fed bulls and again the number of females according to each color, thou wouldst not be called unskilled or ignorant of numbers, but not yet shalt thou be numbered among the wise.

The conventional designation of the eight variables in this problem is

The problem can now be stated as the following seven homogeneous equations in eight unknowns: 1.

(The white bulls were equal to a half and a third of the black [bulls] together with the whole of the yellow [bulls].)

2.

(The black [bulls] were equal to the fourth part of the dappled [bulls] and a fifth, together with, once more, the whole of the yellow [bulls].)

3.

(The remaining bulls, the dappled, were equal to a sixth part of the white [bulls] and a seventh, together with all of the yellow [bulls].)

4.

(The white [cows] were precisely equal to the third part and a fourth of the whole herd of the black.)

5.

(The black [cows] were equal to the fourth part once more of the dappled and with it a fifth part, when all, including the bulls, went to pasture together.)

6.

(The dappled [cows] in four parts [that is, in totality] were equal in number to a fifth part and a sixth of the yellow herd.)

7.

(The yellow [cows] were in number equal to a sixth part and a seventh of the white herd.)

As we ask you to show in the exercises, this system has infinitely many solutions of the form

(4)

where k is any real number. The values $k = 1, 2, 3, \ldots$ give infinitely many positive integer solutions to the problem, with $k = 1$ giving the smallest solution.
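A computer algebra system makes short work of this first part of the problem (compare Technology Exercise T1 below). The sketch that follows is our own illustration; it orders the unknowns as W, B, D, Y (white, black, dappled, and yellow bulls) and w, b, d, y (the corresponding cows), a choice of symbols that may differ from the designation used in the text, and encodes the seven verbal conditions directly.

# The seven homogeneous equations of the first part of the Cattle Problem.
from math import gcd, lcm
from sympy import Matrix, Rational

R = Rational
# Unknowns ordered as (W, B, D, Y, w, b, d, y).
A = Matrix([
    [1, -R(5, 6), 0, -1, 0, 0, 0, 0],                 # W = (1/2 + 1/3)B + Y
    [0, 1, -R(9, 20), -1, 0, 0, 0, 0],                # B = (1/4 + 1/5)D + Y
    [-R(13, 42), 0, 1, -1, 0, 0, 0, 0],               # D = (1/6 + 1/7)W + Y
    [0, -R(7, 12), 0, 0, 1, -R(7, 12), 0, 0],         # w = (1/3 + 1/4)(B + b)
    [0, 0, -R(9, 20), 0, 0, 1, -R(9, 20), 0],         # b = (1/4 + 1/5)(D + d)
    [0, 0, 0, -R(11, 30), 0, 0, 1, -R(11, 30)],       # d = (1/5 + 1/6)(Y + y)
    [-R(13, 42), 0, 0, 0, -R(13, 42), 0, 0, 1],       # y = (1/6 + 1/7)(W + w)
])

v = A.nullspace()[0]                         # one-parameter family of solutions
k = lcm(*(int(t.q) for t in v))              # clear denominators
ints = [int(t * k) for t in v]
g = gcd(*ints)
smallest = [n // g for n in ints]
if smallest[0] < 0:                          # normalize the sign of the basis vector
    smallest = [-n for n in smallest]
print(smallest, sum(smallest))               # the smallest herd totals 50,389,082 cattle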

E X A M P L E 5 India (fourth century A.D.)

Fragment III-5-3v of the Bakhshali Manuscript The Bakhshali Manuscript is an ancient work of Indian/Hindu mathematics dating from around the fourth century A.D., although some of its materials undoubtedly come from many centuries before. It consists of about 70 leaves or sheets of birch bark containing mathematical problems and their solutions. Many of its problems are so-called equalization problems that lead to systems of linear equations. One such problem on the fragment shown is the following: One merchant has seven asava horses, a second has nine haya horses, and a third has ten camels. They are equally well off in the value of their animals if each gives two animals, one to each of the others. Find the price of each animal and the total value of the animals possessed by each merchant.

Let x be the price of an asava horse, let y be the price of a haya horse, let z be the price of a camel, and let K be the total value of the animals possessed by each merchant. Then the conditions of the problem lead to the following system of equations:

$5x + y + z = K$
$x + 7y + z = K$
$x + y + 8z = K$   (5)

The method of solution described in the manuscript begins by subtracting the quantity $x + y + z$ from both sides of the three equations to obtain $4x = 6y = 7z = K - (x + y + z)$. This shows that if the prices x, y, and z are to be integers, then the quantity $K - (x + y + z)$ must be an integer that is divisible by 4, 6, and 7. The manuscript takes the product of these three numbers, or 168, for the value of $K - (x + y + z)$, which yields $x = 42$, $y = 28$, and $z = 24$ for the prices and $K = 262$ for the total value. (See Exercise 6 for more solutions to this problem.)
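The arithmetic above is easy to verify by machine; the following few lines are our own sketch and follow the manuscript's reasoning, with x, y, z, and K as defined in the example.

# Equalization problem from the Bakhshali Manuscript.
# Subtracting x + y + z from each equation gives 4x = 6y = 7z = K - (x + y + z).
common = 4 * 6 * 7        # the manuscript's choice, 168

x = common // 4           # price of an asava horse
y = common // 6           # price of a haya horse
z = common // 7           # price of a camel
K = common + x + y + z    # total value held by each merchant

assert 5*x + y + z == x + 7*y + z == x + y + 8*z == K
print(x, y, z, K)         # 42 28 24 262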

Exercise Set 10.3 1. The following lines from Book 12 of Homer's Odyssey relate a precursor of Archimedes' Cattle Problem: Thou shalt ascend the isle triangular, Where many oxen of the Sun are fed, And fatted flocks. Of oxen fifty head In every herd feed, and their herds are seven; And of his fat flocks is their number even. The last line means that there are as many sheep in all the flocks as there are oxen in all the herds. What is the total number of oxen and sheep that belong to the god of the Sun? (This was a difficult problem in Homer's day.) Answer: 700 2. Solve the following problems from the Bakhshali Manuscript. (a) B possesses two times as much as A; C has three times as much as A and B together; D has four times as much as A, B, and C together. Their total possessions are 300. What is the possession of A? (b) B gives 2 times as much as A; C gives 3 times as much as B; D gives 4 times as much as C. Their total gift is 132. What is the gift of A? Answer: (a) 5 (b) 4 3. A problem on a Babylonian tablet requires finding the length and width of a rectangle given that the length and the width add up to 10, while the length and one-fourth of the width add up to 7. The solution provided on the tablet consists of the following four statements:

Multiply 7 by 4 to obtain 28. Take away 10 from 28 to obtain 18. Take one-third of 18 to obtain 6, the length. Take away 6 from 10 to obtain 4, the width. Explain how these steps lead to the answer. 4. The following two problems are from “The Nine Chapters of the Mathematical Art.” Solve them using the array technique described in Example 3. (a) Five oxen and two sheep are worth 10 units and two oxen and five sheep are worth 8 units. What is the value of each ox and sheep? (b) There are three kinds of corn. The grains contained in two, three, and four bundles, respectively, of these three classes of corn, are not sufficient to make a whole measure. However, if we added to them one bundle of the second, third, and first classes, respectively, then the grains would become one full measure in each case. How many measures of grain does each bundle of the different classes contain?
Answer: (a) Ox, $1\tfrac{13}{21}$ units; sheep, $\tfrac{20}{21}$ unit (b) First kind, $\tfrac{9}{25}$ measure; second kind, $\tfrac{7}{25}$ measure; third kind, $\tfrac{4}{25}$ measure

5. This problem in part (a) is known as the “Flower of Thymaridas,” named after a Pythagorean of the fourth century B.C. (a) Given the n numbers, solve for the unknowns in the following linear system: (b) Identify a problem in this exercise set that fits the pattern in part (a), and solve it using your general solution.
Answer: (a) (b) Exercise 7(b); gold, $30\tfrac{1}{2}$ minae; brass, $9\tfrac{1}{2}$ minae; tin, $14\tfrac{1}{2}$ minae; iron, $5\tfrac{1}{2}$ minae

6. For Example 5 from the Bakhshali Manuscript: (a) Express Equations 5 as a homogeneous linear system of three equations in four unknowns (x, y, z, and K) and show that the solution set has one arbitrary parameter. (b) Find the smallest solution for which all four variables are positive integers.

(c) Show that the solution given in Example 5 is included among your solutions.
Answer: (a) a one-parameter family of solutions, where t is an arbitrary number (b) Take the value of t for which $x = 21$, $y = 14$, $z = 12$, and $K = 131$. (c) Take the value of t for which $x = 42$, $y = 28$, $z = 24$, and $K = 262$.

7. Solve the problems posed in the following three epigrams, which appear in a collection entitled “The Greek Anthology,” compiled in part by a scholar named Metrodorus around A.D. 500. Some of its 46 mathematical problems are believed to date as far back as 600 B.C. [Note: Before solving parts (a) and (c), you will have to formulate the question.] (a) I desire my two sons to receive the thousand staters of which I am possessed, but let the fifth part of the legitimate one's share exceed by ten the fourth part of what falls to the illegitimate one. (b) Make me a crown weighing sixty minae, mixing gold and brass, and with them tin and much-wrought iron. Let the gold and brass together form two-thirds, the gold and tin together three-fourths, and the gold and iron three-fifths. Tell me how much gold you must put in, how much brass, how much tin, and how much iron, so as to make the whole crown weigh sixty minae. (c) First person: I have what the second has and the third of what the third has. Second person: I have what the third has and the third of what the first has. Third person: And I have ten minae and the third of what the second has.
Answer: (a) Legitimate son, $577\tfrac{7}{9}$ staters; illegitimate son, $422\tfrac{2}{9}$ staters (b) Gold, $30\tfrac{1}{2}$ minae; brass, $9\tfrac{1}{2}$ minae; tin, $14\tfrac{1}{2}$ minae; iron, $5\tfrac{1}{2}$ minae (c) First person, 45; second person, $37\tfrac{1}{2}$; third person, $22\tfrac{1}{2}$

Section 10.3 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. (a) Solve Archimedes' Cattle Problem using a symbolic algebra program. (b) The Cattle Problem has a second part in which two additional conditions are imposed. The first of these states that “When the white bulls mingled their number with the black, they stood firm, equal in depth and breadth.” This requires that be a square number, that is, 1, 4, 9, 16, 25, and so on. Show that this requires that the values of k in Eq. 4 be restricted as follows: and find the smallest total number of cattle that satisfies this second condition. Remark The second condition imposed in the second part of the Cattle Problem states that “When the yellow and the dappled bulls were gathered into one herd, they stood in such a manner that their number, beginning from one, grew slowly greater ’til it completed a triangular figure.” This requires that the quantity be a triangular number—that is, a number of the form , , , . This final part of the problem was not completely solved until 1965 when all 206,545 digits of the smallest number of cattle that satisfies this condition were found using a computer. T2. The following problem is from “The Nine Chapters of the Mathematical Art” and determines a homogeneous linear system of five equations in six unknowns. Show that the system has infinitely many solutions, and find the one for which the depth of the well and the lengths of the five ropes are the smallest possible positive integers. Suppose that five families share a well. Suppose further that 2 of A's ropes are short of the well's depth by one of B's ropes. 3 of B's ropes are short of the well's depth by one of C's ropes. 4 of C's ropes are short of the well's depth by one of D's ropes. 5 of D's ropes are short of the well's depth by one of E's ropes. 6 of E's ropes are short of the well's depth by one of A's ropes.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.4 Cubic Spline Interpolation In this section an artist's drafting aid is used as a physical model for the mathematical problem of finding a curve that passes through specified points in the plane. The parameters of the curve are determined by solving a linear system of equations.

Prerequisites Linear Systems Matrix Algebra Differential Calculus

Curve Fitting Fitting a curve through specified points in the plane is a common problem encountered in analyzing experimental data, in ascertaining the relations among variables, and in design work. A ubiquitous application is in the design and description of computer and printer fonts, such as PostScript™ and TrueType™ fonts (Figure 10.4.1). In Figure 10.4.2 seven points in the xy-plane are displayed, and in Figure 10.4.4 a smooth curve has been drawn that passes through them. A curve that passes through a set of points in the plane is said to interpolate those points, and the curve is called an interpolating curve for those points. The interpolating curve in Figure 10.4.4 was drawn with the aid of a drafting spline (Figure 10.4.3). This drafting aid consists of a thin, flexible strip of wood or other material that is bent to pass through the points to be interpolated. Attached sliding weights hold the spline in position while the artist draws the interpolating curve. The drafting spline will serve as the physical model for a mathematical theory of interpolation that we will discuss in this section.

Figure 10.4.1

Figure 10.4.2

Figure 10.4.3

Figure 10.4.4

Statement of the Problem Suppose that we are given n points in the xy-plane, which we wish to interpolate with a “well-behaved” curve (Figure 10.4.5). For convenience, we take the points to be equally spaced in the x-direction, although our results can easily be extended to the case of unequally spaced points. If we let the common distance between the x-coordinates of the points be h, then we have Let , denote the interpolating curve that we seek. We assume that this curve describes the displacement of a drafting spline that interpolates the n points when the weights holding down the spline are situated precisely at the n points. It is known from linear beam theory that for small displacements, the fourth derivative of the displacement of a beam is zero along any interval of the x-axis that contains no external forces acting on the beam. If we treat our drafting spline as a thin beam and realize that the only external forces acting on it arise from the weights at the n specified points, then it follows that (1) for values of x lying in the

open intervals between the n points.

Figure 10.4.5

We also need the result from linear beam theory that states that for a beam acted upon only by external forces, the displacement must have two continuous derivatives. In the case of the interpolating curve constructed by the drafting spline, this means that the curve and its first and second derivatives must be continuous for $x_1 \le x \le x_n$. The condition that the second derivative be continuous is what causes a drafting spline to produce a pleasing curve, as it results in continuous curvature. The eye can perceive sudden changes in curvature—that is, discontinuities in the second derivative—but sudden changes in higher derivatives are not discernible. Thus, the condition that the second derivative be continuous is the minimal prerequisite for the interpolating curve to be perceptible as a single smooth curve, rather than as a series of separate curves pieced together.

To determine the mathematical form of the interpolating function, we observe that because its fourth derivative is zero in the intervals between the n specified points, it follows by integrating this equation four times that the function must be a cubic polynomial in x in each such interval. In general, however, it will be a different cubic polynomial in each interval, so the interpolating function must have the form

(2)

where

are cubic polynomials. For convenience, we will write these in the form

(3)

The 's, 's, 's, and 's constitute a total of coefficients that we must determine to specify completely. If we choose these coefficients so that interpolates the n specified points in the plane and , , and are continuous, then the resulting interpolating curve is called a cubic spline.

Derivation of the Formula of a Cubic Spline From Equations 2 and 3, we have

(4)

so

(5)

and

(6)

We will now use these equations and the four properties of cubic splines stated below to express the unknown coefficients , , , , in terms of the known coordinates . 1.

interpolates the points Because

interpolates the points

,

. ,

, we have

,

(7) From the first

of these equations and 4, we obtain

(8)

From the last equation in 7, the last equation in 4, and the fact that

, we obtain (9)

2.

is continuous on Because

.

is continuous for

, it follows that at each point

in the set

we must have (10)

Otherwise, the graphs of interpolating property

and would not join together to form a continuous curve at . When we apply the , it follows from 10 that , , or from 4 that

(11)

3.

is continuous on Because

.

is continuous for

, it follows that

or, from 5,

(12)

4.

is continuous on Because

is continuous for

. , it follows that

or, from 6,

(13)

Equations 8, 9, 11, 12, and 13 constitute a system of linear equations in the unknown coefficients , , , , . Consequently, we need two more equations to determine these coefficients uniquely. Before obtaining these additional equations, however, we can simplify our existing system by expressing the unknowns , , , and in terms of

new unknown quantities and the known quantities For example, from 6 it follows that

so

Moreover, we already know from 8 that We leave it as an exercise for you to derive the expressions for the as follows:

's and

's in terms of the

's and

's. The final result is

THEOREM 10.4.1 Cubic Spline Interpolation Given n points

with

,

, the cubic spline

that interpolates these points has coefficients given by

(14)

for

, where

,

.

From this result, we see that the quantities uniquely determine the cubic spline. To find these quantities, we substitute the expressions for , , and given in 14 into 12. After some algebraic simplification, we obtain

(15)

or, in matrix form,

This is a linear system of equations for the n unknowns . Thus, we still need two additional equations to determine uniquely. The reason for this is that there are infinitely many cubic splines that interpolate the given points, so we simply do not have enough conditions to determine a unique cubic spline passing through the points. We discuss below three possible ways of specifying the two additional conditions required to obtain a unique cubic spline through the points. (The exercises present two more.) They are summarized in Table 1. Table 1

The Natural Spline The two simplest mathematical conditions we can impose are These conditions together with 15 result in an

linear system for

For numerical calculations it is more convenient to eliminate

and

, which can be written in matrix form as

from this system and write

(16)

together with (17)

(18) Thus, the determined by 17 and 18.

linear system can be solved for the

coefficients

, and

and

are

Physically, the natural spline results when the ends of a drafting spline extend freely beyond the interpolating points without constraint. The end portions of the spline outside the interpolating points will fall on straight line paths, causing to vanish at the endpoints

and

and resulting in the mathematical conditions

.

The natural spline tends to flatten the interpolating curve at the endpoints, which may be undesirable. Of course, if it is required that the second derivative of the interpolating curve vanish at the endpoints, then the natural spline must be used.
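For readers who want to compute a natural spline numerically, the following Python sketch may be useful. It is our own illustration and uses the standard equations for equally spaced data, with $M_i$ denoting the second derivative of the spline at $x_i$; it is consistent with the development above but is not a transcription of Equations 16, 17, and 18.

# Natural cubic spline through equally spaced points (x_i, y_i), i = 1, ..., n.
# The natural conditions are M_1 = M_n = 0, and the interior M's satisfy
# M_{i-1} + 4*M_i + M_{i+1} = 6*(y_{i-1} - 2*y_i + y_{i+1}) / h^2.
import numpy as np

def natural_spline_coefficients(x, y):
    n = len(x)
    h = x[1] - x[0]
    # tridiagonal system for the interior second derivatives M_2, ..., M_{n-1}
    A = 4 * np.eye(n - 2) + np.eye(n - 2, k=1) + np.eye(n - 2, k=-1)
    rhs = 6 * (y[:-2] - 2 * y[1:-1] + y[2:]) / h**2
    M = np.zeros(n)
    M[1:-1] = np.linalg.solve(A, rhs)
    # On [x_i, x_{i+1}] the spline is a_i(x-x_i)^3 + b_i(x-x_i)^2 + c_i(x-x_i) + d_i.
    a = (M[1:] - M[:-1]) / (6 * h)
    b = M[:-1] / 2
    c = (y[1:] - y[:-1]) / h - (M[1:] + 2 * M[:-1]) * h / 6
    d = y[:-1]
    return a, b, c, d

# Example: five equally spaced samples of a smooth function.
x = np.linspace(0.0, 4.0, 5)
y = np.sin(x)
print(natural_spline_coefficients(x, y))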

The Parabolic Runout Spline The two additional constraints imposed for this type of spline are (19)

(20) If we use the preceding two equations to eliminate

and

from 15, we obtain the

linear system

(21)

for

. Once these

values have been determined,

and

are determined from 19 and 20.

From 14 we see that implies that , and implies that . Thus, from 3 there are no cubic terms in the formula for the spline over the end intervals and . Hence, as the name suggests, the parabolic runout spline reduces to a parabolic curve over these end intervals.

The Cubic Runout Spline

For this type of spline, we impose the two additional conditions (22)

(23) Using these two equations to eliminate :

and

from 15 results in the following

linear system for

(24)

After we solve this linear system for

, we can use 22 and 23 to determine

and

.

If we rewrite 22 as it follows from 14 that

. Because

on

and

on

, we see that

is

constant over the entire interval . Consequently, consists of a single cubic curve over the interval rather than two different cubic curves pieced together at . [To see this, integrate three times.] A similar analysis shows that consists of a single cubic curve over the last two intervals. Whereas the natural spline tends to produce an interpolating curve that is flat at the endpoints, the cubic runout spline has the opposite tendency: it produces a curve with pronounced curvature at the endpoints. If neither behavior is desired, the parabolic runout spline is a reasonable compromise.

E X A M P L E 1 Using a Parabolic Runout Spline The density of water is well known to reach a maximum at a temperature slightly above freezing. Table 2, from the Handbook of Chemistry and Physics (CRC Press, 2009), gives the density of water in grams per cubic centimeter for five equally spaced temperatures from to . We will interpolate these five temperature–density measurements with a parabolic runout spline and attempt to find the maximum density of water in this range by finding the maximum value on this cubic spline. In the exercises we ask you to perform similar calculations using a natural spline and a cubic runout spline to interpolate the data points. Table 2

Set

Then

and the linear system 21 for the parabolic runout spline becomes

Solving this system yields

From 19 and 20, we have

Solving for the runout spline:

's,

's,

's, and

's in 14, we obtain the following expression for the interpolating parabolic

This spline is plotted in Figure 10.4.6. From that figure we see that the maximum is attained in the interval . To find this maximum, we set equal to zero in the interval :

To three significant digits the root of this quadratic in the interval is , and for this value of x, . Thus, according to our interpolated estimate, the maximum density of water is attained at . This agrees well with the experimental maximum density of attained at

. (In the original metric system, the gram was defined as the mass of one cubic

centimeter of water at its maximum density.)

Figure 10.4.6

Closing Remarks In addition to producing excellent interpolating curves, cubic splines and their generalizations are useful for numerical integration and differentiation, for the numerical solution of differential and integral equations, and in optimization theory.

Exercise Set 10.4 1. Derive the expressions for

and

in Equations 14 of Theorem 10.4.1.

2. The six points

lie on the graph of

, where x is in radians.

(a) Find the portion of the parabolic runout spline that interpolates these six points. Maintain an accuracy of five decimal places in your calculations. (b) Calculate the value of the spline you found in part (a) at the specified point. What is the percentage error of this value with respect to the “exact” value?

Answer: (a) (b) 3. The following five points lie on a single cubic curve. (a) Which of the three types of cubic splines (natural, parabolic runout, or cubic runout) would agree exactly with the single cubic curve on which the five points lie? (b) Determine the cubic spline you chose in part (a), and verify that it is a single cubic curve that interpolates the five points. Answer: (a) The cubic runout spline

(b) 4. Repeat the calculations in Example 1 using a natural spline to interpolate the five data points. Answer:

Maximum at 5. Repeat the calculations in Example 1 using a cubic runout spline to interpolate the five data points. Answer:

Maximum at 6. Consider the five points

,

,

,

(a) Use a natural spline to interpolate the data points

, and ,

(b) Use a natural spline to interpolate the data points

on the graph of , and

,

, and

.

. .

(c) Explain the unusual nature of your result in part (b). Answer: (a)

(b) (c) The three data points are collinear.

7. (The Periodic Spline) If it is known or if it is desired that the n points to be interpolated lie on a single cycle of a periodic curve with the given period, then an interpolating cubic spline must satisfy the corresponding periodicity conditions. (a) Show that these three periodicity conditions require that
(b) Using the three equations in part (a) and Equations 15, construct a linear system for the unknowns in matrix form.
Answer: (b)

8. (The Clamped Spline) Suppose that, in addition to the n points to be interpolated, we are given specific values for the slopes of the interpolating cubic spline at the two endpoints.
(a) Show that
(b) Using the equations in part (a) and Equations 15, construct a linear system for the unknowns in matrix form.
Remark The clamped spline described in this exercise is the most accurate type of spline for interpolation work if the slopes at the endpoints are known or can be estimated.
Answer: (b)

Section 10.4 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. In the solution of the natural cubic spline problem, it is necessary to solve a system of equations having coefficient matrix

If we can present a formula for the inverse of this matrix, then the solution for the natural cubic spline problem can be easily obtained. In this exercise and the next, we use a computer to discover this formula. Toward this end, we first determine an

expression for the determinant of

, denoted by the symbol

. Given that

we see that and

(a) Use the cofactor expansion of determinants to show that for

. This says, for example, that

and so on. Using a computer, check this result for

.

(b) By writing and the identity,

, in matrix form,

show that

(c) Use the methods in Section 5.2 and a computer to show that

and hence

for

.

(d) Using a computer, check this result for

.

T2. In this exercise, we determine a formula for calculating to be 1. (a) Use a computer to compute

for

from

for

, assuming that

, 2, 3, 4, and 5.

(b) From your results in part (a), discover the conjecture that

where

for

and

.

(c) Use the result in part (b) to compute

and compare it to the result obtained using the computer.

is defined

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.5 Markov Chains In this section we describe a general model of a system that changes from state to state. We then apply the model to several concrete problems.

Prerequisites Linear Systems Matrices Intuitive Understanding of Limits

A Markov Process Suppose a physical or mathematical system undergoes a process of change such that at any moment it can occupy one of a finite number of states. For example, the weather in a certain city could be in one of three possible states: sunny, cloudy, or rainy. Or an individual could be in one of four possible emotional states: happy, sad, angry, or apprehensive. Suppose that such a system changes with time from one state to another and at scheduled times the state of the system is observed. If the state of the system at any observation cannot be predicted with certainty, but the probability that a given state occurs can be predicted by just knowing the state of the system at the preceding observation, then the process of change is called a Markov chain or Markov process.

DEFINITION 1 If a Markov chain has k possible states, which we label as , then the probability that the system is in state i at any observation after it was in state j at the preceding observation is denoted by and is called the transition probability from state j to state i. The matrix is called the transition matrix of the Markov chain. For example, in a three-state Markov chain, the transition matrix has the form

In this matrix, $p_{32}$ is the probability that the system will change from state 2 to state 3, $p_{11}$ is the probability that the system will still be in state 1 if it was previously in state 1, and so forth.

E X A M P L E 1 Transition Matrix of the Markov Chain A car rental agency has three rental locations, denoted by 1, 2, and 3. A customer may rent a car from any of the three locations and return the car to any of the three locations. The manager finds that customers return the cars to the various locations according to the following probabilities:

This matrix is the transition matrix of the system considered as a Markov chain. From this matrix, the probability is that a car rented from location 3 will be returned to location 2, the probability is that a car rented from location 1 will be returned to location 1, and so forth.

E X A M P L E 2 Transition Matrix of the Markov Chain By reviewing its donation records, the alumni office of a college finds that 80% of its alumni who contribute to the annual fund one year will also contribute the next year, and 30% of those who do not contribute one year will contribute the next. This can be viewed as a Markov chain with two states: state 1 corresponds to an alumnus giving a donation in any one year, and state 2 corresponds to the alumnus not giving a donation in that year. The transition matrix is

In the examples above, the transition matrices of the Markov chains have the property that the entries in any column sum to 1. This is not accidental. If is the transition matrix of any Markov chain with k states, then for each j we must have (1) because if the system is in state j at one observation, it is certain to be in one of the k possible states at the next observation. A matrix with property 1 is called a stochastic matrix, a probability matrix, or a Markov matrix. From the preceding discussion, it follows that the transition matrix for a Markov chain must be a stochastic matrix. In a Markov chain, the state of the system at any observation time cannot generally be determined with certainty. The best one can usually do is specify probabilities for each of the possible states. For example, in a Markov chain with three states, we might describe the possible state of the system at some observation time by a column vector

in which is the probability that the system is in state 1, the probability that it is in state 2, and probability that it is in state 3. In general we make the following definition.

the

DEFINITION 2 The state vector for an observation of a Markov chain with k states is a column vector x whose ith component is the probability that the system is in the ith state at that time. Observe that the entries in any state vector for a Markov chain are nonnegative and have a sum of 1. (Why?) A column vector that has this property is called a probability vector. Let us suppose now that we know the state vector for a Markov chain at some initial observation. The following theorem will enable us to determine the state vectors at the subsequent observation times.

THEOREM 10.5.1 If P is the transition matrix of a Markov chain and $\mathbf{x}^{(n)}$ is the state vector at the nth observation, then $\mathbf{x}^{(n+1)} = P\mathbf{x}^{(n)}$.

The proof of this theorem involves ideas from probability theory and will not be given here. From this theorem, it follows that

$\mathbf{x}^{(1)} = P\mathbf{x}^{(0)}, \quad \mathbf{x}^{(2)} = P\mathbf{x}^{(1)} = P^2\mathbf{x}^{(0)}, \quad \mathbf{x}^{(3)} = P\mathbf{x}^{(2)} = P^3\mathbf{x}^{(0)}, \quad \ldots$

In this way, the initial state vector $\mathbf{x}^{(0)}$ and the transition matrix P determine $\mathbf{x}^{(n)}$ for $n = 1, 2, \ldots$.

E X A M P L E 3 Example 2 Revisited The transition matrix in Example 2 was

We now construct the probable future donation record of a new graduate who did not give a donation in the

initial year after graduation. For such a graduate the system is initially in state 2 with certainty, so the initial state vector is

From Theorem 10.5.1 we then have

Thus, after three years the alumnus can be expected to make a donation with probability .525. Beyond three years, we find the following state vectors (to three decimal places):

For all n beyond 11, we have

to three decimal places. In other words, the state vectors converge to a fixed vector as the number of observations increases. (We will discuss this further below.)

E X A M P L E 4 Example 1 Revisited The transition matrix in Example 1 was

If a car is rented initially from location 2, then the initial state vector is

Using this vector and Theorem 10.5.1, one obtains the later state vectors listed in Table 1. Table 1

For all values of n greater than 11, all state vectors are equal to

to three decimal places.

Two things should be observed in this example. First, it was not necessary to know how long a customer kept the car. That is, in a Markov process the time period between observations need not be regular. Second, the state vectors approach a fixed vector as n increases, just as in the first example.

E X A M P L E 5 Using Theorem 10.5.1 A traffic officer is assigned to control the traffic at the eight intersections indicated in Figure 10.5.1. She is instructed to remain at each intersection for an hour and then to either remain at the same intersection or move to a neighboring intersection. To avoid establishing a pattern, she is told to choose her new intersection on a random basis, with each possible choice equally likely. For example, if she is at intersection 5, her next intersection can be 2, 4, 5, or 8, each with probability . Every day she starts at the location where she stopped the day before. The transition matrix for this Markov chain is

Figure 10.5.1 If the traffic officer begins at intersection 5, her probable locations, hour by hour, are given by the state vectors given in Table 2. For all values of n greater than 22, all state vectors are equal to to three decimal places. Thus, as with the first two examples, the state vectors approach a fixed vector as n increases. Table 2

Limiting Behavior of the State Vectors In our examples we saw that the state vectors approached some fixed vector as the number of observations increased. We now ask whether the state vectors always approach a fixed vector in a Markov chain. A simple example shows that this is not the case.

E X A M P L E 6 System Oscillates Between Two State Vectors Let

Then, because

and

, we have that

and

This system oscillates indefinitely between the two state vectors

and

, so it does not

approach any fixed vector.

However, if we impose a mild condition on the transition matrix, we can show that a fixed limiting state vector is approached. This condition is described by the following definition.

DEFINITION 3 A transition matrix is regular if some integer power of it has all positive entries. Thus, for a regular transition matrix P, there is some positive integer m such that all entries of are positive. This is the case with the transition matrices of Examples 1 and 2 for . In Example 5 it turns out that has all positive entries. Consequently, in all three examples the transition matrices are regular. A Markov chain that is governed by a regular transition matrix is called a regular Markov chain. We will see that every regular Markov chain has a fixed state vector q such that approaches q as n increases for any choice of . This result is of major importance in the theory of Markov chains. It is based on the following theorem.

THEOREM 10.5.2 Behavior of $P^n$ as $n \to \infty$ If P is a regular transition matrix, then as $n \to \infty$, $P^n$ approaches a matrix whose columns are all equal to the same vector $\mathbf{q}$, where the entries $q_1, q_2, \ldots, q_k$ of $\mathbf{q}$ are positive numbers such that $q_1 + q_2 + \cdots + q_k = 1$.

We will not prove this theorem here. We refer you to a more specialized text, such as J. Kemeny and J. Snell, Finite Markov Chains (New York: Springer-Verlag, 1976). Let us set

Thus, Q is a transition matrix, all of whose columns are equal to the probability vector q. Q has the property that if x is any probability vector, then

That is, Q transforms any probability vector x into the fixed probability vector q. This result leads to the following theorem.

THEOREM 10.5.3 Behavior of $P^n\mathbf{x}$ as $n \to \infty$ If P is a regular transition matrix and $\mathbf{x}$ is any probability vector, then as $n \to \infty$, $P^n\mathbf{x} \to \mathbf{q}$,

where q is a fixed probability vector, independent of n, all of whose entries are positive.

This result holds since Theorem 10.5.2 implies that $P^n \to Q$ as $n \to \infty$. This in turn implies that $P^n\mathbf{x} \to Q\mathbf{x} = \mathbf{q}$ as $n \to \infty$. Thus, for a regular Markov chain, the system eventually approaches a fixed state vector q. The vector q is called the steady-state vector of the regular Markov chain. For systems with many states, usually the most efficient technique of computing the steady-state vector q is simply to calculate $P^n\mathbf{x}$ for some large n. Our examples illustrate this procedure. Each is a regular Markov process, so that convergence to a steady-state vector is ensured. Another way of computing the steady-state vector is to make use of the following theorem.

THEOREM 10.5.4 Steady-State Vector The steady-state vector q of a regular transition matrix P is the unique probability vector that satisfies the equation $P\mathbf{q} = \mathbf{q}$.

To see this, consider the matrix identity $PP^n = P^{n+1}$. By Theorem 10.5.2, both $P^n$ and $P^{n+1}$ approach Q as $n \to \infty$. Thus, we have $PQ = Q$. Any one column of this matrix equation gives $P\mathbf{q} = \mathbf{q}$. To show that q is the only probability vector that satisfies this equation, suppose r is another probability vector such that $P\mathbf{r} = \mathbf{r}$. Then also $P^n\mathbf{r} = \mathbf{r}$ for $n = 1, 2, \ldots$. When we let $n \to \infty$, Theorem 10.5.3 leads to $\mathbf{q} = \mathbf{r}$. Theorem 10.5.4 can also be expressed by the statement that the homogeneous linear system $(I - P)\mathbf{q} = \mathbf{0}$ has a unique solution vector q with nonnegative entries that satisfy the condition $q_1 + q_2 + \cdots + q_k = 1$. We can apply this technique to the computation of the steady-state vectors for our examples.

E X A M P L E 7 Example 2 Revisited In Example 2 the transition matrix was

so the linear system

is (2)

This leads to the single independent equation $0.2q_1 - 0.3q_2 = 0$, or $q_1 = 1.5q_2$. Thus, when we set $q_2 = s$, any solution of 2 is of the form $q_1 = 1.5s$, $q_2 = s$, where s is an arbitrary constant. To make the vector q a probability vector, we set $s = 1/(1.5 + 1) = 0.4$. Consequently, $\mathbf{q} = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$

is the steady-state vector of this regular Markov chain. This means that over the long run, 60% of the alumni will give a donation in any one year, and 40% will not. Observe that this agrees with the result obtained numerically in Example 3.
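Theorem 10.5.4 translates directly into a small computation. The following sketch is our own; it finds the steady-state vector of the Example 2 transition matrix by solving the homogeneous system together with the normalization $q_1 + q_2 = 1$, and the same few lines work for any regular transition matrix.

# Steady-state vector from (I - P) q = 0 and q_1 + ... + q_k = 1.
import numpy as np

P = np.array([[0.8, 0.3],
              [0.2, 0.7]])

k = P.shape[0]
A = np.vstack([np.eye(k) - P, np.ones(k)])   # homogeneous equations plus the normalization
b = np.zeros(k + 1)
b[-1] = 1.0

q, *_ = np.linalg.lstsq(A, b, rcond=None)    # least squares handles the redundant row
print(q)                                     # [0.6 0.4]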

E X A M P L E 8 Example 1 Revisited In Example 1 the transition matrix was

so the linear system

is

The reduced row echelon form of the coefficient matrix is (verify)

so the original linear system is equivalent to the system

When we set

, any solution of the linear system is of the form

To make this a probability vector, we set

Thus, the steady-state vector of the system is

This agrees with the result obtained numerically in Table 1. The entries of q give the long-run probabilities that any one car will be returned to location 1, 2, or 3, respectively. If the car rental agency has a fleet of 1000 cars, it should design its facilities so that there are at least 558 spaces at location 1, at least 230 spaces at location 2, and at least 214 spaces at location 3.

E X A M P L E 9 Example 5 Revisited We will not give the details of the calculations but simply state that the unique probability vector solution of the linear system is

The entries in this vector indicate the proportion of time the traffic officer spends at each intersection over the long term. Thus, if the objective is for her to spend the same proportion of time at each intersection, then the strategy of random movement with equal probabilities from one intersection to another is not a good one. (See Exercise 5.)

Exercise Set 10.5 1. Consider the transition matrix

(a)

Calculate

for

if

.

(b) State why P is regular and find its steady-state vector. Answer: (a) (b) P is regular since all entries of P are positive;

2. Consider the transition matrix

(a) Calculate

,

, and

to three decimal places if

(b) State why P is regular and find its steady-state vector. Answer: (a)

(b) P is regular, since all entries of P are positive:

3. Find the steady-state vectors of the following regular transition matrices: (a)

(b) (c)

Answer: (a)

(b)

(c)

4. Let P be the transition matrix

(a) Show that P is not regular. (b)

Show that as n increases,

approaches

for any initial state vector

.

(c) What conclusion of Theorem 10.5.3 is not valid for the steady state of this transition matrix? Answer: (a) Thus, no integer power of P has all positive entries.

(b) (c)

as n increases, so The entries of the limiting vector

for any

as n increases.

are not all positive.

5. Verify that if P is a regular transition matrix all of whose row sums are equal to 1, then the entries of its steady-state vector are all equal to $1/k$, where k is the number of states. 6. Show that the transition matrix

is regular, and use Exercise 5 to find its steady-state vector. Answer:

has all positive entries;

7. John is either happy or sad. If he is happy one day, then he is happy the next day four times out of five. If he is sad one day, then he is sad the next day one time out of three. Over the long term, what are the chances that John is happy on any given day? Answer:

8. A country is divided into three demographic regions. It is found that each year 5% of the residents of region 1 move to region 2, and 5% move to region 3. Of the residents of region 2, 15% move to region 1 and 10% move to region 3. And of the residents of region 3, 10% move to region 1 and 5% move to region 2. What percentage of the population resides in each of the three regions after a long period of time? Answer: in region 1,

in region 2, and

in region 3

Section 10.5 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. Consider the sequence of transition matrices with

and so on. (a) Use a computer to show that each of these four matrices is regular by computing their squares. (b) Verify Theorem 10.5.2 by computing the 100th power of for the limiting value of as for all .

. Then make a conjecture as to

(c) Verify that the common column of the limiting matrix you found in part (b) satisfies the equation , as required by Theorem 10.5.4. T2. A mouse is placed in a box with nine rooms as shown in the accompanying figure. Assume that it is equally likely that the mouse goes through any door in the room or stays in the room. (a) Construct the

transition matrix for this problem and show that it is regular.

(b) Determine the steady-state vector for the matrix. (c) Use a symmetry argument to show that this problem may be solved using only a

Figure Ex-T2

matrix.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.6 Graph Theory In this section we introduce matrix representations of relations among members of a set. We use matrix arithmetic to analyze these relationships.

Prerequisites Matrix Addition and Multiplication

Relations Among Members of a Set There are countless examples of sets with finitely many members in which some relation exists among members of the set. For example, the set could consist of a collection of people, animals, countries, companies, sports teams, or cities; and the relation between two members, A and B, of such a set could be that person A dominates person B, animal A feeds on animal B, country A militarily supports country B, company A sells its product to company B, sports team A consistently beats sports team B, or city A has a direct airline flight to city B. We will now show how the theory of directed graphs can be used to mathematically model relations such as those in the preceding examples.

Directed Graphs A directed graph is a finite set of elements, , together with a finite collection of ordered pairs of distinct elements of this set, with no ordered pair being repeated. The elements of the set are called vertices, and the ordered pairs are called directed edges, of the directed graph. We use the notation (which is read “ is connected to ”) to indicate that the directed edge belongs to the directed graph. Geometrically, we can visualize a directed graph (Figure 10.6.1) by representing the vertices as points in the plane and representing the directed edge by drawing a line or arc from vertex to vertex , with an arrow pointing from to . If both and hold (denoted , we draw a single line between and with two oppositely pointing arrows (as with and in the figure).

Figure 10.6.1

As in Figure 10.6.1, for example, a directed graph may have separate “components” of vertices that are connected only among themselves; and some vertices may not be connected with any other vertex. Also, because $P_i \to P_i$ is not permitted in a directed graph, a vertex cannot be connected with itself by a single arc that does not pass through any other vertex. Figure 10.6.2 shows diagrams representing three more examples of directed graphs. With a directed graph having n vertices, we may associate an $n \times n$ matrix $M = [m_{ij}]$, called the vertex matrix of the directed graph. Its elements are defined by

$m_{ij} = 1$ if $P_i \to P_j$, and $m_{ij} = 0$ otherwise,

for $i, j = 1, 2, \ldots, n$. For the three directed graphs in Figure 10.6.2, the corresponding vertex matrices are

Figure 10.6.2 By their definition, vertex matrices have the following two properties: (i) All entries are either 0 or 1. (ii) All diagonal entries are 0. Conversely, any matrix with these two properties determines a unique directed graph having the given matrix as its vertex matrix. For example, the matrix

determines the directed graph in Figure 10.6.3.

Figure 10.6.3

E X A M P L E 1 Influences Within a Family A certain family consists of a mother, father, daughter, and two sons. The family members have influence, or power, over each other in the following ways: the mother can influence the daughter and the oldest son; the father can influence the two sons; the daughter can influence the father; the oldest son can influence the youngest son; and the youngest son can influence the mother. We may model this family influence pattern with a directed graph whose vertices are the five family members. If family member A influences family member B, we write . Figure 10.6.4 is the resulting directed graph, where we have used obvious letter designations for the five family members. The vertex matrix of this directed graph is

Figure 10.6.4
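The influence pattern of this example can also be analyzed numerically. In the sketch below (our own illustration), the family members are ordered mother, father, daughter, older son, younger son—an ordering of our choosing that may differ from the one used in Figure 10.6.4—and the square of the vertex matrix counts 2-step influences.

# Vertex matrix of the family-influence graph: M[i, j] = 1 if member i influences member j.
import numpy as np

members = ["mother", "father", "daughter", "older son", "younger son"]
M = np.zeros((5, 5), dtype=int)
M[0, 2] = M[0, 3] = 1      # mother -> daughter, older son
M[1, 3] = M[1, 4] = 1      # father -> both sons
M[2, 1] = 1                # daughter -> father
M[3, 4] = 1                # older son -> younger son
M[4, 0] = 1                # younger son -> mother

two_step = np.linalg.matrix_power(M, 2)
print(two_step[1, 0])      # one 2-step connection: father -> younger son -> mother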

E X A M P L E 2 Vertex Matrix: Moves on a Chessboard In chess the knight moves in an “L”-shaped pattern about the chessboard. For the board in Figure 10.6.5 it may move horizontally two squares and then vertically one square, or it may move vertically two squares and then horizontally one square. Thus, from the center square in the figure, the knight may move to any of the eight marked shaded squares. Suppose that the knight is restricted to the nine numbered squares in Figure 10.6.6. If by we mean that the knight may move from square i to square j, the directed graph in Figure 10.6.7 illustrates all

possible moves that the knight may make among these nine squares. In Figure 10.6.8 we have “unraveled” Figure 10.6.7 to make the pattern of possible moves clearer. The vertex matrix of this directed graph is given by

Figure 10.6.5

Figure 10.6.6

Figure 10.6.7

Figure 10.6.8

In Example 1 the father cannot directly influence the mother; that is, $F \to M$ is not true. But he can influence the youngest son, who can then influence the mother. We write this as a two-step chain through the youngest son and call it a 2-step connection from F to M. Analogously, we call a single directed edge a 1-step connection, a chain of three directed edges a 3-step connection, and so forth. Let us now consider a technique for finding the number of all possible r-step connections from one vertex $P_i$ to another vertex $P_j$ of an arbitrary directed graph. (This will include the case when $P_i$ and $P_j$ are the same vertex.) The number of 1-step connections from $P_i$ to $P_j$ is simply $m_{ij}$. That is, there is either zero or one 1-step connection from $P_i$ to $P_j$, depending on whether $m_{ij}$ is zero or one. For the number of 2-step connections, we consider the square of the vertex matrix. If we let $m^{(2)}_{ij}$ be the $(i, j)$-th element of $M^2$, we have

$m^{(2)}_{ij} = m_{i1}m_{1j} + m_{i2}m_{2j} + \cdots + m_{in}m_{nj}$   (1)

Now, if $m_{i1} = m_{1j} = 1$, there is a 2-step connection $P_i \to P_1 \to P_j$ from $P_i$ to $P_j$. But if either $m_{i1}$ or $m_{1j}$ is zero, such a 2-step connection is not possible. Thus $P_i \to P_1 \to P_j$ is a 2-step connection if and only if $m_{i1}m_{1j} = 1$. Similarly, for any $k = 1, 2, \ldots, n$, $P_i \to P_k \to P_j$ is a 2-step connection from $P_i$ to $P_j$ if and only if the term $m_{ik}m_{kj}$ on the right side of (1) is one; otherwise, the term is zero. Thus, the right side of (1) is the total number of 2-step connections from $P_i$ to $P_j$. A similar argument will work for finding the number of r-step connections from $P_i$ to $P_j$. In general, we have the following result.

THEOREM 10.6.1 Let M be the vertex matrix of a directed graph and let $m^{(r)}_{ij}$ be the $(i, j)$-th element of $M^r$. Then $m^{(r)}_{ij}$ is equal to the number of r-step connections from $P_i$ to $P_j$.

E X A M P L E 3 Using Theorem 10.6.1

Figure 10.6.9 is the route map of a small airline that services the four cities $P_1$, $P_2$, $P_3$, $P_4$. As a directed graph, its vertex matrix is

We have that the entries of $M^2$ and $M^3$ give the 2-step and 3-step connection counts. If we are interested in connections from one particular city to another, we may use Theorem 10.6.1 to find their number. Because the corresponding entry of M is 1, there is one 1-step connection; because the corresponding entry of $M^2$ is 1, there is one 2-step connection; and because the corresponding entry of $M^3$ is 3, there are three 3-step connections. To verify this, we can trace the corresponding routes directly in Figure 10.6.9.

Figure 10.6.9
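Theorem 10.6.1 reduces the count of r-step connections to a matrix power, so the counts are easy to check numerically. The sketch below (Python with NumPy) uses a small hypothetical vertex matrix, since the matrices of this example appear only in the figures; only the procedure, not the data, is taken from the text.

```python
import numpy as np

# Hypothetical vertex matrix of a small directed graph (0-based vertex labels).
M = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
])

def r_step_connections(M, r, i, j):
    """Number of r-step connections from vertex i to vertex j,
    by Theorem 10.6.1: the (i, j) entry of M raised to the rth power."""
    return int(np.linalg.matrix_power(np.asarray(M), r)[i, j])

for r in (1, 2, 3):
    print(r, r_step_connections(M, r, 3, 2))
```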

Cliques In everyday language a “clique” is a closely knit group of people (usually three or more) that tends to communicate within itself and has no place for outsiders. In graph theory this concept is given a more precise meaning.

DEFINITION 1 A subset of a directed graph is called a clique if it satisfies the following three conditions:
(i) The subset contains at least three vertices.
(ii) For each pair of vertices $P_i$ and $P_j$ in the subset, both $P_i \to P_j$ and $P_j \to P_i$ are true.
(iii) The subset is as large as possible; that is, it is not possible to add another vertex to the subset and still satisfy condition (ii).

This definition suggests that cliques are maximal subsets that are in perfect "communication" with each other. For example, if the vertices represent cities, and $P_i \to P_j$ means that there is a direct airline flight from city $P_i$ to city $P_j$, then there is a direct flight between any two cities within a clique in either direction.

E X A M P L E 4 A Directed Graph with Two Cliques The directed graph illustrated in Figure 10.6.10 (which might represent the route map of an airline) has two cliques. This example shows that a directed graph may contain several cliques and that a vertex may simultaneously belong to more than one clique.

Figure 10.6.10

For simple directed graphs, cliques can be found by inspection. But for large directed graphs, it would be desirable to have a systematic procedure for detecting cliques. For this purpose, it will be helpful to define a matrix $S = [s_{ij}]$ related to a given directed graph as follows:
$$s_{ij} = \begin{cases} 1, & \text{if } m_{ij} = m_{ji} = 1 \\ 0, & \text{otherwise} \end{cases}$$
The matrix S determines a directed graph that is the same as the given directed graph, with the exception that the directed edges with only one arrow are deleted. For example, if the original directed graph is given by Figure 10.6.11a, the directed graph that has S as its vertex matrix is given in Figure 10.6.11b. The matrix S may be obtained from the vertex matrix M of the original directed graph by setting $s_{ij} = 1$ if $m_{ij} = m_{ji} = 1$ and setting $s_{ij} = 0$ otherwise.

Figure 10.6.11 The following theorem, which uses the matrix S, is helpful for identifying cliques.

THEOREM 10.6.2 Identifying Cliques Let $s^{(3)}_{ij}$ be the $(i, j)$-th element of $S^3$. Then a vertex $P_i$ belongs to some clique if and only if $s^{(3)}_{ii} \neq 0$.

Proof If $s^{(3)}_{ii} \neq 0$, then there is at least one 3-step connection from $P_i$ to itself in the modified directed graph determined by S. Suppose it is $P_i \to P_j \to P_k \to P_i$. In the modified directed graph, all directed relations are two-way, so we also have the connections $P_i \leftrightarrow P_j \leftrightarrow P_k \leftrightarrow P_i$. But this means that $\{P_i, P_j, P_k\}$ is either a clique or a subset of a clique. In either case, $P_i$ must belong to some clique. The converse statement, "if $P_i$ belongs to a clique, then $s^{(3)}_{ii} \neq 0$," follows in a similar manner.

E X A M P L E 5 Using Theorem 10.6.2 Suppose that a directed graph has as its vertex matrix

Then

Because all diagonal entries of $S^3$ are zero, it follows from Theorem 10.6.2 that the directed graph has no cliques.

E X A M P L E 6 Using Theorem 10.6.2 Suppose that a directed graph has as its vertex matrix

Then

The nonzero diagonal entries of $S^3$ identify three vertices of the graph. Consequently, in the given directed graph, those three vertices belong to cliques. Because a clique must contain at least three vertices, the directed graph has only one clique, consisting of exactly those three vertices.
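Theorem 10.6.2 translates directly into a short program: form S from M, cube it, and read off the diagonal. The vertex matrix below is hypothetical (the matrices of Examples 5 and 6 are not reproduced in this text), so the output merely illustrates the procedure.

```python
import numpy as np

def clique_members(M):
    """Return the 0-based indices of vertices that belong to some clique,
    using Theorem 10.6.2: P_i belongs to a clique iff the (i, i) entry of
    S^3 is nonzero, where s_ij = 1 exactly when m_ij = m_ji = 1."""
    M = np.asarray(M)
    S = M * M.T                         # keep only two-way edges
    S3 = np.linalg.matrix_power(S, 3)
    return [i for i in range(len(M)) if S3[i, i] != 0]

# Hypothetical vertex matrix: vertices 0, 1, 2 influence each other both ways.
M = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
])
print(clique_members(M))   # -> [0, 1, 2]
```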

Dominance-Directed Graphs In many groups of individuals or animals, there is a definite "pecking order" or dominance relation between any two members of the group. That is, given any two individuals A and B, either A dominates B or B dominates A, but not both. In terms of a directed graph in which $P_i \to P_j$ means $P_i$ dominates $P_j$, this means that for all distinct pairs, either $P_i \to P_j$ or $P_j \to P_i$, but not both. In general, we have the following definition.

DEFINITION 2 A dominance-directed graph is a directed graph such that for any distinct pair of vertices $P_i$ and $P_j$, either $P_i \to P_j$ or $P_j \to P_i$, but not both.

An example of a directed graph satisfying this definition is a league of n sports teams that play each other exactly one time, as in one round of a round-robin tournament in which no ties are allowed. If $P_i \to P_j$ means that team $P_i$ beat team $P_j$ in their single match, it is easy to see that the definition of a dominance-directed graph is satisfied. For this reason, dominance-directed graphs are sometimes called tournaments. Figure 10.6.12 illustrates some dominance-directed graphs with three, four, and five vertices, respectively. In these three graphs, the circled vertices have the following interesting property: from each one there is either a 1-step or a 2-step connection to any other vertex in its graph. In a sports tournament, these vertices would correspond to the most "powerful" teams in the sense that these teams either beat any given team or beat some other team that beat the given team. We can now state and prove a theorem that guarantees that any dominance-directed graph has at least one vertex with this property.

THEOREM 10.6.3 Connections in Dominance-Directed Graphs In any dominance-directed graph, there is at least one vertex from which there is a 1-step or 2-step connection to any other vertex.

Proof Consider a vertex (there may be several) with the largest total number of 1-step and 2-step connections to other vertices in the graph. By renumbering the vertices, we may assume that $P_1$ is such a vertex. Suppose there is some vertex $P_i$ such that there is no 1-step or 2-step connection from $P_1$ to $P_i$. Then, in particular, $P_1 \to P_i$ is not true, so that by the definition of a dominance-directed graph, it must be that $P_i \to P_1$. Next, let $P_j$ be any vertex such that $P_1 \to P_j$ is true. Then we cannot have $P_j \to P_i$, as then $P_1 \to P_j \to P_i$ would be a 2-step connection from $P_1$ to $P_i$. Thus, it must be that $P_i \to P_j$. That is, $P_i$ has 1-step connections to all the vertices to which $P_1$ has 1-step connections. The vertex $P_i$ must then also have 2-step connections to all the vertices to which $P_1$ has 2-step connections. But because, in addition, we have $P_i \to P_1$, this means that $P_i$ has more 1-step and 2-step connections to other vertices than does $P_1$. However, this contradicts the way in which $P_1$ was chosen. Hence, there can be no vertex $P_i$ to which $P_1$ has no 1-step or 2-step connection.

Figure 10.6.12

This proof shows that a vertex with the largest total number of 1-step and 2-step connections to other vertices has the property stated in the theorem. There is a simple way of finding such vertices using the vertex matrix M and its square $M^2$. The sum of the entries in the ith row of M is the total number of 1-step connections from $P_i$ to other vertices, and the sum of the entries of the ith row of $M^2$ is the total number of 2-step connections from $P_i$ to other vertices. Consequently, the sum of the entries of the ith row of the matrix $A = M + M^2$ is the total number of 1-step and 2-step connections from $P_i$ to other vertices. In other words, a row of $A = M + M^2$ with the largest row sum identifies a vertex having the property stated in Theorem 10.6.3.
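The row-sum computation is mechanical, as the following sketch shows. It forms $A = M + M^2$ for a hypothetical four-team dominance-directed graph and ranks the vertices by their row sums; the data are illustrative, not those of Example 7 or Figure 10.6.13.

```python
import numpy as np

def powers(M):
    """Row sums of A = M + M^2: the total number of 1-step and
    2-step connections from each vertex to the other vertices."""
    M = np.asarray(M)
    A = M + M @ M
    return A.sum(axis=1)

# Hypothetical 4-team tournament: team i beats team j when M[i, j] = 1.
M = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])
p = powers(M)
print(p)                   # -> [5 4 2 3]
print(np.argsort(-p))      # vertices ranked from most to least powerful
```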

E X A M P L E 7 Using Theorem 10.6.3 Suppose that five baseball teams play each other exactly once, and the results are as indicated in the dominance-directed graph of Figure 10.6.13. The vertex matrix of the graph is

so

The row sums of A are

Because the second row has the largest row sum, the corresponding vertex must have a 1-step or 2-step connection to any other vertex. This is easily verified from Figure 10.6.13.

Figure 10.6.13

We have informally suggested that a vertex with the largest number of 1-step and 2-step connections to other vertices is a “powerful” vertex. We can formalize this concept with the following definition.

DEFINITION 3 The power of a vertex of a dominance-directed graph is the total number of 1-step and 2-step connections from it to other vertices. Alternatively, the power of a vertex $P_i$ is the sum of the entries of the ith row of the matrix $A = M + M^2$, where M is the vertex matrix of the directed graph.

E X A M P L E 8 Example 7 Revisited Let us rank the five baseball teams in Example 7 according to their powers. From the calculations for the row sums in that example, we have

Hence, the ranking of the teams according to their powers would be

Exercise Set 10.6 1. Construct the vertex matrix for each of the directed graphs illustrated in Figure Ex-1.

Figure Ex-1 Answer:

(a)

(b)

(c)

2. Draw a diagram of the directed graph corresponding to each of the following vertex matrices. (a)

(b)

(c)

Answer: (a)

(b)

(c)

3. Let M be the following vertex matrix of a directed graph:

(a) Draw a diagram of the directed graph. (b) Use Theorem 10.6.1 to find the number of 1-, 2-,and 3-step connections from the vertex vertex . Verify your answer by listing the various connections as in Example 3. (c) Repeat part (b) for the 1-, 2-, and 3-step connections from

to

.

Answer: (a)

(b)

(c)

4. (a) Compute the matrix product $M^TM$ for the vertex matrix M in Example 1.
(b) Verify that the kth diagonal entry of $M^TM$ is the number of family members who influence the kth family member. Why is this true?

(c) Find a similar interpretation for the values of the nondiagonal entries of

.

Answer: (a)

(c) The $(i, j)$-th entry is the number of family members who influence both the ith and jth family members.

5. By inspection, locate all cliques in each of the directed graphs illustrated in Figure Ex-5.

Figure Ex-5 Answer: (a) (b) (c)

and

6. For each of the following vertex matrices, use Theorem 10.6.2 to find all cliques in the corresponding directed graphs.

(a)

(b)

Answer: (a) None (b) 7. For the dominance-directed graph illustrated in Figure Ex-7 construct the vertex matrix and find the power of each vertex.

Figure Ex-7 Answer:

8. Five baseball teams play each other one time with the following results:

Rank the five baseball teams in accordance with the powers of the vertices they correspond to in the dominance-directed graph representing the outcomes of the games.

Answer: First, A; second, B and E (tie); fourth, C; fifth, D

Section 10.6 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. A graph having n vertices such that every vertex is connected to every other vertex has a vertex matrix given by

In this problem we develop a formula for from to .

whose

(a) Use a computer to compute the eight matrices

-th entry equals the number of k-step connections for

and for

(b) Use the results in part (a) and symmetry arguments to show that

.

can be written as

(c) Using the fact that

, show that

with

(d) Using part (c), show that

(e) Use the methods of Section 5.2 to compute

and thereby obtain expressions for

where

is the

(f) Show that for

and

, and eventually show that

matrix all of whose entries are ones and

is the

identity matrix.

, all vertices for these directed graphs belong to cliques.

T2. Consider a round-robin tournament among n players (labeled , , ) where beats , beats , beats beats and beats . Compute the “power” of each player, showing that they all have the same power; then determine that common power. [Hint: Use a computer to study the cases ; then make a conjecture and prove your conjecture to be true.]

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.7 Games of Strategy In this section we discuss a general game in which two competing players choose separate strategies to reach opposing objectives. The optimal strategy of each player is found in certain cases with the use of matrix techniques.

Prerequisites Matrix Multiplication Basic Probability Concepts

Game Theory To introduce the basic concepts in the theory of games, we will consider the following carnival-type game that two people agree to play. We will call the participants in the game player R and player C. Each player has a stationary wheel with a movable pointer on it as in Figure 10.7.1. For reasons that will become clear, we will call player R's wheel the row-wheel and player C's wheel the column-wheel. The row-wheel is divided into three sectors numbered 1, 2, and 3, and the column-wheel is divided into four sectors numbered 1, 2, 3, and 4. The fractions of the area occupied by the various sectors are indicated in the figure. To play the game, each player spins the pointer of his or her wheel and lets it come to rest at random. The number of the sector in which each pointer comes to rest is called the move of that player. Thus, player R has three possible moves and player C has four possible moves. Depending on the move each player makes, player C then makes a payment of money to player R according to Table 1.

Figure 10.7.1 Table 1

For example, if the row-wheel pointer comes to rest in sector 1 (player R makes move 1), and the column-wheel pointer comes to rest in sector 2 (player C makes move 2), then player C must pay player R the sum of $5. Some of the entries in this table are negative, indicating that player C makes a negative payment to player R. By this we mean that player R makes a positive payment to player C. For example, if the row-wheel shows 2 and the column-wheel shows 4, then player R pays player C the sum of $4, because the corresponding entry in the table is −$4. In this way the positive entries of the table are the gains of player R and the losses of player C, and the negative entries are the gains of player C and the losses of player R. In this game the players have no control over their moves; each move is determined by chance. However, if each player can decide whether he or she wants to play, then each would want to know how much he or she can expect to win or lose over the long term if he or she chooses to play. (Later in the section we will discuss this question and also consider a more complicated situation in which the players can exercise some control over their moves by varying the sectors of their wheels.)

Two-Person Zero-Sum Matrix Games The game described above is an example of a two-person zero-sum matrix game. The term zero-sum means that in each play of the game, the positive gain of one player is equal to the negative gain (loss) of the other player. That is, the sum of the two gains is zero. The term matrix game is used to describe a two-person game in which each player has only a finite number of moves, so that all possible outcomes of each play, and the corresponding gains of the players, can be displayed in tabular or matrix form, as in Table 1. In a general game of this type, let player R have m possible moves and let player C have n possible moves. In a play of the game, each player makes one of his or her possible moves, and then a payoff is made from player C to player R, depending on the moves. For $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, let us set
$$a_{ij} = \text{payoff that player C makes to player R if player R makes move } i \text{ and player C makes move } j$$
This payoff need not be money; it can be any type of commodity to which we can attach a numerical value. As before, if an entry $a_{ij}$ is negative, we mean that player C receives a payoff of $|a_{ij}|$ from player R. We arrange these mn possible payoffs in the form of an $m \times n$ matrix
$$A = [a_{ij}]$$
which we will call the payoff matrix of the game. Each player is to make his or her moves on a probabilistic basis. For example, for the game discussed in the introduction, the ratio of the area of a sector to the area of the wheel would be the probability that the player makes the move corresponding to that sector; the probability of each move can thus be read off from the sector areas in Figure 10.7.1. In the general case we make the following definitions:
$$p_i = \text{probability that player R makes move } i, \qquad q_j = \text{probability that player C makes move } j$$

It follows from these definitions that
$$p_1 + p_2 + \cdots + p_m = 1 \quad\text{and}\quad q_1 + q_2 + \cdots + q_n = 1$$
With the probabilities $p_i$ and $q_j$ we form two vectors:
$$\mathbf{p} = [p_1 \;\; p_2 \;\; \cdots \;\; p_m] \quad\text{and}\quad \mathbf{q} = [q_1 \;\; q_2 \;\; \cdots \;\; q_n]^T$$
We call the row vector p the strategy of player R and the column vector q the strategy of player C. For example, from Figure 10.7.1 we can read off the particular strategies for the carnival game described earlier. From the theory of probability, if the probability that player R makes move i is $p_i$, and independently the probability that player C makes move j is $q_j$, then $p_iq_j$ is the probability that for any one play of the game, player R makes move i and player C makes move j. The payoff to player R for such a pair of moves is $a_{ij}$. If we multiply each possible payoff by its corresponding probability and sum over all possible payoffs, we obtain the expression
$$\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}p_iq_j \qquad (1)$$
Equation 1 is a weighted average of the payoffs to player R; each payoff is weighted according to the probability of its occurrence. In the theory of probability, this weighted average is called the expected payoff to player R. It can be shown that if the game is played many times, the long-term average payoff per play to player R is given by this expression. We denote this expected payoff by $E(\mathbf{p}, \mathbf{q})$ to emphasize the fact that it depends on the strategies of the two players. From the definition of the payoff matrix A and the strategies p and q, it can be verified that we may express the expected payoff in matrix notation as
$$E(\mathbf{p}, \mathbf{q}) = \mathbf{p}A\mathbf{q} \qquad (2)$$
Because $E(\mathbf{p}, \mathbf{q})$ is the expected payoff to player R, it follows that $-E(\mathbf{p}, \mathbf{q})$ is the expected payoff to player C.

E X A M P L E 1 Expected Payoff to Player R For the carnival game described earlier, we have

Thus, in the long run, player R can expect to receive an average of about 18 cents from player C in each play of the game.
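Equation 2 makes the expected payoff a single matrix product. The sketch below evaluates $\mathbf{p}A\mathbf{q}$ for a hypothetical payoff matrix and strategies; the carnival-game data of Table 1 and Figure 10.7.1 are not reproduced here, so the numbers are purely illustrative.

```python
import numpy as np

def expected_payoff(p, A, q):
    """Expected payoff E(p, q) = p A q to player R (Equation 2)."""
    return float(np.asarray(p) @ np.asarray(A) @ np.asarray(q))

# Hypothetical 2x2 payoff matrix and strategies.
A = np.array([[ 3, -2],
              [-1,  4]])
p = np.array([0.5, 0.5])            # row player's strategy
q = np.array([[0.25], [0.75]])      # column player's strategy
print(expected_payoff(p, A, q))     # -> 1.0 for these illustrative numbers
```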

So far we have been discussing the situation in which each player has a predetermined strategy. We will now consider the more difficult situation in which both players can change their strategies independently. For example, in the game described in the introduction, we would allow both players to alter the areas of the sectors of their wheels and thereby control the probabilities of their respective moves. This qualitatively changes the nature of the problem and puts us firmly in the field of true game theory. It is understood that neither player knows what strategy the other will choose. It is also assumed that each player will make the best possible choice of strategy and that the other player knows this. Thus, player R attempts to choose a strategy p such that $E(\mathbf{p}, \mathbf{q})$ is as large as possible for the best strategy q that player C can choose; and similarly, player C attempts to choose a strategy q such that $E(\mathbf{p}, \mathbf{q})$ is as small as possible for the best strategy p that player R can choose. To see that such choices are actually possible, we will need the following theorem, called the Fundamental Theorem of Two-Person Zero-Sum Games. (The general proof, which involves ideas from the theory of linear programming, will be omitted. However, below we will prove this theorem for what are called strictly determined games and $2 \times 2$ matrix games.)

THEOREM 10.7.1 Fundamental Theorem of Zero-Sum Games There exist strategies $\mathbf{p}^*$ and $\mathbf{q}^*$ such that
$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \qquad (3)$$
for all strategies p and q.

The strategies $\mathbf{p}^*$ and $\mathbf{q}^*$ in this theorem are the best possible strategies for players R and C, respectively. To see why this is so, let $v = E(\mathbf{p}^*, \mathbf{q}^*)$. The left-hand inequality of Equation 3 then reads
$$E(\mathbf{p}^*, \mathbf{q}) \ge v \quad \text{for all strategies } \mathbf{q}$$
This means that if player R chooses the strategy $\mathbf{p}^*$, then no matter what strategy q player C chooses, the expected payoff to player R will never be below v. Moreover, it is not possible for player R to guarantee an expected payoff greater than v. To see why, suppose there is some strategy $\mathbf{p}'$ that player R can choose such that $E(\mathbf{p}', \mathbf{q}) > v$ for all strategies q. Then, in particular, $E(\mathbf{p}', \mathbf{q}^*) > v$. But this contradicts the right-hand inequality of Equation 3, which requires that $E(\mathbf{p}, \mathbf{q}^*) \le v$ for every strategy p. Consequently, the best player R can do is prevent his or her expected payoff from falling below the value v. Similarly, the best player C can do is ensure that player R's expected payoff does not exceed v, and this can be achieved by using strategy $\mathbf{q}^*$. On the basis of this discussion, we arrive at the following definitions.

and

are strategies such that (4)

for all strategies p and q, then (i)

is called an optimal strategy for player R.

(ii)

is called an optimal strategy for player C.

(iii)

is called the value of the game.

The wording in this definition suggests that optimal strategies are not necessarily unique. This is indeed the case, and in Exercise 2 we ask you to show this. However, it can be proved that any two sets of optimal strategies always result in the same value v of the game. That is, if , and , are optimal strategies, then (5) The value of a game is thus the expected payoff to player R when both players choose any possible optimal strategies. To find optimal strategies, we must find vectors

and

that satisfy Equation 4. This is generally done by using linear

programming techniques. Next, we discuss special cases for which optimal strategies may be found by more elementary techniques. We now introduce the following definition.

DEFINITION 2 An entry

in a payoff matrix A is called a saddle point if

(i)

is the smallest entry in its row, and

(ii)

is the largest entry in its column.

A game whose payoff matrix has a saddle point is called strictly determined. For example, the shaded element in each of the following payoff matrices is a saddle point:

If a matrix has a saddle point

, it turns out that the following strategies are optimal strategies for the two players:

That is, an optimal strategy for player R is to always make the rth move, and an optimal strategy for player C is to always make the sth move. Such strategies for which only one move is possible are called pure strategies. Strategies for which more than one move is possible are called mixed strategies. To show that the above pure strategies are optimal, you can verify the following three equations (see Exercise 6): (6)

(7)

(8) Together, these three equations imply that for all strategies p and q. Because this is exactly Equation 4, it follows that

and

are optimal strategies.

From Equation 6 the value of a strictly determined game is simply the numerical value of a saddle point . It is possible for a payoff matrix to have several saddle points, but then the uniqueness of the value of a game guarantees that the numerical values of all saddle points are the same.

E X A M P L E 2 Optimal Strategies to Maximize a Viewing Audience Two competing television networks, R and C, are scheduling one-hour programs in the same time period. Network R can schedule one of three possible programs, and network C can schedule one of four possible programs. Neither network knows which program the other will schedule. Both networks ask the same outside polling agency to give them an estimate of how all possible pairings of the programs will divide the viewing audience. The agency gives them each Table 2, whose -th entry is the percentage of the viewing audience that will watch network R if network R's program i is paired against network C's program j. What program should each network schedule in order to maximize its viewing audience? Table 2

Solution Subtract 50 from each entry in Table 2 to construct the following matrix:

This is the payoff matrix of the two-person zero-sum game in which each network is considered to start with 50% of the audience, and the -th entry of the matrix is the percentage of the viewing audience that network C loses to network R if programs i and j are paired against each other. It is easy to see that the entry is a saddle point of the payoff matrix. Hence, the optimal strategy of network R is to schedule program 2, and the optimal strategy of network C is to schedule program 3. This will result in network R's receiving 45% of the audience and network C's receiving 55% of the audience.

2 × 2 Matrix Games Another case in which the optimal strategies can be found by elementary means occurs when each player has only two possible moves. In this case, the payoff matrix is a matrix

If the game is strictly determined, at least one of the four entries of A is a saddle point, and the techniques discussed above can then be applied to determine optimal strategies for the two players. If the game is not strictly determined, we first compute the expected payoff for arbitrary strategies p and q: (9) Because (10) we may substitute

and

into 9 to obtain (11)

If we rearrange the terms in Equation 11, we can write (12) By examining the coefficient of the

term in 12, we see that if we set (13)

then that coefficient is zero, and 12 reduces to (14) Equation 14 is independent of q; that is, if player R chooses the strategy determined by 13, player C cannot change the expected payoff by varying his or her strategy. In a similar manner, it can be verified that if player C chooses the strategy determined by (15) then substituting in 12 gives (16) Equations 14 and 16 show that (17) for all strategies p and q. Thus, the strategies determined by 13, 15, and 10 are optimal strategies for players R and C, respectively, and so we have the following result.

THEOREM 10.7.2 Optimal Strategies for a 2 × 2 Matrix Game For a

game that is not strictly determined, optimal strategies for players R and C are

and

The value of the game is

In order to be complete, we must show that the entries in the vectors

and

are numbers strictly between 0 and 1. In

Exercise 8 we ask you to show that this is the case as long as the game is not strictly determined.

Equation 17 is interesting in that it implies that either player can force the expected payoff to be the value of the game by choosing his or her optimal strategy, regardless of which strategy the other player chooses. This is not true, in general, for games in which either player has more than two moves.

E X A M P L E 3 Using Theorem 10.7.2 The federal government desires to inoculate its citizens against a certain flu virus. The virus has two strains, and the proportions in which the two strains occur in the virus population is not known. Two vaccines have been developed and each citizen is given only one of them. Vaccine 1 is 85% effective against strain 1 and 70% effective against strain 2. Vaccine 2 is 60% effective against strain 1 and 90% effective against strain 2. What inoculation policy should the government adopt? Solution We can consider this a two-person game in which player R (the government) desires to make the payoff (the fraction of citizens resistant to the virus) as large as possible, and player C (the virus) desires to make the payoff as small as possible. The payoff matrix is

This matrix has no saddle points, so Theorem 10.7.2 is applicable. Consequently,

Thus, the optimal strategy for the government is to inoculate

of the citizens with vaccine 1 and

of the

citizens with vaccine 2. This will guarantee that about 76.7% of the citizens will be resistant to a virus attack regardless of the distribution of the two strains. In contrast, a virus distribution of

of strain 1 and

of strain 2 will result in the same 76.7% of resistant

citizens, regardless of the inoculation strategy adopted by the government (see Exercise 7).

Exercise Set 10.7 1. Suppose that a game has a payoff matrix

(a) If players R and C use strategies

respectively, what is the expected payoff of the game? (b) If player C keeps his strategy fixed as in part (a), what strategy should player R choose to maximize his expected payoff? (c) If player R keeps her strategy fixed as in part (a), what strategy should player C choose to minimize the expected payoff to player R? Answer: (a) (b) (c) 2. Construct a simple example to show that optimal strategies are not necessarily unique. For example, find a payoff matrix with several equal saddle points. Answer: Let

, for example.

3. For the strictly determined games with the following payoff matrices, find optimal strategies for the two players, and find the values of the games. (a) (b)

(c)

(d)

Answer: (a)

(b) (c)

(d)

4. For the games with the following payoff matrices, find optimal strategies for the two players, and find the values of the games. (a) (b) (c) (d) (e)

Answer: (a)

(b)

(c) (d)

(e)

5. Player R has two playing cards: a black ace and a red four. Player C also has two cards: a black two and a red three. Each player secretly selects one of his or her cards. If both selected cards are the same color, player C pays player R the sum of the face values in dollars. If the cards are different colors, player R pays player C the sum of the face values. What are optimal strategies for both players, and what is the value of the game? Answer:

6. Verify Equations 6, 7, and 8. 7. Verify the statement in the last paragraph of Example 3. 8. Show that the entries of the optimal strategies

and

given in Theorem 10.7.2 are numbers strictly between zero

and one.

Section 10.7 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. Consider a game between two players where each player can make up to n different moves . If the ith move of player R and the jth move of player C are such that is even, then C pays R $1. If is odd, then R pays C $1. Assume that both players have the same strategy—that is, and , where . Use a computer to show that

Using these results as a guide, prove in general that the expected payoff to player R is

which shows that in the long run, player R will not lose in this game. T2. Consider a game between two players where each player can make up to n different moves . If both players make the same move, then player C pays player R . However, if both players make different moves, then player R pays player C $1. Assume that both players have the same strategy—that is, and , where . Use a computer to show that

Using these results as a guide, prove in general that the expected payoff to player R is

which shows that in the long run, player R will not lose in this game.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.8 Leontief Economic Models In this section we discuss two linear models for economic systems. Some results about nonnegative matrices are applied to determine equilibrium price structures and outputs necessary to satisfy demand.

Prerequisites Linear Systems Matrices

Economic Systems Matrix theory has been very successful in describing the interrelations among prices, outputs, and demands in economic systems. In this section we discuss some simple models based on the ideas of Nobel laureate Wassily Leontief. We examine two different but related models: the closed or input-output model, and the open or production model. In each, we are given certain economic parameters that describe the interrelations between the “industries” in the economy under consideration. Using matrix theory, we then evaluate certain other parameters, such as prices or output levels, in order to satisfy a desired economic objective. We begin with the closed model.

Leontief Closed (Input-Output) Model First we present a simple example; then we proceed to the general theory of the model.

E X A M P L E 1 An Input-Output Model Three homeowners—a carpenter, an electrician, and a plumber—agree to make repairs in their three homes. They agree to work a total of 10 days each according to the following schedule:

For tax purposes, they must report and pay each other a reasonable daily wage, even for the work each does on his or her own home. Their normal daily wages are about $100, but they agree to adjust their respective daily wages so that each homeowner will come out even—that is, so that the total amount paid out by each is the same as the total amount each receives. We can set

To satisfy the “equilibrium” condition that each homeowner comes out even, we require that for each of the homeowners for the 10-day period. For example, the carpenter pays a total of

for the

repairs in his own home and receives a total income of for the repairs that he performs on all three homes. Equating these two expressions then gives the first of the following three equations:

The remaining two equations are the equilibrium equations for the electrician and the plumber. Dividing these equations by 10 and rewriting them in matrix form yields (1) Equation 1 can be rewritten as a homogeneous system by subtracting the left side from the right side to obtain

The solution of this homogeneous system is found to be (verify)

where s is an arbitrary constant. This constant is a scale factor, which the homeowners may choose for their convenience. For example, they can set so that the corresponding daily wages—$93, $96, and $108—are about $100.

This example illustrates the salient features of the Leontief input-output model of a closed economy. In the basic Equation 1, each column sum of the coefficient matrix is 1, corresponding to the fact that each of the homeowners' “output” of labor is completely distributed among these same homeowners in the proportions given by the entries in the column. Our problem is to determine suitable “prices” for these outputs so as to put the system in equilibrium—that is, so that each homeowner's total expenditures equal his or her total income. In the general model we have an economic system consisting of a finite number of “industries,” which we number as industries . Over some fixed period of time, each industry produces an “output” of some good or service that is completely utilized in a predetermined manner by the k industries. An important problem is to find suitable “prices” to be charged for these k outputs so that for each industry, total expenditures equal total income. Such a price structure represents an equilibrium position for the economy. For the fixed time period in question, let us set

for

. By definition, we have

With these quantities, we form the price vector

and the exchange matrix or input-output matrix

Condition (iii) expresses the fact that all the column sums of the exchange matrix are 1. As in the example, in order that the expenditures of each industry be equal to its income, the following matrix equation must be satisfied [see 1]: (2) or (3) Equation 3 is a homogeneous linear system for the price vector p. It will have a nontrivial solution if and only if the determinant of its coefficient matrix is zero. In Exercise 7 we ask you to show that this is the case for any exchange matrix E. Thus, 3 always has nontrivial solutions for the price vector p. Actually, for our economic model to make sense, we need more than just the fact that 3 has nontrivial solutions for p. We also need the prices of the k outputs to be nonnegative numbers. We express this condition as . (In general, if A is any vector or matrix, the notation means that every entry of A is nonnegative, and the notation means that every entry of A is positive. Similarly, means , and means .) To show that 3 has a nontrivial solution for which is a bit more difficult than showing merely that some nontrivial solution exists. But it is true, and we state this fact without proof in the following theorem.

THEOREM 10.8.1 If E is an exchange matrix, then

always has a nontrivial solution p whose entries are nonnegative.

Let us consider a few simple examples of this theorem.

E X A M P L E 2 Using Theorem 10.8.1 Let

Then

is

which has the general solution

where s is an arbitrary constant. We then have nontrivial solutions

E X A M P L E 3 Using Theorem 10.8.1

for any

.

Let

Then

has the general solution

where s and t are independent arbitrary constants. Nontrivial solutions both zero.

then result from any

and

, not

Example 2 indicates that in some situations one of the prices must be zero in order to satisfy the equilibrium condition. Example 3 indicates that there may be several linearly independent price structures available. Neither of these situations describes a truly interdependent economic structure. The following theorem gives sufficient conditions for both cases to be excluded.

THEOREM 10.8.2 Let E be an exchange matrix such that for some positive integer m all the entries of are positive. Then there is exactly one linearly independent solution of and it may be chosen so that all its entries are positive.

We will not give a proof of this theorem. If you have read Section 10.5 on Markov chains, observe that this theorem is essentially the same as Theorem 10.5.4. What we are calling exchange matrices in this section were called stochastic or Markov matrices in Section 10.5.

E X A M P L E 4 Using Theorem 10.8.2 The exchange matrix in Example 1 was

Because , the condition in Theorem 10.8.2 is satisfied for . Consequently, we are guaranteed that there is exactly one linearly independent solution of , and it can be chosen so that . In that example, we found that

is such a solution.

Leontief Open (Production) Model In contrast with the closed model, in which the outputs of k industries are distributed only among themselves, the open model attempts to satisfy an outside demand for the outputs. Portions of these outputs can still be distributed among the industries themselves, to keep them operating, but there is to be some excess, some net production, with which to satisfy the outside demand. In the closed model the outputs of the industries are fixed, and our objective is to determine prices for these outputs so that the equilibrium condition, that expenditures equal incomes, is satisfied. In the open model it is the prices that are fixed, and our objective is to determine levels of the outputs of the industries needed to satisfy the outside demand. We will measure the levels of the outputs in terms of their economic values using the fixed prices. To be precise, over some fixed period of time, let

With these quantities, we define the production vector

the demand vector

and the consumption matrix

By their nature, we have that From the definition of

and

, it can be seen that the quantity

is the value of the output of the ith industry needed by all k industries to produce a total output specified by the production vector x. Because this quantity is simply the ith entry of the column vector , we can say further that the ith entry of the column vector is the value of the excess output of the ith industry available to satisfy the outside demand. The value of the outside demand for the output of the ith industry is the ith entry of the demand vector d. Consequently, we are led to the following equation or (4) for the demand to be exactly met, without any surpluses or shortages. Thus, given C and d, our objective is to find a production vector that satisfies Equation 4.

E X A M P L E 5 Production Vector for a Town A town has three main industries: a coal-mining operation, an electric power-generating plant, and a local railroad. To mine $1 of coal, the mining operation must purchase $.25 of electricity to run its equipment and $.25 of transportation for its shipping needs. To produce $1 of electricity, the generating plant requires $.65 of coal for fuel, $.05 of its own electricity to run auxiliary equipment, and $.05 of transportation. To provide $1 of transportation, the railroad requires $.55 of coal for fuel and $.10 of electricity for its auxiliary equipment. In a certain week the coal-mining operation receives orders for $50,000 of coal from outside the town, and the generating plant receives orders for $25,000 of electricity from outside. There is no outside demand for the local railroad. How much must each of the three industries produce in that week to exactly satisfy their own demand and the outside demand? Solution For the one-week period let

From the information supplied, the consumption matrix of the system is

The linear system

is then

The coefficient matrix on the left is invertible, and the solution is given by

Thus, the total output of the coal-mining operation should be $102,087, the total output of the power-generating plant should be $56,163, and the total output of the railroad should be $28,330.

Let us reconsider Equation 4: If the square matrix

is invertible, we can write (5)

In addition, if the matrix

has only nonnegative entries, then we are guaranteed that for any

, Equation 5 has a unique

nonnegative solution for x. This is a particularly desirable situation, as it means that any outside demand can be met. The terminology used to describe this case is given in the following definition.

DEFINITION 1 A consumption matrix C is said to be productive if

exists and

We will now consider some simple criteria that guarantee that a consumption matrix is productive. The first is given in the following theorem.

THEOREM 10.8.3 Productive Consumption Matrix A consumption matrix C is productive if and only if there is some production vector

(The proof is outlined in Exercise 9.) The condition industry produces more than it consumes.

such that

.

means that there is some production schedule possible such that each

Theorem 10.8.3 has two interesting corollaries. Suppose that all the row sums of C are less than 1. If

then is a column vector whose entries are these row sums. Therefore, Thus, we arrive at the following corollary:

, and the condition of Theorem 10.8.3 is satisfied.

COROLLARY 10.8.4 A consumption matrix is productive if each of its row sums is less than 1.

As we ask you to show in Exercise 8, this corollary leads to the following:

COROLLARY 10.8.5 A consumption matrix is productive if each of its column sums is less than 1.

Recalling the definition of the entries of the consumption matrix C, we see that the jth column sum of C is the total value of the outputs of all k industries needed to produce one unit of value of output of the jth industry. The jth industry is thus said to be profitable if that jth column sum is less than 1. In other words, Corollary 10.8.5 says that a consumption matrix is productive if all k industries in the economic system are profitable.

E X A M P L E 6 Using Corollary 10.8.5 The consumption matrix in Example 5 was

All three column sums in this matrix are less than 1, so all three industries are profitable. Consequently, by Corollary 10.8.5, the consumption matrix C is productive. This can also be seen in the calculations in Example 5, as is nonnegative.

Exercise Set 10.8 1. For the following exchange matrices, find nonnegative price vectors that satisfy the equilibrium condition 3. (a)

(b)

(c)

Answer:

(a) (b)

(c)

2. Using Theorem 10.8.3 and its corollaries, show that each of the following consumption matrices is productive. (a) (b)

(c)

Answer: (a) Use Corollary 10.8.4; all row sums are less than one. (b) Use Corollary 10.8.5; all column sums are less than one. (c) Use Theorem 10.8.3, with

.

3. Using Theorem 10.8.2, show that there is only one linearly independent price vector for the closed economic system with exchange matrix

Answer: has all positive entries. 4. Three neighbors have backyard vegetable gardens. Neighbor A grows tomatoes, neighbor B grows corn, and neighbor C grows lettuce. They agree to divide their crops among themselves as follows: A gets of the tomatoes, of the corn, and of the lettuce. B gets

of the tomatoes,

of the corn, and

of the lettuce. C gets

of the tomatoes,

of the corn,

of the lettuce.

What prices should the neighbors assign to their respective crops if the equilibrium condition of a closed economy is to be satisfied, and if the lowest-priced crop is to have a price of $100? Answer: Price of tomatoes, $120.00; price of corn, $100.00; price of lettuce, $106.67 5. Three engineers—a civil engineer (CE), an electrical engineer (EE), and a mechanical engineer (ME)—each have a consulting firm. The consulting they do is of a multidisciplinary nature, so they buy a portion of each others' services. For each $1 of consulting the CE does, she buys $.10 of the EE's services and $.30 of the ME's services. For each $1 of consulting the EE does, she buys $.20 of the CE's services and $.40 of the ME's services. And for each $1 of consulting the ME does, she buys $.30 of the CE's services and $.40 of the EE's services. In a certain week the CE receives outside consulting orders of $500, the EE receives outside consulting orders of $700, and the ME receives outside consulting orders of $600. What dollar amount of consulting does each engineer perform in that week? Answer:

$1256 for the CE, $1448 for the EE, $1556 for the ME 6. (a) Suppose that the demand for the output of the ith industry increases by one unit. Explain why the ith column of the matrix is the increase that must be made to the production vector x to satisfy this additional demand. (b) Referring to Example 5, use the result in part (a) to determine the increase in the value of the output of the coal-mining operation needed to satisfy a demand of one additional unit in the value of the output of the power-generating plant. Answer: (b) 7. Using the fact that the column sums of an exchange matrix E are all 1, show that the column sums of show that has zero determinant, and so has nontrivial solutions for p.

are zero. From this,

8. Show that Corollary 10.8.5 follows from Corollary 10.8.4. [Hint: Use the fact that

for any invertible matrix A.]

9. (Calculus required) Prove Theorem 10.8.3 as follows: (a) Prove the “only if” part of the theorem; that is, show that if C is a productive consumption matrix, then there is a vector such that . (b) Prove the “if” part of the theorem as follows: Step 1 Show that if there is a vector

such that

Step 2 Show that there is a number λ such that Step 3 Show that Step 4 Show that

for as

, then and

. .

. .

Step 5 By multiplying out, show that for

.

Step 6 By letting exists and that Step 7 Show that

in Step 5, show that the matrix infinite sum . and that

.

Step 8 Show that C is a productive consumption matrix.

Section 10.8 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. Consider a sequence of exchange matrices

where

and so on. Use a computer to show that

,

,

,

, and make the conjecture that although

is true,

is not true for . Next, use a computer to determine the vectors such that (for , 3, 4, 5, 6), and then see if you can discover a pattern that would allow you to compute easily from . Test your discovery by first constructing from

and then checking to see whether

.

T2. Consider an open production model having n industries with . In order to produce $1 of its own output, the jth industry must spend for the output of the ith industry (for all ), but the jth industry (for all ) spends nothing for its own output. Construct the consumption matrix , show that it is productive, and determine an expression for . In determining an expression for , use a computer to study the cases when , 3, 4, and 5; then make a conjecture and prove your conjecture to be true. [Hint: If (i.e., the matrix with every entry equal to 1), first show that and then express your value of

in terms of n,

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

, and

.]

10.9 Forest Management In this section we discuss a matrix model for the management of a forest where trees are grouped into classes according to height. The optimal sustainable yield of a periodic harvest is calculated when the trees of different height classes can have different economic values.

Prerequisites Matrix Operations

Optimal Sustainable Yield Our objective is to introduce a simplified model for the sustainable harvesting of a forest whose trees are classified by height. The height of a tree is assumed to determine its economic value when it is cut down and sold. Initially, there is a distribution of trees of various heights. The forest is then allowed to grow for a certain period of time, after which some of the trees of various heights are harvested. The trees left unharvested are to be of the same height configuration as the original forest, so that the harvest is sustainable. As we will see, there are many such sustainable harvesting procedures. We want to find one for which the total economic value of all the trees removed is as large as possible. This determines the optimal sustainable yield of the forest and is the largest yield that can be attained continually without depleting the forest.

The Model Suppose that a harvester has a forest of Douglas fir trees that are to be sold as Christmas trees year after year. Every December the harvester cuts down some of the trees to be sold. For each tree cut down, a seedling is planted in its place. In this way the total number of trees in the forest is always the same. (In this simplified model, we will not take into account trees that die between harvests. We assume that every seedling planted survives and grows until it is harvested.) In the marketplace, trees of different heights have different economic values. Suppose that there are n different price classes corresponding to certain height intervals, as shown in Table 1 and Figure 10.9.1. The first class consists of seedlings with heights in the interval $[0, h_1)$, and these seedlings are of no economic value. The nth class consists of trees with heights greater than or equal to $h_{n-1}$.

Figure 10.9.1 Table 1

Let $x_i$ be the number of trees within the ith class that remain after each harvest. We form a column vector with the numbers $x_1, x_2, \ldots, x_n$ and call it the nonharvest vector:
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

For a sustainable harvesting policy, the forest is to be returned after each harvest to the fixed configuration given by the nonharvest vector x. Part of our problem is to find those nonharvest vectors x for which sustainable harvesting is possible. Because the total number of trees in the forest is fixed, we can set
$$x_1 + x_2 + \cdots + x_n = s \qquad (1)$$
where s is predetermined by the amount of land available and the amount of space each tree requires. Referring to Figure 10.9.2, we have the following situation. The forest configuration is given by the vector x after each harvest. Between harvests the trees grow and produce a new forest configuration before each harvest. A certain number of trees are removed from each class at the harvest. Finally, a seedling is planted in place of each tree removed, to return the forest again to the configuration x.

Figure 10.9.2

Consider first the growth of the forest between harvests. During this period a tree in the ith class may grow and move up to a higher height class. Or its growth may be retarded for some reason, and it will remain in the same class. We consequently define the following growth parameters for $i = 1, 2, \ldots, n - 1$:
$$g_i = \text{fraction of trees in the } i\text{th class that grow into the } (i+1)\text{st class during a growth period}$$
For simplicity we assume that a tree can move at most one height class upward in one growth period. With this assumption, $1 - g_i$ is the fraction of trees in the ith class that remain in the ith class during a growth period. With these $n - 1$ growth parameters, we form the following $n \times n$ growth matrix:
$$G = \begin{bmatrix} 1-g_1 & 0 & 0 & \cdots & 0 & 0 \\ g_1 & 1-g_2 & 0 & \cdots & 0 & 0 \\ 0 & g_2 & 1-g_3 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1-g_{n-1} & 0 \\ 0 & 0 & 0 & \cdots & g_{n-1} & 1 \end{bmatrix} \qquad (2)$$

Because the entries of the vector x are the numbers of trees in the n classes before the growth period, you can verify that the entries of the vector
$$G\mathbf{x} \qquad (3)$$
are the numbers of trees in the n classes after the growth period. Suppose that during the harvest we remove $y_i$ trees from the ith class. We will call the column vector
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
the harvest vector. Thus, a total of $y_1 + y_2 + \cdots + y_n$ trees are removed at each harvest. This is also the total number of trees added to the first class (the new seedlings) after each harvest. If we define the following replacement matrix
$$R = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} \qquad (4)$$

then the column vector
$$R\mathbf{y} = \begin{bmatrix} y_1 + y_2 + \cdots + y_n \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (5)$$
specifies the configuration of trees planted after each harvest. At this point we are ready to write the following equation, which characterizes a sustainable harvesting policy: the configuration at the end of the growth period, minus the harvest, plus the replanted seedlings, must equal the original configuration. Mathematically,
$$G\mathbf{x} - \mathbf{y} + R\mathbf{y} = \mathbf{x}$$
This equation can be rewritten as
$$(I - R)\mathbf{y} = (G - I)\mathbf{x} \qquad (6)$$
or, more comprehensively, with the matrices written out entry by entry.

We will refer to Equation 6 as the sustainable harvesting condition. Any vectors x and y with nonnegative entries, and with $x_1 + x_2 + \cdots + x_n = s$, which satisfy this matrix equation, determine a sustainable harvesting policy for the forest. Note that if $y_1 > 0$, then the harvester is removing seedlings of no economic value and replacing them with new seedlings. Because there is no point in doing this, we assume that
$$y_1 = 0 \qquad (7)$$
With this assumption, it can be verified that 6 is the matrix form of the following set of equations:
$$\begin{aligned} y_2 + y_3 + \cdots + y_n &= g_1x_1 \\ y_2 &= g_1x_1 - g_2x_2 \\ y_3 &= g_2x_2 - g_3x_3 \\ &\;\;\vdots \\ y_{n-1} &= g_{n-2}x_{n-2} - g_{n-1}x_{n-1} \\ y_n &= g_{n-1}x_{n-1} \end{aligned} \qquad (8)$$
Note that the first equation in 8 is the sum of the remaining $n - 1$ equations. Because we must have $y_i \ge 0$ for $i = 2, 3, \ldots, n$, Equations 8 require that
$$g_1x_1 \ge g_2x_2 \ge \cdots \ge g_{n-1}x_{n-1} \ge 0 \qquad (9)$$
Conversely, if x is a column vector with nonnegative entries that satisfy Equation 9, then 7 and 8 define a column vector y with nonnegative entries. Furthermore, x and y then satisfy the sustainable harvesting condition 6. In other words, a necessary and sufficient condition for a nonnegative column vector x to determine a forest configuration that is capable of sustainable harvesting is that its entries satisfy 9.
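Equations 8 and 9 give a concrete recipe: any nonnegative x satisfying 9 determines its harvest vector y. The following sketch carries out that bookkeeping for a hypothetical four-class forest; the growth parameters and configuration are illustrative only.

```python
import numpy as np

def harvest_from_nonharvest(x, g):
    """Given a nonharvest vector x (length n) and growth parameters
    g = (g_1, ..., g_{n-1}), return the harvest vector y from Equations 8,
    or raise an error if x violates the sustainability condition 9."""
    x = np.asarray(x, dtype=float)
    g = np.asarray(g, dtype=float)
    n = len(x)
    y = np.zeros(n)                                  # y_1 = 0 by Equation 7
    y[1:n-1] = g[:-1] * x[:n-2] - g[1:] * x[1:n-1]   # y_i = g_{i-1}x_{i-1} - g_i x_i
    y[n-1] = g[-1] * x[n-2]                          # y_n = g_{n-1} x_{n-1}
    if (y < -1e-12).any():
        raise ValueError("x does not satisfy the sustainability condition (9)")
    return y

# Hypothetical 4-class forest.
g = [0.5, 0.4, 0.3]
x = [400, 300, 200, 100]       # after-harvest configuration
print(harvest_from_nonharvest(x, g))   # -> [ 0. 80. 60. 60.]
```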

Optimal Sustainable Yield Because we remove $y_i$ trees from the ith class and each tree in the ith class has an economic value of $p_i$, the total yield of the harvest, Yld, is given by
$$\text{Yld} = p_2y_2 + p_3y_3 + \cdots + p_ny_n \qquad (10)$$
Using 8, we may substitute for the $y_i$'s in 10 to obtain
$$\text{Yld} = p_2(g_1x_1 - g_2x_2) + p_3(g_2x_2 - g_3x_3) + \cdots + p_{n-1}(g_{n-2}x_{n-2} - g_{n-1}x_{n-1}) + p_ng_{n-1}x_{n-1} \qquad (11)$$

Problem Find nonnegative numbers

that maximize

subject to and

As formulated above, this problem belongs to the field of linear programming. However, we will illustrate the following result, without linear programming theory, by actually exhibiting a sustainable harvesting policy.

THEOREM 10.9.1 Optimal Sustainable Yield The optimal sustainable yield is achieved by harvesting all the trees from one particular height class and none of the trees from any other height class.

Let us first set The largest value of for will then be the optimal sustainable yield, and the corresponding value of k will be the class that should be completely harvested to attain the optimal sustainable yield. Because no class but the kth is harvested, we have (12) In addition, because all of the kth class is harvested, no trees are ever present in the height classes above the kth class. Thus, (13) Substituting 12 and 13 into the sustainable harvesting condition 8 gives

(14)

Equations 14 can also be written as

(15) from which it follows that

(16)

If we substitute Equations 13 and 16 into [which is Equation 1], we can solve for

and obtain (17)

For the yield

, we combine 10, 12, 15, and 17 to obtain

(18)

Equation 18 determines in terms of the known growth and economic parameters for any sustainable yield is found as follows.

. Thus, the optimal

THEOREM 10.9.2 Finding the Optimal Sustainable Yield The optimal sustainable yield is the largest value of

for

. The corresponding value of k is the number of the class that is completely harvested.

In Exercise 4 we ask you to show that the nonharvest vector x for the optimal sustainable yield is

(19)

Theorem 10.9.2 implies that it is not necessarily the highest-priced class of trees that should be totally cropped. The growth parameters must also be taken into account to determine the optimal sustainable yield.

E X A M P L E 1 Using Theorem 10.9.2

For a Scots pine forest in Scotland with a growth period of six years, the following growth matrix was found (see M. B. Usher, “A Matrix Approach to the Management of Renewable Resources, with Special Reference to Selection Forests,” Journal of Applied Ecology, vol. 3, 1966, pp. 355–367):

Suppose that the prices of trees in the five tallest height classes are Which class should be completely harvested to obtain the optimal sustainable yield, and what is that yield? Solution From matrix G we have that Equation 18 then gives

We see that is the largest of these five quantities, so from Theorem 10.9.2 the third class should be completely harvested every six years to maximize the sustainable yield. The corresponding optimal sustainable yield is $14.7s, where s is the total number of trees in the forest.

Exercise Set 10.9 1. A certain forest is divided into three height classes and has a growth matrix between harvests given by

If the price of trees in the second class is $30 and the price of trees in the third class is $50, which class should be completely harvested to attain the optimal sustainable yield? What is the optimal yield if there are 1000 trees in the forest?
Answer: The second class; $15,000
2. In Example 1, to what level must the price of trees in the fifth class rise so that the fifth class is the one to harvest completely in order to attain the optimal sustainable yield?
Answer: $223
3. In Example 1, what must the ratio of the prices be in order that the yields for the different classes all be the same? (In this case, any sustainable harvesting policy will produce the same optimal sustainable yield.)
Answer:
4. Derive Equation 19 for the nonharvest vector x corresponding to the optimal sustainable harvesting policy described in Theorem 10.9.2.
5. For the optimal sustainable harvesting policy described in Theorem 10.9.2, how many trees are removed from the forest during each harvest?
Answer:
6. If all the growth parameters in the growth matrix G are equal, what should the ratio of the prices be in order that any sustainable harvesting policy be an optimal sustainable harvesting policy? (See Exercise 3.)
Answer:

Section 10.9 Technology Exercises
The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. A particular forest has growth parameters given by … for …, where n (the total number of height classes) can be chosen as large as needed. Suppose that the value of a tree in the kth height interval is given by …, where a is a constant (in dollars) and ρ is a parameter satisfying ….
(a) Show that the yield is given by ….
(b) For …, use a computer to determine the class number that should be completely harvested, and determine the optimal sustainable yield in each case. Make sure that you allow k to take on only integer values in your calculations.
(c) Repeat the calculations in part (b) using ….
(d) Show that if …, then the optimal sustainable yield can never be larger than 2as.
(e) Compare the values of k determined in parts (b) and (c) to …, and use some calculus to explain why.

T2. A particular forest has growth parameters given by … for …, where n (the total number of height classes) can be chosen as large as needed. Suppose that the value of a tree in the kth height interval is given by …, where a is a constant (in dollars) and ρ is a parameter satisfying ….
(a) Show that the yield is given by ….
(b) For …, use a computer to determine the class number that should be completely harvested in order to obtain an optimal yield, and determine the optimal sustainable yield in each case. Make sure that you allow k to take on only integer values in your calculations.
(c) Compare the values of k determined in part (b) to …, and use some calculus to explain why.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.10 Computer Graphics In this section we assume that a view of a three-dimensional object is displayed on a video screen and show how matrix algebra can be used to obtain new views of the object by rotation, translation, and scaling.

Prerequisites Matrix Algebra Analytic Geometry

Visualization of a Three-Dimensional Object Suppose that we want to visualize a three-dimensional object by displaying various views of it on a video screen. The object we have in mind to display is to be determined by a finite number of straight line segments. As an example, consider the truncated right pyramid with hexagonal base illustrated in Figure 10.10.1. We first introduce an xyz-coordinate system in which to embed the object. As in Figure 10.10.1, we orient the coordinate system so that its origin is at the center of the video screen and the xy-plane coincides with the plane of the screen. Consequently, an observer will see only the projection of the view of the three-dimensional object onto the two-dimensional xy-plane.

Figure 10.10.1

In the xyz-coordinate system, the endpoints of the straight line segments that determine the view of the object will have certain coordinates. These coordinates, together with a specification of which pairs are to be connected by straight line segments, are to be stored in the memory of the video display system. For example, assume that the 12 vertices of the truncated pyramid in Figure 10.10.1 have the following coordinates (the screen is 4 units wide by 3 units high):

These 12 vertices are connected pairwise by 18 straight line segments as follows, where each listed pair denotes that the corresponding two points are connected:

In View 1 these 18 straight line segments are shown as they would appear on the video screen. It should be noticed that only the x- and y-coordinates of the vertices are needed by the video display system to draw the view, because only the projection of the object onto the xy-plane is displayed. However, we must keep track of the z-coordinates to carry out certain transformations discussed later.

View 1

We now show how to form new views of the object by scaling, translating, or rotating the initial view. We first construct a matrix P, referred to as the coordinate matrix of the view, whose columns are the coordinates of the n points of a view:

For example, the coordinate matrix P corresponding to View 1 is the 3 × 12 matrix

We will show below how to transform the coordinate matrix P of a view to a new coordinate matrix P' corresponding to a new view of the object. The straight line segments connecting the various points move with the points as they are transformed. In this way, each view is uniquely determined by its coordinate matrix once we have specified which pairs of points in the original view are to be connected by straight lines.

Scaling The first type of transformation we consider consists of scaling a view along the x, y, and z directions by factors of α, β, and γ, respectively. By this we mean that if a point has coordinates (x, y, z) in the original view, it is to move to a new point with coordinates (αx, βy, γz) in the new view. This has the effect of transforming a unit cube in the original view to a rectangular parallelepiped of dimensions α × β × γ (Figure 10.10.2). Mathematically, this may be accomplished with matrix multiplication as follows. Define the diagonal matrix whose diagonal entries are α, β, and γ.

Then, if a point in the original view is represented by the column vector of its three coordinates, the transformed point is represented by the product of this diagonal matrix with that column vector. Using the coordinate matrix P, which contains the coordinates of all n points of the original view as its columns, we can transform these n points simultaneously, by multiplying P on the left by the diagonal matrix, to produce the coordinate matrix P' of the scaled view, as follows:

The new coordinate matrix P' can then be entered into the video display system to produce the new view of the object. As an example, View 2 is View 1 scaled by particular choices of α, β, and γ (indicated with the figure). Note that the scaling along the z-axis is not visible in View 2, since we see only the projection of the object onto the xy-plane.
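As a quick illustration of the scaling step, here is a minimal sketch in Python; it is not from the text, and the coordinate matrix P and the scale factors below are made-up values rather than the View 1 data.

```python
import numpy as np

# A minimal sketch (not from the text): scaling a coordinate matrix P whose
# columns are the (x, y, z) coordinates of the points of a view. The small
# 3 x 4 matrix below is an assumed toy example, not the book's View 1 data.
P = np.array([[0.0, 1.0, 1.0, 0.0],    # x-coordinates
              [0.0, 0.0, 1.0, 1.0],    # y-coordinates
              [0.0, 0.0, 0.0, 0.0]])   # z-coordinates

alpha, beta, gamma = 1.5, 0.5, 1.0      # assumed scale factors
S = np.diag([alpha, beta, gamma])       # diagonal scaling matrix

P_scaled = S @ P                        # every column (point) is scaled at once
print(P_scaled)
```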

Figure 10.10.2

View 2 View 1 scaled by the factors α, β, and γ

Translation We next consider the transformation of translating or displacing an object to a new position on the screen. Referring to Figure 10.10.3, suppose we desire to change an existing view so that each point moves to a new point obtained by adding a fixed vector to its coordinates. That fixed vector is called the translation vector of the transformation. By defining a 3 × n matrix T, each of whose n columns is the translation vector, we can translate all n points of the view determined by the coordinate matrix P by matrix addition via the equation P' = P + T. The coordinate matrix P' then specifies the new coordinates of the n points. For example, if we wish to translate View 1 according to the translation vector given with View 3, the result is View 3. Note, again, that the translation along the z-axis does not show up explicitly in View 3.

View 3 View 1 translated by

Figure 10.10.3 In Exercise 7, a technique of performing translations by matrix multiplication rather than by matrix addition is explained.
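The translation P' = P + T described above can be sketched the same way; the translation vector below is an assumed value, not the one used for View 3. (One common way to perform translations by matrix multiplication instead is to append a fourth coordinate equal to 1 to each point; whether that is exactly the construction intended in Exercise 7 should be checked against the exercise itself.)

```python
import numpy as np

# A minimal sketch (not from the text): translating every point of a view by
# matrix addition, as in the equation P' = P + T described above. P is the
# assumed toy coordinate matrix from the previous sketch.
P = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])

v = np.array([1.2, -0.4, 2.0])            # assumed translation vector
T = np.tile(v.reshape(3, 1), P.shape[1])  # 3 x n matrix whose columns all equal v

P_translated = P + T
print(P_translated)
```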

Rotation A more complicated type of transformation is a rotation of a view about one of the three coordinate axes. We begin with a rotation about the z-axis (the axis perpendicular to the screen) through an angle θ. Given a point in the original view with coordinates (x, y, z), we wish to compute the coordinates of the rotated point. Referring to Figure 10.10.4 and using a little trigonometry, you should be able to derive the following: the new x-coordinate is x cos θ − y sin θ, the new y-coordinate is x sin θ + y cos θ, and the z-coordinate is unchanged.

These equations can be written in matrix form. If we let R denote the matrix in this equation, all n points can be rotated by the matrix product P' = RP to yield the coordinate matrix of the rotated view.

Figure 10.10.4 Rotations about the x- and y-axes can be accomplished analogously, and the resulting rotation matrices are given with Views 4, 5, and 6. These three new views of the truncated pyramid correspond to rotations of View 1 about the x-, y-, and z-axes, respectively, each through an angle of .
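For reference, the sketch below builds the three standard rotation matrices about the coordinate axes and applies them as P' = RP. It is not taken from the text; the angle and coordinate matrix are assumed values, and the matrices shown with Views 4, 5, and 6 remain the authority if any sign convention differs.

```python
import numpy as np

# A minimal sketch (not from the text) of the standard rotation matrices about
# the coordinate axes, applied to a coordinate matrix P as P' = R P.
def Rx(t):
    return np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t),  np.cos(t)]])

def Ry(t):
    return np.array([[ np.cos(t), 0, np.sin(t)],
                     [0, 1, 0],
                     [-np.sin(t), 0, np.cos(t)]])

def Rz(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

theta = np.radians(90)                   # assumed rotation angle
P = np.array([[0.0, 1.0, 1.0, 0.0],      # assumed toy coordinate matrix
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])

P_rot = Rz(theta) @ P                    # rotate the whole view about the z-axis

# An oblique view as in View 7: rotate about x, then y, then z.
# (The matrix applied first appears rightmost in the product.)
R = Rz(theta) @ Ry(theta) @ Rx(theta)
P_oblique = R @ P
```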

View 4 View 1 rotated about the x-axis.
View 5 View 1 rotated about the y-axis.
View 6 View 1 rotated about the z-axis.

Rotations about the three coordinate axes may be combined to give oblique views of an object. For example, View 7 is View 1 rotated first about the x-axis, then about the y-axis, and finally about the z-axis, each through a specified angle. Mathematically, these three successive rotations can be embodied in the single transformation equation P' = RP, where R is the product of the three individual rotation matrices taken in the order determined by the order of the rotations:

View 7 Oblique view of truncated pyramid. As a final illustration, in View 8 we have two separate views of the truncated pyramid, which constitute a stereoscopic pair. They were produced by first rotating View 7 about the y-axis through an angle of and translating it to the right, then rotating the same View 7 about the y-axis through an angle of and translating it to the left. The translation distances were chosen so that the stereoscopic views are about inches apart—the approximate distance between a pair of eyes.

View 8 Stereoscopic figure of truncated pyramid. The three-dimensionality of the diagram can be seen by holding the book about one foot away and focusing on a distant object. Then by shifting your gaze to View 8 without refocusing, you can make the two views of the stereoscopic pair merge together and produce the desired effect.

Exercise Set 10.10

1. View 9 is a view of a square with four given vertices (see Ex-View 9).

(a) What is the coordinate matrix of View 9?
(b) What is the coordinate matrix of View 9 after it is scaled by a given factor in the x-direction and a given factor in the y-direction? Draw a sketch of the scaled view.
(c) What is the coordinate matrix of View 9 after it is translated by the given vector? Draw a sketch of the translated view.
(d) What is the coordinate matrix of View 9 after it is rotated through a given angle about the z-axis? Draw a sketch of the rotated view.

Ex-View 9 Square with the given vertices (Exercises 1 and 2)

Answer: (a)

(b)

(c)

(d)

2. (a) If the coordinate matrix of View 9 is multiplied by the matrix

the result is the coordinate matrix of View 10. Such a transformation is called a shear in the x-direction with factor with respect to the y-coordinate. Show that under such a transformation, a point with coordinates

has new coordinates

.

(b) What are the coordinates of the four vertices of the shear square in View 10?

(c) The matrix

determines a shear in the y-direction with factor .6 with respect to the x-coordinate (an example appears in View 11). Sketch a view of the square in View 9 after such a shearing transformation, and find the new coordinates of its four vertices.

Ex-View 10 View 9 sheared along the x-axis by

with respect to the y-coordinate (Exercise 2)

Ex-View 11 View 1 sheared along the y-axis by .6 with respect to the x-coordinate (Exercise 2). Answer: (b)

(c) 3. (a) The reflection about the xz-plane is defined as the transformation that takes a point to the point (e.g., View 12). If P and are the coordinate matrices of a view and its reflection about the xz-plane, respectively, find a matrix M such that . (b) Analogous to part (a), define the reflection about the yz-plane and construct the corresponding transformation matrix. Draw a sketch of View 1 reflected about the yz-plane. (c) Analogous to part (a), define the reflection about the xy-plane and construct the corresponding transformation matrix. Draw a sketch of View 1 reflected about the xy-plane.

Ex-View 12 View 1 reflected about the xz-plane (Exercise 3). Answer: (a)

(b)

(c)

4. (a) View 13 is View 1 subject to the following five transformations: 1. Scale by a factor of 2. Translate 3. Rotate 4. Rotate 5. Rotate

in the x-direction, 2 in the y-direction, and

in the z-direction.

unit in the x-direction. about the x-axis. about the y-axis. about the z-axis.

Construct the five matrices

,

,

,

(b) If P is the coordinate matrix of View 1 and of , , , , , and P.

, and

associated with these five transformations.

is the coordinate matrix of View 13, express

in terms

Ex-View 13 View 1 scaled, translated, and rotated (Exercise 4) Answer: (a)

(b) 5. (a) View 14 is View 1 subject to the following seven transformations: 1. Scale by a factor of .3 in the x-direction and by a factor of .5 in the y-direction. 2. Rotate 45° about the x-axis. 3. Translate 1 unit in the x-direction. 4. Rotate 35° about the y-axis. 5. Rotate −45° about the z-axis. 6. Translate 1 unit in the z-direction. 7. Scale by a factor of 2 in the x-direction. Construct the matrices

associated with these seven transformations.

(b) If P is the coordinate matrix of View 1 and of , and P.

is the coordinate matrix of View 14, express

in terms

Ex-View 14 View 1 scaled, translated, and rotated (Exercise 5). Answer: (a)

(b) 6. Suppose that a view with coordinate matrix P is to be rotated through an angle θ about an axis through the origin and specified by two angles α and β (see Figure Ex-6). If is the coordinate matrix of the rotated view, find rotation matrices , , , , and such that [Hint: The desired rotation can be accomplished in the following five steps: 1. Rotate through an angle of β about the y-axis. 2. Rotate through an angle of α about the z-axis. 3. Rotate through an angle of θ about the y-axis. 4. Rotate through an angle of −α about the z-axis. 5. Rotate through an angle of −β about the y-axis.]

Figure Ex-6 Answer:

7. This exercise illustrates a technique for translating a point with coordinates to a point with coordinates by matrix multiplication rather than matrix addition. (a) Let the point

be associated with the column vector

and let the point

Find a

be associated with the column vector

matrix M such that

(b) Find the specific to the point

.

matrix of the above form that will effect the translation of the point .

Answer: (a)

(b)

8. For the three rotation matrices given with Views 4, 5, and 6, show that the inverse of each matrix equals its transpose. (A matrix with this property is called an orthogonal matrix. See Section 7.1.)

Section 10.10 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. Let be a unit vector normal to the plane , and let can be shown that the mirror image of the vector r through the above plane has coordinates where

be a vector. It

with

(a) Show that and give a physical reason why this must be so. [Hint: Use the fact that unit vector to show that .] (b) Use a computer to show that

is a

.

(c) The eigenvectors of M satisfy the equation

and therefore correspond to those vectors whose direction is not affected by a reflection through the plane. Use a computer to determine the eigenvectors and eigenvalues of M, and then give a physical argument to support your answer. T2. A vector the rotated vector

with

is rotated by an angle θ about an axis having unit vector . It can be shown that

, thereby forming

(a) Use a computer to show that and then give a physical reason why this must be so. Depending on the sophistication of the computer you are using, you may have to experiment using different values of a, b, and

(b) Show also that

and give a physical reason why this must be so.

(c) Use a computer to show that

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

.

10.11 Equilibrium Temperature Distributions In this section we will see that the equilibrium temperature distribution within a trapezoidal plate can be found when the temperatures around the edges of the plate are specified. The problem is reduced to solving a system of linear equations. Also, an iterative technique for solving the problem and a “random walk” approach to the problem are described.

Prerequisites Linear Systems Matrices Intuitive Understanding of Limits

Boundary Data Suppose that the two faces of the thin trapezoidal plate shown in Figure 10.11.1a are insulated from heat. Suppose that we are also given the temperature along the four edges of the plate. For example, let the temperature be constant on each edge with values of , , , and , as in the figure. After a period of time, the temperature inside the plate will stabilize. Our objective in this section is to determine this equilibrium temperature distribution at the points inside the plate. As we will see, the interior equilibrium temperature is completely determined by the boundary data—that is, the temperature along the edges of the plate.

Figure 10.11.1 The equilibrium temperature distribution can be visualized by the use of curves that connect points of equal temperature. Such curves are called isotherms of the temperature distribution. In Figure 10.11.1b we have sketched a few isotherms, using information we derive later in the chapter.

Although all our calculations will be for the trapezoidal plate illustrated, our techniques generalize easily to a plate of any practical shape. They also generalize to the problem of finding the temperature within a three-dimensional body. In fact, our “plate” could be the cross section of some solid object if the flow of heat perpendicular to the cross section is negligible. For example, Figure 10.11.1 could represent the cross section of a long dam. The dam is exposed to three different temperatures: the temperature of the ground at its base, the temperature of the water on one side, and the temperature of the air on the other side. A knowledge of the temperature distribution inside the dam is necessary to determine the thermal stresses to which it is subjected. Next we will consider a certain thermodynamic principle that characterizes the temperature distribution we are seeking.

The Mean-Value Property There are many different ways to obtain a mathematical model for our problem. The approach we use is based on the following property of equilibrium temperature distributions.

THEOREM 10.11.1 The Mean-Value Property Let a plate be in thermal equilibrium and let P be a point inside the plate. Then if C is any circle with center at P that is completely contained in the plate, the temperature at P is the average value of the temperature on the circle (Figure 10.11.2).

Figure 10.11.2

This property is a consequence of certain basic laws of molecular motion, and we will not attempt to derive it. Basically, this property states that in equilibrium, thermal energy tends to distribute itself as evenly as possible consistent with the boundary conditions. It can be shown that the mean-value property uniquely determines the equilibrium temperature distribution of a plate. Unfortunately, determining the equilibrium temperature distribution from the mean-value property is not an easy matter. However, if we restrict ourselves to finding the temperature only at a finite set of points within the plate, the problem can be reduced to solving a linear system. We pursue this idea next.

Discrete Formulation of the Problem We can overlay our trapezoidal plate with a succession of finer and finer square nets or meshes (Figure 10.11.3). In (a) we have a rather coarse net; in (b) we have a net with half the spacing as in (a); and in (c) we have a net with the spacing again reduced by half. The points of intersection of the net lines are called mesh points. We classify them as boundary mesh points if they fall on the boundary of the plate or as interior mesh points if they lie in the interior of the plate. For the three net spacings we have chosen, there are 1, 9, and 49 interior mesh points, respectively.

Figure 10.11.3 In the discrete formulation of our problem, we try to find the temperature only at the interior mesh points of some particular net. For a rather fine net, as in (c), this will provide an excellent picture of the temperature distribution throughout the entire plate. At the boundary mesh points, the temperature is given by the boundary data. (In Figure 10.11.3 we have labeled all the boundary mesh points with their corresponding temperatures.) At the interior mesh points, we will apply the following discrete version of the mean-value property.

THEOREM 10.11.2 Discrete Mean-Value Property At each interior mesh point, the temperature is approximately the average of the temperatures at the four neighboring mesh points.

This discrete version is a reasonable approximation to the true mean-value property. But because it is only an approximation, it will provide only an approximation to the true temperatures at the interior mesh points. However, the approximations will get better as the mesh spacing decreases. In fact, as the mesh spacing approaches zero, the approximations approach the exact temperature distribution, a fact proved in advanced courses in numerical analysis. We will illustrate this convergence by computing the approximate temperatures at the mesh points for the three mesh spacings given in Figure 10.11.3. Case (a) of Figure 10.11.3 is simple, for there is only one interior mesh point. If we let

the temperature at this mesh point be the single unknown, the discrete mean-value property immediately gives its value as the average of the temperatures at the four neighboring boundary mesh points.

In case (b) we can label the temperatures at the nine interior mesh points , as in Figure 10.11.3b. (The particular ordering is not important.) By applying the discrete mean-value property successively to each of these nine mesh points, we obtain the following nine equations:

(1)

This is a system of nine linear equations in nine unknowns. We can rewrite it in matrix form as (2) where

To solve Equation 2, we collect the unknowns on one side and write it as (I − M)t = b. The solution for t is thus (3) t = (I − M)⁻¹b, as long as the matrix I − M is invertible. This is indeed the case, and the solution for t as calculated by 3 is

(4)

Figure 10.11.4 is a diagram of the plate with the nine interior mesh points labeled with their temperatures as given by this solution.

Figure 10.11.4

For case (c) of Figure 10.11.3, we repeat this same procedure. We label the temperatures at the 49 interior mesh points in some manner. For example, we may begin at the top of the plate and proceed from left to right along each row of mesh points. Applying the discrete mean-value property to each mesh point gives a system of 49 linear equations in 49 unknowns:

(5)

In matrix form, Equations 5 are t = Mt + b, where t and b are column vectors with 49 entries and M is a 49 × 49 matrix. As in 3, the solution for t is (6)

In Figure 10.11.5 we display the temperatures at the 49 mesh points found by Equation 6. The nine unshaded temperatures in this figure fall on the mesh points of Figure 10.11.4.

Figure 10.11.5 In Table 1 we compare the temperatures at these nine common mesh points for the three different mesh spacings used. Table 1

Knowing that the temperatures of the discrete problem approach the exact temperatures as the mesh spacing decreases, we may surmise that the nine temperatures obtained in case (c) are closer to the exact values than those in case (b).

A Numerical Technique To obtain the 49 temperatures in case (c) of Figure 10.11.3, it was necessary to solve a linear system with 49 unknowns. A finer net might involve a linear system with hundreds or even thousands of unknowns. Exact algorithms for the solutions of such large systems are impractical, and for this reason we now discuss a numerical technique for the practical solution of these systems. To describe this technique, we look again at Equation 2:

(7)

The vector t we are seeking appears on both sides of this equation. We consider a way of generating better and better approximations to the vector solution t. For the initial approximation we can take the zero vector if no better choice is available. If we substitute this initial approximation into the right side of 7 and label the resulting left side as the first iterate, we have (8). If we substitute the first iterate into the right side of 7, we generate another approximation, which we label the second iterate: (9)

Continuing in this way, we generate a sequence of approximations as follows:

(10)

One would hope that this sequence of approximations

converges to the exact solution of 7. We do

not have the space here to go into the theoretical considerations necessary to show this. Suffice it to say that for the particular problem we are considering, the sequence converges to the exact solution for any mesh size and for any initial approximation . This technique of generating successive approximations to the solution of 7 is a variation of a technique called Jacobi iteration; the approximations themselves are called iterates. As a numerical example, let us apply Jacobi iteration to the calculation of the nine mesh point temperatures of case (b). Setting , we have, from Equation 2,

Some additional iterates are

All iterates beginning with the thirtieth agree to four decimal places. Consequently, the thirtieth iterate is the exact solution to four decimal places. This agrees with our previous result given in Equation 4.

The Jacobi iteration scheme applied to the linear system 5 with 49 unknowns produces iterates that begin repeating to four decimal places after 119 iterations. Thus, the 119th iterate would provide the 49 temperatures of case (c) correct to four decimal places.
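The Jacobi iteration described above is straightforward to program. The following is a minimal sketch, not from the text: it applies the discrete mean-value property repeatedly on an assumed rectangular grid with assumed boundary temperatures, whereas the plate treated in this section is trapezoidal.

```python
import numpy as np

# A minimal sketch (not from the text) of the fixed-point iteration t <- M t + b:
# at each interior mesh point, replace the temperature by the average of its four
# neighbors. The rectangular grid and boundary values below are assumptions.
rows, cols = 6, 8
t = np.zeros((rows, cols))
t[0, :] = 100.0    # assumed boundary temperatures on the four edges
t[-1, :] = 0.0
t[:, 0] = 50.0
t[:, -1] = 50.0

for _ in range(500):                      # iterate until the values settle
    new_t = t.copy()
    new_t[1:-1, 1:-1] = 0.25 * (t[:-2, 1:-1] + t[2:, 1:-1] +
                                t[1:-1, :-2] + t[1:-1, 2:])
    if np.max(np.abs(new_t - t)) < 1e-6:  # stop when successive iterates agree
        break
    t = new_t

print(np.round(t, 2))                     # approximate equilibrium temperatures
```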

A Monte Carlo Technique In this section we describe a so-called Monte Carlo technique for computing the temperature at a single interior mesh point of the discrete problem without having to compute the temperatures at the remaining interior mesh points. First we define a discrete random walk along the net. By this we mean a directed path along the net lines (Figure 10.11.6) that joins a succession of mesh points such that the direction of departure from each mesh point is chosen at random. Each of the four possible directions of departure from each mesh point along the path is to be equally probable.

Figure 10.11.6 By the use of random walks, we can compute the temperature at a specified interior mesh point on the basis of the following property.

THEOREM 10.11.3 Random Walk Property

Let there be a succession of n random walks, all of which begin at a specified interior mesh point, and let the temperatures at the boundary mesh points first encountered along each of these random walks be recorded. Then the average value of these recorded boundary temperatures approaches the temperature at the specified interior mesh point as the number of random walks n increases without bound.

This property is a consequence of the discrete mean-value property that the mesh point temperatures satisfy. The proof of the random walk property involves elementary concepts from probability theory, and we will not give it here. In Table 2 we display the results of a large number of computer-generated random walks for the evaluation of the temperature of the nine-point mesh of case (b) in Figure 10.11.6. The first column lists the number n of the random walk. The second column lists the temperature of the boundary point first encountered along the corresponding random walk. The last column contains the cumulative average of the boundary temperatures encountered along the n random walks. Thus, after 1000 random walks we have the approximation . This compares with the exact value that we had previously evaluated. As can be seen, the convergence to the exact value is not too rapid. Table 2
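A sketch of the random walk estimate is given below. It is not from the text: the grid and boundary temperatures are the same assumed values used in the Jacobi sketch above, and the slow convergence noted in connection with Table 2 is easy to observe by varying the number of walks.

```python
import numpy as np

# A minimal sketch (not from the text) of the random walk property: estimate the
# temperature at one interior mesh point by averaging the boundary temperatures
# first reached by many random walks. Grid and boundary values are assumptions.
rng = np.random.default_rng(0)
rows, cols = 6, 8
boundary = np.zeros((rows, cols))
boundary[0, :] = 100.0
boundary[-1, :] = 0.0
boundary[:, 0] = 50.0
boundary[:, -1] = 50.0

def walk_from(r, c):
    """Random walk from interior point (r, c); return the first boundary temperature hit."""
    while 0 < r < rows - 1 and 0 < c < cols - 1:
        dr, dc = [(1, 0), (-1, 0), (0, 1), (0, -1)][rng.integers(4)]
        r, c = r + dr, c + dc
    return boundary[r, c]

samples = [walk_from(3, 4) for _ in range(10000)]
print(np.mean(samples))   # converges slowly to the equilibrium temperature at (3, 4)
```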

Exercise Set 10.11

1. A plate in the form of a circular disk has one constant boundary temperature on the left half of its circumference and another on the right half of its circumference. A net with four interior mesh points is overlaid on the disk (see Figure Ex-1).
(a) Using the discrete mean-value property, write the linear system that determines the approximate temperatures at the four interior mesh points.
(b) Solve the linear system in part (a).
(c) Use the Jacobi iteration scheme, starting with the given initial iterate, to generate the indicated iterates for the linear system in part (a). What is the “error vector” of the final iterate, where t is the solution found in part (b)?
(d) By certain advanced methods, it can be determined that the exact temperatures to four decimal places at the four mesh points are as given. What are the percentage errors in the values found in part (b)?

Figure Ex-1 Answer: (a)

(b)

(c)

(d) for

and

,

%; for

and

,

%

2. Use Theorem 10.11.1 to find the exact equilibrium temperature at the center of the disk in Exercise 1. Answer:

3. Calculate the first two iterates and for case (b) of Figure 10.11.3 with nine interior mesh points [Equation 2] when the initial iterate is chosen as

Answer:

4. The random walk illustrated in Figure Ex-4a can be described by six arrows that specify the directions of departure from the successive mesh points along the path. Figure Ex-4b is an array of 100 computer-generated, randomly oriented arrows arranged in a array. Use these arrows to determine random walks to approximate the temperature , as in Table 2. Proceed as follows: 1. Take the last two digits of your telephone number. Use the last digit to specify a row and the other to specify a column. 2. Go to the arrow in the array with that row and column number. 3. Using this arrow as a starting point, move through the array of arrows as you would read a book (left to right and top to bottom). Beginning at the point labeled in Figure Ex-4a and using this sequence of arrows to specify a sequence of directions, move from mesh point to mesh point until you reach a boundary mesh point. This completes your first random walk. Record the temperature at the boundary mesh point. (If you reach the end of the arrow array, continue with the arrow in the upper left corner.) 4. Return to the interior mesh point labeled and begin where you left off in the arrow array; generate your next random walk. Repeat this process until you have completed 10 random walks and have recorded 10 boundary temperatures. 5. Calculate the average of the 10 boundary temperatures recorded. (The exact value is

Figure Ex-4

.)

Section 10.11 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. Suppose that we have the square region described by and suppose that the equilibrium temperature distribution along the boundary is given by , and . Suppose next that this region is partitioned into an mesh using

for

and

,

. If the temperatures of the interior mesh points are labeled by

then show that

for

and

. To handle the boundary points, define

for

and

. Next let

be the matrix with the identity matrix in the upper right-hand corner, a one in the lower left-hand corner, and zeros everywhere else. For example,

and so on. By defining the

show that if

is the

matrix

matrix with entries

, then the set of equations

for

and

can be written as the matrix equation

where we consider only those elements of

with

and

.

T2. The results of the preceding exercise and the discussion in the text suggest the following algorithm for solving for the equilibrium temperature in the square region given the boundary conditions

1. Choose a value for n, and then choose an initial guess, say

2. For each value of

where entries in

compute

using

is as defined in Exercise T1 . Then adjust

by replacing all edge entries by the initial edge

. [Note: The edge entries of a matrix are the entries in the first and last columns and first and

last rows.] 3. Continue this process until

is approximately the zero matrix. This suggests that

Use a computer and this algorithm to solve for Choose

and compute up to

Use a computer to compute of in . T3. Using the exact solution program to do the following: (a) Plot the surface the square region.

given that

. The exact solution can be expressed as

for i,

, 1, 2, 3, 4, 5, 6, and then compare your results to the values

for the temperature distribution described in Exercise T2 , use a graphing in three-dimensional xyz-space in which z is the temperature at the point

(b) Plot several isotherms of the temperature distribution (curves in the xy-plane over which the temperature is a constant). (c) Plot several curves of the temperature as a function of x with y held constant.

in

(d) Plot several curves of the temperature as a function of y with x held constant.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.12 Computed Tomography In this section we will see how constructing a cross-sectional view of a human body by analyzing X-ray scans leads to an inconsistent linear system. We present an iteration technique that provides an “approximate solution” of the linear system.

Prerequisites Linear Systems Natural Logarithms Euclidean Space

The basic problem of computed tomography is to construct an image of a cross section of the human body using data collected from many individual beams of X rays that are passed through the cross section. These data are processed by a computer, and the computed cross section is displayed on a video monitor. Figure 10.12.1 is a diagram of General Electric's CT system showing a patient prepared to have a cross section of his head scanned by X-ray beams.

Figure 10.12.1 Such a system is also known as a CAT scanner, for Computer-Aided Tomography scanner. Figure 10.12.2 shows a typical cross section of a human head produced by the system.

Figure 10.12.2 The first commercial system of computed tomography for medical use was developed in 1971 by G. N. Hounsfield of EMI, Ltd., in England. In 1979, Hounsfield and A. M. Cormack were awarded the Nobel Prize for their pioneering work in the field. As we will see in this section, the construction of a cross section, or tomograph, requires the solution of a large linear system of equations. Certain algorithms, called algebraic reconstruction techniques (ARTs), can be used to solve these linear systems, whose solutions yield the cross sections in digital form.

Scanning Modes Unlike conventional X-ray pictures that are formed by X rays that are projected perpendicular to the plane of the picture, tomographs are constructed from thousands of individual, hairline-thin X-ray beams that lie in the plane of the cross section. After they pass through the cross section, the intensities of the X-ray beams are measured by an X-ray detector, and these measurements are relayed to a computer where they are

processed. Figures 10.12.3 and 10.12.4 illustrate two possible modes of scanning the cross section: the parallel mode and the fan-beam mode. In the parallel mode a single X-ray source and X-ray detector pair are translated across the field of view containing the cross section, and many measurements of the parallel beams are recorded. Then the source and detector pair are rotated through a small angle, and another set of measurements is taken. This is repeated until the desired number of beam measurements is completed. For example, in the original 1971 machine, 160 parallel measurements were taken through 180 angles spaced 1° apart: a total of 28,800 beam measurements. Each such scan took several minutes.

Figure 10.12.3

Figure 10.12.4 In the fan-beam mode of scanning, a single X-ray tube generates a fan of collimated beams whose intensities are measured simultaneously by an array of detectors on the other side of the field of view. The X-ray tube and detector array are rotated through many angles, and a set of measurements is taken at each angle until the scan is completed. In the General Electric CT system, which uses the fan-beam mode, each scan takes 1 second.

Derivation of Equations To see how the cross section is reconstructed from the many individual beam measurements, refer to Figure 10.12.5. Here the field of view in which the cross section is situated has been divided into many square pixels (picture elements) numbered 1 through N as indicated. It is our desire to determine the X-ray density of each pixel. In the EMI system, 6400 pixels were used, arranged in an 80 × 80 square array. The G.E. CT system uses 262,144 pixels in a 512 × 512 array, each pixel being about 1 mm on a side. After the densities of the pixels are determined by the method we will describe, they are reproduced on a video monitor, with each pixel shaded a level of gray proportional to its X-ray density. Because different tissues within the human body have different X-ray densities, the video display clearly distinguishes the various tissues and organs within the cross section.

Figure 10.12.5

Figure 10.12.6 shows a single pixel with an X-ray beam of roughly the same width as the pixel passing squarely through it. The photons constituting the X-ray beam are absorbed by the tissue within the pixel at a rate proportional to the X-ray density of the tissue. Quantitatively, the X-ray density of the jth pixel is defined as the natural logarithm of the ratio of the number of photons entering the pixel to the number of photons leaving it, where “ln” denotes the natural logarithmic function. Using the logarithm property ln(a/b) = ln a − ln b, we also have

Figure 10.12.6 If the X-ray beam passes through an entire row of pixels (Figure 10.12.7), then the number of photons leaving one pixel is equal to the number of photons entering the next pixel in the row. If the pixels are numbered , then the additive property of the logarithmic function gives

(1)

Thus, to determine the total X-ray density of a row of pixels, we simply sum the individual pixel densities.

Figure 10.12.7

Next, consider the X-ray beam in Figure 10.12.5. By the beam density of the ith beam of a scan we mean

(2)

The numerator in the first expression for the beam density is obtained by performing a calibration scan without the cross section in the field of view. The resulting detector measurements are stored within the computer's memory. Then a clinical scan is performed with the cross section in the field of view, the beam densities of all the beams constituting the scan are computed, and the values are stored for further processing. For each beam that passes squarely through a row of pixels, we must have

Thus, if the ith beam passes squarely through a row of n pixels, then it follows from Equations 1 and 2 that the beam density equals the sum of the n pixel densities in that row. In this equation, the beam density is known from the clinical and calibration measurements, and the pixel densities are unknowns that must be determined.

More generally, if the ith beam passes squarely through a row (or column) of pixels with specified numbers, then we have the analogous equation. If we set the coefficient of each pixel density equal to 1 when that pixel lies on the ith beam and equal to 0 otherwise,

then we can write this equation as (3) We will refer to Equation 3 as the ith beam equation. Referring to Figure 10.12.5, however, we see that the beams of a scan do not necessarily pass through a row or column of pixels squarely. Instead, a typical beam passes diagonally through each pixel in its path. There are many ways to take this into account. In Figure 10.12.8 we outline three methods of defining the quantities that appear in Equation 3, each of which reduces to our previous definition when the beam passes squarely through a row or column of pixels. Reading down the figure, each method is more exact than its predecessor, but with successively more computational difficulty.

Figure 10.12.8

Using any one of the three methods to define the coefficients in the ith beam equation, we can write the set of M beam equations in a complete scan as

(4)

In this way we have a linear system of M equations (the M beam equations) in N unknowns (the N pixel densities). Depending on the number of beams and pixels used, we can have M < N, M = N, or M > N. We will consider only the case M > N, the so-called overdetermined case, in which there are more beams in the scan than pixels in the field of view. Because of inherent modeling and experimental errors in the problem, we should not expect our linear system to have an exact mathematical solution for the pixel densities. In the next section we attempt to find an “approximate” solution to this linear system.
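Before turning to the ART approach used in this section, it is worth noting that a standard alternative for an overdetermined, inconsistent system is least squares, which is the subject of Technology Exercise T2 at the end of the section. The sketch below is not from the text and applies it to an assumed toy system only for comparison.

```python
import numpy as np

# A minimal sketch (not from the text): one standard way to get an "approximate"
# solution of an overdetermined, inconsistent system A x = b is least squares,
# which minimizes ||A x - b||. This is not the ART method developed below.
# The small system here is an assumed toy example.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0])

x_ls, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(x_ls)        # the point "closest" to all three lines in the least squares sense
```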

Algebraic Reconstruction Techniques There have been many mathematical algorithms devised to treat the overdetermined linear system 4. The one we will describe belongs to the class of so-called Algebraic Reconstruction Techniques (ARTs). This method, which can be traced to an iterative technique originally introduced by S. Kaczmarz in 1937, was the one used in the first commercial machine. To introduce this technique, consider the following system of three equations in two unknowns: (5) The three lines determined by these equations are plotted in the plane. As shown in Figure 10.12.9a, the three lines do not have a common intersection, and so the three equations do not have an exact solution. However, the points on the shaded triangle formed by the three lines are all situated “near” these three lines and can be thought of as constituting “approximate” solutions to our system. The following iterative procedure describes a geometric construction for generating points on the boundary of that triangular region (Figure 10.12.9b):

Algorithm 1

Step 0 Choose an arbitrary starting point in the plane.
Step 1 Project the starting point orthogonally onto the first line and call the projection the first point of the cycle. The superscript 1 indicates that this is the first of several cycles through the steps.
Step 2 Project the point from Step 1 orthogonally onto the second line and call the projection the second point of the cycle.
Step 3 Project the point from Step 2 orthogonally onto the third line and call the projection the third point of the cycle.
Step 4 Take the point from Step 3 as the new value of the starting point and cycle through Steps 1 through 3 again. In the second cycle, label the projected points

; in the third cycle, label the projected points

,

,

,

,

; and so forth.

This algorithm generates three sequences of points

that lie on the three lines , , and , respectively. It can be shown that as long as the three lines are not all parallel, then the first sequence converges to a point on , the second sequence converges to a point on , and the third sequence converges to a point on (Figure 10.12.9c). These three limit points form what is called the limit cycle of the iterative process. It can be shown that the limit cycle is independent of the starting point .

Figure 10.12.9 Next we discuss the specific formulas needed to effect the orthogonal projections in Algorithm 1. First, because the equation of a line in

-space is we can express it in vector form as where

The following theorem gives the necessary projection formula (Exercise 5).

THEOREM 10.12.1 Orthogonal Projection Formula Let L be a line in 2-space with equation a · x = b, and let x₀ be any point in 2-space (Figure 10.12.10). Then the orthogonal projection x* of x₀ onto L is given by

x* = x₀ + [(b − a · x₀)/(a · a)] a

Figure 10.12.10

E X A M P L E 1 Using Algorithm 1 We can use Algorithm 1 to find an approximate solution of the linear system given in 5 and illustrated in Figure 10.12.9. If we write the equations of the three lines as

where

then, using Theorem 10.12.1, we can express the iteration scheme in Algorithm 1 as

where the superscript is 1 for the first cycle of iterates, 2 for the second cycle of iterates, and so forth. After each cycle of iterates (i.e., after the third projection of the cycle is computed), the next cycle is begun with the starting point set equal to that last projection.

Table 1 gives the numerical results of six cycles of iterations starting with the given initial point.

Table 1

Using certain techniques that are impractical for large linear systems, we can show the exact values of the points of the limit cycle in this example to be

It can be seen that the sixth cycle of iterates provides an excellent approximation to the limit cycle. Any one of the three iterates of that cycle can be used as an approximate solution of the linear system. (The large discrepancies among them are due to the artificial nature of this illustrative example. In practical problems, these discrepancies would be much smaller.)
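The cycle of orthogonal projections in Algorithm 1 is also easy to program using the formula of Theorem 10.12.1. The sketch below is not from the text, and the three-line system in it is an assumed example rather than system 5; the same projection update, applied to hyperplanes in N-dimensional space, is exactly the step used in Algorithm 2 below.

```python
import numpy as np

# A minimal sketch (not from the text) of Algorithm 1: cycle through the lines
# a_i . x = b_i, each time replacing x by its orthogonal projection onto the
# current line, using the formula of Theorem 10.12.1. The three-line system
# here is an assumed toy example, not the system 5 of Example 1.
A = np.array([[1.0, 0.0],      # each row is a coefficient vector a_i
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0])  # right-hand sides b_i (no common intersection)

x = np.zeros(2)                # arbitrary starting point
for cycle in range(6):
    for a_i, b_i in zip(A, b):
        x = x + ((b_i - a_i @ x) / (a_i @ a_i)) * a_i   # orthogonal projection
    print(f"cycle {cycle + 1}: x = {x}")
```

After a few cycles the printed points settle into a limit cycle, as described above.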

To generalize Algorithm 1 so that it applies to an overdetermined system of M equations in N unknowns,

(6)

we introduce column vectors x and

as follows:

With these vectors, the M equations constituting our linear system 6 can be written in vector form as Each of these M equations defines what is called a hyperplane in the N-dimensional Euclidean space . In general these M hyperplanes have no common intersection, and so we seek instead some point in that is reasonably “close” to all of them. Such a point will constitute an approximate solution of the linear system, and its N entries will determine approximate pixel densities with which to form the desired cross section.

As in the two-dimensional case, we will introduce an iterative process that generates cycles of successive orthogonal projections onto the M hyperplanes beginning with some arbitrary initial point in . Our notation for these successive iterates is

The algorithm is as follows:

Algorithm 2 Step 0 Choose any point in

and label it

Step 1 For the first cycle of iterates, set Step 2 For

Step 3 Set

. .

, compute

.

Step 4 Increase the cycle number p by 1 and return to Step 2. In Step 2 the iterate

is called the orthogonal projection of

onto the hyperplane

. Consequently, as in the two-dimensional

case, this algorithm determines a sequence of orthogonal projections from one hyperplane onto the next in which we cycle back to the first hyperplane after each projection onto the last hyperplane. It can be shown that if the vectors point

,

span

, then the iterates

,

on that hyperplane which does not depend on the choice of the initial point

,

lying on the Mth hyperplane will converge to a . In computed tomography, one of the iterates

for p

sufficiently large is taken as an approximate solution of the linear system for the pixel densities. Note that for the center-of-pixel method, the scalar quantity

appearing in the equation in Step 2 of the algorithm is simply the number of

pixels in which the kth beam passes through the center. Similarly, note that the scalar quantity

in that same equation can be interpreted as the excess kth beam density that results if the pixel densities are set equal to the entries of

. This

provides the following interpretation of our ART iteration scheme for the center-of-pixel method: Generate the pixel densities of each iterate by distributing the excess beam density of successive beams in the scan evenly among those pixels in which the beam passes through the center. When the last beam in the scan has been reached, return to the first beam and continue.

E X A M P L E 2 Using Algorithm 2 We can use Algorithm 2 to find the unknown pixel densities of the 9 pixels arranged in the array illustrated in Figure 10.12.11. These 9 pixels are scanned using the parallel mode with 12 beams whose measured beam densities are indicated in the figure. We choose the center-of-pixel method to set up the 12 beam equations. (In Exercises 7 and 8, you are asked to set up the beam equations using the center line and area methods.) As you can verify, the beam equations are

Table 2 illustrates the results of the iteration scheme starting with an initial . The table gives the values of each of the first cycle of iterates, through , but thereafter gives the iterates only for various values of p. The iterates start repeating to two decimal places for

, and so we take the entries of

as approximate values of the 9 pixel densities.

Figure 10.12.11 Table 2

We close this section by noting that the field of computed tomography is presently a very active research area. In fact, the ART scheme discussed here has been replaced in commercial systems by more sophisticated techniques that are faster and provide a more accurate view of the cross section. However, all the new techniques address the same basic mathematical problem: finding a good approximate solution of a large overdetermined inconsistent linear system of equations.

Exercise Set 10.12 1. (a) Setting

, show that the three projection equations

for the three lines in Equation 5 can be written as

where

for

.

(b) Show that the three pairs of equations in part (a) can be combined to produce

where

. [Note: Using this pair of equations, we can perform one complete cycle of three orthogonal

projections in a single step.] (c) Because

as

tends to the limit point

. Solve this linear system for

as

, the equations in part (b) become

. [Note: The simplifications of the ART formulas described in this exercise are

impractical for the large linear systems that arise in realistic computed tomography problems.] Answer: (c) 2. Use the result of Exercise 1(b) to find (a) (b) (c) Answer: (a)

(b) Same as part (a)

to five decimal places in Example 1 using the following initial points:

(c)

3. (a) Show directly that the points of the limit cycle in Example 1,

form a triangle whose vertices lie on the lines

,

, and

and whose sides are perpendicular to these lines (Figure 10.12.9c).

(b) Using the equations derived in Exercise 1(a), show that if

, then

[Note: Either part of this exercise shows that successive orthogonal projections of any point on the limit cycle will move around the limit cycle indefinitely.] 4. The following three lines in the

-plane,

do not have a common intersection. Draw an accurate sketch of the three lines and graphically perform several cycles of the orthogonal projections described in Algorithm 1, beginning with the initial point . On the basis of your sketch, determine the three points of the limit cycle. Answer: ,

,

5. Prove Theorem 10.12.1 by verifying that (a) the point (b) the vector

as defined in the theorem lies on the line is orthogonal to the line

6. As stated in the text, the iterates

(i.e.,

(i.e.,

).

is parallel to a).

defined in Algorithm 2 will converge to a unique limit point

if the vectors

span . Show that if this is the case and if the center-of-pixel method is used, then the center of each of the N pixels in the field of view is crossed by at least one of the M beams in the scan. 7. Construct the 12 beam equations in Example 2 using the center line method. Assume that the distance between the center lines of adjacent beams is equal to the width of a single pixel. Answer:

8. Construct the 12 beam equations in Example 2 using the area method. Assume that the width of each beam is equal to the width of a single pixel and that the distance between the center lines of adjacent beams is also equal to the width of a single pixel. Answer:

Section 10.12 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. Given the set of equations for

(with

), let us consider the following algorithm for obtaining an approximate solution to the system.

1. Solve all possible pairs of equations for i,

and

for their unique solutions. This leads to

solutions, which we label as for i,

and

.

2. Construct the geometric center of these points defined by

and use this as the approximate solution to the original system.

Use this algorithm to approximate the solution to the system

and compare your results to those in this section. T2. (Calculus required) Given the set of equations for

(with

system. Given a point

If we define a function

), let us consider the following least squares algorithm for obtaining an approximate solution and the line

, the distance from this point to the line is given by

by

and then determine the point

that minimizes this function, we will determine the point that is closest to each of these lines in a

summed least squares sense. Show that

and

and

Apply this algorithm to the system

and compare your results to those in this section.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

are solutions to the system

to the

10.13 Fractals In this section we will use certain classes of linear transformations to describe and generate intricate sets in the Euclidean plane. These sets, called fractals, are currently the focus of much mathematical and scientific research.

Prerequisites Geometry of Linear Operators on

(Section 4.11)

Euclidean Space Natural Logarithms Intuitive Understanding of Limits

Fractals in the Euclidean Plane At the end of the nineteenth century and the beginning of the twentieth century, various bizarre and wild sets of points in the Euclidean plane began appearing in mathematics. Although they were initially mathematical curiosities, these sets, called fractals, are rapidly growing in importance. It is now recognized that they reveal a regularity in physical and biological phenomena previously dismissed as “random,” “noisy,” or “chaotic.” For example, fractals are all around us in the shapes of clouds, mountains, coastlines, trees, and ferns. In this section we give a brief description of certain types of fractals in the Euclidean plane . Much of this description is an outgrowth of the work of two mathematicians, Benoit B. Mandelbrot and Michael Barnsley, who are both active researchers in the field.

Self-Similar Sets To begin our study of fractals, we need to introduce some terminology about sets in . We will call a set in bounded if it can be enclosed by a suitably large circle (Figure 10.13.1) and closed if it contains all of its boundary points (Figure 10.13.2). Two sets in will be called congruent if they can be made to coincide exactly by translating and rotating them appropriately within (Figure 10.13.3). We will also rely on your intuitive concept of overlapping and nonoverlapping sets, as illustrated in Figure 10.13.4.

Figure 10.13.1

Figure 10.13.2 The boundary points (solid color) lie in the set.

Figure 10.13.3

Figure 10.13.4

If T is the linear operator that scales by a factor of s (see Table 7 of Section 4.9), and if Q is a set in the plane, then the set of images of points in Q under T is called a dilation of the set Q if s > 1 and a contraction of Q if 0 < s < 1 (Figure 10.13.5). In either case we say that the image is the set Q scaled by the factor s.

Figure 10.13.5 A contraction of Q. The types of fractals we will consider first are called self-similar. In general, we define a self-similar set in

as follows:

DEFINITION 1 A closed and bounded subset of the Euclidean plane

is said to be self-similar if it can be expressed in the form (1)

where

are nonoverlapping sets, each of which is congruent to S scaled by the same factor s

.

If S is a self-similar set, then 1 is sometimes called a decomposition of S into nonoverlapping congruent sets.

E X A M P L E 1 Line Segment A line segment in (Figure 10.13.6a) can be expressed as the union of two nonoverlapping congruent line segments (Figure 10.13.6b). In Figure 10.13.6b we have separated the two line segments slightly so that they can be seen more easily. Each of these two smaller line segments is congruent to the original line segment scaled by a factor of . Hence, a line segment is a self-similar set with and .

Figure 10.13.6

E X A M P L E 2 Square A square (Figure 10.13.7a) can be expressed as the union of four nonoverlapping congruent squares (Figure 10.13.7b), where we have again separated the smaller squares slightly. Each of the four smaller squares is congruent to the original square scaled by a factor of 1/2. Hence, a square is a self-similar set with k = 4 and s = 1/2.

Figure 10.13.7

E X A M P L E 3 Sierpinski Carpet The set suggested by Figure 10.13.8a, the Sierpinski “carpet,” was first described by the Polish mathematician Waclaw Sierpinski (1882–1969). It can be expressed as the union of eight nonoverlapping congruent subsets (Figure 10.13.8b), each of which is congruent to the original set scaled by a factor of 1/3. Hence, it is a self-similar set with k = 8 and s = 1/3. Note that the intricate square-within-a-square pattern continues forever on a smaller and smaller scale (although this can only be suggested in a figure such as the one shown).

Figure 10.13.8

E X A M P L E 4 Sierpinski Triangle Figure 10.13.9a illustrates another set described by Sierpinski. It is a self-similar set with k = 3 and s = 1/2 (Figure 10.13.9b). As with the Sierpinski carpet, the intricate triangle-within-a-triangle pattern continues forever on a smaller and smaller scale.

Figure 10.13.9

The Sierpinski carpet and triangle have a more intricate structure than the line segment and the square in that they exhibit a pattern that is repeated indefinitely. This difference will be explored later in this section.

Topological Dimension of a Set In Section 4.5 we defined the dimension of a subspace of a vector space to be the number of vectors in a basis, and we found that definition to coincide with our intuitive sense of dimension. For example, the origin of R² is zero-dimensional, lines through the origin are one-dimensional, and R² itself is two-dimensional. This definition of dimension is a special case of a more general concept called topological dimension, which is applicable to sets in R² that are not necessarily subspaces. A precise definition of this concept is studied in a branch of mathematics called topology. Although that definition is beyond the scope of this text, we can state informally that

• a point in R² has topological dimension zero;
• a curve in R² has topological dimension one;
• a region in R² has topological dimension two.

It can be proved that the topological dimension of a set in Rⁿ must be an integer between 0 and n, inclusive. In this text we will denote the topological dimension of a set S by d_T(S).

E X A M P L E 5 Topological Dimensions of Sets Table 1 gives the topological dimensions of the sets studied in our earlier examples. The first two results in this table are intuitively obvious; however, the last two are not. Informally stated, the Sierpinski carpet and triangle both contain so many “holes” that those sets resemble web-like networks of lines rather than regions. Hence they have topological dimension one. The proofs are quite difficult.

Table 1
Set                   Topological dimension d_T(S)
Line segment          1
Square                2
Sierpinski carpet     1
Sierpinski triangle   1

Hausdorff Dimension of a Self-Similar Set In 1919 the German mathematician Felix Hausdorff (1868–1942) gave an alternative definition for the dimension of an arbitrary set in R². His definition is quite complicated, but for a self-similar set, it reduces to something rather simple:

DEFINITION 2 The Hausdorff dimension of a self-similar set S of form (1) is denoted by d_H(S) and is defined by

$d_H(S) = \dfrac{\ln k}{\ln(1/s)} \qquad (2)$

In this definition, “ln” denotes the natural logarithm function. Equation (2) can also be expressed as

$s^{\,d_H(S)} = \dfrac{1}{k} \qquad (3)$

in which the Hausdorff dimension d_H(S) appears as an exponent. Formula (3) is more helpful for interpreting the concept of Hausdorff dimension; it states that if you scale a self-similar set by the factor s, then its measure decreases by a factor of $s^{\,d_H(S)}$. Thus, scaling a line segment by a factor of 1/2 reduces its measure (length) by a factor of 1/2, and scaling a square region by a factor of 1/2 reduces its measure (area) by a factor of 1/4.

Before proceeding to some examples, we should note a few facts about the Hausdorff dimension of a set:
• The topological dimension and Hausdorff dimension of a set need not be the same.
• The Hausdorff dimension of a set need not be an integer.
• The topological dimension of a set is less than or equal to its Hausdorff dimension; that is, d_T(S) ≤ d_H(S).

E X A M P L E 6 Hausdorff Dimensions of Sets Table 2 lists the Hausdorff dimensions of the sets studied in our earlier examples.

Table 2
Set                   k   s     Hausdorff dimension d_H(S)
Line segment          2   1/2   ln 2 / ln 2 = 1
Square                4   1/2   ln 4 / ln 2 = 2
Sierpinski carpet     8   1/3   ln 8 / ln 3 ≈ 1.8928
Sierpinski triangle   3   1/2   ln 3 / ln 2 ≈ 1.5850
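The entries of Table 2 follow directly from Formula (2). The short Python sketch below is an illustration only (not part of the text); it evaluates d_H = ln k / ln(1/s) for the four sets of Examples 1–4.

```python
from math import log

def hausdorff_dimension(k: int, s: float) -> float:
    """Hausdorff dimension of a self-similar set made of k nonoverlapping
    copies of itself, each scaled by the factor s (0 < s < 1)."""
    return log(k) / log(1 / s)

examples = {
    "line segment":        (2, 1/2),   # Example 1
    "square":              (4, 1/2),   # Example 2
    "Sierpinski carpet":   (8, 1/3),   # Example 3
    "Sierpinski triangle": (3, 1/2),   # Example 4
}

for name, (k, s) in examples.items():
    print(f"{name:20s} k={k}  s={s:.4f}  d_H={hausdorff_dimension(k, s):.4f}")
# 1.0, 2.0, ~1.8928, ~1.5850, in agreement with Table 2
```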

Fractals Comparing Tables 1 and 2, we see that the Hausdorff and topological dimensions are equal for both the line segment and square but are unequal for the Sierpinski carpet and triangle. In 1977 Benoit B. Mandelbrot suggested that sets for which the topological and Hausdorff dimensions differ must be quite complicated (as Hausdorff had earlier suggested in 1919). Mandelbrot proposed calling such sets fractals, and he offered the following definition.

DEFINITION 3 A fractal is a subset of a Euclidean space whose Hausdorff dimension and topological dimension are not equal.

According to this definition, the Sierpinski carpet and Sierpinski triangle are fractals, whereas the line segment and square are not. It follows from the preceding definition that a set whose Hausdorff dimension is not an integer must be a fractal (why?). However, we will see later that the converse is not true; that is, it is possible for a fractal to have an integer Hausdorff dimension.

Similitudes We will now show how some techniques from linear algebra can be used to generate fractals. This linear algebra approach also leads to algorithms that can be exploited to draw fractals on a computer. We begin with a definition.

DEFINITION 4 A similitude with scale factor s is a mapping of R² into R² of the form

$T\begin{bmatrix}x\\y\end{bmatrix}= s\begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}e\\f\end{bmatrix}$

where s, θ, e, and f are scalars.

Geometrically, a similitude is a composition of three simpler mappings: a scaling by a factor of s, a rotation about the origin through an angle θ, and a translation (e units in the x-direction and f units in the y-direction). Figure 10.13.10 illustrates the effect of a similitude on the unit square U.
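Definition 4 translates directly into code. The sketch below is an illustration only; the helper name `similitude` and the sample values (scale 1/2, rotation 0, as in the line-segment example that follows) are assumptions for demonstration.

```python
import math

def similitude(s, theta, e, f):
    """Return the map (x, y) -> s * R(theta) * (x, y) + (e, f) of Definition 4."""
    c, sn = math.cos(theta), math.sin(theta)
    def T(point):
        x, y = point
        return (s * (c * x - sn * y) + e,
                s * (sn * x + c * y) + f)
    return T

# Two contracting similitudes with s = 1/2 and no rotation:
T1 = similitude(0.5, 0.0, 0.0, 0.0)
T2 = similitude(0.5, 0.0, 0.5, 0.0)
print(T1((1.0, 0.0)), T2((1.0, 0.0)))   # (0.5, 0.0) (1.0, 0.0)
```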

Figure 10.13.10

For our application to fractals, we will need only similitudes that are contractions, by which we mean that the scale factor s is restricted to the range 0 < s < 1. Consequently, when we refer to similitudes we will always mean similitudes subject to this restriction.

Similitudes are important in the study of fractals because of the following fact: If T: R² → R² is a similitude with scale factor s and if S is a closed and bounded set in R², then the image T(S) of the set S under T is congruent to S scaled by s.

Recall from the definition of a self-similar set in R² that a closed and bounded set S in R² is self-similar if it can be expressed in the form

S = S₁ ∪ S₂ ∪ ⋯ ∪ Sₖ

where S₁, S₂, …, Sₖ are nonoverlapping sets, each of which is congruent to S scaled by the same factor s [see (1)]. In the following examples, we will find similitudes that produce the sets S₁, S₂, …, Sₖ from S for the line segment, square, Sierpinski carpet, and Sierpinski triangle.

E X A M P L E 7 Line Segment We will take as our line segment the line segment S connecting the points (0, 0) and (1, 0) in the xy-plane (Figure 10.13.11a). Consider the two similitudes

$T_1\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}x\\y\end{bmatrix},\qquad T_2\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}1/2\\0\end{bmatrix} \qquad (4)$

both of which have s = 1/2 and θ = 0. In Figure 10.13.11b we show how these two similitudes map the unit square U. The similitude T₁ maps U onto the smaller square U₁, and the similitude T₂ maps U onto the smaller square U₂. At the same time, T₁ maps the line segment S onto the smaller line segment S₁, and T₂ maps S onto the smaller nonoverlapping line segment S₂. The union of these two smaller nonoverlapping line segments is precisely the original line segment S; that is,

$S = S_1 \cup S_2 = T_1(S) \cup T_2(S) \qquad (5)$

Figure 10.13.11

E X A M P L E 8 Square Let us consider the unit square U in the xy-plane (Figure 10.13.12a) and the following four similitudes, all having s = 1/2 and θ = 0:

$T_1\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}x\\y\end{bmatrix},\quad T_2\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}1/2\\0\end{bmatrix},\quad T_3\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}0\\1/2\end{bmatrix},\quad T_4\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}1/2\\1/2\end{bmatrix} \qquad (6)$

The images of the unit square U under these four similitudes are the four squares shown in Figure 10.13.12b. Thus,

$U = T_1(U) \cup T_2(U) \cup T_3(U) \cup T_4(U) \qquad (7)$

is a decomposition of U into four nonoverlapping squares that are congruent to U scaled by the same scale factor s = 1/2.

Figure 10.13.12

E X A M P L E 9 Sierpinski Carpet Let us consider a Sierpinski carpet S over the unit square U of the xy-plane (Figure 10.13.13a) and the following eight similitudes, all having s = 1/3 and θ = 0:

$T_i\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{3}\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}e_i\\f_i\end{bmatrix},\qquad i = 1, 2, \ldots, 8 \qquad (8)$

where the eight values of (eᵢ, fᵢ) are

(0, 0), (1/3, 0), (2/3, 0), (0, 1/3), (2/3, 1/3), (0, 2/3), (1/3, 2/3), (2/3, 2/3)

The images of S under these eight similitudes are the eight sets shown in Figure 10.13.13b. Thus,

$S = T_1(S) \cup T_2(S) \cup \cdots \cup T_8(S) \qquad (9)$

is a decomposition of S into eight nonoverlapping sets that are congruent to S scaled by the same scale factor s = 1/3.

Figure 10.13.13

E X A M P L E 1 0 Sierpinski Triangle Let us consider a Sierpinski triangle S fitted inside the unit square U of the xy-plane, as shown in Figure 10.13.14a, and the following three similitudes, all having s = 1/2 and θ = 0:

$T_1\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}x\\y\end{bmatrix},\quad T_2\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}1/2\\0\end{bmatrix},\quad T_3\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}0\\1/2\end{bmatrix} \qquad (10)$

The images of S under these three similitudes are the three sets in Figure 10.13.14b. Thus,

$S = T_1(S) \cup T_2(S) \cup T_3(S) \qquad (11)$

is a decomposition of S into three nonoverlapping sets that are congruent to S scaled by the same scale factor s = 1/2.

Figure 10.13.14

In the preceding examples we started with a specific set S and showed that it was self-similar by finding similitudes T₁, T₂, …, Tₖ with the same scale factor such that T₁(S), T₂(S), …, Tₖ(S) were nonoverlapping sets and such that

$S = T_1(S) \cup T_2(S) \cup \cdots \cup T_k(S) \qquad (12)$

The following theorem addresses the converse problem of determining a self-similar set from a collection of similitudes.

THEOREM 10.13.1 If T₁, T₂, …, Tₖ are contracting similitudes with the same scale factor, then there is a unique nonempty closed and bounded set S in the Euclidean plane R² such that

S = T₁(S) ∪ T₂(S) ∪ ⋯ ∪ Tₖ(S)

Furthermore, if the sets T₁(S), T₂(S), …, Tₖ(S) are nonoverlapping, then S is self-similar.

Algorithms for Generating Fractals In general, there is no simple way to obtain the set S in the preceding theorem directly. We now describe an iterative procedure that will determine S from the similitudes that define it. We first give an example of the procedure and then give an algorithm for the general case.

E X A M P L E 11 Sierpinski Carpet Figure 10.13.15 shows the unit square region S₀ in the xy-plane, which will serve as an “initial” set for an iterative procedure for the construction of the Sierpinski carpet. The set S₁ in the figure is the result of mapping S₀ with each of the eight similitudes in (8) that determine the Sierpinski carpet. It consists of eight square regions, each of side length 1/3, surrounding an empty middle square. Next we apply the eight similitudes to S₁ and arrive at the set S₂. Similarly, applying the eight similitudes to S₂ results in the set S₃. If we continue this process indefinitely, the sequence of sets S₀, S₁, S₂, … will “converge” to a set S, which is the Sierpinski carpet.

Figure 10.13.15

Remark Although we should properly give a definition of what it means for a sequence of sets to “converge” to a given set, an intuitive interpretation will suffice in this introductory treatment. Although we started in Figure 10.13.15 with the unit square region to arrive at the Sierpinski carpet, we could have started with any nonempty set Q₀. The only restriction is that the set Q₀ be closed and bounded. For example, if we start with the particular set Q₀ shown in Figure 10.13.16, then Q₁ is the set obtained by applying each of the eight similitudes in (8) to Q₀. Applying the eight similitudes to Q₁ results in the set Q₂. As before, applying the eight similitudes indefinitely yields the Sierpinski carpet S as the limiting set.

Figure 10.13.16

The general algorithm illustrated in the preceding example is as follows: Let T₁, T₂, …, Tₖ be contracting similitudes with the same scale factor, and for an arbitrary set Q in R², define the set T(Q) by

T(Q) = T₁(Q) ∪ T₂(Q) ∪ ⋯ ∪ Tₖ(Q)

The following algorithm generates a sequence of sets Q₀, Q₁, Q₂, … that converges to the set S in Theorem 10.13.1.

Algorithm 1
Step 0 Choose an arbitrary nonempty closed and bounded set Q₀ in R².
Step 1 Compute Q₁ = T(Q₀).
Step 2 Compute Q₂ = T(Q₁).
Step 3 Compute Q₃ = T(Q₂).
⋮
Step n Compute Qₙ = T(Qₙ₋₁).
⋮
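On a computer, Algorithm 1 can be approximated by representing each set Qₙ by a finite list of points. The sketch below is an illustration only; the three scale-1/2 maps used here are one possible choice of similitudes generating a Sierpinski triangle.

```python
def algorithm_1(maps, Q0, n_steps):
    """Iterate the set mapping T(Q) = T1(Q) u ... u Tk(Q) (Algorithm 1),
    with each set represented as a finite list of points."""
    Q = list(Q0)
    for _ in range(n_steps):
        Q = [T(p) for T in maps for p in Q]   # Q_n = T(Q_{n-1})
    return Q

# Three contracting similitudes with s = 1/2 and no rotation
# (one possible choice producing a Sierpinski triangle):
maps = [
    lambda p: (0.5 * p[0],       0.5 * p[1]),
    lambda p: (0.5 * p[0] + 0.5, 0.5 * p[1]),
    lambda p: (0.5 * p[0],       0.5 * p[1] + 0.5),
]

# Starting from a single arbitrary point, the 3**n image points after
# n steps approximate the limiting set S of Theorem 10.13.1.
points = algorithm_1(maps, [(0.3, 0.7)], 7)
print(len(points), points[:3])
```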

E X A M P L E 1 2 Sierpinski Triangle Let us construct the Sierpinski triangle determined by the three similitudes given in (10). The corresponding set mapping is T(Q) = T₁(Q) ∪ T₂(Q) ∪ T₃(Q). Figure 10.13.17 shows an arbitrary closed and bounded set Q₀; the first four iterates Q₁, Q₂, Q₃, Q₄; and the limiting set S (the Sierpinski triangle).

Figure 10.13.17

E X A M P L E 1 3 Using Algorithm 1 Consider the following two similitudes:

$T_1\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix},\qquad T_2\begin{bmatrix}x\\y\end{bmatrix}=\frac{1}{2}\begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}1/2\\0\end{bmatrix}$

The actions of these two similitudes on the unit square U are illustrated in Figure 10.13.18. Here, the rotation angle θ is a parameter that we will vary to generate different self-similar sets. The self-similar sets determined by these two similitudes are shown in Figure 10.13.19 for various values of θ. For simplicity, we have not drawn the xy-axes, but in each case the origin is the lower left point of the set. These sets were generated on a computer using Algorithm 1 for the various values of θ. Because k = 2 and s = 1/2, it follows from (2) that the Hausdorff dimension of these sets for any value of θ is 1. It can be shown that the topological dimension of these sets is 1 for θ = 0 and 0 for all other values of θ. It follows that the self-similar set for θ = 0 is not a fractal [it is the straight line segment from (0, 0) to (1, 0)], while the self-similar sets for all other values of θ are fractals. In particular, they are examples of fractals with integer Hausdorff dimension.

Figure 10.13.18

Figure 10.13.19

A Monte Carlo Approach

The set-mapping approach of constructing self-similar sets described in Algorithm 1 is rather time-consuming on a computer because the similitudes involved must be applied to each of the many computer screen pixels in the successive iterated sets. In 1985 Michael Barnsley described an alternative, more practical method of generating a self-similar set defined through its similitudes. It is a so-called Monte Carlo method that takes advantage of probability theory. Barnsley refers to it as the Random Iteration Algorithm. Let T₁, T₂, …, Tₖ be contracting similitudes with the same scale factor. The following algorithm generates a sequence of points x₀, x₁, x₂, … that collectively converge to the set S in Theorem 10.13.1.

Algorithm 2
Step 0 Choose an arbitrary point x₀ in S.
Step 1 Choose one of the k similitudes at random, say T_{i₁}, and compute x₁ = T_{i₁}(x₀).
Step 2 Choose one of the k similitudes at random, say T_{i₂}, and compute x₂ = T_{i₂}(x₁).
⋮
Step n Choose one of the k similitudes at random, say T_{iₙ}, and compute xₙ = T_{iₙ}(xₙ₋₁).
⋮

On a computer screen the pixels corresponding to the points generated by this algorithm will fill out the pixel representation of the limiting set S. Figure 10.13.20 shows four stages of the Random Iteration Algorithm that generate the Sierpinski carpet, starting with the initial point x₀.

Remark Although Step 0 in the preceding algorithm requires the selection of an initial point in the set S, which may not be known in advance, this is not a serious problem. In practice, one can usually start with any point in R², and after a few iterations (say ten or so), the point generated will be sufficiently close to S that the algorithm will work correctly from that point on.

Figure 10.13.20
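Algorithm 2 is easy to program: at each step choose one of the k maps at random and record the point produced. The sketch below is an illustration only, assuming the eight scale-1/3 carpet similitudes have the translations listed with Formula (8).

```python
import random

def random_iteration(maps, x0, n_points, burn_in=10):
    """Barnsley's Random Iteration Algorithm (Algorithm 2): repeatedly apply
    a randomly chosen map and record the visited points."""
    x = x0
    points = []
    for i in range(n_points + burn_in):
        x = random.choice(maps)(x)
        if i >= burn_in:            # discard the first few iterates (see the Remark)
            points.append(x)
    return points

# Eight similitudes with s = 1/3 for the Sierpinski carpet: every
# third-square translation of the unit square except the middle one.
offsets = [(ex / 3, ey / 3) for ex in range(3) for ey in range(3) if (ex, ey) != (1, 1)]
maps = [lambda p, e=e: (p[0] / 3 + e[0], p[1] / 3 + e[1]) for e in offsets]

pts = random_iteration(maps, (0.5, 0.5), 20000)
print(len(pts), pts[:2])   # plotted as pixels, these points fill out the carpet
```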

More General Fractals So far, we have discussed fractals that are self-similar sets according to the definition of a self-similar set in R². However, Theorem 10.13.1 remains true if the similitudes are replaced by more general transformations, called contracting affine transformations. An affine transformation is defined as follows:

DEFINITION 5 An affine transformation is a mapping of R² into R² of the form

$T\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}+\begin{bmatrix}e\\f\end{bmatrix}$

where a, b, c, d, e, and f are scalars.

Figure 10.13.21 shows how an affine transformation maps the unit square U onto a parallelogram. An affine transformation is said to be contracting if the Euclidean distance between any two points in the plane is strictly decreased after the two points are mapped by the transformation. It can be shown that any k contracting affine transformations T₁, T₂, …, Tₖ determine a unique closed and bounded set S satisfying the equation

$S = T_1(S) \cup T_2(S) \cup \cdots \cup T_k(S) \qquad (13)$

Equation 13 has the same form as Equation 12, which we used to find self-similar sets. Although Equation 13, which uses contracting affine transformations, does not determine a self-similar set S, the set it does determine has many of the features of self-similar sets. For example, Figure 10.13.22 shows how a set in the plane resembling a fern (an example made famous by Barnsley) can be generated through four contracting affine transformations. Note that the middle fern is the slightly overlapping union of the four smaller affine-image ferns surrounding it. Note also how one of the four transformations, because the determinant of its matrix part is zero, maps the entire fern onto a small straight line segment (the stem of the fern). Figure 10.13.22 contains a wealth of information and should be studied carefully.

Figure 10.13.21

Figure 10.13.22

Michael Barnsley has applied the above theory to the field of data compression and transmission. The fern, for example, is completely determined by the four affine transformations T₁, T₂, T₃, T₄. These four transformations, in turn, are determined by the 24 numbers given in Figure 10.13.22 defining their corresponding values of a, b, c, d, e, and f. In other words, these 24 numbers completely encode the picture of the fern. Storing these 24 numbers in a computer requires considerably less memory space than storing a pixel-by-pixel description of the fern. In principle, any picture represented by a pixel map on a computer screen can be described through a finite number of affine transformations, although it is not easy to determine which transformations to use. Nevertheless, once encoded, the affine transformations generally require several orders of magnitude less computer memory than a pixel-by-pixel description of the pixel map.
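A fern of this kind can be generated with the Random Iteration Algorithm applied to four contracting affine maps. The coefficients below are the widely published Barnsley-fern values, not necessarily the 24 numbers of Figure 10.13.22, and the selection weights are a common refinement of the uniform random choice used in Algorithm 2; treat the sketch as an illustration only.

```python
import random

# Four contracting affine maps T(x, y) = (a x + b y + e, c x + d y + f).
FERN_MAPS = [
    # (a,     b,     c,     d,    e,    f)
    (0.00,  0.00,  0.00, 0.16, 0.00, 0.00),   # stem: matrix part has determinant 0
    (0.85,  0.04, -0.04, 0.85, 0.00, 1.60),   # successively smaller copies of the fern
    (0.20, -0.26,  0.23, 0.22, 0.00, 1.60),   # left leaflet
    (-0.15, 0.28,  0.26, 0.24, 0.00, 0.44),   # right leaflet
]
WEIGHTS = [0.01, 0.85, 0.07, 0.07]   # commonly used selection probabilities

def fern_points(n):
    """Generate n points of the fern with the Random Iteration Algorithm."""
    x, y = 0.0, 0.0
    pts = []
    for _ in range(n):
        a, b, c, d, e, f = random.choices(FERN_MAPS, weights=WEIGHTS)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        pts.append((x, y))
    return pts

print(fern_points(5))
```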

Further Readings Readers interested in learning more about fractals are referred to the following books, the first of which elaborates on the linear transformation approach of this section. 1. Michael Barnsley, Fractals Everywhere (New York: Academic Press, 1993). 2. Benoit B. Mandelbrot, The Fractal Geometry of Nature (New York: W. H. Freeman, 1982). 3. Heinz-Otto Peitgen and P. H. Richter, The Beauty of Fractals (New York: Springer-Verlag, 1986). 4. Heinz-Otto Peitgen and Dietmar Saupe, The Science of Fractal Images (New York: Springer-Verlag, 1988).

Exercise Set 10.13 1. The self-similar set in Figure Ex-1 has the sizes indicated. Given that its lower left corner is situated at the origin of the xy-plane, find the similitudes that determine the set. What is its Hausdorff dimension? Is it a fractal?

Figure Ex-1 Answer:

2. Find the Hausdorff dimension of the self-similar set shown in Figure Ex-2. Use a ruler to measure the figure and determine an approximate value of the scale factor s. What are the rotation angles of the similitudes determining this set?

Figure Ex-2
Answer: rotation angles: (upper left); (upper right); (lower left); (lower right)

3. Each of the 12 self-similar sets in Figure Ex-3 results from three similitudes with scale factor 1/2, and so all have Hausdorff dimension ln 3/ln 2 ≈ 1.585. The rotation angles of the three similitudes are all multiples of 90°. Find these rotation angles for each set and express them as a triplet of integers (n₁, n₂, n₃), where nᵢ is the corresponding integer multiple of 90°, in the order upper right, lower left, lower right. For example, the first set (the Sierpinski triangle) generates the triplet (0, 0, 0).

Figure Ex-3 Answer:

4. For each of the self-similar sets in Figure Ex-4, find: (i) the scale factor s of the similitudes describing the set; (ii) the rotation angles of all similitudes describing the set (all rotation angles are multiples of 90°); and (iii) the Hausdorff dimension of the set. Which of the sets are fractals and why?

Figure Ex-4
Answer:
(a) (i) ; (ii) all rotation angles are ; (iii) . This set is a fractal.
(b) (i) ; (ii) all rotation angles are ; (iii) . This set is a fractal.
(c) (i) ; (ii) rotation angles: (top), (lower left), (lower right); (iii) . This set is a fractal.
(d) (i) ; (ii) rotation angles: (upper left), (upper right), (lower left), (lower right); (iii) . This set is a fractal.

5. Show that of the four affine transformations shown in Figure 10.13.22, only one is a similitude. Determine its scale factor s and rotation angle θ.
Answer:

6. Find the coordinates of the tip of the fern in Figure 10.13.22. [Hint: One of the four transformations maps the tip of the fern to itself.]
Answer: (0.766, 0.996) rounded to three decimal places

7. The square in Figure 10.13.7a was expressed as the union of 4 nonoverlapping squares as in Figure 10.13.7b. Suppose that it is expressed instead as the union of 16 nonoverlapping squares. Verify that its Hausdorff dimension is still 2, as determined by Equation 2.
Answer: With k = 16 and s = 1/4, Equation 2 gives ln 16/ln 4 = 2.

8. Show that the four similitudes

express the unit square as the union of four overlapping squares. Evaluate the right-hand side of Equation 2 for the values of k and s determined by these similitudes, and show that the result is not the correct value of the Hausdorff dimension of the unit square. [Note: This exercise shows the necessity of the nonoverlapping condition in the definition of a self-similar set and its Hausdorff dimension.] Answer:

9. All of the results in this section can be extended to R³. Compute the Hausdorff dimension of the unit cube in R³ (see Figure Ex-9). Given that the topological dimension of the unit cube is 3, determine whether it is a fractal. [Hint: Express the unit cube as the union of eight smaller congruent nonoverlapping cubes.]

Figure Ex-9
Answer: d_H = ln 8/ln 2 = 3; the cube is not a fractal.

10. The set in R³ in Figure Ex-10 is called the Menger sponge. It is a self-similar set obtained by drilling out certain square holes from the unit cube. Note that each face of the Menger sponge is a Sierpinski carpet and that the holes in the Sierpinski carpet now run all the way through the Menger sponge. Determine the values of k and s for the Menger sponge and find its Hausdorff dimension. Is the Menger sponge a fractal?

Figure Ex-10
Answer: k = 20; s = 1/3; d_H = ln 20/ln 3 ≈ 2.727; the set is a fractal.

11. The two similitudes

T₁(x, y) = (x/3, y/3)  and  T₂(x, y) = (x/3 + 2/3, y/3)

determine a fractal known as the Cantor set. Starting with the unit square region U as an initial set, sketch the first four sets that Algorithm 1 determines. Also, find the Hausdorff dimension of the Cantor set. (This famous set was the first example that Hausdorff gave in his 1919 paper of a set whose Hausdorff dimension is not equal to its topological dimension.)
Answer: d_H = ln 2/ln 3 ≈ 0.631

12. Compute the areas of the sets S₀, S₁, S₂, S₃, and S₄ in Figure 10.13.15.
Answer: Area of S₀ = 1; area of S₁ = 8/9; area of S₂ = (8/9)² = 64/81; area of S₃ = (8/9)³ = 512/729; area of S₄ = (8/9)⁴ = 4096/6561

Section 10.13 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Use similitudes of the form

T(x, y, z) = (x/3 + e, y/3 + f, z/3 + g)

to show that the Menger sponge (see Exercise 10) is the set S satisfying

S = T₁(S) ∪ T₂(S) ∪ ⋯ ∪ T₂₀(S)

for appropriately chosen similitudes T₁, T₂, …, T₂₀. Determine these similitudes by determining the collection of translation matrices with entries e, f, and g.

T2. Generalize the ideas involved in the Cantor set (in R¹), the Sierpinski carpet (in R²), and the Menger sponge (in R³) to Rⁿ by considering the set S satisfying

S = T₁(S) ∪ T₂(S) ∪ ⋯ ∪ Tₖ(S)

for scale-1/3 similitudes whose translation components each equal 0, 1/3, or 2/3, and no two of them ever equal 1/3 at the same time. Use a computer to construct the set of translations, thereby determining the value of k, for n = 2, 3, 4. Then develop an expression for k for general n.

10.14 Chaos In this section we use a map of the unit square in the xy-plane onto itself to describe the concept of a chaotic mapping.

Prerequisites: Geometry of Linear Operators on R² (Section 4.11), Eigenvalues and Eigenvectors, Intuitive Understanding of Limits and Continuity

Chaos The word chaos was first used in a mathematical sense in 1975 by Tien-Yien Li and James Yorke in a paper entitled “Period Three Implies Chaos.” The term is now used to describe the behavior of certain mathematical mappings and physical phenomena that at first glance seem to behave in a random or disorderly fashion but actually have an underlying element of order (examples include random-number generation, shuffling cards, cardiac arrhythmia, fluttering airplane wings, changes in the red spot of Jupiter, and deviations in the orbit of Pluto). In this section we discuss a particular chaotic mapping called Arnold's cat map, after the Russian mathematician Vladimir I. Arnold who first described it using a diagram of a cat.

Arnold's Cat Map To describe Arnold's cat map, we need a few ideas about modular arithmetic. If x is a real number, then the notation x mod 1 denotes the unique number in the interval [0, 1) that differs from x by an integer. For example, 2.7 mod 1 = 0.7, −1.6 mod 1 = 0.4, and 3 mod 1 = 0. Note that if x is a nonnegative number, then x mod 1 is simply the fractional part of x. If (x, y) is an ordered pair of real numbers, then the notation (x, y) mod 1 denotes (x mod 1, y mod 1). For example, (2.3, −0.9) mod 1 = (0.3, 0.1). Observe that for every real number x, the point x mod 1 lies in the unit interval [0, 1), and that for every ordered pair (x, y), the point (x, y) mod 1 lies in the unit square

S = {(x, y) : 0 ≤ x < 1, 0 ≤ y < 1}

Also observe that the upper boundary and the right-hand boundary of the square are not included in S. Arnold's cat map is the transformation Γ: S → S defined by the formula

Γ(x, y) = (x + y, x + 2y) mod 1

or, in matrix notation,

$\Gamma\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}1&1\\1&2\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}\ \mathrm{mod}\ 1 \qquad (1)$

To understand the geometry of Arnold's cat map, it is helpful to write (1) in the factored form

$\begin{bmatrix}1&1\\1&2\end{bmatrix}=\begin{bmatrix}1&0\\1&1\end{bmatrix}\begin{bmatrix}1&1\\0&1\end{bmatrix}$

which expresses Arnold's cat map as the composition of a shear in the x-direction with factor 1, followed by a shear in the y-direction with factor 1. Because the computations are performed mod 1, Γ maps all points of R² into the unit square S.
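Formula (1) is a two-line function in code. The sketch below is an illustration only; exact rational arithmetic is used so that iterates of pixel points can be followed without rounding error.

```python
from fractions import Fraction

def cat_map(point):
    """One application of Arnold's cat map: multiply by [[1, 1], [1, 2]]
    and reduce each coordinate mod 1, as in Formula (1)."""
    x, y = point
    return ((x + y) % 1, (x + 2 * y) % 1)

# Iterating an exact (rational) point of S:
p = (Fraction(1, 5), Fraction(7, 10))
for n in range(1, 4):
    p = cat_map(p)
    print(n, p)
# 1 (Fraction(9, 10), Fraction(3, 5))
# 2 (Fraction(1, 2), Fraction(1, 10))
# 3 (Fraction(3, 5), Fraction(7, 10))
```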

We will illustrate the effect of Arnold's cat map on the unit square S, which is shaded in Figure 10.14.1a and contains a picture of a cat. It can be shown that it does not matter whether the mod 1 computations are carried out after each shear or at the very end. We will discuss both methods, first performing them at the end. The steps are as follows:

Step 1 Shear in the x-direction with factor 1 (Figure 10.14.1b): (x, y) → (x + y, y), or, in matrix notation, $\begin{bmatrix}x\\y\end{bmatrix}\to\begin{bmatrix}1&1\\0&1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}$

Step 2 Shear in the y-direction with factor 1 (Figure 10.14.1c): (x, y) → (x, x + y), or, in matrix notation, $\begin{bmatrix}x\\y\end{bmatrix}\to\begin{bmatrix}1&0\\1&1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}$

Step 3 Reassembly into S (Figure 10.14.1d): The geometric effect of the mod 1 arithmetic is to break up the parallelogram in Figure 10.14.1c and reassemble the pieces of S as shown in Figure 10.14.1d.

Figure 10.14.1

For computer implementation, it is more convenient to perform the mod 1 arithmetic at each step, rather than at the end. With this approach there is a reassembly at each step, but the net effect is the same. The steps are as follows:

Step 1 Shear in the x-direction with factor 1, followed by a reassembly into S (Figure 10.14.2b): (x, y) → (x + y, y) mod 1

Step 2 Shear in the y-direction with factor 1, followed by a reassembly into S (Figure 10.14.2c): (x, y) → (x, x + y) mod 1

Figure 10.14.2

Repeated Mappings Chaotic mappings such as Arnold's cat map usually arise in physical models in which an operation is performed repeatedly. For example, cards are mixed by repeated shuffles, paint is mixed by repeated stirs, water in a tidal basin is mixed by repeated tidal changes, and so forth. Thus, we are interested in examining the effect on S of repeated applications (or iterations) of Arnold's cat map. Figure 10.14.3, which was generated on a computer, shows the effect of 25 iterations of Arnold's cat map on the cat in the unit square S. Two interesting phenomena occur: • The cat returns to its original form at the 25th iteration. • At some of the intermediate iterations, the cat is decomposed into streaks that seem to have a specific direction. Much of the remainder of this section is devoted to explaining these phenomena.

Figure 10.14.3

Periodic Points Our first goal is to explain why the cat in Figure 10.14.3 returns to its original configuration at the 25th iteration. For this purpose it will be helpful to think of a picture in the xy-plane as an assignment of colors to the points in the plane. For pictures generated on a computer screen or other digital device, hardware limitations require that a picture be broken up into discrete squares, called pixels. For example, in the computer-generated pictures in Figure 10.14.3 the unit square S is divided into a grid with 101 pixels on a side for a total of 10,201 pixels, each of which is black or white (Figure 10.14.4). An assignment of colors to pixels to create

a picture is called a pixel map.

Figure 10.14.4

As shown in Figure 10.14.5, each pixel in S can be assigned a unique pair of coordinates of the form (m/101, n/101) that identifies its lower left-hand corner, where m and n are integers in the range 0 ≤ m ≤ 100, 0 ≤ n ≤ 100. We call these points pixel points because each such point identifies a unique pixel. Instead of restricting the discussion to the case where S is subdivided into an array with 101 pixels on a side, let us consider the more general case where there are p pixels per side. Thus, each pixel map in S consists of p² pixels uniformly spaced 1/p units apart in both the x- and the y-directions. The pixel points in S have coordinates of the form (m/p, n/p), where m and n are integers ranging from 0 to p − 1.

Figure 10.14.5

Under Arnold's cat map each pixel point of S is transformed into another pixel point of S. To see why this is so, observe that the image of the pixel point (m/p, n/p) under Γ is given in matrix form by

$\Gamma\begin{bmatrix}m/p\\n/p\end{bmatrix}=\left(\begin{bmatrix}1&1\\1&2\end{bmatrix}\begin{bmatrix}m/p\\n/p\end{bmatrix}\right)\mathrm{mod}\ 1=\begin{bmatrix}(m+n)/p\\(m+2n)/p\end{bmatrix}\mathrm{mod}\ 1 \qquad (2)$

The ordered pair ((m + n)/p, (m + 2n)/p) mod 1 is of the form (m′/p, n′/p), where m′ and n′ lie in the range 0 ≤ m′ ≤ p − 1, 0 ≤ n′ ≤ p − 1. Specifically, m′ and n′ are the remainders when m + n and m + 2n are divided by p, respectively. Consequently, each point in S of the form (m/p, n/p) is mapped onto another point of the same form. Because Arnold's cat map transforms every pixel point of S into another pixel point of S, and because there are only p² pixel points in S, it follows that any given pixel point must return to its original position after at most p² different iterations of Arnold's cat map.

E X A M P L E 1 Using Formula 2 For the pixel width p and initial pixel point shown in Figure 10.14.6, Formula (2) can be used to compute the successive iterates of the point (verify). Because the point returns to its initial position on the ninth application of Arnold's cat map (but no sooner), the point is said to have period 9, and the set of nine distinct iterates of the point is called a 9-cycle. Figure 10.14.6 shows this 9-cycle with the initial point labeled 0 and its successive iterates labeled accordingly.

Figure 10.14.6

In general, a point that returns to its initial position after n applications of Arnold's cat map, but does not return with fewer than n applications, is said to have period n, and its set of n distinct iterates is called an n-cycle. Arnold's cat map maps (0, 0) into (0, 0), so this point has period 1. Points with period 1 are also called fixed points. We leave it as an exercise (Exercise 11) to show that (0, 0) is the only fixed point of Arnold's cat map.

Period Versus Pixel Width If P₁ and P₂ are points with periods n₁ and n₂, respectively, then P₁ returns to its initial position in n₁ iterations (but no sooner), and P₂ returns to its initial position in n₂ iterations (but no sooner); thus, both points return to their initial positions in any number of iterations that is a multiple of both n₁ and n₂. In general, for a pixel map with pixel points of the form (m/p, n/p), we let Π(p) denote the least common multiple of the periods of all the pixel points in the map [i.e., Π(p) is the smallest integer that is divisible by all of the periods]. It follows that the pixel map will return to its initial configuration in Π(p) iterations of Arnold's cat map (but no sooner). For this reason, we call Π(p) the period of the pixel map. In Exercise 4 we ask you to show that if p = 101, then all pixel points have period 1, 5, or 25, so Π(101) = 25. This explains why the cat in Figure 10.14.3 returned to its initial configuration in 25 iterations. Figure 10.14.7 shows how the period of a pixel map varies with p. Although the general tendency is for the period to increase as p increases, there is a surprising amount of irregularity in the graph. Indeed, there is no simple function that specifies this relationship (see Exercise 1).

Figure 10.14.7

Although a pixel map with p pixels on a side does not return to its initial configuration until Π(p) iterations have occurred, various unexpected things can occur at intermediate iterations. For example, Figure 10.14.8 shows a pixel map of the famous Hungarian-American mathematician John von Neumann. It can be shown that for this pixel map Π(p) = 750; hence, the pixel map will return to its initial configuration after 750 iterations of Arnold's cat map (but no sooner). However, after 375 iterations the pixel map is turned upside down, and after another 375 iterations (for a total of 750) the pixel map is returned to its initial configuration. Moreover, there are so many pixel points with periods that divide 750 that multiple ghostlike images of the original likeness occur at intermediate iterations; at 195 iterations numerous miniatures of the original likeness occur in diagonal rows.
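The period of a pixel point, and the period Π(p) of a whole pixel map, can be computed by direct iteration. The sketch below is an illustration only; the function names are assumptions made for the example.

```python
from math import gcd

def pixel_period(m, n, p):
    """Period of the pixel point (m/p, n/p) under Arnold's cat map:
    iterate (m, n) -> (m + n, m + 2n) mod p until it first returns."""
    a, b = m % p, n % p
    k = 0
    while True:
        a, b = (a + b) % p, (a + 2 * b) % p
        k += 1
        if (a, b) == (m % p, n % p):
            return k

def lcm(a, b):
    return a * b // gcd(a, b)

def pixel_map_period(p):
    """Pi(p): least common multiple of the periods of all p*p pixel points."""
    result = 1
    for m in range(p):
        for n in range(p):
            result = lcm(result, pixel_period(m, n, p))
    return result

print(pixel_map_period(101))   # 25, matching the 25 iterations of Figure 10.14.3
```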

Figure 10.14.8

The Tiled Plane

Our next objective is to explain the cause of the linear streaks that occur in Figure 10.14.3. For this purpose it will be helpful to view Arnold's cat map another way. As defined, Arnold's cat map is not a linear transformation because of the mod 1 arithmetic. However, there is an alternative way of defining Arnold's cat map that avoids the mod 1 arithmetic and results in a linear transformation. For this purpose, imagine that the unit square S with its picture of the cat is a “tile,” and suppose that the entire plane is covered with such tiles, as in Figure 10.14.9. We say that the xy-plane has been tiled with the unit square. If we apply the matrix transformation in 1 to the entire tiled plane without performing the mod 1 arithmetic, then it can be shown that the portion of the image within S will be identical to the image that we obtained using the mod 1 arithmetic (Figure 10.14.9). In short, the tiling results in the same pixel map in S as the mod 1 arithmetic, but in the tiled case Arnold's cat map is a linear transformation.

Figure 10.14.9 It is important to understand, however, that tiling and mod 1 arithmetic produce periodicity in different ways. If a pixel map in S has period n, then in the case of mod 1 arithmetic, each point returns to its original position at the end of n iterations. In the case of tiling, points need not return to their original positions; rather, each point is replaced by a point of the same color at the end of n iterations.

Properties of Arnold's Cat Map To understand the cause of the streaks in Figure 10.14.3, think of Arnold's cat map as a linear transformation on the tiled plane. Observe that the matrix

$C=\begin{bmatrix}1&1\\1&2\end{bmatrix}$

that defines Arnold's cat map is symmetric and has a determinant of 1. The fact that the determinant is 1 means that multiplication by this matrix preserves areas; that is, the area of any figure in the plane and the area of its image are the same. This is also true for figures in S in the case of mod 1 arithmetic, since the effect of the mod 1 arithmetic is to cut up the figure and reassemble the pieces without any overlap, as shown in Figure 10.14.1d. Thus, in Figure 10.14.3 the area of the cat (whatever it is) is the same as the total area of the blotches in each iteration. The fact that the matrix is symmetric means that its eigenvalues are real and the corresponding eigenvectors are perpendicular. We leave it for you to show that the eigenvalues and corresponding eigenvectors of C are

$\lambda_1=\frac{3+\sqrt{5}}{2}\approx 2.618,\ \ \mathbf{x}_1=\begin{bmatrix}1\\(1+\sqrt{5})/2\end{bmatrix} \qquad \lambda_2=\frac{3-\sqrt{5}}{2}\approx 0.382,\ \ \mathbf{x}_2=\begin{bmatrix}1\\(1-\sqrt{5})/2\end{bmatrix}$

For each application of Arnold's cat map, the eigenvalue λ₁ causes a stretching in the direction of the eigenvector x₁ by a factor of λ₁, and the eigenvalue λ₂ causes a compression in the direction of the eigenvector x₂ by a factor of λ₂. Figure 10.14.10 shows a square centered at the origin whose sides are parallel to the two eigenvector directions. Under the above mapping, this square is deformed into a rectangle whose sides are also parallel to the two eigenvector directions. The areas of the square and the rectangle are the same.
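A quick numerical check of these eigenvalue claims (an illustration only, not part of the text):

```python
import numpy as np

C = np.array([[1.0, 1.0],
              [1.0, 2.0]])                     # the matrix of Arnold's cat map

eigenvalues, eigenvectors = np.linalg.eigh(C)  # eigh: C is symmetric
print(eigenvalues)                             # ~[0.382, 2.618] = (3 -/+ sqrt(5))/2
print(eigenvalues.prod())                      # ~1.0: det C = 1, so areas are preserved
print(eigenvectors[:, 0] @ eigenvectors[:, 1]) # ~0: the eigenvectors are perpendicular
```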

Figure 10.14.10

To explain the cause of the streaks in Figure 10.14.3, consider S to be part of the tiled plane, and let p be a point of S with period n. Because we are considering tiling, there is a point q in the plane with the same color as p that on successive iterations moves toward the position initially occupied by p, reaching that position on the nth iteration. This point is q = C⁻ⁿp, since Cⁿ(C⁻ⁿp) = p. Thus, with successive iterations, points of S flow away from their initial positions, while at the same time other points in the plane (with corresponding colors) flow toward those initial positions, completing their trip on the final iteration of the cycle. Figure 10.14.11 illustrates this for a particular choice of the point p and its period n. Note that q mod 1 = p, so both points occupy the same positions on their respective tiles. The outgoing point p moves in the general direction of the eigenvector x₁, as indicated by the arrows in Figure 10.14.11, and the incoming point q moves in the general direction of the eigenvector x₂. It is the “flow lines” in the general directions of the eigenvectors that form the streaks in Figure 10.14.3.

Figure 10.14.11

Nonperiodic Points Thus far we have considered the effect of Arnold's cat map on pixel points of the form (m/p, n/p) for an arbitrary positive integer p. We know that all such points are periodic. We now consider the effect of Arnold's cat map on an arbitrary point (a, b) in S. We classify such points as rational if the coordinates a and b are both rational numbers, and irrational if at least one of the coordinates is irrational. Every rational point is periodic, since it is a pixel point for a suitable choice of p. For example, the rational point (2/3, 1/5) can be written as (10/15, 3/15), so it is a pixel point with p = 15. It can be shown (Exercise 13) that the converse is also true: Every periodic point must be a rational point.

It follows from the preceding discussion that the irrational points in S are nonperiodic, so that successive iterates of an irrational point in S must all be distinct points in S. Figure 10.14.12, which was computer generated, shows an irrational point and selected iterates up to 100,000. For the particular irrational point that we selected, the iterates do not seem to cluster in any particular region of S; rather, they appear to be spread throughout S, becoming denser with successive iterations.

Figure 10.14.12 The behavior of the iterates in Figure 10.14.12 is sufficiently important that there is some terminology associated with it. We say that a set D of points in S is dense in S if every circle centered at any point of S encloses points of D, no matter how small the radius of the circle is taken (Figure 10.14.13). It can be shown that the rational points are dense in S and the iterates of most (but not all) of the irrational points are dense in S.

Figure 10.14.13

Definition of Chaos We know that under Arnold's cat map, the rational points of S are periodic and dense in S and that some but not all of the irrational points have iterates that are dense in S. These are the basic ingredients of chaos. There are several definitions of chaos in

current use, but the following one, which is an outgrowth of a definition introduced by Robert L. Devaney in 1986 in his book An Introduction to Chaotic Dynamical Systems (Benjamin/Cummings Publishing Company), is most closely related to our work.

DEFINITION 1 A mapping T of S onto itself is said to be chaotic if: (i) S contains a dense set of periodic points of the mapping T. (ii) There is a point in S whose iterates under T are dense in S.

Thus Arnold's cat map satisfies the definition of a chaotic mapping. What is noteworthy about this definition is that a chaotic mapping exhibits an element of order and an element of disorder—the periodic points move regularly in cycles, but the points with dense iterates move irregularly, often obscuring the regularity of the periodic points. This fusion of order and disorder characterizes chaotic mappings.

Dynamical Systems Chaotic mappings arise in the study of dynamical systems. Informally stated, a dynamical system can be viewed as a system that has a specific state or configuration at each point of time but that changes its state with time. Chemical systems, ecological systems, electrical systems, biological systems, economic systems, and so forth can be looked at in this way. In a discrete-time dynamical system, the state changes at discrete points of time rather than at each instant. In a discrete-time chaotic dynamical system, each state results from a chaotic mapping of the preceding state. For example, if one imagines that Arnold's cat map is applied at discrete points of time, then the pixel maps in Figure 10.14.3 can be viewed as the evolution of a discrete-time chaotic dynamical system from some initial set of states (each point of the cat is a single initial state) to successive sets of states. One of the fundamental problems in the study of dynamical systems is to predict future states of the system from a known initial state. In practice, however, the exact initial state is rarely known because of errors in the devices used to measure the initial state. It was believed at one time that if the measuring devices were sufficiently accurate and the computers used to perform the iteration were sufficiently powerful, then one could predict the future states of the system to any degree of accuracy. But the discovery of chaotic systems shattered this belief because it was found that for such systems the slightest error in measuring the initial state or in the computation of the iterates becomes magnified exponentially, thereby preventing an accurate prediction of future states. Let us demonstrate this sensitivity to initial conditions with Arnold's cat map. Suppose that P₀ is a point in S whose coordinates are given to five decimal places. A measurement error of 0.00001 is made in the y-coordinate, so that the point is thought to be located at a nearby point, which we denote by P₀*. Both P₀ and P₀* are pixel points with p = 100,000 (why?), and thus, since Π(100,000) = 75,000, both return to their initial positions after 75,000 iterations. In Figure 10.14.14 we show the first 50 iterates of P₀ under Arnold's cat map as crosses and the first 50 iterates of P₀* as circles. Although P₀ and P₀* are close enough that their symbols overlap initially, only their first eight iterates have overlapping symbols; from the ninth iteration on their iterates follow divergent paths.

Figure 10.14.14

It is possible to quantify the growth of the error from the eigenvalues and eigenvectors of Arnold's cat map. For this purpose we will think of Arnold's cat map as a linear transformation on the tiled plane. Recall from Figure 10.14.10 and the related discussion that the projected distance between two points in S in the direction of the eigenvector x₁ increases by a factor of λ₁ ≈ 2.618 with each iteration (Figure 10.14.15). After nine iterations this projected distance increases by a factor of (2.618)⁹ ≈ 5778, and with an initial error of roughly 0.00001 in the direction of x₁, this distance is about (5778)(0.00001) ≈ 0.06, or about 6% of the width of the unit square S. After 12 iterations this small initial error grows to about (2.618)¹²(0.00001) ≈ 1.04, which is greater than the width of S. Thus, we completely lose track of the true iterates within S after 12 iterations because of the exponential growth of the initial error.

Figure 10.14.15 Although sensitivity to initial conditions limits the ability to predict the future evolution of dynamical systems, new techniques are presently being investigated to describe this future evolution in alternative ways.

Exercise Set 10.14

1. In a journal article [F. J. Dyson and H. Falk, “Period of a Discrete Cat Mapping,” The American Mathematical Monthly, 99 (August–September 1992), pp. 603–614] the following results concerning the nature of the function Π(p) were established:
(i) Π(p) = 3p if and only if p = 2 · 5ᵏ for k = 1, 2, ….
(ii) Π(p) = 2p if and only if p = 5ᵏ or p = 6 · 5ᵏ for k = 1, 2, ….
(iii) Π(p) ≤ 12p/7 for all other choices of p.
Find Π(p) for each of the nine values of p given.
Answer:

2. Find all the n-cycles that are subsets of the 36 points in S of the form (m/6, n/6), with m and n in the range 0, 1, 2, 3, 4, 5. Then find Π(6).
Answer: One 1-cycle: (0, 0); one 3-cycle: (0, 1/2), (1/2, 0), (1/2, 1/2); two 4-cycles; two 12-cycles. Thus Π(6) = lcm(1, 3, 4, 12) = 12.

3. (Fibonacci Shift-Register Random-Number Generator) A well-known method of generating a sequence of “pseudorandom” integers x₁, x₂, x₃, … in the interval from 0 to p − 1 is based on the following algorithm:
(i) Pick any two integers x₁ and x₂ from the range 0, 1, 2, …, p − 1.
(ii) Set x_{n+1} = (xₙ + x_{n−1}) mod p for n = 2, 3, 4, ….
Here x mod p denotes the number in the interval from 0 to p − 1 that differs from x by a multiple of p. For example, 35 mod 15 = 5 (because 35 − 2 · 15 = 5) and 29 mod 15 = 14 (because 29 − 15 = 14).

(a) Generate the sequence of pseudorandom numbers that results from the choices p = 15, x₁ = 3, and x₂ = 7 until the sequence starts repeating.

(b) Show that the following formula is equivalent to step (ii) of the algorithm:

$\begin{bmatrix}x_{2n+1}\\x_{2n+2}\end{bmatrix}=\begin{bmatrix}1&1\\1&2\end{bmatrix}\begin{bmatrix}x_{2n-1}\\x_{2n}\end{bmatrix}\ \mathrm{mod}\ p,\qquad n = 1, 2, 3, \ldots$

(c) Use the formula in part (b) to generate the sequence of vectors that results from the choices p = 21, x₁ = 5, and x₂ = 5 until the sequence starts repeating.

Answer:
(a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7,
(c) (5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5),

Remark If we take p = 1 and pick x₁ and x₂ from the interval [0, 1), then the above random-number generator produces pseudorandom numbers in the interval [0, 1). The resulting scheme is precisely Arnold's cat map. Furthermore, if we eliminate the modular arithmetic in the algorithm and take x₁ = x₂ = 1, then the resulting sequence of integers is the famous Fibonacci sequence, in which each number after the first two is the sum of the preceding two numbers.

4.

For p = 101, it can be verified that

$C^{25}=\begin{bmatrix}7{,}778{,}742{,}049 & 12{,}586{,}269{,}025\\ 12{,}586{,}269{,}025 & 20{,}365{,}011{,}074\end{bmatrix}$

It can also be verified that 12,586,269,025 is divisible by 101 and that when 7,778,742,049 and 20,365,011,074 are divided by 101, the remainder is 1.
(a) Show that every point in S of the form (m/101, n/101) returns to its starting position after 25 iterations under Arnold's cat map.
(b) Show that every point in S of the form (m/101, n/101) has period 1, 5, or 25.
(c) Show that the point has period greater than 5 by iterating it five times.
(d) Show that Π(101) = 25.
Answer: (c) None of the first five iterates equals the starting point.

5. Show that for the given mapping of S onto S (computed mod 1), every point in S is a periodic point. Why does this show that the mapping is not chaotic?

6. An Anosov automorphism on R² is a mapping from the unit square S onto S of the form

$\begin{bmatrix}x\\y\end{bmatrix}\to\begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}\ \mathrm{mod}\ 1$

in which (i) a, b, c, and d are integers, (ii) the determinant of the matrix is ±1, and (iii) the eigenvalues of the matrix do not have magnitude 1. It can be shown that all Anosov automorphisms are chaotic mappings.
(a) Show that Arnold's cat map is an Anosov automorphism.
(b) Which of the following are the matrices of an Anosov automorphism?
(c) Show that the following mapping of S onto S is not an Anosov automorphism. What is the geometric effect of this transformation on S? Use your observation to show that the mapping is not a chaotic mapping by showing that all points in S are periodic points.
Answer:
(b) Two of the given matrices are matrices of Anosov automorphisms.
(c) The transformation effects a rotation of S through 90° in the clockwise direction.

7. Show that Arnold's cat map is one-to-one over the unit square S and that its range is S.

8. Show that the inverse of Arnold's cat map is given by

Γ⁻¹(x, y) = (2x − y, −x + y) mod 1

9. Show that the unit square S can be partitioned into four triangular regions on each of which Arnold's cat map is a transformation of the form

T(x, y) = (x + y + a, x + 2y + b)

where a and b need not be the same for each region. [Hint: Find the regions in S that map onto the four shaded regions of the parallelogram in Figure 10.14.1d.]
Answer: In region I: ; in region II: ; in region III: ; in region IV:

10. If (x₀, y₀) is a point in S and (xₙ, yₙ) is its nth iterate under Arnold's cat map, show that

$\begin{bmatrix}x_n\\y_n\end{bmatrix}=\left(\begin{bmatrix}1&1\\1&2\end{bmatrix}^{n}\begin{bmatrix}x_0\\y_0\end{bmatrix}\right)\mathrm{mod}\ 1$

This result implies that the modular arithmetic need only be performed once rather than after each iteration.

11. Show that (0, 0) is the only fixed point of Arnold's cat map by showing that the only solution of the equation

(x + y, x + 2y) mod 1 = (x, y)

with 0 ≤ x < 1 and 0 ≤ y < 1 is x = 0, y = 0. [Hint: For appropriate nonnegative integers r and s, we can write

(x + y, x + 2y) = (x + r, y + s)

for the preceding equation.]

12. Find all 2-cycles of Arnold's cat map by finding all solutions of the equation

(2x + 3y, 3x + 5y) mod 1 = (x, y)

with 0 ≤ x < 1 and 0 ≤ y < 1. [Hint: For appropriate nonnegative integers r and s, we can write

(2x + 3y, 3x + 5y) = (x + r, y + s)

for the preceding equation.]
Answer: (2/5, 1/5) and (3/5, 4/5) form one 2-cycle, and (4/5, 2/5) and (1/5, 3/5) form another 2-cycle.

13. Show that every periodic point of Arnold's cat map must be a rational point by showing that for all solutions of the equation

Γⁿ(x, y) = (x, y)

the numbers x and y are quotients of integers.

14. Let T be Arnold's cat map applied five times in a row; that is, T = Γ⁵. Figure Ex-14 represents four successive mappings of T on the first image, each image having a resolution of 101 × 101 pixels. The fifth mapping returns to the first image because this cat map has a period of 25. Explain how you might generate this particular sequence of images.

Figure Ex-14
Answer: Begin with a 101 × 101 array of white pixels and add the letter ‘A’ in black pixels to it. Apply the mapping to this image, which will scatter the black pixels throughout the image. Then superimpose the letter ‘B’ in black pixels onto this image. Apply the mapping again and then superimpose the letter ‘C’ in black pixels onto the resulting image. Repeat this procedure with the letters ‘D’ and ‘E’. The next application of the mapping will return you to the letter ‘A’ with the pixels for the letters ‘B’ through ‘E’ scattered in the background.

Section 10.14 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. The methods of Exercise 4 show that for the cat map, Π(p) is the smallest integer n satisfying the equation

Cⁿ mod p = I

This suggests that one way to determine Π(p) is to compute C, C², C³, … (mod p), stopping when this produces the identity matrix. Use this idea to compute Π(p) for several values of p. Compare your results to the formulas given in Exercise 1, if they apply. What can you conjecture about Π(p) when p is even?

T2. The eigenvalues and eigenvectors for the cat map matrix

$C=\begin{bmatrix}1&1\\1&2\end{bmatrix}$

are

$\lambda_1=\frac{3+\sqrt{5}}{2},\ \mathbf{x}_1=\begin{bmatrix}1\\(1+\sqrt{5})/2\end{bmatrix}\qquad\text{and}\qquad \lambda_2=\frac{3-\sqrt{5}}{2},\ \mathbf{x}_2=\begin{bmatrix}1\\(1-\sqrt{5})/2\end{bmatrix}$

Using these eigenvalues and eigenvectors, we can define P = [x₁  x₂] and D = diag(λ₁, λ₂) and write C = PDP⁻¹; hence, Cⁿ = PDⁿP⁻¹. Use a computer to show that this expression produces integer matrices Cⁿ even though P and D have irrational entries. How can you use these results and your conclusions in Exercise T1 to simplify the method for computing Π(p)?

10.15 Cryptography In this section we present a method of encoding and decoding messages. We also examine modular arithmetic and show how Gaussian elimination can sometimes be used to break an opponent's code.

Prerequisites: Matrices, Gaussian Elimination, Matrix Operations, Linear Independence, Linear Transformations (Section 4.9)

Ciphers The study of encoding and decoding secret messages is called cryptography. Although secret codes date to the earliest days of written communication, there has been a recent surge of interest in the subject because of the need to maintain the privacy of information transmitted over public lines of communication. In the language of cryptography, codes are called ciphers, uncoded messages are called plaintext, and coded messages are called ciphertext. The process of converting from plaintext to ciphertext is called enciphering, and the reverse process of converting from ciphertext to plaintext is called deciphering. The simplest ciphers, called substitution ciphers, are those that replace each letter of the alphabet by a different letter. For example, in the substitution cipher

Plain   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher  D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

the plaintext letter A is replaced by D, the plaintext letter B by E, and so forth. With this cipher, each letter of a plaintext message is replaced in this way to produce the ciphertext.

Hill Ciphers A disadvantage of substitution ciphers is that they preserve the frequencies of individual letters, making it relatively easy to break the code by statistical methods. One way to overcome this problem is to divide the plaintext into groups of letters and encipher the plaintext group by group, rather than one letter at a time. A system of cryptography in which the plaintext is divided into sets of n letters, each of which is replaced by a set of n cipher letters, is called a polygraphic system. In this section we will study a class of polygraphic systems based on matrix transformations. [The ciphers that we will discuss are called Hill ciphers after Lester S. Hill, who introduced them in two papers: “Cryptography in an Algebraic Alphabet,” American Mathematical Monthly, 36 (June–July 1929), pp. 306–312; and “Concerning Certain Linear Transformation Apparatus of Cryptography,” American Mathematical Monthly, 38 (March 1931), pp. 135–154.]

In the discussion to follow, we assume that each plaintext and ciphertext letter except Z is assigned the numerical value that specifies its position in the standard alphabet (Table 1). For reasons that will become clear later, Z is assigned a value of zero.

Table 1
A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0

In the simplest Hill ciphers, successive pairs of plaintext are transformed into ciphertext by the following procedure:

Step 1 Choose a 2 × 2 matrix with integer entries

$A=\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}$

to perform the encoding. Certain additional conditions on A will be imposed later.

Step 2 Group successive plaintext letters into pairs, adding an arbitrary “dummy” letter to fill out the last pair if the plaintext has an odd number of letters, and replace each plaintext letter by its numerical value.

Step 3 Successively convert each plaintext pair p₁p₂ into a column vector

$\mathbf{p}=\begin{bmatrix}p_1\\p_2\end{bmatrix}$

and form the product Ap. We will call p a plaintext vector and Ap the corresponding ciphertext vector.

Step 4 Convert each ciphertext vector into its alphabetic equivalent.

E X A M P L E 1 Hill Cipher of a Message Use the matrix

$A=\begin{bmatrix}1&2\\0&3\end{bmatrix}$

to obtain the Hill cipher for the plaintext message

I AM HIDING

Solution If we group the plaintext into pairs and add the dummy letter G to fill out the last pair, we obtain

IA  MH  ID  IN  GG

or, equivalently, from Table 1,

9 1  13 8  9 4  9 14  7 7

To encipher the pair IA, we form the matrix product

$\begin{bmatrix}1&2\\0&3\end{bmatrix}\begin{bmatrix}9\\1\end{bmatrix}=\begin{bmatrix}11\\3\end{bmatrix}$

which, from Table 1, yields the ciphertext KC. To encipher the pair MH, we form the product

$\begin{bmatrix}1&2\\0&3\end{bmatrix}\begin{bmatrix}13\\8\end{bmatrix}=\begin{bmatrix}29\\24\end{bmatrix} \qquad (1)$

However, there is a problem here, because the number 29 has no alphabet equivalent (Table 1). To resolve this problem, we make the following agreement:

Whenever an integer greater than 25 occurs, it will be replaced by the remainder that results when this integer is divided by 26.

Because the remainder after division by 26 is one of the integers 0, 1, 2, …, 25, this procedure will always yield an integer with an alphabet equivalent.

Thus, in 1 we replace 29 by 3, which is the remainder after dividing 29 by 26. It now follows from Table 1 that the ciphertext for the pair MH is CX. The computations for the remaining ciphertext vectors are

$\begin{bmatrix}1&2\\0&3\end{bmatrix}\begin{bmatrix}9\\4\end{bmatrix}=\begin{bmatrix}17\\12\end{bmatrix},\qquad \begin{bmatrix}1&2\\0&3\end{bmatrix}\begin{bmatrix}9\\14\end{bmatrix}=\begin{bmatrix}37\\42\end{bmatrix}\equiv\begin{bmatrix}11\\16\end{bmatrix}\ (\mathrm{mod}\ 26),\qquad \begin{bmatrix}1&2\\0&3\end{bmatrix}\begin{bmatrix}7\\7\end{bmatrix}=\begin{bmatrix}21\\21\end{bmatrix}$

These correspond to the ciphertext pairs QL, KP, and UU, respectively. In summary, the entire ciphertext message is

KC CX QL KP UU

which would usually be transmitted as a single string without spaces:

KCCXQLKPUU

Because the plaintext was grouped in pairs and enciphered by a 2 × 2 matrix, the Hill cipher in Example 1 is referred to as a Hill 2-cipher. It is obviously also possible to group the plaintext in triples and encipher by a 3 × 3 matrix with integer entries; this is called a Hill 3-cipher. In general, for a Hill n-cipher, plaintext is grouped into sets of n letters and enciphered by an n × n matrix with integer entries.
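The four enciphering steps are easy to mechanize. The sketch below is an illustration only; it uses the enciphering matrix and plaintext of Example 1, so the expected output is the ciphertext KCCXQLKPUU.

```python
import numpy as np

ALPHABET = "ZABCDEFGHIJKLMNOPQRSTUVWXY"   # index = numerical value (Z = 0, A = 1, ..., Y = 25)

def to_numbers(text):
    return [ALPHABET.index(ch) for ch in text.upper() if ch.isalpha()]

def hill2_encipher(plaintext, A, dummy="G"):
    """Hill 2-cipher: group the plaintext in pairs (padding with a dummy
    letter if needed), multiply each pair by A, and reduce mod 26."""
    nums = to_numbers(plaintext)
    if len(nums) % 2:
        nums += to_numbers(dummy)
    cipher = []
    for i in range(0, len(nums), 2):
        p = np.array([[nums[i]], [nums[i + 1]]])
        c = (A @ p) % 26
        cipher.extend(int(v) for v in c.flat)
    return "".join(ALPHABET[v] for v in cipher)

A = np.array([[1, 2],
              [0, 3]])
print(hill2_encipher("I AM HIDING", A))   # KCCXQLKPUU
```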

Modular Arithmetic In Example 1, integers greater than 25 were replaced by their remainders after division by 26. This technique of working with remainders is at the core of a body of mathematics called modular arithmetic. Because of its importance in cryptography, we will digress for a moment to touch on some of the main ideas in this area. In modular arithmetic we are given a positive integer m, called the modulus, and any two integers whose difference is an integer multiple of the modulus are regarded as “equal” or “equivalent” with respect to the modulus. More precisely, we make the following definition.

DEFINITION 1 If m is a positive integer and a and b are any integers, then we say that a is equivalent to b modulo m, written

a ≡ b (mod m)

if a − b is an integer multiple of m.

E X A M P L E 2 Various Equivalences

16 ≡ 2 (mod 7), since 16 − 2 = 14 = 2 · 7;  58 ≡ 6 (mod 26), since 58 − 6 = 52 = 2 · 26;  −4 ≡ 22 (mod 26), since −4 − 22 = −26 = (−1) · 26

For any modulus m it can be proved that every integer a is equivalent, modulo m, to exactly one of the integers 0, 1, 2, …, m − 1. We call this integer the residue of a modulo m, and we write Zₘ = {0, 1, 2, …, m − 1} to denote the set of residues modulo m.

THEOREM 10.15.1 For any integer a and modulus m, let R = the remainder of dividing |a| by m. Then the residue r of a modulo m is given by

r = R        if a ≥ 0
r = m − R    if a < 0 and R ≠ 0
r = 0        if a < 0 and R = 0

E X A M P L E 3 Residues mod 26 Find the residue modulo 26 of (a) 87, (b) −38, and (c) −26.

Solution (a) Dividing |87| = 87 by 26 yields a remainder of R = 9, so r = 9. Thus, 87 ≡ 9 (mod 26).

(b) Dividing |−38| = 38 by 26 yields a remainder of R = 12, so r = 26 − 12 = 14. Thus, −38 ≡ 14 (mod 26).

(c) Dividing |−26| = 26 by 26 yields a remainder of R = 0, so r = 0. Thus, −26 ≡ 0 (mod 26).

In ordinary arithmetic every nonzero number a has a reciprocal or multiplicative inverse, denoted by a^(-1), such that
a a^(-1) = a^(-1) a = 1
In modular arithmetic we have the following corresponding concept:

DEFINITION 2 If a is a number in Z_m, then a number a^(-1) in Z_m is called a reciprocal or multiplicative inverse of a modulo m if a a^(-1) ≡ a^(-1) a ≡ 1 (mod m).

It can be proved that if a and m have no common prime factors, then a has a unique reciprocal modulo m; conversely, if a and m have a common prime factor, then a has no reciprocal modulo m.

E X A M P L E 4 Reciprocal of 3 mod 26 The number 3 has a reciprocal modulo 26 because 3 and 26 have no common prime factors. This reciprocal can be obtained by finding the number x in Z_26 that satisfies the modular equation
3x ≡ 1 (mod 26)
Although there are general methods for solving such modular equations, it would take us too far afield to study them. However, because 26 is relatively small, this equation can be solved by trying the possible solutions, 0 to 25, one at a time. With this approach we find that x = 9 is the solution, because
3 · 9 = 27 ≡ 1 (mod 26)
Thus, 3^(-1) ≡ 9 (mod 26).

E X A M P L E 5 A Number with No Reciprocal mod 26 The number 4 has no reciprocal modulo 26, because 4 and 26 have 2 as a common prime factor (see Exercise 8).

For future reference, in Table 2 we provide the reciprocals modulo 26 of the numbers in Z_26 that have them:
Table 2 Reciprocals Modulo 26
a        1   3   5   7   9  11  15  17  19  21  23  25
a^(-1)   1   9  21  15   3  19   7  23  11   5  17  25
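Reciprocals modulo 26 such as those in Table 2 can be found by the trial method of Example 4. A short Python sketch (the function name is ours):

    def mod_inverse(a, m=26):
        """Return the reciprocal of a modulo m, or None if it does not exist.

        Trying each candidate in turn mirrors Example 4; this is fine for a
        modulus as small as 26."""
        for x in range(m):
            if (a * x) % m == 1:
                return x
        return None

    # Reproduce Table 2: only numbers with no prime factor in common with 26
    # (i.e., odd numbers other than 13) have reciprocals modulo 26.
    for a in range(1, 26):
        inv = mod_inverse(a)
        if inv is not None:
            print(f"{a}^-1 = {inv} (mod 26)")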

Deciphering Every useful cipher must have a procedure for decipherment. In the case of a Hill cipher, decipherment uses the inverse (mod 26) of the enciphering matrix. To be precise, if m is a positive integer, then a square matrix A with entries in Z_m is said to be invertible modulo m if there is a matrix B with entries in Z_m such that
AB ≡ BA ≡ I (mod m)

Suppose now that
A = [a b; c d]
is invertible modulo 26 and this matrix is used in a Hill 2-cipher. If
p = [p_1; p_2]    (1)
is a plaintext vector, then c = Ap (mod 26) is the corresponding ciphertext vector and
p ≡ A^(-1) c (mod 26)
Thus, each plaintext vector can be recovered from the corresponding ciphertext vector by multiplying it on the left by A^(-1) (mod 26). In cryptography it is important to know which matrices are invertible modulo 26 and how to obtain their inverses. We now investigate these questions. In ordinary arithmetic, a square matrix A is invertible if and only if det(A) ≠ 0, or, equivalently, if and only if det(A) has a reciprocal. The following theorem is the analog of this result in modular arithmetic.

THEOREM 10.15.2 A square matrix A with entries in Z_m is invertible modulo m if and only if the residue of det(A) modulo m has a reciprocal modulo m.

Because the residue of det(A) modulo m will have a reciprocal modulo m if and only if this residue and m have no common prime factors, we have the following corollary.

COROLLARY 10.15.3 A square matrix A with entries in Z_m is invertible modulo m if and only if m and the residue of det(A) modulo m have no common prime factors.
Because the only prime factors of 26 are 2 and 13, we have the following corollary, which is useful in cryptography.

COROLLARY 10.15.4 A square matrix A with entries in Z_26 is invertible modulo 26 if and only if the residue of det(A) modulo 26 is not divisible by 2 or 13.

We leave it for you to verify that if
A = [a b; c d]
has entries in Z_26 and the residue of det(A) = ad − bc modulo 26 is not divisible by 2 or 13, then the inverse of A (mod 26) is given by
A^(-1) ≡ (ad − bc)^(-1) [d −b; −c a] (mod 26)    (2)
where (ad − bc)^(-1) is the reciprocal of the residue of ad − bc (mod 26).
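Equation 2 translates directly into code. The Python sketch below computes the inverse of a 2 × 2 matrix modulo 26; because the matrix of Example 6 is not reproduced here, it is illustrated instead with the enciphering matrix of Example 1, which satisfies Corollary 10.15.4 (its determinant, 3, is not divisible by 2 or 13).

    import numpy as np

    def hill2_inverse(A, m=26):
        """Inverse of a 2x2 integer matrix modulo m, following Equation (2).

        Requires the residue of det(A) = ad - bc to have a reciprocal modulo m
        (for m = 26: not divisible by 2 or 13)."""
        a, b = int(A[0, 0]), int(A[0, 1])
        c, d = int(A[1, 0]), int(A[1, 1])
        det = (a * d - b * c) % m
        det_inv = pow(det, -1, m)        # raises ValueError if no reciprocal exists
        adj = np.array([[d, -b],
                        [-c, a]])
        return (det_inv * adj) % m

    A = np.array([[1, 2],
                  [0, 3]])
    A_inv = hill2_inverse(A)
    print(A_inv)                          # [[1 8] [0 9]]
    print((A @ A_inv) % 26)               # the identity matrix (mod 26)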

E X A M P L E 6 Inverse of a Matrix mod 26 Find the inverse of

modulo 26. Solution so from Table 2, Thus, from 2,

As a check,

Similarly,

.

E X A M P L E 7 Decoding a Hill 2-Cipher Decode the following Hill 2-cipher, which was enciphered by the matrix in Example 6:

Solution From Table 1 the numerical equivalent of this ciphertext is To obtain the plaintext pairs, we multiply each ciphertext vector by the inverse of A (obtained in Example 6):

From Table 1, the alphabet equivalents of these vectors are which yields the message

Breaking a Hill Cipher Because the purpose of enciphering messages and information is to prevent “opponents” from learning their contents, cryptographers are concerned with the security of their ciphers—that is, how readily they can be broken (deciphered by their opponents). We will conclude this section by discussing one technique for breaking Hill ciphers. Suppose that you are able to obtain some corresponding plaintext and ciphertext from an opponent's message. For example, on examining some intercepted ciphertext, you may be able to deduce that the message is a letter that begins DEAR SIR. We will show that with a small amount of such data, it may be possible to determine the deciphering matrix of a Hill code and consequently obtain access to the rest of the message. It is a basic result in linear algebra that a linear transformation is completely determined by its values at a basis. This principle suggests that if we have a Hill n-cipher, and if are linearly independent plaintext vectors whose corresponding ciphertext vectors are known, then there is enough information available to determine the matrix A and hence

A^(-1) (mod 26).

The following theorem, whose proof is discussed in the exercises, provides a way to do this.

THEOREM 10.15.5 Determining the Deciphering Matrix Let p_1, p_2, …, p_n be linearly independent plaintext vectors, and let c_1, c_2, …, c_n be the corresponding ciphertext vectors in a Hill n-cipher. If
P = [p_1^T; p_2^T; …; p_n^T]
is the n × n matrix with row vectors p_1^T, p_2^T, …, p_n^T and if
C = [c_1^T; c_2^T; …; c_n^T]
is the n × n matrix with row vectors c_1^T, c_2^T, …, c_n^T, then the sequence of elementary row operations that reduces C to I transforms P to (A^(-1))^T.

This theorem tells us that to find the transpose of the deciphering matrix A^(-1), we must find a sequence of row operations that reduces C to I and then perform this same sequence of operations on P. The following example illustrates a simple algorithm for doing this.
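The row reduction of Theorem 10.15.5 can also be sketched in Python. The helper name below is ours, and since the intercepted message of Example 8 is not reproduced here, the test data simply re-enciphers the assumed known pairs DE and AR with the Example 1 matrix; the sketch further assumes that an invertible pivot can always be found by a row swap.

    import numpy as np

    def inverse_transpose_mod26(C, P):
        """Row-reduce [C | P] modulo 26 to [I | (A^{-1})^T] (Theorem 10.15.5).

        C and P are n x n integer matrices whose rows are the known ciphertext
        and plaintext vectors."""
        n = C.shape[0]
        M = np.hstack([C, P]) % 26
        for j in range(n):
            # find a row at or below j whose pivot entry is invertible mod 26
            pivot = next(i for i in range(j, n) if np.gcd(int(M[i, j]), 26) == 1)
            M[[j, pivot]] = M[[pivot, j]]
            M[j] = (pow(int(M[j, j]), -1, 26) * M[j]) % 26   # scale pivot row to 1
            for i in range(n):
                if i != j:
                    M[i] = (M[i] - M[i, j] * M[j]) % 26      # clear the column
        return M[:, n:]                                      # right half = (A^{-1})^T

    A = np.array([[1, 2], [0, 3]])
    P = np.array([[4, 5], [1, 18]])            # rows: DE, AR (A = 1, ..., Z = 26)
    C = (P @ A.T) % 26                         # rows: the corresponding ciphertext pairs
    print(inverse_transpose_mod26(C, P))       # equals (A^{-1})^T  (mod 26)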

E X A M P L E 8 Using Theorem 10.15.5 The following Hill 2-cipher is intercepted: Decipher the message, given that it starts with the word DEAR. Solution From Table 1, the numerical equivalent of the known plaintext is

and the numerical equivalent of the corresponding ciphertext is

so the corresponding plaintext and ciphertext vectors are

We want to reduce C to I by elementary row operations and simultaneously apply these operations to P to obtain (A^(-1))^T (the transpose of the deciphering matrix). This can be accomplished by adjoining P to the right of C and applying row operations to the resulting matrix [C | P] until the left side is reduced to I. The final matrix will then have the form [I | (A^(-1))^T]. The computations can be carried out as follows:

Thus,

so the deciphering matrix is

To decipher the message, we first group the ciphertext into pairs and find the numerical equivalent of each letter:

Next, we multiply successive ciphertext vectors on the left by A^(-1) and find the alphabet equivalents of the resulting plaintext pairs:

Finally, we construct the message from the plaintext pairs:

Further Readings Readers interested in learning more about mathematical cryptography are referred to the following books, the first of which is elementary and the second more advanced. 1. Abraham Sinkov, Elementary Cryptanalysis, a Mathematical Approach (Mathematical Association of America, 2009). 2. Alan G. Konheim, Cryptography, a Primer (New York: Wiley-Interscience, 1981).

Exercise Set 10.15 1. Obtain the Hill cipher of the message for each of the following enciphering matrices: (a) (b)

Answer:

(a) GIYUOKEVBH (b) SFANEFZWJH 2. In each part determine whether the matrix is invertible modulo 26. If so, find its inverse modulo 26 and check your work by verifying that (mod 26). (a) (b) (c) (d) (e) (f)

Answer: (a) (b) Not invertible (c) (d) Not invertible (e) Not invertible (f) 3. Decode the message given that it is a Hill cipher with enciphering matrix

Answer: WE LOVE MATH 4. A Hill 2-cipher is intercepted that starts with the pairs Find the deciphering and enciphering matrices, given that the plaintext is known to start with the word ARMY. Answer:

5. Decode the following Hill 2-cipher if the last four plaintext letters are known to be ATOM.

Answer: THEY SPLIT THE ATOM 6. Decode the following Hill 3-cipher if the first nine plaintext letters are IHAVECOME:

Answer: I HAVE COME TO BURY CAESAR 7. All of the results of this section can be generalized to the case where the plaintext is a binary message; that is, it is a sequence of 0's and 1's. In this case we do all of our modular arithmetic using modulus 2 rather than modulus 26. Thus, for example, (mod 2). Suppose we want to encrypt the message 110101111. Let us first break it into triplets to form the three vectors

,

,

, and let us take

as our enciphering matrix.

(a) Find the encoded message. (b) Find the inverse modulo 2 of the enciphering matrix, and verify that it decodes your encoded message. Answer: (a) 010110001 (b)

8. If, in addition to the standard alphabet, a period, comma, and question mark were allowed, then 29 plaintext and ciphertext symbols would be available and all matrix arithmetic would be done modulo 29. Under what conditions would a matrix with entries in be invertible modulo 29? Answer: A is invertible modulo 29 if and only if

9. Show that the modular equation . 10. (a)

(mod 29). has no solution in

Let P and C be the matrices in Theorem 10.15.5. Show that

(b) To prove Theorem 10.15.5, let reduce C to I, so

by successively substituting the values .

be the elementary matrices that correspond to the row operations that

Show that

from which it follows that the same sequence of row operations that reduces C to I converts P to 11. (a) If A is the enciphering matrix of a Hill n-cipher, show that

where C and P are the matrices defined in Theorem 10.15.5.

.

(b) Instead of using Theorem 10.15.5 as in the text, find the deciphering matrix of Example 8 by using the result in part (a) and Equation 2 to compute . [Note: Although this method is practical for Hill 2-ciphers, Theorem 10.15.5 is more efficient for Hill n-ciphers with .]

Section 10.15 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. Two integers that have no common factors (except 1) are said to be relatively prime. Given a positive integer n, let , where , be the set of all positive integers less than n and relatively prime to n. For example, if , then (a) Construct a table consisting of n and

for

, and then compute

in each case. Draw a conjecture for and prove your conjecture to be true. [Hint: Use the fact that if a is relatively prime to n, then is also relatively prime to n.] (b) Given a positive integer n and the set

, let

be the

matrix

so that, for example,

Use a computer to compute conjecture.

and

for

, and then use these results to construct a

(c) Use the results of part (a) to prove your conjecture to be true. [Hint: Add the first then use Theorem 2.2.3.] What do these results imply about the inverse of

rows of

to its last row and

?

T2. Given a positive integer n greater than 1, the number of positive integers less than n and relatively prime to n is called the Euler phi function of n and is denoted by . For example, since only two positive integers (1 and 5) are less than 6 and have no common factor with 6. (a) Using a computer, for each value of compute and print out all positive integers that are less than n and relatively prime to n. Then use these integers to determine the values of for . Can you discover a pattern in the results?

(b) It can be shown that if

For example, since

are all the distinct prime factors of n, then

are the distinct prime factors of 12, we have

which agrees with the fact that are the only positive integers less than 12 and relatively prime to 12. Using a computer, print out all the prime factors of n for . Then compute using the formula above and compare it to your results in part (a).

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.16 Genetics In this section we investigate the propagation of an inherited trait in successive generations by computing powers of a matrix.

Prerequisites Eigenvalues and Eigenvectors Diagonalization of a Matrix Intuitive Understanding of Limits

Inheritance Traits In this section we examine the inheritance of traits in animals or plants. The inherited trait under consideration is assumed to be governed by a set of two genes, which we designate by A and a. Under autosomal inheritance each individual in the population of either gender possesses two of these genes, the possible pairings being designated AA, Aa, and aa. This pair of genes is called the individual's genotype, and it determines how the trait controlled by the genes is manifested in the individual. For example, in snapdragons a set of two genes determines the color of the flower. Genotype AA produces red flowers, genotype Aa produces pink flowers, and genotype aa produces white flowers. In humans, eye coloration is controlled through autosomal inheritance. Genotypes AA and Aa have brown eyes, and genotype aa has blue eyes. In this case we say that gene A dominates gene a, or that gene a is recessive to gene A, because genotype Aa has the same outward trait as genotype AA. In addition to autosomal inheritance we will also discuss X-linked inheritance. In this type of inheritance, the male of the species possesses only one of the two possible genes (A or a), and the female possesses a pair of the two genes (AA, aa, or Aa). In humans, color blindness, hereditary baldness, hemophilia, and muscular dystrophy, to name a few, are traits controlled by X-linked inheritance. Below we explain the manner in which the genes of the parents are passed on to their offspring for the two types of inheritance. We construct matrix models that give the probable genotypes of the offspring in terms of the genotypes of the parents, and we use these matrix models to follow the genotype distribution of a population through successive generations.

Autosomal Inheritance In autosomal inheritance an individual inherits one gene from each of its parents' pairs of genes to form its own particular pair. As far as we know, it is a matter of chance which of the two genes a parent passes on to the offspring. Thus, if one parent is of genotype Aa, it is equally likely that the offspring will inherit the A

gene or the a gene from that parent. If one parent is of genotype aa and the other parent is of genotype Aa, the offspring will always receive an a gene from the aa parent and will receive either an A gene or an a gene, with equal probability, from the Aa parent. Consequently, each of the offspring has equal probability of being genotype aa or Aa. In Table 1 we list the probabilities of the possible genotypes of the offspring for all possible combinations of the genotypes of the parents. Table 1

E X A M P L E 1 Distribution of Genotypes in a Population Suppose that a farmer has a large population of plants consisting of some distribution of all three possible genotypes AA, Aa, and aa. The farmer desires to undertake a breeding program in which each plant in the population is always fertilized with a plant of genotype AA and is then replaced by one of its offspring. We want to derive an expression for the distribution of the three possible genotypes in the population after any number of generations. For n = 0, 1, 2, …, let us set
a_n = fraction of plants of genotype AA in the nth generation,
b_n = fraction of plants of genotype Aa in the nth generation, and
c_n = fraction of plants of genotype aa in the nth generation.
Thus a_0, b_0, and c_0 specify the initial distribution of the genotypes. We also have that
a_n + b_n + c_n = 1,  n = 0, 1, 2, …
From Table 1 we can determine the genotype distribution of each generation from the genotype distribution of the preceding generation by the following equations:
a_n = a_(n−1) + (1/2) b_(n−1)
b_n = (1/2) b_(n−1) + c_(n−1)    (1)
c_n = 0

For example, the first of these three equations states that all the offspring of a plant of genotype AA will be of genotype AA under this breeding program and that half of the offspring of a plant of genotype Aa will be of genotype AA.

Equations 1 can be written in matrix notation as
x^(n) = M x^(n−1),  n = 1, 2, …    (2)
where
x^(n) = [a_n; b_n; c_n]  and  M = [1 1/2 0; 0 1/2 1; 0 0 0]
Note that the three columns of the matrix M are the same as the first three columns of Table 1. From Equation 2 it follows that
x^(n) = M x^(n−1) = M^2 x^(n−2) = ⋯ = M^n x^(0)    (3)
Consequently, if we can find an explicit expression for M^n, we can use 3 to obtain an explicit expression for x^(n). To find an explicit expression for M^n, we first diagonalize M. That is, we find an invertible matrix P and a diagonal matrix D such that
M = P D P^(-1)    (4)
With such a diagonalization, we then have
M^n = P D^n P^(-1),  n = 1, 2, …
(see Exercise 1) where D^n is the diagonal matrix whose diagonal entries are the nth powers of those of D.

The diagonalization of M is accomplished by finding its eigenvalues and corresponding eigenvectors. These are as follows (verify):
Eigenvalues: λ_1 = 1, λ_2 = 1/2, λ_3 = 0
Corresponding eigenvectors: v_1 = [1; 0; 0], v_2 = [1; −1; 0], v_3 = [1; −2; 1]
Thus, in Equation 4 we have
P = [1 1 1; 0 −1 −2; 0 0 1]  and  D = [1 0 0; 0 1/2 0; 0 0 0]
Therefore,
x^(n) = M^n x^(0) = P D^n P^(-1) x^(0)
or
[a_n; b_n; c_n] = [1  1 − (1/2)^n  1 − (1/2)^(n−1); 0  (1/2)^n  (1/2)^(n−1); 0  0  0][a_0; b_0; c_0]
Using the fact that a_0 + b_0 + c_0 = 1, we thus have
a_n = 1 − (1/2)^n b_0 − (1/2)^(n−1) c_0
b_n = (1/2)^n b_0 + (1/2)^(n−1) c_0    (5)
c_n = 0

These are explicit formulas for the fractions of the three genotypes in the nth generation of plants in terms of the initial genotype fractions. Because (1/2)^n tends to zero as n approaches infinity, it follows from these equations that
a_n → 1,  b_n → 0,  c_n → 0
as n approaches infinity. That is, in the limit all plants in the population will be genotype AA.
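A quick numerical check of this limiting behavior, using an assumed initial population that is one-third of each genotype:

    import numpy as np

    # Example 1 (AA fertilization): iterate x(n) = M x(n-1) and watch a_n -> 1.
    M = np.array([[1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0],
                  [0.0, 0.0, 0.0]])
    x = np.array([1/3, 1/3, 1/3])          # assumed (a0, b0, c0)

    for n in range(1, 6):
        x = M @ x
        print(n, x)                         # fractions (a_n, b_n, c_n)

    # Compare with the closed form (5): a_n = 1 - (1/2)**n * b0 - (1/2)**(n-1) * c0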

E X A M P L E 2 Modifying Example 1

We can modify Example 1 so that instead of each plant being fertilized with one of genotype AA, each plant is fertilized with a plant of its own genotype. Using the same notation as in Example 1, we then find x^(n) = M^n x^(0), where
M = [1 1/4 0; 0 1/2 0; 0 1/4 1]
The columns of this new matrix M are the same as the columns of Table 1 corresponding to parents with genotypes AA–AA, Aa–Aa, and aa–aa. The eigenvalues of M are (verify)
λ_1 = 1,  λ_2 = 1,  λ_3 = 1/2
The eigenvalue λ = 1 has multiplicity two and its corresponding eigenspace is two-dimensional. Picking two linearly independent eigenvectors v_1 and v_2 in that eigenspace, and a single eigenvector v_3 for the simple eigenvalue λ_3 = 1/2, we have (verify)
v_1 = [1; 0; 0],  v_2 = [0; 0; 1],  v_3 = [1; −2; 1]
The calculations for x^(n) = P D^n P^(-1) x^(0) are then carried out as in Example 1. Thus,
a_n = a_0 + [1/2 − (1/2)^(n+1)] b_0
b_n = (1/2)^n b_0    (6)
c_n = c_0 + [1/2 − (1/2)^(n+1)] b_0
In the limit, as n tends to infinity, (1/2)^n → 0 and (1/2)^(n+1) → 0, so
a_n → a_0 + (1/2) b_0,  b_n → 0,  c_n → c_0 + (1/2) b_0

Thus, fertilization of each plant with one of its own genotype produces a population that in the limit contains only genotypes AA and aa.

Autosomal Recessive Diseases There are many genetic diseases governed by autosomal inheritance in which a normal gene A dominates an abnormal gene a. Genotype AA is a normal individual; genotype Aa is a carrier of the disease but is not afflicted with the disease; and genotype aa is afflicted with the disease. In humans such genetic diseases are often associated with a particular racial group—for instance, cystic fibrosis (predominant among Caucasians), sickle-cell anemia (predominant among people of African origin), Cooley's anemia (predominant among people of Mediterranean origin), and Tay-Sachs disease (predominant among Eastern European Jews). Suppose that an animal breeder has a population of animals that carries an autosomal recessive disease. Suppose further that those animals afflicted with the disease do not survive to maturity. One possible way to control such a disease is for the breeder to always mate a female, regardless of her genotype, with a normal male. In this way, all future offspring will either have a normal father and a normal mother (AA–AA matings) or a normal father and a carrier mother (AA–Aa matings). There can be no AA–aa matings since animals of genotype aa do not survive to maturity. Under this type of mating program no future offspring will be afflicted with the disease, although there will still be carriers in future generations. Let us now determine the fraction of carriers in future generations. We set

x^(n) = [a_n; b_n]
where
a_n = fraction of the population of genotype AA in the nth generation and
b_n = fraction of the population of genotype Aa (carriers) in the nth generation.
Because each offspring has at least one normal parent, we may consider the controlled mating program as one of continual mating with genotype AA, as in Example 1. Thus, the transition of genotype distributions from one generation to the next is governed by the equation x^(n) = M x^(n−1), where
M = [1 1/2; 0 1/2]
Because we know the initial distribution x^(0), the distribution of genotypes in the nth generation is thus given by
x^(n) = M^n x^(0)

The diagonalization of M is easily carried out (see Exercise 4) and leads to
x^(n) = M^n x^(0) = [1  1 − (1/2)^n; 0  (1/2)^n] x^(0)
Because a_0 + b_0 = 1, we have
a_n = 1 − (1/2)^n b_0,  b_n = (1/2)^n b_0    (7)
Thus, as n tends to infinity, we have
a_n → 1,  b_n → 0
so in the limit there will be no carriers in the population. From 7 we see that
b_n = (1/2) b_(n−1)    (8)
That is, the fraction of carriers in each generation is one-half the fraction of carriers in the preceding generation. It would be of interest also to investigate the propagation of carriers under random mating, when two animals mate without regard to their genotypes. Unfortunately, such random mating leads to nonlinear equations, and the techniques of this section are not applicable. However, by other techniques it can be shown that under random mating, Equation 8 is replaced by
b_n = b_(n−1) / [1 + (1/2) b_(n−1)]    (9)

As a numerical example, suppose that the breeder starts with a population in which 10% of the animals are carriers. Under the controlled-mating program governed by Equation 8, the percentage of carriers can be reduced to 5% in one generation. But under random mating, Equation 9 predicts that 9.5% of the population will be carriers after one generation (b_1 ≈ 0.095 if b_0 = 0.10). In addition, under controlled mating no offspring will ever be afflicted with the disease, but with random mating it can be shown that about 1 in 400 offspring will be born with the disease when 10% of the population are carriers.
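The two recursions can be compared numerically. The sketch below uses Equations 8 and 9 as reconstructed above, starting from the 10% figure in the text:

    # Carrier fraction over generations: controlled mating (Equation 8) versus
    # random mating (Equation 9), starting from 10% carriers.
    b_controlled = 0.10
    b_random = 0.10
    for n in range(1, 6):
        b_controlled = 0.5 * b_controlled              # Equation (8)
        b_random = b_random / (1 + 0.5 * b_random)     # Equation (9)
        print(f"generation {n}: controlled {b_controlled:.4f}, random {b_random:.4f}")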

X-Linked Inheritance As mentioned in the introduction, in X-linked inheritance the male possesses one gene (A or a) and the female possesses two genes (AA, Aa, or aa). The term X-linked is used because such genes are found on the X-chromosome, of which the male has one and the female has two. The inheritance of such genes is as follows: A male offspring receives one of his mother's two genes with equal probability, and a female offspring receives the one gene of her father and one of her mother's two genes with equal probability. Readers familiar with basic probability can verify that this type of inheritance leads to the genotype probabilities in Table 2. Table 2

We will discuss a program of inbreeding in connection with X-linked inheritance. We begin with a male and female; select two of their offspring at random, one of each gender, and mate them; select two of the resulting offspring and mate them; and so forth. Such inbreeding is commonly performed with animals. (Among humans, such brother-sister marriages were used by the rulers of ancient Egypt to keep the royal line pure.) The original male-female pair can be one of the six types, corresponding to the six columns of Table 2: The sibling pairs mated in each successive generation have certain probabilities of being one of these six types. To compute these probabilities, for let us set

With these probabilities we form a column vector

From Table 2 it follows that (10) where

For example, suppose that in the -st generation, the sibling pair mated is type . Then their male offspring will be genotype A or a with equal probability, and their female offspring will be genotype AA or Aa with equal probability. Because one of the male offspring and one of the female offspring are chosen at random for mating, the next sibling pair will be one of type , , , or with equal probability. Thus, the second column of M contains “ ” in each of the four rows corresponding to these four sibling pairs. (See Exercise 9 for the remaining columns.) As in our previous examples, it follows from 10 that (11)

After lengthy calculations, the eigenvalues and eigenvectors of M turn out to be

The diagonalization of M then leads to (12) where

We will not write out the matrix product in 12, as it is rather unwieldy. However, if a specific vector x^(0) is given, the calculation for x^(n) is not too cumbersome (see Exercise 6).

Because the absolute values of the last four diagonal entries of D are less than 1, we see that as n tends to infinity,

And so, from Equation 12,

Performing the matrix multiplication on the right, we obtain (verify)

(13)

That is, in the limit all sibling pairs will be either type (A, AA) or type parents are type (that is, and

Thus, in the limit there is probability be

that the sibling pairs will be

. For example, if the initial ), then as n tends to infinity,

, and probability

that they will

.

Exercise Set 10.16 1. Show that if

, then

for

2. In Example 1 suppose that the plants are always fertilized with a plant of genotype Aa rather than one of genotype AA. Derive formulas for the fractions of the plants of genotypes AA, Aa, and aa in the nth generation. Also, find the limiting genotype distribution as n tends to infinity. Answer:

3. In Example 1 suppose that the initial plants are fertilized with genotype AA, the first generation is fertilized with genotype Aa, the second generation is fertilized with genotype AA, and this alternating pattern of fertilization is kept up. Find formulas for the fractions of the plants of genotypes AA, Aa, and aa in the nth generation. Answer:

4. In the section on autosomal recessive diseases, find the eigenvalues and eigenvectors of the matrix M and verify Equation 7. Answer: Eigenvalues:

,

; eigenvectors:

5. Suppose that a breeder has an animal population in which 25% of the population are carriers of an autosomal recessive disease. If the breeder allows the animals to mate irrespective of their genotype, use Equation 9 to calculate the number of generations required for the percentage of carriers to fall from 25% to 10%. If the breeder instead implements the controlled-mating program determined by Equation 8, what will the percentage of carriers be after the same number of generations? Answer: 12 generations; .006% 6. In the section on X-linked inheritance, suppose that the initial parents are equally likely to be of any of the six possible genotype parents; that is,

Using Equation 12, calculate

and also calculate the limit of

as n tends to infinity.

Answer:

;

7. From 13 show that under X-linked inheritance with inbreeding, the probability that the limiting sibling pairs will be of type is the same as the proportion of A genes in the initial population. 8. In X-linked inheritance suppose that none of the females of genotype Aa survive to maturity. Under inbreeding the possible sibling pairs are then Find the transition matrix that describes how the genotype distribution changes in one generation. Answer:

9. Derive the matrix M in Equation 10 from Table 2.

Section 10.16 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. (a) Use a computer to verify that the eigenvalues and eigenvectors of

as given in the text are correct. (b) Starting with

and the assumption that

exists, we must have

This suggests that x can be solved directly using the equation equation , where

. Use a computer to solve the

and along with

; compare your results to Equation 13. Explain why the solution to is not specific enough to determine .

T2. (a) Given

from Equation 12 and

use a computer to show that

(b) Use a computer to calculate limit in part (a).

for

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

, 20, 30, 40, 50, 60, 70, and then compare your results to the

10.17 Age-Specific Population Growth In this section we investigate, using the Leslie matrix model, the growth over time of a female population that is divided into age classes. We then determine the limiting age distribution and growth rate of the population.

Prerequisites Eigenvalues and Eigenvectors Diagonalization of a Matrix Intuitive Understanding of Limits

One of the most common models of population growth used by demographers is the so-called Leslie model developed in the 1940s. This model describes the growth of the female portion of a human or animal population. In this model the females are divided into age classes of equal duration. To be specific, suppose that the maximum age attained by any female in the population is L years (or some other time unit) and we divide the population into n age classes. Then each class is L/n years in duration. We label the age classes according to Table 1. Table 1

Suppose that we know the number of females in each of the n classes at time t = 0. In particular, let there be x_1^(0) females in the first class, x_2^(0) females in the second class, and so forth. With these n numbers we form a column vector:
x^(0) = [x_1^(0); x_2^(0); …; x_n^(0)]
We call this vector the initial age distribution vector.

As time progresses, the number of females within each of the n classes changes because of three biological processes: birth, death, and aging. By describing these three processes quantitatively, we will see how to project the initial age distribution vector into the future. The easiest way to study the aging process is to observe the population at discrete times—say, t_0, t_1, t_2, …, t_k, …. The Leslie model requires that the duration between any two successive observation times be the same as the duration of the age intervals. Therefore, we set
t_0 = 0,  t_1 = L/n,  t_2 = 2L/n,  …,  t_k = kL/n,  …
With this assumption, all females in the (i + 1)-st class at time t_(k+1) were in the ith class at time t_k.

The birth and death processes between two successive observation times can be described by means of the following demographic parameters:
a_i (i = 1, 2, …, n) = the average number of daughters born to each female during the time she is a member of the ith age class
b_i (i = 1, 2, …, n − 1) = the fraction of females in the ith age class that can be expected to survive and pass into the (i + 1)-st age class
By their definitions, we have that
(i) a_i ≥ 0 for i = 1, 2, …, n
(ii) 0 < b_i ≤ 1 for i = 1, 2, …, n − 1
Note that we do not allow any b_i to equal zero, because then no females would survive beyond the ith age class. We also assume that at least one a_i is positive so that some births occur. Any age class for which the corresponding value of a_i is positive is called a fertile age class. We next define the age distribution vector x^(k) at time t_k by
x^(k) = [x_1^(k); x_2^(k); …; x_n^(k)]
where x_i^(k) is the number of females in the ith age class at time t_k. Now, at time t_k, the females in the first age class are just those daughters born between times t_(k−1) and t_k. Thus, we can write
(number of females in class 1 at time t_k) = (number of daughters born between times t_(k−1) and t_k)
or, mathematically,
x_1^(k) = a_1 x_1^(k−1) + a_2 x_2^(k−1) + ⋯ + a_n x_n^(k−1)    (1)
The females in the (i + 1)-st age class (i = 1, 2, …, n − 1) at time t_k are those females in the ith class at time t_(k−1) who are still alive at time t_k. Thus,
(number of females in class i + 1 at time t_k) = (fraction of class i that survives) × (number of females in class i at time t_(k−1))
or, mathematically,
x_(i+1)^(k) = b_i x_i^(k−1),  i = 1, 2, …, n − 1    (2)
Using matrix notation, we can write Equations 1 and 2 as the single matrix equation
x^(k) = L x^(k−1)    (3)
where L is the Leslie matrix
L = [a_1 a_2 ⋯ a_(n−1) a_n; b_1 0 ⋯ 0 0; 0 b_2 ⋯ 0 0; ⋮ ⋮ ⋱ ⋮ ⋮; 0 0 ⋯ b_(n−1) 0]    (4)

From Equation 3 it follows that
x^(1) = L x^(0),  x^(2) = L x^(1) = L^2 x^(0),  …,  x^(k) = L x^(k−1) = L^k x^(0)    (5)
Thus, if we know the initial age distribution x^(0) and the Leslie matrix L, we can determine the female age distribution at any later time.

E X A M P L E 1 Female Age Distribution for Animals Suppose that the oldest age attained by the females in a certain animal population is 15 years and we divide the population into three age classes with equal durations of five years. Let the Leslie matrix for this population be
L = [0 4 3; 1/2 0 0; 0 1/4 0]
If there are initially 1000 females in each of the three age classes, then from Equation 3 we have
x^(0) = [1000; 1000; 1000]
x^(1) = L x^(0) = [7000; 500; 250]
x^(2) = L x^(1) = [2750; 3500; 125]
x^(3) = L x^(2) = [14,375; 1375; 875]
Thus, after 15 years there are 14,375 females between 0 and 5 years of age, 1375 females between 5 and 10 years of age, and 875 females between 10 and 15 years of age.
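The iteration x^(k) = L x^(k−1) is easy to carry out numerically. The short Python sketch below reproduces the numbers of Example 1, using the Leslie matrix as reconstructed above:

    import numpy as np

    # Propagate the age distribution of Example 1 with Equation (5): x(k) = L^k x(0).
    L = np.array([[0.0, 4.0, 3.0],
                  [0.5, 0.0, 0.0],
                  [0.0, 0.25, 0.0]])
    x = np.array([1000.0, 1000.0, 1000.0])

    for k in range(1, 4):
        x = L @ x
        print(f"after {5*k} years:", x)     # k = 3 gives (14375, 1375, 875)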

Limiting Behavior Although Equation 5 gives the age distribution of the population at any time, it does not immediately give a general picture of the dynamics of the growth process. For this we need to investigate the eigenvalues and eigenvectors of the Leslie matrix. The eigenvalues of L are the roots of its characteristic polynomial. As we ask you to verify in Exercise 2, this characteristic polynomial is
p(λ) = λ^n − a_1 λ^(n−1) − a_2 b_1 λ^(n−2) − a_3 b_1 b_2 λ^(n−3) − ⋯ − a_n b_1 b_2 ⋯ b_(n−1)
To analyze the roots of this polynomial, it will be convenient to introduce the function
q(λ) = a_1/λ + a_2 b_1/λ^2 + a_3 b_1 b_2/λ^3 + ⋯ + a_n b_1 b_2 ⋯ b_(n−1)/λ^n    (6)
Using this function, the characteristic equation p(λ) = 0 can be written (verify)
q(λ) = 1  (λ ≠ 0)    (7)
Because all the a_i and b_i are nonnegative, we see that q(λ) is monotonically decreasing for λ greater than zero. Furthermore, q(λ) has a vertical asymptote at λ = 0 and approaches zero as λ → ∞. Consequently, as Figure 10.17.1 indicates, there is a unique λ, say λ = λ_1, such that q(λ_1) = 1. That is, the matrix L has a unique positive eigenvalue. It can also be shown (see Exercise 3) that λ_1 has multiplicity 1; that is, λ_1 is not a repeated root of the characteristic equation. Although we omit the computational details, you can verify that an eigenvector corresponding to λ_1 is
x_1 = [1; b_1/λ_1; b_1 b_2/λ_1^2; …; b_1 b_2 ⋯ b_(n−1)/λ_1^(n−1)]    (8)
Because λ_1 has multiplicity 1, its corresponding eigenspace has dimension 1 (Exercise 3), and so any eigenvector corresponding to it is some multiple of x_1. We can summarize these results in the following theorem.

Figure 10.17.1

THEOREM 10.17.1 Existence of a Positive Eigenvalue A Leslie matrix L has a unique positive eigenvalue λ_1. This eigenvalue has multiplicity 1 and an eigenvector x_1 all of whose entries are positive.

We will now show that the long-term behavior of the age distribution of the population is determined by the positive eigenvalue λ_1 and its eigenvector x_1. In Exercise 9 we ask you to prove the following result.

THEOREM 10.17.2 Eigenvalues of a Leslie Matrix If λ_1 is the unique positive eigenvalue of a Leslie matrix L and λ_k is any other real or complex eigenvalue of L, then |λ_k| ≤ λ_1.

For our purposes the conclusion in Theorem 10.17.2 is not strong enough; we need λ_1 to satisfy |λ_k| < λ_1 for every other eigenvalue λ_k. In this case λ_1 would be called the dominant eigenvalue of L. However, as the following example shows, not all Leslie matrices satisfy this condition.

E X A M P L E 2 Leslie Matrix with No Dominant Eigenvalue Let

Then the characteristic polynomial of L is
p(λ) = λ^3 − 1
The eigenvalues of L are thus the solutions of λ^3 = 1—namely,
λ_1 = 1,  λ_2 = (−1 + √3 i)/2,  λ_3 = (−1 − √3 i)/2
All three eigenvalues have absolute value 1, so the unique positive eigenvalue λ_1 = 1 is not dominant. Note that this matrix has the property that L^3 = I. This means that for any choice of the initial age distribution x^(0), we have
x^(0) = x^(3) = x^(6) = ⋯
The age distribution vector thus oscillates with a period of three time units. Such oscillations (or population waves, as they are called) could not occur if λ_1 were dominant, as we will see below.

It is beyond the scope of this book to discuss necessary and sufficient conditions for λ_1 to be a dominant eigenvalue. However, we will state the following sufficient condition without proof.
THEOREM 10.17.3 Dominant Eigenvalue If two successive entries a_i and a_(i+1) in the first row of a Leslie matrix L are nonzero, then the positive eigenvalue of L is dominant.

Thus, if the female population has two successive fertile age classes, then its Leslie matrix has a dominant eigenvalue. This is always the case for realistic populations if the duration of the age classes is sufficiently small. Note that in Example 2 there is only one fertile age class (the third), so the condition of Theorem 10.17.3 is not satisfied. In what follows, we always assume that the condition of Theorem 10.17.3 is satisfied. Let us assume that L is diagonalizable. This is not really necessary for the conclusions we will draw, but it does simplify the arguments. In this case, L has n eigenvalues, λ_1, λ_2, …, λ_n, not necessarily distinct, and n linearly independent eigenvectors, x_1, x_2, …, x_n, corresponding to them. In this listing we place the dominant eigenvalue λ_1 first. We construct a matrix P whose columns are the eigenvectors of L:
P = [x_1 | x_2 | ⋯ | x_n]
The diagonalization of L is then given by the equation
L = P D P^(-1),  where  D = diag(λ_1, λ_2, …, λ_n)
From this it follows that
L^k = P D^k P^(-1)
for k = 1, 2, …. For any initial age distribution vector x^(0), we then have
x^(k) = L^k x^(0) = P D^k P^(-1) x^(0)
for k = 1, 2, …. Dividing both sides of this equation by λ_1^k and using the fact that D^k = diag(λ_1^k, λ_2^k, …, λ_n^k), we have
(1/λ_1^k) x^(k) = P diag(1, (λ_2/λ_1)^k, …, (λ_n/λ_1)^k) P^(-1) x^(0)    (9)
Because λ_1 is the dominant eigenvalue, we have |λ_i/λ_1| < 1 for i = 2, 3, …, n. It follows that
(λ_i/λ_1)^k → 0  as  k → ∞,  for i = 2, 3, …, n
Using this fact, we can take the limit of both sides of 9 to obtain
lim_(k→∞) (1/λ_1^k) x^(k) = P diag(1, 0, …, 0) P^(-1) x^(0)    (10)
Let us denote the first entry of the column vector P^(-1) x^(0) by the constant c. As we ask you to show in Exercise 4, the right side of 10 can be written as c x_1, where c is a positive constant that depends only on the initial age distribution vector x^(0). Thus, 10 becomes
lim_(k→∞) (1/λ_1^k) x^(k) = c x_1    (11)
Equation 11 gives us the approximation
x^(k) ≈ c λ_1^k x_1    (12)
for large values of k. From 12 we also have
x^(k−1) ≈ c λ_1^(k−1) x_1    (13)
Comparing Equations 12 and 13, we see that
x^(k) ≈ λ_1 x^(k−1)    (14)
for large values of k. This means that for large values of time, each age distribution vector is a scalar multiple of the preceding age distribution vector, the scalar being the positive eigenvalue of the Leslie matrix. Consequently, the proportion of females in each of the age classes becomes constant. As we will see in the following example, these limiting proportions can be determined from the eigenvector x_1.

E X A M P L E 3 Example 1 Revisited The Leslie matrix in Example 1 was
L = [0 4 3; 1/2 0 0; 0 1/4 0]
Its characteristic polynomial is p(λ) = λ^3 − 2λ − 3/8, and you can verify that the positive eigenvalue is λ_1 = 3/2. From 8 the corresponding eigenvector x_1 is
x_1 = [1; b_1/λ_1; b_1 b_2/λ_1^2] = [1; 1/3; 1/18]
From 14 we have
x^(k) ≈ (3/2) x^(k−1)
for large values of k. Hence, every five years the number of females in each of the three classes will increase by about 50%, as will the total number of females in the population. From 12 we have
x^(k) ≈ c (3/2)^k [1; 1/3; 1/18]
Consequently, eventually the females will be distributed among the three age classes in the ratios 1 : 1/3 : 1/18. This corresponds to a distribution of 72% of the females in the first age class, 24% of the females in the second age class, and 4% of the females in the third age class.

E X A M P L E 4 Female Age Distribution for Humans In this example we use birth and death parameters from the year 1965 for Canadian females. Because few women over 50 years of age bear children, we restrict ourselves to the portion of the female population between 0 and 50 years of age. The data are for 5-year age classes, so there are a total of 10 age classes. Rather than writing out the Leslie matrix in full, we list the birth and death parameters as follows:

Using numerical techniques, we can approximate the positive eigenvalue and corresponding eigenvector by

Thus, if Canadian women continued to reproduce and die as they did in 1965, eventually every 5 years their numbers would increase by 7.622%. From the eigenvector , we see that, in the limit, for every 100,000 females between 0 and 5 years of age, there will be 92,594 females between 5 and 10 years of age, 85,881 females between 10 and 15 years of age, and so forth.

Let us look again at Equation 12, which gives the age distribution vector of the population for large times:
x^(k) ≈ c λ_1^k x_1    (15)
Three cases arise according to the value of the positive eigenvalue λ_1:
• The population is eventually increasing if λ_1 > 1.
• The population is eventually decreasing if λ_1 < 1.
• The population eventually stabilizes if λ_1 = 1.
The case λ_1 = 1 is particularly interesting because it determines a population that has zero population growth. For any initial age distribution, the population approaches a limiting age distribution that is some multiple of the eigenvector x_1. From Equations 6 and 7, we see that λ = 1 is an eigenvalue if and only if
a_1 + a_2 b_1 + a_3 b_1 b_2 + ⋯ + a_n b_1 b_2 ⋯ b_(n−1) = 1    (16)
The expression
R = a_1 + a_2 b_1 + a_3 b_1 b_2 + ⋯ + a_n b_1 b_2 ⋯ b_(n−1)    (17)
is called the net reproduction rate of the population. (See Exercise 5 for a demographic interpretation of R.) Thus, we can say that a population has zero population growth if and only if its net reproduction rate is 1.
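Numerically, the dominant eigenvalue λ_1, the limiting age proportions, and the net reproduction rate R can all be obtained from the birth and survival parameters. The Python sketch below (the function name is ours) is illustrated with the three-class population of Examples 1 and 3:

    import numpy as np

    def leslie_summary(a, b):
        """Dominant eigenvalue, limiting age proportions, and net reproduction
        rate of the Leslie matrix built from birth rates a and survival rates b."""
        n = len(a)
        L = np.zeros((n, n))
        L[0, :] = a
        for i in range(n - 1):
            L[i + 1, i] = b[i]
        eigvals, eigvecs = np.linalg.eig(L)
        k = np.argmax(eigvals.real)                       # unique positive eigenvalue
        lam1 = eigvals[k].real
        x1 = np.abs(eigvecs[:, k].real)
        x1 = x1 / x1.sum()                                # limiting age proportions
        R = sum(a[i] * np.prod(b[:i]) for i in range(n))  # net reproduction rate (17)
        return lam1, x1, R

    lam1, x1, R = leslie_summary([0, 4, 3], [0.5, 0.25])
    print(lam1)   # about 1.5
    print(x1)     # about (0.72, 0.24, 0.04)
    print(R)      # 4*0.5 + 3*0.5*0.25 = 2.375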

Exercise Set 10.17 1. Suppose that a certain animal population is divided into two age classes and has a Leslie matrix

(a) Calculate the positive eigenvalue

of L and the corresponding eigenvector

.

(b) Beginning with the initial age distribution vector

calculate (c) Calculate .

,

,

,

, and

, rounding off to the nearest integer when necessary.

using the exact formula

and using the approximation formula

Answer: (a)

(b) (c) 2. Find the characteristic polynomial of a general Leslie matrix given by Equation 4. 3. (a) Show that the positive eigenvalue of a Leslie matrix is always simple. Recall that a root polynomial is simple if and only if . (b) Show that the eigenspace corresponding to 4. Show that the right side of Equation 10 is

of a

has dimension 1.

, where c is the first entry of the column vector

5. Show that the net reproduction rate R, defined by 17, can be interpreted as the average number of daughters born to a single female during her expected lifetime.

.

6. Show that a population is eventually decreasing if and only if its net reproduction rate is less than 1. Similarly, show that a population is eventually increasing if and only if its net reproduction rate is greater than 1. 7. Calculate the net reproduction rate of the animal population in Example 1. Answer: 2.375 8. (For readers with a hand calculator) Calculate the net reproduction rate of the Canadian female population in Example 4. Answer: 1.49611 9. (For readers who have read Section 10.1–Section 10.3) Prove Theorem 10.17.2. [Hint: Write substitute into 7, take the real parts of both sides, and show that .

,

Section 10.17 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. Consider the sequence of Leslie matrices

(a) Use a computer to show that for a suitable choice of a in terms of

,

.

(b) From your results in part (a), conjecture a relationship between a and , where

,

that will make

(c) Determine an expression for when a and ,

and use it to show that all eigenvalues of are related by the equation determined in part (b).

satisfy

T2. Consider the sequence of Leslie matrices

where

,

and

.

(a) Choose a value for n (say, ). For various values of a, b, and p, use a computer to determine the dominant eigenvalue of , and then compare your results to the value of . (b) Show that

which means that the eigenvalues of

must satisfy

(c) Can you now provide a rough proof to explain the fact that

?

T3. Suppose that a population of mice has a Leslie matrix L over a 1-month period and an initial age

distribution vector

given by

(a) Compute the net reproduction rate of the population. (b) Compute the age distribution vector after 100 months and 101 months, and show that the vector after 101 weeks is approximately a scalar multiple of the vector after 100 months. (c) Compute the dominant eigenvalue of L and its corresponding eigenvector. How are they related to your results in part (b)? (d) Suppose you wish to control the mouse population by feeding it a substance that decreases its age-specific birthrates (the entries in the first row of L) by a constant fraction. What range of fractions would cause the population eventually to decrease?

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.18 Harvesting of Animal Populations In this section we employ the Leslie matrix model of population growth to model the sustainable harvesting of an animal population. We also examine the effect of harvesting different fractions of different age groups.

Prerequisites Age-Specific Population Growth (Section 10.17)

Harvesting In Section 10.17 we used the Leslie matrix model to examine the growth of a female population that was divided into discrete age classes. In this section, we investigate the effects of harvesting an animal population growing according to such a model. By harvesting we mean the removal of animals from the population. (The word harvesting is not necessarily a euphemism for “slaughtering”; the animals may be removed from the population for other purposes.) In this section we restrict ourselves to sustainable harvesting policies. By this we mean the following:

DEFINITION 1 A harvesting policy in which an animal population is periodically harvested is said to be sustainable if the yield of each harvest is the same and the age distribution of the population remaining after each harvest is the same.

Thus, the animal population is not depleted by a sustainable harvesting policy; only the excess growth is removed. As in Section 10.17, we will discuss only the females of the population. If the number of males in each age class is equal to the number of females—a reasonable assumption for many populations—then our harvesting policies will also apply to the male portion of the population.

The Harvesting Model Figure 10.18.1 illustrates the basic idea of the model. We begin with a population having a particular age distribution. It undergoes a growth period that will be described by the Leslie matrix. At the end of the growth period, a certain fraction of each age class is harvested in such a way that the unharvested population has the

same age distribution as the original population. This cycle repeats after each harvest so that the yield is sustainable. The duration of the harvest is assumed to be short in comparison with the growth period so that any growth or change in the population during the harvest period can be neglected.

Figure 10.18.1 To describe this harvesting model mathematically, let

be the age distribution vector of the population at the beginning of the growth period. Thus is the number of females in the ith class left unharvested. As in Section 10.17, we require that the duration of each age class be identical with the duration of the growth period. For example, if the population is harvested once a year, then the population is divided into 1-year age classes. If L is the Leslie matrix describing the growth of the population, then the vector is the age distribution vector of the population at the end of the growth period, immediately before the periodic harvest. Let , for , be the fraction of females from the ith class that is harvested. We use these n numbers to form an diagonal matrix

which we will call the harvesting matrix. By definition, we have That is, we can harvest none , all , or some fraction of each of the n classes. Because the number of females in the ith class immediately before each harvest is the ith entry of the vector , the ith entry of the column vector

is the number of females harvested from the ith class. From the definition of a sustainable harvesting policy, we have

or, mathematically,
Lx − HLx = x    (1)
If we write Equation 1 in the form
(I − H)Lx = x    (2)
we see that x must be an eigenvector of the matrix (I − H)L corresponding to the eigenvalue 1. As we will now show, this places certain restrictions on the values of h_1, h_2, …, h_n and x.

Suppose that the Leslie matrix of the population is

(3)

Then the matrix

is (verify)

Thus, we see that (I − H)L is a matrix with the same mathematical form as a Leslie matrix. In Section 10.17 we showed that a necessary and sufficient condition for a Leslie matrix to have 1 as an eigenvalue is that its net reproduction rate also be 1 [see Eq. 16 of Section 10.17]. Calculating the net reproduction rate of (I − H)L and setting it equal to 1, we obtain (verify)
(1 − h_1)[a_1 + a_2 b_1 (1 − h_2) + a_3 b_1 b_2 (1 − h_2)(1 − h_3) + ⋯ + a_n b_1 b_2 ⋯ b_(n−1) (1 − h_2)(1 − h_3) ⋯ (1 − h_n)] = 1    (4)
This equation places a restriction on the allowable harvesting fractions. Only those values of h_1, h_2, …, h_n that satisfy 4 and that lie in the interval 0 ≤ h_i ≤ 1 can produce a sustainable yield.

If do satisfy 4, then the matrix has the desired eigenvalue . Furthermore, this eigenvalue has multiplicity 1, because the positive eigenvalue of a Leslie matrix always has multiplicity 1 (Theorem 10.17.1). This means that there is only one linearly independent eigenvector x satisfying Equation 2. [See Exercise 3(b) of Section 10.17.] One possible choice for x is the following normalized eigenvector:

x_1 = [1; b_1(1 − h_2); b_1 b_2 (1 − h_2)(1 − h_3); …; b_1 b_2 ⋯ b_(n−1) (1 − h_2)(1 − h_3) ⋯ (1 − h_n)]    (5)

Any other solution x of 2 is a multiple of . Thus, the vector determines the proportion of females within each of the n classes after a harvest under a sustainable harvesting policy. But there is an ambiguity in the total number of females in the population after each harvest. This can be determined by some auxiliary condition, such as an ecological or economic constraint. For example, for a population economically supported by the harvester, the largest population the harvester can afford to raise between harvests would determine the particular constant that is multiplied by to produce the appropriate vector x in Equation 2. For a wild population, the natural habitat of the population would determine how large the total population could be between harvests. Summarizing our results so far, we see that there is a wide choice in the values of that will produce a sustainable yield. But once these values are selected, the proportional age distribution of the population after each harvest is uniquely determined by the normalized eigenvector defined by Equation 5. We now consider a few particular harvesting strategies of this type.

Uniform Harvesting With many populations it is difficult to distinguish or catch animals of specific ages. If animals are caught at random, we can reasonably assume that the same fraction of each age class is harvested. We therefore set
h = h_1 = h_2 = ⋯ = h_n
Equation 2 then reduces to (verify)
(1 − h) Lx = x    or    Lx = [1/(1 − h)] x
Hence, 1/(1 − h) must be the unique positive eigenvalue λ_1 of the Leslie growth matrix L. That is,
λ_1 = 1/(1 − h)
Solving for the harvesting fraction h, we obtain
h = 1 − (1/λ_1)    (6)
The vector x, in this case, is the same as the eigenvector of L corresponding to the eigenvalue λ_1. From Equation 8 of Section 10.17, this is
x_1 = [1; b_1/λ_1; b_1 b_2/λ_1^2; …; b_1 b_2 ⋯ b_(n−1)/λ_1^(n−1)]    (7)
From 6 we can see that the larger λ_1 is, the larger is the fraction of animals we can harvest without depleting the population. Note that we need λ_1 > 1 in order for the harvesting fraction h to lie in the interval 0 < h < 1. This is to be expected, because λ_1 > 1 is the condition that the population be increasing.

E X A M P L E 1 Harvesting Sheep For a certain species of domestic sheep in New Zealand with a growth period of 1 year, the following Leslie matrix was found (see G. Caughley, “Parameters for Seasonally Breeding Populations,” Ecology, 48, 1967, pp. 834–839).

The sheep have a lifespan of 12 years, so they are divided into 12 age classes of duration 1 year each. By the use of numerical techniques, the unique positive eigenvalue of L can be found to be From Equation 6, the harvesting fraction h is Thus, the uniform harvesting policy is one in which 15.0 % of the sheep from each of the 12 age classes is harvested every year. From 7 the age distribution vector of the sheep after each harvest is proportional to

(8)

From 8 we see that for every 1000 sheep between 0 and 1 year of age that are not harvested, there are 719 sheep between 1 and 2 years of age, 596 sheep between 2 and 3 years of age, and so forth.

Harvesting Only the Youngest Age Class In some populations only the youngest females are of any economic value, so the harvester seeks to harvest only the females from the youngest age class. Accordingly, let us set
h_1 = h,  h_2 = h_3 = ⋯ = h_n = 0
Equation 4 then reduces to
(1 − h)(a_1 + a_2 b_1 + a_3 b_1 b_2 + ⋯ + a_n b_1 b_2 ⋯ b_(n−1)) = 1
or
(1 − h)R = 1
where R is the net reproduction rate of the population. [See Equation 17 of Section 10.17.] Solving for h, we obtain
h = 1 − (1/R)    (9)
Note from this equation that a sustainable harvesting policy is possible only if R > 1. This is reasonable because only if R > 1 is the population increasing. From Equation 5, the age distribution vector after each harvest is proportional to the vector
x_1 = [1; b_1; b_1 b_2; …; b_1 b_2 ⋯ b_(n−1)]    (10)

E X A M P L E 2 Sustainable Harvesting Policy Let us apply this type of sustainable harvesting policy to the sheep population in Example 1. For the net reproduction rate of the population we find

From Equation 9, the fraction of the first age class harvested is From Equation 10, the age distribution of the sheep population after the harvest is proportional to the vector

(11)

A direct calculation gives us the following (see also Exercise 3):

(12)

The vector is the age distribution vector immediately before the harvest. The total of all entries in is 8.520, so the first entry 2.514 is 29.5% of the total. This means that immediately before each harvest, 29.5% of the population is in the youngest age class. Since 60.2% of this class is harvested, it follows that 17.8% (= 60.2% of 29.5%) of the entire sheep population is harvested each year. This can be compared with the uniform harvesting policy of Example 1, in which 15.0% of the sheep population is harvested each year.
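The harvesting fractions of Equations 6 and 9 follow directly from λ_1 and R. The Python sketch below (the function name is ours) is illustrated with the small three-class population of Section 10.17, since the 12 × 12 sheep matrix is not reproduced here:

    import numpy as np

    def harvest_fractions(a, b):
        """Uniform harvesting fraction (Equation 6) and youngest-class-only
        fraction (Equation 9) for the Leslie matrix with birth rates a and
        survival rates b."""
        n = len(a)
        L = np.zeros((n, n))
        L[0, :] = a
        for i in range(n - 1):
            L[i + 1, i] = b[i]
        lam1 = max(np.linalg.eigvals(L).real)             # unique positive eigenvalue
        R = sum(a[i] * np.prod(b[:i]) for i in range(n))  # net reproduction rate
        return 1 - 1 / lam1, 1 - 1 / R

    h_uniform, h_youngest = harvest_fractions([0, 4, 3], [0.5, 0.25])
    print(h_uniform)    # 1 - 1/1.5   = 0.333...
    print(h_youngest)   # 1 - 1/2.375 = 0.578..., i.e. about 57.9% of the youngest class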

Optimal Sustainable Yield We saw in Example 1 that a sustainable harvesting policy in which the same fraction of each age class is harvested produces a yield of 15.0 % of the sheep population. In Example 2 we saw that if only the youngest age class is harvested, the resulting yield is 17.8 % of the population. There are many other possible sustainable harvesting policies, and each generally provides a different yield. It would be of interest to find a sustainable harvesting policy that produces the largest possible yield. Such a policy is called an optimal sustainable harvesting policy, and the resulting yield is called the optimal sustainable yield. However, determining the optimal sustainable yield requires linear programming theory, which we will not discuss here. We refer you to the following result, which appears in J. R. Beddington and D. B. Taylor, “Optimum Age Specific Harvesting of a Population,” Biometrics, 29, 1973, pp. 801–809.

THEOREM 10.18.1 Optimal Sustainable Yield An optimal sustainable harvesting policy is one in which either one or two age classes are harvested. If two age classes are harvested, then the older age class is completely harvested.

As an illustration, it can be shown that the optimal sustainable yield of the sheep population is attained when

h_1 = 0.522  and  h_9 = 1    (13)
and all other values of h_i are zero. Thus, 52.2 % of the sheep between 0 and 1 year of age and all the sheep between 8 and 9 years of age are harvested. As we ask you to show in Exercise 2, the resulting optimal sustainable yield is 19.9 % of the population.

Exercise Set 10.18 1. Let a certain animal population be divided into three 1-year age classes and have as its Leslie matrix

(a) Find the yield and the age distribution vector after each harvest if the same fraction of each of the three age classes is harvested every year. (b) Find the yield and the age distribution vector after each harvest if only the youngest age class is harvested every year. Also, find the fraction of the youngest age class that is harvested. Answer: (a) of population;

(b) of population;

; harvest 57.9% of youngest age class

2. For the optimal sustainable harvesting policy described by Equations 13, find the vector that specifies the age distribution of the population after each harvest. Also calculate the vector and verify that the optimal sustainable yield is 19.9 % of the population. Answer:

3. Use Equation 10 to show that if only the first age class of an animal population is harvested

where R is the net reproduction rate of the population. 4. If only the ith class of an animal population is to be periodically harvested corresponding harvesting fraction .

, find the

Answer:

5. Suppose that all of the Jth class and a certain fraction periodically harvested . Calculate .

of the Ith class of an animal population is to be

Answer:

Section 10.18 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. The results of Theorem 10.18.1 suggest the following algorithm for determining the optimal sustainable yield.

1. For each value of , set and for and calculate the respective yields. These n calculations give the one-age-class results. Of course, any calculation leading to a value of h not between 0 and 1 is rejected. 2. For each value of

and

j and calculate the respective yields. These

, set

,

and

for

calculations give the two-age-class results. Of

course, any calculation leading to a value of h not between 0 and 1 is again rejected. 3. Of the yields calculated in parts (i) and (ii), the largest is the optimal sustainable yield. Note that there will be at most

calculations in all. Once again, some of these may lead to a value of h not between 0 and 1 and must therefore be rejected. If we use this algorithm for the sheep example in the text, there will be at most calculations to consider. Use a computer to do the two-age-class calculations for , and for or j for Construct a summary table consisting of the values of and the percentage yields using which will show that the largest of these yields occurs when

.

T2. Using the algorithm in Exercise T1 , do the one-age-class calculations for and for for Construct a summary table consisting of the values of and the percentage yields using , which will show that the largest of these yields occurs when . T3. Referring to the mouse population in Exercise T3 of Section 10.17, suppose that reducing the birthrates is not practical, so you instead decide to control the population by uniformly harvesting all of the age classes monthly. (a) What fraction of the population must be harvested monthly to bring the mouse population to equilibrium eventually? (b) What is the equilibrium age distribution vector under this uniform harvesting policy? (c) The total number of mice in the original mouse population was 155. What would be the total number of mice after 5, 10, and 200 months under your uniform harvesting policy?

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.19 A Least Squares Model for Human Hearing In this section we apply the method of least squares approximation to a model for human hearing. The use of this method is motivated by energy considerations.

Prerequisites
Inner Product Spaces
Orthogonal Projection
Fourier Series (Section 6.6)

Anatomy of the Ear We begin with a brief discussion of the nature of sound and human hearing. Figure 10.19.1 is a schematic diagram of the ear showing its three main components: the outer ear, middle ear, and inner ear. Sound waves enter the outer ear where they are channeled to the eardrum, causing it to vibrate. Three tiny bones in the middle ear mechanically link the eardrum with the snail-shaped cochlea within the inner ear. These bones pass on the vibrations of the eardrum to a fluid within the cochlea. The cochlea contains thousands of minute hairs that oscillate with the fluid. Those near the entrance of the cochlea are stimulated by high frequencies, and those near the tip are stimulated by low frequencies. The movements of these hairs activate nerve cells that send signals along various neural pathways to the brain, where the signals are interpreted as sound.

Figure 10.19.1 The sound waves themselves are variations in time of the air pressure. For the auditory system, the most elementary type of sound wave is a sinusoidal variation in the air pressure. This type of sound wave stimulates the hairs within the cochlea in such a way that a nerve impulse along a single neural pathway is produced (Figure 10.19.2). A sinusoidal sound wave can be described by a function of time

(1) where is the atmospheric pressure at the eardrum, is the normal atmospheric pressure, A is the maximum deviation of the pressure from the normal atmospheric pressure, is the frequency of the wave in cycles per second, and is the phase angle of the wave. To be perceived as sound, such sinusoidal waves must have frequencies within a certain range. For humans this range is roughly 20 cycles per second (cps) to 20,000 cps. Frequencies outside this range will not stimulate the hairs within the cochlea enough to produce nerve signals.

Figure 10.19.2 To a reasonable degree of accuracy, the ear is a linear system. This means that if a complex sound wave is a finite sum of sinusoidal components of different amplitudes, frequencies, and phase angles, say, (2) then the response of the ear consists of nerve impulses along the same neural pathways that would be stimulated by the individual components (Figure 10.19.3).

Figure 10.19.3 Let us now consider some periodic sound wave with period T [i.e., one that repeats itself every T seconds] that is not a finite sum of sinusoidal waves. If we examine the response of the ear to such a periodic wave, we find that it is the same as the response to some wave that is the sum of sinusoidal waves. That is, there is some sound wave as given by Equation 2 that produces the same response as the periodic wave, even though the two are different functions of time. We now want to determine the frequencies, amplitudes, and phase angles of the sinusoidal components of this approximating wave. Because it produces the same response as the periodic wave, it is reasonable to expect that the approximating wave has the same period T. This requires that each sinusoidal term in it have period T. Consequently, the frequencies of the sinusoidal components must be integer multiples of the basic frequency 1/T of the periodic wave. Thus, the frequencies in Equation 2 must be of the form k/T for positive integers k. But because the ear cannot perceive sinusoidal waves with frequencies greater than 20,000 cps, we may omit those values of k for which k/T is greater than 20,000. Thus, the approximating wave is of the form (3), where n is the largest integer such that n/T is not greater than 20,000.

We now turn our attention to the values of the amplitudes and the phase angles that appear in Equation 3. There is some criterion by which the auditory system "picks" these values so that the approximating wave produces the same response as the original periodic wave. To examine this criterion, let us set the error function equal to the difference between the two waves. If we consider the trigonometric sum as an approximation to the periodic wave, then this difference is the error in the approximation, an error that the ear cannot perceive. In terms of this error, the criterion for the determination of the amplitudes and the phase angles is that the quantity

(4)

be as small as possible. We cannot go into the physiological reasons for this, but we note that this expression is proportional to the acoustic energy of the error wave over one period. In other words, it is the energy of the difference between the two sound waves that determines whether the ear perceives any difference between them. If this energy is as small as possible, then the two waves produce the same sensation of sound. Mathematically, the function in 4 is the least squares approximation to the original wave from the vector space of continuous functions on the interval [0, T]. (See Section 6.6.)

Least squares approximations by continuous functions arise in a wide variety of engineering and scientific approximation problems. Apart from the acoustics problem just discussed, some other examples follow.

1. Let a function give the axial strain distribution in a uniform rod lying along the x-axis (Figure 10.19.4). The strain energy in the rod is proportional to the integral of the square of the strain over the rod. The closeness of an approximation to the strain distribution can be judged according to the strain energy of the difference of the two strain distributions. That energy is proportional to the integral of the square of the difference, which is a least squares criterion.

2. Let a periodic voltage across a resistor in an electrical circuit be given (Figure 10.19.5). The electrical energy transferred to the resistor during one period T is proportional to the integral of the square of the voltage over one period. If an approximating voltage has the same period as the given one and is to be an approximation to it, then the criterion of closeness might be taken as the energy of the difference voltage. This is proportional to the integral of the square of the difference, which is again a least squares criterion.

3. Let a function give the vertical displacement of a uniform flexible string whose equilibrium position is along the x-axis (Figure 10.19.6). The elastic potential energy of the string is proportional to a corresponding energy integral. If a function is to be an approximation to the displacement, then as before, the energy integral of the difference determines a least squares criterion for the closeness of the approximation.

Figure 10.19.4

Figure 10.19.5

Figure 10.19.6 Least squares approximation is also used in situations where there is no a priori justification for its use, such as for approximating business cycles, population growth curves, sales curves, and so forth. It is used in these cases because of its mathematical simplicity. In general, if no other error criterion is immediately apparent for an approximation problem, the least squares criterion is the one most often chosen. The following result was obtained in Section 6.6.

THEOREM 10.19.1 Minimizing Mean Square Error on [0, 2π]

If f(t) is continuous on [0, 2π], then the trigonometric function g(t) of the form

g(t) = a_0/2 + a_1 cos t + ⋯ + a_n cos nt + b_1 sin t + ⋯ + b_n sin nt

that minimizes the mean square error

(1/2π) ∫_0^{2π} [f(t) − g(t)]^2 dt

has coefficients

a_k = (1/π) ∫_0^{2π} f(t) cos kt dt,  k = 0, 1, 2, …, n
b_k = (1/π) ∫_0^{2π} f(t) sin kt dt,  k = 1, 2, …, n

If the original function f(t) is defined over the interval [0, T] instead of [0, 2π], a change of scale will yield the following result (see Exercise 8):

THEOREM 10.19.2 Minimizing Mean Square Error on [0, T]

If f(t) is continuous on [0, T], then the trigonometric function g(t) of the form

g(t) = a_0/2 + a_1 cos(2πt/T) + ⋯ + a_n cos(2πnt/T) + b_1 sin(2πt/T) + ⋯ + b_n sin(2πnt/T)

that minimizes the mean square error

(1/T) ∫_0^{T} [f(t) − g(t)]^2 dt

has coefficients

a_k = (2/T) ∫_0^{T} f(t) cos(2πkt/T) dt,  k = 0, 1, 2, …, n
b_k = (2/T) ∫_0^{T} f(t) sin(2πkt/T) dt,  k = 1, 2, …, n

E X A M P L E 1 Least Squares Approximation to a Sound Wave

Let a sound wave have a saw-tooth pattern with a basic frequency of 5000 cps (Figure 10.19.7). Assume units are chosen so that the normal atmospheric pressure is at the zero level and the maximum amplitude of the wave is A. The basic period of the wave is T = 1/5000 second, and from t = 0 to t = 1/5000 the function has the equation of the linear ramp shown in the figure. Theorem 10.19.2 then yields the coefficients of the least squares trigonometric approximation (verify).

We can now investigate how the sound wave is perceived by the human ear. We note that 4 × 5000 = 20,000 cps is the highest audible harmonic, so we need only go up to k = 4 in the formulas above. The least squares approximation to the wave is then the sum of the constant term and the first four sinusoidal terms.

The four sinusoidal terms have frequencies of 5000, 10,000, 15,000, and 20,000 cps, respectively. In Figure 10.19.8 we have plotted and over one period. Although is not a very good point-by-point approximation to , to the ear, both and produce the same sensation of sound.

Figure 10.19.7

Figure 10.19.8

As discussed in Section 6.6, the least squares approximation becomes better as the number of terms in the approximating trigonometric polynomial becomes larger. More precisely,

tends to zero as n approaches infinity. We denote this by writing

where the right side of this equation is the Fourier series of . Whether the Fourier series of converges to for each t is another question, and a more difficult one. For most continuous functions encountered in applications, the Fourier series does indeed converge to its corresponding function for each value of t.
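The coefficient formulas in Theorem 10.19.2 are easy to evaluate numerically, which is also the approach suggested in the Technology Exercises below. The following sketch approximates the saw-tooth wave of Example 1 by its order-4 least squares trigonometric approximation; since the exact saw-tooth formula is not reproduced above, the code assumes a simple rising ramp of amplitude A = 1 over each period, and the integrals are computed with the trapezoidal rule.

```python
import numpy as np

# Numerical sketch of Theorem 10.19.2 for the saw-tooth wave of Example 1
# (basic frequency 5000 cps).  A rising ramp of amplitude A = 1 is assumed.
T = 1.0 / 5000.0                          # period in seconds
A = 1.0
t = np.linspace(0.0, T, 20001)
f = A * t / T                             # assumed saw-tooth shape on [0, T]

def coeff(k):
    """Least squares coefficients a_k, b_k of Theorem 10.19.2."""
    a_k = (2.0 / T) * np.trapz(f * np.cos(2 * np.pi * k * t / T), t)
    b_k = (2.0 / T) * np.trapz(f * np.sin(2 * np.pi * k * t / T), t)
    return a_k, b_k

# Only harmonics up to 20,000 cps are audible, so terms through k = 4 suffice.
a0, _ = coeff(0)
approx = np.full_like(t, a0 / 2)
for k in range(1, 5):
    a_k, b_k = coeff(k)
    approx += a_k * np.cos(2 * np.pi * k * t / T) + b_k * np.sin(2 * np.pi * k * t / T)

print("mean square error:", np.trapz((f - approx) ** 2, t) / T)
```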

Exercise Set 10.19 1. Find the trigonometric polynomial of order 3 that is the least squares approximation to the function over the interval . Answer:

2. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function over the interval

.

Answer:

3. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function over the interval , where

Answer:

4. Find the trigonometric polynomial of arbitrary order n that is the least squares approximation to the function over the interval . Answer:

5. Find the trigonometric polynomial of arbitrary order n that is the least squares approximation to the function over the interval , where

Answer:

6. For the inner product

show that (a) (b) (c) 7. Show that the

functions

are orthogonal over the interval

relative to the inner product

defined in Exercise 6.

8. If is defined and continuous on the interval , show that is defined and continuous for in the interval . Use this fact to show how Theorem 10.19.2 follows from Theorem 10.19.1.

Section 10.19 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Let g be the given function. Use a computer to determine the Fourier coefficients a_k and b_k for k = 0, 1, 2, 3, 4, 5. From your results, make a conjecture about the general expressions for a_k and b_k. Test your conjecture by calculating the corresponding trigonometric approximations on the computer and see whether they converge to g(t).

T2. Let g be the given function. Use a computer to determine the Fourier coefficients a_k and b_k for k = 0, 1, 2, 3, 4, 5. From your results, make a conjecture about the general expressions for a_k and b_k. Test your conjecture by calculating the corresponding trigonometric approximations on the computer and see whether they converge to g(t).

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

10.20 Warps and Morphs Among the more interesting image-manipulation techniques available for computer graphics are warps and morphs. In this section we show how linear transformations can be used to distort a single picture to produce a warp, or to distort and blend two pictures to produce a morph.

Prerequisites
Geometry of Linear Operators on R² (Section 4.11)
Linear Independence
Bases in R²

Computer graphics software enables you to manipulate an image in various ways, such as by scaling, rotating, or slanting the image. Distorting an image by separately moving the corners of a rectangle containing the image is another basic image-manipulation technique. Distorting various pieces of an image in different ways is a more complicated procedure that results in a warp of the picture. In addition, warping two different images in complementary ways and blending the warps results in a morph of the two pictures (from the Greek root meaning “shape” or “form”). An example is Figure 10.20.1 in which four photographs of a woman taken over a 50-year period (the four diagonal pictures from top left to bottom right) have been pairwise morphed by different amounts to suggest the gradual aging of the woman.

Figure 10.20.1 The most visible application of warping and morphing images has been the production of special effects in motion pictures and television. However, many scientific and technological applications of such techniques have also arisen—for example, studying the evolution, growth, and development of living organisms, assisting in reconstructive and cosmetic surgery, exploring various designs of a product, and “aging” photographs of missing persons or police suspects.

Warps We begin by describing a simple warp of a triangular region in the plane. Let the three vertices of a triangle be given by the three noncollinear points , , and (Figure 10.20.2a). We will call this triangle the begin-triangle. If v is any point in the begin-triangle, then there are unique constants and such that (1)

Equation 1 expresses the vector as a (unique) linear combination of the two linearly independent vectors and with respect to an origin at . If we set , then we can rewrite 1 as (2) where (3) from the definition of . We say that v is a convex combination of the vectors , , and if 2 and 3 are satisfied and, in addition, the coefficients , , and are nonnegative. It can be shown (Exercise 6) that v lies in the triangle determined by , , and if and only if it is a convex combination of those three vectors.

Figure 10.20.2 Next, given three noncollinear points , , and of an end-triangle (Figure 10.20.2b), there is a unique affine transformation that maps to , to , and to . That is, there is a unique invertible matrix M and a unique vector b such that (4) (See Exercise 5 for the evaluation of M and b.) Moreover, it can be shown (Exercise 3) that the image w of the vector v in 2 under this affine transformation is

(5) This is a basic property of affine transformations: They map a convex combination of vectors to the same convex combination of the images of the vectors. Now suppose that the begin-triangle contains a picture within it (Figure 10.20.3a). That is, to each point in the begin-triangle we assign a gray level, say 0 for white and 100 for black, with any other gray level lying between 0 and 100. In particular, let a scalar-valued function , called the picture-density of the begin-triangle, be defined so that is the gray level at the point v in the begin-triangle. We can now define a picture in the end-triangle, called a warp of the original picture, with a picture-density by defining the gray level at the point w within the end-triangle to be the gray level of the point v in the begin-triangle that maps onto w. In equation form, the picture-density is determined by (6) In this way, as , , and vary over all nonnegative values that add to one, 5 generates all points w in the end-triangle, and 6 generates the gray levels of the warped picture at those points (Figure 10.20.3b).
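A small numerical sketch of Equations 1–6 for a single triangle: the convex coordinates of a point v of the begin-triangle are found by solving the 2 × 2 system of Equation 1, and Equation 5 then gives the warped point w as the same convex combination of the end-triangle's vertices. The vertex coordinates below are hypothetical, since the labeled points of Figure 10.20.2 are not reproduced here.

```python
import numpy as np

# Hypothetical begin- and end-triangle vertices.
v1, v2, v3 = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([1.0, 3.0])
w1, w2, w3 = np.array([1.0, 1.0]), np.array([5.0, 0.5]), np.array([2.0, 4.0])

def convex_coords(v, a, b, c):
    """Solve Equation 1: v - c = c1(a - c) + c2(b - c); then c3 = 1 - c1 - c2."""
    M = np.column_stack((a - c, b - c))
    c1, c2 = np.linalg.solve(M, v - c)
    return c1, c2, 1.0 - c1 - c2

v = np.array([2.0, 1.0])                  # a point of the begin-triangle
c1, c2, c3 = convex_coords(v, v1, v2, v3)
assert min(c1, c2, c3) >= 0               # nonnegative coefficients: v is inside

# Equation 5: the warped point is the same convex combination of the
# end-triangle's vertices.
w = c1 * w1 + c2 * w2 + c3 * w3
print("convex coordinates:", (round(c1, 3), round(c2, 3), round(c3, 3)))
print("warped point w:", w)
```

Exercise 5 obtains the same map by solving six linear equations for the entries of M and b in Equation 4; the convex-combination route above avoids computing M and b explicitly.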

Figure 10.20.3 Equation 6 determines a very simple warp of a picture within a single triangle. More generally, we can break up a picture into many triangular regions and warp each triangular region differently. This gives us much freedom in designing a warp through our choice of triangular regions and how we change them. To this end, suppose we are given a picture contained within some rectangular region of the plane. We choose n points ,

within the rectangle, which we call vertex points, so that they fall on key elements or features of the picture we wish to warp (Figure 10.20.4a). Once the vertex points are chosen, we complete a triangulation of the rectangular region; that is, we draw line segments between the vertex points in such a way that we have the following conditions (Figure 10.20.4b): 1. The line segments form the sides of a set of triangles. 2. The line segments do not intersect. 3. Each vertex point is the vertex of at least one triangle. 4. The union of the triangles is the rectangle. 5. The set of triangles is maximal (i.e., no more vertices can be connected). Note that condition 4 requires that each corner of the rectangle containing the picture be a vertex point.

Figure 10.20.4 One can always form a triangulation from any n vertex points, but the triangulation is not necessarily unique.

For example, Figures 10.20.4b and 10.20.4c are two different triangulations of the set of vertex points in Figure 10.20.4a. Since there are various computer algorithms that perform triangulations very quickly, it is not necessary to perform the tiresome triangulation task by hand; one need only specify the desired vertex points and let a computer generate a triangulation from them. If n is the number of vertex points chosen, it can be shown that the number of triangles m of any triangulation of those points is given by (7) where k is the number of vertex points lying on the boundary of the rectangle, including the four situated at the corner points. The warp is specified by moving the n vertex points , to new locations , according to the changes we desire in the picture (Figures 10.20.5a and 10.20.5b). However, we impose two restrictions on the movements of the vertex points: 1. The four vertex points at the corners of the rectangle are to remain fixed, and any vertex point on a side of the rectangle is to remain fixed or move to another point on the same side of the rectangle. All other vertex points are to remain in the interior of the rectangle. 2. The triangles determined by the triangulation are not to overlap after their vertices have been moved. The first restriction guarantees that the rectangular shape of the begin-picture is preserved. The second restriction guarantees that the displaced vertex points still form a triangulation of the rectangle and that the new triangulation is similar to the original one. For example, Figure 10.20.5c is not an allowable movement of the vertex points shown in Figure 10.20.5a. Although a violation of this condition can be handled mathematically without too much additional effort, the resulting warps usually produce unnatural results and we will not consider them here.

Figure 10.20.5 Figure 10.20.6 is a warp of a photograph of a woman using a triangulation with 94 vertex points and 179 triangles. Note that the vertex points in the begin-triangulation are chosen to lie along key features of the picture (hairline, eyes, lips, etc.). These vertex points were moved to final positions corresponding to those same features in a picture of the woman taken 20 years after the begin-picture. Thus, the warped picture represents the woman forced into her older shape but using her younger gray levels.

Figure 10.20.6

Time-Varying Warps A time-varying warp is the set of warps generated when the vertex points of the begin-picture are moved continually in time from their original positions to specified final positions. This gives us a motion picture in which the begin-picture is continually warped to a final warp. Let us choose time units so that corresponds to our begin-picture and corresponds to our final warp. The simplest way of moving the vertex points from time 0 to time 1 is with constant velocity along straight-line paths from their initial

positions to their final positions. To describe such a motion, let denote the position of the ith vertex point at any time t between 0 and 1. Thus (its given position in the begin-picture) and (its given position in the final warp). In between, we determine its position by (8) Note that 8 expresses as a convex combination of and for each t in [0, 1]. Figure 10.20.7 illustrates a time-varying triangulation of a plain rectangular region with six vertex points. The lines connecting the vertex points at the different times are the space-time paths of these vertex points in this space-time diagram.
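Equation 8 is a straightforward interpolation, as the short sketch below shows; the six vertex positions are hypothetical (the coordinates in Figure 10.20.7 are not reproduced here), with the four corner vertices held fixed as required by the restrictions above.

```python
import numpy as np

# Equation 8: each vertex point moves with constant velocity along a straight
# line from its begin position (t = 0) to its final position (t = 1).
begin = np.array([[0, 0], [4, 0], [4, 3], [0, 3], [1.0, 1.0], [3.0, 2.0]])
final = np.array([[0, 0], [4, 0], [4, 3], [0, 3], [2.0, 1.5], [2.5, 2.5]])

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    positions = (1.0 - t) * begin + t * final    # convex combination for each t
    print(f"t = {t:.2f}:", positions.round(2).tolist())
```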

Figure 10.20.7 Once the positions of the vertex points are computed at time t, a warp is performed between the begin-picture and the triangulation at time t determined by the displaced vertex points at that time. Figure 10.20.8 shows a time-varying warp at five values of t generated from the warp between and shown in Figure 10.20.6.

Figure 10.20.8

Morphs A time-varying morph can be described as a blending of two time-varying warps of two different pictures using two triangulations that match corresponding features in the two pictures. One of the two pictures is designated as the begin-picture and the other as the end-picture. First, a time-varying warp from to is generated in which the begin-picture is warped into the shape of the end-picture. Then a time-varying warp from to is generated in which the end-picture is warped into the shape of the begin-picture. Finally, a weighted average of the gray levels of the two warps at each time t is produced to generate the morph of the two images at time t. Figure 10.20.9 shows two photographs of a woman taken 20 years apart. Below the pictures are two corresponding triangulations in which corresponding features of the two photographs are matched. The time-varying morph between these two pictures for five values of t between 0 and 1 is shown in Figure 10.20.10.

Figure 10.20.9

Figure 10.20.10

The procedure for producing such a morph is outlined in the following nine steps (Figure 10.20.11): Step 1 Given a begin-picture with picture-density and an end-picture with picture-density vertex points , in the begin-picture at key features of that picture. Step 2 Position n corresponding vertex points features of that picture.

,

, position n

in the end-picture at the corresponding key

Step 3 Triangulate the begin- and end-pictures in similar ways by drawing lines between corresponding vertex points in both pictures. Step 4 For any time t between 0 and 1, find the vertex points that time, using the formula

,

in the morph picture at

(9) Step 5 Triangulate the morph picture at time t similar to the begin- and end-picture triangulations. Step 6 For any point u in the morph picture at time t, find the triangle in the triangulation of the morph picture in which it lies and the vertices , , and of that triangle. (See Exercise 1 to determine whether a given point lies in a given triangle.) Step 7 Express u as a convex combination of such that

,

, and

by finding the constants

,

, and

(10) and (11) Step 8 Determine the locations of the point u in the begin- and end-pictures using (12) and (13) Step 9 Finally, determine the picture-density

of the morph-picture at the point u using (14)

Step 9 is the key step in distinguishing a warp from a morph. Equation 14 takes weighted averages of the gray levels of the begin- and end-pictures to produce the gray levels of the morph-picture. The weights depend on the fraction of the distances that the vertex points have moved from their beginning positions to their ending positions. For example, if the vertex points have moved one-fourth of the way to their destinations (i.e., if ), then we use one-fourth of the gray levels of the end-picture and three-fourths of the gray levels of

the begin-picture. Thus, as time progresses, not only does the shape of the begin-picture gradually change into the shape of the end-picture (as in a warp) but the gray levels of the begin-picture also gradually change into the gray levels of the end-picture.
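Steps 4–9 can be carried out for one point of one triangle with only a few lines of code. The sketch below assumes hypothetical picture-density functions and triangle vertices (a real implementation would sample gray levels from the two photographs); it interpolates the vertices with Equation 9, finds the convex coordinates of Equations 10–11, locates the corresponding points in the begin- and end-pictures with Equations 12–13, and blends the gray levels with Equation 14.

```python
import numpy as np

# Hypothetical picture-density functions (gray levels 0-100); a real
# implementation would sample pixel data from the two photographs.
def density_begin(p):
    return 50.0 + 50.0 * np.sin(p[0])

def density_end(p):
    return 50.0 + 50.0 * np.cos(p[1])

def convex_coords(u, a, b, c):
    """Coefficients expressing u as a convex combination of a, b, c (Eqs. 10-11)."""
    M = np.column_stack((a - c, b - c))
    c1, c2 = np.linalg.solve(M, u - c)
    return np.array([c1, c2, 1.0 - c1 - c2])

def morph_density(u, t, begin_tri, end_tri):
    """Steps 4-9 for one point u of one triangle of the morph at time t."""
    begin_tri, end_tri = np.asarray(begin_tri), np.asarray(end_tri)
    morph_tri = (1.0 - t) * begin_tri + t * end_tri          # Equation 9
    c = convex_coords(u, *morph_tri)
    p_begin = c @ begin_tri                                   # Equation 12
    p_end = c @ end_tri                                       # Equation 13
    return (1.0 - t) * density_begin(p_begin) + t * density_end(p_end)  # Equation 14

begin_tri = [[0.0, 0.0], [4.0, 0.0], [1.0, 3.0]]              # hypothetical triangles
end_tri = [[0.0, 0.0], [4.0, 1.0], [2.0, 3.0]]
print(morph_density(np.array([2.0, 1.0]), 0.25, begin_tri, end_tri))
```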

Figure 10.20.11 The procedure described above to generate a morph is cumbersome to perform by hand, but it is the kind of dull, repetitive procedure at which computers excel. A successful morph demands good preparation and requires more artistic ability than mathematical ability. (The software designer is required to have the mathematical ability.) The two photographs to be morphed should be carefully chosen so that they have matching features, and the vertex points in the two photographs also should be carefully chosen so that the triangles in the two resulting triangulations contain similar features of the two pictures. When the procedure is done correctly, each frame of the morph should look just as “real” as the begin- and end-pictures. The techniques we have discussed in this section can be generalized in numerous ways to produce much more elaborate warps and morphs. For example: 1. If the pictures are in color, the three components of the picture colors (red, green, and blue) can be morphed separately to produce a color morph. 2. Rather than following straight-line paths to their destinations, the vertices of a triangulation can be directed separately along more complicated paths to produce a variety of results. 3. Rather than travel with constant speeds along their paths, the vertices of a triangulation can be directed to have different speeds at different times. For example, in a morph between two faces, the hairline can be made to change first, then the nose, and so forth. 4. Similarly, the gray-level mixing of the begin-picture and end-picture at different times and different vertices can be varied in a more complicated way than that in Equation 14. 5. One can morph two surfaces in three-dimensional space (representing two complete heads, for example) by triangulating the surfaces and using the techniques in this section.

6. One can morph two solids in three-dimensional space (for example, two three-dimensional tomographs of a beating human heart at two different times) by dividing the two solids into corresponding tetrahedral regions. 7. Two film strips can be morphed frame by frame by different amounts between each pair of frames to produce a morphed film strip in which, say, an actor walking along a set is gradually morphed into an ape walking along the set. 8. Instead of using straight lines to triangulate two pictures to be morphed, more complicated curves, such as spline curves, can be matched between the two pictures. 9. Three or more pictures can be morphed together by generalizing the formulas given in this section. These and other generalizations have made warping and morphing two of the most active areas in computer graphics.

Exercise Set 10.20 1. Determine whether the vector v is a convex combination of the vectors , , and . Do this by solving Equations 1 and 3 for , , and and ascertaining whether these coefficients are nonnegative. (a) (b) (c) (d)

Answer: (a) Yes; (b) No; (c) Yes; (d) Yes; 2. Verify Equation 7 for the two triangulations given in Figure 10.20.4. Answer: number of triangles ; Equation 7) is

number of vertex points .

,

number of boundary vertex points

3. Let an affine transformation be given by a 2 × 2 matrix M and a two-dimensional vector b. Let , where ; let ; and let for i = 1, 2, 3. Show that . (This shows that an affine transformation maps a convex combination of vectors to the same convex combination of the images of the vectors.)

Answer:

4. (a) Exhibit a triangulation of the points in Figure 10.20.4 in which the points vertices of a single triangle.

,

, and

form the

(b) Exhibit a triangulation of the points in Figure 10.20.4 in which the points the vertices of a single triangle.

,

, and

do not form

Answer: (a)

(b)

5. Find the matrix M and two-dimensional vector b that define the affine transformation that maps the three vectors , , and to the three vectors , , and . Do this by setting up a system of six linear equations for the four entries of the matrix M and the two entries of the vector b. (a) (b) (c) (d)

Answer:

(a) (b) (c) (d)

6. (a) Let a and b be linearly independent vectors in the plane. Show that if and are nonnegative numbers such that , then the vector lies on the line segment connecting the tips of the vectors a and b. (b) Let a and b be linearly independent vectors in the plane. Show that if and are nonnegative numbers such that , then the vector lies in the triangle connecting the origin and the tips of the vectors a and b. [Hint: First examine the vector multiplied by the scale factor .] (c) Let , , and be noncollinear points in the plane. Show that if , numbers such that , then the vector connecting the tips of the three vectors. [Hint: Let and Equation 1 and part (b) of this exercise.]

, and are nonnegative lies in the triangle , and then use

7. (a) What can you say about the coefficients , , and that determine a convex combination if v lies on one of the three vertices of the triangle determined by the three vectors , , and ? (b) What can you say about the coefficients , , and that determine a convex combination if v lies on one of the three sides of the triangle determined by the three vectors , , and ? (c) What can you say about the coefficients , , and that determine a convex combination if v lies in the interior of the triangle determined by the three vectors and ?

,

,

Answer: (a) Two of the coefficients are zero. (b) At least one of the coefficients is zero. (c) None of the coefficients are zero. 8. (a) The centroid of a triangle lies on the line segment connecting any one of the three vertices of the triangle with the midpoint of the opposite side. Its location on this line segment is two-thirds of the distance from the vertex. If the three vertices are given by the vectors , , and , write the centroid as a convex combination of these three vectors. (b) Use your result in part (a) to find the vector defining the centroid of the triangle with the three vertices ,

, and

.

Answer: (a) (b)

Section 10.20 Technology Exercises The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. T1. To warp or morph a surface in

, and

we must be able to triangulate the surface. Let

be three noncollinear vectors on the surface. Then a vector

,

lies in the

triangle formed by these three vectors if and only if v is a convex combination of the three vectors; that is, for some nonnegative coefficients , , and whose sum is 1. (a) Show that in this case,

,

, and

are solutions of the following linear system:

In parts (b)–(d) determine whether the vector v is a convex combination of the vectors

, and (b)

(c)

.

,

(d)

T2. To warp or morph a solid object in ,

,

we first partition the object into disjoint tetrahedrons. Let

, and

be four noncoplanar vectors. Then a vector

lies in the solid tetrahedron formed by these four vectors if and only if v is a convex combination of the three vectors; that is, whose sum is one. (a) Show that in this case,

for some nonnegative coefficients ,

,

, and

, and

,

, and

are solutions of the following linear system:

In parts (b)–(d) determine whether the vector v is a convex combination of the vectors

,

,

.

(b)

(c)

(d)

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.


APPENDIX

A

How to Read Theorems

Since many of the most important concepts in linear algebra occur as theorem statements, it is important to be familiar with the various ways in which theorems can be structured. This appendix will help you to do that.

Contrapositive Form of a Theorem The simplest theorems are of the form (1) where H is a statement, called the hypothesis, and C is a statement, called the conclusion. The theorem is true if the conclusion is true whenever the hypothesis is true, and the theorem is false if there is some case where the hypothesis is true but the conclusion is false. It is common to denote a theorem of form 1 as (2) (read, “H implies C”). As an example, the theorem (3) is of form 2, where (4)

(5)

Sometimes it is desirable to phrase theorems in a negative way. For example, the theorem in 3 can be rephrased equivalently as (6). If we write ¬H to mean that 4 is false and ¬C to mean that 5 is false, then the structure of the theorem in 6 is

(7) ¬C ⇒ ¬H

In general, any theorem of form 2 can be rephrased in form 7, which is called the contrapositive of 2. If a theorem is true, then so is its contrapositive, and vice versa.

Converse of a Theorem The converse of a theorem is the statement that results when the hypothesis and conclusion are interchanged. Thus, the converse of the theorem H ⇒ C is the statement C ⇒ H. Whereas the contrapositive of a true theorem must itself be a true theorem, the converse of a true theorem may or may not be true. For example, the converse of 3 is the false statement but the converse of the true theorem (8) is the true theorem (9)

Equivalent Statements

If a theorem H ⇒ C and its converse C ⇒ H are both true, then we say that H and C are equivalent statements, which we denote by writing

(10) H ⇔ C

(read, "H and C are equivalent"). There are various ways of phrasing equivalent statements as a single theorem. Here are three ways in which 8 and 9 can be combined into a single theorem.

Form 1 If … , then … , and conversely, if … , then … .

Form 2 … if and only if … .

Form 3 The following statements are equivalent. (i) … (ii) …

Theorems Involving Three or More Statements Sometimes two true theorems will give you a third true theorem for free. Specifically, if is a true theorem, and is a true theorem, then must also be a true theorem. For example, the theorems and imply the third theorem Sometimes three theorems yield equivalent statements for free. For example, if (11) then we have the implication loop in Figure A.1 from which we can conclude that (12) Combining this with 11 we obtain (13) In summary, if you want to prove the three equivalences in 13, you need only prove the three implications in 11.

Figure A.1

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

APPENDIX

B

Complex Numbers

Complex numbers arise naturally in the course of solving polynomial equations. For example, the solutions of the quadratic equation ax² + bx + c = 0, which are given by the quadratic formula

x = (−b ± √(b² − 4ac)) / (2a)

are complex numbers if the expression inside the radical is negative. In this appendix we will review some of the basic ideas about complex numbers that are used in this text.

Complex Numbers

To deal with the problem that the equation x² + 1 = 0 has no real solutions, mathematicians of the eighteenth century invented the "imaginary" number

i = √(−1)

which is assumed to have the property i² = −1 but which otherwise has the algebraic properties of a real number. An expression of the form a + bi in which a and b are real numbers is called a complex number. Sometimes it will be convenient to use a single letter, typically z, to denote a complex number, in which case we write z = a + bi. The number a is called the real part of z and is denoted by Re(z), and the number b is called the imaginary part of z and is denoted by Im(z). Thus,

Re(a + bi) = a and Im(a + bi) = b

Two complex numbers are considered equal if and only if their real parts are equal and their imaginary parts are equal; that is, a + bi = c + di if and only if a = c and b = d. A complex number whose real part is zero is said to be pure imaginary. A complex number whose imaginary part is zero is a real number, so the real numbers can be viewed as a subset of the complex numbers.

Complex numbers are added, subtracted, and multiplied in accordance with the standard rules of algebra but with i² = −1:

(1) (a + bi) + (c + di) = (a + c) + (b + d)i

(2) (a + bi) − (c + di) = (a − c) + (b − d)i

(3) (a + bi)(c + di) = (ac − bd) + (ad + bc)i

The multiplication formula is obtained by expanding the left side and using the fact that i² = −1. Also note that if b = 0, then the multiplication formula simplifies to

(4) a(c + di) = ac + adi

The set of complex numbers with these operations is commonly denoted by the symbol C and is called the complex number system.

E X A M P L E 1 Multiplying Complex Numbers As a practical matter, it is usually more convenient to compute products of complex numbers by expansion, rather than substituting in 3. For example,

The Complex Plane A complex number can be associated with the ordered pair of real numbers and represented geometrically by a point or a vector in the xy-plane (Figure B.1). We call this the complex plane. Points on the x-axis have an imaginary part of zero and hence correspond to real numbers, whereas points on the y-axis have a real part of zero and correspond to pure imaginary numbers. Accordingly, we call the x-axis the real axis and the y-axis the imaginary axis (Figure B.2).

Figure B.1

Figure B.2 Complex numbers can be added, subtracted, or multiplied by real numbers geometrically by performing these operations on their associated vectors (Figure B.3, for example). In this sense the complex number system C is closely related to , the main difference being that complex numbers can be multiplied to produce other complex numbers, whereas there is no multiplication operation on that produces other vectors in (the dot product produces a scalar, not a vector in ).

Figure B.3 If is a complex number, then the complex conjugate of z, or more simply, the conjugate of z, is denoted by (read, “z bar”) and is defined by (5) Numerically, is obtained from z by reversing the sign of the imaginary part, and geometrically it is obtained by reflecting the vector for z about the real axis (Figure B.4).

Figure B.4

E X A M P L E 2 Some Complex Conjugates

Remark The last computation in this example illustrates the fact that a real number is equal to its complex conjugate. More generally, z̄ = z if and only if z is a real number. The following computation shows that the product of a complex number and its conjugate is a nonnegative real number:

(6) z z̄ = (a + bi)(a − bi) = a² + b²

You will recognize that √(a² + b²) is the length of the vector corresponding to z (Figure B.5); we call this length the modulus (or absolute value) of z and denote it by |z|. Thus,

(7) |z| = √(a² + b²) = √(z z̄)

Note that if b = 0, then z = a is a real number and |z| = √(a²) = |a|, which tells us that the modulus of a real number is the same as its absolute value as defined in beginning algebra.

Figure B.5

E X A M P L E 3 Some Modulus Computations

Reciprocals and Division

If z ≠ 0, then the reciprocal (or multiplicative inverse) of z is denoted by 1/z (or z⁻¹) and is defined by the property

z(1/z) = 1

This equation has a unique solution for 1/z, which we can obtain by multiplying both sides by z̄ and using the fact that z z̄ = |z|² [see 7]. This yields

(8) 1/z = z̄ / |z|²

If z₂ ≠ 0, then the quotient z₁/z₂ is defined to be the product of z₁ and 1/z₂. This yields the formula

(9) z₁/z₂ = z₁z̄₂ / |z₂|²

Observe that the expression on the right side of 9 results if the numerator and denominator of z₁/z₂ are multiplied by z̄₂. As a practical matter, this is often the best way to perform divisions of complex numbers.
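Formulas 8 and 9 are easy to check numerically with Python's built-in complex type; the values of z₁ and z₂ below are arbitrary samples, not the numbers used in Example 4.

```python
# Numerical check of Formulas 8 and 9 (arbitrary sample values).
z1, z2 = 3 + 4j, 1 - 2j

reciprocal = z2.conjugate() / abs(z2) ** 2        # Formula 8: 1/z = z-bar / |z|^2
quotient = z1 * z2.conjugate() / abs(z2) ** 2     # Formula 9: z1/z2 = z1 z2-bar / |z2|^2

print(reciprocal, 1 / z2)                         # the two values agree
print(quotient, z1 / z2)
```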

E X A M P L E 4 Division of Complex Numbers Let

and

. Express

in the form

.

Solution We will multiply the numerator and denominator of

by

. This yields

The following theorems list some useful properties of the modulus and conjugate operations.

THEOREM B.1 The following results hold for any complex numbers

and

.

and

.

(a) (b) (c) (d) (e)

THEOREM B.2 The following results hold for any complex numbers (a) (b) (c) (d)

Polar Form of a Complex Number

If z = a + bi is a nonzero complex number, and if φ is an angle from the real axis to the vector z, then, as suggested in Figure B.6, the real and imaginary parts of z can be expressed as

(10) a = |z| cos φ and b = |z| sin φ

Thus, the complex number z = a + bi can be expressed as

(11) z = |z|(cos φ + i sin φ)

which is called a polar form of z. The angle φ in this formula is called an argument of z. The argument of z is not unique because we can add or subtract any multiple of 2π to it to obtain a different argument of z. However, there is only one argument whose radian measure satisfies

(12) −π < φ ≤ π

This is called the principal argument of z.

Figure B.6

E X A M P L E 5 Polar Form of a Complex Number Express

in polar form using the principal argument.

Solution The modulus of z is

Thus, it follows from 10 with

and

that

and this implies that

The unique angle that satisfies these equations and whose radian measure satisfies 12 is (Figure B.7). Thus, a polar form of z is

Figure B.7

Geometric Interpretation of Multiplication and Division of Complex Numbers

We now show how polar forms of complex numbers provide geometric interpretations of multiplication and division. Let

z₁ = |z₁|(cos φ₁ + i sin φ₁) and z₂ = |z₂|(cos φ₂ + i sin φ₂)

be polar forms of the nonzero complex numbers z₁ and z₂. Multiplying, we obtain

z₁z₂ = |z₁||z₂|[(cos φ₁ cos φ₂ − sin φ₁ sin φ₂) + i(sin φ₁ cos φ₂ + cos φ₁ sin φ₂)]

Now applying the trigonometric identities

cos(φ₁ + φ₂) = cos φ₁ cos φ₂ − sin φ₁ sin φ₂
sin(φ₁ + φ₂) = sin φ₁ cos φ₂ + cos φ₁ sin φ₂

yields

(13) z₁z₂ = |z₁||z₂|[cos(φ₁ + φ₂) + i sin(φ₁ + φ₂)]

which is a polar form of the complex number with modulus |z₁||z₂| and argument φ₁ + φ₂. Thus, we have shown that multiplying two complex numbers has the geometric effect of multiplying their moduli and adding their arguments (Figure B.8).

Figure B.8

Similar kinds of computations show that

(14) z₁/z₂ = (|z₁|/|z₂|)[cos(φ₁ − φ₂) + i sin(φ₁ − φ₂)]

which tells us that dividing complex numbers has the geometric effect of dividing their moduli and subtracting their arguments (both in the appropriate order).
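Formulas 13 and 14 can be verified numerically with Python's cmath module, which converts between rectangular and polar forms; the two sample numbers below are arbitrary choices, not those of Example 6.

```python
import cmath

# Numerical check of Formulas 13 and 14 for two arbitrary complex numbers.
z1, z2 = 2 + 2j, -1 + 1j

r1, phi1 = cmath.polar(z1)                        # modulus and argument of z1
r2, phi2 = cmath.polar(z2)

product = cmath.rect(r1 * r2, phi1 + phi2)        # multiply moduli, add arguments
quotient = cmath.rect(r1 / r2, phi1 - phi2)       # divide moduli, subtract arguments

print(product, z1 * z2)                           # agree up to rounding error
print(quotient, z1 / z2)
```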

E X A M P L E 6 Multiplying and Dividing in Polar Form Use polar forms of the complex numbers .

and

to compute

and

Solution Polar forms of these complex numbers are

(verify). Thus, it follows from 13 that

and from 14 that

As a check, let us calculate

and

directly:

which agrees with the results obtained using polar forms.

Remark The complex number i has a modulus of 1 and a principal argument of . Thus, if z is a complex number, then has the same modulus as z but its argument is greater by ; that is, multiplication by i has the geometric effect of rotating the vector z counterclockwise by 90° (Figure B.9).

Figure B.9

DeMoivre's Formula

If n is a positive integer, and if z is a nonzero complex number with polar form

z = |z|(cos φ + i sin φ)

then raising z to the nth power yields

zⁿ = |z|ⁿ[cos(φ + φ + ⋯ + φ) + i sin(φ + φ + ⋯ + φ)]

which we can write more succinctly as

(15) zⁿ = |z|ⁿ(cos nφ + i sin nφ)

In the special case where |z| = 1 this formula simplifies to

zⁿ = cos nφ + i sin nφ

which, using the polar form for z, becomes

(16) (cos φ + i sin φ)ⁿ = cos nφ + i sin nφ

This result is called DeMoivre's formula.

Euler's Formula

If θ is a real number, say the radian measure of some angle, then the complex exponential function e^{iθ} is defined to be

(17) e^{iθ} = cos θ + i sin θ

which is sometimes called Euler's formula. One motivation for this formula comes from the Maclaurin series in calculus. Readers who have studied infinite series in calculus can deduce 17 by formally substituting iθ for x in the Maclaurin series for e^x and writing

e^{iθ} = 1 + iθ + (iθ)²/2! + (iθ)³/3! + ⋯
       = (1 − θ²/2! + θ⁴/4! − ⋯) + i(θ − θ³/3! + θ⁵/5! − ⋯)
       = cos θ + i sin θ

where the last step follows from the Maclaurin series for cos θ and sin θ. If z = a + bi is any complex number, then the complex exponential e^z is defined to be

(18) e^z = e^a(cos b + i sin b)

It can be proved that complex exponentials satisfy the standard laws of exponents. Thus, for example, e^{z₁ + z₂} = e^{z₁} e^{z₂}.
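Formulas 16, 17, and 18 can also be checked numerically; the values of θ, n, and z below are arbitrary.

```python
import cmath
import math

theta, n = 0.7, 5

# Euler's formula (17): e^(i*theta) = cos(theta) + i*sin(theta)
print(cmath.exp(1j * theta), complex(math.cos(theta), math.sin(theta)))

# DeMoivre's formula (16): (cos(theta) + i*sin(theta))^n = cos(n*theta) + i*sin(n*theta)
lhs = complex(math.cos(theta), math.sin(theta)) ** n
rhs = complex(math.cos(n * theta), math.sin(n * theta))
print(lhs, rhs)

# Formula 18: for z = a + bi, e^z = e^a (cos(b) + i*sin(b))
z = 1.2 + 0.8j
print(cmath.exp(z), math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag)))
```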

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.

Answer to Exercises Exercise Set 1.1 1. (a), (c), and (f) are linear equations; (b), (d) and (e) are not linear equations 3. (a) and (d) are linear systems; (b) and (c) are not linear systems 5. (a) and (d) are both consistent 7. (a), (d), and (e) are solutions; (b) and (c) are not solutions 9.

a.

b.

11.

a.

b.

c. d.

13.

a.

b. c.

d.

True/False 1.1 (a) True (b) False (c) True (d) True (e) False (f) False (g) True (h) False

Exercise Set 1.2 1.

a. Both b. Both c. Both d. Both e. Both f. Both g. Row echelon

3.

a. b. c.

d. Inconsistent 5. 7. 9. 11. 13. Has nontrivial solutions 15. Has nontrivial solutions 17. 19. 21. 23. 25. If

, there are infinitely many solutions; if

, there are no solutions; if

, there is exactly one solution.

27. If

, there are infinitely many solutions; if

, there are no solutions; if

, there is exactly one solution.

29. 31.

and

are possible answers.

35. 37. 39. The nonhomogeneous system will have exactly one solution.

True/False 1.2 (a) True (b) False (c) False (d) True (e) True (f) False (g) True (h) False (i) False

Exercise Set 1.3 1.

a. Undefined b. c. Undefined d. Undefined e. f. g. Undefined h.

3.

a.

b.

c.

d. e. Undefined f.

g.

h.

i. 5 j. k. 168 l. Undefined 5.

a.

b. Undefined c.

d.

e.

f. g. h.

i. 61 j. 35 k. 28 l. 99 7.

a. b. c.

d.

e. f.

9.

a.

b.

11.

a.

b.

13.

a.

b.

15. 17. 23.

a.

b.

c.

d.

25. a.

b.

c.

d.

27. One; namely, 29.

a. b.

Four;

True/False 1.3 (a) True (b) False (c) False (d) False (e) True (f)

False

(g) False (h) True (i)

True

(j)

True

(k) True (l)

False

(m) True (n) True (o) False

Exercise Set 1.4 5.

7.

9.

15.

17.

19.

a. b. c. d. e.

f. 21.

a.

b.

c.

d.

e.

f.

27.

31. 33. 35.

37.

39. 41.

True/False 1.4 (a) False (b) False (c) False (d) False (e) False (f) True (g) True (h) True (i) False (j) True (k) False

Exercise Set 1.5 1.

a. Elementary b. Not elementary c. Not elementary

d. Not elementary 3.

a.

Add 3 times row 2 to row 1:

b. Multiply row 1 by

:

c. Add 5 times row 1 to row 3: d. Swap rows 1 and 3:

5.

a.

Swap rows 1 and 2:

b. Add

times row 2 to row 3:

c. Add 4 times row 3 to row 1: 7.

a.

b.

c.

d.

9. 11.

13.

15. No inverse 17.

19.

21.

23.

25.

a.

b.

27. 29. 31.

33.

35.

37. Add times the first row to the second row. Add the third row.

True/False 1.5 (a) False (b) True (c) True (d) True (e) True (f) True (g) False

Exercise Set 1.6 1. 3. 5. 7. 9.

i. ii.

times the first row to the third row. Add

times the second row to the first row. Add the second row to

11.

i. ii. iii. iv.

13. No conditions on

and

15. 17. 19.

True/False 1.6 (a) True (b) True (c) True (d) True (e) True (f) True (g) True

Exercise Set 1.7 1.

2.

5.

7.

9.

11.

13. Not symmetric 15. Symmetric 17. Not symmetric 19. Not symmetric 21. Not invertible 23. 25. 27.

35.

a. Yes b. No (unless

)

c. Yes d. No (unless

)

39.

43.

True/False 1.7 (a) True (b) False (c) False (d) True (e) True (f)

False

(g) False (h) True (i)

True

(j)

False

(k) False (l)

False

(m) True

Exercise Set 1.8 1.

3.

a. b. c. For all rates to be nonnegative, we need

cars per hour, so

5. 7. 9.

, and

11.

; the balanced equation is

; the balanced equation is

13. 15. 17.

a. Using

as a parameter,

b. The graphs for

True/False 1.8 (a) True (b) False (c) True (d) False (e) False

Exercise Set 1.9 1.

a.

, and 3 are shown.

where

.

b.

3.

a.

b.

5.

True/False 1.9 (a) False (b) True (c) False (d) True (e) True

Chapter 1 Supplementary Exercises 1.

3.

5. 7. 9.

a. b. c. d.

11. 13.

a. b. c.

15.

Exercise Set 2.1 1.

3.

a. b. c.

d. 5.

7.

9. 11. 13. 15. 17. 19. 21. 23. 0 25. 27. 29. 0 31. 6 33. The determinant is

.

35.

True/False 2.1 (a) False (b) False (c) True (d) True (e) True (f) False (g) False (h) False (i) True

Exercise Set 2.2 5. 7. 9. 1 11. 5 13. 33 15. 6 17. 19. Exercises 14: 39; Exercise 15: 6; Exercise 16: 21. 23. 72 25. 27. 18

True/False 2.2 (a) True (b) True (c) False (d) False (e) True (f) True

Exercise Set 2.3 7. Invertible 9. Invertible

; Exercise 17:

11. Not invertible 13. Invertible 15. 17. 19.

21.

23.

25. 27. 29. Cramer's rule does not apply. 31. 35.

a. b. c. d. e. 7

37.

a. 189 b. c. d.

True/False 2.3 (a) False (b) False (c) True (d) False (e) True (f) True (g) True (h) True (i) True (j) True (k) True (l) False

Chapter 2 Supplementary Exercises 1. 3. 24 5. 7. 329 9. Exercise 3: 24; Exercise 4: 0; Exercise 5:

; Exercise 6:

11. The matrices in Exercise 1–3 are invertible, the matrix in Exercise 4 is not. 13. 15.

17.

19.

21.

23.

25. 29.

(b)

Exercise Set 3.1 1.

a.

b.

c.

d.

e.

f.

3.

a.

b.

c.

d.

e.

f.

5.

a.

b.

c.

7.

a. b.

9.

a. The terminal point is B(2, 3). b. The initial point is

11.

a. b.

13.

a. b. c. d. e. f.

15.

a. b. c. d. e. f.

17.

a. b.

. is one possible answer. is one possible answer.

c. d. e. f. 19.

a. b. c.

21. 23.

a. Not parallel b. Parallel c. Parallel

25. 27. 29. 33.

a. b.

True/False 3.1 (a) False (b) False (c) False (d) True (e) True (f) False (g) False (h) True (i) False (j) True (k) False

Exercise Set 3.2 1.

a. b. c.

3.

a. b. c. d.

5.

a. b. c.

7. 9.

a. b.

11.

a. b. c.

13.

a.

; θ is acute

b.

; θ is obtuse

c.

; θ is obtuse

15. 17.

a.

does not make sense because

b. c.

does not make sense because the quantity inside the norm is a scalar.

d. 19.

makes sense since the terms are both scalars.

a. b. c.

d.

23.

a. b. c. d.

25.

a. b. c.

27. A sphere of radius 1 centered at

True/False 3.2 (a) True (b) True (c) False (d) True (e) True (f) False (g) False (h) False (i) True (j) True

Exercise Set 3.3 1.

a. Orthogonal b. Not orthogonal c. Not orthogonal d. Not orthogonal

3.

a. Not an orthogonal set b. Orthogonal set c. Orthogonal set d. Not an orthogonal set

5. 7. Yes

is a scalar.

makes sense.

.

9. 11. 13. Not parallel 15. Parallel 17. Not perpendicular 19.

a. b.

21. 23. 25. 27. 29. 1 31. 33. 35. 37. 39. 0 (The planes coincide.) 41.

(b)

True/False 3.3 (a) True (b) True (c) True (d) True (e) True (f) False (g) False

Exercise Set 3.4 1. Vector equation:

;

parametric equations: 3. Vector equation:

;

parametric equations: 5. Point:

; parallel vector:

7. Point: (4, 6); parallel vector: 9. Vector equation: parametric equations: 11. Vector equation: parametric equations: 13. A possible answer is vector equation:

;

parametric equations: 15. A possible answer is vector equation: parametric equations: 17. 19.

;

21.

a. b. a plane in

23.

passing through P(1, 0, 0) and parallel to

and

a. b. a line through the origin in c.

25.

a. c.

27.

; The general solution of the associated homogeneous system is particular solution of the given system is

True/False 3.4 (a) True (b) False (c) True (d) True (e) False (f) True

Exercise Set 3.5 1.

a. b. c.

3. 5. 7. 9. 11. 3 13. 7 15. 17. 16 19. The vectors do not lie in the same plane. 21. 23. abc 25.

a. b. 3 c. 3

27.

a. b.

29. 37.

a. b.

True/False 3.5 (a) True (b) True (c) False (d) True (e) False (f) False

.

.A

Chapter 3 Supplementary Exercises 1.

a. b. c. d. e. f.

3.

a. b. c. d.

5. Not an orthogonal set 7.

a. A line through the origin, perpendicular to the given vector. b. A plane through the origin, perpendicular to the given vector. c. {0} (the origin) d. A line through the origin, perpendicular to the plane containing the two noncollinear vectors.

9. True 11. 13. 15. 17. Vector equation:

;

parametric equations: 19. Vector equation:

;

parametric equations: 21. A possible answer is vector equation: 23. 25. 29. A plane

Exercise Set 4.1 1.

(a) (c) Axioms 1–5

3. The set is a vector space with the given operations. 5. Not a vector space, Axioms 5 and 6 fail. 7. Not a vector space. Axiom 8 fails. 9. The set is a vector space with the given operations. 11. The set is a vector space with the given operations.

True/False 4.1 (a) False (b) False (c) True (d) False (e) False

Exercise Set 4.2 1. (a), (c), (e) 3. (a), (b), (d) 5. (a), (c), (d) 7. (a), (b), (d) 9. (a), (b), (c)

; parametric equations:

11.

a. The vectors span b. The vectors do not span c. The vectors do not span d. The vectors span

13. The polynomials do not span 15.

a. Line; b. Line; c. Origin d. Origin e. Line; f. Plane;

True/False 4.2 (a) True (b) True (c) False (d) False (e) False (f) True (g) True (h) False (i) False (j) True (k) False

Exercise Set 4.3 1.

a.

is a scalar multiple of

.

b. The vectors are linearly dependent by Theorem 4.3.3. c.

is a scalar multiple of

.

d. B is a scalar multiple of A. 3. None 5.

a. They do not lie in a plane. b. They do lie in a plane.

7.

(b)

9. 19.

a. They are linearly independent since b. They are not linearly independent since

21. 23.

for some x. a. b.

25.

True/False 4.3 (a) False (b) True (c) False (d) True (e) True (f) False (g) True (h) False

Exercise Set 4.4

for some x.

, and

do not lie in the same plane when they are placed with their initial points at the origin.

, and

lie in the same plane when they are placed with their initial points at the origin.

1.

a. A basis for

has two linearly independent vectors.

b. A basis for

has three linearly independent vectors.

c. A basis for

has three linearly independent vectors.

d. A basis for

has four linearly independent vectors.

3. (a), (b) 7.

a. b. c.

9.

a. b.

11. 13. 15. 17.

a. (2, 0) b. c. (0, 1) d.

True/False 4.4 (a) False (b) False (c) True (d) True (e) False

Exercise Set 4.5 1. Basis: (1, 0, 1); dimension = 1 3. Basis:

;

5. No basis; 7.

a. b. (1, 1, 0), (0, 0, 1) c. d. (1, 1, 0), (0, 1, 1)

9.

a. n b. c.

13. Any two of (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1) can be used. 15.

True/False 4.5 (a) True (b) True (c) False (d) True (e) True (f) True (g) True (h) True (i) True (j) False

Exercise Set 4.6

with

1.

a. b.

c.

3.

a.

b.

5.

a. b. c.

7.

a.

b.

c.

9.

a.

b.

11.

(b) (c)

(d) 13.

(a)

(b)

(d)

(e)

15.

(a) (b) (d) (e)

17.

a.

b.

19. 23.

a. a. b.

True/False 4.6 (a) True (b) True (c) True (d) True (e) False (f) False

Exercise Set 4.7 1.

3.

;

a. b. b is not in the column space of A. c.

d.

e.

5.

a. b.

c.

d.

7.

a.

b.

c.

,

d.

9.

a.

b.

c.

d.

11.

a. b. c. (1, 1, 0, 0), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)

15.

(b)

17.

a.

for all real numbers a, b not both 0.

b. Since A and B are invertible, their null spaces are the origin. The null space of C is the line

True/False 4.7 (a) True (b) False (c) False

. The null space of D is the entire xy-plane.

(d) False (e) False (f) True (g) True (h) False (i) True (j) False

Exercise Set 4.8 1. 3.

a. 2; 1 b. 1; 2 c. 2; 2 d. 2; 3 e. 3; 2

5.

a. b. c.

7.

a. Yes, 0 b. No c. Yes, 2 d. Yes, 7 e. No f. Yes, 4 g. Yes, 0

9. 11. No 13. Rank is 2 if 17.

and

; the rank is never 1.

a. 3 b. 5 c. 3 d. 3

19.

True/False 4.8 (a) False (b) True (c) False (d) False (e) True (f) False (g) False (h) False (i) True (j) False

Exercise Set 4.9 1.

a. Domain:

; codomain:

b. Domain:

; codomain:

c. Domain:

; codomain:

d. Domain:

; codomain:

3. 5.

a. Linear; b. Nonlinear;

c. Linear; d. Nonlinear; 7. (a) and (c) are matrix transformations; (b), (d), and (e) are not matrix transformations. 9. ; 11.

a.

b.

c.

d.

13.

a. b.

15.

a. b. (2, 5, 3) c.

17.

a. b. c. (0, 1, 3)

19.

a.

b. c. 21.

a.

b. c. (1, 2, 2) 25.

29.

a. Twice the orthogonal projection on the x-axis. b. Twice the reflection about the x-axis.

31. Rotation through the angle

.

33. Rotation through the angle θ and translation by 35. A line in

.

True/False 4.9 (a) False (b) False (c) False (d) True

; not a matrix transformation since

is nonzero.

(e) False (f) True (g) False (h) False (i) True

Exercise Set 4.10 1.

3.

a. b. c.

5.

,

a. b.

c. 7.

a.

b.

c.

9.

a. b. c.

11.

a. Not one-to-one b. One-to-one c. One-to-one d. One-to-one e. One-to-one f. One-to-one g. One-to-one

13.

a. One-to-one;

b. Not one-to-one c.

One-to-one;

d. Not one-to-one 15.

a. Reflection about the x-axis b. Rotation through the angle c. Contraction by a factor of d. Reflection about the yz-plane e. Dilation by a factor of 5

17.

a. Matrix operator b. Not a matrix operator c. Matrix operator d. Not a matrix operator

19.

a. Matrix transformation b. Matrix transformation

21.

a. b. c.

23.

a. b. c.

25.

a. Yes b. Yes

27. 29.

(b) a. The range of T is a proper subset of

.

b. T must map infinitely many vectors to 0.

True/False 4.10 (a) False (b) True (c) True (d) False (e) False (f) False

Exercise Set 4.11 1.

a. b. c. d.

3.

a.

b.

c.

5.

a.

b.

c.

7. Rectangle with vertices at (0, 0),

9.

a. b.

11.

a. Expansion by a factor of 3 in the x-direction b. Expansion by a factor of 5 in the y-direction and reflection about the x-axis c. Shearing by a factor of 4 in the x-direction

13.

a.

b. c. 17.

a. b. c. d. e.

19.

(b) No

23.

a.

b. Shear in the xz-direction with

factor k maps (x, y, z) to

:

Shear in the yz-direction with factor k maps (x, y, z) to

True/False 4.11 (a) False (b) True (c) True (d) True (e) False (f) False (g) True

Exercise Set 4.12 1.

a. Stochastic b. Not stochastic c. Stochastic d. Not stochastic

3. 5.

a. Regular b. Not regular c. Regular

7.

.

:

.

9.

11.

a. Probability that something in state 1 stays in state 1 b. Probability that something in state 2 moves to state 1 c. 0.8 d. 0.85

13.

a. b. 0.93 c. 0.142 d. 0.63

15.

a. Year

1

2

3

4

5

City

95,750

91,840

88,243

84,933

81,889

Suburbs

29,250

33,160

36,757

40,067

43,111

b.

17.

City

46,875

Suburbs

78,125

a. b.

c. 35, 50, 35 19.

21.

for every positive integer k

True/False 4.12 (a) True (b) True (c) True (d) False (e) True

Chapter 4 Supplementary Exercises 1.

(a) (c) Axioms 1–5

3. If

the solution space is the origin. If

, the solution space is a plane through the origin. If

7. A must be invertible 9.

a. b. c.

11.

a. b.

where

if n is even and

if n is odd.

, the solution space is a line through the origin.

13.

a.

b.

15. Possible ranks are 2, 1, and 0.

Exercise Set 5.1 1. 5 3.

a. b. c. d. e. f.

5.

a.

b.

Basis for eigenspace corresponding to

; basis for eigenspace corresponding to

Basis for eigenspace corresponding to

c. Basis for eigenspace corresponding to d. There are no eigenspaces. e. f. 7.

Basis for eigenspace corresponding to Basis for eigenspace corresponding to

a. 1, 2, 3 b. c. d. 2 e. 2 f.

9.

a. b.

11.

a.

b.

13. 15.

a.

and

b. No lines c.

True/False 5.1 (a) False (b) False (c) True

; basis for eigenspace corresponding to

(d) False (e) True (f) False (g) False

Exercise Set 5.2 1. Possible reason: Determinants are different. 3. Possible reason: Ranks are different. 5. 7. Not diagonalizable 9. Not diagonalizable 11. Not diagonalizable 13.

15.

17.

19.

21.

23.

25.

27. 33.

One possibility is

where

a. b. Dimensions will be exactly 1, 2, and 3. c.

True/False 5.2 (a) True (b) True (c) True (d) False (e) True (f) True (g) True (h) True
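Answers 1 and 3 appeal to similarity invariants (determinant and rank), and several later answers report that a matrix is not diagonalizable. A brief numerical sketch of both checks, using made-up matrices rather than the ones from the exercises:

# Sketch: comparing similarity invariants and testing diagonalizability.
# The matrices A and B below are illustrative only.
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # defective: not diagonalizable
B = np.array([[1.0, 0.0],
              [0.0, 2.0]])          # diagonalizable

# Similar matrices must share determinant, rank, trace, and eigenvalues,
# so differing values rule out similarity.
for name, M in (("A", A), ("B", B)):
    print(name, "det =", np.linalg.det(M),
          "rank =", np.linalg.matrix_rank(M),
          "trace =", np.trace(M))

# A matrix is diagonalizable when its eigenvectors span the space, i.e.,
# the (numerically computed) eigenvector matrix has full rank.
vals, vecs = np.linalg.eig(A)
print("A diagonalizable:", np.linalg.matrix_rank(vecs) == A.shape[0])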

Exercise Set 5.3 1. 5. 7. 11. 13.

and

are as in Exercise 20 of Section 5.1.

15. 17. 19. 21. 23. 25. 27.

a. b. None

True/False 5.3 (a) False (b) True (c) False (d) True (e) False (f) False

Exercise Set 5.4 1.

a.

b. 3.

a.

b.

7. 9.

True/False 5.4 (a) False (b) False (c) True (d) True (e) False

Chapter 5 Supplementary Exercises 1.

(b) The transformation rotates vectors through the angle ; therefore, if , then no nonzero vector is transformed into a vector in the same or opposite direction.

3.

(c)

9. 11. 13. They are all 0. 15.

17. They are all 0, 1, or

.

Exercise Set 6.1 1.

a. 5 b. c. d. e. f.

3.

a. 2 b. 11 c. d. e. 0

5.

a. b. 1 c. d. 1 e. 1 f. 1

7.

a. 3 b. 56

9.

(b) 29

11.

a.

b.

13.

a. b. 0

15.

a. b.

17. 19.

a. b. c.

21.

a.

b.

27.

For

, then

, so Axiom 4 fails.

29.

a. b. 0

True/False 6.1 (a) True (b) False (c) True (d) True (e) False (f) True (g) False

Exercise Set 6.2 1.

a. b. c. 0 d. e. f.

3.

a. b. 0

7. No 9.

a. b.

13. No 15.

a. b. c.

31.

a. The line b. The xz-plane c. The x-axis

True/False 6.2 (a) False (b) True (c) True (d) True (e) False (f) False

Exercise Set 6.3 1. (a), (b), (d) 3. (b), (d) 5. (a) 7.

a. b. c.

9.

a. b. c.

11.

(b)

13.

a. b.

15.

a. b.

17.

a. b.

19.

a. b.

21.

a.

b.

23.

25. 27. 29.

a.

b.

c.

d.

e.

f. Columns not linearly independent 33.

True/False 6.3 (a) False (b) False (c) True (d) True (e) False (f) True

Exercise Set 6.4 1.

a. b.

3.

a. b.

5.

a.

b.

7.

a. Solution: b. Solution:

; least squares error: (t a real number); least squares error:

c. Solution: 9.

(t a real number); least squares error:

a. (7, 2, 9, 5) b.

11.

13.

a.

A does not have linearly independent column vectors.

b.

A does not have linearly independent column vectors.

a.

b.

15.

a. b.

c. d. 17. 21.

True/False 6.4 (a) True (b) False (c) True (d) True (e) False (f) True (g) False (h) True
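The least squares answers above come from the normal equations AᵀAx = Aᵀb. A minimal sketch with a made-up inconsistent system (not the data of the exercises):

# Sketch: least squares solution via the normal equations, and the
# least squares error ||b - Ax||. A and b are illustrative only.
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
b = np.array([2.0, 3.0, 5.0])

x = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations (A has independent columns)
error = np.linalg.norm(b - A @ x)       # least squares error
print("least squares solution:", x)
print("least squares error:", error)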

Exercise Set 6.5 1. 3. 11.

True/False 6.5 (a) False (b) True (c) False (d) True

Exercise Set 6.6 1.

a. b.

3.

a. b.

5.

a. b.

9.

True/False 6.6 (a) False (b) True (c) True (d) False (e) True

Chapter 6 Supplementary Exercises 1.

a.

with

b.

3.

a. The subspace of all matrices in

with only zeros on the diagonal.

b. The subspace of all skew-symmetric matrices in 7. 9. No 11.

(b)

approaches

17. No

Exercise Set 7.1 1.

(b)

3.

(a) (b)

(d)

(e)

7.

a. b.

9.

a.

.

b. 11.

a.

b.

13. 17.

The only possibilities are

21.

or

.

a. Rotations about the origin, reflections about any line through the origin, and any combination of these b. Rotation about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these c. No; dilations and contractions

True/False 7.1 (a) False (b) False (c) False (d) False (e) True (f) True (g) True (h) True

Exercise Set 7.2 1.

a. b. c. d. e. f.

3.

5.

7.

9.

15. No 19. Yes

True/False 7.2

one-dimensional;

one-dimensional

one-dimensional; one-dimensional;

two-dimensional two-dimensional

two-dimensional; three-dimensional;

one-dimensional

one-dimensional two-dimensional;

two-dimensional

(a) True (b) True (c) False (d) True (e) True (f) False (g) True

Exercise Set 7.3 1.

a. b. c.

3. 5.

7.

9.

a.

b. 11.

a. ellipse b. hyperbola c. parabola d. circle

13. Hyperbola: 15. Hyperbola: 17.

a. Positive definite b. Negative definite c. Indefinite d. Positive semidefinite e. Negative semidefinite

19. Positive definite 21. Positive semidefinite 23. Indefinite 27. 31.

a.

b. Yes

33. A must have a positive eigenvalue of multiplicity 2.

True/False 7.3 (a) True (b) False (c) True (d) True (e) False (f) True (g) True (h) True (i) False (j) True (k) False (l) False

Exercise Set 7.4 1. Maximum: 5 at

and

; minimum:

at

and

3. Maximum: 7 at (0, 1) and (0, −1); minimum: 3 at (1, 0) and (−1, 0) 5. Maximum: 9 at (1, 0, 0) and (−1, 0, 0); minimum: 3 at (0, 0, 1) and (0, 0, −1) 7. Maximum:

at

and

minimum:

9.

13. Critical points: (−1, 1), relative maximum; (0, 0), saddle point 15. Critical points: (0, 0), relative minimum; (2, 1) and (−2, 1), saddle points 17.

Corner points:

21.

True/False 7.4 (a) False (b) True (c) True (d) False (e) True

Exercise Set 7.5 1. 3.

5.

a. b.

9.

11.

at

and

13.

15.

17.

19.

21.

a. b.

29.

(c) B and C must commute.

37.

39. Multiplication of x by P corresponds to

times the orthogonal projection of x onto

. If

, then multiplication of x by

corresponds to reflection of x about the hyperplane

.

True/False 7.5 (a) False (b) False (c) True (d) False (e) False

Chapter 7 Supplementary Exercises 1.

a.

b.

5.

7. positive definite 9.

a. parabola b. parabola

Exercise Set 8.1 1. Nonlinear 3. Linear 5. Linear

7.

a. Linear b. Nonlinear

9. 11. 13. 15. (a) 17. (a) 19. (a) 21.

a. b. c.

23.

a.

b.

25.

c. Rank

nullity

d. Rank

nullity

a. b.

c. Rank d. Rank 27.

a. Kernel: y-axis; range: xz-plane b. Kernel: x-axis; range: yz-plane c. Kernel: the line through the origin perpendicular to the plane

29.

; range: plane

a. Nullity b. Nullity c. Nullity d. Nullity

31.

a. 3 b. No

33. A line through the origin, a plane through the origin, the origin only, or all of 35.

(b) No

41. ker(D) consists of all constant polynomials. 43.

a. b.

True/False 8.1 (a) True (b) False (c) True (d) False (e) True (f) True (g) False (h) False (i) False

Exercise Set 8.2

1.

a.

T is one-to-one

b.

T is not one-to-one

c.

T is one-to-one

d.

T is one-to-one

e.

; T is not one-to-one

f.

; T is not one-to-one

3.

a. Not one-to-one b. Not one-to-one c. One-to-one

5.

a. ker b. T is not one-to-one since

7.

.

a. T is one-to-one b. T is not one-to-one c. T is not one-to-one d. T is one-to-one

11.

a.

b.

c.

d.

13. T is not one-to-one since, for example,

is in its kernel.

15. Yes; it is one-to-one 17. T is not one-to-one since, for example, a is in its kernel. 19. Yes

True/False 8.2 (a) False (b) True (c) False (d) True (e) False (f) False

Exercise Set 8.3 1.

a. b. c. d.

3.

a. b.

does not exist since

5. 11.

a. T has no inverse.

is not a

matrix.

b.

c.

d.

13.

a.

for

b. 15. 17.

a. (a) (d)

21.

a. b. c.

True/False 8.3 (a) True (b) False (c) False (d) True (e) False (f) True

Exercise Set 8.4 1.

a.

3.

a.

5.

a.

7.

a.

b. 9.

a. b. c.

d.

11.

a.

b. c. d. 13.

a.

b. 19.

a.

b.

c.

d. since 21.

a. b.

True/False 8.4 (a) False (b) False (c) True (d) False (e) True

Exercise Set 8.5 1.

3.

5.

7.

11.

a. b.

13.

a. b. Basis for eigenspace corresponding to

; basis for eigenspace corresponding to

21. The choice of an appropriate basis can yield a better understanding of the linear operator.

True/False 8.5 (a) False (b) True (c) True (d) True (e) True (f) False (g) True (h) False

Chapter 8 Supplementary Exercises 1. No. 5.

a.

, and if and any two of

b. 7.

a. b. T is not one-to-one.

11. 13.

15.

17.

19.

(b) (c)

21.

(d) The points are on the graph.

25.

Exercise Set 9.1 1. 3. 5. 7. 9. 11.

a.

b.

, and

form bases for the range;

, then is a basis for the kernel.

c.

13.

15. 17.

19.

(b)

True/False 9.1 (a) False (b) False (c) True (d) True (e) True

Exercise Set 9.2 1.

a.

dominant

b. No dominant eigenvalue 3.

; dominant eigenvalue:

;

dominant eigenvector:

5.

dominant eigenvalue:

;

dominant eigenvector:

7.

a. b. c.

Dominant eigenvalue:

; dominant eigenvector:

d. 0.1% 9. 13.

a. Starting with

, it takes 8 iterations.

Starting with

, it takes 8 iterations.

b.
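The answers in this set report dominant eigenvalues, dominant eigenvectors, and iteration counts for the power method. A minimal sketch of that iteration, using a made-up matrix rather than the ones from the exercises:

# Sketch: the power method for estimating a dominant eigenvalue and
# eigenvector. The matrix and starting vector are illustrative only.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, 0.0])             # starting vector

for k in range(8):
    y = A @ x
    x = y / np.linalg.norm(y)        # rescale to keep the iterates bounded
    lam = x @ A @ x                  # Rayleigh quotient estimate

print("dominant eigenvalue  ~", lam)
print("dominant eigenvector ~", x)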

Exercise Set 9.3

1.

3.

5. Sites 1 and 2 (tie); sites 3 and 4 are irrelevant 7. Site 2, site 3, site 4; sites 1 and 5 are irrelevant

Exercise Set 9.4 1.

a. b. c.

3.

, or about 18.5 hours

a. b. c. d.

5.

a. b. 1334

7. 9.

Exercise Set 9.5 1. 3. 5.

7.

9.

11.

True/False 9.5 (a) False (b) True (c) False (d) False (e) True (f) False (g) True

Exercise Set 9.6

s for forward phase, 10 s for backward phase

1.

3.

5.

7.

9. 70,100 numbers must be stored; A has 100,000 entries

True/False 9.6 (a) True (b) True (c) False
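Answer 9's storage count reflects the usual bookkeeping for a rank-k singular value approximation: keeping Uₖ, the k singular values, and Vₖ of an m × n matrix requires k(m + n + 1) numbers instead of mn. The dimensions below (200 × 500 with k = 100) are one consistent reading of the figures in answer 9, not data taken from the exercise.

# Sketch: storage needed for a rank-k SVD approximation of an m x n matrix.
# The dimensions are assumed for illustration.
m, n, k = 200, 500, 100

full_storage = m * n              # entries of A itself
svd_storage = k * (m + n + 1)     # U_k (m*k) + k singular values + V_k (n*k)

print("entries of A:      ", full_storage)   # 100,000
print("rank-k SVD storage:", svd_storage)    # 70,100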

Chapter 9 Supplementary Exercises 1. 3.

5.

a.

b. c. 9.

11.

Exercise Set 10.1 1.

a. b.

2.

a.

or

b.

or

3.

(a parabola)

4.

a. b.

5.

a.

b. 6.

;

a.

or

b.

or

10.

11. The equation of the line through the three collinear points 12. 13. The equation of the plane through the four coplanar points
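Answers 11 and 13 refer to the determinant forms developed in this section; for example, the line through two distinct points (x₁, y₁) and (x₂, y₂) can be written as

\[
\begin{vmatrix}
x & y & 1\\
x_1 & y_1 & 1\\
x_2 & y_2 & 1
\end{vmatrix} = 0,
\]

and the circle, conic, and plane equations in this exercise set follow the same pattern with more rows and columns.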

Exercise Set 10.2 1.

,

; maximum value of

2. No feasible solutions 3. Unbounded solution 4. Invest $6000 in bond A and $4000 in bond B; the annual yield is $880. 5.

cup of milk,

ounces of corn flakes; minimum

6.

a.

and

b. c.

are nonbinding; for

for

is binding

is binding and for is nonbinding and for

yields the empty set. yields the empty set.

7. 550 containers from company A and 300 containers from company B; maximum shipping 8. 925 containers from company A and no containers from company B; maximum shipping 9. 0.4 pound of ingredient A and 2.4 pounds of ingredient B; minimum
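Answers like these can be cross-checked with any linear programming routine. The sketch below uses SciPy with made-up coefficients (not the data of the exercises); linprog minimizes, so the objective is negated to maximize.

# Sketch: solving a small maximization problem of the kind in this section.
# Objective and constraints are illustrative only.
import numpy as np
from scipy.optimize import linprog

c = np.array([-0.08, -0.10])           # maximize 0.08*x1 + 0.10*x2 (negated for linprog)
A_ub = np.array([[1.0, 1.0],            # x1 + x2 <= 10000
                 [0.0, 1.0]])           # x2 <= 6000
b_ub = np.array([10000.0, 6000.0])
bounds = [(0, None), (0, None)]         # x1 >= 0, x2 >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print("optimal corner point:", res.x)   # (4000, 6000) for these data
print("maximum value:", -res.fun)       # 920.0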

Exercise Set 10.3 1. 700 2.

a. 5 b. 4

4.

a. Ox,

units; sheep,

b. First kind, 5.

measure; second kind,

a.

measure; third kind,

,

b. Exercise 7(b); gold, 6.

unit measure

,

minae; brass,

minae; tin,

minae; iron,

a.

where t is an arbitrary number b. Take

, so that

,

,

,

.

c. Take

, so that

,

,

,

.

minae

7.

a. Legitimate son,

staters; illegitimate son,

staters

b. Gold,

minae; brass,

minae; tin,

minae; iron,

minae

c. First person, 45; second person,

; third person,

Exercise Set 10.4

2.

a. b.

3.

a. The cubic runout spline b.

4.

Maximum at

5.

Maximum at

6.

a.

b. c. The three data points are collinear.

7.

(b)

8.

(b)

Exercise Set 10.5 1.

a. b. P is regular since all entries of P are positive;

2.

a.

b. P is regular, since all entries of P are positive:

3.

a.

b.

c.

4.

a.

Thus, no integer power of P has all positive entries.

b.

as n increases, so

for any

c. The entries of the limiting vector

are not all positive.

6.

has all positive entries;

as n increases.

7. 8.

in region 1,

in region 2, and

in region 3

Exercise Set 10.6 1.

a.

b.

c.

2.

a.

b.

c.

3.

a.

b.

c.

4.

(a)

(c) The

th entry is the number of family members who influence both the ith and jth family members.

5.

a. b. c.

6.

and

a. None b.

7.

8. First, A; second, B and E (tie); fourth, C; fifth, D

Exercise Set 10.7 1.

a. b. c.

2. 3.

Let

, for example. a. b. c.

d.

4.

a.

b.

c. d.

e.

5.

Exercise Set 10.8 1.

a. b.

c.

2.

a. Use Corollary 10.8.4; all row sums are less than one. b. Use Corollary 10.8.5; all column sums are less than one. c. Use Theorem 10.8.3, with

3.

.

has all positive entries.

4. Price of tomatoes, $120.00; price of corn, $100.00; price of lettuce, $106.67 5. $1256 for the CE, $1448 for the EE, $1556 for the ME 6.

(b)
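Several of these answers rest on checking that a consumption matrix C is productive (all row sums or all column sums less than one, or (I − C)⁻¹ having all nonnegative entries, per the results cited in answer 2) and then solving x = Cx + d. A sketch with made-up data for the open model:

# Sketch: productivity checks and the production vector of a Leontief
# open model. The consumption matrix C and demand d are illustrative only.
import numpy as np

C = np.array([[0.2, 0.3],
              [0.4, 0.1]])
d = np.array([50.0, 30.0])

# Column-sum and row-sum checks (each sum should be less than one)
print("column sums:", C.sum(axis=0))
print("row sums   :", C.sum(axis=1))

# Productive: (I - C)^(-1) exists and has nonnegative entries
M = np.linalg.inv(np.eye(2) - C)
print("(I - C)^-1 =\n", M, "\nall entries nonnegative:", np.all(M >= 0))

# Production vector x satisfying x = Cx + d
x = M @ d
print("production vector:", x)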

Exercise Set 10.9 1. The second class; $15,000 2. $223 3. 5. 6.

Exercise Set 10.10 1.

a.

b.

c.

d.

2.

(b)

(c) 3.

a.

b.

c.

4.

a.

b. 5.

a.

b. 6.

7.

a.

b.

Exercise Set 10.11 1.

a.

b.

c.

d. for

and

,

2. 3.

Exercise Set 10.12 1.

(c)

2.

a.

b. Same as part (a) c.

%; for

and

,

%

4.

,

,

7.

8.

Exercise Set 10.13 1.

2.

;

Rotation angles:

(upper left);

(upper right);

(lower left);

(lower right);

3. 4.

a. (i)

; (ii) all rotation angles are

b. (i)

; (ii) all rotation angles are

c. (i)

; (ii) rotation angles:

(top); 1

d. (i)

; (ii) rotation angles:

(upper left);

; (iii)

This set is a fractal.

; (iii)

This set is a fractal. (lower left); (upper right);

5. 6.

(0.766, 0.996) rounded to three decimal places

7. 8. 9. 10.

; the cube is not a fractal. ;

;

; the set is a fractal.

(lower right); (iii) (lower right) (iii)

This set is a fractal. This set is a fractal.

11.

12. Area of

; area of

; area of

; area of

; area of

Exercise Set 10.14 1.

,

,

2. One 1-cycle:

,

,

,

; one 3-cycle:

,

,

; two 4-cycles:

and

two 12-cycles:

; and

, 3.

,

.

(a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7, (c) (5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5),

4.

(c) The first five iterates of

6.

(b)

are

,

The matrices of Anosov automorphisms are

,

and

(c) The transformation effects a rotation of S through

,

, and

.

.

in the clockwise direction.

9.

In region I:

and

; in region II:

and

; in region III:

; in region IV:

12.

form one 2-cycle, and

form another 2-cycle.

14. Begin with a array of white pixels and add the letter ‘A’ in black pixels to it. Apply the mapping to this image, which will scatter the black pixels throughout the image. Then superimpose the letter ‘B’ in black pixels onto this image. Apply the mapping again and then superimpose the letter ‘C’ in black pixels onto the resulting image. Repeat this procedure with the letters ‘D’ and ‘E’. The next application of the mapping will return you to the letter ‘A’ with the pixels for the letters ‘B’ through ‘E’ scattered in the background.
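A sketch of the pixel-scattering step described in answer 14. The grid size and the particular mapping below (the classical Arnold cat map matrix [[1, 1], [1, 2]] acting modulo p) are assumptions for illustration; the exercise's own mapping and period may differ.

# Sketch: applying a cat-map-style permutation to a p x p pixel grid,
# scattering the black pixels of a simple image across the grid.
import numpy as np

p = 101                                      # assumed grid size
image = np.zeros((p, p), dtype=np.uint8)
image[30:70, 45:55] = 1                      # crude black bar standing in for a letter

def cat_map(img):
    """Send pixel (x, y) to (x + y, x + 2y) mod p (a bijection, since det = 1)."""
    out = np.zeros_like(img)
    for x in range(p):
        for y in range(p):
            out[(x + y) % p, (x + 2 * y) % p] = img[x, y]
    return out

scrambled = cat_map(image)                   # black pixels scattered over the grid
print("black pixels before:", image.sum(), "after:", scrambled.sum())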

Exercise Set 10.15 1.

a. GIYUOKEVBH b. SFANEFZWJH

2.

a. b. Not invertible c. d. Not invertible e. Not invertible f.

3. WE LOVE MATH 4. 5. THEY SPLIT THE ATOM 6. I HAVE COME TO BURY CAESAR 7.

a. 010110001 b.

8. A is invertible modulo 29 if and only if

(mod 29).
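Answer 8's criterion can be checked directly: since 29 is prime, a key matrix is invertible modulo 29 exactly when its determinant is nonzero modulo 29, and the inverse can then be built from the adjugate. The matrix below is made up for illustration.

# Sketch: testing invertibility of a key matrix modulo 29 and inverting it.
# The 2 x 2 matrix A is an example, not one from the exercises.
import numpy as np

A = np.array([[3, 5],
              [1, 2]])
m = 29

det = int(round(np.linalg.det(A))) % m
print("det(A) mod 29 =", det, "-> invertible:", det != 0)

if det != 0:
    det_inv = pow(det, -1, m)                     # modular inverse of det (m prime)
    adj = np.array([[A[1, 1], -A[0, 1]],
                    [-A[1, 0], A[0, 0]]])         # adjugate of a 2 x 2 matrix
    A_inv = (det_inv * adj) % m
    print("A^(-1) mod 29 =\n", A_inv)
    print("check (should be I mod 29):\n", (A @ A_inv) % m)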

Exercise Set 10.16 2.

3.

4.

Eigenvalues:

,

; eigenvectors:

5. 12 generations; .006% 6.

;

8.

Exercise Set 10.17 1.

a.

b. c. 7. 2.375 8. 1.49611

Exercise Set 10.18

1.

a. of population;

b. of population;

; harvest 57.9% of youngest age class

2.

4. 5.

Exercise Set 10.19 1. 2.

3. 4. 5.

Exercise Set 10.20 1.

a. Yes; b. No; c. Yes; d. Yes;

2.

number of triangles

3. 4.

a.

number of vertex points

,

number of boundary vertex points

; Equation (7) is

.

b.

5.

a. b. c. d.

7.

a. Two of the coefficients are zero. b. At least one of the coefficients is zero. c. None of the coefficients are zero.

8.

a. b.

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved.