Appendix A: Matrix Algebra

2 downloads 0 Views 398KB Size Report
order, matrix multiplication of type Cayley, Kronecker-Zehfuss, Khatri-Rao and. Hadamard. Second, we introduce special matrices of type symmetric, antisym- metric .... (of two rectangular matrices of the same order; elementwise product). [ ], ( ) ... The transported Khatri-Rao-product generates a row product which we do.
Appendix A: Matrix Algebra As a two-dimensional array we define a quadratic and rectangular matrix. First, we review matrix algebra with respect to two inner and one external relation, namely multiplication of a matrix by a scalar, addition of matrices of the same order, matrix multiplication of type Cayley, Kronecker-Zehfuss, Khatri-Rao and Hadamard. Second, we introduce special matrices of type symmetric, antisymmetric, diagonal, unity, null, idempotent, normal, orthogonal, orthonormal (special facts of representing a 2×2 orthonormal matrix, a general n×n orthonormal matrix, the Helmert representation of an orthonormal matrix with examples, special facts about the representation of a Hankel matrix with examples, the definition of a Vandermonde matrix), the permutation matrix, the commutation matrix. Third, scalar measures like rank, determinant, trace and norm. In detail, we review the Inverse Partitional Matrix /IPM/ and the Cayley inverse of the sum of two matrices. We summarize the notion of a division algebra. A special paragraph is devoted to vector-valued matrix forms like vec, vech and veck. Fifth, we introduce the notion of eigenvalue-eigenvector decomposition (analysis versus synthesis) and the singular value decomposition. Sixth, we give details of generalized inverse, namely g-inverse, reflexive g-inverse, reflexive symmetric ginverse, pseudo inverse, Zlobec formula, Bjerhammar formula, rank factorization, left and right inverse, projections, bordering, singular value representation and the theory solving linear equations.

A1 Matrix-Algebra A matrix is a rectangular or a quadratic array of numbers,  a11 a  21 A : [aij ]   ...   an 11  an1

a12

...

a1m 1

a22

...

a2 m 1

... an 12

... ... ... an 1 m 1

an 2

...

anm 1

a1m  a2 m  ...  , aij  ,[aij ]   n m .  an 1m  anm 

The format or “order” of A is given by the number n of rows and the number of the columns, O( A) : n  m.

Fact: Two matrices are identical if they have identical format and if at each place (i, j) are identical numbers, namely

 i  {1,..., n} A  B  aij  bij   j  {1,..., m}.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

486

A1 Matrix-Algebra

Beside the identity of two matrices the transpose of an m  n matrix A  [aij ] is the m  n matrix Α  [a ji ] whose ij element is a ji . Fact:

( A)  A.

A matrix algebra is defined by the following operations:  multiplication of a matrix by a scalar (external relation)  addition of two matrices of the same order (internal relation)  multiplication of two matrices (internal relation) Definition (matrix additions and multiplications):

(1) Multiplication by a scalar Α  [aij ],      A  A  [ aij ].

(2) Addition of two matrices of the same order A  [aij ], B  [bij ]  A  B : [aij  bij ] A  B  B  A (commutativity) (A  B)  C  A  (B  C) (associativity)

A  B  A  ( 1)B (inverse addition).

Compatibility (   ) A   A   A  distributivity  ( A  B)   A   B  ( A  B)  A  B.

(3) Multiplication of matrices 3(i) “Cayley-product” (“matrix-product”)  A  [aij ], O( A)  n  l     B  [bij ], O(B)  l  m  l

 C : A  B  [cij ] :  aik bkl , O (C)  n  m k 1

3(ii) “Kronecker-Zehfuss-product” A  [aij ], O ( A )  n  m   B  [bij ], O (B)  k  l 

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

487

A1 Matrix-Algebra

 C : B  A  [cij ], B  A : [bij A ], O (C)  O (B  A )  kn  l

3(iii) “Khatri-Rao-product” (of two rectangular matrices of identical column number) A  [a1 ,..., am ], O ( A )  n  m   B  [b1 ,..., bm ], O (B)  k  m   C : B  A : [b1  a1 , , bm  am ], O (C)  kn  m

3(iv) “Hadamard-product” (of two rectangular matrices of the same order; elementwise product) G  [ gij ], O (G )  n  m   H =[hij ], O (H )  n  m   K : G  H  [kij ], kij : gij hij , O (K )  n  m .

The existence of the product A  B does not imply the existence of the product B  A . If both products exist, they are in general not equal. Two quadratic matrices A and B, for which holds A  B = B  A , are called commutative. Laws

(i)

(A  B)  C  A  (B  C) A  ( B  C)  A  B  A  C ( A  B)  C  A  C  B  C ( A  B )  ( B  A ) .

(ii) ( A  B )  C  A  ( B  C)  A  B  C ( A  B )  C  ( A  B )  ( B  C) A  ( B  C)  ( A  B )  ( A  C ) ( A  B )  ( C  D )  ( A  C)  ( B  D ) ( A  B )  A   B .

(iii) ( A  B )  C  A  ( B  C)  A  B  C ( A  B )  C  ( A  C)  ( B  C) A  ( B  C)  ( A  B )  ( A  C ) ( A  C)  (B  D)  ( A  B)  (C  D) A  (B  D)  ( A  B)  D, if d ij  0 for i  j.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

488

A1 Matrix-Algebra

The transported Khatri-Rao-product generates a row product which we do not follow here. (iv)

A B  B A ( A  B )  C  A  ( B  C)  A  B  C ( A  B )  C  ( A  C)  ( B  C) ( A1  B1  C1 )  ( A 2  B 2  C2 )  ( A1  A 2 )  ( B1  B 2 )  (C1  C2 ) (D  A)  (B  D)  D  ( A  B)  D, if dij  0 for i  j ( A  B)  A  B.

A2 Special Matrices We will collect special matrices of symmetric, antisymmetric, diagonal, unity, zero, idempotent, normal, orthogonal, orthonormal, positive-definite and positive-semidefinite, special orthonormal matrices, for instance of type Helmert or of type Hankel. Definitions (special matrices):

A quadratic matrix A  [aij ] of the order O( A)  n  n is called symmetric

 aij  a ji i, j  {1,..., n} : A  A 

antisymmetric  aij   a ji i, j  {1,..., n} : A   A   aij  0  i  j ,

diagonal

A  Diag[a11 ,..., ann ]  aij  0  i  j  I n n    aij  1  i  j

unity zero matrix

0 n n : aij  0  i, j  {1,..., n}

upper   triangular: lower 

 aij  0 i  j  a  0 i  j  ij

idempotent if and only if A  A  A normal if and only if A  A  A   A . Definition (orthogonal matrix) :

The matrix A is called orthogonal if AA  and A A are diagonal matrices. (The rows and columns of A are orthogonal.)

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A2 Special Matrices

489

Definition (orthonormal matrix) :

The matrix A is called orthonormal if AA   A A  I . (The rows and columns of A are orthonormal.) Facts (representation of a 22 orthonormal matrix) X  SO ( 2) :

A 22 orthonormal matrix X  SO(2) is an element of the special orthogonal group SO(2) defined by SO(2) : {X   22 | X X  I 2 and det X  1}

x {X   1  x3

(i)

x2    22 x 4 

 cos  X  sin 

x12  x 22  1 x1 x3  x 2 x 4  0 , x1 x 4  x 2 x3  1} x32  x 42  1

sin     22 ,   [0, 2 ] cos  

is a trigonometric representation of X  SO (2) . (ii)

 x X 2  1  x

1  x 2    22 , x  [1, 1]  x 

is an algebraic representation of X  SO(2) 2 2 ( x112  x122  1, x11 x 21  x12 x 22   x 1  x 2  x 1  x 2  0, x 21  x 22  1) .

(iii)

 1  x2 2x     2 1  x 2    2 2 , x   X   1 x 2 1 x   2 x  1  x 2 1  x 2 

is called a stereographic projection of X (stereographic projection of SO(2) ~ 1 onto 1 ). (iv)

 0 x X  (I 2  S)(I 2  S) 1 , S   ,  x 0 

where S  S  is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of X  SO ( 2) . (v)

X  SO(2) is a commutative group (“Abel”)

(Example: X1  SO(2) , X 2  SO(2) , then X1 X 2  X 2 X1 ) ( SO(n) for n  2 is the only commutative group, SO(n | n  2) is not “Abel”).

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A2 Special Matrices

490

Facts (representation of an nn orthonormal matrix) X  SO(n) :

An nn orthonormal matrix X  SO(n) is an element of the special orthogonal group SO(n) defined by SO(n) : {X   nn | XX  I n and det X  1} .

As a differentiable manifold SO(n) inherits a Riemann structure from the ambi2 n 2 ent space  n with a Euclidean metric ( vec X   , dim vec X  n ). Any atlas of the special orthogonal group SO(n) has at least four distinct charts and there is one with exactly four charts. (“minimal atlas”: Lusternik – Schnirelmann category) 2

(i)

X  (I n  S)(I n  S) 1 ,

where S  S is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of X  SO(n) . ( n! / 2(n  2)! is the number of independent parameters/coordinates of X) (ii)

If each of the matrices R 1 ,  , R k is an nn orthonormal matrix, then their product R1R 2  R k 1R k  SO(n)

is an nn orthonormal matrix. Facts (orthonormal matrix: Helmert representation) :

Let a  [a1 ,  , a n ] represent any row vector such that a i  0 (i {1,  , n}) is any row vector whose elements are all nonzero. Suppose that we require an nn orthonormal matrix, one row which is proportional to a . In what follows one such matrix R is derived. Let [r1,  , rn ] represent the rows of R and take the first row r1 to be the row of R that is proportional to a . Take the second row r2 to be proportional to the ndimensional row vector [a1 ,  a12 / a 2 , 0, 0,  , 0],

(H2)

the third row r3 proportional to [a1 , a 2 ,  (a12  a 22 ) / a 3 , 0, 0, , 0]

(H3)

and more generally the first through nth rows r1, , rn proportional to k 1

[a1 , a 2 ,  , a k 1 ,   a i2 / a k , 0, 0,  , 0]

(Hn-1)

i 1

for k  {2, , n} ,

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A2 Special Matrices

491

respectively confirm to yourself that the n-1 vectors ( H n 1 ) are orthogonal to each other and to the vector a . In order to obtain explicit expressions for r1,  , rn it remains to normalize a and the vectors ( H n 1 ). The Euclidean norm of the kth of the vectors ( H n 1 ) is k 1

k 1

k 1

k

i 1

i 1

i 1

i 1

{ a i2  ( a i2 ) 2 / a k2 }1 / 2  {( a i2 ) ( a i2 ) / a k2 }1 / 2 .

Accordingly for the orthonormal vectors r1, , rn we finally find n

r1  [ a i2 ] 1 / 2 (a1 ,  , a n )

(1st row)

i 1

(kth row) rk  [

(nth row)

a k2 k 1

k

i 1

i 1

( a i2 ) ( a i2 ).

rn  [

a i2 , 0, 0, , 0) i 1 a k

k 1

] 1 / 2 (a1 , a 2 , , a k 1 ,  

a n2

a i2 ] . i 1 a n

n 1

n 1

n

i 1

i 1

( a i2 ) ( a i2 ).

] 1 / 2 [a1 , a 2 ,  , a n 1 ,  

The recipy is complicated: When a  [1, 1,  ,1, 1] , the Helmert factors in the 1st row, …, kth row,…, nth row simplify to r1  n 1 / 2 [1, 1,  ,1, 1]   n rk  [k (k  1)]1 / 2 [1, 1,  ,1, 1  k , 0, 0,  , 0, 0]   n

rn  [ n( n  1)]

1/ 2

[1, 1,  ,1, 1  n]   . n

The orthonormal matrix  r1   r   2     rk1   SO(n)  rk     r    n 1   rn 

is known as the Helmert matrix of order n. (Alternatively the transposes of such a matrix are called the Helmert matrix.)

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A2 Special Matrices

492

Example (Helmert matrix of order 3): 1/ 3  1/ 2  1/ 6

1/ 3   0   SO(3).  2 / 6 

1/ 3 1/ 2 1/ 6

Check that the rows are orthogonal and normalized. Example (Helmert matrix of order 4):  1/ 2   1/ 2   1/ 6 1/ 12 

1/ 2

1/ 2

1/ 2

0

1/ 6

2 / 6

1/ 12

1/ 12

1/ 2   0    SO(4). 0  3 / 12 

Check that the rows are orthogonal and normalized. Example (Helmert matrix of order n):  1/ n  1/ 2   1/ 6     1   (n 1)(n  2)   1  n(n 1) 

1/ n

1/ n 

1/ n

1/ n

1/ 2

0

0



0

1/ 6

2/ 6

0

 

0

1

1

(n 1)(n  2)

(n 1)(n  2)









1

1

n(n 1)

n(n 1)

1 (n 1) (n 1)(n  2) 1 n(n 1)

1/ n   0   0     SO(n).  0   1 n   n(n 1) 

Check that the rows are orthogonal and normalized. An example is the nth row 1 n ( n  1)

 

n n 2



n( n  1)



1 n( n  1)

n( n  1) n( n  1)



(1  n )

2

n ( n  1)



n 1 n ( n  1)



1  2n  n n ( n  1)

2



 1,

where (n-1) terms 1/[n(n-1)] have to be summed. Definition (orthogonal matrix) :

A rectangular matrix A  [aij ]   n m is called “a Hankel matrix” if the n+m-1 distinct elements of A , With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A2 Special Matrices

493  a11 a  21     an 11  an1

an 2

       anm 

only appear in the first column and last row. Example: Hankel matrix of power sums

Let A   n m be a nm rectangular matrix ( n  m ) whose entries are power sums.  n    i xi  i 1  n  x2 A :  i 1 i i    n   xn i i   i 1

n

 x

2 i i

i 1 n

 x

3 i i

i 1

 n

 x i 1

n 1 i i

   n m 1     i xi  i 1      n    i xin  m 1   i 1 

n

 x i 1

m i i

A is a Hankel matrix. Definition (Vandermonde matrix):

Vandermonde matrix: V   nn  1  x V :  1  x1n 1

1  1  x2  xn     , n 1 n 1  x2  xn 

n

det V   ( xi  x j ). i, j i j

Example: Vandermonde matrix V   33 1 V :  x1  x12

1 x2 x22

1 x3  , det V  ( x2  x1 )( x3  x2 )( x3  x1 ). x32 

Example: Submatrix of a Hankel matrix of power sums

Consider the submatrix P  [a1 , a2 , , an ] of the Hankel matrix A   n m (n  m) whose entries are power sums. The determinant of the power sums matrix P is With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A2 Special Matrices

494 n

n

i 1

i 1

det P  (  i )( xi )(det V ) 2 ,

where det V is the Vandermonde determinant. Example: Submatrix P   33 of a 34 Hankel matrix of power sums (n=3,m=4) A  1 x1   2 x2   3 x3  x   x   x  2 2 2 1 x1   2 x2   3 x3  x   x   x 3 3 3 1 x1   2 x2   3 x3  x   x   x  2 1 1 3 1 1 4 1 1

2 2 2 3 2 2 4 2 2

2 3 3 3 3 3 4 3 3

1 x13   2 x23   3 x33 1 x14   2 x24   3 x34   1 x14   2 x24   3 x34 1 x15   2 x25   3 x35  1 x15   2 x25   3 x35 1 x16   2 x26   3 x36 

P  [a1 , a2 , a3 ]  1 x1   2 x2   3 x3 1 x12   2 x22   3 x32 1 x13   2 x23   3 x33   2 2 2 3 3 3 4 4 4 1 x1   2 x2   3 x3 1 x1   2 x2   3 x3 1 x1   2 x2   3 x3  . 1 x13   2 x23   3 x33 1 x14   2 x24   3 x34 1 x15   2 x25   3 x35   

Definitions (positive definite and positive semidefinite matrices)

A matrix A is called positive definite, if and only if xAx  0 x   n , x  0 .

A matrix A is called positive semidefinite, if and only if xAx  0 x   n .

An example follows. Example (idempotence):

All idempotent matrices are positive semidefinite, at the time BB and BB for an arbitrary matrix B . What are “permutation matrices” or “commutation matrices”? After their definitions we will give some applications. Definitions (permutation matrix, commutation matrix)

A matrix is called a permutation matrix if and only if each column of the matrix A and each row of A has only one element 1 . All other elements are zero. There holds AA   I . A matrix is called a commutation matrix, if and only if for a matrix of the order n 2  n 2 there holds K  K  and K 2  I n2 .

The commutation matrix is symmetric and orthonormal. With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A3 Scalar Measures and Inverse Matrices

495

Example (commutation matrix)

n2

1 0 K4   0  0

0 0 0 0 1 0   K 4 . 1 0 0  0 0 1

A general definition of matrices K nm of the order nm  nm with n  m are to found in J.R. Magnus and H. Neudecker (1988 p.46-48). This definition does not lead to a symmetric matrix anymore. Nevertheless is the transpose commutation matrix again a commutation matrix since we have K nm  K nm and K nm K mn  I nm . Example (commutation matrix)

n  2  m  3

n  3  m  2 

K 23

1 0 0  0 0 0  0

0 0 0 1 0 0 0

0 1 0 0 0 0 0

0 0 0 0 1 0 0

0 0 1 0 0 0 0

0 0 0 0 0 0 1 

K 32

1 0   0 0 0  0

0 0 1 0 0 0

0 0 0 0 1 0

0 1 0 0 0 0

0 0 0 1 0 0

0 0 0 0 0 1 

K 32 K 23  I 6  K 23 K 32 .

A3 Scalar Measures and Inverse Matrices We will refer to some scalar measures, also called scalar functions, of matrices. Beforehand we will introduce some classical definitions of type

 linear independence  column and row rank  rank identities. Definitions (linear independence, column and row rank):

A set of vectors x1 , ..., x n is called linear independent if for an arbitrary n linear combination  i 1 i xi  0 only holds if all scalars 1 , ...,  n disappear, that is if 1   2  ...   n 1   n  0 holds.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A3 Scalar Measures and Inverse Matrices

496

For all vectors which are characterized by x1 ,..., x n unequal from zero are called linear dependent. Let A be a rectangular matrix of the order O( Α)  n  m . The column rank of the matrix A is the largest number of linear independent columns, while the row rank is the largest number of linear independent rows. Actually the column rank of the matrix A is identical to its row rank. The rank of a matrix thus is called rk A .

Obviously,

rk A  min{n, m}.

If rk A  n holds, we say that the matrix A has full row ranks. In contrast if the rank identity rk A  m holds, we say that the matrix A has full column rank. We list the following important rank identities. Facts (rank identities):

(i)

rk A  rk A   rk A A  rk AA 

(ii)

rk( A  B )  rk A  rk B

(iii)

rk( A  B)  min{rk A, rk B}

(iv)

rk( A  B)  rk A if B has full row rank,

(v)

rk( A  B )  rk B if A has full column rank.

(vi)

rk( A  B  C)  rk B  rk( A  B)  rk( B  C)

(vii)

rk( A  B)  (rk A)  (rk B).

If a rectangular matrix of the order O( A)  n  m is fulfilled and, in addition, Ax  0 holds for a certain vector x  0 , then

rk A  m  1 . Let us define what is a rank factorization, the column space, a singular matrix and, especially, what is division algebra. Facts (rank factorization)

We call a rank factorization A  GF , if rk A  rk G  rk F holds for certain matrices G and F of the order

O(G )  n  rk A and O(F)  rk A  m. With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

497

A3 Scalar Measures and Inverse Matrices

Facts

A matrix A has the column space  ( A)

formed by the column vectors. The dimension of such a vector space is dim  ( A)  rk A . In particular,  ( A)   ( AA)

holds. Definition (non-singular matrix versus singular matrix)

Let a quadratic matrix A of the order O( A) be given. A is called nonsingular or regular if rk A  n holds. In case rk A  n, the matrix A is called singular. Definition (division algebra):

Let the matrices A, B, C be quadratic and non-singular of the order O( A)  O(B)  O(C)  n  n . In terms of the Cayley-product an inner relation can be based on A  [aij ], B  [bij ], C  [cij ], O( A)  O(B)  O(C)  n  n

(i)

( A  B )  C  A  ( B  C)

(ii)

AI  A

(identity)

(iii)

A  A 1  I

(inverse).

(associativity)

The non-singular matrix A 1  B is called Cayley-inverse. The conditions A  B  In  B  A  In

are equivalent. The Cayley-inverse A 1 is left and right identical. The Cayleyinverse is unique. Fact: ( A 1 )   ( A ) 1 : A is symmetric  A 1 is symmetric. Facts: (Inverse Partitional Matrix /IPM/ of a symmetric matrix):

Let the symmetric matrix A be partitioned as A A :  11   A 12

A 12    A 11 , A 22  A 22 . , A 11 A 22 

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

498

A3 Scalar Measures and Inverse Matrices

Then its Cayley inverse A 1 is symmetric and can be partitioned as well as A A 1   11   A 12

A 12  A 22 

1 1 1 [I  A 11  A 11  ]A 11 A 12 ( A 22  A 12 A 12 ) 1 A 12  1 1  A 11  A 11 A 12 ) 1 A 12  ( A 22  A 12 

1



1 1  A 11 A 12 ( A 22  A 12 A 12 ) 1   A 11 , 1  A 11 ( A 22  A 12 A 12 ) 1 

1 if A 11 exists ,

A

A   11   A 12

1

A 12  A 22 

1



   A 221 A 12 ) 1  A 221 A 12 ) 1 A 12 A 221 ( A 11  A 12  ( A 11  A 12 ,  1 1 1 1 1 1 1      A A ( A A A A ) [ I A A ( A A A A ) A ] A     22 12 11 12 22 12 22 12 11 12 22 12 12 22   1 if A 22 exists . 1 1  A 11  A 22 S 11 : A 22  A 12 A 12 and S 22 : A 11  A 12 A 12

are the minors determined by properly chosen rows and columns of the matrix A called “Schur complements” such that A

A   11   A 12

1

1 1 1 (I  A 11  ) A 11 A 12 S 11 A 12  1 1  A 11  S 11 A 12 

A 12  A 22 

1



1 1   A 11 A 12 S 11  1 S 11 

1 if A 11 exists ,

A A 1   11   A 12  S 221  1 1  S 22  A 22 A 12

A 12  A 22 

1



1   S 221 A 12 A 22 1 1 1   S 22 A 12 ]A 22  [I  A 22 A 12

if A 221 exists ,

are representations of the Cayley inverse partitioned matrix A 1 in terms of “Schur complements”.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

499

A3 Scalar Measures and Inverse Matrices

The formulae S11 and S 22 were first used by J. Schur (1917). The term “Schur complements” was introduced by E. Haynsworth (1968). A. Albert (1969) replaced the Cayley inverse A 1 by the Moore-Penrose inverse A  . For a survey we recommend R. W. Cottle (1974), D.V. Oullette (1981) and D. Carlson (1986). :Proof:

For the proof of the “inverse partitioned matrix” A 1 (Cayley inverse) of the partitioned matrix A of full rank we apply Gauss elimination (without pivoting). AA 1  A 1 A  I A A   11   A 12

A 12    A 11 , A 22  A 22 , A 11 A 22 

 A   mm , A   ml 12  11 l m l l   , A 22    A 12 B A 1   11  B 12

B 12    B 11 , B 22  B 22 , B 11 B 22 

B   mm , B   ml 12  11 l m l l   , B 22    B12 AA 1  A 1 A  I



  B11A11  B12 A12   Im  A11B11  A12 B12 A B  A B  B A  B A  0 12 22 11 12 12 22  11 12  B11  A 22 B12   B12  A11  B 22 A12  0  A12   B12  A 22 B 22  B12  A12  B 22 A 22  I l  A12

(1) (2) (3) (4).

1 Case (i): A 11 exists

“forward step”   I m (first left equation: A11B11  A12 B12

  1  multiply by  A12 A11 )   B11  A 22 B12   0 (second right equation)  A12 



1 1   B 11  A 12  A 11   A 12  A 11 A 12 B12  A 12    B11  A 22 B 12  0 A 12 

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

500

A3 Scalar Measures and Inverse Matrices

  Im  A B  A 12 B 12   11 11  1 1  A 11 A 12 )B 12    A 12  A 11 ( A 22  A 12 1 1   ( A 22  A 12  A 11  A 11 B 12 A 12 ) 1 A 12 1 1   S 11 A 12  A 11 B 12

or

 Im   A A 1  12 11

0   A11

I l   A12

A12 

 A11   A 22   0

 . A 22  A12 A11 A12  A12

1

1  A 11 Note the “Schur complement” S 11 : A 22  A 12 A 12 .

“backward step”   Im A 11B 11  A 12 B12   1 1 1     ( A 22  A 12  A 11 A 12 ) A 12  A 11  B12 1 1  )  (I m  B 12 A 12  ) A 11  B11  A 11 (I m  A 12 B 12 1 1 1  A 11  ]A 11 B 11  [I m  A 11 A 12 ( A 22  A 12 A 12 ) 1 A 12 1 1 1 1  A 11 B 11  A 11  A 11 A 12 S 11 A 12

A11B12  A12 B 22  0 (second left equation)  1 1 1  A 11  B 12   A 11 A 12 B 22   A 11 A 12 ( A 22  A 12 A 12 ) 1

 1  A11 B 22  ( A 22  A12 A12 ) 1 1 B 22  S11 .

Case (ii): A 221 exists

“forward step” A11B12  A12 B 22  0 (third right equation)   B12  A 22 B 22  I l (fourth left equation:   A12  1 multiply by  A12 A 22 )  

A 11B 12  A 12 B 22  0   1  B 12  A 12 B 22   A 12 A 221   A 12 A 22 A 12

 A  B  A 22 B 22  I l   12 12  1  )B 12   A 12 A 221 ( A 11  A 12 A 22 A 12

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

501

A3 Scalar Measures and Inverse Matrices 1  ) 1 A 12  A 221 B 12  ( A 11  A 12 A 22 A 12 1 1 B 12  S 22 A 12 A 22

or I m  0

1   A 11  A 12 A 22   Il   A 12

1 A 12   A 11  A 12 A 22  A 12   A 22    A 12

0  . A 22 

1  . A 12 Note the “Schur complement” S 22 : A 11  A 12 A 22

“backward step”  B12  A 22 B 22  I l A 12  ) A 12 A B12   ( A 11  A 12 A A 12 1 22

1

1 22

   

1 1  B12  )  (I l  B12  A 12 ) A 22  B 22  A 22 (I l  A 12

 ( A 11  A 12 A 221 A 12  ) 1 A 12 ]A 221 B 22  [I l  A 221 A 12 1 1 1 1  S 22 A 12 A 22 B 22  A 22  A 22 A 12   0 ( third left equation )   B 11  A 22 B 12 A 12 1 1 1    A 22  B11   A 22  ( A 11  A 12 A 22  ) 1  B 12 A 12 A 12 A 12

 B 1 1  ( A 1 1  A 1 2 A 2 21 A 1 2 )  1 B 1 1  S 2 21 .

  , B 22 } in terms of { A11 , A12 , A 21 = A12  , The representations { B11 , B12 , B 21  B12 A 22 } have been derived by T. Banachiewicz (1937). Generalizations are referred to T. Ando (1979), R. A. Brunaldi and H. Schneider (1963), F. Burns, D. Carlson, E. Haynsworth and T. Markham (1974), D. Carlson (1980), C. D. Meyer (1973) and S. K. Mitra (1982), C. K. Li and R. Mathias (2000).

We leave the proof of the following fact as an exercise. Fact (Inverse Partitioned Matrix /IPM/ of a quadratic matrix):

Let the quadratic matrix A be partitioned as A A :  11  A 21

A 12  . A 22 

Then its Cayley inverse A 1 can be partitioned as well as With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

502

A3 Scalar Measures and Inverse Matrices

A A 1   11  A 21

A 12  A 22 

1 1 1 1  A 11 A 12 S 11 A 21 A 11  A 11  1 1 A 21 A 11  S 11 

1



1 1  A 12 S 11  A 11 , 1 S 11 

1 if A 11 exists

A

1

 S 221  1 1  A 22 A 21S 22

A   11  A 21

A 12  A 22 

1



 ,  A A 21S A 12 A   S 221 A 12 A 221

A

1 22

1 22

1 22

1 22

1 if A 22 exists

and the “Schur complements” are definded by 1 1 S 11 : A 22  A 21 A 11 A 12 and S 22 : A 11  A 12 A 22 A 21 .

Facts: ( Cayley inverse: sum of two matrices):

(s1)

( A + B) 1  A 1  A 1 ( A 1  B 1 ) 1 A 1

(s2)

( A  B) 1  A 1  A 1 ( A 1  B 1 ) 1 A 1

(s3)

( A  CBD) 1  A 1  A 1 (I  CBDA 1 ) 1 CBDA 1

(s4)

( A  CBD) 1  A 1  A 1 (I  BDA 1C) 1 BDA 1

(s5)

( A  CBD) 1  A 1  A 1CB(I  DA 1CB) 1 DA 1

(s6)

( A  CBD) 1  A 1  A 1CBD(I  A 1CBD) 1 A 1

(s7)

( A  CBD) 1  A 1  A 1CBDA 1 (I  CBDA 1 ) 1

(s8)

( A  CBD) 1  A 1  A 1C(B 1  DA 1C) 1 DA 1

( Sherman-Morrison-Woodbury matrix identity ) (s9)

B( AB  C) 1  (I  BC1 A) 1 BC1

(s10)

BD( A  CBD) 1  (B 1  DA 1C) 1 DA 1

(Duncan-Guttman matrix identity). W. J. Duncan (1944) calls (s8) the Sherman-Morrison-Woodbury matrix identity. If the matrix A is singular consult H. V. Henderson and G. S. Searle (1981), D. V. Ouellette (1981), W. M. Hager (1989), G. W. Stewart (1977) and K. S. Riedel

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A3 Scalar Measures and Inverse Matrices

503

(1992). (s10) has been noted by W. J. Duncan (1944) and L. Guttman (1946): The result is directly derived from the identity ( A  CBD)( A  CBD) 1  I   A( A  CBD) 1  CBD( A  CBD) 1  I ( A  CBD) 1  A 1  A 1CBD( A  CBD) 1 A 1  ( A  CBD) 1  A 1CBD( A  CBD) 1 DA 1  D( A  CBD) 1  DA 1CBD( A  CBD) 1 DA 1  (I  DA 1CB)D( A  CBD) 1 DA 1  (B 1  DA 1C)BD( A  CBD) 1 (B 1  DA 1C) 1 DA 1  BD( A  CBD) 1 .

 Certain results follow directly from their definitions. Facts (inverses):

(i)

( A ⋅ B)-1 = B-1 ⋅ A-1

(ii)

( A Ä B)-1 = B-1 Ä A-1

(iii)

A positive definite  A-1 positive definite

(iv)

( A Ä B)-1 , ( A * B)-1 and (A-1 * B-1 ) are positive definite, then (A-1 * B-1 ) - ( A * B)-1 is positive semidefinite as well as (A-1 * A ) - I and I - (A-1 * A)-1 .

Facts (rank factorization):

(i) If the n ´ n matrix is symmetric and positive semidefinite, then its rank factorization is G  A   1  G1 G 2  , G 2 

where G1 is a lower triangular matrix of the order O(G1 )  rk A  rk A with rk G 2  rk A ,

whereas G 2 has the format O(G 2 )  (n  rk A)  rk A. In this case we speak of a Choleski decomposition.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

504

A3 Scalar Measures and Inverse Matrices

(ii) In case that the matrix A is positive definite, the matrix block G 2 is not needed anymore: G1 is uniquely determined. There holds A 1  (G11 )G11 . Beside the rank of a quadratic matrix A of the order O( A)  n  n as the first scalar measure of a matrix, is its determinant A 



n

perm ( j1 ,..., jn )

(1) ( j1 ,..., jn )  aiji i 1

plays a similar role as a second scalar measure. Here the summation is extended as the summation perm ( j1 , , jn ) over all permutations ( j1 ,..., jn ) of the set of integer numbers (1, , n) .  ( j1 , , jn ) is the number of permutations which transform (1, , n) into ( j1 , , jn ) . Laws (determinant)

(i)

|   A |   n  | A | for an arbitrary scalar   

(ii)

| A  B || A |  | B |

(iii)

| A  B || A |m  | B |n for an arbitrary m  n matrix B

(iv)

| A  || A |

(vi)

1 | (A  A ) || A | if A  A is positive definite 2 | A 1 || A |1 if A 1 exists

(vii)

| A | 0  A is singular ( A 1 does not exist)

(viii)

| A | 0 if A is idempotent, A  I

(ix)

| A |  aii if A is diagonal and a triangular matrix

(v)

n

i 1

n

(x)

0 | A |  aii | A  I | if A is positive definite i 1

n

(xi)

| A |  | B |  | A |  bii | A  B | if A and B are posii 1

tive definite

(xii)

 A11 A  21

1 det A11 det( A 22  A 21 A11 A12 )  m m , rk A11  m1 A12   A11     1 A 22  det A 21 det( A11  A12 A 22 A 21 )   A   m  m , rkA  m . 22 22 2  1

2

1

2

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

505

A3 Scalar Measures and Inverse Matrices

A submatrix of a rectangular matrix A is the result of a canceling procedure of certain rows and columns of the matrix A. A minor is the determinant of a quadratic submatrix of the matrix A. If the matrix A is a quadratic matrix, to any element aij there exists a minor being the determinant of a submatrix of the matrix A which is the result of reducing the i-th row and the j-th column. By multiplying with ( 1)i  j we gain a new element cij of a matrix C  [cij ] . The transpose matrix C is called the adjoint matrix of the matrix A, written adjA . Its order is the same as of the matrix A. Laws (adjoint matrix) n

(i)

| A |  aij cij , i  1, , n j 1 n

(ii)

| A |  a jk c jk , k  1, , n j 1

(iii)

A  (adj A)  (adj A)  A  | A | I

(iv)

adj( A  B)  (adj B)  (adj A )

(v)

adj( A  B)  (adj A)  (adj B)

(vi)

adj A | A | A 1 if A is nonsingular

(vii)

adjA positive definitive  A positive definite.

As a third scalar measure of a quadratic matrix A of the order O( A)  n  n we introduce the trace tr A as the sum of diagonal elements, n

tr A   aii . i 1

Laws (trace of a matrix)

(i)

tr(  A)    tr A for an arbitrary scalar   

(ii)

tr( A  B)  tr A  tr B for an arbitrary n  n matrix B

(iii)

tr( A  B)  (tr A)  (tr B) for an arbitrary m  m matrix B

iv) (v)

tr A  tr(B  C) for any factorization A = B  C tr A (B  C)  tr( A   B)C for an arbitrary n  n matrix B and C tr A   tr A trA  rkA if A is idempotent 0  tr A  tr ( A  I ) if A is positive definite

(vi) (vii) (viii) (ix)

tr( A  B)  (trA)  (trB) if A und  are positive semidefinite.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A3 Scalar Measures and Inverse Matrices

506

In correspondence to the W – weighted vector (semi) – norm. || x ||W  (x W x)1/ 2

is the W – weighted matrix (semi) norm || A ||W  (trA WA)1/ 2

for a given positive – (semi) definite matrix W of proper order. Laws (trace of matrices): tr AWA  0 (i) (ii) tr A WA  0  WA  0  A  0 if W is positive definite

A4 Vector-valued Matrix Forms If A is a rectangular matrix of the order O( A)  n  m , a j its j – th column, then vec A is an nm  1 vector  a1  a   2  vec A     .    an 1   an  In consequence, the operator “vec” of a matrix transforms a vector in such a way that the columns are stapled one after the other.

Definitions ( vec, vech, veck ):

(i)

 a1  a   2  vec A     .    an 1   an 

(ii) Let A be a quadratic symmetric matrix, A  A  , of order O( A)  n  n . Then vechA (“vec - koef”) is the [n(n  1) / 2]  1 vector which is the result of row (column) stapels of those matrix elements which are upper and under of its diagonal.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A4 Vector-valued Matrix Forms

507

 a11       an1  a A  [aij ]  [a ji ]  A  vechA :  22  .  a   n2     ann  (iii) Let A be a quadratic, antisymmetric matrix, A  A  , of order O( A)  n  n . Then veckA (“vec - skew”) is the [n(n  1) / 2] 1 vector which is generated columnwise stapels of those matrix elements which are under its diagonal.  a11        an1   a  A  [aij ]  [a ji ]   A  veckA :  32  .   a   n2      an, n 1 

Examples

(i)

a b A d e

c  vecA  [a, d , b, e, c, f ] f 

(ii)

a b A   b d  c e

c e   A  vechA  [a, b, c, d , e, f ] f 

(iii)

 0  a b  a 0 d A b d 0  c e f 

c  e    A  veckA  [a, b, c, d , e, f ] . f  0 

Useful identities, relating to scalar- and vector - valued measures of matrices will be reported finally. Facts (vec and trace forms): vec(A  B  C)  (C  A) vec B (i)

(ii)

vec(A  B)  (B  I n ) vec A  (B  A) vec I m   (I1  A ) vec B, A   n m , B   m q

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

508 (iii)

A  B  c  (c  A)vecB  ( A  c)vecB, c   q

(iv)

tr( A  B)  (vecA )vecB  (vecA )vecB  tr( A  B)

(v)

tr(A  B  C  D)  (vec D)(C  A ) vec B   (vec D)( A  C) vec B

(vi)

K nm  vecA  vecA, A   n m

(vii)

K qn (A  B)  (B  A)K pm

(viii)

K qn (A  B)K mp  (B  A )

(ix)

K qn (A  c)  c  A

(x)

K nq (c  A)  A  c, A   nm , B   q p , c   q

(xi)

vec(A  B)  (I m  K pn  I q )(vecA  vecB)

(xii)

A  (a1 , , a m ), B : Diagb, O(B)  m  m, m

C  [c1 , , c m ]  vec(A  B  C)  vec[ (a j b j cj )]  j 1

m

  (c j  a j )b j  [c1  a1 , , c m  a m )]b  (C  A)b j 1

(xiii)

A  [aij ], C  [cij ], B : Diagb, b = [b1 , ,b m ]   m

 tr(A  B  C  B)  (vec B) vec(C  B  A )   b(I m  I m )  ( A  C)b  b( A  C)b

(xiv)

B := I m  tr( A  C)  rm ( A  C)rm ( rm is the m  1 summation vector: rm : [1, ,1]   m )

(xv)

vec DiagD : (I m  D)rm  [I m  ( A  B  C)]rm   (I m  I m )  [I m  ( A   B  C)]  vec DiagI m   (I m  I m )  vec( A   B  C)   (I m  I m )  (C  A )vecB  (C  A)vecB when D  A   B  C is factorized.

Facts (Löwner partial ordering):

For any quadratic matrix A   mm there holds the uncertainty I m  ( A   A)  I m  A  A  I m  [( A  I m )  (I m  A)] in the Löwner partial ordering that is the difference matrix I m  ( A  A)  I m  A  A is at least positive semidefinite.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A5 Eigenvalues and Eigenvectors

509

A5 Eigenvalues and Eigenvectors To any quadratic matrix A of the order O( A)  m  m there exists an eigenvalue  as a scalar which makes the matrix A   I m singular. As an equivalent statement, we say that the characteristic equation  I m  A  0 has a zero value which could be multiple of degrees, if s is the dimension of the related null space  ( A   I ) . The non-vanishing element x of this null space for which Ax   x, x  0 holds, is called right eigenvector of A. Related vectors y for which y A = λy , y  0 , holds, are called left eigenvectors of A and are representative of the right eigenvectors A’. Eigenvectors always belong to a certain eigenvalue and are usually normed in the sense of xx  1, y y  1 as long as they have real components. As the same time, the eigenvectors which belong to different eigenvalues are always linear independent: They obviously span a subspace of  ( A) . In general, the eigenvalues of a matrix A are complex! There is an important exception: the orthonormal matrices, also called rotation matrices whose eigenvalues are +1 or, –1 and idempotent matrices which can only be 0 or 1 as a multiple eigenvalue generally, we call a null eigenvalue a singular matrix. There is the special case of a symmetric matrix A = A  of order O( A)  m  m . It can be shown that all roots of the characteristic polynomial are real numbers and accordingly m - not necessary different - real eigenvalues exist. In addition, the different eigenvalues  and  and their corresponding eigenvectors x and y are orthogonal, that is (   )x  y  (x  A )  y  x( A  y )  0,     0.

In case that the eigenvalue  of degrees s appears s-times, the eigenspace  ( A    I m ) is s - dimensional: we can choose s orthonormal eigenvectors which are orthonormal to all other! In total, we can organize m orthonormal eigenvectors which span the entire  m . If we restrict ourselves to eigenvectors and to eigenvalues  ,   0 , we receive the column space  ( A) . The rank of A coincides with the number of non-vanishing eigenvalues {1 , , r }. U : [U1 , U 2 ], O(U)  m  m, U  U  U U  I m

U1 : [u1 , , u r ], O(U1 )  m  r , r  rkA U 2 : [u r 1 , , u m ], O(U 2 )  m  (m  r ), A  U 2  0.

With the definition of the r  r diagonal matrix  : Diag(1 , r ) of nonvanishing eigenvalues we gain  0 A  U  A  [U1 , U 2 ]  [U1, 0]  [U1 , U 2 ]  .  0 0

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

510

A5 Eigenvalues and Eigenvectors

Due to the orthonormality of the matrix U : [U1 , U 2 ] we achieve the results about eigenvalue – eigenvector analysis and eigenvalues – eigenvector synthesis. Lemma (eigenvalue – eigenvector analysis: decomposition):

Let A  A be a symmetric matrix of the order O( A)  m  m . Then there exists an orthonormal matrix U in such a way that UAU  Diag(1 , r , 0, , 0)

holds. (1 , r ) denotes the set of non – vanishing eigenvalues of A with r  rkA ordered decreasingly. Lemma (eigenvalue – eigenvectorsynthesis: decomposition):

Let A  A be a symmetric matrix of the order O ( A )  m  m . Then there exists a synthetic representation of eigenvalues and eigenvectors of type A  U  Diag(1 , r , 0, , 0)U   U1U1 .

In the class of symmetric matrices the positive (semi)definite matrices play a special role. Actually, they are just the positive (nonnegative) eigenvalues squarerooted. 1/ 2 : Diag( 1 , , r ) .

The matrix A is positive semidefinite if and only if there exists a quadratic m  m matrix G such that A  GG  holds, for instance, G : [u11/ 2 , 0] . The quadratic matrix is positive definite if and only if the m  m matrix G is not singular. Such a representation leads to the rank fatorization A  G1  G1 with G1 : U1  1/ 2 . In general, we have Lemma (representation of the matrix U1 ):

If A is a positive semidefinite matrix of the order O( A) with non – vanishing eigenvalues {1 , , r } , then there exists an m  r matrix U1 : G1   1  U1   1/ 2

with U1  U1  I r ,  (U1 )   (U1 )   ( A),

such that U1  A  U1  (

1/ 2

 U1 )  (U1    U1 )  (U1   1/ 2 )  I r .

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

A5 Eigenvalues and Eigenvectors

511

The synthetic relation of the matrix A is A  G1  G1  U1   1  U1 .

The pseudoinverse has a peculiar representation if we introduce the matrices U1 , U1 and  1 . Definition (pseudoinverse):

If we use the representation of the matrix A of type A  G1  G1  U1U1 then A  : U1  U1  U1   1  U1

is the representation of its pseudoinverse namely (i)

AA  A  (U1U1 )(U1 1U1 )(U1U1 )  U1U1

(ii) A  AA   (U1 1U1 )(U1U1 )(U1 1U1 )  U1 1U1  A  (iii) AA   (U1U1 )(U1 1U1 )  U1U1  ( AA  ) (iv) A  A  (U1 1U1 )(U1U1 )  U1U1  ( A  A ) . The pseudoinverse A  exists and is unique, even if A is singular. For a nonsingular matrix A, the matrix A  is identical with A 1 . Indeed, for the case of the pseudoinverse (or any other generalized inverse) the generalized inverse of a rectangular matrix exists. The singular value decomposition is an excellent tool which generalizes the classical eigenvalue – eigenvector decomposition of symmetric matrices. Lemma (Singular value decomposition):

(i) Let A be an n  m matrix of rank r : rkA  min(n, m) . Then the matrices AA and AA are symmetric positive (semi) definite matrices whose nonvanishing eigenvalues {1 , r } are positive. Especially r  rk( AA)  rk( AA)

holds. A A contains 0 as a multiple eigenvalue of degree m  r , and AA has the multiple eigenvalue of degree n  r . (ii) With the support of orthonormal eigenvalues of AA and AA  we are able to introduce an m  m matrix V and an n  n matrix U such that UU  U U  I n , VV   V V  I m holds and UAAU  Diag(12 , , r 2 , 0, , 0), V A AV  Diag(12 , , r 2 , 0, , 0).

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

512

A5 Eigenvalues and Eigenvectors

The diagonal matrices on the right side have different formats m  m and m  n . (iii)

The original n  m matrix A can be decomposed according to  0 U AV    , O(UAV )  n  m  0 0

with the r  r diagonal matrix  : Diag(1 , , r )

of singular values representing the positive roots of nonvanishing eigenvalues of AA and AA . (iv)

A synthetic form of the n  m matrix A is  0 A  U   V .  0 0

We note here that all transformed matrices of type T1 AT of a quadratic matrix have the same eigenvalues as A  ( AT)T1 being used as often as an invariance property. ?what is the relation between eigenvalues and the trace, the determinant, the rank? The answer will be given now. Lemma (relation between eigenvalues and other scalar measures):

Let A be a quadratic matrix of the order O( A)  m  m with eigenvalues in decreasing order. Then we have m

m

j 1

j 1

| A |   j , trA    j , rkA  trA ,

if A is idempotent. If A  A is a symmetric matrix with real eigenvalues, then we gain 1  max{a jj | j  1, , m},

m  min{a jj | j  1, , m}. At the end we compute the eigenvalues and eigenvectors which relate the variation problem xAx  extr subject to the condition xx  1 , namely xAx   (xx)  extr . x, 

The eigenvalue  is the Lagrange multiplicator of the optimization problem.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

513

A6 Generalized Inverses

A6 Generalized Inverses Because the inversion by Cayley inversion is only possible for quadratic nonsingular matrices, we introduce a slightly more general definition in order to invert arbitrary matrices A of the order O( A)  n  m by so – called generalized inverses or for short g – inverses. An m  n matrix G is called g – inverse of the matrix A if it fulfils the equation AGA  A in the sense of Cayley multiplication. Such g – inverses always exist and are unique if and only if A is a nonsingular quadratic matrix. In this case G  A 1 if A is invertible,

in other cases we use the notation G  A  if A 1 does not exist.

For the rank of all g – inverses the inequality

r : rk A  rk A   min{n, m} holds. In reverse, for any even number d in this interval there exists a g – inverse A  such that d  rkA   dim  ( A  ) holds. Especially even for a singular quadratic matrix A of the order O( A)  n  n there exist g-inverses A  of full rank rk A   n . In particular, such g-inverses A r are of interest which have the same rank compared to the matrix A, namely rkA r  r  rkA .

Those reflexive g-inverse A r are equivalent due to the additional condition A r AA r  A r

but are not necessary symmetric for symmetric matrices A. In general, A  A and A  g-inverse of A   ( A  ) g-inverse of A  A rs : A  A( A  ) is reflexive symmetric g  inverse of A.

For constructing of A rs we only need an arbitrary g-inverse of A. On the other side, A rs does not mean unique. There exist certain matrix functions which are independent of the choice of the g-inverse. For instance,

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

514

A6 Generalized Inverses

A ( A A )  A and A ( AA ) 1 A

can be used to generate special g-inverses of AA or AA  . For instance, A  : ( AA)  A and A m : A ( AA ) 

have the special reproducing properties A( A A)  A A  AA  A  A and AA ( AA )  A  AA m A  A ,

which can be generalized in case that W and S are positive semidefinite matrices to WA ( A WA )  A WA  WA ASA ( ASA )  AS  AS ,

where the matrices WA ( A WA )  A W and SA ( ASA )  AS

are independent of the choice of the g-inverse ( A WA )  and ( ASA )  . A beautiful interpretation of the various g-inverses is based on the fact that the matrices ( AA  )( AA  )  ( AA  A ) A   AA  and ( A  A )( A  A )  A  ( AA  A )  A  A

are idempotent and can therefore be geometrically interpreted as projections. The image of AA  , namely  ( AA  )   ( A)  {Ax | x   m }   n ,

can be completed by the projections A  A along the null space  ( A  A )   ( A )  {x | Ax  0}   m .

By the choice of the g – inverse we are able to choose the projected direction of AA  and the image of the projections A  A if we take advantage of the complementary spaces of the subspaces  ( A  A)   ( A  A)   m and  ( AA  )   ( AA  )   n

by using the symbol " " as the sign of “direct sum” of linear spaces which only have the zero element in common. Finally we have use the corresponding dimensions dim  ( A  A )  r  rkA  dim  ( AA  ) 

dim  ( A  A)  m  rkA  m  r    dim  ( AA )  n  rkA  n  r With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

515

A6 Generalized Inverses

independent of the special rank of the g-inverses A  which are determined by the subspaces  ( A  A) and  ( AA  ) , respectively.

 ( AA )

( A  A )

 (A  A)

 ( AA  ) in  n

in  m

Example (geodetic networks):

In a geodetic network, the projections A  A correspond to a S – transformations in the sense of W. Baarda (1973). Example ( A  and A m g-inverses):

The projections AA   A( A A)  A  guarantee that the subspaces  ( AA  ) and  ( AA  ) are orthogonal to each other. The same holds for the subspaces  ( A m A) and  ( A m A) of the projections A m A  A ( AA )  A. In general, there exist more than one g-inverses which lead to identical projections AA  and A  A . For instance, following A. Ben – Israel, T. N. E. Greville (1974, p.59) we learn that the reflexive g-inverse which follows from A r  ( A  A) A  ( AA  )  A  AA 

contains the class of all reflexive g-inverses. Therefore it is obvious that the reflexive g-inverses A r contain exact by one pair of projections AA  and A  A and conversely. In the special case of a symmetric matrix A , A  A  , and n  m we know due to  ( AA  )   ( A )   ( A )   ( A )   ( A  A )

that the column spaces  ( AA  ) are orthogonal to the null space  ( A  A) illustrated by the sign ”  ”. If these complementary subspaces  ( A  A) and  ( AA  ) are orthogonal to each other, the postulate of a symmetric reflexive ginverse agrees to A rs : ( A  A) A  ( A  A)  A  A( A  ) ,

if A  is a suited g-inverse.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

516

A6 Generalized Inverses

There is no insurance that the complementary subspaces  ( A  A ) and  ( A  A ) and  ( AA  ) and  ( AA  ) are orthogonal. If such a result should be reached, we should use

the uniquely defined pseudoinverse A  , also called Moore-Penrose inverse for which holds  ( A  A )   ( A  A ),  ( AA  )   ( AA  )

or equivalent AA  ( AA  ), A  A  ( A  A ). 

If we depart from an arbitrary g-inverse ( AA  A)  , the pseudoinverse A  can be build on A  : A ( AAA)  A (Zlobec formula)

or A : A( AA) A( A A) A  (Bjerhammar formula) , 





if both the g-inverses ( AA)  and ( A A)  exist. The Moore-Penrose inverse fulfils the Penrose equations: (i) AA  A  A (g-inverse) (ii) A  AA   A  (reflexivity) (iii) AA   ( AA  )  Symmetry due to orthogonal projection . (iv) A  A  ( A  A) 

Lemma (Penrose equations)

Let A be a rectangular matrix A of the order O( A) be given. A ggeneralized matrix inverse which is rank preserving rk( A)  rk( A  ) fulfils the axioms of the Penrose equations (i) - (iv). For the special case of a symmetric matrix A also the pseudoinverse A  is symmetric, fulfilling  ( A  A)   ( AA  )   ( AA  )   ( A  A) ,

in addition 

2 

A  A( A ) A  A( A 2 )  A( A 2 )  A.

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

517

A6 Generalized Inverses

Various formulas of computing certain g-inverses, for instance by the method of rank factorization, exist. Let A be an n  m matrix A of rank r : rkA such that A  GF, O(G )  n  r , O(F)  r  m .

Due to the inequality r  rk G   min{r , n}  r only G posesses reflexive ginverses G r , because of I r  r  [(G G ) 1 G ]G  [(G G ) 1 G ](GG r G )  G r G

represented by left inverses in the sense of G L G  I. In a similar way, all ginverses of F are reflexive and right inverses subject to Fr : F (FF ) 1 . The whole class of reflexive g-inverses of A can be represented by A r : Fr G r  Fr G L .

In this case we also find the pseudoinverse, namely A  : F (FF ) 1 (G G ) 1 G 

because of  ( A  A)   (F )   (F)   ( A  A)   ( A)  ( AA  )   (G )   (G )   ( AA  )   ( A ).

If we want to give up the orthogonality conditions, in case of a quadratic matrix A  GF , we could take advantage of the projections A r A  AA r

we could postulate  ( A p A)   ( AA r )   (G ) ,  ( A A r )   ( A r A)   (F ) .

In consequence, if FG is a nonsingular matrix, we enjoy the representation A r : G (FG ) 1 F ,

which reduces in case that A is a symmetric matrix to the pseudoinverse A  .

Dual methods of computing g-inverses A  are based on the basis of the null space, both for F and G, or for A and A . On the first side we need the matrix EF by FEF  0, rkEF  m  r versus G EG   0, rkEG   n  r on the other side. The enlarged matrix of the order (n  r  r )  (n  m  r ) is automatically nonsingular and has the Cayley inverse

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

518

A6 Generalized Inverses

A E  F

1

EG    A    0   EG 

EF   0

with the pseudoinverse A  on the upper left side. Details can be derived from A. Ben – Israel and T. N. E. Greville (1974 p. 228).

If the null spaces are always normalized in the sense of  EF | EF  I m  r ,  EG  | EG   I n  r

because of E  EF  EF | EF  1  EF  F

and EG   EG  | EG   1 EG   EG 

A E  F

1

EG   A   0   EF

EG    . 0 

These formulas gain a special structure if the matrix A is symmetric to the order O( A) . In this case EG   EF : E , O(E)  (m  r )  m , rk E  m  r

and 1

 A E   E | E   1   A E     E 0  1 0     E | E  E 

on the basis of such a relation, namely EA   0 there follows I m  AA   E  E | E  1 E   ( A  EE)[ A   E(EEEE) 1 E]

and with the projection (S - transformation) A  A  I m  E  E | E  1 E  ( A  EE) 1 A

and A   ( A  EE) 1  E(EEEE) 1 E

pseudoinverse of A  ( A A)   ( AA  )   ( A)   ( A)   (E) . 

In case of a symmetric, reflexive g-inverse A rs there holds the orthogonality or complementary

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

519

A6 Generalized Inverses

 ( A rs A )   ( AA rs )

 ( AA rs ) complementary to  ( AA rs ) ,

which is guaranteed by a matrix K , rk K  m  r , O(K )  (m  r )  m such that KE is a non-singular matrix. At the same time, we take advantage of the bordering of the matrix A by K and K  , by a non-singular matrix of the order (2m  r )  (2m  r ) . 1

 A rs K R   A K  . K 0      0     (K R ) K R : E(KE) 1 is the right inverse of A . Obviously, we gain the symmetric reflexive g-inverse A rs whose columns are orthogonal to K  : R( A rs A )  R(K )   ( AA rs )

KA rs  0



 I m  AA  K (EK ) 1 E   rs

 ( A  K K )[ A rs  E(EK EK ) 1 E]

and projection (S - transformation) A A  I m  E(KE) 1 K  ( A  K K ) 1 A  ,  rs

A rs  ( A  K K ) 1  E(EK EK ) 1 E .

symmetric reflexive g-inverse For the special case of a symmetric and positive semidefinite m  m matrix A the matrix set U and V are reduced to one. Based on the various matrix decompositions L 0   U1  A   U1 , U 2       U1 AU1 ,  0 0   U 2  we find the different g - inverses listed as following. L1 A   U1 , U 2    L 21

  U1    . L 21LL12   U 2  L12

Lemma (g-inverses of symmetric and positive semidefinite matrices):

(i)

L1 A    U1 , U 2    L 21

L12   U1    , L 22   U2 

With permission from the author: Extract from "Erik W. Grafarend: Linear and nonlinear models. Fixed effects, random effects, and mixed models"

520

A6 Generalized Inverses

(ii) reflexive g-inverse   U1     L 21LL12   U 2 

L1 A r   U1 , U 2    L 21

L12

(iii) reflexive and symmetric g-inverse L1 L12   U1  A rs   U1 , U 2      L12 L12 LL12   U2  (iv) pseudoinverse L1 A    U1 , U 2    0

0   U1  1     U1L U1 . 0   U 2 

We look at a representation of the Moore-Penrose inverse in terms of U_2, the basis of the null space \(\mathcal{N}(A) = \mathcal{N}(A')\). In these terms we find E' := U_2 and

\[
\begin{bmatrix} A & U_2 \\ U_2' & 0 \end{bmatrix}^{-1}
= \begin{bmatrix} A^{+} & U_2 \\ U_2' & 0 \end{bmatrix}.
\]

By means of the fundamental relations

\[
A^{+}A = AA^{+} = \lim_{\delta \to 0}(A + \delta I_m)^{-1}A = I_m - U_2U_2' = U_1U_1'
\]

we generate the fundamental relation of the pseudoinverse

\[
A^{+} = (A + U_2U_2')^{-1} - U_2U_2'.
\]
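The lemma and the limit relation can be checked numerically. The sketch below is an added illustration with an assumed 2 × 2 test matrix; it builds A⁺ = U₁Λ⁻¹U₁′ from the eigenvalue decomposition and verifies the two fundamental relations:

    import numpy as np

    A = np.array([[4., 2.], [2., 1.]])          # assumed symmetric psd matrix, rank 1
    w, U = np.linalg.eigh(A)                    # eigenvalues in ascending order
    pos = w > 1e-12
    U1, U2 = U[:, pos], U[:, ~pos]
    Lam = np.diag(w[pos])

    A_plus = U1 @ np.linalg.inv(Lam) @ U1.T     # lemma item (iv)
    print(np.allclose(A_plus, np.linalg.pinv(A)))                         # True

    # A+ A = A A+ = I - U2 U2' = U1 U1'
    print(np.allclose(A_plus @ A, np.eye(2) - U2 @ U2.T))                 # True
    # A+ = (A + U2 U2')^{-1} - U2 U2'
    print(np.allclose(np.linalg.inv(A + U2 @ U2.T) - U2 @ U2.T, A_plus))  # True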

The main target of our discussion of various g-inverses is the easy handling of representations of solutions of arbitrary linear equations and their characterizations. We depart from the solution of a consistent system of linear equations

\[
Ax = c, \quad O(A) = n \times m, \qquad c \in \mathcal{R}(A) \;\Rightarrow\; x = A^- c
\]

for any g-inverse A^-. x = A^- c is a solution of such a linear system of equations; as A^- runs through all g-inverses, all solutions are obtained. If we work with one special g-inverse, we can represent the general solution by

\[
x = A^- c + (I_m - A^- A)\,z \qquad \text{for all } z \in \mathbb{R}^m,
\]

since the subspaces \(\mathcal{N}(A)\) and \(\mathcal{R}(I_m - A^- A)\) are identical. We test the consistency of our system by means of the identity A A^- c = c: c is mapped by the projection A A^- onto itself.
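A minimal NumPy sketch, added here as an illustration (the rank-deficient A is assumed and the right-hand side is constructed to be consistent), shows the consistency test and the general solution in action, with the pseudoinverse as the chosen g-inverse:

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],
                  [1., 0., 1.]])                 # assumed matrix, rank 2
    c = A @ np.array([1., -1., 2.])              # guarantees c in R(A)

    A_g = np.linalg.pinv(A)                      # one particular g-inverse
    print(np.allclose(A @ A_g @ c, c))           # consistency test A A^- c = c -> True

    # general solution x = A^- c + (I - A^- A) z
    z = np.array([5., -7., 0.3])                 # arbitrary vector
    x = A_g @ c + (np.eye(3) - A_g @ A) @ z
    print(np.allclose(A @ x, c))                 # every such x solves Ax = c -> True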

Similarly we solve the matrix equation AXB = C: the existence of a solution is granted by the consistency identity


\[
A A^- C B^- B = C \qquad \text{for any g-inverses } A^- \text{ and } B^-.
\]

If this condition is fulfilled, we are able to generate the general solution by

\[
X = A^- C B^- + Z - A^- A\,Z\,B B^-,
\]

where Z is an arbitrary matrix of suitable order. We can use arbitrary g-inverses A^- and B^-, for instance the pseudoinverses A^+ and B^+, for which the projections A^+A and BB^+ become two-sided orthogonal projections. How can we reduce the matrix equation AXB = C to a vector equation? The vec-operator is the door opener:

\[
AXB = C \;\Leftrightarrow\; (B' \otimes A)\,\mathrm{vec}\,X = \mathrm{vec}\,C.
\]

The general solution of our matrix equation reads

\[
\mathrm{vec}\,X = (B' \otimes A)^- \mathrm{vec}\,C + [\,I - (B' \otimes A)^-(B' \otimes A)\,]\,\mathrm{vec}\,Z.
\]
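The vectorized form is convenient for computation. The following sketch is an added illustration with assumed small test matrices, using the pseudoinverse as the g-inverse of the Kronecker product:

    import numpy as np

    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[2., 0., 1.], [1., 1., 0.]])
    X_true = np.array([[1., -1.], [0., 2.]])
    C = A @ X_true @ B

    vec = lambda M: M.flatten(order='F')         # column-wise vec operator

    # AXB = C  <=>  (B' kron A) vec X = vec C
    K = np.kron(B.T, A)
    print(np.allclose(K @ vec(X_true), vec(C)))  # True

    # general solution vec X = K^- vec C + [I - K^- K] vec Z
    Kg = np.linalg.pinv(K)
    vecX = Kg @ vec(C) + (np.eye(4) - Kg @ K) @ np.arange(4.)
    X = vecX.reshape(2, 2, order='F')
    print(np.allclose(A @ X @ B, C))             # True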

In the general solution we can use the identity

\[
(A \otimes B)^- = A^- \otimes B^-,
\]

generated by two g-inverses of the Kronecker-Zehfuss product. Finally we solve the more general equation Ax = By of the consistent type \(\mathcal{R}(B) \subseteq \mathcal{R}(A)\) by the following lemma.

Lemma (consistent system of homogeneous equations Ax = By):

Given the homogeneous system of linear equations Ax = By for all y constrained by By \(\in \mathcal{R}(A)\). Then the solution can be written x = Ly under the condition \(\mathcal{R}(B) \subseteq \mathcal{R}(A)\). In this case the matrix L may be decomposed as L = A^- B for a certain g-inverse A^-.


Appendix B: Matrix Analysis

A short version of matrix analysis is presented. Derivatives of scalar-valued, vector-valued and matrix-valued vector and matrix functions of functionally independent variables are defined. Extensions for differentiating symmetric and antisymmetric matrices are given. Special examples for functionally dependent matrix variables are reviewed.

B1 Derivatives of Scalar-valued and Vector-valued Vector Functions

Here we present the analysis of differentiating scalar-valued and vector-valued vector functions, enriched by examples.

Definition (derivative of a scalar-valued vector function):

Let a scalar-valued function f(x) of an m × 1 vector x be given. Then we call the 1 × m row vector

\[
Df(x) = [\,D_1 f(x), \ldots, D_m f(x)\,] := \frac{\partial f}{\partial x'}
\]

the first derivative of f(x) with respect to x'. Vector differentiation is based on the following definition.

Definition (derivative of a matrix-valued matrix function):

Let an n × q matrix-valued function F(X) of an m × p matrix of functionally independent variables X be given. Then the nq × mp Jacobi matrix of first derivatives of F is defined by

\[
J_F = DF(X) := \frac{\partial\,\mathrm{vec}\,F(X)}{\partial(\mathrm{vec}\,X)'}.
\]

The definition of first derivatives of matrix functions can be motivated as follows. The matrices F = [f_ij] ∈ ℝ^{n×q} and X = [x_kl] ∈ ℝ^{m×p} are two-dimensional arrays. In contrast, the array of first derivatives

\[
\Bigl[\frac{\partial f_{ij}}{\partial x_{kl}}\Bigr] =: [\,J_{ijkl}\,] \in \mathbb{R}^{n \times q \times m \times p}
\]

is four-dimensional and automatically outside the usual frame of matrix algebra of two-dimensional arrays. By means of the operations vec F and vec X we vectorize the matrices F and X. Accordingly we take advantage of the derivative of the vector vec F(X) with respect to the vector vec X, namely the Jacobi matrix J_F, a two-dimensional array.


Examples

(i) f(x) = x'Ax = a₁₁x₁² + (a₁₂+a₂₁)x₁x₂ + a₂₂x₂²,

\[
Df(x) = [\,D_1 f(x), D_2 f(x)\,] = \frac{\partial f}{\partial x'}
= [\,2a_{11}x_1 + (a_{12}+a_{21})x_2 \;\mid\; (a_{12}+a_{21})x_1 + 2a_{22}x_2\,] = x'(A + A').
\]

(ii)

\[
f(x) = Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix},
\qquad
J_F = Df(x) = \frac{\partial f}{\partial x'} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A.
\]

(iii)

\[
F(X) = X^2 = \begin{bmatrix} x_{11}^2 + x_{12}x_{21} & x_{11}x_{12} + x_{12}x_{22} \\ x_{21}x_{11} + x_{22}x_{21} & x_{21}x_{12} + x_{22}^2 \end{bmatrix},
\qquad
\mathrm{vec}\,F(X) = \begin{bmatrix} x_{11}^2 + x_{12}x_{21} \\ x_{21}x_{11} + x_{22}x_{21} \\ x_{11}x_{12} + x_{12}x_{22} \\ x_{21}x_{12} + x_{22}^2 \end{bmatrix},
\]
\[
(\mathrm{vec}\,X)' = [\,x_{11}, x_{21}, x_{12}, x_{22}\,],
\qquad
J_F = DF(X) = \frac{\partial\,\mathrm{vec}\,F(X)}{\partial(\mathrm{vec}\,X)'}
= \begin{bmatrix}
2x_{11} & x_{12} & x_{21} & 0 \\
x_{21} & x_{11}+x_{22} & 0 & x_{21} \\
x_{12} & 0 & x_{11}+x_{22} & x_{12} \\
0 & x_{12} & x_{21} & 2x_{22}
\end{bmatrix},
\qquad O(J_F) = 4 \times 4.
\]
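As a cross-check of example (iii): by the later rule (B20) with α = 2, the same Jacobi matrix can be written as J_F = X' ⊗ I + I ⊗ X. The NumPy sketch below is an added illustration with arbitrarily chosen entries; it confirms the 4 × 4 matrix by finite differences:

    import numpy as np

    X = np.array([[1., 2.],
                  [3., 4.]])                      # assumed values x11=1, x21=3, x12=2, x22=4
    I2 = np.eye(2)
    vec = lambda M: M.flatten(order='F')

    # analytic Jacobi matrix: d vec(X^2) = (X' kron I + I kron X) d vecX
    J = np.kron(X.T, I2) + np.kron(I2, X)

    # finite-difference check, column by column
    h = 1e-6
    J_num = np.zeros((4, 4))
    for k in range(4):
        dv = np.zeros(4); dv[k] = h
        Xp = X + dv.reshape(2, 2, order='F')
        J_num[:, k] = (vec(Xp @ Xp) - vec(X @ X)) / h
    print(np.allclose(J, J_num, atol=1e-4))       # True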

B2 Derivatives of Trace Forms

Up to now we have assumed that the vector x or the matrix X is functionally independent. For instance, the matrix X must then not be a symmetric matrix X = [x_ij] = [x_ji] = X' or an antisymmetric matrix X = [x_ij] = [−x_ji] = −X'. In the case of functionally dependent variables, for instance x_ij = x_ji or x_ij = −x_ji, we can take advantage of the chain rule in order to derive the differential procedure:

\[
\frac{\partial\,\mathrm{tr}(AX)}{\partial X} =
\begin{cases}
A', & \text{if } X \text{ consists of functionally independent elements;} \\
A + A' - \mathrm{Diag}[a_{11},\ldots,a_{nn}], & \text{if the } n\times n \text{ matrix } X \text{ is symmetric;} \\
A' - A, & \text{if the } n\times n \text{ matrix } X \text{ is antisymmetric;}
\end{cases}
\]


\[
\frac{\partial\,\mathrm{tr}(AX)}{\partial(\mathrm{vec}\,X)'} =
\begin{cases}
[\mathrm{vec}\,A']', & \text{if } X \text{ consists of functionally independent elements;} \\
[\mathrm{vec}(A + A' - \mathrm{Diag}[a_{11},\ldots,a_{nn}])]', & \text{if the } n\times n \text{ matrix } X \text{ is symmetric;} \\
[\mathrm{vec}(A' - A)]', & \text{if the } n\times n \text{ matrix } X \text{ is antisymmetric;}
\end{cases}
\]

for instance

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},
\qquad
X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}.
\]

Case # 1: "the matrix X consists of functionally independent elements"

\[
\frac{\partial}{\partial X} = \begin{bmatrix} \partial/\partial x_{11} & \partial/\partial x_{12} \\ \partial/\partial x_{21} & \partial/\partial x_{22} \end{bmatrix},
\qquad
\frac{\partial\,\mathrm{tr}(AX)}{\partial X} = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} = A'.
\]

Case # 2: "the n × n matrix X is symmetric: X = X'"

\[
x_{12} = x_{21}, \qquad \mathrm{tr}(AX) = a_{11}x_{11} + (a_{12}+a_{21})x_{21} + a_{22}x_{22},
\]
\[
\frac{\partial}{\partial X} = \begin{bmatrix} \dfrac{\partial}{\partial x_{11}} & \dfrac{dx_{21}}{dx_{12}}\dfrac{\partial}{\partial x_{21}} \\[2mm] \dfrac{\partial}{\partial x_{21}} & \dfrac{\partial}{\partial x_{22}} \end{bmatrix},
\qquad
\frac{\partial\,\mathrm{tr}(AX)}{\partial X} = \begin{bmatrix} a_{11} & a_{12}+a_{21} \\ a_{12}+a_{21} & a_{22} \end{bmatrix} = A + A' - \mathrm{Diag}(a_{11},\ldots,a_{nn}).
\]

Case # 3: "the n × n matrix X is antisymmetric: X = −X'"

\[
x_{11} = x_{22} = 0, \quad x_{12} = -x_{21}, \qquad \mathrm{tr}(AX) = (a_{12}-a_{21})\,x_{21},
\]
\[
\frac{\partial}{\partial X} = \begin{bmatrix} \dfrac{\partial}{\partial x_{11}} & \dfrac{dx_{21}}{dx_{12}}\dfrac{\partial}{\partial x_{21}} \\[2mm] \dfrac{\partial}{\partial x_{21}} & \dfrac{\partial}{\partial x_{22}} \end{bmatrix},
\qquad
\frac{\partial\,\mathrm{tr}(AX)}{\partial X} = \begin{bmatrix} 0 & -(a_{12}-a_{21}) \\ a_{12}-a_{21} & 0 \end{bmatrix} = A' - A.
\]
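Case # 2 can be verified numerically: differentiating tr(AX) with respect to the three free parameters x₁₁, x₂₁, x₂₂ of a symmetric X reproduces [a₁₁, a₁₂ + a₂₁, a₂₂]. The sketch below is an added illustration with assumed coefficient values:

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 4.]])                      # assumed a11=1, a12=2, a21=3, a22=4

    def tr_AX_sym(v):
        """tr(AX) with X symmetric, parametrized by (x11, x21, x22)."""
        x11, x21, x22 = v
        X = np.array([[x11, x21], [x21, x22]])
        return np.trace(A @ X)

    v0 = np.array([0.5, -0.2, 0.3])               # arbitrary evaluation point
    h = 1e-6
    grad = np.array([(tr_AX_sym(v0 + h*e) - tr_AX_sym(v0 - h*e)) / (2*h) for e in np.eye(3)])
    print(np.round(grad, 6))                      # [1. 5. 4.] = [a11, a12+a21, a22]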


Let us now assume that the matrix X of variables x_ij always consists of functionally independent elements. We note some useful identities of first derivatives.

Scalar-valued functions of vectors:

\[
\frac{\partial(a'x)}{\partial x'} = a', \tag{B1}
\]
\[
\frac{\partial(x'Ax)}{\partial x'} = x'(A + A'). \tag{B2}
\]

Scalar-valued functions of a matrix: trace

\[
\frac{\partial\,\mathrm{tr}(AX)}{\partial X} = A'; \tag{B3}
\]
especially:
\[
\frac{\partial(a'Xb)}{\partial(\mathrm{vec}\,X)'} = \frac{\partial\,\mathrm{tr}(ba'X)}{\partial(\mathrm{vec}\,X)'} = b' \otimes a';
\]
\[
\frac{\partial\,\mathrm{tr}(X'AX)}{\partial X} = (A + A')\,X; \tag{B4}
\]
especially:
\[
\frac{\partial\,\mathrm{tr}(X'X)}{\partial(\mathrm{vec}\,X)'} = 2\,(\mathrm{vec}\,X)';
\]
\[
\frac{\partial\,\mathrm{tr}(XAX)}{\partial X} = X'A' + A'X'; \tag{B5}
\]
especially:
\[
\frac{\partial\,\mathrm{tr}\,X^2}{\partial(\mathrm{vec}\,X)'} = 2\,(\mathrm{vec}\,X')';
\]
\[
\frac{\partial\,\mathrm{tr}(AX^{-1})}{\partial X} = -(X^{-1}AX^{-1})', \quad \text{if } X \text{ is nonsingular}; \tag{B6}
\]
especially:
\[
\frac{\partial\,\mathrm{tr}(X^{-1})}{\partial(\mathrm{vec}\,X)'} = -[\mathrm{vec}(X^{-2})']',
\qquad
\frac{\partial(a'X^{-1}b)}{\partial(\mathrm{vec}\,X)'} = \frac{\partial\,\mathrm{tr}(ba'X^{-1})}{\partial(\mathrm{vec}\,X)'} = -\,b'(X^{-1})' \otimes a'X^{-1}.
\]


\[
\frac{\partial\,\mathrm{tr}\,X^{\alpha}}{\partial X} = \alpha\,(X')^{\alpha-1}, \quad \text{if } X \text{ is quadratic}; \tag{B7}
\]
especially:
\[
\frac{\partial\,\mathrm{tr}\,X}{\partial(\mathrm{vec}\,X)'} = (\mathrm{vec}\,I)'.
\]
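As an added numerical illustration of rule (B6), with assumed test matrices:

    import numpy as np

    A = np.array([[1., 2.], [0., 3.]])            # assumed test matrices
    X = np.array([[2., 1.], [0.5, 3.]])
    Xi = np.linalg.inv(X)

    analytic = -(Xi @ A @ Xi).T                   # (B6): d tr(A X^{-1}) / dX
    h = 1e-6
    numeric = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            E = np.zeros((2, 2)); E[i, j] = h
            numeric[i, j] = (np.trace(A @ np.linalg.inv(X + E))
                             - np.trace(A @ np.linalg.inv(X - E))) / (2*h)
    print(np.allclose(analytic, numeric, atol=1e-6))   # True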

B3 Derivatives of Determinantal Forms

The scalar-valued derivatives of determinantal forms will be listed now.

\[
\frac{\partial|AXB|}{\partial X} = A'(\mathrm{adj}\,AXB)'B' = |AXB|\,A'(B'X'A')^{-1}B', \quad \text{if } AXB \text{ is nonsingular}; \tag{B8}
\]
especially:
\[
\frac{\partial(a'Xb)}{\partial(\mathrm{vec}\,X)'} = b' \otimes a', \quad \text{where } \mathrm{adj}(a'Xb) = 1.
\]
\[
\frac{\partial|AXBX'C|}{\partial X} = C(\mathrm{adj}\,AXBX'C)\,AXB + A'(\mathrm{adj}\,AXBX'C)'\,C'XB'; \tag{B9}
\]
especially:
\[
\frac{\partial|XBX'|}{\partial X} = (\mathrm{adj}\,XBX')\,XB + (\mathrm{adj}\,XB'X')\,XB';
\]
\[
\frac{\partial|XSX'|}{\partial(\mathrm{vec}\,X)'} = 2\,(\mathrm{vec}\,X)'(S \otimes \mathrm{adj}\,XSX'), \quad \text{if } S \text{ is symmetric};
\qquad
\frac{\partial|XX'|}{\partial(\mathrm{vec}\,X)'} = 2\,(\mathrm{vec}\,X)'(I \otimes \mathrm{adj}\,XX').
\]
\[
\frac{\partial|AX'BXC|}{\partial X} = BXC(\mathrm{adj}\,AX'BXC)\,A + B'XA'(\mathrm{adj}\,AX'BXC)'\,C'; \tag{B10}
\]
especially:
\[
\frac{\partial|X'BX|}{\partial X} = BX(\mathrm{adj}\,X'BX) + B'X(\mathrm{adj}\,X'B'X);
\]
\[
\frac{\partial|X'SX|}{\partial(\mathrm{vec}\,X)'} = 2\,(\mathrm{vec}\,X)'(\mathrm{adj}\,X'SX \otimes S), \quad \text{if } S \text{ is symmetric};
\qquad
\frac{\partial|X'X|}{\partial(\mathrm{vec}\,X)'} = 2\,(\mathrm{vec}\,X)'(\mathrm{adj}\,X'X \otimes I).
\]


\[
\frac{\partial|AXBXC|}{\partial X} = B'X'A'(\mathrm{adj}\,AXBXC)'C' + A'(\mathrm{adj}\,AXBXC)'C'X'B'; \tag{B11}
\]
especially:
\[
\frac{\partial|XBX|}{\partial X} = B'X'(\mathrm{adj}\,XBX)' + (\mathrm{adj}\,XBX)'X'B';
\]
\[
\frac{\partial|X|^2}{\partial(\mathrm{vec}\,X)'} = \bigl(\mathrm{vec}[\,X'\,\mathrm{adj}(X^2)' + \mathrm{adj}(X^2)'\,X'\,]\bigr)'
= |X|^2\bigl(\mathrm{vec}[\,X'(X')^{-2} + (X')^{-2}X'\,]\bigr)' = 2\,|X|^2\,[\mathrm{vec}(X^{-1})']', \quad \text{if } X \text{ is non-singular}.
\]
\[
\frac{\partial|X|}{\partial X} = |X|\,(X^{-1})' = (\mathrm{adj}\,X)', \quad \text{if } X \text{ is non-singular}; \tag{B12}
\]
especially:
\[
\frac{\partial|X|}{\partial(\mathrm{vec}\,X)'} = [\mathrm{vec}(\mathrm{adj}\,X)']'.
\]
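Rule (B12) can likewise be checked numerically (added illustration, assumed nonsingular test matrix):

    import numpy as np

    X = np.array([[2., 1.], [0.5, 3.]])                # assumed nonsingular test matrix
    analytic = np.linalg.det(X) * np.linalg.inv(X).T   # (B12): d|X|/dX = |X| (X^{-1})'
    h = 1e-6
    numeric = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            E = np.zeros((2, 2)); E[i, j] = h
            numeric[i, j] = (np.linalg.det(X + E) - np.linalg.det(X - E)) / (2*h)
    print(np.allclose(analytic, numeric, atol=1e-6))   # True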

B4 Derivatives of a Vector/Matrix Function of a Vector/Matrix

If we differentiate a vector-valued or matrix-valued function of a vector or matrix, we find the results of type (B13)-(B20).

Vector-valued function of a vector or a matrix:

\[
\frac{\partial(Ax)}{\partial x'} = A, \tag{B13}
\]
\[
\frac{\partial(AXa)}{\partial(\mathrm{vec}\,X)'} = \frac{\partial\,(a' \otimes A)\,\mathrm{vec}\,X}{\partial(\mathrm{vec}\,X)'} = a' \otimes A. \tag{B14}
\]

Matrix-valued function of a matrix:

\[
\frac{\partial(\mathrm{vec}\,X)}{\partial(\mathrm{vec}\,X)'} = I_{mp} \quad \text{for all } X \in \mathbb{R}^{m \times p}; \tag{B15}
\]
\[
\frac{\partial(\mathrm{vec}\,X')}{\partial(\mathrm{vec}\,X)'} = K_{mp} \quad \text{for all } X \in \mathbb{R}^{m \times p}, \tag{B16}
\]
where K_{mp} is the mp × mp commutation matrix;
\[
\frac{\partial\,\mathrm{vec}(XX')}{\partial(\mathrm{vec}\,X)'} = (I_{m^2} + K_{mm})(X \otimes I_m) \quad \text{for all } X \in \mathbb{R}^{m \times p}, \tag{B17}
\]
where the matrix \(\tfrac{1}{2}(I_{m^2} + K_{mm})\) is symmetric and idempotent;
\[
\frac{\partial\,\mathrm{vec}(X'X)}{\partial(\mathrm{vec}\,X)'} = (I_{p^2} + K_{pp})(I_p \otimes X') \quad \text{for all } X \in \mathbb{R}^{m \times p}; \tag{B18}
\]
\[
\frac{\partial\,\mathrm{vec}(X^{-1})}{\partial(\mathrm{vec}\,X)'} = -(X^{-1})' \otimes X^{-1}, \quad \text{if } X \text{ is non-singular}; \tag{B19}
\]
\[
\frac{\partial\,\mathrm{vec}(X^{\alpha})}{\partial(\mathrm{vec}\,X)'} = \sum_{j=1}^{\alpha} (X')^{\alpha-j} \otimes X^{j-1} \quad \text{for all } \alpha \in \mathbb{N}, \text{ if } X \text{ is a square matrix}. \tag{B20}
\]
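An added numerical illustration of (B16) and (B17); the commutation-matrix helper below is an assumption of this sketch, not a library routine:

    import numpy as np

    def commutation(m, p):
        """Commutation matrix K_{mp}: K vec(X) = vec(X') for X of order m x p."""
        K = np.zeros((m*p, m*p))
        for i in range(m):
            for j in range(p):
                K[j + i*p, i + j*m] = 1.0
        return K

    m, p = 2, 3
    X = np.arange(1., m*p + 1).reshape(m, p)       # assumed test matrix
    vec = lambda M: M.flatten(order='F')
    print(np.allclose(commutation(m, p) @ vec(X), vec(X.T)))   # (B16): True

    # (B17): d vec(XX') / d(vecX)' = (I + K_mm)(X kron I_m), finite-difference check
    h = 1e-6
    J = (np.eye(m*m) + commutation(m, m)) @ np.kron(X, np.eye(m))
    J_num = np.zeros((m*m, m*p))
    for k in range(m*p):
        E = np.zeros(m*p); E[k] = h
        Xp = X + E.reshape(m, p, order='F')
        J_num[:, k] = (vec(Xp @ Xp.T) - vec(X @ X.T)) / h
    print(np.allclose(J, J_num, atol=1e-4))        # True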

B5 Derivatives of the Kronecker-Zehfuss Product

Let a matrix-valued function of two matrices X and Y as variables be given. In particular, we assume the function F(X, Y) = X ⊗ Y for all X ∈ ℝ^{m×p}, Y ∈ ℝ^{n×q}, the Kronecker-Zehfuss product of the variables X and Y, to be well defined. Then the identities of the first differential and the first derivative follow:

\[
dF(X,Y) = (dX) \otimes Y + X \otimes dY,
\qquad
d\,\mathrm{vec}\,F(X,Y) = \mathrm{vec}(dX \otimes Y) + \mathrm{vec}(X \otimes dY),
\]
\[
\mathrm{vec}(dX \otimes Y) = (I_p \otimes K_{qm} \otimes I_n)(\mathrm{vec}\,dX \otimes \mathrm{vec}\,Y)
= (I_p \otimes K_{qm} \otimes I_n)(I_{mp} \otimes \mathrm{vec}\,Y)\,d(\mathrm{vec}\,X)
= \bigl(I_p \otimes [(K_{qm} \otimes I_n)(I_m \otimes \mathrm{vec}\,Y)]\bigr)\,d(\mathrm{vec}\,X),
\]
\[
\mathrm{vec}(X \otimes dY) = (I_p \otimes K_{qm} \otimes I_n)(\mathrm{vec}\,X \otimes \mathrm{vec}\,dY)
= (I_p \otimes K_{qm} \otimes I_n)(\mathrm{vec}\,X \otimes I_{nq})\,d(\mathrm{vec}\,Y)
= \bigl([(I_p \otimes K_{qm})(\mathrm{vec}\,X \otimes I_q)] \otimes I_n\bigr)\,d(\mathrm{vec}\,Y),
\]
\[
\frac{\partial\,\mathrm{vec}(X \otimes Y)}{\partial(\mathrm{vec}\,X)'} = I_p \otimes [(K_{qm} \otimes I_n)(I_m \otimes \mathrm{vec}\,Y)],
\qquad
\frac{\partial\,\mathrm{vec}(X \otimes Y)}{\partial(\mathrm{vec}\,Y)'} = [(I_p \otimes K_{qm})(\mathrm{vec}\,X \otimes I_q)] \otimes I_n.
\]
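The underlying vec identity, vec(X ⊗ Y) = (I_p ⊗ K_{qm} ⊗ I_n)(vec X ⊗ vec Y), can be checked directly. The sketch below is an added illustration with assumed orders m = 2, p = 3, n = q = 2 and randomly generated entries:

    import numpy as np

    def commutation(a, b):
        """Commutation matrix K_{ab}: K vec(M) = vec(M') for M of order a x b."""
        K = np.zeros((a*b, a*b))
        for i in range(a):
            for j in range(b):
                K[j + i*b, i + j*a] = 1.0
        return K

    vec = lambda M: M.flatten(order='F')
    m, p, n, q = 2, 3, 2, 2
    rng = np.random.default_rng(0)
    X, Y = rng.random((m, p)), rng.random((n, q))

    lhs = vec(np.kron(X, Y))
    T = np.kron(np.eye(p), np.kron(commutation(q, m), np.eye(n)))
    rhs = T @ np.kron(vec(X), vec(Y))
    print(np.allclose(lhs, rhs))                  # True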

B6 Matrix-valued Derivatives of Symmetric or Antisymmetric Matrix Functions

Many matrix functions f(X) or F(X) force us to pay attention to dependencies within the variables. As examples we treat here first derivatives of symmetric or antisymmetric matrix functions of X.


Definition (derivative of a matrix-valued symmetric matrix function):

Let F(X) be an n × q matrix-valued function of an m × m symmetric matrix X = X'. The nq × m(m+1)/2 Jacobi matrix of first derivatives of F is defined by

\[
J_F^{s} = DF(X = X') := \frac{\partial\,\mathrm{vec}\,F(X)}{\partial(\mathrm{vech}\,X)'}.
\]

Definition (derivative of a matrix-valued antisymmetric matrix function):

Let F(X) be an n × q matrix-valued function of an m × m antisymmetric matrix X = −X'. The nq × m(m−1)/2 Jacobi matrix of first derivatives of F is defined by

\[
J_F^{a} = DF(X = -X') := \frac{\partial\,\mathrm{vec}\,F(X)}{\partial(\mathrm{veck}\,X)'}.
\]

Examples

(i) Given is the scalar-valued matrix function tr(AX) of a symmetric variable matrix X = X', for instance

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},
\quad
X = \begin{bmatrix} x_{11} & x_{21} \\ x_{21} & x_{22} \end{bmatrix},
\quad
\mathrm{vech}\,X = \begin{bmatrix} x_{11} \\ x_{21} \\ x_{22} \end{bmatrix},
\]
\[
\mathrm{tr}(AX) = a_{11}x_{11} + (a_{12}+a_{21})x_{21} + a_{22}x_{22},
\qquad
\frac{\partial}{\partial(\mathrm{vech}\,X)'} = \Bigl[\frac{\partial}{\partial x_{11}}, \frac{\partial}{\partial x_{21}}, \frac{\partial}{\partial x_{22}}\Bigr],
\]
\[
\frac{\partial\,\mathrm{tr}(AX)}{\partial(\mathrm{vech}\,X)'} = [\,a_{11},\; a_{12}+a_{21},\; a_{22}\,]
= [\mathrm{vech}(A + A' - \mathrm{Diag}[a_{11},\ldots,a_{nn}])]'
= \Bigl[\mathrm{vech}\,\frac{\partial\,\mathrm{tr}(AX)}{\partial X}\Bigr]'.
\]

(ii) Given is the scalar-valued matrix function tr(AX) of an antisymmetric variable matrix X = −X', for instance

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},
\quad
X = \begin{bmatrix} 0 & -x_{21} \\ x_{21} & 0 \end{bmatrix},
\quad
\mathrm{veck}\,X = x_{21},
\qquad
\mathrm{tr}(AX) = (a_{12}-a_{21})\,x_{21},
\]


\[
\frac{\partial}{\partial(\mathrm{veck}\,X)'} = \frac{\partial}{\partial x_{21}},
\qquad
\frac{\partial\,\mathrm{tr}(AX)}{\partial(\mathrm{veck}\,X)'} = a_{12}-a_{21}
= [\mathrm{veck}(A' - A)]'
= \Bigl[\mathrm{veck}\,\frac{\partial\,\mathrm{tr}(AX)}{\partial X}\Bigr]'.
\]

B7 Higher order derivatives

Up to now we computed only first derivatives of scalar-valued, vector-valued and matrix-valued functions. Second derivatives are our target now; they will be needed for the classification of optimization problems of type minimum or maximum.

Definition (second derivatives of a scalar-valued vector function):

Let f(x) be a scalar-valued function of the m × 1 vector x. Then the m × m matrix

\[
DDf(x) = D(Df(x)) := \frac{\partial^2 f}{\partial x\,\partial x'}
\]

denotes the second derivatives of f(x) with respect to x and x'. Correspondingly

\[
D^2 f(x) := \Bigl(\frac{\partial}{\partial x'} \otimes \frac{\partial}{\partial x'}\Bigr) f(x) = (\mathrm{vec}\,DDf(x))'
\]

denotes the 1 × m² vector of second derivatives.

Definition (second derivative of a vector-valued vector function):

Let f(x) be an n × 1 vector-valued function of the m × 1 vector x. Then the n × m² matrix of second derivatives

\[
H_f = D^2 f(x) = D(Df(x)) := \Bigl(\frac{\partial}{\partial x'} \otimes \frac{\partial}{\partial x'}\Bigr) f(x) = \frac{\partial^2 f(x)}{\partial x' \otimes \partial x'}
\]

is the Hesse matrix of the function f(x).

Definition (second derivatives of a matrix-valued matrix function):

Let F(X) be an n × q matrix-valued function of an m × p matrix of functionally independent variables X. The nq × m²p² Hesse matrix of second derivatives of F is defined by

\[
H_F = D^2 F(X) = D(DF(X)) := \Bigl(\frac{\partial}{\partial(\mathrm{vec}\,X)'} \otimes \frac{\partial}{\partial(\mathrm{vec}\,X)'}\Bigr)\mathrm{vec}\,F(X)
= \frac{\partial^2\,\mathrm{vec}\,F(X)}{\partial(\mathrm{vec}\,X)' \otimes \partial(\mathrm{vec}\,X)'}.
\]


The definition of second derivatives of matrix functions can be motivated as follows. The matrices F = [f_ij] ∈ ℝ^{n×q} and X = [x_kl] ∈ ℝ^{m×p} are elements of two-dimensional arrays. In contrast, the array of second derivatives

\[
\Bigl[\frac{\partial^2 f_{ij}}{\partial x_{kl}\,\partial x_{pq}}\Bigr] =: [\,k_{ijklpq}\,] \in \mathbb{R}^{n \times q \times m \times p \times m \times p}
\]

is six-dimensional and beyond the common matrix algebra of two-dimensional arrays. The following operations map the six-dimensional array of second derivatives to a two-dimensional array:

(i) vec F(X) is the vectorized form of the matrix-valued function;
(ii) vec X is the vectorized form of the variable matrix;
(iii) the Kronecker-Zehfuss product ∂/∂(vec X)' ⊗ ∂/∂(vec X)' vectorizes the array of second derivatives;
(iv) the formal product of the nq × 1 column vector vec F(X) with the 1 × m²p² row vector of second-derivative operators leads to the nq × m²p² Hesse matrix of second derivatives.

Again we assume that the vector of variables x and the matrix of variables X consist of functionally independent elements. If this is not the case, we must apply, according to the chain rule, an alternative differential calculus, similar to the first-derivative case studies of symmetric and antisymmetric variable matrices.

Examples:

(i) f(x) = x'Ax = a₁₁x₁² + (a₁₂+a₂₁)x₁x₂ + a₂₂x₂²,

\[
Df(x) = \frac{\partial f}{\partial x'} = [\,2a_{11}x_1 + (a_{12}+a_{21})x_2 \;\mid\; (a_{12}+a_{21})x_1 + 2a_{22}x_2\,],
\]
\[
DDf(x) = \frac{\partial^2 f}{\partial x\,\partial x'} = \begin{bmatrix} 2a_{11} & a_{12}+a_{21} \\ a_{12}+a_{21} & 2a_{22} \end{bmatrix} = A + A'.
\]

(ii)

\[
f(x) = Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix},
\qquad
Df(x) = \frac{\partial f}{\partial x'} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A,
\]
\[
DDf(x) = \frac{\partial^2 f}{\partial x\,\partial x'} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad O(DDf(x)) = 2 \times 2,
\qquad
D^2 f(x) = [\,0\;\;0\;\;0\;\;0\,], \quad O(D^2 f(x)) = 1 \times 4.
\]


(iii)

\[
F(X) = X^2 = \begin{bmatrix} x_{11}^2 + x_{12}x_{21} & x_{11}x_{12} + x_{12}x_{22} \\ x_{21}x_{11} + x_{22}x_{21} & x_{21}x_{12} + x_{22}^2 \end{bmatrix},
\qquad
\mathrm{vec}\,F(X) = \begin{bmatrix} x_{11}^2 + x_{12}x_{21} \\ x_{21}x_{11} + x_{22}x_{21} \\ x_{11}x_{12} + x_{12}x_{22} \\ x_{21}x_{12} + x_{22}^2 \end{bmatrix},
\quad O(F) = O(X) = 2 \times 2,
\]
\[
(\mathrm{vec}\,X)' = [\,x_{11}, x_{21}, x_{12}, x_{22}\,],
\qquad
J_F = \frac{\partial\,\mathrm{vec}\,F(X)}{\partial(\mathrm{vec}\,X)'}
= \begin{bmatrix}
2x_{11} & x_{12} & x_{21} & 0 \\
x_{21} & x_{11}+x_{22} & 0 & x_{21} \\
x_{12} & 0 & x_{11}+x_{22} & x_{12} \\
0 & x_{12} & x_{21} & 2x_{22}
\end{bmatrix},
\quad O(J_F) = 4 \times 4,
\]
\[
H_F = \Bigl(\frac{\partial}{\partial(\mathrm{vec}\,X)'} \otimes \Bigl[\frac{\partial}{\partial x_{11}}, \frac{\partial}{\partial x_{21}}, \frac{\partial}{\partial x_{12}}, \frac{\partial}{\partial x_{22}}\Bigr]\Bigr)\,\mathrm{vec}\,F(X)
\]
\[
= \begin{bmatrix}
2&0&0&0& 0&0&1&0& 0&1&0&0& 0&0&0&0 \\
0&1&0&0& 1&0&0&1& 0&0&0&0& 0&1&0&0 \\
0&0&1&0& 0&0&0&0& 1&0&0&1& 0&0&1&0 \\
0&0&0&0& 0&0&1&0& 0&1&0&0& 0&0&0&2
\end{bmatrix},
\qquad O(H_F) = 4 \times 16.
\]
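The Hesse matrix above can be reproduced numerically. In the sketch below (an added illustration with an assumed evaluation point) each row of H_F, reshaped to 4 × 4, is the ordinary Hessian of one component of vec F:

    import numpy as np

    vec = lambda M: M.flatten(order='F')

    def F(v):
        """vec F(X) = vec(X^2) as a function of v = vec X."""
        X = v.reshape(2, 2, order='F')
        return vec(X @ X)

    v0 = np.array([1., 3., 2., 4.])               # assumed point: x11=1, x21=3, x12=2, x22=4
    h = 1e-4
    H = np.zeros((4, 16))
    for j in range(4):
        for k in range(4):
            ej = h * np.eye(4)[j]
            ek = h * np.eye(4)[k]
            d2 = (F(v0 + ej + ek) - F(v0 + ej - ek) - F(v0 - ej + ek) + F(v0 - ej - ek)) / (4*h*h)
            H[:, 4*j + k] = d2                    # column ordering of d/d(vecX)' kron d/d(vecX)'
    print(np.round(H[0].reshape(4, 4), 6))        # Hessian of F_11: 2 at (1,1), 1 at (2,3) and (3,2)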

At the end we want to define the derivative of order l of a matrix-valued matrix function, whose structure is derived from the postulate of a suitable array.

Definition (l-th derivative of a matrix-valued matrix function):

Let F(X) be an n × q matrix-valued function of an m × p matrix of functionally independent variables X. The nq × m^l p^l matrix of the l-th derivative is defined by

\[
D^{l} F(X) := \Bigl(\underbrace{\frac{\partial}{\partial(\mathrm{vec}\,X)'} \otimes \cdots \otimes \frac{\partial}{\partial(\mathrm{vec}\,X)'}}_{l\text{-times}}\Bigr)\,\mathrm{vec}\,F(X)
= \frac{\partial^{l}\,\mathrm{vec}\,F(X)}{\underbrace{\partial(\mathrm{vec}\,X)' \otimes \cdots \otimes \partial(\mathrm{vec}\,X)'}_{l\text{-times}}}
\qquad \text{for all } l \in \mathbb{N}.
\]


Appendix C: Lagrange Multipliers

How can we find extrema with side conditions? We generate solutions of such extremal problems first on the basis of algebraic manipulations, namely by the lemma of implicit functions, and secondly by a geometric tool box, by means of interpreting a risk function and side conditions as level surfaces (specific normal images, Lagrange multipliers).

C1 A first way to solve the problem

A first way to find extrema with side conditions is based on a risk function

\[
f(x_1,\ldots,x_m) = \mathrm{extr} \tag{C1}
\]

with unknowns (x₁,…,x_m) ∈ ℝ^m, which are restricted by side conditions of the type

\[
F_1(x_1,\ldots,x_m) = F_2(x_1,\ldots,x_m) = \cdots = F_r(x_1,\ldots,x_m) = 0, \tag{C2}
\]
\[
\mathrm{rk}\Bigl(\frac{\partial F_i}{\partial x_j}\Bigr) = r < m. \tag{C3}
\]

The side conditions F_i(x_j) (i = 1,…,r; j = 1,…,m) are reduced by the lemma of the implicit function: solve for

\[
x_{m-r+1} = G_1(x_1,\ldots,x_{m-r}),\quad
x_{m-r+2} = G_2(x_1,\ldots,x_{m-r}),\quad \ldots,\quad
x_{m-1} = G_{r-1}(x_1,\ldots,x_{m-r}),\quad
x_m = G_r(x_1,\ldots,x_{m-r}) \tag{C4}
\]

and replace the result within the risk function

\[
f\bigl(x_1, x_2, \ldots, x_{m-r}, G_1(x_1,\ldots,x_{m-r}), \ldots, G_r(x_1,\ldots,x_{m-r})\bigr) = \mathrm{extr}. \tag{C5}
\]

The "free" unknowns (x₁, x₂, …, x_{m−r−1}, x_{m−r}) ∈ ℝ^{m−r} can be found by taking advantage of the result of the implicit function theorem as follows.

Lemma C1 ("implicit function theorem"):

Let Ω be an open set of ℝ^m = ℝ^{m−r} × ℝ^r and F: Ω → ℝ^r with vectors x₁ ∈ ℝ^{m−r} and x₂ ∈ ℝ^r. The maps


\[
(\mathbf{x}_1, \mathbf{x}_2) \mapsto \mathbf{F}(\mathbf{x}_1, \mathbf{x}_2) =
\begin{bmatrix}
F_1(x_1,\ldots,x_{m-r};\, x_{m-r+1},\ldots,x_m) \\
F_2(x_1,\ldots,x_{m-r};\, x_{m-r+1},\ldots,x_m) \\
\vdots \\
F_{r-1}(x_1,\ldots,x_{m-r};\, x_{m-r+1},\ldots,x_m) \\
F_r(x_1,\ldots,x_{m-r};\, x_{m-r+1},\ldots,x_m)
\end{bmatrix} \tag{C6}
\]

define a continuously differentiable function with F(x₁, x₂) = 0. In case of a Jacobi determinant j different from zero, or a Jacobi matrix J of rank r,

\[
j := \det J \neq 0 \quad \text{or} \quad \mathrm{rk}\,J = r,
\qquad
J := \frac{\partial(F_1,\ldots,F_r)}{\partial(x_{m-r+1},\ldots,x_m)}, \tag{C7}
\]

there exist neighborhoods U := U(x₁) ⊂ ℝ^{m−r} and V := V(x₂) ⊂ ℝ^r such that the equation F(x₁, x₂) = 0 has for any x₁ ∈ U exactly one solution x₂ = G(x₁) in V, or

\[
\begin{bmatrix} x_{m-r+1} \\ x_{m-r+2} \\ \vdots \\ x_{m-1} \\ x_m \end{bmatrix}
= \begin{bmatrix} G_1(x_1,\ldots,x_{m-r}) \\ G_2(x_1,\ldots,x_{m-r}) \\ \vdots \\ G_{r-1}(x_1,\ldots,x_{m-r}) \\ G_r(x_1,\ldots,x_{m-r}) \end{bmatrix}. \tag{C8}
\]

The function G: U → V is continuously differentiable. A sample reference is any literature treating analysis, e.g. C. Blotter. Lemma C1 is based on the Implicit Function Theorem, whose result we insert into the risk function (C1) in order to gain (C5) in the free variables (x₁, …, x_{m−r}) ∈ ℝ^{m−r}. Our Example C1 explains the solution technique for finding extrema with side conditions within our first approach. Lemma C1 illustrates that there exists a local inverse of the side conditions towards the r unknowns (x_{m−r+1}, x_{m−r+2}, …, x_{m−1}, x_m) ∈ ℝ^r, which in the case of nonlinear side conditions is not necessarily unique.

:Example C1:
Search for the global extremum of the function

\[
f(x_1, x_2, x_3) = f(x, y, z) = x + y + z
\]

subject to the side conditions

\[
F_1(x_1, x_2, x_3) = Z(x, y, z) := x^2 + 2y^2 - 1 = 0 \quad \text{(elliptic cylinder)},
\qquad
F_2(x_1, x_2, x_3) = E(x, y, z) := 3x + 4z = 0 \quad \text{(plane)}.
\]


J(

535

Fi 2x 4 y 0  ) , rk J ( x  0 oder y  0)  r  2 x j  3 0 4 

\[
F_1(x_1,x_2,x_3) = Z(x,y,z) = 0 \;\Rightarrow\; y = \pm\tfrac{1}{\sqrt{2}}\sqrt{1-x^2},
\qquad
F_2(x_1,x_2,x_3) = E(x,y,z) = 0 \;\Rightarrow\; z = -\tfrac{3}{4}x,
\]
\[
{}_1f(x_1,x_2,x_3) = {}_1f(x,y,z) = f\bigl(x,\, -\tfrac{1}{\sqrt{2}}\sqrt{1-x^2},\, -\tfrac{3}{4}x\bigr) = \tfrac{1}{4}x - \tfrac{1}{\sqrt{2}}\sqrt{1-x^2},
\]
\[
{}_2f(x_1,x_2,x_3) = {}_2f(x,y,z) = f\bigl(x,\, +\tfrac{1}{\sqrt{2}}\sqrt{1-x^2},\, -\tfrac{3}{4}x\bigr) = \tfrac{1}{4}x + \tfrac{1}{\sqrt{2}}\sqrt{1-x^2},
\]
\[
{}_1f'(x) = 0 \;\Leftrightarrow\; \tfrac{1}{4} + \tfrac{1}{\sqrt{2}}\,\frac{x}{\sqrt{1-x^2}} = 0 \;\Leftrightarrow\; x = -\tfrac{1}{3},
\qquad
{}_2f'(x) = 0 \;\Leftrightarrow\; \tfrac{1}{4} - \tfrac{1}{\sqrt{2}}\,\frac{x}{\sqrt{1-x^2}} = 0 \;\Leftrightarrow\; x = +\tfrac{1}{3},
\]
\[
{}_1f(-\tfrac{1}{3}) = -\tfrac{3}{4} \;\text{(minimum)}, \qquad {}_2f(+\tfrac{1}{3}) = +\tfrac{3}{4} \;\text{(maximum)}.
\]

At the position x = −1/3, y = −2/3, z = 1/4 we find the global minimum, and at the position x = 1/3, y = 2/3, z = −1/4 the global maximum.

An alternative path to find extrema with side conditions is based on the geometric interpretation of the risk function and the side conditions. First, we form the conditions

\[
F_1(x_1,\ldots,x_m) = 0,\quad F_2(x_1,\ldots,x_m) = 0,\quad \ldots,\quad F_r(x_1,\ldots,x_m) = 0,
\qquad
\mathrm{rk}\Bigl(\frac{\partial F_i}{\partial x_j}\Bigr) = r,
\]

by continuously differentiable real functions on an open set Ω ⊂ ℝ^m. Then the r equations F_i(x₁,…,x_m) = 0 for all i = 1,…,r with the rank condition rk(∂F_i/∂x_j) = r define geometrically an (m−r)-dimensional surface \(\mathcal{M}_F \subset \Omega\) which can be seen as a level surface. See as an example our Example C1, which describes as side conditions


\[
F_1(x_1,x_2,x_3) = Z(x,y,z) = x^2 + 2y^2 - 1 = 0,
\qquad
F_2(x_1,x_2,x_3) = E(x,y,z) = 3x + 4z = 0,
\]

representing an elliptic cylinder and a plane. In this case the (m−r)-dimensional surface \(\mathcal{M}_F\) is the intersection manifold of the elliptic cylinder and of the plane, the m − r = 1 dimensional manifold in ℝ³, namely a "spatial curve". Secondly, the risk function f(x₁,…,x_m) = extr generates an (m−1)-dimensional surface \(\mathcal{M}_f\) which is a special level surface. The level parameter of the (m−1)-dimensional surface \(\mathcal{M}_f\) should be extremal. In our Example C1 the risk function can be interpreted as the plane f(x₁,x₂,x₃) = f(x,y,z) = x + y + z.

We summarize our result within Lemma C2.

Lemma C2 (extrema with side conditions):

The side conditions F_i(x₁,…,x_m) = 0 for all i ∈ {1,…,r} are built by continuously differentiable functions on an open set Ω ⊂ ℝ^m which are subject to the rank condition rk(∂F_i/∂x_j) = r, generating an (m−r)-dimensional level surface \(\mathcal{M}_F\). The function f(x₁,…,x_m) produces for certain constants an (m−1)-dimensional level surface \(\mathcal{M}_f\). f(x₁,…,x_m) is at a point p ∈ \(\mathcal{M}_F\) conditionally extremal (stationary) if and only if the (m−1)-dimensional level surface \(\mathcal{M}_f\) is in contact with the (m−r)-dimensional level surface \(\mathcal{M}_F\) at p, that is, there exist numbers λ₁,…,λ_r, the Lagrange multipliers, such that

\[
\mathrm{grad}\,f(p) = \sum_{i=1}^{r} \lambda_i\,\mathrm{grad}\,F_i(p).
\]

The unnormalized surface normal vector grad f(p) of the (m−1)-dimensional level surface \(\mathcal{M}_f\) lies in the normal space \(\mathcal{N}_p\mathcal{M}_F\) of the level surface \(\mathcal{M}_F\), spanned by the unnormalized surface normal vectors grad F_i(p) at the point p. To this equation belongs the variational problem

\[
\Phi(x_1,\ldots,x_m;\lambda_1,\ldots,\lambda_r) = f(x_1,\ldots,x_m) + \sum_{i=1}^{r} \lambda_i F_i(x_1,\ldots,x_m) = \mathrm{extr}.
\]

:proof:
First, the side conditions F_i(x_j) = 0, rk(∂F_i/∂x_j) = r for all i = 1,…,r; j = 1,…,m generate an (m−r)-dimensional level surface \(\mathcal{M}_F\) whose normal vectors

\[
\mathbf{n}_i(p) := \mathrm{grad}\,F_i(p) \in \mathcal{N}_p\mathcal{M}_F \qquad (i = 1,\ldots,r)
\]

span the r-dimensional normal space \(\mathcal{N}_p\mathcal{M}_F\) of the level surface \(\mathcal{M}_F \subset \Omega\). The r-dimensional normal space \(\mathcal{N}_p\mathcal{M}_F\) of the (m−r)-dimensional level surface \(\mathcal{M}_F\)


is the orthogonal complement \(\mathcal{N}_p\mathcal{M}_F \perp \mathcal{T}_p\mathcal{M}_F\) of the tangent space \(\mathcal{T}_p\mathcal{M}_F\) of \(\mathcal{M}_F\) at the point p, spanned by the m−r tangent vectors

\[
\mathbf{t}_k(p) := \frac{\partial \mathbf{x}}{\partial x_k}\Big|_p \in \mathcal{T}_p\mathcal{M}_F \qquad (k = 1,\ldots,m-r).
\]

:Example C2:
Let the m − r = 2 dimensional level surface \(\mathcal{M}_F\) of the sphere \(\mathbb{S}^2_r \subset \mathbb{R}^3\) of radius r ("level parameter r²") be given by the side condition

\[
F(x_1,x_2,x_3) = x_1^2 + x_2^2 + x_3^2 - r^2 = 0.
\]

:Normal space:

\[
\mathbf{n}(p) = \mathrm{grad}\,F(p) = \mathbf{e}_1\frac{\partial F}{\partial x_1} + \mathbf{e}_2\frac{\partial F}{\partial x_2} + \mathbf{e}_3\frac{\partial F}{\partial x_3}
= [\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3]\begin{bmatrix} 2x_1 \\ 2x_2 \\ 2x_3 \end{bmatrix}_p.
\]

The orthonormal vectors [e₁, e₂, e₃] span ℝ³. The normal space is generated locally by the normal vector n(p) = grad F(p).

:Tangent space:
The implicit representation is the characteristic element of the level surface. In order to gain an explicit representation, we take advantage of the Implicit Function Theorem according to the following equations:

\[
F(x_1,x_2,x_3) = x_1^2 + x_2^2 + x_3^2 - r^2 = 0, \qquad
\Bigl(\frac{\partial F}{\partial x_j}\Bigr) = [\,2x_1,\; 2x_2,\; 2x_3\,], \qquad
\mathrm{rk}\Bigl(\frac{\partial F}{\partial x_j}\Bigr) = r = 1
\]
\[
\Rightarrow\; x_3 = G(x_1, x_2) = +\sqrt{r^2 - (x_1^2 + x_2^2)}.
\]

The negative root leads into another domain of the sphere; here holds the domain 0 < x₁ < r, 0 < x₂ < r, r² − (x₁² + x₂²) > 0. The spherical position vector x(p) allows the representation

\[
\mathbf{x}(p) = \mathbf{e}_1 x_1 + \mathbf{e}_2 x_2 + \mathbf{e}_3\sqrt{r^2 - (x_1^2 + x_2^2)},
\]

which is the basis to produce


\[
\mathbf{t}_1(p) = \frac{\partial \mathbf{x}}{\partial x_1}(p) = \mathbf{e}_1 - \mathbf{e}_3\,\frac{x_1}{\sqrt{r^2-(x_1^2+x_2^2)}}
= [\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3]\begin{bmatrix} 1 \\ 0 \\ -\dfrac{x_1}{\sqrt{r^2-(x_1^2+x_2^2)}} \end{bmatrix},
\]
\[
\mathbf{t}_2(p) = \frac{\partial \mathbf{x}}{\partial x_2}(p) = \mathbf{e}_2 - \mathbf{e}_3\,\frac{x_2}{\sqrt{r^2-(x_1^2+x_2^2)}}
= [\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3]\begin{bmatrix} 0 \\ 1 \\ -\dfrac{x_2}{\sqrt{r^2-(x_1^2+x_2^2)}} \end{bmatrix},
\]

which span the two-dimensional tangent space \(\mathcal{T}_p\mathcal{M}_F\) at the point p.

:The general case:
In the general case of an (m−r)-dimensional level surface \(\mathcal{M}_F\), implicitly produced by the r side conditions of type

\[
F_1(x_1,\ldots,x_m) = 0,\quad F_2(x_1,\ldots,x_m) = 0,\quad \ldots,\quad F_r(x_1,\ldots,x_m) = 0,
\qquad \mathrm{rk}\Bigl(\frac{\partial F_i}{\partial x_j}\Bigr) = r,
\]

the explicit surface representation, produced by the Implicit Function Theorem, reads

\[
\mathbf{x}(p) = \mathbf{e}_1 x_1 + \mathbf{e}_2 x_2 + \cdots + \mathbf{e}_{m-r}x_{m-r}
+ \mathbf{e}_{m-r+1}G_1(x_1,\ldots,x_{m-r}) + \cdots + \mathbf{e}_m G_r(x_1,\ldots,x_{m-r}).
\]

The orthonormal vectors [e₁,…,e_m] span ℝ^m. Secondly, the at least once continuously differentiable risk function f(x₁,…,x_m) describes for special constants an (m−1)-dimensional level surface \(\mathcal{M}_f\) whose normal vector

\[
\mathbf{n}_f := \mathrm{grad}\,f(p) \in \mathcal{N}_p\mathcal{M}_f
\]

spans the one-dimensional normal space \(\mathcal{N}_p\mathcal{M}_f\) of the level surface \(\mathcal{M}_f \subset \Omega\) at the point p. The level parameter of the level surface is chosen in the extremal case such that the level surface \(\mathcal{M}_f\) touches the level surface \(\mathcal{M}_F\) at the point p. That means that the normal vector n_f(p) at the point p is an element of the normal space \(\mathcal{N}_p\mathcal{M}_F\). Or we may say the normal vector grad f(p) is a linear combination of the normal vectors grad F_i(p) at the point p,

\[
\mathrm{grad}\,f(p) = \sum_{i=1}^{r}\lambda_i\,\mathrm{grad}\,F_i(p),
\]

where the Lagrange multipliers λ_i are the coordinates of the vector grad f(p) in the basis grad F_i(p).


:Example C3:
Let us assume that a point X ∈ ℝ³ is given. Unknown is the point on the m − r = 2 dimensional level surface \(\mathcal{M}_F\) of type sphere \(\mathbb{S}^2_r \subset \mathbb{R}^3\) which is at extremal distance, either minimal or maximal, from the point X ∈ ℝ³. The distance function ||X − x||² for X ∈ ℝ³ and x ∈ \(\mathbb{S}^2_r\) describes the risk function

\[
f(x_1,x_2,x_3) = (X_1-x_1)^2 + (X_2-x_2)^2 + (X_3-x_3)^2 = R^2 = \underset{x_1,x_2,x_3}{\mathrm{extr}},
\]

which represents an m − 1 = 2 dimensional level surface \(\mathcal{M}_f\) of type sphere \(\mathbb{S}^2_R \subset \mathbb{R}^3\) centred at the point (X₁, X₂, X₃) with level parameter R². The conditional extremal problem is solved if the sphere \(\mathbb{S}^2_R\) touches the other sphere \(\mathbb{S}^2_r\). This result is expressed in the language of the normal vectors:

\[
\mathbf{n}_f(p) := \mathrm{grad}\,f(p) = \mathbf{e}_1\frac{\partial f}{\partial x_1} + \mathbf{e}_2\frac{\partial f}{\partial x_2} + \mathbf{e}_3\frac{\partial f}{\partial x_3}
= [\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3]\begin{bmatrix} -2(X_1-x_1) \\ -2(X_2-x_2) \\ -2(X_3-x_3) \end{bmatrix}_p \in \mathcal{N}_p\mathcal{M}_f,
\qquad
\mathbf{n}_F(p) := \mathrm{grad}\,F(p) = [\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3]\begin{bmatrix} 2x_1 \\ 2x_2 \\ 2x_3 \end{bmatrix}_p,
\]

and grad f(p) has to be an element of the normal space \(\mathcal{N}_p\mathcal{M}_F\) spanned by grad F(p). The normal equation

\[
\mathrm{grad}\,f(p) = \lambda\,\mathrm{grad}\,F(p)
\]

leads directly to the three equations

\[
x_i - X_i = \lambda x_i \;\Leftrightarrow\; x_i(1-\lambda) = X_i \qquad (i = 1,2,3),
\]

which are completed by the fourth equation F(x₁,x₂,x₃) = x₁² + x₂² + x₃² − r² = 0. Later on we solve these four equations.

Third, we interpret the differential equations

\[
\mathrm{grad}\,f(p) = \sum_{i=1}^{r}\lambda_i\,\mathrm{grad}\,F_i(p)
\]

by the variational problem, by direct differentiation, namely


\[
\Phi(x_1,\ldots,x_m;\lambda_1,\ldots,\lambda_r) = f(x_1,\ldots,x_m) + \sum_{i=1}^{r}\lambda_i F_i(x_1,\ldots,x_m)
= \underset{x_1,\ldots,x_m;\,\lambda_1,\ldots,\lambda_r}{\mathrm{extr}}
\]
\[
\Leftrightarrow\quad
\frac{\partial\Phi}{\partial x_j} = \frac{\partial f}{\partial x_j} + \sum_{i=1}^{r}\lambda_i\frac{\partial F_i}{\partial x_j} = 0 \quad (j = 1,\ldots,m),
\qquad
\frac{\partial\Phi}{\partial\lambda_i} = F_i(x_j) = 0 \quad (i = 1,\ldots,r).
\]

:Example C4:
We continue our third example by solving the alternative system of equations:

\[
\Phi(x_1,x_2,x_3;\lambda) = (X_1-x_1)^2 + (X_2-x_2)^2 + (X_3-x_3)^2 + \lambda\,(x_1^2+x_2^2+x_3^2-r^2) = \underset{x_1,x_2,x_3;\,\lambda}{\mathrm{extr}}
\]
\[
\Rightarrow\quad
\frac{\partial\Phi}{\partial x_j} = -2(X_j-x_j) + 2\lambda x_j = 0, \qquad
\frac{\partial\Phi}{\partial\lambda} = x_1^2+x_2^2+x_3^2-r^2 = 0
\]
\[
\Rightarrow\quad x_1 = \frac{X_1}{1+\lambda}, \quad x_2 = \frac{X_2}{1+\lambda}, \quad x_3 = \frac{X_3}{1+\lambda},
\]
\[
x_1^2+x_2^2+x_3^2-r^2 = 0 \;\Rightarrow\; \frac{X_1^2+X_2^2+X_3^2}{(1+\lambda)^2} - r^2 = 0
\;\Rightarrow\; (1+\lambda)^2 = \frac{X_1^2+X_2^2+X_3^2}{r^2},
\]
\[
1+\lambda_{1,2} = \pm\frac{\sqrt{X_1^2+X_2^2+X_3^2}}{r},
\qquad
\lambda_{1,2} = -1 \pm \frac{\sqrt{X_1^2+X_2^2+X_3^2}}{r},
\]
\[
(x_1)_{1,2} = \pm\frac{rX_1}{\sqrt{X_1^2+X_2^2+X_3^2}}, \quad
(x_2)_{1,2} = \pm\frac{rX_2}{\sqrt{X_1^2+X_2^2+X_3^2}}, \quad
(x_3)_{1,2} = \pm\frac{rX_3}{\sqrt{X_1^2+X_2^2+X_3^2}}.
\]
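The closed-form solution can be confirmed with a short numerical check; the point X and the radius r below are assumed values, added here as an illustration:

    import numpy as np

    X = np.array([1., 2., 2.])                    # assumed given point, ||X|| = 3
    r = 1.5                                       # assumed sphere radius
    u = X / np.linalg.norm(X)
    x_min, x_max = r * u, -r * u                  # (x_i)_{1,2} = +- r X_i / sqrt(X1^2+X2^2+X3^2)
    print(np.linalg.norm(X - x_min), np.linalg.norm(X - x_max))   # 1.5 and 4.5, i.e. ||X|| -/+ r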

The matrix of second derivatives H decides upon whether at the point (x₁, x₂, x₃, λ)₁,₂ we enjoy a maximum or a minimum.


H(

2 )  ( jk (1   ))  (1   )I 3 x j xk

H (1    0)  0 (minimum)  ( x1 , x2 , x3 ) is the point of minimum 

 H (1    0)  0 (maximum)  ( x , x , x ) is the point of maximum .  1 2 3

Our example illustrates how we can find the global optimum under side conditions by means of the technique of Lagrange multipliers.

:Example C5:
Search for the global extremum of the function f(x₁,x₂,x₃) subject to the two side conditions F₁(x₁,x₂,x₃) = 0 and F₂(x₁,x₂,x₃) = 0, namely

\[
f(x_1,x_2,x_3) = f(x,y,z) = x + y + z \quad \text{(plane)},
\]
\[
F_1(x_1,x_2,x_3) = Z(x,y,z) := x^2 + 2y^2 - 1 = 0 \quad \text{(elliptic cylinder)},
\qquad
F_2(x_1,x_2,x_3) = E(x,y,z) := 3x + 4z = 0 \quad \text{(plane)},
\]
\[
J\Bigl(\frac{\partial F_i}{\partial x_j}\Bigr) = \begin{bmatrix} 2x & 4y & 0 \\ 3 & 0 & 4 \end{bmatrix},
\qquad \mathrm{rk}\,J\;(x \neq 0 \text{ or } y \neq 0) = r = 2.
\]

:Variational Problem:

\[
\Phi(x_1,x_2,x_3;\lambda_1,\lambda_2) = \Phi(x,y,z;\lambda,\mu) = x + y + z + \lambda(x^2 + 2y^2 - 1) + \mu(3x + 4z)
= \underset{x,y,z;\,\lambda,\mu}{\mathrm{extr}}
\]
\[
\Rightarrow\quad
\frac{\partial\Phi}{\partial x} = 1 + 2\lambda x + 3\mu = 0, \qquad
\frac{\partial\Phi}{\partial y} = 1 + 4\lambda y = 0 \;\Rightarrow\; \lambda = -\frac{1}{4y}, \qquad
\frac{\partial\Phi}{\partial z} = 1 + 4\mu = 0 \;\Rightarrow\; \mu = -\frac{1}{4},
\]
\[
\frac{\partial\Phi}{\partial\lambda} = x^2 + 2y^2 - 1 = 0, \qquad
\frac{\partial\Phi}{\partial\mu} = 3x + 4z = 0.
\]

We multiply the first equation ∂Φ/∂x by 4y, the second equation ∂Φ/∂y by (−2x) and the third equation ∂Φ/∂z by (−3y) and add:

\[
4y + 8\lambda xy + 12\mu y - 2x - 8\lambda xy - 3y - 12\mu y = y - 2x = 0.
\]


Replacing y = 2x in the cylinder equation (first side condition) Z(x,y,z) = x² + 2y² − 1 = 0 gives 9x² = 1, that is x₁,₂ = ±1/3. From the second side condition of the plane, E(x,y,z) = 3x + 4z = 0, we gain z₁,₂ = ∓1/4. As a result we find x₁,₂, z₁,₂ and finally y₁,₂ = ±2/3, λ₁,₂ = ∓3/8, μ = −1/4. The matrix of second derivatives H decides upon whether at the points with λ₁,₂ = ∓3/8 we find a maximum or a minimum:

\[
H = \Bigl(\frac{\partial^2\Phi}{\partial x_j\,\partial x_k}\Bigr) = \begin{bmatrix} 2\lambda & 0 & 0 \\ 0 & 4\lambda & 0 \\ 0 & 0 & 0 \end{bmatrix},
\]
\[
H\bigl(\lambda_1 = -\tfrac{3}{8}\bigr) = \begin{bmatrix} -\tfrac{3}{4} & 0 & 0 \\ 0 & -\tfrac{3}{2} & 0 \\ 0 & 0 & 0 \end{bmatrix} \le 0 \;\text{(maximum)},
\qquad
H\bigl(\lambda_2 = +\tfrac{3}{8}\bigr) = \begin{bmatrix} +\tfrac{3}{4} & 0 & 0 \\ 0 & +\tfrac{3}{2} & 0 \\ 0 & 0 & 0 \end{bmatrix} \ge 0 \;\text{(minimum)}.
\]

(x, y, z; λ, μ)₁ = (1/3, 2/3, −1/4; −3/8, −1/4) is the restricted maximal solution point; (x, y, z; λ, μ)₂ = (−1/3, −2/3, 1/4; +3/8, −1/4) is the restricted minimal solution point.

The geometric interpretation of the Hesse matrix follows from E. Grafarend and P. Lohle (1991).
