
4.3 ESTIMATION IN THE MULTIVARIATE NORMAL

4.3.1 Maximum Likelihood Estimation

When a distribution such as the multivariate normal is assumed to hold for a population, estimates of the parameters are often found by the method of maximum likelihood. This technique is conceptually simple: The observation vectors $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$ are considered to be known, and values of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are sought that maximize the joint density of the $\mathbf{y}$'s, called the likelihood function. For the multivariate normal, the maximum likelihood estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are

$$\hat{\boldsymbol{\mu}} = \overline{\mathbf{y}}, \tag{4.11}$$

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{y}_i - \overline{\mathbf{y}})(\mathbf{y}_i - \overline{\mathbf{y}})' = \frac{1}{n}\,\mathbf{W} = \frac{n-1}{n}\,\mathbf{S}, \tag{4.12}$$

where $\mathbf{W} = \sum_{i=1}^{n} (\mathbf{y}_i - \overline{\mathbf{y}})(\mathbf{y}_i - \overline{\mathbf{y}})'$ and $\mathbf{S}$ is the sample covariance matrix defined in (3.22) and (3.27). Since $\hat{\boldsymbol{\Sigma}}$ has divisor $n$ instead of $n - 1$, it is biased [see (3.33)], and we usually use $\mathbf{S}$ in place of $\hat{\boldsymbol{\Sigma}}$.
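As a concrete illustration, the following sketch computes these quantities in Python with NumPy. The data here are randomly generated and purely hypothetical; the point is only to verify the relationship $\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\mathbf{W} = \frac{n-1}{n}\mathbf{S}$ in (4.12).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n observations on p variables.
n, p = 50, 3
Y = rng.normal(size=(n, p))   # rows are the observation vectors y_1, ..., y_n

ybar = Y.mean(axis=0)         # maximum likelihood estimate of mu, as in (4.11)
D = Y - ybar                  # deviations y_i - ybar as rows
W = D.T @ D                   # W = sum_i (y_i - ybar)(y_i - ybar)'
Sigma_hat = W / n             # ML estimate of Sigma (divisor n), as in (4.12)
S = W / (n - 1)               # sample covariance matrix S (divisor n - 1)

# Check (4.12): Sigma_hat = W / n = ((n - 1) / n) S
assert np.allclose(Sigma_hat, (n - 1) / n * S)
```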

We now give a justification of $\overline{\mathbf{y}}$ as the maximum likelihood estimator of $\boldsymbol{\mu}$. Because the $\mathbf{y}_i$'s constitute a random sample, they are independent, and the joint density is the product of the densities of the $\mathbf{y}$'s. The likelihood function is, therefore,

$$L(\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \prod_{i=1}^{n} f(\mathbf{y}_i, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \prod_{i=1}^{n} \frac{1}{(\sqrt{2\pi})^{p} |\boldsymbol{\Sigma}|^{1/2}}\, e^{-(\mathbf{y}_i - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{y}_i - \boldsymbol{\mu})/2} = \frac{1}{(\sqrt{2\pi})^{np} |\boldsymbol{\Sigma}|^{n/2}}\, e^{-\sum_{i=1}^{n} (\mathbf{y}_i - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{y}_i - \boldsymbol{\mu})/2}. \tag{4.13}$$

To see that $\hat{\boldsymbol{\mu}} = \overline{\mathbf{y}}$ maximizes the likelihood function, we begin by adding and subtracting $\overline{\mathbf{y}}$ in the exponent in (4.13),

$$-\frac{1}{2} \sum_{i=1}^{n} (\mathbf{y}_i - \overline{\mathbf{y}} + \overline{\mathbf{y}} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{y}_i - \overline{\mathbf{y}} + \overline{\mathbf{y}} - \boldsymbol{\mu}).$$

When this is expanded in terms of $\mathbf{y}_i - \overline{\mathbf{y}}$ and $\overline{\mathbf{y}} - \boldsymbol{\mu}$, two of the four resulting terms vanish because $\sum_i (\mathbf{y}_i - \overline{\mathbf{y}}) = \mathbf{0}$, and (4.13) becomes

$$L = \frac{1}{(\sqrt{2\pi})^{np} |\boldsymbol{\Sigma}|^{n/2}}\, e^{-\sum_{i=1}^{n} (\mathbf{y}_i - \overline{\mathbf{y}})' \boldsymbol{\Sigma}^{-1} (\mathbf{y}_i - \overline{\mathbf{y}})/2 \,-\, n(\overline{\mathbf{y}} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\overline{\mathbf{y}} - \boldsymbol{\mu})/2}. \tag{4.14}$$
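The step from (4.13) to (4.14) is easy to check numerically. The sketch below (Python with NumPy; the values of $n$, $p$, $\boldsymbol{\mu}$, and $\boldsymbol{\Sigma}$ are arbitrary illustrative choices, not taken from the text) evaluates the exponent in both forms and confirms that they agree, i.e., that the cross-product terms contribute nothing.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative data and parameters.
n, p = 40, 2
Y = rng.normal(size=(n, p))
mu = np.array([0.5, -1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
Sinv = np.linalg.inv(Sigma)
ybar = Y.mean(axis=0)

# Exponent as in (4.13): -(1/2) sum_i (y_i - mu)' Sigma^{-1} (y_i - mu)
d = Y - mu
expo_413 = -0.5 * np.einsum('ij,jk,ik->', d, Sinv, d)

# Exponent as in (4.14):
# -(1/2) sum_i (y_i - ybar)' Sigma^{-1} (y_i - ybar)
#   - (n/2) (ybar - mu)' Sigma^{-1} (ybar - mu)
e = Y - ybar
expo_414 = (-0.5 * np.einsum('ij,jk,ik->', e, Sinv, e)
            - 0.5 * n * (ybar - mu) @ Sinv @ (ybar - mu))

assert np.isclose(expo_413, expo_414)   # the cross terms vanish, as claimed
```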

Since $\boldsymbol{\Sigma}^{-1}$ is positive definite, we have $-n(\overline{\mathbf{y}} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\overline{\mathbf{y}} - \boldsymbol{\mu})/2 \le 0$ and $0 < e^{-n(\overline{\mathbf{y}} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\overline{\mathbf{y}} - \boldsymbol{\mu})/2} \le 1$, with the maximum occurring when the exponent is 0. Therefore, $L$ is maximized when $\hat{\boldsymbol{\mu}} = \overline{\mathbf{y}}$.

The maximum likelihood estimator of the population correlation matrix $\mathbf{P}_\rho$ [see (3.39)] is the sample correlation matrix; that is, $\hat{\mathbf{P}}_\rho = \mathbf{R}$.

Relationships among multinormal variables are linear, as can be seen in (4.7). Thus the estimators $\mathbf{S}$ and $\mathbf{R}$ serve well for the multivariate normal because they measure only linear relationships (see Sections 3.2.1 and 4.2). These estimators are not as useful for some nonnormal distributions.

4.3.2 Distribution of $\overline{\mathbf{y}}$ and $\mathbf{S}$

For the distribution of $\overline{\mathbf{y}} = \sum_{i=1}^{n} \mathbf{y}_i / n$, we can distinguish two cases:

1. When $\overline{\mathbf{y}}$ is based on a random sample $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$ from a multivariate normal distribution $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\overline{\mathbf{y}}$ is $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}/n)$.

2. When $\overline{\mathbf{y}}$ is based on a random sample $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$ from a nonnormal multivariate population with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, then for large $n$, $\overline{\mathbf{y}}$ is approximately $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}/n)$. More formally, this result is known as the multivariate central limit theorem: If $\overline{\mathbf{y}}$ is the mean vector of a random sample $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$ from a population with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, then as $n \to \infty$, the distribution of $\sqrt{n}(\overline{\mathbf{y}} - \boldsymbol{\mu})$ approaches $N_p(\mathbf{0}, \boldsymbol{\Sigma})$.
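Case 2 can be illustrated by simulation. In the sketch below (Python with NumPy; the exponential population and the sample sizes are hypothetical choices, not from the text), $\sqrt{n}(\overline{\mathbf{y}} - \boldsymbol{\mu})$ is computed for many independent samples from a decidedly nonnormal population, and its empirical mean vector and covariance matrix are compared with $\mathbf{0}$ and $\boldsymbol{\Sigma}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Nonnormal population: p independent Exp(1) coordinates,
# so mu = (1, ..., 1)' and Sigma = I (illustrative choice).
p, n, reps = 2, 200, 5000
mu = np.ones(p)
Sigma = np.eye(p)

# reps independent samples, each of size n, from the nonnormal population
Y = rng.exponential(scale=1.0, size=(reps, n, p))
Z = np.sqrt(n) * (Y.mean(axis=1) - mu)   # sqrt(n)(ybar - mu), one row per sample

# By the multivariate central limit theorem, Z is approximately N_p(0, Sigma):
print(Z.mean(axis=0))   # close to the zero vector
print(np.cov(Z.T))      # close to Sigma (the identity here)
```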



There are $p$ variances in $\mathbf{S}$ and $\binom{p}{2}$ covariances, for a total of

$$p + \binom{p}{2} = p + \frac{p(p-1)}{2} = \frac{p(p+1)}{2}$$

distinct entries. For example, with $p = 4$, $\mathbf{S}$ has 4 variances and $\binom{4}{2} = 6$ covariances, for a total of $4(5)/2 = 10$ distinct entries.
The joint distribution of these $p(p+1)/2$ distinct variables in $\mathbf{W} = (n-1)\mathbf{S} = \sum_i (\mathbf{y}_i - \overline{\mathbf{y}})(\mathbf{y}_i - \overline{\mathbf{y}})'$ is the Wishart distribution, denoted by $W_p(n-1, \boldsymbol{\Sigma})$, where $n-1$ is the degrees of freedom. The Wishart distribution is the multivariate analogue of the $\chi^2$-distribution, and it has similar uses.

As noted in property 3 of Section 4.2, a $\chi^2$ random variable is defined formally as the sum of squares of independent standard normal (univariate) random variables:

$$\sum_{i=1}^{n} z_i^2 = \sum_{i=1}^{n} \frac{(y_i - \mu)^2}{\sigma^2} \quad \text{is} \quad \chi^2(n).$$