MULTIVARIATE SKEW-NORMAL/INDEPENDENT DISTRIBUTIONS: PROPERTIES AND INFERENCE

V. H. LACHOS, Departamento de Estatística, Universidade Estadual de Campinas, Brazil
F. V. LABRA, Departamento de Estatística, Universidade Estadual de Campinas, Brazil
P. GHOSH, Georgia State University, Atlanta, GA 30303, USA

Abstract. Liu (1996) discussed a class of robust distributions known as normal/independent distributions (Andrews and Mallows, 1974; Lange and Sinsheimer, 1993), which contains a group of thick-tailed distributions. In this article we develop a skewed version of these distributions in the multivariate setting, which we call the class of multivariate skew-normal/independent distributions, and we derive several of its useful properties. The main virtue of the members of this family is that they are easy to simulate from and that they lend themselves to an EM-type algorithm for maximum likelihood estimation. For two multivariate models of practical interest, the EM-type algorithm is discussed with emphasis on the skew-t, the skew-slash, and the contaminated skew-normal distributions. Results obtained from simulated and two real data sets are reported, illustrating the usefulness of the proposed methodology.

Key words: EM algorithm; normal/independent distributions; skewness; measurement error models.
1 Introduction
A normal distribution is a routine assumption for analysing real data, but it may be unrealistic, especially for data with strong skewness and heavy tails. In practice we encounter many data sets that are skewed or heavy-tailed, for instance data on family income or CD4 count data from AIDS studies. Thus, one needs a flexible class of models that can readily adapt to the non-normal behavior of the data. Flexible families that include several known distributions, among them the normal distribution, are of particular importance, since such models can adapt to distributions in the neighborhood of the normal model (DiCiccio and Monti, 2004). Lange and Sinsheimer (1993) developed the class of normal/independent distributions, which contains a group of thick-tailed distributions that are often used
Address for correspondence: Víctor Hugo Lachos Dávila, Departamento de Estatística, IMECC, Universidade Estadual de Campinas, Caixa Postal 6065, CEP 13083-859, Campinas, São Paulo, Brazil. Email:
[email protected]
for robust inference of symmetrical data (Liu, 1996). In this article we further generalize the normal/independent (NI) distributions by combining skewness with heavy tails. This new class of distributions is attractive because it simultaneously models skewness and heavy tails, admits a stochastic representation that allows easy implementation of the EM algorithm, and facilitates the study of many useful properties. Our proposal extends some of the recent results found in Azzalini and Capitanio (2003), Gupta (2003) and Wang and Genton (2006). Azzalini (1985) proposed the univariate skew-normal distribution, which was generalized to the multivariate case by Azzalini and Dalla-Valle (1996) and Arellano-Valle et al. (2005). The multivariate skew-normal density extends the multivariate normal model by allowing a shape parameter to account for skewness. The probability density function (pdf) of the generic element of a multivariate skew-normal distribution is given by

f(y) = 2 φ_p(y | µ, Σ) Φ_1(λ^⊤ Σ^{-1/2}(y − µ)),  y ∈ R^p,  (1)

where φ_p(· | µ, Σ) stands for the pdf of the p-variate normal distribution with mean vector µ and covariance matrix Σ, Φ_1(·) represents the cumulative distribution function (cdf) of the standard normal distribution, and Σ^{-1/2} satisfies Σ^{-1/2} Σ^{-1/2} = Σ^{-1}. When λ = 0, the skew-normal distribution reduces to the normal distribution (Y ∼ N_p(µ, Σ)). A p-dimensional random vector Y with pdf as in (1) will be denoted by SN_p(µ, Σ, λ). Its marginal stochastic representation, which can be used to derive several of its properties, is given by

Y =_d µ + Σ^{1/2}(δ|T_0| + (I_p − δδ^⊤)^{1/2} T_1),  with δ = λ/(1 + λ^⊤λ)^{1/2},  (2)
where |T_0| denotes the absolute value of T_0, T_0 ∼ N_1(0, 1) and T_1 ∼ N_p(0, I_p) are independent, and "=_d" means "distributed as". From (2) it follows that the expectation and variance of Y are given, respectively, by

E[Y] = µ + (2/π)^{1/2} Σ^{1/2} δ,  (3)
Var[Y] = Σ − (2/π) Σ^{1/2} δδ^⊤ Σ^{1/2}.  (4)

Several extensions of the above model have been proposed, viz., the skew-t distributions (Sahu et al., 2003; Gupta, 2003), skew-Cauchy distributions (Arnold and Beaver, 2000), skew-slash distributions (Wang and Genton, 2006), skew-slash-t distributions (Tan and Peng, 2006), and skew-elliptical distributions (Azzalini and Capitanio, 1999; Branco and Dey, 2001; Sahu et al., 2003; Genton and Loperfido, 2005). In this article, we define a new unified family of asymmetric distributions that offers much needed flexibility by combining skewness with heavy tails. This family contains, as special cases, the multivariate skew-normal distribution defined by Arellano-Valle et al. (2005), the multivariate skew-slash distribution defined by Wang and Genton (2006), the multivariate skew-t distribution defined by Azzalini and Capitanio (2003), and all the distributions studied by Lange and Sinsheimer
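The stochastic representation (2) makes simulation of skew-normal vectors immediate. The sketch below (parameter values are illustrative) draws from SN_p(µ, Σ, λ) and checks the sample mean against formula (3):

```python
import numpy as np
from scipy.linalg import sqrtm

def rand_skew_normal(n, mu, Sigma, lam, rng):
    """Draw n samples from SN_p(mu, Sigma, lam) via representation (2)."""
    p = len(mu)
    delta = lam / np.sqrt(1.0 + lam @ lam)
    S_half = np.real(sqrtm(Sigma))                       # Sigma^{1/2}
    core = np.real(sqrtm(np.eye(p) - np.outer(delta, delta)))
    T0 = np.abs(rng.standard_normal(n))                  # |T_0|, half-normal
    T1 = rng.standard_normal((n, p))
    Z = T0[:, None] * delta + T1 @ core.T                # delta|T0| + (I - dd')^{1/2} T1
    return mu + Z @ S_half.T

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
lam = np.array([3.0, 1.0])
Y = rand_skew_normal(200_000, mu, Sigma, lam, rng)

delta = lam / np.sqrt(1 + lam @ lam)
mean_theory = mu + np.sqrt(2 / np.pi) * np.real(sqrtm(Sigma)) @ delta  # eq. (3)
print(np.abs(Y.mean(axis=0) - mean_theory).max())        # small Monte Carlo error
```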
(1993) in a symmetric context. Thus, our proposal is a more flexible class than the existing skewed distributions, since it allows easy implementation of inference in many types of models. We point out that the results and methods provided in this paper are not available elsewhere in the literature. The plan of the paper is as follows. In Section 2, the normal/independent (NI) distributions are reviewed for completeness. In Section 3, the skew-normal/independent (SNI) distributions are described and the main results are presented. In Section 4, we derive the maximum likelihood estimates (MLE) for two important applications of SNI distributions. Analytical expressions for the observed information matrix are derived in Section 5. Illustrative examples are presented in Section 6, indicating the usefulness of the proposed methodology, and concluding remarks close the paper.
2 Normal/independent distributions
The symmetric family of NI distributions has attracted much attention in recent years, mainly because it includes distributions such as the Student-t, the slash, the power exponential, and the contaminated normal distributions, all of which have heavier tails than the normal. We say that a p-dimensional vector Y has an NI distribution (see, for instance, Lange and Sinsheimer, 1993) with location parameter µ ∈ R^p and positive definite scale matrix Σ if its density function assumes the form

f(y) = ∫_0^∞ φ_p(y | µ, u^{-1} Σ) dH(u),  (5)
where H(u; ν) is the cdf of a unidimensional positive random variable U indexed by a parameter vector ν. For a random vector with pdf as in (5) we use the notation Y ∼ NI_p(µ, Σ; H); when µ = 0 and Σ = I_p we write Y ∼ NI_p(H). Its stochastic representation is given by

Y =_d µ + U^{-1/2} Z,  (6)

where Z ∼ N_p(0, Σ) and U is a positive random variable with cdf H, independent of Z. Examples of NI distributions are described subsequently (see Lange and Sinsheimer, 1993). For this family, the distributional properties of the Mahalanobis distance d = (y − µ)^⊤ Σ^{-1}(y − µ) are described, because they are extremely useful for testing goodness of fit and detecting outliers.
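Representation (6) reduces NI simulation to drawing the mixing variable U and scaling a normal vector. A sketch for the Student-t member (U ∼ Gamma(ν/2, ν/2), so that the covariance matrix is (ν/(ν − 2))Σ; all parameter values illustrative):

```python
import numpy as np

def rand_ni(n, mu, Sigma, rand_u, rng):
    """Draw from an NI distribution via (6): Y = mu + U^{-1/2} Z,
    with Z ~ N_p(0, Sigma) and U ~ H independent of Z."""
    L = np.linalg.cholesky(Sigma)
    Z = rng.standard_normal((n, len(mu))) @ L.T
    U = rand_u(n, rng)
    return mu + Z / np.sqrt(U)[:, None]

rng = np.random.default_rng(1)
nu = 8.0
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
# U ~ Gamma(nu/2, rate nu/2) yields the Student-t distribution t_p(mu, Sigma, nu)
t_mix = lambda n, r: r.gamma(nu / 2, 2 / nu, size=n)
Y = rand_ni(400_000, mu, Sigma, t_mix, rng)
print(np.cov(Y.T))        # approximately (nu/(nu-2)) * Sigma
```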
2.1 Examples of NI distributions
• The Student-t distribution with ν > 0 degrees of freedom, Y ∼ t_p(µ, Σ, ν). The use of the t distribution as an alternative to the normal distribution has frequently been suggested in the literature; for example, Little (1988) and Lange et al. (1989) use the Student-t distribution for robust modeling. Y has density

f(y) = (Γ((p + ν)/2) / (Γ(ν/2) π^{p/2} ν^{p/2})) |Σ|^{-1/2} (1 + d/ν)^{-(p+ν)/2}.  (7)

In this case U ∼ Gamma(ν/2, ν/2), where H(u; ν) has density

h(u; ν) = ((ν/2)^{ν/2} / Γ(ν/2)) u^{ν/2−1} exp(−νu/2),  (8)

with finite reciprocal moments E[U^{−m}] = (ν/2)^m Γ(ν/2 − m)/Γ(ν/2), for m < ν/2. From Lange and Sinsheimer (1993), it also follows that

d = (y − µ)^⊤ Σ^{-1}(y − µ) ∼ pF(p, ν).

• The slash distribution, Y ∼ SL_p(µ, Σ, ν), with shape parameter ν > 0. This distribution has heavier tails than the normal distribution and includes the normal case in the limit ν ↑ ∞. Its pdf is given by

f(y) = ν ∫_0^1 u^{ν−1} φ_p(y | µ, u^{-1}Σ) du.  (9)

Here H(u; ν) has density

h(u; ν) = ν u^{ν−1} I_{(0,1)}(u),  (10)

with reciprocal moments E[U^{−m}] = ν/(ν − m), for m < ν, and the Mahalanobis distance has cdf

Pr(d ≤ r) = Pr(χ²_p ≤ r) − (2^ν Γ(p/2 + ν) / (r^ν Γ(p/2))) Pr(χ²_{p+2ν} ≤ r).

• The contaminated normal distribution, Y ∼ CN_p(µ, Σ, ν, γ), 0 ≤ ν ≤ 1, 0 < γ ≤ 1 (Little, 1988). This distribution may also be applied for modeling symmetric data with outlying observations. The parameter ν represents the proportion of outliers, while γ may be interpreted as a scale factor. Its pdf is given by

f(y) = ν φ_p(y | µ, Σ/γ) + (1 − ν) φ_p(y | µ, Σ).  (11)

In this case the probability function of H(u; ν) is given by

h(u; ν) = ν I_{(u=γ)} + (1 − ν) I_{(u=1)},  ν = (ν, γ)^⊤,  (12)

where I_{(A)} denotes the indicator function of the set A. Clearly, E[U^{−m}] = ν/γ^m + 1 − ν, and

Pr(d ≤ r) = ν Pr(χ²_p ≤ γr) + (1 − ν) Pr(χ²_p ≤ r).
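The reciprocal moments quoted above follow from direct integration against h(u; ν), and can be confirmed numerically; a minimal quadrature check for the Student-t and slash mixing densities (parameter values illustrative):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

nu = 5.0
# Student-t mixing: U ~ Gamma(nu/2, nu/2); E[U^-m] = (nu/2)^m Gamma(nu/2-m)/Gamma(nu/2)
for m in (1, 2):
    num, _ = quad(lambda u: u**(-m) * (nu / 2)**(nu / 2) * u**(nu / 2 - 1)
                  * np.exp(-nu * u / 2) / gamma(nu / 2), 0, np.inf)
    closed = (nu / 2)**m * gamma(nu / 2 - m) / gamma(nu / 2)
    assert abs(num - closed) < 1e-8

# Slash mixing: h(u) = nu u^{nu-1} on (0,1); E[U^-m] = nu/(nu-m), for m < nu
m, nu_s = 1, 2.0
num, _ = quad(lambda u: u**(-m) * nu_s * u**(nu_s - 1), 0, 1)
assert abs(num - nu_s / (nu_s - m)) < 1e-10
print("reciprocal-moment identities verified")
```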
The power-exponential distribution is also of NI type; however, its scale distribution H(u; ν) is not computationally attractive, and it will not be dealt with in this work.
3 Multivariate skew-normal/independent distributions and main results
In this section we define the multivariate SNI distributions and study some of their important properties, viz., moments, kurtosis, linear transformations, and marginal and conditional distributions.

Definition 1. A p-dimensional random vector Y follows an SNI distribution with location parameter µ ∈ R^p, scale matrix Σ (a p × p positive definite matrix) and skewness parameter λ ∈ R^p, if its pdf is given by

f(y) = 2 ∫_0^∞ φ_p(y | µ, u^{-1}Σ) Φ_1(u^{1/2} λ^⊤ Σ^{-1/2}(y − µ)) dH(u)
     = 2 ∫_0^∞ (u^{p/2}/(2π)^{p/2}) |Σ|^{-1/2} e^{−u d_λ/2} Φ_1(u^{1/2} λ^⊤ Σ^{-1/2}(y − µ)) dH(u),  (13)
where U is a positive random variable with cdf H(u; ν). For a random vector with pdf as in (13) we use the notation Y ∼ SNI_p(µ, Σ, λ; H); when µ = 0 and Σ = I_p we get a standard SNI distribution, denoted by SNI_p(λ; H). It is clear from (13) that when λ = 0 we recover the NI distribution defined in (5). For a random vector with pdf as in (13), we write the Mahalanobis distance as d_λ = (y − µ)^⊤ Σ^{-1}(y − µ). In Definition 1, note that the cdf H(u; ν) is indexed by the parameter vector ν. Thus, if ν_∞ is such that, as ν ↑ ν_∞, H(u; ν) converges weakly to the distribution function H_∞(u) = H(u; ν_∞) of the unit point mass at 1, then the density in (13) converges to the density of a random vector having a skew-normal distribution. The proof of this result is similar to that of Lange and Sinsheimer (1993) for the NI case. For an SNI random vector, the stochastic representation given below can be used to quickly simulate pseudo-realizations of Y and also to study many of its properties.

Proposition 1. Let Y ∼ SNI_p(µ, Σ, λ; H). Then

Y =_d µ + U^{-1/2} Z,  (14)

where Z ∼ SN_p(0, Σ, λ) and U is a positive random variable with cdf H, independent of Z.

Proof. The proof follows from the fact that Y | U = u ∼ SN_p(µ, u^{-1}Σ, λ).

Notice that the stochastic representation given in (6) for the NI case is the special case of (14) with λ = 0; hence we have extended the family of NI distributions to the skewed case. Besides, from (2) it follows that (14) can be written as

Y =_d µ + U^{-1/2} Σ^{1/2}(δ|X_0| + (I_p − δδ^⊤)^{1/2} X_1),  (15)
where δ = λ/(1 + λ^⊤λ)^{1/2}, and U, X_0 ∼ N_1(0, 1) and X_1 ∼ N_p(0, I_p) are all independent. The marginal stochastic representation given in (15) is very important, since it allows the EM algorithm to be implemented for a wide variety of linear models, similar to those of Lachos et al. (2007). In the next proposition we derive a general expression for the moment generating function (mgf) of an SNI random vector.

Proposition 2. Let Y ∼ SNI_p(µ, Σ, λ; H). Then

M_Y(s) = E[e^{s^⊤Y}] = 2 ∫_0^∞ e^{s^⊤µ + s^⊤Σs/(2u)} Φ_1(u^{-1/2} δ^⊤ Σ^{1/2} s) dH(u),  s ∈ R^p.  (16)

Proof. From Proposition 1 we have that Y | U = u ∼ SN_p(µ, u^{-1}Σ, λ). Now, from known properties of conditional expectation, M_Y(s) = E_U[E[e^{s^⊤Y} | U]], and the proof follows from the fact that U is a positive random variable with cdf H and that, if Z ∼ SN_p(µ, Σ, λ), then M_Z(s) = 2 e^{s^⊤µ + s^⊤Σs/2} Φ_1(δ^⊤ Σ^{1/2} s).

In the next proposition we show that an SNI random vector is invariant under linear transformations, which implies that the marginal distributions of Y ∼ SNI_p(µ, Σ, λ; H) are still SNI.
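Representation (15) gives a two-line SNI sampler: draw the mixing variable U and rescale a skew-normal draw. A sketch for the skew-t member (U ∼ Gamma(ν/2, ν/2)), checking the sample mean against µ + (2/π)^{1/2} E[U^{-1/2}] Σ^{1/2} δ with E[U^{-1/2}] = (ν/2)^{1/2} Γ((ν − 1)/2)/Γ(ν/2); parameter values are illustrative:

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.special import gamma

def rand_sni(n, mu, Sigma, lam, rand_u, rng):
    """Draw from SNI_p(mu, Sigma, lam; H) via representation (15)."""
    p = len(mu)
    delta = lam / np.sqrt(1 + lam @ lam)
    S_half = np.real(sqrtm(Sigma))
    core = np.real(sqrtm(np.eye(p) - np.outer(delta, delta)))
    X0 = np.abs(rng.standard_normal(n))
    X1 = rng.standard_normal((n, p))
    U = rand_u(n, rng)
    Z = X0[:, None] * delta + X1 @ core
    return mu + (Z @ S_half) / np.sqrt(U)[:, None]

rng = np.random.default_rng(2)
nu = 6.0
mu = np.zeros(2)
lam = np.array([2.0, 1.0])
Sigma = np.eye(2)
Y = rand_sni(300_000, mu, Sigma, lam,
             lambda n, r: r.gamma(nu / 2, 2 / nu, n), rng)   # skew-t draws

delta = lam / np.sqrt(1 + lam @ lam)
E_usqrt = np.sqrt(nu / 2) * gamma((nu - 1) / 2) / gamma(nu / 2)   # E[U^{-1/2}]
mean_theory = mu + np.sqrt(2 / np.pi) * E_usqrt * delta           # Sigma = I here
print(np.abs(Y.mean(axis=0) - mean_theory).max())
```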
Proposition 3. Let Y ∼ SNI_p(µ, Σ, λ; H). Then for any fixed vector b ∈ R^m and matrix A ∈ R^{m×p} of full row rank,

V = b + AY ∼ SNI_m(b + Aµ, AΣA^⊤, λ*; H),  (17)

where λ* = δ*/(1 − δ*^⊤δ*)^{1/2}, with δ* = (AΣA^⊤)^{-1/2} AΣ^{1/2} δ. Moreover, if m = p and the matrix A is nonsingular, then λ* = λ. Also, for any a ∈ R^p,

a^⊤Y ∼ SNI_1(a^⊤µ, a^⊤Σa, λ*; H),

where λ* = α/(1 − α²)^{1/2}, with α = {a^⊤Σa (1 + λ^⊤λ)}^{-1/2} a^⊤Σ^{1/2}λ.

Proof. The result follows directly from Proposition 2, since M_{b+AY}(s) = e^{s^⊤b} M_Y(A^⊤s). When A is nonsingular, it is easy to see that δ* = δ.

By using Proposition 3 with A = [I_{p1}, 0_{p2}], p1 + p2 = p, we have the following additional property of an SNI random vector, related to the marginal distributions.

Corollary 1. Let Y ∼ SNI_p(µ, Σ, λ; H), and let Y be partitioned as Y^⊤ = (Y_1^⊤, Y_2^⊤), of dimensions p1 and p2 (p1 + p2 = p), respectively; let

Σ = ( Σ_11  Σ_12 ; Σ_21  Σ_22 ),  µ = (µ_1^⊤, µ_2^⊤)^⊤

be the corresponding partitions of Σ and µ. Then the marginal density of Y_1 is SNI_{p1}(µ_1, Σ_11, Σ_11^{1/2} υ̃; H), where

υ̃ = (υ_1 + Σ_11^{-1} Σ_12 υ_2) / (1 + υ_2^⊤ Σ_{22.1} υ_2)^{1/2},

with Σ_{22.1} = Σ_22 − Σ_21 Σ_11^{-1} Σ_12 and υ = Σ^{-1/2} λ = (υ_1^⊤, υ_2^⊤)^⊤.
Proposition 4. Under the notation of Corollary 1, if Y ∼ SNI_p(µ, Σ, λ; H), then the distribution of Y_2 conditionally on Y_1 = y_1 and U = u has density

f(y_2 | y_1, u) = φ_{p2}(y_2 | µ_{2.1}, u^{-1} Σ_{22.1}) Φ_1(u^{1/2} υ^⊤(y − µ)) / Φ_1(u^{1/2} υ̃^⊤(y_1 − µ_1)),  (18)

with µ_{2.1} = µ_2 + Σ_21 Σ_11^{-1}(y_1 − µ_1). Furthermore,

E[Y_2 | y_1, u] = µ_{2.1} + u^{-1/2} (Σ_{22.1} υ_2 / (1 + υ_2^⊤ Σ_{22.1} υ_2)^{1/2}) φ_1(u^{1/2} υ̃^⊤(y_1 − µ_1)) / Φ_1(u^{1/2} υ̃^⊤(y_1 − µ_1)).  (19)

Proof. In fact, f(y_2 | y_1, u) = f(y | u)/f(y_1 | u), and (18) follows by noting that Y | U = u ∼ SN_p(µ, u^{-1}Σ, λ) and Y_1 | U = u ∼ SN_{p1}(µ_1, u^{-1}Σ_11, Σ_11^{1/2} υ̃). Result (19) follows from Lemma 2 given in Appendix B, with A = υ_1^⊤(y_1 − µ_1) − υ_2^⊤µ_2, B = υ_2, µ = µ_{2.1} and Σ = Σ_{22.1}, which concludes the proof.

Note that, given u, when Σ_21 = 0 and λ_2 = 0 it is possible to obtain independence of the components Y_1 and Y_2 of an SNI random vector Y. The following result is a byproduct of Proposition 4, since E[Y_2 | y_1] = E_U[E[Y_2 | y_1, U] | y_1].

Proposition 5. Under the notation of Corollary 1, if Y ∼ SNI_p(µ, Σ, λ; H), then the first moment of Y_2 conditionally on Y_1 = y_1 is given by

E[Y_2 | y_1] = µ_{2.1} + (Σ_{22.1} υ_2 / (1 + υ_2^⊤ Σ_{22.1} υ_2)^{1/2}) E[U^{-1/2} φ_1(U^{1/2} υ̃^⊤(y_1 − µ_1)) / Φ_1(U^{1/2} υ̃^⊤(y_1 − µ_1)) | y_1],

with µ_{2.1} = µ_2 + Σ_21 Σ_11^{-1}(y_1 − µ_1).
In Section 3.1 we give additional results for some elements of this family of distributions based on Proposition 5. The next result can be useful in applications to linear models, for instance when the model depends on a vector of unobservable random effects and a vector of random errors (a linear mixed model), in which the random effects are assumed to have an SNI distribution and the errors an NI distribution.

Proposition 6. Let X ∼ SNI_m(µ_1, Σ_1, λ; H) and Y ∼ NI_p(µ_2, Σ_2; H), where for a positive random variable U with cdf H we can write X =_d µ_1 + U^{-1/2} Z and Y =_d µ_2 + U^{-1/2} W, with Z ∼ SN_m(0, Σ_1, λ) independent of W ∼ N_p(0, Σ_2). Then, for any matrix A of dimension p × m,

AX + Y ∼ SNI_p(Aµ_1 + µ_2, AΣ_1A^⊤ + Σ_2, λ*; H),

where λ* = δ*/(1 − δ*^⊤δ*)^{1/2}, with δ* = (AΣ_1A^⊤ + Σ_2)^{-1/2} AΣ_1^{1/2} δ.
Proof. The proof is based on Proposition 2. Note first that, given U, X and Y are independent, so that, letting V = AX + Y, we have

M_V(s) = E_U(E[e^{s^⊤AX} | U] E[e^{s^⊤Y} | U])
= 2 ∫_0^∞ e^{s^⊤Aµ_1 + s^⊤AΣ_1A^⊤s/(2u)} Φ_1(u^{-1/2} δ^⊤Σ_1^{1/2}A^⊤s) e^{s^⊤µ_2 + s^⊤Σ_2s/(2u)} dH(u)
= 2 ∫_0^∞ e^{s^⊤(Aµ_1 + µ_2) + s^⊤(AΣ_1A^⊤ + Σ_2)s/(2u)} Φ_1(u^{-1/2} δ^⊤Σ_1^{1/2}A^⊤s) dH(u)
= 2 ∫_0^∞ e^{s^⊤(Aµ_1 + µ_2) + s^⊤(AΣ_1A^⊤ + Σ_2)s/(2u)} Φ_1(u^{-1/2} δ_*^⊤Ψ^{1/2}s) dH(u),

where Ψ = AΣ_1A^⊤ + Σ_2 and δ_* = Ψ^{-1/2}AΣ_1^{1/2}δ, and the proof follows from Proposition 2.

In the following proposition we derive the mean vector and the covariance matrix of an SNI random vector. Moreover, we present the multidimensional kurtosis coefficient for an SNI random vector, which extends the kurtosis coefficient proposed by Azzalini and Capitanio (1999).

Proposition 7. Suppose that Y ∼ SNI_p(µ, Σ, λ; H). Then:
a) if E[U^{-1/2}] < ∞, then E[Y] = µ + (2/π)^{1/2} E[U^{-1/2}] Σ^{1/2} δ;
b) if E[U^{-1}] < ∞, then Var[Y] = Σ_y = E[U^{-1}] Σ − (2/π) E²[U^{-1/2}] Σ^{1/2} δδ^⊤ Σ^{1/2};
c) if E[U^{-2}] < ∞, then the multidimensional kurtosis coefficient is

γ_2(Y) = (E[U^{-2}]/E²[U^{-1}]) a_{1y} − 4 (E[U^{-3/2}]/E²[U^{-1}]) a_{2y} + a_{3y} − p(p + 2),

where a_{1y} = p(p + 2) + 2(p + 2) µ_y^⊤Σ_y^{-1}µ_y + 3(µ_y^⊤Σ_y^{-1}µ_y)², a_{2y} = p + µ_y^⊤Σ_y^{-1}µ_y + (1 + E[U^{-1}]/(2E²[U^{-1/2}]) − (2/π) E[U^{-1}]/E²[U^{-1/2}]) (µ_y^⊤Σ_y^{-1}µ_y)², and a_{3y} = 2(p + 2) µ_y^⊤Σ_y^{-1}µ_y + 3(µ_y^⊤Σ_y^{-1}µ_y)², with µ_y = E[Y − µ] = (2/π)^{1/2} E[U^{-1/2}] Σ^{1/2} δ.

Proof. Parts a) and b) follow from Proposition 1. To obtain the expression in c), we use the definition of multivariate kurtosis introduced by Mardia (1974). Without loss of generality take µ = 0, so that µ_y = E[Y] = (2/π)^{1/2} E[U^{-1/2}] Σ^{1/2} δ. Note first that the kurtosis is defined by γ_2(Y) = E[{(Y − µ_y)^⊤ Σ_y^{-1} (Y − µ_y)}²]. Now, using the stochastic representation of Y given in (14), we have

(Y − µ_y)^⊤ Σ_y^{-1} (Y − µ_y) =_d U^{-1} Z^⊤Σ_y^{-1}Z − 2U^{-1/2} Z^⊤Σ_y^{-1}µ_y + µ_y^⊤Σ_y^{-1}µ_y,

where Z ∼ SN_p(0, Σ, λ). According to the definition of γ_2(Y), and after some algebraic manipulations, the proof follows by using the first two moments of a quadratic form (see Genton, He and Liu, 2001) and Lemma 1 given in Appendix B.
Notice that under the skew-normal distribution, i.e., when U = 1, the multidimensional kurtosis coefficient reduces to γ_2(Y) = 2(π − 3)(µ_y^⊤Σ_y^{-1}µ_y)², which is the kurtosis coefficient of a skew-normal random vector (see, for instance, Azzalini and Capitanio, 1999).

Proposition 8. If Y ∼ SNI_p(µ, Σ, λ; H), then for any even function g, the distribution of g(Y − µ) does not depend on λ and is the same as that of g(X − µ), where X ∼ NI_p(µ, Σ; H). In particular, if A is a p × p symmetric matrix, then (Y − µ)^⊤A(Y − µ) and (X − µ)^⊤A(X − µ) are identically distributed.

Proof. The proof follows from Proposition 2 and a procedure similar to that found in Wang et al. (2004).

As a byproduct of Proposition 8, we have the following interesting result.

Corollary 2. Let Y ∼ SNI_p(µ, Σ, λ; H). Then the quadratic form d_λ = (Y − µ)^⊤Σ^{-1}(Y − µ) has the same distribution as d = (X − µ)^⊤Σ^{-1}(X − µ), where X ∼ NI_p(µ, Σ; H).

The result of Corollary 2 is interesting because it allows us to check models in practice (see Section 5). On the other hand, Corollary 2, together with the result found in Lange and Sinsheimer (1993, Section 2), allows us to obtain the mth moment of d_λ.

Corollary 3. Let Y ∼ SNI_p(µ, Σ, λ; H). Then for any m > 0,

E[d_λ^m] = (2^m Γ(m + p/2) / Γ(p/2)) E[U^{-m}].

3.1 Examples of SNI distributions
Some examples of SNI distributions include:

• The skew-t distribution with ν degrees of freedom, ST_p(µ, Σ, λ, ν). Considering U ∼ Gamma(ν/2, ν/2), a procedure similar to that found in Gupta (2003, Section 2) leads to the density function

f(y) = 2 t_p(y | µ, Σ, ν) T_1(((ν + p)/(d + ν))^{1/2} λ^⊤Σ^{-1/2}(y − µ) | 0, 1, ν + p),  y ∈ R^p,  (20)

where, as usual, t_p(· | µ, Σ, ν) and T_p(· | µ, Σ, ν) denote, respectively, the pdf and cdf of the Student-t distribution t_p(µ, Σ, ν) defined in (7). A particular case of the skew-t distribution is the skew-Cauchy distribution, obtained when ν = 1; when ν ↑ ∞, we get the skew-normal distribution as the limiting case. See Gupta (2003) for further details. In this case, from Proposition 7, the mean and covariance matrix of Y ∼ ST_p(µ, Σ, λ, ν) are given by

E[Y] = µ + (ν/π)^{1/2} (Γ((ν − 1)/2)/Γ(ν/2)) Σ^{1/2} δ,  ν > 1,  and
Var[Y] = (ν/(ν − 2)) Σ − (ν/π) (Γ((ν − 1)/2)/Γ(ν/2))² Σ^{1/2} δδ^⊤ Σ^{1/2},  ν > 2.

Figure 1: Density curves of the univariate skew-normal (SN), skew-t (ST), contaminated skew-normal (SNC) and skew-slash (SSL) distributions.
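A quick numerical sanity check of the skew-t density (20): in the univariate case it should integrate to one. A sketch with illustrative parameter values:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import t as student_t

def skew_t_pdf(y, mu, sigma2, lam, nu):
    """Univariate skew-t density, eq. (20) with p = 1."""
    s = np.sqrt(sigma2)
    d = (y - mu)**2 / sigma2                     # Mahalanobis distance
    A = lam * (y - mu) / s
    arg = A * np.sqrt((nu + 1) / (d + nu))
    return 2 * student_t.pdf((y - mu) / s, df=nu) / s * student_t.cdf(arg, df=nu + 1)

total, _ = quad(skew_t_pdf, -np.inf, np.inf, args=(0.5, 2.0, 3.0, 4.0))
print(total)                                     # approximately 1
```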
In what follows we give an important result, which will be used in the implementation of the EM algorithm and to find closed form expressions for the first conditional moment given in Proposition 5.

Proposition 9. If Y ∼ ST_p(µ, Σ, λ, ν), then

E[U^r | y] = (2^{r+1} ν^{ν/2} Γ((p + ν + 2r)/2) (d + ν)^{-(p+ν+2r)/2} / (f(y) Γ(ν/2) π^{p/2} |Σ|^{1/2})) T_1(((p + ν + 2r)/(d + ν))^{1/2} A | 0, 1, p + ν + 2r),

and

E[U^r W_{Φ_1}(U^{1/2} A) | y] = 2^{r+1/2} ν^{ν/2} Γ((p + ν + 2r)/2) (d + ν + A²)^{-(p+ν+2r)/2} / (f(y) Γ(ν/2) π^{(p+1)/2} |Σ|^{1/2}),

where A = λ^⊤Σ^{-1/2}(y − µ) and W_{Φ_1}(x) = φ_1(x)/Φ_1(x), x ∈ R.

Proof. The proof follows from Lemma 1 in Azzalini and Capitanio (2003) (see also Appendix B), since f(u | y) = f(y, u)/f(y) and

E[U^r | y] = (2/f(y)) ∫_0^∞ u^r φ_p(y | µ, u^{-1}Σ) Φ_1(u^{1/2} A) G_u(ν/2, ν/2) du,

and

E[U^r W_{Φ_1}(U^{1/2} A) | y] = (2/f(y)) ∫_0^∞ u^r φ_p(y | µ, u^{-1}Σ) φ_1(u^{1/2} A) G_u(ν/2, ν/2) du,

where G_u(ν/2, ν/2) denotes the pdf of the Gamma(ν/2, ν/2) distribution.

For a skew-t random vector Y partitioned as Y^⊤ = (Y_1^⊤, Y_2^⊤), we have from Corollary 1 that Y_1 ∼ ST_{p1}(µ_1, Σ_11, Σ_11^{1/2} υ̃, ν). Thus, from Proposition 5 we have the following result.

Corollary 4. Under the notation of Proposition 5, if Y ∼ ST_p(µ, Σ, λ, ν), then

E[Y_2 | y_1] = µ_{2.1} + (Σ_{22.1} υ_2 / (1 + υ_2^⊤Σ_{22.1}υ_2)^{1/2}) × (ν^{ν/2} Γ((ν + p_1 − 1)/2) / (f(y_1) Γ(ν/2) π^{(p_1+1)/2} |Σ_11|^{1/2})) (ν + d_{y_1} + (υ̃^⊤(y_1 − µ_1))²)^{-(ν+p_1−1)/2},

where d_{y_1} = (y_1 − µ_1)^⊤Σ_11^{-1}(y_1 − µ_1).
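The closed form in Proposition 9 can be verified against direct one-dimensional quadrature of f(u | y); a sketch for p = 1, r = 1, with illustrative parameter values:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, t as student_t, gamma as gamma_dist
from scipy.special import gamma

mu, sig2, lam, nu, y, r = 0.0, 1.5, 2.0, 5.0, 1.3, 1.0
s = np.sqrt(sig2)
d = (y - mu)**2 / sig2
A = lam * (y - mu) / s

def integrand(u, r):
    # u^r phi_1(y | mu, sigma^2/u) Phi_1(u^{1/2} A) g(u; nu/2, nu/2)
    return (u**r * norm.pdf(y, mu, s / np.sqrt(u))
            * norm.cdf(np.sqrt(u) * A) * gamma_dist.pdf(u, nu / 2, scale=2 / nu))

f_y = 2 * quad(integrand, 0, np.inf, args=(0.0,))[0]      # the skew-t density f(y)
EU_num = 2 * quad(integrand, 0, np.inf, args=(r,))[0] / f_y

k = (1 + nu + 2 * r) / 2                                  # (p + nu + 2r)/2, p = 1
EU_closed = (2**(r + 1) * nu**(nu / 2) * gamma(k) * (d + nu)**(-k)
             / (f_y * gamma(nu / 2) * np.sqrt(np.pi) * s)
             * student_t.cdf(np.sqrt(2 * k / (d + nu)) * A, df=2 * k))
print(abs(EU_num - EU_closed))   # agreement up to quadrature error
```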
• The skew-slash distribution with shape parameter ν > 0, SSL_p(µ, Σ, λ, ν). With h(u; ν) as in (10), it can easily be seen that

f(y) = 2ν ∫_0^1 u^{ν−1} φ_p(y | µ, u^{-1}Σ) Φ_1(u^{1/2} λ^⊤Σ^{-1/2}(y − µ)) du,  y ∈ R^p.  (21)

The skew-slash distribution reduces to the skew-normal distribution when ν ↑ ∞. See Wang and Genton (2006) for further details. In this case, from Proposition 7,

E[Y] = µ + (2/π)^{1/2} (2ν/(2ν − 1)) Σ^{1/2} δ,  ν > 1/2,  and
Var[Y] = (ν/(ν − 1)) Σ − (2/π) (2ν/(2ν − 1))² Σ^{1/2} δδ^⊤ Σ^{1/2},  ν > 1.
As in the skew-t case, we have the following results.

Proposition 10. If Y ∼ SSL_p(µ, Σ, λ, ν), then

E[U^r | y] = (2^{ν+r+1} ν Γ((p + 2ν + 2r)/2) P_1((p + 2ν + 2r)/2, d/2) d^{-(p+2ν+2r)/2} / (f(y) π^{p/2} |Σ|^{1/2})) E[Φ_1(S^{1/2} A)],

where S follows a Gamma((p + 2ν + 2r)/2, d/2) distribution truncated to (0, 1), and

E[U^r W_{Φ_1}(U^{1/2} A) | y] = (2^{ν+r+1/2} ν Γ((2ν + p + 2r)/2) / (f(y) π^{(p+1)/2} |Σ|^{1/2})) (d + A²)^{-(2ν+p+2r)/2} P_1((2ν + p + 2r)/2, (d + A²)/2),

where P_x(a, b) denotes the cdf of the Gamma(a, b) distribution evaluated at x.
Corollary 5. Under the notation of Proposition 5, if Y ∼ SSL_p(µ, Σ, λ, ν), then

E[Y_2 | y_1] = µ_{2.1} + (Σ_{22.1} υ_2 / (1 + υ_2^⊤Σ_{22.1}υ_2)^{1/2}) × (2^ν ν Γ((p_1 + 2ν − 1)/2) (d_{y_1} + (υ̃^⊤(y_1 − µ_1))²)^{-(p_1+2ν−1)/2} / (f(y_1) π^{(p_1+1)/2} |Σ_11|^{1/2})) × P_1((p_1 + 2ν − 1)/2, (d_{y_1} + (υ̃^⊤(y_1 − µ_1))²)/2),

where d_{y_1} = (y_1 − µ_1)^⊤Σ_11^{-1}(y_1 − µ_1).

• The contaminated skew-normal distribution, SCN_p(µ, Σ, λ, ν, γ), 0 ≤ ν ≤ 1, 0 < γ < 1. Taking h(u; ν) as in (12), it follows straightforwardly that

f(y) = 2{ν φ_p(y | µ, Σ/γ) Φ_1(γ^{1/2} λ^⊤Σ^{-1/2}(y − µ)) + (1 − ν) φ_p(y | µ, Σ) Φ_1(λ^⊤Σ^{-1/2}(y − µ))};  (22)

in this case, the contaminated skew-normal distribution reduces to the skew-normal distribution when γ = 1. The mean vector and the covariance matrix are given, respectively, by

E[Y] = µ + (2/π)^{1/2} (ν/γ^{1/2} + 1 − ν) Σ^{1/2} δ,  and
Var[Y] = (ν/γ + 1 − ν) Σ − (2/π) (ν/γ^{1/2} + 1 − ν)² Σ^{1/2} δδ^⊤ Σ^{1/2}.

From (22), the following results hold.

Proposition 11. If Y ∼ SCN_p(µ, Σ, λ, ν, γ), then

E[U^r | y] = (2/f(y)) [ν γ^r φ_p(y | µ, γ^{-1}Σ) Φ_1(γ^{1/2} A) + (1 − ν) φ_p(y | µ, Σ) Φ_1(A)]

and

E[U^r W_{Φ_1}(U^{1/2} A) | y] = (2/f(y)) [ν γ^r φ_p(y | µ, γ^{-1}Σ) φ_1(γ^{1/2} A) + (1 − ν) φ_p(y | µ, Σ) φ_1(A)].

Corollary 6. Under the notation of Proposition 5, if Y ∼ SCN_p(µ, Σ, λ, ν, γ), then

E[Y_2 | y_1] = µ_{2.1} + (2 Σ_{22.1} υ_2 / (f(y_1) (1 + υ_2^⊤Σ_{22.1}υ_2)^{1/2})) [ν γ^{-1/2} φ_{p_1}(y_1 | µ_1, γ^{-1}Σ_11) φ_1(γ^{1/2} υ̃^⊤(y_1 − µ_1)) + (1 − ν) φ_{p_1}(y_1 | µ_1, Σ_11) φ_1(υ̃^⊤(y_1 − µ_1))].
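Because the contaminated skew-normal has a two-point mixing distribution, its density (22) and moments can be checked deterministically by quadrature. A sketch for p = 1 (parameter values illustrative), verifying that (22) integrates to one and that its first moment matches the mean formula above:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sig2, lam, v, g = 0.0, 1.0, 3.0, 0.25, 0.5
s = np.sqrt(sig2)

def scn_pdf(y):
    """Univariate contaminated skew-normal density, eq. (22) with p = 1."""
    A = lam * (y - mu) / s
    return 2 * (v * norm.pdf(y, mu, s / np.sqrt(g)) * norm.cdf(np.sqrt(g) * A)
                + (1 - v) * norm.pdf(y, mu, s) * norm.cdf(A))

total, _ = quad(scn_pdf, -np.inf, np.inf)
m1, _ = quad(lambda y: y * scn_pdf(y), -np.inf, np.inf)

delta = lam / np.sqrt(1 + lam**2)
m1_closed = mu + np.sqrt(2 / np.pi) * (v / np.sqrt(g) + 1 - v) * s * delta
print(abs(total - 1), abs(m1 - m1_closed))    # both near zero
```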
Remark 1. a) The stochastic representation given by equation (6) can be used to obtain the slash Student-t distribution. Let U_1 have pdf as in (10), U_2 ∼ Gamma(ν/2, ν/2), ν > 0, and X ∼ N_p(0, Σ), all independently distributed. Then

Y =_d µ + U_1^{-1/2} U_2^{-1/2} X  (23)

has the slash Student-t distribution defined in Tan and Peng (2006). The proof follows from the fact that T = U_2^{-1/2} X ∼ t_p(0, Σ, ν).

b) Now, if X ∼ SN_p(0, Σ, λ), then Y in (23) has a skew-slash Student-t distribution, as shown by Tan and Peng (2006). Obviously, many other distributions can be constructed by choosing appropriate pdfs h(·; ν) for U_1 and U_2.

In Figure 1 we draw the density of the standard distribution SN_1(3) together with the standard densities of the distributions ST_1(3, 2), SSL_1(3, 1) and SCN_1(3, 0.5, 0.5), rescaled so that they have the same value at the origin. Note that the four densities are positively skewed, and that the skew-slash and the skew-t distributions have much heavier tails than the skew-normal distribution; indeed, for these choices of ν the skew-slash and skew-t distributions do not have finite variances. Figure 2 depicts some contours of the densities associated with the standard bivariate skew-normal distribution SN_2(λ), the standard bivariate skew-t distribution ST_2(λ, 2), the standard bivariate skew-slash distribution SSL_2(λ, 1), and the standard bivariate contaminated skew-normal distribution SCN_2(λ, 0.5, 0.5), with λ = (2, 1)^⊤ for all the distributions. Note that these contours are not elliptical, and they can be strongly asymmetric depending on the choice of the parameters.
3.2 A simulation study
To illustrate the usefulness of the SNI distributions, we perform a small simulation to study the behavior of two location estimators, the sample mean and the sample median, under four different standard univariate settings. In particular, we consider a standard skew-normal SN_1(3), a skew-t ST_1(3, 2), a skew-slash SSL_1(3, 1) and a contaminated skew-normal SCN_1(3, 0.9, 0.1). The mean of each asymmetric distribution is adjusted to zero, so that all four distributions are comparable; this setting thus represents four distributions with the same mean but different tail behaviors and skewness. Note that the skew-slash and skew-t have infinite variance when ν = 1 and 2, respectively. We simulate 500 samples of size n = 100 from each of these four distributions. For each sample we compute the sample mean and the sample median, and report the boxplots for each distribution in Figure 3. In the left panel, we observe that all boxplots of the estimated means are centered around zero but have larger variability for the heavy-tailed distributions (skew-t and skew-slash). In the right panel, we see that the boxplots of the estimated medians have slightly larger variability for the skew-normal and contaminated skew-normal distributions and much smaller variability for the skew-t and skew-slash distributions. This indicates that the median is a robust estimator of location for asymmetric light-tailed distributions; on the other hand, the median estimator becomes biased as soon as unexpected skewness and heavy tails arise in the underlying distribution.

Figure 2: Contour plots of some elements of the standard bivariate SNI family: (a) SN_2(λ); (b) ST_2(λ, 2); (c) SCN_2(λ, 0.5, 0.5); (d) SSL_2(λ, 1), where λ = (2, 1)^⊤.
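A scaled-down version of this experiment, for the skew-normal and skew-t settings only, can be sketched as follows (seeds and sizes illustrative); the spread of the sample means is visibly larger under the heavy-tailed skew-t:

```python
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(3)
delta = 3 / np.sqrt(1 + 9)                       # lambda = 3

def skew_normal(n):
    return (delta * np.abs(rng.standard_normal(n))
            + np.sqrt(1 - delta**2) * rng.standard_normal(n))

def skew_t(n, nu=2.0):
    u = rng.gamma(nu / 2, 2 / nu, n)             # mixing variable
    return skew_normal(n) / np.sqrt(u)

# centre each distribution at mean zero, as in the Section 3.2 setup
m_sn = np.sqrt(2 / np.pi) * delta
m_st = np.sqrt(2.0 / np.pi) * gamma(0.5) / gamma(1.0) * delta   # nu = 2 case

means = {"SN": [], "ST": []}
medians = {"SN": [], "ST": []}
for _ in range(500):
    x = skew_normal(100) - m_sn
    y = skew_t(100) - m_st
    means["SN"].append(x.mean()); medians["SN"].append(np.median(x))
    means["ST"].append(y.mean()); medians["ST"].append(np.median(y))

for k in ("SN", "ST"):
    print(k, np.std(means[k]).round(3), np.std(medians[k]).round(3))
```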
4 Applications and maximum likelihood estimation

This section presents EM-type algorithms for maximum likelihood estimation in two multivariate SNI models of considerable practical interest.
4.1 Multivariate SNI responses
Suppose that we have observations on n independent individuals, Y_1, . . . , Y_n, where Y_i ∼ SNI_p(µ, Σ, λ; H), i = 1, . . . , n. The parameter vector is θ = (µ^⊤, γ^⊤, λ^⊤)^⊤,
Figure 3: Boxplots of the sample mean (left panel) and sample median (right panel) over 500 samples of size n = 100 from the four standardized distributions SN_1(3), ST_1(3, 2), SSL_1(3, 1) and SCN_1(3, 0.9, 0.1). The respective means are adjusted to zero.
where γ denotes a minimal set of parameters such that Σ(γ) is well defined (e.g. the upper triangular elements of Σ in the unstructured case). In what follows, we illustrate the implementation of likelihood inference for the multivariate SNI distribution via the EM algorithm. The EM algorithm is a popular iterative algorithm for ML estimation in models with incomplete data. More specifically, let y denote the observed data and s the missing data; the complete data y_c = (y, s) is y augmented with s. We denote by ℓ_c(θ | y_c), θ ∈ Θ, the complete-data log-likelihood function and by Q(θ | θ̂) = E[ℓ_c(θ | y_c) | y, θ̂] the expected complete-data log-likelihood function. Each iteration of the EM algorithm involves two steps, an E-step and an M-step, defined as:

• E-step: compute Q(θ | θ^{(r)}) as a function of θ;
• M-step: find θ^{(r+1)} such that Q(θ^{(r+1)} | θ^{(r)}) = max_{θ∈Θ} Q(θ | θ^{(r)}).

Notice that, by using (15), the setup defined above can be written hierarchically as

Y_i | T_i = t_i, U_i = u_i ∼ind N_p(µ + Σ^{1/2}δ t_i, u_i^{-1} Σ^{1/2}(I_p − δδ^⊤)Σ^{1/2}),  (24)
T_i | U_i = u_i ∼iid HN_1(0, 1/u_i),  i = 1, . . . , n,  (25)
U_i ∼ind h(u_i; ν),  (26)
all independent, where HN_1(0, 1) denotes the univariate standard half-normal distribution (see |X_0| in equation (2), or Johnson et al., 1994). We assume that the parameter vector ν is known; in applications the optimum value of ν can be determined using the profile likelihood and the Schwarz information criterion (see Lange et al., 1989). Let y = (y_1^⊤, . . . , y_n^⊤)^⊤, u = (u_1, . . . , u_n)^⊤ and t = (t_1, . . . , t_n)^⊤. Then, under the hierarchical representation (24)-(26), with ∆ = Σ^{1/2}δ and Γ = Σ − ∆∆^⊤, it follows that the complete-data log-likelihood function associated with y_c = (y^⊤, u^⊤, t^⊤)^⊤ is

ℓ_c(θ | y_c) = c − (n/2) log |Γ| − (1/2) Σ_{i=1}^n u_i (y_i − µ − ∆t_i)^⊤ Γ^{-1} (y_i − µ − ∆t_i),  (27)

where c is a constant that does not depend on the parameter vector θ. Letting û_i = E[U_i | θ = θ̂, y_i], ût_i = E[U_iT_i | θ = θ̂, y_i] and ût²_i = E[U_iT_i² | θ = θ̂, y_i], and using known properties of conditional expectation, we obtain

ût_i = û_i µ̂_{T_i} + M̂_T τ̂_i,  (28)
ût²_i = û_i µ̂²_{T_i} + M̂²_T + M̂_T µ̂_{T_i} τ̂_i,  (29)

where τ̂_i = E[U_i^{1/2} W_{Φ_1}(U_i^{1/2} µ̂_{T_i}/M̂_T) | θ̂, y_i], W_{Φ_1}(x) = φ_1(x)/Φ_1(x), M̂²_T = 1/(1 + ∆̂^⊤Γ̂^{-1}∆̂) and µ̂_{T_i} = M̂²_T ∆̂^⊤Γ̂^{-1}(y_i − µ̂), i = 1, . . . , n. Since µ̂_{T_i}/M̂_T = λ̂^⊤Σ̂^{-1/2}(y_i − µ̂), the conditional expectations given in (28)-(29), specifically û_i and τ̂_i, can be easily derived from the results given in Section 3.1. Thus, at least for the skew-t and contaminated skew-normal distributions of the SNI class, we have closed form expressions for the quantities û_i and τ̂_i. For the skew-slash case, Monte Carlo integration may be employed, which yields the so-called MCEM algorithm. It follows, after some simple algebra and using (28)-(29), that the conditional expectation of the complete log-likelihood function has the form

Q(θ | θ̂) = E[ℓ_c(θ | y_c) | y, θ̂] = c − (n/2) log |Γ| − (1/2) Σ_{i=1}^n û_i (y_i − µ)^⊤Γ^{-1}(y_i − µ) + Σ_{i=1}^n ût_i (y_i − µ)^⊤Γ^{-1}∆ − (1/2) Σ_{i=1}^n ût²_i ∆^⊤Γ^{-1}∆.

We then have the following EM-type algorithm:

E-step: Given θ = θ̂, compute ût_i, ût²_i and û_i, for i = 1, . . . , n, using (28)-(29).
M-step: Update θ̂ by maximizing Q(θ | θ̂) over θ, which leads to the following closed
form expressions b = µ
n X i=1
ci ∆)/( (ubi yi − ut
n X i=1
u bi),
(30)
i Xh c2 i ∆∆⊤ , b = 1 b i ∆(yi − µ)⊤ + ut Γ u bi(yi − µ)(yi − µ)⊤ − 2ut n i=1 Pn b i=1 uti (yi − µ) b = ∆ . Pn c2 ut i n
i=1
The skewness parameter vector and the unstructured scale matrix can then be estimated by noting that Σ̂ = Γ̂ + Δ̂Δ̂ᵀ and λ̂ = Σ̂^{−1/2}Δ̂/(1 − Δ̂ᵀΣ̂⁻¹Δ̂)^{1/2}. It is clear that when λ = 0 (or Δ = 0) the M-step equations reduce to those obtained under a normal/independent distribution. Note also that this algorithm clearly generalizes the results found in Lachos et al. (2007, Section 2), which are recovered by taking U_i = 1, i = 1, . . . , n. Useful starting values for this algorithm are those obtained under the normality assumption, with the starting value for the skewness parameter vector set equal to 0. However, in order to ensure that the true ML estimate is identified, we recommend running the EM algorithm from a range of different starting values. The log-likelihood function for θ = (µᵀ, γᵀ, λᵀ)ᵀ, given the observed sample y = (y₁ᵀ, . . . , yₙᵀ)ᵀ, is of the form

ℓ(θ) = Σ_{i=1}^n ℓ_i(θ),   (31)

where ℓ_i(θ) = log 2 − (p/2) log 2π − (1/2) log |Σ| + log K_i, with

K_i = K_i(θ) = ∫₀^∞ u_i^{p/2} exp{−(1/2) u_i d_i} Φ₁(u_i^{1/2} A_i) dH(u_i),

where d_i = (y_i − µ)ᵀΣ⁻¹(y_i − µ) and A_i = λᵀΣ^{−1/2}(y_i − µ). Explicit expressions for the observed information matrix can be derived from the results given in Section 5 and Appendix A.
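As a concrete illustration, the E- and M-steps above can be sketched for the simplest member of the class, the univariate skew-normal, where U_i = 1 so that û_i = 1 and τ̂_i reduces to W_{Φ₁}(µ̂_{T_i}/M̂_T). The following is a minimal sketch under that assumption (the function name and starting values are our own choices, not from the paper):

```python
import numpy as np
from scipy.stats import norm

def em_skew_normal(y, n_iter=200):
    """EM for the univariate skew-normal, the U_i = 1 member of the SNI class.

    E-step: moments (28)-(29) of the half-normal latent T_i, which is
    truncated-normal given y_i.  M-step: the closed forms in (30)."""
    mu = y.mean()
    Gamma = y.var()
    Delta = 0.1 * np.sqrt(Gamma)            # small positive initial skewness
    for _ in range(n_iter):
        # E-step: M_T^2 = 1/(1 + Delta^2/Gamma), mu_Ti = M_T^2 Delta (y_i - mu)/Gamma
        MT2 = 1.0 / (1.0 + Delta**2 / Gamma)
        MT = np.sqrt(MT2)
        muT = MT2 * Delta * (y - mu) / Gamma
        W = norm.pdf(muT / MT) / norm.cdf(muT / MT)   # W_Phi(x) = phi(x)/Phi(x)
        t1 = muT + MT * W                             # = ut_i   (u_i = 1 here)
        t2 = muT**2 + MT2 + MT * muT * W              # = ut2_i
        # M-step: closed-form updates (30)
        mu = np.mean(y - t1 * Delta)
        Gamma = np.mean((y - mu)**2 - 2.0 * t1 * Delta * (y - mu) + t2 * Delta**2)
        Delta = np.sum(t1 * (y - mu)) / np.sum(t2)
    sigma2 = Gamma + Delta**2                         # Sigma = Gamma + Delta Delta^T
    lam = (Delta / np.sqrt(sigma2)) / np.sqrt(1.0 - Delta**2 / sigma2)
    return mu, sigma2, lam
```

With Δ = 0 the updates collapse to the sample mean and variance, in line with the remark above that the M-step reduces to the normal/independent equations when λ = 0.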
4.2 Multivariate measurement error model
In this section we further apply the SNI distributions to a multivariate measurement error model. Let n be the sample size; X_i the observed value of the covariate in unit i; z_ij the j-th observed response in unit i; and x_i the unobserved (true) covariate value for unit i, i = 1, . . . , n, j = 1, . . . , r. Relating these variables, we postulate as working model the equations (see Barnett, 1969, and Shyr and Gleser, 1986)

X_i = x_i + u_i,   (32)

Z_i = α + β x_i + e_i,   (33)

where Z_i = (z_i1, . . . , z_ir)ᵀ is the vector of responses for the i-th experimental unit, e_i = (e_i1, . . . , e_ir)ᵀ is a random vector of measurement errors of dimension r, and α = (α₁, . . . , α_r)ᵀ and β = (β₁, . . . , β_r)ᵀ are parameter vectors of dimension r. Let ε_i = (u_i, e_iᵀ)ᵀ and Y_i = (X_i, Z_iᵀ)ᵀ; then the model defined by equations (32)-(33) can be written as

Y_i = a + b x_i + ε_i,   (34)

where a = (0, αᵀ)ᵀ and b = (1, βᵀ)ᵀ are p × 1 vectors, with p = r + 1. We assume that

(x_i, ε_iᵀ)ᵀ ~ SNI_{p+1}((µ_x, 0ᵀ)ᵀ, D(φ_x, φ), (λ_x, 0ᵀ)ᵀ; H),   (35)

i = 1, . . . , n, independently, where D(φ_x, φ) = diag(φ_x, φ₁, . . . , φ_p), with φ = (φ₁, . . . , φ_p)ᵀ; this model will be called the structural SNI-MMEM. From (6), this formulation implies that

(x_i, ε_iᵀ)ᵀ | U_i = u_i ~ SN_{p+1}((µ_x, 0ᵀ)ᵀ, u_i⁻¹ D(φ_x, φ), (λ_x, 0ᵀ)ᵀ),   (36)

U_i ~ h(u_i; ν),   (37)
i = 1, . . . , n. Note from Corollary 1 that, marginally,

ε_i ~ NI_p(0, D(φ); H) and x_i ~ SNI₁(µ_x, φ_x, λ_x; H).   (38)

The asymmetry parameter λ_x incorporates asymmetry in the latent variable x_i and consequently in the observed quantities Y_i, i = 1, . . . , n, which will be shown to have marginal multivariate SNI distributions. If λ_x = 0, the asymmetric model reduces to the symmetric MMEM under NI distributions. Under (35), it follows from (2) that the regression set-up defined in (32)-(33) can be written hierarchically as

Y_i | x_i, U_i = u_i ~ N_p(a + b x_i, u_i⁻¹ D(φ)),   (39)

x_i | T_i = t_i, U_i = u_i ~ N₁(µ_x + φ_x^{1/2} δ_x t_i, u_i⁻¹ φ_x (1 − δ_x²)),   (40)

T_i | U_i = u_i ~ HN₁(0, 1/u_i),   (41)

U_i ~ h(u_i; ν),   (42)

i = 1, . . . , n, all independent, with δ_x = λ_x/(1 + λ_x²)^{1/2}. As in Lange et al. (1989), we assume that ν is known. Classical inference on the parameter vector θ = (αᵀ, βᵀ, φᵀ, µ_x, φ_x, λ_x)ᵀ in this type of model is based on the marginal distribution of Y_i (see Bolfarine and Galea-Rojas, 1995), given in the following proposition.

Proposition 12. Under the structural SNI-MMEM defined in (32)-(33), the marginal density of Y_i is given by

f_{Y_i}(y_i|θ) = 2 ∫₀^∞ φ_p(y_i|µ, u_i⁻¹Σ) Φ₁(u_i^{1/2} λ̄_xᵀ Σ^{−1/2}(y_i − µ)) dH(u_i),   (43)
i.e., Y_i iid ~ SNI_p(µ, Σ, λ̄_x; H), i = 1, . . . , n, where

µ = a + b µ_x,   Σ = φ_x bbᵀ + D(φ)   and   λ̄_x = λ_x φ_x Σ^{−1/2} b / (φ_x + λ_x² Λ_x)^{1/2},

with Λ_x = (φ_x⁻¹ + bᵀD⁻¹(φ)b)⁻¹ = φ_x/c and c = 1 + φ_x bᵀD⁻¹(φ)b.
Proof. The proof follows directly from Proposition 6 after some algebraic manipulation.

It follows that the log-likelihood function for θ, given the observed sample y = (y₁ᵀ, . . . , yₙᵀ)ᵀ, is

ℓ(θ) = Σ_{i=1}^n ℓ_i(θ),   (44)

where ℓ_i(θ) = log 2 − (p/2) log 2π − (1/2) log |Σ| + log K_i, with

K_i = K_i(θ) = ∫₀^∞ u_i^{p/2} exp{−(1/2) u_i d_i} Φ₁(u_i^{1/2} A_i) dH(u_i),

and µ, Σ, λ̄_x as in Proposition 12. In this case, d_i = (y_i − µ)ᵀΣ⁻¹(y_i − µ) and A_i = λ̄_xᵀ Σ^{−1/2}(y_i − µ) = A_x a_i, with

A_x = λ_x Λ_x / (φ_x + λ_x² Λ_x)^{1/2}   and   a_i = (y_i − µ)ᵀ D⁻¹(φ) b.
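For numerical work it is convenient to compute the marginal parameters (µ, Σ, λ̄_x) of Proposition 12 directly from θ. A minimal sketch of this reparametrization follows (the function name is ours; Σ^{−1/2} is obtained via a spectral decomposition, one of several possible choices):

```python
import numpy as np

def marginal_sni_params(alpha, beta, phi, mu_x, phi_x, lam_x):
    """Marginal SNI parameters of Y_i = (X_i, Z_i^T)^T in Proposition 12:
    mu = a + b mu_x, Sigma = phi_x b b^T + D(phi) and
    lambda_bar_x = lam_x phi_x Sigma^{-1/2} b / sqrt(phi_x + lam_x^2 Lam_x)."""
    a = np.concatenate(([0.0], alpha))        # a = (0, alpha^T)^T
    b = np.concatenate(([1.0], beta))         # b = (1, beta^T)^T
    D = np.diag(phi)
    mu = a + b * mu_x
    Sigma = phi_x * np.outer(b, b) + D
    c = 1.0 + phi_x * (b @ np.linalg.solve(D, b))
    Lam_x = phi_x / c                         # = (1/phi_x + b^T D^{-1} b)^{-1}
    vals, vecs = np.linalg.eigh(Sigma)        # Sigma^{-1/2} via eigendecomposition
    Sigma_isqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    lam_bar = lam_x * phi_x * (Sigma_isqrt @ b) / np.sqrt(phi_x + lam_x**2 * Lam_x)
    return mu, Sigma, lam_bar
```

When λ_x = 0 the skewness vector λ̄_x vanishes, so Y_i is symmetric NI, consistent with the remark following (38).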
The ML estimates of the parameters in model (34)-(35) can be found by direct maximization of the log-likelihood (44), computed numerically using, for instance, the optim routine in R or fmincon in Matlab. An oft-voiced complaint about these methods is that they may fail to converge unless good starting values are used. The EM algorithm, which is far less sensitive to starting values and is a powerful computational tool built on an augmented (unobserved) data structure, has been well developed and has become a broadly applicable approach to the iterative computation of ML estimates. Thus, let y = (y₁ᵀ, . . . , yₙᵀ)ᵀ, x = (x₁, . . . , xₙ)ᵀ, u = (u₁, . . . , uₙ)ᵀ, t = (t₁, . . . , tₙ)ᵀ, ν_x² = φ_x(1 − δ_x²) and τ_x = φ_x^{1/2} δ_x. It follows that the complete log-likelihood function associated with (y, x, t, u) is

ℓ_c(θ|y, x, t, u) ∝ −(n/2) log |D(φ)| − (1/2) Σ_{i=1}^n u_i (y_i − a − b x_i)ᵀ D⁻¹(φ)(y_i − a − b x_i) − (n/2) log ν_x² − (1/(2ν_x²)) Σ_{i=1}^n u_i (x_i − µ_x − τ_x t_i)².   (45)
Letting û_i = E[U_i|θ̂, y_i], ût_i = E[U_i t_i|θ̂, y_i], ût²_i = E[U_i t_i²|θ̂, y_i], ûx_i = E[U_i x_i|θ̂, y_i], ûx²_i = E[U_i x_i²|θ̂, y_i] and ûtx_i = E[U_i t_i x_i|θ̂, y_i], we obtain

ût_i = û_i µ̂_{T_i} + M̂_T E[U_i^{1/2} W_{Φ₁}(U_i^{1/2} µ̂_{T_i}/M̂_T)|θ̂, y_i],

ût²_i = û_i µ̂²_{T_i} + M̂²_T + M̂_T µ̂_{T_i} E[U_i^{1/2} W_{Φ₁}(U_i^{1/2} µ̂_{T_i}/M̂_T)|θ̂, y_i],

ûx_i = r̂_i û_i + ŝ ût_i,   ûx²_i = T̂_x² + r̂_i² û_i + 2 r̂_i ŝ ût_i + ŝ² ût²_i,   ûtx_i = r̂_i ût_i + ŝ ût²_i,   (46)

where M̂²_T = [1 + τ̂_x² b̂ᵀ(D(φ̂) + ν̂_x² b̂b̂ᵀ)⁻¹ b̂]⁻¹, µ̂_{T_i} = τ̂_x M̂²_T b̂ᵀ(D(φ̂) + ν̂_x² b̂b̂ᵀ)⁻¹(y_i − â − b̂ µ̂_x), T̂_x² = ν̂_x² [1 + ν̂_x² b̂ᵀD⁻¹(φ̂)b̂]⁻¹, r̂_i = µ̂_x + T̂_x² b̂ᵀD⁻¹(φ̂)(y_i − â − b̂ µ̂_x) and ŝ = τ̂_x(1 − T̂_x² b̂ᵀD⁻¹(φ̂)b̂). Once again, closed-form expressions can be found for E[U_i^{1/2} W_{Φ₁}(U_i^{1/2} µ̂_{T_i}/M̂_T)|θ̂, y_i] from the results given in Section 3.1.
Thus, we have the following EM-type algorithm:

E-step: Given θ = θ̂, compute û_i, ût_i, ût²_i, ûx_i, ûx²_i and ûtx_i, for i = 1, . . . , n, using (46).

M-step: Update θ̂ by maximizing E[ℓ_c(θ|y, x, t, u)|y, θ̂] over θ, which leads to

α̂ = z̄_u − x̄_u β̂,

β̂ = Σ_{i=1}^n ûx_i (Z_i − z̄_u) / (Σ_{i=1}^n ûx²_i − n ū x̄_u²),

φ̂₁ = (1/n) Σ_{i=1}^n (û_i X_i² − 2 ûx_i X_i + ûx²_i),

φ̂_{j+1} = (1/n) Σ_{i=1}^n (û_i z_ij² + û_i α̂_j² + β̂_j² ûx²_i − 2 û_i α̂_j z_ij − 2 β̂_j z_ij ûx_i + 2 α̂_j β̂_j ûx_i), j = 1, . . . , r,

µ̂_x = x̄_u − τ̂_x t̄_u,

τ̂_x = Σ_{i=1}^n (ûtx_i − x̄_u ût_i) / Σ_{i=1}^n (ût²_i − t̄_u ût_i),

ν̂_x² = (1/n) Σ_{i=1}^n (ûx²_i − µ̂_x ûx_i) − τ̂_x (1/n) Σ_{i=1}^n ûtx_i,

where z̄_u = Σ_{i=1}^n û_i Z_i / Σ_{i=1}^n û_i, x̄_u = Σ_{i=1}^n ûx_i / Σ_{i=1}^n û_i, t̄_u = Σ_{i=1}^n ût_i / Σ_{i=1}^n û_i and ū = (1/n) Σ_{i=1}^n û_i.

Note that when U_i = 1 the M-step equations reduce to those obtained in Lachos et al. (2005) under the skew-normal distribution, and when λ_x = 0 (or τ_x = 0) they reduce to the equations obtained by Bolfarine and Galea-Rojas (1995). Moreover, when U ~ Gamma(ν/2, ν/2) and λ_x = 0, the M-step reduces to the equations obtained by Bolfarine and Galea-Rojas (1996). The shape and scale parameters of the latent variable x can be estimated by noting that λ_x = τ_x/ν_x and φ_x = τ_x² + ν_x². We now consider an empirical Bayes inference for the latent variable, which is useful for estimating the x_i quantities. From model (34) and
(38), it follows that Y_i | x_i ~ NI_p(a + b x_i, D(φ); H) and x_i ~ SNI₁(µ_x, φ_x, λ_x; H). The conditional density of x_i given (y_i, u_i) is

f(x_i|y_i, u_i) = φ₁(x_i | µ_x + Λ_x a_i, u_i⁻¹ Λ_x) Φ₁(u_i^{1/2} λ_x (x_i − µ_x)/φ_x^{1/2}) / Φ₁(u_i^{1/2} A_i),

with Λ_x, a_i and A_i as in Proposition 12 and (44), respectively. It follows, from Lemma 2 in Appendix B, that

E[x_i|y_i, u_i] = µ_x + Λ_x a_i + u_i^{−1/2} (λ_x Λ_x / (φ_x + λ_x² Λ_x)^{1/2}) W_{Φ₁}(u_i^{1/2} A_i), i = 1, . . . , n,

and since E[x|y] = E_U[E[x|y, U]|y], the minimum mean-square error (MSE) estimator of x_i, namely the conditional mean of x_i given y_i, is given by

x̂_i = E[x_i|y_i] = µ_x + Λ_x a_i + (λ_x Λ_x / (φ_x + λ_x² Λ_x)^{1/2}) E[U_i^{−1/2} φ₁(U_i^{1/2} A_i)/Φ₁(U_i^{1/2} A_i) | y_i], i = 1, . . . , n.   (47)

Note that if Y_i, i = 1, . . . , n, has an ST_p(µ, Σ, λ̄_x; ν) or SCN_p(µ, Σ, λ̄_x; ν, γ) distribution, closed-form expressions for the expected value in (47) can be obtained from the results given in Section 3.1. In practice, the empirical Bayes estimator of x_i is obtained by substituting the ML estimate θ̂ into (47).
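For the skew-normal member (U_i = 1), the expectation in (47) is simply W_{Φ₁}(A_i), so the estimator admits the following minimal sketch (the function name is ours; inputs are the model parameters and the n × p matrix of observed Y_i):

```python
import numpy as np
from scipy.stats import norm

def eb_latent_x(Y, alpha, beta, phi, mu_x, phi_x, lam_x):
    """Empirical Bayes estimator (47) of the latent x_i, specialised to the
    skew-normal case U_i = 1, where E[U^{-1/2} phi/Phi | y] = W_Phi(A_i)."""
    a = np.concatenate(([0.0], alpha))
    b = np.concatenate(([1.0], beta))
    Dinv_b = b / phi                                  # D^{-1}(phi) b, D diagonal
    Lam_x = 1.0 / (1.0 / phi_x + b @ Dinv_b)
    mu = a + b * mu_x
    a_i = (Y - mu) @ Dinv_b                           # a_i = (y_i - mu)^T D^{-1}(phi) b
    A_x = lam_x * Lam_x / np.sqrt(phi_x + lam_x**2 * Lam_x)
    A_i = A_x * a_i
    W = norm.pdf(A_i) / norm.cdf(A_i)                 # W_Phi(A_i)
    return mu_x + Lam_x * a_i + A_x * W               # note: coefficient of W equals A_x
```

With λ_x = 0 the skewness correction vanishes and the estimator reduces to the usual normal posterior mean µ_x + Λ_x a_i.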
5 The observed information matrix
In this section we derive the observed information matrix in a general form. Suppose that we have observations on n independent individuals, Y₁, . . . , Y_n, where Y_i ~ SNI_{n_i}(µ_i(β), Σ_i(γ), λ_i(λ); H), i = 1, . . . , n. Then the log-likelihood function for θ = (βᵀ, γᵀ, λᵀ)ᵀ ∈ R^q, given the observed sample y = (y₁ᵀ, . . . , yₙᵀ)ᵀ, is of the form

ℓ(θ) = Σ_{i=1}^n ℓ_i(θ),   (48)

where ℓ_i(θ) = log 2 − (n_i/2) log 2π − (1/2) log |Σ_i| + log K_i, with

K_i = K_i(θ) = ∫₀^∞ u_i^{n_i/2} exp{−(1/2) u_i d_i} Φ₁(u_i^{1/2} A_i) dH(u_i),

and d_i = (y_i − µ_i)ᵀΣ_i⁻¹(y_i − µ_i), A_i = λ_iᵀΣ_i^{−1/2}(y_i − µ_i). Using the notation

I_i^Φ(w) = ∫₀^∞ u_i^w exp{−(1/2) u_i d_i} Φ₁(u_i^{1/2} A_i) dH(u_i),

I_i^φ(w) = ∫₀^∞ u_i^w exp{−(1/2) u_i d_i} φ₁(u_i^{1/2} A_i | 0, 1) dH(u_i),
so that K_i(θ) can be expressed as K_i(θ) = I_i^Φ(n_i/2), i = 1, . . . , n, it follows that the matrix of second derivatives with respect to θ is given by

L = Σ_{i=1}^n ∂²ℓ_i(θ)/∂θ∂θᵀ = −(1/2) Σ_{i=1}^n ∂² log |Σ_i|/∂θ∂θᵀ − Σ_{i=1}^n (1/K_i²)(∂K_i/∂θ)(∂K_i/∂θᵀ) + Σ_{i=1}^n (1/K_i) ∂²K_i/∂θ∂θᵀ,   (49)

where

∂K_i/∂θ = I_i^φ((n_i+1)/2) ∂A_i/∂θ − (1/2) I_i^Φ((n_i+2)/2) ∂d_i/∂θ

and

∂²K_i/∂θ∂θᵀ = (1/4) I_i^Φ((n_i+4)/2) (∂d_i/∂θ)(∂d_i/∂θᵀ) − (1/2) I_i^Φ((n_i+2)/2) ∂²d_i/∂θ∂θᵀ − (1/2) I_i^φ((n_i+3)/2) [(∂A_i/∂θ)(∂d_i/∂θᵀ) + (∂d_i/∂θ)(∂A_i/∂θᵀ)] − I_i^φ((n_i+3)/2) A_i (∂A_i/∂θ)(∂A_i/∂θᵀ) + I_i^φ((n_i+1)/2) ∂²A_i/∂θ∂θᵀ.   (50)

From Propositions 8, 9 and 10 we have that, for each distribution considered in this work, the integrals above can be written as follows.

• Skew-t:
I_i^Φ(w) = (2^w ν^{ν/2} Γ(w + ν/2) / (Γ(ν/2)(ν + d_i)^{ν/2+w})) T₁( ((2w+ν)/(d_i+ν))^{1/2} A_i | 0, 1, 2w+ν )   and

I_i^φ(w) = (2^w ν^{ν/2} Γ((ν+2w)/2) / (√(2π) Γ(ν/2))) (d_i + A_i² + ν)^{−(ν+2w)/2}.
• Skew-slash:

I_i^Φ(w) = (ν 2^{w+ν} Γ(w+ν) / d_i^{w+ν}) P₁(w+ν, d_i/2) E[Φ₁(S_i^{1/2} A_i)]   and

I_i^φ(w) = (ν 2^{w+ν} Γ(w+ν) / (√(2π) (d_i + A_i²)^{w+ν})) P₁(w+ν, (d_i + A_i²)/2),

where S_i ~ Gamma(w+ν, d_i/2) truncated on (0, 1).

• Contaminated skew-normal:
I_i^Φ(w) = √(2π) { ν γ^{w−1/2} φ₁(d_i | 0, 1/γ) Φ₁(γ^{1/2} A_i) + (1 − ν) φ₁(d_i | 0, 1) Φ₁(A_i) }   and

I_i^φ(w) = ν γ^{w−1/2} φ₁(d_i + A_i² | 0, 1/γ) + (1 − ν) φ₁(d_i + A_i² | 0, 1).
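The closed forms above are easy to check numerically. For instance, for the skew-t case, I_i^φ(w) can be compared with direct quadrature of its defining integral under H = Gamma(ν/2, ν/2). This is a verification sketch (function names ours), not part of the estimation code:

```python
import numpy as np
from scipy import integrate, special, stats

def I_phi_skew_t(w, d, A, nu):
    """Closed form of I_i^phi(w) for the skew-t case."""
    return (2.0**w * nu**(nu / 2.0) * special.gamma((nu + 2.0*w) / 2.0)
            / (np.sqrt(2.0 * np.pi) * special.gamma(nu / 2.0)
               * (d + A**2 + nu) ** ((nu + 2.0*w) / 2.0)))

def I_phi_quadrature(w, d, A, nu):
    """Direct quadrature of int_0^inf u^w exp(-u d/2) phi_1(u^{1/2} A) dH(u),
    with H the Gamma(nu/2, nu/2) mixing distribution of the skew-t."""
    h = stats.gamma(a=nu / 2.0, scale=2.0 / nu).pdf
    f = lambda u: u**w * np.exp(-u * d / 2.0) * stats.norm.pdf(np.sqrt(u) * A) * h(u)
    val, _ = integrate.quad(f, 0.0, np.inf)
    return val
```

The same kind of check applies to I_i^Φ(w) and to the skew-slash and contaminated skew-normal cases.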
In many situations the derivatives of log |Σ_i|, d_i and A_i involve complicated algebraic manipulations. For multivariate SNI responses these derivatives are given in Appendix A; for the SNI-MMEM, the derivatives of log |Σ|, d_i and A_i can be found in Lachos et al. (2007). Asymptotic confidence intervals and tests for the MLEs can be obtained using this matrix. Thus, if J = −L denotes the observed information matrix for the marginal log-likelihood ℓ(θ), then asymptotic confidence intervals and hypothesis tests for the parameter θ ∈ R^q are obtained by assuming that the MLE θ̂ has approximately a N_q(θ, J⁻¹) distribution. In practice, J is usually unknown and has to be replaced by its estimate Ĵ, that is, the matrix J evaluated at the MLE θ̂. More generally, for models such as those in Proposition 5, the observed information matrix can be derived from the results given here.
6 Illustrative examples
We illustrate the usefulness of the proposed class of distributions by applying them to two real data sets. The first example is an application of the methodology to univariate SNI responses, and the second is an application of the SNI-MMEM with p = 5.
6.1 Fiberglass data set
In this section we apply four specific distributions of the skew-normal/independent class, viz. the univariate skew-normal, skew-t, skew-slash and skew-contaminated normal distributions, to data on the breaking strength of 1.5 cm long glass fibers, consisting of 63 observations. Jones and Faddy (2003) and Wang and Genton (2006) previously analysed these data with a skew-t and a skew-slash distribution, respectively. Both found strong skewness to the left as well as heavy-tailed behavior, as depicted in Figure 4. In the sequel we compare the skew-normal (SN), skew-t (ST), contaminated skew-normal (SCN) and skew-slash (SSL) fits for this data set. The resulting parameter estimates for the four models are given in Table 1. As suggested by Lange et al. (1989), for each model the Schwarz information criterion was used for choosing the value of ν; this strategy is illustrated in Figure 5. Figure 4 shows the histogram of the fiber data superimposed with the fitted density curves of the four models considered. We observe that the contaminated skew-normal fits the fiber data better than the other three distributions, especially at the peak of the histogram. This conclusion is also supported by the log-likelihood values given in Table 1. Replacing the ML estimates of θ in the Mahalanobis distance d_i = (y_i − µ)²/σ², we present QQ plots and envelopes in Figure 6 (lines represent the 5th percentile, the mean, and the 95th percentile of 100 simulated points for each observation). The plots in Figure 6 again provide strong evidence that the heavy-tailed SNI distributions provide a better fit to this data set than the skew-normal distribution.
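For reference, the univariate skew-t density used in this comparison has the closed form implied by the skew-t expression for I_i^Φ (equivalently, Azzalini and Capitanio, 2003). A sketch, with parameter names our own:

```python
import numpy as np
from scipy import stats

def skew_t_pdf(y, mu, sigma2, lam, nu):
    """Univariate skew-t density of the SNI class:
    f(y) = 2 t(z; nu)/sigma * T( lam z sqrt((nu+1)/(z^2+nu)); nu+1 ),
    where z = (y - mu)/sigma is the standardised residual."""
    sigma = np.sqrt(sigma2)
    z = (y - mu) / sigma
    return (2.0 / sigma) * stats.t.pdf(z, df=nu) \
        * stats.t.cdf(lam * z * np.sqrt((nu + 1.0) / (z**2 + nu)), df=nu + 1)
```

Evaluating the log-likelihood Σ_i log f(y_i) on a grid of ν values, with the remaining parameters re-estimated at each ν, produces profile curves of the kind shown in Figure 5(a).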
6.2 Chipkevitch et al. (1996) data set
In this application, the multivariate skew-normal, skew-t, skew-slash and skew-contaminated normal distributions are applied to the data studied in Chipkevitch et al. (1996), where measurements of the testicular volume of 42 adolescents were made in a fixed sequence using five different techniques: ultrasound
[Figure 4: The histogram of the fiber glass strength data superimposed with the fitted density curves of the four distributions (SCN, SSL, ST, SN); x-axis: fiber glass strength, y-axis: density.]
Table 1: MLE of the four models fitted to the fiber glass strength data set. SE (in parentheses) are the estimated asymptotic standard errors based on the observed information matrix given in Section 5.

distribution   µ̂ (SE)              σ̂ (SE)                λ̂ (SE)                ν     γ      log-likelihood
SN             1.8503 (0.0468)     0.4705 (0.0464)       −2.6789 (0.7513)      −     −      −13.9571
ST             1.7549 (0.045)      0.2725 (0.0353)       −1.6196 (0.6523)      3     −      −11.7546
SCN            1.7241 (0.0393)     0.1615 (0.0187)       −1.2940 (0.5080)      0.5   0.10   −9.1928
SSL            1.805591 (0.0461)   0.2989667 (0.0366)    −1.870298 (0.6320)    1.7   −      −12.9367
(US), a graphical method proposed by the authors (I), dimensional measurement (II), the Prader orchidometer (III) and the ring orchidometer (IV), with the ultrasound approach assumed to be the reference measurement device. A histogram of the measurements
[Figure 5: Plot of the profile log-likelihood for fitting (a) a skew-t model to the fiber glass strength data (over ν) and (b) a contaminated skew-normal model to the Chipkevitch data (over ν and γ; maximum −415.97).]
(see Figure 7c) shows a certain asymmetry in the data set, which led Galea-Rojas et al. (2002) to recommend a cube-root transformation to better achieve normality. The resulting parameter estimates for the four models are given in Table 2. The AIC criterion was used for choosing among candidate values of ν: for the ST model we found ν = 6, for the SSL ν = 3, and for the SCN ν = 0.3, γ = 0.3. Therefore, heavy-tailed distributions will be assumed for these three models. We note from Table 2 that the intercept and slope estimates are similar among the four fitted models; however, the standard errors under the SNI distributions are smaller than those of the skew-normal model, indicating that the three models with tails longer than the skew-normal seem to produce more accurate maximum likelihood estimates. The estimates of the variance components are not comparable, since they are on different scales. Note also that the log-likelihood values shown at the bottom of Table 2 favor the SNI models. In particular, the skew-t distribution fits the data better than the other three distributions. The plots in Figure 7 provide even stronger evidence that the ST distribution provides a better fit to this data set than the SN distribution.
7 Conclusion
In this work we have defined a new family of asymmetric models by extending the symmetric normal/independent family. Our proposal generalizes recent results found in Azzalini and Capitanio (2003), Gupta (2003) and Wang and Genton (2006). In addition, we have developed a very general EM-based method for estimating the parameters of the skew-normal/independent distributions, and closed-form expressions were derived for the iterative estimation process. This is a consequence of the fact that the proposed distributions possess a stochastic representation that allows them to be written hierarchically; this representation also makes many of their properties easy to study. We believe that the approaches proposed here can also be used to study other asymmetric
[Figure 6: Fiber glass strength data set. QQ plots and simulated envelopes: (a) skew-normal model, (b) contaminated skew-normal model, (c) skew-t model and (d) skew-slash model; theoretical χ² quantiles against sample values and simulated envelope.]
multivariate models like those proposed by Branco and Dey (2001, Section 3). These models admit a stochastic representation of the form Y = µ + η(U)Z and include members such as the skew-t, the skew-slash, the skew-contaminated normal, the skew-logistic, the skew-stable and the skew-exponential power distributions. The assessment of the influence of data and model assumptions on the results of a statistical analysis is a key aspect of any new class of distributions. We are currently exploring local influence and residual analysis to address this issue.

Acknowledgments: The first author was partially supported by a grant from FAPESP.
[Figure 7: Chipkevitch data set. QQ plots and simulated envelopes for (a) the skew-normal model and (b) the skew-t model; (c) histogram of the observed measurements; (d) histogram of the reference device (ultrasound) measurements with superimposed fitted SNI densities (SCN, SSL, ST, SN).]
Table 2: Results of fitting the skew-normal and SNI-MMEM to the Chipkevitch data. SE (in parentheses) are the estimated asymptotic standard errors based on the observed information matrix given in Section 5.

Parameter        SN Estimate (SE)     ST Estimate (SE)     SCN Estimate (SE)    SSL Estimate (SE)
α1               0.1022 (0.5655)      0.0426 (0.4940)      0.1559 (0.5017)      0.1226 (0.5317)
α2               0.0096 (0.6216)      0.2472 (0.5261)      0.1527 (0.5227)      0.1716 (0.5777)
α3               0.0483 (0.6277)      0.1110 (0.5674)      0.1464 (0.5643)      0.1027 (0.5999)
α4               1.5391 (0.6337)      1.5444 (0.5605)      1.6404 (0.5729)      1.5784 (0.5978)
β1               0.8838 (0.0509)      0.8990 (0.0513)      0.8911 (0.0511)      0.8887 (0.0517)
β2               0.9495 (0.0559)      0.9866 (0.0565)      0.9782 (0.0547)      0.9754 (0.0577)
β3               1.1419 (0.0565)      1.1537 (0.0586)      1.1540 (0.0574)      1.1466 (0.0583)
β4               1.0826 (0.0570)      1.0957 (0.0579)      1.0885 (0.0584)      1.0864 (0.0578)
φ1               1.3384 (0.3714)      0.9291 (0.2893)      0.7467 (0.2240)      0.8345 (0.2508)
φ2               1.3284 (0.3480)      0.9538 (0.2841)      0.7972 (0.2225)      0.8405 (0.2369)
φ3               1.6736 (0.4322)      0.9028 (0.2960)      0.7294 (0.2105)      0.8938 (0.2849)
φ4               1.1578 (0.3710)      0.9481 (0.3160)      0.7845 (0.2537)      0.7998 (0.2581)
φ5               1.4105 (0.3994)      1.0196 (0.3385)      0.9119 (0.2761)      0.9080 (0.2783)
µx               3.9952 (1.3958)      4.1688 (1.0890)      4.2784 (1.3844)      4.1830 (1.3096)
σx²              59.2857 (21.5487)    38.1512 (14.3057)    30.9914 (13.6063)    35.0036 (13.3506)
λx               4.7842 (4.7925)      3.4300 (2.4361)      3.2404 (2.8759)      3.7562 (3.2021)
ν                −                    6                    0.3                  3
γ                −                    −                    0.3                  −
log-likelihood   −422.1628            −416.7776            −415.9791            −419.3461
Appendix A: The observed information matrix for multivariate SNI responses

In this appendix we reparameterize Σ = D², which eases the computation of the observed information matrix, and obtain the first and second derivatives of log |D|, A_i and d_i. The notation is that of Section 5, and for a p-dimensional vector ρ = (ρ₁, . . . , ρ_p)ᵀ we write Υ̇_r = ∂Υ(ρ)/∂ρ_r, r = 1, 2, . . . , p. Thus, for the log-likelihood function for θ = (µᵀ, αᵀ, λᵀ)ᵀ, α = vech(D), given the observed sample y = (y₁ᵀ, . . . , yₙᵀ)ᵀ, the required derivatives are:

• |D|:

∂² log |D| / ∂α_k ∂α_s = −tr(D⁻¹ Ḋ_s D⁻¹ Ḋ_k).

• A_i:

∂A_i/∂µ = −D⁻¹λ,   ∂A_i/∂α_k = −λᵀD⁻¹Ḋ_k D⁻¹(y_i − µ),   ∂A_i/∂λ = D⁻¹(y_i − µ),

∂²A_i/∂µ∂µᵀ = 0,   ∂²A_i/∂µ∂α_k = D⁻¹Ḋ_k D⁻¹λ,   ∂²A_i/∂µ∂λᵀ = −D⁻¹,

∂²A_i/∂α_k∂α_s = λᵀD⁻¹(Ḋ_s D⁻¹Ḋ_k + Ḋ_k D⁻¹Ḋ_s)D⁻¹(y_i − µ),

∂²A_i/∂α_k∂λᵀ = −(y_i − µ)ᵀD⁻¹Ḋ_k D⁻¹,   ∂²A_i/∂λ∂λᵀ = 0.
• d_i:

∂d_i/∂µ = −2D⁻²(y_i − µ),   ∂d_i/∂α_k = −(y_i − µ)ᵀD⁻¹(Ḋ_k D⁻¹ + D⁻¹Ḋ_k)D⁻¹(y_i − µ),   ∂d_i/∂λ = 0,

∂²d_i/∂µ∂µᵀ = 2D⁻²,   ∂²d_i/∂µ∂α_k = 2D⁻¹(Ḋ_k D⁻¹ + D⁻¹Ḋ_k)D⁻¹(y_i − µ),   ∂²d_i/∂µ∂λᵀ = 0,

∂²d_i/∂α_k∂λᵀ = 0,   ∂²d_i/∂λ∂λᵀ = 0,

∂²d_i/∂α_k∂α_s = (y_i − µ)ᵀD⁻¹(Ḋ_s D⁻¹Ḋ_k D⁻¹ + Ḋ_k D⁻¹Ḋ_s D⁻¹ + Ḋ_k D⁻²Ḋ_s + Ḋ_s D⁻²Ḋ_k + D⁻¹Ḋ_s D⁻¹Ḋ_k + D⁻¹Ḋ_k D⁻¹Ḋ_s)D⁻¹(y_i − µ).

Appendix B: Lemmas used in Section 3
Lemma 1. Let Y ~ SN_p(λ). Then, for any fixed p-dimensional vector b and any symmetric p × p matrix A,

E[YᵀAY bᵀY] = √(2/π) [ (tr(A) − δᵀAδ) bᵀδ + 2δᵀAb ],

where δ is as in (15).

Proof. The proof follows by using the stochastic representation of Y given in (2) and the moments E[X₀] and E[X₀³], where X₀ ~ N(0, 1).

Lemma 2. Let Y ∈ R^p be a random vector with pdf

f(y|u) = k⁻¹(u) φ_p(y|µ, u⁻¹Σ) Φ₁(u^{1/2}A + u^{1/2}Bᵀy),

where u is a positive constant, A ∈ R, B is any fixed p-dimensional vector, and k(u) = Φ₁(u^{1/2}(A + Bᵀµ)/(1 + BᵀΣB)^{1/2}) is the normalizing constant. Then

E[Y|u] = µ + u^{−1/2} ( ΣB/(1 + BᵀΣB)^{1/2} ) W_{Φ₁}( u^{1/2}(A + Bᵀµ)/(1 + BᵀΣB)^{1/2} ).

Proof. First note, by using Lemma 2 of Arellano-Valle et al. (2005), that

E[Y|u] = k⁻¹(u) ∫_{R^p} ∫₀^∞ y φ₁(t|u^{1/2}A + u^{1/2}Bᵀy, 1) φ_p(y|µ, u⁻¹Σ) dt dy = k⁻¹(u) ∫₀^∞ φ₁(t|u^{1/2}(A + Bᵀµ), 1 + BᵀΣB) E_{Y_t}[Y] dt,

where Y_t ~ N_p(µ − ΛB(A + Bᵀµ) + u^{−1/2}ΛBt, u⁻¹Λ), with Λ = (Σ⁻¹ + BBᵀ)⁻¹, and the proof follows from known properties of the truncated normal distribution (see Johnson et al., 1994, Section 10.1).

Lemma 3. Let Y ~ Gamma(α, β). Then, for any a ∈ R,

E[Φ₁(a Y^{1/2})] = T₁(a (α/β)^{1/2} | 0, 1, 2α).

Proof. See Azzalini and Capitanio (2003), Lemma 1.
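Lemma 3 is also easy to confirm numerically, which is a useful sanity check before relying on the skew-t expressions of Section 5. A verification sketch (function names ours, not from the original):

```python
import numpy as np
from scipy import integrate, stats

def lemma3_lhs(a, alpha, beta):
    """E[Phi_1(a Y^{1/2})] for Y ~ Gamma(alpha, rate beta), by quadrature."""
    g = stats.gamma(a=alpha, scale=1.0 / beta).pdf
    f = lambda y: stats.norm.cdf(a * np.sqrt(y)) * g(y)
    val, _ = integrate.quad(f, 0.0, np.inf)
    return val

def lemma3_rhs(a, alpha, beta):
    """T_1(a sqrt(alpha/beta) | 0, 1, 2 alpha), the closed form of Lemma 3."""
    return stats.t.cdf(a * np.sqrt(alpha / beta), df=2.0 * alpha)
```

Taking α = ν/2 + w and β = (d_i + ν)/2 recovers the skew-t expression for I_i^Φ(w) used in Section 5.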
References

[1] Andrews, D. F. and Mallows, C. L. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B, 36, 99-102.
[2] Arellano-Valle, R. B., Bolfarine, H. and Lachos, V. H. (2005). Skew-normal linear mixed models. Journal of Data Science, 3, 415-438.
[3] Arnold, B. C. and Beaver, R. J. (2002). Skewed multivariate models related to hidden truncation and/or selective reporting. Test, 11, 7-54.
[4] Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12, 171-178.
[5] Azzalini, A. and Capitanio, A. (1999). Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society, Series B, 61, 579-602.
[6] Azzalini, A. and Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on the multivariate skew t-distribution. Journal of the Royal Statistical Society, Series B, 65, 367-389.
[7] Azzalini, A. and Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika, 83, 715-726.
[8] Barnett, V. D. (1969). Simultaneous pairwise linear structural relationships. Biometrics, 25, 129-142.
[9] Branco, M. and Dey, D. (2001). A general class of multivariate skew-elliptical distributions. Journal of Multivariate Analysis, 79, 93-113.
[10] Bolfarine, H. and Galea-Rojas, M. (1995). Maximum likelihood estimation of simultaneous pairwise linear structural relationships. Biometrical Journal, 37, 673-689.
[11] Bolfarine, H. and Galea-Rojas, M. (1996). On structural comparative calibration under a t-model. Computational Statistics, 11, 63-85.
[12] Chipkevitch, E., Nishimura, R., Tu, D. and Galea-Rojas, M. (1996). Clinical measurement of testicular volume in adolescents: comparison of the reliability of 5 methods. Journal of Urology, 156, 2050-2053.
[13] DiCiccio, T. and Monti, A. C. (2004). Inferential aspects of the skew exponential power distribution. Journal of the American Statistical Association, 99, 439-450.
[14] Genton, M. G., He, L. and Liu, X. (2001). Moments of skew-normal random vectors and their quadratic forms. Statistics and Probability Letters, 51, 319-325.
[15] Genton, M. G. and Loperfido, N. (2005). Generalized skew-elliptical distributions and their quadratic forms. Annals of the Institute of Statistical Mathematics, 57, 389-401.
[16] Gupta, A. K. (2003). Multivariate skew t-distributions. Statistics, 37(4), 359-363.
[17] Johnson, N. L., Kotz, S. and Balakrishnan, N. (1994). Continuous Univariate Distributions, vol. 1. New York: John Wiley.
[18] Jones, M. C. and Faddy, M. J. (2003). A skew extension of the t-distribution, with applications. Journal of the Royal Statistical Society, Series B, 65, 159-174.
[19] Lachos, V. H., Bolfarine, H., Arellano-Valle, R. B. and Montenegro, L. C. (2007). Likelihood based inference for multivariate skew-normal regression models. Communications in Statistics: Theory and Methods, 36, 1769-1786.
[20] Lachos, V. H., Bolfarine, H., Vilca, L. F. and Galea-Rojas, M. (2005). Estimation and influence diagnostics for structural comparative calibration models under the skew-normal distribution. Technical report, RT-MAE 2005-12, IME-USP.
[21] Lange, K. and Sinsheimer, J. S. (1993). Normal/independent distributions and their applications in robust regression. Journal of Computational and Graphical Statistics, 2, 175-198.
[22] Lange, K. L., Little, R. J. A. and Taylor, J. M. G. (1989). Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84, 881-896.
[23] Little, R. J. A. (1988). Robust estimation of the mean and covariance matrix from data with missing values. Applied Statistics, 37, 23-38.
[24] Liu, C. (1996). Bayesian robust linear regression with incomplete data. Journal of the American Statistical Association, 91, 1219-1227.
[25] Sahu, S. K., Dey, D. K. and Branco, M. D. (2003). A new class of multivariate skew distributions with applications to Bayesian regression models. The Canadian Journal of Statistics, 31, 129-150.
[26] Shyr, I. and Gleser, L. (1986). Inference about comparative precision in linear structural relationships. Journal of Statistical Planning and Inference, 14, 339-358.
[27] Tan, F. and Peng, H. (2006). The slash and the skew-slash Student t distributions. Submitted.
[28] Wang, J., Boyer, J. and Genton, M. (2004). A skew-symmetric representation of multivariate distributions. Statistica Sinica, 14, 1259-1270.
[29] Wang, J. and Genton, M. (2006). The multivariate skew-slash distribution. Journal of Statistical Planning and Inference, 136, 209-220.
[30] Zhu, H. and Lee, S. (2001). Local influence for incomplete-data models. Journal of the Royal Statistical Society, Series B, 63, 111-126.