A CLASS OF MULTIVARIATE SKEW DISTRIBUTIONS: PROPERTIES AND INFERENTIAL ISSUES

Deniz Akdemir

A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

May 2009

Committee:

Arjun K. Gupta, Advisor
James M. McFillen, Graduate Faculty Representative
John T. Chen
Hanfeng Chen

ABSTRACT

Arjun K. Gupta, Advisor

Flexible parametric distribution models that can represent both skewed and symmetric distributions, namely skew symmetric distributions, can be constructed by skewing symmetric kernel densities with weighting distributions. In this dissertation, we study a multivariate skew family that has either a centrally symmetric or a spherically symmetric kernel. Specifically, we define multivariate skew symmetric forms of the uniform, normal, Laplace, and logistic distributions by using the cdf's of the same distributions as weighting distributions. Matrix variate extensions of these distributions are also introduced herein. To bypass the unbounded likelihood problem related to inference about this model, we propose an estimation procedure based on the maximum product of spacings method. This idea also leads to bounded model selection criteria that can be considered alternatives to Akaike's and other likelihood-based criteria when an unbounded likelihood may be a problem. Applications of skew symmetric distributions to data are also considered.

AMS 2000 subject classification: Primary 60E05, Secondary 62E10 Key words and phrases: Multivariate Skew-Symmetric Distribution, Matrix Variate Skew-Symmetric Distribution, Inference, Maximum Product of Spacings, Unbounded Likelihood, Model Selection Criterion


To little Pelin

ACKNOWLEDGMENTS

This is a great opportunity to express my respect and thanks to Prof. Arjun K. Gupta for his continuous support and invaluable input to this work. It was a pleasure to be his student. I am also grateful to my wife, my parents, and my children.


Table of Contents

CHAPTER 1: Notation, Introduction and Preliminaries
  1.1 Notation
  1.2 Preliminaries
    1.2.1 Vectors and Matrices
    1.2.2 Random Vectors and Matrices
    1.2.3 Univariate, Multivariate, Matrix Variate Normal Distribution
    1.2.4 Univariate Normal Distribution
    1.2.5 Multivariate Normal Distribution
    1.2.6 Matrix Variate Normal Distribution
    1.2.7 Symmetric Distributions
    1.2.8 Centrally Symmetric Distributions
    1.2.9 Spherically Symmetric Distributions
    1.2.10 Location-Scale Families from Spherically Symmetric Distributions

CHAPTER 2: A Class of Multivariate Skew Symmetric Distributions
  2.1 Background: Multivariate Skew Normal Distributions
  2.2 Skew-Centrally Symmetric Densities
    2.2.1 Independent Observations, Location-Scale Family
  2.3 Multivariate Skew-Normal Distribution
    2.3.1 Motivation: Skew-Spherically Symmetric Densities
    2.3.2 Multivariate Skew-Normal Distribution
    2.3.3 Generalized Multivariate Skew-Normal Distribution
    2.3.4 Stochastic Representation, Random Number Generation
  2.4 Extensions: Some Other Skew Symmetric Forms

CHAPTER 3: A Class of Matrix Variate Skew Symmetric Distributions
  3.1 Distribution of Random Sample
  3.2 Matrix Variable Skew Symmetric Distribution
  3.3 Matrix Variate Skew-Normal Distribution
  3.4 Generalized Matrix Variate Skew Normal Distribution
  3.5 Extensions

CHAPTER 4: Estimation and Some Applications
  4.1 Maximum products of spacings (MPS) estimation
  4.2 A Bounded Information Criterion for Statistical Model Selection
  4.3 Estimation of parameters of $sn_k(\alpha, \mu, \Sigma)$
    4.3.1 Case 1: MPS estimation of $sn_1(\alpha, 0, 1)$
    4.3.2 Case 2: Estimation of $\mu$, $\Sigma_c^{1/2}$ given $\alpha$
    4.3.3 Case 3: Simultaneous estimation of $\mu$, $\Sigma_c^{1/2}$ and $\alpha$

BIBLIOGRAPHY


List of Figures

1.1 Centrally symmetric t distribution
2.1 Surface of skew uniform density in two dimensions
2.2 Surface of skew uniform density in two dimensions
2.3 Surface of skew uniform density in two dimensions
2.4 The contours for 2-dimensional skew normal pdf
2.5 The surface for 2-dimensional pdf for skew normal variable
2.6 The surface for 2-dimensional pdf for skew Laplace variable
2.7 The surface for 2-dimensional pdf for skew-normal variable
2.8 The surface for 2-dimensional pdf for skew-normal variable
2.9 The surface for 2-dimensional pdf for skew-normal variable
2.10 The contours for 2-dimensional pdf for skew-normal variable
2.11 The surface for 2-dimensional pdf for skew-normal variable
2.12 The contours for 2-dimensional pdf for skew-normal variable
4.1 Frontier data
4.2 Histogram of 5000 simulated values for $\hat{\alpha}$ by MPS
4.3 1150 height measurements
4.4 The scatter plot for 100 generated observations
4.5 The scatter plot for LBM and BMI in Australian Athletes Data
4.6 The scatter plot for LBM and SSF in Australian Athletes Data

List of Tables

4.1 $MPSIC$, $MPSIC_2$, and $MPSIC_3$ are compared
4.2 MPS estimation of $\alpha$
4.3 ML estimation of $\alpha$
4.4 Estimators of $\alpha$, $\mu$ and $\sigma$
4.5 The simulation results about $\hat{\mu}$ and $\hat{\alpha}$
4.6 The simulation results about $\hat{\Sigma}_c^{1/2}$

CHAPTER 1

Notation, Introduction and Preliminaries

1.1 Notation

We denote matrices by capital letters, vectors by small bold letters, and scalars by small letters. We usually use letters from the beginning of the alphabet to denote constants, and sometimes we use Greek letters. For random variables we choose letters from the end of the alphabet. We do not follow the convention of making a distinction between a random variable and its values. The following notation is used throughout this dissertation:

$R^k$: $k$-dimensional real space.
$(A)_{i.}$: the $i$th row vector of the matrix $A$.
$(A)_{.j}$: the $j$th column vector of the matrix $A$.
$(A)_{ij}$: the element located at the $i$th row and $j$th column of the matrix $A$.
$A'$: transpose of the matrix $A$.
$A^{+}$: Moore-Penrose inverse of the matrix $A$.
$|A|$: determinant of the square matrix $A$.
$tr(A)$: trace of the square matrix $A$.
$etr(A)$: $\exp(tr(A))$ for the square matrix $A$.
$A^{1/2}$: symmetric square root of the symmetric matrix $A$.
$A_c^{1/2}$: square root of the positive definite matrix $A$ by the Cholesky decomposition.
$I_k$: $k$-dimensional identity matrix.
$S_k$: the surface of the unit sphere in $R^k$.
$0_{k\times n}$: $k \times n$ dimensional matrix of zeros.
$e_j$ (or sometimes $c_j$): a vector that has zero elements except for one 1 in its $j$th row.
$1_k$: $k$-dimensional vector of ones.
$0_k$: $k$-dimensional vector of zeros.
$x \sim F$: $x$ is distributed as $F$.
$x \stackrel{d}{=} y$: $x$ and $y$ have the same distribution.
$N_k(\mu, \Sigma^{1/2})$: $k$-dimensional normal distribution with mean $\mu$ and covariance $\Sigma$.
$\phi_k(x, \mu, \Sigma)$: density of the $N_k(\mu, \Sigma^{1/2})$ distribution evaluated at $x$.
$\Phi(x, \mu, \sigma)$: cumulative distribution function of the $N_1(\mu, \sigma)$ distribution evaluated at $x$.
$\phi_k(x)$: density of the $N_k(0, I_k)$ distribution evaluated at $x$.
$\Phi(x)$: cumulative distribution function of the $N_1(0, 1)$ distribution evaluated at $x$.
$\Phi_k(x, \mu, \Sigma)$: cumulative distribution function of the $N_k(\mu, \Sigma^{1/2})$ distribution evaluated at $x$.
$\Phi_k(x)$: cumulative distribution function of the $N_k(0, I_k)$ distribution evaluated at $x$.
$\chi^2_k$: central $\chi^2$ distribution with $k$ degrees of freedom.
$\chi^2_k(\delta)$: noncentral $\chi^2$ distribution with $k$ degrees of freedom and noncentrality parameter $\delta$.
$W_k(n, \Sigma)$: Wishart distribution with parameters $n$ and $\Sigma$.

1.2 Preliminaries

In this section, we provide certain definitions and results involving vectors and matrices, as well as random vectors and matrices. Most of these results are stated here without proof; proofs can be found in [4], [25], or most other textbooks on matrix algebra and multivariate statistics.

1.2.1 Vectors and Matrices

A $k$-dimensional real vector $a$ is an ordered array of real numbers $a_i$, $i = 1, 2, \ldots, k$, organized in a single column and written as $a = (a_i)$. The set of all $k$-dimensional real vectors is denoted by $R^k$. A real matrix $A$ of dimension $k \times n$ is an ordered rectangular array of real numbers $a_{ij}$ arranged in rows $i = 1, 2, \ldots, k$ and columns $j = 1, 2, \ldots, n$, written as $A = (a_{ij})$. Two matrices $A$ and $B$ of the same dimension are equal if $a_{ij} = b_{ij}$ for $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, n$. If all $a_{ij} = 0$, we write $A = 0$. A vector of dimension $k$ that has all zero components except for a single component equal to 1 in its $i$th row is called the $i$th elementary vector of $R^k$ and is denoted by $e_i^{(k)}$; we drop the superscript $k$ when no confusion is possible.

For two $k \times n$ matrices $A$ and $B$, the sum is defined as $A + B = (a_{ij} + b_{ij})$. For a real number $c$, scalar multiplication is defined as $cA = (ca_{ij})$. The product of two matrices $A$ of dimension $k \times n$ and $B$ of dimension $n \times m$ is defined as $AB = (c_{ij})$ where $c_{ij} = \sum_{\ell=1}^{n} a_{i\ell}b_{\ell j}$.

A matrix is called a square matrix of order $k$ if it has $k$ rows and $k$ columns. The transpose of a matrix $A$ is obtained by interchanging the rows and columns of $A$ and is represented by $A'$. When $A = A'$, we say $A$ is symmetric. A $k \times k$ symmetric matrix $A$ for which $a_{ij} = 0$, $i \neq j$, for $i, j = 1, 2, \ldots, k$ is called a diagonal matrix of order $k$ and is represented by $A = diag(a_{11}, a_{22}, \ldots, a_{kk})$; in this case, if $a_{ii} = 1$ for all $i$, then we denote it by $I_k$ and call it the identity matrix. A square matrix of order $k$ is orthogonal if $A'A = AA' = I_k$. A symmetric matrix $A$ of order $k$ is positive definite, denoted $A > 0$, if $a'Aa > 0$ for all nonzero vectors $a \in R^k$; it is positive semidefinite, denoted $A \geq 0$, if $a'Aa \geq 0$ for all vectors $a \in R^k$.

Given a square matrix $A$ of order $k$, the equation $Ax = \lambda x$ can be solved for $k$ pairs $(\lambda, x)$; the first element of a pair is called an eigenvalue and the second a corresponding eigenvector. Eigenvalues and eigenvectors of a matrix are in general complex, but a real symmetric matrix always has real eigenvalues and eigenvectors. The additional assumption of positive definiteness requires positive eigenvalues, and nonnegative definiteness requires nonnegative eigenvalues. The number of nonzero eigenvalues of a square matrix $A$ is the rank of $A$, written $rk(A)$. We can define the rank of a $k \times n$ matrix as $rk(A) = rk(AA')$ or, equivalently, as $rk(A) = rk(A'A)$; if $rk(A) = \min(k, n)$ then $A$ is called a full rank matrix.

The determinant of a square matrix $A$ of order $k$ is defined as the product of its eigenvalues and written $|A|$. If $|A| \neq 0$, $A$ is called nonsingular, and we can calculate the unique matrix $B$ such that $AB = BA = I_k$; $B$ is called the inverse of $A$ and written $A^{-1}$. A positive definite matrix is nonsingular and has positive determinant. The trace of a square matrix $A$ of order $k$ is defined as the sum of its eigenvalues and written $tr(A)$.

A symmetric matrix $A$ of order $k$ can be written as $A = GDG'$, where $G$ is an orthogonal matrix of order $k$ and $D$ is a diagonal matrix of order $k$. The symmetric square root matrix $A^{1/2}$ is defined as $A^{1/2} = GD^{1/2}G'$, where $D^{1/2}$ is the diagonal matrix whose diagonal elements are the square roots of the elements of $D$. If $A$ is also positive definite, there exists a unique lower triangular matrix $L$ with positive diagonal elements such that $A = LL'$; this is called the Cholesky decomposition of $A$. In the following, we denote the square root of $A$ obtained through the Cholesky decomposition by $A_c^{1/2}$.

Suppose that $A$ is a symmetric matrix of order $k$ and rank $r \leq k$. In this case, the unique pseudo inverse of $A$ can be written as $A^{-} = GD^{-}G'$, where $D^{-}$ is the diagonal matrix of order $r$ whose diagonal elements are the inverses of the nonzero eigenvalues of $A$, and $G$ is a $k \times r$ matrix whose $j$th column is the normalized eigenvector corresponding to the $j$th of these eigenvalues. $A$ and $A^{-}$ satisfy $AA^{-}A = A$ and $A^{-}AA^{-} = A^{-}$. Let $B$ be a $k \times n$ matrix of rank $r$, $r \leq k \leq n$. The unique Moore-Penrose inverse of $B$ is defined as $B^{+} = B'(BB')^{-}$. A symmetric matrix $A$ of order $k$ is idempotent if it satisfies $AA' = A'A = A$; $BB^{+}$ and $B^{+}B$ are idempotent matrices.

If $A$ is idempotent of order $k$ and rank $q$, then it can be written as
\[
A = G \begin{pmatrix} I_q & 0 \\ 0 & 0 \end{pmatrix} G',
\]
where $G$ is an orthogonal matrix of order $k$ and the $0$'s denote zero matrices of appropriate dimensions. If $A_1, A_2, \ldots, A_m$ are each idempotent of order $k$ with ranks $r_1, r_2, \ldots, r_m$ such that $\sum_{j=1}^{m} r_j = k$, and if $A_iA_j = 0$ for $i \neq j$, then we can find an orthogonal matrix $G$ of order $k$ such that
\[
A_1 = G \begin{pmatrix} I_{r_1} & 0 \\ 0 & 0 \end{pmatrix} G', \quad
A_2 = G \begin{pmatrix} 0 & 0 & 0 \\ 0 & I_{r_2} & 0 \\ 0 & 0 & 0 \end{pmatrix} G', \quad \ldots, \quad
A_m = G \begin{pmatrix} 0 & 0 \\ 0 & I_{r_m} \end{pmatrix} G',
\]
where the $0$'s denote zero matrices of appropriate dimensions.
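Both square roots above are easy to compute numerically. As a minimal sketch (assuming NumPy is available; the particular matrix below is an arbitrary illustration), the symmetric square root $A^{1/2} = GD^{1/2}G'$ and the Cholesky factor $A_c^{1/2}$ can be obtained and checked as follows:

```python
import numpy as np

# A small positive definite matrix for illustration.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Symmetric square root A^{1/2} = G D^{1/2} G' via the spectral decomposition.
eigvals, G = np.linalg.eigh(A)
A_half = G @ np.diag(np.sqrt(eigvals)) @ G.T

# Cholesky square root A_c^{1/2}: the unique lower triangular L with A = L L'.
L = np.linalg.cholesky(A)

print(np.allclose(A_half @ A_half, A))  # True
print(np.allclose(L @ L.T, A))          # True
```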

1.2.2 Random Vectors and Matrices

The $k$ random variables $x_1, x_2, \ldots, x_k$ observable on a sampling unit are represented by a $k$-dimensional column vector $x$, which is called a multivariate random variable. The matrix variate random variable obtained by observing $k$ variables on each of $n$ sampling units selected into a random sample is represented by a $k \times n$ matrix, say $X$.

The distribution of a random variable $x$ can be characterized by its cumulative distribution function (cdf) $F_x(.)$, defined as $F_x(c) = P(x \leq c)$. Given a cdf $F(.)$, if there exists a nonnegative function $f(.)$ such that $F(x) = \int_{-\infty}^{x} f(t)\,dt$ for every $x \in R$, we say that the random variable $x$ with cdf $F_x(.)$ has probability density function (pdf) $dF(x) = f(x)$. We can also represent a distribution using the moment generating function (mgf) of the random variable, defined as $M_x(t) = E(e^{tx})$, if it exists for all $t$ in some neighborhood of 0. The mgf of a random variable does not always exist, but the characteristic function (cf) always exists; it is defined as $\Psi_x(t) = E(e^{itx})$.

More generally, the distribution of a random variable $x$ in $R^k$ can be characterized by its joint cumulative distribution function (jcdf) $F_x(.)$; for $c$ in $R^k$, it is defined as $F_x(c) = P(x_1 \leq c_1, x_2 \leq c_2, \ldots, x_k \leq c_k)$. Given a jcdf $F(.)$, if there exists a nonnegative function $f(.)$ such that
\[
F(x) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2}\cdots\int_{-\infty}^{x_k} f(t_1, t_2, \ldots, t_k)\,dt_1\,dt_2\ldots dt_k
\]
for every $x \in R^k$, we say that the random vector $x$ with jcdf $F_x(.)$ has joint probability density function (jpdf) $dF(x) = f(x)$. The joint moment generating function (jmgf) of the random variable is defined as $M_x(t) = E(e^{t'x})$, if it exists for all $t$ in some neighborhood of $0_k$ in $R^k$. The joint characteristic function (jcf) is defined as $\Psi_x(t) = E(e^{it'x})$ for $t \in R^k$. We write $x \sim F(x)$, $x \sim f(x)$, $x \sim M_x(t)$, or $x \sim \Psi_x(t)$, equivalently, to say that a random vector has a certain distribution. When we say that two random vectors $x$ and $y$ have the same distribution, we refer to the equivalence of their jcdf's, jpdf's, jmgf's, or jcf's; in this case we write $x \stackrel{d}{=} y$.

Let $x$ be a $k$-dimensional random variable, and suppose we know one of the following: $x \sim F(x)$, $x \sim f(x)$, $x \sim M_x(t)$, or $x \sim \Psi_x(t)$. Let $k$ denote the set of indices of the random variables in $x$, and let $m$ be a subset of $k$. The marginal distribution of the $m$-dimensional vector of random variables with indices in $m$, say $x^{(m)}$, can be found in one of the following ways:
\[
x^{(m)} \sim F(x)\big|_{x^{(m^c)} = \infty}, \qquad
x^{(m)} \sim f(x^{(m)}) = \int\!\!\cdots\!\!\int_{R^{k-m}} f(x)\,dx^{(m^c)}, \qquad
x^{(m)} \sim M_x(t)\big|_{t^{(m^c)} = 0}, \qquad \text{or} \qquad
x^{(m)} \sim \Psi_x(t)\big|_{t^{(m^c)} = 0},
\]
where $m^c$ denotes the complement of $m$, $x^{(m^c)}$ is the corresponding subvector of $x$, and similarly $t^{(m^c)}$ denotes the subvector of $t$ corresponding to the indices in $m^c$. If the random variable $x$ has a jpdf, the conditional jpdf of $x^{(m)}$ given $x^{(m^c)}$ is
\[
f(x^{(m)} \mid x^{(m^c)}) = \frac{f(x)}{f(x^{(m^c)})}.
\]
When $x$ does not have a density, the distribution of $x^{(m)} \mid x^{(m^c)}$ is still defined from the definition of conditional probability.

Two random vectors $x$ and $y$ are said to be independent if $F(x, y) = G(x)H(y)$, where $F(x, y)$, $G(x)$ and $H(y)$ are the jcdf's of $(x, y)$, $x$, and $y$, respectively. If they have densities $f(x, y)$, $g(x)$, and $h(y)$, then $x$ and $y$ are independent if and only if $f(x, y) = g(x)h(y)$.

The expectation of a random variable $x$ with cdf $F(x)$ is defined by $E(x) = \int_R x\,dF(x)$. For $r = 1, 2, \ldots$, $\mu_a^r = \int_R (x - a)^r\,dF(x)$ is called the moment of order $r$ about $a$. In general, the expected value of an arbitrary function $g(x)$ of $x$ with respect to $dF(x)$ is given by $E(g(x)) = \int_R g(x)\,dF(x)$. We write $E(x) = \mu_0^1$ for the mean. The moment of order 2 about $E(x)$ is called the variance of the random variable $x$, written $var(x)$.

Let $x = (x_1, x_2, \ldots, x_k)'$ be a $k$-dimensional random vector. If every element $x_j$ of $x$ has finite expectation $\mu_j$, we write $E(x) = (E(x_1), E(x_2), \ldots, E(x_k))' = \mu = (\mu_1, \mu_2, \ldots, \mu_k)'$ for the mean vector. A $k$-variable moment about an origin $a = (a_1, a_2, \ldots, a_k)$ is defined as $\mu_a^{r_1 r_2 \ldots r_k} = E((x_1 - a_1)^{r_1}(x_2 - a_2)^{r_2}\cdots(x_k - a_k)^{r_k})$ when it exists. The expectation of an arbitrary function $g(x)$ of $x$ is defined as $E(g(x)) = \int_{R^k} g(x)\,dF(x)$. The covariance matrix $\Sigma$ of a $k$-dimensional random vector $x$ is the expectation of the $k \times k$ matrix $(x - \mu)(x - \mu)'$. Clearly $\Sigma$ is symmetric; in fact, the class of covariance matrices coincides with the class of positive semidefinite matrices. The following is very useful: if $x$ is a $k$-dimensional random vector with mean vector $\mu$ and covariance matrix $\Sigma$, then the variable $y = Ax + \xi$, for $A$ an $m \times k$ matrix and $\xi \in R^m$, has mean vector $A\mu + \xi$ and covariance matrix $A\Sigma A'$.

Let $x \in \chi$ be a $k$-dimensional random vector described by its jcdf $F(x)$, and let $h(x)$ be a measurable function from $\chi$ to $\Upsilon \subset R^m$. Then $y = h(x)$ is an $m$-dimensional random vector with distribution described by $H(y) = \int_{\Theta} dF(x)$, where $\Theta = \{x : h(x) \leq y\}$. The jcf of $y$ can be obtained as $\Psi_y(t) = \int_{R^k} e^{it'h(x)}\,dF(x)$ for $t \in R^m$; similarly, when it exists, the jmgf of $y$ is given by $M_y(t) = \int_{R^k} e^{t'h(x)}\,dF(x)$ for $t \in R^m$. Let $x \in \chi$ be a $k$-dimensional random vector having density function $f(x)$, and let $h(x)$ be an invertible function from $\chi$ to $\Upsilon \subset R^m$ such that $h^{-1}$ has continuous partial derivatives in $\Upsilon$. Then $y = h(x)$ is an $m$-dimensional random vector with density function $f(h^{-1}(y))|J(y)|$,

an invertible function from χ to Υ ⊂ Rm such that h−1 has continuous partial derivatives in Υ. Then y = h(x) is a m−dimensional random vector with density function f (h−1 (y))|J(y)|,

8 here |J(y)| is called the Jacobian of the transformation h , and it is the determinant of the −1

k × k matrix whose ijth component is the partial derivative of the ith component of h−1 (y) with respect to jth component of y. For example, the Jacobian of the linear transformation y = Ax + µ for A a nonsingular matrix of order k and constant vector µ ∈ Rk is given by |A|−1 . Also, for the k×n matrix variate random variable X the transformation Y = AXB+M for nonsingular matrices A and B of orders k and n and a k × n constant matrix M is given by |A|−n |B|−k . We also have the following useful results for transformation of variables: ′

If x ∼ Mx(t), then y = Ax + µ ∼ eµ tMx(A′ t) for any m × k matrix and µ ∈ Rm , if X ∼ MX (T ) then Y = AY B + M ∼ etr(M ′ T )MX (A′ T B ′ ) for any m × k matrix A, n × l matrix B and m × l matrix M. We define a model to be a k−dimensional random vector x taking values in a set χ ⊂ Rk and having jcdf F (x : θ) where θ is a p−dimensional vector of parameters taking values in Θ ⊂ Rp . χ is called the sample space, Θ is called the parameter space. In parametric statistical inference, we assume that the sample space, the parameter space, and the jcdf F (x; θ) are all known. Usually we refer to a model family in one of the following ways: F = [χ, Fx(x; θ), Θ] or F = [χ, fx(x; θ), Θ]. We try to make some inference about θ after we observe a sample of observations, say X. A group of transformations, G, is a collection of transformations from χ to χ that is closed under inversion, and closed under composition. The model F = [χ, fx(x; θ), Θ] is invariant with respect to G when x ∼ fx(x; θ), g(x) ∼ fg(x) (g(x); g(θ)) where g ∈ G and G is a group of transformations from Θ to Θ. The family of the normal distributions and more generally elliptical distributions are closed under nonsingular linear transformations. Invariance under a group of transformations, say G, can be useful for reducing the dimension of the data since inference about θ should not depend on whether the simple random sample X = (x1 , x2 , . . . , xn ) is observed or g(X) = (g(x1 ), g(x2 ), . . . , g(xn )) is observed for g ∈ G. Suppose that F is invariant with respect to G and that the statistic T = T (X) is an estimator of θ. That is, T is a function that maps χn to Θ. Then T is equivariant estimator

9 if T (g(X)) = g(T (X)) for all g ∈ G, g ∈ G, and all X ∈ χ . It can also be shown that if F is n

invariant with respect to G and the mle is unique then the mle of θ is equivariant, if the mle is not unique then the mle can be chosen to be equivariant. A statistic T (X) is invariant under a group of transformations G if T (g(X)) = T (X) for all g ∈ G and all X ∈ χn . A function M (X) is maximal invariant under a group of transformations G if M (g(X)) = M (X) for all g ∈ G and all X ∈ χn , and if M (X1 ) = M (X2 ) for some X1 ∈ χn and X2 ∈ χn then X1 = g(X2 ). Suppose that T is a statistic and that M is a maximal invariant. Then T is invariant if and only if T is a function of M. Invariance property is usually utilized for obtaining suitable test statistics for hypothesis testing problems that stay invariant under a group of transformations.

1.2.3 Univariate, Multivariate, Matrix Variate Normal Distribution

The normal distribution arises in many areas of theoretical and applied statistics. The use of the normal model is usually justified by assuming many small, independent effects additively contributing to each observation. Also, in probability theory, normal distributions arise as the limiting distributions of several continuous and discrete families of distributions. In addition, the normal distribution maximizes information entropy among all distributions with known mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance. In the following, we give the definition and review some important results about the normal distribution for univariate, multivariate and matrix variate cases.

1.2.4 Univariate Normal Distribution

Each member of the univariate normal distribution family is defined by two parameters, location and scale: the mean ("average") $\mu$ and the variance (standard deviation squared) $\sigma^2$, respectively. To indicate that a real-valued random variable $x$ is normally distributed with mean $\mu$ and variance $\sigma^2$, we write $x \sim N(\mu, \sigma)$. The pdf of the normal distribution is the Gaussian function
\[
\phi(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2},
\]
where $\sigma > 0$ is the standard deviation and the real parameter $\mu$ is the expected value. $\phi(z; 0, 1) = \phi(z)$ is the density function of the "standard" normal random variable. The cumulative distribution function of the normal distribution is expressed in terms of the density function as $\Phi(x; \mu, \sigma) = \int_{-\infty}^{x} \phi(u; \mu, \sigma)\,du$. The standard normal cdf, $\Phi(z)$, is just the general cdf evaluated with $\mu = 0$ and $\sigma = 1$. The values of $\Phi(z)$ may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series, and continued fractions. The mgf exists for $x \sim N(\mu, \sigma)$ and is given by $M_x(t) = e^{\mu t + \frac{1}{2}t^2\sigma^2}$. The characteristic function is $\Psi_x(t) = e^{i\mu t - \frac{1}{2}t^2\sigma^2}$.

1.2.5 Multivariate Normal Distribution

In classical multivariate analysis the central distribution is the multivariate normal distribution. There seem to be at least two reasons for this: first, by the multivariate central limit theorem, multivariate observations are in most cases at least approximately normal; second, the multivariate normal distribution and its sampling distributions are usually tractable.

A $k$-dimensional random vector $x$ is said to have a $k$-dimensional normal distribution if the distribution of $\alpha'x$ is univariate normal for every $\alpha \in R^k$. A random vector $x$ with $k$ components having normal distribution with parameters $\mu \in R^k$ and $k \times k$ positive definite matrix $A$ has jpdf
\[
\phi_k(x; \mu, A) = \frac{1}{(2\pi)^{k/2}|A|}\, e^{-\frac{1}{2}(x-\mu)'(AA')^{-1}(x-\mu)}. \tag{1.1}
\]
We say that $x$ is distributed according to $N_k(\mu, A)$, and write $x \sim N_k(\mu, A)$. The mgf corresponding to the density in (1.1) is
\[
M_x(t) = e^{t'\mu + \frac{1}{2}t'AA't}. \tag{1.2}
\]

If we define an $m$-dimensional random variable $y$ as $y = Bx + \xi$, where $B$ is any $m \times k$ matrix and $\xi \in R^m$, then by the moment generating function we can write $y \sim N_m(B\mu + \xi, BA)$; note that a pdf representation does not exist when $BB'$ is singular, i.e., when $rk(B) = r < m$. If $x \sim N_k(\mu, A)$ where $A$ is nonsingular, then $z = A^{-1}(x - \mu)$ has the distribution $N_k(0_k, I_k)$ with density
\[
\phi_k(z) = \frac{1}{(2\pi)^{k/2}}\, e^{-\frac{1}{2}z'z} \tag{1.3}
\]
and moment generating function
\[
E(e^{t'z}) = e^{\frac{1}{2}t't}. \tag{1.4}
\]
We denote the jpdf of $z$ by $\phi_k(z)$ and its jcdf by $\Phi_k(z)$. The variable $v = z'z$ has the chi-square distribution with $k$ degrees of freedom, denoted $\chi^2_k$; the density of $v$ can be written as
\[
dF(v) = \frac{1}{2^{k/2}\Gamma(k/2)}\, v^{\frac{k}{2}-1} e^{-\frac{1}{2}v}. \tag{1.5}
\]
Let $u$ have the uniform distribution on the $k$-dimensional sphere, and let $r \stackrel{d}{=} v^{1/2}$. Then $z \stackrel{d}{=} ru$, and $x = Az + \mu$ satisfies $x \stackrel{d}{=} \mu + rAu$; in general, if $y$ has the $N_m(\xi, B)$ distribution for any $m \times k$ matrix $B$ and any $m$-dimensional vector $\xi$, we have $y \stackrel{d}{=} \xi + rBu$.

The parameters of the $N_k(\mu, A)$ distribution have a direct interpretation as the expectation and the covariance of the variable $x$ with this distribution, since $E(x) = \mu$ and $E((x - \mu)(x - \mu)') = AA'$. All odd moments of $x - \mu$ are zero. The fourth order moments are $E[(x_i - \mu_i)(x_j - \mu_j)(x_k - \mu_k)(x_l - \mu_l)] = \sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}$, where $AA' = (\sigma_{ij})$. The family of normal distributions is closed under linear transformations, marginalization, and conditioning. A characterization of the multivariate normal distribution within the elliptical family of distributions is that it is the only distribution in this class for which a zero covariance matrix between two partitions of a random vector implies independence.

The following are important for $x \sim N_k(\mu, A)$, where $A$ is nonsingular:
1. $(x - \mu)'(AA')^{-1}(x - \mu) \sim \chi^2_k$;
2. $x'(AA')^{-1}x \sim \chi^2_k(\delta)$, where $\chi^2_k(\delta)$ denotes the noncentral $\chi^2$ distribution with noncentrality parameter $\delta = \frac{1}{2}\mu'(AA')^{-1}\mu$.

When $x \sim N_k(\mu, A)$ where $A$ is singular of rank $r$, we can prove similar results:
1. $(x - \mu)'(AA')^{+}(x - \mu) \sim \chi^2_r$;
2. $x'(AA')^{+}x \sim \chi^2_r(\delta)$, where $\chi^2_r(\delta)$ denotes the noncentral $\chi^2$ distribution with noncentrality parameter $\delta = \frac{1}{2}\mu'(AA')^{+}\mu$.

Assume that $x \sim N_k(\mu, A)$, and let $y = Bx + b$ and $z = Cx + c$, where $B$ is $m \times k$, $C$ is $l \times k$, $b$ is $m \times 1$ and $c$ is $l \times 1$. Then $y$ and $z$ are independent if and only if $BAA'C' = 0$.
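Result 1 above is easy to check by simulation. A minimal sketch (assuming NumPy and SciPy; the particular $A$ and $\mu$ below are arbitrary illustrations) draws $x = Az + \mu$ and compares the empirical distribution of the quadratic form with the $\chi^2_k$ cdf:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n = 3, 100_000
mu = np.array([1.0, -2.0, 0.5])
A = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.3, 0.2, 1.5]])  # scale parameter; the covariance is AA'

# x ~ N_k(mu, A) via x = A z + mu with z standard normal
z = rng.standard_normal((n, k))
x = z @ A.T + mu

# quadratic form (x - mu)'(AA')^{-1}(x - mu) for each observation
S_inv = np.linalg.inv(A @ A.T)
d = x - mu
q = np.einsum('ij,jk,ik->i', d, S_inv, d)

# the empirical cdf of q should track the chi-square(k) cdf
for c in (1.0, 3.0, 6.0):
    print((q <= c).mean(), stats.chi2.cdf(c, df=k))
```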

1.2.6 Matrix Variate Normal Distribution

If a random sample of $n$ observations is independently selected from a $N_k(\mu, A)$ distribution, the joint density of $X = (x_1, x_2, \ldots, x_n)$ can be written as
\[
dF(X) = |A|^{-n} \prod_{j=1}^{n} \phi_k\big((AA')^{-1/2}(x_j - \mu)\big) = \frac{1}{(2\pi)^{nk/2}|A|^n}\, e^{-\frac{1}{2}\sum_{j=1}^{n}(x_j-\mu)'(AA')^{-1}(x_j-\mu)}.
\]
A more general matrix variate normal distribution can be defined through linear transformations, from both the right and the left, of an iid random sample of $k$-dimensional standard normal vectors. Let $z \sim \phi_k(z)$, and let $Z = (z_1, z_2, \ldots, z_n)$ denote an iid random sample from this distribution. The joint density of $Z$ is given by
\[
dF(Z) = \prod_{j=1}^{n} \phi_k(z_j) = \frac{1}{(2\pi)^{nk/2}}\, e^{-\frac{1}{2}\sum_{j=1}^{n} z_j'z_j} = \frac{1}{(2\pi)^{nk/2}}\, etr\!\Big(-\frac{1}{2}ZZ'\Big).
\]
We denote this by $Z \sim \phi_{k\times n}(Z)$. Now let $X = AZB + M$ for nonsingular matrices $A$ and $B$ of orders $k$ and $n$ and a $k \times n$ constant matrix $M$. The Jacobian of this transformation is $|A|^{-n}|B|^{-k}$, and the density of $X$ is
\[
\phi_{k\times n}(X; M, A, B) = \frac{1}{(2\pi)^{nk/2}|A|^n|B|^k}\, etr\!\Big(-\frac{1}{2}(AA')^{-1}(X - M)(B'B)^{-1}(X - M)'\Big).
\]
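The construction $X = AZB + M$ translates directly into a sampler. A minimal sketch (assuming NumPy; the particular $A$, $B$, and $M$ below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 5
A = np.linalg.cholesky(np.array([[2.0, 0.4, 0.0],
                                 [0.4, 1.0, 0.2],
                                 [0.0, 0.2, 1.5]]))  # k x k row-scale factor
B = np.eye(n)                                        # n x n column-scale factor
M = np.zeros((k, n))                                 # k x n location matrix

# Z is a k x n matrix of iid standard normals; X = A Z B + M
Z = rng.standard_normal((k, n))
X = A @ Z @ B + M
print(X.shape)  # (3, 5)
```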

1.2.7 Symmetric Distributions

A random variable $x$ is symmetric if $x \stackrel{d}{=} -x$. Multivariate symmetry is usually defined in terms of the invariance of the distribution of a centered random vector $x$ in $R^k$ under a suitable family of transformations. A random vector $x$ has a centrally symmetric distribution about the origin if $x \stackrel{d}{=} -x$. An equivalent alternative description of central symmetry is the requirement that $v'x \stackrel{d}{=} -v'x$ for any vector $v \in R^k$, that is, any projection of $x$ onto a line through the origin has a symmetric distribution. The joint distribution of $k$ independent symmetrically distributed random variables is centrally symmetric. A random vector $x$ has a spherically symmetric distribution about the origin $0$ if an orthogonal rotation of $x$ about $0$ does not alter the distribution, i.e., $x \stackrel{d}{=} Ax$ for all orthogonal $k \times k$ matrices $A$. A random vector that has the uniform distribution on a $k$-cube of the form $[-c, c]^k$ is centrally but not spherically symmetric; central symmetry is more general than spherical symmetry. Both of these definitions of symmetry reduce to the usual notion of symmetry in the univariate case.

1.2.8 Centrally Symmetric Distributions

The most obvious way of constructing centrally symmetric distributions over $R^k$ is to consider the joint distribution of $k$ independent symmetric random variables. Because of independence, the joint density is the product of the univariate marginals. Some examples of this kind of density follow:

1. Multivariate uniform distribution on the $k$-cube $[-1, 1]^k$. Density:
\[
g(x) = \frac{1}{2^k}, \quad x \in [-1, 1]^k.
\]

2. Multivariate symmetric beta distribution on the $k$-cube $[-1, 1]^k$. Density:
\[
g(x) = \left(\frac{\Gamma(2\theta)}{(\Gamma(\theta))^2\, 2}\right)^{k} \prod_{j=1}^{k}\left(\frac{1 + e_j'x}{2}\right)^{\theta-1}\left(1 - \frac{1 + e_j'x}{2}\right)^{\theta-1}, \quad x \in [-1, 1]^k.
\]

3. Multivariate centrally symmetric t-distribution with $v$ degrees of freedom. Density:
\[
g(x) = \left(\frac{\Gamma(\frac{v+1}{2})}{\sqrt{v\pi}\,\Gamma(\frac{v}{2})}\right)^{k} \prod_{j=1}^{k}\left(1 + \frac{(e_j'x)^2}{v}\right)^{-\frac{v+1}{2}}, \quad x \in R^k.
\]

4. Multivariate Laplace distribution. Density:
\[
g(x) = \frac{1}{2^k}\exp\Big(-\sum_{j=1}^{k} |e_j'x|\Big), \quad x \in R^k.
\]

1.2.9 Spherically Symmetric Distributions

A spherically symmetric random vector $x$ has a jcf of the form $h(t't)$ for some cf $h(.)$, and a density, if it exists, of the form $f(x'x)$ for some pdf $f(.)$. For a spherically symmetric random vector $x$, the random variable defined as $u^{(k)} = x/\sqrt{x'x}$ is always distributed uniformly over the surface of the unit sphere in $R^k$, denoted by $S_k$.

A stochastic representation for a random vector $x$ with a spherically symmetric distribution is given by $x \stackrel{d}{=} ru$, where $r$ is a nonnegative random variable with cdf $K(r)$ that is independent of $u$, which is uniformly distributed over $S_k$. In this case $r \stackrel{d}{=} \sqrt{x'x}$ and $u \stackrel{d}{=} x/\sqrt{x'x}$, and they are independent. If both $K'(r) = k(r)$, the pdf of $r$, and $f(x'x)$, the jpdf of $x$, exist, then they are related as follows:
\[
k(r) = \frac{2\pi^{k/2}}{\Gamma(\frac{k}{2})}\, r^{k-1} f(r^2), \tag{1.6}
\]
or, in terms of the pdf of $v = r^2$,
\[
k(v) = \frac{\pi^{k/2}}{\Gamma(\frac{k}{2})}\, v^{\frac{k}{2}-1} f(v). \tag{1.7}
\]
Some examples follow:

(a) The standard multivariate normal $N_k(0_k, I_k)$ distribution with jpdf
\[
\phi_k(x) = \frac{1}{(2\pi)^{k/2}}\, e^{-\frac{1}{2}x'x};
\]
we denote the jcdf of $x$ by $\Phi_k(x)$.

(b) The multivariate spherical t distribution with $v$ degrees of freedom, with density function
\[
dF(x) = \frac{\Gamma(\frac{1}{2}(v + k))}{\Gamma(\frac{1}{2}v)(v\pi)^{k/2}}\, \frac{1}{(1 + \frac{1}{v}x'x)^{(v+k)/2}}.
\]
When $v = 1$, the multivariate t distribution is called the spherical Cauchy distribution. As $v \to \infty$, both the centrally symmetric and the spherically symmetric multivariate t distributions approach the multivariate normal density. In Figure 1.1 we display the centrally symmetric t distribution for various choices of the degrees of freedom $v$; as $v$ gets larger, the contours approach the contours of the multivariate normal density.
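The stochastic representation $x \stackrel{d}{=} ru$ suggests a generic sampler for spherically symmetric laws: draw $u$ uniformly on $S_k$ by normalizing a standard normal vector, and draw $r$ from $K(r)$ independently. A minimal sketch (assuming NumPy), using $r^2 \sim \chi^2_k$ so that the output should recover $N_k(0_k, I_k)$:

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 3, 100_000

# u uniform on the unit sphere S_k: normalize a standard normal vector
g = rng.standard_normal((n, k))
u = g / np.linalg.norm(g, axis=1, keepdims=True)

# choosing r^2 ~ chi-square(k) reproduces the standard normal N_k(0, I_k)
r = np.sqrt(rng.chisquare(df=k, size=(n, 1)))
x = r * u

print(x.mean(axis=0))          # approximately 0
print(np.cov(x.T).round(2))    # approximately I_k
```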

1.2.10 Location-Scale Families from Spherically Symmetric Distributions

Elliptical distributions, obtained from an affine transformation of a spherically symmetric random vector, are usually utilized in the study of the robustness of procedures developed under the normal distribution assumption. The family of elliptical distributions is closed under linear transformations, marginalization, and conditioning.

Figure 1.1: Centrally symmetric t distribution for various choices of degrees of freedom, $v$ (panels: $v = 1, 5, 15, 100$). As $v$ gets larger the contours approach the contours of the multivariate normal density.

Let $z$ be a $k$-dimensional random vector with spherical distribution $F$.

A random vector $x \stackrel{d}{=} Az + \mu$, for $A$ an $m \times k$ constant matrix and $\mu$ an $m$-dimensional constant vector, has an elliptically contoured distribution with parameters $\mu$ and $A$ and kernel $F$; we denote this distribution by $E_m(\mu, A, F)$. $x$ has a jcf of the form $e^{i\mu't}h_z(A't)$, where $h_z(t) = h(t't)$ is the jcf of the spherically distributed variable $z$. When it exists, the jmgf is given by $e^{\mu't}M_z(A't)$. When $AA'$ is nonsingular and the jpdf $F' = f$ exists, the jpdf of $x$ is written as follows:
\[
E_k(x; \mu, A, f) = |A|^{-1} f\big((x - \mu)'(AA')^{-1}(x - \mu)\big). \tag{1.8}
\]
A characterization of $x$ is given by $x \stackrel{d}{=} Az + \mu \stackrel{d}{=} rAu^{(k)} + \mu$, where $r$ is a nonnegative random variable with cdf $K(r)$ that is independent of $u^{(k)}$, which is uniformly distributed over $S_k$. If $x \sim E_k(\mu, A, f)$ has a finite first moment, then it is given by $E(x) = \mu$. All odd moments of $x - \mu$ are zero. The covariance matrix, if it exists, is given by
\[
E[(x - \mu)(x - \mu)'] = \frac{E(r^2)}{k}\, AA'.
\]
If fourth order moments exist, they are
\[
E[(x_i - \mu_i)(x_j - \mu_j)(x_k - \mu_k)(x_l - \mu_l)] = \frac{E(r^4)}{k(k+2)}\,(\sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}),
\]
where $AA' = (\sigma_{ij})$. If a random sample of $n$ observations is independently selected from an $E_k(\mu, A, f)$ distribution, the density of $X = (x_1, x_2, \ldots, x_n)$ can be written as
\[
E_{k\times n}(X; \mu, A, f) = |A|^{-n} \prod_{j=1}^{n} f\big((x_j - \mu)'(AA')^{-1}(x_j - \mu)\big).
\]
Gupta and Varga give a generalization of this model ([37]):
\[
E_{k\times n}(X; M, A, B, f) = |A|^{-n}|B|^{-k} f\big(tr((X - M)'(AA')^{-1}(X - M)(B'B)^{-1})\big).
\]

CHAPTER 2

A Class of Multivariate Skew Symmetric Distributions

The interest in flexible distribution models that can adequately accommodate asymmetric populations has increased during the last few years. Many distributions are skewed and so do not satisfy some standard assumptions such as normality, or even elliptical symmetry. A reasonable approach is to develop models that can represent both skewed and symmetric distributions. The book edited by Genton supplies a variety of applications for such models in areas such as economics, engineering, and the biological sciences ([27]).

2.1 Background: Multivariate Skew Normal Distributions

The univariate skew normal density first appears in Roberts ([55]) as an example of weighted densities. In Aigner et al., the skew normal model is obtained in the context of stochastic frontier production function models ([1]). Azzalini provides a formal study of this distribution ([10]). The density of this model is given by
\[
SN_1(y, \sigma^2, \alpha) = f(y, \sigma^2, \alpha) = 2\phi(y, \sigma^2)\Phi(\alpha y), \tag{2.1}
\]
where $\alpha$ is a real scalar, $\phi(., \sigma^2)$ is the univariate normal density function with variance $\sigma^2$, and $\Phi(.)$ denotes the cumulative distribution function of the univariate standard normal variable. Some basic properties of the univariate skew normal distribution are given in ([10]):

(a) $SN_1(y, \sigma^2, \alpha = 0) = \phi(y, \sigma^2)$;
(b) if $y \sim SN_1(y, \sigma^2, \alpha)$, then $-y \sim SN_1(y, \sigma^2, -\alpha)$;
(c) as $\alpha \to \pm\infty$, the density $SN_1(y, \sigma^2, \alpha)$ approaches the half normal density, i.e., the distribution of $\pm|z|$ when $z \sim \phi(z, \sigma^2)$;
(d) if $y \sim SN_1(y, \sigma^2 = 1, \alpha)$, then $y^2 \sim \chi^2_1$.

Properties (a), (b), and (c) follow directly from the definition, while property (d) follows immediately from the following lemma.

Lemma 2.1.1. ([55]) $w^2 \sim \chi^2_1$ if and only if the p.d.f. of $w$ has the form $f(w) = h(w)\exp(-w^2/2)$, where $h(w) + h(-w) = \sqrt{2/\pi}$.

The univariate skew normal distribution family extends the widely employed family of normal distributions by introducing a skewness factor $\alpha$. The advantage of this distribution family is that it maintains many statistical properties of the normal distribution family. The study of skew normal distributions explores an approach to statistical analysis without the assumption of symmetry for the underlying population distribution. The skew normal distribution family emerges to take the skewness property into account; skew symmetric distributions are useful in many practical situations.
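Density (2.1) is straightforward to evaluate and, for $\sigma = 1$, coincides with the skew-normal implementation shipped in SciPy (scipy.stats.skewnorm). A minimal sketch, assuming NumPy and SciPy; the helper sn1_pdf is introduced here only for illustration:

```python
import numpy as np
from scipy.stats import norm, skewnorm

alpha = 3.0

def sn1_pdf(y, alpha, sigma=1.0):
    """Density 2 * phi(y; sigma^2) * Phi(alpha * y) from (2.1)."""
    return 2.0 * norm.pdf(y, scale=sigma) * norm.cdf(alpha * y)

y = np.linspace(-3, 3, 7)
# for sigma = 1 this matches scipy's skewnorm density with shape parameter alpha
print(np.allclose(sn1_pdf(y, alpha), skewnorm.pdf(y, alpha)))  # True
```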

A more general form of the density in (2.1) arises as
\[
f(y, \alpha) = 2g(y)H(\alpha y), \tag{2.2}
\]
where $\alpha$ is a real scalar, $g(.)$ is a univariate density function symmetric around 0, and $H(.)$ is an absolutely continuous cumulative distribution function with $H'(.)$ symmetric around 0. This family of densities is called the skew symmetric family. Taking $\alpha$ equal to zero in (2.2) gives a symmetric density; we obtain a skewed density for any other value of $\alpha$. The parameter $\alpha$ controls both skewness and kurtosis.

When applying the skew normal distribution family in statistical inference, we frequently need to discuss the joint distribution of a random sample from the population. This necessitates the study of the multivariate skew normal distribution. Several generalizations of the densities (2.1) and (2.2) to the multivariate case have been studied. For example, Azzalini introduced the $k$-dimensional skew-normal density in two alternative forms ([10], [12]):
\[
f(y, \alpha, \Sigma) = c\,\phi(y, \Sigma) \prod_{i=1}^{k} \Phi(\alpha_i y_i), \tag{2.3}
\]
\[
f(y, \alpha, \Sigma) = 2\phi(y, \Sigma)\Phi(\alpha'y), \tag{2.4}
\]

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)'$ is a $k$-dimensional real vector, $\phi(., \Sigma)$ is the $k$-dimensional normal density function with covariance matrix $\Sigma$, $\Phi(.)$ denotes the cumulative distribution function of the univariate standard normal variable, and $c$ is the normalizing constant. Chang and Gupta generalize the idea in (2.4) to cases other than the normal ([31]). A function
\[
f(y, \alpha) = 2g(y)H(\alpha'y) \tag{2.5}
\]
defines a probability density function for $y \in R^k$ when $\alpha$ is a $k$-dimensional real vector, $g(.)$ is a $k$-dimensional density function symmetric around $0_k$, and $H(.)$ is an absolutely continuous cumulative distribution function with $H'(.)$ symmetric around 0. The skew symmetric family generated by Equation 2.5 has been studied by many authors. For instance, one can define skew-t distributions ([16], [41], [56]), skew-Cauchy distributions ([9]), skew-elliptical distributions ([13], [6]), or other skew symmetric distributions ([60]). The models in ([35]) are obtained by taking both $g(.)$ and $H(.)$ in (2.2) to belong to one of the normal, Cauchy, Laplace, logistic, or uniform families. In ([49]), $g(.)$ is taken to be a normal pdf while $H(.)$ is taken as the cumulative distribution function of the normal, Student's t, Cauchy, Laplace, logistic, or uniform distribution. Alternatively, Gomez, Venegas, and Bolfarine consider the situation where $H(.)$ is fixed to be the cumulative distribution function of the normal distribution while $g(.)$ is taken as the density function of the normal, Student's t, logistic, Laplace, or uniform distribution ([30]).

The univariate and multivariate skew symmetric families of distributions appear as a subset of the so-called selection models. Suppose $x$ is a $k$-dimensional random vector with density $f(x)$. The usual statistical analysis involves using a random sample $x_1, x_2, \ldots, x_n$ to make inferences about $f(x)$. Nevertheless, there are situations in which a weighted sample, instead of a random sample, is selected from $f(x)$, because it is difficult, costly, or even impossible to observe certain parts of the distribution. If we assume that a weight function, say $g(x)$, is used in selection, then the sample of observations may be thought of as coming from the following weighted density:
\[
h(x) = \frac{f(x)g(x)}{\int_{R^k} g(x)f(x)\,dx}. \tag{2.6}
\]
When the sample is only a subset of the population, the associated model is called a selection model. This kind of density originates from Fisher ([26]). Patil, Rao, and Zelen provide a survey of this family of distributions ([51]). In their recent article, Arellano-Valle, Branco, and Genton show that a very general class of skew symmetric densities, called fundamental skew distributions, emerges from selection models ([5]).
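The weighted density (2.6) also suggests a simple simulation scheme when the weight is bounded by 1: draw $x$ from $f$ and accept it with probability $g(x)$. A minimal sketch (assuming NumPy and SciPy) for the skew density $2\phi_2(x)\Phi(\alpha'x)$, where the weight is $\Phi(\alpha'x)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
alpha = np.array([2.0, -1.0])

# h(x) is proportional to f(x) w(x) with 0 <= w <= 1: draw x ~ f and keep it
# with probability w(x).  Here f = bivariate standard normal and
# w(x) = Phi(alpha'x), so the accepted points follow 2 phi_2(x) Phi(alpha'x).
x = rng.standard_normal((200_000, 2))
keep = rng.uniform(size=len(x)) < norm.cdf(x @ alpha)
sample = x[keep]
print(len(sample) / len(x))   # acceptance rate, about 1/2 by symmetry
print(sample.mean(axis=0))    # mean shifted in the direction of alpha
```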

2.2 Skew-Centrally Symmetric Densities

In this section, we study a family of multivariate skew symmetric densities generated by multivariate centrally symmetric densities.

Theorem 2.2.1. Let $g(.)$ be the $k$-dimensional jpdf of $k$ independent variables centrally symmetric around 0, let $H(.)$ be an absolutely continuous cumulative distribution function with $H'(.)$ symmetric around 0, let $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)'$ be a $k$-dimensional real vector, and let $e_j$ for $j = 1, 2, \ldots, k$ be the elementary vectors of $R^k$. Then
\[
f(y, \alpha) = 2^k g(y) \prod_{j=1}^{k} H(\alpha_j e_j'y)
\]
defines a probability density function.

Proof. First note that $f(y) \geq 0$ for all $y \in R^k$. We need to show that
\[
k(\alpha_1, \alpha_2, \ldots, \alpha_k) = \int_{R^k} 2^k g(y) \prod_{j=1}^{k} H(\alpha_j e_j'y)\,dy = 1. \tag{2.7}
\]
Observe that
\[
\frac{\partial}{\partial \alpha_\ell} k(\alpha_1, \alpha_2, \ldots, \alpha_k)
= \frac{d}{d\alpha_\ell} \int_{R^k} 2^k g(y) \prod_{j=1}^{k} H(\alpha_j e_j'y)\,dy
= \int_{R^k} 2^k y_\ell H'(\alpha_\ell e_\ell'y) \prod_{j\neq\ell} H(\alpha_j e_j'y)\, g(y)\,dy = 0.
\]
The first equality is true because of the Lebesgue dominated convergence theorem; the last equality is due to independence, through
\[
E_{g(y)}\Big(y_\ell H'(\alpha_\ell e_\ell'y) \prod_{j\neq\ell} H(\alpha_j e_j'y)\Big)
= E_{g(y)}\big(y_\ell H'(\alpha_\ell e_\ell'y)\big)\, E_{g(y)}\Big(\prod_{j\neq\ell} H(\alpha_j e_j'y)\Big) = 0,
\]
and the fact that, since $g(.)$ is centrally symmetric around 0, $y_\ell H'(\alpha_\ell e_\ell'y)$ is an odd function of $y_\ell$. Hence $k(\alpha_1, \alpha_2, \ldots, \alpha_k)$ is constant as a function of $\alpha_j$ for all $j = 1, 2, \ldots, k$; and when all $\alpha_j = 0$, $k(\alpha_1, \alpha_2, \ldots, \alpha_k) = 1$. This concludes the proof.
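The normalizing argument in the proof can be checked numerically: with $g$ the density of $k$ independent $N(0, 1)$ components and $H = \Phi$, the integral (2.7) equals 1 exactly when $E_g[\prod_j \Phi(\alpha_j y_j)] = 2^{-k}$. A minimal Monte Carlo sketch, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
alpha = np.array([2.0, -3.0])

# With g the density of two independent N(0,1) components,
# E_g[ prod_j Phi(alpha_j y_j) ] should equal 2^{-k} = 1/4.
y = rng.standard_normal((1_000_000, 2))
val = norm.cdf(alpha * y).prod(axis=1).mean()
print(val, 0.25)  # both close to 1/4
```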

In the next theorem, we relate the distribution of the even powers of a skew symmetric random variable to those of its kernel.

Theorem 2.2.2. Let $x$ be a random variable with probability density function $g(x)$, and let $y$ be the random variable with probability density function
\[
f(y, \alpha) = 2^k g(y) \prod_{j=1}^{k} H(\alpha_j e_j'y),
\]
where $g(.)$, $H(.)$ and $\alpha$ are defined as in Theorem 2.2.1. Then,
(a) the even moments of $y$ and $x$ are the same, i.e., $E(yy')^p = E(xx')^p$ for $p$ even and $E(y'y)^m = E(x'x)^m$ for any natural number $m$;
(b) $y'y$ and $x'x$ have the same distribution.

Proof. It suffices to show that $E(x_1^{n_1}x_2^{n_2}\cdots x_k^{n_k}) = E(y_1^{n_1}y_2^{n_2}\cdots y_k^{n_k})$ for $n_1, n_2, \ldots, n_k$ even. Let $\Psi_y(t)$ be the characteristic function of $y$. Then
\[
\Psi_y(t) = \int_{R^k} e^{it'y}\, 2^k g(y) \prod_{j=1}^{k} H(\alpha_j e_j'y)\,dy. \tag{2.8}
\]
Let $n_1 + n_2 + \ldots + n_k = n$. Taking the $n_j$th partial derivatives of (2.8) with respect to $t_j$ for $j = 1, 2, \ldots, k$ and putting $t = 0_k$,
\[
\frac{\partial^n \Psi_y(t)}{\partial t_1^{n_1}\partial t_2^{n_2}\cdots\partial t_k^{n_k}}\bigg|_{t=0_k}
= \int_{R^k} \Big[e^{it'y}\, 2^k i^n \prod_{j=1}^{k} H(\alpha_j e_j'y)\Big]\Big[\prod_{\ell=1}^{k} y_\ell^{n_\ell}\Big] g(y)\,dy\,\bigg|_{t=0_k}
= \int_{R^k} \Big[2^k i^n \prod_{j=1}^{k} H(\alpha_j e_j'y)\Big]\Big[\prod_{\ell=1}^{k} y_\ell^{n_\ell}\Big] g(y)\,dy. \tag{2.9}
\]
Taking the derivative of (2.9) with respect to $\alpha_m$,
\[
\frac{\partial}{\partial \alpha_m}\int_{R^k} \Big[2^k i^n \prod_{j=1}^{k} H(\alpha_j e_j'y)\Big]\Big[\prod_{\ell=1}^{k} y_\ell^{n_\ell}\Big] g(y)\,dy
= 2^k i^n E_{g(y)}\Big[y_m^{(n_m+1)} H'(\alpha_m e_m'y) \prod_{j\neq m} y_j^{n_j} H(\alpha_j e_j'y)\Big]
= 2^k i^n E_{g(y)}\big[y_m^{(n_m+1)} H'(\alpha_m e_m'y)\big]\, E_{g(y)}\Big[\prod_{j\neq m} y_j^{n_j} H(\alpha_j e_j'y)\Big]
= 0.
\]
The first equality is true because of the Lebesgue dominated convergence theorem, and the second is due to the independence of the components. The last equality is due to the fact that $y_m^{(n_m+1)} H'(\alpha_m y_m)$ is an odd function of $y_m$ (since $n_m$ is even) and $g(.)$ is centrally symmetric around 0. Therefore $E(y_1^{n_1}y_2^{n_2}\cdots y_k^{n_k})$, for $n_1, n_2, \ldots, n_k$ even, is constant as a function of $\alpha_m$. If all $\alpha_m = 0$, then $f(y) = g(y)$ and therefore $E(x_1^{n_1}x_2^{n_2}\cdots x_k^{n_k}) = E(y_1^{n_1}y_2^{n_2}\cdots y_k^{n_k})$; hence this equality holds for all $\alpha_m$. The required results follow immediately.

The first theorem aids in constructing families of densities. As an example, take the family of jpdf's generated by the uniform kernel.

Example 2.2.1. Multivariate skew uniform distribution. Take the uniform density on the interval $(-1, 1)$:
\[
u(z) = \frac{1}{2}, \quad z \in (-1, 1). \tag{2.10}
\]
A $k$-dimensional extension of this density is obtained by considering $k$ independent variables $z = (z_1, z_2, \ldots, z_k)'$, each with density $u(.)$. The density of $z$ is
\[
z \sim u^*(z) = \Big(\frac{1}{2}\Big)^k, \quad z \in (-1, 1)^k. \tag{2.11}
\]

A skew symmetric density can be obtained by using the cdf of the standard normal distribution, $\Phi(.)$. In Theorem 2.2.1 above, with $g(.) = u^*(.)$ and $H(.) = \Phi(.)$, the skew symmetric density becomes
\[
f(x, \alpha) = \begin{cases} \prod_{j=1}^{k} \Phi(\alpha_j e_j'x), & x \in (-1, 1)^k, \\ 0, & \text{elsewhere.} \end{cases}
\]

Figure 2.1: Surface of skew uniform density in two dimensions; $\alpha_1 = 2$, $\alpha_2 = -2$, $H(.) = \Phi(.)$.

Figures 2.1 to 2.3 show the flexibility of the skew uniform distribution; a minimal numerical sketch of the density follows.
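The sketch below (assuming NumPy and SciPy; the helper skew_uniform_pdf is introduced here only for illustration) evaluates the skew uniform density and crudely checks that it integrates to about 1 over $(-1, 1)^2$:

```python
import numpy as np
from scipy.stats import norm

alpha = np.array([2.0, -2.0])

def skew_uniform_pdf(x, alpha):
    """prod_j Phi(alpha_j x_j) on (-1, 1)^k, 0 elsewhere."""
    x = np.atleast_2d(x)
    inside = np.all(np.abs(x) < 1.0, axis=1)
    return np.where(inside, norm.cdf(alpha * x).prod(axis=1), 0.0)

# crude Riemann-sum check that the density integrates to about 1
g = np.linspace(-0.999, 0.999, 400)
xx, yy = np.meshgrid(g, g)
pts = np.column_stack([xx.ravel(), yy.ravel()])
h = g[1] - g[0]
print(skew_uniform_pdf(pts, alpha).sum() * h * h)  # approximately 1
```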

Figure 2.2: Surface of skew uniform density in two dimensions; $\alpha_1 = 2$, $\alpha_2 = 0$.

Figure 2.3: Surface of skew uniform density in two dimensions; $\alpha_1 = 2$, $\alpha_2 = 10$.

A skew normal density is obtained from the normal kernel in the following example.

Example 2.2.2. Multivariate skew normal densities. In Theorem 2.2.1 above, let $g(.) = \phi_k(.)$, where $\phi_k(.)$ is the $k$-dimensional standard normal density function, and let $H(.)$ and $\alpha$ be defined as in Theorem 2.2.1. We can construct $k$-dimensional joint p.d.f.'s of the form

\[
f(y, \alpha) = 2^k \phi_k(y) \prod_{j=1}^{k} H(\alpha_j e_j'y). \tag{2.12}
\]

Figure 2.4: The contours for the 2-dimensional skew normal pdf for different values of $\alpha$ (panels: $\alpha_1, \alpha_2 \in \{-5, 0, 5\}$) and for $H(.) = \Phi(.)$.

Using Theorem 2.2.2 we can relate some properties of the density introduced in Example 2.2.2 to those of its kernel density, $\phi_k(.)$. Let $x \sim \phi_k(x)$, and let $y$ be the random variable with probability density function
\[
f(y, \alpha) = 2^k \phi_k(y) \prod_{i=1}^{k} H(\alpha_i e_i'y).
\]

Figure 2.5: The surface for the 2-dimensional pdf of a skew normal variable for $\alpha_1 = 3$, $\alpha_2 = 5$, $H(.) = \Phi(.)$.

Then,

(a) the even moments of $y$ and $x$ are the same, i.e., $E(yy')^p = E(xx')^p$ for $p$ even and $E(y'y)^m = E(x'x)^m$ for any natural number $m$;
(b) $y'y$ and $x'x$ both have the $\chi^2_k$ distribution.

Example 2.2.3. Multivariate skew-Laplace densities. Take $g(.)$ to be the joint density of $k$ iid Laplace variables:
\[
g(x) = \frac{1}{2^k}\exp\Big(-\sum_{j=1}^{k} |x_j|\Big).
\]
Let $H(.)$ and $\alpha$ be defined as in Theorem 2.2.1. We get the following density:
\[
f(x, \alpha) = \exp\Big(-\sum_{j=1}^{k} |x_j|\Big) \prod_{j=1}^{k} H(\alpha_j e_j'x).
\]

2.2.1 Independent Observations, Location-Scale Family

In this section, we consider a location-scale family of skew symmetric random variables. But first observe the following.

Remark 2.2.1. The family of densities introduced by Equation 2.5 consists of different models from the family of densities introduced in Theorem 2.2.1, except for the case $k = 1$.

Remark 2.2.2. (Joint Density of a Random Sample). Let $g(.)$ be a density function symmetric about 0 and $H(.)$ be an absolutely continuous cumulative distribution function with $H'(.)$ symmetric around 0. Let $y_1, y_2, \ldots, y_n$ constitute a random sample from a distribution with density $f(y, \alpha) = 2g(y)H(\alpha y)$. Then the joint p.d.f. of $y = (y_1, y_2, \ldots, y_n)'$ is
\[
f(y, \alpha) = 2^n g^*(y) \prod_{i=1}^{n} H(\alpha y_i) = 2^n g^*(y) \prod_{i=1}^{n} H(\alpha e_i'y),
\]
where $g^*(y) = \prod_{i=1}^{n} g(y_i)$; this density belongs to the family of densities introduced in Theorem 2.2.1. Observe that the density of the random sample cannot be represented by the family of multivariate skew symmetric densities introduced by Equation 2.5, i.e., in the form $f(y, \alpha) = 2g(y)H(\alpha'y)$. This puts a doubt on the usefulness of the latter.

Figure 2.6: The surface for the 2-dimensional pdf of a skew Laplace variable for $\alpha_1 = 3$, $\alpha_2 = 5$, $H(.) = \Phi(.)$.

Remark 2.2.3. (Joint Density of Independent Skew-Symmetric Variables). Let $g(.)$ be a density function symmetric about 0 and $H(.)$ be an absolutely continuous cumulative distribution function with $H'(.)$ symmetric around 0. Let $z_j \sim f(z, \alpha_j) = 2g(z)H(\alpha_j z)$, with mean $\mu(\alpha_j)$ and variance $\sigma^2(\alpha_j)$, for $j = 1, 2, \ldots, k$ be independent variables. Then the joint p.d.f. of $z = (z_1, z_2, \ldots, z_k)'$ is
\[
f(z, \alpha) = 2^k g^*(z) \prod_{j=1}^{k} H(\alpha_j e_j'z),
\]
where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)'$, $g^*(z) = \prod_{j=1}^{k} g(z_j)$, and the $e_j$ are the elementary vectors of the coordinate system $R^k$. This density belongs to the family of densities introduced in Theorem 2.2.1. Note that the density of $z$ cannot be represented by the family of multivariate skew symmetric densities introduced by Equation 2.5.

Let $z \sim f(z, \alpha) = 2^k g^*(z) \prod_{j=1}^{k} H(\alpha_j e_j'z)$. By construction, the mean of $z$ is
\[
E(z) = \mu_0 = (\mu(\alpha_1), \mu(\alpha_2), \ldots, \mu(\alpha_k))',
\]
and the covariance is $C(z) = \Sigma_0 = diag(\sigma^2(\alpha_1), \sigma^2(\alpha_2), \ldots, \sigma^2(\alpha_k))$. Now let $\Sigma$ be a positive definite covariance matrix of dimension $k$, and let $y = \Sigma^{1/2}z$. Then
\[
y \sim f(y, \alpha, \Sigma) = \frac{2^k}{|\Sigma|^{1/2}}\, g^*(\Sigma^{-1/2}y) \prod_{j=1}^{k} H(\alpha_j e_j'\Sigma^{-1/2}y).
\]
By letting $\alpha_j \Sigma^{-1/2}e_j = \lambda_j$ for $j = 1, 2, \ldots, k$, we get
\[
f(y, \lambda_1, \lambda_2, \ldots, \lambda_k, \Sigma) = \frac{2^k}{|\Sigma|^{1/2}}\, g^*(\Sigma^{-1/2}y) \prod_{j=1}^{k} H(\lambda_j'y).
\]
This time, the mean of $y$ is $E(y) = \Sigma^{1/2}\mu_0$, and the covariance matrix is $C(y) = \Sigma^{1/2}\Sigma_0\Sigma^{1/2}$. Next, let $x = y + \mu$ where $\mu \in R^k$. The probability density function of $x$ is
\[
f(x, \alpha, \Sigma, \mu) = \frac{2^k}{|\Sigma|^{1/2}}\, g^*(\Sigma^{-1/2}(x - \mu)) \prod_{j=1}^{k} H(\alpha_j e_j'\Sigma^{-1/2}(x - \mu)).
\]
The mean of $x$ is $E(x) = \Sigma^{1/2}\mu_0 + \mu$, and the covariance matrix of $x$ is $C(x) = \Sigma^{1/2}\Sigma_0\Sigma^{1/2}$.

The results in the previous section lead to the following definition.

Definition 2.2.1. (Multivariate Skew-Symmetric Density). Let $g(.)$ be a density function symmetric about 0 and $H(.)$ be an absolutely continuous cumulative distribution function with $H'(.)$ symmetric around 0. A variable $x$ has a skew-symmetric distribution if it has probability density function
\[
f(x, \alpha, \Sigma, \mu) = \frac{2^k}{|\Sigma|^{1/2}}\, g^*(\Sigma^{-1/2}(x - \mu)) \prod_{j=1}^{k} H(\alpha_j e_j'\Sigma^{-1/2}(x - \mu)), \tag{2.13}
\]
where the $\alpha_j$ are scalars, $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)'$, $\mu \in R^k$, $\Sigma$ is a positive definite covariance matrix, and $g^*(y) = \prod_{j=1}^{k} g(y_j)$ is a $k$-dimensional density function symmetric around 0. We call this density the skew-symmetric density with location parameter $\mu$, scale parameter $\Sigma^{1/2}$, and shape parameter $\alpha$, and denote it by $ss_k^{g,H}(\mu, \Sigma^{1/2}, \alpha)$.

Note that we could have taken the matrix $\Sigma^{1/2}$ in the linear form $x = \Sigma^{1/2}z + \mu$ to be any $m \times k$ matrix (with $\mu \in R^m$). Then we do not always have a density for the random variable $x$; nevertheless, the distribution of $x$ is defined by the moment generating function if it exists. Let $z \sim ss_k^{g,H}(0_k, I_k, \alpha)$. The moment generating function of $z$ evaluated at $t_k \in R^k$, $M_z(t_k)$, can be obtained as follows:
\[
M_z(t_k) = E(e^{t_k'z}) = \int\!\!\cdots\!\!\int_{R^k} e^{t_k'z}\, 2^k g^*(z) \prod_{j=1}^{k} H(\alpha_j e_j'z)\,dz
= E_{g^*(z)}\Big(e^{t_k'z} \prod_{j=1}^{k} H(\alpha_j e_j'z)\Big).
\]
Let $x = Az + \mu$ for a constant $m \times k$ matrix $A$ and an $m$-dimensional constant vector $\mu$. The moment generating function of $x$ evaluated at $t_m \in R^m$, $M_x(t_m)$, can be obtained as follows:
\[
M_x(t_m) = e^{t_m'\mu}\,M_z(A't_m) = e^{t_m'\mu}\, E_{g^*(z)}\Big(e^{t_m'Az} \prod_{j=1}^{k} H(\alpha_j e_j'z)\Big).
\]

Definition 2.2.2. (Multivariate Skew Symmetric Random Variable). Let $g(.)$ be a density function symmetric about 0 and $H(.)$ be an absolutely continuous cumulative distribution function with $H'(.)$ symmetric around 0. Let $z_j \sim f(z, \alpha_j) = 2g(z)H(\alpha_j z)$ for $j = 1, 2, \ldots, k$ be independent variables. Then
\[
z = (z_1, z_2, \ldots, z_k)' \sim f(z, \alpha) = 2^k g^*(z) \prod_{j=1}^{k} H(\alpha_j e_j'z),
\]
where $g^*(z) = \prod_{j=1}^{k} g(z_j)$ and the $e_j$ for $j = 1, 2, \ldots, k$ are the elementary vectors of the coordinate system $R^k$. Let $A$ be an $m \times k$ constant matrix, and let $\mu$ be an $m$-dimensional constant real vector. The random variable $x = Az + \mu$ is distributed according to the multivariate skew symmetric distribution with location parameter $\mu$, scale parameter $A$, and shape parameter $\alpha$. We denote this by $x \sim SS_{m,k}^{g,H}(\mu, A, \alpha)$.

With this definition we can write the following property.

Property 2.2.1. Let $z_j \sim f(z, \alpha_j) = 2g(z)H(\alpha_j z)$, with mean $\mu(\alpha_j)$ and variance $\sigma^2(\alpha_j)$, for $j = 1, 2, \ldots, k$ be independent variables. Let $z = (z_1, z_2, \ldots, z_k)'$, $x = Az + \mu$, and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)'$. Then $x \sim SS_{m,k}^{g,H}(\mu, A, \alpha)$. By construction, the mean of $x$ is $E(x) = A\mu_0 + \mu$ and the covariance matrix of $x$ is $C(x) = A\Sigma_0 A'$, where $\mu_0 = (\mu(\alpha_1), \mu(\alpha_2), \ldots, \mu(\alpha_k))'$ and $\Sigma_0 = diag(\sigma^2(\alpha_1), \sigma^2(\alpha_2), \ldots, \sigma^2(\alpha_k))$.
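Definition 2.2.2 and Property 2.2.1 translate directly into a sampler: draw the independent skewed components $z_j$ and apply the affine map $x = Az + \mu$. A minimal sketch for the normal kernel with $H = \Phi$ (assuming NumPy and SciPy, whose skewnorm implements $2\phi(z)\Phi(\alpha z)$); the sample mean is compared with $A\mu_0 + \mu$ from Property 2.2.1:

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(5)
alpha = np.array([3.0, -2.0])
mu = np.array([1.0, 0.0])
A = np.array([[1.0, 0.0],
              [0.5, 2.0]])  # scale parameter (here m = k = 2)

# z has independent components z_j ~ 2 phi(z) Phi(alpha_j z);
# x = A z + mu then follows the skew symmetric law of Definition 2.2.2
n = 100_000
z = skewnorm.rvs(alpha, size=(n, 2), random_state=rng)
x = z @ A.T + mu

# compare the sample mean with A mu_0 + mu, where mu_0 has components
# mu(alpha_j) = sqrt(2/pi) * alpha_j / sqrt(1 + alpha_j^2)
delta = alpha / np.sqrt(1.0 + alpha**2)
mu0 = np.sqrt(2.0 / np.pi) * delta
print(x.mean(axis=0), A @ mu0 + mu)
```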

2.3 Multivariate Skew-Normal Distribution

2.3.1 Motivation: Skew-Spherically Symmetric Densities

Theorem 2.3.1. Let $g(.)$ be a $k$-dimensional spherical jpdf, i.e., one satisfying $g(x) = k(x'x)$ for all $x \in R^k$ for some density $k(r^2)$ defined for $r \geq 0$; let $H(.)$ be an absolutely continuous cumulative distribution function with $H'(.)$ symmetric around 0; let $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)'$ be a $k$-dimensional real vector; and let $e_j$ for $j = 1, 2, \ldots, k$ be the elementary vectors of $R^k$. Then
\[
f(y, \alpha) = c^{-1} g(y) \prod_{j=1}^{k} H(\alpha_j e_j'y) \tag{2.14}
\]
defines a probability density function, where $c = E_x\big(\prod_{j=1}^{k} P(z_j \leq \alpha_j x_j)\,\big|\,x\big)$ and $z_1, z_2, \ldots, z_k$ are identically distributed random variables with cdf $H(.)$, independent of $x \sim g(x)$.

Rk

cf (y)dy =

Z

Rk

R

g(y)

Rk

f (y)dy = 1. Write

k Y

H(αj e′j y)dy

j=1

k Y = Ex( P (zj ≤ αj xj )|x). j=1

The last equality holds because z1 , z2 , . . . , zk are identically distributed random variables with cdf H(.) independent of x ∼ g(x). This concludes the proof. When we choose g(.) = φk (.) and (z1 , z2 , . . . , zk )′ independent, this reduces to the denQ sity introduced in Example 2.2.1. To see this observe that Ex( kj=1 P (zj ≤ αj xj )|x) = Qk Qk 1 ′ j=1 P (zj ≤ αj xj ) = j=1 P (zj − αj xj ≤ 0) = 2k for both H (.) and φ1 (.) are sym-

metric zj − αj xj j = 1, 2, . . . , k have iid symmetric distribution.

2.3.2

Multivariate Skew-Normal Distribution

Definition In a recent article Chen and Gupta ([32]) pointed that neither of the multivariate skew-normal models (2.3, 2.4) cohere with the joint distribution of a random sample from a univariate skew-normal distribution and introduced an alternative multivariate skew-normal model which overcomes this problem ([32]):

k

SNk (y, α, Σ) = f (y, α, Σ) = 2 φk (y, Σ)

k Y i=1

Φ(λ′i y)

(2.15)

−1/2

k

where α ∈ R and Λ = (λ1 , λ2 , . . . , λk ) = Σ

38 diag(α1 , α2 , . . . , αk ), φk (., Σ) is the k-

dimensional normal density function with covariance Σ and Φ(.) denotes the cumulative distribution function of the univariate standard normal variable. Chen and Gupta’s skew normal model ([32]) is in the form of skew symmetric density introduced in Definition 2.2.1. First note that a random variable z with probability density function

f (z, α) = 2φ(z)Φ(αz)

where α is a real scalar, φ(.) is the univariate standard normal density function and Φ(.) denotes the cumulative distribution function of the univariate standard normal q α 2α2 2 variable has mean µ(α) = π2 √1+α 2 , variance σ (α) = (1 − π(1+α2 ) ). (Skewness: γ1 = 4−π (µ(α))3 , 2 (σ 2 (α))3/2

4

(µ(α)) Kurtosis: γ2 = 2(π − 3) (σ 2 (α))2 .)

Let zj ∼ f (z, αj ) = 2φ(z)Φ(αj z) for j = 1, 2, . . . , k be independent variables. Then the joint p.d.f. of z = (z1 , z2 , . . . , zk )′ is

f (z, α1 , α2 . . . , αk ) = 2k φk (z)

k Y

Φ(αj e′j z)

j=1

where φk (z) =

Qk

j=1

φ(zj ) is the k-dimensional standard normal variable and ej are

the elementary vectors of the coordinate system Rk . Mean of z is E(z) = µ0 = (µ(α1 ), µ(α2 ), . . . , µ(αk ))′ ,

and covariance is C(z) = Σ0 = diag(σ 2 (α1 ), σ 2 (α2 ), . . . , σ 2 (αk )).

39 Now let Σ be a positive definite covariance matrix, and let y = Σ

1/2

z. Then

k Y 2k −1/2 φk (Σ y) Φ(αj e′j Σ−1/2 y). y ∼ f (y, α, Σ) = |Σ|1/2 j=1

Let x = y + µ where µ ∈ Rk . The probability density function of x is k Y 2k −1/2 f (x, α, Σ, µ) = φk (Σ (x − µ)) Φ(αj e′j Σ−1/2 (x − µ)). |Σ|1/2 j=1

Density Definition 2.3.1. (Multivariate Skew Normal Density). We call the density

f (x, α, Σ, µ) =

k Y 2k −1/2 φ (Σ (x − µ)) Φ(αj e′j Σ−1/2 (x − µ)) k |Σ|1/2 j=1

(2.16)

as the density of a multivariate skew normal variable with location parameter µ, scale parameter Σ1/2 , and shape parameter α = (α1 , α2 . . . , αk )′ ∈ Rk . and denote it by snk (µ, Σ1/2 , α). Remark 2.3.1. Neither models (2.2, 2.3) can represent the joint density of independent univariate variables xj ∼ SN1 (σj , αj ) except for some special cases. The joint density of independent univariate skew normal variables belong the new family of densities introduced by Definition 2.3.1. We can take the matrix Σ1/2 in the linear form x = Σ1/2 z +µ to be any (m×k) matrix. Then we do not have a density for the random variable x in all cases, nevertheless the distribution of x is defined by the moment generating function if it exists.

Moment Generating Function We need the following lemmas: See Zacks ([61]) and Chen and Gupta ([17]).

40 Lemma 2.3.1. Let z ∼ φk (z). For scalar b, a ∈ R , and for Σ a positive definite k

b matrix of order k E(Φ(b + a′ Σ1/2 z)) = Φ( (1+a′ Σa) 1/2 ).

Lemma 2.3.2. Let Z ∼ φk×n (Z). For scalar b, a ∈ Rk , and for A and B positive definite matrices of order k and n respectively, E(Φn (b + a′ AZB)) = Φn (a, (1 + a′ AA′ a)1/2 B). Let z ∼ snk (0k , Ik , α). The moment generating function of z evaluated at tk ∈ Rk is Mz (tk ) can be obtained as follows.

t′k z

Mz (tk ) = Eφk (z) (e

k Y

Φ(αj e′j z))

j=1

2k = (2π)k/2

Z

2k (2π)k/2

Z

=

2k e = (2π)k/2 1

=



2k e 2 tk tk (2π)k/2 1 t ′t 2 k k

2k e (2π)k/2 k

=2 e

e

Rk

1 t ′t 2 k k

=

− 21 z′ z+tk ′ z

1 t ′t 2 k k

=2 e

1 t ′t 2 k k

Φ(αj e′j z)dz

j=1

1





e− 2 (z z−2tk z)

Rk

k Y

Φ(αj e′j z)dz

j=1

Z

e

Z

e− 2 (z−tk ) (z−tk )

− 21 (z′ z−2tk ′ z+tk ′ tk )

Rk 1

Φ(αj e′j z)dz



k Y

Φ(αj e′j z)dz

j=1

k Z Y

1

2

e− 2 (zj −(tk )j ) Φ(αj z j )dzj

R

j=1

k Z Y k Y

k Y j=1

Rk

j=1

k

k Y

R

p

1 (2π)

1

2

e− 2 (yj ) Φ(αj y j + αj (tk )j )dyj

αj (tk )j Φ( p ) 2) (1 + α j j=1

Let x = Az + µ for constant (m × k) matrix A and m dimensional constant vector µ. Then the moment generating function of x evaluated at tm ∈ Rm is Mx(tm ) can be

41 obtained as follows. t′m µ

Mx(tm ) = e

k t′m µ+ 21 t′m AA′ tm



Mz (A tm ) = 2 e

k Y

αj (A′ tm )j ) Φ( p (1 + αj 2 ) j=1

Hence the following definition and theorem. Definition 2.3.2. (Multivariate Skew Normal Random Variable) Let zj ∼ 2φ(z)Φ(αj z) for j = 1, 2, . . . , k be independent univariate skew normal random variables. Then



k

z = (z1 , z2 , . . . , zk ) ∼ 2 φk (z) where φk (z) =

Qk

j=1

k Y

Φ(αj e′j z)

j=1

φ(zj ) and e′j are the elementary vectors of the coordinate sys-

tem Rk . Let A be a m × k constant matrix, and µ be a k-dimensional constant real vector. A m dimensional random variable x = Az + µ is distributed with respect to multivariate skew symmetric distribution with location parameter µ, scale parameter A, and k dimensional shape parameter α = (α1 , α2 . . . , αk )′ . We denote this by x ∼ SNm,k (µ, A, α). Theorem 2.3.2. If x has multivariate skew-normal distribution SNm,k (µ, A, α) then the moment generating function of x evaluated at tm ∈ Rm is given by k t′m µ+ 12 t′m AA′ tm

Mx(tm ) = 2 e

k Y

αj (A′ tm )j Φ( p ). (1 + αj 2 ) j=1

(2.17)

Moments The expectation and covariance of a multivariate skew normal variable x with distribution SNm,k (µ, A, α) are easily calculated from the expectation and variance of the

42 univariate skew normal variable: Let z ∼ snk (0k , Ik , α). Then, 

√ α1

 1+α21  r  √ α2 2  1+α22 A E(x) = AE(z) + µ =  .. π  .   √ αk 2

1+αk

Cov(x) = ACov(z)A′ = A(Ik −



     + µ,    

α12 α22 αk2 2 diag( , , . . . , ))A′ . 2 2 2 π 1 + α1 1 + α2 1 + αk

Given its jmgf, the cumulants of the multivariate skew normal distribution with µ = 0 can be calculated. The cumulant generating function is given by k X 1 ′ αj (A′ t)j ′ )). Kx(t) = log(Mx(t)) = k log(2) + t AA t log(Φ( p 2 (1 + αj 2 ) j=1

α e′ A′ t

α e′ A′

∂Kx(t) |t=0k = AA′ t + ∂t

j j j k √ φ( √ j 2 ) X (1+αj 2 ) (1+αj )

αj e′ A′ t

j=1



√ α1  1+α21  r  √ α2 2  1+α22 = A  .. π  .   √ αk 2

1+αk

∂ 2 Kx(t) |t=0k = AA′ + ∂t∂t′

k X

= A(Ik −

j=1



Φ( √ j 2 ) (1+αj ) 

|t=0k

        

αj e′j A′ t

φ′ ( √

)



αj e′j A′ t

φ( √

)

2 

αj2 (1+αj 2 ) (1+αj 2 )    ′ ′ Ae ⊗ e A − |   j ′ j αj e A ′ t αj e′ A′ t   t=0k 1 + αj2 ) ) Φ( √ j Φ( √ j (1+αj 2 )

α12 α22 αk2 2 diag( , , . . . , ))A′ . π 1 + α12 1 + α22 1 + αk2

(1+αj 2 )

43 p k 8 4 (2) X αj ∂ 3 Kx(t) |t=0k = ( − √ ) (q )3 Aej ⊗ e′j A′ ⊗ Aej . ′ ∂t∂t ∂t π π j=1 2 1+α j

Linear Forms By Definition 2.3.2 we can write z ∼ SNk,k (0k , Ik , α), and prove the following theorems. Theorem 2.3.3. Assume that y ∼ SNm,k (µ, A, α) and x = By + d with B a (l × m) matrix and d is a l-dimensional real vector. Then x ∼ SNl,k (Bµ + d, BA, α). Proof. From assumption we have y = Az + µ, and so x = (BA)z + (Bµ + d), i.e., x ∼ SNl,k (Bµ + d, BA, α). Corollary 2.3.1. Assume that y ∼ snk (µ, A, α) where A is a full rank square matrix of order k. Let Σ = AA′ , and let Σ1/2 be the square root of Σ as defined in Section d

2.1.1. Assume x ∼ SNk,k (µ, Σ1/2 , α). Then x − µ = O(y − µ) where O = Σ1/2 A−1 . Corollary 2.3.2. Assume that y ∼ SNm,k (µ, A, α) and let 

(1)





(1)



 y  y= , y (2)  µ  µ= , µ(2) 



 A11 A12  A=  A21 A22 (1)

(1)

where y , µ

are l-dimensional, A11 is (l×l). Then y

(1)

  (1) ∼ SNl,k µ , ( A11 A12 ), α .

Proof. In theorem above put B =



Il 0l×(m−l)



44 and d = 0l .

(2) (2) Corollary 2.3.3. y (1) ∼ SNm,k (µ(1) , A, α(1) ), y (2) ∼ SNm,k (µ , B,  α ) are inde(1)  α  (1) (2) (1) (2) pendent, then x = y + y has SNm,2k (µ + µ , [A, B],  ) distribution. α(2)

Corollary 2.3.4. Let y (i) ∼ SNm,k (µ, A, α), be independent for i = 1, 2, . . . , n, then    α     α  Pn   S = i=1 y (i) has SNm,nk (nµ, [A, A, . . . , A],  . ) distribution. The jmgf can be  ..      α written as k Y αj (A′ tm )j n nk t′m nµ+ 21 nt′m AA′ tm (Φ( p )) . MS (tm ) = 2 e (1 + αj 2 ) j=1 The jmgf of S/n is 1 ′ nk t′m µ+ 21 n tm AA′ tm

MS/n (tm ) = 2 e

k Y αj (A′ tm )j ))n . (Φ( p 2 n (1 + αj ) j=1

Some Illustrations To illustrate the preceding theory, we skew normal distribution.  consider the bivariate   a11 . . . a1k  Assume (x1 , x2 )′ ∼ SN2,k ((µ1 , µ2 )′ ,   , (α1 , . . . , αk )′ ). The mean vector a21 . . . a2k is    a11 . . . a1k  ′ ′ E((x1 , x2 )′ ) =   (µ(α1 )0 , . . . , µ(αk )0 ) + (µ1 , µ2 ) , a21 . . . a2k 



 a11 µ(α1 )0 + . . . + a1k µ(αk )0 + µ1  =  a21 µ(α1 )0 + . . . + a2k µ(αk )0 + µ2

(2.18)

45 covariance can be written as 





′

 a11 . . . a1k   a11 . . . a1k  2 2 C((x1 , x2 )′ ) =   diag(σ (α1 ), . . . , σ (αk ))   a21 . . . a2k a21 . . . a2k 



 =

2



2

′

 a11 σ (α1 ) . . . a1k σ (αk )   a11 . . . a1k  =   a21 σ 2 (α1 ) . . . a2k σ 2 (αk ) a21 . . . a2k a211 σ 2 (α1 )

2

2



a11 a21 σ (α1 ) + . . . + a1k a2k σ (αk )   (2.19) a11 a21 σ 2 (α1 ) + . . . + a1k a2k σ 2 (αk ) a221 σ 2 (α1 ) + . . . + a22k σ 2 (αk )

where µ(α) =

pπ 2

+ ... +

a21k σ 2 (αk )

√ α , 1+α2

and σ 2 (α) = (1 −

2α2 ). π(1+α2 )

First, let’s take k = 1. Then the mean vector becomes E((x1 , x2 )′ ) = (a11 , a21 )′ µ(α1 )0 + (µ1 , µ2 )′ = (µ(α1 )0 a11 + µ1 , µ(α1 )0 a21 + µ2 )′ (2.20)

r r π α1 a11 π α1 a21 p p =( + µ + µ2 )′ 1, 2 1 + α12 2 1 + α12

(2.21)

covariance can be written as

C((x1 , x2 )′ ) = (a11 , a12 )σ 2 (α1 )(a11 , a12 )′ = (1 −





2 a12 a21  2α12  a11 )   . (2.22) 2 π(1 + α1 ) 2 a12 a21 a22

For k = 2, the mean vector becomes 



 a11 a12  ′ ′ E((x1 , x2 )′ ) =   (µ(α1 )0 , µ(α2 )0 ) + (µ1 , µ2 ) a21 a22

(2.23)

46 r r r r π α1 a11 π α2 a12 π α1 a21 π α2 a22 p p p p + + µ1 , + + µ2 )′ =( 2 2 2 2 1 + α1 2 1 + α2 2 1 + α1 2 1 + α22

(2.24)

covariance can be written as 





′

 a11 a12   a11 a12  2 2 C((x1 , x2 )′ ) =   diag(σ (α1 ), σ (α2 ))   a21 a22 a21 a22 

 =



σ 2 (α1 )a211 + σ 2 (α2 )a212

a11 a21 σ 2 (α1 ) + a12 a22 σ 2 (α2 )  . 2 2 2 2 2 2 a11 a21 σ (α1 ) + a12 a22 σ (α2 ) σ (α1 )a21 + σ (α2 )a22

(2.25)

(2.26)

For k = 3, the mean is 



 a11 µ(α1 )0 + a12 µ(α2 )0 + a13 µ(α3 )0 + µ1   , a21 µ(α1 )0 + a22 µ(α2 )0 + a23 µ(α3 )0 + µ2

(2.27)

covariance is   

a211 σ 2 (α1 )

+

a213 σ 2 (α3 )

2

2



a11 a21 σ (α1 ) + a13 a23 σ (α3 )   a11 a21 σ 2 (α1 ) + a13 a23 σ 2 (α3 ) a221 σ 2 (α1 ) + a223 σ 2 (α3 )

(2.28)

47 Independence, Conditional Distributions Theorem 2.3.4. Assume that y ∼ snk (µ, Σ1/2 , α) and let 

(1)



 y  y=  (2) y   (1)  µ  µ=  µ(2)   (1)  α  α= , (2) α   1/2 1/2  Σ11 Σ12  Σ1/2 =   1/2 1/2 Σ21 Σ22 1/2

1/2

1/2

where y (1) , µ(1) , α(1) are l-dimensional, Σ11 is (l × l). If Σ12 = 0 and Σ21 = 0 then 1/2

1/2

y (1) ∼ snl (µ(1) , Σ11 , α(1) ) is independent of y (2) ∼ snk−l (µ(2) , Σ22 , α(2) ). Proof. The inverse of Σ1/2 is 

 Σ−1/2 = 

−1/2 Σ11

0

0 −1/2

Σ22



 .

Note that the quadratic form in the exponent of snk (µ, Σ1/2 , α) is (1) (2) Q = (y − µ)′ Σ−1 (y − µ) = (y (1) − µ(1) )′ Σ−1 − µ(1) ) + (y (2) − µ(2) )′ Σ−1 − µ(2) ). 11 (y 22 (y

Also k Y j=1

Φ(αj e′j Σ−1/2 (x−µ)) =

l Y j=1

−1/2

Φ(αj e′j Σ11 (x(1) −µ(1) ))

k Y

j=l+1

−1/2

Φ(αj e′j Σ22 (x(2) −µ(2) ))

48 and |Σ| = |Σ11 ||Σ22 |. The density of y can be written as snk (µ, Σ1/2 , α) =

l Y 2l −1/2 −1/2 (1) (1) φ (Σ (y − µ )) Φ(αj e′j Σ11 (y (1) − µ(1) )) l 11 |Σ11 |1/2 j=1

k−l Y 2k−l −1/2 −1/2 (2) (2) × φk−l (Σ22 (y − µ )) Φ(αj e′j Σ22 (y (2) − µ(2) )) 1/2 |Σ22 | j=1 1/2

1/2

= snl (µ(1) , Σ11 , α(1) )snk−l (µ(2) , Σ22 , α(2) ).

Since the ordering of variables are irrelevant, the above discussion also proves the following corollary: Corollary 2.3.5. Let y ∼ snk (µ, Σ1/2 , α) and assume that [i, j] partitions the indices k = 1, 2, . . . , k such that (Σ1/2 )(ij) = 0 for all i ∈ i and all j ∈ j. Then, the marginal joint distribution of variables with indices in i is skew normal with location, scale, and shape parameters are obtained by taking the corresponding components of µ, Σ1/2 and α, respectively. We can also prove the following: Corollary 2.3.6. Let y ∼ snk (µ, A, α) write A = (a′1 , a′2 , . . . , a′k )′ and assume that [i, j] are two disjoint subsets of the indices k = 1, 2, . . . , k such that a′i aj = 0 for all i ∈ i and all j ∈ j. Let Ai denote the matrix of rows of A with indices in i and Aj be the matrix of rows of A with indices in j. Then, then there exists a k dimensional skew normal variable, say w, that can be obtained by a nonsingular linear transformation to y, so that for w the variables with indices in i are independent of the variables with indices in j. Proof. The hypothesis about the matrix of linear transformation A requires that AA′ = Σ to be such that (Σ)ij = 0 for all i ∈ i and all j ∈ j. Let x ∼ snk (µ, Σ1/2 , α), and for

49 this distribution we have the variables with indices in i independent of the variables d

with indices in j by Corollary 2.3.5. By corollary 2.3.1, we can write y − µ = O(x − µ) for O = Σ1/2 A−1 . This concludes the proof. Corollary 2.3.7. Assume that x ∼ snk (µ, A, α) and let 







(1)  x  x= , (2) x

(1)  µ  µ= , (2) µ





 A11 A12  A=  A21 A22 where x(1) , µ(1) are l-dimensional, A11 is (l × l). Let 

 Il C= 0

−A12 A−1 22 Ik−l



 .

Consider the variable y = (CAA′ C ′ )1/2 (CA)−1 (Cx − Cµ). Write



(1)



 y  y=  y (2) where y (1) is l-dimensional. Then y ∼ snk (0k , (CAA′ C ′ )1/2 , α), and y (1) is independent of y (2) .

50 Corollary 2.3.8. Assume that x ∼

snk (µ, Σc1/2 , α) 

(1)





(1)



and let

 x  x= , x(2)  µ  µ= , µ(2) 



(1)  α  α= , (2) α





 A11 0  Σ1/2 =  c A21 A22 where x(1) , µ(1) are l-dimensional, A11 is (l × l) lower triangular matrix with positive diagonal elements, A22 is ((k−l)×(k−l)) lower triangular matrix with positive diagonal elements. Let



 C= Consider the variable



Il

0  .

−A21 A−1 Ik−l 11 

(1)



 x  y = Cx =  . y (2) (1) Then x(1) is independent of y (2) = x(2) − A21 A−1 which has 11 x (1) (2) snk−l (µ(2) − A21 A−1 11 µ , A22 , α ) distribution. The joint distribution of y is given by

(1) (2) snl (µ(1) , A11 , α(1) )snk−l (µ(2) − A21 A−1 11 µ , A22 , α ).

(1) The density of x then can be obtained by substituting y (2) = x(2) − A21 A−1 11 x , the

51 density of x is l Y 2l (1) −1 (1) (1) φk (A11 (x − µ )) Φ(αj e′j A11 −1 (x(1) − µ(1) )) |A11 | j=1

2k−l (1) φk (A22 −1 (x(2) − (µ(2) + A21 A−1 − µ(1) )))) 11 (x |A22 | k−l Y (2) (1) × Φ(αj e′j A22 −1 (x(2) − (µ(2) + A21 A−1 − µ(1) )))). 11 (x

×

j=1

(1) (1) (2) The conditional distribution of x(2) given x(1) is snk−l (µ(2) +A21 A−1 11 (x −µ ), A22 , α ).

2.3.3

Generalized Multivariate Skew-Normal Distribution

We can define a multivariate skew normal density with any choice of skewing cdf H that has properties in Theorem 2.2.1. In the next section, we define and study some properties of this generalized multivariate skew normal density Definition 2.3.3. (Generalized Multivariate Skew Normal Density) Let x have a multivariate skew symmetric distribution with kernel φk (.) and skewing distribution H(.) defined as in Definition 2.2.1 with location parameter µ = 0k , scale parameter Σ1/2 , and shape parameter α = (α1 , α2 . . . , αk )′ ∈ Rk ., i.e. the density of x is k Y 2k −1/2 φk (Σ x) f (x, α, Σ, µ = 0k ) = H(αj e′j Σ−1/2 x). |Σ|1/2 j=1

We call this density the generalized skew normal density, and denote it by gsnH k (µ, Σ, α). Definition 2.3.4. (Generalized Multivariate Skew Normal Random Variable) Let H(.) be defined as in Theorem 2.2.1. Let zj ∼ 2φ(z)H(αj z) for j = 1, 2, . . . , k be independent univariate skew symmetric random variables. Then

z = (z1 , z2 , . . . , zk )′ ∼ 2k φk (z)

k Y j=1

H(αj e′j z)

where φk (z) =

Qk

j=1

φ(zj ) and

e′j

52 are the elementary vectors of the coordinate system

Rk . Let A be a m × k constant matrix, and µ be a k-dimensional constant real vector. A m dimensional random variable x = Az + µ is distributed with respect to multivariate skew symmetric distribution with location parameter µ, scale parameter A, and k dimensional shape parameter α = (α1 , α2 . . . , αk )′ . We denote this by H x ∼ GSNm,k (µ, A, α).

Theorem 2.3.5. If x has generalized multivariate skew symmetric distribution H GSNm,k (µ, A, α) then the moment generating function of x evaluated at tm ∈ Rm is

given by k tm µ+ 12 tm ′ AA′ tm

Mx(tm ) = 2 e

k Y

Eφ(y) (H(αj y j + αj (A′ tm )j ))

(2.29)

j=1

Proof. If z has generalized multivariate skew symmetric density gsnH k (0k , Ik , α) then the moment generating function of z evaluated at tk ∈ Rk is given by

53

t′k z

Mz (tk ) = Eφk (z) (e

k Y

H(αj e′j z))

j=1

2k (2π)k/2

Z

2k = (2π)k/2

Z

=

1

=



1 t ′t 2 k k

2k e (2π)k/2

1 t ′t 2 k k

2k e (2π)k/2 k

=2 e

=2 e



1 t ′t 2 k k

1 t ′t 2 k k

k Y

H(αj e′j z)dz

j=1

− 21 (z′ z−2tk ′ z)

e

Rk

k Y

H(αj e′j z)dz

j=1

Z

− 21 (z′ z−2tk ′ z+tk ′ tk )

e

Rk

Z

1



Rk

H(αj e′j z)dz

k Y

H(αj e′j z)dz

j=1

k Z Y

1

2

e− 2 (zj −(tk )j ) H(αj z j )dzj

R

j=1

k Z Y k Y

k Y j=1

e− 2 (z−tk ) (z−tk )

j=1

k



Rk

2k e 2 tk tk = (2π)k/2 =

1

e− 2 z z+tk z

R

1 1 2 p e− 2 (yj ) H(αj y j + αj (tk )j )dyj (2π)

Eφ(y) (H(αj y j + αj (tk )j ))

j=1

Let x = Az + µ for constant (m × k) matrix A and m dimensional constant vector µ. Then the moment generating function of x evaluated at tm ∈ Rm is Mx(tm ) can be obtained as follows. t′m µ

Mx(tm ) = e



k t′m µ+ 21 tm ′ AA′ tm

Mz (A tm ) = 2 e

k Y

Eφ(y) (H(αj y j + αj (A′ tm )j ))

j=1

By Definition 2.3.4 we can write z ∼ gsnH k,k (0k , Ik , α), and prove the following theorem. H Theorem 2.3.6. Assume that y ∼ GSNm,k (µ, A, α) and x = By +d with B a (l ×m) H matrix and d is a l-dimensional real vector. Then x ∼ GSNl,k (Bµ + d, BA, α).

54 Proof. From assumption we have y = Az + µ, and so x = (BA)z + (Bµ + d), i.e., H x ∼ GSNl,k (Bµ + d, BA, α). H Theorem 2.3.7. Assume that y ∼ GSNm,k (µ, A, α) and let



(1)





(1)



 y  y= , y (2)  µ  µ= , µ(2) 



 A11 A12  A=  A21 A22 H where y (1) , µ(1) are l-dimensional, A11 is (l × l). Then y (1) ∼ GSNl,k (µ(1) , A11 , α).

Proof. In theorem above put B =   (1) H GSNl,k µ , ( A11 A12 ), α .



Il 0l×(m−l)



and d = 0l . Then we obtain y (1) ∼

In the rest of this section we study some members of the generalized multivariate skew normal distribution. Different choices for the skewing function H(.) will give different distribution families. We will take H(.) to be the cdf of Laplace, logistic, or uniform in the sequel. Some properties of these distributions are studied.

Multivariate Skew Normal-Laplace Distribution The cdf of Laplace distribution is given by

H1 (x) =

   1 − 1 exp(−x), x ≥ 0, 2  

1 exp(x), x 2

αu has density f (y, α) as given in (2.30) above. Corollary 2.3.12. Let zj ∼ f (z, αj ) = 2g(z)H(αj z)

61 for j = 1, 2, . . . , k be independent variables. The joint p.d.f. of z = (z1 , z2 , . . . , zk ) is

f (z, α) = 2k g ∗ (z)

k Y

H(αj e′j z)

j=1

Qk

where g ∗ (z) =

j=1

g(zj ) and e′j are the elementary vectors of the coordinate system 1

Rk . For µ ∈ Rk Let Σ1/2 be a positive definite matrix, x = Σ 2 z + µ has p.d.f. 1/2

f (x, α, Σ

k Y 2k ∗ −1/2 , µ) = g (Σ (x − µ)) H(αj e′j Σ−1/2 (x − µ)). 1/2 |Σ| j=1

(2.31)

A random variable y with density f (y, α, Σ1/2 , µ) has the same distribution with x. Using Theorem 2.3.8 and Corrolary 2.3.12, we can generate skew symmetric random vectors from snk (µ, Σ1/2 , α) density.

2.4

Extensions: Some Other Skew Symmetric Forms

In Theorem 2.2.1, and 2.2.2 we have seen that a function of the form

−1

f (y, α) = c g(y)

k Y

H(αj yj )

j=1

is a jpdf for a k−dimensional random variable if certain assumptions are met. Genton and Ma ([45]) obtain a flexible family of densities by replacing αy in this density by an odd function, say w(y). This motivates the following theorem. Theorem 2.4.1. Let g(.) be the k-dimensional density centrally symmetric around 0k (i.e. g(x) = g(−x)), H(.) be a absolutely continuous cumulative distribution function

62 ′

with H (.) is symmetric around 0, let w(.) be an odd function. Then

f (y) = c−1 g(y)

k Y

H(w(e′j y))

(2.32)

j=1

Q is a jpdf for a k−dimensional random variable where c = Ex( kj=1 P (zj ≤ w(xj )|x)

for (z1 , z2 , . . . , zk )′ iid random variables with cdf H(.) independent of x ∼ g(x). Proof. g(y)

Qk

j=1

H(w(yj )) is positive. We need to calculate c.

c=

Z

g(y)

Rk

k Y

H(w(yj ))dy

j=1

k Y = Ex( P (zj ≤ w(xj )|x) j=1

since (z1 , z2 , . . . , zk )′ iid random variables with cdf H(.) independent of x ∼ g(x). Observe that when g(x) is a density for k independent variables x1 , x2 , . . . , xk , for P example, when we choose g(x) = 21k , g(x) = φk (x), or g(x) = 21k exp(− kj=1 |xj |) like Q Q in Examples 2.2.1, 2.2.2, and 2.2.3 correspondingly, kj=1 P (zj ≤ w(xj )) = kj=1 P (zj − w(xj ) ≤ 0) becomes

1 2k

since assuming both H ′ (x) and φ1 (x) are symmetric implies

that w(xj ) and therefore zj − w(xj ) for j = 1, 2, . . . , k have independent and identical symmetric distributions. Some examples of models motivated by 2.4.1 are presented next: Example 2.4.1. Take w(x) = √ λ1 x

1+λ2 x2

, we obtain a multivariate form of the skew

symmetric family introduced by Arellanno-Valle at al. ([7]). This density is illustrated in two dimensions in Figure 2.7. Example 2.4.2. Take w(x) = αx + βx3 , we obtain a multivariate form of the skew symmetric family introduced by Ma and Genton ([45]). This density is illustrated in two dimensions in Figure 2.8.

63

λ11=2, λ12=−4 λ21=50, λ22=15

0.4

0.3

0.2

0.1

0 4 2 0

−4 −2 −2

0 2

x1

−4 4

x2

Figure 2.7: The surface for 2 dimensional pdf for skew-normal variable in Example 2.4.1 for α1 = 2, α2 = 2, β1 = 5, β2 = 10:

64

Example 2.4.2 with α1=2, α2=1 β1=5, β2=10

4 0.5 3 2 0 4

1 0

3 2 1

−1

0 x2

−1 −2

x1

−2

Figure 2.8: The surface for 2 dimensional pdf for skew-normal variable in Example 2.4.2 for α1 = 2, α2 = 2, β1 = 5, β2 = 10:

65 α1=2, α2=4 λ1=1, λ2=2

0.25 0.2 0.15 0.1 0.05 0 4 4

2 2 0 x2

0 −2

−2

x1

Figure 2.9: The surface for 2 dimensional pdf for skew-normal variable in Example 2.4.3 for α1 = 2, α2 = 4, λ1 = 1, λ2 = 2: Example 2.4.3. Take w(x) = sign(x)|x|α/2 λ(2/α)1/2 , we obtain a multivariate form of the skew symmetric family introduced by DiCiccio and Monti ([23]). This density is illustrated in two dimensions in Figures 2.9, 2.10, 2.11 and 2.4.3. If we take the linear transformation of this variable we get the following graphs: By the definition of multivariate weighted distribution in Equation 2.6, the skewing function of the multivariate variate skew normal density in Definition 2.3.1, i.e. k Y j=1

Φ(αj e′j (A−1 (x − µ))),

can be replaced by Φm (ΓA−1 (x − µ))

66

α1=2, α2=4 λ1=1, λ2=2 2.5 2 1.5

x2

1 0.5 0 −0.5 −1 −1.5 −1.5

−1

−0.5

0

0.5

1

1.5

2

2.5

x1

Figure 2.10: The contours for 2 dimensional pdf for skew-normal variable in Example 2.4.3 for α1 = 2, α2 = 4, λ1 = 1, λ2 = 2:

67

α1=2, α2=4 λ1=1, λ2=2

0.2 0.15 0.1 0.05 0 −5

0

−4 5 x1

−2

0

2

4

6

8

x2

Figure 2.11: The surface for 2 dimensional pdf for skew-normal variable inExample 2.4.3  1 0 for α1 = 2, α2 = 4, λ1 = 1, λ2 = 2 with location µ = (0, 0)′ , Σ1/2 = 1 2

68

α1=2, α2=4 λ1=1, λ2=2 6 5 4

x2

3 2 1 0 −1 −2 −1

−0.5

0

0.5

1

1.5

2

2.5

x1

Figure 2.12: The contours for 2 dimensional pdf for skew-normal variable inExample 2.4.3  1 0 . for α1 = 2, α2 = 4, λ1 = 1, λ2 = 2 with location µ = (0, 0)′ , Σ1/2 = 1 2

69 where Γ is a matrix of dimension m × k. In this case, the normalizing constant 2 will k

have to be changed to Ez (P (y < ΓA−1 (z − µ))|z) and for y ∼ φk (0k , Im ) and z ∼ φk (µ, AA′ ) this is equal to Φm (0; Im + ΓΓ′ ).

This extension to multivariate skew normal density is given by Gupta et al. [34] and is similar to the fundamental skew distribution of Arellano-Valle, Branco, and Genton ([5]). Note that if k = m and Γ is diagonal then we are back to the density in Definition 2.3.1.

70

CHAPTER 3 A Class of Matrix Variate Skew Symmetric Distributions

3.1

Distribution of Random Sample

In order to make inferences about the population parameters of a k−dimensional distribution, we work with a random sample of n individuals from this population which can be represented by a k × n matrix X. The typical multivariate analysis assumes independence among the individuals. However, in many situations this assumption may be too restrictive. Matrix variable distributions can account for dependence among individual observations. In the normal case, the matrix variate density of X is written as ([36]) etr(− 12 (AA′ )−1 (X − M )(B ′ B)−1 (X − M )′ ) . φk×n (X; M, AA , B B) = (2π)nk/2 |AA′ |n/2 |B ′ B|k/2 ′



(3.1)

Like in the multivariate case the matrix A determines how the variables are related, also a matrix B is introduced to account for dependence among individuals. Note that, for the density in Equation 3.1 be defined, AA′ and B ′ B should be positive definite. In

71 practice, however, there will be natural restrictions on the parameter space. First, the knowledge about the sample design through which the variable X is being observed can be reflected to the parameter space. Second, simplifying assumptions about the relationship of k variables, like independence or zero correlation assumptions, can be utilized to achieve model parsimony. Finally, if there are no restrictions on AA′ or B ′ B then these parameters are confounded ([37]). A detailed investigation of the sources, magnitude and impact of errors is necessary to identify how survey design and procedures may be improved. For example, most data collection and sample designs involve some overlapping between interviewer workload and the sampling units (clusters). For those cases, a proportion of the measurement variance which is due to interviewers is reflected to some degree in the sampling variance calculations. The variable effects that interviewers have on respondent answers are sometimes labeled the correlated response variance ([14]). In the next example, we illustrate the use of matrix variate normal distribution to account for the correlated response variance. Example 3.1.1. Suppose that the k−dimensional random variable x has spherical multivariate normal density over a population, i.e. x ∼

1 φ ((σIk )−1 x). σk k

In order to

make inferences about the scale parameter σ 2 a random sample of individuals from this population are to be observed. In a realistic setting, there will be more than one interviewer to accomplish the work, each interviewer would observe a partition of the random sample of individuals. It is reasonable to assume that each interviewer introduces a degree of correlation to their observations and that observations from different interviewers are uncorrelated. Now, for simplicity, assume that there are only two interviewers available to observe the random sample of n individuals, say I-1 and I-2. I-1 is going to observe n1 of the individuals and introduces a correlation ρ1 and I-2 is responsible of observing the remaining of the sample and introduces a correlation of ρ2 . As the following results show, the measurement variance, which is due to the

72 interviewers, is reflected to some degree in the sampling variance calculations. Likelihood function given the observations can be represented with the matrix variate normal density: X = (x1 , x2 , . . . , xn ) ∼ φk×n (X; M, AA′ , B ′ B) with M = 0k×n , AA′ = σ 2 Ik and



 B′B = 

Ψ1 0n−n1 ×n−n1



0n−n1 ×n−n1   = Ψ. Ψ2

Ψ1 is a n1 × n1 symmetric matrix with all diagonal entries equal to one and all its off-diagonal entries equal to ρ1 . Similarly Ψ2 is a n − n1 × n − n1 symmetric matrix with all diagonal entries equal to one and all its off-diagonal entries equal to ρ2 . The log-likelihood is σ −2 1 2 nk/2 k/2 k/2 ) − log(σ ) − log|Ψ1 | − log|Ψ2 | − tr(X ′ XΨ−1 ). l(σ, ρ1 , ρ2 ; X) = log( nk/2 (2π) 2 In order to find the ML estimators of ρ1 , ρ2 and σ 2 , we equate the partial derivatives of l(σ, ρ1 , ρ2 ; X). According to this, if both ρ1 and ρ2 are known, asymptotically unbiased maximum likelihood estimator of σ 2 is given by (Ψ) tr(X ′ XΨ−1 ) σb2 = . nk

The usual maximum likelihood estimator from the uncorrelated model, σˆ2 =

(3.2) tr(X ′ X) , nk

is

asymptotically biased for σ 2 where the size of bias depends on the magnitudes of ρ1 and ρ2 . A detailed investigation of the sources, magnitude and impact of errors is necessary to identify how survey design and procedures may be improved. Chen and Gupta extend the matrix normal distribution to accommodate skewness in the following form ([17]): f1 (X, Σ ⊗ Ψ, b) = c∗1 φk×n (X; Σ ⊗ Ψ)Φn (X ′ b, Ψ)

(3.3)

where

c∗1

73 = (Φn (0, (1 + b Σb)Ψ)) . A drawback of this definition is that it allows ′

−1

independence only over its rows or columns, but not both. Harrar ([39]) give two more definitions for the matrix variate skew normal density: f2 (X, Σ, Ψ, b, Ω) = c∗2 φk×n (X; Σ ⊗ Ψ)Φn (X ′ b, Ω)

(3.4)

f3 (X, Σ, Ψ, b, B) = c∗3 φk×n (X; Σ ⊗ Ψ)Φn (tr(B ′ X))

(3.5)

and

where c∗2 = (Φn (0, (Ω+b′ Σb)Ψ))−1 , c∗3 = 2; Σ, Ψ, and Ω are positive definite covariance matrices of dimensions k, n and n respectively, B is a matrix of dimension k × n. Note that if Ω = Ψ then f2 is the same as f1 . Although, more general than f1 , the densities f2 and f3 still do not permit independence of rows and columns simultaneously. A very general definition of skew symmetric variable for the matrix case can be obtained from matrix variate selection models. Suppose X is a k×n random matrix with density f (X), let g(X) be a weight function. A weighted form of density f (X) is given by

h(X) = R

f (X)g(X) . g(X)f (X)dX k×n R

(3.6)

When the sample is only a subset of the population then the associated model would be called a selection model. In their recent article article Dom´ınguez-Molina, Gonz´alezFar´ıas,Ramos-Quiroga and Gupta introduced the matrix variable closed skew normal distribution in this form ([8]). In this chapter, a construction for a family of matrix variable skew-symmetric densities that allows for independence among both variables and individuals is studied. We also consider certain extensions and relation to matrix variable closed skew normal distribution.

74

3.2

Matrix Variable Skew Symmetric Distribution

To define a matrix variate distribution from the multivariate skew symmetric distribution first assume that z i ∼ ssg,H k (0k , Ik , αi ) for i = 1, 2, . . . , n are independently distributed random variables. Write Z = (z 1 , z 2 , . . . , z n ). We can write the density of X as a product as follows, n Y

2k g ∗ (Z)

i=1

k Y

H(αji e′j z i ).

j=1

This is equal to nk ∗∗

2 g (Z)

n Y k Y

H(αji e′j Zci ).

i=1 j=1

Let A, and B be nonsingular symmetric matrices of order k, and n respectively, also assume M is a k × n matrix. Define the matrix variate skew symmetric variable as X = AZB + M. Definition 3.2.1. (Matrix Variate Skew-Symmetric Density) Let g(.) be a density function symmetric about 0, H(.) be an absolutely continuous cumulative distribution function with H ′ (.) symmetric around 0. A variable X has matrix variate skew symmetric distribution if it has probability density function 2nk g ∗∗ (A−1 (X − M )B −1 )

Qn Qk i=1

j=1

H(αji e′j (A−1 (X − M )B −1 )ci )

|A|n |B|k

(3.7)

where αji are real scalars, M ∈ Rk×n , A and B be nonsingular symmetric matrices of Q Q order k and n respectively. Finally, g ∗∗ (X) = ni=1 kj=1 g(yij ). The density is called

matrix variate skew-symmetric density with location parameter M, scale parameters (A, B), and shape parameter ∆ = (αji ), and it is denoted by mssg,H k×n (M, A, B, ∆). Let Z ∼ mssg,H k×n (0k×n , Ik , In , ∆). The the moment generating function of Z evaluated

75 k×n

at Tk×n ∈ R

is MZ (Tk×n ) and can be obtained as follows:

′ MZ (Tk×n ) = E(etr(Tk×n Z)) Z n Y k Y ′ k ∗∗ etr(Tk×n Z)2 g (Z) = H(αji e′j Zci )dZ Rk×n

′ = Eg∗∗ (Z) (etr(Tk×n Z)

i=1 j=1

n Y k Y

H(αji e′j Zci )).

i=1 j=1

Let X = AZB + M for constant (k × k) matrix A, (n × n) matrix B and k × n dimensional constant matrix M. The the moment generating function of X evaluated at Tk×n is MX (Tk×n ): ′ MX (Tk×n ) = etr(Tk×n M )MZ (A′ Tk×n B ′ )

=

′ ′ etr(Tk×n M )Eg∗∗ (Z) (etr((BTk×n A)Z)

n Y k Y

H(αji e′j Zci )).

i=1 j=1

Definition 3.2.2. (Matrix Variate Skew Symmetric Random Variable) Let g(.) be a density function symmetric about 0, H(.) be an absolutely continuous cumulative distribution function with H ′ (.) symmetric around 0. Let zij ∼ f (zij , αji ) = 2g(zij )H(αji z) for i = 1, 2, . . . , n, and j = 1, 2, . . . , k be independent variables. Then the matrix variQ Q ate random variable Z = (zij ) has density 2nk |A|n1|B|k g ∗∗ (Z) ni=1 kj=1 H(αji e′j Zci ) Q Q where g ∗∗ (z) = ni=1 kj=1 g(zij ), and e′j and c′i are the elementary vectors of the co-

ordinate system Rk and Rn respectively. Let A be a k × k constant matrix, B be

a n × n constant matrix and M be a k × n-dimensional constant matrix.A random variable X = AZB + M is distributed with respect to matrix variate skew symmetric distribution with location parameter M, scale parameters (A, B), and shape parameter g,H ∆ = (αji ). We denote this by X ∼ M SSk×n (M, A, B, ∆).

76

3.3

Matrix Variate Skew-Normal Distribution

Definition 3.3.1. (Matrix Variate Skew Normal Density). We call the density

f (X, M, A, B, ∆) =

2kn φk×n (A−1 (X − M )B −1 )

Qk

Qn

′ −1 i=1 Φ(αji ej (A (X |A|n |B|k

j=1

− M )B −1 )ci ) (3.8)

as the density of a matrix variate skew normal variable with location parameter M, scale parameters (A, B), and shape parameter ∆. We denote it by msnk×n (M, A, B, ∆). Let Z ∼ msnk×n (0k×n , Ik , In , ∆). Then the moment generating function of Z evaluated at Tk×n is MZ (Tk×n ). It can be obtained as follows:

′ MZ (Tk×n ) = Eφk×n (Z) (etr(Tk×n Z)

n Y k Y

Φ(αji e′j Zci ))

i=1 j=1

2k = (2π)k/2 =

=

2k (2π)k/2 n Y i=1

=

n Y i=1

=

i=1

e

Rk

n Z Y i=1

1 ′ t t 2 i i

2k e (2π)k/2 (2π)k/2

1 ′ t t 2 i i

2k e (2π)k/2

n Y

2 e

n Y

2k e 2 ti ti

k

1 ′ t t 2 i i

i=1

=

− 21 z′i zi +ti ′ zi

1

i=1







e− 2 (zi zi −2ti zi )

Rk

Z

Φ(αji e′j z i )dz

k Y

Φ(αji e′j z i )dz

j=1

− 12 (z′i zi −2ti ′ zi +ti ′ ti )

e

Rk − 12 (zi −ti )′ (zi −ti )

e

Φ(αji e′j z i )dz

k Y

Φ(αji e′j z i )dz

j=1

k Z Y

1

2

e− 2 (zij −(t)ij ) Φ(αji z ij )dzij

R

j=1

k Z Y k Y

k Y j=1

Rk

j=1

1

k Y j=1

1 ′ Z n Y 2k e 2 ti ti

i=1

=

n Z Y

R

1 1 2 p e− 2 (yij ) Φ(αji y ij + αji (t)ij )dyij (2π)

αji (t)ij Φ( p ) 2) (1 + α ji j=1

n Y k Y αji (T )ij 1 ′ ). Φ( p = 2 etr( Tk×n Tk×n ) 2 (1 + αji 2 ) i=1 j=1 nk

77 Let X = AZB + M for constant (k × k) matrix A,(n × n) matrix B and k × n dimensional constant matrix M. Then the moment generating function of X evaluated at Tk×n ∈ Rk×n is MX (Tk×n ), this can be obtained as follows: n Y k Y αji (A′ Tk×n B ′ )ij 1 ′ ). Φ( p MX (Tk×n ) = 2nk etr(Tk×n M + (A′ Tk×n B ′ )′ A′ Tk×n B ′ ) 2) 2 (1 + α ji i=1 j=1

Hence the following definition and theorems. Definition 3.3.2. (Matrix Variate Skew Normal Random Variable) Let

zij ∼ 2φ(zij )Φ(αji zij ) for i = 1, 2, . . . , n, and j = 1, 2, . . . , k be independent univariate skew normal random variables. Then the matrix variate random variable Z = (zij ) has density

nk

2 φk×n (Z)

n Y k Y

Φ(αji e′j Zci )

i=1 j=1

where φk×n (Z) =

Qn Qk i=1

j=1

φ(zij ), and ej and ci are the elementary vectors of the

coordinate system Rk and Rn respectively. Let A be a k × k constant matrix, B be a n × n constant matrix and M be a k × n-dimensional constant matrix. A random variable X = AZB + M is distributed with respect to matrix variate skew symmetric distribution with location parameter M, scale parameters (A, B), and shape parameter ∆ = (αji ). We denote this by X ∼ M SNk×n (M, A, B, ∆). If the density exists it is given in Equation 3.8. We denote this case by writing X ∼ msnk×n (M, A, B, ∆). Theorem 3.3.1. If X has multivariate skew-normal distribution M SNk×n (M, A, B, ∆)

78 then the moment generating function of X evaluated at Tk×n is given by 1 ′ MX (Tk×n ) = 2nk etr(Tk×n M + (A′ Tk×n B ′ )′ A′ Tk×n B ′ ) 2 n k ′ Y Y αji (A Tk×n B ′ )ij × ). Φ( p 2) (1 + α ji i=1 j=1

(3.9)

By Definition 3.4.2 we can write Z ∼ M SNk×n (0, Ik , In , ∆), and prove the following theorems. Theorem 3.3.2. Assume that Y ∼ M SNk×n (M, A, B, ∆) and X = CY D + N. Then X ∼ M SNk×n (CM D + N, CA, BD, ∆). Proof. From assumption, we have Y = AZB +M, and so X = CAZBD+(CM D+N ), i.e., X ∼ M SNk×n (CM D + N, CA, BD, ∆). Theorem 3.3.3. Let x1 , x2 , . . . xn be independent, where xi is distributed according to snk (0k , Σ1/2 , α). Then,

n X j=1

x′ j Σ−1 xj ∼ χ2kn .

Proof. Let y ∼ Nk (µ = 0, Σ). Then y′ Σ−1 y ∼ χ2k , and x′ j Σ−1 xj and y′ Σ−1 y have the same distribution from Theorem 2.2.2. Moreover, x′ j Σ−1 xj are independent. Then the desired property is proven by the addition property of χ2 distribution. It is well known that if X ∼ φk×n (M, AA′ , Ψ = In ) then the matrix variable XX ′ has the Wishart distribution with the moment generating function given as |(I − 2(AA′ )T )|−n/2 , (AA′ )−1 − 2T being a positive definite matrix. The following theorem implies that the decomposition for a Wishart matrix is not unique. Theorem 3.3.4. If a k×n matrix variate random variable X has snk×n (0k×n , A, In , ∆) distribution for constant positive definite matrix A of order k and ∆ = α1′ , then XX ′ ∼ Wk (n, AA′ ).

79 Proof. The moment generating function of the quadratic form XX can be obtained ′

as follows, for any T ∈ Rk×k , with (AA′ )−1 − 2T being a positive definite matrix, ′

E(etr(XX T )) =

Z

etr(XX ′ T )dF X

Rk×n

Q Q 2nk etr(− 21 (AA′ )−1 XX ′ + XX ′ T ) ni=1 kj=1 Φ(αji e′j A−1 Xci ) dX = (2π)nk/2 |A|n Rk×n Q Q Z 2nk etr(− 21 X ′ ((AA′ )−1 − 2T )X) ni=1 kj=1 Φ(αji e′j A−1 Xci ) = dX (2π)nk/2 |A|n Rk×n Q Q Z 2nk etr(− 21 Z ′ Z) ni=1 kj=1 Φ(αji e′j A−1 ((AA′ )−1 − 2T )1/2 Zci ) dZ = (2π)nk/2 |(I − 2(AA′ )T )|n/2 Rk×n Q Q 1 ′ Z 2nk kj=1 ni=1 e(− 2 zi zi ) Φ(αji e′j A−1 ((AA′ )−1 − 2T )1/2 z i ) = dZ (2π)nk/2 |(I − 2(AA′ )T )|n/2 Rk×n Z

2nk (Ez Φk (αji e′j A−1 ((AA′ )−1 − 2T )1/2 z))n = (I − 2(AA′ )T )|n/2 = |(I − 2(AA′ )T )|−n/2 .

3.4

Generalized Matrix Variate Skew Normal Dis-

tribution Definition 3.4.1. (Generalized Matrix Variate Skew Normal Density). We call the density

f (X, M, A, B, ∆) =

2kn φk×n (A−1 (X − M )B −1 )

Qk

′ −1 j=1 H(αji ej (A (X |A|n |B|k

− M )B −1 )ci ) (3.10)

as the density of a matrix variate skew normal variable with location parameter M, scale

80 parameters (A, B), and shape parameter ∆. We denote it by

gmsnH k×n (M, A, B, ∆).

Let Z ∼ gmsnH k×n (0k×n , Ik , In ∆). The moment generating function of Z evaluated at Tk×n is MZ (Tk×n ), it can be obtained as follows:

MZ (Tk×n ) =

′ EHk×n (Z) (etr(Tk×n Z)

n Y k Y

H(αji e′j Zci ))

i=1 j=1

2k = (2π)k/2 2k = (2π)k/2 =

n Y i=1

=

i=1

1 ′ t t 2 i i

2k e (2π)k/2

n Y

1 ′ t t 2 i i

2k e (2π)k/2

n Y

2k e 2 ti ti

(2π)k/2

1



=

i=1

e

Rk

k

2 e

1 ′ t t 2 i i

k Y

H(αji e′j z i )dz

j=1

− 21 (z′i zi −2ti ′ zi )

e

Rk

k Y

H(αji e′j z i )dz

j=1

Z

e

Z

e− 2 (zi −ti ) (zi −ti )

− 21 (z′i zi −2ti ′ zi +ti ′ ti )

Rk 1

H(αji e′j z i )dz



k Y

H(αji e′j z i )dz

j=1

k Z Y

1

2

e− 2 (zij −(t)ij ) H(αji z ij )dzij

R

j=1

k Z Y k Y

k Y j=1

Rk

j=1

i=1

n Y

− 21 z′i zi +ti ′ zi

n Z Y

n Y 2k e

i=1

=

i=1

1 ′ t t 2 i i

i=1

=

n Z Y

R

1 1 2 p e− 2 (yij ) H(αji y ij + αji (t)ij )dyij (2π)

αji (t)ij H( p ) (1 + αji 2 ) j=1

n Y k Y 1 αji (T )ij ). = 2nk etr( Tk×n ′ Tk×n ) H( p 2) 2 (1 + α ji i=1 j=1

Let X = AZB + M for constant (k × k) matrix A, (n × n) matrix B and k × n dimensional constant matrix M. Then the moment generating function of X evaluated at Tk×n ∈ Rk×n is MX (Tk×n ), this can be obtained as follows: n Y k Y 1 αji (A′ Tk×n B ′ )ij ′ MX (Tk×n ) = 2nk etr(Tk×n M + (A′ Tk×n B ′ )′ A′ Tk×n B ′ ) H( p ). 2) 2 (1 + α ji i=1 j=1

Hence the following definition and theorems.

81 Definition 3.4.2. (Matrix Variate Generalized Skew Normal Random Variable) Let

zij ∼ 2φ(zij )H(αji zij ) for i = 1, 2, . . . , n, and j = 1, 2, . . . , k be independent univariate generalized skew normal random variables. The matrix variate random variable Z = (zij ) has density

2nk φk×n (Z)

n Y k Y

H(αji e′j Zci )

i=1 j=1

where φk×n (Z) =

Qn Qk i=1

j=1

φ(zij ), and e′j and c′i are the elementary vectors of the

coordinate system Rk and Rn respectively. Let A be a k × k constant matrix, B be a n×n constant matrix and M be a k×n-dimensional constant matrix. A random variable X = AZB +M is distributed with respect to generalized matrix variate skew symmetric distribution with location parameter M, scale parameters (A, B), and shape parameter H ∆ = (αji ). We denote this by X ∼ GM SNk×n (M, A, B, ∆). If the density exists it is

given in Equation 3.10. We denote this case by writing X ∼ gsnH k×n (M, A, B, ∆). Theorem 3.4.1. If X has generalized matrix variate skew-normal distribution, H GM SNk×n (M, A, B, ∆), then the moment generating function of X evaluated at Tk×n

is given by 1 ′ MX (Tk×n ) = 2nk etr(Tk×n M + (A′ Tk×n B ′ )′ A′ Tk×n B ′ ) 2 n k ′ YY αji (A Tk×n B ′ )ij ). × H( p 2) (1 + α ji i=1 j=1

(3.11)

H By Definition 3.4.2 we can write Z ∼ GM SNk×n (0, Ik , In , ∆), and prove the following

theorems. H Theorem 3.4.2. Assume that Y ∼ GM SNk×n (M, A, B, ∆) and X = CY D +N. Then H X ∼ GM SNk×n (CM D + N, CA, BD, ∆).

82 Proof. From assumption we have Y = AZB +M, and so X = CAZBD +(CM D +N ), i.e., X ∼ M SNk×n (CM D + N, CA, BD, ∆). Theorem 3.4.3. Let x1 , x2 , . . . xn be independent, where xi is distributed according 1/2 to gsnH , α). Then, k (0k , Σ

n X j=1

x′ j Σ−1 xj ∼ χ2kn

. Proof. Let y ∼ Nk (µ = 0, Σ). Then y′ Σ−1 y ∼ χ2k , and x′ j Σ−1 xj and y′ Σ−1 y have the same distribution from Theorem 2.2.2. Moreover, x′ j Σ−1 xj are independent. Then the desired property is proved by the addition property of χ2 distribution.

3.5

Extensions

As in Chapter 2, an odd function, say w(x), can be used to replace the term in the form αji x in the skewing function to give more flexible families of densities. As before we can take w(xji ) = √ λ1 x

1+λ2 x2

, to obtain a matrix variable form of the skew symmetric

family introduced by Arellanno-Valle at al. ([7]); take w(x) = αx + βx3 , we obtain a matrix variate form of the skew symmetric family introduced by Ma and Genton ([45]); or take w(x) = sign(x)|x|α/2 λ(2/α)1/2 to obtain a matrix variate form of the skew symmetric family introduced by DiCiccio and Monti ([23]). Also note that, by the definition of matrix variate weighted distribution in Equation 3.6, the skewing function of the matrix variate skew normal density in Equation 3.8, i.e.

k Y n Y j=1 i=1

Φ(αji e′j (A−1 (X − M )B −1 )ci ),

can be replaced by Φk∗ ×n∗ (ΓA−1 (Z − M )B −1 Λ)

83 where Γ and Λ are matrices of dimensions k k and n × n correspondingly. In this ∗



case, the normalizing constant 2kn will have to be changed to EZ (P (Y < ΓA−1 (Z − M )B −1 Λ|Z)) for Y ∼ φk×n (0k×n , Ik , In ) and Z ∼ φk×n (M, AA′ , B ′ B). This extension to matrix variate skew normal density is similar to the matrix variable closed skew normal distribution ([8]). Also, if k ∗ = k, n∗ = n and both Γ and Λ are diagonal such that αij = Γii Λjj we return to the family introduced in Definition 3.4.1.

84

CHAPTER 4 Estimation and Some Applications Pewsey discusses some of the problems related to inference about the parameters of the skew normal distribution ([52]). The method of moments approach fails for the samples for which the sample skewness index is outside the admissibility range of the skew normal model. Also, the maximum likelihood estimator of the shape parameter α may take infinite values with positive probability. To deal with this problem certain methods have been proposed: Azzalini and Capitanio recommends that maximization of the log-likelihood function be stopped when the likelihood reaches a value not significantly lower than the maximum ([12]). Sartori uses bias prevention of maximum likelihood estimates for scalar skew normal and skew-t distributions and obtains finite valued estimators ([57]). Bayesian estimation of the shape parameter is studied by Liseo and Loperfido ([44]). Minimum χ2 estimation method and an asymptotically equivalent maximum likelihood method are proposed by Monti ([46]). Chen and Gupta discuss goodness of fit procedures for the skew normal distribution ([33]). Maximum products of spacings (MPS) estimation is independently introduced by Cheng and Amin ([18]), and Ranneby ([53]). It is a general method for estimating parameters in univariate continuous distributions and is known to give consistent and asymptotically efficient estimates under general conditions. In this chapter, we consider

estimation of the parameters of

snk (α, µ, Σ1/2 c )

85 distribution introduced in Chapter 2,

using the MPS procedure. We also show that MPS estimation procedure can be extended to give a class of bounded statistical model selection criteria which is suitable even for cases where Akaike’s and other likelihood based model selection criteria do not exist.

4.1

Maximum products of spacings (MPS) estima-

tion Consider the common statistical inference problem of estimating a parameter θ from observed data x1 , . . . , xn , where the xi ’s are mutually independent observations from a distribution function Fθ0 (x). The distribution function Fθ (x) is assumed to belong to a family F = {Fθ (x) : θ ∈ Θ} of mutually different distribution functions, where the parameter θ may be vector-valued. Let x(1) , x(2) , . . . , x(n) be the ordered random sample from Fθ0 (x). MLE method arises from maximization of the log product of density function, n

n

1X 1X l(θ, X) = log fθ (x(i) ) = li (θ). n i=1 n i=1

(4.1)

b MPS method arises from maximization Let the maximum of this be denoted by l(θ). of the log product of spacings,

n+1

n+1

1 X 1 X si (θ) = log(Fθ (x(i) ) − Fθ (x(i−1) )), S(θ; X) = n + 1 i=1 n + 1 i=1

(4.2)

with respect to θ. It is assumed that Fθ (x0 ) = 0 and Fθ (x(n+1) ) = 1. Let the maximum b Under very general conditions, Cheng and Amin of this function be denoted by S(θ).

([18]) prove that the estimators produced by it have the desirable properties of being

86 consistent, asymptotically normal, and asymptotically efficient since ML method and MPS method are asymptotically equivalent. For the details of this proof and the necessary conditions see ([18]). An intuitive comparison of the log likelihood with the log products of spacings is helpful in understanding the similarities and differences. By the mean value theorem, Z log(s(i) ) = log(

x(i)

x(i−1)

fθ (x)dx) = li (θ) + log(x(i) − x(i−1) ) + R(x(i−1) , x(i) , θ).

(4.3)

for each i. Here, R(x(i−1) , x(i) , θ) is essentially of O(x(i) − x(i−1) ). Now, let the notation y = Op (g(n)) mean that, given ǫ > 0, K and N can be found, depending possibly on θ0 , such that P (|y| < Kg(n)) > 1 − ǫ, for all n > N. Similarly y = op (g(n)) mean that, given ǫ > 0, δ > 0, N can be found, depending possibly on θ0 , such that P (|y| < δg(n)) > 1 − ǫ, for all n > N. Under standard regulatory assumptions, the following result holds ([22]). Given any ǫ > 0, there is a K = K(ǫ, θ0 ) and N such that sup(x(i) − x(i−1) ) < K log(n) with probability 1 − ǫ for all n ≥ N. Therefore, the n P difference between l(θ, X) and S(θ, X) is approximately n+1 i=1 log(x(i) − x(i−1) )/(n + 1)

). Consequently, MPS and ML estimation are asymptotically equal which is Op ( log(n) n

and have the same asymptotic sufficiency, efficiency, and consistency properties. How1 ever S(θ; X) is always bounded above by log( n+1 ), MPS approach gives consistent

estimators even when MLE fails. The following theorem is proved under mild conditions in Cheng and Stephens ([19]), it justifies the use of MPS approach in its relation to the likelihood and Moran Statistic. ˘ exists Theorem 4.1.1. Let θ˘ be the MPS estimator of θ. Then the MPS estimator, θ, in probability, and the following holds:  −1  ∂ 2 l(θ) 1/2 ˘ 1. n (θ − θ0 ) ∼ n(0, −E |θ0 ) asymptoticaly. ∂θ2

87 2. If θb is the likelihood estimator, then θ˘ − θb0 = op (n−1/2 ).

3. Also ˘ = S(θ0 ) + S(θ)

1 Q + op (n−1 ), 2n

where Q is distributed as χ2k . ML method and MPS method can be seen as objective functions that are approximations to Kullback-Leibler divergence ([24]). Ranneby ([53]) gives approximations of information measures other than the Kullback-Leibler information. In Ranneby & Ekstr¨om ([54]) a class of estimation methods, called generalized maximum spacing (GMPS) methods, is derived from approximations based on simple spacings of so called ϕ-divergences. Ghosh & Jammalamadaka ([28]) show in a simulation study that GMPS methods other than the MPS method can perform better in terms of mean square error. Ekstrom ([24]) gives strong consistency theorems for GMPS estimators. There are two disadvantages to MPS estimation: MPS procedure fails if there are ties in the sample. Although this will happen with probability zero for continuous densities, when data is grouped or rounded this may become an issue. Titterington ([59]) and Ekstr¨om & Ranneby([54]) give methods to deal with ties in data using higher order spacings. The second disadvantage is that optimization related to MPS approach usually requires computational methods for solution.

88

4.2

A Bounded Information Criterion for Statisti-

cal Model Selection A fundamental problem in statistical analysis is about the choice of an appropriate model. The area of statistical model identification, model evaluation, or model selection deals with choosing the best approximating model among a set of competing models to describe a given data set. In a series of papers, Akaike ([2] and [3]) develops the field of statistical data modeling from the principle of Boltzmann’s ([15]) generalized entropy or Kullback-Leibler ([42]) divergence. As a measure of the goodness of fit of an estimated statistical model among several competing models, Akaike proposes the so called Akaike information criterion (AIC). AIC provides a trade off between precision and complexity of a model, given a data set competing models may be ranked according to their AIC, the model with smallest AIC is considered the best approximating model. The Kullback-Leibler divergence between the true density fθ0 (x) and its estimate fθ (x) is given by KLD(θ0 , θ) =

Z

fθ0 (x) log(

fθ0 (x) )dx. fθ (x)

(4.4)

The most obvious estimator of [4.4] is given by n

X fθ (x(i) ) \ 0 , θ) = 1 KLD(θ log( 0 ). n i=1 fθ (x(i) )

(4.5)

Minimization of the expression in (4.5) with respect to θ is equivalent to maximization of the log likelihood, in Equation 4.6. The maximum log likelihood method can thus be used to estimate the values of parameters. However, it cannot be used to compare different models without some corrections because it is biased. An unbiased estimate of −2 times the mean expected maximum value of log likelihood yields the famous

89 AIC: b + 2k/n. AIC = −2l(θ)

(4.6)

For small sample sizes, the penalty term penalty term of AIC, i.e. 2k/n, is usually not sufficient and may be replaced in search for improvement. For example the Bayes information criterion (BIC) of Schwarz ([58]) is obtained for 2k log(n)/n, HannanQuinn ([38]) information criterion is obtained for 2k log(log(n))/n. Many other model selection criteria are obtained by similar adjustments to the penalty term. However, all these different criteria share the same problem: There are many cases for which the likelihood function is unbounded as a function of the parameters (for example see ([21])), therefore in such situations likelihood based inference and model selection is not appropriate. Next, we develop a model selection criterion which overcomes the unbounded likelihood problem based on the MPS estimation. In Akaike ([2]), AIC is recommended as an approximation to the Kullback-Leibler divergence between the true density and its estimate ([2]). The connection of S(θ, X) to the Kullback-Leibler divergence and minimum information principle is studied by Ranneby [53] and Lind ([43]). Ranneby introduces MPS from an approximation to the Kullback-Leibler divergence [53]. Under mild assumptions the following holds b < sup (−γ − KLD(θo , θ) + ǫ)) → 1 P (S(θ) |{z} θ

as n → ∞ for every ǫ > 0 where γ ∼ = 0.57722 is the Euler’s constant. In Cheng and Stephens ([19]), the relation of MPS method to Moran’s goodness of fit statistic ([47]) is studied. Moran’s statistics is the negative of S(θ, X), and its distribution evaluate at the true value of the parameter does not depend on the particular distribution function it is calculated for. The percentage points of Moran’s statistic are given by Cheng and Thornton ([20]).

90 Cheng and Stephens extend the Moran’s test ([48]) to the situation where parameters have to be estimated from sample data. Under mild assumptions, when θb is an efficient estimate of θ (this may be the maximum likelihood estimate, or the value, θ, which minimizes S(θ; X)) then asymptotically the following holds: b = S(θ0 ; X) + S(θ)

1 Q + op (n−1 ) 2n

(4.7)

where Q is a χ2k random variable with expected value k. As n → ∞ expected value of b + the statistic −S(θ)

k 2n

does not depend on the particular distribution function it is

calculated for.

Because of the above reasons, we propose the statistic b + k M P SIC = −S(θ) 2n

(4.8)

as a model selection criterion and we refer to it as the maximum products of spacings information criterion (MPSIC) here on. Note that the penalty term in 4.8 can be replaced with

k log(n) 2n

or

k log(log(n)) 2n

to obtain other criteria of this kind, let’s call these

variants M P SIC2 and M P SIC3 . A In Section 4.1, we have argued that S(θ0 ; X) should be a close to the log likelihood in 4.1 when the latter exists; however, the main difference between S(θ0 ; X) and the P log likelihood in 4.1 is the term n+1 i=1 log(x(i) − x(i−1) )/(n + 1) which depends on the

unknown true parameter value θ0 . Because of this term the distribution of the log likelihood depends on the unknown true parameter value and therefore is not suitable for model identification purposes. As mentioned earlier in skew normal model likelihood function is often unbounded. Therefore, estimation in skew normal model leaves a lot of room for testing MPSIC as a model selection criterion. In Table 4.1, we display the results of the following

91 experiment: We generate n observations from sn1 (α, 0, 1), and then we compare the fit of this model to standard normal and half normal models with M P SC, M P SC2 , and M P SC3 . We record the percentage of times the skew normal model is selected by these model selection criterion. n α MPSIC MPSIC2 MPSIC3 20 0 0.272 0.056 0.251 50 0 0.265 0.030 0.190 100 0 0.294 0.030 0.182 20 2 0.961 0.953 0.961 50 2 1.000 1.000 1.000 100 2 1.000 1.000 1.000 20 5 0.806 0.777 0.803 50 5 0.960 0.959 0.960 100 5 0.999 0.999 0.999 20 10 0.536 0.483 0.529 50 10 0.842 0.822 0.835 100 10 0.972 0.966 0.972 Table 4.1: M P SIC, M P SIC2 , and M P SIC3 are compared for samples of size 20, 50 and 100 for different levels of α.

4.3

Estimation of parameters of snk (α, µ, Σ)

We need the following result in this section: Let x(1) , x(2) , . . . , x(n) be the ordered sample from sn1 (α, 0, 1), and let the cdf of this density be denoted as Fα (.). Then q

x

(1) 2 φ( (α2 +1) −1/2 ) ∂ π S(α; X) = − ∂α Fα (x(1) )(α2 + 1) q x(n) 2 φ( (α2 +1) −1/2 ) π + (1 − Fα (x(n) ))(α2 + 1) q x(i) x(i−1) n X π2 (φ( (α2 +1) −1/2 ) − φ( (α2 +1)−1/2 )) . + (Fα (x(i) ) − Fα (x(i−1) ))(α2 + 1) i=2

(4.9)

92

4.3.1

Case 1: MPS estimation of sn1 (α, 0, 1)

1/2 ′ Assume X ∼ snk×n (µ1′k , Σ1/2 are known, we can write z i = c , Ik , α1k ). When µ, Σc

Σ−1/2 (xi − µ) and estimate components αj of α by using the corresponding i.i.d. c sn1 (αj , 0, 1) random variables zj1 , zj2 , . . . , zjn . Example 4.3.1. The ”frontier data” 50 observations generated by a sn1 (0, 1, α = 5) distribution reported in Azzalini and Capitonio ([12]) is an example for which the usual estimation methods like method of moments or maximum likelihood fail. Figure 4.1 gives the histogram, kernel density estimate, density curves for sn1 (0, 1, α = 5) and sn1 (0, 1, α b = 4.9632) distributions. For the frontier data,the Bayesian estimates recommended by Liseo and Loperfido gives α b = 2.12 ([44]), the estimate through bias

prevention method by Sartori gives α b = 6.243 ([57]). Moreover the MPSIC for skew

normal model is 4.3999. The MPSIC for the half normal model is 5.0495. Therefore, MPSIC identifies the correct model. For any value shape parameter α, the absolute value of the skew normal variable has half normal distribution. If we take the absolute values of the frontier data, the appropriate model should be the half normal model instead of a skew normal model with large shape parameter. This fact is supported by MPSIC: Estimate of α by MPS method is 38.3906, and the MPSIC for this model is 4.4136. The MPSIC for the half normal model is 4.4057. The following Table 4.2 give summary statistics for the distribution of MPS estimator for different values of n and α obtained from 5000 simulations. These results can be compared with the simulation results for the ML estimator in Table 4.2. Although MPS estimation provides an improvement on ML estimation, highly skew distribution of the estimator, as seen in Figure 4.2, points to the tendency in some samples to give extremely large estimates for the shape parameter. Since α → ±∞ the density SN1 (y, α) approaches to the half normal density, the question weather skew



Figure 4.1: The histogram, kernel density estimate (blue), sn_1(0, 1, α = 5) curve (red) and sn_1(0, 1, α̂ = 4.9681) curve (black) for the "frontier data".

n     α    Mean(α̂)   std(α̂)    Median(α̂)   MAD(α̂)
20    0    0.0028    0.2702    0.0045      0.1578
50    0    0.0040    0.1572    0.0094      0.1048
100   0    0.0057    0.1193    0.0065      0.0849
20    2    2.9409    7.9874    1.6648      0.4516
50    2    1.9662    1.1007    1.8120      0.2981
100   2    1.9363    0.3932    1.8913      0.2446
20    5    10.2334   19.1683   3.6313      1.1751
50    5    5.5383    6.7330    4.2047      0.9611
100   5    4.7936    1.4561    4.4462      0.7202
20    10   21.5252   29.5315   6.2980      2.6961
50    10   11.4908   15.1492   7.3934      1.9900
100   10   10.0525   7.0268    8.3873      1.8476

Table 4.2: MPS estimation of α: mean, standard deviation, median and mean absolute deviation of α̂ from 5000 simulations.

n     α    Mean(α̂)   std(α̂)    Median(α̂)   MAD(α̂)
20    0    0.0108    0.3469    0.0105      0.1969
50    0    0.0110    0.1827    0.0156      0.1142
100   0    0.0017    0.1308    0.0010      0.0895
20    2    3.3901    6.5156    2.2049      0.6616
50    2    2.2586    0.8532    2.1192      0.3817
100   2    2.1468    0.4518    2.0744      0.2737
20    5    16.6467   25.4334   5.7100      2.2305
50    5    7.2979    7.8039    5.5378      1.4243
100   5    5.7956    3.5357    5.2114      0.8775
20    10   30.9972   32.6160   12.0381     6.5854
50    10   16.0803   16.9440   10.8762     3.5005
100   10   12.8935   9.2488    10.4143     2.3308

Table 4.3: ML estimation of α: mean, standard deviation, median and mean absolute deviation of α̂ from 5000 simulations.

When α̂ is large, it is perhaps better to use the half normal model, with one fewer parameter, for the sake of model parsimony.

This can always be checked by comparing the half normal and skew normal models with one of the model selection criteria MPSIC, MPSIC2, or MPSIC3.

4.3.2 Case 2: Estimation of µ, Σ_c^{1/2} given α

For X ∼ sn_{k×n}(µ1′_k, Σ_c^{1/2}, I_k, α1′_k), when α is known, we can use the method of moments to estimate the parameters µ and Σ_c^{1/2}. Note that

$$
E(x) = \sqrt{\frac{2}{\pi}}\,\Sigma_c^{1/2}\begin{pmatrix}\alpha_1/\sqrt{1+\alpha_1^2}\\ \alpha_2/\sqrt{1+\alpha_2^2}\\ \vdots\\ \alpha_k/\sqrt{1+\alpha_k^2}\end{pmatrix} + \mu = \Sigma_c^{1/2}\mu_0(\alpha) + \mu,
$$

$$
Cov(x) = \Sigma_c^{1/2}\left(I_k - \frac{2}{\pi}\,\mathrm{diag}\!\left(\frac{\alpha_1^2}{1+\alpha_1^2}, \frac{\alpha_2^2}{1+\alpha_2^2}, \ldots, \frac{\alpha_k^2}{1+\alpha_k^2}\right)\right)\Sigma_c^{1/2\,\prime} = \Sigma_c^{1/2}\Sigma_0(\alpha)\Sigma_c^{1/2\,\prime}.
$$


Figure 4.2: Histogram of 5000 simulated values of α̂ by MPS. The MPS estimator is highly skewed, suggesting that the half normal model might be more suitable in cases where extreme values of α̂ are observed.

Therefore, given α, we can estimate Σ_c^{1/2} and µ by solving the moment equations

$$
S = \frac{1}{n}\sum_{j=1}^{n}(x_j - \bar{x})(x_j - \bar{x})' = \Sigma_c^{1/2}\Sigma_0(\alpha)\Sigma_c^{1/2\,\prime}, \tag{4.10}
$$

$$
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j = \Sigma_c^{1/2}\mu_0(\alpha) + \mu. \tag{4.11}
$$

The solutions to the above equations are

$$
\widehat{\Sigma}_c^{1/2} = S_c^{1/2}\left(I_k - \frac{2}{\pi}\,\mathrm{diag}\!\left(\frac{\alpha_1^2}{1+\alpha_1^2}, \frac{\alpha_2^2}{1+\alpha_2^2}, \ldots, \frac{\alpha_k^2}{1+\alpha_k^2}\right)\right)^{-1/2}, \tag{4.12}
$$

$$
\widehat{\mu} = \bar{x} - S_c^{1/2}\Big(I_k - \mathrm{diag}\big(\mu(\alpha_1)^2, \mu(\alpha_2)^2, \ldots, \mu(\alpha_k)^2\big)\Big)^{-1/2}\sqrt{\frac{2}{\pi}}\begin{pmatrix}\alpha_1/\sqrt{1+\alpha_1^2}\\ \alpha_2/\sqrt{1+\alpha_2^2}\\ \vdots\\ \alpha_k/\sqrt{1+\alpha_k^2}\end{pmatrix}. \tag{4.13}
$$

Note that (Σ̂_c^{1/2}, µ̂) is an equivariant estimator of (Σ_c^{1/2}, µ) under the group of affine transformations obtained by scaling with a lower triangular matrix with positive diagonal elements.
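A sketch of the moment estimators (4.12)–(4.13), assuming the data are arranged as an n × k matrix and that S_c^{1/2} denotes the lower triangular Cholesky factor of S (consistent with the equivariance remark above); the function name is our own.

```python
import numpy as np

def moment_estimates(X, alpha):
    """Method-of-moments estimates of (Sigma_c^{1/2}, mu) for known alpha,
    following (4.12) and (4.13); X is an n x k data matrix, alpha a k-vector."""
    xbar = X.mean(axis=0)
    Xc = X - xbar
    S = Xc.T @ Xc / X.shape[0]
    Sc_half = np.linalg.cholesky(S)  # lower triangular S_c^{1/2}
    m = np.sqrt(2.0 / np.pi) * alpha / np.sqrt(1.0 + alpha**2)  # mu_0(alpha)
    # (I_k - diag(mu(alpha_1)^2, ..., mu(alpha_k)^2))^{-1/2}, as in (4.12)
    Sigma_half = Sc_half @ np.diag(1.0 / np.sqrt(1.0 - m**2))
    mu = xbar - Sigma_half @ m  # (4.13)
    return Sigma_half, mu
```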

4.3.3 Case 3: Simultaneous estimation of µ, Σ_c^{1/2} and α

Given an initial guess for α, the following estimation procedure gives reasonably good results (a sketch of the iteration follows the list).

(a) Start with an initial estimate for α, say α0.
(b) Given an estimate of α, estimate Σ_c^{1/2} and µ by (4.12) and (4.13).
(c) Given estimates of µ and Σ_c^{1/2}, estimate α by MPS.
(d) Repeat steps (b) and (c) until convergence.
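A minimal sketch of steps (a)–(d), reusing moment_estimates and mps_estimate_alpha from the earlier sketches; the convergence tolerance and iteration cap are arbitrary choices of ours.

```python
import numpy as np

def fit_skew_normal(X, alpha0, max_iter=100, tol=1e-6):
    """Alternating estimation of (mu, Sigma_c^{1/2}, alpha) as in steps (a)-(d)."""
    alpha = np.asarray(alpha0, dtype=float)  # step (a)
    for _ in range(max_iter):
        Sigma_half, mu = moment_estimates(X, alpha)  # step (b): (4.12) and (4.13)
        Z = np.linalg.solve(Sigma_half, (X - mu).T)  # z_i = Sigma_c^{-1/2}(x_i - mu)
        alpha_new = np.array([mps_estimate_alpha(z) for z in Z])  # step (c), Case 1
        if np.max(np.abs(alpha_new - alpha)) < tol:  # step (d)
            return mu, Sigma_half, alpha_new
        alpha = alpha_new
    return mu, Sigma_half, alpha
```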

First, we apply this technique to univariate data.

Example 4.3.2. Gomez, Salinas and Bolfarine ([29]) consider 1150 height measurements taken at 1 millisecond intervals along the drum of a roller (i.e. parallel to the axis of the roller). The data are part of a study on the surface roughness of rollers. Since there are ties in the sample, we adjusted the data for ties before the analysis. We apply the procedure described above; Figure 4.3 summarizes the results.


Figure 4.3: The histogram, kernel density estimate (blue), and estimated density curve sn_1(µ = 4.2865, σ^{1/2} = 0.9938, α = −2.9928) (black) for the 1150 height measurements.

Table 4.4 gives summary statistics for the distributions of the estimators of α, µ = 0 and σ = 1 for different values of n and α.

n     α    mean(α̂)  std(α̂)   median(α̂)  MAD(α̂)   mean(µ̂)   std(µ̂)   mean(σ̂)   std(σ̂)
30    1.5  2.6013   2.1136   1.9277     1.4755   -0.0722   0.2303   1.0560    0.2003
50    1.5  2.0914   1.2355   1.7920     0.8614   -0.0536   0.1971   1.0432    0.1634
100   1.5  1.8096   0.6817   1.6584     0.5292   -0.0372   0.1443   1.0228    0.1157
30    3.0  4.0288   2.7718   3.2239     2.0730   0.0229    0.1848   0.9836    0.1840
50    3.0  3.7638   2.0545   3.2340     1.5469   0.0072    0.1426   0.9934    0.1509
100   3.0  3.5131   1.5431   3.1676     1.1380   0.0012    0.1090   0.9981    0.1090

Table 4.4: Summary statistics for the distributions of the estimators of α, µ = 0 and σ = 1 for different values of n and α.

Example 4.3.3. We generate 100 observations from the sn_2(µ = (0, 0)′, Σ_c^{1/2} = [1 0; 0.5 2], α = (5, −3)′) distribution. We estimate the parameters with the procedure described above, with initial estimate α0 = (10, −10)′. Figure 4.4 gives the scatter plot and the contour plots of the true and the estimated densities.

Example 4.3.4. Australian Athletes Data: These data were collected on 202 athletes and are mentioned in Azzalini ([11]). We take the variables LBM and BMI and fit a bivariate skew normal model with the procedure described above, with initial estimate α0 = (10, 10)′. Figure 4.5 gives the scatter plot and the contour plot of the estimated density.

Example 4.3.5. Australian Athletes Data 2: These data were collected on 202 athletes and are mentioned in Azzalini ([11]). We take the variables LBM and SSF and fit a bivariate skew normal model with the procedure described above, with initial estimate α0 = (10, 10)′. Figure 4.6 gives the scatter plot and the contour plot of the estimated density.
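Tying the sketches together, Example 4.3.3 could be reproduced roughly as follows; per Case 1, the standardized coordinates z_j are independent sn_1(α_j, 0, 1), so sampling proceeds coordinatewise, and the seed and variable names are our own (fit_skew_normal is the sketch from Section 4.3.3).

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(1)
alpha = np.array([5.0, -3.0])
Sigma_half = np.array([[1.0, 0.0], [0.5, 2.0]])
# Z has i.i.d. sn1(alpha_j, 0, 1) columns; x_i = Sigma_c^{1/2} z_i with mu = (0, 0)'
Z = np.column_stack([skewnorm.rvs(a, size=100, random_state=rng) for a in alpha])
X = Z @ Sigma_half.T

mu_hat, Sigma_half_hat, alpha_hat = fit_skew_normal(X, alpha0=np.array([10.0, -10.0]))
```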



Figure 4.4: The scatter plot for 100 generated observations (black dots), contours of sn_2(µ = (0, 0)′, Σ_c^{1/2} = [1 0; 0.5 2], α = (5, −3)′) (blue), and of the estimated density sn_2(µ = (0.0795, 0.0579)′, Σ_c^{1/2} = [0.9404 0; 0.5845 1.1092], α = (5.6825, −3.6566)′) (red).


Figure 4.5: The scatter plot for LBM and BMI in the Australian Athletes Data, with contours of the estimated density sn_2(µ = (19.7807, 48.8092)′, Σ_c^{1/2} = [4.2760 0; 13.9306 10.7933], α = (2.5429, 0.8070)′) (red).


Figure 4.6: The scatter plot for LBM and SSF in the Australian Athletes Data, with contours of the estimated density sn_2(µ = (19.7807, 17.3347)′, Σ_c^{1/2} = [4.2760 0; 15.6127 50.5834], α = (2.5429, 8.6482)′) (red).


n = 10:   mean(α̂) = (3.9836, 3.7503)    cov(α̂) = [17.6933 −0.1927; −0.1927 16.2678]
          mean(µ̂) = (0.8143, 0.7769)    cov(µ̂) = [0.2056 0.0879; 0.0879 0.2981]
n = 20:   mean(α̂) = (2.6878, 2.5243)    cov(α̂) = [16.1777 −0.7700; −0.7700 12.7421]
          mean(µ̂) = (0.8812, 0.8466)    cov(µ̂) = [0.1227 0.0597; 0.0597 0.1670]
n = 30:   mean(α̂) = (2.0305, 1.7223)    cov(α̂) = [8.9830 −0.3011; −0.3011 3.9441]
          mean(µ̂) = (0.9149, 0.9044)    cov(µ̂) = [0.0854 0.0424; 0.0424 0.1235]
n = 50:   mean(α̂) = (1.2905, 1.3733)    cov(α̂) = [1.1753 −0.0115; −0.0115 1.2555]
          mean(µ̂) = (0.9553, 0.9345)    cov(µ̂) = [0.0644 0.0347; 0.0347 0.0859]
n = 100:  mean(α̂) = (1.1840, 1.1238)    cov(α̂) = [0.4377 −0.0227; −0.0227 0.3385]
          mean(µ̂) = (0.9506, 0.9537)    cov(µ̂) = [0.0354 0.0178; 0.0178 0.0500]
n = 200:  mean(α̂) = (1.0341, 1.0575)    cov(α̂) = [0.1308 0.0064; 0.0064 0.1485]
          mean(µ̂) = (0.9796, 0.9630)    cov(µ̂) = [0.0187 0.0093; 0.0093 0.0261]

Table 4.5: The simulation results for µ̂ and α̂ in the estimation of sn_2(µ = (1, 1)′, Σ_c^{1/2} = [1 0; 0.5 1], α = (1, 1)′).


n = 10:   mean(vec(Σ̂_c^{1/2})) = (1.1266, 0.5627, 1.0634)
          cov = [0.1179 0.0460 −0.0015; 0.0460 0.1867 0.0108; −0.0015 0.0108 0.1107]
n = 20:   mean(vec(Σ̂_c^{1/2})) = (1.0797, 0.5229, 1.0560)
          cov = [0.0656 0.0349 −0.0042; 0.0349 0.0841 0.0024; −0.0042 0.0024 0.0621]
n = 30:   mean(vec(Σ̂_c^{1/2})) = (1.0582, 0.5038, 1.0338)
          cov = [0.0440 0.0226 −0.0002; 0.0226 0.0514 0.0018; −0.0002 0.0018 0.0396]
n = 50:   mean(vec(Σ̂_c^{1/2})) = (1.0319, 0.5201, 1.0236)
          cov = [0.0293 0.0147 0.0029; 0.0147 0.0300 0.0014; 0.0029 0.0014 0.0290]
n = 100:  mean(vec(Σ̂_c^{1/2})) = (1.0268, 0.5168, 1.0162)
          cov = [0.0159 0.0078 0.0003; 0.0078 0.0147 0.0011; 0.0003 0.0011 0.0159]
n = 200:  mean(vec(Σ̂_c^{1/2})) = (1.0179, 0.5112, 1.0198)
          cov = [0.0085 0.0039 −0.0003; 0.0039 0.0068 0.0002; −0.0003 0.0002 0.0090]

Table 4.6: The simulation results for Σ̂_c^{1/2} in the estimation of sn_2(µ = (1, 1)′, Σ_c^{1/2} = [1 0; 0.5 1], α = (1, 1)′).


BIBLIOGRAPHY

[1] D.J. Aigner, C.A.K. Lovell, and P. Schmidt. Formulation and estimation of stochastic frontier production function models. Rand Corp., 1976.
[2] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974.
[3] H. Akaike. On entropy maximization principle. Applications of Statistics, 1(1):27–41, 1977.
[4] T.W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Mathematical Statistics. Wiley-Interscience, 2003.
[5] R.B. Arellano-Valle, M.D. Branco, and M.G. Genton. A unified view on skewed distributions arising from selections. Canadian Journal of Statistics, 34(4):581, 2006.
[6] R.B. Arellano-Valle and M.G. Genton. On fundamental skew distributions. Journal of Multivariate Analysis, 96(1):93–116, 2005.
[7] R.B. Arellano-Valle, H.W. Gomez, and F.A. Quintana. A new class of skew-normal distributions. Communications in Statistics - Theory and Methods, 33(7):1465–1480, 2004.
[8] J.A. Domínguez-Molina, G. González-Farías, R. Ramos-Quiroga, and A.K. Gupta. A matrix variate closed skew-normal distribution with applications to stochastic frontier analysis. Communications in Statistics - Theory and Methods, 36(9):1691–1703, 2007.
[9] B.C. Arnold and R.J. Beaver. The skew-Cauchy distribution. Statistics & Probability Letters, 49(3):285–290, 2000.
[10] A. Azzalini. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12:171–178, 1985.
[11] A. Azzalini. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics, 32(2):159–188, 2005.
[12] A. Azzalini and A. Capitanio. Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 61(3):579–602, 1999.
[13] A. Azzalini and A. Capitanio. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 65(2):367–389, 2003.
[14] L. Bailey, T.F. Moore, and B.A. Bailar. An interviewer variance study for the eight impact cities of the national crime survey cities sample. Journal of the American Statistical Association, 73(1):16–23, 1978.
[15] L. Boltzmann. On Zermelo's paper "On the mechanical explanation of irreversible processes". Annalen der Physik, 60:392–398, 1897.
[16] M.D. Branco and D.K. Dey. A general class of multivariate skew-elliptical distributions. Journal of Multivariate Analysis, 79(1):99–113, 2001.
[17] J.T. Chen and A.K. Gupta. Matrix variate skew normal distributions. Statistics, 39(3):247–253, 2005.
[18] R.C.H. Cheng and N.A.K. Amin. Estimating parameters in continuous univariate distributions with a shifted origin. Journal of the Royal Statistical Society, Series B (Methodological), 45(3):394–403, 1983.
[19] R.C.H. Cheng and M.A. Stephens. A goodness-of-fit test using Moran's statistic with estimated parameters. Biometrika, 76(2):385–392, 1989.
[20] R.C.H. Cheng and K.M. Thornton. Selected percentage points of the Moran statistic. Journal of Statistical Computation and Simulation, 30(3):189–194, 1988.
[21] R.C.H. Cheng and L. Traylor. Non-regular maximum likelihood problems. Journal of the Royal Statistical Society, Series B (Methodological), 57(1):3–44, 1995.
[22] D.A. Darling. On a class of problems related to the random division of an interval. The Annals of Mathematical Statistics, 24:239–253, 1953.
[23] T.J. DiCiccio and A.C. Monti. Inferential aspects of the skew exponential power distribution. Journal of the American Statistical Association, 99(466):439–450, 2004.
[24] M. Ekström. Alternatives to maximum likelihood estimation based on spacings and the Kullback–Leibler divergence. Journal of Statistical Planning and Inference, 138(6):1778–1791, 2008.
[25] K. Fang. Generalized Multivariate Analysis. Science Press, Beijing, 1990.
[26] R.A. Fisher. The effect of methods of ascertainment upon the estimation of frequencies. Annals of Eugenics, 6:13–25, 1934.
[27] M.G. Genton. Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman & Hall/CRC, 2004.
[28] K. Ghosh and S.R. Jammalamadaka. A general estimation method using spacings. Journal of Statistical Planning and Inference, 93(1-2):71–82, 2001.
[29] H.W. Gomez, H.S. Salinas, and H. Bolfarine. Generalized skew-normal models: properties and inference. Statistics, 40(6):495–505, 2006.
[30] H.W. Gomez, O. Venegas, and H. Bolfarine. Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics, 18(4):395–407, 2007.
[31] A.K. Gupta and F.C. Chang. Multivariate skew-symmetric distributions. Applied Mathematics Letters, 16(5):643–646, 2003.
[32] A.K. Gupta and J.T. Chen. A class of multivariate skew-normal models. Annals of the Institute of Statistical Mathematics, 56(2):305–315, 2004.
[33] A.K. Gupta and T. Chen. Goodness-of-fit tests for the skew-normal distribution. Communications in Statistics - Simulation and Computation, 30(4):907–930, 2001.
[34] A.K. Gupta, G. González-Farías, and J.A. Domínguez-Molina. A multivariate skew normal distribution. Journal of Multivariate Analysis, 89(1):181–190, 2004.
[35] A.K. Gupta and W.J. Huang. Quadratic forms in skew normal variates. Journal of Mathematical Analysis and Applications, 273(2):558–564, 2002.
[36] A.K. Gupta and D.K. Nagar. Matrix Variate Distributions. Chapman & Hall/CRC, 2000.
[37] A.K. Gupta and T. Varga. Elliptically Contoured Models in Statistics. Kluwer Series in Mathematics and Its Applications, 1993.
[38] E.J. Hannan and B.G. Quinn. The determination of the order of an autoregression. Journal of the Royal Statistical Society, 41(2):190–195, 1979.
[39] S.W. Harrar and A.K. Gupta. On matrix variate skew-normal distributions. Statistics, 42(2):179–194, 2008.
[40] W.J. Huang and Y.H. Chen. Quadratic forms of multivariate skew normal-symmetric distributions. Statistics & Probability Letters, 76(9):871–879, 2006.
[41] M.C. Jones and M.J. Faddy. A skew extension of the t-distribution, with applications. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 65(1):159–174, 2003.
[42] S. Kullback and R.A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951.
[43] N.C. Lind. Information theory and maximum product of spacings estimation. Journal of the Royal Statistical Society, Series B (Methodological), 56(2):341–343, 1994.
[44] B. Liseo and N. Loperfido. Default Bayesian analysis of the skew-normal distribution. Journal of Statistical Planning and Inference, 136(2):373–389, 2004.
[45] Y.Y. Ma and M.G. Genton. Flexible class of skew-symmetric distributions. Scandinavian Journal of Statistics, 31(3):459–468, 2004.
[46] A.C. Monti. A note on the estimation of the skew normal and the skew exponential power distributions. Metron, 61(2):205–219, 2003.
[47] P.A.P. Moran. The random division of an interval. Supplement to the Journal of the Royal Statistical Society, pages 92–98, 1947.
[48] P.A.P. Moran. The random division of an interval. Part II. Journal of the Royal Statistical Society, 13(1):92–98, 1951.
[49] S. Nadarajah and S. Kotz. Skewed distributions generated by the normal kernel. Statistics & Probability Letters, 65(3):269–277, 2003.
[50] S. Nadarajah and S. Kotz. Skew distributions generated from different families. Acta Applicandae Mathematicae, 91(1):1–37, 2006.
[51] G.P. Patil, C.R. Rao, and M. Zelen. Weighted distributions. Encyclopedia of Statistical Sciences, 9:565–571, 1988.
[52] A. Pewsey. Problems of inference for Azzalini's skew-normal distribution. Journal of Applied Statistics, 27(7):859–870, 2000.
[53] B. Ranneby. The maximum spacing method. An estimation method related to the maximum likelihood method. Scandinavian Journal of Statistics, 11(2):93–112, 1984.
[54] B. Ranneby and M. Ekström. Maximum spacing estimates based on different metrics. Institute of Mathematical Statistics, Umeå University, 1997.
[55] C. Roberts and S. Geisser. A necessary and sufficient condition for the square of a random variable to be gamma. Biometrika, 53(1/2):275–278, 1966.
[56] S.K. Sahu, D.K. Dey, and M.D. Branco. A new class of multivariate skew distributions with applications to Bayesian regression models. The Canadian Journal of Statistics, 31(2):129–150, 2003.
[57] N. Sartori. Bias prevention of maximum likelihood estimates for scalar skew normal and skew t distributions. Journal of Statistical Planning and Inference, 136(12):4259–4275, 2006.
[58] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464, 1978.
[59] D.M. Titterington. Comment on estimating parameters in continuous univariate distributions. Journal of the Royal Statistical Society, Series B (Methodological), 47(1):115–116, 1985.
[60] J.Z. Wang, J. Boyer, and M.G. Genton. A skew-symmetric representation of multivariate distributions. Statistica Sinica, 14(4):1259–1270, 2004.
[61] S. Zacks. Parametric Statistical Inference. Pergamon Press, New York, 1981.