The Information Index for Multivariate Skew Elliptical Distributions

Reinaldo Arellano-Valle (1), Javier Contreras-Reyes (2), Marc G. Genton (3)
(1) Departamento de Estadística, Pontificia Universidad Católica de Chile, Santiago, Chile.
(2) Centro de Modelamiento Matemático, Universidad de Chile, Santiago, Chile.
(3) Department of Statistics, Texas A&M University, College Station, USA.

Overview

Several researchers have applied statistical techniques linked to information theory, most notably the Shannon information index (Shannon, 1948) based on the multivariate normal distribution. This approach requires transforming the variables of interest, which can significantly reduce the amount of information contained in the data set under analysis. However, the multivariate skew-normal distribution introduced by Azzalini & Dalla-Valle (1996) and its extensions to skew-elliptical models have proved more flexible in practice, especially for data sets presenting skewness. The objective of this work is to introduce the skew-elliptical Shannon information index, study its main properties, implement efficient numerical methods for its computation, and compare it with the normal case. Finally, we illustrate the new results with a real application on the optimization of an environmental monitoring network, using monitoring stations in Santiago de Chile, where the variable of interest is ozone.

Entropy and Mutual Information

Let X ∈ R^n and Y ∈ R^m be two random vectors with joint and marginal pdf's p_{X,Y}(x, y), p_X(x) and p_Y(y), respectively. The mutual information between X and Y is defined by

    I_{XY} = E{log[p_{X,Y}(X, Y) / (p_X(X) p_Y(Y))]} = ∫_{x ∈ R^n} ∫_{y ∈ R^m} log[p_{X,Y}(x, y) / (p_X(x) p_Y(y))] p_{X,Y}(x, y) dx dy.    (1)

Moreover, the entropy of a random vector Z ∈ R^k with pdf p_Z is defined by

    H_Z = −E{log[p_Z(Z)]} = −∫_{z ∈ R^k} log[p_Z(z)] p_Z(z) dz.    (2)

The entropy quantifies the uncertainty, or lack of information, associated with a random vector. From (1) and (2) it is straightforward to see that the mutual information index (MII) between X and Y can be computed as

    I_{XY} = H_X + H_Y − H_{XY},    (3)

where H_{XY}, H_X and H_Y are the joint and marginal entropies of (X, Y), X and Y, respectively.
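
As a quick check of (3) in the simplest closed-form setting, the following sketch (assuming numpy is available; the correlation value is arbitrary) computes the MII of a bivariate normal pair from its marginal and joint entropies and compares it with the known value −(1/2) log(1 − ρ²).

```python
import numpy as np

def gaussian_entropy(cov):
    """Entropy (in nats) of a k-dimensional normal with covariance cov."""
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    return 0.5 * (k * (1.0 + np.log(2.0 * np.pi)) + np.log(np.linalg.det(cov)))

rho = 0.7
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])  # joint covariance of (X, Y)

# I_XY = H_X + H_Y - H_XY, eq. (3)
I_XY = (gaussian_entropy(Sigma[:1, :1]) + gaussian_entropy(Sigma[1:, 1:])
        - gaussian_entropy(Sigma))
print(I_XY, -0.5 * np.log(1.0 - rho**2))  # both give the same value
```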

Location-scale models

Let p_Z(z) = |Ω|^{−1/2} p_0(Ω^{−1/2}(z − ξ)) be a location-scale pdf, where ξ ∈ R^k is the location vector and Ω ∈ R^{k×k} is the dispersion/scale matrix. Let also Z_0 = Ω^{−1/2}(Z − ξ) be a standardized version of Z, whose standardized pdf p_0(z_0) does not depend on (ξ, Ω). Then

    H_Z = (1/2) log|Ω| + H_0,    (4)

where H_0 = −E{log[p_0(Z_0)]} is the entropy of the standardized vector Z_0 ∼ p_0.
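
Equation (4) can be verified numerically in the normal family, where both sides are available in closed form. Below is a minimal sketch, assuming scipy; the location and dispersion values are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

k = 3
xi = np.array([1.0, -2.0, 0.5])              # location vector
Omega = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.5, 0.2],
                  [0.0, 0.2, 1.0]])          # dispersion/scale matrix

H_Z = multivariate_normal(mean=xi, cov=Omega).entropy()
H_0 = 0.5 * k * (1.0 + np.log(2.0 * np.pi))  # entropy of Z_0 ~ N_k(0, I_k)
print(H_Z, 0.5 * np.log(np.linalg.det(Omega)) + H_0)  # agree, as in (4)
```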


Elliptical models

Let Z ∼ El_k(ξ, Ω, h^{(k)}) be a k-dimensional elliptical random vector in R^k, with location vector ξ ∈ R^k, dispersion matrix Ω ∈ R^{k×k} and density generator function h^{(k)}, whose pdf is

    p_Z(z) ≡ f_k(z; ξ, Ω, h^{(k)}) = |Ω|^{−1/2} h^{(k)}((z − ξ)^T Ω^{−1}(z − ξ)),    z ∈ R^k.

For this class the standardized random vector Z_0 = Ω^{−1/2}(Z − ξ) has a spherical pdf given by p_0(z_0) = h^{(k)}(z_0^T z_0), z_0 ∈ R^k, for which H_0 = −E{log h^{(k)}(S)}, where S = Z_0^T Z_0 = (Z − ξ)^T Ω^{−1}(Z − ξ) is the radial random variable induced by the generator h^{(k)}. Thus, the entropy H_Z of Z ∼ El_k(ξ, Ω, h^{(k)}) is computed from (4).

The Normal Case

Let Z ∼ N_k(µ, Σ) denote a k-dimensional normal random vector, with mean vector µ and covariance matrix Σ. Note that Z = µ + Σ^{1/2} Z_0, where Z_0 ∼ N_k(0, I_k). The pdf of Z_0 is p_0(z_0) = φ_k(z_0) = (2π)^{−k/2} exp{−(1/2) z_0^T z_0}. Thus, since in this case S = Z_0^T Z_0 ∼ χ²_k, so that E(S) = k, we have H_0 = (k/2) log(2π) + (1/2) E(S) = (k/2)[1 + log(2π)]. The mutual normal information (MNI) between X and Y is

    I^N_{XY}(Σ) = (1/2) log(|Σ_{XX}| / |Σ_{XX·Y}|) = (1/2) log(|Σ_{YY}| / |Σ_{YY·X}|),

where Σ_{XX·Y} = Σ_{XX} − Σ_{XY} Σ_{YY}^{−1} Σ_{YX} and Σ_{YY·X} = Σ_{YY} − Σ_{YX} Σ_{XX}^{−1} Σ_{XY} are the conditional covariance matrices of X|Y and Y|X, respectively.
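
The MNI is computable directly from the blocks of Σ. Below is a minimal sketch, assuming numpy; the function name mni and the example covariance are illustrative.

```python
import numpy as np

def mni(Sigma, idx_x, idx_y):
    """Mutual normal information I^N_XY = (1/2) log(|Sigma_XX| / |Sigma_XX.Y|)."""
    Sxx = Sigma[np.ix_(idx_x, idx_x)]
    Syy = Sigma[np.ix_(idx_y, idx_y)]
    Sxy = Sigma[np.ix_(idx_x, idx_y)]
    Sxx_y = Sxx - Sxy @ np.linalg.solve(Syy, Sxy.T)   # conditional cov of X | Y
    return 0.5 * (np.linalg.slogdet(Sxx)[1] - np.linalg.slogdet(Sxx_y)[1])

Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
print(mni(Sigma, [0, 1], [2]), mni(Sigma, [2], [0, 1]))  # equal, by symmetry
```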

The t Case

Let Z ∼ T_k(ξ, Ω, ν) denote a k-dimensional t random vector with location vector ξ, dispersion matrix Ω and ν degrees of freedom. For this case, we have Z_0 = Ω^{−1/2}(Z − ξ) ∼ T_k(0, I_k, ν) and Z_0^T Z_0 ∼ k F(k, ν). Thus, considering that

    log h^{(k)}(s) = log Γ((ν+k)/2) − log Γ(ν/2) − (k/2) log(νπ) − ((ν+k)/2) log(1 + s/ν),

we have H_0 = −E[log h^{(k)}(S)], where S ∼ k F(k, ν). Using now the well-known fact that S =_d k(S_1/k)/(S_2/ν) (where =_d denotes equality in distribution), with S_1 ∼ χ²_k and S_2 ∼ χ²_ν independent, and hence S_1 + S_2 ∼ χ²_{k+ν}, it is straightforward to see that

    E[log(1 + S/ν)] = E[log(S_1 + S_2)] − E[log S_2] = ψ((ν+k)/2) − ψ(ν/2),

where ψ(x) is the digamma function. We find for the entropy of Z_0 ∼ T_k(0, I_k, ν) that

    H_0^t = −log[Γ((ν+k)/2) / (Γ(ν/2) (νπ)^{k/2})] + ((ν+k)/2)[ψ((ν+k)/2) − ψ(ν/2)].    (5)
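
Formula (5) involves only gamma and digamma functions, so it is cheap to evaluate. The sketch below (assuming scipy) implements it and, for k = 1, cross-checks the result against direct numerical integration of −p log p.

```python
import numpy as np
from scipy.special import gammaln, digamma
from scipy.stats import t as student_t
from scipy.integrate import quad

def t_entropy(k, nu):
    """H_0^t of the standardized k-variate t, formula (5)."""
    a, b = 0.5 * (nu + k), 0.5 * nu
    return (-(gammaln(a) - gammaln(b) - 0.5 * k * np.log(nu * np.pi))
            + a * (digamma(a) - digamma(b)))

nu = 5
direct = quad(lambda z: -student_t.pdf(z, nu) * student_t.logpdf(z, nu),
              -np.inf, np.inf)[0]
print(t_entropy(1, nu), direct)  # agree in the univariate case
```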

Skew Elliptical models

We say that a random vector Z ∈ R^k has a skew-elliptical distribution, with location vector ξ ∈ R^k, dispersion matrix Ω ∈ R^{k×k}, shape/skewness parameter η ∈ R^k and density generator function h^{(k+1)}, which is denoted by Z ∼ SEl_k(ξ, Ω, η, h^{(k+1)}), if its pdf is

    p_Z(z) = 2 f_k(z; ξ, Ω, h^{(k)}) F(η^T(z − ξ); h_s^{(1)}),    z ∈ R^k.    (6)

Here f_k(z; ξ, Ω, h^{(k)}) = |Ω|^{−1/2} h^{(k)}(s), where s = z_0^T z_0 and z_0 = Ω^{−1/2}(z − ξ), is the El_k(ξ, Ω, h^{(k)}) pdf, and F(x; h_s^{(1)}) = ∫_{−∞}^{x} h_s^{(1)}(w) dw is the univariate cdf induced by the conditional density generator function h_s^{(1)}(u) = h^{(k+1)}(s + u)/h^{(k)}(s). Let H_0^{El} be the entropy of the spherical El_k(0, I_k, h^{(k)}) distribution. Then, the entropy of Z_0 ∼ SEl_k(0, I_k, η̄, h^{(k+1)}) is

    H_0^{SEl} = H_0^{El} − E[log 2F(η̄^T Z_0; h_S^{(1)})],

where S = Z_0^T Z_0 and η̄ = Ω^{1/2} η.
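
To make the correction term E[log 2F(η̄^T Z_0; h_S^{(1)})] concrete, here is a Monte Carlo sketch of its simplest instance: k = 1 with the normal generator, so that F = Φ and Z_0 reduces to a univariate skew-normal (scipy's skewnorm). The skewness value, sample size and seed are arbitrary.

```python
import numpy as np
from scipy.stats import skewnorm, norm

alpha = 3.0                                   # skewness parameter
z0 = skewnorm.rvs(alpha, size=200_000, random_state=np.random.default_rng(1))

# correction term E[log 2*Phi(alpha * Z_0)], with Z_0 ~ SN(alpha)
correction = np.mean(np.log(2.0 * norm.cdf(alpha * z0)))

H_0N = 0.5 * (1.0 + np.log(2.0 * np.pi))      # standard-normal entropy
H_0SN = H_0N - correction                     # H_0^SEl with the normal generator
print(H_0SN, -np.mean(skewnorm.logpdf(z0, alpha)))  # the two MC estimates agree
```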

The Skew-Normal case

Let Z ∼ SN_k(ξ, Ω, η) and Z_0 = Ω^{−1/2}(Z − ξ). Let also Z_0^N ∼ N_k(0, I_k) and W_0 ∼ SN(‖η̄‖), where ‖η̄‖ = (η^T Ω η)^{1/2}. Then Z_0 ∼ SN_k(0, I_k, η̄) ≡ SN_k(η̄), η^T(Z − ξ) =_d η̄^T Z_0 =_d ‖η̄‖ W_0, and g(Z_0) =_d g(Z_0^N) for any even function g. The entropy of a skew-normal random vector Z_0 ∼ SN_k(η̄) is H_0^{SN} = H_0^N − E{log[2Φ(‖η̄‖ W_0)]}, where H_0^N is the entropy of Z_0^N. The MII for (X, Y) is

    I^{SN}_{XY} = I^N_{XY}(Ω) + E{log[V_{XY} / (V_{X(Y)} V_{Y(X)})]},

where V_{XY} = 2Φ(‖η̄_{XY}‖ W_{XY}), V_{X(Y)} = 2Φ(‖η̄_{X(Y)}‖ W_{X(Y)}) and V_{Y(X)} = 2Φ(‖η̄_{Y(X)}‖ W_{Y(X)}). Let W_{SN} ∼ SN(α) and W_N ∼ N(0, 1); then E{log[2Φ(α W_{SN})]} = E{2Φ(α W_N) log[2Φ(α W_N)]}.

The Skew-t case

For this model, we have Z_0 ∼ ST_k(0, I_k, η̄, ν), with pdf

    p_0(z_0) = 2 t_k(z_0; ν) T(((ν+k)/(ν + ‖z_0‖²))^{1/2} η̄^T z_0; ν+k),

and so

    H_0^{ST} = H_0^t − E{log[2T(((ν+k)/(ν + ‖Z_0‖²))^{1/2} η̄^T Z_0; ν+k)]},    (7)

where H_0^t is given by formula (5). To compute the last factor of the ST-entropy (7) by integrating in one dimension only, we need the following result, whose proof is given by Arellano-Valle (2010). Let Z_0 ∼ ST_k(η̄, ν). Then

    ((ν+k)/(ν + ‖Z_0‖²))^{1/2} η̄^T Z_0 =_d (ν+k)^{1/2} ‖η̄‖ W_{ST} / (ν + k − 1 + W_{ST}²)^{1/2},    (8)

where W_{ST} ∼ ST(‖η̄‖, ν + k − 1). So, we have

    E{log[2T(((ν+k)/(ν + ‖Z_0‖²))^{1/2} η̄^T Z_0; ν+k)]} = E{log[2T((ν+k)^{1/2} ‖η̄‖ W_{ST} / (ν + k − 1 + W_{ST}²)^{1/2}; ν+k)]}.

Numerical Simulation

Skew-t and skew-normal entropies were computed by the QUADPACK numerical method for k = 1, Ω = I_1, α ∈ [0.1, 50] and ν = 1, 2, ..., 10, 20, ..., 50, 55, 75, ..., 185, to illustrate the entropies

    H_X^{SN} = (1/2)[1 + log(2π)] − E{log[2Φ(α W_X)]},
    H_X^{ST} = H_0^t − E{log[2T(((ν+1)/(ν + W_X²))^{1/2} α W_X; ν+1)]}.

The QUADPACK method is more exact, and more efficient in computational time, than alternative methods such as Monte Carlo (MC). The numerical implementation suggests the convergence of the skew-normal and skew-t entropies.
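
Below is a sketch of the one-dimensional integrals above, assuming scipy, whose integrate.quad routine wraps QUADPACK. Since scipy has no built-in skew-t law, its density is written out from the definition, and a floor inside the logarithms guards against underflow in the tails; the helper names are illustrative.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm, t as student_t

def log_floor(x):
    return np.log(np.maximum(x, 1e-300))          # guard against log(0)

def H_sn(alpha):
    """k = 1 SN entropy, via E{log 2Phi(a W_SN)} = E{2Phi(a W_N) log 2Phi(a W_N)}."""
    f = lambda w: (norm.pdf(w) * 2.0 * norm.cdf(alpha * w)
                   * log_floor(2.0 * norm.cdf(alpha * w)))
    return 0.5 * (1.0 + np.log(2.0 * np.pi)) - integrate.quad(f, -np.inf, np.inf)[0]

def st_pdf(w, alpha, nu):
    """Univariate skew-t density ST(alpha, nu)."""
    return 2.0 * student_t.pdf(w, nu) * student_t.cdf(
        alpha * w * np.sqrt((nu + 1.0) / (nu + w**2)), nu + 1.0)

def H_st(alpha, nu):
    """k = 1 ST entropy: H_0^t minus the expectation reduced to 1-D via (8)."""
    H0t = integrate.quad(lambda z: -student_t.pdf(z, nu) * student_t.logpdf(z, nu),
                         -np.inf, np.inf)[0]      # univariate H_0^t, cf. (5)
    g = lambda w: st_pdf(w, alpha, nu) * log_floor(2.0 * student_t.cdf(
        np.sqrt(nu + 1.0) * alpha * w / np.sqrt(nu + w**2), nu + 1.0))
    return H0t - integrate.quad(g, -np.inf, np.inf)[0]

print(H_sn(2.0), H_st(2.0, 150))  # ST entropy approaches the SN entropy as nu grows
```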

Application

A practical illustration is provided in this section for a subset of time series of ozone concentrations recorded at seven stations, F, L, M, N, O, P, Q, with n = 7 × 24 × 31 hourly observations from March 2006. We define the moving-average smoother

    MA_s := T_{t,j}^s = (1/s) Σ_{i=t−s}^{t} y_{i,j},

with seasonal parameter s, for station j at time t. We consider a multivariate data set XY containing the 7 stations; at each iteration s we choose one station Y, playing the role of a non-monitored station, to be removed from XY, leaving a subset X of 6 monitoring stations, and compute the I_{XY} index for multivariate normal, skew-normal and skew-t variables. Following Azzalini (1999), for a sample of independent observations z_i ∼ SN_k(ξ_i, Ω, η), i = 1, ..., n, we estimate the parameter set by numerically maximizing the likelihood function

    log L(Θ_SN) = −(n/2) log|Ω| − (n/2) tr(Ω^{−1}V) + Σ_{i=1}^{n} log Φ[η^T(z_i − ξ_i)],

where Θ_SN = {ξ, Ω, η} and V = (1/n) Σ_{i=1}^{n} [z_i − ξ_i][z_i − ξ_i]^T. Now, if z_i ∼ ST_k(ξ_i, Ω, η, ν), we use the reparameterization and log-likelihood of Azzalini & Capitanio (2003). Let Ω = (A^T D A)^{−1}, where A is an upper triangular k × k matrix with diagonal terms equal to 1, D = diag(e^{−2ρ}) and ρ ∈ R^k. So, for the parameter set Θ_ST = {ξ, A, ρ, η, log ν}, we obtain

    log L(Θ_ST) = n log 2 + (n/2) log|D| + Σ_{i=1}^{n} log t_k(z_i − ξ_i; ν) + Σ_{i=1}^{n} log T(η^T(z_i − ξ_i) ((ν+k)/(ν + s_i))^{1/2}; ν+k),

where s_i = (z_i − ξ_i)^T Ω^{−1}(z_i − ξ_i). So, from Θ̂_ST = arg max_{Θ_ST} log L(Θ_ST), we can obtain the MLE {ξ̂, Ω̂, η̂, ν̂} of {ξ, Ω, η, ν}.

Figure: Plots of a) I^{SN}_{XY} and b) I^{ST}_{XY} when the station Y = F, L, M, N, O, P or Q is removed from the network X. The vertical dotted gray lines correspond to s = {8, 16, 24, 32, 40}.
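
A schematic of the station-removal loop described above. The array ozone is a synthetic stand-in for the hourly data, and gaussian_mii implements only the normal index from the empirical covariance via (3); the fitted skew-normal and skew-t indices would replace it in the actual analysis.

```python
import numpy as np

def moving_average(y, s):
    """Column-wise moving average with window s (a simplification of MA_s)."""
    kernel = np.ones(s) / s
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="valid"), 0, y)

def gaussian_mii(X, Y):
    """Gaussian MII from the empirical covariance, via eq. (3)."""
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    p = X.shape[1]
    ldet = lambda M: np.linalg.slogdet(M)[1]
    return 0.5 * (ldet(S[:p, :p]) + ldet(S[p:, p:]) - ldet(S))

rng = np.random.default_rng(0)
ozone = rng.gamma(2.0, 10.0, size=(24 * 31, 7))   # synthetic stand-in data

stations = list("FLMNOPQ")
for s in (8, 16, 24, 32, 40):
    smoothed = moving_average(ozone, s)
    for j, name in enumerate(stations):
        Y = smoothed[:, [j]]                  # candidate station to remove
        X = np.delete(smoothed, j, axis=1)    # remaining six stations
        print(s, name, gaussian_mii(X, Y))
```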

Some conclusions

We have proposed an alternative way to compute Shannon's information index for data with a high presence of skewness. The calculation of this index yields an expression similar to the normal and t cases, except for a new term represented by a one-dimensional integral that is easily and quickly computed by standard numerical methods. Moreover, a simulation study shows the convergence of this integral and, in fact, of the skew-normal and skew-t mutual information indices. Finally, an analysis of an optimal network design for a classical pollutant is carried out, where the principal objective is to choose the network design optimally through the Shannon index maximization methods established by several authors. In this work, we provide the tools to compute this new information index and standardize the methodology for further research.