Journal of the Korean Statistical Society 45 (2016) 422–438


Elliptical regression models for multivariate sample-selection bias correction

Hea-Jung Kim^a, Hyoung-Moon Kim^b,*

^a Department of Statistics, Dongguk University—Seoul, Seoul, Republic of Korea
^b Department of Applied Statistics, Konkuk University, Seoul, Republic of Korea

Article history: Received 18 May 2015; Accepted 11 January 2016; Available online 3 February 2016

Abstract

In linear regression, a multivariate sample-selection scheme often applies to the dependent variable, resulting in missing observations on that variable. This induces sample-selection bias: a standard regression analysis using only the selected cases leads to biased results. To solve the bias problem, we propose in this paper a class of multivariate selection regression models that extends the classic Heckman model to allow for a multivariate sample-selection scheme and for robustness against departures from normality. We derive the theory needed to build a formal bias correction procedure based on the proposed model, and we provide an efficient estimation method. Simulation results and a real data example demonstrate the performance of the estimation method and the practical usefulness of the multivariate sample-selection models. © 2016 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.

AMS 2000 subject classifications: primary 62J99; secondary 60E05
Keywords: Bias correction; Heckman model; MCECM algorithm; Multivariate sample-selection regression; Rectangle-screened scale mixture of normal

1. Introduction

Many authors working with regression analysis have considered the problem of potentially biased estimates arising from the selection process that generated the sample (e.g., Azzalini & Capitanio, 1999; Heckman, 1974; Hevia & Arrazola, 2009; Marchenko & Genton, 2012). This problem naturally occurs when observations of the dependent variable of the regression model are missing not at random (MNAR; Rubin, 1976) owing to a sample-selection rule such as incidental truncation, hidden truncation, or censoring (e.g., Greene, 2008). That is, an observation of the dependent variable $y_i^* \in \mathbb{R}$ in the regression model is observed as $y_i = y_i^*$ only when a corresponding latent variable $u_i^* \in \mathbb{R}$ belongs to an interval $C \subset \mathbb{R}$ of its support. For the case of an unbounded interval $C \equiv (0, \infty)$, Heckman (1974) introduced the classic sample-selection model as follows:
$$y_i^* = \mathbf{x}_i^\top \boldsymbol{\beta} + \epsilon_i, \quad i = 1, \ldots, N, \qquad (1)$$

with a single-selection equation
$$u_i^* = \mathbf{w}_i^\top \boldsymbol{\gamma} + \eta_i, \quad i = 1, \ldots, N, \qquad (2)$$

such that the observations on $y_i^*$ are missing for all cases in which $u_i^* \le 0$. Here the vectors $\boldsymbol{\beta} \in \mathbb{R}^q$ and $\boldsymbol{\gamma} \in \mathbb{R}^r$ are unknown parameters, $\mathbf{x}_i \in \mathbb{R}^q$, $\mathbf{w}_i \in \mathbb{R}^r$, and the joint distribution of the error terms $\epsilon_i$ and $\eta_i$ is assumed to follow

$$\begin{pmatrix} \epsilon_i \\ \eta_i \end{pmatrix} \sim N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right). \qquad (3)$$

* Corresponding author. E-mail address: [email protected] (H.-M. Kim).
http://dx.doi.org/10.1016/j.jkss.2016.01.003


Under the ‘‘Heckman model’’ (also known as the ‘‘Type 2 tobit model’’ considered by Amemiya, 1985), our interest is to estimate $\boldsymbol{\beta}$ using the observed values of the explanatory variables for the whole sample, together with the $N_1$ of the $N$ observations of $y_i^*$ that satisfy the univariate selection (or single-selection) rule. Since we only observe the $N_1$ observations $y_i = y_i^*$ for which $s_i = 1$, with $s_i = I(u_i^* \in C)$, the regression function for the Heckman model reduces to
$$E[y_i \mid \mathbf{x}_i, \mathbf{w}_i] = \mathbf{x}_i^\top \boldsymbol{\beta} + \rho\sigma\, \lambda(\mathbf{w}_i^\top \boldsymbol{\gamma}), \quad i = 1, \ldots, N_1 < N, \qquad (4)$$

where λ(·) = φ(·)/Φ(·), and φ(·) and Φ(·) denote the standard normal probability density and distribution functions, respectively (e.g., Catsiapis & Robinson, 1982). Hence OLS regression of (1) on the selected sample, ignoring the extra term ρσλ(w_i^⊤γ) in (4), yields a biased estimator of β whenever ρ ≠ 0. As a result, various bias correction procedures for consistent estimation of β in the regression model (1), as well as their applications, have been dealt with extensively in the econometrics literature. See Griliches, Hall, and Hausman (1977) and Heckman (1979) for the estimation, Das, Newey, and Vella (2003) and Newey (1999) for some robust estimation methods, and Hevia and Arrazola (2008) and Hoffmann and Kassouf (2005) for some empirical applications.

With regard to variants of the Heckman model, Marchenko and Genton (2012) used the heavy-tailed bivariate t-distribution for the error model (3) to relax the assumption of normality, and Catsiapis and Robinson (1982) considered a bias correction procedure for the regression model involving multiple independent selection rules. The model has also been generalized to the case of bivariate selection rules (i.e., two correlated selection rules) and applied to the empirical analysis of different phenomena. Mohanty (2001) demonstrated the bivariate selection model in analyzing male–female wage differences; Serumaga-Zake and Naudé (2003) used a bivariate selection scheme to estimate private returns to education in South Africa; and Hevia and Arrazola (2009) considered the marginal effects in the bivariate selection model on wages in Spain.

In practical situations, however, the sample-selection process often comprises multiple selection rules. A typical example is customer segmentation analysis, where customers are grouped by multiple selection rules that segment the level of consumer attitudes, motivations, patterns of usage, and preferences (Jiang & Tuzhilin, 2006; Marcus, 1998). The analysis is one of the standard techniques used by marketers of insurance agencies, credit card companies, manufacturers, and so on. As seen in Section 5, its popularity comes from the fact that segmented models usually outperform aggregated models of customer behavior (Besanko, Dube, & Gupta, 2000). In the segmented model, the observed y_i's may be regarded as outcomes of a p-variate selection scheme in (2) with p bounded intervals C_j^* ≡ (a_j, b_j), such that we only observe y_i = y_i^* for s_i = 1, with s_i = I(u_{ij}^* ∈ C_j^*, j = 1, ..., p), in the sample-selection model (1) with a non-normal error distribution in (3).

Unfortunately, even for the case of p = 1, the existing bias correction procedures, i.e., the Heckman model and its variants, no longer apply when the sample-selection rule is defined by a bounded interval C^* (i.e., C_1^*), because the extra term in the regression function (4) takes a different form. In particular, the first step of the two-step estimation method of Heckman (1979) is not applicable to the case of a bounded interval C^*, because that step estimates γ in (4) with a probit model fitted to the binary responses s_i defined by the open interval C. Moreover, surprisingly, a sample-selection bias correction method for a model with a selection rule using C^* has not been tackled in the literature. These considerations motivated the present paper.
In this paper, we develop and study the properties of a class of sample-selection models that extends the classical Heckman model to allow for a p-variate selection scheme (i.e., a multivariate selection model) consisting of various bounded (C^*) or unbounded (C) intervals, and for robustness against departures from normality of the error distributions. We then propose a sample-selection bias correction procedure that yields consistent estimation of β in the multivariate selection model.

The rest of this paper is organized as follows. Section 2 generalizes the Heckman model to a class of multivariate selection models whose robustness to non-normality comes from assuming a family of elliptical distributions for the error distribution in (3). We then study some interesting properties of models in the class, such as the conditional distribution of a selected observation y_i; a hierarchical representation of the distribution of y_i; moments of y_i; and the exact likelihood function of the model. In Section 3, we describe some special models belonging to the class and study their relationship with the Heckman model. Section 4 constructs a bias correction procedure that uses an extended EM algorithm for estimating the multivariate selection models. A procedure for testing the existence of selection bias in the proposed model is also given. The finite-sample performance of the algorithm is examined through a limited but informative simulation study in Section 5, which also gives a real data example demonstrating the performance of the sample-selection bias correction procedure under the proposed multivariate selection models. The paper concludes with a discussion in Section 6.

2. Multivariate selection regression model

In this section, we extend the classical Heckman regression model to a class of multivariate selection regression models (MSRM) whose selection rules are composed of various bounded or unbounded intervals. The primary interest of the regression model is the same as in (1), but the selection equation (2) now becomes a multivariate version. In addition, the distributional assumption of the normal error model in (3) is relaxed by using a flexible elliptically contoured error model.


2.1. The class of MSRM

Consider a random sample of N observations in the MSRM, where the regression equation for individual i may be written as
$$y_i^* = \mathbf{x}_i^\top \boldsymbol{\beta} + \epsilon_i, \quad i = 1, \ldots, N, \qquad (5)$$

with p (p > 1) selection equations,
$$u_{ij}^* = \mathbf{w}_{ij}^\top \boldsymbol{\gamma}_j + \eta_{ij}, \quad i = 1, \ldots, N \text{ and } j = 1, \ldots, p, \qquad (6)$$

where the joint distribution of the random errors follows an elliptically contoured (EC) distribution, given by
$$\mathbf{e}_i = \begin{pmatrix} \epsilon_i \\ \boldsymbol{\eta}_i \end{pmatrix} \sim EC_\ell\left( \boldsymbol{\psi} = \begin{pmatrix} 0 \\ \mathbf{0} \end{pmatrix}, \; \Omega = \begin{pmatrix} \sigma^2 & \sigma\boldsymbol{\rho}^\top \\ \sigma\boldsymbol{\rho} & \Omega_\eta \end{pmatrix}, \; h^{(\ell)} \right), \qquad (7)$$
with joint density
$$f_\ell(\mathbf{e}_i \mid \boldsymbol{\psi}, \Omega, h^{(\ell)}) = |\Omega|^{-1/2}\, h^{(\ell)}\big[(\mathbf{e}_i - \boldsymbol{\psi})^\top \Omega^{-1} (\mathbf{e}_i - \boldsymbol{\psi})\big], \quad \mathbf{e}_i \in \mathbb{R}^\ell. \qquad (8)$$

Here the e_i are independent, ℓ = p + 1, ψ ∈ R^ℓ is a location vector, Ω is an ℓ × ℓ scale matrix, and h^{(ℓ)} is the density generator function h^{(ℓ)}(ω), for ω ≥ 0. In addition, Ω_η is the correlation matrix of η_i = (η_{i1}, ..., η_{ip})^⊤, and ρ = (ρ_1, ..., ρ_p)^⊤, with ρ_j ∈ (−1, 1) for j = 1, ..., p, determines the severity of the selection process. See Fang, Kotz, and Ng (1990) for the properties of the EC distribution.

Suppose we observe only N_1 of the N observations of y_i^*, namely those with s_i = 1, from the model given in (5). We also observe the indicator s_i and the other explanatory variables, x_i and w_{ij}, for the whole sample, where
$$s_i = I(u_{ij} = 1, \; \forall j = 1, \ldots, p), \qquad u_{ij} = I(a_j < u_{ij}^* < b_j), \qquad (9)$$
with p bounded intervals. The multivariate selection model is then composed of the continuous component g(y_i | s_i = 1) and the discrete component P(S_i = s_i), where y_i = y_i^* s_i. From the MSRM, we see that the joint distribution of e_i^* = (y_i, η_i^⊤)^⊤ is EC_ℓ(ψ_i^*, Ω, h^{(ℓ)}) and that the conditional distribution satisfies [y_i | s_i = 1] =^d [y_i | η_i ∈ d_i], where ψ_i^* = (x_i^⊤β, 0^⊤)^⊤, d_i = (a − c_i, b − c_i) is a rectangle in R^p, a = (a_1, ..., a_p)^⊤, b = (b_1, ..., b_p)^⊤, and c_i = (w_{i1}^⊤γ_1, ..., w_{ip}^⊤γ_p)^⊤. This conditional distribution is essentially a member of the class of skew-elliptically contoured (SEC) distributions discussed by Arellano-Valle, Branco, and Genton (2006). If we denote the distribution law of [y_i | η_i ∈ d_i] by SEC(d_i; ψ^*, Ω, h^{(ℓ)}), a direct application of Eq. (9) in Arellano-Valle et al. (2006) leads to the following lemma.

Lemma 1. The conditional distribution of the observed y_i, defined by the MSRM, is a skew-elliptically contoured (SEC) distribution,
$$[y_i \mid s_i = 1] \sim SEC(d_i; \boldsymbol{\psi}_i^*, \Omega, h^{(\ell)}), \quad i = 1, \ldots, N, \qquad (10)$$
with density
$$f(y_i \mid s_i = 1) = f_1\big(y_i \mid \mathbf{x}_i^\top\boldsymbol{\beta}, \sigma^2, h^{(1)}\big)\, \frac{\bar F_{h^{(p)}_{z(y_i)}}\big(d_i; \boldsymbol{\psi}_{\eta_i|y_i}, \Omega_{\eta_i|y_i}\big)}{\bar F_{h^{(p)}}\big(d_i; \mathbf{0}, \Omega_\eta\big)} \qquad (11)$$
for y_i ∈ R, where f_1(y_i; ·) is the pdf of the EC_1(x_i^⊤β, σ², h^{(1)}) distribution, and $\bar F_{h^{(p)}_{z(y_i)}}(\cdot)$ and $\bar F_{h^{(p)}}(\cdot)$ are the respective probabilities of the p-dimensional rectangle region d_i under the EC_p(ψ_{η_i|y_i}, Ω_{η_i|y_i}, h^{(p)}_{z(y_i)}) and EC_p(0, Ω_η, h^{(p)}) distributions. Here, ψ_{η_i|y_i} = ρ(y_i − x_i^⊤β)/σ, Ω_{η_i|y_i} = Ω_η − ρρ^⊤, h^{(p)}_{z(y_i)}(u) = h^{(ℓ)}(u + z(y_i))/h^{(1)}(z(y_i)) is the induced conditional generator, and z(y_i) = (y_i − x_i^⊤β)²/σ².

According to Lemma 1, the log-likelihood function for the MSRM based on the N pairs of observations (y_i, s_i), i = 1, ..., N, can be written as
$$l(\boldsymbol{\beta}, \boldsymbol{\gamma}, \sigma, \boldsymbol{\rho}, \Omega_\eta; \mathbf{y}, \mathbf{s}) = \sum_{i=1}^N l(\boldsymbol{\beta}, \boldsymbol{\gamma}, \sigma, \boldsymbol{\rho}, \Omega_\eta; y_i, s_i)$$
$$= \sum_{i=1}^N \left\{ s_i \left[ \ln \bar F_{h^{(p)}_{z(y_i)}}\big(d_i; \boldsymbol{\psi}_{\eta_i|y_i}, \Omega_{\eta_i|y_i}\big) + \ln f_1\big(y_i \mid \mathbf{x}_i^\top\boldsymbol{\beta}, \sigma^2, h^{(1)}\big) \right] + (1 - s_i) \ln\big[1 - \bar F_{h^{(p)}}(d_i; \mathbf{0}, \Omega_\eta)\big] \right\}, \qquad (12)$$
because the discrete component can be written as
$$P(S_i = s_i) = \big[\bar F_{h^{(p)}}(d_i; \mathbf{0}, \Omega_\eta)\big]^{s_i}\, \big[1 - \bar F_{h^{(p)}}(d_i; \mathbf{0}, \Omega_\eta)\big]^{1-s_i},$$
where γ = (γ_1^⊤, ..., γ_p^⊤)^⊤. For the remainder of this paper, we use the symbols in (12) with the same definitions.


The property of the conditional density (11) was recently studied in detail by Kim and Kim (2015) under the special model obtained by choosing the multivariate normal density generator function. Note that the multivariate selection model changes depending on the choice of the density generator function h^{(ℓ)} in the error distribution (7). In the special case where the MSRM consists of a univariate selection rule with p = 1, a = 0, and b = ∞, and the corresponding density generator functions are h^{(2)}(ω) = exp{−ω/2}/(2π) and h^{(2)}(ω) = ν^{(ν+2)/2}(ν + ω)^{−(ν+2)/2}/(2π), the MSRM reduces to the Heckman model proposed by Heckman (1974) and the Heckman selection-t_ν model studied by Marchenko and Genton (2012), respectively. Thus, the MSRM defines a class of flexible multivariate-selection Heckman models, relaxing the classic assumptions of underlying normality and of the unbounded selection rule with a_j = 0 and b_j = ∞ for all j = 1, ..., p (e.g., see Hevia & Arrazola, 2009).

2.2. MSRM based on the scale mixture of normal errors

Consider the case where the ℓ × 1 error vector e_i in (7) is assumed to be a scale mixture of normals, a useful member of the class of EC errors in (7). The density generator function h^{(ℓ)}(ω) for the ℓ-variate scale mixture of normals is
$$h^{(\ell)}(\omega) = \int_0^\infty (2\pi\kappa(\theta))^{-\ell/2} \exp\{-\omega/2\kappa(\theta)\}\, dH(\theta), \qquad (13)$$

where θ is a mixing variable with distribution function H(θ) for θ > 0, and κ(θ) is a weight function. Therefore, the MSRM with a scale mixture of normal errors (denoted by MSRM_SMN) is defined by (5) through (9), with the density generator function (13). In other words, under the MSRM_SMN, the distribution of e_i = (ϵ_i, η_i^⊤)^⊤ is G ∈ F, where
$$\mathcal{F} = \left\{ G : N_\ell(\boldsymbol{\psi}, \kappa(\theta)\Omega), \; \theta \sim H(\theta) \text{ with } \kappa(\theta) > 0, \text{ and } \int_0^\infty dH(\theta) = 1 \right\} \qquad (14)$$

and ℓ = 1 + p. Note that F, as defined by (14), denotes the class of scale mixture of normal (SMN) distributions, which will be denoted by SMN_ℓ(ψ, Ω, κ, H). See Chen and Dey (1998) for various mixture distributions. In this model, ρ = 0 does not imply the independence of ϵ_i and η_i (i.e., the independence of the regression equation (5) and the selection equations (6)), unless κ(θ) = 1. When the MSRM_SMN is considered for the regression, Lemma 1 leads to the distribution of the observed y_i, i = 1, ..., N.

Corollary 1. Under the MSRM_SMN, the distribution of a sample-selected observation is [y_i | s_i = 1] ∼ SSMN(d_i; ψ_i^*, Ω, κ, H), a skewed scale mixture of normal distribution with density
$$f_{SSMN}(y_i \mid s_i = 1) = \frac{1}{Q_{s_i}} \int_0^\infty \frac{1}{\kappa(\theta)^{1/2}\sigma}\, \phi\!\left(\frac{y_i - \mathbf{x}_i^\top\boldsymbol{\beta}}{\kappa(\theta)^{1/2}\sigma}\right) \bar\Phi_p^*(i)\, dH(\theta), \qquad (15)$$
where $Q_{s_i} = \int_0^\infty \bar\Phi_p\big(d_i; \mathbf{0}, \kappa(\theta)\Omega_\eta\big)\, dH(\theta)$, φ(·) is the pdf of N(0, 1), and $\bar\Phi_p\big(d_i; \mathbf{0}, \kappa(\theta)\Omega_\eta\big)$ and $\bar\Phi_p^*(i) = \bar\Phi_p\big(d_i; \boldsymbol{\rho}(y_i - \mathbf{x}_i^\top\boldsymbol{\beta})/\sigma, \kappa(\theta)(\Omega_\eta - \boldsymbol{\rho}\boldsymbol{\rho}^\top)\big)$ are the respective probabilities of the p-dimensional rectangle region d_i under the N_p(0, κ(θ)Ω_η) and N_p(ρ(y_i − x_i^⊤β)/σ, κ(θ)(Ω_η − ρρ^⊤)) distributions.

Proof. The proof is given in Appendix A.1. □

If the distribution H(θ) degenerates at κ(θ) = 1, the density (15) reduces to that of ESN_1(d_i; ψ_i^*, Ω), the extended skew-normal distribution studied by Arellano-Valle et al. (2006) and Kim and Kim (2015). A hierarchical representation of the distribution of an observed y_i, namely [y_i | s_i = 1] ∼ SSMN(d_i; ψ_i^*, Ω, κ, H), is given by the following lemma.

Lemma 2. Let U_{d_i} = (η_i | η_i ∈ d_i), where the conditional distribution of η_i given θ is [η_i | θ] ∼ N_p(0, κ(θ)Ω_η). Then a three-stage hierarchical representation of the MSRM_SMN, defined by the [y_i | s_i = 1] ∼ SSMN(d_i; ψ_i^*, Ω, κ, H) distribution, is
$$\big[y_i \mid U_{d_i} = \mathbf{u}_{d_i}, \theta\big] \sim N\big(\mathbf{x}_i^\top\boldsymbol{\beta} + \sigma\boldsymbol{\rho}^\top\Omega_\eta^{-1}\mathbf{u}_{d_i}, \; \kappa(\theta)\sigma^2(1 - \boldsymbol{\rho}^\top\Omega_\eta^{-1}\boldsymbol{\rho})\big),$$
$$\big[U_{d_i} \mid \theta\big] \sim TN_p^{d_i}\big(\mathbf{0}, \kappa(\theta)\Omega_\eta\big), \qquad (16)$$
$$\theta \sim H(\theta),$$
where TN_p^{d_i}(0, κ(θ)Ω_η) denotes a p-variate truncated normal N_p(0, κ(θ)Ω_η) distribution restricted to the rectangle space {u_{d_i} ∈ d_i}.

Proof. The proof is given in Appendix A.2. □

Lemma 2 provides an efficient method for generating a random variable with distribution SSMN(d_i; ψ_i^*, Ω, κ, H). It also reveals the structure of the class of SSMN distributions and indicates the kind of departure from the scale mixture of normal distribution. In addition, using the lemma, we can clarify the estimation problem associated with the MSRM_SMN as follows.
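To make the sampling route concrete, the following Python sketch draws from SSMN(d_i; ψ_i^*, Ω, κ, H) by running the three stages of Lemma 2 for the Student-t member treated in Section 3 (κ(θ) = 1/θ, θ ∼ Gamma(ν/2, ν/2)). The function name, the naive rejection step used for the truncated-normal stage, and all parameter values are illustrative assumptions, not part of the paper.

```python
import numpy as np

def sample_ssmn(n, xb, sigma, rho, Omega_eta, a, b, nu, rng):
    """Draw n variates from SSMN via the three-stage hierarchy of Lemma 2,
    with kappa(theta) = 1/theta and theta ~ Gamma(nu/2, nu/2) (t errors)."""
    p = len(rho)
    Oinv = np.linalg.inv(Omega_eta)
    cond_var = sigma**2 * (1.0 - rho @ Oinv @ rho)  # sigma^2 (1 - rho' O^-1 rho)
    ys = np.empty(n)
    for i in range(n):
        theta = rng.gamma(nu / 2.0, 2.0 / nu)        # stage 3: mixing variable
        kappa = 1.0 / theta
        # stage 2: eta | theta ~ N_p(0, kappa * Omega_eta) truncated to (a, b),
        # sampled here by naive rejection (adequate when the rectangle is not rare)
        while True:
            u = rng.multivariate_normal(np.zeros(p), kappa * Omega_eta)
            if np.all((a < u) & (u < b)):
                break
        # stage 1: y given U = u and theta is univariate normal
        mean = xb + sigma * (rho @ Oinv @ u)
        ys[i] = rng.normal(mean, np.sqrt(kappa * cond_var))
    return ys

rng = np.random.default_rng(1)
Omega_eta = np.array([[1.0, 0.3], [0.3, 1.0]])
y = sample_ssmn(2000, xb=0.5, sigma=1.0, rho=np.array([0.5, 0.7]),
                Omega_eta=Omega_eta, a=np.array([-1.0, -1.0]),
                b=np.array([2.5, 2.5]), nu=5.0, rng=rng)
print(y.mean())  # selection-shifted mean, displaced away from xb when rho != 0
```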


Theorem 1. The MSRM_SMN reduces to a regression equation for the observed data {y_i} whose regression function is given by
$$E[y_i \mid s_i = 1] = \mathbf{x}_i^\top\boldsymbol{\beta} + \sigma\boldsymbol{\rho}^\top\Omega_\eta^{-1} E[U_{d_i}], \quad i = 1, \ldots, N_1, \; N_1 < N, \qquad (17)$$
where
$$E[U_{d_i}] = \int_0^\infty \int_{d_i} \frac{\mathbf{u}}{Q_{s_i}\,(2\pi)^{p/2}|\kappa(\theta)\Omega_\eta|^{1/2}} \exp\left(-\frac{\mathbf{u}^\top\Omega_\eta^{-1}\mathbf{u}}{2\kappa(\theta)}\right) d\mathbf{u}\, dH(\theta)$$
and $Q_{s_i} = \int_0^\infty \bar\Phi_p\big(d_i; \mathbf{0}, \kappa(\theta)\Omega_\eta\big)\, dH(\theta)$.

Proof. The first and second stages of the hierarchical model in Lemma 2 indicate that [y_i | s_i = 1, θ] =^d x_i^⊤β + σρ^⊤Ω_η^{−1}U_{d_i} + σ(1 − ρ^⊤Ω_η^{−1}ρ)^{1/2}V, where U_{d_i} ∼ TN_p^{d_i}(0, κ(θ)Ω_η) and V ∼ N(0, κ(θ)) are independent conditionally on θ. Since E[E[V | θ]] = 0, we have the result. □

The regression function of the MSRM_SMN in (17) indicates that the inconsistency of the OLS estimator can be explained by the inclusion of the extra term σρ^⊤Ω_η^{−1}E[U_{d_i}], whose sign and magnitude depend on the values of ρ as well as on E[U_{d_i}]. Also note that the selection bias term (i.e., the extra term) vanishes when ρ = 0 or when all the s_i's are equal to one for i = 1, ..., N, i.e., N_1 = N. Ho, Lin, Chen, and Wang (2012) and Manjunath and Wilhelm (2009) give exact expressions for E[U_{d_i}] for important members of the class of SMN distributions, such as the truncated multivariate normal and multivariate t-distributions.

3. Relationship with the Heckman model

In this section, we develop two useful members of the class of MSRM_SMN. We then comment on the link between the two multivariate selection models and the Heckman model, which shows that the proposed class is a multivariate generalization of the univariate selection model proposed by Heckman (1974).

1. Multivariate selection-normal model. One member of the MSRM_SMN is the MSRM with multivariate normal errors (denoted by MSRM_N), for which H(θ) is degenerate, with κ(θ) = 1, in distribution (14). In this case, the density (15) becomes
$$f_N(y_i \mid s_i = 1) = \frac{1}{\sigma}\, \phi\!\left(\frac{y_i - \mathbf{x}_i^\top\boldsymbol{\beta}}{\sigma}\right) \frac{\bar\Phi_p\big(d_i; \boldsymbol{\rho}(y_i - \mathbf{x}_i^\top\boldsymbol{\beta})/\sigma, \; \Omega_\eta - \boldsymbol{\rho}\boldsymbol{\rho}^\top\big)}{\bar\Phi_p\big(d_i; \mathbf{0}, \Omega_\eta\big)}. \qquad (18)$$

This density function is a univariate version of the skew-normal density function considered by Arellano-Valle et al. (2006). Kim and Kim (2015) further investigated the properties and statistical applications of this distribution. We shall write the extended skew-normal distribution as
$$[y_i \mid s_i = 1] \sim ESN_1\big(d_i; \boldsymbol{\psi}_i^*, \Omega\big),$$
and the conditional expectation of the observed data is obtained by setting U_{d_i} ∼ TN_p^{d_i}(0, Ω_η) in Eq. (17). See Manjunath and Wilhelm (2009) for the expectation E[U_{d_i}] of the truncated multivariate normal distribution. In the special case where the MSRM_N consists of an unbounded univariate selection rule, i.e., p = 1, a = 0, and b = ∞, it is straightforward to see that the conditional density (18) and the log-likelihood function of the MSRM_N reduce to those of the Heckman model defined in Section 1 (e.g., see Marchenko & Genton, 2012, p. 305), and the conditional expectation equals that of the Heckman model given in (4) as well. When the MSRM_N with p = 1 is considered (i.e., the classic Heckman model with a bounded interval C^* = (a, b)), E[U_{d_i}] becomes the first moment of a doubly truncated standard normal (see, e.g., Kim, 2008a), whose exact expression is given by
$$\lambda_{(a,b)}(\mathbf{w}_i^\top\boldsymbol{\gamma}) = \frac{\phi(a - \mathbf{w}_i^\top\boldsymbol{\gamma}) - \phi(b - \mathbf{w}_i^\top\boldsymbol{\gamma})}{\Phi(b - \mathbf{w}_i^\top\boldsymbol{\gamma}) - \Phi(a - \mathbf{w}_i^\top\boldsymbol{\gamma})}, \quad i = 1, \ldots, N_1 < N. \qquad (19)$$

Fig. 1 visualizes the value of λ_{(a,b)}(w_i^⊤γ) in the MSRM_N regression function (17) and compares it with λ(w_i^⊤γ) from the classic Heckman model in (4). According to Fig. 1, the value of λ_{(a,b)}(w_i^⊤γ) depends on the choice of the bounded interval C^* and decreases roughly linearly as the value of the selection linear predictor w_i^⊤γ increases. We also see from Fig. 1 that the magnitude |λ_{(a,b)}(w_i^⊤γ)| becomes larger as the possibility of sample selection becomes lower (or, equivalently, as the absolute value of w_i^⊤γ becomes larger). Therefore, the value of the extra term in the regression function (17) is not a monotone function of ρ. These properties of the MSRM_N differ from those of the Heckman model (4), where λ(w_i^⊤γ) is a monotone decreasing positive function that tends to zero as w_i^⊤γ increases, so that the extra term in (4) is a monotone function of ρ. The marginal effects of the predictors on the dependent variable y_i in the observed sample can also be obtained. In the context of the multivariate selection-normal model, the marginal effects linked to the expected values in (17) permit


Fig. 1. Plot of λ_{(a,b)}(k) for different values of a and b: λ_u is λ_{(0,∞)}(k) in the Heckman model; λ_bound(−1, 0) = λ_{(−1,0)}(k) and λ_bound(1, 2) = λ_{(1,2)}(k) are the λ factors in the MSRM_N.

the analysis and comparison of the effects of the explanatory variables x_ik on the dependent variable y_i (see, e.g., Hevia & Arrazola, 2009). Suppose that x_ik and w_ik are respective elements of x_i and w_i and that x_ik = w_ik; then the marginal effect of x_ik on y_i is
$$\frac{\partial E[y_i \mid s_i = 1, \mathbf{w}_i]}{\partial x_{ik}} = \beta_k + \rho\sigma\gamma_k\, \lambda'_{(a,b)}(\mathbf{w}_i^\top\boldsymbol{\gamma}). \qquad (20)$$
Otherwise, it is β_k. Here
$$\lambda'_{(a,b)}(\mathbf{w}_i^\top\boldsymbol{\gamma}) = \frac{(a - \mathbf{w}_i^\top\boldsymbol{\gamma})\phi(a - \mathbf{w}_i^\top\boldsymbol{\gamma}) - (b - \mathbf{w}_i^\top\boldsymbol{\gamma})\phi(b - \mathbf{w}_i^\top\boldsymbol{\gamma})}{\Phi(b - \mathbf{w}_i^\top\boldsymbol{\gamma}) - \Phi(a - \mathbf{w}_i^\top\boldsymbol{\gamma})} - \lambda^2_{(a,b)}(\mathbf{w}_i^\top\boldsymbol{\gamma}).$$
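For reference, (19) and (20) are simple to evaluate numerically. The Python sketch below (helper names are our own) computes λ_{(a,b)} and λ'_{(a,b)}; with (a, b) = (0, ∞) the first reduces to the inverse Mills ratio φ/Φ of the Heckman regression function (4).

```python
import numpy as np
from scipy.stats import norm

def lam_bound(k, a, b):
    # Eq. (19): mean of a standard normal doubly truncated to (a - k, b - k)
    den = norm.cdf(b - k) - norm.cdf(a - k)
    return (norm.pdf(a - k) - norm.pdf(b - k)) / den

def lam_bound_prime(k, a, b):
    # Derivative factor entering the marginal effect, Eq. (20)
    den = norm.cdf(b - k) - norm.cdf(a - k)
    num = (a - k) * norm.pdf(a - k) - (b - k) * norm.pdf(b - k)
    return num / den - lam_bound(k, a, b) ** 2

k = np.linspace(-3.0, 3.0, 7)
print(lam_bound(k, 0.0, np.inf))       # reduces to phi(k)/Phi(k), Heckman's lambda
print(lam_bound(k, -1.0, 0.0))         # bounded rule: roughly linear in k (cf. Fig. 1)
print(lam_bound_prime(k, -1.0, 0.0))   # non-monotone, unlike the unbounded case
```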

We can easily see that (19) and (20) coincide with the corresponding quantities obtained from the Heckman regression function in (4) when a = 0 and b = ∞ (see, e.g., Marchenko & Genton, 2012). These relationships between the special case of the MSRM_N and the Heckman model indicate that the MSRM_N is a natural multivariate extension of the Heckman model.

2. Multivariate selection-t model. Another member of the MSRM_SMN is the MSRM with multivariate Student's t_ν error distribution (denoted by MSRM_{t_ν}). This error model is obtained from (14) by setting κ(θ) = 1/θ and θ ∼ Gamma(ν/2, ν/2), with density
$$h(\theta) = \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)}\, \theta^{\nu/2 - 1} \exp\left(-\frac{\nu}{2}\theta\right), \quad \theta > 0.$$

After performing some integral algebra in (15), the conditional density of the MSRM_{t_ν} becomes
$$f_\nu(y_i \mid s_i = 1) = t_\nu\big(y_i \mid \mathbf{x}_i^\top\boldsymbol{\beta}, \sigma^2\big)\, \frac{\bar T_p\big(d_i; \boldsymbol{\rho}(y_i - \mathbf{x}_i^\top\boldsymbol{\beta})/\sigma, \; \Omega^*_{\eta|y_i}, \; \nu + 1\big)}{\bar T_p\big(d_i; \mathbf{0}, \Omega_\eta, \nu\big)} \qquad (21)$$
for y_i ∈ R, where Ω*_{η|y_i} = (ν + 1)^{−1}[ν + (y_i − x_i^⊤β)²/σ²](Ω_η − ρρ^⊤). Here, t_ν(· | α, β) denotes the pdf of the univariate Student's t-distribution with location α, scale parameter β, and degrees of freedom ν. Furthermore, $\bar T_p(d_i; \boldsymbol{\alpha}, B, \nu) = P(\mathbf{Y} \in d_i)$, where Y follows a p-variate t_ν distribution with location vector α and scale matrix B (denoted by t_p(α, B, ν)). The density produces a heavy-tailed and skewed MSRM that is useful for its robustness (e.g., see Arellano-Valle et al., 2006). When the MSRM_{t_ν} is assumed, the conditional expectation of the observed data becomes
$$E[y_i \mid s_i = 1, \mathbf{w}_i] = \mathbf{x}_i^\top\boldsymbol{\beta} + \sigma\boldsymbol{\rho}^\top\Omega_\eta^{-1} E[U_{d_i}], \quad i = 1, \ldots, N_1, \qquad (22)$$
where U_{d_i} = [Y | Y ∈ d_i] ∼ Tt_p^{d_i}(0, Ω_η, ν) is a truncated version of Y ∼ t_p(0, Ω_η, ν), restricted to the rectangle space {y ∈ d_i}. Exact moment expressions for U_{d_i} are given by Ho et al. (2012) and Kim (2008b).
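As a quick numerical cross-check of E[U_{d_i}] in (22) under assumed parameter values, the rectangle-truncated t_p moment can be approximated by brute-force Monte Carlo (the exact expressions are in Ho et al., 2012). The sketch below generates t_p variates via the usual normal/chi-square mixture representation and is illustrative only.

```python
import numpy as np

def trunc_t_mean_mc(Omega_eta, nu, a, b, n=200_000, seed=0):
    """Monte Carlo estimate of E[U_d] for Y ~ t_p(0, Omega_eta, nu)
    truncated to the rectangle (a, b), as used in Eq. (22)."""
    rng = np.random.default_rng(seed)
    p = Omega_eta.shape[0]
    z = rng.multivariate_normal(np.zeros(p), Omega_eta, size=n)
    w = rng.chisquare(nu, size=n) / nu          # t_nu = N / sqrt(chi2_nu / nu)
    y = z / np.sqrt(w)[:, None]
    keep = np.all((y > a) & (y < b), axis=1)
    return y[keep].mean(axis=0), keep.mean()    # truncated mean, acceptance rate

Omega_eta = np.array([[1.0, 0.3], [0.3, 1.0]])
m, rate = trunc_t_mean_mc(Omega_eta, nu=5, a=np.array([-1.0, -1.0]),
                          b=np.array([2.5, 2.5]))
print(m, rate)
```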


Fig. 2. Plots of λ(a,b)ν (k)/λ(a,b) (k) for different values of k: λtν = λ(a,b)ν (k) and λbound = λ(a,b) (k).

For the MSRM_{t_ν} with p = 1 and the bounded selection interval C^* = (a, b) (i.e., an extended version of the Heckman selection-t_ν model), the expected value E[U_{d_i}] in (22) becomes
$$\lambda_{(a,b)\nu}(\mathbf{w}_i^\top\boldsymbol{\gamma}) = \frac{\Gamma((\nu - 1)/2)\,\nu^{\nu/2}}{2\sqrt{\pi}\,\Gamma(\nu/2)} \cdot \frac{T\big((a, b); \mathbf{w}_i^\top\boldsymbol{\gamma}, \nu\big)}{F_\nu\big(b(\mathbf{w}_i^\top\boldsymbol{\gamma})\big) - F_\nu\big(a(\mathbf{w}_i^\top\boldsymbol{\gamma})\big)} \qquad (23)$$
and the marginal effect of the kth explanatory variable x_ik on the response y_i is
$$\frac{\partial E[y_i \mid s_i = 1, \mathbf{w}_i]}{\partial x_{ik}} = \beta_k + \rho\sigma\gamma_k\, \lambda'_{(a,b)\nu}(\mathbf{w}_i^\top\boldsymbol{\gamma}), \qquad (24)$$
where F_ν(·) is the df of a standard Student t_ν variate, a(w_i^⊤γ) = a − w_i^⊤γ, b(w_i^⊤γ) = b − w_i^⊤γ,
$$T\big((a, b); \mathbf{w}_i^\top\boldsymbol{\gamma}, \nu\big) = \big[\nu + a(\mathbf{w}_i^\top\boldsymbol{\gamma})^2\big]^{-(\nu-1)/2} - \big[\nu + b(\mathbf{w}_i^\top\boldsymbol{\gamma})^2\big]^{-(\nu-1)/2},$$
$$\lambda'_{(a,b)\nu}(\mathbf{w}_i^\top\boldsymbol{\gamma}) = \frac{\Gamma((\nu - 1)/2)\,\nu^{\nu/2}}{2\sqrt{\pi}\,\Gamma(\nu/2)} \left\{ \frac{T'\big((a, b); \mathbf{w}_i^\top\boldsymbol{\gamma}, \nu\big)}{F_\nu\big(b(\mathbf{w}_i^\top\boldsymbol{\gamma})\big) - F_\nu\big(a(\mathbf{w}_i^\top\boldsymbol{\gamma})\big)} - \frac{T\big((a, b); \mathbf{w}_i^\top\boldsymbol{\gamma}, \nu\big)\big[t_\nu(a(\mathbf{w}_i^\top\boldsymbol{\gamma}); 0, 1) - t_\nu(b(\mathbf{w}_i^\top\boldsymbol{\gamma}); 0, 1)\big]}{\big[F_\nu\big(b(\mathbf{w}_i^\top\boldsymbol{\gamma})\big) - F_\nu\big(a(\mathbf{w}_i^\top\boldsymbol{\gamma})\big)\big]^2} \right\},$$
$$T'\big((a, b); \mathbf{w}_i^\top\boldsymbol{\gamma}, \nu\big) = \frac{(\nu - 1)\, a(\mathbf{w}_i^\top\boldsymbol{\gamma})}{\big[\nu + a(\mathbf{w}_i^\top\boldsymbol{\gamma})^2\big]^{(\nu+1)/2}} - \frac{(\nu - 1)\, b(\mathbf{w}_i^\top\boldsymbol{\gamma})}{\big[\nu + b(\mathbf{w}_i^\top\boldsymbol{\gamma})^2\big]^{(\nu+1)/2}}.$$

Fig. 2 depicts the ratio λ_{(a,b)ν}(w_i^⊤γ)/λ_{(a,b)}(w_i^⊤γ) for different values of w_i^⊤γ. It shows that, for large absolute values of the selection predictor k = w_i^⊤γ and moderate values of the degrees of freedom ν, the conditional expectation is underestimated under the MSRM_N. Moreover, the plots in Fig. 2 are not symmetric. These observations confirm that the MSRM_{t_ν} produces a conditional expectation (24) that is robust to heavy-tailed and skewed error distributions.

Fig. 3 graphs the functions λ'_{(a,b)}(w_i^⊤γ) and λ'_{(a,b)ν}(w_i^⊤γ) appearing in the marginal effects (20) and (24), respectively. The left panel plots the functions for the classic Heckman model and the Heckman-t model, i.e., the unbounded case with C = (0, ∞), while the right panel depicts those of the MSRM_N and MSRM_{t_ν} with p = 1. Comparing the two panels, we can see the following. First, the λ' functions are monotone increasing in w_i^⊤γ for the Heckman models, but not for the MSRMs; specifically, unlike in the Heckman models, they decrease as w_i^⊤γ becomes large. Second, the right panel shows that the MSRM_{t_ν} produces a conditional marginal effect (24) that is robust to heavy-tailed and skewed error distributions. That is, the conditional marginal effect of x_ik on y is overestimated by the MSRM_N for large absolute values of w_i^⊤γ and moderate values of ν; for the classic Heckman model, this overestimation holds only for negative values of w_i^⊤γ.

It is straightforward to see that the bias term (23) and the marginal effect (24) are equal to those of the Heckman selection-t_ν model studied by Lee (1983) and Marchenko and Genton (2012), provided that p = 1, a = 0, and b = ∞. Thus, the MSRM_{t_ν} defines a multivariate extension of the Heckman selection-t_ν model that allows for a multivariate selection scheme with bounded intervals.

monotonic functions for the MSRMs. Specifically, unlike the Heckman models, they decrease as the value of w⊤ i γ becomes large. Second, the right panel shows that the MSRMtν produces the conditional marginal effect (24) robust to heavy-tailed and skewed error distributions. That is, the conditional marginal effect of xik on y will be overestimated by the MSRMN for large absolute values of w⊤ i γ and moderate values of ν degrees of freedom. On the other hand, for the classic Heckman model, this overestimation holds only for the negative values of w⊤ i γ. It is straightforward to see that the bias term (23) and the marginal effects (24) are equal to those of the Heckman selection-tν model studied by Lee (1983) and Marchenko and Genton (2012), provided that p = 1, a = 0, and b = ∞. Thus, the MSRMtν defines a multivariate extension of the Heckman selection-tν model that allows for the multivariate selection scheme with bounded intervals.


Fig. 3. Plots of λ′(a,b) (k) and λ′(a,b)ν (k) for different values of k: λ′tν = λ′(a,b)ν (k) and λ′bound = λ′(a,b) (k).

4. Inference of the MSRM_SMN

4.1. Efficient estimation

To estimate the MSRM_SMN, one might think of generalizing the two-step estimation proposed by Heckman (1979) to the case of multivariate selection as follows. In the first step, the probit model $P\big(\cap_{j=1}^p \{u_{ij}^* > 0\} \mid \mathbf{w}_{ij}, j = 1, \ldots, p\big)$ is fit to the data to estimate γ (i.e., E[U_{d_i}]) in (17), and then β can be obtained by OLS estimation applied to (17). However, this two-stage estimation does not apply to the MSRM_SMN, because the model allows b ≠ ∞ and non-normality of its error distribution; thus, we cannot use the probit model to estimate γ. Instead, we suggest a new estimation method using a variant of the EM algorithm (Dempster, Laird, & Rubin, 1977). In this method, we use the hierarchical structure of the MSRM_SMN given by (16). The hierarchical model for the joint distribution of y_i and the binary variable S_i with support {s_i; s_i = 0, 1} is

$$[y_i \mid s_i = 1, U_{d_i} = \mathbf{u}_{d_i}, \theta_i] \overset{\text{ind.}}{\sim} N\big(\mathbf{x}_i^\top\boldsymbol{\beta} + \sigma\boldsymbol{\rho}^\top\Omega_\eta^{-1}\mathbf{u}_{d_i}, \; \kappa(\theta_i)\sigma^2(1 - \boldsymbol{\rho}^\top\Omega_\eta^{-1}\boldsymbol{\rho})\big),$$
$$\big[U_{d_i} \mid s_i = 1, \theta_i\big] \overset{\text{ind.}}{\sim} TN_p^{d_i}\big(\mathbf{0}, \kappa(\theta_i)\Omega_\eta\big),$$
$$P(S_i = s_i \mid \theta_i) = \bar\Phi_p\big(d_i; \mathbf{0}, \kappa(\theta_i)\Omega_\eta\big)^{s_i}\, \big[1 - \bar\Phi_p\big(d_i; \mathbf{0}, \kappa(\theta_i)\Omega_\eta\big)\big]^{1-s_i},$$
$$\theta_i \overset{\text{i.i.d.}}{\sim} h(\theta_i \mid \boldsymbol{\zeta}), \quad i = 1, \ldots, N,$$

where ζ represents the parameters in the distribution of the mixing variable θ. Let (u, y, s, θ) denote the complete dataset, with related parameter vector (β, γ, σ, ρ, Ω_η, ζ), where u = (u_{d_1}, ..., u_{d_N}), y = (y_1, ..., y_N), s = (s_1, ..., s_N), and θ = (θ_1, ..., θ_N). Then, treating u and θ as missing data, we can apply an extended EM algorithm (Dempster et al., 1977). By the invariance property of MLEs, we can work with the transformed parameters ψ = σ²(1 − ρ^⊤Ω_η^{−1}ρ) and ρ* = σρ; after obtaining the MLEs, we apply the back transformations σ² = ψ + ρ*^⊤Ω_η^{−1}ρ* and ρ = ρ*/σ. Therefore, the working parameter vector is Θ = (β, γ, ψ, ρ*, Ω_η, ζ). According to the above hierarchical model, the complete-data log-likelihood function for the MSRM_SMN, ignoring additive constant terms, can be written as
$$l_c(\Theta; \mathbf{u}, \mathbf{y}, \mathbf{s}, \boldsymbol{\theta}) = -\frac{1}{2}\sum_{i=1}^N s_i \ln(\kappa(\theta_i)\psi) - \sum_{i=1}^N \frac{s_i\big(y_i - \mathbf{x}_i^\top\boldsymbol{\beta} - \boldsymbol{\rho}^{*\top}\Omega_\eta^{-1}\mathbf{u}_{d_i}\big)^2}{2\kappa(\theta_i)\psi} - \frac{1}{2}\sum_{i=1}^N s_i \ln|\kappa(\theta_i)\Omega_\eta| - \frac{1}{2}\sum_{i=1}^N s_i\, \mathbf{u}_{d_i}^\top\Omega_\eta^{-1}\mathbf{u}_{d_i}/\kappa(\theta_i) + \sum_{i=1}^N (1 - s_i) \ln\big[1 - \bar\Phi_p\big(d_i; \mathbf{0}, \kappa(\theta_i)\Omega_\eta\big)\big] + \sum_{i=1}^N \ln h(\theta_i \mid \boldsymbol{\zeta}).$$

Thus, the maximization of the expected log-likelihood can be done separately within an iteration of the EM algorithm, because (β, γ, ψ, ρ*, Ω_η) appears in the first five parts (the MSRM_N part) and ζ appears only in the last part of the right-hand side of the equation. Despite this advantage of the EM algorithm, it is impossible to derive the expected likelihood Q = E_Θ[l_c(Θ; u, y, s, θ) | y, s] analytically, because the conditional distribution of θ given y, s, and Θ is complex. For this reason, we resort to a Monte Carlo method for both the expectation and the maximization.


In this section, we propose a generalization of the expectation/conditional maximization either (ECME) algorithm (Liu & Rubin, 1994), namely the Monte Carlo ECME, with MCMC in the CM step. The Monte Carlo ECME algorithm replaces the expected complete-data log-likelihood function with respect to θ by a Monte Carlo estimate. The Metropolis–Hastings (M–H) algorithm is used to sample the conditional distribution of θ. For the M–H algorithm, we choose the proposal to be the marginal density h(θ|ζ) of θ, as described in McLachlan and Krishnan (2008, p. 249). Thus, in the kth iteration of the Monte Carlo ECME algorithm, the transition probability of selection turns out to be min{1, p̃_k}, where
$$\tilde p_k = \frac{f(\boldsymbol{\theta}^* \mid \mathbf{y}, \mathbf{s}, \hat\Theta^{(k)})\, h(\boldsymbol{\theta} \mid \hat{\boldsymbol{\zeta}}^{(k)})}{f(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{s}, \hat\Theta^{(k)})\, h(\boldsymbol{\theta}^* \mid \hat{\boldsymbol{\zeta}}^{(k)})} = \frac{\exp\{l_p(\hat\Theta^{(k)}; \mathbf{y}, \mathbf{s}, \boldsymbol{\theta}^*)\}}{\exp\{l_p(\hat\Theta^{(k)}; \mathbf{y}, \mathbf{s}, \boldsymbol{\theta})\}} \qquad (25)$$
and
$$l_p(\Theta; \mathbf{y}, \mathbf{s}, \boldsymbol{\theta}) = \sum_{i=1}^N \left\{ s_i \ln\left[ \frac{1}{\kappa(\theta_i)^{1/2}\sigma}\, \phi\!\left(\frac{y_i - \mathbf{x}_i^\top\boldsymbol{\beta}}{\kappa(\theta_i)^{1/2}\sigma}\right) \bar\Phi_p\big(d_i; \boldsymbol{\rho}(y_i - \mathbf{x}_i^\top\boldsymbol{\beta})/\sigma, \; \kappa(\theta_i)(\Omega_\eta - \boldsymbol{\rho}\boldsymbol{\rho}^\top)\big) \right] + (1 - s_i) \ln\big[1 - \bar\Phi_p\big(d_i; \mathbf{0}, \kappa(\theta_i)\Omega_\eta\big)\big] \right\}$$
by (12) and (18). Thus, the computation of the transition probability in (25) only involves the conditional distribution of y given θ. In the Monte Carlo ECME algorithm, suppose θ^{(1_k)}, θ^{(2_k)}, ..., θ^{(M_k)} are sampled via the M–H algorithm. Then we need to calculate the approximate Q-function at the (k + 1)th iteration of the E-step, which is defined by



ˆ (k) Qˆ Θ |Θ



=

M  1   ˆ (k) , y, s, θ (mk ) E lc (Θ ; u, y, s)|Θ M m=1



=−

N M 1  

2M

(mk )

si ln{κ(θi

− 2 yi − xi β ρ ⊤

+

N 

N  i =1

i=1

m=1



)ψ} +



∗⊤

(mk )

si ln |κ(θi

(m ) Ωη−1 ςˆ i k

) Ωη | +



∗⊤

N 

si (mk )

κ(θi

 )ψ

yi − x⊤ i β

(m ) Ωη−1 τˆi k Ωη−1 ρ∗

si (m )

2



   −1 (mk ) tr Ωη τˆi

k ) i=1 κ(θi   M N N    1   ( mk ) ( mk ) ¯ p (di ; 0, κ(θi )Ωη ) + (1 − si ) ln 1 − Φ ln h(θi |ζ) , +

i=1

M

where θ

(mk )

(mk )

m=1

i =1

i=1

(mk ) ⊤

(mk )

(mk )

= (θ1 , . . . , θN ) , and ςˆ i and τˆi   (m ) ˆ (k) , yi , si = 1, θi(mk ) , ςˆ i k = E Udi |Θ   (m ) ˆ (k) , yi , si = 1, θi(mk ) . τˆi k = E Udi U⊤ di |Θ

are the conditional expectations, defined as

Here, the conditional distribution is

 ˆ (k) , yi , si = 1, θi(mk ) ] ∼ TN di  [Udi |Θ

(k)

(k) ˆ ρˆ∗ (yi − x⊤ i β ) (k) ˆ η−(k) ρˆ∗ (k) ψˆ (k) + ρˆ∗ Ω ⊤

 (mk )

, κ(θi

ˆ (k)  , )Σ

where

 ˆ (k) = Ω ˆ η(k) − Σ

(k) (k) ρˆ∗ ρˆ∗



(k) ˆ η−(k) ρˆ∗ (k) ψˆ (k) + ρˆ∗ Ω ⊤

 .

These conditional moments can be found in Ho et al. (2012) or Manjunath and Wilhelm (2009). Then the kth iteration of the Monte Carlo ECME algorithm can be implemented by using the procedure described in Appendix A.3. The standard errors of the ML estimates of the Monte Carlo ECME algorithm can be obtained easily using the bootstrap approach, which is described in McLachlan and Krishnan (2008, p. 131). Although the Monte Carlo ECME circumvents evaluating a complicated E-step, a Monte Carlo error is introduced in the E-step and the stable monotone convergence property of the ECME is lost. As a result, specifying M and monitoring convergence are of central importance when using the Monte Carlo ECME. Wei and Tanner (1990) recommend using small values of M in the initial stage, and that values be increased as the algorithm moves closer to convergence. See McLachlan and Krishnan (2008), and references therein, for a discussion of these problems. The MSRMN assumes that the mixing variable, θi , degenerates at κ(θi ) = 1. In this special case


of the MSRM_SMN, where θ_i has a degenerate distribution, the Monte Carlo ECME algorithm reduces to the ECME algorithm, because the Q-function does not involve integration with respect to the θ distribution. Thus, the ECME algorithm for estimating the MSRM_N is obtained from the Monte Carlo ECME algorithm by excluding the Monte Carlo integration with respect to θ in the Monte Carlo E-step and setting κ(θ_i) = 1. In addition, some useful remarks concerning the implementation of the algorithm are given in Appendix A.4.

4.2. Test of selection bias

As indicated by (17), the conventional OLS estimator of the MSRM_SMN coefficient vector β in (5) produces inconsistent results when ρ ≠ 0, whereas the results are consistent when ρ = 0. Thus, in the multivariate selection model, the hypothesis H_0: ρ = 0 is important for testing the existence of sample-selection bias. Following the detailed proof given by Marchenko and Genton (2012), one can easily see that the scores corresponding to the MSRM_SMN are not linearly dependent; hence the observed information and Fisher information matrices for the parameters Θ are nonsingular at ρ = 0. This allows us to construct the likelihood ratio test of H_0: ρ = 0 using the standard likelihood ratio test statistic
$$-2(l_0 - l_1),$$
where l_0 and l_1 denote the maximized log-likelihood functions under H_0 and H_0 ∪ H_1, respectively. The likelihood ratio test statistic asymptotically follows χ²_p under H_0 as n → ∞; we refer to Wilks (1962) for the proof.
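In practice, the test only requires the two maximized log-likelihoods. A minimal Python sketch follows; the numeric values are chosen to be consistent with the statistic 23.448 reported for the real-data example in Section 5.2 (where p = 2).

```python
from scipy.stats import chi2

def selection_bias_lrt(l0, l1, p):
    """LRT of H0: rho = 0; -2(l0 - l1) is asymptotically chi-square_p under H0."""
    stat = -2.0 * (l0 - l1)
    return stat, chi2.sf(stat, df=p)

stat, pval = selection_bias_lrt(l0=-213.838, l1=-202.114, p=2)
print(stat, pval)   # 23.448, p-value < 0.001: reject H0, selection bias present
```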

5. Numerical illustrations

5.1. Finite-sample properties of the MLEs

We carry out a simulation to evaluate the performance of the Monte Carlo ECME algorithm specified in Section 4. Here, we consider the MSRM_N and the MSRM_{t_ν} with two selection rules. The data for the explanatory variables in both models are generated by the following distributions:
$$y_i^* = 0.5 + 1.5x_i + \epsilon_i, \quad x_i \overset{\text{i.i.d.}}{\sim} N(0, 1), \quad i = 1, \ldots, N, \qquad (26)$$

with selection equations
$$u_{1i}^* = w_{11i} + 1.5\, w_{12i} + \eta_{1i}, \quad w_{11i} = 1 + x_i, \; w_{12i} \overset{\text{i.i.d.}}{\sim} N(0, 1),$$
$$u_{2i}^* = w_{21i} + 2.0\, w_{22i} + \eta_{2i}, \quad w_{21i} = 1 + 2x_i, \; w_{22i} \overset{\text{i.i.d.}}{\sim} N(0, 1).$$

To sample the y_i^* from the MSRM_N and MSRM_{t_ν}, the error terms were generated from a trivariate normal and a trivariate t_ν distribution, respectively:
$$\begin{pmatrix} \epsilon_i \\ \eta_{1i} \\ \eta_{2i} \end{pmatrix} \overset{\text{i.i.d.}}{\sim} N_3\left( \mathbf{0}, \begin{pmatrix} \sigma^2 & \sigma\rho_{12} & \sigma\rho_{13} \\ \sigma\rho_{12} & 1 & \rho_\eta \\ \sigma\rho_{13} & \rho_\eta & 1 \end{pmatrix} \right), \quad i = 1, \ldots, N,$$
and
$$\begin{pmatrix} \epsilon_i \\ \eta_{1i} \\ \eta_{2i} \end{pmatrix} \overset{\text{i.i.d.}}{\sim} t_3\left( \mathbf{0}, \begin{pmatrix} \sigma^2 & \sigma\rho_{12} & \sigma\rho_{13} \\ \sigma\rho_{12} & 1 & \rho_\eta \\ \sigma\rho_{13} & \rho_\eta & 1 \end{pmatrix}, \; \nu = 5 \right), \quad i = 1, \ldots, N,$$

where σ = 1, ρ_12 = 0.5, and ρ_13 = 0.7. For the simulation, we consider several values of the correlation ρ_η ∈ {0.0, 0.3, 0.5, 0.8} and sample sizes N ∈ {100, 500}. With regard to the sample-selection scheme, we observe only N_1 of the N observations of y_i^*, namely those for which −1 < u_{1i}^* < 2.5 and −1 < u_{2i}^* < 2.5 (i.e., y_i = s_i y_i^*, where s_i = I(−1 < u_{1i}^* < 2.5, −1 < u_{2i}^* < 2.5)). In the simulation, the sample-selection rate N_1/N × 100% corresponds to about 25%–50% of all cases. Using the selected observations of size N_1 generated from each MSRM model, we estimated the model parameters by the Monte Carlo ECME algorithm, and we also estimated the regression model (26) by the ordinary least squares method (OLS). The averages of the estimation results for each model, based on R = 200 replications of the Monte Carlo ECME algorithm and the OLS, are presented in Tables 1 and 2. The tables give a summary (mean and standard deviation) of the OLS and ML estimates obtained from the Monte Carlo ECME algorithm applied to the 200 sets of censored samples.

Note that the MSRM_N assumes that the mixing variable θ degenerates at κ(θ) = 1; in this case, as noted in Section 4.1, the Monte Carlo ECME algorithm reduces to the ECME algorithm. Thus, for this simulation study, we applied the Monte Carlo E-step with M = 1000 to the MSRM_{t_5}, using the proposal density for the M–H algorithm defined by θ_i ∼ i.i.d. Gamma(ν/2, ν/2) with ν = 5 and κ(θ) = 1/θ, while the MSRM_N did not involve the Monte Carlo integration in the E-step. We used the stopping rule ‖Θ̂^{(k+1)} − Θ̂^{(k)}‖ < 0.01 for the Monte Carlo ECME iteration. Since the convergence of the ECME algorithm is maintained, the iterations were repeated until the stopping rule was satisfied, yielding the ML estimates.
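The data-generating step of this design is easy to reproduce. A minimal Python sketch for the normal-error case (seed and variable names are our own choices) also makes the OLS bias on the selected subsample visible:

```python
import numpy as np

rng = np.random.default_rng(42)
N, sigma, rho12, rho13, rho_eta = 500, 1.0, 0.5, 0.7, 0.5
cov = np.array([[sigma**2, sigma*rho12, sigma*rho13],
                [sigma*rho12, 1.0, rho_eta],
                [sigma*rho13, rho_eta, 1.0]])
e = rng.multivariate_normal(np.zeros(3), cov, size=N)   # (eps, eta1, eta2)

x = rng.normal(size=N)
w12, w22 = rng.normal(size=N), rng.normal(size=N)
y_star = 0.5 + 1.5 * x + e[:, 0]                        # regression equation (26)
u1 = (1 + x) + 1.5 * w12 + e[:, 1]                      # selection equation 1
u2 = (1 + 2 * x) + 2.0 * w22 + e[:, 2]                  # selection equation 2
s = (-1 < u1) & (u1 < 2.5) & (-1 < u2) & (u2 < 2.5)     # bivariate selection rule

# OLS on the selected cases only: beta_1 is biased downward (cf. Table 1)
X = np.column_stack([np.ones(s.sum()), x[s]])
beta_ols = np.linalg.lstsq(X, y_star[s], rcond=None)[0]
print(s.mean(), beta_ols)
```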


Table 1. OLS and ML estimates of the MSRM_N (standard errors in parentheses).

ρη = 0.0
N | Method | β0=0.5 | β1=1.5 | γ11=1 | γ12=1.5 | γ21=1 | γ22=2 | σ=1 | ρ12=0.5 | ρ13=0.7 | ρ̂η
100 | ML | 0.547 (0.147) | 1.472 (0.131) | 1.339 (0.633) | 1.852 (0.594) | 1.089 (0.298) | 2.168 (0.491) | 0.971 (0.080) | 0.424 (0.343) | 0.672 (0.305) | 0.069 (0.255)
100 | OLS | 0.435 (0.186) | 1.194 (0.189) | – | – | – | – | 0.924 (0.243) | – | – | –
500 | ML | 0.488 (0.056) | 1.496 (0.060) | 1.048 (0.202) | 1.562 (0.196) | 1.029 (0.116) | 2.054 (0.197) | 0.996 (0.043) | 0.478 (0.131) | 0.702 (0.106) | 0.011 (0.080)
500 | OLS | 0.450 (0.172) | 1.159 (0.115) | – | – | – | – | 0.925 (0.112) | – | – | –

ρη = 0.3
100 | ML | 0.519 (0.162) | 1.473 (0.153) | 1.321 (0.624) | 1.794 (0.594) | 1.082 (0.326) | 2.127 (0.439) | 0.982 (0.091) | 0.461 (0.404) | 0.692 (0.314) | 0.307 (0.259)
100 | OLS | 0.450 (0.172) | 1.194 (0.258) | – | – | – | – | 0.943 (0.246) | – | – | –
500 | ML | 0.508 (0.061) | 1.492 (0.066) | 1.030 (0.186) | 1.521 (0.187) | 1.018 (0.119) | 2.016 (0.091) | 0.998 (0.042) | 0.495 (0.161) | 0.702 (0.114) | 0.285 (0.104)
500 | OLS | 0.443 (0.075) | 1.186 (0.096) | – | – | – | – | 0.930 (0.097) | – | – | –

ρη = 0.5
100 | ML | 0.517 (0.165) | 1.485 (0.165) | 1.230 (0.572) | 1.697 (0.537) | 1.105 (0.346) | 2.166 (0.474) | 0.975 (0.083) | 0.526 (0.364) | 0.668 (0.279) | 0.438 (0.260)
100 | OLS | 0.453 (0.171) | 1.179 (0.270) | – | – | – | – | 0.934 (0.259) | – | – | –
500 | ML | 0.500 (0.065) | 1.499 (0.065) | 1.054 (0.203) | 1.547 (0.205) | 1.024 (0.108) | 2.043 (0.177) | 0.998 (0.040) | 0.522 (0.164) | 0.696 (0.100) | 0.495 (0.064)
500 | OLS | 0.445 (0.084) | 1.182 (0.108) | – | – | – | – | 0.933 (0.101) | – | – | –

ρη = 0.8
100 | ML | 0.498 (0.159) | 1.500 (0.167) | 1.327 (0.637) | 1.756 (0.601) | 1.126 (0.336) | 2.184 (0.486) | 0.998 (0.090) | 0.519 (0.350) | 0.677 (0.287) | 0.799 (0.003)
100 | OLS | 0.454 (0.180) | 1.176 (0.279) | – | – | – | – | 0.945 (0.258) | – | – | –
500 | ML | 0.507 (0.064) | 1.497 (0.065) | 1.036 (0.179) | 1.538 (0.188) | 1.018 (0.110) | 2.014 (0.179) | 0.996 (0.039) | 0.500 (0.144) | 0.686 (0.105) | 0.800 (0.001)
500 | OLS | 0.455 (0.083) | 1.203 (0.101) | – | – | – | – | 0.933 (0.104) | – | – | –

We visualized the variability in the parameter estimates of the MSRM_N using box plots of the 200 replicated estimates of the regression coefficients β0, β1, γ11, γ12, γ21, and γ22, as shown in Fig. 4. The summary values given in Table 1, Table 2, and Fig. 4 indicate the following. First, the Monte Carlo ECME algorithm performs well in estimating the parameters of the MSRM_N (i.e., the multivariate Heckman model) and the MSRM_{t_5}, irrespective of the values of ρη and N. Second, in all cases the bias of the OLS estimate of β1 is so large that a confidence interval for β1 cannot include its true value; this highlights the performance of the two MSRMs. That is, for regression with a selected sample, the ML estimates of the two MSRMs eliminate the selection bias (or inconsistency), which the OLS estimates could not do. Third, the larger sample size N (or N_1) tends to yield better ML estimation results, as Fig. 4 also shows. Fourth, as expected, the ML estimates of the MSRM_{t_5} with heavy-tailed errors tend to have larger estimation errors than those of the MSRM_N. Finally, each table shows that the estimation errors grow as ρη becomes small; this coincides with the fact that a smaller value of ρη corresponds to a smaller sample-selection rate (about 25% for ρη = 0) under the sample-selection scheme s_i = I(−1 < u_{1i}^* < 2.5, −1 < u_{2i}^* < 2.5).

5.2. Real data example

The purpose of this data example is to illustrate the bias problem in regression with multivariate selection (or segmented) data and to demonstrate the performance of the MSRM_SMN in correcting the sample-selection bias. This example analyzes survey data on customers of HATCO (a paper company) obtained from the Prentice Hall data repository (www.prenhall.com/hair). The customer data consist of 100 observations. The descriptive statistics for the variables obtained from the 100 observations are listed in Table 3.


Table 2. OLS and ML estimates of the MSRM_{t_5} (standard errors in parentheses).

ρη = 0.0
N | Method | β0=0.5 | β1=1.5 | γ11=1 | γ12=1.5 | γ21=1 | γ22=2 | σ=1 | ρ12=0.5 | ρ13=0.7 | ρ̂η
100 | ML | 0.591 (0.151) | 1.401 (0.183) | 0.783 (0.413) | 1.388 (0.346) | 0.989 (0.139) | 1.895 (0.317) | 0.954 (0.125) | 0.413 (0.209) | 0.616 (0.168) | 0.140 (0.292)
100 | OLS | 0.430 (0.211) | 1.088 (0.153) | – | – | – | – | 1.166 (0.431) | – | – | –
500 | ML | 0.518 (0.056) | 1.549 (0.060) | 1.129 (0.202) | 1.597 (0.196) | 1.034 (0.116) | 2.110 (0.197) | 0.978 (0.105) | 0.462 (0.131) | 0.765 (0.106) | 0.062 (0.080)
500 | OLS | 0.426 (0.108) | 1.065 (0.137) | – | – | – | – | 1.114 (0.178) | – | – | –

ρη = 0.3
100 | ML | 0.603 (0.150) | 1.239 (0.518) | 0.762 (0.473) | 1.344 (0.373) | 0.971 (0.168) | 1.911 (0.305) | 0.943 (0.141) | 0.404 (0.180) | 0.630 (0.176) | 0.423 (0.281)
100 | OLS | 0.398 (0.219) | 1.100 (0.172) | – | – | – | – | 0.112 (0.417) | – | – | –
500 | ML | 0.478 (0.092) | 1.413 (0.106) | 0.973 (0.174) | 1.609 (0.210) | 1.024 (0.134) | 2.051 (0.127) | 1.052 (0.1092) | 0.486 (0.177) | 0.733 (0.132) | 0.325 (0.125)
500 | OLS | 0.440 (0.100) | 1.122 (0.137) | – | – | – | – | 1.173 (0.228) | – | – | –

ρη = 0.5
100 | ML | 0.589 (0.153) | 1.359 (0.189) | 0.772 (0.495) | 1.411 (0.381) | 0.946 (0.171) | 1.933 (0.285) | 0.956 (0.136) | 0.355 (0.203) | 0.672 (0.167) | 0.459 (0.275)
100 | OLS | 0.429 (0.241) | 1.121 (0.131) | – | – | – | – | 1.060 (0.321) | – | – | –
500 | ML | 0.524 (0.097) | 1.428 (0.083) | 1.133 (0.184) | 1.587 (0.198) | 1.043 (0.132) | 1.974 (0.161) | 1.018 (0.104) | 0.596 (0.142) | 0.726 (0.103) | 0.439 (0.095)
500 | OLS | 0.429 (0.100) | 1.135 (0.132) | – | – | – | – | 1.185 (0.204) | – | – | –

ρη = 0.8
100 | ML | 0.585 (0.152) | 1.367 (0.181) | 0.887 (0.507) | 1.441 (0.376) | 0.964 (0.138) | 2.013 (0.306) | 0.948 (0.115) | 0.315 (0.198) | 0.608 (0.181) | 0.768 (0.265)
100 | OLS | 0.440 (0.219) | 1.143 (0.165) | – | – | – | – | 1.178 (0.210) | – | – | –
500 | ML | 0.518 (0.097) | 1.469 (0.089) | 1.104 (0.191) | 1.542 (0.189) | 1.121 (0.201) | 2.011 (0.176) | 0.981 (0.108) | 0.452 (0.152) | 0.629 (0.121) | 0.742 (0.074)
500 | OLS | 0.446 (0.094) | 1.149 (0.124) | – | – | – | – | 1.197 (0.189) | – | – | –

Table 3. Descriptive statistics.

Variable | y* | y1 | y2 | x1 | x2 | x3 | x4
Mean | 46.100 | 2.916 | 4.771 | 3.515 | 2.364 | 7.894 | 5.248
s.d. | 8.988 | 0.751 | 0.855 | 1.321 | 1.195 | 1.386 | 1.131
p-value | 0.320 | 0.365 | 0.074 | 0.341 | 0.017 | 0.001 | 0.183

The statistics include the sample mean, sample standard deviation (s.d.), and the p-value of the Shapiro–Wilk statistic for testing the normality of each variable. Among the 100 customers, we chose 51 to have missing values of the usage level (y*) by using a bivariate sample-selection scheme based on the overall service (y1) and satisfaction (y2) levels of HATCO. The usage level (y*) indicates how much of the firm's total product is purchased from HATCO, measured on a 100-point percentage scale. The other variables in Table 3 are measured on a scale ranging from 0 to 10, where 5 indicates an ''average'' level of the customer's evaluation. The dataset includes two explanatory variables, delivery speed (x1) and price level (x2), which measure customers' perceptions of HATCO. With regard to the selection scheme, we observed only N_1 = 49 values of y* among the 100 customers, namely those for which 2.5 ≤ y1 ≤ 5 and 4.5 ≤ y2 ≤ 6 (i.e., potential patrons dissatisfied with the overall service). To show the performance of the MSRM in correcting the sample-selection bias when estimating the regression coefficients of x1 and x2, we set up the MSRM_SMN for the data as follows:
$$y_i^* = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i, \quad i = 1, \ldots, 100, \qquad (27)$$

Fig. 4. Box plots of the ML estimates of the MSRM_N coefficients (β0, β1, γ11, γ12, γ21, γ22) over 200 replications: (A) N = 100 and ρη = 0.8; (B) N = 500 and ρη = 0.8.

Table 4. Estimate of each method and its standard error.

Method | β0 | β1 | β2 | γ11 | γ12 | γ21 | γ22 | σ | ρ12 | ρ13 | ρη
OLS_Origin | 20.451 (2.509) | 5.465 (0.486) | 2.724 (0.532) | 0.780 (0.086) | −0.331 (0.086) | 0.682 (0.062) | 0.444 (0.062) | 5.934 (0.181) | – | – | –
OLS_RWC | 41.750 (6.039) | 2.166 (1.055) | 0.094 (0.857) | – | – | – | – | – | – | – | –
ML_MSRM_N | 21.020 (4.088) | 5.819 (1.600) | 1.926 (0.896) | 0.820 (0.143) | −0.530 (0.143) | 0.718 (0.096) | 0.314 (0.134) | 6.532 (0.225) | 0.579 (0.154) | 0.508 (0.187) | 0.412 (0.163)
ML_MSRM_t5 | 23.375 (4.394) | 4.671 (1.716) | 1.856 (0.843) | 0.720 (0.168) | −0.548 (0.157) | 0.624 (0.089) | 0.321 (0.141) | 6.727 (0.314) | 0.612 (0.216) | 0.498 (0.169) | 0.381 (0.157)

with the bivariate selection equations for the manufacturer's overall service and the customer's satisfaction level:
$$u_{1i}^* = \gamma_{11} z_{1i} + \gamma_{12} z_{2i} + \eta_{1i}, \qquad u_{2i}^* = \gamma_{21} z_{1i} + \gamma_{22} z_{3i} + \eta_{2i}, \qquad (28)$$
so that y_i = s_i y_i^* and
$$s_i = I(2.5 \le y_{1i} \le 5, \; 4.5 \le y_{2i} \le 6),$$

where u_{1i}^* and u_{2i}^* are the standard scores of y_{1i} and y_{2i}, respectively. In turn, z_{1i}, z_{2i}, and z_{3i} denote the standardized scores of the delivery speed (x1), price flexibility (x3), and manufacturer image (x4) of HATCO, as evaluated by the ith customer. Using the original unselected dataset of size N = 100, we obtained the conventional OLS estimates of each regression equation in (27) and (28), denoted by OLS_Origin. Thus, OLS_Origin can be taken to give unbiased and consistent estimates of the regression coefficients in (27) and (28). On the other hand, we used three methods to estimate (27) and (28) from the bivariate selection dataset with N_1 = 49. The first is the usual OLS (OLS_RWC); with this method, we cannot estimate the parameters associated with (28). The second is the ML method assuming the MSRM_N for (27) and (28), and the third is the ML method under the assumption of the MSRM_{t_5} for the bivariate selection regression. The ML estimates were calculated using the Monte Carlo ECME algorithm described in Section 4, and their standard errors were calculated using the bootstrap method with 100 bootstrap replications, which is sufficient for calculating standard errors (cf. McLachlan & Krishnan, 2008, p. 131). The results are presented in Table 4, with the corresponding standard errors in parentheses.

As expected, Table 4 reveals substantial differences in the magnitudes of the regression coefficients of the independent variables with and without correction for selection bias. Specifically, the first row shows that customer satisfaction with the delivery speed (x1) and price level (x2) positively influences the usage level (y*) of HATCO products (i.e., β1 and β2 are significant). The ML estimates of β1 and β2 in the third and fourth rows agree with the evidence of the first row obtained


using OLS_Origin. However, this evidence is not statistically significant in the second row, because the estimate of β2 and its standard error calculated using OLS_RWC suggest that β2 = 0. This may lead to a biased analysis of the perception of the customers classified as potential patrons, concluding that customer satisfaction with the price level (x2) does not affect the usage level (y*) in purchasing HATCO products and, hence, that a marketing strategy of HATCO should focus only on speeding up the delivery service for the potential patrons group. For the bivariate selection equations in (28), the ML estimation results indicate that delivery speed (x1) and price flexibility (x3) have significant positive and negative effects, respectively, on the manufacturer's overall service (y1); the negative effect of price flexibility may be due to immediate price increases. On the other hand, both delivery speed (x1) and manufacturer image (x4) appear to have significant positive effects on the customer's satisfaction level (y2) with HATCO.

The maximized log-likelihood values of the MSRM_N and MSRM_{t_5} defined by (27) and (28) were l_MSRM_N = −202.114 and l_MSRM_t5 = −209.236, respectively. This advocates using the MSRM_N as the bivariate selection model for the sample-selected customer dataset. Based on this result, we carried out the approximate LRT, described in Section 4.2, of H_0: ρ = 0 under the advocated MSRM_N. The observed likelihood ratio test statistic obtained from the data was 23.448. The test, based on the χ²_(2) distribution, yielded a p-value < 0.001, which is significant. This result suggests evidence of the existence of bivariate selection in the regression. Thus, we can conclude that the significant differences in the regression coefficient estimates between OLS_RWC and ML_MSRM_N in Table 4 originate from the bivariate sample-selection bias defined in (17).

6. Discussion

In this paper, we introduced the MSRM (multivariate selection regression model) for the robust regression analysis of incomplete data caused by a multivariate sample-selection scheme applied to the dependent variable. In addition, we proposed a Monte Carlo ECME algorithm with MCMC for the efficient estimation of the MSRM. Indeed, the class of MSRMs is flexible enough to include extensions of the Heckman model and the Heckman selection-t model to the case of multivariate selection. We derived an exact expression for the sample-selection bias in the MSRM, which enabled us to construct a likelihood ratio test for the significance of this bias. The present paper therefore introduces a new class of sample-selection models that allows for a multivariate extension of the classic Heckman model, bounded multivariate selection schemes, and a robustness property flexible enough to model non-normal regression errors. Using various properties of the MSRM, the paper provides a multivariate sample-selection bias correction procedure along with efficient inference methods for the model. The numerical studies confirm the usefulness of the class of MSRMs in correcting multivariate sample-selection bias.

Acknowledgments

The first author's research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2015R1D1A1A01057106). The corresponding author's research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2015R1D1A1A01059161).

Appendix

A.1. Proof of Corollary 1

Using the elliptical generator function in (8), we can write the conditional density (11) in Lemma 1 in the alternative and convenient form
$$f(y_i \mid s_i = 1) = \frac{|\Omega|^{-1/2} \int_{\boldsymbol{\eta}_i \in d_i} h^{(p+1)}\big[(\mathbf{e}_i^* - \boldsymbol{\psi}_i^*)^\top \Omega^{-1} (\mathbf{e}_i^* - \boldsymbol{\psi}_i^*)\big]\, d\boldsymbol{\eta}_i}{|\Omega_\eta|^{-1/2} \int_{\boldsymbol{\eta}_i \in d_i} h^{(p)}\big[\boldsymbol{\eta}_i^\top \Omega_\eta^{-1} \boldsymbol{\eta}_i\big]\, d\boldsymbol{\eta}_i}. \qquad (29)$$
Applying the generator function (13) to (29), we see that f(y_i | s_i = 1) equals
$$\frac{|\Omega|^{-1/2} \int_{\boldsymbol{\eta}_i \in d_i} \int_0^\infty (2\pi\kappa(\theta))^{-(p+1)/2} \exp\big\{-(\mathbf{e}_i^* - \boldsymbol{\psi}_i^*)^\top \Omega^{-1} (\mathbf{e}_i^* - \boldsymbol{\psi}_i^*)/2\kappa(\theta)\big\}\, dH(\theta)\, d\boldsymbol{\eta}_i}{|\Omega_\eta|^{-1/2} \int_{\boldsymbol{\eta}_i \in d_i} \int_0^\infty (2\pi\kappa(\theta))^{-p/2} \exp\big\{-\boldsymbol{\eta}_i^\top \Omega_\eta^{-1} \boldsymbol{\eta}_i/2\kappa(\theta)\big\}\, dH(\theta)\, d\boldsymbol{\eta}_i},$$
which is equivalent to f_SSMN(y_i | s_i = 1) for y_i ∈ R. □


A.2. Proof of Lemma 2

Let V ∼ N(0, κ(θ)) and η_i be independent conditional on θ. Then, from the MSRM_SMN with the error distribution e_i ∼ G ∈ F, we see that
$$[y_i \mid s_i = 1, \theta] \overset{d}{=} \big[\mathbf{x}_i^\top\boldsymbol{\beta} + \sigma\boldsymbol{\rho}^\top\Omega_\eta^{-1}\boldsymbol{\eta}_i + \sigma(1 - \boldsymbol{\rho}^\top\Omega_\eta^{-1}\boldsymbol{\rho})^{1/2} V \mid \theta, \boldsymbol{\eta}_i \in d_i\big] \overset{d}{=} \big[\mathbf{x}_i^\top\boldsymbol{\beta} + \sigma\boldsymbol{\rho}^\top\Omega_\eta^{-1}U_{d_i} + \sigma(1 - \boldsymbol{\rho}^\top\Omega_\eta^{-1}\boldsymbol{\rho})^{1/2} V \mid \theta\big] \sim ESN_1(d_i; \boldsymbol{\psi}_i^*, \kappa(\theta)\Omega)$$
by Kim and Kim (2015). The stochastic representation of the ESN_1(d_i; ψ_i^*, κ(θ)Ω) distribution can be expressed using two stages of the hierarchy: [x_i^⊤β + σρ^⊤Ω_η^{−1}u_{d_i} + σ(1 − ρ^⊤Ω_η^{−1}ρ)^{1/2}V | U_{d_i} = u_{d_i}, θ] ∼ N(x_i^⊤β + σρ^⊤Ω_η^{−1}u_{d_i}, κ(θ)σ²(1 − ρ^⊤Ω_η^{−1}ρ)) for the first stage, and [U_{d_i} | θ] =^d [η_i | η_i ∈ d_i, θ] for the second stage. Finally, mixing the ESN_1(d_i; ψ_i^*, κ(θ)Ω) distribution over the third stage of the hierarchy, θ ∼ H(θ), immediately yields the density (15) of SSMN(d_i; ψ_i^*, Ω, κ, H). □

2 ⊤ −1 ⊤ −1 udi , θ] ∼ N (x⊤ i β + σ ρ Ωη udi , κ(θ )σ (1 − ρ Ωη ρ)) for the first stage and [Udi |θ ] = [ηi |ηi ∈ di , θ] for the second stage. Finally, mixture of the ESN 1 (di ; ψ ∗i , κ(θ )Ω ) distribution using the third stage of the hierarchical distribution, θ ∼ H (θ ), immediately yields the density (15) of SSMN (di ; ψ ∗i , Ω , κ, H ).

A.3. The Monte Carlo ECME algorithm The kth iteration of the ECME algorithm can be implemented as follows: Monte Carlo E-step: (mk )

ˆ (k) , compute ςˆ i Given the sample from the M–H algorithm, θ (1k ) , θ (2k ) , . . . , θ (Mk ) , and the parameter vector Θ = Θ (mk )

and τˆi

.

CM steps:

1. Update $\hat\psi^{(k)}$ by
$$\hat\psi^{(k+1)} = \frac{1}{M\sum_{i=1}^{N} s_i}\sum_{m=1}^{M}\sum_{i=1}^{N}\frac{s_i}{\kappa(\theta_i^{(m_k)})}\Big[\big(y_i - x_i^\top\hat\beta^{(k)}\big)^2 + \hat\rho^{*(k)\top}\hat\Omega_\eta^{-(k)}\hat\tau_i^{(m_k)}\hat\Omega_\eta^{-(k)}\hat\rho^{*(k)} - 2\big(y_i - x_i^\top\hat\beta^{(k)}\big)\hat\rho^{*(k)\top}\hat\Omega_\eta^{-(k)}\hat\varsigma_i^{(m_k)}\Big].$$

2. Update $\hat\rho^{*(k)}$ by
$$\hat\rho^{*(k+1)} = \Bigg[\sum_{m=1}^{M}\sum_{i=1}^{N}\frac{s_i}{\kappa(\theta_i^{(m_k)})}\hat\Omega_\eta^{-(k)}\hat\tau_i^{(m_k)}\hat\Omega_\eta^{-(k)}\Bigg]^{-1}\sum_{m=1}^{M}\sum_{i=1}^{N}\frac{s_i}{\kappa(\theta_i^{(m_k)})}\big(y_i - x_i^\top\hat\beta^{(k)}\big)\hat\Omega_\eta^{-(k)}\hat\varsigma_i^{(m_k)}.$$

3. Update $\hat\beta^{(k)}$ by
$$\hat\beta^{(k+1)} = \Bigg[\sum_{m=1}^{M}\sum_{i=1}^{N}\frac{s_i}{\kappa(\theta_i^{(m_k)})}x_ix_i^\top\Bigg]^{-1}\sum_{m=1}^{M}\sum_{i=1}^{N}\frac{s_i}{\kappa(\theta_i^{(m_k)})}x_i\big(y_i - \hat\rho^{*(k+1)\top}\hat\Omega_\eta^{-(k)}\hat\varsigma_i^{(m_k)}\big)$$
(a numerical sketch of the closed-form updates in steps 1–3 is given after this list).

4. Calculate $\hat\Omega_\eta^{(k+1)}$ to maximize the constrained actual log-likelihood
$$\sum_{m=1}^{M}\sum_{i=1}^{N} l\big(\hat\beta^{(k+1)}, \hat\gamma^{(k)}, \hat\psi^{(k+1)}, \hat\rho^{*(k+1)}, \Omega_\eta;\ y_i, s_i, \theta_i^{(m_k)}\big). \qquad (30)$$

5. Calculate $\hat\gamma^{(k+1)}$ to maximize the constrained actual log-likelihood
$$\sum_{m=1}^{M}\sum_{i=1}^{N} l\big(\hat\beta^{(k+1)}, \gamma, \hat\psi^{(k+1)}, \hat\rho^{*(k+1)}, \hat\Omega_\eta^{(k+1)};\ y_i, s_i, \theta_i^{(m_k)}\big). \qquad (31)$$

6. Update $\hat\zeta^{(k+1)}$ by maximizing the last term of the approximate Q-function $\hat Q\big(\Theta \mid \hat\Theta^{(k)}\big)$,
$$\sum_{m=1}^{M}\sum_{i=1}^{N}\ln h\big(\theta_i^{(m_k)} \mid \zeta\big).$$
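The closed-form updates in steps 1–3 reduce to weighted sums and a weighted least-squares solve. The following numpy sketch illustrates them for $p = 1$ (so $\Omega_\eta = 1$ and $\rho^*$, $\hat\varsigma_i$, $\hat\tau_i$ are scalars); the simulated E-step arrays stand in for the output of the Monte Carlo E-step and, like all values here, are assumptions of the sketch.

```python
# A numpy sketch of the closed-form updates in steps 1-3 for p = 1 (so
# Omega_eta = 1 and rho*, varsigma_i, tau_i are scalars). The simulated
# arrays kappa_mk, varsigma_mk, tau_mk stand in for the Monte Carlo E-step
# output and, like all values here, are assumptions of the sketch.
import numpy as np

rng = np.random.default_rng(2)
M, N, q = 50, 200, 3
X = rng.normal(size=(N, q))                    # covariates x_i
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=N)
s = rng.integers(0, 2, size=N).astype(float)   # selection indicators s_i
kappa_mk = rng.gamma(2.0, 0.5, size=(M, N))    # kappa(theta_i^{(m_k)}) draws
varsigma_mk = rng.normal(size=(M, N))          # E-step first moments
tau_mk = varsigma_mk**2 + 1.0                  # E-step second moments
beta_k, rho_k = np.zeros(q), 0.1               # current parameter values

w = s / kappa_mk                               # weights s_i / kappa(theta)
r = y - X @ beta_k                             # residuals y_i - x_i' beta^(k)

# Step 1: psi update (Omega_eta^{-1} = 1 when p = 1).
psi_k1 = (w * (r**2 + rho_k**2 * tau_mk
               - 2.0 * r * rho_k * varsigma_mk)).sum() / (M * s.sum())

# Step 2: rho* update.
rho_k1 = (w * r * varsigma_mk).sum() / (w * tau_mk).sum()

# Step 3: beta update, a weighted least-squares solve.
W = w.sum(axis=0)                              # sum_m s_i / kappa(theta)
XtWX = X.T @ (W[:, None] * X)
XtWz = X.T @ (w * (y - rho_k1 * varsigma_mk)).sum(axis=0)
beta_k1 = np.linalg.solve(XtWX, XtWz)
print(psi_k1, rho_k1, beta_k1)
```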

A.4. Remarks on the Monte Carlo ECME algorithm

The constrained actual log-likelihoods can be evaluated at the updated parameters and then approximated by Monte Carlo integration.


Remark. 1. The constrained approximate actual log-likelihood (30) is given by

$$\sum_{m=1}^{M}\sum_{i=1}^{N}\left\{ s_i \ln \bar\Phi_p\!\left(\hat d_i;\ \frac{\hat\rho^{*(k+1)}\big(y_i - x_i^\top\hat\beta^{(k+1)}\big)}{\hat\varrho^{(k+1)}},\ \kappa(\theta_i^{(m_k)})\,\Sigma_{\Omega_\eta}^{(k+1)}\right) + (1 - s_i)\ln\left[1 - \bar\Phi_p\!\left(\hat d_i;\ 0,\ \kappa(\theta_i^{(m_k)})\,\Omega_\eta\right)\right]\right\},$$

where $\hat\varrho^{(k+1)} = \hat\psi^{(k+1)} + \hat\rho^{*(k+1)\top}\Omega_\eta^{-1}\hat\rho^{*(k+1)}$, $\Sigma_{\Omega_\eta}^{(k+1)} = \Omega_\eta - \hat\rho^{*(k+1)}\hat\rho^{*(k+1)\top}/\hat\varrho^{(k+1)}$, $\hat d_i = (a - \hat c_i,\ b - \hat c_i)$, $a = (a_1, \ldots, a_q)^\top$, $b = (b_1, \ldots, b_q)^\top$, and $\hat c_i = \big(w_{i1}^\top\hat\gamma_1^{(k)}, \ldots, w_{iq}^\top\hat\gamma_q^{(k)}\big)^\top$.

2. The constrained actual log-likelihood (31) is given by

$$\sum_{m=1}^{M}\sum_{i=1}^{N}\left\{ s_i \ln \bar\Phi_p\!\left(d_i;\ \frac{\hat\rho^{*(k+1)}\big(y_i - x_i^\top\hat\beta^{(k+1)}\big)}{\hat\vartheta^{(k+1)}},\ \kappa(\theta_i^{(m_k)})\,\Sigma_{\hat\Omega_\eta}^{(k+1)}\right) + (1 - s_i)\ln\left[1 - \bar\Phi_p\!\left(d_i;\ 0,\ \kappa(\theta_i^{(m_k)})\,\hat\Omega_\eta^{(k+1)}\right)\right]\right\},$$

where $\hat\vartheta^{(k+1)} = \hat\psi^{(k+1)} + \hat\rho^{*(k+1)\top}\hat\Omega_\eta^{-(k+1)}\hat\rho^{*(k+1)}$ and $\Sigma_{\hat\Omega_\eta}^{(k+1)} = \hat\Omega_\eta^{(k+1)} - \hat\rho^{*(k+1)}\hat\rho^{*(k+1)\top}/\hat\vartheta^{(k+1)}$.

3. Step 4 of the CM steps can be simplified greatly if the dimension p is low (e.g., p = 2 or 3) because there exists an exact formula for $\Omega_\eta^{-1}$. In fact, the number of parameters is 1 or 3 when p is 2 or 3, respectively. Note that when p is 2, step 4 of the CM steps is just a one-dimensional maximization.

4. When p = 1, $\Omega_\eta$ in step 4 of the CM steps is 1 by assumption (3), so there is no need for a calculation.

5. When p = 1, the constrained actual log-likelihood (31) further simplifies to

$$\sum_{m=1}^{M}\sum_{i=1}^{N}\left\{ s_i \ln \bar\Phi_1\!\left(d_i;\ \frac{\hat\rho^{*(k+1)}\big(y_i - x_i^\top\hat\beta^{(k+1)}\big)}{\hat\vartheta^{(k+1)}},\ \kappa(\theta_i^{(m_k)})\left(1 - \frac{\hat\rho^{*2(k+1)}}{\hat\vartheta^{(k+1)}}\right)\right) + (1 - s_i)\ln\left[1 - \bar\Phi_1\!\left(d_i;\ 0,\ \kappa(\theta_i^{(m_k)})\right)\right]\right\},$$

where $d_i = (a - w_{i1}^\top\gamma_1,\ b - w_{i1}^\top\gamma_1)$ and $\hat\vartheta^{(k+1)} = \hat\psi^{(k+1)} + \hat\rho^{*2(k+1)}$.
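For completeness, here is a hedged evaluation sketch of the simplified $p = 1$ log-likelihood above. It reads $\bar\Phi_1(d_i; m, v)$ as the $N(m, v)$ probability content of the interval $d_i = (lo, hi)$; that reading of the notation, and all inputs, are assumptions of the sketch.

```python
# A hedged evaluation sketch of the simplified p = 1 log-likelihood above,
# reading Phi-bar_1(d; m, v) as the N(m, v) probability of the interval
# d = (lo, hi); that reading of the notation and all inputs are assumptions.
import numpy as np
from scipy import stats

def phi_bar1(lo, hi, m, v):
    """N(m, v) probability content of the interval (lo, hi)."""
    sd = np.sqrt(v)
    return stats.norm.cdf(hi, m, sd) - stats.norm.cdf(lo, m, sd)

def loglik_p1(y, xb, s, lo, hi, rho_star, psi, kappa_mk):
    """Monte Carlo approximation of the Remark-5 objective."""
    vartheta = psi + rho_star**2
    total = 0.0
    for kappa in kappa_mk:                     # rows: kappa(theta_i^{(m_k)})
        m = rho_star * (y - xb) / vartheta
        v = kappa * (1.0 - rho_star**2 / vartheta)
        p_sel = phi_bar1(lo, hi, m, v)
        p_mis = 1.0 - phi_bar1(lo, hi, 0.0, kappa)
        total += np.sum(s * np.log(p_sel) + (1.0 - s) * np.log(p_mis))
    return total

# illustrative inputs
rng = np.random.default_rng(3)
N = 100
y, xb = rng.normal(size=N), np.zeros(N)
s = rng.integers(0, 2, size=N).astype(float)
lo, hi = -0.3 * np.ones(N), 1.8 * np.ones(N)
kappa_mk = rng.gamma(2.0, 0.5, size=(20, N))
print(loglik_p1(y, xb, s, lo, hi, 0.4, 0.8, kappa_mk))
```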

References

Amemiya, T. (1985). Advanced econometrics. Cambridge: Harvard University Press.
Arellano-Valle, R. B., Branco, M. D., & Genton, M. G. (2006). A unified view on skewed distributions arising from selections. The Canadian Journal of Statistics, 34, 581–601.
Azzalini, A., & Capitanio, A. (1999). Statistical applications of the multivariate skew-normal distribution. Journal of the Royal Statistical Society: Series B, 61, 579–602.
Besanko, D., Dube, J.-P., & Gupta, S. (2000). Competitive price discrimination strategies in a vertical channel using aggregate retail data. Management Science, 49, 1121–1138.
Catsiapis, G., & Robinson, C. (1982). Sample selection bias with multiple selection rules: An application to student aid grants. Journal of Econometrics, 18, 351–368.
Chen, M.-H., & Dey, D. K. (1998). Bayesian modeling of correlated binary responses via scale mixture of multivariate normal link functions. Sankhyā, 60, 322–343.
Das, M., Newey, W. K., & Vella, F. (2003). Nonparametric estimation of sample selection models. Review of Economic Studies, 70, 33–58.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society: Series B, 39, 1–38.
Fang, K. T., Kotz, S., & Ng, K. W. (1990). Symmetric multivariate and related distributions. New York: Chapman & Hall.
Greene, W. H. (2008). Econometric analysis (6th ed.). Upper Saddle River, NJ: Prentice-Hall.
Griliches, Z., Hall, G. H., & Hausman, J. A. (1977). Missing data and self-selection in large panels. Harvard Institute of Economic Research discussion paper no. 573, Harvard University.
Heckman, J. J. (1974). Shadow prices, market wages, and labor supply. Econometrica, 42, 679–694.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153–161.
Hevia, J. D., & Arrazola, M. (2008). Three measures of returns to education: An illustration for the case of Spain. Economics of Education Review, 27, 266–275.
Hevia, J. D., & Arrazola, M. (2009). Marginal effects in the double selection regression model: An illustration for the wage of women in Spain. Economics Bulletin, 29, 611–621.
Ho, U. J., Lin, T. I., Chen, H. Y., & Wang, W. L. (2012). Some results on the truncated multivariate t distribution. Journal of Statistical Planning and Inference, 142, 25–40.
Hoffmann, R., & Kassouf, A. L. (2005). Deriving conditional and unconditional marginal effects in log earnings equations estimated by Heckman's procedure. Applied Economics, 37, 1303–1311.
Jiang, T., & Tuzhilin, A. (2006). Segmenting customers from population to individuals: Does 1-to-1 keep your customers forever? IEEE Transactions on Knowledge and Data Engineering, 18, 1–15.
Kim, H. J. (2008a). A class of weighted multivariate normal distributions and its properties. Journal of Multivariate Analysis, 99, 1758–1771.
Kim, H. J. (2008b). Moments of truncated Student-t distribution. Journal of the Korean Statistical Society, 37, 81–87.
Kim, H. J., & Kim, H.-M. (2015). A class of rectangle-screened multivariate normal distributions and its applications. Statistics, 49, 878–899.
Lee, L. F. (1983). Generalized econometric models with selectivity. Econometrica, 51, 507–512.
Liu, C., & Rubin, D. B. (1994). The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika, 81, 633–648.
Manjunath, B. G., & Wilhelm, S. (2009). Moments calculation for the double truncated multivariate normal density. Working paper. Available at SSRN: http://ssrn.com/abstract=1472153.
Marchenko, Y. V., & Genton, M. G. (2012). A Heckman selection-t model. Journal of the American Statistical Association, 107, 304–317.
Marcus, C. (1998). A practical yet meaningful approach to customer segmentation. Journal of Consumer Marketing, 15, 494–504.
McLachlan, G. J., & Krishnan, T. (2008). The EM algorithm and extensions (2nd ed.). Hoboken, NJ: John Wiley & Sons.
Mohanty, M. S. (2001). Determination of participation decision, hiring decision, and wages in a double selection framework: Male–female wage differentials in the US labor market revisited. Contemporary Economic Policy, 19, 197–212.
Newey, W. K. (1999). Two-step series estimation of sample selection models. Working paper no. 99-04. Massachusetts Institute of Technology, Cambridge, MA.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Serumaga-Zake, P. A. E., & Naudé, W. A. (2003). Private rates of return to education of Africans in South Africa for 1995: A double hurdle model. Development Southern Africa, 20, 515–528.
Wei, G. C. G., & Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85, 699–704.
Wilks, S. S. (1962). Mathematical statistics. New York: John Wiley & Sons.


Hevia, J. D., & Arrazola, M. (2008). Three measures of returns to education: An illustration for the case of Spain. Economics of Education Review, 27, 266–275. Hevia, J. D., & Arrazola, M. (2009). Marginal effects in the double selection regression model: an illustration for the wage of women in Spain. Economics Bulletin, 29, 611–621. Ho, U. J., Lin, T. I., Chen, H. Y., & Wang, W. L. (2012). Some results on the truncated multivariate t distribution. Journal of Statistical Planning and Inference, 142, 25–40. Hoffmann, R., & Kassouf, A. L. (2005). Deriving conditional and unconditional marginal effects in log earnings equations estimated by Heckman’s procedure. Applied Economics, 37, 1303–1311. Jiang, T., & Tuzhilin, A. (2006). Segmenting customers from population to individuals: Does 1-to-1 keep your customers forever? IEEE Transactions on Knowledge and Data Engineering, 18, 1–15. Kim, H. J. (2008a). A class of weighted multivariate normal distributions and its properties. Journal of Multivariate Analysis, 99, 1758–1771. Kim, H. J. (2008b). Moments of truncated Student-t distribution. Journal of the Korean Statistical Society, 37, 81–87. Kim, H. J., & Kim, Hyoung-Moon (2015). A class of rectangle-screened multivariate normal distributions and its applications. Statistics, 49, 878–899. Lee, L. F. (1983). Generalized econometric models with selectivity. Econometrica, 51, 507–512. Liu, C., & Rubin, D. B. (1994). The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika, 81, 633–648. Manjunath, B. G., & Wilhelm, S. (2009). Moments calculation for the double truncated multivariate normal density. Working Paper. Available at SSRN: http://ssrn.com/abstract=1472153. Marchenko, Y. V., & Genton, M. G. (2012). A Heckman selection-t model. Journal of the American Statistical Association, 107, 304–317. Marcus, C. (1998). A practical yet meaningful approach to customer segmentation. Journal of Consumer Marketing, 15, 494–504. McLachlan, G. J., & Krishnan, T. (2008). The EM algorithm and extensions (2nd ed.). Hoboken, NJ: John Wiley & Sons. Mohanty, M. S. (2001). Determination of participation decision, hiring decision, and wages in a double selection framework: Male–female wage differentials in the US labor market revisited. Contemporary Economic Policy, 19, 197–212. Newey, W. K. (1999). Two-step series estimation of sample selection models. Working Paper no. 99–04. Massachusetts Institute of Technology, Cambridge, MA. Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592. Serumaga-Zake, P. A. E., & Naudé, W. A. (2003). Private rates of return to education of Africans in South Africa for 1995: A Double Hurdle model. Development Southern Africa, 20, 515–528. Wei, G. C. G., & Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. Journal of the American Statistical Association, 85, 699–704. Wilks, S. S. (1962). Mathematical statistics. New York: John & Wiley.