√n-Consistent Estimation of Heteroskedastic Sample Selection Models

Songnian Chen

Hong Kong University of Science and Technology

Shakeeb Khan

University of Rochester

October 1999 (Preliminary and Incomplete Draft)

Abstract

This paper considers estimation of a sample selection model subject to conditional heteroskedasticity in both the selection and outcome equations. The form of heteroskedasticity allowed for in each equation is multiplicative, and each of the two scale functions is left unspecified, except for mild regularity conditions such as smoothness and boundedness. A three-step estimator for the parameters of interest in the outcome equation is proposed. The first two stages involve nonparametric estimation of the "propensity score" and the conditional interquartile range of the outcome equation, respectively. The third stage is based on the condition that the dependent variable and regressors in the outcome equation, when weighted by the inverse of the conditional interquartile range, can be expressed as a semilinear model with the nonparametric component a function of the propensity score. The parameters of interest can be estimated by applying any of the existing estimators for the semilinear model, and in the third stage we consider an approach analogous to that adopted in Ahn and Powell (1993). Under standard regularity conditions the proposed estimator is shown to be √n-consistent and asymptotically normal, and the form of its limiting covariance matrix is derived.

Key Words: sample selection model, heteroskedasticity, propensity score, interquartile range, semilinear model.

 Corresponding author. Department of Economics, University of Rochester, Rochester, NY 14627; e-mail:

[email protected]. We are grateful to J. Powell and B. Honoré for their helpful comments.

1 Introduction and Motivation

Estimation of economic models is often confronted with the problem of sample selectivity, which is well known to lead to specification bias if not properly accounted for. Sample selectivity arises from nonrandomly drawn samples, which can be due either to self-selection by the economic agents under investigation or to the selection rules established by the econometrician. In labor economics, the most studied example of sample selectivity is the estimation of the labor supply curve, where hours worked are only observed for agents who decide to participate in the labor force. It is well known that the failure to account for the presence of sample selection in the data may lead to inconsistent estimation of the parameters that capture the behavioral relation between the variables of interest.

Econometricians typically account for the presence of sample selectivity by estimating a bivariate equation model known as the sample selection model (or, using the terminology of Amemiya (1985), the Type 2 Tobit model). The first equation, typically referred to as the "selection" equation, relates the binary selection rule to a set of regressors. The second equation, referred to as the "outcome" equation, relates a continuous dependent variable, which is only observed when the selection variable is 1, to a set of possibly different regressors.

Parametric approaches to estimating this model require the specification of the joint distribution of the bivariate disturbance term. The resulting model is then estimated by maximum likelihood or parametric "2-step" methods. This approach yields inconsistent estimators if the distribution of the disturbance vector is parametrically misspecified and/or conditional heteroskedasticity is present. This negative result has motivated estimation procedures which are robust to either distributional misspecification or the presence of conditional heteroskedasticity.
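The selection problem described above can be illustrated with a small simulation. The following is only a sketch under assumptions that are not taken from the paper: a homoskedastic design, bivariate normal disturbances with correlation 0.8, and hypothetical names (`ols_slope`, `x`, `z`, `u`, `e`) and coefficient values chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical design: x enters both equations, z enters only the
# selection equation. All coefficient values are illustrative.
x = rng.normal(size=n)
z = rng.normal(size=n)

# Correlated disturbances (correlation 0.8): u drives selection,
# e drives the outcome.
cov = [[1.0, 0.8], [0.8, 1.0]]
u, e = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

beta_true = 1.0
d = (0.5 * x + z + u >= 0).astype(int)  # selection equation
y_star = beta_true * x + e              # latent outcome, observed only if d == 1


def ols_slope(xv, yv):
    """Slope from a least-squares regression of yv on a constant and xv."""
    design = np.column_stack([np.ones_like(xv), xv])
    coef, *_ = np.linalg.lstsq(design, yv, rcond=None)
    return coef[1]


# Infeasible benchmark: uses latent outcomes for every observation.
slope_full = ols_slope(x, y_star)
# Feasible but naive: uses only the selected observations.
slope_selected = ols_slope(x[d == 1], y_star[d == 1])

print(f"full sample slope:     {slope_full:.3f}")
print(f"selected sample slope: {slope_selected:.3f}")
```

In this design the disturbance of the outcome equation has a nonzero conditional mean in the selected sample, and that mean varies with `x`, so the naive slope drifts away from `beta_true` even in large samples, while the infeasible full-sample regression does not.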
Powell (1989) and Ahn and Powell (1993) proposed 2-step estimators which impose no distributional assumptions on the disturbance vector, but neither is robust to the presence of conditional heteroskedasticity in the outcome equation. Alternatively, Donald (1995) proposed a 2-step estimator which allows for general forms of conditional heteroskedasticity, but requires the disturbance vector to have a bivariate normal distribution. Given the observed characteristics of many types of microeconomic data, such as differing variability across agents with differing characteristics, as well as empirical distributions exhibiting tails thicker than would be consistent with a Gaussian distribution, it appears important to address both these issues simultaneously. This paper attempts to do so by considering a model which exhibits nonparametric multiplicative heteroskedasticity in each of the two equations. This allows for conditional heteroskedasticity of very general forms, and does not require a parametric specification for the distribution of the disturbance vector.

The idea of modelling conditional heteroskedasticity through a multiplicative structure is quite common in the econometrics and statistics literature. To cite just a few of the many examples, Harvey (1976) adopted a parametric multiplicative structure in his estimation of limited dependent variable models, and Engle (1982) also adopted a parametric multiplicative structure in his seminal work introducing ARCH models. In a nonparametric estimation setting, the nonparametric location-scale model considered in Fan and Gijbels (1996) adopts a multiplicative scale function which is nonparametrically specified. In the estimation of semiparametric limited dependent variable models, Chen and Khan (1999b,c) adopt a nonparametric multiplicative structure to estimate the parameters of interest at the parametric rate. Finally, there are also many examples in the testing literature which adopt multiplicative structures when testing for the presence of conditional heteroskedasticity. In their proposed tests, Koenker and Bassett (1982), Powell (1986) and Maddala (1995) also consider alternatives which have a (parametric) multiplicative structure.
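A short worked equation may help show why a multiplicative scale function pairs naturally with quantile-based measures of spread. The sketch below uses a simplified location-scale setting without selection. The symbols m, σ, ε and c_α are introduced here for illustration only, and the assumptions that ε is independent of x and that σ(x) > 0 are ours rather than the paper's regularity conditions.

```latex
% Location-scale model with multiplicative scale; c_\alpha denotes the
% \alpha-th quantile of \varepsilon (illustrative notation).
y = m(x) + \sigma(x)\,\varepsilon
\quad\Longrightarrow\quad
q_\alpha(y \mid x) = m(x) + \sigma(x)\,c_\alpha ,

% so the conditional interquartile range is proportional to the scale:
q_{0.75}(y \mid x) - q_{0.25}(y \mid x)
  = \sigma(x)\,\bigl(c_{0.75} - c_{0.25}\bigr).
```

Under these assumptions the location term cancels from the interquartile range, so dividing by it removes the unknown scale function up to a constant factor.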

In this paper we show that √n-consistent estimation of the parameters in the outcome equation is still possible in the presence of nonparametric multiplicative heteroskedasticity. Our estimation approach involves three stages. The first stage concentrates on the selection equation, estimating the "propensity scores" introduced in Rosenbaum and Rubin (1983). In the second stage, nonparametric quantile regression methods are used to estimate the conditional interquartile range of the outcome equation dependent variable for the selected observations. It will be shown that the conditional interquartile range is a product of the outcome equation scale function and an unknown function of the propensity score. This fact implies that when the dependent variable and regressors are transformed by dividing by the conditional interquartile range, the conditional expectation of the transformed outcome equation is of the semilinear form which arises in homoskedastic models. Since the nonparametric component of the semilinear model is a function of the propensity score, the first-stage estimated values can be used in combination with the transformed values to estimate the parameters of interest in a fashion analogous to the approach introduced in Ahn and Powell (1993).

The rest of the paper is organized as follows. The next section describes the model in detail, and further details the estimation procedure. Sections 3 and 4 detail the regularity conditions imposed, and establish the asymptotic properties of the estimator, respectively. Section 5 concludes and suggests topics for future research. An appendix collects the proofs of the asymptotic arguments.
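The third stage described above can be sketched in a few lines. This is a minimal illustration and not the paper's estimator: it takes the first-stage propensity scores and second-stage interquartile ranges as given, uses a Gaussian kernel with no trimming or bias correction, and the function name `pairwise_difference_estimator`, its arguments, and the toy data are all hypothetical.

```python
import numpy as np


def pairwise_difference_estimator(y, X, p_hat, iqr_hat, h):
    """Kernel-weighted pairwise-difference least squares.

    y, X    : outcome and regressors for the selected observations
    p_hat   : propensity scores (taken as given here)
    iqr_hat : conditional interquartile ranges (taken as given here)
    h       : bandwidth for the kernel in the propensity score
    """
    # Rescale by the interquartile range so the model becomes semilinear.
    y_t = y / iqr_hat
    X_t = X / iqr_hat[:, None]

    # All pairs i < j; pairs with similar propensity scores get more weight,
    # so the unknown function of the score roughly cancels in differences.
    i, j = np.triu_indices(len(y), k=1)
    w = np.exp(-0.5 * ((p_hat[i] - p_hat[j]) / h) ** 2)

    dX = X_t[i] - X_t[j]
    dy = y_t[i] - y_t[j]
    Sxx = (dX * w[:, None]).T @ dX
    Sxy = (dX * w[:, None]).T @ dy
    return np.linalg.solve(Sxx, Sxy)


# Toy, noise-free check with made-up inputs: after rescaling, the outcome
# equals X_t @ beta plus a smooth function of the propensity score.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))
p = rng.uniform(0.2, 0.9, size=n)
iqr = np.exp(0.3 * X[:, 0])
beta = np.array([1.0, -2.0])
y = X @ beta + iqr * p**2

beta_hat = pairwise_difference_estimator(y, X, p, iqr, h=0.05)
print(beta_hat)
```

Differencing pairs of observations with nearly equal propensity scores approximately removes the nonparametric component, which is why the regression on differenced, rescaled data can recover the slope parameters without specifying that component.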

2 Heteroskedastic Model and Estimation Procedure

We consider estimation of the following model:

d_i = I[ w_i' \delta_0 + \sigma_1(w_i) \epsilon_i \ge 0 ]    (2.1)

y_i = d_i y_i^* = d_i \cdot ( x_i' \beta_0 + \sigma_2(x_i) \eta_i )    (2.2)

Where 2