Semiparametric Estimation of Fixed Effects Panel Data Varying Coefficient Models

Yiguo Sun, Department of Economics, University of Guelph, Guelph, ON, Canada N1G2W1
Raymond J. Carroll, Department of Statistics, Texas A&M University, College Station, TX 77843-3134, USA
Dingding Li, Department of Economics, University of Windsor, Windsor, ON, Canada N9B3P4

April 2, 2009

Abstract: We consider the problem of estimating a varying coefficient panel data model with fixed effects using a local linear regression approach. Unlike the first-differenced estimator, our proposed estimator removes the fixed effects using kernel-based weights. This results in a one-step estimator that requires no back-fitting. The proposed estimator is shown to be asymptotically normally distributed. A modified least-squares cross-validation method is used to select the optimal bandwidth automatically. Moreover, we propose a test statistic for testing the null hypothesis of a random-effects against a fixed-effects varying coefficient panel data model. Monte Carlo simulations show that our proposed estimator and test statistic have satisfactory finite-sample performance.

Key words: Consistent test; Fixed effects; Panel data; Varying coefficients model.

1 INTRODUCTION

Panel data track information on each individual unit across time. Such a two-dimensional information set enables researchers to estimate complex models and extract information and inferences that may not be possible using pure time-series or cross-section data. With the increased availability of panel data, both theoretical and applied work in panel data analysis has become more popular in recent years. Arellano (2003), Baltagi (2005), and Hsiao (2003) provide excellent overviews of parametric panel data analysis. However, it is well known that a misspecified parametric panel data model may give misleading inferences. To avoid imposing the strong restrictions assumed in parametric panel data models, econometricians and statisticians have developed theories for nonparametric and semiparametric panel data regression models. For example, Henderson, Carroll, and Li (2008) considered the fixed-effects nonparametric panel data model. Henderson and Ullah (2005), Lin and Carroll (2000, 2001, 2006), Lin, Wang, Welsh and Carroll (2004), Lin and Ying (2001), Ruckstuhl, Welsh and Carroll (1999), Wang (2003), and Wu and Zhang (2002) considered random-effects nonparametric panel data models. Li and Stengos (1996) considered a partially linear panel data model with some endogenous regressors via an IV approach, and Su and Ullah (2006) investigated a fixed-effects partially linear panel data model with exogenous regressors. A purely nonparametric model suffers from the `curse of dimensionality' problem, while a partially linear semiparametric model may be too restrictive, as it only allows for some additive nonlinearities.

The varying coefficient model considered in this paper includes both the pure nonparametric model and the partially linear regression model as special cases. Moreover, we assume a fixed-effects panel data model: by fixed effects we mean that the individual effects are correlated with the regressors in an unknown way. Consistent with well-known results in parametric panel data estimation, we show that random-effects estimators are inconsistent if the true model has fixed effects, while fixed-effects estimators are consistent under both random-effects and fixed-effects panel data models, although the random-effects estimator is more efficient than the fixed-effects estimator when the random-effects model holds true. Therefore, estimation of random-effects models is appropriate only when the individual effects are uncorrelated with the regressors. Since economists in practice often view the assumptions required for the random-effects model as unsupported by the data, this paper emphasizes estimation of a fixed-effects panel data varying coefficient model, and we propose the local linear method to estimate the unknown smooth coefficient functions. We also propose a test statistic for testing a random-effects against a fixed-effects varying coefficient panel data model. Simulation results show that our proposed estimator and test statistic have satisfactory finite-sample performance.

Recently, Cai and Li (2008) studied a dynamic nonparametric panel data model with unknown varying coefficients. Because Cai and Li (2008) allow the regressors not appearing in the varying coefficient curves to be endogenous, they use a GMM-based IV estimation method combined with a local linear regression approach to deliver a consistent estimator of the unknown smooth coefficient curves. In this paper, all regressors are assumed to be exogenous; therefore, the least squares method combined with the local linear regression approach produces a consistent estimator of the unknown smooth coefficient curves. In addition, our asymptotic results are given for a finite time length.

The rest of the paper is organized as follows. In Section 2 we set up the model and discuss the transformation methods used to remove the fixed effects. Section 3 proposes a nonparametric fixed-effects estimator and studies its asymptotic properties. In Section 4 we propose a statistic for testing the null hypothesis of a random-effects against a fixed-effects varying coefficient model. Section 5 reports simulation results examining the finite-sample performance of our semiparametric estimator and the test statistic. Finally, we conclude in Section 6. The proofs of the main results are collected in the Appendix.

2 FIXED-EFFECTS VARYING COEFFICIENT PANEL DATA MODELS

We consider the following fixed-effects varying coefficient panel data regression model
\[
Y_{it} = X_{it}^{\top}\beta(Z_{it}) + \alpha_i + v_{it}, \qquad i = 1,\dots,n,\; t = 1,\dots,m, \tag{1}
\]
where the covariate $Z_{it} = (Z_{it,1},\dots,Z_{it,q})^{\top}$ is of dimension $q$, $X_{it} = (X_{it,1},\dots,X_{it,p})^{\top}$ is of dimension $p$, $\beta(\cdot) = \{\beta_1(\cdot),\dots,\beta_p(\cdot)\}^{\top}$ contains $p$ unknown functions, and all other variables are scalars. None of the variables in $X_{it}$ can be obtained from $Z_{it}$ and vice versa. The random errors $v_{it}$ are assumed to be i.i.d. with zero mean and finite variance $\sigma_v^2 > 0$, and independent of $\alpha_j$, $Z_{js}$, and $X_{js}$ for all $i$, $j$, $s$, and $t$. The unobserved individual effects $\alpha_i$ are assumed to be i.i.d. with zero mean and finite variance $\sigma_\alpha^2 > 0$. We allow $\alpha_i$ to be correlated with $Z_{it}$ and/or $X_{it}$ with an unknown correlation structure; hence, model (1) is a fixed-effects model. Alternatively, when $\alpha_i$ is uncorrelated with $Z_{it}$ and $X_{it}$, model (1) becomes a random-effects model.

A simple example motivates fixed-effects models and the need to estimate the function $\beta(\cdot)$. Suppose that $Y_{it}$ is the (logarithm of) income of individual $i$ at time period $t$, $X_{it}$ is the education of individual $i$ at time period $t$ (e.g., number of years of schooling), and $Z_{it}$ is the age of individual $i$ at time $t$. The fixed-effects term $\alpha_i$ in (1) captures unobservable individual characteristics such as ability (e.g., IQ level), which are not observed in the data at hand. In this problem, economists are interested in the marginal effect of education on income after controlling for the unobservable individual ability factors, that is, the income change from an additional year of education regardless of whether the person has high or low ability. In this simple example, it is reasonable to believe that ability and education are positively correlated. If one does not control for the unobserved individual effects, one would over-estimate the true marginal effect of education on income (i.e., an upward bias).

When $X_{it} \equiv 1$ for all $i$ and $t$ and $p = 1$, model (1) reduces to Henderson, Carroll, and Li's (2008) nonparametric panel data model with fixed effects as a special case. One may also interpret $X_{it}^{\top}\beta(Z_{it})$ as an interaction between $X_{it}$ and $Z_{it}$, where we allow $\beta(Z_{it})$ a flexible form, since popular parametric setups such as $Z_{it}$ and/or $Z_{it}^2$ may be misspecified.

For a given fixed-effects model, there are many ways of removing the unknown fixed effects. The usual first-differenced (FD) estimation method subtracts one equation from another to remove the time-invariant fixed effects. For example, subtracting the equation for time $t-1$ from that for time $t$, we have, for $t = 2,\dots,m$,
\[
\widetilde{Y}_{it} = Y_{it} - Y_{i,t-1} = X_{it}^{\top}\beta(Z_{it}) - X_{i,t-1}^{\top}\beta(Z_{i,t-1}) + \widetilde{v}_{it}, \quad \text{with } \widetilde{v}_{it} = v_{it} - v_{i,t-1}; \tag{2}
\]
or, subtracting the equation for time $1$ from that for time $t$, we obtain, for $t = 2,\dots,m$,
\[
\widetilde{Y}_{it} = Y_{it} - Y_{i1} = X_{it}^{\top}\beta(Z_{it}) - X_{i1}^{\top}\beta(Z_{i1}) + \widetilde{v}_{it}, \quad \text{with } \widetilde{v}_{it} = v_{it} - v_{i1}. \tag{3}
\]

The conventional fixed-effects (FE) estimation method, on the other hand, removes the fixed effects by subtracting the cross-time average of the system from each equation; it gives, for $t = 2,\dots,m$,
\[
\widetilde{Y}_{it} = Y_{it} - \frac{1}{m}\sum_{s=1}^{m} Y_{is} = X_{it}^{\top}\beta(Z_{it}) - \frac{1}{m}\sum_{s=1}^{m} X_{is}^{\top}\beta(Z_{is}) + \widetilde{v}_{it} = \sum_{s=1}^{m} q_{ts}\, X_{is}^{\top}\beta(Z_{is}) + \widetilde{v}_{it}, \tag{4}
\]
where $q_{ts} = -1/m$ if $s \neq t$ and $1 - 1/m$ otherwise, $\widetilde{v}_{it} = v_{it} - m^{-1}\sum_{s=1}^{m} v_{is}$, and $\sum_{s=1}^{m} q_{ts} = 0$ for all $t$.
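For intuition, the within transformation in (4) is multiplication of each unit's time series by the matrix $Q = I_m - e_m e_m^{\top}/m$, whose entries are exactly the weights $q_{ts}$. A minimal numpy sketch (variable names are ours, not the paper's):

```python
import numpy as np

def within_transform(y):
    """Apply the within transformation of equation (4): subtract each
    unit's cross-time average, i.e. multiply by Q = I_m - e_m e_m^T / m."""
    y = np.asarray(y, dtype=float)        # shape (n, m): n units, m periods
    m = y.shape[1]
    Q = np.eye(m) - np.ones((m, m)) / m   # entries q_ts: -1/m off-diagonal, 1 - 1/m on it
    return y @ Q.T

# An additive individual effect alpha_i is removed exactly:
rng = np.random.default_rng(0)
y = rng.normal(size=(4, 3))
alpha = rng.normal(size=(4, 1))
print(np.allclose(within_transform(y + alpha), within_transform(y)))  # True
```

Because each row of $Q$ sums to zero, any time-invariant additive term drops out of the transformed data.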

Many nonparametric local smoothing methods can be used to estimate the unknown function $\beta(\cdot)$. However, for each $i$, the right-hand sides of equations (2)-(4) contain linear combinations of $X_{it}^{\top}\beta(Z_{it})$ across different times $t$. If $X_{it}$ contains a time-invariant component, say the first component $X_{it,1} \equiv X_{i,1}$, and $\beta_1(Z_{it})$ denotes the first component of $\beta(Z_{it})$, then first-differencing $X_{it,1}\beta_1(Z_{it})$ gives $X_{i,1}\{\beta_1(Z_{it}) - \beta_1(Z_{i,t-1})\}$, an additive expression in the same unknown function evaluated at two different observation points. A kernel-based estimator then usually requires some back-fitting algorithm to recover the unknown function, and it suffers the problems familiar from estimating nonparametric additive models. Moreover, if $\beta_1(Z_{it})$ contains an additive constant term, say $\beta_1(Z_{it}) = c + g_1(Z_{it})$ with $c$ a constant, then first-differencing wipes out the additive constant $c$. As a consequence, one cannot in general consistently estimate $\beta_1(\cdot)$ from a first-differenced model (if $X_{i,1} \equiv 1$, one can recover $c$ by averaging $Y_{it} - X_{it}^{\top}\hat{\beta}(Z_{it})$ over all cross sections and across time).

Therefore, in this paper we consider an alternative way of removing the unknown fixed effects, motivated by the least squares dummy variable (LSDV) model in parametric panel data analysis. We will describe how the proposed method removes the fixed effects by subtracting a smoothed version of the cross-time average from each individual unit. As we show later, this transformation does not wipe out the additive constant $c$ in $\beta_1(Z_{it}) = c + g_1(Z_{it})$; therefore, we can consistently estimate $\beta_1(\cdot)$, as well as the other components of $\beta(\cdot)$, when at most one of the variables in $X_{it}$ is time-invariant. We use $I_n$ to denote an identity matrix of dimension $n$ and $e_m$ to denote an $m \times 1$ vector with all elements equal to one. Rewriting model (1) in matrix form yields
\[
Y = B\{X, \beta(Z)\} + D_0\alpha_0 + V, \tag{5}
\]

where $Y = (Y_1^{\top},\dots,Y_n^{\top})^{\top}$ and $V = (v_1^{\top},\dots,v_n^{\top})^{\top}$ are $(nm) \times 1$ vectors, with $Y_i = (Y_{i1},\dots,Y_{im})^{\top}$ and $v_i = (v_{i1},\dots,v_{im})^{\top}$; $B\{X,\beta(Z)\}$ stacks all $X_{it}^{\top}\beta(Z_{it})$ into an $(nm) \times 1$ vector with the $(i,t)$ subscripts matching those of the $(nm) \times 1$ vector $Y$; $\alpha_0 = (\alpha_1,\dots,\alpha_n)^{\top}$ is an $n \times 1$ vector; and $D_0 = I_n \otimes e_m$ is an $(nm) \times n$ matrix with main diagonal blocks $e_m$, where $\otimes$ denotes the Kronecker product.

However, we cannot estimate model (5) directly, because of the fixed-effects term; we need some identification condition. Su and Ullah (2006) assume $\sum_{i=1}^{n}\alpha_i = 0$. We show that assuming an i.i.d. sequence of unknown fixed effects $\alpha_i$ with zero mean and finite variance is enough to identify the unknown coefficient curves asymptotically. We therefore impose this weaker identification condition in this paper.

To introduce our estimator, we first assume that model (1) holds with the restriction $\sum_{i=1}^{n}\alpha_i = 0$ (we do not impose this restriction on our estimator; it is added here only to motivate the estimator). Define $\alpha = (\alpha_2,\dots,\alpha_n)^{\top}$. We then rewrite (5) as
\[
Y = B\{X, \beta(Z)\} + D\alpha + V, \tag{6}
\]
where $D = [-e_{n-1}, I_{n-1}]^{\top} \otimes e_m$ is an $(nm) \times (n-1)$ matrix. Note that $D\alpha = D_0\alpha_0$ with $\alpha_0 = (-\sum_{i=2}^{n}\alpha_i, \alpha_2,\dots,\alpha_n)^{\top}$, so that the restriction $\sum_{i=1}^{n}\alpha_i = 0$ is imposed in (6).

Define the $m \times m$ diagonal matrix $K_H(Z_i, z) = \mathrm{diag}\{K_H(Z_{i1}, z),\dots,K_H(Z_{im}, z)\}$ for each $i$, and the $(nm) \times (nm)$ diagonal matrix $W_H(z) = \mathrm{diag}\{K_H(Z_1, z),\dots,K_H(Z_n, z)\}$, where $K_H(Z_{it}, z) = K\{H^{-1}(Z_{it} - z)\}$ for all $i$ and $t$, and $H = \mathrm{diag}(h_1,\dots,h_q)$ is a $q \times q$ diagonal bandwidth matrix. We then solve the optimization problem
\[
\min_{\beta(\cdot),\,\alpha}\; [Y - B\{X,\beta(Z)\} - D\alpha]^{\top}\, W_H(z)\, [Y - B\{X,\beta(Z)\} - D\alpha], \tag{7}
\]
where the local weight matrix $W_H(z)$ ensures the locality of our nonparametric fitting; we place no weight matrix on the data variation, since the $\{v_{it}\}$ are i.i.d. across equations. Taking the first-order condition with respect to $\alpha$ gives
\[
D^{\top} W_H(z)\,[Y - B\{X,\beta(Z)\} - D\hat{\alpha}(z)] = 0, \tag{8}
\]
which yields
\[
\hat{\alpha}(z) = \{D^{\top} W_H(z) D\}^{-1} D^{\top} W_H(z)\,[Y - B\{X,\beta(Z)\}]. \tag{9}
\]

Define $S_H(z) = M_H(z)^{\top} W_H(z) M_H(z)$ and $M_H(z) = I_{nm} - D\{D^{\top} W_H(z) D\}^{-1} D^{\top} W_H(z)$, where $I_{nm}$ denotes the identity matrix of dimension $nm$ by $nm$. Replacing $\alpha$ in (7) by $\hat{\alpha}(z)$, we obtain the concentrated weighted least squares problem
\[
\min_{\beta(\cdot)}\; [Y - B\{X,\beta(Z)\}]^{\top}\, S_H(z)\, [Y - B\{X,\beta(Z)\}]. \tag{10}
\]
Note that $M_H(z)D \equiv 0$ for all $z$; hence, the fixed-effects term $\alpha$ is removed in model (10).

To see how $M_H(z)$ transforms the data, simple calculations give
\[
M_H(z) = I_{nm} - D\Big\{A^{-1} - A^{-1} e_{n-1} e_{n-1}^{\top} A^{-1} \Big/ \sum_{i=1}^{n} c_H(Z_i, z)\Big\} D^{\top} W_H(z),
\]
where $c_H(Z_i, z)^{-1} = \sum_{t=1}^{m} K_H(Z_{it}, z)$ for $i = 1,\dots,n$ and $A = \mathrm{diag}\{c_H(Z_2, z)^{-1},\dots,c_H(Z_n, z)^{-1}\}$. We use the formula $(A + BCD)^{-1} = A^{-1} - A^{-1}B(DA^{-1}B + C^{-1})^{-1}DA^{-1}$ to derive the inverse matrix; see Appendix B in Poirier (1995).
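As a numerical sanity check, $M_H(z)$ can also be formed by brute force from its definition $M_H(z) = I_{nm} - D\{D^{\top}W_H(z)D\}^{-1}D^{\top}W_H(z)$ rather than through the inversion formula above. The sketch below (our own names, an Epanechnikov kernel, scalar $Z$ so $q = 1$) verifies that $M_H(z)D = 0$, i.e. that the transformation annihilates the fixed effects:

```python
import numpy as np

def kernel_demeaning_matrix(Z, z, h):
    """Brute-force M_H(z) for scalar z (q = 1). M_H(z) D = 0, where
    D = [-e_{n-1}, I_{n-1}]^T kron e_m, so fixed effects are wiped out."""
    n, m = Z.shape
    k = lambda u: np.clip(0.75 * (1 - u**2), 0, None)   # Epanechnikov kernel
    w = k((Z - z) / h).ravel() / h                      # K_H(Z_it, z), stacked unit by unit
    W = np.diag(w)                                      # W_H(z)
    D = np.kron(np.vstack([-np.ones(n - 1), np.eye(n - 1)]), np.ones((m, 1)))
    M = np.eye(n * m) - D @ np.linalg.solve(D.T @ W @ D, D.T @ W)
    return M, D

rng = np.random.default_rng(1)
Z = rng.uniform(0, 1, size=(5, 3))
M, D = kernel_demeaning_matrix(Z, z=0.5, h=0.8)
print(np.allclose(M @ D, 0))  # True
```

The bandwidth here is chosen so that every unit receives positive total weight; otherwise $D^{\top}W_H(z)D$ would be singular.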

3 NONPARAMETRIC ESTIMATOR AND ASYMPTOTIC THEORY

A local linear regression approach is commonly used to estimate non-/semi-parametric models. The basic idea of this method is to apply a Taylor expansion up to the second-order derivative. Throughout the paper we write $A_n \simeq B_n$ to denote that $B_n$ is the leading term of $A_n$, i.e., $A_n = B_n + (s.o.)$, where $(s.o.)$ denotes terms of probability order smaller than that of $B_n$. For each $l = 1,\dots,p$, we have the following Taylor expansion around $z$:
\[
\beta_l(z_{it}) \simeq \beta_l(z) + \{H\beta_l'(z)\}^{\top}[H^{-1}(z_{it} - z)] + \tfrac{1}{2}\, r_{H,l}(z_{it}, z), \tag{11}
\]
where $\beta_l'(z) = \partial\beta_l(z)/\partial z$ is the $q \times 1$ vector of first-order derivatives and $r_{H,l}(z_{it}, z) = \{H^{-1}(z_{it} - z)\}^{\top}\{H\,\frac{\partial^2\beta_l(z)}{\partial z\partial z^{\top}}\,H\}\{H^{-1}(z_{it} - z)\}$. Of course, $\beta_l(z)$ approximates $\beta_l(z_{it})$ when $z_{it}$ is close to $z$. Define $\theta_l(z) = \{\beta_l(z), [H\beta_l'(z)]^{\top}\}^{\top}$, a $(q+1) \times 1$ column vector for $l = 1,2,\dots,p$, and $\Theta(z) = \{\theta_1(z),\dots,\theta_p(z)\}^{\top}$, a $p \times (q+1)$ parameter matrix. The first column of $\Theta(z)$ is $\beta(z) = \{\beta_1(z),\dots,\beta_p(z)\}^{\top}$. Therefore, we replace $\beta(Z_{it})$ in (1) by $\Theta(z)G_{it}(z, H)$ for each $i$ and $t$, where $G_{it}(z, H) = [1, \{H^{-1}(Z_{it} - z)\}^{\top}]^{\top}$ is a $(q+1) \times 1$ vector.

To simplify the matrix operations, we stack the matrix $\Theta(z)$ into a $p(q+1) \times 1$ column vector, denoted $\mathrm{vec}\{\Theta(z)\}$. Since $\mathrm{vec}(ABC) = (C^{\top} \otimes A)\mathrm{vec}(B)$ and $(A \otimes B)^{\top} = A^{\top} \otimes B^{\top}$, where $\otimes$ denotes the Kronecker product, we have $X_{it}^{\top}\Theta(z)G_{it}(z,H) = \{G_{it}(z,H) \otimes X_{it}\}^{\top}\mathrm{vec}\{\Theta(z)\}$ for all $i$ and $t$. Thus, we consider the minimization problem
\[
\min\; [Y - R(z,H)\,\mathrm{vec}\{\Theta(z)\}]^{\top}\, S_H(z)\, [Y - R(z,H)\,\mathrm{vec}\{\Theta(z)\}], \tag{12}
\]
where
\[
R_i(z,H) = \begin{bmatrix} \{G_{i1}(z,H) \otimes X_{i1}\}^{\top} \\ \vdots \\ \{G_{im}(z,H) \otimes X_{im}\}^{\top} \end{bmatrix}
\]
is an $m \times [p(q+1)]$ matrix and $R(z,H) = [R_1(z,H)^{\top},\dots,R_n(z,H)^{\top}]^{\top}$ is an $(nm) \times [p(q+1)]$ matrix.

Simple calculations give
\[
\mathrm{vec}\{\hat{\Theta}(z)\} = \{R(z,H)^{\top} S_H(z) R(z,H)\}^{-1} R(z,H)^{\top} S_H(z)\, Y
= \mathrm{vec}\{\Theta(z)\} + \{R(z,H)^{\top} S_H(z) R(z,H)\}^{-1}(A_n/2 + B_n + C_n), \tag{13}
\]
where $A_n = R(z,H)^{\top} S_H(z)\, r(Z,H)$, $B_n = R(z,H)^{\top} S_H(z) D_0\alpha_0$, and $C_n = R(z,H)^{\top} S_H(z) V$. The $\{t + (i-1)m\}$th element of the column vector $r(Z,H)$ is $X_{it}^{\top} r_H(\widetilde{Z}_{it}, z)$, where $r_H(\cdot,\cdot) = \{r_{H,1}(\cdot,\cdot),\dots,r_{H,p}(\cdot,\cdot)\}^{\top}$ and $r_{H,l}(\widetilde{Z}_{it}, z) = \{H^{-1}(Z_{it} - z)\}^{\top}\{H\,\frac{\partial^2\beta_l(\widetilde{Z}_{it})}{\partial z\partial z^{\top}}\,H\}\{H^{-1}(Z_{it} - z)\}$, with $\widetilde{Z}_{it}$ lying between $Z_{it}$ and $z$ for each $i$ and $t$. Both $A_n$ and $B_n$ contribute to the bias of the estimator. Also, if $\sum_{i=1}^{n}\alpha_i = 0$ holds true, then $B_n = 0$; if we only assume the $\alpha_i$ are i.i.d. with zero mean and finite variance, the bias due to the unknown fixed effects can be asymptotically ignored.

To derive the asymptotic distribution of $\mathrm{vec}\{\hat{\Theta}(z)\}$, we first state some regularity conditions. Throughout this paper, $M > 0$ denotes a finite constant that may take different values at different places.

Assumption 1: The random variables $(X_{it}, Z_{it})$ are independently and identically distributed (i.i.d.) across the $i$ index, and

(a) $E\|X_{it}\|^{2(1+\delta)} \le M < \infty$ and $E\|Z_{it}\|^{2(1+\delta)} \le M < \infty$ hold for some $\delta > 0$ and for all $i$ and $t$.

(b) The $Z_{it}$ are continuous random variables with a p.d.f. $f_t(z)$. Also, for each $z \in \mathbb{R}^q$, $f(z) = \sum_{t=1}^{m} f_t(z) > 0$.

(c) Denote $\kappa_{it} = K_H(Z_{it}, z)$ and $\varpi_{it} = \kappa_{it}/\sum_{t=1}^{m}\kappa_{it} \in (0,1)$ for all $i$ and $t$. Then $\Omega(z) = |H|^{-1}\sum_{t=1}^{m} E\{(1 - \varpi_{it})\kappa_{it} X_{it} X_{it}^{\top}\}$ is a nonsingular matrix.

(d) Let $f_t(z|X_{it})$ be the conditional p.d.f. of $Z_{it}$ at $Z_{it} = z$ given $X_{it}$, and let $f_{t,s}(z_1, z_2|X_{it}, X_{js})$ be the joint conditional p.d.f. of $(Z_{it}, Z_{js})$ at $(z_1, z_2)$ given $(X_{it}, X_{js})$ for $t \neq s$ and any $i$ and $j$. Also, $\Omega(z)$, $f_t(z)$, $f_t(\cdot|X_{it})$, and $f_{t,s}(\cdot,\cdot|X_{it}, X_{js})$ are uniformly bounded in the domain of $Z$ and are all twice continuously differentiable at $z \in \mathbb{R}^q$ for all $t \neq s$, $i$, and $j$.

Assumption 2: Both $X$ and $Z$ have full column rank; $\{X_{it,1},\dots,X_{it,p}, \{X_{it,l}Z_{it,j}: l = 1,\dots,p;\ j = 1,\dots,q\}\}$ are linearly independent. If $X_{it,l} \equiv X_{i,l}$ for at most one $l \in \{1,\dots,p\}$, i.e., $X_{i,l}$ does not depend on $t$, we assume $E(X_{i,l}) \neq 0$. The unobserved fixed effects $\alpha_i$ are i.i.d. with zero mean and finite variance $\sigma_\alpha^2 > 0$. The random errors $v_{it}$ are assumed to be i.i.d. with zero mean and finite variance $\sigma_v^2$, and independent of $Z_{it}$ and $X_{it}$ for all $i$ and $t$. $Y_{it}$ is generated by equation (1).

Suppose $X_{it}$ contains a time-invariant regressor, say the $l$th component $X_{it,l} = W_i$. Then the corresponding coefficient $\beta_l(\cdot)$ is estimable if $M_H(z)(W \otimes e_m) \neq 0$ for a given $z$, where $W = (W_1,\dots,W_n)^{\top}$. Simple calculations give $M_H(z)(W \otimes e_m) = (n^{-1}\sum_{i=1}^{n} W_i)\, M_H(z)(e_n \otimes e_m)$. The proof of Lemma A.2 in Appendix 7.1 can be used to show that $M_H(z)(e_n \otimes e_m) \neq 0$ for any given $z$ with probability one. Therefore, $\beta_l(\cdot)$ is asymptotically identifiable if $n^{-1}\sum_{i=1}^{n} X_{it,l} = n^{-1}\sum_{i=1}^{n} W_i$ does not converge to $0$ almost surely. For example, if $X_{it}$ contains a constant, say $X_{it,1} = W_i \equiv 1$, then $\beta_1(\cdot)$ is estimable because $n^{-1}\sum_{i=1}^{n} W_i = 1 \neq 0$.

Assumption 3: $K(v) = \prod_{s=1}^{q} k(v_s)$ is a product kernel, and the univariate kernel function $k(\cdot)$ is a uniformly bounded, symmetric (around zero) probability density function with compact support $[-1, 1]$. In addition, define $|H| = h_1 \cdots h_q$ and $\|H\| = \sqrt{\sum_{j=1}^{q} h_j^2}$. As $n \to \infty$, $\|H\| \to 0$ and $n|H| \to \infty$.

The assumptions listed above are regularity conditions commonly seen in the nonparametric estimation literature. Assumption 1 clearly excludes the case of either $X_{it}$ or $Z_{it}$ being I(1); beyond the moment restrictions, we do not impose an I(0) structure on $X_{it}$ across time, since this paper considers the case where $m$ is a small finite number. Also, instead of imposing the smoothness assumption on $f_t(\cdot|X_{it})$ and $f_{t,s}(\cdot,\cdot|X_{it}, X_{is})$ as in Assumption 1(d), we could assume that $f_t(z)E(X_{it}X_{it}^{\top}|z)$ and $f_{t,s}(z_1,z_2)E(X_{it}X_{js}^{\top}|z_1,z_2)$ are uniformly bounded in the domain of $Z$ and are all twice continuously differentiable at $z \in \mathbb{R}^q$ for all $t \neq s$, $i$, and $j$; our version of the smoothness assumption simplifies the notation in the proofs. Assumption 2 indicates that $X_{it}$ may contain a constant term of ones. The compact support of the kernel in Assumption 3 is imposed for brevity of proof and can be removed at the cost of lengthier proofs; specifically, the Gaussian kernel is allowed. We use $\hat{\beta}(z)$ to denote the first column of $\hat{\Theta}(z)$; then $\hat{\beta}(z)$ estimates $\beta(z)$.
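Assumption 3 is satisfied, for example, by a product Epanechnikov kernel; a small sketch (names are ours):

```python
import numpy as np

def product_epanechnikov(v):
    """K(v) = prod_s k(v_s) with k(u) = 0.75 (1 - u^2) 1{|u| <= 1}:
    a bounded, symmetric density with compact support [-1, 1]^q."""
    v = np.atleast_1d(v)
    k = np.where(np.abs(v) <= 1, 0.75 * (1 - v**2), 0.0)
    return np.prod(k)

# k integrates to one on [-1, 1] (checked by a Riemann sum on a fine grid):
u = np.linspace(-1, 1, 200001)
print(np.isclose(np.mean(0.75 * (1 - u**2)) * 2, 1.0, atol=1e-3))  # True
print(product_epanechnikov([0.0, 0.0]))  # 0.5625
```

Any kernel of this form can be plugged into the $K_H(Z_{it}, z) = K\{H^{-1}(Z_{it} - z)\}$ weights used throughout.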

THEOREM 3.1 Under Assumptions 1-3, we obtain the following bias and variance for $\hat{\beta}(z)$, given a finite integer $m > 0$:
\[
\mathrm{bias}\{\hat{\beta}(z)\} = \Omega(z)^{-1}\Gamma(z)/2 + O\big(n^{-1/2}|H|\ln(\ln n)\big) + o(\|H\|^2),
\]
\[
\mathrm{var}\{\hat{\beta}(z)\} = n^{-1}|H|^{-1}\sigma_v^2\,\Omega(z)^{-1}\Psi(z)\Omega(z)^{-1} + o(n^{-1}|H|^{-1}),
\]
where $\Omega(z) = |H|^{-1}\sum_{t=1}^{m} E\{(1-\varpi_{it})\kappa_{it}X_{it}X_{it}^{\top}\}$, $\Gamma(z) = |H|^{-1}\sum_{t=1}^{m} E\{(1-\varpi_{it})\kappa_{it}X_{it}X_{it}^{\top} r_H(\widetilde{Z}_{it}, z)\} = O(\|H\|^2)$, and $\Psi(z) = |H|^{-1}\sum_{t=1}^{m} E\{(1-\varpi_{it})^2\kappa_{it}^2 X_{it}X_{it}^{\top}\}$.

The first term of $\mathrm{bias}\{\hat{\beta}(z)\}$ results from the local approximation of $\beta(z)$ by a linear function of $z$, which is of the usual order $O(\|H\|^2)$. The second term of $\mathrm{bias}\{\hat{\beta}(z)\}$ results from the unknown fixed effects $\alpha_i$: (a) if we assumed $\sum_{i=1}^{n}\alpha_i = 0$, this term would be exactly zero; (b) the result indicates that the bias is dominated by the first term and vanishes as $n \to \infty$. In the Appendix, we show that
\[
|H|^{-1}\sum_{t=1}^{m} E\{\kappa_{it}X_{it}X_{it}^{\top}\} = \bar{\Omega}(z) + o(\|H\|^2),
\]
\[
|H|^{-1}\sum_{t=1}^{m} E\{\kappa_{it}X_{it}X_{it}^{\top} r_H(\widetilde{Z}_{it}, z)\} = \sigma_k^2\,\bar{\Omega}(z)\Lambda_H(z) + o(\|H\|^2),
\]
\[
|H|^{-1}\sum_{t=1}^{m} E\{\kappa_{it}^2 X_{it}X_{it}^{\top}\} = \Big(\int K^2(u)\,du\Big)\bar{\Omega}(z) + o(\|H\|^2),
\]
where $\bar{\Omega}(z) = \sum_{t=1}^{m} f_t(z)\,E(X_{1t}X_{1t}^{\top}\,|\,z)$, $\sigma_k^2 = \int k(v)v^2\,dv$, and $\Lambda_H(z) = \{\mathrm{tr}(H\,\frac{\partial^2\beta_1(z)}{\partial z\partial z^{\top}}\,H),\dots,\mathrm{tr}(H\,\frac{\partial^2\beta_p(z)}{\partial z\partial z^{\top}}\,H)\}^{\top}$. Since $\varpi_{it} \in [0,1)$ for all $i$ and $t$, the results above imply the existence of $\Omega(z)$, $\Gamma(z)$, and $\Psi(z)$. However, given a finite integer $m > 0$, we cannot obtain the asymptotic bias and variance explicitly, because of the random denominator appearing in $\varpi_{it}$. The following theorem gives the asymptotic normality result for $\hat{\beta}(z)$.

THEOREM 3.2 Under Assumptions 1-3, and assuming in addition that $E|v_{it}|^{2+\delta} < \infty$ for some $\delta > 0$ and that $\sqrt{n|H|}\,\|H\|^2 = O(1)$ as $n \to \infty$, we have
\[
\sqrt{n|H|}\,\{\hat{\beta}(z) - \beta(z) - \Omega(z)^{-1}\Gamma(z)/2\} \xrightarrow{d} N\{0, \Sigma(z)\},
\]
where $\Sigma(z) = \sigma_v^2 \lim_{n\to\infty} \Omega(z)^{-1}\Psi(z)\Omega(z)^{-1}$. Moreover, a consistent estimator of $\Sigma(z)$ is given by
\[
\hat{\Sigma}(z) = S_p\,\hat{\Omega}(z,H)^{-1}\hat{J}(z,H)\hat{\Omega}(z,H)^{-1} S_p^{\top} \xrightarrow{p} \Sigma(z),
\]
\[
\hat{\Omega}(z,H) = n^{-1}|H|^{-1} R(z,H)^{\top} S_H(z) R(z,H),
\]
\[
\hat{J}(z,H) = n^{-1}|H|^{-1} R(z,H)^{\top} S_H(z)\hat{V}\hat{V}^{\top} S_H(z) R(z,H),
\]
where $\hat{V}$ is the vector of estimated residuals and $S_p$ consists of the first $p$ rows of the $p(q+1)$-dimensional identity matrix. Finally, a consistent estimator of the leading bias can easily be obtained from a nonparametric local quadratic regression.

4 TESTING RANDOM EFFECTS VERSUS FIXED EFFECTS

In this section we discuss how to test for the presence of random effects versus fixed effects in a semiparametric varying coefficient panel data model. The model remains as in (1). The random-effects specification assumes that $\alpha_i$ is uncorrelated with the regressors $X_{it}$ and $Z_{it}$, while in the fixed-effects case $\alpha_i$ is allowed to be correlated with $X_{it}$ and/or $Z_{it}$ in an unknown way. We are interested in testing the null hypothesis ($H_0$) that $\alpha_i$ is a random effect against the alternative hypothesis ($H_1$) that $\alpha_i$ is a fixed effect. The null and alternative hypotheses can be written as
\[
H_0: \Pr\{E(\alpha_i\,|\,Z_{i1},\dots,Z_{im}, X_{i1},\dots,X_{im}) = 0\} = 1 \ \text{for all } i, \tag{14}
\]
\[
H_1: \Pr\{E(\alpha_i\,|\,Z_{i1},\dots,Z_{im}, X_{i1},\dots,X_{im}) \neq 0\} > 0 \ \text{for some } i, \tag{15}
\]
while we keep the setup of model (1) under both $H_0$ and $H_1$. Our test statistic is based on the squared difference between the FE and RE estimators, which is asymptotically zero under $H_0$ and positive under $H_1$. To simplify the proofs and save computing time, we use a local constant estimator rather than a local linear estimator in constructing the test.

Following the argument in Section 2 and Appendix 7.2, we have
\[
\hat{\beta}_{FE}(z) = \{X^{\top} S_H(z) X\}^{-1} X^{\top} S_H(z)\, Y, \qquad
\hat{\beta}_{RE}(z) = \{X^{\top} W_H(z) X\}^{-1} X^{\top} W_H(z)\, Y,
\]
where $X = (X_1^{\top},\dots,X_n^{\top})^{\top}$ is an $(nm) \times p$ matrix and, for each $i$, $X_i = (X_{i1},\dots,X_{im})^{\top}$ is an $m \times p$ matrix with $X_{it} = (X_{it,1},\dots,X_{it,p})^{\top}$. Motivated by Li, et al. (2002), we remove the random denominator of $\hat{\beta}_{FE}(z)$ by multiplying through by $X^{\top} S_H(z) X$, and base our test statistic on
\[
T_n = \int \{\hat{\beta}_{FE}(z) - \hat{\beta}_{RE}(z)\}^{\top}\{X^{\top} S_H(z) X\}^{\top}\{X^{\top} S_H(z) X\}\{\hat{\beta}_{FE}(z) - \hat{\beta}_{RE}(z)\}\,dz
= \int \widetilde{U}(z)^{\top} S_H(z) X X^{\top} S_H(z)\widetilde{U}(z)\,dz,
\]
since $\{X^{\top} S_H(z) X\}\{\hat{\beta}_{FE}(z) - \hat{\beta}_{RE}(z)\} = X^{\top} S_H(z)\{Y - X\hat{\beta}_{RE}(z)\} \equiv X^{\top} S_H(z)\widetilde{U}(z)$.

To simplify the statistic, we make several changes to $T_n$. First, we simplify the integration by replacing $\widetilde{U}(z)$ with $\hat{U}$, where $\hat{U} = \hat{U}(Z) = Y - B\{X, \hat{\beta}_{RE}(Z)\}$ and $B\{X, \hat{\beta}_{RE}(Z)\}$ stacks up $X_{it}^{\top}\hat{\beta}_{RE}(Z_{it})$ in increasing order of $i$ first and then of $t$. Second, to overcome the complexity caused by the random denominator in $M_H(z)$, we replace $M_H(z)$ by $M_D = I_{nm} - m^{-1} I_n \otimes (e_m e_m^{\top})$, which removes the fixed effects because $M_D D_0 = 0$. With these modifications, and also removing the $i = j$ terms in $T_n$ (since $T_n$ contains a double summation over $i$ and $j$), our further modified test statistic is
\[
\widetilde{T}_n \stackrel{\mathrm{def}}{=} \sum_{i=1}^{n}\sum_{j \neq i}\hat{U}_i^{\top} Q_m \Big\{\int K_H(Z_i, z)\, X_i X_j^{\top}\, K_H(Z_j, z)\,dz\Big\} Q_m \hat{U}_j,
\]
where $Q_m = I_m - m^{-1} e_m e_m^{\top}$. If $|H| \to 0$ as $n \to \infty$, we obtain
\[
|H|^{-1}\int K_H(Z_i, z)\, X_i X_j^{\top}\, K_H(Z_j, z)\,dz =
\begin{bmatrix}
\bar{K}_H(Z_{i1}, Z_{j1})X_{i1}^{\top}X_{j1} & \cdots & \bar{K}_H(Z_{i1}, Z_{jm})X_{i1}^{\top}X_{jm} \\
\vdots & \ddots & \vdots \\
\bar{K}_H(Z_{im}, Z_{j1})X_{im}^{\top}X_{j1} & \cdots & \bar{K}_H(Z_{im}, Z_{jm})X_{im}^{\top}X_{jm}
\end{bmatrix}, \tag{16}
\]
where $\bar{K}_H(Z_{it}, Z_{js}) = \int K\{H^{-1}(Z_{it} - Z_{js}) + \omega\}K(\omega)\,d\omega$. We then replace $\bar{K}_H(Z_{it}, Z_{js})$ by $K_H(Z_{it}, Z_{js})$; this replacement does not affect the essence of the test statistic since the local weighting is untouched. Our proposed test statistic is then
\[
\hat{T}_n = \frac{1}{n^2|H|}\sum_{i=1}^{n}\sum_{j \neq i}\hat{U}_i^{\top} Q_m A_{i,j} Q_m \hat{U}_j, \tag{17}
\]
where $A_{i,j}$ equals the right-hand side of equation (16) after replacing $\bar{K}_H(Z_{it}, Z_{js})$ by $K_H(Z_{it}, Z_{js})$. Finally, to remove the asymptotic bias term of the proposed test statistic, we compute the leave-one-unit-out random-effects estimator of $\beta(Z_{it})$: for a given pair $(i,j)$ in the double summation of (17) with $i \neq j$, $\hat{\beta}_{RE}(Z_{it})$ is calculated without using the observations on the $j$th unit, $\{(X_{jt}, Z_{jt}, Y_{jt})\}_{t=1}^{m}$, and $\hat{\beta}_{RE}(Z_{jt})$ is calculated without using the observations on the $i$th unit. We present the asymptotic properties of this test below and defer the proofs to Appendix 7.3.

THEOREM 4.1 Under Assumptions 1-3, if $f_t(z)$ has compact support $S$ for all $t$ and $n\sqrt{|H|}\,\|H\|^4 \to 0$ as $n \to \infty$, then under $H_0$,
\[
J_n = n\sqrt{|H|}\,\hat{T}_n/\hat{\sigma}_0 \xrightarrow{d} N(0, 1), \tag{18}
\]
where
\[
\hat{\sigma}_0^2 = \frac{2}{n^2|H|}\sum_{i=1}^{n}\sum_{j \neq i}(\hat{V}_i^{\top} Q_m A_{i,j} Q_m \hat{V}_j)^2
\]
is a consistent estimator of
\[
\sigma_0^2 = 4(1 - 1/m)^2\,\sigma_v^4 \int \bar{K}^2(u)\,du\, \sum_{t=2}^{m}\sum_{s=1}^{t-1} E\big\{f_t(Z_{1s})(X_{1s}^{\top} X_{2t})^2\big\},
\]
where $\hat{V}_{it} = Y_{it} - X_{it}^{\top}\hat{\beta}_{FE}(Z_{it}) - \hat{\alpha}_i$; for each pair $(i,j)$ with $i \neq j$, $\hat{\beta}_{FE}(Z_{it})$ is a leave-two-unit-out FE estimator computed without using the observations from the $i$th and $j$th units, and $\hat{\alpha}_i = \bar{Y}_i - m^{-1}\sum_{t=1}^{m} X_{it}^{\top}\hat{\beta}_{FE}(Z_{it})$. Under $H_1$, $\Pr[J_n > B_n] \to 1$ as $n \to \infty$, where $B_n$ is any nonstochastic sequence with $B_n = o(n\sqrt{|H|})$.

Assuming that $f_t(z)$ has compact support $S$ for all $t$ simplifies the proof of $\sup_{z \in S}\|\hat{\beta}_{RE}(z) - \beta(z)\| = o_p(1)$ as $n \to \infty$; otherwise, some trimming procedure would be needed to establish the uniform convergence result and the consistency of $\hat{\sigma}_0^2$ as an estimator of $\sigma_0^2$. Theorem 4.1 states that the test statistic $J_n = n\sqrt{|H|}\hat{T}_n/\hat{\sigma}_0$ is a consistent test of $H_0$ against $H_1$. It is a one-sided test: if $J_n$ exceeds the critical value from the standard normal distribution, we reject the null hypothesis at the corresponding significance level.

5 MONTE CARLO SIMULATIONS

In this section we report Monte Carlo simulation results examining the finite-sample performance of the proposed estimator. The following data generating process is used:
\[
Y_{it} = \beta_1(Z_{it}) + \beta_2(Z_{it})X_{it} + \alpha_i + v_{it}, \tag{19}
\]
where $\beta_1(z) = 1 + z + z^2$, $\beta_2(z) = \sin(z\pi)$, $Z_{it} = w_{it} + w_{i,t-1}$ with $w_{it}$ i.i.d. uniform on $[0, \pi/2]$, and $X_{it} = 0.5X_{i,t-1} + \xi_{it}$ with $\xi_{it}$ i.i.d. $N(0,1)$. In addition, $\alpha_i = c_0\bar{Z}_i + u_i$ for $i = 2,\dots,n$, with $c_0 = 0$, $0.5$, and $1.0$, and $u_i$ i.i.d. $N(0,1)$. When $c_0 \neq 0$, $\alpha_i$ and $Z_{it}$ are correlated; we use $c_0$ to control the correlation between $\alpha_i$ and $\bar{Z}_i = m^{-1}\sum_{t=1}^{m} Z_{it}$. Moreover, $v_{it}$ is i.i.d. $N(0,1)$, and $w_{it}$, $\xi_{it}$, $u_i$, and $v_{it}$ are independent of each other.

We report estimation results for both the proposed fixed-effects (FE) estimator and the random-effects (RE) estimator; see Appendix 7.2 for the asymptotic results of the RE estimator. To learn how the two estimators perform under a fixed-effects model and under a random-effects model, we use the integrated squared error as a standard measure of estimation accuracy:
\[
ISE(\hat{\beta}_l) = \int \{\hat{\beta}_l(z) - \beta_l(z)\}^2 f(z)\,dz, \tag{20}
\]
which can be approximated by the average mean squared error
\[
AMSE(\hat{\beta}_l) = (nm)^{-1}\sum_{i=1}^{n}\sum_{t=1}^{m}\{\hat{\beta}_l(Z_{it}) - \beta_l(Z_{it})\}^2
\]
for $l = 1, 2$. In Table 1 we present the average value of $AMSE(\hat{\beta}_l)$ over 1000 Monte Carlo experiments. We choose $m = 3$ and $n = 50$, $100$, and $200$.
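The AMSE approximation to (20) is a plain sample average over the observed $Z_{it}$; a tiny sketch with a deliberately biased hypothetical estimate:

```python
import numpy as np

def amse(beta_hat, beta_true, Z):
    """AMSE for one coefficient curve:
    (nm)^{-1} sum_{i,t} {beta_hat(Z_it) - beta_true(Z_it)}^2."""
    Z = np.asarray(Z)
    return np.mean((beta_hat(Z) - beta_true(Z)) ** 2)

beta1 = lambda z: 1 + z + z**2
beta1_hat = lambda z: 1.1 + z + z**2   # hypothetical estimate with constant bias 0.1
Z = np.random.default_rng(4).uniform(0, np.pi, size=(50, 3))
print(np.isclose(amse(beta1_hat, beta1, Z), 0.01))  # True: squared bias 0.1^2
```

Averaging this quantity across Monte Carlo replications gives the numbers reported in Table 1.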

Since the bias and variance of the proposed FE estimator do not depend on the values of the xed e ects, our estimates are the same for di erent values of c0 ; however, it is not true under the random-e ects model. Therefore, the results derived from the FE estimator are only reported once in Table 1 since it is invariant to di erent values of c0 . It is well-known that the performance of non/semiparametric models depends on the choice of bandwidth. Therefore, we propose a leave-one-unit-out cross validation method to automatically select the optimal bandwidth for estimating both the FE and RE models. Speci cally, when estimating ( ) at a point Zit ; we remove f(Xit ; Yit ; Zit )gm t=1 from the data and only use the rest of (n

1)m observations to calculate b(

i) (Zit ).

In computing the RE estimate, the leave-one-

unit-out cross validation method is just a trivial extension of the conventional leave-one-out cross validation method. The conventional leave-one-out method fails to provide satisfying result due to the existence of unknown xed e ects. Therefore, when calculating the FE estimator, we use the

13

following modi ed leave-one-unit-out cross validation method: b opt = arg min[Y H H

where MD = In

m

m

BfX; b(

1I n

1) (Z)g

then of t. Simple calculations give BfX; b(

stacks up Xit> b(

> > 1) (Z)g] MD MD [Y

= [BfX; (Z)g

BfX; b(

+2[BfX; (Z)g

1) (Z)g];

(21)

(em e> m ) satis es MD D0 = 0; this is used to remove the unknown

xed e ects. In addition, BfX; b( [Y

BfX; b(

> > 1) (Z)g] MD MD [Y

BfX; b(

in the increasing order of i rst

1) (Z)g]

> > 1) (Z)g] MD MD [BfX;

BfX; b(

i) (Zit )

> > 1) (Z)g] MD MD V

(Z)g

BfX; b(

1) (Z)g]

+ V > MD MD V;

(22)

where the last term does not depend on the bandwidth. If $v_{it}$ is independent of $\{X_{js}, Z_{js}\}$ for all $i$, $j$, $s$ and $t$, or $(X_{it}, Z_{it})$ is strictly exogenous, then the second term has zero expectation because the linear transformation matrix $M_D$ removes a cross-time, not cross-sectional, average from each variable, e.g., $\tilde Y_{it} = Y_{it} - m^{-1}\sum_{s=1}^m Y_{is}$ for all $i$ and $t$. Therefore, the first term is the dominant term in large samples, and (21) is used to find an optimal smoothing matrix minimizing a weighted mean squared error of $\{\hat\beta(Z_{it})\}$. Of course, we could use other weight matrices in (21) instead of $M_D$, as long as they remove the fixed effects and do not induce a non-zero expectation of the second term in (22).

Table 1 shows that the RE estimator performs better than the FE estimator when the true model is a random effects model. However, the FE estimator performs much better than the RE estimator when the true model is a fixed effects model. This is expected, since the RE estimator is inconsistent when the true model is the fixed effects model. Therefore, our simulation results indicate that a test of random effects against fixed effects will always be in demand when we analyze panel data models.

In Table 2 we report simulation results for the proposed nonparametric test of random effects against fixed effects. For the selection of the bandwidth $h$ in the univariate case, Theorem 4.1 indicates that $h \to 0$, $nh \to \infty$, and $nh^{9/2} \to 0$ as $n \to \infty$; if we take $h \propto n^{-\alpha}$, Theorem 4.1 requires $\alpha \in (2/9, 1)$. To fulfill both conditions $nh \to \infty$ and $nh^{9/2} \to 0$ as $n \to \infty$, we use $\alpha = 2/7$. Therefore, in producing Table 2, we use $h = c\,(nm)^{-2/7}\,\hat\sigma_z$ to calculate the RE estimator, with $c$ taking a value from $0.8$, $1.0$, and $1.2$. Since the computation is very time consuming, we only report results for $n = 50$ and $100$. With $m = 3$, the effective sample sizes are 150 and 300, which are small to moderate. Although the bandwidth chosen this way may not be optimal, the results in Tables 2, 3, and 4 show that the proposed test statistic is not very sensitive to the choice of $h$ as $c$ changes, and that the moderate size distortion and decent power are consistent with the findings in the nonparametric testing literature. We conjecture that bootstrap procedures could be used to reduce the size distortion in finite samples; we leave this as a future research topic.
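The two computational ingredients above, the within-unit transformation $M_D$ that annihilates the fixed effects and the rule-of-thumb bandwidth $h = c\,(nm)^{-2/7}\hat\sigma_z$, can be sketched as follows. This is a minimal illustration with simulated data; the panel dimensions and the random design are assumptions for the demo, not the paper's simulation setup:

```python
import numpy as np

n, m = 50, 3                                # units and time periods (assumed)
rng = np.random.default_rng(0)

# Within transformation M_D = I_n kron (I_m - e_m e_m'/m); M_D D_0 = 0,
# so additive unit-specific fixed effects are removed exactly.
Q_m = np.eye(m) - np.ones((m, m)) / m
M_D = np.kron(np.eye(n), Q_m)

alpha = rng.normal(size=n)                  # fixed effects
D0 = np.kron(np.eye(n), np.ones((m, 1)))    # unit dummy matrix D_0 (nm x n)
print(np.max(np.abs(M_D @ D0 @ alpha)))     # zero up to rounding error

# Rule-of-thumb bandwidths used for Tables 2-4
z = rng.normal(size=n * m)
for c in (0.8, 1.0, 1.2):
    h = c * (n * m) ** (-2 / 7) * z.std(ddof=1)
    print(c, round(h, 4))
```

The demeaning matrix $Q_m$ is the same within-unit projection used by the test statistic in Section 7.3.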

6 CONCLUSION

In this paper we proposed a local linear least squares method to estimate a fixed effects varying coefficient panel data model when the number of observations across time is finite; a data-driven method was introduced to automatically select the optimal bandwidth for the proposed FE estimator. In addition, we introduced a new test statistic for testing a random effects model against a fixed effects model. Monte Carlo simulations indicate that the proposed estimator and test statistic have good finite sample performance.

7 APPENDIX

7.1 Proof of Theorem 3.1

To make our mathematical formulas short, we first introduce some simplified notation: for each $i$ and $t$, let $\omega_{it} = K_H(Z_{it}, z)$ and $c_H(Z_i, z)^{-1} = \sum_{t=1}^m \omega_{it}$, and for any positive integers $i, j, t, s$,

$$[\,\cdot\,]_{it,js} = G_{it}(z;H)\,G_{js}^T(z;H) = \begin{bmatrix} 1 & G_{js1} & \cdots & G_{jsq} \\ G_{it1} & G_{it1}G_{js1} & \cdots & G_{it1}G_{jsq} \\ \vdots & \vdots & \ddots & \vdots \\ G_{itq} & G_{itq}G_{js1} & \cdots & G_{itq}G_{jsq} \end{bmatrix} = \begin{bmatrix} 1 \\ H^{-1}(Z_{it}-z) \end{bmatrix}\begin{bmatrix} 1 \\ H^{-1}(Z_{js}-z) \end{bmatrix}^T,$$

where the $(l+1)$th element of $G_{js}(z;H)$ is $G_{jsl} = (Z_{jsl} - z_l)/h_l$, $l = 1, \ldots, q$. Simple calculations show that

$$[\,\cdot\,]_{i_1t_1,i_2t_2}\,[\,\cdot\,]_{j_1s_1,j_2s_2} = \Bigl(1 + \sum_{j=1}^{q} G_{j_1s_1j}\,G_{i_2t_2j}\Bigr)[\,\cdot\,]_{i_1t_1,j_2s_2} \qquad (A.1)$$

and

$$R_i(z;H)^T K_H(Z_i,z)\,e_m e_m^T\,K_H(Z_j,z)\,R_j(z;H) = \sum_{t=1}^m \sum_{s=1}^m \omega_{it}\,\omega_{js}\,[\,\cdot\,]_{it,js} \otimes X_{it}X_{js}^T.$$

In addition, we obtain, for a finite positive integer $j$,

$$|H|^{-1}\sum_{t=1}^m E\bigl[\omega_{it}^{\,j}\,[\,\cdot\,]_{it,it}\,\big|\,X_{it}\bigr] = \sum_{t=1}^m E(S_{t,j,1}|X_{it}) + O_p(\|H\|^2), \qquad (A.2)$$

$$|H|^{-1}\sum_{t=1}^m E\Bigl[\omega_{it}^{\,2j}\Bigl(\sum_{j'=1}^{q} G_{itj'}^2\Bigr)[\,\cdot\,]_{it,it}\,\Big|\,X_{it}\Bigr] = \sum_{t=1}^m E(S_{t,j,2}|X_{it}) + O_p(\|H\|^2), \qquad (A.3)$$

where

$$S_{t,j,1} = \begin{bmatrix} f_t(z|X_{it})\int K^j(u)\,du & \frac{\partial f_t(z|X_{it})}{\partial z}^T H R_{K,j} \\ R_{K,j}\,H\,\frac{\partial f_t(z|X_{it})}{\partial z} & f_t(z|X_{it})\,R_{K,j} \end{bmatrix}, \qquad (A.4)$$

$$S_{t,j,2} = \begin{bmatrix} f_t(z|X_{it})\int K^{2j}(u)\,u^Tu\,du & \frac{\partial f_t(z|X_{it})}{\partial z}^T H \bar R_{K,2j} \\ \bar R_{K,2j}\,H\,\frac{\partial f_t(z|X_{it})}{\partial z} & f_t(z|X_{it})\,\bar R_{K,2j} \end{bmatrix}, \qquad (A.5)$$

where $R_{K,j} = \int K^j(u)\,uu^T\,du$ and $\bar R_{K,2j} = \int K^{2j}(u)\,(u^Tu)\,uu^T\,du$.

Moreover, for any finite positive integers $j_1$ and $j_2$, we have

$$|H|^{-2}\sum_{t=1}^m\sum_{s\ne t} E\bigl[\omega_{it}^{\,j_1}\omega_{is}^{\,j_2}\,[\,\cdot\,]_{it,is}\,\big|\,X_{it},X_{is}\bigr] = \sum_{t=1}^m\sum_{s\ne t} E\bigl(T^{(t,s)}_{j_1,j_2,1}\big|X_{it},X_{is}\bigr) + O_p(\|H\|^2), \qquad (A.6)$$

$$|H|^{-2}\sum_{t=1}^m\sum_{s\ne t} E\Bigl[\omega_{it}^{\,j_1}\omega_{is}^{\,j_2}\Bigl(\sum_{j'=1}^{q} G_{itj'}G_{isj'}\Bigr)[\,\cdot\,]_{it,is}\,\Big|\,X_{it},X_{is}\Bigr] = \sum_{t=1}^m\sum_{s\ne t} E\bigl(T^{(t,s)}_{j_1,j_2,2}\big|X_{it},X_{is}\bigr) + O_p(\|H\|^2), \qquad (A.7)$$

where we define $b_{j_1,j_2,i_1,i_2} = \int K^{j_1}(u)\,u^{2i_1}\,du\,\int K^{j_2}(u)\,u^{2i_2}\,du$,

$$T^{(t,s)}_{j_1,j_2,1} = \begin{bmatrix} f_{t,s}(z,z|X_{it},X_{is})\,b_{j_1,j_2,0,0} & \nabla_s^T f_{t,s}(z,z|X_{it},X_{is})\,H\,b_{j_1,j_2,0,1} \\ H\,\nabla_t f_{t,s}(z,z|X_{it},X_{is})\,b_{j_1,j_2,1,0} & H\,\nabla^2_{t,s} f_{t,s}(z,z|X_{it},X_{is})\,H\,b_{j_1,j_2,1,1} \end{bmatrix},$$

and

$$T^{(t,s)}_{j_1,j_2,2} = \begin{bmatrix} \mathrm{tr}\bigl(H\,\nabla^2_{t,s} f_{t,s}(z,z|X_{it},X_{is})\,H\bigr) & \nabla_t^T f_{t,s}(z,z|X_{it},X_{is})\,H \\ H\,\nabla_s f_{t,s}(z,z|X_{it},X_{is}) & f_{t,s}(z,z|X_{it},X_{is})\,I_{q\times q} \end{bmatrix} b_{j_1,j_2,1,1},$$

with $\nabla_s f_{t,s}(z,z|X_{it},X_{is}) = \partial f_{t,s}(z,z|X_{it},X_{is})/\partial z_s$ and $\nabla^2_{t,s} f_{t,s}(z,z|X_{it},X_{is}) = \partial^2 f_{t,s}(z,z|X_{it},X_{is})/\partial z_t\,\partial z_s^T$. The conditional bias and variance of $\mathrm{vec}\bigl(\hat\beta(z)\bigr)$ are given as follows:

$$\mathrm{Bias}\bigl[\mathrm{vec}(\hat\beta(z))\,\big|\,\{X_{it},Z_{it}\}\bigr] = \bigl[R(z;H)^T S_H(z)\,R(z;H)\bigr]^{-1} R(z;H)^T S_H(z)\bigl[\Gamma(z;H)/2 + D_0\alpha_0\bigr],$$

$$\mathrm{Var}\bigl[\mathrm{vec}(\hat\beta(z))\,\big|\,\{X_{it},Z_{it}\}\bigr] = \sigma_v^2\,\bigl[R(z;H)^T S_H(z)\,R(z;H)\bigr]^{-1}\bigl[R(z;H)^T S_H(z)^2\,R(z;H)\bigr]\bigl[R(z;H)^T S_H(z)\,R(z;H)\bigr]^{-1}.$$

Lemma A.1 If Assumption A3 holds, we have

$$\Bigl[\sum_{i=1}^n c_H(Z_i,z)\Bigr]^{-1} = O_p\bigl(n^{-1}|H|\ln(\ln n)\bigr). \qquad (A.8)$$

Proof: Simple calculations give $E\bigl(\sum_{t=1}^m K_H(Z_{it},z)\bigr) = |H|\,f(z) + O\bigl(|H|\,\|H\|^2\bigr)$ and $E\,K_H(Z_{it},z) = |H|\,f_t(z) + O\bigl(|H|\,\|H\|^2\bigr)$, where $f(z) = \sum_{t=1}^m f_t(z)$. Next, we obtain for any small $\varepsilon > 0$

$$\Pr\Bigl\{\max_{1\le i\le n}\sum_{t=1}^m \omega_{it} > \varepsilon^{-1} f(z)\,|H|\ln(\ln n)\Bigr\} = 1 - \Bigl[1 - \Pr\Bigl\{\sum_{t=1}^m \omega_{it} > \varepsilon^{-1} f(z)\,|H|\ln(\ln n)\Bigr\}\Bigr]^n$$
$$\le 1 - \Bigl[1 - \frac{\varepsilon\,E\bigl(\sum_{t=1}^m\omega_{it}\bigr)}{f(z)\,|H|\ln(\ln n)}\Bigr]^n \le 1 - \Bigl[1 - \frac{\varepsilon\,(1 + M\|H\|^2)}{\ln(\ln n)}\Bigr]^n \to 0 \quad\text{as } n\to\infty,$$

where the first inequality uses the generalized Chebyshev inequality and the limit is derived using l'Hopital's rule. This completes the proof of this lemma.
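The expectation used at the start of this proof, $E\,K_H(Z_{it},z) = |H| f_t(z) + O(|H|\,\|H\|^2)$ for an unnormalized kernel weight, is easy to check numerically. The following hedged sketch uses a Gaussian design, the Epanechnikov kernel, and a scalar bandwidth, all of which are illustrative assumptions rather than the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
h, z0 = 0.2, 0.0                       # scalar bandwidth and evaluation point (assumed)

def K(u):
    # Epanechnikov kernel, compactly supported as in Assumption 3
    return 0.75 * (1 - u**2) * (np.abs(u) <= 1)

Z = rng.normal(size=500_000)           # f_t = standard normal density
omega = K((Z - z0) / h)                # unnormalized weight omega = K_H(Z, z0)

f0 = 1 / np.sqrt(2 * np.pi)            # f_t(z0)
print(omega.mean(), h * f0)            # Monte Carlo mean vs. |H| f_t(z0)
```

The two printed numbers agree up to the $O(h^3)$ smoothing bias and Monte Carlo noise, which is exactly the order claimed above.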

Lemma A.2 Under Assumptions 1-3, we have

$$n^{-1}|H|^{-1}\,R(z;H)^T S_H(z)\,R(z;H) = |H|^{-1}\sum_{t=1}^m E\bigl[(1-\varpi_{it})\,\omega_{it}\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T\bigr] + o_p(1),$$

where $\varpi_{it} = \omega_{it}/\sum_{t=1}^m \omega_{it} \in (0,1)$ for all $i$ and $t$.

Proof: First, simple calculation gives

$$A_n = R(z;H)^T S_H(z)\,R(z;H) = R(z;H)^T W_H(z)\,M_H(z)\,R(z;H)$$
$$= \sum_{i=1}^n R_i(z;H)^T K_H(Z_i,z)\,R_i(z;H) - \sum_{j=1}^n\sum_{i=1}^n q_{ij}\,R_i(z;H)^T K_H(Z_i,z)\,e_m e_m^T\,K_H(Z_j,z)\,R_j(z;H)$$
$$= \sum_{i=1}^n\sum_{t=1}^m \omega_{it}\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T - \sum_{i=1}^n q_{ii}\sum_{t=1}^m\sum_{s=1}^m \omega_{it}\omega_{is}\,[\,\cdot\,]_{it,is}\otimes X_{it}X_{is}^T - \sum_{j=1}^n\sum_{i\ne j} q_{ij}\sum_{t=1}^m\sum_{s=1}^m \omega_{it}\omega_{js}\,[\,\cdot\,]_{it,js}\otimes X_{it}X_{js}^T$$
$$= A_{n1} - A_{n2} - A_{n3},$$

where $M_H(z) = I_{nm} - \bigl[Q\otimes(e_m e_m^T)\bigr]W_H(z)$, and the typical elements of $Q$ are $q_{ii} = c_H(Z_i,z) - c_H(Z_i,z)^2/\sum_{i=1}^n c_H(Z_i,z)$ and $q_{ij} = -c_H(Z_i,z)\,c_H(Z_j,z)/\sum_{i=1}^n c_H(Z_i,z)$ for $i\ne j$. Here, $c_H(Z_i,z) = \bigl(\sum_{t=1}^m \omega_{it}\bigr)^{-1}$ for all $i$.

Applying (A.2), (A.3), (A.6), and (A.7) to $A_{n1}$, we have $n^{-1}|H|^{-1}A_{n1} = \sum_{t=1}^m E\bigl(S_{t,1,1}\otimes X_{it}X_{it}^T\bigr) + O_p(\|H\|^2) + O_p\bigl(n^{-1/2}|H|^{-1/2}\bigr)$ if $\|H\|\to 0$ and $n|H|\to\infty$ as $n\to\infty$.

Apparently, $\sum_{t=1}^m \varpi_{it} = 1$ for all $i$. In addition, since the kernel function $K(\cdot)$ is zero outside the unit circle by Assumption 3, the summations in $A_{n2}$ are taken over units such that $\|H^{-1}(Z_{it}-z)\| \le 1$. By Lemma A.1 and by the LLN given Assumption 1(a), we obtain

$$\Bigl[\sum_{i=1}^n c_H(Z_i,z)\Bigr]^{-1} n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m\sum_{s=1}^m \varpi_{it}\varpi_{is}\,[\,\cdot\,]_{it,is}\otimes X_{it}X_{is}^T = O_p\bigl(n^{-1}\ln(\ln n)\bigr)$$

and

$$\frac{1}{2n|H|}\sum_{i=1}^n\sum_{t=1}^m\sum_{s\ne t}\frac{\omega_{it}\,\omega_{is}}{\sum_{t=1}^m\omega_{it}}\,[\,\cdot\,]_{it,is}\otimes X_{it}X_{is}^T = O_p(|H|),$$

where we use $\omega_{it}\omega_{is} \le \bigl(\omega_{it}^2+\omega_{is}^2\bigr)/2$ for any $t\ne s$. Hence, we have $n^{-1}|H|^{-1}A_{n2} = n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m \varpi_{it}\,\omega_{it}\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T + O_p(|H|)$. Denote $d_{it} = \varpi_{it}\,\omega_{it}\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T$ and $\Delta_n = n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m (d_{it} - E\,d_{it})$. It is easy to show that $\Delta_n = O_p\bigl(n^{-1/2}|H|^{-1/2}\bigr)$. Since $E(\|d_{it}\|) \le E\bigl\|\omega_{it}\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T\bigr\| \le M|H|$ holds for all $i$ and $t$, $n^{-1}|H|^{-1}A_{n2} = |H|^{-1}\sum_{t=1}^m E\bigl[\varpi_{it}\,\omega_{it}\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T\bigr] + o_p(1)$ exists, but we cannot calculate the exact expectation due to the random denominator.

Consider $A_{n3}$. We have $n^{-1}|H|^{-1}\|A_{n3}\| = O_p\bigl(|H|^2\ln(\ln n)\bigr)$ by Lemma A.1, Assumption 1, and the fact that $n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m I\bigl(\|H^{-1}(Z_{it}-z)\|\le 1\bigr) = 2f(z) + O_p(\|H\|^2) + O_p\bigl(n^{-1/2}|H|^{-1/2}\bigr)$.

Hence, we obtain

$$n^{-1}|H|^{-1}A_n = n^{-1}|H|^{-1}A_{n1} - n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m \varpi_{it}\,\omega_{it}\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T + o_p(1)$$
$$= n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m (1-\varpi_{it})\,\omega_{it}\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T + o_p(1) = |H|^{-1}\sum_{t=1}^m E\bigl[(1-\varpi_{it})\,\omega_{it}\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T\bigr] + o_p(1).$$

This completes the proof of this lemma.

Lemma A.3 Under Assumptions 1-3, we have

$$n^{-1}|H|^{-1}\,R(z;H)^T S_H(z)\,\Gamma(z;H) = |H|^{-1}\sum_{t=1}^m E\bigl[(1-\varpi_{it})\,\omega_{it}\,(G_{it}\otimes X_{it})\,X_{it}^T\,r_H(\tilde Z_{it},z)\bigr] + o_p(1).$$

Proof: Simple calculations give

$$B_n = R(z;H)^T S_H(z)\,\Gamma(z;H)$$
$$= \sum_{i=1}^n\sum_{t=1}^m \omega_{it}\,(G_{it}\otimes X_{it})\,X_{it}^T\,r_H(\tilde Z_{it},z) - \sum_{i=1}^n q_{ii}\sum_{t=1}^m\sum_{s\ne t}\omega_{is}\,\omega_{it}\,(G_{it}\otimes X_{it})\,X_{is}^T\,r_H(\tilde Z_{is},z)$$
$$\quad - \sum_{j=1}^n\sum_{i\ne j} q_{ij}\sum_{s=1}^m\sum_{t=1}^m \omega_{js}\,\omega_{it}\,(G_{it}\otimes X_{it})\,X_{js}^T\,r_H(\tilde Z_{js},z) - \sum_{i=1}^n q_{ii}\sum_{t=1}^m \omega_{it}^2\,(G_{it}\otimes X_{it})\,X_{it}^T\,r_H(\tilde Z_{it},z)$$
$$= B_{n1} - B_{n2} - B_{n3} - B_{n4},$$

where $\Gamma(z;H)$ is defined in Section 3. Using the same method as in the proof of Lemma A.2, we show $n^{-1}|H|^{-1}B_n = n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m (1-\varpi_{it})\,\omega_{it}\,(G_{it}\otimes X_{it})\,X_{it}^T\,r_H(\tilde Z_{it},z) + o_p(1)$.

For $l = 1, \ldots, k$ we have

$$|H|^{-1}E\bigl[\omega_{it}\,r_{H,l}(Z_{it},z)\,\big|\,X_{it}\bigr] = 2^{-1}f_t(z|X_{it})\,\lambda_{H,l}(z) + O_p(\|H\|^4), \qquad |H|^{-1}E\bigl[\omega_{it}\,r_{H,l}(Z_{it},z)\,H^{-1}(Z_{it}-z)\,\big|\,X_{it}\bigr] = O_p(\|H\|^3),$$

where $\lambda_{H,l}(z) = \mathrm{tr}\bigl(H\,\partial^2\beta_l(z)/\partial z\partial z^T\,H\bigr)$ is the $l$th element of

$$\Lambda_H(z) = \Bigl\{\mathrm{tr}\Bigl(H\frac{\partial^2\beta_1(z)}{\partial z\partial z^T}H\Bigr), \ldots, \mathrm{tr}\Bigl(H\frac{\partial^2\beta_k(z)}{\partial z\partial z^T}H\Bigr)\Bigr\}^T,$$

and $E\bigl(n^{-1}|H|^{-1}B_{n1}\bigr) = 2^{-1}\bigl[\Omega(z)\otimes\Lambda_H(z)^T\bigr] + O(\|H\|^3)$. Similarly, we can show that $\mathrm{Var}\bigl(n^{-1}|H|^{-1}B_{n1}\bigr) = O\bigl(n^{-1}|H|^{-1}\|H\|^4\bigr)$ if $E\bigl\|X_{it}X_{it}^T\,X_{is}X_{is}^T\bigr\| \le M < \infty$ for all $t$ and $s$.

In addition, it is easy to show that

$$n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m \varpi_{it}\,\omega_{it}\,(G_{it}\otimes X_{it})\,X_{it}^T\,r_H(\tilde Z_{it},z) = n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m E\bigl[\varpi_{it}\,\omega_{it}\,(G_{it}\otimes X_{it})\,X_{it}^T\,r_H(\tilde Z_{it},z)\bigr] + O_p\bigl(n^{-1/2}|H|^{-1/2}\|H\|^2\bigr),$$

and $|H|^{-1}\sum_{t=1}^m E\bigl[\varpi_{it}\,\omega_{it}\,(G_{it}\otimes X_{it})\,X_{it}^T\,r_H(\tilde Z_{it},z)\bigr] = O_p(\|H\|^2)$.


These bounds hold because $K(\cdot)$ is bounded with compact support by Assumption 3, (c) $0 < \varpi_{it} \le 1$, and (d) $E\|X_{it}\|^{1+\delta} \le M < \infty$ for some $\delta > 0$ by Assumption 1. This completes the proof of this lemma.

Lemma A.4 Under Assumptions 1-3, and with $v_{it}$ i.i.d. $(0,\sigma_v^2)$, we have $n^{-1}|H|^{-1}C_n = O_p\bigl(n^{-1/2}|H|^{-1/2}\bigr)$, where $C_n = R(z;H)^T S_H(z)\,V$.

Lemma A.5 Under Assumptions 1-3, we have

$$n^{-1}|H|^{-1}\,R(z;H)^T S_H(z)^2\,R(z;H) = |H|^{-1}\sum_{t=1}^m E\bigl[(1-\varpi_{it})^2\,\omega_{it}^2\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T\bigr] + o_p(1).$$

Proof: Simple calculations give

$$D_n = R(z;H)^T S_H(z)^2\,R(z;H) = R(z;H)^T W_H(z)\,M_H(z)\,M_H(z)^T\,W_H(z)\,R(z;H)$$
$$= \sum_{i=1}^n R_i(z;H)^T K_H(Z_i,z)^2\,R_i(z;H) - 2\sum_{j=1}^n\sum_{i=1}^n q_{ji}\,R_j(z;H)^T K_H(Z_j,z)\,e_m e_m^T\,K_H(Z_i,z)^2\,R_i(z;H)$$
$$\quad + \sum_{j=1}^n\sum_{i=1}^n\sum_{i'=1}^n q_{ij}\,q_{ji'}\,R_i(z;H)^T K_H(Z_i,z)\,e_m e_m^T\,K_H(Z_j,z)^2\,e_m e_m^T\,K_H(Z_{i'},z)\,R_{i'}(z;H)$$
$$= D_{n1} - 2D_{n2} + D_{n3}.$$

Using the same method as in the proof of Lemma A.2, we show $n^{-1}|H|^{-1}D_n = n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m (1-\varpi_{it})^2\,\omega_{it}^2\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T + o_p(1)$. It is easy to show that $n^{-1}|H|^{-1}D_{n1} = n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m \omega_{it}^2\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T = \sum_{t=1}^m E\bigl(S_{t,2,1}\otimes X_{it}X_{it}^T\bigr) + O_p(\|H\|^2) + O_p\bigl(n^{-1/2}|H|^{-1/2}\bigr)$. Also, we obtain $n^{-1}|H|^{-1}\sum_{i=1}^n\sum_{t=1}^m (1-\varpi_{it})^2\,\omega_{it}^2\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T = \varkappa(z) + O_p\bigl(n^{-1/2}|H|^{-1/2}\bigr)$, where $\varkappa(z) = |H|^{-1}\sum_{t=1}^m E\bigl[(1-\varpi_{it})^2\,\omega_{it}^2\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T\bigr] \le |H|^{-1}\sum_{t=1}^m E\bigl\|\omega_{it}^2\,[\,\cdot\,]_{it,it}\otimes X_{it}X_{it}^T\bigr\| \le M < \infty$ for all $i$ and $t$.

The four lemmas above are enough to give the result of Theorem 3.1. Moreover, applying the Liapunov CLT gives the result of Theorem 3.2. Since the proof is a rather standard procedure, we omit the details for compactness of the paper.
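To make the objects handled by these lemmas concrete, the following minimal sketch fits a local linear varying-coefficient regression $y_{it} = X_{it}^T\beta(Z_{it}) + u_{it}$ at a single point $z$ by kernel-weighted least squares. The data-generating design, kernel, bandwidth, and the true coefficient functions below are illustrative assumptions, not the paper's simulation setup:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, h = 200, 3, 0.25                     # assumed dimensions and bandwidth
N = n * m
Z = rng.uniform(-1, 1, size=N)
X = np.column_stack([np.ones(N), rng.normal(size=N)])        # k = 2 regressors
beta = lambda z: np.column_stack([np.sin(np.pi * z), z**2])  # assumed true beta(z)
y = np.sum(X * beta(Z), axis=1) + 0.1 * rng.normal(size=N)

def local_linear_beta(z0):
    # Local linear expansion: regress y on [X, X*(Z - z0)/h] with kernel weights
    G = np.column_stack([X, X * ((Z - z0) / h)[:, None]])
    w = 0.75 * np.maximum(1 - ((Z - z0) / h) ** 2, 0)        # Epanechnikov
    WG = G * w[:, None]
    coef = np.linalg.solve(G.T @ WG, WG.T @ y)
    return coef[:2]                  # beta-hat(z0); remaining entries are slopes

print(local_linear_beta(0.3), beta(np.array([0.3]))[0])
```

The first block of the solved coefficient vector estimates $\beta(z_0)$; the second block carries the (rescaled) derivative terms, mirroring the role of $G_{it}(z;H)$ in the appendix notation.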

7.2 Technical Sketch: Random Effects Estimator

The RE estimator, $\hat\beta_{RE}(\cdot)$, is the solution to the following optimization problem:

$$\min_{\beta(z)}\;\bigl[Y - R(z;H)\,\mathrm{vec}(\beta(z))\bigr]^T W_H(z)\,\bigl[Y - R(z;H)\,\mathrm{vec}(\beta(z))\bigr];$$

that is, we have

$$\mathrm{vec}\bigl(\hat\beta_{RE}(z)\bigr) = \bigl[R(z;H)^T W_H(z)\,R(z;H)\bigr]^{-1} R(z;H)^T W_H(z)\,Y = \mathrm{vec}(\beta(z)) + \bigl[R(z;H)^T W_H(z)\,R(z;H)\bigr]^{-1}\bigl(\tilde A_n/2 + \tilde B_n + \tilde C_n\bigr),$$

where $\tilde A_n = R(z;H)^T W_H(z)\,\Gamma(z;H)$, $\tilde B_n = R(z;H)^T W_H(z)\,D_0\alpha_0$, and $\tilde C_n = R(z;H)^T W_H(z)\,V$. Its asymptotic properties are as follows.

Lemma A.6 Suppose that Assumptions 1-3 hold, that $E\bigl(X_{it}X_{it}^T|z\bigr)$ and $E(\alpha_i X_{it}|z)$ have continuous second-order derivatives at $z \in R^q$, that $\sqrt{n|H|}\,\|H\|^2 = O(1)$ as $n\to\infty$, and that $E|v_{it}|^{2+\delta} < \infty$ and $E|\alpha_i|^{2+\delta} \le M < \infty$ for all $i$ and $t$ and for some $\delta > 0$. Then under $H_0$ we have

$$\sqrt{n|H|}\,\bigl[\hat\beta_{RE}(z) - \beta(z) - \nu_2\,\Lambda_H(z)/2\bigr] \xrightarrow{d} N\bigl(0,\;\Sigma_{(z),RE}\bigr), \qquad (A.9)$$

where $\nu_2 = \int k(v)\,v^2\,dv$, $\Sigma_{(z),RE} = \bigl(\sigma_\alpha^2+\sigma_v^2\bigr)\,\Omega(z)^{-1}\int K^2(u)\,du$, and $\Omega(z) = \sum_{t=1}^m f_t(z)\,E\bigl(X_{1t}X_{1t}^T|z\bigr)$.

Under $H_1$, we have

$$\mathrm{Bias}\bigl(\hat\beta_{RE}(z)\bigr) = \Omega(z)^{-1}\sum_{t=1}^m f_t(z)\,E(\alpha_1 X_{1t}|z) + o(1), \qquad \mathrm{Var}\bigl(\hat\beta_{RE}(z)\bigr) = n^{-1}|H|^{-1}\,\sigma_v^2\,\Omega(z)^{-1}\int K^2(u)\,du, \qquad (A.10)$$

where $\Lambda_H(z)$ is given in the proof of Lemma A.3.

Proof of Lemma A.6: First, we have the following decomposition:

$$\sqrt{n|H|}\,\bigl[\hat\beta_{RE}(z) - \beta(z)\bigr] = \sqrt{n|H|}\,\bigl[\hat\beta_{RE}(z) - E\hat\beta_{RE}(z)\bigr] + \sqrt{n|H|}\,\bigl[E\hat\beta_{RE}(z) - \beta(z)\bigr],$$

where we can show that the first term converges to a normal distribution with mean zero by the Liapunov CLT (the details are dropped since it is a rather standard proof), and the second term contributes to the asymptotic bias. Since it will cause no notational confusion, we drop the subscript 'RE'. Below, we use $\mathrm{Bias}_i\{\hat\beta(z)\}$ and $\mathrm{Var}_i\{\hat\beta(z)\}$ to denote the respective bias and variance of $\hat\beta_{RE}(z)$ under $H_0$ if $i = 0$ and under $H_1$ if $i = 1$.

First, under $H_0$, the bias and variance of $\hat\beta(z)$ are as follows:

$$\mathrm{Bias}_0\bigl\{\hat\beta(z)\,\big|\,\{(X_{it},Z_{it})\}\bigr\} = S_p\bigl[R(z;H)^T W_H(z)\,R(z;H)\bigr]^{-1} R(z;H)^T W_H(z)\,\Gamma(z;H)/2$$

and

$$\mathrm{Var}_0\bigl\{\hat\beta(z)\,\big|\,\{(X_{it},Z_{it})\}\bigr\} = S_p\bigl[R(z;H)^T W_H(z)\,R(z;H)\bigr]^{-1}\bigl[R(z;H)^T W_H(z)\,\mathrm{Var}(UU^T)\,W_H(z)\,R(z;H)\bigr]\bigl[R(z;H)^T W_H(z)\,R(z;H)\bigr]^{-1} S_p^T.$$

It is simple to show that $\mathrm{Var}(UU^T) = \sigma_\alpha^2\,I_n\otimes\bigl(e_m e_m^T\bigr) + \sigma_v^2\,I_{nm}$.
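The composite error $u_{it} = \alpha_i + v_{it}$ indeed has this block covariance structure, which is easy to confirm numerically. The variance parameters and panel dimensions below are assumed for illustration:

```python
import numpy as np

n, m = 400, 3
sig_a, sig_v = 0.8, 0.5
rng = np.random.default_rng(3)

# u_it = alpha_i + v_it, one row per unit
alpha = rng.normal(0, sig_a, size=(n, 1))
U = alpha + rng.normal(0, sig_v, size=(n, m))     # n x m panel of errors

emp = np.einsum('it,is->ts', U, U) / n            # empirical within-unit m x m covariance
theory = sig_a**2 * np.ones((m, m)) + sig_v**2 * np.eye(m)
print(np.round(emp, 2))
print(theory)                                     # sigma_a^2 e_m e_m' + sigma_v^2 I_m
```

Each unit's $m\times m$ error covariance is an equicorrelated block, and stacking $n$ independent units produces exactly the $\sigma_\alpha^2\,I_n\otimes e_me_m^T + \sigma_v^2\,I_{nm}$ form used above.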

Next, under $H_1$, we notice that $\mathrm{Bias}_1\{\hat\beta(z)\,|\,\{(X_{it},Z_{it})\}\}$ is the sum of $\mathrm{Bias}_0\{\hat\beta(z)\,|\,\{(X_{it},Z_{it})\}\}$ plus an additional term $S_p\bigl[R(z;H)^T W_H(z)\,R(z;H)\bigr]^{-1} R(z;H)^T W_H(z)\,D_0\alpha_0$, and that

$$\mathrm{Var}_1\bigl\{\hat\beta(z)\,\big|\,\{(X_{it},Z_{it})\}\bigr\} = \sigma_v^2\,S_p\bigl[R(z;H)^T W_H(z)\,R(z;H)\bigr]^{-1}\bigl[R(z;H)^T W_H(z)^2\,R(z;H)\bigr]\bigl[R(z;H)^T W_H(z)\,R(z;H)\bigr]^{-1} S_p^T.$$

Noting that $R(z;H)^T W_H(z)\,R(z;H)$ is $A_{n1}$ in Lemma A.2 and that $R(z;H)^T W_H(z)\,\Gamma(z;H)$ is $B_{n1}$ in Lemma A.3, we have

$$\mathrm{Bias}_0\{\hat\beta(z)\} = \nu_2\,\Lambda_H(z)/2 + o(\|H\|^2). \qquad (A.11)$$

In addition, under Assumptions 1-3, and with $E|\alpha_i|^{2+\delta} \le M < \infty$ and $E\|X_{it}\|^{2+\delta} \le M < \infty$ for all $i$ and $t$ and for some $\delta > 0$, we show that

$$n^{-1}|H|^{-1}\,S_p\,R(z;H)^T W_H(z)\,D_0\alpha_0 = n^{-1}|H|^{-1}\,S_p\sum_{i=1}^n \alpha_i\sum_{t=1}^m \omega_{it}\,(G_{it}\otimes X_{it}) = \sum_{t=1}^m f_t(z)\,E(\alpha_1 X_{1t}|z) + O_p(\|H\|^2) + O_p\bigl((n|H|)^{-1/2}\bigr), \qquad (A.12)$$

which is a non-zero constant plus a term of $o_p(1)$ under $H_1$. Combining (A.11) and (A.12), we obtain (A.10). Hence, under $H_1$, the bias of the RE estimator will not vanish as $n\to\infty$, and this leads to the inconsistency of the RE estimator under $H_1$.

As for the asymptotic variance, we can easily show that under $H_0$

$$\mathrm{Var}_0\{\hat\beta(z)\} = n^{-1}|H|^{-1}\,\bigl(\sigma_\alpha^2+\sigma_v^2\bigr)\,\Omega(z)^{-1}\int K^2(u)\,du, \qquad (A.13)$$

and under $H_1$, $\mathrm{Var}_1\{\hat\beta(z)\} = n^{-1}|H|^{-1}\,\sigma_v^2\,\Omega(z)^{-1}\int K^2(u)\,du$, where we have recognized that $R(z;H)^T W_H(z)^2\,R(z;H)$ is $D_{n1}$ in Lemma A.5, and that $\bigl(\sigma_\alpha^2+\sigma_v^2\bigr)\,R(z;H)^T W_H(z)^2\,R(z;H)$ is the leading term of $R(z;H)^T W_H(z)\,\mathrm{Var}(UU^T)\,W_H(z)\,R(z;H)$.

7.3 Proof of Theorem 4.1

Define $\Delta_i = (\Delta_{i1}, \ldots, \Delta_{im})^T$ with $\Delta_{it} = X_{it}^T\bigl[\beta(Z_{it}) - \hat\beta_{RE}(Z_{it})\bigr]$. Since $M_D D_0 = 0$, we can decompose the proposed statistic into three terms:

$$\hat T_n = \frac{1}{n^2|H|}\sum_{i=1}^n\sum_{j\ne i}\hat U_i^T Q_m A_{i,j} Q_m \hat U_j = \frac{1}{n^2|H|}\sum_{i=1}^n\sum_{j\ne i}\Delta_i^T Q_m A_{i,j} Q_m \Delta_j + \frac{2}{n^2|H|}\sum_{i=1}^n\sum_{j\ne i}\Delta_i^T Q_m A_{i,j} Q_m V_j$$
$$\qquad + \frac{1}{n^2|H|}\sum_{i=1}^n\sum_{j\ne i}V_i^T Q_m A_{i,j} Q_m V_j = T_{n1} + 2T_{n2} + T_{n3},$$

where $V_i = (v_{i1}, \ldots, v_{im})^T$ is the $m\times 1$ error vector. Since $\hat\beta_{RE}(Z_{it})$ does not depend on the $j$th unit's observations and $\hat\beta_{RE}(Z_{jt})$ does not depend on the $i$th unit's observations for a pair $(i,j)$, it is easy to see that $E(T_{n2}) = 0$. The proofs follow the standard procedures seen in the literature on nonparametric tests; we therefore give a very brief proof below. First, applying Hall's (1984) CLT, we can show that under both $H_0$ and $H_1$,

$$n\sqrt{|H|}\,T_{n3} \xrightarrow{d} N\bigl(0,\,\sigma_0^2\bigr) \qquad (A.14)$$

by defining $H_n(\xi_i,\xi_j) = V_i^T Q_m A_{i,j} Q_m V_j$ with $\xi_i = (X_i, Z_i, V_i)$, which is a symmetric, centred, and degenerate variable. We are able to show that

$$\frac{E\bigl[G_n^2(\xi_1,\xi_2)\bigr] + n^{-1}E\bigl[H_n^4(\xi_1,\xi_2)\bigr]}{\bigl\{E\bigl[H_n^2(\xi_1,\xi_2)\bigr]\bigr\}^2} = \frac{O\bigl(|H|^3\bigr) + O\bigl(n^{-1}|H|\bigr)}{O\bigl(|H|^2\bigr)} \to 0$$

if $|H|\to 0$ and $n|H|\to\infty$ as $n\to\infty$, where $G_n(\xi_1,\xi_2) = E_{\xi_i}\bigl[H_n(\xi_1,\xi_i)\,H_n(\xi_2,\xi_i)\bigr]$. In addition,

$$\mathrm{var}\bigl(n\sqrt{|H|}\,T_{n3}\bigr) = 2|H|^{-1}E\bigl[H_n^2(\xi_1,\xi_2)\bigr] = 2\bigl(1 - m^{-1}\bigr)^2\sigma_v^4\,|H|^{-1}\sum_{t=1}^m\sum_{s=1}^m E\Bigl[K_H^2(Z_{1s},Z_{2t})\,\bigl(X_{1s}^T X_{2t}\bigr)^2\Bigr] = \sigma_0^2 + o(1).$$

Secondly, we can show that $n\sqrt{|H|}\,T_{n2} = O_p(\|H\|^2) + O_p\bigl(n^{-1/2}|H|^{-1/2}\bigr)$ under $H_0$ and $n\sqrt{|H|}\,T_{n2} = O_p(1)$ under $H_1$. Moreover, we have, under $H_0$, $n\sqrt{|H|}\,T_{n1} = O_p\bigl(n\sqrt{|H|}\,\|H\|^4\bigr)$; under $H_1$, $n\sqrt{|H|}\,T_{n1} = O_p\bigl(n\sqrt{|H|}\bigr)$.

Finally, to estimate $\sigma_0^2$ consistently under both $H_0$ and $H_1$, we replace the unknown $V_i$ and $V_j$ in $T_{n3}$ by the estimated residual vectors from the FE estimator. Simple calculations show that the typical element of $\hat V_i^T Q_m$ is

$$\tilde{\hat v}_{it} = y_{it} - X_{it}^T\hat\beta_{FE}(Z_{it}) - \bar y_i + m^{-1}\sum_{t=1}^m X_{it}^T\hat\beta_{FE}(Z_{it}) = \tilde\Delta_{it} + (v_{it} - \bar v_i),$$

where $\tilde\Delta_{it} = \sum_{l=1}^m q_{lt}\,X_{il}^T\bigl[\beta(Z_{il}) - \hat\beta_{FE}(Z_{il})\bigr]$ with $q_{tt} = 1 - 1/m$ and $q_{lt} = -1/m$ for $l\ne t$. The leave-two-unit-out FE estimator does not use the observations from the $i$th and $j$th units for a pair $(i,j)$, and this leads to

$$E\bigl[\hat V_i^T Q_m A_{i,j} Q_m \hat V_j\bigr]^2 \le \sum_{t=1}^m\sum_{s=1}^m E\Bigl\{K_H^2(Z_{it},Z_{js})\,\bigl(X_{it}^T X_{js}\bigr)^2\bigl[\tilde\Delta_{it}^2\tilde\Delta_{js}^2 + \tilde\Delta_{it}^2\tilde v_{js}^2 + \tilde\Delta_{js}^2\tilde v_{it}^2 + \tilde v_{it}^2\tilde v_{js}^2\bigr]\Bigr\},$$

where $\tilde v_{it} = v_{it} - \bar v_i$ and $\bar v_i = m^{-1}\sum_{t=1}^m v_{it}$.
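The quadratic-form structure of $\hat T_n$ can be sketched in code. Below, $A_{i,j}$ is taken to have $(t,s)$ element $K_H(Z_{it},Z_{js})\,X_{it}X_{js}$, consistent with the variance expression above; this choice, along with the simulated stand-in residuals, is an illustrative assumption rather than a transcription of the paper's implementation:

```python
import numpy as np

n, m, h = 40, 3, 0.3
rng = np.random.default_rng(4)
Z = rng.uniform(-1, 1, size=(n, m))
X = rng.normal(size=(n, m))                  # scalar regressor for simplicity
U = rng.normal(0, 0.5, size=(n, m))          # stand-in residual vectors U-hat_i

K = lambda u: 0.75 * np.maximum(1 - u**2, 0)     # Epanechnikov kernel
Q_m = np.eye(m) - np.ones((m, m)) / m            # removes within-unit means

Tn = 0.0
for i in range(n):
    for j in range(n):
        if j == i:
            continue
        # (t,s) element K_H(Z_it, Z_js) * X_it * X_js (assumed form of A_ij)
        A = K((Z[i][:, None] - Z[j][None, :]) / h) * np.outer(X[i], X[j])
        Tn += U[i] @ Q_m @ A @ Q_m @ U[j]
Tn /= n**2 * h
print(Tn)
```

Because the double sum excludes $i = j$ and $Q_m$ centres each unit's residual vector, the statistic is a degenerate U-statistic in the units, which is what makes Hall's (1984) CLT applicable.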

ACKNOWLEDGEMENTS Sun's research was supported by the Social Sciences and Humanities Research Council of Canada (SSHRC). Carroll's research was supported by a grant from the National Cancer Institute (CA-57030) and by the Texas A&M Center for Environmental and Rural Health via a grant from the National Institute of Environmental Health Sciences (P30-ES09106). Corresponding author: Yiguo Sun. Email address: [email protected].

REFERENCES

Arellano, M. (2003). Panel Data Econometrics. Oxford University Press.

Baltagi, B. (2005). Econometric Analysis of Panel Data (2nd edition). Wiley, New York.

Cai, Z. and Li, Q. (2008). Nonparametric estimation of varying coefficient dynamic panel data models. Econometric Theory, 24, 1321-1342.

Hall, P. (1984). Central limit theorem for integrated square error of multivariate nonparametric density estimators. Journal of Multivariate Analysis, 14, 1-16.

Henderson, D. J., Carroll, R. J., and Li, Q. (2008). Nonparametric estimation and testing of fixed effects panel data models. Journal of Econometrics, 144, 257-275.

Henderson, D. J. and Ullah, A. (2005). A nonparametric random effects estimator. Economics Letters, 88, 403-407.

Hsiao, C. (2003). Analysis of Panel Data (2nd edition). Cambridge University Press.

Li, Q., Huang, C. J., Li, D. and Fu, T. (2002). Semiparametric smooth coefficient models. Journal of Business & Economic Statistics, 20, 412-422.

Li, Q. and Stengos, T. (1996). Semiparametric estimation of partially linear panel data models. Journal of Econometrics, 71, 389-397.

Lin, D. Y. and Ying, Z. (2001). Semiparametric and nonparametric regression analysis of longitudinal data (with discussion). Journal of the American Statistical Association, 96, 103-126.

Lin, X. and Carroll, R. J. (2000). Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association, 95, 520-534.

Lin, X. and Carroll, R. J. (2001). Semiparametric regression for clustered data using generalized estimating equations. Journal of the American Statistical Association, 96, 1045-1056.

Lin, X. and Carroll, R. J. (2006). Semiparametric estimation in general repeated measures problems. Journal of the Royal Statistical Society, Series B, 68, 68-88.

Lin, X., Wang, N., Welsh, A. H. and Carroll, R. J. (2004). Equivalent kernels of smoothing splines in nonparametric regression for longitudinal/clustered data. Biometrika, 91, 177-194.

Poirier, D. J. (1995). Intermediate Statistics and Econometrics: A Comparative Approach. The MIT Press.

Ruckstuhl, A. F., Welsh, A. H. and Carroll, R. J. (2000). Nonparametric function estimation of the relationship between two repeatedly measured variables. Statistica Sinica, 10, 51-71.

Su, L. and Ullah, A. (2006). Profile likelihood estimation of partially linear panel data models with fixed effects. Economics Letters, 92, 75-81.

Wang, N. (2003). Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika, 90, 43-52.

Wu, H. and Zhang, J. Y. (2002). Local polynomial mixed-effects models for longitudinal data. Journal of the American Statistical Association, 97, 883-897.


Table 1: Average mean squared errors (AMSE) of the fixed and random effects estimators when the data generation process is a random effects model and when it is a fixed effects model.

                        Random Effects Estimator      Fixed Effects Estimator
Data Process            n = 50   n = 100  n = 200     n = 50   n = 100  n = 200
Estimating beta_1(.):
  c0 = 0                .0951    .0533    .0277
  c0 = 0.5              .6552    .5830    .5544       .1381    .1163    .1021
  c0 = 1.0              2.2010   2.1239   2.2310
Estimating beta_2(.):
  c0 = 0                .1562    .0753    .0409
  c0 = 0.5              .8629    .7511    .7200       .1984    .1379    .0967
  c0 = 1.0              2.8707   2.4302   2.5538

Table 2: Percentage Rejection Rate When c0 = 0

            n = 50                  n = 100
  c      1%     5%     10%       1%     5%     10%
  0.8    .007   .015   .024      .021   .035   .046
  1.0    .011   .023   .041      .025   .040   .062
  1.2    .019   .043   .075      .025   .054   .097

Table 3: Percentage Rejection Rate When c0 = 0.5

            n = 50                  n = 100
  c      1%     5%     10%       1%     5%     10%
  0.8    .626   .719   .764      .913   .929   .933
  1.0    .682   .780   .819      .935   .943   .951
  1.2    .719   .811   .854      .943   .962   .969

Table 4: Percentage Rejection Rate When c0 = 1.0

            n = 50                  n = 100
  c      1%     5%     10%       1%     5%     10%
  0.8    .873   .883   .888      .943   .944   .946
  1.0    .908   .913   .921      .962   .966   .967
  1.2    .931   .938   .944      .980   .981   .982