Bootstrap Consistency for General Semiparametric M-estimation

Submitted to the Annals of Statistics

BOOTSTRAP CONSISTENCY FOR GENERAL SEMIPARAMETRIC M-ESTIMATION

arXiv:0906.1310v1 [math.ST] 6 Jun 2009

June 6, 2009

By Guang Cheng∗ and Jianhua Z. Huang†

Purdue University and Texas A&M University

Consider M-estimation in a semiparametric model that is characterized by a Euclidean parameter of interest and a nuisance function parameter. We show that, under general conditions, the bootstrap is asymptotically consistent in estimating the distribution of the M-estimate of the Euclidean parameter; that is, the bootstrap distribution asymptotically imitates the distribution of the M-estimate. We also show that the bootstrap confidence set has the asymptotically correct coverage probability. These general conclusions hold, in particular, when the nuisance parameter is not estimable at root-n rate. Our results provide a theoretical justification for the use of the bootstrap as an inference tool in semiparametric modelling and apply to a broad class of bootstrap methods with exchangeable bootstrap weights. A by-product of our theoretical development is the second order asymptotic linear expansion of the (bootstrap) M-estimate.

1. Introduction. Due to its flexibility, semiparametric modelling has provided a powerful statistical modelling framework for complex data, and has proven to be useful in a variety of contexts; see [1, 5, 17, 25, 26, 30, 41, 42].

∗ Guang Cheng is Assistant Professor, Department of Statistics, Purdue University.
† Jianhua Z. Huang is Professor, Department of Statistics, Texas A&M University.
AMS 2000 subject classifications: Primary 62F40; Secondary 62G20



Keywords and phrases: Bootstrap Consistency, Bootstrap Confidence Set, Semiparametric Model, M-estimation, Second Order Accuracy

1 imsart-aos ver.

2006/01/04 file:

bconsistency-JH_v6.tex date:

June 6, 2009


GUANG CHENG AND JIANHUA Z. HUANG

Semiparametric models are indexed by a Euclidean parameter of interest $\theta \in \Theta \subset \mathbb{R}^d$ and a nuisance function parameter $\eta$ belonging to a Banach space $\mathcal{H}$ with norm $\|\cdot\|$. M-estimation, including maximum likelihood estimation as a special case, refers to a general method of estimation, where the estimates are obtained by optimizing certain criterion functions [8, 15, 23, 38]. The asymptotic theories and inference procedures for semiparametric maximum likelihood estimation, or more generally M-estimation, have been extensively studied in [2, 26, 27, 28, 34, 36]. In particular, a general theorem for investigating the asymptotic behavior of the M-estimate for $\theta$ in semiparametric models is given in [38, 39].

The bootstrap method is a widely used data-resampling method for drawing statistical inference, such as obtaining standard errors and constructing confidence regions. See [3, 14, 18, 22, 23, 24, 33] for its application in semiparametric models. By replacing complicated theoretical derivations with the routine simulation of bootstrap samples, the bootstrap method is conceptually and operationally simple. It is this simplicity that has made the bootstrap method popular. However, the validity of the bootstrap, i.e., why it yields valid inference, needs to be theoretically justified. While the asymptotic validity of the bootstrap method has been well established in parametric models [13, 32], a systematic theoretical study of bootstrap inference in semiparametric models is almost nonexistent, especially when the nuisance function $\eta$ is not $\sqrt{n}$-convergent. This paper tries to fill this gap.

An unfortunate observation is that even the basic bootstrap consistency theorem has never been established for semiparametric models. The lack


SEMIPARAMETRIC BOOTSTRAP CONSISTENCY


of theoretical justifications of the bootstrap in the semiparametric context leads to the development of other semiparametric inferential tools, e.g., the piggyback bootstrap [9] and the profile sampler [21]. Although they have established asymptotic theory, these alternatives to the bootstrap have their own shortcomings. The piggyback bootstrap is not the standard bootstrap and requires knowing the limiting distribution of the MLE for $\theta$, which may not be easy to estimate in general. The implementation of the profile sampler involves the computation of the profile likelihood and may have some MCMC computational issues, e.g. the choice of prior and the convergence of the Markov chain. Moreover, neither of these methods is as simple to implement as the classical bootstrap of Efron.

The purpose of this paper is to develop a general theory on bootstrap consistency for the classical bootstrap method and thus promote its use as an inference tool for semiparametric models. We focus on the Euclidean parameter of the model. Our main results are summarized below.

The semiparametric M-estimator $(\hat\theta, \hat\eta)$ and bootstrap M-estimator $(\hat\theta^*, \hat\eta^*)$ are obtained by optimizing the objective function $m(\theta, \eta)$ based on the i.i.d. observations $(X_1, \ldots, X_n)$ and bootstrap sample $(X_1^*, \ldots, X_n^*)$, respectively:

$$(\hat\theta, \hat\eta) = \arg\sup_{\theta\in\Theta,\,\eta\in\mathcal{H}} \sum_{i=1}^n m(\theta, \eta)(X_i), \tag{1}$$

$$(\hat\theta^*, \hat\eta^*) = \arg\sup_{\theta\in\Theta,\,\eta\in\mathcal{H}} \sum_{i=1}^n m(\theta, \eta)(X_i^*). \tag{2}$$

Note that we can also express

$$(\hat\theta^*, \hat\eta^*) = \arg\sup_{\theta\in\Theta,\,\eta\in\mathcal{H}} \sum_{i=1}^n W_{ni}\, m(\theta, \eta)(X_i) \tag{3}$$

by using the bootstrap weights $W_{ni}$'s. For example, if $(X_1^*, \ldots, X_n^*)$ are independent draws with replacement from the original sample, i.e. Efron's (nonparametric) bootstrap, then $W_n \equiv (W_{n1}, \ldots, W_{nn})$ follows the multinomial distribution with parameters $(n, (n^{-1}, \ldots, n^{-1}))$. The more general exchangeable bootstrap weighting scheme [29], which includes Efron's bootstrap as a special case, is considered in this paper.

We first present a preliminary result showing the asymptotic normality of the semiparametric M-estimate $\hat\theta$ in Theorem 1. As a key result in this paper,

our Theorem 2 confirms that this limiting distribution can be bootstrapped consistently. For example, in Efron's bootstrap, the bootstrap distribution of $\sqrt{n}(\hat\theta^* - \hat\theta)$, conditional on the observed data, asymptotically imitates the distribution of $\sqrt{n}(\hat\theta - \theta_0)$, where $\theta_0$ is the true value of $\theta$; see Corollary 1. As shown in Theorem 3, the consistency of the bootstrap confidence set of $\theta$, which means that its coverage probability converges to the nominal level, immediately follows from the above distributional consistency theorem. All the above conclusions are valid, in particular, when the nuisance parameter has a slower than root-n convergence rate. The rigorous proof of Theorem 2 is challenging since it requires careful probabilistic analysis (see Lemma A.1) and involves bootstrapped empirical process techniques. As a by-product of our theoretical development, we also obtain the second order asymptotic linear expansions for the M-estimator of the Euclidean parameter and for its bootstrap version, which imply that the second order accuracy of the (bootstrap) M-estimate depends on the convergence rate of the nuisance function parameter. Such results extend similar results of Cheng and Kosorok [6, 7] for the MLE and are of independent interest.

In a closely related paper, Ma and Kosorok (2005) obtained some consistency results for what they called the weighted bootstrap, where the bootstrap weights $\{W_{ni}\}_{i=1}^n$ are assumed to be independent. This independence assumption rules out the possibility of applying their results to justify Efron's bootstrap in the semiparametric setting. They stated in the paper that the independence assumption makes the proofs easier and that the relaxation to dependent weights appears to be quite difficult. In this paper, we make use of bootstrapped empirical process techniques [12] to overcome the technical difficulties. Other related work includes interesting results on bootstrap consistency in nonparametric models. Wellner and Zhan (1996) proved the bootstrap consistency for root-n convergent nonparametric estimates obtained by solving estimation equations. Sen et al. (2009) and Kosorok (2008) showed the inconsistency of bootstrapping the Grenander estimator.

Our main results, including the bootstrap consistency theorem, are presented in Section 2. Sections 3 and 4 discuss how to verify various technical conditions needed for the main results. Section 5 illustrates the applications of our main results using three examples. Section 6 contains proofs of our main results in Section 2. Some useful lemmas and additional proofs are postponed to the Appendix.

2. Main Results.

2.1. Preliminaries. We first introduce a paradigm for studying the semiparametric M-estimate $\hat\theta$, which parallels the efficient influence function

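paradigm used for MLEs (where $m(\theta, \eta) = \log \mathrm{lik}(\theta, \eta)$), developed in Section 6 of [38]. Let

$$m_1(\theta, \eta) = \frac{\partial}{\partial\theta} m(\theta, \eta) \quad\text{and}\quad m_2(\theta, \eta)[h] = \frac{\partial}{\partial t}\Big|_{t=0} m(\theta, \eta(t)),$$

where $h$ is a "direction" along which $\eta(t) \in \mathcal{H}$ approaches $\eta$ as $t \to 0$, running through some index set $H \subseteq L_2^0(P_{\theta,\eta})$. Similarly we also define

$$m_{11}(\theta, \eta) = \frac{\partial}{\partial\theta} m_1(\theta, \eta) \quad\text{and}\quad m_{12}(\theta, \eta)[h] = \frac{\partial}{\partial t}\Big|_{t=0} m_1(\theta, \eta(t)),$$

$$m_{21}(\theta, \eta)[h] = \frac{\partial}{\partial\theta} m_2(\theta, \eta)[h] \quad\text{and}\quad m_{22}(\theta, \eta)[h, g] = \frac{\partial}{\partial t}\Big|_{t=0} m_2(\theta, \eta_2(t))[h],$$

where $h, g \in H$ and $(\partial/\partial t)|_{t=0}\, \eta_2(t) = g$. Define

$$m_2(\theta, \eta)[\mathbf{H}] = (m_2(\theta, \eta)[h_1], \ldots, m_2(\theta, \eta)[h_d])',$$

$$m_{22}(\theta, \eta)[\mathbf{H}, h] = (m_{22}(\theta, \eta)[h_1, h], \ldots, m_{22}(\theta, \eta)[h_d, h])',$$

where $\mathbf{H} = (h_1, \ldots, h_d)$ and $h_j \in H$ for $j = 1, \ldots, d$. Assume there exists an $\mathbf{H}^\dagger(\theta, \eta) = (h_1^\dagger(\theta, \eta), \ldots, h_d^\dagger(\theta, \eta))'$, where each $h_j^\dagger(\theta, \eta) \in H$, such that for any $h \in H$

$$E_{\theta,\eta}\left\{ m_{12}(\theta, \eta)[h] - m_{22}(\theta, \eta)[\mathbf{H}^\dagger, h] \right\} = 0. \tag{4}$$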

Following the idea of the efficient score function, we define the function

$$\tilde m(\theta, \eta) = m_1(\theta, \eta) - m_2(\theta, \eta)[\mathbf{H}^\dagger(\theta, \eta)].$$

Based on the constructions of $m_1(\theta, \eta)$ and $m_2(\theta, \eta)[\mathbf{H}]$, $(\hat\theta, \hat\eta)$ satisfies

$$\mathbb{P}_n \tilde m(\hat\theta, \hat\eta) = 0, \tag{5}$$

where $\mathbb{P}_n f$ denotes $\sum_{i=1}^n f(X_i)/n$. We assume that the observed data are from the probability space $(\mathcal{X}, \mathcal{A}, P_X)$, and that

$$P_X \tilde m(\theta_0, \eta_0) = 0, \tag{6}$$

where $P_X f$ is the customary operator notation defined as $\int f \, dP_X$. The assumption (6) is common in semiparametric M-estimation [23, 38] and usually holds by the semiparametric model specifications, e.g. the semiparametric regression models with "panel count data" [38]. In particular, when $m(\theta, \eta) = \log \mathrm{lik}(\theta, \eta)$, (6) trivially holds and $\tilde m(\theta, \eta)$ becomes the well studied efficient score function for $\theta$ in semiparametric models; see [2].

Throughout the rest of the paper, we use the shortened notations $\mathbf{H}_0^\dagger = \mathbf{H}^\dagger(\theta_0, \eta_0)$, $\tilde m_0 = \tilde m(\theta_0, \eta_0)$ and $\hat m = \tilde m(\hat\theta, \hat\eta)$, and we use the superscript "o" to denote the outer probability. For a probability space $(\Omega, \mathcal{A}, P)$ and a map $T : \Omega \mapsto \bar{\mathbb{R}}$ that need not be measurable, the notations $E^o T$, $O^o_{P_X}(1)$ and $o^o_{P_X}(1)$ represent the outer expectation of $T$ w.r.t. $P$, boundedness in outer probability and convergence to zero in outer probability, respectively. More precise definitions can be found on page 6 of [34]. Let $V^{\otimes 2}$ represent the outer product matrix $VV'$ for any vector $V$. Define $x \vee y$ ($x \wedge y$) to be the maximum (minimum) value of $x$ and $y$.

We now state some general conditions that will be used throughout the paper. We assume that the true value $\theta_0$ of the Euclidean parameter is an interior point of the compact set $\Theta$.

I. Positive Information Condition: The matrices $A = P_X(m_{11}(\theta_0, \eta_0) - m_{21}(\theta_0, \eta_0)[\mathbf{H}_0^\dagger])$ and $B = P_X[\{m_1(\theta_0, \eta_0) - m_2(\theta_0, \eta_0)[\mathbf{H}_0^\dagger]\}^{\otimes 2}]$ are both nonsingular.

Condition I above is used to ensure the nonsingularity of the asymptotic variance of $\hat\theta$ in (14), which will be shown to be $A^{-1}B(A^{-1})'$.

For the empirical process $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P_X)$, denote its norm with respect to a function class $\mathcal{F}_n$ as $\|\mathbb{G}_n\|_{\mathcal{F}_n} = \sup_{f \in \mathcal{F}_n} |\mathbb{G}_n f|$. For any fixed $\delta_n > 0$, define a class of functions $\mathcal{S}_n$ as

$$\mathcal{S}_n \equiv \mathcal{S}_n(\delta_n) = \left\{ \frac{\tilde m(\theta_0, \eta) - \tilde m(\theta_0, \eta_0)}{\|\eta - \eta_0\|} : \|\eta - \eta_0\| \le \delta_n \right\}, \tag{7}$$

and a shrinking neighborhood of $(\theta_0, \eta_0)$ as

$$\mathcal{C}_n \equiv \mathcal{C}_n(\delta_n) = \{(\theta, \eta) : \|\theta - \theta_0\| \le \delta_n, \ \|\eta - \eta_0\| \le \delta_n\}. \tag{8}$$

The next two conditions, Conditions S1–S2, make sure that the empirical processes indexed by $\tilde m(\theta, \eta)$ are well behaved and $\tilde m(\theta, \eta)$ is smooth enough around $(\theta_0, \eta_0)$.

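S1. Stochastic Equicontinuity Condition: for any $\delta_n \to 0$,

$$\|\mathbb{G}_n\|_{\mathcal{S}_n} = O^o_{P_X}(1) \tag{9}$$

and

$$\mathbb{G}_n(\tilde m(\theta, \eta) - \tilde m(\theta_0, \eta)) = O^o_{P_X}(\|\theta - \theta_0\|) \quad \text{for } (\theta, \eta) \in \mathcal{C}_n. \tag{10}$$

S2. Smoothness Condition:

$$P_X(\tilde m(\theta, \eta) - \tilde m_0) = A(\theta - \theta_0) + O(\|\theta - \theta_0\|^2 \vee \|\eta - \eta_0\|^2) \tag{11}$$

for $(\theta, \eta)$ in some neighborhood of $(\theta_0, \eta_0)$.

For any fixed $\theta$, define

$$\hat\eta_\theta = \arg\sup_{\eta \in \mathcal{H}} \mathbb{P}_n m(\theta, \eta).$$

The next condition says that $\hat\eta_\theta$ should be close to $\eta_0$ if $\theta$ is close to $\theta_0$.

S3. Convergence Rate Condition: there exists a $\gamma \in (1/4, 1/2]$ such that

$$\|\hat\eta_{\tilde\theta} - \eta_0\| = O^o_{P_X}(\|\tilde\theta - \theta_0\| \vee n^{-\gamma}) \tag{12}$$

for any consistent $\tilde\theta$.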


Here $\gamma$ is required to be larger than 1/4, which is always true for regular semiparametric models (see Section 3.4 of [34]). Verifications of Conditions S1–S3 will be discussed in Sections 3 and 4 and illustrated with examples in Section 5.

2.2. Semiparametric M-estimator. We show that the semiparametric M-estimator $\hat\theta$ has an asymptotic linear expansion and is asymptotically normally distributed. This result plays an important role in proving bootstrap consistency in the next subsection.

Theorem 1. Suppose that Conditions I, S1–S3 hold and that $(\hat\theta, \hat\eta)$ satisfies (5). If $\hat\theta$ is consistent, then

$$\sqrt{n}(\hat\theta - \theta_0) = -\sqrt{n} A^{-1} \mathbb{P}_n \tilde m_0 + O^o_{P_X}(n^{-2\gamma + 1/2}). \tag{13}$$

Thus, we have

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, \Sigma), \tag{14}$$

where $\Sigma \equiv A^{-1}B(A^{-1})'$, and $A$ and $B$ are given in Condition I.

We assume consistency of $\hat\theta$ in Theorem 1. The consistency can usually be guaranteed under the following "well-separated" condition:

$$P_X m(\theta_0, \eta_0) > \sup_{(\theta, \eta) \notin G} P_X m(\theta, \eta) \tag{15}$$

for any open set $G \subset \Theta \times \mathcal{H}$ containing $(\theta_0, \eta_0)$; see Theorem 5.7 in [35]. To obtain the asymptotic normality, we only need the remainder term in the asymptotic linear expansion (13) to be of order $o^o_{P_X}(1)$. Since $\gamma > 1/4$, the order of the remainder term $O^o_{P_X}(n^{-2\gamma + 1/2})$ is always $o^o_{P_X}(1)$. We actually


obtain a higher order expansion, stronger than what is needed for showing asymptotic normality. It is interesting to note that the rate of convergence of the remainder term depends on how accurately the nuisance function parameter $\eta$ can be estimated. In other words, the second order estimation accuracy of $\hat\theta$ is higher for semiparametric models with a faster convergent $\eta$. This higher order result extends a similar result of Cheng and Kosorok [6, 7] developed for maximum likelihood to the M-estimation setting. For maximum likelihood estimation, $m(\theta, \eta) = \log \mathrm{lik}(\theta, \eta)$, and it is easy to see that $A = -B$ and $\Sigma = B^{-1}$. In this case, $\Sigma^{-1}$ becomes the efficient information matrix.

Remark 1. Given any consistent estimator $\hat\Sigma$ of $\Sigma$, we have

$$\sqrt{n}\, \hat\Sigma^{-1/2} (\hat\theta - \theta_0) \xrightarrow{d} N(0, I) \tag{16}$$

by Theorem 1 and Slutsky's Theorem. In practice, a consistent $\hat\Sigma$ can be obtained via either the observed profile information approach [27] or the profile sampler approach [21].

Remark 2. The M-estimation equation (5) can be relaxed to the "nearly maximizing" condition $\mathbb{P}_n \tilde m(\hat\theta, \hat\eta) = o^o_{P_X}(n^{-1/2})$. Under weaker conditions than stated in Theorem 1, the same argument can be used to show

$$\sqrt{n}(\hat\theta - \theta_0) = -\sqrt{n} A^{-1} \mathbb{P}_n \tilde m_0 + o^o_{P_X}(1), \tag{17}$$

which also implies the asymptotic normality of $\hat\theta$, i.e. (14) and (16). Note that (17) has also been established in [23, 38], but our argument can be used to obtain the higher order expansion result (13).
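To make the sandwich form $\Sigma = A^{-1}B(A^{-1})'$ concrete, here is a minimal Python sketch, assuming NumPy. It uses a purely parametric toy problem with no nuisance function (least squares, with criterion $m(\theta)(x, y) = -(y - x'\theta)^2/2$), which is chosen only for illustration and is not one of the paper's examples; the function name `sandwich_least_squares` is hypothetical. In this toy model $m_1 = x(y - x'\theta)$ and $m_{11} = -xx'$, so $A$ and $B$ can be estimated by sample averages.

```python
import numpy as np

def sandwich_least_squares(x, y):
    """Plug-in estimate of Sigma = A^{-1} B (A^{-1})' for m(theta)(x, y) = -(y - x'theta)^2 / 2."""
    theta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ theta_hat
    score = x * resid[:, None]              # rows are m_1(theta_hat)(X_i, Y_i)
    a_hat = -(x.T @ x) / len(y)             # sample version of A = P m_11 = -E[X X']
    b_hat = (score.T @ score) / len(y)      # sample version of B = E[m_1 m_1']
    a_inv = np.linalg.inv(a_hat)
    return theta_hat, a_inv @ b_hat @ a_inv.T

rng = np.random.default_rng(2)
n, theta0 = 50000, np.array([1.0, -0.5])    # hypothetical true parameter for the simulation
x = rng.normal(size=(n, 2))
y = x @ theta0 + rng.normal(size=n)         # homoskedastic unit-variance noise
theta_hat, sigma_hat = sandwich_least_squares(x, y)
```

With standard normal covariates and unit-variance noise, the population value of $\Sigma$ in this toy model is the identity matrix, so `sigma_hat` should be close to it for large `n`.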


2.3. Bootstrap Consistency. In this subsection, we establish the consistency of bootstrapping $\theta$ under general conditions in the framework of semiparametric M-estimation. Define $\mathbb{P}_n^* f = (1/n) \sum_{i=1}^n W_{ni} f(X_i)$, where $W_{ni}$'s are the bootstrap weights defined on the probability space $(\mathcal{W}, \Omega, P_W)$. In view of (3), the bootstrap estimator can be rewritten as

$$(\hat\theta^*, \hat\eta^*) = \arg\sup_{\theta\in\Theta,\,\eta\in\mathcal{H}} \mathbb{P}_n^* m(\theta, \eta).$$

Similar to (5), we can show that $(\hat\theta^*, \hat\eta^*)$ satisfies

$$\mathbb{P}_n^* \tilde m(\hat\theta^*, \hat\eta^*) = 0. \tag{18}$$
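As an illustration of the weighted form $\mathbb{P}_n^* f$, the following minimal Python sketch (assuming NumPy; the helper name `resample_weights` and the test function `f` are hypothetical and not from the paper) draws one Efron resample, converts it to count weights $W_{ni}$, and compares the average of $f$ over the resample with $(1/n)\sum_i W_{ni} f(X_i)$.

```python
import numpy as np

def resample_weights(n, rng):
    """Draw n indices with replacement and return (indices, count weights W_n)."""
    idx = rng.integers(0, n, size=n)          # Efron resample of the index set {0, ..., n-1}
    weights = np.bincount(idx, minlength=n)   # W_ni = number of times X_i is drawn
    return idx, weights

rng = np.random.default_rng(0)
x = rng.normal(size=50)                       # toy data X_1, ..., X_n
f = np.square                                 # any function f of a single observation

idx, w = resample_weights(x.size, rng)
mean_over_resample = f(x[idx]).mean()         # (1/n) * sum_i f(X_i^*)
weighted_mean = (w * f(x)).mean()             # P_n^* f = (1/n) * sum_i W_ni f(X_i)
```

The two averages agree up to floating point error, which is the sense in which resampling the data and reweighting the original observations describe the same bootstrap estimator.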

The bootstrap weights $W_{ni}$'s are assumed to belong to the class of exchangeable bootstrap weights introduced in [29]. Specifically, they satisfy the following conditions:

W1. The vector $W_n = (W_{n1}, \ldots, W_{nn})'$ is exchangeable for all $n = 1, 2, \ldots$, i.e. for any permutation $\pi = (\pi_1, \ldots, \pi_n)$ of $(1, 2, \ldots, n)$, the joint distribution of $\pi(W_n) = (W_{n\pi_1}, \ldots, W_{n\pi_n})'$ is the same as that of $W_n$.

W2. $W_{ni} \ge 0$ for all $n, i$ and $\sum_{i=1}^n W_{ni} = n$ for all $n$.

W3. $\limsup_{n\to\infty} \|W_{n1}\|_{2,1} \le C < \infty$, where $\|W_{n1}\|_{2,1} = \int_0^\infty \sqrt{P_W(W_{n1} \ge u)}\, du$.

W4. $\lim_{\lambda\to\infty} \limsup_{n\to\infty} \sup_{t \ge \lambda} t^2 P_W(W_{n1} > t) = 0$.

W5. $(1/n) \sum_{i=1}^n (W_{ni} - 1)^2 \xrightarrow{P_W} c^2 > 0$.
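Conditions W2 and W5 lend themselves to a quick numerical sanity check. The sketch below, assuming NumPy, generates Efron (multinomial) weights and one common construction of Bayesian bootstrap weights, $Y_i/\bar{Y}_n$ with i.i.d. standard exponential $Y_i$; this construction and all function names are assumptions made for illustration rather than definitions taken from the paper. It then evaluates the W5 statistic $(1/n)\sum_i (W_{ni} - 1)^2$ for each scheme.

```python
import numpy as np

def efron_weights(n, rng):
    """Multinomial(n, (1/n, ..., 1/n)) weights of Efron's bootstrap."""
    return rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)

def bayesian_bootstrap_weights(n, rng):
    """Weights Y_i / mean(Y) with Y_i i.i.d. standard exponential (assumed construction)."""
    y = rng.exponential(size=n)
    return y / y.mean()

def w5_statistic(w):
    """(1/n) * sum_i (W_ni - 1)^2, the quantity whose limit in W5 is c^2."""
    return float(np.mean((w - 1.0) ** 2))

rng = np.random.default_rng(1)
n = 20000
w_efron = efron_weights(n, rng)
w_bayes = bayesian_bootstrap_weights(n, rng)
c2_efron = w5_statistic(w_efron)
c2_bayes = w5_statistic(w_bayes)
```

Both weight vectors are nonnegative and sum to `n`, as W2 requires, and both W5 statistics come out near 1 in this simulation, since the multinomial counts and the normalized exponentials each have variance close to 1.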

In Efron's nonparametric bootstrap, the bootstrap sample is drawn from the nonparametric estimate of the true distribution, i.e. the empirical distribution. Thus, it is easy to show that $W_n \sim \mathrm{Multinomial}(n, (n^{-1}, \ldots, n^{-1}))$ and Conditions W1–W5 are satisfied with $c = 1$ in W5. In general, Conditions W3–W5 are easily satisfied under some moment conditions on $W_{ni}$; see Lemma 3.1 of [29]. In addition to Efron's nonparametric bootstrap, the sampling schemes that satisfy Conditions W1–W5 include the Bayesian bootstrap, multiplier bootstrap, double bootstrap, and urn bootstrap; see [29].

There exist two sources of randomness for a bootstrapped quantity, e.g. $\hat\theta^*$: one comes from the observed data; the other comes from the resampling done by the bootstrap, i.e. the random $W_{ni}$'s. Therefore, in order to rigorously state our theoretical results for the bootstrap, we need to specify relevant probability spaces and define stochastic orders with respect to relevant probability measures.

We view $X_i$ as the $i$th coordinate projection from the canonical probability space $(\mathcal{X}^\infty, \mathcal{A}^\infty, P_X^\infty)$ onto the $i$th copy of $\mathcal{X}$. For the joint randomness involved, the product probability space is defined as

$$(\mathcal{X}^\infty, \mathcal{A}^\infty, P_X^\infty) \times (\mathcal{W}, \Omega, P_W) = (\mathcal{X}^\infty \times \mathcal{W}, \mathcal{A}^\infty \times \Omega, P_{XW}).$$

In this paper, we assume that the bootstrap weights $W_{ni}$'s are independent of the data $X_i$'s, thus $P_{XW} = P_X^\infty \times P_W$. We write $P_X^\infty$ as $P_X$ for simplicity hereafter. Define $E^o_{XW}$ as the outer expectation w.r.t. $P_{XW}$. The notations $E^o_{X|W}$, $E^o_X$ and $E_W$ are defined similarly.

Given a real-valued function $\Delta_n$ defined on the above product probability space, e.g. $\hat\theta^*$, we say that $\Delta_n$ is of an order $o^o_{P_W}(1)$ in $P_X^o$-probability if for any $\epsilon, \eta > 0$,

$$P_X^o\{P^o_{W|X}(|\Delta_n| > \epsilon) > \eta\} \longrightarrow 0 \tag{19}$$

as $n \to \infty$, and that $\Delta_n$ is of an order $O^o_{P_W}(1)$ in $P_X^o$-probability if for any $\eta > 0$, there exists a $0 < M < \infty$ such that

$$P_X^o\{P^o_{W|X}(|\Delta_n| \ge M) > \eta\} \longrightarrow 0 \tag{20}$$

as $n \to \infty$. Given a function $\Gamma_n$ defined only on $(\mathcal{X}^\infty, \mathcal{A}^\infty, P_X^\infty)$, if it is of an order $o^o_{P_X}(1)$ ($O^o_{P_X}(1)$), then it is also of an order $o^o_{P_{XW}}(1)$ ($O^o_{P_{XW}}(1)$) based on the following argument:

$$P^o_{XW}(|\Gamma_n| > \epsilon) = E^o_{XW} 1\{|\Gamma_n| > \epsilon\} = E_X E_{W|X} 1\{|\Gamma_n| > \epsilon\}^o = E_X 1\{|\Gamma_n| > \epsilon\}^o = P_X^o\{|\Gamma_n| > \epsilon\},$$

where the third equation holds since $\Gamma_n$ does not depend on the bootstrap weight. More results on transition of various stochastic orders are given in Lemma A.1 of the Appendix. Such results are used repeatedly in proving our bootstrap consistency theorem.

To establish the bootstrap consistency, we need some additional conditions. The first condition is the measurability condition, denoted as $M(P_X)$. We say a class of functions $\mathcal{F} \in M(P_X)$ if $\mathcal{F}$ possesses enough measurability so that $\mathbb{P}_n$ can be randomized, i.e. we can replace $(\delta_{X_i} - P_X)$ by $(W_{ni} - 1)\delta_{X_i}$, and Fubini's Theorem can be used freely. The detailed description for $M(P_X)$ is spelled out in [12] and also given in the Appendix of this paper. Define $\mathcal{T} = \{\tilde m(\theta, \eta) : \|\theta - \theta_0\| + \|\eta - \eta_0\| \le R\}$ for some $R > 0$. For the rest of the paper we assume $\mathcal{T} \in M(P_X)$.

The second class of conditions parallels Conditions S1–S3 used for obtaining asymptotic normality of M-estimators and is only slightly stronger. Thus, the bootstrap consistency for the Euclidean parameter in semiparametric models is almost automatically guaranteed once the semiparametric M-estimate is shown to be asymptotically normal. Let $S_n(x)$ be the envelope function of the class $\mathcal{S}_n = \mathcal{S}_n(\delta_n)$ defined in (7), i.e.

$$S_n(x) = \sup_{\|\eta - \eta_0\| \le \delta_n} \left| \frac{\tilde m(\theta_0, \eta) - \tilde m_0}{\|\eta - \eta_0\|} \right|.$$

The next condition controls the tail of this envelope function.

SB1. Tail Probability Condition:

$$\lim_{\lambda\to\infty} \limsup_{n\to\infty} \sup_{t \ge \lambda} t^2 P_X^o(S_n(X_1) > t) = 0 \tag{21}$$

for any sequence $\delta_n \to 0$.

Let $\dot{\mathcal{T}} = \{\partial \tilde m(\theta, \eta)/\partial\theta : (\theta, \eta) \in \mathcal{C}_n\}$, where $\mathcal{C}_n = \mathcal{C}_n(\delta_n)$ is defined in (8).

SB2. We assume that $\dot{\mathcal{T}} \in M(P_X) \cap L_2(P_X)$ and that $\dot{\mathcal{T}}$ is P-Donsker.

Condition SB2 ensures that the size of the function class $\dot{\mathcal{T}}$ is reasonable so that the bootstrapped empirical process $\mathbb{G}_n^* \equiv \sqrt{n}(\mathbb{P}_n^* - \mathbb{P}_n)$ indexed by $\dot{\mathcal{T}}$ has a limiting process conditional on the observations; see Theorem 2.2 in [29]. For any fixed $\theta$, define

$$\hat\eta_\theta^* = \arg\max_{\eta \in \mathcal{H}} \mathbb{P}_n^* m(\theta, \eta).$$

The next condition says that $\hat\eta_\theta^*$ should be close to $\eta_0$ if $\theta$ is close to $\theta_0$.

SB3. Bootstrap Convergence Rate Condition: there exists a $\gamma \in (1/4, 1/2]$ such that

$$\|\hat\eta^*_{\tilde\theta} - \eta_0\| = O^o_{P_W}(\|\tilde\theta - \theta_0\| \vee n^{-\gamma}) \quad \text{in } P_X^o\text{-probability} \tag{22}$$

for any $\tilde\theta \xrightarrow{P^o_{XW}} \theta_0$.


Verifications of Conditions SB1–SB2 will be discussed in Section 3. Two general theorems are given in Section 4 to aid verification of Condition SB3.

To rigorously state the bootstrap consistency result, we need the notion of conditional weak convergence [16]. Let $BL_1(\mathbb{B})$ be a collection of Lipschitz continuous functions $h : \mathbb{B} \mapsto \mathbb{R}$ bounded in absolute value by 1 and having Lipschitz constant 1, i.e. $|h(x)| \le 1$ and $|h(x) - h(y)| \le \|x - y\|$ for all $x, y \in \mathbb{B}$. We say $\hat X_n^*$ converges weakly to $X$ conditional on the data $\mathcal{X}_n$, denoted as $\hat X_n^* \Longrightarrow X$, if

$$\sup_{h \in BL_1(\mathbb{B})} |E_{\cdot|\mathcal{X}_n} h(\hat X_n^*) - E_X h(X)| = o^o_{P_X}(1), \tag{23}$$

where $E_{\cdot|\mathcal{X}_n}$ denotes the conditional expectation given the data $\mathcal{X}_n$, provided $h(\hat X_n^*)$ is asymptotically measurable unconditionally for all $h \in BL_1(\mathbb{B})$. More discussions of conditional weak convergence can be found in [34].

The next result is a bootstrap version of Theorem 1.

Theorem 2. Suppose that $\hat\theta$ and $\hat\theta^*$ satisfy (5) and (18), respectively. Also assume that $\hat\theta \xrightarrow{P_X} \theta_0$ and $\hat\theta^* \xrightarrow{P^o_{XW}} \theta_0$. Assume that Conditions I, S1–S3, SB1–SB3 and W1–W5 hold. We have that

$$\|\hat\theta^* - \theta_0\| = O^o_{P_W}(n^{-1/2}) \tag{24}$$

in $P_X^o$-probability. Furthermore,

$$\sqrt{n}(\hat\theta^* - \hat\theta) = -A^{-1} \mathbb{G}_n^* \tilde m_0 + O^o_{P_W}(n^{1/2 - 2\gamma}) \tag{25}$$

in $P_X^o$-probability. Thus,

$$(\sqrt{n}/c)(\hat\theta^* - \hat\theta) \Longrightarrow N(0, \Sigma), \tag{26}$$

where $c$ is given in W5, whose value depends on the sampling scheme used, and $\Sigma \equiv A^{-1}B(A^{-1})'$ with $A$ and $B$ given in Condition I.


The assumption $\hat\theta^* \xrightarrow{P^o_{XW}} \theta_0$ can be established by adapting the Argmax Theorem, Corollary 3.2.3, in [34]. Briefly, we need two conditions for accomplishing this. The first one is the "well separated" condition (15). The other is

$$\sup_{(\theta, \eta) \in \Theta \times \mathcal{H}} |\mathbb{P}_n^* m(\theta, \eta) - P_X m(\theta, \eta)| \xrightarrow{P^o_{XW}} 0. \tag{27}$$

By the Multiplier Glivenko-Cantelli Theorem, i.e. Lemma 3.6.16 in [34], and (A.1) in the Appendix, we know that (27) holds if $\{m(\theta, \eta) : \theta \in \Theta, \eta \in \mathcal{H}\}$ is shown to be P-Donsker.

Note that (25) and (26) of Theorem 2 parallel (13) and (14) of Theorem 1. In particular, the asymptotic linear expansion (25) is a second order one in the sense that the remainder term is of a more refined order than $o^o_{P_W}(1)$, whose rate of convergence depends on how accurately the nuisance function $\eta$ can be estimated. Thus, it might be reasonable to conjecture that more accurate bootstrap inferences can be drawn from semiparametric models with a faster convergent $\eta$.

Let $P_{W|\mathcal{X}_n}$ denote the conditional distribution given the observed data $\mathcal{X}_n$. Note that Theorem 2 implies that

$$\sup_{x \in \mathbb{R}^d} \left| P_{W|\mathcal{X}_n}\big((\sqrt{n}/c)(\hat\theta^* - \hat\theta) \le x\big) - P(N(0, \Sigma) \le x) \right| = o^o_{P_X}(1) \tag{28}$$

by setting $h(\cdot) = 1\{\cdot \le x\}$, where "$\le$" is taken componentwise, in (23). Theorem 1 together with Lemma 2.11 in [35] implies that

$$\sup_{x \in \mathbb{R}^d} \left| P_X\big(\sqrt{n}(\hat\theta - \theta_0) \le x\big) - P(N(0, \Sigma) \le x) \right| = o(1). \tag{29}$$

Combining (28) and (29), we obtain the following bootstrap consistency result.


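Corollary 1. Under the conditions in Theorem 2, we have that

$$\sup_{x \in \mathbb{R}^d} \left| P_{W|\mathcal{X}_n}\big((\sqrt{n}/c)(\hat\theta^* - \hat\theta) \le x\big) - P_X\big(\sqrt{n}(\hat\theta - \theta_0) \le x\big) \right| \xrightarrow{P_X^o} 0 \tag{30}$$

as $n \to \infty$.

Corollary 1 says that the bootstrap distribution of $(\sqrt{n}/c)(\hat\theta^* - \hat\theta)$ asymptotically imitates the unconditional distribution of $\sqrt{n}(\hat\theta - \theta_0)$.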
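The statement of Corollary 1 can be explored by simulation in a toy case. The sketch below, assuming NumPy, uses the sample mean as a stand-in for $\hat\theta$ (a purely parametric example with no nuisance function, assumed only for illustration) and Efron weights, for which $c = 1$. It simulates the bootstrap distribution of $\sqrt{n}(\hat\theta^* - \hat\theta)$ and compares its spread with the plug-in estimate of $\Sigma^{1/2}$; the function name `bootstrap_root` is hypothetical.

```python
import numpy as np

def bootstrap_root(x, n_boot, rng):
    """Return n_boot draws of sqrt(n) * (theta_star - theta_hat) for the sample mean."""
    n = x.size
    theta_hat = x.mean()
    # One row of Efron (multinomial) weights per bootstrap replicate.
    w = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    theta_star = (w @ x) / n                 # weighted means P_n^* x
    return np.sqrt(n) * (theta_star - theta_hat)

rng = np.random.default_rng(3)
x = rng.exponential(size=400)                # toy data with Var(X) = 1, so Sigma = 1
roots = bootstrap_root(x, n_boot=2000, rng=rng)
boot_sd = roots.std()                        # spread of the simulated bootstrap distribution
plug_in_sd = x.std()                         # plug-in estimate of Sigma^{1/2}
```

In this toy case the two standard deviations agree up to Monte Carlo error, which is what one would expect if the bootstrap distribution tracks the $N(0, \Sigma)$ limit.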

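Remark 3. For any consistent $\hat\Sigma \xrightarrow{P_X} \Sigma$ and $\hat\Sigma^* \xrightarrow{P^o_{XW}} \Sigma$, we have

$$(\sqrt{n}/c)(\hat\Sigma^*)^{-1/2}(\hat\theta^* - \hat\theta) \Longrightarrow N(0, I), \tag{31}$$

$$\sup_{x \in \mathbb{R}^d} \left| P_{W|\mathcal{X}_n}\left( \frac{(\sqrt{n}/c)(\hat\theta^* - \hat\theta)}{(\hat\Sigma^*)^{1/2}} \le x \right) - P_X\left( \frac{\sqrt{n}(\hat\theta - \theta_0)}{\hat\Sigma^{1/2}} \le x \right) \right| \xrightarrow{P_X^o} 0 \tag{32}$$

by Theorem 2, Slutsky's Theorem and the arguments in proving Corollary 1. An appropriate candidate for the consistent $\hat\Sigma^*$ is the block jackknife proposed in [24].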

2.4. Bootstrap Confidence Sets. In this subsection, we show that the distribution consistency of the bootstrap estimator $\hat\theta^*$ proven in Corollary 1 implies the consistency of a variety of bootstrap confidence sets, i.e. percentile, hybrid and t types.

A lower $\alpha$-th quantile of the bootstrap distribution is any quantity $\tau^*_{n\alpha} \in \mathbb{R}^d$ satisfying $\tau^*_{n\alpha} = \inf\{\epsilon : P_{W|\mathcal{X}_n}(\hat\theta^* \le \epsilon) \ge \alpha\}$, where $\epsilon$ is an infimum over the given set only if there does not exist an $\epsilon_1 < \epsilon$ in $\mathbb{R}^d$ such that $P_{W|\mathcal{X}_n}(\hat\theta^* \le \epsilon_1) \ge \alpha$. Because of the assumed smoothness of the criterion function $m(\theta, \eta)$ in our setting, we can, without loss of generality, assume $P_{W|\mathcal{X}_n}(\hat\theta^* \le \tau^*_{n\alpha}) = \alpha$. We can also define $\kappa^*_{n\alpha} = (\sqrt{n}/c)(\tau^*_{n\alpha} - \hat\theta)$ so that $P_{W|\mathcal{X}_n}((\sqrt{n}/c)(\hat\theta^* - \hat\theta) \le \kappa^*_{n\alpha}) = \alpha$. Note that $\tau^*_{n\alpha}$ and $\kappa^*_{n\alpha}$ are not unique since we assume $\theta$ is a vector.


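Recall that, for any $x \in \mathbb{R}^d$,

$$P_X\big(\sqrt{n}(\hat\theta - \theta_0) \le x\big) \longrightarrow \Psi(x),$$

$$P_{W|\mathcal{X}_n}\big((\sqrt{n}/c)(\hat\theta^* - \hat\theta) \le x\big) \xrightarrow{P_X^o} \Psi(x),$$

where $\Psi(x) = P(N(0, \Sigma) \le x)$, by (14) and (28). By invoking the quantile convergence theorem, i.e. Lemma 21.1 in [35], we know $\kappa^*_{n\alpha} \xrightarrow{P_{XW}} \Psi^{-1}(\alpha)$. Considering Slutsky's Theorem, which implies that $\sqrt{n}(\hat\theta - \theta_0) - \kappa^*_{n(\alpha/2)}$ weakly converges to $N(0, \Sigma) - \Psi^{-1}(\alpha/2)$ unconditionally, we further have

$$P_{XW}\left( \theta_0 \le \hat\theta - \frac{\kappa^*_{n(\alpha/2)}}{\sqrt{n}} \right) = P_{XW}\left( \sqrt{n}(\hat\theta - \theta_0) \ge \kappa^*_{n(\alpha/2)} \right) \longrightarrow P_{XW}\left( N(0, \Sigma) \ge \Psi^{-1}(\alpha/2) \right) = 1 - \alpha/2.$$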

The above arguments prove the consistency of the hybrid-type bootstrap confidence set, i.e. (34), and can also be applied to the percentile-type bootstrap confidence set, i.e. (33). A more rigorous proof can be found in Lemma 23.3 of [35]. The following Theorem 3 summarizes the above discussions.

Theorem 3. Under the conditions in Theorem 2, we have

$$P_{XW}\left( \hat\theta + \frac{\tau^*_{n(\alpha/2)} - \hat\theta}{c} \le \theta_0 \le \hat\theta + \frac{\tau^*_{n(1-\alpha/2)} - \hat\theta}{c} \right) \longrightarrow 1 - \alpha, \tag{33}$$

$$P_{XW}\left( \hat\theta - \frac{\kappa^*_{n(1-\alpha/2)}}{\sqrt{n}} \le \theta_0 \le \hat\theta - \frac{\kappa^*_{n(\alpha/2)}}{\sqrt{n}} \right) \longrightarrow 1 - \alpha, \tag{34}$$

as $n \to \infty$.

It is well known that the above bootstrap confidence sets can be obtained easily through routine bootstrap sampling.

Investigating the consistency of the bootstrap variance estimator is also of great interest. However, the usual sufficient condition for moment consistency, i.e. the uniform integrability condition, becomes very hard to verify due to the existence of an infinite dimensional parameter $\eta$. An alternative resampling method to obtain the bootstrap variance estimator in semiparametric models is the block jackknife approach, which was proposed and theoretically justified in [24]. We do not pursue this topic further in this paper.

Remark 4. Provided consistent variance estimators $\hat\Sigma^*$ and $\hat\Sigma$ are available, we can prove that the following t-type bootstrap confidence set is also consistent by applying Lemma 23.3 in [35] to (16) and (31):

$$P_{XW}\left( \hat\theta - \frac{\hat\Sigma^{1/2} \omega^*_{n(1-\alpha/2)}}{\sqrt{n}} \le \theta_0 \le \hat\theta - \frac{\hat\Sigma^{1/2} \omega^*_{n(\alpha/2)}}{\sqrt{n}} \right) \longrightarrow 1 - \alpha$$

as $n \to \infty$, where $\omega^*_{n\alpha}$ satisfies $P_{W|\mathcal{X}_n}\big((\sqrt{n}/c)(\hat\theta^* - \hat\theta)/(\hat\Sigma^*)^{1/2} \le \omega^*_{n\alpha}\big) = \alpha$.

3. Verifications of Conditions S1–S2 and SB1–SB2. In this section, we discuss how to verify Conditions S1–S2 and SB1–SB2 in Subsections 3.1 and 3.2, respectively. Two general theorems are given in Section 4 to assist in verifying the remaining convergence rate conditions S3 and SB3.

3.1. Verifications of Conditions S1–S2. The continuity modulus condition (9) in S1 can be checked via one of the following two approaches. The first approach is to show the boundedness of $E_X^o \|\mathbb{G}_n\|_{\mathcal{S}_n}$ by using Lemma 3.4.2 in [34]. The second approach is to calculate the bracketing entropy number of $\mathcal{S}_n$ and apply Lemma 5.13 in [36] if the $L_2$-norm is used on the nuisance parameter. As for (10), we can verify it easily if we can show that the class of functions $\{(\partial/\partial\theta)\tilde m(\theta, \eta) : (\theta, \eta) \in \mathcal{C}_n\}$ is P-Donsker.

Next, we discuss how to verify the smoothness condition S2. We first write $P_X(\tilde m(\theta, \eta) - \tilde m_0)$ as the sum of $P_X(\tilde m(\theta, \eta) - \tilde m(\theta_0, \eta))$ and $P_X(\tilde m(\theta_0, \eta) - \tilde m_0)$. We apply the Taylor expansion to $P_X(\tilde m(\theta, \eta) - \tilde m(\theta_0, \eta))$ to obtain

$$P_X(\tilde m(\theta, \eta) - \tilde m(\theta_0, \eta)) = P_X\left\{ m_{11}(\theta_0, \eta) - m_{21}(\theta_0, \eta)[\mathbf{H}^\dagger(\theta_0, \eta)] \right\}(\theta - \theta_0) + O(\|\theta - \theta_0\|^2)$$

$$= A(\theta - \theta_0) + O(\|\theta - \theta_0\|^2) + (\theta - \theta_0)\, O(\|\eta - \eta_0\|),$$

where $A$ is defined in Condition I, and the first and second equalities follow from the Taylor expansions of $\theta \mapsto P_X \tilde m(\theta, \eta)$ around $\theta_0$ and

$$\eta \mapsto P_X\left\{ m_{11}(\theta_0, \eta) - m_{21}(\theta_0, \eta)[\mathbf{H}^\dagger(\theta_0, \eta)] \right\}$$

around $\eta_0$, respectively. By applying the second order Taylor expansion to $\eta \mapsto P_X \tilde m(\theta_0, \eta)$ around $\eta_0$ and considering (4), we can show that $P_X(\tilde m(\theta_0, \eta) - \tilde m_0) = O(\|\eta - \eta_0\|^2)$. In summary, Condition S2 usually holds in models where the map $\eta \mapsto \tilde m(\theta_0, \eta)$ is smooth in the sense that the Fréchet derivative of $\eta \mapsto P_X((\partial/\partial\theta)\tilde m(\theta_0, \eta))$ around $\eta_0$ and the second order Fréchet derivative of $\eta \mapsto P_X \tilde m(\theta_0, \eta)$ around $\eta_0$ are bounded as discussed above.

3.2. Verifications of Conditions SB1–SB2. We can verify Condition SB1 by showing either that $S_n(x)$ is uniformly bounded, i.e., $\limsup_{n\to\infty} S_n(x) \le M < \infty$ for every $x \in \mathcal{X}$, or more generally, that $\limsup_{n\to\infty} E\{S_n(X_1)^{2+\delta}\} < \infty$ for some $\delta > 0$. That the moment condition implies Condition SB1 follows from Chebyshev's inequality. In our examples in Section 5, the uniform boundedness condition is usually satisfied. Hence, we focus on how to show that $S_n(x)$ is uniformly bounded here. By the Taylor expansion in a Banach space [40], we can write $\tilde m(\theta_0, \eta) - \tilde m_0 = D_{\tilde\eta}[\eta - \eta_0]$, where $\tilde\eta$ lies on the line segment between $\eta$ and $\eta_0$, and $D_\xi[h]$ is the Fréchet derivative of $\eta \mapsto \tilde m(\theta_0, \eta)$ at $\xi$ along the direction $h$. Since we require $\|\eta - \eta_0\| \le \delta_n \to 0$, the bounded Fréchet derivative at $\eta_0$ will imply that $S_n(x)$ is uniformly bounded. The method used in verifying (10) of Condition S1 can be applied to check Condition SB2; see the discussion in the previous subsection.

4. Convergence Rates of Bootstrap Estimate of Functional Parameter. In this section, we present two general theorems for calculating the convergence rate of the bootstrap estimate of the functional parameter. These results can be applied to verify Condition SB3. The convergence rate condition S3 can also be verified based on these theorems by assuming the weights $W_{ni} = 1$ for $i = 1, \ldots, n$. Note that both theorems extend general results on M-estimators [27, 34] to bootstrap M-estimators and are also of independent interest.

4.1. Root-n rate. In the first theorem we consider a collection of measurable objective functions $x \mapsto k(\theta, \eta)[g](x)$ indexed by the parameter $(\theta, \eta) \in \Theta \times \mathcal{H}$ and an arbitrary index set $g \in \mathcal{G}$. For example, $k(\theta, \eta)[g]$ can be the score function for $\eta$ given any fixed $\theta$ indexed by $g \in \mathcal{G}$. Define

$$U_n^*(\theta, \eta)[g] = \mathbb{P}_n^* k(\theta, \eta)[g], \quad U_n(\theta, \eta)[g] = \mathbb{P}_n k(\theta, \eta)[g], \quad U(\theta, \eta)[g] = P_X k(\theta, \eta)[g].$$

We assume that the maps $g \mapsto U_n^*(\theta, \eta)[g]$, $g \mapsto U_n(\theta, \eta)[g]$ and $g \mapsto U(\theta, \eta)[g]$ are uniformly bounded, so that $U_n^*$, $U_n$ and $U$ are viewed as maps from the parameter set $\Theta \times \mathcal{H}$ into $\ell^\infty(\mathcal{G})$. The following conditions are

imsart-aos ver.

2006/01/04 file:

bconsistency-JH_v6.tex date:

June 6, 2009

22

GUANG CHENG AND JIANHUA Z. HUANG

assumed in Theorem 4 below: (35) {k(θ, η)[g] : kθ − θ0 k + kη − η0 k ≤ δ, g ∈ G} ∈ M (PX ) ∩ L2 (PX ) and is P-Donsker for some δ > 0, (36) sup PX {k(θ, η)[g] − k(θ0 , η0 )[g]}2 → 0 as kθ − θ0 k + kη − η0 k → 0. g∈G

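Let

$$\mathcal{D}_n = \left\{ \frac{k(\theta,\eta)[g] - k(\theta_0,\eta_0)[g]}{1 + \sqrt{n}\|\theta-\theta_0\| + \sqrt{n}\|\eta-\eta_0\|} : g \in \mathcal{G},\ \|\theta-\theta_0\| + \|\eta-\eta_0\| \le \delta_n \right\}$$

and let $D_n(X)$ be the envelope function of the class of functions $\mathcal{D}_n$. For any sequence $\delta_n \to 0$, we assume that $D_n(X)$ satisfies

$$\lim_{\lambda\to\infty} \limsup_{n\to\infty} \sup_{t \ge \lambda} t^2 P_X^o(D_n(X_1) > t) = 0. \tag{37}$$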
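A tail condition of the form (37) can often be checked through a moment bound, in the same spirit as the moment condition mentioned for Condition SB1. As a short check, suppose (an assumption made here for illustration only, and ignoring measurability issues) that $\sup_n E\{D_n(X_1)^{2+\epsilon}\} \le C < \infty$ for some $\epsilon > 0$. Markov's inequality applied to $D_n(X_1)^{2+\epsilon}$ then gives:

```latex
\[
\sup_{t \ge \lambda} t^{2}\, P_X^{o}\bigl(D_n(X_1) > t\bigr)
\;\le\; \sup_{t \ge \lambda} t^{2}\,\frac{E\{D_n(X_1)^{2+\epsilon}\}}{t^{2+\epsilon}}
\;\le\; C\,\lambda^{-\epsilon} \;\longrightarrow\; 0
\qquad \text{as } \lambda \to \infty ,
\]
```

uniformly in $n$, which is the required limit. A uniformly bounded envelope is the special case in which the probability vanishes for all large $t$.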

Now we consider the convergence rate of $\hat\eta^*_{\tilde\theta}$ satisfying

$$U_n^*(\tilde\theta, \hat\eta^*_{\tilde\theta})[g] = O^o_{P_{XW}}(n^{-1/2}) \tag{38}$$

for any $\tilde\theta \overset{P^o_{XW}}{\longrightarrow} \theta_0$ and $g$ ranging over $\mathcal{G}$. In Theorem 4 below, we show that $\hat\eta^*_{\tilde\theta}$ has the root-n convergence rate under conditions (35)–(37).

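Theorem 4. Suppose that $U : \Theta \times \mathcal{H} \mapsto \ell^\infty(\mathcal{G})$ is Fréchet differentiable at $(\theta_0,\eta_0)$ with bounded derivative $\dot U : \mathbb{R}^d \times \mathrm{lin}\,\mathcal{H} \mapsto \ell^\infty(\mathcal{G})$ such that the map $\dot U(0,\cdot) : \mathrm{lin}\,\mathcal{H} \mapsto \ell^\infty(\mathcal{G})$ is invertible with an inverse that is continuous on its range. Furthermore, assume that (35)-(37) hold and that $U(\theta_0,\eta_0) = 0$. Then

$$\|\hat\eta^*_{\tilde\theta} - \eta_0\| = O^o_{P_W}(\|\tilde\theta - \theta_0\| \vee n^{-1/2}) \tag{39}$$

in $P_X^o$-probability, given that $\tilde\theta \overset{P^o_{XW}}{\longrightarrow} \theta_0$ and $\hat\eta^*_{\tilde\theta} \overset{P^o_{XW}}{\longrightarrow} \eta_0$.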
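The bootstrap quantities $U_n^*$ and $\mathbb{P}_n^*$ used above are built from exchangeable bootstrap weights $W_{ni}$. The following is a minimal sketch of two common weight schemes and of the weighted empirical average, assuming the usual convention $\mathbb{P}_n^* f = n^{-1}\sum_{i=1}^n W_{ni} f(X_i)$ with nonnegative weights summing to $n$. The function names are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def efron_weights(n, rng):
    """Multinomial(n; 1/n, ..., 1/n) counts, i.e. Efron's nonparametric bootstrap."""
    return rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)


def bayesian_bootstrap_weights(n, rng):
    """i.i.d. Exp(1) draws rescaled to sum to n (Bayesian bootstrap weights)."""
    y = rng.exponential(1.0, size=n)
    return n * y / y.sum()


def weighted_empirical_mean(f_values, weights):
    """Assumed convention: P*_n f = n^{-1} * sum_i W_ni * f(X_i)."""
    f_values = np.asarray(f_values, dtype=float)
    return float(np.dot(weights, f_values) / len(f_values))


rng = np.random.default_rng(0)
x = rng.normal(size=200)
w_efron = efron_weights(len(x), rng)
w_bayes = bayesian_bootstrap_weights(len(x), rng)

# Setting every weight to 1 recovers the ordinary empirical average P_n f.
plain_mean = weighted_empirical_mean(x, np.ones(len(x)))
efron_mean = weighted_empirical_mean(x, w_efron)
bayes_mean = weighted_empirical_mean(x, w_bayes)
print(plain_mean, efron_mean, bayes_mean)
```

Taking all weights equal to one reduces the bootstrap quantities to their non-bootstrap counterparts, which is how the text proposes to recover Condition S3 from the same theorems.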


4.2. Slower than root-n rate. We next present another theorem, which yields a slower than $\sqrt{n}$ convergence rate for the bootstrap M-estimate of the functional parameter. This result is so general that it can be applied to the sieve estimate of the nuisance parameter [11]. The essence of the sieve method is that a sequence of increasing spaces (sieves), i.e. $\mathcal{H}_n$, is employed to approximate the large parameter space, e.g. $\mathcal{H}$, so that asymptotically, the closure of the limiting space contains the original parameter space. In other words, for any $\eta \in \mathcal{H}$, there exists a $\pi_n\eta \in \mathcal{H}_n$ such that $\|\eta - \pi_n\eta\| \to 0$ as $n \to \infty$. Now we consider the M-estimate $\hat\eta^*_\theta \in \mathcal{H}_n$ satisfying

$$\mathbb{P}_n^* v(\theta, \hat\eta^*_\theta) \ge \mathbb{P}_n^* v(\theta, \eta_n) \quad \text{for any } \theta \in \Theta \text{ and some } \eta_n \in \mathcal{H}_n, \tag{40}$$

where $x \mapsto v(\theta,\eta)(x)$ is a measurable objective function. Let "$\lesssim$" and "$\gtrsim$" denote smaller than and greater than, respectively, up to a universal constant. We assume the following conditions hold for every $\delta > 0$:

$$E_X(v(\theta,\eta) - v(\theta,\eta_n)) \lesssim -d^2(\eta,\eta_n) + \|\theta-\theta_0\|^2, \tag{41}$$

$$E_X^o \sup_{\theta\in\Theta,\ \eta\in\mathcal{H}_n,\ \|\theta-\theta_0\|\le\delta,\ d(\eta,\eta_n)\le\delta} |\mathbb{G}_n(v(\theta,\eta) - v(\theta,\eta_n))| \lesssim \psi_n(\delta), \tag{42}$$

$$E_{XW}^o \sup_{\theta\in\Theta,\ \eta\in\mathcal{H}_n,\ \|\theta-\theta_0\|\le\delta,\ d(\eta,\eta_n)\le\delta} |\mathbb{G}_n^*(v(\theta,\eta) - v(\theta,\eta_n))| \lesssim \psi_n^*(\delta). \tag{43}$$

Here $d^2(\eta,\eta_n)$ may be thought of as the square of a distance, i.e. $\|\eta-\eta_n\|^2$, but our theorem is also true for an arbitrary function $\eta \mapsto d^2(\eta,\eta_n)$.

Theorem 5.

Suppose that Conditions (41)-(43) hold. We assume (42) ((43)) is valid for functions $\psi_n$ ($\psi_n^*$) such that $\delta \mapsto \psi_n(\delta)/\delta^\alpha$ ($\delta \mapsto \psi_n^*(\delta)/\delta^\alpha$) is decreasing for some $0 < \alpha < 2$. Then, for every $(\tilde\theta, \hat\eta^*_{\tilde\theta})$ satisfying $P(\tilde\theta \in \Theta, \hat\eta^*_{\tilde\theta} \in \mathcal{H}_n) \to 1$, we have

$$d(\hat\eta^*_{\tilde\theta}, \eta_n) \le O^o_{P_W}(\delta_n \vee \|\tilde\theta - \theta_0\|)$$

in $P_X^o$-probability, for any sequence of positive numbers $\delta_n$ satisfying both $\psi_n(\delta_n) \le \sqrt{n}\delta_n^2$ and $\psi_n^*(\delta_n) \le \sqrt{n}\delta_n^2$ for every $n$.

In applications of Theorem 5, the parameter $\eta_n$ is taken to be some element in $\mathcal{H}_n$ that is very close to $\eta_0$. When $\mathcal{H}_n = \mathcal{H}$, a natural choice for $\eta_n$ is $\eta_0$ and we can directly use Theorem 5 to derive the convergence rate $d(\hat\eta^*_{\tilde\theta}, \eta_0)$, as shown in the examples of Section 5. In general, $\eta_n$ may be taken as the maximizer of the mapping $\eta \mapsto P_X v(\theta_0,\eta)$ over $\mathcal{H}_n$, the projection of $\eta_0$ onto $\mathcal{H}_n$. Then we need to consider the approximation rate of the sieve space $\mathcal{H}_n$ to $\mathcal{H}$, i.e. $d(\eta_n,\eta_0)$, since $d(\hat\eta^*_{\tilde\theta}, \eta_0) \le d(\hat\eta^*_{\tilde\theta}, \eta_n) + d(\eta_n,\eta_0)$. The approximation rate $d(\eta_n,\eta_0)$ depends on the choice of sieves and is usually derived in the mathematical literature. Here, we skip further discussion and refer readers to [4]. Now we discuss verification of the nontrivial conditions (41)–(43). The smoothness condition for $v(\theta,\eta)$, i.e. (41), is implied by


$$E_X(v(\theta,\eta) - v(\theta_0,\eta_n)) \lesssim -d^2(\eta,\eta_n) - \|\theta-\theta_0\|^2, \tag{44}$$

$$E_X(v(\theta,\eta_n) - v(\theta_0,\eta_n)) \gtrsim -\|\theta-\theta_0\|^2. \tag{45}$$

The two conditions depict the quadratic behaviors of the criterion functions $(\theta,\eta) \mapsto E_X v(\theta,\eta)$ and $\theta \mapsto E_X v(\theta,\eta_n)$ around the maximum points $(\theta_0,\eta_n)$ and $\theta_0$, respectively. We next present one useful lemma for verifying the continuity modulus of the empirical process and its bootstrapped version, i.e. (42) and (43). Denote

$$\mathcal{V}_\delta = \{x \mapsto [v(\theta,\eta)(x) - v(\theta,\eta_n)(x)] : d(\eta,\eta_n) \le \delta,\ \|\theta-\theta_0\| \le \delta\}, \tag{46}$$

and define the bracketing entropy integral of $\mathcal{V}_\delta$ as

$$K(\delta, \mathcal{V}_\delta, L_2(P_X)) = \int_0^\delta \sqrt{1 + \log N_{[\,]}(\epsilon, \mathcal{V}_\delta, L_2(P_X))}\, d\epsilon, \tag{47}$$

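where $\log N_{[\,]}(\delta, \mathcal{A}, d)$ is the $\delta$-bracketing entropy number for the class $\mathcal{A}$ under the distance measure $d$.

Lemma 1. Suppose that the functions $(x,\theta,\eta) \mapsto v_{\theta,\eta}(x)$ are uniformly bounded for $(\theta,\eta)$ ranging over some neighborhood of $(\theta_0,\eta_n)$ and that

$$E_X(v_{\theta,\eta} - v_{\theta,\eta_n})^2 \lesssim d^2(\eta,\eta_n) + \|\theta-\theta_0\|^2. \tag{48}$$

Then Condition (42) is satisfied for any functions $\psi_n$ such that

$$\psi_n(\delta) \ge K(\delta, \mathcal{V}_\delta, L_2(P_X)) \left(1 + \frac{K(\delta, \mathcal{V}_\delta, L_2(P_X))}{\delta^2\sqrt{n}}\right). \tag{49}$$

Let $V_n(X)$ be the envelope function of the class $\mathcal{V}_{\delta_n}$. If we further assume that, for each sequence $\delta_n \to 0$, the envelope functions $V_n$ satisfy

$$\lim_{\lambda\to\infty} \limsup_{n\to\infty} \sup_{t\ge\lambda} t^2 P_X^o(V_n(X_1) > t) = 0, \tag{50}$$

then Condition (43) is satisfied for any functions $\psi_n^*$ such that

$$\psi_n^*(\delta) \ge K(\delta, \mathcal{V}_\delta, L_2(P_X)) \left(1 + \frac{K(\delta, \mathcal{V}_\delta, L_2(P_X))}{\delta^2\sqrt{n}}\right). \tag{51}$$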

Remark 5. Note that the inequalities $\psi_n(\delta) \lesssim \sqrt{n}\delta^2$ and $\psi_n^*(\delta) \lesssim \sqrt{n}\delta^2$ are equivalent to $K(\delta, \mathcal{V}_\delta, L_2(P_X)) \lesssim \sqrt{n}\delta^2$ when we let $\psi_n$ and $\psi_n^*$ be equal to the right hand sides of (49) and (51), respectively. Consequently, the convergence rate of $\hat\eta^*_{\tilde\theta}$ calculated in Theorem 5, i.e. $\delta_n$, is determined by the bracketing entropy integral of $\mathcal{V}_{\delta_n}$.
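As a worked illustration of how the entropy integral determines $\delta_n$, suppose (an assumption made here only for illustration) that the bracketing entropy obeys $\log N_{[\,]}(\epsilon, \mathcal{V}_\delta, L_2(P_X)) \lesssim \epsilon^{-1/k}$ for some $k > 1/2$. Using $\sqrt{1+a} \le 1 + \sqrt{a}$, the requirement $K(\delta_n, \mathcal{V}_{\delta_n}, L_2(P_X)) \lesssim \sqrt{n}\delta_n^2$ can then be solved explicitly:

```latex
\begin{align*}
K(\delta, \mathcal{V}_\delta, L_2(P_X))
  &\lesssim \int_0^{\delta} \bigl(1 + \epsilon^{-1/(2k)}\bigr)\, d\epsilon
   \;\lesssim\; \delta^{\,1 - 1/(2k)} \qquad (0 < \delta \le 1), \\
\delta_n^{\,1 - 1/(2k)} \lesssim \sqrt{n}\,\delta_n^{2}
  &\iff \delta_n^{\,(2k+1)/(2k)} \gtrsim n^{-1/2}
  \iff \delta_n \gtrsim n^{-k/(2k+1)} .
\end{align*}
```

Under this assumed entropy bound, the smallest admissible sequence is $\delta_n \asymp n^{-k/(2k+1)}$. The choice $k = 1$ gives $n^{-1/3}$, the rate appearing in Section 5.2, and general $k$ gives the rate $n^{-k/(2k+1)}$ appearing in Section 5.3.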

Remark 6. The assumptions of Lemma 1 can be relaxed to a great extent. For example, we can drop the uniform boundedness condition on the class of functions $v(\theta,\eta)$ by using the "Bernstein norm", i.e. $\|f\|_{P,B} = (2P(e^{|f|} - 1 - |f|))^{1/2}$, instead of the $L_2$-norm. In some cases, the bracketing entropy integral diverges at zero. Then we can replace (47) by the integral

$$K(\delta, \mathcal{V}_\delta, L_2(P_X)) = \int_{c\delta^2 \wedge \delta/3}^{\delta} \sqrt{1 + \log N_{[\,]}(\epsilon, \mathcal{V}_\delta, L_2(P_X))}\, d\epsilon$$

for some small positive constant $c$; see Lemma 3.4.3 and page 326 in [34].

5. Examples.

In this section, we apply the main results in Section 2 to justify the bootstrap validity of drawing semiparametric inferences in three examples of semiparametric models. In the first two examples, we use the log-likelihood as the criterion function, while in the last example, the least squares criterion is used. The M-estimates of the nuisance functional parameters have different convergence rates in these examples. This section also illustrates how to verify the technical conditions used in the general results.

5.1. Cox Regression Model with Right Censored Data. In the Cox regression model, the hazard function of the survival time $T$ of a subject with covariate $Z$ is modelled as

$$\lambda(t|z) \equiv \lim_{\Delta\to 0} \frac{1}{\Delta} P(t \le T < t+\Delta \mid T \ge t, Z = z) = \lambda(t)\exp(\theta' z), \tag{52}$$

where $\lambda$ is an unspecified baseline hazard function and $\theta$ is a regression vector. In this model, we are usually interested in $\theta$ while treating the cumulative hazard function $\eta(y) = \int_0^y \lambda(t)\,dt$ as the nuisance parameter. The MLE for $\theta$ is proven to be semiparametric efficient and is widely used in applications. Here we consider bootstrapping $\hat\theta$, which corresponds to treating the log-likelihood as the criterion function $m(\theta,\eta)$ in our general formulation.

With right censoring of survival time, the data observed is $X = (Y, \delta, Z)$, where $Y = T \wedge C$, $C$ is a censoring time, $\delta = I\{T \le C\}$, and $Z$ is a regression covariate belonging to a compact set $\mathcal{Z} \subset \mathbb{R}^d$. We assume that $C$ is independent of $T$ given $Z$. The log-likelihood is obtained as

$$m(\theta,\eta) = \delta\theta' z - \exp(\theta' z)\eta(y) + \delta\log\eta\{y\}, \tag{53}$$

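where $\eta\{y\} = \eta(y) - \eta(y-)$ is a point mass that denotes the jump of $\eta$ at point $y$. The parameter space $\mathcal{H}$ is restricted to a set of nondecreasing cadlag functions on the interval $[0,\tau]$ with $\eta(\tau) \le M$ for some constant $M$. By some algebra, we have

$$\begin{aligned}
\tilde m(\theta,\eta)(x) &= m_1(\theta,\eta) - m_2(\theta,\eta)[H^\dagger(\theta,\eta)] \\
&= \left\{\delta z - z\exp(\theta' z)\eta(y)\right\} - \left\{\delta H^\dagger(\theta,\eta)(y) - \exp(\theta' z)\int_0^y H^\dagger(\theta,\eta)(u)\,d\eta(u)\right\},
\end{aligned}$$

where

$$H^\dagger(\theta,\eta)(y) = \frac{E_{\theta,\eta}\, Z\exp(\theta' Z) 1\{Y \ge y\}}{E_{\theta,\eta} \exp(\theta' Z) 1\{Y \ge y\}}.$$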
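For a fixed $\theta$, the bootstrap estimate of the cumulative hazard maximizes the weighted log-likelihood built from (53). The sketch below is our own illustration, not code from the paper: it assumes the convention $\mathbb{P}_n^* f = n^{-1}\sum_i W_{ni} f(X_i)$, no tied observation times, and it ignores the bound $\eta(\tau) \le M$. Under these assumptions, setting the derivative with respect to each jump $\eta\{Y_i\}$ to zero gives a weighted Breslow-type estimator, $\eta\{Y_i\} = W_{ni}\delta_i / \sum_j W_{nj}\exp(\theta' Z_j)1\{Y_j \ge Y_i\}$. The helper names are hypothetical.

```python
import numpy as np


def weighted_breslow_jumps(theta, y, delta, z, w):
    """Jumps eta{Y_i} maximising the weighted version of (53) for fixed theta.

    Hypothetical helper: assumes no ties among the observed times y and
    ignores the constraint eta(tau) <= M.
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    w = np.asarray(w, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(y), -1)
    theta = np.atleast_1d(np.asarray(theta, dtype=float))

    risk = w * np.exp(z @ theta)                        # W_j * exp(theta' z_j)
    at_risk = (y[None, :] >= y[:, None]).astype(float)  # [i, j] = 1{Y_j >= Y_i}
    denom = at_risk @ risk                              # weighted risk-set sums
    num = w * delta
    # Return a zero jump wherever the weighted risk set is empty.
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)


def cumulative_hazard(t, y, jumps):
    """eta(t): sum of the jumps at observed times y_i <= t."""
    return float(np.sum(np.asarray(jumps)[np.asarray(y) <= t]))


y = np.array([1.0, 2.0, 3.0])
delta = np.array([1, 1, 1])
z = np.zeros((3, 1))
# With unit weights and theta = 0 this is the Nelson-Aalen estimator.
jumps = weighted_breslow_jumps([0.0], y, delta, z, np.ones(3))
eta_at_2 = cumulative_hazard(2.0, y, jumps)
```

With all weights equal to one and $\theta = 0$ the jumps are the reciprocals of the risk-set sizes, which gives a quick sanity check of the formula.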

Conditions I and S1-S3, which guarantee the asymptotic normality of $\hat\theta$, have been verified in [6]. In particular, the convergence rate of the estimated nuisance parameter is established in Theorem 3.1 of [27], i.e.

$$\|\hat\eta_{\tilde\theta_n} - \eta_0\|_\infty = O_{P_X}(n^{-1/2} + \|\tilde\theta_n - \theta_0\|), \tag{54}$$

where $\|\cdot\|_\infty$ denotes the supremum norm. We next verify the bootstrap consistency conditions, i.e., SB1-SB3. Condition SB1 trivially holds since it is easy to show that $\eta \mapsto \tilde m(\theta_0,\eta)$ has bounded Fréchet derivative around $\eta_0$. The P-Donsker condition SB2 has been verified when verifying (10) in Condition S1. Finally, we verify the bootstrap convergence rate condition $\|\hat\eta^*_{\tilde\theta} - \eta_0\|_\infty = O^o_{P_{XW}}(\|\tilde\theta - \theta_0\| \vee n^{-1/2})$ via Theorem 4. Since $\hat\eta^*_\theta$ maximizes $\mathbb{P}_n^* m(\theta,\eta)$ for fixed $\theta$, we set $k(\theta,\eta)[g] = m_2(\theta,\eta)[g]$ and have $U_n^*(\theta, \hat\eta^*_\theta)[g] = \mathbb{P}_n^* m_2(\theta, \hat\eta^*_\theta)[g] = 0$. The invertibility of $\dot U(0,\cdot)$ and the conditions (35) and (36) have been verified in [27] when they showed (54). Now we only need to consider the condition (37): for $n$ so large that $\delta_n \le R$,

$$\begin{aligned}
D_n(x) &\equiv \sup\left\{ \frac{|m_2(\theta,\eta)[g] - m_2(\theta_0,\eta_0)[g]|}{1 + \sqrt{n}(\|\theta-\theta_0\| + \|\eta-\eta_0\|_\infty)} : g \in \mathcal{G},\ \|\theta-\theta_0\| + \|\eta-\eta_0\|_\infty \le \delta_n \right\} \\
&\le 2\sup\left\{ |m_2(\theta,\eta)[g]| : g \in \mathcal{G},\ \|\theta-\theta_0\| + \|\eta-\eta_0\|_\infty \le R \right\} \\
&\le \text{some constant}.
\end{aligned}$$

The last inequality follows from the assumption that $\mathcal{G}$ is a class of functions of bounded total variation and the inequality $\int_0^y g(u)\,d\eta(u) \le \eta(\tau)\|g\|_{BV}$, where $\|g\|_{BV}$ is the total variation of the function $g$. Thus, the condition (37) holds trivially.

5.2. Cox Regression Model with Current Status Data. We next consider current status data, in which each subject is observed at a single examination time $C$ to determine whether an event has occurred. The event time $T$ cannot be known exactly. Then the observed data are $n$ i.i.d. realizations of $X = (C, \delta, Z) \in \mathbb{R}^+ \times \{0,1\} \times \mathcal{Z}$, where $\delta = I\{T \le C\}$ and $Z$ is a vector of covariates. The corresponding criterion function, i.e. the log-likelihood, is derived as

$$m(\theta,\eta) = \delta\log[1 - \exp(-\eta(c)\exp(\theta' z))] - (1-\delta)\exp(\theta' z)\eta(c). \tag{55}$$
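To make the criterion (55) concrete, the following minimal sketch evaluates it observation by observation and forms a weighted average of the kind maximized by a weighted bootstrap. It assumes the convention $\mathbb{P}_n^* m = n^{-1}\sum_i W_{ni}\, m(\theta,\eta)(X_i)$ and takes the values $\eta(c_i) > 0$ as given inputs; the function names are hypothetical and not part of the paper.

```python
import numpy as np


def current_status_loglik(theta, eta_c, delta, z):
    """Per-observation value of (55), given eta evaluated at each exam time.

    eta_c[i] stands for eta(c_i) and is assumed to be strictly positive.
    """
    eta_c = np.asarray(eta_c, dtype=float)
    delta = np.asarray(delta)
    z = np.asarray(z, dtype=float).reshape(len(eta_c), -1)
    theta = np.atleast_1d(np.asarray(theta, dtype=float))

    cum = eta_c * np.exp(z @ theta)        # eta(c) * exp(theta' z)
    log_event = np.log(-np.expm1(-cum))    # log(1 - exp(-cum)), computed stably
    # delta = 1 contributes log(1 - exp(-cum)); delta = 0 contributes -cum.
    return np.where(delta == 1, log_event, -cum)


def weighted_criterion(theta, eta_c, delta, z, w):
    """Assumed bootstrap objective: n^{-1} * sum_i W_ni * m(theta, eta)(X_i)."""
    m = current_status_loglik(theta, eta_c, delta, z)
    return float(np.dot(w, m) / len(m))


eta_c = np.array([np.log(2.0), np.log(2.0)])
delta = np.array([1, 0])
z = np.ones((2, 1))
values = current_status_loglik([0.0], eta_c, delta, z)
objective = weighted_criterion([0.0], eta_c, delta, z, np.ones(2))
```

In this toy input both observations contribute $-\log 2$: the event case through $\log(1 - e^{-\log 2}) = \log(1/2)$ and the non-event case through $-\eta(c) = -\log 2$.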

We make the following assumptions throughout the rest of this subsection: (i) $T$ and $C$ are independent given $Z$; (ii) the covariance of $Z - E(Z|C)$ is positive definite, which guarantees the efficient information to be positive definite; (iii) $C$ possesses a Lebesgue density which is continuous and positive on its support $[\sigma,\tau]$, for which the true nuisance parameter $\eta_0$ satisfies $\eta_0(\sigma-) > 0$ and $\eta_0(\tau) < M < \infty$, and this density is continuously differentiable on $[\sigma,\tau]$ with derivative bounded above and bounded below by zero. The form of $\tilde m(\theta,\eta)$ can be found in [7] as follows:

$$\tilde m(\theta,\eta) = m_1(\theta,\eta) - m_2(\theta,\eta)[H^\dagger(\theta,\eta)] = (z\eta(c) - H^\dagger(\theta,\eta)(c))\,Q(x;\theta,\eta),$$

where

$$Q(x;\theta,\eta) = e^{\theta' z}\left[\frac{\delta}{\exp(e^{\theta' z}\eta(c)) - 1} - (1-\delta)\right]$$

and the form of $H^\dagger(\theta,\eta)(c)$ is given in (4) of [7]. Conditions I and S1-S3 are verified in [7]. Conditions SB1 and SB2 can be checked similarly as in the previous example. Note that the convergence rate for the nuisance parameter becomes slower, i.e.

$$\|\hat\eta_{\tilde\theta_n} - \eta_0\|_2 = O_{P_X}(\|\tilde\theta_n - \theta_0\| + n^{-1/3}), \tag{56}$$

where $\|\cdot\|_2$ denotes the regular $L_2$-norm, due to the reduced information provided by current status data, as shown in [27]. By Theorem 5 we can show that the same convergence rate, i.e. $n^{-1/3}$, also holds for the bootstrap estimate of $\eta$, i.e. $\hat\eta^*_\theta$. The assumptions (41) and (42) in Theorem 5 are verified in [27] when showing (56). As for the last assumption (43), we apply Lemma 1. We show that Condition (50) on the envelope function $V_n(x)$ holds: for $n$ so large that $\delta_n \le R$,

$$\begin{aligned}
V_n(x) &\equiv \sup\{|m(\theta,\eta) - m(\theta,\eta_0)| : \|\eta-\eta_0\|_2 \le \delta_n,\ \|\theta-\theta_0\| \le \delta_n\} \\
&\le 2\sup\{|m(\theta,\eta)| : \|\eta-\eta_0\|_2 \le R,\ \|\theta-\theta_0\| \le R\} \\
&\le \text{some constant}.
\end{aligned}$$

5.3. Partly Linear Models. In this example, a continuous outcome variable $Y$, depending on the covariates $(W,Z) \in [0,1]^2$, is modelled as

$$Y = \theta W + f(Z) + \xi,$$

where $\xi$ is independent of $(W,Z)$ and $f$ is an unknown smooth function belonging to $\mathcal{H} \equiv \{f : [0,1] \mapsto [0,1],\ \int_0^1 (f^{(k)}(u))^2\,du \le M\}$ for a fixed $0 < M < \infty$. In addition we assume $E(\mathrm{Var}(W|Z))$ is positive definite and $E\{f(Z)\} = 0$. We want to estimate $(\theta, f)$ using the least squares criterion:

$$m(\theta,f) = -(y - \theta w - f(z))^2. \tag{57}$$
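The least squares criterion (57) lends itself to a compact numerical illustration of the weighted bootstrap for $\theta$. The sketch below is illustrative only and rests on several assumptions not made in the paper: $f$ is approximated by a fixed-degree polynomial in $z$ (a simple stand-in for the Sobolev-type class $\mathcal{H}$), the data are simulated with an arbitrary smooth $f$, Efron's multinomial weights are used, and a percentile interval serves as the bootstrap confidence set. The function name `fit_partly_linear` is hypothetical.

```python
import numpy as np


def fit_partly_linear(y, w, z, weights, degree=5):
    """Weighted least squares fit of Y = theta * W + f(Z) + xi.

    Illustrative assumption: f is approximated by a polynomial of the
    given degree in z. Returns (theta_hat, polynomial coefficients).
    """
    basis = np.vander(z, degree + 1, increasing=True)   # columns 1, z, ..., z^degree
    design = np.column_stack([w, basis])
    sw = np.sqrt(weights)
    # Minimising sum_i W_i * (y_i - design_i' beta)^2 is ordinary least
    # squares after scaling each row by sqrt(W_i).
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]


rng = np.random.default_rng(1)
n = 400
z = rng.uniform(size=n)
w = rng.uniform(size=n)
theta_true = 2.0                                          # simulated value
y = theta_true * w + np.sin(2 * np.pi * z) + 0.5 * rng.normal(size=n)

theta_hat, _ = fit_partly_linear(y, w, z, np.ones(n))

# Bootstrap replicates of theta under Efron's multinomial weights.
boot = np.array([
    fit_partly_linear(y, w, z, rng.multinomial(n, np.full(n, 1.0 / n)).astype(float))[0]
    for _ in range(200)
])
lower, upper = np.quantile(boot, [0.025, 0.975])
print(theta_hat, (lower, upper))
```

Only the Euclidean component is read off from each weighted fit, mirroring the focus of the theory on $\hat\theta^*$ while the estimate of $f$ is treated as a nuisance quantity.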

Note that the above model would be more flexible if we did not require knowledge of $M$. A sieve estimator could be obtained if we replaced $M$ with a sequence $M_n \to \infty$. The theory we develop in this paper will be applicable in this setting, but, in order to maintain clarity of exposition, we have elected not to pursue this more complicated situation here. An alternative approach is to use penalization, the study of which is beyond the scope of the present paper. Simple calculations give

$$\tilde m(\theta,\eta)(x) = m_1(\theta,\eta) - m_2(\theta,\eta)[H^\dagger(\theta,\eta)] = 2(y - \theta w - f(z))(w - H^\dagger(\theta,\eta)(z)),$$

where

$$H^\dagger(\theta,\eta)(z) = \frac{E_{\theta,\eta}(W(Y - \theta W - f(Z))^2 \mid Z = z)}{E_{\theta,\eta}((Y - \theta W - f(Z))^2 \mid Z = z)}.$$

The finite variance condition I follows from $E[W\{W - H^\dagger(\theta_0,\eta_0)(Z)\}] > 0$. The distribution of $\xi$ is assumed to have finite second moment and satisfy (6), e.g. $\xi \sim N(0,1)$. Conditions S1–S3 and SB2 can be verified using arguments similar to those in Example 3 of [7]; in particular, $\|\hat f_{\tilde\theta} - f_0\|_2 = O_{P_X}(\|\tilde\theta - \theta_0\| \vee n^{-k/(2k+1)})$ in (12). It is easy to show that the Fréchet derivative of $\eta \mapsto \tilde m(\theta_0,\eta)$ is bounded around $\eta_0$, and thus the tail condition SB1 holds. To prove $\|\hat f^*_{\tilde\theta} - f_0\|_2 = O^o_{P_{XW}}(\|\tilde\theta - \theta_0\| \vee n^{-k/(2k+1)})$ via Theorem 5, we proceed as in the previous example, checking the assumption (50) using similar arguments, i.e. $V_n(x)$ is uniformly bounded.

6. Proof of Main Results.

6.1. Proof of Theorem 1 (Semiparametric M-estimate Theorem). By Conditions S1, S3 and the consistency of $\hat\theta$, we have

$$\mathbb{G}_n(\hat m - \tilde m_0) = O^o_{P_X}(\|\hat\theta - \theta_0\| \vee \|\hat\eta - \eta_0\|) = O^o_{P_X}(\|\hat\theta - \theta_0\| \vee n^{-\gamma}).$$

Considering (5), we can further simplify the above equation to

$$\sqrt{n}P_X(\hat m - \tilde m_0) = -\sqrt{n}\,\mathbb{P}_n \tilde m_0 + O^o_{P_X}(\|\hat\theta - \theta_0\| \vee n^{-\gamma}). \tag{58}$$

By Conditions S2-S3 we can show that the left hand side of (58) equals

$$\sqrt{n}P_X(m_{11}(\theta_0,\eta_0) - m_{21}(\theta_0,\eta_0)[H_0^\dagger])(\hat\theta - \theta_0) + O^o_{P_X}(\sqrt{n}\|\hat\theta - \theta_0\|^2 \vee n^{1/2-2\gamma}).$$

Thus (58) becomes

$$\begin{aligned}
&\sqrt{n}P_X(m_{11}(\theta_0,\eta_0) - m_{21}(\theta_0,\eta_0)[H_0^\dagger])(\hat\theta - \theta_0) \\
&\quad= -\sqrt{n}\,\mathbb{P}_n \tilde m_0 + O^o_{P_X}(\|\hat\theta - \theta_0\| \vee n^{-\gamma}) + O^o_{P_X}(\sqrt{n}\|\hat\theta - \theta_0\|^2 \vee n^{1/2-2\gamma}) \\
&\quad= -\sqrt{n}\,\mathbb{P}_n \tilde m_0 + O^o_{P_X}(n^{1/2-2\gamma} \vee \|\hat\theta - \theta_0\| \vee \sqrt{n}\|\hat\theta - \theta_0\|^2)
\end{aligned} \tag{59}$$

$$\quad= O^o_{P_X}(1 \vee \sqrt{n}\|\hat\theta - \theta_0\|^2), \tag{60}$$

where the second equality follows from the range of $\gamma$ and the third equality holds by applying the CLT to $\tilde m_0$. Due to the consistency of $\hat\theta$ and Condition I, we can conclude that $\|\hat\theta - \theta_0\| = O^o_{P_X}(n^{-1/2})$ based on (60), and thus $\|\hat\eta - \eta_0\| = O^o_{P_X}(n^{-\gamma})$. Plugging the convergence rates for $\hat\theta$ and $\hat\eta$ back into (59), we complete the proof of (13). The standard CLT and (6) imply (14).

6.2. Proof of Theorem 2 (Bootstrap Consistency Theorem). To prove Theorem 2, we need the following lemma, whose proof is given in the Appendix.

Lemma 2. Under the assumptions in Theorem 2, we have

$$\mathbb{G}_n^*(\tilde m(\theta,\eta) - \tilde m(\theta_0,\eta_0)) = O^o_{P_W}(\|\theta-\theta_0\| \vee \|\eta-\eta_0\|) \tag{61}$$

in $P_X^o$-probability for $(\theta,\eta) \in C_n$.

We shall repeatedly use Lemma A.1 in the Appendix, which concerns the transition of stochastic orders among different probability spaces. We first prove (24). Recall that $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P_X)$ and $\mathbb{G}_n^* = \sqrt{n}(\mathbb{P}_n^* - \mathbb{P}_n)$. Define $\hat m^*$ as $\tilde m(\hat\theta^*, \hat\eta^*)$. By some algebra, we have

$$\sqrt{n}P_X(\hat m^* - \tilde m_0) + \mathbb{G}_n^*\tilde m_0 + \mathbb{G}_n\tilde m_0 = \mathbb{G}_n^*(\tilde m_0 - \hat m^*) + \mathbb{G}_n(\tilde m_0 - \hat m^*) + \sqrt{n}\,\mathbb{P}_n^*\hat m^*,$$

since $P_X\tilde m_0 = 0$. Thus we have the following inequality:

$$\begin{aligned}
\|\sqrt{n}P_X(\hat m^* - \tilde m_0)\| &\le \|\mathbb{G}_n^*\tilde m_0\| + \|\mathbb{G}_n\tilde m_0\| + \|\mathbb{G}_n^*(\tilde m_0 - \hat m^*)\| + \|\mathbb{G}_n(\tilde m_0 - \hat m^*)\| + \|\sqrt{n}\,\mathbb{P}_n^*\hat m^*\| \\
&\equiv L_1 + L_2 + L_3 + L_4 + L_5.
\end{aligned} \tag{62}$$

Based on Theorem 2.2 in [29], we have $L_1 = O^o_{P_W}(1)$ in $P_X^o$-probability. The CLT implies $L_2 = O^o_{P_X}(1)$. We next consider $L_3$ and $L_4$. By Condition SB3, we can show that $\|\hat\eta^* - \eta_0\| = o^o_{P_W}(1)$ in $P_X^o$-probability since $\hat\theta^*$ is assumed to be consistent, i.e. $\|\hat\theta^* - \theta_0\| = o^o_{P_W}(1)$ in $P_X^o$-probability, by (A.1) and (A.5) in Lemma A.1. Then, we have $L_3 = o^o_{P_W}(1)$ in $P_X^o$-probability based on Lemma 2 and (A.5) in Lemma A.1. Next, we obtain that $L_4 = o^o_{P_W}(1)$ in $P_X^o$-probability based on Condition S1 and (A.3) in Lemma A.1. Finally, $L_5 = 0$ based on (18). In summary, (62) can be rewritten as

$$\|\sqrt{n}P_X(\hat m^* - \tilde m_0)\| \le O^o_{P_W}(1) + O^o_{P_X}(1) \tag{63}$$

in $P_X^o$-probability. Let $\alpha_n = \|\hat\theta^* - \theta_0\|$. Combining (11) with (63) and noticing (22), we have

$$\sqrt{n}\|A\alpha_n\| \le O^o_{P_W}(1) + O^o_{P_X}(1) + O^o_{P_W}\left(\sqrt{n}\alpha_n^2 \vee n^{-2\gamma+1/2}\right) \tag{64}$$

in $P_X^o$-probability. By considering the consistency of $\hat\theta^*$ and Condition I, we complete the proof of (24) based on (64). We next prove (25). Write

$$\begin{aligned}
I_1 &= -\mathbb{G}_n^*(\hat m^* - \tilde m_0) = \sqrt{n}(\mathbb{P}_n^* - \mathbb{P}_n)(\tilde m_0 - \hat m^*), \\
I_2 &= \mathbb{G}_n(\hat m - \tilde m_0) = \sqrt{n}(\mathbb{P}_n - P_X)(\hat m - \tilde m_0), \\
I_3 &= -\mathbb{G}_n(\hat m^* - \tilde m_0) = \sqrt{n}(\mathbb{P}_n - P_X)(\tilde m_0 - \hat m^*), \\
I_4 &= \sqrt{n}\,\mathbb{P}_n^*\hat m^* - \sqrt{n}\,\mathbb{P}_n\hat m.
\end{aligned}$$

By some algebra, we know that

$$\sqrt{n}P_X(\hat m^* - \hat m) + \mathbb{G}_n^*\tilde m_0 = \sum_{j=1}^4 I_j.$$

By the definition (20), we can show that $A_n \times B_n = O^o_{P_W}(1)$ in $P_X^o$-probability if $A_n$ and $B_n$ are both of the order $O^o_{P_W}(1)$ in $P_X^o$-probability. Then the root-n consistency of $\hat\theta^*$ proven in (24) together with SB3 implies

$$\|\hat\eta^* - \eta_0\| \vee \|\hat\theta^* - \theta_0\| = O^o_{P_W}(n^{-\gamma}) \tag{65}$$

in $P_X^o$-probability. Thus, by Lemma 2, we know $I_1 = O^o_{P_W}(n^{-\gamma})$ in $P_X^o$-probability. Note that (9) and (10) of Condition S1 imply

$$\mathbb{G}_n(\tilde m(\theta,\eta) - \tilde m_0) = O^o_{P_X}(\|\theta-\theta_0\| \vee \|\eta-\eta_0\|) \tag{66}$$

for $(\theta,\eta)$ in the neighborhood of $(\theta_0,\eta_0)$. Considering (66), Condition S3 and Theorem 1, we have $I_2 = O^o_{P_X}(n^{-\gamma})$. By (65), (66) and (A.4), we know the order of $I_3$ is $O^o_{P_W}(n^{-\gamma})$ in $P_X^o$-probability. $I_4 = 0$ by (5) and (18). Therefore, we have established

$$\sqrt{n}P_X(\hat m^* - \hat m) = -\mathbb{G}_n^*\tilde m_0 + O^o_{P_X}(n^{-\gamma}) + O^o_{P_W}(n^{-\gamma}) \tag{67}$$

in $P_X^o$-probability. To analyze the left hand side of (67), we rewrite it as $\sqrt{n}P_X(\hat m^* - \tilde m_0) - \sqrt{n}P_X(\hat m - \tilde m_0)$. Applying Condition S2 to both components, we obtain

$$\begin{aligned}
&\sqrt{n}P_X(m_{11}(\theta_0,\eta_0) - m_{21}(\theta_0,\eta_0)[H_0^\dagger])(\hat\theta^* - \hat\theta) \\
&\quad= -\mathbb{G}_n^*\tilde m_0 + O^o_{P_X}(n^{-\gamma}) + O^o_{P_W}(n^{-\gamma}) + O^o_{P_X}(n^{1/2-2\gamma}) + O^o_{P_W}(n^{1/2-2\gamma}) \\
&\quad= -\mathbb{G}_n^*\tilde m_0 + O^o_{P_X}(n^{1/2-2\gamma}) + O^o_{P_W}(n^{1/2-2\gamma})
\end{aligned}$$

in $P_X^o$-probability, by considering Conditions S3, SB3 and the range of $\gamma$. By considering Condition I and (A.2), we complete the proof of (25). The proof of (26) follows from Theorem 2.2 in [29].

APPENDIX

A.1. Measurability and Stochastic Orders.

Measurability Condition $M(P)$. We say that a class of random functions $\mathcal{F} \in M(P)$ if $\mathcal{F}$ is nearly linearly deviation measurable for $P$ and both $\mathcal{F}^2$ and $\mathcal{F}'^2$ are nearly linearly supremum measurable for $P$. Here $\mathcal{F}^2$ and $\mathcal{F}'^2$ denote the classes of squared functions and squared differences of functions from $\mathcal{F}$, respectively. It is known that if $\mathcal{F}$ is countable, or if $\{\mathbb{P}_n\}_{n=1}^\infty$ are stochastically separable in $\mathcal{F}$, or if $\mathcal{F}$ is image admissible Suslin [10], then $\mathcal{F} \in M(P)$. More precise descriptions can be found on pages 853-854 of [12].

The following lemma is very important since it accurately describes the transition of stochastic orders among different probability spaces. We implicitly assume the random quantities in Lemma A.1 possess enough measurability so that the usual Fubini theorem can be used freely.

Lemma A.1. Suppose that $Q_n = o^o_{P_W}(1)$ in $P_X^o$-probability and $R_n = O^o_{P_W}(1)$ in $P_X^o$-probability. We have

$$A_n = o^o_{P_{XW}}(1) \iff A_n = o^o_{P_W}(1) \text{ in } P_X^o\text{-probability}, \tag{A.1}$$

$$B_n = O^o_{P_{XW}}(1) \iff B_n = O^o_{P_W}(1) \text{ in } P_X^o\text{-probability}, \tag{A.2}$$

$$C_n = Q_n \times O^o_{P_X}(1) \implies C_n = o^o_{P_W}(1) \text{ in } P_X^o\text{-probability}, \tag{A.3}$$

$$D_n = R_n \times O^o_{P_X}(1) \implies D_n = O^o_{P_W}(1) \text{ in } P_X^o\text{-probability}, \tag{A.4}$$

$$E_n = Q_n \times R_n \implies E_n = o^o_{P_W}(1) \text{ in } P_X^o\text{-probability}. \tag{A.5}$$

Proof: To verify (A.1), we have for every $\epsilon, \nu > 0$,

$$P_X^o\left\{P^o_{W|X}(|A_n| \ge \epsilon) \ge \nu\right\} \le \frac{1}{\nu} E_X^o P^o_{W|X}(|A_n| \ge \epsilon) \le \frac{1}{\nu} E_X^o E^o_{W|X} 1\{|A_n| \ge \epsilon\} \tag{A.6}$$

based on Markov's inequality. Based on Lemmas 6.5 and 6.14 in [19], we have $E_X^o E^o_{W|X} 1\{|A_n| \ge \epsilon\} \le E^o_{XW} 1\{|A_n| \ge \epsilon\} = P^o_{XW}(|A_n| \ge \epsilon)$, and thus

$$P_X^o\left\{P^o_{W|X}(|A_n| \ge \epsilon) \ge \nu\right\} \le \frac{1}{\nu} P^o_{XW}(|A_n| \ge \epsilon). \tag{A.7}$$

From (A.7), we can conclude that if $A_n = o^o_{P_{XW}}(1)$, then $A_n = o^o_{P_W}(1)$ in $P_X^o$-probability. The other direction of (A.1) follows from the following inequalities: for any $\epsilon, \eta > 0$,

$$\begin{aligned}
P^o_{XW}(|A_n| \ge \epsilon) &= E_X^o\{P^o_{W|X}(|A_n| \ge \epsilon)\} \\
&= E_X^o\{P^o_{W|X}(|A_n| \ge \epsilon)\, 1\{P^o_{W|X}(|A_n| \ge \epsilon) \ge \eta\}\} + E_X^o\{P^o_{W|X}(|A_n| \ge \epsilon)\, 1\{P^o_{W|X}(|A_n| \ge \epsilon) < \eta\}\} \\
&\le E_X^o\{1\{P^o_{W|X}(|A_n| \ge \epsilon) \ge \eta\}\} + \eta \\
&\le P_X^o\{P^o_{W|X}(|A_n| \ge \epsilon) \ge \eta\} + \eta.
\end{aligned} \tag{A.8}$$

Note that the first term in (A.8) can be made arbitrarily small by the assumption that $A_n = o^o_{P_W}(1)$ in $P_X^o$-probability. Since $\eta$ can be chosen arbitrarily small, we can show $\lim_{n\to\infty} P^o_{XW}(|A_n| \ge \epsilon) = 0$ for any $\epsilon > 0$. This completes the proof of (A.1). (A.2) can be shown similarly by using the inequalities (A.6) and (A.8).

As for (A.3), we establish the following inequalities:

$$\begin{aligned}
&P_X^o\left\{P^o_{W|X}(|Q_n \times O^o_{P_X}(1)| \ge \epsilon) \ge \nu\right\} \\
&\quad\le P_X^o\left\{P^o_{W|X}(|Q_n| \ge \epsilon/|O^o_{P_X}(1)|) \ge \nu\right\} \\
&\quad\le P_X^o\left\{P^o_{W|X}(|Q_n| \ge \epsilon/M) + P^o_{W|X}(|O^o_{P_X}(1)| \ge M) \ge \nu\right\} \\
&\quad\le P_X^o\left\{P^o_{W|X}(|Q_n| \ge \epsilon/M) \ge \nu/2\right\} + P_X^o\left\{P^o_{W|X}(|O^o_{P_X}(1)| \ge M) \ge \nu/2\right\} \\
&\quad\le P_X^o\left\{P^o_{W|X}(|Q_n| \ge \epsilon/M) \ge \nu/2\right\} + \frac{2}{\nu} P_X^o(|O^o_{P_X}(1)| \ge M)
\end{aligned}$$

for any $\epsilon, \nu, M > 0$. Since $M$ can be chosen arbitrarily large, we can show (A.3) by considering the definition of $O^o_{P_X}(1)$. The proof of (A.4) is similar, using the above set of inequalities. The proof of (A.3) can be carried over to that of (A.5). Similarly, we establish the following inequalities:

$$P_X^o\left\{P^o_{W|X}(|Q_n \times R_n| \ge \epsilon) \ge \eta\right\} \le P_X^o\left\{P^o_{W|X}(|Q_n| \ge \epsilon/M) \ge \eta/2\right\} + P_X^o\left\{P^o_{W|X}(|R_n| \ge M) \ge \eta/2\right\}$$

for any $\epsilon, \eta, M > 0$. Then by selecting sufficiently large $M$, we can show that $P_X^o\{P^o_{W|X}(|Q_n \times R_n| \ge \epsilon) \ge \eta\} \to 0$ as $n \to \infty$ for any $\epsilon, \eta > 0$. $\Box$

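A.2. Two useful inequalities. Here we give two key technical tools, the Multiplier Inequality and the Hoffmann-Jorgensen Inequality, used in proving Lemmas 1 and 2.

Multiplier Inequality (Lemma 4.1 of [37]). Let $W_n = (W_{n1}, \ldots, W_{nn})'$ be non-negative exchangeable random variables on $(\mathcal{W}, \Omega, P_W)$ such that for every $n$

$$R_n = \int_0^\infty \sqrt{P_W(W_{n1} \ge u)}\, du < \infty.$$

Let $Z_{ni}$, $i = 1, 2, \ldots, n$, be i.i.d. random elements in $(\mathcal{X}^\infty, \mathcal{A}^\infty, P_X^\infty)$ with values in $\ell^\infty(\mathcal{F}_n)$, and write $\|\cdot\|_n = \sup_{f\in\mathcal{F}_n}|Z_{ni}(f)|$. It is assumed that the $Z_{ni}$'s are independent of $W_n$. Then for any $n_0$ such that $1 \le n_0 < \infty$ and any $n > n_0$, the following inequality holds:

$$E^o_{XW}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^n W_{ni}Z_{ni}\right\|_n \le n_0 E_X^o\|Z_{n1}\|_n \cdot \frac{E_W(\max_{1\le i\le n} W_{ni})}{\sqrt{n}} + R_n \cdot \max_{n_0 < i \le n} E_X^o\left\|\frac{1}{\sqrt{i}}\sum_{j=n_0+1}^{i} Z_{nj}\right\|_n. \tag{A.9}$$

[...] $> 0$. Obviously, the event $\{\tilde\theta \in \Theta, \hat\eta^*_{\tilde\theta} \in \mathcal{H}_n : d(\hat\eta^*_{\tilde\theta}, \eta_n) \ge 2^M(\delta_n \vee \|\tilde\theta - \theta_0\|)\}$ is contained in the union of the events $\{(\tilde\theta, \hat\eta^*_{\tilde\theta}) \in S_{n,j,M}\}$ for $j \ge M$. Thus, we have

$$\begin{aligned}
&P^o_{XW}\left(d(\hat\eta^*_{\tilde\theta}, \eta_n) \ge 2^M(\delta_n \vee \|\tilde\theta - \theta_0\|),\ \tilde\theta \in \Theta,\ \hat\eta^*_{\tilde\theta} \in \mathcal{H}_n\right) \\
&\quad\le \sum_{j\ge M} P^o_{XW}\left((\tilde\theta, \hat\eta^*_{\tilde\theta}) \in S_{n,j,M}\right) \\
&\quad\le \sum_{j\ge M} P^o_{XW}\left(\sup_{(\theta,\eta)\in S_{n,j,M}} \mathbb{P}_n^*(v(\theta,\eta) - v(\theta,\eta_n)) \ge 0\right).
\end{aligned}$$

The second inequality follows from the definition of $\hat\eta^*_{\tilde\theta}$. By the smoothness condition on $v(\theta,\eta)$, i.e. (41), we have the following inequality when $(\theta,\eta) \in S_{n,j,M}$ for $j \ge M$:

$$P_X(v(\theta,\eta) - v(\theta,\eta_n)) \lesssim -d(\eta,\eta_n)^2 + \|\theta-\theta_0\|^2 \lesssim -2^{2j-2}\delta_n^2 \tag{A.23}$$

for sufficiently large $M$. Considering (A.23), we have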

$$\begin{aligned}
&P^o_{XW}\left(d(\hat\eta^*_{\tilde\theta}, \eta_n) \ge 2^M(\delta_n \vee \|\tilde\theta - \theta_0\|),\ \tilde\theta \in \Theta,\ \hat\eta^*_{\tilde\theta} \in \mathcal{H}_n\right) \\
&\quad\le \sum_{j\ge M} P^o_{XW}\left(\sup_{(\theta,\eta)\in S_{n,j,M}} \mathbb{P}_n^*(v(\theta,\eta) - v(\theta,\eta_n)) \ge 0\right) \\
&\quad\le \sum_{j\ge M} P^o_{XW}\left(\sup_{(\theta,\eta)\in S_{n,j,M}} |\mathbb{G}_n^*(v(\theta,\eta) - v(\theta,\eta_n))| \gtrsim \sqrt{n}\,2^{2j-3}\delta_n^2\right) \\
&\qquad+ \sum_{j\ge M} P_X^o\left(\sup_{(\theta,\eta)\in S_{n,j,M}} |\mathbb{G}_n(v(\theta,\eta) - v(\theta,\eta_n))| \gtrsim \sqrt{n}\,2^{2j-3}\delta_n^2\right) \\
&\quad\lesssim \sum_{j\ge M} \frac{\psi_n^*(2^j\delta_n)}{\sqrt{n}\delta_n^2 2^{2j}} + \sum_{j\ge M} \frac{\psi_n(2^j\delta_n)}{\sqrt{n}\delta_n^2 2^{2j}} \\
&\quad\lesssim \sum_{j\ge M} 2^{j(\alpha-2)},
\end{aligned}$$

where the third inequality follows from the Markov inequality and (42)-(43). Note that the assumption that $\delta \mapsto \psi_n(\delta)/\delta^\alpha$ ($\delta \mapsto \psi_n^*(\delta)/\delta^\alpha$) is decreasing for some $0 < \alpha < 2$ implies that $\psi_n(c\delta) \le c^\alpha\psi_n(\delta)$ for every $c > 1$. Combining this with the other assumption on $\psi_n$ and $\psi_n^*$, i.e. $\psi_n(\delta_n) \le \sqrt{n}\delta_n^2$ and $\psi_n^*(\delta_n) \le \sqrt{n}\delta_n^2$, we have proved the last inequality in the above display. By allowing $M = M_n \to \infty$, we have completed the proof of (A.22), and thus Theorem 5. $\Box$

REFERENCES

[1] Banerjee, M., Mukherjee, D. and Mishra, S. (2009) Semiparametric Binary Regression Models under Shape Constraints with an Application to Indian Schooling Data. Journal of Econometrics 149 101-117.
[2] Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1998) Efficient and Adaptive Estimation for Semiparametric Models. Springer-Verlag, New York.
[3] Chen, X.H. and Pouzo, D. (2008) Efficient Estimation of Semiparametric Conditional Moment Models with Possibly Nonsmooth Residuals. Yale Economics Department Working Paper No. 38.
[4] Chen, X. (2007) Large Sample Sieve Estimation of Semi-nonparametric Models. Handbook of Econometrics 6B 5550-5632. North-Holland.
[5] Cheng, G. (2008a) Semiparametric Additive Isotonic Regression. Journal of Statistical Planning and Inference 100 345-362.
[6] Cheng, G. and Kosorok, M. (2008a) Higher Order Semiparametric Frequentist Inference with the Profile Sampler. Annals of Statistics 36 1786-1818.
[7] Cheng, G. and Kosorok, M. (2008b) General Frequentist Properties of the Posterior Profile Distribution. Annals of Statistics 36 1819-1853.
[8] Delecroix, M., Hristache, M. and Patilea, V. (2006) On Semiparametric M-estimation in Single-index Regression. Journal of Statistical Planning and Inference 136 730-769.
[9] Dixon, J., Kosorok, M. and Lee, B.L. (2005) Functional Inference in Semiparametric Models Using the Piggyback Bootstrap. Ann. Inst. Statist. Math. 57 255-277.
[10] Dudley, R.M. (1984) A Course on Empirical Processes. Lecture Notes in Math. 1097 2-142. Springer, Berlin.
[11] Grenander, U. (1981) Abstract Inference. John Wiley, New York.

[12] Gine, E. and Zinn, J. (1990) Bootstrapping General Empirical Functions. Annals of Probability 18 851-869.
[13] Hall, P. (1992) The Bootstrap and Edgeworth Expansion. Springer Series in Statistics. Springer-Verlag, New York.
[14] Hardle, W., Huet, S., Mammen, E. and Sperlich, S. (2004) Bootstrap Inference in Semiparametric Generalized Additive Models. Econometric Theory 20 265-300.
[15] Henry, M. and Robinson, P.M. (2003) Higher-Order Kernel Semiparametric M-Estimation of Long Memory. Journal of Econometrics 114 1-27.
[16] Hoffmann-Jorgensen, I. (1984) Stochastic Processes on Polish Spaces. Unpublished Manuscript.
[17] Huang, J. (1999) Efficient Estimation of the Partly Linear Cox Model. Annals of Statistics 27 1536-1563.
[18] Kosorok, M., Lee, B.L. and Fine, J.P. (2004) Robust Inference for Univariate Proportional Hazards Frailty Regression Models. Annals of Statistics 32 1448-1491.
[19] Kosorok, M. (2008) Introduction to Empirical Processes and Semiparametric Inference. Springer, New York.
[20] Kosorok, M. (2008) Bootstrapping the Grenander estimator. Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen. IMS Collections 1 282-292.
[21] Lee, B.L., Kosorok, M.R. and Fine, J.P. (2005) The profile sampler. Journal of the American Statistical Association 100 960-969.
[22] Lin, D. Y., Fleming, T. R. and Wei, L. J. (1994) Confidence bands for survival curves under the proportional hazards model. Biometrika 81 73-81.
[23] Ma, S. and Kosorok, M. (2005a) Robust Semiparametric M-estimation and the Weighted Bootstrap. Journal of Multivariate Analysis 96 190-217.
[24] Ma, S. and Kosorok, M. (2005b) Penalized Log-likelihood Estimation for Partly Linear Transformation Models with Current Status Data. Annals of Statistics 33 2256-2290.
[25] Morton-Jones, T., Diggle, P., Parker, L., Dickinson, H.O. and Binks, K. (2000) Additive Isotonic Regression Models in Epidemiology. Statistics in Medicine 19 849-859.
[26] Murphy, S.A. and van der Vaart, A.W. (1997) Semiparametric Likelihood Ratio Inference. Annals of Statistics 25 1471-1509.
[27] Murphy, S.A. and van der Vaart, A.W. (1999) Observed Information in Semiparametric Models. Bernoulli 5 381-412.
[28] Murphy, S.A. and van der Vaart, A.W. (2000) On Profile Likelihood. Journal of the American Statistical Association 95 1461-1474.
[29] Praestgaard, J. and Wellner, J. (1993) Exchangeably Weighted Bootstraps of the General Empirical Process. Annals of Statistics 21 2053-2086.
[30] Roeder, K., Carroll, R. J. and Lindsay, B. G. (1996) A Semiparametric Mixture Approach to Case-Control Studies with Errors in Covariables. Journal of the American Statistical Association 91 722-732.
[31] Sen, B., Banerjee, M. and Woodroofe, M. B. (2009) Inconsistency of bootstrap: the Grenander estimator. Annals of Statistics, Invited Revision.
[32] Shao, J. and Tu, D. (1996) The Jackknife and Bootstrap. Springer, New York.
[33] Strawderman, R. (2006) A Regression Model for Dependent Gap Times. The International Journal of Biostatistics 2 1.
[34] van der Vaart, A. W. and Wellner, J. A. (1996) Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.
[35] van der Vaart, A.W. (1998) Asymptotic Statistics. Cambridge University Press, Cambridge.
[36] van de Geer, S. (2000) Empirical Processes in M-estimation. Cambridge University Press, Cambridge.
[37] Wellner, J.A. and Zhan, Y. (1996) Bootstrapping Z-estimators. Technical report 308, University of Washington.
[38] Wellner, J.A. and Zhang, Y. (2007) Two Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates. Annals of Statistics 35 2106-2142.
[39] Wellner, J.A., Zhang, Y. and Liu, H. (2002) Two Semiparametric Estimation Methods for Panel Count Data. Unpublished Manuscript.
[40] Wouk, A. (1979) A Course of Applied Functional Analysis. Wiley, New York.
[41] Zeng, D.L. and Lin, D.Y. (2007) Maximum Likelihood Estimation in Semiparametric Models with Censored Data (with discussion). Journal of the Royal Statistical Society B 69 507-564.

[42] Zhang, C.M. and Yu, T. (2008) Semiparametric Detection of Significant Activation for Brain FMRI. Annals of Statistics 36 1693-1725.

Guang Cheng
Department of Statistics
Purdue University
West Lafayette, IN 47907-2066
Email: [email protected]

Jianhua Z. Huang
Department of Statistics
Texas A&M University
College Station, TX 77843-3143
Email: [email protected]