Quantile Regression on Quantile Ranges

Chung-Ming Kuan, Christos Michalopoulos and Zhijie Xiao

(First version: December 20th , 2010) This version: June 5th , 2015

Abstract

We study, via quantile regression, time series models whose conditional distribution may change over different quantile ranges of a threshold variable. A general threshold quantile regression model, quantile regression on quantile ranges (QRQR), is proposed. We derive the limiting distribution of the estimated threshold parameter under the framework of an asymptotically shrinking regime-change size. We derive the Bahadur representation and show that the asymptotic distribution of the standardized estimators converges to a two-parameter Gaussian process with a variance-covariance matrix reflecting the serially correlated errors. We construct confidence intervals for the estimated threshold parameter via a likelihood-ratio-type statistic, tabulate critical values and, by extensive simulation, investigate their coverage probabilities. Our asymptotic results complement those found in the existing literature on threshold regression models.

∗ Chung-Ming Kuan, Finance Dept., National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan. Email: [email protected]. ∗∗ Christos Michalopoulos, Economics Dept., Soochow University, 56, Kuei-Yang St., Sec. 1, Taipei, Taiwan. Email: [email protected]. ∗∗∗ Zhijie Xiao, Economics Dept., Boston College, Chestnut Hill, MA, 02467, USA. Email: zhi [email protected]. This paper has benefited from comments received at the SETA 2010 conference in Singapore and at the 2013 International Conference of Financial Econometrics at Shandong University. Michalopoulos acknowledges the financial support received from the National Science Council of Taiwan: NSC 100-2420-H-002013-DR.

1 Introduction

Quantile regression, introduced in Koenker and Bassett (1978), is an alternative to least squares regression that is robust to outliers and flexible with respect to the error distribution, capable of exploring the whole conditional distribution of the response and not just its mean. The considerable interest it has generated in applied and theoretical statistics and econometrics is reflected in the monograph of Koenker (2005). While in its simplest and most widely used setting a linear model is specified, growing empirical research has shown that such a specification is unable to describe the intricate and often non-linear relations among economic variables. Economic factors such as technical change, policy shocks and other unforeseen events in the economic environment often have a non-linear effect on the relationship between economic variables. A flexible modeling approach that captures such non-linear effects in the data, without assuming a specific non-linear functional form for the covariates, is the so-called “threshold model”, which may have known or unknown, one or multiple threshold variables. Such models split the sample into different “classes” or “regimes” according to the magnitude of some threshold variable, hence admitting threshold-type non-linear effects on the conditional distribution of the response. Threshold regression models have witnessed increasing interest in applied and theoretical statistics and econometrics research, with a wide array of applications; see Tong (1983, 1990). In the least squares context, threshold models have been used in many fields.
In economics, they have been utilized to explain the cross-country behavior of GDP growth, Durlauf and Johnson (1995)¹ and Hansen (2000); to model the different regimes of GNP and unemployment, Potter (1995), Chan and Tsay (1998) and Gonzalo and Wolf (2005); to investigate the non-linear adjustment of deviations of exchange rates from their equilibrium, Kilian and Taylor (2003); and to search for threshold effects in the relationship between growth and inflation, Khan and Senhadji (2001). In finance, self-exciting threshold autoregressive models have been used to check for mean reversion in interest rates, Pfann et al. (1996) and Gospodinov (2005), and to classify stock market regimes as in Chen et al. (2009). Further theoretical results and applications in linear least squares with threshold effects can be found in Chan (1993), Chan and Tsay (1998), Hansen (1996, 2000), Caner and Hansen (2001) and Seo and Linton (2007). It is natural to consider threshold effects on conditional quantile functions rather than conditional mean functions in order to capture non-linear effects of different quantiles of a covariate on different quantiles of the response. Recently, it has been noticed in practice that different ranges of some covariate can have a different impact on the conditional distribution of the response. Koenker and Machado (1999) found heterogeneous effects of public consumption and terms of trade on different quantiles of the GDP growth rate for a panel of countries. Kuan and Chen (2013) studied the effects of National Health Insurance on precautionary saving in Taiwan and showed that different quantile ranges of the income and age covariates affect differently the conditional quantiles of savings. The issue has only recently started receiving attention in the econometrics literature. Caner (2002) provided asymptotic results for a threshold least absolute deviations model but did not consider inference. Kato (2008) provided theoretical results using convexity arguments and derived a sup-Wald test for threshold effects in Caner's LAD framework. Cai and Stander (2008) proposed a quantile self-exciting threshold autoregressive time-series model and adopted Bayesian inferential procedures. Galvao et al. (2011, 2014) consider a threshold related to a particular quantile of the error process: for some known function of covariates, they assume there is a threshold effect at some point of this function related to the error process. However, Galvao et al. (2011, 2014) do not consider inference regarding the threshold parameter estimator and do not allow for serially correlated errors. Finally, Lee et al. (2011) discuss threshold effects in a variety of models, including quantile regression, but in an independent errors setting. In this paper, we formulate a general quantile regression threshold model, called quantile regression on quantile ranges (QRQR), aiming to estimate the effects of a specific quantile range of a covariate on quantiles of the response distribution. We derive the Bahadur representation of the normalized estimators and show that, due to serially correlated errors, it converges to a two-parameter Gaussian process with a complicated variance-covariance matrix that depends heavily on nuisance parameters. This extends the results in Galvao et al. (2014).

¹ The authors use regression tree analysis but, as Hansen (2000) notes, this is another form of a threshold regression model.
We derive the limiting distribution of the estimated partition quantile assuming asymptotically diminishing threshold effects, as in Picard (1985) and Hansen (2000), in order to obtain a limiting distribution invariant to the distribution of the regressors and the error. We also derive the limiting distribution for threshold effects that do not depend on the sample size. We then construct confidence intervals for the estimated threshold parameter using a likelihood-ratio-type statistic, extending the results of Caner (2002), and tabulate critical values for all quantiles. Extensive simulations are provided under different error settings, with satisfying results. The paper is organized as follows. Section 2 introduces the model, Section 3 discusses estimation and assumptions, Section 4 gives asymptotic results, while Section 5 deals with inference. Section 6 concludes. All proofs are delegated to the Technical Appendix. In what follows, we denote by ‖·‖ the classical Euclidean norm, by ⇒ weak convergence over some compact metric space, and by →_P convergence in probability.


2 The Model

Let {y_i : i = 1, …, n} be the regressand and {z_i : i = 1, …, n} a p × 1 vector of random variables. We consider the following threshold linear regression model

$$ y_i = z_i^{\top}\theta_i + \epsilon_i, \tag{1} $$

where {ε_i : i = 1, …, n} are regression errors. Let x_i be the threshold variable, which may be an element of z_i. We assume that the distribution of x_i is continuous, and partition z_i into two groups z_{1i} and z_{2i}, such that the regime change affects the covariates in z_{2i} and not the covariates in z_{1i}. Thus we can rewrite (1) as

$$ y_i = z_{1i}^{\top}\gamma + z_{2i}^{\top}\beta_i + \epsilon_i. \tag{2} $$

In this paper, we are particularly interested in the conditional distribution of y_i in the presence of a threshold effect. For this reason, we further assume the τ-th conditional quantile function of y_i can be written as

$$ Q_{y_i}(\tau \mid z_i) = z_{1i}^{\top}\gamma(\tau) + z_{2i}^{\top}\beta_i(\tau), \tag{3} $$

where Q_{y_i}(τ | z_i) ≡ inf{ y : F_{y|z}(y_i | z_i) ≥ τ } is the quantile function of y_i conditional on z_i, τ ∈ [0, 1], and the parameter β_i(τ) = β(τ, x_i) reflects the possible regime-change behavior of β in some quantile of y_i. For convenience of discussion, we consider the case where the regime shift occurs at, say, the τ*_x-th quantile of x_i, q* = F_X^{-1}(τ*_x); then

$$ \beta_i(\tau) = \beta(\tau, x_i) = \begin{cases} \beta(\tau), & x_i \le q^*, \\ \beta(\tau) + \delta(\tau), & x_i > q^*. \end{cases} \tag{4} $$

We call q* = F_X^{-1}(τ*_x) the partition quantile. For convenience of our later analysis, we introduce the following variable: z_{qi} = z_{2i} 1{x_i > q}. Thus, for any given q, z_{qi} = z_{2i} when x_i > q, and z_{qi} = 0 if x_i ≤ q. Letting Z_{q*i} = (z_{1i}^⊤, z_{2i}^⊤, z_{q*i}^⊤)^⊤, we can rewrite (3) as

$$ Q_{y_i}(\tau \mid z_i) = z_{1i}^{\top}\gamma(\tau) + z_{2i}^{\top}\beta(\tau) + z_{q^*i}^{\top}\delta(\tau) = Z_{q^*i}^{\top}\alpha(\tau), \tag{5} $$

where δ(τ) denotes the size of the regime change of β(τ) and α(τ)^⊤ = (γ(τ)^⊤, β(τ)^⊤, δ(τ)^⊤). Considering a p-th order autoregressive specification with z_{2,i} = (1, y_{i−1}, …, y_{i−(p−1)})^⊤ and z_{q,i} = z_{2,i} 1{x_i ≤ q*}, and assuming that the ε_i are i.i.d., we obtain the threshold quantile autoregressive model of Galvao et al. (2011, 2014): Q_{y_i}(τ | F_{i−1}) = z_{2,i}^⊤ β(τ) + z_{q*,i}^⊤ δ(τ), with F_{i−1} the sigma-algebra generated by lagged values of y_i, i ∈ ℤ. The formulation of threshold regression models shares similarities with modeling


structural breaks, since in the latter the threshold variable is just the index i for cross-section or t for time-series data. But it differs from a change-point model in that regime changes are driven by some random variable, which can be an exogenous and unobservable Markov chain in Markov-switching models, or another random variable in threshold models, thereby allowing for much richer data behavior. For structural break models in the linear least squares framework see, among others, Andrews (1993), Kuan and Hornik (1995), Bai (1996), Kuan and Hsu (1998) and Bai and Perron (1998). For structural break models on quantiles (including the median) see Bai (1995), Qu (2008) and Su and Xiao (2008, 2009). We study the QRQR model under different assumptions on δ(τ). In particular, we consider the problem of estimating the threshold effect and the partition quantile q* in Section 2, and consider inference problems about the partition quantile estimator in Section 3. In Section 4, we investigate statistical inference on the null hypothesis of no threshold effect.
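The regime-change design variable z_{qi} = z_{2i} 1{x_i > q} used throughout is straightforward to build in practice. A minimal numpy sketch (the function name and the toy data are ours, not from the paper):

```python
import numpy as np

def build_design(z1, z2, x, q):
    """Stack Z_q = [z1, z2, z2 * 1{x > q}] column-wise for a candidate threshold q.

    z1 : (n, p1) covariates whose coefficients do not shift,
    z2 : (n, p2) covariates subject to the regime change,
    x  : (n,)    threshold variable (may be a column of z2).
    """
    ind = (x > q).astype(float)[:, None]   # 1{x_i > q}, shaped for broadcasting
    return np.hstack([z1, z2, z2 * ind])

# toy check: two regimes around q = 0
z1 = np.ones((4, 1))
z2 = np.array([[1.0], [2.0], [3.0], [4.0]])
x = np.array([-1.0, -0.5, 0.5, 1.0])
Z = build_design(z1, z2, x, q=0.0)
# rows with x <= 0 have a zero third column; rows with x > 0 repeat z2 there
```

Running a quantile regression of y on Z then delivers estimates of (γ(τ), β(τ), δ(τ)) for the given q.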

3 Estimation of the Partition Quantile

When the partition quantile q* is known, one can simply run quantile regressions using specification (5) via the minimization problem

$$ \arg\min_{\alpha \in \mathcal{A}} \sum_{i=1}^{n} \rho_{\tau}\!\left(y_i - Z_{q^*\!,i}^{\top}\alpha\right), $$

for ρ_τ(·) the check function, defined as ρ_τ(u) = u(τ − 1{u < 0}), and A some compact parameter space, to obtain estimates of γ(τ), β(τ) and δ(τ). In practice, the partition quantile q* is unknown, so we need to estimate it along with the other model parameters. For each q, we define the sum of asymmetrically weighted absolute deviations as

$$ S_n(\tau, q, \alpha) = \sum_{i=1}^{n} \rho_{\tau}\!\left(y_i - Z_{q,i}^{\top}\alpha\right). $$

In practice, we consider τ_x ∈ [τ_L^x, τ_U^x], 0 < τ_L^x < τ_U^x < 1, and let Q = {q : q_L ≤ q ≤ q_U}, where q_L = F_x^{-1}(τ_L^x) and q_U = F_x^{-1}(τ_U^x). Given q ∈ Q = [q_L, q_U], we first obtain the estimates of the parameters γ(τ), β(τ) and δ(τ) from

$$ \hat{\alpha}(\tau, q) = \left(\hat{\gamma}(\tau, q)^{\top}, \hat{\beta}(\tau, q)^{\top}, \hat{\delta}(\tau, q)^{\top}\right)^{\top} = \arg\min_{\alpha} S_n(\tau, q, \alpha). $$

Then we estimate the partition quantile q* by minimizing S_n(τ, q, α̂(τ, q)) over q ∈ Q:

$$ \hat{q} = \arg\min_{q \in \mathcal{Q}} S_n\!\left(\tau, q, \hat{\alpha}(\tau, q)\right), $$

therefore obtaining the estimated model parameters as α̂(τ, q̂).

In practice, we may order the x_i and search for the partition quantile among the x_{(k)}, the order statistics of x_i, over k = nτ_L^x, nτ_L^x + 1, …, nτ_U^x. Since there are n observations, computing the partition quantile requires less than n function evaluations.

For convenience of asymptotic analysis, we make the following assumptions.

[A.1] (i) The conditional distribution functions of y_i, F_i(y) = Pr(Y_i ≤ y | F_{i−1}), have continuous Lebesgue densities f_i(y) uniformly bounded away from 0 and ∞ at the points F_i^{-1}(τ). (ii) We assume {y_i, z_i, x_i}_{i=1}^n is β-mixing with mixing coefficients β_m satisfying m^{p/(p−2)} (log m)^{2(p−1)/(p−2)} β_m → 0 as m → ∞.

[A.2] For any δ > 0, there exists some σ(δ) > 0 such that

$$ \sup_{\tau \in \mathcal{T}} \left| f_i\!\left(F_i^{-1}(\tau) + c\right) - f_i\!\left(F_i^{-1}(\tau)\right) \right| < \delta, $$

for all |c| < σ(δ) and all 1 ≤ i ≤ n.

[A.3] Q = [q_L, q_U] is compact. For all q ∈ Q, α(τ, q) = arg min_{α∈A} E[ ρ_τ( y_i − Z_{qi}^⊤ α(τ) ) ] exists and is unique. Furthermore, α(τ) is in the interior of the parameter space A, with A compact and convex.

[A.4] E‖z_i‖^{2+ε} < ∞ for some ε > 0, and max_{1≤i≤n} ‖z_i‖ = o_P(n^{1/2}).

Assumption [A.1] is standard in the quantile regression literature. The assumption about the errors is the same as in Arcones and Yu (1994) and Galvao et al. (2011), and is required for the empirical process to satisfy a functional central limit theorem.² Assumption [A.2] is borrowed from the structural breaks literature and imposes smoothness of the conditional densities in some neighborhood of F_i^{-1}(τ), uniformly in i = 1, …, n. Finally, assumption [A.3] is assumed for convenience of identification, while assumption [A.4] is used to verify that our regressors satisfy the Lindeberg condition for a central limit theorem.

² See Theorem 2.1 of Arcones and Yu (1994). The concept of β-mixing is defined there as well.
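The profile search over the order statistics of x_i can be illustrated with a deliberately simplified version of the model. The sketch below uses intercept-only regimes, in which case the τ-th regression quantile within each regime reduces to a sample quantile, so no linear programming is needed; the function names and the data-generating process are ours:

```python
import numpy as np

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - 1{u < 0})
    return u * (tau - (u < 0))

def profile_threshold(y, x, tau, tau_L=0.15, tau_U=0.85):
    """Profile S_n(tau, q, alpha_hat(tau, q)) over candidate thresholds q.

    Candidates are the order statistics of x between its tau_L and tau_U
    sample quantiles; each regime is fitted with a sample quantile
    (intercept-only simplification of the QRQR fit).
    """
    xs = np.sort(x)
    n = len(x)
    lo, hi = int(n * tau_L), int(n * tau_U)
    best_q, best_s = None, np.inf
    for q in xs[lo:hi]:
        left, right = y[x <= q], y[x > q]
        s = (check_loss(left - np.quantile(left, tau), tau).sum()
             + check_loss(right - np.quantile(right, tau), tau).sum())
        if s < best_s:
            best_q, best_s = q, s
    return best_q, best_s

# DGP with a jump of size delta = 2 at the true partition quantile q* = 0
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * (x > 0.0) + rng.normal(size=n)
q_hat, _ = profile_threshold(y, x, tau=0.5)   # q_hat should sit near 0
```

With covariates beyond an intercept, the inner fit would be replaced by a full quantile regression on the stacked design Z_{q,i}, but the outer profile over q is unchanged.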

The next theorem, an extension of Lemma 2 of Galvao et al. (2011), gives consistency of the estimated partition quantile q̂ when the threshold effect δ(τ) is non-zero and fixed as n → ∞.

Theorem 3.1 (Consistency) Given assumptions [A.1]-[A.4] and (4),

$$ \hat{\alpha}(\tau, \hat{q}) \to_P \alpha_0(\tau, q^*) \quad \text{and} \quad \hat{q} \to_P q^*, $$

and

$$ \sqrt{n}\,\left\|\hat{\alpha}(\tau, \hat{q}) - \alpha_0(\tau, q^*)\right\| = O_P(1) \quad \text{and} \quad n\,|\hat{q} - q^*| = O_P(1), $$

where q* is the true partition quantile.

We can see that q̂ is super-consistent, due to the discontinuous threshold-effect specification of our model. The fast rate of convergence implies that, in the estimation procedure described before, we can treat the partition quantile q as “known” and proceed to the estimation of the remaining parameters without worrying about threshold-estimation effects on the slope coefficients. In the next section, to facilitate inference on q*, we consider shrinking shifts, where the difference in regression slopes gets smaller as the sample size increases. We have simulated the performance of the above estimation procedure, and the results are displayed in the tables and figures that follow. The tables display statistics for the empirical distribution of the estimated partition quantile for different specifications of the error distribution, e ∼ N(0,1), e ∼ t₃ and e ∼ 0.7 × N(0,3) + 0.3 × N(1,2), different magnitudes of the regime change, δ(τ) = 0.5, 2, and different sample sizes, n = 200, 500. The threshold has been set to occur at q* = 0.


Table 1: Quantiles of the q̂ distribution: ε_i ∼ N(0,1), q0 = 0.

δ = 0.5            n = 200                     n = 500
Quantiles      5%      50%     95%         5%      50%     95%
τ = 0.10    -1.504  -0.021   1.331     -0.598   0.017   0.707
τ = 0.25    -1.064  -0.178   1.289     -0.301   0.002   0.127
τ = 0.50    -1.313  -0.063   1.048     -0.648   0.003   0.669
τ = 0.75    -1.217  -0.023   1.270     -0.214  -0.006   0.160
τ = 0.90    -1.636  -0.046   1.382     -0.526   0.002   0.547

δ = 2              n = 200                     n = 500
Quantiles      5%      50%     95%         5%      50%     95%
τ = 0.10    -0.074  -0.048  -0.016     -0.051   0.001   0.014
τ = 0.25    -0.019   0.014   0.020     -0.048  -0.025   0.001
τ = 0.50    -0.119  -0.016   0.067     -0.038  -0.034  -0.025
τ = 0.75    -0.027  -0.024   0.044     -0.036   0.001   0.018
τ = 0.90    -0.066  -0.007   0.076     -0.012   0.000   0.001

Table 2: Quantiles of the q̂ distribution: ε_i ∼ t₃, q0 = 0.

δ = 0.5            n = 200                     n = 500
Quantiles      5%      50%     95%         5%      50%     95%
τ = 0.10    -1.663   0.056   1.704     -1.410   0.008   1.317
τ = 0.25    -1.642  -0.096   1.437     -0.369  -0.007   0.633
τ = 0.50    -1.396  -0.007   1.264     -1.038   0.006   0.673
τ = 0.75    -1.555  -0.037   1.399     -0.585  -0.051   0.676
τ = 0.90    -1.611  -0.051   1.661     -1.342   0.006   1.208

δ = 2              n = 200                     n = 500
Quantiles      5%      50%     95%         5%      50%     95%
τ = 0.10    -0.334  -0.030   0.051     -0.017   0.007   0.064
τ = 0.25    -0.026  -0.011   0.022     -0.008   0.002   0.071
τ = 0.50    -0.049   0.017   0.223     -0.053   0.012   0.098
τ = 0.75    -0.059  -0.005   0.075     -0.019   0.003   0.010
τ = 0.90    -0.095   0.071   0.288     -0.042   0.001   0.053

[Figure 1: empirical distribution functions of the estimated partition quantile q̂; panels for e ∼ N(1,2), e ∼ t₃ and e ∼ 0.7 × N(0,3) + 0.3 × N(1,2), with δ = 2 and n = 200, 500; axes show Threshold against Relative Frequency.]

Table 3: Quantiles of the q̂ distribution: ε_i ∼ 0.7 × N(0,3) + 0.3 × N(1,2), q0 = 0.

δ = 0.5            n = 200                     n = 500
Quantiles      5%      50%     95%         5%      50%     95%
τ = 0.10    -1.576  -0.180   1.529     -1.299   0.001   1.491
τ = 0.25    -1.709  -0.089   1.248     -1.138  -0.007   1.180
τ = 0.50    -1.466  -0.113   1.552     -1.437   0.029   1.040
τ = 0.75    -1.410  -0.066   1.412     -1.043  -0.019   0.688
τ = 0.90    -1.675   0.016   1.719     -1.345   0.001   1.409

δ = 2              n = 200                     n = 500
Quantiles      5%      50%     95%         5%      50%     95%
τ = 0.10    -0.348  -0.035   0.368     -0.042   0.003   0.065
τ = 0.25    -0.249  -0.015   0.0236    -0.077  -0.009   0.042
τ = 0.50    -0.180  -0.002   0.287     -0.056   0.003   0.042
τ = 0.75    -0.201  -0.062   0.081     -0.021  -0.007   0.030
τ = 0.90    -0.189   0.023   0.208     -0.106   0.003   0.032

4 Inference on the Partition Quantile

To derive the limiting distribution of q̂ and facilitate statistical inference, we use a shrinking-threshold asymptotics framework. Define the following matrix functionals:

$$ D(\tau, q) = E\!\left[ f_i\!\left(F_i^{-1}(\tau) \mid z_i\right) z_{2i} z_{2i}^{\top} \,\middle|\, x_i = q \right] \quad \text{and} \quad V(\tau, q) = E\!\left[ z_{2i} z_{2i}^{\top} \,\middle|\, x_i = q \right]. $$

Denote by g(q) the density of x_i at q, assumed continuous here, and by g(q*) the same density at the true partition quantile q*. Similarly, define V_0 = V(τ, q*) and D_0 = D(τ, q*); both V_0 and D_0 depend on τ. We need to make some additional assumptions:

[A.6] For all q ∈ Q, E[‖z_{2i}‖⁴ | x_i = q] < ∞, E[ f_i(F_i^{-1}(τ) | z_i)² ‖z_{2i}‖⁴ | x_i = q] < ∞, and 0 < g(q) ≤ ḡ < ∞.

[A.7] The matrix functionals V_0, D_0 and the density g_0 are continuous.

[A.8] δ(τ) = c n^{−ζ}, with c ≠ 0 and ζ ∈ (0, 1/2).³

[A.9] c^⊤ V_0 c > 0 and c^⊤ D_0 c > 0.

³ For simplicity, we suppress notation that makes c dependent on τ, that is, c = c_τ.


The above assumptions are similar to those of Hansen (2000) and Caner (2002). Assumption [A.6] bounds the conditional moments of our regressors, while [A.7] requires the distribution of the regime-change regressor x_i to be continuous. Assumption [A.8] requires that the difference in regression slopes gets smaller as the sample size gets bigger; see Picard (1985) and Bai (1995). This shrinking-threshold (shrinking-shifts) asymptotics framework is necessary if we want a nuisance-parameter-free asymptotic distribution for the partition quantile q. Finally, assumption [A.9] is a full-rank condition, required in order to have a non-degenerate asymptotic distribution. It also excludes the case of a continuous threshold model; see Hansen (2000). We can now state the asymptotic distribution of the estimated partition quantile q̂.

Theorem 4.1 Under assumptions [A.1] and [A.6]-[A.9], and for ζ ∈ (0, 1/2),

$$ n^{1-2\zeta}\,(\hat{q} - q^*) \;\Rightarrow\; \omega T, $$

where

$$ \omega = \tau(1-\tau)\,\frac{c^{\top} V_0 c}{\left(c^{\top} D_0 c\right)^2 g_0} \quad \text{and} \quad T = \arg\max_{r \in \mathbb{R}} \left\{ W(r) - \frac{|r|}{2} \right\}, $$

with W(r) a two-sided Wiener process.

Some comments on the above theorem are needed. The difference between a threshold and a change-point model is that in the first, the asymptotic precision of q̂ is proportional to the conditional matrix functional E(z_{2i} z_{2i}^⊤ | x_i = q), while in the second, the asymptotic precision is proportional to the unconditional moment matrix E(z_{2i} z_{2i}^⊤). In addition, the asymptotic distribution of q̂ becomes less dispersed with larger g_0, that is, when there is an increasing number of observations near the true partition quantile q*. Comparing our result to the current literature, our scale term ω generalizes that of Caner (2002), while in Hansen (2000) the same term, under the assumption of conditional homoskedasticity, takes the form

$$ \omega = \frac{\sigma^2}{\left(c^{\top} E[z_{2i} z_{2i}^{\top}] c\right) g_0}, $$

for σ² the variance of the error term. For LAD estimation (see, e.g., Bai (1995)), a similar term is given by

$$ \frac{1}{f(F^{-1}(1/2))^2 \left(c^{\top} E[z_{2i} z_{2i}^{\top}] c\right) g_0}. $$

The parameter ζ controls the rate at which δ, the size of the partition quantile effect, decreases to zero. Notice, though, that if ζ is small enough, the rate of convergence approaches the super-consistent rate n. The two-sided Wiener process W(r) appearing in the formula above is defined by

$$ W(r) = \begin{cases} W_1(r), & r \ge 0, \\ W_2(-r), & r < 0, \end{cases} $$

for two independent Wiener processes W_1(r), W_2(r) on the non-negative half-line with W_1(0) = W_2(0) = 0. The distribution function of T is found in Bhattacharya and Brockwell (1976) and is given by

$$ P(T \le x) = 1 + \left(\frac{x}{2\pi}\right)^{1/2} \exp(-x/8) + \frac{3}{2}\,\exp(x)\,\Phi\!\left(-\frac{3 x^{1/2}}{2}\right) - \frac{x+5}{2}\,\Phi\!\left(-\frac{x^{1/2}}{2}\right), $$

where Φ(x) denotes the standard normal cumulative distribution function, and P(T ≤ x) = 1 − P(T ≤ −x) for x < 0. The two-sided Wiener process results from the discontinuity created by the threshold parameter q of the regression function; see Chan (1993).⁴

For completeness of discussion, we consider the case of a fixed magnitude of shift for the partition quantile q. We need to replace assumption [A.8] with the following:

[A.9′] δ(τ) = c(τ), for c(τ) a fixed number.

[A.10] The random variables ρ_τ(ε ± d) − ρ_τ(ε) are continuous.

Assumption [A.9′] implies that ζ = 0 in assumption [A.8]; hence the magnitude of the change point is fixed, not varying with the sample size. Assumption [A.10] is borrowed from Koul et al. (2003) and is needed to obtain the limiting distribution of the estimated partition quantile.

Theorem 4.2 Under assumptions [A.1]-[A.4], [A.6], [A.7], [A.9′] and [A.10], we have

$$ n\,(\hat{q} - q^*) \;\Rightarrow\; \arg\min_{v} W^{\#}(\tau, v), $$

where

$$ W^{\#}(\tau, v) = \begin{cases} \displaystyle\sum_{i=1}^{n} \left\{ \rho_{\tau}\!\left(\epsilon_i(\tau) + z_{2i}^{\top}\delta(\tau)\, 1\{q^* < x_i \le q^* + v n^{-1}\}\right) - \rho_{\tau}(\epsilon_i(\tau)) \right\}, & v > 0, \\[6pt] \displaystyle\sum_{i=1}^{n} \left\{ \rho_{\tau}\!\left(\epsilon_i(\tau) - z_{2i}^{\top}\delta(\tau)\, 1\{q^* + v n^{-1} < x_i \le q^*\}\right) - \rho_{\tau}(\epsilon_i(\tau)) \right\}, & v < 0, \end{cases} $$

and W^{#}(τ, v) = 0 when v = 0, where W^{#} is a dependent two-sided random walk on the integers.

⁴ Assuming i.i.d. errors, the formula for the scale term ω simplifies to ω = τ(1−τ) / [ f(F^{-1}(τ))² c^⊤ D_0 c · g_0 ].
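The Bhattacharya-Brockwell distribution function of T above is available in closed form and can be evaluated directly; a sketch (the function names are ours):

```python
from math import erf, exp, sqrt, pi

def Phi(x):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cdf_T(x):
    """P(T <= x) for T = argmax_r { W(r) - |r|/2 }, W a two-sided Wiener
    process (Bhattacharya and Brockwell, 1976). The closed form is for
    x >= 0; negative arguments use the symmetry P(T <= x) = 1 - P(T <= -x)."""
    if x < 0:
        return 1.0 - cdf_T(-x)
    return (1.0 + sqrt(x / (2.0 * pi)) * exp(-x / 8.0)
            + 1.5 * exp(x) * Phi(-1.5 * sqrt(x))
            - 0.5 * (x + 5.0) * Phi(-0.5 * sqrt(x)))
```

By symmetry, cdf_T(0) = 1 + (3/2)Φ(0) − (5/2)Φ(0) = 1/2, as expected of a distribution symmetric about the origin.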

Assumption [A.10] ensures that arg min_v W^{#}(τ, v) is uniquely defined (almost surely); see Bai (1995), pp. 410-411, for a detailed discussion. Since the limiting distribution depends on the distribution of the regressors and the error, it is not convenient for hypothesis testing, although simulation is possible.⁵ We proceed with conducting inference for the estimated partition quantile, and follow Hansen (2000) and Caner (2002) in constructing a likelihood-ratio-type test for the null hypothesis H0 : q = q* and confidence intervals for our estimator. The reason we have chosen this test is that confidence intervals based on Wald statistics are found to be less reliable. In particular, Dufour (1997) has shown that inverting a Wald-type statistic to build confidence intervals for a parameter that is locally almost unidentified (LAU, as is the case in threshold regression models) leads to confidence intervals whose true level deviates arbitrarily from the nominal level. In addition, approximations of the statistic based on Edgeworth expansions or the bootstrap will not help. On the contrary, likelihood-ratio tests are found to be more reliable; see Gleser and Hwang (1987), Nelson and Savin (1990) and Dufour (1997). Our likelihood-ratio-type statistic for H0 : q = q0 takes the following form:

$$ LR_n(\tau) = \frac{S_n(\tau, q_0) - S_n(\tau, \hat{q})}{\tau(1-\tau)}, $$

where S_n(·) denotes the sum of asymmetrically weighted absolute residuals for the restricted (under the null) and the unrestricted model, for each τ ∈ T. We reject the null for large values of LR_n(τ). Our statistic extends that of Caner (2002), who considers only the case τ = 1/2, and also extends the statistic of Koenker and Bassett (1982), who considered conditionally homoskedastic errors with no threshold effects.

Theorem 4.3 Under assumptions [A.1]-[A.8] and H0, we have

$$ LR_n(\tau) \;\Rightarrow\; \eta^2 \xi, $$

where

$$ \eta^2 = \frac{c^{\top} V_0 c}{c^{\top} D_0 c} \quad \text{and} \quad \xi = \max_{r \in \mathbb{R}} \left[ W(r) - \frac{|r|}{2} \right], $$

for a two-sided Wiener process W(r) and V_0, D_0 as defined before. Furthermore, the distribution function of ξ is given by P(ξ ≤ t) = (1 − exp(−t))².

⁵ See Bai (1995).
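Given the stated limit P(ξ ≤ t) = (1 − exp(−t))², its quantiles invert in closed form, which is one way critical values for a nominal level β can be obtained; a sketch (the function name is ours):

```python
from math import log, sqrt, exp

def xi_critical_value(beta):
    """Quantile of the stated limit P(xi <= t) = (1 - exp(-t))^2:
    solving (1 - exp(-t))^2 = beta for t gives t = -log(1 - sqrt(beta))."""
    return -log(1.0 - sqrt(beta))

# round trip: the returned value should reproduce the nominal level
c = xi_critical_value(0.90)
level = (1.0 - exp(-c)) ** 2   # recovers 0.90
```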


Under homoskedasticity, we have

$$ \eta^2 = \frac{1}{f(F^{-1}(\tau))}, $$

where f(·) and F(·) are the pdf and cdf of the errors. Letting f̂(F^{-1}(τ)) be a consistent estimator of f(F^{-1}(τ)), the density-rescaled likelihood-ratio-type statistic can be written as

$$ LR_n^{*}(\tau) = \frac{\hat{f}(F^{-1}(\tau))\left[ S_n(\tau, q_0) - S_n(\tau, \hat{q}) \right]}{\tau(1-\tau)} \;\Rightarrow\; \xi. $$

This requires only the estimation of the unconditional density of the errors at the τ-th quantile, and it is nuisance-parameter free. In the general case with heterogeneous errors, following the lines of Hansen (2000), Weiss (1991) and Caner (2002), denote

$$ r_{1i} = \left( \hat{g}_{i,\tau}^{\top} \delta(\tau) \right)^2, \qquad r_{2i} = f\!\left(F_v^{-1}(\tau) \mid z_i, x_i\right) \left( \hat{g}_{i,\tau}^{\top} \delta(\tau) \right)^2 \quad \text{and} \quad \eta^2 = \frac{E(r_{1i} \mid x_i = q_{\tau}^*)}{E(r_{2i} \mid x_i = q_{\tau}^*)}. $$

We estimate η² by kernel regression as

$$ \hat{\eta}^2 = \frac{ \sum_{i=1}^{n} K_{h_n}(\hat{q}_{\tau} - x_i) \left( \hat{g}_{i,\tau}^{\top} \hat{\delta}(\tau) \right)^2 }{ \sum_{i=1}^{n} K_{h_n}(\hat{q}_{\tau} - x_i)\, K_{h_n}(\hat{u}_i) \left( \hat{g}_{i,\tau}^{\top} \hat{\delta}(\tau) \right)^2 }, $$

and construct the rescaled likelihood-ratio-type statistic as

$$ LR_n^{*}(\tau) = \frac{S_n(\tau, q_0) - S_n(\tau, \hat{q})}{\tau(1-\tau)\, \hat{\eta}^2} \;\Rightarrow\; \xi. $$

In the following table, we provide critical values for a range of quantiles τ, hence expanding Table 1 of Hansen (2000).⁶ We assess the performance of the proposed confidence regions for the estimated partition quantile q̂ by simulation, under the assumption of homoskedastic and heteroskedastic errors, following the test-inversion method of Hansen (2000). In the case of homoskedastic errors, we write

$$ \hat{Q} = \left\{ q : LR_n(\tau) \le c_{\xi}(\beta) \right\}, $$

where β denotes the asymptotic confidence level, i.e. 90%, 95%, etc., while c_ξ(β) is the

⁶ The row for τ = 0.50 in our table corresponds to Table 1 of Hansen (2000).


Table 4: Critical values for the LR statistic.

  τ \ 1−a   0.80    0.85    0.90    0.925   0.95    0.975   0.99
  0.50      4.497   5.101   5.939   6.528   7.352   8.751   10.592
  0.55      4.519   5.127   5.969   6.561   7.389   8.796   10.645
  0.60      4.589   5.206   6.062   6.663   7.504   8.932   10.810
  0.65      4.714   5.347   6.226   6.843   7.707   9.174   11.103
  0.70      4.906   5.566   6.481   7.123   8.022   9.549   11.556
  0.75      5.192   5.890   6.858   7.538   8.490   10.105  12.230
  0.80      5.621   6.376   7.424   8.160   9.190   10.939  13.240
  0.85      6.297   7.143   8.317   9.141   10.295  12.254  14.831
  0.90      7.495   8.502   9.899   10.880  12.254  14.586  17.653
  0.95      10.316  11.702  13.626  14.977  16.867  20.077  24.299

critical value taken from Table 4. In the case of heteroskedastic errors, we have

$$ \hat{Q}^{*} = \left\{ q : LR_n^{*}(\tau) \le c_{\xi}(\beta) \right\}, $$

where LR*_n(·) = LR_n(·)/η̂², for a consistent estimator η̂ of η. Therefore, Q̂* is a heteroskedasticity-robust confidence interval for the estimated threshold value q̂. The simulation design is the same as in Hansen (2000) and Caner (2002). In particular, in the specification model

$$ y_i = z_i^{\top}\gamma + z_{q,i}^{\top}\delta + \epsilon_i, $$

we have set z_i = (1, x_i)^⊤, δ = (δ_1, δ_2)^⊤, δ_1 = 0, x_i, ε_i ∼ N(0,1) and q = 0.75. We have used threshold sizes δ = 0.50, 1 and 2 and sample sizes n = 100, 200, 300. For the homoskedastic errors setting, η² = 1; therefore, in order to compute the likelihood-ratio-type statistic, an estimator of the unconditional density of the errors at the quantile of interest is required. This can be done by using a kernel estimator, such as the Epanechnikov kernel, K(u) = (3/4)(1 − u²) 1{|u| ≤ 1}, with h_n a bandwidth such that h_n → 0, √n h_n → ∞ and K_{h_n}(u) = h_n^{-1} K(u/h_n); see Caner (2002) and Hardle and Linton (1994). Here, we have used the bandwidth recommended on page 81 of Koenker (2005), that is,

$$ h_n = \min\!\left\{ \hat{\sigma}_y, \frac{R_y}{1.34} \right\} \left[ \Phi^{-1}(\tau + b_n) - \Phi^{-1}(\tau - b_n) \right], $$

where Φ is the Gaussian cumulative distribution function with density φ, R_y = Q̂(ŷ, 0.75) − Q̂(ŷ, 0.25) is the interquartile range, and b_n is given by

$$ b_n = n^{-1/5} \left[ \frac{4.5\, \phi^4\!\left(\Phi^{-1}(\tau)\right)}{\left[ 2\,\Phi^{-1}(\tau)^2 + 1 \right]^2} \right]^{1/5}. $$
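The bandwidth rule above can be computed directly from the two displayed formulas; a sketch using the standard normal quantile function from the Python standard library (the function names are ours):

```python
from math import exp, sqrt, pi
from statistics import NormalDist

import numpy as np

_nd = NormalDist()

def bofinger_bn(n, tau):
    """b_n = n^{-1/5} [4.5 phi^4(z_tau) / (2 z_tau^2 + 1)^2]^{1/5}, z_tau = Phi^{-1}(tau)."""
    z = _nd.inv_cdf(tau)
    phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)   # standard normal density at z
    return n ** (-0.2) * (4.5 * phi ** 4 / (2.0 * z * z + 1.0) ** 2) ** 0.2

def koenker_bandwidth(y, tau):
    """h_n = min(sigma_hat_y, IQR_y / 1.34) * [Phi^{-1}(tau + b_n) - Phi^{-1}(tau - b_n)]."""
    bn = bofinger_bn(len(y), tau)
    iqr = float(np.quantile(y, 0.75) - np.quantile(y, 0.25))
    scale = min(float(np.std(y, ddof=1)), iqr / 1.34)
    return scale * (_nd.inv_cdf(tau + bn) - _nd.inv_cdf(tau - bn))

rng = np.random.default_rng(1)
y = rng.normal(size=200)
h = koenker_bandwidth(y, tau=0.5)   # a positive bandwidth that shrinks with n
```

Note that b_n must satisfy τ + b_n < 1 and τ − b_n > 0; for extreme τ and small n the rule should be truncated accordingly.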


The second case assumes heterogeneous errors; we therefore estimate η² by η̂². Since it is known that the choice of kernel function matters less, and that the bandwidth is the important parameter in nonparametric estimation, we have tried two bandwidths: the first is Silverman's "rule of thumb" and the second is based on cross-validation; see Li and Racine (2007). In the heterogeneity tables, we display the coverage probability estimated with Silverman's "rule of thumb", while in brackets we display the coverage probability obtained with the bandwidth selected by cross-validation.

Table 5: 90% coverage probabilities for q̂: homogeneity.

τ = 0.10, σx = 1, σε = 1                 τ = 0.25, σx = 1, σε = 1
Sample Size  δ=0.50  δ=1.00  δ=2.00      Sample Size  δ=0.50  δ=1.00  δ=2.00
n = 100      94.3%   87.0%   95.3%       n = 100      99.6%   99.3%   99.0%
n = 200      92.0%   90.3%   88.3%       n = 200      97.3%   98.6%   99.3%
n = 300      91.3%   89.0%   90.0%       n = 300      97.0%   98.3%   98.0%

τ = 0.10, σx = 1, σε = 2                 τ = 0.25, σx = 1, σε = 2
Sample Size  δ=0.50  δ=1.00  δ=2.00      Sample Size  δ=0.50  δ=1.00  δ=2.00
n = 100      85.0%   96.0%   94.3%       n = 100      94.3%   89.3%   87.6%
n = 200      86.3%   93.6%   91.6%       n = 200      89.6%   91.6%   86.6%
n = 300      90.6%   89.0%   89.6%       n = 300      90.6%   91.3%   88.9%

τ = 0.10, σx = 1, σε = 4                 τ = 0.25, σx = 1, σε = 4
Sample Size  δ=0.50  δ=1.00  δ=2.00      Sample Size  δ=0.50  δ=1.00  δ=2.00
n = 100      88.6%   90.3%   87.3%       n = 100      88.3%   86.6%   84.6%
n = 200      87.3%   86.6%   88.6%       n = 200      88.0%   84.3%   87.6%
n = 300      88.0%   89.6%   89.3%       n = 300      85.0%   87.6%   89.0%

τ = 0.10, σx = 2, σε = 1                 τ = 0.25, σx = 2, σε = 1
Sample Size  δ=0.50  δ=1.00  δ=2.00      Sample Size  δ=0.50  δ=1.00  δ=2.00
n = 100      91.6%   96.6%   98.3%       n = 100      83.6%   80.6%   79.6%
n = 200      91.0%   95.0%   96.3%       n = 200      87.3%   85.3%   84.3%
n = 300      92.6%   92.0%   94.6%       n = 300      81.6%   80.0%   86.0%

τ = 0.10, σx = 4, σε = 1                 τ = 0.25, σx = 4, σε = 1
Sample Size  δ=0.50  δ=1.00  δ=2.00      Sample Size  δ=0.50  δ=1.00  δ=2.00
n = 100      99.0%   99.3%   99.0%       n = 100      79.8%   80.0%   87.0%
n = 200      96.3%   98.6%   98.6%       n = 200      82.0%   79.6%   92.3%
n = 300      88.3%   95.3%   97.3%       n = 300      82.6%   81.0%   87.6%


Table 6: 90% coverage probabilities for q̂: homogeneity.

τ = 0.50, σx = 1, σε = 1                 τ = 0.75, σx = 1, σε = 1
Sample Size  δ=0.50  δ=1.00  δ=2.00      Sample Size  δ=0.50  δ=1.00  δ=2.00
n = 100      85.3%   89.0%   92.3%       n = 100      86.9%   92.3%   91.0%
n = 200      82.6%   86.0%   88.6%       n = 200      92.6%   92.0%   89.6%
n = 300      86.6%   88.6%   89.6%       n = 300      88.3%   89.3%   89.3%

τ = 0.50, σx = 1, σε = 2                 τ = 0.75, σx = 1, σε = 2
Sample Size  δ=0.50  δ=1.00  δ=2.00      Sample Size  δ=0.50  δ=1.00  δ=2.00
n = 100      85.6%   86.0%   82.6%       n = 100      80.6%   83.0%   89.0%
n = 200      83.6%   81.3%   84.0%       n = 200      82.0%   87.6%   93.6%
n = 300      82.3%   84.0%   86.6%       n = 300      79.6%   87.0%   92.9%

τ = 0.50, σx = 1, σε = 4                 τ = 0.75, σx = 1, σε = 4
Sample Size  δ=0.50  δ=1.00  δ=2.00      Sample Size  δ=0.50  δ=1.00  δ=2.00
n = 100      78.0%   80.0%   81.6%       n = 100      85.0%   88.6%   80.6%
n = 200      76.6%   79.0%   76.6%       n = 200      85.6%   87.3%   85.0%
n = 300      80.3%   81.6%   78.6%       n = 300      85.3%   88.6%   87.6%

τ = 0.50, σx = 2, σε = 1                 τ = 0.75, σx = 2, σε = 1
Sample Size  δ=0.50  δ=1.00  δ=2.00      Sample Size  δ=0.50  δ=1.00  δ=2.00
n = 100      94.3%   94.0%   94.1%       n = 100      93.0%   91.0%   89.6%
n = 200      88.3%   87.3%   89.6%       n = 200      91.0%   89.6%   90.3%
n = 300      86.0%   93.3%   92.3%       n = 300      86.6%   93.0%   91.6%

τ = 0.50, σx = 4, σε = 1                 τ = 0.75, σx = 4, σε = 1
Sample Size  δ=0.50  δ=1.00  δ=2.00      Sample Size  δ=0.50  δ=1.00  δ=2.00
n = 100      96.0%   91.6%   93.4%       n = 100      93.0%   92.6%   91.3%
n = 200      96.0%   96.3%   98.6%       n = 200      92.3%   94.0%   89.9%
n = 300      90.6%   92.0%   94.4%       n = 300      92.6%   91.6%   88.6%

16

Table 7: 90% Coverage Probabilities for q̂: Homogeneity.

τ = 0.90, σ_x = 1, σ_ε = 1
  Sample Size    δ = 0.50   δ = 1.00   δ = 2.00
  n = 100        94.3%      97.0%      97.6%
  n = 200        93.0%      96.0%      96.6%
  n = 300        93.3%      95.6%      94.0%

τ = 0.90, σ_x = 1, σ_ε = 2
  Sample Size    δ = 0.50   δ = 1.00   δ = 2.00
  n = 100        78.6%      88.0%      96.3%
  n = 200        78.3%      83.6%      96.0%
  n = 300        82.6%      85.6%      92.3%

τ = 0.90, σ_x = 1, σ_ε = 4
  Sample Size    δ = 0.50   δ = 1.00   δ = 2.00
  n = 100        75.0%      76.0%      76.6%
  n = 200        70.3%      75.0%      82.0%
  n = 300        71.6%      76.6%      86.6%

τ = 0.90, σ_x = 2, σ_ε = 1
  Sample Size    δ = 0.50   δ = 1.00   δ = 2.00
  n = 100        98.0%      97.3%      98.0%
  n = 200        96.0%      96.3%      96.3%
  n = 300        97.3%      97.0%      95.6%

τ = 0.90, σ_x = 4, σ_ε = 1
  Sample Size    δ = 0.50   δ = 1.00   δ = 2.00
  n = 100        97.0%      96.6%      96.0%
  n = 200        97.0%      96.3%      95.3%
  n = 300        94.3%      94.0%      94.6%

Table 8: 90% Coverage Probabilities for q̂: Heterogeneity.

τ = 0.10, σ_x = 1, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        96.3% (92.3%)    88.0% (91.6%)    84.6% (86.6%)
  n = 200        92.6% (89.6%)    85.0% (84.6%)    80.6% (82.0%)
  n = 300        88.3% (88.9%)    91.0% (89.6%)    82.6% (84.6%)

τ = 0.10, σ_x = 1, σ_ε = 2
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        81.3% (81.6%)    97.0% (96.0%)    96.0% (93.6%)
  n = 200        76.0% (78.6%)    93.0% (91.3%)    87.6% (92.0%)
  n = 300        80.6% (86.6%)    86.6% (85.6%)    92.6% (88.6%)

τ = 0.10, σ_x = 1, σ_ε = 4
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        79.0% (83.3%)    95.6% (93.6%)    97.6% (92.3%)
  n = 200        76.0% (76.3%)    87.6% (92.0%)    88.6% (92.0%)
  n = 300        73.0% (74.6%)    81.6% (82.3%)    86.6% (87.3%)

τ = 0.10, σ_x = 2, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        86.3% (88.0%)    96.6% (96.0%)    97.6% (93.6%)
  n = 200        88.6% (91.6%)    95.0% (94.6%)    93.0% (90.6%)
  n = 300        86.6% (88.3%)    86.0% (86.6%)    92.3% (91.6%)

τ = 0.10, σ_x = 4, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        99.0% (99.0%)    99.6% (98.6%)    98.3% (98.0%)
  n = 200        98.6% (98.0%)    99.0% (98.3%)    99.0% (98.3%)
  n = 300        98.0% (98.0%)    98.0% (98.0%)    98.6% (97.6%)

Table 9: 90% Coverage Probabilities for q̂: Heterogeneity.

τ = 0.25, σ_x = 1, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        95.3% (92.6%)    96.0% (94.6%)    87.6% (93.3%)
  n = 200        86.0% (92.0%)    88.6% (91.6%)    94.6% (92.3%)
  n = 300        84.6% (91.6%)    88.3% (92.4%)    86.6% (91.0%)

τ = 0.25, σ_x = 1, σ_ε = 2
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        95.6% (94.6%)    96.0% (94.3%)    93.0% (92.6%)
  n = 200        97.6% (97.3%)    95.6% (94.0%)    86.3% (88.6%)
  n = 300        91.3% (91.0%)    95.3% (93.6%)    94.2% (93.6%)

τ = 0.25, σ_x = 1, σ_ε = 4
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        92.0% (81.6%)    97.0% (92.6%)    93.6% (86.0%)
  n = 200        91.0% (79.6%)    92.0% (83.0%)    93.0% (87.6%)
  n = 300        86.6% (84.6%)    93.0% (82.6%)    90.6% (88.0%)

τ = 0.25, σ_x = 2, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        99.0% (99.0%)    99.6% (99.0%)    99.6% (98.6%)
  n = 200        98.0% (97.2%)    98.6% (98.3%)    98.0% (98.0%)
  n = 300        98.6% (98.3%)    98.0% (98.0%)    97.6% (97.3%)

τ = 0.25, σ_x = 4, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        99.0% (98.0%)    98.6% (98.0%)    98.6% (98.3%)
  n = 200        98.6% (98.3%)    96.3% (94.0%)    98.0% (97.0%)
  n = 300        97.0% (96.0%)    96.0% (95.3%)    96.6% (96.3%)

Table 10: 90% Coverage Probabilities for q̂: Heterogeneity.

τ = 0.50, σ_x = 1, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        89.3% (89.6%)    96.0% (97.3%)    94.0% (95.6%)
  n = 200        85.3% (84.6%)    95.0% (95.1%)    95.6% (96.0%)
  n = 300        88.3% (88.6%)    94.2% (94.6%)    93.3% (93.8%)

τ = 0.50, σ_x = 1, σ_ε = 2
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        87.0% (86.6%)    91.0% (90.6%)    93.6% (94.3%)
  n = 200        82.0% (82.3%)    83.0% (83.0%)    87.6% (88.0%)
  n = 300        88.3% (80.6%)    84.3% (83.0%)    89.6% (90.0%)

τ = 0.50, σ_x = 1, σ_ε = 4
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        75.3% (74.0%)    76.6% (75.0%)    82.3% (81.3%)
  n = 200        73.1% (71.3%)    76.3% (76.0%)    77.0% (74.0%)
  n = 300        74.0% (73.0%)    78.3% (78.0%)    76.6% (74.3%)

τ = 0.50, σ_x = 2, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        97.3% (97.0%)    98.3% (98.0%)    99.3% (99.0%)
  n = 200        89.3% (90.0%)    97.0% (95.6%)    96.0% (95.3%)
  n = 300        87.3% (88.6%)    96.6% (96.0%)    95.6% (94.3%)

τ = 0.50, σ_x = 4, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        98.6% (97.3%)    98.6% (98.0%)    99.3% (99.0%)
  n = 200        99.0% (98.6%)    98.0% (97.3%)    98.0% (98.0%)
  n = 300        97.6% (95.9%)    98.0% (97.0%)    96.6% (95.3%)

Table 11: 90% Coverage Probabilities for q̂: Heterogeneity.

τ = 0.75, σ_x = 1, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        94.0% (89.0%)    87.3% (88.6%)    91.0% (90.3%)
  n = 200        85.3% (87.0%)    92.0% (91.6%)    86.6% (87.6%)
  n = 300        81.6% (84.6%)    82.6% (92.6%)    87.0% (92.3%)

τ = 0.75, σ_x = 1, σ_ε = 2
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        86.0% (90.0%)    87.0% (89.3%)    96.3% (94.6%)
  n = 200        83.6% (88.0%)    92.0% (90.3%)    96.6% (96.0%)
  n = 300        90.3% (89.0%)    93.0% (92.0%)    93.6% (92.3%)

τ = 0.75, σ_x = 1, σ_ε = 4
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        71.3% (84.3%)    92.6% (90.0%)    86.0% (88.0%)
  n = 200        74.3% (87.0%)    89.6% (89.3%)    88.3% (87.9%)
  n = 300        83.6% (94.0%)    93.0% (87.6%)    92.0% (86.3%)

τ = 0.75, σ_x = 2, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        99.0% (99.0%)    99.6% (99.0%)    99.6% (98.6%)
  n = 200        98.0% (97.2%)    98.6% (98.3%)    98.0% (98.0%)
  n = 300        98.6% (98.3%)    98.0% (98.0%)    97.6% (97.3%)

τ = 0.75, σ_x = 4, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        99.0% (98.0%)    98.6% (98.0%)    98.6% (98.3%)
  n = 200        98.6% (98.3%)    96.3% (94.0%)    98.0% (97.0%)
  n = 300        97.0% (96.0%)    96.0% (95.3%)    96.6% (96.3%)

Table 12: 90% Coverage Probabilities for q̂: Heterogeneity.

τ = 0.90, σ_x = 1, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        80.0% (88.0%)    87.0% (94.0%)    86.6% (92.3%)
  n = 200        79.6% (85.0%)    93.0% (86.0%)    91.0% (88.0%)
  n = 300        76.6% (86.3%)    85.0% (87.0%)    87.0% (88.0%)

τ = 0.90, σ_x = 1, σ_ε = 2
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        83.3% (89.3%)    80.6% (85.6%)    88.0% (88.3%)
  n = 200        86.6% (87.0%)    86.3% (87.0%)    95.0% (92.0%)
  n = 300        94.6% (93.6%)    94.0% (93.3%)    92.6% (90.3%)

τ = 0.90, σ_x = 1, σ_ε = 4
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        84.0% (84.6%)    82.6% (83.0%)    77.3% (78.6%)
  n = 200        71.0% (75.6%)    74.0% (76.6%)    85.0% (85.3%)
  n = 300        70.3% (72.6%)    74.6% (77.3%)    87.0% (88.0%)

τ = 0.90, σ_x = 2, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        86.3% (86.6%)    88.3% (89.6%)    91.0% (88.0%)
  n = 200        97.0% (87.6%)    96.0% (94.6%)    95.6% (94.6%)
  n = 300        96.6% (90.6%)    98.3% (93.3%)    92.3% (89.3%)

τ = 0.90, σ_x = 4, σ_ε = 1
  Sample Size    δ = 0.50         δ = 1.00         δ = 2.00
  n = 100        95.6% (92.3%)    96.3% (96.0%)    97.0% (97.3%)
  n = 200        94.0% (90.6%)    93.0% (90.3%)    95.0% (92.3%)
  n = 300        95.3% (90.3%)    94.0% (90.0%)    94.0% (92.0%)

We briefly comment on the simulation results. In the case of the median, the confidence regions under homogeneity are close to their nominal value, but they become conservative when the variance of the regressor x increases, while increasing the variance of the error ε results in confidence regions that undercover. The confidence regions under heterogeneity generally overcover, but they improve as the sample size and the size of the jump increase. The overcoverage gets more serious when we increase the variability of ε, while increasing the variability of x again results in undercovering confidence regions. For the other quantiles the picture is different. Increasing the variability of x results in conservative confidence regions at the extreme quantiles τ = 0.10 and τ = 0.90, while increasing the variability of ε has smaller effects on the confidence regions when the sample size and the size of the jump both increase. The same quantiles show similar behavior under heterogeneity. For the quantiles τ = 0.25 and τ = 0.75 under homogeneity, increasing the variability of both x and ε improves the behavior of the confidence sets. Under heterogeneity, increasing the variance of ε improves the confidence sets, but increasing the variance of x makes the confidence sets more conservative. Concerning the bandwidth parameter, the results from cross-validation look better in general and should be preferred on the basis of this simulation exercise. We should also mention that when the confidence regions overcover, it helps to choose a larger bandwidth (undersmoothing). We have tried this and it helped with conservative confidence regions, but we do not report the results here, since we cannot formally advise how large one needs to set the bandwidth in order to improve the test performance.7
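The two bandwidth choices discussed above can be sketched in a few lines. This is a minimal illustration with a Gaussian kernel and least-squares cross-validation, not the implementation used to produce the tables; the function names and the candidate-bandwidth grid are our own.

```python
import numpy as np

def silverman_bandwidth(x):
    # Silverman's rule of thumb for a Gaussian kernel:
    # h = 0.9 * min(std, IQR/1.34) * n^(-1/5)
    x = np.asarray(x, dtype=float)
    iqr = np.percentile(x, 75) - np.percentile(x, 25)
    sigma = min(x.std(ddof=1), iqr / 1.34)
    return 0.9 * sigma * x.size ** (-0.2)

def cv_bandwidth(x, grid):
    # least-squares cross-validation for a Gaussian-kernel density estimate:
    # choose h minimizing  int fhat^2 - (2/n) * sum_i fhat_{-i}(x_i)
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]                      # pairwise differences
    best_h, best_score = grid[0], np.inf
    for h in grid:
        k = np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
        # closed form for int fhat^2: Gaussian kernel with bandwidth h*sqrt(2)
        k2 = np.exp(-0.25 * (d / h) ** 2) / (h * np.sqrt(4.0 * np.pi))
        int_f2 = k2.sum() / n ** 2
        loo = (k.sum(axis=0) - k.diagonal()) / (n - 1)   # leave-one-out fhat(x_i)
        score = int_f2 - 2.0 * loo.mean()
        if score < best_score:
            best_h, best_score = h, score
    return best_h

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
h_rot = silverman_bandwidth(x)
h_cv = cv_bandwidth(x, np.linspace(0.05, 1.0, 40))
```

Undersmoothing, as mentioned above, simply amounts to passing a deliberately perturbed bandwidth instead of the selected one.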

5  Testing for the Threshold Effect

In this section we present some new asymptotic results that follow from taking serially correlated errors into account. For convenience of the asymptotic analysis, we define

  H_n(τ, q) = n^{-1} Σ_{i=1}^n f_i(Z_{q,i}'α_0(τ)) Z_{q,i} Z_{q,i}'   and   J_n(τ, q) = n^{-1/2} Σ_{i=1}^n ψ_τ(y_i − Z_{q,i}'α_0(τ)) Z_{q,i},

where, under the null hypothesis of no threshold effects, α_0(τ) = (γ_0(τ)', β_0(τ)', 0')' and ψ_τ(u) = τ − 1{u < 0}, the influence function in quantile regression.

[A.5] We assume that:7

7 It is known that confidence regions based on Hansen's threshold least squares estimator behave better, since the errors are assumed standard normally distributed, which implies that the least squares estimator is more efficient than the LAD estimator. In particular, for the case of homoskedastic errors, the efficiency gains are captured by the ratio of the scale terms in least squares and quantile regression respectively, i.e. Eff = ω/ω_QR = f_i(F^{-1}(τ)|z_i)² σ² / (τ(1 − τ)). This can be large for non-Gaussian error distributions; see Caner (2002).

(i) sup_{q∈Q} ‖ n^{-1} Σ_{i=1}^n Z_{q,i} Z_{q,i}' − Q_0(q) ‖ = o_P(1), for some finite, symmetric and positive definite matrix Q_0(q) = E(Z_{q,i} Z_{q,i}');

(ii) sup_{(τ,q)∈T×Q} ‖ H_n(τ, q) − H_0(τ, q) ‖ = o_P(1), for some finite, symmetric and positive definite matrix H_0(τ, q) = E[ f_i(Z_{q,i}'α_0(τ)) Z_{q,i} Z_{q,i}' ], for τ ∈ T, q ∈ Q.

Assumption [A.5] facilitates the asymptotic analysis and can easily be modified to accommodate a location-shift specification. We can now state our result.

Theorem 5.1 Under assumptions [A.1]-[A.5], for all q ∈ Q = [q_L, q_U] and τ ∈ T = [τ_L, τ_U], we have

  sup_{(τ,q)∈T×Q} ‖ n^{1/2}( α̂(τ, q) − α_0(τ, q) ) − H_n(τ, q)^{-1} J_n(τ, q) ‖ = o_P(1),   (6)

and

  J_n(τ, q) = n^{-1/2} Σ_{i=1}^n ψ_τ( y_i − Z_{q,i}'α_0(τ) ) Z_{q,i}  ⇒  B*(τ, q),

where B*(τ, q) is a two-parameter Gaussian process with mean zero and variance-covariance matrix given by

  E[ B*(τ_1, q_1) B*(τ_2, q_2)' ] = Σ_{h=−∞}^{∞} [ P( ε_i(τ_1) ≤ 0, ε_{i+h}(τ_2) ≤ 0 ) − τ_1 τ_2 ] K(h, q_1, q_2),

where

  K(h, q_1, q_2) = | E(z_{1,i} z_{1,i+h}')    E(z_{1,i} z_{2,i+h}')    E(z_{1,i} z_{q_2,i+h}')   |
                   | E(z_{2,i} z_{1,i+h}')    E(z_{2,i} z_{2,i+h}')    E(z_{2,i} z_{q_2,i+h}')   |
                   | E(z_{q_1,i} z_{1,i+h}')  E(z_{q_1,i} z_{2,i+h}')  E(z_{q_1,i} z_{q_2,i+h}') |.

The above asymptotic representation implies that

  n^{1/2}( α̂(τ, q) − α_0(τ, q) )  ⇒  H_0(τ, q)^{-1} B*(τ, q),

for the same B*(τ, q) as above. This is an extension of Theorem 1 in Galvao et al. (2011), in that the covariance matrix of the two-parameter Gaussian process takes a complicated form and is heavily dependent on nuisance parameters. This makes the problem of developing tests for detecting threshold effects on quantiles hard.
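As a sanity check on these definitions, the behavior of J_n(τ, q) under the null can be simulated directly. The data-generating process below, a linear median model with i.i.d. standard normal errors and a single candidate threshold, is our own illustrative choice: with independent errors only the h = 0 term of the kernel survives, so the mean of J_n is approximately zero and its variance approximately τ(1 − τ)E[Z_{q,i} Z_{q,i}'].

```python
import numpy as np

rng = np.random.default_rng(1)
tau, n, reps, q = 0.5, 400, 200, 0.0

def psi(u, tau):
    # influence function psi_tau(u) = tau - 1{u < 0}
    return tau - (u < 0)

def J_n(y, Z, alpha0, tau):
    # J_n(tau, q) = n^{-1/2} * sum_i psi_tau(y_i - Z_i' alpha0) Z_i
    u = y - Z @ alpha0
    return (psi(u, tau)[:, None] * Z).sum(axis=0) / np.sqrt(len(y))

# illustrative DGP under the null of no threshold effect: y = 1 + x + e,
# e ~ N(0, 1) i.i.d., so alpha0(0.5) = (1, 1, 0)'
draws = np.empty((reps, 3))
for r in range(reps):
    x = rng.standard_normal(n)
    e = rng.standard_normal(n)
    y = 1.0 + x + e
    Z = np.column_stack([np.ones(n), x, x * (x <= q)])  # Z_{q,i} with threshold column
    draws[r] = J_n(y, Z, np.array([1.0, 1.0, 0.0]), tau)

# with i.i.d. errors: mean(draws) ~ 0 and var(draws) ~ tau(1-tau) E[Z Z']
```

Under serial correlation the h ≠ 0 terms enter as well, which is exactly what complicates inference in this section.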

For example, if we are interested in testing for threshold effects at (τ, q) ∈ T × Q, we can proceed as in Galvao et al. (2014) and propose a Wald test defined by

  W_n(τ, q) = n ( Rα̂(τ, q) )' { R [H_n(τ, q)]^{-1} J_n(τ, q) [H_n(τ, q)]^{-1} R' }^{-1} ( Rα̂(τ, q) ).   (7)

In this framework, testing the hypothesis of no heterogeneity in the effects of the quantiles of x_i on the conditional distribution of y_i, we reject the null if sup_{(τ,q)∈T×Q} W_n(τ, q) > w_0, for some constant w_0, as suggested by Davies (1977, 1987). Then, at some unknown τ ∈ T and q ∈ Q, we have

  sup_{(τ,q)∈T×Q} W_n(τ, q)  ⇒  sup_{(τ,q)∈T×Q} W_0(τ, q),   (8)

where the process W_0(τ, q) is given by

  W_0(τ, q) = ( R[H_0(τ, q)]^{-1} B*(τ, q) )' { R[H_0(τ, q)]^{-1} Q_0(q) [H_0(τ, q)]^{-1} R' }^{-1} ( R[H_0(τ, q)]^{-1} B*(τ, q) ),

and B*(τ, q) is the two-parameter mean-zero Gaussian process with the covariance kernel defined in Theorem 5.1. As it stands, simulating critical values for the above test is a very hard task, since the variance-covariance kernel of the two-parameter Gaussian process is heavily dependent on nuisance parameters and, to the best of our knowledge, there is no known way to remove these effects by re-centering or some other device. Currently, this issue is under investigation by the authors.

6  Concluding Remarks

We have formulated a general threshold quantile regression model with one threshold value, which we call the "partition quantile", and derived asymptotic results under the null and the alternative hypothesis, both for the estimators and for the estimated unknown partition quantile, under a "shrinking shifts" asymptotic framework. We have constructed confidence intervals for the estimated partition quantile by inverting a likelihood-ratio-type test statistic and have derived its limiting distribution. The coverage probability of the estimated confidence regions is assessed through a simulation exercise. An alternative approach, which perhaps gives better results across all simulation settings considered in this paper, is subsampling; see Gonzalo and Wolf (2005). This is a topic for future research.

We have shown that the standardized estimators converge to a two-parameter Gaussian process that extends that of Galvao et al. (2014), since our error process allows for serial correlation. The fact that the variance-covariance matrix of this process is heavily dependent on nuisance parameters makes the problem of inference hard. One possibility is to use a resampling technique, such as the bootstrap or subsampling, to tabulate critical values for a Wald statistic; this is currently investigated by the authors. The model here can be extended to accommodate multiple regimes. It might also be of interest to allow for a continuous threshold by replacing the indicator function with an integrated kernel, as in Seo and Linton (2006). These topics are left for future research.

7  Technical Appendix

Proof of Theorem 3.1: This proof follows closely Lemma 2 of Galvao et al. (2011), with minor modifications, and hence is omitted. □

Proof of Theorem 4.1: We proceed as in Su and Xiao (2008) and define the following quantities:

  α(τ) = (γ(τ)', β(τ)', δ(τ)')',  α_{0,τ} = α_0(τ) = (γ_{0,τ}', β_{0,τ}', δ_{0,τ}')',  and
  α̂_{τ,q} = α̂(τ, q) = (γ̂(τ, q)', β̂(τ, q)', δ̂(τ, q)')' = (γ̂_{τ,q}', β̂_{τ,q}', δ̂_{τ,q}')'.

Also, define

  φ̂_{τ,q} = ( n^{1/2}(γ̂_{τ,q} − γ_{0,τ})', n^{1/2}(β̂_{τ,q} − β_{0,τ})', n^{1/2}(δ̂_{τ,q} − δ_{0,τ})' )',
  φ_τ = ( n^{1/2}(γ(τ) − γ_{0,τ})', n^{1/2}(β(τ) − β_{0,τ})', n^{1/2}(δ(τ) − δ_{0,τ})' )',
  and Z_{q,i} = ( z_{1,i}', z_{2,i}', z_{q,i}' )'.

Notice that

  φ̂_{τ,q} = argmin_{φ_τ∈R^p} Σ_{i=1}^n ρ_τ( y_i − α_{0,τ}'Z_{q,i} − n^{-1/2} φ_τ'Z_{q,i} ).

Set

  S_n(τ, q; φ_τ) = n^{-1/2} Σ_{i=1}^n ψ_τ( y_i − α_{0,τ}'Z_{q,i} − n^{-1/2} φ_τ'Z_{q,i} ) Z_{q,i}

and

  S̄_n(τ, q; φ_τ) = n^{-1/2} Σ_{i=1}^n E[ ψ_τ( y_i − α_{0,τ}'Z_{q,i} − n^{-1/2} φ_τ'Z_{q,i} ) Z_{q,i} ],

and observe that −φ_τ' S̄_n(τ, q; vφ_τ) is an increasing function of v ≥ 1. Theorem 4.1 is a consequence of the following lemmata, as in Su and Xiao (2008).

Lemma 7.1 Under assumptions [A.1]-[A.5],

  sup_{(τ,q)∈T×Q} sup_{‖φ‖≤M} ‖ S_n(τ, q; φ) − S_n(τ, q; 0) − [ S̄_n(τ, q; φ) − S̄_n(τ, q; 0) ] ‖ = o_P(1).

Proof: Define

  Ξ_n(τ, q; φ) = −{ S_n(τ, q; φ) − S_n(τ, q; 0) − [ S̄_n(τ, q; φ) − S̄_n(τ, q; 0) ] } = n^{-1/2} Σ_{i=1}^n ξ̄_{n,i}(τ, q; φ),

where ξ̄_{n,i}(τ, q; φ) = ξ_{n,i}(τ, q; φ) − E_i[ ξ_{n,i}(τ, q; φ) ] and

  ξ_{n,i}(τ, q; φ) = [ 1{ y_i ≤ (α_{0,τ} + n^{-1/2}φ)'Z_{q,i} } − 1{ y_i ≤ α_{0,τ}'Z_{q,i} } ] Z_{q,i}.

We need to show that ‖Ξ_{n,k}(τ, q; φ)‖ = o_P(1) for fixed τ, q, φ and k = 1, 2, …, 2p, where Ξ_{n,k}(·) is the kth element of Ξ_n(·); we correspondingly define the kth elements Z_{q,i,k}, ξ̄_{n,i,k}, ξ_{n,i,k} of Z_{q,i}, ξ̄_{n,i}, ξ_{n,i}. If each ‖Ξ_{n,k}(τ, q; φ)‖ is o_P(1), we can then conclude that ‖Ξ_n(τ, q; φ)‖ = o_P(1). By construction,

  E[ ξ̄_{n,i}(τ, q; φ) ] = E[ ξ_{n,i}(τ, q; φ) ] − E[ ξ_{n,i}(τ, q; φ) ] = 0.

We now calculate the variance of Ξ_{n,k}(·):

  Var[ Ξ_{n,k}(τ, q; φ) ] = n^{-1} Σ_{i=1}^n E[ Var_i[ ξ_{n,i,k}(τ, q; φ) ] ]
  ≤ n^{-1} Σ_{i=1}^n E[ E_i[ ξ_{n,i,k}(τ, q; φ)² ] ]        (Jensen's inequality)
  ≤ n^{-1} Σ_{i=1}^n E[ | F_i( (α_{0,τ} + n^{-1/2}φ)'Z_{q,i} ) − F_i( α_{0,τ}'Z_{q,i} ) | Z²_{q,i,k} ]
  ≤ n^{-3/2} Σ_{i=1}^n E[ C_{1i} ‖φ'Z_{q,i}‖ Z²_{q,i,k} ]
  ≤ C max_{i≤n}( n^{-1/2} ‖Z_{q,i}‖ ) n^{-1} Σ_{i=1}^n C_{1i} ‖Z_{q,i}‖²
  ≤ C max_{i≤n}( n^{-1/2} ‖z_i‖ ) n^{-1} Σ_{i=1}^n C_{1i} ‖z_i‖² = o_P(1),

by assumption [A.4] and because ‖Z_{q,i}‖ ≤ ‖z_i‖. An application of Chebyshev's inequality then yields ‖Ξ_{n,k}(τ, q; φ)‖ = o_P(1).

We next prove that this asymptotic negligibility holds uniformly in Φ = {φ : ‖φ‖ ≤ M}, for fixed τ ∈ T, q ∈ Q and constant M ∈ (0, ∞). For this, it is sufficient to prove that

  sup_{(τ,q)∈T×Q} sup_{‖φ‖≤M} ‖Ξ⁺_{n,k}(τ, q; φ)‖ = o_P(1)  and  sup_{(τ,q)∈T×Q} sup_{‖φ‖≤M} ‖Ξ⁻_{n,k}(τ, q; φ)‖ = o_P(1),   (A.1)

where Ξ⁺_{n,k}(τ, q; φ) and Ξ⁻_{n,k}(τ, q; φ) are defined as Ξ_{n,k}(τ, q; φ) but with Z_{q,i,k} replaced by Z⁺_{q,i,k} ≡ max(Z_{q,i,k}, 0) and Z⁻_{q,i,k} ≡ max(−Z_{q,i,k}, 0), respectively. We will show only the first relation in (A.1), as the second follows along similar lines. Define

  Ξ̄⁺_{n,k}(τ, q; φ, v) = n^{-1/2} Σ_{i=1}^n { 1{ y_i ≤ (α_{0,τ} + n^{-1/2}φ)'Z_{q,i} + v‖n^{-1/2}Z_{q,i}‖ }
      − F_i( (α_{0,τ} + n^{-1/2}φ)'Z_{q,i} + v‖n^{-1/2}Z_{q,i}‖ ) − 1{ y_i ≤ α_{0,τ}'Z_{q,i} } + F_i( α_{0,τ}'Z_{q,i} ) } Z⁺_{q,i,k}.

Notice that when v = 0, Ξ̄⁺_{n,k}(τ, q; φ, 0) = Ξ⁺_{n,k}(τ, q; φ). As in Su and Xiao (2008), (A.1) follows from

  sup_{(τ,q)∈T×Q} ‖Ξ̄⁺_{n,k}(τ, q; φ, v)‖ = o_P(1),   (A.2)

for fixed v and φ. Since Φ is compact, we can partition it into a finite number N(σ) of subsets {Φ_1, …, Φ_{N(σ)}}, each of diameter not greater than σ. Fix s ∈ {1, …, N(σ)} and φ_s ∈ Φ_s, and note that

  φ'Z_{q,i} ≤ φ_s'Z_{q,i} + σ‖Z_{q,i}‖,  for all φ ∈ Φ_s.

By the monotonicity of the indicator function and the non-negativity of Z⁺_{q,i,k}, for any φ ∈ Φ_s we have

  Ξ⁺_{n,k}(τ, q; φ) ≤ Ξ̄⁺_{n,k}(τ, q; φ_s, σ)
    + n^{-1/2} Σ_{i=1}^n { F_i( (α_{0,τ} + n^{-1/2}φ_s)'Z_{q,i} + σ‖n^{-1/2}Z_{q,i}‖ ) − F_i( (α_{0,τ} + n^{-1/2}φ_s)'Z_{q,i} ) } Z⁺_{q,i,k}.

The reverse inequality holds with −σ, for all φ_s ∈ Φ_s. We have

  sup_{τ∈T} ‖ n^{-1/2} Σ_{i=1}^n { F_i( (α_{0,τ} + n^{-1/2}φ_s)'Z_{q,i} + σ‖n^{-1/2}Z_{q,i}‖ ) − F_i( (α_{0,τ} + n^{-1/2}φ_s)'Z_{q,i} ) } Z⁺_{q,i,k} ‖
  ≤ σ n^{-1} Σ_{i=1}^n C_{1,i} ‖Z_{q,i}‖ Z⁺_{q,i,k} ≤ σ n^{-1} Σ_{i=1}^n C_{1,i} ‖z_i‖ Z⁺_{q,i,k} = σ O_P(1),

with the O_P(1) term uniform in q ∈ Q. Therefore,

  sup_{(τ,q)∈T×Q} sup_{‖φ‖≤M} ‖Ξ⁺_{n,k}(τ, q; φ)‖
  ≤ sup_{s≤N(σ)} sup_{(τ,q)∈T×Q} ‖Ξ̄⁺_{n,k}(τ, q; φ_s, σ)‖ + sup_{s≤N(σ)} sup_{(τ,q)∈T×Q} ‖Ξ̄⁺_{n,k}(τ, q; φ_s, −σ)‖ + σ O_P(1).

Since Φ is compact, σ can be made arbitrarily small and N(σ) is finite. Therefore (A.1) follows from (A.2).

To show (A.2), we use the chaining technique again. Fix v and φ, and let N_1 ≡ N_1(n) be an integer such that N_1 = [n^{1/2+d}] + 1 for d ∈ (0, 1/2), where [·] denotes the integer part of the argument, and divide T into N_1 sub-intervals by the points c_1 = τ_0 < τ_1 < ⋯ < τ_{N_1} = 1 − c_1, the length of each sub-interval being δ* = (1 − 2c_1)/N_1. By assumption A.2(i) in Su and Xiao (2008), for all τ_i, τ_j ∈ T such that |τ_i − τ_j| ≤ δ*, we get

  ‖α_{0,τ_j} − α_{0,τ_i}‖ ≤ (p + q) C_0 |τ_i − τ_j| ≤ (p + q) C_0 δ* ≡ C*.

By the monotonicity of both the indicator function and the distribution function F_i(·), for τ_s ≤ τ ≤ τ_{s+1} we have

  Ξ̄⁺_{n,k}(τ, q; φ, v) − Ξ̄⁺_{n,k}(τ_{s+1}, q; φ, v)
  ≤ n^{-1/2} Σ_{i=1}^n { F_i( (α_{0,τ_{s+1}} + n^{-1/2}φ)'Z_{q,i} + v‖n^{-1/2}Z_{q,i}‖ ) − F_i( (α_{0,τ} + n^{-1/2}φ)'Z_{q,i} + v‖n^{-1/2}Z_{q,i}‖ ) } Z⁺_{q,i,k}
  + n^{-1/2} Σ_{i=1}^n { 1{ y_i ≤ α_{0,τ_{s+1}}'Z_{q,i} } − F_i( α_{0,τ_{s+1}}'Z_{q,i} ) − 1{ y_i ≤ α_{0,τ}'Z_{q,i} } + F_i( α_{0,τ}'Z_{q,i} ) } Z⁺_{q,i,k},

since α_{0,τ}'Z_{q,i} = ζ_{0,τ}'z_i ≤ ζ_{0,τ_{s+1}}'z_i = α_{0,τ_{s+1}}'Z_{q,i}, where the first equality holds because under the null hypothesis δ_τ = 0, hence ζ_{0,τ} = (γ_{0,τ}', β_{0,τ}')'. Notice also that ζ_{0,τ}'z_i is the τth quantile of y_i given z_i. A reverse inequality holds with τ_{s+1} replaced by τ_s. Therefore, we get

  sup_{(τ,q)∈T×Q} ‖Ξ̄⁺_{n,k}(τ, q; φ, v)‖ ≤ max_{0≤s≤N_1} sup_{q∈Q} ‖Ξ̄⁺_{n,k}(τ_s, q; φ, v)‖   (A.3)
  + max_{0≤s≤N_1−1} sup_{q∈Q} ‖ n^{-1/2} Σ_{i=1}^n { F_i( α_{0,τ_{s+1}}'Z_{q,i} + n^{-1/2}(φ'Z_{q,i} + v‖Z_{q,i}‖) ) − F_i( α_{0,τ_s}'Z_{q,i} + n^{-1/2}(φ'Z_{q,i} + v‖Z_{q,i}‖) ) } Z⁺_{q,i,k} ‖   (A.4)
  + sup_{τ_ℓ,τ_m∈T, |τ_ℓ−τ_m|≤δ*} sup_{q∈Q} ‖ n^{-1/2} Σ_{i=1}^n { 1{ y_i ≤ α_{0,τ_ℓ}'Z_{q,i} } − F_i( α_{0,τ_ℓ}'Z_{q,i} ) − 1{ y_i ≤ α_{0,τ_m}'Z_{q,i} } + F_i( α_{0,τ_m}'Z_{q,i} ) } Z⁺_{q,i,k} ‖.   (A.5)

By a mean value expansion, we deduce that (A.4) is o_P(1). Under the null hypothesis, (A.5) becomes

  sup_{τ_ℓ,τ_m∈T, |τ_ℓ−τ_m|≤δ*} sup_{q∈Q} ‖ n^{-1/2} Σ_{i=1}^n { 1{F_i(y_i) ≤ τ_ℓ} − τ_ℓ − 1{F_i(y_i) ≤ τ_m} + τ_m } Z⁺_{q,i,k} ‖.   (A.6)

It therefore suffices to prove that (A.6) = o_P(1). We know that the F_i(y_i) are i.i.d. U(0, 1) from Diebold et al. (1998), and we need to show that (A.6) is stochastically equicontinuous. This can be done by using Lemma A.1 of Qu (2008). Consider the subgradient process under the null, defined by

  S_n(τ, q, α) = n^{-1/2} Σ_{i=1}^n { 1{ y_i ≤ α_{0,τ}'Z_{q,i} } − F_i( α_{0,τ}'Z_{q,i} ) } Z⁺_{q,i}.

Take T × Q as the parameter space and define the metric

  ρ( {τ_1, q_1}, {τ_2, q_2} ) = |q_1 − q_2| + |τ_1 − τ_2|.

We need to show that the stochastic process S_n(τ, q, α) is stochastically equicontinuous on the metric space (T × Q, ρ), which means that for any ε, η > 0 there exists some δ > 0 such that, for large n,

  P( sup_{[δ]} ‖ S_n(τ_1, q_1, α(τ_1)) − S_n(τ_2, q_2, α(τ_2)) ‖ > η ) < ε,

with [δ] = { (s_1, s_2) ∈ (T × Q)² : s_1 = {τ_1, q_1}, s_2 = {τ_2, q_2}, ρ(s_1, s_2) < δ }. The proof is as in Lemma A.1 of Qu (2008) and is therefore omitted. We conclude that (A.6), and hence (A.5), is o_P(1).

We now proceed to show that (A.3) is o_P(1). Take ε > 0 and notice the following:

  P( max_{0≤s≤N_1} sup_{q∈Q} ‖Ξ̄⁺_{n,k}(τ_s, q; φ, v)‖ > ε ) ≤ (N_1 + 1) max_{0≤s≤N_1} P( sup_{q∈Q} ‖Ξ̄⁺_{n,k}(τ_s, q; φ, v)‖ > ε ).   (A.7)

As before, Z_{q,i} = (z_{1,i}', z_{2,i}', z_{q,i}')' = (z_i', z_{q,i}')', and write φ = (φ_1', φ_2')', with φ_1 and φ_2 p-vectors. Let

  η_{n,1,i} = n^{-1/2}( φ_1'z_i + v‖z_i‖ )  and  η_{n,2,i} = n^{-1/2}( φ_1'z_i + φ_2'z_i + √2 v‖z_i‖ ).

Then η_{n,q,i} = η_{n,2,i} in the regime where x_i ≤ q, while η_{n,q,i} = η_{n,1,i} in the regime where x_i > q. Define, for j = 1, 2,

  s*_{n,j,i} = 1{ y_i ≤ ζ_{0,τ_s}'z_i + η_{n,j,i} } − F_i( ζ_{0,τ_s}'z_i + η_{n,j,i} ) − 1{ y_i ≤ ζ_{0,τ_s}'z_i } + F_i( ζ_{0,τ_s}'z_i ).

We bound the probability on the right-hand side of (A.7) by considering two cases, corresponding to our two regimes.

CASE 1: the regime where x_i ≤ q. We have Z⁺_{q,i,k} = z⁺_{q,i,k} = z⁺_{2,i,k} 1{x_i ≤ q}. Therefore

  Ξ̄⁺_{n,k}(τ_s, q; φ, v) = n^{-1/2} Σ_{i=1}^n s*_{n,2,i} z⁺_{q,i,k} + n^{-1/2} Σ_{i=1}^n s*_{n,1,i} z⁺_{i,k},

and we get

  P( sup_{q∈Q} ‖Ξ̄⁺_{n,k}(τ_s, q; φ, v)‖ > ε )
  ≤ P( sup_{q∈Q} ‖ n^{-1/2} Σ_{i=1}^n s*_{n,2,i} z⁺_{q,i,k} ‖ > ε/2 ) + P( sup_{q∈Q} ‖ n^{-1/2} Σ_{i=1}^n s*_{n,1,i} z⁺_{i,k} ‖ > ε/2 ) ≡ (A) + (B).

Since { (s*_{n,2,i} z⁺_{q,i,k}, F_i), 1 ≤ i ≤ n } is a martingale difference sequence with respect to the σ-algebra F_i generated by the regressors and their lagged values, using Doob's inequality we get

  (A) ≤ (16 / (n² ε⁴)) E ‖ Σ_{i=1}^n s*_{n,2,i} z⁺_{q,i,k} ‖⁴.

Rosenthal's inequality gives

  E ‖ Σ_{i=1}^n s*_{n,2,i} z⁺_{q,i,k} ‖⁴ ≤ C Σ_{i=1}^n E[ (s*_{n,2,i} z⁺_{q,i,k})⁴ ] + C E[ Σ_{i=1}^n E( (s*_{n,2,i} z⁺_{q,i,k})² | F_i ) ]² ≡ (C) + (D).

Now, (C) is O(n), and for (D) note that, since z⁺_{q,i,k} is measurable with respect to F_{i−1}, we have

  E[ s*_{n,2,i} | F_{i−1} ] ≤ F_i( ζ_{0,τ_s}'z_i + η_{n,2,i} ) − F_i( ζ_{0,τ_s}'z_i ) ≤ C_{1,i} η_{n,2,i},

and therefore

  (D) ≤ C E[ Σ_{i=1}^n C_{1,i} η_{n,2,i} (z⁺_{q,i,k})² ] = O(n).

We conclude that E‖Σ_{i=1}^n s*_{n,2,i} z⁺_{q,i,k}‖⁴ = O(n). Also,

  P( sup_{q∈Q} ‖ n^{-1/2} Σ_{i=1}^n s*_{n,2,i} z⁺_{q,i,k} ‖ > ε/2 ) = O(1/n)  and  P( sup_{q∈Q} ‖ n^{-1/2} Σ_{i=1}^n s*_{n,1,i} z⁺_{i,k} ‖ > ε/2 ) = O(1/n),

and therefore

  P( max_{0≤s≤N_1} sup_{q∈Q} ‖Ξ̄⁺_{n,k}(τ_s, q; φ, v)‖ > ε ) = O(N_1/n) = o(1).

CASE 2: the regime where x_i > q. We have Z⁺_{q,i,k} = z⁺_{i,k}, and it follows that

  Ξ̄⁺_{n,k}(τ_s, q; φ, v) = n^{-1/2} Σ_{i=1}^n s*_{n,1,i} z⁺_{i,k}.

As before, we have

  P( max_{0≤s≤N_1} sup_{q∈Q} ‖Ξ̄⁺_{n,k}(τ_s, q; φ, v)‖ > ε ) = O(N_1/n) = o(1).

Putting the two cases together and applying Chebyshev's inequality, we conclude that (A.3) = o_P(1), and the proof of the lemma is complete. □

Lemma 7.2 Under assumptions [A.1]-[A.5],

  sup_{(τ,q)∈T×Q} sup_{‖φ‖≤M} ‖ S̄_n(τ, q; φ) − S̄_n(τ, q; 0) + H_n(τ, q)φ ‖ = o_P(1).

Proof: Remember that H_n(τ, q) = n^{-1} Σ_{i=1}^n f_i( Z_{q,i}'α_{0,τ} ) Z_{q,i} Z_{q,i}'. Under assumptions [A.1]-[A.5], we have

  sup_{(τ,q)∈T×Q} sup_{‖φ‖≤M} ‖ S̄_n(τ, q; φ) − S̄_n(τ, q; 0) + H_n(τ, q)φ ‖
  = sup_{(τ,q)∈T×Q} sup_{‖φ‖≤M} ‖ n^{-1/2} Σ_{i=1}^n [ F_i( (α_{0,τ} + n^{-1/2}φ)'Z_{q,i} ) − F_i( α_{0,τ}'Z_{q,i} ) ] Z_{q,i} − H_n(τ, q)φ ‖
  = sup_{(τ,q)∈T×Q} sup_{‖φ‖≤M} ‖ n^{-1} Σ_{i=1}^n ∫_0^1 [ f_i( (α_{0,τ} + s n^{-1/2}φ)'Z_{q,i} ) − f_i( α_{0,τ}'Z_{q,i} ) ] ds Z_{q,i} Z_{q,i}' φ ‖
  ≤ sup_{(τ,q)∈T×Q} sup_{‖φ‖≤M} n^{-1} Σ_{i=1}^n C_{2,i} | n^{-1/2} φ'Z_{q,i} | ‖ Z_{q,i} Z_{q,i}' φ ‖
  ≤ 2 M² max_{1≤i≤n}[ n^{-1/2} ‖z_i‖ ] n^{-1} Σ_{i=1}^n C_{2,i} ‖z_i‖² = o_P(1). □
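The linearization in Lemma 7.2 can be checked numerically when the error distribution is known. The sketch below assumes standard normal errors and a two-column design of our own choosing (no threshold column, for simplicity), and compares S̄_n(φ) − S̄_n(0) with −H_n(τ, q)φ at τ = 0.5.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(4)
n = 500
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])  # illustrative Z_{q,i}
alpha0 = np.array([1.0, 2.0])                              # true coefficients
phi = np.array([0.7, -0.4])                                # local parameter

Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / np.sqrt(2.0))))  # N(0,1) cdf
f0 = 1.0 / np.sqrt(2.0 * np.pi)                                    # N(0,1) density at 0

# With y_i = alpha0'Z_i + e_i, e_i ~ N(0,1): F_i(a) = Phi(a - alpha0'Z_i), so
# Sbar_n(phi) - Sbar_n(0) = -n^{-1/2} sum_i [Phi(n^{-1/2} phi'Z_i) - Phi(0)] Z_i
shift = (Z @ phi) / np.sqrt(n)
lhs = -((Phi(shift) - 0.5)[:, None] * Z).sum(axis=0) / np.sqrt(n)

# H_n(tau, q) = n^{-1} sum_i f_i(Z_i'alpha0) Z_i Z_i'; here f_i(Z_i'alpha0) = f0
Hn = f0 * (Z.T @ Z) / n
rhs = -Hn @ phi   # Lemma 7.2 predicts lhs ~ rhs up to o_P(1)
```

The two vectors agree up to a term that is cubic in the local shift, in line with the bound in the proof.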

Lemma 7.3 Under assumptions [A.1]-[A.5],

  sup_{(τ,q)∈T×Q} ‖ S_n(τ, q; φ̂_τ) ‖ = o_P(1).

Proof: By the proof of Lemma 2 in Ruppert and Carroll (1980), we have

  sup_{(τ,q)∈T×Q} ‖ S_n(τ, q; φ̂_τ) ‖ = sup_{(τ,q)∈T×Q} ‖ n^{-1/2} Σ_{i=1}^n ψ_τ( y_i − α̂_τ'Z_{q,i} ) Z_{q,i} ‖
  ≤ sup_{(τ,q)∈T×Q} n^{-1/2} Σ_{i=1}^n 1{ y_i − α̂_τ'Z_{q,i} = 0 } ‖Z_{q,i}‖ ≤ 2√2 (p + q) n^{-1/2} max_{1≤i≤n} ‖z_i‖ = o_P(1). □

We now derive the variance-covariance matrix of the two-parameter Gaussian process stated in the theorem. Under serially correlated errors, for J_n(τ, q) = n^{-1/2} Σ_{i=1}^n ( τ − 1{ε_i(τ) ≤ 0} ) Z_{q,i}, we get

  E[ J_n(τ_1, q_1) J_n(τ_2, q_2)' ]
  = n^{-1} Σ_{|h|<∞} Σ_{i=1}^n E[ ( τ_1 − 1{ε_i(τ_1) ≤ 0} )( τ_2 − 1{ε_{i+h}(τ_2) ≤ 0} ) Z_{q_1,i} Z_{q_2,i+h}' ]
  = n^{-1} Σ_{|h|<∞} Σ_{i=1}^n E[ Z_{q_1,i} Z_{q_2,i+h}' ( τ_1 τ_2 − τ_1 1{ε_{i+h}(τ_2) ≤ 0} − τ_2 1{ε_i(τ_1) ≤ 0} + 1{ε_i(τ_1) ≤ 0} 1{ε_{i+h}(τ_2) ≤ 0} ) ]
  = n^{-1} Σ_{|h|<∞} Σ_{i=1}^n E[ Z_{q_1,i} Z_{q_2,i+h}' ] ( τ_1 τ_2 − τ_1 P(ε_{i+h}(τ_2) ≤ 0) − τ_2 P(ε_i(τ_1) ≤ 0) + P( ε_i(τ_1) ≤ 0, ε_{i+h}(τ_2) ≤ 0 ) )
  = n^{-1} Σ_{|h|<∞} Σ_{i=1}^n E[ Z_{q_1,i} Z_{q_2,i+h}' ] ( P( ε_i(τ_1) ≤ 0, ε_{i+h}(τ_2) ≤ 0 ) − τ_1 τ_2 ),

using P(ε_{i+h}(τ_2) ≤ 0) = τ_2 and P(ε_i(τ_1) ≤ 0) = τ_1. Therefore, the variance-covariance matrix of our Gaussian process takes the form

  E[ B*(τ_1, q_1) B*(τ_2, q_2)' ] = Σ_{h=−∞}^{∞} [ P( ε_i(τ_1) ≤ 0, ε_{i+h}(τ_2) ≤ 0 ) − τ_1 τ_2 ]

    × | E(z_{1,i} z_{1,i+h}')    E(z_{1,i} z_{2,i+h}')    E(z_{1,i} z_{q_2,i+h}')   |
      | E(z_{2,i} z_{1,i+h}')    E(z_{2,i} z_{2,i+h}')    E(z_{2,i} z_{q_2,i+h}')   |
      | E(z_{q_1,i} z_{1,i+h}')  E(z_{q_1,i} z_{2,i+h}')  E(z_{q_1,i} z_{q_2,i+h}') |.

Before we prove Theorem 5.1, we need to define some quantities and an additional lemma. Define

  ΔZ_{q,i} = Z_{q,i} − Z_{q*,i} = z_{2i} [ 1{x_i ≤ q} − 1{x_i ≤ q*} ] = z_{2i} [ 1{x_i ≤ q* + v/a_n} − 1{x_i ≤ q*} ] ≡ ΔZ_{v,i},

with a_n = O(n^{1−2ζ}), ζ ∈ (0, 1/2), and v ∈ V for some compact set V. Also, Z_{v,i} = z_{2i} 1{x_i ≤ q* + v/a_n}.

Lemma 7.4 Under assumptions [A.1]-[A.9], and uniformly in v ∈ V, we have

  n^{2ζ−1} [ Σ_{i=1}^n { ρ_τ( ε_i(τ) − n^{-1/2} Z_{q*,i}'φ_τ ) − ρ_τ( ε_i(τ) ) }
    − Σ_{i=1}^n { ρ_τ( ε_i(τ) − n^{-1/2} Z_{q,i}'φ_τ − ΔZ_{v,i}'δ(τ) ) − ρ_τ( ε_i(τ) − ΔZ_{v,i}'δ(τ) ) } ] = o_P(1).

Proof: By Knight's identity (Knight, 1998),

  ρ_τ(u − v) − ρ_τ(u) = −v ψ_τ(u) + ∫_0^v { 1{u ≤ s} − 1{u < 0} } ds,

we can write the first bracketed term in the lemma as follows:

  −n^{-1/2} φ_τ' Σ_{i=1}^n ψ_τ( ε_i(τ) ) Z_{q*,i}                                                                  (I)
  + n^{-1/2} φ_τ' Σ_{i=1}^n Z_{q*,i} ∫_0^1 [ 1{ ε_i(τ) ≤ s n^{-1/2} φ_τ'Z_{q*,i} } − 1{ ε_i(τ) ≤ 0 } ] ds.          (II)

In the same way, we can write the second bracketed term as

  −n^{-1/2} φ_τ' Σ_{i=1}^n ψ_τ( ε_i(τ) ) ΔZ_{v,i}                                                                  (I′)
  + n^{-1/2} φ_τ' Σ_{i=1}^n ΔZ_{v,i} ∫_0^1 [ 1{ ε_i(τ) − δ(τ)'ΔZ_{v,i} ≤ s n^{-1/2} φ_τ'Z_{q,i} } − 1{ ε_i(τ) − δ(τ)'ΔZ_{v,i} ≤ 0 } ] ds.   (II′)

We first observe that the difference of the terms (I) and (I′) is of order o_P(1) for v ∈ V, since n^{−1+2ζ} → 0 as n → ∞ with ζ < 1/2; that is, we have

  sup_{‖w‖≤M} ‖ n^{-1/2} Σ_{i=1}^n ψ_τ( ε_i(τ) ) w'ΔZ_{v,i} − n^{-1/2} Σ_{i=1}^n ψ_τ( ε_i(τ) ) w'Z_{q*,i} ‖ = o_P(1),

for M > 0 and w ∈ W, with W some compact set. To deal with the remaining terms (II) and (II′), we need to define the following classes of functions:

  F(1) = { 1{x ≤ q} : q ∈ Q },
  F(2) = { τ − 1{ y − z_1'γ(τ) − z_2'β(τ) − z_q'δ(τ) ≤ 0 } : q ∈ Q },
  F(3) = { w'z : w ∈ W }.

The above functions are all VC-subgraph classes, and so is their product, by Lemma 2.6.18 of van der Vaart and Wellner (1996). Following Galvao et al. (2014), define the

35

semi-metric   h  i1/p | | ρ (q1 , w1 ), (q2 , w2 ) = E |wz| |p |1{xi ≤ q1 }1{yi ≤ w1 zi }−1{xi ≤ q2 }1{yi ≤ w2 zi }|p , p ≥ 2. | {z } | {z } (1)

(2)

i (τ)≤0

i (τ)≤0

By Theorem 2.1 of Arcones and Yu (1995), the stochastic process " Gn (w, q) =

| n−1/2 wzqi 1{yi

≤ wz| } −

# ≤ wz| |z)] , (q, w) ∈ Q × W, | {z }

| E[wzqi F(yi

i (τ)≤0

is stochastically equicontinuous over Q × W. We analyze the supremum of (II) with the follwoing decomposition, Z 1h n i −1/2 | X | Zq∗ ,i 1{i (τ) ≤ sn−1/2 φτ Zq∗ ,i } − 1{i (τ) ≤ 0} ds sup n φτ 0 kwk≤M i=1

Z 1h n i i −1/2 X h | | = sup n 1{i (τ) ≤ sn−1/2 φτ Zq∗ ,i } − 1{i (τ) ≤ 0} ds E φτ Zq∗ ,i 0 kwk≤M

(∗)

i=1

+n

−1/2

n X

| φτ Zq∗ ,i

−n

n X

h

h i i | 1{i (τ) ≤ sn−1/2 φτ Zq∗ ,i } − E 1{i (τ) ≤ 0} ds

0

i=1 −1/2

1

Z

| φτ Zq∗ ,i

1

Z 0

i=1

h

i 1{i (τ) ≤ 0} − E[1{i (τ) ≤ 0}] ds | {z } =τ

By the stochastic equicontinuity result above, the second and the third term are oP (1) and we get Z 1h n i i −1/2 X h | | (∗) = sup n 1{i (τ) ≤ sn−1/2 φτ Zq∗ ,i } − 1{i (τ) ≤ 0} ds + oP (1) E φτ Zq∗ ,i 0 kwk≤M i=1

Z 1h n i −1/2 X h | −1/2 ≤ M n E kZq∗ ,i k F(sn φτ Zq∗ ,i |z) − F(0|z) dz + oP (1) |{z} |{z} 0 i=1

≤M

≤kzi k

= oP (1) since by Assumption [A.4] max1≤i≤n kzi k = oP (n1/2 ) and n−1+2ζ → 0 as n → ∞ with ζ < 1/2. In a similar way we can show that the term (II0 ) is oP (1) and we have finished.  36

Proof of Theorem 5.1: We study the local behavior of the quantile regression objective function as in Bai (1995). Let

  Q_n(q) = S_n(q*) − S_n(q)
  = Σ_{i=1}^n ρ_τ( ε_i(τ) − n^{-1/2} z_{1i}'(n^{1/2}(γ − γ(τ))) − n^{-1/2} z_{2i}'(n^{1/2}(β − β(τ))) − n^{-1/2} z_{q*,i}'(n^{1/2}(δ − δ(τ))) )
  − Σ_{i=1}^n ρ_τ( ε_i(τ) − n^{-1/2} z_{1i}'(n^{1/2}(γ − γ(τ))) − n^{-1/2} z_{2i}'(n^{1/2}(β − β(τ))) − n^{-1/2} z_{q,i}'(n^{1/2}(δ − δ(τ))) − Δz_{q,i}'δ(τ) ).

Then we have

  n^{2α−1} Q_n(q) = −n^{2α−1} Σ_{i=1}^n { ρ_τ( ε_i(τ) − Δz_{q,i}'δ(τ) ) − ρ_τ( ε_i(τ) ) } + o_P(1),

which means that the effect of the threshold dominates. We therefore need the limiting distribution of

  n^{2α−1} Σ_{i=1}^n { ρ_τ( ε_i(τ) − Δz_{q,i}'δ(τ) ) − ρ_τ( ε_i(τ) ) }.

Using Knight's identity again, we can write

  −n^{2α−1} Q_n(q) = n^{2α−1} Σ_{i=1}^n ∫_0^{Δz_{q,i}'δ(τ)} [ 1{ ε_i(τ) ≤ s } − 1{ ε_i(τ) < 0 } ] ds − n^{2α−1} δ(τ)' Σ_{i=1}^n Δz_{q,i} ψ_τ( ε_i(τ) )
  = −n^{2α−1} δ(τ)' Σ_{i=1}^n Δz_{q,i} ψ_τ( ε_i(τ) ) + (1/2) n^{2α−1} δ(τ)' ( Σ_{i=1}^n f_i( F_i^{-1}(τ|z_i, x_i) ) Δz_{q,i} Δz_{q,i}' ) δ(τ) + o_P(1).

We write

  S_n(τ, q) = Σ_{i=1}^n ρ_τ( y_i − Z_{q,i}'α̂(τ, q) )
  = Σ_{i=1}^n ρ_τ( ε_i(τ) − n^{-1/2} z_{1i}'(n^{1/2}(γ̂(τ) − γ(τ))) − n^{-1/2} z_{2i}'(n^{1/2}(β̂(τ) − β(τ))) − n^{-1/2} z_{q,i}'(n^{1/2}(δ̂(τ) − δ(τ))) − Δz_{q,i}'δ(τ) ),

and

  q̂ = argmin_{q∈Q} S_n(τ, q) = argmax_{q∈Q} [ S_n(τ, q*) − S_n(τ, q) ].

We have

  Q_n(q) = Q_n(v) = S_n(τ, q*) − S_n(τ, q)
  = −Σ_{i=1}^n { ρ_τ( ε_i(τ) − Δz_{q,v,i}'δ(τ) ) − ρ_τ( ε_i(τ) ) } + o_P(1)
  = δ(τ)' Σ_{i=1}^n Δz_{q,v,i} ψ_τ( ε_i(τ) ) − (1/2) δ(τ)' ( Σ_{i=1}^n f_i( F_i^{-1}(τ|z_i, x_i) ) Δz_{q,v,i} Δz_{q,v,i}' ) δ(τ) + o_P(1).

Now, for the first term we have

  (a_n / n^{1−2α}) δ(τ)' Σ_{i=1}^n Δz_{q,v,i} ψ_τ( ε_i(τ) ) = n^{−α} c' Σ_{i=1}^n z_{2i} [ 1{x_i ≤ q* + v/a_n} − 1{x_i ≤ q*} ] ψ_τ( ε_i(τ) )  ⇒  B( τ(1−τ) c'V_0 c g_0 ),

where B(·) is a Brownian motion with covariance structure consisting of terms defined in the main text. For the second term, we have

  (a_n / n^{1−2α}) δ(τ)' ( Σ_{i=1}^n f_i( F_i^{-1}(τ|z_i, x_i) ) Δz_{q,v,i} Δz_{q,v,i}' ) δ(τ)  ⇒  c'D_0 c g_0 |v|,

where the matrix functional D_0 is also defined in the main text. Therefore,

  (a_n / n^{1−2α}) Q_n(v)  ⇒  ( τ(1−τ) c'V_0 c g_0 )^{1/2} W(v) − (1/2) c'D_0 c g_0 |v| = λ_τ^{1/2} W(v) − (1/2) μ_τ |v|,

with λ_τ = τ(1−τ) c'V_0 c g_0 and μ_τ = c'D_0 c g_0. More analytically, we have

  n^{1−2α}( q̂ − q* )  ⇒  arg max_{−∞<v<∞} { −(1/2) c'D_0 c g_0 |v| + ( τ(1−τ) c'V_0 c g_0 )^{1/2} W(v) }.  □
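The limit variable, the maximizer of λ_τ^{1/2} W(v) − μ_τ|v|/2 over a two-sided Brownian motion W, can be approximated by simulation. A rough sketch, with illustrative values for λ_τ and μ_τ and a discretized, truncated path (the grid step and truncation are our own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, mu = 1.0, 1.0          # illustrative values of lambda_tau and mu_tau
h, V, reps = 0.01, 50.0, 2000
m = int(V / h)
v = np.arange(m + 1) * h
draws = np.empty(reps)

for r in range(reps):
    # two-sided Brownian motion: independent paths for v > 0 and v < 0, W(0) = 0
    right = np.concatenate(([0.0], np.cumsum(rng.standard_normal(m)) * np.sqrt(h)))
    left = np.concatenate(([0.0], np.cumsum(rng.standard_normal(m)) * np.sqrt(h)))
    obj_r = np.sqrt(lam) * right - 0.5 * mu * v   # sqrt(lam) W(v) - (1/2) mu |v|
    obj_l = np.sqrt(lam) * left - 0.5 * mu * v
    if obj_r.max() >= obj_l.max():
        draws[r] = v[obj_r.argmax()]
    else:
        draws[r] = -v[obj_l.argmax()]

# draws approximates the limit law of n^{1-2*alpha}(qhat - q*); symmetric about 0
```

Quantiles of such draws could in principle be used for confidence intervals for q̂, given consistent estimates of the nuisance quantities entering λ_τ and μ_τ.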