In this paper, kernel and nearest neighbor estimators of $\xi_p(x)$ are ... these estimators with varying $k$ in the $k$-nearest neighbor method and with varying $h$ in the kernel method.
KERNEL AND NEAREST NEIGHBOR ESTIMATION OF A CONDITIONAL QUANTILE
by P.K. Bhattacharya, University of California, Davis
and Ashis K. Gangopadhyay, University of North Carolina, Chapel Hill
ABSTRACT. Let $(X_1,Z_1), (X_2,Z_2), \ldots, (X_n,Z_n)$ be i.i.d. as $(X,Z)$, $Z$ taking values in ...

Condition 1. There exist $\epsilon > 0$ and $A < \infty$ such that $|x-x_0| \le \epsilon$ implies $|f''(x) - f''(x_0)| \le A\,|x-x_0|$.

Condition 2. (a) $g(\xi|x_0) > 0$, where $G(\xi|x_0) = p$. (b) The partial derivatives $g_x(z|x)$ and $g_{xx}(z|x)$ of $g(z|x)$ and $G_{xx}(z|x)$ of $G(z|x)$ exist in a neighborhood of $(x_0,\xi)$, and there exist $\epsilon > 0$ and $A < \infty$ such that $|x-x_0| \le \epsilon$ and $|z-\xi| \le \epsilon$ together imply
\[ |g_z(z|x)| \le A, \qquad |g_x(z|x_0)| \le A, \qquad |g_{xx}(z|x_0)| \le A, \]
\[ |g_{xx}(z|x) - g_{xx}(z|x_0)| \le A\,|x-x_0|, \qquad |G_{xx}(z|x) - G_{xx}(z|x_0)| \le A\,|x-x_0| . \]

By Condition 2, $\xi$ is uniquely defined by $G(\xi|x_0) = p$.
Now let $\{(X_i,Z_i),\ i=1,2,\ldots\}$ be i.i.d. as $(X,Z)$, and let $Y_i = |X_i - x_0|$, so that $\{(Y_i,Z_i),\ i=1,2,\ldots\}$ are i.i.d. as $(Y = |X-x_0|,\ Z)$ with the pdf $f_Y$ of $Y$, the conditional pdf $g^*(\cdot|y)$ of $Z$ given $Y=y$ and the corresponding conditional cdf $G^*(\cdot|y)$ given by
(1) \[ g^*(z|y) = [f(x_0+y)\,g(z|x_0+y) + f(x_0-y)\,g(z|x_0-y)] \,/\, f_Y(y), \]
\[ G^*(z|y) = [f(x_0+y)\,G(z|x_0+y) + f(x_0-y)\,G(z|x_0-y)] \,/\, f_Y(y). \]
Note that $g^*(z|0) = g(z|x_0) = g(z)$ and $G^*(z|0) = G(z|x_0) = G(z)$. Here and in what follows, we write $g(z|x_0) = g(z)$ and $G(z|x_0) = G(z)$ for simplicity.
Let $Y_{n1} < \cdots < Y_{nn}$ denote the order statistics and $Z_{n1},\ldots,Z_{nn}$ the induced order statistics of $(Y_1,Z_1),\ldots,(Y_n,Z_n)$, i.e., $Z_{nj} = Z_i$ if $Y_{nj} = Y_i$. For any positive integer $k \le n$, the $k$-NN empirical cdf of $Z$ (with respect to $x_0$) is now defined as:
(2) \[ \hat G_{nk}(z) = k^{-1} \sum_{i=1}^{k} 1(Z_{ni} \le z), \]
where $1(S)$ denotes the indicator of the event $S$. The $k$-NN estimator of $\xi$ can now be expressed as the $p$-quantile of $\hat G_{nk}$, i.e.,
(3) \[ \hat\xi_{nk} = \text{the } [kp]\text{th order statistic of } Z_{n1},\ldots,Z_{nk} = \inf\{z : \hat G_{nk}(z) \ge [kp]/k\}. \]
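The estimator in (2) and (3) is easy to sketch in code. The following Python function (a minimal sketch; the function and variable names are ours, not from the paper) sorts the observations by distance from $x_0$ and returns the $[kp]$th order statistic of the $k$ induced order statistics:

```python
import numpy as np

def knn_quantile(x, z, x0, k, p):
    """k-NN conditional p-quantile estimator of eq. (3): take the k
    Z-values whose X is nearest to x0 and return the [kp]-th order
    statistic of those Z's."""
    y = np.abs(np.asarray(x, dtype=float) - x0)  # distances Y_i = |X_i - x0|
    order = np.argsort(y)                        # induces the order statistics
    z_induced = np.asarray(z)[order][:k]         # Z_n1, ..., Z_nk
    r = max(int(np.floor(k * p)), 1)             # [kp], at least 1
    return np.sort(z_induced)[r - 1]             # [kp]-th order statistic
```

With `x = [0.0, 1.0, 2.0, 3.2, 4.5]`, `z = [10, 11, 12, 13, 14]`, `x0 = 2.0` and `k = 3`, the three nearest Z-values are 12, 11, 13, so the median estimate ($p = 0.5$) is 11.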
The kernel estimator of $\xi$ with uniform kernel and bandwidth $h$ can also be expressed in the same manner, viz.,
(4) \[ \hat\xi_{nh} = \inf\{z : \hat G_{n,K_n(h)}(z) \ge [K_n(h)\,p]/K_n(h)\}, \qquad K_n(h) = \sum_{i=1}^{n} 1(Y_i \le h/2) = n F_{Y,n}(h/2), \]
$F_{Y,n}$ being the empirical cdf of $Y_1,\ldots,Y_n$.
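A matching sketch of the uniform-kernel estimator (4) makes the relation to the NN estimator visible: it is simply the $k$-NN estimator with the random $k = K_n(h)$ counted by the window. Again, the names below are ours:

```python
import numpy as np

def kernel_quantile_uniform(x, z, x0, h, p):
    """Uniform-kernel conditional p-quantile estimator of eq. (4):
    K_n(h) counts the Y_i = |X_i - x0| falling within h/2 of x0, and
    the estimator is the [K_n(h) p]-th order statistic of the Z's in
    that window, i.e. the k-NN estimator with k = K_n(h)."""
    y = np.abs(np.asarray(x, dtype=float) - x0)
    in_window = y <= h / 2.0
    K = int(in_window.sum())                 # K_n(h) = n F_{Y,n}(h/2)
    if K == 0:
        raise ValueError("empty window: increase h")
    z_window = np.sort(np.asarray(z)[in_window])
    r = max(int(np.floor(K * p)), 1)         # [K_n(h) p]
    return z_window[r - 1]
```

For `x = [0, 1, 2, 3, 4]`, `z = [10, 11, 12, 13, 14]`, `x0 = 2.0` and `h = 2.2`, the window contains the three middle points, so $K_n(h) = 3$ and the answer agrees with the $k$-NN estimator at $k = 3$.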
The kernel estimators are thus related to the NN estimators by
(5) \[ \hat\xi_{nh} = \hat\xi_{n,K_n(h)}, \]
where $K_n(h)$ is the random integer given by (4). We now state our main results in the following two theorems, of which Theorem N1 gives a Bahadur-type representation for the $k$-NN estimator $\hat\xi_{nk}$ of $\xi$ with $k$ lying in $I_n(a,b)$, and Theorem K1 gives a corresponding representation for the kernel estimator $\hat\xi_{nh}$ with $h$ lying in $J_n(c,d)$.
Theorem N1.
\[ \hat\xi_{nk} - \xi = \beta(\xi)\,(k/n)^2 + \{k\,g(\xi)\}^{-1} \sum_{i=1}^{k} \bigl[1(Z^*_{ni} > \xi) - (1-p)\bigr] + R_{nk}, \]
where
\[ \beta(\xi) = -\bigl[f(x_0)\,G_{xx}(\xi|x_0) + 2f'(x_0)\,G_x(\xi|x_0)\bigr] \big/ \{24\, f^3(x_0)\, g(\xi)\}, \]
and
\[ \max_{k \in I_n(a,b)} |R_{nk}| = O(n^{-3/5} \log n), \quad \text{a.s.} \]

Theorem K1.
\[ \hat\xi_{nh} - \xi = \beta(\xi)\, f^2(x_0)\, h^2 + \{m_n(h)\, g(\xi)\}^{-1} \sum_{i=1}^{m_n(h)} \bigl[1(Z^*_{ni} > \xi) - (1-p)\bigr] + R^*_{nh}, \]
where $\beta(\xi)$ and $Z^*_{ni}$ are as in Theorem N1, $m_n(h) = [n h f(x_0)]$, and
\[ \sup_{h \in J_n(c,d)} |R^*_{nh}| = O(n^{-3/5} \log n), \quad \text{a.s.} \]
Remarks. 1. Let $\mathcal{A} = \sigma\{Y_1, Y_2, \ldots\}$. Then $Z_{n1},\ldots,Z_{nn}$ are conditionally independent given $\mathcal{A}$, with $Z_{ni}$ having conditional cdf $G^*(\cdot|Y_{ni})$, as shown by Bhattacharya (1974). Hence $G^*(Z_{ni}|Y_{ni})$, $1 \le i \le n$, are conditionally independent and uniform $(0,1)$ given $\mathcal{A}$, and therefore $Z^*_{ni} = G^{-1} \circ G^*(Z_{ni}|Y_{ni})$, $1 \le i \le n$, are i.i.d. with cdf $G$. Thus for each $n$, $Z^*_{n1},\ldots,Z^*_{nn}$ are i.i.d. with cdf $G$. Since $G(\xi) = p$, it follows that for each $n$, the summands $1(Z^*_{ni} > \xi) - (1-p)$ in the above representations are independent random variables with mean $0$ and variance $p(1-p)$.

2. The remainder terms in both theorems are $O(n^{-3/5} \log n)$, a.s. In Theorem N1, this corresponds to $O(k^{-3/4} \log k)$ with $k = O(n^{4/5})$, as one would expect. The same explanation applies to Theorem K1, because $[n h f(x_0)] = O(n^{4/5})$ for $h = O(n^{-1/5})$.

3. Weaker versions of the above theorems were proved by Gangopadhyay (1987). His remainder terms were $o(n^{-2/5})$, a.s. in Theorem N1 and $o_p(n^{-2/5})$ in Theorem K1.
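The transformation $Z^*_{ni} = G^{-1} \circ G^*(Z_{ni}|Y_{ni})$ in Remark 1 can be checked numerically. The following sketch uses a hypothetical model of our own choosing (not from the paper): given $Y = y$, $Z$ is exponential with scale $1+y$, so $G^*(z|y) = 1 - e^{-z/(1+y)}$ and $G(z) = G^*(z|0) = 1 - e^{-z}$, and the transform should return i.i.d. standard exponentials:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustrative model: given Y = y, Z ~ Exponential(scale 1+y),
# so G*(z|y) = 1 - exp(-z/(1+y)) and G(z) = G*(z|0) = 1 - exp(-z).
y = rng.uniform(0.0, 2.0, size=100_000)
z = rng.exponential(scale=1.0 + y)

# Probability integral transform of Remark 1:
u = 1.0 - np.exp(-z / (1.0 + y))   # G*(Z|Y): conditionally Uniform(0,1)
z_star = -np.log1p(-u)             # Z* = G^{-1}(u): iid with cdf G = Exp(1)

print(z_star.mean())               # should be close to E[Exp(1)] = 1
```

Here the transform collapses to $Z^* = Z/(1+Y)$, and the sample mean of `z_star` is close to 1, as it must be for i.i.d. standard exponential variables.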
3. Weak Convergence Properties of NN Estimators with Varying k and Kernel Estimators with Varying Bandwidth.

Consider the stochastic processes $\{\hat\xi_{nk},\ k \in I_n(a,b)\}$ and $\{\hat\xi_{nh},\ h \in J_n(c,d)\}$. The two theorems in this section describe the weak convergence properties of suitably normalized versions of these processes, as $n \to \infty$. The symbol $\Rightarrow$ indicates convergence in distribution, i.e., weak convergence of the distributions of the stochastic processes (or random vectors) under consideration, and $\{B(t),\ t \ge 0\}$ denotes a standard Brownian motion.
Theorem N2. Let $T_n(t) = \hat\xi_{n,[n^{4/5}t]}$. Then for any $0 < a < b$,
\[ \{n^{2/5}[T_n(t) - \xi] - \beta t^2,\ a \le t \le b\} \;\Rightarrow\; \{\sigma\, t^{-1} B(t),\ a \le t \le b\}, \]
where $\beta = \beta(\xi)$ is given in Theorem N1 and $\sigma^2 = p(1-p)/g^2(\xi)$.
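The normalization in Theorem N2 can be motivated by a heuristic variance computation (ours, not part of the paper's proof):

```latex
% Heuristic scaling (ours): with k = [n^{4/5} t], the stochastic term
% of Theorem N1 has variance
\[
\operatorname{Var}\!\Bigl(\{k\,g(\xi)\}^{-1}\sum_{i=1}^{k}\bigl[1(Z^{*}_{ni}>\xi)-(1-p)\bigr]\Bigr)
  \;=\; \frac{p(1-p)}{k\,g^{2}(\xi)}
  \;\approx\; \frac{\sigma^{2}}{n^{4/5}\,t},
\]
% so the term is of exact order n^{-2/5}, matching the factor n^{2/5}
% in the theorem, and the partial-sum approximation
\[
n^{-2/5}\sum_{i=1}^{[n^{4/5}t]} W_{ni} \;\Rightarrow\; B(t)
\]
% turns n^{2/5} times the stochastic term into \sigma\,t^{-1}B(t).
```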
Proof. In the representation for $\hat\xi_{nk}$ given in Theorem N1, take $k = [n^{4/5}t] = n^{4/5}t - \epsilon_n(t)$ with $0 \le \epsilon_n(t) < 1$. After a little rearrangement of terms, this leads to
\[ n^{2/5}[T_n(t) - \xi] - \beta(\xi)t^2 = \{g(\xi)\,t\}^{-1}\sqrt{p(1-p)}\;\, n^{-2/5} \sum_{i=1}^{[n^{4/5}t]} W_{ni} + \sum_{j=1}^{3} R_{nj}(t), \]
where
\[ W_{ni} = \bigl[1(Z^*_{ni} > \xi) - (1-p)\bigr] \big/ \sqrt{p(1-p)} \]
are i.i.d. with mean $0$ and variance $1$ for each $n$, in view of Remark 1. Of the three remainder terms, $R_{n1}(t) = n^{2/5} R_{n,[n^{4/5}t]}$ comes from Theorem N1, while $R_{n2}(t)$ and $R_{n3}(t)$ come from the discrepancy $0 \le \epsilon_n(t) < 1$ due to replacing $k = [n^{4/5}t]$ by $n^{4/5}t$ in the first two terms of the representation. Hence
\[ \sup_{a \le t \le b} |R_{n1}(t)| = O(n^{-1/5} \log n), \quad \text{a.s.}, \]
while $R_{n2}(t)$ and $R_{n3}(t)$ are easily seen to be $O_p(n^{-2/5})$ by virtue of
\[ \sup_{a \le t \le b} \Bigl| n^{-2/5} \sum_{i=1}^{[n^{4/5}t]} W_{ni} \Bigr| = O_p(1). \]
The three remainder terms are together $o_p(n^{-1/5} \log n)$ uniformly in $a \le t \le b$. We thus have, with $\sigma = \sqrt{p(1-p)}/g(\xi)$,
\[ n^{2/5}[T_n(t) - \xi] - \beta(\xi)t^2 = \sigma\, t^{-1}\, n^{-2/5} \sum_{i=1}^{[n^{4/5}t]} W_{ni} + o_p(1), \]
uniformly in $a \le t \le b$. Now use Theorem 1, page 452 of Gikhman and Skorokhod (1969) to see that
\[ \Bigl\{ n^{-2/5} \sum_{i=1}^{[n^{4/5}t]} W_{ni},\ a \le t \le b \Bigr\} \;\Rightarrow\; \{B(t),\ a \le t \le b\}. \]
This proves the theorem. $\Box$
Theorem K2. Let $S_n(t) = \hat\xi_{n,n^{-1/5}t}$. Then for any $0 < c < d$,
\[ \{n^{2/5}[S_n(t) - \xi] - \gamma t^2,\ c \le t \le d\} \;\Rightarrow\; \{\tau\, t^{-1} B(t f(x_0)),\ c \le t \le d\}, \]
where $\gamma = \beta f^2(x_0)$ and $\tau = \sigma/f(x_0)$.

Proof. In the representation for $\hat\xi_{nh}$ given in Theorem K1, take $h = n^{-1/5}t$ and rearrange terms to obtain
\[ n^{2/5}[S_n(t) - \xi] - \gamma t^2 = \cdots \]
... and there exist $\epsilon > 0$ and $M < \infty$ such that $|q(z)|$, $|Q(z)|$, $|r(y,z)|$ and $|R(y,z)|$ are all bounded by $M$ for $0 \le y \le \epsilon$ and $|z-\xi| \le \epsilon$.
Proof. Expand $f(x_0 \pm y)$, $g(z|x_0 \pm y)$ and $G(z|x_0 \pm y)$ about $0$ to obtain:
\[ f(x_0 \pm y) = f(x_0) \pm y\, f'(x_0) + \tfrac{1}{2} y^2 \{f''(x_0) + \Delta_1(y)\}, \]
\[ g(z|x_0 \pm y) = g(z|x_0) \pm y\, g_x(z|x_0) + \tfrac{1}{2} y^2 \{g_{xx}(z|x_0) + \Delta_2(y,z)\}, \]
\[ G(z|x_0 \pm y) = G(z|x_0) \pm y\, G_x(z|x_0) + \tfrac{1}{2} y^2 \{G_{xx}(z|x_0) + \Delta_3(y,z)\}, \]
with $\max\{|\Delta_1(y)|, |\Delta_2(y,z)|, |\Delta_3(y,z)|\} \le A y$ for $0 \le y \le \epsilon$ and $|z-\xi| \le \epsilon$, where $\epsilon > 0$ and $A < \infty$ are as in Conditions 1 and 2. To obtain the formulas for $g^*(z|y)$ and $G^*(z|y)$ stated in the lemma, use the above expansions in (1). It will be seen by appropriate arrangement of terms that in the remainders $y^3 r(y,z)$ for $g^*(z|y)$ and $y^3 R(y,z)$ for $G^*(z|y)$, the quantities $|r(y,z)|$ and $|R(y,z)|$ for $0 \le y \le \epsilon$ and $|z-\xi| \le \epsilon$ remain bounded by a constant $M$ determined by $f(x_0)$, $f''(x_0)$, $g(\xi|x_0)$, $g_{xx}(\xi|x_0)$, $G_x(\xi|x_0)$, $G_{xx}(\xi|x_0)$ and $A$, where $\epsilon > 0$ and $A < \infty$ are as in Conditions 1 and 2. The boundedness of $|q(z)|$ and $|Q(z)|$ for $|z-\xi| \le \epsilon$ also follows from Condition 2. $\Box$
6. Proof of Theorem N1: Bias in $\hat\xi_{nk}$.

Recall that the target of $\hat\xi_{nk}$ is $\xi_{nk}$, the $p$-quantile of the random cdf $\bar G_{nk}(\cdot) = k^{-1} \sum_{i=1}^{k} G^*(\cdot|Y_{ni})$, while $\xi$ is the $p$-quantile of $G(\cdot)$. The leading term of $\xi_{nk} - \xi$ is non-stochastic with probability 1, and it is determined in this section. This is the bias in $\hat\xi_{nk}$. We first use Lemma 3 to bound the discrepancy between $\bar G_{nk}(\cdot)$ and $G(\cdot)$ near $\xi$, in terms of $Y_{nk}$. This makes $\max_{k \in I_n(a,b)} |\xi_{nk} - \xi|$ small whenever $Y_{n,[n^{4/5}b]}$ is small, in the manner described in the following lemma. The almost sure order of magnitude of $\xi_{nk} - \xi$ is obtained as a corollary.
Lemma 4. For every $B$, there exist $N$ and $C$ such that in the sample space of infinite sequences, for $n \ge N$, $Y_{n,[n^{4/5}b]} < B n^{-1/5}$ implies $\max_{k \in I_n(a,b)} |\xi_{nk} - \xi| \le C n^{-2/5}$.

Proof. Fix $B$ ...

... in terms of $1(Z_{ni} > \xi_{nk}) - (1 - G_{ni}(\xi_{nk}))$, which are conditionally independent given $\mathcal{A} = \sigma\{Y_1, Y_2, \ldots\}$, and the remainder term is $O(n^{-3/5} \log n)$, a.s., which corresponds to $O(k^{-3/4} \log k)$ with $k = O(n^{4/5})$, as one would expect. This representation is then modified to another one in which the random variables $1(Z_{ni} > \xi_{nk}) - (1 - G_{ni}(\xi_{nk}))$ are replaced by $1(Z_{ni} > \xi) - (1 - G_{ni}(\xi))$, and to yet another one in which these random variables are replaced by $1(Z^*_{ni} > \xi) - (1 - G(\xi))$, where $Z^*_{ni} = G^{-1} \circ G_{ni}(Z_{ni})$, $1 \le i \le n$, are i.i.d. with cdf $G(\cdot)$. Together with the bias term obtained
in Lemma 5, this will establish the representation of $\hat\xi_{nk}$ given in Theorem N1. Having completed the proof of Theorem N1 in this section, we shall then prove Theorem K1 in the next section, via formula (5), establishing the corresponding representation for the kernel estimator $\hat\xi_{nh}$ (with uniform kernel and bandwidth $h$). It should be emphasized that Theorem N1 is proved uniformly in $k \in I_n(a,b)$ with $0 < a < b$, and Theorem K1 is proved uniformly in $h \in J_n(c,d)$ with $0 < c < d$, where $I_n(a,b)$ and $J_n(c,d)$ are as given in Section 2. The first representation of $\hat\xi_{nk} - \xi_{nk}$ rests on Lemmas 8 and 9, which run parallel to Bahadur's proof [1]. However, we start with a lemma which provides an exponential bound for the deviation of sums of independent Bernoulli variables from their mean, and then prove another lemma dealing with fluctuations of $\hat G_{nk}(\cdot) - \bar G_{nk}(\cdot)$.

Lemma 6. Let $U_{n1},\ldots,U_{nn}$ be independent Bernoulli variables with $P(U_{ni}=1) = \pi_{ni}$. Then
\[ P\Bigl[\Bigl| n^{-1} \sum_{i=1}^{n} (U_{ni} - \pi_{ni}) \Bigr| > t_n\Bigr] \le 2 \exp\Bigl[-\tfrac{1}{2}\, n t_n^2 \Big/ \Bigl\{\max_{1 \le i \le n} \pi_{ni} + t_n\Bigr\}\Bigr]. \]
In particular, if $t_n / \max_{1 \le i \le n} \pi_{ni} \to 0$, then for large $n$,
\[ P\Bigl[\Bigl| n^{-1} \sum_{i=1}^{n} (U_{ni} - \pi_{ni}) \Bigr| > t_n\Bigr] \le 2 \exp\Bigl[-\tfrac{1}{4}\, n t_n^2 \Big/ \max_{1 \le i \le n} \pi_{ni}\Bigr], \]
and if $\max_{1 \le i \le n} \pi_{ni} / t_n \to 0$, then for large $n$,
\[ P\Bigl[\Bigl| n^{-1} \sum_{i=1}^{n} (U_{ni} - \pi_{ni}) \Bigr| > t_n\Bigr] \le 2 \exp\bigl[-\tfrac{1}{4}\, n t_n\bigr]. \]
Proof. The first inequality is a simplified version of Bernstein's inequality (see [14], page 205), from which the other two follow as special cases. $\Box$
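The first bound in Lemma 6 is easy to probe by simulation. The sketch below (parameter values are ours, chosen for illustration) compares the empirical tail probability of the centered Bernoulli average with the stated exponential bound:

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical check of the first bound in Lemma 6:
#   P[|n^{-1} sum(U_i - pi_i)| > t] <= 2 exp(-n t^2 / (2 (max pi + t))).
n, reps, t = 1000, 5000, 0.03
pi = rng.uniform(0.05, 0.2, size=n)        # heterogeneous Bernoulli means
u = rng.random((reps, n)) < pi             # U_ni, independent Bernoulli rows
dev = np.abs((u - pi).mean(axis=1))        # |n^{-1} sum_i (U_ni - pi_ni)|
empirical = (dev > t).mean()               # Monte Carlo tail probability
bound = 2.0 * np.exp(-n * t**2 / (2.0 * (pi.max() + t)))
print(empirical, "<=", bound)              # the bound should hold
```

As expected for a Bernstein-type inequality, the bound is quite loose here: the Monte Carlo tail probability is far below it.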
Lemma 7. Suppose $\zeta_{nk}$ are $\mathcal{A}$-measurable random variables with
\[ \max_{k \in I_n(a,b)} |\zeta_{nk} - \xi_{nk}| \le C n^{-2/5} \log n = \epsilon_n(C). \]
Then for any $C$, there exists $M$ such that
\[ \sum_{n} P\Bigl[\max_{k \in I_n(a,b)} \Bigl| k^{-1} \sum_{i=1}^{k} (U_{nki} - \mu_{nki}) \Bigr| > M n^{-3/5} \log n\Bigr] < \infty . \]

Proof. Write $U_{nki} = 1(Z_{ni} \le \zeta_{nk}) - 1(Z_{ni} \le \xi_{nk})$ and $\mu_{nki} = E(U_{nki} \mid \mathcal{A})$. Choose $B > b/f(x_0)$, and for each $n$, let $S_n = \{Y_{n,[n^{4/5}b]} \le B n^{-1/5}\}$. Since $\epsilon_n(C) = C n^{-2/5} \log n$, Lemma 4 implies that there exist $C'$ and $N$ such that for $n \ge N$ and for $z$ lying between $\zeta_{nk}$ and $\xi_{nk}$,
\[ |z - \xi| \le C n^{-2/5} \log n + C' n^{-2/5} \le 2 \epsilon_n(C) \]
holds on the set $S_n$. Using Lemma 3, we now conclude that when $n$ is large, then on $S_n$, ... It now follows from Lemma 6 that for sufficiently large $n$,
\[ \max_{k \in I_n(a,b)} P\Bigl[\Bigl| k^{-1} \sum_{i=1}^{k} (U_{nki} - \mu_{nki}) \Bigr| > M n^{-3/5} \log n \,\Big|\, \mathcal{A}\Bigr] \le 2 \exp\bigl[-M^2 a \{5 C g(\xi)\}^{-1} \log n\bigr] \]
on $S_n$, and hence
\[ P\Bigl[\max_{k \in I_n(a,b)} \Bigl| k^{-1} \sum_{i=1}^{k} (U_{nki} - \mu_{nki}) \Bigr| > M n^{-3/5} \log n\Bigr] \le 2 n^{4/5} (b-a) \exp\bigl[-M^2 a \{5 C g(\xi)\}^{-1} \log n\bigr] + P(S_n^c). \]
To complete the proof, observe that the first term on the right is summable in $n$ for large $M$, and $\sum_n P(S_n^c) < \infty$ by the choice of $B > b/f(x_0)$. $\Box$
Proof. It follows from the monotonicity of $\hat G_{nk}(\cdot)$ and $\bar G_{nk}(\cdot)$ that for $z \in J_{nk,r} = [\eta_{nk,r},\ \eta_{nk,r+1}]$, ..., where $H_{nk}(\cdot)$ is given by (17), and ... Hence
\[ \ldots \le 2(b-a)\, n^{4/5} \cdot 2 \exp[-2(b-a)(\log n)^2], \]
by Theorem 1 of Hoeffding (1963), and
\[ \sum_{n=1}^{\infty} n^{4/5} \cdot 2 \exp[-2(b-a)(\log n)^2] < \infty. \]
Hence ... From (18), (19), (20), and (21), we have
\[ \max_{k \in I_n(a,b)} \bigl| (\hat\xi_{nk} - \xi_{nk}) - \{g(\xi)\}^{-1} [\,p - \hat G_{nk}(\xi_{nk})\,] \bigr| = O(n^{-3/5} \log n), \quad \text{a.s.} \]
Since
\[ p - \hat G_{nk}(\xi_{nk}) = k^{-1} \sum_{i=1}^{k} \bigl[ 1(Z_{ni} > \xi_{nk}) - \{1 - G_{ni}(\xi_{nk})\} \bigr], \]
we now have the following representation:
(22a) \[ \hat\xi_{nk} = \xi_{nk} + \{k\, g(\xi)\}^{-1} \sum_{i=1}^{k} \bigl[ 1(Z_{ni} > \xi_{nk}) - \{1 - G_{ni}(\xi_{nk})\} \bigr] + R_{nk}, \]
where $\max_{k \in I_n(a,b)} |R_{nk}| = O(n^{-3/5} \log n)$, a.s.
This representation can be easily modified to two other slightly different forms, viz.,
(22b) \[ \hat\xi_{nk} = \xi_{nk} + \{k\, g(\xi)\}^{-1} \sum_{i=1}^{k} \bigl[ 1(Z_{ni} > \xi) - \{1 - G_{ni}(\xi)\} \bigr] + R_{nk} \]
and
(22c) \[ \hat\xi_{nk} = \xi_{nk} + \{k\, g(\xi)\}^{-1} \sum_{i=1}^{k} \bigl[ 1(Z^*_{ni} > \xi) - \{1 - G(\xi)\} \bigr] + R_{nk}, \]
where $\max_{k \in I_n(a,b)} |R_{nk}| = O(n^{-3/5} \log n)$, a.s., in both (22b) and (22c), and $Z^*_{ni} = G^{-1} \circ G_{ni}(Z_{ni})$ and $G(\cdot) = G(\cdot|x_0)$ is the conditional cdf of $Z$ given $X = x_0$. Note that since $Z_{n1},\ldots,Z_{nn}$ are conditionally independent given $\mathcal{A}$, with $Z_{ni}$ having conditional cdf $G_{ni}$,
\[ P[Z^*_{ni} \le z_i,\ i=1,\ldots,n] = E\, P[Z^*_{ni} \le z_i,\ i=1,\ldots,n \mid \mathcal{A}] = E\, P[Z_{ni} \le G_{ni}^{-1} \circ G(z_i),\ i=1,\ldots,n \mid \mathcal{A}] = \prod_{i=1}^{n} G(z_i), \]
so that for each $n$, $Z^*_{n1},\ldots,Z^*_{nn}$ are i.i.d. with cdf $G(\cdot)$. To obtain (22b) from (22a), use Lemma 7 with $\zeta_{nk} = \xi$ to conclude that
\[ \max_{k \in I_n(a,b)} \Bigl| k^{-1} \sum_{i=1}^{k} \bigl[ \{1(Z_{ni} \le \xi_{nk}) - 1(Z_{ni} \le \xi)\} - \{G_{ni}(\xi_{nk}) - G_{ni}(\xi)\} \bigr] \Bigr| = O(n^{-3/5} \log n), \quad \text{a.s.} \]
To obtain (22c) from (22b) it is enough to show that
(23) \[ \max_{k \in I_n(a,b)} \Bigl| k^{-1} \sum_{i=1}^{k} (U_{ni} - \mu_{ni}) \Bigr| = O(n^{-3/5} \log n), \quad \text{a.s.}, \]
where $U_{ni} = 1(Z^*_{ni} > G^{-1} \circ G_{ni}(\xi)) - 1(Z^*_{ni} > \xi)$ and $\mu_{ni} = E(U_{ni} \mid \mathcal{A}) = G(\xi) - G_{ni}(\xi)$. For this, use Lemma 3 to observe that for large $n$,
\[ \max_{i \le [n^{4/5} b]} |\mu_{ni}| \le B^2 |Q(\xi)|\, n^{-2/5} \quad \text{on the set } S_n = \{Y_{n,[n^{4/5}b]} \le B n^{-1/5}\}, \]
and then use Lemma 6 and Lemma 1 to obtain
\[ \ldots \le 2(b-a) \sum_{n=1}^{\infty} n^{4/5} \exp\bigl[-a \{4 B^2 |Q(\xi)|\}^{-1} (\log n)^2\bigr] + \sum_{n=1}^{\infty} P(S_n^c) < \infty. \]
Substituting $k = K_n(h)$ into the representation of Theorem N1 and using (5), we obtain
(24) \[ \hat\xi_{nh} - \xi = \beta(\xi)\,(K_n(h)/n)^2 + \{K_n(h)\, g(\xi)\}^{-1} \sum_{i=1}^{K_n(h)} \bigl[1(Z^*_{ni} > \xi) - (1-p)\bigr] + R_{n K_n(h)} . \]
However, this is of no use unless we can show that
(a) $\sup_{h \in J_n(c,d)} |R_{n K_n(h)}|$ converges at a fast rate, where $J_n(c,d) = [n^{-1/5} c,\ n^{-1/5} d]$,
(b) ...

With $\Delta_n(h) = K_n(h) - n h f(x_0)$,
\[ P\Bigl[\sup_{h \in J_n(c,d)} |\Delta_n(h)| > 2 n^{2/5} \log n\Bigr] \le (\nu_n + 1) \sup_{h \in J_n(c,d)} P\bigl[|\Delta_n(h)| > n^{2/5} \log n\bigr], \]
and
\[ \sup_{h \in J_n(c,d)} P\bigl[|\Delta_n(h)| > n^{2/5} \log n\bigr] \le 2 \exp\bigl[-(\log n)^2 / \{8\, d\, f(x_0)\}\bigr] \]
by Bernstein's inequality, and $\sum_{n=1}^{\infty} n^{2/5} \exp[-a(\log n)^2] < \infty$ for every $a > 0$. Hence
\[ \sup_{h \in J_n(c,d)} |\Delta_n(h)| = O(n^{2/5} \log n), \quad \text{a.s.}, \]
and the lemma is proved. $\Box$
The convergence rate of $\sup_{h \in J_n(c,d)} |R_{n K_n(h)}|$ is now determined in the following lemma.

Lemma 11. $\sup_{h \in J_n(c,d)} |R_{n K_n(h)}| = O(n^{-3/5} \log n)$, a.s.

Proof. For $0 < c < d$, let $0 < a = c f(x_0)/2 < 2 d f(x_0) = b$, and let
\[ A_n = \Bigl\{ \sup_{h \in J_n(c,d)} |R_{n K_n(h)}| > M n^{-3/5} \log n \Bigr\}, \qquad B_n = \Bigl\{ \max_{k \in I_n(a,b)} |R_{nk}| > M n^{-3/5} \log n \Bigr\}, \]
\[ C_n = \Bigl\{ \sup_{h \in J_n(c,d)} |\Delta_n(h)| > M n^{2/5} \log n \Bigr\}. \]
Then there exists $N_0 = N_0(M)$ such that for $n > N_0$, $C_n^c$ implies $K_n(h) \in I_n(a,b)$ for all $h \in J_n(c,d)$. Thus $C_n^c \cap A_n \subset C_n^c \cap B_n$ for $n > N_0$. It now follows that for sufficiently large $M$,
\[ P[A_n \ \text{i.o.}] \le P[C_n \ \text{i.o.}] + P\bigl[C_n^c \cap A_n \ \text{i.o.}\bigr] \le P[C_n \ \text{i.o.}] + P\bigl[C_n^c \cap B_n \ \text{i.o.}\bigr] = 0, \]
since for large $M$, $P[C_n \ \text{i.o.}] = 0$ by Lemma 10 and $P[B_n \ \text{i.o.}] = 0$ by Theorem N1. $\Box$
We now consider the first two terms on the RHS of (24). Of these,
(25) \[ \beta(\xi)\,(K_n(h)/n)^2 = \beta(\xi)\, f^2(x_0)\, h^2 + R'_{nh}, \]
where
\[ R'_{nh} = \beta(\xi)\, f^2(x_0)\, h^2 \,\bigl[\Delta_n(h)/\{n h f(x_0)\}\bigr]\,\bigl[2 + \Delta_n(h)/\{n h f(x_0)\}\bigr], \]
and by Lemma 10,
\[ \sup_{h \in J_n(c,d)} |R'_{nh}| = O(n^{-3/5} \log n), \quad \text{a.s.} \]
To examine the other term, let
(26) \[ U_{ni} = 1(Z^*_{ni} > \xi) - (1-p). \]
Then
(27) \[ \{K_n(h)\, g(\xi)\}^{-1} \sum_{i=1}^{K_n(h)} U_{ni} = \{m_n(h)\, g(\xi)\}^{-1} \sum_{i=1}^{m_n(h)} U_{ni} + R''_{nh} + R'''_{nh}, \]
where $m_n(h) = [n h f(x_0)]$,
\[ R''_{nh} = -\{\Delta_n(h)/K_n(h)\}\,\{m_n(h)\, g(\xi)\}^{-1} \sum_{i=1}^{m_n(h)} U_{ni}, \qquad R'''_{nh} = \{K_n(h)\, g(\xi)\}^{-1} \sum_{i=m_n(h)+1}^{K_n(h)} U_{ni}, \]
and $U_{n1},\ldots,U_{nn}$ are conditionally independent given $\mathcal{A}$, with $E(U_{ni} \mid \mathcal{A}) = 0$. By Lemma 10,
\[ \sup_{h \in J_n(c,d)} |\Delta_n(h)/K_n(h)| = O(n^{-2/5} \log n), \quad \text{a.s.}, \]
and
\[ \sup_{h \in J_n(c,d)} \Bigl| m_n(h)^{-1} \sum_{i=1}^{m_n(h)} U_{ni} \Bigr| \le \max_{n^{4/5} c f(x_0) \le k \le n^{4/5} d f(x_0)} \Bigl| k^{-1} \sum_{i=1}^{k} U_{ni} \Bigr| = O(n^{-2/5}), \quad \text{a.s.}, \]
by an application of Theorem 1 of Hoeffding (1963). Hence
\[ \sup_{h \in J_n(c,d)} |R''_{nh}| = O(n^{-3/5} \log n), \quad \text{a.s.} \]
Now consider the jump-points of $m_n(h) = [n h f(x_0)]$ in $J_n(c,d)$ together with the endpoints $n^{-1/5} c$ and $n^{-1/5} d$, and call these points
\[ n^{-1/5} c = h_{n0} < h_{n1} < \cdots < h_{n \nu_n} = n^{-1/5} d. \]
Then $m_n(h)$ is constant on each of the $\nu_n$ intervals $[h_{nj},\ h_{n,j+1})$. At the same time, $K_n(h)$ is also integer-valued and non-decreasing in $h$, so
\[ P\bigl[n^{-1}\{K_n(h_{n,j+1}) - K_n(h_{nj})\} > M n^{-3/5} \log n\bigr] \le 2 \exp\bigl[-(M/4)\, n^{2/5} \log n\bigr] \]
by Lemma 6, and
\[ \sum_{n=1}^{\infty} \nu_n \exp\bigl[-(M/4)\, n^{2/5} \log n\bigr] < \infty, \quad \text{because } \nu_n = \text{const.}\ n^{4/5}. \]
This proves (29). Finally, note that $\sum_{i=m_n(h_{nj})+1}^{K_n(h_{nj})} U_{ni}$ is a sum of $|K_n(h_{nj}) - m_n(h_{nj})|$ terms in which the summands $U_{ni}$, given in (26), when $K_n(h_{nj}) > m_n(h_{nj})$, and $-U_{ni}$ when $K_n(h_{nj}) < m_n(h_{nj})$, are conditionally independent given $\mathcal{A}$, with $E(U_{ni} \mid \mathcal{A}) = 0$. Using Hoeffding's inequality, we therefore have
\[ P\Bigl[\Bigl| \sum_{i=m_n(h_{nj})+1}^{K_n(h_{nj})} U_{ni} \Bigr| > M n^{1/5} \log n\Bigr] \le 2\, P\bigl[|K_n(h_{nj}) - m_n(h_{nj})| > M n^{2/5} \log n\bigr] + 2 \exp[-2 M \log n]. \]
Since
\[ \sum_{n=1}^{\infty} \nu_n \exp[-2 M \log n] = \text{const.} \sum_{n=1}^{\infty} n^{4/5}\, n^{-2M} < \infty \]
for sufficiently large $M$, we only need to show that
(30) \[ \sum_{n=1}^{\infty} n^{4/5}\, P\bigl[|K_n(h_{nj}) - m_n(h_{nj})| > M n^{2/5} \log n\bigr] < \infty \]
for large $M$, in order to establish (28). But $K_n(h) - m_n(h) = K_n(h) - [n h f(x_0)]$ differs from $\Delta_n(h) = K_n(h) - n h f(x_0)$ by at most 1, and it was shown in the proof of Lemma 10 that $P[|\Delta_n(h)| > M n^{2/5} \log n] < \exp[-a(\log n)^2]$ for some $a > 0$, which implies (30), and thus (28) is established. We have now shown that in (27),
\[ \sup_{h \in J_n(c,d)} |R'''_{nh}| = O(n^{-3/5} \log n), \quad \text{a.s.} \]
To complete the proof of Theorem K1, substitute the expressions in (25) and (27) for the first two terms on the RHS of (24). The remainder terms $R'_{nh}$, $R''_{nh}$ and $R'''_{nh}$ in these expressions have all been shown to be $O(n^{-3/5} \log n)$, a.s. and uniformly in $h \in J_n(c,d)$, and the other remainder term $R_{n K_n(h)}$ has also been shown to be of this order in Lemma 11. This establishes the order of magnitude of $R^*_{nh} = R'_{nh} + R''_{nh} + R'''_{nh} + R_{n K_n(h)}$ claimed in the theorem. $\Box$
REFERENCES

(1) Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-580.
(2) Bhattacharya, P.K. (1963). On an analog of regression analysis. Ann. Math. Statist. 34, 1459-1473.
(3) Bhattacharya, P.K. (1974). Convergence of sample paths of normalized sums of induced order statistics. Ann. Statist. 2, 1034-1039.
(4) Bhattacharya, P.K. and Mack, Y.P. (1987). Weak convergence of k-NN density and regression estimators with varying k and applications. Ann. Statist. 15, 976-994.
(5) Cheng, K.F. (1983). Nonparametric estimators for percentile regression functions. Commun. Statist.-Theor. Meth. 12, 681-692.
(6) Csorgo, M. and Revesz, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.
(7) Gangopadhyay, A.K. (1987). Nonparametric estimation of conditional quantile function. Ph.D. dissertation, University of California, Davis.
(8) Gikhman, I.I. and Skorokhod, A.V. (1969). Introduction to the Theory of Random Processes. W.B. Saunders Company, Philadelphia.
(9) Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13-30.
(10) Hogg, R.V. (1975). Estimates of percentile regression lines using salary data. J. Amer. Statist. Assoc. 70, 56-59.
(11) Krieger, A.M. and Pickands, J., III (1981). Weak convergence and efficient density estimation at a point. Ann. Statist. 9, 1066-1078.
(12) Stone, C.J. (1977). Consistent nonparametric regression. Ann. Statist. 5, 595-645.
(13) Stute, W. (1986). Conditional empirical processes. Ann. Statist. 14, 638-647.
(14) Uspensky, J.V. (1937). Introduction to Mathematical Probability. McGraw-Hill, New York.