KERNEL AND NEAREST NEIGHBOR ESTIMATION OF A CONDITIONAL QUANTILE

by P.K. Bhattacharya, University of California, Davis

and Ashis K. Gangopadhyay, University of North Carolina, Chapel Hill

ABSTRACT. Let (X_1,Z_1), (X_2,Z_2), ..., (X_n,Z_n) be i.i.d. as (X,Z), Z taking values in R. In this paper, kernel and nearest neighbor estimators of xi_p(x), the conditional p-quantile of Z given X = x, are studied; Bahadur-type representations and weak convergence properties are obtained for these estimators with varying k in the k-nearest neighbor method and with varying h in the kernel method.

Throughout, f denotes the marginal pdf of X, and g(.|x) and G(.|x) denote the conditional pdf and cdf of Z given X = x. Fix x_0 and 0 < p < 1, write xi for xi_p(x_0), and assume the following two conditions.

Condition 1. f(x_0) > 0, f is twice differentiable in a neighborhood of x_0, and there exist eps > 0 and A < inf such that |x - x_0| <= eps implies |f''(x) - f''(x_0)| <= A|x - x_0|.

Condition 2. (a) g(xi|x_0) > 0, where G(xi|x_0) = p.

(b) The partial derivatives g_x(z|x) and g_xx(z|x) of g(z|x), and G_x(z|x) and G_xx(z|x) of G(z|x), exist in a neighborhood of (x_0, xi), and there exist eps > 0 and A < inf such that |x - x_0| <= eps and |z - xi| <= eps together imply

|g_z(z|x)| <= A,  |g_x(z|x_0)| <= A,  |g_xx(z|x_0)| <= A,
|g_xx(z|x) - g_xx(z|x_0)| <= A|x - x_0|,  |G_xx(z|x) - G_xx(z|x_0)| <= A|x - x_0|.

By Condition 2, xi is uniquely defined by G(xi|x_0) = p.

Now let {(X_i,Z_i), i = 1,2,...} be i.i.d. as (X,Z), and let Y_i = |X_i - x_0|, so that {(Y_i,Z_i), i = 1,2,...} are i.i.d. as (Y,Z) with Y = |X - x_0|, with the pdf f_Y of Y, the conditional pdf g*(.|y) of Z given Y = y, and the corresponding conditional cdf G*(.|y) given by

(1)  g*(z|y) = [f(x_0+y) g(z|x_0+y) + f(x_0-y) g(z|x_0-y)] / f_Y(y),
     G*(z|y) = [f(x_0+y) G(z|x_0+y) + f(x_0-y) G(z|x_0-y)] / f_Y(y).

Note that g*(z|0) = g(z|x_0) = g(z) and G*(z|0) = G(z|x_0) = G(z). Here and in what follows, we write g(z|x_0) = g(z) and G(z|x_0) = G(z) for simplicity.

Let Y_{n1} < ... < Y_{nn} denote the order statistics and Z_{n1}, ..., Z_{nn} the induced order statistics of (Y_1,Z_1), ..., (Y_n,Z_n), i.e., Z_{ni} = Z_j if Y_{ni} = Y_j. For any positive integer k <= n, the k-NN empirical cdf of Z (with respect to x_0) is now defined as:

(2)  G^_{nk}(z) = k^{-1} Sum_{i=1}^{k} 1(Z_{ni} <= z),

where 1(S) denotes the indicator of the event S. The k-NN estimator xi^_{nk} of xi can now be expressed as the p-quantile of G^_{nk}, i.e.,

(3)  xi^_{nk} = the [kp]-th order statistic of Z_{n1}, ..., Z_{nk} = inf{z: G^_{nk}(z) >= [kp]/k}.
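As a purely illustrative aside (not part of the original paper), (2) and (3) translate directly into code. In the following Python sketch every name is ours, and the implementation is only a minimal rendering of the definitions, not a definitive one.

import numpy as np

def knn_quantile(x, z, x0, k, p):
    # Y_i = |X_i - x0|; sorting the Y's induces the order statistics Z_{n1},...,Z_{nn}
    idx = np.argsort(np.abs(x - x0))
    z_knn = np.sort(z[idx[:k]])          # Z-values of the k nearest neighbors, sorted
    r = max(int(np.floor(k * p)), 1)     # [kp]
    return z_knn[r - 1]                  # the [kp]-th order statistic, as in (3)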

The kernel estimator xi~_{nh} of xi with uniform kernel and bandwidth h can also be expressed in the same manner, viz.,

(4)  xi~_{nh} = inf{z: G^_{n,K_n(h)}(z) >= [K_n(h)p]/K_n(h)},  K_n(h) = Sum_{i=1}^{n} 1(Y_i <= h/2) = n F_{Y,n}(h/2),

F_{Y,n} being the empirical cdf of Y_1, ..., Y_n. The kernel estimators are thus related to the NN estimators by

(5)  xi~_{nh} = xi^_{n,K_n(h)},

where K_n(h) is the random integer given by (4).
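Relation (5) says the uniform-kernel estimator is the k-NN estimator evaluated at the random index K_n(h). A minimal self-contained sketch (ours, under the same caveats as above):

import numpy as np

def kernel_quantile(x, z, x0, h, p):
    # K_n(h) = n F_{Y,n}(h/2): the number of X_i within h/2 of x0
    k = int(np.sum(np.abs(x - x0) <= h / 2.0))
    if k == 0:
        raise ValueError("no observations in the window; increase h")
    # relation (5): apply the k-NN construction of (2)-(3) with k = K_n(h)
    z_knn = np.sort(z[np.argsort(np.abs(x - x0))[:k]])
    return z_knn[max(int(np.floor(k * p)), 1) - 1]

With X standard normal and Z equal to X plus independent standard normal noise, for instance, both sketches should approach the true conditional median at x_0 as n grows along the rates k of order n^{4/5} and h of order n^{-1/5} studied below.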

We now state our main results in the following two theorems, of which Theorem N1 gives a Bahadur-type representation for the k-NN estimator xi^_{nk} of xi with k lying in I_n(a,b) = [a n^{4/5}, b n^{4/5}], and Theorem K1 gives a corresponding representation for the kernel estimator xi~_{nh} with h lying in J_n(c,d) = [c n^{-1/5}, d n^{-1/5}].

Theorem N1. Uniformly in k in I_n(a,b),

xi^_{nk} - xi = beta(xi) (k/n)^2 + {k g(xi)}^{-1} Sum_{i=1}^{k} [1(Z*_{ni} > xi) - (1-p)] + R_{nk},

where

beta(xi) = -[f(x_0) G_xx(xi|x_0) + 2 f'(x_0) G_x(xi|x_0)] / {24 f^3(x_0) g(xi)},

Z*_{ni} = G^{-1} o G_{ni}(Z_{ni}) with G_{ni}(.) = G*(.|Y_{ni}), and

max_{k in I_n(a,b)} |R_{nk}| = O(n^{-3/5} log n), a.s.

Theorem K1. Uniformly in h in J_n(c,d),

xi~_{nh} - xi = beta(xi) f^2(x_0) h^2 + {m_n(h) g(xi)}^{-1} Sum_{i=1}^{m_n(h)} [1(Z*_{ni} > xi) - (1-p)] + R*_{nh},

where beta(xi) and Z*_{ni} are as in Theorem N1, m_n(h) = [n h f(x_0)], and

sup_{h in J_n(c,d)} |R*_{nh}| = O(n^{-3/5} log n), a.s.


Remarks.

1. Let A = sigma{Y_1, Y_2, ...}. Then Z_{n1}, ..., Z_{nn} are conditionally independent given A, with Z_{ni} having conditional cdf G*(.|Y_{ni}), as shown by Bhattacharya (1974). Hence G*(Z_{ni}|Y_{ni}), 1 <= i <= n, are conditionally independent and uniform (0,1) given A, and therefore also unconditionally i.i.d. uniform (0,1). Thus for each n, Z*_{n1}, ..., Z*_{nn} are i.i.d. with cdf G. Since G(xi) = p, it follows that for each n, the summands 1(Z*_{ni} > xi) - (1-p) in the above representations are independent random variables with mean 0 and variance p(1-p).
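Remark 1 lends itself to a quick numerical check. The following sketch is ours; the Gaussian model (X standard normal, Z equal to X plus independent standard normal noise), the point x_0, the seed, and all names are assumptions made purely for illustration.

import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(0)
n, x0 = 2000, 0.5
x = rng.normal(size=n)                 # X ~ N(0,1), so f is the standard normal pdf
z = x + rng.normal(size=n)             # Z | X = x ~ N(x,1), so G(.|x) = Phi(. - x)

order = np.argsort(np.abs(x - x0))
yo = np.abs(x - x0)[order]             # Y_{n1} < ... < Y_{nn}
zo = z[order]                          # induced order statistics Z_{n1},...,Z_{nn}

# G*(z|y) from (1); the normalizer f_Y(y) equals f(x0+y) + f(x0-y)
w1, w2 = norm.pdf(x0 + yo), norm.pdf(x0 - yo)
u = (w1 * norm.cdf(zo - x0 - yo) + w2 * norm.cdf(zo - x0 + yo)) / (w1 + w2)

print(kstest(u, "uniform"))            # G*(Z_{ni}|Y_{ni}) should look i.i.d. uniform (0,1)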

2. The remainder terms in both theorems are O(n^{-3/5} log n), a.s. In Theorem N1, this corresponds to O(k^{-3/4} log k) with k = O(n^{4/5}), as one would expect. The same explanation applies to Theorem K1, because m_n(h) = [n h f(x_0)] = O(n^{4/5}) for h = O(n^{-1/5}).

3. Weaker versions of the above theorems were proved by Gangopadhyay (1987). His remainder terms were o(n^{-2/5}), a.s., in Theorem N1 and o_p(n^{-2/5}) in Theorem K1.

3. Weak Convergence Properties of NN Estimators with Varying k and Kernel Estimators with Varying Bandwidth.

Consider the stochastic processes {xi^_{nk}, k in I_n(a,b)} and {xi~_{nh}, h in J_n(c,d)}. The two theorems in this section describe the weak convergence properties of suitably normalized versions of these processes as n -> inf. The symbol => indicates convergence in distribution, i.e., weak convergence of the distributions of the stochastic processes (or random vectors) under consideration, and {B(t), t >= 0} denotes a standard Brownian motion.

Theorem N2. Let T_n(t) = xi^_{n,[n^{4/5}t]}. Then for any 0 < a < b,

{n^{2/5}[T_n(t) - xi] - beta t^2, a <= t <= b} => {sigma t^{-1} B(t), a <= t <= b},

where beta = beta(xi) given in Theorem N1 and sigma^2 = p(1-p)/g^2(xi).

Proof. In the representation for xi^_{nk} given in Theorem N1, take k = [n^{4/5}t] = n^{4/5}t - eps_n(t) with 0 <= eps_n(t) < 1. After a little rearrangement of terms, this leads to

n^{2/5}[T_n(t) - xi] - beta(xi) t^2 = {sqrt(p(1-p))/g(xi)} t^{-1} n^{-2/5} Sum_{i=1}^{[n^{4/5}t]} W_{ni} + Sum_{j=1}^{3} R_{nj}(t),

where W_{ni} = [1(Z*_{ni} > xi) - (1-p)]/sqrt(p(1-p)) are i.i.d. with mean 0 and variance 1 for each n, in view of Remark 1. Of the three remainder terms, R_{n1}(t) = n^{2/5} R_{n,[n^{4/5}t]} comes from Theorem N1, while R_{n2}(t) and R_{n3}(t) come from the discrepancy 0 <= eps_n(t) < 1 due to replacing k = [n^{4/5}t] by n^{4/5}t in the first two terms of the representation. Hence

sup_{a<=t<=b} |R_{n1}(t)| = O(n^{-1/5} log n), a.s.,

while R_{n2}(t) and R_{n3}(t) are easily seen to be O_p(n^{-4/5}) by virtue of

sup_{a<=t<=b} |n^{-2/5} Sum_{i=1}^{[n^{4/5}t]} W_{ni}| = O_p(1).

The three remainder terms are together O_p(n^{-1/5} log n) uniformly in a <= t <= b. We thus have, with sigma = sqrt(p(1-p))/g(xi),

n^{2/5}[T_n(t) - xi] - beta(xi) t^2 = sigma t^{-1} n^{-2/5} Sum_{i=1}^{[n^{4/5}t]} W_{ni} + o_p(1),

uniformly in a <= t <= b. Now use Theorem 1, page 452 of Gikhman and Skorokhod (1969) to see that

{n^{-2/5} Sum_{i=1}^{[n^{4/5}t]} W_{ni}, a <= t <= b} => {B(t), a <= t <= b}.

This proves the theorem.
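At a fixed t, the limit process in Theorem N2 has variance sigma^2/t. A crude Monte Carlo check of this single-t variance (our sketch; the Gaussian model below has xi = 0 and sigma^2 = p(1-p)/g^2(xi) = pi/2 for the conditional median at x_0 = 0, and the seed and all names are assumptions):

import numpy as np
rng = np.random.default_rng(1)
n, x0, p, t, reps = 4000, 0.0, 0.5, 1.0, 500
k = int(t * n ** 0.8)                        # k = [n^{4/5} t]
est = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    z = x + rng.normal(size=n)               # xi = x0 = 0 for the conditional median
    zk = np.sort(z[np.argsort(np.abs(x - x0))[:k]])
    est[r] = zk[max(int(k * p), 1) - 1]      # xi^_{nk}
print(n ** 0.8 * est.var(ddof=1))            # should be near sigma^2/t = pi/2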

Theorem K2. Let S_n(t) = xi~_{n,n^{-1/5}t}. Then for any 0 < c < d,

{n^{2/5}[S_n(t) - xi] - gamma t^2, c <= t <= d} => {tau t^{-1} B(f(x_0) t), c <= t <= d},

where gamma = beta f^2(x_0) and tau = sigma/f(x_0).

Proof. In the representation for xi~_{nh} given in Theorem K1, take h = n^{-1/5}t and rearrange terms to obtain

n^{2/5}[S_n(t) - xi] - gamma t^2 = tau t^{-1} n^{-2/5} Sum_{i=1}^{m_n(n^{-1/5}t)} W_{ni} + remainder terms,

with W_{ni} as in the proof of Theorem N2. The remainder terms are handled exactly as in that proof, and since m_n(n^{-1/5}t) = [n^{4/5} f(x_0) t],

{n^{-2/5} Sum_{i=1}^{[n^{4/5} f(x_0) t]} W_{ni}, c <= t <= d} => {B(f(x_0) t), c <= t <= d},

which proves the theorem.


Lemma 3. Under Conditions 1 and 2,

g*(z|y) = g(z) + y^2 q(z) + y^3 r(y,z),
G*(z|y) = G(z) + y^2 Q(z) + y^3 R(y,z),

where there exist eps > 0 and M < inf such that |q(z)|, |Q(z)|, |r(y,z)| and |R(y,z)| are all bounded by M for 0 <= y <= eps and |z - xi| <= eps.

Proof. Expand f(x_0 +- y), g(z|x_0 +- y) and G(z|x_0 +- y) about 0 to obtain:

f(x_0 +- y) = f(x_0) +- y f'(x_0) + (1/2) y^2 {f''(x_0) + D_1(y)},
g(z|x_0 +- y) = g(z|x_0) +- y g_x(z|x_0) + (1/2) y^2 {g_xx(z|x_0) + D_2(y,z)},
G(z|x_0 +- y) = G(z|x_0) +- y G_x(z|x_0) + (1/2) y^2 {G_xx(z|x_0) + D_3(y,z)},

with max{|D_1(y)|, |D_2(y,z)|, |D_3(y,z)|} <= A y for 0 <= y <= eps and |z - xi| <= eps, where eps > 0 and A < inf are as in Conditions 1 and 2. To obtain the formulas for g*(z|y) and G*(z|y) stated in the lemma, use the above expansions in (1). It will be seen by appropriate arrangement of terms that in the remainders y^3 r(y,z) for g*(z|y) and y^3 R(y,z) for G*(z|y), the quantities |r(y,z)| and |R(y,z)| for 0 <= y <= eps and |z - xi| <= eps remain bounded by a constant M determined by f(x_0), f''(x_0), g(xi|x_0), g_xx(xi|x_0), G_x(xi|x_0), G_xx(xi|x_0) and A. The boundedness of |q(z)| and |Q(z)| for |z - xi| <= eps also follows from Condition 2.
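For a concrete model, the expansion in Lemma 3 can be verified symbolically. In this sketch (ours; the Gaussian model is an assumption), the difference G*(z|y) - G(z) contains no first-order term in y, only the y^2 Q(z) term asserted by the lemma:

import sympy as sp

y, z = sp.symbols("y z", real=True)
x0 = 0
f = lambda u: sp.exp(-u**2 / 2) / sp.sqrt(2 * sp.pi)          # f = N(0,1) pdf
G = lambda zz, xx: (1 + sp.erf((zz - xx) / sp.sqrt(2))) / 2   # G(z|x) = Phi(z - x)

num = f(x0 + y) * G(z, x0 + y) + f(x0 - y) * G(z, x0 - y)
den = f(x0 + y) + f(x0 - y)                                   # f_Y(y)
Gstar = num / den                                             # G*(z|y) as in (1)

expansion = sp.series(Gstar, y, 0, 3).removeO()
print(sp.simplify(expansion - G(z, x0)))   # a pure y**2 term: y^2 Q(z), no O(y) term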

6. Proof of Theorem N1: Bias in xi^_{nk}.

Recall that the target of xi^_{nk} is xi_{nk}, the p-quantile of the random cdf G-bar_{nk}(.) = k^{-1} Sum_{i=1}^{k} G*(.|Y_{ni}), while xi is the p-quantile of G(.). The leading term of xi_{nk} - xi is non-stochastic with probability 1, and it is determined in this section. This is the bias in xi^_{nk}. We first use Lemma 3 to bound the discrepancy between G-bar_{nk}(.) and G(.) near xi in terms of Y_{n,[n^{4/5}b]}. This makes max_{k in I_n(a,b)} |xi_{nk} - xi| small whenever Y_{n,[n^{4/5}b]} is small, in the manner described in the following lemma. The almost sure order of magnitude of xi_{nk} - xi is obtained as a corollary.

Lemma 4. For every B, there exist N and C such that, in the sample space of infinite sequences {(Y_i,Z_i), i = 1,2,...}, Y_{n,[n^{4/5}b]} < B n^{-1/5} implies max_{k in I_n(a,b)} |xi_{nk} - xi| <= C n^{-2/5} for all n >= N.

Proof. Fix B.

7. Proof of Theorem N1: the Representation.

We first establish a representation of xi^_{nk} - xi_{nk} in terms of the random variables 1(Z_{ni} > xi_{nk}) - (1 - G_{ni}(xi_{nk})), which are conditionally independent given A = sigma{Y_1, Y_2, ...}; the remainder term is O(n^{-3/5} log n), a.s., which corresponds to O(k^{-3/4} log k) with k = O(n^{4/5}), as one would expect. This representation is then modified to another one in which the random variables 1(Z_{ni} > xi_{nk}) - (1 - G_{ni}(xi_{nk})) are replaced by 1(Z_{ni} > xi) - (1 - G_{ni}(xi)), and to yet another one in which these random variables are replaced by 1(Z*_{ni} > xi) - (1 - G(xi)), where Z*_{ni} = G^{-1} o G_{ni}(Z_{ni}), 1 <= i <= n, are i.i.d. with cdf G. Together with the bias term obtained in Lemma 5, this will establish the representation of xi^_{nk} given in Theorem N1. Having completed the proof of Theorem N1 in this section, we shall then prove Theorem K1 in the next section, via formula (5), establishing the corresponding representation for the kernel estimator xi~_{nh} (with uniform kernel and bandwidth h). It should be emphasized that Theorem N1 is proved uniformly in k in I_n(a,b) with 0 < a < b, and Theorem K1 is proved uniformly in h in J_n(c,d) with 0 < c < d, where I_n(a,b) and J_n(c,d) are as given in Section 2. The first representation of xi^_{nk} - xi_{nk} rests on Lemmas 8 and 9, which run parallel to Bahadur's proof [1]. However, we start with a lemma which provides an exponential bound for the deviation of sums of independent Bernoulli variables from their mean, and then prove another lemma dealing with fluctuations of G^_{nk}(.) - G-bar_{nk}(.).

Lemma 6. Let U_{n1}, ..., U_{nn} be independent Bernoulli variables with P(U_{ni} = 1) = pi_{ni}. Then

P[ |n^{-1} Sum_{i=1}^{n} (U_{ni} - pi_{ni})| > t_n ] <= 2 exp[ -(1/2) n t_n^2 / {max_{1<=i<=n} pi_{ni} + t_n} ].

In particular, if t_n / max_{1<=i<=n} pi_{ni} -> 0, then for large n,

P[ |n^{-1} Sum_{i=1}^{n} (U_{ni} - pi_{ni})| > t_n ] <= 2 exp[ -(1/4) n t_n^2 / max_{1<=i<=n} pi_{ni} ],

and if max_{1<=i<=n} pi_{ni} / t_n -> 0, then for large n,

P[ |n^{-1} Sum_{i=1}^{n} (U_{ni} - pi_{ni})| > t_n ] <= 2 exp[ -(1/4) n t_n ].

Proof. The first inequality is a simplified version of Bernstein's inequality (see [14], page 205), from which the other two follow as special cases.

Lemma 7. Suppose zeta_{nk}, k in I_n(a,b), are A-measurable random variables with

max_{k in I_n(a,b)} |zeta_{nk} - xi_{nk}| <= C n^{-2/5} log n = eps_n(C).

Then for any gamma, there exists M such that

Sum_n n^gamma P[ max_{k in I_n(a,b)} |k^{-1} Sum_{i=1}^{k} (U_{nki} - mu_{nki})| > M n^{-3/5} log n ] < inf,

where U_{nki} = 1(Z_{ni} <= zeta_{nk}) - 1(Z_{ni} <= xi_{nk}) and mu_{nki} = E(U_{nki} | A).

Proof. Write U_{nki} = 1(Z_{ni} <= zeta_{nk}) - 1(Z_{ni} <= xi_{nk}) and mu_{nki} = E(U_{nki} | A). Choose B > b/f(x_0), and for each n let S_n = {Y_{n,[n^{4/5}b]} <= B n^{-1/5}}. Lemma 4 implies that there exist C' and N such that for n >= N and for z lying between zeta_{nk} and xi_{nk},

|z - xi| <= C n^{-2/5} log n + C' n^{-2/5} <= 2 eps_n(C)

holds on the set S_n. Using Lemma 3, we now conclude that when n is large, then on S_n,

max_{k in I_n(a,b)} max_{1<=i<=k} |mu_{nki}| <= (5/4) g(xi) eps_n(C).

It now follows from Lemma 6 that for sufficiently large n,

P[ max_{k in I_n(a,b)} |k^{-1} Sum_{i=1}^{k} (U_{nki} - mu_{nki})| > M n^{-3/5} log n ]
  <= Sum_{k in I_n(a,b)} P[ |k^{-1} Sum_{i=1}^{k} (U_{nki} - mu_{nki})| > M n^{-3/5} log n ]
  <= 2 (b-a) n^{4/5} exp[ -M^2 a {5 C g(xi)}^{-1} log n ] + P(S_n^c).

To complete the proof, observe that n^gamma n^{4/5} n^{-M^2 a {5Cg(xi)}^{-1}} is summable in n for sufficiently large M, while Sum_n P(S_n^c) < inf by Lemma 1. In particular, by the Borel-Cantelli lemma,

P[ max_{k in I_n(a,b)} |k^{-1} Sum_{i=1}^{k} (U_{nki} - mu_{nki})| > C n^{-3/5} log n  i.o. ] = 0

for large C.
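The first inequality in Lemma 6 can be sanity-checked numerically. The following sketch is ours, and all parameter choices are arbitrary assumptions made only for illustration:

import numpy as np
rng = np.random.default_rng(2)
n, t = 1000, 0.02
pi = rng.uniform(0.05, 0.15, size=n)               # pi_{ni} = P(U_{ni} = 1)
devs = rng.binomial(1, pi, size=(20000, n)).mean(axis=1) - pi.mean()
emp = np.mean(np.abs(devs) > t)                    # simulated tail probability
bound = 2 * np.exp(-0.5 * n * t**2 / (pi.max() + t))
print(emp, bound)                                  # emp should not exceed bound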

Proof. It follows from the monotonicity of G^_{nk}(.) and G-bar_{nk}(.) that for z in J_{nk,r} = [eta_{nk,r}, eta_{nk,r+1}], the difference G^_{nk}(z) - G-bar_{nk}(z) deviates from its value at the left endpoint by at most

|H_{nk}(eta_{nk,r})| + {G-bar_{nk}(eta_{nk,r+1}) - G-bar_{nk}(eta_{nk,r})},

where H_{nk}(.) is given by (17), and the increments of G-bar_{nk} over the intervals J_{nk,r} are, for large n, at most {M + 2 g(xi)} n^{-3/5} log n by the choice of the grid points eta_{nk,r}. Hence it suffices to bound max_r |H_{nk}(eta_{nk,r})| uniformly in k in I_n(a,b). Let S_n = {Y_{n,[n^{4/5}b]} <= B n^{-1/5}}. Then on S_n,

Sum_{k in I_n(a,b)} Sum_r P[ |k^{-1} Sum_{i=1}^{k} {U_{nki} - E(U_{nki} | A)}| > n^{-2/5} log n | A ] <= 2 (b-a) n^{4/5} 2 exp[ -2 (b-a) (log n)^2 ]

by Theorem 1 of Hoeffding (1963), and

Sum_{n=1}^{inf} 4 n^{4/5} exp[ -2 (b-a) (log n)^2 ] < inf.

Hence the required bound on the fluctuations of G^_{nk}(.) - G-bar_{nk}(.) holds almost surely for all large n.

From (18), (19), (20), and (21), we have

max_{k in I_n(a,b)} |(xi^_{nk} - xi_{nk}) - {g(xi)}^{-1} [p - G^_{nk}(xi_{nk})]| = O(n^{-3/5} log n), a.s.

Since p - G^_{nk}(xi_{nk}) = k^{-1} Sum_{i=1}^{k} [1(Z_{ni} > xi_{nk}) - {1 - G_{ni}(xi_{nk})}], we now have the following representation:

(22a)  xi^_{nk} = xi_{nk} + {k g(xi)}^{-1} Sum_{i=1}^{k} [1(Z_{ni} > xi_{nk}) - {1 - G_{ni}(xi_{nk})}] + R_{nk},

with max_{k in I_n(a,b)} |R_{nk}| = O(n^{-3/5} log n), a.s.

This representation can be easily modified to two other slightly different forms, viz.,

(22b)  xi^_{nk} = xi_{nk} + {k g(xi)}^{-1} Sum_{i=1}^{k} [1(Z_{ni} > xi) - {1 - G_{ni}(xi)}] + R_{nk}

and

(22c)  xi^_{nk} = xi_{nk} + {k g(xi)}^{-1} Sum_{i=1}^{k} [1(Z*_{ni} > xi) - {1 - G(xi)}] + R_{nk},

where max_{k in I_n(a,b)} |R_{nk}| = O(n^{-3/5} log n), a.s., in both (22b) and (22c), and Z*_{ni} = G^{-1} o G_{ni}(Z_{ni}) and G(.) = G(.|x_0) is the conditional cdf of Z given X = x_0. Note that since the Z_{ni} are conditionally independent given A, with Z_{ni} having conditional cdf G_{ni},

P[Z*_{ni} <= z_i, i = 1,...,n] = E P[Z*_{ni} <= z_i, i = 1,...,n | A]
  = E P[Z_{ni} <= G_{ni}^{-1} o G(z_i), i = 1,...,n | A] = Prod_{i=1}^{n} G(z_i),

so that for each n, Z*_{n1}, ..., Z*_{nn} are i.i.d. with cdf G(.). To obtain (22b) from (22a), use Lemma 7 with zeta_{nk} = xi to conclude that

max_{k in I_n(a,b)} |k^{-1} Sum_{i=1}^{k} [{1(Z_{ni} <= xi_{nk}) - 1(Z_{ni} <= xi)} - {G_{ni}(xi_{nk}) - G_{ni}(xi)}]| = O(n^{-3/5} log n), a.s.

To obtain (22c) from (22b), it is enough to show that

(23)  max_{k in I_n(a,b)} |k^{-1} Sum_{i=1}^{k} (U_{ni} - mu_{ni})| = O(n^{-3/5} log n), a.s.,

where U_{ni} = 1(Z*_{ni} > G^{-1} o G_{ni}(xi)) - 1(Z*_{ni} > xi) and mu_{ni} = E(U_{ni} | A) = G(xi) - G_{ni}(xi). For this, use Lemma 3 to observe that for large n,

max_{i <= [n^{4/5}b]} |mu_{ni}| <= B^2 |Q(xi)| n^{-2/5}  on the set S_n = {Y_{n,[n^{4/5}b]} <= B n^{-1/5}},

and then use Lemma 6 and Lemma 1 to obtain

Sum_n P[ max_{k in I_n(a,b)} |k^{-1} Sum_{i=1}^{k} (U_{ni} - mu_{ni})| > M n^{-3/5} log n ]
  <= 2 (b-a) Sum_{n=1}^{inf} n^{4/5} exp[ -a {4 B^2 |Q(xi)|}^{-1} (log n)^2 ] + Sum_{n=1}^{inf} P(S_n^c) < inf.
Since xi~_{nh} = xi^_{n,K_n(h)} by (5), Theorem N1 yields the representation

(24)  xi~_{nh} - xi = beta(xi) {K_n(h)/n}^2 + {K_n(h) g(xi)}^{-1} Sum_{i=1}^{K_n(h)} [1(Z*_{ni} > xi) - (1-p)] + R_{n,K_n(h)}.

However, this is of no use unless we can show that

(a) sup_{h in J_n(c,d)} |R_{n,K_n(h)}| converges at a fast rate, where J_n(c,d) = [c n^{-1/5}, d n^{-1/5}], and

(b) D_n(h) = K_n(h) - n h f(x_0) is uniformly small over J_n(c,d).

We first settle (b).

Lemma 10. sup_{h in J_n(c,d)} |D_n(h)| = O(n^{2/5} log n), a.s.

Proof. For fixed h, K_n(h) = n F_{Y,n}(h/2) is a sum of n independent Bernoulli variables with success probability F_Y(h/2), and |n F_Y(h/2) - n h f(x_0)| = O(n h^3) = O(n^{2/5}) uniformly in h in J_n(c,d). Dividing J_n(c,d) by grid points and using the monotonicity of K_n(.) and of n h f(x_0), we have

P[ sup_{h in J_n(c,d)} |D_n(h)| > 2 n^{2/5} log n ] <= (nu_n + 1) sup_{h in J_n(c,d)} P[ |D_n(h)| > n^{2/5} log n ],

where nu_n grows only polynomially in n, and

sup_{h in J_n(c,d)} P[ |D_n(h)| > n^{2/5} log n ] <= 2 exp[ -(log n)^2 / {8 d f(x_0)} ]

by Bernstein's inequality. Since Sum_n (nu_n + 1) exp[-(log n)^2/{8 d f(x_0)}] < inf, we conclude that sup_{h in J_n(c,d)} |D_n(h)| = O(n^{2/5} log n), a.s., and the lemma is proved.
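Lemma 10 states that K_n(h) fluctuates around n h f(x_0) on a scale much smaller than n^{4/5}. A simulation sketch (ours; the Gaussian design, the seed and all names are assumptions):

import numpy as np
rng = np.random.default_rng(3)
n, x0 = 200000, 0.0
x = rng.normal(size=n)
f_x0 = 1 / np.sqrt(2 * np.pi)                # f(x0) for the standard normal design
for h in (0.05, 0.1, 0.2):
    K = np.sum(np.abs(x - x0) <= h / 2)      # K_n(h)
    print(h, K - n * h * f_x0)               # D_n(h); small compared with n^{4/5}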

The convergence rate of sup_{h in J_n(c,d)} |R_{n,K_n(h)}| is now determined in the following lemma.

Lemma 11. sup_{h in J_n(c,d)} |R_{n,K_n(h)}| = O(n^{-3/5} log n), a.s.

Proof. For 0 < c < d, let 0 < a = c f(x_0)/2 < 2 d f(x_0) = b, and let

A_n = {sup_{h in J_n(c,d)} |R_{n,K_n(h)}| > M n^{-3/5} log n},
B_n = {max_{k in I_n(a,b)} |R_{nk}| > M n^{-3/5} log n},
C_n = {sup_{h in J_n(c,d)} |D_n(h)| > M n^{2/5} log n}.

Then there exists N_0 = N_0(M) such that for n > N_0, C_n^c implies K_n(h) in I_n(a,b) for all h in J_n(c,d). Thus C_n^c intersect A_n is contained in C_n^c intersect B_n for n > N_0. It now follows that for sufficiently large M,

P[A_n i.o.] <= P[C_n i.o.] + P[C_n^c intersect A_n i.o.] <= P[C_n i.o.] + P[C_n^c intersect B_n i.o.] = 0,

since for large M, P[C_n i.o.] = 0 by Lemma 10 and P[B_n i.o.] = 0 by Theorem N1.

We now consider the first two terms on the RHS of (24). Of these,

(25)  beta(xi) {K_n(h)/n}^2 = beta(xi) f^2(x_0) h^2 + R'_{nh},

where

R'_{nh} = beta(xi) f^2(x_0) h^2 [D_n(h)/{n h f(x_0)}] [2 + D_n(h)/{n h f(x_0)}],

and by Lemma 10, sup_{h in J_n(c,d)} |R'_{nh}| = O(n^{-4/5} log n), a.s. To examine the other term, let

(26)  U_{ni} = 1(Z*_{ni} > xi) - (1-p),  m_n(h) = [n h f(x_0)].

Then

(27)  {K_n(h) g(xi)}^{-1} Sum_{i=1}^{K_n(h)} U_{ni} = {m_n(h) g(xi)}^{-1} Sum_{i=1}^{m_n(h)} U_{ni} + R''_{nh} + R'''_{nh},

where

R''_{nh} = -{D_n(h)/K_n(h)} {m_n(h) g(xi)}^{-1} Sum_{i=1}^{m_n(h)} U_{ni},
R'''_{nh} = {K_n(h) g(xi)}^{-1} [ Sum_{i=1}^{K_n(h)} U_{ni} - Sum_{i=1}^{m_n(h)} U_{ni} ],

and U_{n1}, ..., U_{nn} are conditionally independent given A, with E(U_{ni} | A) = 0. By Lemma 10,

sup_{h in J_n(c,d)} |D_n(h)/K_n(h)| = O(n^{-2/5} log n), a.s.,

and

sup_{h in J_n(c,d)} |{m_n(h)}^{-1} Sum_{i=1}^{m_n(h)} U_{ni}| <= max_{n^{4/5} c f(x_0) <= k <= n^{4/5} d f(x_0)} |k^{-1} Sum_{i=1}^{k} U_{ni}| = O(n^{-1/5}), a.s.,

by an application of Theorem 1 of Hoeffding (1963). Hence sup_{h in J_n(c,d)} |R''_{nh}| = O(n^{-3/5} log n), a.s.

Now consider the jump-points of m_n(h) = [n h f(x_0)] in J_n(c,d) together with the endpoints n^{-1/5} c and n^{-1/5} d, and call these points

n^{-1/5} c = h_{n0} < h_{n1} < ... < h_{n,nu_n} = n^{-1/5} d.

Then nu_n = const. n^{4/5}, and m_n(h) is constant on each of the nu_n intervals [h_{nj}, h_{n,j+1}). At the same time, K_n(h) is also integer-valued and non-decreasing in h, so that

(28)  sup_{h in J_n(c,d)} |R'''_{nh}| = O(n^{-3/5} log n), a.s.,

will follow once we control the increments of K_n(h) over these intervals, i.e., once we establish

(29)  max_{0<=j<nu_n} {K_n(h_{n,j+1}) - K_n(h_{nj})} = O(n^{1/5} log n), a.s.

Now

P[ n^{-1} {K_n(h_{n,j+1}) - K_n(h_{nj})} > M n^{-4/5} log n ] <= exp[ -(M/4) n^{1/5} log n ]

by Lemma 6, and Sum_{n=1}^{inf} nu_n exp[-(M/4) n^{1/5} log n] < inf, because nu_n = const. n^{4/5}. This proves (29). Finally, note that

Sum_{i=m_n(h_{nj})+1}^{K_n(h_{nj})} U_{ni}

is a sum of |K_n(h_{nj}) - m_n(h_{nj})| terms, in which the summands U_{ni}, given in (26), when K_n(h_{nj}) > m_n(h_{nj}), and -U_{ni} when K_n(h_{nj}) < m_n(h_{nj}), are conditionally independent given A, with E(U_{ni} | A) = 0. Using Hoeffding's inequality, we therefore have

P[ |Sum_{i=m_n(h_{nj})+1}^{K_n(h_{nj})} U_{ni}| > M n^{1/5} log n ] <= 2 P[ |K_n(h_{nj}) - m_n(h_{nj})| > M n^{2/5} log n ] + 2 exp[ -2 M log n ].

Since Sum_n nu_n exp[-2 M log n] = const. Sum_n n^{4/5 - 2M} < inf for sufficiently large M, we only need to show that

(30)  Sum_{n=1}^{inf} n^{4/5} P[ |K_n(h_{nj}) - m_n(h_{nj})| > M n^{2/5} log n ] < inf

for large M, in order to establish (28). But K_n(h) - m_n(h) = K_n(h) - [n h f(x_0)] differs from D_n(h) = K_n(h) - n h f(x_0) by at most 1, and it was shown in the proof of Lemma 10 that P[|D_n(h)| > M n^{2/5} log n] < exp[-a (log n)^2] for some a > 0, which implies (30), and thus (28) is established. We have now shown that in (27),

sup_{h in J_n(c,d)} |R'''_{nh}| = O(n^{-3/5} log n), a.s.

To complete the proof of Theorem K1, substitute the expressions in (25) and (27) for the first two terms on the RHS of (24). The remainder terms R'_{nh}, R''_{nh} and R'''_{nh} in these expressions have all been shown to be O(n^{-3/5} log n), a.s., uniformly in h in J_n(c,d), and the other remainder term R_{n,K_n(h)} has also been shown to be of this order in Lemma 11. This establishes the order of magnitude of R*_{nh} = R'_{nh} + R''_{nh} + R'''_{nh} + R_{n,K_n(h)} claimed in the theorem.

REFERENCES

(1) Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-580.

(2) Bhattacharya, P.K. (1963). On an analog of regression analysis. Ann. Math. Statist. 34, 1459-1473.

(3) Bhattacharya, P.K. (1974). Convergence of sample paths of normalized sums of induced order statistics. Ann. Statist. 2, 1034-1039.

(4) Bhattacharya, P.K. and Mack, Y.P. (1987). Weak convergence of k-NN density and regression estimators with varying k and applications. Ann. Statist. 15, 976-994.

(5) Cheng, K.F. (1983). Nonparametric estimators for percentile regression functions. Commun. Statist.-Theor. Meth. 12, 681-692.

(6) Csorgo, M. and Revesz, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.

(7) Gangopadhyay, A.K. (1987). Nonparametric estimation of conditional quantile function. Ph.D. dissertation, University of California, Davis.

(8) Gikhman, I.I. and Skorokhod, A.V. (1969). Introduction to the Theory of Random Processes. W.B. Saunders Company, Philadelphia.

(9) Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13-30.

(10) Hogg, R.V. (1975). Estimates of percentile regression lines using salary data. J. Amer. Statist. Assoc. 70, 56-59.

(11) Krieger, A.M. and Pickands III, J. (1981). Weak convergence and efficient density estimation at a point. Ann. Statist. 9, 1066-1078.

(12) Stone, C.J. (1977). Consistent nonparametric regression. Ann. Statist. 5, 595-645.

(13) Stute, W. (1986). Conditional empirical processes. Ann. Statist. 14, 638-647.

(14) Uspensky, J.V. (1937). Introduction to Mathematical Probability. McGraw-Hill, New York.
