Some asymptotics for multimodality tests based on

Probab. Theory Relat. Fields 91, 115-132 (1992)

Probability Theory and Related Fields © Springer-Verlag 1992

Some asymptotics for multimodality tests based on kernel density estimates

E. Mammen 1, J.S. Marron 2, and N.I. Fisher 3

1 Universität Heidelberg, Institut für Angewandte Mathematik, Im Neuenheimer Feld 294, W-6900 Heidelberg, Federal Republic of Germany
2 University of North Carolina, Department of Statistics, Chapel Hill, NC 27514, USA
3 CSIRO Division of Mathematics and Statistics, Lindfield, N.S.W., Australia 2071

Received October 24, 1990; in revised form August 26, 1991

Summary. A test due to B.W. Silverman for modality of a probability density is based on counting modes of a kernel density estimator, and the idea of critical smoothing. An asymptotic formula is given for the expected number of modes. This, together with other methods, establishes the rate of convergence of the critically smoothed bandwidth. These ideas are extended to provide insight concerning the behaviour of the test based on bootstrap critical values.

1 Introduction

In this paper we present results about the (random) number of modes of a kernel density estimator. These results are applied in the asymptotic treatment of tests on the number of modes of a density. In a statistical exploratory analysis of a data set $X_1, \dots, X_n$ it may be important to decide if a mode of a kernel density estimator is caused by a mode of the underlying density or by random fluctuations of the data. A statistical approach to this problem can be based on hypothesis testing about the number of modes. For a discussion of other multimodality tests we refer to Good and Gaskins (1980); Hartigan and Hartigan (1985); Müller and Sawitzki (1991) and Silverman (1981, 1983, 1986). For testing the hypothesis of $k$ modes Silverman (1981) has proposed a test based on the "critically smoothed" bandwidth $h = h_{CRIT,k}$ of the kernel density estimator

$$f_h(x) = (nh)^{-1} \sum_{i=1}^{n} K((X_i - x)/h),$$

i.e. the bandwidth $h$ for which the estimator is just between having $k$ and $k + 1$ modes. A null hypothesis of $k$ modes should be rejected for large values of $h_{CRIT,k}$. The quantity $h_{CRIT,k}$ is uniquely defined if one uses a Gaussian kernel $K = \varphi$ because then the number of modes of $f_h$ is monotone in $h$ (see Silverman 1981).
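As a rough illustration of these definitions, the following Python sketch counts the modes of a Gaussian kernel estimate on a grid and locates the critical bandwidth by bisection, which relies on the monotonicity just mentioned. The function names, the grid resolution and the tolerance are assumptions made for this sketch and are not taken from the paper; the mode count is only a grid approximation.

```python
import numpy as np


def kde_gauss(x, data, h):
    """Gaussian kernel estimate f_h(x) = (n h)^-1 sum_i phi((X_i - x) / h)."""
    u = (data[None, :] - x[:, None]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))


def count_modes(data, h, grid_size=2048):
    """Approximate number of local maxima of f_h (hypothetical helper).

    The estimate is evaluated on an equispaced grid; a mode is counted
    wherever the finite difference changes from positive to non-positive.
    """
    x = np.linspace(data.min() - 3 * h, data.max() + 3 * h, grid_size)
    d = np.diff(kde_gauss(x, data, h))
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))


def critical_bandwidth(data, k, tol=1e-4):
    """Approximate h_CRIT,k: the smallest h for which f_h has at most k modes.

    Bisection is valid because, for the Gaussian kernel, the number of
    modes is monotone (non-increasing) in h.
    """
    lo = 0.0
    hi = max(float(data.max() - data.min()), tol)
    while count_modes(data, hi) > k:  # make sure the upper end is feasible
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_modes(data, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi
```

For a sample drawn from two well separated clusters, `critical_bandwidth(data, 1)` should come out much larger than `critical_bandwidth(data, 2)`, which is the behaviour the test exploits: a large critical bandwidth for $k$ modes is evidence against the hypothesis of $k$ modes.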


In Sect. 2 we present results about the (random) number of modes of a kernel density estimator. These results can be applied in the asymptotic treatment of the multimodality test of Silverman. For the determination of critical values Silverman (1981) proposed use of a smoothed bootstrap. The asymptotic behaviour of this bootstrap is discussed in Sect. 3. The proofs are contained in Sect. 4.

2 The number of modes of a kernel density estimator

In this section we will consider a density $f$ which has exactly $j$ modes. We will assume that $f$ does not lie on the "boundary" of the null hypothesis: $f'$ has no zero crossing $x_s$ of higher order (i.e. $f'(x_s) = f''(x_s) = 0$). For instance this excludes the cases that $f$ has flat parts (that there exists an interval $I$ with $f'(x) = 0$ for $x$ in $I$) and that $f$ has a critical point which is neither a minimum nor a maximum. Densities on the "boundary" of the null hypotheses can partially be treated using similar methods to those in this paper. More formally we will make the following assumptions.

Assumptions.
(A1) $f$ is a bounded density with bounded support $[a, b]$.
(A2) $f$ is twice continuously differentiable on $(a, b)$.
(A3) $f'(a+) > 0$, $f'(b-) < 0$.
(A4) $f$ has $j$ local maxima $z_0$ …

We note in passing the following application of Theorem 1. The theorem remains valid for a large class of kernels $K$ if one replaces $\|\varphi''\|$ by $\|K''\|$ in the formula (see Mammen (1991c)). Then the additional number of modes of $f_h$ (compared with $f$) may be interpreted as a measure of smoothness of $f_h$. (This is indeed connected with how our eyes measure smoothness of a curve.) Theorem 1 can now be applied to decide how to choose the kernel $K$ to get a kernel estimator $f_h$ with approximately the same amount of "smoothness" (i.e. nearly the same number of modes) as $f$. For a discussion of related issues see also Mammen (1991b, c) and Cuevas and Gonzalez Manteiga (1991).

Theorem 1 has an immediate consequence when the bandwidth $h$ converges to zero at a slower rate than $n^{-1/5}$ (apply the monotonicity of $N(h)$ and $E N(h)$).

Corollary 1.2 Assume $n^{1/5} h \to \infty$. Then, under the assumptions (A1), ..., (A5),

$$E N(h) = j + o(1).$$

… the critical bandwidth is of order $n^{-1/5}$.

Corollary 2.1 Assume $k \ge j$. Then, under the assumptions (A1), ..., (A5), for $\rho_n \to 0$, $\tau_n \to \infty$,

$$P(\rho_n n^{-1/5} \le h_{CRIT,k} \le \tau_n n^{-1/5}) \to 1.$$

Corollary 2.1 corrects the theorem in Silverman (1983) where it has been erroneously stated that under the null hypothesis, the critical bandwidth is of order $n^{-1/5} \log(n)^{1/5}$ (see also the remark in the proof of our Theorem 2). The rate of convergence of $h_{CRIT,k}$ coincides with the rate of the bandwidth which is optimal in most senses for the class of twice continuously differentiable densities. Therefore, under the null hypothesis, the multimodality test of Silverman is based on the inspection of a reasonable kernel density estimate. In the proofs we will see that $h_{CRIT,k}$ is of order $n^{-1/5}$ because the second derivative $f_h''$ is of order $O_p(1)$ if $h$ is of order $n^{-1/5}$. Note that the optimal bandwidth for twice continuously differentiable densities is of the same order $O(n^{-1/5})$ for another reason, namely because of the balance of bias and variance for this order of convergence.

The machinery developed for the proofs of Theorems 1 and 2 easily yields two other interesting results. The next theorem shows that the analysis of the Silverman test is essentially based on a study of the observations near the location of the local extremes of $f$.

Theorem 3 Assume (A1), ..., (A5), and that $h = h_n$ is of order $n^{-1/5}$ (i.e. $0$ … $j) + T_3$ where

$$T_3 = \sum_{m=1}^{\infty} P(N(c_\alpha) - j > m).$$

Consider now the case that the level $\alpha$ is small ($\alpha \to 0$). Then $n^{1/5} c_\alpha \to \infty$ (because of Corollary 2.1) and we conjecture that $T_3 = O(P(N(c_\alpha) > j)^2) = O(\alpha^2)$, and furthermore, if $\alpha$ tends to 0 slowly enough, that $|T_i|$ … $c_\alpha) = P(N(c_\alpha) > j) \approx E(N(c_\alpha) - j)$. So a sensible choice of $c_\alpha$ is the solution of

$$\alpha = \dots$$

where $\hat f(z_p)$ (or $\hat f''(z_p)$ respectively) is an estimate of $f$ (of $f''$ respectively) at the $p$-th extremum. This requires the estimation of the extremum points and of a second derivative. As an alternative we consider in the next section bootstrap estimates of critical values.
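The approximation $P(N(c_\alpha) > j) \approx E(N(c_\alpha) - j)$ used above can be read off the elementary tail-sum identity for an integer-valued count $N \ge j$: $E(N - j) = P(N > j) + \sum_{m \ge 1} P(N - j > m)$, so the two quantities differ exactly by a term of the form $T_3$. The short Python check below illustrates this on a made-up distribution; the function name and the probabilities are assumptions chosen purely for illustration, and the restriction $N \ge j$ is an assumption of the sketch.

```python
def tail_sum_decomposition(pmf, j):
    """Return (E(N - j), P(N > j), T3) for a count N >= j.

    pmf maps each possible value of N to its probability (hypothetical
    input format). T3 is the sum over m >= 1 of P(N - j > m).
    """
    mean_excess = sum((n - j) * p for n, p in pmf.items())
    p_exceed = sum(p for n, p in pmf.items() if n > j)
    largest = max(pmf)
    t3 = sum(
        sum(p for n, p in pmf.items() if n - j > m)
        for m in range(1, largest - j + 1)
    )
    return mean_excess, p_exceed, t3


# Illustrative numbers only: N equals j = 1 most of the time.
mean_excess, p_exceed, t3 = tail_sum_decomposition({1: 0.90, 2: 0.08, 3: 0.02}, j=1)
```

In this example the excess probability is 0.10 while the correction term is only 0.02, in line with the conjecture that the correction is of smaller order when the level is small.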

3 Bootstrapping the test statistic

To judge the stochastic behavior of $h_{CRIT,j}$ under the null hypothesis of $j$ modes Silverman proposed using a smoothed bootstrap approach based on resampling from $f_h$ where $h = h_{CRIT,j}$. This is a density on the boundary of the hypothesis of $j$ modes. The "critically smoothed" bandwidth for a kernel density estimate in the bootstrap resample may be called $h^*_{CRIT,j}$. Then the bootstrap test rejects the hypothesis that the underlying density has at most $j$ modes if

$$T = P(h^*_{CRIT,j} \le h_{CRIT,j} \mid X_1, \dots, X_n) \ge 1 - \alpha.$$
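The bootstrap statistic $T$ can be approximated by Monte Carlo. The sketch below is a minimal Python version under several assumptions: it resamples from $f_{h_{CRIT,j}}$ itself (not from Silverman's modified density), it counts modes on a grid, and it uses the fact that for the Gaussian kernel $h^*_{CRIT,j} \le h$ holds exactly when the resampled estimate with bandwidth $h$ has at most $j$ modes. All function names and default values are hypothetical.

```python
import numpy as np


def count_modes(data, h, grid_size=512):
    """Grid approximation to the number of modes of the Gaussian estimate f_h."""
    x = np.linspace(data.min() - 3 * h, data.max() + 3 * h, grid_size)
    # The normalising constant is omitted: it does not affect the mode count.
    f = np.exp(-0.5 * ((data[None, :] - x[:, None]) / h) ** 2).sum(axis=1)
    d = np.diff(f)
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))


def critical_bandwidth(data, k, tol=1e-3):
    """Smallest h (up to tol) for which f_h has at most k modes, by bisection."""
    lo, hi = 0.0, max(float(data.max() - data.min()), tol)
    while count_modes(data, hi) > k:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_modes(data, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi


def silverman_bootstrap(data, j=1, n_boot=200, seed=0):
    """Monte Carlo estimate of T = P*(h*_CRIT,j <= h_CRIT,j).

    Smoothed bootstrap: a draw from f_h is a resampled observation plus
    independent N(0, h^2) noise.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    h_crit = critical_bandwidth(data, j)
    hits = 0
    for _ in range(n_boot):
        star = rng.choice(data, size=n, replace=True) + h_crit * rng.standard_normal(n)
        # By monotonicity in h, h*_CRIT,j <= h_crit iff f*_{h_crit} has <= j modes.
        if count_modes(star, h_crit) <= j:
            hits += 1
    return h_crit, hits / n_boot
```

With this sketch the hypothesis of at most $j$ modes would be rejected at level $\alpha$ when the returned estimate of $T$ is at least $1 - \alpha$.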

In fact, Silverman (1981) proposed resampling from a slight modification of $f_{h_{CRIT}}$ instead of from $f_{h_{CRIT}}$ itself. This modification has an important effect for small sample sizes but is negligible in first order asymptotics (see Fisher et al. 1990). As alternatives one should also consider resampling from other densities on the "boundary" of the null hypotheses of $j$ modes. One possibility is the $L_2$ projection of a kernel density estimate onto the densities with $j$ modes (see Mammen (1991a) for the asymptotically equivalent analysis of a related nonparametric regression estimate). We conjecture that the operational characteristics of the Silverman test depend strongly on the choice of the bootstrap density if the true density $f$ itself lies on the boundary of the null hypothesis. Consider for instance the case that $f$ is constant on an interval. We discuss here neither this case nor the right choice of the bootstrap density. For a discussion of this point and of other modifications of the Silverman test we refer to Fisher et al. (1990). For other nonparametric curve estimates under the condition of simple curve characteristics see also Mammen (1991b).

The next theorem shows that, like $h_{CRIT,j}$, $h^*_{CRIT,j}$ is also of order $n^{-1/5}$. This suggests that the probability $P(T > 1 - \alpha)$ of rejecting the null hypothesis is asymptotically bounded away from 0.

Theorem 5 Under the assumptions (A1), ..., (A5), for $\rho_n \to 0$, $\tau_n \to \infty$

$P(\rho_n n^{-1/5}$ …

[Figure: axes labelled P-VALUE and EXPECTED]


4 Proofs of theorems

The main idea behind these proofs is approximation of the stochastic process $f_h(t)$, and its derivatives with respect to $t$, by suitable Gaussian processes. This allows analysis of the expected number of modes, through application of formulas for the expected numbers of zero crossings of such processes. We will give explicit proofs only for the case that $f$ has $j = 1$ mode. Other cases are very similar, but require additional cumbersome notation. We will make repeated use of the following result in Sect. 13.2 of Cramér and Leadbetter (1967).

Theorem (Cramér and Leadbetter). For a differentiable Gaussian process $Z(t)$ consider the number $N$ of zero crossings of $Z$ in the interval $[u, v]$. Then

$$E N = \int_u^v \gamma(t)\, \sigma(t)^{-1} \rho(t)\, \varphi\!\left(\frac{m(t)}{\sigma(t)}\right) G(\eta(t))\, dt$$

where

$$\gamma^2(t) = \operatorname{var}(Z'(t)), \qquad \sigma^2(t) = \operatorname{var}(Z(t)), \qquad \mu(t) = \frac{\operatorname{cov}(Z(t), Z'(t))}{\gamma(t)\,\sigma(t)},$$

$$\rho(t) = (1 - \mu^2(t))^{1/2}, \qquad m(t) = E Z(t), \qquad \eta(t) = \frac{m'(t) - \gamma(t)\,\mu(t)\,m(t)/\sigma(t)}{\gamma(t)\,\rho(t)},$$

$$G(x) = 2\varphi(x) + x(2\Phi(x) - 1).$$

Proof of Theorem 1 For $X_n(t) = X_n(t, h) = f_h(t) - E f_h(t)$ there exists a Brownian bridge $W_n$ (which is independent of $h$) such that for

$$Y_n(t) = Y_n(t, h) = n^{-1/2} \int h^{-2} \varphi'((x - t)/h)\, W_n(F(x))\, dx = n^{-1/2} \int h^{-1} \varphi((x - t)/h)\, dW_n(F(x))$$

the following holds if $c_1$ is large enough:

(4.1) $P(B_n) = o(n^{-1})$

where

(4.2) $B_n = \{\sup_t |X_n^{(s)}(t) - Y_n^{(s)}(t)| > c_1 (\log n)\, n^{-(3-s)/5}\}$

for $0 \le$ … $\ge c (\log n)\, n^{-1/5}$, if $|Y_n''(t)| \ge (c - c_1)(\log n)\, n^{-1/5}$ … if $|Y_n'(t)| < c (\log n)\, n^{-2/5}$ or $|f_h'(t)|$ … $c' (\log n)^{5/4} / \sqrt{b_n}) = O(b_n) + o(n^{-1})$. (4.8)

Proof of Lemma 7 Write $g = f_h'$. First note that Lemma 4 remains valid for $a_n = b_n / \log(n)$ if we replace $(Y_n'', Y_n')$ by $(f_h'', f_h')$. This can be seen along the lines of the proof of Lemma 4. The only point which needs a closer look is

$P(|f_h''(s)|$ … $> 0$ depending on $\varepsilon$.

[2 Co 1 ~

n n- 1/53/I-2~ .

n - a/5 (log n)- 3/4 c - l/z31

$= P(U(h) \ge C_0\, c^{3/2} (\log n)^{5/4} / \sqrt{b_n})$.

Proof of Theorem 2 Because $N(h)$ is decreasing in $h$ we can assume $n^{-1} h^{-2} (\log n) = o(n^{-1/5})$. Put $I_{j,n} = [(j - 1) n^{-1/5} + z_0,\; j n^{-1/5} + z_0]$ for $j \in \mathbb{Z}$. Proceeding as in the proof of Proposition 2 in Silverman (1983) one can show for every $j \in \mathbb{Z}$:

$$n^{1/5} \sup_{x \in I_{j,n}} f_h'(x) \to \infty \quad \text{(in probability)},$$

$$n^{1/5} \inf_{x \in I_{j,n}} f_h'(x) \to -\infty \quad \text{(in probability)}.$$

This proves Theorem 2. Note that the proof of Proposition 2 in Silverman (1983), where a stronger statement than in our Theorem 2 is claimed, does not work. In that proof the first equation on page 257 and therefore also (18) do not follow from (17) and (15). (In (17) the exponent 1/2 must be replaced by $-1/2$.)
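The Cramér and Leadbetter formula quoted at the beginning of this section, which is used again in the proofs that follow, is straightforward to evaluate numerically. The Python sketch below integrates it with a midpoint rule; the function signature, with the moment functions passed as callables, is an assumption of this sketch and not notation from the paper. For a stationary zero-mean process ($m = 0$, $\mu = 0$, constant $\sigma$ and $\gamma$) the integrand reduces to $\gamma / (\pi \sigma)$, the classical Rice rate, which gives a simple sanity check.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def G(x):
    """G(x) = 2 phi(x) + x (2 Phi(x) - 1); note G''(x) = 2 phi(x) > 0."""
    return 2.0 * phi(x) + x * (2.0 * Phi(x) - 1.0)


def expected_zero_crossings(m, dm, sigma, gamma, mu, u, v, steps=2000):
    """Midpoint-rule evaluation of E N on [u, v].

    m, dm     : mean of Z and its derivative, as functions of t
    sigma     : standard deviation of Z(t)
    gamma     : standard deviation of Z'(t)
    mu        : correlation of Z(t) and Z'(t), assumed to satisfy |mu| < 1
    (All five arguments are callables; this interface is hypothetical.)
    """
    dt = (v - u) / steps
    total = 0.0
    for i in range(steps):
        t = u + (i + 0.5) * dt
        s, g, r = sigma(t), gamma(t), mu(t)
        rho = math.sqrt(1.0 - r * r)
        eta = (dm(t) - g * r * m(t) / s) / (g * rho)
        total += g / s * rho * phi(m(t) / s) * G(eta) * dt
    return total
```

For example, with $\sigma = 1$, $\gamma = 2$ and zero mean on $[0, \pi]$ the function returns $2$, the value $\gamma (v - u) / (\pi \sigma)$ predicted by the stationary case.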


Proof of Theorem 3 Consider the Gaussian process $Y_n$ constructed in the beginning of the proof of Theorem 1. Choose $\Delta_n \to \infty$, $\Delta_n = o(\sqrt{\log n})$. For $c_2$ large denote the number of zero crossings of $Y_n'(t) + E f_h'(t) + c_1 (\log n)\, n^{-1/5}$ in the interval $[z_0 + \Delta_n n^{-1/5},\; z_0 + c_2 \sqrt{\log n}\, n^{-1/5}]$ by $N_1(h)$. Similarly $N_2(h)$ is the number of zero crossings of $Y_n'(t) + E f_h'(t) - c_1 (\log n)\, n^{-1/5}$ in the interval $[z_0 - c_2 \sqrt{\log n}\, n^{-1/5},\; z_0 - \Delta_n n^{-1/5}]$. The theorem follows from $P(N_j(h) > 0) \le E N_j(h) \to 0$. Note that $P(|Y_n'(t) + E f_h'(t) - f_h'(t)| > c_1 (\log n)\, n^{-1/5}) \to 0$. The convergence $E N_j(h) \to 0$ follows by an application of the Cramér-Leadbetter formula. Note for instance (see (4.4)) that with a sequence $s_n$ bounded away from 0 and $\infty$

$$E N_1(h) = \int_{z_0 + \Delta_n n^{-1/5}}^{z_0 + c_2 \sqrt{\log n}\, n^{-1/5}} \sigma(t)^{-1} \varphi(m(t)/\sigma(t))\, m'(t)\, dt\, (1 + o(1))$$

$$= \int_{z_0 + \Delta_n n^{-1/5}}^{z_0 + c_2 \sqrt{\log n}\, n^{-1/5}} n^{1/5} s_n^{-1} \varphi(m(t)\, n^{1/5} s_n^{-1})\, m'(t)\, dt\, (1 + o(1))$$

$$= \int_{t_1}^{t_2} \varphi(t)\, dt\, (1 + o(1))$$

where $t_1 = n^{1/5} s_n^{-1} m(z_0 + \Delta_n n^{-1/5})$ and $t_2 = n^{1/5} s_n^{-1} m(z_0 + c_2 \sqrt{\log n}\, n^{-1/5})$. Now $E N_1(h) \to 0$ follows from $t_1 \to \infty$.

Proof of Theorem 4 As in the beginning of the proof of Theorem 1 define $Y_n(t, h)$. Choose $m$ fixed constants $0 < \delta_1 < \dots < \delta_m$. Put $h_i = \delta_i n^{-1/5}$. Put

$h_{CRIT,j,1} = \inf\{h_i \mid f_h$ has exactly one local maximum for $h = h_i\}$,

$h_{CRIT,j,2} = \inf\{h_i \mid E f_h(t) + Y_n(t, h)$ has exactly one local maximum for $h = h_i\}$,

$h_{CRIT,j,3} = \inf\{h_i \mid \tfrac{1}{2} f''(z_0)(t - z_0)^2 + Y_n(t, h)$ has exactly one local maximum for $h = h_i\}$,

where the infimum over an empty set is defined as $+\infty$. We will show that

$$P(h_{CRIT,j,1} = h_{CRIT,j,3}) \to 1.$$

This shows that the distribution of $h_{CRIT,j}$ has the same weak limit as the distribution of the critical bandwidth $h$ of $\tfrac{1}{2} f''(z_0)(t - z_0)^2 + Y_n(t, h)$. This limit distribution depends only on $f(z_0)$ and $f''(z_0)$. This proves the theorem. Now $P(h_{CRIT,j,1} = h_{CRIT,j,2}) \to 1$ follows from Lemma 2 and Theorem 3. So it remains to show $P(h_{CRIT,j,2} = h_{CRIT,j,3}) \to 1$. But this follows from Lemma 4 if one chooses $a_n = 2 \sqrt{\log n}\, (C_0 \vee 1) \sup\{|f'(t) - f'(z_0)| : |t - z_0|$ …

=-n- 2) _~ 1

where $P^*(\cdot) = P(\cdot \mid X_1, \dots, X_n)$, $E^*(\cdot) = E(\cdot \mid X_1, \dots, X_n)$,

$$B_h^* = \{\sup_t |f_h^{*(s)}(t) - E^* f_h^{*(s)}(t) - Y^{*(s)}(t, h)| \ge c_1 (\log n)\, n^{-(3-s)/5}\}$$

for $0 \le s \le 3$. Denote the number of modes of $f_h^*$ by $N^*(h)$. We will show:

(4.10) $P^*(N^*(h_n) > 1) \to 1$ (in probability) if $h_n = o(n^{-1/5})$,

(4.11) $P^*(N^*(h_n) = 1) \to 1$ (in probability) if $h_n^{-1} = o(n^{1/5})$.

(4.10) can be shown similarly as in the proof of Theorem 2. To prove (4.11) we proceed similarly as in the proof of Theorem 1. With the arguments given in the proof of Lemma 2 one can show for $C_0$ large enough

(4.12) $E P^*(N^*(h_n) \ne \bar N^*(h_n)) \to 0$

where $\bar N^*(h_n)$ is the number of modes of $Y^*(t, h_n) + E^* f_{h_n}^*(t)$ in the interval $(u_n, v_n) = (z_0 - C_0 \sqrt{\log n}\, n^{-1/5},\; z_0 + C_0 \sqrt{\log n}\, n^{-1/5})$. So it remains to show that

(4.13) $E^* \bar N^*(h_n) \to 1$ (in probability).

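Proof of (4.13) First note that

$$E^* f_h^* = f_\tau \quad \text{where} \quad \tau = \tau_n = \sqrt{h_n^2 + h_{CRIT,j}^2}.$$

Put

$$m(t) = m_n(t) = f_\tau'(t),$$

$$\sigma^2 = \sigma_n^2 = n^{-1} h_n^{-3} \|\varphi'\|^2 f(z_0),$$

$$\gamma^2 = \gamma_n^2 = n^{-1} h_n^{-5} \|\varphi''\|^2 f(z_0).$$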
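The identity $E^* f_h^* = f_\tau$ stated at the start of this proof reflects the fact that convolving two Gaussian kernels with bandwidths $h$ and $h_{CRIT,j}$ gives a Gaussian kernel with bandwidth $\sqrt{h^2 + h_{CRIT,j}^2}$. The Python sketch below checks this numerically on a small made-up sample by comparing a Riemann-sum approximation of $\int h^{-1} \varphi((x - t)/h) f_{h_2}(x)\, dx$ with $f_\tau(t)$. The function names, sample values and bandwidths are assumptions for illustration only.

```python
import numpy as np


def kde_gauss(t, data, h):
    """Gaussian kernel density estimate f_h evaluated at the points t."""
    u = (data[None, :] - t[:, None]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))


def bootstrap_mean_kde(t, data, h, h_resample, grid_size=20001):
    """Approximate E* f*_h(t) when the resample is drawn from f_{h_resample}.

    Computes the integral of h^-1 phi((x - t)/h) f_{h_resample}(x) dx by a
    Riemann sum on a wide equispaced grid (hypothetical helper).
    """
    s = np.hypot(h, h_resample)
    x = np.linspace(data.min() - 10 * s, data.max() + 10 * s, grid_size)
    dx = x[1] - x[0]
    dens = kde_gauss(x, data, h_resample)
    kern = np.exp(-0.5 * ((x[None, :] - t[:, None]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return (kern * dens[None, :]).sum(axis=1) * dx


# Illustrative values only.
sample = np.array([-1.0, 0.2, 0.5, 2.0])
t = np.linspace(-2.0, 3.0, 11)
lhs = bootstrap_mean_kde(t, sample, h=0.4, h_resample=0.7)
rhs = kde_gauss(t, sample, np.hypot(0.4, 0.7))
```

The two arrays `lhs` and `rhs` agree up to the quadrature error, which is why the mean function of the bootstrap process can be written directly as $m(t) = f_\tau'(t)$.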


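Then by application of the Cramér-Leadbetter formula one gets with similar expansions as in the proof of Theorem 1:

(4.14)
$$2 E^* \bar N^*(h_n) - 1 = \int_{u_n}^{v_n} \gamma\, \sigma^{-1} \varphi(m(t)/\sigma)\, G(m'(t)/\gamma)\, dt + o_p(1) = U_1 + \dots + U_4 + o_p(1)$$

where

$$U_1 = \int_{u_n}^{v_n} \sigma^{-1} \varphi\!\left(\frac{m(t)}{\sigma}\right) 2\gamma\, \varphi\!\left(\frac{m'(t)}{\gamma}\right) dt,$$

$$U_2 = \int_{u_n}^{v_n} \sigma^{-1} \varphi\!\left(\frac{m(t)}{\sigma}\right) 2 m'(t)\, \Phi\!\left(\frac{m'(t)}{\gamma}\right) dt,$$

$$U_3 = \int_{u_n}^{v_n} \sigma^{-1} \varphi\!\left(\frac{m(t)}{\sigma}\right) (E m'(t) - m'(t))\, dt,$$

$$U_4 = -\int_{u_n}^{v_n} \sigma^{-1} \varphi\!\left(\frac{m(t)}{\sigma}\right) E m'(t)\, dt.$$

Now the theorem is proved by the following statements:

(4.15) $E|U_p| \to 0$ for $p = 1, 2,$ and $3$,

(4.16) $E U_4 \to 1$.

For the proof of (4.15) and (4.16) note that $m(t)$ and $m'(t)$ are asymptotically independent. This implies for $p = 1$:

$$E|U_1| = \int_{u_n}^{v_n} E\, \sigma^{-1} \varphi\!\left(\frac{m(t)}{\sigma}\right) E\, 2\gamma\, \varphi\!\left(\frac{m'(t)}{\gamma}\right) dt.$$

Now $\gamma \to 0$ (because we have assumed $h_n^{-1} = o(n^{1/5})$) and therefore

$$E\, \sigma^{-1} \varphi\!\left(\frac{m(t)}{\sigma}\right) = \tilde\sigma^{-1} \varphi\!\left(\frac{E m(t)}{\tilde\sigma}\right) + o(1)$$

where $\tilde\sigma^2 = \sigma^2 + n^{-1} \tau^{-3} \|\varphi'\|^2 \cdots$

For the proof of the other statements in (4.15), (4.16) use $E(m'(t) - E m'(t))^2 \to 0$.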

Proof of Theorem 6 Suppose for simplicity that the interval $[s, t]$ coincides with the support $[a, b]$ of $f$. Write $N^*(h_1) = N^*(h_1, s_n, t_n)$ and, as above, $N(h_1) = N(h_1, s_n, t_n)$. Consider a resample $X_1^*, \dots, X_n^*$ with density $f_{h_2}$. As in the last proof we will use a Brownian bridge $W_n^*$ (given $X_1, \dots, X_n$) such that for $Y^*(t, h) = n^{-1/2} \int h^{-1} \varphi((x - t)/h)\, dW_n^*(F_n(h_2, x))$ property (4.9) holds for $c_1$ large enough. Here $F_n(h_2, x)$ is the distribution function with density $f_{h_2}$. We will show:

$$E \bar N^*(h_1) \ge E N(h_1) + o(1)$$

where, as above, $\bar N^*(h_1)$ is the number of modes of $Y^*(t, h_1) + E^* f_{h_1}^*(t)$ in the interval $(u_n, v_n) = (z_0 - C_0 \sqrt{\log n}\, n^{-1/5},\; z_0 + C_0 \sqrt{\log n}\, n^{-1/5})$. This shows the statement of the theorem because of

$$\lim_{c \to \infty} E \min(N^*(h_1), c) = E N^*(h_1).$$

Now put $\tau = \sqrt{h_1^2 + h_2^2}$ and define $m(\cdot)$, $\sigma$, $\gamma$ as in the last proof. Then one can show

$$E \bar N^*(h_1) = E E^* \bar N^*(h_1) = \frac{1}{2} + E \int_{u_n}^{v_n} \frac{\gamma}{2}\, \sigma^{-1} \varphi\!\left(\frac{m(t)}{\sigma}\right) G\!\left(\frac{m'(t)}{\gamma}\right) dt + o(1)$$

$$= \frac{1}{2} + \int_{u_n}^{v_n} \tilde\sigma^{-1} \varphi\!\left(\frac{E m(t)}{\tilde\sigma}\right) \Gamma(E m'(t))\, dt + o(1)$$

where $\tilde\sigma$ is defined as in the last proof and where

$$\Gamma(u) = \frac{\gamma}{2} \int G\!\left(\frac{v}{\gamma}\right) \tilde\gamma^{-1} \varphi\!\left(\frac{v - u}{\tilde\gamma}\right) dv$$

and

$$\tilde\gamma^2 = n^{-1} \tau^{-5} \|\varphi''\|^2 f(z_0) \quad (= \operatorname{var} m'(t) + O(n^{-2/5} \log n)).$$

Now by differentiation one can see that $G$ is convex. This implies that

$$\Gamma(u) \ge \frac{\gamma}{2}\, G\!\left(\frac{u}{\gamma}\right).$$

Using this and $E m'(t) = f''(z_0) + o(1)$ gives

$$E \bar N^*(h_1) \ge \frac{1}{2} + \frac{\gamma}{2} \int_{u_n}^{v_n} \tilde\sigma^{-1} \varphi\!\left(\frac{E m(t)}{\tilde\sigma}\right) G\!\left(\frac{E m'(t)}{\gamma}\right) dt + o(1) = E N(h_1) + o(1).$$

References

Cramér, H., Leadbetter, M.R.: Stationary and related stochastic processes. New York: Wiley 1967
Cuevas, A., Gonzalez Manteiga, W.: Data-driven smoothing based on convexity properties. In: Roussas, G. (ed.) Nonparametric functional estimation and related topics, pp. 225-240. Dordrecht: Kluwer 1991
Fisher, N.I., Mammen, E., Marron, J.S.: Testing for multimodality. Technical Report 1990
Good, I.J., Gaskins, R.A.: Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data. J. Am. Stat. Assoc. 75, 42-73 (1980)
Hall, P.: Edgeworth expansions for nonparametric density estimators, with applications. Technical Report 1989
Hartigan, J.A., Hartigan, P.M.: The DIP test of unimodality. Ann. Stat. 13, 70-84 (1985)
Mammen, E.: Estimating a smooth monotone regression function. Ann. Stat. 19, 724-740 (1991a)
Mammen, E.: Nonparametric regression under qualitative smoothness assumptions. Ann. Stat. 19, 741-759 (1991b)
Mammen, E.: On qualitative smoothness of kernel density estimates. Technical Report 1991c
Müller, D.W., Sawitzki, G.: Excess mass estimates and tests for multimodality. J. Am. Stat. Assoc. 86, 738-746 (1991)
Silverman, B.W.: Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Stat. 6, 177-184 (1978)
Silverman, B.W.: Using kernel estimates to investigate multimodality. J. R. Stat. Soc., Ser. B 43, 97-99 (1981)
Silverman, B.W.: Some properties of a test for multimodality based on kernel density estimates. In: Kingman, J.F.C., Reuter, G.E.H. (eds.) Probability, statistics and analysis, pp. 248-259. Cambridge: Cambridge University Press 1983
Silverman, B.W.: Density estimation for statistics and data analysis. London: Chapman and Hall 1986