Nonparametric symmetry tests for statistical functionals

Arnold Janssen
Heinrich-Heine-Universität Düsseldorf

Dedicated to the 70th birthday of Johann Pfanzagl

Summary. Along the lines of Pfanzagl's work, the testing theory for the nonparametric null hypothesis of symmetry (including matched pairs) is developed. The testing problem is typically given by a skew symmetric statistical functional, which seems to be adequate for the nonparametric world. Under mild regularity assumptions asymptotically maximin tests are obtained which turn out to be efficient for special submodels. The tests can be carried out as finite sample distribution free randomization tests, conditioning on the absolute values. It is shown that these tests are asymptotically valid level $\alpha$ tests also under some extended null hypotheses when the random variables have heterogeneous distributions under the null hypothesis. In the case of invariant statistical functionals we end up with rank tests for matched pairs. These rank tests can now be derived from testing problems for functionals.

AMS 1991 subject classifications: 62G10

Key words: testing symmetry, symmetry statistical functionals, heterogeneous null distribution, conditional central limit theorems, matched pairs, canonical gradient test, rank test, randomization test, Wilcoxon test

1. Introduction

Parametric or semiparametric testing problems for hypotheses given by their parameters typically rely on likelihood methods. The derivative of the likelihood function determines the score test, which often turns out to be asymptotically efficient in various cases. On the other hand, nonparametric hypotheses can often be specified by statistical functionals, see for instance Witting and Müller-Funk [18] for concrete examples. In his work Pfanzagl developed a wonderful method and construction principle for the efficient testing of functionals which extends the classical parametric point of view. In the light of the $L_2$-differentiability calculus of statistical models this concept will briefly be compared with the parametric situation.

Suppose that the model is given by a set $\mathcal{P} \subset \mathcal{M}_1(\Omega, \mathcal{A})$ of relevant probability measures which is a subset of the set $\mathcal{M}_1(\Omega) = \mathcal{M}_1(\Omega, \mathcal{A})$ of all probability measures on some measurable space $(\Omega, \mathcal{A})$. Let $P_0 \in \mathcal{P}$ be fixed. The local geometry of the space $\mathcal{M}_1(\Omega)$ at $P_0$ is described by all $L_2$-differentiable curves $t \mapsto P_t$ at $t = 0$ and their tangents $g \in L_2(P_0)$, which are just the $L_2(P_0)$-derivatives

$$g = 2\,\frac{d}{dt}\left(\frac{dP_t}{dP_0}\right)^{1/2}\Big|_{t=0} = \frac{d}{dt}\log\frac{dP_t}{dP_0}\Big|_{t=0} \tag{1.1}$$

in the Hilbert space $L_2(P_0)$, see Bickel et al. [4], Strasser [15] and Strasser [16] for a recent discussion. Let $L \subset L_2^0(P_0) = \{h \in L_2(P_0) : \int h\,dP_0 = 0\}$ be the set of all tangents at $P_0$ given by $L_2$-differentiable curves at $t = 0$ with $P_t \in \mathcal{P}$ for $t$ small enough. The set $L$ may be viewed as a set of nonparametric local parameters which is now used to describe the testing problem locally at $P_0$.

Pfanzagl and Wefelmeyer [12, 13] introduced tangents and their tangent spaces, denoted by $T(P_0; \mathcal{P})$ (the $L_2(P_0)$-closure of $L$), by means of their differentiability concept in statistics; these are powerful tools for estimation and testing problems. In connection with estimation theory we also refer to the recent monograph of Bickel et al. [4].

Similarly to the introduction of score functions in parametric situations, a linearization of a given statistical functional $\kappa : \mathcal{P} \to \mathbb{R}$ at $P_0$ is now described. The functional is called differentiable at $P_0$ with gradient $\dot\kappa = \dot\kappa(P_0) \in L_2^0(P_0)$ if

$$\frac{1}{t}\bigl(\kappa(P_t) - \kappa(P_0)\bigr) \to \int g\,\dot\kappa\,dP_0 \quad \text{as } t \to 0 \tag{1.2}$$

for each $L_2$-differentiable curve $t \mapsto P_t$ in $\mathcal{P}$ with tangent $g \in L$ at $P_0$. Under convexity assumptions there exists a canonical gradient $\tilde\kappa = \tilde\kappa(P_0)$ among all gradients which has the smallest $L_2(P_0)$-norm. If $\tilde\kappa$ is itself a tangent, $\tilde\kappa \in L$, then we have the following nice interpretation of the canonical gradient. A curve with tangent $g = \tilde\kappa$ determines the hardest one-parametric submodel concerning testing or estimation problems given by the functional $\kappa$ at $P_0$. Testing problems are now typically specified by the one-sided hypothesis

$$\{P \in \mathcal{P} : \kappa(P) \le \kappa(P_0)\} \quad \text{against} \quad \{P \in \mathcal{P} : \kappa(P) > \kappa(P_0)\} \tag{1.3}$$

or two-sided alternatives with

$$\{P \in \mathcal{P} : \kappa(P) = \kappa(P_0)\} \quad \text{against} \quad \{P \in \mathcal{P} : \kappa(P) \ne \kappa(P_0)\}. \tag{1.4}$$

If we have $n$ independent replications $X_1, \dots, X_n$ of random variables with distribution in $\mathcal{P}$, the parametric score test statistic is now substituted by the test statistic

$$T_n = n^{-1/2} \sum_{i=1}^{n} \tilde\kappa(X_i) \tag{1.5}$$

given by the canonical gradient $\tilde\kappa$. One-sided and two-sided $T_n$-tests for (1.3) and (1.4), respectively, were introduced by Pfanzagl and Wefelmeyer [12, 13] and studied in detail. Within full tangent models with respect to the subclass $\mathcal{P}_0 = \{P \in \mathcal{P} : \kappa(P) = \kappa(P_0)\}$ these tests are asymptotically efficient, see [13], sect. 6.3. Notice that the present approach can be combined with the powerful local asymptotic normality (LAN) theory of LeCam, see also Janssen [8, 9] for an elaboration of a one-sided differentiability concept for functionals. Actually the LAN approximation by Gaussian statistical models is combined with the linearization (1.2). These technical tools establish the optimality.

If the model is not full under the null hypothesis then certain optimality properties of the $T_n$-tests remain valid. The tests are asymptotically maximin tests, see Janssen [8, 9] and (2.13) below. The results also apply to two-sample problems. In [7, 8] it is shown how the class of two-sample linear rank tests of Hájek and Šidák [6] can be deduced as asymptotically efficient tests for nonparametric statistical functionals. Robust testing of multivariate functionals given by parametric families can be found in Beran [3] and Rieder [14]. Our section 4 about conditional tests was influenced by Neuhaus [10], who applied similar principles to censored data.

The present paper is concerned with symmetry tests where the alternatives are specified by skew symmetric statistical functionals. Section 2 is devoted to functionals of von Mises type. Invariant functionals w.r.t. transformations between 0-symmetric distributions lead to signed rank tests. As a special example the Wilcoxon functional of asymmetry is considered, see section 3. In addition, the efficiency of the tests is discussed also for some classes of extended null hypotheses. Under composite nonparametric null hypotheses the tests can be carried out as distribution free randomization tests. Within the asymptotic set up this procedure also works for a class of heterogeneous null hypotheses with identically distributed observations. These applications are based on a conditional central limit theorem which can be found in section 4.

2. Symmetry tests

Throughout it is shown how the null hypothesis of symmetry can be tested along the lines of the general principles. Again we refer to Pfanzagl and Wefelmeyer [12], sect. 2.3 and sect. 15 for much basic work. For dimension $k = 1$ the tangent space business and the estimation theory for the center of symmetry of symmetric distributions can be found in Bickel et al. [4]. The following testing problem concerning symmetry can be motivated by two-sample problems for matched pairs, which are very popular testing plans in nonparametric medical statistics, see Hájek and Šidák [6], Behnen [1] and Behnen and Neuhaus [2] for the univariate case. Let $(Y_i, Z_i) \in \mathbb{R}^{2k}$ be measurements carried out for twins, $1 \le i \le n$, where the first one ($Y_i$) is under treatment and the second is not. If the treatment has no influence (null hypothesis) the components are exchangeable and we have that

$$(Y_i, Z_i) \overset{D}{=} (Z_i, Y_i) \tag{2.1}$$

are equal in distribution. The independence of the components is not required, which would often be too restrictive for applications. By (2.1) the distributions of the differences

$$X_i := Y_i - Z_i \in \mathbb{R}^k, \quad 1 \le i \le n, \tag{2.2}$$

are 0-symmetric on $\mathbb{R}^k$, i.e. symmetric under the transformation $x \mapsto -x$. The consideration of the $X$'s transforms the problem to a one-sample testing problem for $X_1, \dots, X_n$ where now the null hypothesis with independent skew symmetric distributions

$$\mathcal{L}(-X_i) = \mathcal{L}(X_i), \quad 1 \le i \le n, \tag{2.3}$$

is considered. However, we will sometimes allow that the $X$'s are not identically distributed under the null hypothesis. For instance the $X_i$ may be heterogeneous in the sense that

$$X_i = \sigma_i (Y_i^0 - Z_i^0), \quad \sigma_i > 0, \tag{2.4}$$

where $(Y_i^0, Z_i^0)_{i \le n}$ are i.i.d. and $\sigma_i$ stands for different standards for the $i$-th patient. We will show in sections 3 and 4 how exact valid tests for some non-i.i.d. null hypotheses of symmetry can be obtained.

Our first aim is the investigation of symmetry tests for i.i.d. random variables. The roots are based on Pfanzagl and Wefelmeyer [12], sect. 8 and [13], sect. 6, and the technical details are taken from Janssen [8]. Throughout, let $X_1, \dots, X_n$ be i.i.d. $\mathbb{R}^k$-valued random variables with common distribution $P \in \mathcal{P} \subset \mathcal{M}_1(\mathbb{R}^k)$. Introduce the reflection $P_- = \mathcal{L}(-\mathrm{id} \mid P)$ of $P$ at zero and let $\mathcal{M}_s = \{P \in \mathcal{M}_1(\mathbb{R}^k) : P = P_-\}$ be the set of 0-symmetric distributions. Suppose that the following assumptions hold.

A(0). $\mathcal{P} \cap \mathcal{M}_s \ne \emptyset$; $P \in \mathcal{P} \Rightarrow P_- \in \mathcal{P}$.

A(1). Let $\kappa : \mathcal{P} \to \mathbb{R}$ be a functional with $\kappa(P) = 0$ for all $P \in \mathcal{P} \cap \mathcal{M}_s$, with a convex set $L$ of tangents at $P_0$, and suppose that $\kappa$ is differentiable (in the sense of (1.2)) at $P_0$ with canonical gradient $\tilde\kappa = \tilde\kappa(P_0) \ne 0$.

The null hypothesis of symmetry (2.3) is usually strictly contained in the null hypothesis (1.4), namely

$$\mathcal{M}_s \cap \mathcal{P} \subset \{P \in \mathcal{P} : \kappa(P) = 0\}. \tag{2.5}$$

For fixed $P_0 \in \mathcal{M}_s \cap \mathcal{P}$ we will first briefly summarize local results at $P_0$ for testing our functional $\kappa$ given by the larger one-sided null hypothesis (1.3). Later the results are specified to $\mathcal{M}_s \cap \mathcal{P}$. For the present problem the whole theory of section 1 works, which suggests the asymptotic level $\alpha$ sequence of tests

$$\varphi_n = 1_{[u_{1-\alpha} \|\tilde\kappa\|_2,\, \infty)}(T_n) \tag{2.6}$$

based on (1.5), where $\|\tilde\kappa\|_2$ is the $L_2(P_0)$-norm of the canonical gradient and $u_{1-\alpha} = \Phi^{-1}(1-\alpha)$ denotes the $(1-\alpha)$-quantile of the standard normal distribution function $\Phi$. The following result is taken from Janssen [8].

2.1 Lemma. Let $t \mapsto P_t$ be an $L_2$-differentiable curve in $\mathcal{P}$ with tangent $g$ at $P_0$. Under A(0) and A(1) we have the following results for $\varphi_n$. Let $t_n = O(n^{-1/2})$ be a sequence of reals.

(a) (Asymptotic unbiasedness). Suppose that

$$\liminf_{n \to \infty} n^{1/2} \bigl(\kappa(P_{t_n}) - \kappa(P_0)\bigr) > 0 \tag{2.7}$$

holds. Then we have

$$\liminf_{n \to \infty} E_{P_{t_n}^n}(\varphi_n) > \alpha. \tag{2.8}$$

(b) (Level property for local sequences of hypotheses). If

$$\limsup_{n \to \infty} n^{1/2} \bigl(\kappa(P_{t_n}) - \kappa(P_0)\bigr) \le 0 \tag{2.9}$$

holds we have

$$\limsup_{n \to \infty} E_{P_{t_n}^n}(\varphi_n) \le \alpha. \tag{2.10}$$

(c) (Power function under implicit alternatives (2.11)). Let $(P_{t_n}^n)_n$ be a sequence of implicit local alternatives along the curve $t \mapsto P_t$ with

$$\kappa(P_{t_n}) = \kappa(P_0) + n^{-1/2}\vartheta + o(n^{-1/2}), \tag{2.11}$$

where $\vartheta \in \mathbb{R}$ is a further local parameter. Then

$$\lim_{n \to \infty} E_{P_{t_n}^n}(\varphi_n) = \Phi(\vartheta / \|\tilde\kappa\|_2 - u_{1-\alpha}) \tag{2.12}$$

holds, which is independent of the special curve.

(d) (Maximin tests). For fixed $\vartheta > 0$ let $K_\vartheta$ denote the family of all possible sequences of implicit alternatives $(P_{t_n}^n)_n$ arising from $L_2$-differentiable curves such that (2.11) holds. Then (2.12) is the maximin bound, i.e.

$$\Phi(\vartheta / \|\tilde\kappa\|_2 - u_{1-\alpha}) = \max_{(\psi_n)_{n \in \mathbb{N}}} \; \inf_{K_\vartheta} \; \limsup_{n \to \infty} E_{P_{t_n}^n}(\psi_n) \tag{2.13}$$

where the maximum is taken over all sequences $(\psi_n)_{n \in \mathbb{N}}$ of asymptotically $P_0^n$-level $\alpha$ tests. Thus $(\varphi_n)_{n \in \mathbb{N}}$ attains the present maximin bound.
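The limiting power in (2.12) is easy to evaluate numerically. The following is a minimal sketch using only the Python standard library; the function name `asymptotic_power` and its argument names are illustrative choices, not notation from the paper.

```python
from statistics import NormalDist


def asymptotic_power(theta: float, gradient_norm: float, alpha: float = 0.05) -> float:
    """Evaluate Phi(theta / ||kappa~||_2 - u_{1-alpha}) as in (2.12).

    theta         -- local parameter of the implicit alternative (2.11)
    gradient_norm -- L_2(P_0)-norm of the canonical gradient (must be > 0)
    alpha         -- nominal level of the test
    """
    std_normal = NormalDist()  # standard normal distribution
    u = std_normal.inv_cdf(1.0 - alpha)  # (1 - alpha)-quantile u_{1-alpha}
    return std_normal.cdf(theta / gradient_norm - u)


# At theta = 0 the limit equals the level alpha; it increases with theta.
power_at_null = asymptotic_power(0.0, 1.0)
power_at_two = asymptotic_power(2.0, 1.0)
```

For a fixed local parameter the bound depends on the functional only through the norm of its canonical gradient, which is why it does not depend on the particular curve.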

As we will see, the asymptotic efficiency of $\varphi_n$ depends on the geometry of the null hypothesis around $P_0$. Especially, the gradient test (2.6) may be efficient for the larger null hypothesis $\{\kappa(P) = 0\} \cap \mathcal{P}$ but not for $\mathcal{M}_s \cap \mathcal{P}$. This phenomenon is discussed below. At this stage we refer to the cotangent space criterion of Pfanzagl and Wefelmeyer [13], sect. 6, which answers the question of the efficiency of tests in general.

So far the results are concerned with general testing problems for functionals. Now we will turn to the actual symmetry testing problem. Good functionals which are able to separate 0-symmetric distributions and non-symmetric alternatives are skew symmetric functionals with

$$\kappa(P_-) = -\kappa(P) \quad \text{for all } P \in \mathcal{P}. \tag{2.14}$$

It is easy to see that then the canonical gradient $\tilde\kappa$ (which is uniquely determined) is itself skew symmetric, $\tilde\kappa(-x) = -\tilde\kappa(x)$, provided $\kappa$ is differentiable. Notice that the reflection $t \mapsto (P_t)_-$ of a curve has the tangent $x \mapsto g(-x)$. From now on we always assume that the canonical gradient is skew symmetric. To be concrete we will consider von Mises functionals.

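2.2 Example (von Mises functional). Let $h : \mathbb{R}^k \to \mathbb{R}$ be a skew symmetric function and let $K$ be a positive constant. For convenience consider

$$\mathcal{P} = \mathcal{P}_K := \Bigl\{P \in \mathcal{M}_1(\mathbb{R}^k) : \int h^2\,dP < K\Bigr\}. \tag{2.15}$$

Then $\kappa_h : \mathcal{P} \to \mathbb{R}$, $\kappa_h(P) := \int h\,dP$, is differentiable at each $P_0 \in \mathcal{P} \cap \mathcal{M}_s$ with canonical gradient $\tilde\kappa = h$.

More generally, $\mathcal{P}_K$ may be substituted by some set $\mathcal{P}$ with

$$\mathcal{P}_K \subset \mathcal{P} \subset \Bigl\{P : \int h^2\,dP < \infty\Bigr\} \tag{2.16}$$

such that $\int h^2\,dP$ is bounded locally on some Hellinger balls around $P_0$ for each $P_0 \in \mathcal{M}_s \cap \mathcal{P}$, see Janssen [8], Ex. 2.1 and Bickel et al. [4], p. 457.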

In the next step the tests $\varphi_n$ will be applied to composite null hypotheses. For this reason we need critical values which are independent of $P_0 \in \mathcal{M}_s \cap \mathcal{P}$. Throughout suppose that A(0) and A(1) hold for all $P_0 \in \mathcal{M}_s \cap \mathcal{P}$ and consider the following assumption A(2).

A(2). The canonical gradient is of the form

$$\tilde\kappa(P_0) = c_{P_0} h \tag{2.17}$$

for all $P_0 \in \mathcal{M}_s \cap \mathcal{P}$, where $c_{P_0}$ is a positive constant and $h : \mathbb{R}^k \to \mathbb{R}$ is a skew symmetric function.

As test statistic we may now choose (with $\operatorname{sign}(x) := x/|x|$)

$$T_n = n^{-1/2} \sum_{i=1}^{n} \operatorname{sign}(h(X_i))\,|h(X_i)|. \tag{2.18}$$

If $h(X_i) = 0$ holds, then let $\operatorname{sign}(h(X_i))$ denote a uniformly distributed $\pm 1$-valued random variable independent of the data. The $T_n$-test will be carried out as a randomization test, which is a conditional test given the absolute values $|h(X_1)|, \dots, |h(X_n)|$. To explain this fact let $Y_1, \dots, Y_n$ be univariate independent (possibly heterogeneous) random variables with 0-symmetric distributions on $\mathbb{R}$ for each $i \le n$. Then the absolute values and the signs

$$(|Y_i|)_{i \le n} \quad \text{and} \quad S_n := (\operatorname{sign}(Y_i))_{i \le n} \tag{2.19}$$

are independent. Moreover, $S_n$ is uniformly distributed on $\{\pm 1\}^n$. Let now $W_n(Y_1, \dots, Y_n)$ be an arbitrary real valued test statistic for the null hypothesis of symmetry. Then $S_n$ may be substituted by new uniformly distributed random variables $\varepsilon_n = (\varepsilon_{in})_{i \le n} : (\tilde\Omega, \tilde{\mathcal{A}}, \tilde P) \to \{\pm 1\}^n$ which are independent of the data. For given fixed absolute values the $(1-\alpha)$-quantile $c_n = c_n((|Y_i|)_{i \le n})$ of the conditional distribution

$$\varepsilon_n \mapsto W_n\bigl((\varepsilon_{in} |Y_i|)_{i \le n}\bigr) \tag{2.20}$$

is now taken as critical value. There exist values $\gamma_n = \gamma_n((|Y_i|)_{i \le n}) \in [0, 1]$ such that

$$\int \Bigl( 1_{(c_n, \infty)}\bigl(W_n((\varepsilon_{in} |Y_i|)_{i \le n})\bigr) + \gamma_n 1_{\{c_n\}}\bigl(W_n((\varepsilon_{in} |Y_i|)_{i \le n})\bigr) \Bigr)\, d\tilde P = \alpha \tag{2.21}$$

holds for each vector of absolute values $(|Y_i|)_{i \le n}$. Then the upper $W_n$-test

$$\psi_n = \gamma_n 1_{\{c_n\}}(W_n) + 1_{(c_n, \infty)}(W_n) \tag{2.22}$$

is called the randomization test.
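For small $n$ the construction (2.20) to (2.22) can be carried out by enumerating all $2^n$ sign vectors. The sketch below is an illustration under stated assumptions: the names `randomization_test` and `scaled_sum` and the tolerance `tol` used to detect ties are hypothetical choices, and full enumeration is only feasible for small samples (Monte Carlo sampling of signs would be the usual substitute for larger $n$).

```python
from itertools import product
from math import sqrt
from typing import Callable, List, Sequence, Tuple


def randomization_test(
    y: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    alpha: float = 0.05,
    tol: float = 1e-12,
) -> Tuple[float, float, float]:
    """Return (c_n, gamma_n, psi_n) for the upper randomization test (2.22)."""
    abs_y = [abs(v) for v in y]
    # Conditional distribution (2.20): the statistic over all 2^n sign vectors.
    values: List[float] = sorted(
        statistic([e * a for e, a in zip(eps, abs_y)])
        for eps in product((-1.0, 1.0), repeat=len(y))
    )
    total = len(values)
    # (1 - alpha)-quantile: smallest value whose conditional cdf is >= 1 - alpha.
    c_n = next(
        v for k, v in enumerate(values, start=1)
        if k >= (1.0 - alpha) * total - tol
    )
    p_greater = sum(v > c_n + tol for v in values) / total
    p_equal = sum(abs(v - c_n) <= tol for v in values) / total
    # Randomization constant chosen so that (2.21) holds exactly.
    gamma_n = (alpha - p_greater) / p_equal
    observed = statistic(y)
    if observed > c_n + tol:
        psi = 1.0
    elif abs(observed - c_n) <= tol:
        psi = gamma_n
    else:
        psi = 0.0
    return c_n, gamma_n, psi


def scaled_sum(v: Sequence[float]) -> float:
    """Linear statistic n^{-1/2} * sum(v), i.e. (2.18) with h the identity."""
    return sum(v) / sqrt(len(v))


# Example with four hypothetical observations at level alpha = 0.1.
c_n, gamma_n, psi_n = randomization_test([1.0, 2.0, 3.0, 4.0], scaled_sum, alpha=0.1)
```

The returned value `psi_n` is the rejection probability of the test for the observed data; averaging it over all sign vectors reproduces the level exactly, which is the content of (2.21).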

2.3 Remark. (a) Under the hypothesis of 0-symmetry for each $i \le n$ the test $\psi_n$ is by construction a test with exact level $\alpha$. Recall that the absolute values (2.19) are then sufficient under the null hypothesis and $\psi_n$ is a test of Neyman structure in the sense of Pfanzagl [11], 4.6.1.

(b) Suppose that under the null hypothesis (included in the hypothesis of 0-symmetry) the absolute values $(|Y_i|)_{i \le n}$ are boundedly complete. Then each exact level $\alpha$ test for the composite null hypothesis is a randomization test. Conditions for boundedly complete nonparametric families can be found for instance in Pfanzagl [11], 4.6.1.

As an application we see that under our assumptions A(0)-A(2), which include the von Mises functionals, the canonical gradient $T_n$-test (2.6) can be carried out as a randomization test under $\mathcal{M}_s \cap \mathcal{P}$, and it is finite sample distribution free with exact type I error probability $\alpha$. The result holds true under heterogeneous distributions (2.3) under the null hypothesis. Now we are concerned with the more delicate null hypothesis

$$\{P^n : \kappa(P) = 0,\ P \in \mathcal{P}\} \tag{2.23}$$

and one-sided alternatives. Although (2.19) fails to be true, the test $\varphi_n$ (2.6) is asymptotically equivalent to a studentized randomization test under (2.23) in various cases.

2.4 Theorem. Consider the von Mises functional $\kappa_h$ of Example 2.2 with skew symmetric function $h$. Define the studentized test statistic

$$W_n = n^{-1/2} \sum_{i=1}^{n} h(X_i) / s_n, \qquad s_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} \Bigl( h(X_i) - \frac{1}{n} \sum_{j=1}^{n} h(X_j) \Bigr)^2, \tag{2.24}$$

and let $\psi_n$ denote the upper $W_n$-randomization test at level $\alpha$. If $\int h\,dP = 0$ and $\int h^2\,dP < \infty$ we have

$$E_{P^n}(|\varphi_n - \psi_n|) \to 0 \tag{2.25}$$

where $\varphi_n$ is the unconditional test (2.6).

Proof: See section 5.

The test $\varphi_n$ (2.6) can also be substituted by the studentized test

$$\varphi_n' = 1_{(u_{1-\alpha}, \infty)}(W_n) \tag{2.26}$$

where $W_n$ is as in (2.24).
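The studentized statistic (2.24) and the test (2.26) translate directly into code. This is a minimal standard-library sketch; the function names are illustrative, and the input is assumed to be the already evaluated values $h(X_1), \dots, h(X_n)$ with at least two distinct entries (so that $s_n > 0$).

```python
from math import sqrt
from statistics import NormalDist, stdev
from typing import Sequence


def studentized_statistic(h_values: Sequence[float]) -> float:
    """W_n = n^{-1/2} * sum(h(X_i)) / s_n as in (2.24)."""
    n = len(h_values)
    s_n = stdev(h_values)  # sample standard deviation with divisor n - 1
    return sum(h_values) / (sqrt(n) * s_n)


def studentized_test(h_values: Sequence[float], alpha: float = 0.05) -> int:
    """Unconditional test (2.26): 1 means reject, 0 means do not reject."""
    critical_value = NormalDist().inv_cdf(1.0 - alpha)  # u_{1-alpha}
    return 1 if studentized_statistic(h_values) > critical_value else 0


# Hypothetical values h(X_i) for five observations.
w_n = studentized_statistic([1.0, 2.0, 3.0, 4.0, 5.0])
decision = studentized_test([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because $W_n$ is scale invariant, no estimate of $\|\tilde\kappa\|_2$ has to be supplied separately; the critical value is the plain normal quantile.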

Next we will sharpen the maximin result for von Mises functionals $\kappa_h$ by the efficiency concept. Consider a null hypothesis

$$H_0 \subset \{\kappa_h(P) = 0\} \cap \mathcal{P}. \tag{2.27}$$

2.5 Definition. Consider a sequence of asymptotic level $\alpha$ tests $\varphi_n$ for $H_0$, i.e.

$$\limsup_{n \to \infty} E_{P^n}(\varphi_n) \le \alpha \quad \text{for all } P \in H_0. \tag{2.28}$$

It is called asymptotically efficient against $H_1 = \{P \in \mathcal{P} : \kappa(P) > 0\}$ locally at $P_0 \in H_0$ within the class (2.28), if similarly to (2.13)

$$\min_{(\psi_n)_{n \in \mathbb{N}}} \liminf_{n \to \infty} \bigl( E_{P_{t_n}^n}(\varphi_n) - E_{P_{t_n}^n}(\psi_n) \bigr) \ge 0 \tag{2.29}$$

holds for all implicit alternatives $(P_{t_n}^n)_{n \in \mathbb{N}} \in K_\vartheta = K_\vartheta(P_0)$ for all $\vartheta > 0$ given by Lemma 2.1(d). The minimum is taken again over all sequences $(\psi_n)_{n \in \mathbb{N}}$ of asymptotic $H_0$ level $\alpha$ tests (2.28).

This definition coincides with the definition of efficiency given by sequences of alternatives $(P^n_{n^{-1/2} t})_{n \in \mathbb{N}}$ along $L_2$-differentiable curves, confer Janssen [8], also for the connection to uniform local asymptotic normality. The following result for canonical gradient tests for (not necessarily skew symmetric) von Mises functionals is (up to some technical details) essentially due to Pfanzagl and Wefelmeyer [13], sect. 6.3.

2.6 Theorem. Consider the von Mises functional $\kappa_h$ on the model $\mathcal{P}_K$ (2.15) and set $H_0 = \{P \in \mathcal{P}_K : \kappa_h(P) = 0\}$. Then the sequence of tests $(\varphi_n')_{n \in \mathbb{N}}$ given by (2.26) is asymptotically efficient for each $P_0 \in H_0$ against $\{P \in \mathcal{P} : \kappa_h(P) > 0\}$.

Proof. Notice that $\mathcal{P}_K$ has full tangent space and the tangent space $T(P_0; H_0)$ is given by $\{g \in L_2^0(P_0) : \int h g\,dP_0 = 0\}$, which is easy to see. The result follows from the co-tangent space criterion, see Janssen [8], Corollary 3.3. $\Box$

If now $h$ is skew symmetric and $H_0 = \mathcal{P}_K \cap \mathcal{M}_s$ is the hypothesis of 0-symmetry, the class of comparable tests increases and $\varphi_n'$ will not be efficient in general, since the co-tangent space of $T(P_0; H_0)$ within $T(P_0; \mathcal{P}_K)$ may be too large. However, $(\varphi_n')_n$ can be derived as an efficient test for reasonable submodels. For this purpose let $P_0 \in \mathcal{M}_s(\mathbb{R}^k)$ be a fixed 0-symmetric distribution. Introduce the convolution

$$\mu_s := P_0 * \varepsilon_{c(s)} \tag{2.30}$$

given by new centers of symmetry $c(\cdot) : U \to \mathbb{R}^k$, $s \mapsto c(s)$, which are skew symmetric with $c(-s) = -c(s)$ on some symmetric neighbourhood $U \subset \mathbb{R}$ of zero. A typical example is a shift family, see Example 2.8. Suppose that $s \mapsto \mu_s$ is $L_2$-differentiable at $s = 0$ with tangent $h$, $h \ne 0$. Obviously, $h$ is skew symmetric. Suppose now that

$$\{\mu_s : s \in U\} \subset \mathcal{P} \tag{2.31}$$

holds and $\kappa : \mathcal{P} \to \mathbb{R}$ is differentiable with canonical gradient $c_{P_0} h$, $c_{P_0} > 0$, at $P_0$. For instance $\kappa$ may be the von Mises functional $\kappa_h$. The curve (2.30) is then least favorable for our testing problem. Define now a new semiparametric family of distributions $P_{(s,Q)}$ on $\mathbb{R}^k$ by

$$\frac{dP_{(s,Q)}}{dP_0}(x) = \frac{d\mu_s}{dP_0}(x)\,\frac{dQ}{dP_0}(x - c(s)), \quad s \in U,\ Q \in \mathcal{M}_s, \tag{2.32}$$

which parametrizes shifted skew symmetric distributions along the curve $s \mapsto c(s)$. The singular parts of (2.32) w.r.t. $P_0$ may be proportional to some $P_0$-singular distribution. The parameter $s$ determines the center of $P_{(s,Q)}$ whereas $Q$ is its shape parameter for the symmetry. For dimension $k = 1$ compare with Pfanzagl and Wefelmeyer [12], sect. 2.3 and Bickel et al. [4], p. 55.

Restricted to the submodel

$$\mathcal{P}_{sub} = \{P_{(s,Q)} : s \in U,\ Q \in \mathcal{M}_s\} \tag{2.33}$$

the canonical gradient tests are now efficient locally at $P_0$.

2.7 Theorem. In addition to the assumptions (2.30)-(2.33) let $s \mapsto c(s)$ be a curve without loops at zero, i.e. $c(s) \to 0$ iff $s \to 0$. Then the sequences of gradient tests $(\varphi_n')_n$ and $(\psi_n)_n$ of (2.26) and (2.22) given by the direction $h$ are asymptotically efficient locally at $P_0$ for testing $\mathcal{M}_s$ against restricted one-sided alternatives

$$\{P \in \mathcal{P}_{sub} \cap \mathcal{P} : \kappa(P) > 0\}. \tag{2.34}$$

Proof: See section 5.

2.8 Example (Pfanzagl and Wefelmeyer model [12], sect. 2.3). For dimension $k = 1$ let $\frac{d\mu_s}{d\lambda}(x) = f(x - s)$ be a translation model with finite Fisher information given by a 0-symmetric density $f$. Then the tests under consideration with $h(x) := -f'(x)/f(x)$ are locally asymptotically efficient at $\mu_0 = P_0$ for the von Mises functional $\kappa_h$ restricted to the present submodel (2.34).

3. Rank tests for invariant functionals

For a moment let us return to the motivation for symmetry tests (2.1)-(2.4). The requirements in medicine often suggest the use of statistical functionals which are invariant w.r.t. some class of transformations. For dimension $k = 1$ one may consider certain skew symmetric (monotone) bijections $T : \mathbb{R} \to \mathbb{R}$. Obviously, the von Mises functionals are not invariant in general.

Throughout, we will restrict ourselves to dimension $k = 1$. Notice that the subsequent results also work for functionals $\kappa_k$ of $k$-dimensional distributions which only depend by $\kappa_k(P) = \kappa_1(\mathcal{L}(S \mid P))$ on univariate distributions $\mathcal{L}(S \mid P)$ given by a skew symmetric transformation $S : \mathbb{R}^k \to \mathbb{R}$. Let now always

$$\mathcal{P} \subset \mathcal{M}_1(\mathbb{R}) \cap \{P : P(\{x\}) = 0 \text{ for all } x \in \mathbb{R}\} \tag{3.1}$$

be a class of univariate continuous distributions. We identify $P$ with its distribution function $F(x) = P((-\infty, x])$ and $P_-$ with $F_-$, $F_-(x) = 1 - F(-x)$. Similarly to two-sample functionals, see Janssen [8], Ex. 4.1, let $\varphi : (0, 1) \to \mathbb{R}$ be square integrable w.r.t. the uniform distribution. Then

$$\kappa(F) = \int \varphi(F_-(x))\,dF(x) \tag{3.2}$$

defines a functional which is invariant under strictly increasing skew symmetric one-to-one transformations on $\mathbb{R}$ of $F$. A reasonable condition for testing symmetry is

$$\varphi(1/2 + u) = -\varphi(1/2 - u), \quad 0 < u \le 1/2, \tag{3.3}$$

which means that $u \mapsto \varphi(1/2 + u)$ is skew symmetric. Under this condition we have

$$\kappa(F_-) = -\kappa(F). \tag{3.4}$$

The special choice $\varphi(u) = u - 1/2$ determines the famous Wilcoxon functional

$$\kappa(F) = P(\{-X_2 \le X_1\}) - 1/2 \tag{3.5}$$

where $X_1, X_2$ are i.i.d. with distribution function $F$, see Witting and Müller-Funk [18], p. 544. The next lemma motivates the consideration of canonical gradients which factorize via $F_0$.
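To make the Wilcoxon functional (3.5) concrete, it can be estimated from a sample by the fraction of pairs $i < j$ with $X_i + X_j > 0$. The sketch below is an illustration only: the U-statistic form and the convention of counting ties $X_i + X_j = 0$ with weight 1/2 are assumptions made here (ties have probability zero under (3.1)), and the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def wilcoxon_functional_estimate(xs: Sequence[float]) -> float:
    """Pairwise estimate of P(-X_2 <= X_1) - 1/2 from a sample xs."""
    pairs = list(combinations(xs, 2))  # all pairs (x_i, x_j) with i < j
    # Count pairs with positive sum; a zero sum contributes 1/2.
    score = sum(1.0 if a + b > 0 else 0.5 if a + b == 0 else 0.0 for a, b in pairs)
    return score / len(pairs) - 0.5


sample = [-3.0, 1.0, 2.0]  # hypothetical observations
estimate = wilcoxon_functional_estimate(sample)
reflected = wilcoxon_functional_estimate([-x for x in sample])
```

Reflecting the sample at zero flips the sign of the estimate, which mirrors the skew symmetry (3.4) of the functional.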

3.1 Lemma. Suppose that $P_0 \in \mathcal{M}_s \cap \mathcal{P}$ is a 0-symmetric distribution with distribution function $F_0$ and define $P_{00} := 2^{-1} \lambda|_{(-1,1)}$. Let $\kappa$ be invariant w.r.t. the skew symmetric transformation $T = 2F_0 - 1$, i.e.

$$\kappa(\mathcal{L}(T \mid P)) = \kappa(P), \qquad \{\mathcal{L}(T \mid P) : P \in \mathcal{P}\} \subset \mathcal{P}. \tag{3.6}$$

If $\psi : (-1, 1) \to \mathbb{R}$ is a gradient of $\kappa$ at $P_{00}$ then $\psi \circ T$ is a gradient of $\kappa$ at $P_0$.

Proof: Let $t \mapsto P_t$ be a curve with tangent $g$ at $P_0$. Then it is well known that $t \mapsto \mathcal{L}(T \mid P_t)$ is $L_2$-differentiable at $t = 0$ with tangent

$$\tilde g(x) = E(g \mid T = x),$$

see Witting [17], p. 178 and Bickel et al. [4]. Since $F_0$ is continuous we have $\mathcal{L}(T \mid P_0) = P_{00}$ and $F_0^{-1} \circ F_0(y) = y$ for $P_0$-almost all $y \in \mathbb{R}$. This identity implies $\tilde g = g \circ T^{-1}$ with $T^{-1}(u) = F_0^{-1}((1 + u)/2)$. The invariance of $\kappa$ implies

$$t^{-1}\bigl(\kappa(P_t) - \kappa(P_0)\bigr) = t^{-1}\bigl(\kappa(\mathcal{L}(T \mid P_t)) - \kappa(P_{00})\bigr) = \int \psi \cdot (g \circ T^{-1})\,dP_{00} + o(1) = \int (\psi \circ T)\, g\,dP_0 + o(1). \qquad \Box$$

In various cases (including the Wilcoxon functional (3.5)) the functional $\kappa$ given in (3.2) is differentiable with canonical gradient

$$\tilde\kappa(F_0) = 2\varphi(F_0) = 2\psi(2F_0 - 1) \tag{3.7}$$

for each $F_0 \in \mathcal{M}_s \cap \mathcal{P}$, where $\psi : (-1, 1) \to \mathbb{R}$ is the skew symmetric function

$$\psi(u) = \varphi((u + 1)/2), \quad u \in (-1, 1). \tag{3.8}$$

In addition to the assumptions A(0) and A(1) of section 2 we will now suppose that condition A(3) holds, which substitutes (2.17).

A(3). There exist a $\lambda|_{(-1,1)}$-square integrable skew symmetric function $\psi : (-1, 1) \to \mathbb{R}$ and positive constants $c_{F_0}$ with

$$\tilde\kappa(F_0) = c_{F_0}\,\psi(2F_0 - 1) \tag{3.9}$$

for all $F_0 \in \mathcal{M}_s \cap \mathcal{P}$.

Under this assumption the parametric canonical gradient statistic (similar to (2.18)) is given by

$$T_n = n^{-1/2} \sum_{i=1}^{n} \psi(2F_0(X_i) - 1) = n^{-1/2} \sum_{i=1}^{n} \operatorname{sign}(X_i)\,\psi(2F_0(|X_i|) - 1). \tag{3.10}$$

The problem is now to make the $T_n$-test distribution free under $F_0 \in \mathcal{M}_s \cap \mathcal{P}$. We will repeat the nonparametric arguments leading to rank tests since some statisticians still hesitate to apply rank methods. Recall that under $\mathcal{M}_s \cap \mathcal{P}$ the signs $(\operatorname{sign}(X_i))_{i \le n}$ and the absolute values $(Z_i)_{i \le n}$, $Z_i := |X_i|$, are independent vectors. Since $G_0(x) := 2F_0(x) - 1$, $x \ge 0$, is the distribution function of $Z_i$ we have

$$T_n = n^{-1/2} \sum_{i=1}^{n} \operatorname{sign}(X_i)\,\psi(G_0(Z_i)). \tag{3.11}$$

The unknown nuisance parameter $G_0$ can be estimated by the empirical distribution function

$$\hat G_n(x) = n^{-1} \sum_{i=1}^{n} 1_{(-\infty, x]}(Z_i) \tag{3.12}$$

which is the efficient nonparametric estimator. If $G_0$ is replaced by $\hat G_n$ then (3.11) naturally becomes a rank statistic which is based on the ranks

$$R_n(Z) = (R_{ni}(Z))_{i \le n} := (n \hat G_n(Z_i))_{i \le n} \tag{3.13}$$

of the absolute values $Z = (Z_1, \dots, Z_n)$. Notice that (3.13) is uniformly distributed on the set of permutations of $1, \dots, n$ under $\mathcal{M}_s \cap \mathcal{P}$. This motivates the symmetry rank tests of Hájek and Šidák [6], sect. II.1.3, with test statistic

$$S_n = n^{-1/2} \sum_{i=1}^{n} \operatorname{sign}(X_i)\,a_n(R_{ni}(Z)). \tag{3.14}$$

According to Hájek and Šidák [6] the choice of exact scores

$$a_n(i) := E(\psi(2U_{i:n} - 1)) \tag{3.15}$$

given by order statistics $U_{i:n}$ of i.i.d. uniformly distributed random variables $U_1, \dots, U_n$ on $(0, 1)$ implies $S_n = E(T_n \mid (\operatorname{sign}(X_i))_{i \le n}, R_n(Z))$ and

$$T_n - S_n \to 0 \tag{3.16}$$

in $P_0^n$-probability for all $P_0 \in \mathcal{M}_s \cap \mathcal{P}$. Note that (3.16) also holds if (3.15) is replaced by $L_2(0, 1)$-convergent scores, see Hájek and Šidák [6]. The upper $S_n$-test can be carried out as an exact level $\alpha$ conditional test given the ranks $R_n(Z)$ of the absolute values, see (2.19)-(2.22). Similarly to (2.22) it is given by

$$\psi_n = \gamma_n 1_{\{c_n\}}(S_n) + 1_{(c_n, \infty)}(S_n) \tag{3.17}$$

where $c_n$ is the conditional critical value of the randomization distribution (2.20) of $S_n = S_n((\operatorname{sign}(X_i))_{i \le n}, (|X_i|)_{i \le n})$. For $L_2(0, 1)$-convergent scores it is easy to prove along the lines of Theorem 2.4 that the equivalence statement (2.25) holds for the test (3.17) and $\varphi_n$ given by (2.6) under $\mathcal{M}_s \cap \mathcal{P}$. The results of section 2 can now be applied to functionals with invariant canonical gradients.
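The signed rank statistic (3.14) can be computed in a few lines once a score function is chosen. The sketch below assumes continuous data (no zeros and no ties among the absolute values, in line with (3.1)); the function name and the score `i / (n + 1)` used in the example are illustrative choices, the latter being the classical Wilcoxon signed rank scores rather than a formula taken from the text.

```python
from math import sqrt
from typing import Callable, List, Sequence


def signed_rank_statistic(
    xs: Sequence[float],
    score: Callable[[int, int], float],
) -> float:
    """S_n = n^{-1/2} * sum(sign(X_i) * a_n(R_ni)), with a_n(i) = score(i, n)."""
    n = len(xs)
    # Ranks of the absolute values: rank 1 for the smallest |x_i|.
    order = sorted(range(n), key=lambda i: abs(xs[i]))
    ranks: List[int] = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    signs = [1.0 if x > 0 else -1.0 for x in xs]
    return sum(s * score(r, n) for s, r in zip(signs, ranks)) / sqrt(n)


def wilcoxon_scores(i: int, n: int) -> float:
    """Illustrative scores proportional to the rank (an assumption of this sketch)."""
    return i / (n + 1)


s_n = signed_rank_statistic([0.5, -1.5, 2.0], wilcoxon_scores)
```

Since the statistic depends on the data only through signs and ranks of absolute values, its null distribution under symmetry is the same for every continuous 0-symmetric distribution, which is what makes the conditional test (3.17) distribution free.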

3.2 Corollary. Suppose that A(0), A(1), A(3) hold and let $\psi_n$ be the symmetry rank test given by (3.15) (or related $L_2(0, 1)$-convergent scores).

(a) For each $P_0 \in \mathcal{M}_s \cap \mathcal{P}$ the sequence $\psi_n$ is an asymptotically maximin test in the sense of (2.13).

(b) Suppose that under the conditions of Example 2.8 there exists a 0-symmetric distribution $F_0$ with density $f$ such that

$$\tilde\kappa(F_0)(x) = -c f'(x)/f(x) \tag{3.18}$$

holds for some $c > 0$. Consider the submodels (2.32), (2.33) with $c(s) = s$. According to Theorem 2.7 the sequence $\psi_n$ is then asymptotically efficient locally at $P_0$ against one-sided alternatives defined by (2.34).

The assumption A(3) will now be verified for various functionals of the type (3.2) which serve as typical examples. Notice first that under the present assumption (3.3) our functional (3.2) is invariant under strictly increasing skew symmetric transformations $T : \mathbb{R} \to \mathbb{R}$ of the distribution $F$. The differentiability of $\kappa$ at a 0-symmetric distribution function $F_0$ can be obtained under regularity assumptions. Write

$$\kappa(F) - \kappa(F_0) = \int \bigl(\varphi(F_-) - \varphi(F_0)\bigr)\,dF + \Bigl( \int \varphi(F_0)\,dF - \int \varphi(F_0)\,dF_0 \Bigr). \tag{3.19}$$

According to Janssen [8] the second part of the right hand side of (3.19) is differentiable at $F_0$ with gradient $\varphi(F_0) = \psi(2F_0 - 1)$. For the treatment of the first term consider the distribution function $t \mapsto F_t$ given by an arbitrary $L_2$-differentiable curve with tangent $g$ at $t = 0$. If $\varphi$ is bounded and differentiable with integrable derivative $\varphi'$ then formal differentiation under the integral leads to

$$\frac{d}{dt} \int \bigl(\varphi((F_t)_-) - \varphi(F_0)\bigr)\,dF_t \Big|_{t=0} = \int \frac{d}{dt}\bigl(\varphi((F_t)_-) - \varphi(F_0)\bigr)\Big|_{t=0}\,dF_0 = \int \varphi(F_0)\, g\,dF_0 \tag{3.20}$$

where the last equality is proved in section 5 in the proof of Lemma 5.2. Regularity conditions ensure the validity of the first equality of (3.20). Altogether we have that (3.7) is a gradient of (3.19). A set of required technical conditions (including the Wilcoxon functional (3.5)) is given in the next lemma.

3.3 Lemma. Consider the functional $\kappa(F) = \int \varphi(F_-)\,dF$ on a subset $\mathcal{P}$ of all continuous distributions. Under the following conditions $\kappa$ is differentiable at $F_0 \in \mathcal{M}_s \cap \mathcal{P}$ with gradient $\dot\kappa = 2\varphi(F_0)$.

(i) There exist differentiable distribution functions $\varphi_i : (0, 1) \to [0, 1]$ such that $u \mapsto \varphi_i(1/2 + u)$ is skew symmetric for $1 \le i \le k$ and reals $a_i \in \mathbb{R}$ with

$$\varphi = a_0 + \sum_{i=1}^{k} a_i \varphi_i. \tag{3.21}$$

(ii) The $L_2$-differentiability of a curve $t \mapsto F_t$ at $F_0$ in $\mathcal{P}$ implies the $L_2$-differentiability of the associated curve with distribution functions $\varphi_i(F_t)$ at $t = 0$ for each $1 \le i \le k$.

Proof: See section 5.

3.4 Example. On the set $\mathcal{P}$ of all continuous distributions the Wilcoxon functional

$$\kappa(F) = \int F_-(x)\,dF(x) - 1/2 = \int \bigl(1/2 - F(-x)\bigr)\,dF(x) \tag{3.22}$$

is given by $\varphi(u) = u - 1/2$ and $\psi(u) = u/2$. For each $F_0 \in \mathcal{M}_s \cap \mathcal{P}$ it is differentiable at $F_0$ with canonical gradient

$$\tilde\kappa = 2F_0 - 1. \tag{3.23}$$

According to Corollary 3.2 the Wilcoxon symmetry sign test (3.16) with scores

$$a_n(i) = (n + 1)^{-1}\, 2i - 1 \tag{3.24}$$

given by (3.15) is an asymptotic maximin test at $F_0 \in \mathcal{M}_s \cap \mathcal{P}$ against implicit alternatives (2.13) defined by the functional (3.22). If $F_0$ denotes the (symmetric) logistic distribution function, see Hájek and Šidák [6], then the Wilcoxon symmetry test is asymptotically efficient at $P_0$ against the restricted alternatives (2.34) where the submodel is defined by (2.30) and (2.32) for the shift family $F_0 * \varepsilon_s$ of the logistic distribution.
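The role of the logistic distribution can be checked numerically: for the standard logistic density $f(x) = e^{-x}/(1 + e^{-x})^2$ the score $-f'(x)/f(x)$ of the shift model in Example 2.8 equals $2F_0(x) - 1$, the form of the Wilcoxon gradient (3.23). The sketch below verifies this identity with a central finite difference; the function names and the step size are choices made for this illustration.

```python
from math import exp, log
from typing import Callable


def logistic_cdf(x: float) -> float:
    """Distribution function F_0 of the standard logistic distribution."""
    return 1.0 / (1.0 + exp(-x))


def logistic_log_density(x: float) -> float:
    """log f(x) for f(x) = exp(-x) / (1 + exp(-x))^2."""
    return -x - 2.0 * log(1.0 + exp(-x))


def location_score(log_density: Callable[[float], float], x: float, step: float = 1e-5) -> float:
    """Numerical value of -f'(x)/f(x) = -(d/dx) log f(x) by a central difference."""
    return -(log_density(x + step) - log_density(x - step)) / (2.0 * step)


# Compare the numerical score with 2 * F_0(x) - 1 at a few points.
points = [-2.0, -0.5, 0.0, 1.0, 3.0]
differences = [
    abs(location_score(logistic_log_density, x) - (2.0 * logistic_cdf(x) - 1.0))
    for x in points
]
```

All entries of `differences` are at the level of the finite-difference error, in agreement with the closed form $-f'/f = (1 - e^{-x})/(1 + e^{-x}) = 2F_0 - 1$.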

3.5 Remarks. (a) In practice the statistician often likes to test the null hypothesis of stochastically smaller distributions w.r.t. some given 0-symmetric distribution function $F_0$ against stochastically larger alternatives, namely

$$H = \{F \in \mathcal{P} : F \ge F_0\} \quad \text{against} \quad K = \{F \in \mathcal{P} : F \le F_0,\ F \ne F_0\}. \tag{3.25}$$

Throughout we will indicate how von Mises functionals and our invariant functionals (3.2) can be used for this problem, see also (1.3).

(i) Consider a von Mises functional

$$\kappa_h(F) = \int_0^1 h(F^{-1}(u))\,du$$

with an increasing function $h : \mathbb{R} \to \mathbb{R}$ for dimension $k = 1$. If $F \ge G$ holds we have

$$\kappa(G) \ge \kappa(F) \tag{3.26}$$

since $G^{-1} \ge F^{-1}$. In this case the canonical gradient tests $\varphi_n$ (2.6) are asymptotically unbiased at $F_0$ for local sequences generated by $K$ (3.25), see Lemma 2.1(a). On the other hand these tests also have the level property for local sequences given by $H$, see Lemma 2.1(b).

(ii) The same assertion and (3.26) hold for the invariant functionals

$$\kappa(F) = \int_0^1 \varphi(F_-(F^{-1}(u)))\,du$$

whenever $\varphi$ is increasing on $(0, 1)$. Notice that $F \ge G$ implies $G^{-1} \ge F^{-1}$ and $G_- \ge F_-$. In particular the Wilcoxon symmetry test can be used as an asymptotically unbiased test for (3.25) for all 0-symmetric continuous distribution functions $F_0$.

(b) Two-sided testing problems given by (2.5) against $\{P \in \mathcal{P} : \kappa(P) \ne 0\}$ can be treated similarly. For details we refer to Janssen [8]. In this case two-sided tests are based on the absolute value of the canonical gradient test statistic.

4. A conditional central limit theorem

In this section it is shown that the conditional randomization tests of the previous sections also work asymptotically for certain schemes of non-symmetric and non-i.i.d. sequences of the null hypothesis specified by the functional. As a tool a conditional central limit theorem (CLT) in the spirit of Janssen [7] is used. In contrast to [7] the procedure here works for linear non-studentized test statistics. However, it is our impression that at least in the case of von Mises functionals one should prefer the randomization of studentized test statistics in the sense of [7] in practice, see also Theorem 2.4 of section 2.

We begin with the von Mises type test statistics for a heterogeneous null hypothesis with $E(h(X_i)) = 0$ for each $i \le n$ and skew symmetric functions $h$, which is included below by setting $Y_{n,i} = h(X_i)$. More generally, let $(Y_{n,i})_{i \le n}$, $n \in \mathbb{N}$, be a triangular array of rowwise independent real random variables on some probability space $(\Omega, \mathcal{A}, P)$. Assume that

$$E(Y_{n,i}) = 0, \qquad \sigma_{ni} := (\operatorname{Var}(Y_{n,i}))^{1/2} < \infty \tag{4.1}$$

holds for all $i \le n$. Let by definition $\sigma_n := \bigl(\sum_{i=1}^{n} \sigma_{ni}^2\bigr)^{1/2} > 0$ hold and let us always suppose that

$$\max_{i \le n} P(|\sigma_n^{-1} Y_{n,i}| > \varepsilon) \to 0 \tag{4.2}$$

as $n \to \infty$ for each $\varepsilon > 0$. Then we may recall the following well-known easy consequences of the CLT of Lindeberg and Feller, see for instance Gnedenko and Kolmogorov [5] for the first statement of (4.4).

4.1 Remark

Under the assumptions (4.1) and (4.2) the validity of the CLT

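\[ \mathcal{L}\Bigl(\sigma_n^{-1} \sum_{i=1}^n Y_{n,i}\Bigr) \to N(0,1) \tag{4.3} \]

in the sense of weak convergence of distributions implies

\[ \max_{1 \le i \le n} |\sigma_n^{-1} Y_{n,i}| \xrightarrow{P} 0 \quad \text{and} \quad \sigma_n^{-2} \sum_{i=1}^n Y_{n,i}^2 \xrightarrow{P} 1 \tag{4.4} \]

in $P$-probability as $n \to \infty$. Introduce now the absolute values $Z_{n,i} := |Y_{n,i}|$ and assume that

\[ T_n\bigl((Z_{n,i})_{i \le n}, (\operatorname{sign}(Y_{n,i}))_{i \le n}\bigr) := \sigma_n^{-1} \sum_{i=1}^n \operatorname{sign}(Y_{n,i}) Z_{n,i} = \sigma_n^{-1} \sum_{i=1}^n Y_{n,i} \tag{4.5} \]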

is a linear test statistic. We will show how the idea of randomized $T_n$-tests (2.20)-(2.22) also works under the heterogeneous null hypothesis (4.1). As in (2.20) let $\varepsilon_n = (\varepsilon_{in})_{i \le n}$ be uniformly distributed signs on some probability space $(\tilde\Omega, \tilde{\mathcal{A}}, \tilde P)$, independent of the data. Although the following construction seems to be artificial for non-symmetric random variables, we will now substitute $Y_{n,i} = \operatorname{sign}(Y_{n,i}) Z_{n,i}$ by $\varepsilon_{in} Z_{n,i}$. Similar to (2.20), this substitution leads to the conditional distribution function

\[ F_n\bigl(x \mid (Z_{n,i})_{i \le n}\bigr) := \tilde P\Bigl(\sigma_n^{-1} \sum_{i=1}^n \varepsilon_{in} Z_{n,i} \le x\Bigr) \tag{4.6} \]

for given absolute values $Z_{n,i}$. If $G_n^{-1}$ denotes its inverse then the conditional critical values

\[ c_n\bigl(\alpha, (Z_{n,i})_{i \le n}\bigr) := G_n^{-1}(1 - \alpha) \]

of the upper $T_n$-randomization test are taken from (4.6). Under the conditions of Theorem 4.2 below we have

\[ c_n\bigl(\alpha, (Z_{n,i})_{i \le n}\bigr) \xrightarrow{P} \Phi^{-1}(1 - \alpha) \tag{4.7} \]

in $P$-probability.
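The construction around (4.5)-(4.7) can be illustrated numerically. The following is only a sketch under stated assumptions: the exact conditional law in (4.6) averages over all $2^n$ sign vectors and is replaced here by random draws, and the data model $Y_{n,i} = s_i (E_i - 1)$ with exponential $E_i$ is a hypothetical example of a heterogeneous, non-symmetric array with $E(Y_{n,i}) = 0$. The function names are illustrative and do not come from the paper.

```python
import numpy as np
from statistics import NormalDist


def randomization_critical_value(z, sigma_n, alpha, n_draws=2000, seed=0):
    """Monte Carlo stand-in for the conditional critical value c_n(alpha, Z).

    z       : absolute values Z_{n,i} = |Y_{n,i}|
    sigma_n : normalizing constant of T_n in (4.5)
    Random sign vectors approximate the exact law over all 2^n sign vectors.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    eps = rng.choice([-1.0, 1.0], size=(n_draws, z.size))  # uniform signs
    t_star = eps @ z / sigma_n  # sigma_n^{-1} * sum_i eps_i Z_i for each draw
    return float(np.quantile(t_star, 1.0 - alpha))


def simulate_null_row(n, seed=1):
    """Hypothetical heterogeneous, skewed null row: Y_i = s_i * (E_i - 1)."""
    rng = np.random.default_rng(seed)
    scales = rng.uniform(0.5, 2.0, size=n)             # sigma_{ni} = s_i
    y = scales * (rng.exponential(1.0, size=n) - 1.0)  # mean zero, not symmetric
    sigma_n = float(np.sqrt(np.sum(scales ** 2)))
    return y, sigma_n


y, sigma_n = simulate_null_row(n=2000)
c_n = randomization_critical_value(np.abs(y), sigma_n, alpha=0.05)
t_n = float(np.sum(y) / sigma_n)  # observed T_n as in (4.5)
print("conditional critical value:", round(c_n, 3))
print("normal quantile           :", round(NormalDist().inv_cdf(0.95), 3))
print("reject at level 0.05      :", t_n > c_n)
```

For large $n$ the simulated critical value should lie close to the normal quantile, in line with (4.7), although the signs of the $Y_{n,i}$ are not symmetric in this example.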

4.2 Theorem

Suppose that (4.1) and (4.2) hold.

(a) The unconditional CLT (4.3) implies the following conditional CLT

\[ \sup_{x \in \mathbb{R}} \bigl|F_n\bigl(x \mid (Z_{n,i})_{i \le n}\bigr) - \Phi(x)\bigr| \xrightarrow{P} 0 \tag{4.8} \]

in $P$-probability as $n \to \infty$.

(b) If all $Y_{n,i}$, $1 \le i \le n$, are 0-symmetrically distributed then the three conditions (4.3), (4.4) and (4.8) are equivalent.

Proof. (a) Without restrictions we may assume that both expressions in (4.4) are $P$-almost surely convergent. Otherwise we may turn to subsequences. Consider a fixed point $\omega \in \Omega$ with

\[ \sigma_n^{-2} \sum_{i=1}^n Z_{n,i}(\omega)^2 \to 1 \quad \text{and} \quad \max_{i \le n} |\sigma_n^{-1} Z_{n,i}(\omega)| \to 0. \tag{4.9} \]

The CLT, see Hájek and Šidák [6], p. 153, now implies that

\[ \varepsilon_n \mapsto T_n\bigl((Z_{n,i}(\omega))_{i \le n}, \varepsilon_n\bigr) \tag{4.10} \]

is asymptotically standard normal for fixed $\omega$.

(b) We have already shown that (4.4) implies (4.8). In view of Remark 4.1 it remains to prove that (4.8) implies the CLT (4.3). Under 0-symmetry of the $Y_{n,i}$'s we have equality in distribution of

\[ \sum_{i=1}^n \varepsilon_{in} Z_{n,i} \overset{D}{=} \sum_{i=1}^n \operatorname{sign}(Y_{n,i}) |Y_{n,i}| = \sum_{i=1}^n Y_{n,i}. \]

Thus the conditional CLT implies the unconditional CLT (4.3). □

4.3 Remarks

Consider test statistics $T_n$ given by (4.5), their related unconditional tests $\varphi_n$ (2.6) and the conditional randomized counterparts $\psi_n$ (2.22) with $W_n = T_n$. Suppose that (4.8) holds.

(a) Suppose that (4.3) also holds. Then we have

\[ E\bigl(|\varphi_n((Y_{n,i})_{i \le n}) - \psi_n((Y_{n,i})_{i \le n})|\bigr) \to 0 \tag{4.11} \]

as $n \to \infty$. Statement (4.11) is referred to as the asymptotic equivalence of unconditional and conditional tests under the extended null hypothesis.

(b) If all $Y_{n,i}$ are 0-symmetrically distributed (with possibly heterogeneous distributions) we have

\[ E\bigl(\psi_n((Y_{n,i})_{i \le n})\bigr) = \alpha \quad \text{for each } n. \tag{4.12} \]

(c) In general, we only have (4.11) and

\[ E\bigl(\psi_n((Y_{n,i})_{i \le n})\bigr) \to \alpha. \tag{4.13} \]

Thus the sequence $\psi_n$ has the asymptotic nominal level $\alpha$.

(d) Conditional CLTs can often be used to calculate power functions, see Janssen [7] for details.

In the second step the present results will be extended to rank statistics $S_n$ (3.14) and the corresponding conditional randomization tests $\psi_n$, which are given by the ranks of the absolute values. Suppose that $(X_{n,i})_{i \le n}$ is an arbitrary triangular array (with possibly dependent random variables) such that the "absolute ranks" $R_n(Z_n)$ (3.13) of

\[ Z_n = (Z_{n,i})_{i \le n}, \qquad Z_{n,i} := |X_{n,i}| \]

are well defined. Consider

\[ S_n\bigl(R_n(Z_n), (\operatorname{sign}(X_{n,i}))_{i \le n}\bigr) = \sigma_n^{-1} \sum_{i=1}^n \operatorname{sign}(X_{n,i})\, a_n(R_{n,i}(Z_n)) \tag{4.14} \]

with real scores $(a_n(i))_{i \le n}$ and

\[ \sigma_n := \Bigl(\sum_{i=1}^n a_n(i)^2\Bigr)^{1/2} > 0. \tag{4.15} \]

As in (4.6) define the conditional distribution function

\[ F_n\bigl(x \mid R_n(Z_n)\bigr) = \tilde P\bigl(S_n(R_n(Z_n), \varepsilon_n) \le x\bigr) \tag{4.16} \]

given the absolute ranks. Below we prove a conditional CLT for arbitrary ranks under the following familiar regularity condition

\[ \sigma_n^{-1} \max_{1 \le i \le n} |a_n(i)| \to 0 \quad \text{as } n \to \infty. \tag{4.17} \]

Again the conditional CLT implies the unconditional CLT for rank tests under the restricted conditions.

4.4 Theorem

Suppose that (4.17) holds.

(a) We have

\[ \sup_{x \in \mathbb{R}} \bigl|F_n\bigl(x \mid R_n(Z_n)\bigr) - \Phi(x)\bigr| \to 0 \tag{4.18} \]

for all $\omega \in \Omega$.

(b) Suppose that the scheme $(X_{n,i})_{i \le n}$ is rowwise independent with 0-symmetric distributions for each $i \le n$ (not necessarily identically distributed). Then $S_n\bigl(R_n(Z_n), (\operatorname{sign}(X_{n,i}))_{i \le n}\bigr)$ is asymptotically standard normal.

Proof. (a) As in the proof of Theorem 4.2(a) the CLT can be applied to

\[ \varepsilon_n \mapsto \sigma_n^{-1} \sum_{i=1}^n \varepsilon_{in}\, a_n(R_{n,i}(Z_n)) \tag{4.19} \]

for given absolute values $Z_n$. Notice that (4.19) has conditional variance one and condition (4.17) implies the result.

(b) Since $\operatorname{sign}(X_{n,i})$ and $Z_{n,i}$ are independent for symmetric distributions we again have equality in distribution of

\[ \sum_{i=1}^n \varepsilon_{in}\, a_n(R_{n,i}(Z_n)) \overset{D}{=} \sum_{i=1}^n \operatorname{sign}(X_{n,i})\, a_n(R_{n,i}(Z_n)). \]

The conditional CLT now implies the unconditional CLT. □
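The statement of Theorem 4.4(a) can be checked numerically for one concrete choice of scores. The sketch below assumes the Wilcoxon scores $a_n(i) = i$ and absolute values without ties. In that case the conditional law of $\sum_i \varepsilon_{in} a_n(R_{n,i}(Z_n))$ coincides with the law of $\sum_i \varepsilon_{in}\, i$ for every realization of the ranks, so it can be computed exactly by a convolution recursion. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def signed_rank_null_pmf(n):
    """Exact law of sum_i eps_i * i for independent uniform signs eps_i.

    Returns (support, pmf) on the grid -n(n+1)/2, ..., n(n+1)/2.
    """
    total = n * (n + 1) // 2
    pmf = np.zeros(2 * total + 1)
    pmf[total] = 1.0  # index `total` corresponds to the value 0
    for score in range(1, n + 1):
        up = np.zeros_like(pmf)
        down = np.zeros_like(pmf)
        up[score:] = pmf[:-score]    # sign +1 shifts the law by +score
        down[:-score] = pmf[score:]  # sign -1 shifts the law by -score
        pmf = 0.5 * (up + down)
    return np.arange(-total, total + 1), pmf


def sup_distance_to_normal(n):
    """sup_x |F_n(x) - Phi(x)| for the standardized statistic as in (4.18)."""
    support, pmf = signed_rank_null_pmf(n)
    sigma_n = np.sqrt(np.sum(np.arange(1, n + 1) ** 2.0))  # normalizer (4.15)
    cdf = np.cumsum(pmf)
    left_limit = np.concatenate(([0.0], cdf[:-1]))
    phi = np.array([NormalDist().cdf(s / sigma_n) for s in support])
    # For a step function the supremum is attained at a jump point,
    # either from the right (cdf) or from the left (left_limit).
    return float(max(np.max(np.abs(cdf - phi)),
                     np.max(np.abs(left_limit - phi))))


for n in (5, 20, 80):
    sigma_n = np.sqrt(np.sum(np.arange(1, n + 1) ** 2.0))
    print(n, "max score / sigma_n:", round(n / sigma_n, 3),
          "sup distance:", round(sup_distance_to_normal(n), 4))
```

The ratio in condition (4.17) behaves like $\sqrt{3/n}$ for these scores, and the printed sup distances should decrease as $n$ grows.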

4.5 Remark

The results stated above give a justification for the use of conditional randomization tests $\psi_n$ (3.16) for the extended null hypothesis (2.4). Notice that under (4.17) the conditional critical values of (4.16) again satisfy (4.7) for each $\omega \in \Omega$. Thus the statements (d) and (c) of Remark 4.3 remain true for our rank tests. Moreover, if in addition (4.14) is asymptotically standard normal then $\psi_n$ is asymptotically equivalent to a sequence of unconditional tests with critical values $\Phi^{-1}(1 - \alpha)$, see Remark 4.3(a).

5. Technical results and the proofs

The proof of Theorem 2.4. Obviously it remains to prove a conditional CLT for

\[ \varepsilon_n \mapsto W_n\bigl((\varepsilon_{in} |X_i|)_{i \le n}\bigr) \tag{5.1} \]

in the sense of (4.8). Consider first the numerator $n^{-1/2} \sum_{i=1}^n \operatorname{sign}(X_i) h(|X_i|)$ of (2.24). By Theorem 4.2 the random variable

\[ \varepsilon_n \mapsto n^{-1/2} \sum_{i=1}^n \varepsilon_{in} h(|X_i|) \]

is asymptotically $N(0, \sigma^2)$ distributed with variance $\sigma^2 = \int h^2 \, dP$. The denominator of (5.1) can be written as

\[ \varepsilon_n \mapsto \frac{n}{n-1} \Bigl( \frac{1}{n} \sum_{i=1}^n h(|X_i|)^2 - \Bigl( \frac{1}{n} \sum_{i=1}^n \varepsilon_{in} h(|X_i|) \Bigr)^2 \Bigr). \tag{5.2} \]

By the strong law of large numbers we may restrict ourselves to $\omega \in \Omega$ with

\[ \frac{1}{n} \sum_{i=1}^n h(|X_i(\omega)|)^2 \to \sigma^2. \]

The conditional variance of the mean now satisfies

\[ E\Bigl( \Bigl( \frac{1}{n} \sum_{i=1}^n \varepsilon_{in} h(|X_i|) \Bigr)^2 \Bigm| (|X_i|)_{i \le n} \Bigr) = \frac{1}{n^2} \sum_{i=1}^n h(|X_i|)^2 \to 0 \quad \text{as } n \to \infty. \tag{5.3} \]

Thus (5.2) converges to $\sigma^2$ in $\tilde P$-probability for fixed $\omega$. Notice that an inspection of the proof of Theorem 4.2 shows that the conditional distribution functions of (5.1) are almost surely convergent in $\omega$. □
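The argument around (5.2) and (5.3) can be made concrete with a small simulation. This is a sketch under assumptions not taken from the paper: the function $h(x) = x$ and a $t_5$-distributed sample are hypothetical choices, and the helper names are illustrative. The code evaluates the variance expression for many random sign vectors and compares it with $n^{-1} \sum_i h(|X_i|)^2$.

```python
import numpy as np


def variance_term(eps, hx):
    """n/(n-1) * (mean(h^2) - mean(eps*h)^2), the expression in (5.2)."""
    n = hx.size
    return n / (n - 1) * (np.mean(hx ** 2) - np.mean(eps * hx) ** 2)


def conditional_variance_of_mean(hx):
    """n^{-2} * sum_i h(|X_i|)^2, the right-hand side of (5.3)."""
    return float(np.sum(hx ** 2) / hx.size ** 2)


rng = np.random.default_rng(0)
x = rng.standard_t(df=5, size=1000)  # hypothetical symmetric sample
hx = np.abs(x)                       # h(|X_i|) for the assumed h(x) = x

# Evaluate the variance expression for 500 random sign vectors.
draws = np.array([
    variance_term(rng.choice([-1.0, 1.0], size=hx.size), hx)
    for _ in range(500)
])

print("mean of h^2              :", round(float(np.mean(hx ** 2)), 4))
print("conditional var of mean  :", round(conditional_variance_of_mean(hx), 5))
print("range of variance term   :", round(float(draws.min()), 4),
      round(float(draws.max()), 4))
```

Because the conditional variance of the mean is of order $1/n$, the variance term barely moves across sign vectors, which is the content of the convergence claimed for (5.2).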

The proof of Theorem 2.7 requires some knowledge about the tangent space of the submodel. For the univariate symmetric location model we refer to Pfanzagl and Wefelmeyer [12], sect. 2.3 and Bickel et al. [4], p. 55. Let

\[ d(P, Q) = \Bigl( \frac{1}{2} \int \Bigl( \Bigl(\frac{dP}{d\mu}\Bigr)^{1/2} - \Bigl(\frac{dQ}{d\mu}\Bigr)^{1/2} \Bigr)^2 d\mu \Bigr)^{1/2} \tag{5.4} \]

denote the Hellinger distance of two distributions $P, Q$ and $P + Q$