Confidence tubes for multiple quantile plots via empirical likelihood

John H.J. Einmahl
Eindhoven University of Technology & EURANDOM

Ian W. McKeague
Florida State University

June 15, 1999

Abstract

The nonparametric empirical likelihood approach is used to obtain simultaneous confidence tubes for multiple quantile plots based on $k$ independent (possibly right-censored) samples. These tubes are asymptotically distribution-free, except when both $k \ge 3$ and censoring is present. Pointwise versions of the confidence tubes, however, are asymptotically distribution-free in all cases. The various confidence tubes are valid under minimal conditions. The proposed methods are applied in three real data examples.

1 Introduction

The quantile-quantile (Q-Q) plot is a well-known and attractive graphical method for comparing two distributions, especially when confidence limits are added. In this paper we develop Q-Q plot methods for the comparison of two or more distributions from randomly censored data. More specifically, we consider the problem of finding simultaneous confidence tubes for multiple quantile plots (for brevity, multi-Q plots) from $k$ independent samples of possibly right-censored survival times. The multi-Q plot is defined to be the $k$-dimensional curve $(Q_1(p), \ldots, Q_k(p))$ parameterized by $0 < p < 1$, where $Q_j$ is the

Partially supported by a Fulbright grant and by European Union grant ERB CHRX-CT 940693.
AMS 1991 subject classifications. Primary: 62G15; secondary: 62G20.
Key words and phrases. Censoring, confidence region, distribution-free, k-sample comparison, nonparametric likelihood ratio, quantile-quantile plot.
Running head: Confidence tubes for quantile plots.

quantile function of the $j$-th distribution. It specializes to the ordinary Q-Q plot in the two-sample case.

The comparison of quantile functions is particularly useful for the analysis of survival data in biomedical settings. Frail and strong individuals (corresponding to low and high values of $p$) often respond to different treatments in different ways, so treatment effects can be hard to determine from comparison of mean or median survival times alone; see, e.g., Doksum (1974). The approach developed here allows comparison of treatments simultaneously across all frailty levels.

Our approach is based on the nonparametric empirical likelihood method. This method was originally developed by Thomas and Grunkemeier (1975) and Owen (1988, 1990) as a way of improving upon Wald-type confidence regions. There now exists a substantial literature on empirical likelihood indicating that it is widely viewed as a desirable and natural approach to statistical inference in a variety of settings. Moreover, there is considerable evidence that procedures based on the method outperform competing procedures. Empirical likelihood based confidence bands for individual quantile functions have recently been derived in Li, Hollander, McKeague and Yang (1996). Naik-Nimbalkar and Rajarshi (1997) employed the approach to test for equality of $k$ medians; their test naturally extends to a test for equality of $k$ quantiles.

We use the nonparametric empirical likelihood approach to derive asymptotic simultaneous confidence tubes for multi-Q plots based on $k$ independent random samples, including confidence bands for ordinary Q-Q plots ($k = 2$). The tubes are applicable to situations with or without random censoring. The limiting processes involved in the construction of the tubes are distribution-free, except when $k \ge 3$ and censoring is present. In general, we are able to obtain asymptotically distribution-free pointwise confidence regions for the multi-Q plot. The various confidence tubes are valid under minimal conditions, although for convenience we shall assume continuity of the underlying distribution functions.

Q-Q plots are studied in detail using classical methods in Doksum (1974, 1977), Doksum and Sievers (1976) and Switzer (1976) for models without censoring; see Shorack and Wellner (1986, pages 652–657) for a summary and discussion. For models with censoring, Wald-type simultaneous confidence bands for Q-Q plots are obtained in Aly (1986), but restrictive differentiability conditions on the underlying distribution functions are required. The $k$-sample problem without censoring is studied in Nair (1978, 1982), but there essentially only pairwise comparisons are made. A review of graphical methods in nonparametric statistics with extensive coverage of Q-Q plots can be found in Fisher (1983). Some refined approximation results for normalized Q-Q plots with statistical applications have been established in Beirlant and Deheuvels (1990) for the uncensored case, and Deheuvels and Einmahl (1992) in the censored case.

The paper is organized as follows. The proposed confidence tubes and the main results are presented in Section 2. Our approach is illustrated in Section 3 using three real data examples. All the proofs are contained in Section 4.

2 Main results

We begin by specifying the setup precisely and introducing the basic notation. It is convenient first to recall the notation in the one-sample case. For the corresponding notation in the general $k$-sample case, we use a further subscript $j$ to refer to the $j$-th sample. The random censorship model deals with $n$ i.i.d. pairs $(Z_i, \delta_i)$, $i = 1, \ldots, n$, obtained from two independent random samples $X_i$ and $Y_i$, $i = 1, \ldots, n$, in the following way: $Z_i = X_i \wedge Y_i$, $\delta_i = 1_{\{X_i \le Y_i\}}$. The distribution functions of $X_i$ and $Y_i$ are denoted $F$ and $G$, respectively, and $F$ is assumed to be continuous. We will work with non-negative $X_i$ and $Y_i$, but this restriction is in fact not needed anywhere; see the discussion at the end of this section. The (right-continuous) quantile function corresponding to $F$ is denoted by $Q$. We write

\[ L(\tilde F) = \prod_{i=1}^{n} \bigl(\tilde F(Z_i) - \tilde F(Z_i-)\bigr)^{\delta_i} \bigl(1 - \tilde F(Z_i)\bigr)^{1-\delta_i} \]

for the likelihood, where $\tilde F$ belongs to $\Theta$, the space of all distribution functions on $[0, \infty)$. The ordered uncensored survival times, i.e., the $X_i$ with corresponding $\delta_i = 1$, are written $0 \le T_1 \le \ldots \le T_N < \infty$, and $r_j = \sum_{i=1}^{n} 1_{\{Z_i \ge T_j\}}$ denotes the size of the risk set at $T_j-$. The empirical likelihood ratio for $F(t) = p$ (given $0 < p < 1$) is defined by

\[ R(t) = \frac{\sup\{L(\tilde F) : \tilde F(t) = p,\ \tilde F \in \Theta\}}{\sup\{L(\tilde F) : \tilde F \in \Theta\}}. \]

Note that the sup in the denominator is attained by the Kaplan–Meier (or product-limit) estimator

\[ F_n(t) = 1 - \prod_{i : T_i \le t} \Bigl(1 - \frac{1}{r_i}\Bigr). \]

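It can be shown with the aid of Lagrange's method [see Thomas and Grunkemeier (1975) or Li (1995)] that

\[ -2 \log R(t) = -2 \sum_{i : T_i \le t} \Bigl\{ (r_i - 1) \log\Bigl(1 + \frac{\lambda}{r_i - 1}\Bigr) - r_i \log\Bigl(1 + \frac{\lambda}{r_i}\Bigr) \Bigr\}, \]

where the Lagrange multiplier $\lambda > D := \max_{i : T_i \le t} (1 - r_i)$ satisfies the equation

\[ \prod_{i : T_i \le t} \Bigl(1 - \frac{1}{r_i + \lambda}\Bigr) = 1 - p. \qquad (2.1) \]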
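The one-sample computation above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function names `risk_sets` and `neg2_log_el_ratio` are hypothetical, the root of (2.1) is found with SciPy's `brentq`, and the sketch assumes every risk set entering the sum has size at least 2 (so that $D < 0$).

```python
import numpy as np
from scipy.optimize import brentq


def risk_sets(z, delta):
    """Ordered uncensored times T_i and risk-set sizes r_i = #{Z >= T_i}."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.sort(z[delta == 1])
    r = np.array([(z >= ti).sum() for ti in times])
    return times, r


def neg2_log_el_ratio(z, delta, t, p):
    """Return (-2 log R(t), lambda) for the hypothesis F(t) = p, using (2.1)."""
    times, r = risk_sets(z, delta)
    r = r[times <= t].astype(float)
    if r.size == 0 or r.min() <= 1:
        raise ValueError("sketch assumes risk sets of size >= 2 at times <= t")
    lower = (1.0 - r).max()  # D = max_i (1 - r_i)

    def constraint(lam):
        # log of the left side of (2.1) minus log(1 - p); increasing in lam
        return np.sum(np.log1p(-1.0 / (r + lam))) - np.log1p(-p)

    lo = lower + 1e-10
    hi = 1.0
    while constraint(hi) < 0:  # expand the bracket to the right
        hi *= 2.0
    lam = brentq(constraint, lo, hi)
    stat = -2.0 * np.sum((r - 1) * np.log1p(lam / (r - 1)) - r * np.log1p(lam / r))
    return stat, lam
```

When $p$ equals the Kaplan–Meier value $F_n(t)$, the multiplier is $0$ and the statistic vanishes; in the uncensored case the statistic coincides with the binomial likelihood ratio statistic for $F(t) = p$.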

Now we turn to the multi-sample setup. The $k$ samples are assumed to be independent with sample sizes denoted $n_1, \ldots, n_k$; write $n = \sum_{j=1}^{k} n_j$. Set $F = (F_1, \ldots, F_k)$ and define the multi-Q plot to be

\[ \{(Q_1(p), \ldots, Q_k(p)) : 0 < p < 1\}. \]

Observe that this is the classical Q-Q plot when $k = 2$. In the sequel we consider the following more convenient version of the multi-Q plot: the graph $\mathcal{Q}$ of the function

\[ t_1 \mapsto (Q_2(F_1(t_1)), \ldots, Q_k(F_1(t_1))), \]

for $t_1 \ge 0$. Denote the joint likelihood by

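\[ L(\tilde F) = \prod_{j=1}^{k} L_j(\tilde F_j), \]

and the empirical likelihood ratio at $t = (t_1, \ldots, t_k)$ by

\[ R(t) = \frac{\sup\{L(\tilde F) : \tilde F_j(t_j) = \tilde F_1(t_1) \text{ for all } j = 2, \ldots, k,\ \tilde F \in \Theta^k\}}{\sup\{L(\tilde F) : \tilde F \in \Theta^k\}}. \]

Again we find using Lagrange's method with the $k - 1$ constraints $\tilde F_1(t_1) = \tilde F_j(t_j)$, $j = 2, \ldots, k$, that

\[ -2 \log R(t) = -2 \sum_{j=1}^{k} \sum_{i : T_{ji} \le t_j} \Bigl\{ (r_{ji} - 1) \log\Bigl(1 + \frac{\lambda_j}{r_{ji} - 1}\Bigr) - r_{ji} \log\Bigl(1 + \frac{\lambda_j}{r_{ji}}\Bigr) \Bigr\}, \qquad (2.2) \]

where the $\lambda_j$, $j = 2, \ldots, k$, satisfy the $k - 1$ equations

\[ \prod_{i : T_{1i} \le t_1} \Bigl(1 - \frac{1}{r_{1i} + \lambda_1}\Bigr) = \prod_{i : T_{ji} \le t_j} \Bigl(1 - \frac{1}{r_{ji} + \lambda_j}\Bigr); \qquad (2.3) \]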

here we have set $\lambda_1 = -\sum_{j=2}^{k} \lambda_j$ (so $\sum_{j=1}^{k} \lambda_j = 0$) and the $\lambda_j$ should satisfy $\lambda_j > D_j$ for $j = 1, \ldots, k$. Later we show that this system of equations indeed has a unique solution, see Lemma 4.1. In the one-sample case it is immediately clear that the corresponding Lagrange multiplier equation (2.1) has a unique solution, but it is not obvious in the multi-sample case. Computation of the $\lambda_j$'s can be carried out using a special-purpose root-finding procedure which exploits the monotonicity of the r.h.s. of (2.3) as a function of $\lambda_j$ (see Section 3 and the proof of Lemma 4.1).

The various confidence sets we propose are easily obtained from the main theorem below and are presented in the three subsequent theorems. These confidence sets are all of the form $\{t : R(t) > c\}$, where $c$ is derived using asymptotic considerations.

Before stating our main theorem we introduce some more notation. We assume throughout that $n_j / n \to p_j > 0$ as $n \to \infty$ for $j = 1, \ldots, k$ (although with some care this condition can be relaxed to $n_j \to \infty$). Define

\[ \sigma_j^2(s) = \int_0^s \frac{dF_j(u)}{(1 - F_j(u))(1 - F_j(u-))(1 - G_j(u-))}. \]

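We will need the $k \times k$-matrix $D = D(t)$ with entries

\[ d_{ij} = \begin{cases} -\dfrac{\sigma_i(t_i)\,\sigma_j(t_j)}{\sqrt{p_i p_j}}\, \gamma_{ij} & \text{for } j \ne i, \\[2ex] \dfrac{\sigma_i^2(t_i)}{p_i} \displaystyle\sum_{l \ne i} \gamma_{li} & \text{for } j = i, \end{cases} \]

where

\[ \gamma_{ij} = \frac{\displaystyle\prod_{l \ne i,\ l \ne j} \sigma_l^2(t_l) / p_l}{\displaystyle\sum_{q=1}^{k} \prod_{l \ne q} \sigma_l^2(t_l) / p_l} \]

(the empty product is defined to be 1). Also define $V = V(t)$ to be the random $k$-vector with $j$-th entry $W_j(\sigma_j^2(t_j)) / \sigma_j(t_j)$, where the $W_j$ are independent standard Wiener processes. Let $\tau_1$ be such that $F_1(\tau_1) > 0$ and let $\tau_2$ be such that $F_1(\tau_2) < 1$, $G_1(\tau_2) < 1$ and $G_j(Q_j(F_1(\tau_2))) < 1$ for $j = 2, \ldots, k$. We assume throughout that the $F_j$ are continuous.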

Theorem 2.1. When $R$, $D$ and $V$ are evaluated at $t = (t_1, Q_2(F_1(t_1)), \ldots, Q_k(F_1(t_1)))$ for $t_1 \in [\tau_1, \tau_2]$, we have

\[ -2 \log R \xrightarrow{\ D\ } \|DV\|^2 \qquad (2.4) \]

on $D[\tau_1, \tau_2]$, with $\|\cdot\|$ the $k$-dimensional Euclidean norm.

Write the restriction of $\mathcal{Q}$ to $t_1 \in [\tau_1, \tau_2]$ as $\mathcal{Q}[\tau_1, \tau_2]$. In the next theorem we consider the important case $k = 2$, in which the multi-Q plot reduces to the usual Q-Q plot. Define $c_\alpha[s_1, s_2]$ for $0 < \alpha < 1$ by

\[ P\Bigl( \sup_{s \in [s_1, s_2]} W^2(s)/s < c_\alpha[s_1, s_2] \Bigr) = 1 - \alpha. \]

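Set $\hat c_\alpha = c_\alpha[\hat\sigma^2(\tau_1), \hat\sigma^2(\tau_2)]$, where

\[ \hat\sigma^2(t_1) = \frac{n}{n_1}\, \hat\sigma_1^2(t_1) + \frac{n}{n_2}\, \hat\sigma_2^2\bigl(Q_{2n_2}(F_{1n_1}(t_1))\bigr), \qquad (2.5) \]

with

\[ \hat\sigma_j^2(s) = n_j \sum_{i : T_{ji} \le s} \frac{1}{r_{ji}(r_{ji} - 1)} \qquad (2.6) \]

and with $Q_{2n_2}$ the (right-continuous) quantile function corresponding to $F_{2n_2}$. Now we define the confidence band for $\mathcal{Q}[\tau_1, \tau_2]$ to be

\[ \mathcal{B} = \{t \in [\tau_1, \tau_2] \times [0, \infty) : -2 \log R(t) < \hat c_\alpha\}. \]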
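The threshold $c_\alpha[s_1, s_2]$ has no simple closed form, so in practice it has to be approximated. The sketch below is one hedged way to do this by Monte Carlo simulation of Wiener paths on a grid; the function name `simulate_threshold` and all default tuning values are assumptions made for illustration, and the discretization slightly underestimates the true supremum. With `n_dim = k - 1` the same routine approximates the supremum of a sum of squared independent Wiener processes divided by $s$.

```python
import numpy as np


def simulate_threshold(s1, s2, alpha, n_dim=1, n_paths=5000, n_grid=200, seed=0):
    """Monte Carlo estimate of c with
    P( sup_{s in [s1, s2]} sum_j W_j(s)^2 / s < c ) = 1 - alpha,
    for n_dim independent standard Wiener processes (requires 0 < s1 <= s2)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(s1, s2, n_grid)
    # W(s1) ~ N(0, s1); afterwards independent Gaussian increments.
    steps = np.diff(grid, prepend=0.0)
    incr = rng.standard_normal((n_paths, n_dim, n_grid)) * np.sqrt(steps)
    w = np.cumsum(incr, axis=2)
    sup_stat = ((w ** 2).sum(axis=1) / grid).max(axis=1)
    return np.quantile(sup_stat, 1.0 - alpha)
```

For a degenerate interval ($s_1 = s_2$) the simulated threshold should be close to the upper $\alpha$-quantile of a chi-square distribution with `n_dim` degrees of freedom, which gives a quick sanity check.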

Theorem 2.2. In the censored case, for $k = 2$ and $0 < \alpha < 1$,

\[ \lim_{n \to \infty} P(\mathcal{Q}[\tau_1, \tau_2] \in \mathcal{B}) = 1 - \alpha. \]

Remark 2.1. In the uncensored case (and $k = 2$) we have that

\[ \sigma^2(t_1) := \frac{\sigma_1^2(t_1)}{p_1} + \frac{\sigma_2^2(Q_2(F_1(t_1)))}{p_2} = \Bigl(\frac{1}{p_1} + \frac{1}{p_2}\Bigr) \frac{F_1(t_1)}{1 - F_1(t_1)} = \Bigl(\frac{1}{p_1} + \frac{1}{p_2}\Bigr) \sigma_1^2(t_1). \qquad (2.7) \]

Therefore for this case we can replace the $\hat\sigma^2(t_1)$ defined in (2.5) by the simpler but almost equivalent estimator

\[ \hat\sigma^2(t_1) = \Bigl(\frac{n}{n_1} + \frac{n}{n_2}\Bigr) \frac{F_{1n_1}(t_1)}{1 - F_{1n_1}(t_1)}. \]

For use in $\hat c_\alpha$, we can replace $\hat\sigma^2(t_1)$ by

\[ \frac{F_{1n_1}(t_1)}{1 - F_{1n_1}(t_1)}. \]

Observe that this last expression is not an estimator of $\sigma^2(t_1)$ but of $\sigma_1^2(t_1)$. This however makes no difference because of (2.7) and the fact that for $c > 0$

\[ \sup_{s \in [s_1, s_2]} W^2(s)/s \ \stackrel{D}{=} \sup_{s \in [c s_1, c s_2]} W^2(s)/s. \]

Of course, here the Kaplan–Meier estimator $F_{1n_1}$ is just the empirical distribution function of the first sample.
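The distributional identity in Remark 2.1 is an instance of Brownian scaling. A short derivation, added here for convenience (the process $\tilde W$ is notation introduced only for this step):

```latex
\[
\sup_{s \in [c s_1,\, c s_2]} \frac{W^2(s)}{s}
  = \sup_{u \in [s_1,\, s_2]} \frac{W^2(cu)}{cu}
  = \sup_{u \in [s_1,\, s_2]} \frac{\tilde W^2(u)}{u},
\qquad \tilde W(u) := c^{-1/2}\, W(cu),
\]
and $\tilde W$ is again a standard Wiener process, since it is a centered
Gaussian process with continuous paths and covariance
$c^{-1} \min(cu, cv) = \min(u, v)$.
```

Hence the threshold $c_\alpha[s_1, s_2]$ depends on $s_1$ and $s_2$ only through the ratio $s_2 / s_1$.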

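Next we return to general $k \ge 2$, but assume that there is no censoring. Note that in this case the assumptions on $\tau_2$ reduce to $F_1(\tau_2) < 1$. Define $C_\alpha[s_1, s_2]$ for $0 < \alpha < 1$ by

\[ P\Bigl( \sup_{s \in [s_1, s_2]} \frac{1}{s} \sum_{j=1}^{k-1} W_j^2(s) < C_\alpha[s_1, s_2] \Bigr) = 1 - \alpha. \]

Set $\hat C_\alpha = C_\alpha[\hat\sigma_1^2(\tau_1), \hat\sigma_1^2(\tau_2)]$, where

\[ \hat\sigma_1^2(t_1) = \frac{F_{1n_1}(t_1)}{1 - F_{1n_1}(t_1)}. \]

Define the confidence tube for $\mathcal{Q}[\tau_1, \tau_2]$ by

\[ \mathcal{T} = \bigl\{ t \in [\tau_1, \tau_2] \times [0, \infty)^{k-1} : -2 \log R(t) < \hat C_\alpha \bigr\}. \qquad (2.8) \]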

Theorem 2.3. In the absence of censoring, for all $k \ge 2$ and $0 < \alpha < 1$,

\[ \lim_{n \to \infty} P(\mathcal{Q}[\tau_1, \tau_2] \in \mathcal{T}) = 1 - \alpha. \]

Now we allow censoring and $k \ge 2$ but take $\tau_2 = \tau_1$. Set $\mathcal{Q}[\tau_1] = \mathcal{Q}[\tau_1, \tau_1]$. Define the confidence region for $\mathcal{Q}[\tau_1]$ by

\[ \mathcal{R} = \{ t \in \{\tau_1\} \times [0, \infty)^{k-1} : -2 \log R(t) < \chi^2_{k-1,\alpha} \}, \]

where $\chi^2_{k-1,\alpha}$ is the upper $\alpha$-quantile of the chi-square distribution with $k - 1$ degrees of freedom. In the case $k = 2$ note that $\mathcal{R}$ amounts to a confidence interval for the $F_1(\tau_1)$-quantile of $F_2$.

Theorem 2.4. In the censored case, for all $k \ge 2$ and $0 < \alpha < 1$,

\[ \lim_{n \to \infty} P(\mathcal{Q}[\tau_1] \in \mathcal{R}) = 1 - \alpha. \]

The asymptotic null distribution in the test for equality of $k$ medians developed by Naik-Nimbalkar and Rajarshi (1997) can be essentially derived from the proof of Theorem 2.4 by taking $\tau_1$ as their estimator of the common median.
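For the pointwise region, the threshold is an ordinary chi-square quantile and needs no simulation. A minimal sketch using SciPy's `chi2.ppf` follows; the helper names `pointwise_threshold` and `in_pointwise_region` are hypothetical and chosen only for this illustration.

```python
from scipy.stats import chi2


def pointwise_threshold(k, alpha):
    """Upper alpha-quantile of the chi-square distribution with k - 1 d.f."""
    return chi2.ppf(1.0 - alpha, df=k - 1)


def in_pointwise_region(neg2_log_r, k, alpha):
    """True if a point with statistic -2 log R(t) lies in the pointwise region."""
    return neg2_log_r < pointwise_threshold(k, alpha)
```

For example, with $k = 2$ and $\alpha = 0.05$ the threshold is the familiar value of about 3.84.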

Finally we establish an interval property for the confidence tube $\mathcal{T}$ (which also applies to $\mathcal{B}$ and $\mathcal{R}$): one-dimensional cross-sections parallel to a given axis are intervals. This is useful for computing the various confidence sets because points belonging to them can then be found by a simple search strategy that sweeps along each axis.

Theorem 2.5. Suppose that $t^{(l)} = (t_1, \ldots, t_j^{(l)}, \ldots, t_k) \in \mathcal{T}$ for $l = 1, 2$, where $t_j^{(1)} < t_j^{(2)}$. Then $t = (t_1, \ldots, t_j, \ldots, t_k) \in \mathcal{T}$ for any $t_j \in [t_j^{(1)}, t_j^{(2)}]$.

In the two-sample case ($k = 2$) we have a somewhat stronger result.

Theorem 2.6. Let $(t_1^{(l)}, t_2^{(l)})$, $l = 1, 2$, belong to the confidence band $\mathcal{B}$ and suppose $t_1^{(1)} \le t_1^{(2)}$ and $t_2^{(1)} \le t_2^{(2)}$. Then $(t_1, t_2)$ also belongs to $\mathcal{B}$ whenever $t_1^{(1)} \le t_1 \le t_1^{(2)}$ and $t_2^{(1)} \le t_2 \le t_2^{(2)}$.

This theorem (as well as Theorem 2.5) implies, by taking $t_1^{(1)} = t_1^{(2)}$ or $t_2^{(1)} = t_2^{(2)}$, that the intersection of the band $\mathcal{B}$ with a vertical or horizontal line is an interval. In addition, it shows that the bands are nondecreasing in the sense that their lower or upper boundaries are nondecreasing.
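The interval property suggests a simple sweep along one axis with the other coordinates held fixed. The sketch below is an illustration under stated assumptions rather than the authors' code: `stat` stands for any callable returning $-2 \log R$ at a candidate value of $t_j$, `grid` is an increasing sequence of candidate values, and the name `cross_section` is hypothetical. Because of the interval property the sweep can stop as soon as it leaves the set.

```python
def cross_section(stat, grid, threshold):
    """Return (lower, upper) grid endpoints of {t in grid : stat(t) < threshold},
    or None if no grid point qualifies. Relies on the cross-section being an
    interval, so the sweep stops at the first exit."""
    lower = None
    upper = None
    for t in grid:
        if stat(t) < threshold:
            if lower is None:
                lower = t
            upper = t
        elif lower is not None:
            break  # interval property: once outside, never inside again
    return None if lower is None else (lower, upper)
```

Since (2.2) depends on $t_j$ only through the set $\{i : T_{ji} \le t_j\}$, a natural choice for `grid` is the ordered uncensored times of the $j$-th sample.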

Discussion. We wish to emphasize that our approach, including the definition of the multi-Q plot, is new even in the uncensored case. We also recall that non-negativity of the observations is not needed anywhere in the proofs. This is especially useful in the uncensored case, where often the $k$ samples do not represent life- or failure times, or when a transformation is applied to the data (see the third example in Section 3). Another desirable feature of our approach is that the confidence bands and tubes are essentially invariant under permutations of the order of the $k$ samples involved. (Only at the two "ends" of the tube does the first sample play a somewhat special role.)

We did not formulate a version of our confidence tubes in the censored case for $k \ge 3$ since then $\|DV\|^2$ in (2.4) is not distribution-free, even when only one of the $k$ samples is subject to censoring. Our approach can, however, be generalized to this situation by estimating all the unknowns appearing in $D$ and $V$ and then using simulation. This means that we replace $D$ by $C$ (given in the proof of Theorem 2.1), $t_j$ by $Q_{jn_j}(F_{1n_1}(t_1))$ for $j = 2, \ldots, k$, and $\sigma_j$ by $\hat\sigma_j$. The process to be simulated has the form $\|C \hat V\|^2$, where $\hat V$ is the estimated version of $V$. Hence, approximate $1 - \alpha$ confidence tubes can be constructed for the censored case as well, but we do not pursue this in further detail here.

The one-sample Q-Q plot, $t \mapsto Q(F_0(t))$ with $F_0$ known, is essentially treated in Li et al. (1996), since their confidence bands for $Q(p)$ can be transformed to bands for $Q(F_0(t))$ by the time change $p = F_0(t)$. The present paper can be seen as a generalization of their approach to the $k$-sample case.

For uncensored data, in the two-sample Q-Q plot case, our confidence bands perform well in the tails due to the weighting which naturally arises when using the empirical likelihood method. Our bands share this property with the weighted bands (W bands) introduced in Doksum and Sievers (1976), which are based on the standardized two-sample empirical process. The bands in Switzer (1976) [and Aly (1986) for the censored case] are much wider in the tails, since they are based on the unweighted empirical process. All these procedures as well as our procedures are essentially based on the inversion of a distance between empirical distribution functions (or Kaplan–Meier estimators). In fact, the W bands are asymptotically equivalent to our bands in the uncensored case.

3 Applications to real data

In this section we illustrate our approach in three real data examples. First we consider a biomedical example for the two-sample case with censored survival data. The data come from a Mayo Clinic trial involving a treatment for primary biliary cirrhosis of the liver; see Fleming and Harrington (1991) for discussion. A total of $n = 312$ patients participated in the randomized clinical trial, 158 receiving the treatment (D-penicillamine) and 154 receiving a placebo. Censoring is heavy (187 of the 312 observations are censored). Figure 1 displays the 90% confidence band (and pointwise confidence intervals) for the Q-Q plot of treatment versus placebo for survival time in days. The standard empirical Q-Q plot based on quantiles of the Kaplan–Meier estimator is also displayed. Note that although the diagonal departs from the pointwise confidence region at some points, it remains within the simultaneous band, so there is no overall evidence of a difference between treatment and placebo.

[Figure 1 plot omitted. Horizontal axis: Placebo (survival in days); vertical axis: Treatment (survival in days).]

Figure 1: 90% confidence band (solid line) for the treatment versus placebo Q-Q plot in the Mayo Clinic trial, for $186 \le t_1 \le 2976$ days; pointwise confidence intervals (short dashed line), empirical Q-Q plot (long dashed line).

The second example also illustrates the two-sample case. Hollander, McKeague and Yang (1997) analyzed data on 432 manuscripts submitted to the Theory and Methods Section of JASA during 1994. Each observation consists of the number of days between a manuscript's submission and its first review or the end of the year, along with a censoring indicator (1 if a paper received its first review by the end of the year; 0 otherwise). Similar data (on 444 manuscripts) are available for 1995. The censoring is light (330 of the 876 observations are censored) compared with the previous example. It is of interest to look for differences in the pattern of review times for the two years. Figure 2 displays the 95% confidence band (and pointwise confidence intervals) for the Q-Q plot. The lower endpoints of the pointwise confidence intervals touch the diagonal between 10 and 25 days, which might suggest that "rapid" reviews were faster in 1994 than in 1995. However, the diagonal is completely contained within the simultaneous band, so there is no overall evidence of a difference between the patterns of review times.

[Figure 2 plot omitted. Horizontal axis: Days to First Review (1994); vertical axis: Days to First Review (1995).]

Figure 2: 95% confidence band (solid line) for the Q-Q plot based on the JASA time-to-first-review data, for $5 \le t_1 \le 195$ days; pointwise confidence intervals (short dashed line), empirical Q-Q plot (long dashed line).

The third example concerns times to breakdown (in minutes) of an insulating fluid under three elevated voltage stresses, from data reported in Nair (1982, Table 1). It is important to determine whether the distribution of time to breakdown changes with voltage. There are 60 uncensored observations at each voltage level (34, 35 and 36 Kv). As in Nair (1982) we use the 34 Kv measurements as a reference sample and put the breakdown times on a log-scale. Figure 3 shows cross-sections of the 95% confidence regions for the multi-Q plot at three values of the reference sample: $t_1 = 0.41$, 1.06 and 1.65. The confidence tube gives simultaneous coverage over the interval $0.41 \le t_1 \le 1.65$. The diagonal $(t_1, t_1, t_1)$ runs above the pointwise confidence region at $t_1 = 1.65$ (top right plot), suggesting that increased voltage can reduce breakdown time in the upper tail of the distribution. However, the diagonal falls completely inside the simultaneous tube (left column), so there is a lack of significant evidence for breakdown time changing with voltage.

[Figure 3 plots omitted: six panels, three rows by two columns.]

Figure 3: Time to insulating fluid breakdown (in log-scale), 36 Kv sample versus 35 Kv sample; cross-sections of the 95% simultaneous confidence tube (left column) and pointwise confidence regions (right column) at $t_1 = 0.41$, 1.06 and 1.65 in the 34 Kv reference sample (bottom row to top row, respectively).

In these examples, we computed the Lagrange multipliers in the system of equations (2.3) using the van Wijngaarden–Dekker–Brent root-finding algorithm (Press, Teukolsky, Vetterling and Flannery, 1992, p. 359). The proof of Lemma 4.1 provides a constructive method to obtain the solution by repeated use of this algorithm. The thresholds $\hat c_\alpha$ and $\hat C_\alpha$ used in the confidence bands/tubes were computed by simulation of the Wiener processes on a fine grid.
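To make the computation of the multipliers concrete, here is a hedged sketch using SciPy's `brentq` (an implementation of Brent's method). It is not the authors' code and it uses a variant of the construction in the proof of Lemma 4.1: instead of parameterizing by $\lambda_2$, it parameterizes by the common value $v$ of the logarithm of both sides of (2.3), inverts each sample's product for $\lambda_j(v)$, and solves $\sum_j \lambda_j(v) = 0$, which is increasing in $v$. The names `g`, `g_inverse`, `solve_multipliers` and `neg2_log_r` are hypothetical; each input array holds the risk-set sizes $r_{ji}$ with $T_{ji} \le t_j$, all assumed to be at least 2.

```python
import numpy as np
from scipy.optimize import brentq


def g(lam, r):
    """Log of one product in (2.3): sum_i log(1 - 1/(r_i + lam))."""
    return np.sum(np.log1p(-1.0 / (r + lam)))


def g_inverse(v, r):
    """Solve g(lam, r) = v for lam > D = max_i (1 - r_i); g is increasing."""
    d = (1.0 - r).max()
    eps = 1e-3
    while g(d + eps, r) > v:  # move toward D until g drops below v
        eps /= 10.0
    step = 1.0
    while g(d + step, r) < v:  # move right until g exceeds v
        step *= 2.0
    return brentq(lambda lam: g(lam, r) - v, d + eps, d + step)


def solve_multipliers(r_list):
    """Multipliers (lambda_1, ..., lambda_k) solving (2.3) with sum zero."""
    rs = [np.asarray(r, dtype=float) for r in r_list]
    a = np.array([g(0.0, r) for r in rs])  # log Kaplan-Meier survival values
    if a.max() - a.min() < 1e-12:
        return np.zeros(len(rs))  # constraints already hold with lambda = 0

    def total(v):
        return sum(g_inverse(v, r) for r in rs)

    # total(a.min()) <= 0 <= total(a.max()), so this interval brackets the root
    v = brentq(total, a.min(), a.max())
    return np.array([g_inverse(v, r) for r in rs])


def neg2_log_r(r_list):
    """Statistic (2.2) evaluated at the solved multipliers."""
    rs = [np.asarray(r, dtype=float) for r in r_list]
    lam = solve_multipliers(rs)
    return -2.0 * sum(
        np.sum((r - 1) * np.log1p(l / (r - 1)) - r * np.log1p(l / r))
        for r, l in zip(rs, lam)
    )
```

The one-dimensional parameterization keeps every root-finding call on a bracket that is known in advance, at the cost of nesting one Brent search inside another.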

4 Proofs

Here we present proofs of the theorems in Section 2. Some lemmas used in these proofs are given at the end of this section.

Proof of Theorem 2.1. First we note that by Lemma 4.1 below the system of equations in (2.3) with $\lambda_1$ replaced by $-\sum_{j=2}^{k} \lambda_j$ has a unique solution for all $k \ge 2$. Define $g_j : (D_j, \infty) \to \mathbb{R}$ by

\[ g_j(\lambda) = \sum_{i : T_{ji} \le t_j} \log\Bigl(1 - \frac{1}{r_{ji} + \lambda}\Bigr) \]

for $j = 1, \ldots, k$. Denote $a_j = g_j(0) = \log \hat S_j(t_j)$, where $\hat S_j$ is the Kaplan–Meier estimator of $S_j = 1 - F_j$, and $b_j = g_j'(0) = \hat\sigma_j^2(t_j) / n_j$ with $\hat\sigma_j^2$ as in (2.6). Here $t_j = Q_j(F_1(t_1))$ for $j = 2, \ldots, k$. Taylor series expansions of $g_1$ and $g_j$, in conjunction with Lemma 4.2 and the argument of Li (1995, proof of (2.15), p.102), yield

\[ 0 = g_1(\lambda_1) - g_j(\lambda_j) = a_1 - a_j + \lambda_1 b_1 - \lambda_j b_j + O_P(n^{-1}), \qquad (4.9) \]

uniformly in $t_1 \in [\tau_1, \tau_2]$. Ignore the remainder term for the moment and consider the system of equations

\[ \tilde\lambda_j b_j - \tilde\lambda_1 b_1 = a_1 - a_j \ \text{ for } j = 2, \ldots, k; \qquad \tilde\lambda_1 + \ldots + \tilde\lambda_k = 0, \qquad (4.10) \]

with unknowns $\tilde\lambda_1, \ldots, \tilde\lambda_k$. By Lemma 4.3 this system has as unique solution $\tilde\lambda_j = \sum_{i \ne j} (a_i - a_j)\, \phi_{ij}$ with the $\phi_{ij}$ as defined in the lemma. We now use this result to obtain an approximation for the $\lambda_j$. The remainder term in (4.9) consists of the remainders in the Taylor series expansions of $g_1$ and $g_j$, and both are of order $O_P(n^{-1})$. Attach these remainder terms to $a_1$ and $a_j$, respectively, and apply Lemma 4.3. Note that $\hat\sigma_j^2$ is a uniformly consistent estimator of $\sigma_j^2$, so $b_j = O_P(n^{-1})$ and $\phi_{ij} = O_P(n)$, and it follows that

\[ \lambda_j = \tilde\lambda_j + O_P(n^{-1})\, O_P(n) = \tilde\lambda_j + O_P(1), \]

uniformly for $t_1 \in [\tau_1, \tau_2]$. We also have that

\[ \lambda_j^2 b_j = \tilde\lambda_j^2 b_j + O_P(n^{-1/2}). \qquad (4.11) \]
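The closed-form solution of (4.10) quoted from Lemma 4.3 is easy to check numerically. The sketch below compares it with a direct linear solve; it is an illustrative check only, the function names `closed_form` and `linear_solve` are hypothetical, and indices run from 0 in the code whereas the paper counts from 1.

```python
import numpy as np


def closed_form(a, b):
    """lambda~_j = sum_{i != j} (a_i - a_j) * prod_{l != i,j} b_l / sum_q prod_{l != q} b_l."""
    k = len(a)

    def prod_except(skip):
        # product of b_l over l not in skip (the empty product equals 1)
        return np.prod([b[l] for l in range(k) if l not in skip])

    denom = sum(prod_except({q}) for q in range(k))
    return np.array([
        sum((a[i] - a[j]) * prod_except({i, j}) / denom for i in range(k) if i != j)
        for j in range(k)
    ])


def linear_solve(a, b):
    """Solve (4.10) directly as a k x k linear system."""
    k = len(a)
    m = np.zeros((k, k))
    rhs = np.zeros(k)
    for j in range(1, k):
        m[j - 1, j] = b[j]    # lambda~_j b_j
        m[j - 1, 0] = -b[0]   # - lambda~_1 b_1
        rhs[j - 1] = a[0] - a[j]
    m[k - 1, :] = 1.0         # lambda~_1 + ... + lambda~_k = 0
    return np.linalg.solve(m, rhs)
```

Both routines agree whenever all the $b_l$ are positive, which is the uniqueness condition stated in the lemma.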

Applying the Taylor series argument of Li (1995, p.102) to (2.2) and using (4.11) then gives

\[ -2 \log R(t) = \sum_{j=1}^{k} \tilde\lambda_j^2 b_j + O_P(n^{-1/2}). \]

Write the leading term above in the form

\[ \sum_{j=1}^{k} \tilde\lambda_j^2 b_j = \|Cw\|^2, \]

where $C$ is the $k \times k$-matrix with entries

\[ c_{ij} = \begin{cases} -\sqrt{b_i b_j}\, \phi_{ij} & \text{for } j \ne i, \\ b_i \sum_{l \ne i} \phi_{li} & \text{for } j = i, \end{cases} \]

and $w$ is the $k$-vector with entries

\[ w_j = \frac{a_j - \log S_j(t_j)}{\sqrt{b_j}} = \frac{a_j - \log S_1(t_1)}{\sqrt{b_j}}. \]

The proof is completed by noting that

\[ (w_j(t_j))_{j=1,\ldots,k} \xrightarrow{\ D\ } \Bigl( \frac{W_j(\sigma_j^2(t_j))}{\sigma_j(t_j)} \Bigr)_{j=1,\ldots,k} = V(t_1), \]

where $W_1, \ldots, W_k$ are independent standard Wiener processes, and $c_{ij} \xrightarrow{\ P\ } d_{ij}$.

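Proof of Theorem 2.2. Let us first simplify $\|DV\|^2$ for this case. Note that for $k = 2$ we have

\[ D = \frac{1}{\sigma^2(t_1)} \begin{pmatrix} \dfrac{\sigma_1^2(t_1)}{p_1} & -\dfrac{\sigma_1(t_1)\sigma_2(t_2)}{\sqrt{p_1 p_2}} \\[2ex] -\dfrac{\sigma_1(t_1)\sigma_2(t_2)}{\sqrt{p_1 p_2}} & \dfrac{\sigma_2^2(t_2)}{p_2} \end{pmatrix} = \frac{1}{\sigma^2(t_1)} \begin{pmatrix} \sigma_1(t_1)/\sqrt{p_1} \\ -\sigma_2(t_2)/\sqrt{p_2} \end{pmatrix}^{\otimes 2}, \]

with $\sigma^2$ as in Remark 2.1 and $t_2 = Q_2(F_1(t_1))$, where $a^{\otimes 2} = a a'$, and hence

\[ DV = \frac{1}{\sigma^2(t_1)} \begin{pmatrix} \sigma_1(t_1)/\sqrt{p_1} \\ -\sigma_2(t_2)/\sqrt{p_2} \end{pmatrix} \Bigl( \frac{W_1(\sigma_1^2(t_1))}{\sqrt{p_1}} - \frac{W_2(\sigma_2^2(t_2))}{\sqrt{p_2}} \Bigr) \stackrel{D}{=} \frac{1}{\sigma^2(t_1)} \begin{pmatrix} \sigma_1(t_1)/\sqrt{p_1} \\ -\sigma_2(t_2)/\sqrt{p_2} \end{pmatrix} W(\sigma^2(t_1)), \]

so that

\[ \|DV\|^2 \stackrel{D}{=} \frac{W^2(\sigma^2(t_1))}{\sigma^2(t_1)}. \]

It is well-known that $\hat\sigma_j^2(s) \xrightarrow{P} \sigma_j^2(s)$, $j = 1, 2$, and hence with some care it can be shown that $\hat\sigma^2(\tau_l) \xrightarrow{P} \sigma^2(\tau_l)$, $l = 1, 2$. Setting $c_\alpha = c_\alpha[\sigma^2(\tau_1), \sigma^2(\tau_2)]$, this yields $\hat c_\alpha \xrightarrow{P} c_\alpha$. Combining the above we obtain

\[ \begin{aligned} P(\mathcal{Q}[\tau_1, \tau_2] \in \mathcal{B}) &= P\bigl(-2 \log R(t_1, Q_2(F_1(t_1))) < \hat c_\alpha \text{ for all } t_1 \in [\tau_1, \tau_2]\bigr) \\ &\to P\Bigl( \sup_{t_1 \in [\tau_1, \tau_2]} \frac{W^2(\sigma^2(t_1))}{\sigma^2(t_1)} < c_\alpha \Bigr) \\ &= P\Bigl( \sup_{s \in [\sigma^2(\tau_1), \sigma^2(\tau_2)]} \frac{W^2(s)}{s} < c_\alpha \Bigr) = 1 - \alpha, \end{aligned} \]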

where we used, for the convergence statement, that the random variable in the last expression has a continuous distribution.

Before continuing with the proofs of the theorems let us do some calculations on $\|DV\|^2$ of Theorem 2.1, in general. Note that $D$ is symmetric and by Lemma 4.4 it is idempotent of rank $k - 1$. Thus we may diagonalize $D = D(t_1)$ as follows:

\[ D(t_1) = P(t_1)' \begin{pmatrix} I_{k-1} & 0 \\ 0 & 0 \end{pmatrix} P(t_1), \qquad (4.12) \]

where $P(t_1)$ is orthogonal and $I_{k-1}$ is the identity matrix of order $k - 1$. Put $Z(t_1) = P(t_1) V(t_1)$. Then

\[ \|D(t_1) V(t_1)\|^2 = V(t_1)' D(t_1)' D(t_1) V(t_1) = V(t_1)' D(t_1) V(t_1) = Z(t_1)' \begin{pmatrix} I_{k-1} & 0 \\ 0 & 0 \end{pmatrix} Z(t_1), \qquad (4.13) \]

where the second equality follows since $D$ is symmetric and idempotent. The covariance structure of the process $Z(t_1)$ is given, for two values of $t_1$, say $s \le t$, by

\[ E(Z(s) Z(t)') = P(s)\, \mathrm{diag}\Bigl( \frac{\sigma_1(s)}{\sigma_1(t)}, \frac{\sigma_2(Q_2(F_1(s)))}{\sigma_2(Q_2(F_1(t)))}, \ldots, \frac{\sigma_k(Q_k(F_1(s)))}{\sigma_k(Q_k(F_1(t)))} \Bigr) P(t)'. \qquad (4.14) \]
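The structural properties of $D$ used here (symmetry, idempotency, rank $k - 1$) can be checked numerically from its entrywise definition in Section 2. The sketch below is an illustration only: the function name `d_matrix` and the input values in any call are hypothetical, and the arguments are the values $\sigma_j^2(t_j)$ and the limiting proportions $p_j$.

```python
import numpy as np


def d_matrix(sigma2, p):
    """Build D entrywise from the values sigma_j^2(t_j) and proportions p_j."""
    c = np.asarray(sigma2, dtype=float) / np.asarray(p, dtype=float)
    k = len(c)

    def prod_except(skip):
        # product of sigma_l^2 / p_l over l not in skip (empty product is 1)
        return np.prod([c[l] for l in range(k) if l not in skip])

    denom = sum(prod_except({q}) for q in range(k))
    gamma = np.array([[prod_except({i, j}) / denom for j in range(k)]
                      for i in range(k)])
    d = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                d[i, j] = -np.sqrt(c[i] * c[j]) * gamma[i, j]
            else:
                d[i, i] = c[i] * sum(gamma[l, i] for l in range(k) if l != i)
    return d
```

Working the algebra through shows that this matrix equals $I - u u' / \|u\|^2$ with $u_j = \sqrt{p_j} / \sigma_j(t_j)$, an orthogonal projection onto a $(k-1)$-dimensional subspace, which is consistent with (4.12).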

Proof of Theorem 2.3. First observe that

\[ \sigma_j^2(Q_j(F_1(t_1))) = \frac{F_j(Q_j(F_1(t_1)))}{1 - F_j(Q_j(F_1(t_1)))} = \frac{F_1(t_1)}{1 - F_1(t_1)} = \sigma_1^2(t_1), \]

for $j = 2, \ldots, k$. This implies $D(t_1)$, and hence $P(t_1)$, does not depend on $t_1$. Thus the r.h.s. of (4.14) reduces to

\[ \frac{\sigma_1(s)}{\sigma_1(t)}\, I_k. \]

It follows that the process $Z(t_1)$ has the same distribution as the process

\[ \Bigl( \frac{W_1(\sigma_1^2(t_1))}{\sigma_1(t_1)}, \ldots, \frac{W_k(\sigma_1^2(t_1))}{\sigma_1(t_1)} \Bigr)', \]

where the $W_j$'s are independent standard Wiener processes, and hence by (4.13)

\[ \|DV\|^2 \stackrel{D}{=} \sum_{j=1}^{k-1} \frac{W_j^2(\sigma_1^2(t_1))}{\sigma_1^2(t_1)}. \]

Now the proof of this theorem can be completed along the same lines as that of Theorem 2.2. In this case use continuity of the random variable

\[ \sup_{s \in [\sigma_1^2(\tau_1),\, \sigma_1^2(\tau_2)]} \sum_{j=1}^{k-1} \frac{W_j^2(s)}{s}, \]

which follows from a property of Gaussian measures on Banach spaces, namely that the measure of a closed ball is a continuous function of its radius; apply, e.g., Paulauskas and Rackauskas (1989, Ch. 4, Theorem 1.2) to the Gaussian measure induced by the process $s^{-1/2}(W_1(s), \ldots, W_{k-1}(s))$ on the Banach space of $\mathbb{R}^{k-1}$-valued continuous functions on $[\sigma_1^2(\tau_1), \sigma_1^2(\tau_2)]$ endowed with the supremum norm.

Proof of Theorem 2.4. This theorem can be proven along the lines of the previous two. We only note that now the r.h.s. of (4.14), with $s = t = \tau_1$, reduces to the identity matrix $I_k$. Thus $Z(\tau_1)$ is a $k$-vector of independent standard normal random variables. Hence from (4.13) we find that $\|DV\|^2$, evaluated at $\tau_1$, has a $\chi^2_{k-1}$ distribution.

Proof of Theorem 2.5. In order not to overdo the notation we restrict ourselves to proving this theorem for $k = 3$; for $k \ne 3$ the proof is essentially the same. W.l.o.g. we take $j = 3$. Because the denominator of the likelihood ratio does not depend on $t = (t_1, \ldots, t_k)$, we only consider the expression

\[ -2 \log \prod_{j=1}^{3} \prod_{i=1}^{N_j} h_{ji} (1 - h_{ji})^{r_{ji} - 1}, \]

with the $h_{ji} \in (0, 1)$ defined by $h_{ji} = (\tilde F_j(T_{ji}) - \tilde F_j(T_{j,i-1})) / (1 - \tilde F_j(T_{j,i-1}))$, cf. Li (1995, (1.3)). Setting $z_{ji} = \log(1 - h_{ji})$, this becomes

\[ -2 \log \prod_{j=1}^{3} \prod_{i=1}^{N_j} (1 - e^{z_{ji}})\, e^{z_{ji}(r_{ji} - 1)} = -2 \sum_{j=1}^{3} \sum_{i=1}^{N_j} \bigl\{ z_{ji}(r_{ji} - 1) + \log(1 - e^{z_{ji}}) \bigr\} =: g(z), \]

with $z = (z_{11}, \ldots, z_{1N_1}, z_{21}, \ldots, z_{2N_2}, z_{31}, \ldots, z_{3N_3})$. Observe that $g$ is a convex function.
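The convexity of $g$ can be verified termwise; the following short computation is added for completeness. Each summand is linear in $z_{ji}$ plus the term $\log(1 - e^{z_{ji}})$, and for $z < 0$

```latex
\[
\frac{d}{dz} \log(1 - e^{z}) = -\frac{e^{z}}{1 - e^{z}},
\qquad
\frac{d^2}{dz^2} \log(1 - e^{z}) = -\frac{e^{z}}{(1 - e^{z})^2} < 0 .
\]
```

So each $\log(1 - e^{z_{ji}})$ is concave on $(-\infty, 0)$, and multiplying the sum by $-2$ gives a function that is convex in $z$ (a sum of convex functions of separate coordinates).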

Now $g(z)$, $z \in (-\infty, 0)^{N_1 + N_2 + N_3}$, has to be minimized under the constraints

\[ \sum_{i : T_{1i} \le t_1} z_{1i} = \sum_{i : T_{2i} \le t_2} z_{2i} \quad \text{and} \quad \sum_{i : T_{1i} \le t_1} z_{1i} = \sum_{i : T_{3i} \le t_3} z_{3i}. \qquad (4.15) \]

Solutions of (4.15) for $t_3 = t_3^{(l)}$ that minimize $g(z)$ are denoted by $z^{(l)}$, $l = 1, 2$, respectively. For $t_3 \in [t_3^{(1)}, t_3^{(2)}]$, define the function

\[ f(x) = \sum_{i : T_{1i} \le t_1} \bigl( x z_{1i}^{(1)} + (1 - x) z_{1i}^{(2)} \bigr) - \sum_{i : T_{3i} \le t_3} \bigl( x z_{3i}^{(1)} + (1 - x) z_{3i}^{(2)} \bigr) \]

for $0 \le x \le 1$. Since $t_3 \le t_3^{(2)}$, we easily see that $f(0) \le 0$. Similarly, using $t_3 \ge t_3^{(1)}$, we obtain $f(1) \ge 0$. Thus there exists an $x_0 \in [0, 1]$ such that $f(x_0) = 0$. Define

\[ z^{(0)} = \bigl( x_0 z_{11}^{(1)} + (1 - x_0) z_{11}^{(2)}, \ldots, x_0 z_{3N_3}^{(1)} + (1 - x_0) z_{3N_3}^{(2)} \bigr). \]

Then trivially the two equations in (4.15) are satisfied for $z = z^{(0)}$ and the given $t_3$. Also because $g$ is convex

\[ g(z^{(0)}) \le x_0\, g(z^{(1)}) + (1 - x_0)\, g(z^{(2)}). \]

This implies, since $-2 \log R(t^{(l)}) < \hat C_\alpha$, $l = 1, 2$, that $-2 \log R(t) < \hat C_\alpha$, i.e. $t \in \mathcal{T}$.

The proof of Theorem 2.6 is similar to, but easier than, the previous proof. Moreover it is a straightforward extension of the proof of Theorem 1 in Li et al. (1996). Therefore we will omit the proof here. We conclude by proving the four lemmas that we used earlier. Lemma 4.1. The system of equations (2.3), with unknowns ; : : : ; k , has a unique solution for all k 2 provided Dj < 0 for j = 1; : : : ; k. 2

Proof. Define $f_j : (D_j, \infty) \to (0, 1)$ by

$$ f_j(\lambda) = \prod_{i:\,T_{ji} \le t_j} \Bigl(1 - \frac{1}{r_{ji} + \lambda}\Bigr) \tag{4.16} $$

for $j = 1, \ldots, k$. We need to show that the system of equations

$$ f_1\Bigl(-\sum_{j=2}^{k} \lambda_j\Bigr) = f_j(\lambda_j), \qquad j = 2, \ldots, k, \tag{4.17} $$

has a unique solution. Note that $f_j$ is continuous, strictly increasing, and vanishes as $\lambda_j \downarrow D_j$. It then follows that there is a unique solution to (4.17) when $k = 2$, because the decreasing function $f_1(-\lambda_2)$ must cross the increasing function $f_2(\lambda_2)$ at exactly one value of $\lambda_2 \in (D_2, -D_1)$.

Now consider $k \ge 3$. For each fixed $\lambda_2 > D_2$ and $j = 3, \ldots, k$, there exists a unique $\lambda_j = \lambda_j(\lambda_2)$ such that $f_2(\lambda_2) = f_j(\lambda_j)$. Each of these $\lambda_j$'s is strictly increasing as a function of $\lambda_2$ because $f_2$ and $f_j$ are strictly increasing. Now consider the equation

$$ f_1\Bigl(-\lambda_2 - \sum_{j=3}^{k} \lambda_j(\lambda_2)\Bigr) = f_2(\lambda_2). \tag{4.18} $$

The l.h.s. of (4.18) is defined whenever $D_2 < \lambda_2 < \bar D_2$, where $\bar D_2$ is the unique solution to

$$ -\bar D_2 - \sum_{j=3}^{k} \lambda_j(\bar D_2) = D_1. $$

Note that $D_2 < \bar D_2$ because

$$ -D_2 - \sum_{j=3}^{k} \lambda_j(D_2) = -\sum_{j=2}^{k} D_j > 0 > D_1. $$

Moreover, as a function of $\lambda_2 \in (D_2, \bar D_2)$, the l.h.s. of (4.18) is strictly decreasing and vanishes as $\lambda_2 \uparrow \bar D_2$; the r.h.s. is strictly increasing and vanishes as $\lambda_2 \downarrow D_2$. Thus (4.18) holds for some unique $\lambda_2 = \lambda_2^* \in (D_2, \bar D_2)$. Now set $\lambda_j = \lambda_j(\lambda_2^*)$ for $j = 3, \ldots, k$. It is then clear that $(\lambda_2, \ldots, \lambda_k)$ is the unique solution to (4.17).

Lemma 4.2. Suppose $n_j/n \to p_j > 0$ for $j = 1, \ldots, k$. Set $t_j = Q_j(F_1(t_1))$ for $j = 2, \ldots, k$ and $t = (t_1, \ldots, t_k)$. Then

$$ \lambda_j = \lambda_j(t_1) = O_P(n^{1/2}) \quad \text{uniformly over } [\tau_1, \tau_2]. $$

Proof. Write the value of each side of (2.3) as $1 - p$ when $t$ has the above form. By Li (1995, p. 101), if $\lambda_j < 0$ then

$$ -\log(1 - p) \ge \hat A_j(t_j)\, \frac{n_j}{n_j + \lambda_j}, $$

where $\hat A_j$ is the Nelson–Aalen estimator of $A_j$, and if $\lambda_j \ge 0$ then the above inequality reverses. Thus for any pair $j$, $l$ with $\lambda_j < 0$, $\lambda_l \ge 0$ (such pairs always exist, if not all the $\lambda_j$'s are 0, since $\sum_{j=1}^{k} \lambda_j = 0$) we have

$$ \hat A_j(t_j)\, \frac{n_j}{n_j + \lambda_j} \le \hat A_l(t_l)\, \frac{n_l}{n_l + \lambda_l}, $$

and hence

$$ \lambda_l n_j \hat A_j(t_j) - \lambda_j n_l \hat A_l(t_l) \le \bigl(\hat A_l(t_l) - \hat A_j(t_j)\bigr) n_l n_j. $$

Note that $A_j(t_j) = A_1(t_1)$ and $A_1(t_1)$ is bounded away from 0 if $t_1 \ge \tau_1$. Thus by the uniform convergence of the Nelson–Aalen estimators $\hat A_j$, we have that for any $\varepsilon > 0$ and $n$ sufficiently large, $\hat A_j(t_j) \ge \tfrac{1}{2} A_1(t_1)$ for all $t_1 \in [\tau_1, \tau_2]$ with probability at least $1 - \varepsilon$, and similarly for $\hat A_l$. It then follows that

$$ 0 \le \tfrac{1}{2} (\lambda_l n_j - \lambda_j n_l) A_1(t_1) \le \bigl(\hat A_l(t_l) - \hat A_j(t_j)\bigr) n_l n_j, $$

with probability $1 - \varepsilon$, for $n$ sufficiently large. Finally, using the fact that $\hat A_j(t_j) = A_1(t_1) + O_P(n^{-1/2})$ uniformly over $[\tau_1, \tau_2]$, we find that $\lambda_j = O_P(n^{1/2})$ for all $j = 1, \ldots, k$, uniformly for $t_1 \in [\tau_1, \tau_2]$.

Lemma 4.3. The system of equations (4.10) has solution

$$ \tilde\lambda_j = \gamma_0 \sum_{i \ne j} (a_i - a_j)\, \gamma_{ij}, $$

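where

$$ \gamma_{ij} = \prod_{l \ne i,\, l \ne j} b_l \quad \text{and} \quad \gamma_0 = \Bigl(\sum_{i=1}^{k} \prod_{l \ne i} b_l\Bigr)^{-1}. $$

The solution is unique when all the $b_l$'s are positive.

Proof. The coefficient of $a_1$ in $\sum_{j=1}^{k} \tilde\lambda_j$ is

$$ -\gamma_0 \sum_{i \ne 1} \gamma_{i1} + \gamma_0 \sum_{j \ne 1} \gamma_{1j} = 0, $$

and similarly for the coefficients of $a_2, \ldots, a_k$. Thus $\sum_{j=1}^{k} \tilde\lambda_j = 0$. The coefficient of $a_1$ in $\tilde\lambda_2 b_2$ is

$$ \gamma_0 b_2 \gamma_{12} = \gamma_0 \prod_{l \ne 1} b_l $$

and the coefficient of $a_1$ in $\tilde\lambda_1 b_1$ is

$$ -\gamma_0 b_1 \sum_{i \ne 1} \gamma_{i1} = -\gamma_0 \sum_{i \ne 1} \prod_{l \ne i} b_l, $$

so the coefficient of $a_1$ in $\tilde\lambda_2 b_2 - \tilde\lambda_1 b_1$ is

$$ \gamma_0 \Bigl(\prod_{l \ne 1} b_l + \sum_{i \ne 1} \prod_{l \ne i} b_l\Bigr) = \gamma_0 \sum_{i=1}^{k} \prod_{l \ne i} b_l = 1. $$

The same argument shows that the coefficient of $a_2$ in $\tilde\lambda_2 b_2 - \tilde\lambda_1 b_1$ is $-1$. The coefficient of $a_q$, with $q \ge 3$, in $\tilde\lambda_2 b_2 - \tilde\lambda_1 b_1$ is

$$ \gamma_0 (b_2 \gamma_{2q} - b_1 \gamma_{1q}) = \gamma_0 \Bigl(b_2 \prod_{l \ne 2,\, l \ne q} b_l - b_1 \prod_{l \ne 1,\, l \ne q} b_l\Bigr) = 0. $$

This shows that $\tilde\lambda_2 b_2 - \tilde\lambda_1 b_1 = a_1 - a_2$, and the same argument shows that all the other equations in (4.10) are satisfied.

Lemma 4.4. The $k \times k$ matrix $D = D(t)$ is idempotent, i.e. $D^2 = D$, and of rank $k - 1$.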
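Lemmas 4.3 and 4.4 are purely algebraic, so both can be sanity-checked numerically. The sketch below is an illustration, not part of the proofs. It assumes that the solution in Lemma 4.3 has the form $\tilde\lambda_j = \gamma_0 \sum_{i \ne j} (a_i - a_j) \prod_{l \ne i, l \ne j} b_l$ with $\gamma_0 = (\sum_i \prod_{l \ne i} b_l)^{-1}$, and that the identities to verify are $\sum_j \tilde\lambda_j = 0$ and $\tilde\lambda_j b_j - \tilde\lambda_i b_i = a_i - a_j$ (the relations established in the proof of Lemma 4.3). For Lemma 4.4 it uses the entries $d_{ij}$ written out in the proof of that lemma. All function names and input values are hypothetical.

```python
import math

def prod_except(x, skip):
    """Product of x[l] over all indices l not in the set `skip`."""
    return math.prod(v for l, v in enumerate(x) if l not in skip)

def lemma43_solution(a, b):
    """Candidate solution of Lemma 4.3 (assumed form, see the lead-in)."""
    k = len(a)
    gamma0 = 1.0 / sum(prod_except(b, {i}) for i in range(k))
    return [gamma0 * sum((a[i] - a[j]) * prod_except(b, {i, j})
                         for i in range(k) if i != j)
            for j in range(k)]

def lemma44_matrix(v):
    """Matrix D with the entries d_ii and d_ij given in the proof of Lemma 4.4."""
    k = len(v)
    total = sum(prod_except(v, {q}) for q in range(k))
    D = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                D[i][j] = sum(prod_except(v, {m}) for m in range(k) if m != i) / total
            else:
                D[i][j] = -math.sqrt(v[i] * v[j]) * prod_except(v, {i, j}) / total
    return D

def matmul(A, B):
    """Plain matrix product of two square lists-of-lists."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    a, b = [0.3, -1.1, 2.0, 0.7], [1.5, 0.4, 2.2, 0.9]  # arbitrary test values
    lam = lemma43_solution(a, b)
    print("sum of lambda-tilde:", sum(lam))
    print("lam_2 b_2 - lam_1 b_1 vs a_1 - a_2:",
          lam[1] * b[1] - lam[0] * b[0], a[0] - a[1])

    D = lemma44_matrix([0.5, 1.2, 2.0, 0.7])  # arbitrary positive v_j
    DD = matmul(D, D)
    err = max(abs(DD[i][j] - D[i][j]) for i in range(4) for j in range(4))
    print("max |D^2 - D|:", err)
    print("trace of D:", sum(D[i][i] for i in range(4)))
```

The rank is checked through the trace, relying on the same fact used in the proof: the rank of an idempotent matrix equals its trace.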

Proof. Setting $v_j = \sigma_j^2(t_j)/p_j$, we have

$$ d_{ii} = \frac{\sum_{j \ne i} \prod_{l \ne j} v_l}{\sum_{q=1}^{k} \prod_{l \ne q} v_l}, \qquad d_{ij} = \frac{-\sqrt{v_i v_j}\, \prod_{l \ne i,\, l \ne j} v_l}{\sum_{q=1}^{k} \prod_{l \ne q} v_l}, \quad i \ne j. $$

Because of the various symmetries it suffices to show that

$$ d_{11} = \sum_{i=1}^{k} d_{1i}^2 \quad \text{and} \quad d_{12} = \sum_{i=1}^{k} d_{1i} d_{2i} \tag{4.19} $$

for the idempotency of $D$. For the first equality we need to show that

$$ \Bigl(\sum_{j=2}^{k} \prod_{l \ne j} v_l\Bigr) \Bigl(\sum_{i=1}^{k} \prod_{l \ne i} v_l\Bigr) = \Bigl(\sum_{j=2}^{k} \prod_{l \ne j} v_l\Bigr)^2 + v_1 \sum_{j=2}^{k} v_j \Bigl(\prod_{l \ne 1,\, l \ne j} v_l\Bigr)^2. $$

Writing $C = \sum_{j=2}^{k} \prod_{l \ne j} v_l$, this reduces to

$$ C \Bigl(C + \prod_{l \ne 1} v_l\Bigr) = C^2 + v_1 \sum_{j=2}^{k} v_j \Bigl(\prod_{l \ne 1,\, l \ne j} v_l\Bigr)^2, $$

or, subtracting $C^2$ on both sides,

$$ \Bigl(\sum_{j=2}^{k} \prod_{l \ne j} v_l\Bigr) \prod_{l \ne 1} v_l = v_1 \sum_{j=2}^{k} v_j \Bigl(\prod_{l \ne 1,\, l \ne j} v_l\Bigr)^2, $$

which is easily seen to be true. For the second equality in (4.19) we have to show that

$$ \Bigl(-\sqrt{v_1 v_2} \prod_{l=3}^{k} v_l\Bigr) \Bigl(\sum_{i=1}^{k} \prod_{l \ne i} v_l\Bigr) = \Bigl(\sum_{j \ne 1} \prod_{l \ne j} v_l\Bigr) \Bigl(-\sqrt{v_1 v_2} \prod_{l=3}^{k} v_l\Bigr) + \Bigl(\sum_{j \ne 2} \prod_{l \ne j} v_l\Bigr) \Bigl(-\sqrt{v_1 v_2} \prod_{l=3}^{k} v_l\Bigr) + \sum_{i=3}^{k} v_i \sqrt{v_1 v_2} \prod_{l \ne 1,\, l \ne i} v_l \prod_{l \ne 2,\, l \ne i} v_l. $$

Dividing both sides by $-\sqrt{v_1 v_2} \prod_{l=3}^{k} v_l$ yields

$$ \sum_{i=1}^{k} \prod_{l \ne i} v_l = \sum_{j \ne 1} \prod_{l \ne j} v_l + \sum_{j \ne 2} \prod_{l \ne j} v_l - \sum_{i=3}^{k} \prod_{l \ne i} v_l, $$

which is obviously true. This establishes the idempotency of $D$. For the second statement in the lemma, note that the rank of an idempotent matrix is equal to its trace. It is easily seen that the trace of $D$ is $k - 1$.

Acknowledgements. We thank the reviewers for many constructive suggestions, and Myles Hollander for providing the JASA time-to-first-review data. John Einmahl thanks the Department of Statistics, Florida State University, for their warm hospitality during the writing of the article.

References

Aly, E.-E. A. A. (1986). Quantile-quantile plots under random censorship. Journal of Statistical Planning and Inference 15, 123–128.
Beirlant, J. and Deheuvels, P. (1990). On the approximation of P-P and Q-Q plot processes by Brownian bridges. Statistics and Probability Letters 9, 241–251.
Deheuvels, P. and Einmahl, J. H. J. (1992). Approximations and two-sample tests based on P-P and Q-Q plots of the Kaplan-Meier estimators of lifetime distributions. Journal of Multivariate Analysis 43, 200–217.
Doksum, K. A. (1974). Empirical probability plots and statistical inference for nonlinear models in the two-sample case. The Annals of Statistics 2, 267–277.
Doksum, K. A. (1977). Some graphical methods in statistics. A review and some extensions. Statistica Neerlandica 31, 53–68.
Doksum, K. A. and Sievers, G. L. (1976). Plotting with confidence: Graphical comparisons of two populations. Biometrika 63, 421–434.
Fisher, N. I. (1983). Graphical methods in nonparametric statistics: A review and annotated bibliography. International Statistical Review 51, 25–58.
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. Wiley, New York.
Hollander, M., McKeague, I. W. and Yang, J. (1997). Likelihood ratio-based confidence bands for survival functions. Journal of the American Statistical Association 92, 215–226.
Li, G. (1995). On nonparametric likelihood ratio estimation of survival probabilities for censored data. Statistics and Probability Letters 25, 95–104.
Li, G., Hollander, M., McKeague, I. W. and Yang, J. (1996). Nonparametric likelihood ratio confidence bands for quantile functions from incomplete survival data. The Annals of Statistics 24, 628–640.
Naik-Nimbalkar, U. V. and Rajarshi, M. B. (1997). Empirical likelihood ratio test for equality of k medians in censored data. Statistics and Probability Letters 34, 267–273.
Nair, V. N. (1978). Graphical Comparisons of Populations in some Non-linear Models. Ph.D. thesis, University of California at Berkeley.
Nair, V. N. (1982). Q-Q plots with confidence bands for comparing several populations. Scandinavian Journal of Statistics 9, 193–200.
Owen, A. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237–249.
Owen, A. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics 18, 90–120.
Paulauskas, V. and Rackauskas, A. (1989). Approximation Theory in the Central Limit Theorem. Exact Results in Banach Spaces. Kluwer, Dordrecht.
Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (1992). Numerical Recipes in C (second edition). Cambridge University Press.
Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.
Switzer, P. (1976). Confidence procedures for two-sample problems. Biometrika 63, 13–25.
Thomas, D. R. and Grunkemeier, G. L. (1975). Confidence interval estimation of survival probabilities for censored data. Journal of the American Statistical Association 70, 865–871.

Department of Mathematics and Computing Science
Eindhoven University of Technology
P.O. Box 513
5600 MB Eindhoven
The Netherlands
E-mail: [email protected]

Department of Statistics
Florida State University
Tallahassee, FL 32306-4330
E-mail: [email protected]