Contention Resolution with Bounded Delay

Contention Resolution with Bounded Delay

Mike Paterson†

Aravind Srinivasan‡

Dept. of Computer Science, University of Warwick, Coventry CV4 7AL, England

Dept. of Info. Systems & Computer Science, National University of Singapore, Singapore 0511, Republic of Singapore

Abstract

When distributed processes contend for a shared resource, we need a good distributed contention-resolution protocol, e.g., for multiple-access channels (ALOHA, Ethernet), PRAM emulation, and optical routing. Under a stochastic model of request generation from n synchronous processes, Raghavan & Upfal have shown a protocol which is stable for some fixed positive request rate; their main result is that for any given resource request, its expected delay (expected time until it is serviced) is O(log n). Assuming further that the initial clock times of the processes are within a known bound B of each other, we present a stable protocol, again for some fixed positive request rate λ1, wherein the expected delay for each request is O(1), independent of n. We derive this by showing an analogous result for an infinite number of processes (which is a model for processes entering and leaving dynamically), assuming that all processes agree on the time; this is the first such result. We also present tail bounds which show that any given resource request is unlikely to remain unserviced for much longer than expected, and we extend our results to various classes of input distributions.

1 Introduction

In scenarios where a set of distributed processes has a single shared resource that can service at most one process per time slot, the main problem is devising a "good" distributed protocol for resolving contention for the resource by the processes. This has traditionally been studied in the context of multiple-access channels (e.g., ALOHA) and for Ethernet protocols, and more recently for PRAM emulation and for routing in optical computers. Assuming a stochastic model of continuous request generation from a set of n synchronous processes (see Section 1.1 for the formal definition), Raghavan & Upfal have very recently shown a protocol which is stable as long as the request rate λ is at most λ0, for some fixed λ0 < 1 [19]; their main result is that the expected delay of each resource request is O(log n).

∗ Supported in part by the ESPRIT Basic Research Action Programme of the EC under contract No. 7141 (project ALCOM II).
† E-mail: [email protected].
‡ E-mail: [email protected]. Most of the work was done while this author was visiting the Department of Computer Science, University of Warwick; part was done while visiting the Max-Planck-Institut für Informatik, Saarbrücken, Germany.

1.1 Model and motivation

In multiple-access channels (MACs), there is one channel (resource) shared by a finite or infinite number of synchronized senders (i.e., each sender's local clock ticks at the same rate as the others' clocks). Time is slotted into units, and in each time unit each sender may receive packets according to some distribution. Each sender which has packets will have to send its packets (one at a time) to the channel, but if more than one sender attempts to transmit in the same time slot, the packets collide and are not received by the channel. Otherwise, if exactly one packet was sent in a slot, it is received by the channel, which sends the corresponding sender an acknowledgement. Thus if a sender did not receive an acknowledgement for a packet sent, it knows that there was a collision, and must try again; it is natural to expect randomized protocols to play a key role in this. This model was initiated by work on ALOHA, a multi-user communication system based on radio-wave communication (Abramson [1]), and a similar situation arises in Ethernet protocols (Metcalfe & Boggs [16]). Much research on MACs was spurred by ALOHA, especially in the information-theory community; see, e.g., the special issue of IEEE Trans. Info. Theory on this topic [12].

Recently such resource-allocation problems have arisen again, in the context of PRAM emulation (running PRAM algorithms on more realistic models of parallel computation) and in message routing in optical computers. These parallel models include optical networks (Anderson & Miller [4], Geréb-Graus & Tsantilas [7], Goldberg, Jerrum, Leighton & Rao [8]), DMM models (Dietzfelbinger & Meyer auf der Heide [6]), and Valiant's S*PRAM model [20]; see MacKenzie, Plaxton & Rajaraman [14] for details. In addition, MACs provide a good model in which to study the abstract problem of distributed contention resolution for a common shared resource. All these definitions can easily be extended to the case of more than one channel (shared resource). Rather than the static case (see below), we will be interested in the dynamic scenario of packets arriving into the system at every time step, according to some distribution.

There are two important parameters for a MAC protocol: the arrival rate λ of packets into the system (the expected number of new arrivals per unit time), and stability. To define stability, suppose W(P) is the random variable measuring the amount of time a packet P spends in the system (before being sent successfully to the channel). Then let the random variable Wave be the limit as i → ∞ of the arithmetic mean of W(P) over the first i packets arriving into the system. Similarly, we may define the random variable Lave as the limit as i → ∞ of the average, over the first i steps, of the number of waiting packets. Finally, we may define Tret to be the time taken for all sender queues to become empty, starting from an arbitrary state of the system (weighted by the probability of being in such a state). Unifying several previous definitions, Håstad, Leighton & Rogoff define a protocol to be stable if and only if all three of E[Wave], E[Lave], and E[Tret] are finite [9].
Actually, Lave = λ·Wave with probability one and, similarly, the throughput rate (the average rate of successful transmissions) equals λ with probability one, for a stable protocol [9].

We must distinguish a few models when defining the problem further. First, we might have a finite or an infinite number of senders. In the former case, there are n senders into which there is a continuous influx of packets; at most one packet arrives per sender in any given time step. The usual assumption is that these arrivals are independent across different time steps and across different senders, and that the expected total arrival per time step is at most λ. The infinite case is a natural extension of this, with a random number of packets arriving with a Poisson distribution of mean λ, independently at each step. Here, each packet may be regarded as a sender in itself.

The next key feature is the type of acknowledgement sent by the channel to the senders. A popular model used in the information-theory literature is that of ternary feedback: at the end of each time slot, each sender receives information on whether zero, one, or more than one packet was sent to the channel at that time step. In this case, stable protocols are known for λ ≤ 0.4878… (Vvedenskaya & Pinsker [21]), and there is no stable protocol for λ ≥ 0.587… (Mikhailov & Tsybakov [17]); but if the stronger feedback of the exact number of packets that tried at the current step is sent to each sender, then there is a stable protocol for all λ < 1 (Pippenger [18]). A weaker feedback model, more realistic for the purposes of PRAM emulation and optical routing, is acknowledgement-based feedback, wherein the only information known to each sender which attempted to send a packet is whether it succeeded or not; idle senders get no information. The acknowledgement-based feedback model is thus a minimal-information model, and we follow [9, 14, 19] in focussing on it henceforth.

The above classifications dealt with the dynamic situation of packet arrivals at every step. Alternatively, we may consider a static scenario where at most h of n senders have a packet each to send to the channel; the problem then is to design a distributed protocol (wherein each sender knows only the value h, and whether it has a packet to send or not) for this. Assuming acknowledgement-based feedback, the work of [14], among other things, improves on previous work to provide near-optimal bounds for various problems relating to the static version; in the optical-routing case, a similar problem is termed h-relation routing, for which the best known bounds are due to [8]. Since the static case is fairly well understood, we focus only on the dynamic case in this work.
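As a concrete illustration of the slotted, acknowledgement-based model with finite senders, here is a minimal simulation sketch. The uniform transmission rule and all numeric parameters are placeholders of ours, not a protocol from this paper; the sketch only shows the feedback structure (success iff exactly one attempt; colliding senders learn only that they failed; idle senders learn nothing).

```python
import random

def simulate(n_senders=50, arrival_rate=0.2, slots=2000, seed=0):
    """Toy slotted channel with acknowledgement-based feedback.

    Each sender keeps a FIFO queue of packets.  A slot succeeds iff
    exactly one sender transmits; only that sender learns anything
    (its acknowledgement).  Colliding senders learn only that they
    failed; idle senders learn nothing -- the minimal-information
    feedback described in the text.
    """
    rng = random.Random(seed)
    queues = [0] * n_senders          # queue length per sender
    arrivals = served = 0
    for _ in range(slots):
        # Bernoulli arrivals: expected total of arrival_rate packets per slot
        for i in range(n_senders):
            if rng.random() < arrival_rate / n_senders:
                queues[i] += 1
                arrivals += 1
        # placeholder contention rule: each busy sender attempts w.p. 1/n
        attempts = [i for i in range(n_senders)
                    if queues[i] > 0 and rng.random() < 1.0 / n_senders]
        if len(attempts) == 1:        # exactly one transmission: success + ack
            queues[attempts[0]] -= 1
            served += 1
    return arrivals, served, sum(queues)

arrivals, served, backlog = simulate()
```

Every arrived packet is either served or still queued, so `arrivals == served + backlog` is a conservation invariant of the simulation.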

1.2 Previous work

We now give an informal description of a common idea that runs through most known protocols for our problem; this is merely a rough sketch, and there are many variants. In the infinite case, consider a newly born packet P. P could try using the channel with some probability p1 at each slot, for some number of time slots t1. If it is successful, it leaves the system; if not, then it could guess that its trial probability p1 was "too high", and hence will next try using the channel with some probability p2 at each slot for the next t2 time slots, where p2 < p1 and t2 > t1. This idea is then repeated with p1 > p2 > p3 > … and t1 < t2 < t3 < …, until P successfully leaves the system.

One way to formalise this is via backoff protocols, which are parametrised by a non-decreasing function f : Z+ → Z+, where Z+ denotes the set of nonnegative integers. In the infinite case, a generic packet P that has made i ≥ 0 unsuccessful attempts at the channel will pick a number ri uniformly at random from {1, 2, …, f(i)}, and will next attempt to use the channel ri time slots from then. If successful in this attempt, P will leave the system; otherwise it will increment i and repeat this process. In the finite case, each sender queues its packets and conducts such a protocol with the packet at the head of its queue; once this packet is successful, the failure count i is reset to 0. If f(i) = (i + 1)^Θ(1) or f(i) = 2^i, then such a protocol is naturally termed a polynomial backoff protocol or a binary exponential backoff protocol, respectively.

For our model of interest, the dynamic setting with acknowledgement-based protocols, only negative results were known in the infinite-senders case. Kelly showed that any backoff protocol with a backoff function f(i) that is o(2^i) (thus, any polynomial backoff protocol in particular) is unstable for all λ > 0 [13], and Aldous extended this to the case of binary exponential backoff, a modification of which is the Ethernet protocol [2]. Also, any stable protocol in the infinite case must have λ < 0.587… [17]. In striking contrast to Kelly's result, the important work of [9] showed, among other things, that in the finite-senders case most polynomial backoff protocols are stable for all λ < 1. However, their proven upper bound on E[Wave] is 2^f(n), where f(n) = n^O(1). For applications such as high-speed communication, where the average delay E[Wave] needs to be kept small, the very recent work of [19] presented a protocol for the finite case which is stable for all λ < λ0 (λ0 ≈ 1/10, using the analysis of [19]), with the key property that E[Wave] = O(log n), after an initial setup time of n^O(1) steps (note the significant reduction in E[Wave]). Moreover, it is shown in [19] that, for each member P of a large set of protocols that includes all known backoff protocols, there exists a threshold λP < 1 such that if λ > λP then E[Wave] = Ω(n) must hold for protocol P.
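The generic backoff rule described above is short to state in code. This sketch is ours and purely illustrative: it assumes an all-collisions history just to exhibit the retry schedule of one packet under binary exponential backoff (a polynomial variant is defined alongside).

```python
import random

def next_attempt_time(now, failures, f, rng):
    """Backoff rule from the text: after i failed attempts, a packet
    waits r_i slots, with r_i uniform in {1, ..., f(i)}."""
    return now + rng.randint(1, f(failures))

def poly_backoff(i):
    return (i + 1) ** 2               # f(i) = (i+1)^2, a polynomial backoff

def binary_exponential(i):
    return 2 ** i                     # f(i) = 2^i

rng = random.Random(1)
t, schedule = 0, []
for fails in range(8):                # pretend every attempt collides
    t = next_attempt_time(t, fails, binary_exponential, rng)
    schedule.append(t)
```

Since f(0) = 1, the first attempt is always one slot after birth; the total span of the first m attempts is at most sum of 2^i for i < m.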

1.3 Our results

In the finite-senders case, we take a further step in the direction of [19], with the view that E[Wave] must be kept low. Recall that the known results use the fact that the senders' clocks all tick at the same rate. Under the additional assumption that the clocks of the n senders differ by at most a known bound of B time steps, we present a protocol that has E[Wave] = O(1) for λ < λ1, independent of n, after a setup time of O(B log B + n log B log n) steps. In this paper, we take λ1 = 1/(2e) (where e is the base of the natural logarithm), though we will show in the journal version how this can be improved to 1/e. Our result and that in [19] have a stability property stronger than that defined in [9], in that, for every packet P, the expected waiting time for P is O(1) (resp., O(log n) in [19]). In our view, this assumption on the clock differences is reasonable since, within a local enough area to be able to share a common resource, clocks would usually agree to within a few minutes. (With the same motivation, Hui & Humblet consider a somewhat different problem [11].) Another way of looking at this result is that, since the expected waiting time for packets is a crucial parameter, yet another payoff is seen for building accurate clocks.

Our result above follows quite easily from our main construction, which is a stable protocol for the infinite case when λ < λ1, assuming that all senders agree on the time. Thus, this additional assumption of accurate clocks allows the first stable acknowledgement-based protocol for the infinite case. The infinite case is of interest too, since it models situations where senders may enter and leave the system, with no reasonable known bound on the number n of competing processes. An interesting point here is that our results are complementary to those of [9]: while the work of [9] shows that (negative) results for the infinite case may have no bearing on the finite case, our results suggest that better intuition and positive results for the finite case may be obtained via the infinite case.

Our protocols are simple. We show an explicit, easily computable collection {S_{i,t} : i, t = 0, 1, 2, …} of finite sets of nonnegative integers S_{i,t}; for all i and t, every element of S_{i,t} is smaller than every element of S_{i+1,t}.
A packet born at time t which has made i (unsuccessful) attempts at the channel so far picks a time r uniformly at random from S_{i,t}, and tries using the channel at time r. If it succeeds, it leaves the system; if it fails, it increments i and repeats this process. We also show good upper tail bounds on W(P) for every packet P: for all a > 0, Pr[W(P) ≥ a] = O(a^{−c1}), where c1 > 1 is a constant. Thus for our protocol, the expected number of packets (and hence the expected total storage size) in the system at any given time is O(1), improving on the Θ(log n) bound of [19]. Finally, we extend our results to various input distributions to show that our protocol is robust against fairly "non-random" distributions with weak tail properties.

Thus we show that the expected waiting times of packets can be reduced to just O(1) if reliable clocks are available. In the infinite-senders case, we ask for all clocks to agree; this gives us the first stable acknowledgement-based protocol. In the case of finite senders, it suffices if there is a known upper bound on the time differences between the clocks.

2 Notation and preliminaries

For any ℓ ∈ Z+, we denote the set {1, 2, …, ℓ} by [ℓ]; logarithms are to base two, unless specified otherwise. In any time interval of a protocol, we shall say that a packet P succeeded in that interval if it reached the channel successfully during that interval. Theorem 1 presents the Chernoff-Hoeffding bounds [5, 10]; see, e.g., Appendix A of [3] for details.

Theorem 1 Let R be a random variable with E[R] = μ ≥ 0 such that either: (a) R is a sum of a finite number of independent random variables X1, X2, … with each Xi taking values in [0, 1], or (b) R is Poisson. Then for any δ ≥ 1,

  Pr[R ≥ μδ] ≤ H(μ, δ) := (e^{δ−1}/δ^δ)^μ.

Fact 1 is easily verified.

Fact 1 If δ > 1 then H(μ, δ) ≤ e^{−μδ/M_δ}, where M_δ is positive and monotone decreasing for δ > 1.
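As a quick numerical sanity check (ours, not part of the paper), the Poisson case (b) of Theorem 1 can be verified directly: the exact Poisson upper tail never exceeds H(μ, δ). The helper below sums the Poisson mass by brute force, truncating the (rapidly decaying) tail after a fixed number of terms.

```python
import math

def H(mu, delta):
    """Chernoff-Hoeffding bound of Theorem 1: (e^(delta-1)/delta^delta)^mu."""
    return (math.exp(delta - 1) / delta ** delta) ** mu

def poisson_upper_tail(mu, x, terms=400):
    """Pr[R >= x] for R Poisson with mean mu, by direct summation
    (truncated after `terms` terms, ample for the moderate mu used here)."""
    total, k = 0.0, int(math.ceil(x))
    for j in range(k, k + terms):
        total += math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))
    return total

# spot-check the bound Pr[R >= mu*delta] <= H(mu, delta)
checks = [(5.0, 2.0), (10.0, 1.5), (2.0, 3.0)]
ok = all(poisson_upper_tail(mu, mu * d) <= H(mu, d) for mu, d in checks)
```

The sample (μ, δ) pairs are arbitrary; any μ ≥ 0, δ ≥ 1 should satisfy the inequality.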

We next recall a very useful tail inequality due to McDiarmid [15]:

Lemma 1 ([15, Lemma 1.2]) Let X1, …, Xn be independent random variables, with Xi taking values in a finite set Ai for each i ∈ [n], and suppose the function f : ∏_i Ai → R (the set of reals) satisfies the following "bounded difference" condition: for each k ∈ [n], there is a constant ck such that |f(x) − f(x′)| ≤ ck whenever the vectors x, x′ differ only in the kth co-ordinate. Then for any given t > 0, both Pr[f(X) − E[f(X)] ≥ t] and Pr[f(X) − E[f(X)] ≤ −t] are at most e^{−2t²/Σ_i c_i²}.

Suppose (at most) s packets are present in a static system, and that we have s time units within which we would like to send out a "large" number of them to the channel, with high probability (w.h.p.). We now give an informal sketch of our ideas. A natural first scheme to try is for each packet independently to attempt using the channel at a randomly chosen time from [s]. Since a packet is successful if and only if no other packet chose the same time slot as it did, the "collision" of packets is a dominant concern; the number of such colliding packets is now studied by Lemma 2.

Lemma 2 Suppose at most s balls are thrown uniformly and independently at random into a set of s bins. Let us say that a ball collides if it is not the only ball in its bin. Then, (i) for any given ball B, Pr[B collides] ≤ 1 − (1 − 1/s)^{s−1} < 1 − 1/e, and (ii) if C denotes the total number of balls that collide then, for any ε > 0,

  Pr[C ≥ s(1 − 1/(e(1 + ε)))] ≤ F(s, ε), where F(s, ε) := e^{−sε²/(2e²(1+ε)²)}.

Proof Part (i) is direct. For part (ii), number the balls arbitrarily as 1, 2, …. Let Xi denote the random choice for ball i, and let C = f(X1, X2, …) be the number of colliding balls. It is easily seen that, for any placement of the balls and for any movement of any single ball (say the ith) from one bin to another, we have ci ≤ 2 in the notation of Lemma 1. Invoking Lemma 1 concludes the proof. □

Lemma 2 suggests an obvious improvement to our first idea if we have many more slots than packets. Suppose we have s packets in a static system and ℓ available time slots t1 < t2 < … < tℓ, with s ≤ ℓ/(e(1 + ε)) for some ε > 0. Let

  ℓi(ε) := (ℓ/(e(1 + ε))) · (1 − 1/(e(1 + ε)))^{i−1}  for i ≥ 1;  (1)

thus, s ≤ ℓ1(ε). The idea is to have each packet try using the channel at some randomly chosen time from {ti : 1 ≤ i ≤ ℓ1(ε)}. The number of remaining packets is at most s(1 − 1/(e(1 + ε))) ≤ ℓ2(ε) w.h.p., by Lemma 2(ii). Each remaining packet attempts to use the channel at a randomly chosen time from {ti : ℓ1(ε) < i ≤ ℓ1(ε) + ℓ2(ε)}; the number of packets then remaining is at most ℓ3(ε) w.h.p. (for s large). The basic "random trial" process of Lemma 2 is thus repeated a sufficiently large number of times. The total number of time slots used is at most Σ_{j=1}^{∞} ℓj(ε) = ℓ, which was guaranteed to be available. In fact, we will also need a version of such a scenario in which some number z of such protocols are run independently, as considered in Definition 1. Although we need a few parameters for this definition, the intuition remains simple.
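Both ingredients of this scheme are easy to check numerically (an illustrative check of ours, not part of the proof): the per-ball collision probability of Lemma 2(i), and the fact that the slot budget Σ_j ℓj(ε) from (1) is a geometric series summing to exactly ℓ.

```python
import math
import random

def collision_count(s, rng):
    """Throw s balls uniformly into s bins; return how many balls collide."""
    bins = [0] * s
    for _ in range(s):
        bins[rng.randrange(s)] += 1
    return sum(c for c in bins if c > 1)

rng = random.Random(42)
s, trials = 1000, 200
avg_frac = sum(collision_count(s, rng) for _ in range(trials)) / (trials * s)
# Lemma 2(i): each ball collides with probability < 1 - 1/e ~ 0.632

# Slot budget: the geometric series l_1(eps) + l_2(eps) + ... sums to l
eps, l = 0.1, 10_000.0
q = 1 - 1 / (math.e * (1 + eps))        # per-round survival factor
budget = sum(l * (1 - q) * q ** (i - 1) for i in range(1, 2001))
```

The empirical collision fraction concentrates near 1 − (1 − 1/s)^{s−1} ≈ 0.632, and the truncated series recovers ℓ up to floating-point error.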

Definition 1 Suppose ℓ, m and z are positive integers, ε > 0, and we are given sets of packets P1, P2, …, Pz and sets of time slots T1, T2, …, Tz such that: (a) Pi ∩ Pj = ∅ and Ti ∩ Tj = ∅ if i ≠ j, and (b) |Ti| = ℓ for all i. Let Ti = {t_{i,1} < t_{i,2} < … < t_{i,ℓ}}. Define ℓ0 = 0, and ℓi = ℓi(ε) as in (1) for i ≥ 1. Then, RT({Pi : i ∈ [z]}, {Ti : i ∈ [z]}, m, z, ε) denotes the performance of z independent protocols E1, E2, …, Ez ("RT" stands for "repeated trials"). Each Ei has m iterations, and its jth iteration is as follows: each packet in Pi that collided in all of the first (j − 1) iterations picks a random time from

  {t_{i,p} : ℓ0 + ℓ1 + … + ℓ_{j−1} < p ≤ ℓ0 + ℓ1 + … + ℓj},

and attempts using the channel then.

Remark. Note that the fact that distinct protocols Ei are independent follows directly from the fact that the sets Ti are pairwise disjoint. Since no inter-packet communication is needed in RT, we define, for convenience,

Definition 2 If P ∈ Pi in a protocol RT({Pi : i ∈ [z]}, {Ti : i ∈ [z]}, m, z, ε), the protocol for packet P is denoted PRT(Ti, m, ε).

The following useful lemma shows that, for any fixed ε > 0, two desirable facts hold for RT provided |Pi| ≤ ℓ1(ε) for each i (where ℓ = |Ti|), if ℓ and the number of iterations m are chosen large enough: (a) the probability of any given packet not succeeding at all can be made at most any given small positive constant, and (b) the probability that any given constant factor of the original number of packets remains can be made exponentially small in ℓ.

Lemma 3 For any given positive α, β and ε, there exist finite positive m(β), ℓ(α, β, ε) and p(α, β, ε) such that for any m ≥ m(β), any ℓ ≥ ℓ(α, β, ε), any z ≥ 1, and ℓi = ℓi(ε) defined as in (1), the following hold if we perform RT({Pi : i ∈ [z]}, {Ti : i ∈ [z]}, m, z, ε), provided |Pi| ≤ ℓ1 for each i.
(i) For any packet P, Pr[P did not succeed] ≤ β.
(ii) Pr[in total at least αℓz packets were unsuccessful] ≤ z·e^{−ℓ·p(α,β,ε)}.

Proof Let P ∈ Pi. Let nj(i) denote the number of unsuccessful elements of Pi before the jth iteration of protocol Ei, in the notation of Definition 1. By assumption, we have n1(i) ≤ ℓ1; iterative application of Lemma 2 shows that

  Pr[n_{m+1}(i) ≥ ℓ_{m+1}] ≤ Σ_{j ∈ [m]} F(ℓj, ε).

It is also easily seen, using Lemma 2, that the probability of P failing throughout is at most

  (1 − 1/e) · Π_{j ∈ [m−1]} (1 − 1/e + F(ℓj, ε)).

These two failure-probability bounds imply that if we pick

  m(β) > (−log β)/(−log(1 − 1/e))

and then choose ℓ large enough, we can ensure part (i), and also make the probability of αℓ elements of Pi remaining unsuccessful as small as e^{−ℓ·p(α,β,ε)}. This also yields part (ii). □
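The iteration count m(β) in the proof is easy to compute; note that the ratio of logarithms is independent of the chosen base. A small illustrative sketch (the sample β values are ours):

```python
import math

def min_iterations(beta):
    """Smallest integer m with m > -log(beta) / -log(1 - 1/e), the
    iteration count used in the proof of Lemma 3 to drive the
    per-packet failure probability below beta."""
    return math.floor(-math.log(beta) / -math.log(1 - 1 / math.e)) + 1

m_for = {beta: min_iterations(beta) for beta in (0.1, 0.01, 0.001)}
```

Since −log(1 − 1/e) ≈ 0.4587 (natural log), each extra factor of 10 in 1/β costs roughly five more iterations.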

3 Tree model for contention resolution: the infinite case

We assume a system of time slots. At every time slot t = 0, 1, 2, …, a random number of packets, with a Poisson distribution of mean λ < 1, is injected into the system; the arrivals are independent for different time slots. We assume that all the packets agree on a common global time; there is no common knowledge (or inter-packet communication) apart from this. At every time slot, each packet in the system decides autonomously, based on the current time, its time of entry into the system, its history of unsuccessful attempts, and the outcomes of its internal coin flips, whether or not to try using the channel. If it attempts and succeeds, the packet leaves the system; otherwise, if it makes a failed attempt, the only information it gains is that it tried at this current time but failed due to a collision. If it chose not to attempt at that time slot, it gets no information on what happened then in the channel.

We present the ideas parameterized by several constants. Later we will choose values for the parameters to maximize the throughput. There will be a trade-off between the maximum throughput and the expected waiting time for a packet; a different choice of parameters could take this into consideration. The constants we have chosen for simplicity guarantee that our protocol is stable for λ < 1/(2e). We can increase this threshold for λ by employing more complicated protocols, which we omit in this version. Henceforth, we assume that λ < 1/(2e) is given and define ε0 by

  ε0 := (1 − 2eλ)/(1 + 2eλ).  (2)

Note that ε0 > 0 by our assumption on λ.

3.1 The tree protocol

Three important constants, b, r and k, shape the protocol, where b is a positive integer and r, k > 1. At any time during its lifetime in the protocol, a packet is regarded as residing at some node of an infinite tree T, which is structured as follows. There are countably infinitely many leaves ordered left-to-right, with a leftmost leaf. Each non-leaf node of T has exactly k children, where

  k > r.  (3)

As usual, we visualise all leaves as being at the same (lowest) level, their parents at the next level, and so on. Note that the notions of left-to-right ordering and leftmost node are well-defined for every level of the tree. T is not actually constructed; it is just for exposition. We associate a finite nonempty set of nonnegative integers Trial(v) with each node v. Define L(v) := min{Trial(v)}, R(v) := max{Trial(v)}, and the capacity cap(v) of v to be |Trial(v)|. The required properties of the Trial sets are the following:

P1. If u and v are any pair of distinct nodes of T, then Trial(u) ∩ Trial(v) = ∅.
P2. If u is either a proper descendant of v, or if u and v are at the same height with u to the left of v, then R(u) < L(v).
P3. The capacity of all nodes at the same height is the same. Let ui be a generic node at height i. Then, cap(u0) = b and cap(ui) = ⌈r·cap(u_{i−1})⌉ for i ≥ 1. (Thus, cap(ui) = b·r^i if r is integral; otherwise, b·r^i ≤ cap(ui) ≤ b·r^i + (r^i − 1)/(r − 1).)

Suppose we have such a construction of the Trial sets. (Note property P1: in particular, the Trial set of a node is not the union of the sets of its children.) Each packet P injected into the system at some time slot t0 will initially enter the leaf node u0(P), where u0(P) is the leftmost leaf such that L(u0(P)) > t0. Then P will move up the tree if necessary to parent nodes of increasing height, in the following way. In general, suppose P enters a node ui(P) at height i, at time ti; we will be guaranteed the invariant "Q: ui(P) is an ancestor of u0(P), and ti < L(ui(P))."
P will then run protocol PRT(Trial(ui(P)), m, ε0), where m is a suitably large integer to be chosen later. If it is successful in this process, P will (of course) leave the system; otherwise it will enter the parent u_{i+1}(P) of ui(P), at the last time slot (element of Trial(ui(P))) at which it tried using the channel and failed while running PRT(Trial(ui(P)), m, ε0). (P knows what this time slot is: it is the mth slot at which it attempted using the channel during this performance of PRT.) Invariant Q is established by an easy induction on i, using property P2.

Note that the set of packets Pv entering any given node v performs protocol RT(Pv, Trial(v), m, 1, ε0), and, if v is any non-leaf node with children u1, u2, …, uk, then the trials at its k children correspond to RT({P_{u1}, …, P_{uk}}, {Trial(u1), …, Trial(uk)}, m, k, ε0), by property P1. Thus, each node receives all the unsuccessful packets from each of its k children; an unsuccessful packet is imagined to enter the parent of a node u immediately after it found itself unsuccessful at u.

The intuition behind the advantages offered by the tree is roughly as follows. Note that in a MAC problem, a solution is easy if the arrival rate is always close to the expectation (e.g., if we always get at most one packet per slot, then the problem is trivial). The problem is that, with probability 1, infinitely often there will be "bulk arrivals" (bursts of a large number of input packets within a short amount of time); this is a key problem that any protocol must confront. The tree helps by ensuring that such bursty arrivals are spread over a few leaves of the tree and are also handled independently, since the corresponding Trial sets are pairwise disjoint. One may expect that, even if several packets enter one child of a node v, most of the other children of v will be "well-behaved" in not getting too many input packets. These "good" children of v are likely to successfully transmit most of their input packets, thus ensuring that, w.h.p., not too many packets enter v. Thus, bursty arrivals are likely to be smoothed out once the corresponding packets enter a node at a suitable height in the tree. In short, our assumption on time-agreement plays a symmetry-breaking role.
Informally, if the proportion of the total time dedicated to nodes at height 0 is 1/s, where s > 1, then the proportion for height i will be approximately (r/k)^i/s. Since the sum of these proportions over all i can be at most 1, we require s ≥ k/(k − r); we will take

  s = k/(k − r).  (4)

More precisely, the Trial sets are constructed as follows; it will be immediate that they satisfy properties P1, P2 and P3. First define

  k = 16, s = 2, and r = 8.  (5)

We remark that though we have fixed these constants, we will use the symbols k, s and r (rather than their numerical values) wherever possible, to retain generality. Also, rather than present the values of our other constants right away, we choose them as we go along, to clarify the reasons for their choice.

For i ≥ 0, let Fi = {j ≥ 0 : j ≡ 2^i (mod 2^{i+1})}; the sets Fi partition the positive integers. Note that Fi is just the set of non-negative integers with zeroes in the i least significant bits (denoted lsbs henceforth) of their binary expansion, and a one in the (i + 1)st lsb. Let vi be a generic node at height i; if it is not the leftmost node in its level, let ui denote the node at height i that is immediately to the left of vi. We will ensure that all elements of Trial(vi) lie in Fi. (For any large enough interval I in Z+, the fraction of I lying in Fi is roughly 1/2^{i+1} = (r/k)^i/s; this is what we meant informally above regarding the proportion of time assigned to different levels of the tree.)

We now define Trial(vi) by induction on i, and from left to right within the same level, as follows. If i = 0: if v0 is the leftmost leaf, we set Trial(v0) to be the cap(v0) smallest elements of F0; else we set Trial(v0) to be the cap(v0) smallest elements of F0 larger than R(u0). If i ≥ 1: let w be the rightmost child of vi. If vi is the leftmost node at height i, we let Trial(vi) be the cap(vi) smallest elements of Fi that are larger than R(w); else we define Trial(vi) to be the cap(vi) smallest elements of Fi that are larger than max{R(ui), R(w)}. In fact, it is easy to show by the same inductive process that, if ui is defined, then R(w) > R(ui); hence for every node vi with i ≥ 1,

  L(vi) ≤ R(w) + 2^{i+1} = R(w) + s(k/r)^i.  (6)
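Since the construction of the Trial sets is fully explicit, it can be exercised directly in code. The sketch below is ours (with an illustrative b = 2; k and r as fixed in (5)): it builds the sets for the bottom three levels of the leftmost portion of T and checks disjointness (property P1) and the spacing bound (6).

```python
import math

b, k, r = 2, 16, 8        # k, r as fixed in (5); b = 2 is an illustrative choice

def cap(i):
    """cap(u_0) = b and cap(u_i) = ceil(r * cap(u_{i-1}))."""
    c = b
    for _ in range(i):
        c = math.ceil(r * c)
    return c

def take_from_F(i, count, lower):
    """The `count` smallest elements of F_i = {j : j == 2^i (mod 2^(i+1))}
    that are strictly larger than `lower`."""
    step, first = 2 ** (i + 1), 2 ** i
    if lower < first:
        start = first
    else:
        start = first + step * ((lower - first) // step + 1)
    return list(range(start, start + step * count, step))

heights = 3               # build Trial sets for heights 0, 1, 2
nodes = {}                # (height, index) -> sorted list Trial(v)
for h in range(heights):
    for idx in range(k ** (heights - 1 - h)):
        lower = -1
        if idx > 0:
            lower = nodes[(h, idx - 1)][-1]            # R(left neighbour u_i)
        if h > 0:
            w = nodes[(h - 1, idx * k + k - 1)]        # rightmost child
            lower = max(lower, w[-1])                  # ... and R(w)
        nodes[(h, idx)] = take_from_F(h, cap(h), lower)

flat = [x for trial in nodes.values() for x in trial]
p1_ok = len(flat) == len(set(flat))                    # property P1
eq6_ok = all(nodes[(h, j)][0] <= nodes[(h - 1, j * k + k - 1)][-1] + 2 ** (h + 1)
             for h in range(1, heights)
             for j in range(k ** (heights - 1 - h)))   # bound (6)
```

For example, with b = 2 the leftmost leaf gets the two smallest elements of F0 (the odd numbers), namely {1, 3}.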

3.2 Waiting times of packets

Our main random variable of interest is the time that a generic packet P will spend in the system, from its arrival. Let

  a = e(1 + ε0)  (7)

and let d be a constant greater than 1.

Definition 3 For any node v ∈ T, the random variable load(v), the load of v, is defined to be the number of packets that enter v; for any positive integer t, a node v at height i is defined to be t-bad if and only if load(v) > b·r^i·d^{t−1}/a. Node v is said to be t-loaded if it is t-bad but not (t + 1)-bad. It is called bad if it is 1-bad, and good otherwise.

It is not hard to verify that, for any given t ≥ 1, the probability of being t-bad is the same for all nodes at the same level of T; this brings us to the next definitions.

Definition 4 For any (generic) node ui at height i in T and any positive integer t, pi(t) denotes the probability that ui is t-bad.

Definition 5 (i) The failure probability q is the maximum probability that a packet entering a good node will not succeed during the functioning of that node. (ii) For any packet P, let u0(P), u1(P), u2(P), … be the nodes of T that P is allowed to pass through, where the height of ui(P) is i. Let Ei(P) be the event that P enters ui(P).

If a node u at height i is good then, in the notation of Lemma 3, its load is at most ℓ1(ε0), where ℓ = cap(u); hence, Lemma 3(i) shows that for any fixed q0 > 0, q < q0 can be achieved by making b and the number of iterations m large enough. Note that the distribution of Ei(P) is independent of its argument P. Hence, for any i ≥ 0, we may define fi := Pr[Ei(P)] for a generic packet P.

Suppose P was unsuccessful at nodes u0(P), u1(P), …, ui(P). Let A(i) denote the maximum total amount of time P could have spent in these (i + 1) nodes. Then, it is not hard to see that A(0) ≤ s·cap(u0) + s·cap(u0) = 2sb, and that for i ≥ 1, A(i) ≤ k·A(i − 1) + (k/r)^i·s·b·r^i, using (6). Hence,

  A(i) ≤ (i + 2)·s·b·k^i for all i.  (8)
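A quick numeric iteration of the recurrence for A(i) (ours, with an illustrative b = 4, and s, k, r as in (5)) confirms the closed form: note that the base case A(0) ≤ 2sb makes the factor i + 2 (rather than i + 1), with equality throughout when the recurrence is taken as tight.

```python
s, b, k, r = 2, 4, 16, 8   # s, k, r as fixed in (5); b = 4 is illustrative

A = [2 * s * b]                                       # A(0) <= 2sb
for i in range(1, 12):
    # A(i) <= k*A(i-1) + (k/r)^i * s*b*r^i  =  k*A(i-1) + s*b*k^i
    A.append(k * A[-1] + (k // r) ** i * s * b * r ** i)

bound_ok = all(A[i] <= (i + 2) * s * b * k ** i for i in range(12))
tight = all(A[i] == (i + 2) * s * b * k ** i for i in range(12))
```

The induction is immediate: k·(i+1)sbk^{i−1} + sbk^i = (i+2)sbk^i.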

The simple, but crucial, Lemma 4 is about the distribution of an important random variable W(P), the time that P spends in the system.

Lemma 4 (i) For any packet $P$, $\Pr[W(P) > A(i)] \le f_{i+1}$ for all $i \ge 0$, and $E[W(P)] \le \sum_{j=0}^{\infty} A(j)f_j$. (ii) For all $i \ge 1$, $f_i \le qf_{i-1} + p_{i-1}(1)$.

Proof Part (i) is immediate, using the fact that for a non-negative integer-valued random variable $Z$, $E[Z] = \sum_{i=1}^{\infty}\Pr[Z \ge i]$. For part (ii), note that

$f_i = f_{i-1}\Pr[E_i \mid E_{i-1}].$   (9)

Letting $c_i = \Pr[u_{i-1}(P) \mbox{ was good} \mid E_{i-1}]$, we have
$\Pr[E_i \mid E_{i-1}] = c_i\Pr[E_i \mid u_{i-1}(P) \mbox{ was good} \wedge E_{i-1}] + (1-c_i)\Pr[E_i \mid u_{i-1}(P) \mbox{ was bad} \wedge E_{i-1}]$
$\le \Pr[E_i \mid u_{i-1}(P) \mbox{ was good} \wedge E_{i-1}] + \Pr[u_{i-1}(P) \mbox{ was bad} \mid E_{i-1}]$
$\le q + \Pr[u_{i-1}(P) \mbox{ was bad}]/\Pr[E_{i-1}].$
By (9), we see that $f_i$ is at most $f_{i-1}q + \Pr[u_{i-1}(P) \mbox{ was bad}]$, i.e., at most $qf_{i-1} + p_{i-1}(1)$. $\Box$
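The way Lemma 4 combines with bound (8) can be illustrated numerically. The sketch below uses assumed toy constants and a stand-in for $p_i(1)$ with the doubly exponential decay that Theorem 2 (below) guarantees; it shows that the series $\sum_j A(j)f_j$ bounding $E[W(P)]$ converges once $q < 1/k$:

```python
import math

# Illustration with assumed toy constants: combine Lemma 4(ii),
#   f_i <= q*f_{i-1} + p_{i-1}(1),
# with bound (8), A(j) <= (j+2)*s*b*k^j, to see that
#   E[W(P)] <= sum_j A(j)*f_j
# converges when q < 1/k and p_i(1) decays doubly exponentially.
s, b, k = 2, 8, 3
q = 1.0 / (k + 1)                      # choice (13): q < 1/k
alpha, beta = 2.0, 1.1                 # assumed stand-ins for Theorem 2

def p_bad(i):                          # stand-in: p_i(1) <= exp(-alpha*beta^i)
    return math.exp(-alpha * beta**i)

f, total = 1.0, 0.0                    # f_0 = 1: every packet enters u_0
for j in range(200):
    total += (j + 2) * s * b * k**j * f
    f = q * f + p_bad(j)               # Lemma 4(ii)
print("upper bound on E[W(P)]:", total)
```

Since $f_j$ eventually behaves like a constant times $q^j$, the $j$th term is of order $(j+2)(kq)^j$ with $kq = 3/4 < 1$, so the partial sums stabilize quickly.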

3.3 The improbability of high nodes being heavily loaded

As is apparent from Lemma 4, our main interest is in getting a good upper bound on $p_i(1)$. However, to do this we will also need some information about $p_i(t)$ for $t \ge 2$, and hence Definition 4. The basic intuition is that if a node is good then, w.h.p., it will successfully schedule "most" of its packets; this is formalized by Lemma 3(ii). In fact, Lemma 3(ii) shows that for any node $u$ in the tree, the good children of $u$ will, w.h.p., pass on a total of "not many" packets to $u$, since the functioning of each of these children is independent of the other children.

To estimate $p_i(t)$, we first handle the easy case $i = 0$. Recall that if $X_1$ and $X_2$ are independent Poisson random variables with means $\mu_1$ and $\mu_2$ respectively, then $X_1 + X_2$ is Poisson with mean $\mu_1 + \mu_2$. Thus, $u_0$ being $t$-bad is a simple large-deviation event for a Poisson random variable with mean $\lambda sb$. If, for every $t \ge 1$, we define $\mu_t := d^{t-1}/(\lambda sa)$ and ensure that $\mu_t > 1$ by guaranteeing

$\lambda sa < 1,$   (10)

then Theorem 1 shows that

$p_0(t) = \Pr[u_0 \mbox{ is } t\mbox{-bad}] \le H(\lambda sb, \mu_t).$   (11)

Our choices for $s$ and $a$ clearly validate (10). We now consider how a generic node $u_i$ at height $i \ge 1$ could have become $t$-bad, for any given $t$. The resulting recurrence yields a proof of an upper bound for $p_i(t)$ by induction on $i$. The two cases, $t \ge 2$ and $t = 1$, are covered by Lemmas 5 and 6 respectively. We now require

$d^2 + k - 1 \le dr.$   (12)

Remark. Lemma 5 can be strengthened, but we present this version for simplicity.

Lemma 5 Suppose $d^2 + k - 1 \le dr$. Then for $i \ge 1$ and $t \ge 2$, if a node $u_i$ at height $i$ in $T$ is $t$-bad, then at least one of the following two conditions holds for $u_i$'s set of children: (i) at least one child is $(t+1)$-bad, or (ii) at least two children are $(t-1)$-bad. Thus,

$p_i(t) \le kp_{i-1}(t+1) + \binom{k}{2}(p_{i-1}(t-1))^2.$

Proof Suppose that $u_i$ is $t$-bad but that neither (i) nor (ii) holds. Then $u_i$ has at most one child $v$ that is either $t$-loaded or $(t-1)$-loaded, and none of the other children of $u_i$ is $(t-1)$-bad. Node $v$ can contribute a load of at most $br^{i-1}d^t/a$ packets to $u_i$;

the other children contribute a total load of at most $(k-1)br^{i-1}d^{t-2}/a$. Thus the children of $u_i$ contribute a total load of at most $br^{i-1}d^{t-2}(d^2 + k - 1)/a$, which, if (12) holds, is at most $br^id^{t-1}/a$, contradicting the fact that $u_i$ is $t$-bad. $\Box$

In the case $t = 1$, a key role is played by the intuition that the good children of $u_i$ can be expected to transmit much of their load successfully. We fix $q$ and $m$, and place a lower bound on our choice of $b$. Note, from (12), that $r > d$. Define $\epsilon_1, \epsilon_2 > 0$ by

$\epsilon_1 = \frac{r - d}{a(k-1)} \quad \mbox{and} \quad \epsilon_2 = \frac{r}{ak}.$

For $q$, we just need to ensure $q < 1/k$; say,

$q = 1/(k+1).$

(13)

In the notation of Lemma 3, we define

$m = m(q),$   (14)

and require

$b \ge \max\{\ell(q, \lambda_0, \epsilon_1),\ \ell(q, \lambda_0, \epsilon_2)\}.$   (15)

Lemma 6 For any $i \ge 1$,

$p_i(1) \le kp_{i-1}(2) + \binom{k}{2}(p_{i-1}(1))^2 + k(k-1)p_{i-1}(1)e^{-br^{i-1}p(q,\lambda_0,\epsilon_1)} + ke^{-br^{i-1}p(q,\lambda_0,\epsilon_2)}.$

Proof Suppose that $u_i$ is $1$-bad. There are two possibilities covered by the first two terms: at least one child of $u_i$ is $2$-bad, or at least two children are $1$-bad. If neither of these conditions holds, then either (A) $u_i$ has exactly one child which is $1$-loaded, with no other child being bad, or (B) all children are good. In case (A), the $k-1$ good children contribute a total of at least $\mathrm{cap}(u_i)/a - \mathrm{cap}(u_{i-1})d/a = br^{i-1}(r-d)/a$ packets to $u_i$. In the notation of Lemma 3, $z = k-1$, $\ell = br^{i-1}$ and $\epsilon = \epsilon_1$. Since there are $k$ choices for the $1$-loaded child, Lemma 3(ii) shows that the probability of occurrence of case (A) is at most $k(k-1)p_{i-1}(1)e^{-br^{i-1}p(q,\lambda_0,\epsilon_1)}$. In case (B), the $k$ good children contribute at least $\mathrm{cap}(u_i)/a = br^i/a$ packets. By a similar argument, the probability of occurrence of case (B) is at most $ke^{-br^{i-1}p(q,\lambda_0,\epsilon_2)}$.

The inequality in the lemma follows. $\Box$

Next is a key theorem that proves an upper bound for $p_i(t)$, by induction on $i$. We assume that our constants satisfy conditions (3), (4), (7), (10), (12), (13), (14), (15).

Theorem 2 For any fixed $\lambda < 1/(2e)$, there is a sufficiently large value of $b$ such that the following holds. There are positive constants $\alpha$, $\beta$ and $\gamma$, with $\beta, \gamma > 1$, such that

$\forall i \ge 0\ \forall t \ge 1,\quad p_i(t) \le e^{-\alpha\beta^i\gamma^{t-1}}.$

Before proceeding to the proof of Theorem 2, let us see why it shows the required property, that $E[W(P)]$, the expected waiting time of a generic packet $P$ in the system, is finite. Theorem 2 shows that for large $i$, $p_{i-1}(1)$ is negligible compared to $q^i$ and hence, by Lemma 4(ii), $f_i = q^i(1 + o(1))$, where the $o(1)$ term goes to zero as $i$ tends to infinity. Hence, Lemma 4(i) combined with the bound (8) shows that if we ensure $q < 1/k$ then $E[W(P)]$ is finite (and good upper tail bounds can be proven for the distribution of $W(P)$). Thus (13) guarantees the finiteness of $E[W(P)]$.

Proof of Theorem 2 This is by induction on $i$. If $i = 0$, we use inequality (11) and require that

$H(\lambda sb, \mu_t) \le e^{-\alpha\gamma^{t-1}}.$

(16)

From (10), we see that $\mu_t > 1$; thus by Fact 1, there is some $M > 0$ such that $H(\lambda sb, \mu_t) \le e^{-\mu_t\lambda sb/M}$. Therefore, to satisfy inequality (16) it suffices to ensure that $d^{t-1}b/(aM) \ge \alpha\gamma^{t-1}$. We will do this by choosing our constants so as to satisfy

$d \ge \gamma \quad \mbox{and} \quad b \ge \alpha aM.$

(17)

We will choose $\beta$ and $\gamma$ to be fairly close to (but larger than) 1, and so the first inequality will be satisfied. Although $\alpha$ will have to be quite large, we will be free to choose $b$ sufficiently large to satisfy the second inequality. We proceed to the induction for $i \ge 1$. We first handle the case $t \ge 2$, and then the case $t = 1$.

Case I: $t \ge 2$. By Lemma 5, it suffices to show that

$ke^{-\alpha\beta^{i-1}\gamma^t} + \binom{k}{2}e^{-2\alpha\beta^{i-1}\gamma^{t-2}} \le e^{-\alpha\beta^i\gamma^{t-1}}.$

It is easy to verify that this holds for some sufficiently large $\alpha$, provided

$\gamma > \beta \quad \mbox{and} \quad 2 > \beta\gamma.$

(18)

We can pick $\beta = 1 + \delta$ and $\gamma = 1 + 2\delta$ for some small positive $\delta$, $\delta < 1$, to satisfy (18).

Case II: $t = 1$. The first term in the inequality for $p_i(1)$ given by Lemma 6 is the same as for Case I with $t = 1$; thus it can be assumed to be much smaller than $e^{-\alpha\beta^i}$, by an appropriate choice of constants, as seen above. Similarly, the second term in the inequality for $p_i(1)$ can be handled by assuming that $\beta < 2$ and that $\alpha$ is large enough, which again has been handled above. The final two terms given by Lemma 6 sum to

$k(k-1)p_{i-1}(1)e^{-br^{i-1}p(q,\lambda_0,\epsilon_1)} + ke^{-br^{i-1}p(q,\lambda_0,\epsilon_2)}.$   (19)

We wish to make each summand in (19) at most, say, $e^{-\alpha\beta^i}/4$; we just need to ensure that

$br^{i-1}p(q,\lambda_0,\epsilon_1) \ge \alpha\beta^i + \ln(4k^2)$   (20)

and

$br^{i-1}p(q,\lambda_0,\epsilon_2) \ge \alpha\beta^i + \ln(4k).$   (21)

Since $r > \beta$, both of these are true for sufficiently large $i$. To satisfy these inequalities for small $i$, we choose $b$ sufficiently large to satisfy (15), (17), (20) and (21), completing the proof of Theorem 2. $\Box$

Finally, we can choose $d = 4$. It is now easily verified that conditions (3), (4), (10), (12), (17), (18) are all satisfied. Thus, we have presented stable protocols for $\lambda < 1/(2e)$.
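As a numerical sanity check on the induction, the recurrences of Lemmas 5 and 6 can be iterated directly and compared against the doubly exponential bound of Theorem 2. All constants below are illustrative assumptions chosen to satisfy $\gamma > \beta$, $\beta\gamma < 2$ and $r > \beta$ (not the values forced by (10)-(21)); the exponents $p(q,\lambda_0,\epsilon_1)$ and $p(q,\lambda_0,\epsilon_2)$ of Lemma 3 are modelled by a single assumed constant `p0`:

```python
from math import comb, exp

# Iterate the recurrences of Lemmas 5 and 6 numerically and check the
# doubly exponential decay claimed by Theorem 2 at t = 1.
# All constants are illustrative assumptions; p0 stands in for the
# (unspecified here) quantities p(q, lambda0, eps) of Lemma 3.
k, b, r, p0 = 3, 800, 5.0, 0.05
alpha, beta, gamma = 30.0, 1.1, 1.2    # beta, gamma > 1; gamma > beta; beta*gamma < 2

maxt = 40
p = [exp(-alpha * gamma**j) for j in range(maxt)]   # p[j]: bound on p_0(j+1)
for i in range(1, 25):
    e_i = exp(-b * r**(i - 1) * p0)    # stand-in for the Lemma 6 exponential terms
    new = [0.0] * maxt
    # Lemma 6 (t = 1):
    new[0] = k * p[1] + comb(k, 2) * p[0]**2 + k * (k - 1) * p[0] * e_i + k * e_i
    # Lemma 5 (t >= 2); the top index is truncated:
    for j in range(1, maxt - 1):
        new[j] = k * p[j + 1] + comb(k, 2) * p[j - 1]**2
    new[maxt - 1] = comb(k, 2) * p[maxt - 2]**2
    p = new
    assert p[0] <= exp(-alpha * beta**i)            # Theorem 2 at t = 1
print("p_i(1) <= exp(-alpha * beta^i) verified for i = 1..24")
```

The dominant contribution at $t = 1$ is the chain $p_i(1) \approx k\,p_{i-1}(2) \approx k^i e^{-\alpha\gamma^i}$, which sits well below $e^{-\alpha\beta^i}$ precisely because $\gamma > \beta$.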

Theorem 3 In a MAC problem with infinitely many senders, suppose the senders' clocks all agree on the time. Then for any fixed $\lambda < 1/(2e)$, our protocol guarantees an expected waiting time of $O(1)$ for every packet.

4 The finite case

This model is the one studied in [9, 19]. There are $n$ senders, with a packet arriving with probability $\lambda_i$ at sender $i$ at every time step, independently of the other senders; arrivals at different time steps are independent of each other. We assume $\sum_{i=1}^n \lambda_i \le \lambda < 1/(2e)$, as in the infinite case. The further assumption we make is that, in addition to synchrony, there is a known bound $B$ on the time difference between any pair of sender clocks, i.e., that only the last $W$ bits of the time will have to be agreed upon by the senders, for some known $W$. Note that once we have this agreement, we can simply run our "infinite senders" protocol; so we focus on this clock-agreement problem now. One obvious solution to this is for the senders to communicate with each other to agree on the time. Though this is potentially expensive, this one-shot

cost might well be balanced by the resulting savings in the storage requirements and in the waiting times of all packets from then on. Suppose, though, that such inter-process communication is prohibitively expensive, and the only means of communication is the shared channel. We show how to use it to agree on the time within $O(W2^W + nW\log n)$ steps, w.h.p. (We have not attempted to optimize the running time of this protocol.) To this end, the senders will send fake "packets" to the channel; this should not be confused with our actual MAC protocol, to be run later on. The clock-agreement protocol asks all senders to "switch on" when their local clocks show some particular agreed-upon time. Let $\ell = a_0\log n$ for some suitable constant $a_0$. Each sender $s$ will independently attempt to use the channel with probability $1/n$ at each step, until it succeeds. If $s$ does not succeed within $2^W + \ell$ steps, it will stop attempting to use the channel; if it does succeed, it will then continuously attempt to use the channel for the next $2\cdot 2^W + \ell$ steps. Since any pair of senders switch on within $2^W$ steps of each other, it is clear that at most one sender (the leader) is successful. No leader will be elected only if, for the $\ell$ successive steps beginning $2^W$ steps after the first sender switched on, either no sender or at least two senders tried to use the channel. The probability of this happening is very small, $e^{-\Omega(\ell)}$. Thus we may assume that a leader $s_0$ was elected. Starting at time step $3\cdot 2^W + \ell + 1$ after it switched on, $s_0$ will attempt to make all other senders agree with its local time, in phases $P_1, P_2, \ldots, P_W$. A generic sender that is not $s_0$ will be denoted by $s$; the sender $s$ will, starting at time step $3\cdot 2^W + \ell + 1$ after it switched on, try to agree with $s_0$'s clock. Phase $P_i$ lasts for $h = 3\cdot 2^W + cn\log(nW)$ steps for a suitable constant $c$; thus, two different senders might differ by at most one in the index of the phase that they think they are in.
After phase $P_i$, all senders will agree with $s_0$ on the $i$ least-significant bits (lsbs) of the time, w.h.p. Assuming that $P_1, P_2, \ldots, P_{i-1}$ have finished, we describe $P_i$. Let $T_i$ denote the set of time steps at which the clock of $s_0$ shows a one in the $i$th lsb. In $P_i$, $s_0$ attempts to use the channel exactly at those time steps that lie in $T_i$. Any other sender $s$, on the other hand, attempts to use the channel independently with probability $1/(3n)$ at each time slot, and infers the $i$th lsb by taking the majority result over the time steps (in its version of $P_i$) at which it tried to use the channel and collided. A quick analysis of the correctness of this follows. The details will be given

in the final version. During the period that was $P_i$ according to $s_0$, $s$ would have tried to use the channel $\Omega((h - 2^W)/n)$ times, w.h.p. Since the measure of $T_i$ during this period is roughly $1/2$, and since the expected number of non-leaders that can collide with $s$ at any time step is roughly $1/3$, the majority result chosen by $s$ will be correct, w.h.p. Similarly, the fact that $s$ might have thought that some portions of this period belonged to $P_{i-1}$ (or $P_{i+1}$) has negligible effect, since $h \gg 2^W$. This protocol takes $O(W2^W + nW\log(nW)) = O(W2^W + nW\log n)$ steps, and hence we get the following result.
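The leader-election stage above is easy to sketch as a toy Monte Carlo simulation. Here $n$, $\ell$ and the number of trials are assumed values, and the channel feedback (a step with exactly one attempt succeeds) is idealized:

```python
import random

# Toy simulation of the leader-election stage: each of n senders
# attempts to use the channel independently with probability 1/n per
# step; a step in which exactly one sender attempts elects that sender
# as the leader. With ell = Theta(log n) steps available, the failure
# probability is e^{-Omega(ell)}, as claimed in the text.
def elect_leader(n, steps, rng):
    for _ in range(steps):
        attempts = [s for s in range(n) if rng.random() < 1.0 / n]
        if len(attempts) == 1:          # a lone attempt succeeds
            return attempts[0]
    return None                         # no leader elected in time

rng = random.Random(0)
n, ell = 50, 60                         # ell ~ a0 * ln(n) for an assumed a0
failures = sum(elect_leader(n, ell, rng) is None for _ in range(200))
print("trials without a leader:", failures, "out of 200")
```

Each step elects a leader with probability $n\cdot(1/n)(1-1/n)^{n-1} \approx 1/e$, so over $\ell = 60$ steps a failure is already astronomically unlikely.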

Theorem 4 In a MAC problem with $n$ senders, suppose the senders' clocks differ by at most a known number $B$ of steps. For any fixed $\lambda < 1/(2e)$, our protocol, after a setup time of $O((B + n\log n)\log B)$ steps, achieves w.h.p. an expected waiting time of $O(1)$ for every packet.

5 The effect of the input distribution

Suppose that the distribution of incoming packets to the system has substantially weaker randomness properties than the independent Poisson distribution (or independent binomial, in the finite case); our protocol will still ensure that the expected waiting time for every packet is $O(1)$. The motivation for studying this is two-fold. First, our contention-resolution protocol might be a module in a larger system, with the previous module feeding packets with some possibly very "non-random" distribution. For instance, one of the results of [14] is that for PRAM emulation, memory locations can be hashed in an $\ell$-wise independent fashion, for some suitably large fixed $\ell$, rather than in a completely random fashion, to avoid having to store huge hash tables. (Recall that a sequence of random variables $X_1, X_2, \ldots, X_m$ is $\ell$-wise independent if every $\ell$ of them are mutually independent; it is well known that such sequences can be sampled using many fewer random bits than their completely independent counterparts. We will encounter these again below.) We might be able to guess the packet distribution of their PRAM emulation, for any given PRAM algorithm. The second reason is to show that the very good large-deviation properties of "well-behaved" distributions like the independent Poisson/binomial are not crucial for our protocol to achieve $E[W_{\mathrm{ave}}] = O(1)$. In particular, for an $\ell$-wise independent distribution to be sketched below, direct use of the protocol and analysis of [19] for the finite case will mandate $E[W_{\mathrm{ave}}] = n^{\Omega(1)}$, rather than their $O(\log n)$ bound that holds for independent binomial arrivals. (Of course, it is conceivable that a modification of their protocol might do better.) Due

to the lack of space, we just sketch the result. From the paragraph immediately following the statement of Theorem 2, we see that $p_i(1) = O(q^i)$ will suffice to maintain the property that $E[W_{\mathrm{ave}}] = O(1)$; the strong (doubly exponential) decay of $p_i(1)$ as $i$ increases is unnecessary. In turn, by analyzing the recurrences presented by Lemmas 5 and 6, we can show that, rather than the strong bound of (16), it suffices if

$H(\lambda sb, \mu_t) \le \epsilon\sigma^{-t}$   (22)

for some constant $\sigma$ large enough in comparison with $k$, and for a sufficiently small constant $\epsilon > 0$. We can then proceed by induction on $i$ to show that $p_i(1) = O(q^i)$ (by showing that $p_i(t) = O((k+1)^{-i-t-1})$), which is all we need. Bound (22) can connote a very weak tail behaviour. In particular, in the finite-senders case such a bound holds if packets arrive independently at different time steps, even if, within each time step, the (at most $n$) incoming packets have an $\ell$-wise independent distribution, for some large enough constant $\ell$. It is for this scenario that direct use of the protocol and analysis of [19] will mandate $E[W_{\mathrm{ave}}] = n^{\Omega(1)}$. In fact, this requirement can be weakened further, to non-independent arrivals at each time slot. For any finite sequence of distinct time slots $t_1 < t_2 < \cdots < t_m$, consider the probability that the total arrival in each of these time slots is more than some given value beyond expectation. We would only need that this probability is at most some constant times the corresponding probability had the arrivals been independent at these time slots, with the weak tail distribution of (22). Such situations occur commonly in "negatively correlated" cases. For instance, suppose a total of at most $N$ packets, for some large $N$, can arrive into the system, each arriving independently at a time chosen uniformly at random from $[N]$. Note that the arrivals at different time steps are not independent, but they do satisfy the above negative-correlation property.
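The uniform-arrival example in the last paragraph is easy to simulate. The following sketch (with an assumed value of $N$) shows that the per-slot totals sum to exactly $N$, so they are dependent, while individual slot loads nonetheless stay small, as in the independent case:

```python
import random
from collections import Counter

# Illustration (assumed toy setup) of the "negatively correlated"
# arrival model: N packets arrive, each at a time chosen uniformly at
# random from [N]. The per-slot totals are not independent (they sum to
# exactly N), but their tails behave no worse than those of independent
# Binomial(N, 1/N) per-slot arrivals.
rng = random.Random(1)
N = 10_000
counts = Counter(rng.randrange(N) for _ in range(N))   # slot -> load

total = sum(counts.values())
print("total arrivals:", total)                        # always exactly N
print("max slot load:", max(counts.values()))          # small, w.h.p.
```

Intuitively, a heavy slot leaves fewer packets for the others, which is exactly the negative correlation the sketch above exploits.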

6 Conclusions and open questions

We have shown that an assumption of exact or approximate time agreement can help MAC protocols significantly. Several open questions remain: (i) Can we get good delay versus arrival-rate tradeoffs in our model? Is there a good fine-tuning of our protocol or constants which ensures short delays for "small" values of $\lambda$? (ii) In our model of time agreement, is there a stable protocol for all $\lambda < 1$? If not, then what is the supremum of the allowable values for $\lambda$, and how can

we design a stable protocol for all allowed values of $\lambda$? What is a "minimal" assumption that will ensure a stable protocol for all $\lambda < 1$? (As described in the introduction, some sufficient conditions are described in [18, 9] for certain models.) (iii) While approximate time agreement (as in the finite case) seems a reasonable assumption, exact time agreement seems too stringent. Can we eliminate, or at least weaken, this assumption?

Acknowledgement. We wish to thank Michael Kalantar of Cornell University for explaining the practical side of this problem to us. We thank Prabhakar Raghavan and Eli Upfal for sending us an early version of their paper [19], and thank Phil MacKenzie, Greg Plaxton and Rajmohan Rajaraman for allowing us to use part of the LaTeX source of their work [14]. Thanks are also due to the participants of a seminar at Carnegie Mellon University, whose questions and comments helped us clarify some points.

References

[1] N. Abramson. The ALOHA system. In N. Abramson and F. Kuo, editors, Computer-Communication Networks. Prentice Hall, Englewood Cliffs, New Jersey, 1973.
[2] D. Aldous. Ultimate instability of exponential backoff protocol for acknowledgement-based transmission control of random access communication channels. IEEE Trans. on Information Theory, IT-33(2):219-223, 1987.
[3] N. Alon, J. H. Spencer, and P. Erdős. The Probabilistic Method. Wiley-Interscience Series, John Wiley & Sons, Inc., New York, 1992.
[4] R. J. Anderson and G. L. Miller. Optical communication for pointer based algorithms. Technical Report CRI-88-14, Computer Science Department, University of Southern California, 1988.
[5] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493-509, 1952.
[6] M. Dietzfelbinger and F. Meyer auf der Heide. Simple, efficient shared memory simulations. In Proc. ACM Symposium on Parallel Algorithms and Architectures, pages 110-119, 1993.
[7] M. Geréb-Graus and T. Tsantilas. Efficient optical communication in parallel computers. In Proc. ACM Symposium on Parallel Algorithms and Architectures, pages 41-48, 1992.
[8] L. A. Goldberg, M. Jerrum, F. T. Leighton, and S. B. Rao. A doubly logarithmic communication algorithm for the completely connected optical communication parallel computer. In Proc. ACM Symposium on Parallel Algorithms and Architectures, pages 300-309, 1993.

[9] J. Håstad, F. T. Leighton, and B. Rogoff. Analysis of backoff protocols for multiple access channels. In Proc. ACM Symposium on Theory of Computing, pages 241-253, 1987. To appear in SIAM J. Comput.
[10] W. Hoeffding. Probability inequalities for sums of bounded random variables. American Statistical Association Journal, 58:13-30, 1963.
[11] J. Y. N. Hui and P. A. Humblet. The capacity region of the totally asynchronous multiple-access channel. IEEE Trans. on Information Theory, IT-31:207-216, 1985.
[12] Special issue of IEEE Trans. on Information Theory, IT-31, 1985.
[13] F. P. Kelly. Stochastic models of computer communication systems. J. Royal Statistical Society (B), 47:379-395, 1985.
[14] P. D. MacKenzie, C. G. Plaxton, and R. Rajaraman. On contention resolution protocols and associated probabilistic phenomena. In Proc. ACM Symposium on Theory of Computing, pages 153-162, 1994.
[15] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, London Math. Soc. Lecture Notes Series 141, pages 148-188. Cambridge University Press, 1989.
[16] R. Metcalfe and D. Boggs. Ethernet: distributed packet switching for local computer networks. Communications of the ACM, 19:395-404, 1976.
[17] V. A. Mikhailov and T. S. Tsybakov. Upper bound for the capacity of a random multiple access system. Problemy Peredachi Informatsii, 17:90-95, 1981. Also presented at the IEEE Information Theory Symposium, 1981.
[18] N. Pippenger. Bounds on the performance of protocols for a multiple access broadcast channel. IEEE Trans. on Information Theory, IT-27:145-151, 1981.
[19] P. Raghavan and E. Upfal. Stochastic contention resolution with short delays. In Proc. ACM Symposium on Theory of Computing, pages 229-237, 1995.
[20] L. G. Valiant. General purpose parallel architectures. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume A, pages 943-971. Elsevier, New York, 1990.
[21] N. D. Vvedenskaya and M. S. Pinsker. Non-optimality of the part-and-try algorithm. In Abstracts of the International Workshop on Convolutional Codes, Multiuser Communication, Sochi, USSR, pages 141-148, 1983.