How fair is fair queuing - UCCS

1 downloads 0 Views 2MB Size Report
A. G. GREENBERG AND N. MADRAS. 19a= min{ha, Of.,,}, where dfa,, is determined by the constraint Z fl= ~0. = 1. A stream a whose arrival rate ha lies below.
How

Fair Is Fair Queuing?

ALBERT

G. GREENBERG

AT& T Bell Laboratories,

Murray

Hill,

New Jerse.v

AND NEAL York

MADRAS University,

North

York,

Ontario,

Canada

Abstract. Fair Queuing is a novel queuing discipline with important applications to data networks that support variable-size packets and to systems where the cost of preempting jobs from service is high. The disciphne controls a single server shared by N job arrival streams with each stream allotted a separate queue. After every job completion, the server is assigned to serve, without possibihty of interruption, the job at the head of one of the queues (as soon as at least one job appears in the system). Fair Queuing is designed to handle arbitrary job arrival sequences with essentially no a priori knowledge of their attributes. such that each stream receives its ‘Lfam share” of serwce. In this paper, we consider two variants of the fair queuing discipline, and rigorously establish their fairness wa sample path comparisons with the head-of-line processor sharing disclphne, a mathematical idealization that prowdes a fairness paradigm. An efficient Implementation of one of the fair queuing disciplines is presented, In passing, a new, fast method for simulating processor sharing is derived. Simulation results are presented to further explore the comparison between fair queuing and processor sharing. Categories and Sub]ect Descriptors: C.2.4 Systems—nehtwk operaring systems; D.4.1 [Operating Systems]:Reliability —uer~ication,

[Computer-Communication Networks]: Distributed [Operating Systems]: Process Management; D.4.5 D.4.8 [Operating Systems]:Performance—queuing

theon

General Terms: Design, Management,

Performance,

Theory,

Verification

1. Introduction The queuing discipline head-of-line processor sharing ( PS) provides an appealing paradigm for the fair sharing of a service. To define PS, consider the model depicted in Figure 1. There is a single server, and N job arrival streams, each feeding a different first in, first out queue. In the PS discipline, in any interval of time when exactly k > 0 queues are not empty the server serves A short version of this paper, which omits the analysis in Section 3, appeared m the Proceedings of Performance ’90 under the title “Comparison of a Fair Queueing Discipline to Processor Sharing” (Edinburgh, Scotland: September 1990; North Holland, Amsterdam, The Netherlands. pp. 193 -207). N, Madras research was support m part by a University and Engineering Research Council of Canada.

Research Fellowship

from the Natural Science

Authors’ current addresses: A. G. Greenberg. Room 2C- 119, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, N.J. 07974; N. Madras, Mathematics Dept.. York University, 4700 Keele Street, Downsview, Ontario M3J 1P3, Canada. Permission to copy without fee all or part of this material is granted prowded that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is gwen that copying is by permission of the Assoclatlon for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 01992 ACM 0004-5411 /92/0700-0568 $01.50 Jourml

of the Absockmm

for Computmg

M..hmery,

Vol

39. No

3, July 1992. PP 568-59S

How

Fair

is Fair 2

1

Queuing?

*o*

— — —

569

N — — — FIG. 1.

Queuing model

w

Q the job at the head of each simultaneously at rate proportional to 1/k. This policy can be easily generalized to provide priority or discriminatory service to the queues [8, 15]. In recent years, evidence has been mounting that queuing disciplines approximating the PS discipline can play a crucial role in the congestion control of large, scalable data networks. Other applications are mentioned below. In data networks, the server corresponds to a single transmission line between nodes in the network, and the N arrival streams to N source-destination pairs passing data over the line. Keeping a separate queue for each stream puts up ‘‘ firewalls” protecting well-behaved streams against streams that might otherwise saturate the transmission line and drive delays to unacceptable levels. The PS mechanism allots the line’s bandwidth fairly, and provides relatively small queuing delays for short communications. The control of isolated transmission lines does not in itself ensure the adequate control of the network as a whole. However, it has been found that mechanisms like PS, if appropriately emulated and combined with sensible buffer and rate control mechanisms, provide network-wide fair bandwidth allocation, protection small delay for short communications, against streams that would otherwise swamp the network, and stable overall network performance, especially if traffic is heavy [5, 7, 9, 13, 16, 17, 18, 20]. Though in widespread use, the simple first-in, first-out (FIFO) queuing mechanism often leads users, either singly or in unpredictable coalitions, to create bottlenecks with deleterious results throughout the network. Processor sharing was originally proposed as an idealization of time-slicing in computer systems [15], where the server cycles through the active jobs, giving each a small time quantum of service, and then preempting the job to work on the next. However, in data networks, the counterpart of time-slicing is not feasible. A job corresponds to a transmission of a packet, which is constructed at the packet’s source. It would violate the communications protocol if a node in the interior of the network were to cut short a packet transmission —preempting the job’s service—to switch to the partial transmission of another packet. How then is processor sharing to be “emulated”’ without preempting jobs in service? At a minimum, the emulation ought to retain the fairness of PS. In [5], Demers et al. tie fairness to throughput, and appeal to the fact that PS achieves max-rnin fair throughputs. Suppose that the server works at rate 1, and that stream a receives jobs at rate & in bits (the units of work) per unit time. Let or output rate of stream CYunder PS. If 6. ~ () be the achieved throughput h=

Xfl=lh.


1, then

570

A. G. GREENBERG AND N. MADRAS

min{ha, Of.,,}, where dfa,, is determined by the constraint Z fl= ~0. = 1. A stream a whose arrival rate ha lies below the threshold of.,,, achieves throughput &; otherwise it achieves throughput 0 ~,,,. This is equivalent [51 to: (i) no de >0 is greater than &, (ii) no other 19’s satisfying condition (i) have a higher minimum value, and (iii) condition (ii) remains recursively true as we remove the minimal stream and reduce the service capacity accordingly. A second requirement for a proper emulation of PS is that it handle arbitrary distributions of job sizes. Many data networks allow packets (jobs) of variable size [19], for reasons touched on briefly below. In particular, the Internet Protocol (1P) [19], now in very widespread use throughout the world, handles a range of packet lengths: Short packets arise from users in remote login sessions (TELNET, the TCP/IP remote terminal login protocol) and long packets arise from users transferring files (FTP, the TCP/IP File Transfer Protocol). Variable length packets rule out emulating PS by serving jobs round robin; that is, cycling through the nonempty queues, in one cycle serving just the job at the head of each. If the streams are not homogeneous, then round-robin is not fair. Consider, for example, two streams each with an infinite backlog of packets. Unlike PS, under round-robin a stream can increase its throughput by simply combining several consecutive short jobs into a smaller number of long ones. In [5], a new queuing discipline was introduced, termed fair queuing, which attempts to emulate the PS discipline, handles variable job sizes, and never preempts a job in service. In this paper, we analyze two variants of the fair queuing discipline, and prove (Section 3) that these disciplines indeed emulate PS in a strong sense, which implies, in particular, that the disciplines provide the same throughputs as PS. Roughly, we couple the fair queuing system to the PS system by considering the behavior of each on the same arrival sequence, and then argue that the discrepancy between the two systems remains bounded. Our main results hold for general arrival streams to the N queues. We believe these results, linking fair queuing to PS, give a satisfactory demonstration of the fairness of the fair queuing disciplines. In Section 4, we further explore the relationship between PS and fair queuing using simulation, focusing on job delay or response time. The two variants of the fair queuing discipline we consider are termed

19a=

— FQS, fair queuing based on starting times, which is proposed here, and — FQF, fair queuing based on finishing times, which is the discipline proposed in [5]. To implement FQF, the length of a job must be known at the time it arrives. In the data network application, a packet’s length is available in a “header” preceding the data portion of the packet. No such knowledge is needed to in implement FQS. Thus, FQS might find use outside of data networks, systems, such as some computer systems, where processor sharing service is desirable, but job lengths are unknown a priori and the cost of switching between partially completed jobs is high. (The higher the switch-over cost, the greater the inefficiency of time-slicing.) An efficient implementation of the FQF discipline is presented in Section 2.6. In passing, a new efficient method for simulating head-of-line processor sharing is derived, in which just O(log N) time is needed to simulate each arrival or departure event. (The method also applies to usual processor sharing [15], the discipline where all jobs present continuously share the server.) This improves

How

Fair

is Fair

Queuing?

571

on the natural simulation method, which requires 0(N) -time to simulate each event. In our model, the number of streams N is assumed to be fixed. In the data network application, each queue corresponds to a virtual circuit. It is reasonable to expect the number of virtual circuits to vary much more slowly than the rate of traffic over those circuits. To get at the question of how our results carry over to scenarios where N varies, in Section 3.5, we treat N as a slowly increasing function of time, and establish conditions under which the fair queuing disciplines continue to emulate PS. In [5], simulation results were presented on the performance of small networks using either the FQF discipline or the FIFO discipline to control transmission lines. The FIFO discipline’s potential for single users to create egregious bottlenecks comes through. FQF combined with simple rate control algorithms [3] is shown not only to beat FIFO in fairness, but also in overall performance: throughputs are higher, fewer packets are dropped because of buffer overflows, and delays are smaller. There are plans to implement fair queuing in an experimental Internet testbed termed Dartnet (S. Shenker, personal communication). Last, we remark that although some important existing protocols, such as 1P, allow variable-size packets, other protocols, such as those in Datakit network [6], do not. If packets all have the same size, then there is little to choose with regard to fairness between the fair queuing disciplines and round-robin. However, there is a compelling case for supporting a range of packet sizes: —Useful data networks support a wide variety of traffic types, including bursty flows of short messages (e.g., remote login traffic) and regular flows of long messages (e.g., file transfer traffic). —In high-speed networks, the cost of packet processing limits the performance available to users [12]. This remains true even when the minimal amount of end-to-end protocol processing is performed. For a fixed message size, the smaller packet size implies higher processing costs, which reduces the performance available to a user. Chopping a large message into several short packets increases the message’s processing cost at each switch these packets are routed through. Although processor cycle times keep improving, with communication networks moving to the gigabit range, we expect packet processing to persist as a bottleneck. From this perspective, a large packet size is preferable for the traffic types that have regular flows of long messages. This is especially true in datagram or connectionless protocols, must be attached to each packet for where a relatively large “header” routing and control purposes. —It is better to transmit a small message in a small packet, because packing the message into a large packet (padded with blank space) is wasteful of bandwidth. An argument against allowing a range of packet sizes is that this complicates the design of the high-speed switches used to route the packets in the network [10]. In addition, if the queuing discipline is FIFO or round robin and a range of packet sizes are supported, then it is inevitable that bursty flows of small messages suffer unduly large delays, losing out to flows of long messages. To counter these arguments, we note first that packet switches handling variable packet sizes are now in wide use and even faster ones are being developed

572

A. G. GREENBERG AND N. MADRAS

[2, 4, 11]. Second, as the data of [5] and Section4 show, if fair queuing is used, then small messages see relatively small delays. Kent and Mogul [14] give further arguments on the pros and cons of variable packet sizes, addressing, in particular, error recovery and message reassembly lags. We expect networks supporting a range of packet sizes will persist, as will the need for fair queuing and similar control mechanisms. More generally, we believe that head-of-line processor sharing provides a simple, useful paradigm for the fair sharing of a common resource, though it is a mathematical idealization that cannot be implemented directly. The fair queuing disciplines discussed here are the only disciplines we know of that provably simulate head-of-line processor sharing, without ever preempting a job in service. As noted above, these disciplines may find use in systems, such as some computer systems, where fairness is desired but the cost of preemption is high. 2. Queuing

Disciplines

In this section, we first briefly review the head-of-line processor sharing discipline, introduce some notation, and present the fair queuing disciplines. We then state our main results with the proofs deferred to Section 3. Next, an efficient implementation of FQF is presented. Generalized counterparts of the disciplines providing discriminatory service to the queues are defined in the Appendix. 2.1. HEAD-OF-LINE PROCESSOR SHARING. The setup is depicted in Figure 1. A job that starts service at time t~ and finishes at time tz will be considered to have occupied the half-closed interval [ tl, t~). Let IV&(t) denote the number of active users, i.e., those with nonempty queues at time t; O s N~Ct(t) s N. Let ~,” and P,a denote the time at which the ith job arrives to user u and the length of service the job requires, respectively; i = 1, 2, . . . ; a = 1, 2, . . . . N. Job lengths are counted in bits. The server works at a rate of 1-bit-per-unit-time. Under the head-of-line processor sharing discipline (PS), the server is shared equally among the jobs at the heads of the queues; that is, each receives service at rate 1/NaCt( t). There is a natural virtual time clock associated with PS, which runs at the rate of service seen by a job at the head of a queue. Define virtual time R(t) corresponding to real time t by 1 Ii(t) = :R(t)

(2.1)

= max{l, N..,(t)}“

Since R is strictly increasing, it has an inverse, R – 1, which maps virtual time to real time. We need to discuss where events lie both in real and in virtual time. We always explicitly use the word, virtual, when referring to virtual time. Figure 2 gives an example of a PS sample path for N = 3 users, named a, ~, and y. Two jobs arrive to user CYat times ~~ = O and I-; = 2, two arrive to user fl at times 7 P ~ – — 1 and ~? = 2, and one arrives to user T at time r? = 3. The job lengths are, respect&ely, Pr = 3, P~=l, P~=l, Pf=4, and P: = 3. In the interval [0, 1), queue o- is the only queue active, and so receives one unit of service, and the virtual time R(1) = 1. Thus, user a‘s first job now requires exactly two additional units of service. In the interval [1, 3), queues u

How

Fair

is Fair

t

R(t)

o

o

1

1

2

Queuing?



573 Queue /3

Queue a

Queue y

P:=l F;=4

3

2

4

5

6

3

7

8

9

4

10

11

5

12

6

FrG. 2. There are three streams, a, /3, and y. The queuing mechanism is PS. The boxes mark job arrival times. The S, F, and P values are the associated virtual starting times, virtual finishing times and job lengths. The line segments mark the time intervals during which the jobs are served.

and 6 are active, so each receives service at rate R = 1/2 and accumulates one unit of service in this interval, leading to l?(3) = R(1) + 1/2 “ (3 – 1) = 2. User 13’s first job is thereby completed. User a‘s first job now requires one additional unit of service. User a‘s second job, arriving at time t = 2, waits in queue u, thus far receiving no service. User (3’s second job, also arriving at time t = 2, goes into service at t = 3, as service completes for the first job. In the interval [3, 9), all three queues are active, so each receives service at rate R = 1/3 and accumulates an additional two units of service in this interval,

574

A. G. GREENBERG AND N. MADRAS

leading to R(9) =R(3) + 1/3 o (9 – 3) = 4. This suffices to complete the service of both of user a’s jobs (at t = 9). At f = 9, user 6’s second job now requires two additional units of service, and user T‘s first job requires one. In the interval [9, 11), just queues D and T are active, so each receives one unit of service; hence, R( 11) = 5. This leaves user (3’s second job with one unit of service still remaining, which it receives in the interval [11, 12). Demers et al. [5] introduced the following recurrence relations, which succinctly describe how the PS system evolves in virtual time. For each i=l ,2,... anda=l~ N, let S,” and F’,” denote the virtual times at ,-, ..., which the ith job of user a starts and finishes. Then

where S: = F: = O. Equation (2.2) states that the ith job for queue a starts service either when it arrives (virtual time R ( r,a)) if queue a is idle on the job’s arrival, or when the previous job finishes (virtual time F,: ~) if the queue is busy on the job’s arrival. Equation (2.3) reflects the fact that one unit of virtual clock time is needed to transmit one bit from each active queue. This equation also follows from integrating (2. 1). It is remarkable that, via (2.2) and (2.3), we can compute a job’s virtual finishing time the moment it arrives, given its length P?. We cannot compute a job’s real finishing time, R-1 (~~), on arrival because the real finishing time depends on future arrivals. Virtual starting and finishing times are shown for the example depicted in Figure 2. The reader is especially encouraged to check that the virtual finishing times calculated via the recurrence relations match the virtual times (the R values) at which the jobs actually finish. Note that the PS system is different than the (more well known) M/G/1 processor sharing queue [8, 15], in which all jobs present are in service; none wait in line. FAIR QUEUING. 2.2. As mentioned in the Introduction, the motivation for fair queuing (FQ) is to provide a service discipline that attempts to emulate processor sharing without ever preempting a job in service. We consider two FQ disciplines: fair queuing based on starting times (FQS) and fair queuing based on finishing times (FQF). To implement FQF a job’s service requirement P,” must be known at the time the job arrives. FQS requires no such knowledge. Like PS the FQ disciplines are work conserving, meaning that the server is never idle while there is work in the system. To define the FQ disciplines, suppose that a service is completed at time t.at which point there are k nonempty queues, O s k s N. If k = O, then the next job to arrive enters service immediately on arrival, breaking ties arbitrarily. If k >0, then a job from among those waiting enters service immediately. In FQS, the job selected is the one that would start earliest under the PS service discipline (again, breaking ties arbitrarily); viz, the job among all those waiting with the smallest virtual starting time S,e. In FQF, the job selected is the one that would finish earliest under the PS service discipline; viz, the job among all those waiting with the smallest virtual finishing time F,”. Note that FQF is not the same as shortest-job-first.

How

Fair

is Fair

Queuing?

575

Thus, to implement the FQ disciplines, we need to compute virtual starting and finishing times on the fly as if PS were running. This means that when running FQ, A&t(t) must be computed as if PS were running, that is, Al,,t( f) is the number of nonempty queues in the PS system at time t on the same arrival sequence. In Section 3, we prove the information needed for these computations is always available to the FQ algorithms when needed. In Figure 3, we have superimposed the FQS and FQF sample paths on the PS sample path for the example depicted in Figure 2. The solid lines indicate the starting and finishing times of jobs under the PS discipline. Of course, these lines agree with Figure 2. The dashed lines indicate the starting and finishing times of jobs under the FQF discipline, and the dotted lines these times under the FQS discipline. To make this clear, consider FQF. At time t = O, the only job present is user u‘s first job, so this is put into immediate service and is served to completion in the interval [0, 3). Among the jobs present at t = 3 (user CY’s second job, user (3’s two jobs, and user T‘s first job), user ~‘s first job has the smallest virtual finishing time among those present, Ff’ = 2, and so becomes the next job scheduled, and is served to completion in the interval [3, 4). Continuing to select the job with the smallest virtual finishing time, user CY’ssecond job is served to completion in the interval [4, 5). user -y’s first job in the interval [5, 8), and finally user @‘s second job in the interval [8, 12). A number of interesting features of the service disciplines come through in this example. A given job’s service periods in the PS and the FQF systems need not overlap. User u‘s second job starts and finishes under FQF before it starts under PS, and user ~‘s first job starts and finishes under PS before it starts under FQF. Similarly, a given job’s service periods in the PS and the FQS systems need not overlap. Under FQS, user a‘s second job starts much later than it finishes under PS. However, it can never happen that a job starts under FQS before it starts under PS; see the proof of Proposition 3.1. Indeed, otherwise the definition of FQS would be inconsistent. Also note that a given job’s delay may be either smaller or greater under one of fair queuing algorithms than under PS. 2.3. consider

MAIN the

RESULTS. behavior

To of

each

compare on

the

the same,

FQ

and

arbitrary

PS

queuing sequence

disciplines, of job

we

arrivals.

discrepancy bound that holds under The key result (Theorem 2.1) is a certain the assumption that there is a maximum job size: Using the terms of the data network application, the number of bits transmitted from a given queue in FQ never differs by more than the maximum packet size from the number of bits transmitted from that queue in PS, whatever the actual job arrival sequence happens to be. It turns out that the difference in throughputs between PS and FQ tends to zero as time increases, for a broad spectrum of cases in which the job sizes are not bounded. We obtain corollaries comparing FQ and PS queue lengths and delays, and obtain results for cases where the number of users N increases with time. To state the results, we introduce the following notation: For a specified queuing discipline D, let #~( S, t) denote the units of service (not necessarily an integer) given to a queue a during the real time interval [s, t).In the data network application, fl~(s, t) represents the number of bits transmitted from queue CYin the interval [s, t). Throughout this section, FQ denotes either FQS or FQF.

576

A. G. GREENBERG AND N. MADRAS

t

R(t)

o

o

Queue~

Queue a

,

Queue 7



e

1

1

● ●

e ●

2

e ●

3

2

q=2

F;=6



P;=3 ●

F~=5



e

4

I I

5

I

6

I

3

I I I

7

I I

I I

8

I

I

9

I

4

I I I

10

I I

11

I

5

I I

12

I

6

FIG, 3. The FQF and FQS sample paths The boxes mark job arrival times. The S, virtual fimshmg times and job length~. The are served (solid for PS, dashed for FQF,

are F, line and

superimposed on the sample path for PS, from Fig. 2, and P values are the associated vmtual starting times. segments mark the time intervals during which the jobs dotted for FQS),

Suppose there is a maximum job size P~QX. The following THEOREM 2.1. bound holds for all possible realizations of the arrivals process, for every time t and for every queue a:

IGS(O,

t) -

EQ(O, t)[< Pmax.

(2 .4)

It is not hard to see that this bound is the best possible, not only for FQ, but also for any queuing discipline where the server must work on exactly one job

How

Fair

is Fair

Queuing?

5’77

at a time and must not interrupt a service once it has begun. For example, suppose P~,X = 1, and suppose that a job of size 1 arrives at each of N queues at time O. If FQ chooses to serve the job at queue u first and the job at queue u last, then j~~(O, 1) = 1 and &(O, 1) = l/N, and also &(O, N – 1) = O and &(O, N – 1) = (N – 1)/N. Theorem 2.1 will be proven for FQS in Section 3.3, and for FQF in Section 3.4. Theorem 2.1 has several consequences. Since the difference between what enters a queue and what departs equals what remains in the queue, it follows immediately that for every a and t

where Q~s( t) denotes the amount of work in queue u at time t under PS and Q~~( t) the corresponding quantity for FQ. Another immediate consequence of Theorem 2.1 is that for every a

(2.5) with probability one; that is, the two systems have identical limiting throughput. A third consequence is a bound on the difference in delay that a given job can experience under PS as opposed to FQ. 2.2. Let 4,” denote in FQ on the ith job from

THEOREM

pleted

IR(4:) In particular,

this implies

that for

the (real) time at which service is comqueue a. Then for every i and a, - fil



t

f) _

#;Q(03 t) t

converges to O and the delay discrepancy

(where N(t) is the number of queues at time t) grows more slowly power of t. Our basic assumption on the arrivals processes is that finite set of probability distributions, and that all the jobs arriving queue are selected from one of these distributions. We do not need that job sizes are independent, however. For example, the following

than some there is a at a given to assume scenario is

578

A. G. GREENBERG

AND N. MADRAS

allowed: Queue 1 receives its first job with length chosen from the third probability distribution, say, and then every subsequent job arriving at queue 1 is the same size as the first. We need the mild assumption that there is a fixed constant A such that the arrival rate at any queue is at most A. Specifically, there exists a constant A such that the probability that some queue has more than At arrivals during [0, t] for some t > T tends to O as T tends to co. This holds, for example, if arrivals are Poisson processes with bounded rates and N(t) grows polynomially in t. THEOREM2.3. have finite Kth every a). (i) (ii)

Assume moments

that the probability distributions for job size all for some fixed K > 1 (i. e.. E( P;) K < m for

then lim+mA(t) = O. IfN(t) = O(t ‘-1), If q > l/K and ifN(t) = O(t(”&-l)J(~+ l)), In particular,

if N is constant,

then limt+~D(t)/tq

then limf +~ II(t)

= O.

/t’/K = o.

Roughly, part (i) of Theorem 2.3 states that if the number of users does not grow too quickly, then the FQ and PS disciplines attain the same limiting throughputs with probability 1. The delay information provided in part (ii) of the Theorem may be of use in a heavy traffic analysis of FQ. The proofs of Theorems 2.2 and 2.3 appear in Section 3.5. 2.4. IMPLEMENTATION. To implement the FQ service disciplines, we need the virtual starting and finishing times of the jobs computed as if the service discipline were PS. In this sense. any implementation of the FQ disciplines must embody a deterministic simulation of PS. Unfortunately. the cost of the straightforward event driven simulation of PS (sketched below) is 0(N) per event, because after each event an update must be made to the state of each of the active queues, and there are N queues. We present a simple simulation method based on the recurrences (2.2) and (2.3) that reduces the cost to O(log N). An efficient implementation of FQF will be presented. FQS can be implemented similarly, without a priori knowledge of job lengths. The methods described below require priority queues, data structures supporting the operations insert, find minimum, and delete minimum [1]. In our applications, a priority queue holds at most N values. There are numerous efficient implementations of priority queues (e.g., heaps) that cost just O(log N) time per operation [1]. 2.5. SIMULATING PS. What makes simulating PS nontrivial is that the real times at which jobs complete depend on future arrivals to the system. The natural solution is to record for each job in service the amount of remaining service required. N.,t, the number of active jobs, must also be recorded. Suppose that the last event occurred at time t. If the event is a service completion that leaves the system empty, then the next event is the next arrival to the system. Otherwise, suppose that, immediately after t, the job with the least remaining service requires 1 additional units of service, and the next job to arrive is scheduled to arrive at time a. Neglecting arrivals, the completion would occur at time f = t + 1 “ N~Ct. Moreover, an arrival in the interval [t, f] cannot decrease the real finishing time of any job present at time t.Hence, if a < f, then the next event is the arrival at time t‘ = a. Otherwise, the next event is the completion at t‘ = f. To update the state after the next event,

How

Fair

is Fair

Queuing?

579

(f’ – t)/N,C, units of service must be subtracted from each of the other active jobs. Since there may be as many as N of these, the cost per event is 0(N). Using the recurrence relations (2.2) and (2. 3), we can do better. In addition to keeping track of the real time t of the last event, we keep track of the corresponding virtual time R = R ( t), A priority queue is used to hold the virtual finishing times of the active jobs. Suppose that at time t the system is not empty, and that F is the smallest virtual finishing time among the jobs present. Neglecting arrivals, this job would finish at real time j = t + (F – R) . N,C,. Thus, to determine whether an arrival or a service completion is the next event, we compare f to the next arrival time a and set the time of the next event t‘ = rein{ f, a}, as before. 1? is updated to R‘ = R + (t’ – t) /max{ 1, NaC,}. A second priority queue may be used to keep track of next job arrival times. In this way, we have replaced the 0(N) cost update of the remaining service times for each of the N queues with an 0(1) cost update to the single quantity R(t). As a result, the cost of processing an event is dominated by the priority queue operations, bringing the net cost down to O(log N) per event. 2.6.

IMPLEMENTING

FQF.

Our

implementation

of

the

FQF

discipline

em-

of PS. A real clock is used to bodies a (low-overhead) discrete event simulation keep track of real time. The role of the PS simulation is simply to generate virtual time. The implementation deals with three types of events: new job arrivals, actual departures of jobs following the FQF discipline, and simulated Actual arrivals and departures are departures following the PS discipline. assumed to trigger corresponding signals in real time. PS departures may be simulated while handling actual arrivals and FQ departures. Two sets of data structures are maintained: One for the actual FQF system and one for a simulated PS system. Both sets include: —for each queue u, a FIFO buffer holding the pointers to jobs that are awaiting service at queue a but are not at the heads of the queues, and — a priority queue holding pointers to the jobs at the heads of the nonempty queues, and keyed by the virtual finishing times of these jobs. In addition, the PS data set includes, for each queue CY,the virtual finishing time of the last job to arrive to that queue. Buffer space used to store the data associated with a job may be reclaimed as soon as the job completes service in the actual (FQF) system, and other memory records associated with the job may be reclaimed after it has completed service in both systems. Variables R and N,C, are maintained for the PS system and are updated on each job arrival and on each simulated job departure in IN, as described in Section 2.5. To simulate services in the PS system, on each I% departure or new job arrival, the priority queue in the PS data set is consulted to obtain the smallest virtual finishing time F extant, and the corresponding service completion is scheduled for time f=t+(F-R). N,C,. As noted above, the implementation treats three types of events: arrivals, actual departures, and simulated PS departures. Suppose the last event occurred at time t and the next occurs at time t‘ a t.If the event at t‘is an arrival, then the job’s virtual starting and finishing times are computed (consulting the job’s length and the virtual finishing time of the last job to arrive to this queue) and pointers to the job are inserted in the two data sets. If this job finds the actual

580

A. G. GREENBERG

AND N. MADRAS

system empty, then it goes immediately into service. If the event at t’ is a departure inthe actual system, then thejobis removed from the FQF data set. If some jobs remain to be served in the actual system, then the job with the least virtual finishing time (located by consulting the priority queue in the FQF data set) is put into service immediately. If the event at t‘ is a departure in the PS system, then the job is removed from the PS data set, and the service of the next job in the PS system is scheduled, if at least one job remains. In computing the scheduling decisions the basic tasks are a few priority queue operations, so the total work per event is O(log N). Each job processed event, a simulated PS departure. generates exactly one “extra” 3.

Analysis

3.1. PRELIMINARIES. In this section, we establish some notation and provide a formal statement of the FQS and FQF disciplines. Most of the notation that we use was introduced in Sections 2.1 and 2.3. The queues are numbered 1,2, . . . . N. The number of the queue will be denoted by a Greek letter (a, B, . . . ), while the number of a job will be denoted by a l,..., N, let na = n~Qs( t) (respecRoman letter (i, j, . ..). Foreacha= tively, n. = na‘QF( t)) be the number of jobs belonging to queue a whose service has finished at time t or earlier in FQS (respectively, in FQF). We now describe the FQS algorithm using the n. notation; FQF will be treated later. Suppose that a service is completed in FQS at time t, and that there are k nonempty queues at time t (so O < k s N). For concreteness, we assume that queues 1, 2, . . ., k are nonempty. Let n. = n LQS(0 for each a= l,..., N. Then, FQS chooses the next job as follows: (a) Ifk> 1. Serve the job whose starting time in PS would have been the earliest among all of the jobs now waiting. That is: Let 6 be the index such that (3.1) (if there is Then, serve section, we available to

more than one minimizer, then choose arbitrarily among them). queue ~ next, starting immediately (i.e., at time t). (In the next show that (3 can indeed be calculated using only information that is the FQS system at time t.)

(b) If k = O.

Serve the next arrival.

That is, choose T so that (3.2)

and begin serving queue y at time T; +*. If there is a tie, choose a job with the earliest starting time in PS. (It will bd seen later that if there is a tie, then all of these jobs have the same starting time in PS, so the tie will have to be broken arbitrarily. See Lemma 3.1 below. ) Now for FQF. Again, suppose that a job finishes in FQF at time t and that queues 1,2, ..., k are not empty at that time. Let n. = n~QF( t ). Then FQF chooses the next job as follows: (a’) Ifk> 1. Serve the job whose finishing time in PS would have been the earliest among all of the j obs now waiting. That is: Let 13be the index such that (3.3)

How

Fair

is Fair

(if there is Then serve section, we available to

Queuing?

581

more than one minimizer, then choose arbitrarily among them). queue ~ next, starting immediately (i.e., at time t). (In the next show that (3 can indeed be calculated using only information that is the FQF system at time t.)

(b’) If k = O. Serve the next arrival. That is, choose ~ so that (3.2) holds and begin serving queue -y at time ~: + ~. If there is a tie, choose the job with the earliest finishing time in PS. ‘ 3.2. to

CONSISTENCY

prove

always

that be

demonstrates

FQF

OF THE ALGORITHMS. and

FQS

implemented this

for

are

using

based

available

The on

main

purpose

well-defined

of this

algorithms

information.

Our

first

section that

is can

proposition

FQS.

PROPOSITION 3.1. Suppose that a service finishes in FQS that queues 1 through k are nonempty at time t, where k > be calculated from formula (3.1) using only information that the FQS system at time t. (Recall that we know P? at i < na.)

at time t, and 1. Then @ can is available to time t only if

PROOF. At the (real) time t in FQS, we know all arrival times ~~ which are less than or equal to t, and we know the sizes of all jobs that have been completed by time t (i. e., for every u and every i = 1, . . . . na, we know P,”). Using this information, we can reconstruct the events of PS up to and including the virtual time R(t), unless service (in PS) begins on the (n ~ + l)th job of some queue u before the virtual time R(t). Thus, we know everything about the PS process up to and including the virtual time

min

{

R(t),

S~l+l,

...,

s~+

, ; }

in particular, we know which of these N + 1 quantities Therefore, D (see (3. 1)) is known at time t if

attains this minimum.

So the proposition will be proven when (3.4) is proven. We prove (3.4) by contradiction. Consider a realization for which (3.4) is false at time t, with k >1. (As usual, we assume that at time t, queues 1 through k are nonempty and the rest are empty.) Then, R-’(S:

+,) a

> t

for

every a=l,

. . ..k.

(3.5)

k, at time t:since queue a is not empty in FQS, a‘s last Foreacha= l,..., completed job could not have been the last job that arrived at a: that is, we with (3.5) and (2.2) (taking must have ~~ + ~ ~ _ t. Using this observation i = na + 1 inc~2.2)), we obtain

R-’(W > t

for

every a=l,

. . ..k.

(3.6)

Let to = sup{s s t:PS is empty at virtual time R(s)}; so tO is the starting time of the current busy period in PS (or t, if PS is currently empty). Let L ~ = rein{ i: T,” > tO}; so L ~ is the index of the first job to arrive at queue a Now, for any queue a = 1, . . . . k: during the in the time interval [to, +co).

A. G. GREENBERG AND N. MADRAS

582

time interval [to, t), the PS discipline could not have worked on any of a‘s jobs that arrived before job L ~, and it was not able to finish job n. during this interval (by (3. 6)). Therefore,

l-k(h)

0

forall

t

a=k+l,

. . ..N.

(3.8)

Therefore, during the time interval [ tO, t), the PS discipline could not have worked on any of queue a‘s jobs that arrived before La or after n ~, and so ifs(~o,

f) =

5 ~:

forall

a=k+l,

. . ..N.

(3.9)

,=L .

Since PS is not idle during [to, t),the total amount of work processed during this interval must be t – to;that is, ti;s(h), f ~=1

t) =

(3.10)

t-to.

Since job La arrives at queue a at time to or later, and since job na has been completed by a by FQS no later than time t (by definition), we know that queue a must have had its jobs L&, Lm + 1, . . . . n ~ all completely processed during [to, t);that is, for all But now eqs. (3.7),

(3.9),

(3.10),

a=l,

. . ..N.

(3.11)

and (3. 11) imply

N

~~,#;QS(fO, which is impossible.

This contradiction

t) > f -

proves that (3,4) must be true.

Remark. It follows from (3.4) that a job in PS. For, suppose the ith job of queue n.(t) + 1 = i, so we can infer from the S,” = rein{ S“na+l:l < a s k}, and so S,” s We now turn our attention

to,



never starts in FQS before it starts w starts at time t in FQS. Then, decision of FQS at time t that R(t).

to the consistency

of FQF.

Suppose that a service finishes in FQF at time t, and PROPOSITION3.2. that queues 1 through k are nonempty at time t, where k > 1. Then (3 can be calculated from formula (3.3) using only information that is available to the FQF system at time t. PROOF. At time t in FQF, ~; and Pi” are known for every a and i such that ~Z” < t, and therefore everything is known about PS in the (virtual) time interval [0, R(t)].

How

Fair

is Fair

Consider information

Queuing?

583

any CYfrom 1 to k. We show that F;+ ~ can be computed from available to FQF at time t.(i)On the ~ne hand, if S: + ~ s R(t),

then F,: + ~ can be computed immediately from equation (2.3). (ii) On the other hand, suppose S~u+, > R(t). We know that the ( na + l)th job at queue CYhas arrived at time t or earlier (because at time t in FQF, job n ~ has finished, and yet queue a is not empty by hypothesis); since the finishing time of this job in PS is greater than t,itfollows that queue a must be busy at real time t in PS. Let i be the job that is being served in PS at time t at queue CY (thus, S,” s R(t) < Fin). It follows that queue a must remain busy in PS, at least until job na + 1 is completed. This observation allows one to compute F~~,+, easily using virtual time:

Combining ❑ tion,

(i) and (ii)

from

the preceding

paragraph

proves

the proposi-

Now that we know that FQS and FQF are well-defined, we prove the following lemma, which applies to both disciplines. We use FQ here to denote either FQS or FQF. The PS system is busy at real time t (i. e., at virtual R(t)) if and only if the FQ system is busy at time t. In particular, every time t, LEMMA

3.1.

time for

Suppose the lemma is false for some realization of the arrivals PROOF. sequence. Let u be the first time at which one system is empty and the other is not. ( u is well-defined because of our convention that a job whose service starts at t, and ends at t2 occupies the half-closed interval [t,, t2). ) Clearly, u # O. Let X be the system which is empty at time u, and let Y be the other system. The fact that X is empty at time u implies that there is no arrival at time u. Let s = sup{ t s u: Y is empty}. Observing that Y must be nonempty at time s, we see that there must be an arrival at time s; by the above observation, then we infer s < u. Let @ be the collection of all jobs arriving during [s, u]; let P be the total length of all of the jobs in .!?. On the one hand, we know that Y began serving jobs in @ at time s, worked without pausing during [s, u], and had not yet finished at time u; therefore, P > u – s. On the other hand, X was able to serve all of @ during [s, u); therefore, P s u – s. This gives a contradiction, ❑ and proves the lemma, Remark. As the above proof shows, Lemma 3.1 is true for any two disciplines that are never idle as long as there is work present in the system. PROOF OF THE DISCREPANCYBOUND FORFQS. In this section, we prove 3.3. the discrepancy bound (2.4) for the FQS discipline. Throughout this section, n ~ denotes n~Qs, as defined in Section 3.1. Note that if a job from queue a is being served in FQS at time t,then it must be job number n ~ + 1 that is being served. (As usual, we suppress the t when there is no risk of ambiguity.)

584

A. G. GREENBERG AND N. MADRAS

The following lemma isakeyobservation forthe FQS discipline. It says that when the decision is made as to which job will be served next in FQS, the PS starting time of the chosen job is in fact the smallest among all jobs which have not yet been served in FQS, including those jobs that have not yet arrived. Equivalently, it says that Eq. (3.1) holds even when the minimum is over 1 ~ ~ ~ N. (Note that the analogous statement is not true for FQF. For example, if job A, of length 10, arrives at queue 1 at t = O, and job B, of length 1, arrives at queue 2 at t = 1; then FQF begins job A at t = O even though job B finishes first in PS.) LEMMA 3.2. Suppose that queue 8 is being served in FQS at time t. (Recall that it must be the ( na -t 1) th job that is being served.) Then for every a, (3.12) PROOF. Let u be the (real) time at which service began on the ( na + l)th job at queue 6 in FQS. Observe that since this job is still being served at time t, nti(u) + 1 = no(t) + 1 for every U. Now fix an a. (i) On the one hand, if in FQS at time u. Since the FQS ‘r~ + ~ < u, then queue ~ was not empty di~cipline did not choose to serve queue u at time u, it follows that (3. 12) must be true (see (3. 1)). (ii) On the other hand, suppose that u < r,~a+ ~. Equation (3.4) implies that S~,(U)+ ~ s R(u) (since queue 8 was not empty in FQS at

time u). Combining

the above information

shows that

(where the last inequality follows from the elementary ❑ S:). Again, we see that (3.12) holds. PROPOSITION

3.3.

For

every

observation

R( ~:)

s

a and t,

tih(o, f) -

#;QS(O> t) = -Prnax.

Remark. Lemma 3.1 implies that j~s(O, r) = f&(O, r) if the system is empty at time r. Therefore, in proving this proposition (as well as many of the similar results in this paper), it suffices to consider only the busy periods; that is, we may assume without loss of generality that the system is never empty during the time interval [0, t]. PROOF OF PROPOSITION 3.3. For OJ = 1, . . . ,Nand t >0, let mti(t) be the number of jobs at queue u whose service in PS begins no later than (real) time t; that is,

mm(t)

= max{i:S;

< R(t)}.

Fix arbitrary t and u. As explained in the above remark, we can assume without loss of generality that the system is never empty during the time interval [0, t].Let t~ be the time at which service begins on job m.(t) + 1 at queue a in FQS; then na(tl) Since queue R(fl)

+ 1 = mu(t)

+ 1.

a is not empty in FQS at time t~, it follows from (3.4) that Combining this with the preceding equation and the observa-

~ %(tl)+l.

How

Fair

is Fair

Queuing?

585

tion that S; ~t)+ ~ > R(t) (an immediate consequence of the definition of m ~), we conclude’ that R( ZI) > I?(t). Therefore, fl > t. During the (real) time interval [0, t)in PS, the first ma(t) – 1 jobs of queue a are known to have been completed; during [0, t,)in FQS, the first m.(t) jobs of queue a have been completed, and service on a‘s next job has not yet started. Putting these facts together:

= #;Qs(o,

~,)-

~ #gQS(05f)

pm.,

- Pmax (since t, > t).



This proves the proposition.

To simplify the proof of the other half of the discrepancy bound, we use the following lemma. It will also be used in the same way for FQF in the next section. LEMMA 3.3.

Let FQ denote

either FQS or FQF. f) -

II%(o)

I&j@,

f) =

If the bound (3.13)

Pmax

holds for every t and a such that t is a time at which FQ on a job from queue a, then it holds for every ~=~ ,. ... N.

service begins in t >0 and every

Consider any queue u and time t.Let s be the (real) time that work PROOF. on job na( t) + 1 begins at queue a in FQ. It suffices to show that !;s(0,

s) -

KQ(O, s) ~ H;,(O$ t) -

(3. 14)

KQ(O> t).

On the one hand, suppose that queue a is being served at time t in FQ. Then s is the time that the current service began, so s s t,and we immediately see that ti;(Jo> s) + (f !4;s(0>

~) +- (t

-s) = -s)

I.KQ(O, f)

~ j;,(o,

t).

Subtracting the equation from the inequality gives (3. 14). On the other hand, if queue a is idle at time t in FQ, then s > t, which implies that !;s(0, Also,

queue a is necessarily

idle between times t and s, so tt;Q(o,

Again,

subtracting

PROPOSITION

3.4.

s) ~ tgs(o, t).

s) = KQ(09

t).

the equation from the inequality For every

gives (3. 14).

a and t,

ti:s(o> t) -

#;Qs(03 t) = Pmax.



586

A. G. GREENBERG AND N. MADRAS

at time t a service begins in FQS at queue u. By PROOF. ~ssume that Lemma 3.3, if the bound can be proven under this assumption, then the proposition must be true in general. Also, as observed in the remark following Proposition 3.3, we can assume that the system is never empty during the time interval [0, f]. First, we show that s; Consider (nti(t))th

.

5 s“no+l

for every

any co. Let u be the time job of queue OJ.By Lemma

u=l,

at which 3.2,

. . .. IV.

service

begins

(3.15) in FQS on the

But no(u) + 1 = n@(t), and S; ~U)+l s S~atf)+l (since u < t); therefore, . (3. 15) is proven. Let t‘be the (real) time at which every queue a has finished its n .th job in PS; that is, t’ =R-l(max{F~’:l

< a

= P:

f) -

ti;s(o> q



5 F’max, And so the proposition Propositions

is proven.



3.3 and 3.4 together prove Theorem

2.1 for the FQS discipline.

In this section, we prove 3.4. PROOF OF THE DISCREPANCY BOUND FOR FQF. the discrepancy bound (2.4) for the FQF discipline. Throughout this section, n ~ denotes n~QF, as defined in Section 3.1. We use the new notation ai” (respectively, b;) to denote the real time at which service on the ith job of queue co is started (respectively, finished) in FQF.

How

Fair

is Fair

PROPOSITION

Queuing?

587

For every

3.5.

a and t, (3.18)

PROOF. Fix a time t. (Throughout this proof, n ~ without a time argument will denote n.(t). ) As usual, we can assume that the system is never idle during [0, t].For definiteness, suppose that the lth job from queue 1 is being served in FQF at time t (so I = rzl + 1). Suppose that queues 1, . . . . k were the queues that were nonempty in FQF when service was started on the current job (i. e., at the time o; < t).First, observe that since no job is completed in FQF between the times a; and t, we know rim(t) By considering sees that

the decision

= rzw(o~)

for every

of the FQF discipline

F) s F“nd+l

for all

co. at time

(3.19) O; (see (3.3)),

o = 1 ,. ..> k

one

(3.20)

(observe that we have used (3. 19) here). Also, we know that the (nO + l)th job of queue co has arrived at time a; for u = 1, . . . . k but not for ~ = k + 1, . . . , N. Therefore T“nu+l

< 0; —

for

ti=l,

while for a > k, we have r: + ~ > O;, which . ti;s(o> ~;)

= ,:,7

Define the virtual

= &QF(o, %’)

. . ..k.

(3.21)

implies

for

~=k+

l,...

, N.

(3.22)

time r by r=$”–p

max

(3.23)

.

Since F~” +, SS:+l+pmax, itfollows from (3.23) and (3.20) that u . r< s,~+, for all ti=l, . . ..k. .

(3.24)

First, we show that r s I/(t), and then we use this to complete the proof of the proposition. Assume that r > R(t). Combining this with o; s f and (3.24), we obtain

l?(a~)

Sl?(t)

< r.s

S~6,+1

forall

ti=l,

. . ..k.

Considering this with (3.2 1), we see that for each queue o = 1, . . . . k, the ( rzti + l)th job must have been present and waiting for service during the entire (real) time interval [o;, R - 1(r)] in PS; therefore, each of these queues must have been busy throughout this entire interval. Therefore, for o = 1, . . . . k:

588

A. G. GREENBERG AND N. MADRAS

Combining

this with (3.22),

f

we find that

d) < ~=1 f

#;s(”

~=1

‘~)

t$QF(O

But this is a contradiction of Lemma 3.1, and so r < R(t) is proven. Using (3.23) and r s R(t), we see that l?(f) z F; – P~aX, and so we have !&(o,

~) = #:,(o,

R-l(F;

- %x))

2 #:s(o, R-’(F;)) = ~

P;

- Pmax

- Pmay

j=l

~:QF(o, t) - Pmax

2

(the second inequality holds because at most one unit of work can be done from each queue during one unit of virtual time). This proves (3. 18) for the case in which queue OJis being served in FQF at time t.Also, by letting t increase continuously to o;, it is clear that (3. 18) also holds if a service at queue o finishes at time t in FQF. Therefore, (3. 18) holds for a given a and t if ‘i” ~ t < @~ for some i. To handle the remaining possibilities, suppose that d: < t < a,; ~ for some i (interpreting O: = O). Then ti;s(o,

f) ~ i?;s(o> 0:) ~ #;QF(03 0:) =

&QF(O,

‘)

- pm.. -

‘mu

(the second inequality follows from the preceding paragraph; holds because queue o is not busy during [~;, t) in FQF). (3. 18) holds for every o and t. ❑ PROPOSITION

3.6.

For

every

[%(o~ PROOF.

By

Lemma

3.3,

the final equality This proves that

a and t. ‘) -

it suffices

#;QF(O~

‘)

to prove


R(t),

(

R

s

P: i= 1

= #:Q,(o,

+

%,,))

~ -’(%+J) + ~max

+ Pm,, t) + pm,. !

which gives (3. 25). Therefore our goal in the rest of the proof is to prove (3.30). The proof of (3.30) will be by contradiction. Assume, then, that s;+,* Observe first that (3. 31) implies

+ Pmax < l?(t). that F:+l

Also,

(3.31)

from (3.29) and the definition R(t)

< l?(t).

(3.32)

of v, we obtain
u,~,,, and consequently o;+, > ~,~,. So. since ~~+, s o;, we infer from th~ FQF decision at time a,: that F;+, ~ F;. . ,, Therefore: “ S;+ , > F?+,

– P~,X

z F“ – Pmax n“ >~(f) > R(o:)

– pm,. ,,

(by (3.33))

(by (3.31)

and (3.36)).

This says that in PS, service on the ( J + 1)th job of queue u does not begin until after (real) time o;, from which the claim (3.4 1) follows. Now we can put all of the pieces together. Combining (3.40) with (3.41) gives (3.38). Equations (3.37) and (3.38) together imply that 5 !&(o! ~=1

0;”)
~=1

which contradicts Lemma 3.1. This contradiction ❑ and so the proposition is proven. Propositions

%“)3

says that (3. 31) is impossible,

3.5 and 3.6 together prove Theorem

2.1 for the FQF discipline.

3.5. PROOFS OF THE OTHER RESULTS. In this section we use the main discrepancy bound to prove the bound on difference in delays (Theorem 2.2) and the results for random unbounded arrivals (Theorem 2.3). FQS and FQF will be treated simultaneously. PROOFOFTHEOREM2.2. Fix i and a. First assume that R(@;) > Fj” + P,n~X. We can assume that a huge job arrives at queue a immediately after the ith job arrives; specifically. assume Pi: ~ = Pm,, and ~1~~ = ~,” + ~. where O < e < P,” (this can be done because changing the (i + l)th job at queue a cannot affect either d; or F,”). Observe that since R( ~,fl ~) < Fin, we have S,:+ ~ = F,” and F,: ~ = F,” + P~,X. Therefore, by our initial assumption, ~: > R-1 (F%,); that is, at time R-l ( Fly, ), FQ has not yet finished service on the ith job of queue a, which means” that

#&@R-’(E.’i,))

< i,p~.

Also, + Pmax,

592

A. G. GREENBERG AND N. MADRAS

and therefore

which contradicts

Theorem

2.1. Therefore R((b;)

Now assume that &> nonempty in PS throughout

R(@~) +Pm,,. Since r~s~~, queue a must be the realtime interval [o:, R-l(F,”)). Therefore

= w - ~(w) > ~nmx’

~-’(r))

I.Es(w

(3 .42)

sF[a+Pmax.

and so

which again contradicts

Theorem

2.1. Therefore

F,” s R(cj;

) + Pm,,.

Equations (3.42) and (3.43 ) together prove (2.6). ❑ ate consequence of (2.6).

(3 .43) Equation

(2.7) is an immedi-

Before we proceed to the details of the proof of Theorem 2.3, we of the tools. Let X denote a nonnegative random variable with E( (K is a fixed number.) Let Xl, Xz. . . . denote a sequence variables (possibly dependent), each having the same distribution finiteness of the Kth moment implies that ~ Pr{XK> ~=1 This implies

an}

< m

forall

setup some XK ) < co. of random as X. The

a>O.

that ~ Pr{X~ ~=1

> (an)”K}

< ~

for all

a >0.

By the Borel-Cantelli lemma, this in turn implies that lim. +~ Xn /~1/~ = o with probability one. Finally, it is elementary to show that this implies that max{X,

,..

lim n-m

#/K

.,Xn}

=0

with probability one. Next, we define

(For simplicity, we will assume that N(t) is nondecreasing in t; otherwise. N(t) should be replaced by the smallest nondecreasing function that dominates it. ) The assumptions about the arriving jobs, as described in Section 2.3,

How

Fair

is Fair

Queuing?

guarantee that P(t) large t). Therefore,

593

is a maximum over at most AN(t) t jobs (for sufficiently using the result of the preceding paragraph, P(t) lim ~+~ (N(f)t)’K

(3 .44) = 0

with probability one. (Observe that there is no complication if the job sizes are chosen from a finite set of probability distributions, since the maximum of several random variables has finite Kth moment if each individual random variable has finite Kth moment.) Finally, observe that Theorems 2.1 and 2.2 hold when P~,X is replaced by P(t). PROOF OF THEOREM

2.3

(i) Using the observations of the preceding paragraph, we find that A(t) < P(t)/t. Since N(t) = O(t ‘-1), we have (N(t) f)llK = O(t), and so the result follows from (3.44). (ii) Similarly, ll(t)/tq s N(t) P(t)/tg. Since N(t) = O(f(qK- l)/(K+ 1)), we ❑ have N(t)P(t) = o(N(t)[N(t)t]lJ~) = O(tq). 4.

Monte

Carlo

Simulation

Results

Though our results comparing the PS and FQ systems establish that in the worst case neither lags far behind the other, we have no analysis quantifying the stochastic behavior of either system. Perhaps on average FQ is much closer to PS than the worst-case bounds indicate. Monte Carlo simulation results are presented in this section to help study this question. In a first series of experiments, we treated a system of N = 10 arrival streams, with the arrival times for each described by an independent Poisson process. The PS and FQ systems were simulated on the same arrival sequences. All jobs arriving to streams 1-5 had constant length 10 and all jobs arriving to streams 6 – 10 had constant length 1. To equalize the arrival rates in work per unit time, the average time between consecutive arrivals for the first group was ten times greater than that for the second group. The upper halves of Tables I and II show the results of these experiments, over a wide range of values for p, the aggregate rate at which work arrives to the system as a whole. Recall that time is counted so that the rate of the server is 1. The data were collected over runs of 107 jobs, with samples taken as averages over intervals of 106 jobs. In the tables, subscripts 1,5 and 6, 10 refer to streams 1-5 and 6-10, respectively, and the quantity D denotes job delay averaged over the entire run. Each term following * in the table is twice the standard deviation of the ten samples for the corresponding data point, and is shown only if the value exceeds 0.01. The quantity 811 denotes the average absolute difference between a job’s delay in the two systems. The data show that for all three queuing disciplines the group of streams generating the longer jobs (streams 1-5) incur higher delays. There is no oyerall winner among the three disGiphnes. At high load, p = 0.9, streams 1– 5 (large packets arriving infrequently) fare best in FQS, followed by FQF, followed by PS, while streams 6-10 (small packets arriving frequently) fare best in FQF, followed by PS, followed by FQS. A classical result about the M/G/l processor sharing queue is that the steady state average delay for jobs

594

A. G. GREENBERG AND N. MADRAS TABLE

I.

lST SERIES:FQS VS. PSSIMUL~TION AVERAGE DELAY RESULTS

Ps Q

D

0.1

11.05 12.34 1396 16.07 18.93 23.03 29.48 41.29 71.98

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

1,5

2c 0.02 +0.05 t 0.15 to.54 t4.50

1.0 1868.54 1.5 614’267.58 NOTE: there 10 and streams terms following data points was Section 4.

TABLE II.

FQS D 6, III

D 1,5

1.11 1.24 J.40 1.62 1.92 2.35 3.06 4.39 8.02 +0.03

9.44 * 0.07

9.74

10.31 10.70 1122 11.94 12.98 14.63 17.57 23.92 44.56

~0.02 *0.07 +0.32 ~ 3.47

1833.91 614248.69

FQF

130 167 2.13 2.73 3.51 4.61 6.25 9.03 ~0.01 15.35 *0.08

28.15 *0.64 29.00 t0.02

Db, ,0

12.13 13.43 1436 15.27 16.52 +0.01 18.56&0.03 22.32 ~0,10 30.51 kO.44 56.05 +415

1843.80 614249.94

1.14 1.44 1 82 226 2.76 3.31 3.94 4.69 5.94

6,18 6.23

are two groups oftive arrwal streams, with streams 1-5 generating packets of length 6-10 packets of length 1. The total arrival rate n O. D refers to average delay. The + are twice the standard deviation of the ten samples from which the corresponding derived. This value was suppressed if smaller than 0.01. For further explanation, see

lSTSERIES: FQSVS.PSSIMULATION AVERAGE ABSOLUTE DELAYDISCREPANCY RESULTS IPS-FQSI

P

D 1,5

D 6, 10

6D1,5

IPS-FQFI 8D6, ,0

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

074 1.64 2.74 413 595 8.40 11.91

*0.02 17,36* 0.05 27,42 AO.13

0.24 0.53 0,87 1,28 180 2,48 3.43 4.88 7.51 *0.01

1.0 1.5

3472 35.85

18.71 *0.31 19,66 + 0.01

NOTE: The results were derwedfrom delay discrepancy.

13D1,5

C$D6,,0

1.58 2.35 2.86 3.47 4,42 5.88 8.06 11.29 *0.02 16.15 ~0.03

0.16 0.42 0.71 102 13’2 1.63 1.97 2.55 4.34

24.73 28.25

4.32 fO.01 4.42

thesame runs asin Table I. 6Ddenotes

the average absolute

of a given length is proportional to that length [15], meaning ~,5/~G,~O should be near 10.0. The datd show this to be approximately true for the PS system, unless the load is heavy. (As noted above, the M/G/1 processor sharing queue differs from PS in that all jobs present enter service, not just those at the heads of the lines.) We also investigated the behavior of the two systems in overload conditions, where p exceeds 1, and so some of the queues must saturate. As before, the arrival times of all streams are described by independent Poisson processes, with streams 1– 5 statistical y identical, and streams 6– 10 statistically identical. We fixed aggregate arrival rate for streams 6-10 at 1/4 (work per unit time), and the aggregate arrival rate for streams 1– 5 at p – 1/4, The data were collected over runs of 106 jobs, with samples taken over intervals of 105 jobs.

How

Fair

is Fair

Queuing?

595

Each term following i- in the table is twice the standard deviation of the ten samples for the corresponding data point, and is shown only for queues 6– 10 and then only if the value exceeds 0.01. The results are given in the lower halves of Tables I and II. For all three disciplines, the data reflect the fact that streams 1–5 saturate with delays increasing with time, and that streams 6– 10 are relatively insensitive to the overload traffic. This behavior is crucial in the application to data networks, where it is very desirable that well-behaved users are protected from other users generating packets at a rate exceeding their fair share (perhaps because of malfunctioning hardware or software). Alternatively, the saturated streams can be thought of as file transfers and the other streams as remote logins, in which case we see that the remote logins are automatically isolated and not unduly disturbed by the file transfers. For the stable group (streams 6-10) FQF attains the smallest average delays, followed by PS, followed by FQS. The results show that the FQ disciplines behave quite similarly to PS. The average delay discrepancies between the FQ disciplines and PS are much less than the worst case value, NP~,X = 10 “ 10 = 100, though these values increase with the load. In another series of experiments, we investigated the sensitivity of the two systems to the probability distribution describing the job sizes. We especially wanted to get some idea of how the average job delay depends on the coefficient of variation (ratio of the variance to the square of the mean) of this distribution. To hold the average job size fixed at 1 while varying the coefficient of variation C2, we assumed a convenient two point distribution in which the probability 1 – 1/x the job size was chosen to be 1/2 and with probability 1/x was chosen to be ( x + 1)/2, where x is set to achieve a given coefficient of variation: C2 = ( x – 1)/4. As before, N = 10 Poisson streams were simulated, over runs of 107 jobs, with samples taken over intervals of 106 jobs. Each term following + in the table is twice the standard deviation of the ten samples for the corresponding data point, and is shown only if the value exceeds 0.01. The results are given in Table III. The quantity a represents the standard deviation of the job delay. The data show that for small C* ( C2 = 0.5, 1.0), the FQ disciplines achieve smaller mean and standard deviation of delay, and at greater C2 (C2 = 5.0), PS does somewhat better than FQF, which in turn does slightly better than FQS. As might be expected from classical results about the M/G/ 1 processor sharing queue the PS discipline appears least sensitive to Cz [15].

5. Conclusions In this paper, we discussed two versions of a novel fair queuing discipline with important applications to data networks and to systems where the cost of preempting jobs from service is high. The fairness of these disciplines was rigorously established via sample path comparisons with the head-of-line processor sharing discipline, a mathematical idealization that provides a fairness paradigm. An efficient implementation of one of the two fair queuing disciplines was presented. In addition, the results of Monte Carlo simulations was reported giving an absolute frame of reference for comparing the disciplines over a wide range of traffic conditions.

596

A. G. GREENBERG AND N. MADRAS

TABLE III.

2NDSERIES: FQSVS.FQFVS.PSSIMULATION RESULTS FQS

Ps C2

p

0.5

0.2

D

D

a

0.5 0.4 1.64 0.5 0.6 2.40 0.5 0.8 4.56

1.01 1.51 2.47 5.27 *0.02

1.0 1.0 1.0 1.0

0.2 0.4 0.6 0.8

1.25 1.67 2.50 4.99

1.36 1.98 3.21 6.90 + 0.6

50 5,0 5.0 5.0

0.2 0,4 0.6 0.8

1.32 1.91 3.29 8.43 +012

1.24

FQF D

a

1.19 1.50 2,12 3.99 1,25 1.67 2.50 4.99

3.02 175 4.48 3.00 7.78*0.06 5.50 18.74+ 1.40 12.96*0.12

u

0.88 1.20 1.96 4.54 ?0.02

1.19 1.49 2.03 3.51

0.99 1.35 2.03 4.42 * 0.02

1.24 1.68 2.70 6.19+0.06

1.26 1.65 2.29 4.02

1.43 1.86 2.67 5.71 *0.06

3.32 5.00 8.45 +0.05 19.61t 1.40

1.81 2.78 4.24 8.13 *0.08

3.75 4.64 6.51 +0.04 14.74* 1.41

NOTE: D refers to average delay, u tostandard deviation of delay. Thetotal arrival rate is p, and the packet lengths have coefficient of variation C’z. The terms following + are twice the standard deviation of the ten samples from which the corresponding data points was derwed. This value was suppressed if smaller than 0.01. Fur further explanation, see Section 4.

Recently, we learned of another variation on fair queuing due to Heybey and Davin, which agrees with the disciplines just described except in the computation of R (S. Shenker, personal communication). Instead of using (2.1) to compute R, they propose simply setting R to the F’ value assigned to the last job served, still using the update equations (2.2) and (2.3). (R may be set to O if no jobs are present. ) There are examples where under this discipline the discrepancy I lP~(O, t) – h~~(O, t) I is arbitrarily close to 2PmaX;we suspect that it cannot be larger.

Appendix:

Generalized

Service Disciplines

The PS discipline queues as follows.

can be generalized to provide discriminatory service to the Each queue a, CY= 1, . . . . N, is assigned a positive weight ‘a” ‘et l{ Q.w>O} equal 1 if queue a is nonempty at time t, and O otherwise. Under the generalized PS discipline, the job at the head of a given queue 6 receives service at instantaneous rate

Just as PS is the limit of the time-slicing service discipline as the time quantum becomes small, generalized PS is the corresponding limit of prioritized timeslicing, where queue a is allotted Wa time quanta per cycle. The virtual time clock R(t) for generalized PS is determined by 1 R(t) = max The counterparts

of (2.2)

{

1, Zfl=lW’.l~Qa(tJ>O}

and (2.3)

giving

the virtual

}“ starting

and finishing

How

Fair

is Fair

Queuing?

597

times of each job are

where S; = F; = O. The generalized fair queuing disciplines FQS and FQF are the same as the originals except that recurrence relations (A 1) and (A2) are used rather than (2.2) and (2.3). The proofs of consistency for the generalized FQ disciplines are essentially the same as for the original disciplines. However, we have not proven analogous discrepancy bounds for the generalized disciplines. We expect the bound of Theorem 2.1 to hold with P~~X replaced by Pmax(v)

wa max

(

:v=I,

. . ..N.

Wv

1

where P~,X( v) is the largest job to arrive at queue v, ACKNOWLEDGMENTS .

It

Keshav

enlightening

discussions

of

and

Ko

helpful

Fleming, are

very

packet

for

several

Sam grateful sizes

Morgan,

is

to Hemant

is desirable

a pleasure Keng-Tai Kanakia

in data

to

for

thank for

Scott

explaining

Shenker

their

paper

comments. why

support

and [5],

Srinivasan and

Philip

In addition, for

a range

we of

networks.

REFERENCES 1. AHO, A., HOPCROFT,J., AND ULLMAN, J. -1 e.

3.

4. 5.

6. 7. 8. 9. 10.

The Design and Analysis of Computer Algorithms. Addison-Wesley, New York, 1974. ARNOULD, E. A.. BITZ, F. J., COOPER,E. C., KUNG, H. T., SANSON, R. D.. AND STEENKLSTE,P. A. The design of nectar: A network backplane for heterogeneous multicomputers. In AS’PLOS III: The 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (Boston, Mass., Apr.) ACM, New York, 1989, pp. 205-216. (Published in Computer Architecture News 17, 2 (April 1989), Operating Systems Review 23, Special issue (April 1989), and SIGPLAN Notices 24, Special issue (May 1989)). CHHJ, D.-M., JAJN, R., AND RAMAKRISHNAN, K. K. CongestIon avoidance in computer networks with a connectionless network layer. Tech. Rep. TR-507 ,508,509,510. Digital Equipment Corporation, Maynard, Mass., 1987 (This is a four part series. ) CIDON, I., AND GOPAL. L S. Par%: An approach to integrated high speed private networks. Znt, J. Digit. Analog Cabled Syst. 1, 2 (Apr.-June 1988), pp. 77-86. DEMERS, A., KESHAV, S., AND SHENKER, S. Analysis and simulation of a fair queueing algorithm. In SIGCOMM ‘ 89 Symposium: Communications Architectures & Protocols (Austin, Tex., Sept. 19-22). ACM, New York, 1989, pp. 1-12 (published as Comput. Commun. Rev. 19, 4; revised version J. Znternet. Res., Experience I, 1 (Sept. 1990), 3-26). FRASER,A. G. Towards a universal data transport system, IEEE J. Select. Areas in Commun., SAC-Z (1983). 803. FRASER, A. G., AND MORGAN, S. Queueing and framing disciplines for a mixture of data traffic types. AT&T Bell Lab. Tech. J. 63, 6 (1984), 1061-1087. GELENBE, E., AND MITRANI, 1, Analysis and synthesis of Computer Systems, Academic Press. New York. 1980. HAHNE, E. Round robin scheduling for fair flow control in data commumcation networks. PhD dissertation, Massachusetts Institute of Technology, Cambridge, Mass., December 1986 (published as report LIDS-TH-1631, Laboratory for Information and Decision Systems). HUANG, A., AND KNAUER, S. Starlite: A wideband digital switch. In GLOBECOM ‘ 84 (Atlanta, Ga., Nov.). IEEE, New York, 1985. pp. 121-125.

598

A. G. GREENBERG AND N. MADRAS

11. KANAKIA, H. Yswitch. A glgablt packet switch architecture. In preparation. The VMP network adapter board (NAB) High-performance 12. ICANAKIA, H. AND CHERITON, D. R. ’88 S.WRDOSiUIn: communications network commumcatlon for multiprocessors. In SIGCOMM Architectures and Protocols (Stanford, Cahf., Aug. 16-19). ACM. New York, 1988. pp. 175-187. 13. KATEVENIS, M. Fast switching and falrcontrol ofcongested flow ln broadband networks IEEE J. Selected Areas in Comrnun. 5,8(1987), 83-99. SIGCOMM ’87 14. KENT. C A., ~NDMoGuL, J. C. Fragmentation considered harmful. In ACM W’orkshop:F rontiersin ComputerC ommunicarions Technology, (Stowe, Vt., Aug. 11-13) ACM, New York, 1987, pp. 390-401. 15. KLErNROC~, L. Queueing Systems, Voltrme 2: Computer Apphcat10ns. Wdey, New York. 1976 16. MORGAN, S. Queumng disciplines and passive congestion control in byte-stream networks. In IEEE INFOCOMM ‘ 89 (Ottawa, Ontario, Canada, Apr. ) IEEE Computer Society Press, Washington, DC., 1989, pp. 711-720. 17. NAGLE. J. On packet switches with infinite storage. IEEE Trans Commzm. 35 (Apr 1987). 435-438. 18 SHENKER. S Making greed work m networks. A game-theoretic analysls of gateway serwce disciplines (abridged version proofs excised). Unpublished 19 TANENBAUM, A. S. Computer Networks 2nd edition Prentice-Hall, Inc., Englewood Cliffs, N. J., 1988. 20. ZHANG, L. A new architecture for packet swltchmg network protocols. Ph D dissertation. Massachusetts Institute of Technology, Cambridge, Mass., 1989 RECEIVED

JANUARY

Journal of the Axoclat,on

1990;

REVISED

for Computmg

AUGUST

M.chmcry,

1990

Vol

AND

JANUARY

39, No. 3, July 199’2

1991 ;

ACCEPTED

FEBRUARY

1991