Competitive Distributed File Allocation (Extended Abstract)

Baruch Awerbuch∗   Yair Bartal†   Amos Fiat†

Abstract

This paper deals with the file allocation problem [BFR92], concerning the dynamic optimization of communication costs to access data in a distributed environment. We develop a dynamic file re-allocation strategy that adapts online to a sequence of read and write requests whose locations and relative frequencies are completely unpredictable. This is achieved by replicating the file in response to read requests and migrating the file in response to write requests, while paying the associated communication costs, so as to be closer to processors that access it frequently.

We develop the first explicit deterministic online strategy assuming the existence of global information about the state of the network; previous (deterministic) solutions were nonconstructive and more expensive. Our solution has an (optimal) logarithmic competitive factor. The paper also contains the first explicit deterministic data migration [BS89] algorithm achieving the best known competitive ratio for this problem. Using a somewhat different technique, we also develop the first deterministic distributed algorithm (using only local information) with a poly-logarithmic competitive factor against a globally-optimized optimal prescient strategy. This algorithm can be implemented with a small amount of local space.

1 Introduction

∗ Lab. for Computer Science, MIT, Cambridge, MA 02139. Supported by Air Force Contract TNDGAFOSR-86-0078, ARO contract DAAL03-86K-0171, NSF contract CCR8611442, and a special grant from IBM. E-mail: [email protected].
† Department of Computer Science, School of Mathematics, Tel-Aviv University, Tel-Aviv 69978, Israel.

1.1 The Problem

This paper is concerned with finding an efficient strategy for replicating and migrating data in the presence of a dynamic pattern of reads and writes (details in Section 2). This is a well-studied problem in distributed systems [DF82]. In order to achieve maximum efficiency, we need to maximally exploit the locality of reference in the underlying distributed program, and combine it with the spatial locality of the underlying network architecture. More specifically, if many read requests to a specific file are issued in some vicinity, it is advisable to copy the relevant file to, or near, that vicinity. However, this should be balanced against the relatively greater cost of moving an entire file versus the smaller cost of transferring only the data being read. On the other hand, if the file is frequently written, then it seems advisable to remove all replicas but one, and to locate this replica close to the vicinity of the writes. In this paper, we avoid making any statistical assumptions about the location or frequency of read and write requests. The above conflicting heuristics must somehow be balanced


in a dynamic fashion, so that the resulting algorithm automatically adapts to changing access patterns. The goal here is to find an "almost optimal" dynamic policy, rather than to "learn" the best "static" file assignment. As a measure of performance we use the "competitive ratio", defined as the ratio between the costs incurred by an online algorithm and the costs expended, on the same sequence of events, by an optimal dynamic policy (referred to as the "adversary") that has perfect knowledge of the future. To capture additional subtleties arising in distributed systems, such as the need for local control, the standard definition of competitiveness, as introduced by Sleator and Tarjan [ST85, KMRS88], needs to be refined. Below we briefly elaborate on this important issue.

1.2 The centralized versus distributed models

Centralized model. In this "standard" setting, as introduced in [ST85, KMRS88], the online algorithm is fully aware of the global state of the system, which consists of the current configuration (i.e., the positions of all file replicas) and the current input, but is unaware of the future sequence of input requests. In other words, we assume some "daemon" that keeps track of the migrating, replicating, and dying populations of files in the network. In particular, we assume that the daemon tells processors how to go about finding the closest current copy of every file, how to update all replicas after a write, etc. More importantly, the daemon coordinates the decisions made by different processors.

Distributed model. In a more realistic distributed setting, such a daemon does not exist. Decisions are made based solely on local information. It is thus up to the online algorithm to find out (at a cost) the relevant part of the global state necessary to make a decision; the only information that the algorithm has "for free" is the past local input. An additional handicap imposed on the online distributed algorithm is that it is evaluated against an offline adversary that does not pay for the overhead of control needed to make an intelligent decision. Consequently, the task of designing efficient online distributed algorithms is quite challenging, and cannot be taken for granted. While certain techniques have been developed in the literature in order to competitively locate mobile objects based on local information [AP89, BFR92], it is hard (and sometimes provably impossible) to "localize" a global-knowledge decision strategy without compromising performance. Consequently, a "good" global-knowledge online algorithm may prove to be absolutely useless in the distributed environment. For example, the "existential" method of [BBK+90] for derandomizing online algorithms is based on obtaining global information and thus cannot be efficiently implemented in a distributed setting (competitive factor of Ω(n)).

1.3 Existing results

The management of data in a multiprocessing environment has been extensively studied, from both the theoretical and the practical standpoints. The 1981 survey paper by Dowdy and Foster [DF82], dealing with the file allocation (or assignment) problem, cites close to a hundred references. The first competitive algorithms for special cases of the centralized version of the problem were found by Black and Sleator and by Westbrook [BS89, Wes]. The file allocation problem may be viewed as the combined solution to the two subproblems defined in [BS89]. Other special cases of the problem have been considered in [CLRW, WY]. In [BFR92], randomized algorithms were developed for general networks, with a competitive factor of O(log n) for the centralized problem and O(log⁴ n) for the distributed problem. The existential construction of [BBK+90] can be used to obtain an O(log² n)-competitive deterministic centralized online "algorithm". One of its disadvantages is its exponential computational cost, which makes it unattractive from a practical perspective. Since this algorithm is based on obtaining global information, it cannot be efficiently implemented in the distributed setting (competitive factor of Ω(n)).

1.4 Summary of our results

In this paper, we develop the following results:

• Centralized: we develop an optimal deterministic algorithm with an O(log n) competitive ratio. (Previously, only existential constructions were known, with an O(log² n) ratio.)

• Distributed: we develop the first deterministic solution, with an O(log⁴ n) competitive ratio. (Previously, only randomized constructions were known.)

One of the corollaries of our centralized algorithm is a simple deterministic 7-competitive Move-To-Min (MTM) data migration algorithm, which achieves the best known competitive ratio.

1.5 Structure of the paper

In Section 2 we give a somewhat more formal statement of the problem. In Section 3 we present the building blocks of our solution, which deal with two special cases of the problem: the first is a simple potential function analysis of the greedy Steiner tree heuristic, and the second is a simple deterministic 7-competitive data migration algorithm, which achieves the best known competitive ratio for that problem. In Section 4 we present a deterministic centralized algorithm which is O(log n)-competitive, based on the basic ideas presented in Section 3. This algorithm is optimally competitive (up to a constant factor) for general network topologies; its analysis is given in Section 5. In Section 6 we present a different, distributed, deterministic algorithm which is O(log⁴ n)-competitive. The analysis of this algorithm is sketched in Section 7.

2 Problem statement

Network model. The underlying network topology is modeled by a weighted undirected graph (also referred to as a "weighted graph" or "metric space") where processors are represented by vertices, and edge weights represent transmission costs between the corresponding pairs of vertices. The weighted graph need not obey the triangle inequality, but a natural metric space can be defined where the points are processors and the distance between two points is equal to the length of the shortest path between the corresponding processors in the weighted graph.

Input/Output. The algorithm receives, as input, a sequence of read and write requests issued at different network nodes. As output, it produces a sequence of responses, each response changing the configuration, namely the set of network nodes currently holding replicas of the file. In the case of a distributed online algorithm, the configuration also includes the various distributed data structures maintained by the algorithm. The response is based on the information available to the algorithm. In the case of an offline algorithm, this information consists of the past and future inputs. In the case of a centralized online algorithm, it consists of the past inputs and the current configuration. In the case of a distributed online algorithm, it consists of the past local inputs and the part of the current global configuration explicitly communicated to the location at which the decision is made.

Data is assumed to be organized in indivisible blocks such as files (or pages). It can be accessed via communication links by paying a charge equal to the data transfer size times the distance traversed (in the above metric space). Individual words or records can be read or written remotely, but a file cannot be split among processors. Files may be replicated at various processors throughout the network, but copy consistency must be maintained. In other words, all file replicas must be identical and must reflect all write transactions performed so far. Each read must return the most updated value, and the value written must be propagated to all replicas.

Costs. The total cost expended by a centralized (online or offline) algorithm consists of the cost incurred by configuration changes, plus the cost of serving the input requests. The costs of configuration changes are as follows:

• The cost of erasing a file replica is zero.

• The cost of replicating a file from one location to another is the cost of communication between the two locations, times the file size D (which may be quite large).

The costs of request service are as follows:

• The cost of serving a read request is the cost of communication from the location of the request to the closest replica. (The minimum cost is obtained by communicating over the shortest path.)

• The cost of serving a write request is the cost of communication from the location of the write request to all the replicas. (The minimum cost is obtained by communicating over a minimum-weight tree spanning the above locations, i.e., a Steiner tree.)


In the case of a distributed online algorithm, the cost also incorporates the cost of control, i.e., the maintenance of the distributed data structures. Each time a control message is sent, the online algorithm pays the number of bits in the message times the cost of the path traversed by that message.
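To make the cost model above concrete, here is a small self-contained Python sketch (function names are ours, not the paper's). Since computing a minimum Steiner tree is NP-hard, the write cost uses the classical MST-over-the-metric-closure 2-approximation:

```python
# Illustrative sketch of the cost model of Section 2 (names are ours, not the
# paper's).  Write costs use the classical MST-over-metric-closure
# 2-approximation of the minimum Steiner tree.
def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on a weighted undirected graph given as {(u, v): w}."""
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        d[u][v] = d[v][u] = min(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def read_cost(d, replicas, r):
    """Serving a read: communicate with the closest replica."""
    return min(d[r][p] for p in replicas)

def write_cost(d, replicas, w):
    """Serving a write: reach all replicas; Prim's MST on the metric closure
    approximates the minimum Steiner tree of {w} plus the replica set."""
    nodes = sorted(set(replicas) | {w})
    in_tree, cost = {nodes[0]}, 0
    while len(in_tree) < len(nodes):
        c, v = min((d[a][b], b) for a in in_tree
                   for b in nodes if b not in in_tree)
        in_tree.add(v)
        cost += c
    return cost

def replication_cost(d, D, src, dst):
    """Replicating a file of size D from src to dst."""
    return D * d[src][dst]
```

For example, on a path 0–1–2 with unit edges and a single replica at node 0, a read at node 2 costs 2, while replicating a file of size D = 5 to node 2 costs 10.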

3 The building blocks of the solution

This section presents the two basic ideas behind the deterministic algorithm of Section 4. It consists of two parts. In the first part, we consider a special case of the problem where D = 1 and only read requests are issued. This version of the problem is equivalent to the on-line Steiner tree problem, studied by [IW].

3.1 A Proof for the Greedy Steiner Tree

The greedy Steiner tree algorithm. The greedy Steiner tree algorithm connects a new point to the closest point already in the tree. To this algorithm corresponds a simple replication algorithm (for the read-only case): replicate from the closest replica. In this subsection we give a simple potential function proof for the competitiveness of the greedy on-line Steiner tree algorithm, and thus for read-only replication. (An alternative proof is given by [IW].) The intuition behind this potential function is the basis for the more complex potential function used to prove the main result.

Theorem 1 The greedy Steiner tree algorithm is strictly O(log n) competitive for any weighted graph over n vertices.

For the proof, we need to introduce the following notation.

Notation. In the context of some weighted graph G, the distance between two vertices p, q of G is denoted d(p, q). For a subset of vertices Q and a vertex p, min_{q ∈ Q} d(q, p) is denoted by d(Q, p). For a subset of vertices Q of G, T(Q) denotes the weight of the minimal Steiner tree spanning Q. T(Q) is also used to denote the tree itself; the meaning should be clear from the context. When S is a tree, T(S) simply denotes the weight of the tree. The k-neighborhood of a vertex v is the set of all vertices u such that d(u, v) ≤ k.

For the definition of the potential function we need a way to form a tree cover, i.e., a partition of a subtree S in a graph into O(T(S)/k) subtrees of diameter at most k each, where k is some integer. We shall denote these subtrees by C₁, C₂, …, C_t, and in each subtree we choose one covering processor, denoted p₁, p₂, …, p_t, respectively. In the general case we shall need this where the subtree S is dynamically growing in the graph. This tree cover problem is a restriction of the cover problem of [BFR92].

Proof Sketch of Theorem 1. Let A denote the adversary's optimal Steiner tree, and B denote the greedy on-line subtree. Define log n levels of tree covers of the adversary's tree A, for k = 2^i, for i's such that T(A)/n < 2^i ≤ T(A), and let p^i_1, p^i_2, …, p^i_{t_i} be the covering processors in the i'th cover level. Informally, the potential function will hold a credit of 4k for any covering processor such that the on-line subtree has not reached its k-neighborhood. The potential function Φ is the sum of these credits over all levels and covering processors, which we will call Ψ, plus Δ = 4 · T(A)/n · (n − |B|), which can be viewed as spare credit used to deal with requests of small cost to the greedy algorithm. Clearly Φ ≥ 0. Since the number of covering processors in level i = log k is O(T(A)/k), and each has a credit of at most 4k, the total credit in the i'th level is O(T(A)); also, obviously, Δ ≤ 4 · T(A), and thus initially Φ ≤ O(log n) · T(A). We need to show that the potential function decreases by at least the on-line's cost for adding a new vertex v to its subtree, d(B, v). Let 4k be the smallest power of two such that 4k ≥ d(B, v). If k ≤ T(A)/n then Δ decreases by at least 4k. Otherwise, there was no vertex of the on-line subtree in v's 2k-neighborhood. Since v ∈ A, there is a covering processor p in the i = log k level at distance at most k from v. It follows that p's k-neighborhood now contains a new vertex of the on-line subtree, and thus the potential function decreases by at least 4k. □
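The greedy rule analyzed above is straightforward to state in code. The following minimal Python sketch (ours, not the paper's) accumulates the on-line cost d(B, v) of attaching each new terminal:

```python
# Minimal sketch (ours) of the greedy on-line Steiner tree heuristic:
# connect each new point to the closest point already in the tree.  For
# read-only file allocation this models replicating from the closest replica.
def greedy_online_steiner(dist, root, requests):
    """dist[u][v] is the shortest-path distance; returns the total greedy
    cost, which by Theorem 1 is within O(log n) of the optimal Steiner tree
    spanning the requested vertices."""
    tree = {root}
    total = 0
    for v in requests:
        total += min(dist[u][v] for u in tree)  # cost d(B, v) of joining v
        tree.add(v)
    return total
```

Note that the cost depends on the request order: on a unit-weight path 0–1–2 rooted at 0, requesting 2 before 1 costs 3, while requesting 1 before 2 costs 2.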

3.2 A Deterministic Data Migration Algorithm

While the Steiner tree problem models a special case of the read-only file allocation problem, the data migration problem is equivalent to file allocation with only write requests. We give a simple deterministic algorithm for data migration on arbitrary topologies.


Algorithm Move-To-Min (MTM). The algorithm divides the request sequence into phases. Each phase consists of D consecutive write requests, at processors w₁, w₂, …, w_D. During a phase the algorithm does not move the copy of the file. At the end of the phase, it migrates the copy to a processor m in the network such that the sum of the distances from m to the wᵢ's is minimized.

Theorem 2 Algorithm Move-To-Min is 7-competitive for data migration on arbitrary network topologies.

We use the following notation.

Notation. Let w₁, w₂, …, w_D be D points in the metric space. For any point q in the metric space, ∑ᵢ d(q, wᵢ) is denoted by Σ_q.

Proof Sketch. We analyze the performance of the algorithm in a phase. Let a = a₀ denote the position of the adversary's copy at the beginning of a phase, and let aᵢ, 1 ≤ i ≤ D, denote its position after the i'th request of the phase. Also let b denote the position of the on-line copy at the beginning of the phase. Clearly the cost for MTM during the phase is

Cost_MTM = Σ_b + D · d(b, m) ≤ Σ_a + D · d(b, a) + D · d(b, m).

The adversary's cost during the phase is

Cost_Adv = D · ∑ᵢ d(aᵢ₋₁, aᵢ) + ∑ᵢ d(aᵢ₋₁, wᵢ).

It follows from the triangle inequality that Cost_Adv ≥ Σ_{aᵢ} for all i. Thus we also get

D · d(aᵢ, m) ≤ Σ_{aᵢ} + Σ_m ≤ 2 · Σ_{aᵢ} ≤ 2 · Cost_Adv.

We now use the potential function Φ = 2D · d(a, b). We show that over a phase it increases by at most 7 times the adversary's cost and decreases by at least the on-line's cost. Let a′ = a_D. Then

ΔΦ = 2D · (d(a′, m) − d(a, b))
   ≤ D · (2d(a′, m) + d(a, m) − d(b, m) − d(a, b))
   = 2D · d(a′, m) + D · d(a, m) + Σ_a − Σ_a − D · d(b, m) − D · d(a, b)
   ≤ 7 · Cost_Adv − Cost_MTM,

where the first inequality uses the triangle inequality d(a, b) ≥ d(b, m) − d(a, m), and the last step applies the bounds above on 2D · d(a′, m), D · d(a, m), and Σ_a, together with the bound on Cost_MTM. □

4 A Deterministic File Allocation Algorithm

This section describes the general algorithm for the file allocation problem (FA). We assume that initially the on-line algorithm holds a single copy of the file in the network; otherwise we delete all copies but one, incurring no additional cost.

4.1 The description of the algorithm

As in the write-only case, the algorithm partitions the request sequence into phases of D consecutive write requests. Within each phase, the algorithm acts only on read requests. At all times the algorithm maintains a partial list L of previous requests. Upon receiving a new read request initiated by processor r, add r to L. If L consists of more than D read requests, consider the smallest k = 2^i-neighborhood of r containing D requests. If the algorithm does not hold a copy of the file at distance less than 8k from r, replicate a copy of the file to r from the closest processor to r holding a copy, and remove the D requests from L. When a phase ends, let w₁, w₂, …, w_D be the positions of the D writes initiated during the phase. Let m be a processor such that the sum of the distances from m to the wᵢ's (i.e., Σ_m) is minimized. Now, replicate a copy of the file to m from the closest processor to it holding a copy, and then delete all copies of the file except the one at m. This procedure is called the reorganization step.

Theorem 3 Algorithm FA is O(log n)-competitive on arbitrary network topologies.

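The phase structure of algorithm FA can be sketched in code. The following is our own simplified, illustrative Python rendering of the description above (a centralized simulation; writes are charged the sum of distances to all replicas rather than a Steiner tree, and tie-breaking is arbitrary):

```python
# Simplified sketch (ours) of algorithm FA in the centralized model.
# dist: all-pairs shortest-path distances as a dict of dicts,
# D: file size, requests: list of ('r', node) / ('w', node) pairs.
def run_fa(dist, D, requests, start):
    replicas = {start}
    L = []          # recent read requests not yet charged to a replication
    writes = []     # write positions in the current phase
    cost = 0
    for kind, p in requests:
        if kind == 'r':
            cost += min(dist[p][q] for q in replicas)  # serve from closest copy
            L.append(p)
            if len(L) > D:
                # smallest k = 2^i-neighborhood of p containing D requests
                k = 1
                while sum(dist[p][q] <= k for q in L) < D:
                    k *= 2
                d = min(dist[p][q] for q in replicas)
                if d >= 8 * k:          # no copy at distance less than 8k
                    cost += D * d       # replicate to p from the closest copy
                    replicas.add(p)
                    for q in [q for q in L if dist[p][q] <= k][:D]:
                        L.remove(q)     # drop the D charged requests from L
        else:
            # a write updates every copy (charged here as a sum of distances;
            # the paper charges a Steiner tree of the replicas)
            cost += sum(dist[p][q] for q in replicas)
            writes.append(p)
            if len(writes) == D:        # phase ends: reorganization step
                m = min(dist, key=lambda q: sum(dist[q][w] for w in writes))
                cost += D * min(dist[m][q] for q in replicas)
                replicas = {m}          # keep only the copy migrated to m
                writes = []
    return cost, replicas
```

This sketch is meant only to make the phase bookkeeping tangible; the competitive analysis in Section 5 of course refers to the algorithm as stated in the text.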
5 Proof of the Deterministic File Allocation Algorithm

This section gives the proof for the general file allocation algorithm (FA) of Section 4 (see Theorem 3). We prove Theorem 3 using a potential function argument. We analyze the change in the potential function Φ after each operation within a phase, and show that during the phase the potential function increases by at most O(log n) times the adversary's cost and decreases by at least the on-line's cost, which concludes the theorem. The analysis is separated into four main parts. First we prove that when the adversary changes its configuration, Φ increases by at most O(log n) times its move cost. Then we deal with the change in the potential on serving read requests, and when FA replicates a copy of the file. Finally we analyze the change in the potential function at the end of a phase, proving that the potential increases by at most O(log n) times the cost the adversary incurred during the phase and decreases by at least the cost for FA on the write requests and for the reorganization step taken at the end of the phase.

Throughout the proof we use A to denote the adversary's configuration, and B the on-line configuration. We define an intermediate adversary, which we shall refer to as Int. At the beginning of a phase the intermediate adversary holds a minimum Steiner tree spanning the copies held by the original adversary and the single copy held by the on-line algorithm. During the phase the intermediate adversary adds to its subtree the copies the adversary replicates to, by adding the edges along which the adversary has replicated, but the intermediate adversary never drops copies. Thus, during a phase the intermediate adversary pays at most the cost of the original one on each operation. The subtree of this adversary is denoted A.

5.1 The Potential Function

The potential function for FA, Φ, is composed of three main components, Φ = Ψ + Λ + Θ, each responsible for a different part of the analysis.

Ψ is the main component, used to analyze read requests. In the definition of Ψ we need a tree cover of the intermediate adversary's subtree A. As in Section 3, we have log n levels of such tree covers, for cover parameters k = 2^i, for i's such that T(A)/n < 2^i ≤ T(A). Hence at each level we have O(T(A)/k) processors covering the cover subtrees. Ψ is composed of two subcomponents, Ψ = Ψ₁ + Ψ₂. Ψ₁ deals with the analysis of serving read requests. For each covering processor at the log(k/4) level such that its nearest on-line copy is outside its 8k-neighborhood, let l₁ be the number of read requests in L in its k/2-neighborhood; we give this processor a credit of 32k · (D − l₁). Ψ₁ is the sum of these credits. Note that each of these credits is nonnegative: if for some covering processor p we had l₁ > D, there would have been more than D requests in p's k/2-neighborhood, and thus when the request before the last one in this neighborhood occurred, it had at least D requests in its k-neighborhood (since the previous one is contained in it), and its nearest on-line copy would be at distance at least 8k − k/2 > 4k from it. Therefore algorithm FA should have replicated to that request's position, at distance at most k/2 from p — a contradiction. Similarly, Ψ₂ deals with the on-line algorithm's replication step. For each covering processor at the log k level such that its nearest on-line copy is outside its 2k-neighborhood, we give a credit of D · 16k. Again, Ψ₂ is the sum of all these credits.

The Λ component of the potential takes care of read requests of small cost to the on-line algorithm, and is also divided into two parts, Λ = Λ₁ + Λ₂. Let l₂ be the total number of requests in L. Note that l₂ is bounded by n · D, since after D requests at a single processor, the algorithm replicates to that processor. The potential is defined by Λ₁ = 128 · T(A)/n · (n · D − l₂) and Λ₂ = 32D · T(A)/n · (n − |B|), where Λ₁ manages the service of such requests, while Λ₂ deals with replications in response to them.

Finally, Θ = D · T(B′), where B′ is the subtree spanning B implied by the on-line replications. Θ is required for dealing with the cost of write requests.

5.2 Analysis

Lemma 4 The total change in Φ due to the adversary's replications and deletions during a phase is at most O(log n) times the cost of these operations.

Proof. Clearly, if the adversary deletes copies, the potential function does not change, since it is only a function of the intermediate adversary's configuration A and the on-line's configuration B. If the adversary performs a replication from some processor p ∈ A to a subset Q of the processors, then the intermediate adversary does the same thing, thus causing a change in Ψ and Λ. Obviously Λ increases by at most O(1) times the adversary's cost, since D · T(A) increases by at most this cost. Ψ associates O(2^i) · D credits with a covering processor at the i'th level of the tree cover of A; thus adding such a processor increases the potential by O(2^i) · D. Let T be the largest power of two smaller than the final value of T(A). At each of the final log n levels, for i's such that T/n < 2^i ≤ T, there are O(T/2^i) covering processors, and thus these levels cause a total increase of O(log n) · D · T in the potential. Each level i such that 2^i = T/(n · 2^j), for j ≥ 1, has been part of the potential function only while the weight of the subtree of the intermediate adversary was smaller than T/2^j, and thus there were at most O((T/2^j)/2^i) covering processors at the i'th level, each increasing the potential function by O(2^i) · D, so the total increase due to this level is O(D · T/2^j). Summed over all j ≥ 1, all of the low levels cause an increase of O(D · T) in the potential, and therefore the total increase in the potential is O(log n) · D · T. □

We now proceed with analyzing the on-line algorithm's cost. We show that the potential function does not increase by more than O(log n) times the adversary's cost and decreases by at least the on-line algorithm's cost. To this end, it is enough to show that either the potential function decreases by the on-line's cost or the adversary's cost is at least some constant fraction of the on-line's cost.

Lemma 5 For each read request, Φ increases by at most O(log n) times the adversary's cost for that request and decreases by at least FA's cost for the same request.

Proof. Consider a read request at processor r, at distance d from the nearest on-line copy of the file. Let k be the largest power of two such that 8k ≤ d; thus the cost to the on-line algorithm is d ≤ 16k. If k ≤ 8 · T(A)/n then Λ₁ decreases by at least 128 · T(A)/n ≥ 16k ≥ d. Otherwise, if there is no copy of the intermediate adversary at distance less than k/8 from r, then the original adversary does not have such a copy either, and thus the adversary's cost is at least k/8, i.e., a constant fraction of the on-line's cost, and the claim follows. Finally, if there is a copy a of the intermediate adversary at distance at most k/8 from r, then there exists a covering processor at the log(k/8) level at distance at most k/8 from a. This is true since T(A) includes a path from a to some on-line copy; but such a copy is at distance at least 8k − k/8 > k/8 from a, and hence T(A) > k/8. We conclude that there is a covering processor p of the log(k/8) level at distance at most k/4 from r. The nearest on-line copy to p is at distance at least 8k − k/4 > 4k from it, and therefore Ψ₁ holds a credit for p. This credit decreases whenever requests in L occur in p's k/4-neighborhood. Since r is in this neighborhood, Ψ₁ decreases by at least 16k ≥ d. □

Lemma 6 For each replication FA performs, Φ increases by at most the adversary's cost due to the D read requests that caused the replication, and decreases by at least the replication cost for FA.

Proof. Let r be the processor FA replicated to, and let d be the distance from r to the closest on-line copy. As in the previous proof, let k be the largest power of two such that 8k ≤ d. Since FA replicated to r, at least D read requests in L were initiated in r's k-neighborhood. The cost to FA for the replication is D · d ≤ 16k · D. Note that Θ increases by exactly the replication cost. Thus we are interested in decreasing the other potential components by at least twice the on-line's cost, i.e., by 2D · d ≤ 32k · D. If k ≤ T(A)/n then, since B has grown, Λ₂ decreases by at least 32D · T(A)/n ≥ 32k · D. If the closest copy of the intermediate adversary to r is at distance at least 2k from r, then since the intermediate adversary never deletes copies, the adversary never had a copy of the file in r's 2k-neighborhood during the current phase. Since there were at least D requests in r's k-neighborhood, the adversary paid at least k for each of these requests, and thus a total cost of at least D · k, i.e., at least some constant fraction of FA's cost, and the claim follows. Otherwise, there is a copy a of the intermediate adversary at distance less than 2k from r. Since the closest on-line copy to a is at distance at least 8k − 2k > 2k from a, we have T(A) > 2k. It follows that there is a covering processor p at the log(2k) level, at distance at most 2k from a, and hence at distance at most 4k from r. Before the replication, FA did not hold a copy of the file in p's 4k-neighborhood, since it is contained in r's 8k-neighborhood. After the replication to r there is such an on-line copy, and therefore Ψ₂ decreases by at least 32k · D, as required. □
Lemma 7 At the end of the phase, the change in the potential function due to the reorganization step and the initialization of the intermediate adversary's configuration is bounded by O(log n) times the cost the adversary incurred during the phase, minus the cost FA incurred for the write requests of the phase and for the reorganization step at the end of the phase.

Proof. We divide the reorganization step into two parts. First we simulate D additional read requests at processor m, ignoring other requests still in L, thus analyzing the change in the potential function due to the replication to m. Then we analyze the change in the potential as a result of the deletion step and the initialization of the intermediate adversary's configuration. Let A, B, and B′ denote the appropriate configurations at the end of the phase, before the reorganization step takes place. From Lemma 6 we infer the following.

Claim 7.1 The change in the potential function for the simulation step is ΔΦ ≤ O(log n) · D · d(A, m) − D · d(B, m).

After the deletion step, FA leaves only one copy of the file, at processor m. Therefore ΔΦ ≤ −D · T(B ∪ {m}), where B ∪ {m} is the resulting subtree held by the on-line algorithm after the replication is performed. The intermediate adversary is initialized to a minimum Steiner tree spanning the adversary's configuration and m; by Lemma 4 this increases the potential function by at most O(log n) · D · T(A ∪ {m}). Thus the total change in the potential function due to the reorganization step is:

ΔΦ ≤ O(log n) · D · {T(A ∪ {m}) + d(A, m)}          (1)
      − D · {T(B ∪ {m}) + d(B, m)}                   (2)

Let Cost_FA denote the cost FA incurred for the write requests of the phase, and Cost_Adv denote the adversary's cost for the same requests plus its replication cost. The following two claims conclude the lemma.

Claim 7.2 The cost the adversary incurs for the write requests of the phase and for replications satisfies: Cost_Adv ≥ (1/4) · D · {T(A ∪ {m}) + d(A, m)}.

Proof of Claim. Let w₁, w₂, …, w_D be the positions of the D write requests of the phase. Let Aᵢ be the adversary's configuration after the request at wᵢ arrived, and A_D = A. By the triangle inequality we have:

Cost_Adv ≥ ∑ᵢ T(Aᵢ₋₁ ∪ {wᵢ}) + ∑ᵢ D · dist(Aᵢ₋₁, Aᵢ)          (3)
         ≥ ∑ᵢ T(A ∪ {wᵢ})                                      (4)

The last sum is at least the sum of distances from one processor in A to the wᵢ's, and therefore at least Σ_m. We conclude that:

D · {T(A ∪ {m}) + d(A, m)} ≤ 2D · T(A ∪ {m})                   (5)
  ≤ 2 · (∑ᵢ T(A ∪ {wᵢ}) + Σ_m)
  ≤ 4 · ∑ᵢ T(A ∪ {wᵢ}) ≤ 4 · Cost_Adv. □

Claim 7.3 The cost for FA for the write requests of the phase satisfies: Cost_FA ≤ D · T(B ∪ {m}) + Cost_Adv.

Proof of Claim. Let w₁, w₂, …, w_D be the positions of the D write requests of the phase. Let Bⁱ be FA's configuration after the request at wᵢ arrived, and B^D = B. Since the on-line subtree is monotonically growing during a phase, we have:

Cost_FA = ∑ᵢ T(Bⁱ⁻¹ ∪ {wᵢ}) ≤ ∑ᵢ T(B ∪ {wᵢ}) ≤ D · T(B ∪ {m}) + Σ_m ≤ D · T(B ∪ {m}) + Cost_Adv,

since Σ_m ≤ Cost_Adv, as proved in Claim 7.2. □

Now, using Claims 7.2 and 7.3 in equation (1) we get the lemma. □

Theorem 3 follows directly from Lemmas 4, 5, 6, and 7, observing that the adversary is charged no more than once its cost in the phase in each of these lemmas, and that across all the lemmas together, the total cost of the on-line algorithm is charged.

6 The Distributed Algorithm In this section we describe a modification of algorithm FA presented in section 4 which works in a distributed environment.

6.1 Background. Network Decompositions. We rely heavily on the network decomposition data structure of [AP90]. For every k, [AP90] show that the network can be decomposed into overlapping clusters such that each k-neighborhood is contained in some cluster. Moreover, the diameter of each cluster is at most k log n, and the number of different clusters a vertex belongs to is at most log n. We maintain such a decomposition for every k = 2^i, 0 ≤ i ≤ log(Diam), and choose a leader in each cluster.
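A minimal sketch of the region-growing idea behind such covers, on an unweighted graph (helper names are illustrative; this simple greedy guarantees the containment and diameter properties, but the log n bound on cluster overlap needs the more careful construction of [AP90]):

```python
from collections import deque

def bfs_ball(adj, v, r):
    """Vertices within distance r of v in an unweighted graph."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        if dist[u] == r:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return set(dist)

def sparse_cover(adj, k):
    """Greedy region growing: each cluster starts as a k-ball and is
    expanded in k-wide layers while it at least doubles, so the final
    radius is at most k * log2(n), and the k-neighborhood of every
    vertex absorbed by a cluster lies inside that cluster."""
    uncovered = set(adj)          # vertices whose k-neighborhood is not yet covered
    clusters = []
    while uncovered:
        v = next(iter(uncovered))
        r = k
        ball = bfs_ball(adj, v, r)
        while True:
            bigger = bfs_ball(adj, v, r + k)
            if len(bigger) <= 2 * len(ball):
                break
            ball, r = bigger, r + k
        clusters.append(bigger)   # contains B(u, k) for every u in `ball`
        uncovered -= ball
    return clusters
```

Since the ball can double at most log2(n) times, the expansion stops after at most log2(n) layers, giving the k log n diameter bound.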

Data tracking. We also use the distributed data tracking mechanism of [BFR92], a generalization of the Awerbuch and Peleg [AP89, AP91] mobile user algorithm. The data tracking mechanism allows an algorithm to dynamically maintain a set of processors holding the data, with total cost proportional to the inherent cost of updating the set of processors and finding the closest copy. It supports insert and delete operations and find requests. The total cost for a sequence of requests is O(log^3 n) times the inherent cost of these operations, assuming the diameter of the network is polynomial in n. (We maintain this assumption throughout the paper for simplicity, and remark where its role may be unclear.) The find operation the data tracking mechanism supports is designed to find some copy of the data at distance λ = O(log n) times the distance of the closest copy from the invoking processor; λ is called the data tracking approximation factor.
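At its core, find performs a doubling search outward from the requester. The sketch below shows that idea on an abstract metric (a simplification: the real mechanism of [BFR92] implements the search with distributed cluster directories, which is where the extra logarithmic factors in the approximation come from; all names here are illustrative):

```python
def find(dist, copies, r, diam):
    """Doubling search: probe balls of radius 1, 2, 4, ... around the
    requester r; the first non-empty ball yields a copy whose distance
    is within a factor 2 of the search radius at which it was found."""
    k = 1
    while k <= 2 * diam:
        near = [c for c in copies if dist(r, c) <= k]
        if near:
            return min(near, key=lambda c: dist(r, c))
        k *= 2
    return None  # no copy exists
```

For example, `find(lambda a, b: abs(a - b), [10, 50], 12, 64)` probes radius 1, finds nothing, then finds the copy at 10 at radius 2.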

6.2 Algorithm Sketch

Dealing with write requests. The algorithm partitions the request sequence into phases of D consecutive write requests. The writes are counted at the processor initially holding a copy of the file. The algorithm maintains a subtree spanning the set of copies it holds by adding the edges along which it replicates. When a write request arrives, it traverses this tree and increases the write counter by one. Once the write counter reaches the file size D, the phase ends, and we make a "reorganization step" that is somewhat different from that of FA: among all D locations where the file has been written, find the "central" location, whose sum of distances to the other D − 1 write locations is minimized. All replicas are then erased except for the last one, which is migrated to the central location. The central location is computed by the root of the tree, which knows all D write locations.
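The "central" location is thus a 1-median of the phase's write positions; a minimal sketch of the computation the root performs (names illustrative):

```python
def central_location(writes, dist):
    """Return the write position minimizing the sum of distances to all
    D write positions of the phase (ties broken by first occurrence)."""
    return min(writes, key=lambda m: sum(dist(m, w) for w in writes))
```

For example, with write positions 0, 1, 2, 10 on a line, the sums of distances are 13, 11, 11 and 27, so the central location is 1.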

Dealing with read requests. Within a phase, the algorithm otherwise deals only with read requests. The algorithm maintains a read-request counter at each cluster leader; initially all counters are zero. Upon receiving a new read request initiated by processor r, we search for a copy of the file using the data tracking algorithm of [BFR92]. Let h be the distance at which a copy of the file has been found, and let k be the largest power of two such that k ≤ h/(8λ log n), where λ is the data tracking approximation factor. Increase by one the counters of all the clusters at level log k that contain r. If any of these counters reaches D, replicate a copy of the file to r.

Theorem 8 Algorithm DFA is O(log^4 n)-competitive on arbitrary network topologies.
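The per-read bookkeeping just described can be sketched as follows (a simplification with hypothetical cluster and counter representations; λ is passed in as `lam`):

```python
import math

def level_for_read(h, n, lam):
    """Largest power of two k with k <= h / (8 * lam * log2 n),
    or None if even k = 1 is too large (the copy is too close)."""
    bound = h / (8 * lam * math.log2(n))
    if bound < 1:
        return None
    return 2 ** int(math.log2(bound))

def on_read(r, h, n, lam, clusters_at_level, counters, D):
    """Increment the read counter of every cluster at level log2(k)
    that contains r; return True if some counter reaches D, i.e. the
    file should be replicated to r."""
    k = level_for_read(h, n, lam)
    if k is None:
        return False
    replicate = False
    for cid, members in clusters_at_level.get(k, []):
        if r in members:
            counters[cid] = counters.get(cid, 0) + 1
            if counters[cid] >= D:
                replicate = True
    return replicate
```

Each read thus charges only the O(log n) clusters containing r at one level, which is what keeps the distributed overhead polylogarithmic.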

7 The Proof Sketch for the Distributed Algorithm In this section we sketch the proof for the distributed algorithm of Section 6 (see Theorem 8). Note that in this section we assume the diameter of the graph is polynomial in n. The proof of Theorem 8 is a modification of the proof of Theorem 3, and we follow the same steps as in Section 5. In the next paragraph we show how to make the main modifications to the potential function; the analysis of the potential function is based on the same ideas as the analysis of algorithm FA.

The Modified Potential Function. As in the proof of FA, the potential function for DFA is a sum of three terms, Φ = Ψ + Δ + Γ.

Ψ is used to analyze read requests, where Ψ = Ψ1 + Ψ2. We need to create a tree cover of the intermediate adversary's subtree A. We have log n levels of such tree covers, for cover parameters k = 2^i, for i's such that T(A)/n < 2^i ≤ T(A), as in the previous proof.

Ψ1 deals with the analysis of serving read requests. For each covering processor p at level log(k/2), we associate a cluster at level log k containing p's k-neighborhood. Let c be the read counter of that cluster; we give p a credit of 16 log^2 n · k · (D − c). Ψ1 is the sum of these credits.

Ψ2 deals with the on-line algorithm's replication step. For each covering processor p at level log k, we associate a cluster at level log(2k) containing p's 2k-neighborhood. We give p a credit of D · 8k log n if this cluster does not contain a copy of the on-line algorithm. Ψ2 is the sum of all these credits.

Δ and Γ are defined the same as in Section 4, except that the modified Δ has an extra factor of 2 log^2 n.

Analysis Sketch. We have lemmas matching those of Section 4, with the appropriate change in the competitive factor. The analysis and proofs are modifications of those in Section 4, using the properties of the sparse partition. The basic idea is that if DFA counts a read request at a cluster at level log k, then all counted requests are contained in its k log n-neighborhood. Since the algorithm counts at such a level only if a copy of the file has been found at distance at least 8λk log n, the find approximation property of the data tracking mechanism of [BFR92] implies that the closest copy to r is closer by at most a factor of λ, and is thus at distance at least 8k log n. The rest of the analysis is the same as in the proofs in Section 4.

8 Open Problems In [BFR92] a lower bound of Ω(log n) was proved for the file allocation problem in the global-view setting. While this paper shows this ratio to be tight up to a constant factor, the upper bound achieved here in the distributed setting is O(log^4 n). Thus the first obvious open problem is determining the exact competitive ratio of the distributed file allocation problem.

A second question concerns specific network topologies. Let the competitive ratio of the on-line Steiner tree problem on a specific metric space be c_n. We conjecture that there exists a deterministic O(c_n)-competitive algorithm for the file allocation problem.

Finally, there remains the problem of devising competitive algorithms for the constrained file allocation problem [BFR92] on arbitrary network topologies. We hope some of the basic techniques developed in this paper will prove useful in the solution of this problem and of other problems in this area.

References

[AP89] Baruch Awerbuch and David Peleg. Online tracking of mobile users. Technical Memo TM-410, MIT Lab. for Computer Science, August 1989.

[AP90] Baruch Awerbuch and David Peleg. Sparse partitions. In Proc. 31st IEEE Symp. on Foundations of Computer Science, pages 503–513, 1990.

[AP91] Baruch Awerbuch and David Peleg. Concurrent online tracking of mobile users. In Proc. ACM SIGCOMM Symp. on Communication Architectures and Protocols, Zurich, Switzerland, September 1991.

[BBK+90] S. Ben-David, A. Borodin, R.M. Karp, G. Tardos, and A. Wigderson. On the power of randomization in online algorithms. In Proc. 22nd ACM Symp. on Theory of Computing, pages 379–386, May 1990.

[BFR92] Yair Bartal, Amos Fiat, and Yuval Rabani. Competitive algorithms for distributed data management. In Proc. 24th ACM Symp. on Theory of Computing, pages 39–50, 1992.

[BS89] D.L. Black and D.D. Sleator. Competitive algorithms for replication and migration problems. Technical Report CMU-CS-89-201, Carnegie-Mellon, 1989.

[CLRW] M. Chrobak, L. Larmore, N. Reingold, and J. Westbrook. Optimal multiprocessor migration algorithms using work functions. Manuscript.

[DF82] D. Dowdy and D. Foster. Comparative models of the file assignment problem. Computing Surveys, 14(2), June 1982.

[IW] M. Imase and B.M. Waxman. Dynamic Steiner tree problem. SIAM Journal on Discrete Mathematics, 4(3):369–384, August 1991.

[KMRS88] A. Karlin, M. Manasse, L. Rudolph, and D. Sleator. Competitive snoopy caching. Algorithmica, 3(1):79–119, 1988.

[MMS88] M.S. Manasse, L.A. McGeoch, and D.D. Sleator. Competitive algorithms for on-line problems. In Proc. 20th ACM Symp. on Theory of Computing, pages 322–333, May 1988.

[ST85] D.D. Sleator and R.E. Tarjan. Amortized efficiency of list update and paging rules. Comm. of the ACM, 28(2):202–208, 1985.

[Wes] J. Westbrook. Randomized algorithms for multiprocessor page migration. To appear in Proc. DIMACS Workshop on On-Line Algorithms.

[WY] J. Westbrook and D.K. Yan. Personal communication.