Parallel Dynamic Algorithms for Minimum Spanning Trees

Sajal K. Das

Department of Computer Science, University of North Texas, Denton, USA [email protected]

Paolo Ferragina

Dipartimento di Informatica, Università di Pisa, Italy [email protected]

August 1995

1 Introduction

Parallel graph algorithms have attracted a lot of attention in the last two decades. In particular, two important problems have been studied extensively from the viewpoint of designing static parallel algorithms: finding the connected components of an undirected graph, and finding the minimum spanning tree (or forest) of a weighted, undirected graph. Besides their natural applications to numerous problems in science and engineering, the importance of these two problems is also justified by the fact that the performance (e.g., time and work complexity) of parallel algorithms for many other graph problems, such as ear decomposition, biconnectivity, strong orientation, st-numbering, Euler tours, and so on, depends on the connectivity algorithm [19, 27, 37]. Dynamic algorithms for these two problems have also been studied widely in the sequential setting. A recent thrust is to obtain fast parallel dynamic solutions that can be executed on massively parallel machines consisting of hundreds or thousands of processors. The goal here is to maintain some property of a dynamically changing graph more efficiently than a recomputation "from scratch" on the entire graph; the property may be the connectivity or the minimum spanning tree of a weighted graph. The changes (also called updates) typically include edge insertions or deletions and vertex insertions or deletions. In some cases the update operation may consist of a change in the cost of an edge of a weighted graph; however, such an update may be treated as a combination of an edge deletion and an edge insertion. Two issues motivate the search for dynamic graph algorithms: (i) from a practical point of view, we want to solve problems faster by recomputing only some parts of the current solution as the instance changes, rather than recomputing the entire solution from scratch; (ii) from a theoretical point of view, we seek more insight into the nature of the problem and, above all, into its dynamic formulation. Dynamic parallel algorithms with low work for updates and queries are interesting from both of these points of view.

Since the minimum spanning tree (MST) of an unweighted graph (i.e., a graph whose edge weights are all 1) contains all the information needed to answer connectivity queries, we can maintain the connectivity of such a graph simply by maintaining its MST. Therefore, this chapter focuses on the problem of maintaining the minimum spanning tree of an undirected weighted graph under: (1) single vertex insertion or deletion, (2) multiple vertex insertion or deletion, (3) single edge insertion or deletion, and (4) mixed batches of edge updates. In the following, parallel dynamic algorithms will be studied from two different perspectives: developing techniques to increase the bandwidth of the employed data structures so that a batch of updates can be performed simultaneously, and providing efficient parallel algorithms to speed up the execution of a single update. If an algorithm allows only single insertions and/or single deletions, it is called fully dynamic. A dynamic algorithm is called a multiple update algorithm if it performs simultaneous insertions and/or deletions of a batch of edges or vertices. We devote more attention to the multiple update problem because of its importance in parallel computing, where the idea is to exploit parallelism by treating all changes simultaneously with the help of multiple processors. There is a source of potential saving here, since the effects of some edge insertions and deletions may cancel each other, and hence the corresponding updates may be avoided in a multiple update algorithm.

The first author has been supported in part by Texas Advanced Research and Technology Program grants TARP-003594-003 and TATP-003594-031. The second author has been supported in part by MURST of Italy.
In this chapter, the parallel algorithms are designed on the widely used parallel random access machine (PRAM) model [19, 27]. In the EREW (exclusive-read, exclusive-write) variant of this model, simultaneous access to the global memory is allowed only if each processor reads from or writes into a different memory location at any instant. The CREW (concurrent-read, exclusive-write) model allows concurrent reading but only exclusive writing to the same shared memory location. The most powerful variant, the CRCW (concurrent-read, concurrent-write) model, allows both concurrent reading and concurrent writing to the same shared memory location.

This chapter is organized as follows. After some basic concepts of minimum spanning trees in Section 2, we briefly describe in Section 3 the techniques and main ideas underlying some of the results on the static computation of the MST of an undirected, weighted graph. Our aim is to describe the main parallel static techniques and thereby point out the inherent difficulties that make the dynamic algorithms challenging, from both a computational and a practical point of view. Sections 4 and 5 describe the most significant dynamic parallel algorithms in this field. In particular, Subsections 4.1 and 4.2 deal with single vertex and multiple vertex updates, respectively; similarly, Subsections 5.2 and 5.3 are concerned with single edge updates and mixed batches of edge updates, respectively. For each update problem, we briefly review the existing solutions and then concentrate on a high-level description. At the beginning of each section, a table summarizing the complexities of the existing parallel algorithms is also presented. Section 6 concludes the chapter.

2 Preliminaries

Let G = (V, E) be a connected, undirected graph with |V| = n nodes and |E| = m edges. Let us also assume that the graph is weighted, that is, a weight (or cost) c_e is associated with each edge e ∈ E. A spanning tree of G is a subgraph ST = (V, E_ST) of G such that ST is a tree and E_ST ⊆ E. The weight (or cost) of the spanning tree ST is the sum of the costs of its edges, i.e., W_ST = Σ_{e ∈ E_ST} c_e. A spanning tree T of G with the smallest possible cost is called the minimum spanning tree (MST) of G. From now on, we assume w.l.o.g. that the minimum spanning tree of a graph is unique. This can be guaranteed by imposing a total ordering on the edges of the graph, so that the edge costs are pairwise different.

Definition 2.1. A cut in a graph G = (V, E) is a partition of the vertex set V into two parts, say X and V − X. An edge crosses the cut if it is incident to both X and V − X.

The following observations will be used extensively in the rest of the chapter; they lead to efficient strategies for solving the MST problem.

- For each node u ∈ V, the minimum cost edge in E incident on u, say e(u), belongs to the MST.

- The set of edges E'' = {e(u) : u ∈ V} defines a set of components. Notice that each component is a tree, and if E'' forms a single tree (i.e., it has only one component), then E'' is the MST of G.

- Given a simple cycle in the graph G, the maximum cost edge on it does not belong to the MST.

There are three well-known greedy strategies for sequentially computing the MST of G, all of which are based on the following two coloring rules derivable from the preceding properties.

Red rule: Select a simple cycle containing no red edges. Among the uncolored edges on the cycle, select the one with the maximum cost and color it red.

Blue rule: Select a cut that no blue edge crosses. Among the uncolored edges crossing the cut, select the one with the minimum cost and color it blue.

Let us assume that at the beginning all the edges of G are uncolored. If we apply only the red rule, the edges that remain uncolored form the MST; this cycle-breaking strategy underlies the reverse-delete approach. Applying the blue rule only, the edges colored blue are exactly the ones in the MST; examining the edges in increasing cost order yields Kruskal's algorithm [28], while repeatedly applying the blue rule to the cut around a single growing tree yields Prim's algorithm [36]. There is yet a third strategy that begins with an empty forest and grows trees on subsets of V until there is a single tree containing all the vertices (recall that G is connected). During each iteration, the minimum cost edge incident on each tree is selected. This is Sollin's algorithm [2]. Such a tree-growing strategy is the key technique upon which most of the parallel algorithms for the construction of the MST are based (e.g., [5]).
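Sollin's tree-growing strategy can be sketched sequentially as follows. This is an illustrative sketch, not the parallel implementation; the function and variable names are ours, and edge costs are assumed pairwise distinct (so the MST is unique, as discussed above).

```python
def sollin_mst(n, edges):
    """n vertices 0..n-1; edges is a list of (cost, u, v) with distinct costs.
    Returns the set of MST edges of a connected graph."""
    comp = list(range(n))                  # component label of each vertex

    def find(x):                           # follow labels to the representative
        while comp[x] != x:
            comp[x] = comp[comp[x]]        # path halving
            x = comp[x]
        return x

    mst = set()
    while len(mst) < n - 1:
        # For every current component, pick its minimum-cost outgoing edge.
        best = {}
        for cost, u, v in edges:
            cu, cv = find(u), find(v)
            if cu == cv:
                continue
            for c in (cu, cv):
                if c not in best or cost < best[c][0]:
                    best[c] = (cost, u, v)
        # Hook: add the selected edges and merge the components they connect.
        for cost, u, v in best.values():
            cu, cv = find(u), find(v)
            if cu != cv:
                mst.add((cost, u, v))
                comp[cu] = cv
    return mst
```

Since every component acquires at least one new edge per round, the number of components at least halves each time, mirroring the O(log n) iteration bound of the hook-and-contract scheme.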

Recall that the insertion of a vertex also entails the insertion of all its incident edges. However, since the insertion of an isolated vertex is trivial, the edge update problem is, in some sense, the basic tool to be analyzed before going into the details of the existing dynamic parallel solutions. Let us introduce a few concepts that will be useful in understanding the techniques implemented in the existing dynamic algorithms for managing edge insertions and deletions only. There are three cases to be handled when updating the MST under a single edge insertion or deletion. Obviously, if a non-tree edge is deleted, no operation is performed because there is no change in the MST. For the insertion of an edge or the deletion of a tree edge, the MST may be forced to change; however, at most one edge will leave the tree and one edge will enter it. This is called the stable property of the MST. For example, let the insertion of an edge (u, v) force out some other edge. Clearly, (u, v) induces a cycle in T. In order to recompute the new MST, it suffices to consider this cycle and detect the edge of maximum cost on it: applying the red rule, the removal of that maximum cost edge yields the new MST. In the worst case there can be O(n) edges on the fundamental cycle, and hence O(n) candidate edges for removal. Perhaps the most interesting case occurs when a tree edge (x, y) is deleted from T. Clearly, the deletion of (x, y) originates a cut in T. Thus, we may apply the blue rule, computing the minimum cost edge crossing this cut and connecting the two subtrees of T created by the removal of (x, y). A simple observation shows that the connected subgraph obtained by introducing this minimum cost edge is indeed the new MST. In the worst case, all the edges in G − T cross the cut, and thus there can be O(m) candidate edges for this replacement.
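The two nontrivial cases above can be illustrated with a small sequential sketch (our own helper names and representation: the tree is a set of (u, v, cost) triples, and costs are assumed pairwise distinct):

```python
def tree_path(tree, u, v):
    """Edges on the unique u..v path of a tree given as a set of (a, b, cost)."""
    adj = {}
    for a, b, c in tree:
        adj.setdefault(a, []).append((b, (a, b, c)))
        adj.setdefault(b, []).append((a, (a, b, c)))
    parent_edge, stack, seen = {u: None}, [u], {u}
    while stack:                                   # DFS from u
        x = stack.pop()
        for y, e in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                parent_edge[y] = (x, e)            # predecessor and edge used
                stack.append(y)
    path, x = [], v
    while x != u:                                  # walk back from v to u
        x, e = parent_edge[x]
        path.append(e)
    return path

def insert_edge(tree, u, v, cost):
    """Red rule: (u, v, cost) closes one cycle with the tree path u..v;
    delete the maximum-cost edge on that cycle."""
    cycle = tree_path(tree, u, v) + [(u, v, cost)]
    worst = max(cycle, key=lambda e: e[2])
    if worst != (u, v, cost):                      # new edge displaces the max
        tree.discard(worst)
        tree.add((u, v, cost))

def delete_tree_edge(tree, graph_edges, edge):
    """Blue rule: removing a tree edge cuts the tree in two; reconnect
    with the minimum-cost non-tree edge crossing the cut."""
    tree.discard(edge)
    side, changed = {edge[0]}, True                # one side of the cut
    while changed:
        changed = False
        for a, b, c in tree:
            if (a in side) != (b in side):
                side.update((a, b))
                changed = True
    crossing = [e for e in graph_edges
                if e not in tree and e != edge and (e[0] in side) != (e[1] in side)]
    if crossing:
        tree.add(min(crossing, key=lambda e: e[2]))
```

The O(n) and O(m) candidate counts mentioned above correspond exactly to the cycle scanned by insert_edge and the crossing list scanned by delete_tree_edge; the parallel algorithms of Sections 4 and 5 are essentially fast ways of performing these maximum and minimum computations.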

3 Static Algorithms

In this section we discuss the start-over algorithms for constructing the MST of an undirected weighted graph on the PRAM models. Our goal is to describe the main parallel techniques used to compute the MST from scratch, in order to point out the inherent difficulties that make the dynamic algorithms interesting, from both a computational and a practical point of view. Existing sequential algorithms for MST construction (e.g., [42]) add at every iteration an appropriate edge to the partially computed minimum spanning forest. The newly added edge never forms a cycle with the previously identified edges. Therefore, these algorithms can be characterized as cycle-avoiding algorithms, i.e., they apply the blue rule. Another interesting sequential approach starts out with the given graph and successively removes edges until the resulting graph is the MST, removing the maximum cost edge of each simple cycle, i.e., it applies the red rule. In this case, it is not required that the cycles be broken one at a time or in any specific order: several cycles may be considered simultaneously in each step, or all cycles may even be broken simultaneously in one step. The correctness does not depend upon how many cycles are broken at each step. However, cycle-breaking start-over algorithms are inefficient, due to the time required to detect a cycle and the large number of cycles that need to be broken. The best known sequential algorithms for constructing the MST run in time O(n^2)

for dense graphs [36], and in time O(m log^(2)_d n) for sparse graphs [16, 17], where d = max{m/n, 2} is the average density and log^(2)_d n = log log_d n. Obviously, Ω(m + n) is a trivial lower bound on the number of operations that must be performed in order to compute the MST, since all the nodes and edges of the graph must be visited before the computation is complete. It is not yet known how to apply the greedy sequential techniques sketched in Section 2 to derive an efficient parallel algorithm for computing the MST. Savage and Ja'Ja' [38] were the first to provide an algorithm (based on the adjacency matrix representation of graphs) that requires O(log^2 n) time and O(n^2) processors on the CREW PRAM. The processor requirement was later improved to O(n^2 / log^2 n) by Chin, Lam and Chen [5], who achieve a work-optimal algorithm for dense graphs. Nath and Maheshwari [30] provided an algorithm that requires O(log^2 n) time using O(n^2) processors on the weakest EREW PRAM model. On the powerful CRCW PRAM model, O(log n) time deterministic algorithms exist (e.g., see [40]). In particular, Cole and Vishkin [7] attain a nearly optimal processor bound, i.e., O((n + m) log^(3) n / log n) processors, on the STRONG CRCW PRAM model. Awerbuch and Shiloach [1] provided a PRIORITY CRCW PRAM algorithm requiring O(n + m) processors. Johnson and Metaxas [23] provided the first EREW PRAM algorithm requiring o(log^2 n) time and O(n + m) processors, which has very recently been improved by Chong and Lam [6]. In the following we outline the main ideas and techniques underlying the most important static parallel algorithms designed so far. We also sketch the most recent results due to Johnson and Metaxas [23] and Chong and Lam [6].

3.1 Chin, Lam and Chen's Algorithm

This algorithm is based upon an efficient parallel implementation of the hook-and-contract scheme provided by the sequential Sollin's algorithm [2], yielding a work-optimal solution for dense graphs. The algorithm begins with the forest F_0 = (V, ∅) of the input connected graph G and grows trees on subsets of V until there is a single tree containing all the vertices. During each iteration, the minimum cost edge incident on each tree is selected (i.e., "marked") and added to the current forest, say F_i, in order to obtain the new forest, say F_{i+1}; this is the hook step. The resulting components are then shrunk into super-nodes (the contract step), and the total number of trees in F_{i+1} is at most one-half the number of trees in F_i. Hence the algorithm requires O(log n) iterations. The main problem is the efficient computation of the minimum cost edges incident on each tree of the forest F_i, and the shrinking of the new components of F_i formed by the insertion of those edges. Let W denote the cost matrix of G: for u ≠ v, W(u, v) is the cost of the edge (u, v), if any, and W(u, v) = +∞ otherwise. When the algorithm terminates, all the "marked" edges form the final MST of G. Since the algorithm proceeds in iterations, we associate with the i-th iteration a graph G_i = (V_i, E_i), where each super-node in V_i denotes a tree of F_i, and each edge (u, v) ∈ E_i is the minimum cost edge of E connecting the two trees denoted by u and v. Let W_i be the cost matrix of the "contracted graph" G_i. The generic i-th iteration consists of the following three steps:

Step 1: For each super-node u ∈ V_i, compute the minimum cost edge (u, u') incident on u. Mark (u, u') and insert it into F_{i+1}.

Step 2: Identify each connected component of F_{i+1}, that is, assign a unique identifier to the nodes belonging to the same component of F_{i+1}.

Step 3: Set n_{i+1} to be the number of connected components of F_{i+1}, and set W_{i+1} to be the n_{i+1} × n_{i+1} cost matrix induced by the contracted graph G_{i+1}.

Exploiting the identifiers given to each node of G_i, Steps 1 and 2 can be implemented efficiently in O(log n_i) time and O(n_i^2) work on the CREW PRAM model. Step 3 first requires sorting the columns of the matrix W_i into n_{i+1} groups according to their super-nodes, and then shrinking the sorted W_i, computing the n_i × n_{i+1} matrix W_i' obtained by replacing each subrow corresponding to some component of F_i with its minimum (cost edge). The final matrix W_{i+1} is obtained by performing a similar sorting, but now on the rows of W_i'. Step 3 globally requires O(log n_i) time and O(n_i^2) work. Since O(log n) iterations are needed to obtain a single tree, the total time needed to compute the final MST is O(log^2 n). Moreover, summing up the number of operations performed at each iteration and recalling that n_i ≤ n_{i-1}/2, we conclude that the total work is O(n^2).
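The shrinking of the cost matrix in Step 3 can be illustrated with a small sequential sketch (function name and representation are ours; the parallel version performs the same minimum reductions via the sorting described above, while here we simply scan the matrix):

```python
INF = float('inf')

def contract_cost_matrix(W, comp):
    """W: cost matrix of G_i (W[u][v] = INF if no edge); comp[u]: component
    identifier of super-node u in F_{i+1}. Returns the cost matrix W_{i+1}
    of the contracted graph, keeping the cheapest edge between components."""
    k = max(comp) + 1                      # number of new super-nodes
    Wn = [[INF] * k for _ in range(k)]
    for u, row in enumerate(W):
        for v, c in enumerate(row):
            cu, cv = comp[u], comp[v]
            if cu != cv and c < Wn[cu][cv]:
                Wn[cu][cv] = c             # cheapest edge connecting cu, cv
    return Wn
```

Intra-component entries are discarded, matching the clean-up of redundant edges discussed later for the EREW algorithms.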

3.2 Awerbuch and Shiloach's Algorithm

This algorithm is also based on Sollin's strategy, and thus it grows trees on subsets of the nodes in V. The main difference between this algorithm and Chin, Lam and Chen's algorithm is the use of the CRCW model, combined with new efficient parallel techniques to improve the hooking of trees. Since the adopted model is the powerful PRIORITY CRCW PRAM, each processor P_e is assigned to an edge e ∈ E, and the cost c_e is the number that determines the processor's strength, where lower means stronger. This number is used, together with the PRIORITY feature of the model, to implement the hooking process efficiently without any sorting, as opposed to [5]. We remark that the pairwise difference of the edge costs guarantees the uniqueness of the minimum cost edge, which is crucial both for the correctness of the algorithm and for the definition of the processor priorities. The main structures used to construct the MST are the rooted star, which is a tree of height 1, and the rooted tree, which is a directed tree with a distinguished vertex called the root. Let FG denote a forest of rooted trees plus self-loops (i.e., cycles of length 1), which occur only at the roots, and let FOREST be a set of undirected edges forming a subforest of the MST. FOREST is thus a progressively better approximation of the final MST. The algorithm loops for at most O(log n) iterations over three steps, namely star hooking, tie breaking, and shortcutting. In the course of the loop the following invariant is maintained: the connected components of FG are always identical to the connected components induced by FOREST. Thus, FOREST grows until eventually it becomes the whole MST. In the star hooking step, all the processors that correspond to edges outgoing from a rooted star in FG try to hook the star onto another tree. However, since the PRIORITY CRCW PRAM model is used, the winning processor is the one having the highest

priority, i.e., the edge having the minimum cost among those outgoing from that star. As a result of this step, directed cycles may be formed in FG, but it can be proved that all of them have length at most 2. Therefore, during the tie breaking step, these cycles are detected and opened by deleting the edge directed from the smaller-indexed vertex to the larger one. Thus FG is again a rooted forest at the beginning of Step 3, and FOREST is a better approximation of the final MST (by the blue rule). Finally, since hooking a star onto another tree always yields a nonstar, during the shortcutting step the distance between a node and the root of its tree is halved by applying one step of the pointer jumping technique (see [19]). The time complexity is easily derived by observing that at each loop either a rooted star is hooked onto another tree of FG, or the distance between a node and the root of its tree in FG is halved. Since the total number of rooted stars in FG is at most n, an accounting argument shows that O(log n) loops are sufficient to grow FOREST until it becomes the whole MST.
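A single shortcutting step can be sketched as follows; this is a sequential simulation of the parallel step (the `parent` array and the synchronous copy are our modeling choices: roots hold self-loops, as in FG).

```python
def shortcut(parent):
    """One pointer-jumping step: every node adopts its grandparent as its
    new parent, halving its distance to the root of its rooted tree.
    On a PRAM all positions are updated in one synchronous parallel step;
    reading the old array and writing a fresh one mimics that."""
    return [parent[parent[v]] for v in range(len(parent))]
```

For a directed chain 3 → 2 → 1 → 0 (root 0 with a self-loop), two shortcut steps collapse the whole chain onto the root, illustrating the O(log n) bound on the number of halvings.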

3.3 Designing an EREW PRAM Algorithm

Obviously, the preceding two algorithms can be executed on the weaker EREW PRAM model with the same number of processors, but at the cost of an O(log n) slowdown factor in the overall time complexity. Therefore, on the exclusive-write model, all the derived algorithms require O(log^2 n) time to compute from scratch the MST of an undirected weighted graph. Hence, designing an o(log^2 n) time EREW PRAM algorithm for constructing the MST was an open problem for almost a decade; similarly, it was unknown how to compute the connected components of an undirected graph in o(log^2 n) time [27]. The breakthrough in this field was due to Johnson and Metaxas [20], who gave a CREW PRAM algorithm for computing the connected components in O(log^{3/2} n) time using O(n + m) processors. Later, other results were developed in this direction, attaining the same time and processor bounds [26, 25, 31]. Recently, Johnson and Metaxas [23] provided the first o(log^2 n) time algorithm, running on the weakest EREW PRAM model, for computing the MST of an undirected graph. The algorithm uses the growth-control scheduling of the connectivity algorithm in [20], and it also makes use of an observation due to Gabow et al. [16]. A major innovation of this paper is the discovery that the necessary information can be extracted without ever explicitly shrinking the components, as opposed to the approach undertaken in the third step of Chin, Lam and Chen's algorithm. Another difficulty is posed by the exclusive-write requirement: the selection of the minimum cost edge for hooking subtrees seems to require either a powerful model of computation (as in Awerbuch and Shiloach's algorithm) or some minimization process (as in Chin, Lam and Chen's algorithm), which takes time logarithmic in the length of the list.
To overcome this difficulty, Johnson and Metaxas [23] modified the classical hook-and-contract scheme, providing a new approach to scheduling the hooking of the subtrees in order to control their growth rate. Unlike the previous methods, in which every subforest could double in size at each iteration of O(1) time, this algorithm can schedule every subforest to grow by a factor of at least 2^{√(log n)} in an iteration of O(log n) time. Hence, O(√(log n)) iterations suffice to find the final MST. Indeed, within a single

iteration of O(log n) time, slow-growing components are scheduled to hook and contract in o(log n) time repeatedly until they catch up with fast-growing components, while fast-growing components are left idle once they achieve the intended size. The basic idea is very simple; the nontrivial part is to show how each slow-growing component may contract and hook in o(log n) time. Namely, though a small component requires little time to contract to a vertex, it may be time consuming to prepare that vertex to hook again, due to the shrinking of its adjacency list. Indeed, after a component has contracted to a node v, we need to form the adjacency list of v by merging the adjacency lists of all the vertices in its component, and by cleaning up the redundant information, which includes edges connecting nodes belonging to the same component and multiple edges between the same pair of components. A trivial clean-up strategy, such as the one adopted in [5], would require Ω(log n) time in the worst case. Johnson and Metaxas observed that after one expensive O(log n) time clean-up, a slow-growing component can be scheduled to contract and hook in o(log n) time repeatedly, without cleaning up all the redundant edges, until it grows by at least a 2^{√(log n)} factor. Then another O(log n) time clean-up stage is needed, and the computation is repeated. Since O(√(log n)) stages are executed, the algorithm globally requires O(log^{3/2} n) time using O(m + n) processors. Recently, Chong and Lam [6] improved the worst case time for cleaning up the adjacency lists of slow-growing (i.e., small) components. They provided a new hooking strategy that prevents the generation of small components with unreasonably long adjacency lists: this strategy ensures that the total number of edges in the adjacency list of any component is at most the square of its number of nodes.
Thus the algorithm can take advantage of differentiating components with respect to fine differences in their growth rates, so that each component receives its most appropriate schedule. Such a schedule can be considered a refinement, or a recursive version, of Johnson and Metaxas's growth schedule, and it achieves a better time bound, namely O(log n log log n). The next subsection shows how to compute the MST more efficiently, that is, work-optimally, when the graph has a particular structure.

3.4 Bipartite Graphs

Let G = (V_1, V_2, E) be an undirected connected bipartite graph, where V_1 and V_2 are the two parts of the vertex set, with |V_1| = n and |V_2| = k. Bipartite means that no edge connects two vertices within V_1 or within V_2. The computation from scratch of the MST of this structured graph is very interesting since it arises in the multiple vertex insertion problem (see Section 4.2). Pawagi and Kaser [33] provided a very simple algorithm that exploits the bipartite nature of the graph to achieve a work-optimal bound. Suppose that n ≥ k. The first iteration consists of computing, for each vertex in V_1, the minimum cost edge incident on it. Note that this computation involves at most |V_2| = k edges for each vertex in V_1, and thus can be performed in O(log k) time using O(nk / log k) processors. In the second iteration, each tree formed by these minimum cost edges is contracted, so that at most k components are formed. Finally, the classical hook-and-contract technique is applied to the contracted graph, which is

formed by k super-nodes and at most nk + k^2 edges. Using the algorithm of Chin, Lam and Chen [5], the final MST is computed in O(log^2 k) time and O(nk + k^2) total work. It remains an open question to develop a work-optimal parallel algorithm running on the weakest EREW PRAM model for computing the MST of a general graph. However, even if such an algorithm is devised, it certainly requires Ω(n + m) work, due to the inherent visit of the whole node and edge lists of the graph. Therefore, we observe that maintaining the MST of an undirected graph under edge/vertex updates would be very costly if attained by recomputing the MST from scratch at each update, since it would suffer from this inherent lower bound. Hence, if a more efficient solution to the dynamic problem is devised, i.e., one with o(n + m) work, the new solution must exploit local properties of the MST without recomputing it from scratch.
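The first iteration of the bipartite scheme, in which every vertex of V_1 selects its minimum cost incident edge, can be sketched sequentially as follows (the function name and edge representation are ours; the parallel version performs each selection as an O(log k) time minimum reduction):

```python
def min_incident_edges(n, edges):
    """edges: list of (cost, u, w) with u in V1 = {0..n-1} and w in V2.
    Returns, for each vertex u of V1, its minimum-cost incident edge
    (None if u happens to be isolated)."""
    best = [None] * n
    for cost, u, w in edges:
        if best[u] is None or cost < best[u][0]:
            best[u] = (cost, u, w)       # keep the cheapest edge seen at u
    return best
```

By the blue-rule observation of Section 2, every selected edge belongs to the MST; contracting the trees they form leaves at most k super-nodes for the final hook-and-contract phase.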

4 The Vertex Update Problem

The problem of maintaining the MST of a graph under the insertion of a single node, or of multiple nodes, along with all their incident edges, has been well studied in both sequential and parallel computation. In sequential computation, first Spira and Pan [41] and later Chin and Houck [4] presented an O(n) sequential algorithm for updating the MST when a new vertex is inserted into the graph. We point out that an O(n)-time algorithm for single vertex insertion is optimal, because any such algorithm must examine all the edges incident on the new vertex, and there may be Ω(n) of them in the worst case. Notice that inserting a group of k nodes can be trivially solved by applying the single vertex insertion algorithm repeatedly, thus achieving an optimal O(nk)-time sequential algorithm. Namely, after each vertex insertion the MST grows by one unit, so that the total time complexity becomes O(nk + k^2) = O(nk), assuming that k ≤ n. Clearly, the optimality comes from the fact that there may be Ω(nk) new edges incident on the k new nodes, all of which have to be examined in the worst case. Parallel algorithms for maintaining the MST under the insertion of a single node or of multiple nodes have also been well studied, providing work-optimal solutions. Notice that the trivial approach applied in sequential computation to the case of multiple vertex insertions cannot be extended to the parallel setting, because inserting a single vertex at a time would lose all the parallelism inherently present in the problem. On the contrary, although the problem of maintaining the MST under the deletion of a single vertex or of a group of vertices has been deeply studied, work-optimal parallel algorithms have not yet been proposed. The performance of the existing parallel algorithms for the vertex update problem is summarized in Tables 1 and 2.
In Sections 4.1 and 4.2 we outline the basic ideas underlying the most important results in this area, trying to point out the techniques and, whenever possible, the data structures employed by those algorithms.


Table 1. Performance of Parallel MST Algorithms Under Single Vertex Update

Researchers                    PRAM model   Time Complexity       Work

Vertex Insertion
Pawagi & Ramakrishnan [35]     CREW         O(log n)              O(n^2 log n)
Varman & Doshi [44]            CREW         O(log n)              O(n log n)
Jung & Mehlhorn [24]           CRCW         O(log n)              O(n)
Pawagi & Kaser [33]            CREW         O(log n)              O(n)
Johnson & Metaxas [22]         EREW         O(log n)              O(n)

Vertex Deletion (deleted vertex of degree d in the MST)
Tsin [43]                      CREW         O(log n)              O(n^2 log n)
Pawagi & Kaser [33]            CREW         O(log n + log^2 d)    O(n^2 (1 + log^2 d / log n))
Shen & Liang [39]              CREW         O(log n log d)        O(n^2)

4.1 Single Vertex Update
Pawagi and Ramakrishnan [34, 35] were the first to investigate this problem, on the CREW PRAM model. Their algorithm solves the single vertex insertion problem in O(log n) time using O(n^2) processors. Tsin [43] extended this result to single vertex deletions, achieving the same time and processor bounds. The processor bound for single vertex insertion was improved by Varman and Doshi [44] to O(n) processors; however, the work of their algorithm is O(n log n), which is an O(log n) factor away from the optimal O(n) work. Later, Jung and Mehlhorn [24] gave an optimal algorithm for vertex insertions on the powerful CRCW PRAM model, requiring O(log n) time and Θ(n) total work. Recently, Johnson and Metaxas [22] proposed another optimal algorithm for vertex insertions, on the weaker EREW PRAM model, having the same time and work bounds as [24]. Pawagi and Kaser [33], and then Shen and Liang [39], presented more efficient parallel algorithms for the single vertex deletion problem, the latter requiring O(log n log d) time and O(n^2 / (log n log d)) processors on the CREW PRAM model, where d is the degree of the deleted vertex in the current MST. In the following we discuss only the optimal algorithm due to Jung and Mehlhorn [24], pointing out the main features of their interesting technique, which is based upon an efficient evaluation of expression trees. Even though the described algorithm runs on the CRCW PRAM model, one might exploit the techniques known in the literature for evaluating expression trees on the weakest EREW PRAM model [27, 29], and thus implement this algorithm on that model as well without losing efficiency. Jung and Mehlhorn [24] solve the single vertex insertion problem by evaluating an expression associated with the original MST, where an operator is assigned to every vertex of the tree. In this way, while the whole MST corresponds to the entire expression, the subtree rooted at each vertex of the MST corresponds to a subexpression.
The operators of these expressions involve maximum and minimum (by cost comparisons) over a set of edges. However, the techniques for evaluating these subexpressions require that the arity of the operators (i.e., the number of their arguments) be bounded above by a constant. Thus the first step is to transform the current MST into a bounded-degree tree T_1, where all the (dummy) edges introduced in the transformation have cost −∞.

The insertion of the new vertex z, along with all its incident edges (there are at most n of them), inevitably induces a set of cycles in the augmented tree T_1. Thus, in order to recompute the new MST, T', we need to apply the red rule, first detecting for each cycle its maximum cost edge and then deleting it. Obviously, a brute-force approach would be very costly since the total number of induced cycles can be very large. Therefore, we exploit the tree structure of T_1 in order to compute all those maximum cost edges efficiently. Before proceeding further, we need the following notation. Let v be a vertex in the graph G. A descending z-path from v to z starts at v and traverses T_1 toward its leaves, until it reaches an edge connecting a leaf of T_1 to z. The minimum descending z-path (in short, mdzp) from v is the descending z-path whose maximum cost edge has the least cost among the maximum cost edges of all descending z-paths from v. The aim is to build an expression tree on T_1 such that at every vertex we compute the maximum cost edge on its minimum descending z-path. After this computation, the edges in T' can be determined as follows:

1. Mark as candidates all nondummy edges of T_1 and all edges incident on z.

2. For each vertex v with k children, unmark the k edges which correspond to the maximum cost edges on the k + 1 descending z-paths from v, except for the maximum cost edge on the mdzp.

At the end, the edges that remain marked form the final MST T'. It remains to show how to compute efficiently the maximum cost edge on the mdzp from v. Let v be a vertex of T_1 to be processed, and let z, c_1, ..., c_k be the children of v in T_1. Let us also assume that, by induction, we have the maximum cost edge (say e_i) on the mdzp from c_i, for each 1 ≤ i ≤ k. For all 1 ≤ i ≤ k, we define p_i as the path that goes from v to its child c_i and thereafter follows the mdzp from c_i (we define p_0 = (v, z)).
Clearly, the maximum cost edge on each path pi can be determined as the maximum between the cost of the edge (v, ci) and the cost of ei. It is simple to determine by induction the mdzp from v by looking only at the paths pi outgoing from v, for all 1 ≤ i ≤ k. Indeed, either the maximum cost edge on the mdzp from v is some (v, ci), and thus pi can be taken to be the corresponding mdzp, or such an edge is found on some subpath descending from a child cj of v. In the latter case, the searched edge belongs to the path pj. Therefore, the set of paths p0, …, pk contains all the information needed to compute the mdzp for all the nodes in T1. More precisely, the actual functions chosen to compute the maximum cost edge on the mdzp from each vertex v are:

1. fv = (v, z), if v is a leaf.

2. fv(x) = min-cost-edge{(v, z), max-cost-edge{x, (v, v1)}}, if v has one child v1. Here, min-cost-edge{a, b} (resp., max-cost-edge{a, b}) denotes the minimum (resp., maximum) cost edge between the edges a and b.

3. fv(x, y) = min-cost-edge{(v, z), max-cost-edge{x, (v, v1)}, max-cost-edge{y, (v, v2)}}, if v has two children, namely v1 and v2.
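As a concrete illustration, the bottom-up evaluation of these operators can be sketched sequentially as follows (in the actual algorithm it is carried out during parallel tree contraction). The representation and names are ours, not the paper's.

```python
# Sequential sketch of the mdzp computation on the bounded-degree tree T1.
# children[v] lists the children of v in T1; every vertex also has an edge
# to the new vertex z, and cost[(u, v)] gives each edge cost. This mirrors
# the operators fv above, but as a plain recursion rather than a parallel
# expression-tree evaluation.

def max_edge_on_mdzp(children, cost, z, root):
    """For each vertex v, return the maximum cost edge on its minimum
    descending z-path (mdzp), as a (cost, edge) pair."""
    best = {}

    def solve(v):
        # Path p0: go directly to z via the edge (v, z).
        candidates = [(cost[(v, z)], (v, z))]
        for c in children.get(v, []):
            solve(c)
            # Path pi: edge (v, c) followed by the mdzp from c; its
            # maximum cost edge is the costlier of the two.
            candidates.append(max((cost[(v, c)], (v, c)), best[c]))
        # min-cost-edge over the maxima of p0, ..., pk: the mdzp choice.
        best[v] = min(candidates)

    solve(root)
    return best
```

For instance, on a tree rooted at 0 with two leaf children 1 and 2, the procedure keeps, at the root, the cheapest among the three candidate maxima, exactly as prescribed by operator (3) above.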

These functions become the operators of the expression tree that is evaluated. It is easy to show that all the edges examined but not selected by the min-cost-edge operator in (2) and (3) (i.e., unmarked) do not belong to the final MST, T′, and thus they must be deleted from T1 (by the red rule). Indeed, each edge (x, y) having the maximum cost in a cycle C of the augmented T1 is surely examined but not selected during the execution of the two steps above at some node of T1. Thus, by the red rule, every edge that does not belong to the final MST will be unmarked at Step 2 of the above procedure. More precisely, the cycle C must traverse the node z (because T1 is a tree), and thus it can be decomposed into two descending paths p′ and p″ from some node v′ ∈ T1. W.l.o.g. let us assume that (x, y) belongs to the path p′. During the processing of x, the edge (x, y) has obviously been examined, and thus it enters the computation. Hence, when the node v′ is processed in Step 2, either (x, y) has already been unmarked, or it will be unmarked in this step. In fact, since (x, y) is the maximum cost edge on the cycle C, the other path p″ contains edges whose costs are surely smaller than the cost of (x, y). Thus (x, y) does not belong to the mdzp from v′; hence it will not be selected during the processing of v′ and will therefore be unmarked. All the computations described above can be carried out during the contraction of the current MST, thus yielding the overall time and processor bounds shown in Table 1.

4.2 Multiple Vertex Updates

For the multiple vertex update problem, we have optimal algorithms to insert a group of vertices, along with their incident edges, but it is still unknown how to perform efficiently a group of vertex deletions (see Table 2). Let us consider the simpler task of inserting a group of k vertices along with their incident edges. We could attempt this problem on a PRAM by inserting one vertex at a time, as done in the optimal sequential algorithm of [4]. Although this approach is very simple, it turns out to be inefficient in parallel. For example, using any parallel single vertex insertion algorithm iterated k times, the problem could be solved within O(k log n) time. But if k = Θ(n), the time bound is not poly-logarithmic, and this approach does not give us a fast parallel algorithm. Another approach would be the application of the start-over algorithm of Chong and Lam [6] (see Section 3) to the graph consisting of the old MST and the k newly inserted vertices. However, even when k is very small, say 2, we would need O(log n log log n) time and Θ(n + k) processors. Therefore, to strike a balance between smaller and larger values of k, we must have time and processor bounds which vary smoothly with respect to k. Thus, we need ad hoc techniques to perform the vertex updates efficiently in parallel. Pawagi [32] presented a parallel algorithm for the multiple vertex update problem, which requires O(log k log n) time using O(nk) processors on the CREW PRAM model, where k is the number of inserted vertices. Later, Pawagi and Kaser [33] described parallel algorithms running on the CREW PRAM model for updating the MST when k new vertices are inserted into or deleted from the underlying graph. For multiple vertex deletions, their algorithm requires O(log n + log² d) time and O(n²/log n) processors, where d is the minimum between n and the total number of edges in T that are incident on the set of vertices to be deleted.
Recently, Johnson and Metaxas [21, 22] used a different technique to implement Pawagi's algorithm [32], and obtained an optimal algorithm for inserting

Table 2. Performance of Parallel MST Algorithms Under Multiple Vertex Updates

Researchers | PRAM model | Time Complexity | Work
Pawagi [32] (k vertex insertions) | CREW | O(log n log k) | O(nk log n log k)
Pawagi & Kaser [33] (k vertex insertions) | CREW | O(log n log k) | O(nk)
Johnson & Metaxas [22] (k vertex insertions) | EREW | O(log n log k) | O(nk)
Pawagi & Kaser [33] (deleting k vertices of total degree d) | CREW | O(log n + log² d) | O(n²(1 + log² d / log n))

a group of k vertices, which runs in O(log k log n) time and O(nk) total work on the EREW PRAM model. Table 2 provides a summary of the existing results. In the rest of this section we will concentrate on the discussion of Pawagi and Kaser's algorithm [33] for the insertion of k vertices. The major contribution of this paper is the transformation of the original MST and the k new vertices into a bipartite graph, which makes it possible to design a work-optimal solution for dense graphs, that is, when m = Θ(n²). Let ET be the set of tree edges and Vk = {z1, …, zk} be the set of newly inserted vertices. Each new vertex may have at most n edges incident on it. Let G′ be the graph obtained after Vk has been added to G, with all the incident edges added to ET. As observed in Section 4.1, we are concerned with the problem of breaking all the cycles in G′ by deleting the maximum cost edge on each of them (by applying the red rule). Since the total number of cycles induced in G′ by the insertion of the k vertices along with all their incident edges might be very large, the parallel algorithm cannot adopt a brute-force approach to break the cycles. Therefore, the significance of Pawagi and Kaser's algorithm lies in the novel technique employed to break all those cycles simultaneously and efficiently. This technique is based upon a transformation of the augmented graph G′ into a bipartite graph, called Gb. The key property of this transformation is that, although the MST Tb of Gb is not identical to the MST T′ of G′, Tb and Gb can help in selecting the edges of G′ that belong to T′. Moreover, instead of computing T′ directly, the computation of Tb can be performed efficiently by exploiting the bipartite structure of Gb (as described in Section 3). Therefore, in order to obtain the desired time and processor bounds, Pawagi and Kaser's algorithm breaks these cycles in three stages. Stage 1 consists of building a graph Gz, which constitutes the first step of the transformation producing Gb.
Since the graph Gz, obtained from G′ by deleting some edges, has the same MST as G′, the following two stages of the computation concentrate on breaking all the cycles in Gz. For this purpose, Stage 2 exploits the bipartite nature of Gz by constructing a bipartite graph Gb such that the edges of its MST Tb may help in finding the MST of Gz (i.e., the edges of T′). Finally, Stage 3 first computes Tb efficiently, by exploiting the bipartite structure of Gb, and then determines the final MST T′ with the help of Tb and Gb. The detailed description of the three stages follows.

Stage 1 (Construction of Gz). This stage consists of three steps. First, k copies of T, namely T1, T2, …, Tk, are made. Then, the new vertex zi is inserted in the tree Ti, for each i ≤ k, using the algorithm for the single vertex insertion update. Finally,

the new graph Gz = (Vz, Ez) is built as follows. The set of vertices Vz is obtained by adding Vk to the set V. The set of edges Ez is given by all the edges incident on some zi in Ti, plus the edges of the old T that are retained in all the Ti after the insertion of the new vertices. By construction, it immediately follows that the MST of Gz is identical to the MST T′ of G′. Thus, let us concentrate on Gz: the goal of the following two stages will be to break all the cycles in Gz simultaneously. For this purpose, we point out that the graph Gz is bipartite, its two partitions being the components of the old MST, T, on one side, and the set Vk on the other. Moreover, we point out that, since Gz is obtained by merging k trees, any cycle C of Gz contains at least two new vertices of Vk; otherwise, C would be contained in some Ti. These cycles have a bipartite flavor, because they can be described as an alternating sequence of new vertices in Vz and components (subtrees) of T. That is, when we trace one such cycle the following pattern is observed: it starts at a (new) vertex of Vz, then it enters a component of T, traverses some edges of that component, leaves that component, visits a (new) vertex of Vz, and so on. Now we can introduce the following useful notation. Let e-vertex denote a vertex of Gz that is adjacent to a new vertex of Vz. Therefore, each cycle in Gz enters and exits a component only through e-vertices. Moreover, we define an i-path as a simple path in Gz connecting a new vertex zi to an e-vertex v ∈ Gz, such that no other (new) vertex of Vz is on it. Clearly, each cycle of Gz is formed by an alternating sequence of edges connecting e-vertices to new vertices, and i-paths.

Stage 2 (Transformation of Gz into Gb). Since the parallel computation of T′ using Gz is still very expensive, Stage 2 performs a further transformation on Gz, producing the well-structured (bipartite) graph Gb. Let Ve be the set of e-vertices of Gz. Using the observations above, we build the bipartite graph Gb such that the partition of the set of vertices is given by Ve and Vk, and each edge (zi, v) corresponds to a simple path in Gz that connects a new vertex zi to an e-vertex v (i.e., an i-path). The idea is to define the cost of those edges so as to maintain in Gb all the information useful for finding efficiently the cycles of Gz, and in particular the maximum cost edges on them. For this reason, the information about the maximum cost edge on those simple paths should not be lost. Hence, the cost of each edge in Gb is given by the maximum cost edge on the simple path that such an edge denotes. This guarantees that the computation of the MST of Gz (i.e., T′) can take advantage of the computation of the MST Tb of Gb. In particular, the definition of the set of edges proceeds as follows.

1. Identify all e-vertices.
This is simple, due to the fact that such vertices are adjacent to the new vertices of Vz, and thus they can be detected by a simple OR-computation on the selected rows of the adjacency matrix of G [19].

2. For every new vertex zi and e-vertex v, define the edge (zi, v) if there exists an i-path from zi to v. In order to determine efficiently if two vertices zi and v are connected, the following approach is adopted.

(a) For every zi, make a copy of each component of T and root it at the e-vertex (if any) adjacent to zi. This requires O(n) copies, and since each copy can be executed in O(log n) time and optimal work, the whole process requires O(log n log k) time, using O(nk/(log n log k)) processors.

(b) For every new vertex zi, form one tree from its copies of components by making zi the new root. The maximum cost edge on the path from every vertex to the root can be computed work-optimally and in logarithmic time by applying the parallel tree contraction technique [29].

3. The edge (zi, v) ∈ Gb is labeled by the maximum cost edge on the i-path of Gz connecting zi to v. Such an edge has already been computed in Step (2.b), since v is a vertex in the tree built for zi.

Since we are considering i-paths departing from each new vertex and entering the components of T, it is possible that an edge is the maximum cost edge on many i-paths and thus labels more than one edge of Gb. In order to impose the uniqueness of the MST, we need to introduce a lexicographic ordering (denoted by ≺b) among the edges of Gb. In this way, the edges of Gz turn out to be partitioned into three classes. The first class contains edges that have no copies in Gb, and thus cannot be maximum cost edges on any i-path of Gz. Hence, such edges cannot be maximum cost edges on any cycle of Gz, and therefore they belong to the final MST, T′. The second class corresponds to the edges of Gz that have copies in Gb but are not maximum cost edges on any cycle of Gz. Thus, by definition, they also belong to the final MST, T′. The last class corresponds to the edges of Gz that have copies in Gb and are maximum cost edges on some cycle of Gz. Obviously, by the red rule, such edges must be deleted from Gz in order to compute the final T′. The main result in Pawagi and Kaser's paper [33] consists of proving that such edges can be detected by looking at the graph Gb.

Lemma 4.1. If (x, y) is the maximum cost edge on some simple cycle of Gz, then, for any copy of it, there is at least one simple cycle in Gb where such a copy has the maximum cost.
Moreover, if (x, y) has copies in Gb but it is not the maximum cost edge of any cycle of Gz, then its minimum copy in Gb (according to ≺b) is retained in the final minimum spanning tree of Gb.
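To make Steps 2 and 3 of the construction of Gb concrete, here is a sequential sketch of the label computation: for each new vertex zi, one traversal of the tree built in Step (2.b) yields the maximum cost edge on the i-path from zi to every e-vertex, which becomes the cost of the corresponding edge of Gb. The graph representation and names are ours, not the paper's.

```python
# Sketch of the label computation in Steps 2-3 (sequential; names and
# representation are ours). adj is the tree built for zi in Step (2.b),
# rooted at zi, with cost keyed by the unordered edge; e_vertices is the
# set of e-vertices.

def gb_edges(zi, adj, cost, e_vertices):
    """Return {(zi, v): max-cost edge on the i-path zi -> v}, i.e. the
    labels of the edges of Gb incident on zi."""
    labels = {}
    stack = [(zi, None, None)]          # (vertex, parent, max edge so far)
    while stack:
        v, par, mx = stack.pop()
        if v in e_vertices and v != zi:
            labels[(zi, v)] = mx
        for w in adj.get(v, []):
            if w != par:
                e = (cost[frozenset((v, w))], (v, w))
                stack.append((w, v, e if mx is None or e > mx else mx))
    return labels
```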

Stage 3 (Computation of T′). By Lemma 4.1, we can infer that the MST Tb of Gb can provide useful information for computing the MST T′ of Gz (and thus of G′). Namely, since Gb is a connected bipartite graph, with partitions of n and k vertices, we can compute Tb using the algorithm for bipartite graphs described in Section 3, thus requiring O(log n log k) time and O(nk) total work. As observed above, the edges of Gb and Tb help in identifying the edges of Gz that must be removed in order to obtain T′. From Lemma 4.1, it immediately follows that if an edge is the maximum cost edge on some cycle of Gz, then no copy of such an edge is retained in Tb. Vice versa, if an edge of Gb is not the maximum cost edge of some cycle, then its minimum copy (according to ≺b) is surely in the final Tb. Thus, it is sufficient to check the minimum copy of an edge to see if it must be deleted from Gz, i.e., whether it belongs to the third class. Hence, for each edge e of Gz that is transformed, we examine Tb to see if the minimum copy of e is present. If not, e is deleted from Gz; otherwise it is retained to form T′. Such a minimum copy can be found in the claimed time and processor bounds by modeling the computation as an expression evaluation on disjoint trees (see [8, 29]). From the considerations made above, it is simple to derive the time and processor bounds shown in Table 2.

5 The Edge Update Problem

Sequential algorithms for updating the MST of a graph under edge updates have received considerable attention in the past. Frederickson [15] designed an O(√m) time algorithm for the single edge update problem, which was recently improved to O(√n) by Eppstein et al. [11]. They used a new technique called sparsification. Although the edge update problem has been well studied in parallel computation, work-optimal parallel algorithms have not been proposed. Till now, the best work bound [10, 13] is still far from the O(√n) sequential time, and thus the parallel dynamic algorithms are far from being work-optimal, at least for sparse graphs. (See Tables 3 and 4 for a comparison.) Designing an efficient dynamic parallel algorithm for updating the MST under edge insertions and deletions requires fast computations of the following three subproblems:

1. Determining the unique path joining each pair of vertices in the minimum spanning tree T.

2. Choosing the maximum cost edge lying on such a path.

3. Identification of the vertices that belong to the two different subtrees created by deleting a tree edge (i.e., component identification).

The existing literature on parallel dynamic algorithms for edge updates can be classified into two categories, depending on the main ideas involved in the fast computation of the above three properties. One category refers to those papers in which the efficiency and speedup were obtained only by applying fast parallel techniques to the edge list of the graph. Consequently, the achieved time and work bounds suffer from the inevitable lower bound of Ω(m + n) on the work required for visiting the graph edges. The other category of papers overcomes this drawback by attempting to achieve o(m + n) total work in order to match the sequential bound of O(√n).
Such an approach was first proposed by Ferragina and Luccio [14], who showed how to combine simple parallel techniques with good dynamic data structures maintained on the current MST. Indeed, the sparsification data structure due to Eppstein et al. [12], designed to speed up fully-dynamic sequential algorithms, has been used in [14] to provide efficient dynamic parallel algorithms, too. Subsequently, this approach has been followed in [10, 13], providing increasingly efficient techniques to manage the sparsification data structure in parallel, thus leading to more efficient parallel solutions to the edge update problem. Section 5.1 briefly reviews the basic concepts and properties of the sparsification data structure, which will be used heavily in several algorithms proposed by us. In Sections 5.2 and 5.3 we overview the most significant parallel dynamic algorithms, differentiating them according to the techniques and data structures employed to provide efficient solutions.

5.1 Background on Sparsification Data Structures

We introduce the notion of a certificate [3] and other relevant definitions related to the sparsification tree that will be useful in the subsequent sections. For details on the sparsification technique, refer to [12].

Figure 1. A generic internal node of the sparsification tree. C′ and C″ are the sparse certificates (i.e., MSTs) of G′ and G″, respectively. Each internal node contains a sparse subgraph formed by merging the two MSTs of its two children (i.e., C′ and C″), and a certificate given by the MST of this subgraph (i.e., the MST of C′ ∪ C″).

Definition 5.1. Given a graph G = (V, E) and a property P, a certificate for G is a graph G′ = (V, E′) such that G has property P if and only if G′ has property P. The graph G′ is a subgraph of G with vertex set V and edge set E′ ⊆ E. The certificate is called a sparse certificate if |E′| = O(n), that is, if G′ is a sparse graph.

Moreover, a sparse certificate G′ for G is called stable if an edge insertion into, or deletion from, G produces a constant number of updates in G′. The algorithm that computes the sparse certificate of G for a property P is called a sparse certificate algorithm [3]. For the MST problem, let T̂ be a spanning tree of G; then P = (T̂ is an MST of G) is the property that we want to maintain. Eppstein et al. [12] have shown that G′ = T. That is, the minimum spanning tree of G is used as the certificate for this definition of P [3, 12]. Definition 5.1 is satisfied, since the MST of G is trivially the MST of G′ = T, and the sparse certificate algorithm is here any algorithm computing T from scratch (see Section 3). Additionally, such a certificate is sparse and stable. In fact, |T| = O(n), and the insertion or deletion of an edge in G determines the insertion or deletion of at most one edge in the current MST of G (see Section 2). The computation of certificates satisfies the following important property, which will be exploited to design the sparsification data structure. Let G′ and G″ be two graphs defined on the same set of vertices, and let T′ and T″ be their certificates (i.e., MSTs), respectively. The subgraph T′ ∪ T″ is a certificate of the augmented graph G′ ∪ G″ for the MST property. That is, the MST of T′ ∪ T″ is identical to the MST of G′ ∪ G″. Thus, if the certificate algorithm is applied again to T′ ∪ T″, the resulting MST is clearly the MST of G′ ∪ G″, and hence it is its certificate. The correctness immediately follows by applying the red rule to G′ and G″. It is simple to prove that the computation of the certificate is an associative and transitive operation.
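The composition property can be checked directly on a toy example: with the MST as certificate, MST(T′ ∪ T″) = MST(G′ ∪ G″). The sketch below uses Kruskal's algorithm as a stand-in for the sparse certificate algorithm; the data is ours and costs are distinct so the MST is unique.

```python
# Checking the certificate-composition property for MST on a toy example:
# MST(MST(G') ∪ MST(G'')) equals MST(G' ∪ G''). Kruskal's algorithm plays
# the role of the sparse certificate algorithm.

def mst(edges, n):
    """Kruskal: edges are (cost, u, v) with distinct costs; returns a set."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    tree = set()
    for c, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                        # blue rule: cheapest crossing edge
            parent[ru] = rv
            tree.add((c, u, v))
    return tree

# Two graphs on the same vertex set {0, 1, 2, 3} with distinct costs.
G1 = [(1, 0, 1), (5, 1, 2), (8, 2, 3), (9, 0, 3)]
G2 = [(2, 0, 2), (3, 1, 3), (7, 0, 3)]

cert = mst(mst(G1, 4) | mst(G2, 4), 4)      # certificate of the union
direct = mst(set(G1) | set(G2), 4)          # MST of G1 ∪ G2 from scratch
```

Here `cert` and `direct` coincide, as the red-rule argument above guarantees.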

We are now ready to recall the sparsification data structure, which was recently proposed by Eppstein et al. [12] and later improved in [11]. From now on, we refer to the data structure provided in [12], considering only the case of the maintenance of the minimum spanning tree of a graph. Therefore, unless explicitly stated otherwise, the notions of certificate of G and MST of G will be used interchangeably, since the certificate for the MST problem is the MST itself. The sparsification data structure built on a graph G = (V, E), with |V| = n and |E| = m, is defined as follows. The graph G is partitioned into a collection of sparse subgraphs, G1 = (V, E1), …, Gk = (V, Ek), where k = ⌈m/n⌉, such that |E1| = … = |Ek−1| = n, |Ek| ≤ n, Ei ∩ Ej = ∅ for 1 ≤ i < j ≤ k, and E1 ∪ E2 ∪ … ∪ Ek = E. The information relevant for each subgraph Gi is summarized in an even sparser subgraph Ci, given by the sparse certificate of Gi (here, Ci is the MST of Gi). The certificates are merged in pairs in a common parent, producing larger subgraphs which are again made sparse by applying the certificate reduction computed by the sparse certificate algorithm. The result is a balanced binary tree ST of depth O(log(m/n)), in which the original collection G1, …, Gk and their certificates are allocated to the leaves L1, …, Lk of ST, where Lk is the rightmost leaf, called the small leaf. Each internal node u of ST contains a (sparse) subgraph G(u), formed by merging the certificates (i.e., MSTs) contained in the two children of u, and the MST C(u) of G(u) (see Figure 1). The MST C(u) is computed by applying the sparse certificate algorithm to the subgraph G(u). By applying, level by level in ST, the previous argument shown for two graphs G′ and G″, it immediately follows that C(u) is the MST of the subgraph formed by the edges contained in the leaves descending from u in the sparsification tree ST.
Updating the sparsification tree under single edge insertions or deletions is simple. An edge insertion affects the rightmost leaf Lk in ST if it contains fewer than n edges; otherwise a new leaf is created and inserted into the tree along with all its parents. On the other hand, an edge deletion affects the leaf containing the edge to be removed, say Li. In this case, to maintain the load distribution of the edges among the leaves, we need to swap one edge in the rightmost leaf Lk with the deleted edge. Thus, the two leaves Lk and Li will be affected by the deletion process. Due to the single edge update performed on the affected leaf, say Li (where i = k in case of insertion), the subgraph G(Li) has been changed, and thus its certificate C(Li) must be recomputed. Then, we need to maintain consistency in the sparsification tree by propagating upward the changes induced on C(Li) by the updates in G(Li). Namely, the updating process affects the parent p(Li) of Li in ST, since we have to reflect on G(p(Li)) the changes determined by the updates in C(Li) ⊆ G(p(Li)). Clearly, as done before on Li, the updates affecting G(p(Li)) must be reflected also on C(p(Li)), since it is the certificate of G(p(Li)). Then, the changes in C(p(Li)) are propagated to the parent of p(Li) in ST, and so on. This computation is repeated until the root of ST is reached. At the end, the nodes affected by the updating process belong to the path π, called the affected path, connecting the leaf Li to the root of ST. The correctness of this technique is proved by the red rule, applied level by level to the affected nodes of the sparsification tree. Since the number of affected paths is at most 2 (in case of deletion), each update operation requires maintaining the certificates of at most O(log(m/n)) nodes, i.e., sparse graphs. Moreover, the stability property of MSTs guarantees that for each single edge update, at most two edge update operations, namely a single edge

insertion and a single edge deletion, must be performed on C(u) to maintain it under the changes in G(u), for all u ∈ π.
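The structure and its update rule can be sketched in miniature as follows. This is a sequential toy version under our own simplifications: Kruskal plays the sparse certificate algorithm, every update re-certifies all levels from scratch (rather than only the affected path, and without the stable O(1)-change update), and leaf rebalancing after deletions is omitted.

```python
# A miniature sparsification tree for MST maintenance (sequential toy
# version; see the hedges in the text above).

def mst(edges, n):
    """Kruskal on edges (cost, u, v) with distinct costs."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = set()
    for e in sorted(edges):
        ru, rv = find(e[1]), find(e[2])
        if ru != rv:
            parent[ru] = rv
            tree.add(e)
    return tree

class SparsTree:
    def __init__(self, edges, n):
        self.n = n
        # Leaves: groups of at most n edges each.
        self.leaves = [set(edges[i:i + n])
                       for i in range(0, len(edges), n)] or [set()]
        self.root_cert = self._certify()

    def _certify(self):
        # Merge certificates pairwise, level by level, as in ST.
        level = [mst(leaf, self.n) for leaf in self.leaves]
        while len(level) > 1:
            level = [mst(level[i] | (level[i + 1] if i + 1 < len(level)
                                     else set()), self.n)
                     for i in range(0, len(level), 2)]
        return level[0]

    def insert(self, edge):
        self.leaves[-1].add(edge)         # the affected (rightmost) leaf
        self.root_cert = self._certify()  # propagate the change upward
```

The invariant to observe is that the root certificate always equals the MST of the whole edge set, before and after each insertion.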

5.2 Single Edge Update

Most of the existing literature on the edge update problem is based on the design of efficient parallel techniques running on the edge list of the graph, without taking particular advantage of any dynamic data structure maintained on the current MST. Consequently, the achieved time and work bounds suffer from the inevitable lower bound of Ω(m + n) on the work required. The recent results, however, overcome this drawback by reducing the total work and efficiently managing the sparsification tree data structure in parallel. Table 3 summarizes the performance of the single edge update algorithms for MST. Pawagi and Ramakrishnan [34, 35] first investigated the parallel complexity of maintaining the MST of an undirected graph under single edge updates. Their parallel algorithm runs on the CREW PRAM model, requiring O(log n) time and O(n² log n) work. Pawagi and Kaser [33] proposed a parallel algorithm on the CREW PRAM model requiring O(log n) time and O(n) work for a single edge insertion, and O(log n) time and O(n²) work for a single edge deletion. Recently, Johnson and Metaxas [21] presented a parallel algorithm running on the weakest EREW PRAM model, requiring O(log n) time and O(m) work. Ferragina and Luccio [14] started a new direction of research in the design of parallel dynamic algorithms for MST. They have shown how to combine the sparsification data structure with new parallel techniques, thus obtaining an efficient fully-dynamic parallel algorithm for MST on the EREW PRAM model. However, even if the algorithm of [14] improves over the previous results for dense graphs, it does not take advantage of the stability property of the MST certificate. In fact, the sparse certificate of each affected node in the sparsification data structure is recomputed from scratch after each edge update. Consequently, the achieved work bounds suffer from a lower bound of

Ω(n), which is nonetheless smaller than the previous bound of Ω(m + n) for dense graphs. Hence there was the need for an efficient fully-dynamic parallel algorithm exploiting the sparsity of the subgraph contained in each node of the sparsification tree. Along this direction, Das and Ferragina [10] provided the first sub-linear work EREW PRAM algorithm for updating the MST, requiring a total of O(m^{2/3}) work. Applying this algorithm to maintain the certificate of each affected node of the sparsification tree, the authors show how to achieve an o(n)-work algorithm requiring O(log n log(m/n)) time. The O(log(m/n)) factor (i.e., O(log n) for dense graphs) in the time complexity of [10] has recently been removed by Ferragina [13], who suggested a novel parallel technique for managing the sparsification data structure on the EREW PRAM. This technique turns out to be very general, in the sense that it can be used to speed up fully-dynamic parallel algorithms for MST.

5.2.1 Pawagi and Ramakrishnan's Algorithm

The main contribution of [34, 35] was to demonstrate the importance of inverted trees (rooted trees where each node points towards its parent node) for solving update problems in undirected graphs on the CREW PRAM model. The current MST is maintained as

Table 3. Performance of Parallel MST Algorithms Under Single Edge Update

Researchers | PRAM model | Time Complexity | # of processors
Pawagi & Ramakrishnan [35] | CREW | O(log n) | O(n²)
Pawagi & Kaser [33] | CREW | O(log n) | O(n²/log n)
Johnson & Metaxas [21] | EREW | O(log m) | O(m/log m)
Ferragina & Luccio [14] | EREW | O(log^{3/2} n log log(m/n)) | O(n log(m/n) / log log(m/n))
Das & Ferragina [10] | EREW | O(log n) | O(m^{2/3}/log n)
Das & Ferragina [10] | EREW | O(log n log(m/n)) | O(n^{2/3}/log n)
Ferragina [13] | EREW | O(log n) | O(n^{2/3} log(m/n)/log n)

an inverted tree, and the provided algorithms ensure that the recomputed MST is again an inverted tree after successive updates. Let p[w] be the parent of the node w in the inverted tree. The algorithm is based on the observation that the three computations described at the beginning of Section 5 can be done efficiently on an inverted tree. Therefore, the following lemma is the key tool for obtaining the final result:

Lemma 5.2. Let p^k[1..|V|] be an array such that p^0[w] = w and p^k[w] = p(p^{k−1}[w]), for all w ∈ V and k > 0. The function p^k can be computed in O(log n) time using O(n²) processors on the CREW PRAM model.
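A sequential sketch of the doubling computation behind Lemma 5.2, storing ancestors and path maxima (the information carried by the arrays E^k discussed next) only at powers of two; the representation and names are ours.

```python
# Doubling tables on an inverted tree: up[j][w] is the 2^j-th ancestor of
# w, and mx[j][w] the maximum edge cost on the path from w to that
# ancestor. Sequential sketch; names are ours.
import math

def build_tables(parent, cost_to_parent):
    """parent[w] = parent of w in the inverted tree (the root points to
    itself); cost_to_parent[w] = cost of the edge (w, parent[w])."""
    n = len(parent)
    levels = max(1, math.ceil(math.log2(n)))
    up = [list(parent)]
    mx = [[cost_to_parent[w] if parent[w] != w else float('-inf')
           for w in range(n)]]
    for j in range(1, levels + 1):
        up.append([up[j - 1][up[j - 1][w]] for w in range(n)])
        mx.append([max(mx[j - 1][w], mx[j - 1][up[j - 1][w]])
                   for w in range(n)])
    return up, mx

def max_on_upward_path(up, mx, w, k):
    """Maximum edge cost on the path from w to its k-th ancestor."""
    best, j = float('-inf'), 0
    while k:
        if k & 1:
            best = max(best, mx[j][w])
            w = up[j][w]
        k >>= 1
        j += 1
    return best
```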

Note that p^k[w] is the k-th ancestor of node w in the inverted tree denoting the MST, T, of G. Once the array p^k has been computed, we can initialize, in the same time and work bounds, the array D⁺[w], which contains the depth of the node w ∈ V, and the array E^k[w], which contains the maximum cost edge on the tree path connecting w to its k-th ancestor. We remark that E¹[w] = (w, p[w]), and E^k[w] is the maximum cost edge between E^{k−1}[w] and (p^{k−1}[w], p^k[w]).

Deleting a single edge. When a tree edge (x, y) is deleted, the algorithm consists of four steps. Let r denote the root of the inverted tree T.

Step 1. Set p¹[x] = x, so that the edge (x, y) is deleted from T, which creates a forest of two subtrees, one rooted at r and the other at x.

Step 2. Compute the arrays p^k, for all k ≤ log n. The vertices in each subtree are determined by their respective roots in the array p^{log n} (see Lemma 5.2).

Step 3. For each vertex w, find the minimum cost edge in its adjacency list such that the other end-point is not in the same subtree as w. Select the minimum cost edge among these edges (i.e., apply the blue rule).

Step 4. Let (w, w′) be the selected edge. Add it to the new MST by setting p¹[w] = w′. Finally, reverse the directions of the edges on the path from w to x in the old inverted MST.

From Lemma 5.2, it immediately follows that each of the above four steps can be executed in O(log n) time using O(n²) processors.

Inserting a single edge. Let us now assume that the edge (u, v) has to be inserted in the graph G. After computing the arrays p^k, E^k and D⁺ in the time and processor

bounds claimed in Lemma 5.2, we can find the maximum cost edge e_max on the fundamental cycle induced in T by the insertion of (u, v). If the cost of the edge (u, v) is less than that of e_max, we can delete e_max from T and insert (u, v). By the red rule, we can infer that the resulting tree is the MST of the current graph. To maintain the invariant, we need to transform the final MST into an inverted tree; thus we execute Step 4 of the deletion procedure above. The previous algorithm is interesting since it constitutes the first approach to the efficient parallel maintenance of the MST of an undirected graph. However, the number of employed processors is too large, due to their inefficient use in the computation of the arrays E^k and p^k. Moreover, redundant information is computed in order to maintain the MST, and this explains the very large number of operations needed to find either the minimum cost edge crossing the cut induced by the removal of (x, y), or the maximum cost edge on the cycle induced by the insertion of (u, v). The parallel fully-dynamic algorithms described in the following will address these important issues. In particular, they will speed up those basic operations by developing new parallel techniques and new auxiliary data structures which store the previous solution and thus provide rapid access to the information necessary for efficient updates of the current MST.
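As an illustration (our own sequential formulation, not the paper's code), the whole insertion test can be written down with doubling tables: build 2^j-th ancestor and path-maximum tables from the parent array, compute the maximum cost edge e_max on the fundamental cycle of (u, v) via the lowest common ancestor, and compare costs as prescribed by the red rule.

```python
# Sketch of the insertion test: should the new edge (u, v) replace the
# maximum cost edge e_max on the fundamental cycle it closes?
import math

def insertion_swaps(parent, cost_up, depth, u, v, new_cost):
    n = len(parent)
    L = max(1, math.ceil(math.log2(n))) + 1
    # Doubling tables: 2^j-th ancestors and path maxima.
    up = [list(parent)]
    mx = [[cost_up[w] if parent[w] != w else float('-inf')
           for w in range(n)]]
    for j in range(1, L):
        up.append([up[j - 1][up[j - 1][w]] for w in range(n)])
        mx.append([max(mx[j - 1][w], mx[j - 1][up[j - 1][w]])
                   for w in range(n)])

    best = float('-inf')
    if depth[u] < depth[v]:
        u, v = v, u
    d = depth[u] - depth[v]
    for j in range(L):                   # equalize depths, tracking the max
        if (d >> j) & 1:
            best = max(best, mx[j][u])
            u = up[j][u]
    if u != v:
        for j in range(L - 1, -1, -1):   # climb to just below the LCA
            if up[j][u] != up[j][v]:
                best = max(best, mx[j][u], mx[j][v])
                u, v = up[j][u], up[j][v]
        best = max(best, mx[0][u], mx[0][v])
    return new_cost < best               # True: swap (u, v) in, e_max out
```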

5.2.2 Johnson and Metaxas's Algorithm

This algorithm introduces the use of cross-pointers between each edge (u, v) and its reversal (v, u) (see [21]). Such pointers are very useful for performing the edge deletion operation efficiently and without read/write conflicts. The main idea underlying the use of these cross-pointers is that they allow us to determine, in constant time and without read conflicts, whether two nodes belong to the same half of the cut induced by the removal of a tree edge. Therefore, let us assume that we maintain bidirectional links between each edge (u, v) and its reversal (v, u), so that the reversal of any edge can be identified in constant sequential time. The detailed procedures for treating a single edge update follow.

Inserting a single edge. The inserted edge (u, v) forms a cycle C in the current MST T. By a list ranking on the edges of T, it is easy to find the maximum cost edge in C in O(log n) time using O(n/log n) processors [19]. Indeed, given the Euler tour of T, we mark the edge (u, p(u)) by 1 and all the other edges by 0. Then we execute a parallel prefix sum on the Euler tour of T, thus broadcasting the value 1 to all the edges following (u, p(u)) in that list. Now the integer values associated with an edge and its reversal allow us to establish whether this edge is on the upward path connecting u to the root of T. Namely, it is simple to show that if (p(w), w) has value 0 and (w, p(w)) has value 1, for some node w in T, then (w, p(w)) is on the upward path from u. In a similar way, the upward path emanating from v is found, and hence the lowest common ancestor of u and v can be identified in the same time and work bounds. Finally, a list ranking is applied on the edges belonging to those two upward paths, thus computing the maximum cost edge within the required performance bounds.

Deleting a single edge. If the edge (x, y) to be deleted does not belong to T, no change affects T.
Otherwise, the deletion of (x, y) induces a cut in T, and to recompute the new minimum spanning tree T′, we need to find the minimum cost edge crossing that cut. This can be done by visiting the tree T in order to give the same mark

to the nodes belonging to the same half of the cut. Then, using the marks and the bidirectional links (connecting each edge to its reversal), a list ranking of E − {(x, y)} can find the minimum cost edge crossing the cut without read/write conflicts. These steps require O(log n) time and O(m/log n) processors. Obviously, a more efficient solution cannot be found by speeding up the parallel techniques described above, since they are already work and time optimal on the EREW PRAM model. Therefore, the performance of the fully-dynamic parallel algorithms for MST strictly depends upon the design of more efficient parallel data structures that can support fast edge update operations. The following three papers are based on this approach.
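The Euler-tour marking trick used in the insertion procedure above can be simulated sequentially as follows. The function name and the `children` representation of the rooted tree are our own illustrative assumptions; on a PRAM the prefix sum would of course run in parallel.

```python
def upward_path_nodes(children, root, u):
    """Sequential simulation of the Euler-tour + prefix-sum marking of
    Johnson and Metaxas: returns the nodes w (w != root) whose tree edge
    (w, p(w)) lies on the upward path from u to the root.
    `children` maps each node to the list of its children."""
    tour, parent = [], {root: None}
    def dfs(x):
        for y in children.get(x, []):
            parent[y] = x
            tour.append((x, y))          # descend into y's subtree
            dfs(y)
            tour.append((y, x))          # return edge (y, p(y))
    dfs(root)
    # Mark the return edge (u, p(u)) with 1, every other edge with 0,
    # and run an inclusive prefix sum along the tour.
    val, s = {}, 0
    for e in tour:
        s += 1 if e == (u, parent[u]) else 0
        val[e] = s
    # (w, p(w)) is on the upward path iff we descended into w's subtree
    # before returning from u (value 0 on (p(w), w)) and left it after
    # (value 1 on (w, p(w))) -- i.e., w is an ancestor of u or u itself.
    return {w for w in parent if w != root
            and val[(parent[w], w)] == 0 and val[(w, parent[w])] == 1}
```

For the tree r → {a, b}, a → {u}, b → {c}, the call with u returns {a, u}: exactly the endpoints of the edges on the upward path from u.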

5.2.3 Ferragina and Luccio's Algorithm

The sparsification data structure is combined with a novel parallel technique, which allows us to work in parallel on all the nodes affected by the update operations. In this way, we may overcome the inherent sequentiality of the method provided in [12]. The resulting single edge update algorithm is more efficient for dense graphs, compared with the previously known bounds. As observed in Section 5.1, the operation C_i ⊕ C_j, defined as the merge of two certificates C_i, C_j (i.e., MSTs) followed by the computation of the sparse certificate of C_i ∪ C_j, is associative. Let π[i] be the i-th node from the leaf level in the affected path π. After a single edge insertion/deletion, the subgraph of π[0] is changed. Instead of propagating this update upward in π, by sequentially processing all the affected nodes as done in Section 5.1, we apply the classical parallel prefix technique [19] using ⊕ on a proper vector C of certificates. Each entry C[i] contains the certificate of the child of π[i] not belonging to π (if any); otherwise C[i] is an empty graph. Note that ∪_{j≤i} C[j] is a subgraph of G, and it suffices for computing the certificate of π[i]. Indeed, it can be easily proved that the MST of ∪_{j≤i} C[j] is identical to the MST of the sparse subgraph contained in the node π[i]. This property immediately follows from the construction of the sparsification tree, which guarantees that the certificate of each node corresponds to the MST of the subgraph formed by all the edges contained in the leaves descending from it (see Section 5.1). Thus, exploiting the associativity of ⊕, we perform a parallel prefix computation on C. At the end of this computation, C[i] will contain the new certificate of the node π[i], after the update on π[0]. Thus C[i] can be copied into the corresponding node π[i]. Ferragina and Luccio apply the algorithm in [23] for computing the MST of C[i] ∪ C[j] on the EREW PRAM model, i.e., the ⊕-operation.
Thus, the algorithm globally requires O(log^{3/2} n log|π|) time and O(n|π|/log|π|) processors, because each C[i] is a sparse graph. Since |π| = O(log(m/n)), the complexity shown in Table 3 immediately follows.
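A minimal sequential stand-in for this prefix computation, using Kruskal's algorithm as the certificate operation ⊕, can be sketched as follows. Edge triples `(cost, a, b)` and all function names are our own assumptions; the point is only that ⊕ is associative, which is what licenses the parallel prefix.

```python
def mst(edges):
    """Kruskal's algorithm: the 'sparse certificate' of an edge set."""
    parent = {}
    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    out = []
    for c, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            out.append((c, a, b))
    return out

def oplus(ci, cj):
    """The associative certificate operation: merge and sparsify."""
    return mst(ci + cj)

def prefix_certificates(C):
    """Sequential stand-in for the parallel prefix over oplus: after the
    call, entry i holds the certificate of C[0] U ... U C[i]."""
    out = [mst(C[0])]
    for ci in C[1:]:
        out.append(oplus(out[-1], ci))
    return out
```

With distinct edge costs the MST is unique, so grouping the ⊕-applications in any order yields the same prefix certificates, which is exactly what the parallel prefix exploits.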

5.2.4 Das and Ferragina's Algorithm

The authors [9, 10] provide a fully-dynamic parallel algorithm for MST running on the EREW PRAM model and requiring O(m^{2/3}) work and O(log n) time complexity. The proposed algorithm is then improved using the sparsification data structure, which introduces an extra logarithmic factor in the time complexity, but it needs a significantly

Figure 2. An example of an MST and the partition determined by the clustering phase, where α = 2.

lower work, namely O(n^{2/3} log(m/n)). We point out that this is the first algorithm in the literature requiring o(n) work for the maintenance of an MST under single edge updates. In this paper, the employed parallel data structures are designed to handle bounded-degree graphs in which no node has degree greater than three. This is not a limitation since, given the input graph G, a well-known transformation in graph theory [15, 18] can be used to produce a graph G′ = (V′, E′), with |V′| = |E′| = O(m), in which each node satisfies the desired degree constraint. Therefore, for the purpose of describing the algorithm, one can assume to deal with bounded-degree graphs of n̂ = O(m) nodes, each of degree at most three. To efficiently support single edge update operations in the MST, a set of data structures similar to those proposed by Frederickson [15] is maintained. Indeed, the current MST is partitioned into connected components C_i, henceforth called clusters, such that |C_i| ∈ [α, 4α], for some suitably chosen parameter α. Given such a partition of T, we need to introduce the following notation. A tree edge is said to be internal if it connects two nodes belonging to the same cluster; otherwise it is called an external edge. A boundary node is one having an external edge incident on it. For example, in Fig. 2, dark edges correspond to an MST of a graph and broken edges are non-tree edges. The labels on the edges are their weights or costs. Also, the edge (v1, v2) is an internal edge while (v1, v8) is an external edge. The nodes v1, v2, v3, v4, v5, v8 are boundary nodes. Moreover, the notation C_x will be used to mean the cluster containing the node x of the graph. Thus C_x = C_y if and only if both the nodes x and y belong to the same cluster. The following data structures are maintained:

D1: Adjacency lists of the minimum spanning tree, where each node is associated with a pointer to its cluster in D3.

D2: Adjacency lists of the bounded-degree graph, in which each node has degree at most 3. We assume that each edge e = (u, v) has a pointer to its reversal edge e^R = (v, u).
D3: Each cluster is maintained as an array containing the list of nodes belonging to it. Thus D3[i] is the list of nodes belonging to the i-th cluster.
D4: The external edges of the i-th cluster are stored in the array D4[i]. Thus D4 combined with D3 represents the `super' tree having as nodes the clusters and as edges the external edges of the current minimum spanning tree T.
E_ij: The list of non-tree edges (maintained as an array) connecting a pair of clusters C_i and C_j. The minimum cost edge within each E_ij is also maintained. We assume that each edge e = (u, v) ∈ E_ij has a pointer to the head of E_ij and another pointer to its reversal edge e^R = (v, u) belonging to E_ji. Figure 2 illustrates E_34 and E_43.
FC_z: Defined for each cluster C_z, it consists of a two-dimensional array indexed by pairs of boundary nodes of C_z. For a boundary node-pair v and w, the maximum cost edge in the tree path connecting v to w is stored in FC_z[v, w].
Partitioning the tree into clusters and constructing the above data structures require O(log^2 n̂) time and O(n̂ log n̂) total work. Due to lack of space, we do not go into the details of this construction phase, nor can we describe in detail the two basic operations, called Split and Merge. For the sake of presentation, let us briefly review their use in the design of the final algorithm. The objective of the Split operation is to make sure that the cluster partitioning is consistent under the removal of an edge. Here consistent means that we maintain the current MST partitioned into clusters of sizes in the desired range [α, 4α]. The main problem lies in reflecting the changes induced in the cluster partition by the removal of some edge (x, y) on the whole set of data structures under consideration.
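As an illustration of the FC structure, the table for a single cluster can be computed by the following sketch. This is a sequential stand-in; the adjacency representation and the function name are our own assumptions, not code from the paper.

```python
from itertools import combinations

def build_fc(cluster_adj, boundary):
    """Sketch of the FC table for one cluster: cluster_adj maps each node
    to a list of (neighbour, cost) tree edges inside the cluster; boundary
    is the set of its boundary nodes.  FC[(v, w)] is the maximum edge cost
    on the (unique) tree path between boundary nodes v and w."""
    FC = {}
    for v, w in combinations(sorted(boundary), 2):
        # Traverse the cluster tree from v, tracking the maximum edge
        # cost seen along the path to every reached node.
        best = {v: 0}
        stack = [v]
        while stack:
            x = stack.pop()
            for y, c in cluster_adj.get(x, []):
                if y not in best:
                    best[y] = max(best[x], c)
                    stack.append(y)
        FC[(v, w)] = FC[(w, v)] = best[w]
    return FC
```

Since each cluster has O(α) nodes and O(α) boundary pairs in a bounded-degree tree, such tables stay small, which is what makes the constant-time path-maximum lookups affordable.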
In [10] we have shown that a Split operation affects the data structures of at most two clusters, so that their maintenance can be performed in O(log n̂ + log α) time and O(n̂^2/α^2 + α) work. The objective of the Merge operation is to restore the consistency in the cluster partitioning when a cluster C_u has size less than α (a small cluster). By induction, let us assume that all the other clusters adjacent to the small one are consistent according to our notion. Therefore, merging the small cluster C_u with an adjacent consistent cluster, say C_v, through the external tree edge (u, v) creates a larger cluster C_{u+v} of size in the range [α, 5α]. Now, the effect due to the merging of the two clusters must be reflected on the whole set of data structures. Since the size of C_{u+v} can be larger than 4α, we possibly need to perform a further Split operation on some internal edge of C_{u+v}. Again, a Merge operation affects the data structures of at most two clusters, and their maintenance takes O(log n̂ + log α) time and O(n̂^2/α^2 + α) total work. Inserting a single edge. Since the insertion of an edge (u, v) into the minimum spanning tree T induces a fundamental cycle C, to maintain the MST efficiently we quickly retrieve the maximum cost edge in C using the data structure FC, without explicitly scanning C (as instead done in all the previous algorithms). Let T̂ denote the contracted

minimum spanning tree, where the clusters are nodes of T̂ and the external edges of T are edges of T̂ (e.g., see Figure 3). If the edge (u, v) to be inserted has end-points belonging to the same cluster C_u = C_v, the cycle C is entirely contained within C_u, and the simple algorithm of Johnson and Metaxas [21] can be used to find the maximum cost edge, requiring O(log α) time and O(α) work, since |C_u| = O(α). Otherwise, the computation is more complex and involves the use of the data structure FC. In this case, the main idea is to exploit the `super' tree T̂, whose description is stored in the data structures D3 and D4. It is simple to observe that the cycle C can be decomposed into a sequence of alternating simple paths of T, entirely contained within the clusters, and external edges (see Figure 3). Thus, in order to speed up the computation of the maximum cost edge on C, we do not scan the edges of C straightforwardly, but exploit the information available in FC about the simple paths in C that are contained within a cluster, since they are incident on boundary nodes. Therefore, the maximum cost edge in those simple paths is computed using the data structure FC, while the maximum cost edge among the external edges in C can be computed efficiently by observing that those edges form a cycle in the super tree T̂ consisting of O(n̂/α) nodes. Thus we can apply to them the same technique used in Johnson and Metaxas's algorithm [21], requiring at most O(n̂/α) work and O(log n̂) time. Deleting a single edge. We consider the more interesting case, which deletes an edge (x, y) belonging to the minimum spanning tree T; otherwise this operation does not significantly affect the data structures. If the edge (x, y) is internal to a cluster, we execute the Split operation on it and maintain the consistency of the cluster partitioning.
Now T is partitioned into two connected components, and the minimum spanning tree can be recomputed with the help of the data structure E, by efficiently finding the minimum cost edge connecting these two components. Recall that T̂ denotes the super tree whose nodes are the clusters and whose edges are the external edges of T (see Figure 3). After having split T (and thus T̂) into two connected components, we are interested in finding the entries E_ij such that C_i belongs to one half of the induced cut, while C_j belongs to the other half. In fact, such entries are the candidates to contain the minimum cost edge crossing that cut. Therefore, to identify those entries, we simply apply in T̂ a marking procedure to the clusters, depending on which half of the cut they belong to. Then, the marks are broadcast in the data structure E, so that each entry E_ij knows whether it refers to two clusters belonging to the two different halves of the cut. This process retrieves at most O(n̂^2/α^2) edges that are candidates for being the minimum cost edge crossing the cut (i.e., as many candidates as the total number of cluster pairs). A parallel minimum computation on those values finds the edge to be inserted in order to recompute the minimum spanning tree. Thus, a single edge deletion from the MST requires O(n̂^2/α^2 + α) work and O(log α + log n̂) time on the EREW PRAM model. Now, in order to minimize the total work of the proposed fully-dynamic algorithm, Das and Ferragina [10, 9] assumed α = n̂^{2/3} = O(m^{2/3}). Substituting this value of α into the time and work bounds computed for the single edge insertion and deletion procedures, the complexities of Table 3 are achieved. To further reduce the total work, Das and Ferragina combined the sparsification data structure with the algorithm for single update operations. Indeed, a group of O(n^{2/3}/log n) processors is mapped to the leaf L_i affected by the single edge update operation. These

Figure 3. An example of a tree of clusters T̂, where the C_i's are the clusters into which the minimum spanning tree T was partitioned. The dotted edges are in the graph but not in T. The cycle induced by the insertion of (u, w) consists of (u, v), path_{C4}[v, z], (z, b), path_{C2}[b, c], and (c, w), where path_C[h, k] denotes the simple path in T connecting the boundary nodes h, k ∈ C.

processors in parallel maintain the MST certificate of this leaf, requiring O(log n) time, since the subgraph is sparse. As described in Section 5.1, this update is propagated to all the nodes on the affected path π connecting L_i to the root of the sparsification tree. The group of processors used for L_i scans the path π sequentially, and allows the update of the certificates on π to be performed in parallel. Since |π| = O(log(m/n)) and each node requires the application of the fully-dynamic parallel algorithm, we achieve the performance bounds provided in Table 3.
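The cut-marking step of the deletion procedure above can be sketched sequentially as follows. The cluster identifiers, the `E_min` summary of the minimum edge in each E_ij list, and all names are illustrative assumptions, not code from the paper.

```python
def min_crossing_edge(super_edges, clusters, removed, E_min):
    """Sketch of the deletion step on the super tree: super_edges are the
    external tree edges of T-hat (pairs of cluster ids), `removed` is the
    external edge just deleted, and E_min[(i, j)] is the cost of the
    minimum non-tree edge between clusters i and j.  Marks the two halves
    of the cut, then scans the candidate entries for the cheapest one."""
    # Mark every cluster with the half of the cut it falls into.
    adj = {c: [] for c in clusters}
    for i, j in super_edges:
        if (i, j) != removed and (j, i) != removed:
            adj[i].append(j)
            adj[j].append(i)
    side, stack = {removed[0]: 0}, [removed[0]]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in side:
                side[y] = 0
                stack.append(y)
    for c in clusters:
        side.setdefault(c, 1)          # everything unreachable: other half
    # Only entries whose clusters lie in different halves are candidates;
    # a parallel minimum over them yields the replacement edge.
    candidates = [(cost, i, j) for (i, j), cost in E_min.items()
                  if side[i] != side[j]]
    return min(candidates, default=None)
```

There are O(n̂^2/α^2) cluster pairs, so this scan matches the candidate bound quoted above; in the PRAM algorithm the marking and the minimum are done by broadcasting and parallel reduction instead of the loops used here.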

5.2.5 Ferragina's Algorithm

Recently, Ferragina [13] combined the sparsification data structure with a novel parallel technique which efficiently treats edge deletions. Indeed, such a technique avoids the sequential scan of the affected path π as in [10], and recomputes in parallel all the certificates contained in the nodes of π. Inserting a single edge. Recall that the insertion of an edge, say e_ins, affects the small leaf L_k, and thus we have to update all the certificates of the nodes in the rightmost path π of the sparsification tree (see Section 5.1). In principle we want to distribute the information about the edge e_ins directly to all the nodes on π, without waiting for the "updating information" coming from their descendants. Thus, we distribute e_ins to all the nodes on π, by copying it |π| times in order to avoid read/write conflicts in the following steps. Then, we recompute simultaneously the certificates C(u) of all the nodes u ∈ π, inserting e_ins into all G(u), by the fully-dynamic parallel algorithm of Das and Ferragina [10] applied to G(u). Finally, the new subgraph G(u) is recomputed as the "union" of the two certificates of its children. The correctness of this algorithm follows from the observation that the augmented subgraph G(u) ∪ {e_ins} is a certificate of the subgraph formed by all the edges contained in the leaves descending from u in the sparsification tree. This trivially follows from the properties of certificates, discussed in Section 5.1. Thus, computing the MST of G(u) ∪ {e_ins} consists of inserting the single edge e_ins into C(u). Moreover, the "recomputation" of the new subgraph G(u) by merging the two certificates of its children is dominated by the complexity of the previous steps, since the merge process is not explicitly performed, but takes advantage of the stability property of the MST. Therefore,

the total time is O(log n) and the total work is O(n^{2/3} log(m/n)). Deleting a single edge. The simple distribution strategy adopted to perform a single edge insertion efficiently cannot be applied to the deletion of an edge. This is due to the fact that the certificate of each node summarizes only the information relevant for the computation of the MST of the underlying subgraph. Thus, the edge that needs to be inserted to maintain the certificate of a node after an edge update might not be available. Hence, we proceed as follows. Let G_tot(n_j) be the graph obtained by merging the edges contained in the leaves descending from n_j. Thus, by the properties of certificates, G_tot(n_j) corresponds to the subgraph of G whose MST is C(n_j). Obviously, G(n_j) ⊆ G_tot(n_j). Suppose the edge e_del is deleted from the leaf n_0, thus obtaining the graph G(n_0) − {e_del}. The affected path π connects n_0 to the root of the sparsification tree, and it is formed by the nodes n_0, n_1, ..., n_{ℓ−1}, where |π| = ℓ. For each n_j ∈ π, we define C_new(n_j) as the new certificate of n_j after the deletion of e_del; indeed, C_new(n_j) denotes the certificate of G_tot(n_j) − {e_del}. Now, if the MST C(n_j) does not contain the edge e_del, all the ancestors of n_j (including n_j itself) are not affected by the deletion operation, according to the updating procedure. Otherwise e_del ∈ C(n_j), and its deletion cannot be performed by recomputing the certificate of n_j using only the edges within G(n_j); it has to consider G_tot(n_j). Ferragina has shown how to use the certificates of the sparse subgraphs G(n_j) − {e_del} for determining the final MST in n_j, say C_new(n_j), for all n_j ∈ π. Let e_j^ins denote the edge to be inserted into C(n_j) − {e_del} in order to recompute the MST of the subgraph G(n_j) − {e_del} ⊆ G_tot(n_j) − {e_del}. It can be shown that:

Lemma 5.3. The edge e_k^eff to be inserted into C(n_k) − {e_del}, for maintaining the certificate C_new(n_k), belongs to the set {e_0^ins, e_1^ins, ..., e_k^ins}.

A straightforward application of this result implies that a single processor mapped to each node n_k of the affected path π can scan all the descendants of n_k in order to compute e_k^eff. This would require O(log(m/n)) time and O(log^2(m/n)) total work. In [13], it is also shown how to further reduce the total work of this computation by performing a kind of parallel prefix sum on the list of edges e_j^ins, for all j ≤ ℓ. The operation underlying this parallel prefix computation exploits the cut induced in each certificate C(n_j) by the removal of e_del. The obtained algorithm globally requires O(log n) time and O(n^{2/3} log(m/n)) work; thus it is faster and requires less work than any previous algorithm (see Table 3 for a comparison). It is very important to point out that this technique is general and can be applied to speed up parallel fully-dynamic algorithms for MST. Namely, let us assume that a more efficient fully-dynamic algorithm for MST requires T(n, m) time and W(n, m) total work. The new algorithm attained by applying Ferragina's technique would then require O(T(n, n) + log log(m/n)) time and O(W(n, n) log(m/n)) total work. Note that T(n, n) and W(n, n) are the time and work bounds of the given fully-dynamic algorithm on sparse graphs. Clearly, the total work of the attained algorithm is reduced, while still maintaining the same time complexity.

5.3 Multiple Edge Updates

Table 4. Performance of Parallel MST Algorithms Under Multiple Edge Updates

Batch of b insertions:
  Researchers               PRAM model   Time Complexity            Work
  Pawagi & Kaser [33]       CREW         O(log n log b)             O(nb)
  Shen & Liang [39]         CREW         O(log n log b)             O(max{nb, n log n log b})
  Ferragina & Luccio [14]   CREW         O(log^{3/2} n)             O(n log^{3/2} n log(m/n))
  Das & Ferragina [10]      EREW         O((b + log(m/n)) log n)    O(n^{2/3}(b + log(m/n)))

Batch of b deletions:
  Researchers               PRAM model   Time Complexity            Work
  Pawagi & Kaser [33]       CREW         O(log n + log^2 b)         O(n^2(1 + log b/log n))
  Shen & Liang [39]         CREW         O(log n log b)             O(n^2)
  Ferragina & Luccio [14]   EREW         O(log^{3/2} n log(m/n))    O(bn log^{3/2} n log(m/n))
  Das & Ferragina [10]      EREW         O(b log n log(m/n))        O(bn^{2/3} log(m/n))

Pawagi and Kaser [33] were the first to propose a parallel algorithm on the CREW PRAM model to maintain the MST under multiple insertions or deletions of edges. Although other results have appeared in the literature [14, 10, 39], efficient solutions to the multiple edge deletion problem are still not known. The following discusses the main ideas behind the existing literature.

5.3.1 Pawagi and Kaser's Algorithm

The multiple edge insertion problem is solved by transforming it into the problem of inserting a group of vertices into the graph, along with a proper set of incident edges [33]. In this way, the work-optimal solutions known for the latter problem can be used to compute the new MST T′ in the case of multiple edge updates (see Section 4.2). The transformation consists of mapping each edge insertion into a single vertex insertion, so that the graph obtained after the update is a homeomorphic expansion of the original graph. This transformation provides a fast parallel solution; however, it suffers from the inherent lower bound Ω(nb) of the multiple vertex insertion problem. Similarly, in the case of b deletions, Pawagi and Kaser's algorithm computes the new MST by simply applying a start-over algorithm to the graph obtained after the removal of the b edges. The key property here is that the start-over algorithm can take advantage of the connected components into which T has been partitioned after the removal of the b edges. Thus, the hook-and-contract technique of the static algorithm can start from this partition instead of processing the whole graph from scratch. Even in this case, the transformation adopted to solve the multiple edge deletion problem efficiently suffers from the lower bound Ω(m + n) relative to the recomputation of the MST from scratch. Before describing the algorithm for the multiple update problem, we need to introduce a simple lemma that will allow us to perform the insertion and deletion operations within the claimed bounds.

Lemma 5.4. On the CREW PRAM model, for an n-vertex tree, the maximum cost edge on the path from each vertex to the root r can be found in O(n/p) time with p ≤ n/log n processors.

This result has been recently extended by Das and Ferragina [10] to the EREW PRAM model by augmenting the tree contraction technique [29] with a new RAKE operation. Inserting a batch of b edges. If b = 1, i.e., the batch consists of a single edge to be inserted, the approach is to remove the maximum cost edge from the cycle induced by the insertion of the edge into T. However, this approach does not lead to a resource-efficient algorithm for a batch of b > 1 edge insertions, since the total number of induced cycles grows exponentially with b. Pawagi and Kaser generalized the approach due to Chin and Houck [4], arriving at the result shown in Table 4. Each edge (u, v) to be inserted is replaced by a pair of new edges (u, z) and (z, v) having costs c(u, v) and −∞, respectively, where z is a new (dummy) vertex. The resulting graph is a homeomorphic expansion of the original graph, in the sense that the maximum cost edge on any cycle is the same as the maximum cost edge on the corresponding cycle in the original graph. Thus, the multiple edge insertion problem is transformed into the multiple vertex insertion problem, where each new vertex has two incident edges. In order to compute the new MST, we can simply apply the multiple vertex insertion algorithm (see Section 4.2) to all the new vertices z, along with their incident edges (u, z) and (v, z). Deleting a batch of b edges. This simple algorithm removes all the b edges from the current MST T, in such a way that at most b + 1 components are identified. Then, following the approach in [43], the minimum cost edge between each pair of components is found. The graph G′ having as vertices the components, and as edges the minimum cost edges connecting them, is constructed. Finally, the MST of this graph is computed from scratch and its edges are added back to form the new MST, requiring the bounds shown in Table 4.
A final pass using optimal Euler tour techniques [19] may be required to correctly orient the edges of the minimum spanning tree and to reroot it.
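The homeomorphic expansion used for batched insertions can be sketched as follows. This is an illustrative sketch; the dummy-vertex naming scheme and function name are our own assumptions.

```python
def expand_insertions(edges, batch):
    """Sketch of Pawagi and Kaser's transformation: each edge (u, v, c) in
    the insertion batch is replaced by a fresh dummy vertex z and two
    edges (u, z) of cost c and (z, v) of cost -infinity, turning b edge
    insertions into b vertex insertions.  Each -infinity edge is always
    in the MST, so the MST of the expanded graph maps back to the MST of
    the original graph plus the batch (contract the dummy vertices)."""
    expanded = list(edges)
    dummies = []
    for k, (u, v, c) in enumerate(batch):
        z = "dummy%d" % k            # hypothetical fresh vertex name
        expanded.append((u, z, c))
        expanded.append((z, v, float("-inf")))
        dummies.append(z)
    return expanded, dummies
```

Since the two replacement edges are in series, every cycle through (u, z, v) has the same maximum cost edge as the corresponding cycle through (u, v), which is exactly the homeomorphic-expansion property quoted above.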

5.3.2 Das and Ferragina's Algorithm

The parallel algorithms of [33, 34, 35, 39] for maintaining the MST under a batch of edge insertions and deletions do not take advantage of any special data structure that could reduce the number of operations needed to recompute the MST. In this context, it is worth mentioning that Ferragina and Luccio [14] first applied a dynamic data structure along with known parallel algorithmic techniques. However, the drawback of this approach is that the certificate of each affected node is always recomputed from scratch. In [10], we have designed an "adaptive" version of the parallel algorithm presented in [14] to treat a mixed batch of edge updates. Here, "adaptive" means that the update algorithm chooses to apply either a static or the fully-dynamic algorithm of [10], depending on the number of edge updates affecting a single node of the sparsification tree. In this way, the adaptive algorithm takes advantage of the stability property of the MST certificate (see Section 5.1). Inserting a batch of b edges. The new batch of edges is inserted into the rightmost leaf of the sparsification tree, possibly splitting that leaf to maintain its sparsity. Let us consider b = O(n), so that only a constant number of additional leaves are created. A pipelined computation is performed on the affected rightmost path π to perform the b edge insertions. Indeed, the fully-dynamic parallel algorithm provided in [10] is

used to perform the single edge updates efficiently at each node on π. As shown in Section 5.1, given an edge insertion affecting the rightmost leaf of the sparsification tree, this insertion determines at most two edge updates (one insertion and one deletion) in the subgraph of each node π[i] on π, where 1 ≤ i ≤ ⌈log(m/n)⌉. Let o_1(t) be the t-th edge insertion of the input batch, and o_i(t) be the pair of edge insertion and deletion operations to be performed on the node π[i] as the result of the t-th update of the path π. Since the changes on the certificate of π[i] will affect the subgraph contained in π[i+1], we conclude that o_{i+1}(t) depends on o_i(t). Das and Ferragina's algorithm is applied on each o_i(t) in order to maintain the certificate of π[i], and then determine o_{i+1}(t). Thus, mapping O(n^{2/3}/log n) processors to each node of π, each edge update requires O(log n) time. Since |π| = O(log(m/n)), the procedure for b edge insertions requires the performance bounds stated in Table 4. In comparison, the approaches proposed in [33, 39] require a larger total work, even though they run on the CREW PRAM model. Moreover, the algorithm in [14] requires more time and work if b < n^{1/3} log^{3/2} n. Refer to Table 4 for an overview of this comparison. Deleting a batch of b edges. In this case the concept of an adaptive algorithm is introduced, in the following sense. Depending on the number of edge updates affecting a node of the sparsification tree, the adaptive algorithm decides whether to recompute the certificate from scratch or to apply a fully-dynamic algorithm repeatedly. Since the subgraph is sparse, the recomputation of the certificate (i.e., MST) of each affected node from scratch requires O(log n log log n) time and O(n) processors, using the algorithm of Chong and Lam [6]. Instead, if we repeatedly apply the fully-dynamic algorithm of [10], we can recompute the certificate of each affected node in O(log n) time and O(n^{2/3}) work per update.
The leaves affected by the removal of b edges are the ones containing the edges to be deleted. In this case, b swap operations must be performed with the small leaf in order to maintain the balance in the distribution of the edges among the leaves of the sparsification tree. Thus, b upward paths might be affected by the updating process. The certificates of all the affected nodes are recomputed in parallel and level-by-level, so that each node is recomputed only once. This does not require pipelining. From now on, for the sake of presentation, let us fix our attention on a single affected node of the sparsification tree. Let b_S be the size of the batch of updates affecting a single node. In order to reduce the time required by each certificate recomputation, the adaptive algorithm repeatedly applies the fully-dynamic algorithm, as sketched above, on the sparse subgraph of this node if b_S < log n, thus requiring O(b_S log n) time and O(b_S n^{2/3}) work. Otherwise, the adaptive algorithm recomputes from scratch the minimum spanning tree of its sparse subgraph, and thus also the whole set of data structures maintained by the fully-dynamic algorithm in this node. In this case, the required updating time is O(log^2 n) and the overall work is O(n log n log log n). On the other hand, if the goal is to minimize the total work performed in each affected node of the sparsification tree, the adaptive algorithm repeatedly applies the fully-dynamic algorithm if b_S < n^{1/3} log n log log n; otherwise it performs a recomputation from scratch.
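Under the cost model quoted above, the adaptive choice for one affected node can be sketched as follows. This is a hypothetical helper illustrating the thresholds, not code from [10].

```python
from math import log2

def adaptive_strategy(b_s, n, minimize="time"):
    """Choose between b_s repeated fully-dynamic updates (O(b_s log n)
    time, O(b_s n^(2/3)) work) and one from-scratch recomputation
    (O(log^2 n) time, O(n log n loglog n) work), per the thresholds
    stated in the text."""
    if minimize == "time":
        threshold = log2(n)                                  # b_s < log n
    else:  # minimize total work
        threshold = n ** (1 / 3) * log2(n) * log2(log2(n))   # b_s < n^(1/3) log n loglog n
    return "fully-dynamic" if b_s < threshold else "from-scratch"
```

The two thresholds differ because repeated dynamic updates are cheap in time but their work grows linearly in b_s, whereas a from-scratch recomputation pays a fixed, larger cost regardless of the batch size.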


6 Conclusions

In this chapter, we have described dynamic parallel algorithms for updating the minimum spanning tree under single and multiple edge/vertex insertions and deletions. As discussed, several parallel algorithms have been developed for the edge/vertex update problem, but work-optimal parallel algorithms still remain to be developed. Therefore, new parallel data structures and algorithmic techniques are needed to match even the sequential bound O(√n) provided in [11]. This is an important topic, since maintaining the MST of an undirected graph is the basis of many real-life applications [19, 27]. It remains an open question how to design efficient dynamic parallel algorithms for other problems, for example k-vertex connectivity or k-edge connectivity, possibly extending the use of the sparsification data structures in the context of parallel computing.

References

[1] B. Awerbuch and Y. Shiloach. New connectivity and MSF algorithms for Shuffle-Exchange network and PRAM. IEEE Transactions on Computers, C-36(10):1258-1263, 1987.
[2] C. Berge and A. Ghouila-Houri. Programming, Games, and Transportation Networks. John Wiley, New York, 1965.
[3] J. Cheriyan and R. Thurimella. Algorithms for parallel k-vertex-connectivity and sparse certificates. In ACM Symposium on Theory of Computing, pages 391-401, 1991.
[4] F. Chin and D. Houck. Algorithms for updating minimum spanning trees. Journal of Computer and System Science, 16:333-344, 1978.
[5] F. Y. Chin, J. Lam, and I. Chen. Efficient parallel algorithms for some graph problems. Communications of the ACM, 25(9), 1982.
[6] K. W. Chong and T. W. Lam. Finding connected components in O(log n log log n) time on the EREW PRAM. Journal of Algorithms, 18:378-402, 1995.
[7] R. Cole and U. Vishkin. Approximate and exact parallel scheduling with applications to list, tree and graph problems. In IEEE Symposium on Foundations of Computer Science, pages 478-491, 1986.
[8] R. Cole and U. Vishkin. The accelerated centroid decomposition technique for optimal parallel tree evaluation in logarithmic time. Algorithmica, 3:329-346, 1988.
[9] S. K. Das and P. Ferragina. A fully-dynamic EREW parallel algorithm for updating MST. Technical Report CRPDC-94-8, Center for Research on Parallel and Distributed Computing, Dept. of Computer Science, University of North Texas, 1994.
[10] S. K. Das and P. Ferragina. An o(n) work EREW parallel algorithm for updating MST. In European Symposium on Algorithms, Lecture Notes in Computer Science, Springer-Verlag, pages 331-342, 1994.
[11] D. Eppstein, Z. Galil, and G. F. Italiano. Improved sparsification. Technical Report 9320, Dept. of Information and Computer Science, University of California, Irvine, 1993.
[12] D. Eppstein, Z. Galil, G. F. Italiano, and A. Nissenzweig. Sparsification - a technique for speeding up dynamic graph algorithms.
In IEEE Symposium on Foundations of Computer Science, pages 60-69, 1992.
[13] P. Ferragina. An EREW PRAM fully-dynamic algorithm for MST. In International Parallel Processing Symposium, 1995.


[14] P. Ferragina and F. Luccio. Batch dynamic algorithms for two graph problems. In Parallel Architectures and Languages Europe, Lecture Notes in Computer Science, Springer-Verlag, pages 713–724, 1994.
[15] G. N. Frederickson. Data structures for on-line updating of minimum spanning trees, with applications. SIAM Journal on Computing, 14(4):781–798, 1985.
[16] H. N. Gabow, Z. Galil, and T. H. Spencer. Efficient implementation of graph algorithms using contraction. Journal of the ACM, 36(3):540–572, 1989.
[17] H. N. Gabow, Z. Galil, T. H. Spencer, and R. E. Tarjan. Efficient algorithms for finding minimum spanning trees in undirected and directed graphs. Combinatorica, 6:109–122, 1986.
[18] F. Harary. Graph Theory. Addison-Wesley, 1969.
[19] J. JáJá. An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
[20] D. B. Johnson and P. Metaxas. Connected components in O(log^{3/2} n) parallel time for the CREW PRAM. In IEEE Symposium on Foundations of Computer Science, pages 688–697, 1991.
[21] D. B. Johnson and P. Metaxas. Optimal parallel and sequential algorithms for the vertex updating problem of a minimum spanning tree. Technical Report PCS-TR91-159, Dept. of Mathematics and Computer Science, Dartmouth College, 1991.
[22] D. B. Johnson and P. Metaxas. Optimal algorithms for the vertex updating problem of a minimum spanning tree. In International Parallel Processing Symposium, pages 306–314, 1992.
[23] D. B. Johnson and P. Metaxas. A parallel algorithm for computing minimum spanning trees. In ACM Symposium on Parallel Algorithms and Architectures, pages 363–371, 1992.
[24] H. Jung and K. Mehlhorn. Parallel algorithms for computing maximal independent sets in trees and for updating minimum spanning trees. Information Processing Letters, 27(5):227–236, 1988.
[25] D. R. Karger. Random sampling in matroids, with applications to graph connectivity and minimum spanning trees. In IEEE Symposium on Foundations of Computer Science, 1993.
[26] D. R. Karger, N. Nisan, and M. Parnas. Fast connected components algorithms for the EREW PRAM. In ACM Symposium on Parallel Algorithms and Architectures, pages 373–381, 1992.
[27] R. M. Karp and V. Ramachandran. Parallel algorithms for shared-memory machines. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, chapter 17, pages 869–941. Elsevier Science Publishers B.V., 1990.
[28] J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7:48–50, 1956.
[29] G. L. Miller and J. H. Reif. Parallel tree contraction and its applications. In IEEE Symposium on Foundations of Computer Science, pages 478–489, 1985.
[30] D. Nath and S. N. Maheshwari. Parallel algorithms for the connected components and minimal spanning tree problems. Information Processing Letters, 14:7–11, 1982.
[31] N. Nisan, E. Szemerédi, and A. Wigderson. Undirected connectivity in O(log^{1.5} n) space. In IEEE Symposium on Foundations of Computer Science, 1992.
[32] S. Pawagi. A parallel algorithm for multiple updates of the minimum spanning trees. In International Conference on Parallel Processing, volume 3, pages 9–15, 1989.
[33] S. Pawagi and O. Kaser. Optimal parallel algorithms for multiple updates of minimum spanning trees. Algorithmica, 9:357–382, 1993.
[34] S. Pawagi and I. V. Ramakrishnan. Parallel update of graph properties in logarithmic time. In International Conference on Parallel Processing, volume 3, pages 682–691, 1985.


[35] S. Pawagi and I. V. Ramakrishnan. An O(log n) algorithm for parallel update of minimum spanning trees. Information Processing Letters, 22(5):223–229, 1986.
[36] R. C. Prim. Shortest connection networks and some generalizations. Bell System Technical Journal, 36(6):1389–1401, 1957.
[37] J. H. Reif, editor. Synthesis of Parallel Algorithms. Morgan Kaufmann Publishers, San Mateo, CA, 1993.
[38] C. Savage and J. JáJá. Fast, efficient parallel algorithms for some graph problems. SIAM Journal on Computing, 10(4):682–691, 1981.
[39] X. Shen and W. Liang. A parallel algorithm for multiple edge updates of minimum spanning trees. In International Parallel Processing Symposium, pages 310–317, 1993.
[40] Y. Shiloach and U. Vishkin. An O(log n) parallel connectivity algorithm. Journal of Algorithms, 3(1):57–67, 1982.
[41] P. M. Spira and A. Pan. On finding and updating spanning trees and shortest paths. SIAM Journal on Computing, 4(3):375–380, 1975.
[42] R. E. Tarjan. Data Structures and Network Algorithms. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1983.
[43] Y. H. Tsin. On handling vertex deletion in updating minimum spanning trees. Information Processing Letters, 27:167–168, 1988.
[44] P. Varman and K. Doshi. A parallel vertex insertion algorithm for minimum spanning trees. Theoretical Computer Science, 58:379–397, 1988.
