Generating Minimal Perfect Hash Functions

Zbigniew J. Czech*
Institute of Computer Science, Silesia University of Technology, 44-100 Gliwice, Poland

Abstract

Randomized, deterministic and parallel algorithms for generating minimal perfect hash functions (MPHF) are proposed. Given a set of keys, $W$, which are character strings over some alphabet, the algorithms use a three-step approach (mapping, ordering, searching) to find an MPHF of the form $h(w) = (h_0(w) + g(h_1(w)) + g(h_2(w))) \bmod m$, $w \in W$, where $h_0$, $h_1$, $h_2$ are auxiliary pseudorandom functions, $m$ is the number of input keys, and $g$ is a function implemented as a lookup table. The randomized and deterministic algorithms are time optimal, i.e. they construct the MPHF, of relatively large representation space, in expected time $O(m)$. The time complexity of the parallel algorithm is exponential in $m$, but it generates a space efficient MPHF. The algorithm, which is based on a specially devised data structure called a reversed trie, exhibits a consistent and almost linear speedup in the number of processors on a message-based distributed-memory computer. The results of timing experiments conducted on the implementations of the algorithms are presented.

Key words. data structures, hashing, minimal perfect hashing, random graphs, parallel algorithms

1 Introduction

Let $\mathcal{W}$ be the universe of character strings (keys) having some finite maximum length. The keys are composed of characters from an ordered alphabet $\Sigma$. Let $W$ be a static¹ set of $m$ distinct keys belonging to $\mathcal{W}$. A hash function is a function $h: W \to I$ that maps the set $W$ into some given interval of integers $I$, say $[0, q-1]$, where $q \ge m$. The hash function computes for each key from $W$ an address (an integer from $I$) for the storage and retrieval of that key. The storage area used to store keys is known as a hash table. A perfect (or 1-probe) hash function (PHF) is an injection $h: W \to I$, where $W$ and $I$ are sets as defined above, $q \ge m$. If $q = m$, then we say that $h$ is a minimal perfect hash function (MPHF). The function $h$ is said to be order preserving if for any pair of keys $w_i, w_j \in W$ the equivalence $h(w_i) < h(w_j) \iff i < j$ holds. In other words, if the keys in $W$ are arranged in some prespecified order, then $h$ preserves this order in the hash table. Such a function is called an ordered minimal perfect hash function (OMPHF). Usually many MPHFs exist for a given set $W$; we are interested in finding any one of them.

MPHFs are used for memory efficient storage and fast retrieval of items from static sets, such as reserved words in programming languages, command names in operating systems, commonly used words in natural languages, etc. Other applications like hypertext, hypermedia and large CD-ROM databases are mentioned in [14]. Various algorithms of different time complexities for constructing MPHFs have been presented, including [3, 4, 5, 6, 7, 8, 14, 16, 18, 19, 21, 31, 33, 35]. An overview of perfect hashing is given in [17, §3.3.16], and the area is surveyed in [26].

We propose randomized, deterministic and parallel algorithms for constructing MPHFs. The randomized and deterministic algorithms are time optimal, i.e. they find the MPHF, of relatively large representation space, in expected time $O(m)$. The time complexity of the parallel algorithm is exponential in $m$, but it generates a space efficient MPHF.

The paper is organized as follows. In Section 2 we give an overview of the general approach used by the algorithms. Section 3 contains the description of the randomized algorithm. Section 4 presents the deterministic algorithm. In Section 5 the parallel algorithm is discussed. Section 6 concludes the paper, and the Appendix contains the results of timing experiments conducted on the implementations of the algorithms.

*This research was supported by the grants BK-608/RAu2/93 and BK-17/RAu2/94 of the Polish Committee for Scientific Research.

¹By static we mean a set which is essentially unchanging, i.e. it is not subject to insertions or deletions of elements.

2 The MOS approach

The construction of the MPHF is accomplished in three steps: mapping, ordering and searching (MOS) [14]. We look for an MPHF of the form [31]:





$$h(w) = (h_0(w) + g(h_1(w)) + g(h_2(w))) \bmod m \qquad (1)$$

where $h_0$, $h_1$ and $h_2$ are auxiliary pseudorandom functions, and $g$ is a function implemented as a lookup table.

The first step, mapping, transforms the set of keys into a set of triples of integers. Three tables $T_0$, $T_1$ and $T_2$ of random integers from the interval $[0, m-1]$ are generated, one for each of the functions $h_0$, $h_1$ and $h_2$. Each table contains a random number for each position $i$ in the key and each possible character at this position. Given a key as the character string $w = a_1 a_2 \ldots a_{|w|}$, the triple is computed using the following formulas:

$$h_0(w) = \left(\sum_{i=1}^{|w|} T_0[i_s, a_i]\right) \bmod m,$$
$$h_1(w) = \left(\sum_{i=1}^{|w|} T_1[i_s, a_i]\right) \bmod r, \qquad (2)$$
$$h_2(w) = \left(\left(\sum_{i=1}^{|w|} T_2[i_s, a_i]\right) \bmod r\right) + r,$$

where $r$ is a parameter discussed later, $i_s = ((|w| + i) \bmod |w|_{\max}) + 1$ determines the starting position for fetching numbers from the tables, and $|w|_{\max}$ is the maximum key length in the set $W$. The mapping step has to preserve "uniqueness": if two keys are distinguishable in the original universe, they have to be distinguishable in the new one. In other words, it is essential that the triples generated for the input keys are all distinct. It can be proved that the probability of getting distinct triples using formulas (2) goes quickly to 1 for increasing $m$ [14]. The values $h_1(w)$ and $h_2(w)$ define the undirected bipartite dependency graph $G = (V, E)$, with $V = \{h_1(w) : w \in W\} \cup \{h_2(w) : w \in W\}$ and $E = \{(h_1(w), h_2(w)) : w \in W\}$.

The second step, ordering, places the keys in a sequential collocation that determines the precedence in which hash values are assigned to keys. Keys are divided into subsets $W_0, W_1, \ldots, W_k$, such that $W_0 = \emptyset$, $W_i \subset W_{i+1}$, and $W_k = W$, for some $k$. The sequence of these subsets is called a tower, each subset $X_i = W_i - W_{i-1}$ is called a level of the tower, and $k$ is called the height of the tower. A subset $X_i$ contains keys which are interdependent. Thus the hash values must be assigned to the members of a level at the same time, since the assignment of a hash value to any of them determines the hash values of all the others.

The third step, searching, tries to extend the desired function $h$ from the domain $W_{i-1}$ to $W_i$. It takes the levels created in the ordering step and assigns hash values to the keys so as to produce the MPHF. If the searching step encounters a level for which $h$ cannot be extended, it backtracks to earlier levels, assigns different hash values to the keys of these levels, and tries again to recompute hash values for successive levels. This is the only step of potentially exponential time complexity.

A crucial issue is the performance of an algorithm constructing the MPHF, and the performance of the MPHF itself. They depend on: construction time, the time needed to construct $h$; evaluation time, the time needed to evaluate $h$; and representation space, the amount of memory needed for storing keys in the hash table and for storing the parameters of $h$. The best of the known algorithms generate the hash function in time proportional to the number of input keys, i.e. in time $O(m)$ [14]. It is desirable that the hash function is evaluated in time $O(1)$, and that the space to store the keys and the function is small. Provided that a key fits into one hash table location, the minimal size of the table is $m$. Mehlhorn proved that if the keys are integers belonging to the universe $U = \{0, 1, \ldots, M-1\}$, then the lower bound on the representation space of an MPHF is $\Omega(m + \log\log M)$ bits [28, 29]. A perfect hash scheme with $O(1)$ evaluation time which achieves this lower bound was given by Schmidt and Siegel [32]. The scheme is a variation of the one presented by Fredman, Komlós and Szemerédi, and can be constructed deterministically in time $O(m^3 \log M)$ [16].

The bounds cited above are valid for the standard RAM model in which $O(\log M)$-bit words can be added, subtracted, multiplied and (integer) divided in constant time. In particular, an array access of an $O(\log M)$-bit word takes unit time. We modify this model by assuming that the constant time operations are performed on $O(\log m)$-bit words. In the worst case, a single key is represented on $|w|_{\max} \log|\Sigma|$ bits, and thus more than one $O(\log m)$-bit word may be required for storing the key.
Under these assumptions, the evaluation time of the MPHF given by Eq. (1) is proportional to the size of the key, i.e. it is $O(|w|)$. To represent the MPHF we need to store the tables $T_0$, $T_1$ and $T_2$, and the lookup table $g$. Provided that the size of the alphabet $\Sigma$ is bounded by a constant, the space to store $T_0$, $T_1$ and $T_2$ is $|\Sigma| \cdot |w|_{\max} \log m = O(|w|_{\max} \log m)$ bits. The size of table $g$, denoted by $|g|$, is determined by the parameter $r$ (see Eqs. (2)). The table consists of $|g| \log m = 2r \log m$ bits. To minimize the representation space we would like to have $r$ as small as possible. However, the smaller $r$ is, the harder it is to find a $g$ which makes $h$ perfect and minimal. In this paper we investigate three cases:

1. $|g| > 2m$ ($r > m$). For this case we propose a randomized algorithm of expected time complexity $O(m)$. It builds a random acyclic dependency graph which enables us to construct the OMPHF in only two steps: mapping and searching. The searching step consists of a depth-first search of the dependency graph.

2. $|g| = 2m$ ($r = m$). Here the three-step deterministic MOS algorithm is used. Since the dependency graph is sparse, the searching step, of potentially exponential time complexity, finds the MPHF in linear time.

3. $|g| < 2m$ ($r < m$). In this case a space efficient MPHF is desired. To find the function, an exhaustive search of exponential running time has to be executed. For this task we propose a parallel algorithm with almost linear speedup in the number of processors.
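To make the mapping step concrete, the following Python sketch computes the triple $(h_0(w), h_1(w), h_2(w))$ of Eqs. (2) for each key. It is only a sketch: the dictionary-based table layout, the example key set and the parameter choices are assumptions made here for illustration, not details taken from the paper's implementation.

    import random

    def make_tables(max_len, alphabet, m):
        # Three tables of random integers from [0, m-1]: one number per
        # (position in key, character) pair, as required by Eqs. (2).
        return [{(i, c): random.randrange(m)
                 for i in range(1, max_len + 1) for c in alphabet}
                for _ in range(3)]

    def triple(w, tables, m, r, max_len):
        # Compute (h0(w), h1(w), h2(w)) following Eqs. (2), where
        # i_s = ((|w| + i) mod |w|_max) + 1 selects the starting position
        # for fetching numbers from the tables.
        T0, T1, T2 = tables
        s0 = s1 = s2 = 0
        for i, c in enumerate(w, start=1):
            i_s = ((len(w) + i) % max_len) + 1
            s0 += T0[i_s, c]
            s1 += T1[i_s, c]
            s2 += T2[i_s, c]
        return s0 % m, s1 % r, (s2 % r) + r

    keys = ["apple", "pear", "plum", "cherry"]          # hypothetical input
    m, max_len = len(keys), max(len(k) for k in keys)
    r = m                                               # the |g| = 2m case
    tables = make_tables(max_len, "abcdefghijklmnopqrstuvwxyz", m)
    triples = [triple(k, tables, m, r, max_len) for k in keys]
    # The mapping step must yield pairwise distinct triples; the full
    # algorithms regenerate the tables until this holds.
    print(triples, len(set(triples)) == len(keys))

Note that $h_1$ falls into $[0, r-1]$ and $h_2$ into $[r, 2r-1]$, so the two vertex sets are disjoint and the dependency graph built on these values is bipartite.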

3 The randomized algorithm

The randomized algorithm [9, 10] constructs an OMPHF of the simplified form²:



$$h(w) = (g(h_1(w)) + g(h_2(w))) \bmod m \qquad (3)$$

where $h_1(w) = \left(\sum_{i=1}^{|w|} T_1[i_s, a_i]\right) \bmod |g|$ and $h_2(w) = \left(\sum_{i=1}^{|w|} T_2[i_s, a_i]\right) \bmod |g|$.

Consider the following problem. For a given dependency graph $G = (V, E)$, $|V| = |g| > 2m$, $|E| = m$, find a function $g: V \to [0, m-1]$ such that the function (3) is a bijection. In other words, we are looking for an assignment of values to vertices so that for each edge the sum of the values associated with its endpoints, taken modulo the number of edges, is a unique integer in the range $[0, m-1]$. This problem is not always solvable if arbitrary graphs are considered. However, if the dependency graph $G$ is acyclic, a simple searching procedure can be used to find values for each vertex (Fig. 1). Each key $w \in W$ defines a single edge in $G$, $e = (u, t)$, $u = h_1(w)$, $t = h_2(w)$, $e \in E$. Thus the hash function given by Eq. (3) can be rewritten as $h(e) = (g(u) + g(t)) \bmod m$.

The search proceeds as follows. Associate with each edge a unique number $h(e) \in [0, m-1]$ in any order. For each connected component of $G$ choose a vertex $v$, and set $g(v)$ to 0. Traverse the graph using a depth-first search (or any other regular search on a graph), beginning from vertex $v$. If vertex $t$ is reached from vertex $u$, and the value associated with the edge $e = (u, t)$ is $h(e)$, set $g(t)$ to $(h(e) - g(u)) \bmod m$. Apply the above method to each component of $G$. For example, if a component is a path on vertices $u$, $t$, $v$ with $h((u,t)) = 0$ and $h((t,v)) = 1$, then $g(u) = 0$ forces $g(t) = (0 - 0) \bmod m = 0$ and $g(v) = (1 - 0) \bmod m = 1$, and both edges receive their assigned hash values. Observe that we have reversed our original problem, by defining the values of the function $h$ first and then searching for suitable values of the function $g$. Due to this freedom in defining the hash values for keys we are able to construct the OMPHF. To prove the correctness of the method it is sufficient to show that the value of function $g$ is computed exactly once for each vertex. This property is clearly fulfilled if $G$ is acyclic. The values of tables $T_1$ and $T_2$ which define an acyclic graph $G$ are found by the mapping step (Fig. 2). It generates random tables repeatedly, until an acyclic graph is obtained³.

²Generating the OMPHF of the more general form $h(w) = (g(h_1(w)) + g(h_2(w)) + \ldots + g(h_r(w))) \bmod m$ is discussed in [27, 20]. It has been found experimentally that the minimum construction time and representation space of the OMPHF are achieved for $r = 3$. However, finding the OMPHF is then more complex, as hypergraphs are involved.

³Note that the dependency graph generated here is "standard", not bipartite.

    procedure traverse(u: vertex);
    begin
        for t in neighbors(u) do
            if g(t) < 0 then
                g(t) := (h(e) - g(u)) mod m;   -- e = (u, t)
                traverse(t);
            end if;
        end for;
    end traverse;

    begin
        Associate with each edge e in E a unique number h(e) in [0, m-1];
        g(v) := -1 for all v in V;
        for v in V do
            if g(v) < 0 then
                g(v) := 0;
                traverse(v);
            end if;
        end for;
    end;

Figure 1: The searching step

    repeat
        Initialize E := {};
        Generate random tables T1 and T2;
        for w in W do
            h1(w) := (sum of T1[i_s, a_i] for i = 1..|w|) mod |g|;
            h2(w) := (sum of T2[i_s, a_i] for i = 1..|w|) mod |g|;
            Add edge (h1(w), h2(w)) to graph G;
        end for;
    until G is acyclic;

Figure 2: The mapping step

Theorem 1 Given $|g| = cm$, $c > 2$, the expected time complexity of the randomized algorithm is $O(m)$.

Proof To prove the theorem, we show that the expected number of iterations in the mapping step can be made constant by a suitable choice of $c$. Let $p_a$ denote the probability of generating an acyclic graph with $cm$ vertices and $m$ edges. Let $X$ be a random variable such that $\Pr(X = i) = p_a(1 - p_a)^{i-1}$. By standard probability arguments, the mean of $X$, $E(X)$, which is equal to the expected number of iterations executed in the mapping step, is $1/p_a$. For random labeled graphs with $cm$ vertices and $m$ edges, as $cm \to \infty$, the expected number of cycles of length $d$ tends towards $2^d/(2dc^d)$ [2, p. 98]. This result is for graphs with no loops ($d = 1$) or multiple edges ($d = 2$); however, it may be extended to cover them. Then, the probability of having an acyclic graph tends towards $\exp\left(-\sum_{d=1}^{cm} 2^d/(2dc^d)\right)$ [13]. Since for $c > 2$

$$\lim_{cm \to \infty} \sum_{d=1}^{cm} \frac{2^d}{2dc^d} = \frac{1}{2} \ln \frac{c}{c-2},$$

the probability of getting an acyclic graph tends towards $p_a = \sqrt{(c-2)/c}$. Now, the expected time complexity of the mapping step is $O(E(X)\,|w|_{\max}\, m) = O(|w|_{\max}\, m / p_a)$. For fixed $|w|_{\max}$ and $c > 2$ this bound is $O(m)$. The searching step executes the depth-first search of the dependency graph in time $O(|g| + m) = O(m)$. Thus, the expected time complexity of the randomized algorithm is $O(m)$. □

For $|g| = cm = 3m$, the expected number of iterations executed in the mapping step is $E(X) \approx 1/p_a = \sqrt{3} \approx 1.7$. This theoretical result was confirmed experimentally (see Table 1).
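The randomized algorithm can be condensed into the following self-contained Python sketch, which combines the mapping step of Fig. 2 with the searching step of Fig. 1. Two simplifications are assumptions of this sketch rather than the paper's method: the tables $T_1$, $T_2$ are replaced by seeded calls to Python's built-in hash, and acyclicity is tested with union-find.

    import random
    from collections import defaultdict

    def is_forest(n, edges):
        # Union-find acyclicity test: a repeated union inside one
        # component (or a self-loop) means the graph has a cycle.
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                return False
            parent[ru] = rv
        return True

    def build_omphf(keys, c=3):
        m, n = len(keys), c * len(keys)        # |g| = cm vertices, c > 2
        while True:                            # mapping step (cf. Fig. 2)
            s1, s2 = random.random(), random.random()
            edges = [(hash((s1, w)) % n, hash((s2, w)) % n) for w in keys]
            if is_forest(n, edges):
                break
        adj = defaultdict(list)
        for j, (u, v) in enumerate(edges):     # h(e) = j: order preserving
            adj[u].append((v, j))
            adj[v].append((u, j))
        g = {}                                 # searching step (cf. Fig. 1)
        for root in adj:
            if root not in g:
                g[root] = 0
                stack = [root]
                while stack:
                    u = stack.pop()
                    for v, j in adj[u]:
                        if v not in g:
                            g[v] = (j - g[u]) % m
                            stack.append(v)
        return lambda w: (g[hash((s1, w)) % n] + g[hash((s2, w)) % n]) % m

    words = ["ada", "c", "lisp", "occam", "pascal"]
    h = build_omphf(words)
    print([h(w) for w in words])               # prints [0, 1, 2, 3, 4]

Because the edge labels $h(e)$ are assigned in key order, the resulting function is order preserving, and with $c = 3$ the mapping loop runs about $\sqrt{3} \approx 1.7$ times on average, in line with the iter column of Table 1.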

4 The deterministic algorithm

For the case $|g| = 2m$ we propose the three-step deterministic MOS algorithm [11]. In the mapping step, the keys are transformed into distinct triples of integers $h_0(w)$, $h_1(w)$ and $h_2(w)$ by applying formulas (2). This transformation takes $O(m)$ time if the size of the alphabet $\Sigma$ and the maximum key length are fixed. The values $h_1(w)$ and $h_2(w)$ define the bipartite dependency graph $G$ in which each key is associated with the edge $e = (h_1(w), h_2(w))$. The dependency graph, consisting of $|g| = 2r = 2m$ vertices and $m$ edges, reflects constraints among keys.

Observe that allocating a place in the hash table for key $w$ requires selecting the value $U(w) = g(h_1(w)) + g(h_2(w))$. There may exist a sequence of keys $\{w_0, w_1, \ldots, w_{t-1}\}$ such that $h_1(w_i) = h_1(w_{i+1})$ and $h_2(w_{i+1}) = h_2(w_{(i+2) \bmod t})$, for $i = 0, 2, 4, \ldots, t-2$. Once keys $w_0, w_1, \ldots, w_{t-2}$ are allocated some places in the hash table, both $g(h_1(w_{t-1}))$ and $g(h_2(w_{t-1}))$ are set (note that $h_1(w_{t-2}) = h_1(w_{t-1})$ and $h_2(w_{t-1}) = h_2(w_0)$). Hence, key $w_{t-1}$ cannot be allocated an arbitrary place, but must be placed in the hash table at location

$$h(w_{t-1}) = (h_0(w_{t-1}) + U(w_{t-1})) \bmod m.$$

In our sequence, the keys $w_0, w_1, \ldots, w_{t-2}$ are independent, i.e. they have a choice of a place in the hash table, whereas the key $w_{t-1}$ is dependent, i.e. it has no such choice. We shall call these keys canonical and noncanonical, respectively. It is easy to see that

$$U(w_{t-1}) = g(h_1(w_{t-1})) + g(h_2(w_{t-1})) = \sum_{p \in path(w_{t-1})} (-1)^p U(w_p),$$

where $path(w_{t-1})$ is the sequence of keys $\{w_0, w_1, \ldots, w_{t-2}\}$, and thus

$$h(w_{t-1}) = \left(h_0(w_{t-1}) + \sum_{p \in path(w_{t-1})} (-1)^p U(w_p)\right) \bmod m.$$
If the place $h(w_{t-1})$ is occupied, a collision arises and no minimal perfect hash function for the selected values of $g$ can be found. In such a case, the searching step executes a backtrack and tries to find different values of $g$ that do not lead to a collision. This dependency of keys is reflected in the dependency graph by cycles; each cycle corresponds to a sequence of keys similar to the one described. The goal of the ordering step is to find such an order of keys, given by the tower, that keys without a choice (i.e. dependent keys) are processed by the searching step as soon as possible. Such an order is usually found heuristically. In [31] Sager gave a heuristic called mincycle, which we describe shortly. Let $W_i$, $i \ge 1$, be the $i$-th set in the tower. Then $W_i = W_{i-1} \cup X_i$, where $X_i$ is a group of keys selected as follows. Choose an edge (possibly a multiple edge⁴) lying on a maximal number of minimal length cycles in the dependency graph $G$. Let $X_i$ be all keys associated with the chosen edge. Remove the edge from $G$ and merge its endpoints. Repeat this procedure until all edges are removed from $G$. The heuristic which gave the name to the whole of Sager's algorithm [31] tries to ensure that each time an edge is selected, it is done in such a way that the maximum number of dependent keys is placed in the tower.

Our heuristic for building the tower is much simpler than Sager's. We consider only the keys which correspond to cycles and multiple edges in the dependency graph. All other keys of $W$ are free, i.e. arbitrary hash values can be assigned to them. We denote the set of free keys as $W_a$. Consequently, after generating the dependency graph $G$, the subgraph $G_1 = (R_1, E_1)$ that contains only the multiple edges and the edges lying on cycles is constructed. For this purpose the biconnected components of $G$ are found. The edges of the biconnected components of size greater than 1 are placed into $E_1$. Once $G_1$ is constructed, we first put the multiple edges into the tower, beginning with the edges of highest multiplicity. These edges are then deleted from $G_1$, and a set of fundamental cycles for the resultant graph is found by using breadth-first search. The fundamental cycles are considered in order of increasing length. The edges of the shortest cycle are put into the tower one at a time. Each edge is then deleted from all the cycles it lies on, and the next shortest cycle is considered. While building the tower we maintain a spanning forest that consists of the edges corresponding to the canonical keys in the tower. The forest is used to compute the paths for noncanonical keys.

To determine the time complexity of the ordering step we need the following lemmas (proved in [11]) estimating the expected total length of cycles, the expected number of cycles, and the expected number of multiple edges in a random bipartite graph with $2r = 2m$ vertices and $m$ edges.

Lemma 1 Let $C_{2d}$ denote the number of cycles of length $2d$. Then the expected total length of cycles in the graph is $\sum_{d=1}^{m/2} 2d\,E(C_{2d}) = O(\sqrt{m})$.

Lemma 2 The expected number of cycles in the graph is $\sum_{d=1}^{m/2} E(C_{2d}) = O(\ln m)$.

Lemma 3 Let $e_j$ denote the number of multiple edges of multiplicity $j$ in the graph. Then $\lim_{m \to \infty} \sum_{j=3}^{m} E(e_j) = 0$.

⁴A multiple edge represents the keys $w$ with identical pairs $(h_1(w), h_2(w))$.

Using the above lemmas we can prove:

Lemma 4 The time complexity of the ordering step is $O(m)$.

Proof The ordering step comprises finding the fundamental cycles and building the tower. By Lemma 3 we can omit the multiple edges in our analysis. The cost of finding the fundamental cycles of $G_1$ is proportional to the total length of these cycles. By Lemma 1 this length cannot exceed $O(\sqrt{m})$. While placing the edges of the fundamental cycles in the tower, we maintain a heap of cycles. The following operations are executed: (i) selecting the shortest cycle in the heap; (ii) finding a path in the spanning forest between the end vertices of a noncanonical edge, and making the path list for it; (iii) restoring the heap. Let $\mu$ denote the expected number of fundamental cycles. Operation (i) takes time $O(\mu \log \mu)$, which by Lemma 2 is $O(\ln m \log(\ln m))$. Finding the paths for all noncanonical edges in operation (ii) requires at most $O(m_1 \mu) = O(\sqrt{m} \ln m)$ time, whereas making the path lists is done in $O(m_1) = O(\sqrt{m})$ time ($m_1$ denotes the number of edges lying on the fundamental cycles). The cost of operation (iii) is at most $O(m_1 \log \mu) = O(\sqrt{m} \log(\ln m))$. All these costs imply that the time complexity of the ordering step is less than $O(m)$. □

In the searching step, the following combinatorial problem is solved: find $U(w_i) \in [0, m-1]$, $i = 1, 2, \ldots, k$, where $k$ is the height of the tower, such that the values $h(w_i) = (h_0(w_i) + U(w_i)) \bmod m$, for canonical keys $w_i \in Y_k$, and $h(w_j) = (h_0(w_j) + \sum_{p \in path(w_j)} (-1)^p U(w_p)) \bmod m$, for noncanonical keys $w_j \in W - W_a - Y_k$, are all distinct, i.e. for any $w_1, w_2 \in W - W_a$, $h(w_1) \ne h(w_2)$. The $U(w_i)$s (or $U$-values) are found during the exhaustive search at every level $X_i$ of the tower. The search starts with $U(w_i) = 0$ for each canonical key $w_i$, i.e. an attempt is made to locate it at position $h_0(w_i)$ in the hash table. Note that since the values $h_0$ are random, we begin by trying to locate all the canonical keys of the tower in a random probe of places. Once the hash value for the canonical key $w_i$ on a given level of the tower is found, the value of $U(w_i)$ is known. It enables us to compute the hash values for the noncanonical keys at that level. Following [14], we shall call the set of the hash values of keys at a given level $X_i$ a pattern of size $s = |X_i|$. Clearly, if all the places defined by the pattern are unoccupied, the task on the given level is done and the next level can be processed. Otherwise, the pattern is moved up the table modulo $m$ until a place where it fits is found. Except for the first level of the tower, this search is conducted in a hash table that is partially filled. Thus, it may happen that no place for the pattern is found. In such a case the searching step backtracks to earlier levels, assigns different hash values to keys on these levels, and then again recomputes the hash values for successive levels (Fig. 3). Having the $U(w_i)$s, the values of the table $g$ can be computed by making use of the $O(m)$ procedure FINDg presented in [31].

    1   i := 1;
    2   while i in [1, k] do
    3       Place the canonical key w_i of X_i in the next table location (beginning at position h_0(w_i)), and compute U(w_i) = (h(w_i) - h_0(w_i)) mod m;
    4       if w_i not located then
    5           i := i - 1;                       -- backtracking
    6       else
    7           for each noncanonical key w_j in X_i do
    8               h(w_j) := (h_0(w_j) + U(w_i)) mod m;
    9           end for;
    10          if all places h(w_i) and h(w_j) are unoccupied then
    11              i := i + 1;
    12          end if;
    13      end if;
    14  end while;

Figure 3: The searching step

The exhaustive search applied here has a potential worst-case time complexity exponential in the number of keys to be placed in the hash table. However, if this number is small compared to the table size, the search is carried out in a table that is mostly empty, and can be done faster (see e_src in Table 2).
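The pattern-moving search of Fig. 3 can be illustrated by the following Python sketch. The representation of a level as the list of base values of its keys, and the recursion in place of the explicit while loop, are simplifying assumptions of this sketch rather than details taken from the paper.

    def place_levels(levels, m):
        # Backtracking search in the spirit of Fig. 3. Each level is the
        # list of base values of its keys; a single U-value shifts the
        # whole pattern, so a key with base b lands in slot (b + U) mod m.
        table = [False] * m                 # occupied slots
        u_values = []

        def fits(level, u):
            slots = [(b + u) % m for b in level]
            return (len(set(slots)) == len(slots)
                    and not any(table[s] for s in slots))

        def search(i):
            if i == len(levels):
                return True                 # whole tower placed
            for u in range(m):              # move the pattern up the table
                if fits(levels[i], u):
                    for b in levels[i]:
                        table[(b + u) % m] = True
                    u_values.append(u)
                    if search(i + 1):
                        return True
                    for b in levels[i]:     # backtrack: free the slots
                        table[(b + u) % m] = False
                    u_values.pop()
            return False                    # give up, move the previous level

        return u_values if search(0) else None

    # Hypothetical tower of three levels in a table of size m = 6.
    print(place_levels([[0, 3], [1], [2, 4]], 6))   # e.g. [0, 0, 0]

When a level cannot be placed for any $U$-value, the recursion unwinds, the previous level's pattern is moved to its next position, and the deeper levels are recomputed, which is exactly the backtracking behaviour of line 5 of Fig. 3.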

Lemma 5 The time complexity of the searching step is $O(\sqrt{m})$.

Proof The keys to be placed in the hash table during the search correspond to multiple edges and the edges of fundamental cycles in graph $G_1$. By Lemmas 3 and 1, as $m \to \infty$ the number of multiple edges goes to 0, and the number of edges on fundamental cycles is $m_1 = O(\sqrt{m})$. Among the latter, $m_1 - \mu$ edges are canonical. They constitute patterns of size 1 and are placed independently of each other at the positions $h_0(w)$. The other $\mu$ edges are noncanonical (dependent) and form patterns of size $s > 1$. Assuming that $i$ random elements of the hash table are occupied, the probability of successfully placing in one probe a pattern of size $s > 1$ is:

$$p_s = \frac{\binom{m-i}{s}\, s!}{m^s} = \frac{(m-i)(m-i-1)\cdots(m-i-s+1)}{m^s}.$$

Since $i < m_1$, $s \le \mu$, $m_1 \approx \sqrt{m}$, and $\mu \approx \ln m$, we have $\lim_{m \to \infty} p_s = 1$. Thus, one can neglect the existence of noncanonical edges. Treating these edges as canonical, the search can be approximated by the task of placing $m_1$ keys (edges) in a hash table of size $m$, using $h_0(w)$ as the primary hash function and linear probing to resolve collisions. The expected number of probes to place a single key in a table containing $i$ keys is [23]:

$$C_i' \approx \frac{1}{2}\left[1 + \left(\frac{1}{1 - \alpha_i}\right)^2\right],$$

where $\alpha_i = i/m$ is the load factor of the hash table, which varies between 0 and $(m_1 - 1)/m = O(1/\sqrt{m})$. The total number of probes to place all $m_1$ keys is then

$$\sum_{i=0}^{m_1 - 1} C_i' \approx \frac{m_1}{2}\left[1 + \left(\frac{1}{1 - 1/\sqrt{m}}\right)^2\right] \approx m_1 = O(\sqrt{m}).$$

Thus, as $m \to \infty$ each key is placed in the hash table in constant time, and the search is done in time $O(\sqrt{m})$. □

Theorem 2 Given $|g| = 2m$, the time complexity of the deterministic algorithm is $O(m)$.

Proof The theorem follows from Lemmas 4 and 5. □
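Before turning to the parallel algorithm, the ordering step described above can also be made concrete. The Python sketch below classifies the keys (edges) of the dependency graph using a BFS spanning forest: duplicated $(h_1, h_2)$ pairs are multiple edges, each non-tree edge closes one fundamental cycle, and the rest are tree edges. This is a simplification assumed for the sketch: the full algorithm uses biconnected components to also mark the tree edges lying on cycles, and orders the fundamental cycles by increasing length.

    from collections import defaultdict, deque

    def classify_edges(num_vertices, edges):
        # Returns (tree, multiple, closing) as sets of edge indices;
        # 'closing' edges correspond to the noncanonical keys, one per
        # fundamental cycle of the spanning forest.
        multiple, seen = set(), set()
        adj = defaultdict(list)
        for idx, (u, v) in enumerate(edges):
            pair = (min(u, v), max(u, v))
            if pair in seen:
                multiple.add(idx)      # same (h1, h2) pair as an earlier key
            else:
                seen.add(pair)
                adj[u].append((v, idx))
                adj[v].append((u, idx))
        closing, visited, via = set(), set(), {}
        for root in range(num_vertices):
            if root in visited or root not in adj:
                continue
            visited.add(root)
            queue = deque([root])
            while queue:
                u = queue.popleft()
                for v, idx in adj[u]:
                    if idx == via.get(u):
                        continue               # the tree edge we arrived by
                    if v in visited:
                        closing.add(idx)       # non-tree edge: closes a cycle
                    else:
                        visited.add(v)
                        via[v] = idx
                        queue.append(v)
        tree = set(range(len(edges))) - multiple - closing
        return tree, multiple, closing

    # Hypothetical dependency graph: a triangle, a pendant edge and a
    # double edge on vertices 0..5.
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (4, 5), (4, 5)]
    print(classify_edges(6, edges))
    # -> one triangle edge is 'closing' (a noncanonical key), edge 5 is a
    #    multiple edge, and the remaining edges are tree edges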

5 The parallel algorithm

When the inequality $|g| < 2m$ holds, a space efficient MPHF is desired. In this case the third step of the MOS algorithm executes an exhaustive search of exponential running time (see Table 3). We propose a parallel algorithm for speeding up such a search to find a first solution to the problem [1]. A message-based distributed-memory computer is assumed as the model of parallel computation.

Denote for simplicity $U(w_i)$ as $U_i$. The problem we wish to solve can be expressed as follows: find a sequence of values $(U_1, U_2, \ldots, U_k)$ which satisfies property $P_k(U_1, U_2, \ldots, U_k)$. The property $P_k$ is that the values $h(w_i) = (h_0(w_i) + U(w_i)) \bmod m$, for canonical keys $w_i \in Y_k$, and $h(w_j) = (h_0(w_j) + \sum_{p \in path(w_j)} (-1)^p U(w_p)) \bmod m$, for noncanonical keys $w_j \in W - W_a - Y_k$, are all distinct. We find the solutions by making use of the exhaustive search which systematically enumerates all solutions $(U_1, \ldots, U_k)$ to the original problem by considering all partial solutions $(U_1, \ldots, U_i)$ that satisfy $P_i$. During the search we attempt to expand a partial solution $(U_1, \ldots, U_{i-1})$ into $(U_1, \ldots, U_{i-1}, U_i)$. If $(U_1, \ldots, U_{i-1}, U_i)$ does not satisfy $P_i$, then by induction no extended sequence $(U_1, \ldots, U_{i-1}, U_i, \ldots, U_k)$ can satisfy the original property $P_k$. In such a case we backtrack to the partial solution $(U_1, \ldots, U_{i-1})$ and then try to expand it with the next value of $U_i$ (see Fig. 3).

The exhaustive search can be viewed as a search of a state space that consists of all sequences $(U_1, \ldots, U_i)$ satisfying property $P_i$, for $i = 1, 2, \ldots, k$. This space is structured as a graph which is usually a tree. We shall call this tree a search tree, or simply a tree. Except for the root, each node of the tree corresponds to a single value $U_l$; we assume that $U_l$ is the label of the node. The path from the root to a node defines a partial solution $(U_1, \ldots, U_i)$. When $i = k$, such a path defines an element of the solution space. We are interested in finding any one element of this space. Given node $U_{i-1}$, its successors are generated by computing the set of all $U_i$ such that $P_i(U_1, \ldots, U_i)$ is satisfied. The pair $(U_{i-1}, U_i)$ defines an arc in the tree. We shall call a node that has been generated and whose successors have not yet been generated an active node.

The simplest approach to parallelization of the exhaustive search is to compute at any node the set of possible successor nodes, and then to search the subtrees beneath each node in parallel using different processors. However, if we search only for a single solution, which is our main interest, and it is found in the first subtree, the work done in the other subtrees is wasted. As a consequence, such an organization of a parallel search can give a sublinear speedup (speedup less than $p$, where $p$ is the number of processors) or even a decrease in speedup with an increase in the number of processors (deceleration anomaly). It is also possible to obtain a superlinear speedup (speedup greater than $p$), when for example a solution is found relatively quickly in the $p$-th subtree. These anomalous results in the context of parallel branch-and-bound algorithms were first observed and discussed by Lai and Sahni [25]. They were also reported in [22, 34]. Our goal is to devise a parallel search algorithm with a consistent linear speedup in the number of processors.

The sequential search shown in Fig. 3 can be efficiently implemented as a depth-first search of the tree using a LIFO stack of active nodes. A set of all possible values $U_i$ can be generated in line 3. These values are then pushed onto the stack. When a sequence $(U_1, \ldots, U_{i-1})$ is to be expanded, the top element of the stack is used. This implementation can be parallelized, and two variants of the parallel stack-based depth-first search can be distinguished: the multiple and shared stack schemes [30, 24, 22]. Unfortunately, the amount of memory required to carry out the search in either of these schemes can be as large as $O(pbd)$, where $b$ is the average branching factor, defined as the number of new tree nodes that can be generated from a given node, and $d$ is the depth of the tree. This memory requirement is usually prohibitive in practice. Both schemes may also lead to the speedup anomalies mentioned earlier.

Our parallel search algorithm uses the technique of the processor farm [12]. The basic idea of farming consists of having a central controller, the Master process, that hands out pieces of work to be processed by the members of a pool of Worker processes. The Master stores active nodes of the search tree that represent partial solutions to the MPHF problem. The active nodes are sent to the Workers, which do the search by placing successive levels of the tower in the hash table and reporting the results to the Master. We introduce a minimum number of levels, $s$, which a Worker must place before communicating with the Master. This parameter can be viewed as the grain size, or granularity, of the search work.

In order to decrease the memory requirement (see Theorem 3) and to achieve a linear speedup of the parallel algorithm (see Fig. 4) we use the concepts of a delayed-release technique and a reversed trie. The idea behind the delayed-release technique [22] is that when a node is expanded, its descendants are not immediately available to other processors. Except for the leftmost descendant, which is expanded next, the remaining descendants are put into a list. This is applied to every node until a leaf node is reached. Then all the nodes in the list are released as active nodes and may be picked up by other processors. This strategy reduces the immediate availability for expansion of untried alternatives (nodes) at shallower levels in the search space. During the search the intermediate levels of the tree are skipped and the tree is explored from its bottom, which minimizes wasted work.

The partial solutions are kept in a specially devised data structure that we call a reversed trie (r-trie) [1]. The name comes from its resemblance to the trie structure [15]. An r-trie is essentially a $b$-ary tree whose nodes correspond to the $U$-values. The nodes on level $l$ represent the set of partial solutions that begin with a certain sequence of $U$-values. In a single node of the r-trie we hold: a level number, a $U$-value for that level, a flag indicating whether the node is assigned to a Worker or not, and a pointer to the parent node. A partial (or complete) solution is restored by traversing the r-trie from a node up to its root. The active nodes in the r-trie are arranged into a doubly-linked list named ActiveNodes. This list is used for selecting nodes and handing them out to Workers. The advantage of the r-trie structure is that we do not need to compute node priorities, as in [22]. Also, no priority queue of nodes has to be maintained. To find a node to be assigned to a Worker we scan the ActiveNodes list, inspecting at most $p - 1$ of its elements.

In our parallel search algorithm a Worker process receives two types of messages from the Master:

New. This message contains a new partial solution for carrying on the search. It consists of a level number where the new partial solution begins, a number of levels in the solution and the $U$-values, one value for each level.

Continue. This message has no components. Upon receiving the Continue message a Worker continues its search with the currently available partial solution.

The Master holds partial solutions and assigns work to Workers. It stores pointers to the last nodes processed by each Worker and keeps track of idle Workers. When active nodes become available, they are sent to idle Workers. The Master, in turn, receives three types of messages from the Workers:

PartialSuccess. This message is sent when a Worker has placed some levels successfully and then encountered a level which cannot be placed. It contains the number of levels placed along with the corresponding $U$-values.

DeepBack. This message is sent when a Worker has executed a backtrack pruning [1]. It contains the level number to which the Worker backtracked.

TotalSuccess. This message is sent when a Worker has placed the last level of the tower successfully. It contains the $U$-values for the levels placed.
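A minimal sketch of the Master-side r-trie bookkeeping is given below. The field names and the use of a plain Python list in place of the doubly-linked ActiveNodes list are assumptions of this sketch; the stored fields, however, are exactly those listed above.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class RTrieNode:
        # One r-trie node: level number, U-value for that level, a flag
        # telling whether the node is assigned to a Worker, and a pointer
        # to the parent node.
        level: int
        u_value: int
        assigned: bool = False
        parent: Optional["RTrieNode"] = None

    def restore_solution(node: RTrieNode) -> List[int]:
        # A partial (or complete) solution is restored by traversing the
        # r-trie from the node up to the root.
        values = []
        while node is not None:
            values.append(node.u_value)
            node = node.parent
        return values[::-1]            # (U_1, ..., U_i)

    def pick_node(active_nodes: List[RTrieNode], p: int) -> Optional[RTrieNode]:
        # Hand a node to an idle Worker: scan ActiveNodes, inspecting at
        # most p - 1 of its elements.
        for node in active_nodes[:p - 1]:
            if not node.assigned:
                node.assigned = True
                return node
        return None

    root = RTrieNode(level=1, u_value=4)           # hypothetical U-values
    child = RTrieNode(level=2, u_value=7, parent=root)
    print(restore_solution(child))                 # [4, 7]
    print(pick_node([root, child], p=4).level)     # 1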

Lemma 6 In the worst case the number of nodes in the r-trie is $O(d^2)$, where $d$ is the depth of the search tree.

Proof See [1]. □

Theorem 3 The worst-case memory complexity of the r-trie based parallel algorithm is $O(d^2 + pm)$, where $p$ is the number of processors, and $m$ is the number of keys in the input set $W$.

Proof The Master stores the r-trie and the table of pointers to the nodes processed by the Workers. By Lemma 6 this requires $c_1 d^2 + c_2(p-1)$ memory cells. Each Worker stores a hash table of size $m$, which requires $c_3 m$ cells. Thus, the memory usage of the algorithm is $c_1 d^2 + c_2(p-1) + c_3(p-1)m = O(d^2 + pm)$, for some constants $c_1$, $c_2$ and $c_3$. □

6 Conclusions

We discussed the problem of generating MPHFs for keys which are finite strings of characters. In our approach the input keys are mapped into pseudorandom triples of integers. Based on these triples the unique hash values for the keys are computed. It can be proved that the probability of distinctness of the triples, which is a necessary condition for the effectiveness of the method, goes rapidly to 1 as the size of the set of keys increases.

We proposed randomized, deterministic and parallel algorithms for solving the problem. The evaluation time of the generated MPHF is proportional to the size of the key, i.e. it is $O(|w|)$. Both the randomized and the deterministic algorithms compute the MPHF in optimal time $O(m)$. However, the representation space of the generated function, equal to $|g| \log m + |\Sigma| \cdot |w|_{\max} \log m = O(m \log m + |w|_{\max} \log m)$ bits, is relatively large compared to the lower bound $\Omega(m + \log\log M)$ bits [28, 29]. The parallel algorithm computes a more space efficient MPHF; theoretically it can construct a function with representation space linear in $m$ by keeping $|g| = m/\log m$. The experiments showed that the algorithm exhibits a consistent and almost linear speedup in the number of processors on a message-based distributed-memory computer. However, the time complexity of the parallel algorithm is exponential in $m$, which precludes its application to large sets of keys. Thus, we conclude that there is still room for improvement.

Acknowledgments

I wish to thank Marek Konopka for his useful comments on an earlier draft of this paper. Thanks also to Bożena Bartoszek for implementing the parallel algorithm and conducting the experiments. I am grateful to Bohdan S. Majewski and George Havas for their collaboration in designing the randomized and deterministic algorithms. I thank the reviewer for the comments which helped to improve the notation used in this paper.

Appendix

This Appendix contains the results of timing experiments conducted on the implementations of the algorithms. All times are given in seconds.

         m    iter     map     src   total
       512   1.704   0.037   0.010   0.047
      1024   1.684   0.052   0.019   0.072
      2048   1.776   0.095   0.037   0.132
      4096   1.676   0.169   0.067   0.236
      8192   1.668   0.320   0.142   0.463
     16384   1.680   0.628   0.293   0.921
     24692   1.688   0.950   0.444   1.394
     32768   1.636   1.353   0.597   1.949
     65536   1.696   2.718   1.198   3.916
    131072   1.676   5.448   2.416   7.864
    262144   1.768  11.273   4.813  16.087
    524288   1.736  22.493  10.414  32.907

Table 1: Running times of the randomized algorithm on a Sun SPARC station 2 with the SunOS operating system, $|g| = 3m$ (iter: average number of iterations executed in the mapping step; map: mapping; src: searching)

         m     map     ord   e_src     src   total
      1000   0.058   0.238   0.000   0.023   0.318
      2000   0.098   0.282   0.000   0.047   0.427
      3000   0.142   0.322   0.000   0.070   0.534
      4000   0.183   0.366   0.000   0.091   0.641
      5000   0.224   0.409   0.000   0.115   0.747
      6000   0.264   0.452   0.000   0.140   0.855
      7000   0.305   0.495   0.000   0.160   0.960
      8000   0.347   0.534   0.000   0.186   1.067
      9000   0.392   0.577   0.000   0.209   1.178
     10000   0.439   0.619   0.000   0.232   1.290
     12000   0.539   0.722   0.000   0.288   1.549
     16000   0.697   0.867   0.000   0.374   1.938
     20000   0.870   1.027   0.000   0.470   2.367
     24000   1.043   1.187   0.000   0.567   2.797
     28000   1.365   1.374   0.001   0.666   3.405
     32000   1.556   1.544   0.001   0.766   3.866
     36000   1.748   1.696   0.001   0.866   4.311
     40000   1.951   1.866   0.001   0.966   4.783
     45000   2.207   2.091   0.000   1.089   5.387
     50000   2.437   2.260   0.001   1.200   5.897
    100000   4.486   4.018   0.000   2.405  10.909

Table 2: Running times of the three-step deterministic MOS algorithm on a Sun SPARC station 2 with the SunOS operating system, $|g| = 2m$ (map: mapping; ord: ordering; e_src: exhaustive search; src: whole searching step)

      m   c = 0.45   c = 0.46   c = 0.47   c = 0.48   c = 0.49   c = 0.5
     50    227.658    195.883    112.862     84.468     58.293    21.217
     60    382.592    260.768     80.682     73.845     44.657    10.769
     70    385.010    250.005    144.504     66.800     57.800    13.147
     80    318.291    200.772    205.961     45.931     29.858    17.782
     90    402.634    322.956    181.241    123.641     55.215    14.851
    100    865.847    445.430    244.551    115.510     57.919    44.709

Table 3: Sequential search times on a single T800 transputer, $|g| = cm$ (occam implementation)

[Figure 4 appears here: a plot of speedup (1 to 9) against the number of processors (2 to 10), with one curve per grain size s = 1, ..., 5.]

Figure 4: Speedup versus number of processors measured for the occam implementation on a T800 Meiko Computing Surface ($s$ denotes the grain size of the search work)

References

[1] Bartoszek, B., Czech, Z.J., and Konopka, M., Parallel searching for a first solution, University of Kent at Canterbury, TR 8-93, September 1993.

[2] Bollobás, B., Random Graphs, Academic Press, New York, 1985.

[3] Brian, M.D., and Tharp, A.L., Near-perfect hashing of large word sets, Software: Practice and Experience 19, (1990), 967-978.

[4] Cercone, J., Boates, J., and Krause, M., An interactive system for finding perfect hash functions, IEEE Software 2, (1985), 38-53.

[5] Chang, C.C., The study of an ordered minimal perfect hashing scheme, Comm. ACM 27, 4, (April 1984), 384-387.

[6] Chang, C.C., and Lee, R.C.T., A letter-oriented minimal perfect hashing scheme, The Computer Journal 2, (1986), 277-281.

[7] Cichelli, R.J., Minimal perfect hash functions made simple, Comm. ACM 23, 1, (January 1980), 17-19.

[8] Cook, C.R., and Oldehoeft, R.R., A letter oriented minimal perfect hashing function, SIGPLAN Notices 17, 9, (September 1982), 18-27.

[9] Czech, Z.J., Havas, G., and Majewski, B.S., An optimal algorithm for generating minimal perfect hash functions, DIMACS at Rutgers University, TR 92-24, Piscataway, NJ, May 1992.

[10] Czech, Z.J., Havas, G., and Majewski, B.S., An optimal algorithm for generating minimal perfect hash functions, Information Processing Letters 43, (5 October 1992), 257-264.

[11] Czech, Z.J., and Majewski, B.S., A linear time algorithm for finding minimal perfect hash functions, The Computer Journal 36, 6 (1993), 579-587.

[12] Day, W., Farming: towards a rigorous definition and efficient transputer implementation, in Transputer Systems - Ongoing Research, 49-62, Allen, A., Ed., Proc. 15th World Occam and Transputer User Group Technical Meeting, Aberdeen, Scotland, 1992.

[13] Erdős, P., and Rényi, A., On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci. 5 (1960), 17-61; reprinted in: Spencer, J.H., Ed., The Art of Counting: Selected Writings, Mathematicians of Our Time, MIT Press, Cambridge, MA, 1973, 574-617.

[14] Fox, E.A., Heath, L.S., Chen, Q., and Daoud, A.M., Practical minimal perfect hash functions for large databases, Comm. ACM 35, 1 (January 1992), 105-121.

[15] Fredkin, E., TRIE memory, Comm. ACM 3, 9 (September 1960), 490-499.

[16] Fredman, M.L., Komlós, J., and Szemerédi, E., Storing a sparse table with O(1) worst-case access time, J. ACM 31, 3 (1984), 538-544.

[17] Gonnet, G.H., and Baeza-Yates, R., Handbook of Algorithms and Data Structures, Addison-Wesley, 1991.

[18] Gori, M., and Soda, G., An algebraic approach to Cichelli's perfect hashing, BIT 29, (1989), 2-13.

[19] Haggard, G., and Karplus, K., Finding minimal perfect hash functions, ACM SIGCSE Bull. 18, (1986), 191-193.

[20] Havas, G., Majewski, B.S., Wormald, N.C., and Czech, Z.J., Graphs, hypergraphs and hashing, Proc. 19th International Workshop, WG'93, Utrecht, The Netherlands, June 1993, 153-165 (Lecture Notes in Computer Science 790, Springer-Verlag).

[21] Jaeschke, G., Reciprocal hashing: A method for generating minimal perfect hashing functions, Comm. ACM 24, 12, (December 1981), 829-833.

[22] Kale, L.V., and Saletore, A., Parallel state-space search for a first solution with consistent linear speedups, International Journal of Parallel Programming 19, 4 (1987), 251-293.

[23] Knuth, D.E., The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, Reading, MA, 1973.

[24] Kumar, V., and Rao, V.N., Parallel depth first search. Part II. Analysis, Intern. Journ. of Parallel Programm. 16, 6 (1987), 501-519.

[25] Lai, T.H., and Sahni, S., Anomalies in parallel branch-and-bound algorithms, Comm. ACM 27, 6 (June 1984), 594-602.

[26] Lewis, T.G., and Cook, C.R., Hashing for dynamic and static internal tables, IEEE Computer 21, (1988), 45-56.

[27] Majewski, B.S., Wormald, N.C., Czech, Z.J., and Havas, G., A family of generators of minimal perfect hash functions, DIMACS at Rutgers University, TR 92-16, Piscataway, NJ, April 1992.

[28] Mehlhorn, K., On the program size of perfect and universal hash functions, Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, Chicago, 1982, 170-175.

[29] Mehlhorn, K., Data Structures and Algorithms 1: Sorting and Searching, Springer-Verlag, Berlin, 1984.

[30] Rao, V.N., and Kumar, V., Parallel depth first search. Part I. Implementation, Intern. Journ. of Parallel Programm. 16, 6 (1987), 479-499.

[31] Sager, T.J., A polynomial time generator for minimal perfect hash functions, Comm. ACM 28, 5, (May 1985), 523-532.

[32] Schmidt, J.P., and Siegel, A., The spatial complexity of oblivious k-probe hash functions, SIAM Journ. on Computing 19, 5 (October 1990), 775-786.

[33] Sprugnoli, R., Perfect hashing functions: A single probe retrieving method for static sets, Comm. ACM 20, 11, (November 1977), 841-850.

[34] McKeown, G.P., Rayward-Smith, V.J., Rush, S.A., and Turpin, H.J., Using a transputer network to solve branch-and-bound problems, in Transputing '91, Welch, P., et al., Eds., IOS Press, 1991.

[35] Winters, V.G., Minimal perfect hashing in polynomial time, BIT 30, (1990), 235-244.
