Efficient In-network Computing with Noisy Wireless Channels

Chengzhi Li, Student Member, IEEE, and Huaiyu Dai, Senior Member, IEEE

Abstract—In this paper we study distributed function computation in a noisy multi-hop wireless network. We adopt the adversarial noise model, in which independent binary symmetric channels are assumed for all point-to-point transmissions, with (not necessarily identical) crossover probabilities bounded above by some constant ϵ. Each node takes an m-bit integer per instance, and the computation is activated after each node collects N readings. The goal is to compute a global function with a certain fault tolerance in this distributed setting; we mainly deal with divisible functions, which essentially cover the main body of interest for wireless applications. We focus on protocol designs that are efficient in terms of communication complexity. We first devise a general protocol for evaluating any divisible function, addressing both one-shot (N = O(1)) and block computation, and both constant and large m scenarios. We also analyze the bottleneck of this general protocol in different scenarios, which provides insights into designing more efficient protocols for specific functions. In particular, we endeavor to improve the design for two exemplary cases: the identity function and size-restricted type-threshold functions, both in the constant m and N scenario. We explicitly consider clustering, rather than hypothetical tessellation, in our protocol design.

Index Terms—Distributed Computing, Noisy Multi-hop Network, Clustering


1 INTRODUCTION

1.1 Motivation

Networked systems of intelligent devices are playing an increasingly important role in our lives. In particular, they will facilitate monitoring and control of the nation's critical infrastructures; seamless surveillance, intelligent transportation, and a secure Internet are a few such examples. Designing efficient protocols to facilitate information processing among distributed nodes is crucial to the success of these networked systems. While there has been extensive research in traditional distributed computing [1], [2], [5], the influence of channel noise is largely ignored. In previous studies targeting VLSI or wireline networks, noise-free communication can fairly be assumed; instead, some consideration was given to fault tolerance against collapsed or Byzantine nodes. However, as we deal with networked information processing in wireless networks, consideration of noisy channels becomes necessary. A protocol originally designed for noiseless channels usually fails under noisy communication due to message errors and discrepancies in individual interpretations of the communication history. Furthermore, the problems of interest in wireless applications are usually different; here we are interested in some real (possibly vector-valued) functions of the data at all nodes, typically with physical meanings. A good example is the summary or statistics of collected data in wireless sensor networks.

C. Li was with NC State University, Raleigh, NC. He is currently with Broadcom Corporation, Matawan, NJ, 07747. Email: [email protected]. H. Dai is with the Department of Electrical and Computer Engineering, NC State University, Raleigh, NC 27695. Email: [email protected].

Devising communication protocols for noisy channels imposes additional challenges. In some sense, it is the counterpart of Shannon's channel coding in the much more challenging network setting. In general, an increase in complexity is inevitable even if a constant error tolerance is allowed, and any protocol working in the noisy environment should make this penalty as small as possible. Meanwhile, it is also required that the protocols be oblivious, i.e., that the transmission schedule of nodes be pre-determined, independent of initial inputs and the communication history; this avoids transmission contention and out-of-order execution. Time and message complexity are two key measures of the efficiency of distributed computing protocols. In this work, we concentrate on the latter, as communication cost is typically a dominant factor in energy consumption and determines the lifetime of wireless networks. Also, we focus on the bit complexity, representing fundamental limits in the theoretical approaches. This naturally draws the connection with the theory of communication complexity [3]–[5]. Like computational complexity, communication complexity is an inherent property of a problem; it measures the hardness of a problem in terms of the communication (rather than the execution time) required for the most efficient solution. For a noiseless-channel protocol with communication complexity n, one could repeat the transmission of each bit O(log(n/Q)) times and take the majority of the received results; this approach, termed standard amplification in the literature, leads to a noisy-channel protocol of complexity O(n log(n/Q)) with error tolerance Q. Obtaining noisy-channel protocols with a smaller increase in complexity, especially those of complexity O(n), is highly non-trivial, and in some cases impossible. In this paper, we consider


efficient protocol designs for computing divisible functions in a noisy multi-hop network, which constitute the main body of interest for applications in wireless sensor networks. Our contributions are summarized as follows:

• We devise a general protocol for evaluating any divisible function in a noisy multi-hop wireless network. The complexity of the general protocol is analyzed with respect to various parameters such as the number of nodes in the network n, the length of data blocks at each node N, the size of each data point in bits m, and the cardinality of the function range rn. For some specific functions such as histogram and parity, our protocol achieves the best results available in the literature. The analysis of the bottleneck of this general protocol in different scenarios motivates further studies on more efficient protocols.

• We endeavor to improve the design for two special cases: the identity function, and a special class of restricted type-threshold functions introduced in Section 2. We reveal a tight bound on the communication complexity of the identity function and propose a more efficient protocol for the special class of type-threshold functions. We also believe that the methodologies developed in these two cases may find wider applicability.

• We incorporate clustering techniques into our protocol design to improve the protocols' efficiency; clustering is more practical and flexible than network tessellation, a theoretical approach widely adopted in the literature.
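The standard-amplification baseline mentioned in Section 1.1 can be sketched in a few lines. This is an illustrative simulation only; the function names and parameter values are ours, not part of any protocol in this paper.

```python
import math
import random

def bsc(bit, p):
    """Binary symmetric channel: flip the bit with probability p."""
    return bit ^ (random.random() < p)

def amplified_send(bits, eps, q):
    """Standard amplification: repeat each bit O(log(n/q)) times over a
    BSC with crossover probability eps, decode each bit by majority vote."""
    n = len(bits)
    reps = 2 * math.ceil(math.log(n / q)) + 1   # odd repetition count
    return [int(sum(bsc(b, eps) for _ in range(reps)) > reps / 2)
            for b in bits]

random.seed(0)
msg = [random.randint(0, 1) for _ in range(1000)]
out = amplified_send(msg, eps=0.2, q=0.01)
errors = sum(a != b for a, b in zip(msg, out))
```

With n = 1000 bits and Q = 0.01, each bit is repeated 25 times, and a majority-vote error requires at least 13 of 25 flips at crossover probability 0.2, which is exponentially unlikely.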

1.2 Related Works

Communication complexity of distributed computing with noisy channels was first considered by El Gamal in [7] in a broadcast network, where each of the n nodes, holding one binary input, can broadcast (and listen) to all others through independent binary symmetric channels. Gallager showed that complexity O(n log log n) is achievable for computing any function in a noisy broadcast network [8], which was further shown optimal for the identity function in [14]. In [15], Yao posed the question of whether there exist nontrivial Boolean functions that can be computed with O(n) broadcasts; this was answered in the affirmative for threshold functions in [16] under the independently and identically distributed (i.i.d.) random noise model, and in [9] for the OR function under the more realistic (and more general) adversarial model (where a "benign" adversary is allowed to arbitrarily reduce the error probability of each link at the beginning, or even dynamically cancel any errors on the fly, so as to prevent the protocols from exploiting the stochastic regularities of the previous i.i.d. model). While most work in this area focuses on computing Boolean functions of binary variables, our recent work [18] extends the study to finding the K largest integer values among nodes in a

TABLE 1: communication complexity in noisy broadcast networks (mN = O(1))

Function             Upper bound           Lower bound
OR                   O(n) [9]              O(n)
parity               O(n log log n) [8]    O(n)
identity             O(n log log n) [8]    O(n log log n) [14]
threshold function   O(n) [16]             O(n)

noisy broadcast network. The results in noisy broadcast networks are summarized in Table 1. Due to concerns about energy consumption and scalability, transmissions are typically carried out in a multi-hop fashion in wireless networks. [6], [24] and [10] focused on noiseless sensor networks, i.e., the communication channels are assumed reliable. In [6] the authors exploited block computation to study the communication complexity of evaluating symmetric functions in random networks with unbounded node degrees, while [24] investigated the computation problem in networks with finite degrees. In [10] the authors explored the minimal time cost and power consumption for the evaluation of the max function. Distributed computing with a reliability constraint in noisy multi-hop networks is arguably more challenging than in noiseless multi-hop networks and noisy broadcast networks. In particular, it takes more effort to combat the adverse effect of noise, as much more severe error propagation is expected in multi-hop transmission. So far there are only a few works in this area. [11] indicated that a symmetric function can be evaluated with complexity O(n log log n), which can be further improved to O(n) if block coding is adopted. Note that we address divisible functions in this work, including the histogram and the identity function. Therefore our protocol can compute any symmetric function (through evaluation of the histogram), and indeed any function (through evaluation of the identity function). An algorithm for the max function is proposed in [13], taking advantage of the "witness discovery" protocol in [9] and the coding strategy in [12], and shown order optimal in both the number of transmissions and computation time. Our scheme explores hierarchical cooperation to improve the efficiency of distributed computing in wireless networks.
Similar ideas were also explored in [30], [31] for the study of network capacity, where a network is divided into different layers and distributed MIMO is exploited for performance improvement. Besides the apparent difference in problem settings, our work emphasizes using as few bits as possible to complete computational tasks, while in [30], [31] the main goal is to increase the transmission rate. Nonetheless, advanced signal processing such as MIMO techniques may help design more efficient protocols for distributed computing and deserves further study.

The remainder of this paper is organized as follows. The system model is given in Section 2, together with some preliminaries and a summary of our main results. The outline of our protocol design and some properties of clustering are presented in Section 3. A general protocol for divisible functions is proposed in Section 4, with performance analysis and further discussion. Based on the insights obtained from Section 4, more efficient protocols are proposed in Sections 5 and 6 for the identity function and some restricted type-threshold functions, respectively. Conclusions and future directions are provided in Section 7.

2 PROBLEM FORMULATION AND MAIN RESULTS

In this section, we first discuss the models and assumptions considered in this work, then introduce some existing results to serve our analysis¹.

2.1 System Model

We consider a synchronized dense network model², where n nodes are uniformly and independently distributed in a unit square. In our following analysis we model the network as a geometric random graph [19] G(n, tn): the n nodes are vertices, and there is an undirected link between nodes i and j, i, j ∈ {1, . . . , n} ≜ [n], if ||Pi − Pj|| ≤ tn, where Pi is the position of node i, || · || is the Euclidean norm, and tn is the transmission range, identical for all nodes. Each node i holds an m-bit integer xi(t) at time t, taking values from some finite set χ ≜ [|χ|] (|χ| ≤ 2^m). Each m-bit integer could be a measurement in the field, e.g., temperature, or some metric assigned beforehand. The computation is performed after each node collects a block of N readings. The goal is to calculate a divisible function f(x(t)) ≜ f(x1(t), x2(t), ..., xn(t)) correctly with a certain fidelity in this distributed setting. Divisible functions are defined as follows [6]:

Divisible Function: A function f is divisible if
• rn = |R(f, n)| is nondecreasing in n, where R(f, n) is the function range;
• given any partition Π(S) = {S1, S2, ..., Sp} of S ⊂ [n], there exists a function g^Π(S) such that for any x ∈ χ^n

f(xS) = g^Π(S)(f(xS1), f(xS2), ..., f(xSp)),   (1)

where xS = {xi}, i ∈ S.
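As a concrete instance of (1) (an illustrative sketch of ours, not part of the paper's protocols), the histogram is divisible: the merge function g^Π(S) is simply element-wise addition of the partial histograms computed over the parts of the partition.

```python
def histogram(xs, alphabet_size):
    """tau(x): number of occurrences of each symbol 1..|chi| in xs."""
    tau = [0] * alphabet_size
    for x in xs:
        tau[x - 1] += 1
    return tau

def merge(partial_histograms):
    """g^Pi: element-wise sum of partial histograms over a partition."""
    return [sum(col) for col in zip(*partial_histograms)]

x = [1, 3, 2, 3, 1, 1]        # inputs held by nodes 1..6, alphabet {1, 2, 3}
S1, S2 = x[:3], x[3:]         # a partition of [6] into two parts
whole = histogram(x, 3)
merged = merge([histogram(S1, 3), histogram(S2, 3)])
```

Computing the histogram on the whole input or merging the two partial histograms gives the same result, exactly as the definition requires.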

Divisible functions essentially cover the main body of interest for distributed computing in wireless networks; the identity function, histogram³, parity, mean, and max/min are a few examples. Since divisible functions include the histogram and the identity function, a general protocol evaluating divisible functions can also compute all symmetric functions, i.e., functions invariant to permutations of their arguments and depending on the input only through its histogram, and in fact all functions (through the identity function). Without loss of generality, we assume that the result is made known to a special node, named the sink node. This is common for applications in sensor networks; when necessary, the result at the sink node can be distributed to the whole network, typically at a complexity similar to what is shown below. We assume identical transmission power P for each node. Each point-to-point transmission is disrupted by interference from concurrent transmissions and by thermal noise. To constrain the interference, we adopt the Protocol Model widely used in the literature [19]. Namely, node R(i) receives the (noisy) transmission from node i if
• the distance between the transmitter and the receiver is no more than tn, i.e., ||Pi − PR(i)|| ≤ tn;
• for every node k, k ≠ i, transmitting at the same time, ||Pk − PR(i)|| ≥ (1 + ∆)tn, where ∆ is the protocol-specified guard factor to limit interference.

To deal with noise we adopt the adversarial noise model [9], in which independent (but not necessarily identical) binary symmetric channels are assumed for all transmissions, with crossover probabilities bounded above by some constant ϵ. More precisely, for any bit vector I ∈ {0, 1}^k the received noisy copy is I′ = I XOR n̄, where n̄ ∈ {0, 1}^k is an independent noise bit vector with Pr(n̄i = 1) = pi ≤ ϵ. In other words, for any transmission the received bit is flipped with some probability no larger than ϵ. In this paper, it is assumed that ϵ < 1/4 for technical convenience.
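A minimal simulation of this channel model may help fix ideas (illustrative only; the particular per-bit probabilities below are arbitrary adversarial choices satisfying pi ≤ ϵ, with ϵ = 0.2 < 1/4):

```python
import random

EPS = 0.2  # the constant upper bound on crossover probabilities

def adversarial_bsc(bits, probs):
    """Return I' = I XOR n_bar, where noise bit i is 1 with probability
    probs[i] <= EPS, independently across bits."""
    assert all(0 <= p <= EPS for p in probs)
    noise = [int(random.random() < p) for p in probs]
    return [b ^ e for b, e in zip(bits, noise)]

random.seed(1)
I = [1, 0, 1, 1, 0]
p = [0.0, 0.1, 0.2, 0.05, 0.15]   # adversary may pick any p_i <= EPS
I_prime = adversarial_bsc(I, p)
```

Note that the adversary may set some pi to zero (a noiseless link), which is exactly what prevents protocols from relying on the stochastic regularity of an i.i.d. noise model.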

2.2 Preliminaries

In this subsection we introduce a few known results, beginning with a generalization of the Chernoff bound.

1. We use the following order notations throughout the paper. Given non-negative functions f(n) and g(n):
• f(n) = Ω(g(n)) if there exist a positive constant c1 and an integer k1 such that f(n) ≥ c1 g(n) for all n ≥ k1.
• f(n) = O(g(n)) if there exist a positive constant c2 and an integer k2 such that f(n) ≤ c2 g(n) for all n ≥ k2.
• f(n) = Θ(g(n)) if f(n) = Ω(g(n)) and f(n) = O(g(n)).
• f(n) = o(g(n)) if for every constant c3 > 0 there exists an integer k3 such that f(n) ≤ c3 g(n) for all n ≥ k3.
• f(n) = ω(g(n)) if for every constant c4 > 0 there exists an integer k4 such that f(n) ≥ c4 g(n) for all n ≥ k4.

2.2.1 Hoeffding Inequality [25]

Lemma 1: Let Xi (1 ≤ i ≤ n) be n i.i.d. random variables over the interval [a, b]. For the sum of these n variables, S = ∑_{i=1}^{n} Xi, we have the inequality

Pr(S − E(S) ≥ nδ) ≤ e^{−2nδ²/(b−a)²},   δ > 0.

2. Our results hold for the extended network model [32] as well, where the network space increases with the network size while the network density is fixed.

3. The histogram of a vector x ∈ χ^n is defined as τ(x) = [τ1(x), τ2(x), ..., τ|χ|(x)], where τi(x) := |{j : xj = i}|, the number of occurrences of i in x.
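A quick Monte Carlo sanity check of Lemma 1 (illustrative only; the uniform distribution and the parameter values are our choices):

```python
import math
import random

def hoeffding_bound(n, delta, a=0.0, b=1.0):
    """Upper bound on Pr(S - E[S] >= n*delta) for i.i.d. X_i in [a, b]."""
    return math.exp(-2 * n * delta ** 2 / (b - a) ** 2)

random.seed(0)
n, delta, trials = 100, 0.1, 20000
mean_S = n * 0.5                      # E[S] for X_i uniform on [0, 1]
exceed = 0
for _ in range(trials):
    S = sum(random.random() for _ in range(n))
    exceed += (S - mean_S >= n * delta)
empirical = exceed / trials           # empirical tail probability
```

Here the bound evaluates to e^{−2} ≈ 0.135, while the empirical tail probability is far smaller, as expected from a (non-tight) concentration bound.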


2.2.2 Constant-Rate, Constant-Fraction-Minimum-Distance Codes [26], [14]

Lemma 2: For γ ∈ (0, 1/2) there is an integer C1 = C1(γ) such that for each positive integer n and every C ≥ C1, there is a binary code Γn of size 2^n and length Cn such that for all v, w ∈ Γn with v ≠ w, the Hamming distance satisfies d(v, w) ≥ γCn.

Based on the above facts, the following result can be derived, which will be used extensively in our protocol design and analysis.

Corollary 1: With O(m) transmissions over a binary symmetric channel (with error probability bounded above by a constant ϵ), an m-bit integer can be correctly received with probability at least 1 − e^{−δm}, ∀δ > 0.

The proof is given in Appendix B.

Remark 1: Any positive value can be chosen for δ without affecting the scaling law. Also, m does not need to be large.

2.2.3 Tail Bounds for Sums of Noise Bits [14]

Lemma 3: Let n̄ = {n̄i} be a vector of k independent random variables, each taking value 1 with probability pi ≤ ϵ. For any nonzero vector α ∈ R^k and any t > 0 we have

Pr[ ∑_i αi(n̄i − ϵ) > t ] ≤ e^{−2t²/|α|²}.
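The flavor of Lemma 2 and Corollary 1 can be illustrated with a toy random code and minimum-distance decoding (our sketch; a random code of this size typically has large pairwise distance, though Lemma 2 guarantees it only for suitable parameters):

```python
import random

def random_code(m, C, rng):
    """A random binary code: 2^m codewords of length C*m. Lemma 2 asserts
    that codes with minimum distance >= gamma*C*m exist for C large enough."""
    return [[rng.randint(0, 1) for _ in range(C * m)] for _ in range(2 ** m)]

def nearest(word, code):
    """Minimum-distance decoding: index of the closest codeword."""
    return min(range(len(code)),
               key=lambda i: sum(a != b for a, b in zip(code[i], word)))

rng = random.Random(0)
m, C, eps = 6, 8, 0.02
code = random_code(m, C, rng)
successes = 0
for _ in range(10):
    msg = rng.randrange(2 ** m)
    noisy = [b ^ (rng.random() < eps) for b in code[msg]]   # BSC(eps)
    successes += (nearest(noisy, code) == msg)
```

Since the codeword length Cm is linear in m while decoding fails only when the noise bridges a constant fraction of the minimum distance, the decoding error probability decays exponentially in m, which is the content of Corollary 1.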

This lemma is a straightforward generalization of Lemma 12 in [14], where i.i.d. noise components are assumed.

2.3 Overview of Main Results

In this paper, we present three protocols: a general protocol for computing all divisible functions, and two specific efficient protocols, one for the identity function and one for a special class of type-threshold functions. The metric of interest is the communication complexity of the protocol, i.e., the total number of bits exchanged in the network to get the task completed. We study the scaling law of this metric with respect to various parameters including n, N, and m. Note that this metric is closely related to energy consumption; in particular, it coincides with the energy usage in [11] when multiplied by a factor tn^α, where α > 2 is the path loss exponent. We address both one-shot (N = O(1)) and block computation, and both constant and large m scenarios⁴. Our results are summarized below.

Theorem 1: The sink node can compute any divisible function correctly with probability at least 1 − o(1), at the cost of

O( (n/N) max(Nm, log log n) + (n/(N log n)) max(N log rn, log n) )

transmitted bits per instance.

Remark 2: For some functions the general protocol actually achieves the best results known to us in the literature, while for some others improvement can still be obtained in certain scenarios. Besides being interesting in its own right, this general protocol also provides insights into the design of more efficient protocols for specific functions. As two exemplary cases, we further propose more efficient protocols for the identity function and a special class of restricted type-threshold functions, both focused on the one-shot case with constant-size data. For the identity function, we have

Theorem 2: For constant m and N, the communication complexity of computing the identity function, f(x) = x, by the sink node correctly with probability at least 1 − o(1) is Θ(n√(n/log n)).

Remark 3: The identity function corresponds to the sink node collecting all the raw data held by the nodes, which enables the sink node to calculate any function. Therefore, the complexity of any function evaluated in a noisy multi-hop network is O(n√(n/log n)).

Our final interest is in type-threshold functions, defined as follows [6]:

Type-Threshold Function: A function f(·) is said to be type-threshold if there exists a nonnegative |χ|-dimensional vector θ, called the threshold vector, such that

f(x) = f′(τ(x)) = f′(min(τ(x), θ)),   (2)

for all x ∈ χ^n, where τ(x) is the histogram of x and min is understood as the element-wise minimum.

Remark 4: Type-threshold functions are a subset of symmetric functions⁵, whose values are solely determined by the histograms of the input data. For example, the max function is a type-threshold function with threshold vector θ = [1, 1, ..., 1]; we can find the maximum among the inputs by searching for the first non-zero position from the right in the vector min(τ(x), θ).
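The max-via-histogram construction of Remark 4 can be checked with a short sketch (ours, for illustration only):

```python
def histogram(x, alphabet_size):
    """tau(x): occurrence counts of symbols 1..|chi|."""
    tau = [0] * alphabet_size
    for v in x:
        tau[v - 1] += 1
    return tau

def type_threshold_max(x, alphabet_size):
    """max is type-threshold with theta = [1, ..., 1]: clip the histogram
    element-wise at 1, then take the last (rightmost) non-zero position."""
    theta = [1] * alphabet_size
    clipped = [min(t, th) for t, th in zip(histogram(x, alphabet_size), theta)]
    # the first non-zero position from the right is the maximum symbol
    return max(i + 1 for i, c in enumerate(clipped) if c)

result = type_threshold_max([2, 5, 3, 5, 1], 8)
```

The clipped vector min(τ(x), θ) records only which symbols occur, which is exactly the information the max function needs.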
This definition of the type-threshold function implies that the value of a type-threshold function can be determined by a subset xS of the total inputs, i.e., f(x) = f(xS). For example, for the max function, S could be any subset that contains the index i such that xi is the maximum of the input. Of course, this subset is typically input-dependent and not known a priori. However, it is possible to design efficient protocols that lead to the discovery of this subset. To this end, we further impose a restriction on the size of S, and define size-restricted type-threshold functions as follows:


Size-Restricted Type-Threshold Function: A type-threshold function f(·) is said to be K-restricted if there exists a subset S with its size bounded above by a known value K, such that f(x) = f(xS).

Remark 5: Note that only the size of the subset xS needs to be known, not its content. For example,

4. large m has relevance, e.g., when node identities are used as inputs, a common practice in the study of distributed computing.

5. Although type-threshold functions are not necessarily divisible, our general protocol still serves as a benchmark for them through evaluation of histogram.



TABLE 2: communication complexity without block coding (mN = O(1))

Function                                  Upper bound        Lower bound
histogram (Sec. 4.3)                      O(n log log n)     O(n)
parity (Sec. 4.3)                         O(n log log n)     O(n)
identity (Sec. 5)                         O(n√(n/log n))     O(n√(n/log n)) [10]
max (Sec. 6)                              O(n)               O(n)
K (= o(log n)) largest numbers (Sec. 6)   O(n log log K)     O(n)

TABLE 3: communication complexity with block coding (mN = Ω(log log n)) for all the divisible functions

Function                  Upper bound           Lower bound
rn = O(n^m) (Sec. 4.3)    O(nm)                 O(nm)
rn = Ω(n^m) (Sec. 4.3)    O(n log rn / log n)   O(nm)

|S| = 1 for the max function and the indicator function, and |S| = K for the function that computes the K largest values (which need not be distinct). However, the function that computes the Kth largest value (required to be strictly smaller than the (K − 1)th one) is not size-restricted, even though it is also a type-threshold function; the reason is that the size of its defining subset S cannot be pre-determined but depends on the input. For the size-restricted type-threshold function, we have

Theorem 3: A size-restricted type-threshold function with K = o(log n) can be computed by the sink node correctly with probability at least 1 − δ̄ for an arbitrarily small constant δ̄, at the cost of O(n log log K) transmitted bits for constant m and N.

Remark 6: This is the best result known to us in the literature for this type of functions. For K = O(1), size-restricted type-threshold functions admit linear complexity, which is tight. For K = ω(1), the nonlinear factor log log K grows very slowly compared with n.

We summarize our results (achievable communication complexity) for some functions of interest in Tables 2 and 3, together with the best lower bounds known in the literature. The linear lower bounds are trivial, so no references are given.

3 OUTLINE OF PROTOCOL DESIGN

As is common in the study of wireless networks, we take a layered approach in our protocol design for in-network computing. Each individual protocol is composed of an intra-cluster protocol and an inter-cluster protocol: the former is employed for local computation and the latter for data aggregation. The intuition behind this layered design is as follows. Generally speaking, functions can be evaluated more efficiently in broadcast networks than in multi-hop networks, since multi-hop transmission may incur additional errors. It is therefore beneficial to partition the whole multi-hop network into as few local broadcast networks as possible. Functions are first evaluated in local broadcast networks

and then the local (partial) results are aggregated and sent to the sink node. One application of this idea is tessellating the network into regular cells [19], which is widely used in the literature [10], [11]. However, it is highly non-trivial, if possible at all, to implement tessellation in practice. As an alternative, clustering techniques are more flexible and practical, even though each cluster is not necessarily a broadcast network⁶. Our protocols work on clustered networks without degrading the performance in terms of scaling law, compared with cell-based ones. In our analysis we assume that clusters are formed so that (1) each node belongs to exactly one cluster; (2) in each cluster i there is a cluster head hi, which can communicate with any other node in this cluster; (3) ||Phi − Phj|| ≥ tn, ∀i, j. One clustering approach satisfying these assumptions can be found in [28] (cf. Appendix A). Clearly, the set of cluster heads is a dominating set⁷. Finding a minimal dominating set is NP-hard [27], but this is not required in our study. Two clusters c1 and c2 are neighbors if there exist nodes x and y in c1 and c2, respectively, with ||Px − Py|| ≤ tn; x and y can then serve as relay nodes. If multiple pairs of such nodes are available, one pair is chosen at random. Two clusters c1 and c2 are potential interfering neighbors if there exist nodes x and y in c1 and c2, respectively, with ||Px − Py|| ≤ (2 + ∆)tn. A typical clustered network is shown in Fig. 1. Denote by Nc, ni, Bi, and Di the number of clusters in the network, the number of nodes in cluster i, the number of neighbors of cluster i, and the number of potential interfering neighbors of cluster i, respectively.
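A greedy sketch of a clustering satisfying assumptions (1)–(3) (ours, not the algorithm of [28]): the cluster heads form a maximal independent set of G(n, tn), so heads are pairwise at least tn apart, and by maximality every other node lies within tn of some head.

```python
import math
import random

def cluster(points, t):
    """Greedy clustering: heads form a maximal independent set of G(n, t),
    so heads are pairwise > t apart and every node is within t of a head."""
    heads = []
    for p in points:
        if all(math.dist(p, h) > t for h in heads):
            heads.append(p)
    # assign each node to its nearest head (within distance t by maximality)
    assignment = [min(range(len(heads)), key=lambda i: math.dist(p, heads[i]))
                  for p in points]
    return heads, assignment

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(500)]
t = 0.1
heads, assign = cluster(pts, t)
```

Each node belongs to exactly one cluster, every node can reach its head in one hop, and the head-separation condition (3) holds by construction.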
Lemma 4: Given the transmission range tn = Θ(√(log n/n)), which guarantees full connectivity of the network, and a clustering that satisfies the assumptions above, we have, with high probability⁸ (w.h.p.):
a) Nc = Θ(n/log n);
b) ni = Θ(log n);
c) 1 ≤ Bi ≤ Di < K, where K is a constant.

Proof: Since the cluster head is one hop away from the other nodes within the same cluster, the area of each cluster is upper bounded by the circle centered at the cluster head with radius tn. Thus, Nc ≥ 1/(πtn²) = Ω(n/log n), and ni ≤ 2nπtn² = O(log n) w.h.p., according to the central limit theorem. Since for any two cluster heads ||Phi − Phj|| ≥ tn, there is no overlap among the circles centered at the cluster heads with radius tn/2, which indicates that Nc ≤ 1/(π(tn/2)²) = O(n/log n). Part a) is proved. It is known that the degree of a node in a geometric random

6. In a cluster satisfying our assumptions, one node may reach another through two hops.

7. A dominating set for a graph G = (V, E) is a subset D of V such that every vertex not in D is joined to at least one member of D by some edge.

8. The probability approaches 1 as n goes to infinity.


Fig. 1: a clustered multi-hop network

graph scales as Θ(log n) w.h.p. [22]; thus ni = Ω(log n), and b) is proved. Bi ≥ 1 because the transmission range tn guarantees full connectivity of the network. Since two neighboring clusters must also be potential interfering neighbors, while the converse is not true, Bi ≤ Di. The distance between the cluster heads of two potential interfering neighbors is at most (1 + 2 + ∆ + 1)tn = (4 + ∆)tn. Thus, all the interfering neighbors of cluster i are located within the circle centered at hi with radius (5 + ∆)tn. Hence Di < 4π(5 + ∆)²tn²/(πtn²) ≜ K, due to the fact that there is no overlap among the circles centered at the cluster heads with radius tn/2. Hence the proof is completed.

Note that although these properties coincide with those obtained when tessellation is applied, clustered networks lack the regular structure of tessellated networks, which complicates the protocol analysis. Clustering approaches are out of the scope of this paper; in the following discussion we assume that clusters satisfying our assumptions and the properties in Lemma 4 are already formed.

As mentioned at the beginning of this section, in clustered networks the protocol design is decomposed into intra-cluster and inter-cluster parts. Some existing protocols specific to noisy broadcast networks may facilitate the intra-cluster protocol design; however, the difficulty remains to bound the total error probability, since there is an unbounded number of clusters in the network. The inter-cluster protocol aims at aggregating the local results calculated within clusters and forwarding them to the sink node through a sink tree T (Fig. 2), which is a spanning tree of the graph G′, a subgraph of G composed of all the cluster heads and some relay nodes. Caution should be taken to prevent error propagation in the inter-cluster protocol design. In the following, we assume T is generated and known to all the cluster heads and relay nodes before the computation starts.
The protocols are executed according to a proper scheduling scheme described as follows. We color the

Fig. 2: a sink tree rooted at the sink node. The directed lines show the information flow from the leaves to the sink node.

clusters in such a way that each cluster is colored differently from its interfering neighbors. The clusters with the same color are scheduled (active) simultaneously, and nodes in the active clusters can transmit to any nodes within their transmission range. Note that 1) this scheduling scheme only constrains the interference from concurrent transmissions, and the influence of noise should still be taken into consideration; and 2) there may exist better scheduling schemes that improve the time complexity, which is not our concern in this paper.
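Since each cluster has fewer than K potential interfering neighbors (Lemma 4), greedy coloring of the cluster interference graph uses at most K colors; a sketch (ours, with a toy interference graph as an assumed input):

```python
def greedy_schedule(num_clusters, interfering):
    """Assign each cluster the smallest color not used by an interfering
    neighbor; clusters sharing a color may be active simultaneously."""
    color = {}
    for c in range(num_clusters):
        used = {color[d] for d in interfering.get(c, []) if d in color}
        color[c] = next(k for k in range(num_clusters + 1) if k not in used)
    return color

# toy interference graph (symmetric adjacency lists): 0-1, 0-2, 1-2, 2-3
interfering = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
color = greedy_schedule(4, interfering)
```

Greedy coloring never needs more colors than the maximum degree plus one, so the schedule length is a constant independent of n.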

4 A GENERAL PROTOCOL FOR DIVISIBLE FUNCTIONS

In this section we introduce a general protocol for computing any divisible function. On the practical side, a general protocol can deal with various functions and thus simplify implementation. On the theoretical side, a general protocol serves as a uniform platform to investigate the distributed computing problem, and its bottleneck analysis provides significant insights into potential performance improvement.

4.1 Protocol

The proposed protocol is composed of an intra-cluster protocol and an inter-cluster protocol, corresponding to two sequential stages: local processing and data aggregation, respectively. The intra-cluster protocol evaluates the function over the data within the cluster through the coordination of the cluster head hi, which takes on (much) more responsibility beyond its normal role as an ordinary node. In the inter-cluster protocol, the local (partial) results are aggregated and routed through cluster heads and relay nodes to the sink node, with possible further processing along the way.

The intra-cluster protocol is executed in each cluster when the cluster is active, and nodes in the active cluster take


turns (in a fixed order) to transmit their information, which conforms to the requirement that protocols dealing with noisy communications be oblivious. For the ith cluster, assuming that its members are ordered from 1 to ni, the intra-cluster protocol is executed in two phases:

Intra-cluster protocol:
(i) Each node in the ith cluster encodes its data, composed of N observations, into a codeword of size O(max(Nm, log ni)) and transmits it to the cluster head hi. Then hi decodes the noisy copy of the codeword, re-encodes the Nm-bit raw data into a codeword of size O(max(Nm, log ni)), and broadcasts it. This phase ends when the cluster head has broadcast all the data in the cluster once.
(ii) After decoding the codewords from the cluster head, each node computes the function and encodes the output (N log rni bits) into a codeword of size s = O(max(N log rni, ni)). The codeword is padded with zeros to a length that is an integral multiple of ni, and then partitioned equally into ni blocks. Each node j in the ith cluster then takes its turn to broadcast the jth block of its own codeword⁹. The cluster head hi collects all the blocks from its cluster and decodes them.

After executing the intra-cluster protocol, local results with a certain reliability are obtained by the cluster heads. The local results are then aggregated and transmitted to the sink node on a sink tree as described in Section 3 (see Fig. 2). The data aggregation process is performed by the inter-cluster protocol, where the information flows from the leaves (which are all cluster heads) to the root (the sink node) and each vertex on the tree transmits only once.

Inter-cluster protocol:
(i) The leaves encode the results they obtain into codewords of size O(max(N log rn, log n)) and transmit them to their parent nodes on the tree.
(ii) Intermediate vertices on the tree, either cluster heads or relay nodes, perform decode-fuse-forward: they first decode the codeword(s) they receive, then fuse the local results according to the definition of divisible functions, and finally forward the new results to their parent nodes after encoding them into codewords of size O(max(N log r_n, log n)).

4.2 Analysis

We analyze the performance of the intra-cluster protocol first.

Proposition 1: For any cluster i, the complexity of the intra-cluster protocol is O(max(Nm, log n_i) n_i / N) per instance, and the cluster head h_i obtains a correct result with probability at least 1 − 2/n².

9. Note that each node transmits only a portion of its local computation result, which effectively introduces spatial diversity into the final aggregation at the cluster head.
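The decode-fuse-forward rule in step (ii) can be sketched as follows. Links are taken error-free here to isolate the aggregation logic, the max function serves as the divisible example, and all names are illustrative assumptions, not notation from the paper.

```python
# Minimal decode-fuse-forward sketch over a sink tree. "fuse" is the combining
# step that divisible functions admit: the function of a union of data sets is
# computable from the function values on the parts.

def aggregate(tree, local_results, root, fuse):
    """tree maps each vertex to its list of children; local_results holds the
    local (cluster) results at cluster heads. Each vertex collects its
    children's messages, fuses them with its own result (if any), and forwards
    a single message upward, so every vertex transmits exactly once."""
    def up(v):
        partial = [up(c) for c in tree.get(v, [])]
        if v in local_results:
            partial.append(local_results[v])
        return fuse(partial)
    return up(root)

# Sink tree: 0 is the sink; 1 and 2 are intermediate vertices; 3-6 are leaves.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
locals_ = {3: 9, 4: 2, 5: 11, 6: 5, 1: 7}
print(aggregate(tree, locals_, 0, max))  # → 11
```

Replacing `max` by `sum` (or a histogram merge) aggregates any other divisible function over the same tree without changing the message pattern, which is exactly why one protocol covers the whole class.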

The proof is given in Appendix C. Since n_i = Θ(log n) and there are Θ(n/log n) clusters, the total complexity of the intra-cluster protocol is O((n/N) max(Nm, log log n)). And by the union bound, the probability that all the cluster heads acquire correct information in their clusters is at least 1 − O(2/(n log n)) = 1 − o(1).

To analyze the inter-cluster protocol, note that there are Θ(n/log n) cluster heads and at most KΘ(n/log n) relay nodes according to Lemma 4. It follows that the size of the spanning tree is Θ(n/log n). Since each node only transmits once, the complexity of the inter-cluster protocol for one instance is^10 O((1/N) max(N log r_n, log n) · n/log n).

We now consider the error probability for the inter-cluster protocol. A codeword can be correctly decoded by its parent node with probability at least

Pr(a codeword is decoded correctly) ≥ 1 − e^{−γ max(N log r_n, log n)},    (3)

for some γ > 1, according to Corollary 1. By the union bound,

Pr(all the codewords are decoded correctly) ≥ 1 − Θ(n/log n) e^{−γ max(N log r_n, log n)} ≥ 1 − a (n/log n) e^{−γ log n} = 1 − a/(n^{γ−1} log n),    (4)

where a is a constant. Combining the analysis for both the intra-cluster and inter-cluster protocols, we reach the conclusion in Theorem 1.

4.3 Discussion
Since our general protocol can compute any divisible function, it is interesting to examine its efficiency and potential bottlenecks. Depending on the relations between the system parameters, we differentiate a few cases below:

4.3.1 One-shot, constant-size data case
In this scenario N = O(1) and m = O(1). From Theorem 1, we can see that in such a case the complexity becomes O(n log log n + max(log r_n, log n) · n/log n). r_n plays an important role here and provides some insights for the protocol design.

• r_n = O(n^{log log n}): In this case the intra-cluster operation dominates, and the complexity of our general protocol becomes O(n log log n), which matches the best upper bound we know of in the literature for some individual functions. For example, our result for the histogram computation (r_n = O(n^{|χ|})) is the same as what is achieved in [11]. For the parity function (r_n = O(1)), the best known protocol is given in [8] for a noisy broadcast network with complexity O(n log log n). Our result shows that this is actually achievable in a noisy multi-hop network as well. In addition, when r_n = O(n) the complexity of the inter-cluster protocol is O(n). Obviously, for all non-degenerate functions, Ω(n) transmissions are needed, even in a noiseless broadcast network. Thus, any effort to improve the efficiency of the inter-cluster protocol is unnecessary in terms of the scaling law of communication complexity.

• r_n = Ω(n^{log log n}): In this case it is easily seen that the complexity of our proposed protocol is O(log r_n · n/log n). This reveals an interesting point: when the function range r_n is large enough, the computation bottleneck lies in the inter-cluster protocol. The identity function (r_n = |χ|^n) belongs to this category, and our general protocol can compute it with complexity O(n²/log n). We will give a more efficient inter-cluster protocol for computing the identity function in the next section.

10. m may play a role here through r_n.
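The case distinction above amounts to comparing the two terms of the Theorem 1 bound. A small helper can mirror it numerically; the constants hidden in the O(·) notation are dropped and all names are illustrative.

```python
import math

def general_protocol_cost(n, N, m, log_rn):
    """Order-of-growth of the two terms of the Theorem 1 bound:
    intra-cluster: (n/N) * max(N*m, log log n)
    inter-cluster: max(N*log r_n, log n) * n / (N * log n)."""
    intra = (n / N) * max(N * m, math.log(math.log(n)))
    inter = max(N * log_rn, math.log(n)) * n / (N * math.log(n))
    return intra, inter

n = 2 ** 16
# One-shot, constant m. Identity function: log r_n = n*m, inter-cluster dominates.
print(general_protocol_cost(n, N=1, m=1, log_rn=n))
# Parity function: log r_n = O(1), intra-cluster dominates.
print(general_protocol_cost(n, N=1, m=1, log_rn=1))
```

For the identity-like range the second term dwarfs the first, while for parity the first term dominates, matching the two bullets above.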

4.3.2 Block computation
This scenario corresponds to mN = ω(1). It can be shown that O(nm) is achieved when r_n = O(n^m), and O(n log r_n / log n) is achieved otherwise. Intuitively, when either m or N is large enough, the computation efficiency is improved through block coding. By examining the complexity expression of our protocol in (2), we find that this intuition is only true for the intra-cluster protocol. In particular, the first term of the complexity expression suggests that when Nm = Ω(log log n), a tight bound O(nm) for the intra-cluster protocol is achieved. For the inter-cluster protocol, when r_n = O(n), block computation helps improve the efficiency from O(n) to O(n log r_n / log n). However, when r_n = Ω(n) it does not help. For example, block computation cannot help the inter-cluster propagation in our general protocol for the computation of the identity function; in contrast, for the max function with constant m, block computation helps both the intra-cluster processing and the inter-cluster propagation.

To summarize this section, our general protocol is actually quite efficient; for example, it recovers the best known results in the literature for the computation of the histogram and parity functions. On the other hand, we have also identified the scenarios with potential for further improvement. In the following two sections, we discuss two such improvements.

5 IMPROVEMENT FOR THE IDENTITY FUNCTION

It was shown in the last section that if r_n is large, the inter-cluster protocol limits the performance. To demonstrate how to improve the efficiency of an inter-cluster protocol we consider the identity function, which attracts much attention in the studies of distributed computing since it is the mother function of all other functions. In the one-shot, constant-size data case, it is known that in a noisy broadcast network with n nodes the (order) optimal complexity for the identity function is Θ(n log log n) [14], which is achieved by the intra-cluster protocol above. Therefore we focus on improving the inter-cluster protocol for the identity function in this section. We consider the one-shot case with constant m.

In the new inter-cluster protocol, each cluster head encodes its aggregated information, Θ(m log n) bits in total obtained from the intra-cluster protocol (recall from Lemma 4 that each cluster contains Θ(log n) nodes), into a codeword of size O(log n). Each codeword is transported to the sink node through the shortest path with appropriate scheduling. The decode-and-forward scheme is adopted for relaying the information. Then we have

Proposition 2: For constant m and N, the identity function can be evaluated at the complexity of O(n√(n/log n)) with error probability o(1).

Proof: To begin with, we show that the total number of hops N_h is Θ((n/log n)√(n/log n)). Without loss of generality we assume the sink node is located at the center of the square. We first divide the space into one circle with radius t_n and a sequence of annuli C_i = C((2i−1)t_n, (2i+1)t_n), i = 1, 2, ..., Θ(1/t_n), all centered at the sink node (see Fig. 3). Then the space is further tessellated^11 into cells with side s_n = t_n/√5 so that each cell contains at least one node w.h.p. Therefore, the cluster heads in C_i need at most (2i+1)t_n/s_n = √5(2i+1) hops to reach the sink node, and there are at most [π(2i+1)²t_n² − π(2i−1)²t_n²] / [π(t_n/2)²] = 32i clusters in C_i (following the same arguments as in Lemma 4). Therefore the total number of hops N_h can be computed as

N_h = Σ_{i=1}^{Θ(√(n/log n))} √5(2i+1) · 32i = Θ((n/log n)√(n/log n)).    (5)

Since O(log n) bits are transmitted in each hop, the communication complexity of this inter-cluster protocol is given by

O(log n · N_h) = O(n√(n/log n)).    (6)

It is easy to check that in each hop a codeword can be correctly decoded by a receiver with probability at least

Pr(a codeword is decoded correctly) ≥ 1 − e^{−γ log n},    (7)

for some γ > 3/2, according to Corollary 1.

11. Tessellation here only facilitates the calculation of N_h. The protocol for the identity function is still applied in clustered networks.
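The scaling in (5) can be checked numerically. The following sketch drops all constants hidden in the Θ(·) notation (the number of annuli is taken as ⌊√(n/log n)⌋), so only the growth rate of the ratio is meaningful.

```python
import math

def total_hops(n):
    """Evaluate the sum in (5): with M = sqrt(n / log n) annuli, annulus i
    holds at most 32*i clusters, each needing at most sqrt(5)*(2i+1) hops."""
    M = int(math.sqrt(n / math.log(n)))
    return sum(math.sqrt(5) * (2 * i + 1) * 32 * i for i in range(1, M + 1))

# The ratio N_h / (n/log n)^{3/2} should stay bounded as n grows,
# consistent with N_h = Theta((n/log n) * sqrt(n/log n)).
for n in (10 ** 4, 10 ** 6, 10 ** 8):
    print(round(total_hops(n) / (n / math.log(n)) ** 1.5, 2))
```

The printed ratios settle near a constant (about 64√5/3, the coefficient of the leading M³ term of the sum), which is the Θ((n/log n)^{3/2}) behavior claimed in the proof.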


By the union bound,

Pr(all the codewords are decoded correctly) ≥ 1 − Θ((n/log n)√(n/log n)) e^{−γ log n} = 1 − a/(n^{γ−3/2}(log n)^{3/2}) = 1 − o(1),    (8)

where a is a constant.

The lower bound for evaluating the identity function in multi-hop networks is obtained by assuming the network is error free and each individual node transmits its integer to the sink node through shortest path routing. Note that there is no redundancy in the inputs when computing the identity function, since the sink node requires all the information in the network. According to [10], this ideal shortest path routing protocol admits complexity Θ(n√(n/log n)), which reveals that our protocol applied to noisy networks provides a sharp bound. The above analysis completes the proof of Theorem 2.

Fig. 3: Division and tessellation of the network. The data of each node is transmitted to the sink node through shortest path routing.

Remark 7: This inter-cluster protocol incorporates a coding scheme into shortest path routing, which is obviously applicable to any function. However, it improves the computing efficiency for the identity function, but not necessarily for other functions. As discussed in Section 4.3, for the r_n = O(n^{log log n}) case the bottleneck lies in the intra-cluster protocol, so there is no need to improve the inter-cluster protocol.

Remark 8: The inter-cluster protocol in the general protocol exploits in-node computation, i.e., relay nodes may process the information they receive before forwarding it, which can be more efficient for small r_n. In contrast, the inter-cluster protocol here uses shortest path routing with the decode-and-forward scheme, which can be better for large r_n.

6 IMPROVEMENT FOR RESTRICTED TYPE-THRESHOLD FUNCTIONS
In this section we’ll show that the intra-cluster protocol could be further improved for some restricted typethreshold functions f , which are defined in Sec. 2. It’s not difficult to check that for this class of functions, a tight bound Θ(mn) in communication complexity is achieved by our general protocol (Theorem 1) in block computation scenarios. Therefore we focus on the one shot case (N = O(1)) with constant m, for which the general protocol gives a complexity of O(n log log n). As we discussed in Sec. 4.3, the bottleneck of computation for this type of functions lies in the intra-cluster protocol. In this section we design a more efficient intra-cluster protocol, K-best candidates, for this class of functions with complexity O(n log log K) and a fault tolerance Q + o(1), where Q is an arbitrarily small constant. Our study is inspired by the “witness discovery” protocol in [9] which is designed for the max function (one candidate) and applied in a noisy broadcast network. Our problem is more complicated and challenging in nature since the number of candidates (K) and clusters in the network are both unbounded. The non-triviality of our design lies in the way to bound the total error probability. In the following, we use the function, finding the K largest values (without distinction), to describe our design for concreteness. With suitable modification, it can be extended to computing other restricted typethreshold functions. 6.1

The intra-cluster protocol—K best candidates

The intuition behind the K best candidates protocol comes from an elimination match, where teams or individuals are grouped to compete with each other, the winner(s) go into the next round and are grouped again, and the whole process is repeated until a champion is selected. Similarly, in the K best candidates protocol, the cluster head intends to find the K largest numbers in its cluster. First, we divide the n_i nodes in cluster i into^12 n_i/(4K) groups, each of size 4K. The cluster head h_i selects from each group the K nodes that hold the (nominally) largest values into the next round. Then the selected nodes are grouped and compared again. The details of the protocol are described below:

(1) The cluster head h_i collects all the measurements in its cluster. We differentiate two scenarios based on the value of K:
• K = O(1): in this case K is a constant. Each node simply encodes its integer into a codeword of size O(max(m, log(K/Q))) and transmits it to h_i;
• K = Ω(1): in this case each group is further divided into subgroups of size log 4K, and the general intra-cluster protocol discussed in Sec. 4 is applied in each subgroup, with n_i there replaced by^13 log 4K.
(2) For each group, h_i compares the 4K values and determines the indices of the K largest values. Then h_i assigns '1' to nodes holding the K largest values and '0' to the others. These n_i assigned bits are concatenated into a word, which is further encoded into a codeword of size O(n_i) and broadcast.
(3) Let J be the set of winning nodes in (2). The protocol (steps 1-3) proceeds by executing the protocol on J (of size n_i/4) for 3 independent times. Each time, the head node h_i obtains from the protocol output K indices^14 corresponding to the K (nominally) largest values; it takes the majority of the three outputs as the result for the protocol on cluster i (or decides arbitrarily if there is no majority).
(4) h_i encodes the K indices found in (3) into a codeword of size O(n_i) and broadcasts it. The selected nodes then encode their data into codewords of size O(n_i) and send them to h_i. The protocol ends with h_i knowing the K largest values in cluster i.

Remark 9: Recursion happens in the third step, where we obtain a winning set of reduced size (by a factor of 4) and apply this protocol to the winning set three times. That is, to find the K candidates once over z nodes, the protocol is executed over z/4 nodes for three times. The recursion ends when z = 4K; in this case, h_i compares the 4K values and determines the indices of the K (nominally) largest values for three independent times. This protocol is non-oblivious and can be turned into an oblivious one by the 'helper' idea in [9] without degrading the performance.

12. Throughout this protocol, extra dummy nodes with the smallest measurements may be added to make the group sizes equal.
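The recursive control flow above (groups of 4K, a surviving quarter, majority of three independent runs) can be sketched as follows. Channel coding and the index broadcasts are abstracted into a hypothetical `noisy_top_k` group comparison, so this illustrates the recursion only, not the coding layer.

```python
import random
from collections import Counter

def noisy_top_k(group, k, p_err):
    """Hypothetical noisy comparison at the cluster head: returns the k largest
    values of the group, but with probability p_err returns a wrong subset."""
    ranked = sorted(group, reverse=True)
    if random.random() < p_err:
        random.shuffle(ranked)
    return ranked[:k]

def k_best(values, k, p_err=0.1):
    """Recursive K-best-candidates selection with majority-of-three voting."""
    if len(values) <= 4 * k:
        # End of recursion: compare the (at most) 4k values three times.
        outcomes = [tuple(sorted(noisy_top_k(values, k, p_err))) for _ in range(3)]
        return list(Counter(outcomes).most_common(1)[0][0])
    # Step (2): group into blocks of 4k, keep the (nominally) best k of each.
    survivors = []
    for i in range(0, len(values), 4 * k):
        survivors += noisy_top_k(values[i:i + 4 * k], k, p_err)
    # Step (3): run the protocol on the surviving quarter three times; majority.
    outcomes = [tuple(sorted(k_best(survivors, k, p_err))) for _ in range(3)]
    return list(Counter(outcomes).most_common(1)[0][0])

random.seed(0)
vals = list(range(64))
random.shuffle(vals)
print(sorted(k_best(vals, k=2, p_err=0.0), reverse=True))  # → [63, 62]
```

With p_err = 0 the selection is exact at every level, so the global top K always survives; with p_err > 0 the majority vote at each level is what keeps the error probability from accumulating across the O(log n_i) recursion levels, mirroring the analysis in Appendix D.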

6.2 Analysis

Proposition 3: With O(n log log K) bits transmitted, the K largest values are received by the cluster heads incorrectly with probability Q + (log log n)/n + K/(n log n), where Q is an arbitrarily small constant.

The proof is given in Appendix D. The inter-cluster protocol is the same as the one in the general protocol, with complexity O(max(mK, log n) · n/log n) = O(n) and error probability 1/(n log n). Theorem 3 follows from combining the results for the intra-cluster and inter-cluster protocols and setting

δ̄ = Q + (log log n)/n + K/(n log n) + 1/(n log n).

13. The only difference from the intra-cluster protocol in Sec. 4 is that we do not need individual heads for each subgroup; h_i collects the data from each of them.
14. the result from one execution of the protocol on n_i/4 nodes

7 CONCLUSION AND FURTHER WORK

Distributed computing in noisy multi-hop networks is an attractive yet challenging problem. In this paper we first designed a general protocol for computing any divisible function reliably with high probability in a noisy multi-hop network, with communication complexity O((n/N) max(Nm, log log n) + max(N log r_n, log n) · n/(N log n)), where n, N, m and r_n are the number of nodes in the network, the length of data blocks at each node, the size of each data point in bits, and the range of the function of interest, respectively. The merits of this general protocol include: 1) it provides a benchmark for evaluating various functions; 2) its bottleneck analysis motivates further improvement. After analyzing its bottleneck in different scenarios, we endeavored to design more efficient protocols for the identity function and some restricted type-threshold functions, focusing on the one-shot case (N = O(1)) with constant-size data (m = O(1)). We showed that they can be evaluated with complexity Θ(n√(n/log n)) and O(n log log K), respectively.

In our future work we will devote our effort to good lower bounds for restricted type-threshold functions, which can reveal the tightness of the upper bounds achieved in this paper. We will also consider the computation of type-sensitive functions [6] over noisy multi-hop networks. Intuitively, they may not be computed (much) more efficiently than the identity function, since all the information in the network is required to evaluate them.

ACKNOWLEDGEMENTS

This work was supported in part by the US National Science Foundation under Grants CCF-0721815, CCF-0830462 and ECCS-1002258. A conference version of this work appeared in IEEE INFOCOM 2010 [29].

REFERENCES

[1] N. A. Lynch, Distributed Algorithms, San Francisco, CA: Morgan Kaufmann, 1997.
[2] H. Attiya and J. Welch, Distributed Computing: Fundamentals, Simulations, and Advanced Topics, second edition, Wiley-Interscience, 2004.
[3] A. Orlitsky and A. El-Gamal, "Communication complexity," Complexity in Information Theory, Y. S. Abu-Mostafa (ed), pp. 16-61, 1988.
[4] L. Lovasz, "Communication complexity: A survey," Paths, Flows, and VLSI Layout, B. H. Korte (ed), Springer-Verlag, 1990.
[5] E. Kushilevitz and N. Nisan, Communication Complexity, Cambridge Univ. Press, 1997.
[6] A. Giridhar and P. R. Kumar, "Computing and communicating functions over sensor networks," IEEE J. Sel. Areas Commun., vol. 23, no. 4, Apr. 2005.
[7] A. El Gamal, open problem presented at the 1984 Workshop on Specific Problems in Communication and Computation, sponsored by Bell Communication Research, 1984.
[8] R. Gallager, "Finding parity in a simple broadcast network," IEEE Trans. Inf. Theory, vol. 34, no. 2, pp. 176-180, Mar. 1988.
[9] I. Newman, "Computing in fault tolerance broadcast networks," IEEE Annual Conference on Computational Complexity (CCC), 2004.
[10] N. Khude, A. Kumar, and A. Karnik, "Time and energy complexity of distributed computation of a class of functions in wireless sensor networks," IEEE Trans. Mobile Computing, 2008.

[11] L. Ying, R. Srikant, and G. E. Dullerud, "Distributed symmetric function computation in noisy wireless sensor networks," IEEE Trans. Inf. Theory, 2007.
[12] S. Rajagopalan and L. Schulman, "A coding theorem for distributed computation," Proc. 26th Annual ACM Symp. Theory of Computing, 1994.
[13] Y. Kanoria and D. Manjunath, "On distributed computation in noisy random planar networks," Proc. IEEE International Symposium on Information Theory, Nice, France, June 2007.
[14] N. Goyal, G. Kindler and M. Saks, "Lower bounds for the noisy broadcast problem," IEEE Symposium on Foundations of Computer Science, 2005.
[15] A. Yao, "On the complexity of communication under noise," invited talk at the 5th ISTCS Conference, 1997.
[16] E. Kushilevitz and Y. Mansour, "Computation in noisy radio networks," ACM-SIAM Symp. Discrete Algorithms, 1998.
[17] U. Feige and J. Kilian, "Finding OR in a noisy broadcast network," Information Processing Letters, vol. 73, no. 1-2, 2000.
[18] C. Li, H. Dai and H. Li, "Finding the K largest metrics in a noisy broadcast network," Allerton Conference on Communication, Control and Computing, 2008.
[19] P. Gupta and P. R. Kumar, "The capacity of wireless networks," IEEE Trans. Inf. Theory, vol. 46, no. 2, Mar. 2000.
[20] F. Xue and P. R. Kumar, Scaling Laws for Ad Hoc Wireless Networks: An Information Theoretic Approach, Delft, The Netherlands, 2006.
[21] P. Gupta and P. R. Kumar, "Critical power for asymptotic connectivity in wireless networks," Stochastic Analysis, Control, Optimization and Applications: A Volume in Honor of W. H. Fleming, W. McEneaney, G. Yin, and Q. Zhang, Eds. Boston, MA: Birkhauser, 1998.
[22] F. Xue and P. R. Kumar, "The number of neighbors needed for connectivity of wireless networks," Wireless Networks, vol. 10, no. 2, Mar. 2004.
[23] D. Mosk-Aoyama and D. Shah, "Computing separable functions via gossip," ACM Principles of Distributed Computing, Sep. 2007.
[24] S. Subramanian, P. Gupta and S. Shakkottai, "Scaling bounds for function computation over large networks," Proc. IEEE International Symposium on Information Theory, Nice, France, June 2007.
[25] W. Hoeffding, "Probability inequalities for sums of bounded random variables," J. Amer. Stat. Assoc., vol. 58, pp. 13-30, 1963.
[26] J. H. van Lint, Introduction to Coding Theory, 3rd edition, Springer-Verlag, 1999.
[27] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, 1979.
[28] W. Li and H. Dai, "Cluster-based distributed consensus," IEEE Trans. Wireless Communications, vol. 8, no. 1, Jan. 2009.
[29] C. Li and H. Dai, "Towards efficient designs for in-network computing with noisy wireless channels," Proc. IEEE INFOCOM, 2010.
[30] A. Ozgur, O. Leveque and D. Tse, "Hierarchical cooperation achieves optimal capacity scaling in ad hoc networks," IEEE Trans. Inf. Theory, vol. 53, no. 10, pp. 3549-3572, Oct. 2007.
[31] J. Ghaderi, L. Xie and X. Shen, "Hierarchical cooperation in ad hoc networks: Optimal clustering and achievable throughput," IEEE Trans. Inf. Theory, vol. 55, no. 8, 2009.
[32] L.-L. Xie and P. R. Kumar, "A network information theory for wireless communication: Scaling laws and optimal operation," IEEE Trans. Inf. Theory, vol. 50, no. 5, 2004.

Chengzhi Li (S’10) received the B.E. degree from Nankai University, Tianjin, China, and M.S. degree from Peking University, Beijing, China, in 2001 and 2004, respectively, and the Ph.D. degree in electrical engineering from North Carolina State University, Raleigh, NC in 2012. He is currently a staff scientist in Broadcom Corporation, Matawan, NJ. His research interests are in wireless communications and networking, signal processing for wireless communications.

Huaiyu Dai (M’03, SM’09) received the B.E. and M.S. degrees in electrical engineering from Tsinghua University, Beijing, China, in 1996 and 1998, respectively, and the Ph.D. degree in electrical engineering from Princeton University, Princeton, NJ in 2002. He was with Bell Labs, Lucent Technologies, Holmdel, NJ, during summer 2000, and with AT&T Labs-Research, Middletown, NJ, during summer 2001. Currently he is an Associate Professor of Electrical and Computer Engineering at NC State University, Raleigh. His research interests are in the general areas of communication systems and networks, advanced signal processing for digital communications, and communication theory and information theory. His current research focuses on networked information processing and crosslayer design in wireless networks, cognitive radio networks, wireless security, and associated information-theoretic and computation-theoretic analysis. He has served as editor of IEEE Transactions on Communications, Signal Processing, and Wireless Communications. He co-edited two special issues for EURASIP journals on distributed signal processing techniques for wireless sensor networks, and on multiuser information theory and related applications, respectively.


APPENDIX A
DISTRIBUTED CLUSTERING [28]

We assume each node i has an initial seed s_i which is unique within its neighborhood. This can be realized through, e.g., drawing a random number from a common pool, or simply using nodes' IDs. From time 0, each node i starts a timer with length t_i = s_i, which is decremented by 1 at each time instant as long as it is greater than 0. If node i's timer expires, it becomes a cluster head and broadcasts a "cluster initialize" message to all its neighbors. Each of its neighbors with a timer greater than 0 signals its intention to join the cluster by replying with a "cluster join" message, and also sets its timer to 0. If a node receives more than one "cluster initialize" message at the same time, it randomly chooses one cluster head. At the end, clusters are formed such that every node belongs to one and only one cluster. Note that the uniqueness of seeds within the neighborhood ensures that cluster heads are at least distance t_n from each other. Therefore, this distributed approach forms clusters satisfying our assumptions.
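A toy sequential simulation of this timer rule may help; the message layer is collapsed into direct assignments, joined nodes are marked instead of having their timers zeroed (equivalent here), and the line topology is only for illustration.

```python
def timer_clustering(seeds, neighbors):
    """seeds[i]: node i's timer length (unique within its neighborhood);
    neighbors[i]: the set of i's neighbors. Returns head[i] for every node."""
    head = [None] * len(seeds)
    for t in range(max(seeds) + 1):
        # Nodes whose timers expire now (and that have not joined a cluster)
        # become cluster heads and send "cluster initialize".
        for h in (i for i in range(len(seeds)) if seeds[i] == t and head[i] is None):
            head[h] = h
            for j in neighbors[h]:
                if head[j] is None:   # "cluster join": stop competing
                    head[j] = h
    return head

# 6 nodes on a line, each adjacent to its immediate neighbors.
nbrs = [{1}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4}]
print(timer_clustering([2, 5, 1, 4, 0, 3], nbrs))  # → [0, 2, 2, 4, 4, 4]
```

Node 4 (seed 0) fires first and captures nodes 3 and 5; node 2 fires next and captures node 1; node 0 is left to head its own cluster, so every node ends up in exactly one cluster, as claimed.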

APPENDIX B
PROOF OF COROLLARY 1

The proof follows the lines of Observation 2.1 in [9]. According to Lemma 2, for some fixed γ ∈ (0, 1/2), we can encode the m-bit integer by a codeword v ∈ Γ_m with length Cm, where C ≥ C_1(γ) will be further specified below. A receiver gets a noisy copy v′ = v XOR n̄ of v and decodes it as the w ∈ Γ_m which minimizes d(v′, w). The decoding error is upper bounded by

Pr(d(v′, v) ≥ (1/2)γCm) = Pr(|n̄| ≥ (1/2)γCm),

where |·| is the l_1 norm of the noise vector n̄. By Lemma 1, this probability is bounded above by exp(−2(γ/2 − ϵ)²Cm). This value is at most e^{−δm} for C ≥ δ/(γ/2 − ϵ)², ∀δ > 0.
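The Hoeffding bound used here can be sanity-checked numerically: simulate the failure event {|n̄| ≥ γCm/2} directly and compare its empirical frequency against exp(−2(γ/2 − ϵ)²Cm). The parameter values below are illustrative, not taken from the paper.

```python
import math
import random

def decode_error_rate(m_bits, C, gamma, eps, trials=2000):
    """Empirical probability that the noise weight |n| reaches gamma*C*m/2,
    i.e., that minimum-distance decoding of the length-C*m codeword can fail."""
    n = C * m_bits
    thresh = 0.5 * gamma * n
    fails = sum(
        sum(random.random() < eps for _ in range(n)) >= thresh
        for _ in range(trials)
    )
    return fails / trials

random.seed(0)
emp = decode_error_rate(m_bits=8, C=30, gamma=0.4, eps=0.05)
bound = math.exp(-2 * (0.4 / 2 - 0.05) ** 2 * 30 * 8)  # Hoeffding bound (Lemma 1)
print(emp, "<=", bound)
```

With these values the threshold (48 flips out of 240 at ϵ = 0.05, mean 12) is far in the tail, so the empirical rate is essentially zero, comfortably below the ≈ 2·10⁻⁵ analytical bound.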

APPENDIX C
PROOF OF PROPOSITION 1

We analyze the complexity first. It is clear that in step (i), O(max(Nm, log n_i)) n_i bits in total are transmitted for N instances. In step (ii), each node outputs O(max(N log r_{n_i}, n_i))/n_i bits, which leads to O(max(N log r_{n_i}, n_i)) transmitted bits overall. Since max log r_{n_i} = m n_i, the latter is upper bounded by O(N m n_i). Considering both steps (i) and (ii), the complexity of the intra-cluster protocol for one instance is O(max(Nm, log n_i) n_i / N).

Then we turn to the analysis of the error probability. Assume n_i = a log n, with a constant a > 0, and set β_i = 2/a. In phase 1, denote by U_i the set of nodes in cluster i which do not correctly receive all the other data relayed by the cluster head. According to Corollary 1, each individual data item reaches some node I with error probability 2e^{−δ max(mN, log n_i)} ≤ 2/n_i^δ, ∀δ > 0, where the factor 2 comes from the fact that the data is relayed once by the cluster head. By the union bound,

Pr[I ∈ U_i] ≤ n_i · (2/n_i^δ) = 2 n_i^{1−δ}.

Therefore, ∀α_i > 0,

Pr[|U_i| ≥ (α_i/2) ϵ n_i] ≤ Σ_{s=(α_i/2)ϵn_i}^{n_i} C(n_i, s) (2 n_i^{1−δ})^s ≤ (2 n_i^{1−δ})^{(α_i/2)ϵn_i} 2^{n_i} < 2^{−β_i n_i},    (9)

where the second inequality is due to the fact that the total number of node subsets in cluster i is no more than 2^{n_i}, and the last inequality follows from the fact that 2 n_i^{1−δ} < 2^{−2(1+β_i)/(α_i ϵ)} for large enough δ.

In phase 2, each node outputs an N log r_{n_i}-bit word after some calculation. Without noise, these words are identical and can be encoded into a codeword v_2 ∈ Γ′_n with length C l_i, where l_i = max(N log r_{n_i}, n_i), according to Lemma 2. In the noisy environment, the cluster head receives a noisy copy v′_2 of v_2. The decoding error is upper bounded by Pr(d(v′_2, v_2) ≥ (1/2)γC l_i) = Pr(d(v′_2, v_2) ≥ (1+α_i)ϵC l_i) for γ = 2(1+α_i)ϵ, and occurs for two reasons: 1) the blocks of bits from some node j ∈ U_i are erroneous; 2) bits may be corrupted by noise during transmission. We hence have

Pr(decoding error) = Pr(d(v′_2, v_2) ≥ (1+α_i)ϵC l_i)
≤ Pr(C (l_i/n_i) |U_i| ≥ C (l_i/n_i) (α_i/2) ϵ n_i) + Pr(N_e ≥ (1 + α_i/2) ϵ C l_i)
≤ 2^{−β_i n_i} + exp(−(α_i²/2) C ϵ² n_i)
≤ 2 · 2^{−β_i n_i},    (10)

where N_e denotes the number of bits corrupted by channel noise and the last inequality holds for C large enough. Since β_i n_i = 2 log n, the cluster head h_i obtains a correct result with probability at least 1 − 2/n², as claimed.

APPENDIX D
PROOF OF PROPOSITION 3

We analyze the complexity first. For the K = Ω(1) case, the general intra-cluster protocol is applied in subgroups of size log 4K in step (1); summed over all recursion levels (the number of participating nodes shrinks by a factor of 4 per level while each level is executed three times, giving a convergent geometric series), this costs O(n_i log log K) bits, where n_i = Θ(log n) is the number of nodes in cluster i. The complexity of the index-broadcasting in step (2) is O(n_i + n_i/4 + n_i/16 + ...) = O(n_i), and the complexity of step (4) is O(n_i). Therefore the communication complexity in cluster i is O(n_i log log K). Since there are Θ(n/log n) clusters, the total complexity of the intra-cluster protocol is O(n log log K). For the K = O(1) case, each node transmits O(max(m, log(K/Q))) bits in its group. Following the same argument, the complexity of the intra-cluster protocol is O(n max(m, log(K/Q))) = O(n) (since K, Q and m are all constants).

We then turn to the analysis of the error probability. For ease of description, the nodes holding the (globally) K largest values are called target nodes. Note that 1) the target nodes can be distributed over at most K groups, located in at most K clusters; 2) errors do not occur if wrong nodes are selected in groups without target nodes. Therefore we only need to guarantee that the protocol is executed correctly in the groups containing the target nodes, and that all the selected nodes (at the end of the recursion) transmit correct information to their corresponding cluster heads, with high probability. An error may occur if either of the following two cases happens:

e_1: not all the target nodes are selected by the cluster heads (corresponding to the first three steps);
e_2: the cluster heads receive incorrect information from the selected nodes (corresponding to step (4)).

To analyze e_1, we define the following two events:

e_{1,1}: in the groups containing one or more target nodes, at least one cluster head obtains wrong data from the remaining nodes in its group at step (1);
e_{1,2}: the nodes receive incorrect indices broadcast by the cluster heads at steps (2) and (4).

Essentially we would like to show that Pr(e_1) remains bounded during the recursion. However, e_{1,2} leads to error propagation in the recursion and renders the analysis intractable. Therefore, we delineate the effect of e_{1,2} and calculate Pr(e_1) in two steps via

Pr(e_1) < Pr(e_1 | I_{e_{1,2}} = 0) + Pr(e_{1,2}),    (13)

where I is the indicator function. First, it is assumed that all the nodes receive the indices broadcast by the cluster heads in step (2) correctly. The information from a group (of 4K nodes) is received incorrectly with probability at most^15 2/((4K)^{β−1} log(4K)) for some β > 1 and K = Ω(1) (resp. K = O(1)). Due to the fact that the K target nodes come from at most K groups,

Pr(e_{1,1}) ≤ 2K/((4K)^{β−1} log(4K)) < Q²

(resp. Pr(e_{1,1}) ≤ K Q^{β−1} < Q²) for sufficiently large β. For the probability Pr(e_1 | I_{e_{1,2}} = 0), it is easy to check that Pr(e_1 | I_{e_{1,2}} = 0) < Q when n = 4K. Assume that for n′/4 nodes Pr(e_1 | I_{e_{1,2}} = 0) < Q. Executing the protocol on the n′/4 nodes three times results in an error probability of at most 3Q². By induction, for n′ nodes Pr(e_1 | I_{e_{1,2}} = 0) ≤ Q² + 3Q² < Q for small Q. Therefore Pr(e_1 | I_{e_{1,2}} = 0) remains bounded during the recursion.

Next we examine the probability of e_{1,2}. For cluster i, there are at most O(log n_i) recursions in total, since the number of nodes in each recursion is reduced exponentially. In each recursion, the probability that not all the nodes receive the broadcast indices correctly is n_i/e^{2n_i}, according to Corollary 1 and the union bound. Therefore, nodes in cluster i may receive wrong indices during the protocol's execution with error probability at most n_i log n_i / e^{2n_i}. Since there are Θ(n/log n) clusters,

Pr(e_{1,2}) = (n/log n) · (n_i log n_i / e^{2n_i}) = (log log n)/n.

Thus, according to Eq. (13),

Pr(e_1) < Q + (log log n)/n.

Finally, let us check the probability of e_2, i.e., the probability that the cluster heads receive incorrect information from the selected nodes. Θ(K n/log n) nodes in total are selected after the execution of the intra-cluster protocol, and each selected node transmits its data to its corresponding cluster head correctly with probability at least 1 − 1/n², according to Corollary 1. By the union bound, the error probability of e_2 is at most K/(n log n). Therefore, the total error probability for the intra-cluster protocol is Q + (log log n)/n + K/(n log n).

15. According to Prop. 1, the cluster head receives the data from each subgroup incorrectly with probability at most 2/(4K)^β. Since there are ⌈4K/log(4K)⌉ subgroups, by the union bound, the error probability is at most 2/((4K)^{β−1} log(4K)).