Distributed sensor failure detection in sensor networks

SCHOOL OF ENGINEERING - STI SIGNAL PROCESSING LABORATORY LTS4 Tamara Tošić, Nikolaos Thomos and Pascal Frossard CH-1015 LAUSANNE

arXiv:1109.5636v1 [cs.NI] 26 Sep 2011

Telephone: +4121 6932708 Telefax: +4121 6937600 e-mail: [email protected]

DISTRIBUTED SENSOR FAILURE DETECTION IN SENSOR NETWORKS

Tamara Tošić, Nikolaos Thomos and Pascal Frossard, École Polytechnique Fédérale de Lausanne (EPFL)

Signal Processing Laboratory LTS4 Technical Report

September 6th, 2011

Distributed sensor failure detection in sensor networks

Tamara Tošić, Nikolaos Thomos and Pascal Frossard
École Polytechnique Fédérale de Lausanne (EPFL), Signal Processing Laboratory (LTS4), Lausanne, 1015 Switzerland
E-mail: {tamara.tosic,nikolaos.thomos,pascal.frossard}@epfl.ch

Abstract

We investigate distributed detection of sensor failures in networks with a small number of defective sensors. We assume that the sensors measure a smooth physical phenomenon and that the measurements of defective sensors differ significantly from those of their neighbors. We consider that the defective sensors are well represented by binary sparse signals. Sensor messages are also binary and are propagated through the network using a pull-based protocol. We build on the sparse nature of the binary sensor failure signals and propose a new detection algorithm based on Group Testing (GT). The distributed GT algorithm estimates the set of defective sensors from a small number of linearly independent binary messages with a simple distance decoder. Furthermore, we theoretically determine a lower bound on the minimal number of linearly independent messages needed for detection guarantees in the case of a single defective sensor. We show through experimentation that the number of messages required for successful detection is in practice much smaller for small and medium-sized networks. We extend our framework to the detection of multiple failures by appropriately modifying the message exchange protocol and the decoding procedure. The employed decoder is of low complexity and is robust to noisy messages. The overall method is furthermore resilient to network dynamics because of our gossip-based message dissemination protocol. We provide results for both regular and irregular network topologies. Given a network setup, we provide parameter selection rules that improve the detection accuracy.
Simulations demonstrate that, in terms of detection performance, the proposed method outperforms methods based on random walk measurement collection. Our method performs detection in fewer system rounds, but it requires a larger communication overhead than random-walk-based algorithms that collect network measurements.

I. INTRODUCTION

Over the past years we have witnessed the emergence of simple and low-cost sensors. This has led to a wide deployment of sensor networks for monitoring signals in numerous applications, such as medical monitoring, broadcast conflict resolution for multiple access channels, cryptography, and natural hazard detection. However, sensor networks often have a dynamic architecture with loose coordination, due to the cost of communication. This raises new demands for collaborative data processing under network topology and communication protocol constraints. In general, a sensor network is represented as a connected graph G = (V, E), where the vertices V = {s_i}_{i=1}^{S} stand for the S sensors and the edges E determine the network connectivity. For instance, if two sensors s_i and s_j lie within each other's communication range, the edge e_{i,j} ∈ E has a nonzero value. Fig. 1 illustrates a setup where sensors capture a smooth physical phenomenon (e.g., temperature). Sensor readings are typically represented with messages that are further gathered for analysis. When a sensor is defective, its measurements are inaccurate. It thus becomes important to detect the defective sensors in the network, so that their erroneous values do not impact the accuracy of the underlying data processing applications.
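The graph model above can be sketched in a few lines; the sensor positions, the radius r, and the Euclidean threshold rule are illustrative assumptions, not values from the paper.

```python
# Connectivity graph G = (V, E): sensors are vertices, and an edge e_ij
# exists when sensors s_i and s_j lie within each other's communication
# range r (modeled here as a Euclidean distance threshold; an assumption).
import math

def build_graph(positions, r):
    """Return one adjacency set per sensor index."""
    S = len(positions)
    adj = {i: set() for i in range(S)}
    for i in range(S):
        for j in range(i + 1, S):
            if math.dist(positions[i], positions[j]) <= r:
                adj[i].add(j)
                adj[j].add(i)
    return adj

# Three nearby sensors plus one out of range of all the others.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
adj = build_graph(positions, r=1.5)
# adj[3] is empty: the last sensor is disconnected
```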

Fig. 1. Ad-hoc sensor network measuring a smooth physical phenomenon.

In this paper, we propose a fully distributed sensor failure detection method that employs a simple distance decoder. We assume that out of the S sensors in the network at most K are defective, where K ≪ S, and the set of defective sensors is denoted


as K. Therefore, the defective sensor identification problem boils down to sparse binary signal recovery, where the nonzero signal values correspond to defective sensors. Our approach is based on Group Testing (GT) methods, which are commonly applied in centralized systems. The core idea of GT is to save resources by performing low-cost experiments (tests) on pools of sensors instead of testing each sensor separately. In this work, we call the subset of network sensors that perform the tests on sensor measurements master sensors. They gather real-valued sensor measurements from their neighbors; each sensor responds to a single master sensor with probability q. Due to the smoothness assumption, non-erroneous neighboring sensors have similar measurements. Each master sensor then compares the sensor measurements based on a similarity measure to decide on the presence of defective sensors in its neighborhood. Test outputs take binary values. Noise flips the nonzero test bits with probability 1 − p, where p is the activation probability. Tests and their outputs together form the network messages, which are communicated between neighboring sensors with a gossip algorithm (rumor mongering) [1] that follows a pull protocol [2], [3]. When a new message reaches a sensor, its value is linearly combined with the message available at that sensor, and the resulting message is transmitted in the next round with a certain probability. Due to the particular probabilistic test design and message dissemination, the employed simple distance decoder (e.g., a Hamming decoder) at any sensor detects the set of up to K defective sensors given a certain number of messages. We also analyze the decoder failure bounds and analytically derive the conditions needed for successful recovery in the case of a single defective sensor. Further, we adapt the message dissemination protocol and provide error bounds for the detection of sparse binary signals with multiple defective sensors.
We show that the number of linearly independent measurements required for decoding is in practice smaller than the theoretically computed one. In addition, we discuss the selection of algorithm parameter values that maximize the detection probability for a given network. We provide results on the detection probability for regular and irregular networks. The experiments outline the advantages of the proposed detection method, in terms of detection accuracy, over binary signal detection algorithms based on random walk measurement gathering. It is worth noting that our method enables fast detection, but requires higher overhead than the methods under comparison.

This paper is organized as follows. Section II reviews related work on detection and data gathering. Section III introduces the centralized Group Testing framework. Section IV proposes a novel distributed detection method: it describes the message formation and dissemination processes in sensor networks, formally defines the detection problem and provides decoder specifications for the detection of single and multiple defective sensors. Section V analyzes the message dissemination properties for sensor networks, while Section VI presents the experimental results. Finally, Section VII concludes the work.

II. RELATED WORK

In this section we briefly overview related work on signal detection methods and classical data gathering algorithms. The detection literature can be classified into centralized and distributed detection methods. Most works on detection methods for binary sparse signals deal with centralized systems. The pioneering work in [4] proposes the simple idea of pooling blood samples and observing the viral presence in a set, instead of performing a test on every single blood sample separately. Typically, the main target is to minimize the number of tests required to identify all the defective items, while keeping the decoding procedure as simple as possible.
This paradigm, known as Group Testing (GT), dates back more than half a century. Recent advances in malicious event detection in sensor networks are presented in [5] and the references therein. Further, a survey of GT decoding algorithms is presented in [6]; it classifies decoding approaches into cases with errors, inhibitors, and their combinations. These algorithms are rather naive and are based on the idea of removing identified non-defective sensors from the total set of sensors until the defective sensors are detected. Their decoding time is O(SB), where B is the number of performed tests and S is the total number of sensors. Certain test designs improve the effective decoding time in centralized systems. For example, a useful design property called K-disjunctness (the Boolean sum of every K columns does not result in any other column of the matrix) speeds up the decoding process; a particular code construction, namely superimposed codes, relies on this property [7], [8]. Further, [9] proposes an efficient random construction of K-disjunct matrices with B = O(K^2 log S) tests in total and decoding time poly(B) = B log^2 B + O(B^2), which is the first to match an efficient decoding time1 while the required number of tests corresponds to the best known bound O(K^2 log S). In sensor networks, test designs are contingent on the communication limitations. Previous works that consider the constraints imposed by the sensor network topology in the GT framework are few [10], [11]. The authors in [10] propose to form tests on well-connected graphs by a random walk process; the minimal number of tests required for detection in this case depends on the random walk mixing time. Communication-restricted GT algorithms can be roughly separated into non-adaptive and adaptive ones: if the test design is selected prior to testing, it is called non-adaptive, otherwise it is adaptive.
The work in [11] considers a bipartite graph structure and proposes a hybrid detection system, formed as a combination of non-adaptive and adaptive test designs; the required number of measurements is then smaller than that of fully adaptive systems. Efficient data retrieval for adaptive GT is studied in [12], where a binary tree splitting algorithm is proposed for this purpose. For large-scale sensor networks or networks with dynamic topology, the centralized methods described above are no longer feasible, and one rather needs distributed detection methods. Nevertheless, to the best of our knowledge, none of these works considers distributed detection of sparse binary signals.

1 poly stands for polynomial.


Due to the arbitrary network topology, the number of system rounds required to collect the number of messages necessary for accurate decoding varies. To estimate the number of messages collected over the network rounds, we follow a methodology similar to [13]. Message propagation in an arbitrary connected network depends on the graph conductance parameter [14]. However, these results are not directly applicable in our case, since the required assumption, namely that the number of messages in the system is not greater than the cardinality of the signal field space, does not hold anymore. Recent works on data gathering algorithms mainly examine fully connected or regular graphs with high connectivity. Commonly, the considered scenario is one where every sensor requests local observations from a randomly selected set of neighboring sensors. It is shown that the standard rumor forwarding algorithm [2] for fully connected networks with S sensors requires Θ(S ln S) transmissions and Θ(log S) rounds to guarantee message transmission from one sensor to another in the network. A coding-based gossip algorithm for message dissemination is proposed in [13], where it is proven that the transmission of S messages to all the sensors lasts, with high probability, O(S) transmission rounds; this improves the results reported in [3]. For completeness, we also glance at classical distributed detection methods. These methods are employed for non-sparse signal detection with explicit network and message constraints, and they typically employ statistical decoders; for an overview of such methods, interested readers are referred to [15]. In [16], a Bayesian approach assigns to predefined sets of hypotheses a score that depends on the received set of network messages, and the hypothesis with the highest score drives the decision. The binary event detection problem for hierarchically clustered networks fuses the cluster decisions and makes a final hard decision [17].
Surveys of similar methods can be found in [18], [19]. When the destination sensor is known beforehand, specific optimal dissemination methods are tractable, as in [20], [21]. These approaches are not applicable in our framework, since no prior knowledge of the message codebook nor of the signal statistics is available.

III. CENTRALIZED DETECTION WITH GROUP TESTING

Hereafter, we adopt the following notation: matrices and vectors are represented with boldface capital and lowercase letters (M, m), and their elements with (M_{i,j}, m_i), respectively. Calligraphic letters denote sets (e.g., G), while |·| denotes the number of elements in a set. Furthermore, the i-th matrix column and the i-th row are represented with M_i and M^i, respectively.

A. Centralized Probabilistic Group Testing

We first review the centralized detection of failures with methods based on Group Testing (GT). Detection is the procedure of identifying a subset of elements with properties distinct from those of the rest. Detection methods are categorized into deterministic and probabilistic algorithms. In centralized deterministic GT methods, each sensor is pre-assigned to a test; given a large number of elements, this approach is not feasible anymore, since it is difficult to design the tests in a simple manner. To this aim, probabilistic GT has been proposed in [22]. Hereafter, we focus on probabilistic non-adaptive test design methods for GT. GT aims to detect the set of nonzero values of f, given the test matrix W and the test outcomes g. Let the nonzero entries of an S-dimensional binary vector f ∈ F_2^S indicate the defective sensors that we want to detect, where F_2 is the finite field of size two and f is a K-sparse signal, K ≪ S. The non-adaptive tests performed on sensor measurements are represented with a B × S binary matrix W. The nonzero entries of the i-th row W^i ∈ F_2^S indicate the sensors that participate in the i-th test. The binary test results are denoted with the test outcome vector g ∈ F_2^B, B < S.
Finally, the test outcomes are calculated as follows:

g = W ⊗ f. (1)

The boolean matrix multiplication operator ⊗ stands for the combination of element-wise "logical OR" addition and "logical AND" multiplication operations. The design of the matrix W is crucial for reducing the number of tests required for error-free detection of defective sensors. This design resembles the design of the generator matrices of LDPC codes [23]. In the Tanner graph representation of LDPC codes, the encoded symbols are partitioned into check and variable nodes, where the check nodes are used to detect errors introduced during the transmission of LDPC-encoded packets. Motivated by this similarity, in [22] W is constructed as:

W_{i,j} = { 1, with probability q,
          { 0, otherwise,                 (2)

where W_{i,j} is an entry of W. The test matrix design assures that, with high probability, no test matrix column is a subset of any union of up to K other columns (disjunctness property, Def. 3.1). In other words, a matrix W is called K-disjunct if no column W_i lies in the subspace formed by any set of K columns W_j with j ≠ i. This property enables fast decoding with the distance decoder (which counts the Hamming distance), since g and the columns W_i for i ∈ K, with K the set of defective sensors, differ in few positions. The distance decoder exploits the knowledge of the test outcome vector g and of the seed used for generating the random test matrix. Differently from GT methods, the decoding of LDPC codes is usually performed by iterative belief propagation.
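The centralized pipeline of Eqs. (1)-(2) and the distance decoder described above can be sketched as follows; the parameter values (q, B, S) and the defective set are illustrative assumptions.

```python
# Centralized probabilistic GT: a Bernoulli(q) test matrix W (Eq. (2)),
# boolean outcomes g = W (x) f, i.e. OR of ANDs (Eq. (1)), and a distance
# decoder that declares column i defective when |supp(W_i) \ supp(g)| <= eps.
import random

def make_tests(B, S, q, rng):
    return [[1 if rng.random() < q else 0 for _ in range(S)] for _ in range(B)]

def outcomes(W, f):
    # g_b = OR_j (W_bj AND f_j)
    return [1 if any(w and x for w, x in zip(row, f)) else 0 for row in W]

def distance_decode(W, g, eps=0):
    found = []
    for i in range(len(W[0])):
        # positions where sensor i was tested but the outcome stayed 0
        mismatch = sum(1 for row, gb in zip(W, g) if row[i] == 1 and gb == 0)
        if mismatch <= eps:
            found.append(i)
    return found

rng = random.Random(0)
S = 20
f = [0] * S
f[7] = 1                              # sensor 7 is defective (assumption)
W = make_tests(B=30, S=S, q=0.3, rng=rng)
g = outcomes(W, f)
# in this noiseless case the decoder output contains sensor 7
```

With noise, eps is raised above zero, which is exactly the role of the disjunctness parameter discussed next.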


B. Detection probability

For the sake of completeness, we provide some definitions that form the starting point of our analysis and help to understand the proposed decoder design; for more details, refer to [22]. We first define the disjunctness property of test matrices, which results in a reduced computational complexity of detection. This property assures that the union of any set of at most K columns of the test matrix differs in more than ε positions from any other column of that matrix. More formally:

Definition 3.1 (Disjunct matrix): A boolean matrix W with S columns W_1, W_2, ..., W_S is called (K, ε)-disjunct if, for every subset T of its columns with |T| ≤ K,

| supp(W_i) \ ( ∪_{j ∈ T\{i}} supp(W_j) ) | > ε,   ∀i ∈ {1, ..., S}, (3)

where supp(W_i) denotes the set of nonzero elements (support) of the column W_i and \ is the set difference operator. The decoder counts the number of positions in which the column W_i differs from the union of the distinct columns in T. The following proposition connects the structure of disjunct matrices with test matrices that assure detection in a noisy environment.

Proposition 3.2: Let W be a (K, ε)-disjunct matrix. Then, taking W as the test matrix resolves the detection problem of a K-sparse vector f with error parameter ε.

We use the disjunct matrix parameter ε as the distance decoder threshold for detection. The decoder accumulates the number of entries in which a column of the (K, ε)-disjunct test matrix differs from the outcome vector g; the columns of the test matrix that achieve the lowest Hamming distance correspond to defective sensors. Recall that the nonzero entries of the test matrix indicate the sensors' participation in a test.

Proposition 3.3 (Distance decoder): For any column W_i of the test matrix W that is (K, ε)-disjunct, the decoder verifies whether:

| supp(W_i) \ supp(g) | ≤ ε, (4)

where g = W ⊗ f is the vector of test outcomes. The corresponding entries of the vector f are inferred as nonzero iff the inequality holds.

The following proposition provides the number of measurements required in centralized detection for successful decoding with the distance decoder.

Proposition 3.4: Let the test matrix be (K, ε)-disjunct. The distance decoder successfully detects the correct support of a K-sparse vector f in a noisy environment, with overwhelming probability, when the number of tests is B = O(K log(S)/p^3).

IV. DISTRIBUTED DETECTION METHOD

In this section, we propose a novel distributed detection method and analyze its detection error probability. Specifically, for a single defective sensor, we provide a lower bound on the number of linearly independent messages required for accurate detection with high probability using a distance decoder. We then extend the proposed detection method to the case of multiple defective sensors by modifying the gossip protocol, and we provide an analytic bound on the error caused by this modification.

A. Sensor network message design and dissemination

We propose a novel message design that enables distributed and efficient detection of defective sensors. The sensors create messages with a two-stage procedure, and the messages are generated and communicated in synchronized rounds. Each round t ∈ N consists of two phases, t_I and t_II, as shown in Fig. 2. During the first phase t_I, the sensors form messages that estimate the presence of defective sensors in their neighborhood. In the second phase t_II, the sensors exchange and combine the messages generated in the previous phase, employing a gossip mechanism. These phases are described below in more detail.

The first phase t_I represents the message construction process. It starts with a selection of L master sensors, which cluster the sensors V into disjoint subsets V_l ⊂ V, l = 1, ..., L; this selection can be random or deterministic. Each sensor in a cluster randomly chooses its test participation indicator as in Eq. (2). The master sensors locally gather the real-valued readings of the sensors that participate in the test. Due to the smoothness of the signal, the measurements of neighboring sensors do not vary significantly when the sensors are not defective. Each master sensor estimates the presence of defective sensors within its neighborhood and assigns a binary value f(s_i) ∈ f to each sensor in the neighborhood, where f(s_i) = 1 denotes that the sensor s_i is defective. Noise in the tests is modeled with the activation probability p.
The identifiers in W^l that participate in the test are flipped to zero with probability 1 − p. The process of the test outcome computation is summarized as:

g_l(t_I) = W^l ⊗ f = { 1, if a tested sensor belongs to K,
                     { 0, otherwise,                           (5)
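One master sensor's test in phase t_I under this noise model can be sketched as follows; S, q, p, the neighborhood and the defective set below are illustrative assumptions.

```python
# Phase t_I at a master sensor: neighbors join the test with probability q
# (Eq. (2)); noise then keeps each participating identifier active only
# with probability p (it is flipped to zero with probability 1 - p), and
# the binary outcome reports whether a still-active tested sensor is in K.
import random

def master_test(S, neighborhood, defective, q, p, rng):
    ids = [1 if (i in neighborhood and rng.random() < q) else 0
           for i in range(S)]
    ids = [b if (b and rng.random() < p) else 0 for b in ids]  # noise flips
    outcome = 1 if any(ids[i] for i in defective) else 0
    return outcome, ids

rng = random.Random(1)
g_l, w_l = master_test(S=5, neighborhood={0, 1, 2, 3}, defective={2},
                       q=0.8, p=0.9, rng=rng)
```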



(a) Phase tI : Message design.

(b) Phase tII : Message dissemination.

Fig. 2. Illustration of the message design and dissemination through the sensor network. (a) Message formation based on local sensor measurements. The full and dashed arrows, marked (1) and (2) respectively, denote the sequence of transmissions and their directions within the cluster during the phase t_I. (1): The master sensor collects the sensor measurements from its vicinity set {s_1, ..., s_4}, compares their values and forms the message (g(t), W^l(t)), which represents a test outcome and the test participation indicators, respectively. (2): The message is propagated from the master sensor to its neighboring sensors. (b) Message dissemination based on a gossip algorithm with pull protocol, where each sensor requests the previous-round message from a neighbor chosen uniformly at random.

where the binary matrix operator ⊗ is composed of ⊙ and ⊕, which stand for the bitwise OR and the bitwise addition operators, respectively. The message (g_l(t_I), W^l(t_I)) formed by a master sensor contains the test outcome g_l and the test participation identifiers W^l of phase t_I. It is then sent to the neighboring sensors, which concludes the phase t_I, illustrated in Fig. 2(a). Note that the master sensors that participate in phase t_I are chosen either deterministically or uniformly at random; in the latter case, with nonzero probability, some sensors do not belong to any master sensor's neighborhood. When multiple master sensors request a message from a sensor, the sensor responds to only one randomly selected master sensor. During the phase t_II, the messages created in t_I are disseminated within the network, as illustrated in Fig. 2(b). The phase starts when every sensor i ∈ {1, ..., S} requests the message formed in the previous round (t − 1) from a neighbor j, chosen uniformly at random, following the gossip mechanism with pull protocol. Each sensor j then responds to the message request that it has received from the sensor i, and the sensor i combines the messages as follows:

g_i(t) = g_i(t_I) ⊕ g_j(t − 1),    W^i(t) = W^i(t_I) ⊕ W^j(t − 1), (6)

where g_j(t − 1) denotes the outcome value of the neighbor j in the previous round (t − 1) and the vector W^i(t) represents the resulting indicator vector at the sensor i in the round t. Since the messages are created probabilistically, the combination of messages across the rounds assures that an innovative message reaches the sensors at every round with high probability. A toy example of the dissemination phase is illustrated in Fig. 3, where the sensor s_2 at round t = τ pulls the message from the sensor s_1 and constructs a new message, as in Fig. 2(b).
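The combination rule of Eq. (6) can be sketched as follows; the bit-list message format and the toy topology are assumptions made for illustration only.

```python
# One pull step of the gossip phase t_II: sensor i requests neighbor j's
# round-(t-1) message and XOR-combines it with its own phase-t_I message,
# as in Eq. (6): g_i(t) = g_i(t_I) (+) g_j(t-1), W^i(t) = W^i(t_I) (+) W^j(t-1).
import random

def pull_round(own_g, own_W, prev_g, prev_W, neighbors, i, rng):
    j = rng.choice(sorted(neighbors[i]))      # neighbor chosen uniformly
    g_new = own_g[i] ^ prev_g[j]
    W_new = [a ^ b for a, b in zip(own_W[i], prev_W[j])]
    return g_new, W_new

# toy network: sensor 1 can only pull from sensor 0
neighbors = {1: {0}}
own_g, own_W = {1: 0}, {1: [1, 1, 0, 0]}
prev_g, prev_W = {0: 1}, {0: [1, 0, 1, 0]}
g_new, W_new = pull_round(own_g, own_W, prev_g, prev_W, neighbors, 1,
                          random.Random(0))
# g_new == 1, W_new == [0, 1, 1, 0]
```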

(g_1(t−1), W^1(t−1)) = (1, 1010)    (g_2(t_I), W^2(t_I)) = (0, 1100)    →    (g_2(t), W^2(t)) = (1, 1110)

Fig. 3. The message formation at sensor s_2 in round τ. We assume that the sensor s_2 pulls the sensor s_1 to send its previous-round values (round t − 1) and that the sensor s_3 is defective, f = [0010...]. The outcome value and the test identifier vector are formed by bitwise XOR.

In matrix form, the process of message formation and transmission over B rounds is represented as:

g = W ⊗ f, (7)

where the sensor identifier matrix W = [W^1; ...; W^B] is of size B × S. The former equation resembles the outcome computation of the centralized GT case. However, in distributed GT the elements of the probabilistic matrix W(t_I) are chosen as in Eq. (2), and W represents the boolean addition of rows of different realizations of W(t_I); the values in W thus depend on the random message propagation path. Note that, for an arbitrary network, the number of network rounds required for collecting at least B linearly independent tests varies and depends on both the network topology and the test participation probability q. The problem that we deal with in this paper is outlined formally as follows:

Problem: Detect a set of up to K ≪ S defective sensors in a sensor network with S sensors, given B < S linearly independent network tests W and test outcomes g, which are formed and propagated through the network in a distributed way.
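A toy end-to-end sketch of this problem statement follows; the network size, participation probability and defective set are assumptions, and a GF(2) rank computation stands in for the "linearly independent tests" condition.

```python
# Collect random test rows until B linearly independent ones are available
# (rank over GF(2)), then run the distance decoder on the stacked tests.
import random

def gf2_rank(rows):
    """Rank over GF(2); rows are 0/1 lists, reduced as integer bitmasks."""
    pivots = []
    for row in rows:
        x = int("".join(map(str, row)), 2) if row else 0
        for p in pivots:
            x = min(x, x ^ p)
        if x:
            pivots.append(x)
    return len(pivots)

rng = random.Random(3)
S, B, defective = 12, 8, {4}
tests, outs = [], []
while gf2_rank(tests) < B:
    row = [1 if rng.random() < 0.3 else 0 for _ in range(S)]
    tests.append(row)
    outs.append(1 if any(row[i] for i in defective) else 0)

# distance decoder with eps = 0: sensor i survives only if it never
# appears in a test that produced a zero outcome
found = [i for i in range(S)
         if all(not (row[i] and g == 0) for row, g in zip(tests, outs))]
# 'found' contains the defective sensor 4
```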


B. Detection of one defective sensor in the network

Below we provide the conditions necessary for the error-free detection of a single defective sensor. A defective sensor is detected with high probability by a distance decoder when the test matrix, created over the t_I phases of the rounds, is (K, ε)-disjunct and the number of available linearly independent tests collected by the sensors is proportional to O(K log(S)/p^3). The formal propositions are given below: the first gives the conditions for centralized detection with uniform gathering of the clusters' messages, while the second determines the number of per-cluster messages, collected distributedly, that assures error-free detection with high probability using the distance decoder.

Proposition 4.1: We assume that the sensor identification vectors formed over the t_I phases of the rounds build an L × S (K, ε)-disjunct matrix W(t_I) = [W^1(t_I); ...; W^L(t_I)], and that the messages are collected uniformly at random from all the clusters. If the rank of the messages available at the sensors satisfies rank(W(t_I)) ≥ O(K log(S)/p^3), the distance decoder defined in Eq. (4) identifies the defective sensor with high probability.

Proof: Due to our special message design, the matrix W(t_I) is (K, ε)-disjunct with high probability; this proof is given in Appendix A. In short, we bound the erroneous events following a Chernoff bound analysis and show that the probability of detection failure per cluster is small. These events correspond to the cases where the number of column flips in the probabilistic test matrix generation is higher than ε or where the disjunctness property is violated. Finally, we show that, if the number of collected messages is proportional to O(K log(S)/p^3), the disjunctness property holds for any fixed test matrix with high probability. In such cases, g(t_I) = W(t_I) ⊗ f is equivalent to the centralized GT setup given in Eq. (1).

During the second phase t_II, the available sensor messages are linearly combined in the binary field. The outcome value of g = W ⊗ f depends on the presence of a defective sensor s_k in the test. Next, we show that the distance between the vector g and the k-th column W_k of the matrix W does not increase beyond ε during t_II, while this is not true for all the other columns. We denote the distance operator:

dist(a, b) = { 1, if a ≠ b,
             { 0, otherwise.                 (8)

When the sensor j sends its message to the sensor i during round t, we have:

dist(g_i(t), W_{i,k}(t)) = dist(g_i(t_I) ⊕ g_j(t − 1), W_{i,k}(t_I) ⊕ W_{j,k}(t − 1))
                         = dist(g_i(t_I), W_{i,k}(t_I)) ⊕ dist(g_j(t − 1), W_{j,k}(t − 1)), (9)

where the first equality results from Eq. (6), and the second equality follows directly from the fact that g(t_I) and the column W_k(t_I) are identical in the error-free case and differ in at most ε positions in the noisy case. We further show that the linear combinations of messages in the network enable failure detection by the distance decoder. We assume that L deterministically chosen master sensors partition the sensor network into disjoint parts. Test realizations within a cluster form test vectors; over the rounds, these vectors create matrices whose entries are chosen as:

W_{i,j}(t_I) = { 1, with probability q = α_i/K,
               { 0, otherwise,                  (10)

where the α_i's are chosen such that α = Σ_{i=1}^{L} α_i. We denote the set of matrices W(t_I) created over the network rounds as W̃; W is a linear combination of rows of matrices in W̃, Eq. (6). The decoding error tends to zero asymptotically when at least O(K log(S)/p^3)/L measurements are obtained from every cluster, given that the test matrices created in t_I are (K, ε)-disjunct. The corresponding proposition is given below.

Proposition 4.2: If the above assumptions hold and if the number of linearly independent measurements received per cluster is at least B/L, where B ≥ O(K log(S)/p^3), the probability that the distance decoder fails to detect the defective sensors tends to zero as S → ∞.

Proof: The proof consists of two parts: the first part, given in Appendix (A1-A3), provides the bounds on the events that cause decoding failure for the matrices W̃, while the second part, which demonstrates that the distance decoder detects the defective sensor, is proven in Proposition 4.1.

C. Detection of multiple defective sensors in the network

We now analyze the probability of erroneous events and discuss decoder parameter tuning. We assume here that the number of defective sensors is much smaller than the total number of sensors, so that messages with nonzero test outcomes are rarely combined together.
However, we need to prevent the occurrence of such combinations, since they cause erroneous detection (Proposition 4.1 is no longer valid). We propose to modify the gossip algorithm by changing the message formation process in t_II, which we explain through a simple example. Let us assume that the sensor i pulls the information from the sensor j and that both messages have nonzero outcome values. Instead of combining the messages as in Eq. (6), we buffer the i-th sensor's message and consider the j-th message to be the resulting message of the round t:

g_i(t) = g_j(t − 1),    W^i(t) = W^j(t − 1). (11)


At the first subsequent round in which two messages with zero-valued test outcomes meet at sensor i, the buffered message replaces the current zero-outcome message of i. These messages are then combined as in Eq. (6). In the following, we analyze the error that might occur in the multiple defectives case and describe the adaptation of the distance decoder threshold value according to the probability of sensor failures. An erroneous message is generated in a cluster that contains more than one defective sensor when only a fraction of the defective sensors participates actively in the test. Due to the protocol modification, at most one cluster may generate an erroneous message per round. We here investigate the probability of such an event. In total, we assume there are m < L defective sensors and that the master sensors' clusters contain n = S/L sensors each. An erroneous event E is encountered in the following joint event: we randomly select one cluster out of the group of clusters that contain multiple defective sensors (this occurs with probability P_L), we select 1 < m ≤ K cluster sensors to be defective (with probability P_m) and finally the defective sensors participate only partially in the tests (with probability P_q). Under an independence assumption, the probability of such an event is the product P(E) = P_L · P_m · P_q. Next, we provide the analytical expressions for P_L, P_m, P_q and P(E). The m defective sensors are distributed over at most m1 < m clusters, since the occurrence of one defective sensor per cluster does not introduce errors. We select m1 clusters with defective sensors in C(L + m1 − 1, m1) different ways (combinations with repetitions). We have L clusters in the network which can be chosen uniformly, so the probability to select a cluster out of the group of clusters with multiple defective sensors is equal to P_L = L / C(L + m1 − 1, m1). Then, within a cluster with n elements, m elements are selected uniformly at random: P_m = 1/n · 1/(n − 1) · · · 1/(n − m + 1). Finally, every sensor independently decides whether it participates in a test (permutations with repetitions). There are in total 2^m − 2 such cases, since the "all zeros" and "all ones" events are excluded because they do not lead to decoding errors. This probability is given as a polynomial of order m − 1, P_q = poly(q^{m−1}), which is easily computed for given values of m and q. Combining these probabilities we obtain P(E) as:

P(E) = [L / C(L + m1 − 1, m1)] · [1/n · 1/(n − 1) · · · 1/(n − m + 1)] · poly(q^{m−1}).   (12)
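Under the independence assumption above, Eq. (12) can be evaluated numerically. The sketch below uses illustrative names; since the report specifies P_q only as a polynomial poly(q^{m−1}), the closed form used here (the probability that a strict, non-empty subset of the m defective sensors participates) is our assumption.

```python
from math import comb, prod

def prob_error(L, n, m, m1, q):
    """Sketch of Eq. (12): P(E) = P_L * P_m * P_q (independence assumed).

    L  -- number of clusters, n -- sensors per cluster,
    m  -- defective sensors, spread over m1 clusters,
    q  -- test participation probability.
    The closed form for P_q below is an assumption of this sketch.
    """
    # P_L: pick one cluster among the multisets of m1 clusters holding defectives
    P_L = L / comb(L + m1 - 1, m1)
    # P_m: m cluster members drawn uniformly without replacement
    P_m = prod(1.0 / (n - i) for i in range(m))
    # P_q: only a strict, non-empty subset of the m defective sensors participates
    P_q = 1.0 - q**m - (1.0 - q)**m
    return P_L * P_m * P_q
```

For example, with L = 5 clusters of n = 4 sensors, m = 2 defectives in m1 = 1 cluster and q = 0.5, the sketch gives P(E) = 1 · (1/12) · (1/2) = 1/24.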

The decoder performs as follows. The outcomes g = [g0 g1]^T are separated into two orthogonal sets, i.e., the negative and positive outcome vectors g0 and g1, respectively. Subsequently, the rows of the test matrix W form two sub-matrices W0 and W1, and Eq. (7) becomes:

[g0]   [W0  0 ] [f0]
[g1] = [ 0  W1] [f1].   (13)

Then, we eliminate non-defective sensors from W1 using the knowledge from W0. To this aim, we form the sets of unions of up to K columns out of the columns of matrix W1 with at least one non-zero value. Distance decoding is performed between g1 and the newly formed set. Basically, this approach boils down to the decoding of a single defective sensor per cluster, where the solution is unique². The distance decoder threshold value is adapted according to the probability of sensor failures: the parameter ǫ is increased to ǫ′ = ǫ + δǫ, where δǫ = P(E)·E(g1) and E(g1) is the expected number of positive test outcomes, set to the total number of observed positive test outcomes.

For the multiple defective sensors case, the transmission protocol ensures that the assumptions behind Proposition 4.2 are verified. We then use the above results for computing the lower bound on the number of necessary messages.

Proposition 4.3: If the number of linearly independent measurements received per cluster is at least B/L, where B ≥ O(K log(S)/p³) and the error parameter is ǫ′ = ǫ + δǫ, the distance decoder detects the defective sensors with high probability.

V. MESSAGE DISSEMINATION ANALYSIS

In this section we analyze the message dissemination in networks and demonstrate its sensitivity to the network parameter values through illustrative examples where the master sensors' selection is deterministic. Previous works in this area assume that the number of messages S is smaller than the message field space size Q, S ≤ Q. A dissemination analysis for fully connected or regular graphs is available in [13] and for irregular graphs in [14]. However, due to the probabilistic test formation process, this assumption does not hold in our case. In our work, the distance decoder requires a certain number of linearly independent messages for detection, defined in Section IV, Proposition 4.2. Over the rounds, some of the messages received at sensors are redundant, so they do not enhance detection. With the term rank we denote the number of linearly independent messages available at a sensor. We examine the two major dissemination stages, illustrated in Fig. 4:
• Dominant intra-cluster dissemination: an innovative message originates from sensors that belong to the same cluster;
• Dominant inter-cluster dissemination amongst neighboring clusters: an innovative message arrives from neighbor clusters.

We say that the intra-cluster dissemination is dominant if the rank is more likely to increase due to a message originating from a sensor located within the same cluster as the receiving sensor than due to a message received from a sensor that belongs to a different cluster. The opposite case is denoted as the dominant inter-cluster dissemination.


Fig. 4. Illustration of the message dissemination stages in an irregular network: (a) dominant intra-cluster dissemination; (b) dominant inter-cluster dissemination. Full and dashed arrows marked with (1) and (2) denote the dominant and the secondary dissemination event, respectively.

These two dissemination stages are followed by the dominant inter-cluster dissemination amongst distant clusters, which is out of the scope of this paper. We investigate the probability distribution that a received message increases the rank of a sensor over the rounds. This distribution depends strongly on the network topology and the chosen routing paths, so we analyze it over simple networks. In particular, we consider k-regular networks with n = S/L sensors per cluster, where the sensors in a cluster are connected with each other and with the neighboring cluster(s). Although the deterministic master sensor selection generates a set of messages whose cardinality is smaller than the one generated when the master selection is done randomly, we select it for the analysis, as the computations for the random case are not feasible. The total number of connections of a cluster of n sensors in a k-regular graph is o1 = n(k − 2) + (k − 1), out of which exactly o2 = 2(k − 2) are inter-cluster links. Thus, a sensor from one cluster communicates with a sensor that belongs to another cluster with probability:

p_cl = 2(k − 2) / (n(k − 2) + (k − 1)).   (14)

Since each sensor is engaged in a test with probability q, the number of sensors that participate in tests is not constant. For the worst case analysis, we fix the number of participating sensors to ⌊nq⌋, where ⌊·⌋ is the floor operator. Thus, each message contains n1 = ⌊nq⌋ symbols "1" (sensor participates in a test) and n − ⌊nq⌋ symbols "0". With T_rω we denote the event that a sensor i receives exactly rω = rank(W) linearly independent network messages. If the probability Pr(T_rω) > 0.5, the sensor i is considered to have rank rω.
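For concreteness, Eq. (14) and the worst-case number of "1" symbols per message can be computed as in the following sketch (function names are illustrative):

```python
from math import floor

def p_inter_cluster(n, k):
    """Eq. (14): probability that a sensor communicates with another cluster,
    for a k-regular network with n sensors per cluster.
    o1 = n(k-2) + (k-1) links in total, of which o2 = 2(k-2) are inter-cluster."""
    return 2 * (k - 2) / (n * (k - 2) + (k - 1))

def worst_case_ones(n, q):
    """Worst-case number of '1' symbols per message: n1 = floor(n*q)."""
    return floor(n * q)
```

For n = 4 and k = 6 this gives p_cl = 8/21, and with q = 0.5 each message carries n1 = 2 ones.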

A. Dominant intra-cluster dissemination

The cardinality of the set of messages that can be generated within a cluster is equal to c = C(n, n1), since n1 out of the n sensors in the cluster participate in a test. At each dissemination round, we examine the distribution Pr(T_rω) that represents the arrival of innovative messages. This probability evolves with the dissemination rounds t as given below:

t ∈ {1, 2}: Pr(T_t) = 1,
t = 3: Pr(T_3) = 1 − 1/c > 1/2,
t = 4: Pr(T_4) = 1 − 2/c > 1/2,
...
t = c/2 − 1: Pr(T_{c/2−1}) = 1 − (1/2 − 1/c) > 1/2.   (15)

This phase lasts exactly τ = c/2 − 1 rounds, as Pr(T_{c/2}) ≤ 1/2, which means that the probability that an innovative message is received from the same cluster decreases. Therefore, for t ∈ [3, c/2) the distribution Pr(T_rω) is approximated by a geometric distribution Geom(p_rω) with parameter p_rω: Pr(T_rω) = (1 − p_rω)^{rω−1} p_rω.

Here, we demonstrate that the duration of the dissemination phase τ is very sensitive to the test participation parameter q. We set q = 0.5 to maximize the message diversity. In this case, the overall number of possible messages becomes 2^Q, where the parameter Q represents the message field space size. Out of these messages, 2^Q − 1 are innovative, as the "all zero" message means that no sensor participates in a test. We get the following Pr(T_rω) over the rounds:

t = 1: Pr(T_1) = 1,
t = 2: Pr(T_2) = 1 − 1/2^n > 1/2,
t = 3: Pr(T_3) = 1 − 1/2^{n−1} > 1/2,
...
t = c*: Pr(T_{c*}) = 1 − 1/2^{n−c*+2} > 1/2.   (16)

This phase terminates at round τ* = c*, since the probability that the sensor i receives an innovative message at t = c* + 1 is smaller than 0.5, Pr(T_{c*+1}) < 0.5. Obviously, τ* ≠ τ, but T_rω, for rω ∈ (2, c*], again follows a geometric distribution.

² If more than one set solution exists, we encounter a decoding failure.
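The two phase durations can be checked numerically. The sketch below encodes our reading of the recursions in Eqs. (15) and (16), namely Pr(T_t) = 1 − (t − 2)/c for t ≥ 3 in the general case and Pr(T_t) = 1 − 2^{−(n−t+2)} for t ≥ 2 when q = 0.5; the function names are illustrative.

```python
def intra_cluster_phase_length(c):
    """Rounds of the dominant intra-cluster phase under our reading of
    Eq. (15): the phase runs over t >= 3 while Pr(T_t) = 1 - (t-2)/c > 1/2."""
    tau, t = 0, 3
    while 1 - (t - 2) / c > 0.5:
        tau += 1
        t += 1
    return tau

def max_diversity_phase_length(n):
    """For q = 0.5, our reading of Eq. (16): Pr(T_t) = 1 - 2**-(n-t+2) for
    t >= 2; the phase ends at the last round where this still exceeds 1/2."""
    t = 2
    while 1 - 2.0 ** -(n - t + 2) > 0.5:
        t += 1
    return t - 1  # tau* = c*, the last round with Pr(T_t) > 1/2
```

For c = 12 the first sketch gives 5 rounds, in agreement with τ = c/2 − 1 from the text.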

B. Dominant inter-cluster dissemination

The intra-cluster communication analyzed in the previous dissemination phase is necessary for the probability distribution analysis of the inter-cluster dissemination stage: the rank rω at the sensor i does not increase if a received message is linearly dependent on the messages already available at that sensor. In this dissemination phase, two different types of innovative messages can be received by the sensor i: (a) innovative messages created in the same cluster to which the sensor i belongs, and (b) innovative messages created in the neighboring clusters. For the first type of messages, in total c/2 + 1 or c − c* messages are innovative in the two examples of Section V-A. For the second message type, the number of innovative messages depends on the number of messages received in the previous, intra-cluster dissemination phase. Here we compute the number of inter-cluster messages received during the previous dissemination phase and use it to model the probability distribution of the innovative messages received in the current phase. We assume that there are L − 1 clusters with direct links to the cluster of the sensor i. The inter-cluster communication probability p_cl is defined as in Eq. (14), and p_cl(L − 1) stands for the probability of receiving a message from a neighbor cluster. Then, over the t rounds of the previous dissemination stage, the probability that an innovative message arrived from the neighboring clusters is equal to the product of p_cl(L − 1) and Eq. (15). The number of messages received from the neighbor clusters is evaluated by counting the number of events for which p_cl(L − 1)Pr(T_t) > 0.5 during the intra-cluster dominance phase. This value determines the messages known from other clusters at the end of the intra-cluster dominance stage and is denoted by n2. Next, we compute the number of available innovative messages at the beginning of the inter-cluster dissemination stage for the two examples.
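The counting rule for n2 described above can be sketched as follows, again under our reading of Eq. (15) (Pr(T_t) = 1 − (t − 2)/c). Note that the product p_cl(L − 1)Pr(T_t) is used exactly as in the text, without clipping to [0, 1]; names are illustrative.

```python
def count_n2(L, n, k, c):
    """Sketch: number of inter-cluster messages n2 known at the end of the
    intra-cluster dominance stage, counted as the rounds t >= 3 for which
    p_cl * (L - 1) * Pr(T_t) > 0.5 (our reading of the counting rule)."""
    p_cl = 2 * (k - 2) / (n * (k - 2) + (k - 1))   # Eq. (14)
    n2, t = 0, 3
    while 1 - (t - 2) / c > 0.5:                    # intra-cluster dominance phase
        if p_cl * (L - 1) * (1 - (t - 2) / c) > 0.5:
            n2 += 1
        t += 1
    return n2
```

For L = 2 neighboring clusters the product never exceeds 0.5 in this example, so n2 = 0, which matches the worst case (L = 2) discussed next.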
The overall number of innovative messages at the sensor i is L(c/2 − 1) − n2. We obtain it as (c/2 − 1) + (L − 1)(c/2 − 1) − n2, where the first term gives the number of innovative messages from the cluster that the observed sensor belongs to, the second term stands for the minimal number of innovative messages from the neighboring clusters, and the third term represents the messages that are not innovative for the receiving sensor. The worst case scenario (the smallest number of available innovative messages) occurs for L = 2 and n2 = c/2. Obviously, the number of rounds that pass between the reception of two consecutive innovative messages increases as the dissemination progresses. This process can be modeled by a Binomial distribution Bino(p_rω):

Pr(T_rω) = C(r, rω) (p_rω)^{rω} (1 − p_rω)^{r−rω},   (17)

which represents the probability that the sensor i increases its rank from rω − 1 to rω within r subsequent rounds. In a similar way, for the example with q = 0.5, we estimate the number of messages n2 received from the neighbor clusters during the previous dissemination phase by counting the number of rounds for which p_cl(L − 1)Pr(T_t) > 0.5, with Pr(T_t) given by Eq. (16). Apparently, the number of rounds needed to receive B linearly independent messages depends on the topology and the protocol.

VI. PERFORMANCE EVALUATION
In this section, we investigate the performance of the proposed defective sensor detection method, hereafter denoted as GP, for various scenarios. Specifically, we first examine the influence of the different network parameters on the speed of the innovative message dissemination. Next, we examine the decoding probability for both single and multiple defective sensor(s) detection. The simulations are performed for fully connected, k-connected and irregular graphs. Further, we discuss the number of linearly independent measurements needed for successful detection and compare it with the theoretical one. The latter is provided only for centralized networks, which correspond to fully connected graphs with a single cluster, as the computations for more complex cases are not feasible. The proposed method is compared in terms of detection probability and communication overhead with: (a) a Random Walk method that employs a Gossip mechanism with pull protocol (RWGP) and (b) a Random Walk (RW) detection. A random walk determines the path of successive random dissemination message exchanges between neighbor sensors. In the RWGP method, the random walk is initiated at L sensors (equivalent to the master sensors in the GP method) and terminates after a pre-determined number of rounds. In RWGP, sensors create messages from the sensor measurements collected along the random walk path. These messages are transmitted with the gossip algorithm that uses the pull protocol. Note that, for an identical choice of the sensors over the rounds, RWGP and GP are identical. The RW method initiates the collection of raw (uncompressed) measurements at L random sensors and completes it in a given number of rounds. Every sensor that lies along the random walk path stores the values of all previously "visited" sensors. Regarding the average rank of the collected messages, the proposed GP scheme is also compared with a Store-and-Forward (SF) and a Greedy Store-and-Forward (GSF) method that employ the pull protocol. Both algorithms disseminate raw sensor measurements. In the SF method, upon receiving a message request, randomly chosen messages from the set available at the sender sensor are transmitted to the "caller" sensor. In GSF, the "caller" sensor randomly requests the "called" sensor, but it requests innovative measurements in a greedy manner. This procedure involves additional message exchanges among sensors in every round. In addition, we compare all the above mentioned methods with respect to the communication overhead.

For the construction of irregular sensor networks, we place sensors randomly in a unit square area. Sensors that lie within a certain radius are considered linked and can directly exchange messages. For drawing accurate conclusions, we build 10 different network realizations and for each realization we perform 100 independent simulations. The experiments are conducted in Matlab.

A. Selection of parameters for the proposed method

First, we study the influence of the networks' capability to generate innovative messages on the decoder performance.
We consider two different methods for selecting the master sensors: random master sensor selection (RM) and deterministic master sensor selection (DM). Fig. 5 illustrates the detection probability and the achieved average rank with respect to the number of message dissemination rounds, for fully connected graphs with S = 20 sensors and one (K = 1) defective sensor. We observe that the performance depends on both the L and α = qK parameters, for both RM and DM. These values should be selected properly to maximize the message diversity in the network. Specifically, we observe in Fig. 5 that RM achieves the maximum message diversity for α = 1 (the maximum value), since the diversity of messages is in this case maximized by construction. We also note that the number of clusters does not affect significantly the detection performance of RM. On the contrary, for DM both parameters are important; in particular, small values of α guarantee high message diversity, as the DM method requires more rounds to receive a sufficient number of messages for detection. In the following, we focus on the RM selection where possible (that is, for K = 1), as it provides a higher probability of creating innovative messages.

B. Discussion of results

The detection probability and the average rank evolution over the rounds are examined for fully connected (FG) and k-connected regular networks (RG) with sensor degree k ∈ {6, 16}. In all cases the network consists of S = 20 sensors and one defective sensor. From Fig. 6 it becomes clear that networks with a higher number of connections achieve faster dissemination of innovative messages. We also note that a high connectivity value k is beneficial, but it cannot drive by itself the performance of our detection scheme; it should be combined with an appropriate choice of the network parameters, as discussed earlier. For example, the RM master sensor selection for k = 16 achieves better detection performance than that in fully connected graphs. In Fig. 7, we illustrate the average results over ten different random graph realizations (100 simulations per graph) with S = 20 sensors, K = 1 defective sensor, L = 5 random clusters and minimum sensor degree k ≥ 3. Random graphs require on average more rounds for successful detection, as expected. We can also observe that the detection performance decreases, due to the limited diversity of the exchanged messages (smaller probability of receiving innovative messages) and the low number of connections among sensors. Similarly, Fig. 8 presents results for larger networks, which are in accordance with the above. In Figs. 9 and 10 we present results for a larger number of defective sensors (K = 2) in networks with 20 sensors. The results are given in terms of the average detection probability over the dissemination rounds, for both fully and irregularly connected graphs. The master sensors are selected deterministically (DM), due to the decoder design for the identification of multiple defective sensors. Note that this example violates the condition K ≪ S and the performance drops significantly. Similarly, results for S = 70 and K = 2 are depicted in Figs. 11 and 12. We focus on the evolution of the decoding probability and the average number of messages collected over the rounds. From the evaluation it is clear that the detection performance is reasonable when the selected parameters favor diverse message generation. This is expected, as the sparsity condition holds for such a parameter selection. A centralized system that can be considered as dual to fully connected networks with centralized tests (a single master sensor that covers the whole network) has been proposed in [22]. For comparison, we compute the required number of measurements for networks with (S = 20, K ∈ {1, 2}, p ∈ (0.9, 1), q ∈ (0.15, 0.3), pf1 = 0.01, pf2 = 0.01) and (S = 70, K ∈ {1, 2}, p ∈ (0.9, 1), q ∈ (0.15, 0.3), pf1 = 0.01, pf2 = 0.01). The results are reported in Table I. We observe that the worst case analysis leads to a higher number of dissemination rounds than the actual ones. However, these values decrease relative to the growth of the number of sensors in the network. The simulations show that in practice the required number of measurements is significantly smaller.


Fig. 5. Simulation results for fully connected graphs with S = 20 sensors, K = 1, where RM and DM denote the random and deterministic selection mode of master sensors, respectively. Top row: Probability of defective sensor detection. Bottom row: Average rank of messages received per sensor. Column (a): fixed number of master sensors (L = 5). Column (b): fixed sensor participation constant (α = qK = 0.7).

TABLE I
THE THEORETICAL MEASUREMENT REQUIREMENTS FOR NETWORKS WITH S SENSORS, p ∈ (0.9, 1).

              S=20                    S=70
        K=1       K=2           K=1         K=2
        130     (115-244)     (174-217)   (125-284)

A comparison of the detection probability of the proposed method with several detection methods is illustrated in Figs. 13 and 14, for 20 and 70 sensors, respectively. The proposed scheme outperforms all other methods. Note that the rounds in the RWGP scheme last longer than those of the other schemes, while RW needs a higher communication overhead for dissemination, due to the transmission of raw sensor measurements. The average rank values over the network rounds are illustrated in Fig. 15.

C. Communication overhead

For the sake of completeness, we analyze the communication cost of the proposed gossiping protocol and compare it with all the other schemes under comparison. Let Rd and Id denote the number of bits needed for the transmission of a raw measurement and of a sensor identifier, respectively. Recall that the tuple (S, L, Ln, n, τ) stands for the number of sensors in the network, the number of master sensors (clusters), the number of neighbors that each master is connected with, the average number of sensors per cluster (n = S/L) and the total number of transmission rounds. We have already mentioned that the proposed GP algorithm is completed in two phases: message formation and message transmission. During the first phase, the master sensors receive raw measurements from their neighbors, so Ln · Rd bits are consumed for communicating the values. Further, the master sensors create binary messages and send them to their neighbors. Every neighbor requires the identifiers of the sensors that participate in a test, so the cost is Id · ⌈q(L + Ln)⌉ bits, plus an additional bit for sending the outcome. Hence, the overall bit consumption of this phase is Ln(Rd + Id⌈q(L + Ln)⌉ + 1).


Fig. 6. Simulation results for fully connected (FG), k = 16-regular connected (RG, k = 16) and k = 6-connected graphs (RG, k = 6) with S = 20 sensors, K = 1 and a random selection (RM) of L = 5 master sensors: (a) Probability of defective sensor detection; (b) Average rank of messages received per sensor.


Fig. 7. Probability of defective sensor detection; simulation results for irregular graphs (k > 3) with S = 20 sensors, K = 1 and random master selection (RM). (a) L = 5 master sensors; (b) sensor participation constant α = qK = 0.7.

Correspondingly, for the message exchange phase S(1 + S) bits are required per round, of which S + 1 bits are reserved for the test outcome and the test matrix row W. Trivially, the overall number of bits transmitted over τ rounds is given by:

nbGP = τ [Ln(Rd + Id⌈q(L + Ln)⌉ + 1) + S(1 + S)].   (18)
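Eq. (18) can be evaluated directly; the function and parameter values below are illustrative:

```python
from math import ceil

def bits_gp(tau, S, L, Ln, q, Rd, Id):
    """Eq. (18): total bits sent by the proposed GP scheme over tau rounds.
    Ln * Rd                 -- raw neighbor measurements gathered by each master,
    Id*ceil(q(L+Ln)) + 1    -- participant identifiers plus the outcome bit,
    S * (1 + S)             -- per-round message exchange (outcome + test row W)."""
    return tau * (Ln * (Rd + Id * ceil(q * (L + Ln)) + 1) + S * (1 + S))
```

For instance, with τ = 1, S = 20, L = 5, Ln = 3, q = 0.5, Rd = 8 and Id = 5 bits, the sketch gives 3·(8 + 5·4 + 1) + 20·21 = 507 bits.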

We compare the communication cost of GP with that of RWGP, which also takes place in two phases. The first phase represents the random walk message collection, while the second is equivalent to the GP algorithm. It is worth noting that RWGP and GP perform equally well in terms of decoding performance when n values are collected as the messages are propagated along a path. Therefore, a random walk terminates at the n-th hop. RWGP transmits raw measurements, which results in Rd + 2Rd + · · · + nRd = n(n + 1)Rd/2 bits. Therefore, the RWGP communication cost is given by:

nbRWGP = τ [L·n(n + 1)Rd/2 + S(1 + S)].   (19)

The bit transmission requirement of the RW algorithm is equivalent to that of the first step of RWGP; hence, it is equal to nbRW = τ·L·n(n + 1)Rd/2. Finally, it can easily be shown that the SF algorithm requires nbSF = τ·Rd·log S bits.

The comparison between the proposed method and all the other schemes with respect to the bits spent for communication is illustrated in Fig. 16. Note that the proposed algorithm in this setup requires only t = 15 rounds for efficient detection (Fig. 5), but it consumes approximately three times more communication overhead than the RWGP algorithm. However, due to the specific collection approach (hops), one transmission round of RWGP lasts ten times longer than that


Fig. 8. Probability of defective sensor detection; simulation results for irregular graphs (k > 3) and random selection (RM) of S = 70 sensors, K = 1. (a) L = 5 master sensors; (b) sensor participation constant α = qK = 0.7.


Fig. 9. Simulation results for fully connected (FG) and irregular graphs (IG), d > 3 with S = 20 sensors, K = 2 and deterministic selection (DM) of L = 5 master sensors: (a) Probability of defective sensor detection; (b) Average rank value.

of the proposed algorithm. From the figure we can observe that the RW algorithm has a very small communication overhead. However, it requires a significantly higher number of rounds (S log S ≈ 130 rounds) compared to the detection time of the proposed GP algorithm.

VII. CONCLUSION

In this work, we have dealt with the distributed detection of sensor failures in sensor networks. We have proposed a novel distributed algorithm that is able to detect a small set of defective sensors in large networks. To this aim, we have designed a probabilistic message propagation algorithm that allows the use of a simple and efficient distance decoder at the sensors. The transmitted messages are formed from local sensor observations and are communicated using a gossip algorithm. We have derived, for the worst case scenario, the lower bound on the number of linearly independent messages that sensors need to collect to ensure the reliable detection of one defective sensor. We have shown experimentally that this number is considerably smaller in practice, even for small networks. Furthermore, we have investigated the tradeoffs between the various design parameters and discussed how they can be selected to allow faster detection. The experimental results have shown that the proposed method outperforms other detection schemes in terms of successful detection probability. The detection speed and the communication overhead have also been examined; the proposed method is very fast, but it requires a higher overhead than the comparison methods. Finally, it is worth noting that the proposed method is highly robust to topology changes.


Fig. 10. Simulation results for fully connected (FG) and irregular graphs (IG), d > 3 with S = 20 sensors, K = 2 and deterministic selection (DM) of master sensors, α = 0.3: (a) Probability of defective sensor detection; (b) Average rank value.


Fig. 11. Simulation results for fully connected (FG) and irregular graphs (IG), d > 3 with S = 70 sensors, K = 2 and deterministic selection (DM) of L = 10 master sensors: (a) Probability of defective sensor detection (b) Average rank value.

REFERENCES

[1] A. Dimakis, S. Kar, J. M. F. Moura, M. G. Rabbat, and A. Scaglione, "Gossip algorithms for distributed signal processing," Proc. IEEE, vol. 98, pp. 1847-1864, Nov. 2010.
[2] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry, "Epidemic algorithms for replicated database maintenance," pp. 1-12, 1987.
[3] R. Karp, C. Schindelhauer, S. Shenker, and B. Vöcking, "Randomized rumor spreading," pp. 565-574, 2000.
[4] R. Dorfman, "The detection of defective members of large populations," Annals of Mathematical Statistics, vol. 14, pp. 436-440, 1943.
[5] M. Young and R. Boutaba, "Overcoming adversaries in sensor networks: A survey of theoretical models and algorithmic approaches for tolerating malicious interference," to appear in IEEE Communications Surveys and Tutorials, 2011.
[6] Y. Cheng and D.-Z. Du, "Bounding the number of columns which appear only in positive pools," Taiwan J. Math, to appear, 2008.
[7] W. Dai and O. Milenkovic, "Weighted superimposed codes and constrained integer compressed sensing," IEEE Trans. Inform. Theory, vol. 55, pp. 2215-2229, May 2009.
[8] A. De Bonis and U. Vaccaro, "Constructions of generalized superimposed codes with applications to group testing and conflict resolution in multiple access channels," Theor. Comput. Sci., vol. 306, no. 1-3, pp. 223-243, 2003.
[9] P. Indyk, H. Q. Ngo, and A. Rudra, "Efficiently decodable non-adaptive group testing," in SODA, 2010, pp. 1126-1142.
[10] M. Cheraghchi, A. Karbasi, S. Mohajer, and V. Saligrama, "Graph-constrained group testing," to appear in IEEE Trans. Inform. Theory.
[11] M. Mezard and C. Toninelli, "Group testing with random pools: Optimal two-stage algorithms," IEEE Trans. Inform. Theory, vol. 57, no. 3, pp. 1736-1745, March 2011.
[12] Y.-W. Hong and A. Scaglione, "Group testing for sensor networks: The value of asking the right question," 38th Asilomar Conference on Signals, Systems and Computers, 2004.
[13] S. Deb, M. Medard, and C. Choute, "Algebraic gossip: A network coding approach to optimal multiple rumor mongering," IEEE Trans. Inform. Theory, vol. 52, no. 6, pp. 2486-2507, 2006.
[14] D. Mosk-Aoyama and D. Shah, "Information dissemination via network coding," IEEE Intern. Symposium on Inform. Theory, 2006.
[15] P. K. Varshney, Distributed Detection and Data Fusion, Springer-Verlag New York, Inc., 1st edition, 1996.

Fig. 12. Simulation results for fully connected (FG) and irregular graphs (IG), d > 3 with S = 70 sensors, K = 2 and deterministic selection (DM) of master sensors, α = 0.3: (a) Probability of defective sensor detection; (b) Average rank value.


Fig. 13. Comparison in terms of detection performance for networks with S = 20 sensors and L = 5 master sensors. Abbreviations: GP: proposed method; RWGP: Random Walk with gossip-based pull protocol dissemination; RW: Random Walk in the network initiated at L sensors. (a) fully connected sensor network; (b) irregular sensor network.

[16] J. N. Tsitsiklis, "Decentralized detection," Proc. of Advanced Statistical Signal Processing, vol. 2: Signal Detection, pp. 297-344, 1993.
[17] Q. Tian and E. J. Coyle, "Optimal distributed detection in clustered wireless sensor networks," IEEE Trans. on Signal Proc., vol. 55, no. 7, pp. 3892-3904, 2007.
[18] R. Viswanathan and P. K. Varshney, "Distributed detection with multiple sensors: Part I: fundamentals," Proc. IEEE, vol. 85, no. 1, pp. 54-63, Jan. 1997.
[19] R. S. Blum, S. A. Kassam, and H. V. Poor, "Distributed detection with multiple sensors: Part II: advanced topics," Proc. IEEE, vol. 85, no. 1, pp. 64-79, Jan. 1997.
[20] B. Bui-Xuan, A. Ferreira, and A. Jarry, "Computing shortest, fastest, and foremost journeys in dynamic networks," Research Report RR-4589, INRIA, Oct. 2002.
[21] B. Bui-Xuan, A. Ferreira, and A. Jarry, "Evolving graphs and least cost journeys in dynamic networks," in Proc. of Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt'03), March 2003, pp. 141-150, INRIA Press.
[22] M. Cheraghchi, A. Hormati, A. Karbasi, and M. Vetterli, "Group testing with probabilistic tests: Theory, design and application," Proc. of the 47th Annual Allerton Conference on Communication, Control, and Computing, 2010.
[23] R. Gallager, "Low-density parity-check codes," IEEE Trans. Inform. Theory, pp. 21-28, Jan. 1962.

APPENDIX

A. Decoder failure probability

We analyze the events that lead to a decoding failure, which occurs when: (1) the number of flips of column elements of the matrix W is higher than ǫ, or (2) the (K, ǫ)-disjunct property of the probabilistically generated matrix W is violated. For the sake of completeness, we give below a detailed analysis of both cases. A shorter version of this analysis can be found in [22].


Fig. 14. Comparison in terms of detection performance for networks with S = 70 sensors and L = 5 master sensors. Abbreviations: GP: Proposed method, RWGP: Random Walk rounds with the gossip algorithm with pull protocol dissemination, RW: Random Walk in the network initiated at L sensors. (a) fully connected sensor network; (b) irregular sensor network.


Fig. 15. Average rank value for irregular sensor networks with L = 5 master sensors: (a) S = 20 sensors; (b) S = 70 sensors. Abbreviations: GP: Proposed method, RWGP: Random Walk rounds with the gossip algorithm with pull protocol dissemination, RW: Random Walk in the network initiated at L sensors, SF: pull store-and-forward algorithm with a random choice of the transmission message available at the sensor, GSF: pull store-and-forward algorithm with a greedy choice of the transmission message available at the sensor.

1) Failure caused by a high number of flips: The multiplicative form of the Chernoff bound for independent variables $X_i \in \{0, 1\}$ with $X = \sum_{i=1}^{S} X_i$ and a constant $t > 0$ follows from the Markov inequality:

$$P(X \geq a) = P(e^{tX} \geq e^{at}) \leq \inf_{t>0} \frac{E[e^{tX}]}{e^{ta}} = \inf_{t>0} \frac{\prod_{i=1}^{S} E[e^{tX_i}]}{e^{ta}}. \quad (20)$$

In our case, the probability of the occurrence of a flip is equal to $q(1-p)$ (the product of the probability that a sensor participates in the test and the deactivation probability), so the expected number of flips in a column of $W_i$ with $m_i$ elements is $\mu = q(1-p)m_i$. Similarly to the previous equation, we are interested in bounding the probability of the event that more than $(1+\delta)\mu$ flips occur in a column of the matrix $W_i$:

$$P(F \geq (1+\delta)\mu) \leq \inf_{t>0} \frac{\prod_{i=1}^{m_i} E[e^{tF_i}]}{e^{t(1+\delta)\mu}}. \quad (21)$$

The flip events are distributed as:

$$F_i = \begin{cases} 1, & \text{with probability } (1-p)q, \\ 0, & \text{with probability } 1-(1-p)q. \end{cases} \quad (22)$$
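The flip model of Eq. (22) can be simulated directly. The following minimal sketch, with illustrative parameter values (p, q, and m_i here are hypothetical, not taken from the paper's experiments), checks that the empirical per-column flip count concentrates around $\mu = q(1-p)m_i$:

```python
import random

# Illustrative (hypothetical) parameter values: p is the sensor activation
# probability, q the test participation probability, m_i the column length.
p, q, m_i = 0.7, 0.3, 200
flip_prob = (1 - p) * q          # per-element flip probability from Eq. (22)
mu = flip_prob * m_i             # expected number of flips per column

rng = random.Random(0)
trials = 5000
# Average, over many columns, of the number of flipped entries per column.
avg_flips = sum(
    sum(rng.random() < flip_prob for _ in range(m_i))
    for _ in range(trials)
) / trials

assert abs(avg_flips - mu) < 0.5  # empirical mean is close to mu = q(1-p)m_i
```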


Fig. 16. (a) Comparison of the communication overhead for several algorithms, for the following parameter values: (S, L, Ln , α, Rd , Id , τ ) = (70, 5, 50, 0.7, 7, 7, 80). The graph is fully connected. Abbreviations: GP: Proposed method, RWGP: Random Walk rounds with gossip algorithm and pull protocol dissemination, RW: Random Walk in the network initiated at L sensors. (b) Comparison of detection probability vs. number of hops.

Plugging the probability of the flip events into the previous equation leads to:

$$P(F \geq (1+\delta)\mu) \leq \inf_{t>0} \frac{\prod_{i=1}^{m_i} \left[(1-p)qe^t + (1-(1-p)q)\right]}{e^{t(1+\delta)\mu}} = \inf_{t>0} \frac{\prod_{i=1}^{m_i} \left[(1-p)q(e^t-1) + 1\right]}{e^{t(1+\delta)\mu}}. \quad (23)$$

If we set $x = (1-p)q(e^t-1)$ and plug in the inequality $1+x < e^x$, we obtain:

$$P(F \geq (1+\delta)\mu) \leq \inf_{t>0} \frac{\prod_{i=1}^{m_i} e^{(1-p)q(e^t-1)}}{e^{t(1+\delta)\mu}} = \inf_{t>0} \frac{e^{(1-p)qm_i(e^t-1)}}{e^{t(1+\delta)\mu}} = \inf_{t>0} \frac{e^{\mu(e^t-1)}}{e^{t(1+\delta)\mu}}. \quad (24)$$

For the constant $t = \log(1+\delta)$, we finally obtain:

$$P(F \geq (1+\delta)\mu) \leq \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\mu} = e^{\mu\delta - \mu(1+\delta)\log(1+\delta)}. \quad (25)$$

Observing that $\log(1+\delta) > \frac{2\delta}{2+\delta}$, Eq. (25) becomes:

$$P(F \geq (1+\delta)\mu) \leq e^{-\frac{\mu\delta^2}{2+\delta}}, \quad (26)$$

where the constant $\delta > 0$.

2) Failure caused by violation of the $(K, \epsilon)$-disjunctness property: A row of the matrix $W_i(t_I)$ is considered to have a good disjunct property if a single symbol "1" occurs in it, while the remaining $K$ values are equal to zero. The probability of such an event is equal to $\mu = q(1-q)^K$, and we denote the total number of rows with this property by $G$. The distribution of such events is binomial with mean value $\mu$, and the cumulative distribution function of the binomial distribution is $f(k; n, p) = P(X \leq k)$, where $k$ denotes the number of successes in $n$ trials and each trial succeeds with probability $p \in (0,1)$. The probability of having fewer than $\epsilon$ rows with a good disjunct property, under the assumption that $\epsilon < \mu$ (equivalent to the expression $P(X \leq k)$ above), is bounded by:

$$P(G < \epsilon) \leq e^{-\frac{1}{2}\frac{(\mu-\epsilon)^2}{\mu}} = e^{-\frac{qm_i\left[(1-q)^K - (1-p)(1+\delta)\right]^2}{2(1-q)^K}}. \quad (27)$$
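Both failure probabilities derived above lend themselves to a quick Monte Carlo sanity check. The sketch below uses illustrative parameter values (p, q, δ, m_i, and K are hypothetical choices, not the paper's experimental settings); it verifies that the empirical tail $P(F \geq (1+\delta)\mu)$ stays below the Chernoff bound of Eq. (26), and that the empirical frequency of a row with a good disjunct property matches $\mu = q(1-q)^K$:

```python
import math
import random

rng = random.Random(1)
p, q, delta, m_i, K = 0.7, 0.3, 1.0, 200, 2   # illustrative values only
flip_prob = (1 - p) * q
mu = flip_prob * m_i

# Empirical tail of the per-column flip count vs. the bound of Eq. (26).
trials = 20000
tail_hits = sum(
    sum(rng.random() < flip_prob for _ in range(m_i)) >= (1 + delta) * mu
    for _ in range(trials)
)
emp_tail = tail_hits / trials
chernoff = math.exp(-mu * delta ** 2 / (2 + delta))
assert emp_tail <= chernoff  # the Chernoff bound dominates the empirical tail

# Probability of a "good" row: one fixed entry is 1, the other K entries are 0.
good = sum(
    (rng.random() < q) and not any(rng.random() < q for _ in range(K))
    for _ in range(trials)
)
assert abs(good / trials - q * (1 - q) ** K) < 0.01  # matches mu = q(1-q)^K
```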

Observing that $2^{-\alpha} \geq e^{-\alpha} \geq 3^{-\alpha}$ (from $2 < e < 3$), the limit value of $(1-q)^K$ given by $\lim_{K \to \infty} \left(1 + \frac{-\alpha}{K}\right)^K = e^{-\alpha}$, and the logarithm property $a^b = e^{b \log a}$, we can set $3^{-\alpha} \leq \left(1 + \frac{-\alpha}{K}\right)^K \leq 2^{-\alpha}$, so that $\gamma = \frac{\left[(1-q)^K - (1-p)(1+\delta)\right]^2}{2(1-q)^K}$ lies between $(\log 2, \log 3)$, and we obtain:

$$P(G < \epsilon) \leq e^{-qm_i\gamma}. \quad (28)$$

3) Condition on the required number of measurements: For a fixed connection matrix, we have to check when the disjunct property holds. The union probability bound over all possible choices of $s$ out of $S$ columns (Eq. (28)) is $P \leq Se^{-m_iq\gamma}$, and it vanishes when $m_i > \frac{K \log S}{\alpha\gamma}$. Furthermore, condition (26) bounds the probability that the number of flips in any $K$ out of $T$ columns exceeds a certain value:

$$Ke^{-\frac{\delta^2}{4}(1-p)qm_i} = Ke^{-\frac{\delta^2}{4\gamma}(1-p)q\log(N)} = o(1).$$