Modeling Pairwise Key Establishment for Random Key Predistribution in Large-scale Sensor Networks

Dijiang Huang, Member, IEEE, Manish Mehta, Member, IEEE, Appie van de Liefvoort, Member, IEEE, Deep Medhi, Senior Member, IEEE

Abstract— Sensor networks are composed of a large number of low power sensor devices. For secure communication among sensors, secret keys are required to be established between them. Considering the storage limitations and the lack of postdeployment configuration information of sensors, Random Key Predistribution schemes have been proposed. Due to limited number of keys, sensors can only share keys with a subset of the neighboring sensors. Sensors then use these neighbors to establish pairwise keys with the remaining neighbors. In order to study the communication overhead incurred due to pairwise key establishment, we derive probability models to design and analyze pairwise key establishment schemes for large-scale sensor networks. Our model applies the binomial distribution and a modified binomial distribution and analyzes the key path length in a hop-by-hop fashion. We also validate our models through a systematic validation procedure. We then show the robustness of our results and illustrate how our models can be used for addressing sensor network design problems.

I. INTRODUCTION

Large-scale sensor networks are composed of a large number of low-powered sensor devices. According to [1], the number of sensor nodes deployed to study a phenomenon may be on the order of hundreds or thousands; depending on the application, the number may reach an extreme value of millions. Typically, these networks are installed to collect sensed data from sensors deployed in a large area. Within a network, sensors communicate among themselves to exchange data and routing information. Because of the wireless nature of the communication among sensors, these networks are vulnerable to various active and passive attacks on the communication protocols and devices. This demands secure communication among sensors. Due to inherent storage constraints, it is infeasible for a sensor device to store a shared key value for every other sensor in the system. Moreover, because of the lack of postdeployment geographic configuration information of sensors, keys cannot be selectively stored in sensor devices. Although a naïve solution would be to use a common key between every pair of sensors to overcome the storage constraints, it offers weak security.

Manuscript received March 2005; revised February 2006, April 2006. D. Huang is with the Department of Computer Science & Engineering, Arizona State University, Tempe, AZ, USA (e-mail: [email protected]). M. Mehta is with Tumbleweed Communications (e-mail: manish.mehta@tumbleweed.com). A. van de Liefvoort and D. Medhi are with the Department of Computer Science and Electrical Engineering, University of Missouri–Kansas City, USA (e-mail: [email protected], [email protected]).

Random Key Predistribution (RKP) schemes ([10], [6], [15] and [8]) have been proposed to give designers of sensor networks the flexibility to tailor the network deployment to the available storage and the security requirements. RKP schemes randomly select a small number of keys from a fixed key pool for each sensor. Two sensors then share keys with a probability that increases with the number of keys stored in each sensor. Since RKP schemes require only a limited number of keys to be preinstalled in sensors, a sensor may not share keys with all of its neighbor nodes. In this case, a Pairwise Key Establishment (PKE) scheme is required to set up shared keys with the required fraction of neighbor nodes. PKE schemes require sensors to set up pairwise keys via nodes that share keys with either or both of the sensors. This PKE phase involves communication overhead for finding the shortest path to a neighbor node and for setting up the pairwise key through that path. The fewer keys preinstalled in each sensor, the lower the probability that a sensor shares a key with a given neighbor node. Consequently, the sensor incurs more overhead in the PKE phase with the remaining neighbor nodes. Studies in [5] show that the energy consumption due to communication in sensors is several orders of magnitude higher than that due to computation. Constraints such as scarce battery power and limited storage necessitate a reference model to study the tradeoff between storage and the communication overhead involved during the PKE phase in RKP schemes. The memory limitation of sensors restricts the number of keys that can be preinstalled in each sensor to a small number. For example, the capabilities of sensor nodes for large-scale sensor networks can be as limited as those of Smart Dust sensors [12], [11], which have only 8Kb of program memory and 512 bytes of data memory.
Moreover, studies in [6] and [8] show that a small key pool size increases security vulnerabilities. Thus, for large-scale sensor networks, a small number of keys preinstalled in each sensor and a large key pool size result in a small probability ($p_1$) that two sensors share keys (see (1) in Section II-B.1). Our studies show that the smaller the value of $p_1$, the higher the number of hops required to set up pairwise keys (a detailed analysis is given in Section V). The analyses presented in [6] and [8] provide the communication overhead in the PKE phase for up to 3 hops only. Given these restrictions, a general mathematical model to study the communication overhead of the PKE phase is required. In this paper, we propose a probability model to analyze

©2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE.


communication overhead requirements for the PKE phase in RKP schemes. Unlike the PKE scheme proposed in [8], our model is based on a PKE scheme in which sensors set up pairwise keys using only their neighbor nodes. This design significantly reduces the communication overhead involved in the PKE phase. Similar to the recent schemes in [10], [6], [15], [8], [22] and [14], our model is based on networks with uniformly distributed sensors. Our model applies the binomial probability distribution and a modified binomial probability distribution (presented in Section III-B) in a hop-by-hop fashion. There are three input parameters to our model: 1) the probability of two sensors sharing keys, 2) the average number of neighbors of a sensor, and 3) the probability that two neighbors of a node share keys and are located within each other's communication range. Our model can be used to evaluate the fraction of neighbors of a sensor with which it can communicate securely. Furthermore, the model derives the probability that, for a given sensor network configuration, every sensor can securely communicate with all its neighbors. We then validate our model through our proposed validation procedure. Finally, we use our model to analyze the communication overhead in the PKE phase.

The rest of the paper is organized as follows: In Section II, we provide the necessary background and related work in the area of RKP schemes. Section III describes the proposed probability model for pairwise key establishment and key-graph connectivity. Section IV presents the validation methodology and results of our proposed probability model. The communication overhead analysis using our proposed model is given in Section V. Section VI summarizes the work.

II. BACKGROUND OF RANDOM KEY PREDISTRIBUTION SCHEMES

The goal of RKP schemes [10] is to reduce the number of preinstalled keys in each sensor, given that the postdeployment geographic configuration is unknown. The preinstalled keys help the sensor set up pairwise keys with its neighbors. Sensors located within a sensor's communication range are called neighbors of that sensor. In this paper, we use the terms sensor and node interchangeably. In this section, we first list the phases of pairwise key setup in RKP schemes. We then provide the general mathematical background of RKP schemes in the literature. Finally, we present related work.

A. The Phases in Random Key Predistribution Schemes

The four main phases of key setup in RKP schemes are as follows:

1. Key predistribution phase: A centralized key server generates a large key pool offline. The key server is assumed to be protected, and no adversary can break into the server to reveal the keys. The procedure for offline key distribution is as follows:
   a) Generate a large key pool of size $P$.
   b) Randomly select $m$ different keys for each sensor from the key pool to form a key ring.
   c) Load the key ring into the memory of the sensor.
   d) Assign a unique node identifier or key ring identifier to each sensor.

2. Sensor deployment phase: Sensors are randomly picked and uniformly distributed in a large area. Typically, the average number of neighbors of a sensor ($n'$) is much smaller than the total number of deployed sensors ($n$).

3. Key discovery phase: Two steps are involved in the key discovery phase. In the first step, each sensor attempts to discover shared key(s) with each of its neighbors. To accomplish this, the sensor can broadcast its key ring identifier to its neighbors. The sensor can also use the secret discovery protocol specified in [10], [17], [7] to discover the shared key without using cleartext broadcast. After the first step of the key discovery phase, the sensor knows all its neighbors. The set of all neighbors of sensor $i$ is represented by $W_i$, with $|W_i| = n'$. The set of neighbors of sensor $i$ that share at least one key with sensor $i$ is represented by $Q_i$. The set of neighbors of sensor $i$ that do not share any key with sensor $i$ is represented by $R_i$. Thus, we have $W_i = Q_i \cup R_i$ and $|Q_i| + |R_i| = n'$. In the second step, every sensor $i$ broadcasts its set $Q_i$. Using the sets received from its neighbors, a sensor can build a key graph (see Definition 1) based on the key-share relations among neighbors.

4. Pairwise key establishment phase: If sensor $i$ shares at least one key with a given neighbor (a neighbor in set $Q_i$), the shared key(s) can be used as their pairwise key(s). For each of the neighbors in set $R_i$, sensor $i$ uses the key graph built during the key discovery phase to find a key path (see Definition 2) via the neighbors in set $Q_i$ to set up a pairwise key. Once a pairwise key is set up with a neighbor in set $R_i$, the neighbor is included in set $Q_i$ and deleted from set $R_i$. The above PKE procedure can be achieved by using a source-routing-based PKE protocol [13].
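The phases above can be sketched in a few lines of Python. This is only an illustrative sketch, not the protocol of [10] or [13]: the function names are hypothetical, keys are modeled as integers, and for simplicity every node passed in `nodes` is assumed to be within radio range of every other, so that an edge of the key graph depends only on whether two key rings intersect.

```python
import random
from collections import deque

def make_key_rings(num_nodes, pool_size, ring_size, seed=0):
    """Key predistribution: each node draws ring_size distinct keys from the pool."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(pool_size), ring_size))
            for _ in range(num_nodes)]

def discover(i, neighbors, rings):
    """Key discovery, step 1: split W_i into Q_i (shares a key) and R_i (does not)."""
    q_i = {j for j in neighbors if rings[i] & rings[j]}
    r_i = set(neighbors) - q_i
    return q_i, r_i

def key_path(src, dst, nodes, rings):
    """Breadth-first search over the key graph; returns a shortest key path or None.

    Assumption: all nodes are mutually in range, so two nodes are adjacent
    in the key graph exactly when their key rings intersect.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in nodes:
            if v not in parent and rings[u] & rings[v]:
                parent[v] = u
                queue.append(v)
    return None

if __name__ == "__main__":
    nodes = list(range(20))
    rings = make_key_rings(num_nodes=20, pool_size=1000, ring_size=30)
    q0, r0 = discover(0, nodes[1:], rings)
    for j in sorted(r0):
        # Key path length = number of node pairs on the path = len(path) - 1.
        print(j, key_path(0, j, nodes, rings))
```

Breadth-first search is used here because the PKE phase looks for the shortest key path; in a real deployment the adjacency test would also have to check that the two nodes are within each other's communication range.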
The goal of the PKE phase for sensor $i$ is to set up pairwise keys with its neighbors in set $R_i$ and satisfy $f = |Q_i| / |W_i| \ge c$, where $c$ is the fraction of the total number of neighbors of sensor $i$ that are required to be reached.

Definition 1 (Key graph): A key graph maintained by node $i$ is defined as $G_i = (V_i, E_i)$, where $V_i = \{ j \mid j \in W_i \vee j = i \}$, $E_i = \{ e_{jk} \mid j, k \in V_i \wedge k \in W_j \wedge j \in W_k \wedge j\,S\,k \}$, and $S$ is a relation defined between two nodes if they share at least one key after the key discovery phase.

Definition 2 (Key path): A key path between nodes $A$ and $B$ is defined as a sequence of nodes $A, N_1, N_2, \ldots, N_j, B$, such that each pair of nodes $(A, N_1), (N_1, N_2), \ldots, (N_{j-1}, N_j), (N_j, B)$ has at least one shared key after the key discovery phase. The length of the key path is the number of pairs of nodes in it.

B. Mathematical Foundations of Random Key Predistribution Scheme

An important probability ($p_1$) is the probability that two nodes share at least one key after the key predistribution phase [10]. We first state this result here for completeness. We then


TABLE I
COMPARISON OF EXISTING RKP SCHEMES

                                  Basic scheme [10]   q-composite [6]   Grid-based [15]     sk-RKP [8], [15]    k-path [6], [22]
Key pool (P)                      unstructured        unstructured      structured          structured          unstructured
Key selection (m)                 RWR                 RWR               restricted random   restricted random   RWR
Shared-key discovery              CB/PSD              CB/PSD            CB/PSD              CB/PSD              CB/PSD
Number of key paths               one-path            one-path          one-path            one-path            k-path
Communication overhead analysis   n/a                 partial♮          n/a                 partial♮            n/a

RWR: random without replacement. CB/PSD: clear-text broadcasting or private shared-key discovery.
♮: They only present the analysis for key path lengths 2 and 3.

give two approaches proposed in the literature that are used for computing the key graph connectivity.

1) The probability that two nodes share at least one key ($p_1$): Given a key pool of size $P$ and that each sensor is loaded with $m$ different keys randomly selected from the key pool, the probability that two sensors share at least one key is given as (for proof, see [10]):

\[
p_1 = 1 - \frac{\binom{P-m}{m}}{\binom{P}{m}} = 1 - \frac{\left((P-m)!\right)^2}{(P-2m)!\,P!} \qquad (\text{where } P > 2m). \tag{1}
\]

2) Random graph approach (RGA): An approximation method to compute key graph connectivity is proposed by Eschenauer and Gligor [10]. Their method utilizes random graph theory [20]. Given a desired probability of graph connectivity $P_c = e^{-e^{-c}}$, $p_l$ is the global two-node connectivity probability that a link exists between any two nodes:

\[
p_l = \frac{\ln(n)}{n} + \frac{c}{n},
\]

where $n$ is the total number of sensors in the system. Using $p_l$, we can derive the local two-node connectivity probability ($p_1$) by amplifying $p_l$ with the factor $(n-1)/n'$. Here, $p_1$ is the estimated probability that a node shares a key with any of its neighbors, where $n'$ is the average size of its neighborhood:

\[
p_1 = \frac{n-1}{n'}\, p_l.
\]

Note that in [8], the authors define $P_c$ as the probability of global key graph connectivity and $p_1$ as the probability of local key graph connectivity. The RGA uses the probability of global key graph connectivity to estimate the local key graph connectivity. This approach does not provide the key path length information of a key graph, which can be useful in the design of pairwise key establishment schemes. Moreover, it estimates the key graph connectivity and produces inconsistent results when the neighborhood size $n'$ is relatively small (see our validation results in Section IV-E).

3) Area covering approach: Du et al. [8] use the area covering approach to analyze the probability that a node can set up pairwise keys with any of its neighbors. They calculate the two-node connectivity probability as a function of the overlapped range shared by a sensor with its neighbors. During the PKE phase, the intermediate nodes may not necessarily be located within the source node's communication range. Thus, the sensor cannot determine key paths to set up pairwise keys with its neighbors. In other words, the key setup requests must be flooded instead of using chosen paths. Consequently, the communication overhead invoked by the pairwise key requests is prohibitively high. Moreover, the area covering approach is based on the analysis of the geographical locations of nodes on all possible key paths. For path lengths greater than 3, the analyses of the node locations and the graphical representations are very complicated. In their work in [8], the authors only present the analyses for key paths of lengths 2 and 3.

C. Related Work

The first RKP scheme was proposed by Eschenauer and Gligor [10], and we call it the basic scheme. The proposals that followed are all based on the basic scheme and propose improvements in terms of security. The proposed improvements focus on three aspects: key pool structure [8], [15], key selection threshold [6], and path-key establishment protocol [6], [22]. Table I shows five recent proposals and the basic scheme. Chan et al. [6] proposed the q-composite scheme. In this scheme, the key selection threshold is set to $q$. To form a secure link, the scheme requires at least $q$ shared keys between two nodes. The structured key pool scheme (sk-RKP) [8] proposed by Du et al. and the Grid-based scheme [15] proposed by Liu and Ning change the unstructured key pool to a structured key pool. The structured key pool is formed by multiple key spaces. Within each key space, the key space structure uses the group key scheme proposed by Blom [2] and further developed by Blundo et al. [3]. Both the q-composite scheme and the k-path scheme [22] use $k$ key paths ($k \ge 1$) to set up a pairwise key. The k-path scheme uses a secret sharing scheme¹ to set up a pairwise key. In all presented schemes, during the key discovery phase, both the clear-text broadcast discovery and the private share-key discovery scheme² are specified. Clearly, the private share-key discovery approach involves more communication overhead.

It may be noted that the key-graph connectivity problem we have considered here has some similarity to global network connectivity problems, i.e., the connectivity of an entire network; a comprehensive study on graph connectivity can be found in [18]. The recent work by Xue and Kumar [21] addresses the connectivity of wireless networks by inspecting the minimal number of neighbors needed to achieve global connectivity. Note that both of these works concern the connectivity of the entire graph. In our case, the key-graph connectivity is considered from the point of view of each individual sensor (node), when global knowledge/connectivity is not known/possible due to the limited storage and communication ability of each sensor. In our sensor network model, two neighboring sensors must be physically visible via a direct wireless link in order to set up a direct key in their key graphs; in other words, the wireless links outside of a sensor's communication range are considered to be invisible to that sensor. Consequently, given the storage and communication restrictions, we study the following problem: the probability that a node can establish a key with one or all of its neighbors within $h$-hop visible key path(s).

¹ In their proposed scheme, a pairwise key can be derived by $k$ exclusive-OR operations on $k$ secret shares received from $k$ paths.
² Specified in [10]: using private share-key discovery, for every key on a key ring, each node could broadcast a list $\alpha, E_{K_i}(\alpha)$, $i = 1, \ldots, k$, where $\alpha$ is a challenge. The decryption of $E_{K_i}(\alpha)$ with the proper key by a recipient would reveal the challenge $\alpha$ and establish a shared key with the broadcasting node.

TABLE II
NOTATIONS

$k_i$            Number of nodes selected on hop $i$
$n$              Total number of sensors in the network
$n'$             The average number of sensors within a sensor's range
$p_1$            Probability that two sensors share at least one key before the PKE phase
$\bar{p}_1$      $\bar{p}_1 = 1 - p_1$
$p_c$            The probability that any two neighbors of a sensor are within each other's range
$p_2$            $p_2 = p_1 p_c$
$\bar{p}_2$      $\bar{p}_2 = 1 - p_2$
$p_r(h)$         The probability that a sensor can set up a pairwise key with a given neighbor with exactly $h$ hops
$p_r(\le h)$     The probability that a sensor can set up a pairwise key with a given neighbor within $h$ hops
$p_{con}(h)$     The probability that a sensor can set up pairwise keys with all of its neighbors within $h$ hops

III. PROBABILITY MODEL FOR PAIRWISE KEY ESTABLISHMENT

In this section, we analyze the PKE phase of RKP schemes for a large number of sensors uniformly distributed within a vast 2-dimensional area. The uniform distribution of sensors was introduced in [10] and extensively utilized in many proposals, such as [6], [15] and [8]. We derive here the probability that a node can reach any of its neighbors with exactly $h$ hops and the probability that a node can reach all of its neighbors within $h$ hops. Notations used for this work are summarized in Table II.

A. Computing $p_c$ and $p_2$

In order to model the probability that a sensor can set up a pairwise key with any of its neighbors with exactly $h$ hops, we first determine the probability that any two neighbors, say $j$ and $k$, of a sensor, say $i$, are within each other's range; this probability is denoted by $p_c$. Our analytical approach requires a result on the expected area of the overlap region, $A(x)$, given in [6]. In order to review this result, consider Fig. 1; we can draw two circles centered at sensor nodes $i$ and $j$, each with communication radius $r$ representing the range of the sensors. The distance between $i$ and $j$ is $x$. The cumulative distribution function for the distance between a node and one of its neighbors is given by $F(x) = \Pr(\text{distance} \le x) = x^2 / r^2$. Thus, the probability density function is $f(x) = F'(x) = 2x / r^2$. The area of the overlapped region $abcd$ is:

\[
A_{abcd}(x) = 2 r^2 \cos^{-1}\!\left(\frac{x}{2r}\right) - x \sqrt{r^2 - \frac{x^2}{4}}.
\]

The expected area of the overlapped region is given by [6]:

\[
A(x) = \int_0^r A_{abcd}(x)\, f(x)\, dx = \left(\pi - \frac{3\sqrt{3}}{4}\right) r^2 = 0.5865\,\pi r^2.
\]

Fig. 1. Overlapped Region Between Two Sensor Nodes.

As shown in Fig. 1, a node $k$ must be located within the shaded region $abcd$ in order to be in node $i$'s and node $j$'s range simultaneously. On average, $p_c$ is the ratio of the shaded area to the whole area of the circle. Using the result for $A(x)$, we can then determine $p_c$ as

\[
p_c = 0.5865\,\pi r^2 / (\pi r^2) = 0.5865. \tag{2}
\]

Furthermore, given a node $i$ and its range, the probability that any two nodes within the range of node $i$ share a key and are in each other's range is given as:

\[
p_2 = p_c\, p_1. \tag{3}
\]

In our case, $p_2 = 0.5865\, p_1$ due to (2). It may be noted that $p_2$ is computed based on the assumption that every sensor has a circular communication range with equal radius. Other mechanisms can be devised to find $p_2$ for different configurations. However, a different mechanism will not affect our probability model introduced in Section III-B and Section III-C.

B. Computing $p_r(h)$

The basic idea behind our approach is the probability of node selection on each hop. The selection follows the binomial distribution or a modified binomial distribution. The binomial probability mass function for given $n$ and $p$ is represented as:

\[
f(n, p) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 1, \ldots, n.
\]

The modified binomial probability mass function is represented as follows:

\[
f(n, p', p'') = \binom{n}{k} p'^{\,k} (1-p'')^{n-k}, \qquad k = 1, \ldots, n, \tag{4}
\]

where the probabilities $p'$ and $p''$ need not be the same.
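The quantities in (1)–(4) are straightforward to evaluate numerically. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, the pool size 10000 and ring size 75 are arbitrary example values, and the expected overlap area is integrated with a simple midpoint rule rather than taken from the closed form.

```python
from math import acos, comb, pi, sqrt

def share_probability(pool_size, ring_size):
    """Equation (1): probability that two key rings share at least one key."""
    return 1.0 - comb(pool_size - ring_size, ring_size) / comb(pool_size, ring_size)

def overlap_area(x, r):
    """Area of the lens abcd for two radius-r circles whose centers are x apart."""
    return 2 * r * r * acos(x / (2 * r)) - x * sqrt(r * r - x * x / 4)

def expected_overlap_ratio(r=1.0, steps=20000):
    """p_c: midpoint-rule integral of A_abcd(x) f(x) over [0, r], over pi r^2."""
    dx = r / steps
    total = 0.0
    for s in range(steps):
        x = (s + 0.5) * dx
        total += overlap_area(x, r) * (2 * x / (r * r)) * dx
    return total / (pi * r * r)

def binomial_pmf(n, k, p):
    """Binomial probability mass function."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def modified_binomial_pmf(n, k, p_sel, p_rest):
    """Equation (4): success and failure factors use different probabilities."""
    return comb(n, k) * p_sel ** k * (1 - p_rest) ** (n - k)

if __name__ == "__main__":
    p1 = share_probability(pool_size=10000, ring_size=75)  # example values only
    pc = expected_overlap_ratio()
    p2 = pc * p1                                           # equation (3)
    print(p1, pc, p2)
```

The numerically integrated ratio should come out close to the value 0.5865 in (2), independent of the radius passed in. Note that the modified mass function in (4) does not sum to one in general, since $p'$ and $1 - p''$ are not complementary.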

Fig. 2 gives a graphical view of our approach. Node $a$ is a sensor that wants to set up a pairwise key with one of its neighbors, say node $b$. Node $a$'s range contains $n'$ nodes including node $b$. $p_r(h)$ is the probability that node $a$ can set up a pairwise key with a given neighbor with exactly $h$ hops. As shown in Fig. 2, our equation is derived in three stages. Stage C always represents the final stage in which node $a$ can reach node $b$; stage B represents all the intermediate hops when $h \ge 3$; and stage A represents the first hop when $h \ge 2$. For $h = 1$, as shown in Fig. 2(a), there is no intermediate node. Therefore, $p_1$ is the probability that node $a$ shares at least one key with node $b$. If node $a$ and node $b$ share key(s), the same key(s) can be used as pairwise key(s) and no additional key setup is required. Otherwise, they cannot set up pairwise key(s) directly and must go through the PKE phase to set up pairwise key(s). Thus, we have:

\[
p_r(1) = p_1. \tag{5}
\]

Fig. 2. Node selection for computing $p_r(h)$: (a) $h = 1$; (b) $h = 2$; (c) $h \ge 3$.

The case for $h = 2$ is shown in Fig. 2(b). In stage A, we select $k_1$ nodes from $n' - 1$ nodes as the first-hop nodes that share at least one key with node $a$, where $k_1 = 1, \ldots, n' - 1$. Since our goal is to derive the probability that node $b$ is reachable with exactly 2 hops, once node $b$ is fixed, we can select at most $n' - 1$ nodes as the first-hop nodes. We now have the expression $\bar{p}_1 \sum_{k_1=1}^{n'-1} \binom{n'-1}{k_1} (p_1)^{k_1} (\bar{p}_1)^{n'-k_1-1}$ for stage A. This expression can be interpreted as follows: $k_1$ out of $n' - 1$ nodes are selected for the first hop; the probability that $k_1$ nodes share keys with node $a$ is $(p_1)^{k_1}$; the probability that $n' - k_1 - 1$ nodes do not share keys with node $a$ is $(\bar{p}_1)^{n'-k_1-1}$; also, in each selection, we have the condition that node $a$ does not share keys with node $b$ (represented by $\bar{p}_1 = 1 - p_1$). Following stage A is stage C in the second hop. Any node selected in stage A may share keys with node $b$. Thus, the probability that at least one of the nodes selected in stage A shares a key with node $b$ is $1 - \bar{p}_2^{\,k_1}$, in which $\bar{p}_2^{\,k_1}$ means that none of the nodes selected in stage A shares a key with node $b$. It may be noted that the probability that a node selected in stage A shares keys with node $b$ is $p_2$, as derived in Section III-A. Thus, we have

\[
p_r(2) = \bar{p}_1 \sum_{k_1=1}^{n'-1} \binom{n'-1}{k_1} (p_1)^{k_1} (\bar{p}_1)^{n'-k_1-1} \left(1 - \bar{p}_2^{\,k_1}\right). \tag{6}
\]

For $h = 3$, as shown in Fig. 2(c), there exist three stages. In stage A, the mathematical expression is similar to the expression we presented for $h = 2$. One difference is that the expression $\bar{p}_1 \sum_{k_1=1}^{n'-2} \binom{n'-1}{k_1} (p_1 \bar{p}_2)^{k_1} (\bar{p}_1)^{n'-k_1-1}$ is a cumulative modified binomial distribution function multiplied by $\bar{p}_1$ (using (4), $p' = p_1 \bar{p}_2$, $p'' = 1 - \bar{p}_1$). The expression $(p_1 \bar{p}_2)^{k_1}$ stands for the condition that each of the $k_1$ nodes selected for the first hop shares a key with node $a$, while none of them also shares a key with node $b$. Since there must be at least one node available on hop 2, the maximum number of nodes that can be selected from the $n' - 1$ candidate nodes is $n' - 2$. In stage B (only sub-stage B2 exists when $h = 3$), the formula is a cumulative binomial distribution function in which $k_2$ nodes are selected from the nodes remaining after the selection for the first hop ($n' - k_1 - 1$ nodes) and each of the $k_2$ nodes shares at least one key with at least one of the $k_1$ nodes. The expression for stage B is given as $\sum_{k_2=1}^{n'-k_1-1} \binom{n'-k_1-1}{k_2} \left(1 - \bar{p}_2^{\,k_1}\right)^{k_2} \left(\bar{p}_2^{\,k_1}\right)^{n'-k_1-k_2-1}$. All the nodes selected for this hop are eligible to connect to node $b$. The expression for stage C is $1 - \bar{p}_2^{\,k_2}$, which means that at least one of the $k_2$ nodes shares a key with node $b$. Using the expressions of all the stages, we derive the following equation for $h = 3$:

\[
p_r(3) = \bar{p}_1 \sum_{k_1=1}^{n'-2} \binom{n'-1}{k_1} (p_1 \bar{p}_2)^{k_1} (\bar{p}_1)^{n'-k_1-1}
\times \sum_{k_2=1}^{n'-k_1-1} \binom{n'-k_1-1}{k_2} \left(1 - \bar{p}_2^{\,k_1}\right)^{k_2} \left(\bar{p}_2^{\,k_1}\right)^{n'-k_1-k_2-1} \left(1 - \bar{p}_2^{\,k_2}\right). \tag{7}
\]
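Equations (5)–(7) translate directly into nested sums. The Python sketch below is one possible transcription, with hypothetical function names; `n_nb` stands for $n'$, and the example values $n' = 50$ and $p_1 = 0.2$ are assumptions chosen only for illustration.

```python
from math import comb

def pr1(p1):
    """Equation (5): node a shares a key directly with node b."""
    return p1

def pr2(n_nb, p1, p2):
    """Equation (6): b is reached in exactly two hops."""
    q1, q2 = 1 - p1, 1 - p2
    cand = n_nb - 1                      # candidate first-hop nodes (b excluded)
    total = 0.0
    for k1 in range(1, cand + 1):
        stage_a = comb(cand, k1) * p1 ** k1 * q1 ** (cand - k1)
        stage_c = 1 - q2 ** k1           # at least one first-hop node reaches b
        total += stage_a * stage_c
    return q1 * total

def pr3(n_nb, p1, p2):
    """Equation (7): b is reached in exactly three hops."""
    q1, q2 = 1 - p1, 1 - p2
    cand = n_nb - 1
    total = 0.0
    for k1 in range(1, cand):            # k1 = 1, ..., n' - 2
        stage_a = comb(cand, k1) * (p1 * q2) ** k1 * q1 ** (cand - k1)
        miss = q2 ** k1                  # a node reaches none of the k1 nodes
        rest = cand - k1                 # nodes left for hop 2
        inner = 0.0
        for k2 in range(1, rest + 1):
            stage_b = comb(rest, k2) * (1 - miss) ** k2 * miss ** (rest - k2)
            inner += stage_b * (1 - q2 ** k2)   # stage C
        total += stage_a * inner
    return q1 * total

if __name__ == "__main__":
    n_nb, p1 = 50, 0.2                   # example values only
    p2 = 0.5865 * p1                     # equation (3)
    print(pr1(p1), pr2(n_nb, p1, p2), pr3(n_nb, p1, p2))
```

As a sanity check, the binomial theorem collapses the sum in (6) to $p_r(2) = \bar{p}_1 \left[ 1 - (1 - p_1 p_2)^{n'-1} \right]$, which the loop in `pr2` should reproduce to floating-point accuracy.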

For $h \ge 4$, there exist three stages; see Fig. 2(c). The analysis of stage A is the same as that for $h = 3$. We have the expression $\bar{p}_1 \sum_{k_1=1}^{n'-h+1} \binom{n'-1}{k_1} (p_1 \bar{p}_2)^{k_1} (\bar{p}_1)^{n'-k_1-1}$ for stage A. There are two sub-stages in stage B; we denote them as B1 and B2. The expression of sub-stage B1 represents the formulas from hop 2 to hop $h - 2$. It represents an iterative process that is used on each hop. In other words, on hop $i$, $k_i$ nodes are selected from the nodes left over from the previous $i - 1$ hops. For example, on hop 2, we select $k_2$ nodes from $n' - k_1 - 1$ nodes. By induction, on hop $i$, we can select $k_i$ nodes from $n' - 1 - \sum_{j=1}^{i-1} k_j$ nodes. Following (4), $p' = \left(1 - \bar{p}_2^{\,k_{i-1}}\right) \bar{p}_2$ is the probability of selecting a node that shares at least one key with at least one of the $k_{i-1}$ nodes in hop $i - 1$ and does not share key(s) with node $b$; $1 - p'' = \bar{p}_2^{\,k_{i-1}}$ represents the probability that a node does not share any key with the $k_{i-1}$ nodes in hop $i - 1$. For each hop $i$, the value of $k_i$ can be at most $n' - h + i - \sum_{j=1}^{i-1} k_j$ to guarantee that at least one node is available at each of the hops from $i + 1$ to $h$. Sub-stage B2 represents hop $h - 1$. As discussed in the analysis of $p_r(3)$, the nodes selected for hop $h - 1$ are all eligible to share a key with node $b$. The analysis of stage C is the same as that of stage C for $h = 3$. Thus, we arrive at:

\[
\begin{aligned}
p_r(h \ge 4) = {} & \bar{p}_1 \sum_{k_1=1}^{n'-h+1} \binom{n'-1}{k_1} (p_1 \bar{p}_2)^{k_1} (\bar{p}_1)^{n'-k_1-1} \\
& \times \prod_{i=2}^{h-2} \; \sum_{k_i=1}^{n'-h+i-\sum_{j=1}^{i-1} k_j} \binom{n'-1-\sum_{j=1}^{i-1} k_j}{k_i} \left[ \left(1 - \bar{p}_2^{\,k_{i-1}}\right) \bar{p}_2 \right]^{k_i} \left(\bar{p}_2^{\,k_{i-1}}\right)^{n'-1-\sum_{j=1}^{i} k_j} \\
& \times \sum_{k_{h-1}=1}^{n'-1-\sum_{j=1}^{h-2} k_j} \binom{n'-1-\sum_{j=1}^{h-2} k_j}{k_{h-1}} \left(1 - \bar{p}_2^{\,k_{h-2}}\right)^{k_{h-1}} \left(\bar{p}_2^{\,k_{h-2}}\right)^{n'-1-\sum_{j=1}^{h-1} k_j} \\
& \times \left(1 - \bar{p}_2^{\,k_{h-1}}\right).
\end{aligned} \tag{8}
\]

The probability that a node can reach any of its neighbors within $h$ hops is represented as $p_r(\le h)$. The formula is then given as:

\[
p_r(\le h) = \sum_{i=1}^{h} p_r(i), \tag{9}
\]

where $p_r(i)$ is as derived for different values of $i$ earlier in this section. Note that (9) depends neither on the radius $r$ nor on the total number of sensors $n$; it depends on $n'$, $p_1$, and $p_2$.

C. Key Graph Connectivity

In this subsection, we study the key graph connectivity of a given sensor network. A graph is said to be connected if any two of its vertices can be joined by a path, and disconnected otherwise [4]. We say that a key graph, $G_i^h$, is $h$-hop-connected at vertex $i$ if vertex $i$ can reach any other vertex of the key graph with a path no more than $h$ hops in length. We now derive the probability that a key graph, $G_a^h$, maintained by node $a$ is $h$-hop-connected. Our equation derivation includes three stages as shown in Fig. 3. There are $n'$ nodes within $a$'s range. $p_{con}(h)$ is the probability that the key graph $G_a^h$ is $h$-hop-connected. Stage C always represents the final hop in which node $a$ can reach all its neighbors; stage B represents all the intermediate hops when $h \ge 3$; and stage A represents the first hop when $h \ge 2$.

Fig. 3. Node selection for computing $p_{con}(h)$: (a) $h = 1$; (b) $h = 2$; (c) $h \ge 3$.

We first present the formula for $h = 1$ (see Fig. 3(a)). It is easy to see that $p_{con}(1)$ is:

\[
p_{con}(1) = (p_1)^{n'}. \tag{10}
\]

For $h = 2$, as shown in Fig. 3(b), there are two stages: the first stage A and the final stage C. In stage A, $k_1$ out of $n'$ nodes ($1 \le k_1 \le n'$) are selected for the first hop. This is a binomial probability mass function with probability $p_1$. Thus, we have the binomial probability distribution $\sum_{k_1=1}^{n'} \binom{n'}{k_1} p_1^{k_1} \bar{p}_1^{\,(n'-k_1)}$ to represent the first hop. Unlike in the expressions for $p_r(2)$, the value of $k_1$ for $p_{con}(2)$ can be $n'$. In stage C, the probability that each of the $n' - k_1$ nodes shares at least one key with at least one of the $k_1$ nodes is given by $1 - \bar{p}_2^{\,k_1}$. Then, the probability of connecting to all $n' - k_1$ nodes on hop 2 is $\left(1 - \bar{p}_2^{\,k_1}\right)^{n'-k_1}$. $p_{con}(2)$ is given as follows:

\[
p_{con}(2) = \sum_{k_1=1}^{n'} \binom{n'}{k_1} p_1^{k_1} (\bar{p}_1)^{n'-k_1} \left(1 - \bar{p}_2^{\,k_1}\right)^{n'-k_1}. \tag{11}
\]
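Equations (10) and (11) can be evaluated in the same way. The sketch below uses hypothetical function names; the helper `smallest_p1` is not part of the model but illustrates one way such formulas might be used, scanning $p_1$ on a grid (an assumed resolution of 0.01, with $p_2 = 0.5865\,p_1$) until a target two-hop connectivity is met.

```python
from math import comb

def pcon1(n_nb, p1):
    """Equation (10): node a shares a key with every one of its n' neighbors."""
    return p1 ** n_nb

def pcon2(n_nb, p1, p2):
    """Equation (11): every neighbor is reached within two hops."""
    q1, q2 = 1 - p1, 1 - p2
    total = 0.0
    for k1 in range(1, n_nb + 1):
        first_hop = comb(n_nb, k1) * p1 ** k1 * q1 ** (n_nb - k1)
        # Each of the n' - k1 remaining nodes must reach some first-hop node.
        rest_reached = (1 - q2 ** k1) ** (n_nb - k1)
        total += first_hop * rest_reached
    return total

def smallest_p1(n_nb, target, pc=0.5865, steps=100):
    """Grid scan (illustrative): first p1 = k/steps with pcon2 >= target, else None."""
    for k in range(1, steps + 1):
        p1 = k / steps
        if pcon2(n_nb, p1, pc * p1) >= target:
            return p1
    return None

if __name__ == "__main__":
    n_nb = 50                                   # example neighborhood size
    print(pcon1(n_nb, 0.3), pcon2(n_nb, 0.3, 0.5865 * 0.3))
    print(smallest_p1(n_nb, target=0.99))
```

The $k_1 = n'$ term of (11) equals $p_{con}(1)$, so `pcon2` is never smaller than `pcon1` for the same inputs. A grid scan is used rather than bisection so that the sketch does not rely on any monotonicity assumption.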

For h ≥ 3, as shown in Fig. 3(c), there are three stages. The analysis and mathematical expression of stage A are the same as for h = 2. There are h − 2 hops in stage B. For hop 2, we select $k_2$ nodes from the $n' - k_1$ nodes that share keys with nodes in hop 1. We can now proceed by induction: in hop i, we select $k_i$ nodes from $n' - \sum_{j=1}^{i-1} k_j$ nodes; this is a binomial probability mass function with probability $1 - \bar{p}_2^{\,k_{i-1}}$ that each of the hop-i nodes shares at least one key with at least one of the hop-(i − 1) nodes, while $n' - \sum_{j=1}^{i} k_j$ nodes do not share key(s) with the hop-(i − 1) nodes. Then, we conclude that stage B is a sequence of cumulative binomial distribution functions, and each successive function depends on the node selection of the previous hop. The analysis of stage C is the same as for h = 2. Then, we have the following expression for h ≥ 3:

$$p_{con}(h \ge 3) = \sum_{k_1=1}^{n'} \binom{n'}{k_1} p_1^{k_1} (\bar{p}_1)^{n'-k_1} \times \prod_{i=2}^{h-1} \sum_{k_i=1}^{n' - \sum_{j=1}^{i-1} k_j} \binom{n' - \sum_{j=1}^{i-1} k_j}{k_i} \left(1 - \bar{p}_2^{\,k_{i-1}}\right)^{k_i} \times \left(\bar{p}_2^{\,k_{i-1}}\right)^{n' - \sum_{j=1}^{i} k_j} \times \left(1 - \bar{p}_2^{\,k_{h-1}}\right)^{n' - \sum_{j=1}^{h-1} k_j}. \tag{12}$$

In order to deploy a sensor system using RKP schemes, we first select different values of p1 and plug them into (10)–(12) to find a suitable value of p1 that achieves the required connectivity probability pcon(h). Once p1 is found, we can apply (1) to select the proper P (key pool size) and m (the number of keys to be preinstalled in a sensor).

IV. VALIDATION METHODOLOGY AND RESULTS
In (3)–(12), we make the following assumptions: 1) each sensor can communicate directly with n′ neighbors, where n′ is the average number of neighbors of a sensor, 2) each sensor has the same communication radius r′, and 3) p2 is a fixed value. Here, through a systematic approach, we show that our mathematical derivations based on the above assumptions are sound approximations. We consider the following two parameters in our validation: the number of neighbors of a sensor, and the communication radius of a sensor. We also consider two different distributions: 1) sensors are uniformly distributed within a vast area, and 2) a sensor's communication radius follows a normal distribution with mean r′ and standard deviation δ, or a uniform distribution within the interval [r′ − 2δ, r′ + 2δ].

A. Number of Neighbors of a Sensor
We consider A to be a large sensor deployment area, i.e., $A \gg \pi(r')^2$, where r′ is the average transmission range of a sensor and $A' = \pi(r')^2$. We define ρ = n/A as the sensor deployment density. The probability that a node is placed within area A′ is p = A′/A. The probability, p(x), that x of n nodes are placed in the area A′ is:

$$p(x = n') = \binom{n}{n'} p^{n'} (1 - p)^{n-n'}. \tag{13}$$

When n ≫ 1 and A′ ≪ A, we can approximate this solution with a Poisson distribution [19]:

$$p(x = n') \approx \frac{(np)^{n'}}{n'!} e^{-np} \approx \frac{(nA'/A)^{n'}}{n'!} e^{-nA'/A} \approx \frac{(\rho A')^{n'}}{n'!} e^{-\rho A'} \approx \frac{(\rho\pi r'^2)^{n'}}{n'!} e^{-\rho\pi r'^2}. \tag{14}$$

The cumulative Poisson distribution function P(X) is:

$$P(X) = e^{-\rho\pi r'^2} \sum_{n'=0}^{X} \frac{(\rho\pi r'^2)^{n'}}{n'!}. \tag{15}$$

The average number of neighbors of a sensor is:

$$n' = \rho\pi r'^2. \tag{16}$$

Fig. 4. Cumulative Poisson Distribution, r′ = 10. The average number of neighbors of a sensor is n′ = ρπr′² = 50. [Plot omitted: accumulated Poisson distribution (0 to 1) versus number of neighbors (10 to 70).]

Fig. 5. Cumulative Normal Distribution, r′ = 10. δ = 0.05r′ = 0.5 and 2δ = 1. [Plot omitted: accumulated normal distribution (0 to 1) versus communication radius (8.5 to 11.5).]

In order to assign a number of neighbors to a sensor, we can simply map a uniform distribution on the range [0, 1] to the cumulative Poisson distribution. For example, for each sensor, we randomly select a number within the range [0, 1] (y-axis in Fig. 4); based on the histogram shown in Fig. 4, we can find its corresponding x-coordinate, which is the number of neighbors assigned to the sensor.
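The mapping just described is an inverse-CDF lookup, and a minimal sketch of it is given below. It assumes that the cumulative Poisson values are tabulated up to an arbitrary cutoff `max_x`; the function names, the cutoff, and the seed are illustrative choices and are not taken from the validation software.

```python
import bisect
import math
import random


def poisson_cdf(mean, max_x):
    """Return [P(0), P(1), ..., P(max_x)], the cumulative Poisson values."""
    cdf = []
    term = math.exp(-mean)        # probability of exactly 0 neighbors
    total = 0.0
    for x in range(max_x + 1):
        total += term
        cdf.append(total)
        term *= mean / (x + 1)    # probability of exactly x + 1 neighbors
    return cdf


def draw_neighbor_count(cdf, u):
    """Map u in [0, 1] to the smallest X whose cumulative value is >= u."""
    return min(bisect.bisect_left(cdf, u), len(cdf) - 1)


rng = random.Random(1)            # fixed seed, for repeatability only
cdf = poisson_cdf(50.0, 150)      # n' = 50 as in Fig. 4; cutoff 150 is arbitrary
neighbor_counts = [draw_neighbor_count(cdf, rng.random()) for _ in range(10)]
```

Tabulating the cumulative values once and using a binary search keeps each draw cheap, which matters when the procedure is repeated many times.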

Fig. 6. Coverage area with different communication radius. [Figure omitted: sensor i (radius r0) with neighbors j and k (radii r1 and r2), the distance x between two sensors, and the shaded intersection region abcd, drawn for each ordering of r0, r1, and r2.]

B. Generating p2
We assume that sensors have an average communication radius r′ when they are shipped from the factory. We consider both a normal distribution and a uniform distribution to model the communication radius of a sensor. We use mean r′ and standard deviation δ = 0.05r′. The proportion of a normal distribution that lies below a given number of standard deviations from the mean is easily derived: only 2.3% of the population is less than or equal to a value two standard deviations below the mean, and the same proportion lies more than two standard deviations above the mean. Using a similar mapping technique, we can draw the cumulative normal distribution function for a sensor's communication radius. For example, as shown in Fig. 5, we randomly select a value from the y-axis (within the interval marked by the dashed lines) and then assign the corresponding radius on the x-axis to the sensor. Note that the value selected for a sensor is within 2δ of the mean r′. In the case of the uniform distribution, we randomly select a radius within the range [r′ − 2δ, r′ + 2δ] for each sensor.

In the real world, the communication radii of sensors vary. Fig. 6 shows the coverage area of sensors in three different scenarios. Sensor i (with communication radius r0) has two neighbors j and k with communication radii r1 and r2, respectively, where 1) r0 ≤ r1, r2; 2) r1 ≤ r0 ≤ r2 (the analysis of the scenario r2 ≤ r0 ≤ r1 is the same); and 3) r1, r2 ≤ r0. In all of the following analysis, we assume r1 ≤ r2. In order to set up pairwise keys, two sensors must be located within each other's communication range. In Fig. 6, the shaded area is the intersection of the coverage areas of the two sensors with the smaller communication radii. It is easy to prove that, in all three scenarios, the intersected coverage area used to compute pc is the intersection of the coverage areas of the two sensors with the smaller communication radii. We note that, in the scenario r1, r2 ≤ r0 shown in Fig. 6, if k is located in the area between the dashed circle (centered at node i with radius r2) and the circle centered at i (with radius r0), node k will not consider i as its neighbor. Thus, node k must be located within the shaded area. The area abcd shown in Fig. 6 is:

$$A_{abcd}(x) = \cos^{-1}\!\left(\frac{\gamma_1^2 + x^2 - \gamma_2^2}{2x\gamma_1}\right)\gamma_1^2 + \cos^{-1}\!\left(\frac{\gamma_2^2 + x^2 - \gamma_1^2}{2x\gamma_2}\right)\gamma_2^2 - 2\sqrt{S(S - x)(S - \gamma_1)(S - \gamma_2)}, \tag{17}$$

where $S = (x + \gamma_1 + \gamma_2)/2$, and $\gamma_1$ and $\gamma_2$ are the two smaller communication radii among r0, r1, and r2 (we assume $\gamma_1 \le \gamma_2$). In Section III-A, we presented the cumulative distribution function for the distance between a node and one of its neighbors, $F(x) = \Pr(\text{distance} \le x) = x^2/r^2$, where we assume each node has the same communication radius r. The probability density function is $f(x) = dF(x)/dx = 2x/r^2$. For a sensor network with different communication radii, the probability density function is $f(x) = dF(x)/dx = 2x/\gamma_1^2$. Thus, the expected coverage area abcd is given by:

$$A(x) = \int_0^{\gamma_1} A_{abcd}(x) f(x)\,dx. \tag{18}$$

Then, we can compute pc as follows:

$$p_c = A(x)/\pi r_0^2 \tag{19}$$

and p2 is computed using (3).

C. Validation Procedure
Our aim here is to validate the soundness of the probability models pr(h) and pcon(h) derived earlier in Section III. We use the following procedure to estimate pr(h) and pcon(h) for comparison against the models presented in Section III:
1. Select n′ to be the average number of neighbors for a sensor.
2. For the average communication radius of sensors, r′, use the Poisson mapping method presented in Section IV-A to find the number of neighbors of a sensor.
3. Select a radius distribution, and use the method presented in Section IV-B to assign a radius to each sensor.
4. Use (17)–(19) to compute pc for each pair of neighbors, and then use (3) to derive the probability p2 for the same pair. Based on the derived value of p2, assign pairwise keys for each pair of neighbors.
5. Compute pr(h) and pcon(h) based on steps 2 to 4 described above.
The above procedure is run 300,000 times for each distribution selected in step 3, and the average values of pr(h) and pcon(h) are then obtained over the 300,000 runs; note that we consider the uniform and normal distributions separately for this step.
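Steps 3 and 4 of the procedure can be sketched numerically as follows. This is an illustration under stated assumptions, not the released validation software: the function names are ours, the integral in (18) is approximated with a simple midpoint rule, the normal draw inverts the CDF between the two 2δ cut-offs as described for Fig. 5, and the case in which the smaller circle lies entirely inside the larger one (x ≤ γ2 − γ1), which (17) does not cover, is assigned the full area of the smaller circle.

```python
import math
import random
from statistics import NormalDist


def draw_radius(mean_r, delta, rng, normal=True):
    """Draw a communication radius within 2*delta of mean_r."""
    lo, hi = mean_r - 2 * delta, mean_r + 2 * delta
    if not normal:
        return rng.uniform(lo, hi)
    dist = NormalDist(mean_r, delta)
    # Pick a y-value between the two cut-offs, then invert the CDF.
    u = rng.uniform(dist.cdf(lo), dist.cdf(hi))
    return min(max(dist.inv_cdf(u), lo), hi)


def lens_area(x, g1, g2):
    """Intersection area of two circles (radii g1 <= g2) at distance x."""
    if x >= g1 + g2:
        return 0.0
    if x <= g2 - g1:
        return math.pi * g1 ** 2          # assumption: smaller circle contained
    s = (x + g1 + g2) / 2.0
    # Clamp to guard against rounding just outside [-1, 1] or below 0.
    c1 = max(-1.0, min(1.0, (g1 ** 2 + x ** 2 - g2 ** 2) / (2 * x * g1)))
    c2 = max(-1.0, min(1.0, (g2 ** 2 + x ** 2 - g1 ** 2) / (2 * x * g2)))
    heron = max(0.0, s * (s - x) * (s - g1) * (s - g2))
    return (math.acos(c1) * g1 ** 2 + math.acos(c2) * g2 ** 2
            - 2.0 * math.sqrt(heron))


def expected_pc(r0, r1, r2, steps=2000):
    """Approximate p_c: expected intersection area divided by pi * r0^2."""
    g1, g2 = sorted((r0, r1, r2))[:2]     # the two smaller radii
    dx = g1 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx                # midpoint of the i-th slice
        total += lens_area(x, g1, g2) * (2.0 * x / g1 ** 2) * dx
    return total / (math.pi * r0 ** 2)
```

When all three radii are equal, `expected_pc` reduces to the equal-radius case and returns approximately 0.5865, which is a convenient sanity check before using unequal radii.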

Fig. 7. n′ = 10, 30, 50, p1 = 0.1, 0.2, 0.3, radius = 10, 20, 30. (a)–(f): validate pr(h); (g)–(l): validate pcon(h). [Plots omitted. Each sub-figure corresponds to one value of n′ (10, 30, or 50) and contains three plots, one for each of p1 = 0.1, 0.2, 0.3. Each plot shows the probability (pr(h) or pcon(h)) against the number of hops h = 1, ..., 8, with curves labeled radius=10, radius=30, radius=50, and theoretical. In sub-figures (a)–(c) and (g)–(i) the radius r follows a uniform distribution; in sub-figures (d)–(f) and (j)–(l) the radius r follows a normal distribution.]

TABLE III
THE CONNECTIVITY PROBABILITY FOR key graph: pr(≤h) METHOD VS. pcon(h) METHOD VS. RGA METHOD

n′     pr(≤h)♯ (p1)   pcon(h)♮ (p1)   RGA♭ (p1)
10     0.982          0.998           2.072
15     0.905          0.946           1.381
21     0.689          0.814           0.987
30     0.531          0.649           0.691
40     0.422          0.528           0.518
50     0.352          0.446           0.414
60     0.304          0.387           0.345
70     0.269          0.340           0.296
80     0.241          0.305           0.259
90     0.220          0.277           0.230
100    0.203          0.255           0.207

♯: h = 5 and pr(≤5) > 0.99999; ♮: h = 5 and pcon(5) > 0.99999; ♭: Pc = 0.99999, n = 10000.

D. Validation Results
In Fig. 7, we plot the average values computed by the validation procedure together with the theoretical results derived from our proposed probability models pr(h) (see sub-figures (a)–(f)) and pcon(h) (see sub-figures (g)–(l)). In Fig. 7, each sub-figure shows three scenarios with p1 = 0.1, 0.2, 0.3 for one of the values n′ = 10, 30, 50, and each plot compares three values of the radius (10, 30, 50) against the theoretical result. In our validation process, we assume that a sensor's communication radius follows either a uniform distribution over the range [r′ − 2δ, r′ + 2δ] or a normal distribution with mean r′ and standard deviation δ. Our results show that the proposed probability models fit the results of the validation procedure. Readers might notice the gaps between the simulation results and the theoretical results in sub-figures (g)–(l). Note, however, that the evaluated probabilities (see y-axis) are very small; thus, this difference has little practical significance, and our analytical model can be considered accurate.

E. Comparison of key graph connectivity methods
Next, we compare key graph connectivity. We can find the key graph connectivity probability by using any of the following three methods:
• Using the probability that a node connects to any of its neighbors within h hops (see (9)), we can derive the fraction of neighbors of a node for which pairwise keys can be set up within h hops. This approach only provides the number of neighbors (given by n′ · pr(h)) that connect to a node within h hops. When pr(h) ≈ 1, a node can set up pairwise keys with practically all its neighbors, and we can then say that the key graph is connected.
• The second method is as described in Section III-C (see (10)–(12)). This method considers only directly reachable nodes and the pairwise-key sharing relations among sensors, without considering the geographical location of each sensor, which significantly reduces the analytical complexity. In addition, it provides the key path length information, which is valuable for sensor network designers when evaluating or designing a sensor system.
• The third evaluation method, RGA, is as described in Section II-B.2 and is due to [10].

In Table III we compare the three approaches. Note that the RGA method requires the total number of sensors (n), while our approach does not. We find that the RGA method may produce inconsistent results, especially when the neighborhood size is relatively small; for example, it yields p1 > 1 when n′ ≤ 20. In addition, compared to the pcon method, the RGA method requires a higher p1 value when n′ < 40 and a lower p1 value when n′ > 40. Since our model has been validated via simulation (see Section IV-D), our comparative results in Table III indicate that the RGA method can result in inaccurate parameter settings in sensors.
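The RGA column of Table III can be reproduced with a few lines of code. Section II-B.2 is not repeated here, so the sketch below assumes the random-graph connectivity threshold in the form used in [10]: a required node degree d = ((n − 1)/n)(ln n − ln(−ln Pc)), scaled to the neighborhood as p1 = d/n′. This form and the function name `rga_p1` should be read as assumptions for illustration.

```python
import math


def rga_p1(n, pc, n_prime):
    """Required p1 under the RGA method (assumed random-graph threshold form)."""
    c = -math.log(-math.log(pc))              # from Pc = exp(-exp(-c))
    degree = (n - 1) / n * (math.log(n) + c)  # required average node degree
    return degree / n_prime                   # scaled to the neighborhood size


# Neighborhood sizes and RGA values as listed in Table III.
table_iii_rga = {10: 2.072, 15: 1.381, 21: 0.987, 30: 0.691, 40: 0.518,
                 50: 0.414, 60: 0.345, 70: 0.296, 80: 0.259, 90: 0.230,
                 100: 0.207}
computed = {k: rga_p1(10000, 0.99999, k) for k in table_iii_rga}
```

With n = 10000 and Pc = 0.99999 the required degree is about 20.7, which is why the method returns p1 > 1, an infeasible probability, whenever n′ ≤ 20.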

F. Using Connectivity Probability pcon(h) or pr(≤h) to Deploy Sensor Networks
Our model helps sensor network designers to deploy sensor networks and to evaluate a random key predistribution (RKP) scheme under given storage constraints and network configurations. We briefly illustrate this aspect. For instance, we might have the following requirement: “Deploy a uniformly-deployed sensor network such that each node can establish pairwise keys, within 5 hops, to 99.999% of all its neighboring sensors.” This requirement can be translated to the following parameters used in our model: pcon(h) = 0.99999 and h = 5. Once pcon(h) is fixed, we need to determine the important RKP parameter, p1. Recall that the selection of p1 is determined by evaluating Equation (12). Since pcon(h) and h are known, there are three variables in Equation (12): n′, p2, and p1. From Equation (3), we know that p2 is a function of p1. Thus, the problem translates to solving Equation (12) in order to determine the unknowns n′ and p1. The resulting relations between different values of n′ and p1 can be found in Table III. For example, if we choose n′ = 30, then p1 = 0.649. In fact, we can create multiple similar tables for different connectivity requirements. The next step is to determine the RKP scheme parameters. We choose three RKP schemes for our analysis: the basic scheme, the q-composite scheme, and the sk-RKP scheme (see Table I). The mathematical representations of p1 for each RKP scheme are given in Section II-B.1 and in Appendices A and B. For additional information about these schemes, refer to [10], [6], and [15], [8], respectively. Thus, we can use equations (1), (20), and (21) to determine the number of keys to be installed in each sensor and the size of the key pool.
Based on the above discussion, our analytical model provides a compact mathematical representation of key-graph connectivity for a given number of hops. This model can help network designers to evaluate the communication overhead (restricted by the number of hops, i.e., h) while taking into account the storage capability of sensors (i.e., the value of m).
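For the basic scheme, the final step can be sketched as a search over m. Equation (1) is not reproduced in this section, so the code assumes that it has the form given in [10], p1 = 1 − ((P − m)!)² / ((P − 2m)! P!), which equals 1 − C(P − m, m)/C(P, m). This form, the pool size used in the example, and the function names are assumptions made for illustration.

```python
from math import comb


def basic_p1(pool_size, m):
    """Probability that two key rings of m keys share at least one key."""
    # comb(pool_size - m, m) is 0 when 2m > pool_size, which gives p1 = 1.
    return 1.0 - comb(pool_size - m, m) / comb(pool_size, m)


def min_keys(pool_size, target_p1):
    """Smallest m whose sharing probability reaches target_p1."""
    for m in range(1, pool_size + 1):
        if basic_p1(pool_size, m) >= target_p1:
            return m
    raise ValueError("target probability not reachable")


# Example: p1 = 0.649 (n' = 30 in Table III) with a hypothetical pool of 10000 keys.
keys_needed = min_keys(10000, 0.649)
```

Because the sharing probability grows with m for a fixed P, a linear scan is sufficient here; for very large pools a bisection over m would serve the same purpose.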

V. COMMUNICATION OVERHEAD ANALYSIS
In this section, we use our proposed probability model to analyze the communication overhead involved in the PKE phase. We analyze the required number of hops to set up a pairwise key and the communication overhead for PKE distributed over each hop. Since sensors establish pairwise keys only via neighboring nodes in our model, our analysis is based on the average number of neighbors (n′) and is independent of the size of the sensor network (n). The number of neighbors (n′) of a sensor is usually less than 100. For our computations, when pr(≤h) ≥ 0.99999, we assume that a sensor can set up pairwise keys with all its neighbors within h hops. We refer to the condition pr(≤h) ≥ p as the p-fraction h-connected condition. The special case pr(≤h) ≥ 0.99999 is referred to as the strong h-connected condition. In Table IV, we show the probability pr(h) under the conditions strong 4-connected and 0.5-fraction 8-connected. It may be noted that the communication overhead during pairwise key establishment for strong h-connected sensor networks is mainly distributed within the first 3 hops. As shown in Fig. 8, as the neighborhood size increases, the pairwise key establishment communication overhead shifts from hop 1 to hop 2 and drops dramatically on hop 3. On the other hand, the pairwise key establishment communication overhead for a 0.5-fraction h-connected sensor network is distributed over more than 3 hops. Fig. 9 shows that the probability curves become flatter as the neighborhood size increases and that the probability peaks shift to higher hop numbers. We define the communication weight on hop i as pr(i)/pr(≤h). We notice that the highest communication weight for each curve also shifts to a higher hop number as the neighborhood size increases. Thus, we summarize our findings as follows:
1) For strong h-connected (pr(≤h) ≥ 0.99999):
a) the pairwise key establishment communication overhead is mainly distributed within the first three hops.
b) the pairwise key establishment communication overhead shifts from the first hop to the second hop when n′ increases (10 ≤ n′ ≤ 100).
2) For p-fraction h-connected (0 < pr(≤h) ≤ 0.99999):
a) For given n′ and maximum number of hops h, when pr(≤h) decreases, the pairwise key establishment communication overhead shifts from lower-numbered hops to higher-numbered hops, the value of the peak probability decreases and shifts to a higher-numbered hop, the probability curve becomes flatter, and p1 (= pr(1)) decreases (see the changes from Fig. 8 to Fig. 9).
b) For a given pr(≤h) ≪ 0.99999 and maximum number of hops h, when n′ increases, the pairwise key establishment communication overhead shifts from lower-numbered hops to higher-numbered hops, the value of the peak probability decreases, the probability curve becomes flatter, and p1 (= pr(1)) decreases (see Fig. 9).
3) From findings 1) and 2), we make the following observations: for a fixed n′ and maximum number of hops h, a decrease in pr(≤h) causes a decrease in p1; for a fixed pr(≤h), an increase in n′ causes a decrease in p1; an increase in the maximum number of allowed hops h also causes a decrease in p1. These observations for the PKE phase lead us to the study of the tradeoff between the communication overhead, which is restricted by h, and the storage overhead of a sensor, which is restricted by p1. It may be noted from (1) that, for a fixed P, the smaller the p1, the smaller the m. On the other hand, the smaller the maximum number of hops h, the lower the communication overhead involved. The exact tradeoff study between p1 and h for different scenarios requires additional analysis, which is out of the scope of this paper.

TABLE IV
COMMUNICATION OVERHEAD ANALYSIS FOR pr(≤4) ≥ 0.99999 AND pr(≤8) ≥ 0.5

          n′ = 10               n′ = 30               n′ = 50               n′ = 70
Hop (i)   pr(i)      Weight(%)  pr(i)      Weight(%)  pr(i)      Weight(%)  pr(i)      Weight(%)
1         0.9973     99.73      0.649      64.9       0.446      44.6       0.340      34
2         0.0027     0.27       0.3509     35.09      0.5527     55.27      0.6548     65.48
3         ≪ 0.0001   ≈0         0.00009    0.009      0.0013     0.13       0.0052     0.52
4         ≪ 0.0001   ≈0         ≪ 0.0001   ≈0         ≪ 0.0001   ≈0         ≪ 0.0001   ≈0
pr(≤4)    0.99999               0.99999               0.99999               0.99999
1         0.2050     40.98      0.0851     17.00      0.0540     10.79      0.0400     7.982
2         0.1599     31.97      0.1062     21.23      0.0761     15.21      0.0602     12.02
3         0.0853     17.05      0.1060     21.17      0.0913     18.25      0.0798     15.92
4         0.0348     6.948      0.0844     16.86      0.0904     18.07      0.0893     17.82
5         0.0115     2.299      0.0566     11.31      0.0751     15.01      0.0840     16.76
6         0.0031     0.616      0.0338     6.760      0.0546     10.91      0.0682     13.61
7         0.0006     0.129      0.0187     3.733      0.0362     7.24       0.0497     9.926
8         0.0001     0.020      0.0097     1.938      0.0227     4.53       0.0299     5.975
pr(≤8)    0.5002                0.5005                0.5004                0.5011

Fig. 8. Sensor network key establishment communication overhead distribution for pr(≤4) ≥ 0.99999. [Plots omitted: probability that two nodes share key(s), pr(h), and communication weight for each hop (%), each against the number of hops (1 to 3), for n′ = 10, 30, 50, 70.]

Fig. 9. Sensor network key establishment communication overhead distribution for pr(≤8) ≥ 0.5. [Plots omitted: probability that two nodes share key(s), pr(h), and communication weight for each hop (%), each against the number of hops (1 to 8), for n′ = 10, 30, 50, 70.]

VI. SUMMARY
In this paper, we have derived two analytical probability models for large-scale sensor networks to analyze the PKE phase for RKP schemes. Through a validation procedure, we show the robustness of our models. Our models can help designers to analyze the PKE phase for RKP schemes in the following ways: 1) studying the key graph connectivity helps to determine the number of keys to be preinstalled in each sensor and the range of a sensor, and 2) the pairwise key establishment path length helps to determine the communication overhead during pairwise key establishment and thus to evaluate whether a designed PKE scheme fulfils the energy consumption requirements for a sensor. The software for the analytical models and the validation process is available at http://www.csee.umkc.edu/~dmedhi/software/.

REFERENCES
[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “A survey on sensor networks,” IEEE Communications Magazine, vol. 40, pp. 102–114, August 2002.
[2] R. Blom, “An optimal class of symmetric key generation systems,” in EUROCRYPT’84, ser. Lecture Notes in Computer Science, vol. 209. Paris, France: Springer-Verlag, 1985, pp. 335–338.
[3] C. Blundo, A. D. Santis, A. Herzberg, S. Kutten, U. Vaccaro, and M. Yung, “Perfectly-secure key distribution for dynamic conferences,” Information and Computation, vol. 146, no. 1, pp. 1–23, 1998.
[4] B. Bollobás, Modern Graph Theory. Springer-Verlag, 1998.
[5] D. W. Carman, P. S. Kruus, and B. J. Matt, “Constraints and approaches for distributed sensor network security,” NAI Lab, Tech. Rep., September 2000.
[6] H. Chan, A. Perrig, and D. Song, “Random key predistribution schemes for sensor networks,” in Proceedings of 2003 Symposium on Security and Privacy. Los Alamitos, CA: IEEE Computer Society, 2003, pp. 197–215.
[7] R. Di Pietro, L. V. Mancini, and A. Mei, “Efficient and resilient key discovery based on pseudo-random key pre-deployment,” in Proceedings of 18th International Parallel and Distributed Processing Symposium (IPDPS’04), April 2004.
[8] W. Du, J. Deng, Y. S. Han, and P. K. Varshney, “A pairwise key pre-distribution scheme for wireless sensor networks,” in Proceedings of 10th ACM Conference on Computer and Communications Security (CCS’03), October 2003, pp. 42–51.
[9] W. Du, J. Deng, Y. S. Han, P. Varshney, J. Katz, and A. Khalili, “A pairwise key pre-distribution scheme for wireless sensor networks,” accepted by the ACM Transactions on Information and System Security, 2005.
[10] L. Eschenauer and V. D. Gligor, “A key-management scheme for distributed sensor networks,” in Proceedings of 9th ACM Conference on Computer and Communication Security (CCS-02), November 2002, pp. 41–47.
[11] V. D. Gligor and P. Donescu, “Fast encryption and authentication: XCBC encryption and XECB authentication modes,” in Proceedings of 2nd NIST Workshop on AES Modes of Operation, August 2001.
[12] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. E. Culler, and K. S. J. Pister, “System architecture directions for networked sensors,” in Proceedings of Architectural Support for Programming Languages and Operating Systems, 2000, pp. 93–104.
[13] D. Huang, M. Mehta, and D. Medhi, “Source routing based pairwise key establishment protocol for sensor networks,” in Proceedings of 24th IEEE International Performance Computing and Communications Conference, 2005, pp. 177–183.
[14] D. Huang, M. Mehta, D. Medhi, and H. Lein, “Location-aware key management scheme for wireless sensor networks,” in Proceedings of ACM Workshop on Security of Ad Hoc and Sensor Networks (SASN ’04), October 2004, pp. 29–42.
[15] D. Liu and P. Ning, “Establishing pairwise keys in distributed sensor networks,” in Proceedings of 10th ACM Conference on Computer and Communications Security (CCS’03), October 2003, pp. 52–61.
[16] D. Liu, P. Ning, and R. Li, “Establishing pairwise keys in distributed sensor networks,” ACM Transactions on Information and System Security, vol. 8, no. 1, pp. 41–77, 2005.
[17] M. Mehta, D. Huang, and L. Harn, “RINK-RKP: A scheme for key predistribution and shared-key discovery in sensor networks,” in Proceedings of 24th IEEE International Performance Computing and Communications Conference, 2005.
[18] M. D. Penrose, “On k-connectivity for a geometric random graph,” Random Structures and Algorithms, vol. 15, no. 2, pp. 145–164, September 1999.
[19] P. E. Pfeiffer and D. A. Schum, Introduction to Applied Probability. New York: Academic Press, 1973.
[20] J. H. Spencer, The Strange Logic of Random Graphs, ser. Algorithms and Combinatorics, vol. 22. Springer-Verlag, 2001.
[21] F. Xue and P. R. Kumar, “The number of neighbors needed for connectivity of wireless networks,” Wireless Networks, vol. 10, pp. 169–181, 2004.
[22] S. Zhu, S. Xu, S. Setia, and S. Jajodia, “Establishing pair-wise keys for secure communication in ad hoc networks: A probabilistic approach,” in Proceedings of 11th IEEE International Conference on Network Protocols (ICNP), November 2003.

APPENDIX
We briefly review here the mathematical background for two RKP schemes: the q-composite scheme [6] and the sk-RKP scheme [15], [16], [8], [9].

A. q-composite scheme
According to [6], setting up a secure link requires at least q shared keys. The probability that two nodes share exactly q keys is denoted as p(q) and is given by:

$$p(q) = \frac{\binom{P}{q}\binom{P-q}{2(m-q)}\binom{2(m-q)}{m-q}}{\binom{P}{m}^2}.$$

The probability that two nodes can set up a secure link is:

$$p_1(q) = p(q) + p(q+1) + \ldots + p(m). \tag{20}$$

B. sk-RKP scheme
The sk-RKP scheme has been independently proposed by Du et al. [8], and Liu and Ning [15]. In this scheme, the key pool P of RKP schemes is constructed from ω key spaces, and each key space (i.e., a key matrix) is structured as N sub-key-spaces (i.e., arrays of keys); the size of each key space is λ + 1. For a structured key pool, the number of keys (m) preinstalled in a sensor is given by m = τ(λ + 1), where τ is the number of key spaces selected for each sensor, with 2τ < ω, and λ + 1 is the number of keys installed in each sensor from each of the selected key spaces. The probability p1 that two sensor nodes share at least one key is:

$$p_1 = 1 - \frac{\binom{\omega}{\tau}\binom{\omega-\tau}{\tau}}{\binom{\omega}{\tau}^2} = 1 - \frac{((\omega - \tau)!)^2}{(\omega - 2\tau)!\,\omega!}. \tag{21}$$
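Both appendix expressions are straightforward to evaluate with exact integer binomials. The sketch below is illustrative; the function names are ours, and the small parameter values in the usage lines are chosen only so that the results can be checked by hand.

```python
from math import comb


def shared_exactly(pool_size, m, i):
    """Probability that two key rings of m keys share exactly i keys."""
    numerator = (comb(pool_size, i)
                 * comb(pool_size - i, 2 * (m - i))
                 * comb(2 * (m - i), m - i))
    return numerator / comb(pool_size, m) ** 2


def q_composite_p1(pool_size, m, q):
    """Probability of sharing at least q keys, as in (20)."""
    return sum(shared_exactly(pool_size, m, i) for i in range(q, m + 1))


def sk_rkp_p1(omega, tau):
    """Probability of sharing at least one key space, as in (21)."""
    return 1.0 - comb(omega - tau, tau) / comb(omega, tau)


# Small hand-checkable cases: a pool of 10 keys with rings of 3 keys,
# and 4 key spaces with 1 space per sensor.
p_link_q2 = q_composite_p1(10, 3, 2)
p_link_sk = sk_rkp_p1(4, 1)
```

A useful consistency check is that the exact-share probabilities for i = 0, ..., m sum to 1, and that q = 1 reduces to the basic-scheme probability of sharing at least one key.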

Dijiang Huang (M’00/ACM’00) received his B.S. degree from Beijing University of Posts & Telecommunications, China, in 1995. He received his M.S. and Ph.D. degrees from the University of Missouri–Kansas City in 2001 and 2004, respectively. He is an Assistant Professor in the Computer Science & Engineering Department at Arizona State University. His current research interests are computer networking, security, and privacy.

Manish Mehta is currently a Senior Software Engineer at Tumbleweed Communications, Redwood City, CA. He earned his Ph.D. in Computer Science in 2006 and M.S. in Computer Science in 2002 from University of Missouri–Kansas City (UMKC), USA. He earned his B.E. in Computer Engineering from Mumbai University, India in 1999. His research interests are in Cryptography, Network Security, and sensor networks.

Appie van de Liefvoort is a Professor and Chair of the Department of Computer Science and Electrical Engineering at the University of Missouri–Kansas City, where he has been since 1987. Prior to joining UMKC, he was a faculty member at the University of Kansas. He received graduate degrees in Computer Science and Mathematics from the University of Nebraska–Lincoln and from the Katholieke Universiteit in Nijmegen, the Netherlands, respectively. His research interests include queueing theory and performance modeling of computer and communication networks, specializing in linear algebraic queueing theory, the matrix-exponential distribution, and correlated matrix-exponential sequences.

Deep Medhi is Professor of Computer Networking, Computer Science and Electrical Engineering Department at the University of Missouri-Kansas City, USA. He received B.Sc. (Hons) in Mathematics from Cotton College/Gauhati University, India, M.Sc. in Mathematics from the University of Delhi, India, and Ph.D. in Computer Sciences from the University of Wisconsin-Madison, USA. Prior to joining UMKC in 1989, he was a member of technical staff at AT&T Bell Laboratories. He was an invited visiting professor at the Technical University of Denmark and a visiting research fellow at Lund Institute of Technology, Sweden. He is a Fulbright senior specialist. His research interests are resilient multi-layer network design, network routing and design, sensor networks. He has published over seventy papers, and is co-author of the book Routing, Flow, and Capacity Design in Communication and Computer Networks (2004), and the forthcoming book Network Routing: Algorithms, Protocols, and Architectures, both published by Morgan Kaufmann Publishers.