IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 6, NO. 1, JANUARY 2007

Inner-Circle Consistency for Wireless Ad Hoc Networks

Claudio Basile, Member, IEEE, Zbigniew Kalbarczyk, Member, IEEE, and Ravishankar K. Iyer, Fellow, IEEE

Abstract. This paper proposes and evaluates strategies to build reliable and secure wireless ad hoc networks. Our contribution is based on the notion of inner-circle consistency, where local node interaction is used to neutralize errors/attacks at the source, both preventing errors/attacks from propagating in the network and improving the fidelity of the propagated information. We achieve this goal by combining statistical (a proposed fault-tolerant cluster algorithm) and security (threshold cryptography) techniques with application-aware checks to exploit the data/computation that is partially and naturally replicated in wireless applications. We have prototyped an inner-circle framework and used it to demonstrate the idea of inner-circle consistency in two significant wireless scenarios: 1) the neutralization of black hole attacks in AODV networks and 2) the neutralization of sensor errors in a target detection/localization application executed over a wireless sensor network.

Index Terms. Intrusion tolerance, ad hoc networks, sensor networks, security, reliability.

1 INTRODUCTION

A wireless ad hoc network is a group of nodes that are capable of forming a network without any preexisting infrastructure. Wireless ad hoc networks (ranging from mobile networks of laptops/PDAs to sensor networks) are highly unstable, highly susceptible to accidental errors (in software and hardware components), and easy targets of security attacks. Importantly, these problems stem from the very nature of wireless networks, i.e., node mobility, deployment in harsh environments, need for low-cost solutions, limited availability of communication/computation/energy resources, and broadcast communication [1]. The goal of this paper is to propose and evaluate strategies to build wireless ad hoc networks that continue to operate correctly in hostile computing environments, even if some of the nodes have been compromised by errors¹ or attacks. We make the following contributions:

• Introduction of the notion of inner-circle consistency, where local node interaction (in a one-hop neighborhood) is used to neutralize errors/attacks at the source, both preventing errors/attacks from propagating in the network and improving the fidelity of the propagated information. We achieve this goal by combining
  1. a secure topology service, which discovers the local network topology;
  2. a deterministic voting technique, which embeds application-aware checks that validate the information disseminated by exploiting the data/computation that is partially and naturally replicated in wireless applications (e.g., neighboring AODV [2] nodes may compute similar routing information, while nearby sensor nodes may collect similar environmental data);
  3. a statistical voting technique, which improves the accuracy of the propagated information and removes hidden error/attack data by means of a proposed fault-tolerant cluster algorithm; and
  4. threshold cryptography, which guarantees message integrity despite node intrusions.
• Design and formal specification of an inner-circle framework for wireless ad hoc nodes, a reconfigurable architecture that provides a common substrate in which one can embed a wide range of error/attack-neutralization techniques. By trading off a targeted dependability level (i.e., guarantees on message integrity) against resource usage, the framework can be scaled to the communication, computation, and energy resources available on a wireless node. The architecture spans both software modules and hardware modules (a Crypto-Processor for tamper-resistant key storage plus signature creation/verification, introduced in [3], and a Fault-Tolerant Cluster Processor for faulty/malicious data masking, introduced in Section 4.4.1).
• Prototype and evaluation of the inner-circle framework with the ns-2 network simulator [4] using two significant wireless scenarios: 1) the neutralization of black hole attacks in AODV networks and 2) the neutralization of sensor errors in a target detection/localization application run over a wireless sensor network. The first scenario is motivated by the devastating impact of black hole attacks (e.g., in our experiments, the network throughput, measured as the ratio between received and sent packets, drops from 96 percent to 0.7 percent) and by the absence of an effective countermeasure at the time of this writing [5], [6]. We show that the inner-circle approach keeps the throughput under attack above 60 percent. The second scenario is motivated by experimental evidence (e.g., [7]) that wireless sensor networks can be very unreliable because sensor devices, which interact directly with the environment, can fail days before the node electronics do. We show that using the inner-circle approach in a sensor network can halve energy consumption (doubling network lifetime) while providing a four- to six-fold improvement in target detection latency and target localization accuracy.

1. The focus of this work is not on transmission errors (e.g., due to fading), but on errors in the computation, communication, or sensing units of wireless ad hoc nodes. Causes of these errors include hardware transients, software bugs, and device degradation.

C. Basile is with Google Inc., 1600 Amphitheater Parkway, Mountain View, CA 94301. E-mail: [email protected].
Z. Kalbarczyk and R.K. Iyer are with the Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign, 1308 W. Main St., Urbana, IL 61801. E-mail: {kalbar, iyer}@crhc.uiuc.edu.
Manuscript received 6 Nov. 2005; revised 15 Mar. 2006; accepted 18 Apr. 2006; published online 15 Nov. 2006. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TMC-0321-1105.
1536-1233/07/$20.00 © 2007 IEEE. Published by the IEEE CS, CASS, ComSoc, IES, & SPS

2 SYSTEM MODEL

The system considered comprises a set N of mobile nodes that communicate by exchanging messages through wireless channels. A node does not know the complete set N but can discover nodes in its proximity by means of periodic beacons. Correct nodes are associated with unique ids that they maintain throughout their life and are aware of their geographic position [8]. A threshold cryptography scheme is available [9].² Given a message $\langle\ldots\rangle$, we denote with $K_p(\langle\ldots\rangle)$ the message signature produced by a node p with its secret key $K_p$. Also, we indicate with $\langle\ldots\rangle_{K_p}$ a signed message $\langle\ldots, K_p(\langle\ldots\rangle)\rangle$. A dependability level is an integer $L \geq 1$ that a (source) node x specifies when it wants to diffuse a piece of information I in the network. Example criteria for the choice of L include the importance of the information and the size of the node's neighborhood. The inner-circle approach permits a (remote) recipient node y of information I to infer whether I was agreed upon by L neighbors of source x. To support this mechanism, we limit dependability level L to vary within a predetermined range (e.g., $1 \leq L \leq 10$) and associate a secret signing key $K_L$ with each value of L. $K_L$ is not disclosed to any node; instead, each node only has an $(L+1)$-threshold share of $K_L$ (thus, $L+1$ nodes must cooperate to sign a message with secret key $K_L$). For simplicity, this paper assumes that wireless nodes obtain their signing key shares from a trusted dealer at system initialization time; we do not discuss potential extensions to support proactive secret sharing [10]. The system computation is modeled by extending the timed asynchronous model [11] to wireless ad hoc networks. Each node has access to a local hardware clock with bounded drift rate, but node clocks are not required to be globally synchronized.
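The threshold property that governs $K_L$ (any $L+1$ shares suffice, fewer do not) can be illustrated with Shamir secret sharing, the classic threshold primitive. This is a sketch under our own assumptions: the prime modulus, the function names, and the idea of reassembling the key are ours. A threshold signature scheme such as the one cited in [9] combines signature shares without ever reconstructing $K_L$ in one place; the code only shows why $L+1$ cooperating nodes are needed.

```python
import random

# Illustrative field modulus (a Mersenne prime); not a parameter from the paper.
PRIME = 2**127 - 1


def make_shares(secret, level, n, rng=random):
    """Split `secret` into n shares so that any level + 1 of them recover it.

    A random polynomial of degree `level` whose constant term is `secret`
    is evaluated at x = 1..n (Shamir secret sharing).
    """
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(level)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def recover(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    rng = random.Random(7)
    key = 123456789  # stands in for K_L
    shares = make_shares(key, level=3, n=8, rng=rng)  # L = 3, eight nodes
    assert recover(shares[:4]) == key    # L + 1 = 4 shares suffice
    assert recover(shares[3:7]) == key   # any 4 shares work
```

With L = 3, a polynomial of degree 3 is fixed by four points, which is exactly why four cooperating nodes are required and three are not enough.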
When reasoning about a system, we refer to a global clock whose ticks coincide with the real numbers $\mathbb{R}$ in some right-unbounded range T, and we let $t_0 = \min T$ be the system startup time. We capture the execution of a wireless ad hoc network through the notion of a system run, which is defined as a triple $\sigma = (H, F, \mathcal{L})$, where H is a global history, F is a node failure function, and $\mathcal{L}$ is a timely connectivity function. Also, we make a number of assumptions on the power of an adversary.

Global History. The execution of the system results in each node performing an event (possibly null) chosen from a set E; example events include communication events such as send, bcast, and recv. A global history of a system execution is a function $H : N \times T \rightarrow E \cup \{\bot\}$, where $\bot$ denotes the null event. If node p executes an event $e \in E$ at time t, then $H(p, t) = e$; otherwise, $H(p, t) = \bot$ if node p performs no event at time t.

Node Failure. Nodes may fail by crashing or by becoming Byzantine. The cause of a node failure may be an accidental error (e.g., a transient in the hardware or a software bug) or an adversary that has compromised the node (e.g., by stealing the node's secret keys or by reprogramming its software to execute malicious code). The evolution of node failures during a system execution is captured through the node failure function $F : T \rightarrow 2^N$, where $F(t)$ denotes the set of nodes that have failed at time t. For simplicity, we do not consider node recoveries; hence, $F(t) \subseteq F(t')$ for $t' > t$. To distinguish crashes from Byzantine failures, we introduce a node crash failure function $F^C$ and a node Byzantine failure function $F^B$ such that $F^C(t) \cup F^B(t) = F(t)$ at any time t. With $C = \{p \in N \mid \forall t \in T : p \notin F(t)\}$, we indicate the set of correct nodes, i.e., those nodes that never fail.

Timely Connectivity. In the absence of failures and node movement, each node is stably connected to its neighbors, and wireless communication channels are timely, i.e., if a node p sends a message to a neighbor q at a time t, then q receives the message by time $t + \delta$ (where $\delta$ includes both transmission and processing delays). In that case, we say that p is t-connected with q at time t.

2. In a t-threshold cryptography scheme, each participant has a share of a secret signing key and can generate shares of a signature of a given ciphertext. If and only if t + 1 valid signature shares are available, one can generate a full signature of the original ciphertext.
Nonetheless, the system executions we consider are subject to communication failures, which can be temporary (e.g., a single untimely reception due to collisions) or permanent (e.g., no further reception due to node movement or adversary jamming). The evolution of communication failures and repairs during a system execution is captured through the timely connectivity function $\mathcal{L} : N \times T \rightarrow 2^N$. If $q \in \mathcal{L}(p, t)$, we say that node p is t-connected with node q at time t, and we write $p \overset{t}{\rightsquigarrow} q$. If q is also t-connected with p at time t, then we say that p and q are t-connected and we write $p \overset{t}{\leftrightsquigarrow} q$. We assume that a correct node is always t-connected with itself, i.e., $\forall p \in C, \forall t \in T : p \overset{t}{\rightsquigarrow} p$. Communication failures can be unpredictable; thus, $\mathcal{L}(p, t)$ and $\mathcal{L}(q, t')$, with $t \neq t'$, may differ arbitrarily. To model communication between two nodes that are not directly connected, we extend the $\overset{t}{\rightsquigarrow}$ relation. We say that node p is k-hop t-connected with node q at time t, and we write $p \overset{t}{\rightsquigarrow}_k q$, if there is a simple path from p to q of $k' \leq k$ timely links passing through only correct intermediate nodes; formally,

$$p \overset{t}{\rightsquigarrow}_k q \iff \exists\, p_1, \ldots, p_{k'-1} \in C \mid 1 \leq k' \leq k : \Big(p \overset{t}{\rightsquigarrow} p_1 \wedge \bigwedge_{i=1}^{k'-2} p_i \overset{t}{\rightsquigarrow} p_{i+1} \wedge p_{k'-1} \overset{t}{\rightsquigarrow} q\Big) \wedge \big(\forall i, j \in \{1, \ldots, k'-1\} \mid i \neq j : p_i \neq p_j \wedge p_i \neq p \wedge p_i \neq q\big).$$

If q is also k-hop t-connected with p, we say that p and q are k-hop t-connected and we write $p \overset{t}{\leftrightsquigarrow}_k q$.
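Because the k-hop relation only asks for some simple path of at most k timely links whose intermediate nodes are correct, it can be decided on a connectivity snapshot with breadth-first search. A minimal sketch, assuming the timely connectivity function at one fixed time t is given as a dictionary from each node to the set of nodes it is t-connected with; the function names and data layout are ours.

```python
from collections import deque


def k_hop_connected(links, correct, p, q, k):
    """Return True if p reaches q over at most k timely links whose
    intermediate nodes are all correct (the k-hop t-connected relation).

    links:   dict mapping each node to the set of nodes it is t-connected
             with at the instant of interest (a snapshot, not a history).
    correct: the set C of correct nodes.
    """
    visited = {p}
    frontier = deque([(p, 0)])  # (node, number of links used so far)
    while frontier:
        node, hops = frontier.popleft()
        if hops >= k:
            continue  # no budget left for another link
        for nxt in links.get(node, ()):
            if nxt == q:
                return True  # reached q with hops + 1 <= k links
            # Only correct, not-yet-visited nodes may serve as intermediates,
            # so every explored path is simple.
            if nxt in correct and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, hops + 1))
    return False


def k_hop_bidirectional(links, correct, p, q, k):
    """Symmetric relation: p and q are k-hop t-connected in both directions."""
    return (k_hop_connected(links, correct, p, q, k)
            and k_hop_connected(links, correct, q, p, k))
```

BFS reaches every node by a shortest path first, so if any path of at most k links exists through correct nodes, the search finds one within the hop budget. Note that q itself need not be correct, matching the definition, which constrains only the intermediates.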


Fig. 1. Semantic definition of temporal formulas.

Adversary. We make the following assumptions about an adversary: 1) The cryptographic primitives used by correct nodes are secure. 2) The adversary has limited jamming range and cannot disrupt communication in the whole network. 3) A compromised node that colludes with other compromised nodes by sharing identities/secret keys [12] (e.g., through out-of-band channels) is counted once for each distinct identity it can present. We will use this assumption to characterize the power of an adversary while formulating the correctness properties of our protocols.

Notation. The notation used in the following sections is greatly simplified by using linear temporal logic operators. We start by defining temporal formulas inductively. Atomic formulas are of the form $e_p$ or $p \overset{t}{\rightsquigarrow} q$, where e is an event and p and q are wireless nodes. Boolean operators ($\wedge$, $\vee$, $\neg$, $\rightarrow$, $\leftrightarrow$) and temporal operators ($\Box$, $\Diamond$, and their variants defined in Fig. 1) are used to compose more complex formulas. We also introduce bounded temporal operators ($\Box_T$, $\Diamond_T$). Given formulas $f_1$ and $f_2$, we write $f_1 \Rightarrow f_2$ as a short form of $\Box(f_1 \rightarrow f_2)$. For a system run $\sigma = (H, F, \mathcal{L})$ and a time $t \in T$, we indicate that a formula f holds at time t with $(\sigma, t) \models f$. If $t = t_0$, then we write $\sigma \models f$, or simply $\models f$ when run $\sigma$ is clear from the context. Fig. 1 formalizes the semantics of temporal formulas.
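Fig. 1 is not reproduced here, so the sketch below assumes the usual reading of the bounded operators on a discrete time axis: $\Box_T f$ holds at t if f holds at every tick in $[t, t+T]$, and $\Diamond_T f$ holds at t if f holds at some tick in that window. Formulas are modeled as Python predicates over integer ticks; all names are ours, and the paper's semantics may differ in details such as open versus closed interval ends.

```python
from typing import Callable

# A formula is modeled as a predicate over integer time ticks.
Formula = Callable[[int], bool]


def always_within(f: Formula, bound: int) -> Formula:
    """Bounded 'always' (assumed semantics): f holds at every tick in [t, t + bound]."""
    return lambda t: all(f(u) for u in range(t, t + bound + 1))


def eventually_within(f: Formula, bound: int) -> Formula:
    """Bounded 'eventually' (assumed semantics): f holds at some tick in [t, t + bound]."""
    return lambda t: any(f(u) for u in range(t, t + bound + 1))


def implies(f1: Formula, f2: Formula) -> Formula:
    """Pointwise f1 -> f2; wrapping it in an 'always' yields f1 => f2."""
    return lambda t: (not f1(t)) or f2(t)


def link_up(t: int) -> bool:
    """Example atomic formula: the link p -> q is timely from tick 3 to tick 10."""
    return 3 <= t <= 10
```

For instance, `always_within(link_up, 5)(3)` holds because the link is timely throughout ticks 3 to 8, whereas `always_within(link_up, 5)(6)` fails at tick 11; the proofs in Section 4.2 rely on exactly this kind of "holds for a whole window" reasoning.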

3 OVERVIEW OF INNER-CIRCLE CONSISTENCY

This section introduces the idea of inner-circle consistency through an example. Subsequent sections present the inner-circle consistency algorithms in more detail.

Execution Scenario. Consider a wireless network where a given node x wants to send a message m in the network. In an ad hoc network of laptops/PDAs, m could be a route update reporting a newly discovered route to a given destination node. In a sensor network, m could be a notification indicating the presence of a target. The information contained in m can be corrupt due to errors (e.g., a software bug in the routing mechanism running on x) or due to attacks (e.g., an adversary has compromised node x and is using it to inject fictitious target detections in the network); see Fig. 2a.

Inner-circle Concept. When a node x wants to send a message m in the network, all the wireless nodes that are neighbors of x act as an inner-circle that checks and filters any information originating from x (see Fig. 2b). In this way, the inner-circle can stop any error/attack originating from x. Whereas x's errors/attacks are handled by its inner-circle nodes, we also need to consider possible corruption of the value(s) forwarded outside the inner-circle due to faulty/malicious inner-circle nodes and faulty/malicious nodes on the forwarding paths.

Fig. 2. Inner-circle consistency concept.

Inner-circle Mechanism. On each wireless node, we run a secure topology service that discovers the local network topology (up to two hops away) securely and despite node movement. The service sends periodic beacons and runs a localized authentication protocol that verifies neighbors' identities. When a node x wants to send a message m, node x first selects a dependability level L and then initiates an inner-circle voting protocol, which is executed by x and x's inner-circle nodes. Dependability level L indicates the minimum number of inner-circle nodes that must agree with x in order for the voting algorithm to complete successfully. Node x selects value L based on the size of its inner-circle, the importance of the information in m, and the number of errors/attacks that must be tolerated. We limit dependability level L to vary within a predetermined range (e.g., $1 \leq L \leq 10$) and associate a secret signing key $K_L$ with each value of L. $K_L$ is not disclosed to any node; instead, each node only has an $(L+1)$-threshold share of $K_L$ (thus, $L+1$ nodes must cooperate to sign a message with secret key $K_L$). In general, different nodes have different shares of $K_L$, and these shares can be generated randomly [9]. When initiating an inner-circle voting protocol, node x specifies (in a propose message) the dependability level L that it wishes to use. Two voting protocols are provided. In Deterministic Voting, the inner-circle enables application-aware checking by enforcing that the information is propagated outside only if it satisfies application-specific criteria (e.g., the information is contained within a predetermined range).
In Statistical Voting, the inner-circle improves the accuracy of the information from x by combining it with the local information that inner-circle nodes have (e.g., x's local estimate of a target's position is improved by fusing it with local estimates from x's inner-circle nodes). Importantly, statistical voting must cope with corrupt data values (e.g., local target estimates) sent by faulty/malicious inner-circle members that may disrupt the fusion process. Both deterministic and statistical voting algorithms terminate successfully only if at least L inner-circle nodes cooperate with x. During a voting algorithm execution, an inner-circle node p cooperates with x only if it agrees with the information that x wants to send. Node p manifests its agreement by replying to x with a partial signature obtained with its share of secret $K_L$. If at least L inner-circle nodes reply to x (i.e., the inner-circle voting protocol terminates), then x can fuse the received L partial signatures with its own partial signature and, hence, obtain a signature $K_L(m)$ of the agreed message m with secret $K_L$. At this point, node x encapsulates message m in an agreed message m′ that includes message m, dependability level L, and the combined signature $K_L(m)$. On receiving an agreed message m′, a remote node y verifies the validity of the included signature $K_L(m)$ and, hence, the validity of message m. Node y delivers m to the local application only if this check passes.

Discussion. A unique characteristic of the inner-circle approach is that expensive intrusion- and fault-tolerant algorithms are executed only in the proximity of a source node, i.e., within the node's inner-circle. The advantage is threefold: 1) Local interaction enables fast detection and suppression of errors/attacks, which both prevents errors/attacks from escalating and avoids unnecessary waste of resources (e.g., messages sent, energy consumed). 2) Executing complex protocols only locally helps reduce communication overhead and energy consumption; also, node mobility only affects the execution of the protocol instances run in the locality of the moving nodes. 3) Application-aware checking can be more efficient when performed locally, where redundant application information is readily available. Conducting this check far from the source may be questionable because node movement and transmission delay may make the checked information obsolete. The inner-circle approach does, however, trade some dependability for performance: the number of errors/attacks it can tolerate is limited by the size of the inner-circle, which can be less than what one can theoretically achieve with standard fault tolerance algorithms run across the entire wireless network [13].
Defining larger inner-circles (e.g., including all nodes two hops away from a source node) can effectively rebalance this trade-off; for ease of presentation, however, the remainder of this paper focuses on the case of one-hop inner-circles in a homogeneous set of nodes.
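The deterministic voting rule can be sketched as a counting problem: the source proposes a value with level L, each inner-circle node applies an application-supplied check against its own local data, and the message leaves the inner-circle only if at least L nodes agree. This sketch is ours, not the paper's protocol: the set of signers stands in for the combined threshold signature, and message exchange, timeouts, and signature-share arithmetic are omitted. The tolerance in `within_range` is a hypothetical example of an application-aware check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional

MIN_LEVEL, MAX_LEVEL = 1, 10  # the example range 1 <= L <= 10


@dataclass(frozen=True)
class AgreedMessage:
    payload: float
    level: int
    signers: FrozenSet[str]  # stand-in for the combined signature K_L(m)


def deterministic_vote(
    source: str,
    payload: float,
    level: int,
    inner_circle: Dict[str, float],
    check: Callable[[float, float], bool],
) -> Optional[AgreedMessage]:
    """Return an agreed message if at least `level` inner-circle nodes accept
    `payload`, or None if the vote fails and the message is suppressed.

    inner_circle maps each neighbor of `source` to its local data, and
    check(payload, local) is the application-aware callback.
    """
    if not MIN_LEVEL <= level <= MAX_LEVEL:
        raise ValueError("dependability level outside the predetermined range")
    agreeing = sorted(n for n, local in inner_circle.items() if check(payload, local))
    if len(agreeing) < level:
        return None
    # The source combines its own share with exactly `level` partial signatures.
    return AgreedMessage(payload, level, frozenset([source, *agreeing[:level]]))


def within_range(value: float, local: float, tolerance: float = 2.0) -> bool:
    """Hypothetical check: the proposed value is close to the neighbor's own estimate."""
    return abs(value - local) <= tolerance
```

With neighbors reporting 20.5, 21.0, and 35.0, a proposal of 21.0 passes at L = 2 (signed by x and the two agreeing neighbors) but fails at L = 3, illustrating how a higher dependability level demands broader agreement.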

4 INNER-CIRCLE CONSISTENCY NODE ARCHITECTURE

Fig. 3. Inner-circle consistency node architecture.

This section introduces a wireless node architecture for implementing inner-circle consistency applications (see Fig. 3). At the bottom of the architecture, a Physical Layer, a Medium Access Control (MAC) Layer, and a Link Layer provide best-effort, single-hop unicast/multicast communication. These services are abstracted out as a Single-hop Communication Service. At the top of the architecture, the Application represents a user application that runs on the wireless node, and the Routing and Forwarding Service corresponds to the ad hoc routing and forwarding mechanisms implemented in the wireless node in support of multihop communication. These services are not specific to the inner-circle approach. Five components are unique to the inner-circle architecture—the following description is simplified by means of examples with respect to three nodes, A, B, and C.

1. An Inner-circle Interceptor intercepts messages to/from the link layer and performs extra actions for those messages that match a registered message template (i.e., a description of the messages for which the application requests inner-circle checking). For example, A's interceptor intercepts messages from A's application and from the network. Matching outgoing messages are redirected to the inner-circle services, while matching incoming messages are suppressed if they originated from a suspected node (i.e., a potentially misbehaving node) or if their signatures are incorrect.
2. A Suspicions Manager receives node misbehavior indications from other inner-circle services and maintains a list of the suspected nodes. For example, A's suspicions manager would receive a report from A's voting service about node C being compromised. A node p suspects a node q permanently only if p has provable evidence of q's misbehavior (e.g., when p receives a message m that is properly signed by q but has an invalid field or violates the currently executing protocol); otherwise, suspicion is only temporary (e.g., for a few minutes).
3. A Secure Topology Service enables nodes to discover the topology of their neighborhood in a secure manner and to determine in which inner-circles they should participate. For example, A's topology service discovers B as one of A's inner-circle nodes.
4. An Inner-circle Voting Service enables nodes to perform deterministic or statistical voting (as specified by the application) on the messages/values sent by an inner-circle's center (or source) node. For example, a message m sent by A's application is redirected by A's inner-circle interceptor to A's voting service, which initiates a voting algorithm with A's inner-circle nodes to obtain their agreement on the content of m.
5. A set of Inner-circle Callbacks are application-provided callback functions that supply an application-specific customization to the inner-circle voting service and are invoked in response to events occurring in the node (e.g., arrival of a message that needs to be checked). For example, B's inner-circle callbacks enable B's voting service to extract and check the data from A's application message that is encapsulated in the protocol messages exchanged during A's voting algorithm.

Fig. 4. Inner-circle consistency embodiments. (a) Ad hoc node embodiment. (b) Sensor node embodiment.

The proposed inner-circle consistency node architecture can be customized depending on the available resources and the characteristics of the application and of the wireless environment. Fig. 4a and Fig. 4b provide two instantiations of the architecture for ad hoc nodes and sensor nodes, respectively. In the examples, the physical layer is fully implemented in hardware, the MAC layer is partially implemented in firmware, and the link layer is an integral part of the operating system (kernel or microkernel). Routing and forwarding service is provided by the AODV and Directed Diffusion protocols [2], [14], respectively.
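The Suspicions Manager's two-tier rule (permanent suspicion only on provable evidence, temporary suspicion otherwise) maps onto a small data structure. A minimal sketch, assuming a 300-second temporary-suspicion window as one reading of "a few minutes" and an injectable clock for testing; the class and method names are hypothetical and do not describe the prototype's interface.

```python
import time
from typing import Callable, Dict, Set


class SuspicionsManager:
    """Tracks suspected nodes (hypothetical API, not the paper's interface)."""

    def __init__(self, temp_duration: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.temp_duration = temp_duration  # assumed length of a temporary suspicion
        self.clock = clock
        self.permanent: Set[str] = set()
        self.temporary: Dict[str, float] = {}  # node id -> expiry time

    def report(self, node: str, provable: bool) -> None:
        """Record a misbehavior indication from another inner-circle service."""
        if provable:  # e.g., a properly signed message with an invalid field
            self.permanent.add(node)
            self.temporary.pop(node, None)
        elif node not in self.permanent:
            self.temporary[node] = self.clock() + self.temp_duration

    def is_suspected(self, node: str) -> bool:
        """Used, e.g., by the interceptor to suppress messages from suspects."""
        if node in self.permanent:
            return True
        expiry = self.temporary.get(node)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self.temporary[node]  # temporary suspicion has lapsed
            return False
        return True
```

Keeping non-provable suspicion temporary limits the damage a malicious node can do by falsely accusing correct neighbors: such an accusation silences a node only for the window, never permanently.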


The architecture also includes two dedicated hardware modules: a Crypto-Processor (discussed in [3]), which provides a tamper-resistant key store plus key cryptographic functions (i.e., signature generation and verification), and a Fault-Tolerant Cluster Processor (discussed in Section 4.4.1), which provides key error/attack masking functions (i.e., the fault-tolerant cluster algorithm of Section 4.4). These modules can provide strong protection against malicious tampering with the wireless nodes, high performance, and low energy consumption (up to two orders of magnitude less energy than software implementations). An early Crypto-Processor was introduced in [3] and has been reconfigured to trade off performance for area occupation and energy consumption. A Fault-Tolerant Cluster Processor has also been developed. In the architectures of Fig. 4, the operating system (Linux for ad hoc nodes and TinyOS for sensor nodes) is augmented with the Inner-circle Interceptor. The interceptor is implemented in Linux as a loadable kernel module and in TinyOS as a TinyOS component that exports a send/receive TinyOS interface, which is directly used by the sensor application. Other inner-circle services (voting service, secure topology service, and suspicions manager) are implemented as user-level daemons in Linux and as TinyOS components in TinyOS. Applications access inner-circle services via an API that allows them 1) to initiate and configure the services (e.g., selection of deterministic versus statistical voting, selection of dependability level L), 2) to specify message templates that describe the application messages to be checked (the architecture enables selective use of the inner-circle approach, as not all application messages are necessarily checked by the inner-circle services), and 3) to specify a set of Inner-circle Callbacks (whose code resides in a shared library in Linux and in a TinyOS component in TinyOS).
The architecture is modular and flexible enough that new services can be easily added and existing ones can be scaled down depending on node resources and application needs. The following sections discuss the main inner-circle services in greater detail.
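The interceptor's filtering rule and the template mechanism exposed by the API can be sketched as follows. Templates are modeled as predicates over messages, and the suspicion test, signature check, voting hand-off, and send routine are passed in as callables; all of these names are illustrative assumptions rather than the prototype's kernel-module or TinyOS interfaces. The message kind "RREP" (an AODV route reply) is used only as an example template.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class Message:
    src: str
    kind: str
    body: Dict[str, object] = field(default_factory=dict)


Template = Callable[[Message], bool]  # hypothetical: a template is a predicate


def matches(msg: Message, templates: Iterable[Template]) -> bool:
    return any(t(msg) for t in templates)


def deliver_incoming(msg: Message, templates: Iterable[Template],
                     is_suspected: Callable[[str], bool],
                     signature_ok: Callable[[Message], bool]) -> bool:
    """Return True if the incoming message may be passed up the stack."""
    if not matches(msg, templates):
        return True  # not subject to inner-circle checking
    if is_suspected(msg.src):
        return False  # suppressed: originates from a suspected node
    return signature_ok(msg)  # suppressed if the signature is incorrect


def route_outgoing(msg: Message, templates: Iterable[Template],
                   start_voting: Callable[[Message], None],
                   send: Callable[[Message], None]) -> None:
    """Redirect matching outgoing messages to the voting service."""
    if matches(msg, templates):
        start_voting(msg)  # inner-circle voting decides whether m leaves
    else:
        send(msg)  # unchecked traffic goes straight to the link layer
```

Because only template-matching messages pay the voting and verification cost, an application can restrict inner-circle checking to the few message types (such as route replies) whose corruption is most damaging.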

4.1 Single-Hop Communication Service

A Single-hop Communication Service (SCS) implements the basic communication primitives (i.e., timely communication with neighbor nodes) assumed in the system model (see Section 2) and is formally specified in Table 1. These primitives are modeled through send(m, q), bcast(m), and recv(m) events for sending (to a neighbor q or to all neighbors) and receiving a message m. Each message m is assumed to be unique and to include its sender id m.src.

TABLE 1 Single-Hop Communication Service Specifications


TABLE 2 Secure Topology Service Specifications

In a wireless node architecture, SCS functionalities are built on top of those offered by the Medium Access Control (MAC) subsystem, which coordinates node accesses to the wireless medium. We distinguish two main directions in MAC design: 1) Scheduled access approaches focus on achieving reliable and time-bounded communications with high probability and, hence, are well suited for implementing our SCS specifications. For example, in [15], the geographic area occupied by mobile nodes is divided into a number of cells. In each cell, the MAC subsystems of mobile nodes run a synchronous atomic multicast protocol to achieve distributed agreement in negotiating a set of Time Division Multiple Access (TDMA) slots in which nodes can transmit, so that transmissions are free of collisions. 2) Contention approaches focus on maximizing throughput and minimizing average packet delay at the expense of possible packet corruption and unpredictable transmission delay. Consider, for instance, the IEEE 802.11x MAC standard, which is widely used for wireless communication. The MAC subsystem transmits unicast frames in an RTS/CTS/DATA/ACK scheme that achieves both collision avoidance (RTS/CTS) and reliable delivery (ACK). As the scheme does not work for broadcast frames (multiple nodes would collide in responding to an RTS frame), these frames are sent without control frames and are subject to hidden terminal problems. Solutions have been proposed to overcome this limitation [16].

4.2 Secure Topology Service

A Secure Topology Service (STS) discovers and authenticates bidirectional links up to two hops away and provides each node with a local topology view. (Two hops are necessary, since a node needs to authenticate its neighbors' neighbors in order to securely participate in its neighbors' inner-circles.) Formally, we augment the system model with a topology function $Topo : N \times T \rightarrow 2^{N \times N}$. Condition $(p, q) \in Topo(r, t)$ is indicated by saying that "$(p, q) \in Topo_r$ holds at time t." Node pairs (p, q) are not ordered, i.e., we do not distinguish between (p, q) and (q, p). We assume that there is a known time interval $\Delta_{STS}$ such that an STS implementation satisfies the properties in Table 2. Intuitively, the Completeness property captures the ability to exclude untimely (e.g., broken) links, while the One-Hop Accuracy and Two-Hop Accuracy properties capture the ability to include one- and two-hop timely links.

The proposed STS implementation assumes that local clocks at neighboring nodes are kept (approximately) synchronized (e.g., by using the technique in [17]) and operates as follows. At each node, link authentication is accomplished by periodic broadcasting of STS messages (with period $\pi$), which, to satisfy the Two-Hop Accuracy property, are forwarded in a geographic region surrounding the source (e.g., a sphere having a radius $R_{fw}$ of n times the node radio range). Whereas a small forwarding region is desirable, the region must be large enough to include a timely path between the source p and any node q two hops away from p, so that p's STS messages can reach q. This avoids failure scenarios such as p and q being connected through an intermediate Byzantine node r that attempts to disrupt STS operation by not forwarding STS messages from p to q. If the forwarding region is large enough, STS messages can be routed around r, and nodes p and q can discover each other.

The format of an STS message originating from a correct node p is $\langle p, ts'_p, pos_p, l_p \rangle_{K_p}$, where $ts'_p$ is p's monotonic timestamp (obtained by reading p's local clock and used to avoid replay attacks), $pos_p$ is p's current position (used to limit forwarding of the STS message), $l_p$ is an authenticated list of p's neighbors, and $K_p$ is p's private key, used to sign the above fields. List $l_p$ authenticates bidirectional links between p and its neighbors. A list element $\langle q, ts_q, K_p(\langle q, ts_q \rangle), ts_p, K_q(\langle p, ts_p \rangle) \rangle$ corresponds to a neighbor q and contains: q's id; the most recent q timestamp ($ts_q$) that p found in an STS message received from q; p's signature of tuple $\langle q, ts_q \rangle$ (used to authenticate p to q); the most recent p timestamp ($ts_p$) that p found in the neighbor list of an STS message $m_q$ received from q; and q's signature of tuple $\langle p, ts_p \rangle$ (used to authenticate q to p), which p extracts from message $m_q$. A correct node q receiving an STS message m from the network at a time t verifies that the message signature is valid and, if the message is not a forwarded message, that the message timestamp $ts'_p$ satisfies $t - ts'_p < \delta$ (to exclude an untimely link with p).
If message m satisfies these criteria, then node q uses m's content to create/update entries in its local topology $Topo_q$ for nodes that are at most two hops away. Node q inserts (p, q) in $Topo_q$ if m's list $l_p$ contains a list element for q with a timestamp $ts_q$ such that $t - ts_q < \Delta_{STS}/2$ and a valid signature $K_p(\langle q, ts_q \rangle)$. Node q inserts (p, r), where $r \neq q$, in $Topo_q$ if $(q, r) \in Topo_q$ and list $l_p$ contains a list element for r with a timestamp $ts_p$ such that $t - ts_p < \Delta_{STS}/2$ and a valid signature $K_r(\langle p, ts_p \rangle)$. After this step, node q first removes from $Topo_q$ those entries that correspond to nodes that have not been inserted/updated in $Topo_q$ in the last $\Delta_{STS}/2$ time units, and then rebroadcasts m if q's current position is within p's forwarding region (as indicated by m).
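The timeliness check, the two insertion rules, and the pruning step can be condensed into a per-node topology table. The sketch below is a simplification under stated assumptions: signatures are abstracted into a boolean that a real node would obtain by verifying $K_p(\langle q, ts_q \rangle)$ or $K_r(\langle p, ts_p \rangle)$, forwarding and the position-based region test are not modeled, and the class and field names are ours.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class ListElement:
    neighbor: str        # id of the listed neighbor (q or r)
    ts_neighbor: float   # ts_q: latest neighbor timestamp seen by the sender
    ts_sender: float     # ts_p: latest sender timestamp echoed by the neighbor
    sig_ok: bool = True  # assumption: stands in for signature verification


@dataclass
class STSMessage:
    src: str
    ts: float  # the sender's timestamp ts'_p
    neighbors: List[ListElement] = field(default_factory=list)
    forwarded: bool = False


class Topology:
    """Topo_q at node `me`: unordered node pairs -> last insert/update time."""

    def __init__(self, me: str, delta_sts: float, delta: float):
        self.me, self.delta_sts, self.delta = me, delta_sts, delta
        self.links: Dict[FrozenSet[str], float] = {}

    def on_receive(self, m: STSMessage, now: float) -> None:
        if not m.forwarded and now - m.ts >= self.delta:
            return  # untimely direct message: do not use it
        half = self.delta_sts / 2
        for el in m.neighbors:
            if not el.sig_ok:
                continue
            if el.neighbor == self.me:
                if now - el.ts_neighbor < half:  # one-hop link (p, q)
                    self.links[frozenset((m.src, self.me))] = now
            elif (frozenset((self.me, el.neighbor)) in self.links
                  and now - el.ts_sender < half):  # two-hop link (p, r)
                self.links[frozenset((m.src, el.neighbor))] = now
        # Drop entries not inserted/updated in the last delta_sts / 2 time units.
        self.links = {l: t for l, t in self.links.items() if now - t <= half}
```

Using the same freshness window $\Delta_{STS}/2$ for insertion and pruning is what lets a silent or broken link age out of the table, which is the behavior the Completeness property requires.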


Theorem 4.1 (STS Correctness). The following propositions hold:
1. The Completeness property holds.
2. If $\Delta_{STS} > 2\tau + 4\delta$, then the One-Hop Accuracy property holds.
3. Let $\delta_k$ be the maximum STS forwarding time through a timely $k$-hop path. If $p$ and $q$ are $k$-hop t-connected for $\Delta_{STS}$ time units through a path contained in the intersection of $p$'s and $q$'s forwarding regions, and $\Delta_{STS} > 2\tau + 2\delta_k$, then the Two-Hop Accuracy property holds.

Proof.
1. Suppose that $\Box_{\Delta_{STS}}(p \nleftrightarrow q)$ holds at time $t$, and let $m_p$ be any STS message that a node $r$ receives from $p$ at a time $t' \in [t, t + \Delta_{STS}]$. Since $p$ and $q$ cannot communicate timely in $[t, t + \Delta_{STS}]$, the last STS message that $q$ receives and accepts from $p$ has a timestamp $ts'_p < t$ (recall that the proposed STS implementation discards untimely STS messages). Therefore, $m_p$ cannot include in $l_p$ an element for $q$ with a timestamp $ts_p$ and a signature $K_q(\langle p, ts_p \rangle)$ unless $ts_p \le ts'_p < t$. Thus, by time $t + \Delta_{STS}/2$, node $r$ has not inserted or updated link $(p, q)$ in $Topo_r$ for $\Delta_{STS}/2$ time units, which leads to link $(p, q)$ being removed from $Topo_r$. As a result, the Completeness property holds.
2. Suppose that $\Box_{\Delta_{STS}}(p \leftrightarrow q)$ holds at time $t$. By time $ts'_q + \delta$, node $p$ has received from $q$ an STS message $m_q$ with timestamp $ts'_q \in [t, t + \tau]$, and by time $ts'_q + \delta + \tau$, node $p$ has sent an STS message $m_p$ whose neighbor list $l_p$ includes an element for $q$ with timestamp $ts'_q$ and signature $K_p(\langle q, ts'_q \rangle)$. By hypothesis, $ts'_q + \delta + \tau - t \le 2\tau + \delta < \Delta_{STS}$; thus, $p$ and $q$ are t-connected in $[t, ts'_q + \delta + \tau]$. As a result, node $q$ receives message $m_p$ by time $t' \le ts'_q + \tau + 2\delta$, at which point it executes the STS algorithm described above. Since $t' - ts'_q \le \tau + 2\delta$ and, by hypothesis, $\Delta_{STS} > 2\tau + 4\delta$, we have $t' - ts'_q < \Delta_{STS}/2$. Consequently, by time $t'$, node $q$ inserts $(p, q)$ in $Topo_q$. Note also that $t' \le t + \Delta_{STS}$; thus, the One-Hop Accuracy property holds.
3. By hypothesis, $(q, r) \in Topo_q$ from time $t$ to time $t + \Delta_{STS}$. Therefore, node $q$ has received from $r$ an STS message $m_r$ having a timestamp $ts'_r$ and a list $l_r$ that contains an entry $\langle q, ts_q, K_r(\langle q, ts_q \rangle), ts_r, K_q(\langle r, ts_r \rangle) \rangle$ such that $ts_q \in [t, t + \Delta_{STS}/2]$. By time $t' \le ts_q + \tau$, node $q$ has sent an STS message $m_q$ with a timestamp $ts'_q$ and a list $l_q$ that contains an element $\langle r, ts'_r, K_q(\langle r, ts'_r \rangle), ts_q, K_r(\langle q, ts_q \rangle) \rangle$. Note that $t' - t \le \Delta_{STS}/2 + \tau$, which, together with the hypothesis $\Delta_{STS} > 2\tau + 2\delta_k$, guarantees that $t' - t + \delta_k \le \Delta_{STS}/2 + \tau + \delta_k < \Delta_{STS}$. Thus, nodes $q$ and $p$ remain $k$-hop t-connected long enough for $p$ to receive $m_q$, say, at time $t'' \le ts_q + \tau + \delta_k$. On receiving $m_q$, node $p$ checks that $t'' - ts_q < \Delta_{STS}/2$ and inserts $(q, r)$ in $Topo_p$. Note that $t'' - t < \Delta_{STS}$; thus, the Two-Hop Accuracy property holds. ∎

Fig. 5. Inner-circle voting service algorithms. (a) Deterministic voting algorithm. (b) Statistical voting algorithm.
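The timing hypotheses of Theorem 4.1 translate into a simple configuration check for the link-expiry window. A sketch, assuming $\tau$ denotes the STS broadcast period, $\delta$ the one-hop timeliness bound, and $\delta_k$ the $k$-hop forwarding bound; the helper names are hypothetical.

```python
def min_delta_sts(tau: float, delta: float, delta_k: float) -> float:
    """Value that Delta_STS must strictly exceed for parts 2 and 3 of Theorem 4.1."""
    one_hop = 2 * tau + 4 * delta    # One-Hop Accuracy: Delta_STS > 2*tau + 4*delta
    two_hop = 2 * tau + 2 * delta_k  # Two-Hop Accuracy: Delta_STS > 2*tau + 2*delta_k
    return max(one_hop, two_hop)


def sts_timing_ok(delta_sts: float, tau: float, delta: float, delta_k: float) -> bool:
    """True if the STS expiry window satisfies both accuracy hypotheses."""
    return delta_sts > min_delta_sts(tau, delta, delta_k)


def one_hop_discovery_bound(tau: float, delta: float) -> float:
    """Worst-case t' - t in the proof of part 2: up to tau until q's next message,
    delta to deliver it, tau until p's next broadcast, delta to deliver that."""
    return 2 * tau + 2 * delta
```

With illustrative values $\tau = 1$ s, $\delta = 50$ ms, and $\delta_k = 300$ ms, the two-hop condition dominates and $\Delta_{STS}$ must exceed 2.6 s.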

4.3 Inner-Circle Voting Service

An Inner-circle Voting Service (IVS) implements two voting schemes (see Fig. 5). Deterministic voting prevents illegitimate values from being propagated in the wireless network: a value $v$ agreed upon by $c$'s inner-circle nodes is the value initially proposed by $c$, and it is agreed upon only if it complies with an application-dependent criterion $f$ (the check method of the Inner-circle Callbacks in Fig. 3). Statistical voting improves the accuracy of a proposed value $v_c$: the value agreed upon by the inner-circle nodes is obtained by the statistical fusion of the original value $v_c$ from $c$ with the corresponding values $v_p$ from other inner-circle members $p$, via a fault-tolerant fusion function $f$ (the fuseVal method of the Inner-circle Callbacks). Importantly, function $f$ must cope with arbitrary values sent by faulty/malicious inner-circle members (see Section 4.4). The objective of statistical voting is to enable reliable and secure in-network processing, where intermediate nodes aggregate data collected from a group of nodes before forwarding the aggregated data to the proper destinations.

Traditional fault tolerance algorithms provide a fixed level of dependability (e.g., resilience to Byzantine failures of up to 1/3 of the nodes) because the outcome (e.g., a yes/no decision) must be agreed upon by a majority of correct participants. In contrast, we take a more flexible approach and allow the outcome to be dictated by a specifiable number of correct participants. Inner-circle voting algorithms are parameterized by a dependability level $L$, associated with each agreed message, that indicates the number of inner-circle nodes that must cooperate with a center (or source) node $c$. Based on the maximum number $F$ of node failures to tolerate in an inner-circle (of which $F_B$ are Byzantine, $F_C$ are crashes, and $F_L$ are due to broken links) and on the number $N$ of nodes in the inner-circle (including the center node) at the time a voting algorithm is initiated, node $c$ sets the dependability level $L$ so as to guarantee a minimum number $T$ of non-Byzantine participants in each IVS execution that completes successfully. With this approach, a recipient node $y$ of an agreed message $m$ can decide whether to accept $m$ based on the message's dependability level $L$ and the importance of the information in $m$. It can be shown that setting $L = N - F - 1$ guarantees the Agreement, Integrity, and Termination properties of IVS protocols (introduced below) for $T = L - F_B$. As a special case, setting $L + 1 = 2N/3$ and ignoring $F_C$ and $F_L$ provides tolerance to $N/3 - 1$ Byzantine failures and guarantees that a majority of correct nodes must agree for the protocol to terminate; this scenario corresponds to standard Byzantine agreement algorithms.

TABLE 3 Inner-Circle Voting Service Specifications

The Agreement, Integrity, and Termination properties of IVS algorithms are formalized in Table 3; for simplicity, Table 3 formalizes IVS properties only for deterministic voting. Note that IVS algorithms are not required to terminate if the initiator node is faulty. Also note that the scenario in which a node $c$ moves after initiating a voting algorithm and, consequently, loses connectivity with $M$ of its inner-circle nodes is modeled by setting $F_L \ge M$. Thus, the Termination property holds only if node $c$ maintains connectivity with a sufficient number of its inner-circle nodes while moving. If that is not the case, node $c$ can initiate a new voting algorithm in its new inner-circle.

Theorem 4.2 (IVS Correctness). The following propositions hold:
1. If $L \ge T + F_B$ and $T \ge 0$, then the Agreement property holds.
2. The Integrity property holds.
3. If $L \le N - F - 1$, then the Termination property holds.

Proof.
1. Sending a correctly assembled agreed message involves generating a signature $K_L$, which requires $L + 1$ inner-circle nodes, including the center node $c$, to cooperate. By hypothesis, at most $F_B$ nodes in an inner-circle are Byzantine. Consider two cases:
• If $c$ is non-Byzantine, then we can guarantee a minimum number $T$ of non-Byzantine inner-circle nodes participating in an IVS execution if $L - F_B \ge T$.
• If $c$ is Byzantine, then we can guarantee a minimum number $T$ of non-Byzantine inner-circle nodes participating in an IVS execution if $L + 1 - F_B \ge T$.
In both cases, the hypothesis $L \ge T + F_B$ guarantees the minimum number $T$. Moreover, a non-Byzantine inner-circle node $p$ participates in an IVS execution only if condition $f(v)_p \wedge (p, c) \in Topo_p$ holds. Hence, the Agreement property holds.
2. An agreed message $m$ from $c$ is signed with $c$'s private key $K_c$. If $c$ is non-Byzantine (e.g., $c$ has not disclosed $K_c$ to other nodes), then a message signed with $c$'s private key can only be generated by $c$. Thus, the Integrity property holds.
3. Termination of an IVS algorithm requires $L + 1$ participants, including the center node $c$. $L + 1$ participants are guaranteed if the number $C$ of correct nodes in an inner-circle is at least $L + 1$. From the hypothesis $L \le N - F - 1$ and $C = N - F$, we obtain $L \le C - 1$, which implies that the Termination property holds. ∎
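The parameter rule discussed in this section ($L = N - F - 1$, $T = L - F_B$) and the success test of a deterministic vote can be sketched as follows. This is a minimal sketch under stated assumptions: threshold signatures are abstracted away (an approval is just a member id), a shared `links` set stands in for each member's $Topo_p$, and `check`, `dependability_level`, and `deterministic_vote` are hypothetical names.

```python
from typing import Callable, Iterable, Set, Tuple


def dependability_level(n: int, f: int, f_byz: int) -> Tuple[int, int]:
    """Return (L, T) with L = N - F - 1 and T = L - F_B."""
    level = n - f - 1
    if level <= 0:
        raise ValueError("inner-circle too small: need N - F - 1 > 0")
    return level, level - f_byz


def deterministic_vote(center: str, value: object, members: Iterable[str],
                       check: Callable[[str, object], bool],
                       links: Set[frozenset], level: int) -> bool:
    """Return True if at least L members (besides the center) approve `value`.

    A member p approves only if check(p, value) holds (the criterion f(v)_p)
    and the link (p, c) is known. Together with the center, L approvals give
    the L + 1 partial signatures needed to assemble an agreed message."""
    approvals = [p for p in members
                 if p != center and frozenset((p, center)) in links and check(p, value)]
    return len(approvals) >= level
```

For example, an inner-circle of $N = 7$ nodes that must tolerate $F = 2$ failures, one of them Byzantine, gives $L = 4$ and $T = 3$.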

The following corollary follows immediately from the theorem above:

Corollary 4.3. If $L = N - F - 1 > 0$, then the Agreement, Integrity, and Termination properties hold for $T = L - F_B$.

We can draw a parallel between the properties of inner-circle voting algorithms and those of Byzantine agreement. Suppose that $L = T + F$ and $L = N - F - 1$. If $L + 1 = 2N/3$, then $2N/3 - 1 = N - F - 1$, which implies that $F = N/3$ and $C = 2N/3$. Thus, for $L + 1 = 2N/3$, IVS algorithms can tolerate Byzantine failures of 1/3 of the nodes. Also, $2N/3 = T + N/3$ implies that $T = N/3 = C/2$. Thus, for $L + 1 = 2N/3$, an IVS algorithm terminates only if a majority of correct (inner-circle) nodes have agreed.

Before concluding this section, we note that the Agreement property formulated in Table 3 does not prevent a node $c$ from collecting partial signatures while moving in the network. For instance, a node $c$ that does not obtain enough consensus in a given geographical position could quickly move to a different area and seek additional consensus there. If needed, this scenario can be avoided. To enforce that $c$ collects partial signatures only from nodes that are members of $c$'s inner-circle at the same time, we propose including in IVS messages a hash $H$ of $c$'s inner-circle membership list, as viewed by $c$. An inner-circle node $p$ agrees with $c$'s proposed value if $p$'s view of $c$'s inner-circle members hashes to the same value $H$. The drawback of this solution is that sustained node movement can adversely affect IVS termination.

Fig. 6. Fault-tolerant cluster algorithm.
Fig. 7. Fault-tolerant cluster example.
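The membership-hash safeguard can be sketched with SHA-256 from Python's standard `hashlib` module. The canonical encoding (deduplicated, sorted ids joined by newlines, so ids must not contain newlines) and the function names are assumptions for illustration; any encoding works as long as every node uses the same one.

```python
import hashlib
from typing import Iterable


def membership_hash(members: Iterable[str]) -> str:
    """Hash H of an inner-circle membership list, independent of listing order."""
    canonical = "\n".join(sorted(set(members))).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def agrees_on_membership(h_from_center: str, local_view: Iterable[str]) -> bool:
    """Node p's check: its own view of c's inner-circle must hash to the same H."""
    return membership_hash(local_view) == h_from_center
```

Sorting makes the hash depend only on set membership, so two nodes with the same view agree even if their neighbor tables list members in different orders.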

4.4 Fault-Tolerant Value Fusion

This section discusses techniques to implement the fault-tolerant fusion function $f$ used in Section 4.3 to support reliable and secure in-network processing. The mathematical problem considered is the (fault-tolerant) estimation of an unknown (vector) parameter $\theta$ from a set of $L$ (vector) observations $P = \{p_1, \ldots, p_L\}$ that are corrupted by random noise such that $p_i = \theta + N_i$, where the $N_i$ are i.i.d. zero-mean random variables. In contrast with classical estimation theory, we allow up to $F$ of these observations to be arbitrarily corrupted (beyond noise $N_i$) owing to faults/attacks.

A simple implementation of function $f$ is given by a fault-tolerant mean algorithm originally proposed in the context of approximate agreement [18] and later applied to fault-tolerant in-network processing [19]. Fault-tolerant mean is just one of many algorithms proposed for approximate agreement and clock synchronization [21], [22]. The limitation of these techniques, when applied to in-network processing, is that they always discard a number of input observations, which limits accuracy even in the common case of no faulty data. High accuracy is important for the inner-circle approach because local value fusion is done on the data collected by a limited number of nodes: an inner-circle is not expected to have more than 10–15 members [22].

4.4.1 Fault-Tolerant Cluster Algorithm

We propose a fault-tolerant cluster algorithm that generates, from a set $P$ of $L$ observations, an estimate $\hat{\theta}_{FT}$ that is highly accurate yet robust to faulty/malicious data in $P$. To achieve this goal, we exclude from the estimation process only those observations that are likely to be faulty/malicious, i.e., that are inconsistent with the distribution indicated by the remaining observations. Before presenting the algorithm, however, a few definitions are required.
Given a cluster (or set of data points) $C$ and a point $p \in C$, we define the distance $d(p, C)$ of point $p$ from cluster $C$ as

$$d(p, C) = \lVert p - \mathrm{centroid}(C \setminus p) \rVert, \qquad (1)$$

where $\mathrm{centroid}(\{p_1, \ldots, p_n\}) = \sum_{i=1}^{n} p_i / n$. The justification for this definition is that we want the distance $d(p, C)$ to be proportional to the information lost when excluding $p$ from $C$. Indeed, consider a set $C' = C \setminus p$ of i.i.d. observations having a unimodal p.d.f. that is symmetric with respect to the mean $\theta$. The farther point $p$ is from $\theta$ (for which $\mathrm{centroid}(C')$ is an estimator), the lower the probability of observing $p$ and, thus, the larger the information³ carried by such an observation.

The fault-tolerant cluster $C_P$ is defined as the maximum-size subset of $P$ such that each point $p \in C_P$ has distance from $C_P$ less than a user-specified threshold $\gamma$:

$$C_P = \arg\max_{C \in 2^P} \left| \{ p \in C : d(p, C) \le \gamma \} \right|. \qquad (2)$$

Parameter $\gamma$ must be chosen so that two correct observations are at distance greater than $\gamma$ only with negligible probability.⁴ The justification for this definition is that we want to include in $C_P$ all observations $p$ that best fit the underlying distribution. These observations should be such that, when they are considered together, removing any one of them loses only a little information. Thus, we form $C_P$ by discarding all observations that are unlikely to be correct. Finally, the fault-tolerant estimate $\hat{\theta}_{FT}$ is defined as the centroid of the fault-tolerant cluster $C_P$:

$$\hat{\theta}_{FT} = \mathrm{centroid}(C_P). \qquad (3)$$
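Definitions (1)-(3) can be approximated by a greedy procedure: repeatedly find the point farthest, in the sense of (1), from the centroid of the other points and drop it while that distance exceeds $\gamma$. The sketch below follows the Fig. 7 walk-through and the removal rule used in the proof of Theorem 4.4; it is not a transcription of Fig. 6, and stopping at two points is an assumption of this sketch. For comparison, it also includes a scalar fault-tolerant mean in one common formulation (discard the $F$ smallest and $F$ largest values), assumed here; the variant in [18] may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def dist_from_rest(i: int, cluster: Sequence[Point]) -> float:
    """d(p, C) = ||p - centroid(C minus p)|| for p = cluster[i], as in (1)."""
    rest = [q for j, q in enumerate(cluster) if j != i]
    return math.dist(cluster[i], centroid(rest))


def ft_cluster(points: Sequence[Point], gamma: float) -> Tuple[Point, List[Point]]:
    """Greedy sketch: drop the worst-fitting point while its distance exceeds gamma.
    Returns (estimate, kept points); the estimate is the centroid as in (3)."""
    cluster = list(points)
    while len(cluster) > 2:
        dists = [dist_from_rest(i, cluster) for i in range(len(cluster))]
        worst = max(range(len(cluster)), key=dists.__getitem__)
        if dists[worst] <= gamma:
            break
        cluster.pop(worst)
    return centroid(cluster), cluster


def ft_mean(values: Sequence[float], f: int) -> float:
    """Scalar fault-tolerant mean (assumed variant): trim F lowest and F highest."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2F values")
    kept = sorted(values)[f:len(values) - f]
    return sum(kept) / len(kept)
```

The contrast with the text above is visible in the design: `ft_mean` always discards $2F$ values, even when none is faulty, whereas `ft_cluster` removes a point only when its distance exceeds $\gamma$. Recomputing all distances after each removal matters, because a single far outlier can also push correct points beyond $\gamma$ until it is removed. `math.dist` requires Python 3.8 or later.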

Based on the above definitions, Fig. 6 provides pseudocode of the proposed fault-tolerant cluster algorithm. Fig. 7 depicts an example in which the current cluster $C$ comprises four points $p_1, \ldots, p_4$, which are observations of a common (unknown) value $\theta$ independently produced by four sensor nodes $n_1, \ldots, n_4$. The current estimate $\hat{\theta}$ is computed considering all available data and has poor accuracy due to $p_4$; in the figure, we suppose that point $p_4$ is due to an error in node $n_4$'s sensor (e.g., $n_4$'s sensor was physically damaged by humidity and reports readings stuck at a high value). To decide whether to exclude point $p_4$, the fault-tolerant cluster algorithm first computes $\hat{\theta}_4$ as the centroid of points $\{p_1, p_2, p_3\}$ and then computes the distance $d_4$ between $\hat{\theta}_4$ and $p_4$. If $d_4$ is greater than the user-supplied threshold $\gamma$, then $p_4$ is excluded from cluster $C$. In this case, the new estimate of $\theta$ is $\hat{\theta}_4$, which is much more accurate than $\hat{\theta}$.

The fault-tolerant cluster algorithm cannot guarantee the removal of (and only of) faulty/malicious data if these data are very similar to the correct data; however, the negative effect of such a case should be negligible. It can be shown that:
1) If the number $F$ of faulty/malicious points is less than half of the total number of points $N$, then the condition $\epsilon_F > \epsilon_C/(1 - 2F/N)$ guarantees that only faulty/malicious points are removed, where $\epsilon_C$ is the maximum distance of a correct point, and $\epsilon_F$ the minimum distance of a faulty/malicious point, from $\hat{\theta}_C$, the estimate computed with only the correct points.
2) The worst-case scenario corresponds to all faulty/malicious observations clustering at a point at distance $\bar{\epsilon}_F = \epsilon_C/(1 - 2F/N)$ from $\hat{\theta}_C$. Thus, the maximum estimation error $E = \lVert \hat{\theta}_C - \hat{\theta}_{FT} \rVert$ added by faulty/malicious observations is $E = (F/N)\,\bar{\epsilon}_F$. For instance, the case in which one third of the points are erroneous ($F = N/3$) corresponds to $\bar{\epsilon}_F = 3\epsilon_C$ and $E = \epsilon_C$, which indicates that the estimate $\hat{\theta}_{FT}$ is within the range of the correct observations.

3. In information theory, the information of an event $e$ is defined in terms of its probability of occurrence $Pr\{e\}$: $I(e) = \log_2(1/Pr\{e\})$.
4. For instance, in the experiments discussed in Section 5.2, we set $\gamma$ to five times the standard deviation of the random noise $N$ that affects sensor readings.

Experimental Evaluation. The performance of the proposed fault-tolerant cluster algorithm is evaluated in Fig. 8 through a Monte Carlo simulation study of a target location estimation problem (detailed in Section 5.2), comparing the standard least-squares-based trilateration approach, the fault-tolerant mean algorithm [18], and the fault-tolerant cluster algorithm. The experiments consider both error-free and error conditions; in the latter case, 1/3 of the data is faulty and is generated according to the sensor failure model formalized in Section 5.2. The results in Fig. 8 show that the fault-tolerant cluster algorithm provides the best performance among the studied algorithms. The improvement over the fault-tolerant mean can be explained by the ability to include all valid observations, whereas the improvement over the least-squares method (even in the case of no faults) can be explained by the ability to exclude observations that are valid yet inconsistent with the remaining observations (outliers).

Fig. 8. Monte Carlo simulation study of different fusion strategies. (a) No faults. (b) Calibration faults. (c) Positioning faults. (d) Stuck at zero faults.

Hardware Implementation. The proposed fault-tolerant cluster algorithm was implemented as a dedicated hardware module on an FPGA device. Our hardware implementation provides high performance and low energy consumption, which makes the proposed design particularly attractive for battery-powered wireless sensor nodes. To demonstrate the advantages of a hardware solution over a software implementation, Table 4 compares the performance of an FPGA-implemented Fault-Tolerant Cluster Processor with that of a StrongARM SA-1100 processor that executes in software the same fusion algorithm that the Fault-Tolerant Cluster Processor implements in hardware. The design of the Fault-Tolerant Cluster Processor is parameterized by the maximum number of sensor observations $L$ and the bit-size $W$ of each observation. The data in Table 4 for the Fault-Tolerant Cluster Processor correspond to a design synthesized for $L = 15$ and $W = 8$, which results in a circuit that can operate at up to 125 MHz and uses 365 slices (less than 4 percent of the area occupied by a DLX processor) on a VirtexE50e-8 FPGA device. Execution time and power consumption were obtained with Xilinx's ISE toolset.
On the other hand, the data for the StrongARM processor were obtained with JouleTrack [23]. The table indicates that the hardware Fault-Tolerant Cluster Processor improves both performance and energy consumption by one to two orders of magnitude.

Proofs. The remainder of this section formalizes and proves the properties of the fault-tolerant cluster algorithm discussed above.

Theorem 4.4. The following propositions hold:
1. If $F < N/2$ and $\epsilon_F > \epsilon_C/(1 - 2F/N)$, then only faulty/malicious points are removed.
2. The maximum estimation error added by faulty/malicious observations is $E = (F/N)\,\bar{\epsilon}_F$, where $\bar{\epsilon}_F = \epsilon_C/(1 - 2F/N)$.

TABLE 4 Performance of Hardware Fault-Tolerant Cluster Processor versus a StrongARM-Based Software Approach


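Proof.
1. Consider a set $P$ of $N$ observations, $C$ of which are correct and $F$ of which are faulty/malicious. Let $p_{c_i}$ be a generic correct observation and $p_{f_i}$ be a generic faulty/malicious observation. From the pseudocode of Fig. 6, it follows that the fault-tolerant cluster algorithm makes a mistake, i.e., removes a correct observation $p_{c_C}$ instead of a malicious observation $p_{f_F}$, only if

$$\left\lVert \frac{1}{N-1}\left(\sum_{i=1}^{F} p_{f_i} + \sum_{i=1}^{C-1} p_{c_i}\right) - p_{c_C} \right\rVert > \left\lVert p_{f_F} - \frac{1}{N-1}\left(\sum_{i=1}^{F-1} p_{f_i} + \sum_{i=1}^{C} p_{c_i}\right) \right\rVert, \qquad (4)$$

which implies that

$$\left\lVert \sum_{i=1}^{F} p_{f_i} + \sum_{i=1}^{C-1} p_{c_i} - (N-1)\,p_{c_C} \right\rVert > \left\lVert (N-1)\,p_{f_F} - \sum_{i=1}^{F-1} p_{f_i} - \sum_{i=1}^{C} p_{c_i} \right\rVert. \qquad (5)$$

The worst-case scenario is when all faulty/malicious points have the same value $p_f$, because in this case the error inflicted by one faulty/malicious point is not offset by the error inflicted by another. Considering this worst case, i.e., $p_{f_i} = p_f$, (5) becomes

$$\left\lVert (N-1)\,p_{c_C} - F p_f - \sum_{i=1}^{C-1} p_{c_i} \right\rVert > \left\lVert C p_f - \sum_{i=1}^{C} p_{c_i} \right\rVert. \qquad (6)$$

Let $\hat{\theta}_C$ be the estimate obtained considering only correct points:

$$\hat{\theta}_C = \frac{1}{C}\sum_{i=1}^{C} p_{c_i}. \qquad (7)$$

Using this definition, we can simplify (6) as indicated in the following steps:

$$\left\lVert (N-1)\,p_{c_C} - (C\hat{\theta}_C - p_{c_C}) - F p_f \right\rVert > C\,\lVert p_f - \hat{\theta}_C \rVert, \qquad (8)$$

$$\left\lVert N p_{c_C} - C\hat{\theta}_C - F p_f \right\rVert > C\,\lVert p_f - \hat{\theta}_C \rVert, \qquad (9)$$

$$F\left\lVert \frac{N}{F} p_{c_C} - \frac{C}{F}\hat{\theta}_C - p_f \right\rVert > C\,\lVert \hat{\theta}_C - p_f \rVert, \qquad (10)$$

and, finally,

$$\frac{C}{F} < \frac{\left\lVert \frac{N}{F} p_{c_C} - \frac{C}{F}\hat{\theta}_C - p_f \right\rVert}{\lVert \hat{\theta}_C - p_f \rVert} = \frac{\left\lVert \frac{N}{F}(p_{c_C} - \hat{\theta}_C) + \hat{\theta}_C - p_f \right\rVert}{\lVert \hat{\theta}_C - p_f \rVert}. \qquad (11)$$

Summarizing, the fault-tolerant clustering makes no mistake if

$$\frac{C}{F} > \frac{\left\lVert \frac{N}{F}(p_{c_C} - \hat{\theta}_C) + \hat{\theta}_C - p_f \right\rVert}{\lVert \hat{\theta}_C - p_f \rVert}, \qquad (12)$$

which, by the triangle inequality, is implied by

$$\frac{C}{F} > \frac{\frac{N}{F}\lVert p_{c_C} - \hat{\theta}_C \rVert + \lVert \hat{\theta}_C - p_f \rVert}{\lVert \hat{\theta}_C - p_f \rVert} = \frac{N}{F}\,\frac{\lVert p_{c_C} - \hat{\theta}_C \rVert}{\lVert \hat{\theta}_C - p_f \rVert} + 1. \qquad (13)$$

Note that (13) is implied by the looser bound

$$\frac{C}{F} > 1 + \frac{N}{F}\,\frac{\epsilon_C}{\epsilon_F}, \qquad (14)$$

where $\epsilon_C = \max_{1 \le i \le C} \lVert \hat{\theta}_C - p_{c_i} \rVert$ is the maximum error among correct observations, and $\epsilon_F = \min_{1 \le i \le F} \lVert \hat{\theta}_C - p_{f_i} \rVert$ is the minimum error among faulty/malicious observations. Finally, since $C - F = N - 2F$, a sufficient condition for the fault-tolerant cluster algorithm to make no mistake (when $F < N/2$) is

$$\epsilon_F > \frac{\epsilon_C}{1 - 2F/N}. \qquad (15)$$

2. Let $\bar{\epsilon}_F = \epsilon_C/(1 - 2F/N)$. If a malicious point is at a distance greater than $\bar{\epsilon}_F$ from $\hat{\theta}_C$, then the fault-tolerant cluster algorithm discards the point. However, if all malicious points cluster at a point $\bar{p}_F$ at distance $\bar{\epsilon}_F$ from $\hat{\theta}_C$, then they inflict a maximum error $E = \lVert \hat{\theta}_{FT} - \hat{\theta}_C \rVert$ over the correct estimate $\hat{\theta}_C$. Error $E$ can be computed as follows:

$$E = \lVert \hat{\theta}_{FT} - \hat{\theta}_C \rVert = \left\lVert \frac{1}{N}\left(\sum_{i=1}^{C} p_{c_i} + \sum_{i=1}^{F} p_{f_i}\right) - \frac{1}{C}\sum_{i=1}^{C} p_{c_i} \right\rVert, \qquad (16)$$

$$E = \left\lVert \frac{C - N}{CN}\sum_{i=1}^{C} p_{c_i} + \frac{1}{N}\sum_{i=1}^{F} p_{f_i} \right\rVert, \qquad (17)$$

$$E = \left\lVert -\frac{F}{CN}\sum_{i=1}^{C} p_{c_i} + \frac{F}{N}\,\bar{p}_F \right\rVert = \frac{F}{N}\,\lVert \bar{p}_F - \hat{\theta}_C \rVert, \qquad (18)$$

and, finally,

$$E = \frac{F}{N}\,\bar{\epsilon}_F. \qquad (19)$$

∎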
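The two bounds of Theorem 4.4 are easy to evaluate numerically. A sketch with hypothetical helper names, assuming $0 \le F < N/2$: `removal_threshold` returns $\bar{\epsilon}_F = \epsilon_C/(1 - 2F/N)$ from (15) and `max_added_error` returns $E = (F/N)\,\bar{\epsilon}_F$ from (19).

```python
def removal_threshold(eps_c: float, n: int, f: int) -> float:
    """eps_bar_F = eps_C / (1 - 2F/N): faulty points farther than this from the
    correct-only estimate are guaranteed to be removed (requires F < N/2)."""
    if not 0 <= 2 * f < n:
        raise ValueError("Theorem 4.4 requires 0 <= F < N/2")
    return eps_c / (1 - 2 * f / n)


def max_added_error(eps_c: float, n: int, f: int) -> float:
    """E = (F/N) * eps_bar_F: worst-case shift of the fused estimate, as in (19)."""
    return f / n * removal_threshold(eps_c, n, f)
```

With $N = 9$ and $F = 3$ (one third faulty) and $\epsilon_C = 1$, the helpers give $\bar{\epsilon}_F \approx 3$ and $E \approx 1$ (up to floating-point rounding), matching the $F = N/3$ example in the text. As $F$ approaches $N/2$, the threshold grows without bound, which is why the theorem requires $F < N/2$.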

5 APPLICATION EXAMPLES

The previous sections have introduced the notion of inner-circle consistency and have presented a wireless node architecture that supports the proposed approach. The next sections demonstrate the inner-circle approach in two significant wireless application scenarios: 1) the neutralization of black hole attacks in AODV networks and 2) the neutralization of sensor errors in a wireless sensor network.

Fig. 9. Mounting a black hole attack.

5.1 Reliable and Secure AODV Networks: Black Hole Attack Case Study

We applied the inner-circle approach to neutralize security attacks that disrupt the operation of the Ad hoc On-demand Distance Vector (AODV) routing protocol in wireless networks [2]. As a representative example of such attacks, we focus on black hole attacks, in which a malicious node M advertises itself to other nodes as having the shortest (or the most recent) path to a node D whose packets it wants to intercept.

In AODV, a route is discovered only when needed. As depicted in Fig. 9a, a source S requests a route to a destination D by flooding a Route Request (RREQ) message into the network. When the RREQ message reaches D (see Fig. 9b), D constructs a Route Reply (RREP) message and sends it to the node from which it received the first RREQ message (node N3 in the figure). The RREP message is unicast back to S along the reverse route through which the RREQ was received; in the process, forwarding nodes update their routing tables to create a route from S to D. Both RREQ and RREP messages include a destination sequence number that is used to distinguish fresher routes from older ones. This field can be exploited to mount a black hole attack: a malicious node M replies to a received RREQ message with a malicious RREP message that carries a large destination sequence number.⁵ In Fig. 9c, M's RREP message appears more recent than N3's RREP message (its destination sequence number, 20, is greater than 5); hence, node N2 sets node M in its routing table as the next hop for forwarding data packets from S to D. Black hole attacks are very difficult to detect and protect against because user authentication and signed routing information alone cannot prevent compromised nodes from generating correctly signed yet malicious routing packets.

Some work has attempted to cope with black hole attacks in AODV networks. The proposed techniques require changes to AODV; have limited coverage and unbounded detection latency because they are based on network-wide mechanisms; and cannot cope with attack variants (gray hole attacks) in which a malicious node behaves as a good node most of the time and only sporadically as a black hole node [5], [6], [24]. For instance, in [6], black hole attacks are detected by the destination nodes. When a route is broken owing to node movement, a source S sends a new RREQ message to a destination D. At this point, D may detect that its local sequence number is lower than the destination sequence number included in the received RREQ message. This inconsistency indicates that a black hole attack was mounted on the previous route from S to D. We argue that such mechanisms are not effective against black hole attacks because the detection coverage is limited and the detection latency can be arbitrarily large. We propose using the inner-circle approach to neutralize black hole attacks at the source. By exploiting the redundant routing information that is naturally present in an ad hoc network, we can prevent the diffusion of malicious RREP messages in the network.
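The freshness rule that a black hole exploits can be illustrated with a simplified route-table update. This sketch assumes the preference used by AODV (RFC 3561): a higher destination sequence number wins, and on a tie, fewer hops wins. It ignores sequence-number wraparound, route lifetimes, and unknown-sequence-number handling; `RouteEntry` and `update_route` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RouteEntry:
    next_hop: str
    dest_seqno: int
    hop_count: int


def update_route(table: Dict[str, RouteEntry], dest: str, next_hop: str,
                 dest_seqno: int, hop_count: int) -> bool:
    """Install an advertised route if it is fresher, or equally fresh but shorter.
    Returns True if the routing table was changed."""
    cur = table.get(dest)
    fresher = cur is None or dest_seqno > cur.dest_seqno
    shorter = (cur is not None and dest_seqno == cur.dest_seqno
               and hop_count < cur.hop_count)
    if fresher or shorter:
        table[dest] = RouteEntry(next_hop, dest_seqno, hop_count)
        return True
    return False
```

Replaying the Fig. 9c scenario at node N2, an RREP via N3 with sequence number 5 followed by M's forged RREP with sequence number 20 leaves M as the next hop toward D, whatever hop counts are advertised; nothing in the rule itself lets N2 tell a forged sequence number from a genuine one.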
In our approach, each wireless node embeds the inner-circle framework, which is configured to intercept incoming/outgoing RREP messages and to run a deterministic voting service that checks the validity of received RREP messages. The deterministic voting protocol is adapted to our case study by instantiating (e.g., in a Linux shared library) the Inner-circle Callbacks (see Section 4) in Fig. 11, which implement AODV-specific actions that prevent black hole attacks. Each wireless node maintains a mapping fw that associates a pair (D, dseqno), where dseqno is the destination sequence number, with the set of nodes (depicted in brackets in Fig. 10) that are allowed to forward messages to D when the active route to D has destination sequence number dseqno. The operation is depicted in Fig. 10 (for an execution example that assumes a dependability level $L = 1$) and is described below:
• The AODV service at a node c (initially, node D in Fig. 10a) sends an RREP message designating node next_hop as the next hop in the process of sending the RREP message back to source S. Node c's Inner-circle Interceptor intercepts the RREP message and passes it to c's Inner-circle Voting Service, which executes a deterministic voting algorithm.
• During the voting algorithm execution, each node p that is an inner-circle node of c (N3 and N4 in Fig. 10a) verifies the validity of c's RREP message (check function in Fig. 11). Node p agrees only if c is the destination of the sought route ($c = D$) or p considers c a valid forwarding node to the destination ($c \in fw(D, dseqno)$), and both c and the RREP message's next hop are authenticated neighbors of p.
• If L inner-circle nodes agree with c's proposed RREP, then c assembles an agreed message and sends it to all its inner-circle nodes, including c's designated next hop (as indicated by the getAgrDst function in Fig. 11).
• On receiving the agreed message (the onAgreed function in Fig. 11), an inner-circle node p includes both c and next_hop in its set $fw(D, dseqno)$. If p is c's designated next hop, then p passes the RREP message encapsulated in the agreed message to its local AODV service.
• The operation continues with node p's AODV service sending an RREP message toward S (node N3 in Fig. 10b).

In Fig. 10d, a malicious node M sends an invalid RREP message that never gets approved by M's inner-circle nodes and, thus, never propagates in the network. It is easy to see that if the dependability level L is chosen so as to guarantee at least one non-Byzantine inner-circle node other than the center node (i.e., $T = 1$, see Section 4.3), then the proposed mechanism guarantees that only valid routes are established, i.e., it is impossible for a malicious node M to diffuse a malicious RREP message for a destination D if M is not on a path to destination D.

5. RREP messages also contain a hop-count field that can be exploited similarly. Although this section focuses only on exploits of the destination sequence number, the discussion can be generalized to the hop-count case.

Fig. 10. Neutralizing a black hole attack.
Fig. 11. Inner-circle Callbacks pseudocode for black hole attack neutralization.

Experimental Evaluation. We have used the ns-2 network simulator [4] to study the effectiveness of the proposed inner-circle approach in neutralizing black hole attacks. The simulation parameters⁶ and the results are reported in Fig. 12. For simplicity, our simulations do not allow node clocks to drift and, thus, do not model the execution of the clock synchronization protocol underlying the STS service. The effect of this approximation on the results in Fig. 12 would be negligible, since a clock synchronization protocol would involve only neighboring nodes (see Section 4.2). In the figure, we distinguish two main configurations:
• In the real STS case, we assume that the local network topology is discovered by the STS implementation outlined in Section 4.2. Basically, two-hop connectivity is discovered by periodic broadcasting and local forwarding of authenticated I-am-alive messages.
• In the ideal STS case, we assume an ideal topology service that provides the most up-to-date information without exchanging network messages. We use this idealized scenario to study the overhead induced by the proposed STS implementation.

Fig. 12a shows the overall network throughput (measured as the total number of packets received in the network divided by the total number of packets sent in the network). A significant result is that, in a network of 50 nodes, a single malicious node can reduce throughput from 96 percent (with no attack) to less than 10 percent. The throughput quickly degrades to 0.7 percent when 20 malicious nodes are present.
On the other hand, the inner-circle approach pays the price of a suboptimal 75–85 percent throughput in the absence of attacks (due to the underlying STS and IVS communication) but significantly reduces the effect of malicious nodes, maintaining throughput at 50–65 percent in the presence of 20 malicious nodes. Fig. 12b shows a node's average energy consumption. The results indicate that the energy overhead of the inner-circle approach ranges from 4–7 percent in the absence of attacks to less than 15 percent in the presence of attacks. Note that when the inner-circle approach is not used, black hole attacks reduce energy consumption because fewer messages are delivered in the network. Figs. 12c and 12d show the number of attempted and successful attacks, respectively. The number of attempts is much larger with the inner-circle approach because none of them succeeds (as shown in Fig. 12d) and, hence, malicious nodes keep trying. Finally, we note that the proposed STS implementation involves only marginal overhead in the studied experimental setup.

6. The value of parameter $\delta$ is selected to be larger than the measured one-hop communication delay (35 ms on average in the discussed experiments).

Fig. 12. Simulation study of black hole attack. (a) Network throughput. (b) Energy consumption. (c) Number of attempted attacks. (d) Number of successful attacks.
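The check and onAgreed callbacks described in this section (Fig. 11, not reproduced here) can be sketched as follows. This is a hypothetical Python rendering of the prose rules, not the authors' Linux shared-library code; it assumes the authenticated neighbor set is supplied by STS and that fw is a dictionary keyed by (D, dseqno).

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Rrep:
    dest: str        # D, destination of the sought route
    dest_seqno: int  # destination sequence number carried by the RREP
    next_hop: str    # node designated to relay the RREP toward S


@dataclass
class InnerCircleNode:
    node_id: str
    neighbors: Set[str]  # authenticated one-hop neighbors (from STS)
    fw: Dict[Tuple[str, int], Set[str]] = field(default_factory=dict)

    def check(self, center: str, rrep: Rrep) -> bool:
        """Approve c's RREP only if c is D or an allowed forwarder for (D, dseqno),
        and both c and the RREP's next hop are authenticated neighbors."""
        allowed = (center == rrep.dest
                   or center in self.fw.get((rrep.dest, rrep.dest_seqno), set()))
        return allowed and center in self.neighbors and rrep.next_hop in self.neighbors

    def on_agreed(self, center: str, rrep: Rrep) -> bool:
        """Record c and its next hop as forwarders; True if this node must relay."""
        self.fw.setdefault((rrep.dest, rrep.dest_seqno), set()).update(
            {center, rrep.next_hop})
        return self.node_id == rrep.next_hop
```

In the Fig. 10 flow, N4 approves D's RREP, records D and N3 as forwarders on receiving the agreed message, and can later approve N3's RREP; a forged RREP from M, with any sequence number, fails the check because M never entered fw.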

5.2 Reliable and Secure Sensor Networks: Faulty Sensors Case Study

This section demonstrates an application of the inner-circle consistency approach to improve sensor data accuracy in spite of sensor errors. The scenario considered is a wireless sensor network deployed in a remote region $R$ to detect and localize events of interest. It is assumed that a target event at a location $u$ emits an energy signal $S_i(u)$ that can be measured by a sensor node $i$ at location $s_i$. In addition, the strength of the emitted signal is assumed to decay polynomially with the distance, as modeled below [19]:

$$S_i(u) = \begin{cases} K\,T & \text{if } d < d_0, \\[4pt] \dfrac{K\,T}{(d/d_0)^k} & \text{otherwise,} \end{cases} \qquad (20)$$

where $K$ is the power emitted at the target's location $u$, $T$ is the sensor's sampling duration, $d = \lVert u - s_i \rVert$ is the distance between the target and the sensor, and $d_0$ is a constant determined by the physical sizes of the target and of the sensor. Since a sensor's energy measurements are usually corrupted by random environmental/measurement noise, the total energy measured by sensor $i$ is assumed to be $E_i = S_i(u) + N_i^2$, where $N_i \sim \mathcal{N}(0, \sigma_N)$.
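Model (20), the noisy measurement $E_i$, and the inversion of (20) that turns a signal level back into a distance can be sketched as follows. Assumptions: $\sigma_N$ is treated as the noise standard deviation, the inversion ignores the noise term $N_i^2$ (a real estimator would have to account for it), and all function names are hypothetical.

```python
import random


def signal(K: float, T: float, d: float, d0: float, k: float) -> float:
    """S_i(u) from (20): K*T within d0, then polynomial decay (d/d0)^-k."""
    return K * T if d < d0 else K * T / (d / d0) ** k


def sensed_energy(K: float, T: float, d: float, d0: float, k: float,
                  sigma_n: float, rng: random.Random) -> float:
    """E_i = S_i(u) + N_i^2, with N_i drawn from a zero-mean Gaussian (std sigma_n)."""
    return signal(K, T, d, d0, k) + rng.gauss(0.0, sigma_n) ** 2


def distance_from_signal(s: float, K: float, T: float, d0: float, k: float) -> float:
    """Invert (20) in the decay region (0 < s <= K*T): d = d0 * (K*T / s)^(1/k)."""
    return d0 * (K * T / s) ** (1.0 / k)
```

For example, with $K = 10$, $T = 1$, $d_0 = 1$, and $k = 2$ (arbitrary values), a target 4 units away yields $S_i = 10/16 = 0.625$, and inverting 0.625 recovers a distance of 4. Because $N_i^2 \ge 0$, the noise always inflates $E_i$, so naively inverting $E_i$ instead of $S_i$ biases distance estimates downward.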

The task of each sensor node is to detect the presence of a nearby target and to estimate the target’s position. To detect targets, sensor nodes follow the Neyman–Pearson strategy: sensor $i$ detects a target if the sensed energy $E_i$ is greater than a predetermined threshold, chosen to maximize the detection probability $P_D$ while keeping the false alarm probability $P_F$ below a desired bound. To localize a detected target, each sensor node uses its own position $s_i$ as the estimate of the target’s position. Once sensor $i$ detects and localizes a target, its task is to send a target notification $\langle t_i, E_i, u_i \rangle$ to the base station, where $t_i$ is the detection time, $E_i$ is the energy level with which the target was sensed, and $u_i$ is the target’s estimated position. The directed diffusion protocol stack is used to support this communication [14]. Sensor devices interact directly with the environment and, hence, are subject to a variety of physical, chemical, and biological forces that make them degrade fairly quickly. Field studies [7] indicate that errors originating in degraded sensor devices are a major cause of unreliability in wireless sensor networks. Interestingly, these sensor failures are likely to manifest days before the sensor electronics fail. Based on these results, we assume the following sensor fault model:

• Stuck at Zero: a faulty sensor $i$ constantly reports a fixed reading (a zero reading in this study): $E_i = 0$.
• Calibration Error: a faulty sensor $i$’s readings are affected by a multiplicative error, i.e., the correct reading $S_i + N_i^2$ is scaled by a constant calibration factor.
• Signal Interference: a faulty sensor $i$ reports readings affected by strong environmental disturbances, i.e., its noise term $N_i^2$ is multiplied by an interference factor much greater than 1.
• Positioning Error: a faulty sensor $i$ has an incorrect estimate of its own position: $s_i \sim \mathrm{Uniform}(R)$, i.e., its believed position is drawn uniformly at random from the deployment region $R$.

The remainder of this section contrasts two solutions to the detection/localization problem formulated above: 1) a centralized solution, where the base station collects raw target notifications $\langle t_i, E_i, u_i \rangle$ as they are generated by the sensor nodes, and 2) an inner-circle solution, where each wireless sensor node embeds the inner-circle framework, which is configured to intercept incoming/outgoing directed diffusion messages that carry target notifications $\langle t_i, E_i, u_i \rangle$ and to run the statistical voting service. The statistical voting protocol is adapted to our case study by instantiating (e.g., in a TinyOS component) inner-circle callbacks (see Section 4) that implement sensor-specific actions to prevent the propagation of both faulty and redundant data. Due to space limitations, formal specifications of these actions are relegated to [25]. To carry out reliable and secure in-network processing, the inner-circle solution uses statistical voting to improve the fidelity of each field of a target notification $\langle t_i, E_i, u_i \rangle$, with the fault-tolerant cluster algorithm of Section 4.4 (see footnote 7) used as the fault-tolerant fusion function. In particular, a target’s position is estimated locally by 1) computing the distance $d_i$ of each inner-circle sensor $i$ from the target (by using (20)), 2) applying a trilateration algorithm to each triple of pairs $(u_i, d_i)$ to obtain target location estimates $p_i$, and 3) filtering the resulting estimates $p_i$ (one per triple) with the fault-tolerant cluster algorithm.

7. Parameter is set based on the standard deviation of the noise $N_i^2$.

Fig. 13. Simulation study of a faulty sensor network. (a) Miss alarm probability. (b) False alarm probability. (c) Energy consumption with target. (d) Energy consumption with no target. (e) Target decision latency. (f) Target localization error.

Experimental evaluation. We have used the ns-2 network simulator [4] to study the effectiveness of the proposed inner-circle approach in coping with sensor errors. The simulation parameters and the results are reported in Fig. 13. Fig. 13a shows the probability of missing a valid target, which is zero for all configurations considered. Fig. 13b shows the probability of a spurious target detection. In the centralized case (marked as “No IC” on all graphs), this probability can be as high as 19 percent (under the signal interference fault model), while the inner-circle solution can reduce it to zero by exploiting target information shared by neighboring nodes. Figs. 13c and 13d show a node’s average energy consumption when there is a target and when there is no target, respectively. The figures indicate an over 50 percent reduction in energy consumption (due to the ability to suppress both duplicate and spurious detections), which can double the overall network lifetime. Figs. 13e and 13f show the target detection latency (i.e., the time elapsed between when a target appears and when the base station receives the first notification) and the target localization error (i.e., the distance between the real target location and the location estimated by the sensor network), respectively. The inner-circle solution provides a six-fold reduction in detection latency and a four- to five-fold reduction in localization error (for inner-circle sizes over four nodes).

In conclusion, the inner-circle approach provides notably improved performance in the considered configuration and fault models. While this result is significant, we cannot expect the same improvement if node density is sparse or, alternatively, if the signal emitted by the target is weak and is detected by only a few neighboring nodes (e.g., 1–2). To study this point further, we have rerun the experiments using a weaker target signal ($KT = 10{,}000$). Apart from the miss alarm probability, all considered metrics show an improvement similar to that of Fig. 13. The miss alarm probability, however, increases up to 2–5 percent for inner-circle sizes greater than five nodes, where the worst cases correspond to the signal interference and stuck-at-zero fault models. Consequently, when designing an inner-circle sensor network, it is important to balance the need to deploy sensors densely, so that they can check for malfunctions of their neighbors (by observing the same events), against the need to deploy them sparsely to save resources. How to accomplish this trade-off is application dependent and must be studied case by case.
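The sensor fault modes and the three local localization steps described above can be sketched in Python as follows. This is a hypothetical illustration under several assumptions: `calibration_factor` and `interference_factor` are our own names for the multiplicative calibration error and the interference amplification, inverting (20) ignores noise, a positioning error is modeled simply by reporting a wrong position, and the fusion step uses a coordinate-wise median as a simple robust stand-in. It is not the fault-tolerant cluster algorithm of Section 4.4.

```python
import itertools
import math
import statistics


def faulty_reading(mode, signal, noise, calibration_factor=0.5, interference_factor=10.0):
    """Reading reported under the assumed fault modes (hypothetical parameter values)."""
    if mode == "correct":
        return signal + noise ** 2
    if mode == "stuck_at_zero":
        return 0.0
    if mode == "calibration":      # multiplicative calibration error
        return calibration_factor * (signal + noise ** 2)
    if mode == "interference":     # noise term amplified by a large factor
        return signal + interference_factor * noise ** 2
    raise ValueError(f"unknown fault mode: {mode}")


def distance_from_energy(E, K, T, d0, k):
    """Step 1: invert Eq. (20), ignoring the noise term."""
    if E <= 0:
        return math.inf            # no usable signal, e.g., a stuck-at-zero sensor
    if E >= K * T:
        return d0                  # target within d0; return d0 as a coarse estimate
    return d0 * (K * T / E) ** (1.0 / k)


def trilaterate(a, b, c):
    """Step 2: 2-D trilateration from three ((x, y), d) pairs; None if ill-posed.

    Subtracting the first circle equation from the other two yields a 2x2
    linear system, solved here with Cramer's rule.
    """
    (x1, y1), d1 = a
    (x2, y2), d2 = b
    (x3, y3), d3 = c
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-9:            # collinear sensors: no unique solution
        return None
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


def localize(pairs):
    """Steps 2-3: trilaterate every triple, then fuse with a coordinate-wise median."""
    usable = [p for p in pairs if math.isfinite(p[1])]
    estimates = []
    for triple in itertools.combinations(usable, 3):
        est = trilaterate(*triple)
        if est is not None:
            estimates.append(est)
    if not estimates:
        return None
    xs = [x for x, _ in estimates]
    ys = [y for _, y in estimates]
    return statistics.median(xs), statistics.median(ys)
```

The median tolerates a single sensor with a positioning error only when the triples that exclude it outnumber those that include it, which mirrors the text's observation that the benefit shrinks when few neighbors observe the target.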

6

RELATED WORK

Providing availability despite crashes and node movement has been investigated by a number of researchers who have attempted to migrate fault tolerance techniques for distributed systems (e.g., replication, group communication, checkpointing) to the wireless domain [26], [27], [28]. Because of their complexity (heavy interaction between non-neighbor nodes), these techniques have rarely been extended to more complex failure models (e.g., data corruption, Byzantine faults) [29]. Providing security guarantees (e.g., integrity, authentication, authorization) is the goal of a significant body of literature in wireless ad hoc networks (including wireless sensor networks). The focus of the work is largely on distributing trust among the nodes [30], [31], [32] and on protecting network services (e.g., routing protocols) against attacks from the outside [33], [34]. However, only a few studies have started to investigate the protection of wireless environments against internal attacks [35], [36], [37]. In [37], the authors focus on maintaining wireless service availability and present a mechanism to stimulate users to keep their mobile devices turned on, to refrain from overloading the network, and to thwart tampering aimed at converting the device into a selfish one. In [36], intrusion tolerance is attempted by means of admission control and threshold cryptography. In order to participate in the network, a node must acquire a token (in practice, a certificate) from its neighbors. The token is collectively signed by K neighbors (through threshold cryptography) and must be renewed periodically. The scheme presupposes a local intrusion-detection module running on each node and, under the assumption that a malicious node n is eventually detected by a sufficient number of its neighbors, guarantees that malicious node n is eventually denied access to the network (its neighbors will revoke its token). 
The scheme fails if a malicious node changes its neighborhood before being detected and convicted by its previous neighbors. Consequently, the network may be left vulnerable to attacks for arbitrarily long periods—consider, for instance, a malicious node n that injects malicious attacks intermittently while roaming the network. It could be argued that the proposed inner-circle approach shares notions of threshold cryptography and localized message signing with [36]. A significant difference, though, is to be found in the masking nature of inner-circle voting, which prevents erroneous data propagation and, hence, avoids the problems of [36].


The concept of intrusion and fault tolerance has been extensively studied in the context of distributed systems and has resulted in a number of prototypes [3], [38], [39]. The inner-circle consistency approach differs substantially from these techniques, as it is specifically designed for mobile wireless environments. Typical intrusion and fault tolerance techniques deliberately replicate data and computation on multiple nodes in order to mask errors/intrusions in a server application running on these nodes. In contrast, inner-circle consistency leverages the partially replicated data/computation that is naturally available in the neighborhood of a sender node to neutralize errors/attacks originating from that node.

7

CONCLUSIONS

This paper proposes the notion of inner-circle consistency to protect wireless ad hoc networks from errors and attacks. Through local node interaction, errors/attacks are neutralized at the source, both preventing their propagation in the wireless network and improving the fidelity of the propagated information. Thus, an unreliable and insecure wireless network is transformed into a dependable network substrate on top of which applications benefit from improved network reliability and security. This goal is achieved by combining statistical (a proposed fault-tolerant cluster algorithm) and threshold cryptography techniques with application-aware checks to exploit the data/computation that is partially and naturally replicated in wireless applications. A formally specified inner-circle framework is prototyped with the ns-2 network simulator and used to demonstrate the idea of inner-circle consistency in two significant wireless scenarios: 1) the neutralization of black hole attacks in AODV networks and 2) the neutralization of sensor errors in a target detection/localization application executed over a wireless sensor network.

ACKNOWLEDGMENTS

This work was supported in part by a Vodafone fellowship, US National Science Foundation grant CNS-0406351, MURI grant N00014-01-1-0576, the Gigascale Systems Research Center (GSRC/MARCO), and the Motorola Corporation as part of Motorola Center. The authors thank Fran Baker for insightful editing of this manuscript.

REFERENCES

[1] C. Basile, M.-O. Killijian, and D. Powell, “A Survey of Dependability Issues in Mobile Wireless Networks,” technical report, LAAS-CNRS, Toulouse, 2003.
[2] C.E. Perkins, E.M. Belding-Royer, and I. Chakeres, “Ad Hoc on Demand Distance Vector (AODV) Routing,” IETF Internet Draft, technical report, 2003.
[3] G.P. Saggese, C. Basile, L. Romano, Z. Kalbarczyk, and R. Iyer, “Hardware Support for High-Performance, Intrusion- and Fault-Tolerant Systems,” Proc. IEEE Symp. Reliable Distributed Systems (SRDS ’04), 2004.
[4] The Network Simulator ns-2, http://nsnam.isi.edu/nsnam/index.php, 2006.
[5] W. Wang and B. Bhargava, “On Vulnerability and Protection of Ad Hoc On-Demand Distance Vector Protocol,” Proc. Int’l Conf. Telecomm., 2003.
[6] S. Ramaswamy, H. Fu, and M. Sreekantaradhya, “Prevention of Cooperative Black Hole Attack in Wireless Ad Hoc Networks,” Proc. Int’l Conf. Wireless Networks, 2003.
[7] R. Szewczyk, J. Polastre, A. Mainwaring, and D. Culler, “Lessons from a Sensor Network Expedition,” Proc. European Workshop Wireless Sensor Networks, 2004.
[8] C. Savarese, J. Rabaey, and K. Langendoen, “Robust Positioning Algorithms for Distributed Ad-Hoc Wireless Sensor Networks,” Proc. USENIX Technical Conf., 2002.
[9] V. Shoup, “Practical Threshold Signatures,” Lecture Notes in Computer Science, vol. 1807, pp. 207-218, 2000.
[10] A. Herzberg, S. Jarecki, H. Krawczyk, and M. Yung, “Proactive Secret Sharing or: How to Cope with Perpetual Leakage,” Lecture Notes in Computer Science, vol. 963, 1995.
[11] F. Cristian and C. Fetzer, “The Timed Asynchronous Distributed System Model,” IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 6, June 1999.
[12] J. Douceur, “The Sybil Attack,” Proc. Int’l Workshop Peer-to-Peer Systems (IPTPS ’02), 2002.
[13] L. Lamport, R. Shostak, and M. Pease, “The Byzantine Generals Problem,” ACM Trans. Programming Languages and Systems, vol. 4, no. 3, 1982.
[14] C. Intanagonwiwat, R. Govindan, and D. Estrin, “Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks,” Proc. MobiCom, 2000.
[15] R. Cunningham and V. Cahill, “Time Bounded Medium Access Control for Ad Hoc Networks,” Proc. Principles of Mobile Computing (POMC ’02), 2002.
[16] M.T. Sun, L. Huang, A. Arora, and T.H. Lai, “MAC Layer Multicast in IEEE Wireless Networks,” Proc. Int’l Conf. Parallel Processing (ICPP ’02), 2002.
[17] B. Barak, S. Halevi, A. Herzberg, and D. Naor, “Clock Synchronization with Faults and Recoveries,” Proc. Symp. Principles of Distributed Computing, 2000.
[18] D. Dolev et al., “Reaching Approximate Agreement in the Presence of Faults,” J. ACM, vol. 33, no. 3, pp. 499-516, 1986.
[19] T. Clouqueur, K.K. Saluja, and P. Ramanathan, “Fault Tolerance in Collaborative Sensor Networks for Target Detection,” IEEE Trans. Computers, vol. 53, no. 3, pp. 320-333, Mar. 2004.
[20] L. Lamport and P.M. Melliar-Smith, “Synchronizing Clocks in the Presence of Faults,” J. ACM, vol. 32, no. 1, pp. 52-78, 1985.
[21] M.H. Azadmanesh and R.M. Kieckhafer, “New Hybrid Fault Models for Asynchronous Approximate Agreement,” IEEE Trans. Computers, vol. 45, no. 4, pp. 439-449, Apr. 1996.
[22] L. Kleinrock and J. Silvester, “Optimum Transmission Radii for Packet Radio Networks or Why Six Is a Magic Number,” Proc. IEEE Nat’l Telecomm. Conf., 1978.
[23] “Jouletrack,” http://carlsberg.mit.edu/JouleTrack/, 2005.
[24] H. Deng, W. Li, and D.P. Agrawal, “Routing Security in Wireless Ad Hoc Networks,” IEEE Comm. Magazine, 2002.
[25] C. Basile, Z. Kalbarczyk, and R. Iyer, “Neutralization of Error and Attacks in Wireless Ad Hoc Networks,” technical report, Univ. of Illinois at Urbana-Champaign, 2005.
[26] T. Hara, “Effective Replica Allocation in Ad Hoc Networks for Improving Data Accessibility,” Proc. INFOCOM, pp. 1568-1576, 2001.
[27] Q. Huang, C. Julien, G. Roman, and A. Hazemi, “Relying on Safe Distance to Ensure Consistent Group Membership in Ad Hoc Networks,” Proc. Int’l Symp. Consumer Electronics, 2001.
[28] R. Prakash and M. Singhal, “Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 10, pp. 1035-1048, Oct. 1996.
[29] J. Luo, J.-P. Hubaux, and P. Eugster, “PAN: Providing Reliable Storage in Mobile Ad Hoc Networks with Probabilistic Quorum Systems,” Proc. MobiHoc, pp. 1-12, 2003.
[30] N. Asokan and P. Ginzboorg, “Key-Agreement in Ad-Hoc Networks,” Computer Comm., vol. 23, no. 17, pp. 1627-1637, 2000, citeseer.ist.psu.edu/asokan99key.html.
[31] H. Luo, P. Zerfos, J. Kong, S. Lu, and L. Zhang, “Self-Securing Ad Hoc Wireless Networks,” Proc. IEEE Symp. Computers and Comm., 2002.
[32] W. Du et al., “A Key Management Scheme for Wireless Sensor Networks Using Deployment Knowledge,” Proc. INFOCOM, 2004.
[33] S. Ghazizadeh, O. Ilghami, E. Sirin, and F. Yaman, “Security-Aware Adaptive Dynamic Source Routing Protocol,” Proc. IEEE Conf. Local Computer Networks, 2002.
[34] M.G. Zapata, “Secure Ad Hoc On-Demand Distance Vector (SAODV) Routing,” Internet Draft draft-guerrero-manet-saodv-00.txt, 2002.
[35] S. Marti, T.J. Giuli, K. Lai, and M. Baker, “Mitigating Routing Misbehavior in Mobile Ad Hoc Networks,” Proc. Int’l Conf. Mobile Computing and Networking, pp. 255-265, 2000.
[36] X.M.H. Yang and S. Lu, “Self-Organized Network Layer Security in Mobile Ad Hoc Networks,” Proc. MobiCom, 2002.
[37] L. Buttyan and J. Hubaux, “Enforcing Service Availability in Mobile Ad-Hoc WANs,” Proc. Workshop Mobile Ad Hoc Networking and Computing, 2000.
[38] L. Zhou et al., “COCA: A Secure Distributed Online Certification Authority,” ACM Trans. Computer Systems, vol. 20, no. 4, 2002.
[39] MAFTIA Project, http://www.newcastle.research.ec.org/maftia/, 2003.

Claudio Basile received the Laurea degree (summa cum laude) in computer engineering from the University of Naples Federico II, Italy, and the MS and PhD degrees in electrical and computer engineering from the University of Illinois at Urbana-Champaign, where he was in part supported by a Vodafone fellowship. He has recently joined Google in Mountain View, California. His research interests include the design and validation of secure and reliable distributed computing systems, with particular focus on techniques for software/hardware-implemented dependability in wireless ad hoc networked systems. He is a member of the IEEE.

Zbigniew Kalbarczyk received the PhD degree in computer science from the Technical University of Sofia, Bulgaria. After that, he worked as an assistant professor in the Laboratory for Dependable Computing at Chalmers University of Technology in Gothenburg, Sweden.
He is currently a principal research scientist in the Center for Reliable and High-Performance Computing in the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign, where he is a lead researcher on the project to explore and develop high availability and security infrastructure capable of managing redundant resources across interconnected nodes, to foil security threats, detect errors in both the user applications and the infrastructure components, and recover quickly from failures when they occur. His research also involves developing automated techniques for the validation and benchmarking of dependable computing systems. He has served as program cochair for the International Performance and Dependability Symposium (IPDS), a track of the Conference on Dependable Systems and Networks (DSN ’02), and is regularly invited to work on the program committees of major conferences on design of fault-tolerant systems. His research interests are in the area of reliable and secure networked systems. He is a member of the IEEE and the IEEE Computer Society.

Ravishankar K. Iyer is the director of the Coordinated Science Laboratory (CSL) at the University of Illinois at Urbana-Champaign, where he is the George and Ann Fisher Distinguished Professor of Engineering. He holds appointments in the Department of Electrical and Computer Engineering and the Department of Computer Science, and is codirector of the Center for Reliable and High-Performance Computing at CSL. He also currently leads the Chameleon-ARMORs project at Illinois, which is developing adaptive architectures for supporting a wide range of dependability and security requirements in heterogeneous networked environments. His research interests are in the area of reliable networked systems.
He has received several awards, including the Humboldt Foundation Senior Distinguished Scientist Award for excellence in research and teaching, the AIAA Information Systems Award and Medal for “fundamental and pioneering contributions towards the design, evaluation and validation of dependable aerospace computing systems,” and the IEEE Emanuel R. Piore Award “for fundamental contributions to measurement, evaluation and design of reliable computing systems.” He is a fellow of the IEEE and the ACM, an associate fellow of the American Institute of Aeronautics and Astronautics (AIAA), and a member of the IEEE Computer Society.