Sensor Scheduling for Aggregate Monitoring in Wireless Sensor Networks

Xingbo Yu, Sharad Mehrotra, Nalini Venkatasubramanian
School of Information and Computer Science, University of California, Irvine, CA 92697, USA
fxyu, sharad, [email protected]

Abstract

Wireless sensor networks are characterized by their capability of generating, processing, and communicating data. They are being explored in many scientific domains, e.g., ecology and oceanography. Most of these applications primarily involve data collection with in-network processing, in which continuous aggregate queries are posed and processed. There are two principal concerns with this type of application. First, because sensor nodes run on batteries, limited power has been identified as a major challenge in deploying wireless sensor networks, and minimizing power consumption is of utmost importance. Second, data is usually expected to be gathered as soon as possible to facilitate the monitoring of physical phenomena. In this paper, we tackle these issues through sensor state scheduling. The proposed technique is based on the observation that there are two types of traffic in sensor networks designed for data aggregation, bottom-up and top-down, within an abstract tree structure. We show that it is possible to achieve deterministic schedules for data aggregation with very good performance. Specifically, we develop greedy algorithms to schedule transmission and listening operations for each sensor node to achieve collision-free communication. We show that the schedules can maximize the time sensor nodes spend in low-power states, which helps achieve high energy efficiency, and also allow fast data aggregation.

1 Introduction

Advances in MEMS technology have enabled the development of small, inexpensive electronic microsensor devices (e.g., Berkeley Motes [15]) that integrate sensing, processing, and communicating components. Applications of the networks have been found in many scientific domains, such as ecological and biomedical studies [23, 26]. Portable operating systems that can be embedded into such devices (e.g., TinyOS [12]) have been developed that allow for “drop and play” applications to execute over such devices. Efforts are now ongoing to develop powerful programming environments to facilitate application development over sensor networks. In particular, both hardware and software designs of micro sensing devices enable them to be deployed in large-scale monitoring applications.

It has been recognized that a flexible framework for queries is a key component of a sensor application development environment [21, 2]. Many recent works [22, 17, 27, 21] have made strong efforts to build querying functionality as part of the basic programming environment for sensor networks. Much as in traditional SQL, such a querying module allows application writers to declaratively specify a set of sensors of interest (e.g., sensors in a given geographical region or whose values satisfy certain conditions) and perform operations (such as aggregation) over the data of the selected sensors. This querying functionality has been shown to dramatically simplify application development by shielding application writers from the complexities of the sensor environment, such as networking mechanisms. It also handles optimization issues, such as the energy management and timeliness discussed in this paper, transparently to application users.

In this paper, our concern is with techniques to implement a query mechanism that collects data periodically over sensor networks while allowing in-network aggregation, which can be either numerical aggregation or message merging. We term this type of application aggregate monitoring. In-network aggregation is desirable for two reasons. First, it enables simple but efficient distributed processing of aggregate queries such as summation and nearest neighbors. Second, even for pure data collection, it allows a significant reduction in the number of messages generated within the network, which in turn helps improve energy efficiency.

Aggregate monitoring in sensor networks consists of two phases. During the first phase, a hierarchical tree structure rooted at an access point is discovered via techniques such as network clustering [3, 16]. A continuous query is then disseminated to the nodes of interest in the network following the tree structure. The dissemination phase is followed by the processing phase, in which every sensor node periodically transmits its data to a data collection point. Data is aggregated on sensor nodes en route. Furthermore, to conserve energy, sensor nodes are transitioned to low-energy states (e.g., by switching off their radios) after they have received and transmitted their data. The periodicity of the data collection is usually determined by factors such as application needs, the residual battery power on the sensor nodes, and the expected lifetime of the sensor network.

Two critical concerns application developers face are the power constraints of the sensor devices and the timeliness of the data collection process. The sensor devices are usually battery powered. Due to the deployment environment, physically replacing the batteries can be very difficult, if possible at all. Hence minimizing power consumption has been identified as a primary optimization goal in many works on wireless sensor networks. On the other hand, the timeliness of data collection is also critical to applications in which a timely response is needed, such as biomedical monitoring or intrusion detection. Minimizing the data collection time, in other words the makespan, becomes another important goal in these applications.

Aggregate monitoring has been studied in the database community in TAG [22]. Approximation techniques [6, 24] have also been explored to achieve energy-efficient data aggregation by eliminating unnecessary messages. While all of these works focus on the first challenge, they ignore the media access problems caused by the shared medium. These problems are instead pushed to the existing CSMA-based MAC protocols, which are designed for general network traffic. Moreover, the use of CSMA-based protocols can result in very long makespans, which can be unacceptable.
In this paper, we tackle these issues by exploring sensor state scheduling for data collection and aggregation within sensor networks. Our contributions in this paper are as follows.

We identify the problem of aggregate monitoring and its characteristics that are critical to developing data collection schedules.

We identify the potential collisions relevant to aggregate monitoring in the deployed wireless environment.

We propose TDMA scheduling algorithms based on greedy heuristics to determine sensor node operation states to achieve energy-efficient, collision-free communication for data collection and control message dissemination with short makespans.

We present preliminary experimental results to evaluate the proposed algorithms.

Road Map: The rest of the paper is organized as follows. In the next section, we describe our system model and formally present the problem to be addressed in this work. In Section 3, we present two scheduling algorithms to determine sensor states for data collection and control message dissemination. Section 4 evaluates the performance of the proposed techniques. We review related work in Section 5 and conclude the paper in Section 6 with a summary of the work and future research directions.

2 Preliminaries and Problem Statement

2.1 Sensor Model and Sensor Networks

The sensor node we consider in this work is composed of a processor, an embedded sensor, an A/D converter, and a radio, as with the Mica Mote [15]. Each of these components is controlled by a micro operating system [12]. The radio is a hybrid transceiver which allows single-channel communication, i.e., either transmitting or receiving at the same frequency. The nodes are battery-powered. Based on the radio circuit operation mode, a sensor node has three different states, as shown in Table 1. While in the transmitting state Sx, a sensor node has its radio circuit operating as a transmitter. The receiving state Sr is a state in which a sensor node has its receiver on and is ready to receive data. Generally, a sensor node consumes power at a similar rate when operating in states Sx and Sr. The most energy-efficient state is the sleeping state Ss, in which a sensor turns its radio off completely and therefore consumes power at a much lower rate.

Table 1. Communication States of Sensor Nodes

Symbol   Sensor Node State   Radio Operation Mode
Sx       Transmitting        Transmitter
Sr       Receiving           Receiver
Ss       Sleeping            Off

The network under consideration has a super node, an “access point” (AP), with superior resources in terms of computation and communication with remote users who pose queries. All sensor nodes are scattered around the access point to form a connected network. The network topology is relatively stable. Communication between sensor nodes occurs via multi-hop messaging. The sensor nodes are homogeneous and generate sensor data. They are also clock-synchronized.¹

¹ Clock synchronization is an important research issue [7] in wireless sensor networks and is required by most sensor network applications, but it is beyond the scope of this paper.

We consider single-channel communication, which means that a node cannot receive and transmit at the same time. Moreover, a sensor node can successfully receive a message only if it is within the radio transmission range of exactly one transmitting node. Otherwise, message contention occurs.
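The reception rule of this channel model can be written as a small predicate. The following is a minimal sketch, not part of the original system: the function name `can_receive`, the node identifiers, and the dictionary-based neighbor map are illustrative assumptions.

```python
from typing import Dict, Set


def can_receive(node: str, transmitting: Set[str],
                neighbors: Dict[str, Set[str]]) -> bool:
    """Single-channel reception rule (illustrative model).

    A node cannot receive while it is transmitting, and it receives
    successfully only if exactly one transmitting node is in its range.
    """
    if node in transmitting:
        return False  # half-duplex radio: no receive while transmitting
    senders_in_range = neighbors[node] & transmitting
    return len(senders_in_range) == 1


# Hypothetical three-node neighborhood: r hears both a and b.
neighbors = {"r": {"a", "b"}, "a": {"r"}, "b": {"r"}}
print(can_receive("r", {"a"}, neighbors))       # one sender in range
print(can_receive("r", {"a", "b"}, neighbors))  # two senders: contention
```

Treating "one transmitting node" as "exactly one" is an interpretation of the model; zero senders in range simply means there is nothing to receive.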

Figure 1. False Block: Packets from s3 to r3 are falsely blocked although they can be delivered concurrently with packets from s1 to r1

2.2 Sensor Data Aggregation

In aggregate monitoring applications, a set of sensors sense the environment and generate data. The goal is to collect the data from the sensor nodes to the AP periodically while performing in-network aggregation. In-network aggregation is enabled through an aggregation tree, an abstract tree structure commonly used to represent a sensor network. The tree is usually built during the topology discovery phase of the deployment of a sensor network using clustering techniques [11, 20, 28, 3]. Each non-leaf node in the aggregation tree serves the extra role of aggregator in addition to data deliverer and (sometimes) producer. The aggregation can be any type of operation, such as summation over numerical data or simply the merging of messages without any computation. In this work, we deal with general data aggregation without concentrating on any specific sensor data or aggregation operation.

In the data aggregation process, data flows from the bottom of the tree all the way up to the root following the tree links. To maximize the benefit of data aggregation, the following observation can be made about this process. This intuitive observation illustrates an important property of the applications under consideration and sets a constraint for the scheduling algorithm developed later in the paper.

Observation 1 In aggregate monitoring applications with in-network processing, a node transmits its data only after it receives data from all its child nodes in the data collection process. By deduction, a node transmits only after all its descendants have completed their data transmissions.

With an aggregation tree, data is collected with a periodicity, the epoch, specified by users. The epoch, which is determined by the monitoring application, is usually long compared with the communication latency needed to aggregate data values all the way to the AP. Within each epoch, a time interval is allocated for each parent node to receive updates from its child nodes. Child nodes resort to a reservation-based CSMA MAC protocol to communicate with the parent node. Nodes go back to the sleeping state when they are not communicating. Note that although an epoch can be a long period, it is desirable that the data be collected as early as possible so that a fast response can be initiated if anything abnormal is observed. This in-network aggregation technique has been investigated by Madden et al. in TAG [22].

On the other hand, certain control messages are needed to adjust the epoch or query predicates. In quality-aware data collection [6, 24], error tolerance needs to be dynamically adjusted as well. This information needs to be disseminated from the root to all the nodes in a network to facilitate query processing. Intuitively, a node transmits its data only after it receives data from its parent node in the dissemination process. Although this type of top-down data flow happens much less often, a deterministic schedule can still be helpful, as described next.

2.3 Message Contention

The data aggregation techniques proposed so far work well if the networks are sparse. However, sensor networks are often relatively dense, and contention on the shared medium has to be resolved. In wireless networks based on 802.11 MAC protocols, CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) is deployed to handle message contention. We examine the problems inherent in these protocols.

The first issue lies in the exponential backoff scheme deployed in the protocol to resolve contention. Consider a node waiting for all its child nodes to send their messages. In these protocols, a sender backs off exponentially when a collision is detected. Theoretically, this collision avoidance technique cannot completely eliminate collisions within any given time duration. This implies that no matter how long the receiving time interval is, there is always a possibility that a node cannot receive data from all its child nodes. On the other hand, for collisions to be avoided in practice, the receiver has to keep listening for what is likely a very long time (usually referred to as idle listening) before the transmissions can be completed successfully, especially when the number of child nodes is not small.

Another issue lies in the fact that the MAC protocol uses two types of control packets, RTS (request to send) and CTS (clear to send), to reserve the medium and resolve the so-called hidden terminal problem [25]. A CTS message is required from a receiver before a sender can transmit. On the other hand, if a node is receiving, all nodes in its vicinity must avoid transmission and are thus blocked. However, when a node cannot issue a CTS message for this reason although it has been requested to and could actually receive without collision, a false blocking problem [25, 19] occurs. As illustrated in Figure 1, where the circles represent the radio coverage range of the nodes, r2 is blocked when s1 is sending packets to r1. Hence when s2 sends an RTS packet to r2, r2 cannot respond. But the RTS packet also blocks r3. As a result, if s3 requests to send packets to r3, r3 will not be able to respond. A possible parallel communication is falsely blocked. This problem exacerbates the issues of energy inefficiency and large makespan by requiring longer listening time. Note that since the sensor states are classified by radio circuit operation mode, idle listening and receiving are considered to be the same state, as both require the receiver to be turned on, although a sensor node may actually consume more energy receiving packets due to the operation of other components.

The above discussion shows that while CSMA/CA is a general solution to message contention, it is also an expensive solution in terms of both energy efficiency and timeliness. In this work, we exploit the well-defined traffic characteristics of aggregate monitoring applications and explore TDMA (Time Division Multiple Access) scheduling to address these issues.
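The claim that random backoff can never rule out collisions within a bounded interval can be illustrated with a deliberately simplified model. The sketch below assumes that each child independently picks one backoff slot uniformly at random from a fixed window; this is a textbook simplification introduced here for illustration, not the actual 802.11 backoff procedure, and the function name is hypothetical.

```python
def prob_all_distinct(children: int, window: int) -> float:
    """Probability that all children pick different backoff slots.

    Assumed model: each child draws one slot uniformly and independently
    from `window` slots. Two children drawing the same slot collide.
    """
    if children > window:
        return 0.0  # pigeonhole: some slot must be shared
    p = 1.0
    for k in range(children):
        p *= (window - k) / window
    return p


for window in (16, 64, 1024):
    print(window, prob_all_distinct(5, window))
```

Under this model the probability of a collision-free round grows with the window but stays strictly below 1 for any finite window whenever there are at least two contenders, which is the intuition behind the idle-listening cost discussed above. A deterministic TDMA schedule avoids the trade-off by assigning slots in advance.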

2.4 Problem Statement

We consider data collection with in-network aggregation in a wireless sensor network through an aggregation tree. This implies that every node knows its parent node and its child nodes in its neighborhood. The aggregation epoch is T. We consider two types of data flow: bottom-up, in which data is sent from all nodes to the AP, and top-down, in which control messages are disseminated from the root to all sensor nodes. We have the following precedence constraint as a direct result of Observation 1.

Definition 1 (PRECEDENCE CONSTRAINT). In aggregate monitoring applications with in-network aggregation, a node has to transmit its data before its ancestors for data collection. Correspondingly, a node has to transmit its data before its descendants for control message dissemination.

The goal of this work is to design a scheduling algorithm to schedule the radio operations for each node in order to 1) enable the two types of data flow under the precedence constraint; 2) collect data to the AP within a known time; 3) achieve energy efficiency by eliminating message contention.

The aggregation period T can be segmented into time slots from the beginning of the period in order to schedule node operations. The length of the slots is determined by the transmission rate of the sensor radio and the size of the data packet to be transmitted. To simplify discussion, we assume all time slots are of the same duration. The presented algorithm can be easily extended to handle varying time slots. With time slots introduced, the goal of the proposed scheduling algorithms is to decide the specific mode each node should operate in for each time slot.

Note that the proposed algorithms are designed to develop TDMA schedules for aggregate monitoring. The schedules are at the application layer and are not intended to serve as the fundamental MAC protocol for general communication in the network. CSMA/CA may still be present to serve as the general MAC protocol. In fact, the initial topology discovery and scheduling phases of our algorithm need such a protocol to deliver messages.

3 Scheduling for Data Aggregation

The minimal-makespan scheduling problem for general data collection has been shown to be NP-complete in [4]. We propose heuristic scheduling algorithms to achieve small deterministic makespans for aggregate monitoring. In this section, we first present PAS, a Postorder Aggregation Scheduling algorithm, which produces schedules that guarantee the data aggregation process completes within a certain time while maximizing node sleeping time. The sleeping time is maximized because nodes are in transmitting or receiving states only when they need to be. Idle listening is eliminated by the scheduling algorithm.

The proposed algorithm is a distributed algorithm in which the computation of schedules is pushed to the sensor nodes. However, the algorithm can also be executed centrally if the access point obtains the topology information of the network. The algorithm is performed only in the network configuration phase following topology discovery, or when a change in network topology is detected.

The scheduling process can be viewed as a traversal of the aggregation tree rooted at the AP. The postorder scheduling in the algorithm is determined by the precedence constraint in data collection. Transmission priority is given to lower-level nodes, with leaves at the lowest levels. For control messages, we describe at the end of this section a similar algorithm, Preorder Scheduling, that schedules parent nodes before child nodes.
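The bottom-up half of the precedence constraint in Definition 1 lends itself to a mechanical check on a candidate schedule. The sketch below is illustrative only: the names `satisfies_precedence`, `parent`, and `tx_slot`, and the node labels, are assumptions. It relies on the fact that if every node transmits strictly before its parent, then by transitivity it transmits before all of its ancestors.

```python
from typing import Dict, Optional


def satisfies_precedence(parent: Dict[str, Optional[str]],
                         tx_slot: Dict[str, int]) -> bool:
    """Check the bottom-up precedence constraint for data collection.

    `parent` maps each node to its parent in the aggregation tree
    (None for the root). `tx_slot` maps each transmitting node to its
    transmission slot; the root (access point) only receives, so it has
    no entry.
    """
    for node, par in parent.items():
        if par is None or par not in tx_slot:
            continue  # the root, or a child of the root: nothing to compare
        if tx_slot[node] >= tx_slot[par]:
            return False  # child would transmit too late to be aggregated
    return True


# Hypothetical four-node tree rooted at the access point.
parent = {"AP": None, "n2": "AP", "n4": "n2", "n5": "n2"}
print(satisfies_precedence(parent, {"n4": 1, "n5": 2, "n2": 3}))
print(satisfies_precedence(parent, {"n4": 1, "n5": 2, "n2": 2}))
```

The top-down constraint for control message dissemination is the mirror image: the comparison is reversed so that each node transmits before its children.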

3.1 Design Considerations and PAS Overview

The key to implementing collision avoidance on sensor nodes and maximizing spatial reuse of the shared medium is to identify collisions. We have identified two types of collisions in the context of data aggregation using a tree structure. The following two definitions are presented from the standpoint of a node that is concerned about the consequences of its decision to transmit.

Definition 2 (INTERFERENCE COLLISION). An interference collision occurs when some nodes in its neighborhood, besides the expected recipient, are scheduled to receive messages in the same time slot.

Definition 3 (DESTINATION COLLISION). A destination collision occurs when the parent node, the expected recipient of the transmission, is able to hear from other nodes in its neighborhood at the same time.

Figure 2. Interference Conflicts and Destination Conflicts

What distinguishes the two types of collisions is whether the collisions happen at a random neighbor of a node or at its parent node which should be the expected recipient. It is important to avoid interference collision to make sure the transmission to be scheduled does not interfere with other nodes which may already have established their own receiving schedules. Avoiding the destination collision ensures that the message can reach the destination without being corrupted by other messages. Indeed, we have the following lemma.

The interference conflict captures the possibility of interference collisions. Note that the schedules of two leaf nodes are not interference conflicts to each other, since neither node will receive in the data aggregation process. This observation can help reduce the information to be stored during scheduling. However, they may constitute another type of conflict. As shown in Figure 2, the schedule of node C can be an interference conflict to both nodes B and D.

Lemma 1 Avoiding both interference collisions and destination collisions is necessary and sufficient to ensure successful transmissions for all sensor nodes.

Definition 5 (DESTINATION CONFLICT). A complete or partial schedule of a node is a destination conflict to an unscheduled (or partially scheduled) two-hop neighboring node when they share a common neighboring node which is the parent node of either the unscheduled (or partially scheduled) node or both of them.

We next identify sensor scheduling states to better understand the effects of the collisions on scheduling. There are three scheduling states for any node during the scheduling process as listed below.



The destination conflict captures potential destination collisions. In Figure 2, the schedule of node C is a destination conflict for node A. Note that the dotted line illustrates that nodes B and C are within each other’s radio range, although the link is not used as a tree link in the aggregation tree. We emphasize again that the two types of conflicts are defined from the point of view of a specific sensor node. A closer examination shows that the destination conflict as observed by a node is actually the interference conflict of its parent node. The distinction is made in order to facilitate the scheduling of the current node.

For a node to make scheduling decisions, awareness of the two types of conflicts is critical in order to avoid collisions. This awareness is implemented by a conflict collecting scheme and a conflict passing scheme in PAS. To enable conflict collecting, nodes always broadcast their schedules right after they finalize them or update their partial schedules. An unscheduled or partially scheduled node puts together all the conflict schedules it has received from neighbors into a table. We call this table a Conflict Table (CT) (see the 3-column tables in Figure 5). The conflict table only captures interference conflicts. Destination conflicts are acquired by a node through conflict passing: a parent node always passes its conflict table to a child along with a scheduling request, as discussed later.

The PAS algorithm schedules child nodes before parent nodes, with leaf nodes scheduled first. It concentrates on

SCHEDULED. The schedule of the node is completed and fixed. In PAS, completed schedules are always protected. Later schedules should not conflict with them.

PENDING. The schedule is partially fixed. This applies to nodes serving as parent nodes. When some of a node's child nodes have scheduled their transmissions, the node's receiving time slots are fixed accordingly. The node is still waiting for its other descendants to complete their scheduling. However, the fixed partial schedule should be protected as well.

UNSCHEDULED. No communication state has been determined for any of the node's time slots.

To avoid collisions and hence protect fixed schedules, a sensor node needs to be aware of these schedules before it schedules itself. Similar to the collisions defined above, we define two types of potential conflicts as follows. The separation will facilitate further discussion.

Definition 4 (INTERFERENCE CONFLICT). A complete or partial schedule of a node is an interference conflict to an unscheduled (or partially scheduled) neighboring node when at least one of them is an aggregation node.
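Definitions 4 and 5 can be read as two predicates over the radio graph and the aggregation tree. The sketch below is one possible reading, not the authors' implementation. The four-node topology is an assumption chosen so that the outcomes match what the text states about Figure 2 (C conflicts with B and D, and is a destination conflict for A); it is not a reproduction of the figure, and all function and variable names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def is_interference_conflict(u: str, v: str,
                             neighbors: Dict[str, Set[str]],
                             children: Dict[str, List[str]]) -> bool:
    """Definition 4: u's schedule conflicts with neighbor v when at least
    one of the two is an aggregation (non-leaf) node."""
    return v in neighbors[u] and bool(children[u] or children[v])


def is_destination_conflict(u: str, v: str,
                            neighbors: Dict[str, Set[str]],
                            parent: Dict[str, Optional[str]]) -> bool:
    """Definition 5: u's schedule conflicts with v when v's parent is also
    a neighbor of u (and may or may not be u's parent as well)."""
    p = parent.get(v)
    return p is not None and u != v and u != p and p in neighbors[u]


# Assumed topology: tree links A-B and C-D, plus a non-tree radio link B-C.
neighbors = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
children = {"A": [], "B": ["A"], "C": [], "D": ["C"]}
parent = {"A": "B", "B": None, "C": "D", "D": None}

print(is_interference_conflict("C", "B", neighbors, children))
print(is_interference_conflict("C", "D", neighbors, children))
print(is_destination_conflict("C", "A", neighbors, parent))
```

The destination predicate does not require u and v to be out of each other's range; two siblings that also hear each other still collide at their common parent, so the sketch treats them as conflicting.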

PROCEDURE: Postorder Aggregation Scheduling
(1)  while (receive a message msg)
(2)    if (msg = schedule and this.scheduled = false)
(3)      if (this.notLeaf() or schedule.sender.notLeaf())
(4)        insert msg into this.CT;
(5)      endIf;
(6)    endIf;
(7)    if (msg = SR from parent and Ci.notEmpty())
(8)      nc = Ci.getNode();
(9)      send (SR, CT) to nc;
(10)   endIf;
(11)   if (msg = SR from a child c)
(12)     set radio to Sr for the Sx slot in c’s schedule;
(13)     broadcast(this.schedule);
(14)     Ci = Ci - c;
(15)     if (Ci.notEmpty())
(16)       nc = Ci.getNode();
(17)       send (SR, CT) to nc;
(18)     endIf;
(19)   endIf;
(20)   if (Ci.empty())
(21)     pick the earliest “free” slot as Sx slot;
(22)     broadcast(this.schedule);
(23)     send SR back to parent;
(24)   endIf;
(25) endWhile;

Figure 3. Postorder Aggregation Scheduling

PROCEDURE: Preorder Scheduling
(1)  while (receive a msg)
(2)    if (msg = schedule and this.scheduled = false)
(3)      if (this.notLeaf() or schedule.sender.notLeaf())
(4)        insert msg into this.CT;
(5)      endIf;
(6)    endIf;
(7)    if (msg = SR from parent and Ci.notEmpty())
(8)      pick the earliest “free” slot as Sx slot;
(9)      broadcast(this.schedule);
(10)     nc = Ci.getNode();
(11)     send (SR, CT) to nc;
(12)   endIf;
(13)   if (msg = SR from a child c)
(14)     set radio to Sr for the Sx slot in c’s schedule;
(15)     broadcast(this.schedule);
(16)     Ci = Ci - c;
(17)     if (Ci.notEmpty())
(18)       nc = Ci.getNode();
(19)       send (SR, CT) to nc;
(20)     endIf;
(21)   endIf;
(22)   if (Ci.empty())
(23)     send SR back to parent;
(24)   endIf;
(25) endWhile;

Figure 4. Preorder Scheduling

slot of its last child node). To test if a slot is free for a node to transmit, the following two criteria have to be enforced to avoid both interference collisions and destination collisions:

transmission states of sensor nodes. Receiving states are determined accordingly at the parent nodes. Note that each node transmits only once in aggregate monitoring applications. The success of receiving is guaranteed by avoiding destination collisions and interference collisions. The receiving time slots of a parent node are determined one by one, since they correspond to its child nodes’ transmission schedules. The conflict collecting and conflict passing schemes are used to gather all conflict information needed for scheduling. When computing a transmission state, a node takes a greedy approach: it selects the first available free time slot that allows it to avoid both interference collisions and destination collisions.



CRITERION 1: None of its scheduled neighbors has marked the slot as an Sr slot. This is used to avoid interference conflicts between immediate neighbors. To implement this criterion, a node scans its conflict table, ensuring that none of the schedules in the table has marked the slot as Sr.



3.2 Details of the PAS Scheduling Algorithm

CRITERION 2: None of its parent’s neighbors is scheduled to transmit in the slot. This criterion helps avoid destination conflicts between two-hop neighbors. When a slot is reserved for a child to transmit, it guarantees that the intended parent node can receive the message without collision. To implement it, a parent node always passes its own current CT to its children along with the scheduling request (SR). A node scans this table to ensure that none of the schedules marks the slot as Sx.
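The two criteria translate directly into a slot test plus a greedy scan. The following is a minimal sketch under assumed data structures: a schedule is a dictionary from slot number to "Sx" or "Sr", and a conflict table is a dictionary from neighbor id to that neighbor's schedule. Neither the representation nor the function names come from the paper.

```python
from typing import Dict

Schedule = Dict[int, str]            # slot number -> "Sx" or "Sr"
ConflictTable = Dict[str, Schedule]  # neighbor id -> its broadcast schedule


def is_free(slot: int, own_ct: ConflictTable,
            parent_ct: ConflictTable) -> bool:
    # Criterion 1: no schedule in the node's own CT marks the slot as Sr.
    c1 = all(s.get(slot) != "Sr" for s in own_ct.values())
    # Criterion 2: no schedule in the parent's CT marks the slot as Sx.
    c2 = all(s.get(slot) != "Sx" for s in parent_ct.values())
    return c1 and c2


def earliest_free_slot(start: int, own_ct: ConflictTable,
                       parent_ct: ConflictTable) -> int:
    """Greedy choice: scan upward from `start` and take the first free slot.

    The scan terminates because both tables mention finitely many slots.
    """
    slot = start
    while not is_free(slot, own_ct, parent_ct):
        slot += 1
    return slot


# Hypothetical tables: a neighbor listens in slots 1 and 2, and one of the
# parent's neighbors transmits in slot 3.
own_ct = {"n2": {1: "Sr", 2: "Sr"}}
parent_ct = {"n7": {3: "Sx"}}
print(earliest_free_slot(1, own_ct, parent_ct))  # prints 4
```

In this example slots 1 and 2 fail Criterion 1 and slot 3 fails Criterion 2, so the scan settles on slot 4.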

The focus of the algorithm is for a node to find its transmission time slot. In postorder scheduling, the root of the aggregation tree generates a scheduling request (SR) message and injects it into the tree. The SR traverses the tree in postorder, with each node passing the message to its children first. A sensor node starts scheduling its transmission after it receives the SR message back from its last child node, if it has child nodes, or after it receives the message from its parent node, if it is a leaf node. The SR contains parameters such as the duration of time slots, as decided by the root node of the tree. The node sequentially examines time slots from the very first time slot if it is a leaf node, or otherwise after its latest receiving time slot (i.e., the transmission time

Although the CT may be always changing, it is not necessary for a node to request its parent’s CT right before it makes its scheduling decision. We observe that it suffices for a parent node to pass the present version of CT at the time of passing SR. In fact, we have the following lemma.

Figure 5. An Aggregation Tree

Figure 6. Node Schedules

Lemma 2 The schedules inserted into a parent node’s CT after it passes SR to a child node will not violate criterion 2.

In Figure 3, the function notLeaf() returns true if a node is not a leaf node, and notEmpty() returns true if a set is not empty. Ci is the set of child nodes; CT is a conflict table; this denotes the current node ni.

Proof sketch. The lemma is based on the precedence constraint which requires the child node to schedule its transmission time slot after the child node receives data from all its descendants. On the other hand, the schedules inserted into the CT can only come from these descendant nodes. Hence, the transmission time slot to be scheduled for the child node has to be later than these schedules which implies they will not collide at the parent node.

3.3 An Example

We go through the PAS algorithm with the tree in Figure 5. In the figure, a 3-column table presents the conflict table of each node, as seen when its own schedule is computed. A two-column table presents the complete schedule of the node. The solid lines represent links in the aggregation tree, while the dotted lines represent communication links not included in the tree.

A schedule request SR is initiated at n1 and passed to n2. n2 then sends it to n4, which marks the very first time slot ts1 as its transmission slot by setting its sensor state to Sx in that slot and broadcasts this schedule. Upon receiving the SR from n4, n2 updates its partial schedule and broadcasts it. Nodes n1, n3, n6, and n5 all store the partial schedule in their conflict tables. The SR is then passed to n5, which picks ts2 as its Sx slot. Note that the conflict tables of n1 and n2 are empty when SR is passed from n1 to n2 and from n2 to n4. We omit discussion of the other nodes.

As an example of free slot identification, when n6 computes its own transmitting slot, it traverses the Sr column of its conflict table and the Sx column of n3’s conflict table, which has been passed to it with SR. Based on the two criteria discussed earlier in this section, ts4 turns out to be the earliest available time for it to transmit.

The final schedules are shown in Figure 6. As can be observed from the schedules, parallel transmissions are achieved at time slot ts1, when both nodes n4 and n7 can transmit data. On the other hand, if we ignore the dotted links in the tree graph, the algorithm will produce ideal schedules for all the nodes in a pure aggregation tree. The completion time would be 4 time slots. The existence of interference represented

We describe the algorithm in terms of node operations during the scheduling phase. Figure 3 shows the pseudo-code for the procedure. A node listens for any scheduling-related messages. If the message is a potential conflict schedule, the node saves it into its own conflict table (lines 2-6). If the message is the very first SR (schedule request) destined for it, the node passes the SR to a child, together with its own current CT (lines 7-10). When the SR is sent back by the child, the node updates and broadcasts its partial schedule. The updated schedule contains a new Sr slot, which has to be inserted into the CTs of its unscheduled and partially scheduled neighbors. The SR is then passed to another child, with the updated CT (lines 11-17). This CT can differ from the one passed to the previous child because more conflict schedules might have been inserted while the previous child node and its descendants performed their scheduling. When the node gets the SR back from its last child, it schedules its own transmission based on its own CT and the CT from its parent node, following the two criteria. Upon completing the scheduling, the node passes the SR to its parent node and broadcasts its schedule (lines 20-23). The algorithm stops when the root node of the tree receives the SR from its last child node.
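The order in which nodes pick their slots can be illustrated with a small centralized simulation. This is a sketch under stated assumptions, not the distributed procedure of figure 3. It replaces SR and CT message passing with a global view of the schedule, and it assumes that the two slot-selection criteria amount to (a) no already scheduled sender in the slot is within range of the node's parent and (b) the node is not within range of a parent that is receiving in the slot. The function name `post_order_schedule`, the `extra_links` parameter, and the seven-node tree are hypothetical.

```python
from collections import defaultdict


def post_order_schedule(children, root, extra_links=()):
    """Assign each non-root node the earliest usable transmission slot.

    children:    dict node -> list of children in the aggregation tree
    extra_links: radio links that are not tree edges (interference only)
    """
    parent = {c: p for p, cs in children.items() for c in cs}
    neighbors = defaultdict(set)
    tree_links = [(p, c) for p, cs in children.items() for c in cs]
    for u, v in tree_links + list(extra_links):
        neighbors[u].add(v)
        neighbors[v].add(u)

    tx_slot = {}

    def slot_is_free(node, slot):
        for sender, s in tx_slot.items():
            if s != slot:
                continue
            # Assumed criterion (a): the sender would collide at our parent.
            if sender in neighbors[parent[node]]:
                return False
            # Assumed criterion (b): we would collide at the sender's parent.
            if parent[sender] in neighbors[node]:
                return False
        return True

    def visit(node):
        kids = children.get(node, [])
        for child in kids:          # descendants are scheduled first
            visit(child)
        if node == root:
            return
        # Precedence: transmit only after every child has transmitted.
        slot = max((tx_slot[c] for c in kids), default=0) + 1
        while not slot_is_free(node, slot):
            slot += 1
        tx_slot[node] = slot

    visit(root)
    return tx_slot


# Hypothetical tree: 1 is the root, 2 and 3 its children, 4-7 the leaves.
tree = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
pure = post_order_schedule(tree, 1)                                  # makespan 4
noisy = post_order_schedule(tree, 1, extra_links=[(4, 3), (5, 3)])   # makespan 5
```

In this hypothetical topology, nodes 4 and 6 share slot 1 when only tree links exist, and the two extra interference links at node 3 push the makespan from 4 to 5 slots. The tree is not the one in figure 5.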

transmission time for one packet is around 25ms. In a network with one root node and five child nodes, a TDMA schedule requires only a little more than 125ms for the aggregation to complete. However, when five nodes try to send data packets to one common parent node at the same time via CSMA/CA, it takes more than 145ms to transmit all data successfully. More generally, it has been reported [4] that the RTS/CTS handshake results in 15% overhead. The experiment demonstrates significant potential savings with a deterministic schedule. Because only a general framework is proposed in [22], a full comparison study is not possible. Hence we focus on evaluating the performance of the scheduling algorithms in the following sections.
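The figures quoted above can be put side by side with a few lines of arithmetic. The sketch below only restates the numbers in the text (five children, about 25ms per packet, more than 145ms under CSMA/CA). The helper names are hypothetical, and the resulting ratio is a rough lower bound, since both measurements are given only approximately.

```python
def tdma_completion_ms(num_children, slot_ms):
    """Time for one parent to collect one packet from each child, one slot each."""
    return num_children * slot_ms


def relative_overhead(baseline_ms, observed_ms):
    """Extra time of the observed run, as a fraction of the baseline."""
    return (observed_ms - baseline_ms) / baseline_ms


tdma_ms = tdma_completion_ms(5, 25)           # 125 ms with a deterministic schedule
overhead = relative_overhead(tdma_ms, 145)    # 0.16, using 145 ms as a lower bound
```

With these inputs the contention-based run takes at least 16% longer, which is of the same order as the 15% RTS/CTS overhead cited from [4].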

by the dotted lines causes the aggregation to take one more time slot to finish.

3.4 Scheduling Control Messages

Since control messages flow top-down from the root, priority has to be given to parent nodes. The algorithm, Preorder Scheduling, is very similar to PAS. We show the algorithm in figure 4 and omit a detailed discussion.

4 Performance Evaluation

We performed a preliminary experimental study to evaluate the proposed PAS algorithm. We simulate sensor network aggregation scenarios with ns2 [1]. The radio coverage distance of each sensor node is set to 40m, which is consistent with other well-known works, e.g. [13]. To test the performance of the proposed algorithm over different densities of sensor node placement, we vary the number of nodes and the network dimension. The nodes are placed randomly in the chosen space. We simulate aggregate monitoring queries initiated at a randomly chosen node in the query region. All our results are averages over multiple runs. Since network topology is the most important factor in evaluating our algorithm, we work only with synthetic data while simulating various topologies.
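A setup of this kind can be mimicked outside ns2 for quick experiments. The sketch below is an assumption-laden stand-in, not the simulation used in the paper. It places nodes uniformly at random in a square, connects nodes that lie within the 40m radio range (a simple disc model), and derives an aggregation tree by breadth-first search from the query node. The function names, the disc model, and the BFS tree construction are all choices made for this example.

```python
import math
import random
from collections import deque


def random_topology(num_nodes, side_m, radio_range_m=40.0, seed=0):
    """Place nodes uniformly in a side_m x side_m square and link nodes in range."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side_m), rng.uniform(0, side_m)) for _ in range(num_nodes)]
    neighbors = {i: set() for i in range(num_nodes)}
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if math.dist(pos[i], pos[j]) <= radio_range_m:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return pos, neighbors


def bfs_tree(neighbors, root):
    """Return child -> parent for every node reachable from root."""
    parent = {}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(neighbors[u]):
            if v not in seen:
                seen.add(v)
                parent[v] = u
                queue.append(v)
    return parent


positions, links = random_topology(num_nodes=20, side_m=112, seed=1)
tree_parent = bfs_tree(links, root=0)
```

Links that are in `links` but not in `tree_parent` play the role of the dotted interference links in figure 5. A random placement is not guaranteed to be connected, so nodes unreachable from the root are simply absent from the tree.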

4.2 Makespan

As mentioned earlier, although there are usually no deadlines in aggregate monitoring applications, a short makespan is still a very desirable property, since it lets the application initiate any necessary response as soon as possible. In this section, we examine the performance of PAS in terms of makespan. Specifically, we investigate the makespan of the aggregation process under various network setups. Makespan as used in this section refers to the last time slot in which the AP receives data from its last child.

4.2.1 Impact of Network Density

4.1 Energy Efficiency

We first examine the impact of network density on makespan. In a dense network, the number of child nodes of a parent node is usually large. Hence the degree of interference among sensor nodes is higher. The opportunity for parallel transmission is relatively small, which can result in a longer makespan. This is confirmed by our experimental results listed in table 2. This set of experiments uses 20 nodes placed randomly in a 2D space. The density variation is achieved by varying the size (dimension in meters) of the simulation space, as shown in the table. As a systematic measure of density, we use the average number of nodes per cell. A cell is defined as a circle whose radius is the sensor radio range (40m). As we can see, the makespan for the high density setup (17.2) is much longer than for the low density setup (7.8). Note that parallel transmissions are still achieved even in the high density setup. A naive sequential schedule would give the longest makespan (20).
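The density measure can be made concrete with a short calculation. The sketch below assumes that "average number of nodes per cell" means the node count scaled by the ratio of the cell area to the area of the square simulation space, ignoring boundary effects. That formula is our reading of the definition, and `nodes_per_cell` is a hypothetical helper name.

```python
import math


def nodes_per_cell(num_nodes, side_m, radio_range_m=40.0):
    """Average number of nodes inside a circle of radius radio_range_m."""
    cell_area = math.pi * radio_range_m ** 2
    return num_nodes * cell_area / (side_m ** 2)


# 20 nodes in squares of decreasing side length (in meters).
densities = {side: nodes_per_cell(20, side) for side in (158, 112, 91, 79, 70)}
```

Under this reading, 20 nodes in spaces of side 158m, 112m, 91m, 79m, and 70m give densities close to 4, 8, 12, 16, and 20 nodes per cell respectively.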

With deterministic schedules, which allow each sensor node to listen only when its child nodes are transmitting, energy efficiency is guaranteed. This listening time is the minimum time required to collect all data in the network. On the other hand, resorting to a CSMA/CA based MAC protocol can result in much longer listening time. Note that to compare the two approaches, we only consider listening time, which is the time a sensor node spends in the active receiving mode Sr. For CSMA/CA protocols, without knowing exactly when a packet is coming, a node has to stay in the receiving mode for an extended period of time. On the other hand, despite the existence of RTS/CTS packets in CSMA/CA, we consider transmission time to be roughly the same regardless of the availability of a schedule, since RTS/CTS packets are relatively small and a sensor node can go back to the sleeping state Ss during backoff. The observation is confirmed by a preliminary experiment with ns2. For example, using a CSMA/CA based MAC protocol with the TwoRayGround signal propagation model in ns2, when the wireless data rate is 50kbps and the packet size is 30 bytes (both parameters are commonly used in wireless sensor network simulations), the approximate

4.2.2 Impact of Network Size

The next set of experiments aims to evaluate the scalability of the scheduling algorithm. When the number of nodes increases with node density remaining constant, the

Node Density (nodes per cell)   Network Dimension (m)   Makespan
4                               158 × 158               7.8
8                               112 × 112               10.6
12                              91 × 91                 12.7
16                              79 × 79                 13.9
20                              70 × 70                 17.1

Table 2. Impact of Network Density with 20 Nodes


Nodes   Equivalent Density (nodes per cell)   Makespan
10      3.5                                   7.5
20      7                                     10.2
30      10.5                                  14.0
40      14                                    17.6
50      17.5                                  20.1

Table 4. Deployment Strategies with Fixed Network Dimension

aggregation process makespan becomes longer. Our experiments confirm this observation. Another interesting fact is that when the network size is large, the makespan increases much more slowly. Our explanation is that in networks covering a larger space, there exist more opportunities for parallel transmissions. In fact, the makespan is mostly determined by the section of the network where interference is high. In table 3, constant node density is maintained while the deployment space is enlarged and the number of nodes is adjusted correspondingly.

Nodes   Network Dimension (m)   Makespan
10      56 × 56                 8.6
20      79 × 79                 14.0
30      97 × 97                 17.5
40      111 × 111               18.7
50      125 × 125               19.9

Table 3. Impact of Network Size with Fixed Node Density

4.2.3 Impact of Deployment Strategies

When deploying a sensor network, the physical phenomena being monitored are usually spatially constrained to one specific region. The application users face the question of how many sensor nodes should be placed to cover the region. While many factors can play a role in the decision-making process, such as spatial coverage, hardware cost, and the monitoring resolution, data collection latency should also be considered. This set of experiments is designed to evaluate makespan when various numbers of sensor nodes are deployed in a fixed region. Specifically, we vary the number of deployed nodes from 10 to 50 in a space of dimension 120m × 120m. The network density increases accordingly, with equivalent densities of 3.5, 7, 10.5, 14, and 17.5 nodes per cell, as shown in the second column of table 4. The result is similar to what was obtained for the density test, but it focuses on a fixed space instead of a fixed number of nodes. As a result, the makespan, shown in table 4, increases faster than the results in table 3.

5 Related Work

Research areas related to this work include sensor databases, network aggregation, and quality-aware data collection. Previous works in sensor databases, such as the TinyDB project [21] and the COUGAR project [27], investigate a variety of issues in developing a sensor database, such as query dissemination, data sampling, and in-network data processing. Madden et al. at UC Berkeley investigated data aggregation through a hierarchical tree structure [22], which is the same scheme we follow in this work. Our work on quality-aware sensor data management in our QUASAR project [17, 10] tries to trade data quality for network performance in an efficient way.

Sensor network data gathering has been explored in the networking community. LEACH [11] and PEGASIS [20] explore cluster-based data gathering protocols that rotate the cluster-head randomly to evenly distribute workload among sensor nodes. Directed Diffusion [14] provides a data dissemination approach in which a node requests data by propagating interests, and aggregation is implemented through "path sharing". Several approximate network clustering algorithms using weakly-connected dominating sets were presented in [3]. [16] and [5] target uniform energy dissipation in order to achieve maximum lifetime for a sensor network. Any of these techniques can be used to produce the hierarchical tree structure we use in this work.

There have been a few recent works on incorporating sensor state management in the design of MAC protocols. S-MAC [29] is a CSMA/CA based approach that enables energy savings by exploiting periodic sleeping. In [18], the authors propose centralized link coloring schemes to solve the parallel communication scheduling problem. The result is a TDMA schedule for collision-free communication. These approaches are general MAC protocols and are not tailored to the data aggregation problem we address with PAS.

A few approaches [4, 9, 8] have recently been proposed to address the data collection problem. However, that problem differs from aggregate monitoring in that no in-network aggregation, and hence no precedence constraints, are considered. In addition, all these works simplify the wireless communication by considering linear or tree-like networks and focus on developing a theoretical upper bound on makespan for the simplified network model. Interference in a general network setting is said to be handled with CSMA as in TAG [22], although it is not clear how this extension will affect the pre-achieved TDMA schedules. Contrary to these approaches, our scheduling algorithms focus on resolving general interference to achieve practical collision-free schedules for aggregate monitoring.

In addition, research in quality-aware data collection [24, 6] addresses the relation between the quality requirements of query answers and the quality of the raw data, with the ultimate goal of satisfying query requirements while minimizing certain cost. Control messages are usually required to adjust the quality settings. This set of work motivates us to schedule control messages in our paper.

6 Concluding Remarks

In this paper, we propose a scheduling algorithm that schedules sensor node operations to achieve contention-free communication in aggregate monitoring applications. We show that a deterministic transmission schedule can be computed for each sensor node, which in turn guarantees the successful completion of data aggregation within a known time. The algorithm helps achieve significant savings in power consumption over CSMA based alternatives. The aggregation process can also be completed in a much shorter time. The work presented here is an early step in our effort to study energy-efficient query processing over wireless sensor networks. An immediate task is to explore optimal scheduling with deadlines and techniques for other classes of queries. With the unknown nature of data generation, probabilistic techniques could be devised to design a more flexible aggregation protocol. Our eventual goal is to bring together data management protocols and mechanisms so as to ensure a cost-efficient, reliable querying service for sensor network applications.

References

[1] The network simulator - ns-2. http://www.isi.edu/nsnam/ns/.
[2] P. Bonnet, J. Gehrke, and P. Seshadri. Towards sensor database systems. In 2nd International Conference on Mobile Data Management, 2001.
[3] Y. Chen and A. Liestman. Approximating minimum size weakly-connected dominating sets for clustering mobile ad hoc networks. In MobiHoc, 2002.
[4] H. Choi, J. Wang, and E. Hughes. Scheduling on sensor hybrid network. In ICCCN, 2005.
[5] K. Dasgupta, K. Kalpakis, and P. Namjoshi. An efficient clustering-based heuristic for data gathering and aggregation in sensor networks. In WCNC, 2003.
[6] A. Deligiannakis, Y. Kotidis, and N. Roussopoulos. Hierarchical in-network data aggregation with quality guarantees. In EDBT, 2004.
[7] J. Elson and D. Estrin. Time synchronization for wireless sensor networks. In IPDPS Workshop on Parallel and Distributed Computing Issues in Wireless Networks and Mobile Computing, 2001.
[8] C. Florens and R. McEliece. Packets distribution algorithms for sensor networks. In INFOCOM, 2003.
[9] S. Gandham, Y. Zhang, and Q. Huang. Minimal time convergecast scheduling in wireless sensor networks. In ICDCS, 2006.
[10] Q. Han, S. Mehrotra, and N. Venkatasubramanian. Energy efficient data collection in distributed sensor environments. In ICDCS, 2004.
[11] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan. Energy-efficient communication protocol for wireless microsensor networks. In HICSS, 2000.
[12] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister. System architecture directions for network sensors. In ASPLOS, 2000.
[13] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: A scalable and robust communication paradigm for sensor networks. In MobiCom, 2000.
[14] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: A scalable and robust communication paradigm for sensor networks. In MobiCom, 2000.
[15] J. Kahn, R. Katz, and K. Pister. Next century challenges: Mobile networking for 'smart dust'. In MobiCom, 1999.
[16] K. Kalpakis, K. Dasgupta, and P. Namjoshi. Maximum lifetime data gathering and aggregation in wireless sensor networks. In NETWORKS, 2002.
[17] I. Lazaridis, Q. Han, X. Yu, S. Mehrotra, N. Venkatasubramanian, D. V. Kalashnikov, and W. Yang. QUASAR: Quality-aware sensing architecture. SIGMOD Record, 33, 2004.
[18] H. Li, P. Shenoy, and K. Ramamritham. Scheduling communication in real-time sensor applications. In RTAS, 2004.
[19] H. Li, P. Shenoy, and K. Ramamritham. Scheduling messages with deadlines in multi-hop real-time sensor networks. In RTAS, 2005.
[20] S. Lindsey and C. S. Raghavendra. PEGASIS: Power efficient gathering in sensor information systems. In Proceedings of IEEE Aerospace Conference, 2002.
[21] S. Madden, M. Franklin, J. Hellerstein, and W. Hong. The design of an acquisitional query processor for sensor networks. In SIGMOD, 2003.
[22] S. Madden, M. Franklin, J. Hellerstein, and W. Hong. TAG: a tiny aggregation service for ad-hoc sensor networks. In OSDI, 2002.
[23] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson. Wireless sensor networks for habitat monitoring. In WSNA, 2002.
[24] C. Olston, J. Jiang, and J. Widom. Adaptive filters for continuous queries over distributed data streams. In SIGMOD, 2003.
[25] S. Ray, J. B. Carruthers, and D. Starobinski. RTS/CTS-induced congestion in ad hoc wireless LANs. In WCNC, 2003.
[26] L. Schwiebert, S. K. Gupta, and J. Weinmann. Research challenges in wireless networks of biomedical sensors. In The seventh annual international conference on Mobile computing and networking, 2001.
[27] Y. Yao and J. Gehrke. Query processing in sensor networks. In CIDR, 2003.
[28] F. Ye, G. Zhong, S. Lu, and L. Zhang. Gradient broadcast: A robust data delivery protocol for large scale sensor networks. WINET, 11(2), March 2005.
[29] W. Ye, J. Heidemann, and D. Estrin. An energy-efficient MAC protocol for wireless sensor networks. In INFOCOM, 2002.
