TurboSync: Clock Synchronization for Shared Media Networks via Principal Component Analysis with Missing Data

Ryad Ben-El-Kezadri∗†‡, Giovanni Pau∗, and Thomas Claveirole

∗ Computer Science Department - University of California, Los Angeles, CA 90095, US
† CRISTAL, Université de la Manouba, Campus Manouba, 2010 - Tunisia
‡ LIP6/CNRS - UPMC Univ Paris 06
e-mail: [email protected], [email protected], [email protected]

Abstract: Clock synchronization in shared media networks is particularly challenging because the operating conditions are dynamic and the resources limited. This paper presents TurboSync, an accurate and bandwidth-efficient synchronization scheme. Unlike traditional solutions that synchronize pairs of nodes, TurboSync is able to synchronize entire node clusters. TurboSync relies on principal component analysis with missing data. Packets are broadcast on the medium and their capture times at each node are used to compute the clock conversion parameters. To obtain a complete and usable set of capture times for each packet, our idea is to fill in the missing packet timestamps at the transmitters' side using an inference mechanism. TurboSync synchronizes all the clocks in the cluster at once, which leads to a coherent clock conversion system between the nodes. Our performance results show better accuracy than the RBS protocol.

Index Terms: Clock, Synchronization, Principal Component Analysis, Missing Data, Broadcast Media

I. INTRODUCTION

Synchronization in distributed networks has been studied for a long time. But as networks become faster, mobile and wireless, new solutions are needed to support the tight timing constraints required by new wireless radios [1] and monitoring tools. Without proper time management, the data collected by the network also lose part of their context. Hence, algorithms that perform well on short timescales are required.

The ideas behind TurboSync are motivated by two simple observations. Existing synchronization schemes rely on a specific broadcaster that transmits identifiable reference frames on the network. These packets are captured by the neighboring nodes, and their id and capture time are sent back to the broadcaster. The broadcaster uses this information to compute the conversion parameters between each pair of node clocks. The conversion parameters are then distributed on the network so that each application can translate a time reference from any other node's time base to its own.

The first idea of TurboSync comes from the fact that the network already carries many frames that are uniquely identifiable and can be used instead of dedicated reference frames. Exploiting these frames makes the synchronization algorithm more bandwidth efficient because dedicated frames are no longer needed. Most broadcast control packets are reusable because they include a unique sequence number. However, control traffic is generated at all network ends, so the synchronization scheme has to deal with a situation where the role of the broadcaster is distributed over every node, which is a fundamental change in the algorithm design.

The second idea of TurboSync is to synchronize multiple nodes at a time. The time observations captured at each node can be laid out in a matrix with rows representing packet ids and columns representing nodes. A synchronization scheme needs a complete data set to compute the conversion parameters between the nodes. The problem is that the capture time is not available at the transmitter side because the packet capture system does not record the true time at which the packets are sent on the air. As a result, every transmission observation in the matrix is flagged as missing data. Traditional schemes use only one transmitter, so that all the missing observations are gathered in one column, and operate pairwise on the remaining columns to compute the conversion parameters between each pair of nodes. This keeps the matrix structure simple and complete. TurboSync is able to operate on matrices with missing data scattered over every column. Surprisingly, it has lower computational complexity than classical schemes and it operates on all columns at once, which ultimately provides a coherent system of conversion parameters between the nodes' clocks. Thanks to such a conversion system, each time reference has a unique value in each node's time base, whatever sequence of time conversions it goes through. Contrary to existing algorithms that operate on clock pairs via 2-D line fitting to compensate for offset and skew, our solution can process N clocks simultaneously through N-D line fitting. It uses the same inputs as the well-known RBS scheme [1] and does not introduce additional assumptions.
But it makes better use of the diversity of reference frames and corrects offset and skew jointly for all nodes.

The synchronization algorithms of RBS and TurboSync operate on single broadcast domains (clusters). To function as a cluster, the nodes must be physically close enough that each one hears all the others. In such a configuration, all nodes can decode the broadcast packets sent in the cluster because broadcast frames are generally transmitted with robust modulation.

TurboSync is compatible with both off-line and on-line clock synchronization frameworks. More interestingly, the on-line version of the algorithm has three unique properties: (i) it is completely distributed and so more robust to poor channel conditions; (ii) it reduces the synchronization protocol complexity, as only one round is needed to synchronize all nodes in a cluster (an RBS broadcaster cannot synchronize itself during its own synchronization round; it must wait for another round in which another RBS broadcaster sends its own conversion parameters [1]); (iii) it is communication efficient, as external broadcast packets can be reused instead of reference frames. In both the on-line and off-line versions, (iv) the common clock that the nodes agree on is more accurate because it averages all the individual clocks in the cluster, and (v) the clock conversion system is coherent over the cluster.

The rest of the paper is organized as follows. Section II provides an overview of the state of the art. Section III describes the main design strategies of synchronization algorithms. Section IV presents TurboSync. Section V compares the performance of our solution with a pairwise synchronization algorithm that does not leverage the multidimensional nature of the local broadcast media. Section VI concludes the paper.

II. RELATED WORK

Clock synchronization algorithms fall into either the off-line or the on-line category. Off-line methods receive as input the NIC or OS timestamps¹ of previously recorded trace files. The input files are not synchronized with one another. Off-line methods do not fundamentally differ from on-line techniques except that all the data are available at a central point. In the simplest case, the algorithm chooses a master among the nodes and computes, for every node, the conversion parameters to the master clock from some reference frames already present in the traces. Thus, the timestamps of the packets collected by each node can be converted into the time base of the master by applying the right conversion parameters.
The drawback of off-line algorithms is that they are often more complex and require so much communication and processing resources that they cannot be used on-line [2].

¹ In PCAP trace files, the NIC timestamps are added by the Network Interface Card (the card must run in monitor mode). The OS (aka PCAP) timestamps are generally set by the Operating System each time a packet enters the kernel.

On-line methods keep the nodes' NIC and/or OS clocks synchronized while the network is running. The first synchronization schemes were proposed for wired networks. [3] synchronizes receivers with transmitters and so assumes bounds on the message transit delay. As a result, these solutions do not adapt well to ad-hoc networks, where the communication link can introduce significant jitter. [4] proposes a scheme that reduces the limitation imposed by message delivery variance by synchronizing the receivers among themselves thanks to the reference frames broadcast by an external third-party node. RBS [1] extends the concept to multi-hop wireless networks and uses internal third-party nodes. Third-party nodes are chosen randomly in each node cluster at each synchronization round. At each round², (1) the third-party node broadcasts reference frames, (2) the receivers send back their observations, and (3) the initial sender performs a regression on the collected data to compute the clock conversion parameters (offset and skew corrections) for each receiver pair. These conversion parameters are then sent back to the receivers.

Regression is the key element for averaging multiple observations over time. Hence, several schemes have been proposed: linear interpolation [5], linear regression [6], [7], or fitting a simple f(x) = x + b function that omits skew [8]. The regression process computes the conversion parameters (offset and skew) between two clocks by looking for the best fit line through the set of observations of the two nodes. The points are not perfectly aligned because they exhibit timestamping errors. Timestamping errors occur because each packet experiences a variable processing delay at the NIC and OS levels between the antenna and the packet capture system. The skew corresponds to the slope of the line and the offset to its intercept.

Closer to our work, [9] proposes a scheme where several third-party nodes act as broadcasters in a single round. However, the scheme does not account for clock skew. This makes the theory impractical for most wireless shared media technologies. [2] proposes an off-line method that operates at the network scope to synchronize all nodes' clocks at once. The reference frames are used as anchor points for a maximum likelihood estimation of the clock conversion parameters. One particularity of the method is that it needs to estimate the time at which the packets reach the receivers' NIC in addition to the clocks' offset and skew. The maximum likelihood estimator leads to a large linear programming problem, which excludes it from on-line use.
The distribution of the timestamping delays is assumed exponential, which is also not completely realistic as these delays are generally bounded to a dozen microseconds. [10] proposes an algorithm to synchronize node clusters. One drawback of this approach is that it estimates skews and offsets on different time scales and requires fine tuning of its parameters.

With the improvement of the flexibility of the radio interface, wireless sensor networks have seen renewed interest in transmitter-receiver synchronization. [11] proposes a MAC-layer timestamping technique to obtain more accurate timestamps at the sender and receiver sides. While very interesting, these techniques are limited to specific platforms (like Mica/Mica2). On the other hand, RBS and TurboSync accommodate a broader range of platforms as they do not require access to the low levels of the system. Both techniques could however benefit from MAC-layer timestamping at the receiver side.

At the network scope, TurboSync belongs to the class of clustering schemes as it partitions the network into clusters, like RBS, to achieve multi-hop synchronization. Readers are referred to [12] for an in-depth review of existing methods.

² The two rounds of two nodes can run simultaneously, but the two nodes do not share the observations they have collected during the round.

Fig. 1. Examples of system design, panels (a) to (d).

III. SYSTEM DESIGN STRATEGIES

The two fundamental design principles that characterize a synchronization algorithm are how the role of the broadcaster is shared between the nodes and how many nodes can be synchronized at a time. This information can be represented with the help of a matrix $X_{M \times N}$ of M rows and N columns, where M denotes the number of broadcast frames used for synchronization and N the number of nodes in the network. The cell $X_{i,j}$ contains the capture time of the observation of the i-th reference frame collected by the j-th node, $n_j$. The capture times in $X_{M \times N}$ are the material used by the algorithms to synchronize the nodes. They are provided by the packet capture library running on each node. A cell is colored in black in figure 1 if the capture corresponds to a transmission. The problem with transmissions is that they are completely desynchronized from the receptions. This is because most Network Interface Cards (NIC) do not provide the packet capture library with the actual time at which the first byte of the packet is sent on the medium³. That is why transmissions must not be used for synchronization and why they are blacklisted by synchronization algorithms.

Figure 1.d highlights the problem caused by repeated blacklisting of transmissions. Transmissions and receptions are respectively represented by black and white cells. Scenario 1.d relates to a 3-node cluster synchronized via 6 reference frames. Two successive reference frames are never transmitted by the same node. We assume that the algorithm operates pairwise on the nodes and tries to synchronize n1 and n2. To do so, the algorithm will look for the valid pairs of observations in the first and second columns of $X_{6 \times 3}$. Only pairs 3 and 6 are valid because all the other pairs contain a blacklisted element.
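As a minimal sketch of the 3-node, 6-frame scenario of figure 1.d, the blacklisting can be expressed as a boolean mask over the design matrix. The transmitter order n1, n2, n3, n1, n2, n3 used here is an assumption chosen to be consistent with the valid pairs quoted in the text; the names `TRANSMITTERS`, `valid_frames` and `received_count` are hypothetical.

```python
from itertools import combinations

# Transmitter (0-based node index) of each of the 6 reference frames.
# Assumed order n1, n2, n3, n1, n2, n3 (not stated explicitly in the paper).
TRANSMITTERS = [0, 1, 2, 0, 1, 2]
N_NODES = 3

# missing[i][j] is True when cell (frame i, node j) is a blacklisted transmission.
missing = [[tx == node for node in range(N_NODES)] for tx in TRANSMITTERS]


def valid_frames(a, b):
    """1-based ids of the frames that both node a and node b received."""
    return [i + 1 for i, row in enumerate(missing) if not row[a] and not row[b]]


def received_count(nodes):
    """Number of reception (white) cells in the given columns."""
    return sum(not row[j] for row in missing for j in nodes)


for a, b in combinations(range(N_NODES), 2):
    frames = valid_frames(a, b)
    print(f"(n{a + 1}, n{b + 1}): usable frames {frames}, "
          f"{2 * len(frames)} of {received_count((a, b))} receptions used")
```

Under this assumed transmitter order, a pairwise algorithm can only use the frames that neither node of the pair transmitted.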
Actually, the algorithm will not be efficient because there are 8 valid observations (white cells) in the first two columns of $X_{6 \times 3}$ but only half of them (the 4 observations from pairs 3 and 6) can be used for synchronization. In comparison, as we will show later, TurboSync would not drop any valid observation. One can also note that the pairs of observations used by a pairwise algorithm to synchronize (n1, n2), (n1, n3) and (n2, n3) are respectively (3, 6), (2, 5) and (1, 4), which have no element in common. This means that the three synchronizations will be completely independent. As we will see in section IV, TurboSync synchronizes all the clocks together, so the synchronizations are coherent with respect to each other.

³ Only the packet arrival time at the MAC layer is available, but those values differ from the reception time due to the high latency of packets inside the MAC. A packet waits for transmission in the MAC if the packet queue is not empty or if the medium is busy.

Let us now briefly consider designs 1.a, 1.b and 1.c. Figure 1.a characterizes the design of RBS [1]. Only one node acts as a broadcaster, so the design matrix is very simple and the synchronization algorithm has maximum efficiency. However, the broadcaster n1 cannot be synchronized with the other nodes because all its observations are blacklisted. To resolve this problem, RBS can use a third-party broadcaster. This scenario is represented in figure 1.b. The extra broadcaster, n2, computes the conversion parameters of n1 with the rest of the network and sends them to n1. Conversely, n1 computes n2's parameters and sends them to it. It is worth noting that the RBS broadcasters do not share their observations (i.e. their design matrix) because the receivers of the reference frames only deliver 'a report of these timestamps to the pulse sender' [1]. Figure 1.c exemplifies a design for off-line synchronization where three passive monitors are deployed nearby to monitor a public or an enterprise WLAN. The design is very simple because the monitors do not transmit, so all the observations are valid. However, in systems that carry out the monitoring tasks themselves, the transmitters also behave as monitors. MANETs are examples where it is not affordable to deploy a separate overlapping infrastructure just for monitoring. TurboSync will be particularly efficient in these cases because it can cope with interleaved transmissions and receptions.

IV. TURBOSYNC

This section describes the two components of TurboSync: the TurboSync Observation Exchange Protocol, which handles the generation of reference frames and the exchange of observations, and the TurboSync algorithm, which computes the clock conversion parameters from the collected data.

A. The TurboSync Observation Exchange Protocol

The TurboSync protocol is dedicated to the distribution of pulse observations between the nodes of a cluster. Pulses are represented by reference broadcast frames. Reference frames can be generated by TurboSync or by any other protocol. Reference frames are packets that appear "in the air" only once for the whole duration of the measurement. The beacons generated by Wi-Fi infrastructures and the OLSR Hellos in ad-hoc networks are examples of reference frames because they include a counter that uniquely identifies each packet. TurboSync is in this way able to reuse any external reference frame broadcast in the cluster. This property is unique to TurboSync because TurboSync considers all nodes as potential broadcasters, and the frames that one node generates can be reused for its own synchronization, contrary to pairwise schemes which filter transmitted packets (see Section III). Moreover, the execution is not divided into rounds between the different transmitters as in RBS, so the observations of multiple broadcasters can be exploited simultaneously. We call this feature "pulse space-time combination" (as opposed to the classical "pulse time combination") because it combines observations of pulses generated at any time and location in the network.

Off-line synchronization obviously does not require the exchange of observations, as the input trace files are centrally processed and all the information is available to the processing unit. Off-line synchronization uses the MAC and routing pulses already available in the input traces as TurboSync pulses. The TurboSync protocol acts as follows:

1) Each transmitter broadcasts one or more pulses.
2) Each receiver records the reception time of the pulses according to its local clock.
3) The receivers exchange their past observations.

Not only the pulses of phase (1) but also the control packets exchanged during phase (3) can be used as reference frames to maximize energy efficiency. This property is similar to turbochargers, where the kinetic energy of the exhaust gases is fed back into the engine (hence the name TurboSync). As explained in section III, at least two broadcasters are needed to synchronize all nodes in a cluster. Otherwise, TurboSync will fail, like RBS, to synchronize the pulse transmitter with the receivers. However, TurboSync will still provide safe time conversion tables for the rest of the receivers. Broadcasters are chosen randomly or on a deterministic basis to increase the pulse density over time.

B. The TurboSync Algorithm

In on-line mode, the TurboSync algorithm is run locally on a server. The server operates on the data collected by the observation exchange protocol (see IV-A) and sends the conversion parameters back to the clients. The algorithm can also be executed on each node, but in that case the nodes have to agree on which pulses to process.

TurboSync extends the concept of 2-D regression to N-D regression. A dimension is associated with each node in the cluster. The best fit line is the line that best explains the variation of the multidimensional data. Figure 2 illustrates the regression process for two nodes. In that case, there are only two dimensions and the process is comparable⁴ to 2-D regression. Figure 3 shows how regression works with three nodes. The process is still comparable to 2-D regression but each point/pulse has three coordinates instead of two. In these ideal cases, a simple Orthogonal Regression (OR) technique can be applied to find the best fit line, and this line can be used in the cluster as the master clock.

Fig. 2. N-D regression in a 2-node cluster is similar to 2-D regression.

Fig. 3. With 3 nodes (X, Y, Z), each pulse (namely A, B, C) has 3 coordinates.

⁴ Actually, a subtle difference exists between ordinary 2-D and orthogonal N-D regressions: ordinary regression minimizes the "vertical" distance between the observed data points and the fitted curve, while orthogonal regression minimizes the "orthogonal" distance between the points and the line. Orthogonal regression is better suited for synchronization because it assumes observational errors on all dimensions, while ordinary regression only assumes errors on the y-dimension (i.e. on n2's observation set).
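The difference between ordinary and orthogonal regression can be illustrated for two clocks with the closed-form 2-D formulas. This is only a sketch: the function name `fit_lines` and the sample timestamps are hypothetical, and the closed form applies to the complete two-node case, not to the N-D case with missing data.

```python
import math


def fit_lines(x, y):
    """Fit y ~ skew * x + offset two ways.

    Returns ((skew, offset) from ordinary least squares,
             (skew, offset) from orthogonal regression).
    Assumes the two series are correlated (non-zero covariance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Ordinary regression: errors assumed on y only (vertical distances).
    ols = sxy / sxx
    # Orthogonal regression: errors assumed on both axes (orthogonal distances).
    tls = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    # Both lines pass through the barycenter (mx, my).
    return (ols, my - ols * mx), (tls, my - tls * mx)


# Hypothetical capture times of the same five pulses at n1 and n2.
n1_times = [0.0, 10.0, 20.0, 30.0, 40.0]
n2_times = [2.1, 12.4, 21.9, 32.6, 41.8]
ordinary, orthogonal = fit_lines(n1_times, n2_times)
print("ordinary   skew, offset:", ordinary)
print("orthogonal skew, offset:", orthogonal)
```

On perfectly aligned points the two fits coincide; they only differ when the observations are noisy.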

However, as explained in section III, the timestamps at the transmitter side are not usable because packets are not timestamped when they are sent on the medium, so they are not synchronized with the receptions. As a result, the transmitter coordinate is always missing from each point and it is not possible to feed the OR with complete N-tuples. Missing data can also arise at the receivers if pulses are not observable because of poor channel conditions or if outliers are detected.

The goal of TurboSync is to fill the missing values with the time values that would have been observed if the transmitters were receivers. Figure 4 provides an example with 3 nodes (X, Y and Z) and 3 pulses (A, B and C). We assume that frames A, B and C are respectively sent by Y, Z and X. Let • denote the unknown quantity in each 3-tuple. The 3-tuples associated with pulses A, B and C can be written $(X, \bullet, Z)_A$, $(X, Y, \bullet)_B$, $(\bullet, Y, Z)_C$. If the missing data were randomly chosen in R, there would be a different fit line for each possible set of 3-tuples. Figure 4 shows the best fit lines corresponding to four different random tuples. However, receptions and transmissions are mutually dependent, so the missing values are not likely to be random but located relatively close to a line in the space. We use principal component analysis with missing data to solve this problem and determine the best fit line.

Fig. 4. Convergence of TurboSync around the best fit line.

Several techniques have been proposed to infer the missing values [13]. TurboSync uses MILES (Maximum likelihood via Iterative Least squares Estimation) [14] because it performs well when the model is deterministic and the errors on the observed values are independent and identically distributed (i.i.d.) Gaussian. It is worth noting that the Gaussian distribution is moderate-tailed, with a shorter tail than most distributions, and thus is particularly well adapted to modeling timestamping errors. To increase the accuracy of MILES and minimize the number of iterations of the algorithm, we have modified the code so that the missing values at the transmitters' side are initialized with the packets' arrival times at the MAC layer instead of random values. Our implementation is presented in figure 5.

    Data:   X_{M×N}: design matrix
            Miss_{M×N}: index of the missing data
    Result: V_{N×1}: direction vector of the 1st PC
            Mean_{1×N}: barycenter of X_{M×N}

     1  It_max ← 300;
     2  It_miles ← 0;
     3  Fit_new ← 3;
     4  Fit_old ← 6;
     5  while abs((Fit_new − Fit_old) / Fit_old) > 1e−8 ∧ It_miles < It_max do
     6      It_miles ← It_miles + 1;
     7      Fit_old ← Fit_new;
     8      Mean ← mean(X);
     9      X_c ← X − 1_{M×1} . Mean;
    10      [U S V] ← svd(X_c, 1);
    11      Model ← U . S . V^t + 1_{M×1} . Mean;
    12      X(Miss) ← Model(Miss);
    13      Fit_new ← ‖X(¬Miss) − Model(¬Miss)‖²;
    14  end

Fig. 5. MILES algorithm for TurboSync
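The loop of figure 5 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the truncated SVD of line 10 is replaced by a power iteration (a swapped-in technique that yields the same first principal direction when one direction dominates), the fit is taken as the sum of squared residuals on the observed cells, and the names `first_pc` and `miles` as well as the clock parameters `SKEW` and `OFFSET` are hypothetical.

```python
import math


def first_pc(xc, iters=50):
    """Unit direction of the first PC of the centered matrix xc.

    Power iteration on xc^T xc, used here as a stand-in for svd(Xc, 1).
    """
    m, n = len(xc), len(xc[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        scores = [sum(xc[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(xc[i][j] * scores[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v


def miles(x, miss, it_max=300, tol=1e-8):
    """Fill the cells flagged in miss; return (V, Mean, filled X, iterations)."""
    x = [row[:] for row in x]            # work on a copy of the design matrix
    m, n = len(x), len(x[0])
    fit_new, fit_old, it = 3.0, 6.0, 0   # start values of figure 5, lines 2-4
    v, mean = [0.0] * n, [0.0] * n
    # Relative-change test of figure 5, line 5, written without the division.
    while abs(fit_new - fit_old) > tol * fit_old and it < it_max:
        it += 1
        fit_old = fit_new
        mean = [sum(x[i][j] for i in range(m)) / m for j in range(n)]       # line 8
        xc = [[x[i][j] - mean[j] for j in range(n)] for i in range(m)]      # line 9
        v = first_pc(xc)                                                    # line 10
        scores = [sum(xc[i][j] * v[j] for j in range(n)) for i in range(m)]
        fit_new = 0.0
        for i in range(m):
            for j in range(n):
                model = scores[i] * v[j] + mean[j]       # rank-1 model, line 11
                if miss[i][j]:
                    x[i][j] = model                      # line 12
                else:
                    fit_new += (x[i][j] - model) ** 2    # line 13
    return v, mean, x, it


# Hypothetical noise-free 3-node cluster: clock j reads SKEW[j] * t + OFFSET[j].
SKEW, OFFSET = [1.0, 1.1, 0.9], [5.0, -3.0, 7.0]
x, miss = [], []
for i in range(6):
    tx = i % 3                                   # transmitter of pulse i (assumed)
    row = [SKEW[j] * 10.0 * i + OFFSET[j] for j in range(3)]
    row[tx] -= 0.5                               # crude stand-in for the MAC arrival time
    x.append(row)
    miss.append([j == tx for j in range(3)])

v, mean, filled, it = miles(x, miss)
print("skew n1->n2:", v[1] / v[0], "skew n1->n3:", v[2] / v[0], "iterations:", it)
```

The line numbers cited in the paragraphs that follow refer to figure 5, not to this sketch.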

The first four lines of figure 5 initialize the variables of the algorithm. More specifically, line 1 sets the maximum number of iterations of the algorithm to 300, which is the default value of MILES. TurboSync needs far fewer iterations because of the distinctive distribution of the time observations (they draw a line). Actually, TurboSync typically converges in fewer than 40 iterations for small matrices (e.g. 5 × 3) and about 10 iterations for large ones (e.g. 15 × 10). Each iteration mainly consists of one Singular Value Decomposition (SVD).

Principal Component Analysis (PCA) is performed by applying the SVD to the centered data. So, line 8 computes the barycenter of the data set (in figure 3, the barycenter would be located on segment AC at the right side of B) and line 9 centers the data set around the barycenter. $1_{M \times 1}$ is an M-by-1 matrix of ones. The goal of PCA is to identify the most meaningful basis in which to re-express a data set. The hope is that this new basis will filter out the noise and reveal hidden structure (in our case, a line). The principal components (PC) represent the axes of this new basis. In our case, we are only interested in the first PC (the dimension that best explains the variation of the data) because the other PCs only contain noise. The SVD of line 10 is a truncated SVD because it only computes the first PC. S is a scalar and $U_{M \times 1}$ and $V_{N \times 1}$ are two column vectors. The vectors V and U.S respectively correspond to the new basis and to the points' coordinates in this basis. For a full SVD we would have $X_c = U.S.V^t$. However, our PCA only uses one PC among N, so $X_c$ does not exactly correspond to $U.S.V^t$: the points of $X'_c = U.S.V^t$ are perfectly aligned along a line (the first PC). Line 11 simply reconstructs this line in the old basis from the barycenter and $X'_c$. Actually, we show that our model/truncated SVD provides exactly the line that would be found by an orthogonal regression.

Line 12 corrects the missing data in the design matrix with the values found by the model. This operation attracts the missing data to the best fit line as requested. The quality of the fit is defined at line 13 as the sum of the errors on the received timestamps. The loop stops when the attraction process no longer moves the points, i.e., when the displacement of the points becomes negligible (see lines 5 and 13).

C. Algorithm Property

RBS does not provide coherent conversion parameters because each parameter is computed individually on a partial view of the dataset. Wrong estimates of skew and offset can lead to important differences between conversion parameters along network paths. For illustration, let us suppose that an application wants to send a time value to p nodes and that the message is forwarded hop by hop. Let src and dst denote the end points of the route of the message and $t_{src}$ the value of the timestamp in the time base of node $n_{src}$. The conversion chain between $n_{src}$ and $n_{dst}$ can be represented as follows:

$$n_{src} = n_1 \rightarrow n_2, \dots, n_k \rightarrow n_{k+1}, \dots, n_p \rightarrow n_{p+1} = n_{dst} \tag{1}$$


The time conversions are applied successively at each hop to the original packet timestamp. So, the skew $A_{src,\dots,dst}$ and the offset $B_{src,\dots,dst}$ of the entire path are functions of the offsets and skews between the successive nodes along the path [1]:

$$A_{src,\dots,dst} = \prod_{i=1}^{p} a_{n_i,n_{i+1}} \tag{2}$$

$$B_{src,\dots,dst} = \sum_{i=1}^{p} b_{n_i,n_{i+1}} \prod_{j=i+1}^{p} a_{n_j,n_{j+1}} \tag{3}$$

where the $a_{n_k,n_{k+1}}$'s and $b_{n_k,n_{k+1}}$'s represent the skew and the offset between each pair of clocks $n_k$, $n_{k+1}$ on the path. We directly get the value $t_{dst}$ of the timestamp at the destination from equations 2 and 3:

$$t_{dst} = A_{src,\dots,dst} \, t_{src} + B_{src,\dots,dst} \tag{4}$$

With RBS, the value $t_{dst}$ at the destination will depend on the specific path of the data, because $A_{src,\dots,dst}$ and $B_{src,\dots,dst}$ depend on the $a_{n_k,n_{k+1}}$'s and $b_{n_k,n_{k+1}}$'s of each link on the path. The advantage of TurboSync over RBS is that it provides a coherent clock conversion system in the cluster. We express it formally as the following theorem.

Theorem 1. If a node cluster is synchronized with TurboSync, any time reference has only one temporal value in the time base of each node.

Proof. We demonstrate it for a general N-D scenario with N nodes. The equation of the fit line D is expressed as:

$$D = V t + Mean^t = \begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix} t + \begin{pmatrix} mean_1 \\ \vdots \\ mean_N \end{pmatrix}$$

where $t \in \mathbb{R}$ and the $v_k$'s and $mean_k$'s are the parameters of line D returned by MILES (see figure 5). We denote by $\langle \vec{e_1}, \dots, \vec{e_N} \rangle$ the orthonormal basis in which D is represented. The conversion line $D_{i,j}$ between clocks $n_i$ and $n_j$ is obtained by projecting D on the plane defined by $\vec{e_i}$ and $\vec{e_j}$. We first show that the skew and the offset of the conversion function between clocks $n_i$ and $n_j$ are related to the $v_k$'s and $mean_k$'s as follows:

$$a_{i,j} = v_j / v_i \tag{5}$$

$$b_{i,j} = mean_j - a_{i,j} \, mean_i \tag{6}$$

To prove the theorem, we then show that for every pair of clocks $(n_{src}, n_{dst})$ the value of a timestamp reference $t_{src}$ in the base of $n_{src}$ takes the same value $t_{dst}$ when converted into the base of $n_{dst}$, regardless of how this conversion is performed. Let us suppose that $t_{src}$ is routed from node $n_{src}$ to $n_{dst}$ and that it follows the route described in equation 1. By replacing the $a_{n_k,n_{k+1}}$'s of equation 2 with their real values (see equation (5)) and simplifying, we obtain:

$$A_{src,\dots,dst} = \prod_{k=1}^{p} \frac{v_{n_{k+1}}}{v_{n_k}} = \frac{v_{n_{p+1}}}{v_{n_1}} = a_{src,dst} \tag{7}$$

In other words, the total skew on the path does not depend on the intermediate clocks but only on the source and destination clocks. We show in the same way that:

$$B_{src,\dots,dst} = b_{src,dst} \tag{8}$$

and so that:

$$t_{dst} = A_{src,\dots,dst} \, t_{src} + B_{src,\dots,dst} = a_{src,dst} \, t_{src} + b_{src,dst} \tag{9}$$

which proves the transitivity of our time conversion system and concludes the proof. Indeed, the value $t_{dst}$ does not depend on the conversion chain between $n_{src}$ and $n_{dst}$ because the right hand side of equation (9) is a constant.

Generally speaking, TurboSync provides safe conversion parameters over the cluster, which means that the results of the processing of timestamped sensing data are independent of the location of the processing units and of the communication pattern between the nodes. We illustrate the benefit of a safe conversion system with two examples.

We assume a sensor network organized as an N-node cluster. Each node senses sound signals and records the data in its local memory. A sink queries the sensing data every minute and processes them to report whether an intruder has been detected. The system considers that an acoustic event is a false alarm if one or more sensors do not sense it. In order to save energy, the sensors implement a simple communication pattern. A node $n_{src}$ is randomly selected and a route connecting all nodes is established between $n_{src}$ and the sink. The source transmits its sensing data (time and value of the acoustic events) to its successor on the route. The receiver converts the received time value into its own base, compares it to the time of its own observation and keeps the smallest. The data are forwarded along the route until they reach the sink. At each hop, the receiver compares the time and the value of the received data with its own data. If an event does not match (timestamps not in range), it is considered a false alarm and it is filtered out. This communication pattern is energy efficient because the sink does not need to hear every node but just its predecessor on the route, and the filtering process is distributed over the network. Furthermore, the filtering process consequently reduces the total number of events transmitted in the network. The sink queries the data two times in a row.
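As a numerical illustration of Theorem 1 and of the safe conversion property, the sketch below derives the pairwise parameters from a fit line with equations (5) and (6) and checks that a multi-hop conversion (equations 2 to 4) matches the direct one. All numeric values are hypothetical, as are the names `conversion`, `convert_along`, `coherent` and `pairwise`; the `PAIRWISE` table stands for independently estimated, slightly inconsistent parameters such as a pairwise scheme could produce.

```python
def conversion(v, mean, i, j):
    """Skew and offset from the base of node i to node j (equations 5 and 6)."""
    a = v[j] / v[i]
    return a, mean[j] - a * mean[i]


def convert_along(path, t, params):
    """Apply the per-hop conversions of path (list of node indices) to timestamp t."""
    for src, dst in zip(path, path[1:]):
        a, b = params(src, dst)
        t = a * t + b
    return t


# Hypothetical fit-line parameters (V, Mean) for a 4-node cluster.
V = [0.50, 0.51, 0.49, 0.50]
MEAN = [100.0, 250.0, 40.0, 900.0]


def coherent(i, j):
    """Parameters derived from one common fit line, as in TurboSync."""
    return conversion(V, MEAN, i, j)


# Hypothetical pairwise estimates, each computed on its own partial data set.
PAIRWISE = {
    (0, 3): (1.0000, 800.0),
    (0, 2): (0.9801, -58.0),
    (2, 1): (1.0409, 208.3),
    (1, 3): (0.9805, 654.9),
}


def pairwise(i, j):
    """Independently estimated parameters for the link i -> j."""
    return PAIRWISE[(i, j)]


t_src = 123.456
direct_path, routed_path = [0, 3], [0, 2, 1, 3]
coherent_gap = abs(convert_along(direct_path, t_src, coherent)
                   - convert_along(routed_path, t_src, coherent))
pairwise_gap = abs(convert_along(direct_path, t_src, pairwise)
                   - convert_along(routed_path, t_src, pairwise))
print("coherent system, direct vs routed gap:", coherent_gap)
print("pairwise system, direct vs routed gap:", pairwise_gap)
```

With parameters derived from a single line, the gap stays at the level of floating-point rounding, whereas the independently estimated table gives a path-dependent result.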
The two routes established for the two queries are illustrated in figure 6.a. The sink is the black node. With a classical synchronization scheme, the sink will not receive the same set of time values for the two queries because the conversion parameters depend on the route of the query, and the route is not the same for both queries. TurboSync does not have this problem because its clock conversion system is safe.

The example of figure 6.a shows the benefit of a safe clock conversion system. However, a classical synchronization scheme can still force direct communications between the nodes and the sink in order to resolve the discrepancies between the results of the queries. Figure 6.b presents an example where forcing the communication pattern between the nodes cannot resolve these discrepancies. We suppose two sinks $n_{dst_1}$ and $n_{dst_2}$, each of which queries the network. If TurboSync synchronizes the cluster, the time values of the datasets $T_{dst_1}$ and $T_{dst_2}$, respectively collected by nodes $n_{dst_1}$ and $n_{dst_2}$, are related by formula (9). So we can write:

$$T_{dst_2} = a_{dst_1,dst_2} \, T_{dst_1} + b_{dst_1,dst_2}$$

which means that $n_{dst_1}$ can convert its dataset into the base of $n_{dst_2}$ in order to reason on the same set of values. This is not true for a classical synchronization algorithm. Indeed, if a node $n_i$ directly transmits its data to the sinks and if $n_{dst_1}$ converts the time values again to the base of $n_{dst_2}$, the skew and the offset of the conversion chain are $A_{n_i,dst_1,dst_2}$ and $B_{n_i,dst_1,dst_2}$. These values are different from $a_{n_i,dst_2}$ and $b_{n_i,dst_2}$ because classical conversion systems are not transitive. So we will have two different estimations of the dataset $T_i$ of $n_i$. On the one hand, at $n_{dst_2}$,

$$T^{(2)}_{dst_2} = a_{n_i,dst_2} \, T_i + b_{n_i,dst_2}$$

and on the other hand, at $n_{dst_1}$,

$$T^{(1)}_{dst_2} = A_{n_i,dst_1,dst_2} \, T_i + B_{n_i,dst_1,dst_2}$$

As a result, $n_{dst_1}$ and $n_{dst_2}$ cannot have the same view of the network, which may lead to incoherence. For instance, in a broader context, a localization scheme based on the time difference of arrival of waves may not localize an acoustic source at the same place at sites $n_{dst_1}$ and $n_{dst_2}$. In this case, if the events sensed by the network trigger actions, the decisions of $n_{dst_1}$ and $n_{dst_2}$ will differ. The error $E$ between the two estimations equals $T^{(2)}_{dst_2} - T^{(1)}_{dst_2}$ and can be decomposed into an interpolation error and an extrapolation error:

$$E(T_i) = T^{(2)}_{dst_2} - T^{(1)}_{dst_2} = \Delta A \, T_i + \Delta B = \underbrace{\Delta A \, T_0 + \Delta B}_{\text{interpolation error}} + \underbrace{\Delta A \, (T_i - T_0)}_{\text{extrapolation error}} \tag{10}$$

where $\Delta A = a_{n_i,dst_2} - A_{n_i,dst_1,dst_2}$, $\Delta B = b_{n_i,dst_2} - B_{n_i,dst_1,dst_2}$, and $T_0$ represents the last time (in the base of node $n_i$) at which the classical synchronization algorithm was executed.

The interpolation error is caused by the conversion of a time value (e.g., $T_0$) that is taken during the training period of the algorithm. It is generally limited to a few microseconds because the algorithm is designed to minimize the error in this interval. The extrapolation error term appears when the time reference is taken outside the training period. The further the time reference to convert lies from the period during which the synchronization algorithm was running, the larger the error and the resulting discrepancies in the dataset.

Fig. 6. Data fusion scheme over a node-cluster with one (a) or two (b) sinks

V. PERFORMANCE EVALUATION

This section compares the computational complexity, the bandwidth utilization and the accuracy of TurboSync and RBS.

A. Computational Complexity

The complexity of TurboSync and RBS depends on the size of the design matrix $X_{M \times N}$, which contains the observations of the reference frames collected by the nodes (see section III). We assume $M$ reference frames and $N$ nodes. As described in section IV-B and algorithm 5, TurboSync uses a singular value decomposition (SVD) to estimate the clock conversion parameters from $X_{M \times N}$. More specifically, MILES uses the implicitly restarted Arnoldi method [15] to compute the SVD. The Arnoldi method is fed with the matrix:

$$X'_{(M+N) \times (M+N)} = \begin{pmatrix} 0_{M \times M} & X_{M \times N} \\ X^{t}_{N \times M} & 0_{N \times N} \end{pmatrix}$$

$X'_{(M+N) \times (M+N)}$ is simply obtained by concatenating $X_{M \times N}$ with its transpose $X^{t}_{N \times M}$. It contains $2MN$ non-zero entries and can be stored as a sparse matrix because $0_{M \times M}$ and $0_{N \times N}$ are two zero blocks. The time complexity of the Arnoldi method for computing the principal components is:

$$T(\mathrm{svd}(X)) = T(X' v') \cdot it_{arnoldi} + 2 \cdot \mathrm{size}(X') \cdot it_{arnoldi}^{2}$$

where the $v'$ are column vectors of length $M+N$ associated with the principal components. The complexity $T(X' v')$ of the matrix-vector product $X' v'$ equals $2MN$ because $X'$ is sparse. Furthermore, when there are only a few principal components to find, as in our case (we are only interested in the first PC), the number of iterations $it_{arnoldi}$ of the method is small. As a result, $T(\mathrm{svd}(X)) = O(MN)$ [16]. The total complexity of MILES reduces to $T(\mathrm{MILES}) = O(MN)$ because it converges quickly (i.e., $it_{miles}$ is also small) when $N$ and $M$ are large, and all its inner operations, including $\mathrm{svd}(X)$, are of complexity $O(MN)$.

By comparison, RBS has to synchronize each pair of nodes. As there are $N(N-1)$ node pairs in an $N$-node cluster and the cost of each synchronization (regression) is $M$, the complexity of RBS equals $T(\mathrm{RBS}) = O(MN^2)$. These results hold for large $N$ and $M$. The Arnoldi method and MILES converge in a couple of iterations for $25 \times 25$ matrices. For smaller configurations, however, the convergence is slower (typical values for $it_{arnoldi}$ and $it_{miles}$ are 40), but the processing delay is still not perceptible.
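To make the role of the augmented matrix $X'$ concrete, the sketch below recovers the leading singular triplet of a small matrix by iterating on $X'$ without ever storing its zero blocks. It is not the method used by MILES: the implicitly restarted Arnoldi method is replaced here by a plain shifted power iteration, chosen only because it fits in a few lines of standard-library Python. The function names, the choice of shift (the Frobenius norm of $X$) and the toy matrix are assumptions made for this illustration, and $X$ is stored densely for brevity.

```python
import math


def augmented_matvec(X, vec):
    """Compute X' @ vec for X' = [[0, X], [X^t, 0]] using only the entries of X.

    Each call performs 2*M*N multiplications, which matches the cost
    T(X'.v') = 2MN quoted in the text.
    """
    M, N = len(X), len(X[0])
    top, bottom = vec[:M], vec[M:]
    out_top = [sum(X[i][j] * bottom[j] for j in range(N)) for i in range(M)]
    out_bottom = [sum(X[i][j] * top[i] for i in range(M)) for j in range(N)]
    return out_top + out_bottom


def normalize(vec):
    """Scale a non-zero vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def leading_singular_triplet(X, iterations=200):
    """Return (sigma_1, u_1, v_1) of a non-zero matrix X.

    The eigenvalues of X' are +/- the singular values of X (plus zeros), so a
    positive shift is added to make +sigma_1 the dominant eigenvalue of the
    iterated operator. Swapped-in technique: shifted power iteration.
    """
    M, N = len(X), len(X[0])
    shift = math.sqrt(sum(x * x for row in X for x in row))  # Frobenius norm
    vec = normalize([1.0] * (M + N))
    for _ in range(iterations):
        prod = augmented_matvec(X, vec)
        vec = normalize([p + shift * x for p, x in zip(prod, vec)])
    # Rayleigh quotient of X' at the converged unit vector gives sigma_1.
    sigma = sum(a * b for a, b in zip(vec, augmented_matvec(X, vec)))
    return sigma, normalize(vec[:M]), normalize(vec[M:])


if __name__ == "__main__":
    # Toy rank-one matrix: outer product of (1, 2, 2) and (3, 4).
    X = [[3.0, 4.0], [6.0, 8.0], [6.0, 8.0]]
    sigma, u, v = leading_singular_triplet(X)
    print("sigma_1 =", sigma)
    print("u_1 =", u)
    print("v_1 =", v)
```

The top $M$ entries of the dominant eigenvector of $X'$ give the left singular vector and the bottom $N$ entries give the right singular vector, which is why a single eigenvector computation on $X'$ is enough when only the first principal component is needed.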


B. Communication Efficiency

We measure the communication efficiency $C_N$ as the number of pulses that need to be transmitted to synchronize $N$ nodes with a given level of accuracy. We assume that RBS and TurboSync respectively run with 2 and $N$ broadcasters (configurations (b) and (d) of figure 1) and that they provide the same accuracy with $2M$ reference frames (footnote 5).

Reuse gain: The reuse gain $R_N$ represents the number of pulses saved by TurboSync by reusing the frames of another protocol (see section IV-A). Let $T$ denote the duration of a synchronization round and $K$ the mean number of reusable frames sent per node per second. So, we have:

$$R_N = K N T \tag{11}$$

Turbo mode gain: TurboSync can reuse its own output (the observation reports) to feed itself. We assume that the frame ids and the packet timestamps are each coded on 8 bytes and that the maximum frame size is 2000 bytes. In this case, a single frame observation report can carry 125 observations. So a node only needs to transmit one frame observation report per round if $M$ is less than 125 (which is the case in most designs). So, by hypothesis, we can write the turbo gain $T_N$ as:

$$T_N = N \tag{12}$$

The communication gain $G_{TS/RBS}$ measures the bandwidth savings achieved by TurboSync thanks to the reuse of external frames and to the turbo mode. Subtracting the gains of equations (11) and (12) from the $2M$ pulses needed by RBS, we have:

$$G_{TS/RBS} = \frac{C_N(RBS)}{C_N(TS)} = \frac{2M}{2M - N(KT + 1)}$$
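The bookkeeping behind equations (11) and (12) and the resulting gain can be written out as a short calculation. The sketch below is an illustration, not code from the paper: the function and parameter names are chosen here, the numeric values in the usage example are made up, and returning infinity when TurboSync needs no dedicated pulse is a convention of this sketch rather than something the text specifies.

```python
import math


def observations_per_report(max_frame_bytes=2000, id_bytes=8, timestamp_bytes=8):
    """Number of (frame id, timestamp) observations that fit in one report."""
    return max_frame_bytes // (id_bytes + timestamp_bytes)


def reuse_gain(k, n, t):
    """Equation (11): R_N = K * N * T, pulses saved by reusing external frames."""
    return k * n * t


def turbo_gain(n, m):
    """Equation (12): T_N = N, valid when one report per node and round suffices."""
    if m > observations_per_report():
        raise ValueError("M exceeds the capacity of a single observation report")
    return n


def communication_gain(m, n, k, t):
    """G = C_N(RBS) / C_N(TS) = 2M / (2M - N(KT + 1))."""
    pulses_rbs = 2 * m
    pulses_ts = pulses_rbs - reuse_gain(k, n, t) - turbo_gain(n, m)
    if pulses_ts <= 0:
        # Convention of this sketch: reused frames already cover every pulse.
        return math.inf
    return pulses_rbs / pulses_ts


if __name__ == "__main__":
    # Made-up operating point: M = 50 frames, N = 10 nodes,
    # K = 2 reusable frames per node per second, T = 1 s per round.
    print("observations per report:", observations_per_report())
    print("gain:", communication_gain(m=50, n=10, k=2, t=1))
```

The formula is only meaningful while $2M > N(KT + 1)$; beyond that point the reused frames and observation reports alone supply all the pulses that TurboSync needs.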

Footnote 5: Actually, this is a pessimistic assumption: we show in section V-C that TurboSync ($TS_N$) provides better estimates of the conversion parameters than RBS ($RBS_N$) with fewer pulses.

C. Algorithm Accuracy

We compare the accuracy of TurboSync and RBS through simulation. The results are obtained for different cluster sizes $N$ and distributions of timestamping errors. Both schemes are implemented in Matlab. The results are averaged over 100 runs. At each run, we generate a design matrix $X_{M \times N}$ common to the two algorithms but with different distributions of missing values: for TurboSync, the missing values are distributed across the columns in a round-robin fashion, as illustrated in figure 1.(d). TurboSync synchronizes the $N$ nodes at each synchronization round, so we refer to it as $TS_N$. We compare it to the RBS variants of figures 1.(a) and 1.(b). In variant 1.(a), the first node acts as the broadcaster and the $N-1$ others as receivers. This variant is denoted by $RBS_{N-1}$ because the broadcaster is excluded from the set of nodes to synchronize, and so only $N-1$ nodes are synchronized at the end of a synchronization round. The second variant of RBS uses an extra broadcaster to compute the conversion parameters of the first broadcaster. All the nodes are synchronized at the end of the broadcasters' rounds, so we call this variant $RBS_N$.

The packet reception and transmission times are generated taking into account timestamping errors and clock parameters. The timestamping errors follow an exponential distribution. Clocks have different offsets and skews. The skew of each clock is randomly chosen between 0 and 100 parts per million (ppm). The pulse rate equals 5 reference frames per second in all scenarios. All receiving antennas detect the signal at the same time, 0 to 10 ms after transmission (footnote 6). Each algorithm estimates the conversion parameters of each pair of nodes (footnote 7). We then compute for each pair the difference between the real and estimated skews. Our performance metric is the mean absolute skew error over all pairs. A value of 1 ppm represents an error of 1 µs per second.

Figures 7 to 10 show the performance for different network sizes and mean timestamping errors $(N, \lambda)$: (3, 3 µs), (3, 5 µs), (10, 3 µs) and (10, 5 µs). The graphs show how many pulses $M$ must be generated to reach a given accuracy. For instance, in figure 7, one can see that $RBS_{N-1}$ needs about 10 pulses to synchronize the nodes with a 2 ppm error, and that $RBS_N$ needs twice as many frames to achieve the same performance. By definition, $RBS_N$ always needs twice the number of pulses as $RBS_{N-1}$ for the same accuracy. This is the cost of adding an extra broadcaster to synchronize the whole network. Obviously, for each algorithm, the number of pulses to generate increases with the quality required for synchronization. TurboSync clearly outperforms $RBS_N$ in every scenario. It means that TurboSync can synchronize the cluster with far fewer pulses than RBS. Actually, the performance of $TS_N$ is very close to that of $RBS_{N-1}$ (which only synchronizes $N-1$ nodes). As figures 9 and 10 show, the efficiency of TurboSync grows with the size of the cluster.
This is because the percentage of missing data in the design matrix is lower for large networks. As a result, TurboSync's core, MILES, converges better to the true clock conversion parameters.

Footnote 6: This delay models the packet latency in the MAC at the transmitter side.

Footnote 7: In $RBS_N$ and $RBS_{N-1}$, each broadcaster computes the parameters of $(N-1)(N-2)$ pairs. $TS_N$ operates, instead, on $N(N-1)$ pairs.

VI. CONCLUSION

This article has presented TurboSync, a technique that provides consistent time across a node cluster. TurboSync is particularly resource efficient because it turns each node into a time tracker and each reference frame into a time indicator. The role of the broadcaster is distributed across the nodes, which makes the scheme fault-tolerant and resilient to transmission errors. TurboSync also provides a coherent clock conversion system. Our results show that TurboSync provides more consistent conversion parameters than RBS. In future work, we plan to improve TurboSync's current regression core, MILES, so that it detects outlying observations and yields more robust estimations of the clock conversion parameters.


Fig. 7. Quality of synchronization, 3-node cluster, timestamping error = 3 µs

Fig. 8. Quality of synchronization, 3-node cluster, timestamping error = 5 µs

Fig. 9. Quality of synchronization, 10-node cluster, timestamping error = 3 µs

Fig. 10. Quality of synchronization, 10-node cluster, timestamping error = 5 µs

REFERENCES
[1] J. Elson, L. Girod, and D. Estrin, “Fine-grained network time synchronization using reference broadcasts,” SIGOPS Oper. Syst. Rev., vol. 36, pp. 147–163, 2002.
[2] B. Scheuermann, W. Kiess, M. Roos, F. Jarre, and M. Mauve, “On the time synchronization of distributed log files in networks with local broadcast media,” IEEE/ACM Transactions on Networking (TON), vol. 17, no. 2, pp. 431–444, 2009.
[3] P. Ramanathan, K. Shin, and R. Butler, “Fault-tolerant clock synchronization in distributed systems,” Computer, vol. 23, no. 10, pp. 33–42, 1990.
[4] P. Verissimo and L. Rodrigues, “A posteriori agreement for fault-tolerant clock synchronization on broadcast networks,” in 22nd Int. Symp. on Fault-Tolerant Computing, 1992.
[5] R. Mahajan, M. Rodrig, D. Wetherall, and J. Zahorjan, “Analyzing the MAC-level Behavior of Wireless Networks in the Wild,” ACM SIGCOMM Computer Communication Review, vol. 36, no. 4, p. 86, 2006.
[6] T. Claveirole, M. de Amorim et al., “WiPal: Efficient Offline Merging of IEEE 802.11 Traces,” arXiv preprint arXiv:0806.4526, 2008.
[7] J. Yeo, M. Youssef, and A. Agrawala, “A framework for wireless LAN monitoring and its applications,” in Proceedings of the 3rd ACM workshop on Wireless security. ACM New York, NY, USA, 2004, pp. 70–79.
[8] Y. Cheng, J. Bellardo, P. Benkö, A. Snoeren, G. Voelker, and S. Savage, “Jigsaw: Solving the puzzle of enterprise 802.11 analysis,” ACM SIGCOMM Computer Communication Review, vol. 36, no. 4, p. 50, 2006.
[9] J. Halpern and I. Suzuki, “Clock synchronization and the power of broadcasting,” Distributed Computing, vol. 5, no. 2, pp. 73–82, 1991.
[10] R. Karp, J. Elson, C. Papadimitriou, and S. Shenker, “Global synchronization in sensornets,” in Proceedings of the 6th Latin American Symposium on Theoretical Informatics, 2004, pp. 609–624.
[11] M. Maróti, B. Kusy, G. Simon, and Á. Lédeczi, “The flooding time synchronization protocol,” in Proceedings of the 2nd international conference on Embedded networked sensor systems. ACM New York, NY, USA, 2004, pp. 39–49.
[12] K. Römer, P. Blum, and L. Meier, “Time synchronization and calibration in wireless sensor networks,” Handbook of Sensor Networks: Algorithms and Architectures, pp. 199–237, 2005.
[13] I. Jolliffe, Principal Component Analysis. Springer-Verlag, 2002.
[14] R. Bro, N. Sidiropoulos, and A. Smilde, “Maximum likelihood fitting using ordinary least squares algorithms,” J. Chemometrics, vol. 16, pp. 387–400, 2002.
[15] R. B. Lehoucq, D. C. Sorensen, and C. Yang, ARPACK Users’ Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM, 1998.
[16] The MathWorks, “Matlab documentation,” http://www.mathworks.com/access/helpdesk/help/techdoc/math/f6-8856.html.