This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/LCOMM.2014.2341604

IEEE COMMUNICATION LETTERS


Clock Synchronization over Communication Paths with Queue-Induced Delay Asymmetries

Zdenek Chaloupka, Member, IEEE, Nayef Alsindi, Member, IEEE, and James Aweya, Senior Member, IEEE

Abstract—Timing protocols such as IEEE 1588 Precision Time Protocol (PTP) and Network Time Protocol (NTP) require an accurate measurement of the communication path delay between the time server (master) and the client (slave) in order to provide precise time synchronization. The precise time at the client's site is then estimated under the assumption that the forward and backward delays due to physical propagation through the network are equal, or that any difference between them has been calibrated beforehand. Apart from physical link delays, a timing packet experiences queue-induced delay in the switching/routing devices on the path. This queuing delay usually differs between the forward and backward directions, thus introducing a Queue-Induced Asymmetry (QIA), which is a major contributor to the time error between master and slave clocks once physical asymmetries have been calibrated. This paper proposes a new technique for QIA compensation that does not require any on-path timing support and can therefore be easily deployed with current network devices.

Index Terms—asymmetry, delay estimation, IEEE 1588v2 PTP, time synchronization.

I. INTRODUCTION

Timing transfer using a protocol such as IEEE 1588v2 PTP [1], combined with a well-designed slave clock recovery mechanism, can provide time synchronization at the sub-microsecond level and below [2]. However, this relies on the important assumption that the packet delay from master to slave equals that from slave to master. In practice, communication paths are not perfectly symmetric, mainly because of dissimilar forward and reverse physical link delays and queuing delays [3]. The main sources of packet delay variation are Network Elements (NEs), i.e., switches and routers, through their packet queuing process. This is accentuated when timing transfer is done end-to-end, without any form of timing assistance from the network to mitigate the effects of the variable queuing delays. Propagation delay asymmetry (the combined physical and queuing delay asymmetry) has become a major challenge in clock synchronization. Network timing support mechanisms such as boundary clocks (BCs) and transparent clocks (TCs) can eliminate delay asymmetry arising in two scenarios [4], [5]: first, variable queuing delays on the forward and reverse paths (mainly due to different traffic loads in the two directions), and second, timing packets taking different paths in each direction. Note that even timing support mechanisms (BCs and TCs) are unable to correct for delay asymmetry due to dissimilar physical links between network elements; therefore, a calibration procedure has to be performed [5]. A review of the literature indicates that QIA in the end-to-end context has not yet been addressed, since most work on asymmetry focuses on physical link asymmetries (due to link length or speed differences), as reported in [2], [6]. In this paper, we deal with the scenario where timing packets travel through the same route on both the forward and backward paths, yet may experience different queuing delays.

This paper sets the following goals:
• Describe a mechanism for compensating for the QIA on a communication path without using network timing support mechanisms such as BCs and TCs.
• Analyze the bounds and sensitivity of the proposed mechanism in simulation.

The rest of this paper is organized as follows: Section II introduces a basic clock model based on IEEE 1588v2 PTP, Section III derives the QIA compensation algorithm, and Section IV presents simulation results. Finally, Section V concludes the paper.

The authors are with Etisalat BT Innovation Center (EBTIC) at Khalifa University of Science, Technology and Research, Abu Dhabi, UAE. E-mail: (zdenek.chaloupka, nayef.alsindi, james.aweya)@kustar.ac.ae. Manuscript received January 27, 2014; revised June 4, 2014 and July 10, 2014.

II. BASIC CLOCK MODEL FOR IEEE 1588 PTP

IEEE 1588v2 PTP defines a packet-based synchronization protocol for communicating frequency, phase and time-of-day information from a master to one or more slaves. PTP relies on accurately timestamped packets (at nanosecond-level granularity) sent from the master clock to one or more slave clocks to allow them to synchronize to the master clock. The PTP message exchange process is explained in the following text and illustrated in Fig. 1. Symbols and notation are defined close to the equations in which they are used. We now define a generalized clock offset and skew equation for the synchronization problem.
We assume that, at any particular time instant, the relationship between the master (server) clock with timeline S(t) and the slave (client) clock with timeline C(t) can be described by the well-known simple skew clock model

$$ S(t) = (1 + \alpha)\,C(t) + \theta, \tag{1} $$

where α represents the skew and θ the time offset of the slave clock. This equation can be extended to the case where the master and slave clocks exchange messages over a communication link with delay, as follows. The master sends a Sync message with timestamp $T_1[n] \in S(t)$, which arrives at the slave with timestamp $T_2[n] \in C(t)$ after experiencing a fixed physical link delay $d_f$ (lumping together all static delays) plus a variable queuing delay $q_f$ (see Fig. 1). Using the PTP protocol and (1), we can write

$$ T_1[n] + d_f + q_f[n] = (1 + \alpha[n])\,T_2[n] + \theta[n]. \tag{2} $$

Fig. 1. IEEE 1588 PTP message flow between a master (top) and slave (bottom).

Next, the slave sends a Delay_Req message, which departs the slave with timestamp $T_3[n] \in C(t)$ and arrives at the master with timestamp $T_4[n] \in S(t)$ after experiencing a fixed delay $d_r$ (lumping together all static delays) plus a variable queuing delay $q_r$; thus we can write

$$ T_4[n] - d_r - q_r[n] = (1 + \alpha[n])\,T_3[n] + \theta[n]. \tag{3} $$

The master conveys timestamp $T_4$ to the slave by embedding it in a Delay_Resp message. At the end of this PTP message exchange, the slave possesses all four timestamps $T_1, T_2, T_3, T_4$. An asymmetric path exists when the fixed delay components and/or the queuing delay components in the two directions are unequal. We assume that the physical link asymmetry is calibrated beforehand, whereas the QIA is compensated for using a technique like the one described here. A key assumption is that the message exchanges occur over a period of time short enough that the offset θ and skew α can be assumed constant over that period. This assumption is valid because telecom-grade oscillators used for synchronization maintain better-than-nanosecond precision over the short periods during which PTP message exchanges occur (PTP can be configured for up to 128 messages per second).

Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]

III. NEW TECHNIQUE FOR COMPENSATING QUEUE-INDUCED ASYMMETRY

Adding (2) and (3) and rearranging, we obtain the overall clock offset θ as

$$ \theta[n] = \underbrace{\tfrac{1}{2}\big(T_1[n] + T_4[n] - (1+\alpha[n])\,(T_2[n] + T_3[n])\big)}_{\theta_r} + \underbrace{\tfrac{1}{2}\,(d_f - d_r)}_{\theta_p} + \underbrace{\tfrac{1}{2}\big(q_f[n] - q_r[n]\big)}_{\theta_q}. \tag{4} $$

The raw offset $\theta_r$ is the quantity usually computed during clock synchronization, because the system is assumed to be symmetric in the average sense (the average delays in both directions on the path are assumed to be equal). The offset $\theta_p$ can be determined as described in [5]. Here we propose a new technique for computing the offset $\theta_q$, a quantity usually ignored during clock synchronization because the average queuing delays in both directions on the path are assumed to be equal. Let us define the first differences of (2) and (3) as

$$ dT_1[n] + q_f[n] - q_f[n-1] = (1 + \alpha[n])\,dT_2[n], \tag{5} $$

$$ dT_4[n] - \big(q_r[n] - q_r[n-1]\big) = (1 + \alpha[n])\,dT_3[n], \tag{6} $$

where $dT_i[n] = T_i[n] - T_i[n-1]$ for $i \in \{1, \dots, 4\}$. In (5) and (6) we assume that α and θ do not change over short periods of time, that is, between two consecutive PTP message exchanges. Rearranging (5) and (6), we obtain

$$ \varepsilon[n] = q_f[n] - q_f[n-1] = (1 + \alpha[n])\,dT_2[n] - dT_1[n], \tag{7} $$

$$ \gamma[n] = q_r[n] - q_r[n-1] = dT_4[n] - (1 + \alpha[n])\,dT_3[n], \tag{8} $$

where ε[n] and γ[n] are the first differences of the queuing delays $q_f$ and $q_r$, respectively. Since summation is the inverse of differencing, we define the cumulative sums (numerical integration) of the differential queuing delays as

$$ \begin{aligned} \varepsilon_s[n] &= \varepsilon_s[n-1] + \varepsilon[n] = \varepsilon_s[n-1] + q_f[n] - q_f[n-1] \\ &= \varepsilon_s[0] + q_f[n] - q_f[0], \end{aligned} \tag{9} $$

$$ \gamma_s[n] = \gamma_s[0] + q_r[n] - q_r[0], \tag{10} $$

where $\varepsilon_s[0] = \gamma_s[0] = 0$. The cumulative sum of ε (or γ) thus gives the forward (or reverse) queuing delay at time n, biased by the unknown first experienced queuing delay $q_f[0]$ (or $q_r[0]$). This is analogous to integration without knowledge of the integration constant (initial condition). As mentioned in Section I, we assume that forward- and backward-traveling packets pass through the same NEs. We also expect the NEs to process the packet streams of both directions identically and independently, as verified in [3]; that is, the minimum processing time of a packet inside the NEs (when experienced) is constant, and thus $q_f^{\min}$ equals $q_r^{\min}$. Hence, we can use the minimum queuing time as a common reference and define the normalized $q_f$ and $q_r$ as

$$ \begin{aligned} q_f^{nr}[n] &= \varepsilon_s[n] - \min_{n-N \le k \le n} \varepsilon_s[k] = q_f[n] - q_f[0] - \min_{n-N \le k \le n} \big\{ q_f[k] - q_f[0] \big\} \\ &= q_f[n] - q_f^{\min}[n], \end{aligned} \tag{11} $$

$$ q_r^{nr}[n] = \gamma_s[n] - \min_{n-N \le k \le n} \gamma_s[k] = q_r[n] - q_r^{\min}[n], \tag{12} $$

where we assume that the simplest way of estimating $q_f^{\min}$ and $q_r^{\min}$ is to keep a window of N samples of $\varepsilon_s$ and $\gamma_s$, respectively, and to select the minimum value within that window. Note that when the whole network is heavily loaded, the probability that a minimally delayed packet arrives within a finite window of length N drops significantly [3]. This can be mitigated by increasing the window length N in (11) and (12); however, if the minimally delayed packet still fails to arrive, the precision of the proposed method decreases (see the simulation results in Section IV-B). Finally, using (11) and (12), we can estimate the queuing offset as

$$ \tilde{\theta}_q[n] = \big(q_f^{nr}[n] - q_r^{nr}[n]\big)/2, \tag{13} $$

where $\tilde{\theta}_q$ is the instantaneous asymmetry offset between forward- and backward-traveling packets. Since most synchronization algorithms rely on some averaging strategy, compensating the QIA requires an averaged quantity $\tilde{\theta}_q^{av}$. One of the simplest average estimators is the Exponentially Weighted Moving Average (EWMA) filter, defined as

$$ \tilde{\theta}_q^{av}[n] = (1 - \beta)\,\tilde{\theta}_q^{av}[n-1] + \beta\,\tilde{\theta}_q[n], \tag{14} $$

where β is the filtering factor (0 < β < 1). The validity of (13) is straightforward to show: inserting (11) and (12) into (13) and using the fact that $q_f^{\min}[n]$ equals $q_r^{\min}[n]$, we can write

$$ \tilde{\theta}_q[n] = \tfrac{1}{2}\big(q_f[n] - q_f^{\min}[n] - (q_r[n] - q_r^{\min}[n])\big) = \tfrac{1}{2}\big(q_f[n] - q_r[n]\big) = \theta_q[n], \tag{15} $$

where the last equality follows from the definition of $\theta_q$ in (4).

IV. ANALYSIS OF THE COMPENSATION ALGORITHM IN SIMULATION

The number of NEs and their traffic load conditions clearly have a significant impact on every part of the QIA Compensation Algorithm (QIACA); however, investigating all possible network topologies, loading conditions and synchronization algorithms is outside the scope of this paper. Instead, we analyze how sensitive the QIACA is to its parameters: α, the skew estimate in (7) and (8); N, the window length in (11) and (12); and β, the filtering factor in (14). The sensitivity analysis is performed in simulation by comparing the expected value with the output of (14).

A. Simulation Setup

High-precision simulation of synchronization over packet networks is challenging because of its processing requirements. It was shown in [7] that a simple model based on the properties of PTP can provide nanosecond precision with reasonable processing requirements. We implemented the model reported in [7] and used it to generate timestamps $T_1$ to $T_4$ at a rate of 16 Sync packets per second. The oscillator skew was set to 10 parts per million (ppm), which is within the expected frequency skew range of a telecom-grade OCXO. Queuing and physical propagation delays are simulated using the distributions reported in [3]; specifically, we used the distributions for 20% and 40% load (see the summary of statistical parameters in Table I). We assume that only the QIA is present in the system. The resulting offset error is half the difference between the mean forward and backward path delays, i.e., |162 − 136|/2 = 13 μs for the combination of 20% and 40% loads. Its sign depends on which load is applied to the forward and backward paths; for example, it is +13 μs for 40/20% Fw/Bw load.

TABLE I
PACKET DELAY DISTRIBUTION CHARACTERISTICS

Net. Load   Min Delay [μs]   Mean Delay [μs]   Std. Dev. [μs]
20%         120              136               12.7
40%         120              162               16.9
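The processing chain of Section III, from the timestamp differences in (7) and (8) to the EWMA in (14), can be summarized in a short Python sketch. It is an illustration, not the authors' implementation: the function names, the use of a single known skew α for all exchanges, and the monotonic-deque sliding minimum (one convenient way to realize the N-sample window of (11) and (12)) are assumptions. The hypothetical harness `synthetic_exchanges` builds timestamps directly from (2) and (3), using periodic forward and reverse queuing patterns that share a minimum of zero, so the estimate (13) should reproduce $(q_f[n] - q_r[n])/2$, as (15) predicts.

```python
"""Illustrative sketch of the QIA compensation chain, eqs. (7)-(14).

Assumptions (not from the paper): timestamps are floats in seconds,
a single known skew `alpha` applies to all exchanges, and the window of
(11)-(12) covers exchange indices n-N..n, clipped at 0.
"""
from collections import deque
from typing import Deque, List, Sequence


def raw_offset(t1: float, t2: float, t3: float, t4: float, alpha: float) -> float:
    """Raw offset theta_r from (4), i.e. the offset assuming symmetric delays."""
    return 0.5 * (t1 + t4 - (1.0 + alpha) * (t2 + t3))


def sliding_min(x: Sequence[float], n_win: int) -> List[float]:
    """min(x[k]) for k in [n - n_win, n] at every n, using a monotonic deque."""
    out: List[float] = []
    idx: Deque[int] = deque()  # indices whose values increase front to back
    for n, v in enumerate(x):
        while idx and x[idx[-1]] >= v:
            idx.pop()
        idx.append(n)
        while idx[0] < n - n_win:  # drop indices that have left the window
            idx.popleft()
        out.append(x[idx[0]])
    return out


def qia_offsets(T1, T2, T3, T4, alpha: float, n_win: int) -> List[float]:
    """Instantaneous queuing-offset estimate (13) for every exchange n."""
    eps_s, gam_s = [0.0], [0.0]  # eps_s[0] = gam_s[0] = 0
    for n in range(1, len(T1)):
        eps = (1.0 + alpha) * (T2[n] - T2[n - 1]) - (T1[n] - T1[n - 1])  # (7)
        gam = (T4[n] - T4[n - 1]) - (1.0 + alpha) * (T3[n] - T3[n - 1])  # (8)
        eps_s.append(eps_s[-1] + eps)  # (9)
        gam_s.append(gam_s[-1] + gam)  # (10)
    f_min = sliding_min(eps_s, n_win)
    r_min = sliding_min(gam_s, n_win)
    # (11), (12): normalize by the window minimum; (13): half the difference
    return [((e - fm) - (g - rm)) / 2.0
            for e, fm, g, rm in zip(eps_s, f_min, gam_s, r_min)]


def ewma(x: Sequence[float], beta: float) -> List[float]:
    """EWMA filter (14), initialized with the first sample."""
    out: List[float] = []
    avg = x[0]
    for v in x:
        avg = (1.0 - beta) * avg + beta * v
        out.append(avg)
    return out


def synthetic_exchanges(n_ex: int, alpha: float, theta: float,
                        d: float = 100e-6, rate: float = 16.0):
    """Hypothetical harness: timestamps solved from (2) and (3) with known delays.

    Both queuing patterns have the same minimum (0 s), as assumed in Section III.
    """
    T1, T2, T3, T4, qf, qr = [], [], [], [], [], []
    for n in range(n_ex):
        f = (n % 4) * 10e-6  # forward queuing delay: 0, 10, 20, 30 us
        r = (n % 2) * 10e-6  # reverse queuing delay: 0, 10 us
        t1 = n / rate
        t2 = (t1 + d + f - theta) / (1.0 + alpha)  # solve (2) for T2
        t3 = t2 + 1e-3                             # reply 1 ms later (slave clock)
        t4 = (1.0 + alpha) * t3 + theta + d + r    # solve (3) for T4
        for lst, val in zip((T1, T2, T3, T4, qf, qr), (t1, t2, t3, t4, f, r)):
            lst.append(val)
    return T1, T2, T3, T4, qf, qr


if __name__ == "__main__":
    alpha, theta = 10e-6, 5e-4
    T1, T2, T3, T4, qf, qr = synthetic_exchanges(2000, alpha, theta)
    est = qia_offsets(T1, T2, T3, T4, alpha, n_win=8)
    avg = ewma(est, beta=0.01)
    # True mean offset here is (15 us - 5 us) / 2 = 5 us; avg[-1] should be close to it.
    print(f"averaged QIA offset: {avg[-1] * 1e6:.2f} us")
```

With real delays, such as the Table I distributions, the estimate matches (15) only when each window contains a minimally delayed packet in both directions, which is why the window length N matters in the analysis below.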

Since three variables (α, β, N) are under study, the other two are held fixed while one is analyzed, to avoid interference between them. The fixed values were: α, 10 ppm; β, 0.0005, to keep the output variance of (4) as low as possible; and N, a 10-minute window (9600 samples), since it was shown in [3] that a minimally delayed sample arrives within such a window for loads of up to 50%. The sensitivity of the QIACA was tested on a data set of 3 hours of run time (172800 exchanges of $T_1$ to $T_4$ timestamps) for each parameter setting.

B. Analysis of the Parameters

First, we tested the sensitivity of the QIACA to the α parameter by adding Gaussian noise with varying standard deviation (s.d.) to the a priori known value of the skew (10 ppm) and computing how the asymmetry estimate differs from the expected value (see Fig. 2). No traffic load was applied in this test, since the variance caused by traffic load would mask the effect of the noisy α estimate on the QIACA. The other parameters were fixed as described in the previous paragraph. Fig. 2 shows that the sensitivity to skew estimation error is negligible up to an added-noise s.d. of 10⁻² ppm, where the resulting mean error is close to zero with an s.d. below 100 ns; above that threshold, however, the cumulative sums in (9) and (10) exhibit random-walk behavior, which produces a random mean error with a high s.d. To verify that this limit (less than 10⁻² ppm) is achievable, we implemented the Kalman filter (KF) skew estimation algorithm from [8], where it was shown that the KF reduces the input noise variance linearly with the number of measurements. Our simulations (see the dash-dotted line in Fig. 2) show that under 20/40% Fw/Bw load conditions the KF needs about 3000 samples to achieve the performance required by the QIACA, in accordance with [8].


Fig. 2. Sensitivity of the QIACA to the skew estimator's error. The dash-dotted line shows the KF estimator's performance after 3000 samples.
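A back-of-envelope estimate helps explain the random-walk behavior described above. It is a rough approximation of our own, not an analysis from the paper, and it assumes the skew-estimate error $e[n]$ in (7) and (8) is zero-mean white noise with s.d. $\sigma_\alpha$, with exchanges spaced $\Delta t$ apart. Each exchange then perturbs ε by about $+e[n]\,\Delta t$ and γ by about $-e[n]\,\Delta t$, so (13) picks up roughly $e[n]\,\Delta t$ per step, and after $n$ steps the accumulated error has an s.d. of about $\sigma_\alpha \Delta t \sqrt{n}$ (the window-minimum term is ignored). For $\sigma_\alpha = 10^{-2}$ ppm, $\Delta t = 1/16$ s and a 9600-sample window, this gives about 61 ns, while 0.1 ppm gives about 0.6 μs; the first figure is at least consistent with the sub-100 ns s.d. reported above. The function name `skew_noise_walk_sd` is hypothetical.

```python
import math


def skew_noise_walk_sd(sigma_alpha: float, rate_hz: float, n_samples: int) -> float:
    """Rough s.d. of the random walk that white skew-estimate noise injects into
    the queuing-offset estimate (13) after n_samples exchanges.

    Assumptions (illustrative only): zero-mean, independent skew errors with s.d.
    sigma_alpha (dimensionless, e.g. 1e-8 for 0.01 ppm); exchanges spaced
    1/rate_hz seconds apart; the window-minimum term in (11)-(12) is ignored.
    """
    dt = 1.0 / rate_hz
    return sigma_alpha * dt * math.sqrt(n_samples)


if __name__ == "__main__":
    n_win = 9600  # 10-minute window at 16 exchanges per second
    for sigma_ppm in (1e-3, 1e-2, 1e-1, 1.0):
        sd = skew_noise_walk_sd(sigma_ppm * 1e-6, 16.0, n_win)
        print(f"{sigma_ppm:g} ppm -> about {sd * 1e9:.0f} ns")
```

The square-root growth is why a modest increase in skew-estimate noise quickly dominates the estimate, and why a skew estimator that keeps the error well below 10⁻² ppm, such as the KF mentioned above, is needed.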


An analysis of the sensitivity to parameters β and N was performed with 20% and 40% load on the forward and backward paths, respectively. The QIACA was used to estimate the asymmetry offset for different values of β and N, and the difference (mean error with s.d.) between the exact and estimated values is shown in Fig. 3. As expected, the output variance decreases as the EWMA bandwidth decreases (lower values of β), reaching a timing-error s.d. of 200 ns for β equal to 0.001. Note that β affects only the variance of the estimate; the mean error remains constant. The window length N, on the other hand, has a significant impact on the unbiasedness of the QIACA. This is a direct consequence of (11) and (12): shorter windows do not consistently contain a minimally delayed packet and thus produce a biased offset estimate. A window longer than 8 minutes keeps the residual bias in the order of tens of nanoseconds.

The overall convergence time of the QIACA is driven by the properties of the skew estimator and by the parameter β. When the KF is used, the skew estimator converges after 3000 samples. From EWMA filtering theory, (14) converges to the true value exponentially; it can be shown that (14) reaches 90% of the true value after n samples, where n ≥ 2.3 · (1 − β)/β. Thus, for β equal to 0.001, n ≥ 2300 samples. This can be reduced by initializing $\tilde{\theta}_q^{av}$ in (14) with a simple sample mean. In the worst case, the QIACA requires about 6000 samples to converge (less than 2 minutes at 64 packets per second), during which the loading conditions are required to remain constant.

C. Application of the QIACA

The above analysis shows the sensitivity of the QIACA to its parameters. To further emphasize the importance of QIA compensation, we implemented a simple synchronization algorithm from [8], called Averaged Time Differences (ATD), and used it to synchronize a slave to the master's time.
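The convergence figure for (14) quoted in Section IV-B can be checked numerically. Started from zero, the EWMA step response after n samples is $1 - (1-\beta)^n$, so the smallest n that reaches 90% is $\lceil \ln 0.1 / \ln(1-\beta) \rceil$. The sketch below (function names are hypothetical) compares this exact count with the quoted bound $2.3(1-\beta)/\beta$; for β = 0.001 they give about 2302 and 2298 samples, respectively, both in line with the roughly 2300 samples stated above.

```python
import math


def ewma_samples_to_fraction(beta: float, fraction: float = 0.9) -> int:
    """Smallest n with 1 - (1 - beta)**n >= fraction: the number of samples the
    EWMA (14), started from 0, needs to reach `fraction` of a constant input."""
    return math.ceil(math.log(1.0 - fraction) / math.log(1.0 - beta))


def quoted_bound(beta: float) -> float:
    """Bound stated in Section IV-B: n >= 2.3 * (1 - beta) / beta."""
    return 2.3 * (1.0 - beta) / beta


if __name__ == "__main__":
    for beta in (0.01, 0.001, 0.0005):
        print(beta, ewma_samples_to_fraction(beta), round(quoted_bound(beta)))
```

The exact count and the quoted bound differ only slightly because $-\ln(1-\beta) \approx \beta/(1-\beta)$ for small β.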
The QIACA was applied with settings based on the sensitivity analysis: α estimated on the fly by the KF from [8], β = 0.001, and N = 10 minutes. We simulated 8 hours of run time, during which the traffic load changed as labeled at the top of Fig. 4.


Fig. 3. Sensitivity analysis of the QIACA with respect to the EWMA coefficient β (top x-axis, log scale) and the window length N (bottom x-axis).


Fig. 4. Time error between the master and slave without (top) and with (bottom) the QIACA applied. The traffic load settings are labeled in the top panel.

The time-estimate bias of the plain ATD algorithm (top of Fig. 4) is greatly reduced when the QIACA is applied (bottom of Fig. 4).

V. CONCLUSION

End-to-end time transfer is the most challenging problem in clock synchronization, but it also offers attractive benefits to the network operator. One of the problems hindering the deployment of end-to-end timing solutions is the queue-induced asymmetry introduced by network elements. In this paper, we have introduced a method that efficiently suppresses queue-induced asymmetries without any on-path timing support. It was shown that a queue-induced asymmetry in the order of tens of microseconds is reduced to less than a microsecond. Future work will focus on implementing the proposed method, together with more advanced synchronization algorithms, in a hardware prototype to be deployed in real networks in order to verify the performance of our algorithm experimentally.

REFERENCES

[1] IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, IEEE, 2008.
[2] N. Simanic et al., “Compensation of asymmetrical latency for Ethernet clock synchronization,” in IEEE Symp. on Precision Clock Synchronization for Measurement, 2011.
[3] I. Hadžić and D. R. Morgan, “On packet selection criteria for clock recovery,” in IEEE Symp. on Precision Clock Synchronization for Measurement, Oct. 2009.
[4] M. Ouellette et al., “Using IEEE 1588 and boundary clocks for clock synchronization in telecom networks,” IEEE Commun. Mag., Feb. 2011.
[5] “Time and phase synchronization aspects of packet networks,” ITU-T G.8271/Y.1366, Feb. 2012.
[6] Sungwon Lee, Seunggwan Lee, and C. Hong, “An accuracy enhanced IEEE 1588 synchronization protocol for dynamically changing and asymmetric wireless links,” IEEE Commun. Lett., vol. 16, no. 2, 2012.
[7] Z. Chaloupka et al., “Efficient and precise simulation model of synchronization clocks in packet networks,” in IEEE CAMAD, 2013.
[8] A. Bletsas, “Evaluation of Kalman filtering for network time keeping,” IEEE Trans. on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 52, Sep. 2005.

Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]

1

Clock Synchronization over Communication Paths with Queue-Induced Delay Asymmetries Zdenek Chaloupka, Member, IEEE, Nayef Alsindi, Member, IEEE, and James Aweya, Senior Member, IEEE

Abstract—Timing protocols such as IEEE 1588 Precision Time Protocol (PTP) and Network Time Protocol (NTP) require an accurate measurement of the communication path delay between the time server (master) and the client (slave) in order to provide a precise timing synchronization. The precise time at the clients site is then estimated using an assumption that forward and backward delays due to physical propagation time through network are equal, or any difference between them is calibrated beforehand. Apart from physical link delays, a timing packet experiences queue induced delay due to switching/routing devices on the path. This queuing delay is usually different in forward and backward directions, thus introducing the Queue-Induced Asymmetry (QIA), which is a major contributor to the time error between master and slave clocks if physical asymmetries are calibrated. This paper proposes a new technique for QIA compensation that does not require any on-path timing support, thus is easily deployed with current network devices. Index Terms—asymmetry, delay estimation, IEEE 1588v2 PTP, time synchronization.

I. I NTRODUCTION

T

IMING transfer using a protocol such as IEEE 1588v2 PTP [1] and a well designed slave clock recovery mechanism can provide time synchronization in the submicrosecond and lower [2]. However, this is done using the important assumption that the packets time delay from master to slave is equal to that from slave to master. In real life, the communication paths are not perfectly symmetric mainly due to dissimilar forward and reverse physical link delays and queuing delays [3]. The main sources of the packet delay variations are Network Elements (i.e. switches and routers) due to the packet queuing process. This is accentuated for the case when timing transfer is done in an end-to-end manner without any form of timing assistance from the network to help mitigate the effects of the variable queuing delays. Propagation delay asymmetry (as a summary of physical and queuing asymmetry delays) has become a major challenge in clock synchronization. The use of network timing support mechanisms such as boundary clocks (BCs) and transparent clocks (TCs) can eliminate delay asymmetry arising in the following two scenarios [4], [5]. First, variable queuing delays on forward and reverse paths (mainly due to different traffic load on the two traffic directions) and second, asymmetry caused by timing packets taking different paths in each direction. Note that even timing support mechanisms (BCs and TCs) The authors are with Etisalat BT Innovation Center (EBTIC) at Khalifa University of Science, Technology and Research, Abu Dhabi, UAE. E-mail: (zdenek.chaloupka, nayef.alsindi, james.aweya)@kustar.ac.ae. Manuscript received January 27, 2014; revised June 4, 2014 and July 10, 2014.

are unable to correct for delay asymmetry due to dissimilar physical links between network elements, thus a calibration procedure has to be performed [5]. Literature review indicates that QIA in the end-to-end context has not yet been addressed, since most of the work on asymmetry focuses on physical link asymmetries (due to link length or speed differences) as reported in [2], [6]. In this paper we deal with the scenario, where timing packets travel through the same routes on both, forward and backward paths, yet may experience different queuing delays. This paper sets the following goals: • Describe a mechanism for compensating for the QIA on a communicating path without using networking timing support mechanisms like BCs and TCs. • Analyze the bounds and sensitivity of the proposed mechanism in simulation. The rest of this paper is organized as follows: Section II introduces a basic clock model based on IEEE 1588v2 PTP, Section III derives the QIA compensation algorithm, and Section IV shows simulation results. Finally, Section V concludes this paper. II. BASIC C LOCK M ODEL F OR IEEE 1588 PTP The IEEE 1588v2 PTP defines a packet-based synchronization protocol for communicating frequency, phase and timeof-day information from a master to one or more slaves. PTP relies on the use of accurately timestamped packets (at nanosecond level granularity) sent from the master clock to one or more slave clocks to allow them to synchronize to the master clock. The PTP message exchange process is explained in the following text and illustrated in Fig. 1. Note that symbols and notations used in the paper are explained close to the equation that uses them. We define now a generalized clock offset and skew equation for the synchronization problem. 
We assume that at any particular time instant, the instantaneous view of the relationship between the master (server) clock with timeline S(t) and the slave (client) clock with timeline C(t), can be described by the well-know simple skew clock defined as S(t) = (1 + α)C(t) + θ,

(1)

where α represents skew and θ is time offset of the slave clock. The above equation can be extended to account for the case where the master clock and slave clock exchange messages over a communication link with delay as follows. The master sends a Sync message with timestamp T1 [n] ∈ S(t) which arrives at the slave with timestamp T2 [n] ∈ C(t)

Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available athttp://dx.doi.org/10.1109/LCOMM.2014.2341604 IEEE COMMUNICATION LETTERS

2

The raw offset θr is the quantity often computed during clock synchronization because the system is assumed to be symmetric in the average sense (the average delay in both directions on the path are assumed to be equal). The offset θp can be determined as described in [5]. We propose here a new technique for computing the offset θq , which is a quantity often ignored during clock synchronization, since it is assumed that the average queuing delays in both directions on the path are equal. Let us define the first difference of (2) and (3) as

Fig. 1. IEEE 1588 PTP message flow between a master (top) and slave (bottom).

experiencing a fixed physical link delay df (as a summary of all static delays) plus variable queuing delay qf (see Fig. 1). With respect to the PTP protocol and (1) we can write T1 [n] + df + qf [n] = (1 + α[n])T2 [n] + θ[n].

(2)

Next, the slave sends a DelayReq message, which departs the slave with timestamp T3 [n] ∈ C(t) and arrives at the master with timestamp T4 [n] ∈ S(t), experiencing a fixed delay dr (as a summary of all static delays) plus variable queuing delay qr , thus we can write T4 [n] − db − qb [n] = (1 + α[n])T3 [n] + θ[n].

(3)

The master conveys timestamp T4 to the slave by embedding it in a DelayResp message. At the end of this PTP messages exchange, the slave possesses all four timestamps T1 , T2 , T3 , T4 . An asymmetric path exists when the fixed delay components and/or queuing delay components in both directions are unequal. We assume that the physical link asymmetry is calibrated beforehand, but the QIA is compensated for using a technique like the one described here. A key assumption is that the message exchanges occur over a period of time so small that the offset θ and skew α can be assumed constant over that period. This is a valid assumption since telecom grade oscillators used for synchronization purposes maintain more than nanosecond precision over short periods of time during which PTP message exchanges occur (PTP can be set up to 128 messages per second).

dT1 [n] + qf [n] − qf [n − 1] = (1 + α[n]) dT2 [n],

(5)

dT4 [n] − (qr [n] − qr [n − 1]) = (1 + α[n]) dT3 [n],

(6)

where dTi [n] = Ti [n] − Ti [n − 1] and i ∈ 1 . . . 4. In (5) and (6) we assume that α and θ values do not change over short periods of time, that is, between two sets of PTP messages exchange. By rearranging (5) and (6) we obtain [n] = qf [n] − qf [n − 1] = (1 + α[n]) dT2 [n] − dT1 [n], (7) γ[n] = qr [n] − qr [n − 1] = dT4 [n] − (1 + α[n]) dT3 [n], (8) where [n] and γ[n] are the first differences of the queuing delays qf and qr , respectively. Since inverse function to difference is integration, let us define a cumulative sum (numerical integration) of the differential queuing delays as follows s [n] = s [n − 1] + [n] = s [n − 1] + qf [n] − qf [n − 1], = s [0] + qf [n] − qf [0],

(9)

γs [n] = γs [0] + qr [n] − qr [0],

(10)

where s [0] = γs [0] = 0. We observe that the cumulative summation of (or γ) gives the forward (or reverse) queuing delay at time n biased by the first experienced queuing delay qf [0] (qr [0]) that is uknown. This is similar to the integration without knowing the integration constant (initial conditions). As mentioned in Section 1, we assume that forward and backward traveling packets go through the same NEs. We also expect the NEs to process packet streams invariably and independently for both paths as verified in [3], that is, a minimum processing time of a packet inside NEs (if experienced) is a constant, and thus, qfmin equals qrmin . Hence, we can use the minimum queuing time as a common reference and define normalized qf and qr as k=n

qfnr [n] = s [n] − min {s [k]} = qf [n] − qf [0]− k=n−N

k=n

− min {qf [k] − qf [0]} , k=n−N

III. N EW T ECHNIQUE FOR C OMPENSATING Q UEUE -I NDUCED A SYMMETRY Adding equations (2) and (3) and rearranging we obtain the overall clock offset θ as θr

z }| { 1 θ[n] = (T1 [n] + T4 [n] − (1 + α[n]) (T2 [n] + T3 [n])) 2 1 1 + (df − dr ) + (qf [n] − qr [n]) . (4) 2 2 | {z } | {z } θp

θq

= qf [n] − qfmin [n],

(11)

k=n

qrnr [n] = γs[n] − min {γs [k]} = qr [n] − qrmin [0], (12) k=n−N

where we assume that the simplest way of estimating qfmin and qrmin is to keep a window of N samples of s and γs , respectively, and selecting a sample with the minimum value from that window. Note that in case the whole network is heavily loaded, the probability of arrival of a minimally delayed packet within a finite window of length N drops significantly [3]. This can be resolved by increasing the window length N in (11), (12), yet in case where the minimally delayed packet


fails to arrive, the precision of the proposed method decreases (see the simulation results in Section IV.B). Finally, using (11) and (12), we can estimate the queuing offset as

$$\tilde{\theta}_q[n] = \frac{q_f^{nr}[n] - q_r^{nr}[n]}{2}, \tag{13}$$

where $\tilde{\theta}_q$ is the instantaneous asymmetry offset between forward and backward traveling packets. Most synchronization algorithms rely on some averaging strategy, so compensating the QIA requires an averaged quantity $\tilde{\theta}_q^{av}$. One of the simplest average estimators is the Exponentially Weighted Moving Average (EWMA) filter, defined as

$$\tilde{\theta}_q^{av}[n] = (1-\beta)\,\tilde{\theta}_q^{av}[n-1] + \beta\,\tilde{\theta}_q[n], \tag{14}$$

where β is the filtering factor (0 < β < 1). The proof of validity of (13) is straightforward. Inserting (11) and (12) into (13) and using the fact that $q_f^{\min}$ equals $q_r^{\min}$, we can write

$$\tilde{\theta}_q[n] = \frac{q_f[n] - q_f^{\min}[n] - \big(q_r[n] - q_r^{\min}[n]\big)}{2} = \frac{q_f[n] - q_r[n]}{2} = \theta_q[n], \tag{15}$$

where the last line follows from (4).

IV. ANALYSIS OF THE COMPENSATION ALGORITHM IN SIMULATION

The number of NEs and their traffic load conditions obviously have a significant impact on every part of the QIA Compensation Algorithm (QIACA). However, investigating all the different network topologies, loading conditions and synchronization algorithms is outside the scope of this paper. Instead, we analyze how sensitive the QIACA is to its parameters:

- α, the skew estimate in (7) and (8);
- N, the window length in (11) and (12);
- β, the filtering factor in (14).

The sensitivity analysis is performed in simulation by comparing the expected value with the output of (14).

A. Simulation Setup

High-precision simulation of synchronization in packet networks is challenging because of its processing requirements. It was shown in [7] that a simple model based on the properties of PTP can provide nanosecond precision with reasonable processing requirements. We implemented the model reported in [7] and used it to generate timestamps T1 to T4 at a rate of 16 Sync packets per second. The oscillator skew was set to 10 parts per million (ppm), as this is within the expected frequency skew range of a telecom-grade OCXO.

Queuing and physical wire propagation delays are simulated using the distributions reported in [3]. Specifically, we used the distributions for 20% and 40% load (see the summary of statistical parameters in Table I). We assume that only the QIA is present in the system. Its magnitude is half the difference between the mean forward and backward path delays (i.e., |136 − 162|/2 = 13 μs for the combination of 20% and 40% loads). The sign depends on which load is applied to the forward and backward paths, that is, +13 μs for 40/20% Fw/Bw load.

TABLE I
PACKET DELAY DISTRIBUTION CHARACTERISTICS

Net. Load   Min Delay [μs]   Mean Delay [μs]   Std. Dev. [μs]
20%         120              136               12.7
40%         120              162               16.9
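The estimator (7) to (14) can be sketched in Python as follows, run on synthetic exchanges at the 16 packets per second used in this setup. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all assumptions, chosen so that the output can be checked against the true $(q_f - q_r)/2$:

- the timestamp model in the hypothetical helper `simulate_exchanges`, in which the slave clock reads (t − θ)/(1 + α) at master time t;
- the shifted-exponential queuing delays, which match only the minimum and mean delays of Table I and not the distributions of [3];
- the forced minimally delayed packet every 500 exchanges.

The windowed minimum of (11) and (12) is computed with a monotonic deque, one common linear-time choice, instead of rescanning all N samples at every step.

```python
from collections import deque

import numpy as np


def sliding_min(x, window):
    """Minimum of x[k] over k = n - window .. n for every n (monotonic deque, O(len(x)))."""
    out = np.empty(len(x))
    idx = deque()  # candidate indices; their x values increase from front to back
    for n, v in enumerate(x):
        while idx and x[idx[-1]] >= v:
            idx.pop()  # these samples can never be the window minimum again
        idx.append(n)
        while idx[0] < n - window:
            idx.popleft()  # drop indices that have left the window
        out[n] = x[idx[0]]
    return out


def qiaca(t1, t2, t3, t4, alpha, window, beta):
    """Instantaneous (13) and EWMA-averaged (14) QIA offset from PTP timestamps."""
    t1, t2, t3, t4 = (np.asarray(t, dtype=float) for t in (t1, t2, t3, t4))
    a = np.broadcast_to(np.asarray(alpha, dtype=float), t1.shape)  # scalar or per-exchange skew

    eps = np.zeros(len(t1))
    gam = np.zeros(len(t1))
    eps[1:] = (1 + a[1:]) * np.diff(t2) - np.diff(t1)  # (7)
    gam[1:] = np.diff(t4) - (1 + a[1:]) * np.diff(t3)  # (8)
    eps_s = np.cumsum(eps)  # (9), with eps_s[0] = 0
    gam_s = np.cumsum(gam)  # (10), with gamma_s[0] = 0

    qf_nr = eps_s - sliding_min(eps_s, window)  # (11)
    qr_nr = gam_s - sliding_min(gam_s, window)  # (12)
    theta_q = (qf_nr - qr_nr) / 2               # (13)

    theta_av = np.empty_like(theta_q)
    theta_av[0] = theta_q[0]  # assumption: EWMA seeded with the first sample
    for n in range(1, len(theta_q)):
        theta_av[n] = (1 - beta) * theta_av[n - 1] + beta * theta_q[n]  # (14)
    return theta_q, theta_av


def simulate_exchanges(qf, qr, alpha=10e-6, theta=3e-3, d_f=50e-6, d_r=50e-6,
                       rate_hz=16.0, turnaround=1e-3):
    """Hypothetical timestamp generator consistent with (7) and (8).

    Assumed model: the slave clock reads (t - theta) / (1 + alpha) at master time t.
    """
    qf = np.asarray(qf, dtype=float)
    qr = np.asarray(qr, dtype=float)
    t1 = np.arange(len(qf)) / rate_hz             # master send times
    t2 = (t1 + d_f + qf - theta) / (1 + alpha)    # slave receive times (slave clock)
    t3 = t2 + turnaround                          # slave send times (slave clock)
    t4 = (1 + alpha) * t3 + theta + d_r + qr      # master receive times
    return t1, t2, t3, t4


rng = np.random.default_rng(1)
n_ex = 2000
qf = 120e-6 + rng.exponential(42e-6, n_ex)  # forward: min 120 us, mean about 162 us
qr = 120e-6 + rng.exponential(16e-6, n_ex)  # reverse: min 120 us, mean about 136 us
qf[::500] = qr[::500] = 120e-6              # a minimally delayed packet every 500 exchanges
theta_q, theta_av = qiaca(*simulate_exchanges(qf, qr), alpha=10e-6, window=600, beta=0.01)
print(f"mean true QIA {np.mean(qf - qr) / 2 * 1e6:.1f} us, "
      f"EWMA estimate {theta_av[-1] * 1e6:.1f} us")
```

In this sketch the skew is exact and every window contains a minimally delayed packet. Under those conditions, `theta_q` matches $(q_f[n] - q_r[n])/2$ up to rounding, as (15) predicts. Making `window` shorter than the spacing of the forced minima, or perturbing `alpha`, is one way to explore the bias and drift effects analyzed in Section IV.B.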

We study three variables (α, β, N). When analyzing one of them, the other two must be fixed to avoid interference. The fixed values were as follows:

- α was set to 10 ppm;
- β was set to 0.0005, to keep the output variance of (14) as low as possible;
- N was set to a 10-minute window (9600 samples), because [3] showed that a minimally delayed sample arrives within that window for loads up to 50%.

The sensitivity of the QIACA was tested on a data set of 3 hours of run time (172800 exchanges of T1 to T4 timestamps) for each parameter setting.

B. Analysis of the Parameters

First, we tested the sensitivity of the QIACA to the α parameter. We added Gaussian noise with varying standard deviation (s.d.) to the a priori known value of the skew (10 ppm) and computed how the asymmetry estimate differs from the expected value (see Fig. 2). No traffic load was applied, since the variance caused by the traffic load would mask the effect of the noisy α estimate on the QIACA. The other parameters were fixed as described in the previous paragraph.

We can observe in Fig. 2 that the sensitivity to skew estimation error is negligible up to $10^{-2}$ ppm s.d. of the added noise. In this range, the resulting mean error is close to zero, with an s.d. below 100 ns. Above that threshold, however, the cumulative sums in (9) and (10) behave like a random walk, which yields a random mean error with a high s.d.

To verify that this limit (less than $10^{-2}$ ppm) is feasible, we implemented the Kalman filter (KF) skew estimation algorithm from [8]. That work showed that the KF reduces the input noise variance linearly with the number of measurements. We verified in simulation (see the dash-dotted line in Fig. 2) that under 20/40% Fw/Bw load conditions, the KF needs about 3000 samples to achieve the performance required by the QIACA (this is in accordance with [8]).
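The random-walk behavior described above can be illustrated with a short Monte Carlo sketch. It assumes, as in this test, zero-mean Gaussian skew noise drawn independently for each exchange. It also assumes a constant slave inter-arrival time $dT_2 = 1/16$ s. Under these assumptions, replacing α by α + δα in (7) adds $\delta\alpha \cdot dT_2$ to every ε[n], so the cumulative sum (9) accumulates a random walk whose s.d. grows as √n. The function names are illustrative. The sketch shows only the growth mechanism over one 10-minute window (9600 samples) and does not attempt to reproduce Fig. 2.

```python
import numpy as np


def skew_noise_walk_sd(noise_sd_ppm, n_steps=9600, rate_hz=16.0, trials=200, seed=0):
    """Monte Carlo s.d. of the error a noisy skew estimate adds to eps_s in (9).

    Assumes zero-mean Gaussian skew error, independent per exchange, and a constant
    slave inter-arrival time dT2 = 1 / rate_hz.
    """
    rng = np.random.default_rng(seed)
    d_t2 = 1.0 / rate_hz
    d_alpha = rng.normal(0.0, noise_sd_ppm * 1e-6, size=(trials, n_steps))
    # Using (1 + alpha + d_alpha) in (7) adds d_alpha * dT2 to each eps[n];
    # the cumulative sum (9) turns these independent errors into a random walk.
    err = np.cumsum(d_alpha * d_t2, axis=1)
    return err[:, -1].std()


def predicted_sd(noise_sd_ppm, n_steps=9600, rate_hz=16.0):
    """Closed-form random-walk s.d.: per-step s.d. times sqrt(n_steps)."""
    return noise_sd_ppm * 1e-6 / rate_hz * np.sqrt(n_steps)


for ppm in (1e-3, 1e-2, 1e-1):
    print(f"{ppm:g} ppm -> simulated {skew_noise_walk_sd(ppm) * 1e9:.0f} ns, "
          f"predicted {predicted_sd(ppm) * 1e9:.0f} ns")
```

The error scales linearly with the noise s.d. and with the square root of the number of accumulated samples. This is why the window subtraction in (11) and (12) cannot by itself absorb a poor skew estimate.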

[Fig. 2 plot: mean error and error s.d. versus Gaussian noise s.d. added to α [ppm], log-scale x-axis.]

Fig. 2. Sensitivity of the QIACA to the skew estimator's error. The dash-dotted line shows the KF estimator's performance after 3000 samples.


The sensitivity to the parameters β and N is analyzed with 20% and 40% load on the forward and backward paths, respectively. The QIACA was used to estimate the asymmetry offset for different values of β and N. Fig. 3 shows the difference (mean error and s.d.) between the true and the estimated value. As expected, the output variance decreases as the EWMA bandwidth decreases (lower values of β), reaching a timing error s.d. of 200 ns for β equal to 0.001. The parameter β affects only the variance of the estimate; the mean error remains constant. The parameter N, on the other hand, has a significant impact on the unbiasedness of the QIACA. This is a direct consequence of (11) and (12): shorter windows do not consistently contain the minimally delayed packet and therefore produce a biased offset estimate. A window longer than 8 minutes keeps the residual bias on the order of tens of nanoseconds.

The overall convergence time of the QIACA is driven by the properties of the skew estimator and by the parameter β. When the KF is used, the skew estimator converges after 3000 samples. From EWMA filtering theory, (14) converges to the true value exponentially. It can be shown that (14) reaches 90% of the true value after n samples, where n ≥ 2.3 · (1 − β)/β. Thus, for β equal to 0.001, n ≥ 2300 samples. This can be reduced by initializing $\tilde{\theta}_q^{av}$ in (14) with a simple sample mean. In the worst case, the QIACA requires about 6000 samples to converge (less than 2 minutes at 64 packets per second), and the loading conditions must remain constant during that time.

C. Application of the QIACA

The above analysis shows the sensitivity of the QIACA to its parameters. To further emphasize the importance of QIA compensation, we implemented a simple synchronization algorithm from [8] called Averaged Time Differences (ATD). We used it to synchronize a slave to the master's time.
The QIACA was applied with settings based on the sensitivity analysis: α was estimated on the fly by the KF from [8], β = 0.001, and N = 10 minutes. We simulated 8 hours of running time, during which the traffic load changed as labeled at the top of Fig. 4.
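The convergence figures quoted in Section IV.B can be checked with a short sketch. Start the EWMA (14) from zero and feed it a constant unit input. After n samples its output is $1 - (1-\beta)^n$, so reaching 90% exactly requires $n \ge \ln 10 / (-\ln(1-\beta))$. Since $-\ln(1-\beta) \le \beta/(1-\beta)$ and ln 10 ≈ 2.3026, the expression 2.3(1 − β)/β used in the text is a close but slightly optimistic approximation of this exact count. The helper names below are hypothetical. Starting the filter from zero is an assumption that corresponds to the uninitialized case, not the sample-mean initialization mentioned in the text.

```python
import math


def ewma_samples_to_fraction(beta, fraction=0.9):
    """Exact smallest n for which (14), started at zero, reaches `fraction` of a constant input."""
    # After n samples of a unit input the output is 1 - (1 - beta)**n.
    return math.ceil(math.log(1 - fraction) / math.log(1 - beta))


def ewma_step_response(beta, n):
    """Iterate (14) n times on a unit input, starting from zero."""
    y = 0.0
    for _ in range(n):
        y = (1 - beta) * y + beta * 1.0
    return y


def quoted_bound(beta):
    """The approximation n >= 2.3 (1 - beta) / beta used in the text."""
    return 2.3 * (1 - beta) / beta


beta = 0.001
n_exact = ewma_samples_to_fraction(beta)
print(n_exact, round(quoted_bound(beta)), ewma_step_response(beta, n_exact))
```

For β = 0.001, the exact count is 2302 samples and the approximation gives about 2298. Both agree with the roughly 2300 samples stated in Section IV.B.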

[Fig. 3 plot: error versus variable β (top axis, log scale) and versus window length N [mins] (bottom axis).]

Fig. 3. Sensitivity analysis of the QIACA with respect to the EWMA coefficient β (top x-axis, log scale) and the window length N (bottom x-axis).

[Fig. 4 plot: time error versus elapsed time [mins] (0 to 500), top panel without and bottom panel with QIACA; load intervals labeled 20% Fw/Bw, 20/40% Fw/Bw, 40/20% Fw/Bw and 40% Fw/Bw.]

Fig. 4. Time error between the master and slave without (top) and with (bottom) QIACA applied. The traffic load settings are labeled in the top figure.

The bias of the time estimate of the plain ATD algorithm (top of Fig. 4) is greatly reduced when the QIACA is applied (bottom of Fig. 4).

V. CONCLUSION

End-to-end time transfer is the most challenging problem in clock synchronization, but it also offers attractive benefits to the network operator. One of the problems hindering the deployment of end-to-end timing solutions is the queue-induced asymmetry introduced by network elements. In this paper, we have introduced a method that efficiently suppresses queue-induced asymmetries without any on-path timing support. It was shown that a queue-induced asymmetry on the order of tens of microseconds is reduced to less than a microsecond. Future work will focus on implementing the proposed method, together with more advanced synchronization algorithms, in a hardware prototype. The prototype is to be deployed in real networks to verify the performance of our algorithm experimentally.

REFERENCES

[1] IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, IEEE Std 1588-2008, 2008.
[2] N. Simanic et al., "Compensation of asymmetrical latency for Ethernet clock synchronization," in IEEE Symp. on Precision Clock Synchronization for Measurement, 2011.
[3] I. Hadžić and D. R. Morgan, "On packet selection criteria for clock recovery," in IEEE Symp. on Precision Clock Synchronization for Measurement, Oct. 2009.
[4] M. Ouellette et al., "Using IEEE 1588 and boundary clocks for clock synchronization in telecom networks," IEEE Commun. Mag., Feb. 2011.
[5] "Time and phase synchronization aspects of packet networks," ITU-T G.8271/Y.1366, Feb. 2012.
[6] S. Lee, S. Lee, and C. Hong, "An accuracy enhanced IEEE 1588 synchronization protocol for dynamically changing and asymmetric wireless links," IEEE Commun. Lett., vol. 16, no. 2, 2012.
[7] Z. Chaloupka et al., "Efficient and precise simulation model of synchronization clocks in packet networks," in IEEE CAMAD, 2013.
[8] A. Bletsas, "Evaluation of Kalman filtering for network time keeping," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 52, Sep. 2005.
