ON PERFORMANCE OF CONSENSUS PROTOCOLS SUBJECT TO NOISE: ROLE OF HITTING TIMES AND NETWORK STRUCTURE

arXiv:1508.00036v2 [math.OC] 10 Aug 2015

ALI JADBABAIE, ALEX OLSHEVSKY

ABSTRACT. We study the performance of linear consensus protocols based on repeated averaging in the presence of additive noise. When the consensus dynamics corresponds to a reversible Markov chain, we give an exact expression for the weighted steady-state disagreement in terms of the stationary distribution and hitting times in an underlying graph. This expression unifies and extends several results in the existing literature. We show how this result can be used to characterize the asymptotic mean-square disagreement in certain noisy opinion dynamics models, as well as the scalability of protocols for formation control and decentralized clock synchronization.

CONTENTS

1. Introduction
1.1. Statement of our main result
1.2. Our contributions and the organization of this paper
1.3. Related work
2. Three motivating problems
2.1. The noisy DeGroot model of opinion dynamics
2.2. Decentralized clock synchronization
2.3. Formation control from offset measurements
3. Proof of Theorem 1
4. Examples and connections
4.1. Preliminary observations
4.2. The complete graph
4.3. The circle graph
4.4. The line graph
4.5. The star graph
4.6. The two-star graph
4.7. The lollipop graph
4.8. The starry line graph
4.9. Two-dimensional grid
4.10. The d-dimensional grid with d ≥ 3
4.11. The complete binary tree
4.12. Regular expander graphs
4.13. Dense Erdos-Renyi random graphs
4.14. Regular dense graphs
4.15. Regular graphs
4.16. Summary
4.17. Bounds in terms of resistance and the Kemeny constant
4.18. Bounding δuni_ss
5. Symmetric matrices
5.1. Simplifications of Theorem 1 in the symmetric case
5.2. Correlation of the errors
5.3. Clock synchronization
5.4. Formation control
6. Simulations
7. Conclusion
References

Department of Electrical and Systems Engineering, University of Pennsylvania, [email protected].
Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, [email protected].

1. INTRODUCTION

This paper studies the discrete-time noisy linear system

(1)    x(t + 1) = Px(t) + w(t),

when the matrix P is stochastic. The vector w(t) represents noise, and we will assume it to be a random vector with zero mean, covariance Σw, and having the property that w(t1) and w(t2) are independent whenever t1 ≠ t2. This recursion is often known as the consensus iteration. This is because the noiseless version x(t + 1) = Px(t) has the property that x(t) converges to span{1}, the subspace spanned by the all-ones vector, subject to some mild technical assumptions on the matrix P. Consensus protocols have many applications in algorithm design for distributed and multi-agent systems, where one usually thinks of each component xi(t) as being controlled by a different "agent," with the agents asymptotically "coming to consensus" as all the components of x(t) approach the same value. Indeed, the design of distributed policies for control and signal processing in networks of potentially mobile agents has attracted considerable attention in recent years, and the past decade of research has led to the understanding that a key tool for such systems

is the consensus iteration. It turns out that many sophisticated network coordination tasks can be either entirely reduced to consensus or have decentralized solutions where the consensus iteration plays a key role; we mention formation control [43, 52, 51, 45], distributed optimization [63, 41], coverage control [22, 57], distributed task assignment [12, 38], networked Kalman filtering [6, 59, 1, 53], and cooperative flocking/leader-following [25, 42], among many others.

Our goal in the present paper is to understand exactly how much the "coming to consensus" property deteriorates due to the addition of the noise term w(t) in Eq. (1). In a sense to be made precise next, we would like to characterize how far all the xi(t) are from each other in the limit as t → ∞, and to understand how the answer depends on combinatorial properties of the matrix P. Intuitively, the action of multiplying a vector x(t) by a stochastic matrix P has the effect of bringing the components xi(t) "closer together," while the addition of the noise w(t) counteracts that; the two processes result in some equilibrium level of expected disagreement as t → ∞.

We are motivated by the observation (made in some of the previous literature on the subject and discussed more formally later) that the equilibrium level of disagreement often grows with the dimension of the matrix P. Thus even though Eq. (1) is stable in the sense that the expected disagreement between any pair of nodes is bounded as t → ∞, this stability can be almost meaningless for large systems. This has implications for all distributed protocols which rely on consensus, as it implies that in some cases they may not be robust under the addition of noise. Understanding exactly when this happens is the goal of this paper.

1.1. Statement of our main result. This paper has a simple main result which serves to unify and extend several previous works, and we begin with a concise statement of it. We start with a number of definitions.
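To make the setup concrete, here is a minimal simulation sketch (ours, not from the paper): the noiseless iteration drives the components of x(t) to a common value, while the noisy version settles into a persistent, bounded level of disagreement. The graph (a line graph on 6 nodes), the lazy equal-neighbor weights, and the noise level σ are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, T = 6, 0.1, 2000

# Lazy equal-neighbor weights on the line graph: a stochastic matrix P.
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
P = 0.5 * np.eye(n) + 0.5 * A / A.sum(axis=1)[:, None]

x_clean = x_noisy = rng.normal(size=n)          # common random initial opinions
for _ in range(T):
    x_clean = P @ x_clean                        # noiseless consensus iteration
    x_noisy = P @ x_noisy + sigma * rng.normal(size=n)  # Eq. (1) with noise

spread_clean = x_clean.max() - x_clean.min()
spread_noisy = x_noisy.max() - x_noisy.min()
print(spread_clean)   # essentially zero: the components reach consensus
print(spread_noisy)   # bounded away from zero: persistent disagreement
```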
We will assume P to be an irreducible and aperiodic matrix, and we let π be the stationary distribution vector, i.e.,

π^T P = π^T,    ∑_{i=1}^n πi = 1.

We will use Dπ to stand for the diagonal matrix whose (i, i)'th entry is πi. Furthermore, we define the weighted average vector

x̄(t) := ( ∑_{i=1}^n πi xi(t) ) 1,

as well as the error vector e(t) := x(t) − x̄(t). Intuitively, e(t) measures how far away the vector x(t) is from consensus. Indeed, it is easy to see that the noiseless update x(t + 1) = Px(t) has the property that x(t) converges to (∑_i πi xi(0)) 1. The quantity e(t) thus measures the difference between the "current state" x(t) and the limit of the noiseless version of Eq. (1) starting from x(t). Our goal is to understand how big the error e(t) is as t goes to infinity. We will measure this by considering the following two linear combinations of squared errors at each node,

δ(t) := ∑_{i=1}^n πi E[ei²(t)],    δuni(t) := (1/n) ∑_{i=1}^n E[ei²(t)],

i.e., we weigh the squared error at each node either proportionally to the stationary distribution of the node or uniformly. Finally, our actual measures of performance will be the asymptotic quantities

δss := lim sup_{t→∞} δ(t),    δuni_ss := lim sup_{t→∞} δuni(t),

which measure the limiting disagreement among the nodes. We will sometimes write δss(P, Σw) when the update matrix P and the noise covariance Σw are not clear from context, and likewise for δuni_ss.

Before stating our main result, let us recall the notion of a hitting time from node i to node j in a Markov chain: this is the expected time until the chain visits j starting from node i. We use HM(i → j) to denote this hitting time in the Markov chain whose probability transition matrix is M. By convention, HM(i → i) = 0 for all i. We will use the notation HM to denote the matrix whose (i, j)'th element is HM(i → j). With the above definitions in place, we are now able to state the main result of this paper.

Theorem 1. If the Markov chain with transition matrix P is reversible, then

δss = π^T H_{P²} Dπ Σw Dπ 1 − Tr(H_{P²} Dπ Σw Dπ).
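As a numerical sanity check of Theorem 1 (our sketch, not part of the paper), one can compare the hitting-time expression against a direct computation of the steady-state error covariance. Since e(t+1) = P e(t) + (I − 1π^T) w(t), the steady-state covariance Σ solves Σ = PΣP^T + QΣwQ^T with Q = I − 1π^T, and δss = ∑_i πi Σii. The chain below (a lazy walk on the 5-node line graph) and the noise covariance are arbitrary illustrative choices.

```python
import numpy as np

def hitting_times(M):
    """H[i, j] = expected number of steps for the chain M to reach j from i."""
    n = M.shape[0]
    H = np.zeros((n, n))
    for j in range(n):
        keep = [i for i in range(n) if i != j]
        # Solve (I - M restricted to states != j) h = 1.
        H[keep, j] = np.linalg.solve(np.eye(n - 1) - M[np.ix_(keep, keep)],
                                     np.ones(n - 1))
    return H

# Lazy equal-neighbor walk on the line graph: reversible, with pi_i ∝ d(i).
n = 5
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
d = A.sum(axis=1)
P = 0.5 * np.eye(n) + 0.5 * A / d[:, None]
pi = d / d.sum()
Dpi = np.diag(pi)
Sw = np.diag([1.0, 0.5, 2.0, 0.5, 1.0])   # an arbitrary noise covariance

# Right-hand side of Theorem 1.
K = hitting_times(P @ P) @ Dpi @ Sw @ Dpi
delta_thm = pi @ K @ np.ones(n) - np.trace(K)

# Direct computation via the Lyapunov fixed point Sigma = P Sigma P^T + C.
Q = np.eye(n) - np.outer(np.ones(n), pi)
C = Q @ Sw @ Q.T
Sigma = np.zeros((n, n))
for _ in range(5000):                      # contraction on the error subspace
    Sigma = P @ Sigma @ P.T + C
delta_lyap = float(pi @ np.diag(Sigma))

print(delta_thm, delta_lyap)               # the two expressions agree
```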

The theorem characterizes δss in terms of combinatorial quantities associated with an underlying Markov chain, namely the stationary distribution and the hitting times. Inspecting the theorem, note that it expresses δss in terms of a difference of two linear combinations of entries of the matrix H_{P²} Dπ Σw Dπ, both with nonnegative coefficients which add up to n. As we will demonstrate later, we can often use this theorem as the basis for "back-of-the-envelope" calculations which result in accurate bounds on δss. Furthermore, this theorem captures the intuition that not all noises are created equal, in the sense that noise at key locations should have a higher contribution to the limiting disagreement. Indeed, in the event that the noises at different nodes are uncorrelated, the second term of Theorem 1 is easily seen to be zero and we obtain

(2)    δss(P, diag(σ1², …, σn²)) = ∑_{i=1}^n ∑_{j=1}^n σi² πi² πj H_{P²}(j → i).

We see that in this case δss is a linear combination of the variances at each node, where the variance σi² is multiplied by πi² ∑_{j=1}^n πj H_{P²}(j → i). Note that this multiplier is a product of a measure of importance coming from the stationary distribution (i.e., πi²) and a measure of the "mean accessibility" of the node (i.e., ∑_{j=1}^n πj H_{P²}(j → i)).

In the event that all noises have the same variance, we obtain the simplified version

(3)    δss(P, σ²I) = σ² ∑_{i=1}^n ∑_{j=1}^n πi² πj H_{P²}(j → i).
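As a quick illustration of the kind of back-of-the-envelope use this formula allows (our sketch, not from the paper), one can evaluate the double sum in Eq. (3) numerically for the lazy equal-neighbor weights of Eq. (6) on line graphs of a few arbitrary sizes; the value grows roughly linearly with n, in line with the calculations worked out for the line graph in Section 4.4.

```python
import numpy as np

def hitting_times(M):
    """H[i, j] = expected number of steps for the chain M to reach j from i."""
    n = M.shape[0]
    H = np.zeros((n, n))
    for j in range(n):
        keep = [i for i in range(n) if i != j]
        H[keep, j] = np.linalg.solve(np.eye(n - 1) - M[np.ix_(keep, keep)],
                                     np.ones(n - 1))
    return H

def delta_ss_unit_noise(n):
    """Eq. (3) with sigma^2 = 1 for the lazy equal-neighbor walk on a line."""
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    d = A.sum(axis=1)
    P = 0.5 * np.eye(n) + 0.5 * A / d[:, None]
    pi = d / d.sum()
    H2 = hitting_times(P @ P)
    return sum(pi[i] ** 2 * pi[j] * H2[j, i]
               for i in range(n) for j in range(n))

vals = [delta_ss_unit_noise(n) for n in (10, 20, 40)]
print(vals)   # grows roughly in proportion to n
```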

As we discuss later in this paper, for many classes of matrices P the quantity ∑_{i=1}^n ∑_{j=1}^n πi² πj H_{P²}(j → i) grows linearly with the total dimension of the system n. In other words, although the system is technically stable in the sense of having bounded expected disagreement as t → ∞, this stability is almost meaningless if n is large. Equations (2) and (3) allow us to determine when this is the case by analyzing how the stationary distribution and hitting times grow on various kinds of graphs¹. Later in the paper (in Section 4) we will use these equations to work out how δss scales for a variety of matrices P which come from graphs.

1.2. Our contributions and the organization of this paper. This paper has two main contributions. The first is to prove Theorem 1 (which unifies a number of earlier results in the literature) and use it to obtain order-optimal estimates for δss for a variety of matrices P naturally associated with graphs. Our second contribution is to demonstrate the utility of Theorem 1 by describing the connection of δss to opinion dynamics, clock synchronization, and formation control, and in particular by analyzing the scalability of certain protocols for formation control and distributed clock synchronization in the presence of noise. Additionally, we discuss corollaries of this theorem which connect δss to notions of graph resistance, the Kemeny constant of a Markov chain, and other graph-theoretic quantities. We also discuss the connection between δss and the related quantity δuni_ss, and show that, as a byproduct of Theorem 1, we can obtain the tightest known combinatorial upper and lower bounds on δuni_ss.

The remainder of this paper is organized as follows. We conclude the introduction with Section 1.3, which discusses the previous literature and places our results in the context of existing work.
The subsequent Section 2 discusses noisy opinion dynamics, distributed clock synchronization in a network, and noisy formation control, and shows that the behavior of the dynamics in these problems can, in many cases, be written as the δss of an appropriately defined matrix. We then turn to Theorem 1, which is proved in Section 3. The following Section 4 uses Theorem 1 to compute order-optimal estimates for δss on a variety of (not necessarily symmetric) matrices coming from graphs and discusses connections between δss and other graph-theoretic quantities. Section 5 collects a number of observations and simplifications that can be made under the assumption that the matrix P is symmetric and then revisits the problems of formation control and clock synchronization, in particular characterizing their performance on many different graphs. Section 6 contains some simulations, and we conclude with some final remarks in Section 7.

1.3. Related work. Theorem 1 can be thought of as a unification and extension of several results in the existing literature. The problem of analyzing the equilibrium level of disagreement in consensus with noise was initiated in [65], where it was shown that for symmetric P with eigenvalues 1 = λ1, λ2, …, λn, we have

δss(P, σ²I) = (σ²/n) ∑_{i=2}^n 1/(1 − λi²).

¹ In particular, we observe that the amount of noise amplification in the network δss(P, σ²I) is not fully characterized by the spectral gap of the underlying graph; see, for example, the table in Section 4.16.
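This eigenvalue identity from [65] is easy to test against the hitting-time expression of Theorem 1 specialized to Σw = σ²I (our sketch, not from the paper; the 8-node cycle with lazy uniform weights is an arbitrary symmetric example).

```python
import numpy as np

def hitting_times(M):
    """H[i, j] = expected number of steps for the chain M to reach j from i."""
    n = M.shape[0]
    H = np.zeros((n, n))
    for j in range(n):
        keep = [i for i in range(n) if i != j]
        H[keep, j] = np.linalg.solve(np.eye(n - 1) - M[np.ix_(keep, keep)],
                                     np.ones(n - 1))
    return H

# Symmetric P: each node of the 8-cycle places weight 1/3 on itself and on
# each of its two neighbors.
n, sigma2 = 8, 1.0
P = np.eye(n) / 3
for i in range(n):
    P[i, (i + 1) % n] = P[i, (i - 1) % n] = 1 / 3

# Eigenvalue expression: (sigma^2 / n) * sum over eigenvalues != 1.
lams = np.sort(np.linalg.eigvalsh(P))[:-1]     # drop the eigenvalue 1
delta_eig = (sigma2 / n) * np.sum(1.0 / (1.0 - lams ** 2))

# Hitting-time expression of Theorem 1 (pi is uniform for symmetric P).
pi = np.full(n, 1 / n)
H2 = hitting_times(P @ P)
delta_hit = sigma2 * sum(pi[i] ** 2 * pi[j] * H2[j, i]
                         for i in range(n) for j in range(n))

print(delta_eig, delta_hit)                    # the two expressions agree
```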

In fact, the sum on the right-hand side equals the so-called Kemeny constant of the Markov chain P², which we will formally define later and which is a certain linear combination of hitting times. The above equality follows from Theorem 1; see Section 5.1 for details.

Several papers have explored the connection between δss(P, σ²I) and the electric resistance of the underlying graph. The recent work [37] proved the inequality

(4)    n² (π³_min / π_max) R_{P²} ≤ δuni_ss(P, σ²I) ≤ n² (π³_max / π_min) R_{P²},

where R_M is a measure of the average resistance associated with the stochastic, reversible matrix M. As [37] notes, in the case when P is symmetric we have πmax = πmin = 1/n and Eq. (4) becomes an equality; thus for symmetric matrices, δss can be expressed in terms of the average resistance. A similar reduction to resistance was remarked on in [47] in continuous time. Appealing to the fact that graph resistances can be written as a sum of hitting times [11], this equality turns out to be another special case of Theorem 1; see Section 5.1 for details. Furthermore, Theorem 1 may be viewed as a generalization of this equality in two ways: from symmetric to reversible chains, and from uncorrelated noise Σw = σ²I to general Σw. We note that the reduction of various applications to consensus problems usually introduces correlation in the noises – see, for example, our discussion of clock synchronization and formation control in Section 2 – meaning that it is of some importance to study the case of general correlation matrices Σw.

We remark that Eq. (4) provides combinatorial upper and lower bounds on δuni_ss whose ratio is (πmax/πmin)⁴, and the latter can be thought of as a measure of the skewness of the distribution of node influence. The problem of obtaining a combinatorial expression for δuni_ss is still open; we will show later (in Section 4.18) that, as a corollary of Theorem 1, it is possible to provide combinatorial upper and lower bounds on δuni_ss whose ratio is only πmax/πmin.

Finally, the main observation that Eq. (1) can have asymptotic disagreement which grows with the size of the system was, to our knowledge, first made in [3] (in continuous time). As observed in [3] in the context of vehicular formation control, this means that any protocol which relies on consensus iterations can suffer from a considerable degradation of performance in large networks.
Furthermore, [3] showed that topology can have a profound influence on performance, by proving that while on the ring graph the asymptotic disagreement grows linearly with the number of nodes, it remains bounded on the 3D torus (and grows only logarithmically on the 2D torus). Theorem 1 allows us to replicate these calculations (in discrete time) and extend them to many different graphs, as well as to understand the separate effects of noise at each node; see the myriad of worked-out examples in Section 4 and in particular the table in Section 4.16.

We next mention some other closely related strands of literature. Our work is related to the recent sequence of papers [66, 67, 48, 18] which considered the effects of noise in a continuous-time version of Eq. (1) over directed graphs (by contrast, our assumption that

P corresponds to a reversible and irreducible Markov chain implies that Pij ≠ 0 if and only if Pji ≠ 0). In [66], explicit expressions for a measure of steady-state disagreement were computed for a number of graphs. The paper [67] investigated steady-state disagreement on trees and derived a partial ordering capturing which trees have smaller steady-state disagreements. The papers [48, 18] studied leader selection problems wherein we must choose leader(s) to track an external signal. It turns out that optimal leader selection is related to a novel measure of information centrality, as explained in [48, 18]. The recent work [34] considered relaxations of such leader selection problems in terms of semidefinite programming. Other related work includes [55], which investigated consensus-like protocols with noise in continuous time, focusing on connections with measures of sparsity such as the number of spanning trees, the number of cut-edges, and the degree sequence. The related paper [56] investigated several measures of robustness related to δss in terms of their convexity. The recent paper [47] characterized steady-state disagreement in a number of fractal graphs. Our earlier work [26] focused on connections between asymptotic disagreement and the Cheeger constant and coefficients of ergodicity of the corresponding Markov chain. Finally, we mention again the paper [65], which began the literature on the subject by formulating the problem of optimizing δss over symmetric matrices with a given graph structure as a convex optimization problem; and the recent paper [62], which considered approximation algorithms for the problem of designing networks that minimize δss.

2. THREE MOTIVATING PROBLEMS

As we have previously remarked, consensus protocols have been central to a number of recent advances in distributed control and signal processing.
In this section, we focus on three motivating scenarios, namely the analysis of opinion dynamics with noise, synchronizing clocks in a network, and the problem of formation maintenance. For each of these problems, we spell out the reduction to an appropriately defined consensus problem and explain how these problems lead to the study of the quantities δss and δuni_ss.

2.1. The noisy DeGroot model of opinion dynamics. The mathematical study of opinion dynamics is an old subject dating back to the classic works of Stone [60], Harary [24], and DeGroot [14], and it has recently experienced a resurgence of interest (e.g., [10, 35, 23, 4, 39, 40, 27]). It is, of course, impossible to accurately model the behavior of human beings, which is the result of a complex interaction of a host of psychological processes. Nevertheless, human societies do appear to exhibit regularities [5], for example in the emergence of common languages or consensus around particular issues, and it is of interest to understand whether these global regularities may be accounted for with simple models of human behavior. Correspondingly, the goal of much of the recent research is to investigate the macroscopic consequences of simple rules for opinion change inspired by the experimental literature on small group dynamics. We next describe a popular model for consensus formation known as the DeGroot model.

Following the classic works of [19, 14, 24], we consider a group of n individuals, each of which has an opinion modeled by the vector xi ∈ R^d. We may think of xi as stacking up the opinions of individual i on d distinct issues. There is an underlying graph G, which we assume to be undirected, and agents repeatedly interact with their neighbors in this graph. As a result, we have a discrete-time dynamic system where the opinions xi(t) of

the agents are updated as

(5)    xi(t + 1) = xi(t) + ∑_{j∈N(i)} pij (xj(t) − xi(t)),

where N(i) is the set of neighbors of node i and the pij are some collection of nonnegative numbers. Intuitively, each agent that interacts with i moves i's opinion in his or her own direction. It is standard to assume ∑_{j∈N(i)} pij < 1, which is equivalent to requiring that every node is influenced by its own opinion in its updates. The DeGroot model is consistent with empirical findings that discussion of opinions in small groups usually results in opinions that lie somewhere between the maximum and minimum opinions of the participants [21], as well as with the sociological analysis of mechanisms which produce opinion change [2].

We remark that the matrix P = [pij] in the DeGroot model is reversible for many common choices of the coefficients pij. For example, one natural choice of weights is

(6)    pij = 1/(2d(i)) for all j ∈ N(i),

which leads to the update rule

xi(t + 1) = (1/2) xi(t) + (1/2) · ( ∑_{j∈N(i)} xj(t) ) / d(i),

corresponding to each agent averaging its own opinion with the mean opinion of its neighbors. More generally, a natural choice of pij comes from adding self-loops to the graph and assigning weights w({i, j}) to each edge representing the strength of the relationship between agents i and j, and finally setting

pij = w({i, j}) / ∑_{j∈N(i)} w({i, j}).
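A quick way to see the reversibility of this weight choice concretely (our sketch, not from the paper) is to check detailed balance numerically: with πi proportional to node i's total incident weight, πi pij = πj pji holds. The weighted 5-node path with self-loops below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
W = np.zeros((n, n))
for i in range(n - 1):                        # path edges with random strengths
    W[i, i + 1] = W[i + 1, i] = rng.uniform(0.5, 2.0)
W += np.diag(rng.uniform(0.5, 2.0, size=n))   # self-loop weights

P = W / W.sum(axis=1)[:, None]                # p_ij = w({i,j}) / sum_j w({i,j})
pi = W.sum(axis=1) / W.sum()                  # stationary distribution

flows = pi[:, None] * P                       # matrix of pi_i * p_ij
print(np.abs(flows - flows.T).max())          # zero up to rounding: detailed balance
```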

In other words, each individual takes a convex combination of the opinions of its neighbors, but with weights depending on the strength of the underlying relationship. It is not hard to see that Eq. (6) is a special case of this, and that the matrix P corresponding to such a choice of coefficients is always reversible². In this form, the model was first proposed by DeGroot in [14]. It stands in contrast to "bounded confidence" models, which model scenarios where each individual only interacts with agents whose opinions are not too different from its own [23, 35, 4]. In the DeGroot model, every individual interacts with its neighbors in the graph regardless of the difference in their opinions. We do not provide a detailed overview of the opinion dynamics literature, which is quite copious and where many variations on these rules have been proposed and studied, but instead refer the reader to the surveys [10, 35].

The DeGroot model described above has the property of resulting in asymptotic agreement among individuals, i.e., we will have ||xi(t) − xj(t)||₂ → 0 as long as the underlying graph G is connected. However, this finding stands in contrast to the widely noted

² Indeed, one can verify that the stationary distribution of node i is proportional to ∑_{j∈N(i)} w({i, j}), from which one can immediately calculate that πi pij = πj pji.

[Figure 1 shows two panels, "Opinion Evolution, DeGroot Model" (left) and "Opinion Evolution, Noisy DeGroot Model" (right), plotting opinion values against the iteration count; see the caption below.]

FIGURE 1. On the left, we show a run of the DeGroot model for six individuals whose initial opinions are random Gaussians in R¹; on the right-hand side, we show the same for the noisy DeGroot model. Both simulations are done on the line graph on six nodes. The noisy model captures the intuition that opinions do not perfectly synchronize as a result of interactions between individuals, but are rather brought within a range of each other.

phenomenon of persistent disagreement, wherein opinions do not fully synchronize. It is therefore natural to consider the noisy DeGroot model, consisting of the update

xi(t + 1) = xi(t) + ∑_{j∈N(i)} pij (xj(t) − xi(t)) + wi(t),

where each wi(t) is an independent random variable. The noisy DeGroot model incorporates the intuition that, while Eq. (5) captures a general feature of opinion evolution, there are a number of essentially random, person-specific factors that influence opinion changes as well. The noisy model does lead to persistent disagreement among individuals; we refer the reader to Figure 1 for a simulation. Rather than coming wholly to consensus, the opinions of all of the individuals will exhibit a collective drift around a common mean.

It is natural to wonder about the distribution of opinions generated by this model. Indeed, a particularly salient quantity is the expected size of the disagreement in the network as t → ∞. Intuitively, we might expect that better connected graphs will have less disagreement, whereas opinions will drift further apart on less connected graphs. More concretely, one might wonder how big the disagreement is for the complete graph (when all individuals talk to each other) versus the star graph (where all individuals talk to one common neighbor) versus other network structures like the line graph or a tree graph. We are thus led to ask how much the individuals are expected to deviate from the average opinion of the group as t → ∞. Mathematically, it is slightly more natural to ask how much the agents deviate on average from the opinion ∑_{i=1}^n πi xi(t), which is the limit of the (noiseless) DeGroot model from starting opinions xi(t). Of course, this is the very definition of δuni_ss.

Similarly, if we seek to place a higher weight on disagreement at important nodes by weighting the squared difference at node i proportionally to the stationary distribution πi, we obtain the definition of δss. Thus all the calculations we perform in this paper discussing how δuni_ss and δss scale on various graphs may be understood in terms of opinion disagreement in the noisy DeGroot model. The interested reader may glance at Section 4 for many examples of such calculations; here we briefly mention a couple of examples to pique the reader's interest. We will show that, with the choice of weights from Eq. (6) and all wi(t) being uncorrelated with identical variances σ², we have that:

• δss and δuni_ss both grow linearly with the size of the group on the line graph (Section 4.4); this means that the noise effectively undermines any attempt at consensus. Note that this fact alone does not tell us exactly how this happens; for example, it may be that each node has expected squared deviation from the opinion ∑_i πi xi(t) which is linear in n, or it might be that just one node has an expected squared deviation which is quadratic in n. However, regardless of which of these possibilities holds, it is difficult to say that consensus is achieved even approximately in this case.

• On any sufficiently dense regular graph, both δss and δuni_ss are bounded independently of the number of agents (Section 4.14), meaning that in this case the average squared deviation has expectation independent of the size of the group. The same goes for dense random graphs (Section 4.13) and k-dimensional grids (Section 4.10) when k ≥ 3.

• On the 2D grid and the complete binary tree, both δss and δuni_ss grow as O(log n) (Sections 4.9 and 4.11). Thus as long as the network does not grow very large, these graphs will lead to approximate consensus; see, for example, Figure 1, where a simulation of the noisy DeGroot model on the line graph on six nodes is shown.

2.2. Decentralized clock synchronization. We consider a network of n nodes, each equipped with a clock which progressively drifts away from the true time. The nodes desire to correct this by repeatedly altering their clock readings as a result of comparisons with their neighbors in some connected, undirected graph G = (V, E). Ongoing research in the area of clock synchronization seeks to characterize the long-term performance of such schemes. A central concern is to understand just how far apart the clocks will drift with such a correction scheme on various networks.

Network clock synchronization is particularly important in signal processing applications, for example when a signal source is localized by comparing the times it was observed by different nodes. Consequently, much attention has been paid to developing and analyzing distributed protocols for it over the past few decades; we refer the reader to the surveys [64, 61, 58]. We consider here the simplest possible model: we model each clock as repeatedly adding a zero-mean random bias term to the true time. Specifically, let us divide time into periods of length ∆ and use ci(k) to denote the clock reading of node i at time k∆. Without any clock synchronization scheme, we will have

(7)    ci(k + 1) = ci(k) + ∆ + zi(k),

where E[zi(k)] = 0, E[zi²(k)] = σi², and all the random variables zi(k) are independent.

We note that this is a substantial simplification of real-world clock dynamics. Indeed, real-world clocks are appropriately modeled as nonlinear oscillators [15]. Furthermore, clocks may possess some bias which causes them to always overestimate or underestimate the true time. Nevertheless, we stick here with Eq. (7) due to its simplicity, and especially since, as we will discuss next, even some basic mathematical questions about this model remain open.

A natural scheme for clock synchronization is for each node to repeatedly move its clock reading towards the readings of its neighbors. This idea was introduced in [33], which referred to it as "synchronous diffusion." Unfortunately, it is difficult for nodes to know the difference ci(k) − cj(k) exactly. Nodes can exchange or broadcast time-stamped messages, but these are subject to unknown propagation or processing delays. We will therefore assume that nodes i and j can cooperate to compute the quantity cj(k) − ci(k) + wij(k), where wij(k) satisfies E[wij(k)] = 0, E[wij²(k)] = λij², and all the wij(k) are uncorrelated with each other. For convenience, let us adopt the convention that λij = 0 for all pairs i, j such that (i, j) ∉ E. Thus, as each node repeatedly moves its clock reading in the (noisy) direction of its neighbors, we obtain the update

(8)    ci(k + 1) = ci(k) + ∑_{j∈N(i)} fij (cj(k) − ci(k) + wij(k)) + ∆ + zi(k),

where N(i) denotes the set of neighbors of i in G and the fij are some positive weights. It is not hard to see that the update of Eq. (8) succeeds in bounding the limiting expected disagreement among clock readings, which would have increased to infinity under Eq. (7), provided that ∑_{j∈N(i)} fij < 1 for all i. We refer the reader to Figure 2 for an illustration showing results from a single run of this equation. As can be seen in the figure, while the clock readings ci(k) perform random walks, the squared deviation from the mean clock reading remains bounded in expectation. We note that we do not model the asynchrony that inevitably results when nodes without access to a common clock execute Eq. (8). Our model is closest in spirit to the recent papers [9, 7, 8, 54, 20], which also sought to model networked clock synchronization in terms of a noisy linear system.

Our goal is to obtain a quantitative analysis of how the performance of this scheme depends on the graph G and the numbers fij. The natural measure of performance is the quantity

ClockDis(G, {fij}) := lim sup_{k→∞} E[ (1/n) ∑_{i=1}^n ( ci(k) − (1/n) ∑_{j=1}^n cj(k) )² ],

representing the average squared disagreement from the mean clock reading. In general, giving a combinatorial formula for ClockDis(G, {fij}) is an open problem. However, in the case when the numbers fij are symmetric (i.e., fij = fji), a solution can be given, as we describe next.
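The qualitative behavior described above can be reproduced with a short simulation of Eq. (8) (our sketch, with assumed parameters): the star graph, the "equal neighbor" weights fij = 1/(1 + d(i)), and the treatment of each wij(k) as an independent draw per ordered pair (i, j) are illustrative choices, the last being one possible reading of the noise model.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, Delta, sigma, lam = 8, 3000, 1.0, 1.0, 0.5

A = np.zeros((n, n))
A[0, 1:] = A[1:, 0] = 1                      # star graph: node 0 is the center
deg = A.sum(axis=1)
F = A / (1.0 + deg)[:, None]                 # f_ij = 1/(1 + d(i)); row sums < 1

c = np.zeros(n)
dev_sq = []
for k in range(T):
    W = lam * rng.normal(size=(n, n)) * A    # measurement noise on each edge
    z = sigma * rng.normal(size=n)           # per-node clock noise
    # Eq. (8): move toward noisy neighbor readings, then advance by Delta.
    c = c + (F * (c[None, :] - c[:, None] + W)).sum(axis=1) + Delta + z
    dev_sq.append(np.mean((c - c.mean()) ** 2))

drift = c.mean() - T * Delta                 # random walk: eventually unbounded
floor = np.mean(dev_sq[-500:])               # disagreement: bounded in expectation
print(drift, floor)
```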

FIGURE 2. Both graphs show the outcome of a single run of Eq. (8) on the star graph with the "equal neighbor" coefficients fij = 1/(1 + d(i)) and σ² = 1, λij² = 1/4 for all (i, j) ∈ E. The left figure shows the average deviation from the true time, i.e., the quantity (1/n) Σ_{i=1}^n ci(k) − kΔ, which performs a random walk and eventually becomes unbounded. The right figure shows the average squared deviation from the mean clock reading, i.e., the quantity (1/n) Σ_{i=1}^n ( ci(k) − (1/n) Σ_{j=1}^n cj(k) )², whose expectation is upper bounded.

Indeed, define di(k) to be the difference between the clock reading at node i at time kΔ and the true time, i.e., di(k) = ci(k) − kΔ. Then,

di(k + 1) = ci(k + 1) − (k + 1)Δ
          = ci(k) + Σ_{j∈N(i)} fij (cj(k) − ci(k) + wij(k)) + Δ + zi(k) − (k + 1)Δ
(9)       = di(k) + Σ_{j∈N(i)} fij (dj(k) − di(k)) + qi(k),

where qi(k) = zi(k) + Σ_{j∈N(i)} fij wij(k).

Let d(k) be the vector which stacks up all the di(k) and q(k) be the random vector which stacks up all the qi(k). Moreover, define Pcl to be the stochastic matrix with Pcl_ij = fij for i ≠ j (and with diagonal entries chosen so that every row sums to one); then Eq. (9) may be rewritten as

d(k + 1) = Pcl d(k) + q(k),

which is clearly a special case of Eq. (1). The symmetry of the weights {fij} implies Pcl is symmetric and consequently πi = 1/n for all i. Furthermore, since for all i we have

ci(k) − (1/n) Σ_{j=1}^n cj(k) = di(k) − (1/n) Σ_{j=1}^n dj(k),

we have just proven the following proposition.

Proposition 2. If fij = fji for all (i, j) ∈ E, then

ClockDis(G, {fij}) = δss(Pcl, Σcl),

where Σcl is defined via [Σcl]ij = E[qi(k)qj(k)].

For the sake of completeness, let us write out Σcl explicitly. Even though the noises zi(k) at each node are uncorrelated, the quantities qi(k) will be correlated, since for neighbors m, l the expressions for both qm(k) and ql(k) include the random variable wml(k). Indeed,

       E[qi(k)] = 0,
       E[qi²(k)] = σi² + Σ_{j∈N(i)} fij² λij²,
(10)   E[qi(k)qj(k)] = fij² λij²   for i ≠ j,

where the last line used our notational convention that λij = 0 for i, j that are not neighbors. In short, the performance ClockDis(G, {fij}) of the clock synchronization scheme is just δss for an appropriately defined matrix. We remind the reader that this equality was derived under the assumption that the weights {fij} are symmetric. We will return to this in Section 5.3, where we combine the above analysis with Theorem 1 to characterize the performance of clock synchronization on a number of different graphs.

2.3. Formation control from offset measurements. Our exposition here closely parallels our earlier works [45, 46]. We consider n nodes which start at arbitrary positions pi(0) ∈ Rd. As in the previous sections, there is a connected graph (V, E), and now the goal of the nodes is to move into a formation which is characterized by certain desired differences along the edges of this graph. Effective formation control is important in low-energy flying because it allows nodes to position themselves within wakes in the wind created by other nodes. Furthermore, in situations where sensing resources are limited, formations allow each individual node to focus its sensor on particular patches of the environment, ensuring full coverage among cooperating nodes. Formally, we associate with each edge (i, j) ∈ E a vector rij ∈ Rd known to both nodes i and j. A collection of points p1, . . . , pn in Rd is said to be "in formation" if for all (i, j) ∈ E we have that pj − pi = rij. In the current section (i.e., in Section 2.3), we will find it convenient to assume that G is a directed graph with the "bidirectionality" property that (i, j) ∈ E implies (j, i) ∈ E; we will do this so that we may refer to (i, j) and (j, i) as distinct edges of the graph.
Note that, given the vectors rij, there may not exist a collection of points in formation; that is, some collections of vectors {rij, (i, j) ∈ E} may be thought of as "inconsistent." For example, unless rij = −rji for all (i, j) ∈ E the collection {rij, (i, j) ∈ E} will clearly be inconsistent. Moreover, since the property of being in formation is defined through differences of position, any translate of a collection of points in formation is itself in formation. We thus consider the following problem: a collection of nodes would like to repeatedly update their positions so that p1(t), . . . , pn(t) approaches some collection of points in formation. We assume that node i knows pj(t) − pi(t) for all of its neighbors j at every

time step t, and furthermore we assume a "first-order" model in which each node can update its position from step to step. A considerable literature has emerged in the past decade spanning many variants of the formation control problem. We make no attempt to survey the vast number of papers that have been published on the topic and refer the interested reader to the surveys [49, 50, 43]. We stress that the problem setup we have just described is only one possible way to approach the formation control problem; a popular and complementary approach is to consider formations defined by distances ||pj − pi||₂ rather than offsets pj − pi (see e.g., [16, 44, 68, 30]). In terms of the existing literature, our problem setup here is closest to some of the models considered in [30, 50, 43, 17, 45].

A natural idea is for the nodes to do gradient descent on the potential function Σ_{(i,j)∈E} ||pi − pj − rij||₂². This leads to the update rule

(11)    pi(t + 1) = pi(t) + Σ_{j∈N(i)} fij (pj(t) − pi(t) − rij),

where {fij} are positive numbers that satisfy the step-size condition Σ_{j∈N(i)} fij < 1 for all i. Note that this update may be implemented in a completely decentralized way as long as node i knows the differences pj(t) − pi(t) and the desired offsets rij. We further remark that no access to a global coordinate system is needed to implement this update, as the above equation allows node i to translate knowledge of the differences pj(t) − pi(t), which can be measured directly, into knowledge of the difference pi(t + 1) − pi(t), which can in turn be used to update the current position. In other words, this update may be executed without node i ever knowing what pi(t) is. It is easy to see that if there exists at least one collection of points in formation, then this control law works in the sense that all pi(t) converge and pj(t) − pi(t) → rij for all (i, j) ∈ E (considerably stronger statements were proved in [17, 49]). Indeed, let us sketch the proof of this simple claim now. If p̄1, . . . , p̄n is any collection of points in formation, then, defining ui(t) := pi(t) − p̄i, we have that the ui(t) follow the update

(12)    ui(t + 1) = ui(t) + Σ_{j∈N(i)} fij (uj(t) − ui(t)).
Let Pform be the stochastic matrix which satisfies Pform_ij = fij and let u^j(t) be the vector which stacks up the j'th entries of the vectors u1(t), . . . , un(t). We thus have u^j(t + 1) = Pform u^j(t) for all j = 1, . . . , d, and it is now immediate that all ui(t) approach the same vector. This implies that all pi(t) approach positions in formation. We now turn to the case where the formation control update of Eq. (11) is executed with noise; as we will see, under appropriate assumptions the performance of the (noisy) formation control protocol can be written as the δss of a certain matrix. Specifically, we

FIGURE 3. The offsets r12 = [1, 1], r23 = [−1, 1], r34 = [−1, −1], r14 = [1, −1] shown on the left side of the figure define a "ring formation" with 4 nodes. On the right, we show the result of simulating Eq. (13) on this graph with all the weights fij equal to 1/9, starting from random positions. We see that the nodes begin by moving close to the formation and spend the remainder of the time doing essentially a random walk in a neighborhood of the formation.

will consider the update

(13)    pi(t + 1) = pi(t) + Σ_{j∈N(i)} fij (pj(t) − pi(t) − rij) + ni(t).

The random vector ni(t) can arise if each node executes the motion that updates its position pi(t) imprecisely. Although our methods are capable of handling quite general assumptions on the noise vectors ni(t), for simplicity let us assume that E[ni(t)] = 0, E[ni(t)ni(t)^T] = λi² I for all i, t, and that ni(t1) and nj(t2) are independent whenever t1 ≠ t2 or i ≠ j. Of course, once noise is added, convergence to a translate of the formation will not be possible; rather, we will be measuring performance by looking at the asymptotic distance to the closest collection of points in formation. For an illustration, we refer the reader to Figure 3, which shows a single run of Eq. (13) with four nodes. As can be read off from the figure, the nodes will move "towards the formation" when they are far away from it, but when they are close the noise terms ni(t) effectively preclude the nodes from moving closer and the nodes end up performing random motions in a neighborhood of the formation.

We next formally define the way we will measure the performance of the formation control protocol. Let p̂1(t), . . . , p̂n(t) be a collection of points in formation whose centroid is the same as the centroid of p1(t), . . . , pn(t), i.e.,

(1/n) Σ_{i=1}^n pi(t) = (1/n) Σ_{i=1}^n p̂i(t).

It is easy to see that, as long as there exists a single collection of points in formation, such p̂1(t), . . . , p̂n(t) always exist, and in fact p̂1(t), . . . , p̂n(t) is the closest collection of points in formation to p1(t), . . . , pn(t). Therefore, we will measure the performance of the formation control scheme via the quantity

Form(G, {fij}) := lim sup_{t→∞} (1/n) Σ_{i=1}^n E[ ||pi(t) − p̂i(t)||² ].
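The quantity Form(G, {fij}) can also be estimated by direct simulation. The sketch below is illustrative only: it hard-codes the 4-node ring formation of Figure 3 via a diamond-shaped target configuration consistent with the offsets shown there, uses equal weights fij = 1/9, and picks a noise level λ purely for the illustration. It runs the noisy update of Eq. (13) and averages the squared distance to the centroid-matched in-formation configuration:

```python
import numpy as np

def estimate_form(steps=4000, f=1/9, lam=0.1, seed=0):
    """Estimate Form(G, {f_ij}) by simulating Eq. (13) on the 4-node
    ring formation of Figure 3. All weights equal f; the noise n_i(t)
    is i.i.d. N(0, lam^2 I) at every node and time."""
    rng = np.random.default_rng(seed)
    # One collection of points in formation (any translate also qualifies):
    # a diamond with p2 - p1 = [1,1], p3 - p2 = [-1,1], p4 - p3 = [-1,-1].
    target = np.array([[0., 0.], [1., 1.], [0., 2.], [-1., 1.]])
    n = 4
    nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # ring graph
    p = rng.normal(0.0, 3.0, size=(n, 2))    # random initial positions
    errs = []
    for t in range(steps):
        new_p = p.copy()
        for i in range(n):
            for j in nbrs[i]:
                # desired offset r_ij = target[j] - target[i]
                new_p[i] += f * (p[j] - p[i] - (target[j] - target[i]))
        new_p += rng.normal(0.0, lam, size=(n, 2))       # noise n_i(t)
        p = new_p
        if t >= steps // 2:
            # closest in-formation collection shares the centroid of p
            phat = target - target.mean(axis=0) + p.mean(axis=0)
            errs.append(np.mean(np.sum((p - phat) ** 2, axis=1)))
    return float(np.mean(errs))
```

With the noise switched off (lam = 0) the estimate collapses to zero, reflecting the exact convergence of Eq. (11); with noise it hovers at a small positive level, as in Figure 3.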

In general, obtaining a combinatorial expression for Form(G, {fij}) is an open problem. The next proposition describes a solution once again under the additional condition that the weights {fij} are symmetric, i.e., fij = fji.

Proposition 3. Let Q be the matrix defined by Qij = λi² + λj². If there exists at least one collection of points in formation and fij = fji for all (i, j) ∈ E, then

Form(G, {fij}) = d · δss( Pform, (1/n)( n Diag(λ1², . . . , λn²) − Q + ( (Σ_{l=1}^n λl²) / n ) 11^T ) ).

Proof. We proceed by changing variables to

ûi(t) = pi(t) − p̂i(t).

Observe that by definition

(14)    (1/n) Σ_{i=1}^n ûi(t) = 0.

Naturally, we also have that

(15)    Form(G, {fij}) = lim sup_{t→∞} (1/n) Σ_{i=1}^n E[ ||ûi(t)||² ].

We now observe that the symmetry of the weights {fij}, as well as the fact that rij = −rji, imply that

(1/n) Σ_{j=1}^n pj(t + 1) = (1/n) Σ_{j=1}^n pj(t) + (1/n) Σ_{j=1}^n nj(t),

which allows us to conclude

p̂i(t + 1) = p̂i(t) + (1/n) Σ_{j=1}^n nj(t).

In turn, this implies that the quantities ûi(t) are updated as

(16)    ûi(t + 1) = ûi(t) + Σ_{j∈N(i)} fij (ûj(t) − ûi(t)) + ni(t) − (1/n) Σ_{j=1}^n nj(t).

Inspecting now Eq. (14), Eq. (15), and Eq. (16), it is almost immediate that we can recast Form(G, {fij}) as the δss of an appropriately defined matrix. Indeed, for each j = 1, . . . , d,

define û^j(t) to stack up the j'th components of the vectors û1(t), . . . , ûn(t). We then have that Eq. (15) implies

(17)    Form(G, {fij}) = lim sup_{t→∞} Σ_{j=1}^d (1/n) E[ ||û^j(t)||₂² ],

while Eq. (14) implies

(18)    (1/n) Σ_{i=1}^n û^j_i(t) = 0,

and finally Eq. (16) implies

(19)    û^j(t + 1) = Pform û^j(t) + q^j(t),

where the noise vector q^j(t) satisfies

E[q^j(t)] = 0,
E[q^j_k(t) q^j_m(t)] = −(λk² + λm²)/n + (Σ_{l=1}^n λl²)/n²   for all k ≠ m,
E[(q^j_k)²(t)] = λk² − 2λk²/n + (Σ_{l=1}^n λl²)/n²   for all k.

We may summarize these last three equations as

(20)    E[ q^j(t) (q^j(t))^T ] = (1/n)( n Diag(λ1², . . . , λn²) − Q + ( (Σ_{l=1}^n λl²) / n ) 11^T ).

Equations (19), (18), (17), (20) now immediately imply the proposition. □
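The covariance identity in Eq. (20) is a purely algebraic fact about centering independent noises, and can be checked numerically. In the sketch below (an illustrative verification, not part of the paper), the exact covariance of q = (I − 11^T/n) w is computed as (I − 11^T/n) Diag(λ1², …, λn²) (I − 11^T/n)^T and compared entrywise with the right-hand side of Eq. (20):

```python
import numpy as np

def eq20_covariance_check(lams2, tol=1e-12):
    """Check Eq. (20): if q = (I - (1/n) 1 1^T) w with w having
    independent zero-mean entries of variances lams2, then
    E[q q^T] = (1/n)(n Diag(lams2) - Q + (sum(lams2)/n) 1 1^T),
    where Q_ij = lams2[i] + lams2[j]."""
    lams2 = np.asarray(lams2, dtype=float)
    n = len(lams2)
    J = np.ones((n, n)) / n
    P = np.eye(n) - J                        # centering projection
    exact = P @ np.diag(lams2) @ P.T         # Cov(q) = P Cov(w) P^T
    Q = lams2[:, None] + lams2[None, :]
    claimed = (n * np.diag(lams2) - Q
               + (lams2.sum() / n) * np.ones((n, n))) / n
    return bool(np.max(np.abs(exact - claimed)) < tol)
```

The check passes for any choice of variances, which is exactly the content of the entrywise computation preceding Eq. (20).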



Summarizing, Proposition 3 characterizes the performance of a formation control protocol in terms of the δss of an appropriately defined matrix. In Section 5 we will combine this proposition with Theorem 1 in order to obtain combinatorial characterizations of the performance of formation control protocols in a number of common graphs.

3. P ROOF OF T HEOREM 1 We begin our proof of Theorem 1 with a series of preliminary lemmas. The matrix J defined as J := 1πT , will be of central importance to the proof. The following lemma collects a number of its useful properties. 17

Lemma 4. The following identities hold:

J1 = 1,   JP = J,   PJ = J,   J² = J,   (I − J)² = I − J,   (P^l − J)^k = P^{lk} − J,   ρ(P − J) < 1.
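The identities involving J = 1π^T can be sanity-checked numerically for any concrete chain. The helper below is an illustrative check (the example matrices in the test are ours, not the paper's): it computes π as the leading left eigenvector of P and verifies the basic relations:

```python
import numpy as np

def check_J_identities(P, tol=1e-10):
    """For J := 1 pi^T, with pi the stationary distribution of an
    irreducible stochastic matrix P, verify: J1 = 1, JP = PJ = J,
    J^2 = J, (I - J)^2 = I - J, and rho(P - J) < 1."""
    n = P.shape[0]
    # stationary distribution: left eigenvector of P for eigenvalue 1
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()                 # normalize (fixes sign as well)
    one = np.ones(n)
    J = np.outer(one, pi)
    I = np.eye(n)
    return bool(np.allclose(J @ one, one, atol=tol)
                and np.allclose(J @ P, J, atol=tol)
                and np.allclose(P @ J, J, atol=tol)
                and np.allclose(J @ J, J, atol=tol)
                and np.allclose((I - J) @ (I - J), I - J, atol=tol)
                and np.max(np.abs(np.linalg.eigvals(P - J))) < 1.0)
```

For instance, the lazy random walk on a triangle (diagonal 1/2, off-diagonal 1/4) passes the check, as does any irreducible aperiodic stochastic matrix.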
1, j > 1, i ≠ j. Consequently,

δss ≃ σ1² Σ_{j>1} (1/n) · 1 + Σ_{i=2}^n σi² (1/n²) ( 1 · n + Σ_{j=2,...,n, j≠i} (1/n) · n ) ≃ σ1² + (σ2² + · · · + σn²)/n.
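These asymptotics can be probed numerically without invoking Theorem 1, by solving the steady-state Lyapunov equation for the disagreement directly. The sketch below is illustrative and rests on two assumptions not spelled out in this excerpt: that δss is the π-weighted steady-state disagreement of the noisy recursion x(t+1) = Px(t) + w(t), and that the relevant chain on the star is the lazy random walk (whose stationary distribution is proportional to degree). Under these assumptions, noise at the center contributes an order of magnitude (in n) more than noise at a single leaf:

```python
import numpy as np

def star_dss(n, sigma2, lazy=0.5):
    """pi-weighted steady-state disagreement of x(t+1) = P x(t) + w(t)
    on a star graph (node 0 = center), with P the lazy random walk and
    w having independent entries of variances sigma2. The covariance of
    y = x - (pi^T x) 1 solves S = (P - J) S (P - J)^T + (I - J) C (I - J)^T."""
    P = np.zeros((n, n))
    P[0, 0] = lazy
    P[0, 1:] = (1.0 - lazy) / (n - 1)
    for j in range(1, n):
        P[j, j] = lazy
        P[j, 0] = 1.0 - lazy
    deg = np.array([n - 1.0] + [1.0] * (n - 1))
    pi = deg / deg.sum()               # stationary: proportional to degree
    J = np.outer(np.ones(n), pi)
    A = P - J                          # disagreement dynamics matrix
    M = np.eye(n) - J
    B = M @ np.diag(np.asarray(sigma2, dtype=float)) @ M.T
    S = np.zeros((n, n))
    for _ in range(500):               # fixed-point iteration: rho(A) < 1
        S = A @ S @ A.T + B
    return float(np.sum(pi * np.diag(S)))
```

For n = 16, unit noise at the center yields a disagreement several times larger than unit noise at a single leaf, and the total disagreement with unit noise everywhere stays bounded as n grows, consistent with the Θ(1) estimate above.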

As might be expected, noise at the center vertex contributes an order-of-magnitude more to δss than noise at a leaf vertex with the same variance.

4.6. The two-star graph. Consider two stars joined by a link connecting their centers. It is not hard to see that all hitting times are Θ(n), with the exception of hitting times from a leaf to its own center, which are Θ(1) as before. Adopting the conventions of having node 1 and node n denote the two centers, we have that

π1 = πn ≃ 1,    πk ≃ 1/n for all k ≠ 1, n.

Thus

δss ≃ (σ1² + σn²) ( 1 · n + n · (1/n) · 1 + n · (1/n) · n ) + Σ_{i=2}^{n-1} σi² (1/n²) Σ_{j≠i} n πj
    ≃ n(σ1² + σn²) + (σ2² + · · · + σ_{n−1}²)/n.

It is interesting to compare our results for the star graph with our results for the two-star graph. While on the star graph, noise at the center vertex contributes Θ(n) times more to the limiting disagreement than noise at a leaf vertex, on the two-star graph the corresponding factor is Θ(n²). Furthermore, if all σi² are positive and bounded away from zero independently of n, the disagreement on the two-star graph is Θ(n) while

disagreement on the star graph is Θ(1). One implication of these comparisons is that the diameter of the graph (which is constant for both the star and the two-star graph) does not determine the order of magnitude of δss.

4.7. The lollipop graph. The lollipop graph consists of a line graph on n/2 vertices joined to a complete graph on n/2 vertices (we assume here that n is even). Adopting the notation that the line graph has vertices {1, . . . , n/2} and that the complete graph has vertices {n/2, . . . , n}, we use the formula πi = d(i)/m to conclude that

πi ≃ 1/n², i = 1, . . . , n/2 − 1,    πi ≃ 1/n, i = n/2, . . . , n.

Let us assume that all variances are the same, i.e., σi² = σ². An analysis of resistances then immediately gives the following estimates for the commute times.
• If i, j ≥ n/2, then HP̃(i → j) + HP̃(j → i) ≃ n.
• If i ≥ n/2, j < n/2, then HP̃(i → j) + HP̃(j → i) ≃ n² (n/2 − j).
• If i < n/2, j < n/2, then HP̃(i → j) + HP̃(j → i) ≃ |j − i| n².
Appealing to Theorem 1 and breaking up the sum in Theorem 1 into four pieces, we obtain

δss = σ² ( Σ_{i≥n/2, j≥n/2} (1/n³) O(n) + Σ_{i≥n/2, j<