Randomly coloring constant degree graphs

Martin Dyer∗



Alan Frieze†

Thomas P. Hayes‡

Eric Vigoda§

March 5, 2009

Abstract

We study a simple Markov chain, known as the Glauber dynamics, for generating a random k-coloring of an n-vertex graph with maximum degree ∆. We prove that, for every ε > 0, the dynamics converges to a random coloring within O(n log n) steps assuming k ≥ k0(ε) and either: (i) k/∆ > α∗ + ε, where α∗ ≈ 1.763, and the girth g ≥ 5, or (ii) k/∆ > β∗ + ε, where β∗ ≈ 1.489, and the girth g ≥ 7. Previous results on this problem applied when k = Ω(log n), or when k > 11∆/6 for general graphs.

1 Introduction

Markov Chain Monte Carlo (MCMC) is an important tool for sampling from complex distributions. It has been successfully applied in several areas of Computer Science, most notably computing the volume of a convex body [5], [13], [14] and estimating the permanent of a non-negative matrix [11]. One particular problem that has attracted significant interest is that of generating a (nearly) random proper k-coloring of a graph G = (V, E). This is a well-studied problem in Combinatorics [2] and Statistical Physics [16]. Jerrum [10] proved that a simple, popular Markov chain, known as the Glauber dynamics, converges to a (near) random k-coloring after O(n log n) steps, provided k/∆ > 2. Here n = |V| and ∆ is the maximum degree of G. This leads to the challenging problem of determining the smallest value of k/∆ for which a random k-coloring can be generated in time polynomial in n.

∗ School of Computer Studies, University of Leeds, Leeds LS2 9JT, UK. Supported by EC Working Group RANDAPX.
† Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh PA 15213, USA. Supported in part by NSF grant CCR-0200945.
‡ Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA.
§ School of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA. Supported in part by NSF grant CCR-0237834.
A preliminary version of this paper appeared in Proceedings of the 45th Annual Symposium on Foundations of Computer Science (FOCS), 582-589, 2004.

Vigoda [18] gave the first significant improvement over Jerrum's result, reducing the lower bound on k/∆ to 11/6 by analyzing a different Markov chain. There has been no success in extending Vigoda's approach to smaller values of k/∆, and it remains the best bound for general graphs. Dyer and Frieze [4] introduced an approach, known as the burn-in method, which improved the lower bound on k/∆ for the class of graphs with large maximum degree and large girth. It is within this context that this paper is written. We will prove that the Glauber dynamics is efficient for a much wider range of girth and maximum degree than was previously known.

The task in a theoretical analysis of MCMC algorithms is to show that a given Markov chain converges rapidly to its steady state. The time to get "close" in variation distance is called the mixing time. One of the most useful tools for doing this is coupling. We take two copies $(X_t, Y_t)$ of a Markov chain M and then bound the variation distance $d_t$ between the t-step distribution and the steady-state distribution via the coupling inequality

$$d_t \le \Pr(X_t \ne Y_t). \tag{1}$$

We are free to choose our coupling, and we endeavour to minimise the RHS of (1). Often we define a distance function dist between states such that $X_t \ne Y_t$ implies $\mathrm{dist}(X_t, Y_t) \ge 1$, and then try to prove that our coupling satisfies

$$\mathbb{E}\big(\mathrm{dist}(X_{t+1}, Y_{t+1}) \mid X_t, Y_t\big) \le \alpha\, \mathrm{dist}(X_t, Y_t) \tag{2}$$

for some α < 1. One must consider all possible $X_t, Y_t$, and so it would seem that we have to take a worst-case pair here. We should point out that path coupling [3] does ameliorate this, in that it allows us to consider only the case where $\mathrm{dist}(X_t, Y_t) = 1$. In the burn-in method, we allow the chains to run uncoupled for a sufficient amount of time (the burn-in period) so that only typical pairs of states need be considered. Using this idea, Dyer and Frieze reduced the bound to k/∆ ≥ α for any α > α∗, where α∗ ≈ 1.763 is the root of $\alpha = e^{1/\alpha}$. They required lower bounds on the maximum degree, ∆ = Ω(log n), and on the girth, g = Ω(log ∆). Under these assumptions, Dyer and Frieze proved that after the burn-in period, the colorings $X_t$ and $Y_t$ satisfy certain properties in the local neighborhood of every vertex, so-called local uniformity properties. Assuming these local uniformity properties, they were able to avoid the worst-case pair in (2).

With the same restrictions on the maximum degree and girth, Molloy [15] improved the lower bound to k/∆ ≥ β for any β > β∗, where β∗ ≈ 1.489 is the root of $(1 - e^{-1/\beta})^2 + \beta e^{-1/\beta} = 1$. The girth assumptions were the first to be (nearly) removed. Hayes [6, 7] reduced the girth requirements to g ≥ 5 for k/∆ > α∗ and g ≥ 7 for k/∆ > β∗. Subsequently, Hayes and Vigoda [8] (using a non-Markovian coupling) reduced the lower bound on k/∆ to 1 + ε for all ε > 0, which is nearly optimal. Their result requires girth g > 10.

The large maximum degree restriction remained a serious bottleneck for extending the burn-in approach to general graphs. The assumption ∆ = Ω(log n) is required in all of the improvements so far that rely on the burn-in approach. We significantly improve the maximum degree assumption, only requiring ∆ to be a sufficiently large constant, independent of n. When ∆ is constant, in a typical coloring a constant fraction of the vertices do not satisfy the desired local uniformity properties. This is the main obstacle our proof overcomes.

Before formally stating our theorem we will define the Glauber dynamics. All of the aforementioned results (except Vigoda's [18]) analyze the Glauber dynamics, which is a Markov chain for generating a random k-coloring. Let K denote the set of proper k-colorings of G. For technical purposes, the state space of the Glauber dynamics is $\Omega = [k]^V \supseteq K$, where [k] = {1, 2, . . . , k}. From a coloring $Z_t \in \Omega$, the evolution $Z_t \to Z_{t+1}$ is defined as follows:

(a) Choose v = v(t) uniformly at random from V.
(b) Choose color c = c(t) uniformly at random from the set of colors $[k] \setminus Z_t(N(v))$ available to v. The set N(v) denotes the neighbors of vertex v.
(c) Define $Z_{t+1}$ by
$$Z_{t+1}(u) = \begin{cases} Z_t(u) & u \ne v,\\ c & u = v. \end{cases}$$

It is straightforward to verify that the stationary distribution is uniform over the set K.
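The transition rule (a) to (c) can be illustrated with a short Python sketch. This is a minimal illustration under our own assumptions, not the authors' code: vertices are numbered 0 to n − 1, the graph is given as adjacency lists, the function name `glauber_step` is hypothetical, and k is assumed to exceed the maximum degree so that some color is always available.

```python
import random

def glauber_step(coloring, adj, k, rng=random):
    """Return Z_{t+1} given Z_t = coloring (a list of colors in 1..k).

    adj[u] lists the neighbors of vertex u. Assumes k > max degree, so at
    least one color is always available. The input list is not modified.
    """
    n = len(coloring)
    v = rng.randrange(n)                          # (a) uniform random vertex
    used = {coloring[w] for w in adj[v]}          # Z_t(N(v))
    available = [c for c in range(1, k + 1) if c not in used]
    c = rng.choice(available)                     # (b) uniform available color
    nxt = list(coloring)                          # (c) every other vertex keeps its color
    nxt[v] = c
    return nxt

if __name__ == "__main__":
    rng = random.Random(0)
    adj = [[(u - 1) % 5, (u + 1) % 5] for u in range(5)]   # 5-cycle, max degree 2
    z = [1, 2, 1, 2, 3]                                     # a proper 4-coloring
    for _ in range(1000):
        z = glauber_step(z, adj, 4, rng)
    print(z)
```

Starting from a proper coloring, every step keeps the coloring proper, since the new color avoids all colors on the neighbors of the updated vertex.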
For δ > 0, the mixing time $T_{\mathrm{mix}}(\delta)$ is the number of transitions until the dynamics is within variation distance at most δ of the stationary distribution, assuming the worst initial coloring $Z_0$.

We prove the following theorem.

Theorem 1. Let α∗ ≈ 1.763 and β∗ ≈ 1.489 be the constants defined earlier. For all ε > 0, there exist $k_0, C > 0$ such that for every graph G on n vertices with maximum degree ∆ and girth g, if either
(a) $k \ge \max\{(1+\varepsilon)\alpha^*\Delta, k_0\}$ and g ≥ 5, or
(b) $k \ge \max\{(1+\varepsilon)\beta^*\Delta, k_0\}$ and g ≥ 7,
then for all δ > 0, the mixing time of the Glauber dynamics on k-colorings of G satisfies $T_{\mathrm{mix}}(\delta) \le Cn\log(n/\delta)$.

Using now-classical results of Jerrum et al. [12], the above rapid mixing results imply a fully polynomial randomized approximation scheme (FPRAS) for counting k-colorings under the same conditions. Recent work of Stefankovic et al. [17] designs such an approximate counting algorithm with running time $O^*(n^2)$.

The heart of our proof analyzes a simple coupling over $T_m = \Theta(n)$ steps for an arbitrary pair of colorings which initially differ at a single vertex $v_0$. We prove that the expected Hamming distance after $T_m$ steps is at most 3/4. We do this by breaking the analysis into two scenarios. In the advantageous scenario, during the entire $T_m$ steps the Hamming distance stays small and all disagreements stay close to $v_0$. If both of these events occur, then after an initial burn-in period of $T_b < T_m$ steps, every updated vertex near $v_0$ will have certain local uniformity properties (the same properties used by [4, 15, 6]). It is then straightforward to prove that the Hamming distance decreases in expectation over the final $T_m - T_b$ steps. In the disadvantageous scenario, where one of the events fails, we use a crude upper bound on the Hamming distance.
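As a quick numerical check of the constants in Theorem 1, α∗ and β∗ can be recovered by bisection from their defining equations $\alpha = e^{1/\alpha}$ and $(1 - e^{-1/\beta})^2 + \beta e^{-1/\beta} = 1$. The sketch below is our own illustration: the bracket [1, 3] and the helper name `bisect` are assumptions, and bisection only needs a sign change at the endpoints, which holds for both functions on that bracket.

```python
import math

def bisect(f, lo, hi, iters=100):
    """Locate a root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):   # same sign as f(lo): root lies in [mid, hi]
            lo, f_lo = mid, f_mid
        else:                           # sign change in [lo, mid]
            hi = mid
    return (lo + hi) / 2

# alpha* solves alpha = e^{1/alpha}
alpha_star = bisect(lambda a: a - math.exp(1 / a), 1.0, 3.0)

# beta* solves (1 - e^{-1/beta})^2 + beta * e^{-1/beta} = 1
beta_star = bisect(lambda b: (1 - math.exp(-1 / b)) ** 2 + b * math.exp(-1 / b) - 1, 1.0, 3.0)

print(f"alpha* = {alpha_star:.4f}, beta* = {beta_star:.4f}")
```

The printed values agree with the approximations α∗ ≈ 1.763 and β∗ ≈ 1.489 quoted above.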

2 Preliminaries

For $X_t, Y_t \in \Omega$, denote their difference by $D_t = \{v : X_t(v) \ne Y_t(v)\}$, and denote their Hamming distance by $H_t = |D_t|$. For vertex v, let d(v) denote its degree and N(v) denote its neighborhood. For an event A, we use the notation 1(A) to refer to the {0, 1}-valued indicator variable for the event A, i.e.,

$$\mathbf{1}(A) = \begin{cases} 1 & \text{if } A,\\ 0 & \text{if } \bar{A}. \end{cases}$$

To prove Theorem 1, we will use path coupling [3] for $T = Cn\log(n/\delta)$ steps of the Glauber dynamics. Specifically, for all $X_0, Y_0 \in \Omega$ with $H_0 = 1$, we will define a T-step coupling such that

$$\mathbb{E}\big(H(X_T, Y_T)\big) \le \frac{\delta}{n}. \tag{3}$$

Then, for any $X_0, Y_0 \in \Omega$, since the maximum possible Hamming distance is n, it follows by path coupling that $\Pr(X_T \ne Y_T) \le \mathbb{E}(H(X_T, Y_T)) \le \delta$, after which Theorem 1 follows by the coupling inequality (1).

3 Proof of Rapid Mixing

3.1 Local Uniformity Properties

A key element in our proof is that for a "nice" initial coloring, after O(n) steps of the Glauber dynamics, a vertex will have certain local uniformity properties with high probability. We make the following definition.

Definition 2. We say that a coloring X is C-heavy for color c at a vertex v if at least C∆ vertices within distance 2 of v receive color c under X, or at least C∆/log ∆ neighbors of v receive color c under X.

To be considered "nice" at a vertex v, a coloring should not be heavy for any colors at any vertices too close to v. We formalize this notion as follows.

Definition 3. Let X, Y be colorings, let ρ > 0, and let v be a vertex such that $X(v) \ne Y(v)$. We say v is ρ-suspect for radius R if there exist a vertex w within distance R of v and a color c such that either X or Y is ρ-heavy at w for c. Otherwise, we say that v is ρ-above suspicion for radius R.

We next make an easy but crucial observation about the above definitions. For a pair of colorings X and Y, let $X = Z_0 \sim Z_1 \sim Z_2 \sim \cdots \sim Z_\ell \sim Y = Z_{\ell+1}$ denote a shortest path between X and Y along pairs of colorings that differ at a single vertex, i.e., $H(Z_i, Z_{i+1}) = 1$ for all $0 \le i \le \ell$. This is the path used for the purposes of the path coupling approach. We refer to the colorings $Z_1, \ldots, Z_\ell$ as interpolated colorings for X and Y. A key aspect of the above definitions is that "niceness" is automatically inherited by interpolated colorings, as we now formally state.

Observation 4. If X and Y are colorings, neither of which is C-heavy for color c at vertex v, then no interpolated coloring is 2C-heavy for c at v. Likewise, if X and Y are ρ-above suspicion for radius R at v, then every interpolated coloring is 2ρ-above suspicion for radius R at v. (Every interpolated coloring agrees with X or with Y at each vertex, so the number of vertices of any given color in any set is at most the sum of the corresponding counts under X and Y.)
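Definition 2 and Observation 4 can be made concrete with a small sketch. The helpers `ball`, `is_heavy` and `interpolate` are hypothetical names of our own; we assume vertices 0 to n − 1, colorings stored as lists, "within distance 2 of v" including v itself, and ∆ ≥ 2 so that log ∆ > 0. The demo exercises exactly the reasoning behind Observation 4: under an interpolated coloring, any color count is at most the sum of the counts under X and Y.

```python
import math
import random
from collections import deque

def ball(adj, v, radius):
    """Set of vertices at graph distance <= radius from v (including v), via BFS."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def is_heavy(coloring, adj, v, c, C, delta):
    """Definition 2: is the coloring C-heavy for color c at v? (assumes delta >= 2)"""
    near = sum(1 for u in ball(adj, v, 2) if coloring[u] == c)
    nbrs = sum(1 for u in adj[v] if coloring[u] == c)
    return near >= C * delta or nbrs >= C * delta / math.log(delta)

def interpolate(X, Y):
    """Shortest path X = Z_0 ~ Z_1 ~ ... ~ Z_{l+1} = Y, fixing one disagreement per step."""
    diff = [u for u in range(len(X)) if X[u] != Y[u]]
    path = [list(X)]
    for u in diff:
        z = list(path[-1])
        z[u] = Y[u]
        path.append(z)
    return path

if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 30, 6
    nbr_sets = [set() for _ in range(n)]
    for u in range(n):                      # a cycle plus random chords
        for w in ((u + 1) % n, rng.randrange(n)):
            if w != u:
                nbr_sets[u].add(w)
                nbr_sets[w].add(u)
    adj = [sorted(s) for s in nbr_sets]
    delta = max(len(a) for a in adj)
    X = [rng.randrange(1, k + 1) for _ in range(n)]
    Y = [rng.randrange(1, k + 1) for _ in range(n)]
    path = interpolate(X, Y)
    violations = 0
    for v in range(n):
        for c in range(1, k + 1):
            if not is_heavy(X, adj, v, c, 1.0, delta) and not is_heavy(Y, adj, v, c, 1.0, delta):
                violations += sum(is_heavy(Z, adj, v, c, 2.0, delta) for Z in path)
    print("Observation 4 violations:", violations)
```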

The first basic local uniformity result says that from any initial coloring $X_0$, for any vertex v, after O(n log ∆) steps of the Glauber dynamics, the coloring is not (2e)-heavy at v with high probability¹.

Lemma 5 (Corollary 32 of Hayes [7]). There exist constants² $C_5 > 0$ and $\Delta_0$ such that for any graph G = (V, E) on n vertices with maximum degree $\Delta > \Delta_0$ and girth g ≥ 5, and for any k > 1.45∆, the following holds. For any $X_0$, any v ∈ V, any color c, and any $T \ge 3n\log\Delta$,

$$\Pr\big(\exists t \in [3n\log\Delta, T] : X_t \text{ is } (2e)\text{-heavy for } c \text{ at } v\big) \le \frac{T}{n}\exp(-\Delta/C_5).$$

Moreover, if the initial coloring is not (4e)-heavy at vertex v, this property is maintained, and even improves slightly after O(n) steps, with high probability.

Lemma 6 (Corollary 32 of Hayes [7]). There exist $C_6, C_6' > 0$ and $\Delta_0$ such that for any graph G = (V, E) on n vertices with maximum degree $\Delta > \Delta_0$ and girth g ≥ 5, and for any k > 1.45∆, the following holds. For all v ∈ V, any color c, any $X_0$ which is not (4e)-heavy for c at v, and any $T \ge C_6 n$,

$$\Pr\big(\exists t \in [C_6 n, T] : X_t \text{ is } (2e)\text{-heavy for } c \text{ at } v\big) \le \frac{T}{n}\exp(-\Delta/C_6').$$

Finally, we describe the main burn-in result. For an initial coloring $X_0$ which is not (4e)-heavy at a vertex v, after O(n) steps of the Glauber dynamics certain local uniformity properties hold for v with high probability. In particular, v has close to the number of available colors it would be expected to have if its neighbors were colored independently, and close to the expected number of neighbors that are unblocked for a pair of colors c and c′. We need the following definitions. For a coloring $X_t$ and vertex v, let $A(X_t, v) = [k] \setminus X_t(N(v))$ denote the set of available colors for v in $X_t$. For colors $c_1 \ne c_2$, w ∈ V, v ∈ N(w), and coloring $X_t$, let

$$\mathbf{1}(U(X_t, w, v, c_1, c_2)) = \begin{cases} 1 & \text{if } \{c_1, c_2\} \not\subseteq X_t(N(w) \setminus \{v\}),\\ 0 & \text{otherwise}, \end{cases}$$

be the indicator variable for the event that w is unblocked for $c_1$ or $c_2$, i.e., at least one of $c_1$ and $c_2$ does not appear on $N(w) \setminus \{v\}$. The following is from Hayes [7]: Part 1 of the following lemma is equation (1) in Theorem 4 of [7], and Part 2 is Corollary 35 of [7].

¹ In this paper, an event is said to occur with high probability if its failure probability is $\exp(-\Omega(\Delta^\gamma))$ for some positive constant γ.
² There are several constants $C_x$ in this paper. The constant $C_x$ is defined in Lemma x, Theorem x, etc.

Lemma 7 (Hayes [7]). For every η > 0 there exist $C_7, \Delta_0$ such that the following holds for every graph G = (V, E) with maximum degree $\Delta > \Delta_0$, every v ∈ V, every k > 1.45∆, and every $X_0$ which is not (4e)-heavy at any vertex $w \in B_R(v)$ for any color c, where R = R(η). For every $T > C_7 n$:

1. If the girth of G is ≥ 5, then
$$\Pr\Big(\exists t \in [C_7 n, T] : |A(X_t, v)| < (1-\eta)k\exp(-\Delta/k)\Big) \le \frac{T}{n}\exp(-\Delta/C_7).$$

2. If the girth of G is ≥ 7, then, for every pair of colors $c_1, c_2 \in [k]$,
$$\Pr\Big(\exists t \in [C_7 n, T] : \sum_{w \in N(v)} \frac{\mathbf{1}(U(X_t, w, v, c_1, c_2))}{|A(X_t, w)|} \ge (1+\eta)\frac{\Delta\big(1 - (1 - e^{-\Delta/k})^2\big)}{k\exp(-\Delta/k)}\Big) \le \frac{T}{n}\exp(-\Delta/C_7).$$
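The quantities appearing in Lemma 7 are straightforward to compute for a given coloring. The sketch below uses hypothetical helper names and the same adjacency-list convention as before. It computes $A(X, v)$, the indicator $\mathbf{1}(U(X, w, v, c_1, c_2))$, the sum in part 2, and the part-2 benchmark without the (1 + η) factor. As a consistency check, by the defining equation of β∗ the benchmark equals 1 when k/∆ = β∗.

```python
import math

def available(coloring, adj, v, k):
    """A(X, v) = [k] minus X(N(v)): colors not used on any neighbor of v."""
    return set(range(1, k + 1)) - {coloring[w] for w in adj[v]}

def unblocked(coloring, adj, w, v, c1, c2):
    """1(U(X, w, v, c1, c2)): 1 unless both c1 and c2 appear on N(w) minus {v}."""
    seen = {coloring[u] for u in adj[w] if u != v}
    return 0 if {c1, c2} <= seen else 1

def lemma7_sum(coloring, adj, v, c1, c2, k):
    """Sum over w in N(v) of 1(U(X, w, v, c1, c2)) / |A(X, w)| (assumes k > max degree)."""
    return sum(unblocked(coloring, adj, w, v, c1, c2) / len(available(coloring, adj, w, k))
               for w in adj[v])

def lemma7_benchmark(delta, k):
    """Delta (1 - (1 - e^{-Delta/k})^2) / (k e^{-Delta/k}), i.e. part 2 without (1 + eta)."""
    p = math.exp(-delta / k)
    return delta * (1 - (1 - p) ** 2) / (k * p)

if __name__ == "__main__":
    adj = [[1, 2, 3], [0], [0], [0]]   # a star with center 0
    col = [4, 1, 2, 3]
    print(available(col, adj, 0, 5), lemma7_sum(col, adj, 1, 1, 2, 5))
```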

3.2 Coupling Analysis

We use Jerrum's optimal³ one-step coupling (introduced in [10]). At every time t we choose a random vertex v = v(t), and update v in both chains $X_t$ and $Y_t$. We maximally couple the available colors for v to define $X_{t+1}(v)$ and $Y_{t+1}(v)$. Since we use path coupling, we only need to analyze pairs of colorings that initially differ at a single vertex v, which we refer to as neighboring colorings.

A basic tool used in several of our proofs will be the notion of propagation of disagreements. If for some t and v = v(t) we have $X_t(v) = Y_t(v)$ and $X_{t+1}(v) \ne Y_{t+1}(v)$, then there exists a neighbor w of v which propagates its disagreement to v in the following sense: in chain X we chose color $c(t+1) = Y_t(w)$, or in chain Y we chose $c(t+1) = X_t(w)$. In this way, if we initially had a single disagreement $X_0 \oplus Y_0 = \{v_0\}$, then a disagreement at time t can be traced back via a path of disagreements to $v_0$.

The following result captures the behavior of the coupling for a worst-case pair of neighboring colorings.

Lemma 8. There exists $\Delta_0 > 0$ such that for any C ≥ 3, 0 < ε < 1, any graph G on n vertices with maximum degree $\Delta > \Delta_0$ and girth g ≥ 5, and any k > 1.45∆, the following hold. Let $X_0, Y_0$ be colorings which disagree at a single vertex v. Let T = Cn/ε. Then
1. $\mathbb{E}|X_T \oplus Y_T| \le \exp(3C/\varepsilon)$.
2. $\mathbb{E}|X_{T\log\Delta} \oplus Y_{T\log\Delta}| \le \Delta^{3C/\varepsilon}$.
3. Let $S_{T\log\Delta}$ denote the set of disagreements of $(X_{T\log\Delta}, Y_{T\log\Delta})$ that are (2e)-suspect for radius $2\Delta^{3/5}$. Then $\mathbb{E}|S_{T\log\Delta}| \le \exp(-\sqrt{\Delta})$.

³ Optimal in the sense that the expected Hamming distance between $X_{t+1}$ and $Y_{t+1}$ is always minimized, conditioned on $X_t$ and $Y_t$.

Proof. For parts 1 and 2, we just bound the rate of spreading of disagreements. In each time step, the expected number of disagreements increases by at most a factor of $1 + \frac{\Delta}{n(k-\Delta)} \le \exp(3/n)$. This holds regardless of the history on previous steps. (There is a ≤ ∆/n chance that v(t) is a neighbor of a particular disagreement, and then a ≤ 1/(k − ∆) chance that the disagreement spreads to v(t).) Hence, expanding out the conditional probabilities, it follows by induction that after t steps the expected number of disagreements is at most exp(3t/n). Plugging in the values t = T = Cn/ε and t = T log ∆ = Cn log ∆/ε establishes parts 1 and 2, respectively.

For part 3, we divide the analysis into two cases: disagreements inside and outside $B_R(v)$ for $R = \sqrt{\Delta}$. We apply Lemma 5 to each vertex $w \in B_{R + 2\Delta^{3/5}}(v)$ and color c ∈ [k], concluding that, with probability at least $1 - \exp(-\Delta/C)$, none of these vertices is (2e)-heavy for any color under $X_{T\log\Delta}$, and so no vertex in $B_R(v)$ is (2e)-suspect for radius $2\Delta^{3/5}$, where $C \le 2C_5$. Note that, since $R + 2\Delta^{3/5} = o(\Delta/\log\Delta)$ and k = O(∆), the union bounds over the ball and the colors do not affect the form of the error probability; only the constant C is affected, and it will be close to $C_5$. Hence, if D is the set of disagreements of $(X_{T\log\Delta}, Y_{T\log\Delta})$ that are (2e)-suspect, we have shown
$$\mathbb{E}|D \cap B_R(v)| \le \exp(-\Delta/C)\,|B_R(v)| \le \exp(-\Delta/(2C)).$$
To bound the number of disagreements outside $B_R(v)$, we observe that each disagreement in $D_T \setminus B_R(v)$ comes from a path of disagreements starting at v and having length at least R. Hence, by a union bound, we have
$$\mathbb{E}|D_T \setminus B_R(v)| \le \sum_{\ell \ge R} \Delta^\ell \binom{T}{\ell}\Big(\frac{1}{n(k-\Delta)}\Big)^\ell \le \sum_{\ell \ge R}\Big(\frac{eT\Delta}{\ell n(k-\Delta)}\Big)^\ell \le \sum_{\ell \ge R}\Big(\frac{1}{4}\Big)^\ell \le \exp\big(-(4/3)\sqrt{\Delta}\big),$$
where the third inequality holds by the choice of R. Summing the above bounds on $\mathbb{E}|D_T \setminus B_R(v)|$ and $\mathbb{E}|D \cap B_R(v)|$ gives the desired upper bound on $\mathbb{E}|D|$, assuming $\Delta_0$ is sufficiently large.
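Jerrum's one-step coupling, described at the start of this subsection, can be sketched as follows. Both chains update the same vertex v, and the two uniform distributions on $A(X_t, v)$ and $A(Y_t, v)$ are maximally coupled: the chains pick the same color with probability $|A(X_t, v) \cap A(Y_t, v)| / \max\{|A(X_t, v)|, |A(Y_t, v)|\}$, and otherwise each chain draws from its leftover probability mass. This is the standard maximal-coupling construction written out by us; the function names are hypothetical and we assume k exceeds the maximum degree.

```python
import random

def maximal_coupling(avail_x, avail_y, rng=random):
    """Sample (cx, cy) with cx uniform on avail_x and cy uniform on avail_y,
    maximizing Pr(cx == cy)."""
    nx, ny = len(avail_x), len(avail_y)
    colors = sorted(avail_x | avail_y)
    px = [1 / nx if c in avail_x else 0.0 for c in colors]
    py = [1 / ny if c in avail_y else 0.0 for c in colors]
    overlap = [min(a, b) for a, b in zip(px, py)]
    p_same = len(avail_x & avail_y) / max(nx, ny)   # total overlap mass, computed exactly
    if rng.random() < p_same:
        c = rng.choices(colors, weights=overlap)[0]
        return c, c
    # Otherwise each chain draws from its residual distribution (p - overlap) / (1 - p_same).
    cx = rng.choices(colors, weights=[a - m for a, m in zip(px, overlap)])[0]
    cy = rng.choices(colors, weights=[b - m for b, m in zip(py, overlap)])[0]
    return cx, cy

def coupled_step(X, Y, adj, k, rng=random):
    """One coupled Glauber step: same vertex in both chains, maximally coupled new colors."""
    v = rng.randrange(len(X))
    avail_x = set(range(1, k + 1)) - {X[w] for w in adj[v]}
    avail_y = set(range(1, k + 1)) - {Y[w] for w in adj[v]}
    cx, cy = maximal_coupling(avail_x, avail_y, rng)
    X2, Y2 = list(X), list(Y)
    X2[v], Y2[v] = cx, cy
    return X2, Y2
```

For example, with available sets {1, 2, 3} and {2, 3, 4, 5}, the two chains agree with probability 2/4 = 1/2, while each marginal remains uniform on its own set.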

Having shown that things have improved somewhat by O(n log ∆) steps, we now consider coupling from this point on. The heart of our rapid mixing proof will be the following result, which shows that for a pair of neighboring colorings that are "nice" (namely, above suspicion), there is a coupling of O(n) steps of the Glauber dynamics under which the expected Hamming distance decreases and it is extremely unlikely that the coupling behaves very poorly.

Theorem 9. There exist $C_9 \ge 3$ and $\Delta_0$ such that for every graph G = (V, E) on n vertices with maximum degree ∆ and girth g, if either
(a) $k \ge \max\{(1+\varepsilon)\alpha^*\Delta, \Delta_0\}$ and g ≥ 5, or
(b) $k \ge \max\{(1+\varepsilon)\beta^*\Delta, \Delta_0\}$ and g ≥ 7,
then the following hold. Suppose $X_0, Y_0$ differ only at v, and v is (4e)-above suspicion for R, where $\Delta^{3/5} \le R \le 2\Delta^{3/5}$. Let $T_m = C_9 n/\varepsilon$. Let D denote the event that $|X_{T_m} \oplus Y_{T_m}| > \Delta^{2/3}$. Then
1. $\mathbb{E}|X_{T_m} \oplus Y_{T_m}| \le 1/3$.
2. $\mathbb{E}\,|X_{T_m} \oplus Y_{T_m}|\,\mathbf{1}(D) < \exp(-\sqrt{\Delta})$.
3. $\Pr\big(\text{there exists a (2e)-suspect disagreement for } R' = R - \sqrt{\Delta} \text{ at time } T_m\big) \le 2\exp(-\sqrt{\Delta})$.

We will prove the above theorem in the next section. Tying together Lemma 8 and Theorem 9, we obtain the following lemma analyzing the coupling for O(n log ∆) steps from an initial pair of colorings that are not suspect.

Lemma 10. There exist a constant $C_{10} > 0$ and $\Delta_0$ such that for every graph G = (V, E) on n vertices with maximum degree ∆ and girth g, if either
(a) $k \ge \max\{(1+\varepsilon)\alpha^*\Delta, \Delta_0\}$ and g ≥ 5, or
(b) $k \ge \max\{(1+\varepsilon)\beta^*\Delta, \Delta_0\}$ and g ≥ 7,
then the following holds. Let $X_0, Y_0$ be colorings which disagree at a single vertex v that is (4e)-above suspicion for $R = 2\Delta^{3/5}$. Let $T = C_{10} n\log\Delta/\varepsilon$. Then
$$\mathbb{E}|X_T \oplus Y_T| \le \frac{1}{\sqrt{\Delta}}.$$

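Proof. The high-level idea is to apply Theorem 9 a number of times. Let $T_m = C_9 n/\varepsilon$. First, we start from $(X_0, Y_0)$ and run $T_m$ steps. We use Theorem 9 to analyze the coupling for these first $T_m$ steps. In the event that the number of disagreements has not dropped to zero after these $T_m$ steps, we interpolate a sequence of intermediate colorings $Z_0, \ldots, Z_d$ such that each pair $Z_i, Z_{i+1}$ differs at a single vertex, and then apply path coupling. For each pair of colorings $Z_i, Z_{i+1}$ that differ at a single vertex $v_i$, to analyze the performance of the coupling over the next $T_m$ steps we apply Theorem 9 if $v_i$ is (4e)-above suspicion, and otherwise we apply Lemma 8. At the end of these $T_m$ steps we apply path coupling again and repeat the above procedure.

For colorings interpolated at time $iT_m$, we will use $R = 2\Delta^{3/5} - i\sqrt{\Delta}$ in our applications of Theorem 9. Let $D_i$ denote the event that, at some time $t \le iT_m$, the Hamming distance between $X_t$ and $Y_t$ exceeds $\Delta^{2i/3}$ ($D = D_1$ was defined in Theorem 9). Let $S_i$ denote the event that, at some time $t \le iT_m$, there exists a (2e)-suspect disagreement of $X_t$ and $Y_t$. Recall that if $X_t$ and $Y_t$ have no (2e)-suspect disagreements, then the interpolated pairs of neighboring colorings have no (4e)-suspect disagreements. Let $H_i = |X_t \oplus Y_t|$ be the total number of disagreements at time $t = iT_m$. We prove the following claim by induction:
$$\mathbb{E}H_i \le \frac{1}{2^i} \quad \text{for } i = O(\log\Delta).$$
We have
$$\mathbb{E}H_{i+1} \le \mathbb{E}\big(H_{i+1}\mathbf{1}(D_i)\big) + \mathbb{E}\big(H_{i+1}\mathbf{1}(\bar S_i)\big) + \mathbb{E}\big(H_{i+1}\mathbf{1}(\bar D_i)\mathbf{1}(S_i)\big).$$
We now consider the summands on the right-hand side one by one. In the following, the phrase "by path coupling" conveys the idea that if there are h disagreements at a certain time, then we can bound the expected number of disagreements at a later time by hL, where L is a bound obtained by assuming that h = 1.

For the first summand,
$$\begin{aligned}
\mathbb{E}\big(H_{i+1}\mathbf{1}(D_i)\big) &= \mathbb{E}\big(\mathbb{E}(H_{i+1}\mathbf{1}(D_i) \mid H_i)\big) = \mathbb{E}\big(\mathbb{E}(H_{i+1} \mid H_i)\mathbf{1}(D_i)\big)\\
&\le e^{3C_9/\varepsilon}\,\mathbb{E}\big(H_i\mathbf{1}(D_i)\big) && \text{by path coupling and Lemma 8.1}\\
&\le e^{3C_9/\varepsilon}\Big(\mathbb{E}\big(H_i\mathbf{1}(D_{i-1})\big) + \mathbb{E}\big(H_i\mathbf{1}(D_i)\mathbf{1}(\bar D_{i-1})\big)\Big)\\
&\le e^{3C_9/\varepsilon}\Big(\mathbb{E}\big(H_i\mathbf{1}(D_{i-1})\big) + \Delta^{2(i-1)/3}\exp(-\sqrt{\Delta})\Big) && \text{by path coupling and Theorem 9.2.}
\end{aligned}$$
By induction, it follows that
$$\mathbb{E}\big(H_{i+1}\mathbf{1}(D_i)\big) \le 2e^{3iC_9/\varepsilon}\Delta^{2i/3}\exp(-\sqrt{\Delta}).$$

Now for the second summand:
$$\begin{aligned}
\mathbb{E}\big(H_{i+1}\mathbf{1}(\bar S_i)\big) &= \mathbb{E}\big(\mathbb{E}(H_{i+1}\mathbf{1}(\bar S_i) \mid H_i)\big)\\
&\le \tfrac{1}{3}\,\mathbb{E}\big(H_i\mathbf{1}(\bar S_i)\big) && \text{by path coupling and Theorem 9.1}\\
&\le \tfrac{1}{3}\,\mathbb{E}\big(H_i\mathbf{1}(\bar S_{i-1})\big) && \text{since } \bar S_i \subseteq \bar S_{i-1}\\
&\le 3^{-i} && \text{by induction.}
\end{aligned}$$

Now the third and final summand:
$$\begin{aligned}
\mathbb{E}\big(H_{i+1}\mathbf{1}(S_i)\mathbf{1}(\bar D_i)\big) &= \mathbb{E}\big(\mathbb{E}(H_{i+1} \mid H_i)\mathbf{1}(S_i)\mathbf{1}(\bar D_i)\big)\\
&\le \Delta^{2i/3}e^{3C_9/\varepsilon}\Pr(S_i \setminus D_i) && \text{by path coupling and Lemma 8.1}\\
&\le \Delta^{2i/3}e^{3C_9/\varepsilon}\cdot i\Delta^{2i/3}\cdot 2\exp(-\sqrt{\Delta}) && \text{by Theorem 9.3 and a union bound.}
\end{aligned}$$
We now have
$$\mathbb{E}H_i \le 3^{-i} + \mathrm{poly}(\Delta^{i+1/\varepsilon})\exp(-\sqrt{\Delta}),$$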

which is at most $2/3^i$ for sufficiently large ∆ and i = O(log ∆).

Finally, we prove that the mixing time is O(n log n) by analyzing the coupling for O(n log n) steps from an arbitrary pair of initial colorings.

Theorem 11. There exist a constant $C_{11} > 0$ and $\Delta_0$ such that for every graph G = (V, E) on n vertices with maximum degree ∆ and girth g, if either
(a) $k \ge \max\{(1+\varepsilon)\alpha^*\Delta, \Delta_0\}$ and g ≥ 5, or
(b) $k \ge \max\{(1+\varepsilon)\beta^*\Delta, \Delta_0\}$ and g ≥ 7,
then the following holds. Let $X_0, Y_0$ be arbitrary colorings and let δ > 0. Let $T_{\mathrm{mix}} = C_{11} n\log(n/\delta)/\varepsilon$. Then
$$\Pr(X_{T_{\mathrm{mix}}} \ne Y_{T_{\mathrm{mix}}}) \le \delta.$$

Proof. Let us define a weighted Hamming metric ρ on the space of colorings as follows: $\rho(X_t, Y_t)$ equals the usual Hamming distance plus A times the number of (2e)-suspect disagreements. Here $A = \Delta^{3/\varepsilon + 1/2}$, and we require ∆ to be large enough that
$$A \le \frac{\exp(\sqrt{\Delta})}{\sqrt{\Delta}},$$
which is always true for sufficiently large ∆. Let $T = C_{10} n(\log\Delta)/\varepsilon$.

Claim: For any i ≥ 0,
$$\mathbb{E}\big(\rho(X_{(i+1)T}, Y_{(i+1)T}) \mid X_{iT}, Y_{iT}\big) \le \frac{2}{\sqrt{\Delta}}\,\rho(X_{iT}, Y_{iT}).$$

Proof of Claim: Let s denote the number of (2e)-suspect disagreements of $X_{iT}, Y_{iT}$, and let t denote the total number of disagreements, so $\rho(X_{iT}, Y_{iT}) = sA + t$. Similarly, let s′ denote the number of suspect disagreements of $X_{(i+1)T}, Y_{(i+1)T}$ and t′ the total number of disagreements, so $\rho(X_{(i+1)T}, Y_{(i+1)T}) = s'A + t'$. By Lemma 8.3 and path coupling, we have the following bound on s′:
$$\mathbb{E}s' \le t\exp(-\sqrt{\Delta}).$$
By Lemmas 8.2 and 10, and path coupling, we have
$$\mathbb{E}t' \le s\Delta^{3/\varepsilon} + (t - s)\frac{1}{\sqrt{\Delta}}.$$
Putting these together, we have
$$\begin{aligned}
\mathbb{E}\big(\rho(X_{(i+1)T}, Y_{(i+1)T}) \mid X_{iT}, Y_{iT}\big) &\le s\Delta^{3/\varepsilon} + (t-s)\frac{1}{\sqrt{\Delta}} + t\exp(-\sqrt{\Delta})A\\
&\le \Delta^{3/\varepsilon}s + \frac{2}{\sqrt{\Delta}}\,t\\
&\le \frac{2}{\sqrt{\Delta}}\,\rho(X_{iT}, Y_{iT}).
\end{aligned}$$
This completes the proof of the Claim. Now, by induction and the Claim, we have, for all i ≥ 0,
$$\mathbb{E}\rho(X_{iT}, Y_{iT}) \le \rho_{\max}\Big(\frac{2}{\sqrt{\Delta}}\Big)^i,$$
where $\rho_{\max} = nA$ is the maximum possible value of ρ. Finally, we observe that if C is sufficiently large, then for $i \ge C\log(n/\delta)/\log\Delta$ we have $\mathbb{E}\rho(X_{iT}, Y_{iT}) \le \delta$, and so the desired conclusion follows by Markov's inequality. It is straightforward to verify that Theorem 1 follows immediately from Theorem 11.

4 Proof of Theorem 9

Recall that for $X_t, Y_t \in \Omega$, their difference is denoted by $D_t = \{v : X_t(v) \ne Y_t(v)\}$ and their Hamming distance by $H_t = |D_t|$. Denote their cumulative difference by
$$D_{\le t} = \bigcup_{t' \le t} D_{t'},$$
and denote their cumulative Hamming distance by $H_{\le t} = |D_{\le t}|$. Let $T_b = C_7 n$. For $t > T_b$, we define the following bad events:
• D(t) denotes the event $H_{\le t} \ge \Delta^{2/3}$.
• $B_1(t)$ denotes the event $D_{\le t} \not\subseteq B_R = B_R(v)$.
• For part (a) of Theorem 1, let $B_2(t)$ denote the event that there exist $T_b \le \tau \le t$ and $z \in B_R$ such that $|A(X_\tau, z)| < \Theta_0 := (1 - \varepsilon/2)ke^{-\Delta/k}$. For part (b) of Theorem 1, let $B_2(t)$ denote the event that there exist $T_b \le \tau \le t$, $z \in B_R$ and colors $c_1, c_2$ such that
$$\sum_{w \in N(z)} \frac{\mathbf{1}(U(X_\tau, w, z, c_1, c_2))}{|A(X_\tau, w)|} \ge \Psi_0 := (1 + \varepsilon/2)\frac{\Delta\big(1 - (1 - e^{-\Delta/k})^2\big)}{k\exp(-\Delta/k)}.$$
• $B_3(t)$ denotes the event that there exist $T_b \le \tau \le t$ and $z \in B_R$ such that $|\{w \in N(z) : X_\tau(w) \ne Y_\tau(w)\}| \ge \Delta^{2/3}$.
• $B_4(t)$ denotes the event that there exist $T_b \le \tau \le t$ and $z \in B_R$ such that $|\{w \in B_2(z) : X_\tau(w) \ne Y_\tau(w)\}| \ge \Delta^{5/6}$. (This is only relevant for the case g ≥ 7.)

Then we let $B(t) = B_1(t) \cup B_2(t) \cup B_3(t) \cup B_4(t)$, and finally we define our good event to be $G(t) = \bar D(t) \cap \bar B(t)$. For all of these events, when the time t is dropped we are referring to the event at time $T_m$. We will bound the Hamming distance by conditioning on the above events in the following manner:
$$\mathbb{E}H_{T_m} = \mathbb{E}H_{T_m}\mathbf{1}(D) + \mathbb{E}H_{T_m}\mathbf{1}(\bar D)\mathbf{1}(B) + \mathbb{E}H_{T_m}\mathbf{1}(G) \le \mathbb{E}H_{T_m}\mathbf{1}(D) + \Delta^{2/3}\Pr(B) + \mathbb{E}H_{T_m}\mathbf{1}(G). \tag{4}$$

To prove Part 1 of Theorem 9, we will bound each of the terms in (4) by 1/4, thus ensuring that $\mathbb{E}(H_{T_m}) \le 3/4$. Note that the first term is the quantity considered in Part 2 of Theorem 9. Hence, we begin by proving Part 2 before returning to the proof of Part 1.

Proof of Theorem 9.2. We will prove that for every integer 1 ≤ D ≤ n,
$$\Pr(H_{\le T_m} \ge D) \le \exp\big(-De^{-2C_9/\varepsilon}\big). \tag{5}$$

For 1 ≤ i ≤ D, let $t_i$ be the time at which the i'th disagreement is generated (possibly counting the same vertex multiple times), and let $t_0 = 0$. Let $\eta_i := t_i - t_{i-1}$ be the waiting time for the formation of the i'th disagreement. Conditioned on the evolution at all times in $[0, t_{i-1}]$, the distribution of $\eta_i$ stochastically dominates a geometric distribution with success probability $\rho_i$ and range {1, 2, . . . }, where
$$\rho_i = \frac{\min\{i\Delta, n - i\}}{(k - \Delta)n}.$$
This is because at all times prior to $t_i$ we have $H_t \le i$, and thus the set $D_{\le t}$ grows with probability at most $\rho_i$ at each step, regardless of the history. The numerator in the expression for $\rho_i$ is an upper bound on the number of vertices that are non-disagreeing neighbors of disagreements, and the reciprocal of the denominator is an upper bound on the probability of choosing a fixed such vertex and then choosing a color that increases the number of disagreements. Hence $\eta_1 + \cdots + \eta_D$ stochastically dominates the sum of independent geometrically distributed random variables with success probabilities $\rho_1, \rho_2, \ldots, \rho_D$. Now for any real x ≥ 0,
$$\Pr(\eta_i \ge x) = (1 - \rho_i)^{\lceil x\rceil - 1} \ge \exp\Big(-\frac{\rho_i}{1 - \rho_i}x\Big) \ge e^{-2\rho_i x},$$
since $\rho_i \le 1/(k - \Delta)$.
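The tail comparison $(1-\rho)^{\lceil x\rceil - 1} \ge e^{-2\rho x}$ used above needs ρ ≤ 1/2, which holds once k − ∆ ≥ 2 because $\rho_i \le 1/(k - \Delta)$. The short sketch below spot-checks it on a grid; the function names and grid are our own choices, and for x = 0 it uses Pr(η ≥ 0) = 1, since η takes values in {1, 2, . . . }.

```python
import math

def geometric_tail(rho, x):
    """Pr(eta >= x) for eta geometric on {1, 2, ...} with success probability rho, x >= 0."""
    m = max(math.ceil(x), 1)        # smallest value of eta that is >= x
    return (1 - rho) ** (m - 1)

def exponential_tail(rate, x):
    """Pr(zeta >= x) for zeta exponential with the given rate."""
    return math.exp(-rate * x)

def worst_ratio(rhos, xs):
    """Smallest value of geometric_tail(rho, x) / exponential_tail(2 rho, x) over the grid."""
    return min(geometric_tail(r, x) / exponential_tail(2 * r, x) for r in rhos for x in xs)

rhos = [i / 200 for i in range(1, 101)]     # rho in (0, 1/2]
xs = [j * 0.37 for j in range(300)]         # x in [0, about 110]
print("min ratio:", worst_ratio(rhos, xs))  # a value >= 1 confirms the domination on the grid
```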

Thus $\eta_1 + \cdots + \eta_D$ stochastically dominates the sum of independent exponential random variables with parameters $2\rho_1, 2\rho_2, \ldots, 2\rho_D$. Now $\rho_i \le i\rho$, where $\rho = \frac{\Delta}{(k-\Delta)n}$, and so $\eta_1 + \cdots + \eta_D$ stochastically dominates the sum of independent exponential random variables $\zeta_1, \zeta_2, \ldots, \zeta_D$ with parameters $2\rho, 4\rho, \ldots, 2D\rho$. Now consider the problem of collecting D coupons, assuming each coupon is generated by its own Poisson process with rate 2ρ. Once i − 1 coupons have been collected, the delay until the i'th coupon is collected is exponentially distributed with rate $2(D - i + 1)\rho$. Hence the time to collect all D coupons has the same distribution as $\zeta_1 + \cdots + \zeta_D$. But the event that the total delay is less than $T_m$ is nothing but the intersection of the (independent) events that each coupon is generated in $[0, T_m]$. The probability of this is
$$\big(1 - e^{-2T_m\rho}\big)^D < \exp\big(-De^{-2C_9/\varepsilon}\big).$$
This completes the proof of (5).

We can now bound the expected Hamming distance at time $T_m$ as follows:
$$\begin{aligned}
\mathbb{E}H_{T_m}\mathbf{1}(D) &\le \mathbb{E}H_{\le T_m}\mathbf{1}(D) = \sum_{D = \Delta^{2/3}}^{n} D\Pr(H_{\le T_m} = D)\\
&= \Delta^{2/3}\Pr\big(H_{\le T_m} \ge \Delta^{2/3}\big) + \sum_{D = \Delta^{2/3}+1}^{n}\Pr(H_{\le T_m} \ge D)\\
&< \Delta^{2/3}\sum_{D \ge \Delta^{2/3}}\Pr(H_{\le T_m} \ge D)\\
&< \Delta^{2/3}\sum_{D \ge \Delta^{2/3}}\exp\big(-De^{-2C/\varepsilon}\big) && \text{by (5)}\\
&= \frac{\Delta^{2/3}\exp\big(-\Delta^{2/3}e^{-2C/\varepsilon}\big)}{1 - \exp\big(-e^{-2C/\varepsilon}\big)}\\
&< \Delta^{2/3}\exp\big(3C/\varepsilon - \Delta^{2/3}e^{-2C/\varepsilon}\big).
\end{aligned}$$
The above quantity is at most $\exp(-\sqrt{\Delta})$ for sufficiently large ∆. This completes the proof of Theorem 9.2.

We now return to the proof of Part 1, and bound the probability that one of the bad events occurs.

Lemma 12. $\Pr(B) \le 1/(4\Delta^{2/3})$.


Proof. We can bound the probability of the event $B_1$ by a standard paths-of-disagreement argument:
$$\Pr(B_1) \le \Delta^R\binom{T_m}{R}\Big(\frac{1}{n(k-\Delta)}\Big)^R < \Big(\frac{\Delta C_9 e}{\varepsilon(k-\Delta)R}\Big)^R < \frac{1}{20\Delta^{2/3}}, \tag{6}$$
assuming $R \ge \max\{20C_9/\varepsilon, \log(20\Delta^{2/3})\}$.

To bound the probability of the event $B_2$, we first bound the number of recolorings of interest. Let $S = \{T_b < t \le T_m : v(t) \in B_R\}$. By Lemma 7 with η = ε/2, the desired bound on the local uniformity property of a vertex z fails with probability that is exponentially small in ∆. Therefore,
$$\Pr(B_2) \le \frac{T_m}{n}\Delta^{R+1}\exp(-\Delta/C_7) < \frac{1}{20\Delta^{2/3}}. \tag{7}$$
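To get a feel for the bound (6), the sketch below evaluates, in log space, the paths-of-disagreement expression $\Delta^R\binom{T_m}{R}(n(k-\Delta))^{-R}$ and the simplified bound $(\Delta C_9 e/(\varepsilon(k-\Delta)R))^R$, which follows from $\binom{T}{R} \le (eT/R)^R$ with $T_m = C_9 n/\varepsilon$. The parameter values are arbitrary illustrations, not constants from the paper. Note that R must satisfy both R ≥ 20C_9/ε and the window $\Delta^{3/5} \le R \le 2\Delta^{3/5}$ of Theorem 9, which already forces ∆ to be very large.

```python
import math

def log_binom(n, r):
    """Natural log of C(n, r) via lgamma; n may be a non-integer such as T_m."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)

def log_path_bound(delta, k, n, eps, C9, R):
    """log of Delta^R * C(T_m, R) * (n (k - Delta))^(-R), with T_m = C9 n / eps."""
    T_m = C9 * n / eps
    return R * math.log(delta) + log_binom(T_m, R) - R * math.log(n * (k - delta))

def log_simplified_bound(delta, k, eps, C9, R):
    """log of (Delta C9 e / (eps (k - Delta) R))^R, using C(T, R) <= (e T / R)^R."""
    return R * math.log(delta * C9 * math.e / (eps * (k - delta) * R))

# Illustrative values only (not constants from the paper).
delta, eps, C9, n = 10**5, 0.1, 3.0, 10**7
k = 1.8 * delta
R = 1000   # >= 20 * C9 / eps = 600, and Delta^(3/5) = 1000 <= R <= 2 * Delta^(3/5)
exact = log_path_bound(delta, k, n, eps, C9, R)
simple = log_simplified_bound(delta, k, eps, C9, R)
target = -math.log(20 * delta ** (2 / 3))   # log of 1 / (20 Delta^(2/3))
print(f"log exact = {exact:.1f}, log simplified = {simple:.1f}, log target = {target:.1f}")
```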

To bound $B_3$, we first recall that by Theorem 9.2, $\Pr(D) \le \exp(-\sqrt{\Delta})$. Now, assuming D does not occur, we show that with high probability no vertex in $B_R$ has more than $\Delta^{2/3}$ disagreements in its neighborhood:
$$\begin{aligned}
\Pr\big(|N(w) \cap D_{\le T_m}| \ge \Delta^{2/3}, \bar D\big) &\le \binom{T_m}{\Delta^{2/3}}\Big(\frac{\Delta + (\Delta^{2/3} - 1)}{n(k-\Delta)}\Big)^{\Delta^{2/3}} && \text{(see below)}\\
&\le \Big(\frac{eC_9\,2\Delta}{\varepsilon\Delta^{2/3}(k-\Delta)}\Big)^{\Delta^{2/3}}\\
&\le \exp(-\Delta^{2/3}) && \text{for sufficiently large } \Delta.
\end{aligned}$$
The first inequality comes from considering the number of times at which a disagreement in N(w) might form, multiplied by the probability of forming such a disagreement at a particular time. Since we may assume D does not occur, there are at most $\Delta^{2/3}$ vertices from which this disagreement in N(w) might form. Moreover, except for w itself, our assumption that the girth is ≥ 5 implies that each of these disagreements is adjacent to at most one vertex in N(w). Thus the total number of edges from a previous disagreement to a vertex in N(w) is at most $\Delta + (\Delta^{2/3} - 1)$. Considering that a particular edge has at most a $1/(n(k-\Delta))$ chance to propagate a disagreement at a given time step, the claim follows.

Finally, the bound on $B_3$ follows by a union bound over all the at most $\Delta^R$ vertices $w \in B_R$, using that $R \le 2\Delta^{3/5} \ll \Delta^{2/3}$.

We can bound the probability of the event $B_4(t)$ in a similar way. First, fix a vertex $w \in B_R$. We will assume $B_3(t)$ does not occur, so in particular at most $\Delta^{2/3}$ neighbors of w are disagreements. Since we are assuming girth ≥ 7, this implies that at most $\Delta^{5/3} + \Delta^{2/3}$ edges ever go from disagreements to $B_2(w)$. Hence
$$\begin{aligned}
\Pr\big(|B_2(w) \cap D_{T_m}| \ge \Delta^{5/6}, \bar D, \bar B_3\big) &\le \binom{T_m}{\Delta^{5/6}}\Big(\frac{\Delta^{5/3} + \Delta^{2/3}}{n(k-\Delta)}\Big)^{\Delta^{5/6}}\\
&\le \Big(\frac{eC_9\,2\Delta^{5/3}}{\varepsilon\Delta^{5/6}(k-\Delta)}\Big)^{\Delta^{5/6}}\\
&\le \exp(-\Delta^{5/6}),
\end{aligned}$$
for sufficiently large ∆. The bound on $\Pr(B_4)$ follows by a union bound over $w \in B_R$.

Lemma 13. $\mathbb{E}H_{T_m}\mathbf{1}(G) < 1/4$.

Proof. We will bound the expected change in $H(X_t, Y_t)$ using path coupling. Thus, let $W_0 = X_t, W_1, W_2, \ldots, W_h = Y_t$ be a sequence of colorings, where $h = H(X_t, Y_t)$ and $W_{i+1}$ is obtained from $W_i$ by changing the color of one vertex $w_i$ from $X_t(w_i)$ to $Y_t(w_i)$. We maximally couple $W_i$ and $W_{i+1}$ in one step of the Glauber dynamics to obtain $W_i', W_{i+1}'$. More precisely, both chains recolor the same vertex, and maximize the probability of choosing the same new color for the chosen vertex.

Consider a pair $W_i, W_{i+1}$. With probability 1/n both chains recolor $w_i$ to the same color, and the distance decreases by one. Consider $z \in N(w_i)$, and let $c_1 = W_i(w_i)$ and $c_2 = W_{i+1}(w_i)$. Note that color $c_1$ is not valid for z in $W_i$; however, it is valid in $W_{i+1}$ if $c_1 \notin W_{i+1}(N(z) \setminus \{w_i\})$. Similarly, color $c_2$ is not valid for z in $W_{i+1}$, but it is valid in $W_i$ if $c_2 \notin W_i(N(z) \setminus \{w_i\})$. If at least one of these two cases holds, then with probability at most $\frac{1}{n\min\{|A(W_i, z)|, |A(W_{i+1}, z)|\}}$, vertex z is recolored to different colors in the two chains. Otherwise z will be recolored the same in both chains. Therefore, given $W_i, W_{i+1}$,
$$\mathbb{E}\big(H(W_i', W_{i+1}')\big) - H(W_i, W_{i+1}) \le -\frac{1}{n} + \frac{1}{n}\sum_{z \in N(w_i)}\frac{\mathbf{1}(U(W_i, z, w_i, c_1, c_2))}{\min\{|A(W_i, z)|, |A(W_{i+1}, z)|\}}. \tag{8}$$

In any coloring every vertex has at least k − ∆ available colors. Since k − ∆ ≥ ∆/3, we have the following trivial bound. Given $W_i, W_{i+1}$,
$$\mathbb{E}\big(H(W_i', W_{i+1}')\big) - H(W_i, W_{i+1}) \le -\frac{1}{n} + \frac{3\Delta}{n\Delta} = \frac{2}{n}. \tag{9}$$

Therefore, given Xt , Yt , E(H(Xt+1 , Yt+1 )) − H(Xt , Yt ) ≤

2 H(Xt , Yt ). n

(10)
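The right-hand side of (8) and the trivial bound (9) can be checked on small examples. The following is a minimal sketch using exact rational arithmetic. It assumes that $U(W_i, z, w_i, c_1, c_2)$ is the indicator that at least one of the two cases described before (8) holds, that is, $c_1$ or $c_2$ is absent from the colors on $N(z) \setminus \{w_i\}$. It also assumes that $A(W, z)$ counts the colors of $[k]$ unused on $N(z)$. The helper names are hypothetical.

```python
from fractions import Fraction


def num_available(coloring, graph, z, k):
    """A(W, z): number of colors in range(k) not used on the neighbours of z."""
    return k - len({coloring[u] for u in graph[z]})


def drift_bound(w_i, w_next, graph, k, w):
    """Right-hand side of (8) for colorings w_i and w_next that differ only at w.

    Assumes U(W_i, z, w, c1, c2) is the indicator that c1 or c2 is absent
    from the colors on N(z) minus {w}."""
    n = len(graph)
    c1, c2 = w_i[w], w_next[w]
    total = Fraction(0)
    for z in graph[w]:
        # Colors on N(z) \ {w}; these are the same in w_i and w_next.
        others = {w_i[u] for u in graph[z] if u != w}
        if c1 not in others or c2 not in others:
            a = min(num_available(w_i, graph, z, k), num_available(w_next, graph, z, k))
            total += Fraction(1, a)
    return Fraction(-1, n) + total / n
```

On the 5-cycle with $k = 4$ (so $\Delta = 2$ and $k - \Delta \ge \Delta/3$), the pair $(0,1,0,1,2)$ and $(3,1,0,1,2)$, which differ at vertex 0, gives a bound of exactly $0$. This is within the $2/n$ of (9).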

This bound will only be used for the burn-in phase of $T_b$ steps. We need to do significantly better for the remaining $T_m - T_b$ steps of an epoch. Assume that $\mathcal{G}(t)$ holds. We bound the right-hand side of (8) separately for part (a) and part (b) of Theorem 1.

Suppose $G$ has girth $\ge 5$ and $k = (1+\epsilon)\alpha^*\Delta$, $\epsilon < 0.3$. For all $0 \le i \le h$, $z \in B_R$, and $t \in [T_b, T_m - 1]$,

$$
A(W_i, z) \ge A(X_t, z) - \Delta^{2/3} \ge \Theta_0 - \Delta^{2/3}.
$$

Hence, for $t \in [T_b, T_m - 1]$, given $W_i, W_{i+1}$,

$$
\mathbb{E}\left(H(W_i', W_{i+1}')\right) - H(W_i, W_{i+1}) \le -\frac{1}{n} + \frac{\Delta}{(\Theta_0 - \Delta^{2/3})n} \le -\frac{\epsilon}{4n}. \tag{11}
$$

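Suppose $G$ has girth $\ge 7$ and $k = (1+\epsilon)\beta^*\Delta$, $\epsilon < 0.3$. For all $0 \le i \le h$, $z \in B_R$, $c_1, c_2 \in [k]$, and $t \in [T_b, T_m - 1]$,

$$
\sum_{y \in N(z)} \frac{\mathbf{1}\left(U(W_i, y, z, c_1, c_2)\right)}{\min\{A(W_i, y), A(W_{i+1}, y)\}}
\le \sum_{y \in N(z)} \frac{\mathbf{1}\left(U(X_t, y, z, c_1, c_2)\right)}{A(X_t, y) - \Delta^{2/3}} + \frac{\Delta^{5/6}}{k - \Delta}
\le \Psi_0 + 5\Delta^{-1/6}, \tag{12}
$$

since $A(X_t, y) \ge k - \Delta > \Delta/3$. Plugging (12) into (8) proves (11) for part (b) of the Theorem. Therefore, for both parts (a) and (b) of the Theorem, for $t \in [T_b, T_m - 1]$, given $X_t, Y_t$,

$$
\mathbb{E}\left(H(X_{t+1}, Y_{t+1})\right) - H(X_t, Y_t) \le -\frac{\epsilon}{4n} H(X_t, Y_t). \tag{13}
$$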

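Let $t \in [T_b, T_m - 1]$. Then

$$
\begin{aligned}
\mathbb{E}\left(H_{t+1}\mathbf{1}(\mathcal{G}(t))\right)
&= \mathbb{E}\left(\mathbb{E}\left(H_{t+1}\mathbf{1}(\mathcal{G}(t)) \mid X_0, Y_0, \dots, X_t, Y_t\right)\right) \\
&= \mathbb{E}\left(\mathbb{E}\left(H_{t+1} \mid X_0, Y_0, \dots, X_t, Y_t\right)\mathbf{1}(\mathcal{G}(t))\right) \\
&\le \left(1 - \frac{\epsilon}{4n}\right)\mathbb{E}\left(H_t\mathbf{1}(\mathcal{G}(t))\right) \\
&\le \left(1 - \frac{\epsilon}{4n}\right)\mathbb{E}\left(H_t\mathbf{1}(\mathcal{G}(t-1))\right).
\end{aligned}
$$

The above derivation deserves some explanation. The first equality is Fubini's Theorem, and the second holds because $\mathcal{G}(t)$ is determined by $X_0, Y_0, \dots, X_t, Y_t$. The first inequality uses (13) and the definition of $\mathcal{G}(t)$. The second uses $\mathcal{G}(t) \subseteq \mathcal{G}(t-1)$ together with $H_t \ge 0$.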

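By induction, it follows that

$$
\mathbb{E}\left(H_{T_m}\mathbf{1}(\mathcal{G})\right) \le \left(1 - \frac{\epsilon}{4n}\right)^{T_m - T_b} \mathbb{E}\left(H_{T_b}\mathbf{1}(\mathcal{G}(T_b))\right).
$$

By (10) and the same argument for $t \in [0, T_b - 1]$,

$$
\mathbb{E}\left(H_{T_m}\mathbf{1}(\mathcal{G})\right) \le \left(1 - \frac{\epsilon}{4n}\right)^{T_m - T_b}\left(1 + \frac{2}{n}\right)^{T_b} H_0. \tag{14}
$$

□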
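The closing step, that the right-hand side of (14) is below $1/4$ for a suitable choice of constants, is a short calculation. Since $1 - x \le e^{-x}$ and $1 + x \le e^{x}$, the right-hand side of (14) with $H_0 = 1$ is at most $\exp\left(-\epsilon(T_m - T_b)/(4n) + 2T_b/n\right)$. Taking $T_b = bn$ and $T_m = Cn$, it therefore suffices that $C - b > 4(2b + \ln 4)/\epsilon$. The sketch below computes the smallest such $T_m$ exactly for given $n$, $\epsilon$, $T_b$. It takes the burn-in growth rate $2/n$ from (10). This section does not fix $T_b$ or $T_m$, so the parameter values used with it are illustrative only, and the function names are hypothetical.

```python
import math


def epoch_factor(n, eps, t_b, t_m):
    """Right-hand side of (14) with H_0 = 1, using the burn-in growth rate 2/n from (10)."""
    return (1 - eps / (4 * n)) ** (t_m - t_b) * (1 + 2 / n) ** t_b


def min_epoch_length(n, eps, t_b, target=0.25):
    """Smallest t_m >= t_b with epoch_factor(n, eps, t_b, t_m) < target."""
    # Solve (t_m - t_b) log(1 - eps/(4n)) + t_b log(1 + 2/n) < log(target) for t_m.
    ratio = (math.log(target) - t_b * math.log1p(2 / n)) / math.log1p(-eps / (4 * n))
    t_m = t_b + max(0, math.floor(ratio) + 1)
    # Correct any floating-point error near the boundary.
    while epoch_factor(n, eps, t_b, t_m) >= target:
        t_m += 1
    while t_m > t_b and epoch_factor(n, eps, t_b, t_m - 1) < target:
        t_m -= 1
    return t_m
```

A longer burn-in inflates the distance by up to $(1 + 2/n)^{T_b}$, so the required epoch length grows with $T_b$.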

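The result follows from the choice of constants (note that $H_0 = 1$). This completes the proof of Theorem 9.1. We now prove Theorem 9.3.

Proof of Theorem 9.3. Consider the event $\mathcal{B}_1'$ that $D_{T_m} \not\subseteq B_{R'}(v)$, where $R' = R - \sqrt{\Delta}$. Proceeding as in the upper bound on $\mathcal{B}_1$, we have

$$
\Pr(\mathcal{B}_1') \le \Delta^{R'}\binom{T_m}{R'}\left(\frac{1}{n(k-\Delta)}\right)^{R'} < \left(\frac{\Delta C_9 e}{(k-\Delta)R'}\right)^{R'} < \exp\left(-\sqrt{\Delta}\right)
$$

for $\Delta$ sufficiently large, since $R' > \sqrt{\Delta}$.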

Hence, we can assume the disagreements are contained in $B_{R'}(v)$. By Lemma 6, each vertex $w \in B_{R'}(v)$ is not $(2e)$-heavy with probability at least $1 - \exp(-\Delta/C_6)$. Therefore, by a union bound, all $w \in B_{R'}(v)$ are above suspicion with probability at least $1 - 2\exp(-\sqrt{\Delta})$.
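Both tail estimates in this section, for $\mathcal{B}_4$ and for $\mathcal{B}_1'$, rest on $\binom{N}{r}p^r \le (eNp/r)^r$, which follows from $\binom{N}{r} \le (eN/r)^r$. The sketch below evaluates this bound in log space and checks it against exact binomials on small inputs. It then plugs in illustrative constants. It assumes $T_m = C_9 n$, as the displayed bounds suggest, and uses the hypothetical values $C_9 = 10$ and $k = 1.5\Delta$, which are not taken from the paper. With these values the $\mathcal{B}_4$ estimate falls below $\exp(-\Delta^{5/6})$ at $\Delta = 2^{60}$ but not at $\Delta = 2^{12}$. So the qualifier "for sufficiently large $\Delta$" matters quantitatively.

```python
import math


def log_binomial_bound(total, r, p):
    """log of (e * total * p / r)^r, an upper bound on log(C(total, r) * p^r)
    that follows from C(total, r) <= (e * total / r)^r."""
    return r * math.log(math.e * total * p / r)


def log_exact(total, r, p):
    """log(C(total, r) * p^r), computed exactly with math.comb (small inputs only)."""
    return math.log(math.comb(total, r)) + r * math.log(p)


# B_1': Delta^{R'} C(T_m, R') (1/(n(k - Delta)))^{R'} = C(T_m, R') p^{R'}
# with p = Delta / (n(k - Delta)); illustrative Delta = 10**6, R' = 1001 > sqrt(Delta).
delta, c9, n, k, r1 = 10**6, 10, 10**7, 1_500_000, 1001
b1_log = log_binomial_bound(c9 * n, r1, delta / (n * (k - delta)))

# B_4: total = Delta^{5/3} + Delta^{2/3}, r = Delta^{5/6}, p = C_9 / (k - Delta), k = 1.5 Delta.
b4_log_small = log_binomial_bound(2**20 + 2**8, 2**10, 10 / 2**11)    # Delta = 2**12
b4_log_large = log_binomial_bound(2**100 + 2**40, 2**50, 10 / 2**59)  # Delta = 2**60
```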

References

[1] J. van den Berg and J.E. Steif. Percolation and the hard-core lattice gas model. Stochastic Processes and their Applications, 49(2):179–197, 1994.

[2] G.R. Brightwell and P. Winkler. Random colorings of a Cayley tree. Contemporary Combinatorics, 10:247–276, 2002.

[3] R. Bubley and M.E. Dyer. Path coupling: a technique for proving rapid mixing in Markov chains. In Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 223–231, 1997.

[4] M. Dyer and A. Frieze. Randomly coloring graphs with lower bounds on girth and maximum degree. Random Structures and Algorithms, 23(2):167–179, 2003.

[5] M.E. Dyer, A.M. Frieze and R. Kannan. A random polynomial time algorithm for approximating the volume of convex bodies. Journal of the Association for Computing Machinery, 38(1):1–17, 1991.

[6] T.P. Hayes. Randomly coloring graphs of girth at least five. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC), 269–278, 2003.

[7] T.P. Hayes. Uniformity Properties for Glauber Dynamics on Graph Colorings. Submitted to Random Structures and Algorithms. Version dated March 5, 2009, available from: http://www.cc.gatech.edu/~vigoda/Hayes.pdf

[8] T.P. Hayes and E. Vigoda. A Non-Markovian Coupling for Randomly Sampling Colorings. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 618–627, 2003.

[9] T.P. Hayes and E. Vigoda. Variable length path coupling. Random Structures and Algorithms, 31(3):251–272, 2007.

[10] M.R. Jerrum. A very simple algorithm for estimating the number of k-colorings of a low-degree graph. Random Structures and Algorithms, 7(2):157–165, 1995.

[11] M.R. Jerrum, A. Sinclair and E. Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with non-negative entries. Journal of the Association for Computing Machinery, 51(4):671–697, 2004.

[12] M.R. Jerrum, L.G. Valiant and V.V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43(2–3):169–188, 1986.

[13] R. Kannan, L. Lovász and M. Simonovits. Random walks and an O*(n⁵) volume algorithm for convex bodies. Random Structures and Algorithms, 11(1):1–50, 1997.

[14] L. Lovász and S. Vempala. Simulated Annealing in Convex Bodies and an O*(n⁴) Volume Algorithm. Journal of Computer and System Sciences, 72(2):392–417, 2006.

[15] M. Molloy. The Glauber dynamics on colorings of a graph with high girth and maximum degree. SIAM Journal on Computing, 33(3):712–737, 2004.

[16] J. Salas and A. Sokal. Absence of phase transition for antiferromagnetic Potts models via the Dobrushin uniqueness theorem. Journal of Statistical Physics, 86(3–4):551–579, 1997.

[17] D. Stefankovic, S. Vempala and E. Vigoda. Adaptive Simulated Annealing: A Near-optimal Connection between Sampling and Counting. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 183–193, 2007.


[18] E. Vigoda. Improved bounds for sampling colorings. Journal of Mathematical Physics, 41(3):1555–1569, 2000.
