RECONSTRUCTION SEQUENCES AND EQUIPARTITION MEASURES: AN EXAMINATION OF THE ASYMPTOTIC EQUIPARTITION PROPERTY

J.T. Lewis$^a$, C.-E. Pfister$^b$, R. Russell$^a$, W.G. Sullivan$^{a,c}$

$^a$ Dublin Institute for Advanced Studies, 10 Burlington Road, Dublin 4, Ireland
$^b$ Ecole Polytechnique Federale, Departement de Mathematiques, CH-1015 Lausanne, Switzerland
$^c$ University College Dublin, Department of Mathematics, Belfield, Dublin 4, Ireland

Abstract: We consider a stationary source emitting letters from a finite alphabet $A$. The source is described by a stationary probability measure $\mu$ on the space $\Omega := A^{\mathbb{N}}$ of sequences of letters. Denote by $\Omega_n$ the set of words of length $n$ and by $\mu_n$ the probability measure induced on $\Omega_n$ by $\mu$. We consider sequences $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ having special properties. Call $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ a supporting sequence for $\mu$ if $\lim_n \mu_n[\Gamma_n] = 1$. It is well known that the exponential growth-rate of a supporting sequence is bounded below by $h_{Sh}(\mu)$, the Shannon entropy of the source $\mu$. For efficient simulation, we require $\Gamma_n$ to be as large as possible, subject to the condition that the measure $\mu_n$ is approximated by the equipartition measure $\mu_n^{\Gamma_n}$, the probability measure on $\Omega_n$ which gives equal weight to the words in $\Gamma_n$ and zero weight to words outside it. We say that a sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is a reconstruction sequence for $\mu$ if each $\Gamma_n$ is invariant under cyclic permutations and $\lim_n \mu_m^{\Gamma_n} = \mu_m$ for each $m \in \mathbb{N}$. We prove that the exponential growth-rate of a reconstruction sequence is bounded above by $h_{Sh}(\mu)$. We use a large-deviation property of the cyclic empirical measure to give a constructive proof of an existence theorem: if $\mu$ is a stationary source, then there exists a reconstruction sequence for $\mu$ having maximal exponential growth-rate; if $\mu$ is ergodic, then the reconstruction sequence may be chosen so as to be supporting for $\mu$. We prove also a characterization of ergodic measures which appears to be new.

Key Words: asymptotic equipartition property, empirical measure, stationary, ergodic, Kolmogorov, reconstruction, large deviations


1 Introduction

Let $\Omega_n$ denote the set of words of length $n$ formed using letters taken from a finite alphabet $A$ of size $r$; if $\Gamma_n$ is a subset of $\Omega_n$, then we denote the number of elements in $\Gamma_n$ by $\#\Gamma_n$. Let $\Omega := A^{\mathbb{N}}$ denote the set of all sequences of letters from $A$, the set of words of infinite length. Suppose the source emitting the letters which form the words is described by a stationary probability measure $\mu$ on the space $\Omega$; can we find a sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ of sets of words of increasing length from which we can reconstruct the measure $\mu$? Take $A = \{1, 0\}$ with $\mu$ the Bernoulli $(\frac{1}{3}, \frac{2}{3})$ measure on $\Omega = A^{\mathbb{N}}$. Define, for $n$ such that $n/3$ is an integer,

  $\Gamma_n := \{a \in \Omega_n : \frac{1}{n}\sum_{j=1}^{n} a_j = \frac{1}{3}\}$ ;   (1.1)

these are the words of length $n$ in which the relative frequencies of ones and zeroes are $\frac{1}{3}$ and $\frac{2}{3}$. We claim that the measure $\mu$ is determined completely by the sequence $\{\Gamma_n \subset \Omega_n : n/3 \in \mathbb{N}\}$. The first step is to construct a sequence of equipartition measures. Define $\mu_n^{\Gamma_n}$ to be the probability measure on $\Omega_n$ which gives equal weight to the words in $\Gamma_n$ and zero weight to words outside it: for each subset $\Lambda_n$ of $\Omega_n$, put

  $\mu_n^{\Gamma_n}[\Lambda_n] = \frac{\#(\Lambda_n \cap \Gamma_n)}{\#\Gamma_n}$ .   (1.2)

For $m < n$, every measure $\nu_n$ on $\Omega_n$ induces a measure $\nu_m$ on $\Omega_m$ via the projection $X_m^n : \Omega_n \to \Omega_m$ which selects the first $m$ letters from a word of length $n$. We claim that

  $\lim_{n \to \infty,\ n/3 \in \mathbb{N}} \mu_m^{\Gamma_n}[a] = \mu_m[a]$   (1.3)

for every $a \in \Omega_m$ and every $m \in \mathbb{N}$; here $\mu_m$ is the measure on $\Omega_m$ induced by the measure $\mu$ on $\Omega$ via the projection $X_m : \Omega \to \Omega_m$ which selects the first $m$ letters in an infinite sequence. But the set $\{\mu_m[a] : a \in \Omega_m,\ m \in \mathbb{N}\}$ is precisely the data required, according to Kolmogorov's Reconstruction Theorem [K], to determine the measure $\mu$ completely. Our claim (1.3) can be proved using a conditional limit theorem of van Campenhout and Cover [CC]; see (6.13) of Section 6. We say that a sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is a reconstruction sequence for $\mu$ if each $\Gamma_n$ is invariant under cyclic permutations and

  $\lim_n \mu_m^{\Gamma_n} = \mu_m$ for each $m \in \mathbb{N}$ ;   (1.4)

an alternative definition of the concept is discussed in Section 6. The concept of a reconstruction sequence for $\mu$ is illustrated by the example of the sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ defined by (1.1). For efficient simulation, we would like the sequence to grow as fast as possible so that we have large samples of words of reasonable length. Consider the sequence constructed using a thickened shell:

  $\Gamma_n^\delta := \{a \in \Omega_n : |\frac{1}{n}\sum_{j=1}^{n} a_j - \frac{1}{3}| \le \delta\}$ ;   (1.5)


this sequence has a faster growth-rate than the sequence defined by (1.1); it is a reconstruction sequence, but not for $\mu$: for $0 < \delta < \frac{1}{6}$, the sequence $\{\mu^{\Gamma_n^\delta}\}$ converges to $\mu^\delta$, the Bernoulli $(\frac{1}{3} + \delta, \frac{2}{3} - \delta)$ measure on $\Omega$. This can be deduced from a conditional limit theorem proved in [LPS1] (see also [LPS2]). (For $\delta \ge \frac{1}{6}$, we recover the Bernoulli $(\frac{1}{2}, \frac{1}{2})$ measure.) These examples illustrate a property of reconstruction sequences: they cannot grow too quickly. In fact, we have the following upper bound on the growth-rate:

• If $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is a reconstruction sequence for $\mu$, then

  $\limsup_n \frac{1}{n} \log \#\Gamma_n \le h_{Sh}(\mu)$ ,   (1.6)

where $h_{Sh}(\mu)$ is the Shannon entropy of $\mu$. There are reconstruction sequences which grow very slowly; in Section 6, we give a proof of the following result: Let $\mu$ be a stationary source; then there exists a reconstruction sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ for $\mu$ which has zero growth-rate:

  $\limsup_n \frac{1}{n} \log \#\Gamma_n = 0$ .   (1.7)

We have the following existence theorem:

• Let $\mu$ be a stationary source; then there exists a reconstruction sequence for $\mu$ having maximal growth-rate.

We turn our attention to another property which a sequence of sets of words may have: we call a sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ a supporting sequence for $\mu$ if

  $\lim_n \mu_n[\Gamma_n] = 1$ ,   (1.8)

where $\mu_n$ is the probability measure induced on $\Omega_n$. The sequence defined by (1.5) is, for all values of $\delta > 0$, an example of a supporting sequence for the Bernoulli $(\frac{1}{3}, \frac{2}{3})$ measure while that defined by (1.1) fails to be. A supporting sequence cannot grow too slowly. We have the following lower bound to the growth-rate:

• If $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is a supporting sequence for $\mu$, then

  $\liminf_n \frac{1}{n} \log \#\Gamma_n \ge h_{Sh}(\mu)$ .   (1.9)

For economical coding, it is important to have a supporting sequence which grows as slowly as possible. We have the following existence theorem:


• Let $\mu$ be an ergodic source; then there exists a supporting sequence for $\mu$ having minimal growth-rate.
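Before continuing, the convergence (1.3) for the example (1.1) can be watched numerically. The following sketch (the helper names are ours, not the paper's) enumerates $\Gamma_n$ for the Bernoulli $(\frac{1}{3}, \frac{2}{3})$ source and compares a marginal of the equipartition measure with the true marginal $\mu_m$.

```python
from itertools import product

# Sketch of (1.1)-(1.3) for the Bernoulli (1/3, 2/3) source; helper names
# are assumptions of ours.  Gamma_n holds the words of length n with exactly
# n/3 ones; the equipartition measure weights them equally.

def gamma_n(n):
    """The set Gamma_n of (1.1): words in {0,1}^n with exactly n/3 ones."""
    assert n % 3 == 0
    return [w for w in product((0, 1), repeat=n) if sum(w) == n // 3]

def equipartition_marginal(n, a):
    """mu_m^{Gamma_n}[a]: the fraction of words in Gamma_n beginning with a."""
    words = gamma_n(n)
    return sum(1 for w in words if w[:len(a)] == a) / len(words)

def mu_m(a):
    """mu_m[a] for the Bernoulli (1/3, 2/3) measure (letter 1 has probability 1/3)."""
    p = 1.0
    for letter in a:
        p *= 1/3 if letter == 1 else 2/3
    return p

# The equipartition marginal of the pair (1, 1) approaches mu_2[(1, 1)] = 1/9:
for n in (6, 12, 18):
    print(n, equipartition_marginal(n, (1, 1)), mu_m((1, 1)))
```

For the single-letter marginal the agreement is exact by symmetry; for longer words the gap closes as $n$ grows, which is the content of (1.3).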

Since the Shannon entropy is a lower bound on the growth-rate of a supporting sequence and an upper bound on the growth-rate of a reconstruction sequence, a sequence which has both properties has a growth-rate equal to the Shannon entropy. Let us examine how we can modify the construction (1.5) in order to get a sequence which is both a reconstruction sequence for $\mu$ and a supporting sequence for $\mu$. Define

  $\Gamma_n' := \{a \in \Omega_n : |\frac{1}{n}\sum_{j=1}^{n} a_j - \frac{1}{3}| \le \log n/\sqrt{n}\}$ .   (1.10)

We can use the conditional limit theorem in [LPS1] to prove that the sequence $\{\Gamma_n'\}$ has the reconstruction property and the Central Limit Theorem to prove that it has the supporting property for the Bernoulli $(\frac{1}{3}, \frac{2}{3})$ measure.

Let us examine this construction more closely. It selects those words of length $n$ for which the relative frequency of ones lies in a closed neighbourhood of $\frac{1}{3}$ (and hence the relative frequency of zeroes lies in a closed neighbourhood of $\frac{2}{3}$). We can think of the measure $\mu$ as being described by a vector $(\frac{1}{3}, \frac{2}{3})$. Introducing a relative frequency vector

  $R_n(a) := (\frac{1}{n}\sum_{j=1}^{n} a_j,\ 1 - \frac{1}{n}\sum_{j=1}^{n} a_j)$ ,   (1.11)

we can re-write (1.10) as

  $\Gamma_n' := R_n^{-1} F_n$ ,   (1.12)

where $F_n$ is the closed ball of radius $\log n/\sqrt{n}$ centred on the point $(\frac{1}{3}, \frac{2}{3})$. In other words, what we have done is to define a mapping $R_n$ from $\Omega_n$ to the space of Bernoulli measures and a decreasing sequence $\{F_n\}$ of closed neighbourhoods of $\mu$ in the space of Bernoulli measures whose intersection is $\{\mu\}$, and taken $\Gamma_n'$ to be those words $a \in \Omega_n$ for which $R_n(a)$ lies in $F_n$. This choice has some nice properties:

1. the set $\Gamma_n'$ is invariant under cyclic permutations of the letters in the words; this is important because the measure $\mu$ which we are attempting to approximate is stationary;

2. the set $\Gamma_n'$ is nonempty for all $n$ sufficiently large; this is important because we want to condition on it.

In order to prove our existence theorems, we need to generalize the construction which produced the sequence $\{\Gamma_n'\}$. We introduce a class of sequences called canonical sequences; to define these, we make use of the cyclic empirical measure, a mapping $T_n$ from $\Omega$ to $M_1^+(\Omega)$, the space of probability measures on $\Omega$. The cyclic empirical measure is a generalization of the relative frequency vector which will do what we want in the general case; its precise definition will be given later. For the present, we will describe it in terms of its marginals. For each $\omega \in \Omega$,


we have a measure $T_n(\omega)[\,\cdot\,]$ defined on subsets of $\Omega$; the projection $X_m : \Omega \to \Omega_m$ induces a measure $T_{n,m}(\omega)[\,\cdot\,]$ on the subsets of $\Omega_m$:

  $T_{n,m}(\omega)[\Lambda_m] := T_n(\omega)[X_m^{-1}\Lambda_m]$ .   (1.13)

For $m \le n$ and $a = (a_1, \ldots, a_m) \in \Omega_m$, we can describe $T_{n,m}(\omega)[a]$ directly. Consider the $n$ cyclic permutations of the word $X_n\omega = (\omega_1, \ldots, \omega_n)$:

  $(\omega_1, \ldots, \omega_n),\ (\omega_2, \ldots, \omega_n, \omega_1),\ \ldots,\ (\omega_n, \omega_1, \ldots, \omega_{n-1})$ ;   (1.14)

then $T_{n,m}(\omega)[a]$ is the fraction of these in which the first $m$ entries coincide with $a = (a_1, \ldots, a_m)$. Thus $T_{n,1}(\omega)[a_1]$ is just the relative frequency of the letter $a_1$ in the word $X_n\omega$, $T_{n,2}(\omega)[(a_1, a_2)]$ is the relative frequency of the adjacent pair $(a_1, a_2)$ in the (cyclic) word $X_n\omega$, and so on. We take $\Gamma_n$ to be the set of words $X_n\omega = (\omega_1, \ldots, \omega_n)$ in $\Omega_n$ for which $T_{n,m}(\omega)[a]$ is close to $\mu_m[a]$ for all $m \le n$ and all $a \in \Omega_m$. The sequence $\{\Gamma_n\}$ is a canonical sequence. Of course, it is necessary to say what we mean by `close to'; that is what is accomplished by the formal definition: let $\{F_n\}$ be a decreasing sequence of closed neighbourhoods of $\mu$ in the space of measures whose intersection is $\{\mu\}$; for each $n$, the measure $T_n(\omega)$ depends only on the first $n$ coordinates of $\omega$ and so $T_n^{-1}F_n$ determines a subset $\Gamma_n$ of $\Omega_n$; a sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ constructed in this way, with $\Gamma_n$ nonempty for all $n$ sufficiently large, is called the canonical sequence based on $\{F_n\}$. The definition of $T_n$ ensures that the set $\Gamma_n$ is cyclically invariant. Our reason for introducing the concept of a canonical sequence is the following result, which holds for an arbitrary stationary source $\mu$:

• Every canonical sequence for $\mu$ is a reconstruction sequence for $\mu$.

All we have done so far is to push the problem of existence one stage back: does there exist a canonical sequence for an arbitrary stationary source $\mu$? There is no difficulty in finding a sequence of neighbourhoods which contract to $\mu$; the problem is to prove that the subsets $\Gamma_n$ which they determine are non-empty, at least for all $n$ sufficiently large. One way of doing this is to show that the growth-rate of $\{\Gamma_n\}$ is strictly positive; this will be the case if the sequence of neighbourhoods of $\mu$ contracts sufficiently slowly.
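The cyclic-permutation description of the marginals $T_{n,m}$ can be sketched directly; the helper below is our own illustration, not the paper's notation.

```python
# Sketch of the marginal T_{n,m}(omega)[a] described above: among the n
# cyclic permutations of the word (omega_1, ..., omega_n), count the
# fraction whose first m entries coincide with a.

def cyclic_marginal(word, a):
    n, m = len(word), len(a)
    rotations = [word[k:] + word[:k] for k in range(n)]
    return sum(1 for r in rotations if r[:m] == a) / n

word = (1, 0, 0, 1, 0, 0)
# T_{n,1}(omega)[1] is the relative frequency of the letter 1 in the word;
# T_{n,2}(omega)[(0,1)] is the relative frequency of the adjacent pair (0,1)
# read cyclically:
print(cyclic_marginal(word, (1,)), cyclic_marginal(word, (0, 1)))
```

Note that for each fixed $m$ the values $T_{n,m}(\omega)[a]$ sum to one over $a \in \Omega_m$, so $T_{n,m}(\omega)$ really is a probability measure on $\Omega_m$.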
Our strategy is to start with an arbitrary sequence of closed neighbourhoods contracting to $\mu$ and slow its rate of contraction until we are sure that the corresponding subsets $\Gamma_n$ are growing fast enough; to check on this, we use large-deviation theory. (In fact, we use only the most basic result of the theory: the large-deviation lower bound, a direct consequence of the existence of the rate-function; see [LP], for example. A derivation of the large-deviation properties of the cyclic empirical measure which we require can be found in [LPS].) We prove the following result:

• Let $\mu$ be a stationary source; then there exists a canonical sequence for $\mu$ having maximal growth-rate.

A canonical sequence is not necessarily supporting. In the case of a Bernoulli measure, we were able to use the Central Limit Theorem to find a rate which makes the


sequence $\{R_n^{-1}F_n\}$ supporting; in the general case, we do not have such a precise estimate available. Nevertheless, when the measure $\mu$ is ergodic we are able to use the Ergodic Theorem to prove the existence of a canonical sequence which is supporting. The converse also holds, so that we have the following characterization of ergodic measures:

• Let $\mu$ be a stationary source; then $\mu$ is ergodic if and only if there exists a canonical sequence which is supporting for $\mu$.

We have seen that canonical sequences of subsets are useful and arise naturally in the reconstruction problem for stationary sources. It is instructive to compare them with the set of `typical' sequences of letters associated with an ergodic source. Let $\mu$ be an ergodic source; there exists a set $\Omega(\mu) \subset \Omega$ with $\mu[\Omega(\mu)] = 1$ such that each sequence $\omega$ in $\Omega(\mu)$ determines $\mu$ uniquely (see Section 6). For a stationary source $\mu$, a canonical sequence plays an analogous role: let $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ be a canonical sequence for $\mu$; any sequence $\{a_n \in \Gamma_n : n \in \mathbb{N}\}$ of words determines $\mu$ uniquely (this is proved in Section 6). A canonical sequence has some advantages over the typical set: one is that every stationary source has a canonical sequence, so it is not necessary that the source be ergodic; another is that a canonical sequence is associated with an increasing sequence $\{\mathcal{F}_n : n \in \mathbb{N}\}$ of $\sigma$-algebras, where $\mathcal{F}_n$ is generated by the first $n$ coordinate functions, while the typical set $\Omega(\mu)$ is in the tail $\sigma$-algebra, so that the first $n$ coordinates of an element of $\Omega(\mu)$ are irrelevant.
To put our results in context, it may be useful to recall the Asymptotic Equipartition Property: in terms of the concepts used here, the conclusion of the theorem of Shannon-McMillan-Breiman ([S], [M], [B]) may be stated: Let $\mu$ be an ergodic source; then for each $\epsilon > 0$ there exists a sequence $\{\Gamma_n^\epsilon\}$ which is supporting for $\mu$ and whose growth-rate satisfies

  $h_{Sh}(\mu) \le \liminf_n \frac{1}{n} \log \#\Gamma_n^\epsilon \le \limsup_n \frac{1}{n} \log \#\Gamma_n^\epsilon \le h_{Sh}(\mu) + \epsilon$ .   (1.15)

It follows from the construction used in the proof that each word $a \in \Gamma_n^\epsilon$ satisfies

  $b^{-n(h_{Sh}(\mu) + \epsilon)} \le \mu_n[a] \le b^{-n(h_{Sh}(\mu) - \epsilon)}$ ,   (1.16)

where $b$ is the base of logarithms used in the definition of the Shannon entropy (see (2.15) below); this is the origin of the name asymptotic equipartition property. The sequence $\{\Gamma_n^\epsilon\}$ is not a reconstruction sequence for $\mu$. In Section 6, we discuss how this construction may be refined to yield a sequence which has both properties.
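For a concrete feel for (1.15)-(1.16), here is a numerical sketch for the Bernoulli $(\frac{1}{3}, \frac{2}{3})$ source with base $b = 2$; the choice of source, of $\epsilon$, and the helper names are our assumptions.

```python
from itertools import product
from math import log2

p = 1/3
h = -(p * log2(p) + (1 - p) * log2(1 - p))   # h_Sh(mu), about 0.918 bits

def mu_n(word):
    """mu_n[a] for the Bernoulli (1/3, 2/3) source."""
    k = sum(word)
    return p ** k * (1 - p) ** (len(word) - k)

def typical_set(n, eps):
    """Words satisfying the two-sided bound (1.16) with b = 2."""
    lo, hi = 2 ** (-n * (h + eps)), 2 ** (-n * (h - eps))
    return [w for w in product((0, 1), repeat=n) if lo <= mu_n(w) <= hi]

for n in (8, 12, 16):
    gamma = typical_set(n, eps=0.2)
    mass = sum(mu_n(w) for w in gamma)       # mu_n[Gamma_n^eps], tending to 1
    rate = log2(len(gamma)) / n              # growth-rate, squeezed near h
    print(n, round(mass, 3), round(rate, 3))
```

The measure of the typical set approaches one while its growth-rate stays within $\epsilon$ of the entropy, as (1.15) asserts.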


2 Statement of Results

In this section, we make precise the concepts introduced informally in Section 1 and sketch proofs of our main theorems. The main result of this paper is an existence theorem:

Theorem 2.1 Let $\mu$ be a stationary source; then there exists a reconstruction sequence for $\mu$ having maximal growth-rate. If, in addition, $\mu$ is ergodic, then the reconstruction sequence may be chosen so as to be a supporting sequence for $\mu$.

We will give a constructive proof of this theorem. A by-product of this investigation is a characterization of ergodic measures:

Theorem 2.2 Let $\mu$ be a stationary source; then $\mu$ is ergodic if and only if there exists a canonical sequence for $\mu$ which is supporting for $\mu$.

This section is, to some extent, self-contained: we recall the definitions and results required to understand the concepts defined here. We state six lemmas, indicating roughly on what their proofs depend; we prove our existence theorem using the first five, and the sixth is used to complete the proof of the characterization of ergodic measures. The reader who is prepared to accept the lemmas need read no further. The first two lemmas are proved in Section 3 using properties of the specific information gain defined there. The third lemma is crucial: it states that every canonical sequence is a reconstruction sequence; it is proved in Section 4.

To construct sequences with the required properties, we make use of the cyclic empirical measure to define canonical sequences of subsets. The sequence of probability distributions of the cyclic empirical measure with respect to the uniform product measure on $\Omega$ satisfies a large deviation principle with the specific information gain as rate-function. This is exploited in Section 5, where the fourth, fifth and sixth lemmas are proved. Some of the ideas have their origin in statistical mechanics; some readers will find reference to this confusing while others will find it enlightening. Having in mind those in the first category, we make no reference to statistical mechanics in the body of the paper; for the others, we provide in the final section, Section 6, a commentary on the concepts and results. We now make precise the structures we are considering. The space $\Omega = A^{\mathbb{N}}$ is the space of infinite sequences with entries taken from a finite alphabet $A = \{a^{(1)}, \ldots, a^{(r)}\}$ having $r > 1$ letters; the map $x_j : \Omega \to A$ is the coordinate projection onto the $j$th factor in the product. Let $\mathcal{F}_n = \sigma(x_1, \ldots, x_n)$ be the $\sigma$-algebra generated by the first $n$ coordinate functions and let $\mathcal{F} = \sigma(x_n : n \in \mathbb{N})$ be the $\sigma$-algebra generated by all coordinate functions. Since $A$ contains $r$ elements, the $\sigma$-algebra $\mathcal{F}_n$ is generated by the $r^n$ atoms $\{A_a = X_n^{-1}a : a \in A^n\}$, where $X_n : \Omega \to \Omega_n := A^n$ is the projection onto the first $n$ coordinates. Sometimes the discussion can be clarified by working on $\Omega_n$ rather than $\Omega$. Let $\mu$ be a probability measure defined on $\mathcal{F}$. On $\Omega_n$


we have $\mu_n$, the image measure defined on the subsets of $\Omega_n$ by $\mu_n[B] = \mu[X_n^{-1}B]$. Equivalently, one could consider $\mu$ on $\Omega$ restricted to the $\sigma$-algebra $\mathcal{F}_n$. The two viewpoints are complementary. Recall that, for $m < n$, every measure $\nu_n$ on $\Omega_n$ induces a measure $\nu_m$ on $\Omega_m$ via the projection $X_m^n : \Omega_n \to \Omega_m$ which selects the first $m$ letters from a word of length $n$; since $X_m = X_m^n \circ X_n$, it follows that if the measures $\{\nu_n : n \in \mathbb{N}\}$ are induced from a probability measure $\nu$ on $\mathcal{F}$, so that for all $n \in \mathbb{N}$ we have

  $\nu_n = \nu \circ X_n^{-1}$ ,   (2.1)

then they satisfy the compatibility conditions

  $\nu_m = \nu_n \circ (X_m^n)^{-1}$   (2.2)

for all $m \in \mathbb{N}$ and all $n > m$. Conversely, Kolmogorov's Reconstruction Theorem [K] implies that given a sequence $\{\nu_n : n \in \mathbb{N}\}$ of probability measures satisfying the compatibility conditions (2.2), there exists a unique probability measure $\nu$ on $\mathcal{F}$ such that for all $n \in \mathbb{N}$ the probability measures $\nu_n$ are given by (2.1). For a function $f : \Omega \to \mathbb{R}$, we write $f \in \mathcal{F}_n$ to mean that $f$ is $\mathcal{F}_n$-measurable and bounded; we write $f \in \mathcal{F}_{loc}$ to mean that there exists a finite $n$ with $f \in \mathcal{F}_n$. We use the notation $M_1^+$ to denote the space of probability measures on $(\Omega, \mathcal{F})$ with the coarsest topology for which each mapping

  $M_1^+ \ni \nu \mapsto \int f \, d\nu \in \mathbb{R}$   (2.3)

is continuous whenever $f \in \mathcal{F}_{loc}$: this is called the bounded local topology. In Section 1, we encountered the following notion of convergence: a sequence $\{\nu^{(n)} : n \in \mathbb{N}\}$ of probability measures on $\mathcal{F}$ converges to the probability measure $\nu$ in the sense of convergence of finite-dimensional marginals if, for every $m \in \mathbb{N}$ and every $a \in \Omega_m$,

  $\lim_n \nu_m^{(n)}[a] = \nu_m[a]$ .   (2.4)

In the present set-up, convergence of finite-dimensional marginals is equivalent to convergence in the bounded local topology; this can be seen from the following considerations: $\nu_m^{(n)}[a]$ is the integral of the indicator function

  $1_a(\omega) = 1$ if $\omega_1 = a_1, \ldots, \omega_m = a_m$, and $0$ otherwise,   (2.5)

of the atom $X_m^{-1}a$ of $\mathcal{F}_m$, and the set $\{1_a \in \mathcal{F}_m : a \in \Omega_m,\ m \in \mathbb{N}\}$ spans $\mathcal{F}_{loc}$.

We use a product probability measure on $(\Omega, \mathcal{F})$ as a reference measure; we take $\lambda$ to be the measure on $\mathcal{F}$ which, for all $n \in \mathbb{N}$, assigns equal probability to each of the $r^n$ atoms of $\mathcal{F}_n$, so that $\lambda[A_a] = r^{-n}$ for each $a \in \Omega_n$. Notice that for $\Gamma_n \subset \Omega_n$, we have

  $\lambda_n[\Gamma_n] = \frac{\#\Gamma_n}{\#\Omega_n}$ .   (2.6)

Let $\mu$ be a probability measure on $(\Omega, \mathcal{F})$; we may think of $\mu$ as characterizing the statistical properties of the source of the words, and we shall refer to $\mu$ itself as the source.


We recall the definitions of stationary measure and ergodic measure. We define the shift operator $S$ on $\Omega$ by

  $(S\omega)_k := \omega_{k+1}$, $k \in \mathbb{N}$ .   (2.7)

The shift $S$ acts on functions $f : \Omega \to \mathbb{R}$ by composition:

  $Sf := f \circ S$ .   (2.8)

We define the action of $S$ on a measure $\nu$ by

  $\int f \, d(S\nu) := \int (Sf) \, d\nu$ .   (2.9)

From the shift operator, we construct the averaging operator:

  $A_k := \frac{1}{k} \sum_{j=0}^{k-1} S^j$ .   (2.10)

A source $\mu$ is stationary if it is invariant under the shift: for all $B \in \mathcal{F}$,

  $\mu[B] = \mu[S^{-1}B]$ .   (2.11)

A stationary source satisfies the Ergodic Theorem: Let $\mu$ be a stationary probability measure; for $f \in L^1(\mu)$, the limit

  $\bar{f}(\omega) := \lim_n (A_n f)(\omega)$   (2.12)

exists $\mu$-almost surely, and the function $\omega \mapsto \bar{f}(\omega)$ is shift-invariant and satisfies

  $\int \bar{f} \, d\mu = \int f \, d\mu$ .   (2.13)

A source $\mu$ is ergodic if it is stationary and it assigns probability zero or one to each invariant subset: for all $B \in \mathcal{F}$ such that $S^{-1}B = B$, either $\mu[B] = 0$ or $\mu[B] = 1$. We have the following Corollary to the Ergodic Theorem: If $\mu$ is ergodic, then the limit (2.12) is constant for $\mu$-almost every $\omega$ and hence

  $\bar{f}(\omega) = \int f \, d\mu$, $\mu$-a.e.   (2.14)

Recall that $h_{Sh}(\mu)$, the Shannon entropy of a stationary source $\mu$, is non-negative and given by

  $h_{Sh}(\mu) = -\lim_n \frac{1}{n} \sum_{a \in \Omega_n} \mu_n[a] \log \mu_n[a]$ ,   (2.15)

where the logarithm is taken in some fixed base $b > 1$.

Definition 2.1 Let $\mu$ be a stationary source. A sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is said to be a supporting sequence for $\mu$ if and only if the condition

  $\lim_{n \to \infty} \mu_n[\Gamma_n] = 1$   (2.16)

holds.
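The limit in (2.15) can be watched numerically. The sketch below uses a two-state Markov source of our own choosing (not an example from the paper); for such a chain the block entropies $\frac{1}{n}\sum_a \mu_n[a]\log\frac{1}{\mu_n[a]}$ decrease to the entropy rate.

```python
from itertools import product
from math import log2

P = [[0.9, 0.1], [0.4, 0.6]]   # transition matrix (our assumed example)
pi = [0.8, 0.2]                # stationary vector: pi P = pi

def mu_n(word):
    """mu_n[a] for the stationary Markov source over the alphabet {0, 1}."""
    p = pi[word[0]]
    for x, y in zip(word, word[1:]):
        p *= P[x][y]
    return p

def block_entropy_rate(n):
    """-(1/n) sum_a mu_n[a] log mu_n[a], the quantity inside the limit (2.15)."""
    return -sum(mu_n(w) * log2(mu_n(w)) for w in product((0, 1), repeat=n)) / n

# Entropy rate of the chain: -sum_i pi_i sum_j P[i][j] log P[i][j].
h_sh = -sum(pi[i] * P[i][j] * log2(P[i][j]) for i in range(2) for j in range(2))
for n in (2, 6, 10):
    print(n, round(block_entropy_rate(n), 4), round(h_sh, 4))
```

For a product (Bernoulli) measure the block entropy rate is constant in $n$; the Markov example makes the convergence in (2.15) visible.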


Example: A simple example of a supporting sequence is $\{\Gamma_n = \Omega_n : n \in \mathbb{N}\}$. The elements of this supporting sequence are too big: in the context of data compression, the goal is to choose the sets $\Gamma_n$ to be as small as possible, consistent with condition (2.16) holding.

A lower bound on the exponential growth-rate of a supporting sequence is provided by the following result, a consequence of elementary properties of the specific information gain; it will be proved in Section 3, Proposition 3.1.

Lemma 2.1 Let $\mu$ be a stationary source. If $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is a supporting sequence for $\mu$, then

  $\liminf_n \frac{1}{n} \log \#\Gamma_n \ge h_{Sh}(\mu)$ .   (2.17)

This result motivates the following definition.

Definition 2.2 Let $\mu$ be a stationary source. A sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is said to have entropic growth-rate for $\mu$ if and only if

  $\lim_n \frac{1}{n} \log \#\Gamma_n = h_{Sh}(\mu)$ .   (2.18)

As a first step in the definition of a reconstruction sequence, we define a class of probability measures, the equipartition measures. In Section 1, we defined them `downstairs' on $\Omega_n$: the equipartition measure $\mu_n^{\Gamma_n}$ determined by the subset $\Gamma_n$ of $\Omega_n$ is the probability measure on $\Omega_n$ which gives equal weight to the words in $\Gamma_n$ and zero weight to words outside it. Here we define them `upstairs' on $\Omega$ with the aid of the reference measure $\lambda$.

Definition 2.3 Let $\Gamma_n$ be a subset of $\Omega_n$ with $\lambda_n[\Gamma_n] > 0$; we call the probability measure given on $\mathcal{F}$ by

  $\lambda^{\Gamma_n}[\,\cdot\,] := \lambda[\,\cdot \mid X_n^{-1}\Gamma_n]$   (2.19)

the equipartition measure determined by $\Gamma_n$. Notice that, for each subset $\Lambda_n$ of $\Omega_n$, we have

  $\lambda_n^{\Gamma_n}[\Lambda_n] = \frac{\#(\Lambda_n \cap \Gamma_n)}{\#\Gamma_n}$ .   (2.20)

Although the original measure $\mu$ is stationary, the equipartition measure $\lambda^{\Gamma_n}$ is not stationary unless $\Gamma_n = \Omega_n$. Since we wish to use equipartition measures to approximate a stationary measure $\mu$, we have to do something about this. The most elegant solution is to define a reconstruction sequence with the aid of the averaging operator (2.10): a sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is a reconstruction sequence for $\mu$ if $\lim_n A_n\lambda^{\Gamma_n} = \mu$. While readers familiar with ergodic theory may find this definition natural, others may find it puzzling. For this reason, we prefer to adopt a definition in which the averaging is performed `downstairs' on $\Omega_n$ rather than `upstairs' on $\Omega$; the connection between the two definitions is discussed in Section 6. We use $\theta_n$, the cyclic permutation operator acting on $\Omega_n$:

  $a = (a_1, a_2, \ldots, a_n) \mapsto \theta_n a := (a_2, \ldots, a_n, a_1)$ .   (2.21)


Definition 2.4 Let $\mu$ be a stationary source. A sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is said to be a reconstruction sequence for $\mu$ if and only if

1. for all $n$ sufficiently large, $\lambda_n[\Gamma_n] > 0$;

2. each $\Gamma_n$ is invariant under the cyclic permutation $\theta_n$;

3. the corresponding sequence $\{\lambda^{\Gamma_n}\}$ of equipartition measures converges to $\mu$:

  $\lim_n \lambda_m^{\Gamma_n} = \mu_m$ for each $m \in \mathbb{N}$ .   (2.22)

For reconstruction sequences, we have the following upper bound on the exponential growth-rate; it will be proved in Section 3, Proposition 3.2, using the lower semicontinuity of the specific information gain.

Lemma 2.2 Let $\mu$ be a stationary source. If $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is a reconstruction sequence for $\mu$, then

  $\limsup_n \frac{1}{n} \log \#\Gamma_n \le h_{Sh}(\mu)$ .   (2.23)

We have the following obvious corollary:

Corollary 2.1 Let $\mu$ be a stationary source. If a supporting sequence for $\mu$ is also a reconstruction sequence for $\mu$, then it has entropic growth-rate.

Examples: Take $\Omega = \{1, 0\}^{\mathbb{N}}$ with $\lambda$ the Bernoulli $(\frac{1}{2}, \frac{1}{2})$ probability measure. Let $\mu$ be the Bernoulli $(\frac{1}{3}, \frac{2}{3})$ measure. Define

  $\Gamma_n := \{a \in \Omega_n : |\frac{1}{n}\sum_{j=1}^{n} a_j - \frac{1}{3}| \le \eta_n\}$ .   (2.24)

• If $\eta_n < \frac{1}{3n}$, then $\Gamma_n = \emptyset$ unless $n$ is divisible by 3; $\lim_n \frac{1}{n} \log \#\Gamma_n$ does not exist.

• If $\eta_n = \frac{1}{3n}$, then for each $n \in \mathbb{N}$ there is exactly one $k$ so that $a \in \Gamma_n$ implies $\sum_j a_j = k$, and $\#\Gamma_n = \binom{n}{k}$. A direct calculation shows that $\{\Gamma_n\}$ is a reconstruction sequence for $\mu$. A simple computation using Stirling's formula shows that $\{\Gamma_n\}$ has entropic growth-rate. It is not supporting.

• If $\eta_n = \log n/\sqrt{n}$, then the Central Limit Theorem shows that $\{\Gamma_n\}$ is supporting for $\mu$. Direct arguments show that it is also a reconstruction sequence, hence has entropic growth-rate.

• If $\eta_n = \delta$, where $0 < \delta \le \frac{1}{6}$ is a constant, then the sequence is supporting, but not a reconstruction sequence, for $\mu$. It is a reconstruction sequence with entropic growth-rate for the Bernoulli $p$-measure with $p = \frac{1}{3} + \delta$. However, it is not supporting for this $p$-measure.
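For the second of these examples (with $n$ divisible by 3, so $k = n/3$), the Stirling computation behind the entropic growth-rate claim can be checked numerically; the sketch below (our own, with base-2 logarithms) compares $\frac{1}{n}\log_2 \binom{n}{n/3}$ with the Shannon entropy of the Bernoulli $(\frac{1}{3}, \frac{2}{3})$ measure.

```python
from math import comb, log2

# Numerical check of the Stirling computation: with eta_n = 1/(3n) and n
# divisible by 3, Gamma_n is the set of words with exactly n/3 ones, so
# #Gamma_n = C(n, n/3) and (1/n) log #Gamma_n should approach h_Sh(mu).

def growth_rate(n):
    assert n % 3 == 0
    return log2(comb(n, n // 3)) / n

h_sh = -(1/3) * log2(1/3) - (2/3) * log2(2/3)   # about 0.918 bits
for n in (30, 300, 3000):
    print(n, round(growth_rate(n), 4), round(h_sh, 4))
```

The rate increases toward $h_{Sh}(\mu)$ from below; the gap is of order $(\log n)/n$, exactly the polynomial correction Stirling's formula supplies.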


One may also ask if a sequence can be both supporting and have entropic growth-rate for two distinct measures. To see that this is the case, let $\bar{\mu}$ be the Bernoulli $(\frac{2}{3}, \frac{1}{3})$ measure and let $\hat{\mu} := \frac{1}{2}\mu + \frac{1}{2}\bar{\mu}$; define

  $\bar{\Gamma}_n := \{a \in \Omega_n : |\frac{1}{n}\sum_{j=1}^{n} a_j - \frac{2}{3}| \le \eta_n\}$   (2.25)

and let $\hat{\Gamma}_n := \Gamma_n \cup \bar{\Gamma}_n$. With $\eta_n = \log n/\sqrt{n}$, the sequence $\{\hat{\Gamma}_n\}$ is supporting and of entropic growth-rate for $\mu$, $\bar{\mu}$ and $\hat{\mu}$. It is a reconstruction sequence for the non-ergodic $\hat{\mu}$. Sets forming a reconstruction sequence may grow very slowly; for examples with zero exponential growth-rate, see Section 6. Our existence proofs make use of a construction which generalises that used for a Bernoulli measure in the above examples. Define the blocking operator $P_n$:

  $P_n(\omega) = (\omega_1, \ldots, \omega_n, \omega_1, \ldots, \omega_n, \ldots)$ ,   (2.26)

which is $\mathcal{F}_n$-measurable since $P_n(\omega)$ depends only on $\omega_1, \ldots, \omega_n$. Next we define the cyclic empirical measure

  $T_n(\omega) := A_n P_n(\omega)$   (2.27)

in the space $M_1^+(\Omega)$ of probability measures on $(\Omega, \mathcal{F})$. Since the map $T_n : \Omega \to M_1^+$ is $\mathcal{F}_n$-measurable, the inverse image $T_n^{-1}A$ of a subset $A$ of $M_1^+$ is determined completely by the first $n$ coordinates.

Definition 2.5 Let $T_n : \Omega \to M_1^+(\Omega)$ be the cyclic empirical measure. A sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb{N}\}$ is said to be a canonical sequence for $\mu$ if and only if

1. there exists a decreasing sequence $\{F_n\}$ of closed neighbourhoods of $\mu$ whose intersection is $\{\mu\}$;

2. each set $\Gamma_n$ is given by

  $\Gamma_n = X_n T_n^{-1} F_n$ ;   (2.28)

3. for all $n$ sufficiently large, $\lambda_n[\Gamma_n] > 0$.

In this case, we shall say that the canonical sequence $\{\Gamma_n\}$ is based on the sequence $\{F_n\}$.

The key to the proof of our main theorem is a conditional limit theorem; this is the subject of Section 4. It says that if $\{\Gamma_n\}$ is a canonical sequence for the stationary measure $\mu$, then the sequence $\{\lambda^{\Gamma_n}\}$ of conditioned measures converges to $\mu$:

Lemma 2.3 Let $\mu$ be a stationary source; every canonical sequence for $\mu$ is a reconstruction sequence for $\mu$.


This result is an easy consequence of the cyclical invariance of the sets $\Gamma_n$ and the compactness of the space $M_1^+(\Omega)$. A great advantage which comes from working `upstairs' on $\Omega$ is that we have available results on the large deviation properties of the cyclic empirical measure. The results we need are summarized in Section 5; proofs can be found in [LPS]. We use the large-deviation lower bound to prove the existence of a canonical sequence for a stationary measure.

Lemma 2.4 Let $\mu$ be a stationary source; then there exists a canonical sequence for $\mu$ having entropic growth-rate.

Since the alphabet is assumed to be finite, the existence of a decreasing sequence of closed neighbourhoods contracting to $\mu$ is easily established; the large-deviation lower bound is used to control the rate at which the sequence contracts to $\mu$ so as to ensure that, at least for all $n$ sufficiently large, the sets $\Gamma_n = X_n T_n^{-1} F_n$ satisfy $\lambda_n[\Gamma_n] > 0$. We do so by exhibiting a sequence $\{\Gamma_n\}$ whose growth-rate is bounded below by the Shannon entropy of $\mu$; it then follows from Lemma 2.3 and Lemma 2.2 that the growth-rate is entropic. A canonical sequence for $\mu$ is not necessarily supporting; however, when the source is ergodic, we can use the Ergodic Theorem in place of the large-deviation lower bound to control the sequence of contracting neighbourhoods. In this way, we can ensure that the sequence we construct is supporting and, by Lemma 2.1, canonical. This is done in Section 5, Proposition 5.2, establishing the following result:

Lemma 2.5 Let $\mu$ be an ergodic source; then there exists a canonical sequence for $\mu$ which is a supporting sequence for $\mu$.

In Section 5, we use the compactness of $M_1^+$ to prove the converse of Lemma 2.5:

Lemma 2.6 If there exists a sequence which is both canonical and supporting for a stationary source $\mu$, then $\mu$ must be ergodic.

We are ready to prove Theorem 2.1: Lemmas 2.4 and 2.3 together prove that if $\mu$ is a stationary source, then there exists a reconstruction sequence for $\mu$ having entropic growth-rate; Lemma 2.5 proves that if the source is ergodic, then the reconstruction sequence may be chosen so as to be supporting for $\mu$. Lemma 2.6 is the converse of Lemma 2.5; together they prove Theorem 2.2.


3 Information Gain

The principal tool in the proofs of Lemma 2.1, the lower bound on the growth-rate of a supporting sequence, and Lemma 2.2, the upper bound on the growth-rate of a reconstruction sequence, is the specific information gain.

Definition 3.1 The information gain of the probability measure $\nu$ with respect to the probability measure $\rho$ is given by

  $D(\nu \,\|\, \rho) := \int d\nu \, \log \frac{d\nu}{d\rho}$   (3.1)

when $\nu$ is absolutely continuous with respect to $\rho$; otherwise $D(\nu \,\|\, \rho) := +\infty$.

Definition 3.2 The specific information gain of the probability measure $\nu$ with respect to $\rho$ is given by

  $h(\nu \mid \rho) := \limsup_n \frac{1}{n} D(\nu|_{\mathcal{F}_n} \,\|\, \rho|_{\mathcal{F}_n})$ ,   (3.2)

where $\nu|_{\mathcal{F}_n}$ is the restriction of $\nu$ to $\mathcal{F}_n$. Note: $D(\nu|_{\mathcal{F}_n} \,\|\, \rho|_{\mathcal{F}_n}) = D(\nu_n \,\|\, \rho_n)$.

We always have $D(\nu \,\|\, \rho) \ge 0$ and $h(\nu \mid \rho) \ge 0$; if $D(\nu \,\|\, \rho) = 0$, then $\nu = \rho$, but the corresponding result for $h(\nu \mid \rho)$ does not always hold. However, if $\mu$ is stationary and $\rho$ is a stationary product measure, then

  $h(\mu \mid \rho) = \lim_n \frac{1}{n} D(\mu|_{\mathcal{F}_n} \,\|\, \rho|_{\mathcal{F}_n})$ .   (3.3)

When $A$ is a finite alphabet with $r$ letters, and $\lambda$ is the uniform product measure, this yields

  $h(\mu \mid \lambda) = \log r - h_{Sh}(\mu)$ .   (3.4)

Likewise, when $A$ is a finite alphabet and $\Gamma_n \subset \Omega_n$, we have

  $\log \lambda_n[\Gamma_n] = \log \#\Gamma_n - n \log r$ .   (3.5)
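Relation (3.4) can be verified directly for a product measure. In the sketch below (an assumed Bernoulli example with base-2 logarithms, helper names ours), the ratio $\frac{1}{n}D(\mu_n \,\|\, \lambda_n)$ equals $\log r - h_{Sh}(\mu)$ exactly for every $n$, since for a product measure the limit (3.3) is attained at each $n$.

```python
from itertools import product
from math import log2

p, r = 1/3, 2   # Bernoulli (1/3, 2/3) source over r = 2 letters (our example)

def mu_n(word):
    k = sum(word)
    return p ** k * (1 - p) ** (len(word) - k)

def info_gain_rate(n):
    """(1/n) D(mu_n || lambda_n) with lambda_n[a] = r^{-n}, as in (3.1)-(3.2)."""
    return sum(mu_n(w) * log2(mu_n(w) * r ** n)
               for w in product(range(r), repeat=n)) / n

h_sh = -(p * log2(p) + (1 - p) * log2(1 - p))
print(info_gain_rate(8), log2(r) - h_sh)   # the two numbers coincide
```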

Henceforth we will replace $\#\Gamma_n$ by $\lambda_n[\Gamma_n]$ and $h_{Sh}(\mu)$ by $-h(\mu \mid \lambda)$ in the statements of the propositions. Modified in this way, they hold in greater generality; this is discussed in Section 6.

Proposition 3.1 Let $\mu$ be a stationary source. If $\{\Gamma_n\}$ is a supporting sequence for $\mu$, then

  $\liminf_n \frac{1}{n} \log \lambda_n[\Gamma_n] \ge -h(\mu \mid \lambda)$ .   (3.6)

Proof: Since $\tilde{\Gamma}_n := X_n^{-1}(\Gamma_n)$ is in $\mathcal{F}_n$, we have

  $D(\mu|_{\mathcal{F}_n} \,\|\, \lambda|_{\mathcal{F}_n}) \ge \mu[\tilde{\Gamma}_n] \log \frac{\mu[\tilde{\Gamma}_n]}{\lambda[\tilde{\Gamma}_n]} + \mu[\Omega \setminus \tilde{\Gamma}_n] \log \frac{\mu[\Omega \setminus \tilde{\Gamma}_n]}{\lambda[\Omega \setminus \tilde{\Gamma}_n]}$   (3.7)


 [?n ] log [?n ] + [ n ?n ] log [ n ?n ] ? [?n ] log [?n]: e

e

e

e

e

e

The inequality follows by dividing by n and taking lim supn, using [?n ] ! 1. e

(3.8)

2

Lemma 2.1 follows using (3.4), (3.5) and (3.6).

There are some results of a more technical character which we require concerning cyclic symmetrization and the specific information gain. We collect them in a lemma; they are proved in Section 8 of [LPS]. The space $\Omega$ has a natural decomposition into a product space
$$\Omega = \Omega_n \times \Omega_n^c, \qquad (3.9)$$
and the measure $\nu^{\Gamma_n}$ is the product measure of $\nu_n^{\Gamma_n}$ on $\Omega_n$ and $\mu$ on $\Omega_n^c$. Notice that, for each subset $\Lambda_n$ of $\Omega_n$, we have
$$\nu_n^{\Gamma_n}[\Lambda_n] = \frac{\#(\Lambda_n \cap \Gamma_n)}{\#\Gamma_n}. \qquad (3.10)$$
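The equipartition measure and relation (3.10) are straightforward to realize on a computer. A small illustrative sketch (the particular choice of $\Gamma_n$ below is arbitrary and not from the paper):

```python
from itertools import product

def equipartition(gamma_n):
    """nu_n^{Gamma_n}: equal weight on the words of gamma_n, zero outside."""
    w = 1.0 / len(gamma_n)
    return {a: w for a in gamma_n}

def measure(nu, subset):
    """nu-mass of a collection of words."""
    return sum(nu.get(a, 0.0) for a in subset)

# Toy choice: Gamma_n = binary words of length 4 with exactly two 1s.
omega_n = list(product((0, 1), repeat=4))
gamma_n = [a for a in omega_n if sum(a) == 2]
nu_n = equipartition(gamma_n)

# Relation (3.10) with Lambda_n = the words beginning with a 1.
lambda_n = [a for a in omega_n if a[0] == 1]
lhs = measure(nu_n, lambda_n)
rhs = len(set(lambda_n) & set(gamma_n)) / len(gamma_n)
```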

Lemma 3.1 Let $(\Omega_1, \mathcal F_1)$ and $(\Omega_2, \mathcal F_2)$ be measurable spaces. Let $\Omega = \Omega_1 \times \Omega_2$ with $\mathcal F$ the corresponding product $\sigma$-algebra. Let $\nu$ and $\mu$ be probability measures on $(\Omega, \mathcal F)$, with $\nu_1, \nu_2$ and $\mu_1, \mu_2$ denoting the restrictions to $\mathcal F_1, \mathcal F_2$ considered as sub-$\sigma$-algebras of $\mathcal F$. Assume $\mu = \mu_1 \otimes \mu_2$. Then we have
$$D(\nu\,\|\,\mu) = D(\nu\,\|\,\nu_1 \otimes \nu_2) + D(\nu_1\,\|\,\mu_1) + D(\nu_2\,\|\,\mu_2). \qquad (3.11)$$

We are now in a position to prove the upper bound on the growth-rate of a reconstruction sequence.

Proposition 3.2 Let $\rho$ be a stationary source. If $\{\Gamma_n\}$ is a reconstruction sequence for $\rho$, then
$$\limsup_n \frac{1}{n} \log \mu_n[\Gamma_n] \le -h(\rho\,|\,\mu). \qquad (3.12)$$

Proof: By direct calculation, we have
$$D(\nu^{\Gamma_n}|_{\mathcal F_n}\,\|\,\mu|_{\mathcal F_n}) = \int d\nu^{\Gamma_n} \log \frac{d\nu^{\Gamma_n}}{d\mu} = -\log \mu[\widetilde\Gamma_n], \qquad (3.13)$$
so that
$$\liminf_n \frac{1}{n} D(\nu^{\Gamma_n}|_{\mathcal F_n}\,\|\,\mu|_{\mathcal F_n}) = -\limsup_n \frac{1}{n} \log \mu[\widetilde\Gamma_n]. \qquad (3.14)$$
Next we make use of the cyclical invariance of $\nu^{\Gamma_n}$: for any integer $k$ such that $k+m \le n$, the projections of $\nu^{\Gamma_n}$ on the $\sigma$-algebras $\sigma(x_1,\ldots,x_m)$ and $\sigma(x_{k+1},\ldots,x_{k+m})$ are the same. Let $m < n$ and let $q(n|m)$ be the largest integer smaller than $n/m$. From Lemma 3.1 we have
$$\frac{1}{n} D(\nu^{\Gamma_n}|_{\mathcal F_n}\,\|\,\mu|_{\mathcal F_n}) \ge \frac{q(n|m)}{n} D(\nu^{\Gamma_n}|_{\mathcal F_m}\,\|\,\mu|_{\mathcal F_m}). \qquad (3.15)$$

Since $\lim_n \nu^{\Gamma_n} = \rho$, it follows from the lower semicontinuity of $D(\,\cdot\,\|\,\mu|_{\mathcal F_m})$ on the space of measures on $\Omega_m$ that
$$-\limsup_n \frac{1}{n} \log \mu_n[\Gamma_n] = \liminf_n \frac{1}{n} D(\nu^{\Gamma_n}|_{\mathcal F_n}\,\|\,\mu|_{\mathcal F_n}) \ge \frac{1}{m} D(\rho|_{\mathcal F_m}\,\|\,\mu|_{\mathcal F_m}). \qquad (3.16)$$
Hence
$$\limsup_n \frac{1}{n} \log \mu_n[\Gamma_n] \le -h(\rho\,|\,\mu). \qquad (3.17)$$
$\Box$

Lemma 2.2 follows using (3.4), (3.5) and (3.12).
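The decomposition identity (3.11) can be verified numerically on a small product space. A sketch (the joint measure $\nu$ and reference marginals are arbitrary illustrative choices):

```python
import math

def D(nu, lam):
    """Information gain D(nu || lam) for dicts on a common finite set."""
    return sum(p * math.log(p / lam[k]) for k, p in nu.items() if p > 0.0)

# An arbitrary joint measure nu on a 2 x 2 product space, and a product
# reference measure mu = mu1 x mu2 (all values illustrative).
nu = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
mu1 = {0: 0.5, 1: 0.5}
mu2 = {0: 0.6, 1: 0.4}
mu = {(i, j): mu1[i] * mu2[j] for i in mu1 for j in mu2}

# Marginals of nu and their product nu1 x nu2.
nu1 = {i: sum(p for (a, _), p in nu.items() if a == i) for i in (0, 1)}
nu2 = {j: sum(p for (_, b), p in nu.items() if b == j) for j in (0, 1)}
nu_prod = {(i, j): nu1[i] * nu2[j] for i in nu1 for j in nu2}

# Identity (3.11): D(nu||mu) = D(nu||nu1 x nu2) + D(nu1||mu1) + D(nu2||mu2).
lhs = D(nu, mu)
rhs = D(nu, nu_prod) + D(nu1, mu1) + D(nu2, mu2)
```

The middle term on the right is the mutual information of the two coordinates under $\nu$; the identity holds precisely because the reference measure is a product.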


4 A Conditional Limit Theorem

We state and prove a conditional limit theorem. We shall need the following lemma, which exploits the invariance of the reference measure $\mu$ under the cyclic shift operator $S_n$, defined by
$$(S_n \omega)_k := \begin{cases} \omega_{k+1} & \text{if } k \bmod n \ne 0; \\ \omega_{k-n+1} & \text{if } k \bmod n = 0. \end{cases} \qquad (4.1)$$
We sometimes find the following notation useful: let $\nu \in M_1^+$ and $f \in \mathcal F_{\mathrm{loc}}$; we set
$$\langle f, \nu\rangle := \int f \, d\nu. \qquad (4.2)$$
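For a concrete feel for the cyclic empirical measure, its $m$-block marginal for a finite word, and its invariance under the cyclic shift, can be sketched as follows (the function names are ours, not the paper's):

```python
from collections import Counter

def cyclic_shift(word):
    """One application of the cyclic shift S_n to a word of length n."""
    return word[1:] + word[:1]

def cyclic_block_marginal(word, m):
    """m-block marginal of the cyclic empirical measure T_n(omega):
    the frequencies of the n cyclic m-blocks of the word."""
    n = len(word)
    wrapped = word + word[:m - 1]        # wrap around for cyclic blocks
    counts = Counter(wrapped[j:j + m] for j in range(n))
    return {b: c / n for b, c in counts.items()}

word = "abbababb"
t = cyclic_block_marginal(word, 2)
# T_n is S_n-invariant: shifting the word cyclically does not change it.
t_shifted = cyclic_block_marginal(cyclic_shift(word), 2)
```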

Lemma 4.1 Let $\Gamma_n$ be $T_n^{-1}\mathcal B$-measurable with $\mu_n[\Gamma_n] > 0$ and let $f \in \mathcal F_k$ with $k \le n$; then
$$\int f(\omega) \, \nu^{\Gamma_n}[d\omega] = \int \langle f, T_n(\omega)\rangle \, \nu^{\Gamma_n}[d\omega]. \qquad (4.3)$$

Proof: Since $\widetilde\Gamma_n$ is $T_n^{-1}\mathcal B$-measurable, there exists $C \in \mathcal B$ such that $\widetilde\Gamma_n = \{\omega : T_n(\omega) \in C\}$. Note that $S_n$ is bijective, $S_n \circ P_n = P_n \circ S_n$, $S_n^{n+j} = S_n^j$, and
$$T_n(\omega) = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{S_n^j P_n(\omega)}; \qquad (4.4)$$
write $A_n := \frac{1}{n} \sum_{j=0}^{n-1} S_n^j$ for the corresponding cyclic averaging operator. Also $S_n T_n(\omega) = T_n(\omega)$ and $S_n \mu = \mu$, which imply
$$S_n^{-1}\{T_n \in C\} = \{T_n \in C\} \quad \text{and} \quad S_n \mu[\,\cdot\,|\{T_n \in C\}] = \mu[\,\cdot\,|\{T_n \in C\}]. \qquad (4.5)$$
Since $f \in \mathcal F_k$ with $k \le n$, we have $f = f \circ P_n$, so
$$\int f(\omega)\, \mu[d\omega|\{T_n \in C\}] = \int (A_n f)(\omega)\, \mu[d\omega|\{T_n \in C\}] = \int \langle f, T_n(\omega)\rangle\, \mu[d\omega|\{T_n \in C\}], \qquad (4.6)$$
and $\nu^{\Gamma_n}$ coincides with $\mu[\,\cdot\,|\{T_n \in C\}]$, which gives (4.3). $\Box$

A second ingredient in the proof of our conditional limit theorem is a lemma which states a simple consequence of the compactness of $M_1^+$; we shall need it again in Section 5.

Lemma 4.2 Let $\rho$ be a stationary source and let $\{\Gamma_n \subset \Omega_n : n \in \mathbb N\}$ be a canonical sequence for $\rho$ based on the sequence $\{F_n\}$. Then for each $f \in \mathcal F_{\mathrm{loc}}$ and each $\varepsilon > 0$ there exists $N(f,\varepsilon)$ such that, whenever $n \ge N(f,\varepsilon)$,
$$F_n \subset \{\nu \in M_1^+ : |\langle f,\nu\rangle - \langle f,\rho\rangle| < \varepsilon\}. \qquad (4.7)$$

Proof: $\{F_n\}$ is a decreasing sequence of closed neighbourhoods of $\rho$ whose intersection is $\{\rho\}$. We then have
$$\bigcap_n \big(F_n \setminus \{\nu \in M_1^+ : |\langle f,\nu\rangle - \langle f,\rho\rangle| < \varepsilon\}\big) = \emptyset. \qquad (4.8)$$
Since $M_1^+$ is compact, we deduce that there exists $N$ with $F_N \setminus \{\nu \in M_1^+ : |\langle f,\nu\rangle - \langle f,\rho\rangle| < \varepsilon\} = \emptyset$, which gives (4.7). $\Box$


Theorem 4.1 Let $\rho$ be a stationary source; every canonical sequence for $\rho$ is a reconstruction sequence for $\rho$.

Proof: Let $\{\Gamma_n \subset \Omega_n : n \in \mathbb N\}$ be a canonical sequence for $\rho$ based on the sequence $\{F_n\}$. Then, by Lemma 4.2, for each $f \in \mathcal F_{\mathrm{loc}}$ and each $\varepsilon > 0$ there exists $N(f,\varepsilon)$ such that whenever $n \ge N(f,\varepsilon)$ we have
$$F_n \subset \{\nu \in M_1^+ : |\langle f,\nu\rangle - \langle f,\rho\rangle| < \varepsilon\}. \qquad (4.9)$$
It follows that $n \ge N(f,\varepsilon)$ and $\omega \in \widetilde\Gamma_n$ imply
$$|\langle f, T_n(\omega)\rangle - \langle f,\rho\rangle| < \varepsilon. \qquad (4.10)$$
Since $\nu^{\Gamma_n}$ is supported by $\widetilde\Gamma_n$, we have
$$\int f(\omega) \, \nu^{\Gamma_n}[d\omega] = \int_{\widetilde\Gamma_n} \langle f, T_n(\omega)\rangle \, \nu^{\Gamma_n}[d\omega]. \qquad (4.11)$$
It follows from the above and (4.3) that
$$\Big| \int f(\omega) \, \nu^{\Gamma_n}[d\omega] - \langle f,\rho\rangle \Big| < \varepsilon \qquad (4.12)$$
whenever $n \ge N(f,\varepsilon)$. This proves that the sequence $\{\nu^{\Gamma_n}\}$ converges to $\rho$. $\Box$
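Theorem 4.1 can be illustrated numerically for a Bernoulli source. Below, as a crude stand-in for $X_n(T_n^{-1}F)$, we take $\Gamma_n$ to be the binary words whose frequency of 1s lies in a window around $p$; the marginal of the equipartition measure then lies within the window's width of $p$, and shrinking the window pulls it closer (a rough sketch, not the paper's construction):

```python
from itertools import product

def equipartition_marginal(n, lo, hi):
    """P[x_1 = 1] under the equipartition measure on Gamma_n, the set of
    binary words of length n whose fraction of 1s lies in [lo, hi]."""
    gamma = [a for a in product((0, 1), repeat=n) if lo <= sum(a) / n <= hi]
    return sum(a[0] for a in gamma) / len(gamma)

# Bernoulli(p) source; the windows play the role of neighbourhoods F of rho
# (defined through the 1-block frequency) in a canonical sequence.
p, n = 0.7, 16
m_wide = equipartition_marginal(n, p - 0.10, p + 0.10)
m_narrow = equipartition_marginal(n, p - 0.05, p + 0.05)
# Shrinking the neighbourhood pulls the equipartition marginal toward p.
```

Note the subtlety: because the equipartition measure weights all words of $\Gamma_n$ equally, the marginal concentrates at the edge of the window closest to $1/2$; only the shrinking neighbourhoods of the canonical construction force convergence to $p$ itself.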


5 Canonical Sequences

We begin by summarizing the ideas of large deviation theory and the single result we shall require; proofs can be found in [LPS]. Denote by $\mathbb M_n$ the distribution of the cyclic empirical measure $T_n : \Omega \to M_1^+(\Omega)$ defined on the probability space $(\Omega, \mathcal F, \mu)$, where $\mu$ is our reference measure, the uniform product measure:
$$\mathbb M_n := \mu \circ T_n^{-1}. \qquad (5.1)$$
For each open set $G$, define
$$\overline m[G] := \limsup_n \frac{1}{n} \log \mathbb M_n[G], \qquad (5.2)$$
$$\underline m[G] := \liminf_n \frac{1}{n} \log \mathbb M_n[G]; \qquad (5.3)$$
the following result is Lemma 8.3 of [LPS]: for each $\nu \in M_1^+$, we have
$$\inf\{\overline m[G] : G \ni \nu\} = \inf\{\underline m[G] : G \ni \nu\}. \qquad (5.4)$$

Definition 5.1 ([LP]) The Ruelle–Lanford function $s$ is defined on $M_1^+$ by
$$s(\nu) := \inf\{\overline m[G] : G \ni \nu\} \qquad (5.5)$$
$$\phantom{s(\nu) :} = \inf\{\underline m[G] : G \ni \nu\}. \qquad (5.6)$$

It is a basic result in large-deviation theory (see [LP]) that the existence of the Ruelle–Lanford function implies that the large-deviation lower bound holds for open subsets: for each open set $G$, we have
$$\sup_{\nu \in G} s(\nu) \le \liminf_n \frac{1}{n} \log \mathbb M_n[G]. \qquad (5.7)$$
In the present case, the Ruelle–Lanford function (RL-function) is given explicitly by
$$s(\nu) = \begin{cases} -h(\nu\,|\,\mu) & \text{if } \nu \text{ is stationary;} \\ -\infty & \text{otherwise.} \end{cases} \qquad (5.8)$$

The following result is fundamental: it establishes the existence of the sequences of closed neighbourhoods of $\rho$ on which our construction of canonical sequences is based.

Lemma 5.1 Let $\rho$ be a stationary source; then there exists a decreasing sequence $\{F_n\}$ of closed neighbourhoods of $\rho$ in $M_1^+$ such that
$$\bigcap_n F_n = \{\rho\}. \qquad (5.9)$$

Proof: The statement is a consequence of our hypothesis that the factor spaces of $\Omega$ are finite sets, copies of a finite alphabet; it holds whenever the factor spaces are standard Borel spaces, for then there exists a sequence $\{g_m\}$ in $\mathcal F_{\mathrm{loc}}$ which separates $M_1^+$: if $\nu, \lambda$ are any two probability measures which satisfy
$$\int g_m \, d\nu = \int g_m \, d\lambda \qquad (5.10)$$
for all $m$, then $\nu = \lambda$. In the finite alphabet case, we can take for $\{g_m\}$ the set $\{1_a \in \mathcal F_n : a \in \Omega_n, \ n \in \mathbb N\}$ of all indicator functions of atoms determined by finite words:
$$1_a(\omega) = \begin{cases} 1 & \text{if } \omega_1 = a_1, \ldots, \omega_n = a_n; \\ 0 & \text{otherwise.} \end{cases} \qquad (5.11)$$
We can choose
$$F_n := \Big\{\nu \in M_1^+ : \Big|\int g_k \, d\nu - \int g_k \, d\rho\Big| \le \frac{1}{n}, \ k = 1,\ldots,n\Big\}. \qquad (5.12)$$
$\Box$

We now use the large deviation lower bound (5.7) to prove the existence of a canonical sequence for which a lower bound on the exponential growth-rate holds. The proof employs a construction which we call stretching: let $\{F_k\}$ be a decreasing sequence in $M_1^+$ of closed neighbourhoods of $\rho$ whose intersection is $\{\rho\}$, and let $\{N_m\}$ be a strictly increasing sequence of positive integers; the decreasing sequence $\{F'_n\}$ defined, for each $n \in \mathbb N$, by
$$F'_n := \begin{cases} M_1^+ & \text{if } n < N_1; \\ F_m & \text{if } N_m \le n < N_{m+1}, \end{cases} \qquad (5.13)$$
is called the stretching of $\{F_n\}$ by $\{N_m\}$; note that the intersection of the stretched sequence is again $\{\rho\}$.

Proposition 5.1 Let $\rho$ be a stationary source; then there exists a canonical sequence $\{\Gamma_n\}$ for $\rho$ which satisfies
$$\lim_n \frac{1}{n} \log \mu_n[\Gamma_n] = -h(\rho\,|\,\mu). \qquad (5.14)$$

Proof: Let $\{F_k\}$ be a decreasing sequence in $M_1^+$ of closed neighbourhoods of $\rho$ whose intersection is $\{\rho\}$; consider $F_m$ for fixed $m$. Since $F_m$ is a neighbourhood of $\rho$, there exists an open set $G$ such that $\rho \in G \subset F_m$. It follows from the large deviation lower bound (5.7) that
$$s(\rho) \le \sup_{\nu \in G} s(\nu) \le \liminf_n \frac{1}{n} \log \mathbb M_n[G] \le \liminf_n \frac{1}{n} \log \mathbb M_n[F_m]; \qquad (5.15)$$
hence, for each $m \in \mathbb N$, there is an integer $N_m$ such that
$$s(\rho) - \frac{1}{m} \le \frac{1}{p} \log \mathbb M_p[F_m] \qquad (5.16)$$
for all $p \ge N_m$. We may choose the sequence $\{N_m\}$ to be strictly increasing. Let $\{F'_n\}$ be the stretching of $\{F_n\}$ by $\{N_m\}$; define $\Gamma_n$ by
$$\Gamma_n := X_n(T_n^{-1} F'_n). \qquad (5.17)$$


By construction, for all $n$ such that $N_m \le n < N_{m+1}$, we have
$$\frac{1}{n} \log \mu_n[\Gamma_n] = \frac{1}{n} \log \mathbb M_n[F'_n] \ge s(\rho) - \frac{1}{m}; \qquad (5.18)$$
it follows that
$$\liminf_n \frac{1}{n} \log \mu_n[\Gamma_n] \ge s(\rho). \qquad (5.19)$$
But $\rho$ is stationary, so that
$$s(\rho) = -h(\rho\,|\,\mu); \qquad (5.20)$$
hence we have
$$\liminf_n \frac{1}{n} \log \mu_n[\Gamma_n] \ge -h(\rho\,|\,\mu). \qquad (5.21)$$
In particular, $\mu_n[\Gamma_n] > 0$ for all $n$ sufficiently large; it follows that $\{\Gamma_n\}$ is a canonical sequence and that the lower bound holds. The equality (5.14) for the growth-rate now follows from Theorem 4.1 and Proposition 3.2. $\Box$

When the source is ergodic, we use the stretching construction together with the Ergodic Theorem to prove the existence of a canonical sequence $\{\Gamma_n\}$ which is supporting. First we need two lemmas:

Lemma 5.2 Let $\rho$ be an ergodic source and let $f \in \mathcal F_{\mathrm{loc}}$; then
$$\lim_{n\to\infty} \langle f, T_n(\omega)\rangle = \langle f, \rho\rangle, \quad \rho\text{-a.e.} \qquad (5.22)$$

Proof: For $f \in \mathcal F_m$, an elementary calculation shows that
$$\sup_{\omega \in \Omega} |\langle f, T_n(\omega)\rangle - (A_n f)(\omega)| \le \frac{2(m-1)}{n} \sup_{\omega \in \Omega} |f(\omega)|, \qquad (5.23)$$
so that the sequence $\{\langle f, T_n(\omega)\rangle\}$ converges whenever the sequence $\{(A_n f)(\omega)\}$ converges, and they have the same limit; since $\rho$ is ergodic, it follows from the Ergodic Theorem that their common limit is $\langle f, \rho\rangle$. $\Box$

Lemma 5.3 Let $\rho$ be an ergodic source and let $G$ be an open subset of $M_1^+$ containing $\rho$; then
$$\lim_{n\to\infty} \rho[T_n^{-1} G] = 1. \qquad (5.24)$$

Proof: Since $G$ is an open set containing $\rho$, there exist $f_1,\ldots,f_m \in \mathcal F_{\mathrm{loc}}$ and positive numbers $\varepsilon_1,\ldots,\varepsilon_m$ such that
$$\{\nu \in M_1^+ : |\langle f_k,\nu\rangle - \langle f_k,\rho\rangle| < \varepsilon_k, \ k = 1,\ldots,m\} \subset G. \qquad (5.25)$$
Since $\rho$ is ergodic and $f_k \in \mathcal F_{\mathrm{loc}}$, it follows from Lemma 5.2 that
$$\lim_{n\to\infty} \langle f_k, T_n(\omega)\rangle = \langle f_k, \rho\rangle \quad \rho\text{-a.e.}, \qquad (5.26)$$
so
$$\lim_{n\to\infty} \rho[\{\omega \in \Omega : |\langle f_k, T_n(\omega)\rangle - \langle f_k,\rho\rangle| < \varepsilon_k, \ k = 1,\ldots,m\}] = 1. \qquad (5.27)$$
It now follows from (5.25) that
$$\lim_{n\to\infty} \rho[T_n^{-1} G] = 1. \qquad (5.28)$$
$\Box$
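Lemma 5.2 can be illustrated by simulation: for an i.i.d. (hence ergodic) Bernoulli($p$) source and $f$ the indicator of the 2-block $11$, the cyclic empirical average $\langle f, T_n(\omega)\rangle$ should approach $\langle f, \rho\rangle = p^2$. A seeded sketch (names and tolerances are ours):

```python
import random

def pair_frequency(word, pattern):
    """<f, T_n(omega)> for f the indicator of a 2-block: the frequency of
    `pattern` among the n cyclic 2-blocks of the word."""
    n = len(word)
    wrapped = word + word[:1]           # wrap around for the cyclic block
    return sum(1 for j in range(n) if wrapped[j:j + 2] == pattern) / n

# i.i.d. Bernoulli(p) source (ergodic); Lemma 5.2 predicts
# <f, T_n(omega)> -> <f, rho> = p*p for f = 1_{[1,1]}, rho-a.e.
random.seed(0)
p, n = 0.5, 100_000
omega = [1 if random.random() < p else 0 for _ in range(n)]
freq = pair_frequency(omega, [1, 1])
```

For $n = 10^5$ the fluctuation of `freq` around $p^2 = 0.25$ is of order $10^{-3}$, far inside the tolerance asserted below.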


Proposition 5.2 Let $\rho$ be an ergodic source; then there exists a canonical sequence for $\rho$ which is supporting for $\rho$.

Proof: Let $\{F_k\}$ be a decreasing sequence in $M_1^+$ of closed neighbourhoods of $\rho$ whose intersection is $\{\rho\}$; consider $F_m$ for fixed $m$. Since $F_m$ is a neighbourhood of $\rho$, there exists an open set $G$ such that $\rho \in G \subset F_m$. Since the source $\rho$ is ergodic, by Lemma 5.3 we have
$$\lim_n \rho[T_n^{-1} G] = 1, \qquad (5.29)$$
so there exists $N_m$ such that, for all $n \ge N_m$,
$$\rho[T_n^{-1} F_m] \ge \rho[T_n^{-1} G] \ge 1 - 1/m. \qquad (5.30)$$
The sequence $\{N_m\}$ may be chosen to be strictly increasing. Let $\{F'_n\}$ be the stretching of $\{F_n\}$ by $\{N_m\}$; put
$$\Gamma_n := X_n(T_n^{-1} F'_n). \qquad (5.31)$$
Then
$$\rho_n[\Gamma_n] = \rho[T_n^{-1} F'_n], \qquad (5.32)$$
so that
$$\lim_n \rho_n[\Gamma_n] = \lim_n \rho[T_n^{-1} F'_n] = 1. \qquad (5.33)$$
Hence $\{\Gamma_n\}$ is supporting for $\rho$; in particular, $\mu_n[\Gamma_n] > 0$ for all $n$ sufficiently large, so that $\{\Gamma_n\}$ is a canonical sequence. $\Box$

We conclude this section by proving a theorem of relevance to the coding problem:

Theorem 5.1 Let $\rho$ be a stationary source. If there exists a canonical sequence for $\rho$ which is supporting for $\rho$, then the source $\rho$ is ergodic.

We make use of two lemmas.

Lemma 5.4 Let $\rho$ be a stationary source. Let $\{\Gamma_n \subset \Omega_n : n \in \mathbb N\}$ be a canonical sequence for $\rho$, and suppose there exists a sequence $\{f_k \in \mathcal F_{\mathrm{loc}} : k \in \mathbb N\}$ which separates $M_1^+$ such that
$$\lim_n \sup_{\omega \in \widetilde\Gamma_n} |\langle T_n(\omega), f_k\rangle - \langle \rho, f_k\rangle| = 0 \qquad (5.34)$$

for each $k \in \mathbb N$. If $\{\Gamma_n\}$ is supporting for $\rho$, then the source $\rho$ is ergodic.

Proof: For simplicity, we replace each $f_k$ by $f_k - \langle \rho, f_k\rangle$, so that
$$\langle \nu, f_k\rangle = 0 \ \text{for all } k \in \mathbb N \iff \nu = \rho. \qquad (5.35)$$
Since $f_k \in \mathcal F_{\mathrm{loc}}$, we deduce from (5.34) that
$$\lim_n \sup_{\omega \in \widetilde\Gamma_n} |A_n f_k(\omega)| = 0. \qquad (5.36)$$
Since $f_k$ is bounded and $\lim_n \rho[\widetilde\Gamma_n] = 1$ by hypothesis, it follows that
$$\lim_n \int |A_n f_k(\omega)| \, d\rho = 0. \qquad (5.37)$$


Now let $A$ be any shift-invariant set with $\rho[A] > 0$. We have
$$\Big|\int_A f_k \, d\rho\Big| = \Big|\int_A S f_k \, d\rho\Big| = \Big|\int_A A_n f_k \, d\rho\Big| \le \int |A_n f_k| \, d\rho \qquad (5.38)$$
because both $\rho$ and $A$ are shift-invariant. Then (5.37) and (5.38) imply that
$$\int f_k \, \rho[d\omega \,|\, A] = 0 \qquad (5.39)$$
for all $k \in \mathbb N$. From (5.35), we conclude that
$$\rho[\,\cdot\,|A] = \rho[\,\cdot\,], \qquad (5.40)$$
which implies that $\rho[A] = 1$. Since each shift-invariant set $A$ has either $\rho[A] = 0$ or $\rho[A] = 1$, we deduce that $\rho$ is ergodic. $\Box$

We use compactness to prove the existence of a separating sequence having property (5.34).

Lemma 5.5 Let $\rho$ be a stationary source and let $\{\Gamma_n \subset \Omega_n : n \in \mathbb N\}$ be a canonical sequence for $\rho$ based on the sequence $\{F_n\}$; then condition (5.34) holds for each $f_k$ in $\mathcal F_{\mathrm{loc}}$.

Proof: $\{F_n\}$ is a decreasing sequence of closed neighbourhoods of $\rho$ whose intersection is $\{\rho\}$. It follows from Lemma 4.2 that for each $f \in \mathcal F_{\mathrm{loc}}$ and each $\varepsilon > 0$ there exists $N(f,\varepsilon)$ such that, whenever $n \ge N(f,\varepsilon)$,
$$F_n \subset \{\nu \in M_1^+ : |\langle f,\nu\rangle - \langle f,\rho\rangle| < \varepsilon\}. \qquad (5.41)$$
It follows that $n \ge N(f,\varepsilon)$ and $\omega \in \widetilde\Gamma_n$ imply
$$|\langle f, T_n(\omega)\rangle - \langle f,\rho\rangle| < \varepsilon. \qquad (5.42)$$
This implies (5.34). $\Box$

Since $A$ is finite, the collection of indicator functions of atoms from $\Omega_n$, for all $n$, is a countable separating set. Taken together, Lemma 5.4 and Lemma 5.5 prove Theorem 5.1.


6 Commentary

1. To simplify the exposition, we have assumed $A$ to be a finite set with $\mu_1$ the equiprobable distribution on $\Omega_1 = A$. Our results extend, with some modification, to the case in which $A$ is a compact metric space and $\mu_1$ a probability measure on $A$, with $\mu_n$ on $\Omega_n$ and $\mu$ on $\Omega$ being the product of copies of $\mu_1$. Here are the modifications which must be made:
- $\#\Gamma_n$ must be replaced by $\mu_n[\Gamma_n]$ and $h_{\mathrm{Sh}}(\rho)$ by $-h(\rho\,|\,\mu)$ in the statements of the propositions; this is done when we come to prove them in Sections 3, 4 and 5;
- the hypothesis `Let $\rho$ be a stationary source' must be amplified to read `Let $\rho$ be a stationary source with $h(\rho\,|\,\mu)$ finite'.

The results may be further extended to the case in which $A$ is a standard Borel space and $\mu_1$ a probability measure on $A$, with $\mu_n$ and $\mu$ the corresponding product probability measures. In this case we need to modify the definition of "canonical sequence" so that, with $\widetilde\Gamma_n = T_n^{-1} F_n$, we require not only that $\{F_n\}$ be a decreasing sequence of closed neighbourhoods of $\rho$ with $\bigcap F_n = \{\rho\}$, but also that there exists a sequence $\{f_k\}$, with each $f_k \in \mathcal F_{\mathrm{loc}}$, such that the topology on $M_1^+$ determined by $\{f_k\}$ separates the points of $M_1^+$, and that $\{F_n\}$ is a neighbourhood basis for $\rho$ in this topology. In the non-compact case, in general, Lemma 4.2 is no longer valid. However, the conclusions of Lemma 4.2 hold when $f = f_j \in \{f_k\}$, the separating sequence. One proves Theorem 4.1 by noting that the level sets of $h(\,\cdot\,|\,\mu)$ are compact, so that the sequence $\{\nu^{\Gamma_n}\}$ has limit points in $M_1^+$. The modified Lemma 4.2 shows uniqueness of the limit point of $\{\nu^{\Gamma_n}\}$, which implies convergence.

2. The concept of an equipartition measure is inspired by that of a microcanonical measure in statistical mechanics. For Gibbs, the microcanonical measure was fundamental; the canonical measure, an approximation to the microcanonical, was useful by virtue of being more tractable analytically. The idea of bounded local convergence is foreshadowed in the statement of his `general theorem': "If a system of a great number of degrees of freedom is microcanonically distributed in phase, any very small part of it may be regarded as canonically distributed." ([G], p. 183) In the concept of a reconstruction sequence, we turn Gibbs' idea on its head: the stationary source corresponds to his canonical measure; our equipartition measure corresponds to his microcanonical measure. For us, the stationary source is fundamental; it can be approximated by an equipartition measure.

3. From the point of view of digital computation, a reconstruction sequence is more tractable than a stationary measure. Reconstruction sequences may prove useful in providing efficient ways of simulating stationary measures. We will pursue these ideas elsewhere.


4. The distinction between average and cyclic average becomes negligible as $n \to \infty$ because the limit employs $\mathcal F_{\mathrm{loc}}$: for $f \in \mathcal F_m$ and $n > m$, an elementary computation shows that
$$\sup_{\omega \in \Omega} |((A_n f) \circ P_n)(\omega) - (A_n f)(\omega)| \le \frac{2(m-1)}{n} \sup_{\omega \in \Omega} |f(\omega)|, \qquad (6.1)$$
so the results of this paper hold with the following alternative definition of reconstruction sequence:

Definition 6.1 Let $\rho$ be a stationary source. A sequence $\{\Gamma_n \subset \Omega_n : n \in \mathbb N\}$ is said to be a reconstruction sequence for $\rho$ if and only if (a) for all $n$ sufficiently large, $\mu[\widetilde\Gamma_n] > 0$; and (b) the corresponding sequence $\{A_n \nu^{\Gamma_n}\}$ of averaged equipartition measures converges to $\rho$:
$$\lim_n A_n \nu^{\Gamma_n} = \rho. \qquad (6.2)$$

If we use this definition, then canonical sequences have the following important property:

Corollary 6.1 (to Theorem 4.1) Let $\{\Gamma_n\}$ be a canonical sequence for the stationary source $\rho$. Let $\{\Gamma'_n\}$ satisfy $\Gamma'_n \subset \Gamma_n$ and $\mu[\widetilde\Gamma'_n] > 0$ for all $n$ sufficiently large. Then $\{\Gamma'_n\}$ is a reconstruction sequence in the sense of Definition 6.1:
$$\lim_n A_n \nu^{\Gamma'_n} = \rho. \qquad (6.3)$$

By hypothesis we have for all large n

?0n [?n ] = 1 :

(6.5)

jhf; An ?0n i ? hf; ij = jAnf; ?0n i ? hf; ij jf (!)j :  " + 2(mn? 1) sup !2

(6.6)

limnsup jhf; An ?0n i ? hf; ij  " :

(6.7)

e

It follows from (6.1) that

Hence

Since f 2 Floc and " > 0 are arbitrary, it follows that fAn ?0n g converges to . 2

Remarks:


(a) If each $\Gamma'_n$ has cyclic symmetry, then the conclusion of the corollary holds with the original definition of reconstruction sequence, Definition 2.4.

(b) The set $\Gamma'_n$ can be a singleton or, in the case of cyclic symmetry, contain at most $n$ elements. For such $\{\Gamma'_n\}$, we have
$$\lim_n \frac{1}{n} \log \#\Gamma'_n = 0. \qquad (6.8)$$

5. Examples of `small' reconstruction sequences are provided also by the Ergodic Theorem. Let $\rho$ be an ergodic measure on $(\Omega, \mathcal F)$; we give an example of a reconstruction sequence $\{\Gamma_n\}$ for $\rho$ which grows very slowly:
$$\lim_n \frac{1}{n} \log \#\Gamma_n = 0. \qquad (6.9)$$
Let $1_a \in \mathcal F_n$ denote the indicator function of the atom $X_n^{-1} a$ of $\mathcal F_n$, where $a$ is a word in $\Omega_n$:
$$1_a(\omega) = \begin{cases} 1 & \text{if } \omega_1 = a_1, \ldots, \omega_n = a_n; \\ 0 & \text{otherwise.} \end{cases} \qquad (6.10)$$
The Ergodic Theorem implies that
$$\lim_{n\to\infty} A_n 1_a(\omega) = \rho[X_n^{-1} a] = \rho_n[a] \quad \rho\text{-a.e.} \qquad (6.11)$$
Let $\Omega(\rho, a)$ be the set on which the above limit holds; let $\Omega(\rho)$ denote the intersection of $\Omega(\rho, a)$ over all words $a$ in $\Omega_n$ and all $n = 1, 2, \ldots$. We have $\rho[\Omega(\rho)] = 1$; hence $\Omega(\rho)$ is non-empty. Choose a sequence $\omega \in \Omega(\rho)$; for each $n \in \mathbb N$, define $\Gamma_n$ to be the set formed by the distinct cyclic permutations of the word $X_n \omega$; then
$$\lim_n \nu^{\Gamma_n} = \rho, \qquad (6.12)$$
so that $\{\Gamma_n\}$ is a reconstruction sequence for $\rho$.

6. The approach to large deviation theory sketched in Section 1 is described fully in [LP]; it has its origins in Ruelle's treatment [R] of thermodynamic entropy and Lanford's proof [L] of Cramér's Theorem.

7. Our conditional limit theorem has antecedents; the earliest we are aware of is due to van Campenhout and Cover [CC]: let $Y_1, Y_2, \ldots$ be i.i.d. random variables having uniform probability mass on the range $\{1, 2, \ldots, m\}$. Then, for $1 \le \alpha \le m$ and for all $x \in \{1, 2, \ldots, m\}$, we have
$$\lim_{\substack{n \to \infty \\ n\alpha \text{ integer}}} \mathrm{Prob}\Big\{Y_1 = x \ \Big|\ \frac{1}{n} \sum_{i=1}^n Y_i = \alpha\Big\} = \pi(x), \qquad (6.13)$$
where
$$\pi(x) = \frac{e^{\lambda x}}{\sum_{k=1}^m e^{\lambda k}} \qquad (6.14)$$
and the constant $\lambda$ is chosen to satisfy the constraint $\sum_k k\,\pi(k) = \alpha$.
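The van Campenhout–Cover limit (6.13)–(6.14) can be checked by exact enumeration for small $n$. For $m = 3$ and $\alpha = 2$, $\alpha$ is the mean of the uniform distribution, so $\lambda = 0$ and $\pi$ is uniform; a sketch using a simple counting recursion (the function names are ours):

```python
from functools import lru_cache

m, n, alpha = 3, 12, 2   # small case; alpha = 2 is the mean of Unif{1,2,3}

@lru_cache(maxsize=None)
def count(k, s):
    """Number of sequences of length k over {1,...,m} with sum s."""
    if k == 0:
        return 1 if s == 0 else 0
    return sum(count(k - 1, s - y) for y in range(1, m + 1) if s >= y)

# P[Y_1 = x | sum = n*alpha] is proportional to count(n-1, n*alpha - x).
weights = {x: count(n - 1, n * alpha - x) for x in range(1, m + 1)}
z = sum(weights.values())
cond = {x: w / z for x, w in weights.items()}
# As n grows, cond approaches pi(x) = 1/3 for every x (lambda = 0 in (6.14)).
```

Already at $n = 12$ the conditional probabilities deviate from $1/3$ by less than $0.01$, and the symmetry $y \mapsto 4 - y$ makes $\mathrm{P}[Y_1 = 1] = \mathrm{P}[Y_1 = 3]$ exact at every $n$.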


A landmark in the development of such theorems is the paper by Csiszár [C], in which several important concepts are introduced.

8. We have shown that canonical sequences have the reconstruction property. In the case where $\rho$ is a product measure, we have given other examples of reconstruction sequences. In the literature, sequences of the form
$$\Gamma_n^m := \Big\{a \in \Omega_n : \Big|\frac{1}{n} \log \frac{\rho_n[a]}{\mu_n[a]} - h(\rho\,|\,\mu)\Big| \le \frac{1}{m}\Big\} \qquad (6.15)$$
are used frequently. When $\rho$ is ergodic, the Shannon–McMillan–Breiman Theorem implies that $\{\Gamma_n^m\}$ is a supporting sequence:
$$\lim_{n\to\infty} \rho_n[\Gamma_n^m] = 1. \qquad (6.16)$$
We can choose the strictly increasing sequence $\{N_m\}$ so that $n \ge N_m$ implies
$$\rho_n[\Gamma_n^m] \ge 1 - \frac{1}{m}. \qquad (6.17)$$
Define
$$\Gamma'_n := \begin{cases} \Omega_n & \text{if } n < N_1; \\ \Gamma_n^m & \text{if } N_m \le n < N_{m+1}; \end{cases} \qquad (6.18)$$
in the case of ergodic $\rho$, we can use sets of the form (6.15) to generate a supporting sequence for $\rho$. Straightforward estimates show $\{\Gamma'_n\}$ to have entropic growth-rate. This suggests asking whether a supporting sequence with entropic growth-rate is a reconstruction sequence. The example near (2.25) shows that, in general, a supporting sequence for $\rho$ with entropic growth-rate need not be a reconstruction sequence for $\rho$. We need an additional hypothesis ensuring that $\Gamma_n$ does not include points of low $\rho$-probability. Here is a result of this kind:

Proposition 6.1 Let $\rho$ be a stationary product measure and let $\{\Gamma_n\}$ be a supporting sequence for $\rho$. For each $\varepsilon > 0$, define
$$\Gamma_n^\varepsilon := \Big\{a \in \Gamma_n : \frac{1}{n} \log \frac{\rho_n[a]}{\mu_n[a]} < h(\rho\,|\,\mu) - \varepsilon\Big\}. \qquad (6.19)$$
If, for each $\varepsilon > 0$,
$$\lim_{n\to\infty} \frac{\mu_n[\Gamma_n^\varepsilon]}{\mu_n[\Gamma_n]} = 0, \qquad (6.20)$$
then $\{\Gamma_n\}$ is a reconstruction sequence for $\rho$.

Proof: We have

" [?"n] ?"n ; ?n = n [?n[?n ?] n ] ?nn?"n + n[? (6.21) n n n n] so (6.20) implies that the sequence fAn ?n g converges to if, and only if, the " sequence fAn ?nn?n g converges to . Thus it suces to consider the case in which ?"n = ; for all " > 0 and all n 2 IN. For a 2 n , we have fn (a) n[?n] n?n [a] = n[a]; (6.22)

Reconstruction Sequences and Equipartition Measures

where fn is given by Assuming ?"n = ;, we have Then so

fn (a) := n [a]= n[a]:

(6.23)

log fn  n (h( j ) ? ") :

(6.24)

1 D( ?n n) = [? n n]

27

Z

?n

? log(fn n[?n]) d n;

1 D( ?n )  ? lim inf 1 log [? ] ? h( j ) + "  " lim sup n n n n!1 n n!1 n

(6.25) (6.26)

for every " > 0, because f?n g is a supporting sequence. Any limit point  of the sequence fAn ?n g is stationary. Lemma 8.1 of [LPS] and the lower semi-continuity of the speci c information gain imply

h(j ) = 0:

(6.27)

Since is assumed to be a product measure, this implies  = . But the level sets of h are compact, so the sequence fAn ?n g converges to . 2

Remarks: (a) If $\{\Gamma_n\}$ is a supporting sequence for $\rho$, then the sequence $\{\widehat\Gamma_n\}$ given by
$$\widehat\Gamma_n := \Gamma_n \cap \Gamma'_n, \qquad (6.28)$$
with $\Gamma'_n$ as given in (6.18), is a supporting sequence for $\rho$ which satisfies (6.20).

(b) Under the condition that $\rho$ is weakly dependent (see [LPS]), one may also deduce (6.27). One can then conclude that $\lambda$ is a Gibbs state for the interaction associated with $\rho$; this does not, in general, imply that $\lambda = \rho$.

Acknowledgements: We thank Frank den Hollander for a careful reading of an earlier draft of this paper. This work was partially supported by the European Commission under the Human Capital and Mobility Scheme (EU contract CHRX-CT93-0411).


References

[B] L. Breiman, The Individual Ergodic Theorem of Information Theory, Ann. Math. Stat. 28, 809-811 (1957)
[C] I. Csiszár, Sanov property, generalized I-projection and a conditional limit theorem, Ann. Prob. 12, 768-793 (1984)
[CC] J.M. van Campenhout and T.M. Cover, Maximum Entropy and Conditional Probability, IEEE Trans. Inform. Theory 27, 483-489 (1981)
[G] J.W. Gibbs, Elementary Principles in Statistical Mechanics, New Haven, Connecticut: Yale University Press (1902)
[K] A.N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung (Ergebnisse der Math.), Berlin: Springer (1933); transl. as Foundations of the Theory of Probability, New York: Chelsea (1956)
[L] O.E. Lanford, Entropy and equilibrium states in classical statistical mechanics, in Lecture Notes in Physics 20, 1-113, Berlin: Springer (1973)
[LP] J.T. Lewis and C.-E. Pfister, Thermodynamic probability theory: some aspects of large deviations, Russian Math. Surveys 50:2, 279-317 (1995)
[LPS1] J.T. Lewis, C.-E. Pfister and W.G. Sullivan, Large Deviations and the Thermodynamic Formalism: a new proof of the equivalence of ensembles, in On Three Levels, M. Fannes, C. Maes, A. Verbeure (eds.), Plenum Press, 183-193 (1994)
[LPS2] J.T. Lewis, C.-E. Pfister and W.G. Sullivan, The Equivalence of Ensembles for Lattice Systems: some examples and a counterexample, J. Stat. Phys. 77, 397-419 (1994)
[LPS] J.T. Lewis, C.-E. Pfister and W.G. Sullivan, Entropy, Concentration of Probability and Conditional Limit Theorems, Markov Processes and Related Fields 1, 319-386 (1995)
[M] B. McMillan, The Basic Theorems of Information Theory, Ann. Math. Stat. 24, 196-219 (1953)
[R] D. Ruelle, Correlation functionals, J. Math. Phys. 6, 201-220 (1965)
[S] C.E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27, 379-423 and 623-656 (1948)