Inequalities for Shannon entropies and Kolmogorov complexities

D. Hammer
Technische Universität, Berlin

A. E. Romashchenko
Moscow State University
[email protected]

A. Shen
Institute of Problems of Information Transmission, Moscow
[email protected], [email protected]

N. K. Vereshchagin
Moscow State University
[email protected]

Abstract

The paper investigates connections between linear inequalities that are valid for Shannon entropies and for Kolmogorov complexities.

1 Introduction

From the very beginning the notion of complexity of finite objects was considered as an algorithmic counterpart to the notion of Shannon entropy. Kolmogorov's paper [4] was called "Three approaches to the quantitative definition of information"; Shannon entropy and algorithmic complexity were among these approaches. It was mentioned by Kolmogorov in [5] that the properties of algorithmic complexity and Shannon entropy are similar. We investigate one aspect of this similarity. Namely, we are interested in linear inequalities that are valid for Shannon entropies and for Kolmogorov complexities. It turns out that (1) all inequalities that are valid for Kolmogorov complexities are also valid for Shannon entropies and vice versa; (2) all inequalities that are valid for Shannon entropies are valid for ranks of finite subsets of a linear space; (3) the opposite statement is not true: Ingleton's inequality ([3], see also [8]) is valid for ranks but not for Shannon entropies; (4) for some special cases all three classes of inequalities coincide and have a simple description. We present an inequality for Kolmogorov complexities that implies Ingleton's inequality for ranks; another application of this inequality is a new simple proof of one of the Gács–Körner results on common information [1].

Acknowledgements. The work of the Moscow authors was supported in part by INTAS project No. 93-0893. A. Shen also thanks the Volkswagen Foundation for support, and Bonn University and Prof. M. Karpinski for hospitality.

2 Shannon entropy and Kolmogorov complexity

Let ξ be a random variable with a finite range a1, ..., an. Let pi be the probability of the event ξ = ai. Then the Shannon entropy of ξ is defined as

H(ξ) = − Σi pi log pi.

(All logarithms in this paper are base 2.) Using the convexity of the function −x log x, one can prove that the Shannon entropy of a random variable does not exceed the logarithm of the size of its range (and is equal to this logarithm for uniformly distributed variables).

Let η be another variable with a finite range b1, ..., bk defined on the same probability space as ξ. We define H(ξ | η = bj) in the same way as H(ξ); the only difference is that pi is replaced by the conditional probability Pr[ξ = ai | η = bj]. Then we define the conditional entropy as

H(ξ | η) = Σj Pr[η = bj] · H(ξ | η = bj).

It is easy to check that

H(⟨ξ, η⟩) = H(η) + H(ξ | η).

Using the convexity of the function −x log x, one can prove that H(ξ | η) ≤ H(ξ), and that H(ξ | η) = H(ξ) if and only if ξ and η are independent. In other terms, H(⟨ξ, η⟩) ≤ H(ξ) + H(η). The mutual information of ξ and η is defined as

I(ξ : η) = H(ξ) − H(ξ | η) = H(ξ) + H(η) − H(⟨ξ, η⟩).

The mutual information I(ξ : η) is always non-negative and is equal to 0 if and only if ξ and η are independent. The conditional version of mutual information is defined as

I(ξ : η | ζ) = H(ξ | ζ) + H(η | ζ) − H(⟨ξ, η⟩ | ζ)

and is always non-negative, too. This is proved as follows. For any possible value ci of ζ we have

H(ξ | ζ = ci) + H(η | ζ = ci) − H(⟨ξ, η⟩ | ζ = ci) ≥ 0.

Multiplying this inequality by Pr[ζ = ci] and summing over i yields the desired inequality. All these notions have their counterparts in Kolmogorov complexity theory.

The Kolmogorov complexity of a binary string a is defined as the minimal length of a program that generates a. There are different refinements of this idea (called simple Kolmogorov complexity, monotone complexity, prefix complexity, decision complexity; see [6], [7]). However, for our purposes the difference is not important, since all these complexity measures differ only by O(log m), where m is the length of a. Therefore, in the sequel we denote the Kolmogorov complexity of a binary string a by K(a) without specifying which version we use, and all our equalities and inequalities are valid up to an O(log m) term, where m is the total length of all strings involved.

The conditional complexity K(a|b) is defined as the minimal length of a program that produces a having b as input; one can prove that

K(b|a) = K(⟨a, b⟩) − K(a)

(see, e.g., [9] for the proof). Here ⟨a, b⟩ denotes the encoding of the pair a, b by a binary string (different computable encodings lead to complexities that differ only by O(1)). As always, the O(log m) additive term is omitted. So the precise meaning of this equality is as follows: there exist constants p, q such that

K(b|a) ≤ K(⟨a, b⟩) − K(a) + p log(|a| + |b|) + q,
K(⟨a, b⟩) − K(a) ≤ K(b|a) + p log(|a| + |b|) + q

for all binary words a, b. The mutual information is defined as

I(a : b) = K(b) − K(b|a).

An equivalent (up to an O(log m) term) symmetric definition is

I(a : b) = K(a) + K(b) − K(⟨a, b⟩).

As in the Shannon case, the mutual information is always non-negative (up to an O(log m) term). The conditional version of mutual information is defined as

I(a : b|c) = K(a|c) + K(b|c) − K(⟨a, b⟩|c).

The inequality I(a : b|c) ≥ 0 is valid up to a logarithmic term, that is, I(a : b|c) ≥ −O(log(|a| + |b| + |c|)). This inequality plays an important role in the sequel.
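The Shannon quantities above are easy to compute mechanically from a finite joint distribution. The following sketch (plain Python; the helper names are ours, not from the paper) computes marginal entropies and conditional mutual information from a joint probability table and spot-checks the monotonicity and non-negativity facts just stated on a randomly generated distribution.

```python
import itertools
from math import log2
from random import random

def H(dist):
    """Shannon entropy (base 2) of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginal of a joint distribution on the coordinates listed in idx."""
    m = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + p
    return m

def cond_mutual_info(joint, a, b, c=()):
    """I(a : b | c) = H(a,c) + H(b,c) - H(a,b,c) - H(c), via unconditional entropies."""
    return (H(marginal(joint, a + c)) + H(marginal(joint, b + c))
            - H(marginal(joint, a + b + c)) - H(marginal(joint, c)))

# a random joint distribution of three binary variables (coordinates 0, 1, 2)
outcomes = list(itertools.product((0, 1), repeat=3))
w = [random() for _ in outcomes]
joint = {o: x / sum(w) for o, x in zip(outcomes, w)}

h0 = H(marginal(joint, (0,)))
h1 = H(marginal(joint, (1,)))
h01 = H(marginal(joint, (0, 1)))
assert h01 - h1 <= h0 + 1e-12                                # H(ξ|η) <= H(ξ)
assert cond_mutual_info(joint, (0,), (1,), (2,)) >= -1e-12   # I(ξ : η | ζ) >= 0
```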

3 Inequalities

We have already mentioned several inequalities for Shannon entropies and Kolmogorov complexities. Some others are known; for example, the inequality

2K(⟨a, b, c⟩) ≤ K(⟨a, b⟩) + K(⟨a, c⟩) + K(⟨b, c⟩).    (1)

This inequality is equivalent in a sense to the following geometric fact: if V is the volume of a set A ⊂ R^3 and Sxy, Sxz and Syz are the areas of its three projections (on OXY, OXZ and OYZ), then V^2 ≤ Sxy · Sxz · Syz (see [2]). It turns out that the inequality (1), as well as all other known inequalities for Kolmogorov complexities, is a corollary of inequalities and equalities of the following type:

I(P : Q|R) ≥ 0,    (2)
K(Q|P) = K(⟨P, Q⟩) − K(P),    (3)
I(P : Q|R) = K(P|R) + K(Q|R) − K(⟨P, Q⟩|R),    (4)

where P, Q, R are some tuples (possibly empty) of binary strings. Indeed, (1) is a consequence of the equality

2K(⟨a, b, c⟩) = K(⟨a, b⟩) + K(⟨a, c⟩) + K(⟨b, c⟩) − I(a : b|c) − I(⟨a, b⟩ : c)    (5)

and the inequalities I(a : b|c) ≥ 0 and I(⟨a, b⟩ : c) ≥ 0. To check the equality (5) we express all the quantities involved in terms of unconditional complexities. For example, we replace I(a : b|c) by

K(a|c) + K(b|c) − K(⟨a, b⟩|c) = K(⟨a, c⟩) − K(c) + K(⟨b, c⟩) − K(c) − K(⟨a, b, c⟩) + K(c) = K(⟨a, c⟩) + K(⟨b, c⟩) − K(⟨a, b, c⟩) − K(c),

and so on.

Let us consider another example. Assume that a and b are two binary strings. Let us prove that the mutual information I(a : b) is an upper bound for the complexity K(x) of any string x which has negligible conditional complexities K(x|a) and K(x|b). Indeed, the following inequality holds for any three strings a, b, x:

K(x) ≤ K(x|a) + K(x|b) + I(a : b).    (6)

This inequality is a consequence of the equality

K(x) = I(a : b) + K(x|a) + K(x|b) − K(x|⟨a, b⟩) − I(a : b|x)

and the inequalities K(x|⟨a, b⟩) ≥ 0 and I(a : b|x) ≥ 0.

The inequalities of type (2) can be written in different equivalent forms:

I(P : Q|R) ≥ 0,
K(P|R) + K(Q|R) ≥ K(⟨P, Q⟩|R),
K(P|R) ≥ K(P|⟨Q, R⟩),
K(⟨P, R⟩) + K(⟨Q, R⟩) ≥ K(⟨P, Q, R⟩) + K(R).

Here P, Q and R are strings or tuples of strings; ⟨P, R⟩ denotes the union of the tuples P and R (it does not matter whether we list the strings that are in P ∩ R twice or not — the complexity does not change), etc. The latter form does not involve conditional complexities. In general, we may always replace conditional complexities and mutual informations by linear combinations of unconditional complexities, using the equalities (3) and (4). Therefore, in the sequel we consider inequalities containing only unconditional complexities. The same applies to inequalities for Shannon entropies.

We call the inequalities

K(⟨P, R⟩) + K(⟨Q, R⟩) ≥ K(⟨P, Q, R⟩) + K(R)    (7)

(for any tuples P, Q, R) basic inequalities. Let us mention two special cases of the inequalities (7). If P = Q, we get the inequality

K(⟨P, R⟩) + K(⟨P, R⟩) ≥ K(⟨P, R⟩) + K(R),

or K(⟨P, R⟩) ≥ K(R),

or

K(P|R) ≥ 0.

Therefore, the inequality K(x|⟨a, b⟩) ≥ 0 in our second example is also a corollary of the basic inequalities (7). If R is empty, we get the inequality

K(P) + K(Q) ≥ K(⟨P, Q⟩),

or K(P) ≥ K(P|Q).

All inequalities mentioned in this section have counterparts that involve Shannon entropy instead of Kolmogorov complexity. The questions we are interested in are: 1) whether the same linear inequalities are true for Shannon entropies and for Kolmogorov complexities, and 2) whether all linear inequalities valid for Shannon entropies (Kolmogorov complexities) are consequences of basic inequalities. In the next section we obtain a positive answer to the first question, and a positive answer to the second question in the case when at most three random variables (binary strings) are involved.

4 Linear inequalities

Consider n variables a1, ..., an whose values are binary strings (if we consider Kolmogorov complexities) or random variables (for Shannon entropies). There are 2^n − 1 nonempty subsets of the set of variables. Therefore, there are 2^n − 1 tuples whose complexity (or entropy) may appear in an inequality. We consider only linear inequalities. Each inequality has 2^n − 1 coefficients λ_w indexed by the non-empty subsets w of the set {1, 2, ..., n}; for example, for n = 3 the general form is

λ_1 K(a1) + λ_2 K(a2) + λ_3 K(a3) + λ_{1,2} K(⟨a1, a2⟩) + λ_{1,3} K(⟨a1, a3⟩) + λ_{2,3} K(⟨a2, a3⟩) + λ_{1,2,3} K(⟨a1, a2, a3⟩) ≥ 0.

Here a1, a2, a3 are binary strings; for Shannon entropies they should be replaced by random variables, and K should be replaced by H. For arbitrary n the general form of a linear inequality under consideration is

Σ_w λ_w K(a_w) ≥ 0,    (8)

where the sum is over all nonempty subsets w of {1, 2, ..., n}, and a_w stands for the tuple consisting of all ai for i ∈ w.

Now consider the set of inequalities that are valid (up to an O(log m) term, as usual) for all binary strings. This set is a convex cone in R^(2^n − 1). We want to compare this cone with the similar cone for Shannon entropies.
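Since every conditional quantity reduces to unconditional complexities, each basic inequality (7) is just a vector of 2^n − 1 coefficients. The following sketch (helper names of our own choosing) enumerates these coefficient vectors for n = 3, indexing the coordinates by the nonempty subsets of {1, 2, 3}:

```python
from itertools import combinations

VARS = (1, 2, 3)
SUBSETS = [frozenset(s) for r in range(1, 4) for s in combinations(VARS, r)]

def basic_inequality(p, q, r):
    """Coefficient vector of K(P∪R) + K(Q∪R) - K(P∪Q∪R) - K(R) >= 0,
    indexed by the nonempty subsets of {1,2,3}; an empty R drops the K(R) term."""
    coeff = {s: 0 for s in SUBSETS}
    for s, c in ((p | r, 1), (q | r, 1), (p | q | r, -1), (r, -1)):
        if s:                              # the empty tuple contributes nothing
            coeff[s] += c
    return coeff

# enumerate all basic inequalities over nonempty P, Q and arbitrary (possibly empty) R
with_empty = [frozenset()] + SUBSETS
vectors = {tuple(basic_inequality(p, q, r)[s] for s in SUBSETS)
           for p in SUBSETS for q in SUBSETS for r in with_empty}
vectors.discard((0,) * 7)                  # degenerate choices give the zero vector
print(len(vectors), "distinct nonzero coefficient vectors in R^7")
```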


Theorem 1 Any linear inequality that is true for Kolmogorov complexities is also true for Shannon entropies and vice versa.

Proof. Let an inequality of the form (8) be true for Kolmogorov complexities (up to an O(log m) term). Let ξ1, ..., ξn be random variables. We have to prove that

Σ_w λ_w H(ξ_w) ≥ 0,

where the sum is over all nonempty subsets w of {1, 2, ..., n}, and ξ_w stands for the tuple consisting of all ξi for i ∈ w.

Consider a sequence of independent tuples of random variables ξ^1 = ⟨ξ^1_1, ..., ξ^1_n⟩, ..., ξ^N = ⟨ξ^N_1, ..., ξ^N_n⟩, ...; all ξ^1, ξ^2, ... are independent and have the same distribution as ξ = ⟨ξ1, ..., ξn⟩. For a given N, consider the random variables η_1^(N) = ξ^1_1 ξ^2_1 ... ξ^N_1, ..., η_n^(N) = ξ^1_n ξ^2_n ... ξ^N_n. Values of ξi are elements of a fixed finite set; using a suitable encoding (where all codes have the same length) we may assume that the values of all ξ^j_i are binary strings of equal length. In this case the values of η_1^(N), ..., η_n^(N) are binary strings, and any inequality for Kolmogorov complexities may be applied to any values of η_1^(N), ..., η_n^(N). Therefore, for some c and for all N

Σ_w λ_w K((η^(N))_w) ≥ −c log N − c

with probability 1. Dividing this by N we get

Σ_w λ_w K((η^(N))_w) / N ≥ (−c log N − c) / N

with probability 1. It remains to use the following connection between Shannon entropy and Kolmogorov complexity. Let τ be a random variable whose values are finite binary strings of a fixed length. Consider the sequence τ1, τ2, ... of independent random variables, where each τi has the same distribution as τ. Then

lim_{N→∞} K(τ1 τ2 ... τN) / N = H(τ)

with probability 1 (see [9], equation (5.18)). Let us fix w and apply this to τ = ξ_w. It is easy to see that K((η^(N))_w) is equal (up to an O(1) term) to K(τ1 ... τN). Therefore,

lim_{N→∞} K((η^(N))_w) / N = H(ξ_w)

with probability 1. Hence the inequality Σ_w λ_w H(ξ_w) ≥ 0 is true.

Now we have to prove the converse: if the inequality

Σ_w λ_w H(ξ_w) ≥ 0

is true for any random variables ξ1, ..., ξn, then the inequality

Σ_w λ_w K(a_w) ≥ −O(log m)

is true for all binary strings a1, a2, ..., an, where m = |a1| + |a2| + ... + |an| (the constant hidden in O(log m) may depend on n).

Let a1, a2, ..., an be binary strings. Given u ⊆ {1, 2, ..., n}, denote by ū the set {1, 2, ..., n} \ u. Consider the set M consisting of all tuples ⟨b1, b2, ..., bn⟩ such that

K(b_u) ≤ K(a_u),    K(b_ū | b_u) ≤ K(a_ū | a_u)

for all nonempty u ⊆ {1, 2, ..., n}. (If u = {1, 2, ..., n} the second inequality should be skipped.) There exists a program P that, given the numbers K(a_ū | a_u) and K(a_u) for all nonempty u ⊆ {1, 2, ..., n}, eventually prints all the elements of M in some order. (However, at any moment of the run of P we cannot be sure that all the elements of M have already been printed.) The number of elements in M is at least 2^(K(⟨a1,...,an⟩) − O(log m)). Indeed, the tuple ⟨a1, ..., an⟩ can be specified by the numbers K(a_ū | a_u) and K(a_u) for all u (O(log m) bits in total) and by the ordinal number of ⟨a1, ..., an⟩ in the order in which the program P prints the elements of M (at most log|M| bits). Thus K(⟨a1, ..., an⟩) ≤ log|M| + O(log m), and the inequality |M| ≥ 2^(K(⟨a1,...,an⟩) − O(log m)) follows.

Let β = ⟨β1, β2, ..., βn⟩ denote the random variable uniformly distributed in M. We have

Σ_u λ_u H(β_u) ≥ 0.

Let us derive from this the desired inequality for the complexities of a1, a2, ..., an, their pairs, triples, etc. To this end let us prove that H(β_u) is close to K(a_u) for any nonempty u ⊆ {1, 2, ..., n}. Let us fix u ⊆ {1, 2, ..., n}. We will show that β_u is close to the random variable uniformly distributed in a set having 2^K(a_u) elements. Indeed, the cardinality of the set {b_u | K(b_u) ≤ K(a_u)} is at most 2^(K(a_u)+O(1)). Therefore β_u has no more than 2^(K(a_u)+O(1)) values. Hence H(β_u) ≤ K(a_u) + O(1). To prove the converse inequality, let us note that if Pr[τ = x] ≤ p for all possible values x of a random variable τ, then H(τ) ≥ −log p. So it suffices to show that Pr[β_u = b_u] ≤ 2^(−K(a_u)+O(log m)) for any ⟨b1, ..., bn⟩ in M. We have

Pr[β_u = b_u] = |{c ∈ M | c_u = b_u}| / |M|.

If c ∈ M and c_u = b_u, then K(c_ū | b_u) = K(c_ū | c_u) ≤ K(a_ū | a_u). Therefore |{c ∈ M | c_u = b_u}| ≤ 2^(K(a_ū|a_u)+O(1)). Hence,

Pr[β_u = b_u] ≤ 2^(K(a_ū|a_u)+O(1)) / 2^(K(⟨a1,...,an⟩)−O(log m)) = 2^(K(a_ū|a_u)−K(⟨a1,...,an⟩)+O(log m)) ≤ 2^(−K(a_u)+O(log m)).

(End of proof.)
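Kolmogorov complexity is not computable, so the limit used in the proof cannot be evaluated exactly; as a very rough empirical illustration one can replace K by the output length of a general-purpose compressor, which upper-bounds the complexity up to the compressor's overhead. The sketch below (arbitrary toy distribution of our choosing, zlib as the stand-in) only shows the per-symbol figure staying above and loosely tracking H(τ); it is not a substitute for the cited result.

```python
import zlib
from math import log2
from random import choices

# an arbitrary toy distribution for τ (our choice, not from the paper)
symbols, probs = b"abcd", [0.7, 0.1, 0.1, 0.1]
H_tau = -sum(p * log2(p) for p in probs)               # H(τ) in bits

for N in (1_000, 10_000, 100_000):
    sample = bytes(choices(symbols, probs, k=N))       # τ1 τ2 ... τN, i.i.d.
    upper_bound_bits = 8 * len(zlib.compress(sample, 9))
    # the compressed length only upper-bounds K(τ1...τN), up to compressor overhead
    print(N, round(upper_bound_bits / N, 3), "bits/symbol; H(τ) =", round(H_tau, 3))
```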

Assume that a linear space L over a finite field or over R is given. Let α1, ..., αn be finite subsets of L. For any subset A ⊆ {α1, ..., αn} consider the rank of the union of all α ∈ A. Now consider all linear inequalities that are valid for the ranks of these subsets for all α1, ..., αn ⊆ L. For example, the inequality of type (7) for ranks says that

rk(α1 ∪ α3) + rk(α2 ∪ α3) ≥ rk(α1 ∪ α2 ∪ α3) + rk(α3).

This inequality can be rewritten in terms of dimensions of subspaces: replacing each αi by the linear subspace Ai generated by αi, we get

dim(A1 + A3) + dim(A2 + A3) ≥ dim(A1 + A2 + A3) + dim(A3).

It is easy to verify that this inequality is true for any linear subspaces of any linear space. So all basic inequalities are true when K(·) is replaced by rk(·) and strings are replaced by vectors. Moreover, the following is true.

Theorem 2 Any inequality valid for Shannon entropies is valid for ranks in any linear space over any finite field or over R.

Proof of Theorem 2. Assume that A1, ..., An are subspaces of a finite-dimensional linear space L over a field F. It suffices to construct random variables ξ1, ..., ξn such that H(ξi) is proportional to dim Ai, H(⟨ξi, ξj⟩) is proportional to dim(Ai + Aj), ..., and H(⟨ξ1, ξ2, ..., ξn⟩) is proportional to dim(A1 + A2 + ... + An).

If F is finite, the construction is straightforward. Consider a random linear functional φ : L → F. For any subspace A ⊆ L consider the restriction φ|A. This is a random variable with |F|^dim A values (here |F| is the number of elements in F); all values have equal probabilities, so H(φ|A) = dim A · log|F|. If Ai and Aj are different subspaces, the pair ⟨φ|Ai, φ|Aj⟩ is equivalent to (and has the same distribution as) φ|Ai+Aj. Therefore the entropy of the pair ⟨φ|Ai, φ|Aj⟩ is equal to dim(Ai + Aj) · log|F|; the same is true for triples, etc.

Now consider the case F = R. We may assume that L is a Euclidean space. Let ξ be a random variable uniformly distributed in the unit ball of L. For any subspace A consider the random variable ξ_A that is the orthogonal projection of ξ onto A. This random variable has an infinite domain, so we need to digitize it. For any ε > 0 and for any subspace A ⊆ L we divide A into cubes of dimension dim A with side ε. By ξ_{A,ε} we denote the variable whose value is the cube that contains ξ_A. Let us prove that H(ξ_{A,ε}) = log(1/ε) · dim A + O(1) (as ε → 0). If ε is small enough, the number k_{A,ε} of cubes that are possible values of ξ_A satisfies the inequality

k_{A,ε} ≤ 2(1/ε)^dim A · V_{dim A},

where V_{dim A} stands for the volume of the dim A-dimensional unit ball. Therefore,

H(ξ_{A,ε}) ≤ log(1/ε) · dim A + 1 + log V_{dim A}.

On the other hand, for any fixed cube the probability of ξ_A getting into it is at most

ε^dim A · V_{dim L − dim A} / V_{dim L}.

Hence

H(ξ_{A,ε}) ≥ log(1/ε) · dim A + log V_{dim L − dim A} − log V_{dim L}.

The projection ξ_{A1+A2} is equivalent to ⟨ξ_{A1}, ξ_{A2}⟩. This is not true for the ε-versions; the random variables ξ_{A1+A2, ε} and ⟨ξ_{A1,ε}, ξ_{A2,ε}⟩ do not determine each other completely. However, for any fixed value of one of these variables there exist only a finite number of possible values of the other one; therefore, the conditional entropies are bounded and the entropies differ by O(1). Now we let ε → 0 and conclude that any inequality that is valid for Shannon entropies is valid for ranks. (End of proof.)

Therefore, we have a sequence of inclusions: (basic inequalities (7) and their non-negative linear combinations) ⊆ (inequalities valid for Kolmogorov complexities) = (inequalities valid for Shannon entropies) ⊆ (inequalities valid for ranks).
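The finite-field construction used in the proof is easy to replay numerically: over F = GF(2) a uniformly random linear functional, restricted to a subspace A, is uniform over |F|^dim A values, so its entropy is dim A · log|F| = dim A bits, and the pair of restrictions to A and B has entropy dim(A + B). A self-contained sketch with arbitrarily chosen subspaces:

```python
from itertools import product
from math import log2
from collections import Counter

def entropy(counter):
    total = sum(counter.values())
    return -sum(c / total * log2(c / total) for c in counter.values())

def restrict(f, basis):
    """Values of the functional x -> f·x (mod 2) on a basis of the subspace."""
    return tuple(sum(fi * bi for fi, bi in zip(f, b)) % 2 for b in basis)

# L = GF(2)^4; two subspaces given by bases (dim A = 2, dim B = 2, dim(A+B) = 3)
basis_A = [(1, 0, 0, 0), (0, 1, 0, 0)]
basis_B = [(0, 1, 0, 0), (0, 0, 1, 0)]

# enumerate the uniformly random functional f over all of GF(2)^4
dist_A, dist_B, dist_AB = Counter(), Counter(), Counter()
for f in product((0, 1), repeat=4):
    a, b = restrict(f, basis_A), restrict(f, basis_B)
    dist_A[a] += 1
    dist_B[b] += 1
    dist_AB[(a, b)] += 1

print(entropy(dist_A))    # 2.0 = dim A (in bits, since log|F| = 1)
print(entropy(dist_B))    # 2.0 = dim B
print(entropy(dist_AB))   # 3.0 = dim(A + B)
```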

For n = 1, 2, 3 all these sets are equal, as the following theorem shows:

Theorem 3 For n = 1, 2, 3 any inequality valid for ranks is a consequence (linear combination with non-negative coefficients) of basic inequalities (7).

Proof. The cases n = 1, 2 are trivial. Let us consider the case n = 3. Consider the following 9 basic inequalities:

rk(A + B) ≤ rk(A + B + C)
rk(A + C) ≤ rk(A + B + C)
rk(B + C) ≤ rk(A + B + C)
rk(A + B) ≤ rk A + rk B
rk(A + C) ≤ rk A + rk C
rk(B + C) ≤ rk B + rk C
rk A + rk(A + B + C) ≤ rk(A + B) + rk(A + C)
rk B + rk(A + B + C) ≤ rk(A + B) + rk(B + C)
rk C + rk(A + B + C) ≤ rk(A + C) + rk(B + C)    (9)

We claim that any linear inequality in dim A, dim B, dim C, dim(A + B), dim(A + C), dim(B + C), dim(A + B + C) that is valid for all subspaces is a non-negative linear combination of these nine (for instance, so are all other basic inequalities). The inequalities (9) determine a convex cone C in the space R^7 whose variables are

rk A, rk B, rk C, rk(A + B), rk(B + C), rk(A + C), rk(A + B + C).

Any three subspaces A, B, C determine a point inside C; let us denote the set of all points in C obtained in this way by C0. To prove Theorem 3 it is enough to show that any point in C can be represented as a non-negative linear combination of points from C0. To this end consider the 8 points in C0 shown on Fig. 1. Let us show that any point in C can be represented as a non-negative linear combination of those 8 points. To prove this it is convenient to consider another coordinate system in R^7. We denote the new coordinates by

[a], [b], [c], [ab], [ac], [bc], [abc].

[Figure 2: Old and new variables — a Venn diagram of three circles A, B, C whose seven regions are labelled a, b, c, ab, ac, bc, abc.]

The relations between the new and old variables are:

rk A = [a] + [ab] + [ac] + [abc],
rk(A + B) = [a] + [b] + [ab] + [ac] + [bc] + [abc],
rk(A + B + C) = [a] + [b] + [c] + [ab] + [bc] + [ac] + [abc],

and similar formulae obtained by permutations of letters. (See Fig. 2.) Or, equivalently,

[a] = rk(A + B + C) − rk(B + C),
[ab] = rk(A + C) + rk(B + C) − rk(A + B + C) − rk C,
[abc] = rk(A + B + C) − rk(A + B) − rk(A + C) − rk(B + C) + rk A + rk B + rk C,
[b] = rk(A + B + C) − rk(A + C),
...

The inequalities (9) rewritten in the new variables are as follows:

[a] ≥ 0, [b] ≥ 0, [c] ≥ 0,
[ab] + [abc] ≥ 0, [ac] + [abc] ≥ 0, [bc] + [abc] ≥ 0,
[ab] ≥ 0, [ac] ≥ 0, [bc] ≥ 0.    (10)

(Please note that [abc] may be negative.) In the new variables, the 8 specified points in C0 are written as shown on Fig. 3. Thus we have to show that any vector satisfying the inequalities (10) is a non-negative linear combination of the 8 vectors represented on Fig. 3 (we denote them by v1–v8).

Figure 1: The 8 points in C0. By e1, e2, e3 we denote three pairwise independent vectors in a 2-dimensional space; {u, ...} stands for the linear subspace generated by u, ...; by 0 we denote the 0-dimensional subspace.

A         B         C           rk A  rk B  rk C  rk(A+B)  rk(A+C)  rk(B+C)  rk(A+B+C)
{e1}      0         0            1     0     0      1        1        0         1
0         {e1}      0            0     1     0      1        0        1         1
0         0         {e1}         0     0     1      0        1        1         1
{e1}      {e1}      0            1     1     0      1        1        1         1
{e1}      0         {e1}         1     0     1      1        1        1         1
0         {e1}      {e1}         0     1     1      1        1        1         1
{e1}      {e1}      {e1}         1     1     1      1        1        1         1
{e1}      {e2}      {e1+e2}      1     1     1      2        2        2         2

Figure 3: The 8 points in C0 rewritten in the new coordinates.

      [a]  [b]  [c]  [ab]  [ac]  [bc]  [abc]
v1     1    0    0    0     0     0     0
v2     0    1    0    0     0     0     0
v3     0    0    1    0     0     0     0
v4     0    0    0    1     0     0     0
v5     0    0    0    0     1     0     0
v6     0    0    0    0     0     1     0
v7     0    0    0    0     0     0     1
v8     0    0    0    1     1     1    -1
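The two tables can be cross-checked mechanically: applying the change of coordinates from Fig. 2 to the rank vectors of Fig. 1 must reproduce the vectors v1–v8 of Fig. 3. A short sketch of that check:

```python
fig1_ranks = [                       # (rk A, rk B, rk C, rk(A+B), rk(A+C), rk(B+C), rk(A+B+C))
    (1, 0, 0, 1, 1, 0, 1), (0, 1, 0, 1, 0, 1, 1), (0, 0, 1, 0, 1, 1, 1),
    (1, 1, 0, 1, 1, 1, 1), (1, 0, 1, 1, 1, 1, 1), (0, 1, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 1, 1, 1), (1, 1, 1, 2, 2, 2, 2),
]
fig3_vectors = [                     # ([a], [b], [c], [ab], [ac], [bc], [abc])
    (1, 0, 0, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0, 0), (0, 0, 1, 0, 0, 0, 0),
    (0, 0, 0, 1, 0, 0, 0), (0, 0, 0, 0, 1, 0, 0), (0, 0, 0, 0, 0, 1, 0),
    (0, 0, 0, 0, 0, 0, 1), (0, 0, 0, 1, 1, 1, -1),
]

def new_coordinates(r):
    """Change of variables from Fig. 2: ranks -> ([a], [b], [c], [ab], [ac], [bc], [abc])."""
    A, B, C, AB, AC, BC, ABC = r
    return (ABC - BC, ABC - AC, ABC - AB,
            AC + BC - ABC - C, AB + BC - ABC - B, AB + AC - ABC - A,
            ABC - AB - AC - BC + A + B + C)

assert [new_coordinates(r) for r in fig1_ranks] == fig3_vectors
```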

Let v = ([a], [b], [c], [ab], ..., [abc]) be an arbitrary vector in C. If [abc] is non-negative, we can represent v as a non-negative linear combination of v1–v7 as follows:

v = [a]·v1 + [b]·v2 + [c]·v3 + [ab]·v4 + ... + [abc]·v7.

Otherwise (when [abc] is negative) we can represent v as a non-negative linear combination of v1–v8 as follows:

v = [a]·v1 + [b]·v2 + [c]·v3 + ([ab] + [abc])·v4 + ([ac] + [abc])·v5 + ([bc] + [abc])·v6 − [abc]·v8.

Theorem 3 is proven.

5 Ingleton's inequality

As we have seen in the preceding section, for n = 3 the same inequalities are true for Shannon entropies, Kolmogorov complexities and ranks, namely, the non-negative linear combinations of basic inequalities. However, for n = 4 the situation becomes more complicated: there is an inequality that is true for ranks but not for Shannon entropies. This inequality was found by Ingleton [3].

Proposition 1 The following inequality is true for ranks:

I(A : B) ≤ I(A : B|C) + I(A : B|D) + I(C : D).    (11)

In terms of dimensions of subspaces, Ingleton's inequality says that

dim A + dim B + dim(C + D) + dim(A + B + C) + dim(A + B + D) ≤
≤ dim(A + B) + dim(A + C) + dim(B + C) + dim(A + D) + dim(B + D).    (12)

To prove Ingleton's inequality one may interpret I(A : B) as the dimension of the intersection A ∩ B, and I(A : B|C) as the dimension of the intersection of A/C and B/C (i.e., A and B factored over C). See also section 6, where Ingleton's inequality is proved as a consequence of Theorem 9.
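Inequality (12) is also easy to test experimentally for linear subspaces: pick random subspaces of GF(2)^n as spans of random bit vectors and compare the two sides. A throwaway sketch (rank computed by Gaussian elimination over GF(2); all parameters arbitrary):

```python
from random import getrandbits

def rank_gf2(rows):
    """Rank over GF(2) of a list of bitmask rows."""
    rows, r = list(rows), 0
    for bit in reversed(range(max(rows, default=0).bit_length())):
        pivot = next((i for i in range(r, len(rows)) if rows[i] >> bit & 1), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i] >> bit & 1:
                rows[i] ^= rows[r]
        r += 1
    return r

def dim_sum(*spaces):
    """Dimension of the sum of subspaces, each given by a list of spanning bitmasks."""
    return rank_gf2([v for s in spaces for v in s])

n, k = 8, 3                          # ambient dimension and spanning-set size (arbitrary)
for _ in range(1000):
    A, B, C, D = ([getrandbits(n) for _ in range(k)] for _ in range(4))
    lhs = dim_sum(A) + dim_sum(B) + dim_sum(C, D) + dim_sum(A, B, C) + dim_sum(A, B, D)
    rhs = (dim_sum(A, B) + dim_sum(A, C) + dim_sum(B, C)
           + dim_sum(A, D) + dim_sum(B, D))
    assert lhs <= rhs                # Ingleton's inequality (12)
```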

The following example shows that Ingleton's inequality is not always true for Shannon entropies.

Theorem 4 There exist four random variables α, β, γ and δ such that

I(α : β) > 0,
I(α : β | γ) = 0,
I(α : β | δ) = 0,
I(γ : δ) = 0.    (13)

In other terms, γ and δ are independent, and α and β are independent for any fixed values of γ and δ; however, α and β are dependent.

Proof of Theorem 4. Let the range of all four variables α, β, γ, δ be {0, 1}. Let γ and δ be independent and uniformly distributed. Any possible distribution of ⟨α, β⟩ is determined by four non-negative reals whose sum is 1 (i.e., by the probabilities of the four combinations), so such a distribution can be considered as a point in a three-dimensional simplex S in R^4. For any of the four possible values of γ and δ we have a point in S (whose coordinates are conditional probabilities). We denote these points by P00, P01, P10 and P11.

What are the conditions we need to satisfy? Let I be the subset of S that corresponds to independent random variables; I is a quadratic surface (the independence condition means that the determinant of the matrix of probabilities is equal to zero). The conditions I(α : β|γ) = 0 and I(α : β|δ) = 0 mean that the midpoints of the segments P00P01, P10P11, P00P10 and P01P11 belong to I. The inequality I(α : β) > 0 means that the point (P00 + P01 + P10 + P11)/4 does not belong to I. In other terms, we are looking for a parallelogram whose vertices lie on a quadratic surface but whose center does not, so almost any example will work. Fig. 4 shows one of them.

  γ = 0, δ = 0:          γ = 0, δ = 1:          γ = 1, δ = 0:          γ = 1, δ = 1:
       0     1                0     1                0     1                0     1
  0    0     0           0   1/8   3/8          0   1/8   3/8          0    1     0
  1    0     1           1   3/8   1/8          1   3/8   1/8          1    0     0

Figure 4: Conditional probability distributions for ⟨α, β⟩ (rows: values of α, columns: values of β).

It is easy to check that all four conditional distributions (for the conditions γ = 0, γ = 1, δ = 0, δ = 1) satisfy the independence requirement. However, the unconditional distribution of ⟨α, β⟩ is

        β = 0   β = 1
α = 0    5/16    3/16
α = 1    3/16    5/16

so α and β are dependent.
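The example of Fig. 4 can be verified mechanically: assemble the joint distribution of ⟨α, β, γ, δ⟩ from the conditional tables (γ and δ being independent fair coins) and evaluate the four quantities of Theorem 4. A small sketch, with helper names of our own choosing:

```python
from math import log2

cond = {                                   # Fig. 4: Pr[α=a, β=b | γ=g, δ=d] (rows a, columns b)
    (0, 0): [[0, 0], [0, 1]],
    (0, 1): [[1/8, 3/8], [3/8, 1/8]],
    (1, 0): [[1/8, 3/8], [3/8, 1/8]],
    (1, 1): [[1, 0], [0, 0]],
}
# γ and δ are independent fair coins, so each conditional table carries weight 1/4
joint = {(a, b, g, d): cond[(g, d)][a][b] / 4
         for g in (0, 1) for d in (0, 1) for a in (0, 1) for b in (0, 1)}

def H(idx):
    """Entropy of the marginal on the coordinates in idx (0=α, 1=β, 2=γ, 3=δ)."""
    m = {}
    for o, p in joint.items():
        key = tuple(o[i] for i in idx)
        m[key] = m.get(key, 0.0) + p
    return -sum(p * log2(p) for p in m.values() if p > 0)

def I(p, q, r=()):
    """Conditional mutual information I(p : q | r) via unconditional entropies."""
    return H(p + r) + H(q + r) - H(p + q + r) - H(r)

print(round(I((0,), (1,)), 6))             # I(α : β)      > 0  (about 0.046 bits)
print(round(I((0,), (1,), (2,)), 6))       # I(α : β | γ)  = 0
print(round(I((0,), (1,), (3,)), 6))       # I(α : β | δ)  = 0
print(round(I((2,), (3,)), 6))             # I(γ : δ)      = 0
```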

A simpler example, though not so symmetric, can be obtained as follows. Let γ and δ be independent random variables with range {0, 1} and uniform distribution, and let α = γ(1 − δ) and β = δ(1 − γ). For any fixed value of γ or δ one of the variables α and β is equal to 0, therefore they are independent. However, α and β are not (unconditionally) independent, since each of them can be equal to 1, but they cannot be equal to 1 simultaneously. (End of proof.)

We see that for n = 4 not all the inequalities valid for ranks are valid for entropies, so the rank and entropy cases should be considered separately. For ranks we have the complete answer:

Theorem 5 For n = 4, all the inequalities that are valid for ranks are consequences (positive linear combinations) of basic inequalities and Ingleton-type inequalities (i.e., inequalities obtained from Ingleton's inequality by permutations of variables).

For entropies we do not know the answer. The only thing we know is the following conditional result. Let x =ε y mean that |x − y| < ε.

Theorem 6 If for any ε > 0 there exist random variables α, β, γ and δ such that

H(α) =ε H(β) =ε H(γ) =ε H(δ) =ε 2,
H(⟨α, β⟩) =ε H(⟨α, γ⟩) =ε H(⟨α, δ⟩) =ε H(⟨β, γ⟩) =ε H(⟨β, δ⟩) =ε 3,
H(⟨γ, δ⟩) =ε 4,
H(⟨α, β, γ⟩) =ε H(⟨α, β, δ⟩) =ε H(⟨α, γ, δ⟩) =ε H(⟨β, γ, δ⟩) =ε 4,
H(⟨α, β, γ, δ⟩) =ε 4,

then all the linear inequalities that are valid for Shannon entropies are consequences (positive linear combinations) of basic inequalities.

The proofs of Theorems 5 and 6 require a fairly long computation (it can be performed by hand or using appropriate software). As before, consider the cone C ⊂ R^15 that consists of all the points satisfying the basic inequalities (for Theorem 6). Its dual cone C* contains all non-negative combinations of basic inequalities. Compute all the extreme vectors of the cone C. If for every extreme vector we can find a quadruple of random variables whose vector of entropies is proportional to that extreme vector, we are done. It turns out that for all the extreme vectors but one this can be done easily; the only exception is the vector given in the statement of the theorem. Let us note that there are no α, β, γ and δ satisfying the equalities in the above theorem for ε = 0 (the proof will be presented in the full version of the paper). For the rank case we have more inequalities, the cone is smaller, and the problematic extreme vectors disappear. (Of course, in this case we need to construct subspaces instead of random variables.) We may also ask which inequalities are valid for ranks in arbitrary matroids (see [8]). In this case the extreme vector mentioned in Theorem 6 is represented by the Vámos matroid (see [8]), so we get the following

Theorem 7 For n = 4, all the inequalities that are valid for ranks in arbitrary matroids are consequences (positive linear combinations) of basic inequalities.

6 One more inequality for entropies

In this section we present one more inequality for entropies and show how it can be used to prove Ingleton's inequality and the Gács–Körner result on common information.

Theorem 8 For any random variables α, β, γ, δ and ζ,

H(ζ) ≤ 2H(ζ|α) + 2H(ζ|β) + I(α : β|γ) + I(α : β|δ) + I(γ : δ).

Proof of Theorem 8. This inequality is a non-negative linear combination of basic inequalities. However, we present a proof that reflects the intuitive meaning of the inequality. As we have seen, Ingleton's inequality

I(α : β) ≤ I(α : β|γ) + I(α : β|δ) + I(γ : δ)

is not always true for entropies. However, if a binary string ζ has zero complexities K(ζ|α) and K(ζ|β), then

K(ζ) ≤ I(α : β|γ) + I(α : β|δ) + I(γ : δ).

Indeed, as we know from section 3, inequality (6),

K(ζ) ≤ K(ζ|γ) + K(ζ|δ) + I(γ : δ).

Now we use the conditional versions of the same inequality (6):

K(ζ|γ) ≤ K(ζ|⟨α, γ⟩) + K(ζ|⟨β, γ⟩) + I(α : β|γ),
K(ζ|δ) ≤ K(ζ|⟨α, δ⟩) + K(ζ|⟨β, δ⟩) + I(α : β|δ).

Recalling that K(ζ|⟨α, γ⟩) ≤ K(ζ|α), K(ζ|⟨β, γ⟩) ≤ K(ζ|β), etc., and combining the last three inequalities, we get the inequality of Theorem 8. (End of proof.)
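Being a combination of basic inequalities, the inequality of Theorem 8 must hold for every joint distribution; the sketch below (arbitrary helper names, random test distributions) spot-checks it on joint distributions of five binary variables.

```python
from itertools import product
from math import log2
from random import random

def entropy_of(joint, idx):
    """Entropy of the marginal of `joint` on the coordinates listed in idx."""
    m = {}
    for o, p in joint.items():
        key = tuple(o[i] for i in idx)
        m[key] = m.get(key, 0.0) + p
    return -sum(p * log2(p) for p in m.values() if p > 0)

outcomes = list(product((0, 1), repeat=5))       # coordinates: (ζ, α, β, γ, δ)
z, a, b, g, d = (0,), (1,), (2,), (3,), (4,)

for _ in range(200):
    w = [random() for _ in outcomes]
    total = sum(w)
    joint = {o: x / total for o, x in zip(outcomes, w)}
    H = lambda idx: entropy_of(joint, idx)
    I = lambda p, q, r=(): H(p + r) + H(q + r) - H(p + q + r) - H(r)

    lhs = H(z)
    rhs = (2 * (H(z + a) - H(a)) + 2 * (H(z + b) - H(b))   # 2H(ζ|α) + 2H(ζ|β)
           + I(a, b, g) + I(a, b, d) + I(g, d))            # I(α:β|γ) + I(α:β|δ) + I(γ:δ)
    assert lhs <= rhs + 1e-9                               # the inequality of Theorem 8
```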

We present two corollaries of this inequality. The first one is a generalization of Ingleton's inequality. We formulate this corollary for Shannon entropies; a similar result is true for Kolmogorov complexities.

Let us call a random variable ζ a common information for random variables α and β if

H(ζ|α) = 0,    H(ζ|β) = 0,    H(ζ) = I(α : β).

Theorem 9 Let α, β, γ and δ be random variables. If there exists a random variable ζ that is a common information for α and β, then Ingleton's inequality holds:

I(α : β) ≤ I(α : β|γ) + I(α : β|δ) + I(γ : δ).

The proof is easy: just apply Theorem 8 to the random variable ζ that is the common information of α and β.

Ingleton's inequality for ranks is a consequence of Theorem 9. Indeed, recall the proof of Theorem 2. In that proof, for each subspace X we considered a random variable φ|X that is the restriction of a random linear functional φ to X. If X and Y are two subspaces, the random variables φ|X and φ|Y have a common information, namely φ|Z where Z = X ∩ Y. Therefore, we may apply the inequality of Theorem 9. Now we understand the reason why Ingleton's inequality is true for ranks in linear spaces (though it is not true for general matroids, Shannon entropies or Kolmogorov complexities): there is an intersection operation on subspaces that extracts the common information!

The second corollary is an easy proof of one of the Gács–Körner [1] results on common information. Let a and b be two binary strings. We look for a binary string x that represents the common information in a and b in the following sense (cf. the definition for the case of Shannon entropies above): K(x|a) and K(x|b) are small and K(x) is close to I(a : b). (As we know from section 3, inequality (6), K(x) cannot exceed I(a : b) significantly if K(x|a) and K(x|b) are small.) Now we can read the Kolmogorov complexity version of the inequality of Theorem 8 in the following way: if for given a and b one can find c and d such that I(a : b|c), I(a : b|d) and I(c : d) are small, then any x with small K(x|a) and K(x|b) has small complexity. However, I(a : b) may still be significant, and in this case we get an example of two strings with significant mutual information but with no common information. Such an example can be constructed using Theorem 4.

Consider the two coins (random variables) α and β used in the proof of Theorem 4, see (13). Each coin has two equiprobable outcomes; α and β are dependent:

Pr[α = β] = 5/8,    Pr[α ≠ β] = 3/8.

Theorem 10 Consider the infinite sequence of independent trials ⟨αi, βi⟩ having this distribution. Let AN be the initial segment α1 α2 ... αN and BN be the initial segment β1 β2 ... βN. Then with probability 1 we have

I(AN : BN) = cN + o(N),

where c = I(α : β) > 0. At the same time the following is true: for any sequence XN of binary strings of length O(N) such that K(XN|AN) = o(N) and K(XN|BN) = o(N), the complexity K(XN) is small: K(XN) = o(N).

This result was proved (among others) in [1], but the proof is rather technical and long.

7 Questions

Many questions are still unsolved. Here are some of them:

• Is it true that all inequalities valid for Shannon entropies or Kolmogorov complexities are consequences of basic inequalities?

• Is it true that all inequalities valid for ranks are consequences of basic inequalities and Ingleton-type inequalities?

• What inequalities are true for ranks in arbitrary matroids? (For n = 4 the answer is given by Theorem 7.)

• The proof of the Gács–Körner result given above works only if the probabilities are close enough to 1/2; we cannot use it directly if 3/8 and 5/8 are replaced, say, by 1/8 and 7/8. Is it possible to modify it and get a simple proof of the Gács–Körner result in the general case?

References

[1] P. Gács and J. Körner. Common information is far less than mutual information. Problems of Control and Information Theory, 2:149–162, 1973.

[2] D. Hammer and A. Shen. A strange application of Kolmogorov complexity. Mathematical Systems Theory, accepted for publication.

[3] A. W. Ingleton. Representation of matroids. In: D. J. A. Welsh, editor, Combinatorial Mathematics and its Applications, Academic Press, London, 1971, pp. 149–167.

[4] A. N. Kolmogorov. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1):1–7, 1965.

[5] A. N. Kolmogorov. Logical basis for information theory and probability theory. IEEE Transactions on Information Theory, IT-14(5):662–664, 1968.

[6] M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, 1993.

[7] V. A. Uspensky and A. Shen. Relations between varieties of Kolmogorov complexities. Mathematical Systems Theory, 29:271–292, 1996.

[8] D. J. A. Welsh. Matroid Theory. Academic Press, 1976.

[9] A. K. Zvonkin and L. A. Levin. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys, 25(6):83–124, 1970.
