Kolmogorov Complexity in Combinatory Logic

John Tromp

March 17, 2002

Abstract

Intuitively, the amount of information in a string is the size of the shortest program that outputs the string. The first billion digits of π, for example, contain very little information, since they can be calculated by a C program of only a few lines. Although information content may seem highly dependent on the choice of programming language, the notion is actually invariant up to an additive constant. The theory of program-size complexity, which has become known as Kolmogorov complexity after one of its founding fathers, has found fruitful application in many fields such as combinatorics, algorithm analysis, machine learning, machine models, and logic. In this paper we propose a concrete definition of Kolmogorov complexity that is (arguably) as simple as possible, by defining a machine model based on the elegantly minimal Combinatory Logic, and exhibiting a universal machine.

1 Introduction

Our objective is to define, concretely, for any finite binary string x, a measure of the complexity of its description, C(x). This complexity is taken to be the length of a shortest 'program' p such that U(p) = x, for some fixed 'universal machine' U. We shall allow ourselves some freedom as to what a universal machine is, in the interest of obtaining as simple a definition as possible. This will lead us to consider models of computation other than the familiar Turing machines. Measures according to different universal machines will differ by at most an additive constant.

In [3], Levin stressed the importance of a measure which, when compared with other natural measures, yields small constants of at most a few hundred bits. His approach was based on constructive objects (c.o.'s), which are functions from and to lower-ranked c.o.'s. Levin stops short of exhibiting a specific universal machine, though, and the abstract, almost topological, nature of algorithms in the model complicates a study of the achievable constants.

The usual machine model specifies the form and operational behaviour of machines. It specifies how input is provided to the machine, how the machine can change state, when (if ever) the computation is considered to halt, and what the output is taken to be. In this way every machine computes some partial recursive function, and there are generally infinitely many other machines computing the same function.

It turns out to be very useful to consider models in which machines can't recognize the end of their input. Rather, they have to decide, based on what's been read so far, whether to read any more bits. In these so-called prefix machine models, only prefix-free partial recursive functions are computable: if the function is defined on some input x, it is necessarily undefined on any string of which x is a proper prefix. This forces programs to be self-delimiting and allows them to be concatenated with no loss of information.

A universal machine, intuitively, is one that can imitate any other machine (in a given machine model), given a description of it:

Definition 1 A machine U is said to be universal if for all machines W there exists a binary string w such that U(wx) = W(x) for all binary strings x. (Here, M(x) denotes the output of machine M on input x; should M fail to halt on input x, then M(x) takes the special value "undefined".)

Since w and x are simply concatenated without any separating markers, w needs to be self-delimiting so that U can locate its end. In effect, U must behave like a prefix machine while reading the machine description.

Since defining the information content of a string reduces to fixing a universal machine, our main objective is to find a machine model in which a universal machine is easily defined. For this purpose, Turing machines turn out to be less than desirable. The operating logic of a Turing machine, its finite control, is of a somewhat irregular nature, having no natural straightforward encoding into a bitstring, let alone a self-delimiting one. In his book [1], at the end of Chapter 2, Roger Penrose took up the challenge, exhibiting a universal Turing machine describable in no fewer than 5495 bits.

In [2], Gregory Chaitin paraphrases John McCarthy about his invention of LISP, as "This is a better universal Turing machine. Let's do recursive function theory that way!" Later, Chaitin continues with "So I've done that using LISP because LISP is simple enough, LISP is in the intersection between theoretical and practical programming. Lambda calculus is even simpler and more elegant than LISP, but it's unusable. Pure lambda calculus with combinators S and K, it's beautifully elegant, but you can't really run programs that way, they're too slow."

Since we're more concerned with simplicity of definition than with efficiency of execution, we will do just that: base a concrete definition of Kolmogorov complexity on combinators. What follows is a very brief exposition of combinatory logic, or CL for short (see [5] for a comprehensive treatment), which develops into a discussion of how to represent binary strings in it. Some familiarity with lambda calculus is assumed.


2 Combinatory Logic

The atomic combinators S and K are characterized by the contraction rules

    SXYZ = XZ(YZ)
    KXY  = X

Other combinators are constructed from these two by application. We assume that application of X on Y, written (XY), associates to the left, and omit parentheses accordingly. Thus, SXYZ should be interpreted as (((SX)Y)Z), and XZ(YZ) as (XZ)(YZ). Combinations are applicative terms that may contain variables x, y, z, ... in addition to S and K. The size of a combination is the number of occurrences of variables, S, and K in it. Syntactical identity is denoted ≡, while = denotes convertibility under contractions.

Perhaps surprisingly, S and K suffice to define any function definable in lambda calculus. For instance, the second-place selection function λxy.y is given by the combinator SK: SKxy = Ky(xy) = y. As another example, we can check that I ≡ SKz, for any z, gives the identity function λx.x: Ix ≡ SKzx = Kx(zx) = x. A more involved combinator is ∞ ≡ SSK(S(SSK)). When applied to anything, it leads to an infinite reduction (abbreviating S(SSK) to L):

    ∞x ≡ SSKLx = SL(KL)x = Lx(KLx) = LxL
       ≡ S(SSK)xL = SSKL(xL) ≡ ∞(xL) = ∞(xLL) = ∞(xLLL) ...

In our machine model, we want to consider every combinator to be a machine, capable of processing input bits and producing output bits. This entails representing an input binary string by a combinator to which the machine can be applied, and a way of interpreting the result, if possible, as an output binary string.
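As an illustration, and not part of the development proper, the conventions above can be transcribed into a small Haskell sketch; the names Term, App, #, size, selSecond, and inf below are our own.

    -- A sketch (ours): variable-free CL combinations as a Haskell datatype.
    data Term = S | K | App Term Term deriving (Eq, Show)

    -- Application associates to the left, as in the text: SXYZ = (((SX)Y)Z).
    infixl 9 #
    (#) :: Term -> Term -> Term
    (#) = App

    -- The size of a combination: its number of S and K occurrences.
    size :: Term -> Int
    size (App a b) = size a + size b
    size _         = 1

    -- Examples from the text: the selector SK and the diverging ∞ ≡ SSK(S(SSK)).
    selSecond, inf :: Term
    selSecond = S # K
    inf       = S # S # K # (S # (S # S # K))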

3 Computation in Combinatory Logic

So far we’ve presented CL as an equational theory. A computation in CL is a sequence of contractions, which can be considered the basic computational steps. A halting computation is one that ends in a normal form: a combinator in which no contractions are possible. It can be shown that two equal combinators have the same unique normal form, if any. Furthermore, this normal form can be reached by always contracting the leftmost S or K. This will be the basic machine cycle. When a machine halts, we can check if it is the normal form of a binary string, and if so, take that to be the output. 3

4 Booleans, pairs, and lists

Define

    0 ≡ K     (true)
    1 ≡ SK    (false)

Then BPQ, for B ∈ {0, 1}, represents "if B then P else Q". (Choosing 1 ≡ K would associate true with 1, but would fail to give the desired Bx0x1 = xB.)

The simplest way to pair two combinators x and y is to have something that, when applied to 0, gives x, and when applied to 1, gives y. What we want, then, is a combinator, call it P, that satisfies

    Pxyz = zxy,

which is satisfied by

    P ≡ S(S(KS)(S(KK)(S(KS)(S(S(KS)(SK))K))))(KK).

Lists are constructed in the usual fashion by repeated pairing. Suppose the empty list is represented by some combinator $. (In [5], the smallest list considered is a singleton [M], which is defined as M itself; this creates problems when dealing with lists of unknown length.) Then a (delimited) list [L0, L1, ..., Ln−1] is represented by PL0(PL1 ... (PLn−1 $) ...). The first element of a list L is L0, while L1 is the tail of L. The i'th element of a list L can thus be selected as L1ⁱ0 (i applications of 1, followed by 0).

This leaves the question of how to represent the empty list $. Since its only use is in being distinguishable from a non-empty list, which we assume to be of the form Pxy, it suffices to find representations of $ and of an empty-testing combinator, call it '?', that satisfy

    $?   = 0
    Pxy? = ?xy = 1.

The simplest way of achieving this is by defining $ = K0 and ? = K(K1). An undelimited list is obtained by replacing the terminating $ with the combinator ∞; any attempt to process the list beyond its end will then result in a diverging computation. This property will be used later in defining prefix complexity.

With any string s and combinator t, we associate the list of the bits of s (as booleans 0 and 1), terminated with t, and denote it (s : t). For instance, (011 : $) = P0(P1(P1$)), which has normal form S(SI(K0))(K(S(SI(K1))(K(S(SI(K1))(K$))))). It's easy to see whether a halted computation is in such a normal form, and to extract the binary string from it as the output.
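Continuing the Haskell sketch (Term and # as before; the names zero, one, pairC, dollar, and strTerm are ours), the term (s : t) can be built mechanically:

    -- Booleans, the pairing combinator P, and the list terminator $.
    zero, one, pairC, dollar :: Term
    zero   = K        -- 0 ≡ K  (true)
    one    = S # K    -- 1 ≡ SK (false)
    -- P ≡ S(S(KS)(S(KK)(S(KS)(S(S(KS)(SK))K))))(KK), satisfying Pxyz = zxy
    pairC  = S # (S # (K#S) # (S # (K#K) #
                 (S # (K#S) # (S # (S # (K#S) # (S#K)) # K)))) # (K#K)
    dollar = K # zero -- $ = K0

    -- (s : t): the bits of s as booleans, terminated by the combinator t.
    strTerm :: String -> Term -> Term
    strTerm s t = foldr (\c r -> pairC # (if c == '0' then zero else one) # r) t s
    -- e.g. strTerm "011" dollar builds P0(P1(P1$)) from the text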


5 The Universal Combinator

According to Definition 1, a universal combinator is one that can imitate any other combinator given a description of it as a prefix of its input. We first present a simple encoding of combinators into strings, and subsequently show how this can be decoded. By applying the decoded combinator to the remainder of the input, whether delimited or not, we obtain a universal combinator.

5.1 Encoding combinators as binary strings

Combinators have a wonderfully simple encoding as binary strings, one that's self-delimiting to boot: encode S as 00, K as 01, and application as 1. Formally, we define the encoding ⟨C⟩ of a combinator C as

    ⟨S⟩     ≡ 00
    ⟨K⟩     ≡ 01
    ⟨C0 C1⟩ ≡ 1⟨C0⟩⟨C1⟩

For instance, the combinator S(KSS), (S((KS)S)) in full, is encoded as 10011010000. The length of the encoding of a size-n combinator is thus 3n − 1.
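In the Haskell sketch (Term as before), this encoding is a three-line function:

    -- Section 5.1 code: S ↦ 00, K ↦ 01, application ↦ 1.
    encode :: Term -> String
    encode S         = "00"
    encode K         = "01"
    encode (App a b) = '1' : encode a ++ encode b
    -- e.g. encode (S # (K # S # S)) == "10011010000", matching the text,
    -- and length (encode c) == 3 * size c - 1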

5.2 Decoding a combinator encoding

It turns out that decoding is most conveniently done in Continuation Passing Style, that is, with a combinator V that satisfies

    V c(⟨C⟩ : t) = cCt.

The following combinator achieves just that:

    V c l = l(λa. a V0 V1)
    V0 l  = l(λa. c(aSK))
    V1    = V (λa.V (λb.c(ab)))

Given a continuation c and a list l, V examines the first bit of l and applies V0 or V1 to the remainder of the list accordingly. V0 examines the next bit and returns the continuation applied to either S or K. Thus, V c(00 : t) = V0(0 : t) = cSt and V c(01 : t) = V0(1 : t) = cKt. To witness the operation of V1:

    V c(1⟨C0⟩⟨C1⟩ : t) = V1(⟨C0⟩⟨C1⟩ : t)
                       = V (λa.V (λb.c(ab)))(⟨C0⟩⟨C1⟩ : t)
                       = (λa.V (λb.c(ab)))C0(⟨C1⟩ : t)
                       = V (λb.c(C0 b))(⟨C1⟩ : t)
                       = (λb.c(C0 b))C1 t
                       = c(C0 C1)t
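At the meta level, the self-delimiting decoding that V performs inside CL is plain recursive descent; a Haskell sketch (ours, mirroring V's behaviour rather than transliterating it) reads:

    -- Decode a self-delimiting combinator code, returning the combinator
    -- and the unread remainder, just as V c(⟨C⟩ : t) = cCt does.
    decode :: String -> Maybe (Term, String)
    decode ('0':'0':t) = Just (S, t)   -- the V0 branch, next bit 0
    decode ('0':'1':t) = Just (K, t)   -- the V0 branch, next bit 1
    decode ('1':t)     = do            -- the V1 branch: decode two
      (c0, t')  <- decode t            -- subterms in sequence, then
      (c1, t'') <- decode t'           -- apply one to the other
      Just (App c0 c1, t'')
    decode _           = Nothing       -- ill-formed or exhausted input

Here decode undoes encode exactly, leaving the unread suffix untouched.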

Finding a V satisfying these recursive definitions requires the use of a fixpoint operator Y with the property Yx = x(Yx). The two common definitions of Y are

    Y_Curry  ≡ λx.ωω,  ω ≡ λy.x(yy)
    Y_Turing ≡ ZZ,     Z ≡ λzx.x(zzx)

The first one, due to Curry, rates as the simplest known, but even the most clever translations of these definitions into combinators give sizes 18 and 20 respectively. In search of a simpler one, we arrived at Y ≡ W(λxy.x(Wyx)) with W ≡ λxy.xyx, or, as a combinator,

    Y ≡ SSK(S(K(SS(S(SSK))))K),

which we believe to be the shortest possible fixpoint combinator, of size 12. With the help of Y, we can take V ≡ YV′, where

    V′ v c l = l(λa. a V0 (v(λa.v(λb.c(ab))))).

Putting the pieces together, we form the universal combinator by supplying an identity continuation: U ≡ VI, which has the property that

    U(⟨C⟩ : t) ≡ VI(⟨C⟩ : t) = ICt = Ct.

At long last, we can define Kolmogorov complexity concretely.

Definition 2 The simple Kolmogorov complexity KS(x) of a string x is the length of a minimal string p such that U(p : $) = (x : $). The prefix Kolmogorov complexity KP(x) of a string x is the length of a minimal string p such that U(p : z) = P(x : $)z. (Producing the pair of x and the remaining, "unread", part of the input best captures the spirit of self-delimiting descriptions.)

To define Kolmogorov complexity conditional to a string y, we simply, after applying U to the input, apply the result to (y : $) in turn. (It's clear how this approach can be extended to an arbitrary number of conditional arguments.)


Definition 3 The simple Kolmogorov complexity KS(x|y) of a string x conditional to a string y is the length of a minimal string p such that U(p : $)(y : $) = (x : $). The prefix Kolmogorov complexity KP(x|y) of a string x conditional to a string y is the length of a minimal string p such that U(p : z)(y : $) = P(x : $)z.

The definitions also extend naturally to complexities of pairs of strings; for example, KS(x, y) is the length of a minimal string p such that U(p : $) = P(x : $)(y : $).

6 Conclusion

We’ve seen that combinatory logic makes an excellent vehicle for defining Kolmogorov complexity concretely. As a result we can prove such statements as Theorem 1 KS(x) ≤ l(x) + 8 KS(x|y) ≤ l(x) + 2 KP (x) ≤ 2l(x) + 297 KP (x|l(x)) ≤ l(x) + 599 KP (x) ≤ l(x) + 2l(l(x)) + 915 KP (x, y) ≤ KP (x) + KP (y) + 428 For instance, to prove the third result, note that a string s can be encoded in a self-delimiting way as 1s0 1s1 . . . 1sn−1 0 of length 2n + 1. This is decoded with combinator D defined by Dl = l(λa.a(P $)(λl.l(λaλl.Dl(λx.P (P ax))))), expressible as a size 99 combinator. A mere 272 bits, or 34 bytes, suffice to encode the universal combinator U; over 20 times smaller than Penrose’s—admittedly less optimized—universal Turing machine: 11111000001110010111000010011000001011100110010100110010101110010110 01011100101100110001010111001011001100101001100101110010110011000101 01110010110011000101110010101110000101110011001100010110100101011100 10110010101110010111000011001100101001100101010010100011010111000101

References

[1] R. Penrose, The Emperor's New Mind, Oxford University Press, 1989.

[2] G. Chaitin, An Invitation to Algorithmic Information Theory, DMTCS'96 Proceedings, Springer-Verlag, Singapore, 1997, pp. 1-23 (http://www.cs.auckland.ac.nz/CDMTCS/chaitin/inv.html).

[3] L. Levin, On a Concrete Method of Assigning Complexity Measures, Doklady Akademii Nauk SSSR, vol. 18(3), pp. 727-731, 1977.

[4] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Graduate Texts in Computer Science, second edition, Springer-Verlag, New York, 1997.

[5] H.P. Barendregt, The Lambda Calculus, its Syntax and Semantics, revised edition, North-Holland, Amsterdam, 1984.
