## Coding Complexity: The Computational Complexity of Succinct ...


ISSN 0918-2802 Technical Report

Coding Complexity: The Computational Complexity of Succinct Descriptions

José L. Balcázar, Ricard Gavaldà, and Osamu Watanabe

TR96-0018 October

Department of Computer Science, Tokyo Institute of Technology, Ôokayama 2-12-1, Meguro, Tokyo 152, Japan

http://www.cs.titech.ac.jp/

© The author(s) of this report reserve all the rights.

Title: Coding Complexity: The Computational Complexity of Succinct Descriptions

Authors:
José L. Balcázar, Ricard Gavaldà
Dept. of Software (LSI), Univ. Politècnica de Catalunya, E-08028 Barcelona, Spain.
E-mail: fbalqui,[email protected]
Osamu Watanabe
Dept. of Computer Science, Tokyo Inst. of Tech., Meguro-ku Tokyo 152, Japan.
E-mail: [email protected]

Abstract. For a given set of strings, the problem of obtaining a succinct description becomes an important subject of research, related to several areas of theoretical computer science. In structural complexity theory, researchers have developed a reasonable framework for studying the complexity of these problems. In this paper, we survey how such investigation has proceeded, and explain the current status of our knowledge.

1 Introduction

The question "What is structural complexity theory?", or "Why do we study structural complexity theory?", has sometimes been asked, and it has been the source of interesting discussions. Attempting to answer this question, Book and Watanabe wrote an article [BW93] on structural complexity theory. There they illustrated, through examples, that structural complexity theory plays the following role: when studying some specific problem, one may establish some intuition regarding the difficulty of the problem compared with others. The role of structural complexity theory is to provide a framework and a theory for expressing such intuition formally and investigating it further. Here we present another set of examples supporting this thesis. We survey a structural study of the "coding complexity" of languages.

The problem of obtaining a succinct description for a given set of data arises in various fields of computer science. More specifically, we consider in this paper the problem of representing a given language, i.e., a set of strings. Here we regard a mechanism for recognizing or generating a language (e.g., an automaton, circuit, or grammar) as its description. That is, the problem is to compute, e.g., some (small) automaton accepting a given language. In this paper, we use "coding problem" to refer to this type of problem, and by "coding complexity" we mean the complexity of these coding problems.

Our coding problems are closely related to learning theory. Roughly speaking, a coding problem is the problem to be solved when designing a learning algorithm. The problem varies, however, depending on the learning framework chosen. In the PAC learning framework [Val84], a learning algorithm is essentially the same as an algorithm that computes a succinct description explaining a given set of examples. (Actually, the equivalence between learning and finding a compressed description of the examples can be stated and proved formally; see [BEHW87], [BP90].) Notice that a set of examples is given as input data. Hence, the coding problem becomes an optimization problem, and we can discuss its difficulty in the standard complexity theory framework; for example, we can argue for the difficulty of a given PAC learning problem by showing that the corresponding coding problem is NP-hard [PV88]. On the other hand, in the query learning framework [Ang87], information on a target language is obtained in the course of computation by asking queries to a teacher, a person or artifact who knows the target language, or enough information about it. Thus, some new framework is necessary for discussing the difficulty of such a learning process. We will see below that structural research on relative complexity and nonuniform complexity has provided us with one such framework and several important results, which should help us to investigate query learnability.¹

2 Preliminaries

We use standard notions and notation from formal language theory and complexity theory, and we review only some of the basic ones; see, e.g., [BDG88, Pap94] for the others. In this paper, a string is a binary string, i.e., an element of $\{0,1\}^*$, and by a language (or, more simply, a set) we mean a set of strings. For any set $A$ and any integer $n$, we use $A^{\le n}$ and $A^{=n}$ to denote the sets of strings in $A$ of length at most $n$ and exactly $n$, respectively. A subset of $0^*$ is called a tally set, and a set $A$ is called a sparse set if there exists a polynomial $p$ such that $\|A^{\le n}\| \le p(n)$ for all $n \ge 0$.

For measuring uniform complexity, we use a standard Turing machine model, and define complexity classes by using resource-bounded Turing machines. For example, P is the class of languages recognized by polynomial-time bounded Turing machines. We also consider Turing machine transducers for discussing the complexity of computing a function. For the sake of simplicity, we use the same notation for language and function classes. For example, by P we also mean the class of functions that are computable by polynomial-time Turing machine transducers, a class frequently denoted by PF or FP. For measuring relative complexity, we use oracle Turing machines, i.e., machines that ask queries to an oracle set. For example, for any set $A$, P($A$) is the class of languages recognized by polynomial-time bounded oracle Turing machines using $A$ as an oracle.

The boolean circuit model is also considered in order to measure nonuniform complexity. Here we consider combinatorial circuits, i.e., acyclic circuits consisting of AND, OR, NOT, input, and output gates. We assume that each circuit has only one output gate, and regard a circuit with $n$ input gates as an acceptor for strings of length $n$. When necessary, we assume that circuits are encoded in $\{0,1\}^*$ in some reasonable way.
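As a small illustration of the definitions of $A^{\le n}$, $A^{=n}$, tally sets, and the census bound for sparse sets, the following Python sketch works on finite samples of a set. The function names and the sample bound $p(n) = n + 1$ are assumptions made for this example, not notation from the paper; sparseness is an asymptotic property, so a finite check can only illustrate it.

```python
from typing import Callable, Iterable, Set


def restrict_at_most(a: Set[str], n: int) -> Set[str]:
    """Strings of the set of length at most n (the set A^{<=n})."""
    return {x for x in a if len(x) <= n}


def restrict_exactly(a: Set[str], n: int) -> Set[str]:
    """Strings of the set of length exactly n (the set A^{=n})."""
    return {x for x in a if len(x) == n}


def is_tally(a: Iterable[str]) -> bool:
    """True if every string uses only the symbol 0, i.e. the sample lies in 0^*."""
    return all(set(x) <= {"0"} for x in a)


def census_within(a: Set[str], p: Callable[[int], int], max_n: int) -> bool:
    """Check the census bound |A^{<=n}| <= p(n) for every n up to max_n."""
    return all(len(restrict_at_most(a, n)) <= p(n) for n in range(max_n + 1))


# A finite piece of a tally set, and all binary strings of length 3.
tally_sample = {"0" * n for n in (1, 2, 4, 8)}
dense_sample = {format(i, "03b") for i in range(8)}
```

Every tally set is sparse, since it contains at most $n + 1$ strings of length at most $n$; the sample bound used above reflects exactly that count.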
¹ The results obtained so far are not strong enough to yield really important results on query learnability; but based on these results, we hope, a comprehensive theory of query learnability will be established in the near future.

In structural complexity theory, various polynomial-time reducibilities have been defined for classifying polynomial-time bounded relativized computation. Among them, $\le^P_T$-reducibility and $\le^P_{tt}$-reducibility will be considered in this paper. For any sets $A$ and $B$, we say that $A$ is $\le^P_T$-reducible to $B$ if $A$ is accepted by some polynomial-time oracle Turing machine with $B$ as an oracle. On the other hand, $\le^P_{tt}$-reducibility is more restrictive. That is, we say that $A$ is $\le^P_{tt}$-reducible to $B$ if $A$ is accepted by some polynomial-time oracle Turing machine that asks only nonadaptive queries to $B$: this means simply that the list of queries to be asked is prepared before the corresponding answers are known, so that the queries do not depend on the answers to other queries. The case in which a single query determines the result to be output is known as $\le^P_m$-reducibility; it defines a polynomial-time computable function from the reduced set to the set it is reduced to, and likewise for their complements. If this function is bijective and its inverse is also polynomial-time computable, we call it a polynomial-time isomorphism.

We sometimes use Kolmogorov complexity. We provide only the very basic definitions concerning this notion; see the book by Li and Vitányi [LV93] for more information. Fix a universal machine $U$. For a binary string $x$, any string $y$ such that $U(y)$ stops and outputs $x$ is considered a description of $x$. The Kolmogorov complexity of $x$, denoted by K($x$), is defined as the length of the shortest description of $x$. A string $x$ is called Kolmogorov-random if K($x$) is at least the length of $x$ (except, maybe, up to some additive constant). It is well known that, for every length $n$, there are Kolmogorov-random strings of length $n$. On the other hand, if K($x$) is at most logarithmic in the length of $x$, then $x$ is informally called Kolmogorov-easy.
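The existence of Kolmogorov-random strings of every length follows from a standard counting argument, sketched here for the version of randomness without the additive constant: there are fewer candidate descriptions of length less than $n$ than there are strings of length $n$.

```latex
\[
  \bigl|\{\, y \in \{0,1\}^* : |y| < n \,\}\bigr|
  \;=\; \sum_{k=0}^{n-1} 2^{k}
  \;=\; 2^{n} - 1
  \;<\; 2^{n}
  \;=\; \bigl|\{0,1\}^{n}\bigr| .
\]
```

Each description $y$ yields at most one output $U(y)$, so at least one string $x$ of length $n$ has no description shorter than $n$, that is, K($x$) $\ge n$.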

3 Developing a Theory

We first explain a framework for discussing the "descriptional complexity" of languages. As explained above, we consider three similar ways to represent a given language: circuit, tally set, and sparse set representations.

Circuits and Reduction Classes

Let us start with circuits. It has been known (see, e.g., [Sav72]) that circuit size can be used as a computational complexity measure that is closely related to the standard time complexity of Turing machines. But Karp and Lipton [KL80] may have been the first to explicitly use the notion of circuit size as a nonuniform complexity measure. They also introduced the class P/poly for characterizing the class of languages having polynomial-size circuits. Though P/poly is defined in a different way, in this paper we use the following rather intuitive definition of P/poly. (See, e.g., [BDG88, Joh90].)

Definition 3.1. A set $A$ has polynomial-size circuits if there exist a polynomial $p$ and a family $\{C_n\}_{n \ge 0}$ of circuits such that for each $n \ge 0$, (i) $C_n$ takes a string $x$ of length $n$ as input and determines whether $x$ is in $A$, and (ii) the size of $C_n$ is bounded by $p(n)$. Let P/poly be the class of languages having polynomial-size circuits.

It has long been known (attributed to Meyer in [BH77]) that the class P/poly is characterized as the class of sets that are polynomial-time Turing reducible to sparse sets. Book and Ko [BK88] showed further characterizations by considering the following "reducibility classes".

Definition 3.2. [BK88] For any polynomial-time reduction type $r$, $R^P_r(\mathrm{TALLY})$ (resp., $R^P_r(\mathrm{SPARSE})$) is the class of sets that are reducible to some tally set (resp., sparse set) via some reduction of type $r$.

Proposition 3.3. [BK88] For any set $A$, the following five statements are equivalent. (1) $A$ is in P/poly, (2) $A$ is polynomial-time Turing reducible to some sparse set, (3) $A$ is polynomial-time truth-table reducible to some sparse set, (4) $A$ is polynomial-time Turing reducible to some tally set, and (5) $A$ is polynomial-time truth-table reducible to some tally set. That is, P/poly $= R^P_T(\mathrm{SPARSE}) = R^P_T(\mathrm{TALLY}) = R^P_{tt}(\mathrm{SPARSE}) = R^P_{tt}(\mathrm{TALLY})$.

They also presented a deep study of more restrictive reduction classes, with precise comparisons among all of them.

In this paper, we intuitively regard a family of circuits (resp., a tally or sparse set) as a "code". For example, for a given set $A \in$ P/poly, consider a family $\{C_n\}_{n \ge 0}$ of circuits recognizing $A$. Then each $C_n$ can be regarded as a code of the set $A^{=n} = \{x \in A : |x| = n\}$. In this case, a circuit evaluator, a machine computing, from $C_n$ and $x$, the value of $C_n$ on $x$, is considered as a "decoder". (Note that we may assume that this decoder runs in polynomial time; it is easy to design a circuit evaluator that runs in polynomial time w.r.t. the size of a given circuit.) Similarly, in the case that $A$ is reducible to a tally or sparse set $L$ via some polynomial-time oracle machine $M$ (i.e., $A = L(M^L)$), we can regard $L$ and $M$ as a "code" and its "decoder". That is, we would like to consider language coding systems that have polynomial-time decoders, and the above notions provide us with a framework for discussing "descriptional complexity" or "nonuniform complexity" in such language coding systems.
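The "decoder" role of a circuit evaluator can be made concrete with a minimal Python sketch. The gate-list representation (gates in topological order, output gate last) and the names `Gate`, `evaluate`, and `decode` are assumptions chosen for this illustration; the paper only requires some reasonable encoding of circuits.

```python
from typing import Dict, List, Tuple

# A gate is (kind, args). For "INPUT", args holds the index of the input bit;
# for "NOT", "AND", "OR", args holds indices of earlier gates in the list.
Gate = Tuple[str, Tuple[int, ...]]


def evaluate(circuit: List[Gate], x: str) -> bool:
    """Evaluate a circuit on the binary string x in one pass over its gates."""
    values: List[bool] = []
    for kind, args in circuit:
        if kind == "INPUT":
            values.append(x[args[0]] == "1")
        elif kind == "NOT":
            values.append(not values[args[0]])
        elif kind == "AND":
            values.append(all(values[i] for i in args))
        elif kind == "OR":
            values.append(any(values[i] for i in args))
        else:
            raise ValueError(f"unknown gate kind: {kind}")
    return values[-1]  # the last gate is the single output gate


def decode(family: Dict[int, List[Gate]], x: str) -> bool:
    """Decide membership of x using the circuit for its length as the code."""
    return evaluate(family[len(x)], x)


# Example code for a set of strings of length 2: exclusive-or of the two bits,
# written as (x0 OR x1) AND NOT (x0 AND x1).
xor2: List[Gate] = [
    ("INPUT", (0,)),
    ("INPUT", (1,)),
    ("OR", (0, 1)),
    ("AND", (0, 1)),
    ("NOT", (3,)),
    ("AND", (2, 4)),
]
```

The evaluator visits each gate once and each wire once, so its running time is polynomial in the size of the circuit it is given, which is the property the text relies on.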
Notice that in the above notions we do not consider "coding complexity", the complexity of computing a code from a given language, while we assumed that "decoding complexity" is polynomial. This is because coding complexity is not essential for discussing descriptional complexity or nonuniform complexity; the main issue there is whether there exists a succinct representation, and we do not need to worry about a way to obtain the representation. In fact, in structural complexity theory, whereas properties of sets such as being reducible to tally or sparse sets have been studied rather deeply, it seems that researchers have investigated somewhat less, and more implicitly than explicitly, "reverse complexity"²: the complexity of computing a circuit, tally set, or sparse set representation from a given language.

² This term is attributed to Ron Book.

Before explaining such investigations, let us define some notions for discussing "coding complexity". For a tally set or sparse set representation, coding complexity is just the complexity of that tally or sparse set relative to the original set. For any set $A$ and any relativizable complexity class $\mathcal{C}$, if there is some tally or sparse set $L$ such that $A \le^P_T L$ and $L \in \mathcal{C}(A)$, then we may consider that $A$ has a tally or sparse set representation that is $\mathcal{C}(A)$-computable (relative to $A$). In this case, we say that $A$ has a $\mathcal{C}(A)$-self-computable³ tally or sparse set (representation). On the other hand, we say that $A$ has $\mathcal{C}(A)$-self-computable⁴ circuits if there is some oracle $\mathcal{C}$-transducer $M$ such that $M^A(0^n)$ computes $C_n$ for every $n \ge 0$. (This is a generalization of the notion of "self-producible circuits" introduced in [Ko85], which is, in our notation, equivalent to P($A$)-self-computable circuits.)

Lowness

One of the important questions in structural complexity theory is to relate nonuniform complexity to standard computational complexity. Research on this question started with the fundamental paper [KL80] of Karp and Lipton, where they proved some uniform complexity consequences of nonuniform upper bounds. The following theorem stands out among them.

Theorem 3.4. [KL80] If NP $\subseteq$ P/poly, then the polynomial-time hierarchy collapses to $\Sigma^P_2 \cap \Pi^P_2$.

This line of research is closely related to our coding complexity. In fact, though it was quite implicit in [KL80], the key to proving the above theorem is the fact that there is some upper bound on coding complexity⁵. This upper bound can be stated as follows. (The proof of the following theorem is not explicitly stated in the literature, but it is derived from the proof of Theorem 3.4 and, more directly, from the proofs in [KS85].)

Theorem 3.5. Every NP-complete set in P/poly has a tally set representation that is computable in $\Delta^P_3$. Similarly, it has polynomial-size circuits computable in $\Delta^P_3$.

Remark. In general, the theorem holds for any "self-reducible" set $A$ in P/poly (see, e.g., [BDG88, KS85, Sch86]). Due to the self-reducibility of $A$, it is not necessary to use $A$ for computing its tally set (resp., circuit) representation. Thus, the above class $\Delta^P_3$ is the absolute one, and is not the relativized class $\Delta^P_3(A)$.

³ The term "self" is used because if $A \le^P_T L$ and $L \in \mathcal{C}(A)$, then $L \in \mathcal{C}(L)$ for any reasonable complexity class $\mathcal{C}$.

⁴ Although we use "computable" for both sets and circuits, the notion of "computable" differs slightly in each context. It means "recognizable" for tally or sparse sets, and "producible" for circuits.

⁵ Precisely speaking, for showing the theorem, it is enough to consider the problem of checking whether a given circuit is correct instead of producing a circuit.

By Theorem 3.4, Karp and Lipton demonstrated some sort of qualitative upper bound for self-reducible sets in P/poly. In [Sch83], Schöning introduced the notion of "lowness" into structural complexity theory; this notion provides a formal way to discuss such qualitative upper bounds. Intuitively speaking, a set is low if it has low complexity when used as an oracle set. More specifically, Schöning defined the following lowness notion for NP sets. A set $A \in$ NP is $\Sigma^P_k$-low if $\Sigma^P_k(A) = \Sigma^P_k$. That is, the oracle information in $A$ is useless in $\Sigma^P_k$-computation. (The notion of $\Delta^P_k$-lowness is defined similarly by using $\Delta^P_k$ classes instead of $\Sigma^P_k$ classes.) Notice that for any NP-complete set, e.g., SAT, $\Sigma^P_k(\mathrm{SAT}) = \Sigma^P_{k+1}$; hence, SAT cannot be $\Sigma^P_k$-low unless the polynomial-time hierarchy collapses to the $\Sigma^P_k$ level. In other words, the role of SAT in $\Sigma^P_k(\mathrm{SAT})$ cannot be replaced by any $\Sigma^P_k$-low set unless the polynomial-time hierarchy collapses to $\Sigma^P_k$. That is, the lowness level of a set indicates how far it is from NP-completeness.

Since the lowness notion was introduced into complexity theory [Sch83], the lowness of various sets has been investigated. The lowness notion itself has also been generalized in several ways. (Here we can explain only a limited number of results; the reader is referred to the survey paper [Kob95] on the study of low sets.) By looking through these results, one can see the close relation between showing lowness properties and obtaining succinct representations. For example, as shown above, every NP-complete set $A$ in P/poly has polynomial-size circuits that are computable in $\Delta^P_3$. Then we can show that $A$ is $\Delta^P_3$-low⁶. That is, $\Delta^P_3(A) = \Delta^P_3$, because any $\Delta^P_3(A)$-computation can be simulated by some $\Delta^P_3$-computation in the following way: first construct a circuit representation of $A$ (up to the necessary length). Then, using the obtained circuit in place of $A$, simulate the $\Delta^P_3(A)$-computation without using the oracle $A$. Clearly this computation can be done in $\Delta^P_3$.

Since Schöning's lowness is defined for NP sets, coding complexity is discussed in terms of nonrelativized complexity classes. Nevertheless, some ideas for computing circuit representations, e.g., the one stated in [KS85], can be used for any other sets. Furthermore, Balcázar, Book, and Schöning introduced the notion of "extended lowness" in order to discuss lowness properties of sets outside NP, and for studying this lowness we need to consider relativized complexity classes. For example, the following upper bound is immediate from the arguments in [BBS86, KS85].

Theorem 3.6. Every set $A$ in P/poly has a $\Delta^P_3(A)$-self-computable tally set. Similarly, it has $\Delta^P_3(A)$-self-computable circuits.

As we will see later, this upper bound was improved much later using results from computational learning theory.

⁶ For showing lowness properties, it is enough that we can guess a circuit and check its correctness, and this task can be done in $\Sigma^P_2$, which is lower than $\Delta^P_3$. From this fact, a stronger lowness property is provable; i.e., $A$ is $\Sigma^P_2$-low.
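The remark on SAT made earlier in this subsection can be spelled out in one line. The sketch below assumes $k \ge 1$ and uses only the standard facts that an NP-complete oracle can stand in for any NP oracle and that $\Sigma^P_k(\mathrm{NP}) = \Sigma^P_{k+1}$.

```latex
\[
  \Sigma^P_k(\mathrm{SAT}) \;=\; \Sigma^P_k(\mathrm{NP}) \;=\; \Sigma^P_{k+1},
  \qquad\text{hence}\qquad
  \Sigma^P_k(\mathrm{SAT}) = \Sigma^P_k
  \;\Longrightarrow\;
  \Sigma^P_{k+1} = \Sigma^P_k
  \;\Longrightarrow\;
  \mathrm{PH} = \Sigma^P_k .
\]
```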

Tally and Sparse Degrees

Now let us move to the second line of research related to coding complexity. It concerns the "equivalence classes" introduced by Book and Tang, which correspond to the degrees induced by polynomial-time reducibilities. As shown in Proposition 3.3, from the point of view of descriptional complexity, and modulo polynomial-time computations, there is essentially no difference between the three coding systems: circuits, tally sets, and sparse sets. Also, there is no difference between nonadaptive and adaptive ways of accessing information. It is, however, reasonable to expect some sort of difference between tally and sparse set representations. This and related issues were discussed at length at a workshop that Ron Book organized at UCSB by 1985, as well as the connections with Kolmogorov complexity and lowness. Shortly after, lowness of certain Kolmogorov-easy sets was established by Balcázar and Book, together with the first rudiments of the result, eventually due to Allender and Rubinstein, on the isomorphism degrees of tally sets described below. Afterwards, the first author visited Ker-I Ko, then at Houston⁷, where three things happened that are still in his memory: the very warm and friendly hospitality of Ker-I and his family, the pain of a wisdom tooth, and the sudden realization (at the time of going to bed) of the fact that circuit and tally set representations were so closely related. The precise statement of this last intuition was worked out as follows:

Theorem 3.7. [BB86] A set $A$ has P($A$)-self-computable circuits if and only if there exists a tally set $T$ that is polynomial-time Turing equivalent to $A$, i.e., $A \le^P_T T$ and $T \le^P_T A$.

Thus, for the topics we discuss here, from now on we focus on the differences between tally and sparse sets, and between nonadaptive and adaptive access. To clarify such differences, Book and Tang considered coding complexity, the complexity of obtaining tally or sparse set descriptions from a given P/poly language.
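The close relation between circuit and tally set representations expressed by Theorem 3.7 rests on a standard encoding idea, sketched here as an illustration and not as the proof given in [BB86]: the bits of an encoding of $C_n$ are stored in a tally set, one tally word per bit position, so that a polynomial-time machine can read $C_n$ back with membership queries. The Cantor pairing function, the end marker, and the function names are all assumptions made for this sketch.

```python
from typing import Callable, Dict, Set


def pair(n: int, i: int) -> int:
    """Cantor pairing: an injective map from pairs of naturals to naturals."""
    return (n + i) * (n + i + 1) // 2 + i


def tally_encode(codes: Dict[int, str]) -> Set[str]:
    """Store the bit strings codes[n] in (a finite part of) a tally set.

    Each code w is extended to w + "1" (an end marker), and the tally word
    0^pair(n, i) is put into the set exactly when bit i of w + "1" is 1.
    """
    tally: Set[str] = set()
    for n, w in codes.items():
        for i, bit in enumerate(w + "1"):
            if bit == "1":
                tally.add("0" * pair(n, i))
    return tally


def tally_decode(oracle: Callable[[str], bool], n: int, bound: int) -> str:
    """Recover codes[n] with bound + 1 nonadaptive membership queries.

    bound must be at least len(codes[n]); in the setting of the text a
    polynomial size bound p(n) on the circuit encoding plays this role.
    """
    bits = "".join(
        "1" if oracle("0" * pair(n, i)) else "0" for i in range(bound + 1)
    )
    # Everything before the last 1 (the end marker) is the stored code.
    return bits[: bits.rindex("1")]
```

Combined with a polynomial-time circuit evaluator, such a decoder shows how a set with polynomial-size circuits reduces to a tally set; the converse requirement of Theorem 3.7, that the tally set be computable from $A$, is exactly the coding complexity question.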
The motivation was the investigation of nonuniform complexity classes, subclasses of P/poly, in the light of the characterization given above of P($A$)-self-computable circuits. In [BB86], Balcázar and Book studied the classification of P/poly, and obtained, among others, the following result.

Theorem 3.8. [BB86] There is a set $S$ in P/poly (in fact, a sparse set) for which there is no tally set $T$ that is equivalent to $S$, i.e., $S \le^P_T T$ and $T \le^P_T S$. Thus, $S$ has no P($S$)-self-computable tally set.

That is, as a coding language, tally sets have a much simpler structure than sparse sets, and thus it is hard to compute a tally set representation. Motivated by this fact, Book and Tang considered "reverse reducibility" (i.e., coding complexity in our terminology), and proposed the following concept of "equivalence classes" for studying the difference between tally and sparse sets, and between various reduction types.

⁷ And his good friend Juan Romo, then at A&M at College Station.

Definition 3.9. [TB88] For any polynomial-time reduction type $r$, $E^P_r(\mathrm{TALLY})$ (resp., $E^P_r(\mathrm{SPARSE})$) is the class of languages that are equivalent to some tally set (resp., sparse set) via some reduction of type $r$.

By using these classes, they showed the following differences.

Theorem 3.10. [TB88] (1) $E^P_T(\mathrm{TALLY}) \neq E^P_T(\mathrm{SPARSE})$. (2) $E^P_{tt}(\mathrm{TALLY}) \neq E^P_{tt}(\mathrm{SPARSE})$. (3) $E^P_{tt}(\mathrm{TALLY}) \neq E^P_T(\mathrm{TALLY})$.

The difference between the classes $E^P_{tt}(\mathrm{SPARSE})$ and $E^P_T(\mathrm{SPARSE})$ was shown much later, when coding complexity was studied more explicitly, as described below.

Isomorphism Degrees

Book and Tang and their followers obtained many other classification results, just as the work of Book and Ko studied many other reduction classes. See, e.g., [AH92, AHOW91, AW90, TB88, TB91] for these results. We omit most of them here, but we will briefly mention what happens at the other end of the scale: the strongest degrees, defined by polynomial-time isomorphisms, applied to tally sets.

Indeed, tally strings are the most natural examples of words of low Kolmogorov complexity (specifically, logarithmic), closely followed by their images under string homomorphisms (such as 0101010101010101). Conversely, then, sets consisting only of words of logarithmic Kolmogorov complexity are the most natural generalization of tally sets, and one easily wonders which of the complexity-theoretic properties of tally sets are preserved under this generalization. The answer is: essentially, all of them. Building on a technically easier similar result in terms of semi-isomorphisms given in [BB86], Allender and Rubinstein found an interesting detour through the notion of rankability [GS91] and the intermediate step of the tally sets in P, which allowed them to prove:

Theorem 3.11. [AR88] A set contains only words of logarithmic, polynomial-time-bounded Kolmogorov complexity if and only if it is polynomially isomorphic to a tally set.

This means that these sets have descriptions that are, on the one hand, strictly restricted regarding syntax, since only tally words can be used, but on the other hand are just a renaming of the encoded set, thus preserving all properties that are invariant under polynomial-time isomorphism, including all their behaviors under polynomial-time reducibilities.

Between isomorphism and Turing equivalence, a rich structure of reduction concepts exists, and each reduction provides a degree structure that we should consider (from the point of view of coding complexity), restricted now to tally sets. It is easy to see that the isomorphism degree of a tally set differs from its equivalence degree, and we just mentioned the situation at the other end, near Turing equivalence. For many intermediate, finer reducibilities, the distinctness of the respective degrees is in fact a very deep question, equivalent or closely related to several other problems concerning the structure of polynomial-time and exponential-time classes. Specifically, consider the following purely complexity-theoretic working hypothesis: accepting computations for nondeterministic exponential-time machines can be constructed deterministically in exponential time. This has sometimes been called "hypothesis Q", and has been studied in depth in [AW90]. It is obviously stronger than the equality of deterministic and nondeterministic exponential time, but weaker than P = NP. More precisely, it is an intermediate step [IT89] of Sewelson's conjecture; that is, E = NE implies Q, and Q implies E = E$^{\mathrm{NP}}$, and Sewelson conjectured that E = NE implies E = E$^{\mathrm{NP}}$. Besides these, Q is related to other interesting questions. For instance, certain one-way functions exist if and only if Q fails to hold. A connection with coding complexity is as follows: Q turns out to be true if and only if all the many-one and bounded-truth-table degrees of tally sets coincide, and false if and only if all of them differ [AW90].

Logarithmic-Size Descriptions and Kolmogorov Complexity

Most of the results we have indicated, and many of the other related results from the same references that we have omitted, can be translated to a much more stringent condition on descriptions, namely logarithmic size. However, not all results are strict analogues of those in the polynomial case, and technically there are a number of differences in the arguments. A rather complete account can be found in the Ph.D. dissertation of Hermo [Her96]; here we only highlight some aspects.

The very concept of P/log, the logarithmic analog of P/poly, is somewhat unclear. Two alternative approaches that are equivalent for polynomial bounds are no longer so: do we want a description for $A^{=n}$ for each individual length $n$, as in [KL80], or a description for $A^{\le n}$? The first one is more natural in terms of circuits, but the resulting class is not closed under reductions (not even under isomorphisms), and thus is not amenable to an interesting complexity-theoretic analysis. Thus, the second one, which was introduced by Ko [Ko87], has to be chosen, and the circuit model adapted accordingly; the circuit expressions introduced in [WG94] turn out to be an adequate concept for this purpose.

A second problem is that logarithmic-size circuits are nonsense, because there is no way of devoting gates to more than a ridiculous fraction of the input. In [KL80] the proposed solution was to use circuits with easy descriptions, i.e., a sort of "description of the description" of the original set; their formalization, however, was eventually found to correspond to a class different from P/log [HM94]. An alternative formulation from [HM94], based on Kolmogorov complexity, is nevertheless successful in formalizing the intuition of [KL80], and works as well for circuit expressions to characterize the variant of Ko.

A curious, paradoxical-looking situation arises from the comparison of these logarithmically easy circuit expressions with similarly bounded circuits. Since each circuit is a circuit expression, one would expect the class defined by Kolmogorov-easy circuit expressions to be larger than that defined by Kolmogorov-easy circuits; but it turns out to be properly smaller! The explanation is that the condition on the Kolmogorov complexity constrains circuit expressions more stringently than circuits, due to the fact that circuit expressions are required to work for sets of the form $A^{\le n}$, whereas circuits are already satisfactory if they work for $A^{=n}$. See [BBH95] or [Her96] for details.

This brand of logarithmic advice also has characterizations in terms of "doubly tally" (here denoted tally2) sets: tally sets whose words are all of length a power of 2. Both in these characterizations and in the one with circuit expressions, there is a marked difference from the proofs in the polynomial case: the line of argument is nontrivial by comparison. The main technical ingredient is a way of choosing a subsequence of objects from a given sequence (e.g., a sequence of circuit expressions), properly spaced out (usually by a double exponential), and proving that the subsequence is short enough but still has the same descriptional power as the original sequence.

This gives rise to the study of reduction and equivalence classes of tally2 sets, as well as their closure under isomorphism as in the previous section: yet one more instance of coding complexity. These classes share many characteristics with the polynomial-bound case, and have some peculiarities of their own. A couple of open problems remain in the comparisons among all those reduction and equivalence classes.

4 Coding Complexity and Computational Learning

Though rather implicitly, coding complexity has been investigated in these two lines of research. Thus, when coding complexity was studied explicitly in relation to query learning, there was a rather firm intuitive background on which formal definitions and proofs could stand.

In order to analyze query learnability in structural complexity theory, Watanabe [Wat90] proposed a framework, which was later amended and extended by Watanabe and Gavaldà [Wat94, WG94]. They attempted to (i) characterize the power of query learning systems in terms of relativized complexity classes, and (ii) analyze the complexity of computing a description of a given target language, thereby demonstrating nonlearnability of some concept classes. Here we omit explaining the notions used in query learning and the Watanabe-Gavaldà framework. We also state only one result from [WG94] and omit others. See [Gav94, WG94] for the explanation and more results; we just hope our small probe is illustrative enough.

Theorem 4.1. [WG94] (1) If CIR, the class of concepts represented by circuits, is polynomial-time query learnable, then every set $A \in$ P/poly has $\Delta^P_2(A)$-self-computable circuits. (2) For some subclass of CIR, e.g., REPCIR, the converse relation also holds. That is, REPCIR is polynomial-time query learnable with subset and superset queries if and only if every set $A \in$ P/poly has $\Delta^P_2(A)$-self-computable circuits.

Thus, our coding complexity is quite important here; in particular, lower bounds on it are important for showing non-learnability results.

Upper and Lower Bounds on Coding Complexity

Motivated by all this, Gavaldà and Watanabe investigated lower bounds on coding complexity. Notice that the result of Balcázar and Book (i.e., Theorem 3.8) gives one lower bound. Gavaldà and Watanabe [GW93] improved it as follows.

Theorem 4.2. [GW93] There is a set $A$ in P/poly that has no (NP $\cap$ co-NP)($A$)-self-computable tally set. Thus, $A$ has no NPSV($A$)-self-computable circuits.

Remark. NPSV is the class of functions that are computable by a polynomial-time nondeterministic single-valued transducer, which can be regarded as a function version of the class NP $\cap$ co-NP.

By using the proof technique introduced for proving this theorem, the following separation, which was left open in [TB88], was also proved.

Theorem 4.3. [GW93] $E^P_{tt}(\mathrm{SPARSE}) \neq E^P_T(\mathrm{SPARSE})$.

The obtained lower bound NP $\cap$ co-NP is far from our goal $\Delta^P_2$. On the other hand, there is some evidence that this goal is quite difficult to achieve. After proving the NP $\cap$ co-NP lower bound, Gavaldà tried to improve the known upper bound, i.e., $\Delta^P_3$ (see Theorem 3.6). By using a "majority vote strategy", he obtained the following incomparable bound. (From here on, we state results only in terms of circuits, but the same or very similar results hold for tally sets.)

Theorem 4.4. [Gav95] Every set $A$ in P/poly has P(NP($A$) $\oplus$ $\Sigma^P_3$)-self-computable circuits.

Remark. In P(NP($A$) $\oplus$ $\Sigma^P_3$), the only relativized part is P(NP($A$)), and $\Sigma^P_3$ is an absolute class.

This was improved further by Köbler in the following way.

Theorem 4.5. [Kob94] Every set $A$ in P/poly has P(NP($A$) $\oplus$ $\Sigma^P_2$)-self-computable circuits.

It follows from Theorem 4.4 (and of course, Theorem 4.5) that if NP = co-NP (and thus $\Sigma_3^{P}$ = NP), then every set $A$ in P/poly has $\Delta_2^{P}(A)$-self-computable circuits. Thus, our goal (i.e., the $\Delta_2^{P}$ lower bound) implies NP $\neq$ co-NP. Therefore, it seems very hard to prove the $\Delta_2^{P}$ lower bound without any assumption. The problem of interest here is to prove this goal from some reasonable assumption.
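The "majority vote strategy" mentioned before Theorem 4.4 can be illustrated, in its simplest form, by the halving idea from query learning. The sketch below is only a toy: it assumes a finite hypothesis class that is listed explicitly and simulates the equivalence query by brute force over a small domain. It is not the construction of [Gav95], where the majority over exponentially many circuits has to be handled with the help of an oracle. All names (`majority_vote_learn`, `make_hypothesis`, and so on) are hypothetical.

```python
from itertools import product


def majority_vote_learn(hypotheses, target, domain):
    """Toy halving learner using simulated equivalence queries."""
    version_space = list(hypotheses)
    queries = 0
    while True:
        # Conjecture: pointwise strict majority vote of the surviving hypotheses.
        def conjecture(x, vs=tuple(version_space)):
            return 2 * sum(h(x) for h in vs) > len(vs)

        queries += 1
        # Simulated equivalence query: search the domain for a counterexample.
        counterexample = next(
            (x for x in domain if conjecture(x) != target(x)), None
        )
        if counterexample is None:
            return conjecture, queries
        # At least half of the survivors disagree with the target here; drop them.
        version_space = [
            h for h in version_space
            if h(counterexample) == target(counterexample)
        ]


# Hypothesis class: all 256 boolean functions on 3-bit inputs (truth tables).
domain = list(product([0, 1], repeat=3))


def make_hypothesis(table):
    lookup = dict(zip(domain, table))
    return lambda x: lookup[x]


hypotheses = [make_hypothesis(t) for t in product([0, 1], repeat=len(domain))]
target = hypotheses[178]  # an arbitrary member of the class
learned, queries = majority_vote_learn(hypotheses, target, domain)
print(queries)
```

Each counterexample removes at least half of the surviving hypotheses, so a class of N hypotheses needs at most floor(log2 N) + 1 equivalence queries, here at most 9.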

A BPP Lower Bound

The NP $\cap$ co-NP lower bound in Theorem 4.2 is a slight improvement over the long-known P lower bound. We present now another improvement that, as far as we know, was previously unpublished, namely, a BPP lower bound. The proof is based on an easy but useful-looking lemma about Kolmogorov complexity taken from [Gav92]; we are not aware of this lemma having been proved in the literature. Informally speaking, we define a popular string to be one that shows up very frequently as the output of some recursive function. The lemma quantifies the intuition that all popular strings must have low Kolmogorov complexity.

Lemma 4.6. (Popularity Lemma) Let $f : \{0,1\}^* \to \{0,1\}^*$ and $g : \mathbb{N} \to \mathbb{N}$ be total recursive functions. Define the popularity of $x$ (w.r.t. $f$ and $g$) as

$$\mathrm{Pop}_{f,g}(x) = |\{\, y \mid f(y) = x \wedge |y| \le g(|x|) \,\}|.$$

Then, for every $x$,

$$K(x) \le g(|x|) - \log \mathrm{Pop}_{f,g}(x) + O(\log(|x| + g(|x|))).$$

Proof. Notice first that, on the hypothesis that $f$ and $g$ are total and recursive, the popularity function is also total and recursive. Fix a length $n$ and consider the list $x_1$, $x_2$, ..., $x_{2^n}$ consisting of all words of length $n$ sorted in order of decreasing popularity; that is, $\mathrm{Pop}_{f,g}(x_1) \ge \mathrm{Pop}_{f,g}(x_2) \ge \dots \ge \mathrm{Pop}_{f,g}(x_{2^n})$. Ties are resolved, say, by lexicographical order. Let $p$ be a program that, on input $\langle n, i \rangle$, computes the popularity of every string of length $n$, sorts these strings according to their popularity, and prints the $i$th word in the resulting list. Then, our universal machine on input $\langle p, n, i \rangle$ outputs precisely $x_i$. We now estimate $|\langle p, n, i \rangle|$. Since every string of length $\le g(n)$ adds at most 1 to the popularity of one string, we have for every $i$

$$2^{g(n)+1} \ge \sum_{j=1}^{i} \mathrm{Pop}_{f,g}(x_j) \ge i \cdot \mathrm{Pop}_{f,g}(x_i).$$

Hence, $i \le 2^{g(n)+1} / \mathrm{Pop}_{f,g}(x_i)$, so $|i| \le g(n) + 1 - \log \mathrm{Pop}_{f,g}(x_i)$. Therefore, we can bound $K(x_i)$ by

$$\begin{aligned}
K(x_i) &\le |\langle p, n, i \rangle| \le |p| + |n| + |i| + O(\log(|n| + |i|)) \\
&= O(1) + \log n + g(n) - \log \mathrm{Pop}_{f,g}(x_i) + O(\log(n + g(n))) \\
&= g(n) - \log \mathrm{Pop}_{f,g}(x_i) + O(\log(n + g(n))).
\end{aligned}$$

This proves the lemma. □

It is easy to give a version of this lemma for resource-bounded Kolmogorov complexity. Simply take into account the resources needed to compute $f$ and $g$, and estimate those used by program $p$ in the proof.

The Popularity Lemma can be applied to fool BPP-type oracle computations in the following way: suppose that membership of a particular string in the oracle is "important" to determine whether a BPP machine accepts or rejects. Taking the string in or out of the oracle causes a large fraction of the randomized computations to change their result. Therefore, the string has to be queried in a large fraction of the computations. This causes the string to be very popular, so by the lemma it must have low Kolmogorov complexity. As there are few Kolmogorov-easy strings, there can be only few such "important" strings. This guarantees ample room to diagonalize.

Theorem 4.7. There is a set $S$ in P/poly (in fact, a sparse set) for which there is no tally set $T$ such that $S \le_T^{P} T$ and $T \in \mathrm{BPP}(S)$. Thus, $S$ has no $\mathrm{BPP}(S)$-self-computable tally set.

Proof. Choose a sequence $n_1$, $n_2$, ..., $n_i$, ... of natural numbers such that $n_{i+1} \ge 2^{n_i}$. For each $i$, let $x_i$ be a Kolmogorov-random string of length $n_i$. Define $S = \bigcup_i \{x_i\}$. Now assume that there is a tally set $T$ such that $S \le_T^{P} T$ and $T \in \mathrm{BPP}(S)$, and let machines $M_a$ and $M_b$, respectively, witness these facts. Assume without loss of generality that both $M_a$ and $M_b$ run in time exactly $p(n)$, and that $M_b$ accepts in a BPP way with threshold 3/4. From these assumptions, we will derive a contradiction with the fact that the strings $x_i$ are random.

Fix a large $i$ and let $S'$ be the finite oracle $S^{\le n_i - 1}$. We show that the string $x_i$ is queried not too often by $M_b(0^k)$. This is equivalent to showing that it does not appear too often as the result of the following function $f(\langle j, k, y \rangle)$: "the $j$th query of $M_b$ with input $0^k$ and oracle $S'$ using random bits $y$".
The function is defined only for those $\langle j, k, y \rangle$ such that $j \le p(k)$, $|y| \le p(k)$, and $k \le p(n_i)$. For all these and some constant $c$, $|\langle j, k, y \rangle| \le p(k) + c \log p(k)$. Taking $g(n) = p(n) + c \log p(n)$, by the Popularity Lemma, we have

$$\log \mathrm{Pop}_{f,g}(x_i) \le p(k) - K(x_i) + O(\log p(k)) \le p(k) - n_i + O(\log k) \le p(k) - 3$$

because $k \le p(n_i)$. This means that $\mathrm{Pop}_{f,g}(x_i) \le 2^{p(k)}/8$, or, in other words, at most a fraction 1/8 of the $2^{p(k)}$ computations of $M_b(0^k)$ query $x_i$. Note that only this small fraction of computations may change when $M_b(0^k)$ uses oracle $S$ instead of $S'$: $S$ and $S'$ differ only on the string $x_i$ and on strings too long to be reached by $M_b(0^k)$ (if $i$ is large enough). Since the acceptance threshold is 3/4, changing at most 1/8 of the computations cannot move the acceptance probability across 1/2. Therefore, as a probabilistic machine, $M_b(0^k)$ gives the same result with both oracles $S$ and $S'$.

This gives a recursive way of reconstructing $T$ up to length $p(n_i)$, given only $n_i$, $M_b$, and the list of words in $S'$: it is enough to run $M_b^{S'}(0^k)$ for every $k \le p(n_i)$. Once this part of $T$ is obtained, the reduction given by $M_a$ can be used to find the word $x_i$, by exhaustive search. All in all, we have a description of $x_i$ in terms of $n_i$, $M_a$, $M_b$, and $S'$. The bit-length of this description is much less than $n_i$ for $i$ sufficiently large. This contradicts the randomness of $x_i$. □

As we will see next, it is not possible to extend this BPP lower bound to the presence of an NP oracle.
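The counting step in the proof of the Popularity Lemma (Lemma 4.6) can be checked numerically on a toy instance. The sketch below assumes a small hypothetical total function `toy_f` (the number of 1s in the input, modulo 8, written with 3 bits) and a hypothetical bound `toy_g(n) = n + 2`; neither comes from the paper. It ranks the strings of length $n$ by decreasing popularity and verifies $i \cdot \mathrm{Pop}_{f,g}(x_i) \le 2^{g(n)+1}$ for every rank $i$, which is the inequality that bounds the length of the index $i$. It says nothing about Kolmogorov complexity itself, which is not computable.

```python
from itertools import product


def strings_up_to(max_len):
    """Yield every binary string of length 0..max_len."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def popularity_ranking(f, g, n):
    """Rank the strings of length n by decreasing popularity, ties lexicographic."""
    pop = {"".join(bits): 0 for bits in product("01", repeat=n)}
    for y in strings_up_to(g(n)):
        x = f(y)
        if x in pop:  # only outputs of length n contribute
            pop[x] += 1
    ranked = sorted(pop, key=lambda x: (-pop[x], x))
    return ranked, pop


def toy_f(y):
    # Hypothetical total function: number of 1s in y, mod 8, as a 3-bit string.
    return format(y.count("1") % 8, "03b")


def toy_g(n):
    # Hypothetical length bound g(n) = n + 2.
    return n + 2


n = 3
ranked, pop = popularity_ranking(toy_f, toy_g, n)
bound = 2 ** (toy_g(n) + 1)
for i, x in enumerate(ranked, start=1):
    # Counting step of the proof: i * Pop(x_i) <= 2^(g(n)+1).
    assert i * pop[x] <= bound
    print(i, x, pop[x])
```

The inequality holds for any choice of `toy_f` and `toy_g`, because the popularities of the first $i$ strings in the ranking are each at least $\mathrm{Pop}_{f,g}(x_i)$ and together count at most all inputs of length $\le g(n)$.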

The Interplay with Learning Theory

Recently, yet further improvement has been obtained, this time from computational learning theory. Bshouty, Cleve, Kannan, and Tamon [BCKT94] studied query learnability of CIR when learners are provided some additional computational power, such as access to an NP oracle. By using a clever implementation of the "majority vote strategy", they showed that CIR is randomized polynomial-time query learnable by using equivalence queries together with an NP oracle.

This result on the complexity of learning CIR is interesting for two main reasons. First, circuits are powerful enough to encode efficiently any representation of boolean functions that is interesting from the point of view of learning theory. More precisely, equivalence queries (and other types of queries) with any sensible representation scheme can be replaced by queries with boolean circuits. Second, it is known that CIR is not polynomial-time learnable under widely believed cryptographic hypotheses ([Val84], see also [KV94]). The upper bound in [BCKT94] complements, up to some point, this cryptographic lower bound.

In our framework, the learning result in [BCKT94] can be interpreted in the following way.

Theorem 4.8. [BCKT94] Every set $A$ in P/poly has $\mathrm{ZPP}(\mathrm{NP}(A))$-self-computable circuits.

Remark. ZPP is the class of sets (in this case, functions) that are computable by some randomized machine with no error and within expected polynomial time.

This result, in turn, gives the following improvements of Theorem 3.5 and Theorem 3.4.

Theorem 4.9. [KW95] Every NP-complete set in P/poly has polynomial-size circuits computable in ZPP(NP).

Theorem 4.10. [KW95] If NP $\subseteq$ P/poly, then the polynomial-time hierarchy collapses to ZPP(NP).

To finish, let us mention very briefly that the case of logarithmically long descriptions treated in section 3 also has some relationship to learning theory. Consider Kolmogorov-easy circuit expressions that characterize the logarithmic nonuniform class: they are learnable in the presence of an NP oracle, by means of membership queries to the set to be described. Furthermore, the sets whose descriptions in these same terms can be found with queries to the set can be characterized by the polynomial time Turing degrees of tally2 sets (see [BBH95] or [Her96]). Finally, it has been proved [BBu96] that Kolmogorov-easy circuit expressions can be learned in polynomial time via membership queries (without the help of any additional oracle) if and only if the purely complexity-theoretic hypothesis Q, described in a previous section, holds: a learning algorithm exists for these expressions if and only if accepting computations for nondeterministic exponential time machines can be constructed deterministically in exponential time. As a by-product of the proof, it is proved there as well that the access to NP can be reduced to nonadaptive in the learning algorithm of [BBH95] if and only if the hypothesis Q is equivalent to the equality E = NE.

Acknowledgments

We would like to thank Ker-I Ko and Ding-Zhu Du for the task of preparing this festschrift volume, and for inviting us to participate in it. Osamu also thanks Josep Díaz for inviting him to Barcelona in 1990 and giving him a chance to start working with excellent researchers of Barcelona on the topics explained in this paper. But everything had started when Ron Book invited José and then Osamu to Santa Barbara.

References

[AH92] Allender E, Hemachandra L. Lower bounds for the low hierarchy. Journal of the ACM 1992;39:234-251.

[AHOW91] Allender E, Hemachandra L, Ogiwara M, Watanabe O. Relating equivalence and reducibility to sparse sets. SIAM Journal on Computing 1992;21:521-539.

[AR88] Allender E, Rubinstein R. P-printable sets. SIAM Journal on Computing 1988;17:1193-1202.

[AW90] Allender E, Watanabe O. Kolmogorov complexity and degrees of tally sets. Information and Computation 1990;86:160-178.

[Ang87] Angluin D. Learning regular sets from queries and counterexamples. Information and Computation 1987;75:87-106.

[Ang87] Angluin D. Queries and concept learning. Machine Learning 1988;2:319-342.

[BB86] Balcázar J, Book R. Sets with small generalized Kolmogorov complexity. Acta Informatica 1986;23:679-688.

[BBS86] Balcázar J, Book R, Schöning U. Sparse sets, lowness and highness. SIAM Journal on Computing 1986;15:679-688.

[BBu96] Balcázar J, Buhrman H. Characterizing the learnability of Kolmogorov-easy circuit expressions. In preparation. Preliminary version as report LSI-95-59-R.

[BBH95] Balcázar J, Buhrman H, Hermo M. Learnability of Kolmogorov-easy circuit expressions. Proc. Second European Conference on Computational Learning Theory, Lecture Notes in Computer Science, Springer-Verlag 1995;904:112-124.

[BDG88] Balcázar J, Díaz J, Gabarró J. Structural Complexity I, Springer-Verlag 1988.

[BH77] Berman L, Hartmanis J. On isomorphisms and density of NP and other complete sets. SIAM Journal on Computing 1977;6:305-322.

[BEHW87] Blumer A, Ehrenfeucht A, Haussler D, Warmuth M. Occam's razor. Information Processing Letters 1987;24:377-380.

[BP90] Board R, Pitt L. On the necessity of Occam algorithms. Proc. 22nd ACM Symposium on Theory of Computing, ACM Press 1990:54-63.

[BK88] Book R, Ko K. On sets truth-table reducible to sparse sets. SIAM Journal on Computing 1988;17:903-919.

[BW93] Book R, Watanabe O. A view of structural complexity theory. Current Trends in Theoretical Computer Science (G. Rozenberg and A. Salomaa, Eds.), World Scientific 1993:451-468.

[BCKT94] Bshouty N, Cleve R, Kannan S, Tamon C. Oracles and queries that are sufficient for exact learning. Proc. 7th ACM Conference on Computational Learning Theory, ACM Press 1994:130-139. See also the journal version [BCGKT95].

[BCGKT95] Bshouty N, Cleve R, Gavaldà R, Kannan S, Tamon C. Oracles and queries that are sufficient for exact learning. Journal of Computer and System Sciences 1996;52:421-433.

[Gav92] Gavaldà R. Kolmogorov Randomness and its Applications to Structural Complexity Theory. Doctoral dissertation, LSI Department, Universitat Politècnica de Catalunya, Barcelona, April 1992.

[Gav94] Gavaldà R. The complexity of learning with queries. Proc. 9th Structure in Complexity Theory Conference, IEEE 1994:324-337.

[Gav95] Gavaldà R. Bounding the complexity of advice functions. Journal of Computer and System Sciences 1995;50:468-475. Preliminary version in Proc. 7th Structure in Complexity Theory Conference, IEEE 1992:249-245.

[GW93] Gavaldà R, Watanabe O. On the computational complexity of small descriptions. SIAM Journal on Computing 1993;22:1257-1275.

[GS91] Goldberg A, Sipser M. Compression and ranking. SIAM Journal on Computing 1991;20:524-536.

[Her96] Hermo M. Nonuniform complexity classes with sub-linear advice functions. Doctoral dissertation, Universidad del País Vasco, Donostia, March 1996.

[HM94] Hermo M, Mayordomo E. A note on polynomial-size circuits with low resource-bounded Kolmogorov complexity. Mathematical Systems Theory 1994;27:347-356.

[IT89] Impagliazzo R, Tardos G. Decision versus search problems in super-polynomial time. Proc. 30th IEEE Symposium on Foundations of Computer Science, IEEE 1989:222-227.

[Joh90] Johnson D. A catalog of complexity classes. Handbook of Theoretical Computer Science (Van Leeuwen, Ed.), Elsevier 1990:67-161.

[Ko87] Ko K. On helping by robust oracle machines. Theoretical Computer Science 1987;52:15-36.

[KL80] Karp R, Lipton R. Some connections between nonuniform and uniform complexity classes. Proc. 12th ACM Symposium on Theory of Computing, ACM Press 1980:302-309.

[KV94] Kearns M, Valiant L. Cryptographic limitations on learning boolean formulae and finite automata. Journal of the ACM 1994;41:67-95.

[Ko85] Ko K. Continuous optimization problems and a polynomial hierarchy of real functions. Journal of Complexity 1985;1:210-231.

[KS85] Ko K, Schöning U. On circuit-size complexity and the low hierarchy. SIAM Journal on Computing 1985;14:41-51.

[Kob94] Köbler J. Locating P/poly optimally in the extended low hierarchy. Theoretical Computer Science 1994;134:263-285.

[Kob95] Köbler J. On the structure of low sets. Proc. 10th Structure in Complexity Theory Conference, IEEE 1995:246-261.

[KW95] Köbler J, Watanabe O. New collapse consequences of NP having small circuits. Proc. 22nd International Colloquium on Automata, Languages and Programming, Lecture Notes in Computer Science, Springer-Verlag 1995;944:196-207.

[LV93] Li M, Vitányi P. An Introduction to Kolmogorov Complexity and Its Applications. Texts and Monographs in Computer Science, Springer-Verlag 1993.

[PV88] Pitt L, Valiant L. Computational limitations on learning from examples. Journal of the ACM 1988;35:965-984.

[Sav72] Savage J. Computational work and time on finite machines. Journal of the ACM 1972;19:660-674.

[Sch83] Schöning U. A low and a high hierarchy in NP. Journal of Computer and System Sciences 1983;27:14-28.

[Sch86] Schöning U. Complexity and Structure, Lecture Notes in Computer Science, Springer-Verlag 1986;211.

[Pap94] Papadimitriou C. Computational Complexity, Addison-Wesley 1994.

[TB88] Tang S, Book R. Separating polynomial-time Turing and truth-table reductions by tally sets. Proc. 15th International Colloquium on Automata, Languages and Programming, Lecture Notes in Computer Science, Springer-Verlag 1988;317:591-599.

[TB91] Tang S, Book R. Reducibilities on tally and sparse sets. RAIRO Informatique Théorique et Applications 1991;25:293-302. (This is an extended version of [TB88].)

[Val84] Valiant L. A theory of the learnable. Communications of the ACM 1984;27:1134-1142.

[Wat90] Watanabe O. A formal study of learning via queries. Proc. 17th International Colloquium on Automata, Languages and Programming, Lecture Notes in Computer Science, Springer-Verlag 1990;443:139-152.

[Wat94] Watanabe O. A framework for polynomial time query learnability. Mathematical Systems Theory 1994;27:211-229.

[WG94] Watanabe O, Gavaldà R. Structural analysis of polynomial time query learnability. Mathematical Systems Theory 1994;27:231-256.