Classification Using Information*

William Gasarch† (University of Maryland)
Mark G. Pleszkoch‡ (IBM Corporation)
Frank Stephan§ (University of Heidelberg)
Mahendran Velauthapillai¶ (Georgetown University)

Abstract

Let $\mathcal{A}$ be a set of functions. A classifier for $\mathcal{A}$ is a way of telling, given a function $f$, if $f$ is in $\mathcal{A}$. We will define this notion formally. We will then modify our definition in three ways: (1) Allow the classifier to ask questions to an oracle $A$ (thus increasing the classifier's computational power). (2) Allow the classifier to ask questions about $f$ (thus increasing the classifier's information access). (3) Restrict the number of times the classifier can change its mind (thus decreasing the classifier's information access). By varying these parameters we will gain a better understanding of the contrast between computational power and informational access. We have determined exactly (1) which sets are classifiable (Theorem 3.6), (2) which sets are classifiable with queries to some oracle (Theorem 3.2), (3) which sets are classifiable with queries to some oracle and queries about $f$ (Theorem 5.2), and (4) which sets are classifiable with queries to some oracle, queries about $f$, and a bounded number of mindchanges (Theorem 5.2). The last two items involve the Borel hierarchy.

* The first, second and fourth author presented a preliminary version of this paper at the Conference on Algorithmic Learning Theory 1994 in Reinhardsbrunn [5].
† Dept. of C.S. and Inst. for Adv. Stud., University of Maryland, College Park, MD 20742, U.S.A. Supported in part by NSF grants CCR-8803641 and CCR-9020079 (email: [email protected]).
‡ IBM Corporation, Gaithersburg, MD 20879, U.S.A. (email: [email protected]).
§ Mathematical Institute of the University of Heidelberg, Im Neuenheimer Feld 294, 69121 Heidelberg, Germany, EU. Supported by the Deutsche Forschungsgemeinschaft (DFG) grants Me 672/4-2 and AM 60/9-1 (email: [email protected]).
¶ Dept. of C.S., Georgetown University, Washington, D.C., 20057, U.S.A. (email: [email protected]).

1 Introduction

Let $\mathrm{FS} = \{f \mid (\exists x)(\forall y)[y \ge x \Rightarrow f(y) = 0]\}$. (FS stands for 'finite support'.) If you were given $g(0), g(1), \ldots$ you could never classify $g$ with respect to FS, even in the limit. Even if you had access to $K$ (or some other oracle) you could not classify $g$. The barrier to your classification is not computational, but is instead informational. By contrast, assume you could ask existential questions about $g$ (each question below is the negation of an existential question, so it carries the same information). Initially guess NO ($g \notin \mathrm{FS}$). Then ask the following questions until you get an answer of YES (this might never happen).

$(\forall x \ge 0)[g(x) = 0]$?
$(\forall x \ge 1)[g(x) = 0]$?
$(\forall x \ge 2)[g(x) = 0]$?
...

If an answer of YES is given then guess YES ($g \in \mathrm{FS}$). You have successfully classified $g$ in the limit. We will see later that FS is dense and co-dense in the standard topology of the function space and that no dense and co-dense set is classifiable. Hence you really needed that additional information.

Let $\mathrm{UK} = \{f \mid (\forall x)[f(x) \le K(x)]\}$. (The 'UK' stands for 'Under K'.) If you were given $g(0), g(1), \ldots$ you could never classify $g$ with respect to UK, even in the limit. However, if you had access to $K$, then you could classify $g$ with respect to UK in the limit as follows: Guess YES until an $x$ is spotted such that $g(x) > K(x)$, at which point change the guess to NO (and never change your mind thereafter). Hence the barrier to classification is computational.

When a class of functions cannot be classified it may be for either computational or information-theoretic reasons. Information-theoretic means that not enough information is available to classify. This is pinned down by topology; for the rest of the paper we will use the mathematically precise word 'topological' rather than the intuitive word 'information-theoretical.'
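The two procedures just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the query answers for FS are supplied by a hypothetical function `tail_is_zero`, and `toy_bound` is a computable stand-in for $K$ (the real $K$ is not computable); neither name comes from the paper.

```python
from typing import Callable, Iterator, Sequence


def classify_fs(answer: Callable[[int], bool], rounds: int) -> Iterator[str]:
    """Guess NO until the question '(forall x >= n)[g(x) = 0]' is answered YES."""
    guess = "NO"
    for n in range(rounds):
        if guess == "NO" and answer(n):
            guess = "YES"  # the only mind change this procedure ever makes
        yield guess


def classify_uk(g_values: Sequence[int], bound: Callable[[int], int]) -> Iterator[str]:
    """Guess YES until some x with g(x) > bound(x) is seen, then NO forever."""
    guess = "YES"
    for x, value in enumerate(g_values):
        if value > bound(x):
            guess = "NO"
        yield guess


# Hypothetical teacher: g is nonzero exactly at positions 0, 1, 2.
def tail_is_zero(n: int) -> bool:
    return n >= 3


# Hypothetical computable stand-in for the oracle K.
def toy_bound(x: int) -> int:
    return x % 2


fs_guesses = list(classify_fs(tail_is_zero, 5))
uk_guesses = list(classify_uk([0, 1, 0, 0, 1, 0], toy_bound))
```

In the first procedure the extra power comes from the answers to the questions, in the second from the oracle that supplies the bound; in both cases the guess changes at most once.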

In the next section we define classification formally. We vary the amount of information the learner can access. To increase the model's ability to access information, we give it the ability to ask questions about the function. We also regulate the type of question by both restricting the query language and restricting the number of alternations of quantifiers a question can have. To decrease the model's ability to access information, we will bound the number of mindchanges it may make.

Carl Smith and Rolf Wiehagen [13] introduced a model of classification that is similar to the Gold model of learning [7]. The classifier $M$ sees longer and longer initial segments of the graph of a function $f$. At each segment it guesses either YES (for $f \in \mathcal{A}$) or NO (for $f \notin \mathcal{A}$). The guesses converge for each function $f$ to the value $M(f) \in \{\mathrm{YES}, \mathrm{NO}\}$. $M(f) = \mathrm{YES}$ means that $f \in \mathcal{A}$ and $M(f) = \mathrm{NO}$ means that $f \notin \mathcal{A}$. In this model the classifier is limited in both computing power and access to information. In particular the learner is limited to Turing computability and initial segments of the function to be classified. Shai Ben-David [1] and Kevin Kelly [8] studied the same topic, but did not consider limitations of computational power.

2 Definitions and Notations

In this section we formalize our notions. $\mathbb{N}$ denotes the set $\{0, 1, 2, \ldots\}$ of natural numbers; $\Sigma$ will denote a fixed set of symbols such that $\{0, 1\} \subseteq \Sigma \subseteq \mathbb{N}$. $\Sigma^*$ denotes the set of all finite sequences of symbols in $\Sigma$. $\Sigma^\omega$ denotes the set of all countably infinite sequences of symbols in $\Sigma$. If $\sigma \in \Sigma^*$ and $f \in \Sigma^* \cup \Sigma^\omega$ then $\sigma \preceq f$ means that $\sigma$ is a prefix of $f$. If $\sigma, \tau \in \Sigma^*$ then $\sigma\tau$ denotes their concatenation. We may use $\sigma \cdot \tau$ for clarity. Throughout this section $\mathcal{A}$ denotes a subset of $\Sigma^\omega$ and $\overline{\mathcal{A}}$ denotes its complement; it can't be confused with the topological closure operation since the closure operation is not used in this paper. $\#A$ denotes the cardinality of a set $A$.

Definition 2.1 A classifier is a recursive function $M : \Sigma^* \to \{\mathrm{YES}, \mathrm{NO}, \mathrm{DK}\}$ (DK stands for DON'T KNOW). Our intention is that $M$ is fed initial segments of some $f$ and eventually decides if it is in $\mathcal{A}$ or not. Let $f : \mathbb{N} \to \Sigma$ be a function. $M$ classifies $f$ with respect to $\mathcal{A}$ if (1) when $M$ is given initial segments of $f$ as input, the resultant sequence of answers converges (after some point there are no more mindchanges), (2) if $f \in \mathcal{A}$ then the sequence converges to YES, and (3) if $f \notin \mathcal{A}$ then the sequence converges to NO.

Note 2.2 In the above definition we restrict a classifier to be a recursive function that only has access to the function via initial segments. We will later allow classifiers to have access to oracles and/or be able to ask questions about the function. The type of classifier will be clear from context.

Definition 2.3 $M$ classifies $\mathcal{A}$ if, for every function $f$, $M$ classifies $f$ with respect to $\mathcal{A}$. The class DE is the collection of all sets $\mathcal{A}$ such that there exists a classifier $M$ that classifies $\mathcal{A}$ (DE stands for DEcision). We denote this by saying "$\mathcal{A} \in \mathrm{DE}$ via $M$." Formally $M$ is a function; however, we will often describe it as a process that continually receives values of $f$ (in order) and outputs conjectures. Such a description can clearly be restated in terms of $M$ being a function. The class $\mathrm{DE}[A]$ denotes decision relative to an oracle $A$ and $\mathrm{DE}[\mathrm{all}]$ is the union of all classes $\mathrm{DE}[A]$: $\mathcal{A} \in \mathrm{DE}[\mathrm{all}] \Leftrightarrow \mathcal{A} \in \mathrm{DE}[A]$ for some set $A \subseteq \mathbb{N}$. The class $\mathrm{DE}_c$ is the collection of all sets in DE that have classifiers that change their mind about each $f$ at most $c$ times. The initial change from DK to either YES or NO is not counted as a mindchange. We will mostly be concerned with $\mathrm{DE}[\mathrm{all}]$ since we wish to study how much information is needed independent of computational resources.
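As a small illustration of these definitions, the sketch below runs a classifier on the initial segments of a finite sample and counts mindchanges. The names `guesses_along`, `mind_changes` and `has_a_one` are illustrative choices, not notation from the paper, and the counting rule (ignore DK guesses, then count changes between consecutive committed guesses) is one reasonable reading of the convention that the initial change from DK is free.

```python
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[int]], str]


def guesses_along(m: Classifier, values: Sequence[int]) -> List[str]:
    """Guesses of m on the initial segments of lengths 0, 1, ..., len(values)."""
    return [m(values[:n]) for n in range(len(values) + 1)]


def mind_changes(guesses: Sequence[str]) -> int:
    """Count changes between consecutive non-DK guesses (DK -> YES/NO is free)."""
    committed = [g for g in guesses if g != "DK"]
    return sum(1 for a, b in zip(committed, committed[1:]) if a != b)


def has_a_one(segment: Sequence[int]) -> str:
    """Toy classifier for the set of functions that take the value 1 somewhere."""
    if len(segment) == 0:
        return "DK"
    return "YES" if 1 in segment else "NO"


example_guesses = guesses_along(has_a_one, [0, 0, 1, 0])
example_changes = mind_changes(example_guesses)
```

The toy classifier never returns from YES to NO, so it changes its mind at most once on every function.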

We now define classifiers that can make queries. This is analogous to the query inference machines defined by Gasarch and Smith [6].

Definition 2.4 A query language consists of the usual logical symbols (and equality), symbols for first order variables, symbols for every element of $\mathbb{N}$, symbols for some functions and relations on $\mathbb{N}$, and a special symbol $f$. A query language is denoted by the symbols for these functions and relations. A well-formed formula over $L$ is defined in the usual way.


Convention 2.5 Small letters are used for first order variables which range over $\mathbb{N}$. All questions are assumed to be sentences in prenex normal form (quantifiers followed by a quantifier-free formula, called the matrix of the formula) and questions containing quantifiers are assumed to begin with an existential quantifier. This convention entails no loss of generality. The special symbol $f$ will represent the function we are trying to classify.

Definition 2.6 Let $L$ be a query language. A query over $L$ is a formula $\phi(f)$ such that the following hold.
i. $\phi(f)$ uses symbols from $L$.
ii. $f$ is a free function variable and is the only free variable.
We think of a query $\phi(f)$ as asking a question about an as yet unspecified function $f$. If $f$ is a function then $\phi(f)$ will be either true or false.

Definition 2.7 Let $L$ be a query language. Informally, a classifier over $L$ (usually just 'classifier') is a total Turing machine that can ask questions about the recursive function $f$ in the language $L$ and, by using the answers to these questions, eventually outputs 0 or 1 in the limit. Formally a classifier is a total Turing machine $M$ which takes as input a string of bits $\sigma$ (the empty string is allowed), corresponding to the answers to previous queries about $f$, and outputs first one value $M(\sigma) \in \{\mathrm{YES}, \mathrm{NO}, \mathrm{DK}\}$ in order to indicate whether it at the moment guesses $f \in \mathcal{A}$ and second a new question $\psi(\sigma)$ in the language $L$. Our intention is that $M$ is conjecturing whether $f$ is in $\mathcal{A}$ or not and also generating the next question to ask about $f$. The definition of when $M$ classifies $f$ with respect to $\mathcal{A}$ is straightforward but tedious (it is analogous to the definition in [6]).
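The interface in Definition 2.7 can be mimicked as follows, reusing the FS example from the Introduction. This is a sketch under assumptions: a question is encoded by a single integer $n$ standing for "$(\forall x \ge n)[f(x) = 0]$", and the answers come from a hypothetical `teacher` function that knows the function being classified. The names `fs_machine` and `run_query_classifier` are illustrative only.

```python
from typing import Callable, List, Tuple


def fs_machine(answers: str) -> Tuple[str, int]:
    """Map the answer bits received so far to (current guess, next question).

    The returned integer n encodes the question '(forall x >= n)[f(x) = 0]'.
    """
    guess = "YES" if "1" in answers else "NO"
    return guess, len(answers)


def run_query_classifier(
    machine: Callable[[str], Tuple[str, int]],
    teacher: Callable[[int], bool],
    rounds: int,
) -> List[str]:
    """Feed the machine its own answer history and record its guesses."""
    answers, guesses = "", []
    for _ in range(rounds):
        guess, question = machine(answers)
        guesses.append(guess)
        answers += "1" if teacher(question) else "0"
    return guesses


# Hypothetical teacher for a function whose last nonzero value is at position 1.
def teacher(n: int) -> bool:
    return n >= 2


query_guesses = run_query_classifier(fs_machine, teacher, 4)
```

The machine itself is a total function on bit strings; all information about $f$ reaches it through the answer string.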

Definition 2.8 Let $L$ be a query language. The class $\mathrm{QDE}[L]$ is the collection of all sets $\mathcal{A}$ such that there exists a classifier that classifies $\mathcal{A}$ and only asks queries that use the symbols in $L$. We denote this by saying "$\mathcal{A} \in \mathrm{QDE}[L]$ via $M$." The class $\mathrm{QDE}_a[L]$ is the collection of all sets in $\mathrm{QDE}[L]$ that have classifiers that change their mind about each $f$ at most $a$ times. The initial change from DK to either YES or NO is not counted as a mindchange. Furthermore $\mathrm{QDE}[\mathrm{all}]$ is the union of all classes $\mathrm{QDE}[L]$ as $L$ goes over all possible query languages.


All the query languages that we will consider allow the use of quantifiers. Restricting the applications of quantifiers is a technique that we will use to regulate the expressive power of a language. Of concern to us is the alternations between blocks of existential and universal quantifiers.

Definition 2.9 Suppose that $f \in \mathrm{QDE}[L](M)$ for some $M$ and $L$. If $M$ only asks quantifier-free questions, then we will say that $f \in Q_0\mathrm{DE}[L](M)$. If $M$ only asks questions with existential quantifiers, then we will say that $f \in Q_1\mathrm{DE}[L](M)$. In general, if $M$'s questions begin with an existential quantifier and involve $a$ alternations between blocks of universal and existential quantifiers, then we say that $f \in Q_{a+1}\mathrm{DE}[L](M)$. The classes $Q_c\mathrm{DE}[L]$ and $Q_c\mathrm{DE}_b[L]$ are defined analogously.
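The index in Definition 2.9 depends only on the quantifier prefix of a question. As a hedged illustration, the helper below (a hypothetical name, `query_level`) reads a prefix written as a string over `E` (existential) and `A` (universal) and returns the number of quantifier blocks, which is 0 for a quantifier-free question and $a+1$ for a prefix with $a$ alternations.

```python
from itertools import groupby


def query_level(prefix: str) -> int:
    """Return k such that a prenex question with this prefix is a Q_k question."""
    if any(q not in "EA" for q in prefix):
        raise ValueError("prefix must consist of the letters E and A only")
    if prefix and prefix[0] != "E":
        raise ValueError("questions are assumed to begin with an existential quantifier")
    # Each maximal run of equal quantifiers is one block.
    return sum(1 for _ in groupby(prefix))
```

For instance, the prefix `EEA` has one alternation and therefore level 2.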

Note 2.10 We use the notations $\mathrm{DE}[A]$ and $\mathrm{QDE}[L]$. In the first case $A$ is a set which we ask questions to and in the second $L$ is a language we express questions in. Note that in $\mathrm{DE}[A]$ we are allowing more computational power to the inference device and in $\mathrm{QDE}[L]$ we are allowing greater access to information. One of the points of this paper will be to compare computational power to informational access.

3 Classification with Oracles

The class $\mathrm{DE}[\mathrm{all}]$ has various topological characterizations. In this section we present the main ones.

Definition 3.1 The following two topological spaces are useful.
i. $\mathcal{F}$ is the set of all functions from $\mathbb{N}$ to $\Sigma$. We place a topology on it by letting the basic open sets be $\mathcal{F}_\sigma = \{f \mid \sigma \preceq f\}$ where $\sigma$ ranges over $\Sigma^*$.
ii. $\mathcal{N}$ is the set $\mathbb{N}$. We place a topology on it by letting the basic open sets be $\mathbb{N}$, $\emptyset$, and all sets of the form $\{y \in \mathbb{N} \mid y \ge x\}$ with $x \in \mathbb{N}$.

Theorem 3.2 $\mathcal{A}$ is in $\mathrm{DE}[\mathrm{all}]$ iff there is a continuous function $F : \mathcal{F} \to \mathcal{N}$ such that $\mathcal{A} = \{f \mid F(f) \text{ is odd}\}$.

Proof: Recall that $F$ is continuous iff the inverse image of every open subset of $\mathcal{N}$ is an open set in $\mathcal{F}$. Let $M$ be a classifier which witnesses $\mathcal{A} \in \mathrm{DE}[A]$ for some oracle $A$. We can assume that $M(\emptyset) = \mathrm{NO}$. Now let $F(\sigma)$ denote the number of mindchanges on input $\sigma$; note that $F(\sigma)$ is even iff $M(\sigma) = \mathrm{NO}$. Classifying each function $f$, $M$ makes only finitely many mindchanges and thus $F(f) = \lim_{\sigma \preceq f} F(\sigma)$ exists for each function $f$. Now $f \in \mathcal{A}$ iff $M$ converges on $f$ to YES iff $M$ makes an odd number of mindchanges on $f$ iff $F(f)$ is odd. It remains to show that $F$ is a continuous function from $\mathcal{F}$ to $\mathcal{N}$. Let $y \in \mathbb{N}$, $U_y = \{f \mid F(f) \ge y\}$ and $f \in U_y$. There is a $\sigma \preceq f$ such that $F(\sigma) \ge F(f)$. By the definition of $F$, $F(\tau) \ge F(\sigma)$ for all $\tau \succeq \sigma$ and $F(g) \ge F(\tau) \ge y$ for all $g \succeq \tau$. Thus the basic open set $\mathcal{F}_\sigma$ is contained in $U_y$; so $U_y$ is the union of basic open sets; therefore $U_y$ is open and $F$ is continuous.

For the other way round, let $F : \mathcal{F} \to \mathcal{N}$ be a continuous function and $\mathcal{A} = \{f \mid F(f) \text{ is odd}\}$. Now for each $\sigma$ let $F(\sigma) = \min\{F(f) \mid \sigma \preceq f\}$. $F(\sigma)$ is defined since the natural numbers are well-ordered. Let $y = F(f)$. Since $F$ is continuous there is a string $\sigma \preceq f$ such that $F(g) \ge y$ for all $g \succeq \sigma$. Therefore $F(\tau) = y$ for all $\tau$ with $\sigma \preceq \tau \preceq f$ and the classifier
\[ M(\tau) = \begin{cases} \mathrm{YES} & \text{if } F(\tau) \text{ is odd;} \\ \mathrm{NO} & \text{if } F(\tau) \text{ is even;} \end{cases} \]
decides $\mathcal{A}$: If $F(f)$ is even then $M$ converges on $f$ to NO and if $F(f)$ is odd then $M$ converges on $f$ to YES.
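The first half of the proof rests on the observation that, when $M$ starts with NO and only outputs YES or NO, the number of mindchanges on $\sigma$ is even exactly when $M(\sigma) = \mathrm{NO}$. The sketch below checks this on all short strings for one toy classifier over $\Sigma = \{0, 1, 2\}$; the classifier `one_but_no_two` and the helper `mind_change_count` are illustrative assumptions, not objects from the paper.

```python
from itertools import product
from typing import Sequence


def one_but_no_two(sigma: Sequence[int]) -> str:
    """Toy classifier: YES iff a 1 has been seen and no 2 has been seen."""
    return "YES" if (1 in sigma and 2 not in sigma) else "NO"


def mind_change_count(m, sigma: Sequence[int]) -> int:
    """F(sigma): number of guess changes of m along the prefixes of sigma."""
    guesses = [m(sigma[:n]) for n in range(len(sigma) + 1)]
    return sum(1 for a, b in zip(guesses, guesses[1:]) if a != b)


# Check 'F(sigma) is even iff M(sigma) = NO' on every string of length < 5.
parity_matches = all(
    (mind_change_count(one_but_no_two, sigma) % 2 == 0)
    == (one_but_no_two(sigma) == "NO")
    for length in range(5)
    for sigma in product((0, 1, 2), repeat=length)
)
```

The check succeeds for any two-valued classifier that outputs NO on the empty input, since every mindchange toggles the guess.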

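Corollary 3.3 Let $\mathcal{A} \subseteq \Sigma^\omega$. Assume $\mathcal{A} \in \mathrm{DE}[\mathrm{all}]$. Then (1) there is a $\sigma$ such that either $\mathcal{F}_\sigma \subseteq \mathcal{A}$ or $\mathcal{F}_\sigma \subseteq \overline{\mathcal{A}}$, and (2) the topological boundary $\partial\mathcal{A}$ is nowhere dense. Hence $\mathrm{FS} = \{f \mid (\forall^\infty x)[f(x) = 0]\} \notin \mathrm{DE}[\mathrm{all}]$.

Proof: Assume that $\mathcal{A} \in \mathrm{DE}[\mathrm{all}]$ witnessed by a continuous $F : \mathcal{F} \to \mathcal{N}$. Again extend $F$ onto the finite strings $\sigma \in \Sigma^*$ by $F(\sigma) = \min\{F(f) \mid \sigma \preceq f\}$. Let $\sigma_0 = \lambda$ (the empty string). As long as possible find an extension $\sigma_{n+1} \succeq \sigma_n$ with $F(\sigma_{n+1}) > F(\sigma_n)$. If this process never terminates, then $F(f) \ge F(\sigma_n) \ge n$ for the limit $f$ of all $\sigma_n$; but this contradicts the fact that $F(f) \in \mathbb{N}$. Therefore the process stops for some $n$. Now $F(\tau) = F(\sigma_n)$ for all $\tau \succeq \sigma_n$ and therefore $F(g) = F(\sigma_n)$ for all $g \succeq \sigma_n$. The basic open set $\mathcal{F}_{\sigma_n}$ either belongs to $\mathcal{A}$ or to $\overline{\mathcal{A}}$.

This construction indeed provides such a basic open set above any given string. Thus each string $\sigma$ is extended by some $\tau$ with either $\mathcal{F}_\tau \subseteq \mathcal{A}$ or $\mathcal{F}_\tau \subseteq \overline{\mathcal{A}}$. Thus $f \notin \partial\mathcal{A}$ for all $f \succeq \tau$ and $\partial\mathcal{A}$ is nowhere dense. Since $\partial\mathrm{FS} = \mathcal{F}$, $\mathrm{FS} \notin \mathrm{DE}[\mathrm{all}]$. To see that every $f : \mathbb{N} \to \Sigma$ is in $\partial\mathrm{FS}$, note that for each $\sigma \preceq f$, $\sigma 0^\omega \in \mathrm{FS}$ and $\sigma 1^\omega \notin \mathrm{FS}$; thus $f$ is approximated by a sequence inside FS and another sequence outside FS. So $f$ is in the boundary of FS.

Similarly one can show that $\mathcal{B} = \{\sigma 0^\omega \mid \sigma \in \{0,1\}^*\} \cup \{f \mid (\exists x)[f(x) \ge 2]\}$ is not in $\mathrm{DE}[\mathrm{all}]$ for $\Sigma = \{0, 1, 2\}$. $\partial\mathcal{B}$ is nowhere dense since $\mathcal{F}_{\sigma 2} \subseteq \mathcal{B}$ for all $\sigma$. So the first two statements of the corollary are not "if and only if".

Another topological characterization is based on the following observation: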

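Theorem 3.4 If $\mathcal{A}$ is open in $\mathcal{F}$ then $\mathcal{A} \in \mathrm{DE}_1[\mathrm{all}]$.

Proof: Since $\mathcal{A}$ is open, $\mathcal{A} = \bigcup_{\sigma \in W} \mathcal{F}_\sigma$ for some set $W$. Without loss of generality we can assume that $W = \{\sigma \mid \mathcal{F}_\sigma \subseteq \mathcal{A}\}$. Now the classifier $M$ given by
\[ M(\sigma) = \begin{cases} \mathrm{YES} & \text{if } \sigma \in W; \\ \mathrm{NO} & \text{if } \sigma \notin W; \end{cases} \]
$\mathrm{DE}_1[W]$-classifies $\mathcal{A}$. If $W$ is r.e., then even $\mathcal{A} \in \mathrm{DE}_1$.

An alternative proof, which only shows $\mathcal{A} \in \mathrm{DE}[\mathrm{all}]$, uses the topological characterization of Theorem 3.2: Let $F$ be the characteristic function of $\mathcal{A}$, i.e., $F(f) = 1$ if $f \in \mathcal{A}$ and $F(f) = 0$ if $f \notin \mathcal{A}$. Then the inverse image of the open set $\mathbb{N}$ is $\mathcal{F}$, of the open set $\{x \mid x \ge 1\}$ is the open set $\mathcal{A}$, and of all other open sets is $\emptyset$. Since $\emptyset$ and $\mathcal{F}$ are also open, $F$ is continuous.

So one might ask how the class of all sets in $\mathrm{DE}[\mathrm{all}]$ can be generated from the open sets. The answer follows from the following definition: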

Definition 3.5 Let $\mathcal{C}$ be a collection of subsets of $\Sigma^\omega$. $\mathcal{A}$ is the well-defined symmetric difference of $\mathcal{C}$ (denoted $\mathcal{A} = \mathrm{WDSD}(\mathcal{C})$) if $\mathcal{A}$ consists of all $f$ such that (1) $f$ is contained in only finitely many sets $\mathcal{B} \in \mathcal{C}$, and (2) $\{\mathcal{B} \in \mathcal{C} \mid f \in \mathcal{B}\}$ has an odd number of elements.

Note that if $\mathcal{A}$ is a Boolean combination of open sets, then it is also the WDSD of a finite collection of open sets. Further, if $\mathcal{A}$ is a WDSD of a collection of open sets, then $\mathcal{A}$ is also a Borel set. But neither of these two implications can be reversed: $\{f \mid \min(f) \text{ is odd}\}$ is a WDSD of a collection of open sets but not a Boolean combination of finitely many open sets; FS is a Borel set since FS is countable, but FS is not the WDSD of any collection of open sets. Now $\mathrm{DE}[\mathrm{all}]$ has the following characterization:

Theorem 3.6 $\mathcal{A} \in \mathrm{DE}[\mathrm{all}]$ iff $\mathcal{A}$ is the well-defined symmetric difference of some collection of open sets.

Proof: Let $\mathcal{A} \in \mathrm{DE}[\mathrm{all}]$ be given and $F : \mathcal{F} \to \mathcal{N}$ be the continuous function from Theorem 3.2 such that $f \in \mathcal{A} \Leftrightarrow F(f)$ is odd. Now let $\mathcal{C} = \{U_y \mid y \in \mathbb{N}, y \ge 1\}$ with $U_y = \{f \mid F(f) \ge y\}$. All sets $U_y$ are open and each $f$ is in the finitely many sets $U_1, U_2, \ldots, U_{F(f)}$. So $\mathcal{A} = \mathrm{WDSD}(\mathcal{C})$ and the "only if" direction holds.

Now let $\mathcal{A} = \mathrm{WDSD}(\mathcal{C})$ for some collection $\mathcal{C}$ of open sets. Further let $F(f)$ denote the cardinality of the set $\{\mathcal{B} \in \mathcal{C} \mid f \in \mathcal{B}\}$. By definition, $F(f)$ is odd iff $f \in \mathcal{A}$. It remains to show that $F$ is continuous. Let $\mathcal{C}_y$ be the collection of all sets which are the intersection of at least $y$ different sets from $\mathcal{C}$. Then $F(f) \ge y$ iff there is some $\mathcal{B} \in \mathcal{C}_y$ with $f \in \mathcal{B}$. It follows that $U_y = \{f \mid F(f) \ge y\}$ is just the union of all sets in $\mathcal{C}_y$ and so each set $U_y$ is open. Therefore $F$ is a continuous mapping from $\mathcal{F}$ to $\mathcal{N}$.

There is an effective version of this theorem. This version works with basic open sets instead of open sets. This is needed since open sets can be highly nonrecursive, whereas basic open sets are recursive.
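For a finite collection of basic open sets $\mathcal{F}_\sigma$, membership in the well-defined symmetric difference of Definition 3.5 reduces to a parity count over prefixes, which a classifier can evaluate on initial segments. The following sketch assumes strings over $\{0, 1\}$ written as Python `str` values and a hypothetical finite generator list; `wdsd_guess` is an illustrative name.

```python
from typing import List, Sequence


def wdsd_guess(generators: Sequence[str], tau: str) -> str:
    """YES iff an odd number of generator strings are prefixes of tau."""
    hits = sum(1 for sigma in generators if tau.startswith(sigma))
    return "YES" if hits % 2 == 1 else "NO"


# Hypothetical collection {F_1, F_10, F_100} of basic open sets.
generators: List[str] = ["1", "10", "100"]

# Guesses on the initial segments of a function starting 1, 0, 0, 0.
wdsd_guesses = [wdsd_guess(generators, "1000"[:n]) for n in range(5)]
```

Each generator that becomes a prefix of the input toggles the guess once, so the number of mindchanges on a function equals the number of sets in the collection that contain it.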

Theorem 3.7 $\mathcal{A} \in \mathrm{DE}$ iff $\mathcal{A} = \mathrm{WDSD}\{\mathcal{F}_\sigma \mid \sigma \in W\}$ for some r.e. set $W$, i.e., iff $\mathcal{A}$ is the well-defined symmetric difference of an r.e. collection of basic open sets.

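Proof: We establish the "only if" direction. If $\mathcal{A} \in \mathrm{DE}$ via some classifier $M$ then let $W = \{\sigma a \mid M(\sigma) \ne M(\sigma a)\}$. We can assume, without loss of generality, that $M(\lambda) = \mathrm{NO}$ for the empty string $\lambda$ (if $M(\lambda) = \mathrm{YES}$ one has to add $\lambda$ to $W$). $W$ is even recursive. Since $f \in \mathcal{A}$ iff $M$ makes an odd number of mind changes, $f \in \mathcal{A}$ iff there is an odd number of strings $\sigma \in W$ with $\sigma \preceq f$, i.e., iff $f \in \mathrm{WDSD}\{\mathcal{F}_\sigma \mid \sigma \in W\}$.

We now establish the "if" direction. Let $\mathcal{A} = \mathrm{WDSD}\{\mathcal{F}_\sigma \mid \sigma \in W\}$ for an r.e. set $W$. First $W$ has to be replaced by a recursive set which is sufficiently similar to $W$. Let $\sigma_0, \sigma_1, \ldots$ be a recursive 1-1 enumeration of $W$ such that $|\sigma_n| \le n$ for all $n$; in order to achieve this condition, $\sigma_n = \#$ is allowed. Now the set
\[ V = \{\tau \mid \sigma_{|\tau|} \ne \# \wedge \sigma_{|\tau|} \preceq \tau\} \]
is recursive and for each $f$ the sets $\{\sigma \in W \mid \sigma \preceq f\}$ and $\{\tau \in V \mid \tau \preceq f\}$ have the same finite cardinality. Thus $\mathcal{A} = \mathrm{WDSD}\{\mathcal{F}_\tau \mid \tau \in V\}$. Now $N$ given by
\[ N(\tau) = \begin{cases} \mathrm{YES} & \text{if } \#\{\sigma \in V \mid \sigma \preceq \tau\} \text{ is odd;} \\ \mathrm{NO} & \text{otherwise } (\#\{\sigma \in V \mid \sigma \preceq \tau\} \text{ is even);} \end{cases} \]
is a recursive classifier which classifies $\mathcal{A}$.

Shai Ben-David [1] found a further topological characterization based on the notion of countable unions of closed sets, the so called "$F_\sigma$-sets". The next theorem is his. We prove it for completeness.

Theorem 3.8 $\mathcal{A} \in \mathrm{DE}[\mathrm{all}]$ iff $\mathcal{A}$ and $\overline{\mathcal{A}}$ both are countable unions of closed sets.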

Proof: Let $\mathcal{A} = \{f \mid F(f) \text{ is odd}\}$ for a continuous function $F : \mathcal{F} \to \mathcal{N}$. The sets $U_y = \{f \mid F(f) \ge y\}$ are open and each satisfies $U_y = \bigcup_{\sigma \in W_y} \mathcal{F}_\sigma$ for certain sets $W_y$. The basic open sets are not only open, but also closed. The sets $\mathcal{F}_\sigma - U_y$ are also closed. From the relation
\[ \mathcal{A} = \bigcup_y \{f \mid F(f) = 2y+1\} = \bigcup_y \bigcup_{\sigma \in W_{2y+1}} (\mathcal{F}_\sigma - U_{2y+2}) \]
it follows that $\mathcal{A}$ and similarly $\overline{\mathcal{A}}$ are the union of countably many closed sets.

For the other way assume that $\mathcal{A} = C_1 \cup C_3 \cup C_5 \cup \ldots$ and $\overline{\mathcal{A}} = C_0 \cup C_2 \cup C_4 \cup \ldots$ are the countable unions of the closed sets $C_0, C_1, \ldots$; further let $F(f)$ be the first $y$ such that $f \in C_y$. Since the $C_y$ cover the whole set $\mathcal{F}$, $F(f)$ is defined and $F(f)$ is odd iff $f \in \mathcal{A}$. The function $F$ is continuous since the sets $\{f \mid F(f) \ge y\} = \overline{C_0 \cup C_1 \cup \ldots \cup C_{y-1}}$ are open: each of them is the complement of a finite union of closed sets.

The next theorem shows that $\mathrm{DE}[A]$ and $\mathrm{DE}[B]$ have a simple relationship.

Theorem 3.9 $\mathrm{DE}[A] \subseteq \mathrm{DE}[B]$ iff $A \le_T B$.

Proof: The "if" direction is clear. For the "only if" direction let $\mathrm{DE}[A] \subseteq \mathrm{DE}[B]$, $\Sigma = \{0, 1\}$, $f = A$ (viewed as its characteristic function) and $\mathcal{A} = \{f\}$. Clearly $\{f\} \in \mathrm{DE}[A]$, so $\{f\} \in \mathrm{DE}[B]$ via some classifier $M^B$. For all sufficiently long inputs $\sigma \preceq f$ the classifier outputs YES; by a finite modification one can obtain $M^B(\sigma) = \mathrm{YES}$ for all $\sigma \preceq f$. A further modification gives that $M^B$ makes at most one mind change: if $M^B(\sigma) = \mathrm{NO}$, then one can set $M^B(\tau) = \mathrm{NO}$ for all $\tau \succeq \sigma$ since no $g \succeq \sigma$ is in $\{f\}$. Thus $T = \{\sigma \in \Sigma^* \mid M^B(\sigma) = \mathrm{YES}\}$ is a $B$-recursive tree with $f$ being its only infinite branch. Therefore $f \le_T B$ and $A \le_T B$.
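The last step of this proof uses the standard fact that the unique infinite branch of a binary tree can be computed from the tree: every node off the branch has only a finite subtree above it, so a search in increasing depth eventually finds that exactly one child survives. A minimal sketch follows, with a toy tree standing in for the $B$-recursive tree $T$; the names `target`, `in_tree`, `has_extension` and `recover_branch` are illustrative assumptions.

```python
from itertools import product
from typing import Callable


def target(n: int) -> int:
    """Hypothetical branch f: the characteristic function of the multiples of 3."""
    return 1 if n % 3 == 0 else 0


def in_tree(sigma: str) -> bool:
    """Toy tree: prefixes of f, plus dead ends of height 2 above each wrong turn."""
    for i, symbol in enumerate(sigma):
        if int(symbol) != target(i):
            return len(sigma) - i <= 2
    return True


def has_extension(tree: Callable[[str], bool], node: str, depth: int) -> bool:
    """Does node have an extension in the tree exactly `depth` levels above it?"""
    return any(tree(node + "".join(bits)) for bits in product("01", repeat=depth))


def recover_branch(tree: Callable[[str], bool], length: int) -> str:
    """Compute the first `length` symbols of the unique infinite branch."""
    sigma = ""
    while len(sigma) < length:
        depth = 0
        while True:
            alive = [b for b in "01" if has_extension(tree, sigma + b, depth)]
            if len(alive) == 1:  # the other child's subtree has died out
                sigma += alive[0]
                break
            depth += 1
    return sigma


branch_prefix = recover_branch(in_tree, 7)
```

The search uses the tree only through membership tests, so for the tree $T$ of the proof it is recursive in $B$.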

4 Arbitrary Query-Languages

This section looks for relations between the number of quantifiers (allowed in queries) and bounds on mindchanges. Queries allow one to extract more information than just looking at initial segments. For example $\mathrm{FS} \in Q_2\mathrm{DE}_0[\emptyset] - \mathrm{DE}[\mathrm{all}]$. Most results in this section do not depend on a specific query language.

Theorem 4.1 Q1 DE0[