Spatial Reasoning and Connectionist Inference

Antje Beringer, Steffen Hölldobler, and Franz Kurfeß

Forschungsbericht AIDA-93-02
Fachgebiet Intellektik, Fachbereich Informatik
Technische Hochschule Darmstadt
Alexanderstraße 10, D-6100 Darmstadt, Germany

Spatial Reasoning and Connectionist Inference

Antje Beringer*    Steffen Hölldobler†    Franz Kurfeß‡

Technical Report, December 1992

Abstract

Intellectics, i.e. Artificial Intelligence and Cognitive Science, is an interdisciplinary field whose goals are to understand and explain intelligence on the one hand and to develop computational models which show intelligent behavior on the other hand. The system presented in this paper is based on ideas taken from Automated Reasoning and Connectionism. The connectionist inference system Chcl is applied to decide whether a given sentence correctly describes the spatial relations of objects shown in a visual scene. The sentence is posed in relational form as a query to a set of rules which define spatial relations among objects and a set of facts which describe the objects and their relations in the visual scene. These facts are not given in advance but are abductively inferred from an analog representation of the visual scene.

Area: Connectionist and PDP models; Automated Reasoning.

Acknowledgements: A. Beringer is supported by the Deutsche Forschungsgemeinschaft (DFG) within project MPS under grant no. HO 1294/3-1. F. Kurfeß is supported by the German Ministry of Research and Technology within project WINA under contract no. 413-4001-01 IN 103 E/9.

* Intellektik, Informatik, TH Darmstadt, Alexanderstraße 10, D-6100 Darmstadt, Germany, phone: [49](6151)16-5469, FAX: [49](6151)16-5326, Email: [email protected]
† Intellektik, Informatik, TH Darmstadt, Alexanderstraße 10, D-6100 Darmstadt, Germany, phone: [49](6151)16-5469, FAX: [49](6151)16-5326, Email: [email protected]
‡ Neural Information Processing Department, Faculty of Computer Science, University of Ulm, Oberer Eselsberg, D-7900 Ulm, Germany, phone: [49](731)502-4154, FAX: [49](731)502-4156, Email: [email protected]




1 Introduction

Intellectics, i.e. Artificial Intelligence and Cognitive Science, is an interdisciplinary field whose goals are to understand and explain intelligence on the one hand and to develop computational models which show intelligent behavior on the other hand [3]. In recent years the field has developed into many subfields, and these subfields are often pursued as independent areas. Automated Reasoning and Connectionism are such subfields.

Originally, the goal of Automated Reasoning, dating back to Aristotle and Leibniz, was to formalize human thought. As this goal was recognized as being unattainable and as von Neumann computers became available, much work in this area was directed towards the development of automated theorem provers for certain logics. Although these systems show quite impressive computational power, they are often not adequate in the sense that they need a long time to solve problems which humans seem to solve almost immediately and effortlessly.

Connectionist and PDP models, on the other hand, have recently attracted much attention as they have been applied to models of behavior which match psychological data, are biologically plausible, and are computationally efficient. These applications are mainly concerned with low-level cognitive tasks such as perception, motor control, associative information retrieval, or feature discovery. Considerably less experience has been gathered so far in modelling high-level cognitive tasks such as story understanding, learning of general rules, or knowledge representation and reasoning. Connectionist systems are heavily criticized as they are not yet able to handle inference, the main process of cognition [8], and Paul Smolensky [26] has argued that connectionist systems "may well offer an opportunity to escape the brittleness of symbolic AI systems ... if we can find ways of naturally instantiating the power of symbolic computation within fully connectionist systems".

Unfortunately, the relationship between automated reasoning and connectionist models has often been neglected. There are only few exceptions, like the work by Gadi Pinkas [22, 21], who investigated the relationship between propositional reasoning and energy minimization, or J. A. Barnden [1], who has given a neural network implementation of syllogistic reasoning. L. Shastri and V. Ajjanagadde [25] even refer to the gap between the ability of humans to draw a variety of inferences effortlessly, spontaneously, and with remarkable efficiency on the one hand, and the results about the complexity of reasoning reported by researchers in artificial intelligence on the other hand, as the AI paradox. They present a connectionist model and claim that their model is a step towards resolving this paradox. However, a closer look at the logic behind this model might reveal that the problems solved by Shastri and Ajjanagadde's system can be solved by applying only reduction techniques within an automated theorem prover [13].

Chcl is another attempt to investigate the relationship between automated reasoning and connectionism [10, 12, 14]. Chcl is an inference system for Horn logic which is purely connectionist in the sense of [6], is based on a connectionist unification algorithm [11], and utilizes Bibel's connection method [2]. We know from Bibel [2] or Stickel [27] that a proof for a formula consists of a so-called spanning set of pairs of literals (occurring in this formula) if the pairs of literals are simultaneously unifiable. For a given formula Chcl generates all spanning sets sequentially and unifies the pairs of literals in these sets in parallel. Chcl also contains various reduction techniques which are applied in parallel and essentially reduce the number of spanning sets.

In this report we show how a slightly modified version of Chcl can be used to solve spatial reasoning tasks as they occur in the miniature language acquisition project L0. L0 is a project investigating the question of how we could learn to describe what we see, in a setting which unifies visual perception, language, inference, and learning [7]. The task is to construct a system such that, given a picture and a statement about the picture, it decides, after a learning period, whether the statement is a true description of the picture. The pictures show simple two-dimensional scenes containing simple geometric objects. The statements are drawn from a finite corpus capturing (as closely as possible) the meaning of English descriptions. As an example consider Figure 1, which shows a typical scene accompanied by a true description.

[Figure 1 (drawing omitted): a scene containing a small dark square a, a large light square b, and a small light circle c.]

Figure 1: There is a dark object right of the square and a large, light object below the circle.

In [31] a prototypical Prolog system is presented which allows the user to experiment in the L0 domain. Scenes may be entered via a graphical interface and are translated into Prolog facts. The sentences, which are relational in nature, are encoded as Prolog queries, which are then answered with respect to a set of rules specifying graphical relations and a set of facts extracted from the scene.

A first attempt to give a connectionist semantics for the spatial relations has been made by S. H. Weber [29, 30]. Since the prohibitive combinatorics makes it impossible to simply tabulate all geometric relationships, Weber's idea is to employ an iterative search controller which provides a focus on certain candidate objects. For a fixed number of objects her system is equipped with a copy of the geometric map, and the various spatial relations are computed in parallel on these copies. The remaining objects and relations are not considered until the controller focusses on them. An open problem in Weber's approach is the question of how such a controller can be modelled in a connectionist setting, and this is one of the tasks considered in this report.

The connectionist system presented here decides whether a statement is a true description of a visual scene. The objects in the scene are represented analogically as a pattern of activation in a matrix of units called the plane. As only a limited number of objects is present in each scene, the objects are phase-coded as in [25]. The external activation of the objects leads to the recruitment of a reference unit for each object such that (after the recruitment) the excitation of the reference unit triggers the activation of the object in the plane and vice versa. The features of each object, i.e. its shape, size, and shade, are extracted using a trained multi-layer feed-forward network such that this information becomes available for Chcl as if it were presented as a Prolog fact. The rules defining spatial relations, like

    bel(T, L) ← rbel(L, R), in(T, R)                                      (1)

(expressing that a trajector T is below a landmark L if there is a region R below L and T is in R), are represented as usual in Chcl. (Throughout the report, variables are denoted by words beginning with an upper-case letter, whereas predicates and constants are denoted by words beginning with a lower-case letter. All variables occurring in rules and queries are considered to be universally quantified.) The facts for the relations rbel and in are not explicitly represented; rather, there is a dummy for the two relations which tells Chcl that there are facts for rbel and in. As described in the next paragraph, these facts are inferred abductively as soon as they are needed. Finally, the sentence is posed as a Prolog query.

As mentioned before, Chcl generates the spanning sets of a formula sequentially and thus provides the controller for focussing on certain objects and spatial relations. Now, if the controller requires checking whether the object b is below the object c as shown in Figure 1, i.e. if rule (1) is used, then the dummy for rbel and in is activated with L bound to c and T bound to b. With the help of the reference unit for c, the activation of the dummy will trigger a downward spreading of activation in the plane originating from the units representing c. If this activation reaches the units representing b, then we know that b is in the region below c and, hence, that the preconditions of rule (1) are satisfied.

The report is organized as follows. In Section 2 the kind of spatial reasoning problems considered is presented. Section 3 contains a brief description of Chcl. The main Section 4 describes the modifications and extensions of Chcl necessary to solve the spatial reasoning problems. Finally, Section 5 discusses the results and possible directions for future work.

2 Spatial Reasoning Problems in L0

As mentioned in the introduction, the problem to be solved is to decide whether a given statement is a true description of a given visual scene. The visual scene contains simple graphical objects. We are interested in their shape (square (sq), triangle (tr), or circle (ci)), their shade (dark (da) or light (li)), and their size (large (la), medium (me), or small (sm)). The scene is given as a picture and we have to extract these features from the picture. It is not obvious at all how this can be done, as experience and the context given by other objects and the statement have to be taken into account. In the example shown in Figure 1 we should extract the facts obj(a, sq, da, sm), obj(b, sq, li, la), and obj(c, ci, li, sm), where the constants a, b, and c are system-generated internal names used to refer to the various objects.

The spatial relations are defined by universally closed rules of the form

    relation(T, L) ← spec_region(L, R), in(T, R)                          (2)

or, equivalently,

    relation(T, L) ∨ ¬spec_region(L, R) ∨ ¬in(T, R),

where relation is one of abv, bel, lo, and ro, denoting that the trajector T is above, below, left of, or right of the landmark L, respectively. in denotes that T is in region R, and spec_region is one of rabv, rbel, rlo, and rro, denoting the region R above, below, left of, or right of L, respectively. One should observe that Shastri and Ajjanagadde's limited inference system [25] as well as Lange and Dyer's Robin [17] cannot handle this kind of rule, as the variable R occurs twice in the conditions of the rule and does not occur in the conclusion of the rule.

To solve the example shown in Figure 1 we need rules for determining whether a certain object is right of (ro) another object or below (bel) it. These are the rules

    ro(T, L) ← rro(L, R), in(T, R),
    bel(T', L') ← rbel(L', R'), in(T', R').

If such a rule is to be used, then its conditions have to be satisfied. This requires determining regions in the visual scene and checking whether an object is in a certain region. In our example this requires determining the region r1 right of object b (rro(b, r1)) and the region r2 below object c (rbel(c, r2)), and checking whether object a is in r1 (in(a, r1)) and b is in r2 (in(b, r2)). One should note that if we look at the visual scene we cannot simply determine all regions and all relations concerning objects being in a certain region, as there are just too many. The notions left-of, right-of, in, etc. are also not rigorously defined. What precisely is the region left of an object? How large must the portion of an object that is in a region be such that we consider the object as being in the region? We will return to these questions in Section 4.

A statement is a conjunction of atomic propositions describing objects and their spatial relations. L0-statements are quite simple English sentences and we may assume for the moment that they are in relational form. The sentence in Figure 1 can thus be expressed as the query

    ← obj(X, _, da, _), obj(Y, sq, _, _), ro(X, Y), obj(U, _, li, la), obj(V, ci, _, _), bel(U, V)

or, equivalently, as

    ¬obj(X, _, da, _) ∨ ¬obj(Y, sq, _, _) ∨ ¬ro(X, Y) ∨ ¬obj(U, _, li, la) ∨ ¬obj(V, ci, _, _) ∨ ¬bel(U, V).

Each occurrence of the symbol _ denotes a different variable, whose possible bindings we do not care about. One should observe that the connectionist inference systems proposed in [25] and [17] again cannot handle this kind of query, as unbound variables like X occur more than once in the query. Logically, to determine whether a statement correctly describes a picture we must decide whether the query representing the statement is a logical consequence of the facts about objects and regions extracted from the picture and the set of rules defining the spatial relations.
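To make this logical reading concrete, the facts, rules, and query above can be encoded directly and checked by ordinary backward chaining. The following Python sketch is ours and is not part of the L0 or Chcl systems; the predicate, constant, and variable names are taken from the text, while the function names, the tuple encoding, and the distinct anonymous variables _1 ... _7 are hypothetical.

    # Minimal sketch (ours, not part of Chcl): Horn-clause backward chaining over
    # the L0 example, assuming the region facts have already been extracted.

    FACTS = [
        ("obj", "a", "sq", "da", "sm"), ("obj", "b", "sq", "li", "la"),
        ("obj", "c", "ci", "li", "sm"),
        ("rro", "b", "r1"), ("in", "a", "r1"),    # region right of b, and a lies in it
        ("rbel", "c", "r2"), ("in", "b", "r2"),   # region below c, and b lies in it
    ]

    RULES = [   # relation(T, L) <- spec_region(L, R), in(T, R)
        (("ro", "T", "L"), [("rro", "L", "R"), ("in", "T", "R")]),
        (("bel", "T", "L"), [("rbel", "L", "R"), ("in", "T", "R")]),
    ]

    QUERY = [("obj", "X", "_1", "da", "_2"), ("obj", "Y", "sq", "_3", "_4"),
             ("ro", "X", "Y"), ("obj", "U", "_5", "li", "la"),
             ("obj", "V", "ci", "_6", "_7"), ("bel", "U", "V")]

    def is_var(t):
        # variables start with an upper-case letter or "_"; constants are lower case
        return t[0].isupper() or t[0] == "_"

    def walk(t, subst):
        while t in subst:            # follow a binding chain to its end
            t = subst[t]
        return t

    def unify(a, b, subst):
        """Unify two atoms built from constants and variables only."""
        if a[0] != b[0] or len(a) != len(b):
            return None
        s = dict(subst)
        for x, y in zip(a[1:], b[1:]):
            x, y = walk(x, s), walk(y, s)
            if x == y:
                continue
            if is_var(x):
                s[x] = y
            elif is_var(y):
                s[y] = x
            else:
                return None          # two different constants at the same position
        return s

    def prove(goals, subst, depth=0):
        """Yield substitutions under which all goals follow from FACTS and RULES."""
        if not goals:
            yield subst
            return
        goal, rest = goals[0], goals[1:]
        for fact in FACTS:
            s = unify(goal, fact, subst)
            if s is not None:
                yield from prove(rest, s, depth)
        for head, body in RULES:
            # rename rule variables so separate rule applications cannot interfere
            rule_vars = set(head[1:]) | {t for lit in body for t in lit[1:]}
            ren = {v: v + "#" + str(depth) for v in rule_vars if is_var(v)}
            head_r = (head[0],) + tuple(ren.get(t, t) for t in head[1:])
            body_r = [(lit[0],) + tuple(ren.get(t, t) for t in lit[1:]) for lit in body]
            s = unify(goal, head_r, subst)
            if s is not None:
                yield from prove(body_r + rest, s, depth + 1)

    answer = next(prove(QUERY, {}), None)
    print(answer is not None)                                  # True
    print({v: walk(v, answer) for v in ("X", "Y", "U", "V")})  # {'X': 'a', 'Y': 'b', 'U': 'b', 'V': 'c'}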


[Figure 2 (diagram omitted). The formula consists of the query literals ¬obj(X,_,da,_), ¬obj(Y,sq,_,_), ¬ro(X,Y), ¬obj(U,_,li,la), ¬obj(V,ci,_,_), ¬bel(U,V); the facts obj(a,sq,da,sm), obj(b,sq,li,la), obj(c,ci,li,sm), rro(b,r1), in(a,r1), rbel(c,r2), in(b,r2); and the clauses ro(T,L) ∨ ¬rro(L,R) ∨ ¬in(T,R) and bel(T',L') ∨ ¬rbel(L',R') ∨ ¬in(T',R'). Lines connect unifiable pairs of literals.]

Figure 2: The formula representing the problem depicted in Figure 1, where the logical connectives are omitted. The lines show unifiable pairs of literals. The set of pairs of literals shown by regular lines is a spanning set.

3 CHCL

Chcl is a connectionist inference system for Horn logic [10, 12, 14]. From its expressive power it can easily handle the kind of problems presented in Section 2 if all the rules and facts are given. Chcl is based on Bibel's connection method [2]. This method tells us that a proof of a formula is found if there exists a spanning set of pairs of literals such that all pairs are simultaneously unifiable. Figure 2 shows the formula corresponding to the problem depicted in Figure 1 with all necessary facts given. The curves show the pairs of literals which are unifiable and, hence, which are potential elements of a spanning set. A spanning set can informally be defined as follows: each literal occurring in the query must be in the spanning set and, if the conclusion of a rule is in the spanning set, then each literal occurring in the conditions of the rule must be in the spanning set. It is easy to check that the regular curves in Figure 2 form a spanning set. Moreover, all pairs of literals in this set are simultaneously unifiable, yielding the answer substitution {X ↦ a, Y ↦ b, U ↦ b, V ↦ c}. There are other spanning sets obtained by selecting some of the dotted lines, but for none of these sets are the pairs of literals simultaneously unifiable.

Chcl contains a subnetwork to represent the pairs of literals which are potential elements of spanning sets. For each pair of literals l this subnetwork contains an output unit which becomes active iff l is an element of the current spanning set. The subnetwork resembles a Jordan network [16] and sequentially generates the various spanning sets of a formula. As soon as a spanning set has been determined, all pairs of literals occurring in this set have to be simultaneously unified. This is done with the help of the connectionist unification algorithm presented in [11].
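The spanning-set condition itself, ignoring unifiability, is a simple coverage test. The short Python sketch below is ours; the literal names and the data layout are hypothetical shorthand for the formula of Figure 2. It checks that every query literal is connected and that, whenever a rule head is connected, all literals of that rule's body are connected as well.

    # Minimal sketch (ours): the spanning-set test of the connection method for
    # the formula of Figure 2.  Literals are plain strings; unifiability is not
    # checked here, only the coverage condition.

    QUERY = ["~obj(X,_,da,_)", "~obj(Y,sq,_,_)", "~ro(X,Y)",
             "~obj(U,_,li,la)", "~obj(V,ci,_,_)", "~bel(U,V)"]

    RULE_BODIES = {                      # rule head -> literals in its conditions
        "ro(T,L)":    ["~rro(L,R)", "~in(T,R)"],
        "bel(T',L')": ["~rbel(L',R')", "~in(T',R')"],
    }

    def is_spanning(connections):
        covered = set().union(*connections) if connections else set()
        if not all(q in covered for q in QUERY):
            return False                 # every query literal must be connected
        return all(all(lit in covered for lit in body)
                   for head, body in RULE_BODIES.items() if head in covered)

    # The set drawn with regular lines in Figure 2:
    spanning = [
        {"~obj(X,_,da,_)", "obj(a,sq,da,sm)"},
        {"~obj(Y,sq,_,_)", "obj(b,sq,li,la)"},
        {"~ro(X,Y)", "ro(T,L)"},
        {"~rro(L,R)", "rro(b,r1)"}, {"~in(T,R)", "in(a,r1)"},
        {"~obj(U,_,li,la)", "obj(b,sq,li,la)"},
        {"~obj(V,ci,_,_)", "obj(c,ci,li,sm)"},
        {"~bel(U,V)", "bel(T',L')"},
        {"~rbel(L',R')", "rbel(c,r2)"}, {"~in(T',R')", "in(b,r2)"},
    ]
    print(is_spanning(spanning))         # True
    print(is_spanning(spanning[:-2]))    # False: bel's conditions are no longer covered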

[Figure 3 (diagram omitted). The term layer: one row (or pair of rows) per literal, with unit columns for the shapes (ci, sq, tr), shades (da, li), sizes (sm, me, la), the markers a, b, c, r1, r2, and the variables X, Y, U, V, T, L, R, T', L', R'.]

Figure 3: The term layer for the example shown in Figure 2. The terms of the literal next to each row or pair of rows are represented by the large dark squares. The medium-sized dark squares represent the units activated by putting the term representations of pairs of literals occurring in the spanning set shown in Figure 2 on top of each other. The small dark squares are activated by the unification algorithm.

Terms are represented in a so-called term layer using a role-filler representation as shown in Figure 3. Each element of the term layer is a self-excitatory binary threshold unit. The eight units representing shape, shade, and size are referred to as the feature vector. The large dark squares represent the terms in the original formula. If the spanning set contains a pair of literals like ⟨¬obj(X, _, da, _), obj(a, sq, da, sm)⟩, then the representations of the terms of these literals in the term layer are put on top of each other. This is done simultaneously for all elements of the spanning set. In Figure 3 the medium-sized dark squares show the result of this process for the spanning set defined by the regular curves in Figure 2.

Now the terms are unified. The problems considered in this paper are such that terms are constants and variables only. Hence, unification is successful if multiple occurrences of the same variable are bound to the same term and two different constants do not appear at the same position. The bindings are represented as the activation pattern of the rows in the term layer. For example, from the activation pattern in the first row of the term layer shown in Figure 3 we learn that the variables X and T are both bound to the constant a. The unification algorithm further activates units such that whenever a variable occurs in more than one row, the activation patterns of these rows are recursively put on top of each other until eventually they become identical. This takes no longer than O(log(n)) time in the worst case, where n is the number of variables occurring in the formula. The small dark squares show the result of this process for the running example. Two different constants appear in the same position if, in one line, more than one unit for the shape, shade, size, or references is active. As this is not the case in Figure 3, all elements of the spanning set are simultaneously unifiable and, hence, a proof is found. Note that an attempt to unify obj(X, _, da, _) and obj(a, sq, li, la) would result in the simultaneous activation of the units representing darkness (da) and lightness (li) for the two literals, which causes a failure of the attempt.
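Because all terms are constants or variables, this simultaneous unification amounts to merging equivalence classes of terms and rejecting any class that contains two distinct constants. The following Python sketch is ours: a sequential union-find stand-in for the parallel term-layer computation (so it does not reproduce the O(log n) parallel time), with the argument pairs read off the spanning set of Figure 2.

    # Minimal sketch (ours): simultaneous unification of constants and variables
    # by merging equivalence classes; a sequential stand-in for the parallel
    # overlaying of term-layer rows.

    class UnionFind:
        def __init__(self):
            self.parent = {}
        def find(self, t):
            self.parent.setdefault(t, t)
            while self.parent[t] != t:
                self.parent[t] = self.parent[self.parent[t]]   # path halving
                t = self.parent[t]
            return t
        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return True
            if ra.islower() and rb.islower():
                return False             # two different constants in one class: clash
            if ra.islower():             # keep the constant as the representative
                ra, rb = rb, ra
            self.parent[ra] = rb
            return True

    # Argument pairs read off the spanning set of Figure 2 (variables upper case
    # or "_i", constants lower case; pairs of equal constants are omitted).
    PAIRS = [("X", "a"), ("_1", "sq"), ("_2", "sm"),
             ("Y", "b"), ("_3", "li"), ("_4", "la"),
             ("X", "T"), ("Y", "L"),
             ("L", "b"), ("R", "r1"), ("T", "a"), ("R", "r1"),
             ("U", "b"), ("_5", "sq"),
             ("V", "c"), ("_6", "li"), ("_7", "sm"),
             ("U", "T'"), ("V", "L'"),
             ("L'", "c"), ("R'", "r2"), ("T'", "b"), ("R'", "r2")]

    uf = UnionFind()
    print(all(uf.union(a, b) for a, b in PAIRS))        # True: simultaneously unifiable
    print([(v, uf.find(v)) for v in ("X", "Y", "U", "V")])
    # [('X', 'a'), ('Y', 'b'), ('U', 'b'), ('V', 'c')]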

4 Spatial Reasoning with CHCL

In this section we show how a slightly modified version of Chcl can be used to deal with the spatial reasoning problem of L0. Chcl may be applied to unification problems given through the term layer together with the set of connections. Testing whether a given description matches the situation represented by the geometric scene then means applying Chcl to the resulting unification problem. In our case, however, not all the facts needed to answer a query with respect to the given geometric scene are present in advance, i.e. the term layer is not completely given. These facts have to be generated. First of all, the objects shown in the geometric scene must be recognized. Then the necessary spatial relations are inferred. This is done only on demand, because in general many of these relations are useless for the given task.

4.1 What we need

As in the original Chcl, we consider only a fixed query together with fixed rules. Furthermore, we restrict the number of objects to be considered simultaneously. The patterns presented to the neural network represent planes with at most ω objects for a fixed ω, and the query also does not involve more than ω different objects. (Several psychological studies on the human ability to concentrate on different things in parallel suggest that ω = 7 is a reasonable restriction [18, 25].) In addition, we have to fix in advance how many copies of every given rule may be used. The size of Chcl's term layer depends on this fixation.

Initially, the activation of units in the term layer is not completely fixed. We have to reserve ω free rows in the term layer for the objects allowed to occur simultaneously in a geometric scene. These rows have to be "filled", i.e. some of their units have to be activated, by a feature extraction component of the network. As we allow ω different phases, at most ω spatial relations can be tested in parallel in our system, so we need at most ω copies of each rule. For each copy of a rule which may be used during the inference process we need the following number of rows in the term layer: let r_j, j = 1, ..., n, be a rule with l_j literals in its body; then we need 1 + l_j rows for rule r_j. Altogether we need ω · Σ_{j=1}^{n} (1 + l_j) additional rows in the term layer. In our case each rule has two literals in its body and one in the head, so for each (copy of a) rule we need three rows. These rows are "filled" in advance, defining the rules that may be used.
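Assuming the row-count formula is read correctly, a two-line calculation gives the size of the reserved part of the term layer for the four L0 rules. The sketch below is ours and uses ω = 7 as suggested by the psychological literature cited above.

    # Minimal sketch (ours): rows reserved in the term layer, assuming omega = 7
    # and the four L0 rules lo, ro, abv, bel with two body literals each.
    OMEGA = 7
    body_lengths = [2, 2, 2, 2]

    object_rows = OMEGA                                     # one reserved row per object
    rule_rows = OMEGA * sum(1 + l for l in body_lengths)    # omega copies of every rule
    print(object_rows, rule_rows, object_rows + rule_rows)  # 7 84 91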

4.2 The idea

The main ideas of this approach are the following. The picture is represented as a pattern of activation in a matrix of units, called the plane, where each unit represents a pixel (or whatever the atomic parts of the picture may be). To distinguish the objects considered simultaneously, the objects are coded in different phases, so the matrix units are phase-sensitive binary threshold units [25]. In the plane the units collectively representing an object are activated in the same unique phase.

The different objects and their features must be extracted from the analogue representation. The object descriptions are stored in feature vectors as described in Section 3. As the feature vector units are self-excitatory, they work like a memory for the geometrical scene. To establish the correspondence between the units representing a certain object in the plane and the associated object description stored in the term layer we use ω special phase-sensitive reference units. These reference units are not present in advance but must be recruited in a special learning step. A recruited reference unit will be activated if the units of the corresponding object in the plane are active, and vice versa. Thus, by activating the reference unit via the feature vector, the analogue object representation can be retrieved, and by activating the reference units via the analogue object representation, i.e. the units in the plane, the internal name of the object and its features can be recovered. Consequently, the reference units serve as a memory for the analogue representation of objects.

As described in the introduction, spatial relations between objects have to be inferred abductively. In particular, we have to consider regions in the plane with respect to a given landmark and we have to test whether a given trajector is in such a region. The regions are identified by spreading of activation originating from the units representing the landmark. If this activation reaches the units of the trajector, the trajector is said to lie in the specified region.

4.3 The representation of the plane

As mentioned above, the plane is represented by a matrix of binary threshold units of appropriate size. The input of a given geometrical picture containing several objects then amounts to activating certain units of this matrix. The idea is to filter the different objects and their features out of this analogue representation and to represent them in a feature vector, i.e. a set of units standing for the different instances of the object's distinguishable features.

Arranging the input units for the geometric scenes as a matrix means that every unit has a link to each of its direct neighbours. Depending on the definition of spatial relations like "right of" and "above" and the kind of reasoning one wants to perform, different sets of units have to be considered as the neighbours of a given unit.

[Figure 4 (diagram omitted): Four different spatial relations and the necessary links between the matrix units.]

[Figure 5 (diagram omitted): Eight different spatial relations and the necessary links between the matrix units.]

If, for example, we want to divide the plane into four parts denoted as left of, right of, above, and below a certain object, as shown in Figure 4, then we have to consider eight neighbours of each unit: the units directly on the left, on the right, above, and below, and the four units lying on the diagonals. If, on the other hand, we want to divide the plane into eight parts with respect to a given object as in Figure 5, then we have to consider only the four units directly adjacent to each unit. We will use the second possibility, i.e. consider only the units above, below, on the right, and on the left as the direct neighbours of a unit. For simplicity we restrict ourselves to the simple spatial relations "left of", "right of", "below", and "above" and leave out the compound relations.

Each of the units in the plane is connected to its neighbours, and these connections are controlled by phase-sensitive direction gates. If, for example, the region below a landmark is to be considered, the gates on the connections going downwards from the units in the plane are opened through an activation of a special direction unit for this direction (see Figure 11). Now, if the units representing the landmark are activated, this activation is spread downwards in the plane. If a compound relation like "left above" is in question, it could be inferred by first opening the gates for the connections going upwards in the plane and then activating the gates on the "left" connections in the appropriate phase in a second step. One should observe that the units in the plane, the direction units, and the direction gates are phase sensitive. They are able to distinguish ω different phases, no matter whether input or output is concerned. Hence, ω different regions can be generated simultaneously without giving rise to cross-talk. If we consider the compound relations too, then we have to construct the links between the direction units and direction gates in a different way. But this affects neither the construction of the rest of the network nor the plane. A combination of the regions "left above", "above", and "right above" into one relation "above" would automatically generate the compound relations, as then just the two simple relations would have to be tested.

In addition to the units representing the plane we need ω reference units to establish the correspondence between the units representing a certain object in the plane and the associated object description, its feature vector. The feature vectors consist of eight units representing the different instances of the features, several units standing for the variables used in the query, and ω marker units denoting the object whose features are described. They belong to the term layer of Chcl and have to be extracted from the analogue representation by the feature extraction component of the neural network.

The reference units are phase-sensitive binary threshold units. Their recruitment can be done with a mechanism similar to the ones described in [4, 5]. The reference units are connected to every matrix unit via bidirectional connections with initially low weights which are adjusted phase-dependently by Hebbian learning: if a unit of the input matrix is activated in the same phase as a certain reference unit, the weight of their bidirectional link is increased. Simultaneously, the weights of links between reference units and matrix units where only one or none of the two units is active are set to zero.
This means a step of Hebbian learning is executed to establish a strong connection between the reference units and their corresponding object. The weights of the links from the activated matrix units to the respective reference units are normalized in a second step to guarantee that the potential of the reference units never exceeds 1. The rule for adjusting the weights then is

    w_{i,j} = α_i · α_j / Σ_k α_k ,

where α_i is 1 if unit i is active and 0 otherwise. ([9] has shown that this normalization is possible by using a second input layer for the concerned units.) The links from a reference unit to the corresponding matrix units are not normalized but keep their weight of 1. The idea of viewing the reference units as a kind of memory for the geometrical scene may be supported by allowing a certain decay of the weights, such that forgetting is modelled.

As the connections from the matrix units to the reference units are normalized, the thresholds of the reference units can be used to define when an object is said to lie in a certain region. A threshold of 0.7, for example, means that a reference unit should be activated if more than 70% of the units representing the object referred to are active. In other words, if at least 70% of an object is in a certain region, the object is said to be in this region.

When adjusting the weights, a problem has to be taken into account: if the object is a light one, only the units in the plane belonging to the edge of the object are active. Hence, if 70% of these units are activated, this does not mean that 70% of the object is covered. To avoid this, it may be reasonable to establish strong connections between a reference unit and the interior of an object, too. Moreover, if, for example, a very large object is below a very small object as shown in Figure 6, this is possibly not detected because the small object cannot cause the activation of 70% of the units of the large object. This could be prevented by a different definition of the spatial relations (and therefore a different neighbourhood relation for the matrix units). But if we use, for example, a partition of the plane as in Figure 4, as suggested by the dashed lines in Figure 6, then for the light circle lying exactly on a diagonal it cannot be decided whether it lies in the region below or right of the dark circle.
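The recruitment and the 70% criterion can be pictured in a few lines. The sketch below is ours and deliberately simplified: a single phase, the plane given as a set of active cell coordinates, and hypothetical function names. It normalizes the weights of one recruited reference unit and then applies the threshold test.

    # Minimal sketch (ours): recruiting a reference unit for one object (a single
    # phase only) and using its normalized weights plus a threshold of 0.7 as the
    # "object lies in this region" test.

    def recruit_reference_unit(object_cells):
        """Hebbian step plus normalization: every cell of the object gets the
        weight 1/|object| to the reference unit; all other weights stay zero."""
        n = len(object_cells)
        return {cell: 1.0 / n for cell in object_cells}

    def reference_unit_fires(weights, active_region, threshold=0.7):
        """The reference unit fires iff its potential exceeds the threshold, i.e.
        iff more than 70% of the object's cells lie in the activated region."""
        potential = sum(w for cell, w in weights.items() if cell in active_region)
        return potential > threshold

    square_b = {(row, col) for row in range(6, 9) for col in range(2, 5)}   # a 3x3 object
    region_below = {(row, col) for row in range(5, 12) for col in range(1, 6)}
    region_right = {(row, col) for row in range(0, 12) for col in range(4, 12)}

    w_b = recruit_reference_unit(square_b)
    print(reference_unit_fires(w_b, region_below))   # True:  the object lies below the landmark
    print(reference_unit_fires(w_b, region_right))   # False: only a third of it lies to the right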

4.4 Feature extraction

The main part of the feature extraction component is a feed-forward network consisting of an input layer constituted by the matrix units, one hidden layer, and an output layer of eight phase-sensitive units. The output units represent a condensed version of the feature vectors, where the features of all objects are coded in parallel by a phase-dependent representation. The correct weights of the connections are learned by backpropagation. Every matrix unit is connected to every hidden unit, and every hidden unit has a link to every unit of the condensed feature vector (CFV). In the beginning the weights of these links are arbitrarily chosen, and the network learns the correct weights by training with examples. The training period only involves the input matrix, the hidden unit layer, and the CFV. Different patterns of activation of the plane together with the expected output of the CFV are presented to the network. Then the weights of the links between the input and hidden units and between the hidden layer and the CFV are adjusted by backpropagation of the output error. Thereafter the network should be able to recognize different objects in the plane together with their size and shade. The pattern of activation in the plane causes some units of the condensed feature vector (CFV) to be activated, as shown in Figure 7 for our example.

[Figure 6 (drawing omitted).]

Figure 6: We would like to conclude that the square is below the small dark circle. But the small dark circle cannot cause the activation of 70% of the units of the square if the partition of the plane into eight regions is used. If the plane is partitioned into only four regions, it is not clear whether the light circle is right of or below the dark circle.

As each reference unit is activated in a unique phase, the reference units can be used to divide the CFV into at most ω separate feature vectors (see again Figure 7). The rows of the feature vectors in the term layer additionally contain ω marker units representing the internal names of objects. These are necessary because, after the spanning set has been generated, variables occurring in the query are instantiated with markers naming the objects that have to be considered during the abductive inference of missing facts. The object features of a given picture are extracted in the following way.

1. The geometric scene in question has to be presented to the net as a pattern of activation of the units in the input matrix.

[Figure 7 (diagram omitted): the matrix activation in phases 1-3, the resulting CFV, and the derived feature vectors obj(a,sq,da,sm), obj(b,sq,li,la), obj(c,ci,li,sm).]

Figure 7: The activation of the matrix units for the example plane with the resulting condensed feature vector CFV and the derived feature vectors.

[Figure 8 (diagram omitted): a gate unit with two inputs of weight 1 and threshold 1.5.]

Figure 8: A gate unit.

2. This causes the condensed feature vector (CFV) to be instantiated, i.e. some of the corresponding units are activated in one or more phases. Furthermore, a reference unit for each object is recruited.

3. The CFV is divided into the different term-layer components. Additionally, a fixed marker is assigned to each of these term-layer vectors by extending them with ω marker units representing different constant names. In each of the new feature vectors one of these marker units is activated. All this is performed through the reference units.

4. After these steps the features describing the objects in the plane are coded in the term layer and can be used as if they had been given in advance. Up to this point there is no difference to giving the facts for the objects together with the query and rules, insofar as these facts have to be stored all the time, independently of the requirements of the query.

Before describing these steps in detail, we show the construction of the part of the neural network dealing with dividing the CFV into the feature vectors of the term layer. For the moment we omit all details which are unnecessary for the feature extraction, because the same network is used to compute the spatial relations of objects and therefore contains many more links and units than shown at this point. In the following figures, phase-sensitive binary threshold units are represented by circles or squares (in the plane). The phase-sensitive units of the CFV are represented as squares in a row, as are the (non phase-sensitive) units of the term layer rows. Modifiers are used as phase-independent inhibitors and are represented by little dots touching the concerned connection. Phase-sensitive gates control connections within a certain phase. They become active (i.e. open) in a certain phase if they receive input from two units in that phase; hence, their threshold is 1.5 (cf. Figure 8). The gates are represented by little unfilled circles on the connections which they control. If nothing else is said, the connections are all assumed to have weight 1, the thresholds are set to 0.5, and the output is either 0 or 1. This means that a unit or modifier becomes active if there is an unblocked connection from another active unit to this unit or if it is excited externally.

[Figure 9 (diagram omitted). Recoverable labels: phase units, reference units (a, b, c), the plane represented by matrix units, the hidden layer, the condensed feature vector CFV with its shape, shade, and size units, the object recognition gates (ORG), and the term layer.]

Figure 9: The feature extraction component. Not all units and links are shown in detail.
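A gate of the kind shown in Figure 8 opens for a phase exactly when both of its inputs are active in that phase, i.e. when its input sum reaches the threshold of 1.5. The small Python class below is ours (its name and interface are hypothetical) and merely mimics that behaviour.

    # Minimal sketch (ours): a phase-sensitive gate with two inputs of weight 1
    # and threshold 1.5, as used on the links between the CFV and the term layer.

    class PhaseGate:
        THRESHOLD = 1.5

        def __init__(self, input_a, input_b):
            self.input_a = input_a      # for each input: the set of phases it is active in
            self.input_b = input_b

        def open_in(self, phase):
            """The gate opens in a phase iff both inputs are active in that phase."""
            total = (phase in self.input_a) + (phase in self.input_b)
            return total > self.THRESHOLD

    # The gate between the CFV "square" unit (active in phases 1 and 2) and the
    # reference unit for object a (active in phase 1 only):
    gate = PhaseGate(input_a={1, 2}, input_b={1})
    print(gate.open_in(1), gate.open_in(2))   # True False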

4.4.1 The feature extraction network

The feature extraction network is built up as shown in Figure 9. We use ω phase units to define the phases the different reference units are concerned with (only three of them are shown). These phase units are assumed to be active in a certain phase during the recruitment of the reference units and their weight adjustment. This can be realized by external activation together with the activation of the matrix units, or by using a WTA (winner-take-all) network which assigns a certain phase to each phase unit. Each phase unit is connected to exactly one reference unit. The reference units are connected to every matrix unit via bidirectional links. The links between the matrix units and their direct neighbours are controlled by phase-sensitive direction gates which are inactive for the moment and therefore closed. Furthermore, each reference unit has a link to exactly one feature vector in the term layer. The link of a reference unit to a certain feature vector actually is a unidirectional connection between the reference unit and one of the marker units. We may consider these links as a bijective mapping from reference units to marker units, as every marker unit is referred to only once and every feature vector is referred to by only one reference unit.

The condensed feature vector has links to all feature vectors of the term layer which are reserved for the new objects. In detail, every unit of the CFV is connected to all of those units which are in the respective position, i.e. the unit for "dark" in the CFV is connected to the ω units for "dark" in the term layer, and so on. All these links from the CFV to the term layer are unidirectional and controlled by gates. In addition to the direct link between a certain reference unit and one marker unit in the associated feature vector of the term layer, this (and only this) particular reference unit has links to all the gates which lie on the links between the CFV and this associated feature vector. This has the effect that the gates are only open in the activation phase of the reference unit standing for the respective object in the plane. Furthermore, there are so-called object recognition gates (ORG) controlling the links between the reference units and the feature vectors in the term layer. These gates are clamped, i.e. open, for all phases only while feature extraction is performed and closed for all phases otherwise. This avoids unintentional influences of the activation of reference units on the feature vectors after the object descriptions have been extracted from the matrix.
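The net effect of these gates is that only CFV activation in an object's own phase reaches that object's term-layer row. The following Python sketch is ours (the dictionary layout and names are hypothetical); it reproduces the splitting for the example of Figure 7.

    # Minimal sketch (ours): splitting the phase-coded condensed feature vector
    # (CFV) into one term-layer feature vector per object, gated by the phase of
    # the corresponding reference unit.

    CFV = {"sq": {1, 2}, "ci": {3}, "da": {1}, "li": {2, 3},
           "sm": {1, 3}, "la": {2}}                 # feature unit -> active phases

    REFERENCE_UNITS = {"a": 1, "b": 2, "c": 3}      # marker -> phase of its reference unit

    def split_cfv(cfv, reference_units):
        rows = {}
        for marker, phase in reference_units.items():
            # the gates of row `marker` are open only for this object's phase
            rows[marker] = {unit for unit, phases in cfv.items() if phase in phases}
        return rows

    print(split_cfv(CFV, REFERENCE_UNITS))
    # e.g. {'a': {'sq', 'da', 'sm'}, 'b': {'sq', 'li', 'la'}, 'c': {'ci', 'li', 'sm'}}
    # (the order inside the sets may vary)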

4.4.2 Extraction of the feature vectors

The recognition of objects in a given geometrical scene now proceeds as follows. First of all, the ORGs are externally clamped. Simultaneously, the pattern of activation representing the geometrical scene in question is presented to the input matrix. The important thing here is that the units which collectively represent an object are activated in the same phase and no other units are activated in this particular phase; this means that the distinction between different objects has to be made beforehand. In addition, the phase units have to be activated externally in different phases if this is not done otherwise. All these external activations have to be clamped only during the feature extraction period. As a consequence of activating the phase units, a reference unit for each object is recruited and becomes active in a certain phase.

The next step is to recognize the features of the different objects. As all the units of the hidden layer and also the units of the CFV are phase sensitive, all objects can be considered in parallel. This is even necessary for the extraction of relative characteristics like size and shade. If a pattern of activation is presented to the network, it causes some units of the CFV to be activated. These units may be activated in more than one phase, but some restrictions always hold: if we consider each of the three possible features shape, shade, and size and the respective representative units, then there will always be the same number (at most ω) of unit/phase pairs (X, Y) indicating that unit X is active in phase Y, where each phase occurs at most once. Now we have a vector of units where the features of all the objects are encoded simultaneously. They are only distinguishable by the activation phase of the respective units. Consider again our introductory example plane with two squares and one circle as shown in Figure 7. The pattern of activation of the matrix units causes the feature vector to be instantiated as follows: the "square" unit is activated in two different phases, whereas the "circle" unit is activated in a third phase and the "triangle" unit is not activated at all (this also means that our example pattern presented as input contains only three different objects).

With the construction described above, we are now able to divide this condensed representation up into ω separate feature vectors, each of which gets named with a marker. (The following process is only performed if the ORGs are open, i.e. if we are in the feature extraction period. Everything before, up to the initialization of the CFV, may also occur in later periods when matrix units become active because of internal demands, but without affecting the term layer.) To illustrate the spreading of activation from the CFV to the feature vectors of the term layer, consider again Figure 9. We look exemplarily at the units representing the shape of the objects. We assume the unit for "square" to be activated in phases 1 and 2 and the unit for "circle" to be activated in phase 3. Assume that the appropriate reference units for phases 1, 2, and 3 are named a, b, and c, respectively. Then there are three feature vectors in the term layer where (among others) one of the ω marker units is activated: these are the marker units for a, b, and c, respectively, depending on the reference unit controlling their inputs. For simplicity the feature vectors will be referred to by the corresponding markers in the sequel.

Consider feature vector a. Each of its units (except the marker units) gets input from the corresponding unit of the CFV. But each of these links has a gate which is closed if inactive. These gates are now activated by the reference unit for a. This means that activation in phase 1 may pass the gates, whereas all other phases are blocked. As the "square" unit of the CFV is active in phase 1, this activation will pass and activate the "square" unit of feature vector a. On the other hand, the "circle" unit of the CFV is active in phase 3 and therefore its activation will not pass the gate. Because each phase occurs at most once in a unit/phase pair for each feature, it is impossible for two units describing different instances of the same feature to be activated in the same feature vector.

Now we may have some empty feature vectors which consist only of inactive units except for one marker unit (which in this case denotes an unemployed phase). Empty feature vectors would cause errors in the unification process, because they are unifiable with every term in the query belonging to an obj predicate symbol together with a variable standing for the name of the object. Consequently, empty feature vectors must not be used in the process of detecting spanning sets of connections. To avoid this, an empty feature vector causes the connections it is involved in to be marked "useless" (cf. Section 3 for a description of this mark). This can be achieved by using activating links from the marker units of the feature vector to the "useless" unit, which are modified by links from the other units of this feature vector. So if at least one of the other units is active, the corresponding connections are not marked "useless".

After extracting the features of the objects, the task to be performed by the network is the following (see the sketch after this list).

1. Generate a new spanning set S if there exists one; otherwise stop with NO.

2. Simultaneously unify the pairs of literals occurring in S. If this results in a failure, go to 1; otherwise go to 3.

3. Try to abductively infer the regions and related facts needed such that, together with S, they form a proof of the formula. If this is possible, then stop with YES; otherwise go to 1.
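Read as a control loop around the connectionist machinery, the three steps look as follows; the sketch below is ours, with the three subnetworks abstracted into callbacks whose interfaces are hypothetical.

    # Minimal sketch (ours): the control loop around the three steps, with the
    # connectionist subnetworks abstracted into callbacks.

    def answer_query(generate_spanning_sets, unify, abduce_missing_facts):
        """generate_spanning_sets(): iterable of candidate spanning sets (step 1);
        unify(S): True iff all pairs in S are simultaneously unifiable (step 2);
        abduce_missing_facts(S): True iff the required regions and facts can be
        inferred from the plane so that S becomes a proof (step 3)."""
        for spanning_set in generate_spanning_sets():
            if not unify(spanning_set):
                continue                     # failure: try the next spanning set
            if abduce_missing_facts(spanning_set):
                return "YES"
        return "NO"                          # no spanning set constitutes a proof

    # Toy run: a single spanning set that unifies and whose spatial facts can be abduced.
    print(answer_query(lambda: [{"S1"}], lambda s: True, lambda s: True))   # YES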

4.5 Computation of the spanning sets

Now we are at the point where spatial relations can be inferred. Consider the following problem: we take our example in Figure 1 and want to know whether the stated sentence is true with respect to the given geometrical scene or not. In Prolog-like notation this is described as

    ?- obj(X, _, da, _), obj(Y, sq, _, _), ro(X, Y), obj(U, _, li, la), obj(V, ci, _, _), bel(U, V).

Recall that from the feature extraction component we obtain the facts

    obj(a, sq, da, sm).
    obj(b, sq, li, la).
    obj(c, ci, li, sm).

Here the constant names denoting the different objects are arbitrarily chosen; actually they depend on the current binding of reference units to phases during the feature extraction period.

The computation of the spanning sets for a given query and rules together with the extracted object descriptions is explained in Section 3. One should observe that for each formula there are only finitely many spanning sets if we allow only a fixed number of copies for each fact and rule. Thus, if none of the spanning sets constitutes a proof, a negative answer is signalled. A user may then provide additional copies. In contrast to the original Chcl system, the literals occurring in the conditions of a rule are not members of a spanning set, as this would require the presence of facts about spatial relations. However, these facts are not known in advance but have to be abductively inferred on demand.

Chcl eventually generates the spanning set shown in Figure 2, except that the facts representing the spatial relations, i.e. rro(b, r1), in(a, r1), rbel(c, r2), and in(b, r2), are not present yet. Answering the query should now result in testing whether ro(a, b) and bel(b, c) are true. (In the original Chcl there are also facts for the predicates rro and rbel as well as for in, which have connections to the literals in the bodies of the respective rules; if there are none, this simply means that there is no spanning set. The corresponding spanning set units would be activated and Chcl would try to unify the corresponding terms.) Here we extract only those facts explicitly demanded by the rules. This is done by activating the reference unit corresponding to the object in question, which itself causes activation of the respective units in the plane. These units then spread their activation into the direction which has to be considered, and the activation of the respective units in the plane may cause the activation of a reference unit if there is an object in the considered direction. We then have to test whether among the activated reference units there is one referring to the object denoted by the query. If so, the connections of the rule are considered complementary; if not, the attempt fails. In the sequel we describe how this works in detail.

4.6 The inference of spatial relations

First of all, we describe the construction of the part of the neural net performing the spatial inferences. (We only describe the part not already shown in the previous section; the two parts have to be combined. So if, for example, we describe some links of the reference units to other units, one has to add the links described in the section before.) The abductive reasoning component is explained with respect to the example shown in Figures 1 and 7. After a successful unification of the pairs of literals occurring in the spanning set, it must be checked whether ro(a, b) and bel(b, c) are true. One should note that the spanning set units involving these literals and the conclusions of the rules for ro and bel, respectively, are activated. The rows in the term layer representing the conditions of these rules are partially instantiated to rro(b, R), in(a, R) and rbel(c, R'), in(b, R'). In other words, for both relations we know the landmark as well as the trajector. Figure 11 shows the rows for rbel(c, R') and in(b, R') with the units representing the markers b and c highlighted; it only shows the part of the spatial inference component necessary for the inference of bel(b, c).

The literals in the body of a rule share a special rule unit which is activated by the spanning set unit for the head of the rule (see Figure 11). The rule unit is the dummy establishing that the two literals in the body are not independent of one another, as they share a variable not occurring in the head of the rule (a case which [25] and [17] exclude in their systems). Moreover, it represents the assumption that facts satisfying the conditions of the rule can be inferred. The rule unit must trigger the inference process and therefore has three tasks, viz. to open the correct gates in the plane, to activate the units representing the landmark (and, thus, to activate the missing region), and to check whether the trajector is in the region. The rule unit is phase sensitive and, hence, ω different spatial relations can be tested simultaneously.

In addition, we need some units denoting that a certain rule has been used. These are binary threshold units called failure units, which are not phase sensitive. If the head of a rule is involved in a connection of the spanning set, then the associated spanning set unit for this connection is activated. As an extension to the original Chcl system, this also causes the activation of the failure unit associated with this instance of the rule (each copy has its own failure unit). This failure unit is linked to the answer units of the Chcl system, i.e. if this link is not inhibited, the activation of the unit causes the NO-answer unit of Chcl to be activated and the YES-answer unit to be inhibited, as shown in Figure 10. The idea is that, if a certain rule is used to compute the answer to a query, the answer stays negative as long as it is not guaranteed that the rule can be applied.

[Figure 10 (diagram omitted).]

Figure 10: The influence of a failure unit on the answering component of CHCL.

If we can infer from the plane exactly the spatial relations required by the rule, then the activation of the respective failure unit is inhibited and the answer may be positive, depending only on the remaining results. Furthermore, we need four phase-sensitive direction units which cause the corresponding direction gates in the plane to be opened for the activation phase of the direction unit. The direction units are named "up", "down", "left", and "right", respectively. Each of them has a link to all of the appropriate direction gates in the plane. This is illustrated in Figure 12.

Abductive inference is now performed through the combination of several activating and inhibiting gates on the links between reference units, term layer units, and the failure units shown in Figure 11. Everything apart from the direction units, the reference units, and the plane has to be considered as representing one copy of a rule. Hence, for each rule a copy of the remaining part of the network has to be provided, where the rule unit is connected to the appropriate direction unit. To describe the links between the term layer, the reference units, and the failure units we consider the general form of the rules used in the L0 task. The rules are all of the form

    relation(X, Y) :- spec_region(Y, Z), in(X, Z).

where relation is one of lo, ro, abv, bel and spec_region is one of rabv, rbel, rlo, rro. The principle of connecting the term-layer units for these rules to the reference units and the failure units is the following. The marker units of each spec_region row of the term layer have two outgoing links each. One inhibits the activation of the same marker unit in the corresponding in row; this is necessary to prevent the (faulty) inference of propositions like bel(a, a). The second one serves to activate the corresponding reference unit. The marker units of the in row have only a single outgoing link, to a gate between the respective reference unit and the modifier for the failure unit. In addition, these links are modified by the corresponding marker units of the spec_region row.

[Figure 11 (diagram omitted). Recoverable labels: the spanning set unit and rule unit of CHCL, the "down" direction unit, the reference units a, b, c, the plane, the failure unit, and the term layer rows for rbel and in.]

Figure 11: The spatial inference component necessary to infer bel(b, c) and the links from the direction units to the direction gates between the matrix units representing the plane. Not all links are shown.

[Figure 12 (diagram omitted).]

Figure 12: The links from the direction units to the direction gates between the matrix units representing the plane.

If the head of a rule is involved in a connection of the current spanning set, then the spanning set unit activates the failure unit of this rule. In addition, we use this unit to activate the rule unit which represents the two literals in the body of the rule. (In general we need as many rule units for one rule as there are different variables occurring repeatedly in the body but not in the head of the rule.) This is necessary because the two literals in the body are not independent of one another, as they share a variable not occurring in the head of the rule, a case which [25] excludes in its system. To establish this somewhat "hidden" dependence, the rule unit is activated in a certain phase. If a rule unit is excited in phase p, it causes the opening of the appropriate direction gates in the plane for phase p by exciting the corresponding direction unit in p. In the example shown in Figure 11 the "down" unit is activated and the gates on the connections going downwards from each unit are opened in a certain phase. As the marker c in the term layer row for rbel is activated, the rule unit excites the reference unit for c in phase p by opening the corresponding gates.

4.6.1 Inferring spatial relations If now bel (c; b) has to be tested, the following spreading of activation occurs: 1. To infer whether the trajector b is in the region below c , the rule unit opens the gates on the connections from marker units of the in row to the gate on the connection from the reference units to the failure unit in phase p . In general we need as much rule units for one rule as there are di erent variables occuring repeatedly in the body but not in the head of the rule. 14This is a case [25] exclude in their system

2. In the term layer vector for rbel the marker unit c is activated. This activation is propagated along the link to the reference unit for c. On this link, however, there is a gate which is activated in phase p. Consequently the reference unit c becomes active in phase p.

3. Now the units in the plane which represent the object that c refers to also become active in phase p.

4. As the below gates are opened for phase p, all the units in the plane "below" this object become active. If more than 70% of the units belonging to an object lie in this area, i.e. are activated, then the corresponding reference unit becomes active in phase p. In our example the reference unit b would be activated.

5. If the gate on the link from the reference unit b to the modifier of the failure unit is opened, the output of the failure unit is blocked.

6. This gate must be open at least in phase p, which is only possible if there is an active marker unit b in the term layer vector for the in literal of the considered rule. In addition, the gate controlling the link coming from the marker unit b must be open at least for phase p. This is the case in our example. Thus the failure unit can be inhibited, signifying that bel(b, c) is true.

Figure 13: The spreading of activation triggered by the spatial inference component. Different phases are represented by different line types.

Figure 13 illustrates the spreading of activation in the plane. As different phases are used, bel(b, c) and ro(a, b) can be tested simultaneously. The activation in phase 1 of the units representing object b does not spread to the right, because the "right" gates are only opened in the phase corresponding to the query ro(a, b), say phase 2. Since the reference unit for X must be activated in phase 1 in order to satisfy the query bel(X, c), cross-talk results like bel(a, c) are not inferred. The same holds the other way round: object c will not be activated in phase 2, as the "up" gates are closed for all phases. Hence, for example, the (incorrect) inference of ro(c, b) is impossible. As we allow different phases, different spatial relations can be tested in parallel without giving rise to cross-talk.
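To make the flow of activation in the plane concrete, the following sketch simulates the downward flooding for the query bel(b, c) on a small grid and applies the 70% coverage criterion from step 4. It is a deliberately simplified, sequential reconstruction under our own assumptions; the grid, the object layout and all names (spread_down, below) are hypothetical and not taken from the CHCL implementation, which performs this computation by spreading activation through phase-gated connections.

    # Hypothetical sketch: testing bel(b, c) on a small analog "plane".
    # Each object occupies a set of (row, col) cells; rows grow downwards.

    objects = {
        "c": {(1, 2), (1, 3), (2, 2), (2, 3)},   # landmark
        "b": {(5, 2), (5, 3), (6, 2), (6, 3)},   # trajector, lying below c
        "a": {(1, 7), (1, 8), (2, 7), (2, 8)},   # another object, to the right
    }
    ROWS = 8

    def spread_down(seed_cells, rows=ROWS):
        """With the 'down' gates opened, activation spreads from every seed
        cell to all cells below it in the same column."""
        active = set(seed_cells)
        for (r, col) in seed_cells:
            for rr in range(r + 1, rows):
                active.add((rr, col))
        return active

    def below(trajector, landmark, threshold=0.7):
        """bel(trajector, landmark) holds if at least 70% of the trajector's
        cells lie in the activated region below the landmark (step 4).
        In CHCL, reflexive results like bel(c, c) are already blocked by the
        inhibitory link in the term layer; here we simply exclude the
        landmark's own cells from the region."""
        region = spread_down(objects[landmark]) - objects[landmark]
        covered = len(objects[trajector] & region) / len(objects[trajector])
        return covered >= threshold

    print(below("b", "c"))   # True:  the failure unit would be inhibited
    print(below("a", "c"))   # False: bel(a, c) is not inferred (no cross-talk)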

5 Discussion

The goal of the system presented in the previous sections is to check the correspondence between a description of a situation in pictorial form and a description in natural language. Our main emphasis, however, lies on the reasoning component, not on the image processing or language understanding part. Thus we will assume that the natural language description of the scene is already available in a logic notation, e.g. as Horn clauses. For the pictorial representation we use a backpropagation network which is trained to extract features like shape, size and color from the pixel representation; at this point, no information about the relative or absolute position is preserved. The essential task for the reasoning component then is to take the query derived from the language input, identify the objects occurring in the query, and identify their spatial relations according to the query.

Our design goal is to construct a system which is capable of reasoning, learning and autonomous operation; it works in a massively parallel way, and physiologically plausible solutions have been used whenever suitable. In this setting a number of technical problems have to be solved:

- an adequate internal representation has to be found;
- the reasoning mechanism must be powerful enough;
- the correlation between the objects and their features has to be preserved;^15
- the description of features cannot be in absolute terms and might use linguistic variables [32] to characterize their relative values;
- the variable binding problem;
- the treatment of alternative solutions;
- abductive reasoning via the generation of required facts on demand;
- the concordance between the expression of spatial relations in language and the relative positions of objects in the visual scene.

^15 The experiment in which people briefly shown a picture that contains an S-shape and a line in the right orientation, but no $-sign, nevertheless report seeing a $-sign shows that the human visual system can have problems here [28].

Our claim is that the system presented here achieves most of these goals and solves the technical problems in a prototypical way. The reasoning mechanism is essentially the same as in Chcl; in contrast to other connectionist inference mechanisms like Shastri and Ajjanagadde's [25], Chcl is powerful enough to deal with all the aspects required here. Our system is capable of learning to associate objects and features; at the moment, however, learning is not completely integrated into the overall system, and features have to be predefined.

The integration of the learning function can be achieved by using a different mode for the whole system in which a query is not considered as a question to be answered, but rather as a statement about a visual scene. In a similar way new features can be introduced; in the long run, an image processing component may be integrated which performs low-level feature extraction similar to visual processing in the brain [15].

The quest for autonomous operation is obviously relative: we certainly do not aim at a black box which receives visual input via a camera on one side and spoken language via a microphone on the other. Our system consists of a relatively small number of simple modules, without the need for a complicated inference engine as in Prolog or expert systems, and produces the requested result from two different sources: an analogue representation of a visual scene and a symbolic representation of a sentence. The massively parallel operation of the Chcl part is quite obvious: it basically consists of a matrix of simple processing elements, and computation is performed by spreading activation, which can be either loosely or tightly coupled. The backpropagation network is structured in layers instead of a matrix and accordingly has somewhat stronger synchronization requirements; the same functionality can also be achieved with nets of different architectures.

As for the more technical problems, we have shown that the internal representation as used in Chcl can be applied without major problems to the specific task of spatial reasoning. Physiological and cognitive plausibility can be considered on various levels: on a relatively high level, the treatment of alternative solutions in Chcl is sequential, which seems to match human reasoning on tasks that require deliberate thinking. On a lower level, there is no strong evidence that reasoning problems are represented in the brain as matrices mirroring the structure of predicates and terms. As a model, however, neural networks can be represented in a similar way as associative memories [19], which in turn have a strong similarity to the structure of the matrix used in Chcl. For the bindings between objects and their features [23] we use phase-sensitive units; this concept has a good plausibility and is also used in the reasoning system of Shastri and Ajjanagadde [25]. The use of linguistic variables is implicit in the learning process for the features and their values; in the present system it is not supported from the language end, but it can be incorporated.

A solution for the variable binding problem as well as for the treatment of alternative solutions is already given through Chcl: the correspondence between a variable and its value is represented in the term layer of Chcl as shown in Figure 3, and alternative solutions correspond to different spanning sets. In contrast to conventional inference mechanisms this representation even allows for a straightforward realization of abductive reasoning: whereas in conventional systems all possible combinations of features and objects would have to be generated and stored as facts, our system initiates the generation of a fact only when it is really needed in the reasoning process.
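The contrast drawn here between conventional fact generation and on-demand, abductive fact generation can be sketched as follows. The holds predicate below is a deliberately crude geometric stand-in (centroid comparison), and all names and coordinates are hypothetical illustrations, not part of the system described above.

    from itertools import product

    # Toy scene: each object is reduced to the centroid (x, y) of its pixel
    # region, with y growing downwards as in the plane.
    CENTROIDS = {"a": (8.0, 1.5), "b": (2.5, 5.5), "c": (2.5, 1.5)}
    RELATIONS = ("bel", "abv", "ro", "lo")

    def holds(rel, obj1, obj2):
        """Crude geometric stand-in for the connectionist test of a relation."""
        (x1, y1), (x2, y2) = CENTROIDS[obj1], CENTROIDS[obj2]
        return {"bel": y1 > y2, "abv": y1 < y2, "ro": x1 > x2, "lo": x1 < x2}[rel]

    def eager_facts():
        """Conventional approach: generate and store every ground fact up front."""
        return {(rel, o1, o2)
                for rel, o1, o2 in product(RELATIONS, CENTROIDS, CENTROIDS)
                if o1 != o2 and holds(rel, o1, o2)}

    def on_demand(rel, o1, o2):
        """Abductive approach: a fact is only derived when a proof asks for it."""
        return holds(rel, o1, o2)

    print(len(eager_facts()))          # all ground facts, most of them never used
    print(on_demand("bel", "b", "c"))  # True, derived only for this single query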
In our present system, the four elementary spatial notions "up", "down", "left" and "right" are built into the system via direction units, which control the direction gates for a specific flow of activation (see Figure 12). These notions are used to compose the spatial relations used in the language part.

In total, the system as presented here solves the problem of checking the correspondence between a simple visual scene and a simple natural language characterization of the spatial relations between objects in that scene. With a moderate enhancement of the basic Chcl mechanism it is possible to extract the necessary information from the analogue representation of the scene, without the need to generate an excessive amount of facts describing all the spatial aspects of such a scene.

5.1 Future Work

Implementation. As for future work, our main emphasis is on extending and optimizing the system, as well as on the grounding problem. We would like to provide a mechanism such that rules can be learned, too. It should be possible to train the system such that spanning sets which can be unified and whose missing spatial relations can be inferred are generated with a certain priority. Furthermore, it has to be investigated whether the number of units needed, i.e. the space complexity of Chcl, can be reduced. At this point, the essential components of our spatial reasoning system have been implemented using Icsim [24], or their implementation is quite straightforward (e.g. the backpropagation network for learning the features of the objects). It will be necessary, however, to integrate the components into a single autonomous system in a systematic way.

Inference Mechanism. Although the power of the inference mechanism, Chcl, is sufficient for the purposes here, there are still some areas to be investigated. On the one hand, some limitations have to be overcome, for example the necessity to generate copies of clauses dynamically. On the other hand, some of the observations made while working on Chcl indicate relations to concepts found in conventional inference systems, e.g. the treatment of isolated connections [?] and the generation of facts in an abductive way. Another interesting area is the "fuzzification" of Chcl, which might be possible by using stochastic units in the term matrix, or by using cell assemblies [20] instead of single units. The features of the objects in the picture are essentially represented by linguistic variables, but these variables are not yet correlated to the feature values of the objects mentioned in the description. This involves some kind of non-monotonic reasoning, as it might be necessary to abductively reject extracted feature values. For example, there is no reason why two objects should be called small and medium rather than medium and large if this is not forced by evidence. Equally, it could be necessary to prefer calling an object small rather than medium if this better fits the requirements. Such a kind of "fuzziness" could be provided with an overlap technique: a feature of an object, such as its shape, which cannot be determined uniquely is then not represented by a single unit but by a set of units which partially also serve to represent the other possible values of that feature.

Image Processing. Our current approach for extracting features from the analogue representation of the visual scene goes immediately from the pixel representation to objects and their features like shape, size and shade. For relatively simple objects and scenes this is feasible, and it is sufficient to demonstrate the functionality of the system. A more sophisticated mechanism would be based on the extraction of micro-features from the pixel plane in a way similar to biological mechanisms [15], and would then assemble the features of the objects from these micro-features. We will not devote a lot of effort to this problem, but we might be able to use concepts and mechanisms under development in related projects.

Language Understanding. This is not our main area of interest, but we may be able to use insights from L0. In any case, the idea of L0 is strongly related to the problem of finding a definition of the spatial relations to be considered. What is the meaning of above, below, etc., and when do we call an object small rather than large? These are questions which involve human psychology as well as linguistics.

References

[1] J. A. Barnden. Neural-net implementation of complex symbol-processing in a mental model approach to syllogistic reasoning. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 568-573, 1989.

[2] W. Bibel. Automated Theorem Proving. Vieweg Verlag, Braunschweig, second edition, 1987.

[3] W. Bibel. Intellectics. In S. C. Shapiro, editor, Encyclopedia of Artificial Intelligence, pages 705-706. John Wiley, New York, 1992.

[4] J. Diederich. Connectionist recruitment learning. In Proceedings of the European Conference on Artificial Intelligence, pages 351-356, 1988.

[5] J. A. Feldman. Dynamic connections in neural networks. Biological Cybernetics, 46:27-39, 1982.

[6] J. A. Feldman and D. H. Ballard. Connectionist models and their properties. Cognitive Science, 6(3):205-254, 1982.

[7] J. A. Feldman, G. Lakoff, A. Stolcke, and S. H. Weber. Miniature language acquisition: A touchstone for cognitive science. In Proceedings of the Annual Conference of the Cognitive Science Society, pages 686-693, 1990.

[8] J. A. Fodor and Z. W. Pylyshyn. Connectionism and cognitive architecture: A critical analysis. In Pinker and Mehler, editors, Connections and Symbols, pages 3-71. MIT Press, 1988.

[9] S. Grossberg. Adaptive pattern classification and universal recoding I: Parallel development and coding of neural feature detectors. In J. A. Anderson and E. Rosenfeld, editors, Neurocomputing: Foundations of Research, chapter 19. MIT Press, 1988.

[10] S. Holldobler. CHCL - A connectionist inference system for a limited class of Horn clauses based on the connection method. Technical Report TR-90-042, International Computer Science Institute, Berkeley, CA, 1990.

[11] S. Holldobler. A structured connectionist unification algorithm. In Proceedings of the AAAI National Conference on Artificial Intelligence, pages 587-593, 1990.

[12] S. Holldobler. Automated inferencing and connectionist models. Postdoctoral thesis, TH Darmstadt, 1992. (Draft.)

[13] S. Holldobler. On the artificial intelligence paradox. Submitted to the Journal of Behavioral and Brain Sciences as a commentary to [25], 1992.

[14] S. Holldobler and F. Kurfess. CHCL - A connectionist inference system. In B. Fronhofer and G. Wrightson, editors, Parallelization in Inference Systems, pages 318-342. Springer, LNAI 590, 1992.

[15] D. H. Hubel and T. N. Wiesel. Die Verarbeitung visueller Informationen. In Gehirn und Nervensystem, Verständliche Forschung, pages 123-134. Spektrum der Wissenschaft, Heidelberg, 9th edition, 1988.

[16] M. I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine. In Proceedings of the Annual Conference of the Cognitive Science Society, pages 531-546, 1986.

[17] T. E. Lange and M. G. Dyer. High-level inferencing in a connectionist network. Connection Science, 1:181-217, 1989.

[18] G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. The Psychological Review, 63(2):81-97, 1956.

[19] G. Palm. On associative memory. Biological Cybernetics, 36:19-31, 1980.

[20] G. Palm. Cell assemblies as a guideline for brain research. Concepts in Neuroscience, 1(1):133-147, 1990.

[21] G. Pinkas. Propositional non-monotonic reasoning and inconsistency in symmetrical neural networks. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 525-530, 1991.

[22] G. Pinkas. Symmetric neural networks and logic satisfiability. Neural Computation, 3(2), 1991.

[23] I. Rock and S. Palmer. The legacy of Gestalt psychology. Scientific American, 263:48-61, 1990.

[24] H. W. Schmidt. ICSIM: Initial design of an object-oriented net simulator. Technical Report TR-90-055, International Computer Science Institute, Berkeley, CA, 1990.

[25] L. Shastri and V. Ajjanagadde. From associations to systematic reasoning: A connectionist representation of rules, variables and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences, 1992. (To appear.)

[26] P. Smolensky. On variable binding and the representation of symbolic structures in connectionist systems. Technical Report CU-CS-355-87, Department of Computer Science & Institute of Cognitive Science, University of Colorado, Boulder, CO 80309-0430, 1987.

[27] M. E. Stickel. An introduction to automated deduction. In W. Bibel and P. Jorrand, editors, Fundamentals of Artificial Intelligence, pages 75-132. Springer, 1987.

[28] A. Treisman. Merkmale und Gegenstände in der visuellen Verarbeitung. In Gehirn und Kognition, pages 134-154. Spektrum der Wissenschaft Verlagsgesellschaft mbH, Heidelberg, 1990.

[29] S. H. Weber. Connectionist semantics for miniature language acquisition. International Computer Science Institute, Berkeley, CA, 1990.

[30] S. H. Weber. A connectionist semantics for spatial relations. International Computer Science Institute, Berkeley, CA, 1991.

[31] S. H. Weber and A. Stolcke. L0: A testbed for miniature language acquisition. Technical Report TR-90-010, International Computer Science Institute, Berkeley, CA, 1990.

[32] L. A. Zadeh. Knowledge representation in fuzzy logic. IEEE Transactions on Knowledge and Data Engineering, 1, 1989.
