Generating English Plural Determiners from Semantic Representations: A Neural Network Learning Approach

Gabriele Scheler
FG KI/Kognition, Institut für Informatik, TU München, D-80290 München
[email protected]

Abstract. In this paper, we present a model of grammatical category formation, applied to English plural determiners. We have identified a set of semantic features for the description of relevant meanings of plural definiteness. A small training set (30 sentences) was created by linguistic criteria, and a functional mapping from the semantic feature representation to the overt category of indefinite/definite article was learned. The learned function was applied to all relevant plural noun occurrences in a 10000-word corpus. The results show a high degree of correctness (97%) in category assignment. We can conclude that the identified semantic dimensions are relevant and sufficient for the category of definiteness. We also have the significant result that actually occurring uses of plural determiners can be accounted for by a small set of semantic features. In a second experiment, we generated plural determiners from textually derived semantic representations, where the target category was removed from the input. Because texts are semantically underdetermined, these representations have some degree of noise. In generation we can still assign the correct category in many cases (83%). These results can be improved in various ways. Finally, we discuss how these results can be applied to practical NLP tasks such as grammar checking.

1 Introduction

In this paper we present a learning approach to the generation of an obligatory grammatical category, namely the category of definiteness/indefiniteness in English. In any human language there are a number of grammatical distinctions which are specific to that language and for which hard and fast rules concerning the use of the corresponding morphological or lexical expressions cannot be determined. These grammatical distinctions comprise the language-specific categories of tense, mood, aspect, case, nominal classifiers, determiners, etc. They usually correspond to basic cognitive classifications, and the logic of these categories has in some cases been explored in great detail ([7]). However, relating the logical form of a sentence to its overt expression is a difficult and error-prone task, for which heuristics are hard to find (cf. [2], [5]).

In this work, we show how an assignment function for definiteness of English plural nouns can be learned; it has been tested on a 10000-word corpus. The basic methodology consists of devising a set of semantic dimensions which correspond to the logical distinctions expressed by a certain grammatical category. In the case of definite determiners, we have chosen the dimensions of givenness (i.e. type of anaphoric relation), of quantification, of type of reference (i.e. predication or denotation), of boundedness (i.e. mass reference or individual reference), and of collective agency. The different logical forms of the sentences can be represented by a set of sentential operators, which are defined in first-order logic. These sentential operators can be used as atomic semantic features, which are consequently sufficient for representing the logical meaning of a sentence with respect to the chosen semantic dimensions. An empirical question that can be answered by generation experiments is whether the resulting semantic representations are indeed sufficient for the generation of a particular grammatical category. Another question of practical importance is whether these semantic representations can be derived from running text by automatic methods, and what results we can achieve in using automatically generated semantic representations for category assignment.

The paper is structured as follows: In the next section, we discuss the semantic dimensions that have been used and their relation to logical formulations of cognitive content. Then we present the methodology and the results from the learning of a mapping function from semantic representation to the grammatical category. We show how we can extract semantic features from a surface coding of running text (without the target category) and present the results of the category assignment from this automatically generated semantic representation. We discuss the context of this research and applications to grammar checking and machine translation. The main results are summarized in the conclusion.

2 Semantic dimensions and features

The goal in this section is to define a set of semantic features that capture the meaning of plural determiners in English and relate to cognitive operations and cognitive representations by way of logical definitions. This approach has been explained in detail in [17]. The main idea is to `ground' atomic semantic features by treating them as unary operators in first-order logic, and to define them with reference to a logical ontology and cognitive primitives. Features can be ordered into dimensions of logically mutually exclusive features. In this paper, certain dimensions have been selected with the goal of finding a set of semantic features which is sufficient to explain the uses of definite/indefinite determiners in any text. The use of logical definitions is threefold:

- they provide an abstract description of the cognitive level
- the basic conceptual primitives are identified
- semantic features are grounded

First-order logic is seen as a tool for a meta-theory of cognition. It is presumably not an adequate cognitive representation per se. However, we believe that any cognitive representation must bear an identifiable relation to its logical formulation. We do not present full logical forms or logical definitions for semantic features here. For the most part, the logic of the semantic features that are used here has been defined before (see below), and we can refer to that work.

2.1 Generalized quantification

An obviously relevant semantic dimension concerns the type of quantification of the noun phrase. According to the concept of generalized quantification [4], we distinguish between

- num: quantifier with an explicit quantity, e.g. four, five etc.
- some: an unspecified quantity, which constitutes a small percentage
- most: an unspecified quantity, which constitutes a large percentage
- all: universal quantification, constrained with respect to the discourse setting, e.g. He talked to the ladies (the ladies are "all ladies" in the current discourse setting)
- general: universal quantification, unconstrained with respect to discourse, but pragmatically constrained (i.e. nearly all, in general, disregarding exceptions), e.g. Women know these things

2.2 Anaphoric relation

The type of anaphoric relation concerns the contrast between given and new referents in discourse. There are also noun phrases which refer to objects not previously introduced explicitly, but which are textually implied. For a theory of discourse referents and co-reference relations we refer to [7]. In general, types of anaphoric relations are relative to the method of anaphoric resolution. Co-reference can be realized through synonymic relations (snuff-boxes - such banalities) or indicated syntactically (they were like duellists), or some combination thereof. A textual implication must be resolved by a specific lexical knowledge base. For instance, part-whole relations or incorporated objects constitute indirect anaphoric relations. Examples for the first type are nouns which designate a part of an object, such as end, limitation, corner, as in moustache with stiff waxed ends, the limitations of the policeman mentality, the corners of his lips. Objects which are lexically implied in verbs occur in phrases such as say words to that effect, state in words, or snap disdainful fingers. It is possible to represent the different types of anaphoric relation by a scalar value which gives a measure of the "distance" to an antecedent, e.g. the number of relations traveled in a lexical knowledge base. However, in the absence of a system for anaphoric relation assignment, a threefold distinction into new-given-implied may be used.

- given: noun phrase with a co-referring antecedent
- implied: noun phrase which refers to an object implied by a lexical relation
- new: noun phrase that introduces a new referent
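The scalar-distance idea can be sketched as a breadth-first search over a lexical knowledge base. The KB entries, the function names and the collapse into the threefold new/given/implied scheme below are illustrative assumptions, not the paper's actual resource or implementation:

```python
from collections import deque

# Toy lexical knowledge base: part-whole and verb-incorporated-object links
# (entries modeled on the examples in the text; invented for illustration).
LEXICAL_KB = {
    "moustache": {"end"},
    "policeman": {"limitation"},
    "say": {"word"},
    "snap": {"finger"},
}

def anaphoric_distance(antecedent, noun, kb):
    """Number of lexical relations traveled from antecedent to noun (BFS);
    0 means direct co-reference, None means no path (a 'new' referent)."""
    if antecedent == noun:
        return 0
    seen, queue = {antecedent}, deque([(antecedent, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in kb.get(node, ()):
            if nxt == noun:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def anaphoric_feature(antecedent, noun, kb):
    """Collapse the scalar distance into the new/given/implied distinction."""
    d = anaphoric_distance(antecedent, noun, kb)
    if d == 0:
        return "given"
    if d is not None:
        return "implied"
    return "new"
```

With such a KB, moustache ... ends comes out as implied (one relation traveled), while a noun with no antecedent or lexical link is classified as new.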

2.3 Reference to Discourse Objects

A basic distinction in the reference of noun phrases concerns the complementary notions of denotation and predication. Noun phrases may be used to pick out a specific discourse object, or to introduce a new one, but they are often also used merely to designate a quality or property. An example for this distinction is She was one of the foremost writers of detective fiction, where writer is used denotationally (it co-refers with she), and detective fiction is used predicatively, i.e. does not refer to a particular object.

- denotation: noun phrase that denotes an object term in discourse (e.g., He was walking about in the park)
- predication: noun phrase that denotes a property in discourse, where a property is a one-place relation of a discourse object (e.g., It's more a park than a garden)

This distinction, which is explicit in a logical representation, is often not as clear-cut in a real text. Difficult cases, taken from the text (s. 3.1), are for instance:

101. You'll find Anderson there with the four GUESTS.
102. None of those people can be CRIMINALS.
103. MURDERERS look and behave very much like everybody else.
104. He suspected that these people were MURDERERS.
105. MURDERERS are often hearty.
106. Mere MEN being in charge, we've got to be careful.
107. 'MEN - MEN,' sighed Mrs Oliver.

Again, things might be easier if we had a scale of discourse prominence. Discourse prominence could either be defined with respect to the cognitive/logical representation, i.e. the status of an object in the discourse representation as central, peripheral or not referring, if we had a corresponding theory of discourse objects. Or it may be a measure that is only practically defined from textual analysis, i.e. using syntactic clues such as subject/object, use of descriptive adjectives, "be", numerals, determiners, possessive pronouns etc. Instead of two concepts, predication and denotation, we would get scalar or fuzzy concepts. For instance, in this case, 102 is 'very predicative', 101 is 'very referential', 106 is 'somewhat referential', 103 'mostly predicative' etc.

2.4 Boundedness

Boundedness of reference is a phenomenon that is attested both in the verbal and the nominal domain (cf. [12], [8]). Mass reference or reference to substances can be defined by invariance under partition (part of X = subset of X), while piecewise reference or reference to individuals does not allow partitioning without change of designation (part of X = constituting object of X).

- mass: reference to an unbounded quantity of one kind (e.g., a Lovely Young Thing with tight poodle CURLS)
- pieces: reference to a collection of individuals (e.g., Those dreadful policewomen in funny HATS who bother people in parks!)

2.5 Agentive involvement

Plural noun phrases which have agentive meanings can refer to a set of individuals each of which performs an individual action, or to a set of individuals which collectively performs one action. A further discussion and full logical axiomatization of boundedness and agentive involvement can be found in [9], [10].

- collective: a plural noun referring to a set of individuals and a common action (e.g., The two girls sang a duet.)
- distributive: a plural noun referring to a set of objects and individual actions (e.g., Four people brought a salad to the party.)

These features are primarily defined for agentive constructions, where the plural noun phrase acts as a subject. They may be seen to carry over to other constructions, where an action is associated with a set of objects, such as She wrote chatty ARTICLES on The Tendency of the Criminal; Famous Crimes Passionnels; Murder for Love vs. Murder for Gain. Here articles is collective, as all the articles together are on the named subjects. Compare this to, e.g., She has written the ARTICLES The Tendency of the Criminal; Famous Crimes Passionnels and Murder for Love vs. Murder for Gain. We have used the dimension of agentive involvement in the extended sense here.

3 From semantic features to morphological expression

The question investigated in the first experiment is the adequacy of a semantic representation for noun phrases which consists of the semantic dimensions and individual features given in Section 2. In particular, we wanted to know how a functional assignment that has been learned from a set of linguistically chosen examples carries over to instances of the relevant phenomenon in real texts.

3.1 Method

In order to answer this question, we use the method of supervised learning by back-propagation, as implemented in the SNNS system (cf. [20]). Supervised learning requires setting up a number of training examples, i.e. cases where both input and output of a function are given. From these examples a mapping function is created, which generalizes to new patterns of the same kind.

We created a small training corpus for typical occurrences of bare plurals and definite plurals. 30 example sentences were taken from an English grammar ([18]). For each example, a semantic feature representation was created. This consists of a number of features from the list given above. For each semantic dimension, a feature value was set referring to the logical interpretation of the sentence. Sometimes a certain dimension was not applicable, which resulted in a neutral value (*) for that dimension. Examples are given in Table 1.

He gives wonderful PARTIES.                    new    general  predication  pieces  *           indef
The MUSICIANS are practicing a new piece.      given  all      reference    pieces  collective  def
They were discussing BOOKS and the theater.    new    general  predicative  mass    *           indef

Table 1. Examples from the training set: Sentences, semantic representations, and grammatical category

The symbolic descriptions were translated into binary patterns by using 1-of-n coding. The assignment of the correct output category consisted in a binary decision, namely definite plural or bare (indefinite) plural. We wanted to know how such a set of training examples relates to the patterns found in real texts. Accordingly, we tested the acquired classification on a narrative text ("Cards on the Table" by A. Christie), of which the first 5 chapters were taken, with a total of 9332 words. Every occurrence of a plural noun without a possessive or demonstrative pronoun formed part of the dataset. Modification by a possessive pronoun (my friends) or a demonstrative pronoun (those people) leads to a neutralization of the indefiniteness/definiteness distinction as expressed by a determiner. Generating possessive or demonstrative pronouns is beyond the goals of this research. As a result, there were 125 instances of definite or bare plural nouns. For these test cases, another set of semantic representations was manually created.
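The 1-of-n coding can be sketched as follows. The feature inventory per dimension is reconstructed from Section 2; the paper counts 15 features over 5 dimensions, so this inventory, which yields 14 bits, is an approximation, and the neutral value * is rendered as an all-zero dimension:

```python
# Reconstructed dimension/feature inventory (an assumption based on Section 2;
# the paper's own inventory has 15 features and may differ slightly).
DIMENSIONS = [
    ("givenness", ["new", "given", "implied"]),
    ("quantification", ["num", "some", "most", "all", "general"]),
    ("reference", ["denotation", "predication"]),
    ("boundedness", ["mass", "pieces"]),
    ("agentivity", ["collective", "distributive"]),
]

def encode(rep):
    """1-of-n binary coding: rep maps a dimension name to a feature name
    (or '*'); each dimension contributes one bit per possible feature,
    and a neutral '*' leaves that dimension all-zero."""
    vec = []
    for dim, feats in DIMENSIONS:
        value = rep.get(dim, "*")
        vec.extend(1 if value == f else 0 for f in feats)
    return vec

# First training example of Table 1: "He gives wonderful PARTIES." -> indef
parties = {"givenness": "new", "quantification": "general",
           "reference": "predication", "boundedness": "pieces"}
x = encode(parties)
```

The resulting 0/1 vector is the kind of input pattern a back-propagation network (or any other binary classifier) can be trained on.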

3.2 Results

The mapping from semantics to grammatical category for the example sentences could be learned perfectly, i.e. every semantic representation was assigned its correct surface category (cf. Table 2). This success in learning can be explained by the careful selection of semantic features which describe the relevant semantic dimensions of a surface category such as definiteness.

Learning (training set)        30/30     100%
Generalization (test cases)    121/125    97%

Table 2. Mapping from semantic representation to output category

The learned classifier was then applied to the cases derived from the running text. Again, a high percentage of correctness (97%) could be achieved (cf. Table 2). This result is remarkable, as we did not expect to be able to describe each occurrence of a plural noun by this rather small set of 5 dimensions and 15 features. Rather, we had anticipated difficulties for a number of examples which seemed somehow idiosyncratic or 'idiomatic' in their use. Examples that had been set aside as idiomatic were almost always assigned correctly. The few remaining misclassifications have also been examined (s. below). They are due to stylistic peculiarities, as in 45 and 89. Also, two sentences involving numerals were not classified correctly. This has probably not been sufficiently covered by the training set.

45.  INTRODUCTIONS completed, he gravitated naturally to the side of Colonel Race.
     given all predication mass collective
89.  I held the most beautiful CARDS yesterday.
     new some predication pieces *
94.  He saw four EXPRESSIONS break up - waver.
     implied num predication pieces distributive
118. Yes. That's to say, I passed quite near him THREE TIMES.
     implied num predication pieces *

We also compared results from generalization of the special training set with generalization from a random selection of training cases from the actual text. The results are given in Table 3.

Learning                          100%
Generalization (Training=30)       78%   (98/125)
Generalization (Training=75)       95%   (76/80)

Table 3. Generalization from randomly selected text cases

We see that we need fewer examples to find the mapping function from a semantic representation to morphological expression when we select the set of training examples by linguistic criteria than when we just pick random examples. An English grammar strives to capture all uses of a specific grammatical category irrespective of its frequency. This proves to be an asset in learning a mapping function that will generalize well to a real text. We get comparable results only when we use approx. 50% of the whole set for training. As the absolute numbers are small, this may not seem to be of much importance for practical NLP. However, when we consider learning the whole set of grammatical categories in a natural language, this finding may be of practical importance.

3.3 Discussion

We have succeeded in learning a generation function from semantic representations with remarkably few wrong assignments. The remaining problems with functional assignment, which are due to stylistic variation, are fewer than we expected, but they may go beyond an analysis in terms of semantic-logical features. It seems that a small number of examples is sufficient to fix the generation function properly. Accordingly, a learning situation of few example sentences which contains the most frequent and/or the decisive patterns may fix the important points of the function. We can in principle use a sampling method (cf. [3]) or a linguistic principle of selection (which in the ideal case should be the same thing).

In this case, we have set up patterns of 15 binary features. Of these, different features within a dimension are mutually exclusive, i.e. do not co-occur. On the other hand, sometimes a certain dimension is not used in the representation of a noun phrase. This makes a theoretical number of approx. 750 different patterns, i.e. constellations of semantic features. This is a size that is beyond explicit rule coding; consequently, attempts to handle syntax-semantics mapping by rule creation did not achieve coverage of real texts beyond handling of selected examples ([6, 5]). Once a generation function has been learned, a function of this size may be implementable by a lookup table, at least if we just use binary features. Not all of the theoretically possible patterns may occur, and only a subset will be frequent. For instance, our sample of 125 patterns contains 59 different patterns. It should be stressed that learning a binary classification function is not restricted to neural network approaches. Presumably any pattern classifier, such as decision trees ([11]), adaptive distance classification ([13, 14]), or statistical supervised clustering methods, can do it. A comparison of different methods in this respect is beyond the scope of this paper.
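The lookup-table idea for a learned function over binary patterns can be sketched as follows; the stand-in classifier, the bit layout, and the pattern tokens are invented for illustration, not the corpus data:

```python
# Memoize the decisions of a learned classifier per distinct binary pattern:
# once trained, the generation function only needs to be consulted once for
# each pattern type that actually occurs.
def build_lookup(patterns, classify):
    table = {}
    for p in patterns:
        key = tuple(p)
        if key not in table:
            table[key] = classify(key)  # one call per distinct pattern
    return table

# Stand-in classifier: 'def' whenever the (assumed) 'given' bit is set.
toy_classify = lambda p: "def" if p[1] == 1 else "indef"

# Four pattern tokens, three distinct types
# (cf. the 59 distinct patterns among the 125 corpus instances).
corpus = [(0, 1, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
table = build_lookup(corpus, toy_classify)
```

Since only a subset of the theoretically possible patterns occurs, the table stays far smaller than the approx. 750 possible constellations.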

4 Generation from noisy input

For the goal of cognitive modeling it is interesting to look at the kind of semantic representations necessary to explain attested morphological categories and their use. For practical purposes, however, semantic representations cannot be manually created. They have to be derived from running text by automatic methods. This is a goal that is not easy to reach. First of all, texts are semantically underdetermined. They do not contain all the information present in a speaker's mind that corresponds to a full logical representation. Fortunately, these logical representations are highly redundant for the selection of a grammatical category, so that a noisy representation is often sufficient for practical NLP tasks such as text understanding, machine translation or grammar checking. Secondly, there remains the problem of how to represent or code a text so as to derive a maximum of semantic information from it.

In this paper we wanted to look at the possibility of using a neural network learning approach to syntax-semantics mapping for grammar checking, i.e. the automatic correction of the definiteness category in a running text. This could be a valuable feature in a foreign language editor. The main idea is to provide a slot-value representation of surface-accessible textual features and use that representation as input to the learning of the interpretation function. In this case, we have created surface representations for the core of a sentence (NP - VP - NP), or for nominal phrases with prepositional phrases attached. Both types constitute the immediate local context for the plural noun phrases, and both can be represented by the same slots and values. This approach means leaving out a lot of textual detail and extracting only the specific information into the surface representation which is needed for further semantic processing. We have called this way of doing deep textual analysis vertical modularity (cf. [16]), to distinguish it from the more usual approach of processing morphology, syntax, lexical semantics and sentential semantics in horizontal layers. In particular, we have used slots for

- head noun
- modifiers of the head noun
- predicate or preposition
- dependent noun
- modifiers of the dependent noun

Values in the slots were lexical classes for head noun, predicate and dependent noun (e.g., perceptual entity, physical object, body part, person, communication), taken from a small working dictionary, and grammatical classes for modifiers (e.g., adjective, numeral, demonstrative, singular, plural). Some examples are given in Table 4.

3. VOICES drawled or murmured.
   perceptual entity * plural qu | action | * * * *
4. in aid of the London HOSPITALS.
   event * singular indef | prep | institution desc adj plural qu
5. a Lovely Young Thing with tight poodle CURLS.
   object desc adj singular indef | prep | body part desc adj plural qu
7. He wore a moustache with stiff waxed ENDS.
   body part * singular indef | prep | part desc adj plural qu

Table 4. Examples for surface textual coding

In order to investigate the possibilities of grammar checking, we left out the definiteness category for the target noun phrase, i.e. substituted indef/def by qu. For this experiment we have used a set of 81 randomly chosen examples from the running text. This representation is fairly primitive in that it needs a fixed length of inputs, incorporates a dictionary look-up for lexical items which is not supported by a full English dictionary, i.e. a full and consistent ontology, and uses only a little of the syntactic, morphological and lexical information available. An improved method of surface coding is currently being investigated ([1]). As this experiment was expected to be a harder problem than the former one, we used cross-validation for learning rather than the usual distinction of training and test cases. Cross-validation is the preferable statistical method for limited data sets (cf. [19]); it means that we learn 80 examples in each run and generalize to the remaining one. This is repeated 81 times, so that we get generalization figures for all 81 cases. The results for learning and for generalization are given in Table 5. Data on 'noisy' assignments (between 1 and 4 errors) are also given.
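The leave-one-out cross-validation scheme can be sketched as follows; a Hamming-distance nearest-neighbour classifier stands in for the back-propagation network, and the data are invented toy patterns, not the 81 corpus examples:

```python
# Leave-one-out cross-validation: train on all cases but one, test on the
# held-out case, and repeat once per case, so every example yields a
# generalization figure.
def loo_accuracy(xs, ys, fit_predict):
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if fit_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

def nn_fit_predict(train_x, train_y, query):
    """Stand-in learner: 1-nearest-neighbour over binary feature vectors."""
    hamming = lambda a, b: sum(u != v for u, v in zip(a, b))
    best = min(range(len(train_x)), key=lambda j: hamming(train_x[j], query))
    return train_y[best]

# Toy binary patterns with their target categories.
xs = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1)]
ys = ["indef", "indef", "def", "def"]
acc = loo_accuracy(xs, ys, nn_fit_predict)
```

With 81 cases this is exactly the 81-run procedure described above, each run training on the remaining 80.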

Table 5. Mapping from encoded surface text to semantic representation: learning (100% correct) and generalization (n = 81), reported both including and excluding anaphoric relations, with counts of fully correct patterns and of patterns carrying 1-2 or 3-4 feature errors

Learning                                      100%
Generation including anaphoric relations       83%   (67/81)
Generation excluding anaphoric relations       92%   (75/81)

Table 6. Generation from automatically derived semantic representations

These results amount to an average of 4.1 errors per pattern, where 15 bits had to be set. When we use these noisy patterns as input to the learned generation function (cf. Table 6), we find that we can still assign 83% correctly. This is due to the fact that the semantic representation is often redundantly explicit: not all of the features are needed to set an output category correctly. However, it is clear that from the surface coding, which is restricted to the local context of a single sentence, the dimension of anaphoric relation cannot be learned successfully. We excluded that dimension from the interpretation and generation task by annotating the surface representation with the correct anaphoric relation. Consequently, results improved, as seen in Tables 5 and 6.

5 Applications in Multilingual NLP

The main task of this paper has been to identify a set of semantic features for the description of the definiteness category in English and to apply it to instances of plural nouns in a real text. A possible application to grammar checking has been spelled out in the previous section. The results lead us to expect that, with the development of a more sophisticated textual coding, we may have a practical tool for checking and correcting definiteness of English plural nouns.

We may raise the question whether a direct pattern completion approach to grammar checking might not be equally promising. That is, given a surface text encoding without the target category, we may try to learn the correct category as a classification of surface sentences. We have used the same surface encoding as in the previous experiment and provided the output category for the training cases. To make the results comparable, we have used cross-validation again. The best results were achieved with a hidden layer of 20 units (i.e. a 53-20-2 net). The results of this approach are given in Table 7.

Table 7. Determiner selection as classification of surface sentences (learning 100%; generalization 77%, i.e. 62 of 81, and 54%, i.e. 44 of 81 cases correct)
Table 7. Determiner Selection as Classi cation of Surface Sentences We see that the results are approximately at chance level. Using the more complicated route of creating semantic representations rst and generating from them proves to be considerably better. Both approaches could bene t from a better textual encoding. Yet a representation that is created from an example set of a particular task will probably not be as good as a semantic representation that is task-independent and grounded in cognition. The results from all these experiments are not conclusive yet in deciding whether full semantic representations will eventually be better for practical applications than mere textual matching. We have formerly applied a similar approach to the interpretation and generation of verbal aspects in an interlingual approach, i.e. a machine translation

application (cf.[15]). In that case using a full semantic representation improved results by about 20% compared to a direct matching of source language surface representation and target language aspectual category. The work reported here can also be used for multilingual interpretation and generation. This is especially interesting for languages without nominal determiners, such as Japanese or Russian. In these cases other grammatical information that is provided in the surface coding, e.g. Japanese particles with topic/comment contrast combining the agentive/givenness dimensions and Japanese word order and nominal classi ers, can be used to set the semantic features of the intermediate, interlingual representation. Generation of an English determiner can then be handled by the unilingual learned generation function. The history of machine translation and message understanding has shown that mere surface scanning and textual matching approaches tend to level o as they have no capacity for improving performance beyond a certain percentage. In contrast, using explicit semantic representations which can be linked to cognitive models provides a basis for both human language understanding and practical NLP. Using semantic representations has additional advantages for interactive systems both for grammar checking and machine translation. The additional plane of semantic representation allows a system to assess the validity of a given decision and frame a question in other cases. Systems which require a high quality of performance will certainly have to incorporate an interactive component.

6 Conclusion

Two main questions have been raised in the introduction: (a) Are semantic feature representations sufficient to explain the use of morphological categories in real texts? (b) Can semantic representations be derived from running text by learning methods, and what performance do we get in generation from automatically derived semantic representations? Both questions can be answered affirmatively, with some restrictions especially concerning the quality of the automatically derived semantic representation. Further work needs to be done on surface textual coding to improve the learning of the interpretation function. However, learning semantic representations is a significant improvement in efficiency and performance over former rule-based methods. We have also compared this approach to a direct textual categorization approach for the task of checking the use of a determiner. The results of using the cognitively more adequate method including semantic representations were considerably better. The grounding of representations in logical semantics and cognition should be an asset in developing high-quality NLP systems. A major result of this research is that mapping from surface to semantic representation is a viable approach for practical NLP tasks when we use a learning method to create the mapping function.

References

1. Stefan Bauer. Entwicklung eines Eingabe-Taggers für lexikalisch-syntaktische Information. Diploma thesis, Technische Universität München, Lehrstuhl Brauer, November 1995.
2. W. Brauer, P. Lockemann, and H. Schnelle. Text Understanding - The Challenges to Come. In O. Herzog and C.-R. Rollinger, editors, Text Understanding in LILOG, Springer, 1991, pp. 14-32.
3. Ido Dagan and Sean Engelson. Committee-based sampling for training probabilistic classifiers. (this volume), 1995.
4. P. Gärdenfors, editor. Generalized Quantification. North-Holland, 1990.
5. O. Herzog and C.-R. Rollinger, editors. Text Understanding in LILOG. Springer, 1991.
6. Jerry R. Hobbs, Mark E. Stickel, Douglas E. Appelt, and Paul Martin. Interpretation as abduction. Artificial Intelligence, 63:69-142, 1993.
7. Hans Kamp and Uwe Reyle. From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation. Studies in Linguistics and Philosophy. Kluwer, 1993.
8. Manfred Krifka. Nominalreferenz und Zeitkonstitution im Deutschen. Wilhelm Fink, 1989.
9. Godehard Link. First order axioms for the logic of plurality. In J. Allgayer, editor, Processing Plurals and Quantifications. CSLI Notes, Stanford, 1991.
10. Godehard Link. Plural. In A. von Stechow and D. Wunderlich, editors, Handbuch Semantik. De Gruyter, 1991.
11. J.R. Quinlan. C4.5 - Programs for Machine Learning. Addison-Wesley, 1995.
12. Gabriele Scheler. Zur Semantik von Tempus und Aspekt, insbesondere des Russischen. Magisterarbeit, LMU München, April 1984.
13. Gabriele Scheler. The use of an adaptive distance measure in generalizing pattern learning. In I. Aleksander and J. Taylor, editors, Artificial Neural Networks 2, North-Holland, 1992, pp. 131-135.
14. Gabriele Scheler. Feature selection with exception handling using adaptive distance measures. Technical Report FKI-178-93, Technische Universität München, Institut für Informatik, July 1993.
15. Gabriele Scheler. Machine translation of aspectual categories using neural networks. In J. Kunze and H. Stoyan, editors, KI-94 Workshops, 18. Dt. Jahrestagung für Künstliche Intelligenz, 1994, pp. 389-390.
16. Gabriele Scheler. Learning the semantics of aspect. In Daniel Jones, editor, New Methods in Language Processing. University College London Press, 1996.
17. Gabriele Scheler and Johann Schumann. A hybrid model of semantic inference. In Alex Monaghan, editor, Proceedings of the 4th International Conference on Cognitive Science in Natural Language Processing (CSNLP 95), 1995.
18. A.J. Thompson and A.V. Martinet. A Practical English Grammar. Oxford University Press, 1969.
19. Sholom M. Weiss and Casimir A. Kulikowski. Computer Systems That Learn. Morgan Kaufmann, 1991.
20. Andreas Zell et al. SNNS User Manual v. 3.1. Universität Stuttgart: Institute for Parallel and Distributed High-Performance Systems, 1993.