Lexical representations in spoken language comprehension

LANGUAGE AND COGNITIVE PROCESSES, 1988, 3(1), 1-16

William Marslen-Wilson Max-Planck Institute for Psycholinguistics, Nijmegen, The Netherlands and M.R.C. Applied Psychology Unit, Cambridge, U.K.

Colin M. Brown Max-Planck Institute for Psycholinguistics, Nijmegen, The Netherlands

Lorraine Komisarjevsky Tyler Department of Experimental Psychology, University of Cambridge, Cambridge, U.K.

This study investigates the timing with which lexical representations are deployed at different levels of the language system, contrasting linguistic aspects of verb argument frames with their consequences for interpretation in the domain of non-linguistic, conceptual models. The experiment examined monitoring latencies to noun targets that were either normal with respect to the preceding verb, or which violated either pragmatic, semantic, or categorial constraints imposed by the verb’s argument frame and its associated co-occurrence restrictions. The results show that syntactic and semantic constraints derived from the verb have immediate effects on processing, and that there is also a very rapid projection of the thematic properties of verb argument frames on to non-linguistic domains of interpretation and inference, involving the listener’s mental model of the discourse.

INTRODUCTION

The comprehension of a spoken utterance requires the listener to integrate together a variety of different types of linguistic and non-linguistic knowledge. In particular, the linguistic properties of the utterance (its acoustic-phonetic, lexical, syntactic, and semantic properties) must in some way be mapped onto a mental model of the current discourse, taking into

Requests for reprints should be addressed to William Marslen-Wilson, M.R.C. Applied Psychology Unit, 15 Chaucer Road, Cambridge CB2 2EF, U.K.

© 1988 Lawrence Erlbaum Associates Ltd. & V.S.P. Publications


account the listener’s general non-linguistic knowledge of the world. It has become increasingly evident, from developments in linguistics, psycholinguistics, and computational linguistics, that lexical representations play a central role in this process of integration. Not only do lexical representations provide the basic bridge between sound and meaning, linking the phonological properties of word-forms with clusters of syntactic and semantic attributes, but also they provide the basic structural framework in terms of which the linguistic representation of the utterance is constructed, and in terms of which this linguistic representation is projected on to an interpretation in a mental model. In the research reported here we exploit the diverse properties of the argument frames associated with verbs, in order to examine the basic timing with which lexical representations are deployed at different levels of the comprehension process, and in order to assess the relative importance of these lexically based processes at each level. We begin with a brief summary of the linguistic and psycholinguistic background to this research.

Background

The psycholinguistically most influential analysis of the properties of lexical representations is still Chomsky’s (1965) treatment of the co-occurrence constraints associated with a given lexical item. In a complex analysis of how these constraints could be expressed as part of the base component of a transformational grammar, he distinguished two major types of context-sensitive subcategorisation rules. Strict subcategorisation rules specify the categorial context in which lexical entries can occur, stating, for example, that a verb like grow can be followed either by an NP, by an Adjectival form, or by nothing (in its intransitive use). Selectional rules further subcategorise lexical items in terms of the syntactic features of the items that can fill the categorial possibilities specified by the subcategorisation rules. By syntactic features, Chomsky meant properties such as [±Animate] or [±Abstract], which in later treatments have usually been classified as semantic in nature (e.g. Jackendoff, 1972). All subsequent analyses of the properties of lexical representations have preserved, in one form or another, this fundamental distinction between the categorial properties of the verb argument frame and the semantic and syntactic properties of the items that can fill these argument slots. And, certainly, to develop a psycholinguistic model of how lexical representations participate in language comprehension, it is necessary to capture in some way the processing implications of such a distinction. When a verb-form is identified, what kinds of subcategorisation and selectional constraints are made available, and when and how do they affect the processing of the subsequent input? These are questions that have received


surprisingly little attention in psycholinguistic research. Scattered attempts were made in the early literature (e.g. Downey & Hakes, 1968; Fodor, Garrett, & Bever, 1968; Hakes, 1971; Moore, 1972) to explore some of the processing consequences of Chomsky’s proposals for lexical representation, but the results of these studies were contradictory and inconclusive. Recent developments in the linguistic analysis of lexical representations make their processing implications all the more salient for an on-line theory of language comprehension. Essentially every major initiative in generative linguistics over the past decade has served to move lexical representations into a central position in determining the syntactic and semantic properties of a string; indeed, as Wasow (1985) remarks, clause structure has come to be viewed as a projection of lexical semantics. For example, in both the Government-Binding approach (Chomsky, 1981; van Riemsdijk & Williams, 1986) and in Lexical Functional Grammar (Bresnan, 1982), lexical representations determine the basic relations between elements at both syntactic and semantic levels of the theory. Government-Binding achieves this through the combined effects of the Theta-Criterion and the Projection Principle, operating on the argument structures associated with verbs, whereas Lexical Functional Grammar assumes that lexical entries have a dual specification, pairing predicate argument structures with specifications of grammatical functions. The grammatical functions associated with lexical items map on to a syntactic constituent structure, whereas the lexical predicate argument structure determines the functional structure of the string, which in turn determines its semantic properties at the appropriate level of representation. What is important about these developments is the closeness with which they link the subcategorisation properties of a lexical item with its argument structure in a semantic representation.
The argument frames associated with lexical items specify not only how these arguments might function in a purely syntactic representation of the string, but also in its semantic interpretation. And it is this semantic interpretation that determines, in turn, how the string is projected on to a discourse model. In effect, on this type of enriched account of lexical representations, the argument frames associated with verbs have consequences not simply for the linguistic analysis of an utterance, but also for the construction of an interpretation in the non-linguistic, conceptual domain (cf. Carlson & Tanenhaus, 1987). But even in this current linguistic climate of emphasis on the importance of the lexicon, the on-line processing functions of lexical representations remain relatively neglected, especially in the auditory domain. A number of studies have begun to appear looking at lexical effects in the parsing of written texts. These range from questionnaire studies (e.g. Ford, Bresnan, & Kaplan, 1982), to studies using more on-line measures (e.g. Clifton,


Frazier, & Connine, 1984; Mitchell, 1987; Mitchell & Holmes, 1985; Tanenhaus, Stowe, & Carlson, 1985). All of these studies show that lexical “preferences” (the relative salience of different subcategorisation frames associated with the same word-form) have an immediate effect on subsequent structural processing. But in the auditory domain, as far as we are aware, there is no research which has looked explicitly at the on-line recognition of verb-forms and the activation of their associated argument frames. The research which seems most relevant to the current study is some research of our own which did not directly vary lexical variables at all. This was an experiment (Marslen-Wilson & Tyler, 1980) designed to track the time-course with which different types of processing information (globally defined as syntactic and semantic) became available as listeners heard a spoken utterance. The subjects monitored for word-targets at different serial positions in three kinds of prose material: Normal Prose, Anomalous Prose, and Scrambled Prose. Anomalous Prose differed from Normal Prose in having no semantic organisation while preserving syntactic and prosodic structure. Scrambled Prose differed from both in having neither syntactic nor semantic organisation. These three types of test materials were presented either as isolated sentences, or in the context provided by a preceding lead-in sentence. The pattern of monitoring responses across word positions for the three prose materials led to a view of sentence processing in which, from the first word of an utterance, the syntactic and semantic properties of the incoming speech are immediately interpreted in the context of the current discourse model (Marslen-Wilson & Tyler, 1980; 1981).
These global claims about the properties of on-line language processing have a rather direct interpretation in the lexical domain we have been discussing and, certainly, the enriched functions that are being associated with lexical representations make it easier to see how the speech input can be interpreted as it is being heard. They imply that as a lexical entry is accessed, this immediately makes available information about its argument frame and places syntactic and semantic constraints on the types of elements that can fill these slots. These argument frames are specified in a way that has immediate consequences not only for the local syntactic relations of items in the string, but also for their thematic or functional roles. This means that the elements that fill these argument slots are assigned a value not only with respect to some putative level of syntactic organisation, but also with respect to a semantic and ultimately pragmatic interpretation. In the following section of the paper we describe an experiment designed to establish how far lexical representations do make available information or constraints that operate at these different levels, and the timing with which these come into operation.


Local Violation of Lexical Constraints

In an earlier study (Marslen-Wilson & Tyler, 1980), the availability of different types of processing information was evaluated using global distortions of complete texts, as, for example, in the overall contrast between Normal Prose and Anomalous Prose. The problem with this technique is, first, that it rules out the possibility of looking at the processing information made available by any specific lexical item and, secondly, that it makes it difficult to decide exactly what are the types of information of which a global distortion is in fact depriving the listener (cf. Cowart, 1982; Marslen-Wilson & Tyler, 1983). Marslen-Wilson & Tyler (1980) presented the contrast between Normal and Anomalous Prose as a difference in semantic interpretability, but without systematically dissociating this from pragmatic interpretability. Furthermore, as Cowart (1982) points out, Anomalous Prose may have contained violations of syntactic constraints as well as of semantic constraints. These problems are avoided by looking at the local violation of lexical constraints in otherwise normal sentence contexts. Specifically, by holding the sentence constant, and varying the properties of the verb, we can determine how and when different aspects of the verb’s argument frame affect the processing of a spoken sentence. Take as an example the following set:

1a. John carried the guitar.
1b. John buried the guitar.
1c. John drank the guitar.
1d. John slept the guitar.

These are four identical strings, all with the same monitoring target (guitar), where the only variation is in the verb, and in the argument frames associated with the verb. In so far as these variations involve properties of lexical representations that are significant for on-line processing, then they will affect the listener’s response to the monitoring target. In sentence (1a) the relationship between the verb (carried) and the target (guitar) is fully acceptable on syntactic, semantic, and pragmatic grounds. The subcategorisation requirements of the verb allow for a noun phrase as direct object, a guitar has the appropriate semantics for the action of carrying, and carrying a guitar is a perfectly reasonable activity in the context of a standard model of the world. Response times to targets in Normal contexts like (1a) form the baseline condition for the experiment. Sentences (1b) and (1c) illustrate two grades of potential violation of the lexical representations evoked by the verb. In both cases the target NP (guitar) remains categorially appropriate: the verbs are transitive and


accept a noun phrase as direct object. Sentence (1b), however, constitutes what we will label a pragmatic anomaly, and contrasts with (1c), which constitutes a semantic anomaly. This, in effect, is the distinction between the linguistic and the non-linguistic aspects of the lexical representation of a verb. The anomaly (or the “oddness”) of “John buried the guitar” cannot be part of the linguistic specification of the semantics of the lexical items involved. In fact, it is not something that is likely to be pre-stored at all. The pragmatic oddness of burying a guitar is something that we have to infer, given our knowledge of the world, and given what we know about guitars, the likely effects of burying them, and so on. The first question we ask here, therefore, is whether such pragmatic violations affect the on-line response to the monitoring target, that is, whether responses are slower than in the normal condition. This will only be the case if the response in the normal condition reflects, at least in part, the pragmatic normality of the actions or events involved. And this, in turn, can only hold if the lexical representations associated with the verb have already begun to have consequences for an interpretation in a mental model of the discourse. This requires, we should remember, not only the construction of a linguistic argument frame, realising the structural constraints on the relationship between the verb and its potential arguments, but also the projection of the thematic consequences of these relations into the non-linguistic domain. The second type of violation, in (1c), can plausibly (though not necessarily) be treated as a violation of the linguistically specifiable properties of the lexical representations associated with a given verb: a violation, in Chomskian terms, of the selection restrictions on the semantic properties of the items that can fill the argument slots made available by the verb’s subcategorisation properties.
It is plausible that the linguistic specification of drinking (that it involves liquids) and of guitars (that they are solid objects) is sufficient to make guitar anomalous following drink without having to invoke knowledge or operations outside the linguistic domain. That is why we refer to this as a semantic rather than a pragmatic violation. Our second question, therefore, is twofold: Are semantically anomalous targets responded to more slowly than normal targets, and are they also responded to more slowly than pragmatically anomalous targets? If the first of these holds, then this reflects the timing with which the semantic restrictions associated with argument frames start to become available, and begin to be integrated with the semantic properties of potential fillers for these slots. If the second of these holds, and semantic anomalies are more disruptive than pragmatic anomalies, then this tells us something about the relative significance, for the on-line analysis process, of these two types of disruptions (we will return later in the paper to the question of how far semantic and pragmatic anomalies can be kept qualitatively distinct).


Finally, we can also predict that if the original difference between Normal Prose and Anomalous Prose in Marslen-Wilson and Tyler (1980) was indeed equivalent, as originally intended, to violations of semantic co-occurrence restrictions, then the reaction time difference between Normal and Semantic targets here should be of the order of 55-60 msec. The third type of sentence, exemplified in (1d), differs from the other two in violating the subcategorisation frame associated with a given verb. A verb-form like sleep is subcategorised as an intransitive verb, and has no subcategorised argument slot into which a noun phrase like guitar can fit. This categorial violation should be highly disruptive of monitoring performance. The target noun phrase will be heard in the context of a verb argument frame to which it cannot be attached. This means that the listener cannot construct the appropriately configured structural object for projection on to any other domain of interpretation. The collocation of sleep and guitar simply has no interpretation, either semantically or pragmatically. This means that the monitoring decision that the word guitar is present can be based only on the bottom-up sensory input. If the distinction here between semantic and categorial anomalies is equivalent to the distinction in the earlier study (Marslen-Wilson & Tyler, 1980) between Anomalous and Scrambled Prose, then reaction times in the categorial condition should be on the order of 25-30 msec slower than in the semantic condition, and about 90-100 msec slower than targets in the normal sentences. In a final manipulation, over and above the four primary contrasts already described, we will investigate further the role of the discourse model in these lexical activation processes, by comparing responses to normal targets (as in 1a) when the test sentence coheres normally to its preceding discourse context as opposed to cases where there is no coherent relationship.
When the utterance containing the target word cannot be readily mapped onto the scenario established in the preceding sentence, does this slow down responses? These are not, strictly speaking, local violations, but we include them here in order to maintain comparability with the earlier study, where we did find effects of the presence or absence of a discourse context for target-positions early in the test sentence.

METHOD

Materials

The test materials consisted of 40 sets of sentence pairs. Each pair within these sets was made up of a lead-in sentence and a continuation sentence. The monitoring target always occurred in the second sentence, placed in object position, immediately following the verb. Four kinds of continuation sentences were constructed by varying the relationship between the verb and the target noun in the embedded verb-noun sequence. Within each item set the target noun was kept constant. This gave the following four anomaly conditions, together with one extra condition constructed by varying the congruence of the lead-in sentence with the continuation sentence:

1. Normal: Here the verb and the target noun form a natural sequence in standard Dutch subject-verb-object order (e.g. “The boy held the guitar”, where “guitar” is the target).
2. Pragmatic: Here the verb in combination with the target noun forms a possible, but pragmatically implausible real-world situation (e.g. “The boy buried the guitar”).
3. Semantic: Here the semantic properties of the verb are incompatible with the semantic properties of the noun (e.g. “The boy drank the guitar”).
4. Categorial: Here the verb forms a syntactically illegal combination with the target noun. This was done by choosing intransitive verbs that could not be followed by a noun in direct object position (e.g. “The boy slept the guitar”).
5. Discourse Congruence: A fifth manipulation applied to the Normal continuation sentences only. Here the lead-in sentence was varied to give either a natural or an unnatural discourse linkage with the continuation sentence.

The following example shows one complete stimulus set, with the subjects hearing either lead-in sentence (1) plus one of the four continuation sentences (a)-(d), or lead-in sentence (2) plus continuation sentence (a). Note that the original materials were in Dutch (the set given here is for illustration only):

1. The nurses walk to their work each morning.
2. Christmas falls on a Friday this year.

a. They pass the beach on their way to the hospital.
b. They measure the beach on their way to the hospital.
c. They chew the beach on their way to the hospital.
d. They yawn the beach on their way to the hospital.
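The pairing of lead-in and continuation sentences into the five test conditions can be written out as a small data structure. The sketch below is only an illustration built from the English example set; the names `LEAD_INS`, `CONTINUATIONS`, `CONDITIONS`, `Trial`, and `build_trials` are hypothetical and do not come from the original study.

```python
from dataclasses import dataclass

# English illustration set from the text; the actual materials were Dutch.
LEAD_INS = {
    "1": "The nurses walk to their work each morning.",
    "2": "Christmas falls on a Friday this year.",
}
CONTINUATIONS = {
    "a": "They pass the beach on their way to the hospital.",
    "b": "They measure the beach on their way to the hospital.",
    "c": "They chew the beach on their way to the hospital.",
    "d": "They yawn the beach on their way to the hospital.",
}

# Condition label -> (lead-in key, continuation key), as described in the text.
# The condition labels themselves are hypothetical shorthand.
CONDITIONS = {
    "normal": ("1", "a"),
    "pragmatic": ("1", "b"),
    "semantic": ("1", "c"),
    "categorial": ("1", "d"),
    "discourse_incongruent": ("2", "a"),
}


@dataclass(frozen=True)
class Trial:
    condition: str
    lead_in: str
    continuation: str
    target: str


def build_trials(target: str = "beach") -> list[Trial]:
    """Return one trial per condition for a single stimulus set."""
    return [
        Trial(name, LEAD_INS[lead], CONTINUATIONS[cont], target)
        for name, (lead, cont) in CONDITIONS.items()
    ]


if __name__ == "__main__":
    for trial in build_trials():
        print(f"{trial.condition}: {trial.lead_in} {trial.continuation}")
```

Note that the target noun is constant across all five trials; only the verb or the lead-in sentence changes.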

Pre-tests

The 40 sets of test materials were selected from a larger initial pool of 60 sets, on the basis of three types of pre-tests.

1. A predictability pre-test was run to ensure that the target nouns were not easily predictable given the discourse sentences and the various pre-target verbs. Three sentence combinations were tested: the normal condition (1a above), the discourse incongruence condition (2a above), and the pragmatic anomaly condition (1b above). A total of 36 subjects were tested in three groups, using a written cloze procedure, with items rotated across groups such that no subject had to predict the same target word twice. The subjects’ responses were scored on a 4-point scale. A scale value of 4 corresponded to responses unrelated to the target noun, 3 to responses related to the target noun, 2 to responses that were synonyms of the target noun, and 1 to cases where subjects responded with the actual target noun. Overall predictability for the 40 test sentences was very low in all three conditions, with mean scale values of 3.86 for normal, 3.91 for discourse incongruence, and 3.87 for pragmatic anomaly.

2. A subcategorisation pre-test was run to ensure that the intransitive verbs used in the verb-noun sequences did indeed exclude the possibility of inserting nouns in direct object position immediately following the verb. The same 60 sentence pairs from the predictability pre-test were used, along with 30 fillers. The fillers contained sentences that did allow for the use of direct objects. Twelve subjects read the neutral lead-in sentence followed by the (d)-continuation sentence, up to and including the intransitive verb, and were asked to write down a continuation. They were also asked to indicate on a 7-point scale whether or not they thought the sentence pair, including their own continuation, was a natural one, both with respect to grammar and meaning. Responses were checked for the occurrence of direct objects. If direct objects were given along with a natural rating, the sentence pair was removed from the stimulus pool. Very few prospective stimuli failed this test.

3. In a naturalness pre-test, we tested the naturalness of the discourse linkage between the lead-in and the continuation sentences (as well as the unnaturalness of the linkage for the discourse incongruence pairs). Two groups of 12 subjects were given 60 typed sentence pairs containing an equal number of normal (1a) and incongruent (2a) pairs. Subjects were required to give naturalness judgements on a 7-point scale, with value 1 representing “very unnatural”, and value 7 representing “very natural”. The mean rating for the final stimulus set was 6.35 for the normal pairs and 1.31 for the incongruent pairs.

Filler Materials

In order to distract subjects from the specific construction of the test sentence pairs, 80 filler items were constructed. These also consisted of a lead-in sentence and a continuation sentence. Three kinds of variations were made on the filler items:


1. Target position: The standard construction of the test continuation sentences is subject-verb-target noun. In order to prevent subjects from noticing the uniform position of the target noun, and the fact that all test targets were nouns, we varied both the word type and the word position of the filler target. Filler targets could occur either in the lead-in sentence, in the beginning of the continuation sentence, in the same position as the test sentences, or late in the continuation sentence.

2. Anomaly of target: In the test sentence pairs, three out of the five combinations have an anomalous target noun (1b, 1c, 1d). This could lead subjects to believe that every anomaly is a target. To prevent this, most of the fillers contained anomalous words which were not the monitoring targets (as well as containing non-anomalous targets).

3. Task: The experimental task for all test stimuli is identical monitoring. For the filler sentences two further tasks were used, namely rhyme monitoring and category monitoring, in order to make the test situation comparable to that used by Marslen-Wilson and Tyler (1980), and in order to reduce the possibility that the subjects would develop special listening strategies in carrying out the identical monitoring response. A total of 20 fillers were run using identical monitoring, 30 fillers using rhyme monitoring, and 30 fillers using category monitoring. This task distribution brought the total number of items presented using identical monitoring up to 60 (40 test and 20 filler items), balanced by 60 filler items on rhyme and category monitoring.

Practice Sentences

A total of 25 practice sentences were also constructed. These practice items reflected all conditions and variations used in the test and filler sentences. A further 10 filler items were constructed to serve as start-up sentences to stabilise subject responses following the breaks after the practice session and half-way through the test sequence.

Recording

The materials were recorded in a soundproofed booth, using a Revox A700 reel-to-reel tape-recorder, by a female speaker who was naive with respect to the purpose and background of the experiment. All sentences were read with a normal intonation pattern, without contrastive stress on either the target words or their preceding verbs.


Design and Procedure

A total of 40 test sentence pairs were used under five conditions, along with 80 filler items, 10 start-up fillers, and 25 practice items. The 40 test sentence pairs were rotated by conditions over five experimental versions, with each experimental version having eight items in each condition. Each target word occurred only once in each version, and each subject heard an equal number of targets in each of the five test conditions. The five experimental versions were made by cross-recording from a master tape.
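The rotation of items over versions can be illustrated with a Latin-square style assignment. This is a sketch under stated assumptions rather than the authors' actual procedure: the `(item + version) mod 5` rule, the zero-based item numbering, the condition labels, and the function names `build_versions` and `condition_counts` are all hypothetical. The rule is simply one way to satisfy the constraints given in the text (five versions, eight items per condition per version, each target once per version).

```python
from collections import Counter

# Hypothetical labels for the five test conditions.
CONDITIONS = ["normal", "incongruous", "pragmatic", "semantic", "categorial"]
N_ITEMS = 40


def build_versions(n_items=N_ITEMS, conditions=CONDITIONS):
    """Return one {item_index: condition} mapping per experimental version.

    In version v, item i is assigned condition (i + v) mod n, so across
    the n versions every item appears exactly once in every condition.
    """
    n = len(conditions)
    return [
        {item: conditions[(item + version) % n] for item in range(n_items)}
        for version in range(n)
    ]


def condition_counts(version):
    """Count how many items fall in each condition within one version."""
    return Counter(version.values())


if __name__ == "__main__":
    for number, version in enumerate(build_versions(), start=1):
        print(number, dict(condition_counts(version)))
```

With 40 items and five conditions, each version contains eight items per condition, and each item occurs once per version, which matches the design described above.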

Procedure

The 40 subjects were tested in groups of 4. They were seated facing a projection screen on to which was projected the slide specifying the monitoring task for the next trial, and the appropriate cue word. Only the “identical monitoring” task was used for the test stimuli, so that the cue word was always the target itself. Thus, if the target was “beach”, the subject would see the words “IDENTICAL” and “BEACH”. All the test materials were presented over closed-ear headphones. Each trial began with a warning tone that also triggered the projection of the relevant slide. The sentence pair followed 3 sec later. The subjects pressed a response button as soon as they detected the monitoring target. They were instructed to respond as quickly as possible, to avoid guessing in advance what the target word could be, and not to try to correct any mistakes they heard in the sentences. A complete experimental session, including instructions and practice, lasted approximately 1 hour. Subjects were given a short break half-way through the test sequence.

Subjects

A total of 40 paid student subjects participated in the experiment, of whom 20 were male and 20 female, and all were native speakers of Dutch.

RESULTS AND DISCUSSION

Eight subjects were run in each condition. There were 28 missing values due to machine and subject error. A further 26 extreme values were set to zero because they were well outside the range of the other data points in the condition in which they occurred. These 54 missing and extreme values (totalling 3.4% of the data) were replaced by the mean for the item in the condition in which they occurred. Table 1 gives the main results for the corrected data.
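The replacement step described above can be sketched as follows. This is a minimal illustration, assuming that each cell holds the latencies for one item in one condition and that missing or excluded responses are marked with `None`; the function name `replace_missing` and the example latencies are hypothetical, and the criterion used to identify extreme values is not specified in the text, so it is not modelled here. The total of 1600 responses is inferred from 40 subjects each responding to 40 test targets.

```python
from statistics import mean


def replace_missing(cell_rts):
    """Fill gaps in one item-by-condition cell with the cell mean.

    cell_rts: latencies in msec for one item in one condition, with None
    marking a missing or excluded response (a hypothetical encoding).
    """
    observed = [rt for rt in cell_rts if rt is not None]
    if not observed:
        raise ValueError("no observed latencies in this cell")
    cell_mean = mean(observed)
    return [cell_mean if rt is None else rt for rt in cell_rts]


# Share of replaced values: 28 missing + 26 extreme out of 40 x 40 responses.
REPLACED = 28 + 26
TOTAL_RESPONSES = 40 * 40  # inferred: 40 subjects x 40 test targets
REPLACED_PERCENT = 100 * REPLACED / TOTAL_RESPONSES

if __name__ == "__main__":
    # Made-up latencies, for illustration only.
    print(replace_missing([250, None, 270, 260]))
    print(f"{REPLACED_PERCENT:.1f}% of responses replaced")
```

Replacing a gap with the cell mean leaves that cell's mean unchanged, so the item means are unaffected by the correction, although the variance within the cell is reduced.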

TABLE 1
Item Means by Conditions (msec)

Normal    Incongruous    Pragmatic    Semantic    Categorial
241       235            268          291         320
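The size of each effect relative to the Normal baseline follows from Table 1 by subtraction. The sketch below assumes that the column headings and the values in the table pair up in the order given; the names `ITEM_MEANS_MSEC` and `cost` are hypothetical. It shows raw differences only and says nothing about which of them are statistically reliable.

```python
# Item means in msec from Table 1 (heading/value pairing assumed from order).
ITEM_MEANS_MSEC = {
    "normal": 241,
    "incongruous": 235,
    "pragmatic": 268,
    "semantic": 291,
    "categorial": 320,
}


def cost(condition, baseline="normal"):
    """Difference in mean latency (msec) between a condition and a baseline."""
    return ITEM_MEANS_MSEC[condition] - ITEM_MEANS_MSEC[baseline]


if __name__ == "__main__":
    for name in ("incongruous", "pragmatic", "semantic", "categorial"):
        print(f"{name} - normal: {cost(name):+d} msec")
    print(f"categorial - semantic: {cost('categorial', 'semantic'):+d} msec")
```

On these means, semantic targets are 50 msec slower than normal targets (against a predicted 55-60 msec), categorial targets are 29 msec slower than semantic targets (predicted 25-30 msec) and 79 msec slower than normal targets (predicted about 90-100 msec), and pragmatic targets are 27 msec slower than normal targets.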

Two separate one-way analyses of variance were computed, on subjects and on items. The main effect of conditions was highly significant (min F’(4,46)=16.678, P