
Copyright 1983 by the American Psychological Association, Inc. 0033-295X/83/9001-0094$00.75

Psychological Review 1983, Vol. 90, No. 1, 94-101

Theoretical Note

What's New in Speech Perception? The Research and Ideas of William Chandler Bagley, 1874-1946

Ronald A. Cole and Alexander I. Rudnicky
Department of Computer Science, Carnegie-Mellon University

At the turn of the century, William Chandler Bagley published the results of a 2-year investigation into the relation between sound and meaning in human speech perception. Using Edison phonograph cylinders, Bagley presented subjects with spoken words, either individually or in sentences, that had been pronounced with a missing consonant sound. The subjects, who were instructed to report only what they had heard, often restored words to their original form; that is, they heard the words as if they had been spoken correctly. Restorations were determined by the position of the missing sound in the word and the position of the word in the sentence. The pattern of results observed by Bagley and his conclusions about human speech perception find remarkable parallels in contemporary psycholinguistics. For example, Bagley explains his results in terms of the critical role of context in speech perception and the sequential use of sound in spoken-word recognition. The present article describes Bagley's research and compares some of the main results to those obtained in more recent experiments. It is concluded that many of the most important insights about spoken-word recognition were first offered by Bagley in 1900-1901.

At the end of the 19th century, William Chandler Bagley,¹ a student of Titchener's at Cornell University, conducted what is almost certainly the first large-scale experimental investigation of the psychology of language. The article reporting this research, "The Apperception of the Spoken Sentence: A Study in the Psychology of Language," appeared in the 1900-1901 volume of The American Journal of Psychology. While reading this article, we were fascinated to discover that the results reported by Bagley anticipate some of the most interesting phenomena

We would like to thank Zinny Bond of Ohio University for first bringing our attention to W. C. Bagley's paper. Requests for reprints should be sent to Ronald A. Cole, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213.

¹ William Chandler Bagley was born on March 15, 1874, in Detroit, Michigan. He died on July 1, 1946, at the age of 72. Bagley's early life was spent in Michigan and Massachusetts; in 1895 he obtained a bachelor's degree from the (now) Michigan State University and went on to teach in a rural school. In 1898 he obtained a master's degree from the University of Wisconsin and subsequently went to Cornell University, where he obtained a doctorate in "psychology, neurology, and education" in 1900. Thereafter, he went into education, serving as a school principal, a teacher at a normal school, and a school-system superintendent. In 1907 he began teaching at the (then) State Normal School in Oswego, New York; soon thereafter he became professor of education and director of the School of Education at the University of Illinois. In 1917, he became professor of education at Teachers College, Columbia University, where he remained until his retirement in 1940. Bagley authored a dozen books on educational topics as well as a series of textbooks on American history. He is remembered chiefly for championing the equality of educational opportunity and for his vigorous opposition to racist theories of intelligence and educability. He was also known for his emphasis on educational excellence and intellectual discipline, a concern that was expressed in his involvement in the Essentialist movement, a back-to-basics movement of the late 1930s. As far as we could determine, Bagley never returned to his interest in word perception after leaving Cornell University. Judging from his accomplishments in education, we must conclude that it was a loss for psychology. (The information in this biographical sketch was compiled from several sources. The bibliography given in Brickman, 1978, can serve as a point of departure for those wishing to know more about Bagley.)



reported in the speech-perception literature over half a century later.² This article describes some of Bagley's experimental findings and compares them to results obtained in more recent experiments. Bagley describes his research as "an attempt to determine the nature and relations of the factors which are involved in the perception of spoken symbols and in the apperception of their meaning" (p. 82). Two experiments are reported. Experiment 1 is an empirical investigation of the effect of context on word recognition, whereas Experiment 2 uses introspection and protocol analysis to investigate the conscious processes by which sounds are perceived as meaningful symbols. This article deals mainly with Experiment 1.

Bagley's Experiment

The experimental procedure involved the presentation of "mutilated" words—words that were spoken with a missing consonant sound. According to Bagley, "our method involved the elision of consonants, and a determination of the accompanying effect of the word upon the observer" (p. 87). The mutilated words were recorded on Edison phonograph cylinders and presented either individually or in sentences to subjects who were instructed to repeat what they had heard. The mutilated words consisted of (a) monosyllabic English words pronounced without an initial consonant segment, (b) monosyllabic English words pronounced without a final consonant segment, and (c) a group of predominantly polysyllabic words spoken without a medial consonant segment.

In order to control for the effect of vowel environment on the deletion of consonants, Bagley "attempted to combine every consonant with every vowel; i.e., to represent each consonant in every series by as many different words as there are different vowel sounds" (p. 88). For example, the consonant /m/ was omitted in the medial position from the following words: a(m)iable, sla(mm)ing, e(m)otional, ele(m)ents, gli(mm)ering, rhy(m)ing, mo(m)ent, no(m)ination, loo(m)ing, calu(mn)y, and wo(m)an.
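Bagley's parenthetical notation marks the consonant segment that was omitted from each word. As an aside for the modern reader, the notation can be unpacked mechanically; the helper below is our own illustration, not anything in Bagley's procedure:

```python
import re

def parse_stimulus(notated: str) -> tuple[str, str]:
    """Return (intact word, mutilated word) from "wo(m)an"-style notation."""
    intact = re.sub(r"[()]", "", notated)        # drop the parentheses, keep the segment
    mutilated = re.sub(r"\(.*?\)", "", notated)  # delete the parenthesized segment
    return intact, mutilated

print(parse_stimulus("wo(m)an"))     # ('woman', 'woan')
print(parse_stimulus("sla(mm)ing"))  # ('slamming', 'slaing')
```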
The words and sentences were recorded in Bagley's voice. Words spoken in isolation were recorded on one set of cylinders and words in sentence contexts were recorded on a second set. "Care was taken to speak the words distinctly, but in general, the emphasis and inflection were those of ordinary conversation" (p. 92). The words were presented (a) without context, (b) when preceded by one or two unrelated words, (c) at the beginning of a sentence, (d) in the middle of a sentence, or


(e) at the end of a sentence. Bagley controlled for possible repetition effects (caused by repeating the same word in different conditions) "by permitting a relatively long interval of time—two to four months—to elapse between each repetition of the word in the various connections" (p. 93).

To summarize, Bagley's subjects were presented with words in which consonants were deleted from initial, medial, or final positions. These words were presented either in isolation, with related words, or in the beginning, middle, or end of a complete sentence. The words were presented to eight members of the Department of Psychology at Cornell University over a period of 2 years. For words presented in isolation, subjects were instructed to "listen to the word as reproduced by the phonograph, and to repeat it to the operator, who recorded the judgement. Occasional unmutilated words were inserted as checks" (p. 92). For isolated words presented with context, "two seconds before the word was reproduced by the phonograph, the operator spoke two words which stood in the relation of context to the mutilated word" (p. 92). In this condition, only polysyllabic words were presented as the mutilated words, and in each case a consonant sound was omitted from the middle of the word to produce the mutilation. For words in sentences, the subjects were told to listen to the entire sentence and repeat it to the experimenter. Note that subjects were not instructed to guess the identity of mutilated words in any condition, only to report what they heard.

Results

Although subjects were instructed to repeat words and sentences "as reproduced by the phonograph," mutilated words were often restored to their original form. Restorations were influenced by the position of the omitted sound within a word, the number of syllables within the word, and the position of the mutilated word in a sentence.³

² In the past 30 years, research in acoustic phonetics has produced important insights into the mechanisms underlying speech perception. This body of research is certainly an appropriate answer to the question, "What is new in speech perception?" Our focus in the present article is on results at the word and sentence level—levels traditionally outside the scope of acoustic phonetic research. Our claims about what is new in speech perception should be taken with this caveat in mind.

³ Bagley also observed that restorations were influenced by the class of sound that was omitted. The semivowels /w/, /r/, and /l/ were restored an average of 72%



of the time, whereas all other sound classes were restored about 50% of the time (see Table 9 of Bagley's article). Bagley's analysis of restorations and substitutions for individual speech sounds did not produce any unusual insights, and at any rate, it would be difficult to interpret these results without a detailed acoustic description of the stimuli. Nevertheless, it is interesting to note that only recently have attempts been made to examine the perceptibility of phonetic features in natural continuous speech (Cole, Jakimik, & Cooper, 1978). The task used in these studies—detection of mispronunciations—is quite similar to the experimental task originally designed by Bagley.

Words in isolation. When mutilated words were presented in isolation, the original word was restored 23% of the time. Medial consonants were restored 42% of the time compared to only 3.5% and 13% restorations, respectively, for initial and final consonants. Bagley attributes this large difference to the fact that medial consonants were omitted from polysyllabic words, whereas initial and final consonants were omitted from monosyllabic words.

In order to examine the effect of prior semantic context on perception of mutilated words, Bagley compared restorations for polysyllabic words that were presented both in isolation and when preceded by semantically related words. Words in context were restored almost twice as often as words presented without context. "When mutilated words are given with a minimum of context, the chances for their correct perception are increased by 82% as compared with their chances of correct perception when given without context" (p. 96). Bagley also reports that words in context sounded less mutilated than words presented in isolation: "The fact of mutilation is readily noticed in the single words given without context, even though the word be finally correctly perceived; the elision is not so readily noted when the word is given with a minimum of context" (p. 96). As we will see below, words with elisions were often completely restored in sentences—a phenomenon that has been reported by Warren (1970) and others.

Words in sentences. Words presented in a sentence context were restored to their original form more often than words presented in isolation. Monosyllabic words presented in isolation were restored to their original form 8% of the time, whereas monosyllabic words in sentence contexts averaged 80% restorations. Polysyllabic words with medial elisions were restored 42% of the time when presented in isolation, whereas polysyllabic words in sentence contexts averaged 70% restorations.
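A note on the arithmetic: Bagley's "82%" is a relative, not an absolute, increase. Taking the 42% isolation figure as the baseline, the implied with-context restoration rate is roughly 76% (a back-calculation of ours, not a figure Bagley reports directly):

```python
def relative_increase(baseline: float, new: float) -> float:
    """Proportional change from baseline: (new - baseline) / baseline."""
    return (new - baseline) / baseline

# 0.42 restoration rate in isolation; ~0.76 with minimal context
# (the second value is inferred from Bagley's quoted 82% increase).
print(round(relative_increase(0.42, 0.764), 2))  # 0.82, i.e., +82%
```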

Restorations were also influenced by the position of a word in a sentence. Across all words, restorations were 52%, 80%, and 85%, respectively, in the beginning, middle, or end of a sentence.

Relation to Subsequent Research

Associative Supplementing and Phonemic Restorations

On the basis of introspective reports provided by his subjects, Bagley identified two mechanisms involved in the restoration of mutilated words: "associative supplementing" and "contextual supplementing." In associative supplementing, the mutilated word is perceptually restored to its correct form and the mutilation goes unnoticed. In contextual supplementing, the mutilation is noticed, and context is used to infer the correct word.

Associative supplementing appears to be another name for the process involved in phonemic restorations (Warren, 1970; Warren & Obusek, 1971). Warren replaced the first [s] in legislatures with a coughing sound of the same intensity in the sentence, "The state governors met with their respective legislatures in the capital city." Even though subjects were informed in advance that the cough completely replaced one or more speech sounds, 95% of the subjects reported that the cough did not replace any sound; it was heard as background noise. Moreover, listeners could not accurately locate the position of the cough in the sentence. Thus, the missing speech sound was restored from context. Warren and Obusek report that "speech sounds synthesized through phonemic restorations cannot be distinguished from those physically present" (p. 360).

Semantic Priming

As just noted, Bagley observed an effect of prior semantic context on spoken-word recognition—an increase in restoration of mutilated words that were preceded by associated words. In the past two decades there have been numerous demonstrations that prior semantic context facilitates spoken-word recognition.
For example, Morton and Long (1976) observed faster reaction times to target phonemes in predictable words than to target phonemes in unpredictable words. They argued that in order for this result to obtain, subjects must first recognize the word containing the target sound and then respond to its segmental structure—in other words, the target phoneme is generated (restored) as a result of recognition. Cole and Jakimik (1980) and Cole and Perfetti (1980) observed faster reaction times to mispronunciations in predictable words than in less predictable
words. This result was observed even when predictability was determined solely by the word just before the mispronunciation (e.g., mink poat versus pink poat)—the condition used by Bagley. For visually presented words, evidence for semantic priming has been reported by Meyer and Schvaneveldt (1976) using a lexical-decision task.

Lexical Constraint

In addition to effects of prior context, Bagley reported two results that provide evidence for lexical constraint—constraint provided by the intact sounds within the mutilated word. These were (a) more restorations in polysyllabic words than monosyllabic words and (b) more restorations in word-final position than word-initial position.

Perception of polysyllabic words. Bagley concludes that "polysyllabic words when mutilated are more easily recognized than monosyllabic words under the same conditions" (p. 97). Over 50 years later, Hirsh, Reynolds, and Joseph (1954) reached essentially the same conclusion after examining the intelligibility of different speech materials under various conditions of noise: "The intelligibility of meaningful words is a direct function of the number of syllables per word" (p. 538). Warren and Sherman (1974) and Samuel (1981) have demonstrated effects of lexical context on phonemic restorations. Warren and Sherman presented words like deli*ery and deli*eration (where * indicates a speech sound replaced by noise) in contextually neutral sentences and observed that the portion of a word following a replaced segment was sufficient to produce perceptual restoration of the segment. Using an improved methodology, Samuel (1981) also found an increase in restorations as a function of the number of syllables in a word. In this experiment, subjects were required to decide if part of a word had been replaced by noise (the traditional case in phonemic-restoration experiments) or if noise had been added to the word.
The rationale behind this approach is that if phonemic restoration occurs, the subject should be unable to discriminate between the two types of stimuli. Samuel observed an increase in restorations (i.e., a decrease in discriminability) in three- and four-syllable words compared to two-syllable words. In addition, subjects were less willing to report that a sound was missing in longer words. Thus, missing sounds are more difficult to detect as missing in longer words and are accompanied by an increased impression that the sound is actually there.

Perception of word-initial versus word-final sounds. Bagley's finding that more restorations occur in word-final position than word-initial position is also consistent with results obtained in recent experiments. For example, Marslen-Wilson and Welsh (1978) found that while shadowing speech (repeating it aloud while listening to it), subjects were more likely to restore mispronounced words to their original form if the mispronunciation occurred in the third syllable of the word rather than the first syllable. Cole, Jakimik, and Cooper (1978) found that word-initial mispronunciations in one-syllable words were detected about twice as often as the identical word-final mispronunciations. In effect, the failure to detect a mispronunciation is evidence that the listener has restored the input to its correct form. Thus, Bagley's finding that listeners restore more word-final sounds is equivalent to Cole et al.'s (1978) finding that mispronunciations are more difficult to detect in word-final position.

Sentential Constraint

Bagley found that (a) words heard in a sentence context are recognized more accurately than words presented in isolation and (b) words occurring later in a sentence are more likely to be restored than words occurring earlier in a sentence. Both of these results have received confirmation. For example, Miller, Heise, and Lichten (1951) found that, at 0 dB signal-to-noise ratio, words in sentences are about 20% more intelligible than words in isolation. Marslen-Wilson and Tyler (1975, 1980) found that reaction times to target words become progressively faster throughout the course of a sentence, and Tyler and Marslen-Wilson (1981) obtained similar results with children as young as 5 years of age. Marslen-Wilson and Welsh (1978) found that subjects who shadowed speech that contained occasional mispronunciations were more likely to restore mispronounced words to their original form when they were predictable from prior context.
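Samuel's replaced-versus-added task, described earlier, treats restoration as a failure of discrimination, which is naturally summarized by the signal-detection index d'. The sketch below uses entirely hypothetical hit and false-alarm rates (Samuel reports discriminability, not these particular numbers):

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Discriminability index: z(hit rate) minus z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical rates in the spirit of Samuel's task: a "hit" is correctly
# judging that noise was ADDED to an intact word; a "false alarm" is
# calling a noise-REPLACED word intact-plus-noise (a restoration).
two_syllable = d_prime(hit_rate=0.85, fa_rate=0.20)
four_syllable = d_prime(hit_rate=0.80, fa_rate=0.45)

# Lower d' for the longer word means more restoration: listeners are
# less able to tell "replaced" from "added".
print(two_syllable > four_syllable)  # True
```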
If we assume that a word near the end of a sentence is generally more predictable than a word near the beginning of a sentence, then Bagley's results can be viewed as equivalent to those reported by Marslen-Wilson and Welsh.

Relation to Subsequent Theory

Role of Context in Speech Perception

In the introduction to his article, Bagley reviews the experimental work in visual word recognition and reading and concludes that "above all else, this work upon visual perception bears overwhelming testimony to the significance which 'context' has for the perception of symbols which



appeal to the eye" (p. 86). Bagley views his own research as an investigation of the effects of context on speech perception: "It was the primary object of the present study to determine whether a similar condition obtains in the case of symbols appealing to the ear" (p. 86). The major conclusion that Bagley draws from his study is summarized in the following quotations:

    From the series of mutilated words that were given, now singly, now with a minimum of context, now at the beginning, now in the middle, and now at the end of a complete sentence, it is evident that the "setting" of a word is the determining factor in its apperception. (p. 102)

    The temporal position of a mutilated word and the succession of contextual elements with which it is given, determines the amount of injury which the word as a unit of auditory perception sustains through mutilation. (p. 98)

The overwhelming importance of context in speech perception has been reiterated by theorists throughout the past three decades (e.g., Cole & Jakimik, 1978; Forster, 1976; Lieberman, 1963; Marslen-Wilson & Tyler, 1980; Miller, 1962; Miller et al., 1951; Miller & Isard, 1963; Morton, 1969; Pollack & Pickett, 1963). Despite this general agreement on the importance of linguistic and real-world knowledge during speech perception, the nature of this interaction is still a matter of debate. Some have argued that sound and knowledge interact during word recognition, whereas others have argued that context effects are the result of processing that occurs after words are recognized.

Morton (1969) and Marslen-Wilson and his colleagues (e.g., Marslen-Wilson & Tyler, 1980; Marslen-Wilson & Welsh, 1978) support an "interactive" view of spoken-word recognition. This view holds that sound and knowledge interact during recognition. Morton (1969) explicitly modeled this interaction through the activity of a set of counting devices called "logogens." Logogens accept input both from contextual sources (e.g., a preceding set of words) and from the physical stimulus and fire (produce recognition) whenever a threshold is exceeded. Morton thus assumed that stimulus information and contextual information interact during word recognition. More recently, Marslen-Wilson and his colleagues have argued strongly that sound and knowledge interact during the earliest stages of spoken-word recognition.

An alternative position is taken by Forster (1976), Foss and Blank (1980), and Cairns, Cowart, and Jablon (1981). They argue for an "autonomous" model of spoken-language comprehension in which dedicated subprocessors independently effect lexical access, syntactic analysis, and the conceptual interpretation of a sentence. In this view, lexical access is based solely on the acoustic-phonetic properties of the input. Similarly, syntactic analysis is based exclusively on the syntactic properties of the input. A "message processor" (Cairns et al., 1981) or "general problem solver" (Forster, 1979) takes input from the lexical and syntactic modules, adds information based on inferences and real-world knowledge, and constructs a semantic interpretation of the utterance. A main difference between the two views is the locus of context effects: The autonomous model assumes that effects of semantic context occur after lexical access, whereas the interactive model assumes that semantic context is an integral part of the word-recognition process.

To summarize, there is still disagreement on just how context works. This disagreement centers on the nature of the processing system that produces context effects and the locus of these effects. All seem to agree, however, with Bagley's contention that a word's context is a determining factor in its perception.

Primacy of Word-Initial Information

On the basis of his finding that deletion of a word-initial consonant affects perception more than deletion of a word-final consonant, Bagley concludes that initial consonants play a primary or "determining" role in perception. A number of investigators have recently suggested that word-initial sounds play a primary role in word recognition from fluent speech. For example, in their model of word recognition, Marslen-Wilson and Welsh (1978) attribute a primary role to word-initial sounds in lexical access. They assume that word-initial sounds are used to access candidates for recognition.
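Morton's logogen, described above, can be caricatured in a few lines of code: a counter for one word that sums contextual and stimulus evidence and fires at threshold. The threshold and evidence amounts below are illustrative placeholders, not parameters from Morton (1969):

```python
THRESHOLD = 5.0

class Logogen:
    """A counter for one word; evidence from any source is simply summed."""
    def __init__(self, word: str):
        self.word = word
        self.count = 0.0

    def add_evidence(self, amount: float) -> bool:
        """Accumulate evidence; return True once the unit has fired."""
        self.count += amount
        return self.count >= THRESHOLD

doctor = Logogen("doctor")
# Contextual input (say, having just heard "nurse") moves the unit
# part of the way to threshold before any acoustic input arrives...
print(doctor.add_evidence(2.0))  # False: context alone does not fire it
# ...so less stimulus evidence is then needed for recognition.
print(doctor.add_evidence(3.5))  # True: context plus partial input suffice
```

The design choice that makes the model "interactive" is visible in the code: contextual and acoustic evidence feed the same counter, so context directly shifts how much stimulus information is required.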
Cole and Jakimik (1978) have suggested that certain word-initial segments, such as stop consonants in prestressed word-initial position, serve as perceptual anchors because they provide the most direct and reliable information about a word's identity. They point out that word-initial sounds are least likely to be affected by phonological variation and that segment durations are longest for these sounds (Umeda, 1977). On the basis of results of listening-for-mispronunciation experiments with both children and adults, Cole (1981) has suggested that listeners pay special attention to beginnings of words.

Sequential Use of Sound in Word Recognition

According to Bagley, restorations occur more often in word-final position than word-initial position because of associative supplementing, a process by which listeners supplement or restore the information at the end of a word using the context provided by the beginning of the word. In Bagley's terms, this

    . . . can be interpreted as analogous to the influence of context; for since the initial element possesses the greatest significance for perception, it is reasonable to suppose that the mid and final elements lose significance through the associative supplementing of the preceding elements. (p. 99)

Note the similarity of this explanation to that used by Marslen-Wilson and Welsh (1978) to explain why subjects are more likely to restore word-final mispronunciations than word-initial mispronunciations while shadowing speech:

    The lexical constraint (third syllable) effects follow as a result of successful word-identification. Given the length of the words used, and other evidence about the time-course of word-recognition, the selection process would normally be completed before the third syllable of the mispronounced words had been heard. Clearly, once a single word-choice has emerged, the recognition system will have achieved its primary goal, and a less detailed assessment of the remaining input for that word will be required. This will have the effect of making the system less sensitive to deviations that occur after the point of identification. (p. 57)
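The sequential-elimination process in this quotation is easy to make concrete: track the cohort of words compatible with the input so far, and call the word recognized once the cohort shrinks to one. The toy lexicon below is ours, purely for illustration:

```python
def recognition_point(word: str, lexicon: set[str]) -> int:
    """1-based position at which `word` diverges from all other lexicon
    words sharing its initial segments; len(word) if it never does."""
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        cohort = {w for w in lexicon if w.startswith(prefix)}
        if cohort == {word}:
            return i
    return len(word)

lexicon = {"elephant", "elegant", "element", "elbow"}
print(recognition_point("elbow", lexicon))    # 3: "elb" rules out the rest
print(recognition_point("elegant", lexicon))  # 4: "eleg" is unique
```

Deviations after the recognition point (the third and fourth segments here) can be ignored by such a system, which is exactly the insensitivity to late mispronunciations that Marslen-Wilson and Welsh describe.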

The notion of sequential lexical constraint during spoken-word recognition is a central concept in models of speech perception proposed by Marslen-Wilson and Welsh (1978) and Cole and Jakimik (1978, 1980). According to Marslen-Wilson and Welsh, word-initial information is used to generate a set of candidates for recognition. As the listener continues to process the input, candidates are eliminated from consideration until a single candidate remains, at which point recognition is assumed to occur. Using this model, it is possible to predict the point within a word at which it should be recognized—this is defined as the point at which the word diverges from all other words in the language that share the same initial sounds. Experiments that have tested this prediction have produced strong support for the model (e.g., Grosjean, 1980; Jakimik, 1980; Marslen-Wilson & Tyler, 1980). Thus, the notion of sequential lexical constraint offered by Bagley has recently been incorporated in models of spoken-word recognition and has received support from a number of different experiments.

Conclusion

As we read Bagley's article, we were initially delighted, surprised, and a bit overwhelmed. For


the most part, we were delighted and impressed by the magnitude of the enterprise reported. We were surprised by the remarkable correspondence between his results and those reported in recent years. When we think about Bagley the man, his sense of perspective, and his insight, we are awed. A single individual, working for a year or two on a dissertation, managed to anticipate many of the major empirical and theoretical results of the last 2 decades of research in psycholinguistics. Because of this, Bagley's work should be regarded as an important piece of research and as a highly instructive historical event, not just a historical curiosity.

Although Bagley's article was a pleasure to read, it also forced us to consider a serious and disturbing question: What really has been accomplished in the past 80 years? What do we know about speech perception in 1982 that was not reported in 1900-1901? After considering this question for the past year, we offer the following opinion: Most of the important facts about spoken-word recognition were catalogued by Bagley in 1900-1901; subsequent research has added little to this basic catalogue. In terms of identifying new phenomena, or extending our understanding of the fundamental mechanisms underlying spoken-language comprehension, precious little has come to light since Bagley's time.

In terms of our ability to describe and place phenomena, we have progressed. Contemporary work is a great deal more reliable. Whereas Bagley had only his wits and an Edison wax-drum phonograph, today's researcher has at his or her disposal a variety of tools: tape recorders, computers, word-frequency lists, and sophisticated statistics, to name but a few. These tools produce reliable results: A variety of extraneous factors have been accounted for and alternative explanations have been eliminated. We are much more certain of the context effects than Bagley could ever be.

How could a single individual anticipate so much of contemporary work?
For this to make sense, we must either assume that William Bagley was a genius (forgotten by psychology perhaps, but a genius nonetheless) or else that his findings represent the most obvious facts about spoken-language comprehension. If we accept the latter, we must conclude that current research, despite its methodological and theoretical sophistication, belabors the obvious. That is, we have quantified the principal problems of speech understanding but have failed to go beyond these to address more difficult issues. Thus, in 1982 we are confident that listeners use various types of context to recognize speech, but we know little more about the mechanisms of perception than Bagley did in 1900-1901.

The duplication of Bagley's work points to another disturbing aspect of contemporary psychology—its remarkably ahistoric character. Contemporary work makes little reference to work that was done in the last generation, let alone the 19th century. To be sure, certain subfields of psychology pay homage to one or two historical figures. But except for a few classic studies, familiar for the most part only through secondary sources, little active use is made of early psychological research on language. A computer search of the psychology journals from 1964 to the present failed to reveal a single citation of Bagley's work. The lesson is clear: Unless important papers are cited, the research described in these papers is likely to be lost to generations of researchers.

Part of the difficulty lies in our unfamiliarity with the methodological, theoretical, and social context of early psychological research. A paper written by a structuralist such as Bagley is difficult for the modern reader to understand: The terminology is arcane, the conception is unfamiliar, and the matrix of related research is lacking. Despite the relative inaccessibility of their work, many early psychologists devoted a great deal of effort to issues that are still of interest today. It seems unfortunate that their contributions remain for the most part untapped.

Perhaps the most disturbing thought of all is that there is nothing new to learn about speech perception. But this is surely false. We need only look at our attempts to build computer speech-recognition systems to understand that we do not know so much after all and that there is still much research to be done.
Despite the best attempts of computer scientists, engineers, and mathematicians working in collaboration with psychologists, phoneticians, and linguists, and despite massive funding and effort both in universities and in industry, no one has been able to build a computer system that can recognize everyday speech. In fact, on any meaningful measure, man is orders of magnitude better than the most powerful speechunderstanding system. For example, we are able to recognize conversational speech produced by an unfamiliar speaker in a noisy environment. No computer system today can perform with acceptable accuracy under these conditions. Even after training with a single speaker, computer systems are unable to perform at better than about 80% accuracy with acoustically confusable vocabularies, such as the letters B, D, E, P, T, G, V, Z, andC. If we have learned a lesson from Bagley's work, it is that the basic phenomena of spoken-language
comprehension have been well documented. Effects of lexical and sentential constraint on spoken-language comprehension were established empirically as early as 1900-1901 and have been replicated using a large number of (often ingenious) experimental techniques during the past 82 years. In our opinion, we must now proceed beyond a description of these phenomena to a deeper understanding of the mechanisms underlying context effects. Thus, there is still work to be done. We have much to learn about the nature of the information in the speech wave; how we process it; and how we combine the information in the stimulus with our knowledge of speech, language, and the world. We are encouraged by recent attempts to describe the segment-by-segment processing of speech during word recognition, by tests of this model, and by the healthy skepticism with which most of our colleagues greet the model. We will anxiously open our journals hoping for new insights. But we will be less than impressed by new demonstrations that listeners use context to understand speech.

References

Bagley, W. C. The apperception of the spoken sentence: A study in the psychology of language. American Journal of Psychology, 1900-1901, 12, 80-130.
Brickman, W. W. William Chandler Bagley. In J. F. Ohles (Ed.), Biographical dictionary of American educators (Vol. 1). Westport, Conn.: Greenwood Press, 1978.
Cairns, H. S., Cowart, W., & Jablon, A. D. Effects of prior context upon the integration of lexical information during sentence processing. Journal of Verbal Learning and Verbal Behavior, 1981, 20, 445-453.
Cole, R. A. Perception of fluent speech by children and adults. In H. Winitz (Ed.), First and second language learning. New York: New York Academy of Sciences, 1981.
Cole, R. A., & Jakimik, J. Understanding speech: How words are heard. In G. Underwood (Ed.), Information processing strategies. London: Academic Press, 1978.
Cole, R. A., & Jakimik, J. A model of speech perception. In R. Cole (Ed.), Perception and production of fluent speech. Hillsdale, N.J.: Erlbaum, 1980.
Cole, R. A., Jakimik, J., & Cooper, W. Perceptibility of phonetic features in fluent speech. Journal of the Acoustical Society of America, 1978, 64, 44-56.
Cole, R. A., & Perfetti, C. Listening for mispronunciations in a children's story: The use of context by children and adults. Journal of Verbal Learning and Verbal Behavior, 1980, 19, 297-315.
Forster, K. Accessing the mental lexicon. In R. Wales & E. Walker (Eds.), New approaches to language mechanisms. Amsterdam: North-Holland, 1976.
Foss, D. J., & Blank, M. S. Identifying the speech codes. Cognitive Psychology, 1980, 12, 1-31.
Grosjean, F. Spoken word recognition processes and the gating paradigm. Perception & Psychophysics, 1980, 28, 267-283.
Hirsh, I. J., Reynolds, E. G., & Joseph, M. Intelligibility of different speech materials. Journal of the Acoustical Society of America, 1954, 26, 530-538.
Jakimik, J. The interaction of sound and knowledge in word recognition from fluent speech. Unpublished doctoral dissertation, Carnegie-Mellon University, 1980.
Lieberman, P. Some effects of semantic and grammatical context on the production and perception of speech. Language and Speech, 1963, 6, 172-187.
Marslen-Wilson, W. D., & Tyler, L. K. Processing structure of sentence perception. Nature, 1975, 257, 784-786.
Marslen-Wilson, W., & Tyler, L. K. The temporal structure of spoken language understanding. Cognition, 1980, 8, 1-71.
Marslen-Wilson, W., & Welsh, A. Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology, 1978, 10, 29-63.
Meyer, D. E., & Schvaneveldt, R. W. Meaning, memory structure, and mental processes. In C. N. Cofer (Ed.), Structure of human memory. San Francisco: Freeman, 1976.
Miller, G. A. Some psychological studies of grammar. American Psychologist, 1962, 17, 748-762.
Miller, G. A., Heise, G., & Lichten, W. The intelligibility of speech as a function of the context of the test materials. Journal of Experimental Psychology, 1951, 41, 329-335.
Miller, G. A., & Isard, S. Some perceptual consequences of linguistic rules. Journal of Verbal Learning and Verbal Behavior, 1963, 2, 217-228.
Morton, J. Interaction of information in word recognition. Psychological Review, 1969, 76, 165-178.
Morton, J., & Long, J. Effect of word transitional probability on phoneme identification. Journal of Verbal Learning and Verbal Behavior, 1976, 15, 43-51.
Pollack, I., & Pickett, J. M. The intelligibility of excerpts from conversational speech. Language and Speech, 1963, 6, 165-171.
Samuel, A. Phonemic restoration: Insights from a new methodology. Journal of Experimental Psychology: General, 1981, 110, 474-494.
Tyler, L. K., & Marslen-Wilson, W. D. Children's processing of spoken language. Journal of Verbal Learning and Verbal Behavior, 1981, 20, 400-416.
Umeda, N. Consonant duration in American English. Journal of the Acoustical Society of America, 1977, 61, 846-858.
Warren, R. M. Perceptual restoration of missing speech sounds. Science, 1970, 167, 393-395.
Warren, R. M., & Obusek, D. J. Speech perception and phonemic restorations. Perception & Psychophysics, 1971, 9, 358-362.
Warren, R. M., & Sherman, G. Phonemic restorations based on subsequent context. Perception & Psychophysics, 1974, 16, 150-156.

Received February 28, 1982
Revision received July 1, 1982
