The Role of Stress and Position in Determining First Words

LANGUAGE ACQUISITION, 2(3), 189-220. Copyright © 1992, Lawrence Erlbaum Associates, Inc.

Catharine H. Echols
Department of Psychology, University of Texas

Elissa L. Newport
Department of Psychology, University of Rochester

The possibility that perceptual predispositions may assist young language learners in the initial identification of words in speech was investigated in a corpus of early words. The specific predispositions investigated included tendencies to attend to and extract stressed and final syllables. A total of 616 productions with multisyllabic adult targets were collected from three children at the one-word stage of language acquisition. These utterances were phonetically transcribed and coded, in relation to the adult target word, for omissions and accuracy of syllables. Results provided support for two predictions: Syllables that were stressed or final in the adult target word were (a) omitted much less frequently and (b) produced more accurately than were unstressed, nonfinal syllables. These results are consistent with the view that syllables that are stressed or final in adult speech are particularly salient to young children and, consequently, are particularly likely to be extracted and included in first productions.

1. INTRODUCTION

Requests for reprints should be sent to Catharine H. Echols, Department of Psychology, Mezes Hall 330, The University of Texas at Austin, Austin, TX 78712.

The question of how children initially extract and represent approximately word-sized units from the stream of adult speech would seem to be one of the most fundamental questions of language acquisition. Unless the child can identify word-level units, virtually all further linguistic accomplishments will be unattainable. The problem is also far more difficult than it may initially appear. Contrary to adult intuitions, speech rarely contains pauses between words, and there are generally few, if any, highly consistent cues to boundaries between words (Cole and Jakimik (1980), Hayes and Clark (1970)). Yet children's early productions, although sometimes smaller or larger than adult words, do tend to approximate word-level units. The child's success at the task of identifying words in the stream of speech suggests that the task must be constrained in some way.

One mechanism by which the required segmentation process could be achieved would be for perceptual biases to direct the child's attention toward particular elements of the speech stream. Although there appear to be no entirely consistent cues to boundaries between words, properties of speech, such as stress, intonation, or rhythm, could make an element of speech, such as a syllable, particularly salient (Gleitman, Gleitman, Landau, and Wanner (1988), Gleitman and Wanner (1982), Peters (1981; 1983)). Biases to attend to and extract such salient syllables could provide an entry point for the task of identifying words in speech.¹ Although not always coextensive with adult words, the units thus extracted would map in a systematic way to the unit word in adult speech.

Salient syllables, such as stressed syllables, would provide the child with a useful starting point for at least two reasons: First, syllables generally fall within word boundaries.² Thus, a tendency to extract salient syllables would circumvent the problem of identifying boundaries between words. Second, every open class word in English has a single primary word stress. If a child succeeds in extracting a stressed syllable, particularly a syllable carrying primary word stress, that child has managed to extract one syllable of an open class word.
In some cases, the single syllable will correspond to the adult word, but if not, the child can then build up from that syllable, perhaps with the assistance of other perceptual-attentional biases, or of distributional cues, to achieve a unit corresponding to the adult word. Tendencies to attend to larger scale prosodic cues, such as intonation and rhythm, may operate simultaneously with tendencies to attend to particular syllables to assist in the identification and extraction of longer words and larger linguistic units.³ From this perspective, the initial task for the young language learner should be described not as one of dividing an entire sequence of speech into word-sized units, but rather as the extraction of units from a partly or entirely unanalyzed stream of sound (Gleitman and Wanner (1982)).

Some clues to the nature of attentional predispositions that may play this important role in early language learning can be obtained by considering the nature of children's earliest productions. There is a large body of literature describing a set of irregularities in the differences between young children's words and the adult target words. These differences have sometimes been attributed to incongruities between children and adults in the phonological representation (e.g., Ingram (1992), Jakobson (1941/1968), Stampe (1969)), sometimes to production constraints (e.g., Ferguson (1976), Ingram (1976), Menn (1978)), and sometimes to perceptual factors (e.g., Klein (1981a), Macken (1980), Vihman (1981), Waterson (1971)). Among the various distortions found in early speech is a set of phenomena that have frequently (but not always) been attributed to perceptual salience. These include tendencies to omit from early productions unstressed and nonfinal syllables (Blasdell and Jensen (1970), Frumhoff, Echols, and Newport (1992); see also Allen and Hawkins (1980) and Gerken, Landau, and Remez (1990) for production-based accounts for this phenomenon) and an advantage in acquisition for final (Slobin (1973)) or stressed (Pye (in press)) morphological inflections. If these phenomena do result from the tendency of the child to attend selectively to particular perceptually salient parts of speech, then such phenomena may be informative concerning the perceptual tendencies that assist the child in the initial segmentation of speech. In this article, therefore, we consider what the implications are for segmentation if these phenomena are, indeed, representational.⁴

¹Note that biases, as used here, means simply that the perceptual system is tuned such that a certain set of acoustically prominent aspects of the speech system will be preferentially attended to.

²Exceptions to this may occur in continuous speech as, for example, when a vowel-initial word, particularly a word beginning with a stressed syllable, follows a word ending in one or more consonants (Kahn (1980)). Although resyllabification across word boundaries is common in certain other languages, such as French and Arabic, the extent to which it occurs in English, or whether it occurs at all, has been questioned (Selkirk (1984)). Indeed, the great difficulty that English speakers have with learning word boundaries in Egyptian Arabic (Broselow (1984)) could be due, in part, to the infrequency of such resyllabifications in English. Of course, some children will be learning languages in which such resyllabification occurs. The prediction would be that children acquiring such languages should frequently make segmentation errors. In fact, there is some evidence suggesting that they do make such errors (e.g., Pye (1983) provides evidence of errors due to resyllabification across morpheme boundaries).

³The role of such prosodic cues in word identification has been discussed, for example, by Frazier (1987), Grosjean and Gee (1987) and Waibel (1986) for adults, and by Kemler Nelson (1989) for infants; see the next section for discussion of such cues in the segmentation of larger linguistic units.

⁴We return to the possibility that the stress and position phenomena are due to production factors at a later point in the article.

1.1 Theoretical Background Concerning Perceptual Biases

The notion that certain parts of words are particularly salient to young language learners was proposed by Slobin (1973) in the context of the acquisition of morphology (see also Bever (1970) for a discussion of perceptual strategies used by young children for the processing of grammatical and syntactic structure and Waterson (1971) for an account in which perceptual salience, at the level of the phoneme, partially determines the form of a young child's words). On the basis of cross-linguistic data on the order of acquisition of various morphological inflections, Slobin argued, for example, that children pay particular attention to elements at the ends of words. Although Slobin was concerned with the acquisition of morphology and not with the initial segmentation of speech or representation of words, his proposed biases contribute the notion that some components of speech, frequently syllable-level units, may be more easily or more readily processed than are other parts of speech. Moreover, the relative ease of processing these units stems from perceptual-attentional tendencies or "strategies" on the part of the child. This article builds on these notions, that is, that certain parts of speech, in this case stressed or final syllables, will be particularly salient and readily processed by the young language learner.

Perceptual biases have also been discussed in the context of segmentation. Morgan and Newport (1981) and Gleitman and Wanner ((1982); see also Gleitman et al. (1988), Morgan (1986), Morgan, Meier, and Newport (1987)) suggested that prosodic cues to syntactic boundaries available in the speech signal could be used by young children for the initial identification of syntactically relevant units. Several investigators (Fernald and Kuhl (1987), Hirsh-Pasek, Kemler Nelson, Jusczyk, Wright-Cassidy, Druss, and Kennedy (1987), Jusczyk, Hirsh-Pasek, Kemler Nelson, Kennedy, Woodward, and Piwoz (1992), Mehler, Bertoncini, Barriere, and Jassik-Gerschenfeld (1978), Morgan (1986)) provided evidence of early sensitivity to such prosodic information. Gleitman et al. and Morgan et al. proposed that the prosodic cues to clause and phrase boundaries may provide children with a "bootstrap" into the morphological and syntactic structure of their language. This initial segmentation of speech into prosodically defined units may then facilitate the identification of distributional cues to an elaborated syntactic structure. Indeed, Morgan et al. and Morgan and Newport showed in artificial language learning that availability of and attention to grouping cues like those provided by prosody are necessary for the acquisition of syntactic structure.

Within the broader model outlined earlier, Gleitman et al. and Gleitman and Wanner also suggested that prosodic information may assist in the identification of word-level units of speech. Based on examples from children's early productions, they argued that a tendency to attend to and extract stressed syllables may be one important route into the initial segmentation of speech. They noted that such a tendency would account for the telegraphic nature of children's early utterances, in which (at least in English) function words, inflectional morphemes, and, frequently, unstressed syllables of content words (essentially, all the unstressed elements) tend to be absent.⁵ Peters (1981; 1983) included tendencies to attend to stressed syllables, and also final syllables, in her description of the initial segmentation of speech. Additionally, attention to stressed syllables has been incorporated into descriptions of adult speech processing (e.g., Cutler and Foss (1977), Grosjean and Gee (1987)), including an account for certain speech deficits observed in Broca's aphasics (Kean (1977)). Moreover, there are some suggestions that stress may increase the salience of a syllable for young infants: Exaggerated, or motherese-style, stress, at least, has been shown to facilitate the identification of an embedded phoneme feature change by 1- to 4-month-old infants (Karzon (1985)).⁶

The proposal that stressed syllables should be especially salient to young children and therefore particularly extractable is compatible with acoustic characteristics of speech. Stressed syllables are, in physical terms, acoustically prominent (i.e., in English, they tend to be louder, longer, and higher pitched than unstressed syllables; Lehiste (1970)). However, the notion that final syllables should assist in segmentation may appear circular. If children do not know where the word boundaries are, they should be unable to determine which syllable is final, much less perceive such syllables as particularly salient. However, a syllable that is word final also has the potential for being sentence final or at least phrase final (and speech contains more acoustic cues to phrase boundaries than to word boundaries; Cooper (1983), Paccia-Cooper and Cooper (1981)). Indeed, some evidence suggests that newly introduced nouns are particularly likely to be placed in final position (as well as on pitch peaks) in infant-directed speech (Fernald and Mazzie (1991), Woodward and Aslin (1990)).
Syllables falling in final position may be easier for the child to break off and store in a representation than equivalently stressed syllables falling in other positions in a speech sequence. Although Slobin (1973; 1985) has presented substantial evidence that attentional biases contribute to the acquisition of morphology, systematic evidence for a role for final position, as well as stress, in the very earliest phases of language acquisition has not been previously presented. However, a few descriptive and experimental studies do provide relevant data, and additional hints of the salience of stressed or final syllables can be extracted from general descriptions of acquisition in various languages.

⁵However, see Gerken, Landau, and Remez (1990) and Gerken and McIntosh (1991) for evidence that function words may not be entirely absent from children's representations during this period.

⁶See, however, Jusczyk and Thompson (1978) for a failure to find an effect of stress on 2-month-old infants' discriminations of an embedded phonetic feature change. The discrepancy may be due to a ceiling effect in the Jusczyk and Thompson study, however. The phonetic feature change that was tested may have been more discriminable than that employed in the Karzon study, and the sequences were shorter; infants were generally successful at perceiving the contrast regardless of whether it was stressed or unstressed.

1.2 Evidence from Descriptions of Acquisition in English and Cross-Linguistically

The tendency of children to omit unstressed (especially unstressed and nonfinal) syllables from their earliest productions has been noted in English (Allen and Hawkins (1978), Gleitman and Wanner (1982), Ingram (1978), Klein (1981a)) and has also been observed in Hebrew (Berman (1977)). Greater accuracy in the production of consonants of stressed syllables has also been described for English-learning children (Klein (1981a)). In Hungarian, however, in which primary stress is initial, it appears to be the final parts of words that are likely to be omitted, although there may be some tendency to drop medial syllables while retaining initial and final syllables (MacWhinney (1985)).⁷

Data on language acquisition in certain Native American languages are particularly interesting for identifying potential effects of perceptual salience because they permit the disentangling of the role of acoustic properties of language, such as stress, from that of semantics in the extraction of first words. In English and many other languages, stress tends to fall on open class words and, within a word, on the root of the word, whereas inflections, and especially function words, are virtually always unstressed. Thus, if the child omits inflectional endings and function words from early productions in English, it could be because those are unstressed, but it could also be because the child attends primarily to noun and verb roots or to other open class words. In Native American languages such as Mohawk and K'iche' Mayan, nouns or various inflectional markers may be incorporated into the verb, resulting in multisyllabic and morphologically complex verbs. Stress frequently falls on inflectional endings of these complex verbs rather than on the verb root (Feurer (1980), Pye (1983)).

The evidence from acquisition in both K'iche' Mayan and Mohawk is far better accounted for by perceptual salience than by semantic importance: The telegraphic utterances of Feurer's (1980) Mohawk subject were generally comprised of a stressed syllable and, frequently, a subsequent syllable (the syllable subsequent to the stressed syllable often being final). Although these productions often included the verb root or part of it, they sometimes consisted of nothing but a sequence of suffixes. The child's word for see, for example, was [se:la], which corresponded to two inflectional morphemes (from the adult /kn + 3sBr + a?/, 'see' + Purp. Punct.). In K'iche', telegraphic speakers' attempts at those verbs for which inflectional endings were stressed were far more likely to include the inflections than to include the verb roots (31% vs. 6% in samples of speech collected from four children; Pye (1983)). In some cases, the stressed verb endings were also final. Thus, final position may also contribute to the perceptual salience, and extractability, of the stressed inflections. In a more recent article, Pye (in press) also argued for perceptual salience over semantics in accounting for the order of acquisition of a number of K'iche' morphemes.⁸

Data from early language acquisition in two other languages in which children acquire certain grammatical elements very early also provide evidence attesting to the potential salience and extractability of stressed and final syllables. As was noted earlier, children acquiring English and many other languages typically begin to produce inflections and other grammatical elements much later than open class words. However, Aksu-Koç and Slobin ((1985); see also Slobin (1982)) reported that instead of being telegraphic, the utterances of even very young children acquiring Turkish tend to be comprised of sequences of syllables representing both roots and inflectional suffixes. Primary stress is word final in Turkish, but stress is fairly evenly distributed across syllables (Aksu-Koç and Slobin (1985)). Although Aksu-Koç and Slobin preferred a different interpretation for these observations, Gleitman and Wanner ((1982); Gleitman et al. (1988)) suggested that tendencies to attend to and extract stressed syllables provide a plausible explanation for the early competence of Turkish children with sequences of morphemes: For a language in which stress is relatively evenly distributed across syllables, the proposed tendency to attend to and extract stressed syllables may predict that young language learners should produce more syllables than should children learning languages in which only certain syllables are prominently stressed.

The observations from Turkish and the Native American languages provide evidence for the perceptual salience of stressed syllables (possibly in interaction with final position). The early acquisition of certain grammatical elements in Japanese may provide support particularly for the salience of final position: The grammatical markers acquired early by children learning Japanese include functional markers and inflections on verbs. Functional markers are sentence final. Verb inflections are at least word final and frequently also sentence final (Clancy (1985)).⁹

Although Slobin's operating principles have received support from acquisition of morphology in a broad range of languages (Slobin (1985)), there have been occasional reports of violations or at least failure to confirm this general observation. For example, it has been suggested that children acquiring Polish may show no specific tendency to learn word-final inflections more readily than word-initial inflections (Weist (1983)). Other factors may account for these apparent contradictions, however.¹⁰

Although the data from descriptive studies of acquisition both in English and cross-linguistically are suggestive of tendencies to attend to stressed and final syllables, only Klein's (1981a) description of early words in English and Pye's (1983) study of acquisition in K'iche' Mayan were explicitly intended to explore the role of those factors. Pye's study was with children who were well beyond the one-word period when the study began. Although Klein did provide a careful description of effects of stress and position on accuracy of children's productions during the one-word period, she did not systematically separate the effects of stress and position, nor did she consider the implications for segmentation. Thus, none of these previous studies was a systematic study of a large corpus of early utterances from children in the one-word period of language acquisition designed to seek evidence for a potential role for perceptual biases in segmentation.

⁷As one reviewer noted, the possibility that final syllables may frequently also be omitted in Hungarian suggests that final syllables may be most readily retained when they follow a stressed syllable. This is similar to a possibility raised by Du Preez (1974), although with respect to sentence-level stress on children's tendency to imitate words within a sentence. Du Preez noted, "the tonic appears to act as a signal to notice what is to follow" (p. 371). This issue is returned to at a later point in the article.

⁸In both articles, however, Pye proposed a production-based explanation for the perceptual salience effects on production of verb endings. This issue is discussed in a subsequent section of this article.

⁹Clancy noted that these markers and inflections are also stressed. However, there is some controversy over whether the pitch accent system found in Japanese is indeed a stress system or whether it may be more accurately described as a tone system (Poser (1984)). Whatever the typology, it is possible that the pitch pattern may enhance the salience of those grammatical elements that are acquired early. In any case, final position is likely to be at least one factor increasing the salience and extractability of functional markers and verb inflections in Japanese, thus contributing to their early acquisition.

¹⁰In the case of the inflections evaluated by Weist, the later acquired suffix is frequently embedded. Moreover, because stress in Polish is penultimate in a word (Smoczynska (1985)), the initial syllable in which the prefixal changes take place may frequently be stressed, whereas the syllable containing the suffixal change would never be. Even if Weist's data were stronger, they are not evidence against the position presented here. His subjects were 2½ to 3½ years of age and, accordingly, well beyond the earliest period of language acquisition. If children start out with a universal set of operating principles, those tendencies will, at some point, need to become tuned in language-specific ways. By 2½ to 3½ years of age, children may no longer be biased in accordance with the universal tendencies in every case.

1.3 Evidence from Studies of Children's Imitations

Tendencies to retain stressed syllables, while frequently omitting unstressed syllables, have been observed in studies of imitations of English speech (Oller and Rydland (1974)) and of nonsense-word speech (Blasdell and Jensen (1970), Frumhoff et al. (1992)) by children ranging in age from 2 to 4 years (the youngest of whom would be producing telegraphic, or at least two-word, speech). Additionally, a tendency to retain words carrying sentence-level stress (each of which would contain or, in the case of single syllable words, be comprised of a stressed syllable) was observed by Du Preez (1974). Evidence for attention to final elements, as well as to stressed elements, was obtained in both the Blasdell and Jensen study and the Frumhoff et al. study.¹¹ Children in both studies were more likely to imitate syllables that were stressed or final in the nonsense-word target sequences than to imitate unstressed and nonfinal syllables. Moreover, Oller and Rydland's observation that it was specifically initial unstressed syllables of the two-syllable stimulus words that tended to be omitted could be due to the combined effect of biases to attend to stressed and to final syllables.

An additional finding of the Frumhoff et al. study suggests that stress may also facilitate the analysis of a syllable: Any unstressed syllables that were reproduced tended to contain more errors than the imitated stressed syllables. These results suggest that stress not only increases the probability that a syllable will be extracted, but that it also increases the likelihood of accurate analysis. Such an effect would, in fact, be predicted if stressed syllables were particularly salient to young language learners.

Unlike the descriptive studies of language acquisition, most of the studies of children's imitations were explicitly motivated by an interest in the perceptual salience, for young language learners, of stressed or stressed and final syllables. Further evaluation of this question is warranted, however, for at least two reasons. First, the data for the experimental studies were children's imitations and not their spontaneous productions. Although it might be expected that children would tend to imitate those elements that are most salient to them, it is possible that a child may use relatively short-term perceptual strategies in imitation that may differ from the processes determining long-term storage of linguistic elements. Second, and perhaps more important, the subjects for most of these studies were at least 2 years old, and most were older. Therefore, they were well beyond the phases of language acquisition during which children must first extract word-level units, and they would have had ample opportunity to learn strategies for processing linguistic input. Children who are less than 2 years of age and who are just beginning to produce words will of course also have had substantial exposure to their native language; but they will at least still be in the period during which extracting individual words from speech is an important task. A true test of the claim that tendencies to attend to stressed and final syllables are "prewired" would require testing neonates (or, possibly, unborn fetuses). If it is children's early words that are to be investigated, however, neonates would be inappropriate subjects.

¹¹Because Du Preez reported his findings at the level of the word, there is no information concerning whether stressed or unstressed syllables within a word were produced. As a result, it is difficult to assess the independent effects of stress and position.

1.4 The Present Study

If the tendencies of young language learners to omit unstressed, nonfinal syllables do reflect perceptual or attentional predispositions that may also assist with segmentation, then two criteria should be met. First, these tendencies should be present as soon as the child begins to talk, as the predispositions should be present before language. Second, the tendencies to omit unstressed, nonfinal syllables should be a result of the child's perceptions and representations of words and should not be due solely to production constraints or strategies. Although the present study does not test these requirements fully, it does come closer than have previous studies to evaluating both criteria. The children are younger than were the subjects of the descriptive and experimental studies reviewed here (with the exception of Klein (1981a; 1981b)), thus better meeting the first criterion. Although our subjects are not prelinguistic, they are in the period of language development during which extracting words from the acoustic stream is an important task, that is, the one-word period of language development. The study comes closer than have previous descriptive studies to meeting the second criterion in that it is designed for the specific purpose of assessing the role of tendencies to attend to stressed and final elements in determining the form of these earliest utterances. In contrast to previous experimental studies, the data are primarily spontaneous productions collected in natural contexts from children who are predominantly one-word speakers.
Although productions are not necessarily identical to representations, spontaneous productions must at least rely on a stored representation, whereas imitations may not. The intention of this study is to evaluate the implications of perceptually based predictions for naturally occurring productions. Because the data are productions, we do not directly test the claim that these tendencies are perceptually based. Further development of a perceptually based model for early words, and additional evidence supporting it, can be found in Echols (1992). However, we leave open the possibility that some of these phenomena may be due to production constraints and return to that issue in the discussion.

If perceptual or attentional biases do assist in the initial extraction of approximately word-level sequences of speech, then children's earliest productions should provide evidence of those biases. In particular, if tendencies to attend to stressed and final syllables do assist in the identification of words in speech, then syllables of an adult target word that are stressed or final should be particularly likely to be included in first words, and syllables that are unstressed and nonfinal should frequently be omitted. Further, if stressed and final syllables are particularly salient, then they should also be produced more accurately than unstressed and nonfinal syllables, even when these other syllables are included. These predictions were tested in a corpus of early utterances by assessing the frequency of omission of syllables and the accuracy of included syllables as a function of the position and stress level of the target syllable in an adult word.

2. METHOD

2.1 Subjects and Data Collection

Subjects were three children in the one-word speech period. Utterances were recorded in the child's home during natural interactive sessions among the child, one or both parents, and an experimenter. During the time of data collection, the children ranged in age from 1;5 to 1;11. The mean MLU across first sessions for the three children was 1.07; it was 1.42 across final sessions. The analyses described here thus include utterances collected across several sessions during the period in which the child was producing primarily single-word utterances and into the earliest part of the period of two-word speech.¹² The number of sessions and period of time covered for each child varied as a result of differences in the rate and pattern of language acquisition.¹³ For Subject 1, the period covered is from 1;7;2 to 1;8;3 (MLU 1.0 to 1.02); for Subject 2, 1;5;3 to 1;10;1 (MLU 1.22 to 1.92); and for Subject 3, 1;7;2 to 1;11;3 (MLU 1.0 to 1.33).

Each recording session lasted about 1 hour. Sessions were recorded on cassette audiotapes using either an Onkyo TA2130 cassette tape deck, a Sony AVC 3260DX video camera and videotape deck with high-quality sound, or a Sony TC180AV tape recorder, all used with an Electro-Voice 635A microphone. The children were seated either at a table or on the floor, and the microphone was placed nearby.

¹²Single-word utterances include some utterances for which the adult target is longer than a single word but that are treated as a single unit by the child. Such utterances have been described by MacWhinney (1978) as amalgams and are discussed in greater detail in the section here on coding. Once amalgams and unsegmented imitations are excluded, the proportion of two-word utterances included in the analyses was 12%: 0% from Subject 1, 25% from Subject 2, and 8% from Subject 3. Note that the child for whom 25% of the included utterances were two-word utterances was a child who began very early to produce utterance frames such as itsa -, where's -, and here -. Virtually all of that child's two-word utterances (86%, or 51 of 59) were utterances of this type.

¹³Ages are reported as years;months;weeks.


2.2 Coding

The taped productions were phonetically transcribed using the International Phonetic Alphabet (as described in Ladefoged (1982)).14 Productions were transcribed using a broad transcription intended to capture phonemic, or linguistically relevant, differences between words and their targets; the detailed phonetic nature of the words was not described.15 The transcribed utterances were entered into a database system along with the corresponding adult target word,16 were divided into syllables, and were coded, on a syllable-by-syllable basis, for omissions of syllables and for number of phonemes correct within a syllable, in relation to the stress level and position of the target syllable in the adult word. Productions were also coded at the level of the utterance as spontaneous or imitated.

2.2.1 Exclusions

Words with single-syllable adult targets were excluded from the analysis because they provide no useful information about the hypothesis in question; such words necessarily contain no omitted syllables and, being only a single syllable, permit no comparisons between syllables of different positions and different stress levels.17 Also excluded were utterances for which the adult target word could not be determined, utterances representing noises (e.g., hoo-hoo for a horn), sounds used to indicate yes or no (e.g., mm-hmm), and multiple repetitions of single words that conformed exactly to the adult target (and therefore provided no information useful for the analysis). Mommy, daddy, and variants of these (e.g., mama, dada) were also excluded from the analysis unless they were used in a specifically referential sense (e.g., to refer to a mommy mouse or a picture of a daddy). These words were excluded because they are extremely frequent and are often distorted due to calling or whininess. Such distortions are clearly not errors but would have to be counted as such in the analysis.

After these exclusions, a total of 616 utterances were included in the analysis: 180 from Subject 1, 237 from Subject 2, and 199 from Subject 3. All contained two or more syllables in the adult language.

14 All of the recordings were transcribed by a single researcher, although a subset was also transcribed by a second researcher, as described in the section on reliability. Two exceptions to the IPA system should be noted: In accordance with common American usage (Pullum and Ladusaw (1986)), [č] is used to represent IPA [tʃ] and [ǰ] replaces IPA [dʒ].

15 Stress placement also was not included in the transcripts. Although transcriptions of stress were originally planned, it turned out to be exceedingly difficult to obtain reliability between coders on stress placement.

16 Because the productions were primarily single-word utterances, all analyses compare the child's version of a word to the adult version. As was noted earlier, in some instances a child's "word" included more than one adult word. Such utterances were treated as single words for the purpose of analysis and therefore are described henceforth as words; the process by which adult targets were identified is described in the following section.

17 A large proportion of the utterances collected for the analysis of children's productions were, in fact, words with single-syllable adult targets. Although these productions were not useful for the present analysis, it should be noted that the high proportion of words with single-syllable adult targets is, in itself, at least consistent with the claim that stressed and final syllables are especially salient and particularly likely to be extracted. Single-syllable words (other than function words, which very rarely appear as independent units in early speech) almost invariably carry primary stress and are, by definition, word final. Although other explanations are also plausible, the model proposed here would, in fact, predict that single-syllable words should appear with high frequency in children's earliest productions.

2.2.2 Coding Categories: Codes Describing the Adult Target Word

A summary of the coding is given here. For more information on the details of the coding procedure, see Echols (1987).

A. Adult target word. The adult target for a child's production was, for the most part, determined from the context of the utterance (i.e., both the events and the previous speech of the child and adults). In many cases, a parent provided an interpretation, and that was considered to be the target word unless subsequent context suggested that it was wrong (e.g., an adult statement of, "Oh! you're saying . . ."). Targets were assumed to be uninflected forms (e.g., singular, present rather than plural, progressive, or past), even if the utterance was an imitation and the adult target word was inflected, unless there was evidence of the inflection in the child's production. Thus, if a child said [zmo] after an adult had said animals, the target would be considered animal; but if the child produced [zmoz], the target would be animals.18

As mentioned previously, a sequence consisting of two or more adult words is, in some cases, considered to be a single word-level unit for a child. Following other investigators (Brown (1973), MacWhinney (1978)), we considered a sequence of two or more adult words as a single unit, or amalgam, if the words comprising that sequence never appeared independently and did not combine freely with other elements. Thus, for example, Mickey Mouse would be considered a single unit if the child never said mouse except when saying Mickey Mouse.19

18 It has been pointed out that children's final syllables may have been less accurate than measured because there could have been times when a plural word was intended but the [s] was omitted. Because there is no way to know whether a plural was intended unless there is evidence in the child's productions, there is no way of addressing this concern; children of this age are not believed to have productive control of the plural (Brown (1973)) and, accordingly, would presumably be producing it only as part of a holistic version of the word. Thus, production of the plural would not be predictable from the context.

19 Words that appeared elsewhere in the corpus independently or combined with other elements were occasionally coded as amalgams if the production differed markedly from another, consistent, form of the word, suggesting that the inconsistent form was not analyzed and, as a result, not perceived as similar to other productions of that word.


B. Division into syllables. Adult target words were divided into syllables by reference to an English language dictionary (Stein and Su (1978)). Although this is not an entirely satisfactory way of determining syllable divisions, no standard method for identifying syllables exists in the field of phonetics (see Ladefoged (1982) for a discussion of this issue). We hoped that reference to a dictionary would at least provide an objective means of dividing words into syllables. For utterances in which the target was longer than a single word, the word boundaries were generally considered to be the syllable boundaries, except for a small number of those cases in which unstressed function words tend to cliticize to an adjacent stressed syllable in continuous speech (Kean (1977), Selkirk (1984)). Such sequences have been described as phonological words (Kean (1977)) and were treated as a single word for the purpose of identifying syllable boundaries in this analysis. Because the hypotheses being tested concerned which elements of the adult target are particularly salient to children, division of a child's production into syllables was made on the basis of the dictionary citation of the adult target.20 If a single syllable of the child's production contained sounds from two or more syllables of the adult target, the phonemes included in the child's production would be coded in relation to the appropriate target syllable.21

C. Stress level. Syllables of the adult target were coded as either stressed or unstressed.22 As with syllable boundaries, stress levels were primarily assigned by reference to a standard English language dictionary. However, if the child's production was an imitation of an adult production, the target stress pattern was that of the adult production. If the adult target consisted of more than a single word, regularities in stress patterns occurring in English compounds and other simple combinations (described, e.g., in Selkirk (1984)) were, where possible, used to assign stress.23 Because the location of an adult target in a sentence could presumably vary, it was not possible to determine whether a given word would have carried sentence-level stress in the contexts in which the child had heard it; accordingly, sentence-level stress was not incorporated into this analysis.

D. Position. Syllables of the adult target word were coded as initial, medial, or final in position. Where the adult target was longer than a single word, position was coded in relation to the unit that appeared to serve as a single unit for the child.

20 Dictionary citations tend to group potentially ambisyllabic phonemes with the stressed syllable. Note that this tendency would, if anything, tend to work against the predictions given here. The inclusion of additional phonemes in these syllables (from less salient, unstressed syllables) would, in all likelihood, tend to increase the difficulty of producing the syllable accurately. On the other hand, if these phonemes are perceived by the child as onsets of the following syllable and if, as much of the available data suggest, children tend toward producing CV syllables (e.g., Ingram (1978)), then the accuracy of stressed syllables could be somewhat inflated. However, the number of words in which this inflation may occur was limited and would be insufficient to account for observed effects.

21 For example, in [ki] for coffee the first syllable of the adult target would be coded with the /k/ as correct (with a vowel and consonant deleted), and the second syllable would be coded with the /i/ as correct. If it was not clear which syllable of the adult target a phoneme of the child's production should correspond to (e.g., if no phoneme matched exactly), the assignment was made on the basis of similarity of the child and adult target phonemes (as defined by the number of shared features). If assignment was completely ambiguous, the child was given credit for both adult target phonemes (e.g., in a child's production of [magi] for monkey, the [g] would count for both the /ŋ/ and the /k/ of the adult target).

22 Because secondary stress is relatively infrequent in the targets of children's earliest words, adult target syllables carrying primary and secondary stress were included in a single category of stressed syllables. Although it might be expected that syllables carrying primary stress would be more salient than syllables carrying secondary stress, the small numbers of syllables in the corpus carrying secondary stress disallowed an analysis of that possibility.

2.2.3 Codes Describing the Child's Production

E. Spontaneous/imitated. Children's productions were coded as spontaneous or imitated. A code of delayed imitation was assigned to utterances for which the adult target word had not immediately preceded the child's production but had occurred within the previous three adult utterances. If a first token of a particular word type was produced before the adult had produced a word, additional tokens produced after the adult had said the word would be scored as spontaneous unless the form of the word changed in those later tokens to accommodate the adult form of the word. The proportion of imitations or delayed imitations in the total corpus was 15%: 37% for Subject 1, 9% for Subject 2, and 2% for Subject 3.

F. Number of phonemes correct and incorrect. Each syllable of the child's production was coded for accuracy in relation to the corresponding syllable of the adult target word on a phoneme-by-phoneme basis. Accuracy was determined by scoring each phoneme of the adult syllable as correct, changed, or deleted in the child's syllable. Phonemes coded as changed typically contained some correct features in relation to the adult target phoneme, although there could be no features in common between the child version and adult target. For purposes of analysis, changed and deleted phonemes within a syllable contributed to a single category of incorrect phonemes. The number of phonemes correct and incorrect was then coded for each syllable based on the accuracy codes assigned to those phonemes. A mean proportion of phonemes correct could then be calculated. The use of proportions permits some compensation for the variability in complexity of syllables (i.e., in more complex syllables, there are more features that could be incorrect but there are also more features that could be correct). However, to the extent that complex syllables are more difficult for young children to produce, this correction may be slightly insufficient.24 Phonemes correct and incorrect were not coded for syllables that the child entirely deleted; these omissions were analyzed separately.

Because of the difficulty of obtaining accurate transcriptions of children's speech, phonemes were considered to be correct if they were within a feature of the target phoneme. A similar procedure has been used, for example, by Vihman (1981). The features used to define consonants were [anterior], [coronal], [voice], [continuant], [nasal], and [strident]. Vowels were defined by [back], [high], [low], [tense], and [round] (Chomsky and Halle (1968)). In addition, phonemes that have been shown experimentally to be highly confusable (described, e.g., in Shepard (1972)) were considered to be within a feature, even where they differed in more than one feature. These included the consonant pairs /d/ and /g/, /θ/ and /f/, and /ð/ and /v/, and the vowel pairs /a/ and /3/, and /E/ and /E/.25

G. Omissions. Syllables were coded as omitted if no phonemes associated with the target syllable were present in the child's production.

23 As noted earlier, children frequently treat sequences of two or more adult words as a single unit. It therefore seemed highly unlikely that children would have analyzed compound words into their component words. Accordingly, no special treatment was given compound words in the analysis. That would have been difficult, in any case, because the number of compound target words in the corpus was small. In cases that could not be treated as compoundlike sequences for the assignment of stress, words in a target sequence longer than a single word were assigned stress as if they occurred in isolation. Function words (e.g., the, a, it) were typically considered to be unstressed, unless occurring in combination with other function words (as in it's-a); otherwise, single-syllable words were counted as stressed.

24 Note, however, that if anything any such effects should work against the predictions advanced here. Syllables that are assigned stress in English tend to be heavy syllables, that is, syllables consisting of at least a CVV or CVC sequence, whereas light syllables (a consonant and short vowel) are rarely assigned stress (Chomsky and Halle (1968), Selkirk (1984)). Accordingly, stressed syllables in English tend to be more complex than unstressed syllables. Thus, if children are more likely to make production errors in complex syllables, unstressed syllables should benefit in accuracy.

25 Vowel diphthong simplifications (e.g., /ai/ to [a]) and the consonant pair /ǰ/ and /č/ were also considered to be within a feature; the latter judgment was made because /ǰ/ and /č/ are frequently described as consisting of /d/ + /ʒ/ and /t/ + /ʃ/, respectively (Ladefoged (1982)). (We recognize that this is not the only possible way of describing these phonemes, but it was one of a number of decisions required to permit coding.) Phonemes added to a syllable were scored as incorrect. However, if a child produced an additional syllable by reduplicating, for example, a consonant-vowel sequence, the change was scored as one phoneme incorrect, rather than two, because this was considered to be a single error. Finally, one extremely common change in children's productions was for an /al/ or an /ar/ to become an [o] or [u]. This change was scored as one phoneme change rather than two because the single change involved both the vowel and the consonant.
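The "within a feature" scoring rule for consonants can be illustrated with a short sketch. This is not the authors' coding software: the feature values below are assumed, SPE-style values for a handful of consonants chosen only for illustration, and the names `CONSONANTS`, `CONFUSABLE`, `is_correct`, and `proportion_correct` are hypothetical. A produced phoneme counts as correct if it matches the target, differs from it in at most one feature, or belongs to a pair listed as highly confusable; a deleted phoneme counts as incorrect.

```python
from typing import Optional, Sequence, Tuple

# Consonant features named in the text, in a fixed order.
FEATURES = ("anterior", "coronal", "voice", "continuant", "nasal", "strident")

# ASSUMPTION: illustrative binary feature values (1 = plus, 0 = minus) for a
# few consonants; these are not taken from the paper's coding materials.
CONSONANTS = {
    "p": (1, 0, 0, 0, 0, 0),
    "b": (1, 0, 1, 0, 0, 0),
    "t": (1, 1, 0, 0, 0, 0),
    "d": (1, 1, 1, 0, 0, 0),
    "k": (0, 0, 0, 0, 0, 0),
    "g": (0, 0, 1, 0, 0, 0),
    "s": (1, 1, 0, 1, 0, 1),
    "m": (1, 0, 1, 0, 1, 0),
    "n": (1, 1, 1, 0, 1, 0),
}

# One of the confusable pairs named in the text; such pairs are accepted
# even when they differ in more than one feature.
CONFUSABLE = {frozenset(("d", "g"))}


def feature_distance(a: str, b: str) -> int:
    """Number of features on which two consonants differ."""
    return sum(x != y for x, y in zip(CONSONANTS[a], CONSONANTS[b]))


def is_correct(target: str, produced: Optional[str]) -> bool:
    """Score one target phoneme; produced=None means the phoneme was deleted."""
    if produced is None:
        return False
    if target == produced:
        return True
    if frozenset((target, produced)) in CONFUSABLE:
        return True
    return feature_distance(target, produced) <= 1


def proportion_correct(pairs: Sequence[Tuple[str, Optional[str]]]) -> float:
    """Proportion of target phonemes scored correct within one syllable."""
    if not pairs:
        raise ValueError("a syllable must contain at least one target phoneme")
    return sum(is_correct(t, p) for t, p in pairs) / len(pairs)
```

Under these assumed values, /t/ produced as [d] differs only in [voice] and is scored correct, whereas /d/ produced as [g] differs in two features and is accepted only because the pair is on the confusable list.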

2.2.4 Reliability

As an evaluation of the reliability of the transcriptions of the children's productions, a subset (30%) of the utterances was extracted from context and transcribed by a second researcher trained in phonetic transcription. Reliabilities were calculated for agreement on correct and incorrect assignments for individual phonemes, in relation to the adult target, and for agreement on which syllables, out of all coded syllables, were coded as omitted. Agreement was .84 for coding of individual phonemes and .99 for coding of omissions. Whenever possible, disagreements were resolved by discussion between the transcribers.
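The text does not state which agreement statistic was used. A minimal sketch, assuming simple proportion agreement (the share of items on which two transcribers assign the same code), is shown below; the function name and the example codes are hypothetical.

```python
from typing import Sequence


def proportion_agreement(codes_a: Sequence[str], codes_b: Sequence[str]) -> float:
    """Share of items on which two coders assigned the same code.

    ASSUMPTION: plain proportion agreement, with no correction for chance.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same items")
    if not codes_a:
        raise ValueError("at least one coded item is required")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Hypothetical phoneme-level codes from two transcribers for four phonemes.
coder_1 = ["correct", "incorrect", "correct", "correct"]
coder_2 = ["correct", "incorrect", "incorrect", "correct"]
agreement = proportion_agreement(coder_1, coder_2)  # 3 of 4 items match
```

The same function applies to omission coding by passing codes such as "omitted" and "present" for each target syllable.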

3. RESULTS

Several analyses were conducted on productions with multisyllabic adult targets to test the predictions that unstressed and nonfinal syllables in the adult target word should more frequently be omitted, and less frequently produced accurately, than should those that are stressed or final.

To test the prediction that children should frequently omit unstressed and nonfinal syllables but should tend to include stressed and final syllables in their productions, the proportion of syllables omitted was calculated as a function of the stress level and position of the adult target syllable. Because the number of syllables in medial position was small, and because stress occurs far more frequently in certain of the three positions in adult target words than in others, the totals for initial and medial position were combined into a single category of nonfinal syllables. Examples of productions in which syllables were omitted are shown in Table 1. The mean proportions of syllables omitted are presented in Table 2 as a function of the stress level and position of the adult target syllable. They are also presented graphically in Figure 1.26 A consideration of these results reveals that syllables that were either stressed or final in the adult target word were far less frequently omitted than were syllables that were unstressed and nonfinal. These observations are supported by the results of a repeated measures analysis of variance, in which the main effects for stress level and position are significant, with F(1, 2) = 77.56, p < .02 and F(1, 2) = 264.01, p < .005, respectively.27 The Stress x Position interaction was also significant, F(1, 2) = 48.10, p < .05. The interaction is due to the pattern, clearly seen in Table 2, wherein syllables that are both unstressed and nonfinal are particularly susceptible to omission, far more so than would be expected if the increase in probability of omission due to unstressed status and to nonfinal position combined in a simple additive function.

26 All analyses presented here include imitations; however, the results do not differ importantly if imitations are excluded.

27 Each of the analyses reported here was also performed with all data transformed using an arc sine transformation, to correct for the instability of error term variances that may be present in proportional data (Neter, Wasserman, and Kutner (1985)). Although the results reported in the text reflect analyses of raw scores, the overall results are similar when analyses are conducted on the transformed data.

TABLE 1
Example of Utterances With Omitted Syllables

Adult Target    Child's Production
chocolate       [Eak]
orange          [an3]
eraser          [raisa]
elephant        [elfAn]

Categories illustrated: stressed syllable retained; final syllable retained; stressed and final retained.

TABLE 2
Mean Proportion of Syllables Omitted as a Function of Stress and Position for Targets of Two or More Syllables

                Position
Stress Level    Nonfinal    Final
Stressed
Unstressed

Note. Numbers in parentheses are total number of syllables contributing to analysis.

FIGURE 1 Mean proportion of syllables omitted as a function of stress and position for targets of two or more syllables.

The prediction that any unstressed, nonfinal target syllables retained in a child's utterance should be less accurate than the stressed or final target syllables was tested by calculating the proportion of phonemes correct as a function of position, and of level of stress, for syllables included in a child's production. In other words, to be certain that such an effect is independent of the effect of syllable omissions, scoring excluded omissions and was performed only on syllables represented by some phonetic material. As in the analysis of omitted syllables, initial and medial syllables were combined into a single category of nonfinal syllables. The mean proportions of phonemes correct are shown as a function of stress and position in Table 3, and they are summarized in graphic form in Figure 2. Mirroring the results for omitted syllables, the proportion of phonemes correct is substantially lower for syllables that are both nonfinal and unstressed. The mean proportion correct ranges between .66 and .69 for syllables that are either stressed or final but is much lower, at .36 correct, for nonfinal unstressed syllables. Significant main effects for stress level, F(1, 2) = 180.94, p < .01, and for position, F(1, 2) = 40.05, p < .05, provide support for these observations. The Stress x Position interaction was also significant, with F(1, 2) = 139.59, p < .01, again reflecting the particular vulnerability of syllables that are both unstressed and nonfinal.
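The departure from additivity in the accuracy data can be checked informally from the four cell means reported in Table 3. The sketch below is only descriptive arithmetic on those reported means, not the repeated measures analysis of variance described in the text, and the function names are hypothetical. It computes the value that the unstressed nonfinal cell would take if the stress and position effects simply added, and a difference-of-differences contrast.

```python
# Mean proportions of phonemes correct as reported in Table 3.
cell_means = {
    ("stressed", "nonfinal"): 0.69,
    ("stressed", "final"): 0.66,
    ("unstressed", "nonfinal"): 0.36,
    ("unstressed", "final"): 0.67,
}


def additive_prediction(means: dict) -> float:
    """Unstressed nonfinal mean expected if stress and position effects added."""
    return (
        means[("stressed", "nonfinal")]
        + means[("unstressed", "final")]
        - means[("stressed", "final")]
    )


def interaction_contrast(means: dict) -> float:
    """Stress effect in nonfinal position minus stress effect in final position."""
    effect_nonfinal = means[("stressed", "nonfinal")] - means[("unstressed", "nonfinal")]
    effect_final = means[("stressed", "final")] - means[("unstressed", "final")]
    return effect_nonfinal - effect_final


predicted = additive_prediction(cell_means)
observed = cell_means[("unstressed", "nonfinal")]
shortfall = predicted - observed
```

With the reported means, a purely additive pattern would put the unstressed nonfinal cell at about .70, whereas the observed mean is .36, which is the shortfall that the significant interaction reflects.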
TABLE 3
Mean Proportion of Phonemes Correct as a Function of Stress and Position for Targets of Two or More Syllables

                Position
Stress Level    Nonfinal       Final
Stressed        .69 (1,568)    .66 (521)
Unstressed      .36 (182)      .67 (691)

Note. Numbers in parentheses are total number of phonemes contributing to analysis.

FIGURE 2 Mean proportion of phonemes correct as a function of stress and position for targets of two or more syllables.

The data presented earlier, which included productions for all targets of two or more syllables, are presented separately for productions with targets of only two syllables in Tables 4 and 5 (Figures 3 and 4) and for utterances with targets of three or more syllables in Tables 6 and 7 (Figures 5 and 6). Dividing the data in this way is of interest because the vast majority of words with two-syllable adult targets contained an initial stressed syllable and a final unstressed syllable. In utterances with targets of three or more syllables, there is a somewhat more equal number of words with stress falling in each of the three syllable positions. As might be expected, however, the frequency of words with targets of three or more syllables is relatively small in children's early speech (the total number of words with targets of at least three syllables was 144 across the three subjects: 37 for Subject 1, 37 for Subject 2, and 70 for Subject 3). As can be seen in the tables and figures, the patterns obtained for proportion of syllables omitted and for proportions of phonemes correct conform closely to those obtained in the overall analysis, both for utterances of two syllables only and for utterances of three or more syllables. Unstressed, nonfinal syllables of the adult target word are far more frequently omitted and, when included, are less accurately produced than are syllables that are either stressed or final.28

TABLE 4
Mean Proportion of Syllables Omitted as a Function of Stress and Position for Targets of Two Syllables Only

                Position
Stress Level    Nonfinal     Final
Stressed        .01 (448)    .02 (128)
Unstressed      .26 (29)     .11 (347)

Note. Numbers in parentheses are total number of syllables contributing to analysis.

TABLE 5
Mean Proportion of Phonemes Correct as a Function of Stress and Position for Targets of Two Syllables Only

                Position
Stress Level    Nonfinal    Final
Stressed
Unstressed

Note. Numbers in parentheses are total number of phonemes contributing to analysis.

FIGURE 3 Mean proportion of syllables omitted as a function of stress and position for targets of two syllables only.

FIGURE 4 Mean proportion of phonemes correct as a function of stress and position for targets of two syllables only.

TABLE 6
Mean Proportion of Syllables Omitted as a Function of Stress and Position for Targets of Three or More Syllables

                Position
Stress Level    Initial     Medial       Final
Stressed        .16 (96)    .21 (82)     0 (84)
Unstressed      .44 (47)    .55 (112)    .10 (50)

Note. Numbers in parentheses are total number of syllables contributing to analysis.

TABLE 7
Mean Proportion of Phonemes Correct as a Function of Stress and Position for Targets of Three or More Syllables

                Position
Stress Level    Initial    Medial    Final
Stressed
Unstressed

Note. Numbers in parentheses are total number of phonemes contributing to analysis.

FIGURE 5 Mean proportion of syllables omitted as a function of stress and position for targets of three or more syllables.

FIGURE 6 Mean proportion of phonemes correct as a function of stress and position for targets of three or more syllables.

4.

DISCUSSION

The results of the analyses of omitted syllables and of the accuracy of syllables included in children's productions are consistent with the hypothesis that stressed and final syllables are particularly salient to young children. The prediction that stressed and final syllables are most likely to be included in children's early productions is supported by the analysis of the proportion of syllables omitted as a function of the level of stress and position of the adult target syllable. The magnitude of this effect is substantial. An examination of Table 2 reveals that a striking 51% of the unstressed nonfinal syllables were omitted from children's productions, whereas at most 11% of either stressed or final syllables were omitted. The significant Stress x Position interaction supports the observation that unstressed, nonfinal syllables were particularly vulnerable to omission. These results suggest that stress and syllable position do play a role in determining whether a syllable will be included in a child's production. The tendency to omit unstressed and nonfinal syllables is consistent with the claim that children may perceive stressed and final syllables as particularly salient and, as a result, be particularly likely to extract those syllables from the stream of adult speech. In that these tendencies appear to be evident in children's earliest speech, these results are also consistent with the view that such perceptual tendencies may drive the initial segmentation or extraction of approximately word-level units from speech.29

If the child is, in fact, biased to attend to stressed and final syllables, then even where unstressed and nonfinal syllables are extracted and included in children's productions, those syllables should, because they are less salient, be analyzed less accurately. This prediction was also confirmed in the analysis of children's utterances.

28 Due to the smaller numbers of data points, however, the results of the statistical analyses are not as strong as for the overall analysis and, in any case, are not independent of the analysis of the combined data set.
The results of the analysis of number of phonemes correct in those syllables included in children's productions provide support for the prediction that syllables that are stressed and those that are final in the adult target word will be more accurately reproduced by the child than will syllables that are unstressed and nonfinal in the adult target word. The greater accuracy in production of stressed and final target syllables could be an indication that those syllables are particularly salient to children and, therefore, most likely to be attended to and accurately analyzed, whereas syllables that are both unstressed and nonfinal are less salient and, as a result, are less likely to be accurately analyzed, even when extracted.

The significant Stress x Position interactions for the analyses of omitted syllables and of accuracy of included syllables suggest that the effects of stress and position are not additive; that is, if a syllable is either stressed or final, then it is likely to be included and produced relatively accurately, and there is, apparently, no added advantage to being both stressed and final. Viewed from the other direction, the interaction implies that it is specifically when both unstressed and nonfinal that a syllable is vulnerable to omission or, if included, to inaccurate production. These observations can be confirmed by comparing the mean proportion of phonemes correct for final stressed syllables to the means for final unstressed syllables and nonfinal stressed syllables; all are virtually identical, whereas the accuracy of syllables that are both unstressed and nonfinal reaches only about half that level. A complementary pattern holds for omissions. This apparent lack of additivity of stress and position effects may occur simply because the presence of either results in what is essentially ceiling performance for children of this age. Other attentional factors and production factors (such as difficulty producing consonant clusters) may very well conspire to keep children of this age from producing stressed and final syllables above the observed level of accuracy, whereas the lack of additivity in the analyses of omitted syllables may simply be due to a "floor effect" (i.e., the very small numbers of stressed or final syllables that were omitted).

29 As was noted earlier, it is possible that final syllables are particularly likely to be retained when they follow a stressed syllable, or, more generally, it may be that children have a "trochaic bias" that results in a tendency to prefer sequences consisting of a stressed followed by an unstressed syllable over those consisting of an unstressed syllable followed by a stressed syllable (Allen and Hawkins (1980), Gerken et al. (1990)). Indeed, as one reviewer suggested, a trochaic bias for English-learning children could reflect a more general "edge effect" related to the foot construction processes of the language. In English, final syllables may be salient because metrical feet are constructed from ends of words, whereas in a language with iambic feet constructed from left to right, an initial syllable may be more salient. However, because our data are exclusively from English and because English is heavily dominated by words with penultimate stress (the target words in this corpus were no exception), it was not possible to identify independent effects of these different processes. Accordingly, it was not possible to determine whether the stress plus final, the trochaic bias, or the edge hypothesis accounts better for the data in this study.

4.1 Perceptual Versus Production-Based Accounts

The results strongly support a set of predictions based on the notion that tendencies to attend to stressed and final syllables may assist the child in the initial identification of words in speech. However, the question of whether these observations are due to perceptual or representational factors, or whether they could be attributed to production constraints, cannot be resolved on the basis of the present results. Previous explanations of the effects of stress and position in older children have included both perceptual-representational and production-based accounts. However, very little empirical evidence has been presented that conclusively distinguishes these accounts, and none comes from the age range under study in our own experiments. For example, Blasdell and Jensen (1970), in a study with older children, and Gleitman and Wanner (1982; see also Gleitman et al. (1988)), in their theoretical accounts, attributed omission of unstressed or nonfinal syllables to a failure to process those less salient syllables. Klein (1981a; 1981b),


studying very young children, argued that both production and perceptual factors contribute to the form of early utterances. Whereas production factors constrain the length or complexity of early output, perceptual factors may influence which specific elements are included in a representation and thereby contribute to the form that a particular word takes when produced. Pye (1983; in press), studying telegraphic speakers of K'iche', suggested that both production and perceptual factors underlie children's inclusion of stressed verb inflections and omission of unstressed verb roots. He observed that children may include in their productions different syllables of a given verb depending on which syllable would be stressed in the adult target. To account for this, Pye suggested that children are limited by a production constraint to producing only one or two syllables; they use perceptual salience to determine which syllables to retain and which to omit.30

In contrast, though also in older children, Allen and Hawkins (1980), Hochberg (1988), and Gerken, Landau, and Remez (1990) presented entirely production-based accounts of stress effects. Allen and Hawkins (1980) suggested that certain unstressed syllables are omitted by 3½- to 6½-year-olds due to a tendency to produce trochaic accent patterns; however, they did not provide evidence that children have stored the omitted unstressed syllables, and they provided only minimal support for a trochaic bias (see Hochberg (1988) for discussion).31 Hochberg (1988) provided evidence in 2-year-old children for changes in accuracy of syllables when stress is shifted. The most compelling evidence for a production-based account of omissions of one class of unstressed syllables in telegraphic speakers (ages 23-30 months) was provided by Gerken et al. (1990). In a series of imitation studies, children were more likely to omit English function words than nonsense function words. Thus, children apparently did perceive the unstressed function words. Recent evidence by Gerken and McIntosh (1991) suggests that sensitivity to function words may extend downward to MLUs of less than 1.5.

30 Pye's account is actually more complex, and somewhat more oriented toward production factors, than our explanation suggests; see Pye (1983; in press) for details.

31 Additionally, potential evidence for a perceptual component was excluded. Children were trained to criterion on the word-learning task and rate-of-learning results were not reported, although Allen and Hawkins acknowledged that number of trials to criterion may have been greater for sequences in which the trochaic pattern was violated.

One way of integrating these accounts, while maintaining the perceptually based view of early production proposed in the present article, is to suggest that the status of stress and syllable position may change somewhat over development. Gleitman et al. (1988), Peters and Menn (1990), and others have suggested such a process. At the beginning of language learning, tendencies to attend to stressed and final syllables, combined with limited knowledge of linguistic structure, may result in children extracting primarily the perceptually salient syllables from a sequence of speech, leaving behind much of the rest as unanalyzed noise (though see Peters (1977) and Echols (1992) for evidence that aspects of the prosodic structure, if not the segments of unstressed syllables, may be represented even in children's earliest words). As children acquire more words or begin to represent existing words more completely, they should begin to attend to the less salient syllables. However, these less salient syllables may still be less completely represented than stressed or final syllables (Echols (1992), Klein (1981a); see also Waterson (1971) for a related view). Because they are less completely represented, those syllables may be more vulnerable to omission even after children have some representation for them. As children's representations become more complete, the initial perceptual basis for these omissions may be replaced by production constraints; that is, those syllables that were initially poorly represented will be particularly susceptible to omissions due to production limitations. In this sense, unstressed syllables may be "fragile" throughout development, though their precise representational status and the factors underlying their output may change.32

Although the role of perceptual salience apparently changes somewhat over the course of language acquisition, perceptual factors may continue to influence the form of representations for words well beyond the earliest phases of language learning. Indeed, Aitchison and Chiat (1981) provided evidence supporting a perceptually based account for effects of stress on the form of children's productions in subjects ranging in age from 4½ to 9 years. In a task involving the learning of names for unusual animals, Aitchison and Chiat elicited omissions of unstressed, nonfinal syllables and other errors similar to those observed in young language learners.
Because of the subjects' well-developed language abilities and because they could produce the words accurately immediately after hearing them, Aitchison and Chiat argued that memory limitations may account for the observed errors. They suggested that when children's memory is overloaded, as it may frequently be when learning an entirely new word, they may fail to store all of the features of the word. Perceptual salience may influence which features are stored. Although the data are from older children, Aitchison and Chiat's results tend to argue for the importance of accurate storage in a representation. Memory factors, such as the accurate storage of a representation for a word, account far better than do production factors for the observed patterns of omissions and inaccuracies in their school-age subjects.

32 Indeed, Kean (1977) provided evidence that unstressed function words are particularly vulnerable in aphasia in adulthood.

Although we prefer the perceptually based account and argue that it accounts very well for the data, an alternative interpretation of the present results is that they, along with later aspects of selective output, arise entirely from production constraints. In such a view, the observed advantage of stressed and final syllables would not result from a perceptual bias and thus might not play the role we have hypothesized in early word segmentation. In this case, however, the results we report for early word production would still contribute to the correct description of such output constraints. A strong claim for a perceptual or a production-based explanation for the observed tendencies to omit unstressed and especially unstressed and nonfinal syllables will require additional evidence of types other than early productions, that is, evidence that taps children's representations more directly. (See Echols (1988; 1992) for a study investigating children's representations, though of other aspects of word structure.)

Although the data described here do not specifically address this question, it is possible that some of the same perceptual tendencies that assist with the identification of words in speech could be recruited for the identification of larger linguistic units. For example, unstressed syllables, particularly function words, may provide cues to grammatical units (Gerken et al. (1990), Morgan et al. (1987)). More generally, tendencies to attend to prosody may assist in the identification of larger linguistic units (Gleitman et al. (1988), Gleitman and Wanner (1982), Jusczyk et al. (1992), Morgan et al. (1987), Morgan and Newport (1981)). Thus, these early perceptual tendencies could lead directly into the prosodic bootstrapping proposed for later stages of the acquisition process by Gleitman et al. and Morgan et al.

4.2 Summary

The problem that originally motivated the set of predictions and the study described here was that of how the very young language learner initially identifies word-level units in speech. The enormity of the problem suggested that the task must somehow be constrained for the child, and a consideration of theoretical and empirical work suggested that perceptual biases may provide the required constraints.
In particular, biases to attend to stressed and final elements may assist the child in identifying approximately word-level units in speech. The results described here provide strong support for the prediction that stressed and final syllables will tend to be included in children's earliest utterances, whereas syllables that are both unstressed and nonfinal will tend to be omitted. They also conform very well to the notion that stressed and final syllables are particularly salient to children and, as a result, are particularly likely to be extracted and included in early productions. More generally, this finding on one-word speech, in turn, adds support to suggestions of prosodic bootstrapping proposed for later stages of the acquisition process (Gleitman et al. (1988), Gleitman and Wanner (1982), Morgan et al. (1987), Morgan and Newport (1981)).


Although additional types of evidence will be required to eliminate other possible explanations, and factors other than stress and position will undoubtedly require inclusion in a full description of the extraction and representation of first words, the results of these analyses do provide one additional source of evidence for the perceptual salience of stressed and final elements. Combined with the cross-linguistic data and the results of the studies of children's imitations, the evidence is robust that tendencies to attend to stressed and final elements may assist in the initial identification of word-level elements of speech.
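The coding scheme behind these predictions can be illustrated with a small worked example. The Python sketch below is not the authors' analysis procedure. It only shows, under stated assumptions, how syllables of adult targets might be cross-classified by stress and position, how omission rates and the proportion of phonemes correct could be tabulated per cell, and how a descriptive difference-of-differences contrast reflects a Stress × Position interaction. The names `Syllable`, `summarize`, and `interaction_contrast` are hypothetical, and the seven toy tokens are invented to mirror the qualitative pattern described in the text; they are not data from the corpus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    """One syllable of an adult target word, coded against the child's production."""
    stressed: bool             # stressed in the adult target
    final: bool                # word-final in the adult target
    omitted: bool              # absent from the child's production
    phonemes_target: int = 0   # phonemes in the adult syllable
    phonemes_correct: int = 0  # phonemes the child produced correctly


def summarize(syllables):
    """Map (stressed, final) -> (omission rate, mean proportion of phonemes correct).

    Accuracy is computed over included syllables only; it is None when a
    cell contains no included syllables.
    """
    cells = defaultdict(list)
    for syl in syllables:
        cells[(syl.stressed, syl.final)].append(syl)

    summary = {}
    for key, group in cells.items():
        omission_rate = sum(s.omitted for s in group) / len(group)
        included = [s for s in group if not s.omitted and s.phonemes_target > 0]
        if included:
            accuracy = sum(s.phonemes_correct / s.phonemes_target
                           for s in included) / len(included)
        else:
            accuracy = None
        summary[key] = (omission_rate, accuracy)
    return summary


def interaction_contrast(summary):
    """Descriptive difference of differences in omission rate.

    (effect of being nonfinal among unstressed syllables) minus
    (effect of being nonfinal among stressed syllables).
    A value near zero would indicate additive effects.
    """
    def rate(stressed, final):
        return summary[(stressed, final)][0]

    return ((rate(False, False) - rate(False, True))
            - (rate(True, False) - rate(True, True)))


# Invented toy tokens (NOT corpus data), chosen only to show the computation.
TOY_TOKENS = [
    Syllable(False, False, True, 2, 0),   # unstressed nonfinal, omitted
    Syllable(False, False, True, 2, 0),   # unstressed nonfinal, omitted
    Syllable(False, False, False, 2, 1),  # unstressed nonfinal, half correct
    Syllable(False, False, False, 2, 1),  # unstressed nonfinal, half correct
    Syllable(True, False, False, 2, 2),   # stressed nonfinal, fully correct
    Syllable(False, True, False, 2, 2),   # unstressed final, fully correct
    Syllable(True, True, False, 3, 3),    # stressed final, fully correct
]

if __name__ == "__main__":
    table = summarize(TOY_TOKENS)
    for (stressed, final), (omit, acc) in sorted(table.items()):
        print(f"stressed={stressed!s:5} final={final!s:5} "
              f"omission={omit:.2f} accuracy={acc}")
    print("interaction contrast:", interaction_contrast(table))
```

A contrast well above zero in such a tabulation would correspond to the pattern described above, in which omission is concentrated in the cell that is both unstressed and nonfinal. The contrast is purely descriptive and would not replace the inferential tests reported in the article.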

ACKNOWLEDGMENTS

This research was supported by a Cognitive Science/Artificial Intelligence Fellowship and a Graduate College Dissertation Research Grant from the University of Illinois, by NRSA HD07109 from NICHD, and by a Summer Salary Award from the University Research Institute at the University of Texas to C. Echols, by NICHD Training Grant HD07205, by Sloan Foundation Grant 83-6-14 to Stanford University, and by NIH grant DC00167 to E. Newport and T. Supalla. We thank Renée Baillargeon, Ann Brown, Dedre Gentner, LouAnn Gerken, Lila Gleitman, Michael Kenstowicz, Molly Mack, Doug Medin, Richard Meier, and three anonymous reviewers for helpful discussion, and Ann Garber for important assistance with transcription.

REFERENCES

Aitchison, J. and C. Chiat (1981) "Natural Phonology or Natural Memory? The Interaction Between Phonological Processes and Recall Mechanisms," Language and Speech 24, 311-326.
Aksu-Koç, A. A. and D. I. Slobin (1985) "The Acquisition of Turkish," in D. I. Slobin, ed., The Crosslinguistic Study of Language Acquisition: Vol. 1. The Data, Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey.
Allen, G. D. and S. Hawkins (1978) "The Development of Phonological Rhythm," in A. Bell and J. B. Hooper, eds., Syllables and Segments, North-Holland, Amsterdam.
Allen, G. D. and S. Hawkins (1980) "Phonological Rhythm: Definition and Development," in G. Yeni-Komshian, J. F. Kavanagh, and C. A. Ferguson, eds., Child Phonology: Vol. 1. Production, Academic, New York.
Berman, R. A. (1977) "Natural Phonological Processes at the One-Word Stage," Lingua 43, 1-21.
Bever, T. G. (1970) "The Cognitive Basis for Linguistic Structures," in J. R. Hayes, ed., Cognition and the Development of Language, Wiley, New York.
Blasdell, R. and P. Jensen (1970) "Stress and Word Position as Determinants of Imitation in First Language Learners," Journal of Speech and Hearing Research 13, 193-202.
Broselow, E. (1984) "An Investigation of Transfer in Second Language Phonology," International Review of Applied Linguistics 22, 253-269.
Brown, R. (1973) A First Language: The Early Stages, Harvard University Press, Cambridge, Massachusetts.

Chomsky, N. and M. Halle (1968) The Sound Pattern of English, Harper & Row, New York.
Clancy, P. M. (1985) "The Acquisition of Japanese," in D. I. Slobin, ed., The Crosslinguistic Study of Language Acquisition: Vol. 1. The Data, Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey.
Cole, R. A. and J. Jakimik (1980) "A Model of Speech Perception," in R. A. Cole, ed., Perception and Production of Fluent Speech, Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey.
Cooper, W. E. (1983) "The Perception of Fluent Speech," Annals of the New York Academy of Science 405, 48-63.
Cutler, A. and D. J. Foss (1977) "On the Role of Sentence Stress in Sentence Processing," Language and Speech 20, 1-10.
Du Preez, P. (1974) "Units of Information in the Acquisition of Language," Language and Speech 17, 369-376.
Echols, C. H. (1987) A Perceptually-Based Model of Children's First Words, Doctoral dissertation, University of Illinois, Champaign-Urbana.
Echols, C. H. (1988) "The Role of Stress, Position and Intonation in the Representation and Identification of Early Words," Papers and Reports on Child Language Development 27, 39-46.
Echols, C. H. (1992) "A Perceptually-Based Model of Children's Earliest Productions," ms., submitted for publication.
Ferguson, C. A. (1976) "Learning to Pronounce: The Earliest Stages of Phonological Development in the Child," Papers and Reports on Child Language Development 11, 1-27.
Fernald, A. and P. K. Kuhl (1987) "Acoustic Determinants of Infant Preference for Motherese Speech," Infant Behavior and Development 10, 181-195.
Fernald, A. and C. Mazzie (1991) "Prosody and Focus in Speech to Infants and Adults," Developmental Psychology 27, 209-221.
Feurer, H. (1980) "Morphological Development in Mohawk," Papers and Reports on Child Language Development 18, 25-42.
Frazier, L. (1987) "Structure in Auditory Word Recognition," Cognition 25, 157-187.
Frumhoff, P., C. H. Echols, and E. L. Newport (1992) "Perceptual Salience and Operating Principles for Language Acquisition: The Effects of Stress and End of Word," ms., University of Texas, Austin.
Gerken, L. A., B. Landau, and R. Remez (1990) "Function Morphemes in Young Children's Speech Perception and Production," Developmental Psychology 26, 204-216.
Gerken, L. A. and B. J. McIntosh (1991) "Single Word Talkers Can Distinguish Grammatical and Ungrammatical Function Morphemes," ms., State University of New York, Buffalo.
Gleitman, L. R., H. Gleitman, B. Landau, and E. Wanner (1988) "Where Learning Begins: Initial Representations for Language Learning," in F. J. Newmeyer, ed., Linguistics: The Cambridge Survey: Vol. 3. Language: Psychological and Biological Processes, Cambridge University Press, Cambridge, England.
Gleitman, L. R. and E. Wanner (1982) "Language Acquisition: The State of the State of the Art," in E. Wanner and L. R. Gleitman, eds., Language Acquisition: The State of the Art, Cambridge University Press, Cambridge, England.
Grosjean, F. and J. P. Gee (1987) "Prosodic Structure and Spoken Word Recognition," Cognition 25, 135-155.
Hayes, J. R. and H. H. Clark (1970) "Experiments on the Segmentation of an Artificial Speech Analog," in J. R. Hayes, ed., Cognition and the Development of Language, Wiley, New York.
Hirsh-Pasek, K., D. G. Kemler Nelson, P. W. Jusczyk, K. Wright-Cassidy, B. Druss, and L. Kennedy (1987) "Clauses are Perceptual Units for Young Infants," Cognition 26, 269-286.
Hochberg, J. G. (1988) "First Steps in the Acquisition of Spanish Stress," Journal of Child Language 15, 273-292.

Ingram, D. (1976) "Phonological Analysis of a Child," Glossa 10, 3-27.
Ingram, D. (1978) "The Role of the Syllable in Phonological Development," in A. Bell and J. B. Hooper, eds., Syllables and Segments, North-Holland, Amsterdam.
Ingram, D. (1992) "Early Phonological Acquisition: A Cross-Linguistic Perspective," in C. A. Ferguson, L. Menn, and C. Stoel-Gammon, eds., Phonological Development: Models, Research, Implications, York, Monkton, Maryland.
Jakobson, R. (1968) Child Language, Aphasia and Phonological Universals (A. R. Keiler, Trans.), Mouton, The Hague. (Original work published 1941)
Jusczyk, P. W., K. Hirsh-Pasek, D. G. Kemler Nelson, L. J. Kennedy, A. Woodward, and J. Piwoz (1992) "Perception of Acoustic Correlates of Major Phrasal Units by Young Infants," Cognitive Psychology 24, 252-293.
Jusczyk, P. W. and E. Thompson (1978) "Perception of a Phonetic Contrast in Multisyllabic Utterances by Two Month Olds," Perception and Psychophysics 2, 105-109.
Kahn, D. (1980) "Syllable-Structure Specifications in Phonological Rules," in M. Aronoff and M. L. Kean, eds., Juncture: A Collection of Original Papers, Anma Libri, Saratoga, California.
Karzon, R. G. (1985) "Discrimination of Polysyllabic Sequences by One- to Four-Month-Old Infants," Journal of Experimental Child Psychology 39, 326-342.
Kean, M. L. (1977) "The Linguistic Interpretation of Aphasic Syndromes: Agrammatism in Broca's Aphasia, An Example," Cognition 5, 9-46.
Kemler Nelson, D. G. (1989) "Developmental Trends in Infants' Sensitivity to Prosodic Cues Correlated with Linguistic Units," paper presented at the biennial meeting of the Society for Research in Child Development, Kansas City, Missouri.
Klein, H. B. (1981a) "Early Perceptual Strategies for the Replication of Consonants from Polysyllabic Lexical Models," Journal of Speech and Hearing Research 24, 535-551.
Klein, H. B. (1981b) "Productive Strategies for the Pronunciation of Early Polysyllabic Lexical Items," Journal of Speech and Hearing Research 24, 389-405.
Ladefoged, P. (1982) A Course in Phonetics, 2nd ed., Harcourt Brace Jovanovich, San Diego, California.
Lehiste, I. (1970) Suprasegmentals, MIT Press, Cambridge, Massachusetts.
Macken, M. A. (1980) "The Child's Lexical Representation: The Puzzle-Puddle-Pickle Evidence," Journal of Linguistics 16, 1-17.
MacWhinney, B. (1978) "The Acquisition of Morphophonology," Monographs of the Society for Research in Child Development 43 (1-2, Serial No. 174).
MacWhinney, B. (1985) "Hungarian Language Acquisition as an Exemplification of a General Model of Grammatical Development," in D. I. Slobin, ed., The Crosslinguistic Study of Language Acquisition: Vol. 2. Theoretical Issues, Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey.
Mehler, J., J. Bertoncini, M. Barriere, and D. Jassik-Gerschenfeld (1978) "Infant Recognition of Mother's Voice," Perception 7, 491-497.
Menn, L. (1978) "Phonological Units in Beginning Speech," in A. Bell and J. B. Hooper, eds., Syllables and Segments, North-Holland, Amsterdam.
Morgan, J. L. (1986) From Simple Input to Complex Grammar, MIT Press, Cambridge, Massachusetts.
Morgan, J. L., R. P. Meier, and E. L. Newport (1987) "Structural Packaging in the Input to Language Learning: Contributions of Prosodic and Morphological Marking of Phrases," Cognitive Psychology 19, 498-550.
Morgan, J. L. and E. L. Newport (1981) "The Role of Constituent Structure in the Induction of an Artificial Language," Journal of Verbal Learning and Verbal Behavior 20, 67-85.
Neter, J., W. Wasserman, and M. H. Kutner (1985) Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs, Irwin, Homewood, Illinois.


Oller, D. K. and R. E. Rydland (1974) "Note on the Stress Preferences of Young English-Speaking Children," ms., Mailman Center for Child Development, Miami, Florida.
Paccia-Cooper, J. and W. E. Cooper (1981) "The Processing of Phrase Structure in Speech Production," in P. E. Eimas and J. L. Mills, eds., Perspectives on the Study of Speech, Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey.
Peters, A. M. (1977) "Language Learning Strategies: Does the Whole Equal the Sum of the Parts?" Language 53, 560-573.
Peters, A. M. (1981) "Language Typology and the Segmentation Problem in Early Child Language Acquisition," Proceedings of the Seventh Annual Meeting of the Berkeley Linguistics Society 7, 236-248.
Peters, A. M. (1983) The Units of Language Acquisition, Cambridge University Press, Cambridge, England.
Peters, A. M. and L. Menn (1990) "The Microstructure of Morphological Development: Variation Across Children and Across Languages," ms., University of Colorado, Boulder.
Poser, W. J. (1984) The Phonetics and Phonology of Tone and Intonation in Japanese, Doctoral dissertation, Massachusetts Institute of Technology, Cambridge.
Pullum, G. K. and W. A. Ladusaw (1986) Phonetic Symbol Guide, University of Chicago Press, Chicago, Illinois.
Pye, C. (1983) "Mayan Telegraphese: Intonational Determinants of Inflectional Development in Quiché Mayan," Language 59, 583-604.
Pye, C. (in press) "The Acquisition of K'iche' Maya," in D. I. Slobin, ed., The Crosslinguistic Study of Language Acquisition: Vol. 3, Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey.
Selkirk, E. (1984) Phonology and Syntax, MIT Press, Cambridge, Massachusetts.
Shepard, R. N. (1972) "Psychological Representation of Speech Sounds," in E. E. David and P. B. Denes, eds., Human Communication: A Unified View, McGraw-Hill, New York.
Slobin, D. I. (1973) "Cognitive Prerequisites for the Development of Grammar," in C. A. Ferguson and D. I. Slobin, eds., Studies of Child Language Development, Holt, Rinehart & Winston, New York.
Slobin, D. I. (1982) "Universal and Particular in the Acquisition of Language," in E. Wanner and L. R. Gleitman, eds., Language Acquisition: The State of the Art, Cambridge University Press, Cambridge, England.
Slobin, D. I., ed. (1985) The Crosslinguistic Study of Language Acquisition: Vol. 1. The Data, Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey.
Smoczyńska, M. (1985) "The Acquisition of Polish," in D. I. Slobin, ed., The Crosslinguistic Study of Language Acquisition: Vol. 1. The Data, Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey.
Stampe, D. (1969) "The Acquisition of Phonetic Representation," in R. I. Binnick, A. Davison, G. M. Green, and J. L. Morgan, eds., Papers from the Fifth Regional Meeting of the Chicago Linguistics Society, University of Chicago, Chicago, Illinois.
Stein, J. and P. Y. Su, eds. (1978) The Random House Dictionary, Ballantine, New York.
Vihman, M. M. (1981) "Phonology and the Development of the Lexicon: Evidence from Children's Errors," Journal of Child Language 8, 239-264.
Waibel, A. (1986) "Suprasegmentals in Very Large Vocabulary Word Recognition," in E. C. Schwab and H. C. Nusbaum, eds., Pattern Recognition by Humans and Machines: Vol. 1. Speech Perception, Academic, Orlando, Florida.
Waterson, N. (1971) "Child Phonology: A Prosodic View," Journal of Linguistics 7, 179-211.
Weist, R. M. (1983) "Prefix versus Suffix Information in the Comprehension of Tense and Aspect," Journal of Child Language 10, 85-96.
Woodward, J. Z. and R. N. Aslin (1990) "Segmentation Cues in Maternal Speech to Infants," poster presented at the International Conference on Infancy Studies, Montreal, Canada.