

Vowel dynamics in the acquisition of L2 English – a cross-sectional and longitudinal study of L1 Polish learners

Geoffrey Schwartz, Kamil Kaźmierski, Jarosław Weckwerth, Mateusz Jekiel, Kamil Malarski
Faculty of English, UAM Poznań

0 Abstract

This paper presents a cross-sectional and longitudinal study of the acquisition of vowel formant dynamics in L2 (British) English by Polish learners. Results from these experiments, along with baseline data from L1 English and L1 Polish, suggest that the acquisition of English vowels by Polish learners entails a temporal reorganization such that vocalic targets in English are housed later in a vowel’s duration than they are in Polish or Polish-accented English. This leads to an increase in the degree of formant movement, particularly through the 2nd and 3rd fifths of vowel duration, over the course of L2 acquisition. Implications of these findings for cross-language phonetic similarity, a crucial concept for current theoretical models of L2 speech, are also discussed.

1 Introduction

The acquisition of the vowel system of English as a second language (L2) has been the subject of a large number of experimental studies (for literature reviews on L2 speech, see e.g. Zampini 2008 and Bohn 2017). Most frequently, this research is spawned by differences between the inventory of vowels found in the English system and those of the first languages (L1) of the learner groups under study. L2 English vowel acquisition therefore presents a clear set of questions that may be developed into hypotheses for empirical research. Since English includes typologically uncommon vowels such as /æ/, /ɒ/, and /ʌ/, as well as difficult contrasting pairs that are in relatively close proximity on two-dimensional vowel charts (/iː/-/ɪ/; /uː/-/ʊ/), experimental studies have looked at the perception and production of vowels by learners whose L1 vowel systems lack these elements.
In particular, the long-short (tense-lax) contrast in high front vowels (/iː/-/ɪ/; keywords FLEECE-KIT (Wells 1982)), as well as the realization of the low front vowel /æ/ (TRAP), have been the focus of L2 studies of learners from a number of L1 backgrounds, including Spanish (Escudero & Boersma 2004), German (e.g. Bohn & Flege 1992), Mandarin (e.g. Flege et al. 1997), and Polish (e.g. Gonet et al. 2010; Rojczyk 2011; Weckwerth 2011).¹ A common motif underlying all of this research is the degree to which learners treat L1 and L2 vowel items as equivalent. Bohn (2017) refers to this problem as a ‘mapping’ issue. For example, Spanish has only a single high front vowel /i/ and no duration contrasts, so researchers have investigated the interlanguage mapping between two L2 categories and the single L1 vowel. An interesting finding is that despite the fact that their L1 has no vowel duration contrasts, Spanish speakers use duration in their perception and production of the pair to a greater degree than native speakers, who rely more on the static position of the vowels in two-dimensional acoustic space (Escudero & Boersma 2004). Bohn & Flege (1992) and Rojczyk (2011) describe similar findings for the production of /æ/ by L1 speakers of German and Polish, respectively. In both of those cases, learners appear to be less sensitive than natives to the position of this vowel in acoustic space, relying instead on duration to produce a contrast with the nearest L1 vowel with an /ɛ/-like quality.

¹ The TRAP vowel in British English has been shown to be undergoing a process of lowering and retraction (Hawkins & Midgley 2005), but is still described as a front vowel in most textbooks of English pronunciation.

To explain such findings, Bohn (1995) hypothesized that learners may become ‘desensitized’ by phonetic properties they rely on in their L1, and attend to new types of acoustic information that are not used in their native system. Thus, when Spanish listeners hear beat-bit, or Polish listeners hear bet-bat, learners whose L1 spectral associations encompass the L2 pairs in a single category start to attend to new acoustic information, in these cases duration, that is not relevant in L1. This idea is incorporated into Hypotheses 2 and 3 of Flege’s Speech Learning Model (SLM; Flege 1995), by which acoustic differences contribute to the formation of L2 phonetic categories, especially if those differences are not subject to ‘equivalence classification’ (Flege 1987) by learners. These issues bear on the role of cross-language phonetic similarity that is crucial to current models of L2 speech acquisition, including the SLM, as well as the Perceptual Assimilation Model (PAM; Best 1995; PAM-L2, Best & Tyler 2007) and the Second Language Linguistic Perception model (L2LP; Escudero 2005). Notably, the studies discussed above, which constitute just a small sample of a larger body of work, contribute to our understanding of similarity by engaging in direct comparisons of different types of acoustic features. While these studies and the models they have tested have added to our understanding of L2 speech acquisition, their contribution may be limited somewhat by assumptions concerning equivalence and similarity of phonological segments in L1 and L2 (see Bohn 2017).
There is more to vowels than their duration and static position in F1-F2 space, and other phonetic properties may contribute to the cross-language phonological differences that present challenges for L2 learners. For example, the English vowel system has been shown to exhibit a fairly large degree of vowel inherent spectral change (VISC; e.g. Nearey and Assmann 1986), that is, changes in vowel quality over the time course of a vowel’s duration, in the production of nominal monophthongs (diphthongs, of course, are also characterized by a large degree of formant movement). At the same time, perceptual studies have shown that VISC is an important element of vowel identification by native listeners of English (see Hillenbrand 2013). Given its prevalence in both perception and production, it is reasonable to suggest that vowel inherent change constitutes a systemic aspect of the sound system of many varieties of English, which in turn may present a challenge for L2 acquisition. This is especially the case for speakers of L1s whose vowel systems are characterized by relatively stable quality. Not all languages exhibit as much VISC as English (e.g. Williams et al. 2015), so a question for investigation is the degree to which the dynamic aspects of English vowel quality are acquired by L2 learners whose L1 is characterized by more steady vowel quality. This paper will present acoustic data from vowel productions by L1 Polish learners of English in an L1-dominant university setting. Polish is a language with a small vowel system and relatively stable vowel quality (Dukiewicz & Sawicka 1995). Thus, our goal is to gain insight into the acquisition of formant dynamics by speakers whose L1 is characterized by a relatively small amount of formant movement. Two separate acoustic studies of L2 speakers will be presented.
The first is a cross-sectional study comparing first year students at the beginning of their university studies to faculty members and PhD students from the same university. The second is a longitudinal study tracking vowel production by learners over the course of an academic year in which they receive extensive explicit instruction in both English pronunciation and theoretical aspects of the phonology and phonetics of English. These studies are supplemented by baseline data from L1 Polish and L1 English. The setting of our study is important in that it provides a unifying factor for our experimental groups. A preponderance of studies of second language speech acquisition is based on migrant communities immersed in an L2 environment, introducing a great deal of variability with regard to age, amount of instruction (if any), language background, and other factors. Since our study has as participants university students and faculty members affiliated with a single university’s program in English studies, our speaker groups are relatively homogeneous in terms of their motivation, as well as exposure to specific phonetic aspects of the L2.

The rest of this paper will proceed as follows. Section 2 provides background on vowel inherent spectral change. Section 3 presents the acoustic studies conducted to retrieve baseline data of VISC measures in both L1 Polish and L1 English. Section 4 describes the cross-sectional study. Section 5 describes the longitudinal study. Section 6 concludes with a discussion of the implications of formant dynamics for the notion of L1-L2 phonological similarity.

2 Static vs. dynamic specification of vowel quality

Traditionally, vowels are represented in terms of their position on a two-dimensional chart containing a height dimension and a front-back dimension, which correspond to putative ‘target’ realizations in terms of tongue body position, and its acoustic consequences. Acoustically, the height dimension is correlated with the first formant (F1) and the front-back dimension is correlated with the second formant (F2).
In early acoustic analyses, researchers looked for a steady-state portion somewhere near vowel midpoint, in which F1 and F2 exhibited flat trajectories, that was taken to contain the acoustic target of the given vowel. In the literature on vowel perception (see Strange 1989), this was referred to as the ‘static target’ approach, which essentially operated on the assumption that the perceptual identity of a vowel was a function of its static position in F1-F2 space. In the second half of the 20th century, a number of studies of the perception and production of vowels in North American English quickly cast doubt on the static target. In one study, Peterson and Lehiste (1960) noticed vowel-based differences in formant trajectories. In later work (e.g. Nearey & Assmann 1986), these observations were documented in more detail, and certain generalizations were formulated with regard to formant movement. Notably, it was observed that so-called ‘tense’ vowels tend to show movement toward the periphery of the acoustic vowel space, while ‘lax’ vowels are characterized by movement toward the center (see e.g. Nearey 2013: 52-54). On the basis of this work, the term Vowel Inherent Spectral Change (VISC) was coined, and VISC was hypothesized to be a truly inherent aspect of the vowels of North American English. To support this hypothesis, researchers often employed discriminant analyses, establishing formant movement, in addition to static formant targets, as a significant predictor of vowel identity.


It may be suggested that VISC is related to the effects of neighboring consonants. In several early studies of vowel production, authors (e.g. Lindblom 1963 for Swedish; Stevens & House 1963 for American English) documented ‘target undershoot’ in CVC contexts. Under the influence of neighboring consonants, canonical formant targets associated with vowels produced in isolation very often are not reached. The target undershoot problem became a focus for research into vowel perception, which asked how identification could remain constant even when the acoustics of a given vowel showed a great deal of consonant-induced variability. One hypothesis that came out of this research was termed the ‘dynamic specification’ approach (Strange 1989), according to which formant trajectories over the duration of the vowel provide listeners with crucial cues for vowel perception. To test this hypothesis, an experimental paradigm was developed in which naturally produced stimuli were altered by silencing various parts of a vowel’s duration, allowing researchers to investigate in a controlled fashion the role of formant dynamics in vowel identification. In one such stimulus condition, referred to as the Silent Center condition (SC; e.g. Strange et al. 1983), the central quasi-steady-state portion of the vowel is silenced, leaving listeners to identify vowels on the basis of CV and VC transitions. Silent Center tokens are compared for perception accuracy with tokens in which only the central portion of the vowel is included, with others in which only the CV or VC transitions are included, and with unmodified tokens. A consistent finding in these experiments on North American English was that the SC tokens were identified most accurately of all the modified stimuli, with error rates often not significantly higher than unmodified tokens (Strange 1989; Jenkins and Strange 1999).
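The stimulus manipulation behind the Silent Center condition can be illustrated with a minimal sketch. It is not the procedure of any of the cited studies: the function name `silent_center`, the parameter `keep_fraction`, and the choice to retain a quarter of the vowel at each edge are all assumptions made for illustration, and real stimuli would normally be prepared in a speech editor with care taken at the cut points.

```python
def silent_center(samples, vowel_start, vowel_end, keep_fraction=0.25):
    """Return a copy of `samples` in which the middle of the vowel is zeroed.

    samples       -- sequence of audio sample values
    vowel_start   -- index of the first vowel sample
    vowel_end     -- index one past the last vowel sample
    keep_fraction -- hypothetical share of the vowel retained at EACH edge
                     (the CV and VC transitions); not taken from the paper
    """
    if not 0 <= vowel_start < vowel_end <= len(samples):
        raise ValueError("vowel boundaries must lie inside the signal")
    if not 0.0 <= keep_fraction <= 0.5:
        raise ValueError("keep_fraction must be between 0 and 0.5")

    vowel_length = vowel_end - vowel_start
    edge = int(round(vowel_length * keep_fraction))

    modified = list(samples)  # leave the caller's data untouched
    for i in range(vowel_start + edge, vowel_end - edge):
        modified[i] = 0.0     # silence the quasi-steady-state portion
    return modified


# Toy signal: 20 samples, with the vowel occupying indices 4..15.
token = [1.0] * 20
sc_token = silent_center(token, vowel_start=4, vowel_end=16)
```

Setting `keep_fraction` to 0.5 leaves the token unmodified, which gives a convenient control condition within the same function.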
These studies suggest a significant role for formant movement in English vowel perception, in which CV and VC transitions together are an important cue to vowel identity.

In recent years, VISC has become increasingly prevalent in descriptions of sociophonetic variation in a number of varieties of English. For example, one set of studies described in Fox & Jacewicz (2009) and Jacewicz & Fox (2013) found that younger North American speakers, both in Northern and Southern dialect areas, show a lesser amount of formant movement than older speakers, measured in terms of the sum of Euclidean distances over four vowel-internal intervals. Williams & Escudero (2014) compared vowel qualities in Southern British English and Sheffield English. They found that adding formant trajectories, described in terms of Discrete Cosine Transform coefficients, to discriminant analyses based on mean formant values increased classification accuracy for both dialects. In a similar vein, Elvin et al. (2016) looked at vowel formant dynamics in the dialect of English spoken in Western Sydney, Australia, and found an important role of VISC in the classification of vowel identities of both diphthongs and nominal monophthongs.

Although VISC is gaining an increasingly prominent position in descriptions of English vowels, cross-language studies are relatively uncommon. Williams et al. (2015) compared the production of vowels in Southern British English and Dutch, and found that spectral change was a better predictor of vowel identity in the former than in the latter. With regard to perception, Schwartz et al. (2016) employed the Silent Center paradigm with L1 Polish learners of English both in their L1 and L2. While the SC items were identified most accurately in L2 English, Polish listeners showed no dynamic specification effects in L1 perception, with constant identification accuracy regardless of the portion of the vowel they heard. Taken together, these studies suggest that VISC is perceptually more important in English than in Dutch or Polish.

A small amount of research has been devoted to VISC in second language speech. Jin & Liu (2013) compared native American English speakers to L1 Chinese and Korean learners. They found that the L1 Chinese speakers exhibited the greatest degree of VISC. Research described by Rogers et al. (2013) investigated L1 Spanish speakers’ reliance on formant trajectories in vowel perception in L2 English, as well as VISC in production. One finding was that early bilinguals’ performance identifying Silent Center tokens was quite accurate (around 75%), yet still lower than that of monolinguals, while late learners of English showed much higher error rates (Rogers et al. 2013: 239, Fig. 2). With regard to production, native speakers and early bilinguals produced very similar formant trajectories, in which VISC served to distinguish vowels that are housed in close proximity in the vowel space, while late learners of English produced a lot of acoustic overlap, particularly in the case of the front vowels over 3 different measurement points (Rogers et al. 2013: 248, Fig. 6). Finally, Schwartz (2015) presents a pilot acoustic and perceptual study of English vowels produced by Polish learners, and found that productions of the FLEECE and TRAP vowels with a greater magnitude of formant movement were rated more ‘native-like’ by L1 English listeners.

This paper seeks to contribute to the relatively sparse literature on the production of VISC in English as an L2. We look at both cross-sectional and longitudinal data from Poles who are learning/have learned English in a formal university setting.
Given the relatively stable quality of Polish vowels, and the low weight of formant movement for Polish vowel perception described above, these experiments represent a case study into what happens when speakers of an L1 with a negligible amount of VISC come into contact with English. Before describing the L2 data, however, it is necessary to provide a more detailed comparison of the vowel systems of Polish and English, as well as baseline data on formant dynamics in the two L1s.

3 Polish and English vowels: baseline data on formant dynamics

This section will provide baseline data on formant dynamics in L1 Polish and L1 English. Before doing so, however, it is necessary to provide some basic information about the Polish vowel system (we assume that the English vowel system is familiar enough to readers).

3.1 Overview

Polish has a system of six oral vowels, which are transcribed /i ɨ ɛ a ɔ u/ in most descriptions. There are no phonological duration contrasts in the language. There are also two graphemes <ą> and <ę>, which are said to correspond to nasalized vowels, but will not be discussed in this paper. Some additional comment on the vowel /ɨ/ in Polish is necessary at this time. The vowel (spelled <y> in Polish orthography) is subject to certain distributional restrictions. Most notably it is absent from word-initial position, and does not occur after palatal and velar consonants /j ɲ ɕ ʑ tɕ dʑ k g/, with the exception of the velar fricative /x/. For this reason, some phonological analyses of Polish see this vowel as an allophone of /i/ (e.g. Feldstein & Franks 2002). However, Rydzewski (2017) argues against this approach and in favor of phonemic status for the vowel. Rydzewski’s phonological arguments are bolstered by the fact that the sound clearly has psychological reality for Polish speakers, as it is rendered by its own letter in the orthography of the language. The phonemic status of /ɨ/ is embraced by nearly all recent treatments (e.g. Jassem 2003; Gussmann 2007; Rubach 2007).

Comparing Polish and English, we may establish correspondences between the two languages with regard to the assumed perceptual similarity of individual vowels between the two vowel systems. In other words, we seek to establish cross-linguistic pairings between the two languages, which will allow us to evaluate theoretical postulates with respect to cross-linguistic similarity. These correspondences will constitute the primary area of comparison for our acoustic studies. In selecting these pairings, we excluded two Polish vowels, /u/ and /ɔ/, because it is unclear which English vowels they correspond to. Since English features two vowels in the high-back region (GOOSE and FOOT), we expect single category assimilation by Polish learners in accordance with the predictions of the PAM (Best & Tyler 2007), and indeed Polish listeners show a great deal of confusion in distinguishing this pair. Polish /ɔ/ is excluded since its static location in F1-F2 space falls roughly halfway in between the British English THOUGHT and LOT vowels, and there is no clear choice as to which of these represents the best correspondent.
These exclusions leave four Polish vowels, /i ɨ ɛ a/, to be compared with four English vowels, FLEECE-KIT-DRESS-TRAP, in terms of formant dynamics. For a visualization of the cross-linguistic differences in formant dynamics, consider Figure 1, which presents formant tracks from tokens of the four vowel pairings produced by a C2-level Polish speaker of English both in L2 English (left) and L1 Polish (right). The tracks are taken from CVC words in a coronal environment, over the 20-80% portion of vowel duration. In the figure, the size of the dots increases over the time-course of the vowel, yielding a comet-like appearance. It is evident from this figure that the English items show a great deal more formant movement. The goal of our empirical studies is to document the extent to which this pattern holds over wider samples of speakers and items.


Figure 1 – 20-80% formant trajectories of paired vowels in English (left) and in Polish (right), produced by a C2-level Polish speaker of English

3.2 Procedure

The baseline data we present are taken from two sources. For L1 Polish, we made citation form recordings of 24 Polish students of English producing the target vowels in CVC words. Although these speakers had a fairly high level of proficiency in English (B2), the recording session was carried out entirely in Polish, which for the most part may be assumed to have prevented language mixing effects (Grosjean 1998; Antoniou et al. 2010). Recordings were made in a sound-treated room at the English department of a Polish university. Items were presented one at a time on a monitor located within the recording booth using Speech Recorder (Draxler & Jänsch 2015). The coda consonant in these words was always coronal, while the onset consonants were counterbalanced for place of articulation, including labial, coronal, and dorsal onsets. A total of 971 L1 Polish items are included for analysis, after manual annotation in the Praat program (Boersma & Weenink 2017).

Unfortunately, it was not feasible for us to collect a representative sample of vowels from L1 English speakers, since in the city in which we are based there is no homogeneous group of native speakers of English. For this reason, acoustic measurements for British English native speakers were conducted on the Aix-MARSEC corpus (Auran et al. 2008). A more richly annotated and machine-readable extension of the SEC (Spoken English Corpus), the Aix-MARSEC corpus is a database of spoken British English containing over five hours of speech produced by 68 speakers. It is “a collection of BBC recordings from the 1980s, grouping eleven different radio speech styles ranging from news and interviews to poetry reading” (Auran et al. 2004), and it comprises 52999 word tokens and 7578 distinct word types. The annotations, crucially for our purposes, include a phonemic annotation layer. All instances of the FLEECE, KIT, TRAP and DRESS vowels were retrieved by querying the phonemic annotation layer.
Only items 70ms and longer were kept in order to filter out unstressed vowels, facilitating comparison with the L1 Polish citation form data.


3.3 Acoustic analysis and statistical tests

A Praat script was used to segment each vowel item into 5 equal intervals, each encompassing 20% of the vowel’s duration. As is standard practice in studies of VISC (see e.g. Williams & Escudero 2014), the first and fifth intervals (0-20% and 80-100%) were excluded from the analysis in an effort to minimize the acoustic effects of neighboring consonants. The script extracted two measures of formant movement for the remaining intervals (2nd, 3rd, 4th): F1-F2 Euclidean distance measured in Bark, and the slope of individual formants measured in units of Bark/100msec. Since formant slope measures are often negative, the absolute value of the slope measures was also calculated. Table 1 provides a summary of the acoustic measures used in the baseline analysis, as well as the L2 analyses to be described later.

These acoustic measures served as dependent variables for a series of linear mixed effects models carried out with the help of the SPSS statistical software (IBM Corporation 2013), with Language as the main fixed factor of interest. Onset consonant and Speaker were included as random factors in all analyses. For Euclidean distance measures, analyses were collapsed across all four vowel pairings, and Vowel was treated as an additional random factor. This was done to get a picture of the overall degree of formant movement in the system as a whole. Formant slope measures are examined on a by-vowel basis, with the Language*Vowel interaction as the main fixed factor of interest. Models presented for all of the results in the study constituted the best fit according to the Akaike Information Criterion.

Measure | Description | Unit of measure | Comment
Euclidean Distances (2nd, 3rd and 4th intervals) | Magnitude of formant excursion in F1-F2 space for a given interval | Bark | Used as a global measure of formant movement
F1 Slope (2nd, 3rd and 4th intervals) | Mean rate of change of F1 for a given interval | Bark/100msec | Used to describe trajectories of individual vowels
F2 Slope (2nd, 3rd and 4th intervals) | Mean rate of change of F2 for a given interval | Bark/100msec | Used to describe trajectories of individual vowels

Table 1 – Acoustic measures used in baseline study, as well as subsequent studies
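As an illustration of how the measures in Table 1 might be computed, the following sketch works from six formant samples taken at 0%, 20%, ..., 100% of a vowel. It is not the authors' Praat script. Two points are assumptions: the Hz-to-Bark conversion uses the Traunmüller (1990) approximation, since the text does not say which formula was applied, and the Euclidean distance is taken between the start and end points of each interval. The names `hz_to_bark` and `interval_measures` are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    # Traunmueller (1990) approximation. Assumption: the paper does not
    # state which Hz-to-Bark formula was used.
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def interval_measures(f1_hz, f2_hz, duration_ms):
    """Compute per-interval VISC measures for one vowel token.

    f1_hz, f2_hz -- six formant values (Hz) sampled at 0, 20, ..., 100%
    duration_ms  -- total vowel duration in milliseconds
    Returns one dict for each retained interval (2nd, 3rd, 4th).
    """
    if len(f1_hz) != 6 or len(f2_hz) != 6:
        raise ValueError("expected six samples (0-100% in 20% steps)")
    if duration_ms <= 0:
        raise ValueError("duration must be positive")

    f1 = [hz_to_bark(f) for f in f1_hz]
    f2 = [hz_to_bark(f) for f in f2_hz]
    interval_ms = duration_ms / 5.0  # each interval spans 20% of the vowel

    rows = []
    for k in (1, 2, 3):  # skip the 1st (0-20%) and 5th (80-100%) intervals
        d_f1 = f1[k + 1] - f1[k]
        d_f2 = f2[k + 1] - f2[k]
        f1_slope = d_f1 / interval_ms * 100.0  # Bark per 100 ms
        f2_slope = d_f2 / interval_ms * 100.0
        rows.append({
            "interval": k + 1,
            "euclid_bark": math.hypot(d_f1, d_f2),  # F1-F2 excursion in Bark
            "f1_slope": f1_slope,
            "f2_slope": f2_slope,
            "abs_f1_slope": abs(f1_slope),
            "abs_f2_slope": abs(f2_slope),
        })
    return rows


# Invented values for a single token, for illustration only.
for row in interval_measures(
    f1_hz=[420.0, 400.0, 380.0, 360.0, 350.0, 360.0],
    f2_hz=[1900.0, 2000.0, 2100.0, 2180.0, 2220.0, 2200.0],
    duration_ms=150.0,
):
    print(row)
```

Because the slopes are normalized by interval length, they are comparable across tokens of different durations, whereas the Euclidean distances are not.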

Before presenting the results, a few additional comments on the acoustic measures are necessary at this time. First, we collapse our results across gender in this study, which is feasible using Bark, an auditory transform, rather than raw acoustic measures in Hertz. The next point concerns the role of duration. For the F1-F2 Euclidean distance measures, duration may be expected to be a confounding factor, since longer vowels offer more room for larger formant excursions. For this reason, vowel duration is included as a covariate in the analyses of Euclidean distances. By contrast, the formant slope measures capture rate of change, which offers a built-in control of vowel duration in its quantification.

3.4 Results

We start by presenting results of Euclidean distance measures collapsed across vowels, which are included as a random factor, along with Speaker and Onset consonant. Figure 2 presents a summary of the three Euclidean distance measures for the intervals of interest, sorted for language. It is evident that the largest difference between the two languages is found in the 2nd and 3rd intervals (20-60% of vowel duration), while in the 4th interval, the difference in movement between the two languages is much less dramatic. The mixed-effects regression analyses found that the language-based difference was significant in the 2nd and 3rd intervals (Tables 2 and 3), but not the fourth.

Figure 2 – Mean Euclidean distance measures (in Bark) in 2nd, 3rd, and 4th vowel intervals, sorted for language; error bars represent 95% confidence intervals

Euclid 2nd | Coeff. | Std. Error | t
Intercept | .434 | .173 | 2.51
Language: Polish | -.336 | .094 | -3.57
Duration | 1.691 | .202 | 8.36

Table 2 – Second interval Euclidean distance results in baseline study

Sig. .012