Sociophonetic Learning in L1 and L2

Christian Langstrof

Table of Contents

Acknowledgements
Summary of the Work (Zusammenfassung der Arbeit in deutscher Sprache)
1 Introduction
2 Sociophonetic Learning in L2
  2.1 Background
  2.2 Method
    2.2.1 US Condition
    2.2.2 UK Condition
  2.3 Results: US Condition
    2.3.1 US Condition – Occupation
    2.3.2 US Condition – Age
    2.3.3 US Condition – Likeability
    2.3.4 Interim Discussion
  2.4 UK Condition
    2.4.1 UK Condition – Occupation
    2.4.2 UK Condition – Age
    2.4.3 UK Condition – Likeability
    2.4.4 Discussion: Perception, UK Condition
  2.5 Results II – Speech Production and Factor Analysis
  2.6 Overall Discussion
  2.7 Conclusion
3 Sociophonetic Learning in the Lab
  3.1 Background
  3.2 Previous Studies
  3.3 Method
  3.4 Results
  3.5 Discussion
  3.6 Caveats and Outlook
4 Variant Learning in L1 and L2
  4.1 Introduction
  4.2 Method
  4.3 Results
  4.4 Interim Discussion
  4.5 Further Pilot Studies
  4.6 Interim Discussion
  4.7 Chapter Discussion
5 Looking Back and Ahead
Appendix to Chapters 3 and 4: A Stochastic Afterthought – What is ‘successful learning’?
References

Acknowledgements

First of all, my heartfelt thanks go to my colleagues at the University of Freiburg’s Philologische Fakultät, and to the members of the Department of English in particular, who, apart from providing helpful professional feedback and advice on my research over the years, have made my professional environment a pleasant place to work in. I would specifically like to thank Professor Bernd Kortmann for ‘taking me in’, as it were, and for creating a stimulating and encouraging work climate. My colleagues Alice Blumenthal and Pia Bergmann deserve special mention in this regard, as they have never tired of going through hours of discussion on all things professional and non-professional; this includes sharpening my views on linguistics in general as well as helping me out with the nitty-gritty aspects of carrying out projects like these, such as preparing stimuli and recruiting test subjects. Standing, as we all do, on the shoulders of giants when it comes to conceptualising and designing large-scale research projects, I would like to thank Gerry Docherty and Paul Foulkes for hauling me on board to help carry out the original ‘tribe 1 / tribe 2’ experiments, whose conceptual framework I subsequently drew on heavily in much of the work reported in this book. That I have taken the route towards sociophonetics is ultimately due to Jen Hay, who, from my PhD onwards, has never ceased to provide support and professional advice in spite of earthquakes and being half a world apart (as well as being notoriously busy running her own lab). On a more personal level, I would like to thank my wife, my parents, and my sister for their wholehearted support in all things personal throughout the years. My special thanks in this regard go to Vincent Langstrof for boosting my productivity during the formative stages in the preparation of this manuscript.
Finally, none of this would have been remotely possible without the participation of hundreds of participants in the various experiments, who endured going through what must have appeared to them as an utterly pointless procedure. It wasn’t!


Summary of the Work (Zusammenfassung der Arbeit in deutscher Sprache)

This book addresses the learnability of sociophonetic variability. Based on two distinct but thematically interwoven series of speech perception experiments, it aims to establish whether and to what extent adult learners are able to form novel associations between sociolinguistic variables and non-linguistic categories. In addition to an introductory chapter (Chapter 1) and a concluding chapter (Chapter 5), the work contains three substantive chapters, which are described below.

Chapter 2 – Sociophonetic Learning in L2

On the basis of a series of speech perception experiments, Chapter 2 investigates whether and to what extent non-native speakers of English are sensitive to sociolinguistically stratified variation in the target language, English, and whether such sensitivity can be traced back to earlier experience with the target language abroad. The underlying data set contrasts participants who have spent at least six months in an English-speaking country with participants who have heard and spoken English almost exclusively in the context of second language instruction at secondary school, as well as with native English-speaking control groups. It will be argued that these data offer relevant insights into the recency hypothesis developed within exemplar theory (see Goldinger 2000), according to which more recent experiences in a listener’s biography have a stronger effect on the formation of cognitive linguistic categories than older ones. Two interlocking series of experiments are presented and analysed.

(1) The first series examines participants who have gained experience with North American English as a result of a stay abroad. They are tested on whether they show non-random associations between a particular phonetic variable and a set of non-linguistic evaluation parameters, namely age, occupational status, and likeability. This test condition was designed to find out whether listeners evaluate phonetically comparable variables differently depending on whether these are presented in the context of their native language, German, or of the target language, English. That this should be the case follows directly from recent exemplar-theoretic approaches to speech perception and processing and to its cognitive representation (see, inter alia, Johnson 1997, Goldinger 2000, Pierrehumbert 2001). The precondition is that the variable in question is in fact evaluated differently in its respective native sociolinguistic ‘habitats’. The variable investigated here is the lenition of voiceless alveolar plosives in intervocalic position; this phenomenon is found both in North American English and in various varieties of German. The difference is that the ‘lenited’ (i.e. voiced) variant represents the largely neutral, i.e. standard, pronunciation in North American English (Eckert 2008), whereas it is not part of present-day Standard German. With respect to the recency hypothesis, one would therefore predict that German listeners who have stayed in North America will have updated their ‘episodic memory’ accordingly, i.e. that they will evaluate the two variants in fundamentally different ways depending on whether they encounter the variable in German or in English. For instance, one would expect them to assign lower occupational status ratings to users of the lenited variant [d] when the variant is presented in a German context than when it is presented in an English context.

(2) A complementary series of experiments investigates the perceptual correlates that non-native speakers of English form with respect to variables that have no comparable counterpart in German. The relevant test group consists of native speakers of German who have spent at least six months in England. The data set underlying this perception experiment comprises three sociolinguistically stratified variables of English English, which the participants evaluated with respect to the non-linguistic categories mentioned above: (1) alveolar vs. glottal realisations of intervocalic /t/, as in butter, city; (2) diphthongal vs. monophthongal realisations of the vowel /i:/, as in fleece, cheese; (3) diphthongal vs. monophthongal realisations of the vowel /ei/, as in face, fate. These variables were chosen because they correlate with different non-linguistic dimensions in their native context: while the consonantal variable is highly salient and correlates strongly with factors such as age and socioeconomic status (cf. Altendorf 2003, Rosewarne 1984), the vocalic variables /i:/ and /ei/ can be regarded as less salient overall. Moreover, /ei/ is a strongly regionally stratified variable, since the monophthong [e:] is generally used only in the North of England, whereas the diphthong [ei] is a feature of the southern varieties, including Received Pronunciation (‘RP’). This series of experiments therefore goes one step further than the previous one: it examines not only whether non-native speakers are in principle able to assign non-random evaluations in terms of socio-indexical categories to phonetic variables in the target language, but also whether this holds for different non-linguistic correlates.

The results of the experiments can be briefly summarised as follows. By and large, the recency hypothesis appears to hold, in that non-native speakers are clearly able to form non-random associations between phonetic variants and socio-indexical categories in the target language. It can further be shown that this ability can be understood as a result of a stay abroad. The test group in the first series of experiments (i.e. Germans with experience of North America) evaluates the English test variable similarly to the native, i.e. US American, control group, whereas the German control group yields largely different results. Furthermore, the test group evaluates the two variants of the test variable in fundamentally different ways depending on whether they are embedded in German or in English words. This result supports exemplar-theoretic models of speech perception that assume hybrid processing of speech data, i.e. that the underlying cognitive representations of all linguistic categories comprise both linguistic and non-linguistic features. However, the results obtained in the ‘UK condition’ further suggest that non-random associations between phonetic variables and non-linguistic categories can in part also be understood as the result of so-called ‘classroom standards’: learners do not approach new variants entirely neutrally, but tend to associate some of the variants not encountered in their previous learning biography with those socio-indexical categories that might be subsumed under the meta-category ‘non-standard’ (i.e. lower ratings for factors such as socioeconomic status and age).


Beyond offering a novel way of testing the recency hypothesis in exemplar-theoretic models of speech perception and processing, the data analysed in Chapter 2 can also be seen as a bridge between two subfields of linguistics that have so far remained largely separate, namely perceptual sociophonetics on the one hand and second language acquisition research on the other. The relationship between these subfields has received little attention in the past, since second language acquisition research has largely confined itself to the learning of linguistic structures and has largely ignored their socio-indexical dimension, whereas the research questions of empirical sociolinguistics have generally been investigated on the basis of L1 data. A recent example in which the two subfields are brought together is Drummond’s work on the acquisition of English sociolinguistic variables by native speakers of Polish in Manchester (Drummond 2010/11/12/13). In contrast to the approach taken in Chapter 2 of this book, Drummond approaches the topic from a more ethnographic standpoint, whereas the analyses presented here offer a lab-based, more controlled approach. Furthermore, unlike Drummond’s work, the analysis in Chapter 2 focuses mainly on the perceptual dimension of sociophonetic variation, although a set of production data collected in the course of these experiments is analysed as well.

With respect to future research, the present work, together with its grounding in exemplar-theoretic approaches to speech perception, suggests that sociolinguistic learning should play a more prominent role in second language acquisition research. This follows directly from the premises of exemplar theory: if linguistic representations are indeed ‘complex’ or ‘hybrid’, i.e. contain information on a range of linguistic features as well as a whole range of further characteristics of speakers and speech episodes (and recent findings in sociophonetics point very strongly in this direction), then by definition the concept of ‘learning a language’ must take into account the learning of socio-indexical categories and their relations to linguistic structures.
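The recency hypothesis invoked in the Chapter 2 summary (Goldinger 2000) can be made concrete with a toy exemplar store. The sketch below is purely illustrative and is not the model used in this work. It assumes that each stored exemplar pairs a variant and a language context with a social rating from one listening episode, and that an exemplar’s influence decays exponentially with its age. The class `Exemplar`, the functions `recency_weight` and `expected_rating`, the `half_life` parameter, and all ratings are hypothetical choices made only to show the effect.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    variant: str      # phonetic variant, e.g. "[d]" (lenited) or "[t]"
    context: str      # language the token was heard in: "German" or "English"
    rating: float     # hypothetical occupational-status rating of the speaker (1-5 scale)
    age_years: float  # time elapsed since the listening episode


def recency_weight(age_years: float, half_life: float) -> float:
    """Exponential decay: an exemplar `half_life` years old counts half as much as a fresh one."""
    return 0.5 ** (age_years / half_life)


def expected_rating(store, variant, context, half_life=2.0):
    """Recency-weighted mean rating for `variant` in one language context, or None if unattested."""
    relevant = [e for e in store if e.variant == variant and e.context == context]
    if not relevant:
        return None
    weights = [recency_weight(e.age_years, half_life) for e in relevant]
    return sum(w * e.rating for w, e in zip(weights, relevant)) / sum(weights)


# Hypothetical listener: lenited [d] heard in German long ago (rated low), plus English [d]
# from an old classroom episode (rated low) and a recent stay in North America (rated higher).
store = [
    Exemplar("[d]", "German", 2.0, 10.0),
    Exemplar("[d]", "English", 2.0, 8.0),
    Exemplar("[d]", "English", 4.0, 0.5),
]

if __name__ == "__main__":
    print(expected_rating(store, "[d]", "German"))                  # only German exemplars count
    print(expected_rating(store, "[d]", "English"))                 # the recent exemplar dominates
    print(expected_rating(store, "[d]", "English", half_life=1e9))  # almost no decay: plain mean
```

Keeping the language context as a separate field mirrors the hybrid-representation idea in the summary: the same variant can carry different evaluations in German and in English. The decay function is the part that encodes recency; setting a very large `half_life` switches it off and returns the unweighted mean.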


Chapter 3 – Sociophonetic Learning in the Lab

Chapter 3 approaches the topic of sociophonetic learning from a different direction. It essentially consists of follow-up studies to a series of exploratory investigations by Docherty, Langstrof and Foulkes (2008a/b, 2013; henceforth ‘DLF’), whose basic aim was to test experimentally what is perhaps the most fundamental hypothesis of exemplar-theoretic approaches to speech processing, namely that listeners should demonstrably be able to develop novel associations between linguistic variables and social categories on the basis of appropriate input data. DLF implemented this by exposing participants in a so-called training session to non-random associations between the variants of a phonological variable and hypothetical speaker groups, which the participants then had to replicate in a test session (see Chapter 3 for a more detailed description of the method). While the approach was shown to work, in that participants were able to pick up at least some of the associations encountered in the training session, it could not be determined precisely which factors are the main predictors of successful learning.

The data analysed in Chapter 3 of the present work extend this line of research in several directions. First, participants were also trained and tested on variables that are not part of their native language, whereas the DLF data comprised only attested variables. Relatedly, the chapter examines to what extent lexical factors, which have become increasingly popular in sociophonetic research in recent years (Pierrehumbert 2001, Hay 2013), influence sociophonetic learning. The experiments presented in Chapter 3 therefore include familiar variables embedded in real words as well as a number of variables presented to the participants in the context of an ‘artificial language’. The aim is to find out whether sociophonetic learning takes place primarily at the level of the word, in which case we would presumably expect differences in the learnability of sociophonetic variability depending on whether an otherwise identical variable is embedded in ‘real’ words or in nonsense words, or whether the primary locus of sociophonetic learning is the variable itself, in which case we would not expect such differences. Furthermore, one of the experiments presented here includes an uneven distribution of phonetic variants across different words in the training session, i.e. a particular variant of the underlying variable occurs more often in one of the test words than in others. If the word were the primary locus of sociophonetic learning, participants would be expected to replicate these different input distributions as a function of specific words in the test session. In addition, some of the experiments in Chapter 3 manipulate both the stochastic robustness of the variant distributions encountered in the training session and their distribution across the hypothetical speaker groups. Whereas the input distributions in the DLF experiments were both symmetrical and relatively robust, some of the follow-up experiments analysed in Chapter 3 contain input distributions that are asymmetrical as well as less robust. On the one hand, this serves to investigate whether successful sociophonetic learning is a linear function of the stochastic robustness of the input distributions or whether there are non-linear thresholds. On the other hand, these manipulations serve to investigate possible ‘stochastic stereotyping’, i.e. the potential over-association of a linguistic feature X with the speaker group that uses X only rarely compared to the complementary feature Y, but is nevertheless the only group that uses X at all.

The results show that participants are far better learners of sociophonetic variability than the studies by Docherty et al. had suggested. First, participants are, at least in principle, able to replicate input distributions of sociophonetic variables whose phonetic substrate does not occur in their native language. Moreover, input distributions that are less robust and more asymmetrical than those of the DLF studies are also reproduced largely faithfully. On the other hand, lexical effects turn out to be largely irrelevant, in that learning success in these experiments does not depend on whether the variable is embedded in real words or presented in the context of an artificial language. By and large, the primary locus of sociophonetic learning turns out to be the phonological variable.

The results are further interpreted against the background of the concepts of salience and probability matching. While salience has long played a central role in sociolinguistic discussion, it remains notoriously elusive with respect to its ontology and explanatory power as well as its measurability. The ontological problem (e.g. is salience explanans or explanandum?) will not be addressed further here (see the relevant discussions in Auer et al. (1998) and Kerswill and Williams (2002)); instead, it is argued that the methodological approach underlying the experiments offers a non-circular way of measuring at least one specific facet of salience, namely variable-intrinsic salience (i.e. ‘internal’ in the Labovian sense: the linguistic and distributional properties of a linguistic feature, independent of social factors) and paradigmatic salience (see Chapters 3 and 5 of the present work for a more detailed account of how the different aspects of a potential overall concept of salience can be delimited). Probability matching is a term borrowed from the behavioural sciences and economics (cf. inter alia Cosmides and Tooby 1996, Gallistel 1990/2005, Gigerenzer 2000, Harper 1982, Vulkan 2000, Weber 1998) which Labov has discussed with reference to sociolinguistics (1994: 580–88; see Chapter 3 of the present work for a brief summary). Labov argues that the kind of probability matching employed in non-linguistic pattern recognition by various human and non-human subjects is also the basic strategy that language learners apply when processing sociolinguistically stratified variation. He shows how certain observed patterns of language change (in particular so-called ‘pull chains’) can be explained directly as the result of a probability-matching strategy; however, he does not demonstrate experimentally that this is a universal behavioural strategy of sociolinguistic learners. This issue is examined in Chapter 3. In sum, learners do tend to follow an approach consistent with probability matching, but there are distinct exceptions. For instance, learners appear to be markedly reluctant to replicate variants that are categorically associated with a hypothetical speaker group in the training data.

Chapter 4 – Variant Learning in L1 and L2

Chapter 4 takes the methodological approach underlying the DLF experiments and Chapter 3 further in two respects. On the one hand, it examines how participants deal with training data that are, in several respects, more complex than those of the previous experiments. On the other hand, it analyses whether and how pattern recognition works on the basis of minimally complex input data. In addition, Chapter 4 can be seen as a bridge between the topics of the previous chapters, namely lab-based learning of sociophonetic variability (Chapter 3) and the perception and evaluation of variables in a foreign language (Chapter 2), since in some of the test conditions discussed in the chapter the participant samples comprise both native and non-native speakers of the language underlying the training data (in this case English).

As regards complexity, recall that both the DLF experiments and the follow-up experiments discussed in Chapter 3 invariably presented participants in the training phase with two variants of a single linguistic variable, distributed across exactly two hypothetical speaker groups. Two of the test conditions presented in Chapter 4 increase the intrinsic complexity of the training data, both in terms of the number of underlying variables (‘condition 4/1’, in which participants encountered three variables instead of one in the training session) and in terms of ‘social complexity’ (‘condition 4/2’, in which variants were distributed across three rather than two hypothetical speaker groups). The question underlying the first of these conditions is that of variable interaction: can the learnability or salience of a variable and its variants be captured as a function of the variable itself, or does it also depend on the presence of additional variables in the overall data set? The results analysed in Chapter 4 show that the learnability of sociophonetic variation depends at least in part on whether further variables are present in the data pool. On the other hand, additional complexity in the social correlates (i.e. in the number of hypothetical speaker groups) is shown to play virtually no role in participants’ ability to recognise patterns successfully. Besides these input distributions, which are more complex than those of the previous test conditions, Chapter 4 also analyses the results of a test condition (‘condition 4/3’) that confronts participants with minimally complex variant distributions which, moreover, are not part of their native listening biography. The participants are native speakers of German who were presented with glottal vs. alveolar variants of intervocalic stops in English words. It is shown that participants are able to form non-random associations between phonetic variants and hypothetical speaker groups on the basis of the smallest possible input distribution, namely a single group-defining token. As regards the comparison of native and non-native participants in ‘condition 4/1’, the response distributions of the non-native participants are shown to be best understood as the result of a complex interaction of different factors, in particular (a) the linguistic features of the native language, (b) sociolinguistic ‘classroom standards’ (see above), and (c) relevant experience abroad in the target language. Finally, Chapter 4 takes a small step towards investigating the learnability of non-phonological variables relative to phonological ones: one of the variables in ‘condition 4/1’ is morphosyntactic, namely double negation, which was presented in the training data together with two phonological variables, ‘t-flapping’ and vowel quality in /æ/.

In conclusion, the central findings of the present work can be summarised as follows:

(1) Non-native speakers are competent learners of sociolinguistic variability. The data therefore support the hypothesis that recent experience with linguistic variables and their associated non-linguistic categories plays a key formative role in the representation of cognitive linguistic categories.
(2) In non-native speakers’ perceptual evaluation of linguistic variables, we observe an interaction between sociolinguistic ‘classroom standards’ on the one hand, which are extrapolated in the course of formal learning on the basis of negative evidence regarding previously unencountered variants, and the sociolinguistic competence acquired through relevant experience abroad on the other; in cases of conflict, the latter prevails.
(3) Lab-based learning of sociophonetic variability takes place primarily at the structural level of the phonological variable. Alternative potential factors such as the variant, the word, or the robustness or shape of the input distribution play subordinate roles.
(4) Regarding pattern recognition, i.e. the intrinsic salience of linguistic features, a hierarchy of predictors can thus be derived (from most to least important): familiarity with the phonological variable / phonetic distance between the variants of the variable > shape of the input distribution > stochastic robustness of the input distribution > phonetic features of specific variants / lexical factors.
(5) The learnability of specific linguistic variables interacts with the presence of additional variables in a given data pool, so that the learnability or intrinsic salience of linguistic features can only partly be explained as a function of variable-internal properties. This variable interaction can also be demonstrated across different linguistic levels: the presence of a morphosyntactic variable is shown to make phonological variables harder to learn.
(6) Non-random input distributions are recognised on the basis of very small amounts of socially relevant data. A single token is shown to suffice for identifying social categories on the basis of linguistic variability. Furthermore, as few as two group- and speaker-specific tokens of a variant of a linguistic variable are enough for participants to treat that variant as a feature of the group rather than of individual speakers.
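The contrast between probability matching and the possible ‘stochastic stereotyping’ of a rare variant, both raised in the Chapter 3 summary, can be illustrated with a small calculation on a hypothetical training set. The sketch below is not the analysis used in this work; the group labels, variant labels, token counts and function names are invented for illustration. It contrasts a probability-matching learner, which reproduces each group’s variant proportions, with a maximizing learner, which always produces the group’s most frequent variant. It also shows why an asymmetric distribution invites stereotyping: group B uses variant X in only a minority of its tokens (P(X | B) is low), yet B is the only group that uses X at all (P(B | X) = 1).

```python
from collections import Counter, defaultdict


def variant_given_group(tokens):
    """P(variant | group) from a list of (group, variant) training tokens."""
    counts = defaultdict(Counter)
    for group, variant in tokens:
        counts[group][variant] += 1
    return {g: {v: n / sum(c.values()) for v, n in c.items()} for g, c in counts.items()}


def group_given_variant(tokens):
    """P(group | variant): how diagnostic each variant is of each group."""
    counts = defaultdict(Counter)
    for group, variant in tokens:
        counts[variant][group] += 1
    return {v: {g: n / sum(c.values()) for g, n in c.items()} for v, c in counts.items()}


def probability_matching(dist):
    """Predicted response proportions mirror the training proportions."""
    return {g: dict(p) for g, p in dist.items()}


def maximizing(dist):
    """Every predicted response uses the group's most frequent (modal) variant."""
    return {g: {max(p, key=p.get): 1.0} for g, p in dist.items()}


# Hypothetical asymmetric training set: group A uses only Y; group B uses X in 3 of 10 tokens.
training = [("A", "Y")] * 10 + [("B", "X")] * 3 + [("B", "Y")] * 7

p_v_g = variant_given_group(training)
print(p_v_g["B"])                          # {'X': 0.3, 'Y': 0.7}
print(group_given_variant(training)["X"])  # {'B': 1.0}: X occurs only in group B
print(probability_matching(p_v_g)["B"])    # {'X': 0.3, 'Y': 0.7}: the 30/70 split is reproduced
print(maximizing(p_v_g)["B"])              # {'Y': 1.0}: X is never produced for B
```

Under these assumptions, a learner whose test responses over-produce X for group B, relative to the 30 per cent seen in training, would be responding to how diagnostic X is (P(B | X)) rather than to how frequent it is within B (P(X | B)). This is one way of reading the over-association described as stochastic stereotyping.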


Chapter 1 Introduction

This is a book on sociophonetic learning. Drawing on two interwoven strands of experiments, I will investigate whether and to what extent adult learners are able to form novel associations between sociophonetic variants and non-linguistic categories. The topic will be approached from two angles. Chapter 2 looks at the way in which non-native learners evaluate sociophonetic variation in a second language. It investigates whether German learners of English display non-random associations of sociophonetic variants with basic non-linguistic categories such as occupational status, age, and likeability. Specifically, I will compare Germans with some degree of first-hand experience of English in either England or North America with Germans without such prior experience, as well as with native speakers of each of the two varieties of English. A complementary set of experiments, discussed in chapters 3 and 4, adopts a more experimental approach to sociophonetic learning: based on a methodological framework designed by Docherty et al. (2013, henceforth DLF), listeners were trained on non-random associations between linguistic variants and hypothetical social categories, and were subsequently tested on whether they had memorized the distributions of phonetic variants across the extra-linguistic labels they encountered in the training data. In other words, these experiments test whether and to what extent listeners are able to form truly novel associations between linguistic variants and artefactual social categories, an issue that has so far drawn little attention in the sociophonetics literature. Whereas chapter 3 essentially presents a range of fine-grained follow-up tests to the pioneering experiments discussed by Docherty et al., chapter 4 seeks to connect the approaches of chapters 2 and 3 by applying Docherty et al.’s methodological framework to sociophonetic learning in L2. In addition, chapter 4 extends the DLF approach to data sets of differing inherent complexity, investigating both more complex and less complex data sets than those used by DLF. On a theoretical level, I would like to argue that the data analyzed in this volume shed light on several concepts that have been discussed in various quarters of sociolinguistics, specifically exemplar theory, salience, and probability matching.

Sociophonetics and Sociolinguistics

There seems to be some degree of terminological fuzziness regarding the concept of sociophonetics, which is perhaps unsurprising given the relative youth of the field. Specifically, a broad definition of the term competes with a narrower one. The broad view essentially embodies a methodological take: under this view, any study that investigates a given range of phonetic features while being sensitive to non-linguistic factors (such as social characteristics of the speaker) can rightfully be regarded as an instance of 'doing sociophonetics' (cf. the discussion in Thomas 2011: ch. 1). On this definition, sociophonetics goes back at least to Labov et al.'s early variationist studies from the 1970s, which draw on measurable phonetic variables and their variants in order to arrive at models of structured heterogeneity in various speech communities (Labov, Yaeger, and Steiner 1972). The narrower view, aptly epitomized by the title of Labov's 2006 article 'A sociolinguistic perspective on sociophonetics' (which amounts to a pleonasm under the broader definition), essentially identifies sociophonetics with an underlying theoretical agenda, specifically 'exemplarist' approaches to speech perception, production, processing, and storage. This perceived discrepancy follows from the fact that the major protagonists in recent sociophonetic research have strongly focused on the role of frequency in the context of sociophonetic alternants embedded in specific lexical items (cf. inter alia Pierrehumbert 2002). Labov correctly states that his caveat is not merely a question of labels. Rather, it reflects two different interpretations regarding the prime level of analysis in the perception and processing of linguistic data: do speakers/listeners operate on the level of the word, or on the level of more abstract structural entities such as phonemes?
If we identify the word as the primary predictor of distributions of sociophonetic variants, it is hard to account for the fact that it is ultimately phonemes and their allophones that act as the structural protagonists in the vast majority of attested sound changes. In this book, I hope to show that stochastic learning can be investigated and interpreted against the background of some of the predictions that logically emerge from episodic-memory approaches to phonetic processing, without any necessary a priori (or a posteriori) commitment to word-based approaches. In fact, the data from the learning experiments reported in chapters 3 and 4 of this volume strongly suggest (a) that stochastic learning takes place, and (b) that the primary level learners operate on is the sociolinguistic variable. In other words, this work can be regarded as an interface between the broad, methodological interpretation of sociophonetics, on the one hand, and the narrower, more theory-driven one, on the other.

The structure of this book

This book contains three substantive chapters.

Chapter 2 – Sociophonetic learning in L2

Chapter 2 will present data from a range of perception experiments that investigate whether and to what extent non-native speakers associate phonetic variables and their variants non-randomly with a range of social-indexical categories. Furthermore, it will be investigated whether such associations, if present, result from pre-existing, universal learner-language defaults or whether they can be shown to be malleable as a result of first-hand experience with the target language in its native environment. In other words, the experiments reported in chapter 2 compare L2 listeners who have some measure of first-hand experience with English with L2 listeners who have none. On a theoretical level, the data will be loosely couched in, and interpreted against the background of, exemplar theory. Specifically, the experiments were designed to address what may be called the 'recency hypothesis' (Goldinger 2000), which essentially claims that recent linguistic experiences have a comparatively strong impact on the shape and strength of sociophonetic representations in speech perception and processing. It will be argued that the kind of data discussed in chapter 2, i.e. perception of L2 variants, allows for a more direct assessment of this hypothesis than earlier approaches to the topic of recency, such as Harrington's analysis of Queen Elizabeth II's Christmas speeches over several decades (Harrington et al. 2000a/b, Harrington 2006). Two interwoven strands of data will be analyzed. The first strand consists of data from native speakers of German with some degree of first-hand experience with North American English.
This group, as well as two control groups (Germans without first-hand experience with US English, and native speakers of US English), was asked to evaluate a phonetic variable along a range of non-linguistic categories including age, occupational status, and likeability. The overarching aim of this condition was to investigate whether phonetically similar variables are evaluated differently when they are presented in the context of different languages, which is very much what we would expect under current exemplar-theoretical approaches to sociophonetic perception and processing. Of course, this expectation would only be borne out if the sociolinguistic associations in question differ across the two languages. The variable chosen for analysis is intervocalic t-attenuation, which occurs in US English as well as in a range of accents of German. The difference is that attenuated /t/ represents the unmarked baseline, i.e. the everyday variant, in North American English (Eckert 2008), whereas it is not part of the standard in contemporary Germany. If the recency hypothesis is correct, listeners who have experienced this cross-linguistic difference in the markedness of the two variants of intervocalic /t/ would be expected, via episodic memory, to have developed different value judgments as a function of the ambient language. For example, they would be expected to associate the attenuated variant with lower occupation ratings when it is presented in German data than when it is presented in US English data. A related strand of experiments in chapter 2 looks at how L2 listeners deal with variables that do not have a straightforward correlate in their native language. The test group consists of native speakers of German with some amount of first-hand experience with UK English. Participants in this condition were asked to provide value judgments on three variables: (1) alveolar vs. glottal variants of intervocalic /t/, as in butter, city; (2) diphthongal vs. monophthongal variants of /i:/, as in fleece, cheese; (3) diphthongal vs. monophthongal variants of /ei/, as in face, fate. These variables were selected to investigate how non-native listeners handle non-standard variants that show different types of correlation with extra-linguistic, social-indexical categories and, in addition, differential degrees of salience in the native UK speech community: whereas the intervocalic glottal stop represents a highly salient non-standard variant that correlates mainly with age and social class, the vocalic variables (i) and (ei) can be argued to be rather less salient overall (cf. Docherty et al. 2013); in addition, the (ei) variable is primarily associated with the factor region, in that the monophthong is a typically Northern English variant contrasting with Southern (including RP) diphthongal [ei]. Hence, this line of experiments aimed to look beyond the question of whether or not non-native listeners are in principle capable of forming native-like associations of phonetic variants with social-indexical categories: the underlying question is whether this type of sociophonetic learning proceeds in a native-like fashion for socially different types of variables that, furthermore, do not occur in the test subjects' native language. Overall, it will be shown that the recency hypothesis can be upheld, since non-native listeners are capable of forming non-random associations between phonetic variants and extra-linguistic categories, and these associations can be shown to be partly a function of exposure to L2 in its native environment: Germans with first-hand experience with US English show associations that are more similar to those of the US control group than to those of the German control group. It will also be shown that there are fundamental differences within the test group in how a phonetically similar variable is evaluated in the context of German data compared to English data. This finding lends strong support to models of speech perception that posit hybrid representations of perceptual exemplars, including both phonetic and non-linguistic information about the relevant data points. The data from the UK test and control subjects furthermore reveal that L2 learners form sociolinguistic defaults as a function of formal learning: control subjects also respond in a non-random fashion when evaluating phonetic variants against a range of non-linguistic categories, which can be regarded as a 'classroom standard' that develops in the course of formal L2 instruction. Overall, it will be argued that what we observe is an interaction of classroom defaults with later first-hand experience of a foreign language when it comes to the social evaluation of phonetic variables and their associated variants.
In other words, L2 listeners seem, at least in part, to be able to operate in negative-data scenarios. Apart from offering a novel perspective on testing the recency hypothesis in exemplar-based models of speech perception and processing, chapter 2 can also be regarded as a link between two fields of linguistic research that have so far remained largely unconnected, namely perceptual sociolinguistics and second-language acquisition. This link has remained under-researched because research in second-language acquisition has mainly focused on the acquisition of linguistic structure rather than its non-linguistic correlates, whereas sociolinguistics has predominantly focused on data from speakers' native language rather than from languages learned later in a speaker's linguistic biography. Recent steps towards linking the two fields include Drummond's work on the acquisition of local non-standard variants of English by Polish immigrants in Manchester (Drummond 2010/11/12/13). Drummond's research essentially takes a community-based approach, whereas the work presented in this book is more lab-based. In addition, the data analyzed in chapter 2 focus on speech perception rather than production. I would like to maintain that the topic of sociolinguistic learning should take a more prominent position in the field of L2 acquisition, which follows directly from the insights gained from exemplar-theoretical accounts of language acquisition: if linguistic representations are indeed complex, in that they include both information on linguistic structure and a rich array of social-indexical correlates (and an increasing body of evidence in sociophonetics suggests that this is the case), it follows that the concept of 'learning a language' by definition needs to incorporate the learning of social-indexical categories.

Chapter 3 – Sociophonetic Learning in the Lab

Chapter 3 will investigate the constraints and capabilities of lab-based learning of sociolinguistic representations. It essentially represents a range of follow-up studies to earlier exploratory studies by Docherty, Langstrof, and Foulkes (2013), who aimed at testing what can be regarded as the basic prediction of episodic-memory theories of sociolinguistic processing, namely that listeners should be able to form novel associations between linguistic variants and non-linguistic categories after previous exposure to appropriate kinds of data. This was implemented by training participants on data consisting of variants of attested sociolinguistic variables that were tied to hypothetical speaker groups in a training session (cf. ch. 3 for a more in-depth description of the methodology involved).
Although this approach was shown to work, in that listeners are in principle capable of forming such novel associations, the limits and capabilities of de novo sociophonetic learning remained somewhat unclear. The experiments reviewed in chapter 3 of this book extend the original approach in various ways. First of all, some of the test conditions trained and tested listeners on variables that do not exist in their native language, whereas the DLF experiments trained and tested listeners on known variables. Lexical effects, which have recently gained renewed attention in frequency-based accounts of sociophonetic learning in the exemplar-theoretical vein (Pierrehumbert 2001, Hay 2013), will also be investigated. Specifically, some of the data involve known (i.e. 'real-world') variables presented in the context of nonce words, in order to investigate whether lexical embedding in any way influences the degree to which learners internalize input distributions of phonological variants across hypothetical speaker groups. This addresses the question of whether sociophonetic learners operate primarily on the level of the word (in which case we would probably expect differences in learnability between variables embedded in real words and those embedded in nonsense words) or primarily on the level of the phonological variable (in which case we would not expect any particularly striking differences across the two modes of exposure). In a related vein, one of the conditions involves lexical items that differ in how strongly they are associated with one of the hypothetical speaker groups in the input data; the question is then whether such differences are also found in the output participants produce in the test session of the experiment. Additional experiments reviewed in chapter 3 manipulate the strength and the shape of the variant distributions participants encountered in the training session. Whereas all of the experiments reported in Docherty et al. (2013) involved input distributions that were both symmetrical¹ and relatively robust², some of the follow-up conditions analyzed in chapter 3 exposed participants to either asymmetrical distributions or less robustly weighted ones. The main idea behind this approach was to investigate whether learning success in this type of task is a linear function of the robustness of the input data, or whether we observe abrupt cut-off effects.
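The kinds of input weighting at issue can be made concrete with a small sketch. The Python code below builds a hypothetical training set for two speaker groups from the proportion of variant x each group uses, computing exact token counts rather than sampling, and then reports two conditional proportions: how often a group uses x, and what share of all x tokens comes from that group. The group labels A and B, the figure of 50 tokens per group, and the asymmetrical 20%/0% weighting are assumptions chosen for illustration only; the sketch does not reproduce the actual stimulus design of the DLF or follow-up experiments.

```python
from fractions import Fraction

def build_training_set(p_x, tokens_per_group=50):
    """Return (group, variant) tokens with an exact share of 'x' per group.

    p_x maps each hypothetical group label to the proportion of its tokens
    carrying variant 'x'; the rest carry 'y'. Counts are rounded, not sampled.
    """
    tokens = []
    for group, p in p_x.items():
        n_x = round(p * tokens_per_group)
        tokens += [(group, "x")] * n_x
        tokens += [(group, "y")] * (tokens_per_group - n_x)
    return tokens

def p_variant_given_group(tokens, variant, group):
    """How often a group uses a variant, e.g. P(x | A)."""
    used = [v for g, v in tokens if g == group]
    return Fraction(used.count(variant), len(used))

def p_group_given_variant(tokens, group, variant):
    """What share of a variant's tokens come from a group, e.g. P(A | x)."""
    users = [g for g, v in tokens if v == variant]
    return Fraction(users.count(group), len(users))

# Hypothetical weightings: symmetrical (mirrored), categorical, asymmetrical.
conditions = {
    "symmetrical 80/20": {"A": 0.8, "B": 0.2},
    "categorical 100/0": {"A": 1.0, "B": 0.0},
    "asymmetrical 20/0": {"A": 0.2, "B": 0.0},
}

for name, p_x in conditions.items():
    tokens = build_training_set(p_x)
    print(f"{name}: P(x|A) = {p_variant_given_group(tokens, 'x', 'A')}, "
          f"P(A|x) = {p_group_given_variant(tokens, 'A', 'x')}")
```

In the symmetrical 80%/20% case both proportions for group A come out at 4/5. In the asymmetrical case, group A uses x only a fifth of the time, yet every x token in the input comes from A, which is the kind of configuration under which a learner might over-ascribe x to A.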
In addition, these conditions also aimed at investigating whether input data are taken at face value, or whether participants can be led to form 'stochastic stereotypes', i.e. to over-ascribe a given feature X to a group which uses X rarely in comparison to a complementary feature Y, while at the same time being the only group overall that uses X.

¹ This relates to the distribution of variants across the two hypothetical speaker groups in the training session of the DLF experiments: if one of the two groups used variant x 80% of the time and variant y 20% of the time, this weighting was mirrored in the complementary group, which used x 20% of the time and y 80% of the time. This symmetrical set-up was used in all of the tasks in the DLF experiments.
² Robustness here relates to the strength of the association of variants with hypothetical speaker groups: the DLF experiments used either a categorical input distribution in the training session (i.e. one group always used variant x, while the complementary group always used y) or a strongly preferential association in the order of 80%/20%, cf. footnote 1 above.

Overall, it will be shown that listeners are considerably more capable of forming novel associations between phonological variables and social-indexical categories than Docherty et al. originally anticipated: they are well able to replicate input distributions even in scenarios where the phonetic substance of the variable is unknown; in addition, they tend to replicate input frequencies rather faithfully even if these are asymmetrical or less robustly weighted than the distributions participants encountered in the DLF tasks. Lexical effects, on the other hand, will be shown to be virtually non-existent. A strong case will thus be made for regarding the phonological variable as the primary level of sociophonetic learning. In addition, the results will be interpreted against the background of salience and probability matching. The former concept has a long pedigree in various corners of linguistics, while at the same time being notoriously elusive in terms of measurability and explanatory force. Whereas the latter problem (what does the concept of salience buy us?) will not be discussed, it will be argued that, to some extent at least, the experimental set-up underlying the data in chapter 3 reflects one aspect of salience rather directly, namely the differential degrees of noticing that hold as a function of the phonetic shape of the variable and its associated variants as well as of the distributional properties of the variants. In other words, it will be argued that these tasks represent a way to measure variable-intrinsic salience in a non-circular manner. Probability matching is a term that ultimately stems from research in the behavioral sciences as well as in economics (cf. inter alia Cosmides and Tooby 1996, Gallistel 1990/2005, Gigerenzer 2000, Harper 1982, Vulkan 2000, Weber 1998; for an accessible overview cf. http://naturalrationality.blogspot.de/2007/11/probability-matching-briefintro.html) and was introduced into linguistics by Labov (1994: 580-88; cf. ch. 3 for a brief description of the concept).
Labov essentially claims that the probability matching strategy observed in various pattern-detection tasks also represents the way language learners deal with sociolinguistically stratified variation. An elaborate case is made for probability matching as the mechanism underlying specific types of sound change (pull chains); however, no direct experimental evidence has so far been presented showing that this is what people actually do when approaching linguistic data sets. This gap will be addressed in chapter 3; in a nutshell, it will be argued that listeners can be regarded as probability matchers overall, while at the same time various caveats apply, which is to say that listeners do not always replicate the training data perfectly. For example, listeners are relatively reluctant to reproduce input distributions of variants if these are categorically tied to non-linguistic categories.
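The contrast between probability matching and the 'maximizing' strategy it is usually compared with in the behavioral literature can be illustrated with a short sketch. Given the proportion of variant x a learner has extracted for a group, a matcher produces x for that group at the same rate, whereas a maximizer always chooses the more frequent variant. The function names, the seeded random generator, and the learned proportions of 0.8 and 0.2 are hypothetical; this is not the scoring procedure used in chapter 3.

```python
import random

def matching_response(p_x, rng):
    """Probability matcher: respond 'x' with the probability observed in training."""
    return "x" if rng.random() < p_x else "y"

def maximizing_response(p_x, rng=None):
    """Maximizer: always respond with the majority variant (ties go to 'x')."""
    return "x" if p_x >= 0.5 else "y"

def response_rate(strategy, p_x, n_trials=10_000, seed=1):
    """Share of 'x' responses a strategy gives over n_trials simulated test items."""
    rng = random.Random(seed)
    responses = [strategy(p_x, rng) for _ in range(n_trials)]
    return responses.count("x") / n_trials

def expected_accuracy(rate_x, p_x):
    """Chance that a response agrees with one token drawn from the input."""
    return rate_x * p_x + (1 - rate_x) * (1 - p_x)

learned = {"A": 0.8, "B": 0.2}  # hypothetical learned proportions of 'x'
for group, p_x in learned.items():
    match = response_rate(matching_response, p_x)
    maxim = response_rate(maximizing_response, p_x)
    print(f"group {group}: matcher x-rate {match:.2f}, maximizer x-rate {maxim:.2f}, "
          f"expected accuracy {expected_accuracy(p_x, p_x):.2f} (matcher) "
          f"vs {expected_accuracy(maxim, p_x):.2f} (maximizer)")
```

The trade-off is visible in `expected_accuracy`: with an input proportion of 0.8, a matcher's response agrees with a randomly drawn input token with probability 0.8² + 0.2² = 0.68, whereas a maximizer agrees with probability 0.8. Response rates that track the input frequencies therefore point to matching, whereas categorical responses point to maximizing.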

Chapter 4 – Sociophonetic Learning in L1 and L2

The data presented and analyzed in chapter 4 represent an extension of the type of task employed by DLF and the follow-ups reported in chapter 3. Specifically, the experiments reported in chapter 4 investigate how listeners deal with increasingly complex sets of training data, on the one hand, and with maximally simple data pools, on the other. In addition, this chapter seeks to unify the two overarching themes of the previous chapters (perception of sociolinguistic variation by non-native listeners, and lab-based sociophonetic learning), in that the relevant experiments were run with both native and non-native speakers of the input language, in this case English. Whereas both the DLF experiments and the follow-ups reported in chapter 3 involve two variants of one variable distributed over two hypothetical speaker groups, two of the test conditions reported in chapter 4 increase complexity, both in terms of the number of variables (condition 4/1, where the input data pool in the training session includes three variables rather than one) and in terms of 'social complexity' (condition 4/2, where variants are distributed over three hypothetical speaker groups rather than two). The set-up exemplified by condition 4/1 essentially investigates variable interaction. It will be shown that the degree to which listeners internalize input distributions of sociolinguistic variables and their variants hinges to some extent on the presence of additional variables in the data pool. Increasing social complexity, by contrast, has little impact on pattern detection.
In addition to analyzing these comparatively complex data pools, chapter 4 will also present data from a pilot study that takes the approach in the opposite direction by looking at how listeners handle extremely small amounts of socially stratified data, which can moreover be assumed to be unknown a priori because the variable is not present in the listeners' native language: German listeners were trained and tested on English data involving the intervocalic glottal stop variable. It will be shown that listeners form non-random associations of phonetic variants with hypothetical speaker groups even in minimal data scenarios involving only one group-defining token in the training session. With regard to the 'complex data conditions', it will be argued that, to some extent at least, learning success (and with it a given variant's salience) cannot be fully understood by studying the properties of the relevant variant in isolation; rather, we need to develop some sensitivity to the role and status of additional, perhaps competing, variables and their associated variants in order to arrive at a more in-depth understanding of the constraints and capabilities of linguistic pattern detection. The comparison of the native US English participants with the German L2 test groups will show that pattern detection in this type of task involves an interaction of the properties of the L2 participants' native language (in this case German), sociolinguistic 'classroom defaults' (see above), and first-hand experience with the target language in its native environment. Finally, the data reported in chapter 4 take a humble step in the direction of 'transcending sociophonetics', in that one of the variables participants encountered in the training session was located in the morpho-syntactic domain, namely double negation in conjunction with lack of 3rd person concord. Sociophonetics has, by definition, focused on investigating how people deal with (i.e. perceive, process, and store) phonetic data. On the other hand, a non-negligible body of research in general sociolinguistics has revealed that variables in any domain (including morphology and syntax, cf. Labov 1972 on double negation specifically, Hudson 1995 for an overview) are in principle available to serve as the substrate of sociolinguistic structure.

Conventions

At various stages of this book I will mention phonemes, linguistic variables and their variants, and non-linguistic categories (independent variables). When writing about phonemes whose real-world instantiations are consonants, I will use IPA. Within the context of sociolinguistic alternations, I will enclose variables in round brackets and their associated variants in square brackets (so the variable intervocalic t in US English will be transcribed (t), and its real-world manifestations ('variants') will be labeled [t] and [d], respectively). A similar convention will be used for the vocalic variables and their associated variants, except that I will occasionally also use Wells' lexical set terminology when talking about vowel phonemes (neither the use of lexical set terms as such nor the specific grouping of etymological categories under particular labels is entirely unproblematic; cf. Langstrof 2006: chs. 1 and 5 for discussion). Table 1.1 below states the relevant correspondences as they will be used throughout the remainder of this book:

Wells | IPA | Trager-Smith³ | Sample lexical items⁴
FLEECE | i(:) | iy | fleece, seat
START | a(:) | ah | start, dance, path
THOUGHT | o(:) | oh | thought, order
GOOSE | u(:) | uw | goose, fool
NURSE | ɜ(:) | n/a | nurse, her, fir
KIT | ɪ | y | kit
DRESS | ɛ | e | dress
TRAP | æ | æ | trap, ant
STRUT | a | ə | strut
LOT | ɒ | a | lot
FOOT | ʊ | u | foot
FACE | ei | ey | face
PRICE | ai | ay | price
MOUTH | au | aw | mouth
GOAT | ou | əw | goat
CHOICE | oi | oy | choice, toy
NEAR | ɪə | n/a | near
SQUARE | ɛə | n/a | square, fair
CURE | uə | n/a | cure, moor
COMMA | ə | n/a | comma

Table 1.1 – Correspondences between Wells’ lexical set labels for vowel phonemes and IPA assuming a Southern British English type of distributional system.

Independent variables, to the extent that they function as such in the course of the analyses, will also be stated in round brackets. An illustrative example: “Evaluations in terms of (OCCUPATIONAL STATUS) were elicited for both the [t] and the [d] variants of the (t) variable; the same was done for the [ei] variant of the FLEECE vowel as well as for the variable (neg).”

³ Although Trager-Smith conventions will not be used in this book, they were added for completeness (cf. Trager-Smith 1957).
⁴ Assuming an RP type of distribution.

Chapter 2 Sociophonetic Learning in L2

2.1 Background

This chapter investigates whether and to what extent non-native listeners are sensitive to extra-linguistic categories that are associated with phonetic variants in L2. Recent research in phonetics and sociolinguistics suggests that phonological representations are rich, and that phonetic data and their extra-linguistic correlates are intrinsically intertwined (Johnson 1997, Pierrehumbert 2001, Hay et al. 2006). In addition, it has been suggested that sociophonetic exemplars undergo continuous updating over the course of an individual's lifetime (Harrington 2006, Harrington et al. 2000a/b, 2005). However, most of the research investigating the role of recency in sociophonetic processing has focused on the pronunciation patterns of individual speakers. For example, Harrington (2006) investigated Queen Elizabeth II's Christmas speeches over the last 60 years, showing (a) that the Queen's pronunciation shifted substantially over time and (b) that this shift was in the direction of changing community norms. Although these results suggest that an individual can indeed update their realization of specific speech sounds, it is less clear whether speakers/listeners acquire something truly novel (pronunciation patterns, phonological representations, sociophonetic associations) over the course of their lives, or whether they merely oscillate between different registers that were present from the early stages of language/dialect acquisition. The research reported in this chapter approaches the issue from a different angle: unlike L1 speakers, L2 learners are unlikely to have any social associations with phonetic variants in L2 prior to being exposed to L2 in a native context. If recency is indeed a relevant factor in the formation of phonetic (and ultimately phonological) exemplars, we would expect L2 learners to acquire non-random associations of phonetic variants with social categories. Moreover, these associations are unlikely to represent register shifts over the course of an individual's lifetime, since learners lack any sociophonetic categorization in L2 before acquiring it.

Exemplar theory and variable learning

Recently, a number of linguists have argued in favor of what can be regarded as a rich model of the cognitive representation of linguistic data. In a nutshell, it is argued that we store each percept of speech data as a complexly 'tagged' instance of a larger category. Whereas classical approaches to linguistic representation, such as generative accounts of language acquisition, are regarded as minimalistic and unidirectional, this approach argues that each new percept also updates the overall representation of the category the listener recognizes it as belonging to. Moreover, this process of updating applies to the structurally relevant information as well as to a range of associated extra-linguistic categories, for example speaker characteristics. Drawing on a model discussed in Nosofsky (1986), this approach to linguistic perception and storage is frequently referred to as exemplar theory (also 'episodic memory'). Early formulations outlining the applicability and usefulness of these models for speech perception, processing, and storage include Johnson (1997), Goldinger (1997), and Pierrehumbert (2001). These proposals have subsequently sparked a vigorous research program exploring the potential as well as the constraints of applying exemplar-theoretical modeling to various disparate linguistic data sets. Examples include inter alia Pierrehumbert (2001a/b/02/03/06) and Munson (2010) on the scope and domains of phonetic processing; Wedel (2006) on an evolutionary model of episodic processing; Hay et al. (2006a) on the application of exemplar models to specific diachronic processes; Foulkes and Docherty (2006) on the prospects of exemplar theory with regard to fine-grained consonantal and suprasegmental variation; and Drager (2009) on the application of exemplar-theoretical explanations to data drawn from community-type studies.
For a recent general discussion (and critical evaluation) of exemplar-based models of phonetics and phonology, see Docherty and Foulkes (2014). To give an example of the basic mechanisms involved in exemplar-based models of speech processing: imagine my colleague enters the office on 17 October 2011 at 9am and says 'Hallo Christian'. The classical (structuralist and generativist) view would be to assume that the listener (me, in this case) perceives and recognizes the structurally relevant information on a bare-bones level (something like phonological /halo kristja:n/ plus the corresponding information in the domains of morpho-syntax, semantics, etc.). Whatever else I might associate with the relevant percept, such as speaker characteristics, is stored, if at all, as part of an independent sociolinguistic component. In other words, in the classical view the social information does not form part of the linguistic representation of 'hallo Christian'. In a bare-bones exemplar-theoretical approach, however, what used to be regarded as the structural information is intrinsically intertwined with additional, seemingly extra-linguistic categories. For example, a simplistic approximation of an exemplarist representation of the second sound in this particular instance of my colleague's utterance 'hallo' (let us call this particular token a1) might look like this:

Token a1, the vowel in ‘H[a]llo’:
- continuous f0 contour at 221 Hz (i.e. modal voicing)
- spoken 17 October 2011, 9am
- concentrations of acoustic energy at 1110 Hz and 1450 Hz
- curly hair
- female
- university lecturer
- rainy weather outside during utterance
- just over 60 milliseconds in duration
- acoustically similar to speaker's sound in e.g. 'K/a/tze'
- lives in France
- leaves turning red outside
- etc.

That this is an unstructured feature bundle follows from the view that all these bits of information, though drawn from rather different domains, form an integral part of the way a1 enters and updates the overall representation of some cognitive category; let us call this category "A". Such representations are sometimes referred to as exemplar clouds. Imagine a hypothetical scenario in which my only experience with A prior to being greeted by my colleague consists of my own tokens of A, i.e. an aggregate of all my real-world instances of /a/. At this stage, my exemplar cloud for A might look something like this:

- continuous f0 contour at 105 Hz (i.e. modal voicing)
- spoken at vastly different times
- concentrations of acoustic energy at 920 Hz and 1230 Hz
- straight hair
- male
- university lecturer
- lots of different weather conditions during utterances
- between 30 and 200 ms in duration
- acoustically similar to speaker's sound in e.g. 'Katze'
- lives in Germany
- etc.
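A deliberately toy encoding of the two feature bundles above can make the idea of an unstructured exemplar store more concrete. The sketch is not a model taken from the exemplar literature: it simply assumes that each stored token is a flat Python dictionary of feature names and values (names such as `speaker_sex` and `energy_peaks_hz` are hypothetical labels for features listed above), that a cloud is a list of such tokens, and that a category's 'tags' for a feature are just the set of values that feature has taken so far. Under these assumptions, adding a1 to the cloud is all it takes for FEMALE to appear among A's speaker-sex tags.

```python
# Toy exemplar store (hypothetical feature names): a cloud is a list of
# tokens, and each token is an unstructured bundle of acoustic and social
# features stored side by side.

def tags(cloud, feature):
    """Set of values a feature has taken across all tokens in the cloud."""
    return {token[feature] for token in cloud if feature in token}

def sub_cloud(cloud, feature, value):
    """Tokens in the cloud whose feature has the given value."""
    return [token for token in cloud if token.get(feature) == value]

# Cloud A before the greeting: one representative token stands in for
# the aggregate of my own tokens of /a/.
cloud_A = [
    {"f0_hz": 105, "energy_peaks_hz": (920, 1230), "speaker_sex": "MALE",
     "hair": "straight", "occupation": "university lecturer",
     "lives_in": "Germany"},
]

# Token a1: my colleague's vowel in 'Hallo'.
a1 = {"f0_hz": 221, "energy_peaks_hz": (1110, 1450), "speaker_sex": "FEMALE",
      "hair": "curly", "occupation": "university lecturer",
      "lives_in": "France"}

before = sorted(tags(cloud_A, "speaker_sex"))  # ['MALE']
cloud_A.append(a1)                             # a1 enters and updates A
after = sorted(tags(cloud_A, "speaker_sex"))   # ['FEMALE', 'MALE']

print(before, after)
print(len(sub_cloud(cloud_A, "speaker_sex", "FEMALE")))  # 1
```

A filter like `sub_cloud` is one simple way of picturing a restriction to tokens sharing a social tag, which is the kind of 'sub-cloud' that becomes relevant in connection with Johnson (1997).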

As a1 becomes part of A, some of the information undergoes updating. For example, whereas my previous representation A specified that the percept is always associated with males, the update that occurs as a result of a1 entering A results in A incorporating the additional 'tag' FEMALE. As my exposure to real-world instances of A increases over time, new experience specifies that A is not exclusively a property of male speakers, nor is it a property of people with a specific hair color. Further experience with real-world data would eventually update A in such a way that there is no correlation whatsoever between instances of A and hair color, so that over time the language learner would learn that this sound is just as likely to be uttered by blonde people as by dark-haired people; i.e. a random distribution (non-correlation) between the likelihood of uttering instances of A on the one hand and the speaker's hair color on the other would emerge in the listener's cognitive representation of A. Similar non-correlations would be arrived at vis-à-vis the speakers' job, the weather outside during the utterance, et cetera. However, and this is one of the key aspects illustrating the usefulness of exemplarist approaches to linguistic representations, large amounts of real-world data will reveal non-random distributions of some properties of A percepts with respect to other properties. For example, it is well known that what phoneticians call formant frequencies in vowels (i.e. concentrations of acoustic energy in different parts of the overall frequency spectrum) are a function of the size and shape of the supralaryngeal vocal tract (Fant 1960, Stevens 2000, Johnson 2011). All other things being equal, females have shorter vocal tracts than males on average, which results in their having measurably higher formant frequencies. As Johnson (1997) argues, adopting an exemplar theory approach to speech perception and representation allows us to circumvent a long-standing conundrum in speech perception, namely how speakers with vocal tracts of different sizes can understand each other at all (the normalization problem).
Early formant frequency data presented by Peterson and Barney (1952) indicated that the bare-bones acoustic input data is problematic from the point of view of perceptual resolution: If we measure the formant frequencies of large numbers of vowels spoken by large numbers of speakers, we observe that the distributions of phonemically different vowels frequently overlap. To illustrate the problem in a simplified way, figure 2.1 below plots formant frequency averages of lax vowels spoken by 2 speakers of American English, 1 male and 1 female.

[Figure 2.1: F1 in Hz (horizontal axis) plotted against F2 in Hz (vertical axis) for male and female /i/, /E/, /ae/, /a/, /o/ and /u/.]

Figure 2.1 - Male and female formant frequency averages of US English lax vowels elicited in an h_d frame (i.e. hid – head – had – hud – hod – hood). The data points represent raw formant frequency averages over 5 tokens per vowel and speaker.

It can be observed that the female /u/ is acoustically much closer to the male /a/ than the male /a/ is to the female /a/. From the point of view of pooled data, why should listeners be able to recognize the relevant data points as instances of /a/ rather than /u/? Earlier approaches to this issue proposed a number of normalization procedures, which some regarded as a rough analogue of the kinds of transformations listeners apply during speech perception in order to disambiguate acoustically problematic data. Examples of proposed normalization procedures include, inter alia, Gerstman (1968), Lobanov (1971), and Nearey (1978/89). None of these is unproblematic (cf. the discussions in Disner (1980), Adank (2003), Adank et al. (2004), and Langstrof (2006: ch. 2)); one consequence is that these procedures render the process of speech perception a lot more complex than was originally thought. Johnson (1997) argues that the problem largely disappears if we adopt an exemplar-theoretical approach to the normalization conundrum, the upshot of Johnson's account being that representations such as figure 2.1 above are illusory from a practical point of view: Exemplar theory would state that the data point located at 524/1359 Hz in figure 2.1 above (i.e. 'female /u/') is intrinsically tagged as FEMALE in the perception of the listener the moment it is uttered, amongst many other properties such as "similar to the sound that occurs in words like 'foot' and 'wool'", "uttered in Germany", etc. It would therefore enter and update a 'sub-cloud' of exemplars, namely only the token pool that is tagged as FEMALE in the listener's exemplar cloud. This implies that it never enters into close perceptual competition with the male /a/ in this case, provided that the person who utters the relevant speech sound has been successfully identified as a female speaker beforehand.

One of the key assumptions is that exemplar clouds, i.e. cognitive representations, undergo continuous updating as listeners are exposed to new data in their everyday lives. Not all data points are equal, though: It has been argued that the main predictors of a token's impact on the overall shape of any given exemplar cloud are frequency and recency. The former implies that frequent associations between perceptual properties have a stronger effect on whether these associations are stored than infrequent ones. Coming back to the initial hypothetical example, the degree to which my colleague's utterances of /a/ will have a noticeable impact on my internalized exemplar cloud A is partly determined by the shape and robustness of the exemplar cloud at the time she says 'Hallo Christian'. Imagine that A consists of only a single token of said vowel by the time my colleague says 'Hallo Christian'. Her token would then have a severe impact on the shape of the exemplar cloud, since 50% of A would be the result of that particular data point and its properties.
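The arithmetic behind this frequency argument can be sketched as a weighted mean over the tokens in a cloud. This is a simplifying assumption rather than the model of any particular study: each token contributes one measurement (say F1 in Hz), all tokens count equally unless a recency weight is supplied, and recency, if used, is modelled as exponential decay with a freely chosen, hypothetical half-life. The values of 600 Hz for an existing token and 900 Hz for an incoming one are illustrative only.

```python
def cloud_mean(values, ages=None, half_life=None):
    """Weighted mean of one acoustic dimension across an exemplar cloud.

    values    -- one measurement per stored token (e.g. F1 in Hz)
    ages      -- time elapsed since each token was heard (any unit)
    half_life -- time after which a token's weight has halved (same unit)

    With no ages/half_life, every token gets weight 1, so each token's
    share of the representation is 1 / len(values): frequency alone.
    """
    if ages is None or half_life is None:
        weights = [1.0] * len(values)
    else:
        # Hypothetical recency model: weight = 0.5 ** (age / half_life).
        weights = [0.5 ** (age / half_life) for age in ages]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# One token of my own plus one from my colleague: each makes up 50% of A,
# so the mean lands halfway between them.
print(cloud_mean([600, 900]))             # 750.0

# Three more tokens from my colleague pull A further towards her values.
print(cloud_mean([600, 900, 900, 900]))   # 825.0

# Recency: my own token was heard a week ago, hers 5 minutes ago,
# with an assumed half-life of one day (ages given in days).
print(cloud_mean([600, 900], ages=[7, 5 / (24 * 60)], half_life=1))
```

The one-day half-life is arbitrary; a different decay function would change the exact numbers, but in this sketch the recent token still dominates, in line with the five-minutes-versus-one-week contrast used to define recency in this section.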
So if my exemplar cloud A (a single one of my own tokens of /a/) prior to my colleague entering the room had the property "concentration of acoustic energy at ~600 Hz", and my colleague's token a1 is added to A with a first formant in the 900 Hz region, the overall representation of A would shift to an F1 value somewhere in between the two values. Further instances of /a/ uttered by my colleague would subsequently shift A further towards the properties associated with her. In other words, the weight of a percept is a function of both the solidity of the exemplar cloud as it exists at some point in time and the coherence and frequency of new percepts. Recency, on the other hand, implies that percepts and their properties are weighted more strongly in people's cognition if they are relatively recent, i.e. a speech sound that was perceived 5 minutes ago has a stronger impact on the exemplar cloud than a comparable speech sound that was heard a week ago, all other things being equal. And although a considerable body of research has recently been devoted to the investigation of frequency effects in language (cf. the various contributions stemming from the University of Freiburg-based research cluster on frequency effects in language; a list of relevant publications can be found under: http://frequenz.uni-freiburg.de/assets/files/Publikationsliste-GRK-1624-1709-2013.pdf), recency remains somewhat under-researched in comparison.

Evidence supporting the hypothesis that sociophonetic representations undergo updating over time can be gleaned from two strands of research: longitudinal studies of the speech patterns of individual speakers on the one hand, and the analysis of variation in the realization of phonemes as a function of 'word age' in the speech of individuals on the other. Perhaps the best-known example of research of the former kind is Harrington et al.'s work on change in the pronunciation of Queen Elizabeth II's annual Christmas speeches over the last 6 decades (Harrington et al. 2000a/b, 2005; Harrington 2006). These studies show that the Queen's pronunciation has changed over time; specifically, it has changed in a way that reflects change in the speech community as a whole. An additional line of research has shown that, in the speech of individuals, words associated with experiences from comparatively early stages of a speaker's lifetime sometimes exhibit more conservative variants of variables that are undergoing change in the speech community in which that speaker grew up. For example, New Zealand English lost rhoticity (i.e. pronunciation of postvocalic /r/ in words like 'car', 'cart') during the early stages of its development, while rhoticity was still variably present in the speech of second- and third-generation speakers in the late 19th and early 20th centuries.
Gordon et al. (2004) demonstrated that a number of these early speakers of New Zealand English are more likely to retain rhoticity in words associated with childhood reminiscences, 'old' concepts such as farming and mining, and so on. In a similar vein, Langstrof (2005/06) has shown that a number of third- and fourth-generation speakers of New Zealand English show more conservative realizations of the so-called broad-A vowel (i.e. the vowel in words like dance and bath) in proper nouns designating places where the speakers grew up. For example, the only instance of the non-backed, i.e. conservative, realization of broad A produced by one of the speakers in the sample is found in the place name Alexandra, which designates the town in which that speaker grew up. Further evidence is supplied by Yaeger-Dror's (1996) study on Montreal French. Walker and Hay (2011) have recently shown that these effects can also be reproduced in lab-based lexical decision tasks: Listeners are more accurate at identifying words in scenarios where word age is matched with speaker age than in situations where the input data is at odds with the test subjects' real-life linguistic experience, i.e. if a word that is more likely to be used by younger speakers is presented by an old voice, and vice versa. Although these studies have provided useful evidence that sociophonetic representations are malleable over the course of an individual speaker's lifetime, it is not always clear whether these facts provide any direct evidence in favor of late (i.e. post-adolescent and beyond) updating of exemplar clouds. For example, the fact that Queen Elizabeth II shows change in a number of speech sounds that to an extent mirror changes in the speech community does not per se establish that she, or any other speaker for that matter, actually acquires something truly novel in terms of sociophonetic memory. Alternatively, it is at least conceivable that speakers such as Queen Elizabeth, rather than acquiring new sociophonetic representations, simply switch between competing representations that had been part of their social-indexical repertoire from the earliest stages of their linguistic experience onwards. What changes in this type of hypothetical scenario is the speaker's willingness to adapt to an alternative linguistic norm rather than the nature of that norm itself (in the form of one or several internalized sociophonetic models). I would like to claim that, in order to provide more solid evidence in favor of recent updating of sociophonetic representations, it would be highly advantageous to study sociophonetic acquisition processes in blank-slate situations, i.e. where there cannot be any pre-existing array of competing sociophonetic models for the listener to choose from over the course of their linguistic biography.
Of course, since we are all native speakers of some language, it is hard to conceive of a situation where this type of blank-slate scenario could be observed on the basis of L1 data. Therefore, it might be a fruitful endeavor to investigate whether, and to what extent, people whose first experience with a given array of phonetic variants and their associated social-indexical categories occurs long after L1 acquisition (and the associated acquisition of dialectal competence) are able to acquire these variants and their social-indexical associations. Hence, the research reported on in this chapter probes the issue by presenting data from a range of exploratory studies investigating L2 learners' capability to acquire phonetic variants and their associated social-indexical categories in a foreign language. More specifically, I will address the following research questions:

(i) Do non-native listeners have non-random associations between sociophonetic variables and extralinguistic categories? The baseline that needs to be established in order to support the hypothesis that sociophonetic updating takes place over the course of an individual's lifetime against the background of L2 learning is that L2 learners are able to link at least some kinds of variables to extralinguistic categories in a non-random fashion, provided such a link exists in the native L2 environment. This is a necessary condition (a sine qua non) for any updating of sociophonetic representations to occur at all.

(ii) If they do, does first-hand experience with L2 have any influence on how they evaluate variables? Specifically, do L2 listeners evaluate phonetically similar variables and their variants differently in different languages? If we succeed in establishing that non-native listeners have non-random associations between linguistic variability and social categories, the next step is to investigate where these associations come from. Theoretically, three options are conceivable. First, L2 learners might simply carry over value judgments from their native language to an L2, in which case we would predict non-random associations between variants and non-linguistic categories only where a given L2 variable has a clear phonetic correlate in L1. This scenario would also lead us to expect that, whatever the relevant associations might be, they should be identical in L1 and L2. Alternatively, it is conceivable that L2 learners operate on what we might call a 'sociolinguistic elsewhere condition', i.e. establish a notion of standardness in L2 based on the kinds of variants they are exposed to as part of their secondary schooling. To exemplify this: German learners of English cannot be expected to encounter glottal stop variants as valid realizations of medial /t/ in a secondary school context.
When exposed to variants of medial /t/ (including alveolar stops and glottal stops) in a perception and evaluation task, they might simply associate the previously unencountered variant (i.e. the glottal stop in this case) with those extra-linguistic categories that they also associate with sub-standard or colloquial variants in their own language, such as lower occupational status, (perhaps) lower age, and so on. Third, if recency is indeed a strong predictor in the formation of sociolinguistic exemplars in L2 contexts, we would predict a fundamental difference between non-native learners who have experienced L2 in a native environment and those who have not. Specifically, we would expect the former to show associations between linguistic variants and social categories that are more similar to those shown by native speakers of L2 than to those shown by native speakers of L1. Ideally, we would also expect those L2 learners who have had first-hand experience with L2 to evaluate phonetically similar variables and their associated variants differently across the two languages, provided that such differences exist in the two speech communities. It will be argued below that such a case exists between German and English: Although the correspondence is not perfect⁵, one might argue that the alternation between voiced and voiceless variants of intervocalic /t/ is phonetically similar in North American English on the one hand and various dialects of German on the other. Crucially, though, the way those variants map onto extralinguistic categories is fundamentally different: Whereas the voiced (or 'attenuated') variant represents the unmarked baseline realization of medial /t/ in North America, it is associated with various non-standard accents of German. Under the third scenario outlined above, i.e. updating of sociophonetic representations as a function of recent exposure to L2 in a native context, we would expect those Germans who have stayed in the US for a prolonged period of time to have formed fundamentally different associations of medial /t/ variants depending on whether the variable occurs in English or in German.

(iii) Are there any links between the way L2 listeners form sociophonetic representations in L2 and the degree to which they adopt specific L2 variants in their own speech? This question links perception and production: If it can be shown that non-native listeners develop native-like mappings of sociolinguistically stratified variants onto extra-linguistic categories, we might further ask whether this goes along with any differences in their own speech production.
That fundamental differences exist between perception and production has been shown repeatedly in the sociophonetics literature: For example, Hay et al. (2006) have shown that speakers of New Zealand English can employ phonological differences that they show neither in their own speech nor in their conscious perception in order to identify social groups. On the flipside, Niedzielski (1999) has shown that speakers of Michigan English produce variants that they ascribe to out-group speakers (in this case Canadians) when asked about the relevant variables and their specific variants. Whether, and to what extent, similar phenomena are also found in a cross-linguistic setting has not been investigated so far. The research question as stated above breaks down into two sub-questions: Firstly, we might ask whether perception of sociophonetic variation on the part of non-native speakers is in any way linked to their overall phonetic competence in L2. In other words, are L2 speakers whose speech shows fewer native-like L2 accent patterns also less likely to acquire native-like sociophonetic representations in the perceptual domain? Secondly, we might ask whether successful learning of perceptual representations of sociolinguistic variation is accompanied by an inclination to adopt similar variants in one's own L2 production. To address these questions, the test subjects also participated in a production task designed to elicit data on variables that frequently show carry-over effects from L1, in this case from German into English. Examples include final obstruent devoicing, l-allophony (more specifically, the lack thereof in German-accented English), and the /ɛ/-/æ/ merger. In addition, test group participants were asked to produce words that included a range of variables that typically distinguish North American English from RP. These include medial /t/, broad A, and rhoticity (cf. section 2.2 below for details).

⁵ Although the variability between voiceless and voiced (attenuated) /t/ variants per se is comparable in US English and the relevant varieties of German, a difference exists in that in German dialects this pattern is part of a more general rule: t > d / V_V is a special case of C[-voice] > [+voice] / [+cont, +voice] _ [+cont, +voice], i.e. a consonant undergoes voicing if it occurs between voiced continuants. In US English, the rule is more restricted in that it exclusively affects /t/ rather than all voiceless obstruents. In addition, the conditioning environment in US English is restricted to V_V and N_V. Whether this in any way affects the data at hand cannot be determined.

2.2 Method

A total of 64 participants took part in two test conditions, a US condition and a UK condition. The test group in the US condition consisted of 20 participants, while the test group in the UK condition consisted of 16 participants. The English control group in the UK condition consisted of 13 participants; the analogous control group in the US condition consisted of 5 participants. The same 5 participants made up the German control group in both conditions.

2.2.1 US condition

Aims

The overarching goal of this condition was to investigate whether test subjects react differently to phonetically similar variables if those variables occur in different languages and carry different evaluative associations. The variable tested for is flapped /t/ in intervocalic position. Although the flapped variant occurs in both American English and a range of German varieties, it is never part of Standard German, whereas it constitutes the 'unmarked' (pragmatically neutral) realization of intervocalic /t/ in American English; Eckert points out that unattenuated /t/ is associated with a range of marked conversational and social-indexical contexts and properties, including prissiness, Britishness, gayness, careful speech, and others (Eckert 2008).

Subjects

The test subjects in this condition were 20 native speakers of German with at least 6 months of first-hand exposure to North American English. In addition, 5 native speakers of American English as well as 5 native speakers of German without any first-hand experience with American English were recruited as control subjects.

Tasks

Test subjects were exposed to a perception task consisting of 48 words containing the variable as well as an equal number of distracter tokens. The test material was repeated twice, with each round eliciting value judgments on a different extralinguistic variable. The variables investigated were the speaker's AGE, OCCUPATION, and LIKEABILITY. The input stimuli consisted of single-word utterances produced by four phonetically trained native speakers of American English. Each of the four input speakers supplied the same number of test words and filler items (distracters), as well as an equal number of each of the two variants [d] and [t], in order to make sure that associations of linguistic material with extralinguistic categories are not a function of individual voices. For each presentation of a test token, listeners were asked to reply along the following categories: For AGE, answer options included [...] 51. For OCCUPATION, answer options included blue collar worker, white collar worker, and academic.
LIKEABILITY was evaluated on a 5-point Likert scale ranging from 1 (not likeable) to 5 (very likeable). In each round, 96 data points were elicited per subject; half of them contained the test variable, the other half were distracter tokens. In addition, the German test group also participated in a similar task involving the same value judgments on German words containing /t/ between vowels. The data was produced by four phonetically trained native speakers of Standard German and cross-matched for words and variants, i.e. every input speaker contributed the same number of tokens for each variant and word. After the perception task, test subjects also participated in a brief production task that involved reading out a word list. This was done in order to investigate potential links between listeners' awareness of variation in the perception task and the degree to which they use North American variants themselves, as well as the degree to which they retain German accent features (i.e. testing for dialect competence and language competence). The word list is shown below.

had dance had dance had

butter hat simple past butter hat simple past butter hat

hatter link hatter link hatter

start example start example start

head file head file head

pet fast pet fast pet

The following production variables were included:

US accent variables

Intervocalic /t/
This variable is included in the words hatter and butter. Unlike in RP and Standard German, the unmarked realization in the US is a voiced flap [d]~[ɾ] rather than a voiceless plosive [t].

Rhoticity
This relates to whether or not /r/ is realized in coda position. Unlike most varieties of North American English, most varieties of German, and specifically Standard German, are non-rhotic, i.e. etymological /r/ in coda position either drops out entirely or develops into a schwa-like reflex.

/l/ in coda position
Unlike American English, Standard German uses 'clear l' (i.e. an alveolar lateral with no further constriction) in coda position (for discussion regarding the phonetic details of l-'darkness' cf. Recasens et al. 1995, Recasens 1996).

Flat vs. broad A
This term refers to the distribution of 'Middle English short a' (words like trap, ant, dance, fast) in various varieties of contemporary English, which is markedly different in US English compared to RP as well as other British varieties. Specifically, whereas most US varieties have a front vowel [æ] in all the relevant words, RP as well as other Southern English varieties have two qualitatively distinct reflexes, namely [æ] in words like trap and ant, and a low back [ɑ:] in words like dance and fast (for an overview of the development and current distribution of broad A vs. flat A in a number of contemporary varieties of English see Lass (1976), Langstrof (2006: ch. 6), Kortmann and Langstrof (2012)). The relevant test words include dance, fast, example, and simple past.

German accent variables

Final obstruent devoicing
Most varieties of German, including Standard German, have neutralized the etymological contrast between voiced and voiceless obstruents in codas, so that words that used to differ in terms of this feature, such as Rad 'wheel' vs. Rat 'council', are both pronounced /ra:t/ (although the contrast still surfaces in morphophonemic alternations, cf. plural Rä[t]e vs. Rä[d]er). Most varieties of English, on the other hand, maintain the contrast as such on the phonological level, although much discussion has been devoted to the question of which phonetic cues actually carry the contrast (Lisker (1986), Kingston and Diehl (1994), Iverson and Salmons (1995), Nittrouer (2004); for a summary see Thomas (2011: 125)).

/ɛ/-/æ/ identity
Whereas most varieties of English have two short front vowels situated between [ɪ] (as in the RP realization of the KIT vowel; Standard German has a phonetically comparable element in words like Sitz 'seat') and [a] (as in the Australian English realization of STRUT or the North British English realization of TRAP/PATH/DANCE; Standard German has a comparable element in words like Satz 'sentence') on the open-close scale, Standard German has only one short vowel in that region of the vowel space, in words like setzen 'put, set, sit', usually a half-open vowel [ɛ].
As a result, pairs such as head and had are usually pronounced as homophones in German-accented English.

/l/ in coda position
This is as much a German accent variable as it is a US accent variable, in that both US English and most varieties of English English (such as RP) have 'dark l' (i.e. an alveolar lateral with an additional constriction at a post-palatal/velar place of articulation) in coda position, whereas most varieties of German have clear l across the board.

Further independent variables
Due to the restricted availability of potential test subjects that could be recruited at Freiburg University, no controls were applied beyond the requirement to have lived in the US for at least 6 months. However, subjects were asked to fill in a questionnaire after the experiment proper indicating the time, place, and duration of their stay in the US.

2.2.2 UK condition

Aims

Whereas the US condition tested L2 listeners on the same variable in their native language as well as in the foreign language, the goal of the UK condition was to investigate how L2 listeners react to a range of variables that (a) do not occur in the test subjects' native language (German) and (b) differ in terms of both phonetic nature and non-linguistic correlates in the target language. The variables employed in this condition include intervocalic glottal stop, monophthongal vs. diphthongal FACE, and monophthongal vs. diphthongal FLEECE. The use of intervocalic glottal stop as an allophone of /t/ in words like butter, matter, etc. is a rather salient feature of UK English. Although originally (and rather emblematically) associated with working-class London English ('Cockney'), it has recently spread outside its original domain in terms of both region and social class, to the extent that it is regarded as a typical feature of a pan-English variety, or a cluster thereof, frequently referred to as 'Estuary English' (Rosewarne 1984, Altendorf 2003; for a critical discussion of this concept cf. Trudgill 2001: 171-80). Monophthongization of FACE⁶, on the other hand, is a typical regional feature: The monophthong is usually restricted to the North of England as well as Scotland (Wells 1982: 210). Finally, diphthongization of FLEECE is a feature which originally arose as part of long vowel shifts in the South-East of England. And although the distributional data from various varieties of English English indicate that this is both a geographical and a sociolinguistically stratified feature, it is less emblematically associated with these categories than the intervocalic glottal stop variable (for diphthong shift in various varieties of English cf. Wells 1982: 308-10; for FLEECE diphthongization in the South-Eastern parts of England cf. inter alia Altendorf 2003, Williams and Kerswill 1999, Trudgill 2004).

⁶ This is a rather RP-centric way of wording the variable, since the diphthongal realizations we find in RP and other varieties of English are clearly innovations. A historically accurate label would be 'lack of diphthongization of FACE' rather than 'monophthongization'.

Participants

The test subjects in this condition were 16 native speakers of German with at least 5 months of first-hand exposure to English English⁷. In addition, 13 native speakers of English English as well as 5 native speakers of German without any first-hand experience with English English were recruited as control subjects. Note that the German control subjects were the same people who also participated in the US condition as control subjects.

Tasks

Test subjects were exposed to a perception task consisting of words containing the variables as well as an equal number of distracter tokens. The test material was repeated twice, with each round eliciting value judgments on a different extralinguistic variable. The variables investigated were the speaker's AGE, OCCUPATION, and LIKEABILITY. The input stimuli consisted of single-word utterances produced by 4 phonetically trained native speakers of RP. Each of the 4 input speakers supplied the same number of test words and filler items (distracters), as well as an equal number of each of the two variants of the test variables, in order to make sure that associations of linguistic material with extralinguistic categories are not a function of individual voices. The answer categories for AGE, OCCUPATION, and LIKEABILITY were identical to the ones used in the US condition. After the perception task, test subjects participated in the same production task as was used in the US condition.
Independent variables
As with the US condition, no a priori controls were applied beyond the requirement of at least 5 months' exposure to English English.

7

This was a purely conventional choice: the majority of test group participants in the UK condition were students who had previously participated in the ERASMUS student exchange program, the typical duration of which is 5 months.


2.3 Results US condition – Perception

2.3.1 US Condition – OCCUPATION

Figure 2.2 displays the results for the variable OCCUPATION.

[Figure 2.2 (a)-(c), bar charts not reproduced: US condition – OCCUPATION; (a) German test group, responses to US tokens and German tokens ([t], [d], distracters); (b) German control group; (c) US control group. y-axis: mean occupation rating (1-3).]

Figure 2.2 - Histograms showing occupation ratings in the US condition. Fig. 2.2 (a) shows the test group; 2.2 (b) and (c) show the control groups. Fig. 2.2 (a) shows the test group's average responses to the English tokens (the three bars on the left) and to the German tokens (the three bars on the right). On the x-axis, the data is broken down into the relevant variables as indicated in the axis caption. The y-axis represents dummy-coded (cf. Baayen 2008) numerical averages, where 1 = blue collar worker ('Arbeiter' in the German condition), 2 = white collar worker ('Angestellter' in the German condition), and 3 = academic ('Akademiker' in the German condition).
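The averages plotted in Figure 2.2 come from mapping each answer category onto its numerical code and then averaging. The Python sketch below shows that step. It assumes each condition's answers are available as a plain list of category labels. The English labels and the sample responses are hypothetical.

```python
from statistics import mean

# Numerical codes of the OCCUPATION answer categories, as given in the
# caption of Figure 2.2 (German labels of the German condition in comments).
OCCUPATION_CODES = {
    "blue collar worker": 1,   # 'Arbeiter'
    "white collar worker": 2,  # 'Angestellter'
    "academic": 3,             # 'Akademiker'
}

def mean_occupation_score(responses):
    """Return the mean numerical occupation rating for a list of answer labels."""
    return mean(OCCUPATION_CODES[label] for label in responses)

# Hypothetical answers to [t] tokens (invented for illustration only).
t_responses = ["academic", "white collar worker", "academic", "blue collar worker"]
print(mean_occupation_score(t_responses))  # 2.25
```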


Figure 2.2 (a) shows the results for the German test group, i.e. native German participants with at least 6 months of exposure to US English. The data is broken down into subtasks. The three bars on the left-hand side show the pooled average replies to the US variants, whereas the bars on the right-hand side show responses to the German data. The data is furthermore broken down along the factor variant: black bars display responses to the [t] variant, and grey bars show responses to [d]. White bars show the distracter tokens. First of all, Figure 2.2 (a) shows noticeable differences when we compare the test group participants' evaluations across languages. When exposed to English data, participants assign significantly higher occupation scores to the [t] variant than to both the [d] variant and the distracter tokens (the difference between [t] and [d] scores is significant on the […]

[…] occurs within a specific variety of a language, the higher the salience of one of the two features. In other words, Rasz' notion of salience is essentially syntagmatic as well as internal, i.e. a function of the distributional properties of linguistic features. This accounts for the fact that one and the same feature attracts more comment in some linguistic contexts, i.e. specific phonotactic sequences, than in others. A case in point is glottalization in the UK, which is a lot more likely to attract social comment in scenarios where it is
used intervocalically than in other contexts such as word-finally. In addition, Rasz' approach allows for a predictive, quantitative formulation of a partial theory of salience, since transitional probabilities are straightforwardly measurable: if we know a given language's phonotactics, we can to an extent predict which features can be assumed to be salient to speakers of that language. I would like to claim that the approach taken in the experiments reported in this chapter and the following one complements Rasz' approach to assessing linguistically determined salience by looking at the paradigmatic side of the issue. That is, it addresses salience grounded in the linguistic (i.e. acoustic, auditory, lexical, etc.) and distributional properties of the variants of a given variable. Listeners are only asked to assign manufactured input distributions of variants to hypothetical speaker groups, without any connection to real-world knowledge or experience regarding 'secondary' (i.e. social) categories such as standard vs. substandard, socio-cultural contingencies, and so on. This essentially amounts to a somewhat minimalistic approach to the concept of primary (i.e. linguistic) salience: a salient feature is a feature people notice. What does 'noticing' imply? Technically speaking, noticing probably implies something along the following lines. A feature, in our case a linguistic variant, has a distribution that is not entirely random with respect to another feature, i.e. non-linguistic categories, in the mental representation of listeners. Although this may read rather cryptically, the underlying idea is relatively straightforward: a listener becomes aware of some linguistic feature. In other words, what said listener does is detect or infer a pattern in the distribution of that feature, i.e. 'this feature is used by person Y rather than anyone else, by the people in this region rather than anyone else, by females rather than males, etc.'
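'Noticing' in the sense just defined can be operationalised by asking whether a listener's group assignments for one variant depart from what random guessing would produce. The Python sketch below does this with an exact two-sided binomial test against chance (p = 0.5). The choice of test and the token counts are illustrative assumptions, not the analysis reported in this book.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k (one common definition of the two-sided test).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance guards against floating-point ties.
    return sum(q for q in probs if q <= observed * (1 + 1e-9))

# Hypothetical listener: of 40 glottal stop test tokens, 31 were assigned to
# group 2, whereas random guessing would lead us to expect about 20.
p_value = binomial_two_sided_p(31, 40)
print(p_value < 0.05)  # True: this listener's assignments are not random
```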
Note that this definition is impartial as to the correctness of the patterns listeners come up with, and as to whether they do so consciously. What counts is that they infer a non-random distribution of a linguistic feature with respect to one or several non-linguistic categories. If we accept this definition, the approach taken in these experiments relates to salience in a relatively straightforward manner, since all of them confront learners with non-randomly distributed input data. In other words, if the data contains patterns, the question is whether and to what extent listeners are able to detect ('notice') them. The experiments do not eo ipso tell us why a given feature is salient. They do, however, allow for an in-depth analysis of how the factors that play a role in sociophonetic pattern detection rank relative to each other, in other words, of what is salient relative to something else. Of course this view is rather restricted in scope, as it excludes a whole range of factors that may well have a bearing on salience, such as speaker characteristics beyond the usage of linguistic variables. I will refer to this aspect as 'secondary' or 'external' salience. Here a listener notices a linguistic feature not on its own grounds but for one of two other reasons. Either the speaker has noteworthy non-linguistic characteristics such as fancy clothes, or someone else tells the listener that the speaker uses a given feature. In the latter case the listener does not 'notice' the feature as a result of their own perception and processing mechanisms. On the flip side, the controlled nature of the learning tasks allows us to establish a salience hierarchy among the different kinds of factors associated with the variants in the learning task: phonetic familiarity, input frequency, the shape of input distributions, etc.

Probability matching
An additional topic these experiments have the potential to address is whether and to what extent the learning of sociolinguistically structured variables and their variants follows the principle of probability matching. Labov (1994: 580) has claimed that humans are essentially probability matchers. What this implies is that whenever humans are faced with non-categorical input weightings in any sort of pattern detection task, they tend to replicate the input weighting rather faithfully instead of overshooting it. Interestingly, Labov quotes a range of ethological studies showing that humans display this type of behavior even when it turns out to be detrimental to their individual goals. The example Labov discusses is the so-called T-maze test:

[Figure: a possible reward ('Reward?') at either terminus of the maze]

T-maze, schematic

In this type of test, participants of various species (such as humans, ducks, and rats) are faced with a behavior-reward type of task. In each iteration/run, they find some kind of reward (food, money) at one end of the maze but not at the other. If the distribution of rewards in the training runs is neither categorical nor random, humans will match their T-maze behavior to the weightings they previously encountered. Why is this interesting? Assume the input weighting is 80/20, i.e. a participant learns that in about 80% of all cases a reward will be found at the right-hand terminus of the maze, and in 20% of all cases at the left-hand terminus. In subsequent runs, the majority of human participants will opt for the right terminus in 80% of all cases and for the left terminus in 20% of all cases. A simple calculation shows that this is not the optimal pattern on an individual basis. Participants would get more out of the test runs if they opted for the 80% terminus every single time:

scenario 1, probability matching response scheme, 80/20 input: 0.2 × 0.2 + 0.8 × 0.8 = 0.68, i.e. a 68% likelihood of receiving the reward
scenario 2, categorical response scheme, 80/20 input: 1.0 × 0.8 = 0.8, i.e. an 80% likelihood of receiving the reward

Interestingly, different species react differently to this type of task. Whereas humans (as well as ducks, for example) show probability matching, rats seem to oscillate between the different strategies (Gallistel 1990). An individual rat that adopts the categorical response strategy will therefore receive the reward more often than a member of a probability matching species.
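The arithmetic of the two scenarios works the same way for any input weighting. The success rate is the sum, over both termini, of the probability of choosing a terminus times the probability that the reward is there. The following minimal Python sketch assumes that, in each run, the participant's choice and the placement of the reward are independent.

```python
def expected_reward(choice_probs, reward_probs):
    """Probability of finding the reward when terminus i is chosen with
    probability choice_probs[i] and the reward sits at terminus i with
    probability reward_probs[i] (choice and placement assumed independent)."""
    return sum(c * r for c, r in zip(choice_probs, reward_probs))

reward = (0.8, 0.2)                                 # right terminus, left terminus
matching = expected_reward((0.8, 0.2), reward)      # scenario 1: probability matching
maximising = expected_reward((1.0, 0.0), reward)    # scenario 2: categorical responses
print(round(matching, 2), round(maximising, 2))     # 0.68 0.8
```

Under probability matching the success rate is the sum of the squared input weights. This sum never exceeds the largest weight, so the categorical strategy is at least as good for every non-random input.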
It is interesting to ask why humans should show such non-optimal output behavior; it can be shown to be optimal from a group- and habitat-based rather than an individual perspective. What is more relevant to the study of variation, however, is Labov's argument that probability matching is a deeply ingrained behavioral pattern in humans. It can therefore account for why non-categorical variant distributions can be passed down through several generations of language learners, sometimes over very long stretches of time, as in various stable variation scenarios. What is less clear, however, is whether and under what conditions probability matching can be overridden. That this should be possible in theory can be inferred from the existence of linguistic stereotypes. Technically speaking, a stereotype is nothing more than a mismatch between two frequencies. The first is the input frequency of a linguistic variant in the speech of some speaker, taken as a perceived representative of some group. The second is the frequency or likelihood with which a listener ascribes said feature to any representative of that group. An anecdotal example: speakers of English English rarely use intervocalic glottal stops in 100% of the contexts where this variant may be applicable. However, Australians almost always use intervocalic glottal stops when mocking the pronunciation of English speakers. In other words, in this scenario the output frequency of a variant (close to 100% on the part of the Australian listener) far exceeds the input (the Australian's previous experience with English English intervocalic /t/, in all likelihood less than 100%). A further open question is whether stereotyping, i.e. lack of probability matching, is mainly a function of wrongly inferred input frequencies, or whether additional factors also enter the equation. In other words, do stereotypers fail to learn the input probabilities, or do they consciously ignore or override them? If input weightings play a role, it is furthermore unclear what kind of input weighting is actually conducive to stereotyping: is it the frequency or the rarity of a feature that promotes stereotyping? A priori, either scenario is plausible:

(1) A listener perceives a high, yet not categorical, correlation between some linguistic feature and a non-linguistic category (e.g. intervocalic /t/ attenuation among American English speakers), and the weight of the positive evidence eclipses the rare instances of counterevidence. As a result, the counterevidence might be too rare to enter the perceptual representation of the listener, and stereotyping occurs.
(2) A listener perceives a feature, say a phonetic variant like the intervocalic glottal stop, in the output of one speaker group but not in that of others. Even if that feature does not happen to be particularly common in the group of speakers that uses it, it is distinctive of that group. As a result, a listener might be tempted to over-ascribe that feature to the group in order to demarcate it.


In the experiments discussed in this chapter, input conditions were created that mirror the above scenarios in various ways (cf. below for a detailed description of each of the test conditions).

3.2 Previous studies

In the DLF experiments, the data participants were confronted with in the training sessions consisted of 160 test words containing the phonetic variant and 160 distracter tokens. The data was produced by four speakers and cross-balanced across both test words and distracters, so that associations with social categories could not be formed on the basis of speaker characteristics. The data was presented using DMDX/MS PowerPoint. The audio recording, i.e. the stimulus proper, was presented alongside a picture containing a graphic representation of the word. In addition, a visual indication of the social category was presented, i.e. each auditory stimulus was tagged as 'tribe 1' or 'tribe 2'. Participants were distributed across four experimental conditions. In condition 1, the two hypothetical speaker groups ('tribes') differed in their use of variants of intervocalic /t/ in words like butter, matter, etc.: tribe 1 was always associated with alveolar [t], whereas the glottal stop realisation was categorically tagged 'tribe 2'. Condition 2 used the same variable. However, in this condition 80% of the tribe 1 tokens of the variable were produced with an alveolar stop, and the remaining 20% with a glottal stop. The converse distribution applied to the stimuli tagged as tribe 2. Condition 3 went one step further and tested a different sort of phonetic variation while maintaining the 80/20 vs. 20/80 distribution used in condition 2. The variable used was diphthongisation of the FLEECE vowel. This confronts learners with what can be expected to be a rather less 'salient' variable in terms of both phonetic detail and social affect.
Whereas tribe 1 stimuli were predominantly associated with a monophthongal realisation of FLEECE in the input data, tribe 2 stimuli were predominantly associated with a slightly diphthongised version, which represents a common realisational variant in the UK (Tollfree 1999, Williams and Kerswill 1999). Finally, condition 4 trained and tested participants on another vocalic variable, diphthongisation of FACE. The difference between this variable and the previous one is that the degree of diphthongisation in FACE is regionally stratified, monophthongal realisations being a typically Northern English phenomenon (Wells 1982: 210). Figure 3.1 displays the pooled results for each of the test conditions.

[Figure 3.1 (a)-(d), bar charts not reproduced: input distribution and reply distribution (%, 0-100) for tribe 1 vs. tribe 2 in Task 1 and Task 2 (glottals, alveolars, distracters) and Task 3 and Task 4 (monophthongs, diphthongs, distracters).]

Figure 3.1 - Histograms showing pooled tribe 1 / tribe 2 replies across 4 experimental conditions (Docherty et al. 2013). Both the input distribution and the average reply distribution are shown.
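Figure 3.1 contrasts input and reply distributions. One way to classify a reply distribution as probability matching, undershoot or overshoot (overshoot being the stereotyping case discussed above) is to compare a listener's share of tribe 1 replies for a variant with the corresponding input share. Both shares are measured against chance (0.5). The Python sketch below does this. The tolerance band, the classification rule and all token counts are hypothetical and are not taken from the DLF study.

```python
def tribe1_share(n_tribe1, n_tribe2):
    """Proportion of tokens of one variant that are (or were assigned as) tribe 1."""
    return n_tribe1 / (n_tribe1 + n_tribe2)

def classify_reply(input_share, reply_share, tolerance=0.05):
    """Compare a listener's reply share with the input share of a variant.

    The tolerance band is an illustrative assumption. Ratios <= 0 (no
    association, or a reversed one) also count as undershoot here.
    """
    if input_share == 0.5:
        raise ValueError("input carries no association to compare against")
    if abs(reply_share - input_share) <= tolerance:
        return "probability matching"
    ratio = (reply_share - 0.5) / (input_share - 0.5)
    return "overshoot" if ratio > 1 else "undershoot"

input_share = tribe1_share(64, 16)           # hypothetical 80/20 input: 0.8
for reply in [(62, 18), (78, 2), (44, 36)]:  # hypothetical listeners
    print(reply, classify_reply(input_share, tribe1_share(*reply)))
```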

It can be observed that, by and large, listeners successfully managed to learn and replicate the distribution of alveolar vs. glottal stop across the two hypothetical speaker groups they were trained on in conditions 1 and 2. An additional finding was that listeners tune in to variables rather than to individual variants. In other words, a listener who is good at learning the input weighting of the glottal stop variant in condition 2 is also more likely to learn the weighting of the alveolar stop successfully. Figure 3.2 illustrates this with a scatter plot of highly correlated responses per subject, suggesting that the response distributions are due to the same underlying factor (presumably learning).


Figure 3.2: Scatterplot showing a strong negative correlation (Spearman's rho = -.89, p < .0001) between individual subjects' tribe 1 answers across the two variants of the medial /t/ variable (DLF's task 2, cf. figure 3.1(b) above). Each data point represents an individual test subject. The line represents a locally weighted scatterplot smoother fit through the data (Cleveland 1979).
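Spearman's rho, as reported in the caption of Figure 3.2, is the Pearson correlation computed on ranks. The self-contained Python sketch below illustrates the statistic, giving tied values their average rank. The per-subject values are invented. In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-subject shares of tribe 1 answers (not the DLF data).
glottal_t1 = [0.20, 0.35, 0.10, 0.50, 0.25]
alveolar_t1 = [0.80, 0.60, 0.85, 0.55, 0.70]
print(round(spearman_rho(glottal_t1, alveolar_t1), 2))
```

The negative sign reflects the task structure. A subject who assigns more glottal tokens to tribe 2 tends to assign more alveolar tokens to tribe 1.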

Condition 3 (cf. fig. 3.1c above), in contrast, seems to have presented listeners with a rather more challenging task. Sociophonetic learning was much less evident than in the glottal stop conditions, although a small number of participants seem to show emergent sociophonetic learning, i.e. non-randomly distributed responses across tribes. This was expected, given the lower salience of the FLEECE variable compared to medial /t/ in phonetic and sociolinguistic terms. The result for condition 4, i.e. monophthongal vs. diphthongal realisations of FACE, was somewhat more surprising. Recall that this variable is relatively solidly cued in terms of region and perhaps auditorily more salient than FLEECE diphthongisation. However, the results indicate that listeners in condition 4 by and large failed to learn the associations they were exposed to in the input data. On the whole, the findings from the DLF study indicate two things. (a) Lab-based sociophonetic learning is possible. (b) Learning is more successful for some variables, in this case a consonantal variable with one highly salient variant, whereas variables that are arguably less salient on both phonetic and sociolinguistic counts are much harder to learn. The research reported on in this chapter elaborates on a number of issues arising from the DLF project. One of the problems with the DLF data pool is that it is not entirely clear which factors actually predict cross-variable variation in learning success. Listeners do extremely well in the glottal stop condition but show either significant undershoot or no learning at all for the vowel variables. We might ask whether this is due to the phonetic salience of the variables themselves, or whether it has to do with external factors such as stigmatisation, sociolectal and dialectal entrenchment, and so on. Given the relatively restricted number of variables in the DLF series of experiments, these factors inevitably co-vary. For example, intervocalic glottal stop is both phonetically highly salient and strongly associated with external factors such as age, social class, and possibly region. Monophthongal FACE, on the other hand, is presumably rather less salient and strongly associated with region, but probably less so with age. An additional issue arising from the DLF project is to what extent the kind of learning exemplified by the listeners actually requires any degree of familiarity at all with the type of variation encountered in the experiment.
Little control of external variables was applied in the course of the DLF projects beyond the requirement that all test subjects be native speakers of English English. The implication is that test subjects could be expected to have at least some experience with the variables used as a substrate for tribe identification in the training data. The follow-up research reported on in this chapter therefore addresses the following research questions:

(i) Is sociophonetic learning in the lab restricted to real variables? All of the variables used in the DLF experiments are 'real' variables in English English, that is, listeners can be expected to have encountered them outside the context of the learning experiment proper. It is unclear, however, whether and to what extent this experience is drawn upon, or even required, in the experimental situation itself. In other words, the learning aspect of the DLF experiments relates to one side of the equation only, namely the previously unencountered hypothetical social groups. However, sociophoneticians would probably also want to show that learning can happen with respect to previously unknown variables. Otherwise, it would be hard to conceive of a mechanism by which new sociophonetic patterns based on new variables and their variants could ever arise. To investigate the issue, two test conditions (conditions 3/3 and 3/4, see below) exposed German test subjects to variants that do not occur in German: nasalised vowels (condition 3/3) and upgliding diphthongs with a mid nucleus (condition 3/4).

(ii) Is sociophonetic learning in the lab restricted to variables embedded in real words? A related question is to what extent sociophonetic variation actually needs to be embedded in real words. Again, using real words as the structural framework for sociophonetically stratified variation may provide a link between the experimental situation proper and listeners' general sociophonetic experience. This is also relevant in light of recent arguments within exemplarist frameworks for the prominence of the word grosso modo in phonetic/phonological processing (Pierrehumbert 2002, Hay 2013). If these arguments hold, a variable in real words can presumably be expected to be learned rather more readily than the same variable in nonsense words. In the follow-up experiments, four conditions included variables embedded in fabricated lexical environments (conditions 3/2-3/5, see below).

(iii) Is lab-based sociophonetic learning sensitive to lexical contingencies? As a corollary of the previous question, we might ask whether language-internal sources of variation beyond phonetic variation proper are also reflected in listeners' performance. In the DLF experiments, all words that carried the phonetic contrast had the same input weighting.
However, if learners are able to track differential frequency distributions of variants across words, as predicted by exemplarist models of speech processing, this should be reflected to some extent in learning tasks of the DLF type. In the experiments reported on here, one test condition (condition 3/1) was designed to test this prediction.

(iv) To what extent does the learning of sociophonetic variants of a given variable depend on the degree of phonetic detail/distance between the two variants? As pointed out above, the variables employed in the DLF project could not ultimately resolve whether learning success is a function of external factors such as stigma and prestige, or whether phonetic facts are a more solid predictor. Participants in the DLF experiments did well in the glottal stop condition, i.e. with a variable that is salient both on external (social) counts and in terms of its phonetic properties. Some of the follow-up conditions reported on below aim to address this problem (cf. conditions 3/3 and 3/4 below).

(v) Is there any link between this type of sociophonetic learning and listeners' own production/evaluation of different variants? The DLF project focussed exclusively on the perception and learning of sociophonetic variation. It ignored listeners' awareness of, and opinions about, specific variables and their respective variants. However, factors such as salience, and with it learnability in terms of pattern detection, crucially depend on (and are even defined on the basis of) the degree to which listeners show conscious or unconscious awareness of the variable in question (cf. Labov 1972 on the fundamental differences between markers, indicators, and stereotypes). For a more in-depth understanding of learning capabilities and constraints, it would certainly be useful to calibrate listeners' performance in learning tasks against the degree to which they are aware of the variable as such.

3.3 Method

Methodologically, the approach taken in these experiments is broadly comparable to the research reported on by Docherty et al.: test subjects were trained and tested on a range of variables in various test conditions (see below). The test subjects were students recruited at the University of Freiburg, all of whom were native speakers of German. The experiments consisted of two discrete stages, a training session and a test session. In the training session, listeners were exposed to input data consisting of 160 test tokens and 80 distracter tokens. Each condition involved training and testing on one phonetic variable with two variants. Participants were told that they were listening to a range of words spoken by members of two different groups, referred to as Gruppe 1 (i.e. 'group 1') and Gruppe 2 ('group 2') in the experiment. The data was presented using PowerPoint presentations on a laptop computer. In the training session, each slide consisted of a sound file and either an orthographic representation of the word (in condition 3/1) or a picture (in conditions 3/2-3/5, see below). In addition, each slide indicated whether the speaker was a member of group 1 or group 2. In the subsequent test session, participants were exposed to the same tokens as before, except that no indication of group affiliation was given. Participants were asked to indicate which of the two groups they thought each token should be assigned to. The data was produced by four trained linguists, 2 males and 2 females. Each speaker contributed the same number of test tokens and distracters. In addition, each speaker contributed equal numbers of tokens of both variants, so that there was no intrinsic association between specific variants in the input data and individual speakers. Tokens were presented at 3-second intervals. A total of 111 test subjects were distributed over 8 test conditions. 15 participants took part in each of conditions 3/1, 3/3, and 3/4, and 10 participants in each of conditions 3/2, 3/6, and 3/7. 6 participants took part in condition 3/5 and 20 in condition 3/8. 5 participants took part in each of condition 3/6b and an ancillary condition (cf. below).

Condition 3/1 – real words, real variable
German test subjects were trained on differential distributions of a real variable embedded in real words. The main purpose of condition 3/1 was to test whether the type of ad hoc learning observed by Docherty et al.
also applies to a variable that is not particularly prominently associated with region or social class. In addition, this condition was designed to test for links between sociophonetic learning, evaluation, and the test subjects' own production of the test variable. After the test phase of the learning experiment proper, listeners were exposed to the same stimuli in two rounds eliciting judgements on the speakers' (a) OCCUPATION and (b) LIKEABILITY. Answer options for OCCUPATION were blue collar worker (Arbeiter), white collar worker (Angestellter), and academic (Akademiker). Answer options for LIKEABILITY were chosen from a 5-point Likert scale ranging from 1 (not likeable) to 5 (very likeable). Finally, test subjects were asked to read out a brief word list and a range of short sentences containing the test words as well as a range of distracter tokens. The test variable was [ɛ:] vs. [e:] in German words like Käse 'cheese'. The open vowel is the etymologically older variant. The use of [e:], the 'new', unetymological variant, amounts to a merger, since [e:] is a phonemic German vowel that is etymological in words like Beeren ('berries'), leben ('live'), Beben ('quake'), etc.¹¹ Sociolinguistically, the raised realisation of /ɛ:/, and therewith the merged system, is not particularly strongly stigmatised. It has recently been accepted by the Duden guidelines for Standard German as an acceptable pronunciation variant of /ɛ:/. The input distribution of variants over the hypothetical speaker groups was set at 80% [ɛ:] vs. 20% [e:] in group 1, as opposed to 20% [ɛ:] vs. 80% [e:] in group 2. One test word (Bären, i.e. 'bears') did not follow this distribution. Rather, Bären had a categorical distribution in the input data: the [e:] variant was always associated with group 1, and the [ɛ:] variant always with group 2. This was done to test whether listeners learn phonological variables exclusively, or whether they also keep track of differential distributions of phonological variants across lexical items. A total of 160 test tokens were used in the training session, which is comparable to the number of test tokens in the Docherty et al. experiments. In addition, 80 distracter tokens were used. In the subsequent test session, the overall number of tokens was cut by 50%, yielding 80 test tokens and 40 distracter replies per participant. The evaluation task in condition 3/1 included 32 tokens per round and test subject.
The production task in condition 3/1 consisted of the following material:

(1) sentences
Er mochte die Vase 'He liked the vase'
Er hat den Käse geschnitten, nicht geschmiert 'He sliced the cheese, he didn't spread it'
Wiesengrün statt Auspuffschwarz 'meadow-green instead of muffler-black'
Es waren Braunbären, keine Schwarzbären 'They were brown bears, not black bears'
Strassenrecht, nicht Wegerecht 'street-law, not road-law'
So ein Maß an Häme hätte ich von einem Dozenten nicht erwartet 'I would not have expected such malice on the part of a lecturer'

Each of the sentences was read out twice (the position of the variable is indicated in bold underscored letters). The positional differences of the test variable in terms of stress/tonicity were deliberate. Sentences not containing the variable served as distracters and as baselines for vowel normalisation.

(2) word list
Wiese, Huhn, Vase, Käse, Besen, Bären, Häme
Phonologically: /vi:zə/, /hu:n/, /va:zə/, /kɛ:zə/, /be:zən/¹², /bɛ:rən/, /hɛ:mə/, i.e. 'meadow, chicken, vase, cheese, broom, bears, malice'

The word list was read out 3 times by each test subject, and the order of words was permuted across iterations. Filler items served as distracters and as baselines for vowel normalisation. Examples of the PowerPoint slides that accompanied the sound stimuli in condition 3/1 are shown in figure 3.3 below.

11

Modern German /e:/ descends from Proto-Germanic /ai/ in a range of conditioned environments, whereas /ɛ:/ reflects umlauted reflexes of /a:/, cf. Lass (1976: ch. 3) for an overview.

12

Note that in words like Besen, /e:/ is etymological.
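The word list of condition 3/1 was read three times, with the word order permuted across iterations. The Python sketch below produces such reading sheets. It assumes independent random shuffles for each round, since the permutation scheme actually used is not specified. The function name and the seed are illustrative.

```python
import random

WORD_LIST = ["Wiese", "Huhn", "Vase", "Käse", "Besen", "Bären", "Häme"]

def permuted_rounds(words, n_rounds=3, seed=None):
    """Return n_rounds independently shuffled copies of the word list."""
    rng = random.Random(seed)  # seeded generator for reproducible reading sheets
    rounds = []
    for _ in range(n_rounds):
        order = list(words)    # copy, so that the master list stays untouched
        rng.shuffle(order)
        rounds.append(order)
    return rounds

for i, order in enumerate(permuted_rounds(WORD_LIST, seed=1), start=1):
    print(i, ", ".join(order))
```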


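[Figure 3.3, slides not reproduced: training slide (left) showing 'GRUPPE 2' and 'Käse'; test slide (right) showing 'Käse' only.]

Figure 3.3 - PowerPoint slides as shown to participants in the training session (left) and the test session (right) in condition 3/1.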

In the test session of the learning experiment and in the evaluation task, answer sheets were used to record replies. The production task data was recorded in mp3 format using a Roland Edirol R09 mp3/wav recorder.

Condition 3/2 – hypothetical words, real variable
In condition 3/2, listeners were trained on the same variable as in condition 3/1, [ɛ:] vs. [e:], with the same weighting of variants. However, the variable was embedded in hypothetical words of the shape VCV, the first vowel being the variable. Test subjects were instructed as follows:

Liebe/r Testteilnehmer/in, Vielen Dank für die Teilnahme an diesem Experiment! Das Experiment ist wie folgt aufgebaut:

(1) In einer ersten Trainingsphase werden Sie Wörter einer Ihnen unbekannten Sprache hören. Des Weiteren werden Sie zu jedem Wort Bilder auf dem Bildschirm sehen, die die jeweilige Bedeutung des Wortes darstellen. Nehmen Sie an, dass es bei den Sprechern dieser Sprache unterschiedliche Gruppen gibt, die in diesem Experiment als ‚Gruppe 1’ und ‚Gruppe 2’ bezeichnet werden. In der Trainingsphase des Experimentes wird für jedes Wort angegeben, welcher der beiden Gruppen der jeweilige Sprecher zuzuordnen ist.


(2) Nach Beendigung der Trainingsphase gibt es eine Testphase, im Laufe derer Sie jedes Wort einer der beiden Gruppen zuordnen sollen. Viel Spass!

[Dear participant, Thank you for participating in this experiment! The experiment is structured as follows:

(1) In a training session you will listen to words from an unknown language. In addition, you will see a picture displaying the meaning of each word. Assume that there are two distinct groups of speakers in this language, who will be referred to as 'group 1' and 'group 2' respectively in the course of this experiment. During the training stage, it will be indicated which group the speaker belongs to.

(2) After the training session there will be a test session during which you will be asked to assign each word to either one of the two groups. Have fun! (Translation my own, CL)]

The overall number of lexical items was limited to four: two words contained the test variable ([e:tə] / [ɛ:tə] signified "house", [e:kə] / [ɛ:kə] signified "car"), and two contained no variable ([o:tə] signified "door", [u:tə] signified "window"). Note that although these words do not occur in German, they are possible German words in terms of phonotactic structure. To further minimise the demands of learning the semantic content of the lexical items, a picture of each of the concepts conveyed by the hypothetical words was shown.


[Slide text: training session (left) 'GRUPPE 1']

Figure 3.4 – PowerPoint slides as shown to participants in the training session (left) and the test session (right) in condition 3/2.

Condition 3/3 – hypothetical words + hypothetical variable
In this condition, listeners were trained on a variable that does not occur in German, namely a nasalised vowel with the variants [ɛ̃] and [ɔ̃]. The variable was embedded in the same phonological environments as in condition 3/2: two test words and two distracter words. The number of tokens per subject as well as the variant distribution in the input data was identical to the previous conditions; the instructions were identical to those in condition 3/2.

Condition 3/4 – hypothetical words, hypothetical variable
Like condition 3/3, condition 3/4 was meant to test whether listeners are able to learn sociophonetic variation from scratch, i.e. assign variants of a variable that does not occur in their native language to hypothetical speaker groups. However, it can be argued that the kind of variation test subjects encountered in condition 3/3 is only partially unknown, since German does have the non-nasalised versions of the vowels used in condition 3/3 as phonemes. Condition 3/4 therefore tested listeners on a type of variation that is arguably less salient in terms of phonetic distance, namely variable degrees of opening in the nucleus of an upgliding mid vowel. The variants used in this condition can be transcribed as [ei] vs. [ɛi]. The number of tokens per subject as well as the variant distribution in the input data was identical to the previous conditions; the instructions were identical to those in conditions 3/2 and 3/3.


Condition 3/5 – hypothetical words, 1 hypothetical variant, 1 real variant
While the variants of the test variables listeners encountered in conditions 3/1-3/4 were either both known or both unknown, condition 3/5 trained listeners on a variable that has a known variant [e:] as well as an unknown one [ei]. The number of tokens per subject as well as the variant distribution in the input data was identical to the previous conditions; the instructions were identical to those in conditions 3/2-3/4.

The basic data elicitation paradigm employed in conditions 3/6-3/8, reported on below, is identical to the approach taken before: a selection of test subjects was confronted with non-randomly distributed input data during a training session and subsequently tested on whether they had succeeded in forming non-random associations between phonetic variants and non-linguistic tags, i.e. labels referring to hypothetical speaker groups. These tests, however, manipulated the weight and the distributional shape of the variants while holding the phonetic substance of the variable constant. The variable chosen was the same as that used in conditions 3/1 and 3/2 above, namely differential degrees of height in long front monophthongs ([ɛ:] vs. [e:]). The same recordings were used; the relevant on-screen presentations were adapted to fit the new input distributions.

Condition 3/6
Condition 3/6 trained and tested listeners on an asymmetrical type of input distribution. Unlike the input weightings used in conditions 3/1-3/5, where the variant weighting in one of the hypothetical social groups was the inverse of the weighting found in the other group, condition 3/6 associated one variant with one group on a categorical basis (100/0) while associating the complementary variant predominantly with that same group. Specifically, the low-mid variant [ɛ:] was always associated with group 2, whereas [e:] was associated with group 2 80% of the time.
This leaves a comparatively small number of test tokens associated with group 1 in the training session, all of which were [e:]. A total of 80 distracter tokens were distributed over the two speaker groups at chance level. Ten test subjects, all native speakers of German, were recruited for this condition. The basic idea behind this design was to test for one of a range of different types of stereotyping (see above): if one of two variants is both rare and exclusively associated with one of the two groups, do listeners over-ascribe this variant to that group? A rough real-world correlate to this scenario would be the use of intervocalic glottal stop in two speech communities, England and the US: whereas American speakers never use intervocalic glottal stop (equivalent to [e:] usage in group 1), English speakers sometimes use it, but by no means exclusively (a rough equivalent of group 2). In other words, are absolute numbers (the weight of the evidence) better predictors of the association of variants with non-linguistic categories, in which case we would not expect overshoot? Or is it the structure of the evidence (exclusive association of a small number of variants with one group) that predicts such associations, in which case overshoot seems a plausible scenario?

Condition 3/6b¹³
Condition 3/6b approaches the issue of asymmetry from a somewhat different angle. Whereas in all other conditions the overall number of tokens was identical in the two hypothetical speaker groups (80 test tokens tagged to each group), condition 3/6b confronted listeners with a situation where the majority of the overall test token pool (80%) was associated with one group (group 1) in the training data, whereas only 20% of the test tokens were associated with group 2. In addition, group 2 used only the [e:] variant, while the two variants were evenly weighted in group 1. This condition therefore elaborates on the design of condition 3/6 in that there is room for stereotyping in two ways: test subjects may be led to over-ascribe the [ɛ:] variant to group 1, since this is the only group that uses [ɛ:]; conversely, they might over-ascribe [e:] to group 2, since group 2 uses only [e:]. In addition, the evidence for group 2 is sparse overall: if rarity is a predictor of stereotyping, over-assignment of [e:] to group 2 is also expected. Five participants took part in this condition; the data consisted of real words.
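The competing predictions for condition 3/6b can be made concrete by computing how often each variant co-occurs with each group in the training data, i.e. the response rates a learner who simply matched input probabilities would reproduce. The sketch below uses only the proportions stated above (80% of test tokens in group 1, split evenly between [ɛ:] and [e:]; 20% in group 2, all [e:]), with 'E' and 'e' standing for [ɛ:] and [e:] as in the figures. Treating such a probability-matching learner as the no-stereotyping baseline is an assumption for illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def p_group_given_variant(shares):
    """Return P(group | variant) from each (group, variant) cell's share of the token pool."""
    totals = {}
    for (group, variant), share in shares.items():
        totals[variant] = totals.get(variant, 0) + share
    return {(g, v): s / totals[v] for (g, v), s in shares.items()}


# Condition 3/6b as described: group 1 holds 80% of the test tokens, evenly
# split between the variants; group 2 holds 20%, all of them [e:].
condition_3_6b = {
    ("group 1", "E"): Fraction(40, 100),
    ("group 1", "e"): Fraction(40, 100),
    ("group 2", "e"): Fraction(20, 100),
}

for (group, variant), p in sorted(p_group_given_variant(condition_3_6b).items()):
    print(f"P({group} | {variant}) = {p}")
```

On these figures a probability-matching learner would assign [e:] tokens to group 2 only a third of the time, even though group 2 used [e:] exclusively, so group 2 response rates for [e:] clearly above 1/3 would indicate over-assignment; for [ɛ:] the matching baseline for group 1 is already 1.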

Condition 3/7
Condition 3/7 was similar to condition 3/6: group 1 used one variant ([ɛ:]) 100% of the time in the training session, whereas group 2 used [ɛ:] 20% of the time and [e:] 80% of the time. The contrast to condition 3/6 therefore lies with group 2. The idea behind this design was to test the first of our hypothetical stereotyping mechanisms as outlined above:

13 This condition originally arose accidentally as a result of a coding error in the experimental design, hence the small number of participants and the seemingly odd labeling. The results were deemed sufficiently interesting to warrant reporting.


A listener perceives a high, yet not categorical, correlation between some linguistic feature and a non-linguistic category (e.g. intervocalic /t/-attenuation among American English speakers), and the weight of the positive evidence eclipses the rare instances of counterevidence. As a result, the counterevidence might be too rare to enter the listener's perceptual representation, and stereotyping occurs. Unlike in condition 3/6, some of the participants were trained on variants embedded in real words, whereas others were trained on the same variants in nonsense words. A total of 10 native speakers of German took part in condition 3/7, 5 of whom completed the real-word version, while the other 5 were trained and tested on the nonsense words.

Condition 3/8
Condition 3/8 trained and tested listeners on a symmetrical variant distribution comparable to the distributions used in conditions 3/1-3/5. Unlike conditions 3/1-3/5, however, the cueing of group membership was based on a much less robust distribution of variants: group 1 used [ɛ:] 60% of the time and [e:] 40% of the time; the inverse held in group 2. Distracter tokens were evenly distributed across the two groups. The idea behind this condition was to test whether the overall strength of the associations between variants and social categories affects learnability at all and, if so, whether the relation between input and output weightings is linear or whether there are cut-off points. Twenty participants were trained and tested in condition 3/8: 10 on variants embedded in real words and another 10 on variants embedded in nonsense words.
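The input distributions of conditions 3/6-3/8 can be expressed as per-group variant weights from which a training list is assembled. The following is a minimal sketch, assuming 80 test tokens per group (the figure given above for the symmetric conditions) and treating the number of distracters as a free parameter, since it is not stated for condition 3/8; splitting distracters exactly evenly is a simplification of "chance level". The function name and the labels 'E' and 'e' for [ɛ:] and [e:] are hypothetical and do not reproduce the actual stimulus files.

```python
import random
from collections import Counter


def build_training_list(weights, tokens_per_group, n_distracters, seed=0):
    """Return a shuffled list of (group, item) training trials.

    `weights` maps each group to {variant: proportion}. Each group's proportions
    must sum to 1 and yield whole token counts. Distracters are assigned to the
    groups alternately, i.e. split evenly.
    """
    trials = []
    for group, variant_weights in weights.items():
        if abs(sum(variant_weights.values()) - 1) > 1e-9:
            raise ValueError(f"proportions for {group} do not sum to 1")
        for variant, proportion in variant_weights.items():
            count = proportion * tokens_per_group
            if abs(count - round(count)) > 1e-9:
                raise ValueError(f"{group}/{variant}: {count} is not a whole number")
            trials.extend([(group, variant)] * round(count))
    groups = list(weights)
    for i in range(n_distracters):
        trials.append((groups[i % len(groups)], "distracter"))
    random.Random(seed).shuffle(trials)  # fixed seed keeps the order reproducible
    return trials


# Condition 3/8: group 1 uses [E:] 60% / [e:] 40%; group 2 the inverse.
condition_3_8 = {
    "group 1": {"E": 0.6, "e": 0.4},
    "group 2": {"E": 0.4, "e": 0.6},
}
trials = build_training_list(condition_3_8, tokens_per_group=80, n_distracters=80)
print(Counter(trials))
```

The same function covers the asymmetric design of condition 3/7 by passing, for example, `{"group 1": {"E": 1.0}, "group 2": {"E": 0.2, "e": 0.8}}`.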

3.4 Results

Condition 3/1
The results of the real word + real variable condition are shown in figure 3.5 below. The histograms show the percentage of tokens (on the y-axis) assigned to each of the two hypothetical groups. Grey bars indicate group 2 responses, while black bars represent group 1 responses. On the x-axis, the data is further broken down by variant: the two bars on the left show the results for the open variant [ɛ:], whereas the centre bars show the results for the close variant [e:]. The distracter tokens are also shown.

[Figure 3.5: Distribution of variants across groups, real word + real variable condition. Y-axis: percentage of responses (0 to 100); x-axis: group 1 and group 2 responses for E, e, and distracters.]

Figure 3.5 – Histograms showing output variants in condition 3/1. For the test tokens, differences in group assignment across variants are significant at the p < 0.001 level as determined by a Wilcoxon test. In addition, both distributions differ significantly from the filler items at the p < 0.001 level as determined by a Wilcoxon test. 'E' represents the open variant [ɛ:], whereas 'e' represents the closer variant [e:].

It can be observed that test subjects in condition 3/1 successfully managed to associate known variants with hypothetical speaker groups. The result is comparable to the 80/20 glottal stop condition in Docherty et al.'s experiments: on aggregate, listeners reproduce the input distribution rather faithfully. In other words, although there is a slight tendency to overshoot the input distribution, listeners do not categorically assign a variant to the group that shows predominant, but not exclusive, usage of that variant.

Word-based effects
Recall that one of the key motivations underlying the design of condition 3/1 was to investigate issues beyond learning at the level of phonetic variants alone: phonetic variation in the input data was distributed differentially across lexical items. Whereas two of the three test words, Häme and Käse, showed non-categorical weighting in the input data, this was not the case for the third test word, Bären, where all [e:] tokens were assigned to group 1 and all [ɛ:] tokens to group 2 in the input data. If listeners keep track of variation across lexical items, we would expect them to show differential output weightings for Bären tokens compared to Häme and Käse tokens. This prediction is not borne out, as figure 3.6 indicates.
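The quantities plotted in figures 3.5 and 3.6 are percentages of group 1 responses per variant, and per word and variant. A minimal sketch of that aggregation is given below, together with a signed deviation from the input weighting so that overshoot or undershoot can be read off directly. The record format, the function names and the toy responses are hypothetical; only the Bären input weighting (all [e:] in group 1, all [ɛ:] in group 2) is taken from the description above.

```python
from collections import defaultdict


def group1_percentages(responses):
    """Percentage of 'group 1' answers per (word, variant).

    `responses` is an iterable of (word, variant, answered_group) triples.
    """
    totals = defaultdict(int)
    group1 = defaultdict(int)
    for word, variant, answer in responses:
        totals[(word, variant)] += 1
        if answer == "group 1":
            group1[(word, variant)] += 1
    return {key: 100 * group1[key] / totals[key] for key in totals}


def deviation_from_input(observed, input_pct):
    """Signed difference in percentage points between output and input weighting."""
    return {key: observed[key] - input_pct[key] for key in observed if key in input_pct}


# Toy responses for illustration only (not experimental data).
toy = [
    ("Bären", "e", "group 1"), ("Bären", "e", "group 1"),
    ("Bären", "e", "group 1"), ("Bären", "e", "group 2"),
    ("Bären", "E", "group 2"), ("Bären", "E", "group 2"),
    ("Bären", "E", "group 1"), ("Bären", "E", "group 2"),
]
# Input weighting for Bären as described: 100% group 1 for [e:], 0% for [E:].
baeren_input = {("Bären", "e"): 100.0, ("Bären", "E"): 0.0}

observed = group1_percentages(toy)
print(observed)
print(deviation_from_input(observed, baeren_input))
```

Note that with a categorical input such as that of Bären, any deviation necessarily points back towards chance, so overshoot can only be assessed for items with non-categorical input weightings such as Häme and Käse.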

[Figure 3.6: Group 1 responses across different lexical items. Y-axis: percentage of group 1 responses (0 to 100); x-axis: E and e for Häme, Käse, and Bären.]

Figure 3.6 – Histograms showing group 1 responses across lexical items. ‘E’ represents the open variant [ɛ:], whereas ‘e’ represents the closer variant [e:].

Although the overall differences across words are relatively minor, some of them are statistically significant as determined by a Wilcoxon test. Specifically, the [ɛ:] variant in the word Käse is significantly more likely to be assigned to group 1 than the same variant in the words Häme and Bären (p […] X% in the test) go in the opposite direction with regard to the complementary variant. In other words, the fact that the correlation is more robust in the DIFFINPUT2 measure than in DIFFINPUT illustrates cross-subject coherence in terms of learning patterns rather than, say, a general bias in favor of one of the two social categories, in which case we would expect this correlation to be weaker than in the DIFFINPUT measure. Finally, reply averages also correlate strongly if we measure the divergence from chance level (DIFFCHANCE, fig. 4.16(c))²¹. Overall, each of the three potential measures yields highly significant correlations between group 2 replies for variant X on the one hand and variant Y on the other. This suggests that, irrespective of which measure we regard as reflecting the learning parameter (and it was argued above that this is to some degree an aesthetic choice), variable-based learning is the strategy learners employ in these types of tasks.

[Figure 4.16(a), DIFFINPUT: rho = 0.24, p < 0.001]

21 The fact that we observe an even stronger correlation in the DIFFCHANCE parameter than in DIFFINPUT2 is probably due to the fact that this measure essentially wires in both learning success and the nature of the underlying input distributions. That is, differential degrees of divergence from chance level are also a function of the different test conditions, since some of the input distributions (e.g. 60/40 vs. 40/60 in condition 3/8) were closer to chance level than those in the other conditions. In a hypothetical scenario where every single participant faithfully reproduces the input distributions in every single test condition, the DIFFINPUT measures would not yield any correlation, since every data point would be located at (0, 0). Correlating data points along the DIFFCHANCE parameter in the same scenario, on the other hand, would yield a correlation, because DIFFCHANCE captures this epiphenomenal correlation. It is not unlikely that the somewhat stronger correlation observed in figure 4.16(c) vis-à-vis figure 4.16(b) is partly due to this factor.


[Figure 4.16(b), DIFFINPUT2: rho = -0.52, p < 0.001]

[Figure 4.16(c), DIFFCHANCE: rho = 0.63, p < 0.001]

Figure 4.16 – Spearman correlation test results along three different parameters: DIFFINPUT (4.16a), DIFFINPUT2 (4.16b), and DIFFCHANCE (4.16c). Each data point represents an individual participant's percentage of group 2 replies for a given variant of the test variable (x-axis) plotted against that same individual's percentage of group 2 replies for the complementary variant (y-axis). The lines represent a locally weighted scatterplot smoother (LOWESS) fit through the data (Cleveland 1979). Correlation coefficients and significance levels are also shown.
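The argument of footnote 21 can be checked with a small simulation of perfectly faithful learners. The sketch below computes Spearman's rho as the Pearson correlation of average ranks (the usual tie-corrected form). The exact definitions of the measures are not given in this section, so it assumes that DIFFINPUT is the signed difference between a participant's group 2 percentage and the input percentage, and that DIFFCHANCE is the absolute distance of that percentage from 50. The input pairs (80/20 and 60/40) are illustrative stand-ins for a robust and a near-chance condition such as 3/8; all names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r on average ranks; nan if either variable is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return math.nan
    return cov / math.sqrt(vx * vy)


# Hypothetical input: % group 2 for variant X and variant Y in two conditions.
inputs = [(80, 20)] * 5 + [(60, 40)] * 5
# Perfectly faithful learners reproduce the input exactly.
outputs = list(inputs)

diffinput_x = [ox - ix for (ox, _), (ix, _) in zip(outputs, inputs)]
diffinput_y = [oy - iy for (_, oy), (_, iy) in zip(outputs, inputs)]
diffchance_x = [abs(ox - 50) for ox, _ in outputs]
diffchance_y = [abs(oy - 50) for _, oy in outputs]

print("DIFFINPUT rho:", spearman_rho(diffinput_x, diffinput_y))
print("DIFFCHANCE rho:", spearman_rho(diffchance_x, diffchance_y))
```

Under these assumptions DIFFINPUT is constant at 0, so no correlation can be computed, while DIFFCHANCE correlates perfectly purely because the conditions differ in their distance from chance. With a signed DIFFCHANCE (percentage minus 50) the same scenario would give rho = -1 instead. The sketch omits significance testing; in practice a library routine such as SciPy's `scipy.stats.spearmanr` returns both the coefficient and a p-value.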

Summary – Factor analysis and correlations
Overall, the data reviewed in the above sections provide strong evidence for the prominent role of the phonological variable as the main predictor of pattern detection in the relevant experiments: the CART analysis discussed above reveals that the factor variable takes precedence over other factors in predicting the patterns in the pooled data. In addition, correlation tests showed that variable-based learning by and large reflects individual response patterns. That is to say, individual participants show comparable response patterns to both variants of any given variable: people who verge towards a more categorical response scheme with regard to one variant also do so with regard to the complementary one, and people who undershoot the input distribution of one variant also undershoot the input distribution of the complementary variant.


References
Adank, P. (2003) Vowel normalisation: A perceptual-acoustic study of Dutch vowels. Doctoral dissertation, Nijmegen.
Adank, P., R. van Hout, and R. Smits (2004) An acoustic description of the vowels of Northern and Southern Standard Dutch. Journal of the Acoustical Society of America 116: 1729-1738.
Altendorf, U. (2003) Estuary English: Levelling at the interface of RP and South-Eastern British English. Narr.
Altendorf, U. and D. Watt (2008) The dialects in the South of England: Phonology. In: B. Kortmann and C. Upton (eds.) Varieties of English: The British Isles. Berlin: Mouton de Gruyter, pp. 194-222.
Auer, P., B. Barden, and B. Grosskopf (1998) Subjective and objective parameters determining 'salience' in long-term dialect accommodation. Journal of Sociolinguistics 2: 163-188.
Baayen, H. (2008) Analyzing linguistic data: A practical introduction to statistics using R. CUP.
Breiman, L., J. Friedman, R. Olshen, and C. Stone (1984) Classification and Regression Trees. New York: Chapman & Hall.
Cleveland, W. S. (1979) Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74: 829-836.
Cosmides, L. and J. Tooby (1996) Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition 58: 1-73.
Disner, S. (1980) Evaluation of vowel normalization procedures. Journal of the Acoustical Society of America 67: 253-261.
Docherty, G. (2007) Speech in its natural habitat: accounting for social factors in phonetic variability. In: J. Cole and J. I. Hualde (eds.) Laboratory Phonology 9. Berlin: Mouton de Gruyter, pp. 1-35.
Docherty, G. and C. Langstrof (2008a) Perceptual evaluation of sociophonetic variability: how do listeners learn? Poster presented at the 11th Conference on Laboratory Phonology, Wellington, July 2008.
Docherty, G. and C. Langstrof (2008b) How is sociophonetic variability learned? Paper presented at the Colloquium of the British Association of Academic Phoneticians, Sheffield, April 2008.
Docherty, G., C. Langstrof, and P. Foulkes (2013) Listener evaluation of sociophonetic variability: probing constraints and capabilities. Linguistics 51(2): 355-380.
Docherty, G. and P. Foulkes (1999) Instrumental phonetics and phonological variation: case studies from Newcastle upon Tyne and Derby. In: P. Foulkes and G. J. Docherty (eds.) Urban Voices: Accent Studies in the British Isles. London: Arnold, pp. 47-71.
Docherty, G. and P. Foulkes (2000) Speaker, speech, and knowledge of sounds. In: N. Burton-Roberts, P. Carr, and G. Docherty (eds.) Phonological knowledge: conceptual and empirical issues. Oxford: Oxford University Press, pp. 105-129.
Docherty, G., P. Foulkes, D. Watt, and J. Tillotson (2006) On the scope of phonological learning: issues arising from socially structured variation. In: L. Goldstein, D. Whalen, and C. Best (eds.) Papers in Laboratory Phonology 8. Berlin: Walter de Gruyter, pp. 393-421.
Docherty, G. and P. Foulkes (2014) An evaluation of usage-based approaches to the modelling of sociophonetic variability. Lingua 142: 42-56.
Drager, K. (2009) A sociophonetic ethnography of Selwyn Girls' High. Unpublished doctoral dissertation, University of Canterbury, NZ.
Drager, K. (2010) Sensitivity to grammatical and sociophonetic variability in perception. Laboratory Phonology 1: 93-120.
Drummond, R. (2010) Speaking like the locals – the acquisition of two local variants in the spoken English of native Polish speakers living in Manchester. In: K. Dziubalska-Kołaczyk, M. Wrembel, and M. Kul (eds.) New Sounds 2010: Proceedings of the 6th International Symposium on the Acquisition of Second Language Speech, pp. 106-112.
Drummond, R. (2011) Glottal variation in /t/ in non-native English speech: patterns of acquisition. English World-Wide 32(3): 280-308.
Drummond, R. (2012) Aspects of identity in a second language: ING variation in the speech of Polish migrants living in Manchester, UK. Language Variation and Change 24: 107-133.
Drummond, R. (2013) The Manchester Polish STRUT: Dialect acquisition in a second language. Journal of English Linguistics 41(1): 65-93.
Eckert, P. (2008) Variation and the indexical field. Journal of Sociolinguistics 12: 453-476.
Elmentaler, M., J. Gessinger, and J. Wirrer (2010) Qualitative und quantitative Verfahren in der Ethnodialektologie am Beispiel von Salienz. In: C. Anders, M. Hundt, and A. Lasch (eds.) Perceptual dialectology – Neue Wege der Dialektologie. Berlin/New York, pp. 111-149.
Fant, G. (1960) Acoustic Theory of Speech Production. The Hague: Mouton.
Foulkes, P. (2010) Exploring social-indexical knowledge: a long past but a short history. Laboratory Phonology 1(1): 5-39.
Foulkes, P. and G. J. Docherty (2006) The social life of phonetics and phonology. Journal of Phonetics 34(4): 409-438.
Foulkes, P., G. Docherty, and D. Watt (2005) Phonological variation in child-directed speech. Language 81: 177-206.
Foulkes, P., J. Scobbie, and D. Watt (2010) Sociophonetics. In: W. Hardcastle and J. Laver (eds.) Handbook of Phonetic Sciences, 2nd edition. Oxford: Blackwell.
Gallistel, C. R. (1990) The organization of learning. Cambridge, MA: MIT Press.
Gallistel, C. R. (2005) Deconstructing the law of effect. Games and Economic Behavior 52(2): 410-423.
Gerstman, L. (1968) Classification of self-normalized vowels. IEEE Transactions on Audio and Electroacoustics AU-16: 78-80.
Gigerenzer, G. (2000) Adaptive thinking: Rationality in the real world. New York: Oxford University Press.
Goldinger, S. (1997) Words and voices: perception and production in an episodic lexicon. In: K. Johnson and J. Mullennix (eds.) Talker Variability in Speech Processing. San Diego: Academic Press, pp. 33-66.
Goldinger, S. (2000) The role of perceptual episodes in lexical processing. Keynote address. Published in Proceedings of the Workshop on Spoken Word Access Processes. Nijmegen: Max Planck Institute for Psycholinguistics, pp. 155-158.
Gordon, E., L. Campbell, J. Hay, M. Maclagan, A. Sudbury, and P. Trudgill (2004) New Zealand English: Its origins and evolution. CUP.
Gordon, M. (2001) Small-town values and big-city vowels: A study of the Northern Cities Shift in Michigan. Durham: Duke University Press.
Gordon, M. (2002) Investigating chain shifts and mergers. In: J. K. Chambers et al. (eds.) The Handbook of Language Variation and Change. Blackwell.
Harper, D. (1982) Competitive foraging in mallards: 'ideal free' ducks. Animal Behaviour 30(2): 575-584.
Harrington, J. (2006) An acoustic analysis of 'happy-tensing' in the Queen's Christmas broadcasts. Journal of Phonetics 34: 439-457.
Harrington, J., S. Palethorpe, and C. Watson (2000a) Does the Queen still speak the Queen's English? Nature 407: 927-928.
Harrington, J., S. Palethorpe, and C. Watson (2000b) Monophthongal vowel changes in Received Pronunciation: an acoustic analysis of the Queen's Christmas broadcasts. Journal of the International Phonetic Association 30: 63-78.
Harrington, J., S. Palethorpe, and C. Watson (2005) Deepening or lessening the divide between diphthongs? An analysis of the Queen's annual Christmas broadcasts. In: W. Hardcastle and J. Beck (eds.) A figure of speech: A Festschrift for John Laver, pp. 227-261.
Hay, J. (2013) Word-memory and regular sound change. Plenary held at the 7th International Conference on Language Variation in Europe (ICLaVE 7), Trondheim, June 2013.
Hay, J., A. Nolan, and K. Drager (2006) From fush to feesh: Exemplar priming in speech perception. The Linguistic Review 23: 351-379.
Hay, J., P. Warren, and K. Drager (2006) Factors influencing speech perception in the context of a merger-in-progress. Journal of Phonetics 34: 458-484.
Horvath, B. (1985) Variation in Australian English: The sociolects of Sydney. CUP.
Hudson, R. (1995) Syntax and sociolinguistics. In: J. Jacobs, A. von Stechow, W. Sternefeld, and T. Vennemann (eds.) Syntax: An international handbook of contemporary research. Berlin: Walter de Gruyter, pp. 1514-1528.
Iverson, G. K. and J. Salmons (1995) Aspiration and laryngeal representation in Germanic. Phonology 12: 369-396.
Johnson, K. (1997) Speech perception without speaker normalization: An exemplar model. In: K. Johnson and J. Mullennix (eds.) Talker Variability in Speech Processing. San Diego: Academic Press, pp. 145-165.
Johnson, K. (2011) Acoustic and Auditory Phonetics. Blackwell.
Kerswill, P. and A. Williams (2002) 'Salience' as an explanatory factor in language change: evidence from dialect levelling in urban England. Contributions to the Sociology of Language 86, pp. 81-110.
Khattab, G. (2007) Variation in vowel production by English-Arabic bilinguals. In: J. Cole and J. I. Hualde (eds.) Laboratory Phonology 9. Berlin: Mouton de Gruyter, pp. 383-410.
Kingston, J. and R. Diehl (1994) Phonetic knowledge. Language 70: 419-454.
Kortmann, B. and C. Langstrof (2012) Regional varieties of British English. In: A. Bergs and L. Brinton (eds.) Historical Linguistics of English: An international handbook. Berlin/New York: Mouton de Gruyter.
Kraljic, T., S. Brennan, and A. Samuel (2008) Accommodating variation: dialects, idiolects and speech processing. Cognition 107: 54-81.
Labov, W. (1972a) Negative attraction and negative concord in English grammar. Language 48: 773-818.
Labov, W. (1972b) Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.
Labov, W. (1994) Principles of linguistic change, Vol. 1: Internal factors. Blackwell.
Labov, W. (2001) Principles of linguistic change, Vol. 2: Social factors. Blackwell.
Labov, W. (2010) Principles of linguistic change, Vol. 3: Cognitive and cultural factors. Blackwell.
Labov, W., M. Yaeger, and R. Steiner (1972) A quantitative study of sound change in progress. Philadelphia: U.S. Regional Survey.
Langstrof, C. (2005) When did we stop d[ae]ncing? Paper presented at the Victoria University postgraduate conference, Wellington, August 2005.
Langstrof, C. (2006) Vowel change in New Zealand English – Patterns and implications. Unpublished PhD thesis, University of Canterbury.
Lass, R. (1976) English Phonology and Phonological Theory: Synchronic and Diachronic Studies. CUP.
Lenneberg, E. (1967) The biological foundations of language. Wiley.
Lenz, A. (2010) Zum Salienzbegriff und zum Nachweis salienter Merkmale. In: C. Anders, M. Hundt, and A. Lasch (eds.) Perceptual dialectology – Neue Wege der Dialektologie. Berlin/New York, pp. 89-110.
Lisker, L. (1986) 'Voicing' in English: A catalogue of acoustic features signaling /b/ versus /p/ in trochees. Language and Speech 29: 3-11.
Lobanov, B. (1971) Classification of Russian vowels spoken by different speakers. Journal of the Acoustical Society of America 49: 606-608.
Mendoza-Denton, N., J. Hay, and S. Jannedy (2003) Probabilistic sociolinguistics: Beyond variable rules. In: R. Bod, J. Hay, and S. Jannedy (eds.) Probabilistic Linguistics. Cambridge, MA: MIT Press.
Munson, B. (2010) Levels of phonological abstraction and knowledge of socially motivated speech-sound variation: a review, a proposal, and a commentary on the papers by Clopper, Pierrehumbert, and Tamati; Drager; Foulkes; Mack; and Smith, Hall, and Munson. Laboratory Phonology 1: 157-177.
Nearey, T. M. (1978) Phonetic feature systems for vowels. Indiana University Linguistics Club.
Nearey, T. M. (1989) Static, dynamic, and relational properties in vowel perception. Journal of the Acoustical Society of America 85: 2088-2113.
Niedzielski, N. (1999) The effect of social information on the perception of sociolinguistic variables. Journal of Language and Social Psychology 18: 62-85.
Nittrouer, S. (2004) The role of temporal and dynamic signal components in the perception of syllable-final stop voicing by children and adults. Journal of the Acoustical Society of America 115: 1777-1790.
Nosofsky, R. (1986) Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General 115(1): 39-57.
Peterson, G. E. and H. L. Barney (1952) Control methods used in a study of the vowels. Journal of the Acoustical Society of America 24: 175-184.
Pierrehumbert, J. (2001) Exemplar dynamics: Word frequency, lenition, and contrast. In: J. Bybee and P. Hopper (eds.) Frequency and the emergence of linguistic structure. John Benjamins, pp. 137-157.
Pierrehumbert, J. (2002) Word-specific phonetics. In: Laboratory Phonology 7. Berlin: Mouton de Gruyter, pp. 101-139.
Pierrehumbert, J. (2003) Phonetic diversity, statistical learning, and acquisition of phonology. Language and Speech 46: 115-154.
Pierrehumbert, J. (2006) The next toolkit. Journal of Phonetics 34(6): 516-530.
Rácz, P. (2013) Salience in sociolinguistics. De Gruyter.
Recasens, D. (1996) An articulatory and perceptual account of vocalization and elision of dark /l/ in the Romance languages. Language and Speech 39: 63-89.
Recasens, D., J. Fontdevila, and M. D. Pallarès (1995) Velarization degree and coarticulatory resistance for /l/ in Catalan and German. Journal of Phonetics 23: 37-52.
Rosewarne, D. (1984) Estuary English. Times Educational Supplement, 19 October 1984.
Schirmunski, V. (1930) Sprachgeschichte und Siedlungsmundarten. Winter.
Stevens, K. (2000) Acoustic Phonetics. MIT Press.
Thomas, E. (2011) Sociophonetics – An introduction. Palgrave.
Tollfree, L. (1999) South East London English: Discrete vs. continuous modeling of consonantal reduction. In: P. Foulkes and G. Docherty (eds.) Urban Voices. London: Arnold, pp. 163-184.
Trager, G. and H. Smith (1957) An outline of English structure. Washington, D.C.: American Council of Learned Societies.
Trudgill, P. (1986) Dialects in contact. Blackwell.
Trudgill, P. (2001) Sociolinguistic variation and change. Edinburgh University Press.
Trudgill, P. (2004) New dialect formation: The inevitability of colonial Englishes. Oxford University Press.
Vulkan, N. (2000) An economist's perspective on probability matching. Journal of Economic Surveys 14(1): 101-118.
Walker, A. and J. Hay (2011) Congruence between 'word age' and 'voice age' facilitates lexical access. Laboratory Phonology 2(1): 219-237.
Weber, T. P. (1998) News from the realm of the ideal free distribution. Trends in Ecology & Evolution 13(3): 89-90.
Wedel, A. (2006) Exemplar models, evolution and language change. The Linguistic Review 23: 247-274.
Wells, J. (1982) Accents of English. CUP.
Williams, A. and P. Kerswill (1999) Dialect levelling: change and continuity in Milton Keynes, Reading and Hull. In: P. Foulkes and G. Docherty (eds.) Urban Voices: Accent Studies in the British Isles. London: Arnold, pp. 141-162.
Yaeger-Dror, M. (1996) Phonetic evidence for the evolution of lexical classes: The case of a Montreal French vowel shift. In: G. Guy, C. Feagin, J. Baugh, and D. Schiffrin (eds.) Towards a Social Science of Language. Philadelphia: Benjamins, pp. 263-287.