Review of "Corpora and Language Education" by L ...

Book reviews

Flowerdew, L. 2012. Corpora and Language Education. Basingstoke: Palgrave Macmillan. (xv + 347 pp.)

Despite the focus on language education in the title, Flowerdew’s volume provides an excellent overview of the many faces of corpus linguistics (CL) for any interested researcher or student. The volume is part of the textbook series Research and Practice in Applied Linguistics, which is aimed at “students and researchers in Applied Linguistics, TESOL, Language Education and related areas, and language professionals keen to extend their research experience” (p. xiv); it assumes some knowledge of linguistics on the part of its readers. By presenting chapters which interweave theoretical issues stemming from years of research and incisive accounts of particular case studies and research projects, this volume certainly achieves the goal of showing the reader how CL research and practice interact and how each contributes to the growth and development of the other. Given the pedagogical nature of the book, its evaluation will be focused on its merits as a textbook, though I have not yet had the opportunity to use it as such with my students.

The book is in four parts: Key concepts and approaches; The nexus of corpus linguistics, textlinguistics and sociolinguistics; Applications of corpora in research and teaching arenas; Resources. Each chapter is characterized by a number of conventions shared with other texts in the series, including: a clear statement of the aims of the chapter in bullet point form; concepts, quotes and examples “boxed off” from the text for emphasis; and brief annotated suggestions for further reading. These boxed off sections are to my mind one of the most salient and interesting features of the textbook, so it is worth considering their function briefly before moving on to the discussion of the wider contents.

The most prominent among these boxed off sections are the ‘Concepts’. These are extracted and adapted from the literature or authored by Flowerdew herself.
As the name suggests, they refer to basic aspects of CL and related domains, which the reader ought to be acquainted with in order to better understand the topics discussed in a given section. Examples include “Criteria for defining a corpus” (Concept 1.1), phraseology, the distinction between competence and performance, the probabilistic vs. the neo-Firthian approach, frame semantics, Hyland’s interactional level of metadiscourse, dialect, world Englishes, corpus stylistics, vague language in medical interaction, and “Sketch Engine search facilities”. As we can see, the topics range widely, from central, general concepts to rather specific – sometimes author-specific – items. I agree with the need to ensure that the key concepts of the discipline are clearly and concisely set out for the reader, especially given the diversity of backgrounds students using the textbook might have, and indeed Flowerdew does this very well. I am, however, unsure about the inclusion in the Concepts family of the more article-specific items (e.g. “A local grammar for the genre of legal essays” or “Experiments to test whether word recognition is sensitive to collocational frequency and semantic prosody”). It is not clear to me how these are Concepts on a par with dialect or competence vs. performance and I wonder whether the less confident student might find it difficult to interpret the material in the intended manner.

International Journal of Corpus Linguistics 19:1 (2014), 147–155. doi 10.1075/ijcl.19.1.06fel ISSN 1384–6655 / e-issn 1569–9811 © John Benjamins Publishing Company

Another element which appears in boxed off sections is the ‘Quotes’. These are direct quotations, usually around a paragraph in length, from a wide range of research literature. My understanding is that they appear when the material quoted is useful and relevant to the text, and would not benefit from being paraphrased. At first it might seem odd to the reader to see these numbered and boxed off from the text, rather than in the normal flow of text. However, it is probably helpful for students to have the authorship of a particular idea set out so explicitly and clearly, and it should enable them to attribute it correctly in their own work (rather than, say, to Flowerdew herself).

Finally, we also find ‘Examples’ as boxed off elements. The Examples, also taken from published research, tend to include specific instances of patterns, collocations, and other similar phenomena, sometimes in the form of tables (e.g. “Verbs occurring in genre moves in law cases” or “Functional clusters and literary criticism”), sometimes in a more discursive form which includes discussion by the original author (e.g. “Disciplinary differences in stance nouns” or “Mauranen on strategic and epistemic hedges in MICASE”).
These appear in sections where Flowerdew explains a particular type of research approach or case study. Sometimes, again, the difference between Example and Concept is not very clear, but I assume the distinction lies in the fact that Concepts are of relevance to the wider discussion, while Examples are included as needed to illustrate specific points about the range of research carried out in CL.

The emphasis on discussing a wide variety of practical applications of CL is, in my opinion, one of the strongest points of this book. Rather than just discussing hypothetical possibilities and directions for research, it constantly refers to real studies covering most available techniques, methods, and sub-disciplines. This ought to turn even beginner students into enthusiasts eager to start delving into a corpus.

The book’s eight main chapters follow a natural order which starts with essential background information such as defining a corpus and the different approaches in the field, moves through to CL’s relationship with disciplines such as discourse analysis and sociolinguistics, and ends with a detailed discussion of its application in a rich set of research and teaching contexts.

Chapter 1 introduces the reader to the definitions of ‘corpora’, their purposes, and applications. The first paragraph distils a clear and, in my view, uncontroversial definition which sums up all the main criteria: “authentic, naturally occurring data; assembled according to explicit design criteria; representative of a particular language or genre; designed for a specific linguistic or socio-pragmatic purpose” (p. 3). The only – very minor – quibble I have with this definition is the use of the term ‘socio-pragmatic’, which is presented with no explanation and is not in the Glossary. This is not a term which is immediately transparent to all students of linguistics, and might cause confusion and misunderstanding in what is otherwise a commendably clear definition. The thorny issues of corpus size, design criteria, representativeness, and whether the web is a suitable source of corpus material are also addressed. Flowerdew cites Kilgarriff & Grefenstette’s (2003) question that, in deciding what makes a good source for a corpus, one should ask “Is corpus x good for task y?” (p. 8). This is quoted almost without comment but, it seems to me, it is a good rule of thumb to follow not just for evaluating the web as corpus but any data source, and a rule that could be emphasised more for budding corpus linguists. The chapter also introduces the phenomena that can be studied by looking at corpora: phraseology, word frequency, salience, collocations, semantic prosody, and lexico-grammatical patterns, among others. Oddly, the choices focused on here leave the reader with the impression that corpora are just used to look at the behaviour of words and phrases rather than anything more complex, despite the fact that the rest of the book goes on to showcase a much richer variety of research.
The chapter concludes with a section discussing the main limitations of corpora as identified by the research literature: these relate to size, context, software tools, and interpretation. Each objection is picked apart and counteracted effectively. It is salutary to have this section, however, as it presents students with an honest appraisal of the field, and raises their awareness of possible pitfalls to be wary of.

Chapter 2 briefly gives the “historical and conceptual background of corpus linguistics”. It describes early attempts at collecting corpora of language (with honourable mentions for Samuel Johnson and Jonathan Swift), before moving on to what happened “AD”, after the advent of digitisation. There is also an overview of the debates over Chomskyan linguistics vs. corpus linguistics and the competence/performance distinction; this is presented in a rather more balanced and conciliatory manner than one might sometimes encounter. After all, as Flowerdew notes reassuringly, “not all theoretical linguists are as dismissive of corpus linguistics as Chomsky” (p. 45). More interesting, in my view, is the second part of the chapter, which discusses grammaticality vs. acceptability, namely how “the use of invented, artificial language and naturally occurring data can lead to quite different conclusions on what is considered grammatical or acceptable” (p. 48). This, and the role of native speaker judgements in this domain, are wisely presented not as issues with a clear “right” answer but as something that arises in the course of research, to which scholars need to be alert, particularly when dealing with genres where more creative uses of language abound. Reflecting on the problem of the “messy” nature of real-world language is especially relevant to the book’s audience of language practitioners and instructors, though they might not find it a particularly comforting thought. Overall, this chapter fulfils a useful role in informing students of the debates surrounding CL, especially its relationship to theoretical linguistics, which can be important if they have different linguistics backgrounds. There is a small risk that, being a chapter dense with information rather than concepts which it is easier to relate to, it might deter students who are less confident with theoretical matters.

Chapter 3 describes the “five main schools of corpus-based approaches to linguistic analysis” (p. 53), namely the Neo-Firthian, probabilistic, systemic-functional grammar, multi-dimensional, and sociolinguistic approaches. The first two are compared to each other, introducing the distinction between corpus-driven and corpus-based approaches, as well as different methods of defining and identifying phraseological units. Flowerdew presents arguments in favour of and against each position in a clear and balanced manner, rather than steering the reader towards a particular position. Systemic functional grammar is shown to be in some ways complementary to the phraseological approach to CL, albeit by using the corpus in very different ways – as a source of language descriptions in the former, and as a test bed for the grammar in the latter (p. 68).
The multidimensional approach is represented by Biber’s work (1988 and later), while the sociolinguistic approach is embodied by the Nottingham School’s research on creating and analysing corpora of spoken language (in particular CANCODE, cf. for example McCarthy 1998). This latter approach stands out in showing readers a different side of CL, one which focuses not just on phraseology or grammar but also on socially oriented issues such as contexts of interaction, how speakers relate to each other, formulate speech acts, and so on. Like the previous chapter, this too is very dense, though it is of course necessary and appropriate to present different approaches to the discipline and evaluate their strengths and weaknesses. What might have helped the student reader is some context for the chapter and how it relates to what comes before and after it, for example explaining that the research described in the remainder of the text will tend to adopt one of the approaches discussed. This would have provided some guidance to what can otherwise be a daunting amount of information.

In Part II, Flowerdew discusses how CL relates to discourse analysis (DA, Chapter 4) and sociolinguistics (Chapter 5). The topic of Chapter 4 is a particularly interesting question as, to the layperson, it might seem that there is little difference between the two disciplines as both involve looking at the language used in one or more texts. The chapter also revisits the issue of whether CL is a theory, a methodology, or an approach. As elsewhere in the book, Flowerdew presents a balanced view of the debates, with a range of perspectives, before concluding that “[i]t is probably best regarded, in essence, as a methodology along the continuum (rather than divide) of the corpus-driven vs. corpus-based approaches” (p. 83) and CL is not “a theory in itself” (p. 83). The rest of the chapter explains the differences between DA and CL, the key point being that DA is more focused on “unfolding discourse as process and social action” (p. 84) while CL sees the text as a “product” (p. 84). There is, however, a detailed discussion of how corpus-based studies can inform DA research in areas such as genre, prosody and discourse features, rhetorical structure, and computer mediated communication. It is a very good introduction to the many areas of research in DA, and clearly answers the question posed by the chapter title by showing that, despite differing perspectives, there is a lot of synergy between the two disciplines.

Chapter 5 explains that the relationship between CL and sociolinguistics needs clarifying because corpus-based research often draws on sociolinguistic approaches, both in interactional and variational studies. The two approaches are defined and exemplified, with the interactional paradigm represented by studies on spoken interaction in the workplace and on the radio, and the variationist paradigm by studies on dialect and different varieties of English (New Zealand, Australian, Hong Kong, and so on). Notably, there is also a short section on variation in corpora of languages other than English, one of the few places in the book where this occurs.
As one of the target audiences of this book is TESOL students, the emphasis on English throughout is justified, but it is helpful to have occasional reminders for the reader (and not just the student!) that not all CL research is focused on one language. The chapter also includes a very useful discussion of the limitations of this type of research, in particular with regard to the availability or otherwise of the relevant metadata regarding context and speaker demographics. This is particularly valuable for the budding researcher, who should be aware that there are many factors both external and internal to the data that can affect the strength of one’s research findings.

Part III contains the three chapters which are, in my opinion, the most exciting and useful for the reader, especially the more practical-minded one. Chapter 6, “Applying CL in research arenas”, is a very long overview of how CL has been applied in the areas of English as Lingua Franca, professional communication, forensic linguistics, corpus stylistics, translation, learner language, lexicography, and testing. This chapter follows the pattern established in the rest of the volume of combining the discussion of key theoretical concepts with summaries and results of related research. It should certainly interest readers by showing the variety of research in CL, and, with its wide range of topics, it offers something of relevance to almost all language practitioners and research students. The section on corpus stylistics, for instance (covering examples from Shakespeare to Sylvia Plath), can be used with students who have a background in literature as well as linguistics, as a more accessible entry route into CL research and methods. Similarly, the section on translation studies research will appeal to those intending to go on to professional practice in this area.

Despite my enthusiasm for this chapter, there are some minor issues and omissions to note. For example, the otherwise rich and insightful section on English as a Lingua Franca does not include any examples from research articles, or extracts from the corpora mentioned, unlike almost all other sections in the book. A more perplexing omission is in the section on research in business and healthcare communication, which contains several interesting examples of how speakers interact in these contexts, perform relational talk, and are involved in exchanges where one’s face or sense of identity might be at stake: there is no reference to any frameworks from politeness theory – Spencer-Oatey’s (2000) rapport management springs to mind as perhaps one of the most appropriate for this type of data. Turning to the sections on learner corpora, second language acquisition research, and language testing, I am uncomfortable with the way Flowerdew remarks almost in passing that “learner corpora are usually tagged for errors” (p. 169) without expanding on this further. As error tagging of learner corpora (whether manual or automated) is a significant undertaking, fraught with conceptual and methodological pitfalls, it is a bit surprising that this observation has not been developed further. The reference provided (Dagneaux et al. 1998) could be supported by James (1998) and Nicholls (2003).
Also, unusually for this book (where cross-referencing is quite rich), the criteria for learner corpus compilation very clearly set out in Concept 1.2 (p. 5) are not recalled or cross-referenced in this section, nor is there any other mention of the issue of corpus design and comparability of results.

Chapter 7, “Applying CL in teaching arenas”, is almost as long as the previous chapter and focuses on how corpora can be used directly and indirectly in teaching. After warning the reader of the potential dangers in using corpora in teaching without considering their limitations, the actual goals of teachers and learners, and the relationship between salience and frequency, the chapter goes on to describe indirect applications of corpora in this domain, i.e. how they have informed the creation of grammars and textbooks. Flowerdew observes that few English Language Teaching (ELT) publishers seem to be producing corpus-based ELT textbooks (p. 195), though many publishers are actively pursuing the creation of ever-growing corpora to use in materials development, so this might change in years to come. As for direct applications of CL in teaching, Flowerdew discusses how data-driven learning can be used in a wide range of domains, not just “traditional” language learning but also academic writing and ESP. Although many of the studies discussed are quite specific, they are a good starting point for the budding teacher to reflect on what can be applied in his or her own classroom. A further topic that could be discussed here is the growing use of mobile phones and tablets in accessing resources, both through websites and apps, as described for example in Aarts et al. (2012).

One of the features of this book is its balance in discussing both pros and cons of issues and methodologies, and this chapter is no exception. There is a section on the potential impediments to data-driven learning (DDL), in which the author notes that “most of the initiatives for integrating corpora into language learning have mainly remained at the institutional level and not filtered through the language teaching community at large” (p. 203). This is perhaps unsurprising given that often the needs of a particular learner community are quite specific and it can be hard to devise projects that are universally applicable, though at least a general awareness of the potential of CL in the wider teaching community would be desirable. Flowerdew also notes that further impediments to the wider use of DDL in teaching include unfamiliarity with tools and corpora interfaces, unsuitability of the corpora themselves, and learners lacking knowledge about how to use them. However, it is possible that newer generations of teachers and students, accustomed to technology from an early age, will find some of these practical and technical issues less daunting.

Finally, Chapter 8, “Research cases”, is where the book really shines, in my opinion.
The chapter presents ten different case studies from the literature, covering topics including spoken native and learner language, politeness, academic writing, collocations, discourse studies, legal language, and syllabus development, among others. For each case study, Flowerdew provides a summary, its aims, a description of the corpora and methods used, a summary of the results and analysis, a commentary, and suggestions for future research. This is a great resource for students and young researchers, in particular the commentary. It is very helpful in offering training in how to critically approach a research article, and what questions to ask oneself in doing so. Examples of this kind of analysis should be shown to all students at the beginning of their studies.

The book concludes with Chapter 9, “Key sources”, which includes sections on books (all the main well-known textbooks, except for McEnery & Hardie (2012), I assume because it was not yet published at the time this list was compiled), a wide list of edited collections, handbooks, and journals, both those explicitly focused on CL (e.g. the present one, or Corpora) and those on related topics. As for digital resources, we find the main CL conferences, associations, and special interest groups, with URLs (though there seems to be no mention of the biennial, UK-based Corpus Linguistics conference – which admittedly does not have a permanent URL), websites for compendiums of resources and for corpus analysis tools (to which I would add the resources offered by Mark Davies at http://corpus.byu.edu), and relevant email discussion lists such as Corpora List and Linguist List, for which I would have provided a URL rather than an email address, since one needs to visit the website to join the list and browse its archives. A notable omission from this chapter is a list of well-known corpora, though I admit this can be an unwieldy undertaking; however, a reference to Xiao (2008) or his Corpus Survey webpages (http://www.lancs.ac.uk/fass/projects/corpus/cbls/corpora.asp) might have usefully complemented the other information provided.

Overall, I am a great admirer of this textbook, and none of the issues highlighted above seriously detract from its usefulness. It is quite user-friendly, though I have sometimes wondered whether the reader would benefit from some introductory or guiding remarks at the start, explaining how the different types of chapters relate to and support each other and the student’s own work. For example, when debates such as the corpus-based vs. corpus-driven distinction are discussed, it might not always be clear to the student why it is necessary to have the discussion at all. On the other hand, one might also argue that we should allow students to draw these conclusions on their own.

Another usability issue regards the fact that often corpora are introduced in the text for the first time with no explanation, and sometimes even without their full names, for example LOB (p. 10), the BNC (p. 12), the BoE (p. 19). Some of these are explained more fully in later chapters (though not always cross-referenced), and by reading the whole book one eventually gains a pretty good understanding of a corpus’s genesis and contents, but in a book of this sort, key information about the main and reference corpora could be presented more clearly, perhaps in one of the concept boxes that are integral to the text.
Regarding the content, it should be clear by now that I find it very comprehensive, and wide-ranging enough to appeal to almost any reader. The only area of linguistics that could have been more explicitly discussed is pragmatics, in relation to the growing field of corpus pragmatics (both historical and contemporary; cf. Romero-Trillo (2013) and Jucker & Taavitsainen (2010) among others). Some relevant bibliography is included in the “Key sources” section, and there is reference throughout the text to specific studies on pragmatics topics such as discourse markers or politeness in business letters, but there is no more general overview of the potential of CL research for pragmatics, unlike for many other areas of linguistics. As noted above, I have not yet been able to use this volume with my students as part of their course, but I look forward to doing so in the coming year. It is an invaluable introduction to CL in many areas of language use and study, and a useful reminder for more experienced researchers of the variety of directions our chosen discipline can take us.



References

Aarts, B., Clayton, D. & Wallis, S. 2012. “Bridging the grammar gap: Teaching English grammar to the iPhone generation”. English Today, 28 (1), 3–8. DOI: 10.1017/S0266078411000599
Biber, D. 1988. Variation Across Speech and Writing. Cambridge: Cambridge University Press. DOI: 10.1017/CBO9780511621024
Dagneaux, E., Denness, S. & Granger, S. 1998. “Computer-aided error analysis”. System, 26 (2), 163–174. DOI: 10.1016/S0346-251X(98)00001-3
James, C. 1998. Errors in Language Learning and Use: Exploring Error Analysis. London: Longman.
Jucker, A. & Taavitsainen, I. (Eds.) 2010. Handbook of Historical Pragmatics. Berlin: Mouton de Gruyter. DOI: 10.1515/9783110214284
Kilgarriff, A. & Grefenstette, G. 2003. “Introduction to the Special Issue on Web as Corpus”. Computational Linguistics, 29 (3), 333–347. DOI: 10.1162/089120103322711569
McCarthy, M. 1998. Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.
McEnery, T. & Hardie, A. 2012. Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.
Nicholls, D. 2003. “The Cambridge Learner Corpus: Error coding and analysis for lexicography and ELT”. In D. Archer, P. Rayson, A. Wilson & T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003 Conference, 572–581. Lancaster: UCREL, Lancaster University.
Romero-Trillo, J. (Ed.) 2013. The Yearbook of Corpus Linguistics and Pragmatics: New Domains and Methodologies. Dordrecht: Springer.
Spencer-Oatey, H. 2000. Culturally Speaking: Managing Rapport through Talk across Cultures. London: Continuum.
Xiao, R. 2008. “Well-known and influential corpora”. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics: An International Handbook. Berlin: Mouton de Gruyter, 383–457.

Reviewed by Rachele De Felice, University College London, UK