The Interactions Between the Effects of Implicit and Explicit Feedback and Individual Differences in Language Analytic Ability and Working Memory

SHAOFENG LI
University of Auckland
Department of Applied Language Studies and Linguistics
Auckland 1142
New Zealand
Email: [email protected]

This study investigated the interactions between two types of feedback (implicit vs. explicit) and two aptitude components (language analytic ability and working memory) in second language Chinese learning. Seventy-eight L2 Chinese learners from two large U.S. universities were assigned to three dyadic NS–NNS interaction conditions and received implicit (recasts), explicit (metalinguistic correction), or no feedback (control) in response to their non-target-like oral production of Chinese classifiers. The treatment effects were measured by a grammaticality judgment test and an elicited imitation test. The Words in Sentences subtest of the MLAT was used to measure language analytic ability; a listening span test was utilized as the measure of working memory. A principal components analysis and a structural equation modeling analysis established that working memory was an aptitude component. Multiple regression analyses showed that language analytic ability was predictive of the effects of implicit feedback, and working memory mediated the effects of explicit feedback; all the statistically significant results involved delayed posttest scores. Interpretations were sought with recourse to the mechanisms of the cognitive constructs and the processing demands imposed by the different learning conditions.

Keywords: corrective feedback; aptitude; language analytic ability; working memory

CORRECTIVE FEEDBACK HAS BEEN A MAJOR theme in recent SLA research because of the important place it occupies in pedagogy and theory construction. Practitioners are faced with the conundrums of whether learners’ errors should be responded to and if so, when and how feedback should be provided to achieve optimal instructional effects. Researchers are interested in the cognitive and social dimensions of feedback as a theoretical construct that facilitates or impedes interlanguage development

The Modern Language Journal, 97, 3, (2013) DOI: 10.1111/j.1540-4781.2013.12030.x 0026-7902/13/634–654 $1.50/0 © 2013 The Modern Language Journal

(see Krashen, 1981). Empirical investigation into corrective feedback has been fruitful in probing the principles and processes of SLA and providing valuable pedagogical implications. However, there has been scant attention to the role played by individual differences in learners’ processing of corrective feedback and the learning that results. Language aptitude is an extensively studied individual difference variable. However, most aptitude research has been predictive in nature, and the primary objective of aptitude testing has been to determine learners’ potential to achieve ultimate L2 success. The results of aptitude tests have been mainly used to select elite learners for language courses or programs (which are often government-funded), diagnose learning disabilities, or place students into classes of appropriate

levels (Carroll & Sapon, 2002). Recently, it has been proposed that the boundaries of aptitude research should be expanded to examine how aptitude, or rather different configurations of aptitude components, interacts with different learning conditions (Robinson, 2005) or different stages of L2 development (Skehan, 2012). Research on the interaction between instructional treatment and language aptitude can potentially provide insights into the mechanisms of SLA and suggest pedagogical implications. Unfortunately, to date there has been a dearth of such research. As Spada (2011) observed, “There is a clear need for more research exploring relationships between aptitude (and other individual differences), type of instructional approach and SLA” (p. 232). This study is undertaken to investigate the interactions between the effects of two types of feedback (recasts and metalinguistic correction) and learners’ aptitude differences in language analytic ability and working memory in the learning of Chinese classifiers.

RECASTS AND METALINGUISTIC FEEDBACK

A recast is a reformulation of a non-target-like L2 utterance. Among all the identified corrective strategies, recasts are probably the most studied. Recasts have been found to be the most frequent feedback type in all instructional settings; their popularity is most likely due to their contingency, nonintrusiveness, and affordance of both positive and negative evidence. These characteristics make recasts an ideal form-focusing strategy in meaning-oriented communication. Lyster and Mori (2006) contended that recasts are useful not only as a form-focusing device but also as a scaffolding tool when the content or knowledge required for the maintenance of the ongoing interaction is beyond the learner’s capacity. Research shows mixed results regarding the effects of recasts.
In general, recasts have been shown to be effective in laboratory studies (Egi, 2007; Ishida, 2004; Iwashita, 2003; Leeman, 2003; Lyster & Izquierdo, 2009; Mackey & Philp, 1998; McDonough, 2007; Sagarra, 2007). These studies are typically carried out in dyadic interaction (or via the computer) where learners receive intensive recasts on a single structure. Methodological features such as the lab setting, provision of feedback on a one-on-one basis, and targeting a single structure might make recasts relatively salient and therefore beneficial to L2 development. This speculation has been confirmed by some studies that showed small or no effects for recasts in classroom settings (Ellis, Loewen, & Erlam, 2006; Lyster, 2004; Sheen, 2010; Yang & Lyster, 2010). However, recasts were found to be very effective in classroom settings where their use was salient or intensive. In Doughty and Varela’s (2002) study, a non-target-like utterance was repeated with a rising tone followed by a recast, which made the corrective intention easily recognizable. In Han’s (2002) study, the learners received recasts in 11 sessions on their non-target-like production of English past tenses. The effects of recasts may also be mediated by the target structure. For example, Ammar and Spada (2006) found that recasts were effective for possessive his/her in English, a salient, transparent structure, whereas redundant and/or opaque structures such as French gender (Lyster, 2004), English past tense (Ellis et al., 2006; Yang & Lyster, 2010), and English articles (Sheen, 2007) may not be amenable to such feedback.

One corrective strategy that is often juxtaposed with recasts in feedback research is metalinguistic feedback, which is defined as “comments, information, or questions related to the well-formedness of the student’s utterance” (Lyster & Ranta, 1997, p. 47). Sheen (2011) made a distinction between direct and indirect metalinguistic feedback. The former, also called metalinguistic correction, contains the correct form and a metalinguistic comment; the latter only includes a metalinguistic comment. Sheen (2011) contended that direct metalinguistic feedback contains both positive and negative evidence, facilitates learner awareness at the level of understanding (rather than mere noticing), and is therefore especially useful in learning complex linguistic structures. This study investigates what Sheen terms direct metalinguistic feedback, which is referred to here as metalinguistic correction.
Recasts are often investigated in comparison with metalinguistic feedback partly because the former constitutes an implicit type of feedback and the latter is representative of explicit feedback. However, many would disagree with the implicit–explicit dichotomy on the grounds that recasts can become explicit in situations where learners easily recognize the corrective force even if they are intended to be implicit. To dispel the controversy surrounding the distinction between implicit and explicit feedback, it is important to distinguish instruction from learning. Hulstijn (2005) stated that “instruction is explicit or implicit when learners do or do not receive information concerning rules underlying the input, respectively” (p. 132). Following Hulstijn’s view, then, the implicitness or

explicitness of feedback should be determined from the teacher’s perspective. Recasts do not provide rule explanation and are therefore implicit; metalinguistic feedback often subsumes rule explanation, so it is explicit. Whereas the nature of feedback concerns the provider, the nature of learning resulting from feedback relates to the receiver of the feedback. Learning is explicit if it is conscious and involves rule processing or induction; learning is implicit if it happens unconsciously and does not implicate rule processing or induction (Hulstijn, 2005; Robinson, 1997, 2002). While implicit feedback is more likely to lead to implicit learning than explicit feedback, it may also contribute to explicit learning when the learner starts to infer and formulate rules based on available linguistic data (e.g., Long, Inagaki, & Ortega, 1998). By the same token, explicit feedback may invite implicit learning, such as when the learner picks up something unconsciously that is not related to the linguistic structure the feedback targets.

Previous research showed that recasts were less effective than metalinguistic feedback (Ellis, Loewen, & Erlam, 2006; Lyster, 2004; Sheen, 2010), which provides further support for Spada and Tomita’s (2010) and Norris and Ortega’s (2000) meta-analytic finding that explicit instruction was more effective than implicit instruction. However, the meta-analyses by Li (2010) and Mackey and Goo (2007) showed larger long-term effects for implicit feedback. There is also evidence that implicit feedback and explicit feedback were equally effective for more advanced learners but lower-level learners benefited more from explicit feedback, suggesting an impact of proficiency on the effects of feedback (Ammar & Spada, 2006; Li, 2009).
It would seem that the unequivocal advantage of explicit feedback/instruction needs to be reconsidered and that the investigation of the mediating variables for corrective feedback should be prioritized in the new agenda for feedback research. One such variable is language aptitude.

LANGUAGE APTITUDE AND CORRECTIVE FEEDBACK

Language aptitude refers to an individual’s capacity to learn a language. The publication of the MLAT (Modern Language Aptitude Test; Carroll & Sapon, 1959) led to an exponential growth in research on language aptitude because it provided a clear definition and valid measure of the construct. The MLAT has been so influential that, to some extent, aptitude is defined in terms of what the MLAT measures. The MLAT consists of five parts that measure three dimensions of aptitude: phonetic coding ability, language analytic ability, and rote learning (memory) ability. Based on the findings of the MLAT studies, Carroll and Sapon (2002) claimed that language aptitude is (a) distinct from IQ, (b) stable and unsusceptible to training, (c) separate from affective variables (motivation, anxiety, etc.), (d) prognostic of learning rate (and diagnostic of learning problems), and (e) not affected by differences in instructional context, target language, or language skill (e.g., written vs. oral).

The MLAT was developed in the heyday of the audiolingual approach (the 1950s and 1960s), which was characterized by drills, rote learning, and the development of explicit grammar knowledge. The audiolingual approach was grounded in Behaviorism, according to which language learning was a matter of habit formation. Subsequently, Behaviorism lost ground to Krashen’s (1981) Universal Grammar-based Monitor Theory, which featured exposure to input and implicit learning. In the meantime, the audiolingual approach gave way to communicative approaches, where aptitude was criticized as being irrelevant. Also, because of the dominance of Krashen’s theory (the Affective Filter Hypothesis in particular), variables other than aptitude, such as motivation and anxiety, became the focus of research on individual differences. However, several more recent theories, such as the Interaction Hypothesis (Long, 1996) and the Noticing Hypothesis (Schmidt, 1990), have challenged Krashen’s theories and generated prolific empirical research. The 1990s witnessed a resurrection of aptitude research, which is ascribable to the claims of the newly born theories. For instance, empirical studies guided by some of the above theoretical frameworks demonstrated that there is a need for a certain amount of attention to form in meaning-primary L2 instruction.
This made it possible for aptitude to reclaim part of its lost territory in SLA, because of its inherent relevance to form-focused instruction. Also, these theories have spawned a sweeping interest in the cognitive processes of SLA, leading to the study of aptitude components such as analytic ability and working memory as cognitive variables central to L2 acquisition. In the meantime, aptitude researchers realized the limitation of investigating only the variable of aptitude as a predictor of proficiency. It has become clear that the predictive function of aptitude is of limited theoretical value. While it is true that the results of aptitude tests may serve

some pedagogical purposes such as selection of talented language learners, diagnosis of learning disabilities, and prognosis of learning rate, these uses of aptitude are mostly tangential to explaining SLA. Robinson (1997, 2002) pointed out that aptitude should be treated as a dynamic construct that interacts with different learning and instructional conditions that impose different cognitive demands on learners.

The interaction between language aptitude and corrective feedback is a promising research avenue. Both domains are relatively mature: There has been a large body of research on both corrective feedback and language aptitude, with the result that the constructs are clearly defined and operationalized and reliable research methods, such as measures and data elicitation tasks, have been developed. Integrating the two areas has the potential of obtaining findings that lead to a more precise understanding of SLA processes. There is also a mutual need for investigating the two areas in combination. Feedback research has reached a point where new variables must be introduced in order to obtain more accurate interpretations of the effects of different corrective strategies. Aptitude research has been in hibernation for a long period of time, and its revival is in large part dependent on the extent to which its primary objective evolves from a predictive to an explanatory role. To revitalize aptitude research, it is also critical to update aptitude measures by incorporating the latest research findings. For instance, some researchers have argued for the utilization of working memory tests rather than tests of associative memory to capture the learning mechanism of form-focused instruction, that is, a brief attention to form during meaningful communication (DeKeyser & Koeth, 2011; Robinson, 2002).
Despite a large body of research into the relationship between working memory and SLA processes (Skehan, 2012), there has been a lack of research on working memory as an aptitude component. The following review starts with an overview of the two aptitude components under investigation—language analytic ability and working memory—followed by a summary of the research that has investigated the interface between feedback and the two cognitive variables.

Language Analytic Ability

Carroll (1981) defined language analytic ability (grammatical sensitivity) as “the ability to recognize the grammatical functions of words (or other linguistic entities) in sentence structures” (p. 105). Language analytic ability is often measured with the Words in Sentences subtest of the MLAT. Previous research showed that, among the three components included in the MLAT (phonemic encoding, analytic, and memory), language analytic ability is the most predictive of L2 proficiency (Ehrman & Oxford, 1995; Hummel, 2009; Ranta, 2002). Language analytic ability has been found to be related to general L2 proficiency (Alderson, Clapham, & Steel, 1997; Sparks et al., 2011) and the acquisition of explicit knowledge (Gardner & Lambert, 1965; Horwitz, 1980); whether it is predictive of the acquisition of implicit knowledge is less certain. While it was found to be correlated with oral production in some studies (Ehrman & Oxford, 1995; Horwitz, 1987), there were no such correlations in other studies (Harley & Hart, 1997; Ranta, 2005).

A few studies have examined how language analytic ability interacted with the effectiveness of corrective feedback. DeKeyser’s (1993) longitudinal classroom study included two L2 French classes: One received explicit error correction and the other did not. After a school year, the feedback group did not outperform the comparison group; there was no effect for language analytic ability. Sheen (2007) investigated the extent to which language analytic ability correlated with the effects of recasts and metalinguistic correction in the learning of two uses of English indefinite and definite articles, a as first mention and the as anaphoric reference. Language analytic ability was found to be correlated with the effects of metalinguistic correction but not those of recasts. Trofimovich, Ammar, and Gatbonton (2007) examined the relationship between the effects of computerized recasts and language analytic ability (as well as attention control and working memory). Significant relationships were found between language analytic ability and the effects of the feedback in learning grammatical items (his/her in English) but not lexical items.

Taken together, these few studies seemed to indicate that language analytic ability was related to the effects of metalinguistic feedback, and to the effects of recasts in the learning of an easy grammatical structure (his/her) but not a difficult one (the/a); it was not sensitive to the learning of lexicon. However, given the small number of studies, it is premature to draw any conclusions.

Working Memory

The term working memory has been adopted for short-term memory to reflect the fact that, instead of being merely a warehouse to store incoming data, it is also responsible for information processing (Miyake & Friedman, 1998). There are two views on the architecture of the working memory construct (Conway et al., 2007; French, 2006): the unitary approach and the multicomponential or multifaceted approach. Researchers embracing the unitary approach believe that working memory is a single construct that performs both storage and processing functions (Daneman & Carpenter, 1980). Others hold that working memory consists of a central executive and several slave systems (e.g., Baddeley, 2007). The central executive is responsible for the control and regulation of the working memory system. The subcomponents include a phonological loop that stores phonological/auditory information, a visuospatial sketchpad that involves the generation and storage of visual information, and an episodic buffer that integrates information from a variety of systems and from long-term memory.

In L2 research, there has been a call to investigate working memory as an aptitude component (Miyake & Friedman, 1998; Robinson, 2005). Robinson argued that the MLAT and other test batteries of aptitude were developed in audiolingual contexts where rote learning was a defining feature. However, in communicative language teaching, linguistic forms are addressed in meaning-focused instruction, and the processing demands of this type of instruction are different from those of audiolingual classes. Skehan (1982) also argued that the MLAT subtest that is concerned with the memory component of aptitude (asking the learner to memorize some artificial words and then recognize them) measures learners’ associative memory, which may not be the most predictive of language learning. Indeed, working memory has the potential of being the most important aptitude component because it constitutes a converging space for all three aptitude components: phonetic coding, language analytic ability, and memory.
However, to date, few studies have focused on the validation of working memory as an aptitude component and its role in SLA.

With respect to the relationship between working memory and the effectiveness of corrective feedback, there have been several published studies, all of which relate to recasts. Mackey et al. (2002) investigated the relationship among working memory, noticing of recasts, and the effects of recasts in the learning of English question formation by Japanese EFL learners. The researchers found a positive correlation between working memory and noticing. They also found that learners with low working memory showed more improvement on the immediate posttest and those with high working memory demonstrated more interlanguage development on the delayed posttest. Mackey and Sachs (2012) investigated the relationship between working memory and interactional feedback (mainly recasts) with 9 older adult ESL learners (ages 65–89). The participants who improved their accuracy in producing the target structure (question formation) were those with the highest working memory scores. Révész’s (2012) study concerned the extent to which the effects of recasts are related to scores of memory in the learning of the English past progressive construction. Significant correlations were found between working memory and written test scores and between phonological short-term memory and oral test scores. Trofimovich, Ammar, and Gatbonton (2007) examined the role of memory (together with attention and analytic ability, as reviewed above) in mediating the effects of computerized recasts. It was found that working memory was not a significant predictor of the learners’ interlanguage development. However, in a similar study, Sagarra (2007) found that the effects of recasts were associated with the learners’ working memory capacities.

Many issues remain unresolved. The studies by Mackey and her colleagues revealed some interesting and thought-provoking findings, but these findings need to be verified with more learners and in different contexts. Trofimovich, Ammar, and Gatbonton (2007) and Sagarra (2007) obtained some conflicting results, and in both studies recasts were provided in computer mode and in discrete item practice. How working memory interacts with feedback in meaningful communication is not clear. All five studies investigated recasts, and how working memory correlates with the effects of other feedback types needs further empirical exploration.
Also, in previous research, working memory was either operationalized as phonological short-term memory, or when it was measured using complex, sentence-span tests, it was mainly the recall component (not veracity judgment or reaction time) that was scored. However, research showed that there was a tradeoff between the storage and processing components of working memory (e.g., Waters & Caplan, 1996).

THE CURRENT STUDY

This study examines how learners’ aptitude differences in language analytic ability and working memory interact with the effects of implicit and explicit feedback. The data were collected as part of a larger study investigating factors constraining the effectiveness of corrective feedback. The findings pertaining to the interactions between feedback type and proficiency are reported in a separate article.1 Therefore it is not a focus of this study to show the comparative effects of implicit and explicit feedback; rather, the primary concern here is the effect of the interface between feedback type and aptitude components on L2 learning. Consequently, the following research questions are formulated:

1. What is the relationship between the effectiveness of implicit and explicit feedback and learners’ individual differences in language analytic ability?
2. What is the relationship between the two feedback types and learners’ individual differences in working memory?
3. Do the two aptitude components interact differently with the two feedback types?

Participants

Seventy-eight L2 Chinese learners aged 18–38 (M = 20.8) from two large U.S. universities participated in the study. Seventy-five were native speakers of English and 3 reported Korean as their native language; 34 were female and 44 male. At the time of data collection, they were in their 4th, 6th, and 8th semesters of their Chinese study.2 The participants were assigned to one of three conditions: implicit (n = 28), explicit (n = 29), and control (n = 21). These groups received recasts, metalinguistic correction, or no feedback, respectively, in response to their non-target-like L2 production. A standardized proficiency test (HSK) was administered to ensure that the three groups were comparable in their proficiency in the L2; a one-way ANOVA showed no significant differences among the three groups in their test scores, F(2, 75) = .15, p = .86. The descriptive statistics are displayed in Table 1.
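As an illustrative check on the group-comparability result, the F ratio of a one-way ANOVA can be recovered from summary statistics alone. The sketch below is not the study's own analysis script. It assumes only the group sizes, means, and standard deviations reported in Table 1, and the function name `anova_from_summary` is hypothetical.

```python
# Sketch: recompute a one-way ANOVA F ratio from summary statistics.
# The (n, mean, SD) triples are the values reported in Table 1; the
# function name is hypothetical and not taken from the original study.

def anova_from_summary(groups):
    """Return (F, df_between, df_within) for a list of (n, mean, sd) tuples."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares: pooled from each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Implicit, explicit, and control groups, in that order.
table1 = [(28, 29.86, 7.50), (29, 29.52, 8.34), (21, 30.81, 9.83)]
f_ratio, df_b, df_w = anova_from_summary(table1)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")  # prints: F(2, 75) = 0.15
```

The recomputed value agrees with the reported F(2, 75) = .15. Obtaining the p value would additionally require the upper-tail probability of the F distribution, which a statistics library can supply and which is left out of this sketch.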

Target Structure

Chinese classifiers served as the linguistic target of the treatment tasks. A classifier is a word that is used between a determiner (which is typically a number but can also be a demonstrative such as this/that or a quantifier such as several) and a count noun, as in liǎng běn shū (two CLASSIFIER books) or yī kē shù (one CLASSIFIER tree). The classifier is one of the most striking features of the Chinese language (Li & Thompson, 1981). Semantically, a classifier is used to categorize and quantify a set of objects with the same or similar physical properties or characteristics. Syntactically, “classifiers are units of enumeration employed to mark countability; their occurrence makes the semantic partitioning of nouns visible” (Wu & Bodomo, 2009, p. 490). The choice of classifier depends on the accompanying noun, not the determiner. The form–meaning mapping of a classifier is transparent: There is usually a one-to-one correspondence between a classifier and the related noun. However, there are situations where more than one classifier is compatible with an object. For instance, there are two possible classifiers for dogs, zhī and tiáo, and which is more appropriate is subject to controversy. Also, in Chinese there is a general classifier (gè) that can substitute for a special classifier in many cases, which confuses L2 Chinese learners and partly explains why classifiers constitute a problematic structure for learners. Typical learner errors include failure to use a classifier in an obligatory context, misuse of classifiers, or use of the general, default classifier gè in lieu of a special classifier.

The classifier was selected as the target structure because it is problematic for learners at all levels of their interlanguage development; yet it is one of the earliest addressed structures in L2 Chinese instruction, so all learners had some prior knowledge of it.
Structures about which learners have partial knowledge but not full mastery are ideal for feedback treatment (Han, 2002). For learners whose first language is a nonclassifier language, such as English (which

TABLE 1
Descriptive Statistics for Proficiency Scores

Group       n     Mean    SD
Implicit    28    29.86   7.50
Explicit    29    29.52   8.34
Control     21    30.81   9.83

has measure words, as in “a piece of bread,” but not classifiers), classifier learning involves a two-step procedure: (a) They need to be aware that a classifier must be used between a determiner and a noun, and (b) they need to match specific classifiers in the repertoire with the corresponding nouns. In a sense, classifier learning is both rule- and item-based.

Feedback Operationalization

Implicit feedback was operationalized as recasts; that is, the reformulation of the learner’s non-target-like production of the target structure. The recasts in this study were mostly partial, didactic, ended in a falling tone, and were provided in meaning-focused tasks where the target structure was attended to in information exchange. Aside from the utterances containing the target structure, utterances that subsumed errors related to non-target structures were also responded to with recasts as well as other feedback types to mask the linguistic focus. The following episode, which was extracted from the dataset of this study, exemplifies how a recast was provided.

EXAMPLE 1. Recast: Incorrect Classifier

[Note. CL = classifier]

In this episode, the learner used a wrong classifier (gè) for pigs. The native speaker reformulated the noun phrase by replacing the wrong classifier with the correct one (tóu). In the next utterance, the learner repeated the correct classifier and the noun, followed by a descriptive statement about the pigs in the photo.

Following Sheen (2007), explicit feedback was operationalized as metalinguistic correction, that is, the provision of the correct form followed by a metalinguistic clue. This operationalization is motivated by the following factors. First, as Sheen pointed out, metalinguistic correction is potentially more effective than a metalinguistic clue alone because it provides positive evidence. Further support for Sheen’s operationalization comes from Ellis (2007), who suggested the principle of “bias for best” (p. 340), that is, operationalizing a feedback type in a way that maximizes its potential effect. Second, a major goal of this study is to explore how the effects of implicit and explicit feedback interact with the two aptitude components. Combining explicit correction and metalinguistic feedback, two explicit feedback types, makes the resulting feedback even more explicit, hence increasing the implicit–explicit contrast.

An additional benefit of providing positive evidence in both feedback groups was to control for the amount of modified output (uptake), which might affect the effectiveness of feedback (e.g., Lyster & Ranta, 1997). Metalinguistic feedback without a model is a type of output-prompting feedback that imposes participatory demands on the learner. Thus, the availability of the correct form in the feedback minimized the likely influence of the confounding variable of uptake. One might argue that due to the explicit nature of metalinguistic feedback, learners who received this type of feedback might still have produced more uptake.
However, there has been limited empirical evidence for the benefits of uptake in facilitating SLA, and this was especially true in laboratory settings (Mackey & Philp, 1998). To be consistent with Chinese pedagogical grammar, the term measure word was used instead of classifier in the metalinguistic clue. Also, to ensure that the learner understood the metalinguistic clue, the information was provided in English, the learner’s L1. The following episode illustrates how explicit feedback was provided. EXAMPLE 2. Explicit Feedback: Missing Classifier


In the above scenario, the learner failed to use a classifier between the numeral yī (‘a’) and the noun hé (‘river’). The native speaker reformulated the noun phrase by inserting the classifier tiáo and then provided a metalinguistic clue. The learner repaired the non-target-like utterance in the next turn.

Tasks

Two tasks were used to elicit the production of classifiers: picture description and spot the differences. In both tasks, pictures were used, each of which contained a scenario that created a meaningful context for the use of the target structure. In Task 1, the picture description task, the learner was asked to describe seven pictures that contained 15 cases of classifier use. The pictures had different numbers of various objects (such as two trees, a river, three horses) so that the learners had to use classifiers when they described the objects and reported how many of them there were. In Task 2, the spot the differences task, there were three sets of pictures; each set had two pictures that contained more or less the same items but were different in a number of aspects. The native speaker and the learner each held a picture, and the learner asked questions to identify the differences between the pictures. Completion of the task required the use of the same 15 selected classifiers as in Task 1. The native speaker provided explicit or implicit feedback in response to the learner’s wrong classifier use. The learners in the control group were asked to read a story about a Chinese idiom shú néng shēng qiǎo (‘Practice makes perfect’) and retell it by following some clues. Retelling the story did not require the use of the selected classifiers, and no feedback was provided in the control condition.3

The selection of classifiers was based on the responses from 45 native speakers of Chinese to a survey on classifier use. The respondents were Mandarin native speakers studying or working in the local community where this study was conducted. Twenty of these native speakers had a bachelor’s degree, 19 had a master’s degree, and 6 had a doctoral degree. Their specializations were varied, including humanities, science, and engineering. The average age was 32.08. The survey served two purposes. One was to select appropriate classifier + noun combinations for treatment tasks. The other was to ensure that the selected special classifiers could not be replaced by the general classifier gè. The survey had 40 items, each providing a context for classifier use. For each item, the respondent was asked to fill in the missing classifier and then decide whether the classifier could be replaced by the general classifier. The surveyed classifiers were mostly selected from the textbooks used in the Chinese programs the study participants were enrolled in (e.g., Liu et al., 2009). Additional sources of classifiers were other commercial Chinese textbooks (e.g., Wu et al., 2007) used in North America, Erbaugh’s (1986) list of core classifiers, and the widely used Chinese grammar book by Li and Thompson (1981). The example below shows a sample item in the survey.

EXAMPLE 3. Survey Item

Altogether 15 cases of classifier use were selected out of the 40 surveyed items. To be eligible for inclusion in the study, a classifier had to reach an agreement rate of 80% or higher among the respondents regarding the collocation of the classifier with its accompanying noun and the inability to substitute the general classifier for the special one.

Testing and Scoring

The measures used in this study included a proficiency test, tests of treatment effects, a language analytic ability test, and a working memory test. The Appendix presents the details on the different measures of the constructs under investigation, the number of items and possible points for each measure, and the related estimates of internal reliability.

Proficiency. To ensure that the three participant groups were comparable in their L2 proficiency, an adapted HSK (Hànyǔ Shuǐpíng Kǎoshì, or ‘Chinese Proficiency Test’) was administered. The HSK is a standardized test of Chinese as a foreign language sponsored by Beijing Language and Culture University and recognized by the People’s Republic of China and numerous countries worldwide. Previous research (e.g., Nie, 2006) has demonstrated that the test has high levels of validity and reliability. The revised HSK used in this study consisted of 60 items: 30, 20, and 10 for listening, grammar, and reading respectively. Each item was assigned 1 point, with a total score of 60. More weight was given to listening comprehension and grammar than to reading comprehension to align with the format of the interventional treatment, where feedback was provided orally in response to linguistic errors in oral production.

Treatment Effects. A grammaticality judgment test (GJT) and an elicited imitation (EI) test were used to measure the effects of feedback. The GJT and EI tests were used to measure learners’ explicit and implicit knowledge about the target structures respectively (Ellis et al., 2009). During the GJT, learners were asked to judge whether a sentence was grammatical or ungrammatical or whether they were not sure. If learners judged a sentence to be ungrammatical, they were asked to locate the error and correct it.
During the EI test, learners listened to statements relating to their everyday life or personal experience. The stimuli, which were read at normal speed by the researcher and were recorded on an audio disc, were presented manually using a disc player. After each statement the disc was paused to allow learners to decide whether it was true or not true based on their experience or whether they were not sure. (An example stimulus, translated into English, would be “I bought three shirts yesterday.”) The learner was then asked to repeat each statement in correct Chinese.

The Modern Language Journal 97 (2013) Both the GJT and the EI test had three versions: a pretest, an immediate posttest, and a delayed posttest. Each version had 15 target items and 8 distracters. Among the 15 target items, 8 were ungrammatical and 7 were grammatical. The three tests had the same target items but different distracting items, and the sequence in which the items were presented was different across the tests. The sentence stimuli in the GJT were different from those in the EI test except for the obligatory contexts for the use of the target structure. In both tests, vocabulary annotation was provided for some key words, including the characters, the alphabetic transcription, and the English translation. During the GJT, the learners were allowed to ask additional vocabulary questions but not grammar-related questions. The total possible score for each test was 15, with each item receiving 1 point. For GJT items, credit was given when a grammatical sentence was judged to be grammatical, and when an ungrammatical sentence was judged to be ungrammatical and the error was corrected. Credit was also given when a grammatical item was judged to be ungrammatical but the correction was made on a part that was unrelated to the target structure. The scoring criteria for the EI tests were different. Credit was given when the target structure was supplied in obligatory contexts. This meant that no credit was given if the target structure was supplied but the context for the use of the structure was not established (e.g., only repeating a classifier in the original sentence without producing the accompanying noun); it also meant that scoring only focused on the use of the target structure and the rest of a reproduced sentence was ignored. Also, the purpose of an EI test is to measure a learner’s implicit knowledge, which is supposedly unconscious and automatic. 
Therefore cases containing self-correction, which showed the learner’s conscious processing of the target structure, did not receive credit. Language Analytic Ability. Language analytic ability was measured via the Words in Sentences subtest of the MLAT (Carroll & Sapon, 2002), a widely used aptitude test in SLA research. The subtest was used to measure language learners’ sensitivity to grammatical structures or the “ability to handle the grammatical aspects of a foreign language” (Carroll & Sapon, 2002, p. 3). In each item, a key sentence was provided where a certain part was underlined, and the key sentence was followed by one or more comparison sentences with five underlined parts. Test takers chose the part in the comparative sentence(s) that matched

the function of the designated part in the key sentence. The test had 45 items; learners were given 15 minutes to complete the test. One point was assigned for each item, so the total possible score was 45.

Working Memory. A listening span test was developed to measure the learners’ working memory capacities. The rationale behind the decision to use a listening span rather than a reading span test was that the instructional treatment involved oral feedback, which did not draw on learners’ ability to store and process visual stimuli. The test was created using the stimuli developed and validated by Waters and Caplan (1996). It contained 72 sentences divided into 4 sets of sentences at span sizes 3, 4, 5, and 6. Half of the sentences had verbs that required animate subjects and half contained verbs that required inanimate subjects. Half of the sentences were plausible; half were implausible. Implausible sentences were constructed by inverting the animacy of the subject and object noun phrases (e.g., “It was the woman that the fur coat desired”). The sentences were of four types: cleft subject, cleft object, object–subject, and subject–object.⁴ All four sentence types and two plausibility possibilities (“Good” or “Bad”) were evenly distributed among the test stimuli, and in each set, there was a mixture of sentences with different structures and plausibility possibilities. The sequence in which sentence sets of different span sizes were presented was randomized. During the test, the learner listened to each sentence in a set and decided whether it was plausible; that is, whether it was about something that could happen in the real world. When the whole set was finished, there was a pause, during which the learner recalled the final word of each sentence in that set and wrote down the words on a blank sheet before starting the next set. The learner was informed that reaction time, plausibility judgment, and word recall were equally important. Unlike previous studies that only included recall scores, this study also recorded reaction time and plausibility scores because WM capacity should involve both the processing and storage functions and because previous studies (Leeser, 2007; Waters & Caplan, 1996) showed that learners sacrificed one component for a better performance in another (such as when learners process more slowly in order to achieve more accuracy in word recall). In data analysis, the working memory score for each participant was the average of the z scores for the three components: reaction time, plausibility judgment, and recall of sentence-final words.

Procedure

Each participant attended three one-on-one sessions with a native speaker (the researcher). In session 1, participants took the HSK proficiency test and the GJT pretest. Session 2 started with the EI pretest, after which the learner received feedback (implicit or explicit) from the native speaker interlocutor (the researcher) on his/her non-target-like classifier use in dyadic interaction; the instructional treatment was followed by the immediate GJT and EI posttests. During the third and final session (seven days after session 2), the learner took the delayed GJT and EI posttests, the test of language analytic ability (Part IV of the MLAT), and the working memory test. Table 2 displays the tasks the participants performed in the three sessions and the approximate duration of each of the tasks.

TABLE 2
Procedure of the Study

Session     Task                                          Duration
Session 1   HSK test                                      50 min
            GJT pretest                                   15 min
Session 2   EI pretest                                    10 min
            Treatment tasks                               40–45 min
              • Picture description
              • Spot the difference
            EI posttest 1                                 10 min
            GJT posttest 1                                15 min
Session 3   EI posttest 2                                 10 min
            GJT posttest 2                                15 min
            Test of LAA                                   15 min
            WM test                                       15 min

Note. HSK test: test of proficiency; GJT: grammaticality judgment test; EI: elicited imitation; LAA: language analytic ability; WM: working memory.

Analysis

In order to verify the hypothesis that working memory and language analytic ability underlay the same construct (i.e., language aptitude), a principal components analysis (using the Direct Oblimin rotation method) was performed, followed by a maximum likelihood structural equation modeling (SEM) analysis.⁵ Included in both analyses were learners’ scores on the proficiency test, the GJT and EI pretests, the test of language analytic ability, and the working memory test. If the two cognitive variables tapped into the same construct, they would cluster under the same latent variable or factor in the principal components analysis. The SEM analysis was performed to confirm the identified factor solution and to ascertain how well the data fit the model, with an additional benefit of exploring the relationship between the identified latent variables.

Next, two series of multiple regression analyses were conducted. In the first group of analyses, the dependent variables were the immediate and delayed GJT and EI gain scores of all the participants (not of individual groups); the independent variables were the two aptitude components (continuous variables) and the two dummy variables (categorical variables)⁶ that were related to the effects of the two types of feedback vis-à-vis the control group. A significant coefficient (β) for a dummy variable indicates that the related feedback group significantly outperformed the control group, and the β is equivalent to the mean difference between the two groups involved (or, more precisely, the effect size related to the group difference given that the coefficient was standardized). A multiple regression analysis including both categorical and continuous variables is essentially the same as an analysis of covariance (ANCOVA) where categorical variables serve as independent variables and continuous variables as covariates (Field, 2005). However, an ANCOVA has a slightly different focus: The researcher is often interested in the main effects and the post hoc analyses

related to the categorical variable(s) rather than the covariates. A decision was made to conduct multiple regression analyses instead of ANCOVAs because the purpose was to obtain an overall picture about the effects of feedback (compared with no feedback) and the weights of the two aptitude components after the effects of feedback were held constant; of interest were the unique and combined effects of all the included predictors.

While the above analyses yielded useful information, the substantive analyses were the second group of regression analyses. In each of the analyses, the predictor variables were the two aptitude components, and the dependent variables were the GJT and EI gain scores (immediate and delayed) of each feedback group (implicit/explicit). The purpose of the analyses was to ascertain the differential contributions of the two variables to the effects of implicit and explicit feedback. The gain scores of the control group were irrelevant in this case.

RESULTS

Principal Components Analysis and SEM Analysis

The principal components analysis and the SEM analysis were conducted to verify the hypothesis that working memory tapped into the same construct as language analytic ability; that is, working memory was an aptitude component. Table 3 reports the descriptive statistics for the observed variables included in the analyses. The principal components analysis showed a clear two-factor solution: Proficiency, GJT, and EI loaded onto the same factor, which was labeled L2 Competence; language analytic ability and working memory loaded onto another factor, which was named Aptitude. The two factors explained 66% of the total variance (see Table 4). The SEM analysis confirmed the two-factor model and at the same time identified a causal path from Aptitude to L2 Competence. The data showed an acceptable fit to the identified model, χ² = 1.25, df = 4, p = .87 (goodness-of-fit indices: RMSEA = .00, NFI = .98, CFI = 1.00, TLI = 1.17). Figure 1 illustrates the results of the SEM analysis; the value on each of the arrows represents the standardized regression coefficient associated with the path. Taken together, the results indicated that working memory and language analytic ability underlay the same latent variable, which was predictive of L2 competence, the other latent variable in the model.

TABLE 3
Descriptive Statistics for Variables in the Principal Components Analysis and the Structural Equation Modeling Analysis

Variables                              N     Mean      SD
Proficiency (HSK)                      78    29.99     8.39
GJT pretest                            78    5.78      1.26
EI pretest                             78    3.24      2.22
Language analytic ability              77    24.25     6.37
Working memory (average z scores)      77    .01       .74
  • Reaction time (milliseconds)             3769.53   523.63
  • Plausibility judgment                    63.64     5.27
  • Recall                                   50.79     9.84

Note. GJT: grammaticality judgment test; EI: elicited imitation test.

TABLE 4
Results of the Principal Components Analysis

                              Factor Loading
Observed Variables            L2 Competence    Aptitude
Proficiency                   .85
Elicited Imitation            .84
Grammaticality Judgment       .72
Working Memory                                 .85
Analytic Ability                               .75

Note. Variance explained: L2 Competence = 43%; Aptitude = 23%; total = 66%.

Regression Analyses

Table 5 reports the descriptive statistics for the separate and combined pretest–posttest gains of the three participant groups. Gain scores were obtained by subtracting pretest scores from posttest scores. It can be observed that all three groups improved from the pretests to the posttests, as shown by the positive gain scores.

TABLE 5
Descriptive Statistics for Pretest–Posttest Gain Scores

                       Posttest 1        Posttest 2
Test   Group   n       M       SD        M       SD
GJT    A       78      3.28    2.95      4.40    2.95
       I       28      3.23    2.17      4.62    2.35
       E       29      5.31    2.82      6.43    2.42
       C       21      0.54    1.52      1.30    1.34
EI     A       78      2.85    2.70      4.00    2.79
       I       28      3.19    2.25      4.08    2.51
       E       29      4.46    2.39      5.50    2.81
       C       21      0.26    1.55      1.90    1.63

Note. GJT = grammaticality judgment test; EI = elicited imitation test; A = all three groups combined; I = implicit; E = explicit; C = control.
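The gain-score computation described above (posttest score minus pretest score, summarized as a mean and standard deviation per group) can be sketched as follows. This is a minimal illustration only: the function name `gain_scores` and the four-learner score lists are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def gain_scores(pretest, posttest):
    """Return per-learner gains (posttest minus pretest)."""
    if len(pretest) != len(posttest):
        raise ValueError("pretest and posttest must cover the same learners")
    return [post - pre for pre, post in zip(pretest, posttest)]


# Hypothetical scores out of 15 for four learners (invented for illustration).
pretest = [5, 6, 4, 7]
posttest_1 = [8, 9, 6, 10]   # immediate posttest
posttest_2 = [9, 10, 8, 12]  # delayed posttest

immediate_gains = gain_scores(pretest, posttest_1)
delayed_gains = gain_scores(pretest, posttest_2)

# Summarize each set of gains the way a descriptive table would (M and SD).
for label, gains in (("immediate", immediate_gains), ("delayed", delayed_gains)):
    print(f"{label}: M = {mean(gains):.2f}, SD = {stdev(gains):.2f}")
```

A positive mean gain indicates improvement from pretest to posttest; the sample standard deviation (`stdev`) is assumed here, since the text does not state which variant was reported.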

The explicit group performed better than the implicit group, which in turn improved more than the control group. The gap between the two experimental groups in their delayed gain scores appeared smaller than in the immediate gains.

To obtain an initial, holistic picture of the impact of the two types of feedback and of the two aptitude components after controlling the effects of feedback, the gain scores of all participants (all three groups) were subjected to multiple regression analyses using a stepwise variable entry method. Table 6 shows the standardized regression coefficient (β) and significance value for each predictor as well as the R² value for each regression model. A standardized coefficient refers to the change in the outcome in standard deviation units as a result of one standard deviation unit change in the predictor. In the case of a dummy variable, the coefficient indexes the change in outcome as a result of the switch between the two involved groups (i.e., from control to explicit/implicit). R² represents the percentage of variance in the response variable accounted for by the identified regression model (e.g., R² = .23 means 23% of the variance in the response variable was accounted for).

FIGURE 1
A Structural Model for Language Aptitude and L2 Competence

[Path diagram not reproduced. Latent variables: L2 Aptitude (indicators: WM, LAA) and L2 Competence (indicators: Proficiency, GJT, EI), with a path from L2 Aptitude to L2 Competence. Standardized coefficients shown on the paths: .81, .39, .38, .78, .48, .83.]

Note. WM = working memory; LAA = language analytic ability; GJT = grammaticality judgment test; EI = elicited imitation; .39 (and other numbers) = standardized regression coefficient.

TABLE 6
Regression Results for the Effects of Feedback and Contributions of Language Analytic Ability and Working Memory

                     Predictors
                     DumEx        DumIm        LAA          WM
Test   Timing        β     p      β     p      β     p      β     p      R²
GJT    Posttest 1    .81   .00    .41   .00    .10   .26    .14   .13    .46
GJT    Posttest 2    .77   .00    .48   .00    .17   .06    .27   .00    .51
EI     Posttest 1    .84   .00    .51   .00    .05   .63    .11   .26    .47
EI     Posttest 2    .62   .00    .31   .02    .09   .41    .25   .02    .31

Note. Significance level: p < .05; DumEx = dummy variable representing the explicit-control comparison; DumIm = dummy variable representing the implicit-control comparison; LAA = language analytic ability; WM = working memory; β = standardized regression coefficient; GJT = grammaticality judgment; EI = elicited imitation; R² = percentage of variance accounted for.

The two dummy variables (DumEx for the explicit vs. control comparison; DumIm for the implicit vs. control comparison) were significant predictors for gain scores on all measures (GJT and EI; immediate and delayed). In other words, learners in the two feedback conditions showed significantly more gains than the control condition on all measures. By and large, explicit feedback showed larger coefficients than implicit feedback. After the effects of feedback were controlled, working memory was predictive of the delayed gains, and language analytic ability was a near significant predictor (p = .06) of the delayed GJT gain scores.

While the above results show how different configurations of feedback and aptitude affected the gains of all learners including the control group, the interactions between feedback and aptitude, which were of primary interest in this study, remain unclear. To determine whether such interactions exist, regression analyses were performed using the gain scores for the two experimental groups as response variables and the two aptitude components as predictors. The results, which appear in Table 7, are summarized as follows:

(a) All statistically significant effects related to the delayed gain scores.

(b) Language analytic ability was a significant predictor for the GJT gain scores of the implicit group, β = .44, p < .05. A total of 20% of the variance was explained. No significant relationships were found between language analytic ability and the effects of explicit feedback.

(c) Working memory was predictive of the effects of explicit feedback, and the result was found for both the GJT scores (β = .56, p < .01) and the EI scores (β = .38, p < .05). Altogether 30% of the variance in the GJT scores and 14% of the variance in the EI scores was accounted for. Also, working memory was not significantly related to the effects of implicit feedback.
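The dummy coding used in the first set of regressions can be illustrated with a small sketch. It assumes a model in which the two dummy variables are the only predictors and reports unstandardized coefficients; under those assumptions each coefficient equals the difference between the corresponding feedback group's mean gain and the control group's mean gain. The study's own models also included the two aptitude measures and reported standardized coefficients, so this is a simplification. The gain scores and the helper names `solve` and `ols` are hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical gain scores, invented for illustration (not the study's data).
gains = {
    "control": [0, 1, 1, 2],
    "implicit": [2, 3, 4, 3],
    "explicit": [5, 4, 6, 5],
}

rows, y = [], []
for group, scores in gains.items():
    for score in scores:
        dum_ex = 1 if group == "explicit" else 0  # explicit vs. control
        dum_im = 1 if group == "implicit" else 0  # implicit vs. control
        rows.append([1, dum_ex, dum_im])          # intercept, DumEx, DumIm
        y.append(score)

intercept, b_dum_ex, b_dum_im = ols(rows, y)
print(f"intercept = {intercept:.2f}, DumEx = {b_dum_ex:.2f}, DumIm = {b_dum_im:.2f}")
```

With these invented data the group means are 1 (control), 3 (implicit), and 5 (explicit), so the intercept recovers the control mean and the two dummy coefficients recover the differences of 4 and 2. Adding continuous predictors such as the aptitude scores would turn these into adjusted differences.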

TABLE 7
Regression Results Pertaining to Feedback–Aptitude Interactions

                                  Predictors
                                  LAA          WM
Feedback   Test   Timing          β     p      β     p      R²
Implicit   GJT    Posttest 1      .25   .27    .11   .61    .10
                  Posttest 2      .44   .02    .24   .25    .20
           EI     Posttest 1      .22   .33    .02   .92    .05
                  Posttest 2      .11   .63    .26   .25    .10
Explicit   GJT    Posttest 1      .04   .83    .18   .39    .04
                  Posttest 2      .01   .66    .56   .00    .30
           EI     Posttest 1      .03   .88    .07   .74    .01
                  Posttest 2      .19   .33    .38   .04    .14

Note. Significance level: p < .05; GJT = grammaticality judgment test; EI = elicited imitation test; LAA = language analytic ability; WM = working memory; β = standardized regression coefficient; R² = amount of variance accounted for.

DISCUSSION

In response to the call for more research into the mediating variables of corrective feedback (Ellis, 2010) and into aptitude–treatment interaction (Robinson, 2005), this study investigated whether two components of language aptitude (language analytic ability and working memory) played different roles in mediating the effects of implicit and explicit feedback. Initial regression analyses performed on the contributions of the two types of feedback and the two aptitude components to the learners’ pretest–posttest gains established that both explicit and implicit feedback were facilitative of the learners’ interlanguage development of Chinese classifiers; after the effects of feedback were controlled, working memory explained a significant portion of the variance of the learners’ gain scores after treatment, but language analytic ability did not. However, subsequent analyses mapping the relationships between the two feedback types and the two aptitude components showed that (a) both aptitude components were significant predictors, and (b) language analytic ability was sensitive to the effects of implicit feedback and working memory to the effects of explicit feedback. Also noteworthy is the impact of testing on the results: All significant results related to the delayed gains and most of them related to GJT scores. In the following, interpretations are sought for the interactions between the aptitude components and the learning conditions and for the influence of testing on the aptitude–treatment interactions.

Language Analytic Ability

Language analytic ability was sensitive to the effects of implicit feedback that contained the correct classifier without metalinguistic information (recasts). It would seem that in the absence of metalinguistic information, learners with higher analytic ability achieved more. These learners were better versed in (a) noticing linguistic problems and (b) extracting and generalizing the syntactic regularities related to classifier use based on the positive and/or negative evidence contained in the provided recasts.
However, this interpretation is subject to two concomitant questions: (a) Did the learners engage in syntactic processing given the implicitness of the feedback? (b) Was language analytic ability drawn upon given that classifiers constitute a simple, transparent structure? With regard to the first question, despite the fact that the feedback did not overtly draw the learners’ attention to errors, the saliency and transparency of this linguistic structure, the instructional context (laboratory), and the characteristics of the recasts (partial and didactic) might have made the corrective force of recasts more easily perceived than in other studies of recasts. Robinson (1997) found that in the implicit condition of his study where learners were asked to simply memorize some examples without being provided with any rule explanation, learners with high aptitude claimed to have actively looked for and were able to verbalize rules. In Long, Inagaki, and Ortega (1998), learners were able to explicitly formulate rules about the target structure as a result of receiving recasts. These findings suggest that learners in implicit learning conditions engaged in rule search or induction, which taxed their language analytic ability.

With regard to the second question, although the classifier is not a complex (or hard) structure, it does pose problems for native speakers of English, a nonclassifier language. This is confirmed by Polio’s (1994) data showing that native speakers of English committed omission errors in using Chinese classifiers, but that native speakers of Japanese, a classifier language, did not. For speakers of languages without classifiers, the mastery of classifiers necessarily involves the initial recognition of the syntactic permutation (e.g., numeral + classifier + noun) prior to the semantic matching between specific classifiers and the accompanying nouns.

It would seem that whether language analytic ability influences the effects of implicit feedback (recasts) is also constrained by the extent to which the linguistic target is within learners’ processing capacity. This speculation is supported by the conflicting findings obtained in feedback studies.
Structures such as classifiers in this study and the English possessive determiners (his/her) in Trofimovich, Ammar, and Gatbonton’s (2007) study did not involve complex form–meaning mapping, which made it possible for the learners to solve problems by utilizing their internal resources. In the case of opaque, hard structures, such as English articles in Sheen’s (2007) study, learners were likely unable to extract rules about the target structure using their own analytic ability (even if there was a high level of noticing). Consequently, the effects of recasts were found to be related to language analytic ability in this study and Trofimovich, Ammar, and Gatbonton’s study, but not in Sheen’s study.

Consideration of the nature of the linguistic target also helps explain why language analytic ability was not related to the gains in the explicit condition. The classifier is a relatively transparent structure; the metalinguistic information contained in the explicit feedback (which stipulated that a classifier was required between a numeral and a noun) was easy to process and internalize. As a result, the learners may have been relieved of the need to apply their language analytic ability. Therefore, while language analytic ability made a difference in the absence of metalinguistic information in the implicit condition, it is the provision of the metalinguistic information that leveled out the role of language analytic ability in the explicit condition. In Sheen’s (2007) study, however, a significant correlation was detected between language analytic ability and the effects of explicit feedback (metalinguistic correction) in the learning of English articles (a/the). This further testifies to the role of the linguistic target: Language analytic ability was drawn upon in processing the metalinguistic information about a hard, opaque structure. Based upon the available empirical evidence from this study and previous studies, the following hypothesis can be formulated regarding the interaction between language analytic ability and different learning conditions:

    Other things being equal, language analytic ability is implicated in implicit conditions in the learning of easy, transparent structures that are within one’s processing capacity, and in explicit conditions in the learning of hard, opaque structures where the internalization of available metalinguistic information sets heavy processing demands on internal cognitive resources.

Clearly the hypothetical claim is debatable because of potential problems such as the field’s inconsistency in operationalizing implicitness/explicitness and controversy over how linguistic difficulty/complexity is determined. Therefore, the falsifiability of the hypothesis is dependent on the extent to which related constructs are clearly and consistently defined and theoretically justified.

Working Memory

Working memory was predictive of the effects of the explicit feedback. The processing demands of classifier learning through external assistance in the form of metalinguistic correction seemed a perfect match to the mechanism of working memory. When the learner’s attention was brought to the target structure through the provided feedback, he/she encoded and registered the auditory stimuli (sound representations about a classifier as well as the metalinguistic information) in the phonological loop, matching the phonological codes with existing codes (e.g., sounds and tones the learner previously learned) archived in long-term memory. This was likely followed by vocal or subvocal rehearsal of the stored information (e.g., repetition of the provided classifier or uptake). The central executive maintained the information in focal attention and processed it for storage in long-term memory through the episodic buffer. The cognitive processing may have taken place by matching a certain classifier with a noun and analyzing the metalinguistic information; it may also have involved the inhibition of other classifiers in the repertoire, which likely competed for the limited capacity of working memory. Evidently, classifier learning in the explicit condition drew heavily on the learner’s ability to store and process the available input, which led to the significant relationship between working memory and the treatment effects.

The finding that working memory was sensitive to the effects of explicit but not implicit feedback may have to do with consciousness. Almost all models of working memory, such as the Multiple Component Model (Baddeley & Logie, 1999), the Executive Attention Model (Engle, 2002), and the Embedded Process Model (Cowan, 1999), acknowledge the role of consciousness and attention control. Baddeley (2007) pointed out that “as has become increasingly obvious over the years, conscious awareness appears to be closely related to the executive control, and hence to the operation of working memory” (p. 302). Engle (2002) even stated that working memory is not about short-term span; rather, it is about the ability to focus attention on relevant information and inhibit irrelevant information.
Similarly, Ellis (2009) observed that implicit learning does not implicate central attentional resources; explicit learning, by contrast, relies heavily on working memory because it involves conscious memorization of facts. Indeed, in this study, learners’ ability to focus their attention on the information contained in the explicit feedback and at the same time resist competing information may be critical to the development of their knowledge about classifier use.

The finding that language analytic ability, but not working memory, mediated the effects of implicit feedback and that working memory, but not language analytic ability, was related to the effects of explicit feedback, demonstrated the different processing demands the two learning conditions imposed on the learners’ cognitive resources. As previously stated, classifier learning involves an initial recognition of the syntactic permutation followed by the semantic mapping between individual classifiers and their corresponding nouns. In the implicit condition, where no metalinguistic information was available, learners’ ability to notice, process, and consolidate the syntactic pattern of classifier use seems to have played a greater role than the subsequent processing and storage of individual classifiers. In contrast, in the explicit condition, where information was available about the syntactic component of the target structure, learners’ ability to encode, rehearse, and store individual classifiers and simultaneously to suppress similar classifiers became more important.

Aptitude and Testing

It is interesting that the feedback–aptitude interactions found in this study are subject to the timing of testing: All significant effects were related to the delayed gains. The relation of aptitude measures to the delayed effects of instructional treatment is consistent with the findings of previous research (Erlam, 2005; Mackey et al., 2002; Trofimovich, Ammar, & Gatbonton, 2007). It is not clear why this is so, but researchers have made some reasonable speculations, which boil down to two themes: The immediacy of the first posttests leveled out the role of aptitude, and aptitude “contributed to the capacity to build on initial exposure during training, and continue to learn during the posttests” (Robinson, 2002, p. 204).

Also, the significant results were mainly reflected through measures of explicit knowledge (i.e., the GJT tests). Although working memory was also related to the EI scores in the explicit condition, the gains under this condition may have resulted more from item-learning than system-learning, as evidenced by a lack of relationship between this type of feedback and language analytic ability.
Consequently, the tapped knowledge was likely more lexical than syntactic; and lexical knowledge is largely explicit (Dörnyei, 2009). The finding is not surprising given that the measures of both cognitive variables involved conscious linguistic processing. According to Ranta’s (2005) review, most significant correlations between aptitude or aptitude components and instructional treatments were found for measures of explicit knowledge, and as was evident from her own study, a measure of language analytic ability was not related to oral fluency, an important dimension of implicit knowledge. Révész (2012) also found that working memory related to learners’ GJT scores and written production but not oral production. Thus, there seems to be a need to include in aptitude tests a measure of the capacity to acquire implicit knowledge.

CONCLUSION

This study constitutes the first empirical attempt to investigate the relationship between feedback type and aptitude components. It was found that language analytic ability impacted the learning resulting from implicit feedback and working memory influenced the effects of explicit feedback; the significant relationships pertained to delayed posttest scores and explicit knowledge. The findings showed the need for an integrated, situated approach to the role of language aptitude in SLA. They underscore the importance of exploring aptitude–treatment interaction (Snow, 1991) and provide further justification for the necessity of taking a componential rather than a monolithic approach to aptitude research (Dörnyei & Skehan, 2003; Robinson, 1997, 2002, 2005). Clearly, the idiosyncratic characteristics of each learning condition (which are molded by feedback type and perhaps also the nature of the linguistic target) set different processing demands on learners’ cognitive abilities, hence the resultant contingent relationships between the two aptitude components and the two feedback types.

The study was conducted in a highly controlled laboratory setting, where the interference of potential distracting variables was minimized. This is critical to aptitude research because an underlying premise for the role of aptitude is that, all other things being equal, learners with higher aptitude learn more and faster. Without controlling the noise from other factors, the effects of aptitude could not have been clearly observed and precisely interpreted.
Also, a series of steps was taken to ensure methodological rigor: Reliable measures were used, the treatment tasks were carefully developed, treatment effects were measured by using tests of both explicit and implicit knowledge, and robust statistical procedures were employed. Using pretests and posttests made it possible to examine the impact of aptitude on the gains resulting from treatment. This matters because aptitude concerns the ability to learn, not an ultimate outcome measured without controlling for learners’ prior knowledge (as happens in the absence of a pretest). Language aptitude relates to a transition theory (development between point A and point B), not the amount of stored knowledge at fixed points. Thus, the appropriateness of investigating the contribution of aptitude to learners’ stored knowledge or ultimate outcome at fixed time points in some previous research (such as correlating aptitude scores with pretest and posttest scores or proficiency scores rather than gain scores) is questionable.

Further research, including replications, is warranted to confirm or dispute the findings of the current study. Replications are particularly valuable in aptitude research (as well as other lines of SLA research) given the heterogeneity of instructional contexts and inconsistency in construct operationalization. For example, even in the few previous studies examining the mediating role of aptitude in the effectiveness of feedback, working memory and language analytic ability were measured in different ways. Working memory has been measured by means of listening span tests (Mackey et al., 2002; Mackey & Sachs, 2012), reading span tests (Révész, 2012; Sagarra, 2007), and a number–letter recall test (Trofimovich, Ammar, & Gatbonton, 2007). Measures of analytic ability included a Dutch version of the adapted Words in Sentences subtest of the MLAT (DeKeyser, 1993), a French version of the subtest (Trofimovich, Ammar, & Gatbonton, 2007), and a language analysis test developed in an artificial language (Sheen, 2007). Such methodological disparities make results difficult to compare across studies and firm conclusions hard to reach. It is also necessary to carry out more studies that include other aptitude components, such as phonemic coding ability, to explore the unique and combined effects of multiple factors on L2 achievements under different learning and instructional conditions. Furthermore, it is worthwhile to examine the role of the linguistic target in mediating the relationship between aptitude components and learning conditions. The nature of the target structure was invoked to account for the discrepancies between the findings of this study and those of previous research, but to date there has been no empirical research that included it as an independent variable.
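The analytic logic discussed in this section (relating aptitude to pretest-to-posttest gains rather than to scores at fixed time points, with the feedback groups dummy coded as described in Note 6) can be illustrated with a short Python sketch. It is not the analysis script used in the study. The function names, the group labels, the use of NumPy least squares, the centering of aptitude, and the inclusion of aptitude-by-group product terms are all assumptions made for illustration; only the variable names DumEx and DumIm and their 0/1 coding come from Note 6.

```python
import numpy as np

# Hypothetical term labels for the illustrative model below.
TERMS = ["Intercept", "DumEx", "DumIm", "aptitude",
         "DumEx:aptitude", "DumIm:aptitude"]


def dummy_code(groups):
    """Code group membership as two 0/1 variables, following Note 6.

    The labels "explicit", "implicit", and "control" are assumed here.
    """
    groups = np.asarray(groups)
    dum_ex = (groups == "explicit").astype(float)  # explicit vs. control
    dum_im = (groups == "implicit").astype(float)  # implicit vs. control
    return dum_ex, dum_im


def fit_gain_model(pretest, posttest, groups, aptitude):
    """Regress gain scores on group dummies, aptitude, and their products.

    This is an assumed model form, shown only to make the idea of an
    aptitude-treatment interaction on gains concrete.
    """
    pretest = np.asarray(pretest, dtype=float)
    posttest = np.asarray(posttest, dtype=float)
    aptitude = np.asarray(aptitude, dtype=float)

    # The outcome is the gain, not the score at a fixed time point.
    gain = posttest - pretest

    dum_ex, dum_im = dummy_code(groups)
    apt = aptitude - aptitude.mean()  # centering is an illustrative choice

    design = np.column_stack([
        np.ones_like(gain), dum_ex, dum_im, apt, dum_ex * apt, dum_im * apt,
    ])
    coefs, *_ = np.linalg.lstsq(design, gain, rcond=None)
    return dict(zip(TERMS, coefs))
```

In such a sketch, a nonzero coefficient on a product term would indicate that the relationship between aptitude and gains differs between that feedback group and the control group. Significance testing is omitted here.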

ACKNOWLEDGMENTS

I would like to express my gratitude to the following individuals for the help and support they provided me in various aspects of the project the article is based on: Rod Ellis, Susan Gass, Xiaoshi Li, Shawn Loewen, Roy Lyster, Jenefer Philp, Leila Ranta, Patti Spinner, Hong Wang, and Paula Winke. My thanks are also due to the instructors of Chinese at Michigan State University (Liren Shi, Taiheng Shi, Chunhong Teng, and Qiongyao Wang) and the University of Michigan (Qinghai Chen, Laura Grande, Wei Liu, Le Tang, and Haiqing Yin) for their assistance with data collection. Also, the article has benefited enormously from the insights of the anonymous reviewers and Heidi Byrnes, editor of the Modern Language Journal. I am solely responsible for any limitations and errors.

NOTES

1 The results on the comparative effects of explicit and implicit feedback were reported in another study (Li, 2014), which investigated the interactions between feedback type and proficiency. It was found that explicit feedback was more effective than implicit feedback for low-level learners, but the two types of feedback were equally effective for more advanced learners; explicit feedback showed an initial advantage, but the effects of implicit feedback were better maintained.

2 Although the participants were from different levels of classes, the influence of proficiency was minimized by assigning learners from all levels to each participant group and ensuring that there were no significant differences among the three groups in their test scores on the HSK test. There were also no significant between-group differences in their pretest scores on classifier use. An ideal scenario would have been one in which all the participants were recruited from the same level of classes, but this was not possible due to logistic constraints.

3 The control group performed a different task and therefore received some placebo treatment. Therefore, essentially they only took the pretests and posttests, as in many feedback studies (e.g., Ellis, Loewen, & Erlam, 2006). However, it must be admitted that the effects of feedback would have been better disentangled if a comparison group had been included that performed the same task as the experimental groups.

4 The sentence stimuli have the following structures:
• It was the woman that ate the apple. (cleft subject: CS)
• It was the damaged car that the mechanic fixed. (cleft object: CO)
• The police arrested the man that punched his dog. (object–subject: OS)
• The story that the man told amused the audience. (subject–object: SO)
These sentences differ in number of propositions and syntactic complexity. CS and CO sentences have one proposition, but OS and SO sentences have two. CS and OS sentences involve canonical assignment of thematic roles (Agent + Theme) and are therefore easier to process than CO and SO sentences.

5 An anonymous reviewer pointed out that an SEM analysis requires a large sample size. Bentler and Chou (1987) stated that 10 subjects per indicator variable was an acceptable ratio. The SEM analysis in this study included 5 indicator variables and was based on data contributed by 78 subjects, that is, 78/5 = 15.6 subjects per indicator. Therefore, the sample size, while not large, was appropriate in this case.

6 The two categorical variables were named DumEx and DumIm, representing the explicit–control contrast and the implicit–control contrast respectively. Zeros and ones were used to code the variables. For the DumEx variable, 1 was assigned to the explicit group, and 0 to the other groups (implicit and control); for the DumIm variable, 1 was assigned to the implicit group, and 0 to the other two groups.

REFERENCES

Alderson, J. C., Clapham, C., & Steel, D. (1997). Metalinguistic knowledge, language aptitude and language proficiency. Language Teaching Research, 1, 93–121.
Ammar, A., & Spada, N. (2006). One size fits all? Recasts, prompts, and L2 learning. Studies in Second Language Acquisition, 28, 543–574.
Baddeley, A. (2007). Working memory, thought, and action. Oxford: Oxford University Press.
Baddeley, A., & Logie, R. (1999). Working memory: The multiple–component model. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 28–61). Cambridge: Cambridge University Press.
Bentler, P. M., & Chou, C. P. (1987). Practical issues in structural modeling. Sociological Methods and Research, 16, 78–117.
Carroll, J. B. (1981). Twenty-five years of research on foreign language aptitude. In K. C. Diller (Ed.), Individual differences and universals in language learning aptitude (pp. 83–118). Rowley, MA: Newbury House.
Carroll, J. B., & Sapon, S. (1959). Modern language aptitude test. New York: The Psychological Corporation/Harcourt Brace Jovanovich.
Carroll, J. B., & Sapon, S. (2002). Manual for the MLAT. Bethesda, MD: Second Language Testing.
Conway, A., Jarrold, C., Kane, M., Miyake, A., & Towse, J. (Eds.). (2007). Variation in working memory. Oxford: Oxford University Press.
Cowan, N. (1999). An embedded-process model of working memory. In A. Miyake & P. Shah (Eds.), Models of working memory (pp. 62–101). Cambridge: Cambridge University Press.
Daneman, M., & Carpenter, P. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450–466.
DeKeyser, R. (1993). The effect of error correction on L2 grammar knowledge and oral proficiency. Modern Language Journal, 77, 501–514.
DeKeyser, R., & Koeth, J. (2011). Cognitive aptitudes for second language learning. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 395–406). New York/London: Routledge.
Dörnyei, Z. (2009). The psychology of second language acquisition. Oxford: Oxford University Press.
Dörnyei, Z., & Skehan, P. (2003). Individual differences in second language learning. In C. Doughty & M. Long (Eds.), Handbook of second language acquisition (pp. 589–630). Malden, MA: Blackwell.
Doughty, C., & Varela, E. (1998). Communicative focus on form. In C. Doughty & J. Williams (Eds.), Focus on form in classroom second language acquisition (pp. 114–138). Cambridge: Cambridge University Press.
Egi, T. (2007). Recasts, learners’ interpretations, and L2 development. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A collection of empirical studies (pp. 249–267). Oxford: Oxford University Press.
Ehrman, M., & Oxford, R. (1995). Cognition plus: Correlates of language learning success. Modern Language Journal, 79, 67–89.
Ellis, R. (2007). The differential effects of corrective feedback on two grammatical structures. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 339–360). Oxford: Oxford University Press.
Ellis, R. (2009). Implicit and explicit learning, knowledge and instruction. In R. Ellis, S. Loewen, C. Elder, R. Erlam, J. Philp, & H. Reinders (Eds.), Implicit and explicit knowledge in second language learning, testing and teaching (pp. 3–25). Bristol, UK: Multilingual Matters.
Ellis, R. (2010). Cognitive, social, and psychological dimensions of corrective feedback. In R. Batstone (Ed.), Sociocognitive perspectives on language use and language learning (pp. 151–165). Oxford: Oxford University Press.
Ellis, R., Loewen, S., Elder, C., Erlam, R., Philp, J., & Reinders, H. (Eds.). (2009). Implicit and explicit knowledge in second language learning, testing and teaching. Bristol, UK: Multilingual Matters.
Ellis, R., Loewen, S., & Erlam, R. (2006). Implicit and explicit corrective feedback and the acquisition of L2 grammar. Studies in Second Language Acquisition, 28, 339–368.
Engle, R. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11, 19–23.
Erbaugh, M. (1986). Taking stock: The development of Chinese noun classifiers historically and in young children. In C. Craig (Ed.), Noun classes and categorization (pp. 399–436). Philadelphia/Amsterdam: John Benjamins.
Erlam, R. (2005). Language aptitude and its relationship to instructional effectiveness in second language acquisition. Language Teaching Research, 9, 147–171.
French, L. (2006). Phonological working memory and second language acquisition: Developmental study of Francophone children learning English in Quebec. New York: Edwin Mellen Press.
Field, A. (2005). Discovering statistics using SPSS. Thousand Oaks, CA: SAGE.
Gardner, R. C., & Lambert, W. E. (1965). Language aptitude, intelligence, and second-language achievement. Journal of Educational Psychology, 56, 191–199.
Gass, S., & Selinker, L. (2008). Second language acquisition: An introductory course. New York/London: Routledge.
Han, Z. (2002). A study of the impact of recasts on tense consistency in L2 output. TESOL Quarterly, 36, 543–572.
Harley, B., & Hart, D. (1997). Language aptitude and second language proficiency in classroom learners of different starting ages. Studies in Second Language Acquisition, 19, 379–400.
Horwitz, E. (1980). The relationship of conceptual level to the development of communicative competence. (Unpublished doctoral dissertation). The University of Illinois at Urbana–Champaign, Urbana–Champaign, Illinois.
Horwitz, E. (1987). Linguistic and communicative competence: Reassessing foreign language aptitude. In B. VanPatten, T. Dvorak, & J. Lee (Eds.), Foreign language learning (pp. 146–157). Cambridge, MA: Newbury House.
Hulstijn, J. (2005). Theoretical and empirical issues in the study of implicit and explicit second-language learning: Introduction. Studies in Second Language Acquisition, 27, 129–140.
Hummel, K. (2009). Aptitude, phonological memory, and second language proficiency in nonnovice adult learners. Applied Psycholinguistics, 30, 225–249.
Ishida, M. (2004). Effects of recasts on the acquisition of the aspectual form –te i-(ru) by learners of Japanese as a foreign language. Language Learning, 54, 311–394.
Iwashita, N. (2003). Positive and negative input in task-based interaction: Differential effects on L2 development. Studies in Second Language Acquisition, 25, 1–36.
Krashen, S. (1981). Second language acquisition and second language learning. Oxford: Pergamon.
Leeman, J. (2003). Recasts and second language development: Beyond negative evidence. Studies in Second Language Acquisition, 25, 37–63.
Leeser, M. (2007). Learner-based factors in L2 reading comprehension and processing grammatical form: Topic familiarity and working memory. Language Learning, 57, 229–270.
Li, S. (2009). The differential effects of implicit and explicit feedback on L2 learners of different proficiency levels. Applied Language Learning, 19, 53–79.
Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis. Language Learning, 60, 309–365.

Li, S. (2014). The interface between feedback type, L2 proficiency, and the nature of the linguistic target. Language Teaching Research.
Li, C., & Thompson, S. (1981). Mandarin Chinese: A functional reference grammar. Los Angeles: University of California Press.
Liu, Y., Yao, T., Bi, N., Ge, L., & Shi, Y. (2009). Integrated Chinese (3rd ed.). Boston: Cheng & Tsui Company.
Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In W. C. Ritchie & T. K. Bhatia (Eds.), Handbook of language acquisition. Vol. 2: Second language acquisition (pp. 413–468). New York: Academic Press.
Long, M., Inagaki, S., & Ortega, L. (1998). The role of negative feedback in SLA: Models and recasts in Japanese and Spanish. Modern Language Journal, 82, 357–371.
Lyster, R. (2004). Differential effects of prompts and recasts in form-focused instruction. Studies in Second Language Acquisition, 26, 399–432.
Lyster, R., & Izquierdo, J. (2009). Prompts versus recasts in dyadic interaction. Language Learning, 59, 453–498.
Lyster, R., & Mori, H. (2006). Interactional feedback and instructional counterbalance. Studies in Second Language Acquisition, 28, 269–300.
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake. Studies in Second Language Acquisition, 19, 37–66.
Mackey, A., & Goo, J. (2007). Interaction research in SLA: A meta-analysis and research synthesis. In A. Mackey (Ed.), Conversational interaction in SLA: A collection of empirical studies (pp. 408–452). Oxford: Oxford University Press.
Mackey, A., & Philp, J. (1998). Conversational interaction and second language development: Recasts, responses, and red herrings? Modern Language Journal, 82, 338–356.
Mackey, A., Philp, J., Egi, T., Fujii, A., & Tatsumi, T. (2002). Individual differences in working memory, noticing of interactional feedback, and L2 development. In P. Robinson (Ed.), Individual differences and instructed language learning (pp. 181–209). Philadelphia/Amsterdam: John Benjamins.
Mackey, A., & Sachs, R. (2012). Older learners in SLA research: A first look at working memory, feedback, and L2 development. Language Learning, 62, 704–740.
McDonough, K. (2007). Interactional feedback and the emergence of simple past activity verbs in L2 English. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 323–338). Oxford: Oxford University Press.
Miyake, A., & Friedman, N. (1998). Individual differences in second language proficiency: Working memory as language aptitude. In A. Healy & L. Bourne (Eds.), Foreign language learning: Psycholinguistic studies on training and retention (pp. 339–364). Mahwah, NJ: Lawrence Erlbaum.

Nie, D. (2006). Test–retest reliability of HSK (Elementary–Intermediate Level). China Examinations, 5, 43–47.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417–528.
Polio, C. (1994). Non-native speakers’ use of nominal classifiers in Mandarin Chinese. JCLTA, 29, 51–66.
Ranta, L. (2002). The role of language analytic ability in the communicative classroom. In P. Robinson (Ed.), Individual differences and instructed language learning (pp. 159–180). Philadelphia/Amsterdam: John Benjamins.
Ranta, L. (2005). Language analytic ability and oral production in a second language: Is there a connection? In A. Housen & M. Pierrard (Eds.), Investigations in instructed second language acquisition (pp. 99–130). Berlin: Mouton de Gruyter.
Révész, A. (2012). Working memory and the observed effectiveness of recasts on different L2 outcome measures. Language Learning, 62, 93–132.
Robinson, P. (1997). Individual differences and fundamental similarity of implicit and explicit adult second language learning. Language Learning, 47, 45–99.
Robinson, P. (2002). Effects of individual differences in intelligence, aptitude and working memory on adult incidental SLA: A replication and extension of Reber, Walkenfield and Hernstadt (1991). In P. Robinson (Ed.), Individual differences and instructed language learning (pp. 211–266). Philadelphia/Amsterdam: John Benjamins.
Robinson, P. (2005). Aptitude and second language acquisition. Annual Review of Applied Linguistics, 25, 46–73.
Sagarra, N. (2007). From CALL to face-to-face interaction: The effect of computer-delivered recasts and working memory on L2 development. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 229–248). Oxford: Oxford University Press.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11, 129–158.
Sheen, Y. (2007). The effects of corrective feedback, language aptitude, and learner attitudes on the acquisition of English articles. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 301–322). Oxford: Oxford University Press.
Sheen, Y. (2010). Differential effects of oral and written corrective feedback in the ESL classroom. Studies in Second Language Acquisition, 32, 203–234.
Sheen, Y. (2011). Corrective feedback, individual differences, and second language learning. Berlin: Springer.
Skehan, P. (1982). Memory and motivation in language aptitude testing. (Unpublished doctoral dissertation). University of London, London, UK.
Skehan, P. (2012). Language aptitude. In S. Gass & A. Mackey (Eds.), The Routledge handbook of second language acquisition (pp. 381–395). New York/London: Routledge.
Snow, R. (1991). Aptitude–treatment interaction as a framework for research on individual differences in psychotherapy. Journal of Consulting and Clinical Psychology, 59, 205–216.
Spada, N. (2011). Beyond form-focused instruction: Reflections on past, present and future research. Language Teaching, 44, 225–236.
Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and type of language feature: A meta-analysis. Language Learning, 60, 263–308.
Sparks, R. L., Humbach, N., Patton, J. O. N., & Ganschow, L. (2011). Subcomponents of second-language aptitude and second-language proficiency. Modern Language Journal, 95, 253–273.
Trofimovich, P., Ammar, A., & Gatbonton, E. (2007). How effective are recasts? The role of attention, memory, and analytical ability. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 171–195). Oxford: Oxford University Press.
Waters, G., & Caplan, D. (1996). The measurement of verbal working memory capacity and its relation to reading comprehension. Quarterly Journal of Experimental Psychology, 49A, 51–79.
Wu, Y., & Bodomo, A. (2009). Classifiers ≠ determiners. Linguistic Inquiry, 40, 487–503.
Wu, S., Yu, Y., Zhang, Y., & Tian, W. (2007). Chinese link. Upper Saddle River, NJ: Pearson Education.
Yang, Y., & Lyster, R. (2010). Effects of form-focused practice and feedback on Chinese EFL learners’ acquisition of regular and irregular past tense forms. Studies in Second Language Acquisition, 32, 235–263.


APPENDIX

Measures Used in the Study

Measure                      Construct                   Items   Points    Reliability
HSK                          Proficiency                 60      60        .85
Treatment effect
• Grammaticality judgment    Explicit knowledge          15      15        .74
• Elicited imitation         Implicit knowledge          15      15        .68
Part IV of MLAT              Language analytic ability   45      45        .81
Listening span test          Working memory              72
• Reaction time                                          72      Average   .98
• Plausibility judgment                                  72      72        .80
• Recall                                                 72      72        .89

Note. HSK: Chinese proficiency test; listening span test: The working memory score used in the analyses for each participant is the average of the z scores relating to the three components of the test; average: The reaction time for each learner is the average of the reaction times relating to the items for which the plausibility judgments were correct; reliability: Cronbach’s α is used as the reliability coefficient, and reliability estimates relating to the GJT and EI tests are based on the learners’ pretest scores.
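The scoring rules stated in the Note can be sketched in Python as follows. This is a hypothetical illustration, not the author's scoring script: the function names and the use of the sample standard deviation (ddof=1) are assumptions. The Note also does not say whether reaction-time z scores were sign-reversed before averaging, so the sketch leaves them unchanged.

```python
import numpy as np


def mean_correct_rt(reaction_times, judgment_correct):
    """Average one learner's reaction times over the items whose
    plausibility judgments were correct, as described in the Note."""
    rts = np.asarray(reaction_times, dtype=float)
    correct = np.asarray(judgment_correct, dtype=bool)
    return rts[correct].mean()


def z_scores(values):
    """Standardize scores across participants (sample SD is an assumption)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)


def working_memory_composite(reaction_time, plausibility, recall):
    """Average the z scores of the three listening span components.

    Each argument holds one value per participant. No sign reversal is
    applied to reaction time because the Note does not specify one.
    """
    return (z_scores(reaction_time)
            + z_scores(plausibility)
            + z_scores(recall)) / 3.0
```

For one learner, `mean_correct_rt` would first be applied to the item-level reaction times; the resulting per-learner averages, together with the plausibility and recall scores, would then be passed to `working_memory_composite` across all participants.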