APPLIED COGNITIVE PSYCHOLOGY Appl. Cognit. Psychol. 18: 711–726 (2004) Published online 15 June 2004 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/acp.1025

How You Ask Is What You Get: On the Influence of Question Form on Accuracy and Confidence

IZASKUN IBABE1* and SIEGFRIED LUDWIG SPORER2

1 University of the Basque Country, Spain
2 University of Giessen, Germany

SUMMARY
Memory accuracy and confidence for details of an event were investigated as a function of three question forms (open-ended, true-false (T-F) and four-alternative-forced-choice (4-AFC) questions), type of content (action vs. descriptive details) and centrality of information (central vs. peripheral). Sixty-two undergraduates were shown a film of a robbery, then answered 32 questions about the film using a questionnaire in one of the three question forms. Open-ended (74.1%) and T-F questions (73.0%) led to significantly more correct answers than 4-AFC questions (66.5%). Accuracy was higher for central than peripheral information, and higher for action details than for descriptive details. Central action details were remembered better than peripheral action details whereas centrality made no difference for descriptive details. Moreover, type of content and centrality of information had different effects depending on question form. Confidence in correct answers with open-ended questions was lower than with T-F and 4-AFC questions. Across question forms, witnesses were much more confident with correct than incorrect answers, except for central action details. The discussion focuses on differences between recall and recognition and their implication for confidence with different question forms. Copyright © 2004 John Wiley & Sons, Ltd.

Since the beginning of the twentieth century, eyewitness researchers have been investigating the influence of different types of question forms on the accuracy of reports (e.g. Binet, 1900; Marston, 1924; Stern, 1902; for a historical review, see Sporer, 1982). In some classical studies, the narrative form has led to less complete but more accurate reports than the interrogative form (Lipton, 1977; Marquis, Marshall, & Oskamp, 1972; Marston, 1924; for a summary, see Deffenbacher, 1991; Yarmey, 1979). More recently, Yarmey and Yarmey (1997) tested eyewitnesses’ interrogative and narrative recall for person and clothing characteristics after a short live encounter with a target woman in a field setting. The authors found interrogative responses to be more complete than narrative responses. However, despite containing more correct details, interrogative responses also contained more errors than narrative responses, resulting in a small net advantage (8% difference in accuracy; effect size r = 0.17) for narrative responses (cf. Deffenbacher, 1991). Confidence was higher with narrative than with interrogative reports. The authors suggested

*Correspondence to: Dr I. Ibabe, Faculty of Psychology, University of the Basque Country, Avda. Tolosa, 70, 20009-Donostia, Spain. E-mail: [email protected]; [email protected] Contract/grant sponsor: Basque Country Government; contract/grant number: B.2 4/99053. Contract/grant sponsor: German Science Foundation; contract/grant number: DFG: Sp 262/3-2.


that error rates could probably be lowered if witnesses were not pushed to answer cued questions when they are unsure of their answers. Among interrogative question forms, more specific distinctions between different question forms can be made, viz. true-false (T-F) questions, and alternative-forced choice (AFC) tests with different numbers of response alternatives. In T-F tests, respondents tend to answer affirmatively (Richardson, Dohrenwend, & Klein, 1965), particularly children (Peterson & Biggs, 1997). That is, when a person does not know whether a sentence is true or false, he or she is inclined to answer that the statement is true (the well-known acquiescence bias: see Cronbach, 1990). In contrast to recognition tests, when open-ended questions are asked in a nonleading manner, they involve a smaller risk of misleading respondents because the questions provide them with a lower amount of new information than T-F and multiple-choice questions. The information contained in the question serves as a recall cue for retrieving the correct answer. For example, asking ‘What type of buildings were there in the area?’ does not suggest any specific type of buildings per se—although it does suggest the presence of buildings. In an AFC test asking whether the buildings were of various types (factories, family homes, etc.) introduces information which may simply be chosen on the basis of familiarity. In the latter case, the choice may be based on a vague sense of familiarity or on a subjective selection of the most likely of several alternatives. The number of alternatives in AFC recognition tests also has an influence on the percentage of correct responses, since the probability of producing correct answers by guessing decreases as the number of alternatives increases. In open-ended questions, the number of possible answers is unlimited compared to recognition tests, thus reducing the chance of correctly guessing an answer to a minimum. 
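The guessing argument above can be made concrete with a small sketch. It assumes a respondent who guesses uniformly at random among equally plausible alternatives, which is an idealization and not something the study claims about real witnesses; the function name `chance_accuracy` is hypothetical.

```python
from fractions import Fraction


def chance_accuracy(n_alternatives: int) -> Fraction:
    """Probability of a correct answer under uniform random guessing.

    Assumption: exactly one of the n alternatives is correct and all
    alternatives are equally likely to be picked.
    """
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return Fraction(1, n_alternatives)


# Chance level falls as the number of response alternatives grows.
for label, k in [("T-F", 2), ("4-AFC", 4), ("8-AFC", 8)]:
    print(f"{label}: {float(chance_accuracy(k)):.3f}")
```

With open-ended questions the set of possible answers is not fixed, so no comparable chance level can be written down; under this idealization it would approach zero.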
From this reasoning, one would predict superior performance on AFC tests compared to open-ended questions. On the other hand, choosing an answer simply on the basis of familiarity may lower the scrutiny of the respondent. Thus, an answer in an AFC test may be inappropriately accepted as correct, leading to more errors than a carefully thought out answer to an open question. These rival hypotheses will be further examined in this paper. The very form in which questions are asked also has implications for the confidence with which answers are given. The respondents’ confidence may be increased through the presentation of alternatives since they are sure the correct answer is among them, and thus make a smaller cognitive effort when answering (Robinson, Johnson, & Herndon, 1997). In contrast, open-ended questions require respondents to retrieve the to-be-recalled information from memory and therefore elicit a lower level of confidence in their responses, even if the answer is correct (Robinson et al., 1997). The question form may also be one of the variables that affects the confidence of an eyewitness without affecting his or her accuracy, but it may also affect his or her accuracy without a change in confidence (see Luus & Wells, 1994; Sporer, Penrod, Read, & Cutler, 1995). While accuracy may be higher in recall than in recognition tests, confidence has been observed to be higher for recognition than recall tests (e.g. Robinson et al., 1997). In the following experiment we will address the question of how the use of different question forms (open-ended, T-F, and four-alternative-forced-choice questions) affects accuracy and confidence in the answers given (correct or incorrect). With regard to accuracy, we expect a higher rate of correct answers to open-ended than to T-F or 4-AFC questions (Cassel, Roebers, & Bjorkland, 1996; Lipton, 1977; Schneider & Pressley, 1997).
Further, from the base rates obtainable simply by guessing we should assume that answers to T-F questions are more likely to be correct (i.e. above 50% by mere guessing) than answers to AFC questions (i.e. above 25% for a 4-AFC test), although past evidence on this issue is mixed (Marquis et al., 1972). With respect to confidence, we expect the level of confidence to be lower for open-ended questions than for forced-choice questions, due to the fact that respondents lack retrieval cues for their answers and thus need to make a greater effort in retrieving the specific information (Robinson et al., 1997). It has to be acknowledged that comparisons of accuracy rates across question forms are influenced not only by the question forms themselves, but also by the specific content of the questions asked and the types of alternatives provided, that is, the item difficulties of the individual questions asked. These factors influence the chances of being correct by guessing. Further, item difficulty may also depend on how central or peripheral the information is within the context of a given event, and whether the question refers to action details or to descriptive details.

CENTRAL VS. PERIPHERAL INFORMATION
We assume that the information related to an event is not stored uniformly. It has been found that central information is remembered better than peripheral information (Burke, Heuer, & Reisberg, 1992; Christianson & Loftus, 1987, 1991; Heuer & Reisberg, 1990). Although this effect has been primarily observed with emotional events where it is assumed that the attentional focus is narrowed and central information is being captured first (e.g. Yuille & Dayen, 1998), other researchers have also addressed this distinction without reference to the emotional content of the material (Heath & Erickson, 1998; Migueles & Garcia-Bajos, 1999; Wright & Stroud, 1998). When confronted with eyewitness testimony, decision makers are usually concerned with central actions that are of legal relevance, that is, facts fulfilling the abstract definitions of a legal code (Who did what to whom? Who poured the poison into the cup?).
Of course, at times peripheral details may also be of legal relevance (e.g. How many cups were on the table, and which of several cups contained the poison?), but in most cases investigators are only concerned with obtaining central information about an event rather than peripheral information. The distinction between central and peripheral information is also important as fact finders are often persuaded by witnesses who display memory for minor details. For example, Wells and Leippe (1981) observed a negative relationship (r = -0.41) between identification accuracy and the number of correctly recalled details,1 but mock jurors were more likely to believe witnesses who answered questions about peripheral details correctly. The most likely reason why memories for central and peripheral information may differ is that attention is more likely directed at the former (cf. the weapon focus effect; Loftus, Loftus, & Messo, 1987; Steblay, 1992).

1 However, it should be noted that in these studies correlations are computed between identification accuracy (a visual recognition task with faces as stimuli) and verbal recall (of objects or person details) which generally show a weak or no relationship with each other (see Sporer, 1996, for review).

ACTION VS. DESCRIPTIVE DETAILS
We further distinguish between different types of content, in particular action vs. descriptive details. The focus on action (or behaviour), as opposed to the circumstances in which the behaviour occurs, is central to Gestalt psychological approaches to social perception (for a classic review, see Sherif & Sherif, 1969). A similar proposition has been put forward by Heider (1958, p. 54): ‘It seems that behavior in particular has such salient properties it tends to engulf the total field rather than be confined to its proper position as a local stimulus whose interpretation requires the additional data of a surrounding field – the situation in social perception.’ Jones and Nisbett (1971), in their famous paper on actor-observer differences in attribution, have further stressed the point that ‘behavior is figural against the ground of the situation’ (p. 87). Just as ‘behavior engulfs the field’ in Heider’s (1958) attributional approach, witnesses are assumed to focus their attention on actions as the figure against the background of other descriptive details. Besides the well-known Gestalt principles (for summaries, see Goldstein, 1996; Rock & Palmer, 1990; Sherif & Sherif, 1969) that guide perceptual organization of objects (i.e. static stimuli) there are also movement heuristics humans apply to the perception of motion (Ramachandran & Anstis, 1986). Although the deployment of attention may be guided by many other factors (e.g. the instructions given to participant-witnesses in an experimental study using a slide show or a film as stimulus material) we argue that action details are more likely to be the natural focus of attention than descriptive details in an ongoing event. There is some empirical evidence that the type of content of an event (action details vs. descriptive details) influences eyewitness reports (Burke et al., 1992; Clifford & Scott, 1978; Tichner & Poulton, 1975).
One possible explanation for the difference is that with actions, an individual may recall information congruent with a story line on the basis of previously learned schemata about this type of event, whereas he or she is less likely to possess schemata when it comes to descriptive details (Heuer & Reisberg, 1990). It is also possible that an observer focuses his or her attention on actions first, and only subsequently on descriptive details as in Heider’s (1958) famous dictum ‘the behavior engulfs the field’. A general problem with these types of studies is that it is very difficult to define central and peripheral details as well as action and descriptive details on an a-priori basis. Conceptualizations of centrality tend to be circular: Central events are found to be more likely to be remembered, and the centrality of information is offered as an explanation for better memory. Heuer and Reisberg (1990) have used a conceptual distinction to differentiate central from peripheral information whereas Christianson and Loftus (1991) have argued for a perceptive/spatial distinction (see also Migueles & Garcia-Bajos, 1999). Unfortunately, the categorization Heuer and Reisberg (1990) propose may only be useful for slides (see also Heath & Erickson, 1998), and not for a film or a simulated event. However, presenting ‘actions’ as slides appears problematic because actions involve movements which cannot be perceived naturally with this medium of presentation. This requires participants to abstract actions from slides (which seems hardly possible without interpolating action sequences between slides from scripts or event schemas). Besides lacking ecological validity, this raises the question whether or not these authors’ conceptual distinctions are readily applicable to filmed or live events. Therefore, in our study we used a filmed event to avoid this problem.

GOALS OF THE PRESENT STUDY
We have attempted to avoid the ubiquitous problem of circularity by using precise operational definitions of action and descriptive details that either pertain or do not pertain to a protagonist’s actions, thus embracing both the conceptual and the spatial concept (I. Ibabe, unpublished dissertation, 1998). We defined central action details as those behaviours that relate to central characters of an event and that are contemporaneous to the critical event. On the other hand, peripheral action details include behaviours of noncentral characters or of central characters whose actions do not take place during the critical event. Central descriptive details are defined as physical characteristics of scenes, persons and objects related to the critical event. In contrast, peripheral descriptive details refer to descriptive information unrelated to the event itself. We expected accuracy to be higher for action than for descriptive details, and higher for central than for peripheral information (Christianson & Loftus, 1991; Clifford & Scott, 1978; Tichner & Poulton, 1975). Besides these main effects, we also expected an interaction between type of content and centrality of information as found by Ibabe (unpublished dissertation, 1998, Exp. 1 and 3): accuracy was expected to be higher for central than for peripheral actions, whereas for descriptive details no such difference might be observed. In summary, this study investigates the effects of type of content (action vs. descriptive details) and centrality of information (central vs. peripheral) on participants’ accuracy and confidence with different question forms (open-ended, T-F, 4-AFC). Further analyses will look at the confidence for correct and incorrect answers as a function of question form, type of content and centrality of information.

METHOD
Participants
Participants in this experiment were 62 first year students of psychology (10 males, 52 females) at the University of the Basque Country, who took part in partial fulfilment of a course requirement. Their mean age was 19 years.
Material
Film of a staged event
For the presentation phase of this experiment a staged event (an armed theft of a car) was filmed by professional cameramen and played by non-professional actors, its total length being 62 s. First, participants see a general panorama of a suburb, then a car stops. While its owner is consulting a map, he is threatened by a youth with a knife and forced to leave the car. When the robber drives away with the car, the owner runs after him a few metres. Shortly thereafter, the owner sees a policeman and asks him for help. The film was projected onto a 2 × 2 m white screen, using a Sony video projector (model Sony VPH1042 QM).2

2 The emotionality of the content of the film is considered low as observed in a previous study (I. Ibabe, unpublished dissertation, 1998, Exp. 5). Participants in that study (N = 120) had rated how distressed and how unpleasant (on 7-point Likert scales) they had felt about the event. Both means were below the mid-point of the scale (M = 2.86, SD = 1.46, and M = 3.34, SD = 1.42, respectively).

Questionnaires
In order to assess the memory for the event, three types of questions were used: open-ended questions (‘What color was the car?’), 4-AFC questions (‘What color was the car?: black/gray/red/blue?’), and T-F questions (‘The color of the car was blue: T/F’). Since the experiment was conducted in Spain, the questions were asked in Spanish. The content of the questions was the same for all three question forms. Every test had 32 questions, of which 16 concerned action details and 16 were about descriptive details of the event (e.g. visual, spatial and time details, as well as person and object descriptions). Eight of each type concerned central and the other eight peripheral aspects. To be exact, we had eight questions of each of the following: central action, peripheral action, central descriptive and peripheral descriptive details. In the T-F test, ‘True’ answers were correct for four questions but incorrect for the other four questions.

Procedure
Participants were randomly allocated to one of the three Question Form conditions (open-ended questions, 4-AFC or T-F test). Each group participated in a separate session, and every session lasted approximately 30 min. First, participants were told to pay special attention to what they were to see on the screen. Then, they were shown the film of the car theft. Next, they were asked to answer a series of questions about the event observed, leaving no questions unanswered. For every question they had to rate the confidence in the answer on a 5-point Likert scale (‘0 = not at all sure’ to ‘4 = totally sure’). For the open-ended questions, participants were instructed to try to guess the answer if they did not know it. There was no time limit to answer the questions.
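The questionnaire layout described above can be laid out as a small data-structure sketch. It assumes that the four 'True' and four 'False' keys of the T-F test are balanced within each of the four content cells, which is one reading of the text and is not stated explicitly; all names (`build_items`, `tf_key`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import product

CONTENT = ("action", "descriptive")
CENTRALITY = ("central", "peripheral")
ITEMS_PER_CELL = 8  # eight questions per content x centrality cell


def build_items():
    """Return one record per question in the 2 x 2 within-subjects design."""
    items = []
    for content, centrality in product(CONTENT, CENTRALITY):
        for i in range(ITEMS_PER_CELL):
            items.append({
                "content": content,
                "centrality": centrality,
                # Assumption: within each cell, four items are keyed True
                # and four are keyed False in the T-F version.
                "tf_key": i < ITEMS_PER_CELL // 2,
            })
    return items


items = build_items()
cell_counts = Counter((it["content"], it["centrality"]) for it in items)
print(len(items), dict(cell_counts))
```

Question form is the between-subjects factor, so the same 32 items would be rendered in one of three formats depending on the group a participant was assigned to.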

RESULTS
Accuracy
We computed a 3 × 2 × 2 mixed-model ANOVA with question form (open-ended questions, T-F, 4-AFC) as a between-subjects factor, and type of content (action details vs. descriptive details) and centrality of information (central vs. peripheral) as repeated measures factors. The dependent variable was the number of correct answers, reported here as per cent correct. The means of the percentages of correct answers in all conditions are presented in Table 1. There were three significant main effects: for question form, F(2, 59) = 4.94, p < 0.01, effect size partial η² = 0.14; type of content, F(1, 59) = 14.33, p < 0.001, effect size r = 0.44; and centrality of information, F(1, 59) = 14.08, p < 0.001, r = 0.44. (The effect sizes partial η² were calculated according to Tabachnik & Fidell, 1996, for unequal-n designs; the effect sizes r were computed according to Mullen, 1989, from F-values with one degree of freedom in the numerator. Positive r values indicate an effect in line with our hypotheses, negative r values contrary to expectations.) Table 1 also displays the marginal means for the main effects of these factors. Participants received higher scores with the open-ended questions (M = 74.1%) and the T-F test (M = 73.0%) than with the 4-AFC test (M = 66.5%), according to Scheffé post hoc analyses (p < 0.05). Moreover, the rate of performance was higher for action (M = 74.4%) than descriptive details (M = 67.6%). Central information (M = 74.5%) was recalled better than peripheral information (M = 67.5%).

Table 1. Means of the percentages of correct answers as a function of question form, type of content and centrality of information (n = 62)

                        Question form
Content                 Open-ended (n = 20)   T-F (n = 20)   4-AFC (n = 22)   Means
Action details
  Central               85.6                  84.4           71.0             80.0
  Peripheral            71.9                  62.5           71.6             68.8
  Means                 78.8                  73.5           71.4             74.4
Descriptive details
  Central               68.1                  73.8           65.4             69.0
  Peripheral            70.6                  71.3           58.0             66.4
  Means                 69.4                  72.5           61.6             67.6
Means
  Central               76.9                  79.1           68.3             74.5
  Peripheral            71.3                  66.9           64.8             67.5
  Means                 74.1                  73.0           66.5             71.0

Type of Content interacted significantly with Centrality of Information, F(1, 59) = 8.41, p < 0.01, partial η² = 0.13. Simple main effects analyses revealed that central action details were recalled significantly better than peripheral action details, F(1, 59) = 23.72, p < 0.001, r = 0.54, while there was no significant difference between central and peripheral descriptive details, F(1, 59) = 0.95, ns, r = 0.13 (see Table 1). Finally, the three factors interacted significantly, F(2, 59) = 7.61, p < 0.001, partial η² = 0.21 (see Figure 1). Analyses of simple effects revealed that with open-ended questions, F(1, 59) = 37.43, p < 0.01, r = 0.62, and with T-F tests, F(1, 59) = 94.75, p < 0.001, r = 0.79, there were more correct answers for central than for peripheral actions. In contrast, with the 4-AFC test, there were no significant differences between central and peripheral actions, F(1, 59) = 0.08, ns, r = 0.04. With 4-AFC tests, participants’ performance was better for central descriptive details than peripheral descriptive details, F(1, 59) = 10.77, p < 0.01, r = 0.39, but there were no significant differences between central descriptive details and peripheral descriptive details with open-ended questions, F(1, 59) = 1.24, ns, r = 0.14, and with T-F tests, F(1, 59) = 0.10, ns, r = 0.04. To sum up, with the open-ended questions and the T-F test, accuracy was higher than with the 4-AFC test. In general, the rate of performance was higher for central than for peripheral information but this effect was much more pronounced for action than for descriptive details (see Table 1).

Confidence in correct answers
We conducted a 3 × 2 × 2 ANOVA similar to the one for accuracy scores on the confidence in correct answers. The means for confidence in correct answers are displayed in Table 2. Since the question form factor was significant, F(2, 59) = 8.42, p < 0.005, partial η² = 0.22, post hoc Scheffé tests were computed, which indicated a greater level of confidence in answers to T-F and 4-AFC tests than with open-ended questions. Confidence in T-F and 4-AFC tests did not differ from each other. Confidence was significantly higher for action details (M = 3.50) than for descriptive details (M = 3.14), F(1, 59) = 36.53, p < 0.001, r = 0.62.
With regard to centrality of information, confidence was much higher for central (M = 3.51) than for peripheral information (M = 3.13), F(1, 59) = 49.75, p < 0.001, r = 0.68. Question form showed a significant interaction with centrality of information, F(2, 59) = 10.41, p < 0.001, partial η² = 0.26, and Type of Content interacted significantly with Centrality of Information, F(1, 59) = 16.22, p < 0.001, partial η² = 0.22 (see Table 2).

Figure 1. Interaction between Question Form × Type of Content × Centrality of Information for the percentage of correct answers

For peripheral information, the level of confidence was lower with open-ended questions than with T-F tests, F(1, 59) = 30.04, p < 0.001, r = 0.58, and 4-AFC tests, F(1, 59) = 46.20, p < 0.001, r = 0.66. There were no significant differences for central details, all Fs < 2.91, ns. Confidence for central actions was much higher than for peripheral actions, F(1, 59) = 70.84, p < 0.001, r = 0.74. Confidence for central descriptive details was also somewhat higher than for peripheral descriptive details, F(1, 59) = 5.37, p < 0.05, r = 0.29.
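The effect sizes reported in these analyses can be checked against the F-values. The sketch below assumes the common conversion r = sqrt(F / (F + df_error)) for F-tests with one numerator degree of freedom, and the textbook partial η² = (F · df_effect) / (F · df_effect + df_error). The paper cites Mullen (1989) and Tabachnik and Fidell (1996) without printing the formulas, so both are assumptions, and the unequal-n adjustment used by the authors may differ slightly from the second one. Function names are hypothetical.

```python
import math


def r_from_f(f_value: float, df_error: int) -> float:
    """Effect size r for an F-test with one numerator degree of freedom.

    Assumed conversion: r = sqrt(F / (F + df_error)).
    """
    return math.sqrt(f_value / (f_value + df_error))


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Textbook partial eta squared recovered from an F-ratio (assumed form)."""
    ss_ratio = f_value * df_effect
    return ss_ratio / (ss_ratio + df_error)


# Values taken from the accuracy and confidence analyses in the text.
print(round(r_from_f(14.33, 59), 2))                # type of content, accuracy
print(round(r_from_f(70.84, 59), 2))                # central vs. peripheral actions, confidence
print(round(partial_eta_squared(4.94, 2, 59), 2))   # question form, accuracy
```

Under these assumptions the three calls reproduce the reported values of 0.44, 0.74 and 0.14; small discrepancies elsewhere would be expected from rounding of the published F-values.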


Table 2. Means of confidence expressed in correct answers as a function of question form, type of content and centrality of information (n = 62)

                        Question form
Content                 Open-ended (n = 20)   T-F (n = 20)   4-AFC (n = 22)   Means
Action details
  Central               3.74                  3.86           3.76             3.79
  Peripheral            2.75                  3.37           3.49             3.21
  Means                 3.25                  3.62           3.63             3.50
Descriptive details
  Central               3.21                  3.14           3.33             3.23
  Peripheral            2.73                  3.16           3.25             3.05
  Means                 2.97                  3.15           3.29             3.14
Means
  Central               3.47                  3.50           3.55             3.51
  Peripheral            2.74                  3.26           3.37             3.13
  Means                 3.11                  3.38           3.46             3.32
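The marginal means in the right-hand column of Table 2 appear to be consistent with averages weighted by group size (n = 20, 20 and 22), although the paper does not say how they were computed. A minimal sketch of that check, with hypothetical names and the cell means copied from Table 2:

```python
# Group sizes of the three question-form conditions, as given in Table 2.
GROUP_SIZES = {"open_ended": 20, "tf": 20, "afc4": 22}


def weighted_mean(cell_means, sizes=GROUP_SIZES):
    """Average of group means weighted by the number of participants per group."""
    total_n = sum(sizes[group] for group in cell_means)
    return sum(cell_means[group] * sizes[group] for group in cell_means) / total_n


# Confidence in correct answers, peripheral rows of Table 2.
peripheral_action = {"open_ended": 2.75, "tf": 3.37, "afc4": 3.49}
peripheral_overall = {"open_ended": 2.74, "tf": 3.26, "afc4": 3.37}

print(round(weighted_mean(peripheral_action), 2))   # Table 2 reports 3.21
print(round(weighted_mean(peripheral_overall), 2))  # Table 2 reports 3.13
```

An unweighted average of the same three cells would give 3.20 and 3.12, so the weighting matters at the second decimal place when group sizes are unequal.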

Confidence in errors
Since 15 participants had made no errors in one or more of the four categories (central action, peripheral action, central descriptive and peripheral descriptive details), their scores could not be used for a repeated measures ANOVA of confidence in errors. Therefore the analysis is based only on 47 participants.3 Again, a 3 × 2 × 2 ANOVA was computed. The means are shown in Table 3. Unlike our results for correct answers, the main effect of question form was not significant, F(2, 44) = 0.45, ns, partial η² = 0.02. Type of content, F(1, 44) = 32.65, p < 0.001, r = 0.65, and centrality of information, F(1, 44) = 41.51, p < 0.001, r = 0.70, however, did yield highly significant main effects. Erroneous action details were provided with higher confidence ratings (M = 2.95) than descriptive details (M = 2.39), and confidence for false central information (M = 2.97) was higher than for peripheral information (M = 2.37). Question Form interacted significantly with Type of Content, F(2, 44) = 5.84, p < 0.01, partial η² = 0.21, and Type of Content showed a highly significant interaction with Centrality of Information, F(1, 44) = 62.58, p < 0.01, partial η² = 0.59 (see Table 3). The differences between the three question forms were significant for action details, F(2, 44) = 3.66, p < 0.05, partial η² = 0.08, but there were no differences for descriptive details, F(2, 44) = 0.43, ns, partial η² = 0.01. Confidence ratings for errors with central actions were much higher than for peripheral actions, F(1, 44) = 129.37, p < 0.001, r = 0.86, but no difference was found between confidence in errors for central descriptive details and peripheral descriptive details, F(1, 44) = 1.41, r = 0.18.

3 This does not necessarily imply that the analyses presented above were invalid due to ceiling effects. Rather, the repeated-measures analysis in this section is not possible when any one of the four answer types contained no errors, which was somewhat more frequent in the T-F than in the other conditions.

Table 3. Means of confidence expressed in incorrect answers as a function of question form, type of content and centrality of information (n = 47)

                        Question form
Content                 Open-ended (n = 16)   T-F (n = 12)   4-AFC (n = 19)   Means
Action details
  Central               3.53                  3.94           3.47             3.64
  Peripheral            2.10                  2.63           2.06             2.27
  Means                 2.81                  3.28           2.76             2.95
Descriptive details
  Central               2.22                  2.33           2.34             2.30
  Peripheral            2.72                  2.20           2.56             2.48
  Means                 2.47                  2.27           2.45             2.39
Means
  Central               2.87                  3.13           2.90             2.97
  Peripheral            2.41                  2.42           2.31             2.37
  Means                 2.64                  2.78           2.61             2.67

Comparing confidence in correct vs. incorrect answers
In addition, we computed a mixed model ANOVA with the three Question Forms as between-subjects variable and accuracy (correct vs. incorrect), as well as type of content (action details vs. descriptive details) and centrality of information (central vs. peripheral) as repeated measures factors. Basically, this analysis combines the analyses of confidence in correct responses and in errors in a single design. However, any time a participant did not make any errors with any one of the four answer types, no confidence score for this type of error was available, thus reducing the total sample to 47 participants. The main effect for question type was not significant, F(2, 44) = 1.04, p = 0.364, nor its interaction with accuracy, F(2, 44) = 2.54, p = 0.090.
Correct answers were given with much more confidence (M = 3.32) than incorrect answers (M = 2.67), F(1, 44) = 88.71, p < 0.001, r = 0.81. This effect size can also be interpreted as an indicator of the confidence-accuracy relationship. There were also several significant two-way and three-way interactions between accuracy and the other variables, indicating that confidence calibration varied as a function of both question type and information reported. For example, a significant interaction between accuracy, centrality and type of content, F(1, 44) = 22.14, p < 0.001, revealed that confidence in correct central action details was highest (M = 3.79), but almost as high (M = 3.64) for incorrect central action details, and lowest for incorrect peripheral action details (M = 2.27). However, we should be cautious with an interpretation of this three-way interaction due to the small sample sizes involved. One must also keep in mind that the means reported represent averages over different questions (i.e. witnesses got different questions right and wrong). If we look at individual items as the unit of analysis, by averaging over participants, we can also assess the relationship between the difficulty of an item and the confidence expressed in individual answers given to these items. Average accuracy for all 32 items ranged from a mean proportion of correct solutions of 0.23 to 1.0, with a mean of 0.71. Proportion of correct solutions showed a very high linear correlation with confidence, r(30) = 0.70, p < 0.001.

DISCUSSION
This experiment confirms that the way questions are asked, that is, question form, influences the accuracy of eyewitnesses’ responses. In addition to the format in which a
question is asked, researchers are paying increasingly more attention to the type of content; in particular central vs. peripheral information, or action vs. descriptive information (e.g. Migueles & Garcia-Bajos, 1999; Wright & Stroud, 1998). These distinctions have not only played an important role in research on memory for emotional events (e.g. Christianson & Loftus, 1991) and on the misinformation effect (e.g. Heath & Erickson, 1998) but may also be important for any type of eyewitness report. From an applied perspective, investigators and triers of fact (e.g. jurors) are more likely to be concerned with central action details considered essential for effective prosecution and resolution of a case than with peripheral information although the latter sometimes also becomes important in individual cases. We first summarize the major findings and then discuss the results concerning accuracy and confidence separately in more detail. In general, answers to open-ended questions and T-F tests resulted in higher performance than to 4-AFC tests. The results obtained with respect to type of content and centrality of information confirmed our hypothesis that central information will be remembered better than peripheral information and that action details are remembered better than descriptive details. Since the effect sizes for the latter two effects are quite large, these distinctions appear to be quite important. Moreover, these factors have different effects as a function of question form. With open-ended questions, participants received higher accuracy scores with peripheral information than they did with forced choice tests. Question form, as well as the type of content and centrality of information, also had strong effects on the confidence in correct answers. Confidence in correct answers with open-ended questions was lower than with forced-choice tests. 
With open-ended questions, confidence was higher for central than for peripheral information, but there were no comparable differences with T-F and 4-AFC tests. In contrast, confidence for errors was higher with central than with peripheral information with all question forms. In addition, confidence in errors with T-F tests was greater for actions than for descriptive details, but there were no significant differences with open-ended questions and 4-AFC tests. We will now discuss these findings in more detail in light of the recall-recognition distinction and other recent findings.

Accuracy

We must first acknowledge that any study comparing accuracy as a function of question form depends highly on the way individual questions are constructed, as well as on the difficulty level of specific questions. Although we have put much effort into formulating questions that are as parallel across question forms as possible, the external validity of this and other studies is always threatened by this problem. The problem is exacerbated when comparing performance between AFC tests with different numbers of response alternatives, because each response alternative has to be constructed with equal plausibility to serve as a reasonable distractor, and of course, as the number of (plausible) alternatives in AFC tests increases, the likelihood of guessing the correct answer diminishes. However, the T-F format, where a respondent has to affirm or deny a given observation (e.g. the presence of an object), is not necessarily easier than choosing one of four alternatives, one of which can be assumed to be correct. Although the chance expectations of being correct differ between these two tasks (0.50 vs. 0.25), in the latter case it is sufficient to choose the most familiar looking alternative, whereas with the T-F test the task is not to pick the better of two alternatives (e.g. one of two colours) but to decide whether an observation has been made or not.
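To make the chance expectations of 0.50 and 0.25 concrete, the following minimal sketch applies the textbook correction for guessing to the observed accuracy rates. It is an illustration only and not part of the analyses reported here: the function name `correct_for_guessing` is hypothetical, and the correction assumes blind guessing among equally plausible alternatives, an assumption that the familiarity argument above gives reason to doubt.

```python
def correct_for_guessing(p_observed: float, n_alternatives: int) -> float:
    """Rescale a proportion correct so that chance maps to 0 and perfect to 1.

    Assumes (hypothetically) that every wrong answer stems from a blind guess
    among n_alternatives equally plausible options.
    """
    if n_alternatives < 2:
        raise ValueError("a forced-choice item needs at least two alternatives")
    chance = 1.0 / n_alternatives
    return (p_observed - chance) / (1.0 - chance)


# Observed accuracy rates reported in this study.
tf_corrected = correct_for_guessing(0.730, 2)    # T-F, chance = 0.50
afc4_corrected = correct_for_guessing(0.665, 4)  # 4-AFC, chance = 0.25

print(round(tf_corrected, 3), round(afc4_corrected, 3))  # prints: 0.46 0.553
```

Under this simple model the two recognition formats are no longer separated by their raw percentages alone, which illustrates why differences in chance level have to be kept in mind when comparing them. Because respondents rarely guess blindly, these adjusted values should not be read as a reanalysis of the data.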
Thus, the T-F test resembles more a Yes-No recognition task, which is harder than a 2-AFC task where one knows that one of the response alternatives is definitely correct (cf. signal detection theory; e.g. Lockhart & Murdock, 1970).

We started this study with the hypothesis that with open-ended questions the rate of performance would be higher than with T-F and 4-AFC tests (cf. Lipton, 1977). Our expectations were confirmed in so far as participants gave a higher proportion of correct answers with open-ended questions (74.1%) than with 4-AFC tests (66.5%). These results are in line with findings by Hollins and Perfect (1997, Exp. 2), who also observed open-ended responses to be more accurate than 4-AFC responses. However, our results did not agree with the general expectation that performance in open-ended recall tests is higher than in any recognition test, because the difference between open-ended questions (74.1%) and T-F tests (73.0%) was not reliable. One possible explanation might be that by virtue of our instructions, participants were forced to answer all open-ended questions, thus introducing a higher error rate. As Yarmey and Yarmey (1997) have suggested, accuracy in open-ended questions may be higher when respondents are not forced to give an answer.

Our data confirm our hypothesis derived from Gestalt psychological principles that accuracy would be greater for central than for peripheral information. Our results also support the notion that performance is higher with action than with descriptive details (Burke et al., 1992; Christianson & Loftus, 1987, 1991). These effects were modified, however, by an interaction of these factors. Scores were higher for central actions than for peripheral actions, but there was no difference between central descriptive and peripheral descriptive details. This replicates Ibabe’s (unpublished dissertation, 1998, Exp. 1 and 3) studies, which obtained the same pattern of results.
Ibabe’s first experiment used slides of the kidnapping of a child, and the third experiment used the same film as in this study about a car theft. Being able to replicate these findings with different stimulus materials implies that type of content and centrality appear to be important across different events and different presentation forms.

Confidence

The confidence in correct and incorrect answers varied as a function of question form. With open-ended questions the level of confidence in correct answers was lower than with the two other tests. However, there were no differences between question types for confidence in errors. In a similar study by Robinson et al. (1997), participants watched a film of a robbery that lasted 3 min. Half of the participants answered 32 open-ended questions, and the other half answered the same number of 4-AFC questions. The confidence in the correct answers with open-ended questions was higher than with 4-AFC tests. However, for incorrect answers, the result was the opposite. The difference in results in our study may be due to the fact that Robinson et al. (1997) did not control for the type of content or for the centrality of information about the event across question forms as we did in our study.

With peripheral information, the confidence in correct answers was higher with the 4-AFC test than with open-ended questions. Since participants primarily pay attention to central information when they watch an event, they may not encode all the peripheral information and thus feel more confident when they pick a given alternative than they do when having to retrieve a piece of information requested in open-ended questions.

We found no differences in confidence in errors among the three question forms with regard to descriptive details. However, confidence was higher for actions with the T-F test than with the two other tests. In a previous study, Ibabe (unpublished dissertation, 1998, Exp. 1) reported that with a T-F test the confidence with false alarms corresponding to actions was higher than it was with descriptive details. False alarms occur when a participant decides that a statement is true, possibly because the contents agree with the schema of the story, and confidence is high even if the statement is incorrect.

Taken as a whole, confidence for both correct and incorrect answers was greater for action details than for descriptive details, and for central rather than for peripheral information. Also, for central actions, confidence was greater than it was for peripheral actions. In this respect, the results follow those for accuracy. It appears that witnesses are not only more correct with action and with central details, compared to descriptive and peripheral information, but may also use the fact that the information to be retrieved refers to an action or a central aspect of the event as a metacognitive cue for confidence. From a practical point of view, this would imply that evaluators of witnesses’ testimony should be aware of such tendencies, and therefore should not be too impressed by the relatively high confidence witnesses place in action details or central information.

One needs to be cautious with this interpretation as the confidence scores for each participant are averaged over only a few correct and incorrect answers to different questions which may vary in item difficulty. However, the much higher confidence in correct compared to incorrect answers also revealed that participants did very well in monitoring their accuracy (cf. recent discussions on calibration, e.g. Bornstein & Zickafoose, 1999; Weber & Brewer, 2003). They adjusted their confidence well, depending on the difficulty of a particular question.
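The size of this monitoring effect can be related to the test statistic reported in the Results, F(1, 44) = 88.71. The sketch below uses the common conversion r = sqrt(F / (F + df_error)) for an F test with one numerator degree of freedom. Whether this is the exact conversion used for the reported r = 0.81 is an assumption, and the helper name `r_from_f` is hypothetical.

```python
import math


def r_from_f(f_value: float, df_error: int) -> float:
    """Convert an F statistic with 1 numerator df into the effect size r.

    Uses r = sqrt(F / (F + df_error)); valid only for single-df contrasts.
    """
    if f_value < 0 or df_error <= 0:
        raise ValueError("F must be non-negative and df_error positive")
    return math.sqrt(f_value / (f_value + df_error))


# Main effect of accuracy on confidence, as reported in the Results.
r = r_from_f(88.71, 44)
print(round(r, 2))  # prints: 0.82
```

The value of about 0.82 is close to the reported r = 0.81; the small difference may reflect rounding or a slightly different conversion formula.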
Our analysis draws upon within-subjects comparisons of confidence, which are much more sensitive to participants’ ability to monitor the correctness of their responses than the between-subjects confidence-accuracy relationship frequently reported for identification decisions (see Sporer et al., 1995). With identifications, the point-biserial correlation rests on only two data points per participant whereas here 32 items of varying difficulty entered into the calculation. Our finding (if replicable with different stimulus material and questions) would also have important practical implications: (1) Researchers and practitioners alike should clearly separate confidence in response to individual questions about the event from confidence in an identification decision. (2) While the confidence-accuracy relationship for identification decisions may be low (or modest), and malleable by post-identification feedback and other variables (Wells & Bradfield, 1998, 1999), witnesses may be better able to calibrate their confidence with respect to central details, perhaps depending on whether the question refers to actions or peripheral information.

In summary, accuracy was higher with open-ended and T-F questions than with a 4-AFC test. In general, with recognition tests, performance is likely to decrease as the number of alternatives increases: thus, accuracy was greater with the T-F than with the 4-AFC test. When response alternatives are equally likely, one is more likely to guess the correct answer when fewer alternatives are available (T-F test) than with more alternatives (4-AFC test). There is a clear consensus that in recall tasks participants use more elaborated strategies (Morris & Gruneberg, 1994), whereas in recognition memory they tend to answer on the basis of the familiarity of the alternatives and with less intentional effort (Horton, Pavlick, & Moulin-Julian, 1993).
Perhaps for that reason more mistakes are made in recognition tests when item difficulty is higher (Marquis et al., 1972).

From an applied perspective, the most suitable question form appears to be open-ended questions (74.1% accuracy). Even though with T-F tests the level of accuracy (73.0%) may be comparable, the confidence in errors for actions is greater than with open-ended (and 4-AFC) questions. Moreover, it is important to note that with open-ended questions, especially concerning peripheral information, participants show a lower level of confidence, even if the answers are correct. Thus, when an eyewitness expresses low confidence in an answer concerning peripheral information, his or her answer may nonetheless be accurate. Still, participants’ confidence was generally much higher for correct than for incorrect answers. Thus, a specific answer expressed with more confidence should be weighed more heavily than one with low confidence, except for central action details.

The present experiment involved a film of a robbery, and questions were posed after a short retention interval. In actual criminal cases, the amount of emotional involvement may be stronger, and the delay between witnessing an event and the interrogation longer, frequently involving repeated recall attempts (Ebbesen & Rienick, 1998). Thus, the theoretical principles investigated here should also be tested under more ecologically valid conditions.

We would also like to acknowledge that the question forms studied here are not the only forms of questions used in forensic practice. Free narratives are often followed by more specific cued-recall questions which should be nonleading and nonsuggestive (Yarmey & Yarmey, 1997). Interrogative questioning may also benefit from special interview strategies such as the cognitive interview (Fisher & Geiselman, 1992; Malpass, 1996) which have been demonstrated to improve recall (despite some increase in false information).

ACKNOWLEDGEMENTS

Preparation of this article was supported by grants from the Basque Country Government (B.2 4/99053) to the first author, and by the German Science Foundation (DFG: Sp 262/3-2) to the second author.
The second author would also like to thank the Department of Psychology at Canterbury University, Christchurch, NZ, for the hospitality during his sabbatical visit which greatly facilitated completion of this manuscript.

REFERENCES

Binet, A. (1900). La suggestibilité. Paris: Schleicher Frères.
Bornstein, B. H., & Zickafoose, D. J. (1999). ‘I know I know it, I know I saw it’: the stability of the confidence-accuracy relationship across domains. Journal of Experimental Psychology: Applied, 5, 76–88.
Burke, A., Heuer, F., & Reisberg, D. (1992). Remembering emotional events. Memory & Cognition, 20, 277–290.
Cassel, W. S., Roebers, C. E. M., & Bjorklund, D. F. (1996). Developmental patterns of eyewitness responses to repeated and increasingly suggestive questions. Journal of Experimental Child Psychology, 61, 116–133.
Christianson, S. A., & Loftus, E. F. (1987). Memory for traumatic events. Applied Cognitive Psychology, 1, 225–239.
Christianson, S. A., & Loftus, E. F. (1991). Remembering emotional events: the fate of detailed information. Cognition and Emotion, 5, 81–108.
Clifford, B. R., & Scott, J. (1978). Individual and situational factors in eyewitness testimony. Journal of Applied Psychology, 63, 352–359.
Cronbach, L. J. (1990). Essentials of psychological testing. New York: HarperCollins Publishers.
Deffenbacher, K. A. (1991). A maturing of research on the behavior of eyewitnesses. Applied Cognitive Psychology, 5, 377–402.
Ebbesen, E. B., & Rienick, C. B. (1998). Retention interval and eyewitness memory for events and personal identifying attributes. Journal of Applied Psychology, 83, 745–762.
Fisher, R. P., & Geiselman, R. E. (1992). Memory-enhancing techniques for investigative interviewing. Springfield, IL: Charles Thomas.
Goldstein, E. B. (1996). Sensation and perception. Pacific Grove, CA: Brooks/Cole.
Heath, W. P., & Erickson, J. R. (1998). Memory for central and peripheral actions and props after varied post-event presentation. Legal and Criminological Psychology, 3, 321–346.
Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley.
Heuer, F., & Reisberg, D. (1990). Vivid memories of emotional events: the accuracy of remembered minutiae. Memory & Cognition, 18, 496–506.
Hollins, T. S., & Perfect, T. J. (1997). The confidence-accuracy relation in eyewitness event memory: the mixed question type effect. Legal and Criminological Psychology, 2, 205–218.
Horton, D. L., Pavlick, T. J., & Moulin-Julian, M. W. (1993). Retrieval-based and familiarity-based recognition and the quality of information in episodic memory. Journal of Memory and Language, 32, 39–55.
Ibabe, I. (1998). Confianza y exactitud en el testimonio y la identification de los testigos presenciales. [Confidence and accuracy in eyewitness testimony]. Unpublished doctoral dissertation, University of Basque Country, San Sebastian, Spain.
Jones, E. E., & Nisbett, R. E. (1971). The actor and the observer: divergent perceptions of the causes of behavior. In E. E. Jones, D. E. Kanouse, H. H. Kelley, R. E. Nisbett, S. Valins, & B. Weiner (Eds.), Attribution: Perceiving the causes of behavior (pp. 79–94). Morristown, NJ: General Learning Press.
Lipton, J. L. (1977). On the psychology of eyewitness testimony. Journal of Applied Psychology, 62, 90–95.
Lockhart, R. S., & Murdock, B. B. (1970). Memory and the theory of signal detection. Psychological Bulletin, 74, 100–109.
Loftus, E. F., Loftus, G. R., & Messo, J. (1987). Some facts about ‘weapon focus’. Law and Human Behavior, 11, 55–62.
Luus, C. A. E., & Wells, G. L. (1994). Eyewitness identification confidence. In D. F. Ross, J. D. Read, & M. P. Toglia (Eds.), Adult eyewitness testimony: Current trends and developments (pp. 348–361). New York: Cambridge University Press.
Malpass, R. S. (1996). Enhancing eyewitness memory. In S. L. Sporer, R. S. Malpass, & G. Koehnken (Eds.), Psychological issues in eyewitness identification (pp. 177–204). Mahwah, NJ: Erlbaum.
Marquis, K., Marshall, J., & Oskamp, S. (1972). Testimony validity as a function of question form, atmosphere, and item difficulty. Journal of Applied Social Psychology, 2, 167–186.
Marston, W. M. (1924). Studies in testimony. Journal of Criminal Law and Criminology, 15, 5–31.
Migueles, M., & Garcia-Bajos, E. (1999). Recall, recognition and confidence patterns in eyewitness testimony. Applied Cognitive Psychology, 13, 237–256.
Morris, P. E., & Gruneberg, M. M. (1994). The major aspects of memory. In P. E. Morris, & M. M. Gruneberg (Eds.), Theoretical aspects of memory (pp. 29–49). New York: Routledge.
Mullen, B. (1989). Advanced basic meta-analysis. New Jersey: Lawrence Erlbaum Associates.
Peterson, C., & Biggs, M. (1997). Interviewing children about trauma: problems with ‘specific’ questions. Journal of Traumatic Stress, 10, 279–290.
Ramachandran, V. S., & Anstis, S. M. (1986, June). The perception of apparent motion. Scientific American, 254, 102–109.
Richardson, S. A., Dohrenwend, B. S., & Klein, D. (1965). Interviewing: Its forms and functions. London: Basic Books.
Robinson, M. D., Johnson, J. J., & Herndon, F. (1997). Reaction time and assessments of cognitive effort as predictors of eyewitness memory accuracy and confidence. Journal of Applied Psychology, 82, 416–425.
Rock, I., & Palmer, S. (1990, December). The legacy of Gestalt psychology. Scientific American, 263, 84–90.
Schneider, W., & Pressley, M. (1997). Memory development between 2 and 20 (2nd ed.). New York: Springer-Verlag.
Sherif, M., & Sherif, C. W. (1969). Social psychology. New York: Harper & Row.
Sporer, S. L. (1982). A brief history of the psychology of testimony. Current Psychological Reviews, 2, 323–339.
Sporer, S. L., Penrod, S., Read, D., & Cutler, B. (1995). Choosing, confidence, and accuracy: a meta-analysis of the confidence-accuracy relation in eyewitness identification studies. Psychological Bulletin, 118, 315–327.
Sporer, S. L. (1996). Psychological aspects of person descriptions. In S. L. Sporer, R. S. Malpass, & G. Koehnken (Eds.), Psychological issues in eyewitness identification (pp. 53–86). Mahwah, NJ: Erlbaum.
Steblay, N. M. (1992). A meta-analytic review of the weapon-focus effect. Law and Human Behavior, 16, 413–424.
Stern, W. L. (1902). Zur Psychologie der Aussage [On the psychology of report]. Zeitschrift für die Gesamte Strafrechtswissenschaft, 22, 315–370.
Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics. New York: HarperCollins College Publishers.
Tichner, A., & Poulton, E. (1975). Watching for people and actions. Ergonomics, 18, 35–51.
Weber, N., & Brewer, N. (2003). The effect of judgment type and confidence scale on confidence-accuracy calibration in face recognition. Journal of Applied Psychology, 88, 490–499.
Wells, G. L., & Bradfield, A. L. (1998). ‘Good, you identified the suspect’: Feedback to eyewitnesses distorts their reports of the witnessing experience. Journal of Applied Psychology, 83, 360–376.
Wells, G. L., & Bradfield, A. L. (1999). Distortions in eyewitnesses’ recollections: can the postidentification-feedback effect be moderated? Psychological Science, 10, 138–144.
Wells, G. L., & Leippe, M. R. (1981). How do triers of fact infer the accuracy of eyewitness identification? Using memory for peripheral detail can be misleading. Journal of Applied Psychology, 66, 682–687.
Wright, D. B., & Stroud, J. N. (1998). Memory quality and misinformation for peripheral and central objects. Legal and Criminological Psychology, 3, 273–286.
Yarmey, A. D. (1979). The psychology of eyewitness testimony. New York: Free Press.
Yarmey, A. D., & Yarmey, M. J. (1997). Eyewitness recall and duration estimates in field settings. Journal of Applied Social Psychology, 27, 330–344.
Yuille, J. C., & Daylen, J. D. (1998). The impact of traumatic events on eyewitness memory. In C. P. Thompson, D. J. Herrmann, J. D. Read, D. Bruce, D. G. Payne, & M. P. Toglia (Eds.), Eyewitness memory: Theoretical and applied perspectives (pp. 155–178). Mahwah, NJ: Lawrence Erlbaum.
