Journal of Gerontology: PSYCHOLOGICAL SCIENCES 2003, Vol. 58B, No. 6, 000–000

Proof only. Not for distribution.

Copyright 2003 by The Gerontological Society of America

The Aging Eyewitness: Effects of Age on Face, Delay, and Source-Memory Ability

Amina Memon,1 James Bartlett,2 Rachel Rose,3 and Colin Gray1

1 Department of Psychology, University of Aberdeen, Scotland. 2 School of Human Development, University of Texas at Dallas. 3 Department of Psychology, Kingston University.

As a way to examine the nature of age-related differences in lineup identification accuracy, young (16–33 years) and older (60–82 years) witnesses viewed two similar videotaped incidents, one involving a young perpetrator and the other involving an older perpetrator. The incidents were followed by two separate lineups, one for the younger perpetrator and one for the older perpetrator. When the test delay was short (35 min), the young and older witnesses performed similarly on the lineups, but when the tests were delayed by 1 week, the older witnesses were substantially less accurate. When the target was absent from the lineups, the older witnesses made more false alarm errors, particularly when the faces were young. When the target was present in the lineups, correct identifications by both young and older witnesses were positively correlated with a measure of source recollection derived from a separate face-recognition task. Older witnesses scored poorly on this measure, suggesting that source-recollection deficits are partially responsible for age-related differences in performance on the lineup task.

ONE striking and ecologically valid test of memory is provided by performance on a lineup test following a witnessed incident. An eyewitness’s decision in the lineup task is likely to be influenced by several variables, including the conditions at the time of encoding, the characteristics of the test situation, the personal characteristics of the witness, and the characteristics of the event (for reviews, see Memon, Vrij, & Bull, 2003; Wright & Davies, 1999). One particularly important factor, however, and one with clear implications for memory, is the age of the eyewitness (Memon & Bartlett, 2002; Memon, Hope, Bartlett, & Bull, 2002; Searcy, Bartlett, & Memon, 1999, 2000; Searcy, Bartlett, Memon, & Swanson, 2001; Searcy, Bartlett, & Seipel, 2000). In one relevant study (Searcy et al., 1999), younger (18- to 30-year-old) and older (60- to 80-year-old) witnesses viewed a crime video, after which they were asked to identify the perpetrator in a photo-identification lineup. Consistent with predictions based on standard laboratory tests of face-recognition memory (Bartlett & Fulton, 1991; Bartlett, Strater, & Fulton, 1991; Smith & Winograd, 1978), older participants made more false choices of a lineup “foil” than did younger participants. The age-related increase in false identifications has been replicated in several subsequent studies, many of which have also reported an age-related reduction in the hit rate (Searcy, Bartlett, & Memon, 2000; Searcy, Bartlett, & Seipel, 2000; Searcy et al., 2001; Memon et al., 2002; Memon & Bartlett, 2002). Although age-related deficits in lineup performance are now well established, it is important to note that, in the studies cited, all the lineup faces were young. Hence, the young witnesses were tested with “same-age” faces whereas the older witnesses were tested with “other-age” faces.
This confound is important in light of the finding by Bartlett and Leslie (1986) that age-related differences in face-recognition memory were reduced when older faces were used: younger participants showed an advantage with younger faces, whereas older adults showed no effect of age of face. The same asymmetric other-age effect has also been reported by Rodin (1987) and by Fulton and Bartlett (1991; see also List, 1986, for related results). In Fulton and Bartlett, however, the other-age effect was found with “hits” (correct recognitions of previously viewed faces), but not with false alarms (erroneous recognitions of foil faces). False alarms were more frequent among older participants, regardless of age of face. The possibility that an other-age effect might occur in the context of eyewitness identification was examined by Wright and Stroud (2002), who found that young-adult and middle-aged viewers of target-present lineups were more accurate with same-age faces than with other-age faces (i.e., middle-aged faces for young viewers and young-adult faces for middle-aged viewers). The other-age effect, however, was not found in target-absent lineups, which is in partial agreement with Fulton and Bartlett (1991). Wright and Stroud, however, did not replicate the Fulton and Bartlett finding of higher false-alarm rates among older viewers. This may be because the older participants in the Wright and Stroud study ranged from 35 to 55 years old, whereas those in the Fulton and Bartlett study were all over 60. Nonetheless, we are left with an unanswered question: Are the inflated rates of false identification shown by persons over the age of 60 restricted to lineups of young faces? The question is important, because, from a forensic point of view, we need to know whether older eyewitnesses are any more (or less) reliable than young adults with lineups of older faces. Moreover, from the standpoint of theory, it is time to answer those critics who have suggested that current conceptions of cognitive aging are too laboratory based to permit clear predictions about real-life memory tasks, including eyewitness identification (see Park, 2000, for a review).
Any adequate theory of the other-age effect must explain its occurrence, or failure to occur, in real-life situations such as lineup identification. With these considerations in mind, we designed a study to assess age differences in lineup performance using both target-present and

Grnb-58-06-09  Thursday, 4 September 2003  2:06 pm  Allen Press, Inc.  Page 43

MEMON ET AL.

target-absent lineups of both young-adult and older-adult faces at each of two test delays (35 min and 1 week). In addition to rectifying the age confound that made it difficult to interpret some previously published findings, we also sought to determine whether a currently influential hypothesis for age differences in memory could be applied to the lineup task. A number of theorists have recently argued that many age differences in memory occur because older people have more difficulty in recalling contextual and perceptual details that specify the ‘‘source’’ of retrieved information (Johnson, Hashtroudi, & Lindsay, 1993; Spencer & Raz, 1995). Older witnesses are therefore more likely to base their memory judgments on generalized feelings of familiarity (Bartlett, 1993; Bartlett et al., 1991; Dywan & Jacoby, 1990; Jennings & Jacoby, 1997; Koutstaal, Schacter, Galluccio, & Stofer, 1999). Several recent studies have suggested that deficient source memory is a factor in age differences in face recognition, particularly age differences in false-alarm errors (Bartlett, 1993; Searcy et al., 1999). Bartlett and Fulton (1991) have provided evidence that false-alarm errors with entirely new faces reflect an age-related increase in the use of familiarity for making recognition judgments. Bartlett and his associates argued that, because the set of human faces is highly homogenous, even new faces will often seem familiar because of their resemblance to faces seen in life. Recollection of source may aid in the rejection of new-but-familiar-looking faces. Source recollection, however, is impaired in old age, and this may be the reason that false recognitions are increased in old age (Searcy et al., 1999). Although the source-recollection hypothesis has received some support from research using standard laboratory paradigms for the probing of face memory, there is as yet, to our knowledge, no published evidence to confirm that source recollection affects performance on lineups. 
In pursuit of such evidence, we examined the relation between lineup performance and a laboratory task that is known to tap source memory. The laboratory task was modeled on a study by Jennings and Jacoby (1997), in which a study list of words was followed by a recognition test in which the lure (new) items were repeated at different intervals (lags). They found that false-alarm rates were higher for repeated lures than for first-time lures among older participants but not young adults. Thus, their older participants showed a deficit in distinguishing old items from repeated-lure items. Taken together with other results (Jennings & Jacoby, 1993), this finding suggests an age-related deficit in recollection of source information. Extending the Jennings and Jacoby method to face recognition, we expected to observe a similar age deficit in distinguishing old items from repeated-lure items. Our principal concern, however, was with whether this deficit in recollection would show correlations with lineup performance. By the source-recollection hypothesis, old versus repeated-lure discrimination and lineup performance should be positively correlated, at least in older witnesses. In contrast, old versus nonrepeated-lure discrimination may show no correlation with lineup performance, or, at best, a weak correlation. Discrimination between old faces and nonrepeated lures is likely to be based on familiarity information, at least to a degree; and familiarity information, by our hypothesis, is largely age invariant.

We considered that the source-recollection hypothesis would be best tested by examining the effects of length of test delay on the lineup performance of younger and older adults. There is evidence that age-related source-memory impairments increase over time (Brown, Jones, & Davis, 1995; Schacter, Kazniak, Kihlstrom, & Valdserri, 1991; Henkel, Johnson, & DeLeonardis, 1998; see Yonelinas, 2002, for a review). Considering this evidence in relation to our hypothesis that age-related deficits in the lineup task reflect age-related deficits in recollection, we can predict that older witnesses will show larger impairments in lineup performance with longer test delays. Whereas delay is a variable of great relevance to forensics, its effects in lineup tasks remain largely unexplored (Kassin, Tubb, Hosch, & Memon, 2001). The available evidence from laboratory studies suggests that the effects of delay on face memory may vary with the type of measure. For example, hit rates on a face-recognition task may decline with delay whereas the false-alarm rates may show little change (Shepherd, 1983). In contrast, Sporer (1992) found a decrease in hits and an increase in false alarms over various intervals up to 3 weeks. Moreover, in an eyewitness study, Gwyer and Clifford (1997) noted little difference following a 48- or 96-hour delay on any of their measures of eyewitness recall or recognition. A meta-analysis of 128 studies of face recognition (80% laboratory based) and 960 conditions suggests there is a linear decline in hits to “old” faces after a delay and no clear effect of delay on false alarms to “new” faces (Shapiro & Penrod, 1986). However, because none of these studies compared young adults with older adults, they do not speak to the prediction of the source-recollection hypothesis that age-related deficits in lineup performance become more marked with increases in test delay.
In summary, in this study we examined age differences in lineup performance with both young and older faces. We tested the hypothesis (which was based on laboratory studies such as that by Bartlett & Leslie, 1986) that the accuracy of performance with target-present lineups will show an age-related deficit, but that the size of this deficit will be reduced with older faces. Additionally, we tested three predictions from the hypothesis that older witnesses’ problems in lineup performance reflect age-related deficits in source recollection: first, older adults will show deficits in a laboratory test of source memory; second, performance on this test will be positively correlated with performance on the lineup test; and third, age-related differences in lineup performance will increase with longer test delays.

METHODS

Participants

A total of 172 participants were tested (1 participant was excluded from the data analysis; see Table 1). Eighty-four young participants aged between 16 and 33 years (M = 19.4 years) were recruited from local colleges, and 88 older participants had responded to posters and advertisements placed in local centers, clubs, and societies. The older participants were 60–82 years old (M = 71.7 years). All participants reported that they were in good health. To determine the characteristics of our older sample, we asked all older adults to take the Mini-Mental State


AGING AND EYEWITNESS IDENTIFICATION

Examination (MMSE; Folstein, Folstein, & McHugh, 1975) and complete the Geriatric Depression Scale (GDS; Brink et al., 1982). The GDS agrees well with clinical diagnoses and symptom checklists (Yesavage et al., 1983). Finally, the National Adult Reading Test, or NART (Nelson & O’Connell, 1978), was administered to all participants. Table 1 shows participants’ mean scores on these screening measures for each combination of delay and lineup type.

Table 1. Mean Scores on the NART and the MMSE and GDS for Each Combination of Lineup Type and Test Delay

| Age Group and Measure | Target Present, Short | Target Present, Long | Target Absent, Short | Target Absent, Long |
|---|---|---|---|---|
| Young adults | n = 22 | n = 20 | n = 21 | n = 21 |
| NART | 118.61 (3.48) | 119.36 (3.65) | 118.44 (3.71) | 120.80 (3.12) |
| Older adults | n = 22 | n = 23 | n = 19 | n = 23 |
| NART | 122.55 (3.74) | 121.52 (5.38) | 120.89 (5.43) | 121.91 (3.45) |
| MMSE (a) | 25.77 (7.03) | 27.05 (4.47) | 27.68 (1.57) | 27.55 (1.63) |
| GDS | 6.73 (5.09) | 5.70 (3.51) | 5.58 (5.50) | 6.04 (5.36) |

Notes: NART = National Adult Reading Test (mean scores are for all participants); MMSE = Mini-Mental State Exam and GDS = Geriatric Depression Scale (both for older adults only); lineup type is for target present or absent. Standard deviations are given in parentheses.

(a) Three participants did not complete all the items of the MMSE and, as a result, had scores that were below normal (cutoff score = 24). All three, however, had normal NART scores but had felt insulted by some of the questions in the MMSE. We therefore decided to retain those participants in the data set. The GDS scores of seven of the older participants were indicative of “mild” depression. Another person, however, had a score in the “severe” category, and it was decided to eliminate that person from the data set. The statistics shown apply to the reduced data set.

Design

Age group (young or old), lineup presentation type (target present or target absent), and delay were between-subject factors. All participants viewed two lineups, one for the older perpetrator, the other for the younger, the ordering of which was counterbalanced. For 87 of the participants, both lineups included the perpetrator; for the remaining 85 participants, the perpetrator was absent from both lineups. Within each age group and lineup-type condition, approximately half the participants made their lineup judgments after a short delay of 35 min, whereas the remainder made theirs after a longer delay of 1 week (see Table 1).

Materials

Eyewitness events. The incidents consisted of two separate video clips of a young man aged 22 years (young-target condition) or an older man aged 60 years (old-target condition) apparently breaking into a house. In both clips, the perpetrator followed the same script (which lasted for 50 s). The facial exposure times were also equal (43 s). In each video, the man rang the doorbell to establish if anyone was at home, went through the side gate round to the back of the house, entered the house through the back door, and a few seconds later emerged from the front door carrying a camera. The purpose of the successive presentation of the same incident was to obtain a statistically powerful within-groups comparison between participants’ ability to recognize old and young faces.

Lineups. Four lineups were constructed: a target-present (TP) and a target-absent (TA) lineup for the young perpetrator and a TP and TA lineup for the older perpetrator. The lineup photographs consisted of 20 cm × 26 cm colored full-face head shots presented in a 3 × 2 array. The perpetrator’s photograph was taken a few days after the video so that hairstyle and other external features appeared as they did on the video. Following the recommendation of Wells (1993) that all lineup members must match the eyewitness’s prelineup description of the perpetrator, 20 independent raters rated a pool of photographs on a scale from 1 to 7 (where 7 = a good match to the description). The five faces that received the highest ratings were selected as foils; a further high-ranking foil was used for the target-absent condition. The lineup raters were drawn from technical and secretarial staff at the University of Southampton, who ranged in age from 20 to 50 years. The lineup for the young perpetrator contained foils ranging in age from 20 to 22 years. The lineup for the older perpetrator contained foils ranging in age from 55 to 65 years. The lineups were presented simultaneously.
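The foil-selection step described for the lineups (raters score a pool of photographs from 1 to 7 and the five best-matching faces become foils) can be illustrated with a minimal sketch. The photograph identifiers and ratings below are hypothetical, the use of the mean rating as the ranking criterion is an assumption, and the function name `select_foils` is introduced here only for illustration.

```python
from statistics import mean


def select_foils(ratings, n_foils=5):
    """Return the n_foils photo IDs with the highest mean match rating.

    ratings maps a photo ID to the list of 1-7 ratings it received.
    Ties are broken alphabetically by photo ID (an arbitrary assumption).
    """
    mean_by_photo = {photo: mean(scores) for photo, scores in ratings.items()}
    ranked = sorted(mean_by_photo, key=lambda p: (-mean_by_photo[p], p))
    return ranked[:n_foils]


# Hypothetical ratings from three raters (the study used 20 raters).
example_ratings = {
    "photo_a": [6, 7, 6],
    "photo_b": [3, 4, 2],
    "photo_c": [5, 5, 6],
    "photo_d": [7, 6, 7],
    "photo_e": [2, 3, 3],
    "photo_f": [4, 5, 4],
    "photo_g": [6, 5, 6],
}

foils = select_foils(example_ratings)
print(foils)  # ['photo_d', 'photo_a', 'photo_g', 'photo_c', 'photo_f']
```

In a target-absent lineup, the next photograph in the ranking could take the perpetrator's place; that choice is likewise an assumption about how such a step might be coded.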

Face-source-recollection task. The study list and test were constructed from a pool of 200 facial photographs of young-adult females unknown to the participants. The study list included 48 faces that, on the basis of their appearance, three independent raters had assigned to one of six different occupational categories. There were 15 faces in each of two large categories (teacher and nurse), 6 faces in each of two medium categories (hairdresser and shop assistant), and 3 faces in each of two small categories (model and hairdresser). (Color coding was also used to “cue” the derived categories to our participants both at study and at test. Thus, different occupations were shown on different colored backgrounds. Unfortunately, we did not counterbalance the assignment of faces to category condition or old–new status in this study, so the category size data are not presented here.) The study list also included 15 faces that the raters did not classify consistently, so there were 63 faces presented in all. The randomly ordered recognition test comprised the 63 old faces, plus 31 lures, including 3 lures from each of the six categories and 13 noncategorized lures. Each lure was repeated at a lag of either two or four photographs, creating three test conditions: (a) old faces; (b) first-time-presented lures; and (c) repeated lures. The participant’s task was to decide whether a face had been seen before or was new. The instructions stressed that even repeated-lure faces should be classified as “new.”

Procedure Participants were randomly assigned to the delay or immediate test condition, and individual test sessions lasted approximately 40 min. Participants were informed they would be watching two short video clips. After the video, participants were presented with a large number of female faces, and they


Table 2. Mean Proportions of Correct Responses by Young and Older Participants Under the Short and Long Delay Conditions of Lineup Viewing

| Age Group | Short Delay | Long Delay | M |
|---|---|---|---|
| Young adults | .81 (.85) | .90 (.70) | .86 |
| Older adults | .66 (.73) | .35 (.53) | .49 |
| M | .74 | .61 | |

Notes: Standard deviations are given in parentheses. Mean proportions of correct responses are over the two lineups.

rated each face for pleasantness on a scale from 1 to 5 (1 = very pleasant). The face-rating task was untimed and took approximately 10 min. The older participants then completed the MMSE and took a 20-min break before completing the lineup and face-recognition tasks (short delay group). Because the younger participants did not complete the MMSE, they received a slightly longer break than the older adults to ensure that the delay between the face-recognition and lineup tasks (35 min) was the same for both groups. Those in the long delay group returned after 1 week to complete the lineup and face-recognition tasks. Each participant viewed either two lineups with the perpetrators present or two from which the perpetrators were absent. The two tests were given in the same sequence as the video clips, so that participants who saw the young-man video first took the young lineup first. (Kendall’s tau-b correlations were calculated to determine the effects, if any, of order of lineup presentation, young or old first, on participants’ choices in the lineup, that is, hit, false alarm, or miss. There were no significant correlations. The order in which the lineups were presented may nevertheless have influenced participants’ lineup choices.) The instructions were to look at each photograph carefully and to indicate whether one of the faces belonged to the person from the video. Participants were warned that hairstyle and clothing might not look the same and that, “just as in a real lineup, the culprit may or may not be present.” All participants were then asked to indicate how certain they were (on a 1–7 scale) that they were correct in their lineup choice. The procedure was repeated for the other video. The second part of the faces task was then completed. (The recognition phase of the face-source-recollection task took place at the very end of the experiment.
Although all our young participants completed the study phase of the faces task, because of class timetable constraints, 29 of the younger participants did not have time to complete the test phase.) The participants were told that they would see a list of face photographs, some of which they had seen in the first testing session and some of which were new. They were asked to judge each photograph as either “old” (seen in Session 1) or “new” (not seen in Session 1), and they were warned that new faces might be repeated in the test. It was stressed that “if a photo in this second session is repeated, you should still respond ‘new’ because it is still a new photo in the second session.” Older participants then completed the GDS, and all participants also completed a brief self-report questionnaire that asked them whether each of the video events was a criminal act. One older participant refused to complete the faces task because of fatigue.

RESULTS

In this section, analyses are reported for several different aspects of lineup performance: (a) the effects of age and delay upon accuracy of performance summed over the two lineups; (b) the proportions of correct and incorrect responses in the younger and older lineups considered separately and the proportions for the TP and TA conditions; and (c) the confidence of the participants in their lineup decisions. Finally, we considered lineup performance in relation to measures of face-source recollection.

Total Accuracy Scores

Because each participant viewed two lineups, a measure of performance is the number of correct lineup judgments that a participant made, which can take the values 0 (neither correct), 1 (one correct), or 2 (both correct), corresponding to the proportions correct of 0, 0.5, and 1, respectively. The mean proportions correct for the young and older participants (across the short and long delay conditions) were .43 (SD = .38) and .25 (SD = .32), respectively. The younger participants thus outperformed the older participants: U = 2,726; z = 3.14; p = .002 (Mann–Whitney test for ordinal data). Table 2 shows that, in line with our predictions from the source-recollection hypothesis, the effect of delay upon the performance of the older adults was greater than that with the young participants; that is, there appears to be a Delay × Age Group interaction. Because the dependent variable consisted of categories rather than measurements, multinomial logistic regression was used to confirm the presence of a Delay × Age Group interaction: χ²(6, N = 171) = 23.07, p < .001, and Nagelkerke’s R² = .145. This interaction is robust, both in the TP condition, χ²(6, N = 87) = 13.97, p = .03, and Nagelkerke’s R² = .172, and in the TA condition, χ²(6, N = 84) = 16.91, p = .01, and R² = .210.
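The scoring rule and the rank-based group comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the scores are made-up toy data, not the study's data, and the helper names (`proportion_correct`, `midranks`, `mann_whitney_u`) are hypothetical. The sketch computes the U statistic with average ranks for ties and does not reproduce the z approximation or p value reported in the text.

```python
def proportion_correct(n_correct, n_lineups=2):
    """Map a count of correct lineup decisions (0, 1, 2) to 0, 0.5, 1."""
    return n_correct / n_lineups


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: rank sum of group_a minus n_a(n_a + 1)/2."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


# Toy accuracy counts (0-2 correct lineups per participant); not real data.
young_scores = [2, 2, 1, 2]
older_scores = [0, 1, 1, 0]

print([proportion_correct(s) for s in young_scores])  # [1.0, 1.0, 0.5, 1.0]
print(mann_whitney_u(young_scores, older_scores))      # 15.0
```

With only three possible scores, ties are unavoidable, which is why the sketch assigns average ranks rather than arbitrary orderings.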

Proportions of Hits, False Alarms, and Misses With the Young and Old Lineups

Table 3 provides a more detailed view of the data for the TP condition. The entries are the proportions, for each of the two lineups, of participants’ choosing the perpetrator (hits), choosing one of the foils (false alarms), and making no choice (misses). It is immediately apparent from inspection of Table 3 that, with the young lineup, the older participants made more false-alarm errors than did the younger participants. (The same tendency can also be discerned in the data for the older lineup, but it is much less marked there.) For this reason, the hit rates were corrected to allow for the false-alarm rates (see Table 3 for an explanation of the correction). It is clear, however, that both the raw hit rates and the corrected hit rates were generally higher in the young group than in the older group. (The participant-age differences in hit rates, however, are larger with the old lineup than with the young lineup.) Because each participant made judgments with both the young and the old lineups, a separate multinomial logistic regression was carried out on the data from the old and young


Table 3. Summary of the Data for the TP Condition

| Participant Age and Test Delay | Young Lineup: Hits | Young Lineup: False Alarms | Young Lineup: Misses | Young Lineup: Corrected Hits | Old Lineup: Hits | Old Lineup: False Alarms | Old Lineup: Misses | Old Lineup: Corrected Hits |
|---|---|---|---|---|---|---|---|---|
| Young participants, short delay | .41 | .23 | .36 | .36 | .41 | .41 | .18 | .33 |
| Young participants, long delay | .35 | .00 | .65 | .35 | .35 | .25 | .40 | .30 |
| Young participants, M | | | | .36 | | | | .32 |
| Older participants, short delay | .45 | .41 | .14 | .37 | .36 | .45 | .18 | .27 |
| Older participants, long delay | .26 | .43 | .30 | .17 | .04 | .65 | .30 | −.09 |
| Older participants, M | | | | .27 | | | | .09 |

Note: TP = target present. Data are for proportions of hits, false alarms, misses (no choice was made), and corrected hits by young adults and older adults in the short and long test delay groups with young and old lineups. Corrected hits are the hit rates “corrected” by subtraction of the false-alarm rates. (Because the prior probability of choosing one of the 5 foils is five times that of choosing the perpetrator, the false-alarm rates were divided by 5 prior to subtraction.) Both the “raw” hit rates and the corrected hit rates were generally higher in the young group than in the older group. However, the size of this age-related deficit was much larger in the long delay than in the short delay condition. The “corrected” proportions are unsuited to formal testing because different participants contributed to the hits and false-alarm rates, which were combined in the correction.

lineups. In each regression, the dependent variable was the choice category (hit, false alarm, or miss) and the independent variables were age group and delay. There were no reliable effects of any kind for the older lineup: χ²(6, N = 171) = 10.421 and p = .108. In the data for the young lineup, however, it is clear that the pattern of young participants’ responses was different from that of the older participants: the young participants had a higher hit rate, a higher miss rate, and a lower false-alarm rate. This pattern is confirmed by a significant main effect of age: χ²(2, N = 171) = 24.45, p < .001, and R² = .150. If, for the moment, we disregard the age of the participant who is viewing the young lineup, it appears from Table 3 that the introduction of a delay had little effect on the distribution of hits, misses, and false-alarm rates: χ²(2, N = 171) = 2.75 and p = .253. It is also clear, however, that the delay variable had different (and quite complex) effects upon the distributions for the younger and older participants: there is a significant Age × Delay interaction, at χ²(6, N = 171) = 32.43, p < .001, and R² = .194. An increase in delay resulted in a greater loss of accuracy (hits) in the older group of participants. The miss rate, in contrast, increased in both groups. An increase in delay had opposite effects upon the false-alarm rates in the two groups: in younger participants, false alarms decreased; in older participants, they increased. Turning now to the data from the TA condition, Table 4 shows that, with the young lineup, age differences in false alarms are greater in the delay condition than in the immediate condition: χ²(1, N = 84) = 5.72, p = .017, and R² = .093. These data confirm the prior finding (Memon & Gabbert, 2003; Searcy et al., 1999, 2000) that, with young lineups, older adults make more false choices in TA situations than do younger adults. Beyond this, the results also show, for the first time, that this age-related difference is increased with test delay. The Age × Delay interaction is less evident in the data for the older lineup and is not statistically reliable: χ²(1, N = 84) = 1.69 and p = .194.
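The correction described in the note to Table 3 (the hit rate minus one fifth of the false-alarm rate, because five foils compete with one perpetrator) can be written out as a short sketch. The function name `corrected_hit_rate` is hypothetical; the input values are the young participants' proportions from Table 3.

```python
def corrected_hit_rate(hit_rate, false_alarm_rate, n_foils=5):
    """Subtract the per-foil false-alarm rate from the hit rate.

    Dividing by n_foils reflects that a false alarm can land on any of
    the foils, whereas a hit can land only on the single perpetrator.
    """
    return hit_rate - false_alarm_rate / n_foils


# Young participants, TP condition (hit rate, false-alarm rate) from Table 3.
young_lineup_short = corrected_hit_rate(0.41, 0.23)
young_lineup_long = corrected_hit_rate(0.35, 0.00)
old_lineup_short = corrected_hit_rate(0.41, 0.41)
old_lineup_long = corrected_hit_rate(0.35, 0.25)

print(round(young_lineup_short, 2))  # 0.36
print(round(young_lineup_long, 2))   # 0.35
print(round(old_lineup_short, 2))    # 0.33
print(round(old_lineup_long, 2))     # 0.3
```

A corrected value can fall below zero when false alarms are frequent and hits are rare, which is one reason the corrected proportions are treated descriptively rather than tested formally.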

Confidence and Accuracy

The younger participants were significantly more confident in their decisions on younger lineups than were the older participants: F(1, 171) = 3.72, MSE = 8.48, and p = .05. The same comparison for the older lineup showed a nonsignificant tendency for the younger participants to be more confident than older participants: F(1, 171) = 3.10, MSE = 8.14, and p = .08. An examination of confidence–accuracy relationships in the young participants and in the older participants showed only one reliable correlation, namely, a positive relationship between younger participants’ ratings of confidence on the old lineup and the accuracy of their decisions on that lineup: r_pb = .29, N = 84, and p = .008. There was no effect of delay upon confidence ratings.

Face-Source-Recollection Data

Table 5 shows the data from the face-recognition and source-recollection test administered at the end of the experimental sessions. “Old” judgments in response to old faces (hits) were less frequent in the long delay condition than in the short delay condition, but, as in much prior research (Searcy et al., 1999), the hit rates did not differ significantly for young and older witnesses (M = .58 and M = .55, respectively, in the short delay condition, and M = .51 and M = .49, respectively, in the long delay condition). An analysis of variance (ANOVA) of the hit rates using age group and delay as between-groups factors showed a main effect of test delay, F(1, 127) = 5.20, MSE = .027, and p = .03, but there was no participant-age effect, and no Participant-Age × Delay interaction (F < 1). Erroneous “old” judgments in response to new and new-repeat items (false alarms) were also less frequent in the long delay condition than in the short delay condition. Additionally, consistent with what we found in our lineup data, false-alarm rates for new faces were higher for older adults than younger adults at both test delays (M = .39 and M = .26, respectively, in the short delay condition, and M = .34 and M = .20, respectively, in the long delay condition). False-alarm rates for new-repeat faces showed a still stronger age-related increase: M = .59 and M = .35 for old and young adults in the short delay condition and M = .49 and M = .25 for old and young adults in the long delay condition. We conclude that lure repetition increased false alarms, but this effect was stronger among older adults (M differences = .20 and .15 for the short delay and long delay conditions, respectively) than among young adults (M

Grnb-58-06-09  Thursday, 4 September 2003  2:06 pm  Allen Press, Inc.  Page 47

MEMON ET AL.

Table 4. Proportions of FAs and CRs for the Young and Old Lineups in the Target-Absent Condition Young Lineup Participant Age and Test Delay

FA

CR

Old Lineup FA

Table 5. Mean Proportions and Standard Deviations of ‘‘Old’’ Judgments in Response to Old Faces (Hit Rates), New Faces, and New-Repeat Faces (False-Alarm Rates) in the Face-Source-Memory Test

CR

Test Item

Young adults Short delay Long delay

.57 .38

.43 .62

.62 .52

.38 .48

.79 .92

.21 .08

.74 .71

.26 .29

New

New Repeat

n

M

SD

M

SD

M

SD

26 25

.58 .51

.11 .15

.26 .20

.12 .13

.35 .25

.21 .17

38 46

.55 .49

.15 .20

.39 .34

.18 .16

.59 .49

.21 .21

Young adults

Older adults Short delay Long delay

Participant Age and Test Delay

Old

Note: FA ¼ false alarm; CR ¼ correct rejection.

Short delay Long delay Older adults Short delay Long delay

differences ¼ .09 and .05, respectively). An ANOVA of the false-alarm rates with age and test delay as between-groups factors and lure repetition as a within-groups factor showed reliable main effects for test delay, F(1,127) ¼ 6.07, MSE ¼ .05, and p ¼ .02, participant age, F(1,127) ¼ 42.1, MSE ¼ .05, and p , .001, and lure repetition, F(1,131) ¼ 68.1, MSE ¼ .014, and p , .001; moreover, there was a reliable Age 3 Lure-Repetition interaction: F(1,131) ¼ 13.0, MSE ¼ .015, and p , .001. The interaction is suggestive of an age-related deficit in conscious recollection (cf. Jennings & Jacoby, 1997). Moreover, Table 5 shows that the young adults (but not the older adults) could distinguish old-test faces from new-repeat faces. In the young adult group, hit rates for old faces and false-alarm rates for newrepeat faces averaged .55 and .30, respectively: F(1,47) ¼ 75.8, MSE ¼ .02, and p , .001. In the older group, they averaged .52 and .54, respectively (F , 1).
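As an arithmetic check on the lure-repetition differences quoted above, the following Python sketch recomputes them from the mean false-alarm rates reported in the text. The dictionary layout and the helper name `repetition_effect` are illustrative assumptions; the sketch works only from the published cell means, not from participant-level data, so it says nothing about statistical reliability.

```python
# Mean false-alarm rates reported in the text: (new faces, new-repeat faces).
FALSE_ALARMS = {
    ("young", "short"): (0.26, 0.35),
    ("young", "long"): (0.20, 0.25),
    ("older", "short"): (0.39, 0.59),
    ("older", "long"): (0.34, 0.49),
}


def repetition_effect(new_rate, repeat_rate):
    """Increase in false alarms when a new face is repeated at test."""
    # Rounded to two decimals to match the precision of the reported means.
    return round(repeat_rate - new_rate, 2)


effects = {cell: repetition_effect(*rates) for cell, rates in FALSE_ALARMS.items()}

for (age, delay), effect in effects.items():
    print(f"{age:5s} {delay:5s} delay: {effect:.2f}")
```

The recomputed differences are larger for the older group at both delays, which is the descriptive pattern behind the Age × Lure-Repetition interaction.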

Relations of Source Memory to Lineup Performance

The principal question motivating this study was whether recollection of source information is specifically related to performance on lineups. To address it, we examined correlations between lineup performance and two measures derived from the face-recognition test. The source-memory measure was the hit rate for old faces minus the false-alarm rate for new-repeat faces, whereas the general face-recognition measure was the hit rate for old faces minus the false-alarm rate for entirely new faces. Distinguishing old faces from new-repeat faces requires source recollection, whereas distinguishing old faces from entirely new faces does not. Accuracy in the lineup task was scored as 0, 1, or 2 correct identifications in the TP condition, and as 0, 1, or 2 correct rejections in the TA condition. Table 6 shows the Goodman–Kruskal gamma correlations between lineup accuracy and both measures derived from the face-recognition test, with the data broken down by age group as well as by lineup type (TP vs. TA). Pearson correlations show precisely the same pattern. Lineup accuracy was reliably correlated with source memory in the TP condition but not in the TA condition, a finding that held in both the young and older groups. By contrast, lineup accuracy and face memory were not reliably correlated. These data indicate that accurate identifications in TP lineups were linked to our participants’ source memory.
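The two difference scores and the gamma statistic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are ours, the participant values are hypothetical and were constructed only to show a contrast between the two measures, and gamma is computed in its textbook form, (concordant − discordant) / (concordant + discordant), with tied pairs ignored.

```python
from itertools import combinations


def source_memory(hit_rate, fa_new_repeat):
    """Hit rate for old faces minus false-alarm rate for new-repeat faces."""
    # Rounded to suppress floating-point noise so equal scores count as ties.
    return round(hit_rate - fa_new_repeat, 4)


def face_memory(hit_rate, fa_new):
    """Hit rate for old faces minus false-alarm rate for entirely new faces."""
    return round(hit_rate - fa_new, 4)


def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D) over all pairs; tied pairs are skipped."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical participants (not data from the study).
lineup_correct = [0, 1, 2, 1, 2, 0]  # correct identifications in TP lineups
hits = [0.50, 0.60, 0.70, 0.50, 0.65, 0.55]
fa_repeat = [0.45, 0.40, 0.30, 0.40, 0.35, 0.40]
fa_new = [0.25, 0.30, 0.40, 0.15, 0.40, 0.25]

source_scores = [source_memory(h, f) for h, f in zip(hits, fa_repeat)]
face_scores = [face_memory(h, f) for h, f in zip(hits, fa_new)]

gamma_source = goodman_kruskal_gamma(lineup_correct, source_scores)
gamma_face = goodman_kruskal_gamma(lineup_correct, face_scores)
print(round(gamma_source, 2), round(gamma_face, 2))
```

Gamma suits this design because lineup accuracy takes only three ordered values (0, 1, or 2), so many pairs of participants are tied and a rank-based measure that discards ties is a natural choice.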

Perceptions of the Perpetrator

A self-report questionnaire asked participants what they thought was going on in each of the two scenarios (i.e., the young-man scenario and the older-man scenario). The young man’s behavior was viewed as suspicious by 82% of the older participants, as compared with only 62% of the younger witnesses: χ²(2, N = 171) = 8.41 and p = .01. In contrast, the older man’s behavior was viewed as suspicious by equal percentages of older and younger witnesses (74% and 75%, respectively). We conclude that older witnesses were relatively more inclined to suspect the young man (though not the older man) of criminal intent in the scenario used here.

DISCUSSION

The main purpose of the present study was to test the source-recollection hypothesis in the context of lineup identification. As predicted, age-related differences in lineup performance increased when the lineup was delayed by 1 week: the overall accuracy measure showed a Participant-Age × Delay interaction in both TP and TA conditions. Searcy and colleagues (2001) reported that older participants had more difficulty in identifying a person they had personally encountered some 5 weeks before the lineup test and speculated that long test delays might put older eyewitnesses at a particular disadvantage. The present findings confirm their speculation, in line with the view that there are age-related deficits in source recollection and that these deficits increase after longer test delays.

The source-recollection hypothesis was further supported by performance on the face-recognition and source-recollection tasks. As expected, the young adults and the older adults did not differ in correct recognition of previously seen faces (i.e., hit rates were age invariant). In line with our hypotheses, however, the older adults produced more false alarms in response to new faces, particularly in the case of repeated presentations of these new faces. In fact, among the older adults, new-repeated faces drew as many false alarms as hits. Following Jennings and Jacoby (1997), who obtained similar results with verbal stimuli, we believe that high false-alarm rates for repeated-new faces reflect problems in conscious recollection of information about context or source.

Our purpose, however, was not simply to document further age-related deficits in recollection. Rather, we wished to determine whether these deficits have any bearing on the extent of older adults’ difficulties with the lineup task. To this end, we assessed the correlation between the accuracy of performance in the lineup task and source memory in the face-recognition task. We found that our measure of source memory was reliably correlated with the accuracy of performance in the TP lineup task. This finding was supported within each age group and confirms our hypothesis of a link between identifying perpetrators in the lineup situation and source recollection. Moreover, because older participants scored lower on our face-source-recollection measure, the data suggest that age-related deficits in identifying perpetrators may be associated with age-related deficits in source recollection.

Table 6. Gamma Correlations Between Source Memory and Face Memory From the FR Test and Correct Responses by Young and Older Participants in the TP and TA Lineup Tasks

| FR Measure | Young Adults TP (n = 28) | Young Adults TA (n = 23) | Older Adults TP (n = 42) | Older Adults TA (n = 42) |
|---|---|---|---|---|
| Source memory | .50** | .16 | .35* | .18 |
| Face memory | .13 | .14 | .08 | .17 |

Note: FR = face recognition; TP = target present; TA = target absent. *p = .01; **p = .002.

In line with expectations, our measure of general face-recognition ability (old-minus-nonrepeated-lure discrimination) showed weaker correlations with lineup performance than did our source-memory measure. This outcome is both sensible and theoretically important. It is sensible in terms of the hypothesis that although lineup performance is based on recollection, distinguishing old from new faces in standard laboratory tasks is based partly on differences in familiarity (see Bartlett, Hurry, & Thorley, 1984). It is theoretically important in ruling out the simple notion that any two measures of memory for faces are likely to be correlated. Contrary to that notion, the present findings suggest that source-memory abilities are specifically important for good performance with lineups.

Further research is required to examine the relations of source recollection to lineup performance. One unexpected finding that has to be addressed is the lack of correlations between source recollection and performance on those lineups where the target was not present. The source-memory component of our face-recognition task correlated with lineup performance in the TP condition but not in the TA condition. Perhaps the factors that govern decisions in TP situations are different from those that govern decisions in TA situations. Although cognitive factors may be responsible for decisions in a TP context, social factors (e.g., demand characteristics) may have a greater effect on decisions in a TA situation (Memon & Rose, 2002; Pozzulo & Lindsay, 1998).
Despite its limitations, however, the present study is the first to find direct support for the hypothesis that source recollection is a factor affecting lineup performance. In extending this hypothesis to a real-world domain, our findings serve to strengthen its status as a general theory with the power to predict and explain performance in naturalistic settings. Our findings should also encourage investigators to pursue the applied implications of the theory to a greater extent than they have done in the past.

One of the purposes of this study was to determine whether age differences in lineup performance reflect merely a confound in prior investigations that have used only young faces as stimulus materials. On the basis of earlier laboratory studies (e.g., Fulton & Bartlett, 1991) and one lineup identification study (Wright & Stroud, 2002), we had predicted that age differences in lineup performance would be reduced with old faces as compared with young faces. Our expectations were confirmed insofar as false identifications are concerned. Although older eyewitnesses made more false identifications than younger witnesses, this difference was larger with the young lineup than with the old lineup. Our results therefore differ from those of prior studies in which Age of Participant × Age of Face interactions were found in hits, but not in false alarms (Fulton & Bartlett, 1991; Wright & Stroud, 2002). Moreover, whereas earlier studies found age differences in hits were reduced with older faces, we noted a trend for age differences in hits to be larger with the older lineup.

The discrepancy with the findings from Fulton and Bartlett is not surprising in light of the many methodological differences between the laboratory task used in that study and the lineup task used in the present investigation. The discrepancy with the findings of Wright and Stroud is perhaps more puzzling; but, as noted in the introduction, their “older” witnesses were substantially younger than ours. In addition, one of the limitations of the typical lineup study is that only one or two targets are used. Moreover, in the current study the witnesses were presented with two highly similar video events and two lineups, all within a short space of time. More research on this issue is needed that uses a larger set of faces and different scenarios to see if there are circumstances in which the other-age effect is more likely to occur.

Future research should also address an unexpected finding from our after-lineup questionnaire. Our older participants were more inclined to suspect the younger perpetrator of a crime than they were the older perpetrator, whereas there were no age differences in how the older man was perceived. We plan to examine the possibility that witnesses’ attributions about the behavior of a perpetrator may be more important than the age of the perpetrator per se.

ACKNOWLEDGMENTS

This research was supported by a grant from the National Science Foundation (SES-9809977).

Address correspondence to Professor Amina Memon, Department of Psychology, University of Aberdeen, Kings College, Old Aberdeen, Scotland, AB24 2UB. E-mail: [email protected]

REFERENCES

Bartlett, J. C. (1993). Limits on losses in face recognition. In J. Cerella, W. Hoyer, J. Rybash, & M. Commons (Eds.), Adult information processing: Limits on loss (pp. 351–379). New York: Academic Press.
Bartlett, J. C., & Fulton, A. (1991). Familiarity and recognition of faces: The factor of age. Memory & Cognition, 19, 229–238.
Bartlett, J. C., & Leslie, J. E. (1986). Aging and memory for pictures of faces. Memory & Cognition, 14, 371–381.
Bartlett, J. C., Hurry, S., & Thorley, W. (1984). Typicality and familiarity of faces. Memory & Cognition, 12, 219–228.
Bartlett, J. C., Strater, L., & Fulton, A. (1991). False recency and false fame of faces in young adulthood and old age. Memory & Cognition, 19, 177–188.
Brink, T., Yesavage, J., Lum, O., Heersema, P., Adey, M., & Rose, T. (1982). Screening tests for geriatric depression. Clinical Gerontologist, 1, 37–44.
Brown, A. S., Jones, E. M., & Davis, T. L. (1995). Age differences in conversational source monitoring. Psychology and Aging, 10, 111–122.
Burke, D. M., & Light, L. L. (1981). Memory and aging: The role of retrieval processes. Psychological Bulletin, 90, 513–546.


Dywan, J., & Jacoby, L. (1990). Effects of aging on source monitoring: Differences in susceptibility to false fame. Psychology and Aging, 5, 379–387.
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). Mini-mental state: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.
Fulton, A., & Bartlett, J. C. (1991). Young and old faces in young and old heads: The factor of age in face recognition. Psychology and Aging, 6, 623–630.
Gwyer, P., & Clifford, B. R. (1997). The effects of cognitive interview on recall, identification, confidence and the confidence–accuracy relationship. Applied Cognitive Psychology, 11, 121–145.
Henkel, L. A., Johnson, M. K., & De Leonardis, D. M. (1998). Aging and source monitoring: Cognitive processes and neuropsychological correlates. Journal of Experimental Psychology: General, 127, 251–268.
Jacoby, L., Kelley, C., Brown, J., & Jasechko, J. (1989). Becoming famous overnight: Limits on the ability to avoid unconscious influences of the past. Journal of Personality and Social Psychology, 56, 326–338.
Jennings, J. M., & Jacoby, L. L. (1993). Automatic versus intentional uses of memory: Aging, attention and control. Psychology and Aging, 8, 283–293.
Jennings, J. M., & Jacoby, L. L. (1997). An opposition procedure for detecting age related deficits in repetition: The telling effects of repetition. Psychology and Aging, 12, 352–361.
Johnson, M. K., Hashtroudi, S., & Lindsay, D. S. (1993). Source monitoring. Psychological Bulletin, 114, 3–28.
Kassin, S., Tubb, V. A., Hosch, H. M., & Memon, A. (2001). The “general acceptance” of psychological research on eyewitness testimony: An issue revisited 10 years on. American Psychologist, 56, 405–416.
Koutstaal, W., & Schacter, D. (1997). Gist based false recognition of pictures in older and younger adults. Journal of Memory and Language, 37, 555–583.
Koutstaal, W., Schacter, D. L., Galluccio, L., & Stofer, K. A. (1999). Reducing gist-based false recognition in older adults: Encoding and retrieval manipulations. Psychology and Aging, 14, 220–237.
List, J. A. (1986). Age and schematic differences in the reliability of eyewitness testimony. Developmental Psychology, 22, 50–57.
Memon, A., & Bartlett, J. C. (2002). The effects of verbalisation on face recognition in young and older adults. Applied Cognitive Psychology, 16, 635–650.
Memon, A., & Gabbert, F. (2003). Improving the identification accuracy of senior witnesses: Do pre-lineup questions and sequential testing help? Journal of Applied Psychology, 88, 341–347.
Memon, A., Hope, L., Bartlett, J., & Bull, R. H. C. (2002). Eyewitness recognition errors: The effects of mugshot viewing and choosing in young and old adults. Memory & Cognition, 30, 1219–1227.
Memon, A., & Rose, R. (2002). Identification abilities of children: Does verbalisation impair face and dog recognition? Psychology, Crime & Law, 8, 229–242.
Memon, A., Vrij, A., & Bull, R. (2003). Psychology & law: Truthfulness, accuracy and credibility of victims, witnesses and suspects. Chichester, England: Wiley.
Nelson, H. E., & O’Connell, A. (1978). Dementia: The estimation of premorbid intelligence levels using the new adult reading test. Cortex, 14, 234–244.
Park, D. C. (2000). The basic mechanisms accounting for age-related decline in cognitive function. In D. Park & N. Schwarz (Eds.), Cognitive aging: A primer (pp. 3–22). Philadelphia: Psychology Press.
Pozzulo, J., & Lindsay, R. C. L. (1998). Identification accuracy of children versus adults: A meta-analysis. Law and Human Behavior, 22, 549–570.
Rodin, M. J. (1987). Who is memorable to whom—A study of cognitive disregard. Social Cognition, 5, 144–165.
Schacter, D. L., Kaszniak, A., Kihlstrom, J. F., & Valdiserri, M. (1991). The relation between source memory and aging. Psychology and Aging, 6, 559–568.
Searcy, J., Bartlett, J. C., & Memon, A. (1999). Age differences in accuracy and choosing in eyewitness identification and face recognition. Memory & Cognition, 27, 538–552.
Searcy, J. H., Bartlett, J. C., & Memon, A. (2000). Relationship of availability, lineup conditions and individual differences to false identification by young and older eyewitnesses. Legal and Criminological Psychology, 5, 219–236.
Searcy, J. H., Bartlett, J. C., Memon, A., & Swanson, K. (2001). Aging and lineup performance at long retention intervals: Effects of metamemory and context reinstatement. Journal of Applied Psychology, 86, 207–214.
Searcy, J. H., Bartlett, J. C., & Seipel, A. (2000). Crime characteristics and lineup identification decisions. Manuscript submitted for publication.
Shapiro, P. N., & Penrod, S. (1986). Meta-analysis of facial identification studies. Psychological Bulletin, 100, 139–156.
Shepherd, J. W. (1983). Identification after long delays. In S. M. A. Lloyd-Bostock & B. R. Clifford (Eds.), Evaluating eyewitness evidence (pp. 173–187). Chichester, England: Wiley.
Smith, A., & Winograd, E. (1978). Adult age differences in remembering faces. Developmental Psychology, 14, 443–444.
Spencer, W. D., & Raz, N. (1995). Differential effects of aging on memory for content and context: A meta-analysis. Psychology and Aging, 10, 527–539.
Sporer, S. (1992). Das Wiedererkennen von Gesichtern. Weinheim: Psychologie Verlags Union.
Wells, G. L. (1993). What do we know about eyewitness identification? American Psychologist, 48, 553–571.
Wright, D. B., & Davies, G. M. (1999). Eyewitness testimony. In F. T. Durso, R. S. Nickerson, R. W. Schvaneveldt, S. T. Dumais, D. S. Lindsay, & M. T. H. Chi (Eds.), Handbook of applied cognition (pp. 789–818). Chichester, England: Wiley.
Wright, D. B., & McDaid, A. T. (1996). Comparing system and estimator variables using data from real lineups. Applied Cognitive Psychology, 10, 75–84.
Wright, D. B., & Stroud, J. (2002). Age differences in lineup identification accuracy: People are better with their own age. Law and Human Behavior, 26, 641–654.
Yesavage, J., Brink, T., Rose, T., Lum, O., Huang, V., Adey, M., et al. (1983). Development and validation of a geriatric depression screening scale: A preliminary report. Journal of Psychiatric Research, 17, 37–49.
Yonelinas, A. (2002). The nature of recollection and familiarity: A review of 30 years of research. Journal of Memory and Language, 46, 441–517.

Received July 1, 2002
Accepted June 30, 2003
Charles F. Longino, Jr., PhD
