RESEARCH ARTICLE

An examination of the Boston Naming Test: calculation of “estimated” 60-item score from 30- and 15-item scores in a cognitively impaired population

Valerie L. Hobson1, James R. Hall2,3, Michelle Harvey4, C. Munro Cullum5,6, Laura Lacritz5, Paul J. Massman7,8, Stephen C. Waring9 and Sid E. O’Bryant10

1 Department of Psychology, Texas Tech University, Lubbock, TX, USA
2 Institute of Aging and Alzheimer’s Disease Research, University of North Texas Health Science Center, Fort Worth, TX, USA
3 Department of Psychiatry and Neuroscience, University of North Texas Health Sciences Center, Fort Worth, TX, USA
4 Clinical Health Psychology, Keller, TX, USA
5 Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, USA
6 Department of Neurology, University of Texas Southwestern Medical Center, Dallas, TX, USA
7 Department of Psychology, University of Houston, Houston, TX, USA
8 Department of Neurology, Baylor College of Medicine, Houston, TX, USA
9 Marshfield Clinic Research Foundation, Epidemiology Research Center, Marshfield, WI, USA
10 Department of Neurology, F. Marie Hall Institute for Rural and Community Health, Texas Tech University Health Sciences Center, Lubbock, TX, USA

Correspondence to: J. R. Hall, E-mail: [email protected]

Objective: Multiple versions of the Boston Naming Test (BNT) exist, which makes comparison of findings from different studies difficult. The current project sought to determine if estimated 60-item BNT scores could be reliably calculated from 30- and 15-item administrations with patients diagnosed with Alzheimer’s disease (AD).

Methods: Estimated 60-item scores were created for 30-item (even and odd) and 15-item Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) versions of the BNT from a database containing item-level responses for all BNT items. Correlations were conducted between all three estimated 60-item scores and full 60-item version scores administered to all participants in the sample.

Results: The estimated versions were all highly correlated with the standard 60-item version of the BNT across the sample, and these findings held when the sample was separated by case (AD) and control status. Mean difference scores were very small for scores estimated from 30-item administrations; however, difference scores for the 15-item CERAD were much larger.

Conclusions: Estimated 60-item versions of the BNT can be created from 30-item BNT administrations, which will enable comparisons across studies and allow integration of data from various AD research groups for increased power in analytic protocols. Creation of an estimated score from the 15-item CERAD version is not warranted. Copyright © 2010 John Wiley & Sons, Ltd.

Key words: Alzheimer’s disease; Boston Naming Test; CERAD; neuropsychological testing

History: Received 25 September 2009; Accepted 15 March 2010; Published online 5 August 2010 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/gps.2533

Introduction

Identification and characterization of neurocognitive deficits at different stages of Alzheimer’s disease (AD) has received a great deal of attention over the years. One cognitive domain frequently impacted early in the course of this disease is word-finding ability. The ability to combine word-finding data from various sources to form a larger data pool is limited by the lack of consistency in the tests, or versions of the tests, utilized. One potential solution to this commonly faced impediment is the creation of estimated scores that demonstrate comparable psychometric properties, including diagnostic accuracy.

The Boston Naming Test (BNT) (Kaplan et al., 1983), an instrument that assesses confrontation naming, is useful in detecting word-retrieval problems and is one of the most frequently used instruments in evaluations of patients with known or suspected AD (Williams et al., 1989). The BNT originally comprised 85 items but was later reduced to the current 60-item test (Kaplan et al., 1976, 1982; Borod et al., 1980). Subsequently, short forms were developed to reduce administration time and to provide alternate versions for assessing naming performance over time (Mack et al., 1992). Two shortened versions of 30 items each (one made up of the even-numbered items and one of the odd-numbered items from the full 60-item administration) have been found to be equivalent to one another and have shown a significant correlation with the 60-item version in healthy controls and demented patients (Williams et al., 1989). Lansing et al. (1999) empirically derived a short form of the BNT that distinguished demented from non-demented subjects with a high degree of accuracy, similar to the full BNT. Additionally, four different 15-item versions were created simply by assigning each consecutive item to one of four possible series. These versions were found to be highly correlated and discriminated demented from non-demented subjects as effectively as the 60-item version (Mack et al., 1992). The Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) developed a brief battery that utilized several neuropsychological measures, including a different 15-item version of the BNT.

The number of different versions of the BNT currently in use impedes the ability of researchers to combine results from multiple databases for larger analytic approaches aimed at understanding language deficits in AD and other disorders.
However, this discrepancy also gives researchers an opportunity to create estimated full 60-item BNT scores from briefer administrations, which would provide a framework for future studies to develop similar techniques and enable large-scale studies across existing databases. The current investigation sought to determine if an “estimated” 60-item BNT score could be calculated from shorter administration protocols. For the purposes of the current study, estimated 60-item scores were calculated from the even and odd 30-item administrations as well as the 15-item CERAD administration. It was hypothesized that these “estimated” 60-item scores would be highly correlated with the standard 60-item score, would demonstrate small difference scores (the difference between the estimated and full 60-item scores), and would exhibit good diagnostic accuracy, as indexed by the area under the receiver operating characteristic curve (AUC).

Methods

Participants

Archival data were extracted from a clinical database at an urban health sciences center. The sample consisted of 29 non-demented controls and 120 patients diagnosed with AD who were evaluated between January 2006 and May 2008 in an outpatient geriatric assessment program. The mean age was 78.8 years, with a range from 62 to 95 years. Women made up 74% of the sample, and 91% of the sample was Caucasian. A consensus diagnosis based on history, physical exam, laboratory, and neuropsychological testing was established using the criteria of the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS–ADRDA) Work Group (McKhann et al., 1984), and controls performed within normal limits on psychometric assessment. This study was approved by the Institutional Review Board and was performed by means of a retrospective analysis of medical records.

Measures

The BNT consists of 60 black and white line drawings of everyday objects, arranged in order of increasing difficulty. The patient is asked to name each object within a 20-s time allotment, and semantic and/or phonemic cues are provided as needed according to standardized instructions. The total score is the number of items correctly identified spontaneously plus the number named correctly after a semantic cue was provided. The BNT was administered as part of a larger neuropsychological assessment protocol. The administration protocol for this clinic begins with the first BNT item rather than item 30; the discontinue rule of 8 consecutive failures was utilized.

Statistical analyses

Analyses were conducted using the Statistical Package for the Social Sciences (SPSS, version 16.0.1; SPSS, Inc., Chicago, Illinois). The database utilized for the current study contained item-level responses, enabling investigators to calculate scores for the briefer versions from the items entered into the database. First, odd (30ODD), even (30EVEN), and CERAD (15CERAD) BNT scores were extracted from the 60-item administrations. Next, 60-item “estimated” scores were created by multiplying the total score of each 30-item administration by two (60ODD and 60EVEN) and the 15-item score by four (60CERAD). Pearson correlation coefficients were calculated to examine the relationship between the estimated 60-item scores (60EVEN, 60ODD, 60CERAD) and the original 60-item total scores. Additional analyses were then conducted separately by case and control status, which also included intraclass correlations (ICC). Next, in order to evaluate how closely the estimated scores compared to actual scores, difference scores were calculated for each of the new variables (60ODD, 60EVEN, and 60CERAD) by subtracting the BNT 60-item sample score (i.e., estimated 60-item score − actual 60-item score). Deviation of the difference scores from 0 was tested via t-test. Lastly, receiver operating characteristic (ROC) curves were generated and AUC calculated.

Results

The demographic characteristics of the total study population are provided in Table 1.

Table 1 Study population characteristics

| Characteristics | Total sample (n = 149) | AD (n = 120) | Control (n = 29) |
| --- | --- | --- | --- |
| Sex, No. (%): Male | 39 (26.2) | 33 (27.5) | 6 (20.7) |
| Sex, No. (%): Female | 110 (73.8) | 87 (72.5) | 23 (79.3) |
| Age: Mean (SD) | 78.8 (7.5) | 80.2 (7.1) | 72.8 (6.3) |
| Age: Range | 62–95 | 62–95 | 64–87 |
| Education: Mean (SD) | 13.5 (3.1) | 13.1 (3.1) | 15.2 (2.6) |
| Education: Range | 2–21 | 2–21 | 10–20 |
| MMSE: Mean (SD) | 23.0 (5.0) | 21.2 (4.4) | 28.4 (1.8) |
| MMSE: Range | 12–30 | 12–26 | 25–30 |
| BNT total score: Mean (SD) | 40.3 (13.5) | 36.0 (12.5) | 55.0 (3.8) |
| BNT total score: Range | 5–60 | 5–56 | 47–60 |
| BNT odd score: Mean (SD) | 39.0 (14.3) | 35.0 (13.1) | 55.1 (3.6) |
| BNT odd score: Range | 6–60 | 6–58 | 46–60 |
| BNT even score: Mean (SD) | 40.2 (13.4) | 36.8 (12.5) | 54.4 (4.5) |
| BNT even score: Range | 4–60 | 4–56 | 46–60 |
| BNT CERAD score: Mean (SD) | 48.3 (11.3) | 34.7 (13.2) | 55.0 (5.3) |
| BNT CERAD score: Range | 0–60 | 0–60 | 44–60 |

Estimated 60EVEN scores (r = 0.97, p < 0.001) and 60ODD scores (r = 0.98, p < 0.001) were very highly correlated with standard 60-item administration scores. When analyses were separated by case status, estimated scores remained highly correlated with actual scores in both the AD (60EVEN r = 0.97, p < 0.001; 60ODD r = 0.97, p < 0.001) and control (60EVEN r = 0.92, p < 0.001; 60ODD r = 0.92, p < 0.001) groups. The estimated 60CERAD score also correlated highly with the standard 60-item total score for the entire sample (r = 0.93, p < 0.001) as well as when restricted to AD cases (r = 0.91, p < 0.001) or controls (r = 0.86, p < 0.001) only.

Because of the non-independence of the data between forms (i.e., all estimated forms were extracted from the full administration), ICCs were calculated to determine the agreement between the standard 60-item administration scores and the estimated 60EVEN, 60ODD, and 60CERAD scores. The ICC statistics were as follows: 60EVEN (ICC = 0.98, 95% CI: 0.97–0.98, F = 79.4, p < 0.00), 60ODD (ICC = 0.98, 95% CI: 0.97–0.99, F = 98.6, p < 0.00), and 60CERAD (ICC = 0.74, 95% CI: 0.04 to 0.91, F = 19.4, p < 0.00).

The difference scores produced for the 60ODD and 60EVEN scores were very small, averaging less than a one-point difference from the obtained 60-item score (see Table 2). Difference scores deviated significantly from zero for the 60ODD (t = 3.05, p < 0.01, range = 19 to 5) and 60CERAD (t = 17.8, p < 0.001, range = 24 to 22) scores, but not for the 60EVEN score (t = 1.6, p > 0.05, range = 23 to 15).

In order to test the diagnostic utility of the estimated scores, ROC curves are presented in Figure 1 for each of the BNT scores (estimated and full). AUCs generated from the estimated scores were high and comparable to that of the full 60-item administration: 60ODD = 0.947, 60EVEN = 0.926, 60CERAD = 0.876, and the total BNT 60-item sample score = 0.954.
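The estimation, correlation, and difference-score steps described under Statistical analyses can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SPSS procedure: the function names and the six example scores are hypothetical, and only the multipliers (two for the 30-item forms, four for the 15-item CERAD form) and the direction of the difference score (estimated minus actual) are taken from the text.

```python
import math
import statistics

# Multipliers described in the Methods: 30-item totals are doubled,
# the 15-item CERAD total is multiplied by four.
MULTIPLIER = {"30ODD": 2, "30EVEN": 2, "15CERAD": 4}


def estimate_60(short_score, form):
    """Scale a short-form BNT total to the 60-item metric."""
    return short_score * MULTIPLIER[form]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_sample_t(diffs):
    """t statistic for H0: mean difference = 0 (df = n - 1)."""
    n = len(diffs)
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Hypothetical scores, NOT data from the study.
actual_60 = [55, 42, 30, 48, 20, 37]   # full 60-item totals
odd_30 = [28, 21, 16, 24, 9, 19]       # totals on the 30 odd-numbered items

est_60odd = [estimate_60(s, "30ODD") for s in odd_30]
diffs = [e - a for e, a in zip(est_60odd, actual_60)]  # estimated - actual
r = pearson_r(est_60odd, actual_60)
t = one_sample_t(diffs)

print(est_60odd, diffs, round(r, 3), round(t, 3))
```

A mean difference near zero together with a high correlation is the pattern the study reports for the 30-item forms; a form can correlate highly and still be biased, which is why the difference scores are tested separately.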
AUCs were then compared between the standard 60-item administration and each of the estimated scores; results were as follows: 60ODD = 0.001, p > 0.05; 60EVEN = 0.028, p > 0.05; and 60CERAD = 0.078, p < 0.05. For the BNT total 60-item sample, a cut-score of 51 provided the best balance between estimates of sensitivity (SN = 0.88) and specificity (SP = 0.85). Estimates of SN and SP for each of the estimated 60-item scores were comparable to those for the standard 60-item administration. For the 60ODD scores, cut-scores of 51 and 52 provided the best estimates of SN (both 0.89) and SP (both 0.84). For the 60EVEN scores, cut-scores of 49 and 50 provided the best estimates of SN (both 0.83) and SP (both 0.83). For the 60CERAD, cut-scores of 52, 51, 50, and 49 all resulted in the best balance between estimates of SN (0.86) and SP (0.84).
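The cut-score analysis above can be illustrated with a short sketch. Several choices here are assumptions rather than details given in the article: a score at or below the cut is classified as impaired, “best balance” is taken to mean the smallest absolute gap between SN and SP, and AUC is computed with the rank-based (Mann-Whitney) formulation. All scores are hypothetical.

```python
def sens_spec(ad_scores, control_scores, cut):
    """Sensitivity and specificity when score <= cut is called impaired.

    The <= convention is an assumption; lower BNT scores mean worse naming.
    """
    sn = sum(s <= cut for s in ad_scores) / len(ad_scores)
    sp = sum(s > cut for s in control_scores) / len(control_scores)
    return sn, sp


def auc(ad_scores, control_scores):
    """Probability that a random control outscores a random AD case.

    Ties count as 0.5 (Mann-Whitney formulation of the ROC area).
    """
    wins = 0.0
    for c in control_scores:
        for a in ad_scores:
            if c > a:
                wins += 1.0
            elif c == a:
                wins += 0.5
    return wins / (len(ad_scores) * len(control_scores))


def balanced_cuts(ad_scores, control_scores, cuts):
    """Return every cut whose |SN - SP| gap is the smallest observed."""
    scored = []
    for cut in cuts:
        sn, sp = sens_spec(ad_scores, control_scores, cut)
        scored.append((abs(sn - sp), cut))
    smallest = min(gap for gap, _ in scored)
    return [cut for gap, cut in scored if gap == smallest]


# Hypothetical estimated 60-item scores, NOT data from the study.
ad = [20, 36, 41, 47, 50, 52, 30, 44]
controls = [55, 58, 49, 53, 60, 56]

print(sens_spec(ad, controls, 51))
print(auc(ad, controls))
print(balanced_cuts(ad, controls, range(45, 56)))
```

Because scores are integers and estimated scores move in steps of two or four, adjacent cut-scores often classify every participant identically, which is consistent with several cut-scores sharing the same SN and SP in the results above.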


Table 2 Correlations and difference scores between the Boston Naming Test 60-item sample score and the estimated 60ODD, 60EVEN, and 60CERAD scores

| Estimated score | Total sample (n = 149) | AD (n = 120) | Controls (n = 29) |
| --- | --- | --- | --- |
| 60ODD: Correlation | 0.98 | 0.97 | 0.92 |
| 60ODD: Difference score, Mean (SD) | 0.71 (2.77) | 0.93 (2.99) | 0.10 (1.45) |
| 60EVEN: Correlation | 0.97 | 0.97 | 0.92 |
| 60EVEN: Difference score, Mean (SD) | 0.39 (2.98) | 0.66 (3.16) | 0.62 (1.88) |
| 60CERAD: Correlation | 0.93 | 0.91 | 0.86 |
| 60CERAD: Difference score, Mean (SD) | 8.29 (5.50) | 9.61 (5.28) | 3.24 (2.68) |

Note: all p-values < 0.001.

Figure 1 ROC curves for the Boston Naming Test 60-item sample score and the estimated 60ODD, 60EVEN, and 60CERAD scores.

Discussion

The lack of consistency of neuropsychological tests (or versions of tests) between research groups poses a formidable obstacle to combined analytic protocols that take advantage of existing databases. The BNT is one of the most widely used instruments to assess confrontation naming deficits, although different versions of the BNT (i.e., 60-item, 30-item, or 15-item tests) are often utilized in various settings. Because the majority of the different versions of the BNT draw on the same items and differ only in the number of items, the BNT provides a unique opportunity for researchers to begin working on methodologies for creating estimated scores that will allow comparison of findings across and within large datasets.

The current results demonstrate that estimated 60-item scores from the 30-even and 30-odd administrations can be reliably calculated simply by multiplying the obtained score by two. Even though the difference between estimated and observed BNT scores was significantly different from 0 for the 60ODD score, the mean difference score was very small and the diagnostic accuracy was comparable to that of the full BNT. The best-performing estimated score was the 60EVEN, which had good diagnostic accuracy and a mean difference score that was not significantly different from zero. On the other hand, the current results do not support the utility of the estimated 60CERAD score from the 15-item CERAD version of the BNT, and such a score should not be utilized.

These findings suggest that it is possible to reliably estimate traditional 60-item BNT scores from 30-item administrations. The creation of these equivalent estimated 60-item scores will enable researchers to combine data obtained from different research databases. As an example, the National Alzheimer’s Coordinating Center (NACC) was established to facilitate collaborative research among 29 NIA-funded AD Centers. Currently, the NACC Uniform Dataset (UDS) utilizes the 30-item odd-numbered version of the BNT. With the current results, one can combine NACC findings with those obtained from other protocols that utilize even or full-administration protocols.

The current study should also serve as a basis and justification for similar studies on other neuropsychological tests. For example, a portion of the Wechsler Memory Scale (WMS) is also required as part of the NACC UDS; however, there are multiple versions of the WMS being utilized. It is possible that the NACC UDS requirement can be extracted from the standard administration protocol of any of the three WMS versions, though this has not been empirically tested. Similar studies could be conceived to create estimated scores for other neuropsychological tests (e.g., list-learning, intelligence, visuospatial, and attention measures).

A limitation of the study is that the data are not independent: the briefer versions of the scale were extracted from the full administration. However, an estimated total score cannot be compared with the actual total score without the full administration. A potential limit to the generalizability of the current findings is that administration began with item 1 rather than item 30. However, without such administration the validity of the estimated score would have been questionable, because we would have had to assume that all of items 1–30 were answered correctly, which may not have been the case and could have overestimated the scores.

From a clinical perspective, the comparable and high AUCs for the full BNT administration and the estimated BNT scores from the 30-item administrations suggest that these scores can be utilized diagnostically and that the norms developed for the 60-item version, such as those reported by Heaton et al. (2004), may be applicable to the shorter versions, though this merits further investigation. These findings also point to the possibility of clinicians being able to make comparisons across different clinical administrations on the same patient even if the exact same BNT version was not utilized. Lastly, while these findings are very promising, they should be cross-validated in an independent dataset.

Key Points
• Results from briefer BNT administrations can be used to calculate estimated full BNT scores.
• Estimated 60-item scores from 30-item administrations are very consistent with observed full administration protocols, while an estimated score from the CERAD 15-item version is not.
• The diagnostic accuracy of these estimated scores, with the exception of the CERAD version, is very good and comparable to estimates from the 60-item administration.
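The pooling scenario raised in the Discussion (combining 30-item odd data, such as that collected under the NACC UDS, with even or full administrations) might look like the following sketch. The record layout, form labels, and function name are hypothetical; the only rules taken from the article are that 30-item totals are doubled and that a 60-item estimate from the 15-item CERAD version is not supported, so such records are left unconverted here.

```python
from dataclasses import dataclass
from typing import Optional

# Forms for which the study supports a 60-item estimate, with their multipliers.
# "15CERAD" is deliberately absent: the article advises against estimating from it.
SUPPORTED_MULTIPLIERS = {"60": 1, "30ODD": 2, "30EVEN": 2}


@dataclass(frozen=True)
class BntRecord:
    participant_id: str  # hypothetical identifier
    form: str            # hypothetical labels: "60", "30ODD", "30EVEN", "15CERAD"
    score: int           # total correct on the administered form


def harmonized_score(record: BntRecord) -> Optional[int]:
    """Return a score on the 60-item metric, or None if no estimate is supported."""
    multiplier = SUPPORTED_MULTIPLIERS.get(record.form)
    if multiplier is None:
        return None
    return record.score * multiplier


# Hypothetical records from three sites, NOT data from the study.
records = [
    BntRecord("A01", "60", 47),
    BntRecord("B07", "30ODD", 22),
    BntRecord("C12", "30EVEN", 25),
    BntRecord("D03", "15CERAD", 11),
]

pooled = {r.participant_id: harmonized_score(r) for r in records}
print(pooled)
```

Returning None rather than raising an error keeps unsupported records visible in the pooled dataset, so an analyst can decide explicitly whether to exclude them.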

Conflict of interest

None declared.

References

Borod J, Goodglass H, Kaplan E. 1980. Normative data on the Boston Diagnostic Aphasia Examination, Parietal Lobe Battery, and the Boston Naming Test. J Clin Neuropsychol 2: 209–215.
Heaton R, Miller S, Taylor M, Grant I. 2004. The Revised Comprehensive Norms for an Expanded Halstead-Reitan Battery: Demographically Adjusted Neuropsychological Norms for African American and Caucasian Adults (HRB). Psychological Assessment Resources: Lutz, Florida.
Kaplan E, Goodglass H, Weintraub S. 1976. Boston Naming Test: Experimental Edition. Boston University: Boston.
Kaplan E, Goodglass H, Weintraub S. 1982. Boston Naming Test: Experimental Edition. Boston University: Boston.
Kaplan EF, Goodglass H, Weintraub S. 1983. The Boston Naming Test. Lea & Febiger: Philadelphia.
Lansing A, Ivnik R, Cullum C, Randolph C. 1999. An empirically derived short form of the Boston Naming Test. Arch Clin Neuropsychol 14: 481–487.
Mack WJ, Freed DM, Williams BW, Henderson VW. 1992. Boston Naming Test: shortened versions for use in Alzheimer’s disease. J Gerontol 47: P154–P158.
McKhann G, Drachman D, Folstein M, et al. 1984. Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA work group. Neurology 34: 939–944.
Williams BW, Mack W, Henderson VW. 1989. Boston Naming Test in Alzheimer’s disease. Neuropsychologia 27: 1073–1079.

Int J Geriatr Psychiatry 2011; 26: 351–355.