Clinical Focus



Grand Rounds

A forum for case studies, clinical notes and observations, and other clinical material that does not have a traditional research format.

Audiologic Practices: What Is Popular Versus What Is Supported by Evidence

Terry L. Wiley, Daniel T. Stoppenbach, Laura J. Feldhake, Kayce A. Moss, and Elin T. Thordardottir
University of Wisconsin-Madison

A number of published surveys over the past 20 years (Burney, 1972; Martin & Forbis, 1978; Martin & Gravel, 1989; Martin & Morris, 1989; Martin & Pennington, 1971; Martin & Sides, 1985; Pennington & Martin, 1972) have identified common audiometric practices in the United States. The combined works of Martin and Pennington (1971) and Pennington and Martin (1972) constituted the first complete survey of diagnostic audiometry practices. The initial surveys were updated in 1978 by Martin and Forbis, again by Martin and Sides in 1985, and most recently by Martin and Morris (1989). Each updated survey had some changes in the questionnaire sent to audiologists based on perceived changes in clinical practice over time. Martin and Sides (1985), for example, noted that some changes were made in the Martin and Forbis questionnaire “…including the deletion of procedures felt to be obsolete, and the inclusion of areas concerned with new developments in the field” (p. 29).

The basic purpose of these surveys was the documentation of audiometric tests and test procedures used by audiologists in diagnostic practice. Specifically, the survey reports included the number and percentage of respondent audiologists who used specified tests and procedures. No attempt was made in these surveys to recommend certain practices over others. Indeed, the authors noted that their intent was not to prescribe or recommend practices on the basis of the survey responses (Burney, 1972; Martin & Forbis, 1978; Martin & Gravel, 1989; Martin & Morris, 1989; Martin & Pennington, 1971; Martin & Sides, 1985).

In the case of selected tests and procedures, the surveys indicate that a majority of audiologists have used and continue to use tests and procedures that are not supported by clinical or experimental evidence. Available evidence actually supports the use of different procedures. Accordingly, our purposes were (a) to review available evidence fundamental to selected tests and procedures reviewed in the published surveys of audiometric practices, and (b) to illustrate disparities between published scientific evidence and common clinical practices.

Our review is not exhaustive. A number of procedures that have been surveyed are not discussed. We attempted to exemplify the disparity between practice and evidence; we did not attempt to catalog all problem issues in clinical practice. In some cases, the questions posed in the surveys did not lend themselves to unequivocal consideration. Specifically, some questions and responses were in a form that did not permit clear analysis of appropriate or inappropriate practices. An example of this problem was the rule for masking in air-conduction audiometry. According to the Martin and Morris (1989) survey, 37.7% of respondents reported that they masked during air-conduction audiometry in cases for which the air-conduction thresholds differed by a predetermined amount (e.g., 40 dB) across the two ears. In the same survey, 60.7% reported they masked for air conduction if the air-conduction threshold in one ear differed from the bone-conduction threshold in the opposite ear by a predetermined amount.
Unfortunately, it is not clear if respondents use both rules or only one of the two. Masking should always be used in cases of critical differences in air-conduction thresholds across the two ears. In selected cases, however, masking is also required in the absence of critical differences in interaural air-conduction thresholds. If the difference between the air-conduction threshold in one ear and the bone-conduction threshold in the opposite ear is significant, masking is required regardless of a critical difference in interaural air-conduction thresholds (ASHA, 1978). So, although 60.7% of the Martin and Morris (1989) respondents appear to be following appropriate masking guidelines, the exact practice of the remaining respondents is unclear.

In view of this type of difficulty, we attempted to select survey issues that presented a minimum of equivocation. That is, we selected tests and procedures for which the survey data were clearly interpretable. As an initial effort, we selected two topic areas, speech audiometry and tone decay tests, for which there was apparent disagreement between common practice and supporting evidence for the practice. It should be understood that our selection of these tests and procedures was based on the clarity of the published practice surveys, not on the relative importance or established efficacy of the tests and procedures. As noted earlier, our purpose was to illustrate disparities between practice and evidence; therefore, we had to begin with clear data on common practices. Unfortunately, survey data on more contemporary and, in some cases, more efficacious procedures (such as auditory evoked potential and acoustic immittance measures) are lacking or do not include enough procedural detail for a clear analysis of common practices. Accordingly, the two practice areas chosen should be viewed as reference points for exemplifying a problem, not as a judgment on our part regarding the relative efficacy of the particular tests and procedures under review. Throughout the discussion that follows, we have attempted to focus on the most current survey data. Specifically, we review changes in practices over time based on the periodic surveys, but our conclusions are primarily directed at the most current survey data available.

American Journal of Audiology • Vol. 4 • No. 1 • March 1995 • 1059-0889/95/0401-0026 • © American Speech-Language-Hearing Association
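The two air-conduction masking rules discussed in the introduction can be expressed as a simple decision check. The sketch below is illustrative only: the function name and structure are ours, not a published protocol, and the 40 dB criterion is the example figure quoted in the survey question (in practice the criterion depends on the interaural attenuation of the transducer used).

```python
# Illustrative sketch (not a published protocol) of the two air-conduction
# masking rules described in the text. All thresholds are in dB HL.
INTERAURAL_CRITERION_DB = 40  # example criterion from the survey question

def needs_contralateral_masking(ac_test_ear, ac_nontest_ear, bc_nontest_ear):
    """Return True if either rule indicates masking for an AC threshold."""
    # Rule 1: AC thresholds differ across the two ears by the criterion.
    ac_vs_ac = (ac_test_ear - ac_nontest_ear) >= INTERAURAL_CRITERION_DB
    # Rule 2: AC in the test ear exceeds BC in the opposite ear by the
    # criterion. This can fire even when Rule 1 does not, which is why
    # applying Rule 1 alone can miss cases that require masking.
    ac_vs_bc = (ac_test_ear - bc_nontest_ear) >= INTERAURAL_CRITERION_DB
    return ac_vs_ac or ac_vs_bc

# Interaural AC difference is only 5 dB (Rule 1 fails), but the test-ear
# AC threshold exceeds the opposite-ear BC threshold by 50 dB (Rule 2).
print(needs_contralateral_masking(60, 55, 10))  # prints True
```

The printed case illustrates the point made above: masking can be required even in the absence of a critical interaural air-conduction difference, so a clinician applying only the first rule would leave this threshold unmasked.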

Speech Audiometry

Speech audiometry was considered in five of the surveys (Martin & Forbis, 1978; Martin & Gravel, 1989; Martin & Morris, 1989; Martin & Pennington, 1971; Martin & Sides, 1985). Under the general topic of speech audiometry, specific practices queried included procedures for speech recognition threshold (SRT) or speech reception threshold, and word discrimination or, as it is currently termed, word recognition testing.

Speech Recognition Threshold

In the latest survey, Martin and Morris (1989) report that the majority of respondents routinely obtain a speech recognition threshold for each patient. Depending on the test purpose, routine use of the measure may be unwarranted. Wilson and Margolis (1983), for example, provide a critical review of the very limited clinical applications of the SRT, and conclude that the measure “may be useful under some circumstances, but the procedure probably does not deserve its current high standing among auditory tests” (p. 120). Specifically, Wilson and Margolis question the bases for using SRT measures (a) as a reliability check on tone thresholds, (b) as a valid measure of hearing sensitivity, and (c) as a reference for suprathreshold speech recognition measures. They do note that the SRT may have potential value in certain clinical applications, such as cases of pseudohypoacusis and with patients for whom conditioned tone audiometry is not feasible.

Even in those special circumstances for which the SRT is perhaps useful, the surveys suggest that audiologists are not using SRT procedures based on available evidence or based on recommended guidelines for test administration. One such procedural issue is the requirement that subjects be familiarized with test words before obtaining the SRT. Early research demonstrated differences in SRTs obtained with and without familiarization (Conn, Dancer, & Ventry, 1975; Tillman & Jerger, 1959). Tillman and Jerger (1959), for example, found that prior familiarization with test words improved the SRT by 4–5 dB. Based on these earlier findings, word familiarization is now an inherent feature of the ASHA (1988) guidelines for the determination of speech recognition threshold. In spite of these reports and recommendations, only a small majority of survey respondents familiarize their patients with test words before obtaining an SRT. Only 51.4% of the respondents in the Martin and Morris (1989) survey, 55% in the Martin and Sides (1985) survey, 50.1% in the Martin and Forbis (1978) survey, and 50.71% in the Martin and Pennington (1971) survey indicated that they routinely familiarized their patients with test words before obtaining an SRT.

Another clinical practice issue is the use of a standard response criterion for the SRT. The ASHA (1988) guidelines, for example, specify that if the threshold search procedure incorporates 2-dB steps, the search should be terminated when the patient responds incorrectly to five of the last six words. No respondents in the latest Martin and Morris (1989) survey reported using this recommended criterion.

As noted earlier, the value of SRT measures is limited to a few clinical applications. However, if the measure is used, it should be used correctly. When SRT measures are used as a reliability check in cases of pseudohypoacusis, for example, strict adherence to specified procedures (e.g., ASHA, 1988) is no less important than it is in obtaining the tone audiogram (ASHA, 1978). The test procedure resulting in the most valid and reliable measure across clinics and test administrations should be used.

Word Recognition Tests
Speech materials. Across surveys, the materials most often used for assessment of speech recognition ability were the word lists used for the Central Institute for the Deaf (CID) W-22 recordings (Martin & Forbis, 1978; Martin & Morris, 1989; Martin & Pennington, 1971; Martin & Sides, 1985). These lists were used by 71.6% of respondents to the Martin and Pennington (1971) survey, 70.7% of respondents to the Martin and Forbis (1978) survey, 61% of the respondents to the Martin and Sides (1985) survey, and 54.2% of the respondents to the Martin and Morris (1989) survey. The second most popular set of materials reported in the latest two surveys was the Northwestern University Test No. 6 (NU-6) word lists. In the Martin and Morris (1989) survey, for example, the NU-6 word lists were used by 37.5% of respondents.

Despite their continued popularity, the CID W-22 or the NU-6 word lists, if used exclusively, may not be the best choice of speech materials for the clinical assessment of speech recognition abilities. Several previous studies have shown that these recorded word lists are not sensitive to differences in word recognition abilities or to differences in the type and degree of hearing loss (Carhart, 1965; Causey, Hermanson, Hood, & Bowling, 1983; Geffner & Donovan, 1974; Keith & Talis, 1972; Lovrinic, Burgi, & Curry, 1968; Maroonroge & Diefendorf, 1984; Schwartz & Surr, 1979; Sher & Owens, 1974; Silverman & Hirsh, 1955). Indeed, the developers of the original W-22 recordings noted this limitation. Hirsh et al. (1952) included the following footnote at the end of their original article on the W-22 recorded word lists:

While this paper was in preparation clinical trials of W-2 and W-22 were conducted in the Hofheimer Audiology Laboratory (Washington University) and in the Hearing Clinic of Central Institute for the Deaf. Experience to date indicates (1) that W-2 is very satisfactory for determining the threshold for speech, but (2) that W-22 does not satisfactorily separate patients with mixed deafness from patients with pure conductive deafness. The older recordings of the Egan lists are more effective in this respect. The reasons for this difference are now being sought. (p. 335)

A number of subsequent studies, some with large populations of listeners with hearing loss, demonstrated that most listeners exhibited similarly high scores on the W-22 in spite of significant differences in the degree of hearing loss (Carhart, 1965; Runge & Hosford-Dunn, 1985; Thornton & Raffin, 1978). In an early study, Carhart (1965) stated that the W-22 test “did very little” (p. 255) to differentiate performance for the majority of subjects. He reported that 60% of 170 veterans with hearing loss scored 90% or better for the W-22 recordings. Thornton and Raffin (1978) found that nearly half (2,057) of their 4,120 veterans with hearing loss had scores of 90% or better on the same test. Runge and Hosford-Dunn (1985) reported the same result (90% or better) for more than half of their subject sample and further noted that three fourths of their subjects had scores from 80–100%. The data pool for Runge and Hosford-Dunn included 508 ears with hearing loss and 136 ears with normal hearing.

Similar results also have been reported for the original NU-6 recordings. Walden, Prosek, and Worthington (1975), in a study of 3,000 members of the military, reported that a majority of those with hearing loss had scores within normal limits for the NU-6. Across subject groups with different degrees and configuration of hearing loss, the lowest mean NU-6 score was 95.7%. Similarly, Schwartz and Surr (1979) found that more than half of their subjects with varying degrees of hearing loss had word-recognition scores of 90% or greater for the NU-6.

In contrast to findings for the W-22 and NU-6 tests of word recognition, several studies (Causey et al., 1983;
Dubno, Dirks, & Langhofer, 1982; Eldert & Davis, 1951; Lovrinic et al., 1968; Schwartz & Surr, 1979; Thurlow, Davis, Silverman, & Walsh, 1949; Thurlow, Silverman, Davis, & Walsh, 1948) have shown that other speech materials may be more sensitive for differentiation of speech-recognition performance across listeners with hearing loss. Dubno et al. (1982), for example, reported that the Nonsense Syllable Test (NST) was sensitive to the effects of low-frequency as well as high-frequency hearing loss. Schwartz and Surr (1979) reported that the California Consonant Test (CCT) “was more sensitive in differentiating among individuals with a wide range of high frequency hearing loss than were the NU-6 lists” (p. 68). As another example, previous reports have indicated that, in contrast to W-22 and NU-6 tests, the Rush Hughes recordings of the Harvard PB-50 word lists effectively discriminate between types of hearing loss (Goetzinger, Proud, Dirks, & Embrey, 1961; Thurlow et al., 1949; Thurlow et al., 1948) and are also sensitive to disorders of the central auditory nervous system (Goetzinger, 1972; Goldstein, Goodman, & King, 1956). As a specific illustration, Goetzinger (1972) reported that W-22 scores were 90% or higher bilaterally in four cases of hemispheric pathology. In the same subjects, scores for the Rush Hughes recordings of the PB-50 were significantly reduced in the ear contralateral to the lesion. Thus, scores for the Rush Hughes PB-50 recordings were sensitive to the cortical pathology, and the W-22 test was not. Available data, then, suggest that the popular W-22 and NU-6 tests are relatively insensitive to differences in speech-recognition ability and are relatively insensitive to the presence of peripheral and central auditory disorders. Number of test items. For this topic area, audiologists were asked to indicate whether they used the entire 50 words in standardized word lists or a reduced number of test items. 
The Martin and Forbis (1978) survey reported that 16.6% of respondents presented 50 words to all patients, 32.3% presented 25 words to all patients, and 40.8% presented 25 words if the first 25 responses were correct. In the most recent survey (Martin & Morris, 1989), 5.9% of respondents presented 50 words to all patients, 56.9% presented 25 words to all patients, and 24.8% presented 25 words if the first 25 responses were correct. From 1978 to 1989, then, the percentage of respondents routinely using full, 50-word test lists decreased considerably.

The number of items used is important because variability in scores is inversely related to the number of items in the test (Hagerman, 1976; Raffin & Schafer, 1980; Raffin & Thornton, 1980; Thornton & Raffin, 1978). The use of fewer items results in increased variability about a score. This, in turn, decreases the sensitivity of the test in terms of delineating true differences in recognition performance. The variability about a score for different numbers of test words can be predicted from the binomial model (Hagerman, 1976; Thornton & Raffin, 1978). The binomial model can be applied to any independent binary event. For word recognition testing, the binary event is a correct or incorrect response to a test word. Properties of the binomial model, irrespective of the word list used, dictate that
variability about a score is inversely related to the number of test items. As the number of test items decreases, the variability about the test scores increases. The effect of item number also varies with the score itself; variability is greater for middle-range scores and smaller at the extremes of the range. As the variability about test scores increases, clinical interpretations of score differences suffer because judgments of performance differences must then be based upon a wider distribution of scores that are not significantly different. For example, if a score of 72% is obtained with a 50-word list, the range of scores that are not significantly different is 54–86%. If this same score is obtained using 25 words, the confidence interval increases to 48–92% (Thornton & Raffin, 1978). As a consequence, clinical decisions based on performance differences are compromised, and test sensitivity is diminished. If one is interested in using clinical procedures that are sensitive to small performance differences, the variability associated with the measure of interest should be minimized. Increasing the number of test items is one way of reducing variability.

Although the reason(s) for using half-lists is not reported, one might speculate that a primary reason is the increased test time required for complete list presentations. If current compact disc recordings are used, test time for W-22 half-lists is slightly less than 2 minutes. Presenting the second half of each list would take an additional 2 minutes of test time per ear. These additional 2 minutes may be time well spent, considering the added variability that shortened lists impose on measures that are relatively insensitive in the first place. Using arbitrarily shortened word lists may unreasonably exacerbate inherent weaknesses in test sensitivity. We are not necessarily advocating the use of longer word lists in clinical evaluations.
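The inverse relation between item count and score variability can be illustrated with a short sketch of the binomial model. This is a normal-approximation sketch of our own, not Thornton and Raffin's (1978) exact critical-difference procedure (which is based on arcsine-transformed scores), so the intervals it prints will not match their published table; it is only meant to show how the spread around a score widens as items are dropped.

```python
import math

def binomial_se(p, n):
    """Standard error of an observed proportion p on n independent binary items."""
    return math.sqrt(p * (1 - p) / n)

def approx_95_interval(p, n):
    """Rough 95% interval for a word-recognition score (normal approximation)."""
    half_width = 1.96 * binomial_se(p, n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# The interval around a 72% score widens as the word list is shortened.
for n_items in (50, 25, 10):
    lo, hi = approx_95_interval(0.72, n_items)
    print(f"{n_items:2d} items: ~{lo:.0%} to {hi:.0%}")

# Variability also peaks for mid-range scores and shrinks at the extremes:
assert binomial_se(0.50, 50) > binomial_se(0.95, 50)
```

Whatever the exact statistical machinery, the qualitative conclusion is the one drawn above: halving the list widens the band of scores that cannot be distinguished from one another.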
We are advocating the selection of tests based on diagnostic efficacy. In some cases, it may be argued that shorter lists are as efficacious as longer lists. The construction of shorter lists, however, should not be arbitrary, such as using the first or second half of a given word list. Rather, the decision to use a shortened word list should be based on available evidence and sound objectives. Shorter word lists that are rank-ordered, based on error analyses and careful evaluation in large clinical populations, may offer a plausible alternative to full word list presentation (for example, see Qualitone, 1988, and Runge & Hosford-Dunn, 1985). The commercially available Q-Mass Ten-Item Test (Qualitone, 1988) produced by Raffin and Thornton, for example, may offer a useful clinical option in certain settings. This is a 10-item test using the words most sensitive to differences in word recognition performance across listeners. Here, the number of test items has been reduced considerably. Decisions on the number of items and the specific items selected, however, were based on clinical data and error analyses for 24,973 patients, not solely on the basis of time or on the appearance of items in the first or last half of a word list.

Presentation mode. Most audiologists continue to use monitored live voice (MLV) for word recognition tests (Martin & Morris, 1989). Martin and Forbis (1978) reported that “most” (p. 533) audiologists used MLV, but
did not report the actual percentage. Martin and Pennington (1971) reported that 64.4% used MLV, and, in the most recent survey (Martin & Morris, 1989), 70.3% of respondents used MLV for word recognition tests. This practice continues in spite of reports demonstrating that word recognition scores obtained for different talkers or the same talker at different times are not equivalent (Brandy, 1966; Creelman, 1957; Mullennix, Pisoni, & Martin, 1989; Penrod, 1979; Resnick, 1962). In fact, each speaker of word recognition materials constitutes a different test that may produce different psychometric articulation functions (percent correct performance as a function of presentation level). Penrod (1979), for example, found that 43% of his subjects had score variations across talkers exceeding the 95% confidence limits predicted by the binomial model (Thornton & Raffin, 1978). Half of his subjects (15 of 30) had scores differing by 15% or more across talkers. Penrod (1979) concluded that “Such excessive differences are clearly not within the range of acceptable clinical variation…” (p. 346). Two fundamental issues that underlie the limitations of live voice audiometry are: (a) the speech signal is characterized by considerable variability in its acoustic composition (Mullennix et al., 1989), and (b) item difficulty is at least partially related to the talker (Hood & Poole, 1980; Penrod, 1979). The acoustic properties of speech are affected by numerous and variable phonetic and prosodic characteristics, as well as by characteristics of the source (Goldinger, Pisoni, & Logan, 1991). Such source characteristics include structural differences, differences in glottal source function, and other various talker-specific factors. Talker-specific sources of the acoustic signal include vocal force or intensity, pitch, articulation, individual dialects, and speaking rates (Fairbanks & Miron, 1957; Goldinger et al., 1991; Hirsh, Reynolds, & Joseph, 1954). 
Different sources (talkers) significantly influence word recognition scores (Kreul, Bell, & Nixon, 1969; Kreul et al., 1968; Mullennix et al., 1989; Penrod, 1979). This also is true for scores obtained for the same speaker from one trial to the next. Brandy (1966), for example, reported significant differences in word recognition scores obtained for the same male speaker on different days under identical conditions. Although in some instances the use of MLV may be quicker than using recorded materials, it comes with a significant sacrifice of validity and reliability. Such clinical practice obviates standardization from test to test and clinic to clinic. Tillman and Olsen (1973) perhaps summarized the issue best in stating “it is obvious that no standardized test is possible unless recorded tests are employed, that is, because of talker differences tests administered via monitored-live-voice defy standardization” (p. 53).

Presentation level. Across surveys, a clear majority of audiologists continue to use a variety of methods to select one presentation level for word recognition tests. In the most recent survey (Martin & Morris, 1989), 74% of respondents reported using “a specific sensation level in reference to the SRT” (p. 27). Over half of these respondents used 40 dB re SRT, and 30% used 30 dB re SRT. Of all respondents in the 1989 survey, 22% reported using
other presentation levels. Some of these respondents reported screening for rollover using two presentation levels (e.g., MCL and 90 dB HL). Martin and Morris (1989) did not report the percentage of respondents, if any, that routinely obtained psychometric articulation functions. In previous surveys (Martin & Forbis, 1978; Martin & Pennington, 1971; Martin & Sides, 1985), however, only a small minority (3.2–4.9%) of audiologists reported obtaining such psychometric functions.

The basic assumption underlying the use of a single level for word recognition tests is that the level chosen is one at which the maximum word recognition score, or PB Max, will be obtained. This assumption, in turn, is based on knowledge or assumed knowledge regarding the characteristics of the psychometric articulation function for the test materials used and for the listener under test. Research over the past four decades, however, has indicated that the level at which PB Max is achieved differs substantially across patients. Both early research (Carhart, 1965; Clemis & Carver, 1967; Davis, 1948; Eldert & Davis, 1951; Schultz & Streepy, 1967; Yantis, Millin, & Shapiro, 1966) and more recent studies (Beattie & Zipp, 1990; Coles, 1972; Jerger & Hayes, 1977; Kamm, Morgan, & Dirks, 1983; Posner & Ventry, 1977; Ullrich & Grimm, 1976) have shown that the use of a single presentation level precludes obtaining PB Max in a large number of listeners tested. Kamm, Morgan, and Dirks (1983), for example, reported that a presentation level of 40 dB SL did not yield scores on the plateau (PB Max) portion of the psychometric articulation function in 40% of their listeners. In an earlier study, Yantis et al. (1966) showed that the level at which PB Max was obtained varied across subjects with sensorineural hearing loss. Accordingly, the authors advised that PB Max will not always be found by testing at one specific level.
Several past studies also have shown that using the most comfortable listening level or most comfortable loudness level as the presentation level does not assure obtaining PB Max (Clemis & Carver, 1967; Posner & Ventry, 1977; Ullrich & Grimm, 1976). Clemis and Carver (1967) noted little relation between the level at which PB Max was observed and the most comfortable listening level (MCL). In all but one of their patients, MCLs were below the level at which PB Max was obtained. And, in no patient was PB Max obtained at the MCL. Ullrich and Grimm (1976) reported that for a majority of their listeners with sensorineural hearing loss, MCL presentations did not allow the maximum score to be obtained. Indeed, they typically found that MCLs were below the level at which the maximum score was achieved.

Although the use of a single presentation level for estimates of maximum word recognition ability (PB Max) is a popular clinical practice, it is not supported by reported data for subjects with sensorineural hearing loss. The single-level, PB Max approach is based on an assumed, idealized psychometric articulation function that is not appropriate in application for a large number of clinical patients. Carhart (1965) characterized the limitations in using one presentation level by stating that “the clinician who tests only at one presentation level can be sure that he
has a valid estimate of a person’s maximum discrimination score only if the score approximates 100%” (p. 256). Similarly, Jerger and Hayes (1977) noted that in some cases, “the exact shape of the PI function is so unpredictable that the speech level producing maximum performance can seldom be accurately estimated a priori” (p. 216). Even if a true PB Max is obtained at a given suprathreshold presentation level, this finding does not necessarily permit description of an individual’s overall word recognition ability. Two individuals may exhibit similar or identical PB Max scores at a fixed presentation level, for example, yet one individual may demonstrate rollover at higher levels and the other may not (Dirks, Kamm, Bower, & Betsworth, 1977; Jerger & Hayes, 1977; Jerger & Jerger, 1971). Errors in estimation of a person’s optimal word recognition ability, based on recognition measures at a single suprathreshold level, may have an important impact on patient management. Such errors may compromise clinical judgments and decisions regarding patients’ communication difficulties, patients’ performance with and without a hearing aid, the need for further diagnostic tests, and other diagnostic and prognostic conclusions.

Tone Decay Tests

Use of tone decay tests was reported in the 1978, 1985, and 1989 surveys (Martin & Forbis, 1978; Martin & Morris, 1989; Martin & Sides, 1985). In the summary of the latest survey by Martin and Morris (1989), the authors noted that 94% of the 1978 respondents administered tone decay tests on a routine basis, compared with 80% in 1985 and 75% in 1989. Compared with other behavioral tests for site of lesion (such as Bekesy audiometry and loudness balance procedures), for which reported use has declined markedly over the past 20 years, tone decay tests have remained relatively popular. Despite past evidence showing clear differences in diagnostic sensitivity across tone decay procedures (Johnson, 1977; Olsen & Noffsinger, 1974; Parker & Decker, 1971; Silman, Gelfand, Lutolf, & Chun, 1980; Wiley & Lilly, 1980), however, the surveys suggest that a number of audiologists continue to use insensitive versions of tone decay tests.

A primary characteristic that distinguishes tone decay tests is the criterion time for test termination. For the 1989 survey, 48.9% of the respondents reported that they continued the test until the patient heard the continuous tone for 60 seconds (or until the limits of the audiometer were reached), 48.3% administered the test for a total of 60 seconds regardless of audibility, and a few respondents (2.7%) used other, nonspecified termination criteria. Tone decay tests that require increases in signal level until a tone is audible for a full 60 seconds at one level include the Carhart (1957) Threshold Tone Decay Test (TTDT) and the procedure described by Olsen and Noffsinger (1974). Two of the more popular shortened procedures, which terminate at 60 seconds regardless of audibility, are the Suprathreshold Tone Decay Test (STAT) (Jerger & Jerger, 1975) and the procedure described by
Rosenberg (1958). The amount of measured tone decay and the diagnostic hit rates differ significantly for these two classes of tests. Parker and Decker (1971) reported large discrepancies between the results of the Carhart and Rosenberg procedures. As an example, they reported a case study of VIIIth nerve pathology for which the Rosenberg test yielded 12.5 dB of decay, and the Carhart test produced 85 dB of decay. This large difference in the amount of tone decay for the two procedures likely relates to the rate at which tone decay develops. Decay must be relatively rapid to reach critical amounts in one minute. There are, however, common instances where decay is initially slow and then becomes more rapid at higher levels, eventually extending beyond the limits of the audiometer (Parker & Decker, 1971). This underscores the greater sensitivity of tests that cover a wide range of presentation levels and include a 60-second termination criterion at a single level.

Two other studies evaluated the test sensitivity or hit rate for tone decay procedures with different termination criteria (Johnson, 1977; Olsen & Noffsinger, 1974). These studies also demonstrated better sensitivity for procedures that continued testing until the tone was audible for 60 seconds at one level (or the audiometer limit was reached) compared to that for the Rosenberg (1958) test. Olsen and Noffsinger (1974), for example, reported a 65% hit rate (and a 35% false negative rate) for the Rosenberg 60-second procedure. In the same study, however, Olsen and Noffsinger reported a 95% hit rate for the TTDT procedure and for their modified TTDT procedure that begins at 20 dB SL. Finally, a lower diagnostic hit rate also has been reported for the STAT procedure. Jerger and Jerger (1975), for example, reported hit rates of 45–75% for the STAT procedure, depending on the number of frequencies used in defining abnormal tone decay.
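The procedural difference between the two classes of tests can be sketched schematically. The code below is our illustrative model, not a published clinical protocol: the function names, the 5 dB step, and the toy patient function are all assumptions, chosen to mimic the pattern described above of slow early decay that accelerates at higher levels.

```python
# Schematic models (ours, not clinical protocols) of the two classes of
# tone decay tests described in the text. `audibility(level)` stands in
# for the patient: the number of seconds a continuous tone presented at
# `level` dB HL remains audible. The 5 dB step size is an assumption.

def threshold_style_decay(audibility, threshold_db, limit_db, step_db=5):
    """Carhart-style criterion: raise the level until the tone stays
    audible for a full 60 s at one level, or the audiometer limit is
    reached. Returns total decay in dB re threshold."""
    level = threshold_db
    while level <= limit_db:
        if audibility(level) >= 60:
            return level - threshold_db
        level += step_db
    return limit_db - threshold_db  # decay extends beyond audiometer limits

def fixed_window_decay(audibility, threshold_db, limit_db, step_db=5):
    """Shortened (Rosenberg-style) criterion: run for 60 s in total,
    raising the level each time the tone fades; decay is the net level
    increase reached within the window."""
    elapsed, level = 0.0, threshold_db
    while elapsed < 60 and level <= limit_db:
        heard = min(audibility(level), 60 - elapsed)
        elapsed += heard
        if elapsed >= 60:
            break
        level += step_db
    return min(level, limit_db) - threshold_db

# Toy patient: decay is slow near threshold (tone heard 50 s per level)
# but rapid above it (only 5 s), and the tone never stabilizes for a
# full minute at any level.
patient = lambda level: 50 if level < 40 else 5

full = threshold_style_decay(patient, threshold_db=20, limit_db=105)  # 85 dB
short = fixed_window_decay(patient, threshold_db=20, limit_db=105)    # 5 dB
```

On this toy patient, the shortened 60-second window records only 5 dB of decay while the threshold-style test follows the decay to the audiometer limit (85 dB), mirroring the 12.5 dB versus 85 dB discrepancy Parker and Decker reported.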
As noted earlier, the surveys show that shortening the tone decay test by terminating it after 60 seconds is popular among audiologists. Unfortunately, based on available evidence, abnormal tone decay will in many cases be underestimated or entirely missed by that procedure. Published reports indicate that the shortened tone decay tests are associated with higher false negative rates, lower hit rates, and minimal changes in false positive rates (Jerger & Jerger, 1975; Johnson, 1977; Olsen & Noffsinger, 1974). The time invested in continuing the test until the tone is heard for a full minute is essential to maintain the sensitivity of the test. If there is little or no tone decay, the extra time will be minimal. In cases of significant tone decay, the extra time necessary to characterize the abnormal decay process will be a good investment in terms of differential diagnosis.

Another aspect of tone decay that is potentially valuable in differential diagnosis is its time course. Although not specifically addressed in the surveys of audiologic practice, the surveys do suggest indirectly that a significant number of audiologists do not evaluate the rate of tone decay. That is, the time to inaudibility as a function of presentation level is not considered. The procedural design of the popular Rosenberg (1958) procedure and of the STAT (Jerger & Jerger, 1975) procedure obviates a complete evaluation of the decay time course. However, it has long been known that the rate of tone decay may be useful in differential diagnosis. The existence of different rate patterns of tone decay was noted as early as 1944 by Schubert, and in 1957 by Carhart. The diagnostic significance of rate of decay was further documented by Sorensen (1960) and by Owens (1964). Three major time patterns of tone decay emerged from these studies: (a) no decay (normal listeners), (b) progressively slower decay as presentation level is increased (predominantly patients with cochlear disorders), and (c) brief audibility time at each level with no change in rapidity of decay as presentation level is increased (predominantly VIIIth nerve lesions).

Although it is clear that the amount of decay in dB provides sufficient diagnostic information in many cases of VIIIth nerve lesions (Olsen & Noffsinger, 1974), cases have been reported for which the amount of decay alone was misleading and the correct diagnosis could be made only by considering rate of decay. Wiley and Lilly (1980) reported the case of a patient with a surgically confirmed VIIIth nerve tumor on the left side and a sensorineural hearing loss of cochlear origin on the right. Preoperatively, the amount of tone decay in dB recorded from the left ear was limited by a small dynamic range and did not reach a level considered to be indicative of VIIIth nerve involvement. The right ear, having better hearing thresholds, actually demonstrated twice as much decay (in dB) as the tumor ear. The tone decay in the right ear resulted from cochlear disease, however, rather than from an VIIIth nerve lesion. In contrast, analysis of the temporal course of decay in the two ears revealed that decay was much faster in the left ear. Indeed, the patient heard the test tone in the left ear for only a few seconds at all test levels. Later the same year, Silman et al.
(1980) reported a case supporting the same general thesis regarding the diagnostic value of decay time course.

The extent to which audiologists make use of temporal patterns of tone decay was not directly addressed in the surveys. Because shortened tone decay tests, which do not incorporate an analysis of decay time course, were popular among a significant number of respondents, it is clear that a substantial number of audiologists do not consider the temporal characteristics of tone decay in making diagnostic decisions. Available evidence, however, suggests that the time required to record temporal information may be well spent. In some cases, the differentiation of tone decay processes for cochlear and VIIIth nerve disorders is most evident for temporal or rate measures.

Although the use of tone decay tests decreased somewhat between 1978 and 1989 (Martin & Forbis, 1978; Martin & Morris, 1989), they are still used by a substantial majority of audiologists. Despite the established superiority of procedures that require a 60-second termination criterion at a single level, however, the surveys show that a considerable number of audiologists use a shortened procedure that seriously compromises test sensitivity. The improved diagnostic efficacy of procedures that include data for a wide range of test levels and an analysis of the decay time course apparently has been set aside in favor of time savings. In the case of tone decay tests, if time is the primary concern, it is probably wiser to skip the test altogether than to use an insensitive version of it.
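The three time patterns described above amount to a simple rule over the time-to-inaudibility data that the shortened procedures never collect. The sketch below is illustrative only; the cutoff values (60 s, 10 s) and the sample time courses are hypothetical, not values drawn from the cited studies:

```python
# Toy classification of a tone decay time course into the three patterns
# summarized in the text (Sorensen, 1960; Owens, 1964).  Cutoffs and data
# are hypothetical.
def classify_decay_pattern(audible_times):
    """audible_times maps presentation level (dB SL) -> seconds the tone
    remained audible before fading."""
    times = [audible_times[lvl] for lvl in sorted(audible_times)]
    if all(t >= 60 for t in times):
        return "no decay (normal)"
    if all(t <= 10 for t in times):
        # uniformly brief audibility, no slowing as level rises
        return "brief audibility at all levels (VIIIth nerve pattern)"
    if times[-1] > times[0]:
        # audibility time lengthens with level: decay slows as level rises
        return "progressively slower decay (cochlear pattern)"
    return "indeterminate"

# Hypothetical time courses at 5, 10, 15, and 20 dB SL:
print(classify_decay_pattern({5: 60, 10: 60, 15: 60, 20: 60}))
print(classify_decay_pattern({5: 12, 10: 25, 15: 40, 20: 55}))
print(classify_decay_pattern({5: 4, 10: 5, 15: 4, 20: 5}))
```

Note that the second and third cases could show similar total decay in dB; it is the shape of the time course, not the amount, that separates them — the point of the Wiley and Lilly (1980) case described above.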

Discussion and Conclusions

Our review of two areas of common audiologic practice indicates clear disparities between research evidence and clinical methods. Further, our review suggests that some of the questionable practices have changed little over the survey period from 1971 through 1989. Certainly, some of the inappropriate practices are more critical than others. Our purpose here was to exemplify the gap between knowledge and practice, not to inventory or prioritize disparities. Viewing our review topics as reference points, however, it is clearly possible that such disparities exist for other clinical practices. As a general rule, any and all clinical practices that are contrary to valid published data diminish the quality of service and compromise the credibility of our discipline (Wiley, 1988).

The exact basis or bases for the use of inappropriate clinical practices is unclear, but the issue of time is likely central to many of the shortcuts used in clinical practice. As we have noted, the time savings associated with certain test modifications and shortened procedures are often minimal. Many of these shortcuts, however, are accompanied by decreased test efficacy and, therefore, decreased quality of service. In many cases, the overall value of our diagnostic evaluations could be improved by eliminating tests having little value and adding more sensitive measures and procedures. The first and primary criterion for test selection or modification should be data-based efficacy. High-quality patient care should be our goal. As Wilson and Margolis (1983) comment, "If patients are expected to spend their time and money in the audiology clinic, then they deserve the benefit of an ongoing critical evaluation of clinical procedures" (p. 122). The quality of diagnostic evaluations should not be sacrificed in the interests of time savings.
Indeed, in some cases the elimination of shorter, but less sensitive, measures may allow sufficient time for the use of efficacious procedures that require more time.

The inadequate integration between research and practice may, at least in part, speak to expressed weaknesses in current audiology training programs. As a result of these weaknesses, graduates entering clinical practice may not only lack adequate background in state-of-the-art clinical techniques, but also may lack the research orientation and skills needed for effective discovery and for progress in new and better directions. Training must be centered on what is proven, not on what is popular. This issue must be a primary concern for instructors in audiology training programs and for practicing clinicians. The clear message is that we must all renew our efforts to be research consumers (Cornett & Chabon, 1988; Kent, 1985). The development of proper attitudes toward research and technological advances will be essential for audiologists in the coming decades as our knowledge base and associated technologies expand.

Interestingly, although the Educational Standards Board (ESB) of the American Speech-Language-Hearing Association requires a curricular "commitment to the scientific and research bases of the profession" (ASHA, 1992, p. 18) for ESB-accredited training programs, the word research does not appear in the Standards for the Certificates of Clinical Competence (CCC) recommended by the American Speech-Language-Hearing Association (1990). Research training and experiences are not specified CCC requirements, nor are they a programmatic aspect of the curriculum originally proposed by the Academy of Dispensing Audiologists for the Doctor of Audiology (AuD) degree (1988), which is aimed at the preparation of master clinicians. The lack of appropriate emphasis on and exposure to research principles is counter to excellence in scholarship, is counter to the development of the problem-solving skills essential for clinical work, and likely is fundamental to existing gaps between current knowledge and clinical practices. In the interests of patient care and of student training, we must all continue to read, to learn, and to practice what we have learned.

Acknowledgments

The authors would like to thank Robert Goldstein, Michelle Quinn, and Dee Vetter for their comments and suggestions on the manuscript.

References

Academy of Dispensing Audiologists. (1988). Proceedings: Academy of Dispensing Audiologists Conference on Professional Education (pp. 1–52). Chicago: Author.
American Speech-Language-Hearing Association. (1978). Guidelines for manual pure-tone threshold audiometry. Asha, 20(4), 297–301.
American Speech-Language-Hearing Association. (1988). Guidelines for determining threshold level for speech. Asha, 30(3), 85–88.
American Speech-Language-Hearing Association. (1990). Standards for the certificates of clinical competence. Asha, 32(3), 111–112.
American Speech-Language-Hearing Association. (1992). Educational Standards Board 1992 accreditation manual. Rockville, MD: Author.
Beattie, R. C., & Zipp, J. A. (1990). Range of intensities yielding PB Max and the threshold for monosyllabic words for hearing-impaired subjects. Journal of Speech and Hearing Disorders, 55(8), 417–426.
Brandy, W. T. (1966). Reliability of voice tests of speech discrimination. Journal of Speech and Hearing Research, 9(3), 461–465.
Burney, P. A. (1972). A survey of hearing-aid evaluation procedures. Asha, 14(9), 439–444.
Carhart, R. (1957). Clinical determination of abnormal auditory adaptation. Archives of Otolaryngology, 65(1), 32–39.
Carhart, R. (1965). Problems in the measurement of speech discrimination. Archives of Otolaryngology, 82(9), 253–260.
Causey, G. D., Hermanson, C. L., Hood, L. J., & Bowling, L. S. (1983). A comparative evaluation of the Maryland NU-6 auditory test. Journal of Speech and Hearing Disorders, 48(1), 62–69.
Clemis, J., & Carver, W. (1967). Discrimination scores for speech in Meniere's disease. Archives of Otolaryngology, 86(6), 614–618.

Coles, R. (1972). Can present day audiology really help in diagnosis? An otologist's question. Journal of Laryngology and Otology, 86(3), 191–224.
Conn, M., Dancer, J., & Ventry, I. M. (1975). A spondee list for determining speech reception threshold without prior familiarization. Journal of Speech and Hearing Disorders, 40(3), 388–396.
Cornett, B. S., & Chabon, S. S. (1988). Platitudes on attitudes. The clinical practice of speech-language pathology (Chapter 3, pp. 39–56). Columbus: Merrill.
Creelman, C. D. (1957). Case of the unknown talker. Journal of the Acoustical Society of America, 29(5), 655.
Davis, H. (1948). The articulation area and the social adequacy index for hearing. Laryngoscope, 58(8), 761–778.
Dirks, D. D., Kamm, C., Bower, D., & Betsworth, A. (1977). Use of performance-intensity functions for diagnosis. Journal of Speech and Hearing Disorders, 42, 408–415.
Dubno, J. R., Dirks, D. D., & Langhofer, L. R. (1982). Evaluation of hearing-impaired listeners using a nonsense-syllable test. II. Syllable recognition and consonant confusion patterns. Journal of Speech and Hearing Research, 25(1), 141–148.
Eldert, E., & Davis, H. (1951). The articulation function of patients with conductive deafness. Laryngoscope, 61(9), 891–909.
Fairbanks, G., & Miron, M. S. (1957). Effects of vocal effort upon the consonant-vowel ratio within the syllable. Journal of the Acoustical Society of America, 29(5), 621–626.
Geffner, D., & Donovan, N. (1974). Intelligibility functions of normal and sensorineural loss subjects on the W-22 lists. Journal of Auditory Research, 14(1), 82–86.
Goetzinger, C. P. (1972). Word discrimination testing. In J. Katz (Ed.), Handbook of clinical audiology (Chapter 9, pp. 157–179). Baltimore: Williams & Wilkins.
Goetzinger, C. P., Proud, G. O., Dirks, D., & Embrey, J. (1961). A study of hearing in advanced age. Archives of Otolaryngology, 73(6), 60–72.
Goldinger, S. D., Pisoni, D. B., & Logan, J. S. (1991). On the nature of talker variability effects on recall of spoken word lists. Journal of Experimental Psychology: Learning, Memory and Cognition, 17(1), 152–162.
Goldstein, R., Goodman, A. C., & King, R. B. (1956). Hearing and speech in infantile hemiplegia before and after left hemispherectomy. Neurology, 6(12), 869–875.
Hagerman, B. (1976). Reliability in the determination of speech discrimination. Scandinavian Audiology, 5(4), 219–228.
Hirsh, I. J., Davis, H., Silverman, S. R., Reynolds, E. G., Eldert, E., & Benson, R. W. (1952). Development of materials for speech audiometry. Journal of Speech and Hearing Disorders, 17(3), 321–337.
Hirsh, I. J., Reynolds, F. G., & Joseph, M. (1954). Intelligibility of different speech materials. Journal of the Acoustical Society of America, 26(4), 530–538.
Hood, J. D., & Poole, J. P. (1980). Influence of the speaker and other factors affecting speech intelligibility. Audiology, 19(5), 434–455.
Jerger, J., & Hayes, D. (1977). Diagnostic speech audiometry. Archives of Otolaryngology, 103(4), 216–222.
Jerger, J., & Jerger, S. (1971). Diagnostic significance of PB word functions. Archives of Otolaryngology, 93(6), 573–580.
Jerger, J., & Jerger, S. (1975). A simplified tone decay test. Archives of Otolaryngology, 101(7), 403–407.
Johnson, E. W. (1977). Auditory test results in 500 cases of acoustic neuroma. Archives of Otolaryngology, 103(3), 152–158.
Kamm, C. A., Morgan, D. E., & Dirks, D. D. (1983). Accuracy of adaptive procedure estimates of PB-Max level. Journal of Speech and Hearing Disorders, 48(5), 202–209.
Keith, R. W., & Talis, H. P. (1972). The effects of white noise on PB scores of normal and hearing-impaired listeners. Audiology, 11(3), 177–186.
Kent, R. D. (1985). Science and the clinician: The practice of science and the science of practice. Seminars in Speech and Language, 6(1), 1–12.
Kreul, E. J., Bell, D. W., & Nixon, J. C. (1969). Factors affecting speech discrimination test difficulty. Journal of Speech and Hearing Research, 12(2), 281–287.
Kreul, E. J., Nixon, J. S., Kryter, K. D., Bell, D. W., Lang, J. S., & Schubert, E. D. (1968). A proposed clinical test of speech discrimination. Journal of Speech and Hearing Research, 11(3), 536–552.
Lovrinic, J. H., Burgi, E. J., & Curry, E. T. (1968). A comparative evaluation of five speech discrimination measures. Journal of Speech and Hearing Research, 11(2), 372–381.
Maroonroge, S., & Diefendorf, A. (1984). Comparing normal hearing and hearing-impaired subjects' performance on the Northwestern Auditory Test Number 6, California Consonant Test, and Pascoe's High-Frequency Word Test. Ear and Hearing, 5(6), 356–360.
Martin, F. N., & Forbis, N. K. (1978). The present status of audiometric practice: A follow-up study. Asha, 20(7), 531–541.
Martin, F. N., & Gravel, K. L. (1989). Pediatric audiologic practices in the United States. The Hearing Journal, 42(8), 33–48.
Martin, F. N., & Morris, L. J. (1989). Current audiologic practices in the United States. The Hearing Journal, 42(4), 25–44.
Martin, F. N., & Pennington, C. D. (1971). Current trends in audiometric practices. Asha, 13(11), 671–677.
Martin, F. N., & Sides, D. G. (1985). Survey of current audiometric practices. Asha, 27(2), 29–36.
Mullennix, J. W., Pisoni, D. B., & Martin, C. S. (1989). Some effects of talker variability on spoken word recognition. Journal of the Acoustical Society of America, 85(1), 365–378.
Olsen, W. O., & Noffsinger, D. (1974). Comparison of one new and three old tests of auditory adaptation. Archives of Otolaryngology, 99(2), 94–99.
Owens, E. (1964). Tone decay in VIIIth nerve and cochlear lesions. Journal of Speech and Hearing Disorders, 29(1), 14–22.
Parker, W., & Decker, R. L. (1971). Detection of abnormal auditory threshold adaptation (ATA). Archives of Otolaryngology, 94(1), 1–7.
Pennington, C. D., & Martin, F. N. (1972). Current trends in audiometric practices: Part II. Auditory tests for site of lesion. Asha, 14(4), 199–203.
Penrod, J. P. (1979). Talker effects on word discrimination scores of adults with sensorineural hearing impairment. Journal of Speech and Hearing Disorders, 44(3), 340–349.
Posner, J., & Ventry, I. M. (1977). Relationships between comfortable loudness levels for speech and speech discrimination in sensorineural hearing loss. Journal of Speech and Hearing Disorders, 42(3), 370–375.
Qualitone. (1988). Q/MASS speech audiometry (Vol. 1). Minneapolis: Author.
Raffin, M., & Schafer, D. (1980). Application of a probability model based on the binomial distribution to speech-discrimination scores. Journal of Speech and Hearing Research, 23(3), 570–575.
Raffin, M., & Thornton, A. (1980). Confidence levels for differences between speech-discrimination scores: A research note. Journal of Speech and Hearing Research, 23(1), 5–18.
Resnick, D. M. (1962). Reliability of the twenty-five word phonetically balanced lists. Journal of Auditory Research, 2(1), 5–12.
Rosenberg, P. E. (1958). Rapid clinical measurement of tone decay. Paper presented at the American Speech and Hearing Association Convention, New York.
Runge, C. A., & Hosford-Dunn, H. (1985). Word recognition performance with modified CID W-22 word lists. Journal of Speech and Hearing Research, 28(9), 355–362.
Schubert, K. (1944). Hörermüdung und Hördauer. Zeitschrift für Hals-, Nasen- und Ohrenheilkunde, 51, 19–74.
Schultz, M. C., & Streepy, C. S. (1967). The speech discrimination function in loudness recruiting ears. Laryngoscope, 77(12), 2114–2127.
Schwartz, D. M., & Surr, R. K. (1979). Three experiments on the California Consonant Test. Journal of Speech and Hearing Disorders, 44(1), 61–72.
Sher, A. E., & Owens, E. (1974). Consonant confusion associated with hearing loss above 2000 Hz. Journal of Speech and Hearing Research, 17(4), 669–681.
Silman, S., Gelfand, S. A., Lutolf, J., & Chun, T. H. (1980). A response to Wiley and Lilly [Letter to the editor]. Journal of Speech and Hearing Disorders, 46(2), 217.
Silverman, S. R., & Hirsh, I. J. (1955). Problems related to the use of speech in clinical audiometry. Annals of Otology, Rhinology and Laryngology, 64(4), 1234–1244.
Sorensen, H. (1960). A threshold tone decay test. Acta Oto-Laryngologica, Suppl. 158, 356–360.
Thornton, A., & Raffin, M. (1978). Speech-discrimination scores modeled as a binomial variable. Journal of Speech and Hearing Research, 21(3), 507–518.
Thurlow, W., Davis, H., Silverman, S., & Walsh, T. (1949). Further statistical study of auditory tests in relation to the fenestration operation. Laryngoscope, 59(2), 113–129.
Thurlow, W., Silverman, S., Davis, H., & Walsh, T. (1948). A statistical study of auditory tests in relation to the fenestration operation. Laryngoscope, 58(1), 43–66.
Tillman, T. W., & Jerger, J. (1959). Some factors affecting the threshold in normal hearing subjects. Journal of Speech and Hearing Research, 2(2), 141–146.
Tillman, T. W., & Olsen, W. O. (1973). Speech audiometry. In J. Jerger (Ed.), Modern developments in audiology (2nd ed., Chapter 2, pp. 37–74). New York: Academic Press.
Ullrich, K., & Grimm, D. (1976). Most comfortable listening level presentation versus maximum discrimination for word discrimination material. Audiology, 15(4), 338–347.
Walden, B. E., Prosek, R. A., & Worthington, D. W. (1975, August 31). The prevalence of hearing loss within selected U.S. Army branches (Interagency No. IAO 4745, pp. 1–94). Washington, DC: U.S. Army Medical Research and Development Command.
Wiley, T. L. (1988). Curricular arithmetic: How do we add and subtract? Proceedings of the Ninth Annual Conference on Graduate Education (pp. 36–42). Minneapolis: Council of Graduate Programs in Communication Sciences and Disorders.
Wiley, T. L., & Lilly, D. J. (1980). Temporal characteristics of auditory adaptation: A case report. Journal of Speech and Hearing Disorders, 45(2), 209–215.
Wilson, R. H., & Margolis, R. H. (1983). Measurements of auditory thresholds for speech stimuli. In D. F. Konkle & W. F. Rintelmann (Eds.), Principles of speech audiometry (Chapter 5, pp. 79–126). Baltimore: University Park Press.
Yantis, P., Millin, J., & Shapiro, I. (1966). Speech discrimination in sensorineural hearing loss: Two experiments on the role of intensity. Journal of Speech and Hearing Research, 9(2), 178–193.

Received November 24, 1993
Accepted April 12, 1994

Contact author: Terry L. Wiley, Department of Communicative Disorders, University of Wisconsin–Madison, 1975 Willow Drive, Madison, WI 53706

Key Words: audiometry, diagnostic, speech audiometry, tone decay

American Journal of Audiology • Vol. 4 • No. 1 • March 1995