INT J TUBERC LUNG DIS 9(3):301–305 © 2005 The Union

Evaluation of new external quality assessment guidelines involving random blinded rechecking of acid-fast bacilli smears in a pilot project setting in Mexico

A. Martinez,* S. Balandrano,* A. Parissi,† A. Zuniga,‡ M. Sanchez,‡ J. Ridderhof,§ H. B. Lipman,§ B. Madison§

* Instituto de Diagnóstico y Referencia Epidemiológicos, Mexico, DF, † Laboratorio Estatal de Salud Pública, Veracruz, ‡ Laboratorio Estatal de Salud Pública, Pachuca, Hidalgo, Mexico; § Centers for Disease Control and Prevention, Atlanta, Georgia, USA

SUMMARY

SETTING: Laboratories in Mexico that support the national tuberculosis (TB) control program have been involved in an acid-fast bacilli (AFB) microscopy external quality assurance program which includes rechecking 100% of smears identified as AFB-positive by the local laboratories and 10% of smears identified as AFB-negative. Very few errors have been detected in Mexico using non-random selection and unblinded rechecking of the slides.

OBJECTIVE: To evaluate the results from a 1-year pilot program involving blinded rechecking of randomly selected AFB slides from local TB laboratories in two Mexican states and determine its feasibility for future implementation.

DESIGN: To reduce potential bias, laboratory staff from the National TB Laboratory, Institute for Epidemiological Diagnosis and Reference (InDRE), performed quarterly statistical sampling of AFB smears and on-site evaluations in local laboratories in each state. AFB smears were rechecked at the respective state laboratories with discordant results resolved at InDRE.

RESULTS: A significantly greater percentage of errors was detected on the randomly selected, blinded AFB smears than on the non-randomly selected, unblinded smears.

CONCLUSION: Random blinded rechecking provides more accurate estimates of AFB microscopy results, resulting in improved diagnosis and monitoring of treatment response.

KEY WORDS: quality assurance; laboratory testing; AFB microscopy; tuberculosis

THE INCIDENCE of tuberculosis (TB) in Mexico in 2002 ranged from 4 to 37 cases per 100 000 population; approximately 16 000 new cases were diagnosed nationwide, 80% of which were pulmonary TB. Direct sputum-smear examination for acid-fast bacilli (AFB) by light microscopy is the most cost-effective and commonly used procedure for diagnosing TB and monitoring progress in its treatment.1 The World Health Organization (WHO) strategy for TB control, DOTS, relies on a network of laboratories to provide readily accessible, quality laboratory services to prevent the spread of infection in the community and to prevent unnecessary treatment of people who do not have TB. The Mexican Secretary of Health TB laboratory network consists of more than 600 laboratories that provide diagnostic services for 40% (39 million) of Mexico’s population and perform 75% of all diagnostic tests for TB.

The Instituto de Diagnóstico y Referencia Epidemiológicos (InDRE), the National TB Reference Laboratory for the network in Mexico, is in the process of implementing new methods of external quality assurance (EQA) for improving AFB smear microscopy. For several years, rechecking has been the routine quality control method for AFB smear microscopy as recommended by the WHO and the International Union against Tuberculosis and Lung Disease (IUATLD).2,3 For local laboratories, all positive AFB smears and 10% of negative smears were supposed to be sent to state public health laboratories (SPHLs) for rechecking, and 10% of smears read in SPHLs were sent to InDRE for rechecking and identification of discrepancies between the local and state laboratories. With the non-random selection of AFB smears and the unblinded rechecking method, very few errors were detected. In 1998, technicians in over 500 of the 637 laboratories were evaluated by proficiency testing using a panel of slides with known numbers of AFB. Of the 430 technicians tested, 196 (46%) scored less than 80% on proficiency testing and received intensive training the following year. As previous rechecking of AFB smear results in most laboratories had not yielded errors, it was difficult to find a significant association between proficiency testing scores and rechecking error rates, although technicians whose work was routinely rechecked had a higher mean proficiency testing score than those whose work was not rechecked.4,5 Continuing efforts towards improving EQA for AFB smear microscopy in Mexico prompted InDRE to implement and evaluate a 1-year pilot program for blinded rechecking of a smaller randomly selected sample of AFB smears collected from local laboratories in two states.

Correspondence to: Dr Bereneice Madison, Public Health Practice Program Office, Centers for Disease Control and Prevention, MS G23, 4770 Buford Hwy NE, Atlanta, GA 30341-3717, USA. Tel: (+1) 770-488-8133. Fax: (+1) 770-488-8282. e-mail: [email protected] Article submitted 12 January 2004. Final version accepted 14 June 2004.

The International Journal of Tuberculosis and Lung Disease

METHODS

Two states (A and B) were selected for the rechecking pilot program by InDRE. The selection of the states was based on their size, proximity to InDRE, humidity, test volume and estimated prevalence of TB based on smear positivity rates from previous rechecking reports. Each local or peripheral laboratory within each state was assigned a unique identifier which was included on each patient smear to permit the origin of the smear to be traced through the rechecking process.

As a large number of AFB smears originate from remote, non-laboratory locations, the network was also interested in evaluating and improving the quality of sputum specimen collection and smear preparation. Smears were identified as having been prepared inside or outside the laboratory and were evaluated microscopically based on the mucopurulence, size and thickness of the smear, and smearing of the specimen on the microscope slide.

Microscope slide boxes holding 100 slides each were provided to each laboratory in sufficient quantity to properly store at least a 3–4 month supply of AFB smears until they were collected for rechecking by supervisors. AFB smears were cleaned with xylene, drained and stored in consecutive numerical order without separating positive and negative smears. In reading the smears, technicians were instructed to read as recommended by the WHO/IUATLD guidelines2,3 and cautioned not to over-read slides. They were to evaluate the quality of the specimen and preparation of the smears.

To improve the generalizability of the results of the pilot study, a single random sample of smears was collected for rechecking from each laboratory every 3 months. The sample size used was based on lot quality assurance sampling (LQAS) techniques recently described by the International Workgroup on External Quality Assessment for AFB Microscopy.6 For ease of implementation, the following approximate average values were assumed for each laboratory: a testing volume of 1000–2000 slides per year, a slide positivity rate of 5%, a sensitivity of 75%, and a specificity of 100%. Selecting an LQAS acceptance number of 1 error yields a sample size of approximately 224–241 smears per year. Thus, approximately 50–60 slides were systematically selected for rechecking by InDRE supervisors every 3 months from laboratories with sufficient testing volume. For laboratories with low testing volumes all available slides were selected for rechecking.

A worksheet without smear results was prepared by the supervisors and provided to the technicians along with the smears to be rechecked at each SPHL. InDRE supervisors collected the results and smears from the SPHLs. All discordant test results between state and local laboratories were resolved by InDRE technicians blinded to the previous results. Smears that had been reported as positive but were found to be negative on rechecking were restained to rule out fading due to storage and humidity.7,8 For logistical reasons and to reduce expenses, the rechecking scheme assumed that the results from all slides for which the local and state laboratory results agreed were indeed correct, so only discrepancies were retested at InDRE. Although this form of analysis is known to produce positively biased estimates for the laboratory performance parameters, these biases are known to be relatively small.9–12 InDRE supervisors and the SPHLs evaluated the results from technicians from each laboratory, interpreted errors, and recommended corrective action based on the recently published EQA guidelines6 in collaboration with the laboratory director of each state.
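The sampling logic described above can be sketched in code. The following is a minimal illustration only, assuming a hypergeometric LQAS model in which the lot is a laboratory's annual negative slides and the "defectives" are the false negatives implied by the assumed positivity rate and sensitivity. The function names and the 95% confidence level (alpha = 0.05) are assumptions made for the example; the published guideline tables6 may rest on different conventions, so this sketch is not expected to reproduce the 224–241 figure exactly.

```python
from math import comb


def expected_false_negatives(annual_slides, positivity, sensitivity):
    """False negatives implied by the assumed values (specificity taken as 100%).

    Reported positives = sensitivity * true positives, so the missed
    positives are reported_positives * (1 - sensitivity) / sensitivity.
    """
    reported_positives = annual_slides * positivity
    return round(reported_positives * (1 - sensitivity) / sensitivity)


def acceptance_probability(lot_size, defectives, sample_size, acceptance_number):
    """P(X <= acceptance_number) for X ~ Hypergeometric(lot_size, defectives, sample_size)."""
    total = comb(lot_size, sample_size)
    hits = sum(
        comb(defectives, k) * comb(lot_size - defectives, sample_size - k)
        for k in range(min(acceptance_number, defectives, sample_size) + 1)
    )
    return hits / total


def lqas_sample_size(lot_size, defectives, acceptance_number, alpha=0.05):
    """Smallest sample for which a lot this bad is accepted with probability <= alpha."""
    for n in range(acceptance_number + 1, lot_size + 1):
        if acceptance_probability(lot_size, defectives, n, acceptance_number) <= alpha:
            return n
    return lot_size  # too few defectives to ever reject: recheck everything


if __name__ == "__main__":
    # Assumed average values from the text: 5% positivity, 75% sensitivity.
    for annual_slides in (1000, 2000):
        negatives = annual_slides - round(annual_slides * 0.05)
        defectives = expected_false_negatives(annual_slides, 0.05, 0.75)
        n = lqas_sample_size(negatives, defectives, acceptance_number=1)
        print(annual_slides, negatives, defectives, n)
```

Raising the acceptance number from 0 to 1 increases the required sample, which is the trade-off behind the choice of acceptance number in the text.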
Errors detected were documented by InDRE supervisors as either high false-positive (HFP), a negative smear that is misread as 1+ to 3+; high false-negative (HFN), a smear that is 1+ to 3+ but is misread as negative; low false-positive (LFP), a negative smear that is misread as a low-positive (1–9 AFB/100 fields); or low false-negative (LFN), a low-positive smear that is misread as negative. The total numbers of errors and error types were recorded for each local laboratory and state. The state laboratory directors reported the rechecking results to their local laboratories. The potential sources of errors were investigated during on-site visits, and remedial training or other corrective measures were implemented when necessary. The Figure shows a flowchart for the procedure.

The error rates detected with non-blinded rechecking for calendar year 1998 were compared with those detected with blinded rechecking of randomly selected slides in 2001. Exact methods using StatXact (Cytel Software Corporation, Cambridge, MA) were used to compare proportions, and logistic regressions using SAS (SAS Institute, Cary, NC) were used in the overall analysis to control for variations between laboratories.
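The four error categories defined above can be expressed as a small lookup. This is a sketch for illustration only: the grade labels and function names are hypothetical, and the "reference" result is assumed to be the final result after rechecking (the agreed local and state result, or the InDRE result for discordant slides). Any other combination, including agreement, is returned as no error because the text defines only these four types.

```python
from collections import Counter

# Hypothetical grade labels for the example.
NEGATIVE = "negative"
LOW_POSITIVE = "low-positive"    # 1-9 AFB per 100 fields
HIGH_POSITIVE = "high-positive"  # 1+ to 3+

# (local laboratory result, reference result) -> error type
ERROR_TYPES = {
    (HIGH_POSITIVE, NEGATIVE): "HFP",  # negative smear misread as 1+ to 3+
    (NEGATIVE, HIGH_POSITIVE): "HFN",  # 1+ to 3+ smear misread as negative
    (LOW_POSITIVE, NEGATIVE): "LFP",   # negative smear misread as low-positive
    (NEGATIVE, LOW_POSITIVE): "LFN",   # low-positive smear misread as negative
}


def classify_error(local, reference):
    """Return 'HFP', 'HFN', 'LFP', 'LFN', or None when no defined error applies."""
    return ERROR_TYPES.get((local, reference))


def tally_errors(pairs):
    """Count error types over an iterable of (local, reference) result pairs."""
    errors = (classify_error(local, reference) for local, reference in pairs)
    return Counter(e for e in errors if e is not None)


if __name__ == "__main__":
    sample = [
        (NEGATIVE, NEGATIVE),
        (HIGH_POSITIVE, NEGATIVE),
        (NEGATIVE, LOW_POSITIVE),
        (HIGH_POSITIVE, HIGH_POSITIVE),
    ]
    print(tally_errors(sample))
```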

Random blinded rechecking of AFB smears

Figure. Random, blinded rechecking program as implemented in Mexico. InDRE = Instituto de Diagnóstico y Referencia Epidemiológicos.

The feasibility and mechanisms for future implementation of blinded rechecking were judged by comparing the benefits to the National TB Program (NTP) with the costs for logistics and resources.

RESULTS

In 1998, the 33 laboratories in State A had testing volumes ranging from 143 to 7366 smears (average 2000) and a reported 5% smear positivity rate, while the 15 laboratories in the smaller State B had testing volumes ranging from 8 to 2219 smears (average 1000) and a reported 3% smear positivity rate. The number of smears selected for rechecking in 2001 ranged from 78 to 272 (average 194) in State A and from 53 to 252 (average 191) in State B. The evaluation of the quality of the AFB smears in 2001 revealed that 43.3% of those prepared inside the laboratories in State A were adequate compared with 19.3% produced outside the laboratories (P < 0.0001), while in State B, 23.1% of the smears prepared inside the laboratory were judged adequate compared with 10.3% produced outside the laboratory (P < 0.0001). The overall rechecking results are shown in the Table. The number of laboratories with no errors detected was substantially lower in 2001 than in 1998.

In State A the positive predictive value (PPV) decreased from 99.7% in 1998 to 85.1% in 2001 (P < 0.0001), and the negative predictive value (NPV) decreased from 99.83% to 99.25% (P < 0.0001). In State B, the PPV decreased from 99.6% to 86.7% (P < 0.0001), while the NPV fell only slightly, from 99.65% to 99.59% (P > 0.05).

Most of the errors detected in 2001 were HFP or HFN. In State A, 45 (44%) errors were HFP, 43 (42%) were HFN, 10 (10%) were LFP, and 5 (5%) were LFN. Of the 45 HFP errors, 26 (58%) were from 5 local laboratories, while 23 (53%) of HFN errors were from 6 laboratories. State B, which appears to have a lower prevalence of TB, smaller testing volumes, and poorer quality slides, had only a total of 24 errors, of which 9 (38%) were HFP, 11 (46%) were HFN, 2 (8%) were LFP, and 2 (8%) were LFN.

Restaining of negative smears by technicians at InDRE after the initial rechecking process did not reveal evidence of fading. This was of particular concern in State A, where the humidity is higher than in State B. On-site evaluations revealed poor quality microscopes in some laboratories and failure of technicians to record and report the exact number of bacilli on low-positive smears. The practice of reporting these as negative by some technicians changed during the course of the study.

To control for the differences between laboratories, data were compared from the 31 laboratories in State A and from the 14 laboratories in State B which had data for both 1998 and 2001. This comprised 95.2% of the total number of rechecked slides from the complete data set. As the 1998 data had been stratified by the initial results, stratified analysis was required to compare the 1998 and 2001 data. Logistic regression models were used to analyze the data using the 45 laboratories and the two time periods as covariates.
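As an illustration of comparing two such proportions with an exact method, the sketch below implements a two-sided Fisher's exact test in plain Python. This is an assumption made for the example: the text states only that exact methods in StatXact were used and does not name the test. The counts are the pooled false-positive and false-negative counts from the Table, not the per-laboratory data used in the logistic regression, so the resulting P values are illustrative rather than a reproduction of the reported analysis.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_p(x):
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(total, col1)

    observed = log_p(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p = sum(
        exp(log_p(x))
        for x in range(lowest, highest + 1)
        if log_p(x) <= observed + 1e-7  # small tolerance for floating-point ties
    )
    return min(1.0, p)


if __name__ == "__main__":
    # State A, slides initially reported positive: (false positives, true positives)
    print(fisher_exact_two_sided(11, 3400, 55, 314))
    # State B, slides initially reported negative: (false negatives, true negatives)
    print(fisher_exact_two_sided(6, 1728, 13, 3155))
```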
For slides initially reported as negative, the likelihood of a false-negative error being detected was approximately two times higher in 2001 than in 1998 (P = 0.0004). For slides initially reported as positive, the likelihood of a false-positive error being detected was approximately 50 times higher in 2001 than in 1998 (P < 0.0001).

Table. Rechecked AFB smear results for laboratories in 1998 and 2001

                                            State A                State B
                                         1998       2001        1998       2001
True positives                           3400        314         517         72
False positives                            11         55           2         11
Positive predictive value               99.7%      85.1%       99.6%      86.7%
True negatives                           6599       6366        1728       3155
False negatives                            11         48           6         13
Negative predictive value               99.8%      99.3%       99.7%      99.6%
Slides                                 10 021       6783        2253       3251
Laboratories                               33         35          15         17
Laboratories with no errors detected  25 (76%)    4 (11%)    11 (73%)    6 (35%)

AFB = acid-fast bacilli.
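The predictive values reported in the Results follow directly from the counts in the Table: PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The short sketch below recomputes them as a consistency check; the dictionary layout and function name are assumptions made for the example, while the counts are those given in the Table.

```python
# Counts from the Table, keyed by (state, year).
TABLE = {
    ("A", 1998): {"tp": 3400, "fp": 11, "tn": 6599, "fn": 11},
    ("A", 2001): {"tp": 314, "fp": 55, "tn": 6366, "fn": 48},
    ("B", 1998): {"tp": 517, "fp": 2, "tn": 1728, "fn": 6},
    ("B", 2001): {"tp": 72, "fp": 11, "tn": 3155, "fn": 13},
}


def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) as proportions."""
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    for (state, year), counts in TABLE.items():
        ppv, npv = predictive_values(**counts)
        slides = sum(counts.values())
        print(f"State {state} {year}: PPV {100 * ppv:.1f}%  NPV {100 * npv:.2f}%  slides {slides}")
```

The four counts in each column also sum to the number of slides listed in the Table, which confirms that the flattened columns have been read in the right order.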

DISCUSSION

Although these results demonstrate that the laboratory performance values reported in 2001 were significantly lower than those reported in 1998, it does not seem likely that these results represent an actual decrease in the quality of laboratory performance during that time. Rather, a more likely explanation for the differences in the laboratory performance estimates is that the collecting of random samples of slides by InDRE supervisors as part of on-site evaluations, combined with the blinding of the local laboratory results to the state laboratory technicians responsible for rechecking them, resulted in a more accurate picture of true laboratory performance in 2001.

Results from this study revealed that a smaller sample of AFB smears could be used to monitor the quality of AFB smear microscopy in Mexico for states with prevalence rates closer to 5%, such as State A. The percentage of inadequate slides due to poor quality sputum collection prompted training for improvement. Poor quality slides can make it harder to identify positives. This can result in estimates of the error rates that are biased downwards, which may be why the improvement seen in State B was less than in State A. The lower estimated prevalence rate in State B may also be a direct consequence of the poor quality of slides in State B.

A previous evaluation of blinded and unblinded rechecking of AFB smears in Vietnam also demonstrated that an increased number of errors was detected with blinded rechecking by introducing mislabeled smears into the rechecking sample.13 Conclusions of the Vietnam study are similar to ours with respect to the bias inherent to using unblinded slides for rechecking. However, our study, which is based on recently published international guidelines,6 may be somewhat less labor intensive and more practical to implement within a large TB laboratory network such as Mexico’s.
One advantage of the random selection and blinded rechecking quality assurance program compared with the unblinded rechecking of 100% of the initially positive slides and 10% of the initially negative slides is that it permits researchers to directly estimate laboratory sensitivity and specificity. These estimates can then be used to produce accurate estimates of prevalence rates of disease, which in turn allow health department officials to plan intervention and prevention strategies more effectively and efficiently.

Programmatic resources, personnel, data management, and organizational skills that had not previously been necessary were all required for implementation of the pilot program.14 In addition to the microscope slide boxes, a laptop computer was an essential asset for the collection and data management process. Additional major programmatic costs to the NTP included the costs of an additional full-time technician for InDRE, travel costs for collection of slides and on-site evaluations, and training expenses. Savings included the reduction in costs for shipping the slides to be rechecked to the state laboratories and the added value of having fewer smears rechecked, plus the overall savings to the NTP resulting from improvements in the diagnosis of TB cases and treatment monitoring throughout the laboratory network.15

Laboratories in many countries do not routinely receive on-site evaluations or supervisory visits.16,17 In Mexico, local laboratory technicians reported that the opportunity to consult with SPHL and InDRE TB laboratory supervisors was a very positive feature of the slide collection process. While the non-blinded rechecking process with periodic proficiency testing may appear to be less expensive than blinded rechecking of smears with periodic proficiency testing, blinded rechecking appears to be an effective and practical way of motivating technicians to provide higher quality services on a daily basis.18 Thus, proficiency testing could be performed as an alternative to blinded rechecking in various states if the number and types of errors remain within an acceptable range in a given laboratory for a period of time.

CONCLUSIONS

This study has evaluated an implementation procedure and revealed that a smaller random sample of AFB smears could be rechecked to assess the quality of AFB microscopy in Mexico compared with the non-blinded rechecking method used previously. Using a random sample with blinded rechecking yields less biased estimates in general, and permits direct estimates for sensitivity, specificity, and prevalence of disease in particular. Continuing improvements in the quality of sputum specimen collections and the preparation of smears will provide more accurate AFB microscopy results for confirming diagnoses and monitoring treatment responses.

References

1 World Health Organization. Laboratory services in tuberculosis control; part I: organization and management. WHO/TB/98.258. Geneva, Switzerland: WHO, 1998.
2 El-Nageh M M, Heuk C, Kallner A, Maynard J. Quality systems for medical laboratories: guidelines for implementing and monitoring. World Health Organization Regional Publications. Eastern Mediterranean Series 14. Alexandria, Egypt: WHO EMRO, 1995.
3 Rieder H L, Chonde T M, Myking H, et al. The public health service national tuberculosis reference laboratory and the national laboratory network. Paris, France: International Union Against Tuberculosis and Lung Disease, 1998.
4 Balandrano S, Martínez A, Sosa M, et al. National quality control of AFB microscopy in Mexico. Madrid, Spain: Annual IUATLD Conference. Int J Tuberc Lung Dis 1999; 3 (Suppl 1): S108.
5 Martinez-Guarneros A, Balandrano-Campos S, Ridderhof J, et al. Implementation of proficiency testing in conjunction with a rechecking system for external quality assurance in tuberculosis laboratories in México. Int J Tuberc Lung Dis 2003; 7: 516–521.
6 External quality assessment for AFB smear microscopy. World Health Organization, Centers for Disease Control and Prevention, APHL, KNCV and IUATLD. Washington, DC: APHL, 2002.
7 de Kantor I N, Laszlo A, Vazquez L, et al. Periphery to centre quality control of sputum smear microscopy and ‘rapid fading’ of Ziehl-Neelsen staining. Int J Tuberc Lung Dis 2000; 4: 887–888.
8 de Kantor I N, Laszlo A, Vazquez L, et al. More on periphery to centre quality control of sputum smear microscopy and ‘rapid fading’ of Ziehl-Neelsen staining. Int J Tuberc Lung Dis 2001; 5: 387–389.
9 Hadgu A. The discrepancy in discrepant analysis. Lancet 1996; 348: 592–593.
10 Lipman H B, Astles J R. Quantifying the bias associated with the use of discrepant analysis. Clin Chem 1998; 44: 108–115.
11 Green T A, Black C M, Johnson R E. Evaluation of bias in diagnostic-test sensitivity and specificity estimates computed by discrepant analysis. J Clin Microbiol 1998; 36: 375–381.
12 Miller W C. Bias in discrepant analysis: when two wrongs don’t make a right. J Clin Epidemiol 1998; 51: 219–231.
13 Nguyen T N L, Wells C D, Binkin N J, et al. Quality control of smear microscopy for acid-fast bacilli: the case for blinded rereading. Int J Tuberc Lung Dis 1999; 3: 55–61.
14 Van Deun A. External quality assessment of sputum smear microscopy: a matter of careful technique and organization. Int J Tuberc Lung Dis 2003; 7: 507–508.
15 Nguyen T N L, Wells C D, Binkin N J, et al. The importance of quality control of sputum smear microscopy: the effect of reading errors on treatment decisions and outcomes. Int J Tuberc Lung Dis 1999; 3: 483–487.
16 Van Deun A, Roorda F A, Chambugonj N, et al. Reproducibility of sputum smears examination for acid-fast bacilli: practical problems met during cross checking. Int J Tuberc Lung Dis 1999; 3: 823–829.
17 Van Deun A, Portaels F. Limitations and requirements for quality control of sputum smear microscopy for acid-fast bacilli. Int J Tuberc Lung Dis 1998; 2: 756–765.
18 Fujiki A, Giango C, Endo S. Quality control of sputum smear examination in Cebu Province. Int J Tuberc Lung Dis 2002; 6: 39–46.

RÉSUMÉ

SETTING: Laboratories in Mexico supporting the National Tuberculosis Control Program have been involved in an external quality control program for acid-fast bacilli (AFB) microscopy that includes rechecking 100% of the slides identified as AFB-positive by local laboratories and 10% of the slides identified as AFB-negative. Very few errors have been detected in Mexico using non-random selection and unblinded rechecking of slides.

OBJECTIVE: To evaluate the results of a 1-year pilot program involving blinded rechecking of randomly selected AFB slides from local TB laboratories in two Mexican states and to determine its feasibility for future implementation.

DESIGN: To reduce potential bias, laboratory staff from the National TB Laboratory at the Institute for Epidemiological Diagnosis and Reference (InDRE) carried out quarterly statistical sampling of AFB smears and on-site evaluations in the local laboratories of each state. AFB slides were rechecked in the respective state laboratories, with discordant results resolved at InDRE.

RESULTS: A significantly higher percentage of errors was found on the randomly selected, blindly examined smears than on those that were neither randomly selected nor blindly examined.

CONCLUSION: Random blinded rechecking provides more accurate estimates of AFB microscopy results, leading to improved diagnosis and monitoring of treatment response.

RESUMEN

BACKGROUND: Laboratories in Mexico that support the National Tuberculosis (TB) Control Program participate in an external quality assurance program for smear microscopy that includes rereading 100% of the slides identified as positive for acid-fast bacilli (AFB) and 10% of the slides identified as AFB-negative by local laboratories. Very few errors are detected in Mexico when rereading slides that have been neither randomly selected nor reread blind.

OBJECTIVES: To evaluate the results of a 1-year pilot program in two Mexican states involving blinded rereading and random selection of slides from local TB laboratories, and to determine the feasibility of its future application.

DESIGN: To reduce potential bias, the technical team of the National TB Laboratory, of the Instituto de Diagnóstico y Referencia Epidemiológicos (InDRE), carried out quarterly statistical sampling of smears, evaluated on site in the local laboratories of each state. The smears were reread at the respective state laboratories, with InDRE resolving any discordant result.

RESULTS: A significantly higher percentage of errors was detected with random, blinded selection of smears than with smears that were neither randomly selected nor reread blind.

CONCLUSION: Random, blinded rereading provides reliable estimates of AFB microscopy results, resulting in improved diagnosis and monitoring of treatment response.