British Journal of Obstetrics and Gynaecology January 1998, Vol. 105, pp. 1-5

COMMENTARIES

Systematic reviews: insufficient evidence on which to base medicine

Systematic reviews have made important contributions in many branches of health care, challenging established practices and pointing to the need for medicine to be evidence based¹. They advocate the use of all available objective evidence, especially that from randomised trials, and its synthesis in the form of meta-analyses². Leading the field has been Effective Care in Pregnancy and Childbirth³, with the regularly updated electronic publication Oxford Database of Perinatal Trials⁴, which became the Cochrane Collaboration: Pregnancy and Childbirth Module, published in the Cochrane Pregnancy and Childbirth Database⁵ and more recently in the Cochrane Library⁶. This endeavour has been welcomed into our speciality⁷. For many inside and outside the field, such systematic reviews provide a quick overview of published randomised trials, report on the results of meta-analyses, and comment on implications for practice and further research.

Should systematic reviews define policy?

It has been advocated that the results of reviews be incorporated into clinical guidelines and taken into account by professional bodies, policy makers, purchasers and providers⁸. Managers have started to define health care policy and budgets on the basis of recommendations within the reviews. In perinatal care, the reviews of the Cochrane Collaboration have been used as a foundation for health strategies, and are considered an authoritative source in debates on the provision of health care. This has thrust a great deal of responsibility onto its editors. What is generally not realised is that the reviews, and the recommendations based upon them, have so far benefited neither from independent peer review nor from an objective mechanism to incorporate feedback and criticism in subsequent editions. This leaves them wide open to errors in analysis and clinical interpretation.

Problems and pitfalls

Systematic review is a new method whose reliability and relevance for clinical practice have not been established. A number of issues need to be considered.

Publication bias

Reviews have to be comprehensive and include all relevant studies, which is a complex task⁹. Ideally, all trials ought to be registered at inception to ensure that they are properly reported¹⁰, as a strong publication bias exists¹¹. The potential for wrong conclusions from small positive trials has been shown, for example, by meta-analyses of magnesium sulphate in myocardial infarction, a treatment that a large trial later showed to lack efficacy¹². Although the matter is still debated, a good large trial with sufficient power is generally regarded as the gold standard. When the single largest trial was compared with all the others in meta-analyses of 30 different subjects in the Cochrane perinatal database, the two agreed in the direction and significance of the result in only 17 of the 30 analyses¹³.

© RCOG 1998 British Journal of Obstetrics and Gynaecology

Blinding and bias

Advocates of randomised trials emphasise the need for appropriate methodology, such as concealment of treatment allocation, to avoid bias¹⁴. But it should also be remembered that systematic reviews are studies of studies and are themselves rarely carried out 'blind'. In fact, meta-analysis is undertaken after the data are already available, which can result in all the problems of retrospective research¹⁵. Despite the best intentions and a well-defined methodology, a considerable amount of judgment has to be used when interpreting individual studies and determining their suitability for inclusion, as has also been shown when applying measures for quality control¹⁶. Acceptance or rejection of a study can change the overall conclusions¹⁵. Systematic reviews provide a uniform method of analysis and presentation, but the very uniformity of the format may make it even more difficult to recognise whether bias is present in the inclusion of trials.

Heterogeneity

The studies need to be examined carefully for their suitability for inclusion, taking into account heterogeneity of study design, patient characteristics, treatments and measures of outcome¹⁷. Apart from statistical heterogeneity, there is also clinical heterogeneity within and, more importantly, between studies¹⁸. Meta-analyses may incorporate trials with different criteria for defining the condition under study, the method of admitting patients, the selection and administration of treatment, and the assessment of outcome, which can preclude them from being combined with any validity¹⁹. Another source of error may be introduced by differential weighting based on a fixed-effect assumption, in which a trial's weight usually reflects its size and the narrowness of its confidence interval; in a heterogeneous set of studies, the patient characteristics and design of a trial with a high weighting will dominate the meta-analysis and make it less generalisable²⁰. Differing levels of baseline risk may also make the results of the analysis inapplicable to some patients²¹. Because of such heterogeneity, current methods of statistical analysis may be seriously flawed and lead to misleading conclusions, unless the dependence of the treatment effect on measured baseline characteristics in each trial is investigated²².

Updating reviews

One of the characteristics of a review is that it becomes out of date as soon as new evidence emerges. If meta-analyses are to underpin reliable summaries of current understanding of treatment efficacy and safety, they must be rapidly updated²³.

Clinical relevance

The perspective within a review may be lost if those undertaking it lack clinical understanding of the field and concentrate on methodology alone. The results may be useful to drug or equipment manufacturers and regulatory agencies, but may not necessarily help the clinician concerned with the management of individual patients¹³,¹⁷,¹⁹.

Potential for a false message: the example of electronic fetal monitoring

Several of these problems can be illustrated by the following example, which relates to a frequently debated topic in obstetrics and midwifery: how to monitor the fetus during labour. The principal aim of continuous electronic fetal monitoring was to reduce perinatal mortality, but it was introduced about 30 years ago without validation in controlled trials. Several randomised studies have since compared electronic fetal monitoring with intermittent auscultation, with and without fetal scalp blood sampling. Systematic reviews published in the Cochrane Collaboration: Pregnancy and Childbirth Module⁵ concluded that electronic fetal monitoring increased the caesarean section rate, especially when used without fetal scalp blood sampling; that intermittent auscultation was associated with a higher rate of neonatal seizures but no increase in cerebral palsy; and that there was no evidence that electronic fetal monitoring reduced perinatal mortality.

This last conclusion, which fundamentally questions the value of electronic fetal monitoring, was based on a meta-analysis of perinatal mortality²⁴ in which two of six studies²⁵,²⁶ together accounted for 89.5% of the weighting. Yet the considerable heterogeneity of the trial participants and of background risks raises the question of whether the trials can justifiably be analysed together. Furthermore, each of these studies is limited in the statement it can make. In the Dublin trial²⁵, conducted under the hospital's active management protocol, almost half of the deliveries in the electronic fetal monitoring group occurred within 2 hours of randomisation, and any benefits of electronic fetal monitoring might not become apparent within such a short time. This raises the question of whether the results are transferable to other units, most of which have less 'active' management protocols. Although there were about 6500 deliveries in each group, the sample size was calculated to investigate perinatal mortality and morbidity together, and the trial did not have sufficient power to examine mortality alone. In fact, there was a significantly increased rate of morbidity (neonatal seizures) in the intermittent auscultation group, confined to the subgroup of labours in which oxytocin had been administered. The overall rate of intrapartum stillbirths was much lower (0.3 per thousand) than expected at the beginning of the trial on the basis of previous hospital reports (1 per thousand), and the rate of stillbirths plus neonatal deaths was 2.1 per thousand (vs 3 per thousand expected). This was in part due to the exclusion of cases with meconium or oligohydramnios, which amounted to only 5.7% of all cases but were associated with a perinatal mortality rate of 11.4 per thousand (i.e. more than five times that of the trial population). Thus, as regards the outcome measure of perinatal mortality, the Dublin trial was conducted on a relatively low-risk population.
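The power problem can be made concrete with a rough calculation. The sketch below is illustrative only: it uses the textbook normal-approximation power formula for comparing two proportions, takes the Dublin figures quoted above (about 6500 deliveries per group and 2.1 perinatal deaths per thousand), and assumes, purely hypothetically, that electronic fetal monitoring halves perinatal mortality. Neither the effect size nor the test comes from the trial report, which powered its sample size on mortality and morbidity combined.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_two_proportions(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Normal approximation with unpooled variance; ignores the negligible
    probability of rejecting in the wrong direction.
    """
    se = math.sqrt((p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(p_control - p_treated) / se - z_crit)


def n_per_group_needed(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate deliveries needed per group for the same test and target power."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    var_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * var_sum / (p_control - p_treated) ** 2)


# Baseline from the text: 2.1 perinatal deaths per 1000 (stillbirths plus neonatal deaths).
BASELINE = 2.1 / 1000
# Hypothetical effect size (an assumption, not from the source): mortality halved.
HALVED = BASELINE / 2

power = power_two_proportions(BASELINE, HALVED, n_per_group=6500)
needed = n_per_group_needed(BASELINE, HALVED)
print(f"power with 6500 per group: {power:.0%}")
print(f"deliveries needed per group for 80% power: {needed}")
```

Under these assumptions the power comes out at roughly one in three, and something over 22,000 deliveries per group would be needed for 80% power, about three and a half times the trial's size. A non-significant difference in mortality from a trial of this size therefore says little either way.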
The second study²⁶ related to preterm babies only (26-32 weeks; estimated fetal weight 500 g to 1750 g). It showed not only no advantage of electronic fetal monitoring over intermittent auscultation, but even a clear disadvantage for the fetus: at 18 month follow-up, babies in the electronic fetal monitoring group were more likely to have cerebral palsy²⁷. However, the management protocol for the two groups resulted in an important confounder: different time intervals from heart rate abnormality to delivery. In the intermittent auscultation group, a fetal heart rate abnormality was an indication to proceed to delivery, whereas in the electronic fetal monitoring group an abnormality was usually an indication for one or more fetal scalp blood samples. As a result, the average interval from fetal heart rate abnormality to delivery was substantially longer in the electronic fetal monitoring group than in the auscultation group (104 vs 40 minutes).

But it has been well established by a number of studies that preterm babies have diminished reserve and develop acidosis more rapidly, and many clinicians would not delay but expedite delivery once such a fetus is considered to be at risk of 'distress'. It is regrettable but not surprising that at 18 month follow-up there was a higher rate of cerebral palsy in babies monitored with electronic fetal monitoring and fetal scalp blood sampling. The results of the study by Luthy et al.²⁶ in small babies are not generalisable to term babies, nor even to preterm labour as managed in most units. I doubt whether the research protocol would have been accepted by many other ethical committees. Thus the systematic review in the Cochrane Pregnancy and Childbirth Database, which finds no evidence in favour of the principal claim for electronic fetal monitoring (that it reduces perinatal mortality), is based on a meta-analysis which is fraught with problems of entry criteria and lacks the power to address this question. The two studies which dominate the weighting not only each have their own shortcomings, but are also substantially different from one another, with widely varying selection criteria, clinical risk, background rates of adverse outcome, trial design and management. The review was also out of date even at its last revision in May 1994²⁴, as it did not include a large randomised trial²⁸ published in June 1993. That trial took place in Athens in a population with higher risk and background rates of perinatal mortality, and showed a significant reduction of mortality due to hypoxia in labours monitored with electronic fetal monitoring. Including this study in their own meta-analysis of electronic fetal monitoring with or without fetal scalp blood sampling against intermittent auscultation, Vintzileos et al.²⁹ confirmed a higher caesarean section rate in the electronic fetal monitoring group, but also found a 60% reduction of perinatal deaths due to hypoxia.
Therefore the evidence in favour of electronic fetal monitoring is much stronger than the prevailing arguments about the relative importance of neonatal seizures vs excess intervention would suggest: in labours at risk of hypoxia, electronic fetal monitoring saves lives. The debate should now centre on how to identify the cases which benefit from intensive monitoring, and how to tailor the appropriate level of surveillance, allowing the birth process to be safe yet as free as possible from intervention. In the Confidential Enquiry into Stillbirths and Deaths in Infancy (CESDI), the largest audit into perinatal mortality, expert panels reviewing detailed evidence find year after year that most intrapartum deaths involved sub-optimal care and might have been avoided, and that one of the principal factors is the failure to monitor the fetal heart rate continuously when indicated³⁰. This seems to contrast with the conclusions of the Cochrane reviews, which in recent years have dominated many interdisciplinary, political and economic debates, and which have had an impact on policy decisions and professional recommendations. The reviews have finally been updated in a recent issue of the Cochrane Library³¹ and now also include the Athens trial²⁸. Yet combining this trial with the others means that its main finding, that intermittent auscultation increases perinatal mortality in a population with high mortality rates, is not reflected in the overall results. Once again, the heterogeneity and differing rates of background risk are not accounted for, making the combined analysis invalid. The reviewers point out that, with electronic fetal monitoring, only the Athens trial showed a statistically significant decrease in perinatal mortality, and only the Dublin trial a statistically significant decrease in neonatal seizures. They fail to mention that these are also the only trials with sufficient statistical power to evaluate these two outcomes. It would hence be more appropriate to state that the only trial with enough power to look at neonatal seizures showed that these are increased by a policy of intermittent auscultation, and that the only trial with enough power to look at perinatal mortality found that intermittent auscultation results in an increased rate of perinatal deaths.
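How pooling can submerge a positive result from a high-risk trial can be sketched with inverse-variance fixed-effect weighting, one standard fixed-effect method; the source does not say which pooling method the reviews actually used. The four trials below are invented for illustration (two large low-risk trials with no effect, one smaller high-risk trial with a clear benefit, and one small trial), and the sketch reports each trial's weight, the pooled risk ratio, Cochran's Q and I² (the proportion of variation attributed to heterogeneity).

```python
import math

# Hypothetical trials: (label, log risk ratio of perinatal death, SE of the log risk ratio).
# All numbers are invented for illustration; they are NOT the published trial results.
TRIALS = [
    ("low-risk trial A", math.log(1.00), 0.25),
    ("low-risk trial B", math.log(1.10), 0.30),
    ("high-risk trial C", math.log(0.40), 0.40),
    ("small trial D", math.log(0.80), 0.90),
]

Z95 = 1.959964  # two-sided 95% standard normal quantile


def fixed_effect(trials):
    """Inverse-variance fixed-effect pooling of log risk ratios, with Cochran's Q and I-squared."""
    weights = [1.0 / se ** 2 for _, _, se in trials]  # precise (large) trials get large weights
    total = sum(weights)
    pooled = sum(w * y for w, (_, y, _) in zip(weights, trials)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations of each trial from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, (_, y, _) in zip(weights, trials))
    df = len(trials) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    shares = [w / total for w in weights]
    return pooled, pooled_se, q, i2, shares


pooled, pooled_se, q, i2, shares = fixed_effect(TRIALS)

for (label, y, se), share in zip(TRIALS, shares):
    lo, hi = math.exp(y - Z95 * se), math.exp(y + Z95 * se)
    print(f"{label:18s} RR {math.exp(y):.2f} (95% CI {lo:.2f}-{hi:.2f}), weight {share:.1%}")

print(f"pooled RR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(pooled - Z95 * pooled_se):.2f}-{math.exp(pooled + Z95 * pooled_se):.2f}), "
      f"Q = {q:.2f}, I^2 = {i2:.0%}")
```

In this invented example the two low-risk trials carry about 78% of the weight, and the pooled 95% confidence interval crosses 1 even though the high-risk trial's own interval does not. With only four trials, Q also has little power to flag real differences in background risk, so the statistical heterogeneity measures are no substitute for a clinical comparison of the trial populations.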

Peer review and criticism

There are other examples of reviews with unreliable methods or conclusions. The question has to be asked whether systematic reviews ought to be published without external peer review. Peer review is the process which defines quality in journals devoted to the health sciences. Despite its shortcomings, it is still the only feasible method to maintain accountability and quality control. This is particularly important for new techniques, of which the systematic review is one. The rapid rise of any new method to wide-scale acceptance without critical appraisal must be resisted, lest we repeat the mistakes of the past. After proper peer review, there also needs to be a good system to allow criticism after publication. The place for such feedback is the same medium, and in an electronic, regularly updated publication, comments should be tagged to the review itself. A critique may relate to the selection and grouping of the trials, the way the analysis is presented, the methodology of the meta-analysis, the interpretation and conclusions, or the clinical relevance. Objections to a particular study may already have been raised in correspondence after its original publication, but these tend to be ignored in systematic reviews; there ought to be a way to include a summary of such reservations. The need to foster such critical analysis has been recognised for some time³², but attempts to address this issue within the Cochrane Collaboration have been hampered by logistics and a shortage of funds (I. Chalmers and D. Rennie, personal communications). Nevertheless, the electronic publication of systematic reviews has been proceeding actively, and only this year has an electronic facility for comments and criticisms appeared within the Cochrane Library. Criticism has of course been possible by conventional means, but response has been slow. It is hoped that the electronic facility for comments will be used by readers, and that reviewers will be able to respond quickly. We should recognise that reviews may be as subject to bias and wrong conclusions as individual studies. The publication of such 'studies of studies' without the usual scientific checks and balances casts doubt on their authority for defining policy.

Systematic reviews and recommendations for treatment

Systematic reviews ought not to provide conclusions, but should encourage debate towards formulating recommendations for treatment. Many reviews conclude that there is 'no significant difference' or 'no evidence for' a particular technique or intervention. This information can lead to a re-evaluation of clinical practice, but cannot be taken as proof of the null hypothesis if the trials have insufficient power to test it. Health economists and budget-conscious managers must be made more aware of the importance of statistical power, and must recognise that there are limits to medical evidence³³. The appraisal of any new technique has to involve clinical perspectives³⁴, and the clinical context is also important when assessing systematic reviews¹⁶. There are many instances where randomised trials may be 'unnecessary, inappropriate, impossible, or inadequate', and where epidemiological or observational studies can be complementary in discerning the best evidence³⁵. Evidence for effective treatment is derived from a variety of sources. No one method is able to provide all the answers to the complex problems in our speciality: questions which may relate to drugs, techniques or protocols; to common as well as very rare outcomes; to young healthy women in whom mishaps can occur unexpectedly; and to a field where attitudes and clinical perceptions are always changing. The systematic review is important, and is becoming increasingly sophisticated, but clinicians need to be aware of the sources of bias which can limit its usefulness.

Jason Gardosi, Senior Lecturer
Department of Obstetrics and Gynaecology, Queen's Medical Centre, Nottingham

References
1 Evidence-Based Medicine Working Group. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA 1992; 268: 2420-2425.
2 Mulrow CD. Systematic Review: Rationale for systematic reviews. BMJ 1994; 309: 597-599.
3 Chalmers I, Enkin M, Keirse MJNC, editors. Effective Care in Pregnancy and Childbirth. Oxford: Oxford University Press, 1989: 846-882.
4 Chalmers I. Oxford Database of Perinatal Trials. Oxford Electronic Publishing, Oxford University Press, 1989-1992.
5 Keirse MJNC, Renfrew MJ, Neilson JP, Crowther C, editors. Pregnancy and Childbirth Module. In: The Cochrane Collaboration. The Cochrane Pregnancy and Childbirth Database. Oxford: Update Software; 1993-1995. BMJ Publishing Group, London.
6 Neilson JP, Crowther C, Hodnett ED, Hofmeyr GJ, Keirse MJNC, editors. Pregnancy and Childbirth Module. In: The Cochrane Library. The Cochrane Database of Systematic Reviews. Oxford: Update Software, 1996-1997.
7 Paintin DB. Effective care in pregnancy and childbirth. Br J Obstet Gynaecol 1990; 97: 967-969.
8 Haines A, Jones R. Implementing findings of research. BMJ 1994; 308: 1488-1492.
9 Dickersin K, Scherer R, Lefebvre C. Identifying relevant studies for systematic reviews. BMJ 1994; 309: 1286-1291.
10 Chalmers I, Dickersin K, Chalmers TC. Getting to grips with Archie Cochrane's agenda. BMJ 1992; 305: 786-788.
11 Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991; 337: 867-872.
12 Egger M, Davey Smith G. Misleading meta-analysis. BMJ 1995; 310: 752-754.
13 Villar J, Carroli G, Belizan JM. Predictive ability of meta-analyses of randomised controlled trials. Lancet 1995; 345: 772-776.
14 Schulz KF. Randomised trials, human nature, and reporting guidelines. Lancet 1996; 348: 596-598.
15 Chalmers T. Problems induced by meta-analysis. Stat Med 1991; 10: 971-980.
16 Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol 1991; 44: 1271-1277.
17 Kassirer JP. Clinical trials and meta-analysis. What do they do for us? N Engl J Med 1992; 327: 273-274.
18 Thompson SG. Systematic reviews: why sources of heterogeneity in meta-analysis should be investigated. BMJ 1994; 309: 1351-1355.
19 Horwitz RI. Large-scale randomized evidence: large, simple trials and overviews of trials. J Clin Epidemiol 1995; 48: 41-44.
20 Thompson SG, Pocock SJ. Can meta-analysis be trusted? Lancet 1991; 338: 1127-1130.
21 Rothwell PM. Can overall results of clinical trials be applied to all patients? Lancet 1995; 345: 1616-1619.
22 Sharp SJ, Thompson SG, Altman DG. The relation between treatment benefit and underlying risk in meta-analysis. BMJ 1996; 313: 745-748.
23 Jones DR. Meta-analysis: weighing the evidence. Stat Med 1995; 14: 137-149.
24 Neilson JP. EFM + scalp blood sampling vs intermittent auscultation in labour. In: Keirse MJNC, Renfrew MJ, Neilson JP, Crowther C, editors. Pregnancy and Childbirth Module. In: The Cochrane Collaboration. The Cochrane Pregnancy and Childbirth Database; Issue 2. Oxford: Update Software, 1995.
25 McDonald D, Grant A, Sheridan-Pereira M, Boylan P, Chalmers I. The Dublin randomised trial of intrapartum fetal heart rate monitoring. Am J Obstet Gynecol 1985; 152: 524-539.
26 Luthy DA, Shy KK, Van Belle G et al. A randomised trial of electronic fetal monitoring in preterm labour. Obstet Gynecol 1987; 69: 687-695.
27 Shy KK, Luthy DA, Bennett FC et al. Effects of electronic fetal-heart-rate monitoring, as compared with periodic auscultation, on the neurologic development of preterm infants. N Engl J Med 1990; 322: 588-593.
28 Vintzileos AM, Antsaklis A, Varvarigos I et al. A randomised trial of intrapartum electronic fetal heart rate monitoring versus intermittent auscultation. Obstet Gynecol 1993; 81: 899-907.
29 Vintzileos AM, Nochimson DJ, Guzman ER et al. Intrapartum electronic fetal heart rate monitoring versus intermittent auscultation: a meta-analysis. Obstet Gynecol 1995; 85: 149-155.
30 Maternal and Child Health Consortium. Confidential Enquiry into Stillbirths and Deaths in Infancy, 1994/5 Annual Report. London: HMSO, 1997.
31 Thacker SB, Stroup DF, Peterson HB. Continuous electronic fetal monitoring during labor. In: Neilson JP, Crowther C, Hodnett ED, Hofmeyr GJ, Keirse MJNC, editors. Pregnancy and Childbirth Module. In: The Cochrane Library. The Cochrane Database of Systematic Reviews; Issue 2. Oxford: Update Software, 1997.
32 Chalmers I, Haynes B. Reporting, updating, and correcting systematic reviews of the effects of health care. BMJ 1994; 309: 862-865.
33 Naylor CD. Grey zones of clinical practice: some limits to evidence based medicine. Lancet 1995; 345: 840-842.
34 Gardosi J. Monitoring technology and the clinical perspective. In: Gardosi J, editor. Intrapartum Surveillance. Baillieres Clin Obstet Gynaecol 1996; 10: 2.
35 Black N. Why we need observational studies to evaluate the effectiveness of health care. BMJ 1996; 312: 1215-1218.

British Journal of Obstetrics and Gynaecology January 1998, Vol. 105, pp. 5-7

Perinatal pathology: centralise or perish?

"You may take notes for twenty years, from morning to night at the bedside of the sick, upon diseases of the viscera, and all will be to you only a confusion of symptoms, a train of incoherent phenomena. Open a few bodies, this obscurity will disappear" (Bichat)¹.

Though containing sound advice, this sentiment is expressed in the belief that all those 'opening' bodies see and record to a uniformly high standard. Unfortunately, the human condition is such that this state of perfection is unattainable. However, this should not provide an excuse for not striving to reach this standard. It should therefore be of concern to pathologists and clinicians alike that once again surveys of the quality of perinatal autopsies, on this occasion from the Northern Region² and from Northern Ireland³, reveal serious deficiencies in the perinatal pathology services within the National Health Service. Though different methods of assessment were used by the two groups, they, like earlier studies, have confirmed that only about half of the autopsies in these Regions reached acceptable standards. In the Northern Region only 51% reached the Royal College of Pathologists' minimum criteria, while in Northern Ireland 46.6% were considered inadequate. These figures are remarkably similar to those in the West Midlands Survey of 1987⁴, and while the authors have noted an improvement in standards in the Northern Region, there would seem to be little or no evidence of any general improvement in almost a decade. It is, however, of note that the Annual Report of the All Wales Perinatal Survey for 1996 reveals a significant improvement in the quality of autopsies associated with an increase in the proportion of autopsies referred to the specialist centre from 42% to 78%. While the proportion of unsatisfactory autopsies outside the specialist centre was 33%, the overall proportion of unsatisfactory autopsies fell from 46% to 7% as a result of the changed referral pattern⁵.

Though the Royal College of Pathologists has attempted to improve standards with the publication of guidelines⁶, it is clear that these have either had little effect or that their effects have not yet been reflected in these series of perinatal autopsies. In addition, efforts have been made to expose all trainees to perinatal pathology. Since this exposure may be for as little as two months, the minimum to fulfil College recommendations, it is perhaps not surprising that it has made little impact. It is notable that this exposure is similar to that of the non-specialist trainee to neuropathology, a discipline now almost entirely dependent on specialist neuropathologists, a route that perhaps should be taken by perinatal pathology (vide infra). I remain to be convinced that two months of perinatal pathology training, which at best might include attendance at two perinatal mortality meetings and performance of a handful of autopsies, even with published guidelines to hand, is a proper and adequate grounding for the performance of an autopsy that might well influence the future reproductive behaviour of a bereaved family. That it is not seems to be borne out by the two current reports. In the past it was possible to learn by experience, but today the combination of declining perinatal mortality and falling autopsy rates means that some pathologists will see too few cases to maintain any expertise they acquired during their training, let alone develop it further. In addition, many of the apparently more interesting cases are transferred to tertiary referral centres for clinical management and, on occasion, autopsy. Obstetrics (particularly fetal medicine) and neonatology are rapidly advancing areas of medicine, and knowledge of these advances is essential to the proper interpretation of autopsy data. In the face of many other demands, few pathologists other than those with a professed special interest in the subject will keep abreast of such developments.
It is therefore not surprising that many pathologists are looking to opt out