MEDICAL PROCEDURES, TESTING AND TECHNOLOGY

NETWORK META-ANALYSIS: EVIDENCE SYNTHESIS WITH MIXED TREATMENT COMPARISON

No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in rendering legal, medical or any other professional services.


MEDICAL PROCEDURES, TESTING AND TECHNOLOGY

Additional books in this series can be found on Nova's website under the Series tab.

Additional e-books in this series can be found on Nova's website under the e-book tab.


MEDICAL PROCEDURES, TESTING AND TECHNOLOGY

NETWORK META-ANALYSIS: EVIDENCE SYNTHESIS WITH MIXED TREATMENT COMPARISON

GIUSEPPE BIONDI-ZOCCAI
EDITOR

New York


Copyright © 2014 by Nova Science Publishers, Inc.

All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher.

For permission to use material from this book please contact us: Telephone 631-231-7269; Fax 631-231-8175. Web Site: http://www.novapublishers.com

NOTICE TO THE READER

The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers' use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works.

Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication.

This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.

Additional color graphics may be available in the e-book version of this book.

Library of Congress Cataloging-in-Publication Data

Network meta-analysis : evidence synthesis with mixed treatment comparison / editor: Giuseppe Biondi-Zoccai (Assistant Professor in Cardiology, Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Italy).
pages cm -- (Medical procedures, testing and technology)
Includes index.
ISBN:  (eBook)

1. Meta-analysis. I. Zoccai, Giuseppe Biondi, editor.
R853.M48N48 2014
610.28--dc23
2014016913

Published by Nova Science Publishers, Inc. † New York


To my beloved Marzia, Giovanni Vincenzo, Giuseppe Giulio, and Attilio Nicola


CONTENTS

Acknowledgements
Giuseppe Biondi-Zoccai, M.D.

1st Section

Foreword: The Statistician's Perspective
Mauro Gasparini, Ph.D.

Foreword: The Epidemiologist's Perspective
Gordon H. Guyatt, M.D.

Foreword: The Translational Researcher's Perspective
Giacomo Frati, M.D.

Foreword: The Clinician-Investigator's Perspective
Antonio Abbate, M.D., Ph.D.

Preface
Giuseppe Biondi-Zoccai, M.D.

2nd Section

Chapter 1. The Hierarchy of Evidence
Oscar L. Morey-Vargas, M.D., Claudia Zeballos-Palacios, M.D., Michael R. Gionfriddo, Pharm.D. and Victor M. Montori, M.D., M.Sc.

Chapter 2. From Pairwise to Network Meta-Analyses
Sonya J. Snedecor, Ph.D., Dipen A. Patel, Ph.D. and Joseph C. Cappelleri, Ph.D.

3rd Section

Chapter 3. Designing and Registering the Review
Alison Booth, M.Sc. and Dawn Craig, M.Sc.

Chapter 4. Searching for Evidence
Su Golder, Ph.D. and Kath Wright

Chapter 5. Abstracting Evidence
Joey Nicholson, M.L.I.S., M.P.H. and Sripal Bangalore, M.D., M.H.A.

Chapter 6. Appraising Evidence
Partha Sardar, M.D., Anasua Chakraborty, M.D. and Saurav Chatterjee, M.D.

4th Section

Chapter 7. Choosing between Frequentist and Bayesian Frameworks and the Corresponding Statistical Package
Giuseppe Biondi-Zoccai, M.D. and Fabrizio D'Ascenzo, M.D.

Chapter 8. Choosing the Statistical Model and between Fixed and Random Effects
Joseph Beyene, Ph.D., Ashley Bonner, M.Sc. and Binod Neupane, Ph.D.

Chapter 9. Choosing the Appropriate Statistics
Jing Zhang, Ph.D. and Lifeng Lin, Ph.D. Student

5th Section

Chapter 10. Incorporating Moderators: Network Meta-Regression
Songfeng Wang, Ph.D. and Neil Hawkins, Ph.D.

Chapter 11. Appraising Between-Study Heterogeneity
Joel J. Gagnier, Ph.D.

Chapter 12. Appraising Inconsistency between Direct and Indirect Estimates
Konstantinos Katsanos, M.D., M.Sc., Ph.D., E.B.I.R.

Chapter 13. Appraising Small Study Effects and Publication Bias
Giuseppe Biondi-Zoccai, M.D. and Fabrizio D'Ascenzo, M.D.

Chapter 14. Moving from Study-Level to Patient-Level Data: Individual Patient Network Meta-Analysis
Areti Angeliki Veroniki, Ph.D., Tania B. Huedo-Medina, Ph.D. and Kostas N. Fountoulakis, M.D., Ph.D.

Chapter 15. State of the Art Reporting of Network Meta-Analyses
Andrea Cipriani, Ph.D., John R. Geddes, M.D., Anna Chaimani, M.Sc., Stefan Leucht, M.D. and Georgia Salanti, Ph.D.

6th Section

Chapter 16. Case Study in Anesthesia and Intensive Care
Teresa Greco, M.Sc., Giovanni Landoni, M.D., Omar Saleh, M.D. and Alberto Zangrillo, M.D.

Chapter 17. Case Study in Cardiovascular Medicine
Yulei He, Ph.D., John A. Bittl, M.D., Abera Wouhib, Ph.D. and Sharon-Lise T. Normand, Ph.D.

Chapter 18. Case Study in Neurology
Gaetano Zaccara, M.D. and Fabio Giovannelli, Psy.D., Ph.D.

Chapter 19. Case Study in Psychiatry
Toshi A. Furukawa, M.D., Ph.D.

Chapter 20. Case Study in Rheumatology
Susanne Schmitz, Ph.D., Roisin Adams, Ph.D., Michael Barry, Ph.D. and Cathal Walsh, Ph.D.

7th Section

Chapter 21. Moving from Evidence Synthesis to Action
Fabrizio D'Ascenzo, M.D., Claudio Moretti, M.D., Ph.D., Pierluigi Omedè, M.D. and Fiorenzo Gaita, M.D.

Chapter 22. The Future of Network Meta-Analysis: Toward Accessibility and Integration
Matthew A. Silva, Pharm.D., R.Ph., B.C.P.S. and Gert van Valkenhoef, M.Sc., Ph.D.

Conclusion
Giuseppe Biondi-Zoccai, M.D.

Index


ACKNOWLEDGEMENTS

Giuseppe Biondi-Zoccai*, M.D.
Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Italy
* Corresponding author: Giuseppe Biondi-Zoccai, MD, Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Corso della Repubblica 79, 04100 Latina, Italy. Phone: +39 07731757245. Fax: +39 07731757254. Email: [email protected].

Too many people deserve thanks for their ongoing support of my professional and personal endeavors. Of course, my whole family should be thanked first and foremost: my mother Giulia, my father Gianni, my brother Vincenzo, and my sisters Erica and Gina.

While finding people who have been broadly instrumental in enabling me to fulfill my clinical and academic ambitions is relatively easy, finding those who have proved key mentors, in one way or another, is much more challenging. One of my career leitmotifs has in fact always been the habit of challenging everything that was told or shown to me by other colleagues, be they senior or junior. This meant that most supposed opinion leaders and authorities proved less than optimal mentors, unless their teachings and recommendations were very solidly grounded. However, a subset of the purported mentors did prove remarkably smart and authoritative, leaving a more or less pronounced impression on my career.

Despite our limited interactions, any formal acknowledgment of influential colleagues should include Prof. Giuseppe Gallus, who first taught me medical statistics at the Faculty of Medicine of the University of Milan in 1993, as well as some anonymous lecturers in calculus and geometry at the Faculty of Engineering of the Polytechnic University of Milan. Indeed, when still 19 years old, I had pre-enrolled in courses leading to both an MD diploma and a nuclear engineering BSc-MSc diploma in Milan, Italy. As the courses for the latter had begun earlier, I had had the opportunity to listen to some lectures on calculus and geometry. Given that my high school education had mainly focused on classical subjects, such as ancient Greek and Latin, I got really scared, as I could not understand much, and thus decided to abandon altogether the idea of becoming a nuclear engineer, moving instead to a career in medicine. This professional retreat haunted me for many years, either consciously or


subconsciously, and it most likely was a key driving force in pushing me to study statistics and research methods continuously, and often in a self-taught fashion.

Dr. Massimo Conio, an outstanding gastroenterologist currently practicing in San Remo, Italy, who had previously trained in France and in the US, provided me with key suggestions on how to study and learn clinical as well as more methodological topics when I was still an inexpert 3rd-year medical student. Then, I began collaborating (and continue to collaborate to this day) with Dr. Pierfrancesco Agostoni, at the time another medical student just like me, but now consultant cardiologist at the Utrecht University Medical Center, Utrecht, the Netherlands. Pierfrancesco is also one of my best friends, but I must admit that his was the idea of preparing and submitting our first scholarly manuscript ever. This was simply a letter of comment on a study which had been published in Circulation. [1] Yet, in this small letter to the editor there were, in a sense, all the salient features of our subsequent meta-analyses, [2] such as the attention to clinical relevance, the background bibliographic search, the focus on methodological rigor, and the effort to carefully and correctly apply sound yet precise statistical methods to the question at hand. I still remember when and where we conceived and wrote the first draft. We were on a Rome to Milan train, coming back home after having attempted to secure a spot in the prestigious cardiology training program led by Prof. Attilio Maseri at the Catholic University, in Rome, Italy.

I indeed secured a spot in that residency program, and had there the unique luck of meeting and befriending Prof. Antonio Abbate. He was at the time a cardiology fellow just like me, but he proved a true mentor and friend, and our ongoing collaboration and unity is attested by the 123 scholarly papers we co-authored together between 2002 and 2014. [3-4] His perspective on my career efforts and on the pros and cons of network meta-analysis, based on his excellent standing as a clinician-investigator, is explicitly stated in the Foreword he kindly contributed to this book. Another dear friend and colleague I was fortunate to meet in Rome during my cardiology training, and with whom I have continued to collaborate during all these years, is Dr. Enrico Romagnoli. His remarkable clinical and research skills closely match his genuine affection for me and my family. After obtaining my diploma in cardiology, I had the luck to pursue additional clinical and research training in interventional cardiology with Drs. Antonio Colombo and Giuseppe Sangiorgi at San Raffaele Hospital, in Milan, Italy. [5-6] Subsequently, I enjoyed the friendship and mentorship of Prof. Imad Sheiban at the University of Turin, Italy, [7] where I was able myself to teach, train and mentor a brilliant and talented junior colleague, Dr. Fabrizio D'Ascenzo, who, I bet, will be able to surpass his own mentor in both clinical competence and research successes. [8] Finally, I was able to return to my beloved family in Rome in 2012, when joining the research group led with unique competence and open-mindedness by Prof. Giacomo Frati at Sapienza University of Rome. Giacomo and our common colleague in Rome, Dr. Mariangela Peruzzi, have proved true friends as well, providing me with much guidance and support whenever the need arose.
Notwithstanding all the important persons mentioned above, my beloved wife Marzia, also a competent and caring cardiologist, and our wonderful kids, Giovanni Vincenzo, Giuseppe Giulio and Attilio Nicola, are of course my main inspiring energy and deepest treasure. Their loving care has been the inextinguishable light which has guided my path in the brightest as well as the darkest times. I will never forget what they did for me.


The informed and sensible reader of this book should recognize that they are the true moving force of this humble piece of work.

Rome, March 14, 2014

REFERENCES

[1] Agostoni P, Biondi-Zoccai G. Prognostic value of C-reactive protein in unstable angina. Circulation 2000; 102: E177.
[2] Agostoni P, Biondi-Zoccai G, de Benedictis ML, Rigattieri S, Turri M, Anselmi M, Vassanelli C, Zardini P, Louvard Y, Hamon M. Radial versus femoral approach for percutaneous coronary diagnostic and interventional procedures: systematic overview and meta-analysis of randomized trials. J Am Coll Cardiol 2004; 44: 349-56.
[3] Biondi-Zoccai G, Abbate A, Biasucci LM. B-type natriuretic peptide and acute coronary syndromes. N Engl J Med 2002; 346: 453-5.
[4] Van Tassell BW, Arena R, Biondi-Zoccai G, McNair Canada J, Oddi C, Abouzaki NA, Jahangiri A, Falcao RA, Kontos MC, Shah KB, Voelkel NF, Dinarello CA, Abbate A. Effects of Interleukin-1 Blockade With Anakinra on Aerobic Exercise Capacity in Patients With Heart Failure and Preserved Ejection Fraction (from the D-HART Pilot Study). Am J Cardiol 2014; 113: 321-7.
[5] Biondi-Zoccai G, Agostoni P, Abbate A, Testa L, Burzotta F, Lotrionte M, Crea F, Biasucci LM, Vetrovec GW, Colombo A. Adjusted indirect comparison of intracoronary drug-eluting stents: evidence from a meta-analysis of randomized bare-metal-stent-controlled trials. Int J Cardiol 2005; 100: 119-23.
[6] Biondi-Zoccai G, Lotrionte M, Agostoni P, Abbate A, Fusaro M, Burzotta F, Testa L, Sheiban I, Sangiorgi G. A systematic review and meta-analysis on the hazards of discontinuing or not adhering to aspirin among 50,279 patients at risk for coronary artery disease. Eur Heart J 2006; 27: 2667-74.
[7] Lotrionte M, Biondi-Zoccai G, Agostoni P, Abbate A, Angiolillo DJ, Valgimigli M, Moretti C, Meliga E, Cuisset T, Alessi MC, Montalescot G, Collet JP, Di Sciascio G, Waksman R, Testa L, Sangiorgi G, Laudito A, Trevi GP, Sheiban I. Meta-analysis appraising high clopidogrel loading in patients undergoing percutaneous coronary intervention. Am J Cardiol 2007; 100: 1199-206.
[8] D'Ascenzo F, Bollati M, Clementi F, Castagno D, Lagerqvist B, de la Torre Hernandez JM, ten Berg JM, Brodie BR, Urban P, Jensen LO, Sardi G, Waksman R, Lasala JM, Schulz S, Stone GW, Airoldi F, Colombo A, Lemesle G, Applegate RJ, Buonamici P, Kirtane AJ, Undas A, Sheiban I, Gaita F, Sangiorgi G, Modena MG, Frati G, Biondi-Zoccai G. Incidence and predictors of coronary stent thrombosis: evidence from an international collaborative meta-analysis including 30 studies, 221,066 patients, and 4276 thromboses. Int J Cardiol 2013; 167: 575-84.


1ST SECTION


FOREWORD: THE STATISTICIAN'S PERSPECTIVE

Mauro Gasparini*, Ph.D.
Department of Mathematical Sciences, Politecnico di Torino, Torino, Italy
* Corresponding author: Mauro Gasparini, Dipartimento di Scienze Matematiche, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy. Phone: +39 0110907546. Fax: +39 0110907599. Email: [email protected].

Network meta-analysis (NMA) is a dream come true for the research statistician. Statistics is often thought to give sensible results only in the presence of large amounts of data and information. Quite to the contrary, Statistics can certainly be used fruitfully when a lot of data are available, but it becomes most useful and best appreciated by its users in situations of scarce data, partial information, conflicting evidence and counterintuitive numerics. All these characteristics are typical of network meta-analysis, which is a methodology to compare multiple treatments that were not originally challenged against one another with adequate precision. In network meta-analysis, data are scarce, since patient data are still costly and time-consuming to collect, unlike other high-throughput situations such as whole genome scans, sentiment analysis of the web, market research, business analytics and, generally speaking, all contemporary computer-intensive applications of machine learning. Furthermore, for the problems usually addressed by network meta-analysis, the information building up as data accumulate is not self-evident, even in the presence of a large number of patients. In addition, the evidence network may contain inconsistencies, in terms of conflicting indications of superiority between treatments.

Given all these complications, it is not enough to do a good bookkeeping job and compute percentages and simple averages: a statistical model is needed. This remark becomes even more compelling if we consider that all conclusions we may reach are affected by a high degree of uncertainty, which can be properly quantified by a statistical model-based procedure. For example, a good statistical procedure will give you the best possible ranking of the treatments, but also an associated degree of confidence (or credibility, within the Bayesian approach) in that ranking and in a set of neighboring possible alternative rankings. Often, second best is preferable to best, for a myriad of reasons which may be related to costs, patient conditions, timing, availability of alternative care and so on. Uncertainty quantification will help the practitioner weigh the alternative options in an educated way.


Statistical models for network meta-analysis are naturally built around the concept of a network of pairwise (or higher-order) treatment comparisons. Data arising from network structures are very common nowadays and attract the interest of researchers from many different fields, including the machine learning applications mentioned above; think about the hype surrounding social networks these days. In social networks, nodes are people; in statistical physics, nodes are particles; in bibliometrics, nodes may be articles or scientists; in association football and other sports, nodes are teams; in network meta-analysis, nodes are treatments. Processing data from networks would have been unthinkable just 30 years ago, or even more recently. The technological progress making it possible includes not only the enormous increase in computer power but also the elaboration of clever algorithms, which can be broadly classified as Markov chain Monte Carlo (MCMC), message passing, belief propagation and the like. Some of these algorithms are simulation-based, others perform exact (or approximately exact) computations, but they all rely on the idea of reducing the computation of complicated joint distributions to a series of simpler, node-by-node ("local") computations. The BUGS language with all its dialects (WinBUGS, OpenBUGS, STAN, et cetera) is an example of MCMC-based software widely used in network meta-analysis. [1] From this point of view, network meta-analysis is a good example of a very up-to-date technological achievement.

Of course, in order to properly understand the concepts used in network meta-analysis, an educated user must know the tools of meta-analysis and evidence synthesis (a good book on the technical side for the learning statistician is the one by Welton and colleagues) [2] or at least master more basic concepts of Statistics, both within the Bayesian and the non-Bayesian (also called frequentist, or sometimes classical) approaches. There are plenty of general textbooks in classical Biostatistics, for example the authoritative one by Bland, [3] whereas a good introduction to Bayesian ideas applied to biomedical problems is the work by Spiegelhalter and colleagues. [4] Briefly, when doing statistics the Bayesian way, the researcher is allowed, and actually encouraged, to use all available information regarding the problem at hand, and in particular prior information not generated by the data but coming from external sources. Whether it is acceptable to use this external information, which may be subjective, is the crux of the matter and a source of eternal debate among statisticians. Unlike half a century ago, when doing Bayesian statistics seemed a complicated and rather academic exercise, nowadays Bayesian methods are particularly favored by the technological developments mentioned above, in particular MCMC methods. The current attitude among statisticians is therefore rather ecumenical, and Bayesian methods are far more widely accepted than in the past, with network meta-analysis being one of the fields in which Bayesian methods are prevalent.

Perhaps one of the concepts the non-statistical user will find rather novel in network meta-analysis is the systematic use of the Generalized Linear Model (GLM). Even if the reader may not find this expression very often in the book, the GLM is a unifying idea underlying the analysis of dichotomous (binary) data (used in several case studies in the final chapters of the book), polytomous data, count data and survival data. Practitioners are usually familiar with regression, which was invented towards the end of the nineteenth century as a way to express the statistical dependence of a quantitative trait (such as the height of sons) on another quantitative trait (such as the height of fathers). Simple linear regression can be extended to account for multiple predictors, possibly qualitative, which are linked together in a linear combination in order to explain the behavior of a quantitative response, usually taken to be normally distributed. This way we obtain the so-called general (not yet generalized!) linear model (PROC GLM in SAS, or function lm() in R). But not all responses are normally distributed, and not all of them are quantitative: counts, rates and survival times, which are common in biomedical data and in network meta-analysis, are typical examples. That is why, starting from the 1970s, GLMs were invented to treat non-normal and non-quantitative responses as functions, on average, of linear combinations of predictors. See for example Dobson and Barnett. [5]
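A minimal sketch may make the general-versus-generalized distinction concrete. The following illustration (not from the foreword; it uses only base R, and all data and variable names are simulated for the example) fits the same linear predictor twice: once with lm() for a quantitative, normally distributed response, and once with glm() and a logit link for a binary response of the kind used in several case studies later in the book.

```r
# Illustrative sketch with simulated data (base R only; names are invented).
set.seed(42)
n     <- 200
dose  <- runif(n, 0, 10)   # quantitative predictor
smoke <- rbinom(n, 1, 0.3) # qualitative (0/1) predictor

# Quantitative, normally distributed response: general linear model via lm()
bp <- 120 + 1.5 * dose + 6 * smoke + rnorm(n, sd = 5)
fit_lm <- lm(bp ~ dose + smoke)

# Binary response: generalized linear model with a logit link via glm()
p_event <- plogis(-2 + 0.3 * dose + 0.8 * smoke)
event   <- rbinom(n, 1, p_event)
fit_glm <- glm(event ~ dose + smoke, family = binomial(link = "logit"))

summary(fit_lm)   # coefficients act on the mean of a normal response
summary(fit_glm)  # coefficients act on the log odds of a binary response
```

The linear predictor is identical in the two calls; only the assumed response distribution and link function change, which is precisely the unifying idea behind the GLM.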


Finally, a typical expression which pervades the statistical analysis of structured data such as network meta-analysis is "random effects". An effect is, generally speaking, the quantification of the way a predictor affects a response. A fixed effect is a constant average effect of a certain level of a predictor of fundamental general importance, such as "smoking" or "female". A random effect will instead be the effect of a level of a predictor which just happened, by chance, to be present in the data: for example, a certain specific vehicle which was used in an experiment, the combination of circumstances which produced a trial, or a well-identified patient who happened to volunteer for a trial. Random effects historically represent an important point of convergence between Bayesian and non-Bayesian methods, but they are certainly more natural within a Bayesian approach, where exchangeability assumptions can be made regarding levels of a network structure below the observable level.

I had better stop here, before getting onto further technical ground, and wish the reader the most profitable return from this welcome book.
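For readers who would like to see the fixed-versus-random distinction in action before moving on, here is a minimal sketch pooling the same five hypothetical study estimates under both models. It assumes the metafor R package; the package choice and all numbers are assumptions of this illustration, not the foreword's own material.

```r
# Illustrative sketch with invented data, assuming the 'metafor' package.
library(metafor)

yi <- c(-0.40, -0.10, -0.55, 0.05, -0.30)  # hypothetical study log odds ratios
vi <- c(0.04, 0.09, 0.06, 0.12, 0.05)      # their sampling variances

fixed  <- rma(yi, vi, method = "FE")    # one common true effect assumed
random <- rma(yi, vi, method = "REML")  # study effects exchangeable around a mean

# The random-effects interval is typically wider, because the between-study
# variance tau^2 is added to each study's own sampling variance.
fixed
random
```

The widening of the random-effects interval is the frequentist face of the exchangeability assumption discussed above: each study's effect is treated as a draw from a distribution of true effects, rather than as a noisy replicate of a single common effect.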

REFERENCES

[1] Lunn D, Jackson C, Best N, Thomas A, Spiegelhalter D. The BUGS Book. Boca Raton: CRC Press; 2013.
[2] Welton NJ, Sutton AJ, Cooper NJ, Abrams KR, Ades AE. Evidence Synthesis for Decision Making in Health-Care. New York: Wiley; 2012.
[3] Bland M. An Introduction to Medical Statistics. 3rd edition. Oxford: Oxford University Press; 2000.
[4] Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. New York: Wiley; 2004.
[5] Dobson A, Barnett AG. An Introduction to Generalized Linear Models. 3rd edition. Boca Raton: CRC Press; 2008.


FOREWORD: THE EPIDEMIOLOGIST'S PERSPECTIVE

Gordon H. Guyatt*, M.D.
Department of Clinical Epidemiology and Biostatistics, McMaster University, Ontario, Canada
* Corresponding author: Gordon H. Guyatt, MD, Department of Clinical Epidemiology & Biostatistics, McMaster University, Rm. 2C12, 1280 Main Street West, L8S 4K1 Hamilton, Ontario, Canada. Phone: +1 9055259140 x22900. Fax: +1 9055243841. Email: [email protected].

In 1991, the year the term "evidence-based medicine" (EBM) first appeared in the medical literature, [1] there was no training in understanding relevant published research studies in either medical schools or post-graduate training programs. Thus, most clinicians had no idea how to read the methods and results of studies that should have informed their clinical practice, or how to understand the results in a way that facilitated rational practice or shared decision-making. Neither did the experts on whom clinicians relied understand how to use the literature. The result was expert recommendations that disagreed with one another, lagged a decade or more behind the relevant evidence, or were inconsistent with the available evidence. [2]

It was in this world that EBM arose as an attempt to facilitate clinical practice consistent with the best available evidence. [3] Initially there was a single underlying principle of EBM: some evidence is more trustworthy than other evidence. Later, the second principle became evident: value and preference judgments are involved in every clinical decision, and evidence itself never determines the right course of action. [4] In 1991, the science of systematic reviews was in its infancy, and the initial focus of EBM was on guiding the critical appraisal and interpretation of primary studies. Over the years, in parallel with the maturing of the science of systematic reviews, it became evident that individual studies were likely to be unrepresentative, that one could not expect clinicians to do their own reviews of the relevant literature, and that optimal clinical care – and optimal expert recommendations – needed pre-processed information. Only in the latest iteration of the Users' Guides to the medical literature has the necessity for systematic summaries of the best available evidence to ensure optimal practice been acknowledged as the third core principle of EBM. [5] In 2014, systematic reviews are universally acknowledged as an essential foundation of optimal clinical practice, and meta-analysis – the generation of a single best estimate of


intervention effects on a particular outcome – as a valuable tool for guiding clinical decision-making. The last decade has seen, however, an increasing acknowledgement that existing review methodology, which involves the comparison between a single intervention and a single alternative, has important limitations. Clinicians and patients in 2014 are often faced with the choice between multiple alternative management strategies. Which of a dozen antidepressants, a dozen biologic agents for inflammatory disease, or a handful of alternative surgical procedures for tibial fractures will yield optimal outcomes for their patients? The choice becomes even more difficult when, as is almost universally the case, investigators have undertaken direct or head-to-head comparisons of only some of the available alternatives. Network meta-analysis (NMA), which allows the simultaneous comparison of multiple agents against one another, has arisen as one potential solution to this problem. The number of network meta-analyses is exploding, intensive methodologic work advancing the new approach is ongoing, and improvements in sophistication and presentation are occurring on an almost monthly basis.

Despite the continued rapid pace of development of network meta-analysis, this book is timely. Drawing on the experience and the innovations of the leaders in the field, the book presents a state of the art picture conveying both the basic principles and examples of the range of optimal network meta-analysis practice. As is inevitable in a rapidly evolving new methodology, there is no shortage of alternative perspectives and of accompanying controversy. This foreword provides an opportunity for me to air some of my particular perspectives.

The most seductive, and dangerous, aspect of network meta-analysis is the ranking of treatments. The ranking will be irresistible for the "don't-expect-me-to-think-just-tell-me-what-to-do" sort of clinicians. Often, these rankings are unequivocally misleading. The methodology allows interventions to be ranked first when their superiority to alternatives is marginal, based on evidence warranting only low or very low confidence, or both. Under such circumstances, the rankings are doing more harm than good. The solution to the problem – and indeed the key to making transparent network meta-analysis results that can be obscure and even impenetrable – is an explicit presentation of confidence in both direct and indirect estimates for each paired comparison in the network. Sophisticated methodology for making confidence ratings in direct estimates is already available and widely used. There are challenges to making similar ratings for indirect estimates, but they are surmountable.

This book should be mandatory reading for current or aspiring practitioners of network meta-analysis: there is no better way to understand state-of-the-art thinking regarding this crucial new tool for achieving optimal evidence-based practice. Clinicians who wish to be thoughtful in their application of evidence from network meta-analysis will also find the book extremely informative. Network meta-analysis represents a new adventure in the summarization and presentation of critical evidence for clinical decision-making, and this book is a key to that adventure.


REFERENCES

[1] Guyatt G. Evidence-based medicine. ACP J Club (Annals of Internal Medicine) 1991; 114: A-16.
[2] Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA 1992; 268: 240-8.
[3] Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA 1992; 268: 2420-5.
[4] Guyatt GH, Haynes RB, Jaeschke RZ, Cook DJ, Green L, Naylor CD, Wilson MC, Richardson WS. Users' Guides to the Medical Literature: XXV. Evidence-based medicine: principles for applying the Users' Guides to patient care. Evidence-Based Medicine Working Group. JAMA 2000; 284: 1290-6.
[5] Guyatt G, Jaeschke R, Wilson M, Montori V, Richardson S. What is evidence-based medicine. In: Guyatt G, Meade M, Cook D, editors. The Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 3rd ed. New York: McGraw-Hill; 2014.


FOREWORD: THE TRANSLATIONAL RESEARCHER'S PERSPECTIVE

Giacomo Frati*, M.D.
Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Latina, Italy
* Corresponding author: Prof. Giacomo Frati, Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Corso della Repubblica 79, 04100 Latina, Italy. Phone: +39 07731757245. Fax: +39 07731757254. Email: [email protected].

Funding: None.

Conflicts of interest: Prof. Frati holds a patent concerning stem cells in cardiovascular medicine (Patent Italy-RM2003A000376, 31.07.2003 - WO2005012510, 2005-02-10, Giacomello A, Messina E, Battaglia M, Frati G, Method for the isolation and expansion of cardiac stem cells from biopsy, Owner: Sapienza University of Rome) and a patent concerning platelet lysate in regenerative medicine (Patent Italy-RM2011A000500, 23.09.2011-IT, Frati L, Frati G, Nuti M, Pierelli L, Platelet lysate, uses and method for the preparation thereof - Lysat de plaquettes, ses utilisations et son procédé de préparation, Applicants: Sapienza University of Rome (IT); Futura Stem Cells SA (CH); Pub. No.: WO/2013/042095, International Application No.: PCT/IB2012/055062).

Prometheus, by giving fire to humanity, performed an act that enabled progress and civilization. He is known for his intelligence and as a champion of mankind.
Ἡσίοδος, Θεογονία, 700 B.C.

Translational research represents a branch of scientific research that helps make findings from basic science useful for practical applications that enhance human health and wellbeing. Deeply rooted in multi-disciplinary collaboration, translational research has enormous potential to move applied science forward. To date this is particularly true in medicine, with so-called translational medicine: research that aims to move from bench to bedside, or from laboratory experiments through clinical trials to point-of-care patient



applications. The prevalent theme of this branch of research is without doubt regenerative medicine, with stem cell-based approaches at its zenith. Nevertheless, the concept of tissue renewal is not completely new. In Greek mythology, Prometheus stole fire from Zeus and gave it to mankind. As punishment, Zeus had him chained to a rock where a great vulture tore at his liver every day. During the night, the liver grew whole again, only to have the vulture devour it again the next day. Moving ahead, in 1894 the first classification of tissues based on their renewal potential was published by Giulio Bizzozzero (1846-1901), an Italian professor working at the University of Pavia and at the University of Turin, both in Italy. After reviewing his own data and carefully revising the literature available at that time, Bizzozzero concluded that all tissues belonged to one of three categories: "labile", "stable" or "everlasting". [1] After a century, the regrowth of Prometheus' liver has become a symbol to medical researchers of the possible renewal of damaged human organs through the use of stem cells.

Thus, over the past two decades, great excitement and controversy have been growing in the scientific, financial and lay literature about the potential of stem cell- and tissue engineering-based strategies for the treatment of acute and chronic diseases. Such interest arises from the premise that this approach could provide, for the first time, not only a tool for injury control but also absolute damage elimination, meaning the complete removal of the underlying cause of pathology. The related opportunities in medicine and research for fellows, graduating PhD students and aspiring physician-scientists have thus never been so exciting and extraordinary, with the scientific and academic environments constantly changing, evolving and transforming themselves. These adaptations are mainly related to the evolution of motivation and values among different generations and their interaction with new technologies.

Recent developments in the characterization of stem cell biology, in molecular genetic screening, proteomics and metabolomics are changing the way investigators identify novel signal transduction pathways and functional cross-talks and develop new therapeutic strategies and patterns. Evolving imaging technologies at the anatomic as well as the molecular level can now better elucidate the fundamental processes contributing to several diseases. New insights into the mechanisms involved in cell turnover, dysfunction and apoptosis have important therapeutic implications, especially for elderly patients. Stem cell-based therapies joined to tissue engineering expertise hold the potential to revolutionize the treatment of organ failure by achieving what would have been unthinkable only a few years ago: tissue regeneration. The healing potential of this new therapeutic approach may clarify why translational efforts have proceeded at such a rapid pace. Improvements in the pharmacology field, thanks to the discovery of new molecules hitting selective targets or providing pharmacologic treatment of a genetic disease by targeting the signaling pathway between the genetic mutation and the phenotypic defect, have led to better-quality treatment of many diseases.
This global flow of information and knowledge concerning very different pathologies and treatments represents, to date, a revolution similar to the one that signaled the start of the Renaissance. In fact, just as the Renaissance saw a blossoming of arts and knowledge mostly due to a single invention – Gutenberg's printing press – that multiplied the amount of information any individual or organization could receive, elaborate and transmit in a given period of time, in the same way the Internet has technically made possible a


hundreds-fold reduction in the effort of receiving and acquiring information, processing data and sending information, and thus an unprecedented global diffusion of knowledge. But how can all this information really affect the clinician's daily practice, especially in the field of translational research? Just as the perfect mentor should awaken the interest, the curiosity and the critical sense of their students and residents, using all the methods and formats they are familiar with, in the same way the perfect literature review should be viewed as a methodological approach able to ensure the highest safety and quality of care for our patients. Researchers, physicians and also patients today are literally inundated by manifold, completely different and alternative management strategies, and the correct interpretation of so many data pertinent to translational medicine may result in total confusion. On 23 May 2013 this confusion culminated in the Italian Health Minister's decision to endorse a law regarding controversial stem cell-based therapies in 32 young terminally ill patients. The emotion of those whose children were terminally ill proved to be a powerful weapon, more powerful even than the current literature and the path pioneered by evidence-based medicine. [2] If we agree that decisions about where to invest and where to cut should be based on clinical evidence, and not on ideology, personal experience or informed opinion (all extremely difficult to appraise and constructively criticize), then the lack of scientific evidence proving the efficacy of any kind of therapy must imply its outlaw status.

Translational research as well as translational medicine require that information and data flow from hospitals, clinics and study participants, in an organized and structured format, to repositories and laboratories, and back. Also, the scale, scope and multi-disciplinary approach that translational research requires mean a new level of operations management capabilities within and across studies, repositories and laboratories, and once again back. Fast-forwarding to current evidence-based practice concerning translational medicine applied to myocardial regeneration by bone marrow-derived stem cells, we are faced with a plethora of small or moderately sized trials of different quality, sometimes challenging each other. In the best-case scenario, in which they do not contradict themselves, they may push researchers, clinicians and also patients (the worst situation!) in different directions, and provide dissimilar effect estimates with mixed results and with benefits ranging from absent to transient and, at most, marginal. To date, probably the mainstay of the entire field has been the extended and often exceedingly liberal use of bone marrow stem cells, here meant as a concept as well (and as much) as a popular cell type in almost every stem cell lab around the world. The only other conceivable way to address such a challenge in a constructive and practical fashion, thus generating a potential solution to this problem, is to rely on systematic reviews and meta-analyses that pool those small or moderately sized studies. [3] Lastly, network meta-analysis (NMA), allowing the simultaneous comparison of multiple treatments against one another, may represent the best way to elucidate this blend of results. This book edited by Dr. Biondi-Zoccai constitutes an exceptional opportunity for readers at all levels to become skilled in network meta-analysis.
What the reader can expect to gain from this kind of cooperation is not only basic competences, but also the ability to "manipulate" data. In brief, this project starts from a core made of observations, concepts and tools, and aims at wedding advanced statistical methods to the common sense of the busy clinician, who always, albeit implicitly, indirectly compares different randomized trials. Here, Biondi-Zoccai hits the jackpot, moving from insight to in-sight, and back.


REFERENCES

[1] Bizzozzero G, Salvioli G. Sulla struttura delle membrane sierose e particolarmente del peritoneo diaframmatico. G Accad Med Torino 1876; 19: 466.
[2] Frati P, Frati G, Gulino M, Vergallo G, Di Luca A, Fineschi V. Stem cell therapy: from evidence-based medicine to emotion-based medicine? The long Italian way for a scientific regulation. Stem Cell Res Ther 2013; 4: 122.
[3] Biondi-Zoccai G, Peruzzi M, Frati G. Which do you like better…a bowl of Cheerios or a Big Mac? Pros and cons of meta-analyses in endovascular research. J Endovasc Ther 2013; 20: 145-8.

FOREWORD: THE CLINICIAN-INVESTIGATOR’S PERSPECTIVE Antonio Abbate*, M.D., Ph.D. Division of Cardiology - VCU Pauley Heart Center, Department of Internal Medicine, Virginia Commonwealth University, Medical College of Virginia Hospitals, Richmond, VA, US To study the phenomenon of disease without books is to sail an uncharted sea, .. while to study books without patients is not to go to sea at all. Sir William Osler. "Books and Men" in Boston Medical and Surgical Journal (1901)

Physicians had been using meta-analysis long before they knew the meaning of the word. When faced with a clinical question, the physician searches his mind for learned paradigms in the diagnosis or treatment of a given disease. What if we asked ourselves where do we learn such paradigms from? We would immediately realize that we did not learn only from our own experience, as we have often diagnosed and treated conditions that we had not personally encountered before. We would also concede that we have not simply learned them from our seniors or advisors, as we may well use a drug or a test that was not available to them. We would smile and not even consider that we had learned them in medical school, as the lectures of anatomy, pathology and physiology appear so remote from the patient in front of us. We would never admit to guessing, as we consider medicine as a science, and guessing is not allowed. We would finally proclaim that ‗evidence‘ has shown so. Because, we, modern physicians, practice evidence-based medicine. But ‗Evidence‘? Which evidence? Can there be too much Evidence? An information overload?

*


I remember, not long ago, when the first large randomized clinical trials came about in Cardiology. Finally, long-awaited answers from thousands of patients randomly assigned to treatment A or placebo, and indications on what to expect in such instances. Studies of comparative efficacy soon followed, and we learned about treatment A and treatment B. A dream coming true: a better treatment, evidence-based! With the advent of clinical trials, physicians found themselves learning terminology that had nothing to do with Hippocrates or Galen: the "P value"? The "sample size"? The "effect size"? With new knowledge came new challenges. What to do when interpreting 2 studies comparing treatments A and B that show conflicting results? Treatment A or treatment B? Or no treatment at all? I doubt any physician would ever go with no treatment at all; by selecting and weighing the evidence in front of him, the physician will perform a meta-analysis and decide which treatment is best for the patient in question. In many ways, with much less evidence available (or at least with a different kind of evidence), the very same Hippocrates or Galen had to make decisions like this.

With the advent of the Public Library of Medicine (PubMed) and the globalization of medical literature, the number of studies has grown exponentially, generating a large amount of data that is often too difficult to 'digest'. With more than 50 studies on the effects of anticoagulation on stroke in patients with atrial fibrillation, how is one physician expected to comprehensively review and comprehend the data and apply the results to the individual patient? Hence the birth and rapid growth of meta-analytic techniques in medicine. Supported by solid mathematical modeling, meta-analysis allows for a comprehensive quantitative analysis of the effects of a given treatment in comparison with other treatments, through direct or indirect comparisons. A network meta-analysis is an overview, a snapshot, of currently existing data, providing the physician reader with an immediate understanding of which comparisons have been directly or indirectly explored. As such, the results of a meta-analysis are, by definition, the most valuable method to ascertain the effects.

This book edited by Dr. Biondi-Zoccai constitutes a unique opportunity for the reader to 'learn' about network meta-analysis in one single venue. The collection of a multitude of expert opinions allows exploring everything from the technical aspects of the analysis to the important challenges in the interpretation of the results. A subsequent series of case studies in different disciplines serves as an ideal setting to explore practical applications. The authors also never neglect to examine shortcomings and flaws of the meta-analytic approach. The classic statement of 'garbage in, garbage out' largely applies to the meta-analytic approach; as such, the reader is not excused from learning about the individual clinical trials. The book goes into detail on how to perform (or to interpret) an adequately performed literature search. An inaccurate or biased search will inevitably lead to a 'garbage' meta-analysis. The user of the meta-analytic technique will also learn when the evidence is so scant that even the most powerful analysis will have a limited impact, or when the inclusion criteria are too broad to allow for a reasonable input for patient care.
Determining the question and designing the review is likely one of the most important steps: will the question provide an answer that is useful for the physician? And will the question be narrow enough to allow a clear determination of the problem? Yet, will it be broad enough to be applicable to a substantial number of subjects? In my practice, I find that meta-analyses are rarely too narrow, especially if you consider how narrow the context in which one practices Medicine is: one single patient at a time! More often, the meta-analyses are too broad and the results have


a vague impact on patient treatment (i.e., anticoagulation prevents stroke in atrial fibrillation, without addressing the individual anticoagulants or the characteristics of the patients enrolled). In this regard, I find network meta-regression an important additional tool in the attempt to 'narrow' the scope and identify predictors of the effect (or the lack thereof).

Should we abandon traditional research, then, and rely on meta-analysis only? This is obviously absurd! Would you ever approach a statistician for help when you had not collected any data yet? It is obvious that the value of the meta-analysis lies in the adequacy of the data inputted. As such, in terms of studies (and data), the more the merrier. As a physician-researcher, I think that the advent and large use of meta-analysis does not challenge the conduct of small pilot studies, but rather empowers them. A meta-analysis of many small studies (if small study effects and publication biases are appraised) may provide sufficient power to reach conclusive results even when the individual studies were inconclusive. Does this mean that large studies are no longer needed? Definitely not, as the analysis will largely weigh the number of events, and as such there is nothing better than a study with a large number of observations or events. What about indirect comparisons? Will these obviate the need for head-to-head studies? Not at all, but they may guide the conduct of such studies, indicating an adequate design, sample size and strategy to achieve a definitive result. Appraising inconsistency between direct and indirect estimates is, indeed, a most important aspect of network meta-analysis.

Unexpected findings from network meta-analysis are also not uncommon. Current clinical research practice demands an a priori statement of the hypothesis and of the primary and secondary endpoints. As such, findings not included in the specified endpoints are considered second-grade findings. It often happens that large meta-analyses will explore endpoints or associations that were not the primary object of the individual studies. While this has a value (and I will explain why), it certainly carries a large amount of uncertainty, especially if one considers the accuracy (or lack thereof) with which the data for such endpoints were collected. The choice of the question and the use of strict inclusion and exclusion criteria may allow you to exclude studies in which the endpoint of interest in the meta-analysis was not a pre-specified endpoint in the individual study, or for which details on data accuracy are not available. Doing so, the analysis will be restricted to those studies with accurate data, at the obvious cost of losing data from studies that don't meet such strict criteria.
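To see in miniature why indirect comparisons complement rather than replace head-to-head trials, consider a minimal numeric sketch of an adjusted indirect comparison through a common comparator (the classic Bucher-style calculation; all numbers are invented for illustration, and only base R is used):

```r
# Illustrative sketch with made-up numbers: indirect comparison of A vs B
# through common comparator C, on the log odds ratio scale.
d_AC  <- -0.50; se_AC <- 0.20  # hypothetical A vs C estimate and standard error
d_BC  <- -0.20; se_BC <- 0.25  # hypothetical B vs C estimate and standard error

d_AB  <- d_AC - d_BC              # indirect A vs B estimate
se_AB <- sqrt(se_AC^2 + se_BC^2)  # variances add; they never cancel

ci_AB <- d_AB + c(-1.96, 1.96) * se_AB  # 95% confidence interval
round(c(estimate = d_AB, lower = ci_AB[1], upper = ci_AB[2]), 2)
```

Because the two variances add, the indirect estimate is inherently less precise than either direct input, which is exactly why such results are best used to guide the design and sample size of future head-to-head studies rather than to supplant them.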
I will comment on the serendipity of research and the value of meta-analysis in this context. I view the meta-analysis as a large reading lens that occasionally allows the viewer to see patterns that would otherwise not be visible. Ingram Olkin, one of the pioneers of meta-analysis, wrote: "I like to think of the meta-analytic process as similar to being in a helicopter. On the ground individual trees are visible with high resolution. This resolution diminishes as the helicopter rises, and in its place we begin to see patterns not visible from the ground." Unexpected findings appear to be the rule rather than the exception in science. Paraphrasing Thomas Kuhn, scientists collect such unexpected data and continuously disregard them because they do not fit the current paradigm, until the scientific community is ready to embrace a new paradigm and a paradigm shift occurs. As such, the value of meta-analysis as a reading lens is welcome. As a laboratory researcher, however, I will also caution against unexpected findings, and I will highlight that the accuracy of an experiment falls drastically when we measure variables that are not within the initial experimental field. As such I believe that


incidental or unexpected findings should be neither disregarded nor accepted as truth, but rather used as the basis for further exploration. At the end of the day, we should all realize not only what we know but also how much we do not know, and in this respect meta-analysis may help better frame the next steps in groundbreaking research.


PREFACE

Giuseppe Biondi-Zoccai*, M.D.
Assistant Professor in Cardiology, Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Latina, Italy
* Corresponding author: Giuseppe Biondi-Zoccai, MD, Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Corso della Repubblica 79, 04100 Latina, Italy. Phone: +39 07731757245. Fax: +39 07731757254. Email: [email protected].

Why should you read this book? First and foremost because, as far as we are aware, it is the first and only textbook solely dedicated to this novel, fascinating, yet highly promising topic in scientific research. Many other scholarly reasons apply, but before detailing them I would like to offer an explanation for my lifetime interest in meta-analysis in general, and in network meta-analysis in particular, one that I often propose when questioned by unaccustomed people. Having published more than 122 meta-analyses between 2003 and 2014, [1-2] thus averaging more than 10 per year, many colleagues, trainees and laypersons often ask me why I like this type of research tool so much. My answer begins at large. Indeed, I typically acknowledge that my mother is a successful antique dealer, my father is an experienced certified accountant, and I was born and raised in a small Italian region, Liguria, where greed is not always considered a vice, but occasionally even a kind of virtue. In other words, I enjoy maximizing the results of my efforts while minimizing the required resources. Accordingly, I enjoy meta-analysis because this research design is likely the most efficient one in the whole scientific portfolio.

There are of course other reasons why meta-analyses are important and influential, and thus why people should be able to read them carefully and insightfully and, if committed, design and conduct them independently. [3-5] Meta-analysis improves precision and power in statistical inference, enables the appraisal of heterogeneity, highlights important moderators and, last but not least, may test hitherto unproved hypotheses. Indeed, network meta-analysis, also referred to as multi-treatment meta-analysis or mixed treatment comparison, has come full circle and now enables truly scientific hypothesis testing within the established Popperian framework. [6] The key critique of meta-analysis, since the pioneering efforts of Gene Glass, [7] was indeed that nothing original could come from a mere collection and synthesis of established

Corresponding author: Giuseppe Biondi-Zoccai, MD, Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Corso della Repubblica 79, 04100 Latina, Italy. Phone: +39 07731757245. Fax: +39 07731757254. Email: [email protected].


data from original yet external sources. Even the fact that we define original studies as primary research and systematic reviews as secondary research may imply a pejorative view of the latter type of research design. One of the key appealing features of network meta-analysis is conversely that it challenges and overcomes altogether this limitation in its scope. By enabling accurate predictions on future would-be comparisons, it can inform well in advance, and still with precise inferential features, on important biologic and clinical phenomena, and eventually be disproved or tentatively confirmed.

Accordingly, many may argue that network meta-analyses and mixed treatment comparisons represent the uppermost level in the evidence hierarchy for decision making, in medicine as well as in other scholarly fields. [8] They exploit data from direct (i.e., head-to-head) randomized trials and combine them, when appropriate, with indirect estimates, in order to obtain more precise and robust estimates of effect to determine which management strategy is the safest, most effective, or most cost-effective. They have been developed recently thanks to the current widespread availability of high-power computers and computer-intensive resampling techniques, but have already succeeded in becoming a highly read and impactful type of research design. [9]

Notably, we hereby consider network meta-analysis, mixed treatment comparison, and multiple-treatment meta-analysis as synonymous. While some authors have tried to identify distinctive features, it is best in our opinion to maintain a pragmatic stance and use these terms interchangeably whenever trials including apparently heterogeneous sets of treatments are included in a meta-analysis. Nonetheless, we do prefer the term network meta-analysis, as it appears to best convey the key distinctive feature and strength of a secondary research tool which goes well beyond the limits set by a traditional 2-arm parallel-group randomized clinical trial.

Despite such enthusiasm for network meta-analysis, there is a dark side of the moon. Indeed, I first read about adjusted indirect comparison meta-analysis in 2003, [10] and completed my first such work exploiting the Bucher method in 2004. [11-12] Yet, it took me 8 years to be able to complete on my own my first truly network meta-analysis, and this depended much on everything I was taught at two seminal courses offered in Cambridge and Leicester in 2010 and 2011 by, respectively, David Spiegelhalter together with David Lunn, and Anthony Ades together with Keith Abrams. [13] These proved crucial teachings, especially as they helped me master, at least relatively, the WinBUGS analytical package. In other words, learning how to correctly design, conduct and interpret a network meta-analysis is not easy, and many things may go wrong and provide faulty or uninterpretable results. This is complicated by the lack, until now, of dedicated textbooks. While exceptionally useful resources are already there, and include the United Kingdom National Institute for Clinical Excellence (NICE) Decision Support Unit (DSU) Technical Support Document (TSD) publications, [14-15] and the United States Agency for Healthcare Research and Quality (AHRQ) reports, [16-17] traditional books are lacking, with the notable exclusion of the excellent textbook by Welton and colleagues. [18]

In addition, this field of research methodology is developing steadfastly, and thus what is correct and relatively established today might be considered useless or plainly wrong in a few years. Our hope is still, and with an apparent paradox, that this book will become obsolete and useless in five years or less. This would mean that many other better books on this topic have become available, that the field has progressed momentously from where it stands today, and that network meta-analysis has become more reliable, more impactful, and more widespread


in the scholarly literature as well as among decision makers. In the meanwhile, however, there is, as far as we are aware, no other book devoted solely to this topic in English or any other language, and thus we do recommend its thorough perusal.

When we planned the table of contents of this book and invited the many knowledgeable experts who finally agreed to contribute, we aimed at providing a comprehensive perspective on network meta-analysis that could be useful for beginners as well as experts. I actually tried to think of the ideal book I would like to find on the bookshelves in the future, to buy and read avidly in order to get more acquainted with this topic. Indeed, the subject of the book is a comprehensive coverage of the theoretical and practical aspects of network meta-analyses and mixed treatment comparisons. Thus, this collection could be read by students or scholars interested in these topics, clinical researchers and practitioners, as well as statisticians, epidemiologists, psychologists, and sociologists.

Accordingly, this book aims to cover briefly but poignantly the main topics which should be mastered to critically read and interpret as well as, if deemed worthwhile, perform and report a network meta-analysis and mixed treatment comparison. The first section of the book includes insightful perspectives from experts with different backgrounds such as statistics, epidemiology, clinical medicine, and translational research. The second section provides the background to the development and correct use of network meta-analysis, with emphasis on the hierarchy of evidence and the paradigm shift from pairwise to network meta-analysis. The third section deals with the theoretical and practical aspects involved in designing and performing a systematic review, which can then be exploited for mixed treatment comparison purposes. Accordingly, details on designing and registering the review are provided, plus important suggestions on how to best search, abstract and assess evidence. The fourth section focuses on the explicit statistical issues at hand when conducting a network meta-analysis, and in particular the choice of the framework (Bayesian versus frequentist), the model, and the primary set of statistics. The fifth section is more advanced, and will be of interest in particular to experts in this field, as it includes sophisticated topics such as network meta-regression, heterogeneity, inconsistency, small study effects, individual patient analysis, and how to best report and interpret network meta-analysis. The sixth section offers several real-world examples by foremost experts. Finally, how to move from statistical results to effective decision-making and action, and what the future holds for mixed treatment comparisons, are boldly hypothesized in the seventh section.

To be frank, we recommend every reader to go through the whole book. Indeed, readers should not disregard topics about which they already feel competent, nor worry about potential redundancy or repetition of similar concepts in different chapters. The concept of a multi-author collection is a distinct favorable feature of this book. The reader will be able to read about different approaches, different styles, different types of software, and different experiences in network meta-analysis, and will hopefully achieve, after going through the whole book, a personal yet sound approach to interpreting or conducting one.

Moreover, it must be emphasized that the editorial team represents a veritable Who's Who of worldwide experts on these topics, and everyone involved in the collection has strived to provide correct and sound yet practical advice. The manual includes dozens of tables and illustrations to guide the reader visually in understanding the basics as well as the details of network meta-analyses. The quoted references are per se a uniquely useful component of this book, as they provide explicit guidance to the best and most informative papers and books on mixed treatment comparisons or ancillary topics.


Thus, we sincerely hope that this book will be successfully perused by students or scholars interested in these topics, clinical researchers and practitioners, as well as statisticians, epidemiologists, psychologists, and sociologists. We enjoyed editing this work and hope that reading it will be similarly enjoyable for you. Nonetheless, suggestions, corrections and any kind of feedback are more than welcome, and readers are encouraged to contact me, or any of the contributors directly, for improvements or revisions to the material of this collection.

REFERENCES

[1] Biondi-Zoccai GG, Abbate A, Agostoni P, Parisi Q, Turri M, Anselmi M, Vassanelli C, Zardini P, Biasucci LM. Stenting versus surgical bypass grafting for coronary artery disease: systematic overview and meta-analysis of randomized trials. Ital Heart J 2003; 4: 271-80.
[2] Biondi-Zoccai G, Lotrionte M, Thomsen HS, Romagnoli E, D'Ascenzo F, Giordano A, Frati G. Nephropathy after administration of iso-osmolar and low-osmolar contrast media: evidence from a network meta-analysis. Int J Cardiol 2014; 172: 375-80.
[3] Biondi-Zoccai GG, Abbate A, Sheiban I. Systematic reviews and meta-analyses "For Dummies". EuroIntervention 2009; 5: 289-91.
[4] Biondi-Zoccai G, Lotrionte M, Landoni G, Modena MG. The rough guide to systematic reviews and meta-analyses. HSR Proc Intensive Care Cardiovasc Anesth 2011; 3: 161-73.
[5] Biondi-Zoccai G, Landoni G, Modena MG. A journey into clinical evidence: from case reports to mixed treatment comparisons. HSR Proc Intensive Care Cardiovasc Anesth 2011; 3: 93-6.
[6] Biondi-Zoccai G, Frati G, D'Ascenzo F, Stone GW, Lotrionte M, Palmerini T. Network meta-analyses and mixed treatment comparisons: are they true scientific endeavors? Int J Cardiol 2013; 168: 1575-6.
[7] Smith ML, Glass GV. Meta-analysis of psychotherapy outcome studies. Am Psychol 1977; 32: 752-60.
[8] Li T, Puhan MA, Vedula SS, Singh S, Dickersin K; Ad Hoc Network Meta-analysis Methods Meeting Working Group. Network meta-analysis - highly attractive but more methodological research is needed. BMC Med 2011; 9: 79.
[9] Palmerini T, Biondi-Zoccai G, Della Riva D, Stettler C, Sangiorgi D, D'Ascenzo F, Kimura T, Briguori C, Sabatè M, Kim HS, De Waha A, Kedhi E, Smits PC, Kaiser C, Sardella G, Marullo A, Kirtane AJ, Leon MB, Stone GW. Stent thrombosis with drug-eluting and bare-metal stents: evidence from a comprehensive network meta-analysis. Lancet 2012; 379: 1393-402.
[10] Song F, Altman DG, Glenny AM, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ 2003; 326: 472.
[11] Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol 1997; 50: 683-91.


[12] Biondi-Zoccai GG, Agostoni P, Abbate A, Testa L, Burzotta F, Lotrionte M, Crea F, Biasucci LM, Vetrovec GW, Colombo A. Adjusted indirect comparison of intracoronary drug-eluting stents: evidence from a meta-analysis of randomized bare-metal-stent-controlled trials. Int J Cardiol 2005; 100: 119-23.
[13] Biondi-Zoccai G, Malavasi V, D'Ascenzo F, Abbate A, Agostoni P, Lotrionte M, Castagno D, Van Tassell B, Casali E, Marietta M, Modena MG, Ellenbogen KA, Frati G. Comparative effectiveness of novel oral anticoagulants for atrial fibrillation: evidence from pair-wise and warfarin-controlled network meta-analyses. HSR Proc Intensive Care Cardiovasc Anesth 2013; 5: 40-54.
[14] Dias S, Welton NJ, Sutton AJ, Ades AE. NICE DSU Technical Support Document 1: Introduction to evidence synthesis for decision making. 2011; last updated April 2012. Available from: http://www.nicedsu.org.uk/TSD1%20Introduction.final.08.05.12.pdf (last accessed on January 28, 2014).
[15] Dias S, Welton NJ, Sutton AJ, Ades AE. NICE DSU Technical Support Document 2: A Generalised Linear Modelling Framework for Pairwise and Network Meta-Analysis of Randomised Controlled Trials. 2011; last updated March 2013. Available from: http://www.nicedsu.org.uk/TSD2%20General%20meta%20analysis%20corrected%20Mar2013.pdf (last accessed on January 28, 2014).
[16] Coleman CI, Phung OJ, Cappelleri JC, Baker WL, Kluger J, White CM, Sobieraj DM. Use of Mixed Treatment Comparisons in Systematic Reviews. Methods Research Report. (Prepared by the University of Connecticut/Hartford Hospital Evidence-based Practice Center under Contract No. 290-2007-10067-I.) AHRQ Publication No. 12-EHC119-EF. Rockville, MD: Agency for Healthcare Research and Quality. August 2012. Available from: www.effectivehealthcare.ahrq.gov/reports/final.cfm (last accessed on January 28, 2014).
[17] Jonas DE, Wilkins TM, Bangdiwala S, Bann CM, Morgan LC, Thaler KJ, Amick HR, Gartlehner G. Findings of Bayesian Mixed Treatment Comparison Meta-Analyses: Comparison and Exploration Using Real-World Trial Data and Simulation. (Prepared by RTI-UNC Evidence-based Practice Center under Contract No. 290-2007-10056-I.) AHRQ Publication No. 13-EHC039-EF. Rockville, MD: Agency for Healthcare Research and Quality; February 2013. Available from: www.effectivehealthcare.ahrq.gov/reports/final.cfm (last accessed on January 28, 2014).
[18] Welton NJ, Sutton AJ, Cooper NJ, Abrams KR, Ades AE. Evidence Synthesis for Decision Making in Health-Care. New York: Wiley; 2012.


2ND SECTION


In: Network Meta-Analysis
Editor: Giuseppe Biondi-Zoccai

ISBN: 978-1-63321-001-1 © 2014 Nova Science Publishers, Inc.

Chapter 1

THE HIERARCHY OF EVIDENCE

Oscar L. Morey-Vargas M.D.,1 Claudia Zeballos-Palacios M.D.,2 Michael R. Gionfriddo Pharm.D.,3 and Victor M. Montori M.D., M.Sc.4

1 Instructor of Medicine, Endocrinology Fellow, Division of Endocrinology, Diabetes, Metabolism, and Nutrition, Mayo Clinic, Rochester, MN, US; Knowledge and Evaluation Research (KER) Unit, Mayo Clinic, Rochester, MN, US
2 Research Fellow, Knowledge and Evaluation Research (KER) Unit, Mayo Clinic, Rochester, MN, US
3 PhD candidate, Mayo Graduate School, Mayo Clinic, Rochester, MN, US; Knowledge and Evaluation Research (KER) Unit, Mayo Clinic, Rochester, MN, US
4 Professor of Medicine, Division of Endocrinology, Metabolism, Nutrition, and Diabetes, Mayo Clinic, Rochester, MN, US; Lead Investigator, Knowledge and Evaluation Research (KER) Unit, Mayo Clinic, Rochester, MN, US; Director, Community Engagement in Research and Late-Stage Translation, Center for Clinical and Translational Sciences, Mayo Clinic, Rochester, MN, US

ABSTRACT

Two fundamental principles of evidence-based medicine are that decisions should be based on systematic summaries of the body of evidence, and that there is a hierarchy of evidence that arranges study designs by their susceptibility to bias. Hierarchies, however, are not absolute, and our current understanding of them has evolved into systems that integrate the hierarchy into more sophisticated structures for rating the quality of the body of evidence of specific health care questions. We have moved from rating the quality of individual studies to an "outcomes-centric" approach that rates the quality of evidence for each outcome across all available studies. Network meta-analysis is a sophisticated and promising technique that uses both direct and indirect study results to compare the relative effectiveness of multiple interventions on an outcome of interest. This technique may offer the best chance to understand the evidence when many competing treatments are available. Nevertheless, the evaluation of these networks requires careful considerations about the validity of the indirect comparisons, as well as other factors that may potentially affect the interpretation of the results. In particular, determinants of confidence related to incomplete reporting, inconsistency, and indirectness are of major concern in the analysis of network meta-analyses, and should be looked for and evaluated carefully when interpreting their results.

Corresponding author: Victor M. Montori M.D., Division of Endocrinology, Diabetes, Metabolism, and Nutrition, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States of America. Tel.: 1-507-293-0175. Fax: 1-507-538-0850. Email: [email protected].

Keywords: Evidence-based medicine, hierarchy of evidence, network meta-analysis

INTRODUCTION

Evidence-based medicine (EBM) is the "conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients". [1] The term EBM first appeared in 1991 in an article by Guyatt in the American College of Physicians' Journal Club, [2] and since then, the theory and practice of EBM has continued to evolve. EBM prioritizes evidence derived from clinical research over unsystematic clinical experience and pathophysiologic rationale. [3] The first and fundamental principle of EBM is the recognition that decisions should be based on systematic summaries of the body of evidence. [4] The other two principles are that there is a hierarchy of evidence, and that evidence must be evaluated considering the patients' values and preferences. [3]

In this chapter, we will review how increased rigor of design corresponds to a higher position in the hierarchy, as the opportunity for error, whether random (chance) or systematic (bias), decreases. [3] We will also explain why this hierarchy is not absolute, and why, ultimately, the interpretation of the medical literature requires the analysis of rigorous systematic reviews of the body of evidence. The contemporary system to describe the quality of evidence from an "outcome-centered" perspective, as well as specific considerations derived from these concepts that apply to network meta-analyses, will be reviewed. We will start by describing important strengths and limitations of commonly used study designs.

FROM UNSYSTEMATIC OBSERVATIONS TO CLINICAL TRIALS

Unsystematic Observations

EBM provides a framework for the teaching and practice of medicine that relies on systematically developed evidence over unsystematic observations and personal experiences [5]. Although observations can lead to profound insights and are crucial for the development of clinical instincts, they are often limited by small sample size and by a number of biases introduced by cognitive processes that make recall, summary, and the making of inferences based on one's experiences unreliable [6-9]. This lack of reliability has led scholars of EBM to place unsystematic observations at the bottom of evidence hierarchies [3].


Basic Science or Physiologic Studies

Basic science or physiologic studies, both descriptive and experimental, provide us with valuable insights into how systems may work and are useful to generate hypotheses. However, the findings from these investigations do not always translate into meaningful clinical outcomes; thus, without confirmation in humans using rigorously designed studies, these mechanistic investigations provide weak evidence to support clinical decisions [3, 5].

Cross-Sectional Studies

Cross-sectional studies seek co-existence of factors and report whether exposures and outcomes are present at a single point in time. These studies can be done to estimate prevalence or to infer causation [10]. Cross-sectional studies are useful because prevalence estimates are needed to calculate pre-test probabilities in order to estimate the likelihood of a particular diagnosis [10]. These studies cannot establish a temporal sequence between exposure and outcome, unless the exposure is a fixed characteristic (e.g., sex) [10]. Cross-sectional studies are susceptible to threats to internal validity posed by confounding or bias, and are less feasible for rare predictors or outcomes [11].

Case-Control Studies

A case-control study is a type of observational study that begins with the identification of individuals who already have an outcome of interest (cases), and a suitable control group similar with respect to important known exposures but without the outcome of interest (controls) [10]. Case-control studies are useful when the outcome of interest is rare, when looking at outcomes that take a considerable amount of time to develop, or when studying an outcome with multiple potential etiologic or prognostic factors [10]. This design is well suited for studying uncommon adverse events associated with medications or other types of treatments. Case-control studies have several weaknesses, including a high susceptibility to unmeasured determinants of outcomes, susceptibility to several types of biases (e.g., selection, ascertainment, recall, or survival biases), inability to assess incidence or prevalence rates, and the fact that only one outcome can be studied at a time [10, 11]. Despite all these shortcomings, there are several examples of case-control studies that have changed medical practice, in particular on issues regarding harm [10]. These studies are always useful for generating hypotheses that can then be tested more rigorously by other methods [10].

Cohort Studies

A cohort study enrolls individuals characterized by their exposure status and follows them for a period, with the expectation that some of them will develop the outcome of


interest. Exposures that could be assessed include risk factors, prognostic indicators or therapeutic interventions. This type of study allows measuring the incidence or relative risk of developing the outcome of interest and comparing this risk between those exposed and those unexposed [11]. Cohort studies can be prospective or retrospective, depending on when the exposure is assessed. Well-conducted cohort studies can provide evidence in which clinicians can have high confidence, placing them high in hierarchies of evidence [3]. Cohort studies can follow patients for a long time, which makes them suitable for the study of the natural history of a disease, and for the detection of consequences of disease that occur long after exposure. In general, cohort studies require large sample sizes and are impractical for rare outcomes. Prospective cohort studies are particularly expensive and time-consuming. Another potential disadvantage of prospective cohort studies is a temporal change in the behavior or performance of the studied individuals because they know that they are being observed (i.e., the Hawthorne effect) [11]. Selection bias and loss to follow-up are major potential causes of bias in cohort studies. Statistical methods such as multivariable regression are typically employed to control for the influence of potential confounding variables; however, they cannot account for unmeasured determinants of outcomes that, however unlikely, could explain the results [10].
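To fix ideas, the incidence-based measure that a cohort yields can be written out explicitly. The following is a minimal worked form, with our own illustrative notation: if $a$ of $n_1$ exposed individuals and $c$ of $n_0$ unexposed individuals develop the outcome during follow-up, the relative risk is

$$\mathrm{RR} = \frac{a/n_1}{c/n_0},$$

a quantity that designs sampling on the outcome, such as case-control studies, cannot estimate directly.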

General Considerations about Observational Studies

Cross-sectional, case-control, and cohort studies are all observational (non-experimental). These studies play an important role, yielding crucial data on prevalence, incidence, association, prognosis, and the natural history of diseases [12-14]. However, the major limitation of all observational studies is that the exposure occurs by choice rather than by chance. Prognostic imbalance threatens the validity of all observational studies, and there is an unavoidable risk of selection bias and of systematic differences in outcomes that are not due to the treatment itself, but to unrecognized confounding variables [15, 16]. Only randomization (i.e., a method which assigns exposure by chance rather than by choice) can potentially provide a reliable, unbiased estimate of treatment effects, by distributing determinants of outcomes, both known and unknown, equally between the comparison groups [12, 16].

The role of observational studies in the evaluation of medical interventions (e.g., competing treatments, diagnostic or screening tests) has long been an area of significant debate [13-20]. Studies from the 1970s and 1980s suggested that nonrandomized studies spuriously overestimate treatment benefits, yielding misleading conclusions [21, 22]. More recent studies have shown that, in fact, observational studies and randomized controlled trials (RCTs) often have highly correlated estimates of efficacy across several types of interventions [17-20]. The problem is that sometimes they do not, and when this occurs the results can be disastrous for patients. There have been many examples in the recent literature in which randomized studies have found different results compared with the promising observational studies that preceded them [23-26]. Such examples include hormone replacement therapy and the risk of coronary artery disease in postmenopausal women [23, 24], the impact of beta carotene on major cardiovascular events [25], and the relationship between vitamin E and cardiovascular disease [26].


Nevertheless, well-designed and well-conducted observational studies can limit bias and contribute valuable information that can be used in patient care [13]. These studies can identify risk factors and prognostic indicators in situations in which randomized controlled trials would be unrealistic or unethical. Observational designs can yield long-term safety data, as well as information about the effectiveness of interventions in "real-world" routine practice [13]. When discrepancies occur between observational studies and randomized trials, researchers should analyze and explore these discrepancies, as doing so may yield valuable information and clarity about the available evidence [19].

Randomized Trials

RCTs are considered the standard for evaluating the efficacy and safety of therapeutic and preventive interventions [3, 12]. An RCT is an experiment in which individuals are randomly allocated to receive either an intervention or control, and are then followed to determine the effect of the intervention on one or more outcomes of interest [3]. The power of randomization is that treatment and control groups are likely to be balanced with respect to both known and unknown determinants of outcome, thus creating groups with the same prognosis [3]. A well-conducted trial protects against selection bias by limiting the opportunity for patients, clinicians, or investigators to choose the arm of the trial to which a participant will be assigned [12]. Blinding can help maintain prognostic balance. Patients, clinicians, data collectors, adjudicators of outcomes, and data analysts should, if possible, all be blind to the treatment assignment [3]. To preserve the balance in prognosis it is also important that RCTs follow the intention-to-treat principle [3, 27]. Furthermore, loss to follow-up can threaten the validity of an RCT, because differential or large loss to follow-up can distort intervention effects.

As with other study designs, RCTs are not without flaws. They usually employ strict inclusion or exclusion criteria that are intended to maximize the internal validity of the study. These criteria might select a well-defined study population with a higher capacity to benefit from treatment, and result in larger estimates of effect than studies with less strict eligibility criteria [11]. Thus, although RCTs are the gold standard for establishing the efficacy of a specific therapy because of their high internal validity, their external validity, or applicability, is limited because they often study highly selected populations [11]. There are, however, new designs, such as practical trials, that seek to make RCTs more useful to decision makers at a public health level [28].

Evaluating the Entire Body of Evidence Rather than Individual Studies

Those making decisions on the basis of a single study should be cautious when interpreting the results. Replication is a key step in science, and evaluating a single study does not offer the opportunity of assessing consistency [29]. Consistency reassures clinicians about the integrity of research and reduces concerns about false positive or false negative findings [29]. Bias of significant clinical relevance may exist in both RCTs and observational studies [30, 31]. Furthermore, RCTs frequently adopt design features - such as using composite endpoints, surrogate endpoints, unfair comparisons (e.g., inactive comparators to ensure a


large effect), or selecting patients at higher risk of the outcome - that require readers to make assumptions and extrapolations that make the interpretation and applicability of results confusing [29]. The use of composite endpoints, for example, can be misleading when certain components of the composite occur at different frequencies [3, 32]. It is also possible that extreme estimates of effect may be published early (the Proteus phenomenon), potentially leading to healthcare decisions that are based on inaccurate or exaggerated estimates [29, 33]. A favorable result in support of a specific association is more likely to appear earlier than a less favorable one [29, 33]. Conversely, small negative trials may lead to premature dismissal of an effective intervention and might not be published [29]. Considering all these shortcomings, it is clear that no study should stand alone. Rather, we should focus on the larger body of evidence and on systematic reviews of all available high-quality studies.

Systematic Reviews and Meta-Analyses

As we mentioned at the beginning of this chapter, the recognition that decisions should be based on systematic summaries of the body of evidence is now the first principle of EBM [4]. A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria in order to answer a specific research question [34]. It should clearly state objectives with explicit eligibility criteria for the included studies, and have a clearly described and reproducible methodology. The search for literature should be systematic and should attempt to identify all studies meeting the eligibility criteria. Findings are assessed for their validity, and are then presented and summarized in a systematic fashion [34]. These key steps are taken to reduce bias in the identification, selection, and presentation of results.

Many systematic reviews contain meta-analyses. Meta-analysis is the use of statistical methods to summarize the results of independent studies. By pooling the data from all relevant studies, meta-analyses can provide more precise estimates of the effects of an intervention than those derived from the individual studies. Meta-analysis facilitates the exploration of differences across the body of evidence [34]. Meta-analyses do not improve the quality of the summarized evidence and will reflect any biases introduced in the study selection process or by the quality of the included studies [3]. Separately from the quality of the analyzed studies, it is important to evaluate the extent to which results differ from study to study (i.e., variability or heterogeneity) [3]. It has been argued that meta-analysis should only be considered when the included studies are sufficiently homogeneous in terms of participants, interventions and outcomes to provide a meaningful summary [34].

Systematic reviews are also subject to different types of reporting biases, including publication bias [34]. Publication bias refers to the selective publication of studies based on the direction, magnitude, or statistical significance of research findings [35]. Studies that show small effects or fail to find an effect tend not to get submitted to high-impact journals or may remain unpublished [35]. To minimize the possibility of publication bias, it is important to follow a comprehensive search strategy that includes efforts to identify studies in different languages as well as unpublished evidence [3]. Publication bias should be seen as a possible explanation for the tendency to observe larger beneficial estimates of an intervention when only small trials are analyzed, particularly when they are sponsored by commercial entities that could benefit from the results [3, 34]. Furthermore, when an outcome is measured


and analyzed but not reported on the basis of the results, outcome reporting bias occurs. In this situation, the direction and magnitude of the estimated effect of an intervention may be affected and, not infrequently, may overestimate the true effect on a specific outcome [36]. Decision makers should be cautious about the possibility of outcome reporting bias when an outcome of interest is reported in only a low proportion of the trials contributing to a meta-analysis [37].
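As a concrete illustration of why pooling improves precision, consider the standard fixed-effect, inverse-variance estimator (a minimal sketch; the notation is ours): if study $i$ contributes an effect estimate $\hat{\theta}_i$ with variance $v_i$ and weight $w_i = 1/v_i$, then

$$\hat{\theta} = \frac{\sum_i w_i \hat{\theta}_i}{\sum_i w_i}, \qquad \operatorname{Var}(\hat{\theta}) = \frac{1}{\sum_i w_i}.$$

Because the pooled variance is the reciprocal of the summed weights, each added study shrinks the confidence interval around the pooled effect, which is exactly the gain in precision described above.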

Hierarchy Evolved: From Individual Studies to Grading the Quality of the Body of Evidence

The notion of a hierarchy of evidence has evolved. Initially, methodologists developed hierarchies of evidence that were based only on study designs and their ability to protect against bias (i.e., RCTs above observational studies) [38]. Subsequently, the precision of the results and the applicability of the evidence were also taken into consideration when creating these classifications (table 1) [3, 39]. The fact that hierarchies are not absolute was also recognized (e.g., results from high-quality observational studies with sufficiently large and consistent effects may provide more reliable evidence than poorly conducted RCTs, even for therapeutic questions) [3].

The state-of-the-art approach to describing the quality of evidence was released by the Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) Working Group [40-45]. GRADE offers a transparent and structured system for rating the quality of evidence in systematic reviews and guidelines, and for grading the strength of recommendations in guidelines [46]. This system is designed to answer specific questions about prevention, screening, diagnosis, therapy, and public health after carefully examining alternative management strategies [46]. GRADE is not about rating the quality of individual studies; rather, GRADE is "outcomes-centric", and the quality of evidence is always rated for each outcome across all available studies. This approach no longer automatically considers systematic reviews a higher form of evidence, but correctly looks at the quality of the studies included in these reviews [40-45].

The GRADE approach starts by defining a question in terms of the relevant setting, populations, alternative management strategies, and all patient-important outcomes [46, 47]. Because most systematic reviews do not summarize the evidence for all important outcomes, decision makers must often look for evidence from multiple systematic reviews. These reviews may include different study designs depending on the outcome of interest. For instance, RCTs may provide relevant evidence for benefits, while observational studies may provide better evidence for rare but serious adverse effects [46, 47]. GRADE classifies the overall quality of a body of evidence for each outcome across studies - whether benefit or harm - into four levels: high, moderate, low and very low [48]. These ratings reflect the extent of our confidence in the estimates of an intervention's effects. The GRADE approach starts by evaluating the study design, from which the evidence is upgraded or downgraded. RCTs start as high-quality evidence and observational studies as low-quality evidence [46, 48]. Subsequently, confidence in the effect estimates may be compromised by a number of other factors that can downgrade the quality of evidence (i.e., risk of bias, indirectness, inconsistency, imprecision, and publication bias). There are also


factors - particularly relevant to observational studies - that may lead to rating up the quality (table 2).

Table 1. A hierarchy of strength of evidence that considers the study designs and the applicability of the evidence

- N-of-1 randomized trial
- Systematic reviews of randomized trials
- Single randomized trial
- Systematic review of observational studies
- Single observational study
- Basic science and mechanistic studies
- Unsystematic clinical observations

Adapted from Guyatt et al. [3].

Table 2. GRADE's approach to rating quality of evidence

Initial quality of a body of evidence, by study design:
- Randomized trials: High
- Observational studies: Low

Lower the category if:
- Risk of bias: minus 1 if serious; minus 2 if very serious.
- Inconsistency: minus 1 if serious; minus 2 if very serious.
- Indirectness: minus 1 if serious; minus 2 if very serious.
- Imprecision: minus 1 if serious; minus 2 if very serious.
- Publication bias: minus 1 if likely; minus 2 if very likely.

Raise the category if:
- Large effect: plus 1 if large; plus 2 if very large.
- Dose response: plus 1 if evidence of a gradient.
- All plausible residual confounding: plus 1 if it would reduce a demonstrated effect; plus 1 if it would suggest a spurious effect when no effect was observed.

Resulting quality of a body of evidence:
- High (effect estimate very likely to be close to the true effect)
- Moderate (effect estimate might be close to the true effect)
- Low (effect estimate may be substantially different from the true effect)
- Very low (effect estimate is likely to be substantially different from the true effect)

Adapted from Balshem H et al. [48].

It is important to recognize that although GRADE provides a transparent framework for assessing the quality of evidence, it does not eliminate the need for subjective judgments [48].
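The level arithmetic implied by Table 2 can be made explicit in a few lines of code. The sketch below is purely illustrative: the function name, the integer encoding of the four levels, and the clamping at the ends of the scale are our own assumptions, and a real GRADE rating additionally requires the subjective judgments just mentioned.

    # Minimal sketch of the level arithmetic behind Table 2 (illustrative only).
    LEVELS = ["very low", "low", "moderate", "high"]

    def grade_quality(study_design, downgrades=(), upgrades=()):
        """Overall quality of a body of evidence for one outcome.

        study_design: "randomized" (starts high) or "observational" (starts low).
        downgrades/upgrades: integer level changes, e.g. (-1, -2) or (2,).
        """
        score = 3 if study_design == "randomized" else 1
        score += sum(downgrades) + sum(upgrades)
        return LEVELS[max(0, min(score, 3))]  # clamp to the four-level scale

    # Randomized trials with serious risk of bias and serious imprecision:
    print(grade_quality("randomized", downgrades=(-1, -1)))  # -> low
    # Observational studies with a very large effect:
    print(grade_quality("observational", upgrades=(2,)))     # -> high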


Rating the evidence requires an organized approach that simultaneously considers all the above-mentioned factors that may upgrade or downgrade the quality of evidence. We will now review the most relevant aspects of these domains:

Risk of Bias (Study Limitations)

There are potentially serious methodological flaws in the design and execution of RCTs and observational studies that may lead to rating down the quality of evidence. Several tools are available to help judge whether study limitations are present [49]. Table 3 summarizes the most important limitations of RCTs and observational studies. Not only is it important to evaluate the risk of bias of individual studies; a summary risk of bias must also be synthesized for the body of evidence.

Table 3. Study limitations

Randomized trials:
- Lack of allocation concealment.
- Lack of blinding.
- Incomplete accounting of patients and outcome events (e.g., loss to follow-up and failure to adhere to the intention-to-treat principle).
- Selective outcome reporting bias.
- Other (e.g., stopping early for benefit, use of unvalidated outcome measures, carryover effects in crossover trials, recruitment bias in cluster-randomized trials).

Observational studies:
- Failure to develop and apply appropriate eligibility criteria.
- Defective measurements of exposures and/or outcomes.
- Failure to adequately control for other determinants of outcomes.
- Incomplete follow-up.

Adapted from Guyatt GH et al. [49].

Indirectness

Quality of evidence may be affected when the population, intervention, or outcomes in the relevant studies differ from those in which we are interested [50]. This situation is known as indirectness: the degree of separation between your clinical question and the one addressed by the systematic review. Substantial differences between the studied participants and the target population, or between the interventions used and the treatment under evaluation (applicability), may affect our confidence in the results. In general, however, one should not rate down for population or intervention differences unless there are compelling reasons to think that the biology is different enough to substantially modify the magnitude of the effect [50]. Indirectness may also be a limitation when there are differences in the outcome measures (e.g., use of surrogate endpoints in place of patient-important outcomes), or differences in settings (e.g., varying levels of technical sophistication). Finally, quality of evidence may be downgraded if head-to-head trials are


unavailable and we must rely on indirect comparisons [50]. This last consideration is particularly relevant for network meta-analyses, as we will discuss later.
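To anticipate that discussion with a minimal worked sketch (the notation is ours): in an adjusted indirect comparison through a common comparator C, as popularized by Bucher and colleagues, the direct estimates $\hat{d}_{AC}$ and $\hat{d}_{BC}$ yield

$$\hat{d}_{AB}^{\,\mathrm{ind}} = \hat{d}_{AC} - \hat{d}_{BC}, \qquad \operatorname{Var}(\hat{d}_{AB}^{\,\mathrm{ind}}) = \operatorname{Var}(\hat{d}_{AC}) + \operatorname{Var}(\hat{d}_{BC}),$$

and incoherence between direct and indirect estimates of A versus B can be screened with

$$z = \frac{\hat{d}_{AB}^{\,\mathrm{dir}} - \hat{d}_{AB}^{\,\mathrm{ind}}}{\sqrt{\operatorname{Var}(\hat{d}_{AB}^{\,\mathrm{dir}}) + \operatorname{Var}(\hat{d}_{AB}^{\,\mathrm{ind}})}}.$$

Because the variances add, the indirect estimate is always less precise than its two inputs, which is one statistical reason to consider rating such evidence down for indirectness.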

Inconsistency

Our confidence in the estimate of benefits or harms is lower if studies show large differences in the direction and magnitude of the relative measures of effect (i.e., significant heterogeneity) [51]. GRADE recommends rating down the quality of evidence if significant heterogeneity remains unexplained after exploration of a priori hypotheses that might explain inconsistency. Decision makers should consider rating down for inconsistency when point estimates vary widely across studies, confidence intervals (CIs) show minimal or no overlap, the statistical test for heterogeneity yields a low p-value, or the I² statistic is large [51]. Significant heterogeneity across studies can be secondary to differences in the studied participants (e.g., disease severity), interventions (e.g., dosage, comparator, or co-interventions), outcomes (e.g., duration of follow-up), or the study methods. If any of these categories provides the explanation, different estimates across subgroups should be provided [51].
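For readers who want the quantities behind these rules of thumb, a minimal sketch in standard notation: with $k$ study estimates $\hat{\theta}_i$, inverse-variance weights $w_i$, and pooled estimate $\hat{\theta}$, Cochran's heterogeneity statistic and the derived $I^2$ are

$$Q = \sum_{i=1}^{k} w_i (\hat{\theta}_i - \hat{\theta})^2, \qquad I^2 = \max\left(0, \frac{Q - (k - 1)}{Q}\right) \times 100\%,$$

so that $I^2$ estimates the proportion of the total variability across the $k$ studies that reflects true heterogeneity rather than chance.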

Imprecision

When judging precision, GRADE recommends focusing on the 95% CI around the difference in absolute effect between intervention and control for each outcome [52]. When studies include few patients and few events, and thus have wide CIs, the quality of the evidence is lower because of the uncertainty of the results. Even if CIs appear satisfactorily narrow, when effects are large and both the sample size and the number of events are modest, rating down for imprecision should be considered [52]. Deciding on a relevant clinical decision threshold between recommending and not recommending an intervention is a key aspect of the assessment of precision and, again, relies on judgment. A number of factors will influence this decision, including the importance of the outcome, adverse effects, patient burden, resource use and the practicality of the intervention [52].
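As an illustrative sketch of the quantity GRADE asks us to inspect, suppose event proportions $\hat{p}_1$ and $\hat{p}_2$ are observed in $n_1$ and $n_2$ patients; an approximate (Wald-type) 95% CI for the absolute risk difference is

$$(\hat{p}_1 - \hat{p}_2) \pm 1.96 \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},$$

which makes explicit why few patients and few events translate directly into a wide interval.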

Publication Bias

The quality of evidence may be downgraded if publication bias is suspected. Empirical observations have shown that studies with statistically significant results are more likely to be published than negative studies [35, 53]. As described above, publication bias may lead to overestimation of effect in situations where only small trials are available, particularly when they are industry funded. Evaluation of patterns of results (e.g., funnel plots) may suggest publication bias, but such patterns should be interpreted carefully because of their limited accuracy [53]. Additional criteria for suspecting publication bias include relatively recent RCTs addressing novel therapies, and systematic reviews' failure to conduct comprehensive search strategies (e.g., including unpublished studies) [53].


Rating up the Quality of Evidence

GRADE proposed three criteria for upgrading the quality of evidence that are mostly applicable to observational studies, and that are encountered infrequently [54]. Rating up one or even two levels of quality is possible when the magnitude of effect is sufficiently large and the impact of the intervention is substantial, particularly if it occurs over short periods (e.g., insulin to prevent mortality in diabetic ketoacidosis). Similarly, high-quality evidence can come from epidemiological studies of public health interventions showing large, consistent and precise estimates of effect [54]. The presence of a dose-response gradient, or the judgment that plausible residual confounding would further support inferences regarding the treatment effect, may also enhance the quality of evidence [54]. Evidence should come from observational studies that follow rigorous methodological strategies and account for all plausible determinants of outcomes and biases. The possibility of "residual confounding" or "residual biases" should, however, be recognized.

Network Meta-Analyses in the Context of Our Current Understanding of the Quality of Evidence

Most meta-analyses focus on pairwise direct comparisons of treatments and, therefore, do not permit inferences about the comparative effectiveness of more than two interventions unless all have been compared directly in head-to-head trials [34]. Unfortunately, many competing interventions have not been compared directly in high-quality RCTs, and even when comparisons exist, such direct evidence is often limited and insufficient [55-59]. Furthermore, "naive" indirect comparisons, in which the results of individual arms from different trials are compared as if they were from a single trial, should be avoided because they break randomization and are liable to bias [34, 60].

Network meta-analysis, the subject of this book (also known as "multiple-treatments meta-analysis" or "mixed treatment comparisons meta-analysis"), is a useful technique that can be used to estimate the relative efficacy of many competing interventions by analyzing simultaneously the evidence from direct and indirect comparisons [55-59]. Network meta-analysis has advantages over pairwise meta-analysis, as it facilitates simultaneous comparisons of several interventions, creates treatment rankings, allows the estimation of the effect size of interventions that have not been compared in head-to-head RCTs, and can potentially improve the precision of existing direct evidence [55-59]. When complex networks are available, adjusted indirect estimates can be calculated through several loops and different intermediate comparators. In fact, results of adjusted indirect comparisons in trial networks usually - but not always - agree with the results of head-to-head RCTs [60].

However, it is important to recognize that current hierarchies of evidence place indirect and mixed comparisons below direct comparisons [34]. Direct comparisons have the advantage of randomization, and in this regard are considered superior [34, 56]. Indirect comparisons are not randomized comparisons, so they may suffer the biases of observational studies (i.e., confounding), and may not be able to account for hidden and unmeasured determinants of outcomes [56]. Conversely, there is no guarantee that direct comparisons are less affected by bias than indirect comparisons, and it is possible that in some situations indirect evidence may be more reliable [56]. Making sense of these differences is a matter of


judgment, and can be a complicated task. In situations when both direct and indirect comparisons are available, it has been recommended that the two approaches be considered separately and carefully examined for incoherence [34]. When differences between the direct and the indirect estimates cannot be explained by chance, possible reasons should be investigated, including publication bias, differences in the risk of bias between studies, and significant inconsistency [60]. Unless there are design flaws in the head-to-head trials, direct comparisons are typically favored for reaching conclusions [34, 57].

Two interesting properties of network meta-analyses are their geometry and asymmetry. Geometry refers to "the overall pattern of comparisons among different treatments" [56]. Asymmetry "describes the extent to which specific treatments or specific comparisons are represented more heavily in the network than others" [56]. Visualizing the geometry and asymmetry of a network allows examination of the differential availability of data on different comparisons [55-59]. This analysis may reveal, for example, that some comparisons have been evaluated more heavily because of the presence of a standard reference treatment (e.g., placebo), because of preferences in the research agenda, or in some cases because of publication bias [56]. Missing links in a complex network may indicate avoidance of head-to-head comparisons for specific agents, while linear patterns represent situations where the more recently developed interventions are compared against each other and the older treatments are abandoned [56]. When there are a limited number of trials in a network (particularly with few patients and/or few events), or when the network has a star geometry (i.e., several interventions compared with a single common comparator, such as placebo), our confidence in the estimated indirect effects is weaker. The better connected a network and the greater the amount of direct evidence, the more reliable the estimates. An examination of the connections between interventions in a graphical way may also guide the future research agenda [55-59].

Besides calculating treatment effects, network meta-analyses of RCTs also offer the possibility of estimating rankings of treatments, where the probability that each intervention is superior to all other interventions is presented. Although these rankings are a convenient way to present the network results, they should be interpreted with caution: differences between the ranks may be small and clinically unimportant, bias in the meta-analysis may affect the rank order, and the probabilities may be fragile when the network is not robust (i.e., the rank order may change drastically when a new trial is introduced) [55-59].

The evaluation of network meta-analyses presents special challenges because biases can operate at different levels [56, 59]. Heterogeneity arising from differences in trial design, study quality, compared populations, severity of illness, dosing, length of follow-up, co-interventions, definition and measurement of outcomes, or other features that make studies different should be assessed both within each comparison and between all comparisons [55-59]. This can be a complicated task in complex network structures, and requires judgment. The validity of the indirect comparison rests on the assumption that these variables are not sufficiently heterogeneous as to result in different effects (i.e., the "similarity assumption") [55-59].
Particular caution is advisable when combining contemporary trials with historical studies [55]. When substantial heterogeneity is present, authors may conduct subgroup analyses or use meta-regression techniques to interpret these results [55-59]. It should also be remembered that, as in other systematic reviews and meta-analyses, publication bias and other selection biases are potential threats, but in network meta-analyses they may affect specific comparisons of the network more than others [56].
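To show what lies behind the treatment rankings and their fragility discussed above, here is a minimal sketch of how rank probabilities are computed from the posterior draws of a Bayesian network meta-analysis. The treatment names, the effect values and the use of simulated normal draws are our own illustrative assumptions; in practice the draws would come from the fitted model (e.g., MCMC output).

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical posterior draws of each treatment's effect versus a common
    # comparator (log odds ratios; lower = better). Illustrative values only.
    draws = {
        "A": rng.normal(-0.50, 0.15, 10_000),
        "B": rng.normal(-0.35, 0.10, 10_000),
        "C": rng.normal(-0.20, 0.25, 10_000),
    }

    effects = np.column_stack(list(draws.values()))      # draws x treatments
    ranks = effects.argsort(axis=1).argsort(axis=1) + 1  # rank 1 = best per draw

    for j, name in enumerate(draws):
        print(f"P({name} is ranked best) = {np.mean(ranks[:, j] == 1):.2f}")

Re-running such a calculation with one hypothetical trial added or removed shows how fragile the rank order can be when the network is sparse, which is the caution expressed above.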


CONCLUSION

We have discussed here how the need to protect inferences against error has guided the sophistication of the scientific method from unsystematic observations to large rigorous experiments. This increased sophistication has set aside reliance on judgments made only at the study level, and now considers judgments at the level of the body of evidence as crucial for making health care decisions. Current strategies consider risk of bias, directness, consistency, precision, and publication bias as additional features that need to be taken into account when formulating judgments about the quality of evidence.

Network meta-analyses use both direct and indirect evidence to compare the relative effectiveness of interventions. In their complexity, they hide from casual users the determinants of our confidence in the estimates of effect they seek to summarize. In particular, determinants of confidence related to incomplete reporting, inconsistency (within each comparison and across the network), and indirectness are of major concern in the analysis of network meta-analyses. When methodological quality is optimized, these methods may offer the best chance to consider the body of evidence, its limitations and results, in guiding healthcare decisions.

REFERENCES

[1] Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn't. BMJ 1996; 312: 71-2.
[2] Guyatt G. Evidence-based medicine. ACP J. Club (Ann Intern Med) 1991; 114.
[3] Guyatt G, Rennie D, Meade MO, Cook DJ. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Second edition. United States of America: McGraw-Hill Companies; 2008.
[4] Guyatt G, Jaeschke R, Wilson M, Montori V, Richardson S. What is evidence-based medicine? In: Guyatt G, Meade MO, Cook DJ, Rennie D. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Third edition. New York: McGraw-Hill Companies; 2014.
[5] Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA 1992; 268: 2420-5.
[6] Hutchinson JM, Gigerenzer G. Simple heuristics and rules of thumb: where psychologists and behavioural biologists might meet. Behav. Processes 2005; 69: 97-124.
[7] Kahneman D, Tversky A. On the reality of cognitive illusions. Psychol. Rev. 1996; 103: 582-91.
[8] Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science 1974; 185: 1124-31.
[9] Tversky A, Kahneman D. The framing of decisions and the psychology of choice. Science 1981; 211: 453-8.
[10] Mann CJ. Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Emerg. Med. J. 2003; 20: 54-60.


[11] Ho PM, Peterson PN, Masoudi FA. Evaluating the evidence: is there a rigid hierarchy? Circulation 2008; 118: 1675-84.
[12] Altman DG. Practical statistics for medical research. London: Chapman and Hall; 1991.
[13] Hoppe DJ, Schemitsch EH, Morshed S, Tornetta P 3rd, Bhandari M. Hierarchy of evidence: where observational studies fit in and why we need them. J. Bone Joint Surg. Am. 2009; 91: 2-9.
[14] Prasad V, Jorgenson J, Ioannidis JP, Cifu A. Observational studies often make clinical practice recommendations: an empirical evaluation of authors' attitudes. J. Clin. Epidemiol. 2013; 66: 361-6.
[15] Kunz R, Oxman AD. The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials. BMJ 1998; 317: 1185-90.
[16] Pocock SJ, Elbourne DR. Randomized trials or observational tribulations? N. Engl. J. Med. 2000; 342: 1907-9.
[17] Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N. Engl. J. Med. 2000; 342: 1878-86.
[18] Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N. Engl. J. Med. 2000; 342: 1887-92.
[19] Ioannidis JP, Haidich AB, Pappa M, Pantazis N, Kokori SI, Tektonidou MG, Contopoulos-Ioannidis DG, Lau J. Comparison of evidence of treatment effects in randomized and nonrandomized studies. JAMA 2001; 286: 821-30.
[20] Dahabreh IJ, Sheldrick RC, Paulus JK, Chung M, Varvarigou V, Jafri H, Rassen JA, Trikalinos TA, Kitsios GD. Do observational studies using propensity score methods agree with randomized trials? A systematic comparison of studies on acute coronary syndromes. Eur. Heart J. 2012; 33: 1893-901.
[21] Colditz GA, Miller JN, Mosteller F. How study design affects outcomes in comparisons of therapy. I: Medical. Stat. Med. 1989; 8: 441-54.
[22] Miller JN, Colditz GA, Mosteller F. How study design affects outcomes in comparisons of therapy. II: Surgical. Stat. Med. 1989; 8: 455-66.
[23] Hulley S, Grady D, Bush T, Furberg C, Herrington D, Riggs B, Vittinghoff E. Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. Heart and Estrogen/progestin Replacement Study (HERS) Research Group. JAMA 1998; 280: 605-13.
[24] Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, Jackson RD, Beresford SA, Howard BV, Johnson KC, Kotchen JM, Ockene J; Writing Group for the Women's Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women's Health Initiative randomized controlled trial. JAMA 2002; 288: 321-33.
[25] Hennekens CH, Buring JE, Manson JE, Stampfer M, Rosner B, Cook NR, Belanger C, LaMotte F, Gaziano JM, Ridker PM, Willett W, Peto R. Lack of effect of long-term supplementation with beta carotene on the incidence of malignant neoplasms and cardiovascular disease. N. Engl. J. Med. 1996; 334: 1145-9.
[26] Yusuf S, Dagenais G, Pogue J, Bosch J, Sleight P. Vitamin E supplementation and cardiovascular events in high-risk patients. The Heart Outcomes Prevention Evaluation Study Investigators. N. Engl. J. Med. 2000; 342: 154-60.
[27] Montori VM, Guyatt GH. Intention-to-treat principle. CMAJ 2001; 165: 1339-41.

Complimentary Contributor Copy

The Hierarchy of Evidence

17

[28] Karanicolas PJ, Montori VM, Devereaux PJ, Schünemann H, Guyatt GH. A new ―mechanistic-practical" framework for designing and interpreting randomized trials. J. Clin. Epidemiol. 2009; 62: 479-84. [29] Murad MH, Montori VM. Synthesizing evidence: shifting the focus from individual studies to the body of evidence. JAMA 2013; 309: 2217-8. [30] Gluud LL. Bias in clinical intervention research. Am. J. Epidemiol 2006; 163: 493-501. [31] Ioannidis JP. Why most published research findings are false. PLoS Med 2005; 2: e124. [32] Montori VM, Permanyer-Miralda G, Ferreira-Gonzalez I, Busse JW, Pacheco-Huergo V, Bryant D, Alonso J, Akl EA, Domingo-Salaverry A, Mills E, Wu P, Schunemann HJ, Jaeschke R, Guyatt GH. Validity of composite end points in clinical trials. BMJ 2005; 330: 594-6. [33] Ioannidis JP, Trikalinos TA. Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials. J. Clin. Epidemiol. 2005; 58: 543-9. [34] Higgins JPT, Green S. Cochrane. Handbook for Systematic Reviews of Interventions. Chichester: John Wiley & Sons; 2008. [35] Montori VM, Smieja M, Guyatt GH. Publication bias: a brief review for clinicians. Mayo Clin. Proc. 2000; 75: 1284-8. [36] Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 2004; 291: 2457-65. [37] Furukawa TA, Watanabe N, Omori IM, Montori VM, Guyatt GH. Association between unreported outcomes and effect size estimates in Cochrane meta-analyses. JAMA 2007; 297: 468-70. [38] The periodic health examination. Canadian Task Force on the Periodic Health Examination. Can. Med. Assoc. J. 1979; 121: 1193-1254. [39] Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 1989;95: 2-4S. [40] Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schunemann HJ. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008; 336: 924-6. [41] Guyatt GH, Oxman AD, Kunz R, Vist GE, Falck-Ytter Y, Schunemann HJ. What is "quality of evidence" and why is it important to clinicians? BMJ 2008; 336: 995-8. [42] Guyatt GH, Oxman AD, Kunz R, Falck-Ytter Y, Vist GE, Liberati A, Schunemann HJ. Going from evidence to recommendations. BMJ 2008; 336: 1049-51. [43] Schunemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, Vist GE, Williams JW, Kunz R, Craig J, Montori VM, Bossuyt P, Guyatt GH. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ 2008; 336: 110610. [44] Guyatt GH, Oxman AD, Kunz R, Jaeschke R, Helfand M, Liberati A, Vist GE, Schunemann HJ. Incorporating considerations of resources use into grading recommendations. BMJ 2008; 336: 1170-3. [45] Jaeschke R, Guyatt GH, Dellinger P, Schunemann H, Levy MM, Kunz R, Norris S, Bion J. Use of GRADE grid to reach decisions on clinical practice guidelines when consensus is elusive. BMJ 2008; 337: a744.

Complimentary Contributor Copy

18

Oscar L. Morey-Vargas, Claudia Zeballos-Palacios, Michael R. Gionfriddo et al.

[46] Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, Norris S, Falck-Ytter Y, Glasziou P, DeBeer H, Jaeschke R, Rind D, Meerpohl J, Dahm P, Schünemann HJ. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J. Clin. Epidemiol. 2011; 64: 383-94. [47] Guyatt GH, Oxman AD, Kunz R, Atkins D, Brozek J, Vist G, Alderson P, Glasziou P, Falck-Ytter Y, Schünemann HJ. GRADE guidelines: 2. Framing the question and deciding on important outcomes. J. Clin. Epidemiol. 2011; 64: 395-400. [48] Balshem H, Helfand M, Schünemann HJ, Oxman AD, Kunz R, Brozek J, Vist GE, Falck-Ytter Y, Meerpohl J, Norris S, Guyatt GH. GRADE guidelines: 3. Rating the quality of evidence. J. Clin. Epidemiol. 2011; 64: 401-6. [49] Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, Montori V, Akl EA, Djulbegovic B, Falck-Ytter Y, Norris SL, Williams JW Jr, Atkins D, Meerpohl J, Schünemann HJ. GRADE guidelines: 4. Rating the quality of evidence-study limitations (risk of bias). J. Clin. Epidemiol. 2011; 64: 407-15. [50] Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, Alonso-Coello P, Falck-Ytter Y, Jaeschke R, Vist G, Akl EA, Post PN, Norris S, Meerpohl J, Shukla VK, Nasser M, Schünemann HJ; GRADE Working Group. GRADE guidelines: 8. Rating the quality of evidence--indirectness. J. Clin. Epidemiol. 2011; 64: 1303-10. [51] Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, Alonso-Coello P, Glasziou P, Jaeschke R, Akl EA, Norris S, Vist G, Dahm P, Shukla VK, Higgins J, Falck-Ytter Y, Schünemann HJ; GRADE Working Group. GRADE guidelines: 7. Rating the quality of evidence--inconsistency. J. Clin. Epidemiol. 2011; 64: 1294-302. [52] Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, Devereaux PJ, Montori VM, Freyschuss B, Vist G, Jaeschke R, Williams JW Jr, Murad MH, Sinclair D, Falck-Ytter Y, Meerpohl J, Whittington C, Thorlund K, Andrews J, Schünemann HJ. GRADE guidelines 6. Rating the quality of evidence—imprecision J. Clin. Epidemiol. 2011; 64: 1283-93. [53] Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, Alonso-Coello P, Djulbegovic B, Atkins D, Falck-Ytter Y, Williams JW Jr, Meerpohl J, Norris SL, Akl EA, Schünemann HJ. GRADE guidelines: 5. Rating the quality of evidence-publication bias. J. Clin. Epidemiol. 2011; 64: 1277-82. [54] Guyatt GH, Oxman AD, Sultan S, Glasziou P, Akl EA, Alonso-Coello P, Atkins D, Kunz R, Brozek J, Montori V, Jaeschke R, Rind D, Dahm P, Meerpohl J, Vist G, Berliner E, Norris S, Falck-Ytter Y, Murad MH, Schünemann HJ; GRADE Working Group. GRADE guidelines: 9. Rating up the quality of evidence. J. Clin. Epidemiol. 2011; 64: 1311-6. [55] Caldwell DM, Ades AE, Higgins JP. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ 2005; 331: 897-900. [56] Salanti G, Higgins JP, Ades AE, Ioannidis JP. Evaluation of networks of randomized trials. Stat. Methods Med. Res. 2008; 17: 279-301. [57] Mills EJ, Ioannidis JP, Thorlund K, Schunemann HJ, Puhan MA, Guyatt GH. How to use an article reporting a multiple treatment comparison meta-analysis. JAMA 2012; 308: 1246-53. [58] Cipriani A, Higgins JP, Geddes JR, Salanti G. Conceptual and technical challenges in network meta-analysis. Ann. Intern. Med. 2013; 159: 130-7.

Complimentary Contributor Copy

The Hierarchy of Evidence

19

[59] Mills EJ, Thorlund K, Ioannidis JP. Demystifying trial networks and network metaanalysis. BMJ 2013; 346: f2914. [60] Song F, Altman DG, Glenny AM, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ 2003; 326: 472.

Complimentary Contributor Copy

Complimentary Contributor Copy

In: Network Meta-Analysis
Editor: Giuseppe Biondi-Zoccai

ISBN: 978-1-63321-001-1
© 2014 Nova Science Publishers, Inc.

Chapter 2

FROM PAIRWISE TO NETWORK META-ANALYSES

Sonya J. Snedecor 1,*, Ph.D., Dipen A. Patel 2, Ph.D., and Joseph C. Cappelleri 3, Ph.D.

1 Director, Health Economics, Pharmerit International, Bethesda, MD, US
2 Associate Director, Health Economics and Outcomes Research, Pharmerit International, Bethesda, MD, US
3 Senior Director, Statistics, Pfizer Inc., Groton, CT, US

ABSTRACT

Meta-analyses have been used for many years to pool data from clinical trials and to generate estimates of the treatment effect associated with a therapeutic intervention relative to a comparator. In recent years, there has been added emphasis on comparative effectiveness research, since decision makers are often faced with more than one available treatment and want to understand whether a new product is more effective than the existing options. For disease indications with several available treatment options, it is difficult or simply not feasible to conduct clinical trials comparing each pairwise combination of those treatments. In such circumstances, a network meta-analysis is a valuable tool for comparing treatments of interest that have not been assessed in head-to-head clinical trials. This chapter provides an introduction to traditional pairwise meta-analyses and the related network meta-analyses. Commonly used methods in meta-analysis combine effect size estimates from individual studies using a fixed-effect model or a random-effects model (or both). Differences between the two models are discussed, along with Bayesian meta-analysis methodology. Evidence networks used in network meta-analysis and the key assumptions of similarity, homogeneity, and consistency are also described. Finally, challenges and opportunities with meta-analyses are presented, including meta-regression and the integration of individual patient data.



* Corresponding author: Sonya J. Snedecor, PhD; Pharmerit International, 4350 East West Hwy, Ste 430, Bethesda, MD 20814, USA. Email: [email protected].


Keywords: Direct evidence, evidence networks, indirect evidence, indirect treatment comparisons, meta-analysis applications, history, network meta-analysis, mixed treatment comparison

INTRODUCTION

Evidence synthesis involves the development of techniques to combine multiple sources of quantitative evidence. It can be considered an extension of research synthesis, the integration of empirical research for the purpose of creating generalizations from data gathered from multiple sources. [1] Goals of research synthesis include critical analysis of the research covered, identification of central issues for future research, and attempts to resolve conflicts in the literature. The term meta-analysis is often used as a synonym for research synthesis, but a more precise definition was first given by Glass in 1976 [2]: "the statistical analysis of a large collection of individual studies for the purposes of integrating the findings." The definition has since evolved to also include an examination of heterogeneity, the variation of treatment effects among the studies analyzed.

There are four steps to conducting a meta-analysis: 1) identifying studies with relevant data; 2) assessing the eligibility of those studies; 3) abstracting the data for analysis; and 4) executing the statistical analysis, including exploration of differences among studies or study effects. Although all steps are equally important for the quality and validity of a meta-analysis, this chapter concentrates on execution and applications (step 4).

One of the first meta-analyses used as an evidence synthesis methodology was conducted by the British statistician Karl Pearson, who analyzed differences among study results on the association of typhoid fever inoculation with mortality. [3, 4] Although much of the subsequent meta-analysis work originated in the social sciences, where meta-analyses of observational studies are common, [5, 6] meta-analyses applied to the medical sciences typically include only data from randomized controlled trials (RCTs). Two early and influential medical meta-analyses were conducted by Elwood, Cochrane and colleagues, [7, 8] who studied the effects of aspirin treatment after heart attack and showed a reduced risk of recurrence, and by Chalmers et al., [9] who examined the use of anticoagulants after acute myocardial infarction. These analyses showed how the results of separate but similar studies could be synthesized to provide more scientifically robust estimates of the direction and size of treatment effects. In 1985, Peto and colleagues published a meta-analytic overview of RCTs of beta blockade to encourage clinicians to review randomized trials systematically and to combine estimates of the effects of treatments considered to be the same based on informed clinical judgment. [3, 4, 10] Citations of meta-analyses in the health-related literature have since surged, from 272 PubMed citations in 1990 to 6,354 in 2012, a rise that has paralleled the increase in the number of randomized trials conducted (7,170 to 21,444 over the same period).

Well-executed systematic reviews of the literature and meta-analyses of RCTs are widely considered to sit at the top of the evidence hierarchy. They constitute the highest level of evidence because they attempt to collect, combine, and report the best available evidence using systematic, transparent, and reproducible methodology. [11]


Performing a good-quality meta-analysis requires recognition that the analysis is itself a study, necessitating careful planning and execution. [6] A formal protocol, with clear specification of the method of study identification, the rules for study selection, and the analytic methodology, is encouraged before initiating a meta-analytic investigation. [12] Complete and consistent reporting of meta-analyses is also paramount to conveying the quality and validity of an analysis. [13, 14] For more detailed information on systematic reviews and meta-analyses, the reader is referred to highly regarded texts. [15-17]

PAIRWISE META-ANALYSIS METHODOLOGY

One of the simplest methods for combining the results of several studies is "vote counting," in which the overall assessment is made by comparing the number of studies with positive results against the number with negative results. Vote counting is limited to answering the question "is there any evidence of an effect?" [16]

A similarly simple method combines the p values from each study. Here, a test statistic, a function of the one-sided p values from each study, is calculated to reject or not reject the null hypothesis that there is no effect in any study in the collection. With this method, however, the null hypothesis may be rejected on the basis of a non-zero effect in just one study. Unsurprisingly, these methods are not recommended: they fail to take into account the sizes of the effects observed in the individual studies, the differing methods or statistical rules used to obtain analysis decisions and p values, and the differing weights appropriate for each study. They should be avoided whenever possible, but might be considered a last resort when, for example, the individual studies provide only non-parametric analyses, or there is no consistent outcome measure and little information other than a p value is available. [6, 16]

The most commonly employed meta-analytic methods combine estimates of effect sizes from the individual studies. These methods fall into two broad categories: fixed-effect and random-effects. The fixed-effect model answers the question "What is the overall treatment effect in this collection of studies?" Implicit in a fixed-effect model is that the true effect of interest is constant across all of the studies considered. In contrast, a random-effects model acknowledges variation of the true effects among studies. This model assumes that the study-level effects are drawn from a common distribution and answers the question "What is the overall treatment effect in the universe of studies from which this collection was sampled?"

The difference between fixed-effect and random-effects models is shown graphically in Figure 1. The fixed-effect model (left) estimates the mean effect only from the studies identified. Each study provides a sample estimate of the treatment effect and is assumed to measure the same (common) effect; the estimates are assumed to differ only because of natural random sampling variation. The random-effects model (right) estimates a mean effect that is centrally positioned among the individual effects of the different studies, each of which has its own distribution of effect.


Mathematically, the difference between fixed-effect and random-effects models can be represented by a single parameter:

Fixed-effect model: y_i = μ + e_i

Random-effects model: y_i = μ + s_i + e_i

where y_i is the treatment effect of the i-th study, μ is the overall mean, e_i is the error of the i-th study, and s_i is the deviation of the study-specific effect from the overall mean (the random effect). Thus, when s_i = 0, the random-effects model reduces to the fixed-effect model.

Figure 1. Graphical description of fixed-effect (left) and random-effects (right) models.

Analysis Methods

Common fixed-effect methods used to combine data are the Peto [10] and Mantel-Haenszel [18] methods for effects measured on a ratio scale, and general variance-based methods [19] for estimates of a difference measure. The DerSimonian and Laird method [20] is a random-effects model that may be used for dichotomous or continuous effects. These and other methods are explained and reviewed in detail elsewhere. [21, 22] They are classified as "frequentist" because the parameters of interest (i.e., the treatment effect) are assumed to be unknown but fixed, and the data are considered to be random variables.

Meta-analyses may also be conducted using Bayesian methodology. Bayesian statistical inference assumes the opposite of frequentist inference: the observed data are taken as fixed, and the parameters of interest are treated as random variables. [23-25] In hypothesis testing, frequentist methods generate Pr(data | θ), reported as mean estimates and 95% confidence intervals (CIs); Bayesian methods generate Pr(θ | data) and credible intervals (CrIs) surrounding the mean estimate. [26] Confidence intervals indicate the probability with which repeated study samples will generate an interval range containing the true value of the parameter of interest. Credible intervals represent the probability that the value of the parameter of interest lies within the interval range, given the observed data (the definition often incorrectly applied to confidence intervals). Thus, the Bayesian approach offers a more natural means of interpreting results. [27]

The Bayesian model framework also requires "prior information" in the form of a distribution around the parameters of interest.


The prior distribution represents prior belief and uncertainty about the values of the parameters; it is integrated with the observed data to generate the model results (called the 'posterior distribution'). This requirement introduces a level of subjectivity to which some researchers object, since different prior beliefs can lead to different model conclusions. To minimize the subjective nature of priors, many advocate the use of so-called "non-informative" priors in meta-analyses, so that the model estimates are based solely on the RCT data collected. [24]
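
As an illustration of the two frequentist approaches named above, the following is a minimal sketch, in Python with entirely hypothetical effect sizes, of inverse-variance (fixed-effect) pooling and DerSimonian-Laird (random-effects) pooling of study-level log odds ratios; a validated meta-analysis package should be preferred in practice.

# Inverse-variance (fixed-effect) and DerSimonian-Laird (random-effects)
# pooling of study effects -- hypothetical log odds ratios and SEs.
import numpy as np

y = np.array([-0.37, -0.10, -0.55, -0.21])   # study effects (log ORs)
se = np.array([0.18, 0.25, 0.30, 0.15])      # their standard errors

# Fixed-effect model: weights are inverse variances
w = 1.0 / se**2
mu_fe = np.sum(w * y) / np.sum(w)
se_fe = np.sqrt(1.0 / np.sum(w))

# DerSimonian-Laird estimate of the between-study variance tau^2
k = len(y)
Q = np.sum(w * (y - mu_fe) ** 2)             # Cochran's Q
C = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - (k - 1)) / C)

# Random-effects model: weights also reflect tau^2 (cf. y_i = mu + s_i + e_i)
w_re = 1.0 / (se**2 + tau2)
mu_re = np.sum(w_re * y) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))

print(f"fixed-effect:   {mu_fe:.3f} (SE {se_fe:.3f})")
print(f"random-effects: {mu_re:.3f} (SE {se_re:.3f}); tau^2 = {tau2:.3f}")

Whenever the estimated tau^2 is greater than zero, the random-effects standard error exceeds the fixed-effect one, reflecting the wider uncertainty intervals discussed next.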

Choice of Meta-Analysis Model

In general, random-effects models produce similar overall mean estimates but larger uncertainty intervals than fixed-effect models. Because of this, it is frequently considered best to always use a random-effects model, as it generates the most conservative estimate of statistical significance. However, a number of factors should be considered when choosing a model type. The first is whether heterogeneity, variation in treatment effects among the studies, is expected. If studies use very similar protocols, data collection methods, outcome definitions, patient populations, and so on, a fixed-effect model would likely be appropriate. If heterogeneity of treatment effects among the studies is expected or identified, a random-effects model will quantify the degree of heterogeneity, but it should be accompanied by an attempt to investigate or explain possible sources of the variation, such as subgroup analyses or meta-regression. Note also that random-effects models give more weight to small studies and less weight to large studies than fixed-effect models do, which may not be desirable in the presence of heterogeneity. [28]

Sufficient data are also necessary to properly estimate the between-study variance of treatment effects. When very few studies are available, or there are few events in the outcome of interest, the estimate of between-study variability can be unreliable, and it might then be reasonable to use a fixed-effect model. The size of the random-effect parameter can itself indicate the appropriateness of the model selection: a value close to zero suggests that a fixed-effect model would be adequate, since it indicates little heterogeneity in treatment effects. In Bayesian statistics, one can also test the appropriateness of each model using selection criteria such as the deviance information criterion (DIC). [29]
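
Because the Q statistic and I² used later in this chapter to assess homogeneity are built from the same quantities as the pooled estimates, the model choice just described can be informed by a short calculation; a self-contained sketch with the same hypothetical data:

# Cochran's Q and I^2 as rough guides to model choice -- hypothetical data.
import numpy as np
from scipy import stats

y = np.array([-0.37, -0.10, -0.55, -0.21])   # study effects
se = np.array([0.18, 0.25, 0.30, 0.15])
w = 1.0 / se**2
mu_fe = np.sum(w * y) / np.sum(w)            # fixed-effect pooled mean
k = len(y)

Q = np.sum(w * (y - mu_fe) ** 2)             # heterogeneity statistic
p_het = stats.chi2.sf(Q, df=k - 1)           # test of homogeneity
I2 = max(0.0, (Q - (k - 1)) / Q) * 100.0     # % of variation beyond chance

print(f"Q = {Q:.2f} on {k - 1} df (p = {p_het:.3f}); I^2 = {I2:.1f}%")
# A Q near its degrees of freedom (I^2 near 0) is consistent with a
# fixed-effect model; larger values argue for random effects plus an
# investigation of the sources of heterogeneity.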

META-ANALYSIS APPLICATIONS

The most common reasons for performing a meta-analysis are to provide an estimate of the treatment effect associated with a therapeutic intervention and to quantify and explain heterogeneity of treatment effects across studies, especially when data from a single study are insufficient and the conduct of a new, large study would be impractical. [30] Meta-analyses of clinical trials are also increasingly used to identify and evaluate potential drug safety issues. [31] Unless an RCT is prospectively designed and statistically powered with a particular safety outcome as its primary endpoint, the trial may not have a large enough sample size to reliably evaluate whether there is an increased risk of such an event.


When more than one study is available, meta-analyses can improve the ability to detect and characterize risks of adverse events that occur at low rates. [30]

Cumulative meta-analysis, the repeated analysis of a set of studies in which the set is updated as each new study is published, [27] can illustrate important gaps between accumulated clinical trial evidence and treatment recommendations or routine clinical practice, [32] identify developing trends in therapeutic efficacy, and guide the planning of future trials. One sequential analysis of RCTs published from 1959 to 1988 included two large studies, published in 1986 and 1988, both of which demonstrated favorable effects of intravenous streptokinase for the treatment of acute myocardial infarction. [33] By including studies sequentially, the cumulative meta-analysis showed a consistent, statistically significant reduction in mortality as early as 1973, after only eight trials had been completed. The addition of the two large RCTs in 1986 and 1988 had little to no effect on the estimated treatment effect but narrowed the 95% confidence interval. The United States Food and Drug Administration did not approve this drug until 1988, after the first large-scale trial had been undertaken and 15 years after a cumulative meta-analysis would have indicated that the treatment could be effective. [34]

Before a new treatment can be marketed in a country, it must be approved by the respective drug regulatory agency. The relevant evidence supporting the safety, efficacy, and value of the new product is spread across reports of many individual studies, and synthesis of all of these sources is necessary to support such claims. [6] Pooled estimates from meta-analyses can identify optimal dosing or bioequivalence for new therapies [34] or assist in drug planning and development by serving as inputs to decision analyses or cost-effectiveness analyses designed to support claims of safety, efficacy, and value. [35, 36]

Meta-analyses are frequently used in comparative effectiveness research, which compares the relative benefits and harms of a range of available treatments or interventions for a given condition. [37] Comparative effectiveness is a growing field of research: decision makers are often faced with more than one viable treatment option and increasingly ask whether a new medicine is more effective than the existing options, rather than whether it is effective at all. Answering this question within clinical trials is difficult, however, as there may be a large number of treatment alternatives, and clinical trials comparing each and every combination are unlikely. Network meta-analysis methods are frequently used in comparative effectiveness research in such cases, where there is little or no evidence from direct head-to-head clinical trial comparisons.
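
Returning to cumulative meta-analysis, the mechanics amount to re-pooling after each new study in publication order. A sketch with invented numbers that loosely echo the streptokinase story (illustrative only, not the published data):

# Cumulative meta-analysis -- re-pool the evidence after each new study.
# Hypothetical log risk ratios and SEs, ordered by publication year.
import numpy as np

years = [1970, 1971, 1973, 1977, 1986, 1988]
y = np.array([-0.45, -0.20, -0.35, -0.28, -0.25, -0.26])
se = np.array([0.40, 0.35, 0.20, 0.18, 0.05, 0.04])

for i in range(1, len(y) + 1):
    w = 1.0 / se[:i] ** 2                    # inverse-variance weights
    mu = np.sum(w * y[:i]) / np.sum(w)
    s = np.sqrt(1.0 / np.sum(w))
    lo, hi = mu - 1.96 * s, mu + 1.96 * s    # 95% confidence interval
    print(f"through {years[i - 1]}: {mu:+.3f} (95% CI {lo:+.3f} to {hi:+.3f})")

# As in the streptokinase example, the large late trials mainly narrow the
# interval rather than move the pooled estimate.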

NETWORK META-ANALYSIS

Comparisons between treatments included in RCTs, often referred to as 'direct evidence', are considered a highly reliable source of evidence for healthcare decision making. Ideally, an RCT would compare all relevant comparators in a disease area in order to establish the relative effects of all treatments and to simplify decision making. This is generally impractical, however, and new treatments are most often compared to a standard of care or placebo, resulting in a lack of direct evidence between new or newer treatments.


A traditional pairwise meta-analysis typically involves a comparison of effects between two treatments. In the absence of direct head-to-head evidence between two treatments of interest, a network meta-analysis (NMA) can be conducted if both treatments have been compared to a common comparator. [25, 38-42] The estimate of treatment effect obtained from such an analysis is referred to as 'indirect evidence.' That is, an indirect estimate of the effect of treatment A over B can be obtained by comparing trials of A vs. C and B vs. C. Extending this concept, NMAs also allow simultaneous comparison of more than two treatments. Formally, an NMA can be defined as a statistical combination of all available evidence for an outcome from several studies across multiple treatments, [25, 39] generating estimates of pairwise comparisons of each intervention against every other intervention within a network.

Evidence Networks

Network meta-analyses are so named because all of the treatments analyzed are connected to every other treatment via a network of RCT comparisons, sometimes referred to as the 'evidence network.' In the evidence network, each treatment is depicted as a node, and the RCTs containing the treatments are represented as lines connecting the nodes. Figure 2 shows examples of networks of varying complexity. To convey more information on the available clinical trials, the size of each node can be made proportional to the number of patients receiving that treatment, and the thickness of the lines between treatments can be made proportional to the number of available RCTs informing the comparison. Figures 2a-2c demonstrate networks that generate indirect evidence because all treatment nodes are connected through a common comparator (treatment A), allowing indirect comparisons between treatments without direct RCT data (such as B vs. C, C vs. F, B vs. F). [25] Analyses of these networks are commonly referred to as indirect treatment comparisons (ITCs).

Figure 2. Different types of evidence networks.


Figure 3. Classifications of meta-analyses.

Figures 2d and 2e include the same treatments and RCT comparisons as Figures 2a and 2b but add trials BC and CD, which form "closed loops" in which both direct and indirect evidence are analyzed (i.e., ABC and ACD). An analysis combining both types of data is often referred to as a mixed treatment comparison (MTC), and inclusion of both types of evidence can strengthen the precision of the treatment effect estimates between a pair of treatments in the network. [25, 43] The terms NMA, ITC, and MTC are often used interchangeably. Technically, network meta-analysis is the broader concept and can be used whenever the evidence base consists of two or more trials connecting three or more treatments. [25] ITCs and MTCs can be considered sub-classifications within NMAs (Figure 3).

METHODOLOGY AND ANALYSIS

Indirect Comparison Model of Two Treatments

Lack of direct treatment comparison data has been a persistent problem in the medical literature. To overcome this lack of data, some researchers have pooled findings from only the active treatment arms of the original controlled trials. [38, 44, 45] However, using the data in this way "breaks randomization" and fails to separate the efficacy of the drugs from possible placebo effects. [25] Such pooled responses also fail to account for differences in underlying baseline characteristics, resulting in biased estimates. [25, 38, 46] Bucher and colleagues proposed a model for making indirect comparisons of two treatments while preserving the randomization of the originally assigned patient groups. [38]


Under the assumption that the magnitude of the treatment effect is constant regardless of any differences in the populations' baseline characteristics, an unbiased estimate of the treatment effect can be generated. Consider a situation where separate studies have compared treatments A vs. B and treatments A vs. C. For a binary outcome measure, suppose that the probability of an event for patients on treatment A, B, and C is P_A, P_B, and P_C, respectively. The treatment effect in each set of trials can be assessed through the odds ratios

OR_BA = [P_B / (1 − P_B)] / [P_A / (1 − P_A)] and OR_CA = [P_C / (1 − P_C)] / [P_A / (1 − P_A)].

A summary odds ratio for the indirect comparison of B vs. C can be computed by taking the ratio of the odds ratios from the studies comparing A vs. B and A vs. C:

OR_BC = OR_BA / OR_CA.

To allow for parametric hypothesis testing (H0: OR_BC = 1), a natural log transformation (ln) of the above yields ln(OR_BC) = ln(OR_BA) − ln(OR_CA), from which the variance can be estimated, because OR_CA and OR_BA are estimated from different studies and are statistically independent:

Var(ln OR_BC) = Var(ln OR_CA) + Var(ln OR_BA).

This method is advantageous because of its simplicity and wide applicability in the case of two treatments compared against a common comparator. The model protects against some biases, but may still lead to some inaccuracies. The authors tested their proposed model in a meta-analysis of RCTs that compared two experimental regimens and one standard regimen for primary and secondary prevention of Pneumocystis carinii pneumonia in HIV infection. The indirect estimates were in the same direction as the directly observed data, but a difference remained in the magnitude of the treatment effect between the direct and indirect data, suggesting that this model may indeed protect against some sources of bias but remains at risk of some inaccuracies. Potential differences between study populations and in the definition or measurement of outcomes were thought to be key sources of bias. [38]
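
A minimal sketch of the Bucher calculation on hypothetical 2x2 trial data; the arm counts below are invented, and the standard Woolf formula is used for the variance of each log odds ratio.

# Bucher adjusted indirect comparison of B vs. C via common comparator A.
import numpy as np

def log_or(e1, n1, e0, n0):
    """Log odds ratio of arm 1 vs. arm 0 and its (Woolf) variance."""
    lor = np.log((e1 / (n1 - e1)) / (e0 / (n0 - e0)))
    var = 1/e1 + 1/(n1 - e1) + 1/e0 + 1/(n0 - e0)
    return lor, var

# Trial 1: B vs. A; Trial 2: C vs. A (hypothetical events / totals per arm)
d_BA, v_BA = log_or(30, 100, 45, 100)
d_CA, v_CA = log_or(38, 120, 50, 120)

# Indirect estimate: ln(OR_BC) = ln(OR_BA) - ln(OR_CA); variances add
d_BC = d_BA - d_CA
v_BC = v_BA + v_CA
lo, hi = d_BC - 1.96 * np.sqrt(v_BC), d_BC + 1.96 * np.sqrt(v_BC)
print(f"indirect OR(B vs. C) = {np.exp(d_BC):.2f} "
      f"(95% CI {np.exp(lo):.2f} to {np.exp(hi):.2f})")

Note how the indirect variance is the sum of the two direct variances: an indirect comparison is always less precise than a direct one based on the same amount of data.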

Indirect Comparison Methods for Two or More Treatments

When direct comparative data are sparse, some studies have included pairwise meta-analyses of several treatments against placebo and qualitatively compared the results. [47, 48] Although accurate to some degree, such a naïve "indirect" comparison can be misleading: it can neither generate a relative effect measure with associated uncertainty (without also performing an additional indirect comparison such as the Bucher method) nor incorporate any available direct data.


In these cases, an NMA is the most appropriate and informative approach.

For a network including loops of RCTs, such as the ABC network in Figure 2d, the statistical model must reflect the mathematical relationships between the treatment effects within the RCTs contributing direct or indirect evidence. [25, 42] The basic premise of the calculations behind MTC analyses is similar to that of the Bucher method described above. One treatment within the network is designated the reference or base treatment, b, against which all others are estimated. Standard convention is to select b as placebo, some other standard of care, or the most-studied treatment within the evidence network. [24] The effects for some outcome, d_Xb (e.g., log odds ratio, log hazard ratio, risk difference, mean difference), are then calculated for each trial, where X is any treatment compared to b. RCTs not containing treatment b (e.g., treatment C vs. treatment D) are specified in the model as d_DC. Because both treatments C and D must somehow connect to b within the evidence network, d_DC can be expressed as d_Db − d_Cb. If trials comparing C vs. b and/or D vs. b are also within the network, then all CD, Cb, and Db trials jointly estimate the d_Db and d_Cb parameters. If no Cb or Db trials are included, the parameters are estimated via other trials connecting C and D to b. A simple example is shown in Figure 4. In summary, NMA models generate the relative effect of each intervention against the reference treatment b, informed both by trials comparing those treatments to b and by trials comparing other treatments.

Network meta-analyses can be performed with either fixed-effect or random-effects models, using a frequentist or Bayesian approach, analogous to pairwise analyses. In practice, frequentist methodologies are often used for traditional pairwise meta-analyses, while Bayesian methods are preferred for complex evidence networks. Bayesian NMAs provide joint posterior distributions of the effects of all treatments, which are particularly useful for sensitivity analysis within decision models or cost-effectiveness models utilizing NMA data. They also provide probability calculations that allow rank-ordering of the interventions, i.e., the probability that a particular drug is best, second best, third best, and so on. Hence, Bayesian analyses provide information that is directly relevant to decision makers. [25]
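
The contrast-based bookkeeping just described can be made concrete in a frequentist sketch: a fixed-effect NMA written as weighted least squares on trial-level contrasts. (As noted above, Bayesian MCMC models are more common in practice; the treatments, effects, and variances here are hypothetical.)

# Fixed-effect network meta-analysis as weighted least squares on trial
# contrasts. Treatments: A (the reference b), B, C.
import numpy as np

REF = "A"                                     # reference treatment b
params = ["B", "C"]                           # basic parameters d_Bb, d_Cb
# Each trial: (treat1, treat2, observed log OR of treat2 vs. treat1, variance)
trials = [("A", "B", -0.30, 0.04),
          ("A", "C", -0.10, 0.05),
          ("B", "C",  0.15, 0.06)]

def design_row(t1, t2):
    """Row expressing d_{t2 vs t1} = d_{t2,b} - d_{t1,b}."""
    x = np.zeros(len(params))
    if t2 != REF: x[params.index(t2)] += 1.0
    if t1 != REF: x[params.index(t1)] -= 1.0
    return x

X = np.array([design_row(t1, t2) for t1, t2, _, _ in trials])
y = np.array([d for _, _, d, _ in trials])
W = np.diag([1.0 / v for _, _, _, v in trials])   # inverse-variance weights

cov = np.linalg.inv(X.T @ W @ X)              # WLS: (X'WX)^-1 X'Wy
d_hat = cov @ X.T @ W @ y
for name, est, var in zip(params, d_hat, np.diag(cov)):
    print(f"d_{name}{REF} = {est:+.3f} (SE {np.sqrt(var):.3f})")
print(f"d (C vs. B) = {d_hat[1] - d_hat[0]:+.3f}")   # uses all three trials

Every estimate here draws on both the direct trial for that comparison and the indirect path through the other trials, which is exactly the joint estimation of the d parameters described above.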

Figure 4. Indirect and direct effect estimations within a network meta-analysis (NMA). Reprinted from Value in Health, 17(2), Jansen J et al. "Indirect Treatment Comparison/Network Meta-Analysis Study Questionnaire to Assess Relevance and Credibility to Inform Health Care Decision Making: An ISPOR-AMCP-NPC Good Practice Task Force Report", 157-73, 2014, with permission from Elsevier.


Extension of the Network

An NMA will typically include only the treatments of interest for the research objective. When it is not possible to form a connected network of these treatments from the available randomized data, it may be necessary to introduce additional treatments so that a connected network can be formed. [49] For example, if there are three treatments of interest (A, B, and C), and treatments A and B have been compared in an RCT but treatment C has not been compared to either A or B, C cannot be included in the network. In such a case, an additional treatment X should be included if there are AX and CX trials to connect C with treatments A and B. [49, 50] If no trials exist to connect the network, an informed assumption linking the disconnected trial(s) to the rest of the network may be made, with the caveat that the quality of comparisons between treatments "connected" by the assumption will depend on the validity of the data imputed into the model. Comparisons within the fully-connected part of the network, however, will be unaffected.

Extending a network need not be restricted to cases of disconnected networks. Even if the treatments of interest are connected, including additional treatments has potential advantages: it provides additional evidence and strengthens inference, produces a more robust analysis that is less sensitive to specific sources of data, and allows consistency to be checked more thoroughly. [49, 51] There are potential disadvantages as well. There is an increased danger of introducing effect modifiers, as "remotely" connected treatments are more likely to have been trialed in relatively different patient populations. The danger may be particularly acute when extending the network to include treatments that were trialed earlier, especially if date of publication is associated with different patient characteristics or severity of condition. [49]
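
Whether a proposed network is connected can be checked mechanically before any modeling. A small sketch using breadth-first search over treatment nodes, with a hypothetical trial list in which an additional treatment X bridges C to the AB network:

# Checking that an evidence network is connected via breadth-first search.
from collections import defaultdict, deque

trials = [("A", "B"), ("A", "X"), ("C", "X")]   # pairwise RCT comparisons

graph = defaultdict(set)
for t1, t2 in trials:
    graph[t1].add(t2)
    graph[t2].add(t1)

def is_connected(graph, nodes):
    """True if every treatment is reachable from the first one."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in graph[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes

print(is_connected(graph, graph.keys()))   # True: X links C to A and B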

Assumptions in Network Meta-Analysis

Network meta-analyses combine data on multiple interventions across several RCTs to synthesize estimates of relative treatment effects for pairwise comparisons. The validity and accuracy of estimates from NMAs depend on the requirement that trials in the network are sufficiently comparable and similar to yield meaningful, unbiased estimates. [25, 52] Three assumptions underlie NMA methodology and should always be assessed. [53]

Homogeneity assumes that there is no significant variation (or, if variation is present, that it is due to random chance) in treatment effects among studies of the same comparison. In other words, are all AB trials (and, separately, all AC trials) "comparable" and estimating the same treatment effect? This assumption is as applicable to network meta-analyses as it is to pairwise meta-analyses. Homogeneity can be assessed separately for each collection of identical comparisons within the network using standard statistical measures, such as the Q statistic or I² (or both). [52, 54, 55] If heterogeneity exists, the possible sources should be explored, and random-effects modeling, sensitivity analyses, subgroup analyses, or meta-regression should be considered if sufficient data are available.

Trial evidence may be homogeneous within certain pairwise comparisons, yet significant variation in trial characteristics across different comparisons within a network can still lead to biased estimates. This leads to the assumption of similarity, which requires all trials included within a network to be "comparable" in terms of key factors that can be potential treatment effect modifiers (such as patient baseline characteristics, trial design, outcome definition and/or measurement, and follow-up time).


Similarity, which cannot be formally tested and verified, can be gauged (though not proven) through quantitative techniques (sensitivity analysis, meta-regression, subgroup analysis) and assessed qualitatively using summary tables documenting the relevant baseline characteristics of patients and descriptions of the studies. The assumption of similarity is not violated if differences in baseline or study characteristics between trials do not modify or influence the treatment effect. It is only when such characteristics are treatment effect modifiers that the estimated treatment effect becomes biased.

When direct and indirect evidence are combined for a particular comparison, it is assumed that the direct and indirect comparisons agree. [25, 56-59] This assumption is termed consistency, and it should be assessed in every NMA. Figure 2d shows a simple closed-loop network, where both direct and indirect evidence are available for all pairwise comparisons. For example, an estimate of the effect of B vs. C can be obtained directly from the BC trial, and can also be estimated indirectly from the AC and AB trials. For this loop to be consistent, the direct estimate should be equivalent to the indirect estimate (i.e., d_BC = d_BA − d_CA). Of note, consistency is a property of closed loops of evidence, not of individual comparisons. [25, 50, 52] It is possible to say that the AB, BC, and AC comparisons are consistent, whereas stating that the AB comparison is consistent with the AC comparison has no meaning. Inconsistencies can be caused by differences in treatment effect modifiers among the studies within a loop. Although three independent studies forming a closed loop of evidence are unlikely to show exact equality in a consistency evaluation, there are several published methods for evaluating acceptable ranges of consistency. [38, 39, 50, 59]
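
In the spirit of the Bucher approach, the simplest consistency check for a closed ABC loop contrasts the direct estimate with the indirect one and tests their difference (the methods cited above are more formal). A sketch with hypothetical inputs:

# Loop consistency: compare the direct B-vs-C estimate with the indirect
# estimate built from the AB and AC evidence. All inputs are hypothetical.
import numpy as np
from scipy import stats

d_BA, v_BA = -0.30, 0.04        # pooled log OR and variance from AB trials
d_CA, v_CA = -0.10, 0.05        # pooled log OR and variance from AC trials
d_BC_dir, v_dir = -0.25, 0.06   # direct estimate from the BC trial(s)

d_BC_ind = d_BA - d_CA          # indirect estimate: d_BC = d_BA - d_CA
v_ind = v_BA + v_CA

# Inconsistency factor: direct minus indirect, tested against zero
diff = d_BC_dir - d_BC_ind
z = diff / np.sqrt(v_dir + v_ind)
p = 2 * stats.norm.sf(abs(z))
print(f"direct {d_BC_dir:+.2f}, indirect {d_BC_ind:+.2f}, "
      f"z = {z:.2f}, p = {p:.3f}")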

META-ANALYSIS: CHALLENGES AND OPPORTUNITIES

Study Comparability

One of the most common criticisms of meta-analyses is the "apples and oranges" objection: the RCTs on which a meta-analysis is based all differ and, as such, supposedly cannot be combined as if they did not. Although it is true that the studies available for pooling may vary with respect to design, quality, outcome measures, or populations, a well-designed meta-analysis can minimize the effects of these variations by preparing a protocol with well-defined criteria and objectives for including studies judged sufficiently similar for comparison. [34] Ironically, this criticism also reflects one of the strengths of a meta-analysis. By combining data from different studies of different populations (provided it is clinically reasonable to do so), the results of a meta-analysis can be considered generalizable to a broader population of patients than any individual RCT. Of course, studies may differ for a number of reasons, including different patient populations, outcome definitions, study-level covariates, patient-level covariates, and so on.

Another critique is that, without individual patient data (IPD) to adjust for differences among the patients, the results of a meta-analysis cannot be valid. Obtaining the IPD from each study and pooling them into a single analysis is the ideal meta-analytic combination, and some have argued that meta-analyses must include IPD in order to be valid at all.


The rationale behind this position is that individual studies are not identical, and differing patient characteristics can influence study outcomes.

Meta-regression techniques may mitigate these differences by relating the size of the study effect to one or more characteristics of the trials. [60, 61] Meta-regression is commonly used to address heterogeneity of effect between studies; it is a hypothesis-generating technique and should not be regarded as proof of causality. [62] One of the largest barriers to effective meta-regression is lack of power to detect an association between covariates and effects. This is particularly true if a meta-analysis contains a small number of studies or is an NMA including a number of treatments. In the latter case, the analysis must include a sufficient number of studies per treatment to estimate the individual treatment effects as well as the covariate effect, which is often difficult. The other substantial barrier to meta-regression is that the trial-level association between estimate and covariate may not accurately reflect the within-trial relationship, owing to ecological (aggregation) bias: an expected difference in effects between groups and individuals caused by confounding from other, unaccounted-for effect modifiers. [31, 62]
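
In its simplest fixed-effect form, a meta-regression reduces to weighted least squares of the study effects on a study-level covariate. A sketch with hypothetical data (and, per the caveat above, hypothesis-generating only):

# Meta-regression of study effects on a trial-level covariate (mean age).
# Effects, SEs, and covariate values are hypothetical.
import numpy as np

y = np.array([-0.50, -0.35, -0.20, -0.10])   # study log odds ratios
se = np.array([0.15, 0.20, 0.18, 0.22])
age = np.array([55.0, 60.0, 65.0, 70.0])     # trial-level covariate

X = np.column_stack([np.ones_like(age), age - age.mean()])  # centered age
W = np.diag(1.0 / se**2)                     # inverse-variance weights

cov = np.linalg.inv(X.T @ W @ X)
beta = cov @ X.T @ W @ y
se_beta = np.sqrt(np.diag(cov))
print(f"intercept {beta[0]:+.3f} (SE {se_beta[0]:.3f}); "
      f"slope per year {beta[1]:+.4f} (SE {se_beta[1]:.4f})")
# With only four studies, the power caveat above applies: a null slope
# here is weak evidence against effect modification.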

Individual Patient Analysis

For these reasons, clinicians generally prefer treatment-covariate interaction estimates to be based on within-trial data, as these can relate patients' clinical characteristics directly to their treatment responses. [63] Introduction of IPD for at least one of the studies can reduce the risk of underestimating the covariate effect and of aggregation bias. In meta-analyses where IPD are available for all included studies, the relationship between covariate(s) and effects can be well estimated. However, IPD for all trials are generally not attainable, and IPD for only one or a few trials is the more likely situation. In these cases, the relationship between covariate and treatment effect can be estimated from the trials with available IPD and imputed for the other trials, providing an overall meta-analytic estimate with a reduced risk of aggregation bias. [63, 64] Of note, for analyses of trials with no differences in treatment-effect-modifying characteristics (and thus not requiring IPD), including IPD improves precision but provides the same answer as the corresponding aggregate-data analysis when no effect-modifying covariates are considered. [31]

Although labor-intensive with respect to acquiring and analyzing the data, there are several additional advantages to including IPD within meta-analyses beyond more precise estimation of treatment effects as a function of patient-level covariates. First, access to IPD allows variables to be re-derived into common definitions, thereby expanding the network of studies suitable for inclusion in the analysis. Such re-derivation can be useful when different clinical definitions are used for adverse events, or when outcomes are based on a combination of variables that collectively define a specific event. [31] The ability to re-analyze the data is also important for outcomes that depend on exposure time or length of follow-up. Inclusion of IPD also permits greater flexibility in the analysis of subgroups defined by patient-level characteristics; subgroup data are often not reported, or not reported with sufficient standardization for combination within the meta-analysis.


When two treatments of interest are compared to a common treatment in two different trials, the Bucher method described above is a straightforward way to generate an indirect comparison. However, in order to generate an unbiased estimate, all patient characteristics known to influence the treatment effect must be equal, or assumed to be equal, between the trials. If the distribution of important characteristics is unequal between the trials and IPD are available for one of them, Signorovitch et al. [65] proposed adjusting the IPD so that the two trial populations become suitably comparable. In this method, the IPD from one trial are re-weighted to match the average baseline characteristics reported for the other trial.
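
A sketch of that re-weighting step under simple assumptions: given IPD on two covariates and only the aggregate covariate means of the comparator trial, weights of the form exp(x'a) can be chosen so that the weighted IPD means match the aggregate means. The covariates, target means, and sample below are simulated, not taken from the psoriasis application.

# Matching-adjusted re-weighting of IPD to aggregate baseline means.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
ipd = rng.normal(loc=[60.0, 27.0], scale=[8.0, 4.0], size=(200, 2))  # age, BMI
target = np.array([65.0, 28.5])          # aggregate means from the other trial

centered = ipd - target                  # center the IPD on the target means

def objective(a):
    # Convex objective: at its minimum, the gradient condition forces the
    # weighted mean of (x_i - target) to zero, i.e., exact mean matching.
    return np.sum(np.exp(centered @ a))

a_hat = minimize(objective, x0=np.zeros(2), method="BFGS").x
w = np.exp(centered @ a_hat)             # per-patient weights

print("weighted covariate means:", np.average(ipd, axis=0, weights=w))
print("effective sample size:", w.sum() ** 2 / np.sum(w ** 2))

The effective sample size shows the price of re-weighting: the further apart the two populations are, the fewer patients effectively contribute to the adjusted comparison.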

Study Biases

An implicit assumption in the statistical aggregation of a collection of study effects is that the identified collection is an unbiased random sample of the total universe of data. For several reasons, this assumption may not hold. The published scientific literature documents only a proportion of the results of all research carried out. Some study data are proprietary, owned by the manufacturer and unavailable to the general public. Other studies may not be published because their authors chose not to write up and submit uninteresting findings, or because journals chose not to accept them. [34] For example, there is evidence suggesting that studies with significant outcomes are more likely to be published, and to be published more quickly, than those with non-significant outcomes. [66] Completely overcoming the possibility of publication bias is difficult, but carrying out as comprehensive a search as possible when obtaining studies for synthesis will help minimize its influence.

A final limitation of meta-analyses is that no algorithm for combining study-level effects can correct for qualitative flaws in the studies themselves, which can bias the meta-analysis outcomes. Analysts must be careful to understand and identify any possible quality defects within the studies and to understand their potential influence on the model outcomes (via sensitivity analysis or subgroup analysis, for example). Several scales have been developed for assessing the quality of randomized trials, [67] although little evidence supports an association between the magnitude of the treatment effect and the quality of the study. [68] Nevertheless, by focusing attention on the quality of RCTs, meta-analysis has probably provided a powerful stimulus for improving their conduct and reporting. [34, 69]

Network Meta-Analysis Challenges

NMAs are necessary when more than two treatments of interest have been compared across a number of trials. However, adoption of NMAs has been hindered by their relative complexity compared to standard pairwise analyses. The statistical concepts behind the calculation of treatment effects among the comparators are not immediately intuitive, particularly if the network of clinical trials is complex. Further, NMAs are frequently executed using Bayesian methodology, which may be unfamiliar to many clinicians and other decision makers. In a recent survey, selected samples of authors of Cochrane reviews were asked about their knowledge and views of indirect treatment comparison methods. [70] Their responses highlighted the usefulness of the methods but also expressed caution about their validity and a need to know more about the methods. Clinicians were evidently more reserved than academic researchers in considering the use of indirect evidence in decision making.


Conducting a network meta-analysis is often perceived to be more resource-consuming than a traditional pairwise meta-analysis. Additionally, the underlying assumptions of a network meta-analysis, especially similarity and consistency, are often a source of skepticism towards these methods. Authors frequently do not present explicit evidence supporting these assumptions, adding to readers' unease, especially given potential differences in treatment effect modifiers among trials. [53] There have also been recent findings that significant inconsistency between direct and indirect estimates may be more prevalent than previously observed, [71] which adds to the concern. To overcome these concerns, it is essential that all studies adopt standardized methods to assess the assumptions and report the assessment methods applied. [70]

A proxy for unmeasured but important patient-level characteristics, which may be a source of meta-analysis heterogeneity and inconsistency, is baseline risk. Baseline risk reflects the underlying burden of disease in the population and is the average risk of the outcome of interest had the patients been left untreated. [16] Trials with varying levels of risk can result in different treatment outcomes, and correcting for these differences may increase the validity of model results. Achana et al. describe methods to adjust for baseline risk in pairwise meta-analyses and extend these methods to NMAs. [72]

Multivariate Meta-Analyses

Meta-analyses that consider multiple outcomes usually do so in separate analyses. However, because multiple endpoints are usually correlated, a simultaneous analysis that takes their correlation into account should add efficiency and accuracy. [73, 74] Multivariate meta-analysis can describe the associations between different estimates of effect and provide estimates with smaller uncertainty intervals. Further, the results of these analyses include joint confidence regions for the outcomes, which are particularly useful for making predictions of each outcome and for use within cost-effectiveness analyses. [35, 75, 76] The difficulty with multivariate meta-analysis models is that they are more complex to execute and may require, from each study, the within-study correlation between the outcomes of interest, which is often not reported.
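
When the within-study correlations are available (or assumed), fixed-effect multivariate pooling itself is straightforward generalized least squares. A sketch with two outcomes per study and invented inputs:

# Bivariate fixed-effect meta-analysis by generalized least squares.
# Per study: effects for outcomes 1 and 2, their SEs, and the within-study
# correlation between the two estimates (all values hypothetical).
import numpy as np

studies = [([-0.30, -0.10], [0.15, 0.20], 0.6),
           ([-0.45, -0.25], [0.20, 0.25], 0.5),
           ([-0.20, -0.05], [0.10, 0.15], 0.7)]

S_inv_sum = np.zeros((2, 2))
Sy_sum = np.zeros(2)
for y, se, r in studies:
    y, se = np.array(y), np.array(se)
    S = np.array([[se[0]**2, r * se[0] * se[1]],
                  [r * se[0] * se[1], se[1]**2]])   # within-study covariance
    S_inv = np.linalg.inv(S)
    S_inv_sum += S_inv
    Sy_sum += S_inv @ y

cov = np.linalg.inv(S_inv_sum)           # covariance of the pooled effects
mu = cov @ Sy_sum
print("pooled effects:", np.round(mu, 3))
print("pooled SEs:    ", np.round(np.sqrt(np.diag(cov)), 3))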

CONCLUSION

Meta-analyses offer several benefits, including the ability to address uncertainty and heterogeneity when the results of multiple RCTs disagree, increased statistical power for outcomes and subgroups, and the generation of new knowledge and new research questions. [34] As the number of clinical trials has proliferated in recent years, the value of meta-analyses in synthesizing and organizing clinical data will likely increase. Therefore, researchers must learn to conduct high-quality studies, and clinicians and decision makers must be able to critically assess the value and reliability of meta-analyses in order to have confidence in the results. Several guides, scales, and checklists are available to assess the quality of reviews, [77-79] meta-analyses, [52, 80, 81] and the recommendations informed by meta-analyses. [82] When executed and used correctly, meta-analysis can be a powerful tool and will likely have an increasingly important role in medical research, drug regulation, public policy, and clinical practice.


REFERENCES

[1] Cooper H, Hedges LV. Research synthesis as a scientific process. In: Cooper H, Hedges LV, Valentine JC, editors. The Handbook of Research Synthesis and Meta-Analysis. 2nd ed. New York: Russell Sage Foundation; 2009.
[2] Glass GV. Primary, Secondary, and Meta-Analysis of Research. Educational Researcher 1976; 5: 3-8.
[3] O'Rourke K. An historical perspective on meta-analysis: dealing quantitatively with varying study results. J. R. Soc. Med. 2007; 100: 579-82.
[4] Pearson K. Report on Certain Enteric Fever Inoculation Statistics. BMJ 1904; 2: 1243-6.
[5] Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA, Thacker SB. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA 2000; 283: 2008-12.
[6] Jones DR. Meta-analysis: weighing the evidence. Stat. Med. 1995; 14: 137-49.
[7] Elwood PC, Cochrane AL, Burr ML, Sweetnam PM, Williams G, Welsby E, Hughes SJ, Renton R. A randomized controlled trial of acetyl salicylic acid in the secondary prevention of mortality from myocardial infarction. BMJ 1974; 1: 436-40.
[8] Elwood P. The first randomized trial of aspirin for heart attack and the advent of systematic overviews of trials. J. R. Soc. Med. 2006; 99: 586-8.
[9] Chalmers TC, Matta RJ, Smith H Jr, Kunzler AM. Evidence favoring the use of anticoagulants in the hospital phase of acute myocardial infarction. New Engl. J. Med. 1977; 297: 1091-6.
[10] Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog. Cardiovasc. Dis. 1985; 27: 335-71.
[11] Pandis N. The evidence pyramid and introduction to randomized controlled trials. Am. J. Orthod. Dentofacial Orthop. 2011; 140: 446-7.
[12] Berman NG, Parker RA. Meta-analysis: neither quick nor easy. BMC Med. Res. Methodol. 2002; 2: 10.
[13] Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann. Intern. Med. 2009; 151: W65-94.
[14] Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC. Meta-analyses of randomized controlled trials. New Engl. J. Med. 1987; 316: 450-5.
[15] Cooper H, Hedges LV, Valentine JC, editors. The Handbook of Research Synthesis and Meta-Analysis. 2nd ed. New York: Russell Sage Foundation; 2009.
[16] Higgins JP, Green S. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0 [updated March 2011]. The Cochrane Collaboration; 2011.


[17] Borenstein M. Introduction to Meta-Analysis. Chichester: John Wiley & Sons; 2009.
[18] Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J. Nat. Cancer Inst. 1959; 22: 719-48.
[19] Wolf FM. Meta-analysis: quantitative methods for research synthesis. New York: Sage Publications; 1986.
[20] DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin. Trials 1986; 7: 177-88.
[21] Petitti DB. Meta-analysis, decision analysis, and cost-effectiveness analysis: methods for quantitative synthesis in medicine. 2nd ed. New York: Oxford University Press; 2000.
[22] Shadish WR, Haddock CK. Combining estimates of effect size. In: Cooper H, Hedges LV, Valentine JC, editors. The Handbook of Research Synthesis and Meta-Analysis. 2nd ed. New York, NY: Russell Sage Foundation; 2009.
[23] Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Methods for Meta-Analysis in Medical Research. London: John Wiley & Sons; 2000.
[24] Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med. Decis. Making 2013; 33: 607-17.
[25] Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, Lee K, Boersma C, Annemans L, Cappelleri JC. Interpreting indirect treatment comparisons and network meta-analysis for health-care decision making: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 1. Value Health 2011; 14: 417-28.
[26] Parker M. Foundations of Statistics - Frequentist and Bayesian; 2004. Available from: http://www.austincc.edu/mparker/stat/nov04/talk_nov04.pdf (last accessed on February 9, 2014).
[27] Sutton AJ, Higgins JP. Recent developments in meta-analysis. Stat. Med. 2008; 27: 625-50.
[28] Lau J, Ioannidis JP, Schmid CH. Summing up evidence: one answer is not always enough. Lancet 1998; 351: 123-7.
[29] Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002; 64: 583-639.
[30] US Food and Drug Administration, CDER. White Paper; 2013. Available from: www.fda.gov/downloads/Drugs/NewsEvents/UCM372069.pdf (last accessed on February 9, 2014).
[31] Berlin JA, Crowe BJ, Whalen E, Xia HA, Koro CE, Kuebler J. Meta-analysis of clinical trial safety data in a drug development program: answers to frequently asked questions. Clin. Trials 2013; 10: 20-31.
[32] Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts: treatments for myocardial infarction. JAMA 1992; 268: 240-8.
[33] Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. New Engl. J. Med. 1992; 327: 248-54.

Complimentary Contributor Copy

38

Sonya J. Snedecor, Dipen A. Patel and Joseph C. Cappelleri

[34] Cappelleri JC, Ioannidis JPA, Lau J. Meta-Analysis of Therapeutic Trials. In: Chow SC, editor. Encyclopedia of Biopharmaceutical Statistics. New York: Informa Healthcare; 2010. [35] Dias S, Sutton AJ, Welton NJ, Ades AE. Evidence synthesis for decision making 6: embedding evidence synthesis in probabilistic cost-effectiveness analysis. Med. Decis Making 2013; 33: 671-8. [36] Ades AE, Sculpher M, Sutton A, Abrams K, Cooper N, Welton N, et al. Bayesian methods for evidence synthesis in cost-effectiveness analysis. Pharmacoeconomics 2006; 24: 1-19. [37] Cappelleri JC, Network meta-analysis for comparative effectiveness research. Presented at 19th Annual Biopharmaceutical Applied Statistics Symposium, Savannah, GA., 2012. [38] Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J. Clin. Epidemiol. 1997; 50: 683-91. [39] Lumley T. Network meta-analysis for indirect treatment comparisons. Stat. Med. 2002; 21: 2313-24. [40] Song F, Altman DG, Glenny AM, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ 2003; 326: 472. [41] Salanti G, Higgins JP, Ades AE, Ioannidis JP. Evaluation of networks of randomized trials. Stat. Methods Med. Res. 2008; 17: 279-301. [42] Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat. Med. 2004; 23: 3105-24. [43] Caldwell DM, Ades AE, Higgins JP. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ 2005; 331: 897-900. [44] Felson DT, Anderson JJ, Meenan RF. The comparative efficacy and toxicity of secondline drugs in rheumatoid arthritis. Results of two meta-analyses. Arthritis. Rheum. 1990; 33: 1449-61. [45] O'Brien BJ, Anderson DR, Goeree R. Cost-effectiveness of enoxaparin versus warfarin prophylaxis against deep-vein thrombosis after total hip replacement. CMAJ 1994; 150: 1083-90. [46] Song F, Loke YK, Walsh T, Glenny AM, Eastwood AJ, Altman DG. Methodological problems in the use of indirect comparisons for evaluating healthcare interventions: survey of published systematic reviews. BMJ 2009; 338: b1147. [47] Chou R, Carson S, Chan BK. Gabapentin versus tricyclic antidepressants for diabetic neuropathy and post-herpetic neuralgia: discrepancies between direct and indirect metaanalyses of randomized controlled trials. J. Gen. Intern. Med. 2009; 24: 178-88. [48] Chapple CR, Khullar V, Gabriel Z, Muston D, Bitoun CE, Weinstein D. The effects of antimuscarinic treatments in overactive bladder: an update of a systematic review and meta-analysis. Eur. Urol. 2008; 54: 543-62. [49] Dias S, Welton NJ, Sutton AJ, Ades AE. Evidence synthesis for decision making 1: introduction. Med. Decis Making 2013; 33: 597-606. [50] Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Med. Decis. Making 2013; 33: 641-56.

Complimentary Contributor Copy

From Pairwise to Network Meta-Analyses

39

[51] Cooper NJ, Peters J, Lai MC, Juni P, Wandel S, Palmer S, Paulden M, Conti S, Welton NJ, Abrams KR, Bujkiewicz S, Spiegelhalter D, Sutton AJ. How valuable are multiple treatment comparison methods in evidence-based health-care evaluation? Value Health 2011; 14: 371-80. [52] Hoaglin DC, Hawkins N, Jansen JP, Scott DA, Itzler R, Cappelleri JC, Boersma C, Thompson D, Larholt KM, Diaz M, Barrett A. Conducting indirect-treatmentcomparison and network-meta-analysis studies: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 2. Value Health 2011; 14: 429-37. [53] Donegan S, Williamson P, Gamble C, Tudur-Smith C. Indirect comparisons: a review of reporting and methodological quality. PloS one 2010; 5: e11054. [54] Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat. Med 2002; 21: 1539-58. [55] Cochran WG. The combination of estimates from different experiments. Biometrics 1954; 10: 101-29. [56] Cooper NJ, Sutton AJ, Morris D, Ades AE, Welton NJ. Addressing between-study heterogeneity and inconsistency in mixed treatment comparisons: Application to stroke prevention treatments in individuals with non-rheumatic atrial fibrillation. Stat. Med. 2009; 28: 1861-81. [57] Salanti G, Kavvoura FK, Ioannidis JP. Exploring the geometry of treatment networks. Ann. Intern. Med. 2008; 148: 544-53. [58] Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat. Med. 2010; 29: 932-44. [59] Lu G, Ades AE. Assessing Evidence Inconsistency in Mixed Treatment Comparisons. J. Am. Stat. Assoc. 2006; 101: 447. [60] Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Stat. Med. 1999; 18: 2693-708. [61] Christensen R. Beyond RefMan: Meta-regression analysis in context: The Cochrane Collaboration; 2014. Available from: http://musculoskeletal.cochrane.org/sites/ musculoskeletal.test.cochrane.org/files/uploads/RC_Lecture.pdf (last accessed on February 9, 2014). [62] Baker WL, White CM, Cappelleri JC, Kluger J, Coleman CI; Health Outcomes, Policy, and Economics (HOPE) Collaborative Group. Understanding heterogeneity in metaanalysis: the role of meta-regression. Int. J. Clin. Pract. 2009; 63: 1426-34. [63] Riley RD, Lambert PC, Staessen JA, Wang J, Gueyffier F, Thijs L, Boutitie F. Metaanalysis of continuous outcomes combining individual patient data and aggregate data. Stat. Med. 2008; 27: 1870-93. [64] Jansen JP, Capkun-Niggli G, Cope S. ―Incorporating patient level data in a network meta-analysis‖ Presented at ISPOR 18th Annual International Meeting, New Orleans, LA, 2013. [65] Signorovitch JE, Wu EQ, Yu AP, Gerrits CM, Kantor E, Bao Y, Gupta SR, Mulani PM. Comparative effectiveness without head-to-head trials: a method for matching-adjusted indirect comparisons applied to psoriasis treatment with adalimumab or etanercept. Pharmacoeconomics 2010; 28: 935-45.

Complimentary Contributor Copy

40

Sonya J. Snedecor, Dipen A. Patel and Joseph C. Cappelleri

[66] Sutton AJ. Publication bias. In: Valentine JCC, Harris; Hedges, Larry V., editor. The Hand of Research Synthesis & Meta-Analysis. 2nd ed. New York, NY: Russell Sage Foundation; 2009. [67] Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin. Trials 1995; 16: 62-73. [68] Balk EM, Bonis PA, Moskowitz H, Schmid CH, Ioannidis JP, Wang C, Lau J. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA 2002; 287: 2973-82. [69] Halpern SD, editor. Evidence-based Obstetric Anesthesia. Oxford: Blackwell Publishing; 2007. [70] Abdelhamid AS, Loke YK, Parekh-Bhurke S, Chen Y-F, Sutton A, Eastwood A. Use of indirect comparison methods in systematic reviews: a survey of Cochrane review authors. Research Synthesis Methods 2012; 3: 71-9. [71] Song F, Xiong T, Parekh-Bhurke S, Loke YK, Sutton AJ, Eastwood AJ, Holland R, Chen YF, Glenny AM, Deeks JJ, Altman DG. Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ 2011; 343: d4909. [72] Achana FA, Cooper NJ, Dias S, Lu G, Rice SJ, Kendrick D, Sutton AJ. Extending methods for investigating the relationship between treatment effect and baseline risk from pairwise meta-analysis to network meta-analysis. Stat. Med. 2013; 32: 752-71. [73] Jackson D, Riley R, White IR. Multivariate meta-analysis: Potential and promise. Stat. Med. 2011; 30: 2481-98. [74] Riley RD. Multivariate meta-analysis: the effect of ignoring within-study correlation. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2009; 172: 789811. [75] Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR. An evaluation of bivariate random-effects meta-analysis for the joint synthesis of two correlated outcomes. Stat. Med. 2007; 26: 78-97. [76] Nam IS, Mengersen K, Garthwaite P. Multivariate meta-analysis. Stat. Med. 2003; 22: 2309-33. [77] Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J. Clin. Epidemiol. 1991; 44: 1271-8. [78] Oxman AD, Guyatt GH, Singer J, Goldsmith CH, Hutchison BG, Milner RA, Streiner DL. Agreement among reviewers of review articles. J. Clin. Epidemiol. 1991; 44: 91-8. [79] Council NR. Finding What Works in Health Care: Standards for Systematic Reviews. In: Eden J, Levit L, Berg A, Morton S, editors. Washington: The National Academies Press; 2011. [80] Ades AE, Caldwell DM, Reken S, Welton NJ, Sutton AJ, Dias S. Evidence synthesis for decision making 7: a reviewer's checklist. Med. Decis. Making 2013; 33:679-91. [81] Jansen JP, Trikalinos T, Cappelleri JC, Daw J, Andes S, Eldessouki R, Salanti G. Indirect treatment comparison/network meta-analysis study questionnaire to assess relevance and credibility to inform health care decision making: an ISPOR-AMCPNPC Good Practice Task Force report. Value Health. 2014 ; 17:157-73. [82] Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, Norris S, Falck-YtterY, Glasziou P, DeBeer H, Jaeschke R, Rind D, Meerpohl J, Dahm P, Schunemann HJ.

Complimentary Contributor Copy

From Pairwise to Network Meta-Analyses

41

GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J. Clin. Epidemiol. 2011; 64: 383-94.

Complimentary Contributor Copy

Complimentary Contributor Copy

3RD SECTION


In: Network Meta-Analysis
Editor: Giuseppe Biondi-Zoccai

ISBN: 978-1-63321-001-1 © 2014 Nova Science Publishers, Inc.

Chapter 3

DESIGNING AND REGISTERING THE REVIEW

Alison Booth, M.Sc. and Dawn Craig, M.Sc.
NIHR Centre for Reviews and Dissemination, University of York, Heslington, York, UK

ABSTRACT

Network meta-analysis should be done as part of a systematic review. Good quality systematic reviews, irrespective of whether they include pair-wise meta-analyses or network meta-analyses, are built on good design and careful planning. To reduce the potential for bias, methods should be pre-specified in a protocol, with any subsequent deviations or changes from what was planned recorded and adequately explained in the completed review report. In this chapter we describe the key features of a systematic review protocol and highlight areas that may warrant additional thought when planning a network meta-analysis. Transparent conduct and reporting enable those using systematic review and network meta-analysis findings to judge the quality of a review and assess for themselves the potential impact of any deviation from what was planned initially. We make the case for why protocols should be available in the public domain and outline the role of systematic review protocol registration. We introduce PROSPERO, an open register designed specifically for prospective registration of systematic reviews. We look at how reviews with network meta-analyses have been registered and provide a step-by-step guide to the PROSPERO registration process.

Keywords: minimizing bias, PROSPERO, protocol, registration

INTRODUCTION

In this chapter we will focus on network meta-analysis as a tool for synthesizing evidence as part of a systematic review.

 †

E-mail: [email protected]. E-mail: [email protected].


Systematic review is the research method of choice for informing practice and policy decisions in healthcare, public health and social care. Systematic reviews allow us to identify, evaluate and summarize all relevant evidence in a consistent, transparent and reproducible manner. When appropriate, combining the results of included studies gives more precise and reliable estimates of an intervention's effectiveness than any individual study. Historically, synthesis has taken the form of pair-wise meta-analysis. However, where there are multiple intervention options (for the same indication in the same population), traditional pair-wise meta-analyses may be insufficient for decision-making needs. Over the last few years there has been increasing recognition of the value of network meta-analysis. As explained in the following chapters, network meta-analyses allow the combination of both direct and indirect evidence within a network, allowing the estimation of relative effects even in the absence of head-to-head trial evidence. Although there may be some circumstances where a 'standalone' network meta-analysis may be performed, as for a standard pair-wise meta-analysis, network meta-analyses should be done in the context of a well-conducted systematic review. This ensures transparency and minimizes bias.

Good quality systematic reviews require good design, careful planning and the use of appropriate methods throughout. The aim of this chapter is to describe briefly the design and planning stage of a systematic review. We intend this to help those wishing to understand the process and to provide some practical guidance for those considering undertaking a systematic review and network meta-analysis. We begin by explaining the need for careful planning and production of a detailed protocol, including clear articulation of the methods to be used. The principles and purpose of registration are outlined, focusing specifically on PROSPERO, the international prospective register of systematic reviews. The chapter concludes with a 'walk' through the submission process for PROSPERO.
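To give a feel for how indirect evidence contributes, the following minimal sketch (ours, not from this chapter; all numbers are hypothetical) applies the adjusted indirect comparison introduced by Bucher and colleagues, which network meta-analysis generalizes: two treatments A and B that have never been compared head-to-head are contrasted through a common comparator C.

```python
# Adjusted indirect comparison via a common comparator (Bucher-style).
# Inputs are hypothetical pooled log odds ratios and standard errors.
import math

d_AC, se_AC = -0.40, 0.12   # A vs C, from A-vs-C trials
d_BC, se_BC = -0.15, 0.10   # B vs C, from B-vs-C trials

# Indirect estimate of A vs B; randomization within each comparison is preserved.
d_AB = d_AC - d_BC
se_AB = math.sqrt(se_AC ** 2 + se_BC ** 2)  # variances of independent estimates add

lo, hi = d_AB - 1.96 * se_AB, d_AB + 1.96 * se_AB
print(f"Indirect OR for A vs B: {math.exp(d_AB):.2f} "
      f"(95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

Note how the indirect estimate is less precise than either direct comparison; a full network meta-analysis combines such indirect evidence with whatever direct evidence exists.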

DESIGNING A REVIEW AND PLANNING A NETWORK META-ANALYSIS

All systematic reviews should begin by developing a protocol that sets out in advance the methods that will be used in the review. Time and effort spent designing any review and producing a protocol is essential to ensure that the findings answer the question being posed and that the results can be relied upon. The key to a good review is recognizing the many forms of potential bias and taking the steps necessary to minimize these. Transparency in the process is important for those using the findings so they can judge the quality of the review and assess for themselves the potential impact of biases. Spending time getting the design right at the start pays dividends in the long run. Development of the protocol is usually an iterative process that should result in the specification of a clear review question with well-defined inclusion/exclusion criteria, and detailed methods for searching, data extraction, quality assessment/assessment of risk of bias and synthesis.

That you have decided to plan a network meta-analysis is an indication that you are probably already aware of at least some of the existing evidence base. It is therefore important also to be aware of the potential for bias that prior knowledge brings. There is likely to be inherent bias in most reviews where the researchers are interested and knowledgeable on a


topic and/or when reviewing the results of scoping searches. Having a team decide on the inclusion/exclusion criteria can help mitigate this.

The Review Team and Advisory Group

Establishing a good team with all the relevant skills is vital. In addition to skills in searching and information retrieval, review methods and, where relevant, economic evaluation, a network meta-analysis requires the input of an experienced statistician. [1] Specialist knowledge of the relevant clinical topic is also essential in ensuring that the review is designed and conducted in a way that produces meaningful results. This may be particularly important in network meta-analyses, where ensuring exchangeability of the studies included in the network is a key aspect of design. Establishing an advisory group that includes wider topic and methods expertise and service user representatives is advisable. Such a group provides a wider perspective and can provide additional expertise to the project team. This may be particularly helpful in a network meta-analysis where several different interventions may be under consideration and where particular expertise and experience of each may be required.

A major consideration in the planning and design of a review is balancing the resources available with how long the review will take. This will include review team and advisory group availability. Clearly the level and nature of funding will influence the breadth and depth of the review. Use the skills of the review team and the experience of the advisory group, together with scoping searches, to be realistic about the volume of evidence likely to be identified and the subsequent work that will be required.

Background, Context and Justification

Before embarking on a new systematic review, you should ensure that the topic has not already been covered adequately and that the same or a similar question is not already being addressed elsewhere. [1, 2] Primary resources for determining this include the Cochrane Database of Systematic Reviews (CDSR), the Database of Abstracts of Reviews of Effects (DARE) and PROSPERO. [3-5] DARE is particularly useful in identifying completed systematic reviews: experienced information specialists carry out regular detailed bibliographic searches of the world literature. These identify potential systematic reviews of the effects of health and social care interventions, which are then screened for inclusion in the database. Bibliographic records included in DARE have to meet basic inclusion criteria to confirm that they are indeed systematic reviews. For those reviews of direct relevance to the UK NHS, an experienced CRD systematic reviewer critically appraises and writes a commentary on the reliability of the findings and conclusions of the review. This summarized information can be used to inform the development of a robust background and an explanation of the need for a network meta-analysis. The commentaries can also be used as a learning tool to practice or check your own critical appraisal skills. The CRD database site is updated continuously, with records being loaded as soon as they are identified as meeting the inclusion criteria or a commentary is completed. [6]


The content of the CRD databases (DARE, NHS EED and the HTA database) also appears in The Cochrane Library. Summary details of Cochrane reviews and protocols are also included in DARE (and new Cochrane protocols in PROSPERO). Either source can therefore be used to identify completed systematic reviews. PROSPERO should be checked for duplicate or similar ongoing systematic reviews. Cochrane review protocols are included in PROSPERO, but registration records for non-Cochrane review protocols do not appear in The Cochrane Library.

Unplanned duplication of an existing review should be avoided, and where replication is deemed necessary the reason for this should be stated explicitly. Given that network meta-analysis is a relatively new technique, the use of this method of analysis in the replication of existing reviews with pair-wise meta-analyses may well be sufficient justification. [7] You may also wish to consider whether this type of analysis would allow some review questions to be broadened so that additional comparator interventions can be included. [8]

Review Question and Title

The PICOS approach makes use of decisions about the Population, Interventions, Comparators, Outcomes and, if relevant, the Setting to help formulate the research question. The review question may be simply about the comparative effectiveness of all relevant treatments for a specific condition in a specific population. Alternatively, the review team may formulate a series of objectives, for example to look at efficacy and safety separately or to group interventions by mode of action. [9]

Most journals recommend that the title of a review should include the interventions or exposures being reviewed and the associated health or social problem being addressed. However, there is also a need to be succinct, and as network meta-analyses are likely to include a number of different interventions, it may be necessary to describe them collectively rather than including them all in the title. It is also worth including the study design, in this case Systematic Review and Network Meta-Analysis. The title used in the protocol can be considered a working title that may change for the final publication, though both should include the relevant information. This example is from a record registered in PROSPERO: Is there an increased risk of post-transplant lymphoproliferative disorder and other cancers with specific modern immunosuppression regimens in renal transplantation? Protocol for a network meta-analysis of randomized and observational studies. [10]

Review Methods

Population
Participants/population should be specified in advance to avoid selection bias. It is also important to anticipate any potential situations or issues that may arise when it comes to analysing the data, and to state in advance how these will be dealt with. For example, if a particular intervention is contra-indicated for a subset of participants included in the network, the intervention should be excluded from the network to preserve the validity of the indirect comparisons, or exchangeability. [11]


Interventions and Comparators
A clear definition of the interventions to be included, and if appropriate how they will be grouped, is particularly important for network meta-analysis. Failure to define these in advance introduces avoidable risk of bias. [11] There may be multiple ways to combine and construct the network, which may give differing results and would allow selective presentation according to (potentially vested) preference. You should consider whether or not to include placebo or no treatment as part of the evidence set. Including studies that compare an intervention with placebo or no treatment may strengthen the network. [12]

Outcomes
It is important to pre-specify the outcomes of interest and the outcome measures that will be used. This reduces the risk of bias in selecting and presenting the most favorable results from multiple outcomes and measures. You should agree a clear definition of the outcome(s) you aim to use in your network analysis. Bear in mind that, when reporting the review, you will need to account for any changes to the pre-specified outcomes, and that the addition of new outcomes and/or the omission of planned outcomes should be acknowledged and justified. [13]

Inclusion Criteria
Inclusion criteria should be straightforward and follow logically from the review question and PICOS criteria. There remains much debate about how broad or narrow inclusion criteria should be, with many advocating that the outcome of any review is only as good as the studies included in it. This view leads to narrow inclusion criteria restricted to high quality randomized controlled trials. However, in many cases broader criteria, possibly including some observational study designs, may be warranted. Whatever inclusion criteria are defined, it may be necessary to consider alternative inclusion criteria for the network meta-analysis. If your analysis will incorporate only some of the included studies, or if you intend your base case to include only some and will undertake a sensitivity analysis to assess the impact of including other information, it is important to give these issues consideration at the protocol stage. You can then be as transparent as possible about your planned analysis and any subsequent sensitivity analyses you may want to consider to explore heterogeneity and inconsistency. Decisions about restricting the language of included papers and the extent of efforts to identify unpublished research also need to be made, stated and justified.

Data Extraction
The items to be extracted from included study reports and publications should be identified in the protocol. Piloting is usually helpful in ensuring that all the information required for quality assessment and the planned analyses is captured. When planning indirect treatment comparisons it is important that randomization is preserved. [14] The appropriate data for each intervention should be extracted, as should data from any subgroups to be examined in the analysis. [8] Data extraction plans should also describe the procedure that will be followed, including the number of researchers collecting the data, the process for resolving discrepancies, and the means of recording the data. Clear, objective criteria will help minimize subjectivity when coding data and provide as reliable a data set as possible to support the assumptions necessary for a network meta-analysis. [8, 15]


Quality Assessment / Risk of Bias
It is important to decide not only how the quality of the included studies will be assessed for risk of bias, but also how the findings of the assessment will be used in the subsequent analysis. For example, will sensitivity analyses be performed? In a network meta-analysis, the validity of indirect and mixed comparisons is underpinned by the assumption that the only important difference between included trials is in the treatments being compared. [12] Accurate assessment of the quality of the included studies is essential to give an indication of the reliability of such an assumption and of the internal validity of the analyses. [14]

Data Analysis
Advice on choosing the best approach to data analysis for a specific review (between frequentist and Bayesian approaches and the corresponding statistical packages, and between fixed and random effects), on appropriate statistics, and on incorporating moderators is given in subsequent chapters; a brief numerical illustration of the fixed- versus random-effects choice is sketched below, after the 'Scoping Searches' subsection. As with all statistical analysis, it can be difficult to be explicit about the full details in advance; however, it is important to consider all the relevant issues carefully and to be clear in the protocol about the issues that will be determined after the protocol stage. A priori decisions stated in the protocol may need to be modified, which is not a problem as long as any changes are clearly documented and justified. You should also account for any additional bias that might arise because of the stage of the review process at which the changes occur.

Scoping Searches
Development of the protocol is an iterative process that often involves undertaking scoping searches to help inform decisions. The search strategy used for scoping can be used in the protocol as an example of what will be modified for searching other databases.
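The following minimal sketch (ours, with entirely hypothetical data) illustrates the fixed- versus random-effects choice referred to above, using inverse-variance pooling and the DerSimonian-Laird estimate of between-study variance.

```python
# Fixed-effect vs DerSimonian-Laird random-effects pooling of
# hypothetical study-level log odds ratios.
import math

yi = [-0.35, -0.10, -0.55, 0.05]   # study log odds ratios (hypothetical)
vi = [0.04, 0.09, 0.06, 0.12]      # within-study variances (hypothetical)

w = [1 / v for v in vi]            # inverse-variance weights
fixed = sum(wi * y for wi, y in zip(w, yi)) / sum(w)

# DerSimonian-Laird estimate of the between-study variance tau^2
q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, yi))
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (len(yi) - 1)) / c)

w_re = [1 / (v + tau2) for v in vi]
pooled_re = sum(wi * y for wi, y in zip(w_re, yi)) / sum(w_re)

print(f"fixed-effect log OR = {fixed:.3f}")
print(f"random-effects log OR = {pooled_re:.3f} (tau^2 = {tau2:.3f})")
```

A protocol need not fix every such detail, but stating in advance which model will be the base case, and why, protects against choosing whichever model happens to give the preferred answer.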

Further Information

For more information about systematic review methods, free online access is available to Systematic Reviews: CRD's Guidance for Undertaking Reviews in Health Care and to the Cochrane Handbook for Systematic Reviews of Interventions. [16, 17]

THE NEED FOR PROTOCOL REGISTRATION

Systematic review methodologies have developed over the last thirty years and have increasingly highlighted shortcomings in the conduct and reporting of clinical trials. [18-20] This contributed to the introduction of tighter regulations for the conduct of trials, including mandatory registration, with journals requiring proof of registration. [21, 22] Concern then grew that the biases seen in trials were also appearing in the way reviews were carried out and were affecting the findings. This led to investigations into the extent and impact of outcome reporting bias and publication bias.

Research evidence for possible selective outcome reporting bias in systematic reviews initially came from an examination of 47 Cochrane reviews. [23] Comparison of protocols with their full publication showed that 43 of the 47 contained a major change, such as the addition or deletion of outcomes. However, poor reporting in the reviews meant it was not clear whether the changes reflected at least some


degree of bias or legitimate changes in the focus of outcomes, made as the review methods were developed, which had simply not been reported. For example, an outcome initially specified may not have been used in the included studies, resulting in the reviewers removing that outcome from their final review. Either way, this study highlighted important issues in published systematic reviews.

In 2007 a review of 300 published Cochrane and non-Cochrane systematic reviews found that the overall quality of reviews was disappointing. [24] However, there were clear differences in the quality of reporting between Cochrane and non-Cochrane systematic reviews. For example, details of quality assessment of the included studies were missing in about a third of the reviews, and only a quarter reported undertaking any analysis to look for publication bias. Most of the reporting failures were in the non-Cochrane reviews. While Cochrane reviews are required to have a protocol, only 11% of the non-Cochrane reviews examined mentioned having one. In their editorial on Moher's paper, the PLoS Medicine Editors said that the absence of a protocol "naturally leads to concerns about methodological rigor in the assessment of the study question." [24]

Evidence of selective outcome reporting biases and poor reporting of reviews prompted leaders in the field to compile and publish the PRISMA statement. [25, 26] The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) that evaluate health-care interventions include stating where the protocol may be accessed and, if possible, a registration number. At the time the PRISMA statement was published, the only existing access to systematic review protocols was limited to the outputs of individual organisations, such as the Cochrane and Campbell Collaborations and the Joanna Briggs Institute. [27-29] It was therefore reasonable to emphasise the importance of making planned methods publicly available, but not to make it a pre-requisite at that time.

In 2010 further research evidence of the dangers of outcome reporting bias was published by Kirkham et al. in their study of 288 new Cochrane reviews. [20] When the protocol and the published review were compared, 22% contained discrepancies in at least one outcome measure; 75% of these were in the primary outcome. Potential bias was found in nearly a third (8/28, 29%) of these reviews, with changes being made after seeing the results from individual trials. Only 4 (6%) of the 64 reviews with an outcome discrepancy described the reason for the change in the review. Most importantly, the study found that outcomes promoted from secondary in the protocol to primary in the review were more likely to be significant than if there was no discrepancy (relative risk 1.66, 95% CI 1.10 to 2.49; p = 0.02).

WHERE TO REGISTER

In response to the growing concerns about, and evidence of, potential bias in the conduct of systematic reviews, CRD took the initiative to develop a database for the prospective registration of systematic review protocols. PROSPERO was launched in February 2011 as the first centralised, comprehensive registry of systematic review protocols. [5] By capturing key information at the protocol stage, the register facilitates good practice in systematic reviews by providing a transparent public record of the planned review, including information about inclusion and exclusion criteria, outcomes and strategies for synthesis. This facilitates comparison of what was planned with what is published in the final review and means that


any discrepancies can be identified and considered when assessing the reliability of the findings. [23] It was anticipated that registration might also encourage full publication of the review's findings, including details and justification of any changes to the methods that could be perceived as introducing bias. Registration has the further advantage that, as the records are permanent, even if details of a final publication are never added, the named contact's details are available for users to pursue enquiries.

Some, but not all, clinical trials registers also accept registration of systematic review protocols. For example, "Systematic Review and Meta-Analysis of Psoriasis Treatments" is registered on www.ClinicalTrials.gov with the identifier NCT01425138. [30] The background information for this retrospectively registered review says the analysis included a mixed treatment comparison. The record provides no further information about the analysis and little information about the planned methods, as these are not part of the clinical trials dataset the registry is designed to collect.

There are significant advantages to registering on PROSPERO rather than on a trials register. PROSPERO was designed specifically for the registration of systematic review protocols. Both searching and registration on PROSPERO are free of charge. Review protocols from the Cochrane and Campbell Collaborations and from the Joanna Briggs Institute are included in PROSPERO, as are all those funded by the NIHR in the UK. As the only open access register of systematic review protocols, PROSPERO provides a single access point for identifying in-progress systematic reviews. As content grows, so does usage: in the year 2012/13 PROSPERO usage statistics showed that just under 1.2 million pages were viewed by almost 40,000 unique client internet providers. [31] As internet provider addresses can represent either a single user or a whole organisation (for example, the National Health Service in England), we know that this is a conservative estimate of the number of users. Examples of where potential duplication has been avoided can be found on the PROSPERO news page, demonstrating the value of having prospectively registered your work. [5]

NETWORK META-ANALYSES IN PROSPERO

Since the launch of PROSPERO in February 2011 there has been a growing number of registered systematic reviews that include a network meta-analysis. Figure 1 demonstrates the increasing popularity and use of this type of analysis. Roughly two thirds of these submissions include plans for a pair-wise meta-analysis as well as a network meta-analysis. For example, Jia and Leung in their protocol plan to undertake a 'simple meta-analysis' using a random effects model, a linear mixed-effects network meta-analysis and a Bayesian network meta-analysis. [32]


Figure 1. Percentage of all registrations on PROSPERO that include a network meta-analysis.

Many registrations include details of tests for heterogeneity and the subgroups on which the authors intend to perform separate analyses. There are some good examples of careful design and planning, with justification of the proposed analyses that includes an explanation of the methods and why they are appropriate to the planned review. Not only does this help in understanding the information the review will ultimately provide, thereby aiding decisions about undertaking a similar, overlapping review; it is also a useful learning tool. Bear in mind, however, that protocols registered on PROSPERO are not peer reviewed, simply checked to ensure they meet the inclusion criteria. Making detailed planned methods publicly available via PROSPERO means the permanent record can be referred to when writing up the final report, saving word space within a journal article. Finally, quoting the unique identification number will meet the growing requirement by journals for protocol registration.

THE PROSPERO REGISTRATION PROCESS

The following is a guide to the registration process, largely based on information provided in the help pages on the PROSPERO website. It is included here by kind permission of the NIHR Centre for Reviews and Dissemination, University of York, UK.

Registration on PROSPERO involves the prospective submission and publication of key information about the design and conduct of a systematic review. Registration is free of charge. Registrants are responsible for the information entered in the registration form and, by submitting it, agree to be accountable for the accuracy and timeliness of the record and its content. The person submitting the completed form, known as the Named contact, is also


expected to keep the record up to date, including provision of a citation and, where possible, a link to the report of the review when it is completed.

PROSPERO initially focussed on systematic reviews of the effects of interventions and strategies to prevent, diagnose, treat, and monitor health conditions, for which there is a health-related outcome. The fields in the template were agreed through an international consultation and are formulated for reviews of the effects of interventions. [33, 34] Details of the dataset can be found in Table 1. The scope is gradually expanding to include details of all ongoing systematic reviews that have a health-related outcome in the broadest sense (for example, reviews of risk factors and genetic associations). Reviews of methodological issues need to contain at least one outcome of direct patient or clinical relevance in order to be included in PROSPERO. Literature reviews, and systematic reviews simply looking at the reporting of and/or use of outcomes in research, are not included. Systematic reviews of reviews are accepted for registration as long as they meet all the standard PROSPERO eligibility criteria, but scoping reviews and reviews of animal studies are not eligible. New Cochrane protocols are automatically uploaded from the Cochrane Library, so to avoid duplication of records Cochrane protocols should not be registered independently on PROSPERO.

In order to achieve the key aim of promoting transparency and minimising bias, PROSPERO only accepts prospective registrations. Ideally, registration forms should be submitted before screening against eligibility criteria commences, and this is likely to become the cut-off point for acceptance in the future. In the meantime, reviews are being accepted as long as they have not progressed beyond the completion of data extraction. Registrants are asked to indicate the stage of the review at initial submission and whenever the record is updated or amended. Submissions must be in English for practical reasons, but search strategies and protocols attached to a record may be in any language. The register administrators are happy to advise anyone who is in doubt about the eligibility of a review ([email protected]).

To access a registration form, first obtain a username and password by following the 'Join' link, then use these to 'Sign in'. This allows access to 'Register a review', which opens a page that encourages you to check that your review will meet the inclusion criteria. If you are sure it does, click on 'Register a review'. This opens a four-page electronic registration form which has 22 required fields and 18 optional fields, as illustrated in Figure 2. 'Required' fields, marked with a red asterisk, must be completed before the Submit button can be accessed. You may save and exit the form at any time, and return at a later date to add or edit information by signing in and going to 'My PROSPERO records'. You can also update your personal information and password at 'My details'. Each page of the form has a 'Save' button; changes are automatically saved when a field is exited, but the Save button can be used at any time. Each page also has a 'Validate this page' button, which will highlight any 'Required' fields that still need to be completed. Forms can be printed, or saved as a portable document format (PDF) file or as a word-processing document, using the relevant buttons. These functions enable sharing and collaboration on development of the submission.

Fields can be completed by cutting and pasting information from a prepared protocol. Alternatively, the PROSPERO form can be used as a template for developing the review protocol. Records need to be fully searchable, so information needs to be supplied in the specified fields; it is not sufficient to attach a link to a protocol in a file or publication.


Table 1. PROSPERO dataset and summary guidance

Review title and timescale

Review title*: Give the working title of the review. This must be in English. Ideally it should state succinctly the interventions or exposures being reviewed and the associated health or social problem being addressed in the review.
Original language title: For reviews in languages other than English, this field should be used to enter the title in the language of the review. This will be displayed together with the English language title.
Anticipated or actual start date*: Give the date when the systematic review commenced, or is expected to commence.
Anticipated completion date*: Give the date by which the review is expected to be completed.
Stage of review at time of this submission*: Indicate the stage of progress of the review by ticking the relevant boxes. Reviews that have progressed beyond the point of completing data extraction at the time of initial registration are not eligible for inclusion in PROSPERO. This field should be updated when any amendments are made to a published record.

Review team details

Named contact*: The named contact acts as the guarantor for the accuracy of the information presented in the register record.
Named contact email*: Enter the electronic mail address of the named contact.
Named contact address: Enter the full postal address for the named contact.
Named contact phone number: Enter the telephone number for the named contact, including international dialing code.
Organisational affiliation of the review*: Full title of the organisational affiliations for this review, and website address if available. This field may be completed as 'None' if the review is not affiliated to any organisation.
Review team members and their organisational affiliations: Give the title, first name and last name of all members of the team working directly on the review. Give the organisational affiliations of each member of the review team.
Funding sources/sponsors*: Give details of the individuals, organizations, groups or other legal entities who take responsibility for initiating, managing, sponsoring and/or financing the review. Any unique identification numbers assigned to the review by the individuals or bodies listed should be included.
Conflicts of interest*: List any conditions that could lead to actual or perceived undue influence on judgements concerning the main topic investigated in the review.
Collaborators: Give the name and affiliation of any individuals or organisations who are working on the review but who are not listed as review team members.

Review methods

Review question(s)*: State the question(s) to be addressed / review objectives. Please complete a separate box for each question.
Searches*: Give details of the sources to be searched, and any restrictions (e.g. language or publication period). The full search strategy is not required, but may be supplied as a link or attachment.
URL to search strategy: If you have one, give the link to your search strategy here. Alternatively, upload your search strategy to CRD in pdf format. Please note that by doing so you are consenting to the file being made publicly accessible.
Condition or domain being studied*: Give a short description of the disease, condition or healthcare domain being studied. This could include health and wellbeing outcomes.
Participants/population*: Give summary criteria for the participants or populations being studied by the review. The preferred format includes details of both inclusion and exclusion criteria.
Intervention(s), exposure(s)*: Give full and clear descriptions of the nature of the interventions or the exposures to be reviewed. The preferred format includes details of both inclusion and exclusion criteria.
Comparator(s)/control*: Where relevant, give details of the alternatives against which the main subject/topic of the review will be compared (e.g. another intervention or a non-exposed control group). The preferred format includes details of both inclusion and exclusion criteria.
Types of study to be included initially*: Give details of the study designs to be included in the review. If there are no restrictions on the types of study design eligible for inclusion, this should be stated. The preferred format includes details of both inclusion and exclusion criteria.
Context: Give summary details of the setting and other relevant characteristics which help define the inclusion or exclusion criteria.
Primary outcome(s)*: Give the most important outcomes. Give information on timing and effect measures, as appropriate.
Secondary outcomes*: List any additional outcomes that will be addressed. If there are no secondary outcomes enter None. Give information on timing and effect measures, as appropriate.
Data extraction (selection and coding): Give the procedure for selecting studies for the review and extracting data, including the number of researchers involved and how discrepancies will be resolved. List the data to be extracted.
Risk of bias (quality) assessment*: State whether and how risk of bias will be assessed, how the quality of individual studies will be assessed, and whether and how this will influence the planned synthesis.
Strategy for data synthesis*: Give the planned general approach to be used, for example whether the data to be used will be aggregate or at the level of individual participants, and whether a quantitative or narrative (descriptive) synthesis is planned. Where appropriate a brief outline of the analytic approach should be given.
Analysis of subgroups or subsets*: Give any planned exploration of subgroups or subsets within the review. 'None planned' is a valid response if no subgroup analyses are planned.

General information

Type of review: Select the type of review from the drop down list. To select more than one hold down the Control key and click on your selections.
Language: Select the language(s) in which the review is being written and will be made available, from the drop down list. Use the control key to select more than one language.
Country: Select the country in which the review is being carried out from the drop down list. For multi-national collaborations select all the countries involved. Use the control key to select more than one country.
Other registration details: List places where the systematic review is registered (such as with The Campbell Collaboration, or The Joanna Briggs Institute). The name of the organisation and any unique identification number assigned to the review by that organization should be included.
Reference and/or URL for published protocol: Give the citation and link for the published protocol, if there is one. Alternatively, upload your published protocol to CRD in pdf format. Please note that by doing so you are consenting to the file being made publicly accessible.
Dissemination plans: Give brief details of plans for communicating essential messages from the review to the appropriate audiences.
Keywords: Give words or phrases that best describe the review. (One word per box, create a new box for each term.)
Details of any existing review of the same topic by the same authors: Details of earlier versions of the systematic review if an update of an existing review is being registered, including full bibliographic reference if possible.
Current review status*: Select from the drop down list to indicate the current status of the review. Review status should be updated when the review is completed and when it is published.
Any other information: Provide any further information the review team consider relevant to the registration of the review.
Details of final report/publication(s): This field should be left empty until details of the completed review are available. Give the full citation for the final report or publication of the systematic review. Give the URL where available.

* Indicates a required field.
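The registration form itself is completed on the web, but when drafting a submission offline it can help to mirror the dataset in a simple structure. The sketch below is purely illustrative (ours, not an official PROSPERO tool); the field names are shorthand for a small subset of the 22 required items in Table 1.

```python
# Hypothetical offline drafting aid for a PROSPERO submission:
# collect required fields in a data structure and check completeness
# before pasting the text into the web form.
from dataclasses import dataclass, fields

@dataclass
class ProsperoDraft:
    # Shorthand names for a subset of the required (*) fields in Table 1
    review_title: str = ""
    anticipated_start_date: str = ""
    anticipated_completion_date: str = ""
    named_contact: str = ""
    named_contact_email: str = ""
    review_questions: str = ""
    searches: str = ""
    participants_population: str = ""
    interventions_exposures: str = ""
    comparators_control: str = ""
    primary_outcomes: str = ""
    risk_of_bias_assessment: str = ""
    strategy_for_data_synthesis: str = ""

    def missing(self):
        """Return the names of required fields still left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

draft = ProsperoDraft(review_title="Working title of the review")
print("Still to complete:", draft.missing())
```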


Figure 2. First page of the web form used to register review protocols in PROSPERO.


Brief guidance about the information required in each field can be accessed by clicking on the relevant icon in the form. More detailed guidance on field requirements, with examples, is available on the site. [35] When all the required fields have been completed, the 'Submit' button becomes active and the form can be sent to the PROSPERO administrators. Access to your record is suspended during the administrative process. Receipt of submissions is acknowledged in an automated email sent to the named contact. Application forms are checked against the inclusion criteria for PROSPERO and for clarity of content. They are then approved and published on the register, returned to you for clarification, or rejected. Submissions are turned around within five working days of receipt.

Once published on the register, the record becomes accessible again in 'My records', which allows amendments and updates. On submitting changes you will be asked to give brief details of the changes made. The information entered here will appear in the public record and should inform users of the register of the nature of the changes (for example: removed one of the outcome measures; changed the anticipated completion date as data extraction is taking longer than anticipated). All submitted edits and changes to a PROSPERO record are recorded, dated and made available within the public record audit trail. The most recent version is displayed, and previous versions are accessible from dated archive links in the record, together with the revision notes. Records remain permanently on PROSPERO.

Once the review is completed, the status should be updated in the record and the anticipated publication date given. Once available, the bibliographic reference and electronic links to final publications should be added to the record. In the absence of a publication, details of the availability of the review's unpublished results, or the reasons for the termination of the review, should be documented. The named contact will receive reminder emails on the anticipated completion and anticipated publication dates, with detailed instructions on what to do. When a review registered on PROSPERO subsequently meets the criteria for inclusion in DARE, a link to the PROSPERO record is added to the DARE record.

Finally, updates of a completed review that has already been registered on PROSPERO should be added to the existing record by selecting the 'Update of a review' status option. This ensures that the history and previous versions are all linked and available in the same record, and the unique identification number links all records.

CONCLUSION

The value of network meta-analysis in comparing treatment options and informing clinical and policy decision making is increasingly recognized, and the number of planned and published network meta-analyses is growing. As more health and social care decisions come to rest on network meta-analysis results, it becomes ever more imperative that the methods used are sound, transparent and reproducible. The value of the time spent designing, planning and registering a systematic review and its associated network meta-analysis should never be underestimated. Good design and a well thought through protocol are the basis of good research.


CONFLICT OF INTEREST OR FUNDING DISCLOSURE

Alison Booth is a Research Fellow and Dawn Craig a Health Economist at the NIHR Centre for Reviews and Dissemination, which produces the CRD databases and PROSPERO. We have no other conflicts of interest.

REFERENCES

[1] Ioannidis J. P., Greenland S., Hlatky M. A., Khoury M. J., Macleod M. R., Moher D., Schulz K. F., Tibshirani R. Increasing Value and Reducing Waste in Research Design, Conduct, and Analysis. Lancet, 2014; 383: 166-75.
[2] Clarke M., Hopewell S., Chalmers I. Clinical Trials Should Begin and End with Systematic Reviews of Relevant Evidence: 12 Years and Waiting. Lancet, 2010; 376: 20-1.
[3] The Cochrane Collaboration. The Cochrane Database of Systematic Reviews. Chichester: John Wiley & Sons Ltd; 2014.
[4] The Centre for Reviews and Dissemination Databases. York: Centre for Reviews and Dissemination, University of York; 2014. Available from: http://www.crd.york.ac.uk/crdweb/ (last accessed on February 9, 2014).
[5] PROSPERO: International Prospective Register of Systematic Reviews [Internet]. Centre for Reviews and Dissemination. Available from: http://www.crd.york.ac.uk/PROSPERO/ (last accessed on February 9, 2014).
[6] Booth A., Wright K., Outhwaite H. Centre for Reviews and Dissemination Databases: Value, Content, and Developments. Int. J. Technol. Assess. Health Care, 2010; 26: 470-2.
[7] Tugwell P., Knottnerus A., Idzerda L. Updating Systematic Reviews – When and How? J. Clin. Epidemiol., 2011; 64: 933-5.
[8] Greco T., Landoni G., Biondi-Zoccai G., D'Ascenzo F., Zangrillo A. A Bayesian Network Meta-Analysis for Binary Outcome: How to Do It. Stat. Methods Med. Res., 2013 Oct. 28 [Epub ahead of print].
[9] Tricco A. C., Soobiah C., Lillie E., Perrier L., Straus S. E., Chen M., Hemmelgarn B., Majumdar S. Efficacy of Cognitive Enhancers for Alzheimer's Disease: Protocol for a Systematic Review and Network Meta-Analysis. PROSPERO [Internet]. 2012: CRD42012001948. Available from: http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42012001948 (last accessed on February 9, 2014).
[10] Hutton B., Moher D., Knoll G., Fergusson D., Strauss S., Tricco A., Yazdi F., Tetzlaff J., Hersi M. Is There an Increased Risk of Post-Transplant Lymphoproliferative Disorder and Other Cancers with Specific Modern Immunosuppression Regimens in Renal Transplantation? Protocol for a Network Meta-Analysis of Randomized and Observational Studies. PROSPERO [Internet]. 2013: CRD42013006951. Available from: http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42013006951 (last accessed on February 9, 2014).
[11] Grant E. S., Calderbank-Batista T. Network Meta-Analysis for Complex Social Interventions: Problems and Potential. Journal of the Society for Social Work and Research, 2013; 4: 406-20.


[12] Cipriani A., Higgins J. P. T., Geddes J. R., Salanti G. Conceptual and Technical Challenges in Network Meta-Analysis. Ann. Intern. Med., 2013; 159: 130-7.
[13] Glasziou P., Altman D. G., Bossuyt P., Boutron I., Clarke M., Julious S., Michie S., Moher D., Wager E. Reducing Waste from Incomplete or Unusable Reports of Biomedical Research. Lancet, 2014; 383: 267-76.
[14] Jansen J. P., Fleurence R., Devine B., Itzler R., Barrett A., Hawkins N., Lee K., Boersma C., Annemans L., Cappelleri J. C. Interpreting Indirect Treatment Comparisons and Network Meta-Analysis for Health-Care Decision Making: Report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: Part 1. Value Health, 2011; 14: 417-28.
[15] Song F., Loke Y. K., Walsh T., Glenny A. M., Eastwood A. J., Altman D. G. Methodological Problems in the Use of Indirect Comparisons for Evaluating Healthcare Interventions: Survey of Published Systematic Reviews. BMJ, 2009; 338: b1147.
[16] Centre for Reviews and Dissemination. Systematic Reviews: CRD's Guidance for Undertaking Reviews in Health Care. York: University of York; 2009.
[17] Higgins J. P. T., Green S., editors. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0 [Updated March 2011]: The Cochrane Collaboration; 2011.
[18] Dwan K., Altman D. G., Arnaiz J. A., Bloom J., Chan A.-W., Cronin E., Decullier E., Easterbrook P. J., Von Elm E., Gamble C., Ghersi D., Ioannidis J. P. A., Simes J., Williamson P. R. Systematic Review of the Empirical Evidence of Study Publication Bias and Outcome Reporting Bias. PLoS ONE, 2008; 3: e3081.
[19] Song F., Eastwood A. J., Gilbody S., Duley L., Sutton A. J. Publication and Related Biases. Health Technol. Assess., 2000; 4: 1-115.
[20] Kirkham J. J., Dwan K. M., Altman D. G., Gamble C., Dodd S., Smyth R., Williamson P. R. The Impact of Outcome Reporting Bias in Randomised Controlled Trials on a Cohort of Systematic Reviews. BMJ, 2010; 340: c365.
[21] De Angelis C., Drazen J., Frizelle F., Haug C., Hoey J., Horton R., Kotzin S., Laine C., Marusic A., Overbeke J., Schroeder T., Sox H., Van Der Weyden M. Clinical Trial Registration: A Statement from the International Committee of Medical Journal Editors. CMAJ, 2004; 171: 606-7.
[22] World Medical Association. WMA Declaration of Helsinki – Ethical Principles for Medical Research Involving Human Subjects: World Medical Association; 2013. Available from: http://www.wma.net/en/30publications/10policies/b3/index.html (last accessed on February 9, 2014).
[23] Silagy C. A., Middleton P., Hopewell S. Publishing Protocols of Systematic Reviews: Comparing What Was Done to What Was Planned. JAMA, 2002; 287: 2831-4.
[24] Moher D., Tetzlaff J., Tricco A. C., Sampson M., Altman D. G. Epidemiology and Reporting Characteristics of Systematic Reviews. PLoS Med., 2007; 4: e78.
[25] Liberati A., Altman D., Tetzlaff J., Mulrow C., Gotzsche P., Ioannidis J., Clarke M., Devereaux P., Kleijnen J., Moher D. The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration. PLoS Med., 2009; 6: e1000100.
[26] Moher D., Liberati A., Tetzlaff J., Altman D., The PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med., 2009; 6: e1000097.

Complimentary Contributor Copy

62

Alison Booth and Dawn Craig

[27] The Joanna Briggs Institute [Internet] [24 Jan 2014]. Available from: http:// joannabriggs.org/ (last accessed on February 9, 2014). [28] The Cochrane Collaboration [Internet] [24 Jan 2014]. Available from: www.cochrane.org (last accessed on February 9, 2014). [29] The Campbell Collaboration [Internet] [24 Jan 2014]. Available from: http:// www.campbellcollaboration.org/lib/ (last accessed on February 9, 2014). [30] Pfizer. Systemaitc Review and Meta-Analysis of Psoriasis Treatments. ClinicalTrialsgov [Internet]. Accessed 24 Jan 2013. Available from: http:// clinicaltrials.gov/ct2/show/NCT01425138 (last accessed on February 9, 2014). [31] Booth A., Clarke M., Dooley G., Ghersi D., Moher D., Petticrew M., Stewart L. PROSPERO at One Year: An Evaluation of Its Utility. Syst. Rev., 2013; 2: 4. [32] Jai Y., Leung S. Network Meta-Analysis of Randomized Controlled Trials on the Monotherapy of Angina Pectoris in China: A Protocol. 2014; CRD42014007035. Available from: http://www.crd.york.ac.uk/PROSPERO? ID=CRD42014007035 (last accessed on February 9, 2014). [33] Booth A., Clarke M., Dooley G., Ghersi D., Moher D., Petticrew M., Stewart L. The Nuts and Bolts of PROSPERO: An International Prospective Register of Systematic Reviews. Syst. Rev., 2012; 1: 2. [34] Booth A., Clarke M., Ghersi D., Moher D., Petticrew M., Stewart L. Establishing a Minimum Dataset for Prospective Registration of Systematic Reviews: An International Consultation. PLoS One, 2011; 6: e27319. [35] Centre for Reviews and Dissemination. Guidance Notes for Registering a Systematic Review Protocol with PROSPERO. 2013 [24 Jan 2014]. Available from: www.crd.york.ac.uk/PROSPERO/documents/Registering%20a%20review%20on%20P ROSPERO%202%20Sept%202013.pdf (last accessed on February 9, 2014).

Complimentary Contributor Copy

In: Network Meta-Analysis Editor: Giuseppe Biondi-Zoccai

ISBN: 978-1-63321-001-1 © 2014 Nova Science Publishers, Inc.

Chapter 4

SEARCHING FOR EVIDENCE Su Golder, Ph.D.1, and Kath Wright1,† 1

Information Service, Centre for Reviews and Dissemination, University of York, York, UK

ABSTRACT Any systematic review is only as good as the evidence it is based on. Indeed, the very term systematic underlines the fact that the review in question is based on an explicit and formal search of the evidence relevant to the topic being investigated. This holds true for pairwise meta-analysis, where a thorough search aims to increase precision and external validity, while minimizing the risk of small study effect, publication bias, and selective reporting. Network meta-analyses represent a paradigm shift in comparison to traditional systematic reviews and pairwise meta-analyses, yet searching evidence is even more important in this context, and more complex. Its importance relates to how even minor selective reporting or similar biases may create spurious and false positive findings when dozens of direct and indirect studies are pooled together. For a network metaanalysis a valid search needs to encompass all the potential treatment combinations and outcomes of interest. The present chapter provides an updated and comprehensive overview on how to best search the scholarly literature and ancillary evidence sources when performing a network meta-analysis.

Keywords: Evidence, network meta-analysis, search, search strategy

INTRODUCTION This chapter describes methods used in searching for evidence to inform systematic reviews and subsequent network meta-analyses. We also explore the published evidence base

 †

Phone: +44(0)1904 321055. Fax: +44(0)1904 321041. E-mail: [email protected]. E-mail: [email protected].

Complimentary Contributor Copy

64

Su Golder and Kath Wright

that informs the search approaches used when identifying both direct and indirect evidence. To date most of the evidence base related to methodological issues of network meta-analyses has focused on statistical approaches [1], with only limited research on how to identify the evidence for network meta-analyses. Thus, in this chapter expert opinion and experience is cited in areas where evidence is currently lacking on the search techniques that should be adopted. A summary of the key stages of searching for the evidence for network metaanalyses is contained in Table 1.

Searching for other Network Meta-Analyses A thorough search should be conducted before beginning any research. Unplanned duplication of existing research should be avoided. The first stage, therefore, for any systematic review or network meta-analysis is a thorough and rigorous search for previous systematic reviews or network meta-analyses to ensure the analysis has not been carried out previously. Existing reviews which have identified all of the relevant evidence, but not conducted a network meta-analysis may also provide an excellent framework on which to build. The most comprehensive source of published systematic reviews of healthcare interventions is the Database of Abstracts of Reviews of Effects (DARE) produced on behalf of the UK‘s National Institute for Health Research (NIHR) by the Centre for Reviews and Dissemination (CRD). The database is freely available from the Centre‘s website http:// www.crd.york.ac.uk/CRDWeb/. Candidate systematic reviews are identified for DARE by conducting regular extensive searches of MEDLINE/PubMed, EMBASE, PsycINFO and CINAHL as well as handsearching of selected journals and scanning key websites. New records are added to the database on a weekly basis to ensure that content is as up to date as possible. In most instances, carrying out a broad, sensitive search on DARE alone should be sufficient to identify published systematic reviews containing network meta-analyses. In addition to searching for published systematic reviews, a search of the PROSPERO database (International prospective register of systematic reviews) will help establish whether there are any ongoing or unpublished systematic reviews or network meta-analyses that address your research question. Table 1. Key steps in searching for the evidence in network meta-analyses Step 1: Search for previously published and unpublished or ongoing network meta-analyses and systematic reviews Step 2: Identify sources to be searched and preliminary search terms or strings for inclusion in a protocol Step 3: Develop search strategies using PICOs, and combine search terms using Boolean operators (AND, OR and NOT) Step 4: Search multiple sources Step 5: Download records into reference management software or equivalent Step 6: Screen records Step 7: Identify further indirect comparators for inclusion and repeat steps 3 to 6. Step 8: Report search methods, sources and results.

Complimentary Contributor Copy

Searching for Evidence

65

PROSPERO is an international database of systematic review protocols in health and social care; it is described more fully in the previous chapter. As with DARE, PROSPERO is freely available to search on the Centre for Reviews and Dissemination (CRD) website http://www.crd.york.ac.uk/PROSPERO/. Once it has been established that the research question is previously unanswered and that no published or unpublished network meta-analyses are available or about to become available, the protocol can be written. Protocol development is detailed in the previous chapter. A preliminary search strategy specifying the databases and additional sources to be searched and the likely search terms to be used should be included in the protocol. It is likely that further searches to those specified in the protocol and adaptations to the searches suggested may be needed because network meta-analyses can evolve during the research process. This is perfectly acceptable so long as the additions or adaptations can be justified and appropriate amendments are made to the protocol.

Searching for Individual Studies for a Network Meta-Analysis The same principles of good practice that apply when conducting traditional evidence synthesis apply to this new emerging method of synthesis. Searching for the evidence is a key component to any review and synthesis. Rigorous and extensive literature search can help minimise the potential for bias. With a bigger more complex picture, bias in any part of the network meta-analyses can distort the findings. The validity of the findings, therefore, is dependent upon all eligible trials being identified and included in the analysis [1], and restricting to non-random or selective subset of eligible trials in a network meta-analysis may introduce selection bias in the treatment effect estimates[1]. More comprehensive searches contribute to a more complete picture of the network and produce more precise estimates of relative effects [2]. Searching for information to inform systematic reviews and network meta-analyses has many similarities to searching for systematic reviews which use more traditional pair-wise, direct comparisons only. For instance, the search approach used should emphasise sensitivity and seek to identify as many relevant published and unpublished studies as possible (specific to the question and within the allocated resources). Reporting of search strategies should always be transparent and reproducible so that readers can validate the quality of the searches and to enable updates where required. Where possible the input of an information scientist or librarian should be sought to develop and conduct the searches. Differences in searching for evidence for systematic reviews when planning network meta-analyses need to be considered. The main difference is the additional time and resources necessary to conduct these broader searches as there will be more interventions and comparators to assess [3]. High-quality network analysis may also include evidence on both direct and indirect treatment comparisons [4]. The search process has two main elements: defining and capturing the search question in a structured search strategy (set of search terms) and applying the search strategy to a range of resources (such as database and non-database sources).

Complimentary Contributor Copy

66

Su Golder and Kath Wright

DEFINING AND CAPTURING THE SEARCH QUESTION The search question should be predefined in the protocol; the scope of the question will define the subsequent searches undertaken. The search question can be derived directly from the main topic of the review and translated in an appropriate search format. Defining the search question is more complex for network meta-analyses than in typical search strategies. In a standard pair-wise meta-analysis or traditional systematic review, searches are likely to be conducted at one point in time, usually towards the beginning of the review process and sometimes repeated as an update search towards the end of the review process. The search process in a network meta-analysis may begin by restricting to direct evidence only and then broadening if little or no direct evidence is identified. The intention to approach the analysis in this manner should be clearly outlined in the protocol. Rather than one large systematic search of all the selected databases, a series of search iterations or search stages may be necessary [2]. This may be particularly true when new indirect comparators emerge during the research process. The International Society for Pharmacoeconomics and Outcomes Research (ISPOR) document provides an excellent summary of this approach in its checklist of good research practices (Table 2). In a network meta-analysis the treatments for comparison can be unknown a priori so a series of iterative searches or search stages may be needed. Searches in a network metaanalysis should be as comprehensive as possible so that all relevant studies are identified. The search process may need to be fully factorial taking into consideration all of the different interventions and their combinations (for example, searches for A and B, B and C, B and D, A and C, A and D, C and D and so on). Search query formats for network and standard pair-wise meta-analyses are best guided by a well-formulated, clearly defined, and answerable research question. Breaking down the question into component parts or ‗building blocks‘ and combining concepts using Boolean logic operators (AND, OR and NOT) is a well-recognised and commonly used search approach. Other commands that may be used, depending on the individual database searched, include proximity operators such as NEXT, NEAR or ADJ (adjacent), truncation or stemming using symbols such as *, $ or wildcards such as ? which represent one or no characters. The PICOs framework is commonly used to help focus and structure the search by dividing the research question into populations, interventions, comparators, outcomes and study designs. Table 2. International Society for Pharmacoeconomics and Outcomes Research (ISPOR) checklist of good practices for conducting and reporting network meta-analyses [5]

Search strategies

Follow conventional guidelines for systematic literature searches; be explicit about search terms, literature, and time frames; avoid use of ad hoc data Consider iterative search methods to identify higher-order indirect comparisons that do not come up in the initial search focusing on lower-order indirect comparisons

Complimentary Contributor Copy

Searching for Evidence

67

The choice of PICOs concepts in the search is dependent on the search question, the size of the literature and the database searched. A PICOs format may be relatively easy to assemble for direct treatment comparisons where the comparators are predefined. But it may be more difficult to search in this way for indirect treatment comparisons, particularly at the beginning of a network meta-analysis when the set of relevant comparators may be unknown. Evidence for network meta-analyses is mainly derived from clinical trials or randomised controlled trials (RCTs) [3], so the building block for study designs in the PICOs framework is often known in advance. Searching for trials is well researched and there are many relevant search filters (predefined sets of search terms) to choose from that can make the search process easier (see section below on search filters). Limiting by study design may be more complicated in network meta-analyses that evaluate harms associated with interventions. There are limited data on harms or adverse effects in RCTs and this is particularly the case with long-term, rare or unexpected harms. The most appropriate methods are currently unknown and methodological research is required to address how to include harms in network meta-analyses. Developing the database search strategies can take place once the research question has been refined within the PICOs framework. Search strategies should contain both thesaurus terms and ―free text‖ terms for maximum sensitivity. MEDLINE records are assigned index terms from the Medical Subject Headings (MeSH) thesaurus (a controlled hierarchical set of keywords) while Embase has its own set of thesaurus terms called Emtree. Some databases (such as Science Citation Index) use keywords; other databases have no controlled vocabulary at all. Free text terms used to search the title and abstract fields should include synonyms, spelling variants, and abbreviations. The search will need to be adapted to suit the requirements of the search interface for each database selected. An example of a search strategy for MEDLINE is contained in Table 3. Evidence suggests that all search strategies should be peer reviewed before use [6]. This process can be quick and has potential for enormous benefit where errors are corrected and additional key terms are identified. An alternative approach to developing and running searches is to use the search results from good quality up-to-date systematic reviews with comprehensive searches of relevant pair-wise treatment comparisons [1]. As previously stated preliminary searches conducted for the research protocol can be used to identify existing systematic reviews that contain data to inform parts of the network meta-analysis. If this technique is used, any systematic review identified should be quality assessed before inclusion, particularly in relation to the searches used, to ensure that it contains appropriate up-to-date searches relevant to the question. If the searches are poor or outdated, further searching may be required to augment the existing evidence-base. If other aspects of review conduct require remediation, the review can simply serve to identify relevant studies which can be subject to further, critical appraisal or data extraction to rectify any deficiencies in the original review(s). 
Particular care should be taken to identify multi-arm trials in existing reviews as the original review may only have been interested in one comparison, or may have analysed different comparisons separately (frequently comparing active comparators to placebo but not each other). Trials by the same author in different meta-analyses within the review should be scrutinised to ensure that they pertain only to different outcomes or subgroups, not to different treatment arms.

Complimentary Contributor Copy

68

Su Golder and Kath Wright Table 3. An example search strategy Strategy designed to identify randomised controlled trials investigating the use of nebulisers and inhalers by children and young people with asthma. 1. exp Asthma/ (102785) 2. asthma$.ti,ab. (110620) 3. 1 or 2 (129186) line 3 combines all the search terms for asthma 4. exp "Nebulizers and Vaporizers"/ (8132) 5. (nebuliser$ or inhaler$ or vaporiser$ or vapouriser$ or vapourizer$).ti,ab. (6354) 6. 4 or 5 (11171) line 6 combines all the search terms for nebulisers and inhalers 7. exp Child/ (1501702) 8. (child$ or infant$ or boy or boys or girl or girls or teenager$ or youth or young person$ or young people).ti,ab. (1222780) 9. 7 or 8 (1992361) line 9 combines the search terms for children 10. 3 and 6 and 9 (1858) line 10 combines the search terms for asthma, nebulisers and children 11. randomized controlled trial.pt. (358644) 12. controlled clinical trial.pt. (86849) 13. randomized.ab. (259903) 14. placebo.ab. (140993) 15. drug therapy.fs. (1648210) 16. randomly.ab. (185772) 17. trial.ab. (267479) 18. groups.ab. (1200237) 19. 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 (3085266) 20. exp animals/ not humans.sh. (3863199) 21. 19 not 20 (2622734) lines 11 to 21 are the Cochrane Highly Sensitive Search Strategy for identifying randomized trials in MEDLINE: sensitivity-maximizing version (2008 revision);[13] Ovid format 22. 10 and 21 (1623) line 22 identifies records for RCTs about asthma, nebulisers and children 23. limit 22 to yr="1990 - 2013" (1449) line 23 restricts the search results to records published in the time period The strategy is designed for the OvidSP MEDLINE interface: $ truncation symbol. So ―child$‖ retrieves child, children and childhood .ti,ab. restricts the search to title and abstract fields exp indicates that the MeSH terms has been ―exploded‖ to retrieve records tagged with those MeSH terms and other subsidiary terms The number in brackets indicates the number of records identified by each search statement

Complimentary Contributor Copy

Searching for Evidence

69

WHICH RESOURCES TO SEARCH The sources used to identify trial data are common to network meta-analyses and single meta-analyses. A thorough search should include multiple sources including published literature, conference abstracts and other sources of grey literature, clinical trial registries, internal company reports, reviews of trials by regulatory agencies, and requests for data from trial investigators. A list of potentially useful sources is contained in Table 4.

Generic Bibliographic Databases Randomised controlled trials (RCTs) can be identified by using traditional bibliographic databases as well as clinical trials registers [7]. The main databases available to health services researchers are MEDLINE/PubMed and Embase. Such generic healthcare databases are the databases most commonly used to identify studies. MEDLINE is produced by the U.S. National Library of Medicine and contains more than 19 million references to journal articles. It covers the topics of biomedicine and health, including life sciences, public health and health policy. Most of the records come from journals published in English and about half of these are published in the USA. One key feature of MEDLINE is that records are indexed with MeSH designed to promote consistent record retrieval. The database can be accessed through commercial companies or free access is available via PubMED http://www.ncbi.nlm.nih.gov/pubmed/. Embase is a subscription database produced by Elsevier and has a greater focus on drugs and pharmacology. Embase includes all MEDLINE records and provides coverage of a further 2,500 journal titles that are not indexed by MEDLINE. Since 2009 Embase has included conference abstracts and can be a useful source of grey literature. Records in Embase are indexed using a thesaurus called Emtree, a hierarchically structured, controlled vocabulary for biomedicine that includes terms for drugs, chemicals, medical devices and medical procedures.

Search Filters in MEDLINE and EMBASE In both databases, search strategies can be constructed that combine terms for a clinical condition, procedure or drug with randomized controlled trial as thesaurus term in Embase or randomized controlled trial as publication type in MEDLINE. This will produce a list of potentially relevant records. An alternative to using the study methodologies or publication types assigned by the database producers is to make use of a search filter that has been designed to retrieve randomized controlled trials. A search filter is a collection of search terms designed to retrieve records of research that use a specific study design. If a search filter has been designed to maximise sensitivity the search results are more likely to be more comprehensive than a search carried out using the thesaurus term or publication type alone. RCT search filters are available from: • •

InterTASC Information Specialists' Sub-Group Search Filter Resource https://sites.google.com/a/york.ac.uk/issg-search-filters-resource/home, and McMaster University‘s Health Information Research Unit http://hiru.mcmaster.ca/ hiru/HIRU_Hedges_home.aspx.

Complimentary Contributor Copy

70

Su Golder and Kath Wright Table 4. List of potentially useful resources Bibliographic databases

Generic Embase MEDLINE PubMed Cochrane Central Register of Controlled Trials (CENTRAL) Science Citation Index (SCI) BIOSIS Previews PASCAL Specialist Iowa Drug Information Service (IDIS) Thomson Reuters Integrity Derwent Drug File (DDF) International Pharmaceutical Abstracts (IPA) ADIS Clinical Trials Insight PsycINFO CINAHL British Nursing Index (BNI) Conference Proceedings Conference Proceedings Citation Index Conference Papers Index (CPI) Inside Conferences (British Library) Zetoc Conference Theses/Dissertations Index to Theses - http://www.theses.com/ Trials registers

International Registers ClinicalTrials.gov http://clinicaltrials.gov/ct2/home IFPMA Clinical Trials Portal clinicaltrials.ifpma.org/ World Health Organization‘s International Clinical Trials Registry Platform (ICTRP) http://www.who.int/ictrp/search/en/

Complimentary Contributor Copy

Searching for Evidence

71

Trials registers

Regional Registers Australian New Zealand Clinical Trials Registry (ANZCTR) http://www.anzctr.org.au/ EU Clinical Trials Register https://www.clinicaltrialsregister.eu/index.html Pharmaceutical Company Registers AstraZeneca http://www.astrazenecaclinicaltrials.com/ GlaxoSmithKline http://www.gsk-clinicalstudyregister.com/ Pfizer http://www.pfizer.com/research/clinical_trials Roche http://www.roche-trials.com/

When searching for known trials in Embase, other search options are available including using the trade name of the drug or device, the name of the drug or device manufacturer or the clinical trial number.

Other Bibliographic Databases Cochrane Central Register of Controlled Trials (CENTRAL) is an excellent source of published and unpublished clinical trials (originally identified from a wide range of sources including bibliographic databases and handsearching). Science Citation Index (SCI) and BIOSIS Previews are not used as widely but are useful databases in which to search for trials. Other databases with a narrower focus that could be equally appropriate include specialist databases such as Iowa Drug Information Service (IDIS), Derwent Drug File (DDF), International Pharmaceutical Abstracts (IPA), Thomson Reuters Integrity and ADIS Clinical Trials Insight (for drug interventions), PsycINFO (for psychology and psychiatry), CINAHL and British Nursing Index (BNI) (for nursing) and the Allied and Complementary Medicine Database (AMED) (for complementary and alternative medicine). Advantages of using bibliographic databases to identify RCTs include: • • • •

facility for complex and/or focused searches, ability to download search results into bibliographic software, access to a summary of the study‘s results, obtaining contact details of the lead investigator.

Using Trials Registers There are many trials registers and much overlap between registers: some are disease specific; others collect together trials from a specific country or region; and pharmaceutical companies may make information about trials they have conducted available from their company website.

Complimentary Contributor Copy

72

Su Golder and Kath Wright

Many registers are freely available on the Internet and some (www.ClinicalTrials.gov and www.who.int/trialsearch/) include the facility to search by drug name or by condition. While the search functionality is usually not as sophisticated as that available on bibliographic databases the information provided about the study can be quite detailed. For example, www.ClinicalTrials.gov includes data on inclusion and exclusion criteria, outcome measures, numbers of participants, age, gender, location, and adverse events. Advantages of using trials registers to identify RCTs include: • •

ability to identify unpublished or ongoing trials, direct links to request patient level data and clinical study reports (Roche register http://www.roche-trials.com/).

How Many Sources to Search The diversity of questions addressed by network meta-analysis means there can be no agreed standard for what constitutes an acceptable number of databases to search. Surveys of systematic reviews suggest that the average number of databases searched is around three but research indicates that a larger number of databases may be required to conduct a thorough search [7, 8]. In an evaluation of a case study systematic review, the highest contribution of RCT data was retrieved by BIOSIS Previews, followed by Science Citation Index (SCI), EMBASE, Thomson Reuters Integrity, MEDLINE, and then PASCAL[8].

Non-Database Sources Reference Checking In addition to searching electronic databases, research may be obtained by scanning reference lists of relevant studies. This usually entails scanning the included primary studies as well as relevant systematic reviews. It can be quite time consuming but research has indicated its value, particularly as it can identify unique relevant studies [8, 9, 10]. Handsearching Journals Scanning paper or electronic copies of relevant journals, conference proceedings and abstracts is an additional method to identify reports of studies not yet published and indexed by electronic databases. Candidate journals for handsearching can be identified from an analysis of database search results. Handsearching is a labour intensive method for identifying studies, but it is a useful supplement to database searching as it relies on neither the accuracy of keywords assigned to database records nor on comprehensive search strategies.

Complimentary Contributor Copy

Searching for Evidence

73

Internet Sites Internet sites are a useful source of grey literature such as unpublished papers, reports and conference abstracts. Identifying and scanning specific relevant websites has been found to be more practical than using general search engines such as Google [11]. Contacting Experts and Manufacturers Researchers, clinicians and other experts, and manufacturers can be useful sources of information not identified in the electronic searches. In particular, they may be able to supply information on unpublished or on ongoing research although contacting individuals and companies can be very time consuming and may not meet with a response. Multi-arm trials and direct head to head comparisons of active treatments are important to allow information from both direct and indirect comparisons to be compared and integrated. Such information is sometimes available from manufacturers even when it is not available elsewhere. It is important that reasons for the non-provision of any direct comparisons are sought and made explicit. Comprehensive searches focused explicitly on specific direct head to head comparisons and direct statements from manufacturers regarding the data they hold increase confidence that absence of studies reflects lack of evidence, rather than potential reporting biases, particularly with pharmacological interventions. After all searches have been conducted it can be useful to send the list of studies identified to a key researcher in the field for their suggestions. This can provide a valuable check on whether all potential studies have been identified and whether there is ongoing research of relevance to your research project. Citation Searching Like reference checking and handsearching, citation searching is a useful technique to supplement database searching. It begins with a set of known journal articles, or key authors, and identifies further journal articles that have cited the original articles. Citation searching is available through a number of subscription only resources, such as Science Citation Index (SCI), Social Science Citation Index (SSCI) and Scopus, as well as freely available resources such as Google Scholar http://scholar.google.co.uk/.

UPDATING LITERATURE SEARCHES Depending on the timescale and scope of the network meta-analysis, it may be appropriate to include an update of the literature searches towards the end of the project to check for recent relevant papers. The multiple staged searches sometimes required for network meta-analysis may make update searches time consuming but can enhance the analysis.Incorporating an ―update field‖ into the search strategy enables restrictions to the search results so that records added to the database since the original search can be identified. This simple and efficient approach works only if the database records include a field for entry date. Once the search is rerun and all available records are downloaded, they can be added to the project‘s library or bibliographic management software and previously identified records removed. This additional step in the process ensures that researchers do not need to screen records retrieved from previous searches and already screened.

Complimentary Contributor Copy

74

Su Golder and Kath Wright

CURRENT AWARENESS Setting up current awareness alerts can help identify research published after the initial searches. It is possible to set up automated email alerts from specified journal titles and RSS feeds from databases or websites. Researchers should be aware that there is likely to be a high level of duplication in the records received so it is good practice to check whether records have been identified by previous searches before adding to the project‘s reference library.

MANAGING REFERENCES/LIBRARIES Specialist bibliographic management software or management systems such as Endnote, Reference Manager, ProCite, RefWorks and Mendeley can be used to store and record references. These software packages can be used to import and deduplicate results from searches of individual databases and non-database sources. The software or system in place can also be used to record screening decisions on the inclusion and exclusion of records and details of papers ordered and received. Using one of these software packages makes it quicker and easier to locate references, produce data required for publication of the network metaanalysis (such as the numbers of records retrieved and screened) and to create a list of included and excluded studies. These records may also enable production of a flow diagram as recommended by reporting guidance such as the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement [12]. Some bibliographic management software packages have the facility to import references from electronic databases such as MEDLINE and Embase directly into the project library. Registries and internet search results may not have this level of functionality and records may need to be entered manually into the bibliographic software package. Another advantage of some bibliographic management software packages is that it interfaces with word processing packages so that bibliographies/reference lists can be created automatically in a choice of styles including Harvard, Vancouver as well as styles for named journal titles.

DOCUMENTING THE SEARCH The search process should be reported in such a way that it is reproducible and transparent to the reader enabling searchers to re-run or update the searches. To ensure the search is reproducible, key items of information need to be recorded:     

names of the sources searched and the platform/interface used (some databases are available from multiple providers), dates when searches were executed, full search strategies used, database date ranges searched, numbers of records retrieved.

Complimentary Contributor Copy

Searching for Evidence

75

Reporting of network meta-analyses has been identified as poor especially in the case of search strategies and data sources used [13]. Network meta-analysis may involve multiple searches and this can make the reporting of transparent and reproducible search strategies difficult, particularly where space restrictions apply. Complete reporting of the literature searches undertaken can, however, facilitate quality assessment of the network meta-analysis. Guidelines on reporting standards of systematic reviews are available to help authors. Examples include MECIR (Methodological standards for the conduct of Cochrane Intervention Reviews), PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) and PRISMA Harms Extension. These guidelines include aspects on literature searching such as reporting the full search strategy and full listing of the information sources (database and non-database) searched. Many journals have the facility for appendices and supplementary material to be available electronically enabling authors of systematic reviews to overcome restrictive word limits and report full search strategies for at least one of the databases searched. Journal editors encourage adherence to such guidelines. Examples of search reports are provided in guides such as The Cochrane Handbook [14] and Systematic Reviews: CRD's guidance for undertaking systematic reviews in health care [11].

Quality Assessment of the Search Once published the methodological quality of systematic reviews, including the search process, can be assessed using tools such as AMSTAR (Assessment of Multiple Systematic Reviews) [15].

CONCLUSION Good quality systematic reviews, whether they undertake traditional pairwise metaanalysis or network meta-analysis, are underpinned by well-established principles of good practice. A key difference is the additional workload and flexibility required to undertake the multiple search iterations covering the numerous comparators involved. This should not detract from the quality of the searches themselves or their reporting.

CONFLICT OF INTEREST AND FUNDING DISCLOSURE None.

REFERENCES [1]

Li T., Puhan M. A., Vedula S. S., Singh S., Dickersin K.; Ad Hoc Network Metaanalysis Methods Meeting Working Group. Network meta-analysis-highly attractive but more methodological research is needed. BMC Med., 2011; 9: 79.

Complimentary Contributor Copy

76 [2] [3]

[4]

[5]

[6]

[7]

[8] [9] [10] [11]

[12]

[13]

[14] [15]

Su Golder and Kath Wright Hawkins N., Scott D. A., Woods B. How far do you go? Efficient searching for indirect evidence. Med. Decis. Making, 2009; 29: 273-81. Coleman C. I., Phung O. J., Cappelleri J. C., Baker W. L., Kluger J., White C. M., Sobieraj D. M. Use of Mixed Treatment Comparisons in Systematic Reviews. Rockville: Agency for Healthcare Research and Quality (AHRQ); 2012. Mills, E. J. Kanters, S. Thorlund, K. Chaimani, A. Areti-Angeliki Veroniki, A. Ioannidis, J. P. A. The effects of excluding treatments from network meta-analyses: survey. BMJ, 2013; 347: f5195. Hoaglin D. C., Hawkins N., Jansen J. P., Scott D. A., Itzler R., Cappelleri J. C., Boersma C., Thompson D., Larholt K. M., Diaz M., Barrett A. Conducting IndirectTreatment-Comparison and Network-Meta-Analysis Studies: Report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices—Part 2. Value Health, 2011; 14: 429-37. Sampson M, McGowan J, Cogo E, Grimshaw J, Moher D, Lefebvre C. An evidencebased practice guideline for the peer review of electronic search strategies. J Clin Epidemiol. 2009;62(9):944–52. Crumley, E. T., Wiebe, N., Cramer, K., Klassen, T. P. and Hartling, L. Which resources should be used to identify RCT/CCTs for systematic reviews: a systematic review. BMC Medl. Res. Methodol., 2005; 5: 24. Golder S., Loke Y. K. The contribution of different information sources for adverse effects data. Int. J. Technol. Assess Health Care, 2012; 28: 133-7. Golder S., Mason A., Spilsbury K. Systematic searches for the effectiveness of respite care. J. Med. Libr. Assoc. 2008; 96: 147-52. Golder S. Optimising the retrieval of information on adverse drug effects. Health Info. Libr. J., 2013; 30: 327-31. The Centre for Reviews and Dissemination. Systematic Reviews: CRD's guidance for undertaking systematic reviews in health care. York: Centre for Reviews and Dissemination, University of York; 2009. Liberati A., Altman D. G., Tetzlaff J., Mulrow C., Gøtzsche P. C., Ioannidis J. P., Clarke M., Devereaux P. J., Kleijnen J., Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ, 2009; 339: b2700. Bafeta, A. Trinquart, L. Seror, R. Ravaud P. Analysis of the systematic reviews process in reports of network meta-analyses: methodological systematic review. BMJ, 2013; 347: f3675. Higgins J. P. T., Green S., editors. Cochrane Handbook for Systematic Reviews of Interventions. Chichester: John Wiley & Sons Ltd; 2008. Shea B. J., Grimshaw J. M., Wells G. A., Boers M., Andersson N., Hamel C., Porter A. C., Tugwell P., Moher D., Bouter L. M. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med. Res. Methodol., 2007; 7: 10.

Complimentary Contributor Copy

In: Network Meta-Analysis Editor: Giuseppe Biondi-Zoccai

ISBN: 978-1-63321-001-1 © 2014 Nova Science Publishers, Inc.

Chapter 5

ABSTRACTING EVIDENCE Joey Nicholson, M.L.I.S., M.P.H. and Sripal Bangalore, M.D., M.H.A. Assistant Curator, New York University School of Medicine, New York, NY, US Associate Professor of Medicine, New York University School of Medicine, New York, NY, US

ABSTRACT Extracting data from studies after performing a literature search is a vital part to any meta-analysis. If incorrect, incomplete, biased, or duplicate data is collected, outcomes produced could be erroneous and can change the results and interpretation of the metaanalysis, potentially leading to wrong conclusions. To prevent such problems, metaanalysis authors should dedicate ample time early in the process to understand and plan what data they need to collect, how to collect it, and how to verify it. This chapter discusses the current best practices available for data synthesis and extraction and how to use these practices to enhance the validity, reliability, transparency and reproducibility of a meta-analysis.

Keywords: Abstraction, abstraction form, extraction, extraction form, source data, variable Funding: None Conflict of Interest Disclosure: None



Sripal Bangalore, MD, MHA, FACC, FAHA, FSCAI, Director of Research, Cardiac Catheterization Laboratory, Director, Cardiovascular Outcomes Group, Associate Professor of Medicine, New York University School of Medicine, The Leon H. Charney Division of Cardiology, 577 1st Ave, New York, NY 10016. Email: [email protected].

Complimentary Contributor Copy

78

Joey Nicholson and Sripal Bangalore

INTRODUCTION Drawing conclusions in a meta-analysis relies heavily on accurately and systematically extracting data from the identified studies after an extensive literature search. Researchers must make a series of judgments about which data are extracted, how they are presented, and how they are analyzed. Not only is the researcher responsible for making all these decisions, but also a key part of the meta-analysis process is ensuring that these decisions are transparent, reproducible, and that they minimize any possibility of human error or introduction of bias [1, 2]. This chapter reviews issues related to data extraction for metaanalyses, including: how to select and define variables for extraction; how to design and use data extraction forms; and how to methodically and transparently extract data in a non-biased, reportable and reproducible format. This chapter gives general guidance for extracting data. Keep in mind, each individual topic for a meta-analysis has its own unique issues that arise, and this chapter is only intended to be a guide for the overall process of data extraction and synthesis.

VARIABLES TO EXTRACT The exact variables to extract in completing a meta-analysis will depend on the topic of the study, the focus of the question, and the available types of studies for inclusion. Researchers must plan ahead to identify what will be the variables of interest and how they will be obtained. The information collected when extracting data will not only be used for analyzing the data, but also for assessing the risk of bias and the quality of each of the studies. By delineating the variables of interest and extraction methods in the protocol, researchers will save time later and help prevent data extraction errors [3–5].

Variables to Consider When beginning a meta-analysis project, there are often many uncertainties in regards to what outcome measures are of interest, what ones are actually reported, and what ones can be transformed to the desired format. Allowing extra time at the beginning stages of a project to plan for what data are needed and what data are not needed saves time later in developing extraction forms and not extracting unnecessary data. The variables to consider include not only the outcomes of interest but also variables that provide information on each study (study author, year of publication etc), eligibility criteria, study information including inclusion and exclusion criteria, study treatment and intervention, baseline demographics, sample size for each groups, as well as data needed to assess study quality. Table 1 compiles some variables of interest recommended for consideration when planning data extraction for a meta-analysis by the Cochrane Collaboration, the Centre for Research & Dissemination, and the Institute of Medicine [6–8]. When study data is collected from multiple sources, it is a good practice to also denote the source of data collected.

Complimentary Contributor Copy

Abstracting Evidence Table 1. Example variables for data extraction General Information:  Study record number (uniquely identifies a study within a metaanalysis)  Study author  Title  Type of publication (e.g. journal article, conference proceeding, unpublished clinical trial data)  Country of origin  Funding source  Dates of data extraction  Researcher ID (who performs data extraction) Eligibility Information:  Confirmation of eligibility  Reason for exclusion (if not eligible) Study Information:  Objectives  Study design  Allocation  Blinding  Other bias concerns  Study inclusion and exclusion criteria  Unit of allocation (e.g. individual, group practice, etc) Participant Information:  Total number  Setting  Age  Gender  Ethnicity  Socio-economic status  Study Dates  Diagnostic criteria  Comorbidities

Intervention Information:  Setting for intervention  Number of intervention groups  Number of participants in each group  Specific intervention details (e.g. dose, duration, theoretical basis, etc) Outcomes:  Unit of analysis  Statistical techniques  Outcomes collected  Outcomes reported  Time points collected  Time points reported  Type of analysis (e.g. intention to treat, etc) For each outcome of interest:  Definition used in study  Number of participants enrolled  Number of participants analyzed  Number of withdrawals, exclusions, etc  Measurement tool/method used  Summary results (e.g. dichotomous or continuous)  Estimate of effect with CI and P-value  Any subgroup analyses Miscellaneous:  Costs  Resources used  Adverse events  Conclusions from study authors  Miscellaneous study author comments  References to other relevant studies  Whether correspondence is needed to complete data extraction  Comments by meta-analysis author

Complimentary Contributor Copy

79

80

Joey Nicholson and Sripal Bangalore

Defining Variables Because meta-analyses are team projects, data are extracted and used by multiple different team members. It is essential to reducing errors that variables are clearly defined and that each team member knows how each variable should be recorded [9, 10]. For example, it is often not enough to simply check a box to indicate a study is blinded, when assessing bias a researcher will also need to know whether it was single-blinded vs. double-blinded or blinding was for participants, patients or outcome assessors. Similarly, it may not be sufficient to denote whether the study was randomized but more details such as how randomization was performed and steps to maintain allocation concealment should be collected. In general all the data needed to assess the Cochrane quality assessment metrics should be collected at a minimum to ensure good quality assessment of each study. Definitions of variables are also critical for outcomes of interest. For example the definition of myocardial infarction and the criteria/biomarker threshold used in studies can vary widely. In such cases, recording each study specific definition of MI will be helpful. When several outcomes based on varying definition are available, it is good practice to extract data for most commonly used definitions. For example, the bleeding outcome can be reported using varying many different criteria-TIMI bleeding, ACUITY bleeding criteria, major bleeding vs. minor bleeding or any trial specific bleeding criteria. Extracting data for as many reported bleeding definitions will help pool relevant definitions more appropriately. In addition to collecting major outcomes of interest, it is a good practice to also report adverse events with treatment. Data should also be collected in the rawest form possible to allow for any necessary conversion. For example, when collecting ages saying child, adult, or senior does not actually provide consistent information since those ranges are subjective and the borders can overlap. It is better to collect the exact ages or age ranges. Table 2 provides an example of the difference between raw data and data that has been processed. The processed data has limited use and can only be properly interpreted or reused when the raw data is known. Similarly, while your protocol should specify the exact outcome measurement, time point, and summary statistics of interest, these may not all be readily available in the study. It is then necessary to make sure you are collecting relevant sample sizes and numbers of participants who experience the outcome in each group for dichotomous outcomes or the mean and standard deviation for continuous outcomes [11, 12]. Table 2. Raw versus processed variables Variable type Subject ID Subject 1 Subject 2 Subject 3

Raw variables Age Height 23 180.45 cm 32 191 cm 45 170.18 cm

Weight 79.4 kg 115.5 kg 77.1 kg

Processed variables Age BMI Young Adult 24.4 Adult 31.7 Middle-Aged 26.6

Category Normal Obese Overweight

Collecting the base data from which the original authors are drawing conclusions will allow a researcher to reproduce reported summary statistics and also to convert if the needed

Complimentary Contributor Copy

Abstracting Evidence

81

statistics are not directly reported. The information regarding the definitions of variables and how they should be recorded and coded should be consistently documented and given to the researchers performing the data extraction. This transparency is useful not only in ensuring researchers are all using the same definitions, but also can be utilized in the future if you need to revisit or update the meta-analysis.

Data Sources The majority of study results included in meta-analyses are from published journal articles reporting on an original study. However, a detailed meta-analysis will usually also include data from other sources, including: conference abstracts, dissertations, clinical trial registries, grey literature, books, or the deep web [6–8]. In some cases all data will not be readily available, either because it is not included in the published journal article, or because the report located was only an abstract or trial registry result. When this occurs it may be necessary to contact the original study authors [13]. In order to obtain reliable and accurate data from all potential sources, it is necessary to design an appropriate data extraction form that includes all necessary fields. When authors are contacted for unpublished data, using this data extraction form is helpful to ask for relevant information.

DATA EXTRACTION FORMS A data extraction form is used first to collect the data reported by original study authors. Once all the data are collected, the meta-analysis authors can then use these forms to summarize and report the findings of the whole group of original studies. Given the importance of collecting reliable data, leading organizations such as Cochrane, CRD, AHRQ, and IOM recommend dedicating sufficient time to designing appropriate data extraction forms early in the meta-analysis process [5–8,10,14]. Designing data extraction forms includes a number of steps and decisions. Not only do the authors need to agree on the variables needed for extraction, but also a form should be piloted and refined using a sample of the included studies. Piloting a data extraction form with a small sample will help to confirm that all variables of interest are appropriately captured the first time and that extra time is not wasted on collecting variables that are not useful. The piloting process should involve at least two team members and primarily includes checking to make sure those who will be extracting the data interpret and code the data in the same manner [5]. It may also be useful to pilot how the team will get data out of the form for analysis. By pilot testing a form, the team can help reduce data extraction errors and ensure that the analyses completed are reliable and valid.

Electronic or Paper The format for data extraction forms is largely a personal decision. There are many potential methods from a basic paper and pencil form for each included study, to a basic

Complimentary Contributor Copy

82

Joey Nicholson and Sripal Bangalore

online document, a spreadsheet, a custom database, a web-based survey instrument, or a commercially available online tool. Each format has a variety of pros and cons that must be considered. Most decisions will depend on the topic of the meta-analysis, the geographic locations of the researchers, funding available, technical ability of the researchers, and time available to complete the project [15]. Paper forms are preferred for several reasons: data extraction can occur anywhere; easier to create; permanent paper record of all decisions; simple to compare forms completed by two review authors; and no need for technical expertise to design a custom database or spreadsheet. However, paper forms also have cons: they are cumbersome; easily lost; require additional time to input the data once extracted; and difficult to implement when collaborating remotely. Most commonly paper forms are created as documents and printed out to go with each person extracting data. Figure 1 is an example of the outcomes section of a data extraction form that could be printed out and filled manually. As seen in Table 1, there are at least seven sections to have a complete extraction form and this is only one section. Paper forms are often 5-10 pages long.

Figure 1. Sample data extraction form for outcomes data section.

Complimentary Contributor Copy

Figure 2. Sample spreadsheet for data extraction.

Complimentary Contributor Copy

84

Joey Nicholson and Sripal Bangalore

In this digital age, there are also many electronic options. Potential reasons to use electronic forms are: combine data extraction and data entry into one step; forms and databases can be programmed to lead authors through the process; easier to store and retrieve data from large numbers of studies; easier to compare and validate results of two data extractors. But, electronic forms may not work for every project: no technical expertise to build a custom database or spreadsheet; no funding to purchase a product; and limited access to computers. Spreadsheets are among the most common of the electronic forms for extracting data. Figure 2 is an example of a spreadsheet used to collect baseline and outcomes data.

DATA EXTRACTION PROCESS Once the team has decided upon the necessary variables and piloted and refined the data extraction form, it is time to begin actually extracting the data. Typically, data included in meta-analyses comes from published journal articles reporting results of studies. Data extraction is often a labor and time-intensive process, but if it is not done meticulously it can lead to biased or invalid results. In a 2005 study, Jones reports that data extraction errors occurred in 20 out of 34 Cochrane reviews [4]. In this instance, the prevalence of errors was quite high, but fortunately these errors did not significantly change the conclusions. Gøtzsche also found a high incidence of data extraction errors in the meta-analyses that he sampled [9]. Unfortunately in his study, in 37% of those papers with data extraction errors, the errors compromised the results and led to incorrect summary of effect measures.

Who Extracts Similar to the screening process, two of the meta-analysis authors should perform data extraction independently. By having two researchers extract data, authors can work to minimize both potential human error and bias. While on the surface it may seem like this work is not susceptible to a high number of errors, in fact multiple studies have found that significant data extraction errors are common [3, 4, 7]. Following the independent data extraction, results should be compared and any disagreements should be resolved either by a third party or by agreement between the two extractors. A common mistake that is made when performing meta-analyses is to use a single person instead of two people to extract the data from the included studies. While this method is faster, it has also been found to lead to a higher percentage of errors. A 2006 study by Buscemi and colleagues found 21.7% more data extraction errors compared to using two independent reviewers [3].

Duplicate Publications When completing data extraction, another consideration is to account for duplicate publications to ensure that reporting bias is reduced. Authors frequently publish multiple

Complimentary Contributor Copy

Abstracting Evidence

85

reports and articles on the same study. If these duplicate reports are all included, the effect summary will be incorrect since data from some of the reports are double counted [16–18]. It can be difficult to determine if published reports are referring to the same study, but it is necessary to double-check and confirm that all included data is only included one time. To help with tracking duplicate reports, it is recommended to include a field in the data extraction form to link multiple reports of the same study. Once multiple reports have been identified and linked, meta-analysis authors have two primary options for how to extract the data. One option is to extract the data from each available report individually, then combine the linked results for the final product. This is more likely to be used when multiple articles are reporting on different aspects of the same study, such as different follow-up times. The second main option is to use one form to extract the data from all available reports. This option is more likely to be used when the multiple reports have limited information, such as conference abstracts.

CONCLUSION

Accurate data extraction is an important step in a meta-analysis. When extraction is done properly, it leads to improved validity and reliability of the results and reduced bias. Meta-analysis authors should plan the data extraction process ahead of time, in the beginning stages when writing the protocol. The process can go smoothly as long as the authors follow these general steps:

- Identify a list of the necessary variables for extraction (use similar published meta-analyses when available).
- Develop data extraction forms.
- Provide clear instructions and documentation on how to extract and code the data.
- Pilot the data extraction forms to identify problems.
- Verify data by using two reviewers to extract data.
- Resolve inconsistencies by consensus or by using a third reviewer.

REFERENCES

[1] Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ 2009; 339: b2700.
[2] Crummett C, Graham A, McNeill K, Sheehan D, Stout A. Manage Your Data. Data Management and Publishing Subject Guide 2014. Available from: http://libraries.mit.edu/guides/subjects/data-management/index.html (last accessed on March 21, 2014).
[3] Buscemi N, Hartling L, Vandermeer B, Tjosvold L, Klassen TP. Single data extraction generated more errors than double data extraction in systematic reviews. J. Clin. Epidemiol. 2006; 59: 697-703.
[4] Jones AP, Remmington T, Williamson PR, Ashby D, Smyth RL. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. J. Clin. Epidemiol. 2005; 58: 741-2.
[5] Zaza S, Wright-De Agüero LK, Briss PA, Truman BI, Hopkins DP, Hennessy MH, Sosin DM, Anderson L, Carande-Kulis VG, Teutsch SM, Pappaioanou M. Data collection instrument and procedure for systematic reviews in the Guide to Community Preventive Services. Task Force on Community Preventive Services. Am. J. Prev. Med. 2000; 18: 44-74.
[6] Institute of Medicine, Eden J, Levit L, Berg A, Morton S. Standards for Finding and Assessing Individual Studies. In: Finding What Works in Health Care: Standards for Systematic Reviews. Washington: National Academies Press; 2011.
[7] Higgins JPT, Deeks JJ. Selecting studies and collecting data. In: Higgins J, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions. Version 5. The Cochrane Collaboration; 2011. Available from: http://handbook.cochrane.org/index.htm#chapter_7/7_6_extracting_data_from_reports.htm (last accessed on March 21, 2014).
[8] Centre for Reviews and Dissemination. Core Principles and Methods. Systematic Reviews: CRD's Guidance for Undertaking Reviews in Health Care. NHS Centre for Reviews & Dissemination; 2009. Available from: https://www.york.ac.uk/inst/crd/SysRev/!SSL!/WebHelp/SysRev3.htm#TITLEPAGE.htm (last accessed on March 21, 2014).
[9] Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA 2007; 298: 430-7.
[10] Noyes J, Lewin S. Extracting Qualitative Evidence. In: Noyes J, Booth A, Hannes K, Harden A, Harris J, Lewin S, Lockwood C, editors. Supplementary Guidance for Inclusion of Qualitative Research in Cochrane Systematic Reviews of Interventions. 1st ed. Cochrane Collaboration Qualitative Methods Group; 2011. Available from: http://cqrmg.cochrane.org/supplemental-handbook-guidance (last accessed on March 21, 2014).
[11] Parmar MK, Torri V, Stewart L. Extracting summary statistics to perform meta-analyses of the published literature for survival endpoints. Stat. Med. 1998; 17: 2815-34.
[12] Tendal B, Higgins JPT, Jüni P, Hróbjartsson A, Trelle S, Nüesch E, Wandel S, Jørgensen AW, Gesser K, Ilsøe-Kristensen S, Gøtzsche PC. Disagreements in meta-analyses using outcomes measured on continuous or rating scales: observer agreement study. BMJ 2009; 339: b3128.
[13] Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, Williamson PR. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 2010; 340: c365.
[14] Lau J, Chang S, Berkman N, Ratichek SJ, Balshem H, Brasure M, Moher D. EPC Response to IOM Standards for Systematic Reviews. Rockville, MD: Agency for Healthcare Research and Quality (US); 2013 Apr. Report No.: 13-EHC006-EF. Available from: http://www.ncbi.nlm.nih.gov/books/NBK137841/#results.s45 (last accessed on March 21, 2014).
[15] Elamin MB, Flynn DN, Bassler D, Briel M, Alonso-Coello P, Karanicolas PJ, Guyatt GH, Malaga G, Furukawa TA, Kunz R, Schünemann H, Murad MH, Barbui C, Cipriani A, Montori VM. Choice of data extraction tools for systematic reviews depends on resources and review complexity. J. Clin. Epidemiol. 2009; 62: 506-10.
[16] Von Elm E, Poglia G, Walder B, Tramèr MR. Different patterns of duplicate publication: an analysis of articles used in systematic reviews. JAMA 2004; 291: 974-80.
[17] Tendal B, Nüesch E, Higgins JPT, Jüni P, Gøtzsche PC. Multiplicity of data in trial reports and the reliability of meta-analyses: empirical study. BMJ 2011; 343: d4829.
[18] Tramèr MR, Reynolds DJ, Moore RA, McQuay HJ. Impact of covert duplicate publication on meta-analysis: a case study. BMJ 1997; 315: 635-40.


In: Network Meta-Analysis Editor: Giuseppe Biondi-Zoccai

ISBN: 978-1-63321-001-1 © 2014 Nova Science Publishers, Inc.

Chapter 6

APPRAISING EVIDENCE

Partha Sardar, M.D.1,*, Anasua Chakraborty, M.D.2,† and Saurav Chatterjee, M.D.3,‡

1 Research Associate, Texas Tech University Health Sciences Center, El Paso, TX, US
2 Fellow, Pulmonary and Critical Care Medicine, Thomas Jefferson University Hospital, Philadelphia, PA, US
3 Fellow, Division of Cardiovascular Diseases, St. Luke's - Roosevelt Hospital Center of the Mount Sinai Health System, New York, NY, US

ABSTRACT

Assessment of the quality of evidence in a network meta-analysis is of great importance, especially in the presence of complex network constructs, since any error in the identification of bias may be amplified in the presence of multiple comparisons. The validity of the individual studies in a network should be evaluated with a systematic approach. The assessment of the risk of bias and its consideration in a network meta-analysis are far more challenging than in conventional meta-analyses, as bias may affect several of the pooled effect estimates obtained in a network meta-analysis. The reliability of the results of a study depends on the extent to which potential sources of bias have been avoided during study conduct. A key part of a review is to consider the risk of bias in the results of each of the eligible studies. Many tools have been used for risk of bias assessment, and several have been shown in prior research to be reliable. To adjust for bias, meta-regression can be used in both pair-wise and network meta-analyses. For network meta-analyses, methodological research is ongoing to identify and adjust for novel sources of bias; however, some guidance is already available for performing rigorous network meta-analyses yielding the highest order of evidence presently achievable.

* Email: [email protected].
† Email: [email protected].
‡ Address for correspondence: Saurav Chatterjee, MD, Division of Cardiovascular Diseases, St. Luke's - Roosevelt Hospital Center, Division of Cardiology, 3rd floor, Clark Building, 1111 Amsterdam Avenue, New York, NY 10025, USA. Fax: 001-347-244-7148. E-mail: [email protected].


Keywords: Grading evidence, internal validity, network meta-analysis, quality assessment, risk of bias tool

INTRODUCTION

Assessment of the quality of evidence in a network meta-analysis is in part an assessment of the strength of the individual studies and, of likely even greater significance, a test of the quality and validity of the inter-relation between the constituent studies of the network. The strength of the interpretations of a network meta-analysis rests on the validity of the individual studies. The evaluation of the quality of the included studies is therefore an essential component of a network meta-analysis, and should influence the component analyses, interpretation(s) and conclusion(s). The validity of the studies should be evaluated with a systematic approach, in the same vein as the search for the studies, through 'assessments of methodological quality' or 'quality assessment'. The Cochrane Collaboration recommends use of the phrase "risk of bias" instead of "quality," reasoning that "an emphasis on risk of bias overcomes ambiguity between the quality of reporting and the quality of the underlying research (although it does not overcome the problem of having to rely on largely published reports to assess the underlying research)." [1] The definition of the 'quality' of a study varies across different recommendations. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group uses the term quality to refer to an individual study and to consequent judgments based on the strength of the body of evidence (quality of evidence) presented therein. [2] The U.S. Preventive Services Task Force (USPSTF) equates the quality of a study with internal validity, and individual studies are classified first according to a hierarchy of study design and then by individual criteria. [3] The assessment of the risk of bias and its consideration in network meta-analysis are far more challenging than in conventional meta-analysis: while bias in the effect estimate from any single trial affects one pooled effect estimate in a conventional meta-analysis, it may affect several pooled effect estimates obtained in a network meta-analysis. In this chapter, we will discuss how to critically appraise each study included in a network meta-analysis: the available tools for assessing quality and risk of bias, the stages in assessing the risk of bias, the use of predefined criteria for screening of studies, and special issues related to bias in network meta-analysis.

RISK-OF-BIAS ASSESSMENT

Different approaches to risk-of-bias assessment have been suggested; however, no strong empirical evidence supports one approach over another. In the absence of strong supportive evidence, methodological decisions in confirming risk of bias rely on epidemiological principles. [1] Available instruments and recommendations indicate that the constructs included in risk of bias or quality assessments cover one or more of the following issues:


1) Conduct of the study or internal validity
2) External validity or applicability
3) Random error
4) Completeness of reporting
5) Selective outcome reporting
6) Choice of outcome measures
7) Study design
8) Fidelity of the intervention, and
9) Conflicts of interest involved

SOURCES OF BIAS IN STUDIES

The reliability of the results of a study depends on the extent to which potential sources of bias have been avoided during study conduct. A key part of a review is to consider the risk of bias in the results of each of the eligible studies. The Cochrane Collaboration classifies the biases for RCTs into the domains of selection bias, performance bias, attrition bias, detection bias and reporting bias.

Selection bias refers to systematic differences between the baseline characteristics of the groups that are compared. In an RCT, selection bias can be avoided by proper randomization. The initial step is to evaluate sequence generation, that is, the rule for allocating interventions to participants. The next is to examine allocation concealment, that is, the strict implementation of that schedule of random assignments by preventing foreknowledge of the forthcoming allocations.

Performance bias refers to systematic differences between groups in the care that is provided, or in exposure to factors other than the interventions of interest. Adequate and proper blinding (or masking) of study participants and personnel may reduce the risk of performance bias.

Detection bias refers to systematic differences between groups in how outcomes are determined. Blinding (or masking) of outcome assessors may reduce the risk of this kind of bias.

Attrition bias refers to systematic differences between groups in withdrawal from a study, which leads to incomplete outcome data.

Reporting bias refers to systematic differences between reported and unreported findings: within a published report, analyses with statistically significant differences between intervention groups are more likely to be reported than non-significant differences.

Additionally, the Cochrane Collaboration mentions other sources of bias that are relevant only in certain circumstances. These relate mainly to particular trial designs, specific circumstances or particular clinical settings, and also include issues such as rigorous and valid definitions of end-points and adjudications, and interim analyses. For all potential sources of bias, it is important to consider the likely magnitude and the likely direction of the bias. For example, if all methodological limitations of studies were expected to bias the results towards a lack of effect, and the evidence indicates that the intervention is effective, then it may be concluded that the intervention is effective even in the presence of these potential biases.


ADDITIONAL FACTORS INFLUENCING BIASES

Numerous, often discipline-specific, taxonomies exist for classifying the different phenomena that introduce bias in studies. The risk of bias may be influenced by conflicts of interest of the investigators and sponsors, by inflated effect sizes in prematurely stopped trials, and by type I errors and false positive findings.

TOOLS FOR ASSESSING QUALITY AND RISK OF BIAS

Many tools have been used for risk of bias assessment. The tools discussed below have been shown in prior research to be reliable or valid, are widely used, or have been recommended for use by systematic reviews that compared risk of bias assessment instruments. [1, 4, 5] For most tools, the initial step in assessing whether a chosen tool is applicable to a specific study is to categorize the study design.

RANDOMIZED CONTROLLED TRIALS

A large number of tools have been developed to assess the risk of bias in randomized clinical trials (RCTs).

Cochrane Collaboration "Risk of Bias" Tool

The Cochrane Collaboration recommends the use of an updated 'Risk of Bias' tool. [1, 2] It is a two-part tool addressing seven specific, individual domains: sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective outcome reporting, and specific 'other issues'. The 'Risk of bias' table rates a study as at 'low risk', 'high risk', or 'unclear risk' of bias. Each specific domain includes one or more specific entries, and each entry describes what was reported to have happened in the study, to support a judgment about the risk of bias. A single entry per study is allowed for the domains of sequence generation, allocation concealment and selective outcome reporting. For blinding of participants and personnel, blinding of outcome assessment, and incomplete outcome data, two or more entries may be used, because assessments may need to be made separately for different outcomes (or for the same outcome at different time points). The Cochrane Handbook emphasizes that topics within the 'other issues' domain should focus on specifics related to bias, and not on imprecision, heterogeneity, or other quality measures that are unrelated to bias and may be dealt with separately. Further, these items will vary across different reviews and should be identified and pre-specified when developing the review protocol. Although the Risk of Bias tool is now the most popular and recommended method for assessing the risk of bias of RCTs in systematic reviews, the tool has not to date undergone extensive validity or reliability testing. However, one of the important and unique features of the Risk of Bias tool is its transparency in implementation. Indeed, the Cochrane Collaboration argues that this transparency is more important than demonstrations of "reliability" and "validity," because complete transparency is ensured and each assessment can readily be re-appraised by the reviewer/reader.
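To make the structure of such judgments tangible, the sketch below records per-domain entries as a simple R data frame; the study identifier and the supporting text are hypothetical, and the domain names follow the tool as described above.

```r
# Illustrative only: recording Cochrane "Risk of Bias" judgments per domain
# as an R data frame. One row per domain; some domains may warrant
# separate rows per outcome, as noted in the text.
rob <- data.frame(
  study    = "Trial-01",
  domain   = c("sequence generation", "allocation concealment",
               "blinding of participants and personnel",
               "blinding of outcome assessment",
               "incomplete outcome data", "selective outcome reporting",
               "other issues"),
  judgment = c("low risk", "unclear risk", "high risk", "low risk",
               "low risk", "unclear risk", "low risk"),
  support  = "quote or description from the study report"
)
table(rob$judgment)  # quick tally of judgments across domains
```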

The Jadad Scale

The Jadad scale has historically been the most commonly used tool to assess the risk of bias of RCTs. [6] This scale addresses three domains (i.e., randomization, blinding, and handling of withdrawals and drop-outs) and includes five questions. However, the Jadad scale does not address the adequacy of allocation concealment, and concerns regarding its appropriateness have recently emerged. Specifically, there is some evidence that the tool reflects the quality of reporting rather than the risk of bias. [7] In fact, the recent version of the Cochrane Handbook recommends that authors move away from using 'scales' and 'scores', and rather focus on reporting the issues identified in sufficient detail with the 'Risk of Bias' tool.

Other Tools

A recent systematic review [8] identified 21 scales to assess the risk of bias of RCTs but found that the majority were not "rigorously developed or tested for validity and reliability." The Delphi List for assessing RCTs includes the following items: inclusion/exclusion criteria of the study population defined; randomization; allocation concealment; baseline comparability of study groups; blinding of investigator, subjects, and care providers; reporting of point estimates and variability for primary outcomes; and intention-to-treat analysis. [8, 9] Another tool was developed by Yates et al., and has two parts, one related to the treatment (five items) and the second related to study design and methods (eight items with multiple parts). [8]

NONRANDOMIZED STUDIES

Recommendations are also available for the risk of bias assessment of nonrandomized studies.

The Newcastle-Ottawa Scale

The Newcastle-Ottawa Scale (NOS) was developed to assess the quality of nonrandomized studies through its design and content. [10] A 'star system' has been developed in which a study is judged on three broad perspectives: the selection of the study groups; the comparability of the groups; and the ascertainment of either the exposure or the outcome of interest, for case-control or cohort studies respectively. The tool was revised based on its face and content validity. Its content validity and inter-rater reliability have been established, and an assessment plan is being formulated for evaluating its construct validity, with consideration of the theoretical relationship of the NOS to external criteria and of the internal structure of the NOS components.

Other Tools

At least 86 different tools are available for bias assessment in observational studies. [11] The Cochrane Collaboration recommends following the domains of the Risk of Bias tool for RCTs, particularly for prospective studies, and the Risk of Bias tool is currently being modified by a working group within the Cochrane Collaboration for use in nonrandomized studies. Downs and Black developed a scale for nonrandomized studies. [12] However, this scale requires considerable epidemiology expertise and has been found difficult to apply to case-control studies.

STAGES IN ASSESSING THE RISK OF BIAS OF STUDIES

The Agency for Healthcare Research and Quality (AHRQ) [8] has recommended approaching the assessment of risk of bias in five steps: protocol development, pilot testing and training, assessment of risk of bias, interpretation, and reporting. These stages and their specific steps are described in Table 1. A detailed plan for the assessment of risk of bias should be included within the initial protocol of the network meta-analysis.

INCORPORATING ASSESSMENTS INTO NETWORK META-ANALYSES

Bayesian methods can be used to adjust for biases in a meta-analysis; however, these methods are not yet sufficiently well developed for widespread adoption, and they are the subject of current methodological research. Bayesian analyses allow for the incorporation of external information or opinion on the nature of bias.
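As a simplified illustration of the general idea, the sketch below shifts a single study estimate by an externally elicited opinion about additive bias and propagates the extra uncertainty; all numbers are hypothetical, and fully Bayesian implementations would instead place an explicit prior on the bias within the model.

```r
# Sketch of a simple bias adjustment of one study estimate: an assumed
# external opinion about additive bias, N(0.10, 0.05^2) on the log odds
# ratio scale, is subtracted and its uncertainty is propagated.
te <- -0.35; se <- 0.15             # observed log OR and standard error
bias_mean <- 0.10; bias_sd <- 0.05  # elicited opinion about the bias

te_adj <- te - bias_mean
se_adj <- sqrt(se^2 + bias_sd^2)    # adjustment adds uncertainty

round(exp(te_adj + c(-1.96, 0, 1.96) * se_adj), 2)  # adjusted OR, 95% CI
```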

SPECIAL CONSIDERATIONS IN NETWORK META-ANALYSIS

The issue of bias may assume special significance when comparing multiple interventions. Studies with a high risk of bias influence the credibility of the summary estimate from a conventional meta-analysis [1, 13], and similar concerns exist for network meta-analysis when comparing multiple interventions. To adjust for bias, meta-regression can be used in a way similar to standard pairwise meta-analysis.


Table 1. Stages in assessing the risk of bias of individual studies*

1. Development of protocol
- Identification of specific terms of interest
- Criteria for specific risk-of-bias category assessment
- Choice of specific risk-of-bias rating tool(s) and justification of same
- Explanation and stratification of low, moderate, high, or unclear risk of bias for individual outcomes, and justification of the use of scales or scores leading to categories of risk of bias
- A priori determination of how inconsistencies between pairs of risk of bias reviewers will be resolved
- Justification of how the synthesis of the evidence will include the assessment of risk of bias (including whether studies with high or unclear risk of bias will be used in the synthesis of the overall evidence, or in any sensitivity analysis thereof)

2. Pilot test and train
- A minimum of two reviewers is recommended for rating the risk of bias of each study, with a third reviewer to serve as arbitrator of conflicts
- Appropriate training of reviewers
- Pilot test of the assessment and corroboration of the use of risk of bias tools, using a small subset of studies that represent the range of risk of bias in the evidence base
- Analysis of the need for resolution of issues with the usage of specific tools, with revised tools or training as needed

3. Perform assessment of risk of bias of individual studies
- Identification of the study design of each (individual) study
- Judgments about each risk of bias criterion for each predetermined outcome, specific for individual types of study
- Assessment of the overall risk of bias for each included outcome of the individual study and categorization into low, moderate, high, or unknown risk of bias within study design; appropriate documentation of the reasons for judgment and of the process for finalizing judgment
- Resolve differences in judgment, with involvement of multiple reviewers as needed, and record the final rating for each outcome

4. Use assessment of risk of bias in synthesis of evidence
- Preplanned 'sensitivity' analyses
- Consider additional required analyses
- Incorporate the assessment of risk of bias in quantitative/qualitative synthesis, per separate study design categories

5. Report assessment of risk of bias process and limitations
- Cite reports on validation of the selected tool(s), the assessment of risk of bias process (summarizing from the protocol), and limitations of the process followed for the individual review
- Postulated actions to improve the reliability of the risk-of-bias assessment, if applicable

*Adapted from: Viswanathan M, Ansari MT, Berkman ND, Chang S, Hartling L, McPheeters LM, Santaguida PL, Shamliyan T, Singh K, Tsertsvadze A, Treadwell JR. Assessing the Risk of Bias of Individual Studies in Systematic Reviews of Health Care Interventions. Agency for Healthcare Research and Quality Methods Guide for Comparative Effectiveness Reviews. March 2012. AHRQ Publication No. 12-EHC047-EF. Available at: www.effectivehealthcare.ahrq.gov/ (last accessed on March 21, 2014).


For example, an indicator variable can be created to define 'appropriate', 'unclear', and 'inappropriate' methods of allocation concealment, and can be included in the network meta-regression. Dias et al. used this approach with a modification for probabilistic modeling of the 'unclear' risk of bias [14]. However, in multiple treatment comparisons it is necessary to make assumptions about the direction of the bias. When the network is star-shaped (all trials are placebo-controlled trials), it is expected that the bias will favor the active treatment. When two active treatments are compared, stronger assumptions are required about the direction of bias, to avoid the risk that the newer or the sponsored treatment is favored. Directionality assumptions can be embedded in the model in the form of an index variable. Adjusting for bias in a network of interventions offers the advantage of increased power compared with traditional, pair-wise meta-analysis. Consider, for example, that comparison AB is informed by very few studies, or by studies that all fall in the same quality category, that is, they all have poor allocation concealment. Then, conducting a sensitivity analysis or adjusting via meta-regression is suboptimal or impossible. However, if these studies are part of a network meta-regression model, the bias coefficient for allocation concealment estimated in the network from all involved comparisons is imposed on the AB studies as well, and the summary estimate for AB can be adjusted. This rests on the assumption that the magnitude of bias is similar across comparisons, which may be defensible in many clinical settings. Moreno et al. used a similar approach to address small study effects and publication bias in a network of antidepressants. [15]
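A minimal sketch of this idea is given below, using a two-stage, fixed-effect network meta-regression fitted by weighted least squares in base R rather than the probabilistic model of Dias et al.; the trial data, the coding of the bias indicator, and the assumed common bias direction are all hypothetical, and the standard errors reported by lm() are only approximate in this setting.

```r
# Fixed-effect network meta-regression with a shared bias covariate,
# fitted by inverse-variance weighted least squares (hypothetical data).
# Each row is one two-arm trial: log odds ratio (te), standard error (se),
# and a flag for inadequate allocation concealment.
trials <- data.frame(
  comp        = c("AB", "AB", "AC", "AC", "BC", "BC"),
  te          = c(-0.32, -0.10, -0.45, -0.51, -0.12, -0.25),
  se          = c(0.15, 0.18, 0.20, 0.16, 0.22, 0.19),
  concealment = c("inadequate", "adequate", "adequate",
                  "inadequate", "adequate", "inadequate")
)

# Basic parameters d(AB) and d(AC); a BC trial estimates d(AC) - d(AB).
trials$x_ab <- ifelse(trials$comp == "AB", 1, ifelse(trials$comp == "BC", -1, 0))
trials$x_ac <- ifelse(trials$comp == "AC", 1, ifelse(trials$comp == "BC",  1, 0))
trials$bias <- as.numeric(trials$concealment == "inadequate")

fit <- lm(te ~ 0 + x_ab + x_ac + bias, data = trials, weights = 1 / se^2)
summary(fit)  # "bias" estimates the shared allocation-concealment effect
```

Because the bias coefficient is estimated from all comparisons jointly, it can be used to adjust a comparison informed only by poorly concealed trials, exactly as described above.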

"Small Study Effects" or Publication Bias and Network Meta-Analysis

Funnel plot asymmetry can be caused by the association between sample size, heterogeneity, and the probability of publication, and is a very challenging issue in meta-analysis. A similar problem exists for networks of interventions and may take an even more extended form: comparisons that do not give significant results may be underrepresented or completely missing from the network, and their relative effects will then primarily be informed by indirect evidence. Publication bias and selective reporting might affect interventions and comparisons in different ways depending on the clinical context of the network meta-analysis. Using methodology from ecology, attempts have been made to associate the possibility of selection bias with asymmetry measures of the network. [16]
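For a pairwise comparison, the classical visual check is a funnel plot, as in the hedged sketch below with simulated effect sizes; in a network, an analogous comparison-adjusted plot is usually drawn after centring each study on its comparison's summary effect.

```r
# Quick funnel-plot sketch with the metafor package (simulated,
# hypothetical log odds ratios); asymmetry suggests small-study effects.
library(metafor)
set.seed(1)
k  <- 15
se <- runif(k, 0.05, 0.4)             # varying study precision
yi <- rnorm(k, mean = -0.2, sd = se)  # hypothetical effect estimates

res <- rma(yi = yi, sei = se, method = "REML")  # random-effects model
funnel(res)  # visual inspection of small-study asymmetry
```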

Sponsorship Bias and Network Meta-Analysis

The magnitude and direction of the observed effect in a multiple treatment comparison may occasionally be influenced by the sponsorship of a trial. [17] A comparison of A versus B may favor A when the trial is sponsored by the company that produces A, while it may favor B when the trial is sponsored by the company that produces B. Sponsorship bias may reflect subtle or less subtle differences in the study designs or in the conduct of a trial that favor the preferred drug. The impact of sponsorship bias may be even more complicated in the case of complex network analyses.


Role of Network Meta-Epidemiology

While conducting a network meta-analysis, there is scope to evaluate biases that can affect an entire research field. Network meta-epidemiology offers the opportunity to jointly study characteristics associated with the design of the studies (e.g. blinding). Regression bias coefficients estimated within a network can be assumed exchangeable across a collection of networks, extending the idea of meta-epidemiology [18] and providing large-scale evidence for potential sources of bias [16, 19]. Biases that are not identifiable in a head-to-head meta-analysis can thus be explored in a network meta-analysis. Optimism bias, associated with the use of novel interventions, has been a concern difficult to address in traditional pair-wise meta-analysis. However, in a network of interventions, the same treatment C can be the newer and, hence, the 'favored' treatment in a comparison AC, but the older one in another comparison BC. This allows the study of apparent changes in the effectiveness of C due to optimism. In one network meta-epidemiology model, three networks on different cancer treatments were linked to estimate the novelty bias effect. [16, 19]

Inconsistency and Bias

The presence of bias can manifest itself as inconsistency in the constructed network. Exploration of possible sources of inconsistency may enhance the understanding of risk of bias and help form a more stable network of comparisons, with greater validity of the resultant conclusions. The risk of bias may differ across different regions within the network of interventions being examined in multiple treatment comparisons. Future methodological research is needed to address the options for dealing with such variation in risk of bias between direct and indirect comparisons and across the network. Specifically, such research may examine the impact of the risk of bias in an individual trial on the network meta-analytic effect estimates, identify the biases specific to the network meta-analysis context that need to be considered, develop methods to assess, summarize and present the variation in risk of bias across the network, and use empirical research to formulate guidance for network meta-analysts on incorporating bias assessments in statistical analyses. Finally, methodological research may also examine whether network meta-analysis offers a potential method for identifying and adjusting for biases within included trials.

CONCLUSION

There are standardized methods to appraise evidence for the assessment of risk of bias in standard, frequentist, pair-wise meta-analyses. While some of the available methods may be used for the appraisal of evidence for network meta-analyses as well, complex networks may present unique challenges, as there is potential for multiplying any undetected bias and errors. Ongoing and future research should focus on specifically identifying these, and on presenting options for adjusting for them in the synthesis of the most rigorous evidence by way of network meta-analyses.

REFERENCES

[1] Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0. In: Higgins JPT, Green S, eds. The Cochrane Collaboration; 2011. Available from: www.cochranehandbook.org (last accessed on March 21, 2014).
[2] Balshem H, Helfand M, Schunemann HJ, Oxman AD, Kunz R, Brozek J, Vist GE, Falck-Ytter Y, Meerpohl J, Norris S, Guyatt GH. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol 2011; 64: 401-6.
[3] U.S. Preventive Services Task Force Procedure Manual. AHRQ Publication No. 08-05118-EF. Available at: http://www.uspreventiveservicestaskforce.org/uspstf08/methods/procmanual.htm (last accessed on March 21, 2014).
[4] Juni P, Altman DG, Egger M. Assessing the quality of controlled clinical trials. In: Egger M, Davey SG, Altman DG, eds. Systematic Reviews in Health Care: Meta-analysis in Context. London: BMJ Books; 2001.
[5] Lohr KN. Rating the strength of scientific evidence: relevance for quality improvement programs. Int J Qual Health Care 2004; 16: 9-18.
[6] Jadad AR, Enkin M. Randomized Controlled Trials: Questions, Answers and Musings. London: Blackwell; 2007.
[7] Guyatt GH, Oxman AD, Kunz R, Atkins D, Brozek J, Vist G, Alderson P, Glasziou P, Falck-Ytter Y, Schünemann HJ. GRADE guidelines: 2. Framing the question and deciding on important outcomes. J Clin Epidemiol 2011; 64: 395-400.
[8] Viswanathan M, Ansari MT, Berkman ND, Chang S, Hartling L, McPheeters LM, Santaguida PL, Shamliyan T, Singh K, Tsertsvadze A, Treadwell JR. Assessing the Risk of Bias of Individual Studies in Systematic Reviews of Health Care Interventions. Agency for Healthcare Research and Quality Methods Guide for Comparative Effectiveness Reviews. March 2012. AHRQ Publication No. 12-EHC047-EF. Available at: www.effectivehealthcare.ahrq.gov/ (last accessed on March 21, 2014).
[9] Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, Devereaux PJ, Montori VM, Freyschuss B, Vist G, Jaeschke R, Williams JW Jr, Murad MH, Sinclair D, Falck-Ytter Y, Meerpohl J, Whittington C, Thorlund K, Andrews J, Schünemann HJ. GRADE guidelines 6. Rating the quality of evidence - imprecision. J Clin Epidemiol 2011; 64: 1283-93.
[10] The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Available at: http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp (last accessed on March 21, 2014).
[11] Cochrane Collaboration Glossary Version 4.2.5. 2005. Available at: http://www.cochrane.org/sites/default/files/uploads/glossary.pdf (last accessed on March 21, 2014).
[12] Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, Alonso-Coello P, Djulbegovic B, Atkins D, Falck-Ytter Y, Williams JW Jr, Meerpohl J, Norris SL, Akl EA, Schünemann HJ. GRADE guidelines: 5. Rating the quality of evidence - publication bias. J Clin Epidemiol 2011; 64: 1277-82.
[15] Mills EJ, Ioannidis JP, Thorlund K, Schünemann HJ, Puhan MA, Guyatt GH. How to use an article reporting a multiple treatment comparison meta-analysis. JAMA 2012; 308: 1246-53.
[16] Dias S, Welton N, Marinho V, Salanti G, Ades A. Estimation and adjustment of bias in randomised evidence using mixed treatment comparison meta-analysis. J Royal Stat Soc (A) 2010; 173: 613-29.
[17] Moreno SG, Sutton AJ, Ades AE, Cooper NJ, Abrams KR. Adjusting for publication biases across similar interventions performed well when compared with gold standard data. J Clin Epidemiol 2011; 64: 1230-1241.
[18] Salanti G, Dias S, Welton NJ, Ades AE, Golfinopoulos V, Kyrgiou M, Mauri D, Ioannidis JP. Evaluating novel agent effects in multiple-treatments meta-regression. Stat Med 2010; 29: 2369-2383.
[19] Salanti G, Higgins JP, Ades AE, Ioannidis JP. Evaluation of networks of randomized trials. Stat Methods Med Res 2008; 17: 279-301.
[20] Sterne JA, Juni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in 'meta-epidemiological' research. Stat Med 2002; 21: 1513-1524.
[21] Salanti G. Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Res Synth Method 2012; 3: 80-97.


4TH SECTION


In: Network Meta-Analysis Editor: Giuseppe Biondi-Zoccai

ISBN: 978-1-63321-001-1 © 2014 Nova Science Publishers, Inc.

Chapter 7

CHOOSING BETWEEN FREQUENTIST AND BAYESIAN FRAMEWORKS AND THE CORRESPONDING STATISTICAL PACKAGE

Giuseppe Biondi-Zoccai, M.D.1,* and Fabrizio D'Ascenzo, M.D.2

1 Assistant Professor in Cardiology, Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Latina, Italy
2 Fellow, Division of Cardiology, Department of Internal Medicine, Città della Salute e della Scienza, Turin, Italy

ABSTRACT

Once the decision to pursue evidence synthesis with adjusted indirect comparison, mixed treatment comparison, or network meta-analysis has been made, two key choices need to be made. First, the statistical framework and approach, frequentist versus Bayesian, must be selected. Afterwards, and accordingly, the suitable computing package must be identified and exploited. Historically, the frequentist approach was the first to be adopted for evidence synthesis, with computation tools initially based on simple means, in keeping with the limitations in computing power of the past. More recently, several alternative statistical packages have become available, based on frequentist or Bayesian approaches, enabling simple as well as highly sophisticated analyses for evidence synthesis. This Chapter will highlight the pros and cons of the frequentist and Bayesian approaches, as well as the key features of the main statistical packages developed to date for network meta-analysis. We hope to make clear to the reader that it is important to emphasize similarities rather than differences among different approaches and tools, reconciling their apparent discrepancies. The ultimate goal is the sound and careful application of the results stemming from any given approach or package for clinically relevant evidence synthesis and decision making, rather than the explicit yet uncritical preference of one of them.

* Corresponding author: Giuseppe Biondi-Zoccai, MD, Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Corso della Repubblica 79, 04100 Latina, Italy. Phone: +39 07731757245. Fax: +39 07731757254. Email: [email protected].

Keywords: Adjusted indirect comparison, Bayesian inference, Bayesian statistics, frequentist inference, frequentist statistics, mixed treatment comparison, network meta-analysis, statistical package, statistical software

INTRODUCTION

Decision making can be informed by different factors. In the era of evidence based medicine, the best quality evidence base should guide decision making, in keeping with patient preferences, physician skills, and system resources [1]. There is debate on whether large pragmatic randomized trials or systematic reviews of homogeneous and similar randomized trials represent the uppermost rung in the hierarchy of evidence [2]. Yet, in several instances there may be persisting uncertainty on which treatment is the most beneficial and safe for a given condition [3]. Whether any single trial can address any given issue in a conclusive fashion, and whether we can truly pose as uninformed about prior probabilities when interpreting a trial (equipoise), also often prove challenging questions [4-5]. On top of this, the scenario is becoming ever more common and complex, as several similar treatments are introduced over time in the healthcare marketplace for the very same condition, making it materially impossible to obtain precise and accurate comparative estimates from large and pragmatic randomized trials. Adjusted indirect comparisons [6-7], network meta-analyses [8], multivariate/multi-treatment meta-analyses [9], and mixed treatment comparisons [10] have been developed with the precise goal of overcoming the limitations of the currently fragmented and asymmetric evidence base stemming from head-to-head randomized trials (Figure 1). Accordingly, they hold the promise of providing more precise and more accurate effect estimates, while also enabling additional insightful analyses (e.g. exploration of bias sources, moderator effects, and so forth) [11]. In addition, they may also be less biased and more valid [12]. The leitmotiv of this book is indeed that network meta-analysis and similar approaches are highly beneficial, if appropriately used and interpreted, in empowering decision-makers. Pragmatically, anyone wishing to pursue this type of evidence synthesis must however become acquainted with the statistical and probability environment in which the analyses will take place, and with the actual instrument which will be used to carry out the necessary computations. In other words, in order to complete a network meta-analysis we must first choose between a frequentist and a Bayesian conceptual framework. Then, the most appropriate statistical package must be identified. This Chapter focuses on these two important and interconnected topics. Subsequent chapters will inform the reader on how to proceed further in a safe and successful fashion once these two crucial cross-roads have been overcome.


Figure 1. Conceptual differences between pairwise meta-analysis, adjusted indirect comparison meta-analysis, mixed treatment comparison, and multiple treatment comparison. A, B, C and D represent alternative treatments compared in one or more head-to-head randomized trials (continuous lines), with ensuing meta-analytic estimates based on direct, indirect or network/mixed methods (dashed lines).

CHOOSING THE STATISTICAL FRAMEWORK

Frequentist Framework

The history of meta-analysis has repeated itself, [13] with the first pioneering efforts at network meta-analysis being both conceived and conducted within frequentist realms [14]. Specifically, Bucher and colleagues developed in the late '90s their original concept of adjusted indirect comparison within a frequentist and mainly fixed-effects context, [6-7] whereas Lumley et al. first proposed a comprehensive and thorough frequentist network meta-analytic approach to evidence synthesis (and actually originally devised the very term "network meta-analysis") [8, 15]. The frequentist approach will indeed prove appealing for those who already have a working knowledge of basic or advanced biostatistics (Table 1). However, at closer look frequentist inference is only apparently simple and straightforward, as it relies on the complex and counterintuitive concepts of hypothesis testing, the null hypothesis (H0), the alternative hypothesis (H1), type I error and type II error. Even confidence intervals (CI), which have simplified the interpretation of frequentist inference and empowered end-users in applying it for decision-making, are often misunderstood as direct probability statements [16].

Table 1. Differences and similarities between frequentist and Bayesian approaches for network meta-analysis

Prior information
- Frequentist framework: used only informally
- Bayesian framework: used formally, by specifying a prior probability distribution

Basic interpretation
- Frequentist framework: how likely can the data at hand have occurred by simple random variability, given a specific parameter value?
- Bayesian framework: how likely is a specific parameter value, given the data at hand?

Presentation of results
- Frequentist framework: likelihood functions, p values, confidence intervals
- Bayesian framework: posterior probability distributions, credibility intervals, probability of being best/worst, surface under the cumulative ranking area

Caveat
- Frequentist framework: p values and confidence intervals often misinterpreted as, respectively, the probability that the alternative hypothesis (H1) is true and direct probability statements regarding the summary effect estimate
- Bayesian framework: priors may be difficult to generate; non-informative priors may occasionally be sufficiently informative to impact on posterior probability statements; the subjective component is often overemphasized uncritically by readers, thus undermining the overall set of analyses

Additional features
- Frequentist framework: model fit and efficiency appraised with the Akaike information or similar criteria
- Bayesian framework: easily incorporates skepticism or expert opinion; can adjust for baseline risk; can model specialized likelihood functions; model fit and efficiency appraised with the deviance information criterion

Complimentary Contributor Copy

Choosing between Frequentist and Bayesian Frameworks …

107

Bayesian Framework In keeping with Spiegelhalter et al., Bayesian approaches may be generally ―defined as the explicit quantitative use of external evidence in the design, monitoring, analysis, interpretation and reporting‖ of evidence [19-20]. This is based on the combined use by means of the Bayes theorem of: a) a prior probability distribution based on some external evidence or subjective assessment, b) a likelihood estimate based on the real information at hand, and c) a posterior probability distribution which combines a) and b), finally enabling decision-making. Notably, if a) is considered irrelevant and completely uninformative, than c) basically equals b) and such Bayesian inference will tend to correspond to frequentist inference. The first approach to complex evidence synthesis with Bayesian means is probably best attributed to Ades and colleagues, [10, 21] who built upon the pioneering confidence profile method developed by Eddy et al., for pairwise analyses [22]. Subsequent developments and applications for Bayesian network meta-analysis have been momentous, and most of mixed treatment comparisons published to date in the scholarly literature have been performed in such context [23]. A strategic contribution to the systematization and clarification of how best exploit Bayesian methods for mixed treatment comparisons has been achieved by the Bristol and Leicester group lead by Ades and Abrams [24-31]. The obvious difference between frequentist and Bayesian methods, and the one which really identifies the latter set of inferential mean, is the incorporation of prior beliefs in the form of a prior probability distribution into the final inferential estimates [20, 32]. This is both a blessing and a curse, as no one is really uninformed when planning an analysis, and different vague priors may significantly impact on the final results of the analysis [33]. Conversely, flexibility in the choice of priors and inclusion of expert opinions do make Bayesian meta-analyses capable of incorporating varying degrees of uncertainty and skepticism, [34-35] which may safeguard from the overwhelming risk of false positive research findings in modern biomedical and psychological research [36]. The other major distinctive feature of Bayesians methods as far as network meta-analysis is concerned is the direct interpretation and usage of probabilistic estimates for modeling and decision making purposes, [37] and the ranking of treatments (with probability of being best, being worst, or the surface under the cumulative ranking area [SUCRA] curve) [17]. Other additional advantages of the Bayesian framework also include the ease with which specialized likelihoods can be modeled (e.g. binomial, multinomial and Poisson), the suitability for metaregression exploring the impact of baseline risk, [38-40] and the inferential robustness even there are few events or zero count cells, as well as outliers [41]. In summary, as emphatically stated by Carlin et al., [42] and corroborated by Jonas and colleagues and Song et al., [14, 43] by providing probabilities of being best of any set of chosen treatments, Bayesian methods may appear in comparison to frequentist approaches ―more flexible and their results more clinically interpretable‖. However, they do seem to ―require more careful development and specialized software.‖ [42] Specifically, Song et al., have recently provided insightful data from a simulation study appraising the impact of

Complimentary Contributor Copy

Giuseppe Biondi-Zoccai and Fabrizio D‘Ascenzo

108

different type of bias in primary studies on the results of direct comparisons, adjusted indirect comparisons, frequentist network meta-analyses, and Bayesian meta-analyses, based on either consistency or inconsistency models [14]. The results were not univocally in favor of one of the methods, but clearly frequentist and Bayesian network meta-analysis methods exploiting consistency models appeared less likely than the other approaches to be fully biased depending on the different biasing scenarios, while concomitantly maximizing precision (Table 2). An important disclaimer is that all simulations and analyses in this work were performed with R (and the related RJAGS package). Table 2. Comparative precision and bias estimates for different approaches at evidence synthesis in a comprehensive simulation study Evidence synthesis methods Direct comparison

Precision (1/mean square error) Moderate

AIC

Low

Trials not biased Not biased Not biased Not biased Not biased Not biased

Actual true biases* All trials One set of similarly AIC trials biased biased Fully biased Not biased

Direct comparison trials biased Fully biased

Not biased

Not biased

Fully biased

Consistency High Moderately Moderately Moderately frequentist NMA biased biased biased Consistency High Moderately Moderately Moderately Bayesian NMA biased biased biased Inconsistency Moderate Fully biased Not biased Fully biased Bayesian metaanalysis Random Moderate Not Fully biased Not biased Fully biased inconsistency biased Bayesian NMA * fully biased: the bias equals the bias in trials; moderately biased: as a result of combining biased direct estimate and unbiased indirect estimate, or a result of combining unbiased direct estimate and biased indirect estimate); AIC=adjusted indirect comparison; NMA=network meta-analysis. Modified from Song et al. [12]

One key strength of Bayesian analysis if of course the explicit use of priors. Typically, however, authors of reports based on Bayesian methods rely on substantially non-informative priors. Whether this approach is really objective and non-informative or may be a source of imprecision or bias may depend on several conditions and vary in importance on a case by case basis [GAJIC]. The best approach is always to frankly disclose which priors have been used and why. In addition, sensitivity analyses with different priors are often helpful to demonstrate that results are robust and stable irrespective of changes in priors and other analytical choices. Despite the above apparent yet often overestimated difficulties in implementing Bayesian analysis for evidence synthesis and other decision making applications, the use of Bayesian methods is increasing [45]. Probably two factors can best explain this phenomenon: the increased availability of suitable software and powerful computers, and the widening

Complimentary Contributor Copy

Choosing between Frequentist and Bayesian Frameworks …

109

perception that Bayesian methods may offer effective and explicit means to factor into inferential processes both prior information and actual experimental data.

CHOOSING THE STATISTICAL PACKAGE All-Purpose Packages Several comprehensive packages can be used for network meta-analysis, with most of the evidence in favor of WinBUGS, R, and SAS (Table 3) [46-48]. Table 3. Available statistical packages for adjusted indirect comparisons and network meta-analyses Statistical package ADDIS/GeMTC

Framework

Pros

Cons

Bayesian

ITC

Frequentist (only adjusted indirect comparison allowed) Bayesian (through R)

User-friendly, reliance on established analytical approaches for analysis and assumption checks User-friendly, reliance on established analytical approaches for analysis and assumption checks User-friendly, reliance on established analytical approaches for analysis and assumption checks Extensive flexibility, high quality graphs, cost Extensive flexibility, high quality graphs, Extensive flexibility

Limited space for modeling; relatively few analytical and graphical options available Limited space for modeling; relatively few analytical and graphical options available Limited space for modeling; relatively few analytical and graphical options available Limited userfriendliness Cost, limited userfriendliness Cost, limited userfriendliness Cost, limited modeling flexibility, limited userfriendliness Limited userfriendliness, limited graphical capabilities

Open MetaAnalyst

R

Stata

Both frequentist and Bayesian Both frequentist and Bayesian Both frequentist and Bayesian Frequentist

WinBUGS

Bayesian

S-PLUS SAS

High quality graphs, ancillary analyses available Extensive flexibility, cost

WinBUGS is the Windows-compatible version of BUGS (Bayesian Inference Using Gibbs Sampling) [49-50]. Thanks to its freeware nature, flexibility and potential for extensive Bayesian modeling relying on Markov chain Monte Carlo (MCMC) methods, it has become the most commonly used package for network meta-analysis and mixed treatment comparisons [23]. Indeed, complex hierarchical models can be directly and explicitly specified and fitted in WinBUGS, with inference then obtained with MCMC and fit appraised by means of the deviance information criterion (DIC) [51]. Nonetheless, it is not devoid of potential drawbacks and requires substantial expertise for valid modeling and interpretation [52]. On the bright side, most relevant codes are available either on the internet or in the pertinent scholarly articles [25-31, 46-48, 53]. Notably, WinBUGS can also be called upon from other packages, such as Matlab [54], R [55], SAS [56], and Stata [57], enabling even more flexibility and sophistication in data analysis. Similar considerations to those pertinent to WinBUGS apply to other Bayesian all-purpose packages such as OpenBUGS and JAGS.

R is probably the most comprehensive and flexible all-purpose package, enabling frequentist as well as Bayesian analyses (e.g. with RJAGS), either directly or through WinBUGS queries [14]. It has the unique set of advantages of being freeware, and very potent and robust in terms of both analytics and graphics [58]. Historically it has been used less commonly than WinBUGS in published mixed treatment comparisons, but its uptake is projected to increase dramatically with the development of user-friendly interfaces.

SAS is a very powerful and flexible computational environment, and several useful models and codes have been developed to enable it to conduct network meta-analysis in both frequentist and Bayesian frameworks [13]. Its results, when the comparison is limited to Bayesian inference, appear highly concordant with those provided by WinBUGS [13]. Nonetheless, its proprietary nature and the expertise required to master it may lead to a slower uptake and adoption for network meta-analysis in the foreseeable future.

Stata is another potent all-purpose proprietary package, typically exploiting a frequentist framework, but it also has a large library of very useful meta-analytic routines and high quality graphical capabilities. Most recently, several codes dedicated to network and multivariate meta-analysis have been provided (e.g. mvmeta).

Finally, S-PLUS is seldom used for network meta-analysis, given its proprietary nature, cost and complexity, but being the natural parent of R it is of course very potent in analytical as well as graphical capabilities [59].

User-Friendly Packages

Several user-friendly packages have been developed, or are under construction, to enable even clinicians with limited statistical expertise to perform network meta-analysis. While we favor this course of events, as it will increase the dissemination of this research design and empower end-users and decision-makers, caution should be exercised to avoid superficiality in performing the rather complex set of procedures that leads to a network meta-analysis. The main user-friendly programs currently available are GeMTC, Open Meta-Analyst, and ITC.

ADDIS (Aggregate Data Drug Information System) is a novel open-source package developed by researchers at the University of Groningen in the Netherlands [60]. It is open-source software and can be downloaded for free. It estimates probabilities and other key parameters with Markov chain Monte Carlo (MCMC) methods within a Bayesian framework. Its analytical engine is GeMTC, which uses R to reach BUGS/JAGS and generate Bayesian probability estimates [61]. It appealingly generates and adapts suitable BUGS/JAGS code based on the data provided by the final user.
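Since the GeMTC engine is also available as a stand-alone R package, a hedged sketch of a Bayesian network meta-analysis run through it is given below; the arm-level binary data are entirely hypothetical, and argument names reflect our understanding of the package's interface at the time of writing.

```r
# Sketch of a Bayesian random-effects network meta-analysis with the
# gemtc R package (the engine underlying ADDIS), on hypothetical data.
library(gemtc)

data.ab <- data.frame(
  study      = c("s01", "s01", "s02", "s02", "s03", "s03"),
  treatment  = c("A", "B", "A", "C", "B", "C"),
  responders = c(12, 15, 20, 14, 18, 21),
  sampleSize = c(100, 100, 120, 120, 110, 110)
)

network <- mtc.network(data.ab = data.ab)
model   <- mtc.model(network, likelihood = "binom", link = "logit",
                     linearModel = "random")   # random-effects NMA
results <- mtc.run(model, n.adapt = 5000, n.iter = 20000)

summary(results)           # posterior summaries of the relative effects
rank.probability(results)  # probability of each treatment being best, etc.
```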


Open Meta-Analyst is an open-source package enabling simple as well as complex analyses in both frequentist and Bayesian frameworks by remotely accessing R [62]. Finally, ITC is a user-friendly program that enables straightforward adjusted indirect comparisons based on the Bucher method [63].
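To make the Bucher method concrete, the following is a minimal R sketch (ours, not the code used by ITC or by the Excel routine mentioned below) of an adjusted indirect comparison on the log odds ratio scale: the indirect estimate for A vs. C through common comparator B is the difference of the two direct estimates, and their standard errors add in quadrature. The input numbers are purely hypothetical.

```r
# Bucher adjusted indirect comparison, given direct log odds ratio estimates
# of A vs B and C vs B (common comparator B).
bucher_indirect <- function(lor_AB, se_AB, lor_CB, se_CB) {
  lor_AC <- lor_AB - lor_CB               # indirect point estimate of A vs C
  se_AC  <- sqrt(se_AB^2 + se_CB^2)       # variances of independent estimates add
  ci     <- lor_AC + c(-1, 1) * 1.96 * se_AC
  list(logOR = lor_AC, se = se_AC, OR = exp(lor_AC), CI_OR = exp(ci))
}

# Hypothetical inputs: A vs B logOR -0.30 (SE 0.10); C vs B logOR -0.10 (SE 0.12)
bucher_indirect(-0.30, 0.10, -0.10, 0.12)
```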

PRACTICAL RECOMMENDATIONS

At the beginning of our experience we relied only on the frequentist framework and the Bucher method, performing computations with an Excel routine we had developed ad hoc [64]. Nowadays we perform most of our analyses within a Bayesian framework using WinBUGS, but remain open to frequentist calculations and other computational tools on a case-by-case basis. In the future, we think that the choice of framework and package will not be crucial, but a valid approach will always be required, preferably with extensive sensitivity analyses, for instance based on the alternative framework or other statistical programs.

Accordingly, we may infer that inexperienced users wishing to synthesize a complex evidence base might benefit from user-friendly packages with established analytical algorithms and assumption checks, such as ADDIS/GeMTC or Open Meta-Analyst. However, they should still bear in mind the need for careful use of such apparently fool-proof packages and for careful interpretation of the results stemming from them. More experienced users will obviously benefit from more versatile and flexible packages such as WinBUGS, R, SAS, and Stata, which enable customized modeling and complex types of analyses. However, knowledge is not always synonymous with wisdom, and even in this setting caution should be exercised to avoid superficiality in conducting and interpreting the analyses.

More specifically, it is clear that network meta-analytic approaches and mixed treatment comparison methods are generally in agreement with adjusted indirect comparison techniques or less flexible approaches [63, 65]. However, in several important instances, and especially when the evidence network is at least moderately complex, adjusted indirect comparisons may be less precise and less accurate than the other methods. In addition, even the remarkably flexible and conservative Bayesian methods may occasionally face the risk of inaccuracy for specific closed-loop patterns [43]. Thus, no single winner can be identified in the battle for the best framework or package.

CONCLUSION

The choice of a specific statistical framework or computational method for network meta-analysis is less important than a thorough acquaintance with its pros and cons, its correct area of application, and the means of interpreting its results [66, 67]. Indeed, a cautious yet pragmatic stance is recommended whenever a network meta-analysis is conducted or perused, whatever methods are employed to perform it.


REFERENCES

[1] Guyatt GH, Rennie D, editors. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Chicago: AMA Press; 2002.
[2] Biondi-Zoccai G, Landoni G, Modena MG. A journey into clinical evidence: from case reports to mixed treatment comparisons. HSR Proc Intensive Care Cardiovasc Anesth 2011; 3: 93-6.
[3] Greco T, Zangrillo A, Biondi-Zoccai G, Landoni G. Meta-analysis: pitfalls and hints. Heart Lung Vessel 2013; 5: 219-25.
[4] Spiegelhalter DJ, Freedman LS, Parmar MK. Applying Bayesian ideas in drug development and clinical trials. Stat Med 1993; 12: 1501-11.
[5] Guyatt GH, Mills EJ, Elbourne D. In the era of systematic reviews, does the size of an individual trial still matter? PLoS Med 2008; 5: e4.
[6] Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol 1997; 50: 683-91.
[7] Song F, Altman DG, Glenny AM, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ 2003; 326: 472.
[8] Lumley T. Network meta-analysis for indirect treatment comparisons. Stat Med 2002; 21: 2313-24.
[9] Chootrakool H, Shi JQ. Meta-analysis of multi-arm trials using empirical logistic transform. The Open Medical Informatics Journal 2008; 2: 112-16.
[10] Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med 2004; 23: 3105-24.
[11] Biondi-Zoccai G, Lotrionte M, Landoni G, Modena MG. The rough guide to systematic reviews and meta-analyses. HSR Proc Intensive Care Cardiovasc Anesth 2011; 3: 161-73.
[12] Song F, Harvey I, Lilford R. Adjusted indirect comparison may be less biased than direct comparison for evaluating new pharmaceutical interventions. J Clin Epidemiol 2008; 61: 455-63.
[13] Jones B, Roger J, Lane PW, Lawton A, Fletcher C, Cappelleri JC, Tate H, Moneuse P; PSI Health Technology Special Interest Group, Evidence Synthesis sub-team. Statistical approaches for conducting network meta-analysis in drug development. Pharm Stat 2011; 10: 523-31.
[14] Song F, Clark A, Bachmann MO, Maas J. Simulation evaluation of statistical properties of methods for indirect and mixed treatment comparisons. BMC Med Res Methodol 2012; 12: 138.
[15] Psaty BM, Lumley T, Furberg CD, Schellenbaum G, Pahor M, Alderman MH, Weiss NS. Health outcomes associated with various antihypertensive therapies used as first-line agents: a network meta-analysis. JAMA 2003; 289: 2534-44.
[16] Altman D, Machin D, Bryant T, Gardner S. Statistics with Confidence. 2nd edition. London: Wiley-Blackwell; 2000.
[17] Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, Lee K, Boersma C, Annemans L, Cappelleri JC. Interpreting indirect treatment comparisons and network meta-analysis for health-care decision making: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 1. Value Health 2011; 14: 417-28.
[18] Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med 1999; 130: 995-1004.
[19] Spiegelhalter DJ, Myles JP, Jones DR, Abrams KR. Methods in health service research. An introduction to bayesian methods in health technology assessment. BMJ 1999; 319: 508-12.
[20] Spiegelhalter DJ, Myles JP, Jones DR, Abrams KR. Bayesian methods in health technology assessment: a review. Health Technol Assess 2000; 4: 1-130.
[21] Ades AE. A chain of evidence with mixed comparisons: models for multi-parameter synthesis and consistency of evidence. Stat Med 2003; 22: 2995-3016.
[22] Eddy DM, Hasselblad V, Shachter R. Meta-analysis by the confidence profile method: the statistical synthesis of evidence. Boston: Academic Press; 1992.
[23] Coleman CI, Phung OJ, Cappelleri JC, Baker WL, Kluger J, White CM, Sobieraj DM. Use of Mixed Treatment Comparisons in Systematic Reviews. Rockville: Agency for Healthcare Research and Quality; 2012.
[24] Ades AE, Welton NJ, Caldwell D, Price M, Goubar A, Lu G. Multiparameter evidence synthesis in epidemiology and medical decision-making. J Health Serv Res Policy 2008; 13 Suppl 3: 12-22.
[25] Dias S, Welton NJ, Sutton AJ, Ades AE. Evidence synthesis for decision making 1: introduction. Med Decis Making 2013; 33: 597-606.
[26] Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med Decis Making 2013; 33: 607-17.
[27] Dias S, Sutton AJ, Welton NJ, Ades AE. Evidence synthesis for decision making 3: heterogeneity--subgroups, meta-regression, bias, and bias-adjustment. Med Decis Making 2013; 33: 618-40.
[28] Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Med Decis Making 2013; 33: 641-56.
[29] Dias S, Welton NJ, Sutton AJ, Ades AE. Evidence synthesis for decision making 5: the baseline natural history model. Med Decis Making 2013; 33: 657-70.
[30] Dias S, Sutton AJ, Welton NJ, Ades AE. Evidence synthesis for decision making 6: embedding evidence synthesis in probabilistic cost-effectiveness analysis. Med Decis Making 2013; 33: 671-8.
[31] Ades AE, Caldwell DM, Reken S, Welton NJ, Sutton AJ, Dias S. Evidence synthesis for decision making 7: a reviewer's checklist. Med Decis Making 2013; 33: 679-91.
[32] Higgins JP, Whitehead A. Borrowing strength from external trials in a meta-analysis. Stat Med 1996; 15: 2733-49.
[33] Lambert PC, Sutton AJ, Burton PR, Abrams KR, Jones DR. How vague is vague? A simulation study of the impact of the use of vague prior distributions in MCMC using WinBUGS. Stat Med 2005; 24: 2401-28.
[34] Higgins JP, Spiegelhalter DJ. Being sceptical about meta-analyses: a Bayesian perspective on magnesium trials in myocardial infarction. Int J Epidemiol 2002; 31: 96-104.


[35] Youn JH, Lord J, Hemming K, Girling A, Buxton M. Bayesian meta-analysis on medical devices: application to implantable cardioverter defibrillators. Int J Technol Assess Health Care 2012; 28: 115-24.
[36] Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med 2005; 2: e124.
[37] Warn DE, Thompson SG, Spiegelhalter DJ. Bayesian random effects meta-analysis of trials with binary outcomes: methods for the absolute risk difference and relative risk scales. Stat Med 2002; 21: 1601-23.
[38] Thompson SG, Smith TC, Sharp SJ. Investigating underlying risk as a source of heterogeneity in meta-analysis. Stat Med 1997; 16: 2741-58.
[39] Sharp SJ, Thompson SG. Analysing the relationship between treatment effect and underlying risk in meta-analysis: comparison and development of approaches. Stat Med 2000; 19: 3251-74.
[40] Achana FA, Cooper NJ, Dias S, Lu G, Rice SJ, Kendrick D, Sutton AJ. Extending methods for investigating the relationship between treatment effect and baseline risk from pairwise meta-analysis to network meta-analysis. Stat Med 2013; 32: 752-71.
[41] Baker R, Jackson D. A new approach to outliers in meta-analysis. Health Care Manag Sci 2008; 11: 121-31.
[42] Carlin BP, Hong H, Shamliyan TA, Sainfort F, Kane RL. Case Study Comparing Bayesian and Frequentist Approaches for Multiple Treatment Comparisons. Rockville: Agency for Healthcare Research and Quality; 2013.
[43] Jonas DE, Wilkins TM, Bangdiwala S, Bann CM, Morgan LC, Thaler KJ, Amick HR, Gartlehner G. Findings of Bayesian Mixed Treatment Comparison Meta-Analyses: Comparison and Exploration Using Real-World Trial Data and Simulation. Rockville: Agency for Healthcare Research and Quality; 2013.
[44] Gajic-Veljanoski O, Cheung AM, Bayoumi AM, Tomlinson G. The choice of a noninformative prior on between-study variance strongly affects predictions of future treatment effect. Med Decis Making 2013; 33: 356-68.
[45] Cooper NJ, Spiegelhalter D, Bujkiewicz S, Dequen P, Sutton AJ. Use of implicit and explicit bayesian methods in health technology assessment. Int J Technol Assess Health Care 2013; 29: 336-42.
[46] A Network Meta-Analysis Toolkit. Comparing Multiple Interventions Methods Group. A Methods Group of the Cochrane Collaboration. Available from: http://cmimg.cochrane.org/network-meta-analysis-toolkit (last accessed on March 31, 2014).
[47] Mixed treatment comparison software. University of Ioannina, Ioannina, Greece. Available from: http://www.dhe.med.uoi.gr/software.htm (last accessed on March 31, 2014).
[48] Programs & Code for Mixed Treatment Comparisons. University of Bristol, Bristol, UK. Available from: http://www.bristol.ac.uk/social-community-medicine/projects/mpes/code/ (last accessed on March 31, 2014).
[49] Bayesian Analysis Using Gibbs Sampling (BUGS). Available from: http://www.mrc-bsu.cam.ac.uk/software/bugs/ (last accessed on March 31, 2014).
[50] Fryback DG, Stout NK, Rosenberg MA. An elementary introduction to Bayesian computing using WinBUGS. Int J Technol Assess Health Care 2001; 17: 98-113.
[51] Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc B 2002; 64: 583-639.


[52] Lunn D, Spiegelhalter D, Thomas A, Best N. The BUGS project: Evolution, critique and future directions. Stat Med 2009; 28: 3049-67.
[53] Greco T, Landoni G, Biondi-Zoccai G, D'Ascenzo F, Zangrillo A. A Bayesian network meta-analysis for binary outcome: how to do it. Stat Methods Med Res 2013 Oct 28 [Epub ahead of print] doi: 10.1177/0962280213500185.
[54] MATBUGS. Available from: http://code.google.com/p/matbugs/ (last accessed on March 31, 2014).
[55] R2WinBUGS. Available from: http://cran.r-project.org/web/packages/R2WinBUGS/index.html (last accessed on March 31, 2014).
[56] Running WinBUGS through SAS. Available from: http://www2.mrc-bsu.cam.ac.uk/bugs/winbugs/remoteSAS.html (last accessed on March 31, 2014).
[57] WinBUGS & Stata. Available from: http://www.personal.leeds.ac.uk/~hssdg/Stata/index.htm (last accessed on March 31, 2014).
[58] Tan SH, Cooper NJ, Bujkiewicz S, Welton NJ, Caldwell DM, Sutton AJ. Novel presentational approaches were developed for reporting network meta-analysis. J Clin Epidemiol 2014 Feb 19 [Epub ahead of print] doi: 10.1016/j.jclinepi.2013.11.006.
[59] Salanti G, Ades AE, Ioannidis JP. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. J Clin Epidemiol 2011; 64: 163-71.
[60] van Valkenhoef G, Tervonen T, Zwinkels T, de Brock B, Hillege H. ADDIS: a decision support system for evidence-based medicine. Decision Support Systems 2013; 55: 459-75.
[61] GeMTC. Available from: http://drugis.org/gemtc (last accessed on March 31, 2014).
[62] Open Meta-Analyst. Available from: http://www.cebm.brown.edu/open_meta (last accessed on March 31, 2014).
[63] Wells GA, Sultan SA, Chen L, Khan M, Coyle D. Indirect evidence: indirect treatment comparisons in meta-analysis. Ottawa: Canadian Agency for Drugs and Technologies in Health; 2009.
[64] Indirect Meta-analysis Tool. METCARDIO. Available from: http://www.metcardio.org/macros/IMT.xls (last accessed on March 31, 2014).
[65] O'Regan C, Ghement I, Eyawo O, Guyatt GH, Mills EJ. Incorporating multiple interventions in meta-analysis: an evaluation of the mixed treatment comparison with the adjusted indirect comparison. Trials 2009; 10: 86.
[66] Graham PL, Moran JL. Robust meta-analytic conclusions mandate the provision of prediction intervals in meta-analysis summaries. J Clin Epidemiol 2012; 65: 503-10.
[67] D'Ascenzo F, Biondi-Zoccai G. Network meta-analyses: the "white whale" for cardiovascular specialists. J Cardiothorac Vasc Anesth 2014; 28: 169-73.


In: Network Meta-Analysis. Editor: Giuseppe Biondi-Zoccai.
ISBN: 978-1-63321-001-1. © 2014 Nova Science Publishers, Inc.

Chapter 8

CHOOSING THE STATISTICAL MODEL AND BETWEEN FIXED AND RANDOM EFFECTS

Joseph Beyene, Ph.D.1,*, Ashley Bonner, M.Sc.2,† and Binod Neupane, Ph.D.3,‡

1 Associate Professor of Biostatistics, Program in Population Genomics, Department of Clinical Epidemiology & Biostatistics, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
2 PhD Candidate, McMaster University, Hamilton, Ontario, Canada
3 Research Associate, McMaster University, Hamilton, Ontario, Canada

* Corresponding author: Joseph Beyene, PhD, McMaster University, 1280 Main Street West, MDCL 3211, Hamilton, ON L8S 4K1, Canada. Phone: +1 905-525-9140 x 21333. Fax: +1 905-528-2814. Email: [email protected].
† Email: [email protected].
‡ Email: [email protected].

FUNDING DISCLOSURE

Joseph Beyene would like to acknowledge funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Institutes of Health Research (CIHR).

ABSTRACT

The choice of statistical model used in network meta-analysis (NMA) primarily depends on the type of outcome measured in the trials. In this chapter, we describe the unified generalized linear model (GLM) framework that can handle a wide range of outcomes, including those derived from binary, count, and continuous data types. We present both the fixed and random effects modelling approaches within the GLM framework, explain corresponding assumptions in the context of NMA, and discuss methodological issues and strategies on how to approach the decision between choosing fixed or random effects models. Along with the GLM framework, we present the Bayesian approach for inference, model fit assessment, and ranking treatments. Finally, we apply the Bayesian GLM models to three publicly available datasets, with binary, count, and continuous outcomes, and demonstrate the use of deviance statistics to assess model fit for both fixed and random effects models.


Keywords: Bayesian models, fixed effects, generalized linear models, heterogeneity, model assessment, random effects, rank probabilities

1. INTRODUCTION

Traditional pairwise meta-analysis is a statistical method that enables the quantitative synthesis of effect estimates from multiple studies that compared the same two interventions. When performed in the context of systematic reviews, results from meta-analyses can heavily influence clinical decision-making. However, because pairwise meta-analysis is restricted to studies that directly compare two treatments of interest, its utility is limited in most health research areas. When many treatment options exist, pairwise meta-analysis cannot answer natural questions such as which treatment is best, second best, and so on. Additionally, pairwise meta-analysis does not consider indirect sources of evidence.

Network meta-analysis (NMA), also known as multiple treatment meta-analysis (MTM) or mixed treatment comparison (MTC), is a statistical framework that can simultaneously incorporate direct and indirect evidence and rank treatments in terms of overall effectiveness [1-4]. When both direct and indirect evidence are 'mixed' together, the resulting estimate for a comparison is expected to have better precision than that obtained from direct evidence alone [5]. Due to its potential to guide clinical decisions, NMA has become increasingly popular and has been applied in many areas of health research, including cardiovascular disease, oncology, and rheumatology [6-8].

Though some practical limitations of pairwise meta-analysis are overcome with NMA, important statistical considerations still remain. The model choice for NMA depends on the outcome of interest. NMA can be applied to synthesize effect measures that are calculated from a variety of underlying data structures. Three outcome types are often encountered in practice and will be covered in this chapter: binary, continuous, and count. For binary variables (e.g., 'Yes' or 'No', 'Dead' or 'Alive', 'Infected' or 'Not infected'), odds ratios (ORs), relative risks (RRs), or risk differences (RDs) are typically used as the effect measure to compare treatment groups in each study. For continuous variables (e.g., systolic blood pressure (mmHg), fat mass (kg)), the mean difference (MD), standardized mean difference (SMD), or ratio of means (RoM) can be used as effect measures. For count variables (e.g., number of falls per unit time, number of infection flares per unit time), the rate ratio is used as an effect measure. Another outcome type that is often used in disciplines such as oncology is time-to-event, or survival, data. Although we will not discuss this outcome in detail, the general methodological principles are similar to those for the other data types. Corresponding distributional assumptions that reflect the nature of the underlying variables dictate the statistical modelling framework that is used to estimate the relative treatment effects.

As is the case with traditional meta-analysis, the choice between fixed or random effects is also important in NMA. The fixed effect assumption in NMA coincides with the belief that,


for each pair of treatments, the set of studies that directly compare the same two treatments are estimating the same underlying treatment effect. This assumption may be unreasonable for a set of studies that, although comparing the same two treatments, have substantial differences in their attributes (e.g., study design, patient sample characteristics). As with meta-analysis, this is reflective of heterogeneity in the true treatment effect across studies and must be modeled differently. The random effects assumption in NMA coincides with the belief that, for each pair of treatments, the set of studies that directly compare the same two treatments have a distribution for the underlying treatment effects. Whether to assume fixed or random effects in the NMA framework is a critical decision, as it will dictate the statistical models and can influence downstream conclusions.

In this chapter, we present and discuss statistical models appropriate for modeling outcomes obtained from binary, continuous, and count variables, as well as the distinction between fixed and random effects models. We also explain the use of model fit assessment measures to validate the models and help choose between fixed and random effects. All models in this chapter adopt the Bayesian framework (see Chapter 11) because its downstream analytic and graphical tools are more useful for making clinical decisions than those available under the frequentist framework. This will be apparent as we demonstrate the implementation of these NMA models using selected datasets from the literature. The models in this chapter embrace the assumption of consistency, as the matter of inconsistency is reserved for Chapter 16 and complicates the modeling beyond the scope of this chapter. In addition, the models in this chapter handle arm-level, as opposed to contrast-level, summary data, because arm-level data are what one realistically and most frequently encounters when conducting systematic reviews of clinical trials.

2. STATISTICAL MODELS

Statistical models are rapidly being developed to handle the complexities of NMA. The simplest method for NMA is Bucher's method, from one of the early seminal papers on the subject area [2]. It combines direct and indirect evidence from pairwise meta-analyses under the so-called consistency assumption. It is still a popular method, especially when only a few treatments are compared in two-arm trials, and is usually employed under the frequentist paradigm. If there are several treatments of interest with possibly several closed loops, it requires performing several analyses to estimate different indirect effects from different sources for the same comparison, and hence may not be feasible. We will not discuss this method further in this chapter.

In recent years, NMA has been predominantly carried out with a unified generalized linear model (GLM) framework under both frequentist [3] and Bayesian [4] paradigms. One can regard the true relative effect between each pair of treatments as either common (homogeneous), leading to a fixed effect assumption, or different (heterogeneous), corresponding to the random effects assumption. Both assumptions can be integrated into the GLM framework. The Bayesian approach to NMA has become increasingly popular in recent years compared to the frequentist method [9]; it is particularly flexible for fitting complex models including multi-arm trials, and it provides credible intervals and rank probabilities for


competing treatments to be the best, second best, and so on [9-13]. In this chapter, we discuss the fixed effect and random effects models and methods, and their implementation within a GLM framework, from only the Bayesian perspective and without providing much technical detail. We refer readers to Chapter 11 for more details on frequentist and Bayesian approaches to NMA. In Section 2.1, we present the GLM framework in conjunction with the Bayesian approach to parameter estimation. In Sections 2.2 and 2.3, we discuss the concept of heterogeneity, the merits of both fixed and random effects models, and strategies to choose between them. In Sections 2.4 to 2.6, we present information regarding rank probabilities and the implementation of these models.

2.1. Generalized Linear Models (GLMs) with Bayesian Framework

2.1.1. Generalized Linear Models (GLMs)

Suppose the effects of $K$ treatments are compared in $N$ trials. An appropriate likelihood for modeling the data is chosen depending on the type of data available. For arm-based aggregate data of binary outcomes (e.g., "Yes" or "No", "Dead" or "Alive"), the total number of events $r_{ik}$ out of the total number of study participants $n_{ik}$ who were given treatment $k$ in trial $i$ is available, and estimating the true probability of the event occurring, $p_{ik}$, is of primary interest. The total number of events is assumed to follow a binomial distribution, $r_{ik} \sim \text{Binomial}(p_{ik}, n_{ik})$. For arm-based summary data of count outcomes (number of events over time), the total number of events $r_{ik}$ observed during a total exposure time $E_{ik}$ for all persons (e.g., person-years) in treatment arm $k$ of trial $i$ is available, and estimating the true rate of events per unit time, $\lambda_{ik}$, is of primary interest. The total number of events observed over the total exposure time for all persons is assumed to follow a Poisson distribution, $r_{ik} \sim \text{Poisson}(\lambda_{ik} E_{ik})$. For arm-based summary data from continuous outcomes, the mean $\bar{y}_{ik}$ and its variance $V_{ik}$ in arm $k$ of trial $i$ are available, and estimating the true mean, $\theta_{ik}$, is of primary interest. In this case, $\bar{y}_{ik}$ is assumed to follow a normal distribution, $\bar{y}_{ik} \sim \text{Normal}(\theta_{ik}, V_{ik})$.

To facilitate the analysis of multiple treatment comparisons, a reference treatment should be chosen. The treatments are considered in order (say $A, B, C, \ldots$) such that $A$ is chosen to be the reference treatment (usually the treatment that is typically compared with the rest, such as a placebo), and the comparison of all other treatments with $A$ gives relative treatment effects referred to as basic parameters ($d_{AB}$, $d_{AC}$, $d_{AD}$, ...). Additionally, the baseline treatment $b_i$ within each trial can be different across trials. It is chosen for trial $i$ such that $b_i$ comes first in the above specified order before any other treatment in the same trial. Thus, for instance, $b_i = B$ for a trial that contains $B$ and any other treatments except $A$.

Mathematically, the GLM under evidence consistency (also known as a consistency model) with a random effects assumption, for treatment $k$ compared to baseline treatment $b$ in trial $i$ ($k \neq b$), is expressed as:

$$g(\gamma_{ik}) = \theta_{ik} = \mu_{ib} + \delta_{i,bk} \, I(k \neq b), \qquad \delta_{i,bk} \sim \text{Normal}(d_{bk}, \sigma^2),$$

$$d_{bk} = d_{Ak} - d_{Ab} \quad \text{(consistency assumption) for all } k \neq b,$$

where $g(\cdot)$ is the link function such that $g = \text{logit}$ (a logistic regression model) for the binomial likelihood, $g = \log$ (a log-linear model) for the Poisson likelihood, and $g$ is the identity (a linear regression model) for the normal likelihood. All relative treatment effects are expressed in terms of basic parameters (e.g., $d_{BC} = d_{AC} - d_{AB}$) under the transitivity and consistency assumptions. These assumptions are essential in order to model the relationships between different comparisons and borrow strength from indirect evidence [14].

The above models assume random effects for the relative treatment effects. This means that the $\delta_{i,bk}$ are assumed to be random around $d_{bk}$ with variance $\sigma^2$, known as the between-study variance. The distribution of $\delta_{i,bk}$ (for any $b$ and $k$) is typically assumed to be normal, and its variance to be the same for all comparisons (i.e., $\sigma^2_{bk} = \sigma^2$ for any treatments $b$ and $k$) for simplicity, which is common in the NMA literature. When $\sigma^2 = 0$, $\delta_{i,bk} = d_{bk}$ for all trials comparing treatments $b$ and $k$ (i.e., they share a common underlying treatment effect) and the above GLMs reduce to their fixed effect versions. Transitivity and consistency are also assumed under fixed effect models.
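To make the consistency model concrete, here is a minimal sketch (ours, not the chapter's code) of the random-effects, binomial-logit consistency model written in JAGS and run from R, restricted to two-arm trials for brevity; multi-arm trials additionally require correlated trial-specific effects. All data node names (t1, t2, r1, r2, n1, n2, ns, nt) are hypothetical.

```r
library(rjags)

# Random-effects, logit-link consistency model for two-arm trials only.
nma_model <- "
model {
  for (i in 1:ns) {                        # ns two-arm trials
    mu[i]    ~ dnorm(0, 1.0E-4)            # trial baseline effect (vague prior)
    delta[i] ~ dnorm(md[i], prec)          # trial-specific relative effect
    md[i]   <- d[t2[i]] - d[t1[i]]         # consistency: difference of basic parameters
    logit(p1[i]) <- mu[i]                  # baseline arm
    logit(p2[i]) <- mu[i] + delta[i]       # comparator arm
    r1[i] ~ dbin(p1[i], n1[i])
    r2[i] ~ dbin(p2[i], n2[i])
  }
  d[1] <- 0                                # reference treatment A
  for (k in 2:nt) { d[k] ~ dnorm(0, 1.0E-4) }  # basic parameters d_Ak
  sd ~ dunif(0, 4)                         # between-study standard deviation
  prec <- pow(sd, -2)
}
"
# With a suitable data list 'nma_data' supplying t1, t2, r1, r2, n1, n2, ns, nt:
# jm <- jags.model(textConnection(nma_model), data = nma_data, n.chains = 4)
# samples <- coda.samples(jm, c("d", "sd"), n.iter = 50000, thin = 10)
```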

2.1.2. Bayesian Parameter Estimation and Inference

i) Prior Distributions for Model Parameters

In the Bayesian framework, prior distributions are specified for all unknown parameters in the above GLM models under both the fixed and random effects assumptions. The resulting model is referred to as a Bayesian hierarchical model. As described in more detail in Chapter 11, this specification allows prior knowledge about values of parameters to be incorporated into the analysis. In the absence of information prior to observing the data, it is most common to choose non-informative or vague normal prior distributions for absolute and relative effect measures (location parameters) to allow the data to dominate estimation [9]. Depending on the type of outcome, a weakly informative, usually uniform, prior distribution is chosen for the between-study standard deviation $\sigma$ (a scale parameter) [9, 15, 16]. For example, if the relative effect measures (e.g., log-odds ratios) are not expected to vary much in a meta-analysis, it may be reasonable to expect $\sigma$ to be smaller than 4 ($0 \le \sigma \le 4$, since $\sigma$ is non-negative), and hence a $\text{Uniform}(0, 4)$ may be an appropriate choice of prior distribution for $\sigma$ [15, 16]. But for outcomes that have larger relative effect sizes between treatments (e.g., systolic blood pressure) and when larger variability in treatment effects is expected across trials, a wider uniform prior with a larger upper bound may be more appropriate. An appropriate choice of prior distribution for $\sigma$ is especially critical when the number of available trials is small [16].


In the NMA literature, the commonly used prior distributions are $\text{Normal}(0, 10^c)$, where $c$ is typically between 3 and 5, for the trial baselines ($\mu_{ib}$) in all trials and for all basic parameters ($d_{AB}, d_{AC}, \ldots$, if $A$ is the reference, for example), under both fixed and random effects assumptions. For random effects models, a $\text{Uniform}(0, c)$ prior, where $c$ is typically between 2 and 10, is used for $\sigma$ [17]. Another prior for $\sigma$, based on a vague gamma distribution for the precision $1/\sigma^2$, has been infrequently used in the NMA literature (e.g., [4]), but it is commonly used in Bayesian pairwise meta-analysis [16]. Alternatively, empirical priors inferred from the data of a network of trials can be chosen for Bayesian NMA [17].

ii) Posterior Samples for Estimation and Inference

Once a suitable GLM is chosen with either a fixed or random effects assumption, and priors are specified, the model parameters are estimated using a Markov chain Monte Carlo (MCMC) algorithm to generate posterior samples, which are the basis for Bayesian inference. They are used to estimate parameters of interest, such as relative treatment effects and the between-study variance, as well as corresponding credible intervals. In addition to these primary quantities of interest, posterior samples can be used to empirically rank treatments in terms of their probabilities of being best, second best, and so on.

Accurate parameter estimation and inference relies on convergence of the MCMC sampling algorithm to the posterior distribution. These algorithms can take a long time to converge, and a large number of samples and proper diagnostics are required to ensure convergence. A so-called 'burn-in' period, whereby the first several thousand samples returned from the MCMC algorithm are discarded, is an important first step. In addition, since there can be a tendency for sequentially generated data to be temporally correlated (due to the nature of the underlying data generating mechanisms), it is suggested to keep values intermittently or, in other words, 'thin' the generated data by accepting one in every 10, for instance. It is also important to consider running two or more 'chains' of MCMC sampling to enable assessing whether different chains starting from different initial values for each parameter converge to the same posterior distribution of the parameter [18]. Convergence is usually assessed through the Brooks-Gelman-Rubin diagnostic test [19] or through visual inspection of the trace plot for each parameter [20]. For NMA, convergence is deemed achieved if the estimate of the potential scale reduction factor (psrf) in the Brooks-Gelman-Rubin diagnostic test is close to 1, with its upper confidence limit ≤ 1.05, for each parameter [17]. However, visual inspection of trace plots, rather than sole reliance on the statistical test, is suggested to assess convergence. If convergence is not attained, one may try increasing the number of burn-in samples or changing the initial values.
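In R, these convergence checks can be run with the 'coda' package; the sketch below assumes 'samples' is an mcmc.list containing two or more chains (for example, the coda.samples output from the earlier rjags sketch).

```r
library(coda)

gelman.diag(samples)     # Brooks-Gelman-Rubin psrf; values near 1 suggest convergence
gelman.plot(samples)     # evolution of the psrf across iterations
traceplot(samples)       # visual inspection of mixing for each parameter
effectiveSize(samples)   # effective number of independent posterior samples
```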

2.2. Fixed Effect and Random Effects Models: Concepts and Interpretation in NMA

As in pairwise meta-analysis [21, 22], each trial is considered to represent a unique population under the random effects model, so that the study-wise true relative effect of, say, treatment $k$ compared to $b$, $\delta_{i,bk}$, is assumed to be randomly dispersed (i.e., exchangeable) around the average value $d_{bk}$ with variance $\sigma^2$ [11, 23-25]. Here, $d_{bk}$ does not represent a


true underlying treatment effect but an average of different underlying effects (the $\delta_{i,bk}$'s) in different populations. Under the fixed effects model, on the other hand, no genuine diversity within and across the trials is assumed; hence true treatment effects across trials are considered identical, i.e., $\delta_{i,bk} = d_{bk}$ [11, 23-25]. Here, $d_{bk}$ is interpreted as the common underlying treatment effect of $k$ vs. $b$ for all trials, and any variation in the estimates $\hat{\delta}_{i,bk}$ of $d_{bk}$ across some or all trials is due to sampling error (i.e., if trial $i$ is infinitely large, $\hat{\delta}_{i,bk} \to d_{bk}$). In other words, a fixed effects model assumes that an individual trial with an infinitely large sample size is sufficient to infer about the underlying treatment effect for the target population. Therefore, as in pairwise meta-analysis, fixed effect (FE) and random effects (RE) models represent two broad modelling approaches and concepts in NMA, and the estimated effect sizes $\hat{d}_{bk}$ from the FE and RE models for the effect of treatment $k$ compared to $b$ are interpreted differently.

2.3. Choosing between a Fixed or Random Effects Model in NMA

In direct pairwise meta-analysis, unless investigators have convincing evidence that all trials are clinically and methodologically similar (i.e., they are from a single population), the random effects model is preferred because it accounts for (unexplained) heterogeneity across studies and provides a more conservative credible or confidence interval for the pooled point estimate [21, 22]. If investigators believe that the available trials are from the same population, so that the true relative treatment effect in all trials is the same, or if investigators do not aim to generalize findings beyond the available trials, the fixed effect model can be chosen [21, 22]. However, such a belief is typically unrealistic to uphold in practice, and might only be reasonable when the synthesized trials are conducted by the same investigators in the same study center, or come from multicenter trials or consortia following the same study protocols [21, 22, 26]. Therefore, the random effects assumption may be more realistic in real applications and is preferred to account for heterogeneity in the context of NMA as well, unless trials are similar in all important ways and statistical assessment of heterogeneity and model fit justifies a fixed effect model [11, 27]. If the degree of heterogeneity is suspected to differ across comparisons, with some showing little or no heterogeneity and some showing high heterogeneity, the random effects model can still be applied, relaxing the constraint of a common between-trial variance [14]. However, the choice between assuming fixed or random effects for NMA may not be that straightforward: one may need to consider a subjective assessment of the heterogeneity in clinical trial designs, methods, and populations; statistical and visual examination for the presence of heterogeneity in the estimates across trials; and an assessment of goodness-of-fit to decide whether a fixed or random effects model should be used in a given NMA. In the following section, we discuss methods for choosing between fixed and random effects models.


2.3.1. Assessment of Heterogeneity

Studies included in the evidence synthesis from a systematic review should be relatively homogeneous. Differences in clinical designs and methods, as well as biases, can cause not only heterogeneity but also inconsistency if they result in imbalance in important treatment effect modifiers within or across trials [26, 28-33]. These differences could be genuine clinical and methodological differences in trial characteristics, such as study patients' clinical and demographic characteristics (e.g., ethnicity or target age-group), tools or methods of outcome measurement or definitions, study centers or locations, doses or routes or formulations of drugs, use of co-interventions, durations of follow-up, etc. But systematic biases resulting from lack of randomized sequence generation or allocation concealment, industry funding, preference towards newer drugs, etc., or sometimes chance alone, can also be a source of heterogeneity and inconsistency in a network of trials. NMA assumes similarity (more precisely speaking, "transitivity" [34]) in treatment effect modifiers across trials. Therefore, the first important step is to examine the distributions of all those trial characteristics that are potential sources of heterogeneity and inconsistency [32]. If data on treatment effect modifiers are available from all trials, and their imbalance across trials is suspected, a network meta-regression rather than a sole random effects NMA is preferred, since it can improve consistency, explain heterogeneity, and also minimize biases [32, 35-37]. However, this may be neither effective nor efficient when the number of available trials is small compared to the number of treatment comparisons, or when there are substantial clinical and methodological differences across trials [9, 27, 30, 32, 38].

When using estimates to assess heterogeneity, visual inspection of the estimates (e.g., using forest plots) for each pairwise comparison is a more practical and effective strategy than formally running statistical tests, as the former can identify outlying trials as well as trials with similar estimates, while the latter approach may fail to detect a problem due to lack of power or other limitations. For statistical assessment of the presence of heterogeneity in one or more treatment comparisons, the Cochran Q-test can be used, and a matrix of pairwise $I^2$ statistics can also be produced, where $I^2$ is the percentage of total variation due to between-study heterogeneity [39] (a minimal computation is sketched below). However, both the Q-test and the $I^2$ analysis suffer from lack of statistical power or accuracy, and hence may not be optimal when only a small to moderate number of trials are available for direct comparisons [40, 41]. Some investigators suggest considering the assessment of heterogeneity and inconsistency jointly in NMA, where heterogeneity and inconsistency can be viewed as parts of the broad concept of total heterogeneity present in a network [29]. There are various statistical approaches to assess inconsistency in a network [2, 29, 35, 42-46]. Dias et al. (2013) have provided a detailed review of different approaches to inconsistency analysis along with WinBUGS codes [46]. We refer the reader to Chapter 16 for more information on detecting and dealing with inconsistency.

If unexplained heterogeneity is suspected through clinical and statistical evaluations, a random effects meta-analysis that accounts for the heterogeneity is viewed as a better choice. In NMA, where inconsistency is also likely and a part of it may be explained by heterogeneity, Lu and Ades (2006) suggest a random effects model [35]. When the choice is not clear from the heterogeneity assessment, a goodness-of-fit assessment as described below may shed light on whether a fixed or random effects model is preferable.
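The following is a minimal R sketch of Cochran's Q and $I^2$ for a single pairwise comparison, using inverse-variance weights; the input effect estimates and standard errors are purely hypothetical.

```r
# Cochran's Q and I^2 for one pairwise comparison, given study-level effect
# estimates y (e.g., log odds ratios) and their standard errors se.
cochran_Q_I2 <- function(y, se) {
  w  <- 1 / se^2                       # inverse-variance weights
  yw <- sum(w * y) / sum(w)            # fixed-effect pooled estimate
  Q  <- sum(w * (y - yw)^2)            # Cochran's Q statistic
  df <- length(y) - 1
  I2 <- max(0, (Q - df) / Q) * 100     # % of variation beyond sampling error
  list(Q = Q, df = df, p = pchisq(Q, df, lower.tail = FALSE), I2 = I2)
}

cochran_Q_I2(y = c(-0.4, -0.1, -0.6), se = c(0.20, 0.25, 0.30))  # hypothetical data
```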


2.3.2. Assessment of Goodness-of-Fit

Assessing and comparing the goodness-of-fit of fixed and random effects models can help choose between which to use for an NMA [35, 47, 48]. The posterior mean of the residual deviance, $\bar{D}_{res}$ (where the deviance is $-2 \log L$, with $L$ denoting the likelihood function), can be calculated from the posterior samples and used as an absolute measure of model fit [35, 48, 49]. Since each data point is expected to contribute about 1 to $\bar{D}_{res}$, a model with $\bar{D}_{res}$ roughly equal to (but not too much higher than) the total number of independent data points can be considered an adequate model fit [48, 49]. The deviance information criterion, $DIC = \bar{D}_{res} + p_D$, where $p_D$ is the estimate of the effective number of parameters, can also be used to select and compare fixed and random effects models, as it penalizes for model complexity [35, 48, 49]. Much like the use of the Akaike information criterion (AIC) within a frequentist framework, a smaller DIC indicates a better model fit [49]. As a rule of thumb, if the difference in DIC between two models is greater than 5, then the model with the lower DIC is considered a better fit and is the preferred choice [48, 49].
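As a purely hypothetical worked example of this rule of thumb: suppose the fixed effect model yields $\bar{D}_{res} = 140$ with $p_D = 25$, and the random effects model yields $\bar{D}_{res} = 120$ with $p_D = 38$. Then

$$DIC_{FE} = 140 + 25 = 165, \qquad DIC_{RE} = 120 + 38 = 158,$$

and since $DIC_{FE} - DIC_{RE} = 7 > 5$, the random effects model would be preferred despite its larger effective number of parameters.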

2.4. Ranking Treatments

One advantage of performing an NMA using a Bayesian approach is that competing treatments can be ranked for overall effectiveness based on the MCMC simulated samples. Within each MCMC sample, treatments are ranked by their estimated effect sizes, and then, across all samples, the proportions of times each treatment attains the first rank, second rank, and so on, are calculated to obtain rank probabilities. These estimated probabilities are plotted against the ranks to create a rankogram [50]. In addition to the rankogram, cumulative probability plots can also be used as another useful graphical tool. For each treatment, the cumulative probability that it is among the top $k$ treatments (anywhere between the first and the $k$th rank) is plotted against the ranks. The so-called surface under the cumulative ranking curve (SUCRA) is a useful numerical summary which complements the graphical display of cumulative ranking plots and is obtained for each treatment. SUCRA allows identification of the best treatment overall, the second best overall, etc. The value of SUCRA would be 1 (i.e., 100%) for a treatment that is certain to be the best and 0 for a treatment that is certain to be the worst [50].
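The sketch below (ours, with a hypothetical three-treatment rank-probability matrix) shows how SUCRA can be computed from rank probabilities: for each treatment, it is the average of the cumulative rank probabilities over the first $K - 1$ ranks.

```r
# SUCRA from a rank-probability matrix (rows = treatments, columns = ranks
# 1..K, each row summing to 1).
sucra <- function(rank_probs) {
  K   <- ncol(rank_probs)
  cum <- t(apply(rank_probs, 1, cumsum))           # cumulative rank probabilities
  rowSums(cum[, 1:(K - 1), drop = FALSE]) / (K - 1)
}

# Hypothetical rank probabilities for treatments A, B, C:
p <- rbind(A = c(0.70, 0.20, 0.10),
           B = c(0.25, 0.60, 0.15),
           C = c(0.05, 0.20, 0.75))
round(sucra(p), 3)   # values near 1 = likely best, near 0 = likely worst
```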

2.5. Inconsistency Model

In this chapter, we focused only on statistical models for NMA under the consistency assumption. The illustrative examples presented in the next section are all discussed under this framework. In practice, inconsistency across various comparisons may be present, and an appropriate extension of the models presented so far would need to be fitted. There are various approaches to assess inconsistency, as discussed in Section 2.3.1. If inconsistency is suspected, there are methods to incorporate or deal with it in Bayesian NMA (e.g., [35]) as well as in frequentist NMA (e.g., [3]).


2.6. Implementation

There are different Bayesian software packages and tools that can be used to implement the methods we discussed so far (see Chapter 11 for more details on choices between statistical packages). Irrespective of what program is used to run Bayesian NMA, investigators need to be aware of some technical and practical considerations while implementing NMA.

3. ILLUSTRATIVE EXAMPLES

In this section we use the R package 'gemtc' to apply the Bayesian GLMs described in Section 2 to three publicly available datasets. Empirically derived vague prior distributions were chosen by the 'gemtc' package, as they are suggested to be reasonable choices [17]. Convergence for all models was assessed and confirmed among 4 chains using the Gelman-Rubin plot and diagnostic statistics [19, 20]. For each model, after discarding the first 50000 burn-in samples, we retained, with a thinning interval of 10, 50000 posterior samples for inference. For all three examples, we fit both fixed and random effects models and compared goodness-of-fit by calculating the mean residual deviance ($\bar{D}_{res}$) and the deviance information criterion (DIC). We assessed heterogeneity by calculating pairwise $I^2$ statistics and used this information in combination with the deviance statistics to guide our choice of fixed or random effects models for inference. To assess heterogeneity by means of study characteristics, design, and so forth (a non-statistical approach), we refer the reader to the referenced original papers for study information and to Chapter 15, where the critical appraisal of heterogeneity is the focus.
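The following is a hedged sketch of the 'gemtc' workflow used in this section, with purely hypothetical arm-level binary data; the column names follow the arm-level ('data.ab') convention of the 'gemtc' package, and the MCMC settings mirror those described above.

```r
library(gemtc)

# Hypothetical arm-level binary data for three two-arm trials.
arm_data <- data.frame(
  study      = c("s1", "s1", "s2", "s2", "s3", "s3"),
  treatment  = c("A", "B", "A", "C", "B", "C"),
  responders = c(20, 12, 18, 10, 15, 9),
  sampleSize = c(100, 100, 120, 120, 90, 90)
)

network <- mtc.network(data.ab = arm_data)
model   <- mtc.model(network, linearModel = "random",
                     likelihood = "binom", link = "logit")
result  <- mtc.run(model, n.adapt = 50000, n.iter = 500000, thin = 10)

summary(result)                  # relative effects, between-study SD, DIC
plot(rank.probability(result))   # rankogram
```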

3.1. Binary Outcome: Diabetes Data

In 2007, Elliott et al. [51] conducted a systematic review and network meta-analysis that compared the relative effect of six antihypertensive agents on incident diabetes. The RCTs included in their study compared the incidence (new cases) of diabetes within groups of participants who were randomized to one of two antihypertensive drugs (two-arm studies) or to one of three (three-arm studies). Therefore, the primary outcome for a study participant in one of these studies was binary (1 if they developed diabetes, 0 if they did not) and the probability of developing diabetes was of primary interest. The diabetes data can be obtained from Table 1 of Elliott et al. (2007) [51]. The data includes 22 studies, of which 18 are two-arm trials and the remaining 4 are three-arm trials. The 6 antihypertensive drugs were compared across these studies in a total of 154176 study participants, of which 10962 developed diabetes during their participation. The network geometry is shown in Figure 1.


Figure 1. Network configuration for diabetes data. Edges connecting treatments indicate the presence of direct comparison evidence within the network. Numbers along the edges indicate the number of studies comparing the corresponding pair of treatments head-to-head.

We fit a Bayesian GLM with binomial likelihood and the logit link function, under both fixed and random effects assumptions, and compared the mean residual deviance $\bar{D}_{res}$ and DIC obtained from the posterior samples of the two models. The random effects model returned much smaller deviance statistics, suggesting that it provides a better fit to the data and that heterogeneity might be present. To assess heterogeneity, we also calculated the pairwise $I^2$ values, which further suggested some evidence of heterogeneity and that the random effects assumption is appropriate for modeling these data.

Table 1 shows relative treatment effects for all possible pairwise treatment comparisons. Since the outcomes in the trials are binary and the probability of developing diabetes is of interest, and because we used the logit link for a logistic regression framework, relative treatment effects are conveniently obtained and expressed as odds ratios with reference to developing diabetes. For example, ARB resulted in a statistically significant relative reduction in odds of 39% compared to Diuretic, odds ratio = 0.61 (95% credible interval: 0.48 to 0.76).


Table 1. Estimated relative treatment effects as odds ratios for the diabetes data. Values in this table represent the estimated treatment effect with 95% credible interval of the treatment shown in the row relative to the treatment in the corresponding column.

|              | Placebo           | ACEinhibitor      | ARB               | b-blocker         | CCB               | Diuretic          |
| Placebo      | —                 | 1.13 (0.96, 1.30) | 1.22 (1.00, 1.47) | 0.80 (0.66, 0.95) | 0.95 (0.79, 1.12) | 0.75 (0.61, 0.88) |
| ACEinhibitor | 0.89 (0.77, 1.04) | —                 | 1.08 (0.87, 1.34) | 0.71 (0.60, 0.83) | 0.85 (0.71, 0.99) | 0.66 (0.55, 0.78) |
| ARB          | 0.82 (0.68, 1.00) | 0.93 (0.75, 1.15) | —                 | 0.66 (0.54, 0.80) | 0.79 (0.63, 0.94) | 0.61 (0.48, 0.76) |
| b-blocker    | 1.25 (1.05, 1.50) | 1.41 (1.20, 1.66) | 1.52 (1.26, 1.87) | —                 | 1.19 (1.04, 1.35) | 0.93 (0.77, 1.11) |
| CCB          | 1.05 (0.89, 1.26) | 1.18 (1.01, 1.40) | 1.27 (1.06, 1.58) | 0.84 (0.74, 0.96) | —                 | 0.78 (0.66, 0.93) |
| Diuretic     | 1.34 (1.13, 1.63) | 1.51 (1.28, 1.82) | 1.63 (1.31, 2.08) | 1.07 (0.90, 1.30) | 1.28 (1.08, 1.53) | —                 |


Rank probabilities calculated from the posterior samples help in drawing overall conclusions. A rankogram for the diabetes data is shown in Figure 2. It can be seen that ARB is ranked as the best treatment at reducing the incidence of diabetes, whereas Diuretic is the worst. This is partially confirmed by observing the relative treatment effects in Table 1 that involve these two treatments.

Figure 2. Rankogram showing estimated rank probabilities for treatments in the diabetes data.

In addition to the rankogram, Figure 3 displays cumulative probability plots along with SUCRA values for each of the six treatments. Once again, ARB is the best treatment with a SUCRA value of 94.9%, whereas Diuretic is the worst with a SUCRA value of 4.0%.

3.2. Count Outcome: Multiple Sclerosis Data

Multiple sclerosis (MS) is a chronic, complex, autoimmune inflammatory disease affecting the central nervous system, presenting symptomatically as loss of neurological function, movement ability, and sensation [52]. Early-onset MS tends to occur in the form of recurrent attacks (relapses) followed by complete or incomplete recovery and periods where disease activity is limited or seemingly non-existent. In 2012, Roskell et al. [52] performed a systematic review and network meta-analysis to compare the relative effectiveness of treatments for MS in reducing relapse rates. Each of the included studies followed participants for a period of time and counted the number of relapses each participant endured. Therefore, the primary outcome was a count (number of relapses) and the rate of relapse was of primary interest.


Figure 3. Surface under the cumulative ranking curve (SUCRA) plots for each treatment in the diabetes data.

The data can be obtained from Table 1 in Roskell et al., 2012 [52]. Across the 14 studies, 8 different treatment regimens are compared, some of which are the same drug at different dosages. There were two three-arm trials and the rest were two-arm. A network diagram representing the direct evidence available is presented in Figure 4.

We fit a Bayesian GLM with Poisson likelihood and the log link function, under both fixed and random effects assumptions. The mean residual deviance $\bar{D}_{res}$ and DIC obtained from the posterior samples were similar for the fixed and random effects models, suggesting a comparable fit. The pairwise $I^2$ values varied depending on the pair of treatments, suggesting potential heterogeneity. Therefore, the random effects model was chosen.


Figure 4. Network configuration for multiple sclerosis data. Edges connecting treatments indicate the presence of direct comparison evidence within the network. Numbers along the edges indicate the number of studies comparing the corresponding pair of treatments head-to-head.

Table 2 shows relative treatment effects for all possible comparisons. Since the outcomes in the trials are counts and the rate of relapsing is of interest, and because we used the log link for a log-linear regression framework, relative treatment effects are expressed as rate ratios with reference to incurring relapses. For example, the model estimated that patients taking 0.5 mg of Fingolimod have a statistically significant relative relapse rate reduction of 57% per year compared with patients taking placebo, rate ratio = 0.43 (95% credible interval: 0.34 to 0.55). The rankogram is presented in Figure 5 and the cumulative probability plots along with SUCRA values are presented in Figure 6. It is clear from these visuals, and confirmed by the estimated relative treatment effects, that 0.5 mg of Fingolimod is the best ranked treatment with regard to reducing relapse rates (SUCRA = 99.4%) and placebo is the worst (SUCRA = 3.0%).


Table 2. Estimated relative treatment effects as rate ratios for the multiple sclerosis data. Values in this table represent the estimated treatment effect with 95% credible interval of the treatment shown in the row relative to the treatment in the corresponding column.

|                            | Placebo           | Fingolimod 0.5 mg | Glatiramer acetate 20 mg | Interferon beta-1a 22 mcg | Interferon beta-1a 30 mcg | Interferon beta-1a 44 mcg | Interferon beta-1b 50 mcg | Interferon beta-1b 250 mcg |
| Placebo                    | —                 | 2.30 (1.82, 2.95) | 1.64 (1.42, 2.02)        | 1.40 (1.09, 1.81)         | 1.19 (0.99, 1.44)         | 1.50 (1.25, 1.84)         | 1.10 (0.88, 1.43)         | 1.55 (1.31, 1.92)          |
| Fingolimod 0.5 mg          | 0.43 (0.34, 0.55) | —                 | 0.71 (0.55, 0.98)        | 0.61 (0.43, 0.85)         | 0.52 (0.40, 0.66)         | 0.65 (0.49, 0.88)         | 0.48 (0.35, 0.67)         | 0.67 (0.51, 0.91)          |
| Glatiramer acetate 20 mg   | 0.61 (0.49, 0.70) | 1.40 (1.02, 1.83) | —                        | 0.86 (0.62, 1.10)         | 0.73 (0.56, 0.89)         | 0.92 (0.72, 1.10)         | 0.67 (0.51, 0.85)         | 0.95 (0.77, 1.12)          |
| Interferon beta-1a 22 mcg  | 0.71 (0.55, 0.92) | 1.64 (1.17, 2.33) | 1.17 (0.91, 1.62)        | —                         | 0.85 (0.63, 1.15)         | 1.07 (0.83, 1.40)         | 0.79 (0.56, 1.13)         | 1.11 (0.83, 1.53)          |
| Interferon beta-1a 30 mcg  | 0.84 (0.69, 1.01) | 1.93 (1.51, 2.49) | 1.38 (1.13, 1.78)        | 1.18 (0.87, 1.59)         | —                         | 1.26 (1.03, 1.58)         | 0.93 (0.70, 1.24)         | 1.31 (1.06, 1.66)          |
| Interferon beta-1a 44 mcg  | 0.67 (0.54, 0.80) | 1.53 (1.14, 2.03) | 1.09 (0.91, 1.38)        | 0.93 (0.71, 1.21)         | 0.79 (0.63, 0.97)         | —                         | 0.73 (0.55, 0.98)         | 1.03 (0.83, 1.32)          |
| Interferon beta-1b 50 mcg  | 0.91 (0.70, 1.14) | 2.09 (1.49, 2.89) | 1.49 (1.17, 1.97)        | 1.27 (0.89, 1.78)         | 1.08 (0.80, 1.42)         | 1.37 (1.02, 1.81)         | —                         | 1.41 (1.14, 1.76)          |
| Interferon beta-1b 250 mcg | 0.65 (0.52, 0.76) | 1.49 (1.09, 1.95) | 1.06 (0.89, 1.30)        | 0.90 (0.66, 1.21)         | 0.77 (0.60, 0.95)         | 0.97 (0.76, 1.21)         | 0.71 (0.57, 0.88)         | —                          |


Figure 5. Rankogram showing estimated rank probabilities for treatments in the multiple sclerosis data.

Figure 6. Surface under the cumulative ranking curve (SUCRA) plots for each treatment in the multiple sclerosis data.


3.3. Continuous Outcome: Parkinson Disease Data

The Parkinson's data were used for NMA by Dias et al., 2013 [48]. The outcome is the mean reduction in off-time, the time during which Parkinson's symptoms are experienced; this is a continuous measure, for which means and standard deviations are available. The data can be obtained from Table A8 in the supplementary material of Dias et al. (2013) [48]. They include 7 trials, of which 6 have two arms and 1 has three arms. A total of 5 treatments were compared in 1613 participants. The network configuration with only direct treatment comparisons is depicted in Figure 7.

Figure 7. Network configuration for Parkinson data. Edges connecting treatments indicate the presence of direct comparison evidence within the network. Numbers along the edges indicate the number of studies comparing the corresponding pair of treatments head-to-head.

We fit a Bayesian GLM with normal likelihood and the identity link function, under both fixed and random effects assumptions. The mean residual deviance $\bar{D}_{res}$ and DIC obtained from the posterior samples were similar for the fixed and random effects models, suggesting a comparable fit. The pairwise $I^2$ values ranged up to 37.38%. This suggests a variable but overall small amount of heterogeneity within the network, so the fixed effect assumption could be viable. To stay consistent with the prior examples, the results that we report are from the random effects model.

Table 3 shows relative treatment effects for all possible comparisons. Since the outcomes in the trials are means and differences in means of off-time reduction are of interest, relative


Table 3 shows relative treatment effects for all possible comparisons. Since the outcomes in the trials are means, and differences in means of off-time reduction are of interest, relative treatment effects are expressed as differences in means with reference to a reduction in off-time. For example, the model estimated that patients taking Treatment B have, on average, 1.84 less off-time than patients on Treatment A, meaning that Treatment B is superior to Treatment A at reducing off-time.

Table 3. Estimated relative treatment effects as differences in means for the Parkinson data. Values in this table represent the estimated treatment effect, with 95% credible interval, of the treatment shown in the row relative to the treatment in the corresponding column

              Treatment A            Treatment B           Treatment C           Treatment D           Treatment E
Treatment A   —–                     1.84 (0.87, 2.87)     0.49 (-0.75, 1.75)    0.53 (-0.66, 1.76)    0.83 (-0.63, 2.33)
Treatment B   -1.84 (-2.87, -0.87)   —–                    -1.35 (-2.79, 0.06)   -1.31 (-2.68, 0.03)   -1.01 (-2.60, 0.55)
Treatment C   -0.49 (-1.75, 0.75)    1.35 (-0.06, 2.79)    —–                    0.04 (-0.88, 0.94)    0.33 (-0.93, 1.60)
Treatment D   -0.53 (-1.76, 0.66)    1.31 (-0.03, 2.68)    -0.04 (-0.94, 0.88)   —–                    0.30 (-0.58, 1.17)
Treatment E   -0.83 (-2.33, 0.63)    1.01 (-0.55, 2.60)    -0.33 (-1.60, 0.93)   -0.30 (-1.17, 0.58)   —–
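In a consistent network, the estimates in Table 3 are transitive, so any indirect route reproduces the direct cell. A quick arithmetic check of the posterior means (our illustration, using the table's values):

```python
# Transitivity check on Table 3: d(C vs B) should equal d(C vs A) - d(B vs A).
d_BA = -1.84   # Treatment B vs Treatment A (posterior mean difference)
d_CA = -0.49   # Treatment C vs Treatment A
d_CB = d_CA - d_BA
print(f"Implied C vs B: {d_CB:+.2f}")   # prints +1.35, matching the table cell
```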

The rankogram in Figure 8 and the cumulative probability plots, along with SUCRA values, in Figure 9 show that Treatment B is the best ranked treatment with regard to reducing off-time (SUCRA = 96.5%) and Treatment A is the worst (SUCRA = 12.1%).
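SUCRA condenses each treatment's whole rank distribution into a single number, the scaled area under its cumulative ranking curve. A minimal sketch with hypothetical rank probabilities (not the chapter's posterior output):

```python
# SUCRA_k = (sum of cumulative rank probabilities) / (K - 1).
import numpy as np

# rank_probs[k, r] = P(treatment k is ranked (r+1)-th); rows sum to 1.
rank_probs = np.array([
    [0.01, 0.04, 0.15, 0.30, 0.50],   # Treatment A: mostly ranked last
    [0.85, 0.10, 0.03, 0.01, 0.01],   # Treatment B: mostly ranked best
    [0.05, 0.30, 0.35, 0.20, 0.10],   # Treatment C
    [0.05, 0.35, 0.30, 0.20, 0.10],   # Treatment D
    [0.04, 0.21, 0.17, 0.29, 0.29],   # Treatment E
])
K = rank_probs.shape[1]
cum = np.cumsum(rank_probs, axis=1)[:, :-1]   # drop last column (always 1)
sucra = cum.sum(axis=1) / (K - 1)
for name, s in zip("ABCDE", sucra):
    print(f"Treatment {name}: SUCRA = {100 * s:.1f}%")
```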

Figure 8. Rankogram showing estimated rank probabilities for treatments in the Parkinson disease data.


Figure 9. Surface under the cumulative ranking curve (SUCRA) plots for each treatment in the Parkinson disease data.

CONCLUSION

NMA has become increasingly popular due to its potential to impact clinical decisions through its ability to synthesize evidence on multiple competing health care interventions and to rank treatments as best, second best, and so on. Applications are flourishing in a wide range of clinical domains, from cardiology and oncology to rheumatology and beyond, making it important to understand the statistical models and their proper implementation and interpretation. In this chapter, we presented some of the commonly used generalized linear models (GLMs) for NMA under the popular Bayesian approach to inference, as well as the important connection between heterogeneity and the choice between fixed and random effect assumptions, which influences these models. Measures taken to guide the choice between


using fixed and random effects models are diverse, drawing on both statistical and clinical rationales, which underscores how difficult the choice may be in practice. Moving beyond pairwise meta-analysis, incorporating indirect sources of evidence can only complicate these types of decisions.

REFERENCES

[1] Caldwell DM, Ades AE, Higgins JP. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ 2005; 331: 897-900.
[2] Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol 1997; 50: 683-91.
[3] Lumley T. Network meta-analysis for indirect treatment comparisons. Stat Med 2002; 21: 2313-24.
[4] Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med 2004; 23: 3105-24.
[5] Thorlund K, Mills EJ. Sample size and power considerations in network meta-analysis. Syst Rev 2012; 1: 41.
[6] Mills EJ, Rachlis B, Wu P, Devereaux PJ, Arora P, Perri D. Primary prevention of cardiovascular mortality and events with statin treatments: a network meta-analysis involving more than 65,000 patients. J Am Coll Cardiol 2008; 52: 1769-81.
[7] Dequen P, Lorigan P, Jansen JP, van Baardewijk M, Ouwens MJ, Kotapati S. Systematic review and network meta-analysis of overall survival comparing 3 mg/kg ipilimumab with alternative therapies in the management of pretreated patients with unresectable stage III or IV melanoma. Oncologist 2012; 17.
[8] Nixon R, Bansback N, Brennan A. The efficacy of inhibiting tumour necrosis factor alpha and interleukin 1 in patients with rheumatoid arthritis: a meta-analysis and adjusted indirect comparisons. Rheumatology (Oxford) 2007; 46: 1140-7.
[9] Hoaglin DC, Hawkins N, Jansen JP, Scott DA, Itzler R, Cappelleri JC, et al. Conducting indirect-treatment-comparison and network-meta-analysis studies: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 2. Value Health 2011; 14: 429-37.
[10] Hong H, Carlin BP, Shamliyan TA, Wyman JF, Ramakrishnan R, Sainfort F, et al. Comparing Bayesian and frequentist approaches for multiple outcome mixed treatment comparisons. Med Decis Making 2013; 33: 702-14.
[11] Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, et al. Interpreting indirect treatment comparisons and network meta-analysis for health-care decision making: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 1. Value Health 2011; 14: 417-28.
[12] Carlin BP, Hong H, Shamliyan TA, Sainfort F, Kane RL. Case study comparing Bayesian and frequentist approaches for multiple treatment comparisons. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013. Available from: http://www.ncbi.nlm.nih.gov/books/NBK132729/ (last accessed on March 21, 2014).


[13] Jonas DE, Wilkins TM, Bangdiwala S, Bann CM, Morgan LC, Thaler KJ, Amick HR, Gartlehner G. Findings of Bayesian mixed treatment comparison meta-analyses: comparison and exploration using real-world trial data and simulation. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013. Available from: http://www.ncbi.nlm.nih.gov/books/NBK126109/ (last accessed on March 21, 2014).
[14] Lu G, Ades A. Modeling between-trial variance structure in mixed treatment comparisons. Biostatistics 2009; 10: 792-805.
[15] Thorlund K, Thabane L, Mills EJ. Modelling heterogeneity variances in multiple treatment comparison meta-analysis--are informative priors the better solution? BMC Med Res Methodol 2013; 13: 2.
[16] Lambert PC, Sutton AJ, Burton P, Abrams KR, Jones D. How vague is vague? Assessment of the use of vague prior distributions for variance components. Stat Med 2005; 24: 2401-28.
[17] van Valkenhoef G, Lu G, de Brock B, Hillege H, Ades AE, Welton NJ. Automating network meta-analysis. Res Synth Methods 2012; 3: 285-99.
[18] Gelman A, Shirley K. Inference from simulations and monitoring convergence. In: Brooks S, Gelman A, Jones GL, Meng X-L, editors. Handbook of Markov Chain Monte Carlo. Boca Raton: Chapman and Hall/CRC; 2011.
[19] Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 1998; 7: 434-55.
[20] Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science 1992; 7: 457-72.
[21] Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Fixed-effect versus random-effects models. In: Borenstein M, Hedges LV, Higgins JPT, Rothstein HR, editors. Introduction to meta-analysis. Chichester, UK: John Wiley & Sons; 2009; p. 77-86.
[22] Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Res Syn Meth 2010; 1: 97-111.
[23] Sutton A, Ades AE, Cooper N, Abrams K. Use of indirect and mixed treatment comparisons for technology assessment. Pharmacoeconomics 2008; 26: 753-67.
[24] Mills EJ, Bansback N, Ghement I, Thorlund K, Kelly S, Puhan MA, Wright J. Multiple treatment comparison meta-analyses: a step forward into complexity. Clin Epidemiol 2011; 3: 193-202.
[25] Greco T, Landoni G, Biondi-Zoccai G, D'Ascenzo F, Zangrillo A. A Bayesian network meta-analysis for binary outcome: how to do it. Stat Methods Med Res 2013 Oct 28 [Epub ahead of print] doi: 10.1177/0962280213500185.
[26] Mills EJ, Thorlund K, Ioannidis JP. Demystifying trial networks and network meta-analysis. BMJ 2013; 346: f2914.
[27] Glenny AM, Altman DG, Song F, Sakarovitch C, Deeks JJ, D'Amico R, et al. Indirect comparisons of competing interventions. Health Technol Assess 2005; 9: 1-134.
[28] Ioannidis JP. Integration of evidence from multiple meta-analyses: a primer on umbrella reviews, treatment networks and multiple treatments meta-analyses. CMAJ 2009; 181: 488-93.
[29] Krahn U, Binder H, Konig J. A graphical tool for locating inconsistency in network meta-analyses. BMC Med Res Methodol 2013; 13: 35.


[30] Jansen JP, Crawford B, Bergman G, Stam W. Bayesian meta-analysis of multiple treatment comparisons: an introduction to mixed treatment comparisons. Value Health 2008; 11: 956-64.
[31] Jansen JP, Naci H. Is network meta-analysis as valid as standard pairwise meta-analysis? It all depends on the distribution of effect modifiers. BMC Med 2013; 11: 159.
[32] Cooper NJ, Sutton AJ, Morris D, Ades AE, Welton NJ. Addressing between-study heterogeneity and inconsistency in mixed treatment comparisons: application to stroke prevention treatments in individuals with non-rheumatic atrial fibrillation. Stat Med 2009; 28: 1861-81.
[33] Baker SG, Kramer BS. The transitive fallacy for randomized trials: if A bests B and B bests C in separate trials, is A better than C? BMC Med Res Methodol 2002; 2: 13.
[34] Salanti G. Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Res Syn Meth 2012; 3: 80-97.
[35] Lu G, Ades AE. Assessing evidence inconsistency in mixed treatment comparisons. J Am Statist Assoc 2006; 101: 447-59.
[36] Salanti G, Marinho V, Higgins JPT. A case study of multiple-treatments meta-analysis demonstrates that covariates should be considered. J Clin Epidemiol 2009; 62: 857-64.
[37] Dias S, Sutton AJ, Welton NJ, Ades AE. Evidence synthesis for decision making 3: heterogeneity--subgroups, meta-regression, bias, and bias-adjustment. Med Decis Making 2013; 33: 618-40.
[38] Berlin JA, Santanna J, Schmid CH, Szczech LA, Feldman HI. Individual patient- versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Stat Med 2002; 21: 371-87.
[39] Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002; 21: 1539-58.
[40] Huedo-Medina TB, Sanchez-Meca J, Marin-Martinez F, Botella J. Assessing heterogeneity in meta-analysis: Q statistic or I2 index? Psychol Methods 2006; 11: 193-206.
[41] Kulinskaya E, Dollinger MB, Bjorkestol K. Testing for homogeneity in meta-analysis I. The one-parameter case: standardized mean difference. Biometrics 2009; 67: 203-12.
[42] Lam SK, Owen A. Combined resynchronisation and implantable defibrillator therapy in left ventricular dysfunction: Bayesian network meta-analysis of randomised controlled trials. BMJ 2007; 335: 925.
[43] Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat Med 2010; 29: 932-44.
[44] Lu G, Welton NJ, Higgins JPT, White IR, Ades AE. Linear inference for mixed treatment comparison meta-analysis: a two-stage approach. Res Synth Methods 2011; 2: 43-60.
[45] White I, Barrett JK, Jackson D, Higgins JPT. Consistency and inconsistency in network meta-analysis: model estimation using multivariate meta-regression. Res Syn Meth 2012; 3: 111-25.
[46] Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Med Decis Making 2013; 33: 641-56.


[47] Ades AE, Mavranezouli I, Dias S, Welton NJ, Whittington C, Kendall T. Network meta-analysis with competing risk outcomes. Value Health 2010; 13: 976-83.
[48] Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med Decis Making 2013; 33: 607-17.
[49] Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. J Roy Stat Soc B 2002; 64: 583-616.
[50] Salanti G, Ades AE, Ioannidis JP. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. J Clin Epidemiol 2011; 64: 163-71.
[51] Elliott WJ, Meyer PM. Incident diabetes in clinical trials of antihypertensive drugs: a network meta-analysis. Lancet 2007; 369: 201-7.
[52] Roskell NS, Zimovetz EA, Rycroft CE, Eckert BJ, Tyas DA. Annualized relapse rate of first-line treatments for multiple sclerosis: a meta-analysis, including indirect comparisons versus fingolimod. Curr Med Res Opin 2012; 28: 767-80.


In: Network Meta-Analysis Editor: Giuseppe Biondi-Zoccai

ISBN: 978-1-63321-001-1 © 2014 Nova Science Publishers, Inc.

Chapter 9

CHOOSING THE APPROPRIATE STATISTICS

Jing Zhang, Ph.D.* and Lifeng Lin, Ph.D. Student†

Division of Biostatistics, School of Public Health, University of Minnesota, MN, US

ABSTRACT

The contrast-based (CB) method for network meta-analysis (NMA) of binary outcomes typically focuses on relative treatment effects and is incapable of estimating overall treatment-specific event rates. Many papers have reported only odds ratios (ORs) based on the CB method; however, there is doubt about the appropriateness of reporting this summary statistic alone. In this chapter, we present a recently proposed arm-based (AB) method developed from a missing data perspective and illustrate how treatment-specific event proportions, risk differences (RDs), risk ratios (RRs), and ORs can be computed in NMAs. This more comprehensive and appropriate reporting of summary statistics helps patients and their caregivers to understand and trade off efficacy and safety; the AB approach is therefore recommended in practice.

Keywords: Appropriate statistics, arm-based method, contrast-based method, odds ratio, relative risk, population-averaged event rate

INTRODUCTION

Network meta-analysis (NMA) (also called mixed or multiple treatment comparisons) expands the scope of conventional pairwise meta-analyses to simultaneously compare multiple treatments, synthesizing both direct comparisons of interventions within randomized controlled trials (RCTs) and indirect comparisons across trials. With appropriate assumptions, borrowing strength from indirect evidence allows more precise estimates of treatment

* Corresponding author: Jing Zhang, PhD, A450 Mayo Building, Division of Biostatistics, School of Public Health, University of Minnesota, MN 55455, USA. Phone: +1 612-229-1978. Email: [email protected].
[email protected].


differences than can be obtained from pairwise meta-analysis. [1] In a nutshell, NMA enables simultaneous inference on multiple treatments and can strengthen inference.

A limitation in reporting for contrast-based (CB) NMA methods for binary outcomes is that usually only odds ratios (ORs) [2-10] are reported as the summary statistic. This limitation arises because most existing CB approaches and software [11-19] are not capable of estimating treatment-specific response proportions and summary statistics such as the risk difference (RD) and risk ratio (RR). They choose one of the arms in each study as the "baseline" (a nuisance parameter) and focus on estimating treatment contrasts only, mainly ORs. The inappropriateness of reporting the OR alone lies in the following three aspects. First, the OR obscures the information carried by the event rates. For example, suppose a patient wants to choose between treatments A and B with the following two sets of one-year survival probabilities: a) 0.9 versus 0.5; b) 0.009 versus 0.001. Patients will definitely choose treatment A in the first scenario regardless of other characteristics of the two treatments, whereas in the second scenario they may not have any preference. Yet under both scenarios essentially the same OR (about 9) is obtained. Second, ORs are often mistakenly interpreted as RRs by physicians, patients and their caregivers, although it is well known that RRs and ORs diverge when events are common (i.e., when event rates are higher than 10%). [20-23] Third, the RD carries some important information that cannot be expressed by the OR and RR. In addition, in the presence of effect modification, when an exposure increases risk but all risks are less than 0.5, it is possible for the RR and RD to change in the same direction while the OR changes in the opposite direction. [24] In summary, the event rate, RR and RD, in addition to the OR, are all very important summaries for decision making. Thus we recommend reporting both relative measures (ORs and RRs) and absolute measures (event rates and RDs).

Although several works [15, 25-28] have discussed the transformation from ORs to event rates, RRs and RDs, they rely on a very strong assumption: that the event rate in a "reference" treatment group (one of the treatments of interest) can be accurately estimated. [29] However, this estimation needs either to borrow external data or to summarize the trials containing the "reference" arm (e.g., using a separate random effects model). The external data, quite apart from their frequent unavailability, may come from a different population than the original NMA, causing potential bias. From the theory of missing data analysis, [30] the latter methods are unbiased only under a strong missing completely at random assumption, namely that the "reference" arm is randomly selected to be included or not included in the trials.

This chapter introduces an arm-based (AB) multivariate Bayesian hierarchical model built from the perspective of missing data analysis, as recently proposed in Zhang et al. [29] Their method is more robust to the missing data and more accurate than the CB method, as discussed in Zhang et al. [29] It can estimate all of the aforementioned summary statistics, enabling more comprehensive reporting. In this chapter, we re-analyze a published network meta-analysis in The Lancet on incident diabetes to show how these statistics can be summarized with the AB method. [31]
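The survival example is easy to verify numerically. The following sketch (illustrative, not from the chapter) computes all three summaries for both scenarios and shows how the OR hides the difference:

```python
# OR is nearly identical in both scenarios, while RR and RD differ sharply.
def summaries(p_a, p_b):
    odds = lambda p: p / (1 - p)
    return odds(p_a) / odds(p_b), p_a / p_b, p_a - p_b   # OR, RR, RD

for p_a, p_b in [(0.9, 0.5), (0.009, 0.001)]:
    or_, rr, rd = summaries(p_a, p_b)
    print(f"pA={p_a}, pB={p_b}: OR={or_:.2f}, RR={rr:.2f}, RD={rd:+.3f}")
# pA=0.9,   pB=0.5:   OR=9.00, RR=1.80, RD=+0.400
# pA=0.009, pB=0.001: OR=9.07, RR=9.00, RD=+0.008
```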

THE ARM-BASED (AB) APPROACH

Suppose a network meta-analysis contains a collection of trials i = 1, 2, ..., I, and each of the trials compares only a subset of the complete collection of K treatments.


Let $K_i$ be the number of treatments and $S_i$ the set of treatments that are compared in trial $i$. Let $D_i = \{(y_{ik}, n_{ik}),\ k \in S_i\}$ denote the available data from the $i$th trial, where $n_{ik}$ is the total number of subjects and $y_{ik}$ is the total number of events for the $k$th treatment in the $i$th trial. The corresponding probability of success is denoted by $p_{ik}$. In this section, we present the AB approach [29] and illustrate how to estimate the overall treatment-specific event rates, as well as RRs, RDs, and ORs, from the perspective of missing data analysis. Zhang et al. [29, 32] view the analytic challenges associated with NMA from the perspective of missing data analysis [30, 33-36] and assume that each study hypothetically compares all treatments, many of which are missing by design and thus can be considered missing at random. [33] Under this assumption, the treatment-specific event rates, as well as the associated summary statistics such as the RD, RR and OR, can be estimated. The AB approach is as follows:

$$y_{ik} \sim \mathrm{Bin}(n_{ik}, p_{ik}), \quad k \in S_i, \; i = 1, \ldots, I, \qquad (2.1)$$

$$f^{-1}(p_{ik}) = \mu_k + \nu_{ik}, \quad (\nu_{i1}, \ldots, \nu_{iK})^T \sim \mathrm{MVN}(0, \Sigma_K), \quad \Sigma_K = \mathrm{diag}(\sigma_1, \ldots, \sigma_K) \, R_K \, \mathrm{diag}(\sigma_1, \ldots, \sigma_K),$$

where $f(\cdot)$ is some proper link function, the $\mu_k$ are treatment-specific fixed effects, $R_K$ is a positive definite correlation matrix, and $\sigma_k$ is the standard deviation of the random effect $\nu_{ik}$; the covariance matrix $\Sigma_K$ is thus obtained by pre- and post-multiplying $R_K$ by the diagonal matrix $\mathrm{diag}(\sigma_1, \ldots, \sigma_K)$. Here, $\sigma_k$ captures trial-level heterogeneity in response to treatment $k$, and $R_K$ captures the within-study dependence among treatments.

If we use a probit link for $f(\cdot)$, the population-averaged (or marginal) treatment-specific event rate can be calculated in closed form as

$$\pi_k = E\left[p_{ik} \mid \mu_k, \sigma_k\right] = \int \Phi(\mu_k + \sigma_k z) \, \phi(z) \, dz = \Phi\!\left(\frac{\mu_k}{\sqrt{1 + \sigma_k^2}}\right), \quad k = 1, \ldots, K, \qquad (2.2)$$

where $\Phi$ is the standard normal cumulative distribution function and $\phi$ is the standard normal density function. Based on the marginal event rates $\pi_k$, for a pairwise comparison between treatments $k$ and $l$ ($k \neq l$) the marginal OR, RR and RD are defined as

$$\mathrm{OR}_{kl} = \frac{\pi_k (1 - \pi_l)}{\pi_l (1 - \pi_k)}, \quad \mathrm{RR}_{kl} = \frac{\pi_k}{\pi_l}, \quad \mathrm{RD}_{kl} = \pi_k - \pi_l.$$

There are several important simplifications of model (2.1), discussed in Zhang et al. [32] First, the simplest model can be specified as $\Phi^{-1}(p_{ik}) = \mu_k + \nu_i$ with $\nu_i \sim N(0, \sigma^2)$. Another model, which allows correlation among the random effects for different treatments in the same trial, is $\Phi^{-1}(p_{ik}) = \mu_k + \nu_{ik}$, where $(\nu_{i1}, \ldots, \nu_{iK})$ has a common standard deviation $\sigma$ and an exchangeable correlation matrix with parameter $\rho$. A more complex model allows heterogeneous standard deviations $\sigma_k$, again with $(\nu_{i1}, \ldots, \nu_{iK})$ having an exchangeable correlation matrix with parameter $\rho$.
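Equation (2.2) can be checked numerically. The sketch below (illustrative values of $\mu_k$ and $\sigma_k$, not posterior output from the chapter) compares the closed form against direct integration and then forms the marginal summaries:

```python
# Verify pi_k = Phi(mu / sqrt(1 + sigma^2)) against numerical integration,
# then compute the marginal OR, RR and RD for two hypothetical treatments.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def marginal_rate(mu, sigma):
    closed = norm.cdf(mu / np.sqrt(1 + sigma ** 2))
    numeric, _ = quad(lambda z: norm.cdf(mu + sigma * z) * norm.pdf(z),
                      -np.inf, np.inf)
    return closed, numeric

pi_k, pi_k_num = marginal_rate(mu=-1.45, sigma=0.30)
pi_l, _ = marginal_rate(mu=-1.35, sigma=0.30)
print(f"closed form {pi_k:.4f} vs numerical {pi_k_num:.4f}")
print(f"OR = {pi_k / (1 - pi_k) / (pi_l / (1 - pi_l)):.3f}, "
      f"RR = {pi_k / pi_l:.3f}, RD = {pi_k - pi_l:+.4f}")
```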


Minimally informative but proper priors are used for model (2.1). For example, a weakly informative normal prior with variance 1000 is usually used for $\mu_k$. A Wishart prior for the precision matrix $\Sigma_K^{-1}$ is set with the degrees of freedom equal to $n = K$ and the scale $V$ being a known $K \times K$ matrix, e.g., with diagonal elements equal to 1.0 and off-diagonal elements equal to 0.005. This Wishart prior corresponds to a 95% CI of 0.45 to 32.10 for the standard deviation parameters and a 95% CI of -1.00 to 1.00 for the correlation parameters. A uniform prior is set for both $\sigma$ and $\sigma_k$, and a uniform prior is set for the correlation parameter $\rho$.

All models were implemented via the Markov chain Monte Carlo (MCMC) method using the WinBUGS software. We employed a burn-in of 1,000,000 iterations and another 1,000,000 iterations for the posterior summaries. Convergence was checked with the Gelman-Rubin statistic and visual plots, including history and density plots. The detailed WinBUGS code for the various models is attached in the Appendix. The Deviance Information Criterion (DIC) [37] was used as a guide to select the final model for posterior inference. The deviance, up to an additive quantity not depending on the model parameters $\theta$, is $D(\theta) = -2 \log L(\theta)$, where $L(\theta)$ is the likelihood for the respective model. The DIC is given by $\mathrm{DIC} = \bar{D} + p_D$, where $\bar{D}$ is the posterior mean of the Bayesian deviance and $p_D$ is the effective number of model parameters. The DIC rewards better fitting models through the first term and penalizes more complex models through the second term; a model with a smaller overall DIC value is preferred.

A BRIEF REVIEW OF THE CONTRAST-BASED (CB) APPROACH

Though we focus on the AB approach in this chapter, we briefly introduce the most commonly used CB models, formulated as the following Bayesian hierarchical model: [15, 17]

$$y_{ik} \sim \mathrm{Bin}(n_{ik}, p_{ik}), \quad k \in S_i, \; i = 1, \ldots, I,$$

$$\mathrm{logit}(p_{ik}) = \mu_i + X_{ik} \delta_{ibk}, \quad \delta_{ibk} \sim N(d_{bk}, \sigma_{bk}^2), \quad d_{hk} = d_{bk} - d_{bh}, \; b \neq h \neq k \in S_i, \quad \mathrm{OR}_{hk} = e^{d_{hk}}.$$

Here $b$ is the specified "baseline" treatment for the $i$th trial (strictly $b_i$, written as $b$ for simplicity). $X_{ik}$ is an indicator taking the value 1 if $k \neq b$ and 0 if $k = b$. Some prior distributions are then chosen for $\mu_i$, $d_{bk}$, and $\sigma_{bk}^2$. The focus of the above model is the estimation of $d_{hk}$, and hence $\mathrm{OR}_{hk}$, comparing treatment $h$ versus $k$. The "baseline" effects $\mu_i$ are treated as nuisance parameters, so this CB approach is not able to estimate the overall treatment-specific event rates; and the transformation from ORs to event rates, as well as to RRs and RDs, is problematic, as discussed in the Introduction. We present the CB approach briefly in this section only in case readers want to learn more about it. For the rest of this chapter, we focus on the AB approach.


EXAMPLE: EFFECT OF ANTIHYPERTENSIVE AGENTS ON INCIDENT DIABETES

It is well known that some antihypertensive drugs may precipitate diabetes. Some long-term clinical trials of antihypertensive agents have shown significant differences in incident diabetes rates across treatment groups. [38-40] A systematic review was undertaken to identify long-term randomized clinical trials of antihypertensive drugs, published from 1966 to 2006, that reported the number of new cases of diabetes. In total, 22 clinical trials with 143,153 hypertensive patients who did not have diabetes at randomization were identified. Elliott et al. [31] used network meta-analysis to compare the effect of 6 classes of antihypertensive drugs (angiotensin-receptor blockers (ARB), angiotensin-converting-enzyme (ACE) inhibitors, calcium-channel blockers (CCB), β blockers, diuretics, and placebo) on incident diabetes mellitus. The main outcome was the proportion of patients who developed diabetes. The initial drug therapies used in the trials and their crude event rates (the number of patients who developed incident diabetes divided by the total number at risk) were: ACE 1618/23351 = 6.93%, ARB 1189/14185 = 8.38%, CCB 2791/38809 = 7.19%, Diuretic 973/18699 = 5.20%, Placebo 1686/22982 = 7.34% and β-blocker 2705/36150 = 7.48%. We reanalyze this NMA with the AB approach presented in Section 2.

Table 1. Model selection with deviance information criterion (DIC)

DIC 1

M1:

 (p ik )   k   i

with

correlation; M3: where DIC is in bold.

 i ~ N (0, 1) ( i1 , ... iK )

M2 410.3

M3 412.2

M4 415.5

1

; M2:

 (p ik )   k   ik

where

( i1 , ... iK )

has an exchangeable

has an exchangeable correlation; M4: Model (2.1). The smallest

Table 1 shows the DIC values obtained with (2.1) and its various simplifications. M2, with homogeneous variance and exchangeable correlation, has the smallest DIC = 410.3. Both M1 and M4 are DIC-worse than M2. Though the difference in DIC between M2 and M3 is less than 5, we choose M2 as our final model since it is structurally simpler and computationally faster. The results of M2 are explored further in Table 2 and Figure 1.

Table 2 summarizes the results of M2 for this network meta-analysis. The population-averaged treatment-specific event (newly developed diabetes) rates range from 5.70% (95% CI 4.21% to 7.56%) for ARB to 8.65% (95% CI 6.62% to 10.96%) for Diuretic. The upper and lower triangular panels report the RRs and RDs of all pairwise comparisons. With CCB as the referent, ACE and ARB have significantly fewer cases of incident diabetes; with Diuretic as the referent treatment, ACE, ARB, CCB, and Placebo are associated with significantly fewer cases; with Placebo as the referent treatment, Diuretic and β-blocker have significantly more cases; and with β-blocker as the referent treatment, ACE, ARB, CCB, and Placebo are associated with significantly fewer new cases. In contrast, Elliott et al. [31] reported that the ORs comparing ACE, ARB, CCB, and Placebo versus Diuretic were statistically significant when Diuretic was the referent agent,


while only Diuretic and ARB retained significance when Placebo was selected as the reference agent. In Table 2, comparisons statistically significant in the AB method but not in Elliott et al. [31], or vice versa, are noted with *. There are only two cells labeled with * because Elliott et al. provide only ORs with Diuretic and Placebo as referents. Therefore, although the AB method simultaneously provides all pairwise comparisons, we can compare only part of its results with the CB method, owing to the limitation in reporting of the CB method.

Table 2. Population averaged event proportions, relative risks (RRs), and risk differences (RDs) of the 6 antihypertensive agents based on model M2 (see Appendix)

            ACE                   ARB                   CCB                   Diuretic               Placebo                β-blocker
ACE         6.19% (4.67%, 7.99%)  1.08 (0.88, 1.34)     0.87 (0.75, 1.00)     0.72 (0.61, 0.83)      0.90 (0.79, 1.05)      0.75 (0.64, 0.86)
ARB         0.00 (-0.01, 0.02)    5.70% (4.21%, 7.56%)  0.80 (0.66, 0.96)     0.66 (0.54, 0.81)      0.83* (0.70, 1.00)     0.69 (0.57, 0.83)
CCB         -0.01 (-0.02, 0.00)   -0.01 (-0.03, -0.00)  7.12% (5.47%, 9.05%)  0.82 (0.71, 0.97)      1.04 (0.90, 1.22)      0.86 (0.76, 0.98)
Diuretic    -0.02 (-0.04, -0.01)  -0.03 (-0.05, -0.02)  -0.02 (-0.03, -0.00)  8.65% (6.62%, 10.96%)  1.26 (1.08, 1.49)      1.05 (0.89, 1.22)
Placebo     -0.01 (-0.02, 0.00)   -0.01* (-0.02, 0.00)  0.00 (-0.01, 0.01)    0.02 (0.01, 0.03)      6.84% (5.22%, 8.77%)   0.83* (0.70, 0.96)
β-blocker   -0.02 (-0.03, -0.01)  -0.03 (-0.04, -0.01)  -0.01 (-0.02, -0.00)  0.00 (-0.01, 0.02)     -0.01* (-0.03, -0.00)  8.27% (6.34%, 10.42%)

Drugs are reported in alphabetical order. Diagonal panels are the population averaged event rates (i.e., the proportion of patients who developed incident diabetes); upper triangular and lower triangular panels are the relative risks (RRs) and risk differences (RDs), respectively, of the first drug in alphabetical order compared with the second drug in alphabetical order. Drugs with higher event rates are more harmful; RRs smaller than 1.0 or negative RDs favor the first drug in alphabetical order. To obtain comparisons in the opposite direction, take the reciprocal of the RR and flip the sign of the RD. Statistically significant results are in bold and underlined. Comparisons statistically significant here but not in Elliott et al. [31], or vice versa, are noted with *. For all summaries, we report Bayesian posterior medians and 95% credible intervals.
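Because every cell of Table 2 derives from the marginal event rates on the diagonal, the triangles can be sanity-checked against the diagonal. A quick check (our own; the table reports posterior medians, so agreement is only approximate):

```python
# Rebuild one RR and one RD cell of Table 2 from the diagonal event rates.
pi = {"ACE": 0.0619, "ARB": 0.0570, "CCB": 0.0712,
      "Diuretic": 0.0865, "Placebo": 0.0684, "beta-blocker": 0.0827}
rr = pi["ACE"] / pi["CCB"]    # Table 2 reports 0.87
rd = pi["ACE"] - pi["CCB"]    # Table 2 reports -0.01
print(f"RR(ACE vs CCB) ~ {rr:.2f}, RD(ACE vs CCB) ~ {rd:+.3f}")
```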

Figure 1 compares the ORs (with Diuretic as the referent agent) estimated from the best AB model (M2) with those reported by Elliott et al. [31] The solid and dashed lines represent the posterior medians of the ORs, with their 95% CIs, estimated from the AB and CB (Elliott et al.) [31] methods, respectively. All ORs in Figure 1, from both methods, are lower than 1. In summary, the CB and AB methods are consistent in terms of statistical significance when Diuretic is selected as the referent agent.

In summary, we have illustrated how to estimate and interpret the event rates, RRs, RDs, and ORs with the AB method presented in the prior section. The CB method, in contrast, can provide only ORs. Despite the absence of differences in statistical significance between the AB and CB methods in this antihypertensive agents case study, the two methods do lead to differences in statistical significance in some other cases, for example, the two case studies in Zhang et al. [29]


Figure 1. Comparison of the odds ratios (ORs) estimated from the arm-based (AB) method versus the ORs estimated from the contrast-based (CB) method for the 6 antihypertensive agents. ACE = angiotensin-converting-enzyme inhibitors, ARB = angiotensin-receptor blockers, and CCB = calcium-channel blockers.

DISCUSSION AND FUTURE WORK

In this chapter, we introduced the arm-based method proposed by Zhang et al. [29] and illustrated how to provide more appropriate and comprehensive summaries for network meta-analysis with this method, using a published network meta-analysis on incident diabetes. Many current network meta-analysis methods focus on treatment contrasts (mainly ORs) and cannot estimate the treatment-specific event rate, RR and RD without a strong assumption of missing completely at random or without borrowing external data. The AB method, instead, is valid under the missing at random assumption and can accurately estimate the population-averaged event rate, RR, RD, and OR. [29] With the published NMA on incident diabetes, we illustrated how the AB method can be used to produce more appropriate statistics and more comprehensive reporting, which in some circumstances leads to conclusions different from the CB method. The comparisons between the AB and CB methods are discussed in detail in Zhang et al. [29, 32]

Our analysis has some limitations. First, the typical data set of an NMA resembles an incomplete block design with substantial missing data, and nonignorable missingness may appear owing to deliberate design choices; though Zhang et al. [32] proposed models incorporating nonignorable missingness, we still assume the missingness to be at random. Second, although the AB approach with random effects accounts for heterogeneity among treatments, there is no statistic that quantifies this heterogeneity. Third, ongoing debate over the value of network meta-analysis concerns the agreement between direct and indirect evidence, but we did not consider inconsistency in this chapter. There are several papers addressing the disagreement between direct and indirect information for the


CB approach. [15, 41-44] However, statistical methods for identifying and accounting for potential inconsistency in the AB approach await further development.

In summary, this chapter has presented the relatively new AB method proposed in Zhang et al., [29] stated its strengths, and illustrated how to provide comprehensive summaries using a published NMA on incident diabetes. Patients and their caregivers can choose their preferred summary statistics according to their particular circumstances. In a nutshell, the AB method is recommended in practice so that the appropriate statistics can be reported. [29]

CONFLICT OF INTEREST AND FUNDING DISCLOSURE

J. Z. is supported in part by the US NIAID AI103012.

APPENDIX

WinBUGS code for the various models (M1 through M4). [The code listing is garbled in this copy and is not reproduced; only its opening fragment, "model{ for(i in 1:sN) { p[i] ...", survives.]

[Several pages are missing here, spanning the end of this appendix and the opening of the following case study chapter; the recoverable text resumes mid-results:] ... (>99% probability of being ranked the least effective treatment). Overall, the ranking plot indicates some differences between the outcome measures, while an overall trend can be observed.


AbaIV, abatacept intravenously; AbaSC, abatacept subcutaneously; Ada, adalimumab; Cert, certolizumab; Eta, etanercept; Gol, golimumab; Inf, infliximab. Figure 4. Posterior distribution of the relative efficacy of each biologic agent versus placebo estimated using ACR20, ACR50, and HAQ (as defined in Figure 3).

Checking for Inconsistency

Figure 6 plots the posterior mean deviance of each data point under the model relaxing the consistency assumption against that under the original model. The contributions to the deviance are similar in both models, indicating no inconsistencies within the network for any of the three outcome measures. The deviance contributions of some of the multi-armed trials [26-30] are higher than expected; this can be attributed to the fixed effects assumption within trials where the same treatment is given.


Figure 5. Ranking plot. For each treatment, the probability of it being the best, second best etc. is plotted given each of the three outcome measures, ACR20, ACR50 and HAQ (as defined in Figure 3).


RCT, randomized controlled trial; OBS, observational. Figure 6. Individual data points‘ posterior mean deviance contribution for the model relaxing the consistency assumption against the original model for each of the three outcome measures.

Including Observational Evidence

As only one of the observational trials identified in the literature search reports ACR20 and ACR50 outcomes, this analysis is restricted to the HAQ outcome. Fitting a three-level hierarchical model yields overall estimates of efficacy as well as estimates for each study type. Table 3 summarizes the biologic treatment effect estimates on both levels; for brevity, only comparisons versus placebo are recorded. Only estimates involving adalimumab, infliximab and etanercept change, as no further information was added for the remaining treatments. The efficacy of etanercept is affected most by the inclusion of observational data: the multiplier drops from 0.31, based only on RCT evidence, to 0.24 when including observational data. The efficacy of adalimumab and infliximab increases slightly, from 0.21 to 0.22 for adalimumab and from 0.11 to 0.12 for infliximab. The results show how RCT-level estimates and observational-level estimates are combined into overall estimates, which lie between the trial design estimates. Borrowing strength across the network results in the trial design level estimates being drawn towards the overall mean. Figure 7 illustrates the shift of the posterior distribution, from RCT only to including observational evidence, for the effectiveness of etanercept compared to placebo.

Figure 8 illustrates the effects of adjusting for bias. Adjusting for overprecision by down-weighting the impact of observational information shifts overall results towards the RCT-only results, as the weight on observational evidence is decreased. The effect on overall estimates is small when adjusting for an assumed 30% over- and underestimate of the treatment effect. This is due to the strength of the observational network lying in the relative effects between biologic treatments rather than in the effects relative to placebo.
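The down-weighting idea can be made concrete with a toy calculation (a sketch of the general principle, not the authors' model; the estimates and variances are hypothetical). Inflating the variance of the observational summary by 1/w before inverse-variance pooling means that w = 1 gives the observational evidence full weight, while w near 0 recovers the RCT-only estimate:

```python
# Inverse-variance pooling with a down-weighting factor w on observational data.
def pooled(est_rct, var_rct, est_obs, var_obs, w):
    precision = 1.0 / var_rct + w / var_obs
    return (est_rct / var_rct + w * est_obs / var_obs) / precision

est_rct, var_rct = 0.31, 0.003   # hypothetical RCT-level HAQ multiplier
est_obs, var_obs = 0.22, 0.003   # hypothetical observational counterpart
for w in (1.0, 0.5, 0.1):
    print(f"w = {w:.1f}: pooled estimate = "
          f"{pooled(est_rct, var_rct, est_obs, var_obs, w):.3f}")
```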

DISCUSSION

The NMA presents efficacy estimates for each biologic agent against each of the others. Consistent with the individual trials, all agents show a strong improvement over placebo across all outcome measures. While overall trends remain the same, the ranking of treatments depends on the outcome measure.


Table 3. Three-level hierarchical model results: mean estimate and 95% credible interval for relative efficacy estimates

Comparison (A vs. B)   Overall level       RCT level           OBS level           RCT only
Ada vs. P              0.22 (0.12, 0.33)   0.21 (0.17, 0.26)   0.23 (0.13, 0.34)   0.21 (0.16, 0.26)
Inf vs. P              0.12 (0.02, 0.23)   0.12 (0.05, 0.19)   0.13 (0.02, 0.24)   0.11 (0.04, 0.19)
Eta vs. P              0.24 (0.14, 0.37)   0.27 (0.17, 0.37)   0.22 (0.12, 0.34)   0.31 (0.20, 0.41)
Gol vs. P              0.21 (0.11, 0.32)   0.21 (0.11, 0.32)   --                  0.21 (0.12, 0.31)
Cert vs. P             0.24 (0.19, 0.30)   0.24 (0.19, 0.30)   --                  0.24 (0.19, 0.29)
Aba IV vs. P           0.22 (0.12, 0.33)   0.22 (0.12, 0.33)   --                  0.22 (0.12, 0.32)
Aba SC vs. P           0.21 (0.12, 0.31)   0.21 (0.12, 0.31)   --                  0.21 (0.12, 0.30)
ζ                      0.05 (0.00, 0.10)   0.03 (0.00, 0.08)   0.07 (0.00, 0.24)   0.03 (0.00, 0.07)

Mean difference in % HAQ improvement and 95% credible interval for each biologic agent versus placebo on study-type and overall level. RCT, randomized controlled trial; OBS, observational trial; Ada, adalimumab; Inf, infliximab; Eta, etanercept; Gol, golimumab; Cert, certolizumab; Aba IV, abatacept intravenously; Aba SC, abatacept subcutaneously; P, placebo; ζ, random effects standard deviation.

RCT, randomized controlled trial; OBS, observational study. Figure 7. Posterior distribution of the relative efficacy of etanercept versus placebo estimated using the Health Assessment Questionnaire (HAQ) outcome, based on randomized controlled trial evidence only and including observational evidence.

The dependence on the cut-off point for binary outcome measures is a common problem and has recently been assessed in the context of NMA [31]. Incorporating observational data shrinks the estimated differences in effects between treatments compared with analyzing RCT evidence alone. Some heterogeneity was detected in the analysis: the between-trial variance (ζ²) captures the heterogeneity between study populations given the same drug, while variation between drugs is not captured. The analysis illustrates a framework for systematically including evidence from different trial designs in an NMA model. The hierarchical modelling approach accounts for uncertainty arising from combining information from different trial designs by using random effects, and allows for bias adjustment due to design.


RCT, randomized controlled trial. Figure 8. Bias adjustment in the hierarchical model: percentage Health Assessment Questionnaire (HAQ) improvement of etanercept versus placebo; a) adjusting for overprecision by applying different weights to observational data (estimates shift towards RCT-only results as the weight on observational evidence decreases); b) adjusting for a 30% over- and 30% underestimation of the treatment effect.

The hierarchy levels allow us to quantify the impact that evidence from different designs has on the result, while adjusting for potential bias. Other methodologies for combining different trial designs include naïve pooling, which does not distinguish between designs, and using observational data as prior information in the Bayesian framework. Naïve pooling allows neither bias adjustment nor additional uncertainty due to design variation. Using observational data to inform the prior allows for bias adjustment, but does not model heterogeneity due to trial design; furthermore, it is not possible to distinguish between more than two designs. A hierarchical modelling approach is therefore the most flexible approach to combining data from different designs. A challenge is posed by open-label extension trials: such trials are typically one-armed and cannot be included in the analysis without adjustment. In this case study, we took a matching approach, in which baseline characteristics are compared across available trials to identify a suitable match [32]. However, such methods do not control for unobserved variables that may also affect the outcome. We therefore excluded the one-armed trial from the analysis in a sensitivity analysis, which did not change the results notably; those results are not reported in this chapter.

CONCLUSION

The analysis confirms the results from individual trials: biologic agents provide a significant improvement over placebo in the treatment of rheumatoid arthritis in patients who have previously failed MTX. Some differences between biologic agents are observed; however, the ranking depends on the outcome measure chosen, highlighting the fact that the choice of cut-off point for binary outcome measures can have a significant impact on the results. The cut-off point in clinical trials is typically chosen with the aim of demonstrating a particular treatment effect in order to gain market authorization. The aim of the NMA in this chapter is to estimate the relative efficacy of agents, all of which have already gained market authorization; the results may inform treatment choice and cost-effectiveness analyses.


Due to this different objective, different cut-off thresholds may be of interest. This issue does not arise when analyzing continuous outcome measures. The addition of observational evidence to the analysis further strengthens the results, confirming that the treatment effects are reproduced in a real-life setting. For many disease areas, abundant observational data are available, providing additional information on treatment effectiveness. We think it is important for an informed decision-making process to include all available evidence. Including observational data in the base case analysis, or in the form of a sensitivity analysis, can greatly improve evidence synthesis as part of economic assessments, as well as the choice of agent in a clinical setting. The method described in this chapter provides a flexible approach to analyzing the impact of such additional information. Including all available evidence, while acknowledging heterogeneity between study designs, is the basis of informed decision making.

REFERENCES

[1] Klareskog, L., Catrina, A. I., Paget, S. Rheumatoid arthritis. Lancet 2009; 373: 659-72.
[2] Rothwell, P. M. External validity of randomised controlled trials: "to whom do the results of this trial apply?". Lancet 2005; 365: 82-93.
[3] O'Rourke, K., Walsh, C., Hutchinson, M. Outcome of beta-interferon treatment in relapsing-remitting multiple sclerosis: a Bayesian analysis. J. Neurol. 2007; 254: 1547-54.
[4] Ades, A., Sculpher, M., Sutton, A., Abrams, K., Cooper, N., Welton, N., Lu, G. Bayesian methods for evidence synthesis in cost-effectiveness analysis. Pharmacoeconomics 2006; 24: 1-19.
[5] Sutton, A. J., Higgins, J. Recent developments in meta-analysis. Stat. Med. 2008; 27: 625-50.
[6] Prevost, T. C., Abrams, K. R., Jones, D. R. Hierarchical models in generalized synthesis of evidence: an example based on studies of breast cancer screening. Stat. Med. 2000; 19: 3359-76.
[7] Schmitz, S., Adams, R., Walsh, C. Incorporating data from various trial designs into a mixed treatment comparison model. Stat. Med. 2013; 32: 2935-49.
[8] Schmitz, S., Adams, R., Walsh, C. D., Barry, M., FitzGerald, O. A mixed treatment comparison of the efficacy of anti-TNF agents in rheumatoid arthritis for methotrexate non-responders demonstrates differences between treatments: a Bayesian approach. Ann. Rheum. Dis. 2012; 71: 225-30.
[9] Saag, K. G., Teng, G. G., Patkar, N. M., Anuntiyo, J., Finney, C., Curtis, J. R., Paulus, H. E., Mudano, A., Pisu, M., Elkins-Melton, M., Outman, R., Allison, J. J., Suarez Almazor, M., Bridges, S. L. Jr, Chatham, W. W., Hochberg, M., MacLean, C., Mikuls, T., Moreland, L. W., O'Dell, J., Turkiewicz, A. M., Furst, D. E.; American College of Rheumatology. American College of Rheumatology 2008 recommendations for the use of nonbiologic and biologic disease modifying antirheumatic drugs in rheumatoid arthritis. Arthritis Rheum. 2008; 59: 762-84.
[10] Hyrich, K. L., Watson, K. D., Silman, A. J., Symmons, D. P.; British Society for Rheumatology Biologics Register. Predictors of response to anti-TNF-α therapy among patients with rheumatoid arthritis: results from the British Society for Rheumatology Biologics Register. Rheumatology (Oxford) 2006; 45: 1558-65.
[11] McCarron, C. E., Pullenayegum, E. M., Thabane, L., Goeree, R., Tarride, J.-E. The importance of adjusting for potential confounders in Bayesian hierarchical models synthesising evidence from randomised and non-randomised studies: an application comparing treatments for abdominal aortic aneurysms. BMC Med. Res. Methodol. 2010; 10: 64.
[12] Spiegelhalter, D. J., Abrams, K. R., Myles, J. P. Bayesian approaches to clinical trials and health-care evaluation. New York: John Wiley and Sons; 2004.
[13] Turner, R. M., Spiegelhalter, D. J., Smith, G., Thompson, S. G. Bias modelling in evidence synthesis. J. R. Stat. Soc. A 2009; 172: 21-47.
[14] Lilford, R., Braunholtz, D. The statistical basis of public policy: a paradigm shift is overdue. BMJ 1996; 313: 603.
[15] Benson, K., Hartz, A. J. A comparison of observational studies and randomized, controlled trials. New Engl. J. Med. 2000; 342: 1878-86.
[16] Concato, J., Shah, N., Horwitz, R. I. Randomized, controlled trials, observational studies, and the hierarchy of research designs. New Engl. J. Med. 2000; 342: 1887-92.
[17] Nixon, R., Bansback, N., Brennan, A. Using mixed treatment comparisons and meta-regression to perform indirect comparisons to estimate the efficacy of biologic treatments in rheumatoid arthritis. Stat. Med. 2007; 26: 1237-54.
[18] Dias, S., Welton, N. J., Sutton, A. J., Caldwell, D. M., Lu, G., Ades, A. E. NICE DSU Technical Support Document 4: Inconsistency in Networks of Evidence Based on Randomised Controlled Trials. 2011. Available from: http://www.nicedsu.org.uk (last accessed on March 28, 2014).
[19] Lunn, D. J., Thomas, A., Best, N., Spiegelhalter, D. WinBUGS-a Bayesian modelling framework: concepts, structure, and extensibility. Stat. Comput. 2000; 10: 325-37.
[20] Plummer, M., Best, N., Cowles, K., Vines, K. CODA: convergence diagnosis and output analysis for MCMC. R News 2006; 6: 7-11.
[21] Kievit, W., Adang, E. M., Fransen, J., Kuper, H. H., van de Laar, M. A., Jansen, T. L., De Gendt, C. M., De Rooij, D. J., Brus, H. L., Van Oijen, P. C., Van Riel, P. C. The effectiveness and medication costs of three anti-tumour necrosis factor α agents in the treatment of rheumatoid arthritis from prospective clinical practice data. Ann. Rheum. Dis. 2008; 67: 1229-34.
[22] Bazzani, C., Filippini, M., Caporali, R., Bobbio-Pallavicini, F., Favalli, E. G., Marchesoni, A., Atzeni, F., Sarzi-Puttini, P., Gorla, R. Anti-TNFα therapy in a cohort of rheumatoid arthritis patients: clinical outcomes. Autoimmun. Rev. 2009; 8: 260-5.
[23] Klareskog, L., Gaubitz, M., Rodriguez-Valverde, V., Malaise, M., Dougados, M., Wajdula, J. A long-term, open-label trial of the safety and efficacy of etanercept (Enbrel) in patients with rheumatoid arthritis not treated with other disease-modifying antirheumatic drugs. Ann. Rheum. Dis. 2006; 65: 1578-84.
[24] Ades, A., Welton, N., Lu, G. Introduction to Mixed Treatment Comparisons. 2007. Available at: http://www.bristol.ac.uk/social-community-medicine/media/mpes/intro-to-mtc.pdf (last accessed on March 28, 2014).
[25] Weinblatt, M. E., Kremer, J. M., Bankhurst, A. D., Bulpitt, K. J., Fleischmann, R. M., Fox, R. I., Jackson, C. G., Lange, M., Burge, D. J. A trial of etanercept, a recombinant tumor necrosis factor receptor: Fc fusion protein, in patients with rheumatoid arthritis receiving methotrexate. New Engl. J. Med. 1999; 340: 253-9.
[26] Weinblatt, M. E., Keystone, E. C., Furst, D. E., Moreland, L. W., Weisman, M. H., Birbara, C. A., Teoh, L. A., Fischkoff, S. A., Chartash, E. K. Adalimumab, a fully human anti–tumor necrosis factor α monoclonal antibody, for the treatment of rheumatoid arthritis in patients taking concomitant methotrexate: the ARMADA trial. Arthritis Rheum. 2003; 48: 35-45.
[27] Van de Putte, L. B., Atkins, C., Malaise, M., Sany, J., Russell, A. S., van Riel, P. L., Settas, L., Bijlsma, J. W., Todesco, S., Dougados, M., Nash, P., Emery, P., Walter, N., Kaul, M., Fischkoff, S., Kupper, H. Efficacy and safety of adalimumab as monotherapy in patients with rheumatoid arthritis for whom previous disease modifying antirheumatic drug treatment has failed. Ann. Rheum. Dis. 2004; 63: 508-16.
[28] Miyasaka, N. Clinical investigation in highly disease-affected rheumatoid arthritis patients in Japan with adalimumab applying standard and general evaluation: the CHANGE study. Mod. Rheumatol. 2008; 18: 252-62.
[29] Maini, R., St Clair, E. W., Breedveld, F., Furst, D., Kalden, J., Weisman, M., Smolen, J., Emery, P., Harriman, G., Feldmann, M., Lipsky, P.; ATTRACT Study Group. Infliximab (chimeric anti-tumour necrosis factor α monoclonal antibody) versus placebo in rheumatoid arthritis patients receiving concomitant methotrexate: a randomised phase III trial. Lancet 1999; 354: 1932-9.
[30] Kremer, J. M., Dougados, M., Emery, P., Durez, P., Sibilia, J., Shergy, W., Steinfeld, S., Tindall, E., Becker, J. C., Li, T., Nuamah, I. F., Aranda, R., Moreland, L. W. Treatment of rheumatoid arthritis with the selective costimulation modulator abatacept: twelve-month results of a phase IIb, double-blind, randomized, placebo-controlled trial. Arthritis Rheum. 2005; 52: 2263-71.
[31] Schmitz, S., Adams, R., Walsh, C. The use of continuous data versus binary data in MTC models: a case study in rheumatoid arthritis. BMC Med. Res. Methodol. 2012; 12: 167.
[32] D'Agostino Jr, R. B., D'Agostino Sr, R. B. Estimating treatment effects using observational data. JAMA 2007; 297: 314-6.
[33] Keystone, E. C., Kavanaugh, A. F., Sharp, J. T., Tannenbaum, H., Hua, Y., Teoh, L. S., Fischkoff, S. A., Chartash, E. K. Radiographic, clinical, and functional outcomes of treatment with adalimumab (a human anti–tumor necrosis factor monoclonal antibody) in patients with active rheumatoid arthritis receiving concomitant methotrexate therapy: a randomized, placebo-controlled, 52-week trial. Arthritis Rheum. 2004; 50: 1400-11.
[34] Kim, H. Y., Lee, S. K., Song, Y. W., Yoo, D. H., Koh, E. M., Yoo, B., Luo, A. A randomized, double-blind, placebo-controlled, phase III study of the human anti-tumor necrosis factor antibody adalimumab administered as subcutaneous injections in Korean rheumatoid arthritis patients treated with methotrexate. APLAR J. Rheumatol. 2007; 10: 9-16.
[35] Westhovens, R., Yocum, D., Han, J., Berman, A., Strusberg, I., Geusens, P., Rahman, M. U.; START Study Group. The safety of infliximab, combined with background treatments, among patients with rheumatoid arthritis and various comorbidities: a large, randomized, placebo-controlled trial. Arthritis Rheum. 2006; 54: 1075-86.


[36] Zhang, F. C., Hou, Y., Huang, F., Wu, D. H., Bao, C. D., Ni, L. Q., Yao, C. Infliximab versus placebo in rheumatoid arthritis patients receiving concomitant methotrexate: a preliminary study from China. APLAR J. Rheumatol. 2006; 9: 127-30.
[37] Schiff, M., Keiserman, M., Codding, C., Songcharoen, S., Berman, A., Nayiager, S., Saldate, C., Li, T., Aranda, R., Becker, J. C., Lin, C., Cornet, P. L., Dougados, M. Efficacy and safety of abatacept or infliximab vs placebo in ATTEST: a phase III, multi-centre, randomised, double-blind, placebo-controlled study in patients with rheumatoid arthritis and an inadequate response to methotrexate. Ann. Rheum. Dis. 2008; 67: 1096-103.
[38] Moreland, L. W., Schiff, M. H., Baumgartner, S. W., Tindall, E. A., Fleischmann, R. M., Bulpitt, K. J., Weaver, A. L., Keystone, E. C., Furst, D. E., Mease, P. J., Ruderman, E. M., Horwitz, D. A., Arkfeld, D. G., Garrison, L., Burge, D. J., Blosch, C. M., Lange, M. L., McDonnell, N. D., Weinblatt, M. E. Etanercept therapy in rheumatoid arthritis. A randomized, controlled trial. Ann. Intern. Med. 1999; 130: 478-86.
[39] Keystone, E. C., Genovese, M. C., Klareskog, L., Hsia, E. C., Hall, S. T., Miranda, P. C., Pazdur, J., Bae, S. C., Palmer, W., Zrubek, J., Wiekowski, M., Visvanathan, S., Wu, Z., Rahman, M. U.; GO-FORWARD Study. Golimumab, a human antibody to tumour necrosis factor α given by monthly subcutaneous injections, in active rheumatoid arthritis despite methotrexate therapy: the GO-FORWARD Study. Ann. Rheum. Dis. 2009; 68: 789-96.
[40] Kay, J., Matteson, E. L., Dasgupta, B., Nash, P., Durez, P., Hall, S., Hsia, E. C., Han, J., Wagner, C., Xu, Z., Visvanathan, S., Rahman, M. U. Golimumab in patients with active rheumatoid arthritis despite treatment with methotrexate: a randomized, double-blind, placebo-controlled, dose-ranging study. Arthritis Rheum. 2008; 58: 964-75.
[41] Keystone, E., Heijde, D. V., Mason, D. Jr, Landewé, R., Vollenhoven, R. V., Combe, B., Emery, P., Strand, V., Mease, P., Desai, C., Pavelka, K. Certolizumab pegol plus methotrexate is significantly more effective than placebo plus methotrexate in active rheumatoid arthritis: findings of a fifty-two-week, phase III, multicenter, randomized, double-blind, placebo-controlled, parallel-group study. Arthritis Rheum. 2008; 58: 3319-29.
[42] Smolen, J., Landewé, R. B., Mease, P., Brzezicki, J., Mason, D., Luijtens, K., van Vollenhoven, R. F., Kavanaugh, A., Schiff, M., Burmester, G. R., Strand, V., Vencovsky, J., van der Heijde, D. Efficacy and safety of certolizumab pegol plus methotrexate in active rheumatoid arthritis: the RAPID 2 study. A randomised controlled trial. Ann. Rheum. Dis. 2009; 68: 797-804.
[43] Fleischmann, R., Vencovsky, J., van Vollenhoven, R. F., Borenstein, D., Box, J., Coteur, G., Goel, N., Brezinschek, H. P., Innes, A., Strand, V. Efficacy and safety of certolizumab pegol monotherapy every 4 weeks in patients with rheumatoid arthritis failing previous disease-modifying antirheumatic therapy: the FAST4WARD study. Ann. Rheum. Dis. 2009; 68: 805-11.
[44] Genovese, M. C., Covarrubias, A., Leon, G., Mysler, E., Keiserman, M., Valente, R., Nash, P., Simon-Campos, J. A., Porawska, W., Box, J., Legerton, C. 3rd, Nasonov, E., Durez, P., Aranda, R., Pappu, R., Delaet, I., Teng, J., Alten, R. Subcutaneous abatacept versus intravenous abatacept: a phase IIIb noninferiority study in patients with an inadequate response to methotrexate. Arthritis Rheum. 2011; 63: 2854-64.


[45] Weinblatt, M. E., Schiff, M., Valente, R., van der Heijde, D., Citera, G., Zhao, C., Maldonado, M., Fleischmann, R. Head-to-head comparison of subcutaneous abatacept versus adalimumab for rheumatoid arthritis: findings of a phase IIIb, multinational, prospective, randomized study. Arthritis Rheum. 2013; 65: 28-38.


7TH SECTION


In: Network Meta-Analysis Editor: Giuseppe Biondi-Zoccai

ISBN: 978-1-63321-001-1 © 2014 Nova Science Publishers, Inc.

Chapter 21

MOVING FROM EVIDENCE SYNTHESIS TO ACTION

Fabrizio D'Ascenzo, M.D.,1,* Claudio Moretti, M.D., Ph.D.,2 Pierluigi Omedè, M.D.3 and Fiorenzo Gaita, M.D.4

1 Fellow, 2,3 Consultant, 4 Professor, Division of Cardiology, Department of Internal Medicine, Città della Salute e della Scienza, Turin, Italy

ABSTRACT

The growing body of evidence suitable for decision making is often, and indeed paradoxically, limited by the absence of direct and adequately powered head-to-head comparisons between the various treatments that are exclusively or specifically indicated for the same disease. Systematic reviews including network meta-analysis may help to overcome these limits. Indeed, network meta-analyses may combine the results of direct and indirect comparisons (when the former type of study is available) or create hitherto unprecedented estimates from indirect comparisons (when no direct head-to-head trial has been reported). In this chapter we discuss how to read and exploit a network meta-analysis in order to make the best use of its findings and translate what may appear to be a dry collection of quantitative figures into meaningful suggestions for decision making and the improvement of patient health, while concomitantly minimizing, at least in relative terms, costs.

Keywords: Action, adjusted indirect comparison, evidence-based medicine, meta-analysis, network meta-analysis, pairwise meta-analysis, systematic review



Corresponding author: Dr. Fabrizio D'Ascenzo, Division of Cardiology, Department of Internal Medicine, Città della Salute e della Scienza, Corso Bramante 88-90, 10126 Turin, Italy. Phone: +39 3333992707. Fax: +39 0116967053. Email: [email protected].


INTRODUCTION

Randomized controlled trials (RCTs) represent the cornerstone of evidence and offer many advantages. Randomization, that is, the allocation of patients to two or more treatments purely by chance, offers physicians an independent way to compare different strategies for common problems [1]. Unlike observational studies, which remain exposed to confounding even after multivariable adjustment [2], RCTs balance both known and unknown confounders, leading to an accurate and unbiased evaluation of the causal relationship between, for example, a disease and a specific intervention. In an era of deep and profound economic changes [3], RCTs comparing different strategies head to head will become less frequent. First, it will be more difficult to obtain sufficient funding and economic resources from either public institutions or private companies. Moreover, each company will be less inclined to compare a new strategy directly with another, preferring to test its efficacy and side effects against placebo rather than against the best available treatment or gold standard. Finally, in some cases the disease of interest remains quite infrequent, not allowing direct comparisons [4]. Pairwise meta-analysis (PWMA) may partially overcome these limitations [5]. Indeed, a pooled analysis with a larger sample size may move the evidence from a merely theoretical benefit to a demonstrated substantial reduction in "hard" clinical end points. On the other hand, it was recently shown that nearly one quarter of highly cited RCTs published between 1990 and 2003 in leading medical journals reported results that initially overstated effects in favor of the therapy. Pooling data across studies also allows evaluation of the impact of a treatment on specific subgroups of patients and in varying clinical environments. At the same time, however, PWMA cannot overcome the intrinsic limitation of RCTs of comparing only a limited number of concurrent, parallel choices. For example, five classes of drugs are commonly used in the treatment of hypertension (calcium channel blockers, beta-blockers, ACE inhibitors, sartans, and diuretics): a factorial trial testing every combination of these five classes would require 2^5 = 32 different arms. Network meta-analysis (NMA) represents a possible solution to these limits. When the available RCTs of interest do not all compare the same interventions, but each trial compares only a subset of the interventions of interest, it is possible to develop a network of RCTs in which every trial has at least one intervention in common with another. Such a network allows indirect comparisons of interventions not studied in a head-to-head fashion.
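To make this concrete, here is a minimal sketch, in Python and with purely hypothetical numbers, of the adjusted indirect comparison that underlies such networks: the effect of A versus B is derived from trials of A versus C and of B versus C, with the variances of the two direct estimates adding up. The helper function is ours, for illustration only, not part of any published package.

```python
import math

def indirect_comparison(lor_ac, se_ac, lor_bc, se_bc):
    """Adjusted indirect comparison of A vs. B through a common comparator C.
    Inputs are log odds ratios (and standard errors) of A vs. C and B vs. C."""
    lor_ab = lor_ac - lor_bc                # effects combine on the log scale
    se_ab = math.sqrt(se_ac**2 + se_bc**2)  # variances add: indirect evidence is less precise
    return lor_ab, se_ab

# Hypothetical trial results: OR(A vs C) = 0.70, OR(B vs C) = 0.90
lor_ab, se_ab = indirect_comparison(math.log(0.70), 0.15, math.log(0.90), 0.18)
lo, hi = lor_ab - 1.96 * se_ab, lor_ab + 1.96 * se_ab
print(f"Indirect OR(A vs B) = {math.exp(lor_ab):.2f} "
      f"(95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

Note how the indirect confidence interval is wider than either direct one: an indirect comparison preserves randomization but pays for it in precision.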

EVIDENCE SYNTHESIS: WHEN IS IT NEEDED?

Evidence rests on the biological rationale of clinical, pharmacological and interventional procedures, which needs to be tested both in observational studies and in randomized controlled trials. Especially in recent years, however, a parallel and contrary trend has been observed: the large number of available choices for the same pathology rests, paradoxically, on a smaller amount of randomized controlled evidence. A practical example concerns the choice of antiplatelet and anticoagulant therapy for patients undergoing percutaneous coronary intervention (PCI).


Although this problem involves thousands of patients and their physicians [6], it has been addressed by only a single randomized controlled trial [7], and consequently carries a low level of evidence. This trend may lead to several common problems. First, there is a lower probability of finding a significant result, owing to small sample sizes. Second, there is an increased tendency toward underreporting of nonsignificant findings, driven both by publication bias and by economic issues [8]. In this clinical situation, a physician is often offered many potential solutions for the same pathology in the absence of head-to-head comparisons [9]. NMA may partially overcome these limitations, offering an efficient way of comparing different strategies.

HOW TO READ AND CONDUCT A NETWORK META-ANALYSIS, WITH EXAMPLES

For a physician facing an NMA for the first time, some issues need to be considered. First, high-quality NMA has been considered by the National Institute for Clinical Excellence (NICE) to be among the highest levels of evidence, along with head-to-head meta-analysis [10]. Second, NMAs are useful when comparing different, mutually exclusive strategies that cannot be administered together, such as first-line drugs for pulmonary arterial hypertension or stents for coronary artery disease [11-12]. Fortunately, some points are in common with PWMA, although with some differences (see Figure 1):

• Pre-defined outcomes: the appraisal should include an evaluation of the different definitions of outcomes among the included studies.

How to read. Eligibility criteria combine aspects of the research question of interest (e.g., Population, Interventions, Comparisons, and Outcomes) with an assessment of the studies that have addressed this clinical question [13]. In an NMA, different combinations of direct and indirect evidence, some independent and some overlapping, contribute to the comparisons and the estimates of treatment effect [14-16]. One of the most pressing challenges for physicians, especially in NMA, is the decision whether or not to include interventions no longer in use, or placebos. Interestingly, two systematic reviews that used direct and indirect evidence on the comparative effectiveness of second-generation antidepressants for major depressive disorder reached discordant conclusions [17], because only one of them included placebo-controlled studies. The other major challenge is posed by studies evaluating the same drug at different doses, offering a wide range of possible choices.

How to do. No universal recommendation is possible: including all studies is attractive because it yields a larger sample size, and consequently a higher chance of obtaining significant results, whereas pooling older or never-adopted strategies may limit clinical relevance. A possible solution is a sensitivity analysis excluding, for example, studies with placebo arms and/or strategies that are not recommended.


Figure 1. Points in common between pairwise and network meta-analyses.

Finally, for studies testing a drug at different doses, only the doses approved and used in everyday clinical practice should be appraised.

Examples. In a recent paper from our group, currently under submission, we compared different strategies for patients presenting with acute coronary syndromes. All of these drugs are commonly used, except for ximelagatran, and some of them, like rivaroxaban [18], were tested at different doses. Our choice, in order to make the findings as useful as possible to physicians, was to perform an overall analysis including only the drugs approved for everyday clinical practice, plus a sensitivity analysis also including ximelagatran. In this particular case, there were no differences between the primary and sensitivity analyses. In case of discordance, however, we recommend comparing only drugs actually available for clinical use.

• Literature search: it should be accurate and comprehensive, covering at least two databases, performed by two or more blinded authors, and based on an explicit search strategy [19].

How to read. Various forms of reporting bias have been identified in the literature: NMAs involving both drug and non-drug interventions, for example, may be affected disproportionately if industry-sponsored trials are subject to greater reporting biases than other studies. Similarly, the internal validity of an NMA of drug interventions may be affected if placebo-controlled trials are subject to greater reporting biases than active-controlled trials [20]. Moreover, in that paper, seventy-eight percent of the trials submitted to the FDA were published, and trials with active controls or statistically significant outcomes favoring the test drug were more likely to be published; in a multivariate model, trials with favorable primary outcomes and active controls were more likely to be published.

How to do. The search string should be as accurate as possible, with particular attention to congress and abstract reports. Contacting the corresponding authors of the various papers may be of great help in this setting, enlarging the pool of eligible reports.


For example, the recent paper by Hart et al. [21] showed that the addition of unpublished FDA trial data tended to lower the estimated efficacy of a drug and to increase its estimated harm.

Examples. In our common experience, at least two search strategies should be tested by at least two different researchers. For example, two different researchers (the second with 10 more years of experience) created two alternative strategies for a meta-analysis of the performance of fractional flow reserve. The first strategy was "((fractional flow reserve) OR (FFR) OR (pressure wire)) AND ((myocardial infarction) OR (revascularization) OR (death)) NOT (review[pt] OR editorial[pt] OR letter[pt])", retrieving 383 items, while the second was "((fractional AND flow AND reserve) OR ffr OR (pressure AND wire)) AND ((myocardial AND infarction) OR (coronary AND (revascularization OR angioplasty OR stent*)) OR (death)) NOT (review[pt] OR editorial[pt] OR letter[pt])", retrieving 280 papers. The second strategy, being more precise, led to the inclusion of the same set of studies as the first while requiring the screening of fewer papers. For a more detailed treatment, see the chapter on searching evidence by Golder et al.

• Evaluation of heterogeneity: heterogeneity is the variation among the results of individual trials beyond that expected from chance [22].

How to read. A test for heterogeneity examines the null hypothesis that all studies are evaluating the same effect. If heterogeneity is low, the authors may assume the underlying effects to be similar, but not identical. A random-effects model assumes that the study-specific effects come from a common distribution with some central value and some degree of variability, acknowledging that the patients do not derive from a fully comparable population. Conceptual heterogeneity, in turn, refers to differences in methods, study design, study populations, settings, definitions and measurements of outcome, follow-up, co-interventions, or other features that make trials different.

How to do. Heterogeneity should be assessed quantitatively, using the chi-squared test and the I-squared statistic, exactly as in PWMA. Depending on the heterogeneity, analyses under both fixed-effect and random-effects models should be performed, and the more conservative one should be presented in the paper.

Examples. The recent paper by DiNicolantonio et al. [23] directly compared carvedilol versus beta-1 selective beta-blockers in patients with heart failure (HF), demonstrating under a fixed-effect model a mortality benefit for carvedilol, which however was not confirmed in the random-effects analysis. The big practical question for a physician is: should I give carvedilol to reduce the mortality of my patients with HF, or not? In this case, one could note that heterogeneity was low (10%), theoretically and statistically permitting use of the fixed-effect model. But a more careful clinical assessment should drive researchers and/or physicians towards the random-effects analysis: it is quite unlikely that the patients in these trials were so similar as to derive from a truly consistent and homogeneous population. From a statistical point of view, the fixed-effect model assumes that no (or a negligible amount of) heterogeneity exists. This assumption is recognized to be typically unrealistic. When heterogeneity exists and the fixed-effect model is applied, uncertainty intervals (for example, 95% credible intervals) become artificially narrow.
For this reason, the random-effects model, which does assume and account for unexplained heterogeneity, is typically preferred. Finally, a recent NMA by Chatterjee et al. [24] showed no differences between beta-blockers for heart failure patients. For further details, see the chapter by Beyene et al. on choosing the most appropriate statistical model.
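As a rough illustration of the quantities just discussed, the following self-contained sketch, using hypothetical log odds ratios and the standard inverse-variance and DerSimonian-Laird formulas, computes Cochran's Q, the I-squared statistic, and both fixed-effect and random-effects pooled estimates, showing how the random-effects standard error widens once between-trial variance is acknowledged.

```python
import math

def heterogeneity_and_pooling(effects, ses):
    """Cochran's Q, I-squared, and fixed- vs. random-effects pooling
    (inverse variance / DerSimonian-Laird) for study-level effects, e.g. log ORs."""
    w = [1 / se**2 for se in ses]                          # fixed-effect weights
    pooled_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled_fe)**2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird estimate of the between-study variance tau^2
    tau2 = max(0.0, (q - df) / (sum(w) - sum(wi**2 for wi in w) / sum(w)))
    w_re = [1 / (se**2 + tau2) for se in ses]              # random-effects weights
    pooled_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_fe = math.sqrt(1 / sum(w))
    se_re = math.sqrt(1 / sum(w_re))                        # wider whenever tau^2 > 0
    return pooled_fe, se_fe, pooled_re, se_re, q, i2

# Hypothetical log odds ratios and standard errors from four trials
y = [-0.40, -0.10, -0.55, 0.05]
s = [0.20, 0.25, 0.30, 0.22]
fe, se_fe, re, se_re, q, i2 = heterogeneity_and_pooling(y, s)
print(f"Q = {q:.2f}, I-squared = {i2:.0f}%")
print(f"Fixed effect:   OR = {math.exp(fe):.2f} (SE {se_fe:.2f})")
print(f"Random effects: OR = {math.exp(re):.2f} (SE {se_re:.2f})")
```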


• Methodological assessment: the internal validity of the included studies should be assessed according to Cochrane criteria and reported in the discussion and in the conclusions, with due influence on the presentation of the results.

How to read. A fundamental difference between a conventional pairwise meta-analysis and a network meta-analysis is that a conventional pairwise meta-analysis yields only one pooled effect estimate: whereas bias in the effect estimate from any single trial affects a single pooled effect estimate in a conventional meta-analysis, it may affect several pooled effect estimates obtained in a network meta-analysis.

How to do. Where possible, a sensitivity analysis according to the quality of the included studies should be performed [25].

Examples. The recent paper by Bafeta et al. [26] showed that methodological assessment is often not reported in current NMAs, and this can deeply influence the results of the whole paper, while Dias et al. [27] showed that bias adjustment reduces both the estimated relative efficacy of the treatments and the extent of between-trial heterogeneity. Such an adjustment, for example, was obtained by Trinquart et al. [28] essentially through meta-regression.

• How to report outcomes: apart from the probability of being best (see further), results are mainly reported as risk ratios (RRs), odds ratios (ORs) and numbers needed to treat (NNTs).

How to read, how to do, examples. Relative risks (RRs) are defined as the ratio of incidence rates, and are thus used for dichotomous variables: RR = 1 indicates no difference in risk, RR < 1 a reduced risk in group 1 versus group 2, and RR > 1 an increased risk in group 1 versus group 2. The risk difference (RD), i.e., the absolute risk difference, is the difference between the incidence of events in the experimental versus control groups. The number needed to treat (NNT), defined as 1/RD, identifies the number of patients we need to treat with the experimental therapy to avoid one event, while the number needed to harm (NNH) similarly expresses the number of patients we have to treat with the experimental therapy to cause one adverse event. Odds ratios (ORs) are defined as the ratio of the odds (P/[1-P]) and are also used for dichotomous variables. When event rates are low, ORs are a good approximation of RRs.
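A short worked example may fix these definitions; the counts below are invented, and the snippet simply applies the formulas just given. Note how, with event rates of 15% and 25%, the OR (0.53) is already visibly smaller than the RR (0.60), confirming that ORs approximate RRs only when events are rare.

```python
def effect_measures(events_exp, n_exp, events_ctl, n_ctl):
    """Compute RR, OR, RD, and NNT from event counts in two arms (dichotomous outcome)."""
    r1, r2 = events_exp / n_exp, events_ctl / n_ctl
    rr = r1 / r2                                    # relative risk: ratio of incidence rates
    or_ = (r1 / (1 - r1)) / (r2 / (1 - r2))         # odds ratio
    rd = r1 - r2                                    # absolute risk difference
    nnt = 1 / abs(rd) if rd != 0 else float("inf")  # NNT (or NNH if the risk increases)
    return rr, or_, rd, nnt

# Hypothetical trial: 30/200 events on treatment vs. 50/200 on control
rr, or_, rd, nnt = effect_measures(30, 200, 50, 200)
print(f"RR = {rr:.2f}, OR = {or_:.2f}, RD = {rd:.2f}, NNT = {nnt:.0f}")
```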


Otherwise, some points differ markedly from those usually assessed in PWMA:

• Software: WinBUGS is the most widely used and validated package, while the main alternative is the GeMTC R package [29-30]. Given the difficulty of these software packages and of the underlying computation, close collaboration with a statistician with expertise in this field remains strongly recommended.

• Peculiar, that is Bayesian, statistics: the other major challenge for a physician reading an NMA is understanding the statistical basis of these works.

How to read. Mathematical statistics uses two major paradigms, the conventional (or frequentist) and the Bayesian. The latter offers a complete paradigm for both statistical inference and decision making under uncertainty. Bayesian methods make it possible to incorporate scientific hypotheses into the analysis (by means of the prior distribution) and may be applied to problems whose structure is too complex for conventional methods to handle. The Bayesian paradigm is based on an interpretation of probability as a rational, conditional measure of uncertainty, which closely matches the sense of the word "probability" in ordinary language. Statistical inference about a quantity of interest is described as the modification of the uncertainty about its value in the light of evidence, and Bayes' theorem precisely specifies how this modification should be made. The special situation, often met in scientific reporting and public decision making, in which the only acceptable information is that which may be deduced from the available documented data, is addressed by objective Bayesian methods as a particular case. The approach relies on the formal combination of an a priori probability distribution with a likelihood derived from the observed data, yielding a posterior probability distribution of the pooled effect. Two examples may help the reader. From a clinical point of view, the accuracy of the ergometric stress test in detecting coronary artery disease relies on a similar process: the pretest probability of coronary artery disease deeply influences the predictive value of the test itself [31]. For further details see the chapter on choosing the most appropriate statistical framework by Möller and colleagues.

How to do. More specifically, Markov chain Monte Carlo simulation was first developed, among others, by Enrico Fermi, who used it to predict the collisions and energy of neutrons given a set of prior observations, and it now represents one of the most widely used and validated methods. Again, the underlying concept is similar: previous observations, pooled together through dedicated mathematical models, allow one to investigate the effects of intervention A versus intervention B given a common comparator C [32-34].

Examples. Consider some football matches in the Italian championship. In the first match, Juventus may beat Inter 4:1; in the second, Inter may beat Milan 3:2; and Milan versus Juventus may end 1:0. Each of these results is entirely realistic, but taken together they are not coherent, and NMA offers a solution by pooling all the results within a single model informed by the prior distribution.
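The following toy sketch, using hypothetical numbers, a random-walk Metropolis sampler and nothing beyond the Python standard library, shows this Bayesian machinery in its simplest form: a sceptical prior on a log odds ratio is combined with a single observed estimate, and MCMC sampling recovers the posterior, which for this normal-normal case can be checked against the closed-form answer.

```python
import math, random

random.seed(1)

# Prior belief about a treatment effect (log OR): Normal(0, 1), a sceptical prior
# Observed pooled estimate from trials: y = -0.4 with standard error 0.2 (hypothetical)
prior_mean, prior_sd = 0.0, 1.0
y, se = -0.4, 0.2

def log_posterior(theta):
    # log prior + log likelihood (both Normal), up to an additive constant
    lp = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    ll = -0.5 * ((y - theta) / se) ** 2
    return lp + ll

# Random-walk Metropolis sampling from the posterior
theta, samples = 0.0, []
for i in range(20000):
    prop = theta + random.gauss(0, 0.3)
    if math.log(random.random()) < log_posterior(prop) - log_posterior(theta):
        theta = prop
    if i >= 5000:               # discard burn-in iterations
        samples.append(theta)

post_mean = sum(samples) / len(samples)
# Analytic posterior mean (valid here because prior_mean = 0)
analytic = (y / se**2) / (1 / prior_sd**2 + 1 / se**2)
print(f"Posterior mean log OR ~ {post_mean:.2f} (analytic value {analytic:.2f})")
```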
• Consistency or coherence: this is the agreement between the direct and indirect portions of the analysis.

How to read. One of the most crucial points is the appraisal of whether the available direct comparisons and the indirect ones point in the same direction of significance, which can strengthen the overall methodological relevance of the analysis. For example, for a comparison between treatments A and B, randomized clinical trials must have compared A and B head to head and both interventions against some common comparator, C. This is commonly referred to as a closed loop. Incoherence can exist only in closed loops, and its presence can be assessed by comparing the results of the direct and the indirect evidence informing the same comparison. Considering the apparent conceptual heterogeneity, in connection with the limited power to detect statistical heterogeneity and incoherence, Ioannidis et al. suggested caution when appraising the results of network meta-analyses [35]. The same authors, however, noted that while most readers focus on type II error (lack of power to detect heterogeneity/incoherence when evidence is sparse), type I error (false-positive detection of heterogeneity/incoherence) may also be present, especially when many tests are performed, as in a very complex network. In any case, most NMAs are fully complex models, so the absence of significant signals of incoherence does not fully exclude its presence, and an occasional nominally significant signal of incoherence may sometimes be a false positive.


A possible and feasible solution is a careful, combined consideration of clinical and statistical reasoning.

How to do. When statistical heterogeneity or incoherence is detected, several issues must be faced. Either the heterogeneity/incoherence can be explained by clinical or methodological reasons, often related to the primary end point [36], or the signal is a chance finding. If no explanation can be provided, a critical re-appraisal of the whole review process should be performed, from the clinical question, to the inclusion criteria, to the literature search. Random-effects meta-analysis models can accommodate unexplained heterogeneity in the available pairwise comparisons and often also make incoherence signals less prominent.

Examples. At the same time, however, the authors recognized that there is no evidence quantifying the lower reliability of NMAs affected by incoherence and/or heterogeneity. Moreover, Song et al. [37] demonstrated that a significant discrepancy between the direct and the adjusted indirect estimates was observed in only three of 44 comparisons.

• Geometry of the network: before starting an NMA, it is important to have a complete view of the distribution of the included studies. The network diagram provides an intuitive, symbolic representation of all the direct comparisons among treatments.

How to read. This graph consists of a set of nodes representing the interventions, linked by lines that describe the numbers of RCTs included for each comparison. Two important properties of the network configuration are geometry and asymmetry [38-39]. Geometry refers to the overall structure of the treatment contrasts, while asymmetry describes the amount of data available for a specific comparison. The network structure must be carefully built and examined so that each pattern of data may be used to reveal particular characteristics that may assist in the choice of the analytical method.

How to do. Networks are depicted in different ways, but one of the most consistent choices is the following: a continuous line represents a direct comparison, a dashed line represents treatments compared only indirectly, and the number of trials for each intervention is written next to the line or represented by enlarged circles.

Examples. In the previously cited paper from our group, currently under review (see Figure 2), the number of trials and of patients for each arm was described, and we added continuous and/or dashed lines. This also illustrates a possible bias mentioned earlier: as most of the drugs described are new, direct comparisons between them are lacking, and the direct comparisons all pit the new strategies against the commonly used one.

• P best, or ranking best: the probability that each treatment is the most effective out of all the treatments compared.

How to read. The aim is to estimate, for each treatment, the probability of its being the best. This is straightforward within a Bayesian framework and fairly easy in a frequentist setting (using resampling techniques).

How to do. Using the (posterior) distributions of all relative treatment effects, the researcher draws many random samples and determines which intervention outperforms the others in each sample. The number of times an intervention ranks first, out of the total number of random samples, gives the P(best); a minimal sketch follows.
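Here is a minimal resampling sketch of P(best), with invented posterior distributions standing in for real MCMC output: each treatment's effect versus a common comparator is drawn repeatedly, and the proportion of draws in which it ranks first is its probability of being best.

```python
import random

random.seed(42)

# Hypothetical posterior draws of each treatment's effect vs. a common comparator
# (log ORs; lower = better). In practice these would come from the MCMC output.
posteriors = {
    "A": lambda: random.gauss(-0.50, 0.20),
    "B": lambda: random.gauss(-0.35, 0.15),
    "C": lambda: random.gauss(-0.45, 0.30),
}

n_draws = 10000
best_counts = {t: 0 for t in posteriors}
for _ in range(n_draws):
    draws = {t: sample() for t, sample in posteriors.items()}
    best_counts[min(draws, key=draws.get)] += 1   # lowest log OR wins this draw

for t, c in best_counts.items():
    print(f"P(best) for {t}: {c / n_draws:.2f}")
```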


Examples. In the recent paper by Haas et al. [40], all treatments were superior to placebo and no treatment was superior to any other, but two of them had the highest probabilities of performing best.

Figure 2. Example of a network meta-analysis on antithrombotics. Pts=patients.

CONCLUSION

Network meta-analysis may be a very useful tool to compare different treatments or interventions for the same clinical problem, although it deserves accurate and critical evaluation before moving from evidence synthesis to action.

REFERENCES

[1] Nallamothu BK, Hayward RA, Bates ER. Beyond the randomized clinical trial: the role of effectiveness studies in evaluating cardiovascular therapies. Circulation, 2008; 118: 1294-303.
[2] Riegelman R. Studying a Study and Testing a Test: How to Read the Medical Evidence. Philadelphia: Lippincott Williams & Wilkins; 2000.
[3] Obama B. Modern health care for all Americans. N. Engl. J. Med., 2008; 359: 1537-41.
[4] D'Ascenzo F, Biondi-Zoccai G. Network meta-analyses: the "white whale" for cardiovascular specialists. J. Cardiothorac. Vasc. Anesth., 2014; 28: 169-73.
[5] Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. JAMA, 2005; 294: 218-28.
[6] Lamberts M, Gislason GH, Lip GY, Lassen JF, Bjerring Olesen J, Mikkelsen AP, Sørensen R, Køber L, Torp-Pedersen C, Hansen ML. Antiplatelet therapy for stable coronary artery disease in atrial fibrillation patients on oral anticoagulant: a nationwide cohort study. Circulation, 2014 Jan 27 [Epub ahead of print].
[7] Dewilde WJ, Oirbans T, Verheugt FW, Kelder JC, De Smet BJ, Herrman JP, Adriaenssens T, Vrolix M, Heestermans AA, Vis MM, Tijsen JG, van 't Hof AW, ten Berg JM; WOEST study investigators. Use of clopidogrel with or without aspirin in patients taking oral anticoagulant therapy and undergoing percutaneous coronary intervention: an open-label, randomised, controlled trial. Lancet, 2013; 381: 1107-15.
[8] Jones CW, Handler L, Crowell KE, Keil LG, Weaver MA, Platts-Mills TF. Non-publication of large randomized clinical trials: cross sectional analysis. BMJ, 2013; 347: f6104.
[9] NICE Guideline Development Methods: Reviewing and grading the evidence. Available at: http://www.nice.org.uk/niceMedia/pdf/GDM_Chapter7_0305.pdf (last accessed on February 4, 2014).
[10] Biondi-Zoccai G, D'Ascenzo F, Cannillo M, Welton NJ, Marra WG, Omedè P, Libertucci D, Fusaro E, Capriolo M, Perversi J, Fedele F, Frati G, Mancone M, DiNicolantonio JJ, Vizza CD, Moretti C, Gaita F. Choosing the best first line oral drug agent in patients with pulmonary hypertension: evidence from a network meta-analysis. Int. J. Cardiol., 2013; 168: 4336-8.
[11] Palmerini T, Biondi-Zoccai G, Della Riva D, Stettler C, Sangiorgi D, D'Ascenzo F, Kimura T, Briguori C, Sabatè M, Kim HS, De Waha A, Kedhi E, Smits PC, Kaiser C, Sardella G, Marullo A, Kirtane AJ, Leon MB, Stone GW. Stent thrombosis with drug-eluting and bare-metal stents: evidence from a comprehensive network meta-analysis. Lancet, 2012; 379: 1393-402.
[12] Li T, Puhan MA, Vedula SS, Singh S, Dickersin K; Ad Hoc Network Meta-analysis Methods Meeting Working Group. Network meta-analysis-highly attractive but more methodological research is needed. BMC Med., 2011; 9: 79.
[13] Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat. Med., 2004; 23: 3105-24.
[14] Higgins JP, Whitehead A. Borrowing strength from external trials in a meta-analysis. Stat. Med., 1996; 15: 2733-49.
[15] Lumley T. Network meta-analysis for indirect treatment comparisons. Stat. Med., 2002; 21: 2313-24.
[16] Cipriani A, Furukawa TA, Salanti G, Geddes JR, Higgins JP, Churchill R, Watanabe N, Nakagawa A, Omori IM, McGuire H, Tansella M, Barbui C. Comparative efficacy and acceptability of 12 new-generation antidepressants: a multiple-treatments meta-analysis. Lancet, 2009; 373: 746-58.
[17] Gartlehner G, Gaynes BN, Hansen RA, Thieda P, DeVeaugh-Geiss A, Krebs EE, Moore CG, Morgan L, Lohr KN. Comparative benefits and harms of second-generation antidepressants: background paper for the American College of Physicians. Ann. Intern. Med., 2008; 149: 734-50.
[18] Oldgren J, Wallentin L, Alexander JH, James S, Jönelid B, Steg G, Sundström J. New oral anticoagulants in addition to single or dual antiplatelet therapy after an acute coronary syndrome: a systematic review and meta-analysis. Eur. Heart J., 2013; 34: 1670-80.
[19] Biondi-Zoccai GG, Agostoni P, Abbate A, Testa L, Burzotta F. A simple hint to improve Robinson and Dickersin's highly sensitive PubMed search strategy for controlled clinical trials. Int. J. Epidemiol., 2005; 34: 224-5.
[20] Rising K, Bacchetti P, Bero L. Reporting bias in drug trials submitted to the Food and Drug Administration: review of publication and presentation. PLoS Med., 2008; 5: e217.


[21] Hart B, Lundh A, Bero L. Effect of reporting bias on meta-analyses of drug trials: reanalysis of meta-analyses. BMJ, 2012; 344: d7202.
[22] Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and statistical significance in meta-analysis: an empirical study of 125 meta-analyses. Stat. Med., 2000; 19: 1707-28.
[23] DiNicolantonio JJ, Lavie CJ, Fares H, Menezes AR, O'Keefe JH. Meta-analysis of carvedilol versus beta 1 selective beta-blockers (atenolol, bisoprolol, metoprolol, and nebivolol). Am. J. Cardiol., 2013; 111: 765-9.
[24] Chatterjee S, Biondi-Zoccai G, Abbate A, D'Ascenzo F, Castagno D, Van Tassell B, Mukherjee D, Lichstein E. Benefits of β blockers in patients with heart failure and reduced ejection fraction: network meta-analysis. BMJ, 2013; 346: f55.
[25] The Cochrane Library. Available at: http://www.thecochranelibrary.com/view/0/index.html (last accessed on February 3, 2014).
[26] Bafeta A, Trinquart L, Seror R, Ravaud P. Analysis of the systematic reviews process in reports of network meta-analyses: methodological systematic review. BMJ, 2013; 347: f3675.
[27] Dias S, Welton NJ, Marinho V, Salanti G, Higgins JP, Ades AE. Estimation and adjustment of bias in randomized evidence by using mixed treatment comparison meta-analysis. J. R. Stat. Soc., 2010; 173: 613-29.
[28] Trinquart L, Chatellier G, Ravaud P. Adjustment for reporting bias in network meta-analysis of antidepressant trials. BMC Med. Res. Methodol., 2012; 12: 150.
[29] WinBUGS. Available at: http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml (last accessed on February 4, 2014).
[30] GeMTC R Package. Available at: http://drugis.org/software/r-packages/gemtc (last accessed on February 3, 2014).
[31] Gibbons RJ, Balady GJ, Bricker JT, Chaitman BR, Fletcher GF, Froelicher VF, Mark DB, McCallister BD, Mooss AN, O'Reilly MG, Winters WL, Gibbons RJ, Antman EM, Alpert JS, Faxon DP, Fuster V, Gregoratos G, Hiratzka LF, Jacobs AK, Russell RO, Smith SC; American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Committee to Update the 1997 Exercise Testing Guidelines. ACC/AHA 2002 guideline update for exercise testing: summary article. A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Committee to Update the 1997 Exercise Testing Guidelines). J. Am. Coll. Cardiol., 2002; 40: 1531-40.
[32] Andrieu C. An introduction to MCMC for machine learning. Mach. Learn., 2003; 50: 5-43.
[33] Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, Lee K, Boersma C, Annemans L, Cappelleri JC. Interpreting indirect treatment comparisons and network meta-analysis for health-care decision making: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 1. Value Health, 2011; 14: 417-28.
[34] Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis: a comparative study. Stat. Med., 1995; 14: 2685-99.
[35] Mills EJ, Thorlund K, Ioannidis JP. Demystifying trial networks and network meta-analysis. BMJ, 2013; 346: f2941.


[36] Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ, 2003; 327: 557-60.
[37] Song F, Altman DG, Glenny AM, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ, 2003; 326: 472.
[38] Salanti G, Higgins JP, Ades AE, Ioannidis JP. Evaluation of networks of randomized trials. Stat. Methods Med. Res., 2008; 17: 279-301.
[39] Salanti G, Kavvoura FK, Ioannidis JP. Exploring the geometry of treatment networks. Ann. Intern. Med., 2008; 148: 544-53.
[40] Haas DM, Caldwell DM, Kirkpatrick P, McIntosh JJ, Welton NJ. Tocolytic therapy for preterm delivery: systematic review and network meta-analysis. BMJ, 2012; 345: e6226.


In: Network Meta-Analysis Editor: Giuseppe Biondi-Zoccai

ISBN: 978-1-63321-001-1 © 2014 Nova Science Publishers, Inc.

Chapter 22

THE FUTURE OF NETWORK META-ANALYSIS: TOWARD ACCESSIBILITY AND INTEGRATION

Matthew A. Silva, Pharm.D., R.Ph., B.C.P.S.1 and Gert van Valkenhoef, M.Sc., Ph.D.2

1Professor of Pharmacy Practice, MCPHS University, Worcester, MA, US
2Researcher, Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands

ABSTRACT Network meta-analysis is a demanding type of evidence synthesis that builds on traditional pair-wise meta-analysis. As such, it shares many of the same inefficiencies of the systematic review process. Network meta-analysis itself is also time-consuming and, until recently, difficult as specific software for network meta-analysis did not exist. Recent developments in software and several high-quality tutorials have made Bayesian network meta-analysis much more accessible. Advances in machine learning and more powerful, intuitive software will soon emerge to automate or semi-automate parts of the research process. The development of a fully integrated software toolchain is one of many important developments designed to coordinate the workflow, provide a robust set of analytic tools and facilitate report production. Ultimately, better methods, tools, applications, and technical support will make network meta-analysis methods much more accessible. This is especially important as systematic reviews are moving from static reports to regularly updated online living systematic reviews.

Keywords: Bayesian methods, living reviews, network meta-analysis, software, systematic review, toolchain



Corresponding author: Matthew A. Silva, PharmD, RPh, BCPS; Professor of Pharmacy Practice, MCPHS University, 19 Foster Street, Worcester, MA 01608, USA. Email: [email protected].


INTRODUCTION

The field of meta-analysis has rapidly evolved since the mid-1990s to allow the synthesis of more complex evidence sources [1]. Network meta-analysis is one of the most important developments in evidence synthesis because these methods enable simultaneous comparisons of many treatments using networks of randomized controlled trials. This allows decision-makers to base decisions on consistent estimates of the treatments' relative effects and to explore a comprehensive evaluation of indirect treatment effects, which is not readily possible with pair-wise meta-analysis. As a result, the process of evidence synthesis is becoming broader and more inclusive as the volume of published research expands. Methodological developments will continue apace and will allow meta-analyses to be even more inclusive, for example by enabling the synthesis of similar outcomes measured on different scales, of correlated outcomes, of various patterns of missing data, and of datasets with both aggregated and individual patient data. In addition, empirical research on the typical level of heterogeneity [2] and the appropriate scale of measurement [3] in network meta-analysis, as well as other meta-epidemiological research [4], will prove a valuable resource for practitioners of meta-analysis and an inspiration for further methodological development. The analysis of heterogeneity and inconsistency remains a challenging topic, and both the conceptual framework and the tools in this area require further research. Network meta-regression, which allows systematic reviewers to explore correlations and linear relationships between intervention effects and covariates [5], will increasingly be used to explain (rather than simply detect) heterogeneity and inconsistency in evidence networks, although obtaining sufficient data to do this is often challenging. The core methods of network meta-analysis are now mature [6], yet the quality of published network meta-analyses leaves much to be desired, both in terms of the systematic review methods [7] and the statistical methods [8], especially the analysis of heterogeneity and inconsistency. Reporting guidelines, tutorials, textbooks, and specialized software will help raise the quality of future network meta-analyses. It is especially important that authors and reviewers are aware of the importance of the balance of effect modifiers in meta-analysis, and of its relation to heterogeneity and inconsistency [9]. Important though these improvements in methodology and awareness of methodological issues will be, they will be eclipsed by the implications network meta-analysis has for the production and consumption of systematic reviews. These implications will be the focus for the remainder of this chapter. First, we discuss how the challenging nature of statistical methods for network meta-analysis creates a need for specialized software that enables a broad audience of researchers to apply the methods correctly. Then, we describe how network meta-analysis plainly shows the inefficiencies of the current systematic review process, and how they can be addressed, culminating in the concept of "living reviews". Afterwards, we discuss the current trend towards greater disclosure of clinical trials data, and how it reflects on the efficiency of systematic review. Finally, we deal with the startling amount of results presented in network meta-analyses, and the need to embed network meta-analyses in a broader decision making framework. We conclude with a brief summary.


THE NEED FOR USER-FRIENDLY SOFTWARE

The statistical methods for network meta-analysis are relatively complex, and often require some proficiency in statistical programming to adapt example code to the data at hand [10]. This limits the number of researchers capable of performing network meta-analysis, and creates the opportunity to introduce errors by incorrectly adapting the existing code, either because it is poorly understood or because of simple oversights. These are challenges to the validity of network meta-analyses, especially when the underlying code and data are not made available for verification [11]. Methods for the analysis of heterogeneity and inconsistency are especially difficult to implement correctly, and this may partially explain why they are rarely discussed in published applications of meta-analysis. To quickly build a high-quality network meta-analysis, researchers require specialized software that is powerful and easy to learn, and preferably intuitive. Researchers may also choose software tools according to the scope of the project, the size and scalability of their workgroups, and the available funding. Fortunately, software developers and statistical programmers have already begun work to support evidence-based think-tanks and researchers by creating these tools. Currently, network meta-analysis methodology is ahead of software applications, and considerable lead-time and effort are required of developers to transform the methods and theoretical models into working software applications. There is also considerable time required to validate applications, fix errors or bugs, establish a distribution plan and build a user interface. Moreover, while excellent methodology papers, tutorials and checklists exist in the public domain, good documentation and practical examples are vital parts of usable software, especially in an area as complex as evidence synthesis. Pairwise meta-analysis methods are well established [12-14], and so are methods for evaluating direct and indirect treatment comparisons [15-17]. Markov chain Monte Carlo simulation is a powerful and general method for statistical estimation, and is widely used for Bayesian modeling in network meta-analysis [18-25]. Some software tools for network meta-analysis require moderate to extensive software and coding experience to prepare data for basic analysis, and significant experience is required to adjust models [18-28]. With increasing model complexity, the need for coding experience increases. Most clinicians and public health professionals do not have experience with statistical software or coding, and spend most of their time practicing and researching within their primary disciplines. Most research groups do not have ready access to a programmer familiar with traditional and Bayesian statistics, but they are familiar with study quality and trial design, can evaluate research according to methodology and patient characteristics, and can extract data for meta-analysis. Software that automatically builds and estimates network meta-analysis models is now available [10]. One software tool in particular has made traditional meta-analysis, network meta-analysis and benefit-risk analysis readily accessible without extensive training in statistical software, and is also the first to automate some of these processes [29].
The Java-based ADDIS software package was developed by a team of researchers at the University Medical Center Groningen and the University of Groningen and is freely available to anyone with a desktop or laptop computer [29]. The package allows for the input of study features including methods, randomization and blinding, along with summary data including patient demographics, clinical outcomes and adverse events experienced during treatment or follow-up.


In traditional meta-analysis software, data are usually extracted into tables intended to be used directly for analysis. By contrast, ADDIS attempts to capture the design of the included clinical trials in order to allow more flexible re-analysis of the extracted data [29]. The ambition to capture this information in order to speed up systematic review and meta-analysis is not new, and whether ADDIS succeeds in this objective remains to be evaluated [30]. The fact that ADDIS is specifically structured to capture clinical trial information also helps improve data quality and reduce the time required for validation compared to more ad-hoc solutions for data management. This is an example of how thoughtful software design improves the quality of the work used to make decisions. ADDIS can perform pair-wise meta-analysis on various scales, and presents the results as forest plots that also include heterogeneity statistics. The interpretation of network meta-analyses is facilitated by network graphs, tables of relative effect estimates, rank probability charts, and models for detecting inconsistency. Importantly, each analysis is linked to the included studies, providing the user with valuable information on each study's characteristics. ADDIS uses the Bayesian method for network meta-analysis; convergence is assessed and graphically plotted using established methods, and the number of tuning and simulation iterations may be extended as required [25]. ADDIS can also generate and export code for further modification and analysis in BUGS or JAGS [18, 26]. Familiarity with general, biomedical and Bayesian statistical concepts is assumed, and later required to interpret the results and output generated in ADDIS, though extensive knowledge of statistical coding is not needed to build any of the core analyses. This is a pioneering development in the availability and accessibility of powerful tools that enable individuals to undertake large-scale evidence synthesis to inform their practice decisions. It is important to consider that tools are in constant development to keep pace with newer insights and improved methods. For example, an interactive three-dimensional network graph could allow the user to navigate the network structure and visually inspect its characteristics. Networks could be drawn and scaled proportionally to show effect sizes, weight of evidence, or precision. Also, network geometry could be automatically updated to show the flow of evidence and highlight areas of inconsistency between direct and indirect evidence [22, 31, 32]. Additionally, it may be possible to automate and visually evaluate symmetry, diversity and co-occurrence as the network ecology evolves, such as when new trials add to the totality of evidence or when known trials extend their follow-up periods [22]. Software should enable users to easily perform meta-analysis on different scales of measurement (e.g. odds ratio or risk ratio) and to study how this affects heterogeneity and inconsistency [3]. It may also be useful to evaluate study precision and the potential for bias in a given network [33]. Network evaluation methods and familiarity with viable treatments may help identify and plan new trials that could enhance closed loops in a network [1, 22, 34].
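As a sketch of the kind of network bookkeeping such software automates, the following few lines (a hypothetical four-treatment network, standard library only) enumerate the closed loops, in which direct and indirect evidence can be contrasted, and the comparisons informed only indirectly.

```python
from itertools import combinations

# Hypothetical evidence network: each entry is (treatment 1, treatment 2, number of RCTs)
direct_comparisons = [("placebo", "A", 5), ("placebo", "B", 3),
                      ("placebo", "C", 2), ("A", "B", 1)]

# Adjacency structure of the network graph
edges = {frozenset((t1, t2)) for t1, t2, _ in direct_comparisons}
nodes = {t for e in edges for t in e}

# Closed loops of three treatments: where direct and indirect evidence on the
# same comparison coexist, so incoherence can be assessed
loops = [trio for trio in combinations(sorted(nodes), 3)
         if all(frozenset(pair) in edges for pair in combinations(trio, 2))]
print("Closed loops:", loops)

# Comparisons informed only indirectly (no connecting trials)
indirect_only = [pair for pair in combinations(sorted(nodes), 2)
                 if frozenset(pair) not in edges]
print("Indirect-only comparisons:", indirect_only)
```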

OPTIMIZING THE SYSTEMATIC REVIEW PIPELINE

Systematic reviews aim to comprehensively capture the current state of evidence on a specific topic. Missing relevant evidence, or misrepresenting identified evidence through mistakes in interpretation or data extraction, is therefore extremely costly and can greatly diminish the overall value of a systematic review.


To address this, guidance and guidelines for the conduct of systematic reviews, such as the Cochrane Handbook for systematic reviews [35] and the PRISMA statement (which supersedes the QUOROM guideline for systematic reviews) [36], have become more demanding over the years. Similarly, guidance and checklists for the proper implementation of network meta-analysis call for careful and thorough investigation of the evidence network [6, 37-44]. It was previously estimated that a well-conducted systematic review with meta-analysis costs between 1,000 and 2,000 person-hours, or up to a year of full-time work [45]. The increasing demands placed on the process of systematic review have no doubt increased this further, and the number of trials published per year requiring consideration is growing [46]. Network meta-analysis, however, is truly demanding: where a typical systematic review with pair-wise meta-analysis includes at most several tens of trials, comprehensive network meta-analyses may include hundreds [47-48]. The investment in terms of person-years and funding required to produce such reviews is prohibitive, and raises the question whether the systematic review enterprise will remain sustainable as the number of published trials and the inclusiveness of reviews increase further. Why is performing a systematic review so time-consuming? Part of the effort may be unavoidable, because producing a good systematic review and meta-analysis is inherently difficult, and requires making many decisions about the questions to ask, the evidence to include, and the models to estimate. However, there are also many inefficiencies in workflow and task management, as well as in data management. Users currently rely on a mix of unrelated and unconnected tools, either paper-based or software, resulting in incompatibility and fragmentation of data, which leads to considerable effort spent on technical tasks or busywork. For example, citation results from multiple search engines (PubMed, Embase, clinicaltrials.gov) require cross-referencing to eliminate duplicate reports. Once a collection of citations is finalized, they must be retrieved, read and scored for quality. Researchers then retrieve the original full-text articles, separate addenda, appendices, supplemental data figures or tables, and even then important information may be missing. The retrieval process can take weeks depending on personal or institutional access to published reports and available funding for reprints. Large or complex subject areas lead to lengthy reading lists. Reading full reports, with close attention to patient characteristics, methods (particularly randomization, treatment allocation, allocation concealment and potential sources of bias) and outcome reporting, is time-consuming and sometimes demanding. New citations frequently appear while the systematic review is underway, and these must be added to stay current. Selection of key articles and exclusion of others requires some experience, and sometimes clinical judgment, to meet prespecified review objectives. It is also up to researchers to organize and store the information generated during the screening phase. This is often done using an ad-hoc collection of paper-based forms, spreadsheets, electronic documents, and databases. Researchers may store qualitative and quantitative information in different locations, leading to fragmentation and disorganization during the preparation and reporting stages.
Similarly, the manual extraction of data and the preparation of data for input into a relational database and related analytic tools are subject to human error, disorganization and fragmentation, which are made more likely by ad-hoc tools. Moreover, transferring data between the tools used for the various steps of a systematic review is often non-trivial and may itself be a source of errors. In the following, we describe how information technology may help reduce the effort spent on the more tedious steps of the systematic review process and how it can increase efficiency and traceability by offering a more tightly integrated toolchain.


Then, we discuss how researchers could leverage such technology to create reviews that others can build on, finally leading to the concept of living reviews.

Assisted Publication Screening and Data Extraction

Accurate publication screening and data extraction are critical to systematic reviewing. It is therefore unlikely that evidence synthesis can be completely automated. However, well-designed software tools can reduce the workload significantly [49]. For example, the Abstrackr system reduces the literature screening workload in two main ways [50]. First, it provides an integrated environment in which citations can be screened, helping users to tag citations, record their exclusion decisions, and resolve inter-rater disagreements. Second, it uses state-of-the-art machine learning techniques to prioritize citations for screening based on previous exclusion decisions made by the user, allowing reductions in workload of 50% or more without affecting quality [51]. Assessment of full-text articles as well as abstracts can be made more efficient using validated methods for extracting important trial features including randomization, blinding, allocation concealment procedures, statistical methods, eligibility, treatments, and intervention parameters, as well as sample size, dates, endpoints, outcomes, funding sources, authors, and journal details. Methods that identify these characteristics with high accuracy have already been developed [52-54] but have not yet been integrated into tools for end-users. These methods have the potential to greatly enhance the efficiency of systematic review, but further development of end-user applications is required before they will be accepted and used in the everyday practice of systematic reviewing.

Integrating the Pipeline

As previously discussed, systematic reviewing consists of many different steps, which are often performed using different tools. Reporting alone is often done with a combination of word processing, spreadsheet, and graphics software. Reviewers therefore expect to be copying data between programs and to spend considerable time on reformatting tasks. Hence, the toolkit for systematic review and meta-analysis is fragmented, leading to inefficiency, lack of reproducibility, and possibly data loss. Systematic reviewers would therefore benefit from a more integrated toolchain that captures a larger part of the "systematic review pipeline": the entire process from inception, through scoping, literature search, citation screening, full-text screening, and data extraction, to meta-analysis and reporting. For example, the Cochrane Collaboration's RevMan software [55] has moved in this direction by offering users a single tool in which to document full-text screening decisions, perform data extraction and meta-analysis, and, simultaneously, write the report. Abstrackr [50] provides another piece of the puzzle by offering a single tool in which the citation screening process takes place, and the Systematic Review Data Repository does something similar for data extraction [56]. However, important pieces of the pipeline are still missing, and the full benefits for efficiency, transparency, and reproducibility can only be achieved when the toolchain is fully integrated, capturing all information generated in the systematic review process. In such a system, researchers would use integrated search functions to query multiple databases and return linked data.


Assisted title and abstract screening, with procedures to remove duplicates, will improve the efficiency of the initial search. Assisted screening of the full-text publications will remove any additional duplicate or redundant data. Data could be extracted with the help of algorithms that detect data tables and figures, and then automatically transferred to tools for (network) meta-analysis. Moreover, the report writing process could be improved through shared editorial and reviewer tools with tracking and archiving functions for collaborative writing. Tracking and archiving capabilities support an open and transparent process when preparing data to be shared or submitted for peer review. Network meta-analysis can and should now be embedded in the systematic review workflow of all groups undertaking a systematic review, to ensure a comprehensive analysis. Network meta-analysis theory and methodology are far ahead of the available software applications and data-analysis tools. Methodology for direct and indirect evidence synthesis, meta-regression, predictive modeling, and the evaluation of publication bias, evidence flows, and network meta-epidemiology has been developed. The challenge for an integrated toolchain for systematic review is thus to be comprehensive enough to capture the entire workflow, yet flexible enough to allow innovative synthesis methods to be applied, even as they are still being developed. The tools for this are already emerging, allowing existing statistical software to be embedded in web-based applications. For example, RApache (http://rapache.net/) and RStudio Shiny (http://www.rstudio.com/shiny/) enable building interactive web applications that integrate R statistical analyses, and several cloud providers offer on-demand computing using R. Clearly, many pieces of the anticipated integrated toolchain for systematic review and meta-analysis have already been developed. We hope that these tools will converge to offer a fully integrated experience, eliminating needless busywork and allowing researchers to make full use of the machine learning tools that have already been developed.

Collaboration and Data Re-Use

The current tools for creating systematic reviews are self-limiting: although the research process can take years, it results in a static document that may or may not be updated with new evidence [57]. While assisted data extraction is essential, there remains a need for review of extracted data and reports by researchers with clinical experience. The decisions made and the data extracted are some of the most important interim products of a systematic review. However, these interim data are rarely published, and future reviewers do not benefit from this work having already been performed. Moreover, peer review of extracted data promotes quality and collaboration, and saving interim process steps promotes transparency. Researchers currently working on systematic reviews may choose to collaborate and can work toward shared objectives; for many, however, there is no clear incentive to share previously compiled datasets. Of course, some choose to share as much data as possible, but owing to the ad-hoc manner in which systematic review data are often collected, this is not easy. In addition, sharing too much, or too early, may help competing teams to publish their review earlier. This means that time-consuming steps such as citation screening and data extraction are performed time and again by various research teams. The overall enterprise of systematic reviewing would benefit greatly if interim steps were easier to share and if the benefits of sharing outweighed the risks. A fully integrated toolchain for systematic review, as discussed in the previous section, would go a long way in addressing this challenge.

Complimentary Contributor Copy

382

Matthew A. Silva and Gert van Valkenhoef

discussed in the previous section, would go a long way in addressing this challenge. However, the culture of protecting private databases is heavily entrenched, and is for now difficult to counter in a ―publish or perish‖ research climate. However, the scientific enterprise as a whole is moving towards greater transparency and open access, and as more and more data become freely available the benefits of keeping a private database diminish. It is especially encouraging that the Cochrane Collaboration is moving towards open access for their library of systematic reviews [58]. Collaboration, data sharing and process transparencies are key features when moving research out of respective silos and into a shared workspace. User authentication is essential for building an open platform designed for reproducibility and quality. Fortunately, the Open Researcher and Contributor ID (ORCID, http://orcid.org/) has recently started offering identification, authentication, and institutional information services for researchers, with broad buy-in from publishers, academic institutions, and companies. This should help track an individual‘s contributions across multiple systems, and will enable clearer attribution of valuable data extraction work, independent of publications.
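As a small illustration of what such identity services enable, the sketch below retrieves a researcher's public ORCID record from R. The v3.0 public endpoint, the use of the httr and jsonlite packages, and the field accessed at the end are assumptions for the example (consult the current ORCID API documentation); the iD shown is the example identifier used in ORCID's own documentation.

    library(httr)
    library(jsonlite)

    orcid <- "0000-0002-1825-0097"  # example iD from ORCID's documentation
    # Assumed endpoint: the record resource of the ORCID public API.
    resp <- GET(paste0("https://pub.orcid.org/v3.0/", orcid, "/record"),
                add_headers(Accept = "application/json"))
    stop_for_status(resp)
    record <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
    record$person$name  # the researcher's registered name (assumed JSON path)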

Living Reviews

Perhaps the most important development for the systematic review process is the migration to the living systematic review [59]. Because the available evidence becomes outdated so rapidly, in many cases before a review has even cleared peer review, online living reviews offer a methodology for keeping up with new evidence. By scheduling regular updates to an existing systematic review, living systematic reviews require moderate amounts of regularly scheduled work rather than a great amount of work once. With the right software capabilities, users could even study how our best estimates of effect sizes change as new evidence becomes available [60]. The developments discussed previously, namely advances in machine learning for the automation or semi-automation of routine tasks, integrated toolchains designed to manage complete workflows, and the routine sharing of the interim products of a systematic review, would greatly benefit the effort to create living reviews, and vice versa. Moreover, peer review may be a challenging aspect of a fast-moving living systematic review process, and an integrated online toolchain would also facilitate the ongoing peer review of these works.
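The idea of watching effect estimates evolve as evidence accumulates [60] can already be prototyped as a cumulative meta-analysis. A minimal sketch with the R meta package follows; the four trials and their counts are hypothetical placeholders.

    library(meta)  # general-purpose meta-analysis package for R

    # Hypothetical binary-outcome trials, one row per study.
    d <- data.frame(
      study = c("Trial A", "Trial B", "Trial C", "Trial D"),
      year  = c(2005, 2008, 2011, 2013),
      ev.t  = c(12, 30, 25, 40), n.t = c(100, 150, 120, 200),
      ev.c  = c(20, 45, 35, 55), n.c = c(100, 150, 120, 200)
    )
    d <- d[order(d$year), ]  # accumulate evidence in publication order

    m <- metabin(ev.t, n.t, ev.c, n.c, data = d,
                 studlab = paste(study, year), sm = "OR")

    # Re-pool after each successive study: each row of the output is the
    # estimate a living review would have reported at that point in time.
    metacum(m, pooled = "random")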

TRENDS IN THE DISCLOSURE OF TRIAL DATA

Publication bias likely affects every systematic review and meta-analysis published to date. The published evidence is usually highly selective: it tends to reflect favorably on the funding source or the researchers' objectives, or forms part of a marketing strategy to promote a treatment [61-66]. It is therefore reasonable to expect that most published evidence, and older systematic reviews in particular, over-estimate treatment effect sizes, especially when they include studies published early in a drug's market cycle, when effect sizes are unstable [60]. While we have methods to evaluate publication bias in pairwise meta-analysis, detecting publication bias in network meta-analysis is more difficult [67-73]. The methods to handle the problem of publication bias are well described in these key papers and elsewhere in this book.

The practical management of publication bias requires collaborative effort from all actors and agencies, with a mutual understanding of, and commitment to, publishing null findings, which are essential for a complete body of evidence. In most countries, there are no consequences for failing to report full trial results, including null or negative findings, to the public domain. Furthermore, commercial protection under the law may be applied broadly, allowing any information (including trial data) considered commercially sensitive to remain unpublished. Research data are especially complicated because of the combination of ethical and legal safeguards: researchers are obliged to protect patient-level data, and the law protects market privileges once a product reaches the market. The practice of delaying publication or withholding information hampers the systematic review process and contributes to publication bias favoring an intervention. These practices are wasteful and not in the interest of the human subjects who volunteered to participate in the research [74-79]. Consequently, most inconclusive or dead-end research is never published, which also results in needless duplication of trials. Overall, approximately half of all trials conducted (whether industry, government, or privately funded) remain unpublished, and trials with positive findings are twice as likely to be published [80-82]. Publishers and their affiliated peer reviewers may unknowingly contribute to the problem of fragmented data by declining to publish null or negative findings, or by publishing these data in obscure supplements rather than the main publication.

The prospective registration of clinical trials was proposed as a solution to publication bias as early as 1986 [67]. Eleven years later, in 1997, the United States became the first country to introduce mandatory registration of clinical trials, leading to the establishment of the ClinicalTrials.gov registry in 2000 [83]. Many countries have since adopted similar legislation, further supported by statements from the International Committee of Medical Journal Editors (ICMJE) and the World Health Organization (WHO) [84]. Some government registries require the reporting of summary data, but enforcing legislation on complete reporting has proven difficult [81]. Moreover, researchers and organizations may fulfill their minimal obligations by reporting data summaries to trial registries while selectively choosing not to publish full datasets in manuscript form in academic or medical journals. Because of such selective reporting, the effect sizes of known treatments may appear larger than they truly are in the absence of null or negative findings. Additionally, delayed publication practices that extend a market advantage contribute to an incomplete understanding of the true effect size. The result is a general failure to inform providers, patients, and insurers about treatments that are costly, whether by direct or indirect valuation. The European Medicines Agency and the AllTrials initiative are urging all researchers to release complete trial results into the public domain, to reduce waste and promote transparency by allowing users to review original research protocols and complete datasets, including treatment-associated harm [85]. Capturing this previously unreleased information and recompiling systematic reviews may lead to different conclusions about a treatment's effectiveness. In the meantime, researchers could improve transparency by moving previously unpublished reports and manuscripts out of archives and into the public domain, at least in electronic format; it can be argued that not doing so violates the trust of the patients who participated in the trials [74]. Since those critical of researcher-industry relationships question research integrity, a policy of openness and transparency, with a commitment to sharing strategically archived and de-identified data, can only improve patient perceptions of the relationships between industry and the clinicians who enroll patients in clinical trials [86-89].

Patient-level data are the most valuable data for meta-analysis or network meta-analysis, and collaborations that share patient-level data from original trials, or that use large registries, produce useful research. Pooling of patient-level data is ideal when the available trials raise questions about the impact of covariates that cannot be addressed with aggregate-level data, and it has the potential to greatly modify existing evidence networks [1]. However, making patient-level data publicly available is difficult: privacy concerns, legal protections, and institutional safeguards currently limit sharing. All patient identifiers must be redacted, and even then it may be possible to re-identify some patients by linking multiple data sources. Because access to patient-level data is highly desirable for systematic reviews and meta-analyses, further work is required on how sensitive data can be shared without compromising patient privacy [90]. Initiatives such as DataSHIELD, which aim to take the analysis to the individual patient data rather than take the individual patient data to the analysis, may be a workable compromise [91]. Although publication bias and delayed disclosure remain problematic, recent developments in the registration of trials and their results are starting to address the problem. In addition, the registration of clinical trials is creating more structured datasets than previously existed, creating the opportunity for these data to be extracted automatically for use in meta-analyses [29]. This development will reduce the added value of private databases, and may accelerate the move towards more collaboration and data sharing by systematic reviewers.
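To illustrate the "take the analysis to the data" principle behind DataSHIELD [91], the toy sketch below pools a treatment effect across three sites while exchanging only non-disclosive summary statistics. This is a deliberately simplified illustration of the idea, not DataSHIELD's actual interface or algorithm, and all counts are invented.

    # Each site computes 2x2 summaries locally; only these leave the site.
    site_summary <- function(events, n) {
      list(ev.t = events[1], n.t = n[1], ev.c = events[2], n.c = n[2])
    }

    # Pretend these summaries were returned by three remote sites.
    summaries <- list(
      site_summary(c(15, 25), c(120, 118)),
      site_summary(c(22, 30), c(200, 205)),
      site_summary(c(9, 14), c(80, 82))
    )

    # Central fixed-effect (inverse-variance) pooling of log odds ratios.
    log_or <- sapply(summaries, function(s)
      log((s$ev.t / (s$n.t - s$ev.t)) / (s$ev.c / (s$n.c - s$ev.c))))
    var_log_or <- sapply(summaries, function(s)
      1 / s$ev.t + 1 / (s$n.t - s$ev.t) + 1 / s$ev.c + 1 / (s$n.c - s$ev.c))
    w <- 1 / var_log_or
    exp(sum(w * log_or) / sum(w))  # pooled OR; no patient-level data shared

The real DataSHIELD framework generalizes this pattern to iterative model fitting, but the privacy logic is the same: only aggregate statistics ever cross institutional boundaries.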

BEYOND NETWORK META-ANALYSIS

Network meta-analysis has expanded the scope of systematic reviews significantly, making the task of summarizing the results increasingly challenging. Presenting a network meta-analysis of even a single outcome is difficult, because it involves presenting the underlying data; a potentially large number of correlated effect estimates; treatment rankings; and information on heterogeneity, inconsistency, and covariates. Making matters more complicated, clinicians are routinely interested in multiple outcomes. This raises the question of how to present the balance of effects across outcomes, and whether the journal article is the best format in which to present such a complex collection of results.

To address this issue, systematic reviewers should carefully consider the role of systematic review and meta-analysis in the broader decision-making context. For example, a systematic review of the beneficial and harmful effects of antidepressants might be most appropriately summarized using a multiple criteria benefit-risk model in which trade-offs between efficacy and side effects are made explicit [92]. Such multiple criteria decision analysis (MCDA) models further structure the domain by identifying the outcomes of interest. Each outcome is placed on a scale on which the worst possible (or plausible) value is assigned a utility of zero and the best possible (or plausible) value is assigned a utility of one [93]. Utilities of in-between values are often assumed to be linear, but can also be elicited from a clinical expert. This enables "scale swings", changes from worst to best on each of the outcomes, to be compared for their relative attractiveness, resulting in a weighting. By integrating uncertain evidence across multiple outcomes, a multiple criteria decision analysis can help clarify the clinical implications of the systematic review, and identify how patient and decision maker preferences determine the best treatment.

Other systematic reviews could use the pooled direct and indirect evidence to build decision-analytic models and cost-effectiveness analyses [94]. The results of the evidence synthesis would inform how each treatment modifies the transition probabilities in a disease state model; the disease state model, in turn, would predict how the treatments affect patients' overall survival and/or quality of life [95]. This could serve as a summary of the overall evidence or, coupled with cost information, enable the results of the evidence synthesis to be presented as an overall cost-effectiveness acceptability curve. However, such models must be tailored to a specific question, health system, government, or payer, and reviewers cannot anticipate all the different ways in which a systematic review might be used. It is therefore important that systematic reviews are not available only as static text-based reports, but also enable clinicians, patients, and decision makers to interact with the results and adapt them to their specific scenario. This can be achieved in many ways: the underlying clinical trial data can be made publicly available in a machine-readable format, the full set of effect estimates and their correlations can be made available in an online supplement to allow further modeling, and interactive decision models can be provided so that other researchers or patients can input their own preferences. This would further enhance the already substantial value that systematic reviews provide for the decision-making process.
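As a concrete illustration of the utility scaling and swing weighting described above, the following sketch scores three hypothetical treatments on two outcomes with linear partial utilities; every number, including the weights, is invented for the example.

    # Hypothetical point estimates: probability of response (higher is
    # better) and probability of a disabling adverse event (lower is better).
    treatments <- data.frame(
      drug     = c("Drug A", "Drug B", "Drug C"),
      response = c(0.55, 0.48, 0.62),
      harm     = c(0.10, 0.04, 0.15)
    )

    # Linear partial utility: 0 at the worst plausible value, 1 at the best.
    partial_utility <- function(x, worst, best) (x - worst) / (best - worst)

    u_response <- partial_utility(treatments$response, worst = 0.30, best = 0.70)
    u_harm     <- partial_utility(treatments$harm,     worst = 0.20, best = 0.00)

    # Swing weights express how attractive a worst-to-best swing on each
    # outcome is relative to the others (here: efficacy 0.6, safety 0.4).
    w <- c(response = 0.6, harm = 0.4)

    treatments$score <- w["response"] * u_response + w["harm"] * u_harm
    treatments[order(-treatments$score), ]  # rank by overall utility

In a full benefit-risk analysis, the point estimates would be replaced by draws from the joint posterior of the network meta-analysis, propagating uncertainty into the overall scores [92].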

CONCLUSION

The future of network meta-analysis will bring the development of specialized software to make it accessible to a broader audience, and the integration of this software into the broader systematic review toolchain. The systematic review process itself is evolving rapidly, with powerful new applications that will exploit advances in machine learning to automate the workflow of living systematic reviews. High-quality open-access tools, automated and semi-automated functionality for citation screening, data extraction, and model specification, and coordinated task and workflow management systems are in the near-term future, and will help reviewers transform the systematic review process. Hopefully, systematic reviews will also become more transparent, offering users access to the underlying trial data and interim products rather than just the final report. Overall, today's changes, which are simplifying network meta-analysis and making it more accessible, will be part of larger, systematic improvements in the review process that will put evidence synthesis into practice sooner, to the benefit of patients.

Conflicts of interest: G. van Valkenhoef is the technical lead of the ADDIS project.


REFERENCES

[1] Sutton AJ, Higgins JP. Recent developments in meta-analysis. Stat. Med., 2008; 27: 625-50.
[2] Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JP. Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. Int. J. Epidemiol., 2012; 41: 818-27.
[3] Caldwell DM, Welton NJ, Dias S, Ades AE. Selecting the best scale for measuring treatment effect in a network meta-analysis: a case study in childhood nocturnal enuresis. Res. Synth. Meth., 2012; 3: 126-41.
[4] Veroniki AA, Vasiliadis HS, Higgins JP, Salanti G. Evaluation of inconsistency in networks of interventions. Int. J. Epidemiol., 2013; 42: 332-45.
[5] Jansen JP, Cope S. Meta-regression models to address heterogeneity and inconsistency in network meta-analysis of survival outcomes. BMC Med. Res. Methodol., 2012; 12: 152.
[6] Dias S, Welton NJ, Sutton AJ, Ades AE. Evidence synthesis for decision making 1: introduction. Med. Decis. Making, 2013; 33: 597-606.
[7] Bafeta A, Trinquart L, Seror R, Ravaud P. Analysis of the systematic reviews process in reports of network meta-analyses: methodological systematic review. BMJ, 2013; 347: f3675.
[8] Song F, Loke YK, Walsh T, Glenny AM, Eastwood AJ, Altman DG. Methodological problems in the use of indirect comparisons for evaluating healthcare interventions: survey of published systematic reviews. BMJ, 2009; 338: b1147.
[9] Jansen JP, Naci H. Is network meta-analysis as valid as standard pairwise meta-analysis? It all depends on the distribution of effect modifiers. BMC Med., 2013; 11: 159.
[10] van Valkenhoef G, Lu G, de Brock B, Hillege H, Ades AE, Welton NJ. Automating network meta-analysis. Res. Synth. Meth., 2012; 3: 285-99.
[11] Sobieraj DM, Cappelleri JC, Baker WL, Phung OJ, White CM, Coleman CI. Methods used to conduct and report Bayesian mixed treatment comparisons published in the medical literature: a systematic review. BMJ Open, 2013; 3.
[12] Hedges LV, Vevea JL. Fixed- and random-effects models in meta-analysis. Psychol. Methods, 1998; 3: 486-504.
[13] Normand SL. Meta-analysis: formulating, evaluating, combining, and reporting. Stat. Med., 1999; 18: 321-59.
[14] DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin. Trials, 1986; 7: 177-88.
[15] Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J. Clin. Epidemiol., 1997; 50: 683-91.
[16] Lumley T. Network meta-analysis for indirect treatment comparisons. Stat. Med., 2002; 21: 2313-24.
[17] Song F, Altman DG, Glenny AM, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ, 2003; 326: 472.


[18] The BUGS Project; 2012. Available from: http://www.mrc-bsu.cam.ac.uk/bugs/ (last accessed on March 28, 2014).
[19] Plummer M. Comments on 'The BUGS project: evolution, critique and future directions'. Stat. Med., 2009; 28: 3073-4.
[20] Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat. Med., 2004; 23: 3105-24.
[21] Lu G, Ades AE. Assessing evidence inconsistency in mixed treatment comparisons. JASA, 2006; 101: 447-59.
[22] Salanti G, Kavvoura FK, Ioannidis JP. Exploring the geometry of treatment networks. Ann. Intern. Med., 2008; 148: 544-53.
[23] Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat. Med., 2010; 29: 932-44.
[24] Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statist. Sci., 1992; 7: 457-72.
[25] Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. J. Comput. Graph. Stat., 1998; 7: 434-55.
[26] Plummer M. JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003); 2003 March 20-22; Vienna, Austria.
[27] Stata software; 2014. Available from: http://www.mrc-bsu.cam.ac.uk/Software/stata.html#Software (last accessed on March 28, 2014).
[28] netmeta: network meta-analysis with R. Available from: http://cran.r-project.org/web/packages/netmeta/index.html (last accessed on March 28, 2014).
[29] van Valkenhoef G, Tervonen T, Zwinkels T, de Brock B, Hillege H. ADDIS: a decision support system for evidence-based medicine. Decision Support Systems, 2012; 55: 459-75.
[30] Sim I, Owens DK, Lavori PW, Rennels GD. Electronic trial banks: a complementary method for reporting randomized trials. Med. Decis. Making, 2000; 20: 440-50.
[31] Konig J, Krahn U, Binder H. Visualizing the flow of evidence in network meta-analysis and characterizing mixed treatment comparisons. Stat. Med., 2013; 32: 5414-29.
[32] Krahn U, Binder H, Konig J. A graphical tool for locating inconsistency in network meta-analyses. BMC Med. Res. Methodol., 2013; 13: 35.
[33] Chaimani A, Vasiliadis HS, Pandis N, Schmid CH, Welton NJ, Salanti G. Effects of study precision and risk of bias in networks of interventions: a network meta-epidemiological study. Int. J. Epidemiol., 2013; 42: 1120-31.
[34] Roloff V, Higgins JP, Sutton AJ. Planning future studies based on the conditional power of a meta-analysis. Stat. Med., 2013; 32: 11-24.
[35] Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions [updated March 2011]. The Cochrane Collaboration. Available from: http://handbook.cochrane.org (last accessed on March 28, 2014).
[36] Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann. Intern. Med., 2009; 151: W65-94.


[37] Hoaglin DC, Hawkins N, Jansen JP, Scott DA, Itzler R, Cappelleri JC, Boersma C, Thompson D, Larholt KM, Diaz M, Barrett A. Conducting indirect-treatment-comparison and network-meta-analysis studies: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 2. Value Health, 2011; 14: 429-37.
[38] Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, Lee K, Boersma C, Annemans L, Cappelleri JC. Interpreting indirect treatment comparisons and network meta-analysis for health-care decision making: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 1. Value Health, 2011; 14: 417-28.
[39] Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med. Decis. Making, 2013; 33: 607-17.
[40] Dias S, Sutton AJ, Welton NJ, Ades AE. Evidence synthesis for decision making 3: heterogeneity--subgroups, meta-regression, bias, and bias-adjustment. Med. Decis. Making, 2013; 33: 618-40.
[41] Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Med. Decis. Making, 2013; 33: 641-56.
[42] Dias S, Welton NJ, Sutton AJ, Ades AE. Evidence synthesis for decision making 5: the baseline natural history model. Med. Decis. Making, 2013; 33: 657-70.
[43] Dias S, Sutton AJ, Welton NJ, Ades AE. Evidence synthesis for decision making 6: embedding evidence synthesis in probabilistic cost-effectiveness analysis. Med. Decis. Making, 2013; 33: 671-8.
[44] Ades AE, Caldwell DM, Reken S, Welton NJ, Sutton AJ, Dias S. Evidence synthesis for decision making 7: a reviewer's checklist. Med. Decis. Making, 2013; 33: 679-91.
[45] Allen IE, Olkin I. Estimating time to conduct a meta-analysis from number of citations retrieved. JAMA, 1999; 282: 634-5.
[46] Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med., 2010; 7: e1000326.
[47] Cipriani A, Furukawa TA, Salanti G, Geddes JR, Higgins JP, Churchill R, Watanabe N, Nakagawa A, Omori IM, McGuire H, Tansella M, Barbui C. Comparative efficacy and acceptability of 12 new-generation antidepressants: a multiple-treatments meta-analysis. Lancet, 2009; 373: 746-58.
[48] Leucht S, Cipriani A, Spineli L, Mavridis D, Orey D, Richter F, Samara M, Barbui C, Engel RR, Geddes JR, Kissling W, Stapf MP, Lässig B, Salanti G, Davis JM. Comparative efficacy and tolerability of 15 antipsychotic drugs in schizophrenia: a multiple-treatments meta-analysis. Lancet, 2013; 382: 951-62.
[49] Wallace BC, Dahabreh IJ, Schmid CH, Lau J, Trikalinos TA. Modernizing the systematic review process to inform comparative effectiveness: tools and methods. J. Comp. Eff. Res., 2013; 2: 273-82.
[50] Wallace BC, Small K, Brodley CE, Lau J, Trikalinos TA. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. ACM SIGHIT International Health Informatics Symposium (IHI); 2011.
[51] Wallace BC. Machine Learning in Health Informatics: Making Better Use of Domain Experts. Medford: Tufts University; 2012.


[52] de Bruijn B, Carini S, Kiritchenko S, Martin J, Sim I. Automated information extraction of key trial design elements from clinical trial publications. AMIA Annu. Symp. Proc., 2008: 141-5.
[53] Kiritchenko S, de Bruijn B, Carini S, Martin J, Sim I. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Med. Inform. Decis. Mak., 2010; 10: 56.
[54] Hsu W, Speier W, Taira RK. Automated extraction of reported statistical analyses: towards a logical representation of clinical trial literature. AMIA Annu. Symp. Proc., 2012; 2012: 350-9.
[55] The Cochrane Informatics & Knowledge Management Department. RevMan 5; 2012. Available from: http://tech.cochrane.org/Revman (last accessed on March 28, 2014).
[56] Ip S, Hadar N, Keefe S, Parkin C, Iovin R, Balk EM, Lau J. A Web-based archive of systematic review data. Syst. Rev., 2012; 1: 15.
[57] Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Ann. Intern. Med., 2007; 147: 224-33.
[58] The Cochrane Collaboration. Open access; 2014. Available from: http://www.cochrane.org/editorial-and-publishing-policy-resource/open-access (last accessed on March 28, 2014).
[59] Elliott JH, Turner T, Clavisi O, Thomas J, Higgins JP, Mavergames C, Gruen RL. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med., 2014; 11: e1001603.
[60] Ioannidis J, Lau J. Evolution of treatment effects over time: empirical insight from recursive cumulative meta-analyses. Proc. Natl. Acad. Sci. USA, 2001; 98: 831-6.
[61] Lexchin J. Sponsorship bias in clinical research. Int. J. Risk Saf. Med., 2012; 24: 233-42.
[62] Buchkowsky SS, Jewesson PJ. Industry sponsorship and authorship of clinical trials over 20 years. Ann. Pharmacother., 2004; 38: 579-85.
[63] Hopewell S, Loudon K, Clarke MJ, Oxman AD, Dickersin K. Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database Syst. Rev., 2009: MR000006.
[64] Golder S, Loke YK. Is there evidence for biased reporting of published adverse effects data in pharmaceutical industry-funded studies? Br. J. Clin. Pharmacol., 2008; 66: 767-73.
[65] Golder S, Loke Y, McIntosh HM. Poor reporting and inadequate searches were apparent in systematic reviews of adverse effects. J. Clin. Epidemiol., 2008; 61: 440-8.
[66] Hopewell S, Clarke M, Stewart L, Tierney J. Time to publication for results of clinical trials. Cochrane Database Syst. Rev., 2007: MR000011.
[67] Simes RJ. Publication bias: the case for an international registry of clinical trials. J. Clin. Oncol., 1986; 4: 1529-41.
[68] Song F, Gilbody S. Bias in meta-analysis detected by a simple, graphical test. Increase in studies of publication bias coincided with increasing use of meta-analysis. BMJ, 1998; 316: 471.
[69] Duval S, Tweedie R. Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 2000; 56: 455-63.


[70] Sterne JA, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J. Clin. Epidemiol., 2000; 53: 1119-29.
[71] Begg CB. Comment on 'A comparison of methods to detect publication bias in meta-analysis' by P. Macaskill, S. D. Walter and L. Irwig (Stat. Med., 2001; 20: 641-654). Stat. Med., 2002; 21: 1803.
[72] Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Comparison of two methods to detect publication bias in meta-analysis. JAMA, 2006; 295: 676-80.
[73] Mavridis D, Sutton A, Cipriani A, Salanti G. A fully Bayesian application of the Copas selection model for publication bias extended to network meta-analysis. Stat. Med., 2013; 32: 51-66.
[74] Dickersin K, Rennie D. Registering clinical trials. JAMA, 2003; 290: 516-23.
[75] Chalmers I, Bracken MB, Djulbegovic B, Garattini S, Grant J, Gülmezoglu AM, Howells DW, Ioannidis JP, Oliver S. How to increase value and reduce waste when research priorities are set. Lancet, 2014; 383: 156-65.
[76] Ioannidis JP, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R. Increasing value and reducing waste in research design, conduct, and analysis. Lancet, 2014; 383: 166-75.
[77] Al-Shahi Salman R, Beller E, Kagan J, Hemminki E, Phillips RS, Savulescu J, Macleod M, Wisely J, Chalmers I. Increasing value and reducing waste in biomedical research regulation and management. Lancet, 2014; 383: 176-85.
[78] Chan AW, Song F, Vickers A, Jefferson T, Dickersin K, Gøtzsche PC, Krumholz HM, Ghersi D, van der Worp HB. Increasing value and reducing waste: addressing inaccessible research. Lancet, 2014; 383: 257-66.
[79] Glasziou P, Altman DG, Bossuyt P, Boutron I, Clarke M, Julious S, Michie S, Moher D, Wager E. Reducing waste from incomplete or unusable reports of biomedical research. Lancet, 2014; 383: 267-76.
[80] Bourgeois FT, Murthy S, Mandl KD. Outcome reporting among drug trials registered in ClinicalTrials.gov. Ann. Intern. Med., 2010; 153: 158-66.
[81] Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. The ClinicalTrials.gov results database--update and key issues. N. Engl. J. Med., 2011; 364: 852-60.
[82] Ross JS, Mulvey GK, Hines EM, Nissen SE, Krumholz HM. Trial publication after registration in ClinicalTrials.gov: a cross-sectional analysis. PLoS Med., 2009; 6: e1000144.
[83] McCray AT, Ide NC. Design and implementation of a national clinical trials registry. J. Am. Med. Inform. Assoc., 2000; 7: 313-23.
[84] van Valkenhoef G, Tervonen T, de Brock B, Hillege H. Deficiencies in the transfer and availability of clinical trials evidence: a review of existing systems and standards. BMC Med. Inform. Decis. Mak., 2012; 12: 95.
[85] AllTrials. All Trials Registered | All Results Reported; 2014. Available from: http://www.alltrials.net (last accessed on March 28, 2014).
[86] Arkinson J, Holbrook A, Wiercioch W. Public perceptions of physician-pharmaceutical industry interactions: a systematic review. Healthc. Policy, 2010; 5: 69-89.
[87] Boumil SJ 3rd, Nariani A, Boumil MM, Berman HA. Whistleblowing in the pharmaceutical industry in the United States, England, Canada, and Australia. J. Public Health Policy, 2010; 31: 17-29.
[88] Boumil MM, Berman H. Transparency in research and its effect on the perception of research integrity. JONAS Healthc. Law Ethics Regul., 2010; 12: 64-8.
[89] Data sharing will pay dividends. Nature, 2014; 505: 131. Available from: http://www.nature.com/polopoly_fs/1.14468!/menu/main/topColumns/topLeftColumn/pdf/505131a.pdf (last accessed on March 28, 2014).


[87] Boumil SJ, 3rd, Nariani A, Boumil MM, Berman HA. Whistleblowing in the pharmaceutical industry in the United States, England, Canada, and Australia. J. Public Health Policy, 2010; 31: 17-29. [88] Boumil MM, Berman H. Transparency in research and its effect on the perception of research integrity. JONAS Healthc. Law Ethics. Regul., 2010; 12: 64-8. [89] Nature. Data sharing will pay dividends. Nature. 2014. Available from: http://www.nature.com/polopoly_fs/1.14468!/menu/main/topColumns/topLeftColumn/ pdf/505131a.pdf (last accessed on March 28, 2014). [90] Kaiser J. Making clinical data widely available. Science, 2008; 322: 217-8. [91] Wolfson M1, Wallace SE, Masca N, Rowe G, Sheehan NA, Ferretti V, LaFlamme P, Tobin MD, Macleod J, Little J, Fortier I, Knoppers BM, Burton PR. DataSHIELD: resolving a conflict in contemporary bioscience--performing a pooled analysis of individual-level data without sharing the data. Int. J. Epidemoiol., 2010; 39: 1372-82. [92] van Valkenhoef G, Tervonen T, Zhao J, de Brock B, Hillege HL, Postmus D. Multicriteria benefit-risk assessment using network meta-analysis. J. Clin. Epidemiol., 2012; 65: 394-403. [93] Keeny RL, Raiffa H. Decisions with multiple objectives: preferences and value tradeoffs. New York: Wiley; 1976. [94] Bujkiewicz S1, Jones HE, Lai MC, Cooper NJ, Hawkins N, Squires H, Abrams KR, Spiegelhalter DJ, Sutton AJ. Development of a transparent interactive decision interrogator to facilitate the decision-making process in health care. Value Health, 2011; 14: 768-76. [95] Drummond MF, Sculpher MJ, Torrance GW, O'Brien BJ, Stoddart GL. Methods for the economic evaluation of health care programmes. New York: Oxford University Press; 2005.


In: Network Meta-Analysis
Editor: Giuseppe Biondi-Zoccai
ISBN: 978-1-63321-001-1
© 2014 Nova Science Publishers, Inc.

CONCLUSION

Giuseppe Biondi-Zoccai*, M.D.
Assistant Professor in Cardiology, Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Latina, Italy

We hope you have enjoyed reading this textbook as much as we have enjoyed planning, editing, and producing it. It must be clear to everybody that this was a collaborative effort from the beginning: while I am the sole editor of this book, almost all of its specific contents were authored by experts other than me, who should be fully credited for their efforts on every appropriate occasion and in every pertinent venue. As stated at the inception of this opus, if this work becomes obsolete in five years or less, we will be even happier than we are today, as it will mean that the field has grown momentously in breadth and impact. Indeed, any meta-analysis must be interpreted carefully, must address a specific and relevant question, and must provide useful inference for pragmatic decision making. Accordingly, this book should be viewed as a tool and a means to an end, rather than a goal in itself. Nonetheless, we hope this work will have been a useful instrument and an adjunct to your armamentarium, irrespective of your specific goals and background. Finally, we humbly look forward to receiving your comments, critiques, and suggestions for upcoming revisions and subsequent editions.

* Corresponding author: Giuseppe Biondi-Zoccai, MD, Department of Medico-Surgical Sciences and Biotechnologies, Sapienza University of Rome, Corso della Repubblica 79, 04100 Latina, Italy. Phone: +39 07731757245. Fax: +39 07731757254. Email: [email protected].


INDEX A abstraction, 75 abstraction form, 75 access, 33, 48, 49, 50, 52, 67, 69, 82, 90, 164, 230, 340, 375, 377, 380, 382, 383, 387 accessibility, 376 accounting, 11, 146, 218, 298, 338 acid, 36, 305, 306 acquaintance, 109 action, ix, 361 adalimumab, 39, 338, 343, 346, 347, 349, 351, 352, 356, 358 adaptations, xxvi, 63 adjunctive therapy, 306, 318, 320, 321 adjustment, 33, 34, 98, 111, 137, 207, 212, 214, 215, 281, 338, 352, 353, 362, 366, 371, 386 administrators, 52, 57 adolescents, 247, 258 ADR, 187 adults, 310, 311, 319, 320 advancement(s), 209, 237, 296 adverse effects, 9, 12, 65, 74, 306, 312, 316, 321, 387 adverse event, 5, 33, 70, 78, 254, 277, 305, 307, 313, 319, 320, 366, 375 AFM, 303 African-American, 232, 241 age, 70, 78, 82, 122, 170, 171, 173, 175, 179, 232, 233, 234, 296, 340, 343, 346 agencies, 171, 183, 381 aggregation, 33, 34, 222 AIDS, 241 algorithm, 34, 120, 269, 270, 286 alternative hypothesis, 104 alternative medicine, 69 alternative treatments, 103, 329

American Heart Association, 290, 294, 296, 298, 304, 371 amplitude, 310 anatomy, xxix anesthesia, viii, 40, 263, 264, 281 anesthetics, 263, 264, 272, 278, 279 angioplasty, 199, 303, 365 ANOVA, 180 antibody, 356, 357 anticoagulant, 370 anticoagulation, xxx, xxxi, 362 anticonvulsant, 321 antidepressant(s), xxii, 95, 149, 217, 218, 219, 250, 253, 254, 259, 323, 324, 325, 326, 328, 329, 330, 331, 334, 335, 370, 371, 386 antiepileptic drugs, 305, 306, 318, 319, 320 antihypertensive agents, 124, 143, 144, 145, 246 antihypertensive drugs, 124, 138, 143, 149, 151, 166, 258 antipsychotic, 253, 254, 255, 259, 386 antipsychotic drugs, 253, 255, 259, 386 anxiety, 247, 258, 323 anxiety disorder, 247, 258, 323 aortic valve, 279, 280 apoptosis, xxvi apples, 32 appropriate statistics, 139 aristotle, 221 arm-based method, 139 arterial hypertension, 363 artery, 199, 207, 278, 280, 281, 283, 284, 285, 297, 301, 302, 303, 304, 367 arthritis, 337, 338, 354, 356 articulation, 44 assessment, 12, 23, 35, 44, 47, 48, 49, 54, 73, 78, 87, 88, 90, 91, 92, 93, 94, 96, 105, 111, 112, 116, 117, 121, 122, 136, 157, 163, 171, 207, 238, 242, 246, 305, 306, 307, 309, 311, 328, 332, 363, 366 assessment tools, 92

Complimentary Contributor Copy

396

Index

asthma, 66 asymmetry, 14, 95, 217, 368 ataxia, 305, 311, 312, 314, 316, 317 atrial fibrillation, xxx, xxxi, xxxvii, 39, 137, 162, 166, 241 attachment, 54 attitudes, 16 attribution, 380 audit(s), 57, 318 Austria, 172, 385 authentication, 380 authorities, xi autoimmune disease, 338 automate, 373, 375, 376, 383 automation, 380 avoidance, 14 awareness, 72, 374

B background information, 50 balloon angioplasty, 199, 200, 201 barriers, 33 base, xxvi, 28, 30, 44, 47, 61, 62, 65, 78, 93, 102, 109, 213, 244, 264, 283, 354 Bayesian decision-making, 283 Bayesian framework, 104, 105, 108, 109, 117, 119, 163, 198, 202, 212, 215, 227, 249, 339, 353, 368 Bayesian inference, 102, 105, 108, 120, 190, 199 Bayesian method(s), xviii, xix, 24, 30, 34, 38, 92, 105, 106, 109, 111, 157, 159, 207, 298, 303, 304, 338, 354, 366, 373, 376 Bayesian model, 24, 107, 116, 189, 195, 199, 200, 240, 283, 284, 285, 286, 328, 355, 375 Bayesian statistics, xviii, 25, 102, 375 behaviors, 232 benefits, xxvii, 6, 9, 12, 16, 26, 35, 98, 137, 190, 223, 236, 237, 238, 240, 244, 334, 370, 379 beta blocker, 366 BIA, 320 bias, viii, 7, 10, 11, 12, 17, 48, 59, 88, 89, 90, 91, 92, 94, 95, 96, 97, 167, 170, 207, 209, 216, 217, 218, 242, 244, 290, 291, 339, 340, 353, 355, 387 biopsy, xxv bleeding, 78, 149, 248, 258 blindness, 272 blood, 222, 264 blood pressure, 222, 264 BMI, 78 bone, xxvii, 338 bone marrow, xxvii brain, 311, 317 brain stem, 311

breast cancer, 301, 354 building blocks, 64 burn, 120, 124, 142, 266, 343 Butcher, 117 bypass graft, xxxvi, 219, 279, 302

C calcium, 143, 145, 149, 362 calcium channel blocker, 362 calculus, xi cancer, 96 carbamazepine, 307, 321 carcinoma, 149 cardiac surgery, 263, 264, 265, 271, 272, 273, 277, 278, 279, 280 cardiologist, xii cardiopulmonary bypass, 264, 272, 279, 280, 281 cardiovascular disease, 6, 16, 116, 163 caregivers, 139, 140, 146 carotene, 6, 16 cartilage, 338 case study, xviii, xxx, 70, 85, 137, 144, 165, 167, 216, 258, 323, 324, 333, 338, 353, 356, 384 casting, 212 categorization, 93 category a, 93 causal relationship, 175, 181, 362 causality, 33 causation, 5, 170 cell biology, xxvi central nervous system, 127, 311 cerebrospinal fluid, 321 cerebrovascular disease, 306 challenges, xxii, xxx, xxxiv, 14, 18, 21, 96, 141, 185, 375 channel blocker, 143, 145 chemicals, 67 Chicago, 110 childhood, 66, 384 children, xxvii, 66, 247, 258, 308, 310, 320 China, 60, 357 chronic diseases, xxvi chronic obstructive pulmonary disease, 244, 245, 258 citalopram, 250, 325, 327, 329, 332 civilization, xxv clarity, 7, 57 classes, 143, 309, 362 classification, xxvi, 304 cleaning, 232 climate, 380 clinical assessment, 365

Complimentary Contributor Copy

397

Index clinical heterogeneity, 169, 170, 171, 173, 174, 185 clinical judgment, 22, 377 closed loop, 190 cluster analysis, 254 clustering, 165, 236, 242, 254 clusters, 254 CNS, 199, 201, 202 coding, 47, 54, 375, 376 cognitive dysfunction, 280 cognitive process, 4 coherence, 189, 190, 193, 367 collaboration, xii, xxv, 52, 88, 307, 318, 366, 378, 379, 382 College Station, 240, 241 collisions, 367 color, 342 combination therapy, 340, 343 commercial, 8, 67, 227, 381 common sense, xxvii communication, 237 community, 112, 232, 288, 355 complexity, 15, 27, 34, 37, 85, 108, 113, 123, 136, 138, 151, 165, 167, 199, 202, 221, 222, 243, 244, 267, 319, 375 compliance, 174, 178 complications, xvii compounds, 324, 328, 329 computation, xviii, 101, 366 computer, xvii, xviii, 375 computing, 101, 112, 379 conceptualization, 248 conditioning, 279 conduction, 306 conference, 67, 70, 71, 77, 79, 83, 172, 264 configuration, 125, 129, 132, 265, 368 conflict, 90, 389 conflict of interest, 90 confounders, 195, 296, 355, 362 confounding variables, 6 consensus, 17, 83, 169, 170, 171, 181, 182, 183, 188, 240, 264 consistency, 7, 106, 119, 137, 151, 155, 157, 166, 190, 192, 193, 201, 203, 207, 269, 325, 328, 367 construct validity, 92 construction, 108, 344 consulting, 175, 257 consumers, 323 consumption, 374 continuous data, 115, 356 contour, 214 contraceptives, 166 contrast-based method, 139 control group, 5, 7, 54, 176, 195, 225, 232, 340, 366

controlled studies, 182, 218, 307, 309, 316, 317, 332 controversial, xxvii, 164 convention, 30 convergence, xix, 120, 136, 200, 233, 235, 287, 376 cooperation, xxvii coordination, 305, 311, 312, 314, 317 coronary angioplasty, 303 coronary artery bypass graft, 264, 266, 278, 279, 280, 281, 284, 290, 296, 301, 302, 303 coronary artery disease, xiii, xxxvi, 6, 219, 283, 284, 285, 301, 302, 303, 363, 367 coronary bypass surgery, 301 coronary heart disease, 16 correlation(s), 6, 35, 40, 141, 142, 143, 159, 212, 217, 229, 240, 374, 383 correlation coefficient, 240 cost, xxxi, xxxiv, 26, 30, 35, 37, 38, 107, 108, 111, 150, 207, 245, 335, 353, 354, 377, 383, 386 cost effectiveness, 353 counseling, 241, 242 covering, 73 CPB, 272 CPI, 68 creatinine, 280 critical analysis, 22 criticism, xxvii, 32 cross-design synthesis, 283, 288, 290 culture, 380 cumulative distribution function, 141

D danger, 31 data analysis, 33, 48, 108, 140, 141, 237 data collection, 25 data mining, 254 data processing, xxvii data set, 47, 145, 381 data structure, 116, 179, 188, 266 database, 45, 46, 49, 62, 63, 64, 65, 67, 70, 71, 72, 73, 80, 82, 217, 219, 222, 258, 377, 380, 388 deaths, 272, 295 decision makers, xxxv, 7, 9, 21, 26, 30, 34, 35, 171, 383 decision-making process, 354, 389 defects, 34 defibrillator, 137 deficiencies, 65 dementia, 306 demographic characteristics, 122 demonstrations, 91 dependent variable, 181 depressants, 329, 363, 382

Complimentary Contributor Copy

398

Index

depression, 323, 324, 325, 326, 328, 329, 334, 335 depth, 45, 202 destruction, 248, 258 detectable, 317 detection, 6, 89, 122, 216, 367 deviation, 24, 43 diabetes, 124, 125, 126, 127, 128, 138, 140, 143, 144, 145, 146, 149, 151, 166, 246, 258 diabetic ketoacidosis, 13 diabetic neuropathy, 38 diagnostic criteria, 325 diffusion, xxvii diplopia, 305, 311, 312, 314, 316, 317 direct evidence, 22 disclosure, xi, xvii, xxi, xxix, xxxiii, 146, 257, 298, 305, 374, 382, 391 discordance, 364 disease activity, 127, 339 diseases, xxvi, 6 disorder, 46, 363 diuretic, 143, 144 diversity, 70, 121, 191, 376 dizziness, 305, 311, 312, 314, 316, 317 dosage, 12, 128, 311, 314 dosing, 14, 26 double-blind trial, 314 draft, xii, 172 drawing, 78, 252 dream, xvii, xxx drug safety, 25, 26 drug therapy, 66, 143 drug treatment, 246, 356 drugs, 28, 38, 67, 122, 143, 149, 157, 166, 246, 254, 257, 258, 264, 265, 266, 277, 305, 306, 309, 310, 314, 324, 329, 331, 338, 343, 346, 347, 352, 354, 355, 362, 363, 364, 368 DSC, 385 duplicate, 82, 264

E ecology, 95, 376 economic change, 362 economic evaluation, 45, 389 economic status, 77 editors, xxiii, 36, 37, 40, 59, 73, 74, 84, 110, 136, 172, 184, 237, 242, 258, 303, 321 education, xii, 232 elaboration, xviii, 36, 74, 83, 385 eligibility criteria, 7, 8, 11, 52, 76, 328 e-mail, 172 emotion, xxvii, xxviii empirical studies, 236, 237

end-users, 104, 108, 171, 378 energy, xii, 367 enforcement, 211 engineering, xi England, 50, 303, 389 enuresis, 384 environment(s), xxvi, 102, 108, 198, 340, 362, 378 enzyme, 143, 145, 309 EPC, 84 epidemic, 241 epidemiologic, 210 epidemiology, xxxv, 36, 92, 96, 97, 111, 172, 188, 379 epilepsy, 306, 307, 309, 318, 319, 320, 321 equality, 32 equipment, 272 equity, 173 escitalopram, 250, 325, 326, 329 estimation process, 286 estrogen, 16 etanercept, 39, 338, 343, 346, 347, 349, 351, 352, 353, 355 ethics, 172 ethnicity, 122, 171, 174, 177 evidence network, 21, 22 evidence-based medicine, xxiii, 4, 15, 361 evil, 104 evoked potential, 280 evolution, xxvi exchangeability, 190 exclusion, xxxi, xxxiv, 7, 44, 45, 49, 54, 70, 72, 76, 77, 91, 324, 329, 377, 378 execution, 11, 22, 23, 170 exercise, xviii, 177, 222, 371 expertise, xxvi, 45, 80, 82, 92, 108, 169, 171, 173, 175, 176, 181, 182, 272, 366 exploitation, 365 exposure, 5, 6, 33, 54, 89, 91, 118, 140, 170, 176 external validity, 7, 61, 215, 338 extraction, 44, 47, 52, 53, 54, 57, 65, 75, 76, 77, 79, 80, 81, 82, 83, 84, 85, 176, 223, 236, 325, 328, 340, 376, 377, 378, 379, 380, 383, 387 extraction form, 75

F false negative, 7 false positive, 7, 61, 90, 105, 367 fat, 116 file drawer problem, 209 filters, 65, 67 financial, xxvi fitness, 108

Complimentary Contributor Copy

399

Index fixed effect model, 119, 121, 125, 128, 132, 224, 266, 276, 365 fixed effects, 116, 239 flaws, xxx, 7, 11, 14, 34 flexibility, 33, 73, 105, 107, 165, 228 fluctuations, 317 fluoxetine, 250, 253, 254, 325, 328, 329, 330, 331, 332, 333 fluvoxamine, 250, 325, 327, 332, 333 Food and Drug Administration (FDA), 26, 37, 211, 219, 311, 315, 364, 365, 370 football, xviii, 367 force, xii, xiii forest plot, 201, 267, 270, 284, 285, 348 formation, 176, 181 formula, 196, 331 fractures, xxii framing, 15 France, xii freedom, 142, 191, 197, 307, 310, 318 frequentist framework, 104 frequentist inference, 102 frequentist statistics, 102 friendship, xii funding, 45, 80, 82, 115, 122, 146, 171, 173, 183, 257, 362, 375, 377, 378, 380 funnel plot, 95, 268 fusion, 356

G gait, 311 gamma-tocopherol, 278 garbage, xxx, 209 gastroenterologist, xii generalizability, 339 generalized anxiety disorder, 323 genetic disease, xxvi genetic diversity, 173 genetic screening, xxvi genetics, 17 genome, xvii geometry, xi, 14, 39, 124, 190, 191, 192, 200, 368, 376, 385 Georgia, viii, 243, 257 Germany, 172, 243 glaucoma, 166 GLM, xviii, 115, 117, 118, 119, 120, 125, 128, 132, 158 globalization, xxx google, 67, 71, 113 grading, 9, 17, 370 grading evidence, 88

graph, 213, 249, 265, 342, 368, 376 graphical tool, 244, 256, 257, 258 Greece, 112, 221, 241, 243 greed, xxxiii grouping, 254 growth, xxx guessing, xxix guidance, xii, xxxvi, 44, 48, 53, 57, 72, 73, 74, 76, 84, 87, 96, 171, 172, 178, 188, 221, 242, 289, 376 guideline(s), 9, 17, 18, 40, 64, 73, 97, 98, 171, 172, 182, 185, 186, 187, 188, 199, 304, 370, 374, 376 guilty, 214

H hair, 339 half-life, 317 harmful effects, 382 hazards, xiii, 155, 163, 187, 219 healing, xxvi health, 3, 15, 17, 22, 36, 37, 39, 45, 46, 48, 49, 52, 53, 54, 57, 63, 67, 73, 74, 97, 111, 112, 116, 134, 135, 171, 184, 186, 187, 206, 207, 222, 237, 240, 298, 355, 361, 369, 371, 383, 385, 386, 389 health care, 3, 15, 36, 48, 73, 74, 97, 134, 186, 187, 237, 298, 369, 385, 389 health condition, 52 health services, 67 heart attack, 22, 36 heart failure, 366, 371 heart rate, 264 height, xix herpes, 247, 258 herpes simplex, 247, 258 heterogeneity, viii, 14, 116, 122, 169, 173, 174, 176, 178, 183, 185, 187, 190, 193, 207, 234, 271, 325, 328, 371 hierarchical model, 301, 338, 354 high school, xii hip replacement, 38 historical data, 283 history, 6, 57, 111, 142, 150, 200, 386 history of meta-analysis, 103 HIV, 29, 232, 233, 241, 242 HIV test, 241, 242 HIV/AIDS, 241 homogeneity, 21, 31, 137, 190, 193, 215, 221, 222, 224, 228 hormone, 6 hospital death, 219 human, xxv, xxvi, 76, 82, 356, 357, 377, 381 human health, xxv human subjects, 381

Complimentary Contributor Copy

400

Index

Hunter, 237, 239 hypertension, 362 hypothesis, xxxi, xxxiv, 24, 29, 33, 103, 175, 177, 181, 212, 266, 272, 273, 366 hypothesis test, xxxiv, 24, 29, 103 hysterectomy, 248

I icon, 57 ideal, xxx, xxxv, 32, 175, 179, 292, 382 identification, 5, 8, 22, 23, 51, 53, 55, 57, 87, 123, 167, 310, 317, 318, 328, 380 identity, 132 ideology, xxvii illusions, 15 imbalances, 195 immune system, 338 immunogenicity, 150 immunosuppression, 46 improvements, xxii, xxxvi, 237, 374, 383 incidence, 5, 6, 16, 82, 124, 127, 151, 232, 366 incompatibility, 377 inconsistency, viii, 10, 12, 39, 40, 96, 106, 123, 189, 190, 197, 200, 201, 204, 206, 239, 244, 256, 281, 341, 349, 355 independent variable, 181, 187 indirect effect, 14, 117, 256 indirect evidence, 22, 113 individual character, 170 individual characteristics, 170 individual patient data, 221, 222, 237, 239 individuals, 5, 6, 7, 33, 39, 53, 54, 71, 137, 164, 166, 172, 173, 175, 176, 179, 181, 187, 241, 305, 307, 309, 310, 376 industry, 12, 122, 310, 329, 364, 381, 387, 388, 389 inefficiency, 378 infancy, xxi infarction, 365 infection, 29, 116, 241 inferences, 4, 13, 15, 157, 257, 287, 291, 292, 307 inflammatory disease, xxii, 127 infliximab, 338, 343, 346, 347, 349, 351, 352, 356, 357 influenza, 150 information retrieval, 45 information technology, 377 inhaler, 66 inhibition, 151 injury, xxvi, 280 inoculation, 22 institutions, 362, 380 insulin, 13

integration, 21, 22, 383 integrity, 7, 381, 389 intelligence, xxv intensive care unit, 279 interaction effect, 231 intercourse, 232 interface, 65, 66, 72, 198, 375 interferon, 354 internal validity, 5, 7, 48, 88, 89, 215, 364, 366 intervals, 194, 199 intraocular, 166 intraocular pressure, 166 intravenously, 263, 338, 344, 346, 347, 349, 352 investment, 377 Iowa, 68, 69 Ireland, 337 isolation, xxv, 244, 257 issues, xxxv, 5, 22, 46, 47, 48, 49, 52, 62, 76, 88, 89, 90, 91, 93, 115, 155, 174, 178, 185, 186, 191, 202, 209, 222, 243, 308, 333, 363, 368, 374, 388 Italy, xi, xii, xvii, xxv, xxvi, xxxiii, 101, 209, 263, 305, 312, 361, 391 iteration, xxii

J Japan, 323, 356 Java, 375 joint swelling, 338 justification, 46, 50, 51, 93

K Karl Pearson, 22

L languages, 8, 53, 211 laptop, 375 lead, xii, xxvi, xxx, 4, 8, 10, 11, 12, 25, 29, 31, 53, 69, 82, 83, 89, 105, 144, 145, 157, 161, 165, 182, 193, 211, 215, 224, 236, 246, 292, 310, 363, 365, 375, 377, 381, 383 learning, xviii, xxx, xxxiv, 45, 51 legal protection, 382 legislation, 381 lens, xxxi lesions, 304 life sciences, 67 lifetime, xxxiii light, xiii, 122, 165, 367

Complimentary Contributor Copy

401

Index linear model, xix, 37, 111, 115, 116, 117, 119, 134, 138, 155, 161, 207, 225, 281, 386 liver, xxvi living reviews, 373 localization, 318 logical reasoning, 172 low risk, 90, 248 Luo, 356

M machine learning, xvii, xviii, 373, 378, 379, 380, 383, 386 magnesium, 111 magnitude, 8, 11, 12, 13, 28, 29, 34, 89, 95, 175, 181, 182, 198, 202, 222, 224, 257, 287, 291, 297 major depression, 250, 253, 254, 323, 324, 325, 329, 363 major depressive disorder, 325, 335 majority, 79, 91, 212, 246, 310 management, xxii, xxvii, xxxiv, 9, 62, 71, 72, 83, 135, 170, 171, 199, 232, 318, 376, 377, 381, 383, 388 mania, 149, 249, 258 manic, 249 marginal distribution, 229 marketplace, 102 Markov chain, 107, 108, 142, 200, 286, 367 marrow, xxvii masking, 89 mass, 116 mathematics, 193 matrix, 122, 141, 142, 160, 161, 255, 266 matter, xviii, 13, 117 measurement(s), 11, 14, 29, 32, 74, 78, 122, 169, 170, 171, 173, 174, 178, 307, 365, 374, 376 media, xxxvi, 219, 355 median, 199, 266, 268, 284, 287, 290, 296, 307, 340 medical, xi, xii, xix, xxi, xxii, xxvi, xxix, xxx, 4, 5, 6, 15, 16, 22, 28, 36, 67, 111, 112, 199, 222, 272, 283, 284, 296, 297, 301, 323, 362, 381, 384 medical science, 22, 296 medication, 257, 310, 355 medicine, xii, xxi, xxiii, xxv, xxvi, xxvii, xxviii, xxix, xxx, xxxiv, xxxv, 3, 4, 15, 26, 37, 102, 112, 113, 187, 253, 272, 283, 355, 361, 385 melanoma, 135 mellitus, 151 membership, 172, 176 mentor, xii, xxvii mentorship, xii message passing, xviii messages, 55

meta-regression, 33, 39, 155, 156, 167, 181, 187, 207, 218, 384 methodological principles, 116 methodology, xvii, xxii, xxxv, 8, 21, 22, 24, 31, 34, 87, 92, 95, 169, 172, 222, 223, 238, 303, 334, 338, 339, 340, 374, 375, 379, 380 migration, 380 minimizing bias, 43 Minneapolis, 151 mitral valve, 279, 280 mixing, 233 model assessment, 116 model fit, 104, 190 model specification, 162, 231, 383 modelling, 115, 116, 121, 189, 195, 207, 222, 230, 240, 259, 339, 340, 352, 353, 355 moderators, xxxiii, 48, 181, 221 molecules, xxvi monoclonal antibody, 356 Monte Carlo method, 286 morbidity, 232 morning stiffness, 338 mortality, 13, 22, 26, 36, 135, 232, 263, 264, 265, 266, 267, 268, 270, 271, 272, 273, 278, 284, 285, 287, 288, 289, 290, 292, 294, 295, 296, 297, 298, 365 mortality rate, 284, 288, 294, 296 mortality risk, 288, 298 motivation, xxvi, 232, 241, 272, 287 motivational skills, 241 mRNA, 279 multidimensional, 150, 163, 174, 178 multiple sclerosis, 129, 130, 131, 138 multiplication, xxvi multiplier, 341, 351 multivariate distribution, 158 musculoskeletal, 39 mutation, xxvi myocardial infarction, xxiii, 22, 26, 36, 37, 78, 111, 365 myocardial ischemia, 279, 280 myocardium, 278 mythology, xxvi

N narcissism, 334 National Health Service (NHS), 45, 46, 50, 84, 189 nausea, 305, 311, 314 necrosis, 135, 355, 356, 357 neglect, xxx nephropathy, 216 nervousness, 35

Complimentary Contributor Copy

402

Index

Netherlands, xii, 108, 373 network meta-regression, 158, 165, 374 neuralgia, 38 neurology, viii, 184, 305 neutrons, 367 New Zealand, 69, 188 next generation, 98, 137, 240, 334 nodes, xviii, 27, 198, 199, 234, 244, 245, 246, 265, 368 normal distribution, 118, 156, 158, 162, 193, 194, 197, 199, 224, 226, 229, 285, 287, 289, 295, 341 NPC, 40 nuisance, 140, 142 null, 23, 103, 196, 237, 297, 365, 381 null hypothesis, 23, 103, 196, 237, 365 nursing, 69 nystagmus, 314

O Obama, 369 objective criteria, 47 occupational therapy, 185 octopus, 191 odds ratio, 139, 150, 308, 337, 366 omission, 47 open-mindedness, xii openness, 381 operations, xxvii opportunities, 21 optimism, 96 organ(s), xxvi organize, 35, 377 outpatients, 332 overlap, 12, 69, 78, 194, 197, 201, 328 ownership, 188 oxidative stress, 278

P paclitaxel, 199, 302 pain, 338 Pairwise meta-analysis, 223, 224, 361, 375 paradigm shift, xxxi, xxxv, 61, 355 parallel, xxi, xxxiv, 224, 309, 317, 320, 357, 362 parameter estimates, 165, 227, 236, 287 parameter estimation, 118, 120, 164 Pareto, 287 paroxetine, 250, 325, 327, 329, 333 partial seizure, 307 participants, xxvii, 8, 11, 12, 46, 54, 55, 70, 77, 78, 89, 90, 118, 124, 127, 132, 156, 169, 170, 171,

172, 179, 181, 191, 193, 222, 233, 234, 235, 238, 244, 246, 381 password, 52 pathogenesis, 338 pathology, xxvi, xxix, 362, 363 pathophysiological, 174 pathways, xxvi patient care, xxiii, xxx, 7 patient-level, 222, 382 PCT, xxv peer review, 51, 65, 74, 245, 379, 380 peptide, xiii permission, 51, 265, 267, 268, 269, 270, 290, 294, 296, 313, 315, 316 permit, 13 personality, 186 PES, 199 pharmaceutical, 69, 110, 217, 310, 326, 387, 388, 389 pharmacokinetics, 272, 321 pharmacology, xxvi, 67 phenytoin, 307 Philadelphia, 87, 369 physicians, xxvii, xxx, 140, 273, 362, 363, 364, 365 physics, xviii physiology, xxix pilot study, 280 pipeline, 378 platform, 72, 380 plausibility, 175, 181, 195, 199, 256 pneumonia, 29 policy, 44, 57, 67, 171, 184, 298, 381, 387 policy makers, 171 polyurethane, 241 Pooled analysis, 222 pools, 190 population, 7, 11, 32, 34, 35, 44, 46, 54, 91, 104, 120, 121, 140, 141, 143, 144, 145, 156, 170, 173, 176, 224, 273, 285, 288, 292, 307, 310, 317, 318, 340, 365 Population-averaged event rate, 139 portfolio, xxxiii post-transplant, 46 preparation, xxv, 377 preterm delivery, 372 prevention, 9, 16, 29, 36, 39, 135, 137, 162, 166, 216, 232, 238, 241 principles, xxii, xxiii, 3, 4, 44, 63, 73, 88, 189, 190 prior distribution, 283, 341, 343 prior knowledge, 44, 119, 176 probability distribution, 104, 105, 287, 294, 341, 367 probability theory, 285 producers, 67

Complimentary Contributor Copy

Index professionals, 171, 375 prognosis, 6, 7, 318 programming, 375 project, xxvii, 45, 71, 72, 76, 80, 82, 113, 169, 171, 172, 173, 182, 183, 190, 240, 257, 264, 284, 285, 375, 383, 385 propagation, xviii prophylaxis, 38 PROSPERO, 43, 44, 45, 46, 49, 50, 51, 52, 53, 56, 57, 58, 60, 62, 63, 199 protection, 264, 278, 279, 280, 281, 381 proteomics, xxvi protocol, 43, 46, 48, 58, 60, 63 psoriasis, 39 psychiatry, ix, 69, 221, 243, 323, 324 psychology, 15, 69, 238 psychotherapy, xxxvi, 243, 335 public domain, 43, 375, 381 public health, 7, 9, 13, 44, 67, 222, 232, 375 public policy, 36, 355 publication bias, 8, 10, 17, 39, 95, 171, 209, 210, 219, 238, 380, 387 publishing, 210, 381, 387 pulmonary hypertension, 370 punishment, xxvi P-value, 77

Q quality assessment, 88 quality improvement, 97 quality of life, 383 quantification, xviii, xix, 214 quantitative technique, 32 query, 64, 378 questionnaire, 339

R race, 216 random assignment, 89 random effects, xix, 116, 190, 194, 203, 204, 368 random walk, 286 rank, 116, 127, 212, 254, 271, 277 rank probabilities, 116, 127, 277 ranking, 123, 249, 250, 252, 253, 308, 335, 350 rating scale, 84, 325 reading, xxii, xxxi, xxxvi, 182, 366, 377, 391 reality, 15, 330, 339 reasoning, 88 recall, 4, 5 recognition, 4, 8, 23, 44

403

recommendations, xi, xxi, xxii, xxiii, 9, 16, 17, 26, 35, 37, 88, 167, 169, 171, 172, 173, 175, 179, 180, 181, 182, 183, 187, 188, 323, 354 recovery, 127, 279 recurrence, 22 redundancy, xxxv regeneration, xxvi, xxvii regenerative medicine, xxv, xxvi registration, 43, 48, 50, 51, 59, 60 registry, 67, 68, 69, 79, 301, 303, 326, 343, 381, 382 regression analysis, 39, 160, 187, 188, 212, 217 regression equation, 188 regression method, 158, 179, 218 regression model, 95, 119, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 167, 171, 188, 195, 384 regrowth, xxvi regulations, 48 regulatory agencies, 67, 326 relapses, 127, 129 relapsing-remitting multiple sclerosis, 354 relative risk, 139, 366 relevance, xii, 7, 45, 52, 71, 97, 184, 309, 318, 363, 367 reliability, 4, 35, 45, 48, 50, 75, 83, 85, 87, 89, 90, 91, 94, 183, 223, 228, 237, 368 remediation, 65 remission, 326, 331, 333 renal replacement therapy, 219 renin, 151 repetitions, xxxv replication, 46 reporting, viii, 59, 63, 72, 73, 89, 175, 181, 182, 199, 209, 216, 218, 243, 370, 374, 378 reporting bias, 89, 209, 370 requirements, 57, 65, 253 research funding, 257 researchers, xviii, xxvi, xxvii, xxxv, xxxvi, 7, 25, 28, 34, 35, 44, 47, 54, 67, 71, 76, 79, 80, 82, 108, 164, 165, 214, 217, 222, 230, 253, 334, 365, 374, 375, 377, 378, 379, 380, 381, 383 residual error, 188 resolution, xxxi, 47, 93 resources, xxxiii, xxxiv, 17, 45, 63, 68, 71, 74, 85, 102, 182, 195, 362 response, xix, 10, 13, 49, 55, 71, 140, 141, 174, 178, 319, 320, 321, 326, 330, 331, 332, 333, 337, 338, 339, 340, 341, 347, 354, 357 restenosis, 199, 200, 202 restrictions, 54, 71, 73 rewards, 142 rheumatoid arthritis, 38, 135, 162, 166, 335, 337, 338, 353, 354, 355, 356, 357, 358 rheumatology, ix, 135, 183, 337, 339, 348, 354




right atrium, 279
rings, 44
risk assessment, 389
risk factors, 6, 7, 52, 239
risk of bias tool, 88
risks, 26, 116, 140, 144, 366, 379
routes, 122
routines, 108, 227
Royal Society, 186
rules, 15, 23, 164, 298

S
safety, xxvii, 7, 25, 26, 37, 46, 139, 223, 309, 319, 320, 343, 355, 356, 357
sample variance, 285
sampling error, 121, 156, 159, 228
SAS, xix, 107, 108, 109, 113, 159, 160, 227
scaling, 150, 174, 178
schizophrenia, 253, 255, 259, 386
school, xxi, xxix
science, xxi, xxv, xxix, xxxi, 5, 7, 10, 172, 193, 334
scientific method, 15
sclerosis, 127
scope, xxvii, xxxi, xxxiv, 52, 64, 71, 96, 117, 139, 212, 217, 375, 382
SCT, 326
search, 61, 62, 64, 65, 66, 67, 70, 72, 73, 264, 325, 340
search strategy, 61
search terms, 62, 63, 64, 65, 66, 67
second generation, 363
seizure, 306, 307, 308, 309, 310, 318, 319
selective reporting, 209
selectivity, 211
sensation, 127
sensitivity, 30, 31, 32, 34, 47, 48, 63, 65, 66, 67, 93, 95, 106, 109, 174, 178, 214, 215, 216, 225, 227, 237, 263, 271, 287, 291, 292, 296, 329, 330, 340, 344, 353, 354, 363, 364, 366, 367
sensitivity analysis, 283
sertraline, 250, 325, 327, 329
serum, 321
services, 380
SES, 199
sex, 5, 171, 173, 174, 177, 232, 233
sexually transmitted diseases, 232
shape, 163, 266
shortage, xxii
showing, 13, 121, 127, 131, 133, 244, 249, 251, 257, 271, 315
side effects, 329, 334, 362, 382
signal transduction, xxvi

signaling pathway, xxvi
signals, 367, 368
signs, 311
simulation(s), xviii, 105, 106, 111, 136, 165, 187, 188, 190, 199, 214, 219, 230, 240, 303, 375, 376
Sinai, 87, 183
Small study effects, 209, 211, 215
smoking, xix, 259
social anxiety, 335
social care, 44, 45, 57, 63
social influence(s), 237
social network, xviii
social sciences, 22
social support, 232
society, xxxi, 338
socioeconomic status, 173
sodium, 311, 317, 320, 321
software, xviii, xxxv, 62, 69, 71, 72, 102, 105, 106, 108, 112, 124, 140, 142, 150, 159, 160, 161, 198, 202, 223, 227, 233, 240, 241, 266, 269, 284, 285, 286, 312, 315, 366, 371, 373, 374, 375, 376, 377, 378, 379, 380, 383, 385
software code, 286
solution, xxii, xxvii, 136, 252, 362, 363, 367, 368, 381
somnolence, 314
Source data, 75
specialists, 45, 113, 369
specifications, 229
spelling, 65
spreadsheets, 377
stable angina, 301
standard deviation, 78, 119, 132, 141, 142, 200, 268, 340, 347, 352
standard error, 159, 212, 213, 269
standardization, 33
state(s), xxii, 8, 9, 46, 53, 146, 215, 242, 285, 321, 376, 378, 383
statin, 135
statistical inference, xxxiii, 24, 104, 161, 284, 366
statistical package, 102, 107
statistics, xi, xviii, xix, xxxv, 16, 25, 48, 50, 78, 84, 102, 111, 116, 124, 125, 139, 140, 141, 142, 145, 146, 162, 172, 199, 222, 240, 266, 366, 375, 376
stem cells, xxv, xxvi, xxvii
stenosis, 301, 302, 303, 304
stent, xiii, xxxvi, xxxvii, 149, 201, 206, 283, 301, 302, 303, 363, 365, 370
stimulus, 34
strategy use, 48, 295
stratification, 93
streptokinase, 26
stress, 367



stress test, 367
stroke, xxx, xxxi, 39, 137, 162, 166, 241
structure, xix, 64, 92, 136, 150, 159, 206, 208, 223, 225, 226, 229, 240, 255, 265, 273, 274, 275, 343, 355, 367, 368, 376, 382
study-level, 222
subcutaneous injection, 356, 357
subgroups, 12, 33, 35, 55, 65, 111, 137, 180, 181, 207, 222, 272, 281, 301, 362, 386
subjective judgments, 10
subjectivity, 25, 47
subtraction, 200
SUCRA, 105, 123, 127, 128, 129, 131, 133, 134, 252, 254, 255
Sun, 184
supplementation, 16
survival, xix, 5, 84, 116, 135, 140, 155, 163, 167, 263, 271, 272, 273, 277, 283, 284, 298, 301, 302, 303, 383, 384, 387
susceptibility, 3, 5
swelling, 338
symmetry, 376
symptoms, 132, 338
syndrome, 370
systolic blood pressure, 116, 119

T
tanks, 375
target, 11, 121, 122, 187, 232, 338
target population, 11, 121
Task Force, 17, 37, 39, 59, 74, 84, 88, 97, 111, 135, 167, 206, 298, 304, 371, 386
tau, 146, 277, 300
team members, 53, 54, 78, 79, 176
teams, xviii, 379
technical support, 207, 373
techniques, xxx, xxxiv, 14, 22, 33, 62, 77, 109, 149, 157, 158, 165, 180, 215, 224, 308, 368, 378
technological developments, xviii
technological progress, xviii
technology, xxvi, 111, 112, 136, 207, 296, 378
telephone, 53
terminally ill, xxvii
test statistic, 23
testing, 79, 88, 90, 92, 195, 197, 212, 371, 387
textbook(s), xxxiii, xxxiv, 181, 374, 391
therapeutic agents, 258
therapeutic interventions, 6
therapy, xxvii, xxviii, 6, 7, 9, 16, 137, 151, 283, 284, 296, 297, 307, 321, 354, 355, 356, 357, 362, 366, 370, 372
thesaurus, 65, 67

thinning, 124
Thomas Kuhn, xxxi
threats, 5, 14, 215
thrombosis, xiii, xxxvi, 38, 149, 166, 206, 370
time frame, 22, 64
tissue, xxvi
tissue engineering, xxvi
TLR, 279
TNF-α, 354
tonic, 306
tonic-clonic seizures, 306
toolchain, 373
toxicity, 38, 311, 321
trade, 69, 139, 382
trade-off, 139, 382
trainees, xxxiii
training, xii, xxi, 92, 93, 173, 174, 178, 272, 375
training programs, xxi
transformation(s), 29, 140, 142, 222, 252, 309, 317
translation, 184, 185, 283
transmission, 232, 241
transparency, 44, 52, 75, 79, 91, 378, 379, 381
transplantation, 46
tricyclic antidepressant(s), 38
tumor, 356
tumor necrosis factor (TNF), 354, 356
turnover, xxvi
type 2 diabetes, 151
type II error, 104, 367
typhoid, 22
typhoid fever, 22

U
uniform, 119, 142, 341, 343
unique features, 90
United Kingdom (UK), xxxiv, 43, 45, 50, 51, 61, 62, 112, 155, 172, 183, 189, 193, 195, 243, 257
United Nations, 241
United States, xxxiv, 3, 15, 26, 381, 389
universe, 23, 34
unstable angina, xiii
updating, 26
urban, 232, 241
USA, xxix, 21, 67, 87, 139, 155, 169, 172, 183, 373, 387

V
vaccine, 232
validation, 94, 214, 376
valuation, 381




variable(s), xxxii, 14, 24, 33, 75, 76, 77, 78, 79, 82, 83, 116, 117, 155, 156, 161, 169, 171, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 188, 193, 195, 212, 222, 232, 245, 294, 339, 353, 366
variance-covariance matrix, 160
variations, 23, 32, 156, 157, 159, 164, 173, 178, 193, 272
vector, 156, 161, 266
vein, 38, 88
venlafaxine, 250, 325, 326, 327, 328, 329
venue, xxx, 391
vero, 221
vertigo, 311, 314
vision, 311
vitamin E, 6
vocabulary, 65, 67, 173
Volatile agents, 263, 264, 266
vomiting, 305, 311, 314

W
Washington, 40, 84
waste, 185, 381, 388
water, 210
weakness, 215
wealth, 338
web, xvii, 56, 79, 80, 113, 264, 334, 379, 385
websites, 62, 71, 72
weight loss, 338
well-being, xxv
white paper, 37
withdrawal, 89, 305, 307, 309, 311, 312
word processing, 52, 72, 378
workflow, 373, 377, 379, 383
workload, 73, 378
World Health Organization (WHO), 68, 381
worldwide, xxxvi, 272
worry, xxxv
wrists, 338
writing process, 379

Y
yield, xxii, 7, 31, 211, 215, 288
young people, 66
