
Journal of Clinical Epidemiology 81 (2017) 13–21

Improvement evident but still necessary in clinical practice guideline quality: a systematic review

James Jacob Armstrong a,*, Alexander M. Goldfarb a, Ryan S. Instrum a, Joy C. MacDermid b,c

a Department of Medicine, Schulich School of Medicine & Dentistry, Western University, Clinical Skills Building, London, Ontario N6A 5C1, Canada
b Monsignor Roney Ambulatory Care Center, 930 Richmond Street, London, Ontario N6A 3J4, Canada
c Hand and Upper Limb Center Clinical Research Lab, McMaster University School of Rehabilitation Science, 1280 Main Street West, Hamilton, Ontario L8S 4L8, Canada

Accepted 10 August 2016; Published online 24 August 2016

Abstract

Objective: To review the quality of clinical practice guidelines (CPGs) from a wide range of health care topics and report any changes seen since 1992.

Study Design and Setting: A literature search in MEDLINE, EMBASE, Web of Science Core Collection, and BIOSIS was conducted in London, Ontario, Canada. Publications were screened to identify those assessing the quality of CPGs using the Appraisal of Guidelines, Research and Evaluation (AGREE) II instrument. Data were gathered regarding year of publication, institution type, health topic, country of origin, domain scores, and final recommendation.

Results: Twenty-five studies met the inclusion criteria. AGREE II scores from 415 individual CPGs published between 1992 and 2014 were obtained. Domain scores increased significantly over time, and the proportion of guidelines being recommended based on AGREE II assessment was significantly greater after 2010. Domain scores in Applicability and Editorial independence had no significant effect on a CPG’s final recommendation, whereas other domains had a significant effect. Finally, international development groups produced CPGs with significantly higher scores.

Conclusion: This review found a steady improvement in CPG quality over time. This is particularly evident in guidelines published after 2010. However, certain domains that are integral to the methodological quality of CPGs remain unsatisfactorily low.

© 2016 Elsevier Inc. All rights reserved.

Keywords: Guidelines; Clinical; AGREE; Quality; Appraisal; Policy

Conflict of interest: All the authors declare that they have no conflicts of interest.
* Corresponding author. Tel.: +1-519-933-6373. E-mail address: [email protected] (J.J. Armstrong).
http://dx.doi.org/10.1016/j.jclinepi.2016.08.005
0895-4356/© 2016 Elsevier Inc. All rights reserved.

1. Introduction

Influencing almost all fields of health care, clinical practice guidelines (CPGs) aim to improve the quality, consistency, and effectiveness of care by applying evidence-based medicine and providing health care practitioners with expert summaries of the most recent evidence [1]. CPGs are meant to bridge the gap between clinical research and clinical practice and should therefore be based on the best scientific evidence and developed using the most rigorous methodology. Since the 1980s, the number of CPGs has increased dramatically. However, over the past 25 years, evidence suggests that CPG quality may be highly variable, if not low in general, and the rigor with which CPGs follow standardized development methods is unsatisfactory [2–5]. A common, widely accepted, and standardized method for evaluating CPGs was therefore needed. An international collaboration, the Appraisal of Guidelines, Research and Evaluation (AGREE), created a tool that can be used to evaluate the methodological quality of CPG development. The newest version, the AGREE II instrument, was released in 2010 and is the only appraisal tool that has been developed and validated internationally [6,7]. It provides a standardized framework consisting of a semiquantitative scoring system involving 23 items over six domains of methodological quality: Scope and purpose, Stakeholder involvement, Rigor of development, Clarity of presentation, Applicability, and Editorial independence.
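To make the semiquantitative scoring concrete, the following is a minimal sketch of the scaled domain score calculation as described in the AGREE II user's manual: every appraiser rates each item in a domain from 1 to 7, and the summed ratings are rescaled between the minimum and maximum possible totals. The function name and the example ratings are illustrative assumptions, not data from this review.

```python
def scaled_domain_score(ratings_by_appraiser):
    """Return an AGREE II scaled domain score as a percentage.

    ratings_by_appraiser: one list per appraiser, each holding that
    appraiser's 1-7 ratings for every item in a single domain.
    """
    n_appraisers = len(ratings_by_appraiser)
    n_items = len(ratings_by_appraiser[0])
    for ratings in ratings_by_appraiser:
        if len(ratings) != n_items:
            raise ValueError("every appraiser must rate every item")
        if any(r < 1 or r > 7 for r in ratings):
            raise ValueError("ratings must lie between 1 and 7")

    # Sum of all ratings actually given for this domain.
    obtained = sum(sum(ratings) for ratings in ratings_by_appraiser)
    # Totals if every appraiser had given every item a 1 or a 7.
    minimum = 1 * n_items * n_appraisers
    maximum = 7 * n_items * n_appraisers
    return 100 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings: four appraisers scoring a three-item domain.
example = [[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]
score = scaled_domain_score(example)
print(round(score, 1))
```

Because the score is rescaled per domain, domains with different numbers of items (for example, eight items in Rigor of development versus two in Editorial independence) are reported on the same 0 to 100 scale, which is what allows the domain means in this review to be compared directly.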


What is new?
- AGREE II quality scores for 415 clinical practice guidelines from multiple medical disciplines have been consolidated and analyzed.
- Clinical practice guideline quality significantly increased from 1992 to 2014.
- The proportion of clinical practice guidelines being recommended for use by reviewers was significantly higher after 2010.
- Additional improvement is required for AGREE II quality domains Applicability and Editorial independence.

The updated AGREE II instrument is an evolution of the original AGREE. Several changes were made and are outlined in the AGREE II technical document [8]. The AGREE II instrument and its predecessor have been prominent in the literature for over a decade, giving CPG developers a viable and effective framework on which to base their final product. Unfortunately, concerns regarding suboptimal quality, a paucity of supporting evidence, the exclusion of relevant stakeholders from the development process, compromised editorial independence, and a lack of CPG applicability persist [9–11]. These concerns may be negatively affecting the uptake, utilization, and efficacy of CPGs in their health care domains [12]. The purpose of this study is to review the quality of CPGs spanning many different health care topics published since 1990, to analyze trends in the quality of guideline development, and to assess the potential effect of the availability of the AGREE II instrument on CPG quality.

2. Methods

2.1. Literature search and study selection

A predefined search strategy was used to obtain potentially relevant literature from the MEDLINE, EMBASE, Web of Science Core Collection, and BIOSIS databases. The search strategy used only terms relating to the AGREE II instrument and CPGs to target articles that used the AGREE II instrument to review CPGs from any medical field. In addition to database searching, a bibliographic list of studies citing the AGREE II instrument (list maintained by the AGREE Trust and available for download at http://www.agreetrust.org/resource-centre/citations-of-corepublications/) was used to source additional potentially relevant studies. References obtained from the database searches and the AGREE Trust’s bibliographic list were organized using EndNote X7 (Thomson Reuters, New York, NY, USA) and imported into DistillerSR (Evidence Partners, Ottawa, Ontario), an online systematic review software, for reference management and screening. The search strategy was initially run on October 12, 2015, and rerun for the last time on June 17, 2016, to retrieve more recent publications for inclusion in our analysis. The bibliographic list maintained by the AGREE Trust was last searched on June 17, 2016.

Extracted publications underwent title and abstract screening during which articles were included based on a predefined set of inclusion criteria: (1) full text is available in English and (2) publication in a peer-reviewed journal. After title and abstract screening, full texts were acquired, and a more in-depth screening was performed using the following inclusion criteria: (1) complete AGREE II scores (all six domains and final recommendation) of one or more CPGs were reported and (2) AGREE II scores were generated by two or more independent reviewers. Three authors (J.J.A., A.M.G., and R.S.I.) assessed all abstracts and full-text articles for inclusion. Any disagreement between authors was resolved by consensus. Methods were in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 standards [13].

2.2. Data collection

Three authors (J.J.A., A.M.G., and R.S.I.) collected data on the following characteristics of each review: search methods for included CPGs, number of AGREE II appraisers, and the interobserver agreement achieved by each group of reviewers. The following information for the included guidelines was collected from each review: year of publication, institution, CPG health topic, country of origin, AGREE II domain scores (Scope and purpose, Stakeholder involvement, Rigor of development, Clarity of presentation, Applicability, and Editorial independence), and the CPGs’ overall assessment (recommended, recommended with modifications, or not recommended). If any included reviews had incomplete data, authors were contacted for further information.

2.3. Data analysis

The correlation between the different domain scores and overall assessment was analyzed using the Pearson coefficient. For the purposes of this analysis, recommended and recommended with modifications were grouped into a single recommended category, dichotomizing the data into recommended and not recommended. The recommendations were compared based on CPG date of publication, location of publication, and type of development organization by analysis of variance, with post hoc (Duncan) testing when appropriate. To analyze the trends in domain scores and final recommendations over time, CPGs were grouped based on publication date into four categories (1990–1999, 2000–2004, 2005–2009, and 2010–2015) and analyzed using the Kruskal–Wallis and Mann–Whitney tests. The authors explored the potential influence of the publication of the AGREE II instrument on the quality of guidelines by comparing the quality scores of guidelines published before and after the release date of the AGREE II instrument (2010).
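The data preparation and two of the statistics named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis code: the paper does not name its statistical software, the function names and example records are made up, the Kruskal–Wallis statistic is computed without a tie correction, and no P values are derived.

```python
from math import sqrt

# The four publication-date categories described in the data analysis.
PERIODS = [(1990, 1999), (2000, 2004), (2005, 2009), (2010, 2015)]


def period_label(year):
    """Map a publication year to one of the four analysis periods."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    raise ValueError(f"year {year} is outside the study window")


def dichotomize(recommendation):
    """Collapse the three AGREE II ratings into 1 (recommended) or 0."""
    rating = recommendation.strip().lower()
    if rating in ("recommended", "recommended with modifications"):
        return 1
    if rating == "not recommended":
        return 0
    raise ValueError(f"unknown rating: {recommendation!r}")


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction, no P value)."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Made-up records: (publication year, Rigor of development score, rating).
records = [
    (1995, 30, "not recommended"),
    (2003, 42, "not recommended"),
    (2008, 55, "recommended with modifications"),
    (2012, 70, "recommended"),
    (2013, 81, "recommended"),
]
scores = [score for _, score, _ in records]
outcomes = [dichotomize(rating) for _, _, rating in records]
print(period_label(2012), round(pearson_r(scores, outcomes), 2))
```

Correlating a continuous domain score with the dichotomized recommendation in this way is a point-biserial correlation, which is simply the Pearson coefficient applied to a 0/1 variable. A statistical package would normally be used in practice, since it also supplies tie corrections and P values.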

3. Results

3.1. Literature review

The authors retrieved 515 citations of articles using the AGREE II instrument from the AGREE Trust online database. Systematic searching of the MEDLINE, EMBASE, Web of Science Core Collection, and BIOSIS databases retrieved 719 citations. EndNote X7 (Thomson Reuters) was used to remove duplicate citations, of which there were 180. Of the remaining 539 articles, 445 were published in English and of potential relevance. Full-text versions of the articles were sourced and assessed, after which another 421 articles were excluded in accordance with our inclusion and exclusion criteria (Fig. 1). Eleven authors were contacted to obtain additional information; however, no additional data were obtained. In total, 25 reviews of CPG quality were included, and agreement in the inclusion process was high (kappa = 0.915). The included reviews contained AGREE II domain scores from a total of 415 CPGs with publication dates ranging from 1992 to 2014.

Fig. 1. Study selection flowchart. AGREE II, Appraisal of Guidelines, Research and Evaluation II.

3.2. Characteristics of included reviews and guidelines

Each review of CPG quality included a mean of 18 AGREE II quality assessments [14–39], with 26% generating scores from two reviewers, 47% from three reviewers, and 57% from four or more reviewers. Only 44% (10 of 25) of reviews reported the interobserver agreement achieved while scoring CPGs; however, the mean reported intraclass correlation coefficients (ICCs) were high (mean ICC, 0.857; range, 0.671–0.965). Guideline publication dates spanned 22 years, with 89% being published after 2005. International groups (those with members on two or more continents) and national medical societies were the largest publishers of CPGs, publishing 36% and 37%, respectively. Finally, most of the guidelines were published by groups in North America (26%), Europe (28%), or by groups with very internationally diverse membership (34%). The remaining 12% of CPGs originated from South America, Asia, and Australia. Guideline characteristics are displayed in Table 1.

Table 1. Characteristics of reviewed guidelines (n = 415)
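The kappa value reported for the inclusion process in Section 3.1 expresses agreement beyond what chance alone would produce. The paper does not state which kappa variant was used for its three screeners, so the sketch below assumes the simplest case, Cohen's kappa for two raters; the function name and the include/exclude decisions are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters making categorical decisions."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal rates.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions for ten screened articles.
a = ["in", "in", "out", "out", "out", "in", "out", "out", "in", "out"]
b = ["in", "in", "out", "out", "out", "out", "out", "out", "in", "out"]
kappa = cohens_kappa(a, b)
print(round(kappa, 3))
```

With three screeners, a multi-rater statistic such as Fleiss' kappa, or an average of pairwise values, would be the more natural choice; the two-rater form is shown only because it exposes the underlying calculation most directly.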

3.3. AGREE II quality scores

Table 2 includes the mean scores for the six AGREE II domains and mean domain scores when CPGs were grouped according to the final recommendation (not recommended, recommended with modifications, and recommended). CPGs graded as recommended received significantly higher domain scores than CPGs graded as recommended with modifications and not recommended. CPGs developed internationally scored significantly higher in the domains Scope and purpose and Clarity of presentation (P < 0.001) compared with CPGs originating from a single country. Mean Applicability domain scores were also significantly higher in CPGs developed by international collaborations compared with those from individual continents, with the exception of Australia and New Zealand (P < 0.001). In addition, CPGs developed in Asia scored lower in Rigor of development (P < 0.001). There were no other differences among regions. Guidelines created by government or international organizations scored better in both Applicability and Scope and purpose (P < 0.001). CPGs developed by government organizations scored highest in Stakeholder involvement, and those developed by international organizations scored highest in Clarity of presentation. For the Editorial independence and Rigor of development domains, there were no differences in mean CPG domain scores when grouped by type of development organization.

Table 2 displays mean AGREE II domain scores for all included guidelines. It also shows mean domain scores for guidelines when grouped according to final recommendation. In our sample of CPGs, 37% were recommended, 45% were recommended with modifications, and 18% were not recommended by appraisers. Fig. 2 shows the proportion of CPGs receiving a final rating of recommended, recommended with modifications, or not recommended. All domain scores increased significantly over time (Table 3), with the largest increases occurring in Editorial independence (90%) and Rigor of development (45%). Scope and purpose, Stakeholder involvement, Rigor of development, and Clarity of presentation were significantly correlated with the reviewer’s final assessment (Pearson coefficient, r = 0.62, P < 0.05). Applicability and Editorial independence were not significantly related (P = 0.74 and 0.71, respectively) to whether a reviewer would recommend or not recommend a CPG. CPG publication date was significantly correlated with a reviewer’s final recommendation (P < 0.001). Specifically, if the CPG was published after 2010, it was more likely to be recommended than a CPG published before 2010. When looking at the final recommendation based on continent of origin, there was no significant correlation overall (P = 0.07). Finally, CPGs developed by medical societies were less likely to be recommended overall compared with other organization types (P < 0.01).

4. Discussion

Overall, CPG quality has improved considerably over the past 2 decades. Steady improvement was observed in all AGREE II domain scores. Domain scores for Clarity of presentation and Scope and purpose reached acceptable levels, Stakeholder involvement and Rigor of development were considered borderline, and Editorial independence and Applicability scored quite poorly. Particularly troubling are the suboptimal scores in Editorial independence and Rigor of development, as these two domains have been considered to have the most direct effect on CPG content quality [9]. It can be argued that, overall, CPG quality has seen notable improvement over the last 2 decades;

Characteristic                                      CPGs included

Year of publication
  1990–1999                                           8
  2000–2004                                          34
  2005–2009                                         176
  2010–2014                                         193
Location of publication
  North America                                     107
  South America                                       6
  Europe                                            115
  Asia                                               28
  Australia/New Zealand                              14
  International                                     140
Health care topic
  Internal medicine/critical care/geriatrics        170
  Oncology                                           53
  Obstetrics and gynecology                          51
  Psychology                                         17
  Pediatrics                                         35
  Musculoskeletal                                    27
  Ophthalmology                                       1
  Occupational medicine                               6
  Other                                              48
Type of organization responsible for guidelines
  Government                                         61
  Medical society                                   153
  International                                     148
  Research institute                                 23
  Other                                              29

Abbreviation: CPGs, clinical practice guidelines.

Table 2. Mean (SD) AGREE II domain scores for all included CPGs as well as mean (SD) scores for CPGs grouped by final recommendation (recommended, recommended with modifications, or not recommended)

Domain                    All          Recommended  Recommended with         Not recommended  P
                          (n = 415)    (n = 155)    modifications (n = 185)  (n = 74)
Scope and purpose         75.8 (20.6)  87.7 (12.0)  74.5 (16.0)              52.9 (25.2)      <0.001
Stakeholder involvement   52.6 (21.9)  68.0 (17.3)  48.7 (16.3)              28.5 (17.4)      <0.001
Rigor of development      51.3 (26.3)  75.2 (15.7)  42.5 (18.9)              21.2 (12.5)      <0.001
Clarity of presentation   80.0 (18.3)  90.6 (8.7)   79.5 (15.2)              57.9 (21.1)      <0.001
Applicability             37.1 (25.7)  53.8 (22.9)  30.7 (22.4)              16.7 (14.8)      <0.001
Editorial independence    41.8 (28.8)  57.9 (24.5)  36.7 (26.3)              19.3 (23.1)      <0.001

Abbreviations: SD, standard deviation; AGREE, Appraisal of Guidelines, Research and Evaluation; CPGs, clinical practice guidelines.

however, there is still much progress to be made, particularly in terms of editorial independence and CPG applicability. The last item in the AGREE II framework asks reviewers for their subjective judgment on whether to recommend, recommend with modifications, or not recommend a CPG. When looking at rates of reviewer recommendation on guideline use over time, a more recent inclination to recommend CPGs is evident. A large increase in recommendation rate is observed from the year 2010 onward. One possible explanation of the marked upward trend in the quality scores of guidelines published after 2010 is the release of the AGREE II instrument itself. Its use as a framework for the development of CPGs was encouraged, and it is plausible that it has had a positive effect on CPGs since its release in 2010 [8].

Brouwers et al. [40] suggested that Clarity of presentation domain scores could be improved through increasing recommendation specificity, reducing recommendation ambiguity, offering several clearly described management options for a specific health issue, and providing easy-to-access, succinct, and clear summaries of key recommendations. Our results indicated a modest improvement in this domain over the study duration, suggesting that CPG authors may have used some of the aforementioned strategies to improve the presentation of their recommendations. A review examining barriers to physician CPG compliance found that lack of awareness of guidelines and lack of familiarity with them were the top two reasons a physician might not use CPGs [41]. Improvements relating to the Clarity of presentation domain may play a vital role in reducing these two barriers and increasing CPG utilization. Clarity of presentation also has the smallest standard deviation of all domains, indicating that CPG authors across all disciplines were quite consistent in their fulfillment of this domain’s requirements.

Rigor of development has been argued to have one of the greatest and most direct effects on guideline quality. Our results indicating that global CPG scores were borderline in this domain are troubling [9]. However, the improvement seen in this domain’s scores over the past 2 decades is encouraging. Of all six domains, Rigor of development saw the second greatest overall improvement; this improvement was surpassed only by that seen in Editorial independence. Experienced teams and sufficient resources to perform a well-documented literature search and evidence appraisal are cornerstones of rigorous CPG development. Indeed, sound methodological expertise is paramount for the development of CPGs; thus, development groups should only pursue CPG authorship if they meet the minimal requirements to do so. It has been previously suggested that low scores in this domain may be attributed to CPG development groups inadequately reporting their methods [9]. Therefore, improvement in this domain could be achieved through inclusion of a more comprehensive description of methodology. By including items such as literature search strategy, evidence selection process and summary tables, as well as methods of grading evidence and determining recommendation strength, CPG authors could expect to improve scores in this domain.

In accordance with previous reviews of CPG quality [28,31,42–44], the present review found that domain scores were lowest in the Applicability and Editorial independence domains. These areas of CPG development have historically been weak and continue to be areas in which great improvement is possible [10,11,28,31,43–45].

Fig. 2. Proportion of clinical practice guidelines that received a final assessment of recommended, recommended with modifications, or not recommended. Significant difference indicated by * (P < 0.001). [Figure not reproduced: proportions on a 0 to 1 axis for the categories Not Recommended, Recommended with modifications, and Recommended, across periods labeled Pre 1990s, 2002-2004, 2005-2009, and 2010-2014.]
Table 3. Mean (SD) AGREE II domain scores when grouped according to the publication date

Domain                    1990–1999    2000–2004    2005–2009    2010–2014    P for trend
                          (n = 8)      (n = 34)     (n = 176)    (n = 193)
Scope and purpose         74 (20)      72 (26)      72 (21)      80 (19)      0.001
Stakeholder involvement   50 (30)      45 (27)      49 (22)      58 (19)      <0.001
Rigor of development      40 (27)      46 (27)      45 (25)      58 (26)      <0.001
Clarity of presentation   74 (20)      73 (20)      78 (18)      83 (17)      0.001
Applicability             29 (23)      24 (25)      37 (25)      40 (26)      0.004
Editorial independence    21 (18)      28 (29)      36 (28)      50 (27)      <0.001

Abbreviations: SD, standard deviation; AGREE, Appraisal of Guidelines, Research and Evaluation.

Guideline development groups can improve their domain scores in Editorial independence by providing transparent and comprehensive information regarding funding sources and including a direct statement declaring the presence or the absence of each author’s conflicts of interest. Strangely, the criteria for this domain are arguably the easiest to satisfy: authors merely need to include two statements regarding funding and conflicts of interest to meet the AGREE II standards. Further work is necessary to determine whether development groups are merely forgetting to include such statements, omitting them for the sake of simplicity or ease, or whether conflicts of interest in fact exist and are having a negative effect on the content of CPGs. Although this domain is presently rather weak, our data show that it has seen the greatest improvement over the past 2 decades. Regardless of the observed improvement, evidence has suggested that conflicts of interest are almost endemic throughout all fields of CPG authorship [46–48]. As conflicts of interest have been shown to have a measurable effect on the recommendations found within CPGs [30], it seems imperative that dramatic improvement continues to take place in this domain. Indeed, more recently, implementation of new methods of reporting and mitigating financial and intellectual conflicts of interest has already begun [49–51]. Such reporting standardization may mitigate potential misconceptions as to what constitutes a reportable conflict of interest and simplify the reporting process. Future evaluation will be necessary to determine the effectiveness of these methods.

A review of physician adherence to CPGs suggested that as many as 38% of physicians considered CPGs inconvenient or too difficult to use [41]. Increasing the applicability of CPGs to everyday clinical practice is a crucial and indispensable step toward increasing their rate of use and maximizing their positive impact on health care. Unfortunately, our results reveal Applicability as the lowest scoring domain. Other studies have reported similar findings, consistently identifying Applicability as the lowest or second-lowest scoring domain [10,11,28,42,45]. Alonso-Coello et al. [9] suggested that persistent low scores in this domain may be because of development groups considering guideline development and guideline implementation as separate entities.
If development groups were to address issues facing CPG implementation, such as organizational barriers, economic impact, and dissemination strategies, scores could be improved substantially in this domain. Inclusion of criteria to monitor and audit CPG uptake and utilization after publication is also crucial for gauging the efficacy of implementation strategies and the success of efforts to make guidelines more applicable for end users.

When looking at mean domain scores of CPGs grouped by geographic region, those that were published internationally generally scored higher. In particular, mean scores in Scope and purpose, Clarity of presentation, and Applicability were highest among guidelines developed in multiple countries. Because international bodies can draw on the expertise of a global staff, they are likely better equipped to write clearer and more methodologically sound guidelines. By leveraging their greater variety of expertise, deeper pool of knowledge, and larger funding sources, these international development groups are better able to target guidelines toward specific populations, understand implementation barriers, and put into effect strategies to overcome said barriers. When considering Rigor of development, CPGs developed within Asia scored significantly lower than those from any other geographic region. This could be because language barriers limit the evidence available to draw upon. In addition, funding limitations could decrease the quality of the process used to gather evidence.

An analysis of domain scores by type of publication body showed that CPGs originating from government and international development bodies scored highest in both Applicability and Scope and purpose. As mentioned previously, these organizations have greater funding and are more capable of setting and influencing policy to facilitate CPG implementation. Governmental development bodies also scored highest in Stakeholder involvement. Again, this may be because of high levels of funding, which allow for the recruitment of patients for focus groups. In addition, because public money is used to fund governmental CPG development bodies, there is an obligation to include the public in the development processes. Finally, we found that CPGs published by medical societies were less likely to be recommended than CPGs published by other types of development groups. This finding was previously reported in a review of CPG quality that used the AGREE I instrument [9] and indicates that medical societies should continue to focus on improving their CPG development process. The lower recommendation rate could be because medical societies have a less diverse development group consisting mainly of physicians. Physician-only development groups may produce CPGs scoring well in certain domains such as Rigor of development and Scope and purpose; however, the perspective of other health care professionals and community members may be necessary to improve scores in domains such as Stakeholder involvement, Clarity of presentation, Applicability, and Editorial independence.


As a systematic review of CPG quality spanning over 2 decades, our study has many strengths. The guidelines included come from a wide range of topics over a fairly diverse geographic distribution. Furthermore, because we retrieved our data from reviews of CPG quality, we had access to data for CPGs that were no longer publicly available on the corresponding institutions’ web pages because updated versions had been released. This allowed for a less biased assessment of changes in guideline quality over time. Finally, as the types of studies we were examining report the quality of CPGs and by their nature do not have positive and negative results, there is less risk of publication bias affecting our study.

Our study, however, was not without its limitations. Our data come from published AGREE II appraisals; this is clearly a potential source of publication bias. Authors may be more likely to review and publish reviews on CPGs in an area already thought to be lacking quality. Furthermore, the decision to review CPGs from a particular health care topic may be dependent on the number of CPGs available in that health care topic. Finally, we only included studies using the AGREE II instrument as prescribed by the AGREE Collaboration, yet few studies reported the interobserver agreement attained during their appraisal of CPGs. This also may be a potential source of bias in that studies with a rather low ICC may be less likely to report an ICC, which may result in our review reporting an overestimated ICC. The validity of our study also depends heavily on the validity of the AGREE II instrument. Previous studies have confirmed the validity of the AGREE II instrument and concluded that it is a useful and reliable tool [40,52]. However, there are inherent limitations to the AGREE II instrument itself.
It is a tool for the assessment of methodological quality, and it does not assess the clinical context or quality of the health recommendations or therapies found within a CPG. This limitation is shared by all existing appraisal tools [7]. Such tools can never fully replace a reader’s critical judgment or sound clinical judgment. Clinicians must ensure that they consider the nature of the health condition and patient to whom a CPG’s recommendations would be applied.

4.1. Future implications

The findings of our systematic review are promising insofar as they show a marked improvement in CPG quality, especially in certain domains. However, improvement is still necessary, and we hope that our results can assist development groups in focusing improvement efforts on the areas that are currently the most lacking. Guideline development is a quickly changing and resource-intensive field; therefore, larger institutions are likely better able to produce high-quality guidelines. For smaller institutions, or those with less development experience, an alternative to de novo CPG development was proposed by Fervers et al. [53]. In their article, they described a methodology for adapting established high-quality guidelines such that they can better suit the development groups’ specific needs while still maintaining their original methodological quality [53]. However, the efficacy of this method over de novo CPG development remains to be established.

Another method to increase the global quality of CPGs is to reduce redundant guideline development. Multiple organizations releasing similar guidelines or guidelines that significantly overlap in their scope could potentially publish a high-quality product if their efforts, expertise, and resources were combined. International collaborations by specialty, topic, or condition have been shown to reduce the number of redundant CPGs published [54]. Furthermore, our results support international guidelines generally achieving high-quality scores. Therefore, the formation of international networks or collaborations for guideline development should be a priority. Such networks could additionally be used to centralize and share evidence used in CPG development. Knowledge gaps could then be determined and published, thus directing future funding and investigation in that specific area. Organizations such as the World Health Organization, the Guidelines International Network, and the Cochrane Collaboration should play a central role in developing, maintaining, and supporting such networks.

Acknowledgments

J.J.A. was funded by the Schulich School of Medicine Summer Research Training Program. J.C.M. was funded by a Canadian Institutes of Health Research Chair in Gender, Work and Health; and Dr James Roth Research Chair in Musculoskeletal Measurement and Knowledge Translation. Karen O’Neil contributed to editing the final article.

Contributors: J.J.A. and J.C.M. conceived the idea for this research and designed the study together with A.M.G. and R.S.I. J.J.A., A.M.G., and R.S.I. searched the literature, reviewed the published work, and participated in data extraction.
Statistical analysis was conducted by A.M.G. and J.J.A. All authors participated in data interpretation. J.J.A., A.M.G., and R.S.I. contributed significantly to article preparation. All authors commented on each draft of the article and approved the final version.

References

[1] Field MJ, Lohr KN. Clinical practice guidelines: directions for a new program. Washington, DC: National Academies Press; 1990.
[2] Grilli R, Magrini N, Penna A, Mura G, Liberati A. Practice guidelines developed by specialty societies: the need for a critical appraisal. Lancet 2000;355:103-6.
[3] Rivara FP. Are guidelines following guidelines? AAP Gd Rounds 1999;281:1900-5.
[4] Qaseem A, Forland F, Macbeth F, Ollenschläger G, Phillips S, van der Wees P. Guidelines International Network: toward international standards for clinical practice guidelines. Ann Intern Med 2012;156:525-31.

[5] Kung J. Failure of clinical practice guidelines to meet Institute of Medicine standards: two more decades of little, if any, progress. Arch Intern Med 2012;172:1628-33.
[6] AGREE Collaboration. Development and validation of an international appraisal instrument for assessing the quality of clinical practice guidelines: the AGREE project. Qual Saf Health Care 2003;12:18-23.
[7] Vlayen J, Aertgeerts B, Hannes K, Sermeus W, Ramakers D. A systematic review of appraisal tools for clinical practice guidelines: multiple similarities and one common deficit. Int J Qual Health Care 2005;17:235-42.
[8] Brouwers MC, Kho ME, Browman GP, Burgers JS, Cluzeau F, Feder G, et al. AGREE II: advancing guideline development, reporting and evaluation in health care. Can Med Assoc J 2010;182:E839-42.
[9] Alonso-Coello P, Irfan A, Sola I, Gich I, Delgado-Noguera M, Rigau D, et al. The quality of clinical practice guidelines over the last two decades: a systematic review of guideline appraisal studies. Qual Saf Health Care 2010;19:e58.
[10] Sabharwal S, Patel NK, Gauher S, Holloway I, Athansiou T. High methodologic quality but poor applicability: assessment of the AAOS guidelines using the AGREE II instrument. Clin Orthop Relat Res 2014;472:1982-8.
[11] Don-Wauchope AC, Sievenpiper JL, Hill SA, Iorio A. Applicability of the AGREE II instrument in evaluating the development process and quality of current National Academy of Clinical Biochemistry guidelines. Clin Chem 2012;58:1426-37.
[12] Lugtenberg M, Zegers-van Schaick JM, Westert GP, Burgers JS. Why don’t physicians adhere to guideline recommendations in practice? An analysis of barriers among Dutch general practitioners. Implement Sci 2009;4:54.
[13] Moher D, Liberati A, Tetzlaff J, Altman DG, Grp P. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 2009;6:e1000097.
[14] Avin KG, Hanke TA, Kirk-sanchez N, Mcdonough CM, Shubert TE, Hardage J, et al. Management of falls in community-dwelling older adults: clinical guidance statement from the Academy of Geriatric Physical Therapy of the American Physical Therapy Association. Phys Ther 2015;95:815-34.
[15] Castellani A, Girlanda F, Barbui C. Rigour of development of clinical practice guidelines for the pharmacological treatment of bipolar disorder: systematic review. J Affect Disord 2015;174C:45-50.
[16] Olivera MJ, Fory JA, Olivera AJ. Quality assessment of clinical practice guidelines for Chagas disease. Rev Soc Bras Med Trop 2015;48:343-6.
[17] Serón P, Lanas F, Ríos E, Bonfill X, Alonso-Coello P. Evaluation of the quality of clinical guidelines for cardiac rehabilitation. J Cardiopulm Rehabil Prev 2015;35:1-12.
[18] Wilby KJ, Black EK, MacLeod C, Wiens M, Lau TTY, Paiva MA, et al. Critical appraisal of clinical practice guidelines in pediatric infectious diseases. Int J Clin Pharm 2015;37:799-807.
[19] Burda BU, Chambers AR, Johnson JC. Appraisal of guidelines developed by the World Health Organization. Public Health 2014;128:444-74.
[20] Farghali A, Al-Khawaja R. Rigorous method to assess quality and generalizability of clinical practice guidelines. Can J Hosp Pharm 2014;67:397-8.
[21] Gillon TER, Pels A, von Dadelszen P, MacDonell K, Magee LA. Hypertensive disorders of pregnancy: a systematic review of international clinical practice guidelines. PLoS One 2014;9:e113715.
[22] Larmer PJ, Reay ND, Aubert ER, Kersten P. Systematic review of guidelines for the physical management of osteoarthritis. Arch Phys Med Rehabil 2014;95:375-89.
[23] Lee GY, Yamada J, Kyololo O, Shorkey A, Stevens B. Pediatric clinical practice guidelines for acute procedural pain: a systematic review. Pediatrics 2014;133:500-15.
[24] Lytras T, Bonovas S, Chronis C, Konstantinidis AK, Kopsachilis F, Papamichail DP, et al. Occupational Asthma guidelines: a systematic quality appraisal using the AGREE II instrument. Occup Environ Med 2014;71:81-6.
[25] Marciano NJ, Merlin TL, Bessen T, Street JM. To what extent are current guidelines for cutaneous melanoma follow up based on scientific evidence? Int J Clin Pract 2014;68:761-70.
[26] Ríos E, Serón P, Lanas F, Bonfill X, Quigley EMM, Alonso-Coello P. Evaluation of the quality of clinical practice guidelines for the management of esophageal or gastric variceal bleeding. Eur J Gastroenterol Hepatol 2014;26:422-31.
[27] Wang Y, Luo Q, Li Y, Wang H, Deng S, Wei S, et al. Quality assessment of clinical practice guidelines on the treatment of hepatocellular carcinoma or metastatic liver cancer. PLoS One 2014;9:e103939.
[28] Zhang Z, Guo J, Su G, Li J, Wu H, Xie X. Evaluation of the quality of guidelines for myasthenia gravis with the AGREE II instrument. PLoS One 2014;9:e111796.
[29] Acuña-Izcaray A, Sanchez-Angarita E, Plaza V, Rodrigo G, Montes de Oca M, Gich I, et al. Quality assessment of asthma clinical practice guidelines. CHEST 2013;144:390.
[30] Norris SL, Burda BU, Holmer HK, Ogden LA, Fu R, Bero L, et al. Author’s specialty and conflicts of interest contribute to conflicting guidelines for screening mammography. J Clin Epidemiol 2012;65:725-33.
[31] Holmer HK, Ogden LA, Burda BU, Norris SL. Quality of clinical practice guidelines for glycemic control in type 2 diabetes mellitus. PLoS One 2013;8:1-6.
[32] Legido-Quigley H, Panteli D, Car J, Mckee M, Busse R. Clinical Guidelines for Chronic Conditions in the European Union. Eur Heart J 2012;33:1635-701.
[33] Luitjes SHE, Wouters MGAJ, König T, Hollander KW, van Os ME, van Tulder MW, et al. Hypertensive disorders in pregnancy: a review of international guidelines. Hypertens Pregnancy 2013;32:367-77.
[34] Nowobilski R, Plaszewski M, Wloch T, Mika P, Gajewski P, Brożek JL. Physiotherapy in asthma - seeking consensus. J Asthma 2013;50:681-6.
[35] Rohde A, Worrall L, Le Dorze G. Systematic review of the quality of clinical guidelines for aphasia in stroke management. J Eval Clin Pract 2013;19:994-1003.
[36] Sabharwal S, Patel V, Nijjer SS, Kirresh A, Darzi A, Chambers JC, et al. Guidelines in cardiac clinical practice: evaluation of their methodological quality using the AGREE II instrument. J R Soc Med 2013;106:315-22.
[37] Winther LP, Mitchell AU, Moller AM. Inconsistencies in clinical guidelines for obstetric anaesthesia for Caesarean section. Acta Anaesthesiol Scand 2013;57:141-9.
[38] Bastian H. Nondisclosure of financial interest in clinical practice guideline development: an intractable problem? PLoS Med 2016;13:e1002030.
[39] Jin Y, Wang MSY, Zhang BSY, Ma BSY, Li MSY. Nursing practice guidelines in China do need reform: a critical appraisal using the AGREE II instrument. Worldviews Evid Based Nurs 2016;13:124-38.
[40] Brouwers MC, Kho ME, Browman GP, Burgers JS, Cluzeau F, Feder G, et al. Development of the AGREE II, part 1: performance, usefulness and areas for improvement. CMAJ 2010;182:1045-52.
[41] Cabana MD, Rand CS, Powe NR, Wu AW, Wilson MH, Abboud PA, et al. Why don’t physicians follow clinical practice guidelines? A framework for improvement. JAMA 1999;282:1458-65.
[42] Armstrong JJ, Rodrigues IB, Wasiuta T, MacDermid JC. Quality assessment of osteoporosis clinical practice guidelines for physical activity and safe movement: an AGREE II appraisal. Arch Osteoporos 2016;11:1-10.

[43] Nelson AE, Allen KD, Golightly YM, Goode AP, Jordan JM. A systematic review of recommendations and guidelines for the management of osteoarthritis: the Chronic Osteoarthritis Management Initiative of the U.S. Bone and Joint Initiative. Semin Arthritis Rheum 2014;43:701-12.
[44] Cates JR, Young DN, Bowerman DS, Porter RC. An independent AGREE evaluation of the occupational medicine practice guidelines. Spine J 2006;6:72-7.
[45] Yan J, Min J, Zhou B. Diagnosis of pheochromocytoma: a clinical practice guideline appraisal using AGREE II instrument. J Eval Clin Pract 2013;19:626-32.
[46] Norris SL, Holmer HK, Ogden LA, Burda BU. Conflict of interest in clinical practice guideline development: a systematic review. PLoS One 2011;6:e25153.
[47] Neuman J, Korenstein D, Ross JS, Keyhani S. Prevalence of financial conflicts of interest among panel members producing clinical practice guidelines in Canada and United States: cross sectional study. BMJ 2011;343:d5621.
[48] Mendelson TB, Meltzer M, Campbell EG, Caplan AL, Kirkpatrick JN. Conflicts of interest in cardiovascular clinical practice guidelines. Arch Intern Med 2011;171:577-84.
[49] Schünemann HJ, Hill SR, Kakad M, Vist GE, Bellamy R, Stockman L, et al. Transparent development of the WHO rapid advice guidelines. PLoS Med 2007;4:e119.
[50] Hirsh J, Guyatt G. Clinical experts or methodologists to write clinical guidelines? Lancet 2009;374:273-5.
[51] Schünemann HJ, Osborne M, Moss J, Manthous C, Wagner G, Sicilian L, et al. An official American Thoracic Society Policy Statement: managing conflict of interest in professional societies. Am J Respir Crit Care Med 2009;180:564-80.
[52] MacDermid JC, Brooks D, Solway S, Switzer-McIntyre S, Brosseau L, Graham ID. Reliability and validity of the AGREE instrument used by physical therapists in assessment of clinical practice guidelines. BMC Health Serv Res 2005;12:1-12.
[53] Fervers B, Burgers JS, Haugh MC, Latreille J, Mlika-Cabanne N, Paquet L, et al. Adaptation of clinical guidelines: literature review and proposition for a framework and procedure. Int J Qual Health Care 2006;18:167-76.
[54] Schünemann HJ, Woodhead M, Anzueto A, Buist S, MacNee W, Rabe KF, et al. A vision statement on guideline development for respiratory disease: the example of COPD. Lancet 2009;373:774-9.