ANNALS OF MEDICINE, 2016 VOL. 48, NO. 6, 440–443 http://dx.doi.org/10.1080/07853890.2016.1186830

ORIGINAL ARTICLE

Assessing validity of observational intervention studies – the Benchmarking Controlled Trials

Antti Malmivaara

Centre for Health and Social Economics, National Institute for Health and Welfare, Helsinki, Finland

ABSTRACT

Background: Benchmarking Controlled Trial (BCT) is a concept which covers all observational studies aiming to assess the impact of interventions or health care system features on patients and populations.
Aims: To create and pilot-test a checklist for appraising the methodological validity of a BCT.
Methods: The checklist was created by extracting the most essential elements from the comprehensive set of criteria in the previous paper on BCTs. Checklists and scientific papers on observational studies and respective systematic reviews were also utilized. Ten BCTs published in the Lancet and in the New England Journal of Medicine were used to assess the feasibility of the created checklist.
Results: The appraised studies seem to have several methodological limitations, some of which could be avoided in the planning, conducting and reporting phases of the studies.
Conclusions: The checklist can be used for planning, conducting, reporting, reviewing, and critical reading of observational intervention studies. However, the piloted checklist should be validated in further studies.

ARTICLE HISTORY

Received 29 March 2016
Accepted 2 May 2016

KEYWORDS

Checklist; validity; benchmarking controlled trial; effectiveness; cost-effectiveness; inequality; real-effectiveness medicine

KEY MESSAGES

• Benchmarking Controlled Trial (BCT) is a concept which covers all observational studies aiming to assess the impact of interventions or health care system features on patients and populations.
• This paper presents a checklist for appraising the methodological validity of BCTs and pilot-tests the checklist with ten BCTs published in leading medical journals. The appraised studies seem to have several methodological limitations, some of which could be avoided in the planning, conducting and reporting phases of the studies.
• The checklist can be used for planning, conducting, reporting, reviewing, and critical reading of observational intervention studies.

Randomized controlled trials (RCTs), the experimental studies, provide the least biased information on the efficacy of medical interventions (1). However, RCTs mostly assess the effectiveness of interventions in ideal settings, and they focus on specific interventions. Their ability to assess the effectiveness of clinical pathways or of interventions targeting health care system features is limited. Thus there is an obvious need for valid observational data on actual performance in routine settings (2). A recent paper presents the novel concept of the Benchmarking Controlled Trial (BCT) and a comprehensive set of methodological criteria to be considered when appraising evidence from observational intervention studies (3). BCTs can be used to assess the impacts of clinical interventions and of features of health care systems. The aim of this paper is to create a simple checklist for assessing the methodological validity of a BCT and to pilot-test the checklist with recent BCTs published in the Lancet and in the New England Journal of Medicine.

CONTACT Antti Malmivaara, M.D., Ph.D., Chief Physician, [email protected], Centre for Health and Social Economics, National Institute for Health and Welfare, Mannerheimintie 166, 00270 Helsinki, Finland

© 2016 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.

Methods

The original comprehensive checklist for methodological validity issues of BCTs was based on the author’s previous work with RCTs, systematic reviews and observational studies (1,4–6). Checklists and scientific papers on observational studies and respective systematic reviews were also utilized (7,8). The current checklist was created by extracting the most essential elements from the comprehensive set of criteria in the previous paper on BCTs (open access: http://www.tandfonline.com/doi/full/07853890.2011.586901./07853890.2015.1027255) (3). The ten BCTs analyzed in the original paper on BCTs were used to assess the feasibility of the checklist. The author rechecked the appraisal and corrected errors.

Table 1. Criteria for the judgment of acceptable validity (scored ‘Yes’*) for the sources of risk of bias in Benchmarking Controlled Trials (3).

1. Statistical power calculated. Score Yes, if there is a description of power calculations and a rationale for how the study size was arrived at; a post-analysis power calculation is also accepted.

2. Selection of patients described. Score Yes, if there is a clear description of the patients’ clinical path before they were eligible for the study; or if the patient population was comprehensive of the catchment area.

3. Valid and sufficient documentation of baseline characteristics in both index and control populations.* Score Yes, if demographic and socio-economic factors, clinically important data relevant to the particular disorder/disease (e.g. severity), general health/risk status, comorbid conditions, and behavioural and environmental factors when relevant, were sufficiently documented. (N.B. what constitutes ‘sufficient’ should be appraised in relation to the study context: whether or not the risk of bias is increased).

4. Baseline comparability acceptable.* Score Yes, if groups are sufficiently similar at baseline regarding demographic and socio-economic factors, duration and severity of the main indication, co-morbid conditions, and value of main outcome measure(s). (N.B. what constitutes ‘sufficient’ should be appraised in relation to the study context: whether or not the risk of bias is increased). If baseline documentation is insufficient, score ‘Unclear’.

5. Valid and sufficient documentation of degree of adherence to the main intervention(s), and of other processes in both index and control populations.* Score Yes, if relevant factors for each particular study question are sufficiently reported, like intensity, duration, number and frequency of health services; and if there were no confounding interventions or they were similar between the index and control groups. (N.B. what constitutes ‘sufficient’ should be appraised in relation to the study context: whether or not the risk of bias is increased).

6. Valid and sufficient documentation of outcomes in both index and control populations, including identical timing of outcome assessment.(a) Score Yes, if validity of the outcomes has been documented for both index and control populations, and the follow-up time points are similar; when relevant: if outcomes are assessed also among disadvantaged patients. (N.B. what constitutes ‘sufficient’ should be appraised in relation to the study context: whether or not the risk of bias is increased).

7. Drop-out rate acceptable. The number of included participants who did not complete the observation period or were not included in the analysis must be described and reasons given. Score Yes, if the percentage of withdrawals and drop-outs does not exceed 10% and does not lead to substantial bias. (N.B. the percentage is arbitrary, not supported by literature, and should be appraised in relation to the study context).

8. System related features sufficiently documented in both the index and control health care providers. Score Yes, if relevant system related factors are sufficiently documented and adjusted for in the statistical analyses: financing of the care system, organization of the care system, available resources, reimbursement and incentives, regulations. If system related features are not relevant in the study context: score ‘Yes’. (N.B. what constitutes ‘sufficient’ should be appraised in relation to the study context: whether or not the risk of bias is increased).

9. Staff competence, use of up-to-date evidence, quality and benchmarking activities sufficiently documented in both the index and control health care providers. Score Yes, if differences in staff competence, use of up-to-date evidence, quality and benchmarking activities (Real Effectiveness Medicine framework (2)) are sufficiently documented between the index and control groups. If these items are not relevant: score ‘Yes’. (N.B. what constitutes ‘sufficient’ should be appraised in relation to the study context: whether or not the risk of bias is increased).

10. Statistical analyses appropriate. Score Yes, if all appropriate statistical methods have been used to increase the validity of the comparisons (e.g. instrumental variables (when feasible), propensity score matching, baseline-adjustment between observed groups, use of multilevel modelling or survival modelling).

Comments. Includes possible further information on potential biases, including extrinsic biases, e.g. conflict of interests of the researchers.

*Each item may also be scored ‘Unclear’ or ‘No’.
(a) In studies having comparisons between cohorts in time (before-after comparisons): overall changes in patient characteristics, treatment practices, and outcome in health care over time should also be documented in order to score Yes.
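As a minimal sketch of how an appraisal against the ten checklist items might be recorded and totalled, the Python below counts the ‘Yes’ answers for one study and checks the numeric part of the drop-out criterion. The names (`CRITERIA`, `total_validity_points`, `dropout_within_threshold`) and the example answers are illustrative assumptions and do not come from the paper; only the ten items and the 10% figure, which Table 1 itself calls arbitrary, are taken from the checklist. Whether drop-out leads to substantial bias remains a judgement for the appraiser.

```python
# Hypothetical helper names; only the ten items and the 10% threshold come from Table 1.
CRITERIA = (
    "Statistical power calculated",
    "Selection of patients described",
    "Baseline characteristics documented",
    "Baseline comparability acceptable",
    "Adherence and other processes documented",
    "Outcomes documented",
    "Drop-out rate acceptable",
    "System related features documented",
    "Staff competence and quality activities documented",
    "Statistical analyses appropriate",
)
ALLOWED_ANSWERS = ("Yes", "No", "Unclear")


def total_validity_points(answers):
    """Return the number of checklist items scored 'Yes' for one study."""
    if len(answers) != len(CRITERIA):
        raise ValueError("expected one answer per checklist item")
    for answer in answers:
        if answer not in ALLOWED_ANSWERS:
            raise ValueError(f"unexpected answer: {answer!r}")
    return sum(answer == "Yes" for answer in answers)


def dropout_within_threshold(included, analysed, threshold=0.10):
    """Check only the numeric part of item 7: drop-out share not above the threshold."""
    if included <= 0 or not 0 <= analysed <= included:
        raise ValueError("need included > 0 and 0 <= analysed <= included")
    return (included - analysed) / included <= threshold


if __name__ == "__main__":
    # Made-up appraisal of a single study, for illustration only.
    example = ["No", "Yes", "Unclear", "Unclear", "No",
               "Yes", "Yes", "No", "No", "Yes"]
    print(total_validity_points(example))
    print(dropout_within_threshold(included=200, analysed=185))
```

Counting only ‘Yes’ answers mirrors the total of validity points (0 to 10) used in the pilot test; ‘Unclear’ and ‘No’ both contribute nothing to the total.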

Results

The ten main methodological issues, and a description of how to assess whether they pose a risk of bias, are presented in Table 1. Issues 1 and 2 evaluate whether statistical power calculations were made, and whether there is a description of patient selection before patients were eligible for the study. Issues 3 and 4 consider documentation of baseline characteristics and how comparable the index and reference groups are. Issue 5 relates to documentation of processes, and issues 6 and 7 relate to outcomes and the proportion of drop-outs. Issue 8 encompasses documentation of outcome-relevant health care system features, and issue 9 covers the essential elements for producing high-quality services in ordinary health care, particularly staff competence. Issue 10 evaluates whether the statistical analyses are appropriate.

The results of the pilot testing of the checklist show that there is considerable variation between the studies in how the methodological issues were met (Table 2). Of the ten validity criteria, two studies scored 7, one study 6, two studies 5, three studies 4, and two studies 3. Four studies made comparisons between countries, and consequently evaluated the impact of the whole health care system, including all the clinical processes, as determinants of outcomes. Therefore items 5, 8 and 9 are not needed for a valid answer to the study question in these studies. However, lack of information on items 5, 8 and 9 impairs the possibility of making inferences about the reasons for between-country differences.

One study presented statistical power calculations. Four studies fulfilled the criterion of information on selection of patients, because the whole catchment area (country) was covered. Three studies documented a valid and sufficient description of baseline characteristics in the index and control groups, and baseline comparability was considered adequate in these studies. Four studies showed valid and sufficient documentation of adherence to intervention, and description of other treatment processes. All the studies had sufficient documentation of the outcomes. The drop-out rates were acceptable, and the statistical analyses were appropriate in all ten studies. No study described health care system related features. Staff competence was evaluated in only one study, in which the impact of surgical skill was the very study question.

Table 2. Validity of recent Benchmarking Controlled Trials published in the Lancet and in the New England Journal of Medicine (3). Studies 1–5 assessed impact of clinical interventions, and studies 6–10 impact of health care system features.

Studies, aims and comments:

Study | Author, year, country | Aim of the study | Comments
1 | Coleman et al., Lancet Jan 8, 2011 | To assess between-country differences for selected cancer survival | Funding from UK Department of Health, no other conflicts of interest
2 | Pearse et al., Lancet Sep 22, 2012 | To assess mortality rates and patterns of critical care resource use for non-cardiac surgery patients across countries | Authors declare no conflicts of interests
3 | Birkmeyer et al., NEJM Oct 10, 2013 | To assess the effect of surgical skill as a determinant for complication rates after bariatric surgery | Paper provides no declaration of conflict of interests. All authors’ declaration forms are available in the internet.
4 | Karthikesalingam et al., Lancet Mar 15, 2014 | To compare in-hospital mortality of patients with rupture of an abdominal aortic aneurysm in two countries | Authors declare no conflicts of interests
5 | Chung et al., Lancet April 12, 2014 | To assess 30-day mortality for acute myocardial infarction between two countries | Authors declare no conflicts of interests
6 | Finks et al., NEJM June 2, 2011 | To assess impact of high-volume hospitals for decreased mortality after five major surgical procedures | Paper provides no declaration of conflict of interests. All authors’ declaration forms are available in the internet.
7 | Song et al., NEJM Aug 9, 2011 | To assess the effect of a quality system on health care spending and on quality of ambulatory care | Paper provides no declaration of conflict of interests. All authors’ declaration forms are available in the internet.
8 | Wallace et al., NEJM May 31, 2012 | To assess the impact of night-time intensivist physician staffing for mortality of intensive care patients | Paper provides no declaration of conflict of interests. All authors’ declaration forms are available in the internet.
9 | Sutton et al., NEJM Nov 8, 2012 | To analyze impact of a hospital pay-for-performance program with patient mortality in three acute diagnoses | Paper provides no declaration of conflict of interests. All authors’ declaration forms are available in the internet.
10 | Aiken et al., Lancet May 24, 2014 | To assess impact of nurse workloads and nurses’ educational qualifications to in-hospital mortality after common surgical procedures in several countries | Authors declare no conflicts of interests

Validity criteria (column key for the scores below):

1. Statistical power calculated
2. Selection of patients described; Yes, if well described or the whole catchment area is covered
3. Valid and sufficient documentation of baseline characteristics in both index and control populations
4. Baseline comparability acceptable after statistical adjustment
5. Valid and sufficient documentation of adherence to intervention, and of other processes in both index and control populations
6. Valid and sufficient documentation of outcomes in both index and control populations
7. Drop-out rate acceptable
8. System related features documented in both index and control health care providers
9. Differences in staff competence, use of up-to-date evidence, quality and benchmarking activities (REM framework)(a) documented in both index and control health care providers
10. Appropriate statistical analyses
Total. Total of validity points (0 to 10) for each study

Scores:

Study | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total
1. Coleman et al. | No | Yes | Unclear | Unclear | NA(b) | Yes | Yes | NA(b) | NA(b) | Yes | 4(b)
2. Pearse et al. | Yes | Yes | Unclear | Unclear | NA(b) | Yes | Yes | NA(b) | NA(b) | Yes | 5(b)
3. Birkmeyer et al. | No | No | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | 7
4. Karthikesalingam et al. | No | Yes | Unclear | Unclear | NA(b) | Yes | Yes | NA(b) | NA(b) | Yes | 4(b)
5. Chung et al. | No | Yes | Yes | Yes | Yes | Yes | Yes | NA(b) | NA(b) | Yes | 7(b)
6. Finks et al. | No | No | Unclear | Unclear | Unclear | Yes | Yes | No | No | Yes | 3
7. Song et al. | Yes | No | Unclear | Unclear | Yes | Yes | Yes | No | No | Yes | 5
8. Wallace et al. | No | No | Yes | Yes | Yes | Yes | Yes | No | No | Yes | 6
9. Sutton et al. | No | No | Unclear | Unclear | Unclear | Yes | Yes | No | No | Yes | 3
10. Aiken et al. | No | No | Unclear | Unclear | Unclear | Yes | Yes | No | Yes | Yes | 4
Total of validity points (0 to 10) for each criterion | 1 | 4 | 3 | 3 | 4 | 10 | 10 | 0 | 1 | 10 |

(a) REM = Real Effectiveness Medicine framework, in which competence is considered the sine qua non for effectiveness in health care (2).
(b) The study question includes impacts of the whole health care system including the clinical processes; therefore items 5, 8 and 9 are not needed for a valid answer to the study question in these studies. However, lack of information on items 5, 8 and 9 impairs the possibility of making inferences about the reasons for between-country differences.
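The score distribution reported above can be checked with a short tally. This is only a sketch: `STUDY_TOTALS` holds the ten per-study totals of validity points from Table 2, the variable and function names are assumptions made for illustration, and the order of the list does not affect the distribution.

```python
from collections import Counter

# Per-study totals of validity points as given in Table 2 (ten studies).
STUDY_TOTALS = [4, 5, 7, 4, 7, 3, 5, 6, 3, 4]


def score_distribution(totals):
    """Map each total score to the number of studies reaching it, highest score first."""
    return dict(sorted(Counter(totals).items(), reverse=True))


if __name__ == "__main__":
    for score, n_studies in score_distribution(STUDY_TOTALS).items():
        print(f"{n_studies} studies scored {score}")
```

The tally agrees with the text: two studies at 7 points, one at 6, two at 5, three at 4, and two at 3.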

Discussion

This paper presents, for the first time, a checklist for assessing the validity of observational intervention studies, the BCTs. The checklist is intended to support planning, conducting, reporting, peer reviewing, and critical reading of any observational intervention study. The piloted checklist should be validated in further studies.

Several methodological limitations were observed in all ten studies. Only one study reported statistical power calculations. None of the studies described how patients were selected for the study (four studies included a comprehensive patient population). Only three studies provided a valid and sufficient description of the baseline characteristics, which is a prerequisite for determining whether the comparability between the study groups is acceptable. Only four studies provided a sufficient description of the treatment processes. No study provided a description of health care system features (which potentially have an impact on outcomes). Staff competence was described in only one study, whose very aim was to assess the impact of competence. In between-country comparisons, the treatment processes, health care system features, and staff competence are among the causes of the outcome, and thus their documentation is not needed from the point of view of the validity of the results. However, data on treatment processes, health care system features, and staff competence would be important for generating hypotheses about the possible reasons for the between-country differences.

In conclusion, current observational intervention studies (BCTs) seem to have several methodological limitations, some of which could be avoided in the planning, conducting and reporting phases of the studies, and others should be acknowledged in the discussion. The piloted checklist is suggested for anyone interested in assessing the validity of observational intervention studies. However, the checklist should be validated in further studies.

Acknowledgements

The author developed the idea for the paper and wrote the manuscript alone. Riitta Malmivaara, MA, is acknowledged for productive discussions.

Disclosure statement

The author declares no support from any organisation for the submitted work; no financial relationships with any organisation that might have an interest in the submitted work; and no other relationships or activities that could appear to have influenced the submitted work.

Funding information

No outside funding.

References

1. Furlan AD, Pennick V, Bombardier C, van Tulder M, Editorial Board, Cochrane Back Review Group. 2009 updated method guidelines for systematic reviews in the Cochrane Back Review Group. Spine (Phila Pa 1976). 2009;34:1929–41.
2. Malmivaara A. Real-effectiveness medicine – pursuing the best effectiveness in the ordinary care of patients. Ann Med. 2013;45:103–6.
3. Malmivaara A. Benchmarking Controlled Trial – a novel concept covering all observational effectiveness studies. Ann Med. 2015;47:332–40.
4. Häkkinen U, Malmivaara A. [Guest editors]. The PERFECT project: measuring performance of health care episodes. Ann Med. 2011;43(Suppl 1). doi: 07853890.2011.586901./07853890.2011.586901.
5. Sihvonen R, Paavola M, Malmivaara A, Itala A, Joukainen A, Nurmi H, et al. Arthroscopic partial meniscectomy versus sham surgery for a degenerative meniscal tear. N Engl J Med. 2013;369:2515–24.
6. Croft P, Malmivaara A, van Tulder M. The pros and cons of evidence-based medicine. Spine. 2011;36:E1121–5.
7. Vandenbroucke J. When are observational studies as credible as randomised trials? Lancet. 2004;363:1728–31.
8. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. PLoS Med. 2007;4(10):e296.