The Online Journal of CURRENT CLINICAL TRIALS (ISSN 1059-2725)

Document number: 50

Date of Publication: 1993 Apr 28

NIH CLINICAL TRIALS AND PUBLICATION BIAS


Kay Dickersin, Yuan-I Min

PRIMARY JOURNALS ONLINE

A joint venture of the American Association for the Advancement of Science and OCLC Online Computer Library Center, Inc.

Dickersin K, Min YI. NIH clinical trials and publication bias [article]. Online J Curr Clin Trials [serial online] 1993 Apr 28;1993(Doc No 50): [4967 words; 53 paragraphs]. 1 figure; 3 tables.

NIH CLINICAL TRIALS AND PUBLICATION BIAS

Kay Dickersin, Yuan-I Min

Department of Epidemiology and Preventive Medicine, University of Maryland School of Medicine, Howard Hall, 660 West Redwood Street, Baltimore, MD 21201 USA

Document Classification: Research Report

Keywords: clinical trials, NIH, publishing, United States

Date of Publication: 1993 Apr 28

ABSTRACT

(1) Objective: To investigate the association between trial characteristics, findings, and publication. The major factor hypothesized to be associated with publication was "significant" results, which included both statistically significant results and results assessed by the investigators to be qualitatively significant, when statistical testing was not done. Other factors hypothesized to have a possible association with publication were funding institute, funding mechanism (grant versus contract versus intramural), multicenter status, use of comparison groups, large sample size, type of control (parallel versus nonparallel), use of randomization and masking, type of analysis (by treatment received versus by treatment assigned), and investigator sex and rank.

(2) Design: Follow-up, by 1988 interview with the principal investigator or surrogate, of all clinical trials funded by the National Institutes of Health (NIH) in 1979, to learn of trial results and publication status.

(3) Population: Two hundred ninety-three NIH trials, funded in 1979.

(4) Main Outcome Measure: Publication of clinical trial results.

(5) Results: Of the 198 clinical trials completed by 1988, 93% had been published. Trials with "significant" results were more likely to be published than those showing "nonsignificant" results (adjusted odds ratio [OR] = 12.30; 95% confidence interval [CI], 2.54 to 60.00). No other factor was positively associated with publication. Most unpublished trials remained so because investigators thought the results were "not interesting" or they "did not have enough time" (42.8%). Metaanalysis using data from this and 3 similar studies provided a combined unadjusted OR of 2.88 (95% CI, 2.13 to 3.89) for the association between significant results and publication.

(6) Conclusions: Even when the overall publication rate is high, such as for trials funded by the NIH, publication bias remains a significant problem. Given the importance of trials and their utility in evaluating medical treatments, especially within the context of metaanalysis, it is clear that we need more reliable systems for maintaining information about initiated studies. Trial registers represent such a system but must receive increased financial support to succeed.

INTRODUCTION "We too commonly see references of 'so many successful cases', with a certain inevitable emphasis on the word 'successful'. Such papers have their value and also their manifest dangers .... There is unquestionably a false emphasis in all such publications, tending to increase the reputation of the writer, but not to render the public more secure. We have no proper balance to this very natural tendency to publish our successes except through the more frequent publication of our errors and failures which likewise mark the path of every successful practitioner. Such papers, written by men of experience and standing, would do much toward overcoming the tendency to over-security. and would certainly serve an educational purpose which the ordinary publication so often fails to attain."1 1 (8) . Concern over publication bias, the tendency to submit or accept manuscripts for publication based on the direction or strength of study findings, has grown with recent reports supporting its existence. 2-3 There has been particular interest in the problem as it relates to clinical trials, 4-6 perhaps because of the importanl role of. trials in the development of new treatments and in metaanalysis. (9) Previous research on publication bias has included follow-up studies of initiated projects in specific subject areas,4.7 institutions, 3 and geographic regions.2These studies have been useful for investigating the influence of both significant findings and additional "risk factors" on the publication of results. They have been less useful for generalizing beyond institutions or regions. (10) The National Institutes of Health (NIH) maintained an inventory of NIH-supported clinical trials from 1975 to 1979, but was unable to continue this effort due to lack of funding. Despite broad-based support for trial registn;ttion 8-1 o and despite some NIH support for similar activities at the NIH,11-13 the Inventory has not yet been reestablished. (11) In the current study, we aimed to follow trials supported by the NIH in 1979, to learn whether reports from clinical trials with significant results are more likely to be published than those reporting nonsignificant results. We also aimed to see whether certain trial characteristics (for example, institute funding the trial, mechanism of funding, multicenter status, use of comparison groups, sample size, use of randomization and masking, and type of analysis) and investigator characteristics (principal investigator [PI] rank and sex) are associated with publication. We elected to follow 1979 trials for 2 reasons. First, of the Inventory trials, they were the ones funded in the most recent past, thus investigators would be most likely to remember details of the trial on which they were being queried. Second, 1979 was sufficiently in the past for a trial to have been completed and published by 1988; when we performed the interviews. :12) We were aware from previous work3 that externally funded studies were more likely to be published than those that were unfunded. And, although analyses in previous studies had examined somewhat the subgroup of studies classified as clinical trials, 2-3 the NIH trials provided an additional opportunity to explore in greater depth factors specifically associated with trials that might have an impact on publication. (7)

METHODS

(13) An application to perform this study was submitted and approved by the Committee on Human Volunteers at the Johns Hopkins University School of Hygiene and Public Health in 1986.

(14) We obtained magnetic tapes of the 1979 Inventory of Clinical Trials from the NIH.14 A study was included in the Inventory if it was funded by the NIH and met the following definition of clinical trial:

(15) "A scientific research activity undertaken to define prospectively the effect and value of prophylactic/diagnostic/therapeutic agents, devices, regimens, procedures, etc., applied to human subjects. It is essential that the study be prospective, and that intervention of some sort occur. The choice of number of cases or patients will depend on the hypothesis being tested, but must be sufficient to permit a definite result to be anticipated. Phase 1, feasibility or pilot studies are excluded."14

(16) A total of 986 trials were indexed on the tapes, 654 funded by the National Cancer Institute (NCI) and 332 by 8 other institutes. These included: the National Eye Institute [NEI]; National Heart, Lung, and Blood Institute [NHLBI]; National Institute of Allergy and Infectious Diseases [NIAID]; National Institute of Arthritis, Metabolism, and Digestive Diseases [NIAMDD]; National Institute of Child Health and Human Development [NICHD]; National Institute of Dental Research [NIDR]; National Institute of Neurological and Communicative Disorders and Stroke [NINCDS]; and National Institute of General Medical Services [NIGMS]. Information available on the tapes included: name and address of the PI, trial title, funding institute and type of funding (for example, contract, extramural grant), whether the trial was single center or multicenter, age range and sex (that is, male and female, male only, female only) of population, sample size, intervention tested, use of randomization and masking, number of study groups, type of control, and whether there had been any publications by 1979.

(17) We elected to follow trials funded by all institutes except the NCI. This decision was based on the fact that the nature of cancer trials tends to be different, by and large, from trials supported in other areas. An NCI-sponsored "trial" is often part of an ongoing program and not a single, distinct protocol. These types of trials were not amenable to our data collection strategy, designed for "stand-alone" trials.

(18) PIs or designated surrogates for all 332 trials were written a letter describing the study, and were subsequently contacted by telephone for an interview. Studies were eligible for follow-up by interview if they were classified by the PI as trials, they involved human volunteers, and they were implemented. Although all studies were classified as trials by NIH, for a variety of reasons they were not always so classified by the investigators. In many cases, after starting an interview, the investigator felt he or she could not continue because the interview questions were not appropriate to the study design. For example, the purpose of 1 study classified as "not a trial" was to collect baseline data on an available population, in order to plan for a vaccine trial. In another example, a laboratory study examined serum to identify factors potentially related to host-parasite resistance mechanisms.

(19) Investigators were interviewed in a randomly assigned order, to ensure that a "learning curve" effect would not be related to any logical ordering of investigators. Randomly ordered PIs were assigned in sequential blocks of 10 to 1 of 5 interviewers. Because we were concerned that investigators funded for multiple trials would selectively report on published trials or trials with "significant" results and refuse to report on others, all trials for a given PI were randomly ordered and selected sequentially. Although PIs could have designated a surrogate for any interview, they were encouraged to complete interviews for the 1st 2 of their randomly ordered trials and to refer us to a coinvestigator for subsequent trials, if they desired. Investigators provided information on the trial design (such as sample size, number of study groups, number of study sites, method of treatment assignment, and use of masking), funding mechanism, and characteristics of the study.
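As a reading aid only, the following is a minimal hypothetical sketch of the ordering scheme just described. The block-to-interviewer assignment rule, the seed, and all names are assumptions for illustration; the paper does not specify them.

```python
import random

def assign_interviews(pis, trials_by_pi, n_interviewers=5, block=10, seed=1979):
    """Randomly order PIs, deal them out in sequential blocks of 10 to
    1 of 5 interviewers, and randomly order each PI's own trials.
    (Hypothetical reconstruction; rotation through interviewers is assumed.)"""
    rng = random.Random(seed)
    order = list(pis)
    rng.shuffle(order)                         # random PI order
    assignments = {i: [] for i in range(n_interviewers)}
    for block_no, start in enumerate(range(0, len(order), block)):
        # assumed rule: blocks of 10 rotate through the interviewers in turn
        assignments[block_no % n_interviewers].extend(order[start:start + block])
    trial_order = {pi: rng.sample(trials_by_pi[pi], len(trials_by_pi[pi]))
                   for pi in order}            # random within-PI trial order
    return assignments, trial_order
```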

(20) Investigators were also asked to characterize the trial findings, either in terms of the results of statistical testing or in terms of the investigator's assessment of the relative importance of the results, when statistical tests were not used. For analysis purposes, responses were classified as falling into 1 of 2 groups: results reported to be statistically significant in either direction were grouped with those deemed to be of "great importance" and classified as "significant." Results showing a trend in either direction, but not statistically significant, were grouped with those results designated by investigators to be of "moderate importance," with those results showing no difference, and those designated to be of "little importance." This 2nd group was classified as having "nonsignificant" results.

(21) Investigators were asked whether any abstracts, journal articles, book chapters, proceedings, letters to the editor, or other material had been published from the trial. If there had been, they were asked for the number of publications and the references. If there had not been any publications, the investigators were asked why not. Publications were classified by whether or not they were in journals indexed by the 1988 Index Medicus.15

(22) Most analyses were performed using PC-SAS version 6.04, SAS Institute.16 Initial analyses included frequencies and cross tabulations. The associations between publication and the variables of interest were assessed by chi-square statistics and associated P values. Unadjusted ORs and 95% confidence intervals (CIs) were calculated using Woolf's method. For situations where there were empty cells in the cross tabulations, 0.5 was added to each cell. An OR of 1.00 indicates an absence of an association between publication and the variable of interest. ORs of greater than 1.00 indicate a "positive" association (that is, an association indicating that studies are more likely to be published) and those of less than 1.00 indicate a "negative" or inverse association. Ninety-five percent CIs that do not include 1.00 correspond with P values of less than 0.05 (statistically significant). In our analyses we elected to consider our results statistically significant if P was less than 0.05. We realize that this is a liberal use of the term, given the fact that we have performed multiple tests.
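For concreteness, here is a minimal sketch of the unadjusted OR and Woolf CI computation described above, written in Python rather than the PC-SAS the authors used, and including the 0.5 correction applied when a cell is empty. The cell counts in the usage line are hypothetical.

```python
import math

def woolf_or_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with Woolf's 95% CI for a 2x2 table:
    a = significant & published,    b = significant & unpublished,
    c = nonsignificant & published, d = nonsignificant & unpublished."""
    if 0 in (a, b, c, d):
        # 0.5 added to each cell when any cell is empty, as described above
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's SE of ln(OR)
    return math.exp(log_or), (math.exp(log_or - z * se),
                              math.exp(log_or + z * se))

# Hypothetical counts; a CI excluding 1.00 corresponds to P < 0.05.
or_, (lo, hi) = woolf_or_ci(150, 8, 34, 12)
```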
(23) A forward, stepwise logistic regression procedure,17 BMDP LR (BMDP Statistical Software, Los Angeles, 1990), was used to compute the adjusted OR. The regression model tested the following variables: significance of results, funding, multicenter status, number of study groups, sample size, type of control, use of randomization, masking, type of analysis, PI rank in 1988, and PI sex. Missing values were imputed to the most frequent category. For example, the sample size for 7 studies missing data for this variable was assumed to be ≥100. The 27 studies with only 1 study group were coded as nonrandomized and nonmasked. Variables were added to the regression model sequentially, starting with the variable/publication association having the smallest P value. The process was stopped once all the remaining associations between variables and publication yielded P values of more than 0.05.
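The selection rule just described can be restated in code. The sketch below implements only that rule, not the BMDP LR procedure itself; it assumes the predictors arrive as an already-coded pandas DataFrame X with missing values already imputed, and uses Wald P values from statsmodels.

```python
import statsmodels.api as sm

def forward_stepwise_logit(y, X, alpha=0.05):
    """Forward stepwise selection: at each step add the candidate with the
    smallest Wald P value; stop when every remaining candidate has P >= alpha."""
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for var in remaining:
            design = sm.add_constant(X[selected + [var]])
            pvals[var] = sm.Logit(y, design).fit(disp=0).pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break  # all remaining variable/publication associations have P >= 0.05
        selected.append(best)
        remaining.remove(best)
    final = sm.Logit(y, sm.add_constant(X[selected])).fit(disp=0)
    return selected, final  # exp(final.params) gives the adjusted ORs
```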

(24) Stratified data analysis was performed to see whether the magnitude of publication bias depended on any of the study characteristics. For this analysis, the association (OR) between publication and significant results was calculated separately within individual strata of each variable initially tested for an independent association with publication. A Breslow-Day test for homogeneity was used to test whether the associations differed significantly among individual strata. These analyses are equivalent to performing a logistic regression with 2 main effects and their interaction. Because of the small number of studies that did not publish and the "empty cells" that would result, many possible interactions (for example, significant results with funding, number of study groups, type of control, PI rank, and PI sex) could not be evaluated using logistic regression. Results from the stratified analysis were confirmed using the multiple logistic model that included all the main effects and interactions, where possible (for example, significant results and multicenter status).

(25) We also performed an unadjusted metaanalysis where we combined data from this study with data collected in 3 other populations of initiated studies,2-3 in order to get a better estimate of the size of the association between significant results and publication. The 4 populations combined were: 1) all studies approved between 1984 and 1987 by the Central Oxford Research Ethics Committee and followed in 1989;2 2) all studies approved in 1980 by the Johns Hopkins Medical School institutional review board and followed in 1988; 3) all studies approved in 1980 by the Johns Hopkins School of Hygiene and Public Health institutional review board and followed in 1988;3 and 4) the NIH trials described in this article. All 4 populations were followed using a similar time period, study design, and data collection forms. The Mantel-Haenszel method was used to calculate the combined OR.
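The Mantel-Haenszel combined OR is the ratio of summed within-stratum cross-products. A minimal sketch follows, with each of the 4 populations represented as a hypothetical (a, b, c, d) table in the same layout as the Woolf sketch above; the numbers are illustrative, not the study data.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel combined OR across strata:
    sum(a*d/n) / sum(b*c/n), where n = a + b + c + d per stratum."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Four hypothetical strata standing in for the 4 populations described above.
combined = mantel_haenszel_or([(120, 30, 60, 40), (80, 25, 45, 35),
                               (95, 20, 50, 30), (150, 8, 34, 12)])
```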

RESULTS

NIH Trials Follow-up Study

(26) Of 332 trials listed in the Inventory, 293 (88.3%) were "eligible" for the interview (Table 1). We obtained a full interview for 74.1% (217/293) of the trials, publication information only for 12.3% (36/293) (partial interview), and no information (refusal) for 13.6% (40/293) of the trials.

(27) Trials for which we obtained a full interview (n = 217) appear to be no different from those eligible for interview (n = 293), in terms of the type of support, number of sites, population included, type of intervention tested, use of randomization and masking, number of study groups, and type of control (Table 2a and Table 2b). There appears to be a greater proportion of trials with a sample size less than 100 among the eligible studies than among those for which we obtained a full interview. In terms of differences in publication practice, the population of trials eligible for interview and those actually interviewed also appear similar, when data from the original NIH datatape were compared. The distribution of trials, by institute, was somewhat different for studies for which we were able to get full interviews. Trials funded by the NIAID and NINCDS represented the major portion of partial or refused interviews. Although the refused interviews associated with trials funded by these institutes do not appear to be due to single investigators refusing for many trials, the investigators refusing do appear, overall, to be PIs for a larger number of studies than investigators completing a full interview.

Table 1. NIH-funded studies* and interview status.

                                 No. of studies
Total studies                         332
Interview eligible
  Yes                                 293
  No
    Not a trial                        22
    No patients                        17
    Total                              39
Interviewed
  Yes
    Fully                             217
    Partially                          36
    Total                             253
  No
    Refused                            40
    Total                              40

*Does not include NCI-funded trials.

(28) Of the 198 trials for which there was publication information and for which analyses on the primary outcome had been completed, 184 (92.9%) had been published at the time of the interview (Table 3a and Table 3b). Published trials almost always appeared in indexed journals (95.2%). The results of univariate analyses showed that publication was more likely for trials reporting "significant" than "nonsignificant" findings (OR = 7.04; 95% CI, 1.90 to 26.16). No other trial characteristic was positively associated with publication in the univariate analysis, although trends favored those factors associated with trials of heightened "quality" (use of randomization, larger sample size, multicenter). Nor were any PI characteristics associated with publication (for example, sex, rank of investigator at time of interview). The publication rates for the 8 NIH institutes were: 85.2% (NICHD), 85.7% (NEI), 90.9% (NIDR), 93.6% (NIAMDD), 94.8% (NIAID), and 100% (NHLBI, NINCDS, NIGMS). There was no evidence that unpublished

Table 2a. Characteristics of interview-eligible trials versus trials having full interview, using information from NIH datatape.

                        Interview Eligible     Full Interview
                             (n = 293)            (n = 217)
                           No.      (%)         No.      (%)
Total trials               293   (100.0)        217   (100.0)
Publication
  Yes                      176    (60.1)        132    (60.8)
  No                       117    (39.9)         85    (39.2)
Institute
  NEI                       21     (7.2)         18     (8.3)
  NHLBI                     20     (6.8)         18     (8.3)
  NIAID                     95    (32.4)         61    (28.1)
  NIAMDD                    61    (20.8)         50    (23.0)
  NICHD                     32    (10.9)         30    (13.8)
  NIDR                      25     (8.5)         25    (11.5)
  NINCDS                    38    (13.0)         14     (6.5)
  NIGMS                      1     (0.3)          1     (0.5)
Support
  Grant                    145    (49.5)        112    (51.6)
  Contract                  88    (30.3)         68    (31.3)
  Intramural                58    (19.8)         35    (16.1)
Design
  No. study groups
    1                       41    (14.0)         29    (13.4)
    >1                     252    (86.0)        188    (86.6)
  No. sites
    1                       76    (25.9)         52    (24.0)
    >1                     217    (74.1)        165    (76.0)
  Sample size
    <100                   136    (46.4)         83    (38.2)
    ≥100                   157    (53.6)        134    (61.8)