A Practical Guide on Incorporating and Evaluating

Clinical Research and Regulatory Affairs, 25(4):197–211 (2008) Copyright © Informa UK, Ltd. ISSN 1060-1333 print/1532-2521 online DOI: 10.1080/10601330802471162

A PRACTICAL GUIDE ON INCORPORATING AND EVALUATING PATIENT-REPORTED OUTCOMES IN CLINICAL TRIALS


Xuemei Luo



Global Outcomes Research, Pfizer Inc, New London, CT, USA

Joseph C. Cappelleri



Biostatistics, Pfizer Inc, New London, CT, USA

This article provides a practical guide for clinical researchers to incorporate and assess patient-reported outcomes (PROs) in a clinical trial for regulatory purposes, including for a label claim, drug promotion, and program planning. We provide six fundamental steps for evaluating PROs in clinical trials: 1) formulating study objectives, 2) developing or selecting an instrument, 3) developing data collection strategies, 4) analyzing data, 5) reporting data, and 6) interpreting study findings. Patient-reported outcomes should be handled like other clinical study endpoints, be integrated into the statistical analysis plan, and adopt the same set of scientific standards.

Keywords Patient-reported Outcomes, Clinical Trial, Label Claim, Drug Promotion, Program Planning

INTRODUCTION

Assessment of patient-reported outcomes (PROs) has become increasingly common in clinical trials over the past two decades (1,2). The measurement of PROs involves any aspect of a patient's health status and may include his or her reporting of the frequency and severity of symptoms, perception of daily functioning, feelings of well-being, satisfaction with treatments, and overall health-related quality of life. Traditionally, clinical trials have focused on clinically determined end points such as survival and physiological or laboratory measures of disease. But these end points are often inadequate or incomplete when studying conditions such as pain, depression, and fatigue that require patients' evaluations of their health status and symptoms. After all, who knows more about how the patient is feeling than the patient herself? Even for conditions like hypertension, where physiological or laboratory measures are adequate for the evaluation of treatment effects, the

Address correspondence to Xuemei Luo, PhD, Pfizer Inc., 50 Pequot Ave., MS 6025-A3218, New London, CT 06320. E-mail: [email protected]


improvement in these measurements may be associated with additional improvement, above and beyond traditional clinical measures, in patients' functioning and well-being (3,4). Patients' perceived functioning and well-being may be directly relevant to how patients weigh the benefits of a treatment and may ultimately influence their choice of and compliance with treatment (3,4). Therefore, assessment of PROs can complement and supplement the information captured by conventional clinical end points and provide added value for treatment evaluations and decision making.

The need to incorporate PROs into the evaluation of new pharmaceutical products is increasingly recognized, and acted upon, by biotechnology and pharmaceutical companies during research and development (Phases II and III) and in regulatory submissions, especially when there is an opportunity for a label claim and to launch a drug with value-added promotion (2,5,6). This is because PROs help demonstrate the effect of treatment beyond traditional clinical efficacy and safety end points. Patient-reported outcomes may be useful, for example, in differentiating competing products with similar or different clinical efficacy. These outcomes may also provide pertinent information to third-party payers and patients for treatment decisions. Moreover, PRO benefits, if approved by the U.S. Food and Drug Administration (FDA), can be incorporated into a drug's label claim and used in promotional campaigns to make patients aware of how medicines may improve the quality of their lives. In fact, as of this writing, the FDA has released a draft guidance to support labeling claims of patient-reported measures in medical product development, and an entire journal issue has been devoted to the topic (6–14).
The number of applications submitted by the biopharmaceutical industry to the FDA to achieve PRO labeling or promotional claims has increased over the past 10 years (15). Along with these submissions have come underlying questions about how PROs should be assessed in clinical trials (5,16–18). The assessment of PROs bears some inherent challenges. One is that PROs reflect unobserved (latent) concepts, which may manifest themselves in different observable ways depending on the disease or treatment of interest. Therefore, no single PRO instrument is suitable for all clinical trials, and selecting the most appropriate measure can be challenging. A second challenge, related to the first, is that PROs rely on patients' self-report and are therefore inherently subjective (16). Subjectivity requires that PROs measure the concept of interest in a reliable and valid way and, in doing so, detect a clinically meaningful difference. The assessment of PROs also faces some important data analysis issues, such as the simultaneous assessment of multiple end points and time points as well as how to


Step 1: Formulating study objectives
Step 2: Developing or selecting an instrument (relevant instruments; psychometric properties: reliability, validity, responsiveness; feasibility)
Step 3: Developing data collection strategies (intervals of PRO assessment; mode of administration; eligibility criteria; standardizing data collection; scoring procedure)
Step 4: Analyzing data (handling missing data: missing items, missing questionnaires; descriptive analysis; inferential analysis)
Step 5: Reporting data (standard tables and figures; hypothesis tests and confidence intervals: multiple instruments, multiple domains on the same instrument, multiple time points)
Step 6: Interpreting study findings (statistical significance; clinically important difference)

FIGURE 1 Key steps for evaluating patient-reported outcomes in clinical trials.

handle missing data. Finally, the interpretation of PRO results may require special effort, as many PROs do not share common metrics and there is no well-established guideline on how to interpret such data. In light of these challenges, it can be difficult for clinical researchers (statisticians, clinicians, behavioral or outcomes research scientists, regulatory personnel) without specialized training in psychometrics and biostatistics to conduct, analyze, and interpret PROs. The purpose of this article is to provide a step-by-step practical guide for clinical researchers on designing and assessing PROs in a clinical trial. To simplify the evaluation of PROs, we divided the whole process into six steps (Figure 1). We review central issues encountered in these steps and offer recommendations based on our experience.

STEP 1: FORMULATING STUDY OBJECTIVES

The evaluation of PROs often begins with the formulation of study objectives (Figure 1). If a sponsor wishes to seek a label claim or promote benefits of a drug, the development of a clear and explicit a priori objective is critical for subsequent trial design and study conduct. Stated objectives should be concrete and specific, not vague and ambiguous. For


example, the objective "To compare PROs between regimen A and regimen B" fails to provide specific information about the relevant PRO domains, the patient population of interest, and the time of assessment. Clear specification of these details can help to better design study protocols and is essential to the ultimate success of a clinical trial. Here is an example of a specific and concrete objective: "Moodlift 20 mg, taken once daily, will lead to improvement in symptoms of depression, and psychological and social function among adult men with major depressive disorder (MDD)" (19). While behavioral and clinical scientists usually lead the effort to formulate specific objectives for a study on PROs, statisticians should also be actively engaged as collaborators or partners in this effort, as study objectives can influence subsequent study design and data analysis plans.

STEP 2: DEVELOPING OR SELECTING AN INSTRUMENT

The fulfillment of a study objective requires appropriate instruments to measure the PROs included in the objective (Figure 1). Instrument development can be an expensive and time-consuming process. It usually involves a number of considerations: item generation (through expert panels and focus groups), data collection from a sample in the target population of interest, item reduction, instrument validation, and translation and cultural adaptation. This whole procedure can easily require at least one year. Therefore, the use of a previously validated instrument is typically preferable to the development of a new instrument for the same purpose. For researchers who are not familiar with the various instruments, updated information on currently available instruments can be accessed from databases such as the Patient-Reported Outcome and Quality of Life Instruments Database (http://www.proqolid.org) and the On-Line Guide to Quality-of-Life Assessment (http://www.OLGA-Qol.com).
With many instruments currently available, the choice of the most appropriate instrument becomes vital to the success of a study in which PROs are included as a key end point. Because clinical researchers may play important roles in instrument selection, what follows are issues that need to be taken into consideration when selecting an instrument.

Relevance of the Selected Instrument

The selection of an instrument should first consider the study objectives. The instrument should reflect the concrete, unambiguous questions being asked that are relevant to the targeted disease and study population. The instrument should also be able to measure the intended advantages and disadvantages of treatment. A conceptual framework (linking items to their specific domains, or subscales, within the same instrument or questionnaire) and


an end point model (linking treatment, traditional clinical outcomes, and different PROs in the same study) should be established before the selection of an instrument or questionnaire. This framework and model should be based on theoretical and empirical evidence, and should provide a strong rationale for why specific PROs are measured, both within a particular patient-reported questionnaire and across PROs in the same study.

Psychometric Properties of the Instrument

The selection of an instrument must also consider the instrument's measurement properties. Is the instrument measuring what it is intended to measure (is it valid)? Does it give accurate measurements (is it reliable)? The selected instrument must be psychometrically sound. Measurement characteristics including reliability and validity, as well as responsiveness, are fundamental for judging the quality and merits of an instrument. Reliability measures the extent to which an instrument yields reproducible and consistent results. Evidence on two types of reliability is usually required: internal consistency reliability and test-retest reliability. Internal consistency reliability assesses the extent to which the items of a domain or subscale are correlated, that is, the extent to which the items move in tandem to measure different aspects of the same concept. The assessment of internal consistency reliability is usually carried out using Cronbach's alpha coefficient. Test-retest reliability measures the degree to which an instrument gives similar scores when it is repeatedly administered to the same patient under stable conditions. It is often based on an intraclass correlation coefficient. For both Cronbach's alpha and the intraclass correlation coefficient, a minimum value of 0.7 is considered acceptable for comparisons between groups (4,20). Assessing reliability, however, is not sufficient for the validation of an instrument.
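As an illustration, Cronbach's alpha for a multi-item domain can be computed directly from its standard formula. This is a sketch with hypothetical item scores, not part of any validated scoring procedure:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item domain.

    `items` is a list of k lists, one per item, each holding one score
    per respondent. alpha = k/(k-1) * (1 - sum of item variances /
    variance of total scores). Population variances are used; sample
    variances give the same alpha because the n/(n-1) scaling cancels
    in the ratio.
    """
    k = len(items)
    n = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 4-item domain scored by 6 respondents
items = [
    [3, 4, 2, 5, 4, 3],
    [3, 5, 2, 4, 4, 2],
    [4, 4, 3, 5, 5, 3],
    [2, 4, 2, 5, 3, 3],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

A value above the conventional 0.7 threshold would support using the domain for group comparisons.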
An instrument may be reliable (accurate or precise in measuring something), yet not measure what it is supposed to measure and hence not be valid. There are at least three major types of validity: content validity, construct validity, and criterion validity. Criterion validity cannot be assessed when there is no criterion or "gold standard" measure, as is often the case. Content validity concerns the extent to which the constituent items reflect the intended concept. The assessment of content validity usually involves critical examination of whether the items are comprehensive enough and clearly cover, without ambiguity, the concept of interest. Content validity is often evaluated by consulting with patients with the disease of interest, physicians, and specialists to ensure that the included items are clear, comprehensive, and acceptable. Construct validity is one of the most important characteristics of a measurement instrument: it assesses the extent to which an instrument measures the


construct or concept it is supposed to measure. The assessment of construct validity often begins with postulating a relationship between the concept (construct) of interest and other related or unrelated measures or characteristics. Data are then collected and the assessment is conducted. If the results confirm the postulated relationship, evidence exists to support construct validity. Different methods can be used to establish construct validity. For example, construct validity can be assessed by comparing instrument scores among different groups of patients that are clinically distinct and anticipated to score differently (discriminant validity). Construct validity can also be assessed by correlating instrument scores with other measures that are theoretically related (convergent validity) or unrelated (divergent validity) to the underlying concept measured by the instrument. It is difficult to fully prove construct validity; instead, researchers rely on accumulating evidence to demonstrate that an instrument is valid in measuring the concept of interest. Responsiveness, which can also be viewed as another type of validity, is the ability of an instrument to detect small but important changes within a group over time. Responsiveness is one of the most essential characteristics of an instrument; a non-responsive instrument is of little use in discerning true drug effects. Two of the most commonly used measures of responsiveness are the standardized response mean and the effect size. The standardized response mean is the ratio of the mean change to the standard deviation of that change. The effect size is the ratio of the mean change to the standard deviation of the initial (baseline) measurement. These two measures can be used to compare the responsiveness of a new instrument with that of existing ones. Related to responsiveness is sensitivity: the ability to detect known differences between treatment groups over time or at a specific time.
Its measures of effect correspond to those for responsiveness, except that the mean change is between groups instead of within a group.

Feasibility

The final consideration in instrument selection is feasibility. Issues related to feasibility include language availability, the time required to complete the instrument, patients' ability to complete the questionnaire, the rate of refusal, and the percentage of missing items. All of these issues, each an important element in itself, should be thought through when selecting an instrument.

STEP 3: DEVELOPING DATA COLLECTION STRATEGIES

After determining which instrument will be used in an evaluation of PROs, a carefully planned data collection strategy should be built into the study design and research protocol to ensure high-quality data (Figure 1).


Although this is true of any clinical trial, the fact that PROs are based on a patient's self-report makes it even more important to develop a judicious strategy to prevent or minimize missing data and bias. An important consideration when developing data collection strategies is the time intervals at which PROs are assessed. These intervals should be based on disease progression, treatment response, drug side effects, the duration of the trial, and the number of questionnaires. At a minimum, assessments of PROs should be performed at baseline and at the end of the study, but intermediate follow-up measurements may be required to more fully capture changes within and between groups over time. Therefore, a reasonable number of assessments to capture this trajectory should be planned in a clinical trial. Assessments of PROs are usually performed at the same time as clinical visits and are best completed before professional encounters with non-PRO measures, which may influence a patient's response on PROs. PROs can be administered by paper and pencil, computer (including the Internet), electronic devices (e.g., a wireless PDA), or an interactive voice response telephone system. Standardized data collection procedures need to be established to ensure that data are collected consistently among different patients and investigators, and across study sites. Before the start of the trial, data collection personnel and study monitors should be carefully and uniformly trained. A detailed guideline on the assessment of PROs should be prepared and serve as a reference for study monitors and data collection personnel in handling issues arising from the assessment. Missing data can occur at the item level, for at least one but not all items on the questionnaire, or at the questionnaire level, for all of its items.
The reasons for missing data should be recorded at the time of occurrence and later examined to lend insight into the potential patterns for why data are missing. Because data quality is directly linked to the validity of study findings, researchers should have a thorough understanding of the data collection process along with the potential issues and biases inherent in it. Such knowledge can help facilitate the development of appropriate data analysis plans to understand and minimize potential bias. If missing data occur for some but not all items on a questionnaire, the non-missing data may still be used for analysis based on pre-specified criteria. For example, the EORTC QLQ-C30 (European Organization for Research and Treatment of Cancer Quality of Life Questionnaire–Cancer-30) consists of five functional scales (physical, role, cognitive, emotional, and social), three symptom scales (fatigue, pain, and nausea and vomiting), a global health status scale, and six single-item scales (21). The EORTC QLQ-C30 Scoring Manual specifies that missing values be imputed for multi-item scales: if at least half of the items from a scale have been answered, the missing items


are assumed to have values equal to the average of those items that are present for the respondent. For example, the physical function subscale consists of five items, and this scale can be estimated whenever at least three of its five constituent items are present (21).
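The half-scale rule described above can be sketched as follows. The item names and the use of a simple item mean as the scale score are illustrative assumptions; the actual EORTC QLQ-C30 Scoring Manual additionally transforms raw scores to a 0-100 metric:

```python
def scale_score(responses, scale_items):
    """Half-scale rule: score a multi-item scale only when at least half
    of its items are answered, imputing each missing item with the mean
    of the respondent's non-missing items on that scale.

    `responses` maps item name -> score, or None if missing. Returns the
    raw scale score (mean of items), or None if too few items were
    answered. Imputing with the respondent's own item mean leaves the
    mean of the answered items unchanged.
    """
    answered = [responses[i] for i in scale_items if responses.get(i) is not None]
    if len(answered) * 2 < len(scale_items):
        return None  # fewer than half answered: leave the scale missing
    return sum(answered) / len(answered)

# Hypothetical 5-item physical-function scale: 3 of 5 items answered
resp = {"pf1": 2, "pf2": None, "pf3": 3, "pf4": 1, "pf5": None}
print(scale_score(resp, ["pf1", "pf2", "pf3", "pf4", "pf5"]))  # 2.0
```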


STEP 4: ANALYZING DATA

The next step in the evaluation of PROs is to develop a comprehensive and detailed plan for data analysis. For a clinical trial, the statistical analysis plan (SAP) on PROs can either be integrated with other study end points or developed as a stand-alone component. From experience, we have witnessed gains in efficiency when PROs are integrated and unified with other end points in the SAP. Clinical statisticians, in collaboration with outcomes research, behavioral, clinical, and regulatory representatives, usually play leading roles in developing the SAP, with the project clinical statistician serving as principal author. The SAP on PROs should be clear and concise, yet complete and comprehensive, about the stated objective. In addition to the data analysis on PROs, the SAP should include a brief description of how the instruments were selected, how domains belonging to an instrument are scored, and how missing items of an instrument are handled. The development of the data analysis plan should be based on study objectives and may vary among different phases of clinical trials. For example, for a phase II trial intended to explore the potential impact of a specific drug treatment on PROs, we recommend that the analysis plan focus on a comprehensive descriptive analysis. Basic statistics such as the instrument compliance rate, the observed mean and median of domain scores (along with standard errors or, say, 95% confidence intervals), and the observed mean change from baseline (and its 95% confidence interval) to each follow-up time should be included within each group. Additionally, if a trial has multiple arms, a comparison of the domain scores between arms is typically worthwhile, obtained by analyzing (and then reporting) the between-group difference in mean changes from baseline to each follow-up time, along with corresponding (say) 95% confidence intervals.
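As a sketch of the descriptive summaries recommended above, the within-group mean change from baseline and its 95% confidence interval can be computed as follows. The scores are hypothetical, and a normal approximation (z = 1.96) is used where, for small samples, a t-quantile would be more appropriate:

```python
from math import sqrt
from statistics import mean, stdev

def mean_change_ci(baseline, followup, z=1.96):
    """Within-group mean change from baseline, with a normal-approximation
    95% confidence interval. `baseline` and `followup` are paired lists of
    domain scores for the same patients."""
    changes = [f - b for b, f in zip(baseline, followup)]
    m = mean(changes)
    se = stdev(changes) / sqrt(len(changes))  # standard error of the mean change
    return m, (m - z * se, m + z * se)

# Hypothetical domain scores for 8 patients in one arm
baseline = [40, 55, 48, 62, 50, 45, 58, 52]
followup = [46, 60, 50, 70, 55, 47, 66, 58]
m, (lo, hi) = mean_change_ci(baseline, followup)
print(round(m, 2), (round(lo, 2), round(hi, 2)))
```

The same function applied arm by arm, together with the difference in mean changes between arms, would populate the descriptive tables described above.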
For a phase III trial (especially one intended for a label claim on PROs), inferential statistics (hypothesis testing and confidence intervals) should be the focus of the analysis plan, along with a detailed descriptive summary. Regardless of the phase of the study, data on PROs should be treated just like any other study end points and held to the same analytical rigor. Inferential testing of data on PROs should consider the analytical issues specific to the evaluation of PROs in a clinical trial. For example, many instruments have multiple domains, and each instrument may be measured a number of times. Multiple comparisons then become an important issue


that deserves special consideration. Missing data also usually occur in PRO studies, and how to handle them requires special consideration as well. More detail on these two issues follows.


Multiplicity Issue

It is well recognized that multiple comparisons of drug treatments can result in falsely significant results. Because data on a particular patient-reported outcome are usually measured over a number of time points, and because the same study may comprise multiple PROs (or multiple subscales within the same PRO instrument), it becomes important to describe in the SAP how to deal with the multiplicity issue, especially if the evaluation in the clinical trial is intended for label claims on PROs. Several methods can be applied to address the multiplicity issue. One method is to use summary measures or summary statistics. For many instruments, a single score can be constructed by aggregating data across different domains on the same questionnaire. Such a summary score can be used as the primary end point for hypothesis testing and, consequently, avoids the concern about repeated testing on multiple domains of the same instrument. A potential problem with the use of a summary score is that significant changes in some specific domains may be masked, and what is really measured may become clouded, resulting in low confidence about validity. Summary measures can also be constructed on a particular subscale or domain of an instrument to summarize the repeated observations on an individual, and then across individuals, on that subscale. Examples include the average of within-subject post-treatment values, the area under the growth curve, and the time to reach a peak or pre-specified value. The use of these summary measures begins with the construction of the measure for each individual and follows with the analysis of the summary measures across individuals for a within-group or between-group comparison on a given measure, such as a subscale of an instrument. A drawback of summary measures across time is that they do not fully capture the weighted and correlated nature of repeated observations on PROs over time.
In addition to reducing the repeated observations by creating a summary measure on an individual, it is also possible to construct summary statistics on the repeated measures for a group of individuals (e.g., average rate of change over time for a treatment group). Summary statistics can be created using hierarchical, mixed-effect, or mean-response profile models that incorporate and model all available data, which can address missing data and therefore help to mitigate this potential bias, especially if the data are missing at random.
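The individual-level summary measures mentioned above, such as the average of post-treatment values or the area under the curve, can be sketched as follows with hypothetical assessment times and scores:

```python
def auc_trapezoid(times, scores):
    """Area under a patient's score-time curve (trapezoidal rule): one
    summary measure that collapses repeated PRO observations into a
    single value per patient."""
    return sum(
        (t2 - t1) * (s1 + s2) / 2
        for (t1, s1), (t2, s2) in zip(zip(times, scores), zip(times[1:], scores[1:]))
    )

def mean_post_treatment(scores):
    """Average of post-baseline values; scores[0] is the baseline and
    scores[1:] are the follow-up assessments."""
    post = scores[1:]
    return sum(post) / len(post)

# Hypothetical patient: domain scores at weeks 0, 4, 8, 12
weeks = [0, 4, 8, 12]
scores = [50, 58, 62, 64]
print(auc_trapezoid(weeks, scores))   # 708.0
print(mean_post_treatment(scores))
```

Once each patient is reduced to one number, the summary values can be compared within or between groups with standard one-observation-per-patient methods.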


Another way to minimize the problem of multiplicity is to select a few key domains and key time points. These can be pre-specified in the SAP as primary end points and time points for statistical inference; other domains and time points may be regarded as secondary. While this recommendation provides a straightforward way to handle the multiplicity issue, a major challenge is how to select the most appropriate domains and time points. One way to address this challenge is to rely on substantive knowledge and well-grounded theory regarding the nature of the disease and the postulated beneficial effects of the interventions. The problem of multiplicity can also be addressed through p-value adjustment. Three types of p-value adjustment are commonly used: 1) the Bonferroni procedure, 2) the Bonferroni-Holm (step-down) procedure, and 3) the Hochberg (step-up) method. Of the three, the Bonferroni procedure is the most conservative; the Holm and Hochberg procedures are less conservative and often preferable.

Missing Data

Missing data on PROs can have at least two major repercussions. At a minimum, missing data will result in wider confidence intervals and reduced statistical power for detecting a treatment effect. The larger, more troublesome issue is the likelihood that missing data are closely linked to patients' health and treatment, possibly leading to biased estimation of treatment effects. Given these potential impacts, the SAP should clearly describe how missing data will be handled, especially if the evaluation of PROs is intended for label claims or promotional use. Missing data on PROs can occur as missing items or missing questionnaires. Missing items involve the lack of responses to some specific items; missing questionnaires involve patients who fail to complete and return the whole questionnaire.
Many instruments include well-documented procedures by their developers on how to handle missing items. We recommend that these procedures be followed. Missing questionnaires are a more complex situation than missing items. Missing questionnaires can happen as a result of dropout from the study, late entry into the study, or randomly missing the questionnaire. In any of these situations, it is important to first analyze the rates (proportions) and reasons for missing data. Such information will help to gauge the severity of the non-response problem and the underlying mechanisms for missing data. There are at least three ways to address the missing data problem. One is to remove patients with missing forms from the analysis and only analyze complete cases. While simple, this method is usually not recommended


because it can reduce the sample size and may produce biased results if the missing data are not missing completely at random (MCAR). Another way is to impute the missing data, and different methods can be used for the imputation. The simplest is to substitute the mean score of patients with observed data for those with missing data (mean imputation). Unless the missing data are MCAR, mean imputation may result in biased estimates and should be used carefully. Another commonly used method is last observation carried forward (LOCF), which replaces a patient's missing value with his or her last completed observation. Because data on PROs may not remain stable over time, the LOCF method may also be suspect. More sophisticated techniques have been developed, including regression imputation, hot deck imputation, and cold deck imputation. All of these techniques, like simple mean imputation and LOCF, belong to the single imputation category, in which a single value is imputed for a specific missing point. A major limitation of single imputation is that estimated standard errors are generally too small, as the imputed values are treated as actual data when in fact they are not. This obstacle can be overcome by multiple imputation, whereby several values are imputed instead of just one. Multiple imputation not only improves the accuracy of standard errors but also allows researchers to conduct various sensitivity analyses, which are especially constructive when data are not missing at random. Finally, the problem of missing data can be addressed through a likelihood-based approach, as incorporated in mixed-effects models. In this approach, every subject contributes his or her available (observed) measurements. When missing data are MCAR or missing at random (MAR), the marginal distribution of the observed data provides the correct likelihood of the unknown parameters and, therefore, the missing data are considered ignorable.
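The two simple single-imputation strategies discussed above, mean imputation and LOCF, can be sketched as follows (hypothetical values; as noted, both can bias results unless strong assumptions about the missingness mechanism hold):

```python
def mean_impute(values):
    """Replace each missing (None) value with the mean of the observed
    values. Unbiased only if data are missing completely at random."""
    observed = [v for v in values if v is not None]
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in values]

def locf(series):
    """Last observation carried forward across one patient's repeated
    measures; suspect when scores do not remain stable over time."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)  # stays None until a first observation exists
    return filled

print(mean_impute([4, None, 6, 2]))       # [4, 4.0, 6, 2]
print(locf([50, 54, None, None, 58]))     # [50, 54, 54, 54, 58]
```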
The likelihood-based approach is attractive because it can provide valid estimates of treatment effects if missing data are MCAR or MAR. When data are not missing at random (non-ignorable), models that do not assume MCAR or MAR, such as selection models or pattern-mixture models, should be considered as secondary models in sensitivity analyses.

STEP 5: REPORTING DATA

The reporting of data on PROs is a critical component of their evaluation. Data on PROs should be presented clearly, concisely, and sufficiently to foster clarity, transparency, and comprehension. While a table is a useful way to summarize study results, we also recommend graphical presentations as a way to simplify the longitudinal and multi-dimensional nature of data on PROs. Whether tables or graphs are used, it is imperative to present information
as comprehensively and practically as possible. For example, the number of subjects completing the PRO evaluation at each treatment assessment should be reported, as should measures of variability such as confidence intervals or standard errors of the average estimates.
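A minimal sketch of such a per-visit summary, reporting completer counts and mean scores with normal-approximation 95% confidence intervals, might look like the following (all data and function names are illustrative):

```python
from math import sqrt
from statistics import mean, stdev

def visit_summary(scores_by_visit):
    """For each assessment, report the number of completers (n) and the
    mean score with a normal-approximation 95% confidence interval.
    None marks a missing PRO assessment."""
    rows = []
    for visit, scores in scores_by_visit.items():
        observed = [s for s in scores if s is not None]
        n = len(observed)
        m = mean(observed)
        se = stdev(observed) / sqrt(n)
        rows.append((visit, n, round(m, 1),
                     round(m - 1.96 * se, 1), round(m + 1.96 * se, 1)))
    return rows

# Hypothetical PRO scores; None = subject missed the assessment
data = {"baseline": [50.0, 60.0, 55.0, 45.0],
        "week 12": [58.0, None, 62.0, 54.0]}
for row in visit_summary(data):
    print(row)  # (visit, n, mean, 95% CI lower, 95% CI upper)
```

Reporting the completer count alongside each estimate makes attrition visible to the reader rather than hidden in the denominators.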
STEP 6: INTERPRETING STUDY FINDINGS

The data analysis may show a statistically significant difference in PRO scores between treatment groups at a specific time, or a significant change within or between groups over time. A natural ensuing question is whether the treatment difference or change is clinically meaningful. It is well recognized that statistical significance does not imply clinical significance. For example, a small difference in PRO scores between two treatment groups may be statistically significant given a large sample size, yet its clinical relevance may be scant or difficult to interpret in a meaningful manner. Understanding the degree of difference in PRO scores that is considered clinically meaningful can enhance the application and interpretation of PROs.

A number of methods have been proposed for establishing clinically meaningful change in PROs. These methods fall into two broad categories: distribution-based and anchor-based. Distribution-based approaches use the statistical characteristics of the sample (mean and standard deviation) or of the instrument (reliability) to suggest a clinically meaningful change. One of the most widely used distribution-based methods is the (standardized) effect size, which is also used for assessing an instrument's responsiveness. It involves the difference in means over time within a group (in a one-group study) or between two groups (in a multiple-group study), divided by some measure of variability such as the standard deviation of the baseline measurements. The effect size metric resembles a signal-to-noise ratio. When examining treatment effects, an effect size of 0.2 standard deviations is considered "small," 0.5 "moderate," and 0.8 "large." Because the effect size is derived purely from a statistical distribution, it does not provide an estimation of clinical significance per se.
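For a one-group pre-post design, the effect size calculation and the conventional benchmarks can be sketched as follows (hypothetical data; function names are illustrative):

```python
from statistics import mean, stdev

def effect_size(baseline, follow_up):
    """Standardized effect size: mean change from baseline divided by the
    standard deviation of the baseline measurements (one-group version)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)

def benchmark(es):
    """Conventional benchmarks: 0.2 'small', 0.5 'moderate', 0.8 'large'."""
    es = abs(es)
    if es >= 0.8:
        return "large"
    if es >= 0.5:
        return "moderate"
    if es >= 0.2:
        return "small"
    return "trivial"

baseline = [40.0, 50.0, 60.0]   # baseline SD = 10
follow_up = [45.0, 55.0, 65.0]  # every subject improves by 5 points
es = effect_size(baseline, follow_up)
print(es, benchmark(es))  # 0.5 moderate
```

The same 5-point change would be "small" against a more heterogeneous baseline, which illustrates why a purely distribution-based metric cannot, by itself, establish clinical significance.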
Another distribution-based approach, the standard error of measurement (the standard deviation of instrument scores times the square root of one minus the instrument's reliability), has also been used to define clinical meaningfulness. One standard error of measurement has been offered as a guideline corresponding to a patient-perceived clinical change, but no consensus exists and more research is needed here. Unlike distribution-based approaches, anchor-based methods link—or anchor—differences at a given time, or changes over time, in PROs to differences or changes in an external clinical
measure (e.g., a patient's global rating of change or a clinical rating of disease severity). This external measure should bear an appreciable correlation to the patient-reported outcome and be clinically understandable and important. Variations of anchor-based approaches exist. For example, PRO scores can be compared with norm-based PRO scores from a general population, or with external variables such as utilization of health care services and ability to work. One variation provides a content-based interpretation, relating scores on a patient-reported outcome to the probability of response to an item internal and representative of that patient-reported outcome (22). Compared with distribution-based methods, anchor-based approaches bring more clinical relevance to the interpretation of PROs. Although there is no consensus on what constitutes the most appropriate clinical indicator, the application of multiple anchors is encouraged, with the aim that their convergence yields a scientifically justifiable and reasonable range of estimates of what is clinically meaningful.

The interpretation of findings on PROs is an evolving area. No strict rules exist, but guidelines motivated by published examples of distribution-based and anchor-based approaches are available and being refined. In light of the different methods currently available, along with the opportunity to develop new methodologies, statisticians can play important roles in helping their outcomes research, behavioral, clinical, and regulatory colleagues identify suitable approaches to enhance the interpretation of PROs. A meaningful change is a subjective concept, and different audiences may have different perspectives. For example, patients may consider an improvement in daily activities a meaningful change, whereas physicians may place greater emphasis on disease prognosis.
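To make the anchor-based idea concrete, a minimal sketch (hypothetical data and labels) groups PRO change scores by a patient global rating of change and reports the mean change per anchor category:

```python
from collections import defaultdict
from statistics import mean

def change_by_anchor(changes, anchor_ratings):
    """Group PRO change scores by an external anchor (e.g., a patient
    global rating of change) and report the mean change per category."""
    groups = defaultdict(list)
    for delta, rating in zip(changes, anchor_ratings):
        groups[rating].append(delta)
    return {rating: mean(vals) for rating, vals in groups.items()}

# Hypothetical change in PRO score and global rating for six patients
changes = [1.0, 6.0, 4.0, 12.0, 10.0, 0.0]
ratings = ["no change", "somewhat better", "somewhat better",
           "much better", "much better", "no change"]
# The mean change among patients rating themselves "somewhat better"
# is one candidate for a clinically meaningful difference
print(change_by_anchor(changes, ratings))
```

In practice the anchor categories, their wording, and the choice of summary statistic would come from the study protocol, not from this sketch.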
We therefore recommend that a given interpretation strategy be targeted to a given audience or perspective. Given the nature of PROs, however, patients' perspectives remain paramount. When the targeted audiences are not well known, a comprehensive strategy should be developed that incorporates different anchors, each with its own metric that is meaningful to a given stakeholder.

SUMMARY

This article provides a practical and broad guide for clinical researchers to incorporate and assess PROs in a clinical trial for regulatory purposes, including for a label claim or drug promotion, as well as supplemental (exploratory) content in planning a program. Evaluation of PROs can be a time-consuming and involved task. We provide six fundamental steps for evaluating PROs in clinical trials: 1) formulating study objectives, 2) developing
or selecting an instrument, 3) developing data collection strategies, 4) analyzing data, 5) reporting data, and 6) interpreting study findings. Patient-reported outcomes should be handled like other clinical study endpoints, be integrated into the statistical analysis plan, and adopt the same set of scientific standards. This article, which provides an overview of the topic, is intended to whet the appetite of interested researchers, who are encouraged to consult the published literature for more detailed information. In addition to the articles cited in this paper, several books (and their references) can be consulted (4,23–25). Successful implementation and evaluation of PROs requires good knowledge of the assessment process and its issues, aided by careful and thoughtful planning. To that end, this article highlights the central elements of designing, analyzing, and interpreting PROs in clinical studies and can serve as a useful overview for clinical researchers.

REFERENCES

[1] Acquadro C, Berzon R, Dubois D, Leidy NK, Marquis P, Revicki D, Rothman M. Incorporating the patient's perspective into drug development and communication: An ad hoc task force report of the Patient-Reported Outcomes (PRO) Harmonization Group meeting at the Food and Drug Administration, February 16, 2001. Value Health. 2003;6:522–531.
[2] Burke LB, Erickson P, Leidy NK, Patrick D, Petrie C. Patient Reported Outcomes – Selecting, Evaluating and Documenting Support for Existing Instruments for Labeling Claims: Content Validity. Task Force presentation at the International Society for Pharmacoeconomics and Outcomes Research Annual International Meeting, May 3–7, 2008, Toronto, Ontario, Canada.
[3] Fairclough DL. Patient reported outcomes as endpoints in medical research. Stat Methods Med Res. 2004;13:115–138.
[4] Fayers P, Machin D. Quality of Life: The Assessment, Analysis and Interpretation of Patient-Reported Outcomes. 2nd ed. West Sussex, England: John Wiley & Sons Ltd.; 2007.
[5] Revicki DA, Gnanasakthy A, Weinfurt K. Documenting the rationale and psychometric characteristics of patient reported outcomes for labeling and promotional claims: The PRO Evidence Dossier. Qual Life Res. 2007;16:717–723.
[6] U.S. Food and Drug Administration. Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims (DRAFT). U.S. Department of Health and Human Services, Food and Drug Administration: Center for Drug Evaluation and Research (CDER); Center for Biologics Evaluation and Research (CBER); Center for Devices and Radiological Health (CDRH). February 2006. Available from: http://www.fda.gov/cder/guidance/5460dft.htm. Accessed January 6, 2008.
[7] Rothman ML, Beltran P, Cappelleri JC, Lipscomb J, Teschendorf B, the Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. Patient-reported outcomes: Conceptual issues. Value Health. 2007;10(Suppl 2):S66–S75.
[8] Sloan JA, Halyard MY, Frost MH, Dueck AC, Teschendorf B, Rothman ML, the Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. The Mayo Clinic manuscript series relative to the discussion, dissemination, and operationalization of the Food and Drug Administration guidance on patient-reported outcomes. Value Health. 2007;10(Suppl 2):S59–S63.
[9] Snyder CF, Watson ME, Jackson JD, Cella D, Halyard MY, the Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. Patient-reported outcomes instrument selection: Designing a measurement strategy. Value Health. 2007;10(Suppl 2):S76–S85.
[10] Turner RR, Quittner AL, Parasuraman BM, Kallich JD, Cleeland CS, the Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. Patient-reported outcomes: Instrument development and selection issues. Value Health. 2007;10(Suppl 2):S86–S93.
[11] Frost MH, Reeve BB, Liepa AM, Stauffer JW, Hays RD, the Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. What is sufficient evidence for the reliability and validity of patient-reported outcomes? Value Health. 2007;10(Suppl 2):S94–S105.
[12] Sloan JA, Dueck AC, Erickson PA, Guess H, Revicki DA, Santanello NC, the Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. Analysis and interpretation of results based on patient-reported outcomes. Value Health. 2007;10(Suppl 2):S106–S115.
[13] Revicki DA, Erickson PA, Sloan JA, Dueck A, Guess H, Santanello NC, the Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. Interpreting and reporting results based on patient-reported outcomes. Value Health. 2007;10(Suppl 2):S116–S124.
[14] Patrick DL, Burke LB, Powers JH, Scott JA, Rock EP, Dawisha S, O'Neill R, Kennedy DL. Patient-reported outcomes to support medical product labeling claims: FDA perspective. Value Health. 2007;10(Suppl 2):S125–S137.
[15] Willke RJ, Burke LB, Erickson P. Measuring treatment impact: A review of patient-reported outcomes and other efficacy endpoints in approved labels. Control Clin Trials. 2004;25:S35–S52.
[16] Leidy NK, Vernon M. Perspectives on patient-reported outcomes. Pharmacoeconomics. 2008;26:363–370.
[17] Revicki DA, Osoba D, Fairclough D, Barofsky I, Berzon R, Leidy NK, Rothman M. Recommendations on health-related quality of life research to support labeling and promotional claims in the United States. Qual Life Res. 2000;9:887–900.
[18] Chassany O, Sagnier P, Marquis P, Fullerton S, Aaronson N. Patient-reported outcomes: The example of health-related quality of life—a European guidance document for the improved integration of health-related quality of life assessment in the drug regulatory process. Drug Inf J. 2002;36:209–238.
[19] Rothman ML, Beltran P, Cappelleri JC, Lipscomb J, Teschendorf B, the Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. Patient-reported outcomes: Conceptual issues. Value Health. 2007;10(Suppl 2):S66–S75.
[20] Nunnally JC, Bernstein IH. Psychometric Theory. 3rd ed. New York, NY: McGraw-Hill, Inc.; 1994.
[21] Fayers PM, Aaronson NK, Bjordal K, Groenvold M, Curran D, Bottomley A, on behalf of the EORTC Quality of Life Group. EORTC QLQ-C30 Scoring Manual. 3rd ed. Brussels: EORTC; 2001.
[22] Cappelleri JC, Bell SS, Siegel RL. Interpretation of a self-esteem subscale for erectile dysfunction by cumulative logit model. Drug Inf J. 2007;41:723–732.
[23] Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. 3rd ed. New York, NY: Oxford University Press; 2003.
[24] Fairclough DL. Design and Analysis of Quality of Life Studies in Clinical Trials. Boca Raton, FL: Chapman & Hall/CRC; 2002.
[25] DeVellis RF. Scale Development: Theory and Applications. 2nd ed. Thousand Oaks, CA: Sage Publications; 2003.