Establishing and Reporting Evidence of the Content Validity of Newly-Developed Patient-Reported Outcome (PRO) Instruments for Medical Product Evaluation: Good Research Practices
Donald L. Patrick PhD, MSPH, Laurie B. Burke RPh, MPH, Chad Gwaltney PhD, Nancy Kline Leidy PhD, Mona L. Martin RN, MPA, Lena Ring PhD
Part I Developing Content for a New PRO Instrument
RUNNING TITLE: Developing Content for a New PRO Instrument
CORRESPONDING AUTHOR: Donald L. Patrick PhD, MSPH, University of Washington, Box 359455, Seattle, Washington 98195-9455
[email protected]
KEY WORDS: content validity; patient reported outcomes; FDA; EMA; quality of life

Authors are listed in alphabetical order by surname after the senior author. The views expressed herein represent those of the authors and not those of the University of Washington, Food and Drug Administration, PRO Consulting, United BioSource, Health Research Associates, AstraZeneca, or Uppsala University.
ISPOR PRO Task Force: Content Validity Part I

ABSTRACT

Background: A patient-reported outcome (PRO) instrument is a means to capture data for assessing treatment benefit or risk in medical product evaluation. Two articles in this issue present conclusions of an ISPOR task force convened to address good research practices for documenting content validity in newly developed PRO instruments. Content validity is the extent to which the content of a new PRO instrument adequately represents a given concept or set of concepts. We use the specific context of a PRO instrument newly developed to support PRO claims in medical product labeling. Paper I outlines steps for gathering and presenting qualitative evidence to support the inclusion of concepts in the new instrument. Paper II addresses how to gather evidence that persons in the target population understand the content of the new instrument. Both papers present suggestions for documenting the chosen qualitative theoretical approach, methods, results, and conclusions. Adequate qualitative evidence is critical to ensure PRO instrument content validity. These papers do not address methods that mix qualitative and quantitative approaches to establishing content validity; however, the same qualitative research principles apply. Mixed qualitative and quantitative approaches to content validity testing will be addressed in future papers.

Methods for Paper I: Five good practices consistent with U.S. and European review processes are addressed in chronological order: (1) plan the context of measurement; (2) develop the protocol for qualitative concept elicitation; (3) conduct concept elicitation interviews and/or focus groups; (4) analyze the qualitative data from concept elicitation; and (5) document concept development and elicitation. Illustrations are given of suggested ways to collect and present evidence.
Results and Conclusions of Paper I: Using qualitative evidence to support content validity requires a clear understanding of the actual intervention study design (e.g., the entry criteria for the clinical trial population) and the targeted context of measurement. Qualitative research applies to all PRO instruments used to support labeling claims and must be completed well before confirmatory (Phase III) trials are initiated to allow time for instrument finalization. The qualitative study protocols address a broad range of target population demographics and characteristics and include a plan for data analyses. Conducting interviews or focus groups requires trained interviewers with appropriate quality controls. Qualitative analyses require trained coders, demonstration of saturation, and clearly presented results supported by the transcripts of audio recordings. Detailed documentation of the entire concept elicitation process provides the body of evidence to support conclusions drawn by qualitative researchers that the instrument measures a certain concept. The evidence must support that patients' responses, expressed in the language of the instrument items, correspond to the concept reflected by the instrument score(s) and that the concept is adequately covered by the instrument. The detailed documentation is also reviewed in a regulatory setting to determine whether medical product claims are truthful and not misleading when the instrument is used in an outcomes trial to measure treatment impact.
Background
The ISPOR Health Science Policy Council and the ISPOR Board of Directors recommended that an ISPOR Task Force be established on Good Practices in Establishing and Reporting Evidence of the Content Validity of Newly-Developed Patient-Reported Outcomes (PRO) Instruments for Medical Product Evaluation. The purpose of this task force was to extend the work of a previously published report on the use of existing or modified PRO instruments to support medical product labeling claims (1) by addressing methods for assuring and documenting the content validity of newly-developed PRO instruments.

The chair of this task force (Donald L. Patrick, PhD) recruited members based on their experience as scientific leaders and practitioners in the field, as well as developers and users of PRO instruments. A range of perspectives on PRO instruments was provided by the diversity of their work experience: academia, government, research organizations, and industry. In addition, forty-seven members of the ISPOR Patient Reported Outcomes Review Group provided written comments on the draft reports, and oral feedback was provided at the PRO Forum held during the ISPOR 15th Annual International Meeting in Atlanta. The task force met regularly via conference calls and held one face-to-face meeting.

During content and outline development, the task force decided two papers would be needed: Part I covers the development of content for a new PRO instrument, i.e., concept identification to inform content and structure using qualitative focus group and interview methodology, while Part II covers item development and the assessment of patient understanding of the draft instrument using cognitive interviews and steps for instrument revision. The two parts are meant to be read together. Rather than being prescriptive, they are intended to offer suggestions for good practices in planning, executing, and documenting the process of content validation of PRO instruments to be used in medical product evaluation.
PART I

Developing Content for a New PRO Instrument
Definition of Terms
The term “PRO” is often used interchangeably to refer to a concept, instrument, questionnaire, score, or claim. According to the FDA Guidance, a patient-reported outcome (PRO) is “any report of the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else” (2). In Europe, the European Medicines Agency released a reflection paper (3) on the place of health-related quality of life (HRQL) in medical product development, specifying that HRQL is one type of PRO. This task force report uses the term “PRO” to refer to the general concept or outcome of interest. “PRO” serves as the umbrella term covering all patient-reported outcomes, with HRQL one specific type (4). A PRO instrument or measure is a means to collect data. Questionnaires and diaries are examples of PRO instruments. The term instrument refers to item content (stem and response options), instructions, and recall period. PRO scores are numeric values or categorical assignments generated through the use of a PRO instrument and used to represent the PRO of interest.

In medical product development, PRO instruments may be used in clinical trials to capture and quantify treatment benefit or risk (5, 6), with the possibility that this information will be used to support a “claim” in medical product labeling. Within this context, it is useful to distinguish the PRO concept, claim, instrument, and score (5). For example, pain intensity is a PRO (the concept); decrease in pain intensity is a PRO claim; a 10-centimeter visual analog scale (VAS) assessing pain intensity, including the anchors, instructions, and recall period, is a PRO instrument; and the value a subject assigns to their pain intensity on the VAS is a PRO score.

Content validity is the extent to which the content of an instrument represents the most important aspects of a given concept (7), in this case, the extent to which it represents the PRO. In the FDA Guidance on PRO Measurement, content validity is defined by the empirical evidence showing that the items and domains of an instrument are appropriate and comprehensive relative to its intended measurement concept, population, and use (2).

Qualitative data are essential for establishing the content validity of a PRO instrument. Quantitative data, including factor analysis and item response theory (IRT) analyses, may be supportive but are insufficient on their own to document content validity for medical product development. Parts I and II of this task force report summarize the elements of good research practices for establishing the content validity of a new instrument through qualitative research.
Good Practices in Eliciting Concepts for a New Patient-Reported Outcome Instrument

Table 1 lists five steps to elicit concepts for establishing and documenting content validity of a new PRO instrument in a recommended chronological order, consistent with the Wheel and Spokes Diagram contained in the Final FDA PRO Guidance (2). Within each step are recommended good research practices.

Table 1 About Here
Good Practice 1: Plan the Context of Measurement

The development of an instrument, simple or complex, must start with a clear definition of the concept to be measured in the proposed context of measurement. The purpose of Step 1 in Table 1 is to ensure that the context is clearly defined and that the approach to concept measurement is appropriate for the intended context. In situations involving instrument development within a regulatory framework, context considerations include the disease or condition of interest, target population, and treatment setting. Consideration should also be given to the positioning of the measure in the hierarchy of clinical trial endpoints. Clarification of the context of use and the role the measure will play in clinical trials informs preliminary decisions on instrument scope of content, measurement structure, and mode of administration. With this essential work established, qualitative research protocols can be developed to gather patient input on the concept(s) of interest. Descriptions of each of the components of Step 1 follow below.
Disease models.

Development of a new PRO instrument for use in medical product evaluation often begins with a clear delineation of the concept of interest through the development of a disease model. Consideration is given to the pathophysiology and expression of the disease or condition, including its characteristic signs and symptoms in the target population. The relevant concepts measured by laboratory tests, performance assessments such as exercise stress or cognitive function tests, or standardized clinical observations are also identified. If symptoms are a defining characteristic, the appropriate symptom concepts cannot be determined based on a literature review and consultation with clinical experts alone. Qualitative research in the target patient population provides essential data on patients’ perspectives of their symptoms. If the impact of health on related physiologic, psychologic, or sociologic concepts is of interest, those impact concepts may also be targeted for instrument development and require patient input.

Disease models help to clarify and focus the specific PRO concept of interest within the context of the entire disease process and the specific clinical trial population. Figure 1 illustrates a disease model for psoriasis with a proposed pathway linking risk factors, diagnosis, signs and symptoms, and impacts. The types of questions addressed by disease models include the following: Is the disease or condition characteristically symptomatic? Are these symptoms amenable to treatment? Are there functional effects of the condition, such as activity limitation due to symptoms, that could be altered with treatment? What other outcomes might the treatment affect? What concepts should be the focus of efficacy evaluation? Because variability in measurement lowers the probability of detecting a meaningful treatment effect, the more specific the concept and the closer this concept is to the goals of the treatment, the greater the likelihood of success.

Figure 1 About Here

During the development of a disease model, consideration is given to the prevalence, severity, and characteristics of the condition, the treatment to be tested, the target population for treatment, and potential trial endpoints. Many questions relate PRO candidates to trial design, depending on the actual disease or condition and trial. Will patients enrolled in the trial be experiencing decrements in the symptoms, signs, or impacts that might be captured by a PRO instrument, so that the effect of treatment on this outcome can be appropriately tested? Will or can patients be screened for enrollment based on criteria specific to this outcome? In situations where the PRO is positioned as a secondary outcome and enrollment does not include criteria related to this secondary outcome, study results may be poor simply because a significant portion of study participants could not change with treatment.

Patterns of change over time in the PRO of interest are another consideration. Is the PRO relatively stable, with small changes over time? Is the condition acute, with potentially large and/or rapid changes with treatment? Or is the condition chronic, with an expectation of minimal or slow changes in the outcome of interest? For example, clinical trials for acute infectious conditions may be relatively short, while trials to demonstrate a survival advantage can involve relatively long observations. Trial design, including frequency of assessments, compliance, and missing data, is part of the context of use of a PRO instrument and will inform the content and structure, including the items, response options, and recall period.
Endpoint models.

Endpoint models specify the primary and secondary endpoints to be tested in the target clinical trial(s). Example endpoint models were provided in Figures 1 and 2 of the FDA PRO Guidance (2). Even when a medical product cannot be specified, e.g., in multi-sponsor instrument development consortia, the anticipated role of the instrument can be shown in one or more hypothetical or illustrative endpoint models to specify the context of use. In this case, the model(s) represent an educated prediction of the prioritization of study hypotheses in clinical trials in which the PRO instrument is to be used.

Of course, a new PRO instrument may serve as an exploratory endpoint in early trials, with the data used to test reliability, validity, and responsiveness. The endpoint models we are describing here pertain to future medical product development trials using current best clinical trial practices, keeping in mind that target patient populations may change to related severity or diagnostic groups. While it is important to be forward looking in developing a new instrument, an overly broad focus can result in a concept and instrument that is too generic, diluting measurement content, reducing reliability, and sacrificing the near-term objectives.
Literature review and experts.

Disease and endpoint models both inform and are informed by existing knowledge or experience, published literature, and consultation with clinical content experts. Models focus the literature review and clarify the type of experts and the role they will play in the development process. Input from the literature and experts, in turn, is used to revise the disease and endpoint models as appropriate.
Target population - cultural/language groups.

As instrument development is planned, thought is given to the details of the target population, including the languages and cultures of patients likely to be enrolled in clinical trials. The extent to which the disease, standard of treatment, and measurement concept(s) are the same or differ across countries or cultures is considered. Literature and experts can help in this discussion. If the development program will be international and the concept is highly variable across countries, simultaneously developing the instrument internationally may strengthen and document cultural equivalence of the final instrument. If there is published or empirical evidence indicating concept stability across countries, it may be possible to develop the measure in one country with review by a PRO linguistic expert to facilitate ease of translation for future use.
Preliminary decisions on the instrument content and structure.

As the context of use is identified and clarified, decisions are made concerning the optimal instrument structure and likely content. The following principles of good measurement are among those used during decision making: (1) Consider both positive and negative content. For example, the effects of treatment may include positive effects on pain and negative effects on sleep. (2) In general, respondents should not be asked to attribute the cause of their symptoms or experiences. It would be difficult, for example, for subjects to know whether their breathlessness was due to congestive heart failure as opposed to other causes, such as aging, anxiety, or infection. (3) In general, respondents should not be asked to rate change over time, but rather should be asked to evaluate their current state with an appropriate recall period. Change is then computed across evaluations. (4) Consider the method (self- versus interviewer-administered) or mode (paper-and-pen, electronic, voice response) of data collection early. Switching methods or modes of administration between development and use may require an additional validation step to assure score equivalence.
Hypothesized conceptual framework.

The considerations outlined above should lead to a list of the PROs of interest and the concepts and sub-concepts or domains comprising them. The disease model shown in Figure 1, for example, shows two possible PROs of interest: psoriasis symptoms and impacts. Within each of these general PROs are concepts and sub-concepts suggestive of instrument content, e.g., pain, itching, burning, etc. This information informs the development of the qualitative elicitation protocol and the interview or focus group discussion guide. As outlined below, the guide includes reference to what the interviewers might expect to hear and areas requiring greater clarity, with the understanding that new information may be uncovered, contributing to the conceptual focus and accuracy of the instrument.

An example of a conceptual framework for a PRO evaluating the concept of pain is shown in Figure 2. Note that the category of pain quality is divided into deep pain and surface pain. These two concepts are further divided into aspects of pain quality. This conceptual framework will help with the coding dictionary developed later in the process.

Figure 2 About Here
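To make the link between framework and coding dictionary concrete, the hierarchy of a conceptual framework can be represented as nested mappings from concepts to sub-concepts to candidate codes. The sketch below uses the deep pain/surface pain split described above, but the leaf-level descriptors (aching, burning, etc.) are hypothetical illustrations, not the actual content of Figure 2.

```python
# Sketch of a coding dictionary seeded from a conceptual framework.
# Top-level keys are concepts; nested dicts are sub-concepts; lists hold
# candidate codes. Leaf descriptors here are hypothetical examples only.
PAIN_CODING_DICTIONARY = {
    "pain quality": {
        "deep pain": ["aching", "throbbing", "cramping"],
        "surface pain": ["burning", "stinging", "itching"],
    },
    "pain intensity": {
        "severity": ["mild", "moderate", "severe"],
    },
}

def flatten_codes(dictionary, path=()):
    """Yield (concept path, code) pairs for assigning codes to transcript text."""
    for concept, children in dictionary.items():
        if isinstance(children, dict):
            # Descend into sub-concepts, extending the concept path.
            yield from flatten_codes(children, path + (concept,))
        else:
            for code in children:
                yield path + (concept,), code
```

In practice, coders would revise this structure as new concepts emerge from the transcripts, so that the final dictionary, like the conceptual framework itself, reflects patient input rather than only the hypothesized hierarchy.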
Good Practice 2: Develop the Research Protocol for Qualitative Concept Elicitation

The study protocol and interview guide provide documentation of the pre-specified plan for identifying the sample, conducting interviews or focus groups, and analyzing the data that will inform the content and structure of the new instrument. Contents of the study protocol include the study sample, data collection method, setting, materials and procedures, and analyses.
Study Sample

Demographic and clinical characteristics of the sample should match the target population, i.e., the intended clinical trial sample. For example, if clinical trials will include patients who have either psoriatic arthritis or plaque psoriasis, both types of patients are included in the qualitative study sample to allow the full range of comments and expressions to arise. Clinical sites and methods for participant recruitment should be selected with this goal in mind. When evaluating clinical sites and/or locations for possible participation, consideration should be given to geographic, educational, ethnic, and racial diversity, and to the availability of clinical information needed to characterize and evaluate sample characteristics in the final report.

Estimating the sample size for a qualitative study can be challenging. In quantitative research protocols, sample size is estimated using analytical techniques requiring projections of the magnitudes likely to be observed in the study (e.g., means, differences, variances, proportions, confidence intervals) together with the desired power and a significance criterion. In qualitative research, sample size estimation is based on projections of the data needed to reach “saturation.” Discussed further in Section 4, saturation is “the point at which no new concepts [relevant to the concept of interest] are forthcoming from the population being interviewed” (8). When the concept of interest is clearly defined and relatively narrow in scope and the target population is largely homogeneous, relatively few participants (e.g., 15 to 20) may be required to achieve saturation. In contrast, situations involving a very broad, poorly defined, or multidimensional concept or heterogeneous target populations will involve larger sample sizes (e.g., 40 or more). As noted in the FDA Guidance (2), “the number of patients is not as critical as interview quality and patient diversity included in the sample in relation to intended clinical trial population characteristics.”
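One common way to document saturation is to group interviews into consecutive waves and tabulate how many previously unseen concepts each wave contributes; saturation is supported when a wave contributes none. The sketch below assumes concepts have already been coded per wave, and the concept codes shown are hypothetical.

```python
# Sketch: tabulate new concepts per interview wave to document saturation.
# A wave contributing zero previously unseen concepts suggests saturation.
def saturation_table(waves):
    """waves: list of sets of concept codes elicited in each wave.
    Returns the count of concepts first appearing in each wave."""
    seen, new_counts = set(), []
    for wave in waves:
        new_counts.append(len(wave - seen))  # concepts not seen before
        seen |= wave
    return new_counts

# Hypothetical coded output from three waves of five interviews each.
waves = [
    {"itching", "burning", "pain", "scaling"},        # interviews 1-5
    {"itching", "flaking", "pain", "embarrassment"},  # interviews 6-10
    {"burning", "pain", "scaling"},                   # interviews 11-15
]
# saturation_table(waves) -> [4, 2, 0]: the third wave adds no new concepts.
```

A table of this form, reported alongside the transcripts it was derived from, is one way to present the evidence of saturation that reviewers look for.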
Data Collection Method

Individual interviews and focus groups are the data collection methods used in qualitative research involving concept elicitation for instrument development purposes (9). A summary of the advantages and disadvantages of these methods is shown in Table 2. Focus groups are economical and can stimulate discussion of topics and comparison of experiences across participants that cannot be captured in individual interviews (9-12). Unfortunately, there are also risks associated with focus groups, particularly when run by inexperienced or untrained leaders. One example is a highly vocal, assertive participant who dominates or leads the discussion, minimizing participation of other group members and resulting in content, tone, or perspectives that do not necessarily represent those of individuals or the group as a whole. Interviews are usually the best methodology for concepts that are sensitive or for target populations unlikely to volunteer or share information in a group setting (13). There are also disadvantages to individual interviews. For example, by design, interviews must be conducted sequentially or by multiple interviewers, both of which are more expensive and time consuming (14).

Table 2 About Here

Setting

Focus groups and interviews may be conducted in in-patient settings, out-patient clinics, or
dedicated research facilities. Interviews may also be conducted in participants' homes or, in some cases, over the telephone (e.g., for rare, episodic, or contagious conditions). The appropriate setting depends on the target population, including illness severity or contagiousness, physical mobility, psychological state, or other factors that would affect a person's ability to travel or participate. The setting should optimize the extent to which the sample is consistent with the target population by making participation accessible.
Materials and Procedures

The interview or focus group guide includes the questions that should be addressed and how the interviews or focus groups should unfold for optimal clarity and data quality. It is not a script to be read verbatim, but a manual that provides the interviewer with an organized summary of the topics to be discussed, specific questions for each topic, and sample probes that can be used to further explore areas when needed. Exploratory questions may be included to uncover features of the condition or its treatment that may not be well understood through previous research and clinical experience.
The specific content of the questions comprising the interview guide is dictated by the context of measurement, including the disease and endpoint models and the draft conceptual framework. For example, if pain is hypothesized as an important symptom in the disease model, the interview/focus group guide includes questions to understand the patient's experience of pain, which may include frequency, severity, duration, and/or impact. The reference timeframe, that is, the timeframe the participants are asked to consider as they respond to the questions, will also depend on the PRO and measurement context. For example, when developing a measure for chronic heart failure patients, participants may be asked to recall and describe a recent acute episode or hospitalization, or their experiences during the current day or week. In general, it is desirable for the reference timeframe to be as close as possible to the interview or focus group, in order to diminish recall errors and bias. One method, known as the day-reconstruction approach (15), can be used to focus a participant on a specific day as they describe symptoms, impacts, or other experiences relevant to the target concept.
Unless carefully worded and conducted, interview questions and procedures can introduce bias into the data. For example, certain closed-ended or highly specific questions can be leading, such as “You experienced pain in your knee today, right?” or “How depressed were you during this event?” Questions should be open-ended whenever possible and worded to encourage spontaneous information from the participant without pointing them toward a specific response. At the same time, open-ended questions that are too broad can be confusing to participants. “What was yesterday like for you?” or “Tell me about your condition,” for example, lack the specificity required for participants to address the concept of interest and can lead to irrelevant data. Open-ended questions should include parameters consistent with the concept of interest. If the concept of interest is knee pain, the interviewer could ask, “How did your knee feel yesterday?” with probes to better understand the nature and characteristics of the experience offered by the participant. This approach provides data on the words and phrases participants use to describe their condition that will inform instrument content.
Interview questions can also address multiple dimensions of a concept. For example, it may be useful to understand the severity, duration, and frequency of a particular symptom. Pain that is severe but doesn't last long or occur very often may be a very different experience from pain that is moderately severe but occurs frequently and lasts for a long period of time. Understanding these dimensions of an experience can be useful for developing a new instrument based on a complete picture of a participant's experience. The following list of questions shows how more specific symptom-related information might be obtained once the symptom has been elicited by more open-ended questioning: How often do you have (symptom X)? How severe is (symptom X)? How long does it usually last? Does anything make (symptom X) better or worse? Please tell me more about that. Do you have any other sensations or symptoms when you feel (symptom X)? Questions to elicit information about symptom impact might include: How do your symptoms affect or influence your everyday life? Probes might include: How does (symptom X) affect your daily activities? Does it affect your relationships with others? Tell me more about (the difficulty you have performing activity X).
Once a draft interview guide has been created, it is reviewed by other qualitative researchers for possible difficulties in flow, redundancy, poorly formulated questions, and the appropriate use of terminology and probes. The draft guide should be pretested with study-naïve individuals or colleagues or, ideally, pilot tested in the target population to identify areas that do not flow easily or may confuse respondents before primary data collection begins.
Analyses

As with a clinical trial, the interview protocol should also include a plan for analyzing, summarizing, and interpreting the interview data. Unlike quantitative analyses, there are no inferential statistical tests involved. Rather, this portion of the protocol describes the methods that will be used to identify, code, and summarize themes, the procedures for quality control, and the methods for determining and documenting saturation. Qualitative analyses are discussed further in Section 4.
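One element such an analysis plan can pre-specify is how coded themes will be summarized, for example by tabulating how many participants expressed each theme. A minimal sketch follows; the participant codings are hypothetical, and in practice coding is performed by trained coders working from the transcripts.

```python
from collections import Counter

# Sketch: summarize how many participants expressed each coded theme.
# The codings below are hypothetical illustrations.
def theme_frequencies(codings):
    """codings: dict mapping participant id -> collection of theme codes.
    Returns a Counter of participants-per-theme."""
    counts = Counter()
    for themes in codings.values():
        counts.update(set(themes))  # count each theme once per participant
    return counts

codings = {
    "P01": {"itching", "sleep disturbance"},
    "P02": {"itching", "embarrassment"},
    "P03": {"itching", "sleep disturbance", "pain"},
}
# theme_frequencies(codings)["itching"] -> 3 (all three participants)
```

Frequency tables of this kind describe rather than infer; they are reported alongside supporting quotations, not subjected to statistical testing.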
Good Practice 3: Conduct the Concept Elicitation Interviews and Focus Groups

The research protocol must be reviewed and approved by an appropriate institutional review board prior to the initiation of subject recruitment and data collection. Sites are provided a copy of the study protocol and trained on inclusion/exclusion criteria, sample monitoring, recruitment processes, and informed consent procedures.
Interviewers and focus group facilitators should be experienced in qualitative research methods and trained on the background and objectives of the protocol. Mock interviews or focus groups may be used to help the interviewers/facilitators develop a complete understanding of the questions and process and assure a smooth, clear data collection process. Sustained interaction with interviewers is important to establishing and maintaining the quality of data collection.
Core competencies in concept elicitation interviewing are shown in Table 3. The concept elicitation process is intentionally broad in order to explore and define information from the perspective of the patient. A well-constructed interview guide defines the broad territory of discussion, leaving no need for the interviewer to censor or discount participant responses. Although discipline is needed to keep the participant or focus group “on task,” interviewers should avoid being overzealous in assuming irrelevance, favoring an open dialogue among participants to encourage participation. Interviewers should be aware that their body language and actions, such as nodding in agreement, frowning, or sighing, can communicate approval or disapproval of the participant's contribution, altering the content or emphasis of subsequent information. Interviewers should remain neutral while conveying genuine interest, to encourage open and honest communication. The hallmark of interviewer skill is the ability to get the participant to talk about the areas and topics of interest in a natural conversational engagement, where they feel they are being heard and respected.
Table 3 About Here
304
Concept elicitation interviews and focus groups are recorded (either audio or video) to fully capture the context and content and to produce the transcripts that form the data for analysis. Audio recordings are generally preferred because they are easier to make and transcribe, facilitate participant anonymity, and are generally more comfortable for participants, particularly when sensitive topics are being discussed. Regardless of recording method, participants need to be assured of the confidentiality and limited usage of the recorded materials from their interviews. In addition to being essential for data analyses, recordings can be monitored for quality assurance by a senior interviewer who provides feedback to the interviewer to maintain or improve the quality of data collection throughout the duration of the study by improving question clarity, altering probes, and/or pursuing specific aspects in greater detail.
Recording frees the interviewer or moderator from note taking, allowing full engagement with the participant(s). For focus groups, an assistant moderator is often useful to observe the group and take notes to facilitate data interpretation. These notes include a seating chart with participant initials and key points associated with those initials, which also helps in checking the transcriptions of focus group recordings.
Transcriptions of the audio/video recordings need to be verbatim and reviewed, quality checked, and cleaned by the facilitators/interviewers and associates. Cleaning includes: (a) removal of any personal identifiers; (b) correction of any medical terms that the transcribers misspelled or did not recognize; and (c) removal of any clearly extraneous narrative (for example, the participant answers a cell phone or a nurse walks in with a message). Dialogue that is related but not central to the purpose of the interview can be retained in the transcript and separated during the coding process to document the irrelevance of the information for data analyses. Transcript quality is assessed through direct comparison of audio and transcript files, generally performed on a random sample. Once transcripts have been quality checked and cleaned, qualitative analysis begins.
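Part of cleaning step (a) can be made mechanical. The sketch below, which is purely illustrative and not part of the Task Force recommendations, redacts identifiers that transcribers have flagged using a hypothetical double-angle-bracket tagging convention; real studies will have their own transcription conventions, and steps (b) and (c) still require human review.

```python
# Illustrative sketch: redact transcriber-flagged personal identifiers.
# The "<<name:...>>" tagging convention and the example line are invented.
import re

RAW_ID = re.compile(r"<<(?:name|place|date):[^>]*>>", re.IGNORECASE)

def redact(line):
    """Replace transcriber-flagged identifiers with a neutral placeholder."""
    return RAW_ID.sub("[REDACTED]", line)

line = "Interviewer: So, <<name:John Smith>>, when did the pain start?"
print(redact(line))  # -> Interviewer: So, [REDACTED], when did the pain start?
```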
Good Practice 4: Analyze the Data

Analyze according to the theoretical approach. There are multiple theoretical approaches and methodologies that can be applied to qualitative research procedures and data analyses, including phenomenology, grounded theory, content analysis, and thematic analysis (16-19). In qualitative research to inform instrument development, data collection and analyses are interrelated and concurrent rather than linear: "Analysis is the interplay between researcher and data. It is both science and art" (20). All of these approaches are idiographic (focused on the individual), in contrast to a quantitative nomothetic paradigm (focused on the general) founded in positivism (21). Across all qualitative methods, the purpose is to understand participant perspectives and experiences through "decontextualisation" (i.e., assigning codes) and "recontextualisation" (i.e., reducing the data around central themes).
Phenomenology as an overarching theoretical framework and grounded theory as a specific methodology have been proposed as most appropriate for the development of a new PRO instrument (18). An adaptation of grounded theory has also been proposed (14) that allows for the use of prior knowledge in the analysis of data. This added deductive element in an otherwise inductive approach is consistent with the need to draw from existing information, pulled together as part of context-of-measurement development (Step 1), to identify themes and concepts in the data and to interpret the results in light of the ultimate goal: to develop a new PRO instrument for a specific use. This approach also permits moving back and forth between hypothetico-deductive and inductive reasoning, where the developer's understanding can change based on new information and/or observations, resulting in an iterative process of instrument development.
It is important to clearly describe how the data were analyzed, i.e., what was done and why. Existing guidelines for performing qualitative research can aid in structuring the description, evaluating the process used, and determining how best to present and discuss results (22-26).
Coding qualitative data for instrument development

The primary goal of transcript coding is to organize and catalog participants' descriptions of their experiences within the context of measurement. The coding processes of different qualitative approaches share methodologies for decontextualisation and recontextualisation, even when the coding focus differs. For example, with a phenomenological approach, one can identify descriptions of the phenomenon that are universal; with grounded theory, one can use open coding (examining, comparing, conceptualizing, and categorizing data), axial coding (reassembling data into groupings based on relationships and patterns within and among the categories identified in the data), and selective coding (identifying and describing the central phenomenon, or "core category").
The "coding framework" is an initial structure or organization of codes for grouping clusters of information that form a coherent theoretical unit. This framework is based on the disease model and draft conceptual framework developed at the onset of the work. A preliminary coding framework is developed and revised during data analyses based on information and insight gained during data review, including the development of new codes to represent clusters of new information. Data coding is an iterative process and should include opportunities for the data to be re-examined and re-analyzed until no new codes are identified and all relevant concepts have been assigned one or more codes.

Figure 3 shows the various inputs into the development of a coding framework (the structure that holds the codes) and the completed "coding dictionary" (the document inclusive of all codes assigned, with definitions as appropriate for standardization, clarity, and communication). A coding framework provides patient-based insight into the relevance of concepts included in the disease model and conceptual framework. A coding dictionary is used to assure consistency in coding across data analysts or coders and to document and communicate the meaning of the codes to external reviewers.

Figure 3 About Here
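To make the relationship between a coding dictionary and code assignments concrete, the two structures can be sketched in code. This is an illustrative sketch only, not a Task Force recommendation; the code names, definitions, and quotes are invented.

```python
# Illustrative sketch: a minimal coding dictionary plus code assignments,
# with a consistency check that every assigned code has a shared definition.
# All codes, definitions, and quotes are invented for illustration.

coding_dictionary = {
    "PAIN": "Any mention of physical pain",
    "PAIN.KNEELING": "Pain specifically associated with kneeling",
    "FATIGUE": "Tiredness, exhaustion, or lack of energy",
}

# Code assignments: (transcript id, participant quote, assigned code).
assignments = [
    ("T01", "I am always in pain when I kneel", "PAIN.KNEELING"),
    ("T01", "by evening I have no energy left", "FATIGUE"),
    ("T02", "my knees ache all day", "PAIN"),
]

def undefined_codes(assignments, dictionary):
    """Codes used in assignments but missing from the dictionary; a basic
    quality-control check before coding continues."""
    return sorted({code for _, _, code in assignments if code not in dictionary})

print(undefined_codes(assignments, coding_dictionary))  # -> []
```

Keeping the dictionary as the single source of code definitions mirrors its documented purpose: consistency across coders and communication with external reviewers.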
Presentation of the coded qualitative data is intended both to identify the predominance of participants expressing the concepts and to describe the language that participants use to talk about those concepts. Depending on the qualitative approach, the presentation of codes and themes might differ. A thematic "map," i.e., an overall conceptualization of the data patterns and the relationships between them, is produced when thematic analysis is used (19).
Computer-assisted qualitative data analysis software programs, such as ATLAS.ti (27), can be used to organize the data and coding scheme for easier retrieval and analyses. These programs do not assign codes to the data; skilled decision making is still needed to allocate participant expressions of concepts to the appropriate code.
Assessing Saturation

Best practice is to code and assess saturation at multiple points during the data collection process. Data should be transcribed and coded on a rolling basis, with regular intervals of assessment to evaluate the consistency of the code assignment process and the adequacy of the coding framework, and to monitor the appearance and organization of newly appearing concept codes. Careful monitoring during the coding process and a phased approach to assessing saturation provide the researcher with insight into the data as the study progresses and an opportunity to return to the field for comprehensiveness or clarity.
To assess saturation of concept, transcripts and coding can be evaluated after each set of 5 to 8 interview or focus group transcripts becomes available. A saturation table is used to track either the first appearance of concepts or all occurrences of each concept across the transcript groups. Data are examined for the continued identification of new concepts (newly appearing codes) or for codes requiring further examination to confirm relevance or the attainment of saturation.

Codes identified in each successive set of transcripts are compared with the codes that appeared in the previous groups. In the best-case scenario, saturation is documented by showing that no new concepts arise in the last several interviews or the final focus group. In reality, it is not uncommon for a new concept to arise late in the data collection process. Scientific judgment, including knowledge of the field and consultation with experts, is used to determine whether this new concept is an outlier, i.e., a relevant but unusual case, and further judgment is required to determine whether additional data collection is required or warranted to re-assess saturation following this late revelation.
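The wave-by-wave bookkeeping behind a saturation table can be sketched as a small program. This is a hypothetical illustration under invented data; in practice the final saturation judgment also rests on scientific judgment, not code counts alone.

```python
# Illustrative sketch of a saturation table: codes observed in successive
# groups ("waves") of 5-8 transcripts. Saturation is suggested when the
# final wave contributes no codes not already seen. Wave contents invented.
waves = [
    {"PAIN", "FATIGUE", "SLEEP"},   # transcripts 1-5
    {"PAIN", "FATIGUE", "MOOD"},    # transcripts 6-10
    {"PAIN", "SLEEP", "MOOD"},      # transcripts 11-15
]

def new_codes_per_wave(waves):
    """For each wave, the codes appearing for the first time."""
    seen, new_by_wave = set(), []
    for wave in waves:
        new_by_wave.append(sorted(wave - seen))
        seen |= wave
    return new_by_wave

def saturation_reached(waves):
    """True if the final wave introduced no new codes."""
    return new_codes_per_wave(waves)[-1] == []

print(new_codes_per_wave(waves))  # the third wave adds no new codes
print(saturation_reached(waves))  # -> True
```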
Multiple coders

Best practice in analyses of qualitative data from elicitation interviews involves two or more coders. Each coder is carefully trained on the purpose of the study, the target concept, the nature of the data itself, the coding framework, and the coding dictionary. Each coder completes 1-2 transcripts, and the coders then meet to compare the codes assigned, identify areas of consistency and inconsistency, reconcile the codes on these transcripts, and revise the coding framework and dictionary for clarity and to enhance consistency in subsequent transcript coding. This process is repeated regularly throughout the coding process. An agreement is defined as a set of words or phrases identified as reflecting the same code and/or sub-code. Given the nature of qualitative data, flexibility is permitted around the words that constitute the word set or phrases. For example, two coders assigning the codes "pain" and "pain with kneeling" to the transcript text "You know I am always in pain when I kneel" would be considered in agreement, even though one code is more specific than the other.
Assuring coding precision can take several forms. One approach is to have a "super coder" review all data to assure consistency across coders. A second approach is to draw a random selection of transcripts that are dually coded and assess inter-rater agreement. Through discussion of coding and reconciliation when disagreement between coders is uncovered, greater than 90% agreement can be reached. These methods are similar to those used for interviewer-coded audio recordings of psychiatric rating scales, where inter-rater agreement is critical and is assessed until it reaches 90% or higher (28). Regardless of the approach used, the coding method and procedures for quality assurance should be carefully documented.
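The percent-agreement calculation, including the flexibility that lets "pain" and "pain with kneeling" count as agreement, can be sketched as follows. The matching rule (a simple prefix comparison) and the segment pairs are invented for illustration; real reconciliation relies on coder discussion rather than string matching.

```python
# Illustrative sketch: percent agreement between two coders on dually coded
# segments, where a more specific code counts as agreeing with its parent
# code, mirroring the "pain" / "pain with kneeling" example in the text.
pairs = [
    ("pain", "pain with kneeling"),  # counted as agreement
    ("fatigue", "fatigue"),          # exact agreement
    ("sleep", "mood"),               # disagreement, to be reconciled
]

def agree(code_a, code_b):
    """Agreement if one code is the same as, or a refinement of, the other."""
    return code_a.startswith(code_b) or code_b.startswith(code_a)

def percent_agreement(pairs):
    hits = sum(agree(a, b) for a, b in pairs)
    return 100.0 * hits / len(pairs)

print(round(percent_agreement(pairs), 1))  # -> 66.7
```

In practice disagreements such as the third pair are discussed and reconciled, and the calculation is repeated until agreement exceeds the chosen threshold (e.g., 90%).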
Multi-Vectored Analysis of Qualitative Data

Analyzing qualitative data is a multi-vectored assessment in which different vectors of information are gained throughout the qualitative interview process. These often include: pre-specified concepts (symptoms, signs, limitations, worries, impacts, etc.); concepts participants report spontaneously versus those they recognize when probed; the predominant language participants use to express various concepts; the variability in experience around concepts; the most meaningful way to address concepts (attributes of frequency, severity, or duration); and the degree of difficulty, bother, and/or impact.
The selection of any one format, focus, or analytic approach depends on the purpose of the study. For example, an exploratory analysis aiming to elicit concepts for theory development might focus more on presenting information vectors such as "relevant concepts" and "patient language." In contrast, analyses for instrument development require a focus on the information vectors needed to successfully craft items, response options, instructions, and recall, such as the "attributes" and "variability" associated with the target concept. Each information vector can have one or more uses and can be presented for assessment in a number of formats (e.g., by content, by predominance, by actual scores, or by proportion), depending on the type of inference to be drawn and the framework and analytical method chosen. This multi-vectored approach is illustrated in Figure 4.
Figure 4 About Here

The language in participant quotes provides a rich picture of the participants' experiences with the target concept. In qualitative research for instrument development, the goal is to understand, organize, and communicate the meaning of the data and translate that meaning into a quantitative measure. The analysis of qualitative data is not quantitative; there is no effect size, significance level, or other quantitative metric. The goal of qualitative analysis is to understand and communicate the meaning embedded in a dataset comprised of words and phrases. This is done by analyzing, organizing, and summarizing the data in a manner that shows the relationship between the concepts, the words and phrases, and the final PRO instrument. Because each vector of information involving patient input contributes a unique aspect of understanding and communication, the use of multiple vectors of information provides an instrument developer with greater confidence that the concept is understood and that the instrument adequately expresses this understanding.
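One of the vectors described above, the predominance of each concept split by spontaneous versus probed mentions, can be tabulated as in the following sketch. The participants, concepts, and mention records are invented for illustration.

```python
# Illustrative sketch: summarizing the "predominance" vector from coded
# transcripts, split by spontaneous versus probed mentions. Data invented.
from collections import defaultdict

# (participant id, concept, mention type) records from coded transcripts.
mentions = [
    ("P1", "pain", "spontaneous"),
    ("P2", "pain", "spontaneous"),
    ("P3", "pain", "probed"),
    ("P1", "fatigue", "probed"),
    ("P3", "fatigue", "spontaneous"),
]

def predominance(mentions, n_participants):
    """Per concept: distinct participants mentioning it spontaneously,
    when probed, and the overall proportion of the sample."""
    table = defaultdict(lambda: {"spontaneous": set(), "probed": set()})
    for pid, concept, kind in mentions:
        table[concept][kind].add(pid)
    return {
        concept: {
            "spontaneous": len(kinds["spontaneous"]),
            "probed": len(kinds["probed"]),
            "any_pct": 100.0 * len(kinds["spontaneous"] | kinds["probed"]) / n_participants,
        }
        for concept, kinds in table.items()
    }

summary = predominance(mentions, n_participants=3)
print(summary["pain"])     # 2 spontaneous, 1 probed, 100% of the sample
print(summary["fatigue"])  # 1 spontaneous, 1 probed
```

Such a tabulation supports only one vector; the qualitative interpretation of language and meaning remains the analyst's task.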
Good Practice 5: Document Concept Development and Elicitation

The FDA PRO Guidance lists the information to be provided by sponsors in PRO dossiers in an appendix ((2), pp. 35-39). The FDA Guidance proposes an order and taxonomy based on the wheel-and-spokes diagram, which provides a logical flow for organizing the report to support the PRO instrument being submitted for review in relation to the claims desired and the development process ((2), p. 7). For both FDA and EMA reviews, documentation begins with the PRO instrument to be reviewed, followed by a description of the steps used to identify concepts and create the instrument.
Concept elicitation methods are part of the evidence supporting content validity, as recommended in the first two spokes of the FDA diagram. Essential documentation of content validity includes both the concept elicitation discussed in this paper and the cognitive interviewing discussed in the next (Part II). This qualitative evidence may be accompanied by supplementary quantitative evidence that confirms or revises the proposed conceptual framework. Essentially, the early content validity documentation provides evidence that the proposed instrument captures the most important concepts as viewed by the target population, and that the concepts are complete and relevant to persons in the target population. This evidence is specific to the planned clinical trial population and indication, i.e., the context of measurement.
Consistent with the FDA Guidance, documentation of the concept elicitation phase of instrument development includes the following elements:

• Target claims and description of the target population (i.e., from the Target Product Profile)
• The preliminary and final disease model
• The underlying endpoint model
• The preliminary and revised conceptual framework for the PRO instrument, based on qualitative studies conducted prior to testing of measurement properties
• The literature review and documentation of expert input
• Qualitative study methods and results, including protocols, interview guides, and results
• Evidence of saturation
• The origin and derivation of concepts captured in the PRO instrument
• A summary of qualitative data supporting the concepts, items, response options, and recall period
Organizing the document in a manner consistent with recommendations contained in the FDA
479
PRO Guidance makes it easier for reviewers to determine if the essential elements of qualitative
480
development of a new PRO instrument are included in a submitted dossier. Further recommendations on
481
documentation of item wording, cognitive interviewing and the final item tracking matrix prior to
482
quantitative evaluation are contained in the following manuscript.
Conclusion

This paper outlines the steps needed to derive a new PRO instrument for use in medical product development trials evaluating the benefits and risks of treatment. The paper covers the steps of concept elicitation, from determining, defining, and documenting the context of measurement to the analyses of qualitative data from interviews and focus groups and the documentation of the methods and results of this work. Examples have been provided to clarify specific steps and to inform the development of the documentation needed to support the content validity of the new measure. Paper II of this two-part task force report covers the creation of the new PRO instrument, the evaluation of its clarity and content validity through cognitive interviewing, and the documentation of this work for medical product evaluation.
REFERENCES
1. Rothman M, Burke L, Erickson P, Leidy NK, Patrick DL, Petrie CD. Use of existing patient-reported outcome (PRO) instruments and their modification: the ISPOR Good Research Practices for Evaluating and Documenting Content Validity for the Use of Existing Instruments and Their Modification PRO Task Force report. Value Health 2009 Sep 25.
2. U.S. Department of Health and Human Services, FDA. Guidance for Industry. Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. 2009 [cited 2010 Dec 29]. Available from: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM193282.pdf
3. European Medicines Agency. Reflection Paper on the Regulatory Guidance for the Use of Health-Related Quality of Life (HRQL) Measures in the Evaluation of Medicinal Products. London: European Medicines Agency; 2004 [cited 2010 Dec 29]. EMEA/CHMP/EWP/139391/2004. Available from: http://www.ema.europa.eu/pdfs/human/ewp/13939104en.pdf
4. Acquadro C, Berzon R, Dubois D, Leidy NK, Marquis P, Revicki D, Rothman M. Incorporating the patient's perspective into drug development and communication: an ad hoc task force report of the Patient-Reported Outcomes (PRO) Harmonization Group meeting at the Food and Drug Administration, February 16, 2001. Value Health 2003 Sep-Oct;6(5):522-31.
5. Patrick DL, Burke LB, Powers JH, Scott JA, Rock EP, Dawisha S, O'Neill R, Kennedy DL. Patient-reported outcomes to support medical product labeling claims: FDA perspective. Value Health 2007 Nov-Dec;10 Suppl 2:S125-37.
6. U.S. Department of Health and Human Services, FDA. Guidance for Industry. Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. 2009.
7. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: AERA; 1999.
8. Guest G, Bunce A, Johnson L. How many interviews are enough? An experiment with data saturation and variability. Field Methods 2006;18(1):59-82.
9. Lehoux P, Poland B, Daudelin G. Focus group research and "the patient's view". Soc Sci Med 2006 Oct;63(8):2091-104.
10. Kitzinger J. Qualitative research: introducing focus groups. Brit Med J 1995;311:299-302.
11. Hollander J. The social contexts of focus groups. J Contemp Ethnogr 2004;33:602-37.
12. Smithson J. Using and analysing focus groups: limitations and possibilities. Int J Social Research Methodology 2000;3:103-19.
13. Gubrium JF, Holstein JA, editors. Handbook of Interview Research: Context and Method. Thousand Oaks, CA: Sage; 2002.
14. Brod M, Tesler LE, Christensen TL. Qualitative research and content validity: developing best practices based on science and experience. Qual Life Res 2009 Sep 27.
15. Kahneman D, Krueger AB, Schkade DA, Schwarz N, Stone AA. A survey method for characterizing daily life experience: the day reconstruction method. Science 2004 Dec 3;306(5702):1776-80.
16. Denzin NK, Lincoln YS, editors. The SAGE Handbook of Qualitative Research. 3rd ed. Thousand Oaks, London, and New Delhi: Sage Publications; 2005.
17. Starks H, Trinidad SB. Choose your method: a comparison of phenomenology, discourse analysis, and grounded theory. Qual Health Res 2007;17:1372-80.
18. Lasch KE, Marquis P, Vigneux M, Abetz L, Arnould B, Bayliss M, Crawford B, Rosa K. PRO development: rigorous qualitative research as the crucial foundation. Qual Life Res 2010 May 30.
19. Braun V, Clarke V. Using thematic analysis in psychology. Qualitative Research in Psychology 2006;3(2):77-101.
20. Strauss A, Corbin J. Basics of Qualitative Research. Newbury Park, CA: Sage; 1990.
21. Ponterotto JG. Qualitative research in counseling psychology: a primer on research paradigms and philosophy of science. J Counseling Psychology 2005;52(2):126-36.
22. Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care 2007 Dec;19(6):349-57.
23. Mays N, Pope C. Qualitative research in health care: assessing quality in qualitative research. BMJ 2000 Jan 1;320(7226):50-2.
24. Elliott J. Using Narrative in Social Research: Qualitative and Quantitative Approaches. London: Sage Publications; 2005.
25. Cochrane Qualitative Research Methods Group. Available from: http://www.joannabriggs.edu.au/cqrmg/about.html
26. The British Psychological Society. [cited 2010 Dec 28]. Available from: http://www.bpsjournals.co.uk/journals/joop/qualitative-guidelines.cfm
27. Muhr T. User's Manual for ATLAS.ti 5.0. Berlin: ATLAS.ti Scientific Software Development GmbH; 2004.
28. Overall JE, Gorham DR. The brief psychiatric rating scale. Psychol Rep 1962 Nov;10:799-812.
Table 1
Five Steps to Elicit Concepts for New Patient-Reported Outcome Instruments and Document Content Validity Consistent with Good Research Practices*

1. Determine the context of measurement
• Develop a hypothesized disease model based on literature, experts, and patients
• Name and define the concept within the context of a clinical trial end-point model
• Select and define the target population
• Conduct a literature review, prepare a list of candidate items from the disease model and existing instruments addressing the same concept, and consult content experts
• Select the target cultural/language groups
• Make preliminary decisions on instrument content and structure
• Develop a hypothesized conceptual framework for the instrument

2. Develop the research protocol for qualitative concept elicitation
• Define the target sample characteristics
• Select the data collection method: focus groups, individual interviews, or both
• Determine the setting and location for data collection
• Develop the interview guide: draft, pilot, revise
• Determine quality control procedures for data collection and monitoring
• Develop a preliminary qualitative analysis plan

3. Conduct the concept elicitation interviews and focus groups
• Obtain IRB approval
• Recruit and train sites
• Recruit participants; monitor sample characteristics to assure representation
• Select and train interviewers
• Conduct interviews; implement quality control measures
• Record or videotape interviews
• Transcribe and clean transcripts

4. Analyze qualitative data
• Analyze qualitative data according to the theoretical approach used
• Establish a preliminary coding framework; update as data are coded
• Establish coding procedures and train coders
• Organize data using a qualitative research software program
• Assess saturation
• Interpret results

5. Document concept development and elicitation methodology and results
• Provide context for use
• Specify and define the concept
• Denote the target claims and population
• Provide a disease model and an endpoint model
• Provide supporting documentation for the concept
• Show the original and revised conceptual framework
• Summarize the literature review
• Document input from content experts
• Present the methods and results of qualitative research
• Provide clear evidence of saturation

*Steps to develop an instrument, evaluate the new measure through cognitive interviewing, and document that aspect of content validity are addressed in Part II of the Task Force report.
Table 2
Focus Groups and Interviews: Advantages and Disadvantages

Focus groups: advantages
• Allows individuals to use ideas of others as cues to express their own views
• Participants can compare their experiences with others
• Able to reach many participants at once

Focus groups: disadvantages
• Data can be tough to analyze because talking can be in reaction to the comments of other group members
• Moderators need to be highly trained and able to lead the group
• One strong group member can sway the tone of the entire group
• May be more costly (e.g., travel, room rental, transcription fees)

Interviews: advantages
• Rich source of data; more in-depth and detailed information about an individual's experience
• Can be useful for sensitive topics
• Data can be easier to analyze
• Scheduling can be easier

Interviews: disadvantages
• It may take longer to collect the data
• Limited to one participant's view at a time; no peer comparison
• Interviewers need to be trained with excellent one-on-one communication skills
Table 3
Form for Evaluating Core Competencies in Concept Elicitation Interviewing
(Columns: Focus of evaluation | Criteria met? Yes/No | Issues found in evaluation | Remedies needed)

PREPARATION to start interview with subject
• Prepared for interview; familiar with interview content, mechanics of worksheets, tallies, record keeping
• Demonstrates understanding of interview content; comprehends patient response in context of goals

PROTOCOLS followed as identified in interview guide
• Identified primary purpose of interview/focus group to subject(s)
• Adhered to interview guide
• Covered all probe content
• Allowed time for participant to spontaneously respond to probes before offering examples
• Thoroughly explored responses to probes
• Asked for additional comments at completion of interview

COMPETENCIES demonstrated in conduct of interview
• Responsive to subject's lack of understanding of a question/topic; able to reframe the question for participant understanding
• Allowed subject time to respond without interrupting/rushing the subject
• Offered a minimum of 3 examples where needed
• Maintained control of the interview (keeping subject on topic; familiar with interview logistics)
• Stayed neutral; avoided confirming subject's responses
• Actively promoted in-depth responses from subject
• Avoided leading questions
• Used participant language
• Recognized when a symptom had already been explored spontaneously by the participant

GENERAL COMMENTS
• Overall feeling from the interview
• Things done well
• Things that need improvement

Printed with permission of M. L. Martin, Health Research Associates
Figure 2: Example conceptual framework for a PRO evaluating the concept of pain quality

Concept: Pain quality
• Sub-concept: Deep pain. Items: aching, dull
• Sub-concept: Surface pain. Items: itchy, numb, tingling
Figure 3: From coding framework to coding dictionary

Inputs: literature, clinical experts, interview guide, hypothesized TPP, PRO instrument review, hypothesized conceptual framework, and patient interview results.
Coding framework: the starting structure for concepts, expanded to add more concepts as data are reviewed.
Coding dictionary: contains all codes assigned, grouped by concept, drawing on all code assignments from the transcripts.

Printed with permission of M. L. Martin, Health Research Associates
Figure 4: Multi-Vectored Approach to Understanding Qualitative Data

Information vectors: patient language; attribute to measure; relevance of concept; variability; degree of bother; degree of difficulty; necessary coverage (degree of importance to the patient); and essential coverage (degree of importance clinically and to the measurement strategy).
Interpretation and meaning: presenting the relation of qualitative results to concept wording, with the most meaningful and appropriate measurement design, response options, and recall period.