Establishing and Reporting Evidence of the Content Validity of Newly-Developed Patient-Reported Outcome (PRO) Instruments for Medical Product Evaluation: Good Research Practices
Donald L. Patrick PhD, MSPH, Laurie B. Burke RPh, MPH, Chad Gwaltney PhD, Nancy Kline Leidy PhD, Mona L. Martin RN, MPA, Lena Ring PhD
Part I Developing Content for a New PRO Instrument
RUNNING TITLE: Developing Content for a New PRO Instrument
CORRESPONDING AUTHOR: Donald L. Patrick PhD, MSPH, University of Washington, Box 359455, Seattle, Washington 98195-9455
[email protected]
KEY WORDS: content validity; patient reported outcomes; FDA; EMA; quality of life

Authors are listed in alphabetical order by surname after the senior author. The views expressed herein represent those of the authors and not those of the University of Washington, Food and Drug Administration, PRO Consulting, United BioSource, Health Research Associates, AstraZeneca, or Uppsala University.
ISPOR PRO Task Force: Content Validity Part I

ABSTRACT

Background: A patient-reported outcome (PRO) instrument is a means to capture data for assessing treatment benefit or risk in medical product evaluation. Two articles in this issue present conclusions of an ISPOR task force convened to address good research practices for documenting content validity in newly developed PRO instruments. Content validity is the extent to which the content of a new PRO instrument adequately represents a given concept or set of concepts. We use the specific context of a PRO instrument newly developed to support PRO claims in medical product labeling. Paper I outlines steps for gathering and presenting qualitative evidence to support the inclusion of concepts in the new instrument. Paper II addresses how to gather evidence that persons in the target population understand the content of the new instrument. Both papers present suggestions for documenting the chosen qualitative theoretical approach, methods, results, and conclusions. Adequate qualitative evidence is critical to ensure PRO instrument content validity. These papers do not address methods that mix qualitative and quantitative approaches to establishing content validity; however, the same qualitative research principles apply. Mixed qualitative and quantitative approaches to content validity testing will be addressed in future papers.

Methods for Paper I: Five good practices consistent with U.S. and European review processes are addressed in chronological order: (1) plan the context of measurement; (2) develop the protocol for qualitative concept elicitation; (3) conduct concept elicitation interviews and/or focus groups; (4) analyze the qualitative data from concept elicitation; and (5) document concept development and elicitation. Illustrations are given of suggested ways to collect and present evidence.
Results and Conclusions of Paper I: Using qualitative evidence to support content validity requires a clear understanding of the actual intervention study design (e.g., the entry criteria for the clinical trial population) and the targeted context of measurement. Qualitative research applies to all PRO instruments used to support labeling claims and must be completed well before confirmatory (Phase III) trials are initiated to allow time for instrument finalization. The qualitative study protocols address a broad range of target population demographics and characteristics and include a plan for data analyses. Conducting interviews or focus groups requires trained interviewers with appropriate quality controls. Qualitative analyses require trained coders, demonstration of saturation, and clearly presented results supported by the transcripts of audio recordings. Detailed documentation of the entire concept elicitation process provides the body of evidence to support conclusions drawn by qualitative researchers that the instrument measures a certain concept. The evidence must support that patients' responses, expressed in the language of the instrument items, correspond to the concept reflected by the instrument score(s) and that the concept is adequately covered by the instrument. The detailed documentation is also reviewed in a regulatory setting to determine whether medical product claims are truthful and not misleading when the instrument is used in an outcomes trial to measure treatment impact.
Background
The ISPOR Health Science Policy Council and the ISPOR Board of Directors recommended that an ISPOR Task Force be established on Good Practices in Establishing and Reporting Evidence of the Content Validity of Newly-Developed Patient-Reported Outcomes (PRO) Instruments for Medical Product Evaluation. The purpose of this task force was to extend the work of a previously published report on the use of existing or modified PRO instruments to support medical product labeling claims (1) by addressing methods for assuring and documenting the content validity of newly-developed PRO instruments.

The chair of this task force (Donald L. Patrick, PhD) recruited members based on their experience as scientific leaders and practitioners in the field, as well as developers and users of PRO instruments. A range of perspectives on PRO instruments was provided by the diversity of their work experience: academia, government, research organizations, and industry. In addition, forty-seven members of the ISPOR Patient Reported Outcomes Review Group provided written comments on the draft reports, and oral feedback was provided at the PRO Forum held during the ISPOR 15th Annual International Meeting in Atlanta. The task force met regularly via conference calls and held one face-to-face meeting.

During content and outline development, the task force decided two papers would be needed: Part I covers the development of content for a new PRO instrument, i.e., concept identification to inform content and structure using qualitative focus group and interview methodology, while Part II covers item development and the assessment of patient understanding of the draft instrument using cognitive interviews and steps for instrument revision. The two parts are meant to be read together. Rather than being prescriptive, they are intended to offer suggestions for good practices in planning, executing, and documenting the process of content validation of PRO instruments to be used in medical product evaluation.
PART I

Developing Content for a New PRO Instrument
Definition of Terms
The term “PRO” is often used interchangeably to refer to a concept, instrument, questionnaire, score, or claim. According to the FDA Guidance, a patient-reported outcome (PRO) is “any report of the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else” (2). In Europe, the European Medicines Agency released a reflection paper (3) on the place of health-related quality of life (HRQL) in medical product development, specifying that HRQL is one type of PRO. This task force report uses the term “PRO” to refer to the general concept or outcome of interest. “PRO” serves as the umbrella term covering all patient-reported outcomes, with HRQL one specific type (4). A PRO instrument or measure is a means to collect data. Questionnaires and diaries are examples of PRO instruments. The term instrument refers to item content (stem and response options), instructions, and recall period. PRO scores are numeric values or categorical assignments generated through the use of a PRO instrument and used to represent the PRO of interest.

In medical product development, PRO instruments may be used in clinical trials to capture and quantify treatment benefit or risk (5, 6), with the possibility that this information will be used to support a “claim” in medical product labeling. Within this context, it is useful to distinguish the PRO concept, claim, instrument, and score (5). For example, pain intensity is a PRO (the concept); decrease in pain intensity is a PRO claim; a 10-centimeter visual analog scale (VAS) assessing pain intensity, including the anchors, instructions, and recall period, is a PRO instrument; and the value a subject assigns to their pain intensity on the VAS is a PRO score.

Content validity is the extent to which the content of an instrument represents the most important aspects of a given concept (7), in this case, the extent to which it represents the PRO. In the FDA Guidance on PRO Measurement, content validity is defined by the empirical evidence showing that the items and domains of an instrument are appropriate and comprehensive relative to its intended measurement concept, population, and use (2).

Qualitative data are essential for establishing the content validity of a PRO instrument. Quantitative data, including factor analysis and item response theory (IRT) analyses, may be supportive but are insufficient on their own to document content validity for medical product development. Parts I and II of this task force report summarize the elements of good research practices for establishing the content validity of a new instrument through qualitative research.
Good Practices in Eliciting Concepts for a New Patient-Reported Outcome Instrument

Table 1 lists five steps to elicit concepts for establishing and documenting content validity of a new PRO instrument in a recommended chronological order, consistent with the Wheel and Spokes Diagram contained in the Final FDA PRO Guidance (2). Within each step are recommended good research practices.

Table 1 About Here
Good Practice 1: Plan the Context of Measurement

The development of an instrument, simple or complex, must start with a clear definition of the concept to be measured in the proposed context of measurement. The purpose of Step 1 in Table 1 is to ensure that the context is clearly defined and that the approach to concept measurement is appropriate for the intended context. In situations involving instrument development within a regulatory framework, context considerations include the disease or condition of interest, target population, and treatment setting. Consideration should also be given to the positioning of the measure in the hierarchy of clinical trial endpoints. Clarification of the context of use and the role the measure will play in clinical trials informs preliminary decisions on instrument scope of content, measurement structure, and mode of administration. With this essential work established, qualitative research protocols can be developed to gather patient input on the concept(s) of interest. Descriptions of each of the components of Step 1 follow below.
Disease models.

Development of a new PRO instrument for use in medical product evaluation often begins with a clear delineation of the concept of interest through the development of a disease model. Consideration is given to the pathophysiology and expression of the disease or condition, including its characteristic signs and symptoms in the target population. The relevant concepts measured by laboratory tests, performance assessments such as exercise stress or cognitive function tests, or standardized clinical observations are also identified. If symptoms are a defining characteristic, the appropriate symptom concepts cannot be determined based on a literature review and consultation with clinical experts alone. Qualitative research in the target patient population provides essential data on patients’ perspectives of their symptoms. If the impact of health on related physiologic, psychologic, or sociologic concepts is of interest, those impact concepts may also be targeted for instrument development and require patient input.

Disease models help to clarify and focus the specific PRO concept of interest within the context of the entire disease process and the specific clinical trial population. Figure 1 illustrates a disease model for psoriasis with a proposed pathway linking risk factors, diagnosis, signs and symptoms, and impacts. The types of questions addressed by disease models include the following: Is the disease or condition characteristically symptomatic? Are these symptoms amenable to treatment? Are there functional effects of the condition, such as activity limitation due to symptoms, that could be altered with treatment? What other outcomes might the treatment affect? What concepts should be the focus of efficacy evaluation? Because variability in measurement lowers the probability of detecting a meaningful treatment effect, the more specific the concept and the closer this concept is to the goals of the treatment, the greater the likelihood of success.

Figure 1 About Here

During the development of a disease model, consideration is given to the prevalence, severity, and characteristics of the condition, the treatment to be tested, the target population for treatment, and potential trial endpoints. Many questions relate PRO candidates to trial design, depending on the actual disease or condition and trial. Will patients enrolled in the trial be experiencing decrements in the symptoms, signs, or impacts that might be captured by a PRO instrument, so that the effect of treatment on this outcome can be appropriately tested? Will or can patients be screened for enrollment based on criteria specific to this outcome? In situations where the PRO is positioned as a secondary outcome and enrollment does not include criteria related to this secondary outcome, study results may be poor simply because a significant portion of study participants could not change with treatment.

Patterns of change over time in the PRO of interest are another consideration. Is the PRO relatively stable, with small changes over time? Is the condition acute, with potentially large and/or rapid changes with treatment? Or is the condition chronic, with an expectation of minimal or slow changes in the outcome of interest? For example, clinical trials for acute infectious conditions may be relatively short, while trials to demonstrate a survival advantage can involve relatively long observations. Trial design, including frequency of assessments, compliance, and missing data, is part of the context of use of a PRO instrument and will inform the content and structure, including the items, response options, and recall period.
Endpoint models.

Endpoint models specify the primary and secondary endpoints to be tested in the target clinical trial(s). Example endpoint models were provided in Figures 1 and 2 of the FDA PRO Guidance (2). Even when a medical product cannot be specified, e.g., in multi-sponsor instrument development consortia, the anticipated role of the instrument can be shown in one or more hypothetical or illustrative endpoint models to specify the context of use. In this case, the model(s) represent an educated prediction of the prioritization of study hypotheses in clinical trials in which the PRO instrument is to be used.

Of course, a new PRO instrument may serve as an exploratory endpoint in early trials, with the data used to test reliability, validity, and responsiveness. The endpoint models we are describing here pertain to future medical product development trials using current best clinical trial practices, keeping in mind that target patient populations may change to related severity or diagnostic groups. While it is important to be forward looking in developing a new instrument, an overly broad focus can result in a concept and instrument that is too generic, diluting measurement content, reducing reliability, and sacrificing the near-term objectives.
Literature review and experts.

Disease and endpoint models both inform and are informed by existing knowledge or experience, published literature, and consultation with clinical content experts. Models focus the literature review and clarify the type of experts and the role they will play in the development process. Input from the literature and experts, in turn, is used to revise the disease and endpoint models as appropriate.
Target population - cultural/language groups.

As instrument development is planned, thought is given to the details of the target population, including the languages and cultures of patients likely to be enrolled in clinical trials. The extent to which the disease, standard of treatment, and measurement concept(s) are the same or differ across countries or cultures is considered. Literature and experts can help in this discussion. If the development program will be international and the concept is highly variable across countries, simultaneously developing the instrument internationally may strengthen and document cultural equivalence of the final instrument. If there is published or empirical evidence indicating concept stability across countries, it may be possible to develop the measure in one country with review by a PRO linguistic expert to facilitate ease of translation for future use.
Preliminary decisions on the instrument content and structure.

As the context of use is identified and clarified, decisions are made concerning the optimal instrument structure and likely content. The following principles of good measurement are among those used during decision making: (1) Consider both positive and negative content. For example, the effects of treatment may include positive effects on pain and negative effects on sleep. (2) In general, respondents should not be asked to attribute the cause of their symptoms or experiences. It would be difficult, for example, for subjects to know whether their breathlessness was due to congestive heart failure as opposed to other causes, such as aging, anxiety, or infection. (3) In general, respondents should not be asked to rate change over time, but rather should be asked to evaluate their current state with an appropriate recall period. Change is then computed across evaluations. (4) Consider the method (self- versus interviewer-administered) or mode (paper-and-pen, electronic, voice response) of data collection early. Switching methods or modes of administration between development and use may require an additional validation step to assure score equivalence.
Hypothesized conceptual framework.

The considerations outlined above should lead to a list of the PROs of interest and the concepts and sub-concepts or domains comprising them. The disease model shown in Figure 1, for example, shows two possible PROs of interest: psoriasis symptoms and impacts. Within each of these general PROs are concepts and sub-concepts suggestive of instrument content, e.g., pain, itching, burning, etc. This information informs the development of the qualitative elicitation protocol and the interview or focus group discussion guide. As outlined below, the guide includes reference to what the interviewers might expect to hear and areas requiring greater clarity, with the understanding that new information may be uncovered, contributing to the conceptual focus and accuracy of the instrument.

An example of a conceptual framework for a PRO evaluating the concept of pain is shown in Figure 2. Note that the category of pain quality is divided into deep pain and surface pain. These two concepts are further divided into aspects of pain quality. This conceptual framework will help with the coding dictionary developed later in the process.

Figure 2 About Here
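To make the link between framework and coding dictionary concrete, the hierarchy of a conceptual framework can be represented as nested mappings from concepts to sub-concepts to candidate codes. The sketch below uses the deep pain/surface pain split described above, but the leaf-level descriptors (aching, burning, etc.) are hypothetical illustrations, not the actual content of Figure 2.

```python
# Sketch of a coding dictionary seeded from a conceptual framework.
# Top-level keys are concepts; nested dicts are sub-concepts; lists hold
# candidate codes. Leaf descriptors here are hypothetical examples only.
PAIN_CODING_DICTIONARY = {
    "pain quality": {
        "deep pain": ["aching", "throbbing", "cramping"],
        "surface pain": ["burning", "stinging", "itching"],
    },
    "pain intensity": {
        "severity": ["mild", "moderate", "severe"],
    },
}

def flatten_codes(dictionary, path=()):
    """Yield (concept path, code) pairs for assigning codes to transcript text."""
    for concept, children in dictionary.items():
        if isinstance(children, dict):
            # Descend into sub-concepts, extending the concept path.
            yield from flatten_codes(children, path + (concept,))
        else:
            for code in children:
                yield path + (concept,), code
```

In practice, coders would revise this structure as new concepts emerge from the transcripts, so that the final dictionary, like the conceptual framework itself, reflects patient input rather than only the hypothesized hierarchy.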
Good Practice 2: Develop the Research Protocol for Qualitative Concept Elicitation

The study protocol and interview guide provide documentation of the pre-specified plan for identifying the sample, conducting interviews or focus groups, and analyzing the data that will inform the content and structure of the new instrument. Contents of the study protocol include the study sample, data collection method, setting, materials and procedures, and analyses.
Study Sample

Demographic and clinical characteristics of the sample should match the target population, i.e., the intended clinical trial sample. For example, if clinical trials will include patients who have either psoriatic arthritis or plaque psoriasis, both types of patients are included in the qualitative study sample to allow the full range of comments and expressions to arise. Clinical sites and methods for participant recruitment should be selected with this goal in mind. When evaluating clinical sites and/or locations for possible participation, consideration should be given to geographic, educational, ethnic, and racial diversity, and to the availability of clinical information needed to characterize and evaluate sample characteristics in the final report.

Estimating the sample size for a qualitative study can be challenging. In quantitative research protocols, sample size is estimated using analytical techniques requiring projections of the magnitudes likely to be observed in the study (e.g., means, differences, variances, proportions, confidence intervals) together with the desired power and a significance criterion. In qualitative research, sample size estimation is based on projections of the data needed to reach “saturation.” Discussed further in Section 4, saturation is “the point at which no new concepts [relevant to the concept of interest] are forthcoming from the population being interviewed” (8). When the concept of interest is clearly defined and relatively narrow in scope and the target population is largely homogeneous, relatively few participants (e.g., 15 to 20) may be required to achieve saturation. In contrast, situations involving a very broad, poorly defined, or multidimensional concept or heterogeneous target populations will involve larger sample sizes (e.g., 40 or more). As noted in the FDA Guidance (2), “the number of patients is not as critical as interview quality and patient diversity included in the sample in relation to intended clinical trial population characteristics.”
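One common way to document saturation is to group interviews into consecutive waves and tabulate how many previously unseen concepts each wave contributes; saturation is supported when a wave contributes none. The sketch below assumes concepts have already been coded per wave, and the concept codes shown are hypothetical.

```python
# Sketch: tabulate new concepts per interview wave to document saturation.
# A wave contributing zero previously unseen concepts suggests saturation.
def saturation_table(waves):
    """waves: list of sets of concept codes elicited in each wave.
    Returns the count of concepts first appearing in each wave."""
    seen, new_counts = set(), []
    for wave in waves:
        new_counts.append(len(wave - seen))  # concepts not seen before
        seen |= wave
    return new_counts

# Hypothetical coded output from three waves of five interviews each.
waves = [
    {"itching", "burning", "pain", "scaling"},        # interviews 1-5
    {"itching", "flaking", "pain", "embarrassment"},  # interviews 6-10
    {"burning", "pain", "scaling"},                   # interviews 11-15
]
# saturation_table(waves) -> [4, 2, 0]: the third wave adds no new concepts.
```

A table of this form, reported alongside the transcripts it was derived from, is one way to present the evidence of saturation that reviewers look for.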
Data Collection Method

Individual interviews and focus groups are the data collection methods used in qualitative research involving concept elicitation for instrument development purposes (9). A summary of the advantages and disadvantages of these methods is shown in Table 2. Focus groups are economical and can stimulate discussion of topics and comparison of experiences across participants that cannot be captured in individual interviews (9-12). Unfortunately, there are also risks associated with focus groups, particularly when run by inexperienced or untrained leaders. One example is a highly vocal, assertive participant who dominates or leads the discussion, minimizing participation of other group members and resulting in content, tone, or perspectives that do not necessarily represent those of individuals or the group as a whole. Interviews are usually the best methodology for concepts that are sensitive or for target populations unlikely to volunteer or share information in a group setting (13). There are also disadvantages to individual interviews. For example, by design, interviews must be conducted sequentially or by multiple interviewers, both of which are more expensive and time consuming (14).

Table 2 About Here

Setting

Focus groups and interviews may be conducted in in-patient settings, out-patient clinics, or
dedicated research facilities. Interviews may also be conducted in participants' homes or, in some cases, over the telephone (e.g., for rare, episodic, or contagious conditions). The appropriate setting depends on the target population, including illness severity or contagiousness, physical mobility, psychological state, or other factors that would affect a person's ability to travel or participate. The setting should optimize the extent to which the sample is consistent with the target population by making participation accessible.
Materials and Procedures

The interview or focus group guide includes the questions that should be addressed and how the interviews or focus groups should unfold for optimal clarity and data quality. It is not a script to be read verbatim, but a manual that provides the interviewer with an organized summary of the topics to be discussed, specific questions for each topic, and sample probes that can be used to further explore areas when needed. Exploratory questions may be included to uncover features of the condition or its treatment that may not be well understood through previous research and clinical experience.
The specific content of the questions comprising the interview guide is dictated by the context of measurement, including the disease and endpoint models and the draft conceptual framework. For example, if pain is hypothesized as an important symptom in the disease model, the interview/focus group guide includes questions to understand the patient's experience of pain, which may include frequency, severity, duration, and/or impact. The reference timeframe, that is, the timeframe the participants are asked to consider as they respond to the questions, will also depend on the PRO and measurement context. For example, when developing a measure for chronic heart failure patients, participants may be asked to recall and describe a recent acute episode or hospitalization, or their experiences during the current day or week. In general, it is desirable for the reference timeframe to be as close as possible to the interview or focus group, in order to diminish recall errors and bias. One method, known as the day-reconstruction approach (15), can be used to focus a participant on a specific day as they describe symptoms, impacts, or other experiences relevant to the target concept.
Unless carefully worded and conducted, interview questions and procedures can introduce bias into the data. For example, certain closed-ended or highly specific questions can be leading, such as “You experienced pain in your knee today, right?” or “How depressed were you during this event?” Questions should be open-ended whenever possible and worded to encourage spontaneous information from the participant without pointing them toward a specific response. At the same time, open-ended questions that are too broad can be confusing to participants. “What was yesterday like for you?” or “Tell me about your condition,” for example, lack the specificity required for participants to address the concept of interest and can lead to irrelevant data. Open-ended questions should include parameters consistent with the concept of interest. If the concept of interest is knee pain, the interviewer could ask, “How did your knee feel yesterday?” with probes to better understand the nature and characteristics of the experience offered by the participant. This approach provides data on the words and phrases participants use to describe their condition that will inform instrument content.
Interview questions can also address multiple dimensions of a concept. For example, it may be useful to understand the severity, duration, and frequency of a particular symptom. Pain that is severe but doesn't last long or occur very often may be a very different experience from pain that is moderately severe but occurs frequently and lasts for a long period of time. Understanding these dimensions of an experience can be useful for developing a new instrument based on a complete picture of a participant's experience. The following list of questions shows how more specific symptom-related information might be obtained once the symptom has been elicited by more open-ended questioning: How often do you have (symptom X)? How severe is (symptom X)? How long does it usually last? Does anything make (symptom X) better or worse? Please tell me more about that. Do you have any other sensations or symptoms when you feel (symptom X)? Questions to elicit information about symptom impact might include: How do your symptoms affect or influence your everyday life? Probes might include: How does (symptom X) affect your daily activities? Does it affect your relationships with others? Tell me more about (the difficulty you have performing activity X).
Once a draft interview guide has been created, it is reviewed by other qualitative researchers for possible difficulties in flow, redundancy, poorly formulated questions, and the appropriate use of terminology and probes. The draft guide should be pretested with study-naïve individuals or colleagues or, ideally, pilot tested in the target population to identify areas that do not flow easily or may confuse respondents before primary data collection begins.
Analyses

As with a clinical trial, the interview protocol should also include a plan for analyzing, summarizing, and interpreting the interview data. Unlike quantitative analyses, there are no inferential statistical tests involved. Rather, this portion of the protocol describes the methods that will be used to identify, code, and summarize themes, the procedures for quality control, and the methods for determining and documenting saturation. Qualitative analyses are discussed further in Section 4.
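One element such an analysis plan can pre-specify is how coded themes will be summarized, for example by tabulating how many participants expressed each theme. A minimal sketch follows; the participant codings are hypothetical, and in practice coding is performed by trained coders working from the transcripts.

```python
from collections import Counter

# Sketch: summarize how many participants expressed each coded theme.
# The codings below are hypothetical illustrations.
def theme_frequencies(codings):
    """codings: dict mapping participant id -> collection of theme codes.
    Returns a Counter of participants-per-theme."""
    counts = Counter()
    for themes in codings.values():
        counts.update(set(themes))  # count each theme once per participant
    return counts

codings = {
    "P01": {"itching", "sleep disturbance"},
    "P02": {"itching", "embarrassment"},
    "P03": {"itching", "sleep disturbance", "pain"},
}
# theme_frequencies(codings)["itching"] -> 3 (all three participants)
```

Frequency tables of this kind describe rather than infer; they are reported alongside supporting quotations, not subjected to statistical testing.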
Good Practice 3: Conduct the Concept Elicitation Interviews and Focus Groups

The research protocol must be reviewed and approved by an appropriate institutional review board prior to the initiation of subject recruitment and data collection. Sites are provided a copy of the study protocol and trained on inclusion/exclusion criteria, sample monitoring, recruitment processes, and informed consent procedures.
Interviewers and focus group facilitators should be experienced in qualitative research methods and trained on the background and objectives of the protocol. Mock interviews or focus groups may be used to help the interviewers/facilitators develop a complete understanding of the questions and process and assure a smooth, clear data collection process. Sustained interaction with interviewers is important to establishing and maintaining the quality of data collection.
Core competencies in concept elicitation interviewing are shown in Table 3. The concept elicitation process is intentionally broad in order to explore and define information from the perspective of the patient. A well-constructed interview guide defines the broad territory of discussion, leaving no need for the interviewer to censor or discount participant responses. Although discipline is needed to keep the participant or focus group “on task,” interviewers should avoid being overzealous in assuming irrelevance, favoring an open dialogue among participants to encourage participation. Interviewers should be aware that their body language and actions, such as nodding in agreement, frowning, or sighing, can communicate approval or disapproval of the participant's contribution, altering the content or emphasis of subsequent information. Interviewers should remain neutral while conveying genuine interest, to encourage open and honest communication. The hallmark of interviewer skill is the ability to get the participant to talk about the areas and topics of interest in a natural conversational engagement, where they feel they are being heard and respected.
Table 3 About Here
304
Concept elicitation interviews and focus groups are recorded (either audio or video) to fully capture the context and content and to produce the transcripts that form the data for analysis. Audio recordings are generally preferred because they are easier to make and transcribe, facilitate participant anonymity, and are generally more comfortable for participants, particularly when sensitive topics are being discussed. Regardless of recording method, participants need to be assured of the confidentiality and limited usage of the recorded materials from their interviews. In addition to being essential for data analyses, recordings can be monitored for quality assurance by a senior interviewer who provides feedback to the interviewer to maintain or improve the quality of data collection throughout the duration of the study by improving question clarity, altering probes, and/or pursuing specific aspects in greater detail.
Recording frees the interviewer or moderator from note taking, allowing full engagement with the participant(s). For focus groups, an assistant moderator is often useful to observe the group and take notes to facilitate data interpretation. These notes include a seating chart with participant initials and key points associated with those initials, which also helps in checking the transcriptions of focus group recordings.
Transcriptions of the audio/video recordings need to be verbatim and reviewed, quality checked, and cleaned by the facilitators/interviewers and associates. Cleaning includes: (a) removal of any personal identifiers; (b) correction of any medical terms that the transcribers misspelled or did not recognize; and (c) removal of any clearly extraneous narrative (for example, the participant answers a cell phone or a nurse walks in with a message). Dialogue that is related but not central to the purpose of the interview can be retained in the transcript and separated during the coding process to document the irrelevance of the information for data analyses. Transcript quality is assessed through direct comparison of audio and transcript files, generally performed on a random sample. Once transcripts have been quality checked and cleaned, qualitative analysis begins.
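Part of cleaning step (a) can be made mechanical. The sketch below, which is purely illustrative and not part of the Task Force recommendations, redacts identifiers that transcribers have flagged using a hypothetical double-angle-bracket tagging convention; real studies will have their own transcription conventions, and steps (b) and (c) still require human review.

```python
# Illustrative sketch: redact transcriber-flagged personal identifiers.
# The "<<name:...>>" tagging convention and the example line are invented.
import re

RAW_ID = re.compile(r"<<(?:name|place|date):[^>]*>>", re.IGNORECASE)

def redact(line):
    """Replace transcriber-flagged identifiers with a neutral placeholder."""
    return RAW_ID.sub("[REDACTED]", line)

line = "Interviewer: So, <<name:John Smith>>, when did the pain start?"
print(redact(line))  # -> Interviewer: So, [REDACTED], when did the pain start?
```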
Good Practice 4: Analyze the Data

Analyze according to the theoretical approach. There are multiple theoretical approaches and methodologies that can be applied to qualitative research procedures and data analyses, including phenomenology, grounded theory, content analysis, and thematic analysis (16-19). In qualitative research to inform instrument development, data collection and analyses are interrelated and concurrent rather than linear: "Analysis is the interplay between researcher and data. It is both science and art" (20). All of these approaches are idiographic (focused on the individual), in contrast to a quantitative nomothetic paradigm (focused on the general) founded in positivism (21). Across all qualitative methods, the purpose is to understand participant perspectives and experiences through "decontextualisation" (i.e., assigning codes) and "recontextualisation" (i.e., reducing the data around central themes).
Phenomenology as an overarching theoretical framework and grounded theory as a specific methodology have been proposed as most appropriate for the development of a new PRO instrument (18). An adaptation of grounded theory has also been proposed (14) that allows for the use of prior knowledge in the analysis of data. This added deductive element in an otherwise inductive approach is consistent with the need to draw from existing information, pulled together as part of context-of-measurement development (Step 1), to identify themes and concepts in the data and to interpret the results in light of the ultimate goal: to develop a new PRO instrument for a specific use. This approach also permits moving back and forth between hypothetico-deductive and inductive reasoning, where the developer's understanding can change based on new information and/or observations, resulting in an iterative process of instrument development.
It is important to clearly describe how the data were analyzed, i.e., what was done and why. Existing guidelines for performing qualitative research can aid in structuring the description, evaluating the process used, and determining how best to present and discuss results (22-26).
Coding qualitative data for instrument development

The primary goal of transcript coding is to organize and catalog participants' descriptions of their experiences within the context of measurement. The coding processes of different qualitative approaches share methodologies for decontextualisation and recontextualisation, even when the coding focus differs. For example, with a phenomenological approach, one can identify descriptions of the phenomenon that are universal; with grounded theory, one can use open coding (examining, comparing, conceptualizing, and categorizing data), axial coding (reassembling data into groupings based on relationships and patterns within and among the categories identified in the data), and selective coding (identifying and describing the central phenomenon, or "core category").
The "coding framework" is an initial structure or organization of codes for grouping clusters of information that form a coherent theoretical unit. This framework is based on the disease model and draft conceptual framework developed at the onset of the work. A preliminary coding framework is developed and revised during data analyses based on information and insight gained during data review, including the development of new codes to represent clusters of new information. Data coding is an iterative process and should include opportunities for the data to be re-examined and re-analyzed until no new codes are identified and all relevant concepts have been assigned one or more codes.

Figure 3 shows the various inputs into the development of a coding framework (the structure that holds the codes) and the completed "coding dictionary" (the document inclusive of all codes assigned, with definitions as appropriate for standardization, clarity, and communication). A coding framework provides patient-based insight into the relevance of concepts included in the disease model and conceptual framework. A coding dictionary is used to assure consistency in coding across data analysts or coders and to document and communicate the meaning of the codes to external reviewers.

Figure 3 About Here
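To make the relationship between a coding dictionary and code assignments concrete, the two structures can be sketched in code. This is an illustrative sketch only, not a Task Force recommendation; the code names, definitions, and quotes are invented.

```python
# Illustrative sketch: a minimal coding dictionary plus code assignments,
# with a consistency check that every assigned code has a shared definition.
# All codes, definitions, and quotes are invented for illustration.

coding_dictionary = {
    "PAIN": "Any mention of physical pain",
    "PAIN.KNEELING": "Pain specifically associated with kneeling",
    "FATIGUE": "Tiredness, exhaustion, or lack of energy",
}

# Code assignments: (transcript id, participant quote, assigned code).
assignments = [
    ("T01", "I am always in pain when I kneel", "PAIN.KNEELING"),
    ("T01", "by evening I have no energy left", "FATIGUE"),
    ("T02", "my knees ache all day", "PAIN"),
]

def undefined_codes(assignments, dictionary):
    """Codes used in assignments but missing from the dictionary; a basic
    quality-control check before coding continues."""
    return sorted({code for _, _, code in assignments if code not in dictionary})

print(undefined_codes(assignments, coding_dictionary))  # -> []
```

Keeping the dictionary as the single source of code definitions mirrors its documented purpose: consistency across coders and communication with external reviewers.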
Presentation of the coded qualitative data is intended both to identify the predominance of participants expressing the concepts and to describe the language that participants use to talk about those concepts. Depending on the qualitative approach, the presentation of codes and themes might differ. A thematic "map," i.e., an overall conceptualization of the data patterns and the relationships between them, is produced when thematic analysis is used (19).
Computer-assisted qualitative data analysis software programs, such as ATLAS.ti (27), can be used to organize the data and coding scheme for easier retrieval and analyses. These programs do not assign codes to the data; skilled decision making is still needed to allocate participant expressions of concepts to the appropriate code.
Assessing Saturation

Best practice is to code and assess saturation at multiple points during the data collection process. Data should be transcribed and coded on a rolling basis, with regular intervals of assessment to evaluate the consistency of the code assignment process and the adequacy of the coding framework, and to monitor the appearance and organization of newly appearing concept codes. Careful monitoring during the coding process and a phased approach to assessing saturation provide the researcher with insight into the data as the study progresses and an opportunity to return to the field for comprehensiveness or clarity.
To assess saturation of concept, transcripts and coding can be evaluated after each set of 5 to 8 interview or focus group transcripts becomes available. A saturation table is used to track either the first appearance of concepts or all occurrences of each concept across the transcript groups. Data are examined for the continued identification of new concepts (newly appearing codes) or for codes requiring further examination to confirm relevance or the attainment of saturation.

Codes identified in each successive set of transcripts are compared with the codes that appeared in the previous groups. In the best-case scenario, saturation is documented by showing that no new concepts arise in the last several interviews or the final focus group. In reality, it is not uncommon for a new concept to arise late in the data collection process. Scientific judgment, including knowledge of the field and consultation with experts, is used to determine whether this new concept is an outlier, i.e., a relevant but unusual case, and further judgment is required to determine whether additional data collection is required or warranted to re-assess saturation following this late revelation.
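The wave-by-wave bookkeeping behind a saturation table can be sketched as a small program. This is a hypothetical illustration under invented data; in practice the final saturation judgment also rests on scientific judgment, not code counts alone.

```python
# Illustrative sketch of a saturation table: codes observed in successive
# groups ("waves") of 5-8 transcripts. Saturation is suggested when the
# final wave contributes no codes not already seen. Wave contents invented.
waves = [
    {"PAIN", "FATIGUE", "SLEEP"},   # transcripts 1-5
    {"PAIN", "FATIGUE", "MOOD"},    # transcripts 6-10
    {"PAIN", "SLEEP", "MOOD"},      # transcripts 11-15
]

def new_codes_per_wave(waves):
    """For each wave, the codes appearing for the first time."""
    seen, new_by_wave = set(), []
    for wave in waves:
        new_by_wave.append(sorted(wave - seen))
        seen |= wave
    return new_by_wave

def saturation_reached(waves):
    """True if the final wave introduced no new codes."""
    return new_codes_per_wave(waves)[-1] == []

print(new_codes_per_wave(waves))  # the third wave adds no new codes
print(saturation_reached(waves))  # -> True
```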
Multiple coders

Best practice in analyses of qualitative data from elicitation interviews involves two or more coders. Each coder is carefully trained on the purpose of the study, the target concept, the nature of the data itself, the coding framework, and the coding dictionary. Each coder completes 1-2 transcripts, and the coders then meet to compare the codes assigned, identify areas of consistency and inconsistency, reconcile the codes on these transcripts, and revise the coding framework and dictionary for clarity and to enhance consistency in subsequent transcript coding. This process is repeated regularly throughout the coding process. An agreement is defined as a set of words or phrases identified as reflecting the same code and/or sub-code. Given the nature of qualitative data, flexibility is permitted around the words that constitute the word set or phrases. For example, two coders assigning the codes "pain" and "pain with kneeling" to the transcript text "You know I am always in pain when I kneel" would be considered in agreement, even though one code is more specific than the other.
Assuring coding precision can take several forms. One approach is to have a "super coder" review all data to assure consistency across coders. A second approach is to draw a random selection of transcripts that are dually coded and assess inter-rater agreement. Through discussion of coding and reconciliation when disagreement between coders is uncovered, greater than 90% agreement can be reached. These methods are similar to those used for interviewer-coded audio recordings of psychiatric rating scales, where inter-rater agreement is critical and is assessed until it reaches 90% or higher (28). Regardless of the approach used, the coding method and procedures for quality assurance should be carefully documented.
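The percent-agreement calculation, including the flexibility that lets "pain" and "pain with kneeling" count as agreement, can be sketched as follows. The matching rule (a simple prefix comparison) and the segment pairs are invented for illustration; real reconciliation relies on coder discussion rather than string matching.

```python
# Illustrative sketch: percent agreement between two coders on dually coded
# segments, where a more specific code counts as agreeing with its parent
# code, mirroring the "pain" / "pain with kneeling" example in the text.
pairs = [
    ("pain", "pain with kneeling"),  # counted as agreement
    ("fatigue", "fatigue"),          # exact agreement
    ("sleep", "mood"),               # disagreement, to be reconciled
]

def agree(code_a, code_b):
    """Agreement if one code is the same as, or a refinement of, the other."""
    return code_a.startswith(code_b) or code_b.startswith(code_a)

def percent_agreement(pairs):
    hits = sum(agree(a, b) for a, b in pairs)
    return 100.0 * hits / len(pairs)

print(round(percent_agreement(pairs), 1))  # -> 66.7
```

In practice disagreements such as the third pair are discussed and reconciled, and the calculation is repeated until agreement exceeds the chosen threshold (e.g., 90%).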
Multi-Vectored Analysis of Qualitative Data

Analyzing qualitative data is a multi-vectored assessment in which different vectors of information are gained throughout the qualitative interview process. These often include: pre-specified concepts (symptoms, signs, limitations, worries, impacts, etc.); concepts participants report spontaneously versus those they recognize when probed; the predominant language participants use to express various concepts; the variability in experience around concepts; the most meaningful way to address concepts (attributes of frequency, severity, or duration); and the degree of difficulty, bother, and/or impact.
The selection of any one format, focus, or analytic approach depends on the purpose of the study. For example, an exploratory analysis aiming to elicit concepts for theory development might focus more on presenting information vectors such as "relevant concepts" and "patient language." In contrast, analyses for instrument development require a focus on the information vectors needed to successfully craft items, response options, instructions, and recall, such as the "attributes" and "variability" associated with the target concept. Each information vector can have one or more uses and can be presented for assessment in a number of formats (e.g., by content, by predominance, by actual scores, or by proportion), depending on the type of inference to be drawn and the framework and analytical method chosen. This multi-vectored approach is illustrated in Figure 4.
Figure 4 About Here

The language in participant quotes provides a rich picture of the participants' experiences with the target concept. In qualitative research for instrument development, the goal is to understand, organize, and communicate the meaning of the data and translate that meaning into a quantitative measure. The analysis of qualitative data is not quantitative; there is no effect size, significance level, or other quantitative metric. The goal of qualitative analysis is to understand and communicate the meaning embedded in a dataset comprised of words and phrases. This is done by analyzing, organizing, and summarizing the data in a manner that shows the relationship between the concepts, the words and phrases, and the final PRO instrument. Because each vector of information involving patient input contributes a unique aspect of understanding and communication, the use of multiple vectors of information provides an instrument developer with greater confidence that the concept is understood and that the instrument adequately expresses this understanding.
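One of the vectors described above, the predominance of each concept split by spontaneous versus probed mentions, can be tabulated as in the following sketch. The participants, concepts, and mention records are invented for illustration.

```python
# Illustrative sketch: summarizing the "predominance" vector from coded
# transcripts, split by spontaneous versus probed mentions. Data invented.
from collections import defaultdict

# (participant id, concept, mention type) records from coded transcripts.
mentions = [
    ("P1", "pain", "spontaneous"),
    ("P2", "pain", "spontaneous"),
    ("P3", "pain", "probed"),
    ("P1", "fatigue", "probed"),
    ("P3", "fatigue", "spontaneous"),
]

def predominance(mentions, n_participants):
    """Per concept: distinct participants mentioning it spontaneously,
    when probed, and the overall proportion of the sample."""
    table = defaultdict(lambda: {"spontaneous": set(), "probed": set()})
    for pid, concept, kind in mentions:
        table[concept][kind].add(pid)
    return {
        concept: {
            "spontaneous": len(kinds["spontaneous"]),
            "probed": len(kinds["probed"]),
            "any_pct": 100.0 * len(kinds["spontaneous"] | kinds["probed"]) / n_participants,
        }
        for concept, kinds in table.items()
    }

summary = predominance(mentions, n_participants=3)
print(summary["pain"])     # 2 spontaneous, 1 probed, 100% of the sample
print(summary["fatigue"])  # 1 spontaneous, 1 probed
```

Such a tabulation supports only one vector; the qualitative interpretation of language and meaning remains the analyst's task.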
Good Practice 5: Document Concept Development and Elicitation

The FDA PRO Guidance lists the information to be provided by sponsors in PRO dossiers in an appendix ((2), pp. 35-39). The FDA Guidance proposes an order and taxonomy based on the wheel-and-spokes diagram, which provides a logical flow for organizing the report to support the PRO instrument being submitted for review in relation to the claims desired and the development process ((2), p. 7). For both FDA and EMA reviews, documentation begins with the PRO instrument to be reviewed, followed by a description of the steps used to identify concepts and create the instrument.
Concept elicitation methods are part of the evidence supporting content validity, as recommended in the first two spokes of the FDA diagram. Essential documentation of content validity includes both the concept elicitation discussed in this paper and the cognitive interviewing discussed in the next (Part II). This qualitative evidence may be accompanied by supplementary quantitative evidence that confirms or revises the proposed conceptual framework. Essentially, the early content validity documentation provides evidence that the proposed instrument captures the most important concepts as viewed by the target population, and that the concepts are complete and relevant to persons in the target population. This evidence is specific to the planned clinical trial population and indication, i.e., the context of measurement.
Consistent with the FDA Guidance, documentation of the concept elicitation phase of instrument development includes the following elements:

• Target claims and description of the target population (i.e., from the Target Product Profile)
• The preliminary and final disease model
• The underlying endpoint model
• The preliminary and revised conceptual framework for the PRO instrument, based on qualitative studies conducted prior to testing of measurement properties
• The literature review and documentation of expert input
• Qualitative study methods and results, including protocols, interview guides, and results
• Evidence of saturation
• The origin and derivation of concepts captured in the PRO instrument
• A summary of qualitative data supporting the concepts, items, response options, and recall period
Organizing the document in a manner consistent with recommendations contained in the FDA
479
PRO Guidance makes it easier for reviewers to determine if the essential elements of qualitative
480
development of a new PRO instrument are included in a submitted dossier. Further recommendations on
481
documentation of item wording, cognitive interviewing and the final item tracking matrix prior to
482
quantitative evaluation are contained in the following manuscript.
Conclusion

This paper outlines the steps needed to derive a new PRO instrument for use in medical product development trials evaluating the benefits and risks of treatment. The paper covers the steps of concept elicitation, from determining, defining, and documenting the context of measurement to the analyses of qualitative data from interviews and focus groups and the documentation of the methods and results of this work. Examples have been provided to clarify specific steps and to inform the development of the documentation needed to support the content validity of the new measure. Paper II of this two-part task force report covers the creation of the new PRO instrument, the evaluation of its clarity and content validity through cognitive interviewing, and the documentation of this work for medical product evaluation.
REFERENCES
1. Rothman M, Burke L, Erickson P, Leidy NK, Patrick DL, Petrie CD. Use of existing patient-reported outcome (PRO) instruments and their modification: the ISPOR Good Research Practices for Evaluating and Documenting Content Validity for the Use of Existing Instruments and Their Modification PRO Task Force report. Value Health 2009 Sep 25.
2. U.S. Department of Health and Human Services, FDA. Guidance for Industry. Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. 2009 [cited 2010 Dec 29]. Available from: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM193282.pdf
3. European Medicines Agency. Reflection Paper on the Regulatory Guidance for the Use of Health-Related Quality of Life (HRQL) Measures in the Evaluation of Medicinal Products. London: European Medicines Agency; 2004 [cited 2010 Dec 29]. EMEA/CHMP/EWP/139391/2004. Available from: http://www.ema.europa.eu/pdfs/human/ewp/13939104en.pdf
4. Acquadro C, Berzon R, Dubois D, Leidy NK, Marquis P, Revicki D, Rothman M. Incorporating the patient's perspective into drug development and communication: an ad hoc task force report of the Patient-Reported Outcomes (PRO) Harmonization Group meeting at the Food and Drug Administration, February 16, 2001. Value Health 2003 Sep-Oct;6(5):522-31.
5. Patrick DL, Burke LB, Powers JH, Scott JA, Rock EP, Dawisha S, O'Neill R, Kennedy DL. Patient-reported outcomes to support medical product labeling claims: FDA perspective. Value Health 2007 Nov-Dec;10 Suppl 2:S125-37.
6. U.S. Department of Health and Human Services, FDA. Guidance for Industry. Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. 2009.
7. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: AERA; 1999.
8. Guest G, Bunce A, Johnson L. How many interviews are enough? An experiment with data saturation and variability. Field Methods 2006;18(1):59-82.
9. Lehoux P, Poland B, Daudelin G. Focus group research and "the patient's view". Soc Sci Med 2006 Oct;63(8):2091-104.
10. Kitzinger J. Qualitative research: introducing focus groups. Brit Med J 1995;311:299-302.
11. Hollander J. The social contexts of focus groups. J Contemp Ethnogr 2004;33:602-37.
12. Smithson J. Using and analysing focus groups: limitations and possibilities. Int J Social Research Methodology 2000;3:103-19.
13. Gubrium JF, Holstein JA, editors. Handbook of Interview Research: Context and Method. Thousand Oaks, CA: Sage; 2002.
14. Brod M, Tesler LE, Christensen TL. Qualitative research and content validity: developing best practices based on science and experience. Qual Life Res 2009 Sep 27.
15. Kahneman D, Krueger AB, Schkade DA, Schwarz N, Stone AA. A survey method for characterizing daily life experience: the day reconstruction method. Science 2004 Dec 3;306(5702):1776-80.
16. Denzin NK, Lincoln YS, editors. The SAGE Handbook of Qualitative Research. 3rd ed. Thousand Oaks, London, and New Delhi: Sage Publications; 2005.
17. Starks H, Trinidad SB. Choose your method: a comparison of phenomenology, discourse analysis, and grounded theory. Qual Health Res 2007;17:1372-80.
18. Lasch KE, Marquis P, Vigneux M, Abetz L, Arnould B, Bayliss M, Crawford B, Rosa K. PRO development: rigorous qualitative research as the crucial foundation. Qual Life Res 2010 May 30.
19. Braun V, Clarke V. Using thematic analysis in psychology. Qualitative Research in Psychology 2006;3(2):77-101.
20. Strauss A, Corbin J. Basics of Qualitative Research. Newbury Park, CA: Sage; 1990.
21. Ponterotto JG. Qualitative research in counseling psychology: a primer on research paradigms and philosophy of science. J Counseling Psychology 2005;52(2):126-36.
22. Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care 2007 Dec;19(6):349-57.
23. Mays N, Pope C. Qualitative research in health care: assessing quality in qualitative research. BMJ 2000 Jan 1;320(7226):50-2.
24. Elliott J. Using Narrative in Social Research: Qualitative and Quantitative Approaches. London: Sage Publications; 2005.
25. Cochrane Qualitative Research Methods Group. Available from: http://www.joannabriggs.edu.au/cqrmg/about.html
26. The British Psychological Society. [cited 2010 Dec 28]. Available from: http://www.bpsjournals.co.uk/journals/joop/qualitative-guidelines.cfm
27. Muhr T. User's Manual for ATLAS.ti 5.0. Berlin: ATLAS.ti Scientific Software Development GmbH; 2004.
28. Overall JE, Gorham DR. The brief psychiatric rating scale. Psychol Rep 1962 Nov;10:799-812.
Table 1
Five Steps to Elicit Concepts for New Patient-Reported Outcome Instruments and Document Content Validity Consistent with Good Research Practices*

1. Determine the context of measurement
• Develop a hypothesized disease model based on literature, experts, and patients
• Name and define the concept within the context of a clinical trial end-point model
• Select and define the target population
• Conduct a literature review, prepare a list of candidate items from the disease model and existing instruments addressing the same concept, and consult content experts
• Select the target cultural/language groups
• Make preliminary decisions on instrument content and structure
• Develop a hypothesized conceptual framework for the instrument

2. Develop the research protocol for qualitative concept elicitation
• Define the target sample characteristics
• Select the data collection method: focus groups, individual interviews, or both
• Determine the setting and location for data collection
• Develop the interview guide: draft, pilot, revise
• Determine quality control procedures for data collection and monitoring
• Develop a preliminary qualitative analysis plan

3. Conduct the concept elicitation interviews and focus groups
• Obtain IRB approval
• Recruit and train sites
• Recruit participants; monitor sample characteristics to assure representation
• Select and train interviewers
• Conduct interviews; implement quality control measures
• Record or videotape interviews
• Transcribe and clean transcripts

4. Analyze qualitative data
• Analyze qualitative data according to the theoretical approach used
• Establish a preliminary coding framework; update as data are coded
• Establish coding procedures and train coders
• Organize data using a qualitative research software program
• Assess saturation
• Interpret results

5. Document concept development and elicitation methodology and results
• Provide context for use
• Specify and define the concept
• Denote the target claims and population
• Provide a disease model and an endpoint model
• Provide supporting documentation for the concept
• Show the original and revised conceptual framework
• Summarize the literature review
• Document input from content experts
• Present the methods and results of qualitative research
• Provide clear evidence of saturation

*Steps to develop an instrument, evaluate the new measure through cognitive interviewing, and document that aspect of content validity are addressed in Part II of the Task Force report.
Table 2
Focus Groups and Interviews: Advantages and Disadvantages

Focus groups: advantages
• Allows individuals to use ideas of others as cues to express their own views
• Participants can compare their experiences with others
• Able to reach many participants at once

Focus groups: disadvantages
• Data can be tough to analyze because talking can be in reaction to the comments of other group members
• Moderators need to be highly trained and able to lead the group
• One strong group member can sway the tone of the entire group
• May be more costly (e.g., travel, room rental, transcription fees)

Interviews: advantages
• Rich source of data; more in-depth and detailed information about an individual's experience
• Can be useful for sensitive topics
• Data can be easier to analyze
• Scheduling can be easier

Interviews: disadvantages
• It may take longer to collect the data
• Limited to one participant's view at a time; no peer comparison
• Interviewers need to be trained with excellent one-on-one communication skills
Table 3
Form for Evaluating Core Competencies in Concept Elicitation Interviewing
(Columns: Focus of evaluation | Criteria met? Yes/No | Issues found in evaluation | Remedies needed)

PREPARATION to start interview with subject
• Prepared for interview; familiar with interview content, mechanics of worksheets, tallies, record keeping
• Demonstrates understanding of interview content; comprehends patient response in context of goals

PROTOCOLS followed as identified in interview guide
• Identified primary purpose of interview/focus group to subject(s)
• Adhered to interview guide
• Covered all probe content
• Allowed time for participant to spontaneously respond to probes before offering examples
• Thoroughly explored responses to probes
• Asked for additional comments at completion of interview

COMPETENCIES demonstrated in conduct of interview
• Responsive to subject's lack of understanding of a question/topic; able to reframe the question for participant understanding
• Allowed subject time to respond without interrupting/rushing the subject
• Offered a minimum of 3 examples where needed
• Maintained control of the interview (keeping subject on topic; familiar with interview logistics)
• Stayed neutral; avoided confirming subject's responses
• Actively promoted in-depth responses from subject
• Avoided leading questions
• Used participant language
• Recognized when a symptom had already been explored spontaneously by the participant

GENERAL COMMENTS
• Overall feeling from the interview
• Things done well
• Things that need improvement

Printed with permission of M. L. Martin, Health Research Associates
Figure 2: Example conceptual framework for a PRO evaluating the concept of pain quality

Concept: Pain quality
• Sub-concept: Deep pain. Items: aching, dull
• Sub-concept: Surface pain. Items: itchy, numb, tingling
Figure 3: From coding framework to coding dictionary

Inputs: literature, clinical experts, interview guide, hypothesized TPP, PRO instrument review, hypothesized conceptual framework, and patient interview results.
Coding framework: the starting structure for concepts, expanded to add more concepts as data are reviewed.
Coding dictionary: contains all codes assigned, grouped by concept, drawing on all code assignments from the transcripts.

Printed with permission of M. L. Martin, Health Research Associates
Figure 4: Multi-Vectored Approach to Understanding Qualitative Data

Information vectors: patient language; attribute to measure; relevance of concept; variability; degree of bother; degree of difficulty; necessary coverage (degree of importance to the patient); and essential coverage (degree of importance clinically and to the measurement strategy).
Interpretation and meaning: presenting the relation of qualitative results to concept wording, with the most meaningful and appropriate measurement design, response options, and recall period.