Osteoporos Int (1997) 7:478–487 © 1997 European Foundation for Osteoporosis and the National Osteoporosis Foundation

Osteoporosis International

Original Article

Measuring Quality of Life in Women with Osteoporosis

Osteoporosis Quality of Life Study Group*

Abstract. The objective of the study was to test an instrument for measuring health-related quality of life (HRQL) for women with osteoporosis and back pain caused by vertebral fractures related to osteoporosis. A longitudinal cohort study design was used in which women were seen at baseline, after 2 weeks, and after 6 months. The setting was a secondary care rheumatology practice in Canada and four separate secondary care metabolic bone disease practices in the United States. The patients were 226 women suffering from osteoporosis and back pain with a mean bone mineral density of 0.84 ± 0.14 g/cm² and a mean of 2.78 (median 2, range 1–11) vertebral fractures. We administered the Osteoporosis Quality of Life Questionnaire (OQLQ) with 30 items distributed across five domains: Symptoms, Physical Function, Activities of Daily Living, Emotional Function, and Leisure. In addition, we administered the Sickness Impact Profile (SIP), the 36-item short form of the Medical Outcomes Survey instrument (MOS SF-36), and the Brief Pain Inventory (BPI). On the basis of what women told us about the areas of their lives adversely affected by their osteoporosis, we constructed the OQLQ. Reliability coefficients between baseline and 2-week follow-up varied from 0.80 to 0.89 for the five domains. Cross-sectional correlations between OQLQ domains and other measures were consistently higher than predicted (0.51–0.81). The OQLQ proved to be as powerful or more powerful than alternative instruments for detecting improvement or deterioration in patients whose status changed. Fewer than 150 patients per group should be required to detect minimally important differences in parallel-group clinical trials. Longitudinal correlations between OQLQ and other measures were generally lower than predicted. This may be due to limitations in either the OQLQ or the other instruments. It is concluded that the OQLQ may prove to be a useful instrument for discriminating between women with different levels of impaired HRQL, and for evaluating change in women undergoing treatment for back pain related to osteoporosis.

Keywords: Functional status; Health status; Measurement; Osteoporosis quality of life; Pain

*Protocol development: D. J. Cook1,2, G. H. Guyatt1,2, J. D. Adachi1, R. S. Epstein3, E. F. Juniper2. Clinical protocol development and data collection: Hamilton, Ontario: P. A. Austin2, J. Clifton2, J. D. Adachi; Bangor, Maine: C. J. Rosen4, C. R. Kesennich5; Worcester, Massachusetts: J. L. Stock6, J. Overdorf6; Denver, Colorado: P. D. Miller7, A. L. Erickson7; Portland, Oregon: M. McLung8, B. Love8. Data analysis and interpretation: L. E. Griffith2, G. H. Guyatt, D. J. Cook. Manuscript preparation: G. H. Guyatt, D. J. Cook. 1Department of Medicine, McMaster University, Hamilton, Ontario, Canada; 2Department of Clinical Epidemiology & Biostatistics, McMaster University, Hamilton, Ontario, Canada; 3Epidemiology Department, Merck Sharp and Dohme Research Laboratory, West Point, Pennsylvania, USA; 4Department of Medicine, St. Joseph’s Hospital, Bangor, Maine, USA; 5Faculty of Nursing, Husson College, Bangor, Maine, USA; 6Department of Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, USA; 7Department of Medicine, University of Colorado, Denver, Colorado, USA; 8Department of Medicine, Oregon Health Sciences University, Oregon, USA. Correspondence and offprint requests to: Dr Gordon Guyatt, Department of Clinical Epidemiology and Biostatistics, McMaster University Health Sciences Centre, Room 2C10, 1200 Main Street West, Hamilton, Ontario, Canada L8N 3Z5.

Introduction

Osteoporosis is a common [1–3] condition characterized by decreased bone mass and increased susceptibility to
fractures [4]. Many patients suffer from pain and disability associated with vertebral fractures [5–9]. Recently, investigators have become increasingly interested in osteoporotic fractures as they relate to patients’ health-related quality of life (HRQL) [10–13]. A growing focus on women’s health issues, the development of new pharmacological treatments for osteoporosis, and the advent of rehabilitation programs for women suffering from osteoporotic back pain [14,15] have contributed to the need for a more sophisticated measure of HRQL. General HRQL measures such as the Sickness Impact Profile and the Medical Outcomes Survey instruments have a broad focus and wide applicability. Generic instruments allow comparisons of populations suffering from different diseases associated with different disabilities. They are limited, however, in that they do not explore in depth the issues specific to osteoporosis. This limitation could lead to several problems. First, important aspects of HRQL impairment not explicitly addressed by generic questionnaires may be omitted. For example, previous investigations have documented fear of falling and breaking a bone [10], clothes not fitting properly [10], and fear about the future [14,15] as important problems for women with osteoporosis. These items are not included in generic measures, and their omission may lead to an incomplete or skewed picture of the HRQL impairment in women with osteoporosis. Omission of key items and a relatively superficial survey of key areas may also result in generic measures failing to detect important improvement in HRQL [16]. This could result in false negative results of trials of pharmacological or rehabilitation interventions: the intervention improves HRQL, but the questionnaire fails to detect that improvement. 
In a number of direct comparisons in randomized trials in widely varying areas other than osteoporosis, specific instruments have proved more responsive to intervention effects than generic measures [17–22]. Results from these trials confirm that concern about lack of responsiveness of generic measures is more than a theoretical concern. We therefore developed and tested a disease-specific questionnaire designed for postmenopausal women suffering chronic back pain due to osteoporosis. Asymptomatic women with osteoporosis have a different set of problems from those who already suffer from clinically important back pain. Because our goal was to produce a questionnaire that would be useful as a measure to evaluate pharmacological and rehabilitative treatment regimens that might be administered to women with pain from osteoporosis, we focused exclusively on this population. In a previous report, we described our survey of the problems of osteoporotic women, the results of which determined the items included in our new questionnaire [8]. This report reviews the development and describes the testing of the Osteoporosis Quality of Life Questionnaire (OQLQ).


Methods

Population

In both the instrument development and instrument testing phases, we included women with the following characteristics:

1. Age over 50 years and a last menstrual period at least 1 year prior to enrollment.
2. At least one vertebral fracture, defined as the radiographic presence of a 15% or more anterior height reduction (relative to posterior height) [23].
3. A clinical diagnosis of chronic back pain due to osteoporosis-induced fractures. Back pain was attributed to a fracture when there was a history of a sudden onset of back pain following a fracture.

We determined the presence of osteoarthritis on the basis of clinical findings on radiographic and physical examination. We determined the presence of muscle spasm on the basis of the clinical physical examination.

We excluded patients with the following characteristics:

1. Psychiatric, emotional, language or cognitive difficulties which would prevent reliable completion of the questionnaire. The clinicians responsible for the patient’s care made this decision.
2. Patients with secondary causes of bone loss including osteomalacia, hyperthyroidism, hyperparathyroidism, multiple myeloma and Cushing’s syndrome.
3. A diagnosis other than osteoporosis which might explain the patient’s back pain (such as osteoarthritis, rheumatoid arthritis, metastatic cancer, cauda equina syndrome, lumbar disc disease, hip fractures, muscular spasm).
4. Another major illness which would substantially influence the patient’s quality of life (such as chronic airflow limitation, asthma, unstable angina, congestive heart failure, blindness, stroke).

We recruited 118 patients from a Canadian rheumatology practice and 108 patients from four American metabolic bone disease practices in Bangor, Maine, Worcester, Massachusetts, Lakewood, Colorado, and Portland, Oregon. All patients had been seen by the participating clinicians in the previous year.
We identified eligible patients through chart review, and then contacted them to see whether they would be willing to participate. While individual practice varied to some extent, approximately 15–20% of the patients in the study received estrogen replacement therapy, approximately 60% received cyclic etidronate, and approximately 20% calcium. A handful of patients in the American centers received calcitonin. Women were advised to mobilize gradually and to take acetaminophen with codeine for pain. Approximately 50% of the Canadian patients received 10–20 mg of amitriptyline at bedtime, but amitriptyline was used very infrequently in the American centers. We measured bone mineral density (BMD) by dual-energy X-ray absorptiometry. We were concerned that the American patients from the metabolic bone disease practices might be quite different from the Canadian patients from a rheumatology practice. However, they proved very similar (see Results).

Instrument Testing General Approach. HRQL instruments may have one or more purposes. A discriminative instrument is designed to distinguish between people at a single point in time. Discriminative instruments require high reliability (high ratio of differences between subjects to difference within subjects) and cross-sectional construct validity (appropriate correlations between established rating scales and the instrument being tested) [24]. An evaluative instrument is designed to measure the magnitude of longitudinal change in an individual or group. Evaluative instruments must be responsive (they can detect important changes, even if the changes are small) and demonstrate longitudinal construct validity (appropriate correlations between changes in the new questionnaire and changes in other measures) [25]. Testing Protocol. To test both the discriminative and the evaluative properties of the OQLQ required repeated administration of the questionnaire during periods in which patients were stable, and over periods in which they had changed. Eligible, consenting patients attended clinic for three visits: at baseline, after 2 weeks (when patients would, for the most part, be stable), and after 6 months (when some patients would be better, some unchanged, and some worse). At each visit, patients completed a number of questionnaires. The interviewers were trained at a workshop. Training involved review of a document providing details of interviewing technique and role play in which potential difficulties were presented, and strategies for dealing with them discussed. We will now describe the questionnaires we used in the study. Because it is a new questionnaire, and the primary focus of this paper, we will review in some detail the process of developing the OQLQ, a process that is fully presented in another publication [8]. Osteoporosis Quality of Life Questionnaire (OQLQ). 
We have described the development of the Item Reduction Questionnaire and the responses that participating women provided in detail in a previous publication [8]. We reviewed generic measures of quality of life [26–32] and existing questionnaires evaluating functional status in low back pain [33–42] and work specifically related to quality of life in osteoporosis [10–15], and extracted items that we thought applicable to our population from these questionnaires. We identified additional items in the physical, emotional, social, occupational and domestic domains through interviews with rheumatologists, rheumatology nurse specialists, physiotherapists, occupational therapists, and patients. We identified 168 unique items which we included in an Item Reduction Questionnaire. While our omission of involvement from clinicians in subspecialties other than rheumatology (particularly gynecology and endocrinology) may have compromised our item selection process, subsequent review of the 168 items by such clinicians has not revealed important omissions. We asked 100 women who met the eligibility criteria listed above whether any of the 168 items on the questionnaire represented problems for them. For items to which they responded affirmatively, the importance of each item was graded using a five-point scale, on which each item was rated from ‘very important’ to ‘not at all important’. We multiplied the response frequency of each item by its mean importance to yield an impact score. In both the questionnaire development and questionnaire testing phases the questionnaires were administered by trained research staff who were tested for standardization and accuracy of questionnaire administration. The 100 patients involved in the item reduction process were a different group from those involved in testing the OQLQ, the subject of the current paper. The participation of these 100 women was restricted to the item reduction phase of the study. The mean age of the 100 women interviewed was 69.0 ± 8.1 years (range 51–91 years). All women had at least one vertebral fracture; 49 had two or more. There was a mean of 2.84 ± 2.18 vertebral fractures (median 2, range 1–11). When considering fractures, we counted individual vertebrae. Fractures were located in the lumbar spine (n = 74 in 42 patients) and thoracic spine (n = 156 in 74 patients). Bone densitometry revealed a mean BMD of 0.87 ± 0.13 g/cm². The highest scores were for items relating to pain, particularly standing, and difficulty lifting and carrying objects.
Fear of falling and fear of additional fractures were serious problems for these patients. We selected the items with the highest impact scores for inclusion in the OQLQ. We pretested the questionnaire with 10 new patients to ensure clarity of wording, lack of discomfort with the questions, and use of the full range of response options. The final OQLQ includes 30 questions distributed across five domains as follows: Symptoms (physical experiences associated with osteoporosis, which proved primarily associated with pain), 9 items; Physical Function (difficulties with components of activities of daily living), 5 items; Activities of Daily Living (ADL; self-care, housework, and associated activities such as shopping), 8 items; Emotional Function (affective components associated with osteoporosis), 4 items; Leisure (recreational activities), 4 items. Each item is associated with a seven-point scale in which a rating of 7 represents the best possible function and a rating of 1 the worst possible function. To produce a summary score for each of the five domains, we add up the patient’s ratings on each item and then divide by the number of items in the domain, yielding a score of 1 to 7.
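The item-reduction and scoring rules described above can be illustrated with a short sketch. The item names, frequencies and ratings below are invented for illustration and are not study data, and the helper names `impact_score` and `domain_score` are hypothetical. The sketch assumes that the response frequency is expressed as the proportion of women reporting the item as a problem and that importance is coded 1 to 5.

```python
from statistics import mean


def impact_score(frequency, importance_ratings):
    """Impact = proportion of women reporting the item x mean importance (assumed coded 1-5)."""
    return frequency * mean(importance_ratings)


def domain_score(item_ratings):
    """Domain summary = sum of the seven-point item ratings divided by the number of items."""
    if any(not 1 <= r <= 7 for r in item_ratings):
        raise ValueError("each item rating must lie between 1 and 7")
    return sum(item_ratings) / len(item_ratings)


# Hypothetical item-reduction data: item -> (proportion reporting, importance ratings)
candidates = {
    "pain on standing": (0.80, [5, 4, 5, 4]),
    "fear of falling": (0.60, [5, 5, 4, 4]),
    "clothes not fitting": (0.50, [2, 3, 2, 1]),
}

# Items ordered from highest to lowest impact score
ranked = sorted(candidates, key=lambda k: impact_score(*candidates[k]), reverse=True)

# Hypothetical responses to the four Emotional Function items (7 = best function)
emotional = domain_score([5, 4, 6, 5])
```

Because each domain score is a mean of item ratings, it stays on the same 1 to 7 scale as the individual items regardless of how many items the domain contains.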


The content of the questionnaire is summarized in the Appendix. Readers interested in the full questionnaire and guidelines for administration can write to the corresponding author. Because we have not conducted a weighting exercise that would allow us to specify the relative weight associated with the domains, we do not recommend adding up the five domains for a total OQLQ score. Sickness Impact Profile (SIP). The SIP is a generic health status measure that has 136 items grouped into 12 categories. Extensive data suggest it is reliable and valid as a discriminative instrument [43]. The SIP can be aggregated into a Physical domain and a Psychosocial domain. SIP scores, which theoretically can vary from 0 (no dysfunction) to 100 (complete dysfunction), are preference-weighted to reflect the impact of each area of dysfunction on quality of life. MOS SF-36. The short form of the Medical Outcomes Survey instrument with 36 items is a widely used generic instrument for measuring HRQL. Extensive data on reliability and validity for discriminative purposes are available [44,45]. The instrument includes nine domains. A number of scoring algorithms have been suggested; the present study used one in which the scores are aggregated into scores for physical function and mental health. Global Ratings. At the two follow-up visits we asked women to make an overall rating of the extent to which their health status had changed with respect to their overall pain, disability, and emotional function since the previous visit, using a 15-point scale from ‘a very great deal worse’ to ‘a very great deal better’ [46]. Patients who said they were unchanged were considered to be stable, while all others were considered to have undergone a change. Brief Pain Inventory (BPI). The BPI was initially developed for cancer pain, but has subsequently been used in other areas. 
The BPI asks patients to rate their worst pain in the last 24 hours, their least pain in the last 24 hours, their average pain in the last 24 hours, and their pain at present. The developers have adduced preliminary evidence of reliability and validity of the BPI [47].

Statistical Methods

Because the patient samples in the United States and Canada were very similar with respect to all key characteristics (see below), after calculating Cronbach’s alpha separately for the two populations, we combined the samples for all main analyses.

Discriminative Properties. We measured the reliability of patients’ responses from the baseline and 2-week visits by calculating intraclass correlation coefficients for
each of the five OQLQ domains. We calculated each coefficient twice, once using the total sample, and once using only those patients who reported that they were unchanged when seen at 2 weeks. The intraclass correlation coefficients examined the ratio of variability due to differences between patients to total variability including variability between and within patients. We calculated Pearson’s correlation coefficients on the mean scores across the three visits. We used the mean of the three visits because estimates of the true relationship between measures are attenuated by the reliability of the measures themselves, and mean scores are thus more likely to reveal the true underlying relationship. Evaluative Properties. The responsiveness of an instrument is directly related to the sample size required to detect small but important differences in HRQL: the more responsive the instrument, the smaller the sample size required. Because of the limited correlations of the global ratings with the OQLQ domains (see Results) we could not directly estimate the change in OQLQ score that represents small but important differences in disease-specific HRQL. However, previous work has suggested that changes in domain score of approximately 0.5 on each domain represent small but important differences in other disease-specific HRQL instruments that, like the OQLQ, use seven-point scales as response options [48–50]. In estimating sample sizes for clinical trials, the smallest differences that are clinically important should be detectable. For calculating sample size for a clinical trial in which both pre- and postintervention scores are available, the noise comes from the variability across adjacent visits over the time frame of the trial. Over two visits, this variability can be quantified as the standard deviation of the difference between scores on these two visits. 
Therefore, calculating the ratio between the minimally important difference and the standard deviation of the difference between changes in OQLQ score over the 6 months between the baseline and second follow-up allows us to estimate sample size requirements for using the OQLQ as a measure of outcome in clinical trials. We made these calculations assuming a two-sided type I (or alpha) error of 0.05 and a beta error of 0.1. We also compared the responsiveness of the OQLQ and other questionnaires by conducting paired t-tests on patients who stated that they had changed on the global rating questionnaires. If patients said they had deteriorated we reversed the sign of their change in OQLQ score, and the score of the instrument with which we were comparing the OQLQ. We compared the t-tests using OQLQ domains with t-tests using other instruments measuring related constructs. Using the same approach as for the discriminative properties, we used a group consensus to generate predictions as to how strongly the changes in the domains of the OQLQ should correlate with the changes in scores in other instruments. We calculated Pearson’s correlation coefficients on the changes between adjacent visits. Patients with complete data provided two data points for each correlation (the difference between the 2-week follow-up and the baseline, and the difference between the 6-month and 2-week follow-ups). Where the interpretation of the correlations was unclear, we calculated correlations of change scores between the other measures.
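The responsiveness comparison described above, a paired t-test on change scores with the sign reversed for patients who reported deterioration, can be sketched as follows. The change scores and global ratings are invented for illustration, and the function names `aligned_changes` and `paired_t` are hypothetical rather than taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def aligned_changes(changes, directions):
    """Reverse the sign of the change score for patients who reported deterioration.

    directions holds +1 for patients who said they improved and -1 for those
    who said they had deteriorated on the global rating.
    """
    return [c * d for c, d in zip(changes, directions)]


def paired_t(changes):
    """Paired t statistic: mean change divided by its standard error."""
    n = len(changes)
    return mean(changes) / (stdev(changes) / sqrt(n))


# Hypothetical change scores (follow-up minus baseline) for three patients
oqlq_change = [1.0, -2.0, 3.0]
# Hypothetical global ratings: improved, deteriorated, improved
direction = [+1, -1, +1]

aligned = aligned_changes(oqlq_change, direction)
t_value = paired_t(aligned)
```

After alignment, a positive value always means change in the direction the patient reported, so improvers and deteriorators can be pooled in a single test. A larger t value for one instrument than for another on the same patients indicates greater responsiveness.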

Results

Patients

In Canada, 118 patients completed the first visit, 110 completed two visits, and 104 all three visits. Of those who did not complete all three visits, 5 lost interest, 3 became too ill, 4 were away from home at the time of scheduled follow-up visits, 1 had problems with transportation, and 1 could not attend because her partner was ill. In the United States, 108 patients completed one visit, 105 completed two visits, and 96 all three visits. Of those who did not complete all three visits, 6 lost interest, 2 became too ill, 4 were away from home at the time of scheduled follow-up visits, and 1 could not attend because her partner was ill. We summarize the salient characteristics of the American and Canadian patients in Table 1. The two groups are remarkably similar in their characteristics at baseline. Further support for pooling the two groups comes from the Cronbach’s alpha for the OQLQ, which was 0.96 in the Canadian women and 0.97 in the American women. In the overall group of 226 women, the mean BMD was 0.84 ± 0.14 g/cm² and the mean number of vertebral fractures was 2.78 (median 2, range 1–11).

Table 1. Baseline characteristics of American and Canadian patients

Characteristic                       Americans         Canadians
Sample size                          108               118
Age (years)                          70.1 ± 7.3a       69.1 ± 7.5
Duration of osteoporosis (years)b    3.5 (1.5, 7.4)c   3.4 (2.2, 5.2)
Lives alone                          36.1%             30.5%
Requires help bathing                5.6%              5.9%
Requires help shopping               21.3%             25.6%
Global pain                          3.6 ± 1.5         3.5 ± 1.5
Global disability                    2.9 ± 1.6         2.7 ± 1.4
Global emotional                     4.2 ± 1.3         4.2 ± 1.1
SF-36 Well-being                     61.9 ± 16.3       61.6 ± 16.0
SF-36 Functional Status              58.3 ± 24.2       57.3 ± 21.7

a Mean and standard deviation.
b Duration from first documented vertebral fracture associated with pain until questionnaire administration.
c Mean and range.

Discriminative Properties

The mean (and standard deviation) of patient scores on the five domains at the baseline visit were Symptoms 4.8 (1.4), Physical Function 4.9 (1.5), ADL 4.9 (1.5), Emotional Function 4.9 (1.5), Leisure 5.2 (1.7). The intraclass correlation coefficients across the baseline and 2-week follow-up visits for the total sample and those who stated they were stable (the latter in brackets) were as follows: Symptoms 0.86 (0.91), Physical Function 0.85 (0.87), ADL 0.89 (0.90), Emotional Function 0.80 (0.84), Leisure 0.83 (0.85). Table 2 shows the correlations between the OQLQ domains and the other measures. Missing entries in the table represent relationships in which we did not make a priori predictions. The correlations are all moderate or strong, and in most cases substantially greater than the correlations predicted. In all cases, chance represents a very unlikely explanation of non-zero correlation (p<0.001).
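The intraclass correlation coefficients above express variability between patients as a share of total variability. As a minimal sketch, the Python below computes a one-way random-effects intraclass correlation from repeated scores. The one-way form is an assumption, since the text does not say which variant was used, and the scores and the function name `icc_oneway` are invented for illustration.

```python
def icc_oneway(scores):
    """One-way random-effects ICC (assumed variant) for n patients with k visits each.

    scores is a list of per-patient lists, each holding k repeated measurements.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    patient_means = [sum(row) / k for row in scores]

    # Mean square between patients
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    # Mean square within patients (visit-to-visit noise)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, patient_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical domain scores at baseline and 2 weeks for two patients
noisy = icc_oneway([[4.0, 5.0], [6.0, 5.0]])

# Hypothetical patients whose scores are identical at both visits
stable = icc_oneway([[1.0, 1.0], [4.0, 4.0], [7.0, 7.0]])
```

When repeated scores are identical within each patient the coefficient equals 1, and it falls as visit-to-visit variation grows relative to the differences between patients.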

Evaluative Properties

Table 3 shows the standard deviations of the differences between adjacent visits, the ratio of the minimal important difference of 0.5 to the standard deviations of the difference between adjacent visits, and the resulting sample sizes needed to detect minimally important differences for the five OQLQ domains. In all cases, fewer than 150 patients per group are needed to detect minimal important differences in any of the OQLQ domains.

Table 2. Cross-sectional correlations of the OQLQ with other measures: tests of discriminative validity

OQLQ domains (rows): Physical Function, Emotional Function, ADL, Symptoms, Leisure
Other instruments measuring pain and physical and emotional function (columns): BPI, SIP Physical, SIP Psychosocial, SF-36 Mental Health, SF-36 Physical Function

Correlations, in source order: 0.72 (>0.3)a, 0.59 (>0.5), 0.81 (>0.5), 0.53 (>0.4), 0.65 (>0.4), 0.79 (>0.4), 0.63 (>0.4), 0.56 (>0.3), 0.79 (>0.25), 0.51 (>0.25), 0.71 (>0.25)

ADL, activities of daily living.
a The numbers in parentheses are the predicted correlations.


Table 3. Sample size requirements for detecting minimal important differences in the five OQLQ domains in a parallel-group randomized trial

OQLQ domain          Standard deviation (SD)   MIDa/SD   Sample size requiredb
Physical Function    1.17                      0.43      114
Emotional Function   1.27                      0.39      139
ADL                  0.97                      0.52      78
Symptoms             1.05                      0.48      92
Leisure              1.11                      0.45      104

a Minimal important difference of 0.5.
b Two-sided alpha error of 0.05 and beta error of 0.1; n per group in a parallel-group randomized trial.
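The sample sizes in Table 3 can be approximated with a short calculation. The paper does not give the formula it used. The sketch below assumes the standard normal-approximation formula for comparing two means, n = 2(z_{1-alpha/2} + z_{1-beta})^2 / (MID/SD)^2 per group, applied to the rounded MID/SD ratios from the table. Under that assumption it reproduces the listed values. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_ratio, alpha=0.05, beta=0.10):
    """Patients per group for a two-sided, two-group comparison of means.

    effect_ratio is the minimal important difference divided by the standard
    deviation (the MID/SD column of Table 3). The normal-approximation formula
    is an assumption, not a method stated in the paper.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_ratio ** 2)


# Rounded MID/SD ratios as listed in Table 3
mid_over_sd = {
    "Physical Function": 0.43,
    "Emotional Function": 0.39,
    "ADL": 0.52,
    "Symptoms": 0.48,
    "Leisure": 0.45,
}

sample_sizes = {domain: n_per_group(ratio) for domain, ratio in mid_over_sd.items()}
```

Because the required n scales with the inverse square of MID/SD, the domain with the largest standard deviation (Emotional Function) needs the most patients and the domain with the smallest (ADL) the fewest.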

Table 4. Relative responsiveness of various instruments in patients reporting change in global ratings

Global rating of change        Instrument                     T value   p value
Change in pain                 OQLQ Symptoms                  5.0       <0.0001
                               Brief Pain Inventory           4.2       <0.0001
Change in disability           OQLQ Physical Function         3.9       0.0002
                               OQLQ ADL                       2.4       0.02
                               OQLQ Leisure                   2.0       0.05
                               SIP Physical Function          0.68      0.5
                               MOS SF-36 Physical Function    2.1       0.04
Change in emotional function   OQLQ Emotional Function        3.4       0.0009
                               SIP Psychosocial Function      2.3       0.03
                               MOS SF-36 Mental Health        4.2       <0.0001

Table 5. Longitudinal correlations of OQLQ with global ratings of change and with changes in other measures: tests of evaluative validity

OQLQ domains (rows): Physical Function, Emotional Function, ADL, Symptoms, Leisure
Global ratings of change (columns): Global pain, Global disability, Global emotion
Other instruments measuring pain and physical and emotional function (columns): BPI, SIP Physical, SIP Psychosocial, SF-36 Mental Health, SF-36 Physical Function

Correlations, in source order: 0.38 (>0.3)a, 0.23 (>0.5), 0.48 (>0.5), 0.28 (>0.5), 0.17 (>0.3), 0.23 (>0.3), 0.16 (>0.25), 0.12 (>0.5), 0.20 (>0.5), 0.35 (>0.4), 0.14 (>0.4), 0.13 (>0.4), 0.21 (>0.3), 0.24 (>0.4), 0.31 (>0.25), 0.03 (>0.25), 0.22 (>0.25)

a The numbers in parentheses are the predicted correlations.

Table 4 demonstrates the relative ability of the OQLQ and other instruments to detect change in patients who reported themselves to be changed in their global ratings. In all cases but one, the OQLQ domains were at least as responsive as other instruments measuring related constructs. The one exception was the MOS SF-36 Mental Health domain, which was slightly more powerful than the OQLQ Emotional Function domain in detecting differences in patients who reported change in their global rating of emotional function. The mean differences in the OQLQ corresponding to the p values in the table were 0.38 for the Symptoms domain, 0.38 for the Physical Function domain, 0.24 for the ADL domain, 0.26 for the Leisure domain, and 0.43 for the Emotional Function domain. Table 5 shows the correlations between changes in the OQLQ and changes in the other measures. Correlations greater than 0.12 were significant at the 0.01 level. In general, correlations were lower than predicted. In particular, correlations were lower than expected between each of the OQLQ domains and the global ratings of change and the SIP Physical and Psychosocial domains.


In contrast, the correlations between changes in the OQLQ domains and both the BPI and SF-36 Physical Function domain were higher, and close to or in the range predicted. We explored the possibility that one or more of the OQLQ, the SIP, or the global ratings of change were not providing valid ratings of change in HRQL by calculating correlations between the global ratings of change and change in the SIP, and between these two instruments and the changes in the BPI and the SF-36 Physical Function and Mental Health scores. The correlations between the global ratings and the other instruments were all 0.22 or less. The correlations between the SIP Physical Function and the other domains were all 0.19 or less. The correlation of the SIP Psychosocial domain with the SF-36 Mental Health domain was 0.28.

Discussion

The strengths of the present work include the large and heterogeneous patient sample, with recruitment of patients from two countries; the careful training of interviewers; and rigorous and separate examination of the properties of the OQLQ for discriminative and evaluative purposes. The results, however, do not provide unequivocal positive support for the OQLQ in either discriminative or evaluative roles.

With regard to its discriminative function, the OQLQ showed excellent reliability in all five domains: intraclass correlation coefficients were consistently above 0.75 and in most cases above 0.85. Correlations with independent measures of pain, physical function, and emotional function were high. In fact, they were appreciably higher than we anticipated would be consistent with the instrument’s measuring osteoporosis-related HRQL. While these results strongly suggest that the instrument does discriminate between patients with respect to their HRQL, they raise the question of how much the instrument is adding to established generic measures. Further study may elucidate what (if any) important information the OQLQ can contribute with respect to differences between HRQL in osteoporotic women.

The problem with respect to the OQLQ’s evaluative function is quite different. All OQLQ domains performed well in terms of responsiveness. The OQLQ domains were at least as responsive as alternative measures of the same constructs (Table 4) and results suggest that fewer than 150 patients per group are needed to detect minimally important differences in HRQL when using the OQLQ in clinical trials (Table 3). In addition, the OQLQ proved considerably more responsive than generic measures. For instance, the OQLQ Physical Function domain was much better at detecting change than either the SIP or SF-36 Physical Function domains (Table 4). These findings are consistent with comparisons of generic and specific instruments summarized in the Introduction.


With respect to the OQLQ’s validity as an evaluative instrument, the results are less clear. We found that correlations between changes in the OQLQ Physical Function, ADL, and Leisure domains and changes in two other measures, a problem-specific pain measure (the BPI) and the SF-36 Physical Function domain, were at, or slightly lower than, the predicted magnitude (Table 5). However, the correlation between change in the OQLQ Emotional Function domain and change in the SF-36 Mental Health domain was considerably lower than predicted, and correlations between changes in the OQLQ and changes in both global ratings of change and the SIP were much lower than predicted. These results are consistent with two interpretations. The OQLQ may not be measuring disease-specific HRQL, or the global ratings of change and the SIP may not be valid in their evaluative function in this population. We examined these hypotheses by calculating the correlations between changes in the SIP and the global ratings of change on the one hand, and between these two measures and changes in the SF-36 and BPI on the other. The correlations were uniformly low, supporting the hypothesis that the low correlations between the OQLQ and the SIP and global ratings reflected problems with the validity of the latter two instruments rather than the former. We could hypothesize why the global ratings and the SIP have limited validity in this population. The global ratings asked women to estimate the degree to which they had improved or deteriorated. Perhaps they did not clearly remember their prior state, or were unable to place it in relation to their current state. With respect to the SIP, its items may not have focused sufficiently on the specific problems of osteoporotic women. These explanations are, however, speculative. The sample size of 226 in this study could be viewed as either a strength or a weakness.
While greater than the usual sample size in initial validation of disease-specific instruments, the sample size is much smaller than for studies mounted to validate major generic instruments. A second limitation is that the low longitudinal correlations with some of the validation instruments, particularly the SIP, raise questions about the validity of the OQLQ as a measure of change. Finally, we carefully trained our interviewers, and it is likely that inferior results would be obtained with untrained interviewers. We conclude that the OQLQ shows promise as an instrument for measuring HRQL for women with osteoporotic back pain for clinical trials. However, the limitations we have listed above necessitate continued exploration of the OQLQ’s measurement properties. We therefore recommend that investigators interested in measuring HRQL include not only the OQLQ but also other instruments with potential usefulness as HRQL measures in osteoporosis. We would suggest instruments similar to those we used in the current study, including generic instruments such as the SF-36, and specific measures of pain, functional status, and emotional function.


The OQLQ provides considerably more detailed information about the impact of osteoporosis on women’s lives than other instruments, and the preliminary evidence of increased responsiveness recommends it for use in evaluating the effect of osteoporosis treatment on HRQL. Direct comparison of instruments in the context of clinical trials will further elucidate the measurement properties of the OQLQ and will, over time, establish the optimal method of HRQL measurement in women with osteoporotic back pain.

Appendix: Osteoporosis Quality of Life Questionnaire

The questionnaire includes 30 questions. Each has one of five sets of response options, identified by the color of the card (grey, blue, pink, orange, yellow). Before reading each question ensure that the patient is looking at the correct color-coded card.

1. In general, how often during the last 2 weeks have you experienced a lack of energy because of your back problems due to osteoporosis? (orange card)
2. How often in the last 2 weeks have you felt afraid of fractures? (orange card)
3. How difficult has it been for you to bend in the last 2 weeks because of your back problems due to osteoporosis? (blue card)
4. How difficult has it been for you to clean a bathtub in the last 2 weeks? (blue card)
5. How difficult has it been for you to garden in the last 2 weeks because of your back problems due to osteoporosis? (blue card)
6. How often in the last 2 weeks did you experience decreased flexibility? (orange card)
7. How often in the last 2 weeks have you felt afraid of falling? (orange card)
8. How difficult has it been for you to carry things in the last 2 weeks because of your back problems due to osteoporosis? (blue card)
9. How difficult has it been for you to cut your toenails in the last 2 weeks? (blue card)
10. How difficult has it been for you to participate in sports or other physical exercise in the last 2 weeks? (blue card)
11. How much distress or discomfort have you had because of pain in the last 2 weeks? (grey card)
12. How often in the last 2 weeks have you felt angry about having an illness at a time of life when you planned to enjoy yourself? (orange card)
13. How difficult has it been for you to lift things in the last 2 weeks? (blue card)
14. How difficult has it been for you to get into and out of a car in the last 2 weeks because of your back problems due to osteoporosis? (blue card)
15. How difficult has it been for you to travel in the last 2 weeks? (blue card)
16. How much distress or discomfort have you had due to pain from bending in the last 2 weeks? (grey card)
17. How often in the last 2 weeks have you felt frustrated? (orange card)
18. In the last 2 weeks, how much trouble have you had finding a comfortable chair to sit in? (yellow card)
19. How difficult has it been for you to do housework in the last 2 weeks? (blue card)
20. How difficult has it been for you to take the type of vacation or holiday you enjoy because of your back problems due to osteoporosis? (blue card)
21. How much distress or discomfort have you had because of pain from carrying things in the last 2 weeks? (grey card)
22. How much has your walking been limited over the last 2 weeks because of your back problems due to osteoporosis? (pink card)
23. How difficult has it been for you reaching and retrieving something from overhead cupboards or shelves in the last 2 weeks? (blue card)
24. How much distress or discomfort have you had in the last 2 weeks because it has been painful to sit for long? (grey card)
25. How difficult has it been for you to shop for clothes in the last 2 weeks? (blue card)
26. How much distress or discomfort have you had in the last 2 weeks because it has been painful to stand for a long time? (grey card)
27. How difficult has it been for you to shop for groceries in the last 2 weeks? (blue card)
28. How much distress or discomfort have you had in the last 2 weeks because it has been painful to walk? (grey card)
29. How difficult has it been for you to vacuum in the last 2 weeks? (blue card)
30. How often in the last 2 weeks have you experienced tiredness because of your back problems due to osteoporosis? (orange card)

Response Options

Grey card
1 extreme distress or discomfort
2 very much distress or discomfort
3 quite a bit of distress or discomfort
4 moderate distress or discomfort
5 some distress or discomfort
6 a little distress or discomfort
7 no distress or discomfort

Blue card
1 extremely difficult – impossible to do
2 very difficult – almost impossible
3 quite a bit difficult
4 moderately difficult
5 somewhat difficult
6 a little difficult
7 not difficult
8 not applicable

Pink card
1 extremely limited
2 very limited
3 quite a bit limited
4 moderately limited
5 somewhat limited
6 a little limited
7 not limited

Orange card
1 all of the time
2 most of the time
3 a good bit of the time
4 some of the time
5 a little of the time
6 hardly any of the time
7 none of the time

Yellow card
1 extreme trouble
2 very much trouble
3 quite a bit of trouble
4 moderate trouble
5 some trouble
6 a little trouble
7 no trouble
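The response options above, together with the item groupings given under Domains in this appendix, suggest how recorded answers might be turned into domain scores. The sketch below is an illustration only: the appendix does not state a scoring rule, so taking the mean of the answered items in each domain, and treating option 8 ("not applicable") on blue-card items as missing, are assumptions. The function and variable names are hypothetical. On every card, higher values indicate better HRQL (7 = no distress, not difficult, not limited, none of the time, no trouble).

```python
# Item numbers per domain, as listed in the appendix.
DOMAINS = {
    "Symptoms": [1, 6, 11, 16, 21, 24, 26, 28, 30],
    "Emotional Function": [2, 7, 12, 17],
    "Physical Function": [3, 8, 13, 18, 22],
    "Activities of Daily Living": [4, 5, 9, 14, 19, 23, 25, 27, 29],
    "Leisure and Social Activities": [10, 15, 20],
}

# Items answered with the blue card, the only card offering option 8.
BLUE_ITEMS = {3, 4, 5, 8, 9, 10, 13, 14, 15, 19, 20, 23, 25, 27, 29}


def score_domains(responses):
    """Return {domain: mean of answered items, or None if none were answered}.

    `responses` maps item number (1-30) to the option the patient chose.
    Assumption: unanswered items and "not applicable" answers are skipped.
    """
    scores = {}
    for domain, items in DOMAINS.items():
        values = []
        for item in items:
            answer = responses.get(item)
            if answer is None:
                continue  # item not answered
            if answer == 8 and item in BLUE_ITEMS:
                continue  # "not applicable" on the blue card
            if not 1 <= answer <= 7:
                raise ValueError(f"item {item}: invalid response {answer}")
            values.append(answer)
        scores[domain] = sum(values) / len(values) if values else None
    return scores


# Hypothetical patient: "4" on every item, except the three leisure items.
example = {item: 4 for item in range(1, 31)}
example.update({10: 8, 15: 7, 20: 5})
print(score_domains(example))
```

In this hypothetical example, item 10 is marked not applicable, so the Leisure and Social Activities score is the mean of items 15 and 20 only.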

Domains

The items are grouped into five domains as follows:

Symptoms (1, 6, 11, 16, 21, 24, 26, 28, 30)
Emotional Function (2, 7, 12, 17)
Physical Function (3, 8, 13, 18, 22)
Activities of Daily Living (ADL) (4, 5, 9, 14, 19, 23, 25, 27, 29)
Leisure and Social Activities (10, 15, 20)

Acknowledgements. This work was supported by Merck, Sharp and Dohme, Inc., and by Procter and Gamble Pharmaceuticals, Inc. We would like to thank the following people for help with data collection: Sandra Harper, Carol Wasnok, Marianne Bovee, Sandra Rosenberg, Jodee Kennedy, Debra Storm, Marcia Ostrowski, Jeffrey Hopkins, Sandy Pickens, and Roberta Mansfield. D.J.C. is a Career Scientist of the Ontario Ministry of Health. J.D.A. is an Associate Fellow of the Arthritis Society.

References

1. Ettinger B, Genant HK, Cann CE. Long-term estrogen replacement therapy prevents bone loss and fractures. Ann Intern Med 1985;102:319–24.
2. Jensen F, Christiansen C, Boesen J, et al. Relationship between bone mineral content and frequency of postmenopausal fractures. Acta Med Scand 1983;215:61–3.
3. Goldsmith NF, Johnston JO, Picetti G, et al. Bone mineral in the radius and vertebral osteoporosis in an insured population: a correlative study using I–125 photon absorption and miniature roentgenography. J Bone Joint Surg Am 1973;55:1276–93.
4. Riggs BL, Melton LJ III. Involutional osteoporosis. N Engl J Med 1986;314:1676–86.
5. Ettinger B, Block JE, Smith R, et al. An examination of the association between vertebral deformities, physical disabilities and psychosocial problems. Maturitas 1988;10:283–96.
6. Ettinger B, Black DM, Nevitt MC, et al. Contribution of vertebral deformities to chronic back pain and disability. J Bone Miner Res 1992;7:449–56.
7. Leidig G, Minne HW, Sauer P, et al. A study of complaints and their relation to vertebral destruction in patients with osteoporosis. Bone Miner 1990;8:217–29.
8. Cook DJ, Guyatt GH, Adachi JD, et al. Quality of life issues in women with vertebral fractures due to osteoporosis. Arthritis Rheum 1993;36:750–6.
9. Ross PD, Davis JW, Epstein RS, Wasnich RD. Pain and disability associated with new vertebral fractures and other spinal conditions. J Clin Epidemiol 1994;47:231–9.
10. Roberto KA. Women and osteoporosis: the role of the family and service community. Gerontologist 1988;28:224–8.
11. Roberto KA. Stress and adaptation patterns of older osteoporotic women. Women Health 1988;14:105–19.
12. Gold DT, Smith SD, Bales CW, Lyles KW, Westlund RE, Drezner MK. Osteoporosis in late life: does health locus of control affect psychosocial adaptation? J Am Geriatr Soc 1991;39:670–5.
13. Lyles KW, Gold DT, Shipp KM, Pieper CF, Martinez S, Mulhausen PL. Association of osteoporotic vertebral compression fractures with impaired functional status. Am J Med 1993;94:595–601.
14. Gold DT, Lyles KW, Bales CW, Drezner MK. Teaching patients coping behaviours: an essential part of successful management of osteoporosis. J Bone Miner Res 1989;4:799–801.
15. Gold DT, Bales CW, Lyles KW, Drezner MK. Treatment of osteoporosis. J Am Geriatr Soc 1989;37:417–22.
16. Guyatt GH, Feeny DH, Patrick DL. Measuring health-related quality of life: basic sciences review. Ann Intern Med 1993;70:225–30.
17. Tandon PK, Stander H, Schwarz RP Jr. Analysis of quality of life data from a randomized, placebo controlled heart-failure trial. J Clin Epidemiol 1989;42:955–62.
18. Smith D, Baker G, Davies G, Dewey M, Chadwick DW. Outcomes of add-on treatment with Lamotrigine in partial epilepsy. Epilepsia 1993;34:312–22.
19. Chang SW, Fine R, Siegel D, Chesney M, Black D, Hulley SB. The impact of diuretic therapy on reported sexual function. Arch Intern Med 1991;151:2402–8.
20. Tugwell P, Bombardier C, Buchanan WW, Goldsmith C, Grace E, Bennett KJ, et al. Methotrexate in rheumatoid arthritis: impact on quality of life assessed by traditional standard-item and individualized patient preference health status questionnaires. Arch Intern Med 1990;150:59–62.
21. Laupacis A, Wong C, Churchill D. The use of generic and specific quality-of-life measures in hemodialysis patients treated with erythropoietin. Control Clin Trials 1991;12:5168S–79.
22. Goldstein RS, Gort EH, Guyatt GH, Stubbing D, Avendano MA. Prospective randomized controlled trial of respiratory rehabilitation. Lancet 1994;344:1394–7.
23. Riggs BL, Seeman E, Hodgson SF, Taves DR, O’Fallon WM. Effect of fluoride/calcium regimen on vertebral fracture occurrence in postmenopausal osteoporosis: comparison with conventional therapy. N Engl J Med 1982;306:446–50.
24. Kirshner B, Guyatt GH. A methodologic framework for assessing health indices. J Chron Dis 1985;38:27–36.
25. Guyatt GH, Kirshner B, Jaeschke R. Measuring health status: what are the necessary measurement properties? J Clin Epidemiol 1992;45:1341–5.
26. Bergner M, Bobbitt RA, Carter WB, et al. The Sickness Impact Profile: development and final revision of a health status measure. Med Care 1981;19:787–805.
27. Hunt SM, McKenna SP, McEwen J, et al. A quantitative approach to perceived health status: a validation study. J Epidemiol Commun Health 1980;34:281–6.
28. Sackett DL, Chambers LW, MacPherson AS, et al. The development and application of indices of health: general methods and a summary of results. Am J Public Health 1977;67:423–42.
29. Andrews FM, Withey SB. Social indicators of well being: Americans’ perception of life quality. New York: Plenum Press, 1976.
30. Brook RH, Ware JE, Davies-Avery A, et al. Conceptualization and measurement of health for adults in the Health Insurance Study: overview. Med Care 1979;17(7 Suppl):1–131.
31. Stewart AL, Ware JE, Brook RH, et al. Conceptualization and measurement of health for adults in the Health Insurance Study: vol II, Physical health in terms of functioning. Rand Corporation, report R-1987/2-HEW, July 1978.
32. Parkerson GR, Gehlbach SH, Wagner EH, et al. The Duke-UNC Health Profile: an adult health status instrument for primary care. Med Care 1981;19:806–28.
33. Roland M, Morris R. A study of the natural history of back pain. Spine 1983;8:141–4.
34. Follick MJ, Smith TW, Ahern DK. Sickness Impact Profile: global measure of disability on chronic low back pain. Pain 1985;21:67–76.
35. Tait RC, Pollard CA, Margolis RB, et al. Pain disability index: psychometric and validity data. Arch Phys Med Rehabil 1987;68:438–41.
36. Million R, Hall W, Nilsen KH, et al. Assessment of progress of back pain patient. Spine 1982;7:204–12.
37. Rock DL, Fordyce WE, Brockway JA, et al. Measuring functional impairment associated with pain: psychometric analysis of exploratory scoring protocol for activity pattern indicators. Arch Phys Med Rehabil 1984;65:295–300.
38. Evans JH, Kagan A. Development of functional rating scale to measure treatment outcome of chronic spinal patients. Spine 1986;11:277–81.
39. Fairbank JCT, Davies JB, Mbaot JC, et al. Oswestry low back pain disability questionnaire. Physiotherapy 1980;66:271–3.
40. Waddell G, Main CJ. Assessment of severity in low back disorders. Spine 1984;9:204–8.
41. Deyo RA, Diehl AK. Measuring physical and psychosocial function in patients with low back pain. Spine 1983;8:635–42.
42. Deyo R. Measuring functional status of patients with low back pain. Arch Phys Med Rehabil 1988;69:1044–53.
43. Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile: development and final revision of a health status measure. Med Care 1981;19:787–805.
44. Ware JE, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36). I. Conceptual framework and item selection. Med Care 1992;30:473.
45. McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36). II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993;31:247–63.
46. Guyatt GH, Berman LB, Townsend M, Taylor DW. Should study subjects see their previous responses? J Chron Dis 1985;38:1003–7.
47. Cleeland CS. Assessment of pain in cancer. In: Foley KM, editor. Advances in pain research and therapy, vol 16. New York: Raven Press, 1990.
48. Guyatt GH, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chron Dis 1987;40:171–8.
49. Juniper EF, Guyatt GH, Willan A, Griffith LE. Determining a minimal important change in a disease-specific quality of life instrument. J Clin Epidemiol 1994;47:81–7.
50. Juniper EF, Guyatt GH, Feeny DH, Ferrie PJ, Griffith LE, Townsend M. Measuring quality of life in children with asthma. Quality Life Res 1996;5:35–46.

Received for publication 5 June 1996 Accepted in revised form 4 April 1997