Convergent Reliability and Validity of the Questions About Behavioral ...

P1: GXB Journal of Developmental and Physical Disabilities [jodd]

Pp967-jodd-472040

October 21, 2003

Journal of Developmental and Physical Disabilities, Vol. 15, No. 4, December 2003 (© 2003)

Convergent Reliability and Validity of the Questions About Behavioral Function and the Motivation Assessment Scale: A Replication Study¹

Karrie A. Shogren²,³ and Johannes Rojahn⁴,⁵

This study compared key psychometric properties of the Motivation Assessment Scale (MAS) and the Questions About Behavioral Function (QABF) and explored their convergent validity. Twenty adults with mental retardation and problem behaviors (aggression, self-injury, or property destruction) and 31 respondents participated. Test–retest reliability of the subscales in both scales was good to excellent (Cicchetti, D. V., 1994, Psychol. Assess. 6: 284–290), and, except for 1 QABF subscale, internal consistency was good considering the small number of items and the purpose of the scale. Consistent with some earlier studies, interrater reliability was less satisfactory, with both scales falling only into the fair to good range. Correlations between functionally equivalent subscales were statistically significant and were generally higher than correlations between nonequivalent subscales. The QABF and the MAS were found to be comparable in terms of the assessed reliabilities, and both instruments appear to be measuring very similar constructs.

KEY WORDS: Functional assessment; behavior problems; mental retardation.

¹ This research was conducted as part of the first author's Senior Honors Thesis.
² The University of Kansas, Lawrence, Kansas.
³ née Scott.
⁴ George Mason University, Fairfax, Virginia.
⁵ To whom correspondence should be addressed at Center for Cognitive Development, George Mason University, 4400 University Drive, MSN 2C6, Fairfax, Virginia 22030-4444; e-mail: [email protected].

© 2003 Plenum Publishing Corporation. 1056-263X/03/1200-0367/0


The functional properties of problem behaviors such as self-injurious behavior (SIB), aggression, and property destruction in persons with mental retardation are important indicators for behavioral treatment selection (Carr, 1977; Iwata et al., 1982). To determine the functional properties of a target behavior, several behavior-rating scales have been developed. Among them are the Motivation Analysis Rating Scale (MARS; Wieseler et al., 1985), the Behavior Analytic Questionnaire (BAQ; Hauck, 1985), the Motivation Assessment Scale (MAS; Durand and Crimmins, 1988), the Stimulus Control Checklist (SCC; Van Houten and Rolider, 1991), and the Questions About Behavioral Function (QABF; Matson and Vollmer, 1995). The MARS and SCC are generally not considered psychometrically robust (Sturmey, 1994). In addition, the MARS, SCC, and the BAQ are rarely found in the literature, and there is little evidence that they are much in use.

The MAS (Durand and Crimmins, 1988) is the most widely used and extensively tested functional behavior rating scale (Sturmey, 1994). The original authors of the MAS reported strong reliability and validity (Durand and Crimmins, 1988), and the a priori factor structure was later empirically confirmed by factor analysis (Bihm et al., 1991). However, numerous other studies have failed to replicate the optimistic psychometric findings of the developers (Duker and Sigafoos, 1998; Sigafoos et al., 1994; Zarcone et al., 1991).

The QABF was developed by Matson and Vollmer (1995). Paclawskyj et al. (2000) reported promising interrater and test–retest reliability, and they also conducted a factor analysis that revealed five factors consistent with the a priori subscales.

The purpose of this study was to compare psychometric properties of the QABF and the MAS and to replicate the cross-validation of the instruments reported by Paclawskyj et al. (2001).

METHOD

Participants

Participants were 20 individuals with mental retardation who attended day programs provided by a public service provider in central Ohio. To be eligible for the study, candidates had to be on a Restricted Procedure Plan for a problem behavior (self-injury, property destruction, or aggression). A Restricted Procedure Plan is required for every individual whose behavior is dangerous enough to warrant aversive and/or restrictive interventions and must be approved and monitored by the county Human Rights Committee. Sixty percent of the participants were between 20 and 29 years of age, 35% were between 30 and 39, and 5% were between 40 and 49. The majority of them were male (75%). Their levels of mental retardation ranged from mild (15%), moderate (30%), severe (10%), severe/profound (10%), to profound (35%). Thirteen participants had aggressive behavior, 6 had SIB, and 1 was treated for property destruction.

Respondents

Thirty-one direct care staff members who were employed at the participants' programs volunteered as respondents. All had reasonable familiarity with the participants and their behaviors, which was defined as having known or worked with the respective client for 6 months or more prior to the study.

Instruments

Questions About Behavioral Function (QABF)

The QABF is an informant behavior rating scale that consists of 25 items. Each item is rated on a 4-point Likert-type scale (× = does not apply, 0 = never, 1 = rarely, 2 = some, 3 = often). The items were developed to probe five different functional properties, which are reflected by five subscales (Attention, Escape, Tangible, Nonsocial, and Physical).

Motivation Assessment Scale (MAS)

The MAS is an informant behavior rating scale that consists of 16 questions assigned to four subscales (Attention, Escape, Sensory, and Tangible). Unlike the QABF, a respondent completes the MAS directly without the involvement of an interviewer. The questions are rated on a 7-point Likert-type scale (0 = never, 1 = almost never, 2 = seldom, 3 = half the time, 4 = usually, 5 = almost always, 6 = always).

Procedure

One of the researchers (KAS) introduced the respondents to the rationale and the administration of the MAS and the QABF. Consistent with previous studies, the MAS was completed directly by the respondents without the involvement of an interviewer, whereas the QABF was administered by interview. The researcher, who conducted the QABF interview, presented each question one at a time, and recorded the informant's response on a score sheet. When responding to the items of the MAS and QABF, respondents were instructed to focus on the behavior that was on the Restricted Procedure Plan. Respondents were told they could ask questions at any time. The order of presentation of the two instruments was counterbalanced across participants. For each client two staff persons independently completed the instruments to assess interrater reliability. One of them completed the instruments again 2 weeks after the initial completion to assess test–retest reliability.

Table I. Reliability Coefficients of the QABF and the MAS

             Interrater (ICC)    Test–Retest (ICC)    Internal Consistency (α)
Subscale     QABF     MAS        QABF     MAS         QABF     MAS
Attention    .60      .52        .86      .72         .88      .96
Escape       .53      .35        .69      .71         .83      .84
Tangible     .46      .53        .91      .78         .82      .80
Sensory      .57      .73        .90      .88         .83      .83
Physical     .53      n.a.       .61      n.a.        .24      n.a.
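Table I reports internal consistency as Cronbach's α. As a rough illustration of how such a coefficient can be obtained for a single subscale, the sketch below applies the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of totals). The function name `cronbach_alpha` and the ratings are hypothetical, and the 5-item layout assumes that the 25 QABF items are split evenly across the five subscales, which the text implies but does not state. It is not the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for one subscale.

    scores: list of rows, one per completed form; each row holds that
    form's item ratings for the subscale.
    """
    k = len(scores[0])  # number of items in the subscale
    # Sample variance of each item across forms.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the subscale total across forms.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (0-3) on a 5-item subscale from six forms.
example = [
    [3, 3, 2, 3, 3],
    [0, 1, 0, 0, 1],
    [2, 2, 3, 2, 2],
    [1, 0, 1, 1, 0],
    [3, 2, 3, 3, 3],
    [1, 1, 1, 2, 1],
]
print(f"alpha = {cronbach_alpha(example):.2f}")
```

All else being equal, α tends to be lower for short scales, which is worth keeping in mind when reading coefficients for subscales of only four or five items.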

RESULTS

Interrater Reliability

Interrater reliability was calculated by intraclass correlation coefficients (ICC), considered by many to be the most appropriate formula (e.g., Shrout and Fleiss, 1979). Values can be found in Table I. In addition, Pearson product–moment correlations (r) were calculated because they were reported in several earlier studies. The r-values of the QABF subscales ranged from .46 to .60 and those of the MAS subscales from .35 to .73. Using Cicchetti's interpretive guidelines (Cicchetti, 1994) for correlation coefficients,⁶ interrater reliability for the QABF subscales ranged from fair to good; interrater reliability of the MAS subscales ranged from poor to good.

Generally we found that the original authors of the instruments tended to report higher interrater reliability than other investigators did in subsequent studies. Paclawskyj et al. (2000) found considerably higher interrater reliability scores for all five QABF subscales than were found in the current study (ranging from .79 to .99). Similarly, Durand and Crimmins (1988) found higher scores for three of the four MAS subscales compared to this study (Pearson r ranging from .80 to .90). Duker and Sigafoos (1998), Sigafoos et al. (1994), Newton and Sturmey (1991), and Spreat and Connelly (1996) for the most part found even lower interrater reliability values for the MAS [...]

⁶ [...] .74).
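The interrater analysis can be sketched in two steps: compute an ICC from paired ratings, then label the coefficient with Cicchetti's bands. The text does not say which Shrout and Fleiss form was used; the one-way random-effects form ICC(1,1) is chosen here only because different pairs of staff rated different clients, and that choice is an assumption. The band cutoffs (below .40 poor, .40 to .59 fair, .60 to .74 good, .75 and above excellent) are the commonly cited Cicchetti (1994) values and are not spelled out in the surviving text. Function names and the paired ratings are hypothetical; the coefficients in `INTERRATER` are copied from Table I.

```python
def icc_1_1(ratings):
    """One-way random-effects ICC(1,1) after Shrout and Fleiss (1979).

    ratings: list of rows, one per client; each row holds the scores
    given by that client's k raters (k = 2 in the design above).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-client mean square.
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-client mean square.
    wms = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)


def cicchetti_band(coef):
    """Label a coefficient with the commonly cited Cicchetti (1994) bands."""
    if coef < 0.40:
        return "poor"
    if coef < 0.60:
        return "fair"
    if coef < 0.75:
        return "good"
    return "excellent"


# Interrater coefficients copied from Table I (the MAS has no Physical subscale).
INTERRATER = {
    "QABF": {"Attention": 0.60, "Escape": 0.53, "Tangible": 0.46,
             "Sensory": 0.57, "Physical": 0.53},
    "MAS": {"Attention": 0.52, "Escape": 0.35, "Tangible": 0.53,
            "Sensory": 0.73},
}

for scale, subscales in INTERRATER.items():
    for name, coef in subscales.items():
        print(f"{scale:5s} {name:10s} {coef:.2f} {cicchetti_band(coef)}")

# Hypothetical subscale totals from two staff members for five clients.
pairs = [[12, 10], [3, 5], [8, 9], [0, 2], [14, 11]]
print(f"ICC(1,1) = {icc_1_1(pairs):.2f}")
```

Applied to the Table I values, the labels span fair to good for the QABF and poor to good for the MAS, which agrees with the interpretation given in the text.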


Although this is disappointing, the findings were consistent with previous studies involving the MAS. There too, independent researchers failed to obtain interrater reliability levels as high as those reported by the original developers, Durand and Crimmins (1988). However, interrater reliability in this study was somewhat higher than in any of the other MAS replication studies (Duker and Sigafoos, 1998; Newton and Sturmey, 1991; Sigafoos et al., 1994; Spreat and Connelly, 1996). As for the QABF, Paclawskyj et al. (2000) also found higher levels of QABF interrater reliability than this study, despite important methodological similarities between the studies (e.g., use of untrained respondents, participants with variable rates of behaviors and with behaviors other than SIB). At this point the reason for the difference in interrater reliability between the studies is not readily apparent.

Future studies should explore the utility of a consensus process in situations when multiple respondents disagree. Another way to minimize distortion in informant ratings is to use two or more raters and average their scores, a format recommended for other rating scales (e.g., Reiss, 1988). Discrepancies between respondents may be a function of several factors, not all of which can be attributed to the instrument itself. They may also reflect different levels of expertise of the raters, different levels of familiarity with the client, or differences in the circumstances in which the raters tend to interact with the client, which may affect the actual functional properties of the targeted behavior.

Another form of interrater reliability comparison that could be explored is the congruence between treatment indication decisions made on the basis of the MAS and the QABF (e.g., Thompson and Emerson, 1995). Although a potentially important subject of inquiry, this remained unexplored because our respondents were not asked to identify a treatment based on the rating scale results. In planning such a study, one should consider that the respondent who filled out the rating scales may or may not be the one who uses the information for treatment decisions.

Test–retest reliability ranged between good and excellent in both instruments and generally was on the same level as reported by the instruments' original authors. As for the QABF, the subscale retest-reliability data were similar to the values reported by Paclawskyj et al. (2000), except for the Physical subscale. As far as the MAS was concerned, test–retest reliability for the subscales was similar to the figures reported by Durand and Crimmins (1988).

Internal consistency of the instruments was comparable, with slightly better scores for most of the MAS subscales. Except for the Physical subscale of the QABF, α-levels of all subscales were .80 or higher. Considering the way functional assessment scales are interpreted (i.e., clinical decisions usually do not depend on the exact score of the instrument), these values can be considered fairly good (Nunnally, 1967). As for the MAS, internal consistency was comparable to that found in the Duker and Sigafoos study (Duker and Sigafoos, 1998), but generally higher than that found in the other three studies that assessed internal consistency of the MAS (Bihm et al., 1991; Duker and Sigafoos, 1998; Newton and Sturmey, 1991; Spreat and Connelly, 1996).

Convergent validity between the MAS and the QABF, which was examined through correlations between functionally analogous subscales, was by and large satisfactory in this study and suggests that four of the analogous scales of the two instruments measure similar constructs. The results of this study offer slightly stronger support of convergent validity than those of the earlier study by Paclawskyj et al. (2001). Some of the correlations between functionally nonanalogous scales were also relatively high, however, which may suggest that the behaviors of the participants in this sample were motivated by more than one function. In this regard, there were a couple of noticeable discrepancies between this study and the Paclawskyj et al. (2001) study. For instance, Escape and Attention as measured by the MAS and the QABF in this study showed correlations of .64 (statistically significant) and .31, respectively (see Table II, top panel); Paclawskyj et al. (2001) reported nonsignificant correlations of −.28 and −.14. Also, in this study the MAS subscale Tangible had a correlation of .17 with the QABF subscale Nonsocial, whereas Paclawskyj et al. (2001) found a significant correlation of .66. Further data will be necessary to determine the convergent and divergent character of the subscales of these two scales.

In summary, the QABF and the MAS were found to be very similar in terms of their respective reliability measures, with problems remaining with interrater reliability. Therefore, clinical results yielded by the QABF or the MAS, particularly those where independent and credible informants disagree, should be treated with caution unless other corroborating clinical evidence such as behavior observations, functional analysis, and a consensus among several independent respondents can be obtained.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the generous support of the Franklin County Board of Mental Retardation and Developmental Disabilities and thank the staff members who served as volunteer respondents.

REFERENCES

Bihm, E. M., Kienlen, T. L., Ness, M. E., and Poindexter, A. R. (1991). Factor structure of the Motivation Assessment Scale for persons with mental retardation. Psychol. Rep. 68: 1235–1238.


Carr, E. G. (1977). The motivation of self-injurious behavior: A review of some hypotheses. Psychol. Bull. 84: 800–816.

Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol. Assess. 6: 284–290.

Duker, P. C., and Sigafoos, J. (1998). The Motivation Assessment Scale: Reliability and construct validity across three topographies of behavior. Res. Dev. Disabil. 19: 131–141.

Durand, M. V., and Crimmins, D. B. (1983). A preliminary report on an instrument which assesses the functional significance of children's deviant behavior. Paper Presented at the Berkshire Association for Behavior Analysis and Therapy, Amherst, MA.

Durand, M. V., and Crimmins, D. B. (1988). Identifying the variables maintaining self-injurious behavior. J. Autism Dev. Disord. 18: 99–117.

Hauck, F. (1985). Development of a behavior-analytic questionnaire precising four functions of self-injurious behavior in the mentally retarded. Int. J. Rehabil. Res. 8: 350–352.

Iwata, B. A., Dorsey, M. F., Slifer, K. J., Bauman, K. E., and Richman, G. S. (1982). Toward a functional analysis of self-injury. Anal. Intervention Dev. Disabil. 2: 3–20.

Matson, J. L., and Vollmer, T. R. (1995). The Questions About Behavioral Function (QABF) User's Guide, Scientific Publishers, Baton Rouge, LA.

Newton, J. T., and Sturmey, P. (1991). The Motivation Assessment Scale: Interrater reliability and internal consistency in a British sample. J. Ment. Deficiency Res. 35: 472–474.

Nunnally, J. C. (1967). Psychometric Theory, McGraw Hill, New York.

Paclawskyj, T. R., Matson, J. L., Rush, K. S., Smalls, Y., and Vollmer, T. R. (2000). Questions about behavior function (QABF): A behavioral checklist for functional assessment of aberrant behavior. Res. Dev. Disabil. 21: 223–229.

Paclawskyj, T. R., Matson, J. L., Rush, K. S., Smalls, Y., and Vollmer, T. R. (2001). Assessment of the convergent validity of the Questions About Behavioral Function scale with analogue functional analysis and the Motivation Assessment Scale. J. Intellectual Disabil. Res. 45: 484–494.

Reiss, S. (1988). Test Manual for the Reiss Screen for Maladaptive Behavior, International Diagnostic Systems, Columbus, OH.

Shrout, P. E., and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychol. Bull. 86: 420–428.

Sigafoos, J., Kerr, M., and Roberts, D. (1994). Interrater reliability of the Motivation Assessment Scale: Failure to replicate with aggressive behavior. Res. Dev. Disabil. 15: 333–342.

Spreat, S., and Connelly, L. (1996). Reliability analysis of the Motivation Assessment Scale. Am. J. Ment. Retard. 100: 528–532.

Sturmey, P. (1994). Assessing the functions of aberrant behaviors: A review of psychometric instruments. J. Autism Dev. Disord. 24: 293–304.

Thompson, S., and Emerson, E. (1995). Inter-observer agreement on the Motivation Assessment Scale: Another failure to replicate. Ment. Handicap Res. 8: 203–208.

Van Houten, R., and Rolider, A. (1991). Applied behavior analysis. In Matson, J. L., and Mulick, J. A. (eds.), Handbook of Mental Retardation, 2nd edn., Pergamon Press, New York, pp. 569–585.

Wieseler, N. A., Hanson, R. H., Chamberlain, T. P., and Thompson, T. (1985). Functional taxonomy of stereotypic and self-injurious behavior. Ment. Retard. 23: 230–234.

Zarcone, J. A., Rodgers, T. A., Iwata, B. A., Rourke, D. A., and Dorsey, M. F. (1991). Reliability analysis of the Motivation Assessment Scale: A failure to replicate. Res. Dev. Disabil. 12: 349–360.
