A Comparison of Medical Record with Billing Diagnostic Information Associated with Ambulatory Medical Care

DONALD R. STUDNEY, MD, AND A. RALPH HAKSTIAN, PHD

Abstract: The degree of similarity between diagnostic information furnished with claims and that simultaneously entered into the medical record was estimated for 1,215 private office visits in British Columbia, Canada. For each visit, claim card and chart diagnoses were compared by having three independent internists (blinded to source and type of the data) make judgments about each diagnostic pair. The judges were highly consistent internally and their judgments were stable over time. In 40 per cent of cases chart and claims data were judged dissimilar, and in 38 per cent of cases claims data were judged more valuable as a reflection of the primary problem treated. The degree of judged similarity of chart and claims data correlated significantly and negatively with physician workload, income, and judges' preference for the billing card diagnosis. We conclude that in using claims data to determine the content of ambulatory visits, independent validation of such data may be important. (Am J Public Health 1981; 71:145-149.)

Because the largest volume of ambulatory medical care is provided in physicians' offices,1 more information about and understanding of these patient contacts are needed. In the medical office visit, an often-complex diagnostic and therapeutic process is generally reduced to a few terse lines of text on the patient's chart and, in many cases, to a "reason for visit" or "diagnosis" on a third party payment claim form. The investigator in ambulatory care usually has little choice but to accept these data as representing ambulatory care problems or diagnoses. Similar limitations occur in the study of hospital care or in the derivation of vital statistics, where the use of summary sheets or death certificates, while of limited validity,2,3 is nonetheless useful. In some studies, however,4-6 it has been possible to have well-trained, often medically-qualified observers observe a physician-patient visit and record or infer the data being sought. Unfortunately, the expense, unacceptability to patients and doctors, perturbation, and observer variability associated with this procedure have limited such studies to small populations and often to highly specialized circumstances. The widespread introduction of health insurance plans, with associated claims systems, has in recent years provided investigators with a new source of medical-visit data, often in coded computer-compatible form. In spite of inevitable limitations of these data,7 some studies have been published8-12 in which it was assumed that the claims-associated diagnostic data were acceptable representations of the actual visit content.

To shed further light on the validity of claims data, we studied 1,215 office visits and compared the chart information with that submitted to a third-party payor.

From the Departments of Medicine and Psychology, University of British Columbia, Vancouver. Address reprint requests to Dr. Donald R. Studney, Room SF 173, Department of Medicine, Acute Care Unit, University of British Columbia, Vancouver, BC V6T 1W5, Canada. This paper, submitted to the Journal April 25, 1980, was revised and accepted for publication October 6, 1980.

Methods

The clinic studied was located in a British Columbia community of 65,000 and comprised 12 primary-care and five specialist physicians whose practice experience ranged from one to 35 years. Ninety-three per cent of visits were billed to a single payor, the Medical Services Plan of British Columbia (MSPBC). At each visit a paper chart and a partially-completed account card were supplied. Visit notes were written on lined sheets in the chart, and the physician was also required to complete the account card by writing on it a diagnosis or chief complaint. No coding was required, and all account card diagnoses were written by physicians, usually concurrently with the visit. The nature or number of diagnoses could not affect the amount chargeable for the visit. Thus, for each office encounter two records of medical information were simultaneously made by each physician: one for medical record-keeping and the other to meet a payor's requirement for a written diagnosis or reason for visit.

For each of 1,215 office visits sampled from an estimated 12,000 visits to 12 primary-care physicians over a three-month period in 1976, and billed to the MSPBC, both medical chart and billing-card diagnoses were obtained prior to refiling the chart or submitting the billing card. This was done by holding all billing cards at the end of a working day and comparing them with the related charts. Most data were obtained over a two-month period, with the third month used to equalize the number of cases among physicians. Thus, approximately 100 cases consisting of pairs of chart and billing data were available for each of the clinic's 12 primary-care physicians, as the basic data unit for study. A registered nurse, experienced in medical abstracting, recorded the chart and billing diagnosis/reason-for-visit information. Where no information was recorded, a blank was left, and where information was illegible, a dashed line was typed on the capture sheet. Where at one visit multiple entries had been made that appeared to be diagnoses or reasons for visit, all were recorded but were treated as one case.

A questionnaire was designed to assess two characteristics of the chart and billing card diagnostic data by requesting that two judgments be made on each pair: 1) the relative value of the chart vs the billing card diagnosis in determining the primary problem that was dealt with at the visit; and 2) the similarity of the two forms of diagnostic data, the latter recorded on a 5-point scale. Figure 1 shows examples of diagnostic pairs and typical judgments. The chart and billing card diagnoses were randomly allocated between columns A and B on the actual questionnaire.

The questionnaires were submitted to three judges who had agreed to participate in the study. All judges were recently-qualified, practicing internists with at least two years of training in medical information systems. All were blinded to the nature and purpose of the study, as well as to the sources of the two kinds of diagnostic data. None was in practice with or a consultant to the clinic studied. Two months after they had completed the original 1,215-case questionnaire (that is, approximately 100 cases for each of 12 primary-care physicians), each judge was given a retest questionnaire consisting of a 10 per cent sample of the cases previously examined for value and similarity of diagnosis. One of the authors (DRS), an internist, classified each case as being either a predominantly medical (including dermatological), surgical/trauma, obstetrical/gynecological, or psychiatric condition. Data from the three original and three retest questionnaires were abstracted, summarized, and coded for computer input by a research assistant and, where appropriate, checked by a second person.

Intrajudge reliability was determined by computing Pearson product-moment correlation coefficients (time 1 vs time 2), as well as the proportions of "value" responses which were identical and of "similarity" responses that were either identical or adjacent (one category apart), between the original and retest judgments made by the three judges.
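The random allocation of each diagnostic pair between columns A and B is the step that blinded the judges to data source. The sketch below shows one way such an allocation could be implemented; the paper describes the procedure only in outline, so all function and variable names here are hypothetical and this is an illustration, not a reconstruction of the authors' method.

```python
import random

def build_blinded_questionnaire(pairs, seed=1976):
    """Allocate each (chart, billing) diagnosis pair at random between
    columns A and B, keeping a key so judgments can be unblinded later.
    Names are hypothetical; the paper states only that allocation was random."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    items, key = [], []
    for chart_dx, billing_dx in pairs:
        if rng.random() < 0.5:
            items.append({"A": chart_dx, "B": billing_dx})
            key.append("A=chart")
        else:
            items.append({"A": billing_dx, "B": chart_dx})
            key.append("A=billing")
    return items, key

# Two pairs drawn from Figure 1 (cases 4 and 5):
pairs = [("Senile vaginitis.", "Leukorrhea."),
         ("Nail wound foot", "Puncture wound")]
items, key = build_blinded_questionnaire(pairs)
```

Keeping the unblinding key separate from the questionnaire preserves the blinding while still allowing the source of each preferred diagnosis to be identified at analysis time.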

FIGURE 1-Examples of Chart and Corresponding Billing Card Diagnoses, with Typical Judgments of Value* and Similarity

Case   A                                            B
1      Dyspnea, thumping in chest                   Frequent extrasystoles. Cardiac status evaluation
2      2° skin infection                            2° skin infection
3      Upset re baby crying. Pain in episiotomy.    Painful episiotomy.
4      Senile vaginitis.                            Leukorrhea.
5      Nail wound foot                              Puncture wound
6      Swollen R elbow ? infected bursa             Olecranon bursitis
7      106/104                                      Hypertension

[The pairings for cases 8-11 are not reliably recoverable from the scanned figure. The entries included "Peritendinitis calcarea L knee," "Checkup, previous alcoholic, heavy smoker," "Systolic murmur," "Up tight," "Anxiety," blank (unrecorded) entries, "Cold all the time," "Chronic anxiety state," "Cystitis," and "Hypertension." Case 10, cited in the Discussion, lacked a legible chart entry.]

*Judges were asked to rate relative value of A to B in determining what primary problem was being dealt with at the office encounter which produced the data.


Internal consistency of the three judges (interjudge reliability) was estimated by computing a reliability coefficient known as Cronbach's alpha coefficient13 for the judges' performance on both the "value" and "similarity" questions for each of the 12 physicians. The individual physician alpha coefficients were averaged using the methods of Hakstian and Whalen.14 An independent estimation of interjudge reliability was obtained using the kappa statistic, for the value and similarity questions, using the method of Fleiss15 for three raters. Pearson correlation coefficients were obtained between: 1) the proportion of cases in which the judges found the billing and chart diagnoses to be "somewhat similar," "similar," or "identical" for each of the 12 physicians; and 2) the average number of patients seen by each physician per day (computed from day sheets for the time period studied), physician income (based on gross billings), the proportion of cases in which a physician's billing card diagnosis was preferred over the chart diagnosis, and the number of years since graduation. The proportions of cases classified as medical, surgical, obstetric/gynecological, or psychiatric which were similar by the above criteria were also calculated.
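For readers wishing to reproduce the two interjudge statistics named above, the sketch below implements Cronbach's alpha13 (treating the three judges as "items" over a physician's roughly 100 cases) and Fleiss' kappa15 for three raters. It is a minimal illustration assuming the ratings are available as numeric arrays; the averaging of per-physician alphas by the Hakstian-Whalen method14 is omitted.

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for a cases x judges array of numeric ratings
    (e.g., the three judges' 5-point similarity scores for one physician)."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                         # number of judges (3 here)
    judge_vars = ratings.var(axis=0, ddof=1)     # variance of each judge's column
    total_var = ratings.sum(axis=1).var(ddof=1)  # variance of per-case totals
    return (k / (k - 1)) * (1 - judge_vars.sum() / total_var)

def fleiss_kappa(counts):
    """Fleiss' kappa for a cases x categories count matrix in which each
    row sums to the number of raters (three judges here)."""
    counts = np.asarray(counts, dtype=float)
    n = counts[0].sum()                          # raters per case
    p_j = counts.sum(axis=0) / counts.sum()      # overall category proportions
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # per-case agreement
    P_e = np.square(p_j).sum()                   # chance agreement
    return (P_i.mean() - P_e) / (1 - P_e)
```

As the paper notes, alpha is insensitive to differences in the judges' mean ratings while kappa is not, which is why the two statistics computed on the same data can differ in magnitude.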

Results

The results in Table 1 show that with a liberal criterion for similarity (somewhat similar, similar, or identical) the judges found acceptable similarity in only 60 per cent of cases (column 1), with a range among the 12 physicians of 34 per cent to 89 per cent. With regard to relative value, over all 12 physicians the judges attributed greater value to the billing diagnosis than to the chart entry in 38 per cent of cases, whereas in only 17 per cent of cases was the chart diagnosis preferred (the judges did not know that they were comparing billing with chart data). This difference was significant at the 0.001 level by a one-sample chi-square test for the parameter of a multinomially-distributed variable.

With respect to correlations between the judged similarity of the two diagnoses (quantified as the proportion judged "similar," found in the left-most column of Table 1) and the various characteristics of the 12 physicians, significant negative correlations (at well beyond the .01 level of significance) were found between diagnosis similarity (over the 12 physicians) and workload, as measured by both patients per day (r = -.73) and gross income (r = -.75). An even stronger correlation (r = -.89) was found between diagnosis similarity and a preference (among the judges) for the billing card diagnosis, indicating that the tendency toward diagnosis dissimilarity within a physician was not random but rather related to a factor that generated a billing card diagnosis which judges felt was of more value than the chart diagnosis. A weak (not statistically significant) negative correlation (r = -.41) was found between diagnosis similarity and physician age (as measured by years since graduation). Finally, no significant correlations were found between physician age and income, or between physician age and the generation of superior billing card diagnoses. No difference in judged similarity was found between cases classified as medical, surgical, obstetric/gynecological, or psychiatric.
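The chi-square result above can be checked in a few lines. The paper does not state the null proportions used, so the sketch below assumes, for illustration only, a null of equal preference across the three value categories, applied to the 3,645 judgments (three judges x 1,215 pairs).

```python
from scipy import stats

# Observed value judgments: 38% billing preferred, 45% equal value,
# 17% chart preferred, of 3,645 judgments (Table 1, "All Physicians" row).
observed = [round(3645 * p) for p in (0.38, 0.45, 0.17)]

# Goodness-of-fit against a uniform null -- an assumption made here for
# illustration; the original null proportions are not reported.
chi2, p = stats.chisquare(observed)
print(chi2, p)   # p falls far below the 0.001 level reported in the text
```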

TABLE 1-Judges' Estimates of "Similarity" and Relative "Value" of Chart vs Third-Party Billing Diagnoses

                                          Value
Physician        Cases       "Billing" Diagnosis     Diagnoses Judged    "Chart" Diagnosis
                 Judged      Judged Greater          Equal Value†        Judged Greater
                 Similar*    Value†                                      Value†
1                .48         .46                     .40                 .14
2                .62         .42                     .42                 .16
3                .46         .41                     .39                 .20
4                .41         .56                     .32                 .12
5                .60         .40                     .45                 .15
6                .55         .34                     .44                 .22
7                .34         .47                     .18                 .35
8                .73         .29                     .58                 .13
9                .67         .37                     .55                 .08
10               .66         .32                     .55                 .13
11               .89         .25                     .55                 .20
12               .82         .24                     .60                 .16
All Physicians   .60         .38                     .45                 .17

*All three judges had to select "somewhat similar," "similar," or "identical" for a diagnostic pair to be included in this proportion. Total cases = 1,215.
†These data reflect the total proportion of judgments falling into the value category described, i.e., three judges x 1,215 diagnostic pairs = 3,645 cases.
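One of the correlations reported in the Results can be reproduced directly from Table 1: the r = -.89 between judged similarity (column 1) and the judges' preference for the billing card diagnosis (column 2). A minimal check:

```python
import numpy as np

# Table 1, physicians 1-12: proportion judged similar, and proportion of
# judgments in which the billing diagnosis was rated of greater value.
similar = np.array([.48, .62, .46, .41, .60, .55,
                    .34, .73, .67, .66, .89, .82])
billing_preferred = np.array([.46, .42, .41, .56, .40, .34,
                              .47, .29, .37, .32, .25, .24])

r = np.corrcoef(similar, billing_preferred)[0, 1]
print(round(r, 2))   # -0.89, matching the value reported in the Results
```

The workload and income correlations (r = -.73 and r = -.75) cannot be reproduced this way because the per-physician day-sheet and billing data are not published.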

The intrajudge consistency, i.e., the extent to which each of the three judges rated similarly the relative value and the similarity of billing card and chart diagnoses from the original rating to the repeat rating of 10 per cent of the data units (two months later), is presented in Table 2. Intrajudge consistency was high, with identical responses at the two time points in 85 per cent of the cases rated overall. The mean stability coefficients, over time, of .77 for "value" and .95 for "similarity" also indicate high intrajudge agreement.

When addressing the question of internal consistency, we are concerned with interjudge reliability. In Table 3, both alpha and kappa values appear for each of the 12 physicians, and for both the "value" and "similarity" questions. It should be noted that the alpha coefficient is unaffected by differences in the judges' frame of reference (mean ratings), whereas kappa is so affected; this fact explains some of the difference in the magnitudes of the resulting coefficients in Table 3. The internal consistency, or agreement, among the three judges is very high when assessed by both Cronbach's alpha coefficient and the kappa statistic, and compares favorably with published data regarding physician judgments.16 All coefficients in Table 3 are significantly different from zero at the .001 level of significance. Finally, we note that the judgments of diagnostic similarity tended to be more consistent than those of relative value of the pairs of diagnoses.

TABLE 2-Intra-Judge Reliability on Retesting

Judgment                              Judge "A"   Judge "B"   Judge "C"
"Value" (N = 120 cases)
  Responses Identical                 85.0%       87.5%       83.3%
  Stability Coefficient*              .73         .80         .78
"Similarity" (N = 120 cases)
  Responses Identical                 82.5%       83.3%       83.3%
  Responses Identical or Adjacent     100.0%      100.0%      97.5%
  Stability Coefficient*              .95         .95         .94

*Stability Coefficient = test-retest (Pearson) correlation coefficient. Reliability based on 10% retest; time interval was two months.
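The stability coefficients in Table 2 are ordinary test-retest Pearson correlations computed on each judge's original and repeat ratings. A small sketch, using illustrative ratings only (the judges' raw scores are not published):

```python
import numpy as np

def retest_stability(original, retest):
    """Test-retest (Pearson) stability coefficient and the proportion of
    identical responses -- the two quantities reported in Table 2."""
    original = np.asarray(original, dtype=float)
    retest = np.asarray(retest, dtype=float)
    r = np.corrcoef(original, retest)[0, 1]
    identical = float(np.mean(original == retest))
    return r, identical

# Hypothetical 5-point similarity ratings for a few retested cases:
r, identical = retest_stability([5, 4, 1, 3, 5, 2], [5, 4, 2, 3, 5, 2])
```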

TABLE 3-Internal Consistency of Three Judges Assessing Relative "Value" and "Similarity" of Medical Chart vs Third-Party Billing Diagnoses

                   Value                  Similarity
Physician      rxx*      Kappa†       rxx*      Kappa†
1              .77       .50          .96       .59
2              .76       .42          .96       .57
3              .83       .47          .96       .51
4              .75       .47          .96       .59
5              .76       .44          .97       .57
6              .79       .50          .96       .57
7              .79       .47          .96       .59
8              .74       .48          .95       .56
9              .80       .61          .98       .75
10             .61       .29          .95       .46
11             .79       .45          .94       .48
12             .52       .26          .94       .50
Average**      .75                    .96

*rxx = internal consistency (Cronbach's alpha), or interjudge (three judges), reliability.
†From Fleiss JL: Ref. 15.
**From Hakstian AR, Whalen TE: Ref. 14.

Discussion

We studied diagnoses and chief complaints, items that Zuckerman6 showed to be uniformly well-recorded by physicians. One would expect that these claims diagnoses, which were written by physicians, would also be well recorded and would resemble chart diagnoses written simultaneously. Our finding that these were dissimilar in 40 per cent of cases suggests independent validation may be required in circumstances where claims diagnoses are used to infer visit data.

The possible reasons for the perceived discrepancies noted between chart entries and simultaneously-written billing diagnoses are of interest. First, it should be noted that a review of the data revealed no evidence of apparent effort to conceal the real reason for visit from the payor, and, as noted previously, the billing diagnosis could not influence the amount of payment claimed. In a proportion of cases (17 per cent), the dissimilarity was due to the lack or illegibility of a chart diagnosis, e.g., Case 10, Figure 1. Many of the remaining cases judged dissimilar were truly discordant;

e.g., the billing diagnosis "hypertension" bears no resemblance to the chart entry "cold all over." In other cases there was enough difference in the number, specificity, or precision of diagnoses to cause at least one judge to rule the claim card and chart diagnoses "dissimilar" or "totally dissimilar." Such dissimilarities may be due to carelessness resulting from time pressures, as suggested by the workload correlations described earlier. Since the physician was required to write both a claims and a chart diagnosis at the same time, double-recording could be avoided by writing signs and symptoms in the chart and a specific, medically-proper diagnosis on the billing card. Judged dissimilarity also may be related to the physician's personal assessment of the value of the diagnostic information to the payor, with the younger, less-experienced physicians assuring a greater conformity of billing to chart information than their older, more experienced, or more successful colleagues, as suggested by the income and age correlations. The apparent increased precision of claims diagnoses may also point to a desire, conscious or otherwise, of the physicians to make a relatively unequivocal and therefore defensible case for payment on the billing card. This practice may also account for the noted preference by our internist judges for the billing card over the chart diagnoses when dissimilarity was great.

We stress that our discussion relates to the resemblance of claims to chart data. The question of the validity or usefulness of these data for review purposes has been addressed by Kroeger,17 Linn,18 Thompson,19 and others. The intra- and inter-judge reliability results strongly suggest that the methodology used here generates stable and consistent data for substantive analysis. The tendency to more consistent judgments of similarity than of value is explained by the relative ease of making a similarity vs a relative-value judgment: the former is more strongly established by medical training and the medical literature, while the latter can be strongly affected by judges' practice philosophies and by the experimental design, which withheld from them the sources of the data.

It is apparent that, prior to analysis and quantification, the raw data pertaining to medical encounters were transformed possibly twice. The first transformation occurred between the actual content of the visit and what the physician recorded on either the chart or the billing card. This transformation could have been eliminated only by having the three-judge panel witness all 1,215 medical encounters, an impossibility in this study. The possible second transformation would be inadvertently performed by the transcriber in abstracting the charts and transcribing the account card diagnoses. Since the same person performed all abstractions and did not participate in any of the judgments, the effect of this latter transformation is assumed to be minimal or factored out by the methodology.

We conclude, therefore, because of the significant differences found between chart and claims data, that the use of the latter in health care research or quality assurance may require additional measurements to establish their similarity to chart data and, where possible, to establish concordance of the chart with actual visit content. The similarity of claims data may be further altered when diagnostic coding is required or when nonphysicians complete the claims forms, and this also needs to be considered. The degree of similarity between the two diagnostic data pools may be affected by physician workloads, as well as by age and experience, and these correlates may be important in selecting study situations.

REFERENCES
1. US Department of Health, Education, and Welfare: Health United States, 1975. DHEW Pub. No. HRA76-1232. National Center for Health Statistics. Rockville, MD: Govt Printing Office, 1976, p. 293.
2. Alderson MR, Meade TW: Accuracy of diagnosis on death certificates compared with that in hospital records. Brit J Prev Soc Med 1967; 21:22-29.
3. Gittelsohn A, Senning J: Studies on the reliability of vital and health records: I. Comparison of cause of death and hospital record diagnoses. Am J Public Health 1979; 69:680-689.
4. Clute KF: The General Practitioner. Toronto: University of Toronto Press, 1963.
5. Peterson OL, Andrews LP, Spain RS, et al: An analytical study of North Carolina general practice 1953-1954. J Med Educ 1956; 31:12:1-165.
6. Zuckerman A, Starfield B, Hochreiter C, et al: Validating the content of pediatric outpatient medical records by means of tape-recording doctor-patient encounters. Pediatrics 1975; 56:407-411.
7. Tenney JB: Diagnostic precision for insurance records: A physicians' survey. Inquiry 1968; 5:4:14-19.
8. Buck CR, White KL: Peer review: Impact of a system based on billing claims. N Engl J Med 1974; 291:877-883.
9. Roos NP, Henteleff PD, Roos LL: A new audit procedure applied to an old question: Is the frequency of T & A justified? Med Care 1977; 15:1-18.
10. Horne JM, Beck RG: Temporal patterns in the use of health services leading to cholecystectomy. Med Care 1978; 16:1006-1018.
11. Moen JB, Hill GB: Survival following renal transplantation in Saskatchewan, 1970-74: Follow-up study using medical insurance records. Can Med Assoc J 1979; 121:434-438.
12. Mesel E, Wirtschafter DD: Automation of a patient medical profile from insurance claims data: A possible first step in automating ambulatory medical records on a national scale. Milbank Mem Fund Quarterly 1976; 54:29-45.
13. Cronbach LJ: Coefficient alpha and the internal structure of tests. Psychometrika 1951; 16:297-335.
14. Hakstian AR, Whalen TE: A K-sample significance test for independent alpha coefficients. Psychometrika 1976; 41:219-231.
15. Fleiss JL: Measuring nominal scale agreement among many raters. Psychol Bull 1971; 76:378-382.
16. Koran LM: The reliability of clinical methods, data and judgments, Part I. N Engl J Med 1975; 293:642-646.
17. Kroeger HH, Altman I, Clark DA, et al: The office practice of internists: I. The feasibility of evaluating quality of care. JAMA 1965; 193:371-376.
18. Linn BS, Linn MW, Greenwald SR, et al: Validity of impairment ratings made from medical records and from personal knowledge. Med Care 1974; 12:363-368.
19. Thompson HC, Osborne CE: Office records in the evaluation of quality of care. Med Care 1976; 14:294-314.

ACKNOWLEDGMENTS
This study was supported by Health and Welfare Canada NHRDP Grant No. 6610 1142-44. Dr. Studney is a National Health Research Scholar. An earlier version of this paper was presented in part at the National Meeting, American Federation of Clinical Research, Washington, DC, May 11, 1980.
