BREAKOUT SESSION

Assessing Diagnostic Reasoning: A Consensus Statement Summarizing Theory, Practice, and Future Needs

Jonathan S. Ilgen, MD, MCR, Aloysius J. Humbert, MD, Gloria Kuhn, DO, PhD, Matthew L. Hansen, MD, Geoffrey R. Norman, PhD, Kevin W. Eva, PhD, Bernard Charlin, MD, PhD, and Jonathan Sherbino, MD, MEd

Abstract

Assessment of an emergency physician (EP)’s diagnostic reasoning skills is essential for effective training and patient safety. This article summarizes the findings of the diagnostic reasoning assessment track of the 2012 Academic Emergency Medicine consensus conference “Education Research in Emergency Medicine: Opportunities, Challenges, and Strategies for Success.” Existing theories of diagnostic reasoning, as they relate to emergency medicine (EM), are outlined. Existing strategies for the assessment of diagnostic reasoning are described. Based on a review of the literature, expert thematic analysis, and iterative consensus agreement during the conference, this article summarizes current assessment gaps and prioritizes future research questions concerning the assessment of diagnostic reasoning in EM.

ACADEMIC EMERGENCY MEDICINE 2012; 19:1454–1461; doi: 10.1111/acem.12034. © 2012 by the Society for Academic Emergency Medicine

From the Division of Emergency Medicine, University of Washington School of Medicine (JSI), Seattle, WA; the Department of Emergency Medicine, Indiana University School of Medicine (AJH), Indianapolis, IN; the Department of Emergency Medicine, Wayne State University (GK), Detroit, MI; the Department of Emergency Medicine, Oregon Health & Science University (MLH), Portland, OR; the Department of Clinical Epidemiology and Biostatistics (GRN), McMaster University, Hamilton, Ontario, Canada; the Centre for Health Education Scholarship, University of British Columbia (KWE), Vancouver, British Columbia, Canada; the Center of Pedagogy Applied to Health Sciences, University of Montreal (BC), Montreal, Quebec, Canada; and the Division of Emergency Medicine, Department of Medicine (JS), McMaster University, Hamilton, Ontario, Canada.

Breakout session participants: Chandra Aubin, Kat Bailey, Jeremy Branzetti, Rob Cloutier, Eva Delgado, Frank Fernandez, Doug Franzen, Robert Furlong, David Gordon, Nikhil Goyal, Richard Gray, Nathan Haas, Danielle Hart, Emily Hayden, Corey Heitz, Sheryl Heron, Cherri Hobgood, Laura Hopson, Hans House, Sharhabeel Jwayyed, Sorabh Khandelwal, Paul Ko, Amy Kontrick, Richard Lammers, Katrina Leone, Michelle Lin, Kerry McCabe, Chris McDowell, Brian Nelson, Elliot Rodriguez, Nestor Rodriguez, Sally Santen, Tim Schaefer, Jeff Siegelman, Bill Soares, Susan Stern, Tom Swoboda, James Takayesu, Dave Wald, Clare Wallner, John Wightman, Adam Wilson, and Paul Zgurzynski.

Received June 26, 2012; accepted June 27, 2012. This paper reports on a workshop session of the 2012 Academic Emergency Medicine consensus conference, “Education Research in Emergency Medicine: Opportunities, Challenges, and Strategies for Success,” May 9, 2012, Chicago, IL. The authors have no relevant financial information or potential conflicts of interest to disclose. Supervising Editor: Terry Kowalenko, MD. Address for correspondence and reprints: Jonathan S. Ilgen, MD, MCR; e-mail: [email protected].

Psychologists have been studying how people think for decades. Translation and application of these theories to medicine have accelerated in recent years in response to emerging themes of patient safety and competency-based education.1–5 Emergency physicians (EPs) are challenged daily by the vast spectrum and acuity of clinical presentations they diagnose in a data-poor, rapidly evolving, decision-dense environment. Diagnostic uncertainty is a hallmark of emergency medicine (EM), and as a result of these factors it is perhaps not surprising that errors are made.6–8 One retrospective study of patients evaluated by EPs reported a diagnostic error rate of 0.6%.9 In contrast, 37% to 70% of malpractice claims allege physician negligence or diagnostic error,6,10 and in one study, 96% of missed ED diagnoses were attributed to cognitive factors.8


However, the retrospective analyses used in all of these studies are prone to substantial hindsight bias, because reviewers have access to further clinical information and the evolution of patients’ symptoms that were unavailable at the time of ED diagnosis.11

In May 2012, Academic Emergency Medicine hosted a consensus conference entitled “Education Research in Emergency Medicine: Opportunities, Challenges, and Strategies for Success,” with the goals of defining research agendas that address the measurement gaps in EM education and building infrastructure for collaboration in these domains. This article reports on the findings of the diagnostic reasoning assessment breakout session. Through a qualitative process that included a review of the literature, expert thematic analysis, and iterative consensus agreement at the conference, current assessment gaps are summarized and future research questions concerning the assessment of diagnostic reasoning are prioritized.

BACKGROUND

Research in cognitive psychology has explored how individuals reason when solving problems.12–14 Emerging theories suggest ways to understand the process of diagnostic reasoning in medicine.15–18 One clear conclusion is that general problem-solving strategies cannot be effectively taught, learned, or applied.19–21 Success on one type of problem does not predict success on another,20,22–24 nor does the quality of general reasoning processes appear to distinguish between experts and novices.17 A classic study by Elstein et al.20 demonstrated that experts achieve higher diagnostic accuracy because they have more knowledge than novices, not because they possess superior general problem-solving skills.

It is not only the amount of knowledge, but also the manner in which this knowledge is arranged in clinicians’ memories, that facilitates accurate diagnostic reasoning.25 Expert physicians are better able to access knowledge precisely because of their experience, whereas novices may be unable to connect existing knowledge to a “novel” clinical problem.26,27 From didactic presentations, role modeling, case discussions, and clinical exposure, novices integrate networks of information, associative links, and memories of real patient encounters to form unique clusters of information for each diagnosis. Barrows and Feltovich28 coined the term “illness scripts” for these complex collections of data. The illness script theory assumes that knowledge networks adapted to clinical tasks develop through experience and operate autonomously beneath the level of conscious awareness.29,30 Clinicians refine their unique collection of illness scripts based on real patient encounters, thereby forming idiosyncratic memories relating to a diagnosis.27,31,32 Through experience, clinicians accumulate a vast “library” of patient presentations that can be rapidly and subconsciously accessed for the purpose of hypothesis generation and diagnostic decision-making.18 This “pattern matching” is seen as the dominant mode of reasoning for most expert clinicians.19,33


These automatic reasoning processes, more recently labeled System 1 thinking,34–39 are nonanalytical, rapid, and require little cognitive effort.39 In contrast, System 2 thinking is analytical, effortful, and employs a deductive search for a fit between the available information and appropriate scripts.34–39 Novices employ this analytic mode of reasoning more frequently than their experienced counterparts because they lack the experience necessary for System 1 reasoning. However, while System 1 reasoning is a hallmark of the experienced physician, errors may result from an overreliance on automatic reasoning.40

Most clinical scenarios require both systems. This combined approach, often referred to as “dual processing,” likely offers the best chance at diagnostic success, even for novices.16,17,25 A series of studies in which undergraduate psychology students were taught to read electrocardiograms demonstrated improved performance when these subjects were instructed to use both similarity (e.g., System 1) and feature identification (e.g., System 2) strategies for diagnosis, compared to use of either strategy alone.41 It is possible that the combined use of automatic and analytic thinking is more beneficial for complex rather than simple cases42 or when physicians anticipate difficulty.43,44

Finally, diagnostic reasoning constructs must be considered in the context of a dynamic and decision-dense environment.45 Studies suggest that EPs care for a median of six to seven patients simultaneously, with as many as 16 patients at once.46 The average time spent on a particular task is less than 2 minutes,47 and interruptions occur every 2 to 10 minutes in the ED.46–50 A study in a U.S. academic ED demonstrated that 42% of tasks were interrupted before completion.46 In one study, when EPs were interrupted, they resumed the original suspended activity only after performing one to eight additional activities.51 Additionally, the high metabolic demand of analytical reasoning (i.e., System 2)52,53 is likely amplified in decision-dense environments. Based on cognitive load theory,54 it is thus possible that current admonitions to “think carefully” in the ED environment (i.e., to employ System 2) may, in fact, overwhelm working memory and be detrimental.

ASSESSING DIAGNOSTIC REASONING

Assessment should not be considered in isolation from the other integrated elements of a training program (learning objectives, instructional methods, etc.) or from the reward structure inherent in continuing professional development activities.55 Neither should a single instrument or testing format be regarded as sufficient for assessing diagnostic reasoning. The psychometric properties, feasibility, acceptability, and educational effect of any strategy depend on its context and application.56 An assessment program that employs multiple integrated strategies will provide the most robust process for determining physician competence in diagnostic reasoning.57 Beyond the specific issues noted in the review of each tool below, the following general issues limit existing assessment formats:



• Diagnostic reasoning must be inferred from behavior because it is not a discrete, measurable quality and is not independent of context and content. To achieve any degree of validity, inferences must sample over a number of knowledge domains. Inferences are imprecise.
• The theorized dual process of reasoning cannot be isolated from the context in which it functions, nor can the explicit use of System 1 or System 2 be measured in the clinical environment.16,17 Any decision-making task requires a mixture of both processes.58 Existing instruments that assess diagnostic reasoning emphasize System 2. System 1 reasoning is unconscious and cannot be explicitly articulated with trustworthy accuracy. The shift between automatic and analytic reasoning (which may in fact be performed in parallel) is impossible to directly observe or absolutely infer in clinical environments.59
• Accuracy, and therefore assessment of diagnostic reasoning, is influenced by context specificity. Diagnostic accuracy is not stable across all of the clinical domains that inform EM, which mandates assessment on multiple patient problems. For example, the diagnostic accuracy of an EP confronted with a patient with chest pain does not necessarily correlate with his or her diagnostic accuracy for a patient with a vesicular exanthem.57
• Expert assessors are influenced by their frame of reference: a rater’s personal knowledge, experience, ability, and personal bias (about the importance of a specific element in case management) influence his or her adjudication of a learner’s performance in a nonstandard fashion.60–64

ASSESSING DIAGNOSTIC REASONING IN THE EXTRA-CLINICAL SETTING

Assessment of diagnostic reasoning outside of the clinical setting allows for better standardization, improved resource efficiency, sampling across a broad variety of clinical pathologies, and improved reliability relative to clinically based assessments.65 Common criticisms of these modalities include lack of authenticity66 and general neglect of the process of information gathering15 (an important factor in diagnostic reasoning).

Written Examinations

The most commonly used assessments of diagnostic reasoning are multiple-choice question (MCQ) examinations. These form the bulk of the United States Medical Licensing Examination (USMLE) Steps 1, 2, and 3 and Part I of the Medical Council of Canada Qualifying Examination. Context-rich MCQs may be reasonable tests of decision-making in situations of certainty, where a particular answer has been determined to be most correct. MCQs can be produced with excellent psychometric qualities, can be easily administered, and in some ways are less resource-intensive than other assessment instruments.65 A variety of multiple-choice examinations have been shown to have good predictive validity with respect to performance in practice, as measured by peer assessment, clinical indicators such as appropriate antibiotic prescribing and cardiac mortality, and patient complaints.67–69


A clear drawback to MCQs for the purpose of assessing diagnostic reasoning is that the list of predefined choices may cue responses. In clinical practice, diagnostic possibilities must be generated de novo by the practitioner, a crucial step in decision-making that may be driven by a preponderance of subconscious System 1 processes.70,71 Thus, the diagnostic hypothesis-generation stage is bypassed by the traditional MCQ format. MCQs likely have the most value in the context of diagnosis verification, a process that incorporates elements of both System 1 and System 2. If there is a correlation between hypothesis generation and diagnosis verification, then one could potentially infer performance in one from performance in the other.

One way to evaluate de novo hypothesis generation is to use key feature problems (KFPs).72 KFPs present learners with a clinical scenario followed by a series of open-ended questions about the essential steps for resolving the case. This testing format allows for several approaches to the same scenario and prompts learners to think about problem identification, diagnostic strategies, and management decisions.72,73 KFPs are used as part of licensing and certification examinations in several countries, and scores have been shown to correlate with clinical performance outcomes.68

The script concordance test (SCT) was developed based on the illness script theory.74 Questions include a short clinical scenario followed by diagnostic possibilities and sequential pieces of information to consider. After each new piece of information, the learner is prompted to indicate how it affects decision-making using a Likert-type scale. SCTs can also be structured to probe knowledge about the use of diagnostic tests or therapeutic interventions. Unlike MCQs, where there is always one most correct answer, this method compares the responses of learners to the range of responses generated by a reference panel of experts, and scoring is based on the degree of concordance with the panel. There is thus greater capacity to reflect the authentic situation of a clinical problem that does not necessarily have a simply defined, single correct answer. SCTs are challenging to develop, although they can be administered with the same ease as MCQs.75 Research concerning SCTs suggests that scores offer a valid reflection of diagnostic reasoning, with test performance correlating with clinical experience and in-training examination scores.76,77 However, by design, SCTs emphasize System 2 processes, specifically how clinicians interpret data with a particular hypothesis in mind.
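To make the concordance-based scoring concrete, the sketch below implements the commonly described “aggregate” scoring rule for SCT items, in which each Likert response earns partial credit proportional to the number of panelists who chose it, scaled so that the modal panel response earns full credit. The panel data, item names, and scale here are hypothetical, and operational SCT scoring systems may differ in their details.

```python
from collections import Counter
from typing import Dict, List

def sct_item_score(panel_responses: List[int], examinee_response: int) -> float:
    """Credit for one SCT item: the number of panelists who chose the
    examinee's response, scaled so the modal panel response earns 1.0."""
    counts = Counter(panel_responses)
    modal_count = max(counts.values())
    return counts.get(examinee_response, 0) / modal_count

def sct_total_score(panel: Dict[str, List[int]], answers: Dict[str, int]) -> float:
    """Mean per-item credit across all answered items (range 0 to 1)."""
    scores = [sct_item_score(panel[item], answers[item]) for item in answers]
    return sum(scores) / len(scores)

# Hypothetical example: 10 panelists rate each item on a -2 to +2 Likert scale.
panel = {
    "chest_pain_add_troponin": [2, 2, 2, 1, 1, 1, 1, 0, 2, 1],
    "vesicular_rash_order_pcr": [0, 0, 1, 0, -1, 0, 0, 1, 0, 0],
}
examinee = {"chest_pain_add_troponin": 2, "vesicular_rash_order_pcr": 1}

print(round(sct_total_score(panel, examinee), 2))  # 0.54: partial concordance
```

Note that, under this rule, a learner whose judgments track the panel's majority view scores well even when panelists disagree among themselves, which is precisely the property that lets the SCT reward reasonable judgment under uncertainty rather than a single keyed answer.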


Oral Examinations

The American Board of Emergency Medicine uses an oral examination consisting of clinical scenarios to assess diagnostic reasoning. Clinical experts simultaneously provide sequential information about the scenario and assess the participant using a structured template of key actions. The expert assessor, an EP trained to give the examination, can explore diagnostic reasoning using semistructured prompts. Several studies suggest that oral exams can be valid assessment instruments,78,79 although these tools are resource-intensive when used for high-stakes assessment. Oral examinations that present multiple clinical scenarios simultaneously may offer a novel way to assess diagnostic reasoning that more closely approximates the cognitive load inherent in most ED settings.7,46,48

Objective Structured Clinical Examinations

Objective structured clinical examinations (OSCEs) use multiple brief stations, at each of which a specific, truncated task is performed in a simulated environment. Scoring involves either standardized patients (SPs) or experts completing checklists or global rating scales.80 Using an OSCE to assess diagnostic reasoning may be confounded by the use of: 1) SPs as nonexpert scorers (although some evidence exists that nonclinicians and nonexperts can be used with sufficient reliability),81,82 2) checklists that focus on thoroughness of data gathering and devalue System 1 reasoning,83 and 3) a truncated scenario that does not simulate the diagnostic density or complexity of clinical problems encountered in EM practice. Although OSCEs represent a step toward authenticity relative to written or oral examinations,84 the validity of this method for diagnostic reasoning assessment remains uncertain.85,86

Virtual Patients

Virtual patients, such as the computer-based case simulations employed in Step 3 of the USMLE, prompt examinees to obtain a history, perform a physical examination, and make diagnostic and therapeutic decisions.87,88 Examinees direct the patient encounter and the advance of simulated time, independently generating diagnostic possibilities and determining what additional information is necessary to confirm or refute initial hypotheses.87 This type of testing attempts to bridge the gap between the control afforded by standardized testing and the authenticity of a true patient interaction. Through the lens of diagnostic error, one study demonstrated that 22% of examinees made potentially harmful decisions on the USMLE Step 3 computer case simulations, although the authors emphasized that such actions have not been shown to be predictive of a physician’s decisions in a true clinical setting.89

Team-based Simulation

Team-based simulation (or crisis resource management simulation) has recently emerged as an instrument with the potential to assess diagnostic reasoning. This type of simulation uses computerized mannequins, physical replications of clinical care areas, and multiple actors (nurses, respiratory therapists, physician colleagues, etc.) to approximate the complexity of diagnostic reasoning (among other competencies) in the clinical environment.90 Simulation offers the benefits of standardization and opportunities to explore reasoning in greater detail using postencounter debriefings. However, the relationship between simulated and actual clinical performance is unclear.91 Most current research has focused on the instructional and educational value of this type of simulation rather than on its use for assessment.91,92 While the efficacy of partial task trainers (i.e., partially simulated models of procedural tasks) for assessing procedural skills has been demonstrated,92 further research is required before team-based simulation can be recommended for high-stakes assessment of diagnostic reasoning.93

ASSESSING DIAGNOSTIC REASONING IN THE CLINICAL SETTING

Workplace-based assessments sample learner performance in the clinical environment to form a judgment of diagnostic reasoning capacity.25,94 In artificial testing environments, the significant multitasking demands of EM are removed, perhaps leading to an artificial inflation of a learner’s apparent diagnostic reasoning ability. For these and other reasons, multiple undergraduate, graduate, and continuing medical education organizations have endorsed these types of in vivo assessments.2–5,95,96 The feasibility of these methods is challenged by the time and attention required of assessors, especially given the frequency of interruptions and distractions in EDs.

Direct Observation Tools

There are few tools explicitly designed to assess diagnostic reasoning via direct observation in the clinical setting. The existing direct observation instruments (e.g., in-training evaluation reports, encounter cards, the mini clinical evaluation exercise [mini-CEX], and the standardized direct observation tool [SDOT]) generally use checklists and global ratings that are completed by a physician assessor observing a learner perform a focused element of patient care.95–101 Narrative comments are typically required, but are often incomplete or limited in nature.100,101 While no instrument specifically addresses diagnostic reasoning, a number of items loosely approximate it. For example, the mini-CEX includes assessment in the domain of “clinical judgment,”97 while the SDOT includes ratings of learner performance in “synthesis/differential diagnosis” and “management.”98,99 These tools have the advantage of offering observations in authentic, real-time settings; however, the observer must infer the learner’s line of reasoning from the behavior observed.

Retrospective Clinical Case Analysis

Retrospective clinical case analysis, in contrast to direct observation tools, provides an opportunity for learners to reflect upon past clinical decisions with real patients in the presence of an examiner.102 Chart-stimulated recall uses semistructured interviews with expert assessors: learners are probed regarding their decision-making on actual cases, providing insights that may not be documented in the medical record or fully observed in real time.103 A chart audit involves nonexpert assessors matching key metrics of patient care to a retrospective sample of a learner’s charts.104

Multisource Feedback

Multisource feedback combines multiple assessments from the sphere of influence of the learner (e.g., resident peers, nurses, other health professionals, patients).105–111 However, reliable assessment with this technique requires a large number of assessors. In a study of 1,066 physicians in the United Kingdom,112 34 patient questionnaires and 15 peer questionnaires were required to achieve a reliability of 0.70.
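The rater numbers quoted above reflect how composite reliability grows as independent ratings are pooled. One simple way to project such requirements is the Spearman–Brown prophecy formula, sketched below; the single-rater reliabilities used here are illustrative assumptions chosen to reproduce the quoted figures, not values reported by the cited study.

```python
import math

def raters_needed(single_rater_reliability: float, target_reliability: float) -> int:
    """Spearman-Brown projection of the number of independent raters needed
    to reach a target composite reliability:
        n = target * (1 - r1) / (r1 * (1 - target))
    """
    r1, rt = single_rater_reliability, target_reliability
    return math.ceil(rt * (1 - r1) / (r1 * (1 - rt)))

# Assumed single-rater reliabilities: patient questionnaires are "noisier"
# per respondent than peer questionnaires.
print(raters_needed(0.065, 0.70))  # 34 patient questionnaires
print(raters_needed(0.135, 0.70))  # 15 peer questionnaires
```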


Future Instruments

If we are unable to open the “black box” of reasoning in clinical practice, perhaps what matters most is the accuracy of the ED diagnosis and not the process by which it was achieved. However, hospital discharge diagnoses, the current criterion standard against which ED diagnoses are measured, suffer from hindsight bias.11 Actual patient outcomes may be the best measure of diagnostic reasoning, but current measures of quality (i.e., core measures) are confounded by many elements outside of an EP’s control and may not correlate with the accuracy of diagnostic reasoning. To address these challenges, future clinically based assessment instruments should consider ED-specific markers of patient care that correlate with accurate ED diagnoses. Finally, the surreptitious introduction of SPs into EDs might standardize the assessment of common EM diagnoses while preserving the authenticity of the decision environment.113 However, a number of operational considerations must be addressed prior to widespread adoption, not the least of which is the ethics of blocking (real) patient access to emergency care because of the presence of SPs.

Prioritized List of Research Questions

Through a review of the diagnostic reasoning literature and a thematic analysis by education and diagnostic reasoning experts, a series of important research questions was developed prior to the consensus conference.

Table 1. Prioritized List of Research Questions in Each Domain of Diagnostic Reasoning Assessment, as Validated by Participants in the Consensus Conference

Assessing diagnostic reasoning in the extraclinical setting
1. What is the effect of distraction on diagnostic accuracy in simulated environments?
2. What factors influence the predictive value of extraclinical assessments of diagnostic reasoning when comparing to performance in the clinical environment?
3. What is the value in assessing a learner’s diagnostic abilities at different points in training using a standardized bank of simulated cases?
4. Can the SCT demonstrate the development of diagnostic reasoning in learners over time?
5. What extraclinical tools adequately assess System 1 processes?

Assessing diagnostic reasoning in the clinical setting
1. What patient-oriented outcomes or surrogate markers are reliable and valid indicators of accurate ED diagnostic reasoning?
2. What is the effect of metacognition on diagnostic error in experienced EPs?
3. What is the feasibility of assessing diagnostic reasoning in real time?
4. What is the effect of cognitive load (e.g., treating multiple patients simultaneously) on diagnostic reasoning of simple and complex problems?
5. What ED-specific factors inhibit the assessment of diagnostic reasoning?

SCT = script concordance test.


These questions were validated and prioritized using an iterative consensus process that consisted of background didactics by content experts, focused group discussion, and individual multivoting. Table 1 presents the highest-priority education research questions relating to diagnostic reasoning, as indicated by education thought leaders, education researchers, and front-line educators. The authors advocate for research programs to address these issues and for funding agencies to promote research streams that explore these concepts.

CONCLUSIONS

Diagnostic reasoning is a complex process, with elements of both System 1 and System 2 thinking. Assessment of these processes must be inferred from behavior because diagnostic reasoning is not a discrete, measurable quality, nor is it independent of context and content. No single strategy can be used to assess the accuracy of a clinician’s diagnostic decisions; rather, multiple strategies must be combined if an accurate assessment is to be gained. Many questions remain regarding how these reasoning processes can be most accurately measured, offering a multitude of avenues for future research with great potential to ultimately improve patient care.

References

1. Kohn LT, Corrigan J, Donaldson MS, Institute of Medicine (U.S.) Committee on Quality of Health Care in America. To Err is Human: Building a Safer Health System. Washington, DC: National Academies Press, 2000. 2. Accreditation Council for Graduate Medical Education. ACGME Program Requirements for Resident Education in Emergency Medicine, 2007. Available at: http://www.acgme.org/acgmeweb/Portals/0/PFAssets/ProgramRequirements/110emergencymed07012007.pdf. Accessed Sep 9, 2012. 3. Liaison Committee on Medical Education. Functions and Structure of a Medical School. Available at: http://www.lcme.org/functions.pdf. Accessed Sep 9, 2012. 4. Royal College of Physicians and Surgeons of Canada. General Standards Applicable to All Residency Programs: B Standards. Available at: http://www.cfpc.ca/uploadedFiles/Education/_PDFs/Blue_Book_B_Standards_January_%202011_English_Final.pdf. Accessed Sep 9, 2012. 5. National Health Service. The UK Foundation Programme Curriculum. Available at: http://www.foundationprogramme.nhs.uk/download.asp?file=Foundation_Curriculum_2011_WEB.pdf. Accessed Sep 9, 2012. 6. Leape LL, Brennan TA, Laird N, et al. The nature of adverse events in hospitalized patients. Results of the Harvard Medical Practice Study II. N Engl J Med. 1991; 324:377–84. 7. Croskerry P, Sinclair D. Emergency medicine: a practice prone to error? CJEM. 2001; 3:271–6. 8. Kachalia A, Gandhi TK, Puopolo AL, et al. Missed and delayed diagnoses in the emergency department: a study of closed malpractice claims from 4 liability insurers. Ann Emerg Med. 2007; 49:196–205.


9. Chellis M, Olson J, Augustine J, Hamilton G. Evaluation of missed diagnoses for patients admitted from the emergency department. Acad Emerg Med. 2001; 8:125–30. 10. Brown TW, McCarthy ML, Kelen GD, Levy F. An epidemiologic study of closed emergency department malpractice claims in a national database of physician malpractice insurers. Acad Emerg Med. 2010; 17:553–60. 11. Wears RL, Nemeth CP. Replacing hindsight with insight: toward better understanding of diagnostic failures. Ann Emerg Med. 2007; 49:206–9. 12. Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science. 1974; 185:1124–31. 13. Tversky A, Kahneman D. The framing of decisions and the psychology of choice. Science. 1981; 211:453–8. 14. Flavell JH. Metacognitive Aspects of Problem Solving. In: Resnick LB (ed.). The Nature of Intelligence. Hillsdale, NJ: Lawrence Erlbaum Associates, 1976, pp 231–5. 15. Eva KW. What every teacher needs to know about clinical reasoning. Med Educ. 2005; 39:98–106. 16. Norman G. Dual processing and diagnostic errors. Adv Health Sci Educ Theory Pract. 2009; 14(Suppl 1):37–49. 17. Norman GR, Eva KW. Diagnostic error and clinical reasoning. Med Educ. 2010; 44:94–100. 18. Schmidt HG, Norman GR, Boshuizen HP. A cognitive perspective on medical expertise: theory and implication. Acad Med. 1990; 65:611–21. 19. Coderre S, Mandin H, Harasym PH, Fick GH. Diagnostic reasoning strategies and diagnostic success. Med Educ. 2003; 37:695–703. 20. Elstein AS, Shulman LS, Sprafka SA. Medical Problem Solving: An Analysis of Clinical Reasoning. Cambridge, MA: Harvard University Press, 1978. 21. Mandin H, Jones A, Woloschuk W, Harasym P. Helping students learn to think like experts when solving clinical problems. Acad Med. 1997; 72:173–9. 22. Eva KW. On the generality of specificity. Med Educ. 2003; 37:587–8. 23. Eva KW, Neville AJ, Norman GR. Exploring the etiology of content specificity: factors influencing analogic transfer and problem solving. Acad Med. 1998; 73:S1–5. 24. Norman GR, Tugwell P, Feightner JW, Muzzin LJ, Jacoby LL. Knowledge and clinical problem-solving. Med Educ. 1985; 19:344–56. 25. Eva KW, Hatala RM, Leblanc VR, Brooks LR. Teaching from the clinical reasoning literature: combined reasoning strategies help novice diagnosticians overcome misleading information. Med Educ. 2007; 41:1152–8. 26. Boshuizen HP, Schmidt HG. On the role of biomedical knowledge in clinical reasoning by experts, intermediates, and novices. Cogn Sci. 1992; 16:153–84. 27. Patel VL (ed.). The Psychology of Learning and Motivation: Advances in Research and Theory. San Diego, CA: Academic Press, 1994. 28. Barrows HS, Feltovich PJ. The clinical reasoning process. Med Educ. 1987; 21:86–91.


29. Charlin B, Boshuizen HP, Custers EJ, Feltovich PJ. Scripts and clinical reasoning. Med Educ. 2007; 41:1178–84. 30. Charlin B, Tardif J, Boshuizen HP. Scripts and medical diagnostic knowledge: theory and applications for clinical reasoning instruction and research. Acad Med. 2000; 75:182–90. 31. Brooks L. The matching game. New Physician. 1978; 27:34–6. 32. Custers EJ, Boshuizen HP, Schmidt HG. The role of illness scripts in the development of medical diagnostic expertise: results from an interview study. Cogn Instr. 1998; 16:367–98. 33. Brooks LR, Norman GR, Allen SW. Role of specific similarity in a medical diagnostic task. J Exp Psychol Gen. 1991; 120:278–87. 34. Stanovich KE, West RF. Individual differences in reasoning: implications for the rationality debate? Behav Brain Sci. 2000; 23:645–65. 35. Evans JS. Heuristic and analytic processes in reasoning. Br J Psychol. 1984; 75:451–68. 36. Evans JS. Dual-processing accounts of reasoning, judgment, and social cognition. Annu Rev Psychol. 2008; 59:255–78. 37. Kahneman D. Thinking, Fast and Slow. New York, NY: Farrar, Straus and Giroux, 2011. 38. Sloman SA. The empirical case for two systems of reasoning. Psychol Bull. 1996; 119:3–22. 39. Croskerry P. A universal model of diagnostic reasoning. Acad Med. 2009; 84:1022–8. 40. Eva KW, Cunnington JP. The difficulty with experience: does practice increase susceptibility to premature closure? J Contin Educ Health Prof. 2006; 26:192–8. 41. Ark TK, Brooks LR, Eva KW. Giving learners the best of both worlds: do clinical teachers need to guard against teaching pattern recognition to novices? Acad Med. 2006; 81:405–9. 42. Mamede S, Schmidt HG, Penaforte JC. Effects of reflective practice on the accuracy of medical diagnoses. Med Educ. 2008; 42:468–75. 43. Mamede S, Schmidt HG, Rikers RM, Penaforte JC, Coelho-Filho JM. Influence of perceived difficulty of cases on physicians’ diagnostic reasoning. Acad Med. 2008; 83:1210–6. 44. Moulton CA, Regehr G, Lingard L, Merritt C, MacRae H. Slowing down to stay out of trouble in the operating room: remaining attentive in automaticity. Acad Med. 2010; 85:1571–7. 45. Gonzalez C. Learning to make decisions in dynamic environments: effects of time constraints and cognitive abilities. Hum Factors. 2004; 46:449–60. 46. Chisholm CD, Weaver CS, Whenmouth L, Giles B. A task analysis of emergency physician activities in academic and community settings. Ann Emerg Med. 2011; 58:117–22. 47. Westbrook JI, Coiera E, Dunsmuir WT, et al. The impact of interruptions on clinical task completion. Qual Saf Health Care. 2010; 19:284–9. 48. Chisholm CD, Dornfeld AM, Nelson DR, Cordell WH. Work interrupted: a comparison of workplace interruptions in emergency departments and primary care offices. Ann Emerg Med. 2001; 38:146–51.


49. Coiera EW, Jayasuriya RA, Hardy J, Bannan A, Thorpe ME. Communication loads on clinical staff in the emergency department. Med J Aust. 2002; 176:415–8. 50. Spencer R, Coiera E, Logan P. Variation in communication loads on clinical staff in the emergency department. Ann Emerg Med. 2004; 44:268–73. 51. Brixey JJ, Tang Z, Robinson DJ, et al. Interruptions in a level one trauma center: a case study. Int J Med Inform. 2008; 77:235–41. 52. Bos MW, Dijksterhuis A, van Baaren R. Food for thought? Trust your unconscious when energy is low. J Neurosci Psychol Econ. 2012; 5:124–30. 53. Masicampo EJ, Baumeister RF. Toward a physiology of dual-process reasoning and judgment: lemonade, willpower, and expensive rule-based analysis. Psychol Sci. 2008; 19:255–60. 54. van Merrienboer JJ, Sweller J. Cognitive load theory in health professional education: design principles and strategies. Med Educ. 2010; 44:85–93. 55. Sherbino J, Frank JR (eds). Educational Design: A CanMEDS Guide for the Health Professions. Ottawa, Canada: The Royal College of Physicians & Surgeons, 2011. 56. Norcini J, Anderson B, Bollela V, et al. Criteria for good assessment: consensus statement and recommendations from the Ottawa 2010 Conference. Med Teach. 2011; 33:206–14. 57. Schuwirth LW, van der Vleuten CP. Programmatic assessment and Kane’s validity perspective. Med Educ. 2012; 46:38–48. 58. Jacoby LL. A process dissociation framework: separating automatic from intentional uses of memory. J Mem Lang. 1991; 30:513–41. 59. Moulton CA, Regehr G, Mylopoulos M, MacRae HM. Slowing down when you should: a new model of expert judgment. Acad Med. 2007; 82:S109–16. 60. Crossley J, Johnson G, Booth J, Wade W. Good questions, good answers: construct alignment improves the performance of workplace-based assessment scales. Med Educ. 2011; 45:560–9. 61. Gingerich A, Regehr G, Eva KW. Rater-based assessments as social judgments: rethinking the etiology of rater errors. Acad Med. 2011; 86:S1–7. 62. Holmboe ES, Hawkins RE, Huot SJ. Effects of training in direct observation of medical residents’ clinical competence: a randomized trial. Ann Intern Med. 2004; 140:874–81. 63. Kogan JR, Conforti L, Bernabeo E, Iobst W, Holmboe E. Opening the black box of clinical skills assessment via observation: a conceptual model. Med Educ. 2011; 45:1048–60. 64. Kogan JR, Hess BJ, Conforti LN, Holmboe ES. What drives faculty ratings of residents’ clinical skills? The impact of faculty’s own clinical skills. Acad Med. 2010; 85:S25–8. 65. Epstein RM. Assessment in medical education. N Engl J Med. 2007; 356:387–96. 66. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990; 65:S63–7. 67. Tamblyn R, Abrahamowicz M, Dauphinee D, et al. Physician scores on a national clinical skills examination as predictors of complaints to medical regulatory authorities. JAMA. 2007; 298:993–1001. 68. Wenghofer E, Klass D, Abrahamowicz M, et al. Doctor scores on national qualifying examinations predict quality of care in future practice. Med Educ. 2009; 43:1166–73. 69. Ramsey PG, Carline JD, Inui TS, Larson EB, LoGerfo JP, Wenrich MD. Predictive validity of certification by the American Board of Internal Medicine. Ann Intern Med. 1989; 110:719–26. 70. Hatala R, Norman GR, Brooks LR. Impact of a clinical scenario on accuracy of electrocardiogram interpretation. J Gen Intern Med. 1999; 14:126–9. 71. Young M, Brooks L, Norman G. Found in translation: the impact of familiar symptom descriptions on diagnosis in novices. Med Educ. 2007; 41:1146–51. 72. Page G, Bordage G, Allen T. Developing key-feature problems and examinations to assess clinical decision-making skills. Acad Med. 1995; 70:194–201. 73. Page G, Bordage G. The Medical Council of Canada’s key features project: a more valid written examination of clinical decision-making skills. Acad Med. 1995; 70:104–10. 74. Charlin B, Roy L, Brailovsky C, Goulet F, van der Vleuten C. The Script Concordance Test: a tool to assess the reflective clinician. Teach Learn Med. 2000; 12:189–95. 75. Fournier JP, Demeester A, Charlin B. Script Concordance Tests: guidelines for construction. BMC Med Inform Decis Mak. 2008; 8:e18. 76. Humbert AJ, Besinger B, Miech EJ. Assessing clinical reasoning skills in scenarios of uncertainty: convergent validity for a Script Concordance Test in an emergency medicine clerkship and residency. Acad Emerg Med. 2011; 18:627–34. 77. Goulet F, Jacques A, Gagnon R, Charlin B, Shabah A. Poorly performing physicians: does the Script Concordance Test detect bad clinical reasoning? J Contin Educ Health Prof. 2010; 30:161–6. 78. Bianchi L, Gallagher EJ, Korte R, Ham HP. Interexaminer agreement on the American Board of Emergency Medicine oral certification examination. Ann Emerg Med. 2003; 41:859–64. 79. Maatsch JL. Assessment of clinical competence on the Emergency Medicine Specialty Certification Examination: the validity of examiner ratings of simulated clinical encounters. Ann Emerg Med. 1981; 10:504–7. 80. Bandiera G, Sherbino J, Frank JR. The CanMEDS Assessment Tools Handbook. An Introductory Guide to Assessment Methods for the CanMEDS Competencies. Ottawa, Canada: Royal College of Physicians and Surgeons of Canada, 2006. 81. Wong ML, Fones CS, Aw M, et al. Should nonexpert clinician examiners be used in objective structured assessment of communication skills among final year medical undergraduates? Med Teach. 2007; 29:927–32. 82. Chenot JF, Simmenroth-Nayda A, Koch A, et al. Can student tutors act as examiners in an objective structured clinical examination? Med Educ. 2007; 41:1032–8.


83. Hodges B, Regehr G, McNaughton N, Tiberius R, Hanson M. OSCE checklists do not capture increasing levels of expertise. Acad Med. 1999; 74:1129–34. 84. Schuwirth LW, Van Der Vleuten CP. The use of clinical simulations in assessment. Med Educ. 2003; 37:65–71. 85. Brailovsky C, Charlin B, Beausoleil S, Cote S, Van der Vleuten C. Measurement of clinical reflective capacity early in training as a predictor of clinical reasoning performance at the end of residency: an experimental study on the Script Concordance Test. Med Educ. 2001; 35:430–6. 86. Schuwirth L, Gorter S, Van der Heijde D, et al. The role of a computerised case-based testing procedure in practice performance assessment. Adv Health Sci Educ Theory Pract. 2005; 10:145–55. 87. Dillon GF, Clauser BE. Computer-delivered patient simulations in the United States Medical Licensing Examination (USMLE). Simul Healthc. 2009; 4:30–4. 88. Huang G, Reynolds R, Candler C. Virtual patient simulation at US and Canadian medical schools. Acad Med. 2007; 82:446–51. 89. Harik P, Cuddy MM, O’Donovan S, Murray CT, Swanson DB, Clauser BE. Assessing potentially dangerous medical actions with the computer-based case simulation portion of the USMLE step 3 examination. Acad Med. 2009; 84:S79–82. 90. Sydor DT, Naik VN. Simulation and competency-based medical education: “showing how.” In: Sherbino J, Frank JR, eds. Educational Design: A CanMEDS Guide for the Health Professions. Ottawa, Canada: Royal College of Physicians & Surgeons, 2011. 91. Cook DA, Hatala R, Brydges R, et al. Technology-enhanced simulation for health professions education: a systematic review and meta-analysis. JAMA. 2011; 306:978–88. 92. Norman G, Dore K, Grierson L. The minimal relationship between simulation fidelity and transfer of learning. Med Educ. 2012; 46:636–47. 93. McGaghie WC, Issenberg SB, Petrusa ER, Scalese RJ. A critical review of simulation-based medical education research: 2003-2009. Med Educ. 2010; 44:50–63. 94. Kassirer JP. Teaching clinical reasoning: case-based and coached. Acad Med. 2010; 85:1118–24. 95. Kogan JR, Holmboe ES, Hauer KE. Tools for direct observation and assessment of clinical skills of medical trainees: a systematic review. JAMA. 2009; 302:1316–26. 96. Mitchell C, Bhat S, Herbert A, Baker P. Workplace-based assessments of junior doctors: do scores predict training difficulties? Med Educ. 2011; 45:1190–8. 97. Norcini JJ, Blank LL, Duffy FD, Fortna GS. The mini-CEX: a method for assessing clinical skills. Ann Intern Med. 2003; 138:476–81. 98. LaMantia J, Kane B, Yarris L, et al. Real-time inter-rater reliability of the Council of Emergency Medicine Residency Directors standardized direct observation assessment tool. Acad Emerg Med. 2009; 16(Suppl 2):S51–7. 99. Shayne P, Gallahue F, Rinnert S, Anderson CL, Hern G, Katz E. Reliability of a core competency checklist assessment in the emergency department: the Standardized Direct Observation Assessment Tool. Acad Emerg Med. 2006; 13:727–32. 100. Bandiera G, Lendrum D. Daily encounter cards facilitate competency-based feedback while leniency bias persists. CJEM. 2008; 10:44–50. 101. Dudek NL, Marks MB, Wood TJ, Lee AC. Assessing the quality of supervisors’ completed clinical evaluation reports. Med Educ. 2008; 42:816–22. 102. Norcini J, Burch V. Workplace-based assessment as an educational tool: AMEE Guide No. 31. Med Teach. 2007; 29:855–71. 103. Goulet F, Jacques A, Gagnon R, Racette P, Sieber W. Assessment of family physicians’ performance using patient charts: interrater reliability and concordance with chart-stimulated recall interview. Eval Health Prof. 2007; 30:376–92. 104. Accreditation Council of Graduate Medical Education, American Board of Medical Specialties. Toolbox of Assessment Methods. Available at: http://www.acgme.org/outcome/assess/toolbox.asp. Accessed Jan 15, 2001 (site subsequently removed). Now: http://www.partners.org/Assets/Documents/GraduateMedical-Education/ToolTable.pdf. Accessed Sep 9, 2012. 105. Ramsey PG, Carline JD, Blank LL, Wenrich MD. Feasibility of hospital-based use of peer ratings to evaluate the performances of practicing physicians. Acad Med. 1996; 71:364–70. 106. Ramsey PG, Wenrich MD. Peer ratings. An assessment tool whose time has come. J Gen Intern Med. 1999; 14:581–2. 107. Ramsey PG, Wenrich MD, Carline JD, Inui TS, Larson EB, LoGerfo JP. Use of peer ratings to evaluate physician performance. JAMA. 1993; 269:1655–60. 108. Wenrich MD, Carline JD, Giles LM, Ramsey PG. Ratings of the performances of practicing internists by hospital-based registered nurses. Acad Med. 1993; 68:680–7. 109. Arnold L, Willoughby L, Calkins V, Gammon L, Eberhart G. Use of peer evaluation in the assessment of medical students. Acad Med. 1981; 56:35–42. 110. Archer JC, Norcini J, Davies HA. Use of SPRAT for peer review of paediatricians in training. BMJ. 2005; 330:1251–3. 111. Whitehouse A, Walzman M, Wall D. Pilot study of 360 degrees assessment of personal skills to inform record of in training assessments for senior house officers. Hosp Med. 2002; 63:172–5. 112. Wright C, Richards SH, Hill JJ, et al. Multisource feedback in medical regulation: the example of the UK GMC Patient and Colleague Questionnaires. Acad Med. 2012; (in press). 113. Rethans JJ, Sturmans F, Drop R, van der Vleuten C. Assessment of the performance of general practitioners by the use of standardized (simulated) patients. Br J Gen Pract. 1991; 41:97–9.