Standards-Based Data Model for Clinical Documents

Standards-Based Data Model for Clinical Documents and Information in the Shared Annotated Resources (ShARe) Project

Stéphane M. Meystre, MD, PhD1, Narong Boonsirisumpun, MS2, Noémie Elhadad, PhD3, Guergana Savova, PhD4, Wendy W. Chapman, PhD1

1 Department of Biomedical Informatics, 2 School of Computing, University of Utah, Salt Lake City, UT; 3 Columbia University, New York, NY; 4 Children’s Hospital and Harvard Medical School, Boston, MA

Abstract: We evaluated the adequacy of a standards-based data model, CDA+GrAF, for clinical text annotations in the Shared Annotated Resources (ShARe) project, and developed tools to automatically convert annotations between the Knowtator format used in the ShARe project and CDA+GrAF. A random sample of 50 annotated notes was successfully converted back and forth, with valid and accurate annotations in both versions.

Introduction: To support clinical research, Natural Language Processing (NLP) can be used to extract detailed information from clinical documents. Progress in applying NLP to clinical narratives has been, and still is, significantly hindered by the lack of clinical narratives that can be easily used or shared for research. The Shared Annotated Resources (ShARe) project aims to alleviate this hindrance by developing de-identified and annotated sharable corpora of clinical notes [1]. To ease sharing and enable interoperability, a common information model and common terminologies are required. To answer this need, we evaluated a standards-based text annotation data model: CDA+GrAF [2]. This data model combines two existing standards (HL7 Clinical Document Architecture and ISO Graph Annotation Format) to represent all kinds of text annotations and to serve as a pivot data model for exchanging and combining annotations.
Methods: To evaluate the adequacy of CDA+GrAF for the representation of ShARe clinical text annotations, we focused on four objectives: 1) manually create examples of MIMIC-II clinical text annotations (according to the current ShARe use case) using the CDA+GrAF data model; 2) develop conversion tools to automatically convert text annotations from the Knowtator format used in the ShARe project to the CDA+GrAF format, and back; 3) use the conversion tools to automatically convert MIMIC-II clinical notes annotated with Knowtator to the CDA+GrAF format, and back; and 4) examine the validity of the resulting CDA+GrAF XML annotation files, and the accuracy of the annotation files translated back to the Knowtator format. To work on these objectives, we randomly selected 50 MIMIC-II notes and obtained the corresponding ShARe annotations. These text annotations included several categories of concepts and relations, such as anatomical sites, diseases and disorders, and temporal information.

Results: The ShARe annotations are rather complex because of the various class and relation types used, but we managed to represent them faithfully in the CDA+GrAF format, without any loss of information. To convert Knowtator XML annotation files to the CDA+GrAF format, and back, we developed two Java conversion tools: KnowtatorXmlConverter and CDAGrAFXmlConverter. The first automatically converts Knowtator annotation files and the corresponding annotated text notes into CDA+GrAF annotation files. The second automatically converts CDA+GrAF annotation files into Knowtator annotation files and the annotated text notes. The CDA+GrAF annotation files were examined for validity and general content. The Xerces XML parser was used with the HL7 CDA and GrAF XML schemata for testing, and all documents were found to be “well-formed” and “valid.” We also manually examined 10 of the CDA+GrAF annotation files, as visualized in a web browser using an XSL stylesheet.
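The validity check described above can be illustrated with a minimal Java sketch. It is not the project's actual test harness: it uses the JDK's built-in JAXP validation API instead of a standalone Xerces distribution, and the class name AnnotationXmlValidator, the toy schema TOY_XSD, and the sample element and attribute names are all hypothetical stand-ins for the real HL7 CDA and GrAF schemata and documents.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class AnnotationXmlValidator {

    // Toy schema (assumption): a stand-in for the real HL7 CDA and GrAF schemata.
    // It requires each <annotation> to carry start, end, and type attributes.
    public static final String TOY_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"annotations\">"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name=\"annotation\" maxOccurs=\"unbounded\">"
      + "<xs:complexType>"
      + "<xs:attribute name=\"start\" type=\"xs:nonNegativeInteger\" use=\"required\"/>"
      + "<xs:attribute name=\"end\" type=\"xs:nonNegativeInteger\" use=\"required\"/>"
      + "<xs:attribute name=\"type\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true only if the document is well-formed AND valid against the schema.
    public static boolean isValid(String xml, String xsd) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no custom ErrorHandler, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical annotation documents; the type label is illustrative only.
        String good = "<annotations>"
            + "<annotation start=\"0\" end=\"9\" type=\"Disease_Disorder\"/>"
            + "</annotations>";
        String missingEnd = "<annotations>"
            + "<annotation start=\"0\" type=\"Disease_Disorder\"/>"
            + "</annotations>";
        System.out.println(isValid(good, TOY_XSD));       // prints "true"
        System.out.println(isValid(missingEnd, TOY_XSD)); // prints "false"
    }
}
```

In a real setting, the schema would be loaded from the published schema files, not from an inline string, and a document that is not well-formed would fail the same check because parsing errors are also reported as SAXException.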
All ShARe annotations were correctly represented, and no errors were found. The newly generated Knowtator annotation files and text notes were finally compared with the original Knowtator annotation files and text notes. A difference analysis showed that the content of the regenerated annotation files was identical to the originals (only the order of XML elements changed) and that the text files were identical. This automatic bi-directional conversion was therefore a success.

Acknowledgments: Project funded by R01GM090187 from NIGMS.

References
1. Chapman WW, Elhadad N, Savova G. Clinical NLP Annotation [Internet]. Available from: http://www.clinicalnlpannotation.org/index.php/Main_Page
2. Meystre SM, Lee S, Jung CY, Chevrier RD. Common data model for natural language processing based on two existing standard information models: CDA+GrAF. J Biomed Inform. 2012 Aug;45(4):703–10.
