Leveraging Knowledge Representation to Maintain Immunization Clinical Decision Support

Janos L. Mathe, PhD¹, Scott D. Nelson, PharmD, MS¹, Stuart T. Weinberg, MD¹, Christoph U. Lehmann, MD¹, Andras Nadas, MS¹, Asli O. Weitkamp, PhD¹
¹ Vanderbilt University Medical Center, Nashville, TN, USA

Abstract

Immunizations are one of the most cost-effective interventions for preventing morbidity and mortality. As vaccines, related clinical knowledge, and requirements change, clinical applications must be updated in a timely manner to avoid practicing outdated medicine. We use the Centers for Disease Control and Prevention (CDC) as a source of immunization knowledge for our Clinical Information Systems (CIS). After identifying knowledge-management gaps in the CDC’s content and email notification service, we developed and adapted a knowledge management tool chain, called COMET, that automatically processes the available immunization content so that mature knowledge lifecycle management practices can be implemented locally. The implemented features include error and change tracking, content discovery and analytics, and tracking of dependencies on downstream CISs. We demonstrate the creation of a tool that enables content curators to visualize, track, and implement immunization changes.

Introduction

Immunizations are one of the most cost-effective methods for preventing morbidity and mortality from infectious diseases. As new vaccines are developed, administration rules are modified, and vaccines become outdated or unavailable, clinical applications must be updated in a timely manner. Changes to immunization knowledge affect multiple functions, including order entry, documentation, clinical decision support, population health management, quality measures, and interoperability.
Therefore, in 2010, the Community Preventive Services Task Force recommended the use of immunization information systems based on strong evidence of effectiveness in increasing vaccination rates [1]. To facilitate the reporting, exchange, forecasting, and analysis of immunization data, the Centers for Disease Control and Prevention (CDC) supported the development of Immunization Information Systems (IISs). The success of the IIS efforts depends in part on knowledge and code sets that are maintained by the CDC [2]. While immunizations as medications are coded using multiple systems such as RxNorm, the authority in the United States for maintaining immunization codes and knowledge is the CDC [3]. Using its public website, the CDC publishes general immunization content as well as HL7 Standard Immunization Code Sets [2], which define an immunization ontology. The CDC also publishes Vaccine Information Statement (VIS) documents, which are important from a regulatory and compliance perspective. To help patients receive recommended immunizations on schedule, the CDC publishes Clinical Decision Support for Immunization (CDSi) rules and data for immunization CDS engines [4]. This manuscript focuses on the HL7 Standard Immunization Code Sets content.

To alert consumers to the periodic updates of its published immunization content, the CDC offers a subscription-based email notification service. This service is suitable for triggering workflows in which domain experts manually identify changes and update relevant clinical content in their local clinical information systems (CIS). While this service is helpful, we identified a need to augment it with knowledge management tools that analyze newly published content, automatically import content into the local immunization implementations within CISs, and automatically identify dependencies with orderable items and documentation.
Objectives

We aimed to model the immunization information system code set content provided by the CDC so that we could automatically compare this “source of truth” with immunization-related content embedded in our local CISs, with the ultimate goal of automatically triggering modification activities driven by content changes. In addition to subscribing to and analyzing new content, we wanted to manage critical dependencies (e.g. new vaccines requiring new orderable items) automatically to inform local content curation efforts. Such schedule-driven, automated content review reports can discover updates and send notifications of content changes to multiple dependent downstream systems and subscribers, based on the types of change in which consumers declare an interest. Furthermore, these reports can expose the impact of changes and the necessary updates within our systems by traversing formal conceptual links across CDC content and local content. We also sought to perform ad hoc reporting on meaningful content changes in the context of related elements (e.g. finding vaccine alternatives when dealing with shortages), to utilize constraint checking for data quality (e.g. finding which vaccines are missing manufacturer information), and to remain flexible in handling changes to the domain model, focusing on integration with our implementation of the Epic electronic health record system.

Related Work

In his landmark paper, Cimino described the importance of creating precise models that describe relationships among concepts for representing healthcare knowledge, ultimately to support content development and maintenance [5]. Serving as resources that provide domain-specific content, such knowledge bases have been employed in many areas to drive clinical decision support, more recently in providing genomic clinical decision support [6,7]. In this paper, we describe the methods we employed to create a domain model that represents relationships among all immunization concepts and to utilize the code sets the CDC publishes to populate and maintain our knowledge base.

Methods

Sustainable maintenance of clinical knowledge, such as CDS logic and content, has been an active area of research in healthcare for decades due to its clinical importance and the lack of industry-wide accepted solutions [8,9]. To date, numerous successful but localized approaches have demonstrated that knowledge management efforts, including authoring, analysis, and maintenance, can greatly benefit from the application of rigorous knowledge engineering techniques and principles [10-12]. Techniques in knowledge engineering, such as domain modeling [13,14], a form of knowledge representation that allows the creation of mathematically precise, manipulable, and analyzable abstractions, are especially useful for the representation of complex CDS knowledge [15,16].
Introduction of COMET at a high level

As part of an institutional effort to tackle clinical knowledge management in general and to aid knowledge migration to a local implementation of a vendor electronic health record (EHR) system [17], we defined a set of requirements for an enterprise-wide clinical knowledge management solution. The resulting tool suite, called Clinical knOwledge Management EcosysTem (COMET), provides content integration, analytics, curation, and dissemination functionalities. We describe the components of COMET relevant to immunization knowledge management; a detailed description of COMET, including its ability to perform content curation and dissemination, is outside the scope of this paper.

Overview of COMET’s application to the immunization domain

To implement the objectives outlined earlier, we concentrated exclusively on enabling COMET’s content integration and analysis features. The following sections describe the features utilized by providing a stepwise description of our approach: (1) Analysis of the source domain, (2) Modeling: Defining the target domain concepts, (3) Storing instances of the target domain as a graph, and (4) ETL: Extract, Translate, and Load immunization content. We discuss operationalization of the execution of the ETL processes and describe analytics in the Results section.

Analysis of the source domain

To consume the immunization knowledge offered by the CDC, a detailed analysis of the available content was required. We documented our detailed findings elsewhere [18]; thus, we provide only an overview of this source domain here. From the information available on the CDC website, we identified 11 sources that we wanted to consume (see the “Resource” column of Table 1). For each content type, the CDC typically offers information in multiple formats: (a) HTML-embedded tables, (b) XML v1 and (c) XML v2 files, (d) comma-separated value (CSV) documents, and (e) Microsoft Excel (XLSX) tables.

Because we intended to process the available information programmatically, our order of preference was (1) XML, (2) CSV, and (3) Excel. Unfortunately, there were multiple technical anomalies with the source files, including (1) lack of availability of all file formats for all content types, (2) discrepancies among files representing the same content in different formats, and (3) inability to extract information unambiguously due to the format design (see the “Technical challenges in the source” column of Table 1). The XML v1 format was unusable because it captured conceptually related name-value pairs as a list of independent sibling XML elements under a single parent; this ambiguity would have made the processing of name-value pairs extremely brittle. Some files labeled XML v2 were actually stored in the XML v1 format (see the “XML v2: type mismatch problem” entries in Table 1). Additional ambiguity came from unescaped line breaks within multi-line entries of the CSV files. Further challenges and the resulting file format selections are shown in Table 1.

Table 1. CDC resource file content analysis

| Resource | Technical challenges in the source | Selected format |
|---|---|---|
| 1. CVX: Generic vaccines | | XML v2 |
| 2. VG_CVX: Grouping of generic vaccines | XML v2: type mismatch problem | CSV |
| 3. VIS_CVX: Generic vaccines related to vaccine information statements | | XML v2 |
| 4. MVX: Manufacturers of vaccines | | XML v2 |
| 5. MVX_CVX: Generic vaccines related to manufacturers | XML v2: type mismatch problem; trailing "\|" causing an unnamed header | CSV |
| 6. CPT_CVX: Generic vaccines related to vaccine Current Procedural Terminology (CPT) Medical Code Set | XML v2: type mismatch problem; CSV: unescaped line breaks; XLSX: missing "Status" column | XLSX |
| 7. VIS: Vaccine information statements (VIS) | | XML v2 |
| 8. VISURL_VIS: Vaccine information statements related to vaccine description documents | XML: content represented via both nodes and attributes of the XML, complicating the ETL parsing unnecessarily | CSV |
| 9. NDCUS: NDC Unit of Sale: defines the vaccinations for sale (i.e. packaged units) | XML: not available; CSV: unescaped line breaks | CSV |
| 10. NDCUU: NDC Unit of Use: defines the administrable vaccinations, which are sold packaged as (part of) sale vaccinations; it also defines possible routes of administration | XML: not available; CSV: missing UseUnitPackForm column compared to XLSX | XLSX |
| 11. NDCUU_NDCUS_MVX: NDC linker: maps sale and administrable vaccines (i.e. how administrable vaccines are packed together as part of for-sale vaccines); defines the manufacturers for sale and administrable vaccines | XML: not available | CSV |
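The format preference described above (XML first, then CSV, then Excel, skipping formats that are unavailable or unusable for a given resource) can be sketched as follows. This is a minimal illustration rather than COMET's implementation; the function name `select_format` and the example inputs are hypothetical.

```python
from typing import Optional

# Preference order stated in the text: XML first, then CSV, then Excel.
PREFERENCE = ["XML v2", "CSV", "XLSX"]


def select_format(available: set[str], unusable: set[str]) -> Optional[str]:
    """Return the most preferred format that is both offered and usable."""
    for fmt in PREFERENCE:
        if fmt in available and fmt not in unusable:
            return fmt
    return None


# Hypothetical resource: no XML is offered and the CSV is judged unusable.
choice = select_format(available={"CSV", "XLSX"}, unusable={"CSV"})
```

Keeping the preference list separate from the per-resource findings means that a newly discovered anomaly only changes the `unusable` input, not the selection logic.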

During the domain analysis, we encountered additional data nuances that complicated our primary goals significantly: (a) resource files contain duplicate and overlapping entries (e.g. duplicate rows in the NDCUS table and six different permutations of the “intramuscular” route concept in NDCUU), (b) there is no proper, well-defined data model (e.g. ad hoc column naming makes querying and joining tables difficult), (c) resources represent only the current state of authoring, lack guarantees for data persistence, and offer limited version tracking and provenance (i.e. while some concepts have a manually curated history, this cannot replace an appropriate change tracking solution), (d) email notifications are not in sync with actual changes of content, and (e) the changes communicated by email cover only clinically relevant information.

Modeling: Defining the target domain concepts

To represent the clinical and supporting concepts of the immunization domain, we applied a model-based design methodology, which focuses on the formal representation, composition, and manipulation of abstractions captured in the form of models. It addresses layered, multiple-view modeling of domain-specific abstractions and enables model transformation, analysis, and validation. Additionally, model-based design tools typically facilitate automated synthesis of implementations directly from the models (such as system and configuration generation) and support design evolution. To represent the concepts, rules, and relationships of our domain model, we followed a model-integrated computing (MIC) approach [19,20] utilizing WebGME [21], an open-source tool suite.
The application of MIC principles and tools cast the creation of immunization domain models into the following four steps: (1) metamodeling: designing a domain-specific modeling language (DSML, or DSL), (2) modeling: using the modeling language to capture domain-specific model instances, (3) validation: checking the model instances against a set of rules defined by the domain, and (4) model interpretation: translating model instances for external utilization.

In step (1), we designed a modeling language, the COMET metamodel, to represent clinical knowledge constructs that enable knowledge management workflows. The semantics of the COMET language include a mapping of the domain-specific concepts to a content storage layer. In step (2), we captured the abstractions of the immunization domain as model instances of the COMET language, analogous to designing a schema in a database. The immunization domain model was created based on our interpretation of the CDC content and current clinical use case requirements. In step (3), we used the built-in constraint enforcement facility of WebGME, with custom translators layered on top of the tool’s model transformation infrastructure, to verify the immunization domain model against a set of structural constraints defined by the domain. In step (4), using custom model interpreters (specific to COMET), we translated the model into a set of configuration files for our storage layer. These configurations include schema definitions and validation rules in the form of graph pattern queries that implement run-time constraint checking.
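As a rough illustration of what a structural validation rule checks, the sketch below looks for instances of a concept that lack a required relationship. It uses plain Python over an in-memory edge list rather than the graph pattern queries that COMET generates; the node identifiers, the helper name `missing_relationship`, and the sample data are assumptions made for the example.

```python
# Nodes keyed by a hypothetical id with their concept type;
# edges are (source, relationship, target) triples.
nodes = {
    "ua-1": "VaccineAdministered",
    "ua-2": "VaccineAdministered",
    "m-1": "Manufacturer",
}
edges = [("ua-1", "ManufacturedBy", "m-1")]


def missing_relationship(nodes, edges, concept, relationship):
    """Return ids of `concept` instances with no outgoing `relationship` edge."""
    have = {src for src, rel, _ in edges if rel == relationship}
    return sorted(
        node_id
        for node_id, kind in nodes.items()
        if kind == concept and node_id not in have
    )


# Administrable vaccines that lack manufacturer information.
violations = missing_relationship(nodes, edges, "VaccineAdministered", "ManufacturedBy")
```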

[Figure 1 (diagram). Panels: “(St.0) Data Source”, showing the CDC resources CVX, VG_CVX, VIS_CVX, MVX, MVX_CVX, CPT_CVX, VIS, VISURL_VIS, NDCUS, NDCUU, and NDCUU_NDCUS_MVX; “(St.1) Source Domain”, showing the same eleven names; “(ETL.2) D2D Converter”, showing the ETL job configurations Route, VaccineGroup, VaccineGeneric, VaccineGeneric_ChildOf, VaccineGeneric_HasVIS, VaccineCPT, VaccineSale, VaccineSale_HasUseNDC, VaccineSale_SaleManufacturedBy, VaccineAdministered, VaccineAdministered_HasRoute, VaccineAdministered_ManufacturedBy, CurrentVIS, and Manufacturer (each an .oetl.json file); “(St.2) Target Domain”; “(ETL.3) Multi-Dimensional Versioning”; and “(St.3) Target Domain”. Target domain concept labels: Vaccine, Vaccine Purchasable, Vaccine Sale, Vaccine Administered, Vaccine Generic, Vaccine Group, Vaccine CPT, Current VIS, Manufacturer, and Route. Legend: CSV, XML, MS Excel, Concept, DB Table, ETL Job Config.]

Figure 1. Content transformation flow

The box “(St.2) Target Domain” in Figure 1 shows a simplified representation of the target domain. While the diagram omits several modeling details, it illustrates the fundamentals: concepts are represented as rounded rectangles, relationships as solid lines between concepts, and modeled type inheritance as dashed lines.

Storing instances of the target domain as a graph

The COMET architecture uses a graph database, which we utilized as our main storage layer and as an analytic and data transformation tool. Analytics comprises three distinct use cases. The first is constraint checking, where structural patterns that must be enforced can be implemented as graph traversal queries. The second is analytical reporting, where query-based reports generate custom views of the content for a specific purpose. The third is ad hoc data discovery, during which content consumers traverse a large knowledge graph to gain an understanding of the relationships between different sets of data in a more intuitive manner. The current implementation of this architectural component in COMET is OrientDB [22], a multi-model, open-source NoSQL database management system that combines graphs with document, key/value, reactive, object-oriented, and geospatial models in a single scalable, high-performance operational database. In COMET, the model translation tools translate the modeled concepts into OrientDB schema definitions and related graph queries. This translation and the import of the resulting graph schema occur once during the design of each domain. Figure 4 shows an example graph instance.

ETL: Extract, Translate, and Load immunization content

In the business intelligence world, “extract, translate, and load” (ETL) processes are utilized to transform data between two, often heterogeneous, data storage systems. In COMET, we utilize an ETL pipeline to implement the mapping between source and target domains. In the immunization domain, we translate the CDC file contents into instances of the graph schema obtained through modeling our target domain. The following sections describe the steps involved.

Stage 1: Preprocessing

COMET applies a three-stage ETL solution.
Preprocessing the data in the first stage supports four functions: (a) Achieving source system independence, (b) Providing data aggregation, (c) Performing low-level data cleansing and (d) Minimizing technology proliferation. Function (a) means that COMET replicates the CDC content “as is” to provide a basic versioning of the source content and to enable computationally expensive data manipulation and querying of a local copy to reduce dependence and physical load on the original content provider. Function (b) is necessitated by having multiple physical content sources (e.g. files or tables) containing data of the source domain, which is often the case when composing a target domain from multiple subdomains. Composition of subdomains at the modeling level translates to data integration at the storage level enabling the physical connection of conceptually related, independently stored source content. Function (c) enforces basic assumptions that the next stage of the COMET ETL pipeline makes about the data. Such low-level transformations of the CDC content include type casting the various date representations to a single date type and trimming textual content. Function (d) simplifies the COMET technology stack, such that we can implement and maintain fewer software components. During Stage 1, we translate content into predefined storage formats: Stage 2 expects SQL (specifically, SQL database solutions for which OrientDB provides JDBC connectors) for table-based content and JSON or XML for graphs. For the table-based CDC content, the Stage 1 implementation translates everything to SQLite. In Figure 1, the relevant boxes are “(St.0) Data Source” and “(St.1) Source Domain” representing the CDC resources and the SQLite unified database instance respectively. 
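The low-level cleansing attributed to Stage 1 (trimming text and casting differing date representations to a single type before loading into SQLite) could look roughly like the sketch below. The table name `cvx`, its columns, the two date layouts, and the sample rows are assumptions for illustration only and do not reflect COMET's actual translators.

```python
import sqlite3
from datetime import datetime

# Date layouts assumed for illustration; real source files may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def to_iso_date(text: str) -> str:
    """Cast any of the assumed date layouts to one ISO 8601 date string."""
    value = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def load_rows(conn: sqlite3.Connection, rows) -> None:
    """Trim text, normalize dates, and store the rows in a Stage 1 table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cvx "
        "(code TEXT PRIMARY KEY, name TEXT, updated TEXT)"
    )
    cleaned = [(c.strip(), n.strip(), to_iso_date(d)) for c, n, d in rows]
    # The primary key makes a repeated code overwrite the earlier row.
    conn.executemany("INSERT OR REPLACE INTO cvx VALUES (?, ?, ?)", cleaned)


conn = sqlite3.connect(":memory:")
load_rows(
    conn,
    [(" 149 ", "influenza, live ", "08/02/2017"), ("111", " influenza", "2017-08-12")],
)
stored = conn.execute("SELECT code, name, updated FROM cvx ORDER BY code").fetchall()
```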
Besides simplifying access, generating detailed documentation of data mappings, and providing execution logs, our immunization translators perform some basic data cleansing tasks, including converting textual content to the same character set (i.e. UTF-8), escaping multi-line entries, generating database-compatible column names, removing duplicate rows, and defining database keys.

Stage 2: Domain conformation

The second stage of COMET’s ETL solution populates the target domain with valid model instances by (a) achieving conformation between source and target domains while (b) enforcing the rules of the target domain. Because domains are often composed of multiple subdomains (i.e. derived from multiple source domains), Stage 2 also provides (c) a domain unification function. Function (a) requires the creation of semantically valid mappings between concepts of the source and target domains, which manifest as data translation rules in the form of a set of ETL jobs referred to as the ETL plan. This stage addresses model-to-model translation and concept-level data cleansing to prevent model instances created during the ETL plan execution from violating the constraints of the target domain (e.g. there should be only one copy of a particular model instance). The ETL jobs for the immunization domain were implemented manually (see “(ETL.2) D2D Converter” in Figure 1). The assumptions made implicitly define constraints on the order of execution (e.g. a “Has Route of Administration” relationship cannot be created in a graph database with missing “Administrable Vaccine” or “Route of Administration” model instances). This phenomenon, which is discussed in the “ETL orchestration” section, represents a partial order (in the order-theoretic sense), where the ordering is dictated by dependencies of the domain model. Further constraints on the initial partial order are dictated by the need to minimize the processing of data sources (e.g. a source table that defines two concepts and a relationship between them can be processed three times, once per ETL job for each abstraction, or once, by an ETL job that creates all three). Figure 1 represents the initial partial order for the immunization domain via the arrows across the ETL jobs of the “(ETL.2) D2D Converter”. In Stage 2, function (b) is implemented as a combination of OrientDB schema enforcement and queries generated during the model interpretation (see previous sections). The execution of the automatically generated validation rules produces a report of potential domain violations, which can be analyzed in conjunction with the ETL job execution logs to ensure correctness of Stage 2 data.

Stage 3: Versioning

The third stage of COMET ETL provides a sophisticated “multi-dimensional versioning” solution, which enables flexible, change-driven workflows (see “(ETL.3) Multi-Dimensional Versioning” in Figure 1). Multi-dimensional versioning provides versioning of multiple constructs concurrently. At the lowest level, COMET maintains “entity-based version” information, automatically versioning the model instances generated by the execution of the associated ETL plan. The version information stored for each model instance contains a flexible set of version signatures. The default configuration stores two version signatures: an “ID version signature”, generated by hashing the information that uniquely identifies the instance, and a “content version signature”, generated from all the information stored by it.
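One way to compute the two default signatures is sketched below. The hash function (SHA-256), the JSON serialization, and the field names are assumptions; the text states only that the ID signature hashes the identifying information and that the content signature covers everything stored by the instance. Comparing the signatures from two runs then separates added, deleted, and updated instances.

```python
import hashlib
import json


def signature(fields: dict) -> str:
    """Hash a dict deterministically (sorted keys) into a hex digest."""
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def version_signatures(instance: dict, id_fields: list[str]) -> dict:
    """Return the ID signature (identifying fields) and content signature (all fields)."""
    return {
        "id": signature({key: instance[key] for key in id_fields}),
        "content": signature(instance),
    }


def diff(old: dict, new: dict) -> dict:
    """Compare two runs, each mapping ID signature -> content signature."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }


# Hypothetical instance whose description changed between two ETL runs.
v1 = {"cvx": "149", "name": "influenza, live, intranasal"}
v2 = {"cvx": "149", "name": "influenza, live, intranasal, quadrivalent"}
s1 = version_signatures(v1, ["cvx"])
s2 = version_signatures(v2, ["cvx"])
changes = diff({s1["id"]: s1["content"]}, {s2["id"]: s2["content"]})
```

Because the ID signature ignores non-identifying fields, the edited instance is reported as updated rather than as one deletion plus one addition.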
(The discussion of further, workflow-specific version signatures is outside the scope of this manuscript.) Entity-based ID versions give COMET the ability to recognize new and deleted immunization model instances, thus enabling a form of change tracking. This is complemented by content versioning, which shows the evolution of existing model instances (e.g. enumerating the vaccine definitions that were updated recently). Another dimension of versioning is provided by the automatic computation of “concept-based versions” during ETL execution, which generates a set of signatures for each concept type. This allows the creation of abstract views that suggest to content reviewers where to focus their attention (e.g. there were no new instances of vaccine manufacturers in the past month). For convenience, COMET also computes an ETL plan-based version that notes any changes to the sources processed.

ETL orchestration

ETL orchestration is the coordination of the tasks associated with the aforementioned stages, including stepwise execution of the tasks and the ability to automate execution of the ETL according to a predefined schedule or in response to events generated by other processes. For the immunization domain, we implemented a daily scheduled ETL process, which has been running since August 2, 2017. Because the requirements did not call for parallel execution, we opted for the simplest orchestration solution. Consequently, the immunization plan orchestrator implements a purely sequential set of steps, which required extending the partial ordering of the ETL jobs into a total order for execution. Currently, the linear sequence of ETL jobs (satisfying the partial order) is provided to the ETL orchestrator as a configuration file.

Content evolution

COMET enables versioning at multiple stages of the ETL pipeline.
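The step described under ETL orchestration, extending the partial order of ETL jobs into one sequential order, is in essence a topological sort. The sketch below uses Python's standard `graphlib` module; the job names are borrowed from Figure 1, while the dependency sets are assumed for illustration. COMET itself, as noted above, reads the sequence from a configuration file.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must run before it (assumed dependencies).
dependencies = {
    "Route": set(),
    "VaccineAdministered": set(),
    "Manufacturer": set(),
    "VaccineAdministered_HasRoute": {"VaccineAdministered", "Route"},
    "VaccineAdministered_ManufacturedBy": {"VaccineAdministered", "Manufacturer"},
}

# static_order() yields one linear sequence that respects every dependency;
# it raises graphlib.CycleError if the jobs depend on each other circularly.
order = list(TopologicalSorter(dependencies).static_order())
```

Several valid sequences usually exist for one partial order, so a generated order may differ from a hand-written one while still being correct.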
Since the immunization domain required little storage space, we versioned at all stages concurrently, preserving every file downloaded (Stage 0), every Stage 1 database created, and all ETL version information regarding instances of the target domain (Stage 3). This allows flexibility in changing the knowledge representation solution without losing information. It also enables debugging and supports historical playback needs. To demonstrate the evolution of the immunization content and the execution history of our ETL solution over time, we built a simple timeline visualization (Figure 2). The chart shows the count of concept instances for every ETL plan execution.

Significant events (in Figure 2):
A. 2017-08-02: First ETL plan execution
B. 2017-08-12: First content update from the CDC
C. 2017-08-29: Domain constraint violation resolved (see the “Content validation examples” section of Results)
D. 2017-09-28: Multi-dimensional versioning enabled
E. 2017-11-01: CDC content update while COMET server configuration errors caused outages
F. 2017-11-29: Import anomalies due to file access problems
G. 2017-12-13: Content update without email notification from CDC

Figure 2. ETL execution timeline visualization

Results

Content discovery example

To demonstrate the intuitiveness of a graph model, we provide the results of a simple graph traversal query (Figure 4), which required less effort than classic relational database techniques applied to the CDC’s tables. We ran the query from Figure 3 and used the built-in graph discovery features of OrientDB to augment and visualize the returned graph.

    traverse both() from (
      select * from VaccineGeneric_imm_typ where CVXCode_imm_des in [158, 149, 111]
    ) limit 29 strategy BREADTH_FIRST

Figure 3. Sample graph traversal query (written in OrientDB’s SQL syntax)

While we will not discuss the domain model and the content stored in the nodes in detail, there are interesting observations to be made from the graph (Figure 4). Custom naming of the concepts of a highly specialized domain allows a wider range of people to engage with the model. The layer of interpretation that modeling provides can vary across use cases; however, content sources can successfully populate multiple domain models fulfilling different needs. Given an understanding of the concepts at hand, the graph model allows quick interpretation of results. Figure 4 shows three generic vaccinations grouped by a single vaccine group, all manufacturer-specific options for each generic vaccine, and the routes of administration, which can be discovered by relating generic vaccines to administrable vaccinations. Structural irregularities are easy to spot in this representation, including omissions (e.g. the missing “Manufactured By” and “Has Route” relationships of administrable vaccines) and atypical additions (e.g. multiple CPT codes associated with one vaccine).

Content validation examples

Besides allowing the ad hoc discoveries demonstrated above, we address domain rule violations in a more systematic manner. The staged approach of the ETL pipeline provides a progressively constrained environment. In Stage 1, we eliminate low-level anomalies, such as duplicate entries. In Stage 2, we perform semantic constraint checking, such as cardinality violation testing. Finally, in Stage 3, we compute a delta of the actual changes. Our findings for Stage 1 violations, which we shared with the CDC, include 8 duplicate rows in the National Drug Code (NDC) related content of a combined view [23] (e.g. Sale NDC11: 19515-0893-07, Use NDC11: 19515-0893-02). A more complicated example of a semantic violation was the entry for Fluvirin (Sale NDC11: 66521-0114-02, Use NDC11: 76420-0482-01), which contradicted itself by equating the sale and the administrable versions of the vaccine (No Use NDC: True).
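A cardinality rule of the kind checked in Stage 2 can be sketched in a few lines. The example below flags any CPT code that maps to more than one CVX code; it is plain Python rather than the generated graph queries that COMET runs, and the code pairs are made up for illustration.

```python
from collections import defaultdict


def cardinality_violations(pairs):
    """Return CPT codes that map to more than one distinct CVX code."""
    targets = defaultdict(set)
    for cpt, cvx in pairs:
        targets[cpt].add(cvx)
    return {cpt: sorted(cvxs) for cpt, cvxs in targets.items() if len(cvxs) > 1}


# Made-up (CPT, CVX) pairs for illustration only; not actual CDC mappings.
pairs = [("90001", "10"), ("90001", "11"), ("90002", "12"), ("90002", "12")]
violations = cardinality_violations(pairs)
```

Collecting targets in a set means an exact duplicate row (a Stage 1 concern) is not reported as a cardinality violation.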

[Figure 4 (graph visualization). Concept legend: Vaccine CPT, Vaccine Group, Vaccine Generic (CVX), Vaccine Sale, Manufacturer, Vaccine Administered, Route, Current VIS (VIS). Relationship legend: Has CVX, Child Of, Has Generic Vaccine, Has VIS, Sale Manufactured By, Manufactured By, Has Use NDC, Has Route. Visible nodes include the generic vaccines CVX 149, CVX 111, and CVX 158 with vaccine group CVX 88; CPT codes 90660, 90672, 90687, and 90688; the routes NASAL and INTRAMUSCULAR; and administrable vaccines labeled FluMist, FluMist Quadrivalent, Afluria Quadrivalent, Fluzone Quadrivalent, and Flulaval Quadrivalent with their Use NDC (UNDC) codes.]

Figure 4. Example query visualized via OrientDB’s graph editor

Another anomaly, discovered as a cardinality violation, was a mapping problem between billing-related CPT codes and generic vaccinations in the CPT_CVX table. In the source table, single CPT codes were mapped to multiple CVX codes in four instances. Such duplicate mappings prevent a clinical system from unambiguously translating a CPT code into a vaccine product. Following reports from the community, this anomaly was corrected by the CDC, which can be observed in Figure 2 (as “significant event: C”).

Operationalization efforts

Content discovery and validation are critical features of knowledge management. However, our primary goal was to support the ongoing, enterprise-wide adoption of the highly valuable immunization content that the CDC provides. There are multiple approaches to achieving this goal. To avoid fundamentally changing our existing clinical knowledge curation workflows, we chose the least invasive solution: a retrospective report that compares the CDC content with the immunization content from our EHR. The COMET facility allows automatic daily import of content into a staging environment and generation of the required report, which can then be consumed by the clinical build teams, who make the desired changes to the production content.

Discussion

The CDC provides frequently updated, important clinical content that must be integrated into existing local clinical systems in a timely manner. Our results demonstrated an effective way to identify changes to the immunization content and report them in a form that builders can digest. Utilizing a similar approach to knowledge management at the CDC or a third party would enable more consumers to implement changes faster. Conducting knowledge management prior to distribution is particularly important since not all interested clinical parties can spare resources for such an effort. Our efforts to inform the immunization content curation in our CISs made an impact.
However, we do not consider this problem solved. Immediate next steps would include proactively aiding content build workflows, which requires a fundamental shift in the current knowledge curation approach. Rather than forcing content owners to monitor changes across a large pool of relevant but often independent content sources, we would offer them, with the help of model integration, a single point of reference for tracking relevant content, allowing better management of incoming change requests and of build quality reports.

An additional improvement for our approach would be to directly represent and conceptually link the immunization domain concepts of our EHR within our existing domain model. Such integration would enable a single query against our COMET implementation to produce the currently used report, instead of merging separate query results from COMET and Epic.

Another enhancement would be to minimize the cost of implementing the ETL transformation logic, which is currently written by hand in the form of various ETL jobs. We have found that, for the immunization domain, all ETL jobs could be generated from models. The additional models required to represent these transformations would include a precise representation of the source domain and the mapping of the source and target concepts using formal data transformation operators.

A more complex potential future contribution would be to bypass prospective reporting and to facilitate a true knowledge curation environment, where we could stage observed immunization content changes as build templates and, by utilizing programmatic interfaces provided by our CISs, allow content generation directly within our local CIS implementations. In this pipeline, the emphasis is on enabling a workflow with steps for review, approval, and controlled release to CISs; thus the role of the content experts shifts from builder to reviewer.
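The idea of generating ETL jobs from models, raised above, can be illustrated with a small sketch. It is an assumption-laden example and not the COMET design: the `FieldMapping` class, the operator names, and the source and target field names are all hypothetical. The mapping between source and target concepts is held as data, and a generic function turns that model into an executable transformation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical set of transformation operators that a mapping model may reference.
OPERATORS: Dict[str, Callable[[str], object]] = {
    "copy": lambda value: value,
    "strip": lambda value: value.strip(),
    "to_int": lambda value: int(value.strip()),
    "upper": lambda value: value.strip().upper(),
}


@dataclass(frozen=True)
class FieldMapping:
    """One source-to-target mapping in the (hypothetical) transformation model."""
    source: str
    target: str
    operator: str = "copy"


def generate_job(mappings: List[FieldMapping]) -> Callable[[dict], dict]:
    """Build a row-level ETL job from a declarative mapping model."""
    # Validate the model once, before any data is processed.
    for mapping in mappings:
        if mapping.operator not in OPERATORS:
            raise ValueError(f"unknown operator: {mapping.operator}")

    def job(row: dict) -> dict:
        return {m.target: OPERATORS[m.operator](row[m.source]) for m in mappings}

    return job


# Hypothetical model for one source table; the field names are placeholders.
model = [
    FieldMapping("cpt_code", "cpt", "strip"),
    FieldMapping("cvx_code", "cvx", "to_int"),
    FieldMapping("status", "status", "upper"),
]

job = generate_job(model)
row = job({"cpt_code": " CPT-A ", "cvx_code": " 7 ", "status": "active"})
print(row)
```

Because the mapping is data rather than hand-written code, a change in the source format would require editing the model only, which is the cost reduction this enhancement aims for.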
We found knowledge engineering to be a viable approach for our use case of updating local immunization content from CDC updates. COMET, the associated architecture we utilized, offered more features than this use case needed. Besides aiding our immunization use case, it has proven viable in other institutional efforts, including cataloging our legacy CDS content prior to conversion to the vendor EHR and revealing discrepancies between local implementations of radiology CDS content and vendor-based recommendations. However, the platform’s implementation is still in its infancy.

Conclusion

We developed and adapted a knowledge management tool chain, called COMET, for facilitating the management of immunization content in local CIS implementations. We identified several inconsistencies in the data published by the CDC that required correction. We demonstrated the effectiveness of our tool in helping content curators visualize and implement immunization recommendation changes. Our next steps will be to model and import immunization concepts of our CIS to streamline our content curation workflows.

References

1. Community Preventive Services Task Force (CPSTF). Increasing Appropriate Vaccination: Immunization Information Systems.; 2014. https://www.thecommunityguide.org/sites/default/files/assets/VaccinationImmunization-Info-Systems.pdf. Accessed March 4, 2018.
2. CDC. IIS | Code Sets | HL7 Data | Vaccines | CDC. Code Sets. https://www.cdc.gov/vaccines/programs/iis/code-sets.html. Accessed March 2, 2018.
3. CDC. Centers for Disease Control and Prevention. Centers for Disease Control and Prevention. https://www.cdc.gov/index.htm. Published April 26, 2017. Accessed March 2, 2018.
4. CDC. IIS | Clinical Decision Support | CDSi | Vaccines | CDC. https://www.cdc.gov/vaccines/programs/iis/cdsi.html. Published February 16, 2018. Accessed March 2, 2018.
5. Cimino JJ. From data to knowledge through concept-oriented terminologies: experience with the Medical Entities Dictionary. J Am Med Inform Assoc JAMIA. 2000;7(3):288-297.
6. Welch BM, Eilbeck K, Fiol GD, Meyer LJ, Kawamoto K. Technical desiderata for the integration of genomic data with clinical decision support. J Biomed Inform. 2014;51:3-7. doi:10.1016/j.jbi.2014.05.014
7. Hoffman JM, Dunnenberger HM, Kevin Hicks J, et al. Developing knowledge resources to support precision medicine: principles from the Clinical Pharmacogenetics Implementation Consortium (CPIC). J Am Med Inform Assoc. 2016;23(4):796-801. doi:10.1093/jamia/ocw027
8. Greenes RA. Clinical Decision Support: The Road Ahead. Elsevier; 2011.
9. Zhou L, Karipineni N, Lewis J, et al. A study of diverse clinical decision support rule authoring environments and requirements for integration. BMC Med Inform Decis Mak. 2012;12:128. doi:10.1186/1472-6947-12-128
10. Dube K, Wu B. A generic approach to computer-based Clinical Practice Guideline management using the ECA Rule paradigm and active databases. Int J Technol Manag. 2009;47(1/2/3):75-95. doi:10.1504/IJTM.2009.024115
11. Leong TY, Kaiser K, Miksch S. Free and open source enabling technologies for patient-centric, guideline-based clinical decision support: a survey. Yearb Med Inform. 2007:74-86.
12. Mulyar N, van der Aalst WMP, Peleg M. A pattern-based analysis of clinical computer-interpretable guideline modeling languages. J Am Med Inform Assoc JAMIA. 2007;14(6):781-787. doi:10.1197/jamia.M2389
13. Karsai G, Sztipanovits J, Ledeczi A, Bapty T. Model-integrated development of embedded software. Proc IEEE. 2003;91:145-164.
14. Kelly S, Tolvanen J-P. Domain-Specific Modeling: Enabling Full Code Generation. Wiley-Interscience; 2008.
15. Mathe J, Martin J, Miller P, et al. A Model-Integrated, Guideline-Driven, Clinical Decision-Support System. IEEE Softw Spec Issue Domain-Specif Lang Model. 2009;26(4):54-61. doi:10.1109/MS.2009.84
16. Mathe JL, Sztipanovits J, Levy M, Jackson EK, Schulte W. Cancer Treatment Planning: Formal Methods to the Rescue. In: 4th International Workshop on Software Engineering in Health Care (SEHC 2012). Zurich, Switzerland; 2012. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6227014.
17. The Tennessean. VUMC starts “gigantic” switch to Epic records system. The Tennessean. https://www.tennessean.com/story/money/industries/health-care/2015/12/18/vumc-switch-epic-recordssystem/77579202/. Accessed March 3, 2018.
18. Stuart Weinberg. Immunization Content Management. Immunization Content Management | Stuart Weinberg | Vanderbilt University. https://my.vanderbilt.edu/stuartweinberg/imms-content-management/. Accessed March 1, 2018.
19. Karsai G, Ledeczi A, Neema S, Sztipanovits J. The Model-Integrated Computing Toolsuite: Metaprogrammable Tools for Embedded Control System Design. In: Computer Aided Control System Design, 2006 IEEE International Conference on Control Applications, 2006 IEEE International Symposium on Intelligent Control. IEEE; 2006:50-55. doi:10.1109/CACSD.2006.285443
20. Maróti M, Kecskes T, Kereskényi R, et al. Next generation (Meta)modeling: Web- and cloud-based collaborative tool infrastructure. CEUR Workshop Proc. 2014;1237:41-60.
21. Institute for Software Integrated Systems. WebGME. https://webgme.org/. Accessed March 5, 2018.
22. OrientDB by CallidusCloud. OrientDB | Graph Database | Multi-Model Database. OrientDB. https://orientdb.com/. Accessed March 5, 2018.
23. CDC. CDC | IIS | Code Sets | HL7 NDC | Vaccines [HTML]. https://www2a.cdc.gov/vaccines/iis/iisstandards/vaccines.asp?rpt=ndc. Accessed March 7, 2018.