Our Health Language and Data Collections

Evelyn J.S. HOVENGA a,1, RN, PhD (UNSW), FACHI, FACS, MAICD, and Heather GRAIN b, A.Dip MRA, RMRA, GD DP, MHI, FACHI
a CEO, Professor, Director and Trainer, eHealth Education Pty Ltd, Australia
b Director, eHealth Education Pty Ltd, and Director, Health-e-Words, Australia

Abstract. All communication within the health industry depends on our health language, which consists of a very extensive and complex vocabulary. Converting this language into computable formats is necessary in a digital environment that relies strongly on sharing data, information and knowledge. This chapter describes our health language, what terminologies and ontologies are, their use and their relationships with natural language, indexing, data standards and data collections, and the need for data governance.

Keywords. Terminology, Classification, Coding, Natural Language, Syntax, SNOMED-CT, Ontology, Abstracting and indexing, Data collection, Information storage & retrieval

Introduction

Our health language is complex and consists of a very extensive vocabulary. Once it is recorded as free text, for example in patient documentation, it is difficult to find and retrieve unless it has been structured or coded in some way. Even computerised natural language processing relies on identifying words and/or associated rules. This may be the method of choice for some purposes, but it is not well suited to retrieving key data for reporting or data sharing. Data standards are necessary for valid and meaningful data collections and for electronic data processing. Key data elements, which represent any variable of interest, need to be sharable and retrievable for analytical purposes. A number of methods have been established to structure data collections and to assist data retrieval. This chapter explains our health language, how language relates to terminologies and ontologies, and how it may be coded, classified and indexed to assist electronic data retrieval and analytics.

1. Our Health Language

The foundation for enabling useful communication in health is the development of health care languages that both humans and computers can use to communicate health information. Language structure and usage are complex, yet they form the basis for all information systems. Every language has rules that guide its usage; however, not everyone uses language in accordance with those rules! Furthermore, the same word may be assigned different meanings, and one meaning may be assigned to any number of different words. These variations may depend on context, location or culture.

Natural language processing (NLP) is the study of the mathematical and computational modelling of various aspects of language and the development of a wide range of systems [1]. It involves computer systems that analyse, attempt to understand or produce one or more human languages [2] in a computer-readable form. In natural language, meaning can be conveyed in many different ways. Words can be used ambiguously, with the meaning implicit in the context of the communication, such as body language, rather than explicit in the words and sentences. Natural language is used when we communicate with each other, but groups of people vary subtly in how they use language to convey meaning, depending on context such as situation, location and social background.

In natural language, words tend to have more meaning when they are part of a sentence. The sentence may describe the named thing or concept, or it may be designed to convey meaning through usage, that is, context. The additional words used are said to be qualifiers. Qualifiers make the object of discussion more specific, as opposed to very general. For example, in the sentence "He fell over the table", the object of the sentence, the thing being talked about, is a ‘table’. The table can be described more precisely when it is referred to as a ‘small wooden round table’.
Small, wooden and round all describe the table and are said to qualify the meaning of table. Thus the qualifiers refer to such things as size, type of material and shape. In addition, one could add location, condition, age, colour etc. By adding a qualifier such as the word ‘light’ to the word ‘red’, the combined words ‘light red’ have a new, more specific meaning than either word on its own.

1 Corresponding Author: Prof Evelyn Hovenga, eHealth Education P/L. Email: [email protected]

Data at the lowest level of detail, whose meaning cannot be broken down further, is called atomic data. Many data elements are created through aggregation of atomic data. For example, a pathology test result could be stored in a system as: ‘Lymph node biopsy result intraductal carcinoma’. This is useful information, but it becomes more useful when it is broken down into components that the computer can process individually. The example of atomic data below is simplistic and not necessarily clinically correct in every respect. It is provided only to illustrate the point.

Specimen: 13341245
Date: 15 March 2013
Collection method: biopsy
Anatomical site: left axillary lymph node
Specimen size:
  weight: 0.015 grams
  size: 0.5 cm x 0.5 cm x 0.2 cm
  shape: square
Test performed: histopathology
Result:
  pathology: intraductal carcinoma
  margins: clear

People vary, and clinical information is complex to represent. As a result, nurses, doctors and other health professionals tend to use subtle variations in their language to describe identical phenomena. Differences are discernible between departments within one hospital as well as between health care organisations. These complexities make it very difficult to computerise natural language, because logic and statistics cannot fully represent and simulate subtle linguistic and cultural differences. The best available alternative is the use of artificial or structured languages. The most commonly used structured languages in healthcare are called terminologies. Terminologies use structure to represent meaning through atomic relationships, which computers can manipulate and ‘understand’. For example, a computer can ‘understand’ that an axillary lymph node is in the armpit because the terminology relates that anatomical structure to the body region in which it is located.
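The pathology example above can be held as separately addressable fields instead of one free-text string. The following minimal Python sketch shows this. The `PathologyResult` record and its field names are hypothetical. They mirror the example only and do not represent any standard pathology message format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PathologyResult:
    """Hypothetical record: one field per atomic data element in the example."""
    specimen_id: str                           # an identifier, so stored as text
    collection_date: str                       # ISO 8601 date
    collection_method: str
    anatomical_site: str
    weight_g: float                            # numeric, so usable in calculations
    dimensions_cm: tuple[float, float, float]
    test_performed: str
    pathology: str
    margins: str

results = [
    PathologyResult("13341245", "2013-03-15", "biopsy", "left axillary lymph node",
                    0.015, (0.5, 0.5, 0.2), "histopathology",
                    "intraductal carcinoma", "clear"),
]

def clear_margin_biopsies(records: list[PathologyResult]) -> list[str]:
    """Each component can be queried on its own, e.g. biopsies with clear margins."""
    return [r.specimen_id for r in records
            if r.collection_method == "biopsy" and r.margins == "clear"]

def volume_cm3(r: PathologyResult) -> float:
    """Derived values can be computed from the numeric components."""
    length, width, depth = r.dimensions_cm
    return length * width * depth

print(clear_margin_biopsies(results))    # ['13341245']
print(round(volume_cm3(results[0]), 3))  # 0.05
```

Neither query would be reliable against the free-text version, which offers only words to pattern-match.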
Terminologies differ from natural languages in that the structure of their statements is represented in a consistent, computable manner.

1.1 Health and Medical Terminologies

Health and medical terminologies refer to the language used by health service providers and administrators. Terminologies may be described as the currency for communication between the many players in a healthcare system. Healthcare terminologies describe, organise and standardise the content of health information systems, whether manual or electronic. Their use is therefore essential to achieving health information system interoperability. Terminologies are closely related to the language used to develop and use knowledge. The absence of a uniform health language is therefore a major barrier to effective communication about practice, to the dissemination of research and to the further development of health knowledge bases.

The development of a terminology begins with the collection and definition of the words that belong to a language within a particular context. These lists of words are commonly referred to as lexicons. Words are selected for inclusion based on how well they represent the concepts of interest. A designation (term, appellation or symbol) represents a concept. Today, terminologies are generally computer- and human-readable and processable representations of concepts. They are precise and include a detailed specification of the meaning of each concept. Classifications are considered a type of terminology, but they are usually statistical and have catch-all concepts (such as Not Otherwise Classified, or Other). Many specialist classifications exist. Examples are the International Classification of Nursing Practice (ICNP®), owned and maintained by the International Council of Nurses (ICN) [3], and the International Classification of Diseases for Oncology (ICD-O-3) [4], a member of the WHO family of classifications.
Some terminologies and classifications are privately owned and governed, for example the Clinical Care Classification, Logical Observation Identifiers Names and Codes (LOINC) [5] and the Global Medical Device Nomenclature [6]. Such terminologies may:
• be highly specialised, disease or domain specific, or very general and comprehensive;
• be designed for a variety of specific purposes and structured in a variety of ways; and/or
• originate from, and thereby embody, different philosophies of or approaches to health care.

Available terminologies are listed as UMLS Knowledge Sources. A long list, complete with owner and governance process, is included in a 2007 action agenda for the United States [7]. Because so many different terminologies are owned and governed by various entities, governance is fragmented, licensing is proprietary, release cycles are uncoordinated and standards are lacking. Many see this as a major obstacle to making optimal use of electronic health records. It has left data in many silos of health information that cannot be linked or aggregated. As a consequence there is now an agreed need to map these terminological resources to one another. A coherent approach to health concept representation has the potential to underpin a range of health care benefits.

1.1.1 What is a Terminology?

A terminology is a human- and computer-readable and interpretable representation of concepts. Jim Cimino set out the characteristics of good terminologies in his Desiderata for Controlled Medical Vocabularies in the Twenty-First Century [8]. Good terminologies will:
• have concept permanence: if the meaning of a concept changes, it becomes a new concept;
• have non-semantic concept identifiers: the code itself has no meaning;
• be poly-hierarchical: many hierarchies convey meaning, not just one structure;
• have formal definitions of each element (using the hierarchical relationships to define the meaning of the concept);
• have multiple granularities: able to represent things in detail or at a high level;
• be able to be viewed from multiple consistent points of view;
• be able to represent context;
• evolve gracefully, i.e. all changes fall within a historical context so that old and new concepts can still be ‘understood’; and
• recognise redundancy, i.e. handle synonymy appropriately.

To meet the demands made of them, terminologies need a representation that can display text that makes sense in the user's environment, including in different languages. This requirement is often called the interface terminology. It is not a different terminology but a function of a good terminology. Health terminologies must support the use of information for different purposes by many kinds of user. These users work in many locations, across a wide range of specialties, and use a variety of natural languages [9].

A terminology also has a computer- and human-readable representation of its structure (hierarchy and definitions), its unique identifiers and the terms used. This is the real terminology, the actual product, and is sometimes called the reference terminology. It describes terms by specifying their structure, their relationships with other concepts and, if present, their systematic and formal definitions.
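Three of these desiderata concern how identifiers behave over time: non-semantic identifiers, concept permanence and graceful evolution. The Python sketch below illustrates them under simplified assumptions. The `ConceptRegistry` class, its methods and the numeric identifiers are hypothetical and do not reproduce any real terminology's release format.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

@dataclass
class Concept:
    concept_id: int                    # non-semantic: the number itself carries no meaning
    preferred_term: str
    active: bool = True
    replaced_by: Optional[int] = None  # history link kept when a concept is retired

class ConceptRegistry:
    """Hypothetical registry illustrating concept permanence and graceful evolution."""

    def __init__(self) -> None:
        self._next_id = count(100001)  # each identifier is issued once and never reused
        self.concepts: dict[int, Concept] = {}

    def add(self, preferred_term: str) -> int:
        cid = next(self._next_id)
        self.concepts[cid] = Concept(cid, preferred_term)
        return cid

    def change_meaning(self, old_id: int, new_term: str) -> int:
        """A change in meaning creates a new concept; the old one is retired, not deleted."""
        new_id = self.add(new_term)
        old = self.concepts[old_id]
        old.active = False
        old.replaced_by = new_id
        return new_id

    def resolve(self, cid: int) -> Concept:
        """Follow history links so data recorded with an old identifier can still be interpreted."""
        concept = self.concepts[cid]
        while not concept.active and concept.replaced_by is not None:
            concept = self.concepts[concept.replaced_by]
        return concept

registry = ConceptRegistry()
old_id = registry.add("disorder X (original definition)")
new_id = registry.change_meaning(old_id, "disorder X (revised definition)")
print(old_id, new_id)                           # 100001 100002
print(registry.resolve(old_id).preferred_term)  # disorder X (revised definition)
print(registry.concepts[old_id].active)         # False
```

Records captured with identifier 100001 remain valid. They are interpreted through the history link rather than being silently reassigned to a new meaning.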
A third type of terminology is one in which terms are categorised. This is known as an aggregate terminology or classification system. Often the simple terms in a classification can be combined into complex terms that convey even more specific meaning. When strict rules govern the way in which terms may be combined to give meaningful complex terms, the classification might be considered a nomenclature. Systematised nomenclatures that have been accepted and adopted throughout a discipline exist in many fields.

1.1.2 Interface Terminologies

Interface terminologies provide the means for entering health data into an information system. For example, a drop-down list of terms may help the person entering the data to find the most suitable term. Interface terminologies most closely resemble our natural language. Consequently such terms tend to change over time and allow considerable flexibility. However, they generally require correct spelling, which does not always fit the pressures of clinical workloads. The terminology types vary in the level of detail they include, which may be referred to as the degree of granularity. Terminologies can be very extensive, which makes them impractical to use in common data collection mechanisms. In some cases specialised software-based approaches are therefore needed rather than drop-down lists. Imagine a drop-down list of 5000 entries! All terminologies provide some structure to assist with data entry and data retrieval, such as parent/child relationships that can open up as you drop down a list. The interface terminology, however, is the most user friendly, because it represents the language most commonly used and is the most expressive.

1.1.3 Reference Terminologies

Reference terminologies are hierarchical and multipurpose. SNOMED CT is an example. It is the most extensive and widely used clinical terminology and contains more than 900,000 terms. These are structured poly-hierarchically, establishing rich semantic internal links among them. For example, different interface terminology terms such as ‘acute myocardial infarction’ and ‘AMI’ have the same clinical meaning. They can all be represented in an information system using one SNOMED CT code: 57054005: acute myocardial infarction. This concept is defined in the figure below:

This indicates that an acute myocardial infarction is an acute heart disease and is a myocardial infarction. Its clinical course is ‘sudden onset and/or short duration’, its associated morphology is ‘acute infarct’ and its finding site is ‘myocardium structure’. No other condition has all of these attributes, so the condition is completely and uniquely (fully) defined. This enables these data to be retrieved using any of these components as a search term. When required for statistical reporting, the same code may also be mapped to the corresponding AMI category of an aggregate terminology or classification system. Note that the interface terminology terms are the most granular and the classification system term is the least granular.

Interface and reference terminologies, and their relationships to classifications, will have considerable bearing on how, when and at what cost electronic health records can be implemented. Mapping and modelling between terminologies and classifications is a critical and necessary first step in constructing the foundations for meaningful and accurate communication of medical information across care sectors and between carers [10]. The use of terminologies enables us to build new knowledge from which decision support systems can be developed, but only if all systems use the same standard terminologies.

Most software applications that use SNOMED CT create a ‘subset’ of terms to suit the application’s purpose, rather than offering all SNOMED CT terms for a data field. As a consequence there may be multiple ‘subsets’ of terms, even for the same type of software application, unless the stakeholders involved agree to adopt a standard ‘subset’. Even with an agreed standard ‘subset’, the system relies on each user to select the appropriate term/code for the concept that needs to be documented.
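Two mechanisms underpin this kind of retrieval. Interface terms (synonyms) resolve to one concept, and subsumption queries follow ‘is a’ links through a poly-hierarchy. The minimal Python sketch below uses a tiny hypothetical fragment of a hierarchy. Apart from 57054005, quoted above, the concept labels are placeholders rather than real SNOMED CT codes or content.

```python
from collections import deque
from typing import Optional

# Interface terms (synonyms) resolve to one concept identifier.
SYNONYMS = {
    "acute myocardial infarction": "57054005",
    "ami": "57054005",
}

# 'is a' links; a concept may have several parents (poly-hierarchy).
# Apart from 57054005, the labels are illustrative placeholders, not real codes.
IS_A = {
    "57054005": ["acute heart disease", "myocardial infarction"],
    "acute heart disease": ["heart disease"],
    "myocardial infarction": ["heart disease"],
    "heart disease": [],
}

def lookup(term: str) -> Optional[str]:
    """Map whatever the user typed to a concept identifier, ignoring case."""
    return SYNONYMS.get(term.strip().lower())

def ancestors(concept: str) -> set[str]:
    """Every concept reachable by following 'is a' links upwards (breadth-first)."""
    seen: set[str] = set()
    queue = deque(IS_A.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen

def subsumed_by(concept: str, ancestor: str) -> bool:
    return concept == ancestor or ancestor in ancestors(concept)

# Hypothetical records coded at data entry through the interface terms.
records = {"patient-1": lookup("AMI"), "patient-2": "acute heart disease"}

# Subsumption query: which records describe any kind of myocardial infarction?
hits = [pid for pid, code in records.items() if subsumed_by(code, "myocardial infarction")]
print(hits)  # ['patient-1']
```

Querying for ‘heart disease’ instead would return both records, because both codes sit beneath that concept.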
Another widely used terminology is the International Classification of Nursing Practice (ICNP®), a unified nursing language system [3]. This compositional terminology facilitates the development of, and cross-mapping among, local terms and existing terminologies, including SNOMED CT. It is viewed as an integral part of the global information infrastructure informing health care practice and policy to improve patient care worldwide. Strategically, the nursing profession may use it to articulate nursing’s contribution to health and health care. It describes nursing phenomena (often described as nursing problems or nursing diagnoses), nursing actions and outcomes.

Healthcare concepts need to convey meaning. In a healthcare terminology this meaning is captured and conveyed through the use of a well-defined term (label, symbol or name). These terms are known as data elements, as described above. Well-organised governance systems must be in place to manage how the data definitions and nomenclatures in use evolve. This supports both the quality of local healthcare data held in computerised databases and any harmonisation efforts.

1.2 The use case for reference terminologies in healthcare

Clinical decision support systems require precise specification of the clinical conditions, drugs, procedures or other clinical concepts that should trigger action or thought. An example is a rule stating that a clinician giving drug X to a patient with disease B may cause problems such as … Each component needs to be represented using a clinical terminology: the trigger (ordering a drug), the drug and the disease. Terminology hierarchies of relationships allow the system to determine that drug X is a type of drug Y. The rules in the computer system then indicate that drug Y, when given to patients with disease B, may cause problems such as …

Knowledge analysis requires a clinical record that is specific and accurate, so that the information in the record can be mined for trends, both good and bad. This requires precise and consistent representation of the information in the record in a manner that computers can identify and query. Identifying that a particular test was performed, and documenting the result of that test, requires atomic data at a level that the computer is able to query and retrieve to support clinical decision making. Healthcare today seeks to use knowledge to improve practice through clinical decision support systems, and seeks to increase knowledge through analysis and the study of health and health practice. There is also a desire to improve the management of healthcare through identification of tests performed and drugs ordered, and through the accurate and timely availability of information about the individual person. None of these activities can be safely or effectively achieved without a well-governed, clinically correct, computer-processable clinical terminology.

1.3 What is an Ontology?

From a linguistic perspective, the word ontology means ‘the study of being’ (from the Greek onto-, ‘being’, and -logia, ‘discourse’, from logos, ‘word’). Ontologies specify concepts and their relationships within a formal structure for any knowledge domain. A concept system is a set of concepts structured according to the relations among them. A terminological system is a concept system with designations for each concept [11]. Every terminology tends to reflect a knowledge domain, such as nursing diagnoses, medical interventions/procedures, or universal identifiers for laboratory and other clinical observations.
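A concept system of this kind allows the decision support rule described in section 1.2 to fire for drug X, even though the rule mentions only drug Y. In the Python sketch below, drug X, drug Y and disease B are the same placeholders used in that example. The dictionary-based concept system and the rule format are assumptions for illustration, not the API of any real decision support product.

```python
# Concept system: each concept with its 'is a' parents (hypothetical placeholders).
IS_A = {
    "drug X": {"drug Y"},
    "drug Y": {"drug"},
    "disease B": {"disease"},
}

def is_a(concept: str, ancestor: str) -> bool:
    """True if concept is the ancestor itself or reaches it through 'is a' links."""
    if concept == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in IS_A.get(concept, set()))

# Rules are written once against general concepts, not against every member drug.
RULES = [
    ("drug Y", "disease B", "drug Y may cause problems in patients with disease B"),
]

def check_order(drug: str, conditions: list[str]) -> list[str]:
    """Return the alerts triggered by ordering `drug` for a patient with `conditions`."""
    return [alert for rule_drug, rule_condition, alert in RULES
            if is_a(drug, rule_drug) and any(is_a(c, rule_condition) for c in conditions)]

print(check_order("drug X", ["disease B"]))  # ['drug Y may cause problems in patients with disease B']
print(check_order("drug X", []))             # []
```

Adding a new member of the drug Y class requires only a new ‘is a’ link, not a new rule.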
Every terminology was developed for a specific purpose. Some terminologies, such as SNOMED CT, represent meaning and may be used for many purposes. Other ontologies, particularly healthcare classifications, are developed for specific purposes. These include comparing data about a topic of interest between populations, creating new knowledge through statistical analysis or data mining, or distributing resources according to predetermined criteria such as the number of patients treated by disease or procedure. Every terminology consists of terms used to define a set of knowledge concepts within a domain, and its structure may reflect existing relationships between those concepts. Every terminology may therefore be described as an ontology to some extent. Formal or true ontologies are highly structured. Every concept has at least one meaning (non-vagueness), no concept has more than one meaning (non-ambiguity), and no meaning corresponds to more than one term/code (non-redundancy). Such structures are about classifying and organising concepts. Casual or informal ontologies fall short of these ideal requirements to varying degrees. Consequently, individual terminologies vary in their ontological status. This affects how reliably they can be used to apply the computational logic necessary for decision support. SNOMED CT comes fairly close to being a true ontology, but it has shortcomings, as not all clinical concepts can be described using this terminology. Mainstream ICT education has not focused much on ontology-based advanced software engineering. Yet complex object-oriented system design benefits greatly from ontologies, as these are best able to provide formal specifications of biomedical and other health-related knowledge.
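When each code is linked to the meaning(s) it is intended to carry, these ideal properties can be checked mechanically. The Python sketch below assumes such a mapping exists; the codes `C1` to `C4` and their meanings are hypothetical. It reads the properties as follows:
• every code has at least one meaning (non-vagueness);
• no code has more than one meaning (non-ambiguity); and
• no meaning is carried by more than one code (non-redundancy).

```python
from collections import defaultdict

def ontology_issues(meanings_by_code: dict[str, set[str]]) -> dict[str, list[str]]:
    """Report codes or meanings that break each of the three ideal properties."""
    issues: dict[str, list[str]] = {"vague": [], "ambiguous": [], "redundant": []}
    codes_by_meaning: dict[str, list[str]] = defaultdict(list)
    for code, meanings in sorted(meanings_by_code.items()):
        if not meanings:
            issues["vague"].append(code)        # code with no defined meaning
        elif len(meanings) > 1:
            issues["ambiguous"].append(code)    # one code, several meanings
        for meaning in meanings:
            codes_by_meaning[meaning].append(code)
    for meaning, codes in sorted(codes_by_meaning.items()):
        if len(codes) > 1:
            issues["redundant"].append(meaning)  # one meaning, several codes
    return issues

# Hypothetical fragment with one problem of each kind.
example = {
    "C1": {"inflammation of the appendix"},
    "C2": {"inflammation of the appendix"},        # duplicates C1's meaning
    "C3": set(),                                   # no meaning defined
    "C4": {"cold (temperature)", "common cold"},   # one code, two meanings
}
print(ontology_issues(example))
# {'vague': ['C3'], 'ambiguous': ['C4'], 'redundant': ['inflammation of the appendix']}
```

The hard part in practice is building the code-to-meaning mapping, which is what formal definitions in a terminology aim to supply.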
Health information systems, especially those that manage clinical applications such as electronic health records, need to be able to accurately process both simple and highly complex professional knowledge. Such knowledge is best specified through ontology development, which enables reliable machine processing of that knowledge. In computer science this is referred to as knowledge representation and engineering.

1.4 Indexing

Cataloguing and indexing of medical terminology began in the United States of America, where medical terms were classified into subject headings. The US National Library of Medicine (NLM) has its origins in the library of the US Army Surgeon General's Office. Dr John Shaw Billings took charge of that library in 1865 and compiled a catalogue and an index to assist retrieval in 1873. The first volume of Index Medicus was published in 1879 [12]. Billings thus responded to the problem of data complexity by developing a controlled vocabulary for indexing stored literature citations, a forerunner of today's Medical Subject Headings (MeSH) [13]. This thesaurus facilitates searching. It is maintained by the NLM and is used to index journal articles and books in the life sciences for the MEDLINE® and PubMed® article databases [14].

Data Thesaurus

A thesaurus is a place for finding words to suit a concept, idea, feeling or object that one wishes to describe or define. It is like a dictionary in reverse: it represents a different organisation of the same data set. A thesaurus arranges a language on the basis of broad areas of meaning. It is organised alphabetically using a system of keywords, and under each keyword a number of associated words are provided for the reader to consider. There are three levels of thesauri:
1. Universal, like the Library of Congress List of Subject Headings, the Library of Congress Classification scheme, or the Dewey Decimal Classification scheme;
2. Broad areas, like the Medical Subject Headings (MeSH) of the U.S. National Library of Medicine, the Thesaurus of Engineering and Scientific Terms (TEST) originally from the Engineers Joint Council, or the Art and Architecture Thesaurus (AAT) supported by the Getty Trust; and
3. Specific areas, like the Transportation Research Thesaurus (TRT) administered by the Transportation Research Board of the National Research Council, or the ERIC Thesaurus on education.

The best example of an electronic coded clinical thesaurus of terms is the one developed in the UK through a number of terms projects completed in 1994 and 1995. Three projects were set up to create a common agreed thesaurus. Together they covered the language used by the medical profession, the professions allied to medicine, and the nursing, midwifery and health visiting professions [15]. All terms used were cross-referenced to national and international statistical classification systems. This thesaurus has since been incorporated into the SNOMED CT terminology. SNOMED CT contains well over 450,000 healthcare concepts with unique meanings, specified and preferred terms, and formal logic-based definitions organised into hierarchies. Both concepts and terms, of which there are more than a million, have IDs that can never be reused, although they can be made obsolete. Terms and IDs are always quoted together. This is the most comprehensive multilingual clinical healthcare terminology in the world [16] and has been adopted as the national health language by a number of countries.
This terminology is owned and governed by the International Health Terminology Standards Development Organisation (IHTSDO).

1.4.1 A Unified Medical Language System

In the early 1990s the US National Library of Medicine undertook the development of a Unified Medical Language System (UMLS) [17]. The UMLS is a meta-thesaurus that amalgamates the terms used by many different controlled vocabularies. It integrates and distributes key terminology, classification and coding standards, and associated resources. The aim is to promote the creation of more effective and interoperable biomedical information systems and services, including electronic health records. The UMLS consists of a number of knowledge sources, such as a meta-thesaurus, a semantic network, and a specialist lexicon and lexical tools.

1.4.2 US National Cancer Institute (NCI) Thesaurus

Cancer and biomedical research, clinical and public health information needs to be accurately coded, analysed and shared. The NCI met this need as early as 1997 by developing an Enterprise Vocabulary Service (EVS), which develops, licenses and publishes the terminology used for these activities. The NCI Thesaurus (NCIt) is a widely recognised reference terminology and core biomedical ontology. It covers vocabulary for clinical care, including all cancers and related diseases, plus anatomy, translational and basic research, public information and administrative activities. It contains over 200,000 cross-links between concepts that provide formal logic-based definitions for many concepts, and links to the NCI Meta-thesaurus and other information sources. This meta-thesaurus provides a broad, concept-based mapping of terms from over 70 biomedical terminologies. EVS maintains hundreds of NCIt subsets and other code lists [18].

1.5 Coding and Classification Schema

Data that is to be used to analyse and review information is often classified. This provides a method of representing all possible conditions or situations in a controlled number of codes. Classifications always have the ability to indicate ‘not otherwise specified’ or ‘other’, so that the whole scope of the domain being represented can be indicated. This organisation of data implies that data elements need to be classified so that they can be assigned to different groups. Codes are unique identifiers for data elements contained within any classification system. As such they tend to reflect the organisation and classification of those data elements. Codes may either reflect the position of the associated data element within a hierarchy or have no such association. The use of codes enables the digitisation of health information for the purpose of automating data and information processing. It is important to recognise that classified data (data that has been aggregated) does not attempt to convey precise clinical meaning.
Precise clinical meaning is modified when data is classified due to:
• aggregation of concepts: grouping similar things together; and
• application of rules that say which code to use for a given purpose.
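Both effects, aggregation and rule-based assignment, can be sketched in a few lines of Python. The three-category classification, its codes `R1`, `R2` and `R9`, and the mapping table below are hypothetical. They are invented for illustration rather than taken from ICD. Detailed concepts are grouped into categories. Anything without a specific rule falls into a residual ‘other’ category, so the whole domain is covered.

```python
from collections import Counter

# Hypothetical classification: category code -> title.
CATEGORIES = {
    "R1": "Pneumonia",
    "R2": "Asthma",
    "R9": "Other respiratory disease",  # residual ('other') category
}

# Grouping rules: several detailed concepts are aggregated into one category.
CLASSIFY = {
    "bacterial pneumonia": "R1",
    "viral pneumonia": "R1",
    "aspiration pneumonia": "R1",
    "allergic asthma": "R2",
}

def classify(concept: str) -> str:
    """Concepts without a specific rule go to the residual category, so every case is counted."""
    return CLASSIFY.get(concept, "R9")

diagnoses = ["viral pneumonia", "bacterial pneumonia", "allergic asthma", "pleural effusion"]
counts = Counter(classify(d) for d in diagnoses)
for code in sorted(counts):
    print(code, CATEGORIES[code], counts[code])
# R1 Pneumonia 2
# R2 Asthma 1
# R9 Other respiratory disease 1
```

Once classified, the viral and bacterial pneumonia records are indistinguishable, because that distinction has been aggregated away. This is acceptable for statistical reporting, but it shows why classified data cannot stand in for the clinical record.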

This is not a weakness of classifications but part of their strength for their intended use. Classifications should not misrepresent the original meaning, but might not represent meaning exactly as originally intended in the health record. When classifying any phenomena for any purpose, we must first identify all the elements contained within the universal set of these phenomena. Once the elements have been identified, one can begin to look for the sort of classificatory pattern required for logical classification, that is, to classify these phenomena to suit the purpose of classification. To classify phenomena rationally, we need a rationale for identifying the elements that comprise them. That rationale should be based on known relationships between elements, as distinct from an intuitive technique. The conceptual frameworks of particular philosophies influence how elements are identified for classification purposes. Hence there is a variety of classification systems even for identical phenomena. Elements making up the universal set of phenomena may be many and varied. For example, the International Classification of Diseases (ICD), commonly used in healthcare, uses a framework of body systems. It adds components for infectious and parasitic diseases, cancers, injuries and supplementary classifications. Data elements need to be assigned codes that comply with the relevant data types. Codes are stored more efficiently in a computer system than the terms themselves, and are less likely to be misinterpreted. To code data, or to use coded data, we need to be absolutely sure of the meaning of individual codes, and the rules related to them, as they are now or were at the time of collection, as these may change over time. Such changes need to be well monitored, as computers are very poor at recognising concepts from inconsistent descriptions.
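The meaning of a code may change between releases, so the code on its own is not enough to recover its meaning later. The minimal Python sketch below assumes a hypothetical code system `EXAMPLE-CS` with two releases. It keeps each code together with the code system and version it was taken from. It also stores the code as text, because a code is an identifier rather than a quantity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodedValue:
    code: str     # text, not int: identifiers are never used in arithmetic
    system: str   # which code system the code came from
    version: str  # which release was in force when the data were collected

# Hypothetical releases in which code "0412" changed meaning.
RELEASES = {
    ("EXAMPLE-CS", "2012"): {"0412": "condition A"},
    ("EXAMPLE-CS", "2013"): {"0412": "condition A, acute", "0413": "condition A, chronic"},
}

def meaning(value: CodedValue) -> str:
    """Interpret a code against the release it was recorded with."""
    return RELEASES[(value.system, value.version)][value.code]

old = CodedValue("0412", "EXAMPLE-CS", "2012")
new = CodedValue("0412", "EXAMPLE-CS", "2013")
print(meaning(old))  # condition A
print(meaning(new))  # condition A, acute

# Storing the same code as an integer would silently drop the leading zero.
print(int("0412"))   # 412
```

Bundling code, code system and version into one value follows the same idea as the ‘Problem/Diagnosis’ data type quoted later in this section.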
Individual systems may have rules to help coders assign the correct code to any given preferred term or concept in the system, improving the accuracy of coding. Coding accuracy of all source data is crucial, as these codes may be used for further classification purposes such as Diagnosis Related Groups (DRGs). Clinical coding occurs through two different mechanisms. Either clinicians code the data directly as the record of care is created, or suitably qualified clinical coders abstract data from the record. The abstraction approach is usually taken where the coding process and rules are complex, such as the coding of inpatient morbidity data. In this case the information in a medical record is reviewed in order to assign disease and procedure codes describing an episode of care. This latter coding activity is undertaken for reimbursement purposes. This seemingly simple process is in reality very complex, and it takes up to two years to train an entry-level clinical coder. Software has been developed to assist in the coding process for some classifications, but a suitably qualified clinical coder is still required to audit the resultant codes. The most widely used classification system is the International Classification of Diseases (ICD); there are many others. Classification schemas and their associated coding rules determine how much detail (level of granularity) about the extracted information can be retained in the codes themselves. For example, when atomic terms are post-coordinated, coding uses as many codes as are needed to describe the data. In the pre-coordinated approach, by contrast, every concept (term), such as every type of pneumonia, is assigned its own code. Pre-coordination limits a system's use for multiple purposes. It also explains the proliferation of terminology systems, as each is developed to suit a specific purpose.
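The two approaches differ in how many codes must exist and in how a query is written. In the hypothetical Python sketch below, the site and agent lists, the record layout and the `matches` helper are invented for illustration. They are not drawn from any real terminology. Pre-coordination needs one code for every combination. Post-coordination records a focus concept plus qualifiers, so new combinations need no new codes.

```python
from itertools import product

SITES = ["left upper lobe", "left lower lobe", "right upper lobe",
         "right middle lobe", "right lower lobe"]
AGENTS = ["bacteria", "virus", "fungus"]

# Pre-coordination: one code per combination, including 'unspecified' (None) site or agent.
precoordinated = [("pneumonia", site, agent)
                  for site, agent in product([None] + SITES, [None] + AGENTS)]

# Post-coordination: one code per building block; combinations are assembled when recording.
postcoordinated_codes = ["pneumonia"] + SITES + AGENTS

print(len(precoordinated), len(postcoordinated_codes))  # 24 9

# A post-coordinated record: a focus concept refined by qualifiers (hypothetical attribute names).
record = {"focus": "pneumonia", "qualifiers": {"site": "left lower lobe", "agent": "bacteria"}}

def matches(expr: dict, focus: str, **required: str) -> bool:
    """True if the expression has this focus and every required qualifier value."""
    return expr["focus"] == focus and all(
        expr["qualifiers"].get(attr) == value for attr, value in required.items())

print(matches(record, "pneumonia"))                          # True
print(matches(record, "pneumonia", site="left lower lobe"))  # True
print(matches(record, "pneumonia", agent="virus"))           # False
```

Adding one more site value would add four pre-coordinated codes but only one post-coordinated building block. With each additional qualifier axis, the pre-coordinated count multiplies rather than adds.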
At the programming level the data type chosen for each code specifies to the computer how to store and use the data. Technical data types are related to the system's architecture or reference model (RM). Numeric types such as integer and real are used to store numbers that need to be used in calculations. Alphanumeric data types such as character and string are used to store data (numbers, letters and other symbols) that are not needed for calculation. The data type determines both the amount of space required to store the data and how the data can be used (machine processed) later. For instance [19]: “the code that is recorded for a ‘Problem/Diagnosis’ may be an element of the data that is included in a clinical document, such as a Discharge Summary, to be transferred from one system to another. It is in the interests of clinical safety that the receiving system and user of that element of data makes the exact same interpretation of it’s meaning as applied in the source environment. Simply presenting the code only may leave such interpretation open/prone to error. To mitigate this risk, a standard ‘specification’ can be defined whereby the code value is complemented by other essential information such as: the ID of code system (or reference Set) from which the code was retrieved for use as well as the version of the system (or Reference Set) used. Collectively, the value and the code system Id and the system version define the ‘data type’ (or properties) of the ‘Problem/Diagnosis’ data element”. The continuing use of systems based on a variety of different architectures, many of them proprietary, is a major impediment to achieving the benefits of consistently collecting and securely exchanging health information electronically (semantic interoperability) between systems within any jurisdiction. Such systems have adopted their own sets of data types; some of these may be harmonized with other sets of data types and others cannot be. Many years have been invested in such harmonization as part of the development of the ISO 21090 specification. There is an urgent need for an agreed, standard set of technical data types to suit electronic health information exchange, as this is relevant to the construction and implementation of software solutions.

A lexicon or terminology in which the terms have been grouped according to a hierarchical structure is a classification scheme. Some common classifications that would be familiar to most people include a classification of species, or a classification of diseases such as the International Classification of Diseases, which is now in its tenth revision (ICD-10), with the eleventh revision due by 2015 [20]. This well-known and widely used classification system was developed as a statistical classification of diseases and is endorsed and maintained by the World Health Organisation (WHO). It consists of a tabular list of codes and code titles along with a highly structured index, and is used to report deaths (mortality) and diseases (morbidity) globally. Australia has developed a modification for use in acute care, consisting of extra characters on some ICD codes plus the Australian Classification of Health Interventions (ACHI), which is based on its national Medicare Benefits Schedule (MBS). This modification is known as ICD-10-AM and forms the foundation for Australia's Activity Based Funding (ABF) system described in chapter 16 on Casemix. Often the simple terms in a classification can be combined into complex terms that convey even more specific meaning. Automation requires the building of a lexical database, which could also be defined as an electronic data dictionary and thesaurus.
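The ‘data type’ described in the NEHTA quotation above, a code value accompanied by the identifier and version of the code system it came from, might be sketched as follows. This is a simplification under stated assumptions: the class and field names are hypothetical and are not the normative ISO 21090 attribute names, and any `urn:example:` code system identifiers used with it are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedValue:
    """A code together with the context needed to interpret it.

    Loosely modelled on the properties listed in the NEHTA profile of ISO 21090
    (code value, code system ID, code system version); the field names are
    illustrative only.
    """
    code: str
    code_system_id: str
    code_system_version: str
    display_name: str = ""

    def __post_init__(self) -> None:
        # A bare code is not safe to exchange: refuse values missing their context.
        for field_name in ("code", "code_system_id", "code_system_version"):
            if not getattr(self, field_name):
                raise ValueError(f"{field_name} must not be empty")

    def same_meaning_as(self, other: "CodedValue") -> bool:
        """Meaning depends on code + system + version; display text is for humans."""
        return (self.code, self.code_system_id, self.code_system_version) == (
            other.code, other.code_system_id, other.code_system_version)
```

Under this sketch, a receiving system comparing two values with the same code but a different code system identifier or version would not treat them as equivalent, which is the clinical safety concern the quotation raises.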
A data dictionary is a document that stores not only the names of data elements and records but also technical details such as how the computer will store the data, any relationships that the data have with other data (including synonyms), and programmer comments that describe why the data have been defined in the way that they have and how they can or should be used.

1.6 SKMT – Standards Knowledge Management (Glossary) Tool²

As a result of the vision of the widespread use of information technologies in the health industry we have witnessed many health informatics standards development initiatives. By 2008 many standards were available from various standards development organisations (SDOs), but finding the most suitable set of standards was difficult. It is customary to define the concepts used in each standard and to reuse those definitions as appropriate in new standards being developed. However, another identified problem was competing definitions across standards. To solve these issues it was decided to establish a web-based document register and glossary of terms with a search capability. A number of SDOs have supported this initiative, contributed their glossaries and listed the documents these came from. This web-based tool may be used by anyone free of charge. It supports [21]:
• Document and standards product identification, links to those documents, search metadata to assist in finding relevant documents, and feedback facilities to collect modifications for future revision of documents.
• Term identification, their definitions, source, the document/s in which the terms appear and, where appropriate, the rationale for variations to definitions for terms used in different contexts.
• Reporting and data extraction processes that will support the identification of duplications and terminological contradictions requiring resolution and assist in ongoing document production and management.
• Inclusion of all terms defined or referenced in standards produced by those organisations that are members of the Joint Initiative Council, where SDOs for health informatics seek to harmonise and coordinate standards development activities.
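The reporting capability listed above, identifying duplications and terminological contradictions, can be illustrated with a small sketch. It assumes glossary entries are simple (term, definition, source) records and treats two definitions as the same after case-folding and whitespace normalisation; the class, functions, entries and source names are hypothetical and are not taken from the SKMT itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    term: str
    definition: str
    source: str  # the standard or document the definition was taken from


def normalise(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences are not reported."""
    return " ".join(text.casefold().split())


def group_by_term(entries: list[GlossaryEntry]) -> dict:
    """Map each normalised term to {normalised definition: [sources]}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        grouped[normalise(entry.term)][normalise(entry.definition)].append(entry.source)
    return grouped


def find_conflicts(entries: list[GlossaryEntry]) -> dict[str, dict[str, list[str]]]:
    """Terms with competing definitions across sources (contradictions to resolve)."""
    return {
        term: {definition: sorted(sources) for definition, sources in defs.items()}
        for term, defs in group_by_term(entries).items()
        if len(defs) > 1
    }


def find_duplications(entries: list[GlossaryEntry]) -> dict[str, list[str]]:
    """Terms defined identically by more than one source (candidates for reuse)."""
    result = {}
    for term, defs in group_by_term(entries).items():
        if len(defs) == 1:
            (sources,) = defs.values()
            if len(sources) > 1:
                result[term] = sorted(sources)
    return result
```

Separating conflicts from duplications reflects the two different follow-up actions: conflicting definitions need resolution, while identical ones can simply be reused.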

2. Data Collection, Registries and Minimum Data Sets

A minimum data set (MDS) is the name given to a core set of data identified by users and stakeholders as the minimum number of data elements to be collected to suit a specific purpose. An example of a National Minimum Data Set (NMDS) is the Admitted Patient Care NMDS, which specifies what information needs to be collected nationally about episodes of care for admitted patients in all public and private hospitals in Australia. The Australian Institute of Health and Welfare (AIHW) now has many national minimum data sets included in its Metadata Online Registry (METeOR). Minimum data sets usually refer to national health data collections, but may refer to data represented in an Australian Standards publication. The first national uniform hospital discharge data set (UHDDS) was developed in the United States of America in 1969. Harriet Werley [22] was among the first nurses, during the 1970s, to identify the need for a Nursing Minimum Data Set. A minimum data set does not necessarily describe the individual data elements to be collected. For example, the Nursing MDS includes the data element ‘Nursing Intervention’, but unless the data elements that are collectively known as nursing interventions are defined and readily collectable from an information system, such data simply cannot be routinely collected.

² http://www.skmtglossary.org

In the case of NMDS reporting there is normally a two-stage process: first, data are reported by service providers to jurisdictions, and second, by jurisdictions to a national collection agency. These data may in turn be submitted to an international agency such as the World Health Organisation. In such instances the original data collected tend to reduce in the number of data items, and in some cases in specificity, as they move up through the jurisdictional hierarchy. Jurisdictions are provided with a national specification, and agree to collect and transmit the data according to that specification. Some collections can be reported directly from service providers to national collection agencies. The preservation of information integrity is dependent upon data governance. When looking at some of the items listed for inclusion in an NMDS it should become clear that most items actually represent a unique data set. One example is medical diagnosis. The data elements that make up this data set are probably the oldest, as the collection and classification of diseases began a few centuries ago. These data are still collected today and undergo continual improvement and extension. Many countries collect similar data for their own NMDS. Traditionally the focus has been on administrative minimum data sets. These essentially constitute the data collected nationally for statistical purposes. With the introduction of electronic health records and clinical information systems we are witnessing a change. In some respects this requires a paradigm shift in thinking. We now need to consider first and foremost the clinical data terminology requirements to support decision making at the point of care.
Then we need to ensure that the traditional administrative data required to contribute to our national statistical data collections can be extracted from these clinical data. We also have national clinical health registers, some of which require mandatory reporting of clinical conditions, for example cancer and some infectious diseases. These registries are used for national public health management as well as for statistical and research purposes. Some data sets contain data elements similar to those in others. These are the result of health professionals identifying what data sets are required to reflect evidence of their practice. Minimum data set definitions for treatment practices enable the reporting of key indicators to support clinical outcome research and the evaluation of health services. In these instances there may be a need or desire to map one clinical data set into another to permit comparisons to be made. Ideally there should only be one standard data element representation for each unique clinical concept, as variations make it difficult for information systems to accurately identify and reliably make use of such data in decision support systems. This is one reason for the development of standard Clinical Knowledge Objects, where every data element used to make up such an object is bound to a standard term from a recognized terminology such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). These knowledge objects, or computable clinical content definitions (also referred to as domain clinical models (DCMs) or Archetypes), may be found in a national or international repository such as the Clinical Knowledge Manager (CKM) [23]. See Chapter 13, Clinical Knowledge Governance.
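The two-stage NMDS reporting process described in this section, in which data items reduce in number and sometimes in specificity as they move from service provider to jurisdiction to national agency, can be sketched as follows. The item names in both specifications and the rule that truncates a diagnosis code to its three-character category are hypothetical, chosen only to show the effect; real national specifications define their own items and rules.

```python
# Hypothetical specifications: item names and the truncation rule are illustrative.
JURISDICTION_SPEC = {"episode_id", "age", "sex", "postcode",
                     "principal_diagnosis", "procedures", "length_of_stay"}
NATIONAL_SPEC = {"episode_id", "age", "sex", "principal_diagnosis", "length_of_stay"}


def extract(record: dict, spec: set) -> dict:
    """Keep only the items a specification asks for; fail if any are missing."""
    missing = spec - set(record)
    if missing:
        raise ValueError(f"missing required items: {sorted(missing)}")
    return {item: record[item] for item in sorted(spec)}


def provider_to_jurisdiction(record: dict) -> dict:
    """Stage 1: the service provider submits the jurisdiction's data set."""
    return extract(record, JURISDICTION_SPEC)


def jurisdiction_to_national(record: dict) -> dict:
    """Stage 2: fewer items and, under the assumed rule, less specific codes."""
    national = extract(record, NATIONAL_SPEC)
    # Assumed rule: report only the 3-character ICD category, e.g. "J18.9" -> "J18".
    national["principal_diagnosis"] = national["principal_diagnosis"][:3]
    return national
```

Each stage returns a new record, so the more detailed jurisdiction-level data remain available even though the national submission is reduced; this is one small reason the preservation of information integrity depends on governing each stage.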

3. Public Health and Regulatory Use of Data

Many of the data defined in national Minimum Data Sets are required to be collected, often by legislation or formal agreement. Common data collections around the world relate to inpatient disease and treatment, cancer, infectious diseases and perinatal morbidity, though there are hundreds of others. Agreements require that data be submitted on specific timelines and often relate to fiscal payments. These data collections form the basis for public health trend analysis and for the identification of diseases and treatments and their costs to the community. This information guides national health targets and is used to monitor public health initiatives such as antismoking campaigns, through the ability to monitor changes in the prevalence and severity of smoking-related conditions. With governance processes in place for national and state data definitions, the future offers mechanisms that can make monitoring of these conditions routine, as well as the identification of emerging trends.
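Trend monitoring of the kind described above usually works from rates rather than raw counts, so that changes in population size are not mistaken for changes in prevalence. A minimal sketch, assuming yearly case counts and population figures are available: the crude rate per 100,000 and the year-on-year percentage change are standard calculations, but the function names and any input figures are illustrative.

```python
from typing import Optional


def rate_per_100k(cases: int, population: int) -> float:
    """Crude rate of a condition per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


def trend(yearly: dict[int, tuple[int, int]]) -> list[tuple[int, float, Optional[float]]]:
    """Return (year, rate, % change from the previous year) in year order.

    `yearly` maps year -> (cases, population). The first year, or a year that
    follows a zero rate, has no percentage change and reports None.
    """
    rows = []
    previous = None
    for year in sorted(yearly):
        cases, population = yearly[year]
        rate = rate_per_100k(cases, population)
        if previous is None or previous == 0:
            change = None
        else:
            change = (rate - previous) / previous * 100
        rows.append((year, rate, change))
        previous = rate
    return rows
```

A sustained negative percentage change in the rate of a smoking-related condition, for instance, is the kind of signal that could be used to monitor an antismoking campaign, provided the underlying data definitions stay consistent across years.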

4. Mapping

Mapping is the process of comparing each data element in one version of a data set with its counterpart in another version or representational form. A good map will also identify the closeness of fit, and where the concept in one data collection and the concept in the other are not identical in meaning. The aim is to ensure that the meanings of the concepts represented by each data element are retained or, where they are not, that it is possible to understand the level of difference. It is a very involved and time-consuming process. Decisions made regarding which concept is mapped to which between data elements need to be carefully documented to ensure consistency. Frequently there are differences in the level of detail between data sets, or from one version to another. This is why it is important to carefully identify the database used, including version and edition, in any research or management report referring to it.
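The mapping requirements described above (recording the closeness of fit, documenting each decision, and identifying the versions involved) can be captured in a simple map structure. This is a sketch under assumptions: the four fit categories, the class names and the rationale field are illustrative choices, not a published mapping standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fit(Enum):
    """Closeness of fit between a source concept and its target (illustrative)."""
    EQUIVALENT = "equivalent"
    TARGET_BROADER = "target is broader"
    TARGET_NARROWER = "target is narrower"
    NO_MATCH = "no match"


@dataclass(frozen=True)
class MapEntry:
    source_code: str
    target_code: Optional[str]
    fit: Fit
    rationale: str  # the documented decision, so later mappers stay consistent

    def __post_init__(self) -> None:
        # Exactly the NO_MATCH entries have no target code.
        if (self.fit is Fit.NO_MATCH) != (self.target_code is None):
            raise ValueError("a target code is required unless fit is NO_MATCH")


@dataclass
class DataMap:
    source: str  # source data set name, including version and edition
    target: str  # target data set name, including version and edition
    entries: dict[str, MapEntry]

    def translate(self, source_code: str) -> MapEntry:
        if source_code not in self.entries:
            raise KeyError(f"{source_code!r} has not been mapped from {self.source}")
        return self.entries[source_code]

    def inexact(self) -> list[MapEntry]:
        """Entries where meaning is not fully retained; these need reviewing."""
        return [e for e in self.entries.values() if e.fit is not Fit.EQUIVALENT]
```

Returning the whole entry from `translate`, rather than just the target code, keeps the fit and rationale attached to every translated value, so a report can state how closely the mapped data match the original.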

5. Conclusion

Our health language is used for all communication between the many providers and other stakeholders of the health industry. For data used in direct patient care it is essential to ensure that meaning is retained, so as not to create misunderstandings or misinterpretations, and to provide the basis for new approaches to clinical practice through clinical decision support systems and knowledge development analytics. Data used for statistical, fiscal or reporting purposes should not misrepresent meaning but should be relevant to their purpose. These data are often derived from direct care data but differ from them. In a digital environment it is critical that the health language can be converted into computer codes to assist machine processing, and that it retains its meaning irrespective of where, when or how this language was used to generate computer data. This chapter has described various methods used for this purpose. Governing standard health data is of vital importance. It is a formal process that needs to be undertaken by every healthcare organisation. Multiple governance agencies are needed, and they all need to work together. Every health worker needs to be aware of the consequences if health data are not managed effectively. Large volumes of valid and timely data are needed to generate new information and knowledge to assist and improve informed decision making at all levels within the health industry.

References

[1]. K. Joshi, Natural Language Processing, Science, 13 September 1991, vol. 253, no. 5025, pp. 1242-1249. DOI: 10.1126/science.253.5025.1242
[2]. H. Henderson, Encyclopedia of Computer Science and Technology, Revised Edition, Infobase Publishing, 2009.
[3]. International Council of Nurses (ICN), International Classification for Nursing Practice (ICNP®). Available from: http://www.icn.ch/pillarsprograms/international-classification-for-nursing-practice-icnpr/
[4]. World Health Organisation (WHO), International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3). Available from: http://www.who.int/classifications/icd/adaptations/oncology/en/
[5]. Regenstrief Institute, Logical Observation Identifiers Names and Codes (LOINC®). Available from: http://loinc.org/
[6]. GMDN Agency, The Global Medical Device Nomenclature. Available from: http://www.gmdnagency.com/Info.aspx?pageid=2
[7]. American Medical Informatics Association and American Health Information Management Association Terminology and Classification Policy Task Force, An Action Agenda for the United States, National Committee on Vital and Health Statistics (NCVHS), 2007 [cited 14 Nov 2012]. Available from: www.ncvhs.hhs.gov/080221p4.pdf
[8]. JJ. Cimino, Desiderata for controlled medical vocabularies in the twenty-first century, Methods Inf Med, 1998 Nov;37(4-5):394-403.
[9]. N. Hardiker, B. Webber, K. Markert, B. Rauch, ‘Supporting the development of formal models of nursing terminology’, in V. Saba, R. Carr, W. Sermeus & P. Rocha (eds), Proceedings of the 7th International Congress Nursing Informatics, April/May, Adis International, New Zealand, 2000, p. 582.
[10]. P. Scott, An Introduction to Health Terminologies, National Centre for Classification in Health, Brisbane, Australia, 2002, p. 18.
[11]. AS ISO/IEC 11179.1-2005 Information Technology – Metadata registries (MDR) Framework, p. 9.
[12]. D. Walker, ‘Biomedical terminology part 2’, Health Informatics News & Technology (HINT), 1991b, vol. 1, no. 4, p. 8.
[13]. CW. Bishop, ‘A name is not enough’, MD Computing, 1989, vol. 6, no. 4, pp. 200–206.
[14]. US National Library of Medicine, Fact Sheet: Medical Subject Headings (MeSH®) [cited 9 May 2013]. Available from: http://www.nlm.nih.gov/pubs/factsheets/mesh.html
[15]. National Health Service Centre for Coding and Classification (NHS CCC), Read codes and the terms projects: a brief guide, NHS Executive, Information Management Group, 1995, p. 5.
[16]. International Health Terminology Standards Development Organisation (IHTSDO), SNOMED CT [cited 7 Jan 2013]. Available from: http://www.ihtsdo.org/
[17]. US National Library of Medicine, National Institutes of Health, Unified Medical Language System (UMLS) [cited 7 Jan 2013]. Available from: http://www.nlm.nih.gov/research/umls/
[18]. US National Cancer Institute, NCI Thesaurus [cited 7 Jan 2013]. Available from: http://ncit.nci.nih.gov/
[19]. NEHTA, Data Types in NEHTA Specifications – A Profile of the ISO 21090 Specification, V.1-20100907, 2010, p. 1.
[20]. World Health Organisation (WHO), International Classification of Diseases (ICD) [cited 7 Jan 2013]. Available from: http://www.who.int/classifications/icd/revision/en/index.html
[21]. H. Grain, Health Informatics Standards Knowledge Management Tool Document Register and Glossary Developer’s guide, 2009, unpublished.
[22]. HH. Werley, NM. Lang, Identification of the nursing minimum data set, Springer Publishing Co., New York, 1988.
[23]. openEHR Foundation, Clinical Knowledge Manager [cited 12 Nov 2012]. Available from: http://openehr.org/knowledge/ (international) or http://dcm.nehta.org.au/ckm/ (national)

Review Questions:
1. What are the terminology characteristics that make their use preferable to the use of natural language in the health industry?
2. How are reference terminologies used? Is there a relationship between data governance and terminologies? Explain why or why not.