Health administrative data - Administrative Data Liaison Service

Health administrative data: Exploring the potential for academic research

Authors
Elisabeth Garratt - Research Officer at the Social Disadvantage Research Centre, Department of Social Policy and Social Work, University of Oxford.
Helen Barnes - Research Fellow at the Social Disadvantage Research Centre, Department of Social Policy and Social Work, University of Oxford.
Chris Dibben - Lecturer in Geography at the University of St Andrews and ADLS Director.

Suggested citation: Garratt, E., Barnes, H. and Dibben, C. (2010) Health administrative data: Exploring the potential for academic research, St Andrews: Administrative Data Liaison Service.

Print and electronic design: Darren Lightfoot, ADLS Service Manager at the University of St Andrews
© ADLS 2010

Contents
Introduction
Chapter 1 - Context
1.1 Introduction to health administrative data
1.2 Outline of health administrative data reviewed in this paper
1.3 Outline of other health data for research
Chapter 2 - Previous uses of health administrative data
2.1 Published statistics
2.2 Research using HES data
Chapter 3 - Access to the data
Chapter 4 - Data preparation
4.1 Software
4.2 Filtering the data
4.3 Data quality
4.4 Linking patient episodes
Chapter 5 - Strengths and limitations of health administrative data
Bibliography

Introduction

Administrative data (data that can be used for research but is collected for other purposes) has the potential to provide a relatively cheap, less intrusive and yet comprehensive resource for research in the UK. Many European, particularly Scandinavian, countries are, however, much further ahead than the UK in developing administrative data in this way, replacing national censuses and major surveys with ‘register’ data. For example, Statistics Denmark now bases most of its national statistics on register data, which can be linked longitudinally and between registers of different types, and can also be used to supplement survey data. Controlled access to individual level data from these registers has enabled a variety of research questions to be answered. This method of data ‘recycling’ reduces both the cost to the taxpayer of funding censuses and surveys and the burden on citizens from requests for information. Although the use of this type of data has increased in the UK, the UK still lags behind its European neighbours. The Administrative Data Liaison Service (ADLS) is funded by the ESRC as part of a set of services (including the Secure Data Service) and initiatives aimed at changing this situation. This publication is part of a series of ADLS publications designed to help raise awareness of the utility of administrative data for academic research.

For further information on the ADLS or this publication, please contact the ADLS Advisory Service on 01334 463901, or visit the website at www.adls.ac.uk.


Chapter 1 - Context

1.1 Introduction to health administrative data

Administrative datasets are made up of routinely collected information, usually gathered during the delivery of a service. For health data, Hospital Episode Statistics (HES) is the main source, holding information about National Health Service (NHS) secondary care in England. The information is collected primarily to keep detailed, accurate and up to date records of patients and their contact with the health services. A secondary use is realised by collating this information into national datasets for research. Hospital Episode Statistics were introduced in 1989-90 as the first source with national coverage of hospital activity. Before this, the Department of Health’s (DH) Hospital Inpatient Enquiry, which operated from 1953[1] to 1987, was responsible for the national collection of a 10% sample of admitted patients. Since HES was launched, the breadth of information it holds has grown through the introduction of new data fields, reflecting both amendments in clinical coding and the widening linkage of healthcare data to other datasets.

Before national data was made available through the launch of HES, clinicians were able to use the episode statistics for patients under their care for research purposes (Dixon et al., 1998). While a range of research was possible using this approach, some topics and methods could not be examined this way, and the work was much more labour intensive. For example, comparing regional variation required collaboration between researchers in different locations to provide data from their regions, as this data was not available centrally.

This document gives an introduction to the body of research which has grown out of the increased availability of health based administrative data as a research resource. We consider how data has been used by researchers, discuss issues of data quality and outline some of the strengths and limitations of the different datasets. The aim is not to provide a comprehensive review of the literature, but rather to give a flavour of the kind of research that has been undertaken and some of the issues encountered. Healthcare is the responsibility of the devolved UK administrations. This document currently discusses research carried out using English hospital episode data only. Brief details are provided about data held in Scotland and Wales with a view to reviewing research using these datasets at a later date.

1 http://www.sochealth.co.uk/news/NHSreform.htm (accessed 19/7/10)

1.2 Outline of health administrative data reviewed in this paper

The datasets referred to in this review are outlined below.

Hospital Episode Statistics (HES) is a data warehouse containing all patient contacts with NHS secondary care in England. It includes care provided by NHS hospitals and care provided elsewhere for NHS patients. HES is an episode level dataset in which each record relates to one period of finished patient contact with the hospital services, defined as ‘the time a patient spends in the continuous care of one consultant’[2] before being transferred to another consultant or discharged. In 1998 the decision was made to assign activity to an individual consultant team (Royal College of Physicians, 2007), which can be either a consultant or another qualified health professional such as a midwife (Royal College of Physicians & UHCE, 2007). The various HES datasets record diagnoses using the World Health Organisation’s International Classification of Diseases (ICD), which is currently in its tenth revision. Details of procedures and interventions performed are classified using the Office of Population, Censuses and Surveys (OPCS) Classification of Interventions and Procedures. The information held in HES originates from patient notes made at the point of contact, which trained clinical coders enter onto the hospital’s Patient Administration System (PAS) when the episode is finished (Aylin et al., 2007); it is then stored in a central database known as the Secondary Uses Service.

The data fields in HES fall into four main categories:
• clinical information about a patient’s diagnoses and treatments;
• demographic information about the patient, such as their age group, gender and ethnic category;
• administrative information, for example date of admission and time waited; and
• geographical information on the location of treatment and the area in which the patient lives.

2 http://www.datadictionary.nhs.uk/data_dictionary/nhs_business_definitions/c/consultant_episode_%28hospital_provider%29_de.asp?shownav=1 (accessed 20/12/10)

Personal identifiers including patient name and date of birth are removed from the dataset. Some of the fields are derived variables calculated from information contained within HES, for example, patient age is derived from date of birth and attendance date and geographical variables are derived from postcode information. Further variables are added ex-post, for example information on the date of death is taken from Office for National Statistics (ONS) mortality statistics. Other variables, such as waiting times, are not entered onto PAS by clinicians or administrators but are coded under HES for administrative purposes. A data dictionary has been produced for each of the datasets and contains information about each field and details of cleaning and derivation rules that have been applied.
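Deriving patient age from date of birth and attendance date, as described above, can be illustrated with a short sketch. It is not the HES derivation rule (the cleaning and derivation rules are set out in the HES data dictionaries); it simply computes age in completed years and then drops the identifying fields, in the order the paragraph describes. The field names `patient_name`, `date_of_birth`, `attendance_date`, `diagnosis` and `age` are hypothetical.

```python
from datetime import date


def age_at_attendance(date_of_birth: date, attendance_date: date) -> int:
    """Age in completed years on the attendance date (illustrative rule only)."""
    if attendance_date < date_of_birth:
        raise ValueError("attendance date precedes date of birth")
    years = attendance_date.year - date_of_birth.year
    # One year fewer if the birthday has not yet been reached in the attendance year
    if (attendance_date.month, attendance_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical direct identifiers removed once derived variables are computed
IDENTIFIERS = {"patient_name", "date_of_birth"}


def derive_and_strip(record: dict) -> dict:
    """Add a derived age field, then drop direct identifiers from a copy of the record."""
    derived = dict(record)
    derived["age"] = age_at_attendance(record["date_of_birth"], record["attendance_date"])
    return {field: value for field, value in derived.items() if field not in IDENTIFIERS}


if __name__ == "__main__":
    raw = {
        "patient_name": "Example Patient",  # hypothetical record
        "date_of_birth": date(1980, 6, 15),
        "attendance_date": date(2009, 6, 14),
        "diagnosis": "J45",  # ICD-10 code for asthma
    }
    print(derive_and_strip(raw))
```

The derivation runs before the identifiers are removed, which is why the released record can carry an age but no date of birth.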

Different episodes relating to the same patient can be linked within or between datasets. This is frequently done with HES data to link patient episodes (the period of care under a single consultant) into patient spells (the period from admission to discharge[3]).

The HES data is currently divided into three discrete datasets relating to episodes of secondary care:

The HES A&E dataset provides information on accident and emergency care in NHS hospitals in England. The dataset is compiled from data sent by NHS Trusts and Primary Care Trusts (PCTs) in England, although information has not been returned by all trusts. The data is currently labelled as experimental because of early issues with data quality and coverage. The dataset has been updated annually since data collection began in 2007-08, and provisional monthly updates are also available from the HES website. The 2007-08 dataset currently holds approximately 12.3 million records. Each record contains approximately 100 variables, including patient details, reason for and location of accident, hospital arrival, diagnosis, hospital attended, type of department attended, waiting times, and referral source. A small number of variables have been introduced since the dataset was established.

The HES inpatient (including maternity) dataset provides information on admissions to NHS hospitals in England. This is the oldest HES dataset; data collection began in 1989-90. The data is updated annually, and provisional monthly updates are also available from the HES website. The 2008-09 dataset contains approximately 15 million records. Each record contains over 200 variables, including patient details, date and location of treatment, care period, diagnosis, discharge date, and geographical data. It also contains finished and unfinished episodes for every consultant, nurse and midwife.

3 http://www.hesonline.nhs.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=1072 (accessed 7/7/10)

The HES outpatient dataset provides information on outpatient appointments at NHS hospitals in England. Each outpatient visit is termed an attendance, which is equivalent to an episode in the inpatient and A&E datasets. The dataset is compiled from data sent by all NHS Trusts and PCTs in England and has been updated annually since data collection began in 2003-04. Provisional monthly updates are also available from the HES website. The 2008-09 dataset holds approximately 60 million records. Each record contains approximately 100 variables, including information on appointment dates, attendance types and non-attendances, waiting times, clinical and geographical data, patient details, socio-economic factors, referral source and outcome results. Not all data fields have been present in the dataset from the start; some have been introduced since the dataset was established.

The maternity HES data contains information about all births in England, including those at home and in non-NHS hospitals, although records for births outside NHS hospitals do not include all data fields. HES contains two types of maternity record: the delivery record and the birth record. The delivery record is the HES record for the mother and contains the same data as a general record, with additional information about the delivery (this additional information is present for each of the babies she gives birth to). The birth record is the HES record for the baby; it has the same format as the mother’s record, with general data fields and extra information about the delivery that matches the information in the mother’s record[4].

Comprehensive summaries of the datasets mentioned in this report, as well as many others, and information on availability and how to access the data can be found on the Administrative Data Liaison Service (ADLS) website (http://www.adls.ac.uk).

4 http://www.hesonline.nhs.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=925 (accessed 3/6/10)
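The linkage of episodes into spells mentioned at the start of this section can be illustrated with a simplified sketch. It assumes each episode carries a pseudonymised patient identifier plus start and end dates (the names `patient_id`, `start` and `end` are hypothetical), and treats a patient's episodes as one continuous spell while each episode starts no later than the spell's latest end date, as happens when a patient is transferred between consultants. Real spell derivations typically also draw on further fields, such as the hospital provider and admission date, which this sketch ignores.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass(frozen=True)
class Episode:
    patient_id: str  # hypothetical pseudonymised patient identifier
    start: date      # date the consultant episode started
    end: date        # date the consultant episode ended


@dataclass(frozen=True)
class Spell:
    patient_id: str
    admission: date
    discharge: date
    n_episodes: int


def episodes_to_spells(episodes: list[Episode]) -> list[Spell]:
    """Group each patient's episodes into continuous spells of care.

    An episode joins the current spell if it starts on or before the latest
    end date seen so far in that spell; otherwise a new spell begins.
    """
    spells: list[Spell] = []
    ordered = sorted(episodes, key=lambda e: (e.patient_id, e.start, e.end))
    for patient_id, patient_episodes in groupby(ordered, key=lambda e: e.patient_id):
        admission = discharge = None
        count = 0
        for ep in patient_episodes:
            if count and ep.start > discharge:
                # Gap after the previous discharge: close the current spell
                spells.append(Spell(patient_id, admission, discharge, count))
                count = 0
            if count == 0:
                admission, discharge = ep.start, ep.end
            else:
                discharge = max(discharge, ep.end)
            count += 1
        if count:
            spells.append(Spell(patient_id, admission, discharge, count))
    return spells


if __name__ == "__main__":
    episodes = [
        Episode("A", date(2009, 1, 3), date(2009, 1, 10)),  # transfer to a second consultant
        Episode("A", date(2009, 1, 1), date(2009, 1, 3)),
        Episode("A", date(2009, 3, 1), date(2009, 3, 2)),   # later, separate admission
        Episode("B", date(2009, 2, 1), date(2009, 2, 5)),
    ]
    for spell in episodes_to_spells(episodes):
        print(spell)
```

Under this rule a readmission on the day of discharge would be merged into the previous spell; handling such edge cases is one reason real derivations use additional fields.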

1.3 Outline of other health data for research

Administrative data in Wales[5] and Scotland[6]

The Patient Episode Database for Wales is the equivalent of HES in England, containing episode level records of inpatient, day case and maternity care received by patients in NHS Wales hospitals. Some episodes relating to the treatment of Welsh residents in English hospitals are also included. It is a far smaller dataset than HES, with around 120,000 episodes of care processed each year since 1991. Although fewer variables are included, many data fields are shared with HES, and in 1997 the Welsh dataset was aligned with England to promote benchmarking. Anonymised records are submitted to NHS Wales for analysis[7]. Information on attendances at A&E departments in Wales is held separately by the All Wales Injury Surveillance System. This does not currently cover all of Wales, but improving coverage has been prioritised.

A databank of health information for Wales is currently under construction as part of the Secure Anonymised Information Linkage (SAIL) project at Swansea University. Having refined the methods of data linkage, the project is now focusing on expanding the databank, both geographically and in the types of dataset included. Many organisations have provided, or agreed to provide, their datasets to form a valuable source of health data[8].

The Scottish equivalent of HES is a national database of health information known as the Scottish Morbidity Record (SMR). It is managed by Information Services Division Scotland on behalf of NHS Scotland and contains annual and quarterly activity since 1981. The SMR contains separate episode level datasets of inpatient, day case and maternity care and is one of the oldest and most complete national health datasets in the world[9]. Approximately one million records are created annually (Harley et al., 1996). The datasets contain largely the same variables as HES and, as with HES, variables have been added over time. It has also been possible to link data from the SMR with information collected using the Scottish Health Survey, allowing clinical data to be matched to self-reported data.

5 http://www.wales.nhs.uk/sites3/page.cfm?orgid=166&pid=4262 (accessed 23/6/10)
6 http://www.isdscotland.org/isd/4150.html (accessed 23/6/10)
7 http://www.capic.org.uk/in_patient.html (accessed 23/6/10)
8 http://www.healthinformaticsresearchlabs.swansea.ac.uk/sailproject.html (accessed 20/12/10)
9 http://www.indicators.scot.nhs.uk/Archive/SMR.html (accessed 24/6/10)

Other administrative data

There are other administrative datasets that can be used for health research. For example, the General Practice Research Database (GPRD) is a database of anonymised records of patients from primary care[10]. The GPRD currently has data on approximately four million active patients (and approximately nine million people in total) from around 500 primary care practices throughout the UK. Although useful, the dataset does not have national coverage, as it only includes a subset of practices (according to the GPRD website it covers about 7% of the population). The practices are self-selected, so the database cannot be said to be statistically representative of the UK. The GPRD has recently been linked to HES and other data, which makes it useful for tracking a sample of patients from primary care to secondary care.

As another example, NHS Prescription Services collects and collates all prescribed medication data from primary care[11]. Prescribing data are uploaded to a national database and updated on a monthly basis. However, the data are not linked to demographic or diagnosis information on patients, so they cannot be used to provide prescribing information by age and sex, or for specific conditions where the same drug is licensed for more than one purpose.

Although both the GPRD and the NHS Prescription Services prescribing data can be used for certain health analyses, HES is the most widely used and most powerful health dataset and is therefore the focus of this report.

Census and survey data

A number of social science survey datasets contain information on health, allowing a wide range of health related questions to be answered. There is a dedicated annual survey on health – the Health Survey for England run by the NHS Information Centre – which focuses on a different demographic group each year and looks at a variety of health indicators such as cardiovascular disease, physical activity, eating habits, oral health, accidents, and asthma. Other surveys, such as the General Lifestyle Survey (formerly known as the General Household Survey and now part of the Integrated Household Survey), also contain questions on health and use of health services, smoking and drinking. A number of longitudinal studies in England collect information on health, and some of these have also been linked to administrative data (including HES). Those linked to HES include the Millennium Cohort Study, the English Longitudinal Study of Ageing, and the Avon Longitudinal Study of Parents and Children.

10 http://www.gprd.com/home/ (accessed 22/7/10)

11 http://www.nhsbsa.nhs.uk/PrescriptionServices.aspx (accessed 22/7/10)

Chapter 2 - Previous uses of health administrative data

2.1 Published statistics

The information captured in HES is made available through published statistics. A range of data is available to download directly from the HES website, providing an easily accessible source of top level data for researchers, policy makers and clinicians.

Self service data

The self service option allows users to download their own custom tables from a selection of available data. Data relating to a particular region, year and diagnosis can be specified so that downloads are targeted to users’ needs.

Annual and monthly data

National level annual data is available to download for the inpatient, outpatient, A&E and maternity datasets. Data relating to a range of conditions and characteristics in the inpatient, outpatient and maternity datasets can be accessed and downloaded in PDF and Excel formats. The A&E data is presented in report format with accompanying discussion, as it is still considered experimental. Annual data relating to mortality, critical care and patient reported outcome measures can also be freely accessed.

From 2008, provisional monthly data relating to all three datasets has been published. These summaries include a broad overview of key facts along with a 12 month comparison with the previous year. These data are considered estimates until the annual publication is released and are subject to change and revision each month.

Articles and research

This section of the HES website presents a brief discussion of research undertaken using HES data. It is split into the following four sections:
• Articles: This provides some examples of research articles that have used HES data, with a reference and short summary for each article. At present there is little information here.
• HES on...: This section contains a small number of articles, each focusing on a particular condition, such as breast cancer, food poisoning and sunburn. Each article provides general information about the condition as well as prevalence figures by age and sex and further details from HES, such as average waiting time for admission and length of stay.
• Statistics papers: These are a small selection of articles that have used HES data in their research. They cover a wide range of themes, including pregnancy and alcohol, and trends in ear, nose and throat (ENT) admission rates.
• Topic papers: Articles in this section consider methodological issues relevant to HES and to researchers using HES data. They include information about changes to clinical coding and organisational changes, as well as a discussion of waiting times and how these are calculated.

2.2 Research using HES data

There is now a substantial body of health research, produced independently of the DH, applying original analysis of individual level administrative data to a wide range of research questions. Clinicians who undertake research were previously able to access and use organisation level data and could also examine the records of patients under their care, so the availability of administrative data has not totally transformed health services research. However, HES data enables a far broader range of research questions to be asked, for example about regional differences in the prevalence of disease, comparisons of methods and health inequalities.

The richness and breadth of data contained within HES provide many possibilities for diverse research. A wide range of research has been undertaken using these data, and a few main themes are discussed below. All of the publications cited in this section, and many others included in this report, are available from the publication hub on the ADLS website (http://www.adls.ac.uk/).

Policy evaluation and research

Official statistics, for example on waiting times, do not use HES data because of historical concerns over coverage and accuracy. In addition, HES calculates waiting times differently from other NHS data sources, so published HES data is not used to monitor performance against waiting time targets[12]. Nevertheless, the health services are continually introducing new guidelines on how patients should be cared for and how this care is financed and managed, based on policy objectives to promote high quality care and optimum efficiency in the NHS. Such initiatives require evaluation, and some clinicians who also perform research undertake informal policy evaluations, supplementing the clinical data with their experience of the policies.

Table 1: Examples of policy evaluation

Propper et al. (2008) Did ‘targets and terror’ reduce waiting times in England for hospital care?
This research sought to assess the impact of waiting time targets introduced by the English government in 2000. These were preceded in 1999 by the devolution of responsibility for health services to the constituent nations of the UK, which created a natural experiment: a common policy environment existed before the change, followed by policy divergence after it. HES data in England was examined for the years 1997 to 2004 alongside equivalent data from Scotland, which did not adopt the targets regime. Data from two sources was used: waiting list census data, providing a snapshot of the list at a particular time point, and hospital discharge data, giving information on realised waits. Difference-in-difference models were estimated for the proportion of people on waiting lists who had waited over 6, 9 and 12 months.

12 http://www.hesonline.nhs.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=412 (accessed 8/7/10)

Farrar et al. (2009) Has payment by results affected the way that English hospitals provide care? Difference-in-difference analysis
This piece of research was commissioned by the DH to assess the effects of the system of ‘payment by results’ introduced by the DH in 2002, examining the outcome variables of volume, cost and quality of care. Difference-in-difference analysis, and analysis of patient level secondary data with fixed effects models, were used to compare outcomes for trusts implementing the change with a control group of trusts across England and Scotland that were not implementing the new system. Retrospective data from HES and the Scottish Morbidity Records between 2002 and 2006 were used to establish the effects of payment by results. Data was converted from episode level to spell level so that the whole period of care in hospital could be examined.
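Both studies in Table 1 rely on difference-in-difference estimation. In its simplest two-group, two-period form, the estimate is the change in the outcome for the group exposed to the policy minus the change for the unexposed comparison group. The sketch below shows only that arithmetic; the group names and figures are hypothetical and are not results from Propper et al. (2008) or Farrar et al. (2009), whose models may also include further controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupMeans:
    """Mean outcome for one group before and after a policy change."""
    before: float
    after: float

    @property
    def change(self) -> float:
        return self.after - self.before


def difference_in_differences(treated: GroupMeans, comparison: GroupMeans) -> float:
    """Change in the exposed group minus change in the comparison group.

    Under the parallel-trends assumption, the comparison group's change stands
    in for what would have happened to the exposed group without the policy.
    """
    return treated.change - comparison.change


if __name__ == "__main__":
    # Hypothetical shares of waiting-list patients waiting over 6 months
    england = GroupMeans(before=0.30, after=0.10)   # exposed to targets
    scotland = GroupMeans(before=0.28, after=0.22)  # comparison, no targets
    print(f"Estimated effect: {difference_in_differences(england, scotland):+.2f}")
```

A negative value here means the share waiting over six months fell by more in the exposed group than in the comparison group. In regression form, the same quantity is the coefficient on the interaction between the group and period indicators.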


Assessments of interventions and treatments

For many conditions and diseases there is a range of treatment options available. As new interventions are developed, this further widens the treatment choices available to clinicians and patients. Not surprisingly, therefore, a significant body of research is dedicated to assessing the effectiveness of different treatments and interventions.

Table 2: Examples of research assessing interventions and treatments

Gravelle et al. (2007) Impact of case management (Evercare) on frail elderly patients: controlled before and after analysis of quantitative outcome data
This study sought to evaluate the impact of the Evercare case management approach on outcomes for elderly people. Emergency hospital admissions, emergency bed days and mortality taken from HES were examined quantitatively, and qualitative interviews with staff, patients and carers were used to assess the success of the scheme. The experimental group of patients enrolled in Evercare was compared with a control group of patients from all other English practices. The before and after design enabled the change in outcomes to be assessed, removing the effect of baseline differences. Propensity score matching was employed to control for differences between Evercare and control practices.

Lenaghan et al. (2007) Home-based medication review in a high risk elderly population in primary care-the POLYMED randomised controlled trial
The objective of this research was to establish whether home based medication reviews by pharmacists affect hospital readmission rates among elderly people. Patients aged 80 or over were randomly allocated to receive a home based medication review or usual care. Those in the experimental group had an initial home visit from a pharmacist and a follow up visit six to eight weeks later. The study examined the total number of emergency hospital admissions over six months, taken from HES, as well as the secondary outcome measures of death, admission to residential or nursing homes and self-assessed quality of life. The analysis used regression methods to compare admission rates between groups and survival analysis to compare mortality, controlling for living alone and confusion.
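Gravelle et al. (2007) used propensity score matching to make the Evercare and control practices comparable. One common simple variant is one-to-one nearest-neighbour matching with replacement on an estimated propensity score, sketched below under the assumption that the scores have already been estimated (for example by logistic regression on practice characteristics). The identifiers, scores and caliper value are hypothetical, and this is not necessarily the matching algorithm the authors used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    unit_id: str
    propensity: float  # estimated probability of receiving the intervention


def nearest_neighbour_match(
    treated: list[Unit], controls: list[Unit], caliper: float = 0.05
) -> dict[str, str]:
    """One-to-one nearest-neighbour matching on the propensity score, with replacement.

    Each treated unit is paired with the control whose score is closest.
    Treated units whose closest control lies further away than the caliper
    are left unmatched rather than paired with a poor comparator.
    """
    matches: dict[str, str] = {}
    if not controls:
        return matches
    for unit in treated:
        best = min(controls, key=lambda c: abs(c.propensity - unit.propensity))
        if abs(best.propensity - unit.propensity) <= caliper:
            matches[unit.unit_id] = best.unit_id
    return matches


if __name__ == "__main__":
    treated = [Unit("evercare_1", 0.40), Unit("evercare_2", 0.90)]
    controls = [Unit("control_1", 0.38), Unit("control_2", 0.55), Unit("control_3", 0.10)]
    print(nearest_neighbour_match(treated, controls))
```

Because matching is with replacement, one control may serve several treated units; the caliper trades sample size against the closeness of matches.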


Research considering cost-effectiveness

Given the limited resources of the health services, a considerable body of research has been undertaken to examine the cost-effectiveness of healthcare in order to identify and promote efficiency measures that make best use of resources. Publications typically seek to ascertain the most cost-effective method of treating a particular condition by comparing the health outcomes of different interventions with their associated costs. This usually involves reference to the cost of procedures, length of stay, need for follow-up care, contact with other parts of the health services (GP visits, outpatient appointments and re-admissions) and the price of drugs.

Table 3: Examples of research considering cost effectiveness

Farndon et al. (1998) Cost-effectiveness in the management of patients with oesophageal cancer
This study assessed the relationship between clinical outcome, quality of life and cost for treatments commonly used to manage oesophageal cancer. A range of management approaches is available, and their clinical effectiveness has been assessed, but their cost implications had not previously been considered. Clinical data from HES for prospectively and retrospectively recruited patients were examined to compile the hospital management cost profile for oesophageal carcinoma. The cost incurred for each patient was related to their survival from the date of diagnosis to produce a measure of cost per unit survival. Cost according to quality of life was assessed using the results of quality of life questionnaires completed before the onset of treatment and at monthly intervals for three months after the date of first treatment.

Jit and Edmunds (2007) Evaluating rotavirus vaccination in England and Wales Part II. The potential cost-effectiveness of vaccination
This research focused on the role of rotavirus in causing acute gastroenteritis in children, a disease responsible for significant costs to the NHS. The cost-effectiveness of two vaccines that had recently completed clinical trials was investigated. The researchers created a model that followed hypothetical cohorts of children in England and Wales from birth to five years of age. The net costs of a vaccination programme were estimated as the cost of vaccination minus the savings to the NHS resulting from vaccination. The calculations included health provider costs, economic costs and quality adjusted life years lost due to rotavirus-related deaths, as well as hospital admissions, attendances at A&E and general practitioner consultations extracted from HES. The estimated number of quality adjusted life years saved in patients and their carers as a result of immunisation provided a measure of benefits.
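The two studies in Table 3 combine costs and outcomes in different ways: Farndon et al. (1998) relate each patient’s cost to survival time, while Jit and Edmunds (2007) net the savings from averted healthcare use off the cost of vaccination and set this against quality adjusted life years (QALYs) gained. The sketch below shows that arithmetic with hypothetical figures; the function names are illustrative, survival is assumed to be measured in months, and expressing the result as a net cost per QALY gained is the conventional summary in economic evaluation, assumed here rather than quoted from the study.

```python
def cost_per_unit_survival(total_cost: float, survival_months: float) -> float:
    """Cost incurred for a patient divided by survival time from diagnosis."""
    if survival_months <= 0:
        raise ValueError("survival time must be positive")
    return total_cost / survival_months


def net_programme_cost(vaccination_cost: float, averted_costs: dict[str, float]) -> float:
    """Cost of vaccination minus NHS savings from averted healthcare use.

    A negative result means the programme saves more than it costs.
    """
    return vaccination_cost - sum(averted_costs.values())


def cost_per_qaly_gained(net_cost: float, qalys_gained: float) -> float:
    """Net cost divided by quality adjusted life years gained."""
    if qalys_gained <= 0:
        raise ValueError("QALYs gained must be positive for a meaningful ratio")
    return net_cost / qalys_gained


if __name__ == "__main__":
    # Hypothetical figures in pounds; not results from either study
    print(cost_per_unit_survival(total_cost=12_000, survival_months=8))
    averted = {"admissions": 400_000, "a_and_e": 50_000, "gp_consultations": 50_000}
    net = net_programme_cost(vaccination_cost=1_000_000, averted_costs=averted)
    print(net, cost_per_qaly_gained(net, qalys_gained=25))
```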


Influence of external events on health

The level of detail contained within the clinical coding of HES allows for research examining a wide range of different causes of illness or injury. For example, road safety can be considered by identifying episodes where the cause of injury is a transport accident. Episode statistics can also be linked to other data sources at an aggregate level to consider the health implications of external factors such as weather patterns and air quality.

Table 4: Examples of research examining the influence of external events on health

Kovats et al. (2004) Contrasting patterns of mortality and hospital admissions during hot weather and heat waves in Greater London, UK
Building on previous findings that mortality increases during hot weather and heat waves, this research sought to assess the evidence for effects of heat on non-fatal outcomes in the UK. Data on emergency hospital admissions in Greater London over a six year period were obtained from HES and stratified into eight age groups and six diagnostic groups, including circulatory disease, renal failure and cerebrovascular disease. Data on influenza activity, air quality and weather patterns were obtained from the relevant organisations. The relationship between admissions and daily mean temperature was investigated using linear modelling, with the models controlling for year, humidity, ozone, influenza, day of the week and public holidays (including Christmas). The effect of a six day heatwave was defined as the increase in admissions observed during this period over the number of admissions predicted by the model.

Wilkinson et al. (1999) Case-control study of hospital admission with asthma in children aged 5-14 years: relation with road traffic in north west London
This study reports the results of research in north west London examining whether the risk of hospital admission for asthma is higher in children living near major sources of road traffic emissions, to establish whether road traffic pollution is associated with asthma. Children aged 5 to 14 years were identified from HES, and their location of residence was taken as the centre of the residential postcode recorded in HES. The primary diagnosis field was used to identify emergency admissions due to asthma and respiratory disease. A control group of all other children with an emergency admission (excluding those due to accidental injuries) was included for comparison. Road locations were obtained from Ordnance Survey, and traffic data were calculated using the London Research Centre’s road traffic model for London. The two datasets were linked using a Geographical Information System, and roads were then assigned estimated traffic volumes. A number of different measures of exposure to traffic were used. Logistic regression was used to consider the association between traffic exposure and hospital admission for asthma and respiratory illness. The possibility of results being confounded by socio-economic status was addressed by assigning each child a Carstairs deprivation score (see below) according to the census enumeration district of their home location.
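Wilkinson et al. controlled for socio-economic status using the Carstairs score, whose four census indicators are described in the section on health inequalities that follows. The index is commonly constructed by standardising each indicator across areas as a z-score and summing the four z-scores without weights, so that higher scores indicate greater deprivation. The sketch below assumes that construction and uses the population standard deviation; the field names and values are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical field names for the four census indicators (percentages per area)
INDICATORS = ("overcrowding", "male_unemployment", "no_car", "low_social_class")


def z_scores(values: list[float]) -> list[float]:
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def carstairs_scores(areas: list[dict]) -> list[float]:
    """Unweighted sum of indicator z-scores for each area, in input order."""
    standardised = [z_scores([area[name] for area in areas]) for name in INDICATORS]
    return [sum(column[i] for column in standardised) for i in range(len(areas))]


if __name__ == "__main__":
    areas = [
        {"overcrowding": 1.0, "male_unemployment": 5.0, "no_car": 20.0, "low_social_class": 10.0},
        {"overcrowding": 2.0, "male_unemployment": 8.0, "no_car": 30.0, "low_social_class": 15.0},
        {"overcrowding": 5.0, "male_unemployment": 15.0, "no_car": 50.0, "low_social_class": 25.0},
    ]
    for i, score in enumerate(carstairs_scores(areas)):
        print(f"area {i}: {score:+.2f}")
```

Because each indicator is standardised across the areas supplied, a score is relative to that set of areas rather than an absolute measure. The Townsend Index is built in a similar way, but is usually described as log-transforming the unemployment and overcrowding rates before standardising.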

Investigating health inequalities

It is widely accepted that poorer people are more likely to suffer from ill health, and consequently a disproportionate share of healthcare resources is used by people living in deprived circumstances. Each of the HES datasets contains information on the level of deprivation in the patient’s area of residence. This enables the incidence and severity of health problems to be considered with reference to the deprivation of a patient’s surroundings, typically measured using the Townsend Index, the Carstairs Index or the Index of Multiple Deprivation (IMD).

The Townsend Index was devised by Townsend and colleagues in the 1980s and offers a material measure of deprivation and disadvantage based on information from the 1981 Census. It combines four variables (unemployment, non-car ownership, non-home ownership and household overcrowding) to construct ward-level scores (Townsend et al., 1986; Townsend et al., 1987). The Carstairs Index was developed by Carstairs and Morris, also during the 1980s. It uses census data to construct four indicators of material disadvantage: overcrowding, male unemployment, non-car ownership and the proportion of people in social classes four and five. These components are combined to form a composite score at postcode sector level (Carstairs and Morris, 1989).

The IMD is a newer methodology for the creation of small area indices of deprivation, developed for England in the late 1990s by the Social Disadvantage Research Centre at the University of Oxford; the approach has since been adopted by all the countries of the UK. The first English Indices of Deprivation were produced in 2000 for the then Department of the Environment, Transport and the Regions. They have since been updated twice (in 2004 and 2007), and a fourth edition is currently in preparation13. The Indices include the IMD, measured at Lower-layer Super Output Area (LSOA) level, and two supplementary income deprivation indices (also at LSOA level): the Income Deprivation Affecting Children Index and the Income Deprivation Affecting Older People Index. The IMD comprises 38 indicators in seven domains of deprivation (income, employment, health, education, barriers to housing and services, living environment and crime). The indicators are mainly constructed from administrative data sources, with only a small number making use of Census and survey data; HES data are used for some indicators in the health domain.

The knowledge gained through research on health inequalities can then be used to target resources and tailor public health interventions in an attempt to narrow the gap in health outcomes between rich and poor.

13 http://www.communities.gov.uk/publications/communities/englishindicesdeprivationcon [accessed 5/7/10]
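Both the Townsend and Carstairs indices follow the same broad recipe: each indicator is standardised across all areas as a z-score, and the standardised values are summed so that higher scores indicate greater deprivation. The minimal Python sketch below illustrates that recipe with an unweighted sum of z-scores. The area labels and percentages are invented for illustration, and the published Townsend Index also transforms some of its variables before standardising, a step this sketch omits.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # no variation between areas
    return [(v - mu) / sd for v in values]

def composite_deprivation(areas, indicators):
    """Unweighted sum of each indicator's z-score for every area."""
    columns = {name: z_scores([a[name] for a in areas]) for name in indicators}
    return {
        a["area"]: sum(columns[name][i] for name in indicators)
        for i, a in enumerate(areas)
    }

# Hypothetical areas with percentages for four Carstairs-style indicators.
areas = [
    {"area": "A", "overcrowding": 2.0, "male_unemployment": 5.0, "no_car": 20.0, "low_social_class": 10.0},
    {"area": "B", "overcrowding": 6.0, "male_unemployment": 15.0, "no_car": 50.0, "low_social_class": 30.0},
    {"area": "C", "overcrowding": 4.0, "male_unemployment": 10.0, "no_car": 35.0, "low_social_class": 20.0},
]
indicators = ["overcrowding", "male_unemployment", "no_car", "low_social_class"]

scores = composite_deprivation(areas, indicators)
ranked = sorted(scores, key=scores.get, reverse=True)  # most deprived first
print(ranked)  # ['B', 'C', 'A']
```

Because each score is relative to the set of areas supplied, scores are only comparable within that set; adding or removing areas changes every score.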

Table 5: Examples of research examining health inequalities

Groom et al. (2006) Inequalities in hospital admission rates for unintentional poisoning in young children

This piece of research sought to determine the relationship between deprivation and hospital admission rates for unintentional poisoning, by poisoning agent, in young children (aged 0 to 4 years). All admissions due to unintentional poisoning in children aged 0 to 4 from 862 wards in the East Midlands between April 1995 and March 1997 were extracted from HES. Admissions that had a disease cause code indicating unintentional poisoning or poisoning were included. Each patient was linked to their ward of residence using postcode information contained in HES and the number of admissions due to poisoning in each ward was calculated both by age group and poisoning agent. Townsend scores were used as a proxy for material deprivation. Regression methods were used to determine incidence rate ratios for admission rates due to poisoning.

Pollock and Vickers (1998) Deprivation and emergency admissions for cancers of colorectum, lung, and breast in south east England: ecological study

This study investigated the relationship between deprivation and acute emergency admissions for the three most common cancers found in south east England. Consultant episodes for all cases with a primary diagnosis of one of the three cancers were obtained from the HES inpatient dataset for residents of the Thames regions for the three years from 1992 to 1995. The episodes were linked using sex, date of birth and postcode to obtain data on patients and admissions. Postcode information was used to assign each patient’s district of residence within the region and Townsend score. Four main analyses were completed for each tumour site and decile (tenth) of deprivation: first, the number and proportion of day case admissions; second, the number and proportion of emergency admissions; third, the proportion of admissions to hospitals that treat more than 100 patients for the cancer in question; and fourth, the proportion of patients who received therapeutic or palliative surgery in any admission.

Comparison of providers and regional variation

The standardised nature of the HES datasets enables tentative comparisons to be made between providers. Such comparisons serve both to understand geographical variation in the incidence of disease and to assess the methods and performance of different providers. These evaluations must be made carefully, because other factors also contribute to outcomes, such as geographical variation in hospital practice and demographic differences in the populations served by different hospital providers. Comparisons between providers have also been made with respect to hospital management and the internal workings of health care providers.

Table 6: Examples of research comparing providers

Judge et al. (2007) Patient outcomes and length of hospital stay after radical prostatectomy for prostate cancer: analysis of Hospital Episodes Statistics for England

Following reports of worse outcomes in low-volume hospitals, this research investigated the relationship between the number of radical prostatectomies (RPs) carried out at individual hospitals and subsequent morbidity and mortality. Prostate cancer and treatment by RP are both common, but the factors influencing common adverse outcomes are less well understood. A HES inpatient dataset of men undergoing RP between 1997 and 2005 was used. Hospital trust volume was divided into quintiles, the degree of co-morbidity was classified for each patient, and the IMD was used as a measure of small area deprivation. Age and year of surgery were controlled for in the analysis. Three main outcomes were examined: 30-day in-hospital mortality, length of stay and complications of surgery.

Thompson et al. (2004) Patterns of hospital admission for adult psychiatric illness in England: analysis of Hospital Episode Statistics data

The patterns of psychiatric hospital admissions and the type of patients admitted at a national level need to be assessed and reported for strategic service planning. This study used cross-sectional HES data on inpatient psychiatric admissions among patients aged 16 to 64 to investigate patterns by region, age, gender and diagnosis. Only spells where the first episode of care related to a psychiatric diagnosis were included. Admission rates were calculated using population estimates from the 2001 census as the denominator. Median length of admission, mean total bed days and the proportions of patients remaining in hospital for more than 90 and 365 days were examined within each of the diagnostic groups.
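The measures described for Thompson et al. (admission rates with census population denominators, median length of stay and the share of long stays) reduce to simple arithmetic once spells have been extracted. The Python sketch below is illustrative only: the spells, the population figures and the field names (`region`, `los_days`) are hypothetical, not HES field names.

```python
from statistics import median

# Hypothetical psychiatric spells: region of residence and length of stay (days).
spells = [
    {"region": "North", "los_days": 12},
    {"region": "North", "los_days": 95},
    {"region": "North", "los_days": 30},
    {"region": "South", "los_days": 400},
    {"region": "South", "los_days": 7},
]
# Hypothetical census population aged 16-64, used as the rate denominator.
population = {"North": 50_000, "South": 40_000}

def summarise(spells, population, per=100_000):
    """Admission rate per `per` population, median stay and long-stay shares."""
    summary = {}
    for region, pop in population.items():
        stays = [s["los_days"] for s in spells if s["region"] == region]
        if not stays:
            # No admissions: the rate is zero and stay measures are undefined.
            summary[region] = {"rate": 0.0, "median_stay": None,
                               "pct_over_90": None, "pct_over_365": None}
            continue
        summary[region] = {
            "rate": len(stays) * per / pop,
            "median_stay": median(stays),
            "pct_over_90": 100 * sum(d > 90 for d in stays) / len(stays),
            "pct_over_365": 100 * sum(d > 365 for d in stays) / len(stays),
        }
    return summary

for region, stats in summarise(spells, population).items():
    print(region, stats)
```

Medians are used for length of stay because a few very long admissions (such as the 400-day stay above) would otherwise dominate a mean.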

Identification of trends in healthcare

A number of publications have examined hospital episode statistics over several years to identify emerging trends. Amongst other topics, studies have explored the changing incidence of particular diseases, rates of particular incidents such as stabbing injuries and fire deaths, and trends in the treatment of health conditions.

Table 7: Examples of research examining time trends in health care

Maxwell et al. (2007) Trends in admissions to hospital involving an assault using a knife or other sharp instrument, England, 1997-2005

This piece of research sought to investigate recent trends in inpatient admissions in England for assaults that involve a stabbing. Data on hospital admissions between April 1997 and March 2005 that had ‘assault by a sharp object’ mentioned in any of the diagnosis fields were extracted from the HES inpatient dataset. Any records indicating that the injury could have been accidental, self-inflicted or of undetermined intent were excluded. In cases where the same patient was admitted more than once during the study period, only the first admission for these individuals was included to avoid the possibility of double counting. Data for all stabbing deaths during this time period were obtained from the Office for National Statistics (ONS). Overall admission rates for stabbing incidents were compared over time and the profile of stabbing injuries was also examined in relation to the age and sex of patients and their length of stay in hospital. The rate of deaths by stabbing over time was also ascertained.

Fraser and Wormald (2006) Hospital Episode Statistics and changing trends in glaucoma surgery

The authors report a fall in the frequency of surgical interventions for glaucoma in recent years. This has been attributed to the use of new medications, but additional influences have also been hypothesised, such as the rising rate of cataract surgery, which may have affected glaucoma surgery. This piece of research sought evidence for this hypothesis using HES data on the main glaucoma procedures and the number of cataract operations performed. As HES does not include information about medical treatment and prescriptions, the role of new medications was assessed using data on glaucoma prescriptions supplied by a pharmaceutical firm.
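The selection rules described for Maxwell et al. (a qualifying code mentioned in any diagnosis field, exclusion of other intents, and only the first admission per patient) can be sketched in Python as follows. The ICD-10 prefixes are illustrative assumptions rather than the study’s published code list (X99 is the ICD-10 category for assault by sharp object; W26, X78 and Y28 cover accidental contact with a knife, self-harm by sharp object and undetermined intent), and the record layout is hypothetical.

```python
from datetime import date

# Illustrative ICD-10 prefixes (assumed; not the study's published code list).
INCLUDE = ("X99",)               # assault by sharp object
EXCLUDE = ("W26", "X78", "Y28")  # accidental, self-inflicted, undetermined intent

def mentions(diagnoses, prefixes):
    """True if any non-empty diagnosis field starts with one of the prefixes."""
    return any(code.startswith(prefixes) for code in diagnoses if code)

def first_stabbing_admissions(episodes, start, end):
    """Qualifying admissions within [start, end], keeping each patient's first."""
    first = {}
    for ep in sorted(episodes, key=lambda e: e["admidate"]):
        if not start <= ep["admidate"] <= end:
            continue
        codes = ep["diagnoses"]
        if mentions(codes, INCLUDE) and not mentions(codes, EXCLUDE):
            first.setdefault(ep["patient_id"], ep)  # later admissions are ignored
    return list(first.values())

# Hypothetical episodes; p2 is excluded because of an undetermined-intent code.
episodes = [
    {"patient_id": "p1", "admidate": date(2001, 2, 3), "diagnoses": ["S31.1", "X99.9"]},
    {"patient_id": "p1", "admidate": date(1999, 5, 1), "diagnoses": ["S21.1", "X99.0"]},
    {"patient_id": "p2", "admidate": date(2000, 7, 9), "diagnoses": ["S61.9", "X99.9", "Y28.9"]},
    {"patient_id": "p3", "admidate": date(2004, 1, 1), "diagnoses": ["S21.1", "", "X99.4"]},
]
kept = first_stabbing_admissions(episodes, date(1997, 4, 1), date(2005, 3, 31))
print([(e["patient_id"], e["admidate"].isoformat()) for e in kept])
# [('p1', '1999-05-01'), ('p3', '2004-01-01')]
```

Sorting by admission date before de-duplicating is what guarantees that the admission kept for each patient is the earliest one; in HES, recognising repeat admissions of the same patient depends on the patient-level linkage described in section 4.4.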

Predictive research

As well as being used to examine past events, hospital episode statistics have been used in statistical modelling to predict future outcomes. In some cases predictions can identify high risk patients, opening the possibility of pre-emptive interventions. Episode data can also be used to predict the future incidence of disease to inform future health service provision.

Table 8: Examples of exploratory research

Naidoo et al. (2000) Modelling the short term consequences of smoking cessation in England on the hospitalisation rates for acute myocardial infarction and stroke

The study sought to estimate the short term consequences of attaining two smoking cessation targets for England in a cohort of 35 to 64 year olds, in terms of the number of acute myocardial infarctions (heart attacks) and strokes avoided and the associated healthcare costs. These outcomes were selected because they are two of the most common diseases whose risk is increased by smoking. The effects of achieving, first, the government’s smoking cessation target and, second, the reduction in smoking seen in California were simulated using a spreadsheet model based on previous work, and compared with the same cohort had they continued to smoke. A range of data sources were used in this piece of research; HES data were used to calculate the admission rates for acute myocardial infarction and stroke as the cohort aged.

Billings et al. (2006) Case finding for patients at risk of readmission to hospital: development of algorithm to identify high risk patients

The aim of this study was to develop a means of identifying patients at high risk of readmission to hospital within 12 months that could be used by PCTs and general practices. The algorithm was developed using a 10% sample of patients admitted to NHS trusts in England between 1999 and 2004 with a range of conditions for which improved management could help to decrease future admissions. HES data were extracted to create this development sample and a further 10% sample against which the algorithm was validated.
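One common way to draw two non-overlapping samples of roughly 10% of patients, one to develop a risk algorithm and one to validate it, is to hash each patient identifier into 100 stable buckets. This is a minimal Python sketch of that general technique, not the sampling method used by Billings et al.; the bucket ranges and identifiers are assumptions for illustration.

```python
import hashlib

def bucket(patient_id: str, n_buckets: int = 100) -> int:
    """Map a patient identifier to a stable bucket in 0..n_buckets-1."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def split_samples(patient_ids):
    """Return disjoint ~10% development and ~10% validation samples."""
    development, validation = [], []
    for pid in patient_ids:
        b = bucket(pid)
        if b < 10:
            development.append(pid)   # buckets 0-9
        elif b < 20:
            validation.append(pid)    # buckets 10-19
    return development, validation

ids = [f"patient{i:05d}" for i in range(10_000)]  # hypothetical identifiers
dev, val = split_samples(ids)
print(len(dev), len(val))
```

Because the assignment depends only on the identifier, every episode for a given patient falls into the same sample, and rerunning the split on a later extract reproduces the same allocation.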

Research examining the HES data itself

Given the range of research topics that have been examined using HES data, it is not surprising that attention has also been given to data quality and to the statistical methods used in analysis. Studies have compared HES data with information from other sources, linked HES data with other datasets and considered the use of HES data in assessing hospital performance.

Table 9: Examples of research examining HES data itself

Jack et al. (2006) Ethnicity coding in a regional cancer registry and in Hospital Episode Statistics

Following the observation that ethnicity is generally not well recorded in UK cancer registries, this piece of research compared the completeness of ethnicity coding within HES data with the Thames Cancer Registry (TCR) database. This study draws on TCR data for the calendar year 2002 and the HES dataset for the financial year 2002-2003. Data fields covering whether ethnicity was recorded, age, sex, deprivation, site of cancer and proportion of non-whites in the local population were considered. As the datasets defined cancer records differently and covered different time periods, no attempts were made to match datasets and validate ethnic code information between them.

Jen et al. (2008) Descriptive study of selected healthcare-associated infections using national Hospital Episode Statistics data 1996-2006 and comparison with mandatory reporting systems

The objective of this study was to compare rates of Clostridium difficile (C. difficile) recorded in hospital episode statistics with mandatory reporting data from the Health Protection Agency (HPA). HES data from 1996 to 2007 were examined to identify inpatients aged 65 years or over with a diagnosis of C. difficile. Total bed-days for inpatients of this age were used as the denominator so that frequencies and rates of infection could be calculated. Episodes relating to surgical site infections following orthopaedic procedures were also extracted and rates calculated. Infection rates within HES were calculated by year, age, sex, method of admission, deprivation and co-morbidity and considered in relation to patient outcomes. Logistic regression was used to examine the likelihood of infection for both C. difficile and orthopaedic surgical site infections. The number and rates of infection recorded in the HES data and HPA data were also compared.

Chapter 3 - Access to the data

Details on how to access HES data are available on the ADLS website (http://www.adls.ac.uk), so they are not discussed further here. However, it is useful to give a broad picture of who is given access, the different levels of access that are available and who is able to access the most sensitive data.

What different levels of access to data are there?

Data that are not available directly from the HES website can be ordered from HES in the form of tailor-made reports. A tabulation or extract can be requested by any individual or organisation, subject to HES terms and conditions and the Data Protection Act. The data are available at four ‘service levels’, which differ in the level of detail and sensitivity of the constituent data. The four service levels are described below:

Service Level 1 - Bespoke Anonymised Tabulations
This is an interactive query service where tabulations present average or aggregate results to a particular data specification. Tabulations range in complexity from simple single figure charts to complex cross-tabulations and may be downloaded by all users.

Service Level 2 - Bespoke Anonymised Extracts
An extract contains individual level data for relevant cases, showing selected information for each case.

Service Level 3 - Bespoke Approved Extracts
An extract as described above but containing data fields identified as sensitive, including date of birth, NHS Number and patient postcode.

Service Level 4 - Monthly Managed Service
This provides provisional monthly pseudonymised extracts and is aimed at organisations that utilise all of the HES data.

Certain diagnoses are considered especially sensitive, and in some cases these data are restricted and may not be released. This includes abortion, neurosurgery for mental disorders, diagnoses of HIV and AIDS, sexually transmitted diseases and IVF. Official figures relating to these cases can instead be obtained from the relevant organisation.

Who gets given highly sensitive data?

Requests for HES data extracts are not restricted by type of user, so data may be requested by students as well as by researchers from the academic, commercial and not-for-profit sectors. However, access to the higher service levels is contingent upon the fulfilment of conditions relating to the use and security of the data.

What legitimate reasons are there for requesting such data?

Organisations receiving person identifiable data or sensitive data from HES must provide justification for requiring such data. They are also obliged to provide details of their security measures and safeguards with respect to the access, processing and storage of data.

Researchers receiving extracts of patient identifiable HES data are required to establish a legal basis for doing so before the data are supplied. This involves either obtaining consent from patients or relying on Section 251 of the NHS Act 2006, which exempts the user from obtaining consent in cases where patient data are necessary and consent is not practicable14; a further application is necessary for approval in this case. Where sensitive data are requested, special approval is required from the Database Monitoring Sub-Group.

14 http://www.ic.nhs.uk/services/medical-research-information-service/the-application-process/use-of-patientidentifiable-data-without-subjects-consent [accessed 7/7/10]

Chapter 4 - Data preparation

Large administrative datasets have many statistical advantages over more conventional survey data; however, they also require a different set of considerations. Drawing on examples from health research, the following section provides an overview of some common data preparation issues arising in the analysis of administrative datasets.

4.1 Software

A key difference between survey and administrative data is the size of the datasets, with administrative datasets often containing many millions of records. For example, the HES inpatient dataset for 2008-09 contains more than 16 million records. Depending on the analysis being undertaken (e.g. the number of records, variables and years of data included, any requirement to link to other datasets, and the statistical procedures used), some statistical software packages may not be suitable. For most uses Stata or SPSS are sufficient; however, SAS is capable of handling very large datasets and may occasionally be required. A computer with a large hard drive, or plenty of server space, is necessary to process very large datasets, and a computer with a large processing capacity is invaluable for increasing the speed with which analysis can be carried out.
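Whatever package is used, a common way to keep memory use manageable is to stream records one at a time rather than loading a whole extract at once. The short Python sketch below shows the idea with the standard `csv` module; the column names (`fyear`, `diag_01`) and the small in-memory sample are hypothetical stand-ins for a real extract file.

```python
import csv
import io
from collections import Counter

def count_by_year(lines):
    """Stream CSV rows one at a time and count episodes per financial year."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row["fyear"]] += 1
    return counts

# In practice `lines` would be an open file, e.g. open("extract.csv", newline="");
# here a small in-memory sample stands in for it.
sample = io.StringIO(
    "episode_id,fyear,diag_01\n"
    "1,2007-08,J45.9\n"
    "2,2008-09,I21.0\n"
    "3,2008-09,J45.0\n"
)
counts = count_by_year(sample)
print(dict(counts))  # {'2007-08': 1, '2008-09': 2}
```

Only the current row and the running counts are held in memory, so the same function works on a file of any size; the trade-off is that analyses needing several passes over the data must re-read the file.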

4.2 Filtering the data

The size and complexity of the HES datasets mean that, unlike some other administrative datasets, users are not supplied with the entire dataset. Custom tables of aggregate information can be downloaded from the HES website. To obtain individual level data, users make a bespoke request specifying their exact requirements, and the relevant data are extracted from HES and supplied to them. These specifications may include, amongst other things, the year of data required, the range of diagnostic codes, the area of residence of the patients, the age of the patient and details of operative and surgical procedures. Filtering at the time the extract is requested minimises the size of the extract, making storage and processing easier.
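An extract specification of this kind is effectively a set of conditions that every supplied record must satisfy. The Python sketch below expresses a hypothetical specification as a small data class and applies it to sample records. The field names, area code and criteria are invented for illustration and do not reflect the actual HES request process, in which the filtering is carried out by HES before the extract is supplied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractSpec:
    """Hypothetical extract specification: every condition must hold."""
    years: frozenset      # financial years, e.g. {"2008-09"}
    diag_prefixes: tuple  # ICD-10 prefixes for the primary diagnosis
    min_age: int
    max_age: int
    areas: frozenset      # area-of-residence codes (hypothetical)

    def matches(self, rec: dict) -> bool:
        return (
            rec["fyear"] in self.years
            and rec["diag_01"].startswith(self.diag_prefixes)
            and self.min_age <= rec["age"] <= self.max_age
            and rec["area"] in self.areas
        )

spec = ExtractSpec(
    years=frozenset({"2008-09"}),
    diag_prefixes=("J45", "J46"),  # ICD-10 asthma codes, used as an example
    min_age=5,
    max_age=14,
    areas=frozenset({"AREA_1"}),
)
records = [
    {"fyear": "2008-09", "diag_01": "J45.9", "age": 8, "area": "AREA_1"},
    {"fyear": "2008-09", "diag_01": "J45.9", "age": 40, "area": "AREA_1"},  # too old
    {"fyear": "2007-08", "diag_01": "J46.X", "age": 10, "area": "AREA_1"},  # wrong year
]
extract = [r for r in records if spec.matches(r)]
print(len(extract))  # 1
```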

4.3 Data quality

As the HES datasets are widely used by government departments and other key stakeholders, data quality is prioritised to ensure the highest possible standards are reached. The following sections present a discussion of data quality and how this is promoted.

Cleaning the data

The importance of maintaining high quality datasets means that several stages of cleaning are applied to the data within HES before it is published or released to researchers. The promotion of data quality within HES means that data extracts do not require any further cleaning. In addition, regular data quality checks are applied throughout the year and the HES data quality team contact the individual trusts if problems are identified. There are a number of recognised data quality issues with data relating to maternity episodes, and for this reason maternity data also undergoes additional validation, including cross comparisons with data on registered births from ONS.

Four stages of data cleaning are undertaken by HES:

1. Provider mapping
Old or invalid provider codes are corrected to produce valid provider codes.

2. Automatic cleaning
A pre-defined list of cleaning rules is applied to remove or correct common errors and promote data quality. This is applied at the end of each financial year.

3. Manual cleaning
This involves the removal of duplicate records as well as records outside the relevant date range. Some trusts also specify their own data correction rules. This element of cleaning is applied at the end of each financial year.

4. Derivation
Information is derived from existing data fields. This includes group fields, such as age and descriptions of procedures and diagnoses. Reference data from the ONS Postcode Directory is used to derive geographical variables on location of residence and treatment.
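HES carries out these steps itself before release, so users do not need to repeat them; the Python sketch below simply illustrates what the manual cleaning and derivation stages involve. The field names (`epikey`, `epistart`, `age`) and the age bands are hypothetical.

```python
from datetime import date

def manual_clean(records, start, end):
    """Drop exact duplicate records and records outside [start, end]."""
    seen, cleaned = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))  # whole record as a hashable key
        if key in seen or not start <= rec["epistart"] <= end:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

def derive_age_group(age):
    """Derive a grouped age field (hypothetical bands)."""
    if age < 16:
        return "0-15"
    if age < 65:
        return "16-64"
    return "65+"

records = [
    {"epikey": 1, "epistart": date(2008, 6, 1), "age": 70},
    {"epikey": 1, "epistart": date(2008, 6, 1), "age": 70},   # exact duplicate
    {"epikey": 2, "epistart": date(2007, 12, 1), "age": 30},  # before 2008-09
    {"epikey": 3, "epistart": date(2009, 1, 15), "age": 9},
]
cleaned = manual_clean(records, date(2008, 4, 1), date(2009, 3, 31))
for rec in cleaned:
    rec["age_group"] = derive_age_group(rec["age"])
print([(r["epikey"], r["age_group"]) for r in cleaned])  # [(1, '65+'), (3, '0-15')]
```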

Data errors

Data quality notes relating to each of the three datasets are released annually, each containing details of all known data issues within that particular annual publication15. They are updated when new issues are identified, and users are encouraged to check for updates before using the data. These notes typically discuss national level anomalies and known issues specific to individual trusts; each issue is accompanied by an impact statement.

15 http://www.hesonline.nhs.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=1189; http://www.hesonline.nhs.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=898 [accessed on 5/7/10]

A guide to the use of HES data produced by the Royal College of Physicians explained that the most reliable fields are those that are widely used. The main source of errors lies between the recording and the coding of clinical information, making the accuracy of information supplied by physicians crucial to data quality (Royal College of Physicians & UHCE, 2007).

Variation in collection and recording methods

The compulsory collection of data involves several stages, with different individuals responsible for different components of the process. Williams and Mann (2002) summarised seven steps in clinical data collection, involving doctors, hospital management, coders and finally the Department of Health. To encourage maximum accuracy and consistency, responsibility for entering patient data onto the hospital’s Patient Administration System (PAS) lies with trained clinical coders, using national rules. Clerical staff are responsible for the input of administrative data (such as start and end dates and admission type) onto PAS (Royal College of Physicians, 2007). Despite this, there inevitably remains some margin for variation in the approaches taken to collecting and recording data, and the reliability of HES data has often been criticised, especially in its early stages (Williams and Mann, 2002).

The greatest scope for variation arises first in doctors’ notes documenting diagnoses and procedures (or a list of signs and symptoms where a diagnosis has not been ascertained), and then in the identification and coding of relevant diagnoses and procedures by coders. In the former case, genuine clinical uncertainty places an upper limit on the validity of patient records. Additionally, the data include only surgical procedures and not counselling, clinical examination or the administration of drugs, potentially compromising the completeness of records (McKee and James, 1997). In the latter case, identifying the correct diagnoses and procedures involves some interpretation of doctors’ notes by coders, even when the relevant data dictionary is used. While collaboration between clinicians and coders, in particular sign-off by clinicians, is expected to produce the most accurate results (McKee, 1993; McKee and James, 1997), whether this happens in practice is questionable. Verification of coders’ work by clinicians would add an important level of quality assurance to the translation of doctors’ notes into medical records.

To minimise translation errors, HES contains 20 diagnosis fields which can hold information about a patient’s illness or condition (recorded using ICD-10 codes), an increase from 14 before April 2007 and seven before April 2002. Having many fields available for diagnostic codes helps to prevent loss of information as records are summarised, and allows co-morbidities to be recorded without being displaced by complications of the principal diagnosis. Twenty-four fields are allocated to recording procedures and interventions (recorded using OPCS-4.4 codes), an increase from 12 before 2007-08 and four before 2002-03. The first field is expected to contain the code of the main procedure or intervention, usually the most resource intensive one.

However, the use of ICD-10 coding has been questioned by some researchers (Williams and Mann, 2002). The main purpose of ICD coding is to record mortality and morbidity data, not the management of care, so the appropriateness of this coding system for HES has been questioned (Hobbs et al., 1997). Another major concern is that ICD codes do not distinguish between a suspected and a confirmed case, which has implications both for clinicians and for researchers using the data to estimate disease prevalence (Prins et al., 2002). Coding problems can also be expected because ICD codes do not contain an explicit definition of disease, so coders who have not received clinical training may have difficulty assigning the correct codes.

In addition to individual level variation in data collection and recording, trusts have different ways of managing specialties. Consultants are registered under a particular main specialty, which may differ from the treatment specialty under which the consultant with prime responsibility for the patient is working. Consultants may also work across more than one treatment function code area. Care must therefore be taken when analysing HES data by specialty or by groups of specialties, especially if comparisons are being made between trusts.
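Because the number of diagnosis fields has grown over time (seven before April 2002, 14 before April 2007 and 20 since), analyses spanning these dates need to decide whether to read every available field or only those present throughout. The Python sketch below illustrates one option, capping the number of fields read; the field names DIAG_01 to DIAG_20 and the record layout are assumptions for illustration.

```python
from datetime import date
from typing import Optional

def diag_fields_available(epistart: date) -> int:
    """Number of diagnosis fields in use at the episode start date."""
    if epistart < date(2002, 4, 1):
        return 7
    if epistart < date(2007, 4, 1):
        return 14
    return 20

def diagnoses(record: dict, cap: Optional[int] = None) -> list:
    """Non-empty codes from DIAG_01, DIAG_02, ... in field order.

    With cap=None every field available at the episode date is read; a fixed
    cap (e.g. 7) keeps co-morbidity counts comparable across periods.
    """
    n = diag_fields_available(record["epistart"])
    if cap is not None:
        n = min(n, cap)
    codes = (record.get(f"DIAG_{i:02d}", "") for i in range(1, n + 1))
    return [c for c in codes if c]

rec = {"epistart": date(2008, 5, 1), "DIAG_01": "I21.0", "DIAG_02": "E11.9", "DIAG_09": "I10"}
print(diagnoses(rec))         # ['I21.0', 'E11.9', 'I10']
print(diagnoses(rec, cap=7))  # ['I21.0', 'E11.9']
```

Capping discards some recorded information for recent years, so it suits trend analyses better than cross-sectional studies of a single recent year.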

Missing data The data quality notes also detail shortfalls in coverage, listing which trusts have missing data, the months affected and an estimate of the number of records missing, along with a warning that these trusts’ data should be interpreted with caution. Coverage is assessed by calculating the proportion of trusts submitting data over time. These calculations exclude the independent sector because HES are not able to determine the number of independent sector trusts who conduct outpatient appointments. It is worth noting that if the data field from which other fields are derived is unknown or incorrect, the data quality of derived variables will also be compromised. For example, geographical variables such as county of residence and electoral ward are derived from postcode information so if this field is missing or has been

entered incorrectly, the quality of derived variables will also be incorrect or contain errors. Within the HES data there are certain known patterns of missing data which are worth highlighting here. Some of these omissions relate to nonmandatory fields where submission is not obligatory, resulting in a greater proportion of missing data than would be expected for other variables. For example, the ‘last did not attend date’ variable records the date of the patient’s last appointment that they did not attend and completion of this field is not mandatory. Despite this, coverage is good, reaching 96.0% in 2007-08. Other omissions are associated with data fields that should be completed and returned. For example, only a small number of providers of genito-urinary medicine (GUM) clinics have historically submitted data, generally due to local concerns regarding the handling of sensitive data. In the year 2007-08, just 20 trusts submitted data from a total of 304 providers, equating to coverage of just 6.6%. More broadly, shortfalls in certain areas including maternity, psychiatric and adult critical care data are widely recognised. In cases where a previously voluntary data field becomes compulsory, completion of the field will not be immediate. Submission of the ethnic

36

origin field became mandatory in April 2008 and it is expected that completion of the field will improve over time. The number of independent hospital providers submitting data has increased in recent years, albeit from a very low base. The national dataset requires NHS-commissioned independent sector data to be returned so that waiting times can be calculated comprehensively. Additionally, the independent sector accounts for a significant proportion of elective care in certain fields (for example, diagnostics, ENT and orthopaedics), making data submissions vital both for assessing the contribution of the independent sector and for avoiding bias in the national dataset. The completeness of coverage and quality of data are expected to improve once payments for care are fully tied to these data submissions. This is anticipated to incentivise good record-keeping and comprehensive data returns, as payments may be affected if data is not submitted.

Discontinuity

Changing administrative requirements, the introduction of new clinical classifications, and revisions to clinical coding and coding schemes have led to modifications of the datasets over time. In addition to changes in clinical coding, the values within data items can change over time, and such changes may be less well publicised. Organisational changes further undermine consistency across years.

Changes to clinical coding

Most significantly, in 1995 the recording of diagnoses changed from the 9th to the 10th revision of the ICD; consequently the 1995-96 dataset contains a mixture of both classifications. Care must therefore be taken when comparing diagnoses across these years. The OPCS Classification of Interventions and Procedures is reviewed annually to ensure that it continues to include the latest procedures and reflects current clinical practice. To minimise disruption to the dataset, all codes present in previous versions are incorporated in the latest revision, currently OPCS-4.4. Measures are also being taken to handle the difficulties caused by changes to clinical coding: from April 2003, HES inpatient data was made available in normalised form for ease of interpretation across years, with the values of certain items modified so that, as far as possible, they conform to contemporary standards.

Organisational changes

Trusts regularly undergo mergers which alter trust boundaries, and changes to the organisational structure of the NHS are also frequent, interrupting the temporal consistency of the datasets. This has implications for time series analyses, as changing hospital boundaries and corresponding effects on case mix may lead to artefactual changes in the data. PCTs and Strategic Health Authorities (SHAs) were both restructured between 2005-06 and 2006-07, and HES therefore includes two versions of the SHA field, based on the old and new SHA structures respectively. The quality of data in derived fields is also affected by organisational changes.
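To keep a time series on consistent organisational units, one option is to map every historical trust code to its present-day successor before aggregating. The Python sketch below assumes a hypothetical one-to-one predecessor-to-successor lookup (`SUCCESSOR`) with made-up codes; a real lookup would have to come from official NHS organisation reference data.

```python
from collections import defaultdict

# Hypothetical predecessor -> successor lookup with made-up codes; a real
# mapping would come from official NHS organisation reference data.
SUCCESSOR = {
    "TRUST_A": "TRUST_AB",  # TRUST_A merged into TRUST_AB
    "TRUST_B": "TRUST_AB",  # TRUST_B merged into TRUST_AB
}


def current_code(code):
    """Follow the chain of mergers until reaching a code with no successor."""
    seen = set()
    while code in SUCCESSOR:
        if code in seen:
            raise ValueError(f"cycle in successor map at {code}")
        seen.add(code)
        code = SUCCESSOR[code]
    return code


def harmonised_counts(episodes):
    """Count episodes per (financial year, present-day trust code).

    `episodes` is an iterable of (year, trust_code) pairs.
    """
    counts = defaultdict(int)
    for year, trust in episodes:
        counts[(year, current_code(trust))] += 1
    return dict(counts)


episodes = [
    ("2005-06", "TRUST_A"),
    ("2005-06", "TRUST_B"),
    ("2006-07", "TRUST_AB"),
    ("2006-07", "TRUST_C"),
]
print(harmonised_counts(episodes))
# {('2005-06', 'TRUST_AB'): 2, ('2006-07', 'TRUST_AB'): 1, ('2006-07', 'TRUST_C'): 1}
```

Re-mapping aligns the units being compared across years, but it does not remove the case-mix effects described above, and a trust that splits into several successors cannot be handled by a simple one-to-one lookup.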

4.4 Linking patient episodes

The identification and linkage of episodes relating to a single patient is promoted within HES so that patient activity can be followed accurately over time. This is especially important for the analysis of chronic disease and for calculating readmission rates (Royal College of Physicians & UHCE, 2007). In recent years, measures have been taken to encourage data linkage and make it a straightforward process. A 32-character alphanumeric pseudonymised identifier known as the 'PSEUDO_HESID' has been added to HES to facilitate the linkage of episodes.¹⁶ As it does not contain any personal information about a patient, it cannot be unscrambled to reveal identifiable details. A matching algorithm compares activity records (using NHS number, date of birth, sex, postcode and local patient identifier) to identify episodes relating to the same patient before assigning the PSEUDO_HESID to every episode. Patients can therefore be identified for linkage purposes without disclosing other data fields that could reveal their identity. For compliance with NHS standards, the PSEUDO_HESID is encrypted at the 256-bit level. To further protect patient identity, each requested extract receives a unique version of the PSEUDO_HESID, called the EXTRACT_HESID, which means that identifiers held by different customers cannot be linked together. The PSEUDO_HESID superseded the HESID in 2009, to minimise the risk of patient identification (as described above) and to improve the matching of records where data quality is poor. Past concerns about the accuracy of matching (i.e. the same individual not being assigned an identical HESID), and thus about potential bias in record linkage, should therefore now be largely addressed. Details of the changes to the matching algorithm can be found in HES (2009).
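The actual HES matching algorithm and encryption scheme are not described here, but the general idea of a per-extract pseudonym can be illustrated. The Python sketch below is a hypothetical simplification: `match_key` uses the NHS number when present and otherwise falls back to date of birth, sex and postcode, and `extract_pseudonym` substitutes a keyed hash (HMAC-SHA256 from Python's standard library), truncated to 32 hexadecimal characters, for whatever scheme HES really uses. All field names, records and keys are invented.

```python
import hashlib
import hmac


def match_key(record):
    """Deterministic matching key for one activity record (hypothetical rule).

    Use the NHS number when present; otherwise fall back to date of birth,
    sex and postcode, which is weaker and can merge or split patients.
    """
    nhs = (record.get("nhs_number") or "").replace(" ", "")
    if nhs:
        return "NHS:" + nhs
    postcode = (record.get("postcode") or "").replace(" ", "").upper()
    return f"DSP:{record['dob']}|{record['sex']}|{postcode}"


def extract_pseudonym(record, extract_key):
    """Keyed hash (HMAC-SHA256) of the match key, truncated to 32 hex characters.

    The same record always maps to the same value under one key, while a
    different key per extract produces pseudonyms that cannot be joined.
    """
    mac = hmac.new(extract_key, match_key(record).encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:32]


# Fictitious records: the same NHS number written two different ways.
r1 = {"nhs_number": "123 456 7890", "dob": "1950-01-01", "sex": "1", "postcode": "AB1 2CD"}
r2 = {"nhs_number": "1234567890", "dob": "1950-01-01", "sex": "1", "postcode": "ab12cd"}

key_customer_1 = b"secret-for-extract-1"  # hypothetical per-extract secrets
key_customer_2 = b"secret-for-extract-2"

print(extract_pseudonym(r1, key_customer_1) == extract_pseudonym(r2, key_customer_1))  # True
print(extract_pseudonym(r1, key_customer_1) != extract_pseudonym(r1, key_customer_2))  # True
```

Because each customer's extract is hashed with a different secret, the same patient receives unrelated pseudonyms in different extracts, mirroring the EXTRACT_HESID idea. Unlike a genuine matching algorithm, this sketch makes no attempt to reconcile records whose fields disagree, such as one record carrying an NHS number and another lacking it.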

16 http://www.hesonline.nhs.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=330 [accessed 4/8/10]


Chapter 5 - Strengths and limitations of administrative data

General strengths and limitations common to all three HES datasets are listed below:

Strengths

• All HES datasets can, in theory, be linked, as they contain identifying fields such as NHS number, year of birth, sex and postcode. Linking datasets allows the 'patient journey' to be examined.
• It is possible to link the data with other datasets, for example ONS mortality statistics and police records.
• A large number of variables are recorded, allowing a wide range of research possibilities.
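As one illustration of following the patient journey, linked records make it possible to count readmissions. The Python sketch below counts hospital spells that are followed by another admission of the same patient within a fixed window after discharge. It assumes spells have already been linked on a pseudonymised patient identifier; the tuple layout and the 28-day default window are assumptions for illustration, not HES definitions, and real readmission indicators typically apply further restrictions, such as counting only emergency readmissions.

```python
from collections import defaultdict
from datetime import date


def count_readmissions(spells, window_days=28):
    """Count spells followed by a new admission of the same patient within
    `window_days` of discharge.

    spells: iterable of (patient_id, admission_date, discharge_date) tuples,
    where patient_id is a linked pseudonymised identifier (hypothetical).
    """
    by_patient = defaultdict(list)
    for pid, admitted, discharged in spells:
        by_patient[pid].append((admitted, discharged))
    readmissions = 0
    for stays in by_patient.values():
        stays.sort()  # chronological order by admission date
        for (_, discharged), (next_admitted, _) in zip(stays, stays[1:]):
            if 0 <= (next_admitted - discharged).days <= window_days:
                readmissions += 1
    return readmissions


spells = [
    ("p1", date(2008, 1, 1), date(2008, 1, 5)),
    ("p1", date(2008, 1, 20), date(2008, 1, 22)),  # 15 days after discharge
    ("p1", date(2008, 6, 1), date(2008, 6, 3)),    # more than 28 days later
    ("p2", date(2008, 2, 1), date(2008, 2, 2)),
]
print(count_readmissions(spells))  # 1
```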

Limitations

• HES data covers only hospital trusts in England; it is not UK-wide.
• No HES dataset covers primary care.
• Information is collected directly from NHS organisations, so there may be coding inconsistencies.
• Time-series analyses can be problematic because of changes to trust boundaries and mergers between trusts.
• Changes to ICD codes and coding rules can lead to artefactual changes in disease rates.
• Admission rates are not necessarily a good measure of condition prevalence or morbidity, because admission policies vary.
• Performance measures will partly reflect variation in hospital case mix.
• Gaps in information gathering mean that HES data is not suitable for all analyses, for example those concerning alcohol and drug misuse, cancelled operations and rates of hospital-acquired infections.
• HES does not capture data for individuals who die before arriving at hospital or who die after discharge, although this information can be matched from ONS mortality records. HES also does not record the cause of death.¹⁷
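Because HES does not record deaths after discharge, post-discharge mortality has to be derived by matching to ONS mortality records. The Python sketch below assumes the two sources have already been linked on a shared pseudonymised patient identifier; the field names (`patient_id`, `discharge_date`) and the 30-day window are illustrative assumptions. It flags each episode that is followed by a death within the window.

```python
from datetime import date


def died_within(episodes, deaths, window_days=30):
    """Flag episodes whose patient died within `window_days` of discharge.

    episodes: list of dicts with 'patient_id' and 'discharge_date'.
    deaths: dict mapping patient_id -> date of death from a linked
    mortality file (hypothetical structure).
    Returns a list of booleans aligned with `episodes`.
    """
    flags = []
    for ep in episodes:
        died = deaths.get(ep["patient_id"])
        if died is None:
            flags.append(False)  # no linked death record
        else:
            gap = (died - ep["discharge_date"]).days
            flags.append(0 <= gap <= window_days)
    return flags


episodes = [
    {"patient_id": "p1", "discharge_date": date(2008, 5, 1)},
    {"patient_id": "p2", "discharge_date": date(2008, 5, 1)},
    {"patient_id": "p3", "discharge_date": date(2008, 5, 1)},
]
deaths = {"p1": date(2008, 5, 20), "p2": date(2008, 8, 1)}
print(died_within(episodes, deaths))  # [True, False, False]
```

The sketch only handles deaths after discharge; deaths before arrival at hospital never generate a HES episode and so cannot be recovered this way.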

17 http://www.hesonline.nhs.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=1004 [accessed on 5/7/10]


Strengths and limitations specific to each of the three HES datasets are discussed below.

HES A&E dataset

Strengths

• A personal identifier field was introduced in 2007-08. A patient identifier for previous years can be approximated from other fields (date of birth, sex and postcode), but accuracy is compromised.
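Why accuracy suffers is easy to see in a toy example: two different people can share a date of birth, sex and postcode (a false merge), while one person who moves house acquires two keys (a false split). The Python sketch below assumes a hypothetical test file in which a true identifier (`true_id`) is known, which is how such error counts could be estimated; it is not a HES procedure, and all records are fictitious.

```python
from collections import defaultdict


def approx_id(rec):
    """Composite stand-in for a personal identifier: DOB, sex, postcode."""
    return (rec["dob"], rec["sex"], rec["postcode"].replace(" ", "").upper())


def key_errors(records):
    """Compare the composite key with a known true identifier.

    Returns (keys shared by more than one person,
             persons spread over more than one key).
    """
    people_per_key = defaultdict(set)
    keys_per_person = defaultdict(set)
    for rec in records:
        key = approx_id(rec)
        people_per_key[key].add(rec["true_id"])
        keys_per_person[rec["true_id"]].add(key)
    merged = sum(1 for people in people_per_key.values() if len(people) > 1)
    split = sum(1 for keys in keys_per_person.values() if len(keys) > 1)
    return merged, split


records = [
    {"true_id": 1, "dob": "1980-03-02", "sex": "2", "postcode": "AB1 2CD"},
    {"true_id": 2, "dob": "1980-03-02", "sex": "2", "postcode": "AB1 2CD"},  # twin at same address
    {"true_id": 3, "dob": "1975-07-15", "sex": "1", "postcode": "EF3 4GH"},
    {"true_id": 3, "dob": "1975-07-15", "sex": "1", "postcode": "IJ5 6KL"},  # moved house
]
print(key_errors(records))  # (1, 1)
```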

Limitations

• The dataset is currently labelled experimental by the NHS Information Centre and does not yet meet National Statistics standards.
• The dataset is incomplete because it is new and submission of information is not mandatory: it contains over seven million fewer A&E attendances than the Quarterly Monitoring of Accident and Emergency aggregate data.

HES inpatient (including maternity) dataset

Strengths

• The inpatient dataset is used by government departments, so data quality is strongly promoted.
• A personal identifier field was introduced in 1997-98. A patient identifier for previous years can be approximated from other fields (date of birth, sex and postcode), but accuracy is compromised.

Limitations

• Episode-level data is not a measure of patient admission rates, because a single admission can span several consultant episodes (Hansell et al., 2001).
• The definition of specialty codes is not consistent between trusts.
• Information relating to babies is less complete than other elements of the HES data.
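The gap between episodes and admissions can be made concrete by grouping episodes into spells. The Python sketch below uses a deliberately simplified, hypothetical rule (episodes sharing a patient identifier and an admission date form one spell); real spell derivation is more involved, for example when patients are transferred between providers. Field names and values are invented.

```python
from collections import defaultdict


def spells_from_episodes(episodes):
    """Group consultant episodes into spells.

    Simplifying assumption: episodes sharing a patient identifier and an
    admission date belong to the same spell.
    Returns {(patient_id, admission_date): sorted episode numbers}.
    """
    spells = defaultdict(list)
    for ep in episodes:
        spells[(ep["patient_id"], ep["admission_date"])].append(ep["episode_no"])
    return {key: sorted(nums) for key, nums in spells.items()}


episodes = [
    {"patient_id": "A", "admission_date": "2008-01-10", "episode_no": 1},
    {"patient_id": "A", "admission_date": "2008-01-10", "episode_no": 2},  # transfer to another consultant
    {"patient_id": "B", "admission_date": "2008-01-12", "episode_no": 1},
    {"patient_id": "A", "admission_date": "2008-03-01", "episode_no": 1},
]
spells = spells_from_episodes(episodes)
print(len(episodes), "episodes but", len(spells), "admissions")  # 4 episodes but 3 admissions
```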

HES outpatient dataset

Strengths

• Since 2006-07 the dataset has no longer been labelled experimental and is accredited as a National Statistic, meaning that it is produced in accordance with the 'Code of Practice for Official Statistics'.
• HES records the specialty of the consultant caring for the patient.
• A personal identifier field was introduced in 2002-03. A patient identifier for previous years can be approximated from other fields (date of birth, sex and postcode), but accuracy is compromised.

Limitations

• Many variables within the dataset are not mandatory, so their coverage is low.
• Recording of clinical codes is not mandatory, so it is not possible to tell what many patients are being treated for.
• The definition of specialty codes is not consistent between trusts.
• Coverage of genitourinary medicine (GUM) clinics is low: in 2007-08 just 6.6% of hospital providers submitted data, and in previous years the figure was even lower.
• Coverage of the independent sector is low, which undermines national monitoring of healthcare and makes it difficult to assess the contribution of the independent sector.
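Because many outpatient fields are optional, a sensible first step before analysis is to measure how often each field is actually populated. The Python sketch below assumes records are held as dictionaries and treats `None` or blank strings as missing; the field names and values are made up for illustration.

```python
def is_filled(value):
    """Treat None and blank/whitespace-only strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness(records, fields):
    """Percentage of records with a non-missing value for each field."""
    n = len(records)
    return {
        f: (100.0 * sum(is_filled(r.get(f)) for r in records) / n) if n else 0.0
        for f in fields
    }


records = [
    {"specialty": "100", "diagnosis": "K35"},
    {"specialty": "300", "diagnosis": ""},
    {"specialty": "101", "diagnosis": None},
    {"specialty": "110"},
]
print(completeness(records, ["specialty", "diagnosis"]))  # {'specialty': 100.0, 'diagnosis': 25.0}
```

Checking `value is not None` explicitly, rather than relying on truthiness, keeps legitimate values such as a numeric `0` from being counted as missing.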


Bibliography

Aylin, P., Bottle, A. and Majeed, A. (2007) 'Use of administrative data or clinical databases as predictors of risk of death in hospital: comparison of models', British Medical Journal, 334: 7602-7609.
Billings, J., Dixon, J., Mijanovich, T. and Wennberg, D. (2006) 'Case finding for patients at risk of readmission to hospital: development of algorithm to identify high risk patients', British Medical Journal, 333: 7563-7568.
Carstairs, V. and Morris, R. (1989) 'Deprivation: explaining differences in mortality between Scotland and England', British Medical Journal, 299: 886-9.
Dixon, J., Sanderson, C., Elliott, P., Walls, P., Jones, J. and Petticrew, M. (1998) 'Assessment of the reproducibility of clinical coding in routinely collected hospital activity data: a study in two hospitals', Journal of Public Health Medicine, 20 (1): 63-69.
Farndon, M.A., Wayman, J., Clague, M.B. and Griffin, S.M. (1998) 'Cost-effectiveness in the management of patients with oesophageal cancer', British Journal of Surgery, 85 (10): 1394-1398.
Farrar, S., Yi, D., Sutton, M., Chalkley, M., Sussex, J. and Scott, A. (2009) 'Has payment by results affected the way that English hospitals provide care? Difference-in-difference analysis', British Medical Journal, 339.
Fraser, S.G. and Wormald, R.P.L. (2006) 'Hospital Episode Statistics and changing trends in glaucoma surgery', Eye, 22: 3-7.
Gravelle, H., Dusheiko, M., Sheaff, R., Sargent, P., Boaden, R., Pickard, S., Parker, S. and Roland, M. (2007) 'Impact of case management (Evercare) on frail elderly patients: controlled before and after analysis of quantitative outcome data', British Medical Journal, 334: 31-34.
Groom, L., Kendrick, D., Coupland, C. and Hippisley-Cox, J. (2006) 'Inequalities in hospital admission rates for unintentional poisoning in young children', Injury Prevention, 12: 166-170.

Hansell, A., Bottle, A., Shurlock, L. and Aylin, P. (2001) 'Accessing and using hospital activity data', Journal of Public Health Medicine, 23 (10): 51-56.
Harley, K. and Jones, C. (1996) 'Quality of Scottish Morbidity Record (SMR) data', Health Bulletin (Edinburgh), 54 (5): 410-7.
HES (2009) Replacement of the HES Patient ID (HESID), The Health and Social Care Information Centre.
Hobbs, F.D.R., Parle, J.V. and Kenkre, J.E. (1997) 'Accuracy of routinely collected clinical data on acute medical admissions to one hospital', British Journal of General Practice, 47: 439-440.
Jack, R., Linklater, K., Hofman, D., Fitzpatrick, J. and Moller, H. (2006) 'Ethnicity coding in a regional cancer registry and in Hospital Episode Statistics', BMC Public Health, 6: 281.
Jen, M.H., Holmes, A.H., Bottle, A. and Aylin, P. (2008) 'Descriptive study of selected healthcare-associated infections using national Hospital Episode Statistics data 1996-2006 and comparison with mandatory reporting systems', Journal of Hospital Infection, 70: 321-327.
Jit, M. and Edmunds, W.J. (2007) 'Evaluating rotavirus vaccination in England and Wales Part II. The potential cost-effectiveness of vaccination', Vaccine, 25: 3971-3979.
Judge, A., Evans, S., Gunnell, D.J., Albertson, P.C., Verne, J. and Martin, R.M. (2007) 'Patient outcomes and length of hospital stay after radical prostatectomy for prostate cancer: analysis of Hospital Episodes Statistics for England', BJU International, 100: 1040-1049.
Kovats, R.S., Hajat, S. and Wilkinson, P. (2004) 'Contrasting patterns of mortality and hospital admissions during hot weather and heat waves in Greater London, UK', Occupational and Environmental Medicine, 61: 893-898.
Lenaghan, E., Holland, R. and Brooks, A. (2007) 'Home-based medication review in a high risk elderly population in primary care: the POLYMED randomised controlled trial', Age and Ageing, 36: 292-297.

Maxwell, R., Trotter, C., Verne, J., Brown, P. and Gunnell, D. (2007) 'Trends in admissions to hospital involving an assault using a knife or other sharp instrument, England, 1997-2005', Journal of Public Health, 29: 186-190.
McKee, M. (1993) 'Routine data: a resource for clinical audit?', Quality in Health Care, 2: 104-111.
McKee, M. and James, P. (1997) 'Using routine data to evaluate quality of care in British hospitals', Medical Care, 35 (10): 102-111.
Naidoo, B., Stevens, W. and McPherson, K. (2000) 'Modelling the short term consequences of smoking cessation in England on the hospitalisation rates for acute myocardial infarction and stroke', Tobacco Control, 9: 397-400.
Pollock, A.M. and Vickers, N. (1998) 'Deprivation and emergency admissions for cancers of colorectum, lung, and breast in south east England: ecological study', British Medical Journal, 317: 245-252.
Prins, H., Kruisingha, F.H., Buller, H.A. and Zwetsloot-Schlonk, J.H.M. (2002) 'Availability and usability of data for medical practice assessment', International Journal for Quality in Health Care, 14 (2): 127-137.
Propper, C., Sutton, M., Whitnall, C. and Windmeijer, F. (2008) 'Did "targets and terror" reduce waiting times in England for hospital care?', The B.E. Journal of Economic Analysis and Policy, 8 (2): 5.
Royal College of Physicians (2007) Hospital Activity Data: A Guide for Clinicians.
Royal College of Physicians and Unit of Health-Care Epidemiology (UHCE), University of Oxford (2007) HES for Physicians: A Guide to the Use of Information Derived from Hospital Episode Statistics.
Thompson, A., Shaw, M., Harrison, G., Ho, D., Gunnell, D. and Verne, J. (2004) 'Patterns of hospital admission for adult psychiatric illness in England: analysis of Hospital Episode Statistics data', The British Journal of Psychiatry, 185: 334-341.

Townsend, P., Phillimore, P. and Beattie, A. (1986) Inequalities in Health in the Northern Region: An Interim Report, Northern Regional Health Authority and the University of Bristol.
Townsend, P., Corrigan, P. and Kowarzik, U. (1987) Poverty and the London Labour Market: The Third London Survey: Interim Report, London: The Low Pay Unit.
Wilkinson, P., Elliott, P., Grundy, C., Shaddick, G., Thakrar, B., Walls, P. and Falconer, S. (1999) 'Case-control study of hospital admission with asthma in children aged 5-14 years: relation with road traffic in north-west London', Thorax, 54: 1070-1074.
Williams, J.G. and Mann, R.Y. (2002) 'Hospital episode statistics: time for clinicians to get involved?', Clinical Medicine, 2 (1): 34-37.

Administrative Data Liaison Service
University of St Andrews, Irvine Building
North Street
St Andrews, KY16 9AL
Tel: +44 (0)1334 463901
Email: [email protected]


www.adls.ac.uk
