Population Trends Autumn 1999 - Office for National Statistics

Population Trends 97

Autumn 1999

The potential to identify South Asians using a computerised algorithm to classify names

Seeromanie Harding, London School of Hygiene and Tropical Medicine; Howard Dews, Nottingham Health Authority; Stephen Ludi Simpson, City of Bradford Metropolitan District

Ethnic studies are limited by the lack of information on ethnic origin in data sources. This paper examines a computerised approach to identify South Asians by names. Computerised analysis of names is not an ideal solution, but it helps to solve the methodological difficulty of identifying South Asians, particularly in historical administrative datasets where self-assessed ethnicity is not available and visual inspection is too time consuming and prone to errors. This approach, though useful, is limited to groups easily distinguishable by name, and the collection of information on ethnicity should remain the aim in the long term.

INTRODUCTION

The study of ethnic differentials in health, and more generally in the evaluation of public services, is limited because of the lack of information on ethnic origin on databases. Country of birth is usually used as a proxy for ethnicity but with growing numbers of second generation migrants,1 this approach has become increasingly inappropriate. Analysis of names has been used to identify South Asian, Chinese and Hispanic populations.2-6 Although there have been some major improvements, such as the recording of ethnic origin in the 1991 Census and in records of NHS inpatients since 1995, ethnic group is not recorded on existing data, such as birth, death, electoral and general practice registrations. This paper examines a computerised approach to identify South Asians by names.

METHOD

Three data sources in Bradford

In Bradford, a computerised approach has been used by the Metropolitan District Council since 1983 to identify South Asians (Box 1).7 In this paper the term ‘South Asian’ refers to people who originated from the Indian subcontinent. This program was used on three data sources covering the local population. The ‘Speak-Out’ panel study consisted of a representative sample of 2,510 residents of Bradford aged 16 and over. Each member completed a questionnaire that included their self-assigned ethnicity from a checklist of the categories used in the 1991 Census. In the ‘Schools in Bradford’ study, the parents of 3,450 new pupils in 6 schools supplied details of the family including ethnic origin, religion and language spoken in the home. Death registrations in 1995 and 1996 (7,793) of people aged 35 and over at death who were living in Bradford were also examined.

ONS Longitudinal Study

In the ONS Longitudinal Study,8 a 1 per cent follow-up sample of the population of England and Wales, a special manual exercise was carried out when the study was being set up in the early 1970s to identify those of South Asian origin. This was achieved by assigning South Asian ethnicity through computer processing to extract the records by own and parents’ country of birth. Clerical coding of names on the original manual records was then used to refine the classification. For those who survived and were present at the 1991 Census, this assignment of ethnicity in the 1970s was compared with their self-assigned ethnicity in the 1991 Census. Previous work suggested that significant proportions of those aged over 45 in 1971 would have been children born to British expatriates living in the Indian subcontinent.9 For this reason, the sample used in this analysis included those aged under 45 in the 1971 Census who were present at the 1991 Census (337,485 people). For confidentiality purposes all of the analysis of this sample was done within ONS.

Measures

Self-assigned ethnicity (in the ‘Speak-Out’ panel study, the ‘Schools in Bradford’ study and the Longitudinal Study) and visual inspection of records (death registrations in Bradford) were used as the references. Sensitivity (South Asians correctly identified/all South Asians identified by reference), specificity (non-South Asians correctly identified/all non-South Asians identified by reference), and predictive (South Asians correctly identified/all South Asians identified by names analysis) values were derived.

RESULTS

Studies based on Bradford’s population

Tables 1 and 2 show the results from the studies based on people living in Bradford. Sensitivity, specificity and predictive values were more than 90 per cent for South Asian ethnicity. Identifying religious origin of South Asians was achieved with high specificity, but with sensitivity good only for Moslems and Sikhs and less so for Hindus. Results for language, not shown, also gave high specificity but low sensitivity.

ONS Longitudinal Study

Table 3 shows the corresponding results for Longitudinal Study members born in the Indian subcontinent. Sensitivity was lower in this study than in those based on people living in Bradford: 61 per cent for females and 72 per cent for males. Specificity and predictive values, however, were high (over 90 per cent). The discrepancy was larger for those who described themselves as ‘Indian’ than for those who described themselves as ‘Pakistani’ or ‘Bangladeshi’ in the 1991 Census (not shown). Of those who described themselves as ‘Indian’ in the 1991 Census, 39 per cent were not identified by name. The corresponding figures were 21 per cent for Pakistanis and 12 per cent for Bangladeshis.

Box one: COMPUTERISED ALGORITHM

The updated program ‘Nam Pehchan’ was written in the ‘C’ programming language and produced as a DOS and Windows application. A dictionary of 2,995 South Asian names was used to make matches. If the name was not found in the dictionary, the first five letters in the name were used, which identified a further 15 per cent of South Asian names. Two or more names on the same record were matched. The likely religious (Sikh, Hindu, Muslim) origins and language were then assigned.
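The matching steps described in Box one (exact dictionary lookup, then a fallback on the first five letters, applied to each name on a record) can be sketched roughly as follows. This is an illustration only, written in Python rather than the C of the original program. The three dictionary entries, the `NameEntry` type and the function names are hypothetical, and comparing the first five letters of an unknown name with the first five letters of dictionary names is an assumption about how the fallback works, not a description of Nam Pehchan's internals.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PREFIX_LEN = 5  # Box one: the first five letters are used when the full name is not found


@dataclass(frozen=True)
class NameEntry:
    religion: str  # likely religious origin (Sikh, Hindu, Muslim)
    language: str  # likely language origin


# Hypothetical, illustrative entries; the real dictionary held 2,995 names.
NAME_DICTIONARY: Dict[str, NameEntry] = {
    "SINGH": NameEntry("Sikh", "Punjabi"),
    "PATEL": NameEntry("Hindu", "Gujarati"),
    "HUSSAIN": NameEntry("Muslim", "Urdu"),
}

# Assumed fallback index: first five letters of each dictionary name.
PREFIX_INDEX: Dict[str, NameEntry] = {}
for dictionary_name, dictionary_entry in NAME_DICTIONARY.items():
    PREFIX_INDEX.setdefault(dictionary_name[:PREFIX_LEN], dictionary_entry)


def match_name(name: str) -> Optional[NameEntry]:
    """Return the dictionary entry for one name, or None if nothing matches."""
    key = name.strip().upper()
    if key in NAME_DICTIONARY:      # step 1: exact dictionary match
        return NAME_DICTIONARY[key]
    if len(key) >= PREFIX_LEN:      # step 2: partial match on the first five letters
        return PREFIX_INDEX.get(key[:PREFIX_LEN])
    return None


def classify_record(names: List[str]) -> List[NameEntry]:
    """Match every name on a record and return the entries that were found."""
    return [entry for entry in (match_name(n) for n in names) if entry is not None]
```

A record such as `classify_record(["Gurpreet", "Singh"])` returns one matched entry here. How many matched names a record needs before it is classed as South Asian is left to the caller, since Box one does not spell out that rule.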

DISCUSSION

Computerised analysis of names is not an ideal solution, but it helps to solve the methodological difficulty of identifying South Asians in the interim, particularly in historical administrative datasets where self-assessed ethnicity is not available and visual inspection is too time consuming and prone to errors. South Asian names are usually distinguishable and endogamy (marriage within the cultural group)10 is still relatively common. Analysis of names provides a better classification of South Asians than that of country of birth but there are limitations.

Table 1   Identifying South Asians by names using a computerised algorithm. Studies based on people living in Bradford

Reference: Speak-out panel (self assigned)

                     South Asian (reference)
Names analysis          Yes        No     Total
Yes                     249        10       259
No                       15     2,236     2,251
Total                   264     2,246     2,510

Sensitivity 94.3%; specificity 99.6%; predictive value 96.1%

Reference: Schools data (parents’ assignment)

                     South Asian (reference)
Names analysis          Yes        No     Total
Yes                   2,161        45     2,206
No                       42     1,202     1,244
Total                 2,203     1,247     3,450

Sensitivity 98.1%; specificity 96.4%; predictive value 98.0%

Reference: Death registrations (visual inspection)

                     South Asian (reference)
Names analysis          Yes        No     Total
Yes                     251        25       276
No                        4     7,513     7,517
Total                   255     7,538     7,793

Sensitivity 98.4%; specificity 99.7%; predictive value 90.9%
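The measures reported with Table 1 follow directly from the cell counts, using the definitions given under Measures. A minimal sketch, assuming Python and a hypothetical `TwoByTwo` helper (not part of Nam Pehchan), applied to the Speak-out panel counts:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-tabulation of names analysis against a reference classification."""
    names_yes_ref_yes: int  # identified by names, South Asian by reference
    names_yes_ref_no: int   # identified by names, not South Asian by reference
    names_no_ref_yes: int   # missed by names, South Asian by reference
    names_no_ref_no: int    # not identified by names, not South Asian by reference

    def sensitivity(self) -> float:
        # South Asians correctly identified / all South Asians by reference
        return self.names_yes_ref_yes / (self.names_yes_ref_yes + self.names_no_ref_yes)

    def specificity(self) -> float:
        # non-South Asians correctly identified / all non-South Asians by reference
        return self.names_no_ref_no / (self.names_no_ref_no + self.names_yes_ref_no)

    def predictive_value(self) -> float:
        # South Asians correctly identified / all identified by names analysis
        return self.names_yes_ref_yes / (self.names_yes_ref_yes + self.names_yes_ref_no)


# Speak-out panel counts from Table 1.
speak_out = TwoByTwo(names_yes_ref_yes=249, names_yes_ref_no=10,
                     names_no_ref_yes=15, names_no_ref_no=2236)

print(f"Sensitivity:      {speak_out.sensitivity():.1%}")       # 94.3%
print(f"Specificity:      {speak_out.specificity():.1%}")       # 99.6%
print(f"Predictive value: {speak_out.predictive_value():.1%}")  # 96.1%
```

The same helper can be applied to the counts in any of the other tables to check the published percentages.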


Table 2   Identifying religious origin of South Asians by names using a computerised algorithm. Studies based on children in schools in Bradford.

Reference: parents’ assignment of religion

Hindu

                     Hindu (reference)
Names analysis          Yes        No     Total
Yes                      67        14        81
No                       33     3,124     3,157
Total                   100     3,138     3,238

Sensitivity 67.0%; specificity 99.6%; predictive value 82.7%

Moslem

                     Moslem (reference)
Names analysis          Yes        No     Total
Yes                   1,915        25     1,940
No                      143     1,155     1,298
Total                 2,058     1,180     3,238

Sensitivity 93.1%; specificity 97.9%; predictive value 98.7%

Sikh

                     Sikh (reference)
Names analysis          Yes        No     Total
Yes                      41         9        50
No                        6     3,182     3,188
Total                    47     3,191     3,238

Sensitivity 87.2%; specificity 99.7%; predictive value 82.0%

Table 3   Longitudinal Study members present at the 1971 and 1991 Censuses, aged under 45 years in 1971

Reference: ethnic origin in the 1991 Census. Names analysis at the 1971 Census.

Females

                     South Asian (reference)
Names analysis          Yes        No       Total
Yes                     916        26         942
No                      576   129,774     130,350
Total                 1,492   129,800     131,292

Sensitivity 61.4%; specificity 100.0%; predictive value 97.2%

Males

                     South Asian (reference)
Names analysis          Yes        No       Total
Yes                   1,714        67       1,781
No                      657   127,889     128,546
Total                 2,371   127,956     130,327

Sensitivity 72.3%; specificity 99.9%; predictive value 96.2%

The findings from the ONS Longitudinal Study, which compared self-assessed ethnicity with that of name analysis in a national dataset, provided some corroborative evidence for the usefulness of this approach but also highlighted some limitations. Validation at a national level occurred in both directions: self-reported ethnicity validated the analysis by names and vice versa, but sensitivity values were lower than in the Bradford studies. This could be due to an incomplete name dictionary and the inclusion of partially matched names in Bradford’s computerised algorithm, change of surname of women on marriage, and the use of Anglicised names. The latter is a feature common among those born in southern states such as Kerala. South Asians living in Bradford were mainly Punjabi, which would explain the higher sensitivity values.

Generally, however, studies have found that name analysis can provide useful insights into the understanding of the aetiology of disease, and for the planning and provision of services. High ischaemic heart disease (IHD) mortality of South Asians living in England and Wales was confirmed in an early study in the 1970s which identified South Asians by visual inspection of records.6 A recent study in Canada used a computerised algorithm and showed that South Asian migrants living there also had high mortality from IHD and diabetes.3 Information on ethnicity in cancer registries in England is incomplete, and a study of the incidence of cancer among South Asians in the West Midlands, Thames, Trent and Yorkshire has recently been done using the Nam Pehchan program to identify South Asians.11 Non-health applications of this approach have included monitoring the demographic composition of urban areas, service provision by libraries and extracting sampling frames for surveys.7,12

Current developmental work of the Nam Pehchan program focuses on improving the coverage of names of Tamil and Singhalese origin, and adding sex to the religion and language origin of each name. This will be an ongoing process as new users add information to the dictionary. The program developed in Bradford focuses on identifying those of South Asian origin because Bradford has the largest Pakistani population of any health district. The ethnic mix is different in other parts of the country. It is feasible that this computerised approach of analysis of names could be extended to identify other distinct groups such as those of Chinese origin. It is, however, generally limited to groups easily distinguishable by name. It would not, for example, be suitable for those of Indian origin from the Caribbean among whom Anglicised names are commonly used. The direct collection of information on ethnic group should remain the aim in all surveys and other studies where it is possible.

ACKNOWLEDGEMENTS

Computer Services CBMDC, six schools in Bradford Local Education Authority.

REFERENCES

1 Schuman J. The ethnic minority populations of Great Britain - latest estimates. Population Trends 96. The Stationery Office (1999).
2 Nicoll A, Bassett K, Ulijaszek SJ. What’s in a name? Accuracy of using surnames and forenames in ascribing Asian ethnic identity in English populations. J Epidemiol Community Health 1986; 40(3): 364-8.
3 Colman AJ, Braun T, Gallagher RP. The classification of ethnic status using name information. J Epidemiol Community Health 1988; 42: 390-395.
4 Sheth T, Nargundkar M, Chagani K, Anand S, Nair C, Yusuf S. Classifying ethnicity utilizing the Canadian mortality data base. Ethnicity and Health 1997; 2(4): 287-295.
5 Becker TM, Wiggins C, Key CR, and Samet JM. Ischaemic heart disease mortality in Hispanics, American Indians, and non-Hispanic whites in New Mexico, 1958-1982. Circulation 1988; 78: 302-9.
6 Balarajan R. Ethnic differences in mortality from ischaemic heart disease and cerebrovascular disease in England and Wales. BMJ 1991; 302: 560-4.
7 Nam Pehchan News, Summer 1998. Computer Services, Bradford Council (Dept 13), Britannia House, Bradford, BD1 1HX.
8 Hattersley L and Cresser R. Longitudinal Study 1971-91: History, Organisation and Quality of data. HMSO (1995).
9 Marmot MG, Adelstein AM, Bulusu L. Immigrant mortality in England and Wales 1970-78: causes of death by country of birth. Studies of Medical Population Subjects No 47. HMSO (1984).
10 Berrington A. Marriage patterns and inter-ethnic unions. Chapter 7 In: Coleman D, Salt J. eds. Ethnicity in the 1991 Census, Volume 1. Demographic Characteristics of the ethnic minority populations, 178-212. HMSO (1996).
11 Winter H, Cheng KK, Cummins C, Maric R, Silcocks P and C Varghese. Cancer incidence in the South Asian population of England (1990-92). BJC 1999; 79(3/4): 645-654.
12 Simpson SN. Demography and ethnicity: case studies from Bradford. New Community 1997; 23(1): 89-107.