Assessing the Effectiveness of Administrative Registers in Reducing Under-Coverage Errors in a Population Census: Evidence from the 2009 Italian Census Pilot Survey

Luca Mancini, Luigi Marcone, Francesco Borrelli, Marco Fortini and Alessandra Ronconi¹

Abstract The paper assesses the performance of administrative archives in reducing under-coverage errors in register-assisted population censuses caused by poorly maintained population registers. Preliminary results from a simulation based on the 2009 pilot survey provide encouraging evidence on the usefulness of these auxiliary archives.

Key words: population census, record linkage, administrative registers.

1 Introduction

The 15th Italian Population Census, to be held in October 2011, will for the first time be officially assisted by municipal population registers (Liste Anagrafiche Comunali delle Famiglie e delle Convivenze, or LACs). Millions of census questionnaires will be delivered by post to the address of each head of household as recorded in the respective LAC. Although by law LACs should be constantly updated so as to provide at any time a precise snapshot of the resident population living within the municipal borders, coverage errors are not uncommon: either permanent residents who are not listed (undercounts), or people who have left the municipality or passed away but have not yet been written off (overcounts). Therefore, within the new census strategy, the use of auxiliary administrative registers (Liste Integrative da Fonti Ausiliarie, or LIFAs) to identify and count households and individuals not properly accounted for in the LACs is regarded as an important asset. Despite these expectations, the real potential of the LIFAs, which will include, inter alia, the National Tax Register and the Permits to Stay Archive, in reducing the undercount of the Italian population induced by poorly maintained municipal population registers is still largely unknown. At a stage where the criteria for the formation of the LIFAs are about to be defined, an assessment of the effective gains from using these registers is particularly timely and relevant.

¹ Luca Mancini, Istat; Luigi Marcone, Istat; Francesco Borrelli, Istat; Marco Fortini, Istat; Alessandra Ronconi, Istat.


2 Data and methodology

The analysis is based on the Census Pilot Survey (CPS) carried out by ISTAT during the last quarter of 2009 in 31 municipalities [5]. Of the 14 towns and cities where enumerators were instructed to visit all the dwellings located in the pilot Enumeration Areas (EAs) to track down the households missed by the mail-out of census questionnaires, only four municipalities (Genova, Prato, Scandicci and Abbiategrasso) are considered in the paper. The choice was dictated by data quality issues, which resulted in significant differences in the accuracy and completeness of the survey information across the 31 municipalities. The four selected municipalities comprise two cities (Genova and Prato) with a population exceeding 150,000 and two middle-sized towns (Scandicci and Abbiategrasso) with a population between 20,000 and 50,000. The LACs refer to dates ranging from December 31st 2008 to May 31st 2009 depending on the municipality, i.e. they are between 5 and 10 months old at the time the CPS was conducted. The main register for the construction of the LIFAs is the National Tax Register (Anagrafe Tributaria, AT), which contains about 80 million records nationwide. After excluding the deceased as well as expatriates who are no longer in the country at the time of the survey, a series of validation rules is applied to decide whether or not an individual record in AT should be included in the LIFAs. For instance, an AT record qualifies for inclusion if the same individual is also found in other administrative registers such as student archives, maternity ward registries, pensioners' archives and others. Where the information for the same individual differs across two or more archives, the most recent available entry is retained in the LIFAs². The size of the linked populations is shown in Table 1.

Table 1: Population size of the linked archives by municipality

Municipality    LAC updated to   LAC      LIFA    CPS    Undercounts (UC)
Genova          31/12/2008       605895   57627   9716   316
Prato           31/05/2009       186608   43109   9536   273
Scandicci       31/01/2009       49764    2325    3573   9
Abbiategrasso   31/12/2009       31145    2734    2077   74
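The inclusion logic described above can be illustrated with a minimal Python sketch. It is not the actual LIFA procedure, whose full validation rules are documented in [2]; it only encodes the three steps mentioned in the text (exclude the deceased and expatriates, require confirmation of an AT record by at least one other archive, keep the most recent entry). The `Entry` fields, the `person_id` key, the archive names other than AT and the sample addresses are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    person_id: str            # hypothetical person key (e.g. a tax code)
    source: str               # archive name; "AT" marks the tax register
    address: str
    updated: date             # date of the entry in its archive
    deceased: bool = False
    expatriated: bool = False


def build_lifa(entries):
    """Apply the illustrative inclusion rules and return {person_id: Entry}."""
    by_person = {}
    for entry in entries:
        by_person.setdefault(entry.person_id, []).append(entry)

    lifa = {}
    for person_id, group in by_person.items():
        sources = {e.source for e in group}
        if "AT" not in sources:
            continue  # AT is the pivot register
        if any(e.deceased or e.expatriated for e in group):
            continue  # excluded before the validation rules are applied
        if len(sources) < 2:
            continue  # AT record not confirmed by any other archive
        # Conflicting information: keep the most recent available entry.
        lifa[person_id] = max(group, key=lambda e: e.updated)
    return lifa


records = [
    Entry("P1", "AT", "Via Roma 1", date(2008, 3, 1)),
    Entry("P1", "students", "Via Milano 5", date(2009, 6, 1)),
    Entry("P2", "AT", "Via Po 9", date(2009, 1, 1)),  # found in AT only
    Entry("P3", "AT", "Via Dante 2", date(2007, 1, 1), deceased=True),
    Entry("P3", "pensioners", "Via Dante 2", date(2008, 1, 1)),
]
lifa = build_lifa(records)
```

In this toy input only P1 enters the list, with the more recent of the two addresses.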

The paper uses probabilistic record linkage (RL) techniques to link records from the CPS with the matching record(s), if any, found in the corresponding LIFA. The model follows the Fellegi and Sunter framework, and its parameters are estimated by maximizing a log-likelihood function via an iterative EM algorithm [3]. The estimation is done using the software Relais 2.2 [4]. The analysis aims to determine how many of the residents missed by the CPS questionnaire mail-out could have been found on the basis of the supplementary information provided by the LIFAs³. The simulation is carried out at the municipal level. For each municipality the undercounts are defined as the members of those households living permanently at an address within the municipal borders⁴ who never received the questionnaire because the LAC contained no indication of their presence at that location. These individuals were included in the census only at a later stage, when they were found by the enumerator during door-to-door visits following the main questionnaire collection stage. They are easily identified because their household questionnaires were assigned a different code type in the Enumeration Management System (SGR)⁵. Once identified, the records were linked with their municipal LIFA. The RL strategies are discussed in the next section.

² We are grateful to ISTAT's Central Department for Archives, and in particular to Carla Runci, for sharing the database and for the assistance in compiling the LIFAs. For more details on the LIFAs' validation rules see [2].

³ It should be noted that for the 2011 Census the LIFAs will be used before post-enumeration field operations begin, to direct enumerators in their field search.
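To make the estimation step more concrete, the following is a minimal sketch of EM estimation for a two-class (match / non-match) mixture over binary agreement patterns, assuming conditional independence between the matching variables. It is not the Relais implementation: the function names, starting values, clamping constant and the toy agreement patterns are all assumptions made for illustration.

```python
import math

EPS = 1e-6  # keeps probabilities away from 0 and 1 (assumed constant)


def _clamp(x):
    return min(max(x, EPS), 1.0 - EPS)


def em_fellegi_sunter(patterns, n_iter=100, p=0.1, m0=0.9, u0=0.1):
    """Estimate p = P(match), m[j] = P(agree on j | match) and
    u[j] = P(agree on j | non-match) from 0/1 agreement patterns."""
    n, k = len(patterns), len(patterns[0])
    m, u = [m0] * k, [u0] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        post = []
        for gamma in patterns:
            like_m, like_u = p, 1.0 - p
            for j, agree in enumerate(gamma):
                like_m *= m[j] if agree else 1.0 - m[j]
                like_u *= u[j] if agree else 1.0 - u[j]
            post.append(like_m / (like_m + like_u))
        # M-step: re-estimate the parameters from the posteriors.
        total_m = sum(post)
        total_u = n - total_m
        p = _clamp(total_m / n)
        for j in range(k):
            agree_m = sum(g * gamma[j] for g, gamma in zip(post, patterns))
            agree_u = sum((1.0 - g) * gamma[j] for g, gamma in zip(post, patterns))
            m[j] = _clamp(agree_m / max(total_m, EPS))
            u[j] = _clamp(agree_u / max(total_u, EPS))
    return p, m, u


def match_weight(gamma, m, u):
    """Composite log2 likelihood-ratio weight of one agreement pattern."""
    return sum(
        math.log2(m[j] / u[j]) if agree else math.log2((1.0 - m[j]) / (1.0 - u[j]))
        for j, agree in enumerate(gamma)
    )


# Toy agreement patterns on three variables (e.g. name, surname, birth date).
patterns = [(1, 1, 1)] * 20 + [(1, 1, 0)] * 5 + [(0, 0, 0)] * 70 + [(0, 0, 1)] * 5
p, m, u = em_fellegi_sunter(patterns)
```

Pairs whose composite weight exceeds an upper threshold would be declared links and those below a lower threshold non-links; the thresholds themselves are a separate choice not covered by this sketch.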

3 Results

Table 2 presents the model specification as well as the main results of the record linkage between the CPS and the LIFAs. Record linkage (RL) strategies vary with the size of the municipality and reflect the need to use more aggressive search space reduction in more populated localities in order to make estimation computationally feasible. For each municipality two indicators are used to assess the LIFAs: a) the percentage of UC that match a LIFA record and b) the percentage of UC-LIFA links having the same address in both sources. The values of the first indicator vary significantly across the four municipalities (between 16.2% and 42.2%), suggesting that the current version of the LIFAs may be more effective in some places than in others.

Table 2: RL model specification and results

Municipality    Space reduction   Blocking/sorting   Matching             Comparison function    UC in LIFA   Links with CPS address
                strategy          variables          variables            [threshold]            (links)      coinciding with LIFA address
Genova          SNM               N, S               N, S, (A); D, M, Y   3-g [0.5]; Equality    134 (42.2)   120 (89.6)
Prato           SNM               N, S               N, S, (A); D, M, Y   3-g [0.5]; Equality    64 (23.4)    27 (43.5)
Scandicci       Blocking          G                  N, S, (A); D, M, Y   Lev [0.7]; Equality    3 (33.3)     3 (100.0)
Abbiategrasso   Cross product     -                  N, S, (A); D, M, Y   Lev [0.7]; Equality    12 (16.2)    11 (91.7)

Percentages in brackets. SNM=sorted neighbourhood method, N=name, S=surname, D=day of birth, M=month of birth, Y=year of birth, A=address, G=gender, NS=name and surname, Lev=Levenshtein, 3-g=3-grams.
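The comparison functions and the sorted neighbourhood method named in Table 2 can be sketched as follows. The exact definitions used by Relais are not given in the text, so this sketch assumes a Dice coefficient on padded 3-gram sets and a Levenshtein distance normalised by the longer string; the sorting key, window size and sample records are hypothetical.

```python
def trigrams(text):
    """Set of 3-grams of a lower-cased string padded with '#'."""
    padded = "##" + text.lower() + "##"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Dice coefficient on 3-gram sets, in [0, 1] (assumed definition)."""
    ga, gb = trigrams(a), trigrams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def levenshtein(a, b):
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def levenshtein_similarity(a, b):
    """1 - distance / length of the longer string (assumed normalisation)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a.lower(), b.lower()) / longest


def sorted_neighbourhood_pairs(file_a, file_b, key, window=3):
    """Merge both files, sort by key, and pair each record with the
    following window - 1 records that come from the other file."""
    merged = sorted(
        [(key(r), 0, i) for i, r in enumerate(file_a)]
        + [(key(r), 1, j) for j, r in enumerate(file_b)]
    )
    pairs = set()
    for pos, (_, src, idx) in enumerate(merged):
        for _, src2, idx2 in merged[pos + 1:pos + window]:
            if src != src2:
                pairs.add((idx, idx2) if src == 0 else (idx2, idx))
    return pairs


cps = [{"name": "Mario", "surname": "Rossi"}]
lifa = [
    {"name": "Maria", "surname": "Rossi"},
    {"name": "Anna", "surname": "Bianchi"},
    {"name": "Luca", "surname": "Verdi"},
]
sort_key = lambda r: (r["surname"] + r["name"]).lower()
pairs = sorted_neighbourhood_pairs(cps, lifa, sort_key, window=2)

# Agreement on surname under the two thresholds reported in Table 2.
agree_3g = trigram_similarity("Rossi", "Rosso") >= 0.5
agree_lev = levenshtein_similarity("Rossi", "Rosso") >= 0.7
```

With a window of 2 only adjacent records in the sorted list are compared, which is why the candidate set is much smaller than the full cross product used for Abbiategrasso.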

If one excludes Prato, the second indicator shows that a fairly high share of the linked undercounts would have been found by the enumerators had they been visited at their LIFA address⁶. This finding is encouraging because by and large the addresses recorded in the LIFAs appear to provide a reliable signal for locating households missed by the census mail-out. The lower value for Prato suggests that the success of the LIFAs is likely to depend upon the size of the foreign population living within the municipal borders. Prato is home to a large Chinese community, and the LIFAs perform relatively poorly there compared to the other places. Summary statistics (not shown) reveal that in Prato about 39% of the undercounts have non-Italian citizenship, compared to 26% in Genova, 28% in Abbiategrasso and 11% in Scandicci⁷.

⁴ In the CPS, the borders coincide with those of the census EAs selected for the pilot survey.

⁵ We are grateful to Lorenzo Cassata for his help and troubleshooting on the SGR.

⁶ It is likely that the degree of success of the LIFAs in locating and counting the undercounts would have been higher had the enumerators known in advance the whereabouts and composition (number of members, age, gender, nationality) of households, rather than just calling at every dwelling located in the EA.
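As a small worked example, the two indicators used to assess the LIFAs can be computed from the counts in Tables 1 and 2. The function name and signature are hypothetical; the inputs are the Abbiategrasso and Scandicci counts reported in the tables.

```python
def lifa_indicators(undercounts, links, same_address_links):
    """Return (a, b) as percentages rounded to one decimal:
    a = share of undercounts linked to a LIFA record,
    b = share of those links with the same address in both sources."""
    if undercounts <= 0 or links <= 0:
        raise ValueError("undercounts and links must be positive")
    a = 100 * links / undercounts
    b = 100 * same_address_links / links
    return round(a, 1), round(b, 1)


# Counts taken from Table 1 (UC) and Table 2 (links, same-address links).
abbiategrasso = lifa_indicators(undercounts=74, links=12, same_address_links=11)
scandicci = lifa_indicators(undercounts=9, links=3, same_address_links=3)
```

Here `abbiategrasso` evaluates to `(16.2, 91.7)` and `scandicci` to `(33.3, 100.0)`, in line with the corresponding rows of Table 2.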

4 Concluding remarks

The 2011 Italian General Census of Population and Dwellings will be assisted for the first time by municipal-level population registers (LACs). In order to minimize coverage errors induced by poorly maintained LACs, the National Institute of Statistics has decided to use auxiliary population archives from alternative sources, called LIFAs. Although expectations of their usefulness are high, little is known about the real benefits of using the LIFAs. In order to assess how successful these auxiliary archives are in locating and including in the census residents unaccounted for by the municipal population registers, probabilistic record linkage models were estimated to match individual records found in the 2009 Census Pilot Survey (CPS) but not in the LACs with records contained in the LIFAs. The latter were constructed around the National Tax Register as the pivot database and then enriched and validated with entries from a series of other public registries. The results of the simulation exercise are encouraging and show that the LIFAs could provide reliable guidance to enumerators trying to locate individuals not yet counted in the census. However, the performance of the LIFAs differs significantly across municipalities. One critical aspect affecting their success appears to be the size of the foreign population living within the municipal borders. This suggests that the effectiveness of the LIFAs is likely to increase once the Permits to Stay Archive is integrated. In any case, it is clear that the LIFAs cannot be used in isolation and need to be combined with other instruments in order to tackle under-coverage errors thoroughly.

5 References

[1] Fortini, M. and G. Gallo (2009) Misure di sottocopertura anagrafica in base alla revisione postcensuaria del 2001, paper presented at the 2009 SIS Conference.
[2] Fortini, M. et al. (2010) L'uso delle LAC e degli archivi ausiliari nel processo di produzione del censimento, mimeo, DCCG, Istat, Rome.
[3] Gu, L., Baxter, R., Vickers, D. and C. Rainsford (2003) Record Linkage: Current Practice and Future Directions, CMIS Technical Report n. 03/83, Canberra, Australia.
[4] Scannapieco, M. et al. (2010) Relais User's Guide, Version 2.1, DCMT and DCCG, Istat, Rome.
[5] Stassi, G. (2010) Cosa abbiamo imparato dalle rilevazioni pilota del Censimento, presented at the 9th DEA National Conference, Baveno-Stresa, November 2010.

⁷ Similar results were found by Fortini and Gallo for the 2001 Census. For more details see [1].