Simulations Demonstrate Feasibility of Capture-Recapture

Lori Ann Post Assistant Professor, Family and Child Ecology 297 Communication Arts Building East Lansing, Michigan 48824 517-355-3401 Hui Zhang Michigan State University Gia Elise Barboza Michigan State University Tom Conner Michigan State University


ABSTRACT

During the past ten years, the capture-recapture technique for generating prevalence and incidence estimates has become increasingly useful to researchers, census adjusters and public health officials. This paper illustrates the logistics of using capture-recapture methodology, evaluates the utility of capture-recapture estimation to demonstrate the sensitivity of models it is based on, and determines the precision of capture-recapture methods in predicting the size of unknown populations. The findings indicate: 1) the capture-recapture model is sensitive to the probability of being captured with larger probabilities providing more accurate and stable results; 2) when sample representation drops below 15%, the confidence placed in the estimate is reduced considerably as the standard deviation doubles; 3) the population estimates should be profiled in intervals with more credence given to upper interval estimates because of the positive skew and to nonparametric bootstrap methods because they are robust to violations of normality; 4) capture-recapture produces reasonable estimates of otherwise hard-to-enumerate populations when maintaining necessary assumptions; and 5) when capture probabilities are symmetric, capture-recapture techniques provide better estimates of population parameters.


KEY WORDS 1. Capture-Recapture 2. Simulations 3. Incomplete Case Ascertainment 4. Indirect Estimation 5. Population Prediction 6. Demographic and Epidemiologic Techniques 7. Bootstrap


I. Introduction

Capture-recapture estimation, which is based on the overlap between two or more distinct samples of a population, provides an indirect method for determining incidence and prevalence estimates for difficult-to-count populations. The capture-recapture method was derived from the practice of randomly sampling a defined population, marking the captured sample members for later identification, and then releasing them back into the population. A second random sample from the same population is taken and, because the sources are presumed to be independent, the proportion of marked members found in the second sample is assumed to be the same as the proportion of marked members in the defined population (Barnes et al., 1995). Multiplying the two sample sizes and dividing by the number found in both samples yields an estimate of the total population of interest (Bishop, Fienberg, and Holland, 1975). With respect to estimating the size of human populations in particular, lists or registers are often used as the “captures,” with duplicate individuals effectively being “recaptured.”

Capture-recapture was first used by wildlife researchers (Peterson, 1896; Lincoln, 1930), briefly used and forgotten by demographers (Sekar and Deming, 1949), borrowed by epidemiologists (Wittes and Sidel, 1968), improved by statisticians (Fienberg, 1972; Bishop et al., 1975) and employed once again by demographers (Nanan and White, 1997). The pioneering research of Sekar and Deming (1949) was among the earliest applications of capture-recapture to the estimation of birth and death rates. More recently, the renewed interest in the application of capture-recapture methodology to


epidemiological data has led to innovative applications of this technique. Khan et al. (2004) used capture-recapture to estimate the number of mobile male sex trade workers in a conservative port city of Bangladesh. Others have used capture-recapture to estimate the size of partially hidden populations such as prostitutes (McKeganey et al., 1992), homeless persons (Fisher et al., 1994), heroin users (Squires et al., 1995; Barnes et al., 1995) and lesbians (Aaron et al., 2003). In the biomedical area, researchers have used capture-recapture to monitor diabetes (Tull et al., 1998), to estimate the prevalence of individuals with alcohol-related problems who seek or need treatment (Corrao, 2000), to investigate the prevalence of Down Syndrome among live-born infants (Orton et al., 2001) and to assess the incidence of, and number of deaths from, pneumococcal meningitis (Gjini, 2004). Non-academics have also been successful in applying this approach to a variety of research problems. For example, the U.S. government successfully applied capture-recapture to control for undercounting in the U.S. Census (Nanan and White, 1997). NASA has employed this technique to count the number of stars in the universe (Fienberg, 1998). Finally, the British Society for Statistics used this methodology to estimate the size of the World Wide Web (Fienberg, 1998).

As renewed interest in using capture-recapture methodology has arisen, so has polarization of the literature into believers and skeptics (Hay, 1997). One camp claims this provocative technique could be the cure-all for statistical, demographic, and epidemiological pitfalls in population estimation because it does not carry the biases or costs of traditional techniques, such as surveys or registration systems (LaPorte, 1994; Watts et al., 1994; Shaw et al., 1996; Stephen, 1996). In contrast, the other camp is becoming increasingly concerned with validity issues of capture-recapture methods


(Hook and Regal, 1995; Papoz et al., 1996). The skepticism that many researchers have towards the application of capture-recapture methods also stems from seasonality in the distribution of cases and/or the relatively small number of cases reported (Goldman, 2004). Therefore, despite improvements to the methodology, the accuracy of capture-recapture is believed to be impossible to ascertain because it is used to estimate elusive and invisible populations, whose size can never be independently known (Hay, 1997).

This paper will shed some light on the capture-recapture controversy by illustrating the logistics of using capture-recapture methodology, evaluating the utility of capture-recapture estimation to demonstrate the sensitivity of models it is based on, and by determining the precision of capture-recapture methods in predicting the size of an already known population. The limitations of predicting an unknown population size are overcome by employing controlled experiments in the form of simulations, which allow the population to be preset and the prediction capabilities tested. In addition to testing the estimation power of the capture-recapture method, this research generates a typical capture-recapture distribution. In sum, this research allows the capture-recapture dialectic to transcend some of the barriers and to move the discussion forward by removing the unknown population obstacle.

II. The Logistics of Capture-Recapture

Before demonstrating the precision and sensitivity of the capture-recapture model, it is necessary to understand the logistics of the model. Therefore, the following example will illustrate exactly how to employ this technique using three hypothetical data sources in a population of 1000 women. Each data source records the same characteristic as the other two data sources, yet each list was produced in a separate way. For example, if the


characteristic was “homelessness,” each list would contain names of homeless persons and each list would come from a separate source such as a homeless shelter, a social services agency, and a religious mission. Each of the three databases contains 26 cases, with fictitious names used for demonstration purposes (see Table 1). Relying solely on any one of these data sources as a predictor of a phenomenon (a direct estimation procedure) yields an estimate of 26 cases per thousand. Each single registry is evidently missing other true cases, so a prevalence of 26 per thousand would be an underestimate. Nor can the three data sources simply be added together: that would yield a rate of 78 per thousand (see Table 1), which is inflated by duplication errors, since some registrants are found in more than one registry.

[Table 1 goes here]

Seemingly, the most logical way to derive an estimate of prevalence is to add the three databases together and to subtract out cases which have been counted more than once. This would yield a rate of 50 per thousand. This is probably a closer estimate of the true size of the population, but is also almost certainly still an underestimate. In this example, as in real case registries, the three databases are not an exhaustive collection of cases. The more elusive the population, the less representative the registry becomes and the greater the underestimate becomes. For example, heroin users and prostitutes are hidden populations because they are involved in illegal and socially deviant activity and could receive legal sanctions by self-identifying. Therefore, if public health workers want to estimate the magnitude of social problems such as drug use and


prostitution, traditional research methods are problematic and official registries such as police records are sure to be gross undercounts. Capture-recapture can be employed because it capitalizes on having more than one database and indirectly estimates the true population of cases by measuring the “overlap” between databases rather than the combination of the three data sources. Using log-linear analysis to model dependence between the sources for three-factor capture-recapture, we indirectly estimate the population by looking at the number of cases found in each database exclusive of the other databases, the number of cases found jointly in all three databases, and the number of cases found in a combination of two out of the three databases.

[Table 2 goes here]

The parameter of interest, N, represents the true population size, with individuals in the population being indexed by 1, 2, …, N. We assume that there are k = 3 data sources or registries that will be used to estimate the invisible population. Presence or absence in any registry is denoted by 1 or 2 respectively. For the three-list case, a three-digit code ranging from 221 to 111 denotes the individual’s representation across the three registries. Each individual in a three-list case is associated with one of the following seven possible “capture histories”: (221), (212), (211), (122), (121), (112) and (111). The ascertainment data for the hypothetical sample of homeless individuals are aggregated in categorical format in Table 3. Specifically, we let $m_{s_1 s_2 \ldots s_t}$ be the number of individuals with capture record $s_1, s_2, \ldots, s_t$ across the t registries, where $s_j = 2$ denotes absence in registry j while $s_j = 1$ denotes presence in registry j. In the hypothetical example, there


were 8 people listed in Registry 3 only, 11 listed in Registry 2 only, and 5 people listed in both Registries 2 and 3 but not 1. The other records have similar interpretations.

[Table 3 goes here]

Our approach is to use log-linear analysis to model the expected value of each observable category. The data are regarded as an incomplete contingency table for which the cell corresponding to those individuals who are absent in all three registries is missing. There are $2^k = 8$ potential models that may fit the data derived from three

sources based on the interdependencies (Hook and Regal, 1995). Among the eight possible models, equation (1) below represents the one containing no interaction terms (the sources are presumed to be independent), equation (2) represents the three models containing an interaction between one pair of sources only, equation (3) represents the three models containing interactions between two pairs of sources, and equation (4) represents the single model containing interactions between all three pairs of sources. As with the independence assumption in the two-source case, it is not possible to test the assumption of no second-order interaction, $\mu^{123}$; however, other log-linear models can be fit to the data (Fienberg, 1998). The corresponding models are:

$$\ln m_{ijk} = \theta + \mu^{1}_{i} + \mu^{2}_{j} + \mu^{3}_{k} \qquad (1)$$

$$\ln m_{ijk} = \theta + \mu^{1}_{i} + \mu^{2}_{j} + \mu^{3}_{k} + \mu^{12}_{ij} \qquad (2)$$

$$\ln m_{ijk} = \theta + \mu^{1}_{i} + \mu^{2}_{j} + \mu^{3}_{k} + \mu^{12}_{ij} + \mu^{13}_{ik} \qquad (3)$$

$$\ln m_{ijk} = \theta + \mu^{1}_{i} + \mu^{2}_{j} + \mu^{3}_{k} + \mu^{12}_{ij} + \mu^{13}_{ik} + \mu^{23}_{jk} \qquad (4)$$

where:

$\theta$ = the constant term
$\mu^{1}_{i}$ = the main effect of Source 1
$\mu^{2}_{j}$ = the main effect of Source 2
$\mu^{3}_{k}$ = the main effect of Source 3
$\mu^{12}_{ij}$ = the interaction effect of Sources 1 and 2
$\mu^{13}_{ik}$ = the interaction effect of Sources 1 and 3
$\mu^{23}_{jk}$ = the interaction effect of Sources 2 and 3

and $m_{ijk}$ denotes the number of cases with capture record (i, j, k) across the three samples, where 1 = present and 2 = missing. We solve to estimate the missing population $m_{222}$ (see footnote 1):

$$m_{222} = \frac{m_{111}\, m_{122}\, m_{212}\, m_{221}}{m_{112}\, m_{121}\, m_{211}} \qquad (5)$$

And thus can estimate the total population:

$$N = N_{obs} + m_{222} \qquad (6)$$

where the observed population is:

$$N_{obs} = m_{111} + m_{221} + m_{121} + m_{211} + m_{122} + m_{212} + m_{112} \qquad (7)$$

The purpose is to estimate the number of homeless persons (N) in a given geographic area. This is equivalent to predicting the number of homeless individuals missed by all three registries (i.e. m222 ). [Tables 4 and 5 go here] Using log-linear analysis with a saturated model we derive an estimate of 68 cases per 1000 persons. Following suit, we estimate the standard error (not based on probability) to be 19. This could be used to create a confidence interval of 53 to 147 cases. This estimate is more intuitively sensible in that it is consistent with believing that direct methods undercount when dealing with hidden and elusive populations.
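The arithmetic behind equations (5) to (7) can be sketched in a few lines of Python. This is an illustration only, not the authors' code. Table 3 is not reproduced here, so only three of the cell counts below (8, 11 and 5) come from the text; the other four are assumed values, chosen so that each registry holds 26 cases and the three registries hold 50 distinct cases, as described above. The function and variable names are hypothetical.

```python
def missing_cell_estimate(m):
    """Estimate m222, the cases missed by all three registries (equation 5)."""
    numerator = m["111"] * m["122"] * m["212"] * m["221"]
    denominator = m["112"] * m["121"] * m["211"]
    if denominator == 0:
        raise ValueError("estimate is undefined when an overlap cell is empty")
    return numerator / denominator


def population_estimate(m):
    """Total population: observed cases plus the estimated missing cell (equations 6 and 7)."""
    n_obs = sum(m.values())
    return n_obs + missing_cell_estimate(m)


# Capture histories use the paper's coding: 1 = present, 2 = absent.
counts = {
    "221": 8,   # Registry 3 only (from the text)
    "212": 11,  # Registry 2 only (from the text)
    "211": 5,   # Registries 2 and 3 but not 1 (from the text)
    "111": 5,   # ASSUMED: all three registries
    "121": 8,   # ASSUMED: Registries 1 and 3 only
    "112": 5,   # ASSUMED: Registries 1 and 2 only
    "122": 8,   # ASSUMED: Registry 1 only
}

print(f"observed cases: {sum(counts.values())}")
print(f"estimated missed cases: {missing_cell_estimate(counts):.1f}")
print(f"estimated population: {population_estimate(counts):.1f}")
```

With these assumed counts the estimated population is close to the 68 cases quoted above, but that agreement depends on the assumed cells and should not be read as a reconstruction of Table 3.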


III. Model Sensitivity to Capture Probabilities

[Figure 1 goes here]

Figure 1 demonstrates the relationship of three independent sources of the same characteristic. Sample one includes sections A, B, D, E; sample two includes sections B, C, E, F; and sample three includes D, E, F, G. In this example, the probability of being captured in sample one, two, or three is written as P1, P2, and P3 respectively. The probability of being captured in sample one (P1) is the probability that a person is in section A, B, D, or E of Figure 1. From now on, unless specified otherwise, we assume symmetric capture probabilities, i.e. P1 = P2 = P3. The “unknown population,” or N predict, is the area outside the three circles in Figure 1, that is, those cases of the study characteristic that are not present in samples 1, 2, or 3. The variation of N predict depends on which and how many data sources an individual is captured in.

Footnote 1: This example is taken from Fienberg (1972), where the reader can find a further explanation of CR and log-linear models.

In reality it is difficult to determine the accuracy of a capture-recapture estimate, but in theory we can. A simulation can elucidate the precision with which capture-recapture is able to estimate the population by comparing estimates to predetermined population parameters. Simulations were employed to investigate how the estimator performs under certain pre-specified conditions by estimating the population totals for the two-factor model and the three-factor model. More specifically, this simulation of the capture-recapture method will demonstrate model sensitivity to capture probabilities. The simulations to predict the size of the population were done using Visual FORTRAN. For purposes of analysis, the population size was preset at N = 1000 for the first six models and N = 2000 for the next six models. Ten thousand trials were calculated for each level of probability at the given preset population size. The simulation calculated estimates of the population according to each probability of capture, which ranged from 0.05 to 0.50 in increments of 0.05. The average estimate and standard deviation of the 10,000 trials are listed below.

For trials 1-4, we preset our population to be either 1000 or 2000, and assume equal (symmetric) capture probabilities ranging from 0.05 to 0.50 in increments of 0.05. Both here and below, our simulations are based on N = 10,000 iterations. Table 6 shows our results for the two-factor model with a population size of 1000.

[Table 6 goes here]

The results of the simulation demonstrate that as the capture probabilities increase, the point estimators of our population parameters become more precise, as is made apparent by their smaller range, standard error, and coefficient of variation.

[Table 7 goes here]

In Table 7, the smaller coefficient of variation suggests that a doubling of the population size from 1000 to 2000 results in even more accurate estimates.

[Table 8 goes here]

Note that the capture–recapture model is sensitive not only to the probability of capture but to the number of factors included in the model as well. The three factor model


is less stable than the two-factor model, and this is especially true when the capture probability is small.

[Table 9 goes here]

Overall, our results demonstrate that as the population size increases, the estimate becomes more stable. For example, with p = 0.5 and n = 2000, the coefficient of variation is smaller than when n = 1000. The differences in our simulations, while subtle, suggest that estimates derived from capture-recapture methods are sensitive to capture probabilities, population size and the number of registries included. As a general estimation procedure, capture-recapture provides very good estimates of population parameters when the population is large or when the probability of capture is high. On the other hand, we recommend using this technique with caution if the population is small or the capture probability is negligible. Empirically, for n ≈ 1000 we suggest using capture-recapture when p ≥ 0.25, and when p ≥ 0.20 only if n ≈ 2000.

IV. Model sensitivity to list dependency

A crucial assumption of capture-recapture is that the samples or registries are independent. Under this assumption, the ratio of marked subjects that are recaptured in the second sample will be the same as the ratio of marked subjects to the total number of objects. For example, the capture probability P1 should be the same as P12/1, the probability of being included in both samples 1 and 2 given inclusion in sample 1. In practice, however, because dependencies often exist between two or more sources, loglinear modeling is used to adjust for correlation bias. In this section we will focus on the effects of dependency for the capture–recapture model.
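As a preview of the dependence effect studied in this section, the short Python sketch below plugs expected counts into the two-sample estimator (the product of the two sample sizes divided by the overlap). It is a simplified illustration, not the simulation used in this paper: it ignores sampling variation, and the function name and the particular recapture values are assumptions made for the example.

```python
def expected_two_source_estimate(n_total, p1, p2, p_recapture):
    """Two-sample estimate n1 * n2 / m evaluated at the expected counts.

    p_recapture is the probability of appearing in sample 2 given
    inclusion in sample 1 (written P12/1 in the text).
    """
    n1 = n_total * p1        # expected size of sample 1
    n2 = n_total * p2        # expected size of sample 2
    m = n1 * p_recapture     # expected number found in both samples
    return n1 * n2 / m


# Hypothetical sweep: the true population is 1000 and P1 = P2 = 0.33.
for r in (0.05, 0.10, 0.20, 0.30, 0.33):
    estimate = expected_two_source_estimate(1000, 0.33, 0.33, r)
    print(f"recapture probability {r:.2f}: expected estimate about {estimate:.0f}")
```

Under these assumptions the estimate returns the true population only when the recapture probability equals the marginal capture probability; a smaller recapture probability (the second list avoiding cases already on the first) inflates the estimate.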


A. The two factor model

In this trial we preset the total population at 1000. We have designed the simulation in such a way that the second sample intentionally excludes cases present in the first sample. The probability of being captured in sample one is assumed to be the same as the probability of being captured in sample two, i.e., P1 = P2 = .33, but we allow P12/1, the recapture probability, to range from 0.05 to 0.30. We fully expect to observe large deviations from the independence assumption when the recapture probability is small. The simulation results are given in Table 10. The results suggest that as P12/1 increases the population estimate becomes more precise. The best estimate is obtained when the recapture probability is approximately equal to the probability of being captured in sample one, as we would expect, because this is precisely when the recapture probability approaches the independent case (which would occur when P12/1 = 0.33).

[Table 10 goes here]

B. The three factor model

For trials 1 through 12, we preset the total population at 1000. In this simulation, the uni-sample, bi-sample and tri-sample probabilities of being captured are varied to determine how sensitive the formula is to capture.

For trial 1: The probability of being captured in sample one is P1 = .33. The same is true for samples two and three (see Table 11). The probability of being captured in the intersection of samples one and two, given that the person has already been captured in sample one or in sample two, is set at P12/1 = P12/2 = .30. The same probability is set for P13 and P23. Finally, the probability of being captured in all three databases (P123), given that the person has already been captured in Samples 1 AND 2, Samples 1 AND 3, or Samples 2 AND 3, is set at .33. Given these uni-sample, bi-sample and tri-sample probabilities, the estimate of the total population is 1242 (see footnote 2).

For trials 2-4: The probabilities of the uni-sample and the tri-samples remain constant at P = .33 while the bi-sample probabilities are decreased by .05 per subsequent trial (see Table 11). With each decrease, the population estimate increases dramatically.

For trials 5-8: The uni-sample and the bi-sample probabilities are held constant at .33 and the tri-sample probability is decreased by .05 per subsequent trial. In contrast to trials 1-4, the decreasing probability has the opposite effect of decreasing the population estimates (see Table 11).

For trials 9-12: The bi-sample and tri-sample probabilities are held constant at .33 and the uni-sample probabilities are decreased by .05 per subsequent trial. This decrease in probability also has a negative effect on the population estimates (see Table 11).

For trials 13-36: The uni-sample probabilities are held constant at .33 while the bi-sample and tri-sample probabilities are varied from 0.05 to 0.30, increasing by .05 per subsequent trial. Generally speaking, this increase in probability has a positive effect on the population estimates (see Table 11).

In sum, the model is sensitive to the probability of being captured. Also, the dramatic increases and decreases in the predicted population created by the size of the intersections of the three samples are indicative of the importance of matching cases. For example, if there is one case located in the intersection of samples one, two and three (Section E in Figure 1), and only two out of the three sources are matched, then it would have an incremental effect on the population estimate as opposed to a negative effect. This would lead to a decrease in the population estimate: as we see in Table 11, the more likely a person is to be captured in a two-sample intersection, the smaller the population estimate. An error such as this would have a double effect, since the case would increase the bi-sample probability as well as decrease the tri-sample probability. The capture-recapture formula is more sensitive to errors in estimating populations than traditional survey research statistics because in a survey an error is additive, whereas in capture-recapture the error is exponential.

[Table 11 goes here]

[Table 12 goes here]

Table 12 shows the simulation results. As was the case in the two-sample model, a small P12/1 indicates intentional exclusion of cases in previous samples. As P12/1 or P123|12 converges towards P1, the estimate becomes more stable and more accurate. From these results we conclude that dependency between lists has a large effect on the corresponding estimates.

Footnote 2: This simulation is not intended to estimate realistic population sizes because dependencies have NOT been controlled; rather, this exercise is a demonstration of the sensitivity of the model to dependencies.

V. Symmetric versus asymmetric capture probabilities

So far we have discussed the symmetric case where P1 = P2 = P3. In real life, however, the capture probabilities of the various samples may differ. Therefore, it is important to explore the effects of this probabilistic asymmetry on capture–recapture estimates. To do so, we fix the total probability of being captured by at least one of two lists at 64%, and in Table 13 we compare three cases that satisfy this constraint: (1) symmetric capture probability, P1 = P2 = 0.4; (2) moderate asymmetry, P1 = 0.5, P2 = 0.28; and (3) large asymmetry, P1 = 0.6, P2 = 0.1.

[Table 13 goes here]
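The 64% constraint on the three cases can be checked with a short Python sketch. The sketch also reports the expected number of cases found on both lists under independence, which shrinks as the asymmetry grows; this is offered only as one plausible reason for the less stable estimates, not as the authors' explanation. The population size of 1000 and the function names are assumptions made for the example.

```python
def total_capture_probability(p1, p2):
    """Probability of appearing on at least one of two independent lists."""
    return 1 - (1 - p1) * (1 - p2)


def expected_overlap(n_total, p1, p2):
    """Expected number of cases found on both lists under independence."""
    return n_total * p1 * p2


# The three cases compared in the text, as (P1, P2).
cases = {
    "symmetric": (0.4, 0.4),
    "moderate asymmetry": (0.5, 0.28),
    "large asymmetry": (0.6, 0.1),
}

for label, (p1, p2) in cases.items():
    total = total_capture_probability(p1, p2)
    overlap = expected_overlap(1000, p1, p2)  # ASSUMED population of 1000
    print(f"{label}: total capture = {total:.2f}, expected overlap = {overlap:.0f}")
```

Because the overlap is the denominator of the two-sample estimator, a smaller expected overlap leaves the estimate more exposed to chance fluctuations in the number of recaptures.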


The simulation result demonstrates that the symmetric capture probability case yields better estimates than either of the asymmetric cases.

VI. Precision of the Technique Capture-Recapture

Given the logistics and sensitivity of capture-recapture, it is still not possible to determine the efficacy of the technique for indirectly estimating the population size. This technique is used when the true size of the population is unknown; therefore, ascertaining the accuracy is difficult. There are significant benefits in having multiple samples from which to estimate the same population, as is the case in capture-recapture (Fienberg, 1972). However, there are other means to determine precision.

A. Precision in Simulations

(1) Coefficient of Variation

In the following example, a two-factor model of capture-recapture with no dependencies is used to predict the size of an unknown population using simulation techniques. The three-factor model can only accommodate specific dependencies between samples and therefore an estimate for comparison is not possible without log-linear modeling. The following findings are confirmed by the simulation results. As the probability of being captured increases, the mean estimate (column 3) moves closer to the real population size (column 1) and the standard deviation (column 6) decreases. When the number of cases captured in the sample drops below 15%, the standard deviation becomes quite large. As the capture probability increases the standard deviation becomes smaller.

[Table 14 goes here]
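A two-factor simulation of this kind can be sketched in Python as follows. This is a minimal sketch, not the Visual FORTRAN code used for this paper. It assumes independent lists with equal capture probabilities, uses only 200 trials per setting so that it runs quickly (the paper uses 10,000), and skips trials in which no case appears on both lists, a handling rule the paper does not specify. All names are hypothetical.

```python
import random
import statistics


def simulate_two_source(n_total, p, trials, rng):
    """Return two-sample estimates n1 * n2 / m from repeated simulated trials."""
    estimates = []
    for _ in range(trials):
        n1 = n2 = both = 0
        for _ in range(n_total):
            in_first = rng.random() < p    # captured by list 1
            in_second = rng.random() < p   # captured by list 2, independently
            n1 += in_first
            n2 += in_second
            both += in_first and in_second
        if both > 0:  # ASSUMPTION: trials with no overlap are skipped
            estimates.append(n1 * n2 / both)
    return estimates


def summarize(estimates):
    """Mean, standard deviation, and coefficient of variation (sd / mean)."""
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)
    return mean, sd, sd / mean


rng = random.Random(2024)  # fixed seed so the run is repeatable
for p in (0.05, 0.15, 0.25, 0.50):
    mean, sd, cv = summarize(simulate_two_source(1000, p, 200, rng))
    print(f"p={p:.2f}  mean={mean:8.1f}  sd={sd:8.1f}  cv={cv:.3f}")
```

Reporting the coefficient of variation alongside the standard deviation makes runs with different preset population sizes comparable on one scale.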


The inverse relationship between probability of capture and standard deviation is consistent regardless of sample size. However, since the standard deviation is a function of sample size, the statistic “Coefficient of Variability” is also calculated. The Coefficient of Variability takes into account the fact that distributions with very large means usually have larger standard deviations by dividing the standard deviation by the mean (Blalock, 1979). As can be seen in the final column of Table 14, the Coefficient increases dramatically when the probability of capture is less than .15. This relation remains regardless of sample size. In other words, the variation of capture-recapture estimates is a function of the probability of capture and NOT the size of the population. Therefore, researchers should assume that the point estimate is closer to the real population when a larger proportion of the real cases are included in the analysis.

(2) Confidence Intervals

Bootstrapping is a widely used re-sampling method for determining the upper and lower bounds of confidence intervals when the usual assumptions of parametric statistical methods are not tenable (Efron and Tibshirani, 1993). Since bootstrap methods are non-parametric, the estimates they produce are robust to assumption violations. In addition, they provide a way to estimate confidence intervals for statistics with complicated sampling distributions (Haukoos, 2005). Because bootstrap procedures are second-order correct (Bertail, 1997), distributions based on them are closer to the sampling distribution than are those based on normal approximations. Although log-linear modeling allows us to obtain unbiased estimates of population totals, it is useful to construct confidence intervals around these estimates with bootstrapping procedures. The bootstrap procedure is briefly summarized below:


1. Multinomial probabilities for each sample are calculated based on log-linear modeling.

2. The parametric bootstrap procedure is applied B times by re-sampling from the multinomial probabilities.

3. An ordered sequence of B estimates containing population totals is obtained. A $100(1 - \alpha)\%$ confidence interval is constructed by truncating two equal tails of $B(\alpha/2)$.
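The three steps above can be sketched in Python for the simplest case of two independent lists. This is a simplified illustration under stated assumptions, not the procedure used to produce Table 15: the multinomial probabilities come from the two-sample estimator rather than from a fitted log-linear model, the cell counts in the usage lines are invented for the example, and resamples with no overlap are skipped. The function names are hypothetical.

```python
import random


def two_sample_estimate(n11, n10, n01):
    """Population estimate n1 * n2 / m from the three observed cells."""
    return (n11 + n10) * (n11 + n01) / n11


def bootstrap_interval(n11, n10, n01, b=2000, alpha=0.05, seed=0):
    """Percentile interval from a parametric bootstrap of the 2x2 capture table."""
    rng = random.Random(seed)
    n_hat = two_sample_estimate(n11, n10, n01)
    n00_hat = n_hat - (n11 + n10 + n01)   # estimated cases missed by both lists
    cells = ["11", "10", "01", "00"]
    weights = [n11, n10, n01, n00_hat]    # step 1: proportional to cell probabilities
    size = round(n_hat)
    estimates = []
    for _ in range(b):                    # step 2: resample B times
        draw = rng.choices(cells, weights=weights, k=size)
        c11, c10, c01 = draw.count("11"), draw.count("10"), draw.count("01")
        if c11 > 0:                       # ASSUMPTION: skip resamples with no overlap
            estimates.append(two_sample_estimate(c11, c10, c01))
    estimates.sort()                      # step 3: order, then trim equal tails
    tail = int(len(estimates) * alpha / 2)
    return estimates[tail], estimates[len(estimates) - 1 - tail]


# Hypothetical counts: 109 cases on both lists, 221 on each list alone.
lower, upper = bootstrap_interval(109, 221, 221, b=500, seed=1)
print(f"point estimate: {two_sample_estimate(109, 221, 221):.0f}")
print(f"95% bootstrap interval: ({lower:.0f}, {upper:.0f})")
```

The cell for cases missed by both lists is included when resampling and then discarded when re-estimating, so each resample mimics the incomplete table an analyst would actually observe.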

Coverage probabilities provide an index of how well the bootstrap confidence interval fits the data and give us a measure of precision for our estimates. Consequently, if the capture-recapture model is effective, we expect to observe higher coverage probabilities. Table 15 shows the simulation results of coverage probabilities for confidence intervals based on the bootstrap methodology. Our estimates are based on a two-factor model with P1 = P2 = 0.33 and two settings of the recapture probability, P12|1 = 0.25 and P12|1 = 0.30. Furthermore, our total population size was n = 1000 while our bootstrap resample was B = 2000, with N = 1000 iterations.

[Table 15 goes here]

The simulation results confirm that as the conditional probability approaches 0.33 (a situation corresponding to observing independent registries), the coverage probability increases. In other words, the bootstrapping procedure suggests that capture-recapture is more precise when dependencies between lists are weak.

B. Shape of the Capture-Recapture Distribution

Next, we explore the shape of the distribution of the capture-recapture estimates for a three-factor model with population size n = 1000. We plot the N = 10,000 estimates of population size for two cases corresponding to the smallest and largest capture probabilities, P1 = 0.05 and P2 = 0.50. Figures 2 and 3 show that the distribution of estimates is right-skewed, meaning that estimates based on capture-recapture will be biased towards zero (i.e., they will underestimate the true population total). On the other hand, when the capture probability increases, the distribution of estimates is less skewed and tends toward normality.

[Figures 2 and 3 go here]

C. Empirical Reality

There is significant evidence in previous research studies that capture-recapture has yielded reasonable estimates of invisible populations. For example, public health workers in Bangkok suspected much higher rates of heroin use than were detected in survey data and registries (Mastro et al., 1994). Higher rates were suspected because public health researchers were able to determine the percentage of heroin users who were HIV positive, and the number of HIV cases increased more than the reported number of heroin users. In an epidemiological study using capture-recapture, Mastro et al. (1994) estimated the number of injection drug users in Bangkok and were able to count those cases missed by direct estimation procedures (the public health surveillance system). An additional six thousand heroin users were identified, which directly reflected the increase in HIV cases that were expected to be contracted from heroin usage.

D. Other Applications

Animal scientists from the US Fish and Wildlife Services, the National Wildlife Federation, and the Biological Resources Division of the US Geological Survey have long monitored animal and insect species using capture-recapture (Schemnitz, 1980). With this technique wildlife specialists have learned how many animals within a species exist, along with their sex ratios, location, migratory patterns, frequency of breeding, habitat ecology, survival rates and fertility rates. The information these scientists collect regarding animal species is so specific that several species on the verge of extinction have been rescued through the tailored “Recover Programs” designed to save threatened animal species from becoming extinct. “Animal demographers” understand the fertility, mortality, migration processes and population size of some elusive creatures (Schemnitz, 1980; Seber, 1993).

VII. Conclusions

Can the capture-recapture technique provide a reasonable estimate of the size of an invisible population? We believe capture-recapture is a useful statistical technique for estimating the size of invisible populations; however, there are some caveats.

First, the simulations demonstrated that the model is sensitive to the probability of being captured; therefore, point estimates alone do not suffice and should routinely be accompanied by confidence intervals. As shown in Table 14, when sample representation drops below 15%, the confidence placed in the estimate is reduced considerably as the standard deviation doubles. When sample representation of the observed cases exceeds 15%, the standard deviation decreases substantially and the range of the estimates narrows. The smaller the probability of being captured in the sample, the less confidence the researcher should place in the point estimate and the more heavily the researcher should rely on the confidence intervals constructed around it.

Second, the capture-recapture estimates follow a positively skewed distribution. This characteristic should be considered when calculating standard errors for use in confidence intervals. Because of the positive skew, the upper interval estimates deserve more credence than the lower interval estimates. This study supports Fienberg's (1998) contention that it is best to profile population estimates with intervals to reduce error; the interval between the point estimate and the upper bound is more likely to contain the true population size than the interval between the point estimate and the lower bound. Additionally, our bootstrap estimates and coverage probabilities suggest that when only weak dependencies are present in the data, capture-recapture estimates provide good approximations to population parameters.

Third, there are necessary assumptions that must be maintained for estimation purposes. If population stability and accurate case matching can be established, the remaining violations (sample dependency and heterogeneity) can be statistically modeled.

Fourth, a model with equal capture probabilities provides more accurate and stable estimates than one with asymmetric capture probabilities.

Fifth, some researchers have prematurely dismissed capture-recapture. Their reluctance to accept this procedure as a valid and reliable technique (Hay, 1997) may be a function of discipline. Some researchers may not be aware of the refinements developed by statisticians (Pledger et al., 2005; Fienberg, 1972; Bishop et al., 1975; Fienberg, 1998; Cormack, 1981; Ding, 1990; Ding and Fienberg, 1994; Sekar and Deming, 1949; Alho, 1990; Alho et al., 1993; Ericksen and Kadane, 1985; Darroch et al., 1993; Agresti, 1994; Agresti and Lang, 1993).

In sum, despite these cautions, researchers can be confident that capture-recapture techniques produce reasonable estimates of otherwise hard to enumerate populations. They also represent the best that can be done at the present time.
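The sensitivity to capture probability described in these conclusions can be illustrated with a small simulation. The sketch below is not the authors' simulation code. It assumes two independent samples with equal capture probability and uses Chapman's bias-corrected form of the Lincoln-Petersen estimator, a swapped-in choice that avoids division by zero when nobody is recaptured. The function names `simulate_estimates` and `summarize`, the seed, and the number of replications are illustrative assumptions.

```python
import random
import statistics


def simulate_estimates(pop_size, p_capture, n_reps, seed=0):
    """Draw two independent samples n_reps times and estimate pop_size each time."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        n1 = n2 = m = 0
        for _ in range(pop_size):
            in_first = rng.random() < p_capture
            in_second = rng.random() < p_capture
            n1 += in_first
            n2 += in_second
            m += in_first and in_second  # captured in both samples
        # Chapman's bias-corrected Lincoln-Petersen estimator (assumed here)
        estimates.append((n1 + 1) * (n2 + 1) / (m + 1) - 1)
    return estimates


def summarize(estimates):
    """Return the precision measures reported in the simulation tables."""
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)
    return {
        "mean": mean,
        "max": max(estimates),
        "min": min(estimates),
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation
    }


if __name__ == "__main__":
    for p in (0.05, 0.15, 0.30, 0.50):
        s = summarize(simulate_estimates(1000, p, 200))
        print(f"p={p:.2f} mean={s['mean']:.1f} sd={s['sd']:.1f} cv={s['cv']:.3f}")
```

Under these assumptions the standard deviation and CV should shrink as the capture probability rises, and the maximum should sit much further from the mean than the minimum at low capture probabilities. Exact values will differ from the paper's tables because the estimator and the random draws differ.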


References:

Aaron, D.J., Chang, Y-F, Markovic, N., LaPorte, R.E. (2003) Estimating the lesbian population: a capture-recapture approach, Journal of Epidemiological Community Health, 57, 207-209.

Aaron, D.J., Markovic, N., Danielson, M.E., et al. (2001) Behavioral risk factors for disease and preventive health practices among lesbians, American Journal of Public Health, 91, 972-975.

Agresti, A. (1994) Simple capture-recapture models permitting unequal catchability and variable sampling effort, Biometrics, 50, 494-500.

Agresti, A., Lang, J. B. (1993) Quasi-symmetric latent class models, with application to rater agreement, Biometrics, 49, 131-139.

Alho, J. M. (1990) Logistic regression in capture-recapture models, Biometrics, 46, 623-35.

Alho, M. Mulry, M. H., Wurdeman, K. et al. (1993) Estimating heterogeneity in the probabilities of enumeration for the dual-system estimation, Journal of American Statistical Association, 80, 98-131.

Barnes, Suzie et al. (1995) Michigan Heroin Prevalence Study 1994. College of Urban, Labor and Metropolitan Affairs. Wayne State University.

Bertail, P. (1997) Second-order properties of an extrapolated bootstrap without replacement under weak assumptions, Bernoulli, 3(2), 149-179.

Bishop, Y. M. M., Fienberg, S. E., Holland, P. W. (1975) Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press, 229-256.

Blalock, H. M. (1979) Social Statistics, 2nd ed. New York: McGraw-Hill.

Burnham, K. P., Anderson, D. R. (2002) Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed. New York: Springer.

Chao, A., Tsay, P.K., Lin, S.H., Shau, W.Y., Chao, D.Y. (2001) The applications of capture-recapture models to epidemiological data, Statistics in Medicine, 20(20), 3123-3157.

Cormack, R. (1992) Interval estimation for mark-recapture studies of closed populations, Biometrics, 48, 567-576.


Cormack, R. M. (1981) Log-linear models for capture-recapture experiments on open populations. In: Hiorns, R.W., Cooke, D., eds., The mathematical theory of the dynamics of biological populations II. London: Academic Press.

Corrao, G., Bagnardi, V., Vittadini, G., Favilli, S. (2000) Capture-recapture methods to size alcohol related problems in a population, Journal of Epidemiological Community Health, 54, 603-610.

Darroch, J. N., Fienberg, S. E., Glonek, G. F. G., et al. (1993) A three-sample multiple-recapture approach to census population estimation with heterogeneous catchability, Journal of American Statistical Association, 88, 1137-1148.

Efron, B., Tibshirani, R. J. (1993) An Introduction to the Bootstrap. London, U.K.: Chapman & Hall.

Ericksen, E. P., Kadane, J. P. (1985) Estimating the population in a census year: 1980 and beyond, Journal of American Statistical Association, 80, 98-131.

Fienberg, Stephen E. (1972) The multiple recapture census for closed populations and incomplete 2^k contingency tables, Biometrika, 3, 591-601.

Fienberg, Stephen E. (1998) Innovative Methodology for Capture Recapture Studies. Workshop Notebook from the North Eastern Biometrics Conference, Pittsburgh, PA. March 31, 1998.

Fisher, N., Turner, S. W., Pugh, R., et al. (1994) Estimating numbers of homeless and homeless mentally ill people in north east Westminster by using capture-recapture analysis, British Medical Journal, 308, 27-30.

Gjini, A., Stuart, J.M., George, R.C., Nichols, T., Heyderman, R.S. (2004) Capture-recapture analysis and Pneumococcal Meningitis Estimates in England, Emerging Infectious Diseases, 10, 87-93.

Goldman, G.S. (2004) Using capture-recapture methods to assess varicella incidence in a community under active surveillance, Vaccine, 3, 22(25-26).

Haukoos, J.S., Lewis, R.J. (2005) Advanced statistics: bootstrapping confidence intervals for statistics with "difficult" distributions, Academic Emergency Medicine, 12(4), 360-5.

Hay, Gordon (1997) The selection from multiple data sources in epidemiological capture-recapture studies, The Statistician, 46, No. 4, 515-520.

Hines, J. E., Kendall, W. L., Nichols, J.D. (2003) On the use of robust design with transient capture-recapture models, The Auk, 120, 1151-1158.


Hook, E. B., Regal, R.R. (1995) Capture-Recapture Methods in Epidemiology: Methods and Limitations, Epidemiologic Reviews, 17, No. 2.

Hook, E. B., Regal, R.R. (1993a) Accuracy of alternative approaches to capture-recapture estimates of disease frequency: internal validity analysis of data from five sources, American Journal of Epidemiology, 152, 771-779.

Hook, E. B., Regal, R.R. (1993b) Effect of Variation in Probability of Ascertainment by Sources (Variable Catchability) upon “Capture-Recapture” Estimates of Prevalence, American Journal of Epidemiology, 137, No. 10.

Hook, E. B. (1993) Re: Use of Bernoulli census and log-linear methods for estimating the prevalence of spina bifida in live births and the completeness of vital record reports in New York State (Letter), American Journal of Epidemiology, 137, 1285.

Hook, E. B., Regal, R.R. (1992) The value of capture-recapture methods even for apparent exhaustive surveys: the need for adjustment for source and ascertainment intersection in attempted complete prevalence studies, American Journal of Epidemiology, 135, 1060-7.

Hook, E. B., Regal, R.R. (1982) Validity of Bernoulli census, log-linear, and truncated binomial models for correcting for underestimates in prevalence studies, American Journal of Epidemiology, 116, 168-76.

Hook, E. B., Albright, S. G., Cross, P. K. (1980) Use of Bernoulli census and log-linear methods for estimating the prevalence of spina bifida in live births and the completeness of vital record reports in New York State, American Journal of Epidemiology, 112, 750-8.

Ismail, A.A., Beeching, N.J., Gill, G.V., Bellis, M.A. (2000) How many data sources are needed to determine diabetes prevalence by capture-recapture? International Journal of Epidemiology, 29, 536-541.

Kelly, A., Carvalho, M., Teljeur, C. (2003) Prevalence of opiate use in Ireland 2000-2001: A 3-source capture-recapture study, National Advisory Committee on Drugs.

Khan, S.I., Bhuiya, A., Uddin, J. (2004) Application of the capture-recapture method for estimating number of mobile male sex workers in a Port City of Bangladesh, Journal of Health Popular Nutrition, 22, 19-26.

LaPorte, R. E. (1994) Assessing the human condition – capture-recapture techniques, British Medical Journal, 305, 801-804.

Lincoln, F.C. (1930) Calculating waterfowl abundance on the basis of banding returns. Cir. U.S. Dept Agric., 118, 1-4.


Mastro, T.D., Kitayaporn, D., Weniger, B. G., Vanichseni, S., Laosunthorn, V., Uneklabh, T., Uneclabh, C., Choopanya, K., Limpakarnjanarat, K. (1994) Estimating the number of HIV-infected injection drug users in Bangkok: A capture-recapture method, American Journal of Public Health, 84(7), 1068-1069.

McKeganey, N., Barnard, M., Leyland, A., et al. (1992) Female street-working prostitution and HIV infection in Glasgow, British Medical Journal, 305, 801-4.

Morse, T., Dillon, C., Warren, N., Hall, C., Hovey, D. (2001) Capture-recapture estimation of unreported work-related musculoskeletal disorders in Connecticut, American Journal of Industrial Medicine.

Nanan, Debra J., White, Franklin (1997) Capture-Recapture: Reconnaissance of a Demographic Technique in Epidemiology, Chronic Diseases in Canada, 18, No. 4, 144-148.

Orton, H., Russel, R., Miller, L. (2001) Using active medical record review and capture-recapture methods to investigate the prevalence of Down Syndrome among live-born infants in Colorado, Teratology, 64, 14-19.

Orton, H., Gabella, B. (1999) Capture-recapture estimation using statistical software, Epidemiology, 10, 563-564.

Papoz, L., Balkau, B., Lellouch, J. (1996) Case counting in epidemiology – limitations of methods based on multiple data sources, International Journal of Epidemiology, 25, 474-478.

Petersen, C. G. J. (1896) The yearly immigration of young plaice into the Limfjord from the German Sea. Rep. Dan. Biol. Stn (1895), 6, 5-84.

Pledger, S. (2005) The Performance of Mixture Models in Heterogeneous Closed Population Capture-Recapture, Biometrics, 61(3), 868-873.

Pledger, S. (2000) Unified maximum likelihood estimates for closed capture-recapture models using mixtures, Biometrics, 56, 434-442.

Schemnitz, S. D. (1980) Wildlife Management Techniques Manual, 3rd ed. Washington, D.C.: Wildlife Society.

Seber, G. A. F. (1993) The estimation of animal abundance and related parameters. London: Griffin.

Sekar, C. C., Deming, W.E. (1949) On a method of estimating birth and death rates and the extent of registration, Journal of American Statistical Association, 44, 101-115.


Stephen, C. (1996) Capture-recapture methods in epidemiologic studies, Infection Control Hospital Epidemiology, 17, 262-266.

Squires, N. F., Beeching, N. J., Schlecht, B. J. M., Ruben, S. M. (1995) An estimate of the prevalence of drug misuse in Liverpool and a spatial-analysis of known addiction, Journal of Public Health Medicine, 17, 103-109.

Tsay, P.K., Chao, A. (2001) Population size estimation for capture-recapture models with applications to epidemiological data, Journal of Applied Statistics, 28, 25-36.

Tull, E.S., Butler, C., Gumbs, L., Williams, S. (1998) The use of capture-recapture methods to monitor diabetes in Dominica, West Indies, Pan American Journal of Public Health, 5, 303-307.

Watts, C, A., and D. Wilson, et al. (1994) Capture-recapture as a tool for program evaluation (Letter), British Medical Journal, 308, 858.

Wittes, J., Sidel, V. W. (1968) A generalization of the simple capture-recapture model with applications to epidemiological research, Journal of Chronic Diseases, 21, 287-301.

Wittes, J. T., Colton, T., Sidel, V. W. (1974) Capture-recapture methods for assessing the completeness of case ascertainment when using multiple information sources, Journal of Chronic Diseases, 27, 25-36.


Tables

I. Table 1. Hypothetical Ascertainment Data Based on Three Registries of Homeless Persons

| Sample 1 ID No. | Sample 1 Name | Sample 2 ID No. | Sample 2 Name | Sample 3 ID No. | Sample 3 Name |
|---|---|---|---|---|---|
| 1 | Mary Albert | 1 | Mary Albert | 2 | Christine Bosco |
| 2 | Christine Bosco | 3 | Beatrice Carrey | 26 | Thelma Xavier |
| 3 | Beatrice Carrey | 7 | Dalia Gallagher | 24 | Kiadra Zolton |
| 4 | Gertrude Dalton | 8 | Loren Hall | 4 | Gertrude Dalton |
| 5 | Abigale Evans | 11 | Wanda Kirk | 9 | Vivian Ithica |
| 6 | Shantelle Figaro | 12 | Noreen Lavender | 13 | Fergie Mason |
| 7 | Dalia Gallagher | 15 | Ophelia Oswald | 17 | Yolanda Queens |
| 8 | Loren Hall | 16 | Halle Pierson | 18 | Patricia Radcliffe |
| 9 | Vivian Ithica | 21 | Quintena Upstein | 1 | Mary Albert |
| 10 | Eva Juno | 25 | Megan Yeltzen | 3 | Beatrice Carrey |
| 11 | Wanda Kirk | 27 | Andrea Apple | 7 | Dalia Gallagher |
| 12 | Noreen Lavender | 28 | Barbara Busch | 8 | Loren Hall |
| 13 | Fergie Mason | 29 | Cherry Coldwater | 11 | Wanda Kirk |
| 14 | Xandria Nelson | 30 | Dorothy Dunn | 27 | Andrea Apple |
| 15 | Ophelia Oswald | 31 | Evelyn Edgar | 28 | Barbara Busch |
| 16 | Halle Pierson | 32 | Frani Fredrick | 29 | Cherry Coldwater |
| 17 | Yolanda Queens | 33 | Gail Getze | 30 | Dorothy Dunn |
| 18 | Patricia Radcliffe | 34 | Hedi Hopper | 31 | Evelyn Edgar |
| 19 | Indigo Sundry | 35 | Josephine Jaguar | 43 | Sheila Saber |
| 20 | Zena Tandy | 36 | Kyla Kola | 44 | Terri Treefold |
| 21 | Quintena Upstein | 37 | Luanna Larrow | 45 | Wilma Wallace |
| 22 | Justine Verigo | 38 | Margaret Mead | 46 | Ariel Arson |
| 23 | Ursula Watson | 39 | Natasha Nixon | 47 | Bethany Bardell |
| 24 | Kiadra Zolton | 40 | Orlanda Opal | 48 | Chloe Clemens |
| 25 | Megan Yeltzen | 41 | Paula Paddington | 49 | Diedra Dunson |
| 26 | Thelma Xavier | 42 | Rachel Ripzeld | 50 | Yvette Youth |

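II. Table 2. Three registry multiple recapture history

| Registry 3 | Registry 1 Included, Registry 2 Included | Registry 1 Included, Registry 2 Not Included | Registry 1 Not Included, Registry 2 Included | Registry 1 Not Included, Registry 2 Not Included |
|---|---|---|---|---|
| Included | m111 | m121 | m211 | m221 |
| Not Included | m112 | m122 | m212 | m222 |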

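III. Table 3. Aggregate Data on Homelessness (1 = included in the list, 2 = not included)

| List 1 | List 2 | List 3 | Data |
|---|---|---|---|
| 2 | 2 | 2 | m222 = ? |
| 2 | 2 | 1 | m221 = 8 |
| 2 | 1 | 2 | m212 = 11 |
| 2 | 1 | 1 | m211 = 5 |
| 1 | 2 | 2 | m122 = 8 |
| 1 | 2 | 1 | m121 = 8 |
| 1 | 1 | 2 | m112 = 5 |
| 1 | 1 | 1 | m111 = 5 |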
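The aggregate counts in Table 3 can be reproduced from the ID lists in Table 1. The following sketch tallies each person's capture history across the three samples, using the table's coding (1 = included, 2 = not included). The variable names and the `capture_histories` helper are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter

# ID numbers transcribed from Table 1
sample1 = set(range(1, 27))
sample2 = {1, 3, 7, 8, 11, 12, 15, 16, 21, 25} | set(range(27, 43))
sample3 = ({2, 26, 24, 4, 9, 13, 17, 18, 1, 3, 7, 8, 11}
           | set(range(27, 32)) | set(range(43, 51)))


def capture_histories(*samples):
    """Count persons by capture history, e.g. 'm121' = in lists 1 and 3 only."""
    counts = Counter()
    for person in set().union(*samples):
        history = "".join("1" if person in s else "2" for s in samples)
        counts["m" + history] += 1
    return counts


counts = capture_histories(sample1, sample2, sample3)
for cell in sorted(counts):
    print(cell, counts[cell])
```

The cell m222 (persons missed by all three lists) never appears in the tally; it is the unknown quantity that the capture-recapture models estimate.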

IV. Table 4. Number of identified cases in each list

| n1 | n2 | n3 |
|---|---|---|
| 26 | 26 | 26 |

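V. Table 5. Estimates Based on Log-Linear Models

| Model | Deviance | df | Estimate | Std. Err. | 95% CIL | 95% CIU |
|---|---|---|---|---|---|---|
| Independent | 1.71 | 3 | 62 | 6 | 55 | 80 |
| 13/2 | 0.00 | 2 | 68 | 9 | 57 | 96 |
| 23/1 | 1.45 | 2 | 61 | 6 | 54 | 81 |
| 12/3 | 1.45 | 2 | 61 | 6 | 54 | 81 |
| 12/23 | 0.91 | 1 | 58 | 6 | 52 | 78 |
| 12/13 | 0.00 | 1 | 68 | 12 | 55 | 111 |
| 23/13 | 0.00 | 1 | 68 | 12 | 55 | 111 |
| Symmetry | 1.60 | 4 | 67 | 18 | 53 | 142 |
| Quasi-sym | 1.60 | 2 | 67 | 18 | 53 | 142 |
| part-qs1 | 0.00 | 1 | 68 | 19 | 53 | 147 |
| part-qs2 | 1.17 | 1 | 67 | 18 | 53 | 144 |
| part-qs3 | 1.17 | 1 | 67 | 18 | 53 | 144 |
| saturated | 0.00 | 0 | 68 | 19 | 53 | 147 |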
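As a rough check on the "Independent" row of Table 5, the sketch below solves the standard maximum-likelihood condition for a closed population with independent lists, (1 - n/N) = Π(1 - n_i/N), where n is the number of distinct persons observed and n_i are the list sizes. Whether the authors obtained their estimate this way is an assumption; the function name `independence_estimate` and the bisection bounds are illustrative.

```python
def independence_estimate(list_sizes, n_observed, tol=1e-9):
    """Solve (1 - n/N) = prod(1 - n_i/N) for N by bisection."""
    def f(N):
        prod = 1.0
        for n_i in list_sizes:
            prod *= 1.0 - n_i / N
        return (1.0 - n_observed / N) - prod

    # Assumes the lists overlap and no single list is complete,
    # so that f(lo) < 0 < f(hi) on this bracket.
    lo, hi = float(n_observed), 1e6
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# List sizes from Table 4 and the 50 distinct persons in Table 1
estimate = independence_estimate([26, 26, 26], 50)
print(round(estimate, 1))  # about 62.4
```

The result rounds to 62, which agrees with the point estimate reported for the independent model. The models with interaction terms require an iterative log-linear fit and are not covered by this sketch.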

VI. Table 6. Measures of precision for the two-factor model with population size = 1000 in trial 1.

| Sample Size | Probability of Capture | MEAN Estimate | MAX Estimate | MIN Estimate | Estimated SD | CV |
|---|---|---|---|---|---|---|
| 1000 | 0.05 | 1565.533 | 38964 | 244 | 1670.057 | 1.066766 |
| 1000 | 0.10 | 1110.31 | 10528 | 453 | 448.5197 | 0.403959 |
| 1000 | 0.15 | 1034.163 | 3154 | 552 | 201.3101 | 0.19466 |
| 1000 | 0.20 | 1014.924 | 1874 | 654 | 135.1701 | 0.133183 |
| 1000 | 0.25 | 1008.013 | 1621 | 717 | 98.68423 | 0.0979 |
| 1000 | 0.30 | 1003.708 | 1385 | 773 | 74.88184 | 0.0746 |
| 1000 | 0.35 | 1002.398 | 1235 | 801 | 58.70313 | 0.0586 |
| 1000 | 0.40 | 1001.351 | 1205 | 842 | 47.30447 | 0.0472 |
| 1000 | 0.45 | 1000.486 | 1169 | 867 | 38.54342 | 0.0385 |
| 1000 | 0.50 | 1000.135 | 1127 | 892 | 31.57567 | 0.0316 |

VII. Table 7. Measures of precision for the two-factor model with population size = 2000 in trial 2.

| Sample Size | Probability of Capture | MEAN Estimate | MAX Estimate | MIN Estimate | Estimated SD | CV |
|---|---|---|---|---|---|---|
| 2000 | 0.05 | 2580.682 | 38218 | 672 | 1765.156 | 0.683988 |
| 2000 | 0.10 | 2100.982 | 5858 | 1055 | 474.2756 | 0.22574 |
| 2000 | 0.15 | 2032.783 | 3614 | 1280 | 267.9607 | 0.13182 |
| 2000 | 0.20 | 2013.955 | 2942 | 1469 | 185.5363 | 0.0921 |
| 2000 | 0.25 | 2008.926 | 2811 | 1550 | 138.6326 | 0.0690 |
| 2000 | 0.30 | 2002.139 | 2486 | 1661 | 104.3307 | 0.0521 |
| 2000 | 0.35 | 2001.724 | 2395 | 1733 | 82.2494 | 0.0411 |
| 2000 | 0.40 | 2001.136 | 2299 | 1776 | 67.40355 | 0.0337 |
| 2000 | 0.45 | 2000.218 | 2269 | 1826 | 54.57188 | 0.0273 |
| 2000 | 0.50 | 1999.944 | 2216 | 1854 | 44.79333 | 0.0224 |

VIII. Table 8. Measures of precision for the three-factor model with population size = 1000 in trial 3.

| Sample Size | Probability of Capture | MEAN Estimate | MAX Estimate | MIN Estimate | Estimated SD | CV |
|---|---|---|---|---|---|---|
| 1000 | 0.05 | 3218.736 | 437861 | 103 | 14390.13 | 4.47074 |
| 1000 | 0.10 | 1408.289 | 41396 | 217 | 1950.425 | 1.38496 |
| 1000 | 0.15 | 1109.261 | 6595 | 344 | 561.3645 | 0.506071 |
| 1000 | 0.20 | 1049.692 | 3670 | 466 | 295.5064 | 0.281517 |
| 1000 | 0.25 | 1026.998 | 2276 | 651 | 180.9468 | 0.17619 |
| 1000 | 0.30 | 1016.208 | 2108 | 716 | 121.1177 | 0.119186 |
| 1000 | 0.35 | 1010.545 | 1455 | 803 | 84.38704 | 0.0835 |
| 1000 | 0.40 | 1006.169 | 1355 | 852 | 60.78705 | 0.0604 |
| 1000 | 0.45 | 1003.816 | 1253 | 878 | 44.04869 | 0.0439 |
| 1000 | 0.50 | 1002.711 | 1175 | 902 | 32.52528 | 0.0324 |

IX. Table 9. Measures of precision for the three-factor model with population size = 2000 in trial 4.

| Sample Size | Probability of Capture | MEAN Estimate | MAX Estimate | MIN Estimate | Estimated SD | CV |
|---|---|---|---|---|---|---|
| 2000 | 0.05 | 4309.808 | 752392 | 229 | 15550.6 | 3.608189 |
| 2000 | 0.10 | 2319 | 32991 | 482 | 1745.597 | 0.752737 |
| 2000 | 0.15 | 2092.266 | 6598 | 695 | 672.494 | 0.321419 |
| 2000 | 0.20 | 2046.636 | 4570 | 1095 | 383.6801 | 0.187469 |
| 2000 | 0.25 | 2022.567 | 3393 | 1413 | 244.8424 | 0.121055 |
| 2000 | 0.30 | 2015.411 | 2848 | 1560 | 164.1244 | 0.0814 |
| 2000 | 0.35 | 2010.026 | 2611 | 1685 | 116.5299 | 0.0580 |
| 2000 | 0.40 | 2004.461 | 2417 | 1775 | 82.76697 | 0.0413 |
| 2000 | 0.45 | 2003.083 | 2300 | 1827 | 60.63626 | 0.0303 |
| 2000 | 0.50 | 2002.491 | 2257 | 1872 | 44.64991 | 0.0223 |

X. Table 10. Measures of precision for the two-factor model with dependencies

| P1 | P12/1 | MEAN | MAX | MIN | SD | CV |
|---|---|---|---|---|---|---|
| 0.33 | 0.05 | 7055.226 | 20196 | 3464 | 1877.408 | 0.27 |
| 0.33 | 0.10 | 3449.507 | 6496 | 2279 | 553.7256 | 0.16 |
| 0.33 | 0.15 | 2266.101 | 3407 | 1571 | 269.7505 | 0.12 |
| 0.33 | 0.20 | 1680.614 | 2356 | 1303 | 160.2413 | 0.10 |
| 0.33 | 0.25 | 1338.308 | 1814 | 1072 | 111.1226 | 0.08 |
| 0.33 | 0.30 | 1114.437 | 1405 | 907 | 78.37547 | 0.07 |

XI. Table 11. Probabilities of Being Captured

| Trial | P1 | P2 | P3 | P12/1 | P12/2 | P13/1 | P13/3 | P23/2 | P23/3 | P123/1 | P123/2 | P123/3 | Estimate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.33 | 0.33 | 0.33 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.33 | 0.33 | 0.33 | 1242 |
| 2 | 0.33 | 0.33 | 0.33 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.33 | 0.33 | 0.33 | 1949 |
| 3 | 0.33 | 0.33 | 0.33 | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 | 0.33 | 0.33 | 0.33 | 3564 |
| 4 | 0.33 | 0.33 | 0.33 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 | 0.33 | 0.33 | 0.33 | 7819 |
| 5 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.30 | 0.30 | 0.30 | 920 |
| 6 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.25 | 0.25 | 0.25 | 826 |
| 7 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.20 | 0.20 | 0.20 | 764 |
| 8 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.15 | 0.15 | 0.15 | 723 |
| 9 | 0.30 | 0.30 | 0.30 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 909 |
| 10 | 0.25 | 0.25 | 0.25 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 758 |
| 11 | 0.20 | 0.20 | 0.20 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 606 |
| 12 | 0.15 | 0.15 | 0.15 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 0.33 | 455 |

XII. Table 12. Measures of precision for the two-factor model with dependencies.

| P1 | P12/1 | P123/1 | MEAN | MAX | MIN | SD | CV |
|---|---|---|---|---|---|---|---|
| 0.33 | 0.05 | 0.05 | 1298.64 | 4533 | 927 | 460.12 | 0.35 |
| 0.33 | 0.05 | 0.10 | 1765.50 | 7025 | 932 | 809.95 | 0.46 |
| 0.33 | 0.05 | 0.15 | 2330.08 | 10519 | 937 | 1147.54 | 0.49 |
| 0.33 | 0.05 | 0.20 | 3116.87 | 35682 | 938 | 2024.35 | 0.65 |
| 0.33 | 0.05 | 0.25 | 4088.40 | 54974 | 945 | 2810.13 | 0.69 |
| 0.33 | 0.05 | 0.30 | 5349.33 | 97322 | 950 | 4541.13 | 0.85 |
| 0.33 | 0.10 | 0.05 | 1020.36 | 1652 | 866 | 110.65 | 0.11 |
| 0.33 | 0.10 | 0.10 | 1181.99 | 2181 | 887 | 184.08 | 0.16 |
| 0.33 | 0.10 | 0.15 | 1389.32 | 2877 | 900 | 275.76 | 0.20 |
| 0.33 | 0.10 | 0.20 | 1647.02 | 3707 | 908 | 376.85 | 0.23 |
| 0.33 | 0.10 | 0.25 | 1999.52 | 5259 | 1023 | 521.65 | 0.26 |
| 0.33 | 0.10 | 0.30 | 2451.72 | 7097 | 1063 | 724.64 | 0.30 |
| 0.33 | 0.15 | 0.05 | 917.38 | 1097 | 822 | 46.02 | 0.05 |
| 0.33 | 0.15 | 0.10 | 1002.14 | 1316 | 824 | 77.13 | 0.08 |
| 0.33 | 0.15 | 0.15 | 1112.81 | 1599 | 849 | 114.70 | 0.10 |
| 0.33 | 0.15 | 0.20 | 1255.66 | 1974 | 891 | 167.53 | 0.13 |
| 0.33 | 0.15 | 0.25 | 1436.05 | 2674 | 967 | 228.95 | 0.16 |
| 0.33 | 0.15 | 0.30 | 1679.22 | 3603 | 1022 | 317.95 | 0.19 |
| 0.33 | 0.20 | 0.05 | 838.22 | 924 | 767 | 24.32 | 0.03 |
| 0.33 | 0.20 | 0.10 | 888.27 | 1134 | 767 | 40.44 | 0.05 |
| 0.33 | 0.20 | 0.15 | 951.79 | 1286 | 826 | 56.89 | 0.06 |
| 0.33 | 0.20 | 0.20 | 1033.25 | 1474 | 850 | 80.72 | 0.08 |
| 0.33 | 0.20 | 0.25 | 1138.78 | 1790 | 912 | 113.46 | 0.10 |
| 0.33 | 0.20 | 0.30 | 1276.06 | 2367 | 937 | 157.77 | 0.12 |
| 0.33 | 0.25 | 0.05 | 775.05 | 836 | 726 | 18.58 | 0.02 |
| 0.33 | 0.25 | 0.10 | 806.53 | 904 | 745 | 25.06 | 0.03 |
| 0.33 | 0.25 | 0.15 | 846.03 | 988 | 759 | 32.64 | 0.04 |
| 0.33 | 0.25 | 0.20 | 897.76 | 1074 | 795 | 43.97 | 0.05 |
| 0.33 | 0.25 | 0.25 | 966.23 | 1196 | 809 | 59.40 | 0.06 |
| 0.33 | 0.25 | 0.30 | 1057.23 | 1404 | 845 | 83.31 | 0.08 |
| 0.33 | 0.30 | 0.05 | 716.01 | 763 | 656 | 15.55 | 0.02 |
| 0.33 | 0.30 | 0.10 | 736.03 | 804 | 677 | 17.97 | 0.02 |
| 0.33 | 0.30 | 0.15 | 761.70 | 852 | 683 | 21.53 | 0.03 |
| 0.33 | 0.30 | 0.20 | 794.94 | 898 | 715 | 27.84 | 0.04 |
| 0.33 | 0.30 | 0.25 | 839.80 | 995 | 746 | 37.64 | 0.04 |
| 0.33 | 0.30 | 0.30 | 897.80 | 1085 | 771 | 49.80 | 0.06 |

XIII. Table 13. Asymmetric probabilities and capture-recapture estimates.

| P1 | P2 | MEAN | MAX | MIN | SD | CV |
|---|---|---|---|---|---|---|
| 0.4 | 0.4 | 1001.351 | 1205 | 842 | 47.30447 | 0.0472 |
| 0.5 | 0.28 | 1001.672 | 1226 | 840 | 50.72381 | 0.0506 |
| 0.6 | 0.1 | 1005.559 | 1630 | 760 | 80.23787 | 0.0798 |

XIV. Table 14. Measure of Precision

| Population Size (1) | Probability of Capture³ (2) | MEAN Estimate (3) | MAX Estimate (4) | MIN Estimate (5) | SD (6) | CV (7) |
|---|---|---|---|---|---|---|
| 1000 | 0.05 | 1298 | 3105 | 302 | 721 | 0.56 |
| 1000 | 0.10 | 1124 | 2179 | 602 | 294 | 0.26 |
| 1000 | 0.15 | 1029 | 1527 | 656 | 192 | 0.18 |
| 1000 | 0.20 | 1027 | 1349 | 791 | 124 | 0.12 |
| 1000 | 0.25 | 1025 | 1386 | 889 | 88 | 0.08 |
| 1000 | 0.30 | 1012 | 1262 | 891 | 77 | 0.07 |
| 2000 | 0.05 | 2524 | 10494 | 993 | 1489 | 0.58 |
| 2000 | 0.10 | 2056 | 3534 | 1259 | 458 | 0.22 |
| 2000 | 0.15 | 2046 | 3071 | 1577 | 269 | 0.13 |
| 2000 | 0.20 | 2024 | 2817 | 1648 | 199 | 0.09 |
| 2000 | 0.25 | 1996 | 2451 | 1658 | 137 | 0.07 |
| 2000 | 0.30 | 1992 | 2263 | 1767 | 98 | 0.05 |

³ Probability is P12/1 and P12/2.

XV. Table 15. Bootstrapped Coverage Probabilities

P12/1: 0.25, 0.30

Coverage Probability: 0.619, 1.90E-02

Figures

I. Figure 1. Sections of Probability

II. Figure 2. Shape of the capture-recapture distribution for three-factor model for P1 = 0.05.


III. Figure 3. Shape of the capture-recapture distribution for three-factor model for P2= 0.50
