Estimating Parameters of Derived Random Variables


Transactions of the American Fisheries Society 131:667–675, 2002
© Copyright by the American Fisheries Society 2002

Estimating Parameters of Derived Random Variables: Comparison of the Delta and Parametric Bootstrap Methods

SHIJIE ZHOU*


Oregon Department of Fish and Wildlife, Post Office Box 59, Portland, Oregon 97207, USA

Abstract.—In quantitative fisheries research, analysts often need to estimate the parameters of derived random variables that cannot be observed directly and are instead computed from other observed random variables. The delta method, which is based on the Taylor series, has been widely used for approximating the mean and variance of derived variables. This paper applies a computer simulation method that involves the parametric bootstrap method for estimating the mean and variance of derived random variables. The results from these two methods are compared with the true values (when they can be computed) or with each other. Several types of commonly used functions in quantitative fisheries studies are examined, namely, the exponential, logarithmic, product, quotient, and asymptotic functions. The parametric bootstrap method leads to fairly accurate estimates for the mean and variance of the derived variables. The delta method provides reasonably good estimates when the variation (expressed as the coefficient of variation, CV, which is defined as 100·SD/mean) of the underlying variable is small; however, the deviation from the true values or the bootstrap results increases as the variations of the underlying observed variables increase. The impact is more severe for the variance estimator than for the mean estimator. The delta method under- or overestimates the variance by about 10% when the CVs of the underlying variables approach the following values: 25% for the function exp(x), 20% for log_e(x), 50% for xy, 15% for x/y and 1/x, 25% for log_e(x/y), and 30% for x[1 - exp(-yt)]. It is recommended that the bootstrap method be used for estimating the variance of a derived random variable when the observed variables have a large variance.

* E-mail: [email protected]. Received September 17, 2001; accepted January 3, 2002.

In quantitative fisheries research, the majority of variables are observations recorded as direct measurements or counts of biological responses. These variables may be called the observed or underlying random variables. There is another important class of variables that cannot be directly observed but have to be derived from the observed variables. Variables of this type may be called derived or computed variables (Sokal and Rohlf 1981). Data transformations, such as logarithmic and square root transformations, result in simple derived variables. Products, ratios, indices, and the like are also examples of derived variables. Other derived variables may involve complicated functions of multiple random variables. For example, the number of offspring of one individual that survive to independence is estimated from three observed variables: reproductive lifetime, fecundity, and survival rate (Brown and Alexander 1991). Similarly, with the area-under-the-curve method, the salmon spawning population is computed from several observed random variables: fish counts, stream life, and survey observation efficiency (Zhou 2000).

The methods for estimating the parameters of derived variables are scant (Wolter 1985). The delta method has been widely used for approximating the mean, variance, and covariance of derived variables (Seber 1982; Wolter 1985; Bieler and Williams 1993; CTC 1999; Quinn and Deriso 1999). This method has continued to receive additional study in recent years (Benichous and Gail 1989; Meneses et al. 1990; Oehlert 1992; Cobb and Church 1995; Marcheselli 2000). There have been a few studies on specific types of derived variables, such as products (Goodman 1960, 1962; Bohrnstedt and Goldberger 1969; Brown and Alexander 1991). However, general studies on derived variables are rare. Because the delta method is based on approximation, it is important to know the size of the error to use this method intelligently. This paper employs a computer-intensive simulation approach that involves using the parametric bootstrap method (Efron and Tibshirani 1993) to estimate parameters for several types of derived random variables. The results from the bootstrap and delta methods are compared with the true values (when the latter can be obtained) or with each other.


Methods

Parametric bootstrap.—In fisheries studies, we often have information on observed variables. The information may come from similar research at a different time or location, from published literature, or from direct sampling. Let z be a random variable derived from the observed random variables x_i (i = 1, 2, . . . , n):

\[ z = g(x_1, x_2, \ldots, x_n). \tag{1} \]

Assuming that we know that the random variable x_i has a mean u_i, variance V[x_i], and probability density function f(x_i), we can generate a large number of x_{i,j} (j = 1, 2, . . . , m) using a computer program. Each derived random data point z_j is then computed by

\[ z_j = g(x_{1,j}, x_{2,j}, \ldots, x_{n,j}). \tag{2} \]

This process involves parametric bootstrapping (Efron and Tibshirani 1993). The mean u_i has little effect on the distribution of z and the relative deviation (as described below) of the estimated parameters of z from their true values or that between the results of the bootstrap and the delta methods when the coefficient of variation (CV = 100·SD/mean) of the underlying variables remains constant. Hence, in this study the variance V[x_i] was expressed as the coefficient of variation and was set to 1, 5, 10, 20, 30, 40, 50, 80, or 100%. The values of x_i used in the examples were based on scaling simplicity. For each bootstrap simulation 10,000 data points were generated, and this process was repeated nine times to obtain the variance between these bootstrap runs. These results were used for evaluating the stability of the bootstrap method. In this paper, I assumed that the observed variables were mutually independent, although nonindependent variables can also be studied as long as their covariance is known. The normal and lognormal distributions were assumed for the observed random variables, that is, x_i ~ N(u_i, V[x_i]) or log_e(x_i) ~ N(u_i, V[log_e(x_i)]); other types of distributions can also be used for this method. From the artificial population of z_j we can compute many statistics that describe the derived random variable z. Because the mean and variance (or standard deviation) are the two most commonly used statistics, they are examined closely here. The mean of z is the expected value of z_j:

\[ E[z]_{\text{bootstrap}} = \frac{\sum_{j=1}^{m} z_j}{m}, \tag{3} \]

where m is the number of data points generated during the bootstrapping. The variance of z is

\[ V[z]_{\text{bootstrap}} = E\!\left[\left(z - E[z]_{\text{bootstrap}}\right)^2\right]. \tag{4} \]
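
The procedure in equations (1)–(4) is straightforward to program. The following minimal Python sketch is not part of the original paper; it assumes mutually independent, normally distributed inputs and a user-supplied function g, and the example function and CV values are illustrative only.

```python
import numpy as np

def parametric_bootstrap(g, means, cvs, m=10_000, seed=1):
    """Approximate E[z] and V[z] for z = g(x1, ..., xn) by parametric
    bootstrapping independent normal inputs (equations 1-4)."""
    rng = np.random.default_rng(seed)
    # Draw m values of each underlying variable from its assumed distribution.
    draws = [rng.normal(mu, mu * cv, size=m) for mu, cv in zip(means, cvs)]
    z = g(*draws)                 # equation (2): z_j = g(x_1j, ..., x_nj)
    return z.mean(), z.var()      # equations (3) and (4)

# Illustrative use: z = x/y with E[x] = E[y] = 10 and CV[x] = CV[y] = 10%.
print(parametric_bootstrap(lambda x, y: x / y, [10.0, 10.0], [0.1, 0.1]))
```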

Delta method.—The delta method is based on the Taylor series approximation (Seber 1982):

\[ z \approx g(u_i) + \sum_{i=1}^{n} (x_i - u_i)\frac{\partial g}{\partial x_i} + \frac{1}{2!}\sum_{i=1}^{n}\sum_{k=1}^{n} (x_i - u_i)(x_k - u_k)\frac{\partial^2 g}{\partial x_i\,\partial x_k}, \tag{5} \]

where all derivatives are evaluated at x_i = u_i. The mean is the expected value of z, which includes the second derivative of the Taylor series:

\[ E[z]_{\text{delta}} \approx g(u_i) + \frac{1}{2}\sum_{i=1}^{n} V[x_i]\frac{\partial^2 g}{\partial x_i^2} + \frac{1}{2}\mathop{\sum\sum}_{i \neq k} \operatorname{cov}[x_i, x_k]\frac{\partial^2 g}{\partial x_i\,\partial x_k}. \tag{6} \]

The last term vanishes when the observed variables are mutually independent. The most common application of the delta method is for variance estimation (Bieler and Williams 1993; Cox 1990; Quinn and Deriso 1999) using the first-order Taylor series approximation of the deviations of the estimates from their expected values (Seber 1982):

\[ V[z]_{\text{delta}} \approx E\{[g(x_i) - g(u_i)]^2\} = \sum_{i=1}^{n} V[x_i]\left(\frac{\partial g}{\partial x_i}\right)^2 + 2\mathop{\sum\sum}_{i < k} \operatorname{cov}[x_i, x_k]\frac{\partial g}{\partial x_i}\frac{\partial g}{\partial x_k}. \tag{7} \]

The last term also disappears when the x_i are mutually independent.

I evaluated several common derived variables using both the parametric bootstrap and delta methods. The formulae for the functions derived from the delta method are provided in Table 1. The true parameters (mean and variance) were calculated for some derived random variables. The results from the bootstrap and delta methods were then compared with these true values to assess their bias. If the true parameters of the derived variables could not be readily computed, the results of the bootstrap and delta methods were compared. The relative deviation between the true values and the estimates from the bootstrap and delta methods (or between those of the two methods) was used to evaluate the performance of the methods.
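
For functions without convenient analytic derivatives, the terms in equations (5)–(7) can be approximated numerically. The sketch below is mine rather than the paper's; it uses central finite differences for mutually independent inputs, and the step size and example function are arbitrary choices.

```python
import numpy as np

def delta_mean_var(g, means, variances, h=1e-4):
    """Second-order delta-method mean (equation 6) and first-order variance
    (equation 7) for z = g(x1, ..., xn) with mutually independent inputs,
    using central finite differences for the derivatives."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    g0 = g(*means)
    mean_z, var_z = g0, 0.0
    for i, v in enumerate(variances):
        step = np.zeros_like(means)
        step[i] = h
        g_plus, g_minus = g(*(means + step)), g(*(means - step))
        d1 = (g_plus - g_minus) / (2 * h)           # dg/dx_i at the means
        d2 = (g_plus - 2 * g0 + g_minus) / h**2     # d2g/dx_i2 at the means
        mean_z += 0.5 * v * d2                      # equation (6), independent case
        var_z += v * d1**2                          # equation (7), independent case
    return mean_z, var_z

# Illustrative check against Table 1: z = x/y at E[x] = E[y] = 10 with SD = 1
# should give E[z] of about 1.01 and V[z] of about 0.02.
print(delta_mean_var(lambda x, y: x / y, [10.0, 10.0], [1.0, 1.0]))
```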

The percent relative deviation of the mean (RDM%) is defined as

\[ \text{RDM\%} = \frac{E[z] - \bar{z}}{\bar{z}} \times 100, \tag{8} \]

where E[z] is the estimated mean from the bootstrap or delta method and $\bar{z}$ is the true mean. Similarly, the percent relative deviation of the variance is

\[ \text{RDV\%} = \frac{V[z] - \sigma_z^2}{\sigma_z^2} \times 100, \tag{9} \]

where V[z] is the estimated variance from the bootstrap or delta method and $\sigma_z^2$ is the true variance. For comparison of the bootstrap and the delta methods, the relative deviation was calculated as

\[ \text{RDM\%} = \frac{E[z]_{\text{delta}} - E[z]_{\text{bootstrap}}}{E[z]_{\text{bootstrap}}} \times 100 \tag{10} \]

for the mean and as

\[ \text{RDV\%} = \frac{V[z]_{\text{delta}} - V[z]_{\text{bootstrap}}}{V[z]_{\text{bootstrap}}} \times 100 \tag{11} \]

for the variance, where the subscript indicates either the delta or the bootstrap results. Because the delta method is an approximation, the deviation is considered the bias of this method. When the relative deviation is known, bias-corrected estimates can be obtained from the delta method, namely,

\[ E[z] = \frac{E[z]_{\text{delta}}}{1 + \text{RDM\%}} \tag{12} \]

for the mean and

\[ V[z] = \frac{V[z]_{\text{delta}}}{1 + \text{RDV\%}} \tag{13} \]

for the variance.

TABLE 1.—Formulae for the means and variances derived from the delta method for some common derived random variables; $\bar{x}$ and $\bar{y}$ denote the mean values of x and y.

$z = e^x$: $E[z] = e^{\bar{x}}\left(1 + \frac{V[x]}{2}\right)$;  $V[z] = V[x]\,e^{2\bar{x}}$

$z = \log_e(x)$: $E[z] = \log_e(\bar{x}) - \frac{V[x]}{2\bar{x}^2}$;  $V[z] = \frac{V[x]}{\bar{x}^2}$

$z = xy$: $E[z] = \bar{x}\bar{y}$;  $V[z] = V[x]\bar{y}^2 + V[y]\bar{x}^2$

$z = x/y$: $E[z] = \frac{\bar{x}}{\bar{y}} + \frac{V[y]\bar{x}}{\bar{y}^3}$;  $V[z] = \frac{\bar{x}^2}{\bar{y}^2}\left(\frac{V[x]}{\bar{x}^2} + \frac{V[y]}{\bar{y}^2}\right)$

$z = 1/x$: $E[z] = \frac{1}{\bar{x}} + \frac{V[x]}{\bar{x}^3}$;  $V[z] = \frac{V[x]}{\bar{x}^4}$

$z = x/(x + y)$: $E[z] = \frac{\bar{x}}{\bar{x}+\bar{y}} + \frac{V[y]\bar{x} - V[x]\bar{y}}{(\bar{x}+\bar{y})^3}$;  $V[z] = \frac{V[x]\bar{y}^2 + V[y]\bar{x}^2}{(\bar{x}+\bar{y})^4}$

$z = \log_e(x/y)$: $E[z] = \log_e\!\left(\frac{\bar{x}}{\bar{y}}\right) - \frac{V[x]}{2\bar{x}^2} + \frac{V[y]}{2\bar{y}^2}$;  $V[z] = \frac{V[x]}{\bar{x}^2} + \frac{V[y]}{\bar{y}^2}$

$z = \log_e[x/(x + y)]$: $E[z] = \log_e\!\left(\frac{\bar{x}}{\bar{x}+\bar{y}}\right) - \frac{V[x]\bar{y}(2\bar{x}+\bar{y})}{2(\bar{x}^2+\bar{x}\bar{y})^2} + \frac{V[y]}{2(\bar{x}+\bar{y})^2}$;  $V[z] = \frac{V[x]\bar{y}^2}{(\bar{x}^2+\bar{x}\bar{y})^2} + \frac{V[y]}{(\bar{x}+\bar{y})^2}$

$z = x(1 - e^{-yt})$: $E[z] = \bar{x}\left[1 - e^{-\bar{y}t}\left(1 + \frac{t^2 V[y]}{2}\right)\right]$;  $V[z] = V[x](1 - e^{-\bar{y}t})^2 + V[y]\bar{x}^2 t^2 e^{-2\bar{y}t}$
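
Equations (8)–(13) reduce to a few lines of code. The helper below is not from the paper; it applies the bias correction with the relative deviation supplied in percent, which matches the worked example given later for exp(x) at CV[x] = 50%.

```python
def relative_deviation(estimate, reference):
    """Percent relative deviation (equations 8-11); the reference is either
    the true value or the bootstrap result."""
    return (estimate - reference) / reference * 100.0

def bias_corrected(delta_estimate, rd_percent):
    """Bias-corrected delta-method estimate (equations 12-13), with the
    relative deviation given in percent."""
    return delta_estimate / (1.0 + rd_percent / 100.0)

# A delta-method variance that is 31.3% too low (the exp(x) case at CV[x] = 50%
# in Table 2) is scaled up by about 46%, since 1 / (1 - 0.313) = 1.456.
print(bias_corrected(1.0, -31.3))
```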

Results

The Exponential Function

The exponential function z = exp(x) is commonly used in fisheries studies. For example, the survival rate over one unit of time can be expressed as an exponential function of instantaneous mortality. Here I assumed that the underlying variable x follows a normal distribution. The mean was set to 1 and the coefficient of variation was used for the variance (i.e., 1, 5, 10, 20, 30, 40, 50, 80, or 100%). For both methods, different values of the mean resulted in different values for E[z] and V[z]; however, the shape of the curve remained the same under the constant CV. The value of the mean also had little effect on the relative deviation between the two estimation methods. This was also true for the other functions studied in this paper. While x had a normal distribution, the derived variable z of course had a lognormal distribution. The true mean and variance of z were obtained from the equations

\[ E[z]_{\text{true}} = \exp\!\left(E[x] + \frac{V[x]}{2}\right) \tag{14} \]

and

\[ V[z]_{\text{true}} = \exp(2E[x] + V[x])\,[\exp(V[x]) - 1]. \tag{15} \]

The variability of z (expressed as CV[z]) was comparable to that of x but slightly larger. As the variance of x increased, both E[z] and V[z] increased (Figure 1). With the delta method, the relative deviations of both the mean and variance from their true values increased as CV[x] increased (Table 2). The delta method always underestimated the mean and variance. For example, it underestimated V[z] by about 13% with a CV of 30% and by about 31% with a CV of 50%. In contrast, the parametric bootstrap method provided accurate estimates for both the mean and variance at all levels of variability (Table 2). According to equation (13), when CV[x] = 50%, the variance estimated from the delta method must be increased by about 46% [1/(1 + RDV%) = 1/(1 - 0.313)] to obtain a bias-corrected estimator.

FIGURE 1.—Effects of the coefficient of variation of the observed random variable x on the mean (E[z]; upper panel) and variance (V[z]; lower panel) of the derived variable z = exp(x) at E[x] = 1 as computed by the parametric bootstrap and delta methods. The results from the bootstrap method largely overlap the true values.

The Logarithmic Function

The logarithmic function z = log_e(x) has wide uses in quantitative fisheries studies. For example, the linearized Ricker stock–recruitment model uses log-transformed stock and recruitment abundance, and instantaneous mortality is the negative logarithmic function of survival rate (Quinn and Deriso 1999). Here I assumed that the observed random variable x had a lognormal distribution and that the mean of log_e(x) was 1. The true mean and variance were obtained from the equations

\[ E[z]_{\text{true}} = \log_e(E[x]) - \frac{V[z]}{2} \tag{16} \]

and

\[ V[z]_{\text{true}} = \log_e\!\left[\frac{V[x]}{(E[x])^2} + 1\right]. \tag{17} \]

Treating log_e(x) and CV[log_e(x)] as constant simplified the simulation process. However, this resulted in the deviation of CV[x] from CV[log_e(x)] as their variability increased. The true values of CV[x] are included in Table 2. As CV[x] increased, the deviation of E[z]_delta and V[z]_delta from their true values increased rapidly (Figure 2). The delta method underestimated the mean but overestimated the variance. On the other hand, the bootstrap method resulted in more accurate estimates for both the mean and variance throughout the range of CV[x] (Table 2).

FIGURE 2.—Effects of the coefficient of variation of the observed random variable x on the mean (E[z]) and variance (V[z]) of the derived variable z = log_e(x) at E[log_e(x)] = 1. The results from the bootstrap method overlap the true values.
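
A short simulation can reproduce the exponential-function comparison. The sketch below is mine, not the paper's: it contrasts the lognormal true values in equations (14) and (15), the delta-method formulas from Table 1, and a parametric bootstrap at CV[x] = 50%; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
mean_x, cv = 1.0, 0.50                 # E[x] = 1, CV[x] = 50%
var_x = (mean_x * cv) ** 2

# True lognormal moments of z = exp(x), equations (14) and (15).
true_mean = np.exp(mean_x + var_x / 2)
true_var = np.exp(2 * mean_x + var_x) * (np.exp(var_x) - 1)

# Delta-method approximations (first row of Table 1).
delta_mean = np.exp(mean_x) * (1 + var_x / 2)
delta_var = var_x * np.exp(2 * mean_x)

# Parametric bootstrap with 10,000 draws, as in the Methods section.
z = np.exp(rng.normal(mean_x, np.sqrt(var_x), size=10_000))

print("mean:", true_mean, delta_mean, z.mean())   # delta is roughly 0.7% low
print("var: ", true_var, delta_var, z.var())      # delta is roughly 31% low
```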

The Product Function

Computing the products of two or more observed random variables (z = xy) is very common in quantitative fisheries science. In this special situation, the delta method results in an exact E[z] rather than an approximation. Also, methods have been developed for determining the exact variance of products (Goodman 1960, 1962; Brown and Alexander 1991). The exact variance of the product of two independent random variables can be computed as

\[ V[xy] = E[x]^2 V[y] + E[y]^2 V[x] + V[x]V[y]. \tag{18} \]

The purpose here was to show that the parametric bootstrap method can provide accurate estimates of the true values. Both x and y were assumed to be normally distributed, and their expected values were set to 10. The variability in both x and y contributes to V[z] and makes CV[z] much larger than CV[x] and CV[y]. The delta method underestimated the variance of the product (Figure 3). As the CVs of x and y increased, the deviation of V[z]_delta from its true value also increased. Again, the bootstrap method provided more accurate estimates for both the mean and variance (Table 2).

FIGURE 3.—Effects of the coefficients of variation of the observed random variables x and y on the mean (E[z]) and variance (V[z]) of the derived variable z = xy at E[x] = E[y] = 10 and CV[x] = CV[y]. The values of E[z] from both the delta and bootstrap methods overlap the true value (i.e., E[x]E[y]).

The Logarithmic-Quotient Function

Logarithmic-quotient type functions [z = log_e(x/y)] can be found in instantaneous mortality and additive stock–recruitment models (Quinn and Deriso 1999). I assumed that both x and y followed a lognormal distribution. The mean of log_e(x) was set to 1 and that of log_e(y) to 2. Since x and y were assumed to be mutually independent, the true mean and variance can be obtained by the equations

\[ E[z]_{\text{true}} = \log_e\!\left(\frac{E[x]}{E[y]}\right) + \frac{1}{2}\log_e\!\left(\frac{\text{CV}^2[y] + 1}{\text{CV}^2[x] + 1}\right) \tag{19} \]

for the mean and

\[ V[z]_{\text{true}} = \log_e(\text{CV}^2[x] + 1) + \log_e(\text{CV}^2[y] + 1) \tag{20} \]

for the variance. According to equations (16) and (17), CV[y] > CV[x] when log_e(y) > log_e(x) but CV[log_e(y)] = CV[log_e(x)]. The parametric bootstrap results were stable even at high variability levels. The delta method provided reasonably good estimates for the mean when CV[x] was less than 30% (Figure 4). The relative deviation of V[z]_delta from V[z]_true increased as the CV of x and y increased and was about 10% at CV[x] = 25%. Again, the parametric bootstrap method resulted in accurate estimates for both the mean and variance for values of CV[x] up to 100% (Table 2).

FIGURE 4.—Effects of the coefficients of variation of the observed random variables x and y on the mean (E[z]) and variance (V[z]) of the derived variable z = log_e(x/y) at E[log_e(x)] = 1 and E[log_e(y)] = 2 and CV[x] = CV[y].

With a similar function, z = log_e[x/(x + y)], where E[log_e(x)] and E[log_e(y)] were set to 1, the approximations from the delta method had a smaller relative deviation from the results of the bootstrap simulation than was the case with z = log_e(x/y) (Table 3). One reason for this was that the relative variability of the denominator (x + y) was smaller than that of y alone.
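
The closed forms in equations (19) and (20) are easy to check by simulation. The sketch below is mine rather than the paper's; it draws independent lognormal x and y with the stated means of log_e(x) = 1 and log_e(y) = 2, an illustrative CV of the log-scale variables, and an arbitrary sample size and seed.

```python
import numpy as np

rng = np.random.default_rng(7)
mu_x, mu_y = 1.0, 2.0                 # means of log_e(x) and log_e(y)
cv_log = 0.30                         # CV of the log-scale variables (illustrative)
sd_x, sd_y = cv_log * mu_x, cv_log * mu_y

x = np.exp(rng.normal(mu_x, sd_x, size=100_000))   # lognormal x
y = np.exp(rng.normal(mu_y, sd_y, size=100_000))   # lognormal y, independent of x
z = np.log(x / y)

# Moments and squared CVs of x and y implied by the lognormal assumption.
ex, ey = np.exp(mu_x + sd_x**2 / 2), np.exp(mu_y + sd_y**2 / 2)
cvx2, cvy2 = np.exp(sd_x**2) - 1, np.exp(sd_y**2) - 1

# Equations (19) and (20) versus the simulated mean and variance of z.
e_true = np.log(ex / ey) + 0.5 * np.log((cvy2 + 1) / (cvx2 + 1))
v_true = np.log(cvx2 + 1) + np.log(cvy2 + 1)
print(e_true, z.mean())   # both close to mu_x - mu_y = -1
print(v_true, z.var())    # both close to sd_x**2 + sd_y**2 = 0.45
```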

TABLE 2.—Relative deviations (%) from their true values of the means and variances computed using the delta and parametric bootstrap methods for four derived random variables. Each block gives, for one function, the CV of the underlying variable(s) and the relative deviations of the delta-method and bootstrap means and variances.

exp(x)
CV[x] (%)    Delta mean   Delta variance   Bootstrap mean   Bootstrap variance
1              0.0           0.0              0.0              0.0
5              0.0          -0.4              0.0              0.0
10             0.0          -1.5              0.0              0.2
20             0.0          -5.9              0.0             -0.1
30            -0.1         -12.9              0.0             -0.4
40            -0.3         -21.4              0.0             -1.3
50            -0.7         -31.3              0.0              1.8
80            -4.1         -62.1             -0.1             -1.3
100           -9.0         -78.5              0.0              1.7

log_e(x)
CV[x] (%)    Delta mean   Delta variance   Bootstrap mean   Bootstrap variance
1              0.0           0.0              0.0              0.0
5              0.0           0.1              0.0             -0.1
10             0.0           0.5              0.0              0.0
20             0.0           2.0              0.0             -0.1
31            -0.2           4.7              0.0             -0.3
42            -0.7           8.4              0.0              0.2
54            -1.7          13.8              0.0             -0.5
94           -12.5          39.6             -0.1              0.5
130          -34.3          70.2             -0.5              1.3

xy
CV[x,y] (%)  Delta mean   Delta variance   Bootstrap mean   Bootstrap variance
1              0.0           0.0              0.0             -0.1
5              0.0          -0.1              0.0              0.1
10             0.0          -0.5              0.0              0.5
20             0.0          -2.0              0.0              0.2
30             0.0          -4.3             -0.1             -0.5
40             0.0          -7.4              0.0              0.2
50             0.0         -11.1             -0.1              0.1
80             0.0         -24.3              0.0              0.7
100            0.0         -33.2             -0.3             -0.3

log_e(x/y)
CV[x] (%)    Delta mean   Delta variance   Bootstrap mean   Bootstrap variance
1              0.0           0.0              0.0              0.0
5              0.0           0.4              0.0             -0.3
10             0.0           1.7              0.0              0.1
20            -0.6           7.2              0.0             -0.1
31            -3.5          17.2              0.0             -0.3
42           -12.4          34.1              0.1             -0.7
53           -36.2          61.5              1.2             -1.7
94          -310.1         246.5             -9.5              8.2
137       -2,373.5         777.1            -13.6             11.4

The Quotient Function

The quotient-derived variable z = x/y is another common variable in quantitative fisheries science. For example, rates, ratios, and percents are all quotients of two observed variables. I assumed that both observed variables x and y had normal distributions with means of 10. The random variation of the observed variables, especially that of y, had a dramatic impact on the quotients, resulting in very large values for CV[z] compared with those for CV[x] and CV[y]. Because the true mean and variance cannot be readily obtained, I compared the results between the two methods. The delta method tended to give lower estimates for the mean and variance at lower variations (Figure 5). When CV[x] and CV[y] increased, E[z] became unstable even with 100,000 bootstraps. For example, CV(E[z]_bootstrap), the coefficient of variation of the mean E[z] from the 10 replicate simulation runs (each with 10,000 bootstraps), was 5% and 55% at CV[x] or CV[y] of 30% and 40%, respectively. The variance estimate had similar yet more serious problems, becoming unstable (CV{V[z]_bootstrap} = 212%) and very large when the CV of x and y reached 30%. The delta method provided a good approximation of E[z] as long as the values of the CV were no larger than 30%, which resulted in a deviation less than 3% (Table 3). The variance estimated by the delta method was very small compared with that of the bootstrap results when CV[x] and CV[y] reached 20%. At CV[x] and CV[y] = 30%, V[z]_delta needed to be increased by about 1,120% to obtain a bias-corrected estimator. The variance from the delta method became insignificant compared with that computed from the parametric bootstrap when CV[x] and CV[y] exceeded 30%.

Two other quotient functions were also examined, namely, the reciprocal function z = 1/x and the function z = x/(x + y). The reciprocal function had much the same outcome as z = x/y, though E[z]_delta and V[z]_delta may have higher relative deviations from those of the bootstrap method than was the case with z = x/y at the same CV levels (Table 3). In contrast, with z = x/(x + y), E[z]_delta and V[z]_delta had lower deviations from the bootstrap results at the same CV levels because the relative variability of the denominator (x + y) was smaller (Table 3). In all of these quotient functions, the variance of the derived variable z became very large and the simulation became unstable when CV[x] and CV[y] were approximately 30% or more.

FIGURE 5.—Effects of the coefficients of variation of the observed random variables x and y on the mean (E[z]) and variance (V[z]) of the derived variable z = x/y at E[x] = E[y] = 10 and CV[x] = CV[y]. The vertical bars represent the standard errors from the bootstrap replicates. The variance from the bootstrap method becomes too large and unstable to be displayed when CV[x] and CV[y] exceed 30%.
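
The instability described above is easy to reproduce. The sketch below is not from the paper; it repeats the x/y parametric bootstrap several times at increasing CVs and prints the spread of the variance estimates across replicates, with the number of replicates and the seed chosen arbitrarily.

```python
import numpy as np

def bootstrap_ratio_variance(cv, m=10_000, replicates=10, seed=0):
    """Replicate the parametric bootstrap for z = x/y with E[x] = E[y] = 10
    and return the variance estimate from each replicate."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(replicates):
        x = rng.normal(10.0, 10.0 * cv, size=m)
        y = rng.normal(10.0, 10.0 * cv, size=m)  # near-zero y values inflate x/y
        estimates.append((x / y).var())
    return np.array(estimates)

for cv in (0.1, 0.2, 0.3):
    v = bootstrap_ratio_variance(cv)
    print(f"CV = {cv:.0%}: V[z] range across replicates "
          f"{v.min():.3g} to {v.max():.3g}")
```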

Asymptotic Functions

Functions of the form z = x[1 - exp(-yt)] appear in asymptotic growth and asymptotic catch equations. Both x and y were assumed to have a normal distribution and were set to 10, while the constant t was set to 1. The bootstrap results became unstable when CV[x] and CV[y] were larger than 40% for the mean and 30% for the variance (Figure 6). The delta method provided fairly good approximations when CV[x] and CV[y] did not exceed 30% for the mean and 20% for the variance (Table 3).

FIGURE 6.—Effects of the coefficients of variation of the observed random variables x and y on the mean (E[z]) and variance (V[z]) of the derived variable z = x[1 - exp(-yt)] at E[x] = E[y] = 10, t = 1, and CV[x] = CV[y]. The mean and variance from the bootstrap method become too large or small and too unstable to be displayed when CV[x] and CV[y] exceed 40% and 30% for the mean and variance, respectively.

TABLE 3.—Relative deviations (%) of the means and variances derived by the delta method from those derived by the parametric bootstrap method for derived variables. The results from the delta method are considered the bias of that method. Numbers in italics indicate unstable results from the bootstrap method. The CV[x] and CV[y] values in each block apply to that function.

x/y
CV[x], CV[y] (%)   Mean      Variance
1                    0.0       -0.2
5                    0.0       -2.0
10                   0.0       -4.9
20                  -0.6      -25.7
30                  -2.5      -91.8
40                 -49.9      -99.8
50                  -3.8      -99.9
80                 -41.6     -100.0
100               -225.1      -99.9

x/(x + y)
CV[x], CV[y] (%)   Mean      Variance
1                    0.0       -0.1
5                    0.0       -2.0
10                   0.0       -8.1
20                  -0.6      -34.2
30                  -8.2      -91.4
40                  81.2      -99.9
50                  17.1     -100.0
80                 -34.0     -100.0
100                 74.4      -99.9

1/x
CV[x] (%)          Mean      Variance
1                    0.0        0.4
5                    0.0       -0.1
10                   0.0       -1.5
20                   0.0       -6.2
30                   0.0      -15.9
40                   0.0      -55.1
50                   2.0      -99.6
80                 -35.1      -99.9
100                 -4.7      -99.9

log_e[x/(x + y)]
CV[x], CV[y] (%)   Mean      Variance
1                    0.0       -0.3
5                    0.0        0.0
10                   0.0       -0.3
20                   0.1        1.6
31                   0.2        2.5
42                   0.7        5.3
54                   1.7        9.1
95                  10.1       25.5
131                 23.5       52.8

x[1 - exp(-yt)]
CV[x], CV[y] (%)   Mean      Variance
1                    0.0        0.0
5                    0.0        0.0
10                   0.0        0.0
20                  -0.1        0.1
30                   0.2      -10.7
40                   9.4      -94.5
50                -264.4     -100.0
80                -100.0     -100.0
100               -100.0     -100.0

Discussion

Two conclusions can be drawn from this study. First, the parametric bootstrap method can provide accurate estimates for the parameters of derived random variables if the distributions of the underlying random variables are known. Second, there can be a substantial deviation between the means and variances estimated by the delta method and either the true values or those estimated by the parametric bootstrap method.

Because the delta method is an approximation based on the Taylor series, its results are considered biased (Hughes-Hallett et al. 1994). The bias is more severe for the variance than for the mean because the variance is computed from only the first-degree terms of the Taylor polynomial whereas the mean includes the second-degree terms (Seber 1982). The Taylor approximations are generally accurate approximations of the function locally (that is, near u_i in the delta method; Brown and Rothery 1993; Hughes-Hallett et al. 1994). However, as seen in this paper, the approximation becomes less accurate when the variation in the underlying variables increases. Therefore, caution should be taken when using the delta method for variables with large variances. For example, if a bias larger than 10% is unacceptable, one should avoid using the delta method for variance estimation when CV[x] and CV[y] are approximately as large as the following values: 25% for the function exp(x), 20% for log_e(x), 50% for xy, 15% for x/y (and 1/x), 25% for log_e(x/y), and 30% for x[1 - exp(-yt)]. Unfortunately, fishery data typically have a large variance (Walters and Ludwig 1981; Stanley 1992; Punt and Butterworth 1993; Zhou 2000). It is not uncommon to see a coefficient of variation larger than 40% or 50% (Chen and Jackson 1995; CTC 1999). Although it is possible to include higher degrees of the Taylor polynomial to get better approximations, the variance formulas are often computationally cumbersome, especially when nonindependent multivariates are involved (Gefeller 1992).

This paper demonstrates that the parametric bootstrap method can be used to obtain unbiased parameters for derived variables. The method can also be applied to more complex functions than those discussed here; it is not necessary to limit the underlying variables to a normal or lognormal distribution and to being mutually independent as long as their distributions and covariances are known. For rigorous research, I recommend that the parametric bootstrap method be used in lieu of the delta method. Although it is easier to use computer programming techniques for parametric bootstrapping, this process can also be performed on spreadsheet software. Nevertheless, the relative deviations computed in this paper can be used to correct the delta method's bias with respect to certain functions when the distributions of the underlying variables are as assumed in this paper.
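
As the preceding paragraph notes, the bootstrap approach is not restricted to independent inputs. The sketch below is mine, not the paper's; it draws correlated normal inputs from a multivariate normal distribution with an illustrative mean vector and covariance matrix, and it compares the result with the delta-method approximation that retains the covariance terms of equations (6) and (7).

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative inputs: x and y with means 10, variances 4, and covariance 2.
means = np.array([10.0, 10.0])
cov = np.array([[4.0, 2.0],
                [2.0, 4.0]])

# Parametric bootstrap for a derived variable of correlated inputs, here z = xy.
draws = rng.multivariate_normal(means, cov, size=10_000)
z = draws[:, 0] * draws[:, 1]
print(z.mean(), z.var())

# Delta-method counterparts with the covariance terms retained:
# E[z] ~ x_bar*y_bar + cov(x, y);
# V[z] ~ V[x]*y_bar**2 + V[y]*x_bar**2 + 2*cov(x, y)*x_bar*y_bar.
print(means[0] * means[1] + cov[0, 1],
      cov[0, 0] * means[1] ** 2 + cov[1, 1] * means[0] ** 2
      + 2 * cov[0, 1] * means[0] * means[1])
```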

Caution is needed when estimating the parameters of derived random variables because of their uncertainty. The variability and distribution of all the underlying variables contribute to the uncertainty of the derived variables. Also, the type of function has a remarkable impact on the uncertainty of the derived variables. These factors may result in much higher uncertainty for the derived variable than for the underlying ones. For example, when the coefficients of variation of the underlying variables are larger than 50%, computing the variance of any quotient-type function may be meaningless.

Finally, I should point out that the expected value of the derived variable is generally not equal to the value of the function of the expected values of the underlying variables. That is, while z = g(x_1, x_2, . . . , x_n), E[z] ≠ g(E[x_1], E[x_2], . . . , E[x_n]) except for products. Biologists often make this mistake. The expected value of the derived variable depends not only on the means of the underlying random variables but also on their distributions and variances. The variances in the underlying random variables may either increase or decrease the expected value of the derived random variable. The impacts may be offset when the underlying variables act in opposite directions. As a result, E[z] may equal g(E[x_1], E[x_2], . . . , E[x_n]) for some functions in special situations. For example, E[z] = g(E[x], E[y]) for z = log_e(x/y) when CV[x] = CV[y] and for z = x/(x + y) when E[x] = E[y] and V[x] = V[y]. Nonetheless, one should be aware of the effects of variance on the expected value of the derived variable when performing computations.
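
A two-line check of this point (not in the paper) for z = exp(x) with E[x] = 1 and CV[x] = 50%: the simulated E[z] is near 3.08, well above exp(E[x]), which is about 2.72.

```python
import numpy as np

x = np.random.default_rng(11).normal(1.0, 0.5, size=100_000)   # E[x] = 1, CV[x] = 50%
print(np.exp(x).mean(), np.exp(1.0))                            # E[exp(x)] versus exp(E[x])
```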


Acknowledgments

I thank David Bernard, Din Chen, Jay Hensleigh, Terry Quinn, Sam Sharr, Dana Hanselman, and one anonymous reviewer for reviewing and offering constructive comments on the original version of this paper.

References

Benichous, J., and M. H. Gail. 1989. A delta method for implicitly defined random variables. American Statistician 43:41–44.
Bieler, G. S., and R. L. Williams. 1993. Ratio estimates, the delta method, and quantal response tests for increased carcinogenicity. Biometrics 49:793–801.
Bohrnstedt, G. W., and A. S. Goldberger. 1969. On the exact covariance of products of random variables. American Statistical Association Journal 64:1439–1442.
Brown, D., and N. Alexander. 1991. The analysis of the variance and covariance of products. Biometrics 47:429–444.
Brown, D., and P. Rothery. 1993. Models in biology: mathematics, statistics, and computing. Wiley, Chichester, UK.
Chen, Y., and D. A. Jackson. 1995. Robust estimation of mean and variance in fisheries. Transactions of the American Fisheries Society 124:401–412.
Cobb, E. B., and J. D. Church. 1995. A discrete delta method for estimating standard errors for estimators in quantal bioassay. Biometrical Journal 6:691–699.
Cox, C. 1990. Fieller's theorem, the likelihood and the delta method. Biometrics 46:709–718.
CTC (Chinook Technical Committee). 1999. Maximum sustained yield or biologically based escapement goals for selected chinook salmon stocks used by the Pacific salmon commission's chinook technical committee for escapement assessment. Pacific Salmon Commission, Joint Chinook Technical Committee, Report TCCHINOOK (99)-3, Vancouver.
Efron, B., and R. J. Tibshirani. 1993. An introduction to the bootstrap. Chapman and Hall, New York.
Gefeller, O. 1992. A simple method of avoiding the computational problems of the delta method for the end-user of statistical packages. Computer Applications in the Biosciences 8:293–294.
Goodman, L. A. 1960. On the exact variance of products. Journal of the American Statistical Association 55:708–713.
Goodman, L. A. 1962. The variance of the product of K-random variables. Journal of the American Statistical Association 57:54–60.
Hughes-Hallett, D., A. M. Gleason, D. E. Flath, S. P. Gordon, D. O. Lomen, D. Lovelock, W. G. McCallum, B. G. Osgood, A. Pasquale, J. Tecosky-Feldman, J. B. Thrash, K. R. Thrash, T. W. Tucker, and O. K. Bretscher. 1994. Calculus. Wiley, New York.
Marcheselli, M. 2000. A generalized delta method with applications to intrinsic diversity profiles. Journal of Applied Probability 37:504–510.
Meneses, J., C. E. Antle, M. J. Bartholomew, and R. L. Lengerich. 1990. A simple algorithm for delta method variances for multinomial posterior Bayes probability estimates. Communications in Statistics 19:837–845.
Oehlert, G. W. 1992. A note on the delta method. American Statistician 46:27–29.
Punt, A. E., and D. S. Butterworth. 1993. Variance estimates for fisheries assessment: their importance and how best to evaluate them. Canadian Special Publication of Fisheries and Aquatic Sciences 120:145–162.
Quinn, T. J., and R. B. Deriso. 1999. Quantitative fish dynamics. Oxford University Press, New York.
Seber, G. A. F. 1982. The estimation of animal abundance and related parameters, 2nd edition. Charles Griffin and Company, London.
Sokal, R. R., and F. J. Rohlf. 1981. Biometry, 2nd edition. Freeman, San Francisco.
Stanley, R. D. 1992. Bootstrap calculation of catch-per-unit-effort variance from trawl logbooks: do fisheries generate enough observations for stock assessments? North American Journal of Fisheries Management 12:19–27.
Walters, C. J., and D. Ludwig. 1981. Effects of measurement errors on the assessment of stock–recruitment relationships. Canadian Journal of Fisheries and Aquatic Sciences 38:704–710.
Wolter, K. M. 1985. Introduction to variance estimation. Springer-Verlag, New York.
Zhou, S. 2000. Stock assessment and optimal escapement of coho salmon in three Oregon coastal lakes. Oregon Department of Fish and Wildlife, Information Report 2000-07, Portland.