ESTIMATING HAZARD FUNCTIONS FOR DISCRETE LIFETIMES

Scott D. Grimshaw,1* James McDonald,2 Grant R. McQueen,3 and Steven Thorley3

1 Statistics Department, 2 Economics Department, and 3 Marriott School of Management, Brigham Young University, Provo, Utah, USA

Key Words: rounding; interval censoring; Weibull; discrete hazard function; logistic.

ABSTRACT

Frequently in inference the observed data are modeled as a sample from a continuous probability model, implying the observed data are precisely measured. Usually the actual data available to the investigator are discrete, either because they are measured discretely, meaning there has been rounding to some small measurement unit related to the precision of the measuring device, or because the data are discretely measured, meaning the time periods until the event of interest are countable instead of continuous. This paper is motivated by the common practice of testing for duration dependence (non-constant hazard function) in economic and financial data using the continuous Weibull distribution when the data are discrete. A simulation study shows that biased parameter estimates and distorted hypothesis tests result when the degree of discretization is severe. When observations are measured discretely, as in measuring the time between stock trades, it is proper to treat them as interval-censored. When observations are discretely measured, as in measuring the length of stock runs, a discrete hazard function must be specified. Both cases are examined in simulation studies and demonstrated on financial data.

1. INTRODUCTION

One interesting aspect of economic and financial data is duration dependence: does the probability of ending the strike, unemployment spell, expansion, or stock run depend on the length of the strike, spell, or run?

For example, the “reservation wage” theory of labor implies that as a spell of unemployment grows longer, the job seeker becomes more desperate, lowers her reservation wage, and consequently becomes more likely to find a job. In contrast, the “damaged goods” theory implies just the opposite: the longer the period of unemployment, the more likely the job seeker has some attribute that makes her unemployable (damaged) and, consequently, less likely to find a job. Measures of and tests for duration dependence in the finance and economics literature estimate the hazard function of T = time to event, given by

λ(t) = lim_{Δt→0} P[t ≤ T ≤ t + Δt | T ≥ t] / Δt,

since the hazard function represents the likelihood that an event that has lasted until t will end in the next instant. Most duration dependence studies assume T follows a Weibull distribution, whose hazard function is

λ(t; α, β) = βt^{β−1} / α^β,

where α > 0 is the scale parameter and β > 0 is the shape parameter. Notice that β > 1 yields an increasing hazard function (the longer the run or event, the more likely it will end), and the opposite is true when β < 1. The case β = 1 reduces to the constant or “memoryless” hazard function λ(t) = 1/α associated with the exponential distribution. The test for duration dependence is Ho: β = 1 vs. Ha: β ≠ 1, since if β = 1 the hazard function does not depend on t. Returning to the unemployment example, if unemployment spells have β > 1, so that the hazard function increases as t increases, the duration dependence supports the “reservation wage” theory. However, if unemployment spells have β < 1, so that the hazard function decreases as t increases, the duration dependence supports the “damaged goods” theory.

Many economic and financial events are continuous but measured discretely. For example, strikes, unemployment spells, and expansions end at any time in a continuous interval, not just at specified discrete times. In practice, however, these events are rarely continuously monitored. Rather, the data are discrete when reported, usually corresponding to a convenient
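As a quick numerical illustration of the role of β, the sketch below tabulates the Weibull hazard at a few time points for decreasing, constant, and increasing hazard cases. It is illustrative only; the parameter values are arbitrary choices, not taken from the paper.

```python
import numpy as np

def weibull_hazard(t, alpha, beta):
    """Weibull hazard lambda(t) = beta * t**(beta - 1) / alpha**beta."""
    return beta * t ** (beta - 1) / alpha ** beta

t = np.array([1.0, 2.0, 5.0, 10.0])
for beta in (0.8, 1.0, 1.2):  # decreasing, constant, increasing hazard
    print(beta, np.round(weibull_hazard(t, alpha=3.5, beta=beta), 4))
```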

small unit of measure. For example, strike lengths may be reported in full days (see Kennan (1)), unemployment spells in weeks (see Solon (2)), and economic expansions in months (see Sichel (3)). If the measurement precision is sufficient, treating the measured discretely data as exactly measured is convenient and acceptable. However, what happens when the precision is not sufficient?

Other economic events are discretely measured. For example, a run of positive stock returns can last 5 or 6 periods (see McQueen and Thorley (4)), but never 5.4 periods. Another example is counting on-time mortgage payments, which can last 12 or 13 months (see Deng, Quigley, and Van Order (5)), but never 12.7 months. In analyzing discretely measured data, the application of a continuous probability model is clearly wrong, but often easier than the discrete model, with the implicit assumption that the model specification doesn’t affect the conclusions. For example, the distribution of counts may follow a binomial distribution, but the normal distribution is an adequate approximation in some instances. However, what happens when the continuous model is a poor approximation of the discrete model?

This paper presents a simulation study that demonstrates the bias in estimation and hypothesis tests that can arise when the Weibull probability model, treating data as precisely measured, is erroneously applied to data that are either measured discretely or discretely measured. It is proposed that measured discretely lifetime data be treated as interval censored. Inference using interval-censored data is widely available in software, particularly for the Weibull distribution. It is proposed that discretely measured lifetime data use a logistic model for the discrete hazard function. Both of these proposed methods are examined in simulation studies and demonstrated on financial data.

2. MEASURED DISCRETELY DATA

Suppose {T_i} are a random sample of lifetimes from a Weibull distribution with hazard function

λ(t; α, β) = βt^{β−1} / α^β,

where α > 0 is the scale parameter and β > 0 is the shape parameter. Using the notation

common in investigations of rounded data, suppose that due to rounding the lifetimes are measured discretely, so that T_i* takes on the possible values (0, w, 2w, 3w, . . .), where w denotes the precision of the measuring instrument. Haitovsky (6) contains a review of methods for rounded or grouped data, and Tricker (7, 8, 9, 10), Tricker, Coates and Okell (11), and Lee and Vardeman (12) investigate the effect of rounding on parameter estimates, hypothesis tests, and control charts for normal, a family of non-normal, gamma, and exponentially distributed data.

As an example, consider the time between trades of a stock that trades continuously throughout the day. In the Trades and Quotes (TAQ) database of the New York Stock Exchange (NYSE), the trade times are only time stamped to the nearest second. This results in measured discretely times between trades, where two trades 0.4 seconds apart will appear as a simultaneous trade (zero seconds apart) and two trades 0.6 seconds apart will appear as occurring one second apart. A study of the time between stock trades (see Engle and Russell (13)) would need to account for the discreteness of the observed data.

Tables I and II present results of a simulation study of the consequences of ignoring the fact that data are measured discretely. For each of fifteen scenarios, 10,000 samples are generated from a Weibull distribution. The sample data are measured discretely by rounding each observation to (1, 2, 3, . . .). For each simulated sample, maximum likelihood estimates of the shape parameter β and the scale parameter α are computed, and the likelihood ratio test of Ho: β = 1 (for duration dependence) is computed at the 0.05 significance level. The scenarios in the simulation are chosen to reflect the study of time between stock trades. The value w = 1 reflects the precision of the NYSE data gathering, which is currently to the nearest second. The three values of n (3000, 600, 50) correspond to the number of trades between 10:30 am and 3:00 pm for active, medium, and less active stocks, respectively. For example, n = 3,000 is approximately equivalent to a trade every 3.5 seconds, calibrated to an actively traded NYSE stock, and n = 50 is approximately equivalent to a trade every 210 seconds, calibrated to a less active Nasdaq stock. The five β values (0.8, 0.9, 1.0, 1.1, 1.2) allow investigation of decreasing, constant, and increasing hazard functions, where the 0.8 level was chosen since it was observed in the IBM time between trades example presented at the end of this section.

Tables I and II indicate the problems that arise when measured discretely data are erroneously treated as precisely measured. First, the maximum likelihood estimators of the Weibull parameters are positively biased. For example, when β = 0.8 and n = 3,000, E(β̂) = 1.07, a bias of 0.27. Second, the likelihood ratio test of Ho: β = 1 is significantly biased. For example, if n = 3,000, when β = 1, instead of the expected Type I Error Rate of 0.05 the test falsely rejects Ho with probability one! Third, the positive bias can be large enough to lead to the wrong conclusion about the shape of the hazard function when Ho is rejected in favor of Ha: β ≠ 1. For example, when n = 3,000 and β = 0.8 the test has power 0.999 to reject Ho in favor of Ha: β ≠ 1, but often with the false conclusion that β > 1, due to the positive bias in β̂.

Casual study of the table appears to indicate the bias is largest for large samples. This seemingly counterintuitive result is better understood by examining the ratio r = w/σ, where σ denotes the population standard deviation. Tricker, Coates and Okell (11) summarize the many different “rules of thumb” for r, varying from r < 1 to r < 10. Recall that the Weibull standard deviation is

σ = α √(Γ(1 + 2/β) − [Γ(1 + 1/β)]²),

where Γ(·) denotes the gamma function. In the case of β = 0.8 and an actively traded stock (n = 3,000 and α = 3.5), σ = 5. Rounding to the nearest second (integer) gives r = 1/5, which is small according to the rule of thumb recommendations but still results in severe bias. The case of β = 0.8 and a less traded stock (n = 50 and α = 210) has σ = 300, and rounding to the nearest second gives r = 1/300, which is smaller than the most severe rule of thumb recommendation, but the rounding still results in estimators whose bias is large enough to affect the outcome of the hypothesis test for a constant hazard function.
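One scenario of this simulation can be reproduced in a few lines: draw Weibull lifetimes, round them up to whole seconds, and fit the ordinary (precisely measured) Weibull likelihood. The sketch below is illustrative, not the authors' code; the ceiling interpretation of "rounding to (1, 2, 3, . . .)" and the use of scipy's generic Weibull fit are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta_true, alpha_true, n, reps = 0.8, 3.5, 3000, 200

bias = []
for _ in range(reps):
    t = stats.weibull_min.rvs(beta_true, scale=alpha_true, size=n,
                              random_state=rng)
    t_star = np.ceil(t)  # measured discretely: rounded up to whole seconds
    # Naive fit that erroneously treats the rounded data as precisely measured.
    beta_hat, _, _ = stats.weibull_min.fit(t_star, floc=0)
    bias.append(beta_hat - beta_true)

print("average bias of beta_hat:", np.mean(bias))  # positive, echoing Table I
```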


As pointed out by Meeker and Escobar (14, pp. 165-166), the origin of the problems found in the simulation study is that the likelihood has been misspecified. The “appropriate likelihood” recognizes that measured discretely data are interval censored. For example, when the next stock trade occurs before the time stamp changes, the time between trades is reported as 1. In truth, the time between trades is more than 0 but less than 1 second, with the exact value unknown. Therefore, the “appropriate likelihood” is

L(α, β) = F(1; α, β) − F(0; α, β),

where F(t; α, β) denotes the Weibull cumulative distribution function. Maximum likelihood estimation for interval-censored data is commonly available in statistical software. The likelihood ratio test for duration dependence, Ho: β = 1, can be constructed by evaluating the interval-censored Weibull likelihood and comparing it to the interval-censored exponential likelihood. The likelihood function for interval data is discussed in many textbooks; see Meeker and Escobar (14) for reliability, Klein and Moeschberger (15) for survival analysis, and Prentice and Gloeckler (16) for grouped data. Applications of the interval or grouped model include McDonald (17), Butler and McDonald (18), and Butler, Anderson, and Burkhauser (19). Gajewski (20) introduces interval censoring in hierarchical Bayes models.

One could claim that since all continuous data are rounded to some degree at measurement, the interval-censored likelihood should always be used. Since w denotes the measuring precision, the appropriate likelihood if t_i* is observed is

L(α, β; t_i*) = F(t_i*; α, β) − F(t_i* − w; α, β).

Since the density function f(t) is defined as

f(t; α, β) = dF(t; α, β)/dt = lim_{Δ→0} [F(t; α, β) − F(t − Δ; α, β)] / Δ,

then if w is sufficiently small to play the role of Δ, an approximation to the “appropriate likelihood” is

L(α, β; t_i*) ≈ f(t_i*; α, β) · w.

If w is small enough not to affect the shape and position of the likelihood (that is, the approximation error does not depend on α and β), then the density approximation of the likelihood, which corresponds to precisely measured data, is convenient.
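As a concrete illustration, the sketch below maximizes the interval-censored Weibull likelihood directly and forms the likelihood ratio test against the interval-censored exponential. It is a minimal sketch, not the authors' code: the simulated stand-in data, the convention that a recorded value t* represents the interval (t* − 1, t*], and the use of scipy's optimizer are assumptions for illustration.

```python
import numpy as np
from scipy import optimize, stats

def interval_censored_nll(params, t_star, w=1.0):
    """Negative log-likelihood for Weibull lifetimes rounded up to
    multiples of w: an observed t* represents the interval (t*-w, t*]."""
    beta, alpha = params
    if beta <= 0 or alpha <= 0:
        return np.inf
    upper = stats.weibull_min.cdf(t_star, beta, scale=alpha)
    lower = stats.weibull_min.cdf(t_star - w, beta, scale=alpha)
    prob = np.clip(upper - lower, 1e-300, None)  # guard against log(0)
    return -np.sum(np.log(prob))

# Simulated stand-in for rounded time-between-trades data (assumption).
rng = np.random.default_rng(0)
t = stats.weibull_min.rvs(0.8, scale=3.5, size=3000, random_state=rng)
t_star = np.ceil(t)  # time stamps to the nearest second, rounded up

# Interval-censored Weibull fit.
fit = optimize.minimize(interval_censored_nll, x0=[1.0, np.mean(t_star)],
                        args=(t_star,), method="Nelder-Mead")
beta_hat, alpha_hat = fit.x

# Interval-censored exponential fit (beta fixed at 1) for the null model.
null = optimize.minimize_scalar(
    lambda a: interval_censored_nll([1.0, a], t_star),
    bounds=(1e-6, 100), method="bounded")

# Likelihood ratio test of Ho: beta = 1 (constant hazard).
lrt = 2 * (null.fun - fit.fun)
p_value = stats.chi2.sf(lrt, df=1)
print(f"beta_hat={beta_hat:.3f}, alpha_hat={alpha_hat:.3f}, "
      f"LRT={lrt:.2f}, p={p_value:.4g}")
```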

Figure 1 illustrates how the density approximation to the “appropriate likelihood” fails when w is relatively large. The figure illustrates the Weibull with α = 3.5, β = 0.8 and the degree of rounding w = 1. Notice how the approximation at t = 1 (open circle) is over 0.1 below the appropriate likelihood (dot). Also note how the severity of the problem diminishes as the relative degree of rounding diminishes with larger values of t. The simulation study presented in this paper demonstrates that for data with short durations, where w is not small enough, ignoring round-off has important consequences.

Tables III and IV report the simulation results for the fifteen scenarios previously described. The difference from Tables I and II is that the interval-censored maximum likelihood estimates and likelihood ratio duration dependence test are reported. Notice that properly treating measured discretely data as interval censored corrects the problems found in Tables I and II. For example, the maximum likelihood estimators of β and α appear unbiased. Regarding the test Ho: β = 1, the likelihood ratio test has the specified Type I Error rate, and there appears to be high power for detecting a non-constant hazard function (duration dependence) in the stock trading scenarios of interest in the following example.

To illustrate the practical importance of correctly treating data measured discretely, consider the work of Engle and Russell (13) modeling the time between trades of IBM stock. The time between trades plays an important role in the microstructure literature of finance, which studies how stocks are traded and the process and speed with which new information is incorporated into asset prices. Whereas Engle and Russell propose an autoregressive conditional duration model capable of accounting for higher volume at the beginning and end of the trading day, for simplicity this example investigates mid-day trading (10:30 am to 3:00 pm) and tests for duration dependence of the time between trades. That is, Engle and Russell explicitly model the atypical volume in the first and last trading hour of the day, whereas, in order to focus on the discrete data issue, these unique periods are excluded here. The most significant difference between this analysis and Engle and Russell is the treatment of two trades with the same time stamp. Engle and Russell treat this case as a single large trade that was split in market execution. In contrast, this paper considers each trade in the TAQ data as a separate trade and treats trades with the same time stamp as occurring sometime in the one-second interval.


Consider the time between trades for IBM stock on 3 January 2002 between 10:30 am and 3:00 pm. From the TAQ data, this results in n = 3,828 time between trade values, measured discretely since the time stamp is to the nearest second, so that T_i* takes on the possible values (0, 1, 2, . . .). When the T_i* values are erroneously treated as precisely measured values of T_i, the maximum likelihood estimates of the Weibull parameters are β̂ = 1.089 and α̂ = 4.523. The likelihood ratio test of Ho: β = 1 has test statistic χ² = 51.65, whose p-value is less than 0.001, indicating statistically significant duration dependence with a hazard function that increases as t increases. In contrast, when the times between trades are recognized to be measured discretely, the interval-censored maximum likelihood estimates of the Weibull parameters are β̂ = 0.848 and α̂ = 3.491. The correct likelihood ratio test of Ho: β = 1 has test statistic χ² = 162.41, whose p-value is less than 0.001. In summary, this example demonstrates the gross distortion in conclusions caused by the positive bias in maximum likelihood estimates when measured discretely data are erroneously treated as precisely measured. Properly treating the data as interval censored confirms Engle and Russell's finding that the hazard function of the time between trades decreases as t increases; that is, the duration dependence indicates that the longer the time since the last trade, the less likely the next trade is to occur. The opposite, yet still statistically significant, conclusion results from ignoring the precision of the data.

3. DISCRETELY MEASURED DATA

Suppose the time to an event is discretely measured, meaning the time periods until the event of interest are countable instead of continuous. For example, count the number of consecutive dividend increases for a given company or the number of consecutive on-time mortgage payments. The possible values for this count are discrete since dividends are issued quarterly and mortgage payments are made monthly, not continuously in time. A sample of discretely measured times to an event, or runs, {N_i} has a discrete probability model with discrete hazard function


λ(n) = P[N = n | N > n − 1], n = 1, 2, 3, . . . ,

which has the same conditional probability interpretation as the continuous hazard function, but it is only defined at discrete points.

A stylized version of the efficient markets hypothesis posits that a sequence of holding period returns on a risky asset should be serially random. Thus, stock runs (N counts the number of consecutive months of positive or negative stock returns) should not exhibit duration dependence (a constant hazard function). A competing hypothesis posits that asset prices contain “bubbles” which grow each period until they “burst,” causing the stock market to crash. McQueen and Thorley (4) argue that bubbles cause runs of positive stock returns to exhibit duration dependence where λ(n) decreases as n increases (the longer the run, the less likely it will end), but that runs of negative stock returns exhibit no duration dependence. It is important to notice that stock runs are discretely measured even if higher frequency data were used. That is, if daily or tick-by-tick instead of monthly data were used to construct the stock run, it would still be a count taking on discrete possible values.

When the data are discrete, estimating a continuous probability model is technically wrong. However, when the continuous normal probability model is used for discretely measured binomial or Poisson counts, the analysis is simpler and the normal distribution is often an adequate approximation. This paper presents a simulation study to investigate the consequences of modeling discretely measured lifetime data with the continuous Weibull probability model. Samples of discretely measured data were generated where the discrete hazard function is given by

λ(n) = 1 / (1 + e^{−(a + b ln n)}), n = 1, 2, 3, . . . ,

which is a conditional logistic model that depends on n. Notice that b = 0 yields the geometric probability model, whose constant hazard function indicates no duration dependence. If b > 0, the increasing hazard function indicates that the longer the event or run, the more likely it will end. If b < 0, the opposite is true. It should be noted that other parameterizations involving n could be defined. Since the focus of this research is on short stock runs, the difference between parameterizing the hazard function as linear, quadratic, or log is small for n = 1, 2, 3. See Cox (21), Prentice and Gloeckler (16), Kennan (1), and Zuehlke (22) for more discussion of parameterizing the discrete hazard function.

The fifteen scenarios for the simulation are chosen to reflect the analysis of the number of runs of positive monthly stock returns in the January 1927 to December 1991 period used by McQueen and Thorley. Sample sizes of n = 200, 100, 50 reflect short, medium, and long runs, since the longer the stock run, the smaller the number of runs in the time period. The five choices for b (-0.6, -0.3, 0, 0.3, 0.6) investigate decreasing, constant, and increasing discrete hazard functions. The case of n = 200 and b = -0.3 is of particular interest since it corresponds to the runs of positive stock returns in McQueen and Thorley. For each sample of discretely measured data, maximum likelihood estimates of the Weibull distribution are computed along with the likelihood ratio test of Ho: β = 1 at the 0.05 significance level. Recall that β is the shape parameter of the Weibull that indicates the shape of the hazard function. Notice that the maximum likelihood estimate of β will not equal b, but it will indicate the shape of the estimated hazard function and allow a test of duration dependence.

Tables V and VI present the results of the simulation study where the discretely measured data are approximated by the Weibull distribution. First, notice that the maximum likelihood estimators of the Weibull shape parameter demonstrate a bias towards increasing hazard functions. For example, if n = 200 and b = -0.6, instead of indicating a decreasing hazard function, the Weibull with E(β̂) = 1.252 indicates that on average the approximating Weibull will have an increasing hazard function! Second, notice that the case b = 0.0 is a constant hazard function, yet the likelihood ratio test of Ho: β = 1 to indicate duration dependence has a Type I Error Rate much greater than the stated 0.05 level. Third, it appears that when the likelihood ratio test of Ho: β = 1 is rejected, the estimated Weibull shape parameter may give a wrong conclusion about the shape of the hazard function. For example, when n = 200 and b = -0.6, the test has power 0.953 to reject Ho in favor of Ha: β ≠ 1, but often with the false conclusion that the hazard function is increasing.
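The discretely measured samples in these scenarios can be generated by drawing a Bernoulli “run ends” indicator at each period until the run terminates. The sketch below is one way to do this; the value a = 0 (so that λ(1) = 0.5) is an assumption for illustration, since the paper does not report its choice of a.

```python
import numpy as np

rng = np.random.default_rng(1)

def discrete_hazard(n, a, b):
    """Conditional logistic discrete hazard lambda(n) = P[N = n | N > n-1]."""
    return 1.0 / (1.0 + np.exp(-(a + b * np.log(n))))

def draw_run_length(a, b, rng):
    """Draw one discretely measured lifetime N by sequential Bernoulli trials."""
    n = 1
    while rng.random() >= discrete_hazard(n, a, b):
        n += 1  # run survives period n; move on to period n + 1
    return n

# One simulated sample: 200 runs with decreasing hazard (b = -0.3),
# mirroring the McQueen-Thorley positive stock run scenario; a = 0 assumed.
sample = np.array([draw_run_length(a=0.0, b=-0.3, rng=rng) for _ in range(200)])
print("mean run length:", sample.mean())
```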


While it may appear that the problems are most severe for large samples, this is actually because the large sample case corresponds to short average runs. There are indications that as the average run length becomes larger, the Weibull approximation becomes more acceptable. However, for the scenarios of this simulation, approximating discretely measured times to event or runs is unacceptable. Therefore, it is more appropriate to propose a model for the discrete hazard function, estimate the parameters using maximum likelihood, and compute the likelihood ratio test for a constant discrete hazard function.

Following Allison (23), notice that the likelihood function can be written

L(a, b; n_i) = f(n_i; a, b) = P[N = n_i]
  = P[N > 1] · P[N > 2 | N > 1] · P[N > 3 | N > 2] · · · P[N > n_i − 1 | N > n_i − 2] · P[N = n_i | N > n_i − 1]
  = [1 − λ(1; a, b)] · [1 − λ(2; a, b)] · [1 − λ(3; a, b)] · · · [1 − λ(n_i − 1; a, b)] · λ(n_i; a, b),

which is a product of the complement of the hazard function from 1 to n_i − 1 and the hazard function at n_i. If the discrete hazard function follows a logistic parameterization such as

λ(n; a, b) = 1 / (1 + e^{−(a + b ln n)}),

then the maximum likelihood estimators of (a, b) can be computed using a logistic regression algorithm. The logistic likelihood function is

L(a, b) = ∏_{n=1}^{max{N_i}} [e^{−(a + b ln n)} / (1 + e^{−(a + b ln n)})]^{y_n} · [1 / (1 + e^{−(a + b ln n)})]^{x_n},

where

x_n = # of observations in the sample with N_i = n,
y_n = # of observations in the sample with N_i > n,

which can be maximized to obtain â and b̂. The likelihood ratio test of Ho: b = 0 can be computed by comparing L(â, b̂) to L_0(â_0), where â_0 maximizes

L_0(a_0) = ∏_{n=1}^{max{N_i}} [e^{−a_0} / (1 + e^{−a_0})]^{y_n} · [1 / (1 + e^{−a_0})]^{x_n}.
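Because the likelihood above is exactly a binomial likelihood over period-level records, the estimates can be obtained from any logistic regression routine by expanding each run N_i into N_i period records with an event indicator and covariate ln n. Below is a minimal sketch using statsmodels; the library choice and the simulated example data are assumptions, not the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm

def fit_discrete_hazard(runs):
    """Fit lambda(n) = 1/(1 + exp(-(a + b*ln n))) by logistic regression.

    Each run of length N contributes N period records: the event indicator
    is 0 for periods 1..N-1 (run survives) and 1 at period N (run ends).
    """
    periods, events = [], []
    for N in runs:
        periods.extend(range(1, N + 1))
        events.extend([0] * (N - 1) + [1])
    X = sm.add_constant(np.log(periods))  # columns: intercept (a), ln n (b)
    return sm.Logit(events, X).fit(disp=0)

# Example with simulated run lengths; geometric draws correspond to b = 0.
rng = np.random.default_rng(2)
runs = [int(rng.geometric(0.5)) for _ in range(200)]
result = fit_discrete_hazard(runs)
a_hat, b_hat = result.params
# result.llr is 2*(llf - llnull), the LRT statistic for Ho: b = 0.
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}, "
      f"LRT={result.llr:.2f}, p={result.llr_pvalue:.4g}")
```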

Tables VII and VIII report the simulation results for the fifteen scenarios previously described. The bias of the maximum likelihood estimators for the logistic discrete hazard function model is given in Table VII, and the power of the likelihood ratio test of Ho: b = 0 is given in Table VIII. Notice that properly estimating a discrete hazard function for discretely measured data results in unbiased estimators and an unbiased test of duration dependence.

The analysis of positive stock runs demonstrates the importance of correctly treating discretely measured data. McQueen and Thorley report the runs of above-average value-weighted portfolio returns. Approximating the discrete hazard function for N_i with the continuous Weibull yields the maximum likelihood estimate of the shape parameter β̂ = 1.337. The likelihood ratio test of Ho: β = 1 has test statistic χ² = 27.69, whose p-value is less than 0.001. Therefore, approximating the discretely measured stock runs with a Weibull probability model indicates statistically significant duration dependence where the hazard function increases as n increases. However, when the stock runs are correctly modeled using a discrete hazard function given by

λ(n) = 1 / (1 + e^{−(a + b ln n)}), n = 1, 2, 3, . . . ,

the maximum likelihood estimate b̂ = -0.303 indicates the hazard function is decreasing, meaning that the longer the run of positive stock returns, the less likely it is to end. The likelihood ratio test of Ho: b = 0 has test statistic χ² = 4.66 with p-value 0.0309. Therefore, in this application with short runs, the Weibull distribution is a poor approximation for the discretely measured data: the duration dependence hypothesis test is rejected in both cases, but with the wrong interpretation under the Weibull.

4. CONCLUSIONS

Some continuous economic data are measured discretely, meaning that there has been rounding to some small measurement unit related to the precision of the measuring device. For example, the time between trades is recorded to the nearest second. Other economic data are discretely measured, meaning the time periods until the event of interest are countable instead of continuous. For example, the length of runs of holding period returns on a stock portfolio is a count over the holding periods. This paper examines the consequences of estimating a continuous Weibull probability model when the data are discrete. Simulations show that treating discrete data as if they were continuous leads to a bias towards increasing hazard functions. Two methods are described for unbiased estimation of the hazard function and valid hypothesis tests for a constant hazard function. For measured discretely data, use interval-censored estimation of the Weibull probability model. For discretely measured data, choose a discrete hazard function model such as the logistic. These methods are applied to data from the finance literature.

BIBLIOGRAPHY

(1) Kennan, J. The Duration of Contract Strikes in U.S. Manufacturing. J. Econometrics, 1985, 28, 5-28.
(2) Solon, G. Work Incentive Effects of Taxing Unemployment Benefits. Econometrica, 1985, 53, 295-306.
(3) Sichel, D. Business Cycle Duration Dependence: A Parametric Approach. Rev. Econ. Statist., 1991, 73, 254-260.
(4) McQueen, G.; Thorley, S. Bubbles, Stock Returns, and Duration Dependence. J. Financial Quantitative Analysis, 1994, 29, 379-401.
(5) Deng, Y.; Quigley, J.; Van Order, R. Mortgage Terminations, Heterogeneity, and the Exercise of Mortgage Options. Econometrica, 2000, 68, 275-307.
(6) Haitovsky, Y. Grouped Data, in Encyclopedia of Statistical Sciences 3, Wiley, New York, 1982.
(7) Tricker, A. R. Effects of Rounding Data Sampled from the Exponential Distribution. J. Appl. Statist., 1984, 11, 54-87.
(8) Tricker, A. R. The Effect of Rounding on the Significance Level of Certain Normal Test Statistics. J. Appl. Statist., 1990, 17, 31-38.
(9) Tricker, A. R. Estimation of Parameters for Rounded Data from Non-normal Distributions. J. Appl. Statist., 1990, 17, 219-228.
(10) Tricker, A. R. Estimation of Parameters for Rounded Data from Non-normal Distributions. J. Appl. Statist., 1992, 19, 465-471.
(11) Tricker, A.; Coates, E.; Okell, E. The Effect on the R chart of Precision of Measurement. J. Qual. Tech., 1998, 30, 232-239.
(12) Lee, C.; Vardeman, S. Interval Estimation of a Normal Process Mean from Rounded Data. J. Qual. Tech., 2001, 33, 335-348.
(13) Engle, R.; Russell, J. Autoregressive Conditional Duration: A New Model for Irregularly Spaced Transaction Data. Econometrica, 1998, 66, 1127-1162.
(14) Meeker, W.; Escobar, L. Statistical Methods for Reliability Data, Wiley, New York, 1998.
(15) Klein, J.; Moeschberger, M. Survival Analysis: Techniques for Censored and Truncated Data, Springer, New York, 1997.
(16) Prentice, R.; Gloeckler, L. Regression Analysis of Grouped Survival Data with Applications to Breast Cancer Data. Biometrics, 1978, 34, 57-67.
(17) McDonald, J. Some Generalized Functions for the Size Distribution of Income. Econometrica, 1984, 52, 647-663.
(18) Butler, R.; McDonald, J. Trends in Unemployment Duration Data. Rev. Econ. Statist., 1986, 68, 545-557.
(19) Butler, J.; Anderson, K.; Burkhauser, R. Work and Health After Retirement: A Competing Risks Model with Semiparametric Unobserved Heterogeneity. Rev. Econ. Statist., 1989, 71, 46-52.
(20) Gajewski, B.; Sedwick, J. D.; Antonelli, P. J. A Log-normal Distribution Model of the Effect of Bacteria and Ear Fenestration on Hearing Loss: A Bayesian Approach. Stat. in Med., 2004, in press.
(21) Cox, D. Regression Models and Life-Tables. J. Royal Statist. Soc. Ser. B, 1972, 34, 187-220.
(22) Zuehlke, T. Business Cycle Duration Dependence Reconsidered. J. Bus. Econ. Stat., 2003, 21, 564-569.
(23) Allison, P. Survival Analysis Using the SAS System: A Practical Guide, SAS Publishing, Cary, NC, 1995.

Table I: Bias of Maximum Likelihood Estimators of Weibull Parameters (α, β) when Measured Discretely Data are Erroneously Treated as Precisely Measured.

                               β̂                              α̂
                         Sample Size (n)                Sample Size (n)
  Shape             3,000     600      50       3,000      600        50
  Parameter (β)                               (α = 3.5) (α = 17.5) (α = 210.0)
  0.8               0.270    0.086   0.036      1.147     1.547     3.335
  0.9               0.276    0.081   0.035      0.987     1.245     2.991
  1.0               0.281    0.079   0.039      0.878     1.063     2.140
  1.1               0.286    0.077   0.043      0.798     0.933     1.184
  1.2               0.290    0.076   0.044      0.741     0.829     1.186

Table II: Approximate Power of Likelihood Ratio Test of Constant Hazard Function (Duration Dependence) when Measured Discretely Data are Erroneously Treated as Precisely Measured.

                         Sample Size (n)
  Shape Parameter (β)   3,000     600      50
  0.8                   0.999    0.983   0.437
  0.9                   1.000    0.075   0.111
  1.0                   1.000    0.682   0.541
  1.1                   1.000    0.999   0.175
  1.2                   1.000    1.000   0.432

Table III: Bias of Maximum Likelihood Estimators of Weibull Parameters (α, β) when Measured Discretely Data are Correctly Treated as Interval Censored.

                               β̂                              α̂
                         Sample Size (n)                Sample Size (n)
  Shape             3,000     600      50       3,000      600        50
  Parameter (β)                               (α = 3.5) (α = 17.5) (α = 210.0)
  0.8               0.001    0.002   0.023      0.004     0.016     0.891
  0.9               0.001    0.002   0.024      0.000     0.005     1.236
  1.0               0.001    0.003   0.029      0.001     0.017     0.804
  1.1               0.000    0.003   0.034      0.000     0.013     0.096
  1.2               0.001    0.003   0.026      0.000     0.000     0.246

Table IV: Approximate Power of Likelihood Ratio Test of Constant Hazard Function (Duration Dependence) when Measured Discretely Data are Correctly Treated as Interval Censored.

                         Sample Size (n)
  Shape Parameter (β)   3,000     600      50
  0.8                   1.000    1.000   0.488
  0.9                   1.000    0.899   0.142
  1.0                   0.051    0.050   0.056
  1.1                   1.000    0.845   0.161
  1.2                   1.000    1.000   0.409

Table V: Expected Value of Maximum Likelihood Estimators of Weibull Parameters (α, β) when Discretely Measured Data are Approximated by the Continuous Weibull Probability Model.

                               β̂                       α̂
                         Sample Size (n)          Sample Size (n)
  Shape Parameter (b)    200     100     50      200    100     50
  -0.6                  1.252   0.946  0.799    2.414  4.282   7.674
  -0.3                  1.316   1.096  0.957    2.760  4.682   9.308
   0.0                  1.534   1.262  1.168    2.493  5.047   9.660
   0.3                  1.621   1.465  1.405    2.810  5.000   9.817
   0.6                  1.806   1.690  1.657    2.597  4.456  10.417

Table VI: Approximate Power of Likelihood Ratio Test of Constant Hazard Function (Duration Dependence) when Discretely Measured Data are Approximated by the Continuous Weibull Probability Model.

                         Sample Size (n)
  Shape Parameter (b)    200     100     50
  -0.6                  0.953   0.248  0.640
  -0.3                  0.999   0.210  0.075
   0.0                  1.000   0.879  0.219
   0.3                  1.000   0.999  1.000
   0.6                  1.000   1.000  0.997

Table VII: Bias of Maximum Likelihood Estimators of Logistic Discrete Hazard Function for Discretely Measured Data.

                               b̂                        â
                         Sample Size (n)          Sample Size (n)
  Shape Parameter (b)    200     100     50      200     100     50
  -0.6                  0.025   0.027  0.039   -0.007  -0.015   0.037
  -0.3                  0.018   0.023  0.034   -0.006  -0.016  -0.045
   0.0                  0.020   0.020  0.036   -0.005  -0.012  -0.054
   0.3                  0.020   0.024  0.043   -0.008  -0.021  -0.071
   0.6                  0.019   0.030  0.050   -0.005  -0.027  -0.090

Table VIII: Approximate Power of Likelihood Ratio Test of Constant Logistic Discrete Hazard Function (Duration Dependence) for Discretely Measured Data.

                         Sample Size (n)
  Shape Parameter (b)    200     100     50
  -0.6                  0.968   0.986  0.965
  -0.3                  0.602   0.597  0.509
   0.0                  0.051   0.048  0.053
   0.3                  0.566   0.567  0.472
   0.6                  0.961   0.969  0.941

[Figure 1 appears here: a plot of likelihood contribution versus t for the Weibull with α = 3.5 and β = 0.8 at precision w = 1, comparing the “appropriate likelihood” values F(1) − F(0), F(2) − F(1), . . . (dots) with the density approximations f(1)·w, f(2)·w, . . . (open circles).]

Figure 1: “Appropriate Likelihood” and Density Approximation for Weibull (α = 3.5, β = 0.8) at Precision w = 1.