Estimation Sample Selection for Discretionary Accruals Models Frank Ecker, Jennifer Francis*, Per Olsson and Katherine Schipper (Duke University)

We examine how the criteria for choosing estimation samples (peer firms) affect the ability to detect discretionary accruals, using several variants of the Jones (1991) model. Researchers commonly estimate accruals models in cross-section, and define the estimation sample as all firms in the same industry. We examine whether firm size performs at least as well as industry membership as the criterion for selecting estimation samples. For U.S. data, we find that estimation samples based on similarity in lagged assets perform at least as well as estimation samples based on industry membership at detecting discretionary accruals, both in simulations with seeded accruals between 2% and 100% of total assets and in tests examining restatement data. For non-U.S. data, we show that industry-based estimation samples result in significant sample attrition and that estimation samples based on lagged assets perform at least as well as estimation samples based on industry membership, with substantially less sample attrition.

October 2011

All authors are at the Fuqua School of Business, Duke University. * Corresponding author: [email protected], Fuqua School of Business, Duke University, Durham, NC 27708. We appreciate helpful comments from workshop participants at the 2011 Global Issues in Accounting Conference at the University of North Carolina, Copenhagen Business School, ESADE Business School, Duke University, Hong Kong Polytechnic University, Stockholm School of Economics, University of Arizona, University of Lancaster, University of Michigan, and University of Washington.

1. Introduction

Using several variations of the Jones (1991) model of discretionary accruals, we examine how the selection of estimation samples affects the power of these models to detect discretionary accruals. Our research aims to provide a practical solution to the problem of substantial sample attrition when discretionary accruals models are estimated in time series (eliminating firms that lack the requisite number of time-series observations) and in industry cross-sections (eliminating firms whose industries lack the requisite number of members). The problem we address is significant in the U.S. and acute in non-U.S. markets. The average number of firms per industry in the U.S. during 1988-2009 with the necessary data to estimate an accruals observation[1] is 80.5 (SIC2), 21.7 (SIC3) and 13.6 (SIC4); for the 99 non-U.S. countries with data on Compustat Global over the same time period, the corresponding mean values are 3.5 (SIC2), 1.8 (SIC3) and 1.6 (SIC4). Imposing typical requirements—that 10 observations besides the event firm are available for an industry to be included—leads to substantial sample attrition and the complete elimination of many countries from studies of discretionary accruals. For U.S. data, the use of industry-based estimation samples eliminates 1% to 22% of the otherwise-available sample (depending on how industry is defined); for the 69 non-U.S. countries with at least one year with 11 firm-year observations overall, the sample loss is between 32% and 93% on average (depending on how industry is defined and the weighting of countries). That is, requiring sufficient data to form industry-based estimation samples for a given country causes between 29 and 40 countries (out of 69) to be eliminated entirely from a study that estimates discretionary accruals by industry.[2]

[1] The accruals models require each observation to have current and one-year lagged financial data to construct the dependent and independent variables. For example, the basic Jones (1991) model requires data on lagged total assets, current total accruals, current net property, plant and equipment, and the change in sales revenues.

[2] The researcher could pool data across countries, within industry, to increase the number of firms in each industry. To the extent countries are not homogeneous on dimensions that affect accruals, this pooling creates noise and lowers the power of the test. If the researcher aims to examine jurisdictional influences on discretionary accruals, pooling observations across jurisdictions is either not feasible or will bias the results against detecting such influences.


We propose and test a solution to this problem, specifically, basing estimation samples on similarity in size, not industry membership. Our solution rests on two principles. The first is that estimation samples for discretionary accruals models should be homogeneous with respect to the accruals-generating process. The assumption behind choosing estimation samples based on industry is that firms in the same industry (that is, the same SIC code) meet this criterion. However, research shows that this assumption may not be empirically descriptive (e.g., Bernard and Skinner, 1996; Brickley and Zimmerman, 2010). Dopuch et al. (2010) find little support for the assumption that firms in the same industry have a homogeneous accruals-generating process; they do not, however, propose an alternative to industry for defining an estimation sample. The second principle is that the criterion for choosing estimation samples should be both widely available and numerical, so that large samples can be ranked on the criterion. These two principles support firm size as the criterion for selecting estimation samples.[3] With regard to the first principle, similarity in the accruals process, previous research shows that firm size is correlated with factors associated with accruals, such as growth, complexity and monitoring. Relative to smaller firms, larger firms are likely to be older (so more stable, with lower growth rates), to have more segments (so more complex) and to be more closely monitored (larger analyst following, more regulatory oversight, greater likelihood of a Big-4 auditor, more institutional ownership). To the extent these and other factors correlated with size affect the accruals-generating process, we expect that grouping firms on size sharpens the estimated accruals parameters. In addition, Kothari et al. (2005) document a correlation between size and discretionary accruals estimated using an industry cross-sectional accruals model. With regard to the second principle, size-based estimation samples impose no sample loss incremental to that imposed by the accruals models themselves, because size-based peers can be defined as firms that are closest in size to the target firm. Since firms can always be ranked on size, there will always be a set of firms in the neighborhood of the target firm.

[3] To emphasize the importance of homogeneity in the accruals-generating process within an estimation sample, we refer to estimation samples as "peer firms." For example, estimation samples based on industry membership are "industry peers" or "industry-based peers" and estimation samples based on size are "size peers" or "size-based peers."


Whether those firms are close enough for purposes of detecting discretionary accruals is the empirical question we explore.[4] Evidence that firm size performs as well as or better than industry membership as the criterion for selecting an estimation sample for a discretionary accruals model means that researchers can substantially expand sample sizes without loss of detection power.

We investigate the selection of estimation samples (peer firms) for estimating accruals models using simulations and archival tests of restatement data. Our simulations modify the approach used by Dechow et al. (1995) to examine the power of several discretionary accruals models to detect earnings management. Their aim was to compare accruals models, whereas we compare the effects of estimation sample selection for a given set of models. Dechow et al. estimated accruals models using time-series data,[5] while our focus is on cross-sectional estimation using industry-based versus size-based criteria for peer firm selection. We use Dechow et al.'s framework to guide our analysis and to assist the reader in interpreting our results. Our simulations use all U.S. firms with available data on Compustat over 1951-2009. These simulation tests reveal that peers based on lagged assets are at least as powerful as industry peers at detecting induced discretionary accruals, and are often more powerful. Our archival tests consider the power of size-based and industry-based peer groups to detect discretionary accruals for restatement firms during 1996-2006. Restatement firms have unknown (to the researcher) amounts of discretionary accruals, because a restatement is after-the-fact evidence of purposeful or inadvertent deviations from a normal (GAAP-based) accruals-generating process. These tests show that lagged-asset-based peers have significantly higher rates of discretionary accruals detection than do industry-based peers: on average, tests using lagged asset peers detect restatements about 22% of the time versus about 4-6% for tests based on industry peers.

[4] Our tests also address the fact that the larger population of U.S. traded firms relative to any non-U.S. market implies that size-based neighbors in the U.S. will (likely) be more similar to the target firm than are neighbors in non-U.S. markets.

[5] Using the firm as its own control offers the advantage of holding firm-specific factors constant, but results in substantial, even unacceptable, sample loss because sample firms must have a sufficient time series to estimate the accruals models. For example, requiring at least 10 consecutive time-series observations eliminates 57% of the 251,367 firm-year observations on Compustat over 1950-2009 that have the data necessary to estimate the variables in a Jones-style accruals model.


We also consider the power of size-based peers to detect seeded discretionary accruals using non-U.S. data.[6] Our tests, using Compustat Global data for 1988-2009, show that size-based peers perform as well as, or better than, industry-based peers at detecting induced discretionary accruals and impose far less sample attrition.

Taken together, our results indicate that size-based estimation samples perform well in terms of detecting discretionary accruals in both U.S. data and non-U.S. data. The use of size-based estimation samples for U.S. data avoids sample losses (up to 22% in some cases) that arise from using industry-based estimation samples. Our results have even greater practical value for estimating discretionary accruals models using non-U.S. data, where basing estimation samples on industry membership results in sample attrition as high as 90%. Entire countries that cannot be included in industry-based estimations can be included in size-based estimations, with no loss of power to detect discretionary accruals.

The remainder of the paper is organized as follows. Section 2 provides descriptive evidence on heterogeneity in total accruals and discretionary accruals and the extent to which that heterogeneity exists in industry groupings. Section 3 reports the results of a simulation analysis that investigates the power of size-based and industry-based peer groups to detect induced discretionary accruals using U.S. data. Section 4 analyzes the ability of estimations using industry-based peers and size-based peers to detect discretionary accruals of restatement firms. Section 5 applies our simulation tests for detection of discretionary accruals to non-U.S. samples. Section 6 reports additional tests and section 7 concludes.

2. Descriptive Evidence of Accruals Heterogeneity

We provide descriptive information about total accruals, discretionary accruals and size measures for the sample of Compustat firm-years covering 1951-2009 that meet the data requirements of our simulation, described in section 3. The requirements for firm-years to be included in the simulation sample are described in Table 1. Other than satisfying plausibility checks and reporting the data items needed for the discretionary accruals models, firm-years are required to have the data on each criterion we use to create estimation samples.

[6] We do not analyze non-U.S. restatement data because these data are not widely available.


The criteria are industry membership, which requires data on SIC code, and size, which requires data on assets, lagged assets, sales, lagged sales, market capitalization and firm age. Because our analyses require a minimum number of firms to estimate each cross-sectional discretionary accruals model, we require that each peer group have at least 11 firm-year observations (10 non-event firms and one event firm).[7] The final simulation sample contains 143,584 firm-years, representing 59 cross-sections (one per year) and 1,972 distinct SIC2-year peer groups (4,349 distinct SIC3-year peer groups and 5,486 distinct SIC4-year peer groups). On average, the industry-year peer groups contain 73 firms (SIC2), 33 firms (SIC3) and 26 firms (SIC4).

Panel A of Table 2 reports descriptive statistics on total accruals (signed and unsigned) and on the size-based criteria for identifying peer firms: assets, lagged assets, sales, lagged sales, market capitalization, and firm age. These data show that mean (median) total accruals for the simulation sample are -3.1% (-3.9%) of total assets, with a standard deviation of 33.12%. Panel B of Table 2 shows how the standard deviation of accruals (scaled by mean absolute accruals) varies when the sample is grouped by industry (SIC2, SIC3 and SIC4). The column labeled 'Entire cross-section' provides a benchmark, the scaled standard deviation measure without any grouping other than by fiscal year. Consistent with the view that the accruals-generating process is more homogeneous within industries than across industries, grouping on industry reduces the scaled standard deviation of unsigned accruals measures. However, the standard deviations of scaled signed accruals for SIC3 and SIC4 exceed the cross-section standard deviation, which suggests heterogeneity within industry groupings. Also consistent with the view that grouping by industry membership reduces heterogeneity, there is generally less variation in the size-based variables within each industry than within the entire cross-section. For example, the standard deviation of total assets (sales) decreases from 3.51 (3.64) for the cross-section to 1.84 (1.79) for the grouping on SIC4. However, the standard deviation of profitability, measured by ROA, increases with the fineness of industry groupings, from 5.02 for the cross-section to 12.44 for the SIC4 grouping.

[7] This requirement, or a similar one, is common in the literature. For example, Kothari et al. require 10 firms in each 2-digit SIC code.


Similarly, with regard to complexity, measured by the number of segments, the data in Panel B suggest that finer industry groupings contain more complex firms (the average number of segments reported increases from just under 2 for the cross-section to just under 2.5 for the grouping on SIC4). Results are similar when we measure complexity as the number of reported segments from different SIC codes. The last three rows of Panel B report descriptive data from the Jones (1991) model, specifically, the standard deviations of total accruals scaled by total assets and of the residuals from the accruals model, and the explanatory power of the accruals model. Industry groupings typically reduce heterogeneity, as measured by standard deviation, relative to the benchmark (the sample cross-section). The standard deviation of total accruals decreases from .1893 for the cross-section to .1422 for the SIC4 grouping; the discretionary accruals standard deviation decreases from .1622 for the cross-section to .0964 for the SIC4 grouping; and explanatory power increases from .3304 for the cross-section to .4970 for the SIC4 grouping.

We interpret these descriptive data as supporting both intuition and prior research that uses industry membership to identify estimation samples for discretionary accruals models. Prior research (e.g., Dopuch et al. 2010) also suggests that industry groupings are nonetheless heterogeneous with respect to factors associated with the accruals-generating process. Our aim is to provide evidence on whether choosing estimation samples based on similarity in size yields results that are at least as good as choosing estimation samples based on industry membership, in terms of detecting discretionary accruals. We focus on firm size as the alternative (to industry membership) for selecting estimation samples because both intuition and prior research suggest that firm size is an important factor explaining accruals and because the size criterion imposes few sample restrictions.
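To make the Panel B calculation concrete, the sketch below computes the scaled dispersion measure under one reading of the text (the within-group standard deviation of scaled accruals divided by the group's mean absolute scaled accruals, averaged across groups). The file name and column names are hypothetical stand-ins for the Compustat extract:

```python
import pandas as pd

# Hypothetical extract: one row per firm-year with total accruals scaled
# by lagged assets ('ta_scaled') and the firm's 4-digit SIC code.
df = pd.read_csv("firm_years.csv")  # assumed columns: fyear, sic4, ta_scaled

# Coarser industry definitions are prefixes of the 4-digit code.
sic4 = df["sic4"].astype(str).str.zfill(4)
df["sic2"], df["sic3"] = sic4.str[:2], sic4.str[:3]

def scaled_dispersion(frame, keys):
    """Within-group std of scaled accruals divided by the group's mean
    absolute value, averaged over groups with at least 11 observations."""
    stats = frame.groupby(keys)["ta_scaled"].agg(
        n="count", sd="std", mad=lambda s: s.abs().mean())
    stats = stats[stats["n"] >= 11]  # the paper's minimum peer-group size
    return (stats["sd"] / stats["mad"]).mean()

# Benchmark (fiscal year only), then progressively finer industry groupings.
for keys in (["fyear"], ["fyear", "sic2"], ["fyear", "sic3"], ["fyear", "sic4"]):
    print(keys[-1], round(scaled_dispersion(df, keys), 4))
```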

3. Simulations with Seeded Discretionary Accruals

This section describes our simulation analyses of how choosing estimation (peer) samples based on size versus industry membership affects the ability to detect seeded discretionary accruals. Sections 3.1-3.6 describe the simulation design, including the reasoning supporting certain design choices, and section 3.7 reports the results.

3.1 Peer group definitions

Our benchmark estimation sample is a peer group formed by a random selection of firms from the sample cross-section.[8] We define industry-based estimation samples (industry peers) based on 2-digit, 3-digit and 4-digit SIC codes, because our reading of the literature suggests that these are the most commonly used approaches to determining estimation samples for discretionary accruals models.[9] Based on the reasoning described in section 1 for considering estimation samples based on similarity in firm size, we select size-based peers based on total assets, lagged total assets, sales, lagged sales, market capitalization and firm age.[10]

3.2 Simulation procedures

This section describes our simulations; we describe the reasons for certain design choices in section 3.6. Using the firm-years from the simulation sample described in Table 1, we perform 100 iterations. Each iteration involves four steps:

1. We randomly select 500 firm-years and define them as event-firm-years. The event-firm-years remain constant throughout the iteration.

2. For each event-firm-year, we select initial peer firms from which estimation samples will be chosen:

a. Sample cross-section;

b. Industry-based peer groups (SIC2 industry, SIC3 industry, SIC4 industry), chosen by matching by year and industry, regardless of the number of firm-years in the industry;

c. Size-based peer groups (total assets, lagged total assets, sales, lagged sales, market capitalization, firm age), chosen by matching the year and the 25 adjacent (to the event firm) lower-ranked firms and the 25 adjacent (to the event firm) higher-ranked firms; these are the event firm's "closest neighbors."

[8] We do not consider double-sorted peer groups (i.e., firms from the same industry and same size decile), because our goal is to relax sample restrictions.

[9] Results of sensitivity tests related to industry definition are discussed in Section 6.

[10] We also examined ROA-based peer groups. Results (not tabulated) indicate that ROA-based estimation samples have discretionary accruals detection power that is marginally better than that reported for the entire cross-section.


Step 2 yields varying numbers of observations in the sets of initial peer firms; the number varies for the cross-section and the three industry-based peer groups and is fixed at 50 for the six size-based peer groups.

3. To equalize the number of observations in each peer group, we randomly select 10 firms from the initial peer groups identified in Step 2. This step is intended to equalize the power of tests, which is affected by sample size.[11] We use a constant number of peer firms (10), both to estimate the discretionary accruals regression and for the significance test on the difference in discretionary accruals between event and non-event observations.

4. We repeat Steps 2 and 3 for each of the 500 event-firm-years.

Our simulation tests are based on 100 iterations, yielding 50,000 event-firm-years, each matched with 10 peer-firm-years from each of the 10 peer group definitions. While this design does not impose restrictions on how often a firm-year can be selected, either as an event observation or a non-event observation, the two-layer sampling minimizes the likelihood that an entire subsample would be replicated.

3.3 Seeding discretionary accruals

For each event-firm-year, we seed discretionary accruals into the data. First, we calculate the event-firm-year's ratio of total accruals to lagged total assets (the dependent variable in the accruals model regressions). We then add between 2% and 20% of lagged total assets, in two-percentage-point increments, to yield 10 "positively managed" accruals figures for each event-firm-year. As a reference, Dechow et al. (1995) seed discretionary accruals in 10-percentage-point increments, from 0% to 100%. While we also report results for 20% to 100% seed levels, we focus on smaller amounts of induced discretionary accruals (20% or less) because we believe they are more descriptive of observed levels of discretionary accruals. In particular, Table 2 shows that the 5th and 95th percentiles of scaled signed accruals are about -0.22 and +0.19, respectively; the unsigned percentiles are 0.006 (5th percentile) and 0.30 (95th percentile). The 0% seed case reflects 0% discretionary accruals as proxied by the raw Compustat data.
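As an illustration of Steps 2c and 3 and of the seeding in section 3.3, here is a minimal NumPy sketch. It is a simplification of the text (for example, it silently truncates the neighbor window for firms near the extremes of the size ranking, a case the paper does not discuss), and the function and variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_size_peers(event_idx, size_values, n_side=25, n_final=10):
    """Step 2c: take the 25 adjacent lower-ranked and 25 adjacent
    higher-ranked firms on the size criterion (e.g., lagged assets);
    Step 3: randomly draw 10 of those (up to) 50 initial peers."""
    order = np.argsort(size_values)                 # firms sorted by size
    pos = int(np.flatnonzero(order == event_idx)[0])
    lower = order[max(pos - n_side, 0):pos]
    higher = order[pos + 1:pos + 1 + n_side]
    initial = np.concatenate([lower, higher])       # the "closest neighbors"
    return rng.choice(initial, size=n_final, replace=False)

def seeded_accruals(ta_over_lagged_assets):
    """Section 3.3: add 0%, 2%, ..., 20% of lagged assets to the event
    firm's scaled total accruals, yielding the 11 seed-level versions."""
    return ta_over_lagged_assets + np.arange(0.0, 0.21, 0.02)
```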

[11] Table 9, discussed in Section 6, provides evidence of a negative correlation between the number of firms in an industry and power to detect discretionary accruals.


3.4 Discretionary accruals estimation

We conduct tests on six discretionary accruals models that have been used in the literature. The six models are based on variations of the following four equations:

$$\frac{\text{Total accruals}_{i,t}}{\text{Total Assets}_{i,t-1}} = \alpha_1 \frac{1}{\text{Total Assets}_{i,t-1}} + \alpha_2 \frac{\Delta\text{Sales}_{i,t}}{\text{Total Assets}_{i,t-1}} + \alpha_3 \frac{\text{Net PPE}_{i,t}}{\text{Total Assets}_{i,t-1}} + DA_{i,t}^{\text{Jones}} \tag{1}$$

$$\frac{\text{Total accruals}_{i,t}}{\text{Total Assets}_{i,t-1}} = \alpha_0 + \alpha_1 \frac{1}{\text{Total Assets}_{i,t-1}} + \alpha_2 \frac{\Delta\text{Sales}_{i,t}}{\text{Total Assets}_{i,t-1}} + \alpha_3 \frac{\text{Net PPE}_{i,t}}{\text{Total Assets}_{i,t-1}} + DA_{i,t}^{\text{Jones(intercept)}} \tag{2}$$

$$\frac{\text{Total accruals}_{i,t}}{\text{Total Assets}_{i,t-1}} = \alpha_0 + \alpha_1 \frac{1}{\text{Total Assets}_{i,t-1}} + \alpha_2 \frac{\Delta\text{Sales}_{i,t} - \Delta\text{AR}_{i,t}}{\text{Total Assets}_{i,t-1}} + \alpha_3 \frac{\text{Net PPE}_{i,t}}{\text{Total Assets}_{i,t-1}} + DA_{i,t}^{\text{Mod. Jones}} \tag{3}$$

$$\frac{\text{Total accruals}_{i,t}}{\text{Total Assets}_{i,t-1}} = \alpha_0 + \alpha_1 \frac{1}{\text{Total Assets}_{i,t-1}} + \alpha_2 \frac{\Delta\text{Sales}_{i,t}}{\text{Total Assets}_{i,t-1}} + \alpha_3 \frac{\text{Net PPE}_{i,t}}{\text{Total Assets}_{i,t-1}} + \alpha_4\,\text{ROA}_{i,t} + DA_{i,t}^{\text{Jones+ROA}} \tag{4}$$

where $\text{Total accruals}_{i,t}$ is firm i's total accruals in year t, measured as the change in current assets (adjusted for the change in cash) minus the change in current liabilities (adjusted for current liabilities used for financing) minus depreciation expense; $\text{Total Assets}_{i,t-1}$ is firm i's total assets in year t-1; $\Delta\text{Sales}_{i,t}$ is firm i's change in sales between years t-1 and t; $\text{Net PPE}_{i,t}$ is firm i's net property, plant and equipment in year t; $\Delta\text{AR}_{i,t}$ is firm i's change in accounts receivable between years t-1 and t; and $\text{ROA}_{i,t}$ is firm i's return on assets in year t.

Equations (1) and (2) are the Jones (1991) model without an intercept and with an intercept, the latter introduced by Kothari et al. (2005). Equation (3) is a modified Jones model which includes an intercept and an adjustment for the change in accounts receivable. Equation (4) adds a performance adjustment (current-period ROA) to equation (2). The residuals from estimating equations (1)-(4), denoted $DA_{i,t}^{\text{Model}}$, provide four measures of discretionary accruals. As described by Kothari et al., these residuals can be used to construct "performance-adjusted discretionary accruals" (PADA), equal to the difference between two residuals, where the second residual is selected from an ROA-matched firm in the same peer group. We consider two measures of discretionary accruals that are the PADA-variants of the discretionary accruals obtained from equations (2) and (3). For completeness, we also evaluate the discretionary accruals obtained from a Healy-type (1985) model, which uses the average accruals for the peer firms as the benchmark for normal accruals. In total, therefore, we consider seven accruals models.
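To fix ideas, the following sketch estimates equation (2) by OLS on one estimation sample and returns the residuals as the discretionary accruals estimates, with the Healy-type benchmark included for comparison. This is our illustration, not the authors' code:

```python
import numpy as np

def jones_da(total_accruals, d_sales, net_ppe, lagged_assets, intercept=True):
    """OLS estimate of equation (2) (equation (1) if intercept=False) on a
    peer group; the residuals are the discretionary accruals, DA."""
    y = total_accruals / lagged_assets
    cols = [1.0 / lagged_assets, d_sales / lagged_assets,
            net_ppe / lagged_assets]
    if intercept:
        cols.insert(0, np.ones_like(y))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta                      # DA = regression residuals

def healy_da(total_accruals, lagged_assets):
    """Healy-type (1985) benchmark: normal accruals are the peer-group
    average, so DA is the deviation from that average."""
    y = total_accruals / lagged_assets
    return y - y.mean()
```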


3.5 Assessing detection power

Our simulations investigate the ability to detect an amount (between 2% and 20%) of seeded discretionary accruals at the subsample level. For each peer firm definition, we obtain 50,000 subsamples, each one containing one event firm matched with 10 peer firms, and each one containing the 10 levels of seeded accruals. These seeded amounts represent the discretionary accruals that are the object of the discretionary accruals detection tests. Our tests compare the estimated discretionary accruals of the event-firm-years with the average of the estimated discretionary accruals of the 10 non-event-firm-years, using the following regression, separately for each seed level:

$$\text{Discretionary accruals}_{i,t}^{[0\%,2\%,4\%,\ldots,20\%]} = \alpha_0^{[0\%,2\%,4\%,\ldots,20\%]} + \alpha_1^{[0\%,2\%,4\%,\ldots,20\%]}\,\text{Event Dummy}_{i,t} + \eta_{i,t} \tag{5}$$

where Event Dummy = 1 for the single event-firm observation in each subsample. We assess detection power by counting the number of positive α1 coefficients that are significant at the 10% level.[12] The detection rate is the fraction, out of 50,000, of significant coefficients.

[12] We find similar results (not reported) using 5% and 1% significance levels.


3.6 Rationale for simulation choices

In this section, we describe the reasoning for four design choices used in the simulation. Our first design choice concerns the two-layer selection of peer firms, as described in section 3.2. We believe the two-layer selection captures a typical setting faced by a researcher whose objective is to identify discretionary accruals. For example, there is typically a limited number of "event" firm-years, each of which is hypothesized to contain discretionary accruals of some (perhaps unspecified) amount and sign. In all likelihood, the sample and data restrictions in a typical discretionary accruals study are more constraining than suggested by the comprehensive dataset from which we initially select peer firms in Step 2. We use the Step-3 selection to capture the effects of the constrained sample size of a typical discretionary accruals study. The second selection layer ensures a constant number of peer firms and ensures that our results are not driven by using the closest size-based neighbors in the estimation sample.[13]

A second design choice concerns the selection of neighbors for the size-based peer groups. We form relative peer groups (i.e., peers are defined as the event firm's closest-in-size neighbors), not absolute peer groups (i.e., peers are determined using an absolute size cutoff). For example, an absolute size peer group would contain firms from the same asset decile or market capitalization decile as the event firm, where the cutoff values for those deciles are determined using the entire cross-section of firms; the event firm is then matched to firms from its same decile. We believe the absolute peer group approach has at least two disadvantages compared to the relative peer group approach. First, the absolute approach does not ensure symmetry in the selection of peer firms; for example, non-event firms from the same decile will be systematically smaller (larger) when the event firm is a large (small) firm in that decile. Second, the absolute peer group approach requires the full cross-section of firms to determine the initial partitions (e.g., asset deciles or market capitalization deciles). Forming initial deciles/partitions using a subsample of 500 firms yields different results than forming deciles/partitions using a subsample of 5,000 firms, particularly if the 500-firm subsample is not distributed uniformly across the deciles of the 5,000-firm subsample but is, say, biased towards bigger and more profitable firms.

A third design choice concerns the interaction between seeded discretionary accruals and the models of discretionary accruals. We seed discretionary accruals by adding between 2 and 20 percentage points to the event firm's ratio of total accruals to lagged assets. We do not adjust other variables such as sales or total assets. Our seeding, therefore, is most similar to Dechow et al.'s (1995) "expense manipulation" view of discretionary accruals (although our approach is cross-sectional and theirs is time-series). Modeling other financial statement effects requires additional assumptions; as summarized and applied by Dechow et al., these changes affect the values of the independent variables, and their power,

[13] As a sensitivity test, we select the ten closest-in-size peers. Results (not reported) show that this lagged asset peer group performs better than the one we select based on the random sampling in Step 3.


differentially, depending on which independent variables are included.[14] Because we do not compare accruals models (the focus of Dechow et al.), we focus on the simplest (from a modeling perspective) view of discretionary accruals, which does not adjust the values of the independent variables in the accruals models and therefore does not induce varying estimation power across models.

A fourth design choice concerns the level of the analysis. As described in section 3.5, we analyze detection rates at the subsample level (50,000 subsamples, each consisting of 1 event firm and 10 non-event firms) rather than at the more aggregated sample level (100 samples, each consisting of 500 event firms and 5,000 non-event firms). We select the subsample level for two reasons. First, as a practical matter, the discretionary accruals estimation must be performed at the subsample level, so the finer data are readily available. Second, it is hard to observe differences at the sample level because firm-specific idiosyncrasies are averaged out, implying that even at small seeded discretionary accruals levels, detection rates will approach 100% quickly for all models. Stated differently, we believe that performing our analysis at the sample level (where the number of event firms and non-event firms is large) would mislead readers about the generalizability of our findings to the smaller samples typically used in discretionary accruals research. By analyzing subsamples, we believe we more closely approximate the issues faced in discretionary accruals research.

3.7 Results of simulation

Table 3 reports, for each discretionary accruals model (panels of the table) and peer group definition (columns of the table), the fraction of times, as a percentage of the 50,000 subsamples, that the α1 coefficient from equation (5) is significantly positive at the 10% level for increasing levels of seeded discretionary accruals (the seed levels are the rows of each panel). We refer to the rejection rate, at the 10% level, as the discretionary accruals detection rate.

[14] Dechow et al. also model earnings management as "revenue manipulation" and "margin manipulation". In the revenue manipulation scenario, they add the seeded discretionary accruals to total accruals, sales and accounts receivable. For the models given by equations (1), (2) or (4), the change in sales would increase; however, the modified Jones model (equation (3)) is not affected because the increase in sales is offset by the same increase in accounts receivable. In the margin manipulation scenario, they introduce a magnifier, defined as the net profit margin, to translate an X dollar change in total accruals into a (bigger) change in sales and accounts receivable. Again, this affects the models which contain the change in sales, but does not affect the modified model because the bigger change in sales is offset by the same bigger change in accounts receivable.


Results for the 0% seed level correspond to the benchmark case of no seeded discretionary accruals and can be used to gauge the validity of the approach. Specifically, if there are no discretionary accruals and if the data are truly random, 5% of cases should exhibit significantly positive discretionary accruals at the 10% level (and 5% of cases will show significantly negative discretionary accruals). A 95% confidence interval around this theoretical rejection rate ranges from 0.7% to 9.3% for samples of 100 observations. Table 3 shows that the detection rates for the 0% seed level are close to 5% for all peer definitions and models, although there is some evidence that the 0% detection rates are higher for the lagged asset-based estimation sample.[15] These rejection rates are well within the limits of the 95% confidence interval and are not statistically different from 5% or from each other.

Turning to the non-zero seed levels, our interest is in which peer group (column) yields the highest detection rates across accruals models. Figure 1 illustrates differences in detection rates across peer groups and seed levels graphically for the Jones model with intercept, given by equation (2). These results show that for all seed levels, the peer group formed using the cross-section performs the worst; this result is expected since this peer group does not attempt to match non-event firms with event firms on any dimension except the event year. Peer groups based on firm age, sales and market capitalization also perform poorly across all seed levels. At the lowest seed levels (less than 10%), the industry-based peers and lagged asset-based peers have the highest detection rates, and these rates are similar. At seed levels between 10% and 20%, the detection rate of the lagged asset-based peers begins to dominate the others.

We create two measures of the aggregate performance of each peer group across the 10 seed levels for each model. Our first measure of the performance of a given peer group/model/seed level combination is the detection rate achieved by a specified peer group, for each model and seed level, divided by the maximum detection rate achieved by any peer group for that seed level/model combination.

[15] Sensitivity tests reported in Section 6 show that these slightly elevated rejection rates are largely driven by our choice of 10 peer firms. The rejection rates in the 0% seed case do not propagate into higher detection rates for positive seed levels. We find that increasing the number of size-based peer firms (which can be done without sample loss) leads to lower benchmark rejection rates, but not necessarily to lower detection rates for the positive seed levels.


We average this peer group/model/seed level performance measure across the seed levels, for each model and peer group, to yield an "effectiveness score" for each peer group/model combination. The closer the effectiveness score is to 100% (the value that would result if that peer group had the highest detection rate for every seed level), the better at detection is that peer group. This measure captures the distance between a given detection rate and the maximum detection rate. Our second performance measure assigns a rank to each peer group/model/seed level combination; the lowest (highest) rank is assigned to the peer group with the best (worst) detection rate. We average these ranks across the seed levels, for each peer group, to obtain an "effectiveness rank". The closer the effectiveness rank is to 1.0 (10.0), the better (worse) is that peer group at detecting discretionary accruals.

The effectiveness scores and effectiveness ranks for each peer group are reported in the last two rows of each panel of Table 3. Consistent with Figure 1, we find that the cross-section peer group performs the worst, with effectiveness scores between 77.0% and 81.8% and effectiveness ranks of 9.9 to 10.0. Peer groups based on firm age, sales and market capitalization perform better, with effectiveness scores between 84.1% and 90.5% and ranks between 4.7 and 8.3. The industry peer groups have effectiveness scores between 93.6% and 99.0% and ranks of 1.7 to 4.8. Consistent with the better comparability of finer industry definitions, effectiveness scores monotonically increase, and effectiveness ranks monotonically decrease, over the 2-digit, 3-digit and 4-digit industry peer definitions. In general, the lagged assets-based peer group has the highest effectiveness scores (98.7% to 99.6%) and lowest effectiveness ranks (1.6 to 2.5), except that for the PADA models the SIC4 peer group outperforms the lagged asset peer group.

We extend the analyses in Table 3 to seed levels between 20% and 100%, in 10-percentage-point increments. Results of this analysis for the Jones model with intercept, graphed in Figure 2, show that for seeded discretionary accruals exceeding 20%, the lagged asset-based peer group has the best detection power. Effectiveness scores and effectiveness ranks that include the higher seed levels (not reported) confirm that lagged asset-based peers have the highest discretionary accruals detection rates of the peer groups we consider.
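The two aggregate measures reduce to a few lines of arithmetic; the sketch below uses placeholder detection rates for one model (10 peer groups by 10 seed levels) purely to show the calculation:

```python
import numpy as np

# Placeholder detection rates for one accruals model:
# rows = 10 peer groups, columns = 10 non-zero seed levels.
detection = np.random.default_rng(1).uniform(0.05, 0.95, size=(10, 10))

# Effectiveness score: each peer group's detection rate relative to the
# best peer group at that seed level, averaged across seed levels.
eff_score = (detection / detection.max(axis=0)).mean(axis=1)

# Effectiveness rank: 1 = best detection rate at a seed level, 10 = worst,
# averaged across seed levels.
ranks = (-detection).argsort(axis=0).argsort(axis=0) + 1
eff_rank = ranks.mean(axis=1)
```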


We perform two sensitivity checks on these simulation results. First, we include all non-event firms in the event firm's industry in the estimation sample. Under this approach, there are more peer firms in the industry-based peer groups than in the size-based peer groups (where we continue to select 10 observations). Adding more firms to the industry peer groups adds degrees of freedom for estimating the accruals model, which should improve the detection rates for industry peers. Results of this analysis (not tabulated) are mixed: increasing the number of firms in the industry-based estimation sample reduces the detection rate for SIC2, does not affect the detection rate for SIC3, and increases the detection rate for SIC4. The relation between detection rates for these expanded samples of industry-based peers and detection rates for lagged assets-based peers is unaffected; the detection rate for lagged asset-based peers is indistinguishable from the detection rates for industry-based peers at seed levels less than 10%, and is higher for seed levels above 10%.

In a second sensitivity test, we align the size of the size-based peer groups with the size of the SIC4 peer group, for each event firm. This approach maximizes the estimation sample for the industry peers and ensures that size-based peer groups are the same size as the industry-based peer groups. However, if multiple event-firm-years are selected from the same SIC4-year, the industry peer firms selected will be the same for each event firm (with the exception of the event firm itself), whereas the size-based estimation samples might shift considerably. Results (not tabulated) are similar to those reported.

To summarize, our simulation tests indicate that lagged asset-based peers perform at least as well as, and sometimes better than, industry-based peers at detecting induced discretionary accruals. At low levels of seeded discretionary accruals (0%-20%), differences in detection rates between industry-based peers and lagged asset-based peers are small; the differences become larger, and favor lagged asset peers, for higher seed levels. This finding is reasonably robust across the accruals models we examine.

4. Tests on Restatement Firms

In this section we examine the performance of industry-based and size-based estimation samples in detecting unusual levels of absolute discretionary accruals for firm-years with restatements. In contrast

to simulation tests with specified levels of induced (seeded) discretionary accruals, we take the existence of a restatement as evidence that reported earnings contained material but unspecified amounts of discretionary accruals (we refer to both intentional and unintentional misstatements as discretionary accruals).[16] We assess how often the restatement firm's estimated absolute discretionary accruals exceed the estimated absolute discretionary accruals of its peer firms. Under the view that restatement firms have, in fact, managed earnings such that their absolute discretionary accruals are larger than those of their peers, larger detection rates indicate that the peer group is better at detecting discretionary accruals.

Our analyses of discretionary accruals are performed at the iteration level; we perform 200 iterations. For each iteration, we select 100 event-firm-years from the population of Compustat firms with restatements during 1996-2006; an event-firm-year is a firm-year with a restatement announcement in the 11 months following the fiscal-year end. For each event-firm-year, we randomly select 10 non-restating firms from the same year and from each peer group, where the 10 peer groups are as previously defined. We estimate the discretionary accruals models for each sample consisting of one event firm and 10 peer firms, generating residuals for each of the 11 firms. We pool the absolute values of these residuals at the iteration level, generating 100 event-firm absolute residuals and 1,000 peer firm absolute residuals per iteration. We focus on absolute discretionary accruals to avoid directional predictions of the intentional earnings manipulation or the unintentional error that led to the restatement. For each iteration, we compare the mean absolute residual for the 100 event firms with the mean absolute residual of the 1,000 non-event firms, calculating a t-statistic for the difference. After 200 iterations, we have 200 t-statistics for the differences in mean absolute residuals.[17] Because we focus on observed restatements (where we assume accruals have been misstated), we expect the t-statistics to be significantly positive. The detection rate is the frequency of significant t-statistics, as a percentage of the 200 iterations. Our analyses focus on how the choice of peer groups affects this detection rate: holding the event firms constant, better (worse) peer groups will have larger (smaller) detection rates.

[16] Hennes et al. (2008) conclude that about 24% of restatements are due to irregularities and 76% to errors, i.e., unintentional misapplications of authoritative guidance. For our purposes, either would result in poor earnings quality that the models for detecting discretionary accruals should detect.

[17] This test is statistically equivalent to the approach used in the simulation tests, a regression on an event-firm dummy variable.


We assume that restatement firms have managed (or misstated) their accruals, and that the accruals management affects the fiscal year prior to the restatement announcement. Because these assumptions may not hold for all observations, we do not expect to observe 100% detection rates. Table 2 provides information about the accruals of restatement firms in our sample. These data show that the mean (median) amount of total absolute accruals is 10.5% (7%) of assets, with a standard deviation of about 17.3% of assets.

Table 4 reports the detection rates, defined as the fraction of iterations (of 200) where the t-statistic is significant at the 10% level or better. The rows in Table 4 correspond to the accruals models and the columns correspond to the peer groups. In all cases, the lagged asset-based peer group has the highest detection rates, between 14.0% and 39.5%, compared to a range of 0.5% to 22.0% for the other peer groups. These tests show that lagged asset peers dominate all definitions of industry peers; detection rates for industry peers are never more than about 30% of the detection rate of lagged asset peers. These findings from an analysis of restatement firms are consistent with our simulation results. In both cases, we find that estimation samples based on similarity in lagged assets work at least as well as, and sometimes better than, estimation samples based on industry membership in detecting discretionary accruals.
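For concreteness, one iteration of the restatement test can be sketched as a two-sample comparison of mean absolute residuals; the equivalence to the event-dummy regression is noted in footnote [17]. Our sketch assumes a two-sided 10% test on a positive difference:

```python
import numpy as np
from scipy import stats

def iteration_detects(event_abs_da, peer_abs_da, alpha=0.10):
    """One iteration: t-test of the mean absolute DA of the 100 restatement
    firms against the 1,000 pooled peer-firm residuals; 'detected' means
    the event mean is significantly larger at the 10% level."""
    t, p = stats.ttest_ind(event_abs_da, peer_abs_da)
    return t > 0 and p < alpha

# Detection rate: the share of the 200 iterations that detect, e.g.
# np.mean([iteration_detects(e, p) for e, p in iterations]).
```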

5. Peer Group Selection for Analyses of Discretionary Accruals in Non-U.S. Data

This section considers the relative power of industry-based peers versus size-based peers in detecting discretionary accruals in non-U.S. data. Section 5.1 provides evidence on the restrictiveness of industry peer definitions in markets with considerably fewer firms than the U.S. markets. Section 5.2 provides evidence on the discretionary accruals detection power of industry-based peers versus size-based peers using non-U.S. data. We summarize key findings and inferences in section 5.3.

5.1 Restrictiveness of industry peer definitions applied to non-U.S. data

As discussed in the introduction, creating industry-based peers to estimate discretionary accruals models imposes more substantial sample attrition for non-U.S. data than for U.S. data. To see this, we impose the same data

requirements on non-U.S. data that our simulation analysis imposed on U.S. data: we require that each firm-year observation have the data necessary for the accruals models and that there be at least 11 firms per industry group (one event firm plus 10 non-event firms). We perform our analyses by country (that is, we do not combine same-industry observations from two or more countries or same-country observations across industries). We start by requiring a country-year to have at least 11 firm-year observations on Compustat Global for the period 1988-2009. We use Compustat Global because it has more consistent and complete industry identifiers than other non-U.S. databases such as Datastream. Table 5 shows that these requirements result in 217,153 firm-years, representing 69 countries.[18] Table 5 also reports the distribution of firm-year observations by country, after imposing the requirement that each country have the necessary data to determine peer firms using industry definitions based on 2-digit, 3-digit, and 4-digit SIC codes; these data are reported in the columns labeled SIC2, SIC3 and SIC4, respectively. As a benchmark for these non-U.S. data, the first row of Table 5 shows the result of applying these requirements to Compustat North America data for the period 1988-2009. There are 120,791 U.S. firm-year observations with the data needed to estimate the accruals models; applying the industry requirements associated with 2-digit, 3-digit and 4-digit SIC codes reduces this sample by 1%, 11% and 22%, respectively.

Our focus in Table 5 is on how the number of firm-year observations, and their distribution across industries within each country, influences the sample loss imposed by industry definitions of peer groups. In terms of sample size, Japan has the largest fraction of the total, with 47,385 observations, while Bangladesh and Romania have the fewest, 11 observations each; the median country (New Zealand) has 1,086 firm-year observations. For the least restrictive SIC2 definition, two countries have sample loss rates in the single digits (Japan with 5% and China with 9%); the comparable loss rate is 1% for the U.S. The country with the second largest sample size, Great Britain with 20,561 firm-year observations, experiences a 16% reduction in sample size from an SIC2 requirement.

[18] Compustat Global contains firm-year observations for 99 countries. Thirty countries do not meet the minimal restrictions we impose, leaving the 69 countries listed in Table 5.


We calculate three measures of sample loss for the 69 countries in Table 5. The first measure is the number of countries (of 69) that would be eliminated entirely because of insufficient data to estimate the accruals models using industry-based peer groups. The least restrictive SIC2 definition eliminates 29 of 69 countries, increasing to 37 and 40 countries for SIC3 and SIC4, respectively. The second measure of sample loss averages the country-specific losses of firm-year observations across the 69 countries, weighting each country equally. Using this measure, 76% of the sample observations are lost using an SIC2 definition, increasing to 89% and 93% for SIC3 and SIC4 definitions. The third measure of sample loss is a weighted average version of the second measure, where the weights are each country's sample size as a proportion of the total; the weighted average version produces smaller measures of sample loss because it counts the sample loss for Japan (Bangladesh and Romania) more (less) heavily in the overall measure. Weighted average sample losses are 32% (SIC2), 59% (SIC3) and 70% (SIC4).

The evidence in Table 5 indicates that using industry peers (that is, requiring the necessary data to estimate accruals models at the industry level) imposes significant sample attrition on non-U.S. data, and even eliminates substantial numbers of countries. However, using a size-based peer group generates no sample attrition beyond that imposed by the accruals model itself. Basing estimation samples on similarity in lagged assets instead of industry membership, therefore, offers the possibility of much larger samples, including more countries. At the same time, in a country with relatively few firms, the average size-spread within a size-based peer group could be large, which could reduce the power of size-based peers to detect discretionary accruals. Therefore, it is a priori unclear whether the detection power of size-based peers is as good as the detection power of industry-based peers in countries other than the U.S. We examine this issue in the next section.
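Stated compactly in our own notation, with $\text{loss}_c$ the proportional loss of firm-year observations in country $c$ and $n_c$ the country's available firm-years, the second and third measures are

$$\overline{\text{loss}}_{\text{equal}} = \frac{1}{69}\sum_{c=1}^{69}\text{loss}_c, \qquad \overline{\text{loss}}_{\text{weighted}} = \sum_{c=1}^{69}\frac{n_c}{\sum_{k=1}^{69} n_k}\,\text{loss}_c,$$

so the weighted version lets large markets such as Japan dominate, which is why it is lower.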

5.2 Detecting induced discretionary accruals in non-U.S. data

In this section we compare industry-based and size-based peer groups in terms of their ability to detect discretionary accruals in non-U.S. data. Our analysis applies the simulation tests reported in Table 3 to each of the 69 countries with available data, and restricts the simulations to 50 event firms in each of 100 iterations. The most restrictive requirement is that for each randomly selected event firm there be 10 non-event firm observations in a 4-digit SIC code for that country. As shown in Table 5, only 29 of 69 countries meet this requirement; of these 29 countries, nine have too few observations (less than 100 firm-years) for us to perform the bootstrapping required by our simulation. We therefore restrict our analysis to the 20 countries with at least 100 firm-year observations identifiable under SIC4.

For this "restricted sample" of 20 countries, we analyze the performance of industry-based and size-based peer groups at detecting induced discretionary accruals. We tabulate results for the Jones model with intercept and for the entire cross-section, the three industry-based peer groups, and lagged asset-based peers. Results (not tabulated) for the other models and peer groups yield similar inferences. We analyze all seed levels and report results for the aggregate effectiveness score, which averages detection rates (measured relative to the best detection rate for that seed level) across all seed levels.[19] We also report a count of how many times (of the 20 restricted-sample countries) a specified peer group has the highest effectiveness score. The results, reported in Table 6, show that the lagged asset-based peer group has the highest average effectiveness score (96.2%) calculated across the 20 countries and a higher effectiveness score for more individual countries than any industry-based peer group (for example, the lagged asset peer group has the highest effectiveness score in 11 countries versus 5 countries for the SIC3 peer group).

The tests reported in Table 6 hold the sample sizes of each peer group constant at 11 observations (one event firm plus 10 non-event firms). Applying this requirement to industry-based peer groups results in substantial sample loss, especially for SIC4. Given the result that lagged asset-based peers perform better than industry-based peers for the constant-size peer groups, we investigate the performance of lagged asset-based peers when we do not impose the industry requirements. Because basing estimation samples on similarity in lagged assets imposes no incremental sample loss, we can in principle perform this analysis for the 69 countries with at least 11 observations in a given year, listed in Table 5. Because we require that each country have at least 100 firm-year observations (to perform the bootstrapping required by our simulation), we analyze the 58 countries listed in Table 5 with at least 100 firm-year observations (the "maximized sample").

[19] In unreported tests, we ensure that the models are unbiased for the non-U.S. data (i.e., we observe 5% detection rates for the 0% seed level).


Table 7 shows the detection rates for these 58 countries, for a 10% seed level (results for other seed levels are similar and are not tabulated). On average, the lagged asset-based peers detect the seeded discretionary accruals 25.9% of the time. We can compare the detection rates for countries that are in both the maximized and restricted samples; differences in detection rates for these countries relate to the additional data included in the maximized sample. For the 20 countries in both samples, the average detection rate when industry data are required is 24.7% (restricted sample) versus 24.3% when industry data are not required (maximized sample); the difference is not statistically reliable at conventional levels.

5.3 Summary of analyses involving non-U.S. data

We draw two inferences from the evidence in Tables 5-7. The first inference is that estimating discretionary accruals models using non-U.S. data and industry-based peers results in significant sample attrition. As a result, researchers may face the problem of low detection power for discretionary accruals and may find the analysis restricted to the few countries with markets large enough to support the data requirements (Japan, China, Great Britain). Alternatively, the researcher could aggregate data across countries, within industry, to increase the number of firm-year observations per industry. This approach treats all observations for a given industry, across countries, as similar, so it ignores the influence of jurisdiction-specific factors on discretionary accruals (e.g., accounting standards, legal systems, degree of market development). This approach is problematic if the researcher is interested in whether and how jurisdictional factors affect discretionary accruals, because combining observations across countries will obscure the phenomenon being investigated. Finally, researchers could use a country cross-section as the estimation sample. Our results indicate that this "entire cross-section" peer group is less powerful at detecting discretionary accruals than either industry-based peers or size-based peers.

The second inference is that size-based estimation samples perform at least as well as, and often better than, industry-based estimation samples for detecting discretionary accruals in non-U.S. data. This finding means that researchers can estimate accruals models using lagged asset-based peers with no sample loss incremental to that imposed by the requirement that the firm have data to calculate the

variables included in the accruals models. Therefore, entire countries that cannot be analyzed using industry-based peers can be analyzed using lagged asset-based peers. Increasing the number of countries for which researchers can estimate discretionary accruals models should increase the power of research designs examining whether and how jurisdiction-specific factors influence managers' ability and incentives to engage in discretionary accruals, because the larger samples should include both more countries and more diverse countries.

6. Additional Tests

This section reports the results of several investigations of the robustness of our main finding that size-based peer groups based on similarity in lagged assets perform at least as well as industry-based peer groups in detecting discretionary accruals. We investigate over-rejection rates in the 0% seed case, the effects of peer group size, the effects of alternative industry definitions, and the effects of scaling.

6.1 Over-rejection rates in the 0% seed case

Recall that our main test for detecting discretionary accruals is based on rejecting, at the 10% level, the hypothesis that the slope coefficient in regression (5) is zero. In the 0% seed case, we expect the rejection rate for positive (negative) discretionary accruals to be 5% (5%). Our first sensitivity analysis probes the higher-than-expected rejection rates found for lagged asset-based peers at the 0% seed level, documented in Table 3. We wish to determine whether the higher detection rates for positive seed levels observed for lagged asset-based peers are driven by the greater-than-5% rejection rates found for the 0% seed level. The concern is that the over-rejection in the 0% seed case propagates, possibly nonlinearly, into higher detection rates for larger seed levels.

We first examine whether the over-5% rejection rates for the 0% seed level are observed across all lagged asset deciles. The first panel in Table 8 shows the rejection rates for the 0% seed level for each decile of the lagged asset-based peer group; lower (higher) deciles correspond to smaller (bigger) firms. These data show that over-rejection rates are concentrated in smaller firms. If higher rejection rates at 0% seed levels for small firms drive our main findings about the detection power of lagged asset-based peer groups, we should observe the highest detection rates for small firms as we seed positive amounts of

discretionary accruals. The second and third panels of Table 8 show detection rates by lagged asset decile for seed levels of 10% and 20% (results for other seed levels are similar and are not reported). These data show no evidence that the highest detection rates are associated with small firms. In fact, we observe higher detection rates for larger firms, whose results are better specified in the 0% seed case. We conclude from the results in Table 8 that the possibility of bias when estimation samples are based on lagged assets does not drive our finding that size-based peers perform as well as, or better than, industry-based peers in detecting discretionary accruals. Instead, the results point to higher variation in the total accruals of small firms that is not explained by the combinations of discretionary accruals models and estimation sample selection criteria that we consider.
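To make the mechanics of this test concrete, the sketch below implements a stylized version of the detection step: regress estimated discretionary accruals on an event-firm indicator and check for a significantly positive slope. The variable names and the statsmodels dependency are illustrative assumptions, and the paper's regression (5) may differ in its exact specification.

    # Stylized detection test: is the slope on the event indicator positive and
    # significant at the two-sided 10% level? Under a 0% seed, a well-specified
    # design should fire about 5% of the time (the positive tail of the test).
    import numpy as np
    import statsmodels.api as sm

    def detects_positive_da(da, is_event, alpha: float = 0.10) -> bool:
        X = sm.add_constant(np.asarray(is_event, dtype=float))
        fit = sm.OLS(np.asarray(da, dtype=float), X).fit()
        return bool(fit.params[1] > 0 and fit.pvalues[1] < alpha)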

6.2. Size of the Peer Group

Having shown in Table 8 that benchmark (0% seed level) rejection rates vary across lagged asset deciles, we investigate the across-industry variation in these benchmark rejection rates. We test for a systematic association between the number of firms in an industry and detection rates for discretionary accruals. We hypothesize that using the entire industry as the estimation sample in a given year (i.e., maximizing the number of industry peer firms) leads to lower test power, motivating our design choice of standardizing the number of peers in our main tests (at 10 peers per event firm-year).

Our tests repeat the simulations described in Section 3 using a stratified-by-industry design. Specifically, we sample 250 event firms from each of the 54 SIC2 industries, with replacement (a code sketch of this draw appears below). The sample size is substantially reduced relative to the main tests because some industries have few observations. Each event firm is matched with either all peer firms or, consistent with the design in the main tests, 10 randomly chosen peer firms from the same SIC2-year.

Table 9 presents the benchmark (no seeding) rejection rates for each SIC2 industry and several accruals models. We also report the number of firm-years in the industry and the number of peers in an average SIC2-year. The average benchmark rejection rates across the 54 SIC2 groups are close to the expected 5%. When all industry peers are used, rejection rates are slightly lower than 5% across models; when 10 industry peers are used, rejection rates are slightly higher than 5%. There is considerable cross-industry variation in these results.
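The sketch below illustrates the stratified draw just described, assuming a pandas DataFrame with columns 'firm', 'year', and 'sic2'; the names and helper structure are ours, not the paper's code.

    # Hedged sketch of the stratified-by-industry design: 250 event firm-years
    # per SIC2 (with replacement), each matched to all peers or to 10 random
    # peers from the same SIC2-year.
    import pandas as pd

    def draw_event_firms(df: pd.DataFrame, n_events: int = 250, seed: int = 0) -> pd.DataFrame:
        """Sample n_events firm-years per SIC2 industry, with replacement."""
        return (df.groupby('sic2', group_keys=False)
                  .apply(lambda g: g.sample(n=n_events, replace=True, random_state=seed)))

    def industry_peers(df: pd.DataFrame, event: pd.Series, n_peers: int = 10,
                       use_all: bool = False, seed: int = 0) -> pd.DataFrame:
        """All peers, or a random n_peers, from the event firm's SIC2-year (event excluded)."""
        pool = df[(df['sic2'] == event['sic2']) & (df['year'] == event['year'])
                  & (df['firm'] != event['firm'])]
        return pool if use_all else pool.sample(n=min(n_peers, len(pool)), random_state=seed)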

The last rows of Table 9 contain the pairwise correlation coefficients (ρ) between the average number of peers and the rejection rate, as well as the corresponding p-value of a two-sided test that ρ = 0.[20] When all peers in an industry-year are used, the correlations are significantly (at the 0.0002 level or better) negative, implying that industries with fewer observations show a greater incidence of rejection rates that deviate from the sample averages (and from 5%). Moreover, the five industries with the largest (smallest) average number of peers show average rejection rates of 2.4% (6.2%) across the four models. When the test design is standardized to use only 10 random peer firms, none of the correlations is significantly different from zero at conventional levels. The five top (bottom) industries by size show rejection rates of 5.2% (5.3%) on average. In the context of our simulation tests, these results indicate that controlling the number of peer firms has a statistically meaningful effect on rejection rates under the null hypothesis of zero seeded discretionary accruals. These findings also suggest that the number of peer firms may matter more than the choice of the accruals model itself in calibrating the design toward the theoretically correct 5% benchmark rejection rate in the 0% seed case.

Given this finding, and our general finding that a size-based peer group often performs best at detecting discretionary accruals, a natural next question is: what is the optimal number of estimation firms? To maximize the number of SIC4-years in our sample, we require 10 non-event firms (11 including the event firm) to estimate the accruals models. However, because the number of size-based peers is defined by the number of neighbors, it is easy to expand the size-based estimation sample beyond 10. Increasing the size-based estimation sample increases the number of observations used to estimate the accruals models (and so increases the degrees of freedom) but also likely increases heterogeneity among the peer observations (and so increases the variance of the discretionary accruals estimates). Which of these two effects dominates is an empirical question. To provide evidence on this question, we focus on lagged asset-based peers and repeat our main simulation analyses, increasing the number of lagged asset-based peers from 10 firms (base case) to 20, 30, 50, 100, 250, 500 and 1000 firms.

[20] This correlation is equal to the correlation between the average number of peer firms and the deviation of the rejection rate from its theoretical value of 5%. Using the overall number of firm-years in the industry instead of the average number of peer firms yields similar results.
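The equivalence asserted in footnote 20 is simply the location invariance of the Pearson correlation: subtracting the constant 5% from the rejection rates cannot change the coefficient. The toy check below, with made-up numbers, illustrates the point.

    # Toy check: correlating peer-group size with the rejection rate, or with
    # its deviation from the theoretical 5%, yields the same coefficient.
    # The numbers are invented for illustration only.
    import numpy as np
    from scipy import stats

    avg_peers = np.array([14.0, 53.0, 195.0, 232.0, 314.0])
    reject = np.array([0.056, 0.056, 0.012, 0.016, 0.040])

    rho_level, _ = stats.pearsonr(avg_peers, reject)
    rho_dev, _ = stats.pearsonr(avg_peers, reject - 0.05)
    assert np.isclose(rho_level, rho_dev)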


Results of this test are shown in Table 10. We tabulate detection rates for the Jones model with intercept (other models yield similar inferences and are not reported). The first row (0% seed level) provides a measure of the unbiasedness of the model for each peer group size. Starting with 10 peer firms (the number used in our main tests), we find that the rejection rate is monotonically decreasing in peer group size, falling below the theoretical benchmark of 5% for large peer groups (n > 250). Turning to results for seeded positive discretionary accruals, the highest detection rates are generally achieved with peer sizes of n = 10 and n = 20.[21] Further, a benchmark rejection rate exceeding 5% does not necessarily translate into higher rejection rates at higher seed levels. Specifically, the highest benchmark rejection rate is observed for n = 10 (6.1% for the 0% seed level), yet at seed levels above 8% the highest detection rates are no longer observed for n = 10.

[21] In unreported tests, we considered peer sizes between 10 and 20, in increments of one. No consistent pattern emerged.

6.3. Alternative industry definitions

We use SIC codes to define industry membership in order to maintain comparability with the extant research in this area, which primarily uses SIC codes. We examine the sensitivity of our results to three other industry definitions: NAICS, the Global Industry Classification Standard (GICS), and historical SIC codes from the top-ranked segment (in terms of sales revenues) as reported on the Compustat Segments database. Because we retain the requirement that there be at least 11 firms in the narrowest industry definition, the samples necessarily change. In untabulated tests, we find that, with few exceptions, discretionary accruals detection rates for these alternative industry definitions are lower than those reported for the SIC definitions in Table 3. Lagged asset-based peers perform better than peers based on these alternative industry definitions in terms of both effectiveness scores and effectiveness ranks. In the restatement tests, the discretionary accruals detection rates of the three alternative industry definitions are also considerably weaker than the SIC-based detection rates, and detection rates using lagged asset-based peers continue to be higher than detection rates using industry-based peers.


6.4. Scaling

To examine whether using lagged assets as the scaling factor (the denominator) in the accruals models influences our main finding that lagged asset-based peers perform at least as well as industry-based peers in detecting discretionary accruals, we repeat our tests using lagged sales as the scaling factor. If the choice of denominator drives our results, we should find that lagged sales-based peers have approximately the same discretionary accruals detection power as lagged asset-based peers. Results (not reported) show that lagged sales-based peers do not have the highest discretionary accruals detection power when we use lagged sales as the scaling factor. The effectiveness scores for lagged sales-based peers are about 85-93% and are always below the scores for lagged asset-based peers. These results suggest that the scaling factor used in the accruals model is not driving our detection rate results.

These findings also speak to the question of whether the superior performance of lagged asset-based peers is driven by using lagged assets both as a parametric control in the accruals model (i.e., to scale the variables) and as a non-parametric control (i.e., to define the estimation sample). If any numerical variable produced superior detection rates when used both parametrically and non-parametrically, then we would expect to find superior detection rates for lagged sales, or for any other variable used both as a parametric control and as the basis for defining the estimation sample. As noted above, we do not observe this result for lagged sales, nor do we observe it for other variables we examine (tests not reported). This finding suggests that the detection rate results for lagged asset-based estimation samples are not due solely to econometric specification.
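Because this robustness check varies only the deflator, it can be implemented by parameterizing the scaling step. The sketch below builds the standard Jones-model inputs under a chosen deflator; the column names are illustrative assumptions, not the paper's code. Pairing deflator='lagged_sales' with lagged sales-based peers then uses the same variable both parametrically and non-parametrically, mirroring the test described above.

    # Hedged sketch: Jones-model regressors under a chosen deflator
    # (lagged assets in the main tests, lagged sales in this robustness check).
    import pandas as pd

    def jones_inputs(df: pd.DataFrame, deflator: str = 'lagged_assets') -> pd.DataFrame:
        d = df[deflator]
        return pd.DataFrame({
            'ta_scaled': df['total_accruals'] / d,  # dependent variable
            'inv_defl':  1.0 / d,                   # 1/deflator regressor
            'd_rev':     df['delta_revenue'] / d,   # change in revenues
            'ppe':       df['gross_ppe'] / d,       # gross PP&E
        })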

7. Summary and Conclusion

We examine the ability to detect discretionary accruals using several variants of the Jones (1991) model and estimation samples based on either industry membership (industry peers) or size (size peers). Our examination is motivated by the practical problem of sample attrition when estimation samples are based on industry membership, particularly under the SIC4 industry definition, and particularly for non-U.S. data.

Our main finding is that estimation samples based on similarity in lagged assets perform at least as well as, and often better than, industry membership-based estimation samples in detecting both seeded discretionary accruals and observed discretionary accruals (as proxied by the existence of a restatement). We document that the superior discretionary accruals detection power of lagged asset-based peers applies to both U.S. and non-U.S. data. For non-U.S. samples not constrained by the availability of industry peers, lagged asset-based peer detection rates are similar to the detection rates observed for samples where we can perform a controlled comparison of detection rates for industry-based peers and lagged asset-based peers.

Defining estimation samples (peer firms) based on similarity in asset size instead of industry membership has substantial practical value in estimating discretionary accruals models, because size-based peer selection imposes no incremental sample loss beyond the sample losses that result from estimating the variables in the models. For U.S. data, this means avoiding sample attrition of anywhere from 1-3% (SIC2 definitions) to 22-30% (SIC4 definitions). The benefits, in terms of increased sample sizes, are much greater for non-U.S. data, where using lagged assets instead of industry membership to identify estimation samples avoids sample attrition ranging from 32% to 93% (depending on industry definitions and weighting schemes).

We believe our finding concerning the discretionary accruals detection power of lagged asset-based peers is important in both the U.S. and non-U.S. contexts. For both settings, we show that using lagged asset-based peers rather than industry-based peers to estimate discretionary accruals models increases sample sizes and generally results in equal or better detection of discretionary accruals. The overall effect is both more dramatic and more valuable in non-U.S. settings, because entire countries that would be dropped in an industry-based peer design can be included with a size-based peer design.

Table 1
Simulation Sample Selection Using U.S. Data 1950-2009

Selection criteria                                               # Firm-years
Unique firm-years on Compustat North America                     408,245
Firms with more than one year of data                            407,726
With lagged total assets, total assets, sales >= 1, ROA >= -1    299,414
With data for total accruals calculation                         247,841
Non-bank firm-years                                              239,896
With data for identifying ALL peer groups                        203,842
With at least 11 firm-year observations per SIC4                 143,584

                                                    Firms per peergroup-year
Peer group            # Firm-years  # Peergroup-years   Mean    Min   Median   Max
Entire cross-section  143,584       59                  2,434   15    2,442    5,683
SIC2                  143,584       1,972               73      11    37       958
SIC3                  143,584       4,349               33      11    18       824
SIC4                  143,584       5,486               26      11    17       436

Table 1 reports the sample restrictions imposed by requirements to have the necessary data to calculate an accruals observation, to identify peer groups, and to estimate the accruals models. The most restrictive criterion requires 10 non-event firms in the same SIC4 code. The bottom rows of the table report the sizes of the industry peer groups; for example, out of the 143,584 firm-year observations that meet our requirements, there are 1,972 distinct SIC2-year peer groups.

Table 2
Descriptive Statistics on the U.S. Simulation and U.S. Restatement Samples

Panel A: Descriptive statistics on main variables

Variable                                # Obs.    Mean      Std. Dev.  P5       P25      Median   P75      P95
Total Accruals                          143,584   -0.0310   0.3312     -0.2215  -0.0897  -0.0391  0.0126   0.1863
Total Accruals (Restatement obs.)       1,337     -0.0702   0.1895     -0.2491  -0.1070  -0.0572  -0.0139  0.0925
Abs. Total Accruals                     143,584   0.1059    0.3154     0.0063   0.0309   0.0642   0.1188   0.3000
Abs. Total Accruals (Restatement obs.)  1,337     0.1051    0.1726     0.0060   0.0334   0.0705   0.1221   0.2745
Total Assets                            143,584   1,661.95  8,672.43   5.28     26.83    104.77   531.32   6,419.31
Lagged Total Assets                     143,584   1,528.27  8,090.74   4.42     22.68    90.45    466.25   5,870.00
Sales                                   143,584   1,364.37  7,697.82   4.05     25.27    103.60   499.23   5,214.00
Lagged Sales                            143,584   1,264.11  7,269.43   3.05     21.54    90.51    447.74   4,824.60
Market Cap.                             143,584   1,647.14  11,049.66  2.87     18.80    88.03    482.85   5,406.10
Firm Age (in years)                     143,584   15.54     14.39      2.00     6.00     11.00    21.00    46.00

Panel B: Statistics on intra-industry heterogeneity, U.S. simulation sample

                                          Entire cross-section  SIC2    SIC3    SIC4
# Industry-years                          59                    1,972   4,349   5,486
Standard deviations (scaled by absolute value of the mean)
  Total Accruals                          5.74                  5.30    6.96    7.45
  Abs. Total Accruals                     1.85                  1.30    1.21    1.20
  Total Assets                            3.51                  1.93    1.85    1.84
  Sales                                   3.64                  1.89    1.81    1.79
  Market Cap.                             3.90                  2.36    2.17    2.18
  ROA                                     5.02                  5.90    7.02    12.44
  Sales Growth                            2.21                  2.08    2.02    1.99
  Analyst Following                       2.36                  1.96    1.90    1.89
  Firm Age                                0.72                  0.83    0.80    0.79
Average segment descriptives
  # Segments                              1.98                  2.33    2.43    2.49
  # Segments from different SIC codes     1.60                  1.94    2.01    2.03
Jones model diagnostics
  Std. Dev. (Total Accruals)              0.1893                0.1448  0.1386  0.1422
  Std. Dev. (ε)                           0.1622                0.1037  0.0943  0.0964
  R²                                      0.3304                0.4524  0.4920  0.4970

Table 2 presents descriptive information about accruals (signed and unsigned) for firm-years in the U.S. simulation sample and U.S. restatement sample (Panel A), descriptive statistics on size measures for the U.S. simulation sample (Panel A), and standard deviations of variables for several industry groupings (Panel B). Panel B shows the standard deviations of the values of the variables used to create the estimation samples, by industry grouping. The bottom three lines present accruals model diagnostics: the standard deviation of the dependent variable, total accruals; the standard deviation of the residuals, the measure of discretionary accruals; and the explanatory power of the accruals model.

Table 3
Detecting Earnings Management I - Base Simulation Using U.S. Data, 1951-2009

Peer-group columns: Entire cross-section; SIC2; SIC3; SIC4; Total Assets Neighbors (TA); Lagged Total Assets Neighbors (LagTA); Sales Neighbors; Lagged Sales Neighbors (LagSales); Market Capitalization Neighbors (MktCap); Firm Age Neighbors (FirmAge).

Jones Model
Seed        Entire  SIC2    SIC3    SIC4    TA      LagTA   Sales   LagSales  MktCap  FirmAge
0%          5.2%    5.2%    5.3%    5.3%    5.0%    5.8%    5.2%    5.1%      5.2%    5.2%
2%          6.9%    7.8%    8.1%    8.1%    7.4%    8.3%    7.1%    7.1%      7.2%    7.2%
4%          9.3%    11.6%   12.5%   12.4%   11.0%   11.9%   9.9%    10.2%     10.2%   10.4%
6%          12.6%   16.5%   17.7%   17.7%   15.7%   17.0%   14.0%   14.3%     14.4%   14.2%
8%          16.9%   22.0%   23.2%   23.5%   21.1%   22.9%   19.0%   19.2%     19.3%   19.1%
10%         21.9%   27.8%   29.1%   29.3%   27.3%   29.3%   24.4%   24.7%     24.8%   24.1%
12%         27.3%   33.5%   34.8%   35.1%   33.7%   35.5%   29.9%   30.3%     30.7%   29.6%
14%         32.9%   39.2%   40.4%   40.7%   39.7%   41.6%   35.6%   35.9%     36.3%   35.2%
16%         38.6%   44.7%   45.7%   45.9%   45.2%   47.3%   41.0%   41.3%     41.7%   40.7%
18%         43.9%   49.6%   50.7%   50.8%   50.2%   52.7%   45.9%   46.3%     46.6%   45.8%
20%         49.1%   54.0%   55.1%   55.1%   54.7%   57.4%   50.8%   50.8%     51.2%   50.5%
Eff. Score  78.3%   94.0%   98.0%   98.3%   92.6%   98.9%   84.1%   84.9%     85.6%   84.3%
Eff. Rank   10.00   4.50    2.40    1.90    4.50    1.70    8.30    7.30      6.20    8.20

Jones Model (with intercept)
Seed        Entire  SIC2    SIC3    SIC4    TA      LagTA   Sales   LagSales  MktCap  FirmAge
0%          5.2%    5.2%    5.3%    5.3%    5.1%    5.8%    5.2%    5.2%      5.2%    5.2%
2%          7.1%    7.8%    8.1%    8.2%    7.4%    8.5%    7.3%    7.3%      7.2%    7.4%
4%          9.6%    11.7%   12.5%   12.5%   11.0%   12.3%   10.3%   10.5%     10.4%   10.7%
6%          12.8%   16.4%   17.4%   17.4%   15.8%   17.0%   14.1%   14.4%     14.3%   14.4%
8%          16.9%   21.7%   22.6%   22.6%   21.2%   22.5%   18.7%   19.2%     18.9%   18.7%
10%         21.5%   27.3%   28.0%   28.3%   26.9%   28.4%   23.8%   24.2%     24.2%   23.6%
12%         26.4%   32.5%   33.2%   33.7%   32.7%   34.0%   28.8%   29.3%     29.4%   28.6%
14%         31.4%   37.5%   38.3%   38.8%   38.2%   39.8%   33.9%   34.2%     34.5%   33.7%
16%         36.7%   42.4%   43.1%   43.6%   43.4%   44.9%   38.8%   39.0%     39.3%   38.6%
18%         41.5%   46.8%   47.6%   47.9%   48.1%   49.7%   43.2%   43.5%     43.8%   43.1%
20%         46.0%   50.7%   51.6%   51.7%   52.4%   53.9%   47.5%   47.9%     48.1%   47.4%
Eff. Score  79.1%   94.3%   97.5%   98.1%   93.7%   99.5%   84.7%   85.7%     85.6%   85.0%
Eff. Rank   10.00   4.50    2.90    2.00    4.00    1.60    8.10    6.90      6.90    8.10

Modified Jones Model
Seed        Entire  SIC2    SIC3    SIC4    TA      LagTA   Sales   LagSales  MktCap  FirmAge
0%          5.5%    5.3%    5.3%    5.4%    5.1%    6.1%    5.2%    5.3%      5.2%    5.3%
2%          7.1%    7.8%    8.1%    8.3%    7.5%    8.5%    7.2%    7.3%      7.2%    7.4%
4%          9.4%    11.6%   12.3%   12.4%   11.0%   12.2%   10.2%   10.4%     10.3%   10.4%
6%          12.5%   16.1%   17.1%   17.1%   15.6%   16.8%   13.8%   14.1%     14.1%   14.1%
8%          16.4%   21.2%   22.2%   22.3%   21.2%   22.0%   18.3%   18.8%     18.5%   18.4%
10%         20.7%   26.5%   27.5%   27.7%   26.8%   27.8%   23.2%   23.6%     23.5%   23.0%
12%         25.4%   31.7%   32.6%   32.8%   32.3%   33.3%   28.1%   28.5%     28.6%   27.7%
14%         30.4%   36.7%   37.5%   37.8%   37.9%   38.7%   33.0%   33.3%     33.6%   32.7%
16%         35.3%   41.4%   42.1%   42.5%   43.1%   43.9%   37.6%   38.1%     38.4%   37.3%
18%         39.9%   45.7%   46.6%   46.7%   47.7%   48.6%   42.0%   42.5%     42.9%   41.7%
20%         44.4%   49.6%   50.5%   50.6%   51.8%   52.7%   46.2%   46.5%     47.2%   45.8%
Eff. Score  78.2%   94.2%   97.5%   98.2%   94.8%   99.6%   84.3%   85.5%     85.6%   84.4%
Eff. Rank   10.00   4.60    3.10    2.10    3.60    1.60    8.30    6.90      6.70    8.10

Jones Model + ROA
Seed        Entire  SIC2    SIC3    SIC4    TA      LagTA   Sales   LagSales  MktCap  FirmAge
0%          5.5%    5.4%    5.4%    5.5%    5.3%    6.1%    5.4%    5.3%      5.2%    5.5%
2%          7.4%    8.0%    8.3%    8.4%    7.7%    8.8%    7.5%    7.4%      7.3%    7.7%
4%          9.9%    11.8%   12.5%   12.5%   11.0%   12.4%   10.5%   10.3%     10.3%   10.9%
6%          13.0%   16.4%   17.1%   17.1%   15.3%   16.7%   14.1%   14.1%     14.1%   14.4%
8%          17.0%   21.3%   21.8%   21.9%   20.0%   21.6%   18.4%   18.5%     18.4%   18.5%
10%         21.2%   26.2%   26.7%   26.9%   25.1%   26.7%   23.0%   23.1%     23.1%   22.9%
12%         25.7%   30.8%   31.5%   31.6%   30.3%   31.8%   27.5%   27.6%     27.7%   27.3%
14%         30.4%   35.4%   35.9%   36.3%   35.1%   36.7%   31.9%   32.0%     32.2%   31.7%
16%         35.0%   39.6%   40.3%   40.6%   39.5%   41.4%   36.1%   36.3%     36.5%   36.0%
18%         39.1%   43.6%   44.2%   44.5%   43.6%   45.7%   40.0%   40.2%     40.4%   40.1%
20%         43.1%   47.1%   47.8%   48.0%   47.3%   49.4%   43.7%   44.0%     44.2%   43.7%
Eff. Score  81.8%   95.5%   98.1%   98.6%   92.6%   99.5%   85.7%   85.9%     85.8%   86.4%
Eff. Rank   9.90    4.10    2.50    1.70    5.00    1.80    8.20    7.10      7.10    7.60

Performance-adjusted discretionary accruals (based on Jones)
Seed        Entire  SIC2    SIC3    SIC4    TA      LagTA   Sales   LagSales  MktCap  FirmAge
0%          5.2%    5.1%    5.1%    5.3%    5.1%    5.5%    5.2%    5.0%      5.1%    5.0%
2%          6.5%    6.8%    7.2%    7.2%    6.8%    7.3%    6.7%    6.4%      6.7%    6.6%
4%          8.0%    9.2%    9.8%    9.9%    9.1%    9.5%    8.6%    8.4%      8.6%    8.7%
6%          10.0%   12.1%   12.9%   13.0%   11.8%   12.4%   10.9%   10.8%     11.0%   11.1%
8%          12.3%   15.3%   16.1%   16.3%   14.9%   15.7%   13.6%   13.6%     13.7%   13.7%
10%         15.0%   18.6%   19.6%   19.7%   18.3%   19.5%   16.7%   16.8%     16.6%   16.7%
12%         17.8%   22.1%   23.1%   23.1%   22.0%   23.2%   19.8%   20.0%     19.9%   19.7%
14%         20.8%   25.5%   26.5%   26.6%   25.6%   27.0%   23.0%   23.4%     23.3%   22.9%
16%         24.0%   28.9%   30.0%   30.1%   29.3%   30.7%   26.3%   26.7%     26.7%   26.1%
18%         27.2%   32.2%   33.3%   33.4%   32.8%   34.2%   29.4%   29.7%     30.0%   29.2%
20%         30.3%   35.3%   36.3%   36.3%   35.9%   37.6%   32.5%   32.7%     33.2%   32.3%
Eff. Score  79.0%   93.8%   98.4%   98.9%   93.7%   98.7%   85.9%   85.8%     86.5%   85.9%
Eff. Rank   9.90    4.50    2.50    1.70    4.50    1.80    7.80    7.50      6.90    7.90

Performance-adjusted discretionary accruals (based on Modified Jones)
Seed        Entire  SIC2    SIC3    SIC4    TA      LagTA   Sales   LagSales  MktCap  FirmAge
0%          5.2%    5.1%    5.2%    5.2%    5.0%    5.6%    5.2%    5.1%      5.1%    5.2%
2%          6.5%    6.8%    7.1%    7.2%    6.8%    7.3%    6.5%    6.5%      6.6%    6.7%
4%          8.0%    9.1%    9.7%    9.9%    9.0%    9.5%    8.4%    8.3%      8.5%    8.7%
6%          9.9%    11.9%   12.8%   12.8%   11.7%   12.3%   10.6%   10.7%     10.9%   11.0%
8%          12.1%   15.0%   15.9%   16.0%   14.8%   15.6%   13.3%   13.3%     13.5%   13.6%
10%         14.6%   18.3%   19.2%   19.3%   18.3%   19.0%   16.2%   16.4%     16.4%   16.5%
12%         17.3%   21.6%   22.5%   22.6%   21.8%   22.7%   19.4%   19.5%     19.5%   19.4%
14%         20.3%   24.9%   25.9%   26.1%   25.3%   26.4%   22.5%   22.7%     22.7%   22.4%
16%         23.3%   28.2%   29.2%   29.5%   28.9%   30.0%   25.7%   25.9%     26.0%   25.5%
18%         26.4%   31.5%   32.5%   32.6%   32.4%   33.5%   28.8%   29.0%     29.3%   28.6%
20%         29.4%   34.5%   35.5%   35.6%   35.7%   36.8%   31.8%   32.0%     32.5%   31.7%
Eff. Score  78.9%   93.8%   98.3%   99.0%   94.5%   98.8%   85.3%   85.7%     86.5%   86.3%
Eff. Rank   10.00   4.50    2.70    1.70    4.20    1.80    8.40    7.50      6.70    7.40

Healy Model
Seed        Entire  SIC2    SIC3    SIC4    TA      LagTA   Sales   LagSales  MktCap  FirmAge
0%          6.3%    5.9%    5.8%    5.8%    6.0%    5.8%    6.2%    6.1%      6.2%    6.1%
2%          7.6%    7.7%    8.2%    8.1%    7.9%    7.7%    7.9%    7.9%      7.8%    7.8%
4%          9.4%    10.8%   11.9%   11.7%   10.6%   10.8%   10.4%   10.6%     10.2%   10.5%
6%          11.9%   14.9%   16.3%   16.4%   14.6%   15.1%   14.0%   14.4%     13.4%   13.8%
8%          14.9%   19.6%   21.0%   21.2%   19.3%   20.4%   18.4%   19.0%     17.5%   17.8%
10%         18.8%   24.8%   26.2%   26.4%   24.7%   26.3%   23.5%   24.4%     22.5%   22.4%
12%         23.2%   30.2%   31.6%   32.0%   30.1%   32.2%   29.1%   30.1%     27.5%   27.6%
14%         28.2%   35.8%   37.1%   37.5%   35.5%   38.0%   34.5%   35.8%     32.8%   33.1%
16%         33.4%   41.2%   42.6%   42.8%   41.0%   43.6%   40.0%   41.4%     38.2%   38.4%
18%         38.7%   46.4%   47.8%   47.9%   45.9%   48.9%   45.2%   46.5%     43.3%   43.7%
20%         44.1%   51.4%   52.6%   52.6%   50.6%   53.6%   50.0%   51.5%     48.2%   48.7%
Eff. Score  77.0%   93.6%   98.7%   99.0%   93.0%   97.3%   90.5%   93.1%     86.8%   87.8%
Eff. Rank   10.00   4.80    2.40    1.70    5.60    2.50    6.80    4.70      8.70    7.80

Table 3 reports the detection rates for each seed level of induced discretionary accruals (2%-20% of lagged assets) and peer definition, for each of the seven accruals models. The detection rate is the fraction of times, as a percentage of the 50,000 subsamples, that the slope coefficient in equation (5) is significantly positive at the 10% level. The 0% seed level is a specification check, insofar as a well-specified model should show a 5% detection rate of positive discretionary accruals (when none is induced) at a 10% significance level. Effectiveness scores are reported for each peer group and model; the effectiveness score is the average, across seed levels, of the absolute value of the distance between the peer group’s detection rate and the maximum (across all peer groups) detection rate for that seed level. For each seed level (row) and model, we assign a rank of 1 (10) to the peer group with the best (worst) detection rate. We average these ranks across the seed levels, for each peer group, to obtain an “effectiveness rank”. An effectiveness score of 100% and an effectiveness rank of 1.0 indicate that, for that model, the peer group was always the best at detecting discretionary accruals.

Table 4
Analysis of Discretionary Accruals of Restatement Firms, 1996-2006

Model                           Entire  SIC2   SIC3   SIC4   TA Nbrs  LagTA Nbrs  Sales Nbrs  MktCap Nbrs  FirmAge Nbrs
Jones Model                     14.5%   9.5%   6.5%   8.5%   19.0%    39.5%       22.0%       14.0%        17.5%
Jones Model (with intercept)    7.5%    6.5%   4.5%   7.5%   15.0%    26.0%       14.5%       9.0%         5.5%
Modified Jones Model            8.0%    7.0%   3.5%   6.5%   10.5%    25.0%       16.0%       9.0%         8.0%
Jones Model + ROA               5.0%    3.5%   2.0%   4.0%   3.5%     14.0%       4.5%        3.5%         5.0%
PADA (based on Jones)           4.5%    0.5%   2.5%   4.0%   5.5%     16.0%       5.0%        1.5%         2.0%
PADA (based on Modified Jones)  3.5%    1.5%   2.0%   3.0%   3.0%     14.0%       4.5%        2.5%         3.0%
Average                         7.2%    4.8%   3.5%   5.6%   9.4%     22.4%       11.1%       6.6%         6.8%

Table 4 reports the fraction of the time that restatement firms’ absolute discretionary accruals differ significantly (at the 10% level) from the absolute discretionary accruals of non-restating peer firms, for six models of discretionary accruals and two approaches to defining peer estimation samples. We assume that restatement firms reported substantial discretionary accruals, so better (worse) peer groups should have a greater (smaller) frequency of detecting discretionary accruals.

Table 5
Sample Loss Due to Industry Restrictions for U.S. and Non-U.S. Data, 1988-2009

                 Entire          SIC2                          SIC3                          SIC4
                 cross-section   #Ind-yrs  #Firm-yrs  Loss     #Ind-yrs  #Firm-yrs  Loss     #Ind-yrs  #Firm-yrs  Loss
USA              203,842         2,377     198,961    2%       5,015     170,246    16%      5,486     143,584    30%
USA (1988-2009)  120,791         1,130     119,463    1%       2,684     107,207    11%      3,164     94,433     22%
ARE              212             0         0          100%     0         0          100%     0         0          100%
ARG              673             6         70         90%      0         0          100%     0         0          100%
AUS              9,327           229       6,257      33%      134       3,482      63%      126       2,885      69%
AUT              1,184           4         49         96%      0         0          100%     0         0          100%
BEL              1,425           12        141        90%      4         48         97%      0         0          100%
BGD              11              0         0          100%     0         0          100%     0         0          100%
BMU              4,929           125       2,545      48%      51        777        84%      15        195        96%
BRA              3,141           71        1,368      56%      30        594        81%      17        393        87%
CHE              2,924           82        1,317      55%      12        137        95%      5         55         98%
CHL              1,483           24        414        72%      12        188        87%      12        188        87%
CHN              18,909          399       17,258     9%       498       13,147     30%      492       11,294     40%
COL              174             0         0          100%     0         0          100%     0         0          100%
CYM              2,114           36        825        61%      18        382        82%      14        193        91%
CYP              70              0         0          100%     0         0          100%     0         0          100%
CZE              150             0         0          100%     0         0          100%     0         0          100%
DEU              8,869           240       6,346      28%      132       2,815      68%      83        1,525      83%
DNK              1,796           19        244        86%      3         34         98%      0         0          100%
EGY              104             0         0          100%     0         0          100%     0         0          100%
ESP              1,834           15        168        91%      0         0          100%     0         0          100%
EST              94              0         0          100%     0         0          100%     0         0          100%
FIN              1,688           29        485        71%      11        199        88%      3         35         98%
FRA              8,826           223       5,448      38%      66        1,806      80%      47        1,065      88%
GBR              20,561          562       17,219     16%      353       7,959      61%      288       5,449      73%
GRC              1,190           7         85         93%      0         0          100%     0         0          100%
HKG              2,597           38        652        75%      17        250        90%      5         72         97%
HRV              61              0         0          100%     0         0          100%     0         0          100%
HUN              228             0         0          100%     0         0          100%     0         0          100%
IDN              2,638           53        834        68%      2         23         99%      2         23         99%
IND              10,454          212       9,112      13%      220       5,952      43%      210       5,000      52%
IRL              811             0         0          100%     0         0          100%     0         0          100%
ISR              1,044           22        382        63%      12        225        78%      5         57         95%
ITA              2,795           60        896        68%      10        157        94%      0         0          100%
JAM              60              0         0          100%     0         0          100%     0         0          100%
JOR              251             0         0          100%     0         0          100%     0         0          100%
JPN              47,385          791       45,164     5%       1,491     34,833     26%      1,342     26,204     45%
KEN              61              0         0          100%     0         0          100%     0         0          100%
KOR              5,386           157       3,847      29%      82        1,555      71%      68        1,047      81%
KWT              156             0         0          100%     0         0          100%     0         0          100%
LKA              391             6         117        70%      5         78         80%      5         78         80%
LTU              180             0         0          100%     0         0          100%     0         0          100%
LUX              280             0         0          100%     0         0          100%     0         0          100%
LVA              116             0         0          100%     0         0          100%     0         0          100%
MAR              130             0         0          100%     0         0          100%     0         0          100%
MEX              1,277           20        278        78%      0         0          100%     0         0          100%
MYS              9,615           305       7,246      25%      184       3,291      66%      146       2,290      76%
NGA              131             0         0          100%     0         0          100%     0         0          100%
NLD              2,393           17        361        85%      12        225        91%      5         75         97%
NOR              2,133           50        900        58%      25        453        79%      20        253        88%
NZL              1,086           0         0          100%     0         0          100%     0         0          100%
OMN              273             0         0          100%     0         0          100%     0         0          100%
PAK              1,359           34        628        54%      12        211        84%      12        211        84%
PER              486             1         11         98%      0         0          100%     0         0          100%
PHL              1,246           12        182        85%      0         0          100%     0         0          100%
POL              1,506           26        383        75%      5         89         94%      3         44         97%
PRT              617             0         0          100%     0         0          100%     0         0          100%
QAT              66              0         0          100%     0         0          100%     0         0          100%
ROU              11              0         0          100%     0         0          100%     0         0          100%
RUS              822             21        325        60%      16        236        71%      5         114        86%
SAU              164             0         0          100%     0         0          100%     0         0          100%
SGP              5,861           147       3,305      44%      86        1,348      77%      48        640        89%
SVN              175             0         0          100%     0         0          100%     0         0          100%
SWE              3,756           70        1,660      56%      24        713        81%      28        443        88%
THA              4,276           134       2,208      48%      35        439        90%      13        147        97%
TUR              793             3         35         96%      0         0          100%     0         0          100%
TWN              9,521           145       8,458      11%      175       6,451      32%      194       5,140      46%
VEN              69              0         0          100%     0         0          100%     0         0          100%
VNM              67              0         0          100%     0         0          100%     0         0          100%
ZAF              2,726           53        894        67%      15        277        90%      5         63         98%
ZWE              12              0         0          100%     0         0          100%     0         0          100%
Total            217,153         4,460     148,117             3,752     88,374              3,218     65,178

Table 5 reports the sample restrictions imposed on non-U.S. data by requirements to have the necessary accruals data to estimate the accruals models cross-sectionally for peer groups that have at least 11 firms (1 event and 10 non-events) per industry definition. The population consists of all firm-years reported on Compustat Global over 1988-2009. The column labeled “entire cross-section” shows the number of firm-years with data on accruals observations (n = 217,153). Of this number, 148,117 firm-year observations (88,374; 65,178) also have the necessary SIC2 data (SIC3; SIC4). As a benchmark, the first two rows of the table show how the same restrictions affect U.S. data over the full period for which data are available (1950-2009, top row) and for the same period for which we have non-U.S. data (1988-2009, second row).

Table 6
Discretionary Accruals Detection - Effectiveness Scores by Country

Country            Entire cross-section  SIC2    SIC3    SIC4     Lagged Total Assets Neighbors
AUS                85.8%                 93.9%   92.3%   93.8%    99.4%
BMU                95.4%                 97.5%   95.5%   97.4%    97.3%
BRA                93.7%                 99.3%   98.8%   99.3%    94.4%
CHL                98.0%                 97.3%   98.6%   97.8%    97.6%
CHN                81.0%                 85.0%   87.3%   87.0%    100.0%
CYM                93.8%                 90.2%   94.4%   87.6%    99.6%
DEU                86.1%                 93.4%   97.8%   100.0%   96.6%
FRA                88.6%                 87.2%   85.6%   85.9%    99.9%
GBR                82.5%                 96.6%   95.0%   96.1%    98.0%
IND                86.2%                 91.4%   92.6%   96.0%    99.8%
JPN                89.8%                 94.8%   96.3%   96.0%    99.9%
KOR                84.3%                 94.2%   96.7%   100.0%   91.9%
MYS                89.2%                 96.0%   97.8%   96.8%    96.3%
NOR                81.4%                 93.5%   99.9%   98.6%    82.5%
PAK                99.4%                 95.1%   96.6%   98.0%    96.6%
RUS                99.3%                 96.5%   97.9%   96.7%    95.8%
SGP                82.5%                 97.3%   98.8%   99.0%    91.1%
SWE                93.6%                 92.1%   88.4%   100.0%   94.5%
THA                93.1%                 99.6%   99.6%   99.4%    92.4%
TWN                86.0%                 88.9%   88.8%   91.8%    100.0%
Average            89.5%                 94.0%   94.9%   95.9%    96.2%
Count if highest   0                     1       5       3        11
Count if lowest    14                    3       1       0        1

Table 6 reports the effectiveness scores for selected peer group definitions, for the 20 countries listed in Table 5 with at least 100 firm-year observations under the SIC4 definition. We tabulate results for the Jones model with intercept; other accruals models produce similar inferences and are not shown. The effectiveness score is the average, across seed levels, of the absolute value of the distance between the peer group’s detection rate and the maximum (across all peer groups) detection rate for that seed level. An effectiveness score of 100% indicates that the peer group is always the best at detecting discretionary accruals. The last rows of the table report the average effectiveness score for each peer group, calculated across the 20 countries, and the number of countries (of 20) for which the specified peer group has the highest (lowest) effectiveness score. The SIC3 peer group has the highest score in 5 countries, and the lagged total assets peer group has the highest score in 11 countries.

Table 7
Lagged Asset Peer Performance for Restricted and Maximized Samples

Country  Restricted Sample  Maximized Sample     Country  Restricted Sample  Maximized Sample
ARE      n/a                19.5%                KWT      n/a                15.3%
ARG      n/a                28.9%                LKA      n/a                25.4%
AUS      15.7%              21.0%                LTU      n/a                20.2%
AUT      n/a                26.8%                LUX      n/a                23.5%
BEL      n/a                27.2%                LVA      n/a                18.0%
BMU      17.2%              18.8%                MAR      n/a                16.8%
BRA      39.7%              25.3%                MEX      n/a                30.5%
CHE      n/a                37.7%                MYS      21.9%              22.0%
CHL      41.8%              30.7%                NGA      n/a                11.0%
CHN      20.9%              19.1%                NLD      n/a                28.1%
COL      n/a                48.6%                NOR      23.6%              24.5%
CYM      16.7%              17.6%                NZL      n/a                25.4%
CZE      n/a                20.6%                OMN      n/a                20.3%
DEU      15.8%              22.3%                PAK      27.0%              18.5%
DNK      n/a                28.0%                PER      n/a                22.6%
EGY      n/a                15.1%                PHL      n/a                22.3%
ESP      n/a                26.5%                POL      n/a                18.1%
FIN      n/a                29.7%                PRT      n/a                21.6%
FRA      17.4%              28.5%                RUS      19.3%              21.7%
GBR      19.1%              25.1%                SAU      n/a                19.7%
GRC      n/a                20.7%                SGP      23.4%              19.8%
HKG      n/a                21.3%                SVN      n/a                41.4%
HUN      n/a                24.4%                SWE      16.2%              26.9%
IDN      n/a                19.4%                THA      40.6%              21.9%
IND      19.9%              18.9%                TUR      n/a                17.0%
IRL      n/a                27.1%                TWN      25.7%              25.2%
ISR      n/a                23.7%                ZAF      n/a                23.5%
ITA      n/a                30.9%                Average  24.7%              25.9%
JOR      n/a                18.3%
JPN      49.9%              50.5%
KOR      22.7%              28.2%

Table 7 examines the performance of the lagged asset peer group when non-U.S. samples are constrained by industry definitions (the “restricted sample”) and when samples are not so constrained (the “maximized sample”). The 20 countries in the restricted sample are identical to those shown in Table 6. The 58 countries in the maximized sample include those 20, as well as the 38 other countries in Table 5 with at least 100 firm-year observations in cross-section. The 100 firm-year observation requirement is imposed to facilitate the simulation. Table 7 shows detection rates for a 10% seed level using the Jones model with intercept; other seed levels and models produce similar inferences and are not reported.

Table 8
Rejection Rates by Size Decile

Seed Level = 0%
Size Decile  Jones   Jones no intercept  Mod Jones  Jones + ROA  Healy
1            6.5%    6.3%                6.8%       6.7%         6.9%
2            6.5%    6.8%                6.9%       6.7%         6.5%
3            6.1%    6.1%                6.5%       6.4%         6.7%
4            5.6%    5.7%                6.1%       6.3%         6.3%
5            6.0%    6.1%                6.5%       6.4%         6.1%
6            5.2%    5.6%                5.4%       5.5%         5.4%
7            6.1%    6.2%                6.3%       6.1%         6.1%
8            5.4%    5.4%                5.3%       6.0%         4.8%
9            5.5%    5.3%                5.8%       5.7%         5.0%
10           5.2%    4.8%                5.4%       5.4%         4.3%

Seed Level = 10%
Size Decile  Jones   Jones no intercept  Mod Jones  Jones + ROA  Healy
1            12.9%   12.9%               12.9%      13.2%        11.3%
2            16.3%   16.8%               15.8%      16.2%        13.9%
3            18.4%   17.9%               17.6%      18.0%        15.3%
4            20.6%   20.4%               19.7%      20.5%        16.9%
5            23.5%   24.1%               22.6%      21.9%        20.0%
6            26.4%   27.0%               25.0%      24.9%        22.3%
7            30.4%   31.8%               30.0%      28.8%        27.5%
8            34.7%   36.2%               33.7%      32.0%        31.6%
9            45.3%   48.0%               45.0%      42.2%        44.2%
10           55.1%   57.3%               55.4%      49.2%        59.6%

Seed Level = 20%
Size Decile  Jones   Jones no intercept  Mod Jones  Jones + ROA  Healy
1            23.5%   25.4%               22.1%      23.2%        20.3%
2            33.2%   34.5%               31.5%      31.2%        27.6%
3            37.6%   40.6%               35.7%      35.9%        33.4%
4            44.3%   47.3%               42.6%      41.1%        40.5%
5            51.3%   54.2%               49.1%      46.5%        47.1%
6            56.5%   61.2%               55.3%      51.0%        55.4%
7            62.0%   66.3%               60.6%      56.7%        63.1%
8            68.5%   73.8%               67.7%      62.1%        72.0%
9            78.1%   82.6%               77.5%      70.6%        83.6%
10           84.4%   88.1%               84.6%      75.7%        92.6%

Table 8 examines whether the over-rejection in the 0% seed case drives the high detection rates of the size-based peers at higher seed levels. We report rejection rates for seed levels of 0%, 10% and 20% for each size decile based on lagged assets. Size deciles are in ascending order, with the smallest (largest) firms in decile 1 (10).

Table 9
Benchmark Rejection Rates by Industry (No Seeding)

                                        Using All Peers                        Using 10 (Random) Peers
SIC2  # Firm-years  Avg. # Peers        Jones  No int.  Mod J.  + ROA          Jones  No int.  Mod J.  + ROA
1     442           14                  5.6%   3.2%     5.2%    5.2%           6.8%   4.8%     6.4%    5.2%
10    2,468         53                  5.6%   5.2%     6.0%    4.8%           7.6%   7.2%     7.2%    5.2%
12    91            14                  6.8%   9.2%     6.8%    6.8%           5.2%   4.8%     5.2%    4.0%
13    9,428         195                 1.2%   0.8%     1.2%    0.8%           6.0%   5.2%     4.8%    6.8%
14    521           13                  6.8%   5.2%     8.0%    8.0%           6.0%   7.2%     6.4%    7.2%
15    222           15                  4.8%   4.4%     5.6%    4.0%           4.4%   3.6%     4.0%    3.6%
16    359           15                  4.0%   4.4%     4.4%    3.2%           5.2%   5.6%     6.0%    6.4%
17    214           12                  4.4%   4.8%     4.8%    6.0%           4.0%   6.8%     4.4%    4.4%
20    3,170         65                  5.6%   6.0%     5.2%    4.8%           6.0%   6.0%     6.0%    5.2%
22    979           29                  4.4%   4.4%     4.4%    4.4%           4.4%   4.4%     4.8%    5.2%
23    2,158         45                  7.6%   8.0%     8.0%    8.4%           4.0%   4.0%     4.0%    5.6%
24    662           21                  4.8%   4.4%     4.8%    4.8%           4.4%   3.2%     4.4%    4.4%
25    473           14                  5.6%   6.4%     4.8%    7.2%           4.4%   6.4%     4.8%    4.8%
26    2,026         44                  4.0%   5.2%     4.4%    4.4%           6.8%   5.2%     7.2%    8.0%
27    2,367         50                  6.8%   7.2%     6.4%    7.2%           2.8%   3.6%     2.4%    3.2%
28    11,189        232                 1.6%   2.0%     1.6%    2.8%           6.4%   6.4%     6.0%    6.8%
29    1,833         30                  2.8%   1.2%     2.0%    3.2%           5.6%   4.0%     4.8%    4.0%
30    1,935         43                  3.2%   4.0%     2.8%    4.0%           5.2%   6.0%     4.8%    6.0%
31    632           14                  6.0%   5.2%     4.4%    5.2%           4.8%   6.4%     4.4%    4.4%
32    872           19                  5.2%   4.4%     4.0%    3.2%           6.4%   6.4%     7.2%    4.0%
33    2,279         46                  2.8%   3.6%     2.8%    3.6%           5.6%   5.6%     6.4%    5.2%
34    2,173         48                  2.8%   2.4%     3.2%    3.6%           5.2%   4.4%     5.6%    4.0%
35    8,747         198                 1.6%   1.6%     1.6%    1.2%           5.6%   6.4%     6.8%    7.2%
36    13,866        314                 4.0%   3.2%     3.2%    3.2%           4.8%   4.4%     4.0%    4.4%
37    3,319         65                  4.0%   3.6%     4.0%    3.6%           9.2%   7.6%     7.2%    7.2%
38    10,077        228                 4.4%   4.8%     4.0%    4.4%           6.0%   8.0%     5.2%    4.8%
39    1,405         34                  2.4%   2.4%     2.8%    2.8%           4.0%   3.6%     5.6%    4.0%
40    660           15                  5.6%   6.0%     4.8%    4.4%           5.2%   4.4%     4.8%    5.2%
42    1,455         32                  3.6%   3.6%     4.0%    3.2%           4.0%   4.4%     3.6%    2.8%
44    702           28                  4.0%   2.0%     3.2%    4.8%           7.6%   10.0%    8.0%    6.0%
45    1,328         28                  5.2%   4.0%     5.2%    5.2%           5.2%   4.4%     6.4%    5.6%
47    259           13                  6.8%   4.4%     5.6%    7.2%           5.2%   4.0%     6.0%    6.8%
48    6,037         136                 1.6%   1.6%     2.0%    1.6%           5.2%   5.2%     4.8%    3.6%
49    8,651         239                 2.4%   2.8%     3.2%    1.6%           2.8%   4.0%     2.4%    4.4%
50    2,560         66                  6.0%   5.2%     7.2%    7.2%           5.6%   4.8%     4.8%    5.2%
51    1,762         43                  6.0%   6.0%     5.2%    6.0%           4.4%   4.8%     4.8%    6.4%
52    274           13                  7.6%   6.0%     7.6%    6.0%           5.2%   5.2%     5.2%    6.0%
53    1,959         39                  4.8%   3.6%     4.8%    4.8%           7.2%   6.4%     7.2%    6.4%
54    2,042         41                  2.4%   3.2%     2.8%    3.2%           6.8%   6.8%     6.8%    6.4%
55    273           19                  8.0%   8.0%     8.0%    8.0%           10.4%  8.4%     10.4%   10.8%
56    1,457         37                  5.6%   4.8%     5.6%    4.8%           7.2%   4.8%     7.2%    6.8%
57    359           19                  5.6%   6.8%     6.8%    6.8%           8.0%   6.8%     8.0%    7.6%
58    3,296         74                  4.0%   3.6%     4.4%    4.4%           4.8%   3.2%     4.4%    3.6%
59    2,515         57                  4.8%   4.8%     4.8%    4.0%           4.8%   5.2%     4.8%    5.6%
70    911           21                  6.4%   7.6%     7.2%    7.2%           2.4%   1.6%     3.6%    2.8%
72    569           14                  4.0%   4.8%     3.6%    2.0%           2.4%   3.6%     2.4%    5.6%
73    14,787        369                 2.0%   2.0%     1.6%    2.4%           4.0%   4.8%     4.8%    4.0%
78    293           15                  3.6%   3.6%     1.6%    3.2%           3.6%   4.4%     2.0%    2.8%
79    1,422         35                  4.0%   4.8%     5.6%    5.2%           2.0%   2.0%     2.0%    2.0%
80    1,877         64                  4.0%   3.6%     4.4%    4.8%           5.2%   4.4%     6.4%    3.6%
82    564           20                  5.6%   6.0%     4.4%    5.2%           3.6%   4.8%     4.8%    3.6%
83    82            13                  6.8%   5.6%     6.4%    6.4%           4.4%   4.0%     4.0%    4.4%
87    2,256         58                  2.8%   3.6%     2.8%    2.0%           5.2%   5.6%     5.2%    5.2%
99    1,327         36                  3.2%   3.2%     3.2%    3.2%           7.2%   7.6%     6.4%    8.4%
Avg.  2,659         63                  4.5%   4.4%     4.5%    4.5%           5.3%   5.2%     5.3%    5.2%

ρ (Peergroup size, rejection rate)      -0.52  -0.46    -0.49   -0.49          -0.05  0.05     -0.12   -0.03
p-value                                 0.0000 0.0002   0.0001  0.0001         0.3496 0.3675   0.2001  0.4143

Table 9 shows the rejection rates for the 0% seed case (where we expect a 5% rejection rate) for each 2-digit SIC code (SIC2), using the pooled sample, for several accruals models. The correlation between the ranked size of the industry population and the absolute deviation of the specified rejection rate from 5% (last rows) is negative when all peers are used, indicating that smaller industries have rejection rates further from the expected 5% value.

Table 10
Optimizing the Number of Peer Firms in the Estimation Sample

Jones Model (with intercept)
Magnitude of Accruals Management     Number of Peers in the Estimation Sample
(in % of lagged total assets)        10      20      30      50      100     250     500     1000
0%                                   6.1%    5.1%    4.8%    4.7%    4.1%    3.7%    3.5%    3.1%
2%                                   8.6%    7.2%    6.7%    6.3%    5.5%    4.8%    4.6%    4.0%
4%                                   12.1%   10.6%   9.8%    9.0%    8.0%    6.7%    6.2%    5.2%
6%                                   16.9%   15.7%   14.6%   13.4%   12.1%   10.3%   9.1%    7.5%
8%                                   22.5%   22.1%   21.0%   19.4%   17.6%   15.5%   13.9%   10.9%
10%                                  28.4%   29.0%   27.7%   26.4%   24.2%   22.0%   20.0%   16.6%
12%                                  34.0%   36.0%   34.7%   33.6%   31.2%   29.1%   27.1%   23.2%
14%                                  39.7%   42.7%   41.8%   40.7%   38.4%   36.2%   34.5%   30.7%
16%                                  44.9%   48.9%   48.5%   47.3%   45.2%   43.0%   41.5%   37.8%
18%                                  49.6%   54.5%   54.4%   53.3%   51.6%   49.4%   48.1%   44.7%
20%                                  54.2%   59.5%   59.6%   59.0%   57.1%   55.2%   53.8%   50.5%

Table 10 examines how the size of the lagged asset peer group affects discretionary accruals detection rates. Results are shown for the Jones model with intercept; other models produce similar results and are not tabled. The 0% seed level is a specification check; a well-specified model should show a 5% detection rate of positive discretionary accruals (when none is induced) at a 10% significance level.

Figure 1 graphs the detection rates for each peer definition and seed level (between 0% and 20%) for the Jones model with intercept; other models yield similar inferences and are not shown.

[Figure omitted in this text version: line graph of detection rates (vertical axis, 0% to 100%) against seed level (horizontal axis), with one line per peer definition: Entire cross-section, SIC2, SIC3, SIC4, Total Assets Neighbors, Lagged Total Assets Neighbors, Sales Neighbors, MktCap Neighbors, Firm Age Neighbors, and ROA Neighbors.]

Figure 2 graphs the detection rates for each peer definition and seed level (between 0% and 100%) for the Jones model with intercept; other models yield similar inferences and are not shown.

References

Bernard, V., and D. Skinner. 1996. What motivates managers’ choice of discretionary accruals? Journal of Accounting and Economics 22: 313-325.

Brickley, J., and J. Zimmerman. 2010. Corporate governance myths: Comments on Armstrong, Guay and Weber. Journal of Accounting and Economics 50: 235-245.

Dechow, P., R. Sloan, and A. Sweeney. 1995. Detecting earnings management. The Accounting Review 70 (2): 193-225.

DeFond, M., and J. Jiambalvo. 1994. Debt covenant violation and manipulation of accruals. Journal of Accounting and Economics 17: 145-176.

Dopuch, N., R. Mashruwala, C. Seethamraju, and T. Zach. 2010. The impact of a heterogeneous accrual-generating process on empirical accrual models. Working paper, Washington University.

Healy, P. 1985. The effect of bonus schemes on accounting decisions. Journal of Accounting and Economics 7: 85-107.

Hennes, K., A. Leone, and B. Miller. 2008. The importance of distinguishing errors from irregularities in restatement research: The case of restatements and CEO/CFO turnover. The Accounting Review 83: 1487-1519.

Jones, J. 1991. Earnings management during import relief investigations. Journal of Accounting Research 29: 193-228.

Kothari, S.P., A. Leone, and C. Wasley. 2005. Performance matched discretionary accrual measures. Journal of Accounting and Economics 39: 163-197.