Statistical Estimation of Average Power Dissipation Using Nonparametric Techniques

Li-Pen Yuan, Chin-Chi Teng, and Sung-Mo Kang

Abstract—In this paper,¹ we present a new statistical technique for estimating the average power dissipation of digital circuits. The existing parametric statistical technique estimates the average power under the assumption that the power distribution can be characterized by a preassumed function; large errors can result when this assumption is not met. The existing nonparametric technique, on the other hand, although accurate, is too conservative and requires a large sample size to achieve convergence. For a good tradeoff between simulation accuracy and computational efficiency, we propose a new nonparametric technique based on the properties of the order statistics. It is applicable to any type of circuit, irrespective of its power distribution function. Compared to the existing nonparametric technique, it is much more computationally efficient, since it requires a much smaller sample size to achieve the same accuracy specification. The new technique is implemented in the distribution-independent power estimation tool (DIPE). DIPE is empirically demonstrated to be more robust and accurate than the parametric technique.

Index Terms—Power estimation, reliability, statistical techniques.

I. INTRODUCTION

For state-of-the-art VLSI technology, power analysis poses a great challenge to both circuit designers and design automation engineers.

Manuscript received September 8, 1996; revised May 15, 1997. This work was supported in part by the Joint Services Electronics Program (N00014-95-J-1270) and the Semiconductor Research Corporation (SRC95DP109). L.-P. Yuan and S.-M. Kang are with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA. C.-C. Teng is with Avant! Corporation, Fremont, CA 94538 USA. Publisher Item Identifier S 1063-8210(98)01115-9.

¹See the Guest Editorial of the Special Issue on Low Power Electronics and Design of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, vol. 5, pp. 349–351, Dec. 1997.

Fig. 1. Structure of a statistical power estimation tool.

For designers, evaluation of battery life in portable equipment and assessment of several reliability problems rely on accurate power analysis. For design automation engineers, accurate and fast power analysis is essential to the development of efficient CAD tools for power optimization. Thus, power estimation has become the focus of research efforts in recent years. The current scope of power analysis covers the architectural, register transfer, gate, and transistor levels. For gate-level and transistor-level power estimation, statistical techniques are attractive because of their efficiency, accuracy, and simplicity.

A typical statistical power estimation tool consists of three major components: an input pattern generator, a simulation engine, and a simulation-stopping criterion. Depending on the environment, the input patterns for an embedded circuit may have temporal and/or spatial correlations. The input pattern generator is responsible for capturing such statistical characteristics and generating input patterns accordingly. The simulation engine takes the input patterns and simulates the circuit response, including the power dissipation. In this phase, signal correlations among internal nodes can be inherently taken into consideration. As the number of power data points increases, the sample average power approaches the real average power. The stopping criterion turns off the simulation engine when the desired accuracy is achieved and reports the results. The framework of a statistical power estimation tool is sketched in Fig. 1.

Implementation of the framework in Fig. 1 is very flexible. Although the three components jointly determine the simulation accuracy and computational efficiency of an estimation tool, they can be implemented separately, depending on the availability of resources and the accuracy requirement. The major objective of the simulation engine is to generate the power dissipation data of the target circuit. Depending on the desired tradeoff between accuracy and computation time, one can choose an electrical simulator (such as SPICE [1]), a fast timing simulator (such as ILLIADS [2] or PowerMill [3]), or a less accurate gate-level simulator. Likewise, the input pattern generator can be implemented either by properly modeling the statistics of the input streams or by simply using a random input generator. In other words, the implementation details of one component affect only the performance of the tool, not the validity of another component. In this paper, we focus on constructing a generally applicable stopping criterion, which is the key component in a statistical power estimation framework.

The stopping criterion plays a crucial role in statistical power estimation because it determines both the simulation time and the accuracy. According to the law of large numbers, as the sample size increases, the sample average power approaches the real average power asymptotically, but at the cost of increased simulation time.

Thus, a good stopping criterion should strike a balance between simulation time and accuracy. For accuracy, it should stop the simulation when the sample size becomes sufficiently large that the sample average power satisfies the accuracy specification. For efficiency, it should keep the sample size as small as possible to reduce the simulation time.

In the literature, two stopping criteria have been proposed [4], [5]. By assuming that the average power dissipation of a circuit over a time interval $(-T/2, +T/2]$ has a normal distribution, Burch et al. [4] proposed a stopping criterion (hereafter denoted McPower) based on the central limit theorem [12]. McPower is a parametric technique, since the stopping criterion is designed by employing the properties of a particular distribution function. The drawback of McPower is that it is error-prone when the average power does not follow a normal distribution. Since the causes for circuits to exhibit nonnormal power distributions are not clearly understood, the use of McPower is inevitably limited. In order to eliminate the dependency of the validity of a stopping criterion on the distribution function, Yuan et al. proposed a nonparametric stopping criterion (hereafter denoted K–S) [5]. It is based on the Kolmogorov–Smirnov (K–S) theorem [6], which describes the distribution of a random variable $D_n$ defined as the maximum difference between the real and sample distribution functions. This technique is nonparametric because the distribution described by the K–S theorem is independent of the (power) distribution function. Although this criterion can be applied to any circuit regardless of its power distribution, it is too conservative and requires a large sample size for the average power estimates to converge.

For robust and efficient average power estimation, a stopping criterion must be valid for any distribution, with efficiency superior to that of the existing nonparametric technique. This paper presents one such criterion. The rest of the paper is organized as follows. In Section II, we formulate the average power dissipation problem as a mean estimation problem so that statistical techniques can be applied. In Section III, we first use properties of the distribution-independent order statistics to estimate a confidence band of the cumulative distribution function (cdf); we then develop upper and lower bounds of the average power using the confidence band and, based on these distribution-independent bounds, design a stopping criterion that terminates the random simulation when the bounds satisfy the user-specified accuracy and confidence level. We implemented the proposed technique and tested it on a set of benchmark circuits. The results are reported in Section IV, with discussion and comparisons with the previous approaches [4], [5], followed by concluding remarks and proposed future work in Section V.
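To make the contrast with the parametric approach concrete, the following is a minimal sketch of a McPower-style stopping rule based on the central limit theorem. It is only an illustration of the idea: the actual procedure of [4] operates on averaged power samples and differs in detail, and the function name and constants below are our own choices.

```python
import math
from statistics import mean, stdev

def normal_approx_converged(sample, eps=0.05, z=2.576):
    """Parametric (central-limit-theorem style) stopping check.

    Stop when the half-width of the normal confidence interval of the mean,
    z * s / sqrt(n), falls within a fraction eps of the sample mean.
    z = 2.576 corresponds to 99% confidence under the normality assumption.
    """
    n = len(sample)
    if n < 30:                      # rule-of-thumb minimum for the CLT to apply
        return False
    s = stdev(sample)
    return z * s / math.sqrt(n) <= eps * mean(sample)
```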

II. PROBLEM FORMULATION

For a digital circuit, an input pattern is a binary vector received by its primary inputs. Depending on the environment in which the circuit is embedded, input patterns may exhibit various statistical characteristics. Because of its random nature, an input pattern can be treated as a random variable $V$. As discussed in Section I, samples of $V$ can be generated by an input pattern generator that takes into account spatial and/or temporal correlations extracted from the input streams, as suggested in [7] and [8].

Power dissipation in CMOS circuits can be attributed to three sources: charging/discharging current through load capacitances, direct-path short-circuit current, and leakage current. Except for very low voltage technologies, the contribution of leakage current is usually negligible [9]. Thus, logic state transitions account for the major power consumption in a gate. For a circuit with $N_g$ gates, the amount of power it dissipates during one clock cycle can be expressed as

$$P = \frac{1}{T} \sum_{i=1}^{N_g} E_i(V_1, V_2) \qquad (1)$$

where $V_1$ and $V_2$ are the input patterns received by the circuit in two consecutive clock cycles, $E_i$ is the energy dissipated at gate $i$, and $T$ is the clock cycle time. Since $P$ is a function of the random variables $E_i$, $i = 1, \ldots, N_g$, it is also a random variable and possesses a distribution function. Thus, the average power of the circuit can be interpreted as the expected value of $P$.

It should be noted that the energy dissipated at node $i$ is a discrete-valued function of $V_1$ and $V_2$; according to (1), so is $P$. For practical circuits, however, two adjacent observable values in the large sample space of $P$ are close to each other. Therefore, we can assume that $P$ has a continuous distribution function [10]. However, as will be shown in the experimental results, the applicability of the proposed technique is not limited to large circuits.
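As an illustration of (1) and of viewing the per-cycle power $P$ as a random variable, the sketch below draws random input-pattern pairs and records one power value per simulated cycle. Everything circuit-specific here (the gate count, the capacitances, and the crude activity model that stands in for a real simulation engine) is an assumption made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

Ng = 100                                   # number of gates (assumed)
Vdd = 5.0                                  # supply voltage [V]
T = 50e-9                                  # clock period [s] (20 MHz)
C = rng.uniform(5e-15, 50e-15, Ng)         # per-gate load capacitances [F] (assumed)

def cycle_power(v1, v2):
    """Per-cycle power P = (1/T) * sum_i E_i(V1, V2), cf. (1).

    E_i is approximated as 0.5*C_i*Vdd^2 when gate i toggles; a real tool
    would obtain E_i from gate-level or circuit simulation instead."""
    toggled = rng.random(Ng) < np.mean(v1 != v2)   # crude stand-in for gate activity
    return np.sum(0.5 * C * Vdd**2 * toggled) / T

# One observation of the random variable P per pair of consecutive input patterns.
n_inputs = 32
sample = []
for _ in range(1000):
    v1 = rng.integers(0, 2, n_inputs)
    v2 = rng.integers(0, 2, n_inputs)
    sample.append(cycle_power(v1, v2))
print("sample mean power (W):", np.mean(sample))
```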

III. ESTIMATION OF AVERAGE POWER

A. Order Statistics

For a given circuit, let $F(p)$ be the cdf of $P$. Suppose that the random variables $P_1, \ldots, P_n$ form a random sample of $F(p)$, where $P_1 < P_2 < \cdots < P_n$. A power sample $p_1, \ldots, p_n$ sorted in order of increasing value can be viewed as the observed values of $P_1, \ldots, P_n$. Using these data, an empirical discrete distribution function $F_n(p)$ can be constructed by assigning probability $1/n$ to each of the $n$ values $p_1, \ldots, p_n$. For any $p$ $(0 < p < \infty)$, the value of $F_n(p)$ represents the proportion of the observed values in the sample that are less than or equal to $p$, as depicted in Fig. 2. $F_n(p)$ is a sample function of the unknown $F(p)$.

Fig. 2. Construction of the sample cdf $F_n(p)$.

In the random sample, the $i$th smallest random variable $P_i$ is called the $i$th order statistic, and $p_i$ in the power sample is the observed value of $P_i$. For each order statistic $P_i$, $i = 1, \ldots, n$, define a new random variable $Z_i = F(P_i)$. One prominent feature of $Z_i$ is that its cdf $G_i(z)$ and probability density function (pdf) $g_i(z)$ are independent of the distribution function of $P_i$ [12]. The pdf $g_i(z)$ of $Z_i$ is

$$g_i(z) = \frac{n!}{(i-1)!\,(n-i)!}\, z^{i-1} (1-z)^{n-i}. \qquad (2)$$

The integration of $g_i(z)$ with respect to $z$ gives $G_i(z)$. Using integration by parts, $G_i(z)$ can be expressed in the following recursive form:

$$G_i(z) = G_{i-1}(z) - \frac{n!}{(i-1)!\,(n-i+1)!}\, z^{i-1} (1-z)^{n-i+1}. \qquad (3)$$

Using (3), for each $Z_i$ we can find its $1-\alpha$ confidence interval $[z_i^{\min}, z_i^{\max}]$ simply by solving the equations $G_i(z) = \alpha/2$ and $G_i(z) = 1-\alpha/2$. Let $P_j$ and $P_{j+1}$ be the $j$th and $(j+1)$th order statistics in a random sample of size $n$, and let $[z_j^{\min}, z_j^{\max}]$ and $[z_{j+1}^{\min}, z_{j+1}^{\max}]$ be the corresponding $1-\alpha$ confidence intervals of $Z_j = F(P_j)$ and $Z_{j+1} = F(P_{j+1})$, respectively. Because $F$ is a nondecreasing function, for a random variable $\hat{P}_j$ with $P_j < \hat{P}_j < P_{j+1}$, the following relation always holds:

$$F(P_j) \le F(\hat{P}_j) \le F(P_{j+1}). \qquad (4)$$

Since $\Pr(F(P_j) \ge z_j^{\min}) = 1-\alpha/2$ and $\Pr(F(P_{j+1}) \le z_{j+1}^{\max}) = 1-\alpha/2$, $[z_j^{\min}, z_{j+1}^{\max}]$ is a reasonable estimate of a $1-\alpha$ confidence interval of $F(\hat{P}_j)$. However, due to the positive correlation between $Z_j$ and $Z_{j+1}$, this approximation needs to be justified. A conservative estimator of the confidence level of $F(\hat{P}_j)$ between $z_j^{\min}$ and $z_{j+1}^{\max}$ is the probability of the following event:

$$z_j^{\min} \le F(P_j) \le F(P_{j+1}) \le z_{j+1}^{\max}. \qquad (5)$$

The probability of (5) can be evaluated by

$$\Pr\bigl(z_j^{\min} \le F(P_j) \le F(P_{j+1}) \le z_{j+1}^{\max}\bigr) = \int_{z_j^{\min}}^{z_{j+1}^{\max}} \int_{z_j^{\min}}^{z_{j+1}} h(z_j, z_{j+1})\, dz_j\, dz_{j+1} \qquad (6)$$

where $h(z_j, z_{j+1})$ is the joint pdf of $Z_j$ and $Z_{j+1}$ [12]:

$$h(z_j, z_{j+1}) = \frac{n!}{(j-1)!\,(n-j-1)!}\, z_j^{j-1} (1-z_{j+1})^{n-(j+1)}. \qquad (7)$$

As shown in Appendix A, $\Pr(z_j^{\min} \le F(P_j) \le F(P_{j+1}) \le z_{j+1}^{\max})$ is larger than $1-\alpha$. Thus, the confidence level of $F(\hat{P}_j)$ between $z_j^{\min}$ and $z_{j+1}^{\max}$ is ensured.

The above result can be applied to the random power $\hat{P}_j$ distributed over an arbitrary random interval $(P_j, P_{j+1})$, and the $1-\alpha$ confidence band of $F(p)$ can be estimated as follows. Given the observed values $p_1, \ldots, p_n$ of the order statistics, let $p_0 = 0$ and $p_{n+1} = \infty$. For an arbitrary $p$, $0 < p < \infty$, we can always find an index $j$, $j \in \{0, \ldots, n\}$, such that $p_j < p < p_{j+1}$. For the random variable $\hat{P}_j$, $p_j < \hat{P}_j < p_{j+1}$, $F(\hat{P}_j)$ has a $1-\alpha$ confidence interval $[z_j^{\min}, z_{j+1}^{\max}]$. By connecting the endpoints of every confidence interval, a stairwise approximation of the $1-\alpha$ confidence band of $F(p)$ is encompassed by the curves $B_L(p)$ and $B_U(p)$:

$$\Pr\bigl(B_L(p) \le F(p) \le B_U(p)\bigr) \ge 1-\alpha \qquad (8)$$

where $B_L(p) = z_i^{\min}$ and $B_U(p) = z_{i+1}^{\max}$ for $p_i < p < p_{i+1}$, $i = 1, \ldots, n$. To illustrate the idea, Fig. 3 shows the empirical cdf $F_n(p)$ as well as the estimated 0.99 confidence band between $B_L(p)$ and $B_U(p)$ of the benchmark circuit C880 when $n$ is 672.

Fig. 3. The empirical cdf and 99% confidence band of $F(p)$ of circuit C880 when the sample size is 672.

Equation (8) is an approximation of the $1-\alpha$ confidence band because we compute the confidence interval for one order statistic $P_j$ at a time, instead of examining the joint pdf of all order statistics simultaneously. Nevertheless, the confidence loss in the average power estimate due to this approximation will be compensated by the additional confidence introduced in the bounds of the average power. Thus, (8) is adopted for its computational simplicity. It should be noted that since $G_i(z)$, $i = 1, \ldots, n$, are distribution-independent, so is the $1-\alpha$ confidence band (8) derived from $G_i(z)$. Based on (8), in the following we show how to find bounds of the average power and further derive a distribution-independent stopping criterion.
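Since (2) is exactly the pdf of a Beta$(i, n-i+1)$ random variable, the interval $[z_i^{\min}, z_i^{\max}]$ can be obtained from Beta quantiles instead of iterating the recursion (3). The sketch below is an illustration based on that observation, not DIPE's actual implementation; the synthetic sample is a placeholder.

```python
import numpy as np
from scipy.stats import beta

def order_statistic_band(n, alpha=0.01):
    """1-alpha confidence intervals of Z_i = F(P_i) for i = 1..n.

    G_i(z) is the Beta(i, n-i+1) cdf, so solving G_i(z) = alpha/2 and
    G_i(z) = 1 - alpha/2 (cf. (2)-(3)) reduces to Beta quantiles."""
    i = np.arange(1, n + 1)
    z_min = beta.ppf(alpha / 2, i, n - i + 1)       # z_i^min
    z_max = beta.ppf(1 - alpha / 2, i, n - i + 1)   # z_i^max
    return z_min, z_max

# Example: the 0.99 band for a sample of size 672, as in Fig. 3.
n = 672
z_min, z_max = order_statistic_band(n, alpha=0.01)
p = np.sort(np.random.default_rng(1).gamma(4.0, 1e-3, n))  # placeholder power sample
# On each interval (p_i, p_{i+1}): B_L(p) = z_i^min and B_U(p) = z_{i+1}^max, as in (8).
```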

B. Bounds of Average Power and Stopping Criterion

To find bounds of the average power, let us first recall that the average power $\bar{p}$ of a circuit is the mean of $P$:

$$\bar{p} = E[P] = \int_0^{\infty} p\, f(p)\, dp \qquad (9)$$

where the pdf $f(p) = dF(p)/dp$. Define a new variable $u = F(p)$. By substituting $du$ for $f(p)\,dp$ and $F^{-1}(u)$ for $p$, the integration in (9) can be performed on the domain of the variable $u$ as

$$\bar{p} = \int_0^1 F^{-1}(u)\, du. \qquad (10)$$

In (10), the existence of $F^{-1}$ requires $F(p)$ to be a one-to-one function of $p$, which is a stricter condition than $F(p)$ being nondecreasing. According to the properties of a cdf, $F(p)$ is a nondecreasing but not one-to-one function of $p$ only when the pdf $f(p)$ is zero at more than one point in the sample space of $P$. Those points, however, do not contribute to the mean of $P$, because their probability densities are zero. For simplicity of presentation, the following derivation of the average power bounds proceeds under the assumption that $F^{-1}(u)$ exists. The general case in which $F(p)$ is merely nondecreasing is handled in Appendix B, and the same bounds can be derived.

Fig. 4. Lower and upper bound of the average power.

By referring to Fig. 4, we can see that $\bar{p}$ is just the area between $F^{-1}(u)$ and the $u$ axis. It is bounded from above by the area between $B_L^{-1}(u)$ and $u$, and from below by the area between $B_U^{-1}(u)$ and $u$, respectively. However, care must be taken here, since both $B_L(p)$ and $B_U(p)$ are nondecreasing stairwise functions whose inverse functions do not exist. To be mathematically correct, we note that a function $\hat{B}_U(p, s)$ can be created by connecting every two adjacent points $z_i^{\max}$ and $z_{i+1}^{\max}$ by a smooth invertible function $\hat{B}_{Ui}(p, s)$, as shown in Fig. 4. Because $\hat{B}_{Ui}(p, s)$, $i = 1, \ldots, n$, are invertible, their composite function $\hat{B}_U(p, s)$ is also invertible. Here $s$ is an adjustable parameter controlling the closeness between $\hat{B}_U(p, s)$ and $B_U(p)$. To keep the bounds as tight as possible, we can make $\hat{B}_U(p, s)$ arbitrarily close to $B_U(p)$ while its inverse function still exists. As $s$ approaches infinity, $B_U(p)$ becomes the limiting function of $\hat{B}_U(p, s)$:

$$\lim_{s \to \infty} \hat{B}_U(p, s) = B_U(p). \qquad (11)$$

Following a similar procedure, we can construct another invertible function $\hat{B}_L(p, s)$ to approximate $B_L(p)$. Note that since the confidence band encompassed by $B_U(p)$ and $B_L(p)$ is still embraced by $\hat{B}_U(p, s)$ and $\hat{B}_L(p, s)$, there is no loss in confidence level from making this approximation.

With the invertible functions $\hat{B}_U$ and $\hat{B}_L$, we can bound $\bar{p}$ by

$$\int_0^1 \hat{B}_U^{-1}(u)\, du \;\le\; \bar{p} \;\le\; \int_0^1 \hat{B}_L^{-1}(u)\, du. \qquad (12)$$

Let $\bar{p}_L = \int_0^1 \hat{B}_U^{-1}(u)\, du$ be the lower bound and $\bar{p}_U = \int_0^1 \hat{B}_L^{-1}(u)\, du$ be the upper bound of $\bar{p}$, respectively, and let $\tilde{p}$ be the sample mean of a power sample of size $n$; then (12) can be recast as

$$\frac{\bar{p}_L - \tilde{p}}{\tilde{p}} \;\le\; \frac{\bar{p} - \tilde{p}}{\tilde{p}} \;\le\; \frac{\bar{p}_U - \tilde{p}}{\tilde{p}}. \qquad (13)$$

Equation (13) provides distribution-independent and computable bounds of the real average power $\bar{p}$. As the simulation proceeds, more power data are collected into the power sample and used to construct the sample cdf $F_n(p)$; $B_L(p)$ and $B_U(p)$ are updated accordingly, as explained in Section III-A. The $1-\alpha$ confidence band gets narrower, and $B_L(p)$ and $B_U(p)$ move increasingly closer to the real cdf $F(p)$, as the sample size increases. As a result, the bounds of the average power, $\bar{p}_L$ and $\bar{p}_U$, approach $\bar{p}$. For a desired percentage error $\epsilon$ and confidence level $1-\alpha$ specified up front by the user, the power simulation can be stopped when the following criterion is satisfied:

$$\max\left\{\frac{\tilde{p} - \bar{p}_L}{\tilde{p}},\; \frac{\bar{p}_U - \tilde{p}}{\tilde{p}}\right\} \le \epsilon. \qquad (14)$$

The first sample size $n$ to satisfy (14) is defined as the convergent sample size. By (14), we bound $|\bar{p} - \tilde{p}|/\tilde{p}$ to the specified percentage error and guarantee that the obtained sample mean $\tilde{p}$ is close enough to $\bar{p}$. Since the derivation of (14) employs only the properties of the order statistics and requires no assumption on the distribution of $P$, the stopping criterion is distribution-independent and can be applied to any type of circuit.

It is worth mentioning that, because of the averaging effect, the confidence of $\bar{p}$ between $\int_0^1 \hat{B}_U^{-1}(u)\, du$ and $\int_0^1 \hat{B}_L^{-1}(u)\, du$ is higher than that of $F(p)$ bounded as in (8). That is, while some power data in the sample may exceed, say, the upper bound, the resulting positive error may be cancelled by the negative error from other data lying below $B_L$. The additional confidence in (12) compensates for the loss in (8), as stated previously. As will be shown in the experimental results, due to the combined effect of (8) and (12), the confidence level of $\bar{p}$ between $\bar{p}_L$ and $\bar{p}_U$ is already larger than $1-\alpha$. Thus, the use of $\hat{B}_U$ and $\hat{B}_L$ in the derivation of (12) is for mathematical rigor rather than for application interests. In practice, we substitute $B_L(p)$ and $B_U(p)$ for $\hat{B}_L(p, s)$ and $\hat{B}_U(p, s)$, respectively, and $\bar{p}_L$ ($\bar{p}_U$) can be obtained simply by calculating the area between $B_U(p)$ ($B_L(p)$) and the $F(p)$ axis:

$$\bar{p}_L = \sum_{i=2}^{n-1} p_{i-1}\bigl(B_U(i) - B_U(i-1)\bigr), \qquad \bar{p}_U = \sum_{i=1}^{n-1} p_i\bigl(B_L(i+1) - B_L(i)\bigr). \qquad (15)$$
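Putting (8) and (13)–(15) together, the following sketch checks the stopping criterion (14) for a given power sample. It is a simplified illustration: the band comes from Beta quantiles as above, and the summations of (15) are taken directly over the observed order statistics with the boundary terms handled loosely.

```python
import numpy as np
from scipy.stats import beta

def dipe_stopping_check(sample, eps=0.05, alpha=0.01):
    """Distribution-independent stopping check, following (13)-(15).

    Returns (converged, sample_mean, lower_bound, upper_bound)."""
    p = np.sort(np.asarray(sample, dtype=float))
    n = p.size
    i = np.arange(1, n + 1)
    z_min = beta.ppf(alpha / 2, i, n - i + 1)       # B_L step values
    z_max = beta.ppf(1 - alpha / 2, i, n - i + 1)   # B_U step values
    p_L = np.sum(p[:-1] * np.diff(z_max))           # lower bound of the average power, cf. (15)
    p_U = np.sum(p[:-1] * np.diff(z_min))           # upper bound of the average power, cf. (15)
    p_tilde = p.mean()                              # sample mean power
    err = max((p_tilde - p_L) / p_tilde, (p_U - p_tilde) / p_tilde)
    return err <= eps, p_tilde, p_L, p_U

# Usage: grow the power sample until the check succeeds (placeholder data shown here).
rng = np.random.default_rng(2)
print(dipe_stopping_check(rng.gamma(4.0, 1e-3, 2000)))
```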

IV. EXPERIMENTAL RESULTS AND DISCUSSION

TABLE I. STATISTICS OF ISCAS85 AND MCNC91 BENCHMARK CIRCUITS

TABLE II. POWER ESTIMATION RESULTS USING DIPE

The proposed stopping criterion has been implemented as the core of a distribution-independent power estimation tool (DIPE), along with a gate-level power simulator and an input pattern generator. Currently, the input pattern generator can take description decks of the signal probabilities of the primary inputs and the spatial correlation coefficients among the primary inputs. Since temporal correlations have not yet been considered, the current version of DIPE handles only combinational circuits. All of the following experiments were performed on a SPARC 20 workstation with 244 MB of memory.
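For the independent, probability-0.5 inputs used in Section IV-A, the input pattern generator mentioned above can be approximated by the minimal sketch below; spatial-correlation modeling (which the description decks also support) is omitted, and the function name is illustrative.

```python
import numpy as np

def pattern_generator(signal_probs, n_patterns, seed=0):
    """Draw primary-input patterns as independent Bernoulli bits with the
    given signal probabilities; correlation modeling is intentionally omitted."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(signal_probs, dtype=float)
    return (rng.random((n_patterns, probs.size)) < probs).astype(np.uint8)

# Example: 8 primary inputs, all with signal probability 0.5.
patterns = pattern_generator([0.5] * 8, n_patterns=4)
```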

Fig. 5. Correlation between average convergent sample size and normalized variance.

A. Performance

DIPE has been applied to a set of benchmark circuits to estimate the average power dissipation; the statistics of these circuits are tabulated in Table I. For all of the following experiments, the circuits are assumed to operate at a clock frequency of 20 MHz with a 5-V power supply. The maximum error allowed was specified as 5% with 0.99 confidence. For simplicity of demonstration, the signals at the primary inputs are assumed to be mutually independent and to have signal probabilities of 0.5. However, as discussed in Section I, the validity of the stopping criterion is not affected by the statistics of the input patterns, as long as random power samples can be obtained.

Table II shows the power estimation results for the test circuits. In Table II, SIM is the sample average power obtained from a sample of size one million and is deemed a sufficiently accurate estimate of the real average power. LB (UB) is the lower (upper) bound of the average power calculated according to (15), and $\tilde{p}$ is the sample mean power. LB, UB, and $\tilde{p}$ are all calculated using the convergent sample size listed in the column Sample Size. The last column reports the CPU time usage. For all of the circuits, DIPE produces very accurate average power estimates with a reasonable amount of CPU time. As observed in Table I, the

set of test circuits covers a wide spectrum of gate counts. This supports the earlier argument that although the derivation of the stopping criterion assumes a continuous distribution function for $P$, its applicability to small circuits is not affected. Another distinguishing property of the technique is that it is dimensionally independent [4], i.e., the convergent sample size is independent of the circuit size. Thus, it is suitable for handling very large circuits.

Although the convergent sample size is not a function of the circuit size, it depends on how "widely" the power is distributed. Fig. 5 shows the correlation between the average convergent sample size over 1000 simulation runs and the normalized variance $(\sigma/\mu)^2$ for the set of test circuits. The convergent sample size shows a general trend of growing linearly with $(\sigma/\mu)^2$. This coincides with similar observations from parametric approaches. In the scope of our approach, a qualitative explanation for this phenomenon is that each power value in the sample is weighted nonlinearly when calculating the bounds of the average power. Starting from (15), $\bar{p}_L$ and $\bar{p}_U$ are nonlinearly weighted sums of the power sample data, since $(B_U(i) - B_U(i-1))$ and $(B_L(i+1) - B_L(i))$ vary with $i$.


TABLE III PERFORMANCE COMPARISON OF DIPE, MCPOWER, AND K–S FROM THE STATISTICS OF 1000 SIMULATION RUNS

In contrast, the sample mean $\tilde{p}$ is obtained by summing the sample data weighted by the same coefficient $1/n$. The variation of the weighting coefficients as a function of the order $i$ is plotted in Fig. 6 for a sample size of 128. It can be clearly seen that smaller power data are weighted more heavily in estimating $\bar{p}_L$ than $\bar{p}_U$, while larger power data are weighted more heavily in estimating $\bar{p}_U$ than $\bar{p}_L$. The sample data ordered in between have approximately equal weighting coefficients.

Because the sample data are unequally weighted when evaluating the bounds of the average power, the normalized variance plays an important role in deciding the convergent sample size. To understand this, suppose that at some point in the simulation the current sample size is $n$ and $m$ new sample data, all smaller than the current sample mean, are collected from the simulation. Let $w_u$, $w_{av}$, and $w_l$ denote the weighting coefficients of $B_U(p)$, $F_n(p)$, and $B_L(p)$, respectively. Since $w_u < w_{av} < w_l$, compared to the old bounds, the new upper bound will be closer to the new sample mean, while the new lower bound will be farther from it. The situation when the collected power data are larger than the current sample mean can be discussed similarly. On the other hand, if the newly acquired data are close to the current sample mean, they will be approximately equally weighted in computing all three quantities; the data of extreme values will be pushed away from the middle of the empirical cdf and will be more lightly weighted. Consequently, both new bounds are closer to the new sample mean. Bearing this observation in mind, it is easy to understand that if $(\sigma/\mu)^2$ is large, the sample data tend to spread relatively widely, so that it is more difficult for $\bar{p}_L$ and $\bar{p}_U$ to converge. If $(\sigma/\mu)^2$ is small, the power data tend to be sampled from a limited range of values, and convergence will be faster.

B. Comparison with Other Approaches

For comparison, we implemented McPower [4] and K–S [5], the statistical approaches previously proposed in the literature. The accuracy specification is the same as that used to obtain Table II. For McPower, the average power sample size $S_{mean}$, i.e., the sample size from which one average power value is obtained, is heuristically chosen as 32 to facilitate parallel power simulation, instead of the 50 used in [4]. Strictly speaking, the central limit theorem holds when the average power sample size is infinitely large and is only approximately true

Fig. 6. Weighting coefficients for the calculation of sample average power and the power bounds. Sample size n = 128.

when it is finite. The difference in average power sample size here, nevertheless, should not interfere with the comparison, since a rule-of-thumb size for the sample mean from any distribution to be at least approximately normal is about 30 [13]. Hence, 32 is deemed an appropriate average power sample size.

The comparison results collected from 1000 simulation runs are listed in Table III. For convenience of comparison, the number of samples used in McPower is multiplied by the average power sample size (i.e., 32) to obtain the total sample size. In Table III, Min, Max, and Avg represent the minimum, maximum, and average sample size used during the 1000 simulation runs, respectively; Err shows the percentage of the runs violating the accuracy specification. Since the confidence level is specified as 0.99, the error percentage should be at most 1% if the normal distribution assumption made in McPower is valid. As shown in Table III, the error percentage of McPower exceeds 1% for 9 of the 21 test circuits, implying that the assumption is not generally valid at this average power sample size. On the contrary, for all of the simulation runs conducted for all of the benchmark circuits, no error is detected in the results generated by DIPE and K–S. On the other


TABLE IV PERFORMANCE COMPARISON OF MCPOWER WITH DIFFERENT AVERAGE POWER SAMPLE SIZES AND DIPE

hand, although neither DIPE nor K–S produces any estimation error in any of the experiments, K–S is too conservative and requires a much larger sample size than DIPE to achieve convergence. This indicates that DIPE is much more computationally efficient than K–S, since the CPU time usage is proportional to the convergent sample size. In summary, the comparison results show that DIPE performs better than McPower in accuracy and better than K–S in simulation efficiency. The results in the DIPE columns also justify the earlier statement that the confidence level of the real average power lying between the bounds generated by DIPE is larger than $1-\alpha$.

It should be noted, though, that according to the central limit theorem, no matter what the distribution of $P$ is, as $S_{mean}$ approaches infinity the limiting distribution of the average power sample is normal. Thus, in principle, McPower is always valid. However, in practice $S_{mean}$ is finite, and how to choose a "large enough" average power sample size is still unclear. It can be learned from Table III that such a size is also circuit dependent. In consequence, if $S_{mean}$ is chosen to be small, McPower can produce an accurate average power estimate with a smaller sample size for circuits whose average power samples follow a normal distribution; for circuits that do not, McPower will not be able to achieve the desired accuracy and confidence. If a large $S_{mean}$ is picked, the central limit theorem is more likely to hold, but the required sample size begins to approach that of DIPE.

To see this, Table IV shows the results of the same experiments as Table III, with $S_{mean}$ set to 64, 96, and 128, respectively. As $S_{mean}$ increases, the error percentage decreases because the average power sample behaves more nearly as if it had a normal distribution, with smaller variance. Ideally, the convergent sample size should remain nearly unchanged, since the variance of each average power sample decreases at the same rate as $S_{mean}$ increases. However, with a large average power sample size, one additional average power sample is equivalent to, say, a sample size increment of 128. This effect of coarse discretization in the sample size is dominant, and overall the convergent sample size of McPower increases. In Table IV, the error percentage is below 1% for all test circuits when $S_{mean}$ is 128, implying that the central limit theorem approximately holds. However, the average convergent sample size SSavg is then also very close to that of DIPE. Furthermore, if we consider only the circuits

without any estimation error, the average sample size of DIPE is 3.6% lower than that of McPower. This shows that DIPE indeed finds a good tradeoff between simulation accuracy and efficiency. When applied to practical circuits, DIPE appears more favorable, because how the power dissipation of a circuit is distributed, and in what manner the average power sample approaches a normal distribution with increasing $S_{mean}$, are usually not known at the time of average power estimation. Consequently, it would be difficult to justify the validity of any assumption on the functional form of the distribution of $P$ or the propriety of a chosen average power sample size. In contrast, DIPE minimizes the uncertainty in the accuracy of the power estimation results by employing a distribution-independent stopping criterion while remaining comparably computationally efficient. Therefore, it can be used confidently as a reliable power estimation tool.

V. CONCLUSION AND FUTURE WORK

We have proposed a new statistical power analysis tool, DIPE, for the estimation of the average power dissipation of digital circuits. The core of DIPE is a nonparametric stopping criterion. Using properties derived from the order statistics, we estimate a $1-\alpha$ confidence band of the unknown cdf $F(p)$. The confidence band is used to find upper and lower bounds of the average power, from which a stopping criterion is derived to terminate the circuit simulation upon achievement of the accuracy specification. By employing the nonparametric stopping criterion, DIPE can be applied to any type of circuit regardless of its power distribution. Experimental results show that DIPE is more robust and accurate than the current parametric technique based on the central limit theorem and is more computationally efficient than the current nonparametric technique.

Nevertheless, DIPE can be improved in the following ways. First, in the derivation of the stopping criterion, the confidence level of $F(p)$ between $B_L(p)$ and $B_U(p)$ is usually less than $1-\alpha$. The robustness of DIPE can be enhanced by deriving the confidence band of $F(p)$ from the joint pdf of all order statistics. Second, the current implementation of DIPE assumes the availability of a random power sample, i.e., a sample of independent and identically distributed power data. In many applications, however, the operating


environment often provides primary input signals, and the feedback mechanism in sequential circuits produces input patterns to the latches, both of which are spatiotemporally correlated. In order to be applicable in these cases, the effect of input correlations on the performance of DIPE needs to be carefully addressed and remedied. We are currently working on these issues and will report further results in the near future [11].

APPENDIX A

The probability of the event $z_j^{\min} \le F(P_j) \le F(P_{j+1}) \le z_{j+1}^{\max}$ is evaluated as

$$\begin{aligned}
\Pr\bigl(z_j^{\min} \le F(P_j) \le F(P_{j+1}) \le z_{j+1}^{\max}\bigr)
&= \int_{z_j^{\min}}^{z_{j+1}^{\max}} dz_{j+1} \int_{z_j^{\min}}^{z_{j+1}} dz_j\, \frac{n!}{(j-1)!\,[n-(j+1)]!}\, z_j^{j-1} (1-z_{j+1})^{n-(j+1)} \\
&= \int_{z_j^{\min}}^{z_{j+1}^{\max}} \frac{n!}{j!\,[n-(j+1)]!}\, z_{j+1}^{j} (1-z_{j+1})^{n-(j+1)}\, dz_{j+1} \\
&\quad - \int_{z_j^{\min}}^{z_{j+1}^{\max}} \frac{n!}{j!\,[n-(j+1)]!}\, (z_j^{\min})^{j} (1-z_{j+1})^{n-(j+1)}\, dz_{j+1}.
\end{aligned}$$

Using (2), the first integral is just the probability carried by $G_{j+1}(z_{j+1})$ over the interval $[z_j^{\min}, z_{j+1}^{\max}]$. Therefore,

$$\begin{aligned}
\Pr\bigl(z_j^{\min} \le F(P_j) \le F(P_{j+1}) \le z_{j+1}^{\max}\bigr)
&= G_{j+1}(z_{j+1}^{\max}) - G_{j+1}(z_j^{\min}) - \frac{n!}{j!\,(n-j)!} (z_j^{\min})^{j} \bigl[(1-z_j^{\min})^{n-j} - (1-z_{j+1}^{\max})^{n-j}\bigr] \\
&= (1-\alpha) + \bigl[G_{j+1}(z_{j+1}^{\min}) - G_{j+1}(z_j^{\min})\bigr] - \frac{n!}{j!\,(n-j)!} (z_j^{\min})^{j} \bigl[(1-z_j^{\min})^{n-j} - (1-z_{j+1}^{\max})^{n-j}\bigr].
\end{aligned} \qquad \text{(A1)}$$

The derivation of (A1) utilizes the fact that $[z_{j+1}^{\min}, z_{j+1}^{\max}]$ is the $1-\alpha$ confidence interval of $Z_{j+1}$. In order for $F(\hat{P}_j)$ to have $1-\alpha$ confidence between $z_j^{\min}$ and $z_{j+1}^{\max}$, the residual term in (A1),

$$G_{j+1}(z_{j+1}^{\min}) - G_{j+1}(z_j^{\min}) - \frac{n!}{j!\,(n-j)!} (z_j^{\min})^{j} \bigl[(1-z_j^{\min})^{n-j} - (1-z_{j+1}^{\max})^{n-j}\bigr] \qquad \text{(A2)}$$

has to be nonnegative. Since this cannot be shown through an analytical proof, (A2) has to be evaluated numerically. Our computation shows that the residual probability is larger than zero for all order statistics of practical interest. As explained in Section III-A, this result indicates that $[z_j^{\min}, z_{j+1}^{\max}]$ is a valid $1-\alpha$ confidence band of $F(\hat{P}_j)$.
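The numerical evaluation of (A2) can be reproduced with a few lines of code. The sketch below uses the Beta cdf for $G_{j+1}$ and Beta quantiles for the interval endpoints, and simply reports whether the residual remains nonnegative over a few sample sizes; it is an illustration of the check, not the authors' original program.

```python
from math import comb
from scipy.stats import beta

def residual_A2(n, j, alpha=0.01):
    """Residual term (A2) for the j-th order statistic of a sample of size n.

    G_{j+1} is the Beta(j+1, n-j) cdf; z_j^min, z_{j+1}^min, z_{j+1}^max are
    the Beta quantiles used for the confidence intervals of Section III-A."""
    zj_min = beta.ppf(alpha / 2, j, n - j + 1)
    zj1_min = beta.ppf(alpha / 2, j + 1, n - j)
    zj1_max = beta.ppf(1 - alpha / 2, j + 1, n - j)
    G = lambda z: beta.cdf(z, j + 1, n - j)
    tail = comb(n, j) * zj_min**j * ((1 - zj_min)**(n - j) - (1 - zj1_max)**(n - j))
    return G(zj1_min) - G(zj_min) - tail

# Spot-check nonnegativity of (A2) for all orders j at a few sample sizes.
for n in (64, 128, 256):
    print(n, min(residual_A2(n, j) for j in range(1, n)) >= 0.0)
```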

APPENDIX B

$F(p)$ is a nondecreasing but not one-to-one function of $p$ only when the pdf $f(p)$ is zero at more than one point in the sample space of $P$. Suppose there are $r$ intervals along the positive $p$ axis in which $f(p)$ is zero, and let these intervals be denoted $[P_{1l}, P_{1r}), \ldots, [P_{rl}, P_{rr})$. We define a new function $H(p)$ as

$$H(p) = \begin{cases} F(p) & \forall p \in [0, P_{1l}) \\ F(p) & \forall p \in [P_{ir}, P_{(i+1)l}),\; i = 1, \ldots, r-1 \\ F(p) & \forall p \in [P_{rr}, \infty) \\ \text{undefined} & \text{elsewhere.} \end{cases}$$

$H(p)$ is identical to $F(p)$ wherever $f(p)$ is nonzero. As depicted in Fig. 7, $H(p)$ is discontinuous at the points $P_{il}$, $i = 1, \ldots, r$, and is a one-to-one function of $p$; thus, the inverse function $H^{-1}$ is well defined. Since the intervals in which $H(p)$ is not defined do not contribute to the mean of $P$, the average power $\bar{p}$ can be expressed using $H^{-1}(u)$ as

$$\bar{p} = \int_0^1 H^{-1}(u)\, du.$$

It can be seen that $H(p)$ is still within the $1-\alpha$ confidence band of $F(p)$; see Section III-B for this discussion.

Fig. 7. Construction of $H(p)$.

REFERENCES

[1] L. W. Nagel, "SPICE2: A computer program to simulate semiconductor circuits," Univ. of California, Berkeley, Memo ERL-M520, 1975.
[2] A. Dharchoudhury, "Advanced techniques for fast timing simulation of MOS VLSI circuits," Ph.D. dissertation, Dept. Elec. Comput. Eng., Univ. Illinois, Urbana-Champaign, 1991.
[3] PowerMill Reference Manual, Version 2.7.1, EPIC Design Technology Inc., Aug. 1992.
[4] R. Burch, F. N. Najm, P. Yang, and T. Trick, "A Monte Carlo approach for power estimation," IEEE Trans. VLSI Syst., vol. 1, pp. 63–71, Mar. 1993.
[5] L.-P. Yuan, C.-C. Teng, and S.-M. Kang, "Nonparametric estimation of average power dissipation in CMOS VLSI circuits," in Proc. 1996 IEEE Custom Integr. Circuits Conf., 1996, pp. 225–228.
[6] J. D. Gibbons, Nonparametric Methods for Quantitative Analysis. Columbus, OH: American Sciences, 1985.
[7] R. Marculescu, D. Marculescu, and M. Pedram, "Efficient power estimation for highly correlated input streams," in Proc. 32nd Design Automation Conf., 1995, pp. 628–634.
[8] J. Monteiro and S. Devadas, "Techniques for the power estimation of sequential logic circuits under user-specified input sequences and programs," in Proc. 1995 Int. Symp. Low Power Design, 1995, pp. 33–38.
[9] D. Rabe and W. Nebel, "Short circuit power consumption of glitches," in Proc. IEEE/ACM Int. Symp. Low Power Electron. Design, Monterey, CA, 1996, pp. 125–128.
[10] A. Hill, C.-C. Teng, and S.-M. Kang, "Simulation based maximum power estimation," in Proc. 1996 Int. Symp. Circuits Syst., vol. 4, 1996, pp. 13–16.
[11] L.-P. Yuan, C.-C. Teng, and S.-M. Kang, "Statistical estimation of average power dissipation in sequential circuits," in Proc. 34th Design Automation Conf., Anaheim, CA, 1997, pp. 377–382.
[12] R. Hogg and A. Craig, Introduction to Mathematical Statistics, 5th ed. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[13] R. A. Johnson, Miller and Freund's Probability and Statistics for Engineers, 5th ed. Englewood Cliffs, NJ: Prentice-Hall, 1994.