Supply and Threshold Voltage Optimization for Low Power Design

David J. Frank, Paul Solomon, Scott Reynolds and John Shin*
IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598
* Hyundai Electronics Industries, Korea

Introduction

One of the most effective ways to design low power circuits is to use low power supply voltages. If the threshold voltages are also reduced, it is possible to maintain good performance at these lower voltages. This paper addresses the question of how to choose the optimum supply and threshold voltages for low power design. Other workers have also addressed this question, but have considered only nominal conditions [1-3] or nominal conditions plus simplified tolerances [4,5], and these prior works have used simplified models for device switching speed and power dissipation. In the present work we take more detailed account of parameter tolerances, take full account of short channel effects in the devices, and carry out full circuit simulations to obtain accurate speed and power information.

The general goal of these optimizations is to take a given circuit and determine the supply voltage and threshold voltage that will result in minimum dissipation for that circuit, subject to the condition that its delay is not longer than a specified amount. Realistic operating margins are incorporated by including the effects of device and supply variations in the optimizations. Furthermore, because the statistical variation of the threshold voltage depends on the gate length (worse for shorter gates), the nominal gate length is also treated as an optimization parameter. The optimization problem is thus refined to the following: find the nominal supply voltage, nominal threshold voltage, and nominal gate length such that the worst case power dissipation is minimized while the worst case delay does not exceed a specified amount.
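In symbols, the refined problem can be written as below. Here $P_{WC}$ and $\tau_{WC}$ denote worst case power and delay, and $\tau_{max}$ is a label introduced only for this illustration to stand for the specified delay bound; it is not notation used elsewhere in the paper.

$$\min_{V_{DD},\,V_T,\,l_G}\; P_{WC}(V_{DD},V_T,l_G) \quad \text{subject to} \quad \tau_{WC}(V_{DD},V_T,l_G) \le \tau_{max}$$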

Optimization Procedure

Our procedure is to tabulate the results of a large number of circuit simulations at different conditions, and then to derive optima from the tabulated data according to various constraints. Since these data are derived from actual simulations, all forms of dissipation are automatically included: dynamic, static and crossover. The simulations use device models corresponding to 0.1 µm technology. These models are based on the work in [6,7], with the addition of DIBL and short channel VT roll-off models. The model parameters yield the basic FET characteristics shown in Table 1.

TABLE 1. 0.1 µm FET characteristics, as simulated. T = 85 °C, VGT = VGS - VT.

Quantity                                         nFET    pFET
Drain current (mA/mm @ VDS = 1.5, VGT = 1.0)      515     240
RSD (Ω·mm @ VDS = 0.1, VGT = 1.5)                 0.7     2.0
gm (mS/mm, max. @ VDS = 1.5)                      540     260
DIBL (mV/V @ VDS = 0.2)                            57      62
Subthreshold slope (mV/dec)                       108     107
VT,model - VT,extr (mV)                           110     100

The final row provides a VT calibration by showing the difference between the model parameter VT, which is used throughout this paper, and the VT that is extracted from the I-V curves by extrapolating the ID curves to zero current.

The primary circuit that has been simulated is a conventional static CMOS adder. The ripple carry delay through two bits (equivalent to four gate delays) is used as the primary unit of delay. Using the ASX circuit simulator, roughly 1000 separate simulations have been run for different supply voltages VDD, threshold voltages VT, and gate lengths lG. These results are assembled into a large array, and interpolation is used to create a new array representing worst case delay and power. Interpolations on this second array are used to locate the points corresponding to minimum worst case power at specific values of worst case delay.

The inclusion of worst case considerations is one of the most significant parts of this work. Six types of parameter variations are considered in evaluating worst case circuit performance:

1. Global (chip-to-chip) variations in gate length. The threshold variations caused by these gate length variations (due to short channel effects) are also included in the model, using the roll-off behavior shown in Fig. 1.
2. Global variations in threshold voltage due to process fluctuations, but not due to (1).
3. Global variations in the power supply voltage due to overall regulation tolerances.
4. Local (intra-chip, device-to-device) variations in gate length, and the accompanying threshold variations due to short channel effects.
5. Local variations in threshold voltage not due to (4).
6. Local variations in supply voltages due to on-chip resistive drops and inductive effects. These supply variations are thought to dominate significantly over wire-to-wire coupled signal noise at the macro level considered here, so no separate coupled signal noise was included.

Figure 1: Threshold voltage versus gate length, relative to VT at 0.1 µm.

These six variations are considered to be distributed according to uncorrelated Gaussian distributions with standard deviations of $\sigma_{G,l_G}$, $\sigma_{G,V_T}$, $\sigma_{G,V_{DD}}$, $\sigma_{L,l_G}$, $\sigma_{L,V_T}$, and $\sigma_{L,V_{DD}}$, respectively. In addition to the above variations, the temperature was taken to be the maximum operating temperature of 85 °C for all of the simulations, since this should result in maximum leakage dissipation (requiring higher threshold voltages) and slowest operation.
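Setting the worst case machinery aside for a moment, the tabulate-then-search idea behind the optimization procedure can be sketched in a few lines. This is a toy illustration under loudly stated assumptions, not the paper's method: the delay and power functions are hypothetical closed-form stand-ins (an alpha-power-law delay and exponential subthreshold leakage, in arbitrary normalized units) for the ASX simulations, only VDD and VT are swept, and the lowest-power grid point is picked instead of interpolating. The activity factor of 0.05, the cycle time of 7 circuit delays, and the 108 mV/dec slope come from the paper; `ALPHA` and `I_LEAK0` are assumed values.

```python
# Toy sketch: tabulate delay on a (VDD, VT) grid, then pick the minimum-power
# point that meets a delay bound. The models are hypothetical stand-ins for
# circuit simulation, in arbitrary normalized units.

ALPHA = 1.3          # assumed alpha-power-law exponent (hypothetical)
SLOPE = 0.108        # subthreshold slope in V/decade (108 mV/dec, Table 1)
ACTIVITY = 0.05      # activity factor (Fig. 4 caption)
CYCLE_FACTOR = 7     # cycle time = 7 x circuit delay (Fig. 4 caption)
I_LEAK0 = 0.3        # assumed leakage prefactor (hypothetical)


def delay(vdd, vt):
    """Hypothetical normalized block delay: VDD / (VDD - VT)**ALPHA."""
    return vdd / (vdd - vt) ** ALPHA


def power(vdd, vt, cycle_time):
    """Hypothetical power: dynamic a*C*VDD^2/T_cycle (C = 1) plus static VDD*Ioff."""
    dynamic = ACTIVITY * vdd ** 2 / cycle_time
    static = vdd * I_LEAK0 * 10 ** (-vt / SLOPE)
    return dynamic + static


def build_table(vdds, vts):
    """Tabulate delay on the grid, skipping points where VDD is within 50 mV of VT."""
    return {(vdd, vt): delay(vdd, vt) for vdd in vdds for vt in vts if vdd > vt + 0.05}


def best_point(table, max_delay):
    """Return (power, VDD, VT) of the lowest-power grid point with delay <= max_delay."""
    cycle_time = CYCLE_FACTOR * max_delay
    candidates = [(power(vdd, vt, cycle_time), vdd, vt)
                  for (vdd, vt), d in table.items() if d <= max_delay]
    return min(candidates) if candidates else None


vdds = [round(0.30 + 0.01 * i, 2) for i in range(151)]   # 0.30 ... 1.80 V
vts = [round(0.01 * i, 2) for i in range(61)]            # 0.00 ... 0.60 V
table = build_table(vdds, vts)

p_fast, vdd_fast, vt_fast = best_point(table, max_delay=2.0)
p_slow, vdd_slow, vt_slow = best_point(table, max_delay=3.0)
print(f"tau<=2: VDD={vdd_fast:.2f} V, VT={vt_fast:.2f} V; "
      f"tau<=3: VDD={vdd_slow:.2f} V, VT={vt_slow:.2f} V")
```

Even in this toy, relaxing the delay bound lowers the optimum supply voltage, qualitatively echoing the trend the paper reports for slow circuits.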

The first three variations are global, and can be handled by simple interpolation in the tabulated values, since they correspond to the global independent variables used to generate the table. The last three variations are local (device-to-device) and cannot be precisely evaluated from the tabular data. To determine the magnitudes of these effects, we have run a large number of statistical ASX simulations on the adder circuit. An example of the result of such a simulation is shown in Fig. 2. This figure shows the strong caustic behavior of the scattergram close to an optimum point. For design points further from optimum, the scattergram takes on the more common circular or elliptical cloud-like shape, with much less correlation between power and delay.

Figure 2: Delay (ns) versus power (mW) scattergram of an adder cross-section for VDD and VT near optimum conditions, including global variations of power supply and gate length only. Near the optimum the data show caustic behavior, since power has been minimized as a function of delay by the optimization process.

The statistical simulations show that the local variations can be reasonably well correlated with smaller-magnitude global variations, as indicated in Table 2. The table shows the correlations only for the delay. Local statistical variations in the power dissipation are not treated, as they are expected to become insignificant when averaged over the very large number of circuits on a chip. In the last four columns, στ/σparameter is the ratio between the standard deviation in the circuit delay and the standard deviation of the parameter fluctuations that caused that delay distribution. In each case only one parameter is allowed to fluctuate, while the other parameters are held constant. ∂τ/∂parameter is the derivative of the delay with respect to a global parameter variation. Thus, these last four columns show the correlation between the sensitivity of the delay to local variations and the sensitivity of the delay to global variations.

TABLE 2. Correlations between local variations and global derivatives.

Case  lG (µm)  VT (V)  VDD (V)  (στ/σ_lG)/(∂τ/∂lG)  (στ/σ_VT)/(∂τ/∂VT)  (στ/σ_VDD)/(∂τ/∂VDD)  (στ/σ_VDD)/(∂τ/∂VT)
1     0.1      0.25    0.8      0.336               0.514               1.93                  0.802
2     0.1      0.31    0.44     0.386               0.444               1.146                 0.766
3     0.1      0.28    1.4      0.338               0.634               2.984                 0.942
4     0.1      0.176   0.96     0.368               0.632               2.942                 0.944
5     0.17     0.176   0.96     0.530               0.588               2.50                  0.772

As can be seen, the correlations for lG and VT are reasonably consistent, so $C_{l_G} = 0.38$ and $C_{V_T} = 0.6$ have been chosen for these correlation coefficients. The correlations for global VDD variations are not very consistent, however. Rather, the effects of local VDD fluctuations are found to be better correlated with global VT variations, as shown in the last column. A value of $C_{V_{DD}} = 0.9$ is adopted for this correlation coefficient. Using these correlation coefficients, the local parameter variations can be accounted for in the same way as global fluctuations, by interpolation in the tabular data.

Realistic "worst case" conditions are derived from the point of view that the end goal is to achieve a high yield of working chips from the manufacturing process. (This goal concerns chips subject to ordinary process variations; chip failures due to defects do not contribute to the present considerations.) This idea is formalized by requiring that the chip satisfy both constraints (maximum delay and maximum power) with a probability p greater than or equal to a specified value, Y0, typically taken as 90%. As illustrated in Fig. 2, a functional chip is unlikely to fail both constraints, so it suffices to take Y = 95% as the probability of passing each test individually.
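The choice of Y follows from inclusion-exclusion if each test is passed with probability Y and a simultaneous failure of both is neglected, as the caustic shape of Fig. 2 suggests. A short worked check (not an equation from the paper):

$$P(\text{pass both}) = 1 - (1-Y) - (1-Y) + P(\text{fail both}) \approx 2Y - 1 \ge Y_0 \quad\Longrightarrow\quad Y \ge \frac{1+Y_0}{2} = \frac{1+0.90}{2} = 0.95$$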
If the overall probability distributions for the delay and power are Gaussian, then the number of sigmas out on those distributions required to get a given value of Y can be determined from the error function. Several values for Y are shown in Table 3.

TABLE 3. Number of sigmas required to achieve a yield Y.

Y               90%    95%    98%    99%
Number of σ's   1.28   1.64   2.04   2.37
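The entries of Table 3 are one-sided Gaussian quantiles, $n_\sigma = \sqrt{2}\,\mathrm{erf}^{-1}(2Y-1)$. A minimal sketch using Python's standard-library `statistics.NormalDist`, whose `inv_cdf` computes the same inverse, is shown below together with the per-test yield $Y = (1+Y_0)/2$ that follows from neglecting joint failures. For 90% and 95% it reproduces the tabulated 1.28 and 1.64; for 98% and 99% the standard one-sided values come out near 2.05 and 2.33.

```python
from statistics import NormalDist


def sigmas_for_yield(y):
    """One-sided number of standard deviations n with P(X <= mu + n*sigma) = y."""
    return NormalDist().inv_cdf(y)


def per_test_yield(y0):
    """Per-constraint yield when both tests must pass and joint failure is neglected."""
    return (1.0 + y0) / 2.0


for y in (0.90, 0.95, 0.98, 0.99):
    print(f"Y = {y:.0%}: n_sigma = {sigmas_for_yield(y):.2f}")
print(f"Y0 = 90% -> Y = {per_test_yield(0.90):.0%}")
```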

The situation on a chip is more complicated, however. The chip has a set of ‘critical paths,’ all of which must meet the worst case delay constraint [8]. The delays on these paths should all vary together with global parameter changes, but local parameter fluctuations will cause variability in the delays of these critical paths. Assume that these critical path delays are Gaussian distributed with probability density given by

$$P_L(\tau,\tau_L) = \frac{1}{\sqrt{2\pi}\,\sigma_{\tau L}}\exp\left[-\frac{(\tau-\tau_L)^2}{2\sigma_{\tau L}^2}\right], \tag{1}$$

where τ is the actual delay of a given path, $\tau_L$ is the mean value of τ, averaged over local variations but not over global variations, and $\sigma_{\tau L}$ is the standard deviation of critical path delays caused by local fluctuations. (Note: $\sigma_{\tau L}$ is taken to be $1/\sqrt{n_{st}}$ times the standard deviation of the delay through the simulated circuit block [8], since the delay variations in the individual stages of a path are uncorrelated; here $n_{st}$ is the equivalent number of circuit block delays in an entire clock cycle.) Then the probability of all N of the critical paths having $\tau \le \tau_{WC}$ is

$$p(\tau_L,\tau_{WC}) = \left[\int_{-\infty}^{\tau_{WC}} P_L(t,\tau_L)\,dt\right]^N, \tag{2}$$
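Equation 2 shows why the local margin grows with the number of critical paths: with the Gaussian of Eq. 1, the integral is the standard normal CDF $\Phi(k)$ for a constraint placed k local standard deviations above $\tau_L$, and it is then raised to the power N. A minimal sketch follows; the path count N = 400 is a hypothetical value chosen only for illustration.

```python
from statistics import NormalDist


def all_paths_pass(k, n_paths):
    """Eq. 2 with the Gaussian of Eq. 1: probability that all n_paths critical
    paths meet tau_WC = tau_L + k * sigma_tauL, for a fixed global condition."""
    return NormalDist().cdf(k) ** n_paths


N_PATHS = 400  # hypothetical number of critical paths
for k in (2.0, 3.0, 4.0):
    print(f"k = {k}: one path {all_paths_pass(k, 1):.4f}, "
          f"all {N_PATHS} paths {all_paths_pass(k, N_PATHS):.4f}")
```

A margin that is ample for one path (k = 2 gives about 97.7%) leaves almost no chance that all 400 paths pass, and roughly three local sigmas are needed before the joint probability exceeds one half.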

where $\tau_{WC}$ is the worst case delay constraint. Next, assume that the $\tau_L$'s are distributed with a Gaussian probability distribution due to the global parameter fluctuations:

$$P_G(\tau_L,\tau_G) = \frac{1}{\sqrt{2\pi}\,\sigma_{\tau G}}\exp\left[-\frac{(\tau_L-\tau_G)^2}{2\sigma_{\tau G}^2}\right], \tag{3}$$

where $\sigma_{\tau G}$ is the standard deviation of circuit delay caused by global fluctuations, and $\tau_G$ is the mean value of τ, averaged over all variations. Then the total probability, over all fluctuations, that all of the critical paths have delay less than $\tau_{WC}$ is

$$p_T(\tau_G,\tau_{WC}) = \int_{-\infty}^{\infty} P_G(t,\tau_G)\,p(t,\tau_{WC})\,dt. \tag{4}$$

The goal now is to find $\tau_{WC} - \tau_G$ such that $p_T(\tau_G,\tau_{WC}) \ge Y$. For $\sigma_{\tau L}/\sigma_{\tau G}$ less than 1 (which is the expected regime of interest), $p(t,\tau_{WC})$ can be approximated as a step function in Eq. 4 with reasonable accuracy. Equation 4 can then be approximately solved for $\tau_{WC} - \tau_G$ as

$$\tau_{WC} - \tau_G = n_{\sigma L}\,\sigma_{\tau L} + n_{\sigma G}\,\sigma_{\tau G}, \tag{5}$$

where

$$n_{\sigma L} \approx \sqrt{2\ln\left(\frac{N}{\sqrt{\pi\ln(N/2)}}\right)}. \tag{6}$$
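The step-function approximation behind Eqs. 5 and 6 can be checked numerically. The sketch below evaluates Eq. 4 by trapezoidal quadrature (with $\tau_G = 0$), solves $p_T = Y$ for $\tau_{WC} - \tau_G$ by bisection, and compares the result with Eq. 5, taking $n_{\sigma G}$ from the inverse normal (as in Table 3) and $n_{\sigma L}$ from Eq. 6. The parameter values ($\sigma_{\tau L} = 0.3\,\sigma_{\tau G}$, N = 400, Y = 95%) are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_total(margin, sigma_l, sigma_g, n_paths, steps=2000):
    """Eq. 4 with tau_G = 0 and tau_WC = margin, by the trapezoidal rule.

    Integrand: P_G(t, 0) from Eq. 3 times p(t, tau_WC) = Phi((tau_WC - t)/sigma_l)**N
    from Eqs. 1-2.
    """
    global_pdf = NormalDist(0.0, sigma_g).pdf
    lo, hi = -10.0 * sigma_g, 10.0 * sigma_g
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * global_pdf(t) * STD_NORMAL.cdf((margin - t) / sigma_l) ** n_paths
    return total * h


def exact_margin(y, sigma_l, sigma_g, n_paths, iters=40):
    """Smallest tau_WC - tau_G with p_T >= y, found by bisection on Eq. 4."""
    lo, hi = 0.0, 20.0 * (sigma_l + sigma_g)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if p_total(mid, sigma_l, sigma_g, n_paths) < y:
            lo = mid
        else:
            hi = mid
    return hi


def n_sigma_local(n_paths):
    """Eq. 6 (approximately valid for N > 2)."""
    return math.sqrt(2.0 * math.log(n_paths / math.sqrt(math.pi * math.log(n_paths / 2.0))))


def approx_margin(y, sigma_l, sigma_g, n_paths):
    """Eq. 5 with n_sigmaG from the inverse normal and n_sigmaL from Eq. 6."""
    return n_sigma_local(n_paths) * sigma_l + STD_NORMAL.inv_cdf(y) * sigma_g


SIGMA_L, SIGMA_G, N_PATHS, Y = 0.3, 1.0, 400, 0.95   # hypothetical values
exact = exact_margin(Y, SIGMA_L, SIGMA_G, N_PATHS)
approx = approx_margin(Y, SIGMA_L, SIGMA_G, N_PATHS)
print(f"n_sigmaL = {n_sigma_local(N_PATHS):.2f}, exact = {exact:.3f}, Eq. 5 = {approx:.3f}")
```

For these values Eq. 6 gives roughly 3.0 local sigmas, similar to the $n_{\sigma L}$ used in the Results, and Eq. 5 lands within a few percent of the quadrature result.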

A similar equation applies to the power, but without the term for local fluctuations. In Eq. 5, $n_{\sigma G}$ is the number of standard deviations required for global variations, and is given by Table 3; $n_{\sigma L}$ is the number of standard deviations required for local fluctuations. The relationship in Eq. 6 is approximately valid for N > 2 and is plotted in Fig. 3.

Figure 3: Plot of number of local standard deviations versus N, the number of critical paths.

Since $n_{\sigma G}$ and $n_{\sigma L}$ can be determined for any particular set of parameters, Eq. 5 would readily give the desired worst case delay for a given set of nominal parameters if $\sigma_{\tau L}$ and $\sigma_{\tau G}$ were known. If the delay and power were linear with respect to the independent parameters, it would be easy enough to calculate these σ's from standard statistical theory. Unfortunately, this problem space is quite nonlinear. Various ways of handling the statistics in this nonlinear space have been considered. The only reasonably tractable approach that has been found is to use a linearity-based theory to estimate the most probable worst case parameter values, and then to carry out interpolations for those parameter values in the full nonlinear table. In this way the nonlinearity is at least partially taken into account. The linearity approximation appears reasonably good for the inverse delay, but may not be as good for the power. For simplicity, the following discussion is in terms of delay. The actual calculation is done twice, once in terms of inverse delay and once in terms of power, to obtain the worst case inverse delay and the worst case power, respectively.

If the delay is a sufficiently linear function of the independent parameters, it can be expressed as

$$\tau = \tau_0 + \frac{\partial\tau}{\partial l_G}\,\delta l_G + \frac{\partial\tau}{\partial V_T}\,\delta V_T + \frac{\partial\tau}{\partial V_{DD}}\,\delta V_{DD}. \tag{7}$$

In this case, the standard deviation of τ is given by

$$\sigma_{\tau G} = \sqrt{\left(\frac{\partial\tau}{\partial l_G}\right)^2\sigma_{G,l_G}^2 + \left(\frac{\partial\tau}{\partial V_T}\right)^2\sigma_{G,V_T}^2 + \left(\frac{\partial\tau}{\partial V_{DD}}\right)^2\sigma_{G,V_{DD}}^2}. \tag{8}$$

If an excursion of Δτ occurs in τ, it can be shown that the most probable values of the independent variables required to generate that excursion are

$$\delta l_G = \frac{\partial\tau}{\partial l_G}\,\frac{\sigma_{G,l_G}^2}{\sigma_{\tau G}^2}\,\Delta\tau, \qquad \delta V_T = \frac{\partial\tau}{\partial V_T}\,\frac{\sigma_{G,V_T}^2}{\sigma_{\tau G}^2}\,\Delta\tau, \qquad \delta V_{DD} = \frac{\partial\tau}{\partial V_{DD}}\,\frac{\sigma_{G,V_{DD}}^2}{\sigma_{\tau G}^2}\,\Delta\tau. \tag{9}$$

To compute these excursions, Δτ is taken equal to $\tau_{WC} - \tau_G$ from Eq. 5, with $\sigma_{\tau G}$, $n_{\sigma G}$ and $n_{\sigma L}$ given by Eq. 8, Table 3, and Eq. 6, respectively. $\sigma_{\tau L}$ is given by

$$\sigma_{\tau L}^2 = \left(\frac{\partial\tau}{\partial l_G}\right)^2 C_{l_G}^2\,\sigma_{L,l_G}^2 + \left(\frac{\partial\tau}{\partial V_T}\right)^2\left(C_{V_T}^2\,\sigma_{L,V_T}^2 + C_{V_{DD}}^2\,\sigma_{L,V_{DD}}^2\right), \tag{10}$$

in accordance with the previous discussion of the correlations between local and global fluctuations.

Results

Typical results for these optimizations are shown in Figs. 4 and 5 for the two-bit section of a static CMOS adder circuit. In these optimizations $n_{\sigma L}$ and $n_{\sigma G}$ are 3.0 and 1.64, respectively. Fig. 4 illustrates the effect of parameter variations on the minimum achievable power dissipation: the design points in Fig. 4(b) tolerate twice as much parameter variation as those in Fig. 4(a), and require 1.5-2X more power.
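The linearized worst case step of Eqs. 5 and 8-10 that underlies these results can be sketched as below. The nominal standard deviations come from Table 4 and the correlation coefficients from the text, with the percentage supply tolerances converted to volts at VDD = 0.98 V (Table 5, case 1), and $n_{st} = 7$ follows the 7-delay cycle of Fig. 4. The delay derivatives in `GRADS` are hypothetical placeholders, since the paper obtains them from its simulation table. Applying the $1/\sqrt{n_{st}}$ path-averaging factor to the result of Eq. 10 is also an assumption about how the two statements combine. The final check confirms the defining property of Eq. 9: the most probable shifts reproduce the requested excursion, $\sum_x (\partial\tau/\partial x)\,\delta x = \Delta\tau$.

```python
import math

N_SIGMA_L, N_SIGMA_G = 3.0, 1.64      # values used for Figs. 4 and 5
VDD_NOM = 0.98                        # V, Table 5 case 1 (0.6 ns design point)
N_ST = 7                              # block delays per cycle (Fig. 4 caption)

# Hypothetical delay derivatives in ns/um, ns/V, ns/V; the paper takes these
# from its tabulated simulations instead.
GRADS = {"lG": 4.0, "VT": 1.2, "VDD": -0.6}
# Table 4 nominal sigmas in um and V (supply percentages taken of VDD_NOM).
SIG_G = {"lG": 0.006, "VT": 0.010, "VDD": 0.02 * VDD_NOM}
SIG_L = {"lG": 0.005, "VT": 0.007, "VDD": 0.04 * VDD_NOM}
CORR = {"lG": 0.38, "VT": 0.6, "VDD": 0.9}   # C_lG, C_VT, C_VDD from the text


def sigma_tau_global(grads, sig_g):
    """Eq. 8: global delay spread from linearized sensitivities."""
    return math.sqrt(sum((grads[k] * sig_g[k]) ** 2 for k in grads))


def sigma_tau_local(grads, sig_l, corr, n_st=1):
    """Eq. 10 (local VDD mapped onto the VT derivative), divided by sqrt(n_st)
    as an assumed path-averaging step (see the note after Eq. 1)."""
    var = (grads["lG"] * corr["lG"] * sig_l["lG"]) ** 2 + grads["VT"] ** 2 * (
        (corr["VT"] * sig_l["VT"]) ** 2 + (corr["VDD"] * sig_l["VDD"]) ** 2)
    return math.sqrt(var / n_st)


def most_probable_shifts(grads, sig_g, delta_tau):
    """Eq. 9: most probable global parameter shifts producing delta_tau."""
    s2 = sigma_tau_global(grads, sig_g) ** 2
    return {k: grads[k] * sig_g[k] ** 2 * delta_tau / s2 for k in grads}


s_g = sigma_tau_global(GRADS, SIG_G)
s_l = sigma_tau_local(GRADS, SIG_L, CORR, n_st=N_ST)
delta_tau = N_SIGMA_L * s_l + N_SIGMA_G * s_g          # Eq. 5
shifts = most_probable_shifts(GRADS, SIG_G, delta_tau)
print(f"sigma_tauG = {s_g:.4f} ns, sigma_tauL = {s_l:.4f} ns, delta_tau = {delta_tau:.4f} ns")
print({k: round(v, 4) for k, v in shifts.items()})
```

The shifts (longer gate, higher VT, lower VDD) are the point at which the full nonlinear table would then be interpolated.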

Figure 4: Plot of worst case delay versus nominal gate length for a static CMOS adder, for (a) 1X tolerances and (b) 2X tolerances. At each point the threshold voltage and supply voltage have been optimized to achieve the minimum possible worst case power for that particular nominal channel length and worst case delay. In determining power, the cycle time is taken to be 7 times the circuit delay indicated in the plot, and the activity factor is taken to be 0.05. The tolerances in (a) are as given in Table 4; for (b) the tolerances are doubled.
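The settings in the Fig. 4 caption appear to connect directly to the activity factor/logic depth ratio of 0.0018 used in the discussion of Fig. 6: a two-bit carry block is four gate delays, so a cycle of 7 block delays corresponds to a logic depth of $4n_{st} = 28$, one of the values listed for Fig. 6. A quick arithmetic sketch, assuming $n_{st} = 7$ for these plots:

```python
GATES_PER_BLOCK = 4     # two-bit ripple carry = four gate delays
N_ST = 7                # cycle time = 7 block delays (Fig. 4 caption)
ACTIVITY = 0.05         # activity factor (Fig. 4 caption)
DESIGN_DELAY_NS = 0.6   # the 0.6 ns design point used for Tables 4 and 5

logic_depth = GATES_PER_BLOCK * N_ST      # gate delays per clock cycle
ratio = ACTIVITY / logic_depth            # activity factor / logic depth
cycle_ns = N_ST * DESIGN_DELAY_NS         # clock period at the design point

print(f"logic depth = {logic_depth}, a/depth = {ratio:.4f}, cycle = {cycle_ns:.1f} ns")
```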

Note also that for the longer delays the optimum gate length is longer than the nominal technology gate length of 0.1 µm, owing to the smaller variability of threshold voltage at longer gate lengths. Power-delay curves extracted from these optimizations show power varying as $\tau^{-1.3}$ to $\tau^{-3}$ as the delay varies from long to short, indicating that the best low power trade-offs occur for moderately fast circuits. Fig. 5 shows that the optimum supply voltages can readily drop below 1 V, and can even reach 0.5 V if the speed target is slow enough. The optimum threshold increases for slow circuits (to reduce static dissipation), and rises by about 20-100 mV from (a) to (b) to accommodate the increased tolerances, with the largest increases occurring for the shortest gate lengths. For 0.1 µm technology it appears that the model parameter threshold voltage should be chosen in the range 0.34-0.45 V, depending on the desired speed and anticipated process tolerances. In the usual case where the VT of a given technology cannot be varied, the results in Figs. 4 and 5 indicate that the optimum low power design point for slow circuits may well be at longer-than-minimum gate lengths.

Figure 5: Optimum voltages and thresholds corresponding to Fig. 4, for (a) 1X tolerances and (b) 2X tolerances. Each point indicates the optimized threshold and supply voltage for a given nominal gate length and worst case delay requirement. The points along each curve are for lG = 0.09, 0.1, 0.11, 0.12, 0.14, 0.17, and 0.2 µm, from top to bottom. There are no valid solutions for the shortest delays at the longer gate lengths, and optimizations that reached the maximum permitted supply voltage (1.8 V) have also been excluded.

The sensitivity of the optimum design point to the six tolerance parameters is tabulated in Table 4, using the 0.1 µm, 0.6 ns design point. The first three rows are the global variations, and the last three rows are the local variations. As can be seen, global uncertainty in the gate length has the most effect on the design point. The remaining variations have little effect on the VT design value, but both the global and local supply variations have a significant effect on VDD and on power dissipation.

TABLE 4. Nominal variations and design point sensitivities. $n_{\sigma L}$ and $n_{\sigma G}$ are 3.0 and 1.64, respectively.

Variation   Nom. value   ΔP_WC/Δσ      ΔVT/Δσ       ΔVDD/Δσ
σ_G,lG      6 nm         0.54 µW/nm    10 mV/nm     33 mV/nm
σ_G,VT      10 mV        0.09 µW/mV    1.6 mV/mV    6 mV/mV
σ_G,VDD     2%           0.39 µW/%     -0.6 mV/%    10 mV/%
σ_L,lG      5 nm         0.02 µW/nm    0.5 mV/nm    2 mV/nm
σ_L,VT      7 mV         0.02 µW/mV    0.8 mV/mV    3 mV/mV
σ_L,VDD     4%           0.34 µW/%     0.3 mV/%     17 mV/%

The dependence of the design points on activity factor and logic depth is illustrated in Fig. 6, where the same parameter tolerances have been used as in Figs. 4 and 5. As shown, the optimum nominal threshold voltage can become quite low for high duty factor and/or short logic depth circuits, but these conditions are not expected to be common in low power circuitry. The consistency of the center design points in Fig. 6 can be seen by observing that the 3 1/2 orders of magnitude of the on/off current ratio are roughly balanced by the activity factor/logic depth ratio of 0.0018 multiplied by the static-to-dynamic power ratio at the power minimum, which is around 0.2 (because of the very fast VT dependence of the static power).

Figure 6: Plot of (a) supply voltage and (b) threshold voltage versus the activity factor-to-logic depth ratio for four different delay constraints. Logic depth = $4n_{st}$ = 48, 40, 28, and 8 for the data points from left to right across the plots.

To better understand the reasons for the optimum threshold and supply voltages obtained, a series of optimizations has been carried out with progressively greater simplification. The results are shown in Table 5, again for the 0.1 µm, 0.6 ns design point. The first row is for the full FET model and the complete set of nominal parameter variations (given in Table 4). In the second row, the full FET model is used, but all parameter variations have been removed, so that this optimization represents a purely nominal design with no margins. This results in a 22% reduction in VDD and 42% lower power. In the third row, there are no variations and the two-dimensional effects have been removed from the FET model, making it equivalent to older technology. This removal of 2D effects results in only a slight (7%) reduction in power. The fourth row is the same as the third, but with the temperature reduced to 0 °C (and the subthreshold slope correspondingly reduced to 80 mV/dec), resulting in a substantial reduction in voltages and power. Finally, the fifth case shows the effect of reducing the subthreshold slope to an ideal value of 55 mV/decade (at 273 K), which (if possible) would permit operation at very low voltages. If this fifth case is optimized instead for 5 ns delay, the optimum VT and VDD are 0.21 V and 0.26 V, respectively, which is roughly comparable to previous optimization work for ultra low power [1,3]. As can be seen, the reduction of the subthreshold slope is required to enable these very low voltages and powers.

Conclusions

Robust low power technology design points require realistic FET characteristics and careful consideration of the process and circuit variations that will occur. Simplified models may result in unrealistically low voltage and power expectations. For moderately active logic circuitry and the aggressive technology (0.1 µm channel length) and process/circuit tolerances considered here, it appears that the measured VT (extrapolated from ID curves) should be in the 0.24-0.35 V range, while the supply voltage may be as low as 0.6 V for relatively slow circuit designs.

References

1. J. Burr and A. Peterson, NASA VLSI Design Symp., pp. 4.2.1-13 (1991).
2. A. J. Bhavnagarwala, V. K. De, B. Austin and J. D. Meindl, “Circuit Techniques for Low Power CMOS GSI,” 1996 ISLPED Dig. Tech. Papers, Monterey, pp. 193-196.
3. Z. Chen, J. Shott, J. Burr and J. D. Plummer, “CMOS Technology Scaling for Low Voltage Low Power Applications,” 1994 SLPE Dig. Tech. Papers, San Diego, pp. 56-57.
4. D. Liu and C. Svensson, “Trading Speed for Low Power by Choice of Supply and Threshold Voltages,” IEEE J. Solid-State Circ. 28, p. 10 (1993).
5. Z. Chen, J. Burr, J. Shott and J. D. Plummer, “Optimization of Quarter Micron MOSFETs for Low Voltage/Low Power Applications,” 1995 IEDM Tech. Dig., pp. 63-66.
6. G. Merckel, J. Borel and N. Z. Cupcea, “An accurate large-signal MOS transistor model for use in computer-aided design,” IEEE TED, ED-19, pp. 681-690 (1972).
7. G. Baccarani and M. R. Wordeman, “Transconductance degradation in thin-oxide MOSFETs,” IEEE TED, ED-30, pp. 1295-1304 (1983).
8. M. Eisele, J. Berthold, D. Schmitt-Landsiedel and R. Mahnkopf, “The Impact of Intra-Die Device Parameter Variations on Path Delays and on the Design for Yield of Low Voltage Digital Circuits,” 1996 ISLPED Dig. Tech. Papers, Monterey, pp. 237-242.

TABLE 5. Variation of design point results with model simplifications.

Case  Variations  2D   Temp. (°C)  Sub-VT slope (mV/dec)  VT (V)  VDD (V)  Power (µW)
1     yes         yes  85          108                    0.352   0.980    8.57
2     no          yes  85          108                    0.32    0.762    4.97
3     no          no   85          108                    0.29    0.726    4.64
4     no          no   0           80                     0.23    0.606    3.14
5     no          no   0           55                     0.15    0.455    1.75
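Several numbers quoted in the discussion of Table 4, Fig. 6 and Table 5 can be reproduced with simple arithmetic, sketched below. The Fig. 6 balance uses the first-order estimates $P_{dyn} \propto (a/\text{depth})\,V_{DD} I_{on}$ and $P_{stat} \propto V_{DD} I_{off}$, so that $I_{on}/I_{off} \approx 1/[(a/\text{depth})(P_{stat}/P_{dyn})]$; this reading of the text is an assumption. The last part linearly extrapolates the Table 4 power sensitivities to a doubling of all six tolerances. Because those sensitivities are local derivatives at the 0.6 ns point, this is only a rough consistency check against the 1.5-2X range quoted for Fig. 4(b).

```python
import math

# Table 5: case -> (VT [V], VDD [V], power [uW]) at the 0.1 um, 0.6 ns design point.
TABLE5 = {
    1: (0.352, 0.980, 8.57),
    2: (0.32, 0.762, 4.97),
    3: (0.29, 0.726, 4.64),
    4: (0.23, 0.606, 3.14),
    5: (0.15, 0.455, 1.75),
}

# Table 4: (nominal sigma, dP_WC/dsigma in uW per unit of that sigma).
TABLE4 = {
    "G,lG": (6.0, 0.54), "G,VT": (10.0, 0.09), "G,VDD": (2.0, 0.39),
    "L,lG": (5.0, 0.02), "L,VT": (7.0, 0.02), "L,VDD": (4.0, 0.34),
}


def pct_drop(old, new):
    """Percentage reduction going from old to new."""
    return 100.0 * (old - new) / old


vdd_drop = pct_drop(TABLE5[1][1], TABLE5[2][1])       # variations removed: VDD
power_drop = pct_drop(TABLE5[1][2], TABLE5[2][2])     # variations removed: power
power_drop_2d = pct_drop(TABLE5[2][2], TABLE5[3][2])  # 2D effects removed: power

# Fig. 6 balance (assumed first-order reading): decades of Ion/Ioff.
on_off_decades = -math.log10(0.0018 * 0.2)

# Doubling every tolerance adds one nominal sigma to each; extrapolate linearly.
extra_power = sum(sigma * dp for sigma, dp in TABLE4.values())
power_ratio_2x = (TABLE5[1][2] + extra_power) / TABLE5[1][2]

print(f"VDD -{vdd_drop:.0f}%, power -{power_drop:.0f}%, 2D -{power_drop_2d:.0f}%")
print(f"Ion/Ioff ~ 10^{on_off_decades:.2f}, 2X-tolerance power ~ {power_ratio_2x:.2f}X")
```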