Microelectronics Journal 44 (2013) 970–976


Modeling and characterization of thermally induced skew on clock distribution networks of nanometric ICs

Alessandro Sassone, Wei Liu, Andrea Calimera*, Alberto Macii, Enrico Macii, Massimo Poncino

Politecnico di Torino, 10129 Torino, Italy

Article info

Article history: Received 4 December 2011; received in revised form 3 July 2012; accepted 10 July 2012; available online 3 August 2012.

Keywords: Temperature analysis; Simulation; Clock tree; Skew; ITD

Abstract

Temperature has traditionally been a key parameter to take into account during the many stages of IC design flows, and in particular, during the sign-off phases of critical circuit components like the Clock Distribution Networks (CDNs). While for old technologies this task was accomplished by means of worst case corner-based static analysis, the advent of nanometric CMOS technologies made this approach intrinsically inadequate. This paper provides a detailed analysis of clock skew variations induced by non-uniform thermal profiles on tree-like CDNs. Using a dedicated simulation framework, we characterized the complex thermal effects that metal interconnects and buffers under inverted temperature dependence (ITD) may induce on the clock tree. Experiments conducted on a synthetic, thermal-programmable benchmark underline the presence of unexpected behaviors that standard tools are not able to catch. © 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Guaranteeing the correct functionality of synchronous circuits requires not only that logic paths meet the set-up constraints imposed by registers and flip-flops, but also that clock signals are delivered with minimal phase shift to all the registers and flip-flops distributed across the chip. A common practice to address this issue is to design ad-hoc Clock Distribution Networks (CDNs) made up of buffered global interconnects routed with symmetric topological schemes, like the H-tree scheme [1]. Although many algorithms for automated clock routing have been available for quite a long time, e.g., [2,3], the design of robust CDNs requires synthesis and analysis tools able to capture all the complex effects that may alter the nominal behavior of buffers and metal interconnects. This becomes extremely critical when considering nanometer ICs whose thermal profile is characterized by peak temperatures of 100 °C and spatial gradients of around 50 °C [4]. The clock signal is extremely sensitive to temperature conditions: it spans the entire circuit layout, crossing a sequence of physical regions with widely varying substrate temperatures. Overlooking this issue during the synthesis of the CDN can generate faulty circuits, where race conditions between different paths in the clock tree may induce excessive clock skew, and thus, possible de-synchronization among sequential elements.

* Corresponding author. E-mail address: [email protected] (A. Calimera).

0026-2692/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.mejo.2012.07.007

To overcome this issue, several recent works proposed thermal-driven clock tree optimizations in which the temperature is considered as a direct design variable during the optimization phase. The authors of [5], for instance, modified the deferred-merge embedding (DME) method (typically used for minimizing total wirelength and power consumption with zero or bounded skew constraints) to minimize clock skew for both uniform and non-uniform thermal profiles, while in [6] a sequence of linear programming methods was used to minimize clock skew in a buffered clock tree. A different design paradigm has been presented in [7,8]. In these works the authors make use of run-time skew minimization strategies where dynamically adjustable delay buffers can be programmed according to the temperature variations monitored by thermal sensors in different regions of the die.

Although important, considering non-uniform thermal profiles is not the only critical aspect. One should also take into account that, unlike in older technologies, where the propagation delay across both metal wires and MOS transistors increases monotonically with temperature, in modern technologies (i.e., ≤ 65 nm) this assumption may no longer hold. While metal interconnects still show a resistivity that increases linearly with temperature [9], thus leading to worst-case delay degradation at high temperatures, for CMOS gates the delay-temperature relationship gets more complicated. Recent works [10] have shown that depending on various parameters, such as cell size, load, supply voltage, and threshold voltage ($V_{th}$), the delay can show a direct temperature dependence (i.e., the delay increases with temperature) or an inverted temperature dependence, ITD (i.e., the delay decreases with temperature). It is therefore possible that the slow corner, typically assumed at high temperature, may coincide with low temperatures [11].

This contrasting behavior between interconnects and active gates may invalidate the output obtained through standard tools, even if they consider non-uniform temperature distributions. Most of those tools in fact rely on static timing analysis engines, which are fast and versatile but not able to capture transistor-level effects like the ITD [12].

Given the complex scenario described above, the aim of this work is to analyze and quantify the effects of temperature on CDNs, also including second-order effects that surfaced with the new technologies. In order to support such analysis we implemented a clock skew modeling method that allows annotating the temperature of individual clock tree elements and performing accurate SPICE-level timing analysis that can capture the inverted temperature dependence of active transistors. The tool has been integrated within a commercial design framework and is fully compatible with industrial technology kits. Without loss of generality, the characterization has been done on a flexible, reconfigurable and thermal-programmable circuit that consists of a two-dimensional grid of micro-heaters mapped into a 45 nm technology by STMicroelectronics. Simulation results show that larger clock skew variations may occur under thermal conditions different from flat high-temperature profiles and, most importantly, that the worst-case corner can be at either low or high temperatures depending on the supply voltage at which the circuit is powered.

2. Modeling of thermal effects

2.1. Metal interconnects

High temperature causes performance degradation in metal interconnects. The temperature rise within the wire is due to both self-heating and heat diffusion from active devices on the substrate layer. According to [13], a comprehensive equation for modeling the temperature $T(x)$ along a metal wire of length $L$ that is routed on a substrate with temperature $T_{sub}$ can be expressed as

$$T(x) = T_{sub} + \frac{\theta}{\lambda^2}\left(1 - \frac{\sinh \lambda x + \sinh \lambda (L - x)}{\sinh \lambda L}\right) \tag{1}$$

where $\theta$ and $\lambda$ are parameters whose values are functions of the current density injected in the wire, the thermal conductivity of the metal used to route the wire, and the insulator through which the wire itself is routed.

When the temperature on the metal layers rises, the interconnect resistance increases as well. The linear dependence of metal resistivity on temperature is usually expressed as

$$R(x) = R_0 \left(1 + \beta \cdot T(x)\right) \tag{2}$$

where $R(x)$ is the effective resistance at location $x$, $R_0$ is the resistance measured at the reference temperature (typically 25 °C), and $\beta$ is the temperature coefficient (1/°C). The value of $\beta$ for copper at room temperature is $3.9 \times 10^{-3}$, which means that for every 10 °C rise in temperature the resistance increases by 3.9%. Notice that the parasitic capacitance associated with a metal interconnect is largely temperature independent and is usually considered constant with changes in temperature.

The delay of an interconnect subject to a given temperature profile can therefore be modeled using the distributed RC Elmore delay model of [14]

$$D = D_0 + (c_0 L + C_L)\, r_0 \beta \int_0^L T(x)\,dx - c_0 r_0 \beta \int_0^L x\,T(x)\,dx \tag{3}$$

where $T(x)$ is given by Eq. (1), while $D_0$, that is, the Elmore delay of the interconnect corresponding to the unit length resistance at the reference temperature, is given by

$$D_0 = R_d (c_0 L + C_L) + \left(c_0 r_0 \frac{L^2}{2} + r_0 L C_L\right) \tag{4}$$

with $R_d$ the driver cell's ON resistance, $c_0$ and $r_0$ the metal's capacitance and resistance per unit length, and $C_L$ the load capacitance.

2.2. MOS transistors and CMOS buffers

The propagation delay through a CMOS gate is a direct function of the total active current $I_d$ drawn by the internal transistors during a state transition. Hence, the physical effects which govern the thermal behavior of the transistors also influence the speed of the gate. Using the alpha-power model described in [15], the active drain current $I_d$ of a short-channel MOSFET can be modeled as

$$I_d(T) = \begin{cases} \mu(T)\,\dfrac{W}{L_{eff}}\,k_l\,\left(V_{gs} - V_{th}(T)\right)^{\alpha/2}\,V_{ds}, & V_{ds} \le V_{dsat} \\[2mm] v_{sat}(T)\,W\,k_s\,\left(V_{gs} - V_{th}(T)\right)^{\alpha}, & V_{ds} \ge V_{dsat} \end{cases} \tag{5}$$

where the first equation describes the drain current in the linear region (i.e., $V_{ds} \le V_{dsat}$), while the second one refers to the saturation region (i.e., $V_{ds} > V_{dsat}$); $W$ and $L_{eff}$ represent the channel dimensions, while $k_l$ and $k_s$ are constants that lump various technology-dependent quantities and $\alpha$ is the exponent of the alpha-power law [16].

Three parameters are strongly dependent on temperature: the carrier mobility $\mu$ in the linear region, the saturation velocity $v_{sat}$ in the saturation region, and the threshold voltage $V_{th}$ in both. The temperature dependence of the carrier mobility is expressed as

$$\mu(T) = \mu(T_0)\left(\frac{T_0}{T}\right)^{m} \tag{6}$$

where $T$ is the junction temperature, $T_0$ is the nominal temperature (typically 27 °C) and $m$ is the temperature coefficient, which is ideally 1.5 but can vary depending on the process. The saturation velocity has a more linear relationship with temperature, and the dependence is weaker than that of the mobility

$$v_{sat}(T) = v_{sat}(T_0) - \eta\,(T - T_0) \tag{7}$$

where the temperature coefficient $\eta$ has an extracted value around 150 m s$^{-1}$ K$^{-1}$. Finally, the temperature dependence of the threshold voltage can be expressed as

$$V_{th}(T) = V_{th}(T_0) - \kappa\,(T - T_0) \tag{8}$$

where the temperature coefficient $\kappa$ is measured to be around 0.8 mV K$^{-1}$.

It is evident that all three quantities decrease with increasing temperature; however, they affect the drain current in opposite ways: while a lower $\mu$ (in the linear region) or $v_{sat}$ (in the saturation region) causes the drain current to decrease, a lower $V_{th}$ causes the drain current to increase. Depending on which parameter dominates, the current of a transistor, and hence the speed of the gate, will either increase or decrease as temperature increases. At high supply voltages, i.e., $V_{dd} \gg V_{th}$, the gate overdrive ($V_{gs} - V_{th}$) is less sensitive to thermally induced variations in $V_{th}$, the mobility (or saturation velocity) effect dominates and the drain current decreases with temperature. This makes a CMOS gate slower as temperature increases. At low supply voltages, i.e., $V_{dd} \ge V_{th}$, the quantity ($V_{gs} - V_{th}$) becomes smaller and thus more sensitive to changes in $V_{th}$; the thermal effect on $V_{th}$ dominates, and the drain current increases with temperature. This makes a CMOS gate faster as temperature increases. In other words, depending on the $V_{dd}/V_{th}$ ratio, the transistor delay can either increase ($V_{dd}/V_{th} \gg 1$) or decrease ($V_{dd}/V_{th} \ge 1$) as temperature increases. The latter case is usually referred to as inverted temperature dependence (ITD), and the supply voltage at which the temperature dependence inverts is called the zero-temperature coefficient (ZTC) voltage $V_{ZTC}$.

Fig. 1 shows the characterization results of a clock buffer using SPICE simulation. The propagation delay is plotted against an increasing supply voltage for different temperatures. The ITD effect can be clearly observed at low supply voltages: with a supply voltage of 0.75 V, for example, the delay increases as the temperature decreases, and the largest delay occurs at −40 °C. As the supply voltage increases, the ITD effect becomes less apparent and eventually the curves for different temperatures cross over, showing direct temperature dependence. At the ZTC voltage the buffer is almost insensitive to temperature variations.

Fig. 1. Temperature-dependent propagation delay for different supply voltages and temperatures. (Plot of propagation delay [ps] versus supply voltage [V] from 0.75 V to 1.3 V, one curve per temperature from −40 °C to 150 °C.)

Fig. 2. Paths subject to different temperatures develop different delays. (Diagram labels: source, buffer, sink1, sink2. For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)
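The competition between saturation velocity and threshold voltage described in Section 2.2 can be reproduced with a few lines of code. The sketch below evaluates only the saturation branch of Eq. (5) together with Eqs. (7) and (8), and approximates the gate delay as proportional to $V_{dd}/I_d$. The coefficients `ETA` and `KAPPA` are the values quoted in the text; `VSAT0`, `VTH0`, `ALPHA` and all function names are illustrative assumptions, not parameters of the 45 nm technology used in the paper, so the voltage at which the trend flips in this toy model is not expected to match the measured $V_{ZTC}$.

```python
"""Toy alpha-power model of a saturated MOSFET, used to illustrate ITD."""

T0 = 27.0        # nominal temperature [deg C], as quoted in the text
ETA = 150.0      # v_sat temperature coefficient [m/s/K], from the text
KAPPA = 0.8e-3   # V_th temperature coefficient [V/K], from the text
VSAT0 = 1.0e5    # v_sat at T0 [m/s]       (assumed, illustrative)
VTH0 = 0.45      # V_th at T0 [V]          (assumed, illustrative)
ALPHA = 1.3      # alpha-power exponent    (assumed, illustrative)


def v_sat(temp_c):
    """Saturation velocity versus temperature, Eq. (7)."""
    return VSAT0 - ETA * (temp_c - T0)


def v_th(temp_c):
    """Threshold voltage versus temperature, Eq. (8)."""
    return VTH0 - KAPPA * (temp_c - T0)


def drain_current(vdd, temp_c, w_ks=1.0):
    """Saturation branch of Eq. (5) with V_gs = V_dd; w_ks lumps W * k_s."""
    overdrive = vdd - v_th(temp_c)
    if overdrive <= 0.0:
        raise ValueError("model only valid above threshold")
    return v_sat(temp_c) * w_ks * overdrive ** ALPHA


def relative_delay(vdd, temp_c):
    """Gate delay up to a constant factor: t_d ~ C * V_dd / I_d."""
    return vdd / drain_current(vdd, temp_c)


def temperature_trend(vdd, t_cold=-40.0, t_hot=125.0):
    """Return 'inverted' if the gate is slower when cold, else 'direct'."""
    slower_when_cold = relative_delay(vdd, t_cold) > relative_delay(vdd, t_hot)
    return "inverted" if slower_when_cold else "direct"


if __name__ == "__main__":
    for vdd in (0.75, 0.9, 1.1, 1.3):
        print(f"Vdd = {vdd:.2f} V -> {temperature_trend(vdd)} dependence")
```

With these assumed parameters the low-voltage end of the sweep is slower when cold (ITD), while the high-voltage end is slower when hot, which is the qualitative behavior reported for the clock buffer in Fig. 1.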

2.3. Temperature induced clock skew

Temperature may have a significant and complex impact on the delay of both metal interconnects and CMOS devices. As a result, signal paths that span long distances and are subject to large thermal gradients are the most prone to thermally induced delay mismatches. The clock distribution network, being the largest on-chip network, plays a crucial role in the correct operation of synchronous systems. It is also particularly vulnerable to thermally induced delay variations, which can cause synchronization errors and, in the worst case, result in circuit failure.

In a clock distribution network, clock edges may arrive at the sinks at different points in time due to delay imbalance. A phase shift in the arrival times is commonly referred to as clock skew. More formally, the clock skew is defined as the maximum difference between the arrival times of the clock signal at any two sinks in the set of clock pins

$$\text{skew} = \left|\max\{t_i - t_j\}\right|, \quad i, j \in S \tag{9}$$

where $S$ is the set of all sinks of the design.

Standard design flows usually rely on corner-based analysis, accounting for process, voltage and temperature variation effects. Environmental parameters such as high supply voltage and low temperature are assumed as the Best Case scenario, while low supply voltage and high temperature are assumed as the Worst Case scenario. However, due to the contrasting thermal behavior between transistor and metal levels, skew analysis is typically more complex and difficult. Fig. 2 illustrates the case of a clock tree where two paths (Source-to-Sink1 and Source-to-Sink2) having the same length but different temperatures can develop unequal delays due to the thermal effect. While hot (i.e., red) regions correspond to the worst case for all the metal interconnects, for buffers one should consider whether $V_{dd}$ is below $V_{ZTC}$; in that case, cold (i.e., blue) regions correspond to the worst-case condition. Consequently, the "true" Worst Case scenario may not lie at the highest temperature.

3. Clock tree simulation framework

Standard commercial design tools analyze clock skew assuming uniform temperature profiles given by technology libraries pre-characterized under different process, voltage, and temperature (PVT) corners. As discussed in previous sections, large thermal gradients on the die and the possible presence of the ITD effect make this simple assumption invalid, since the "Worst Case" scenario may not lie in the corners. We implemented a dedicated clock skew analysis framework that is fully integrated with commercial design tools and libraries, and can take into account the spatial temperature gradients across the die and the complex delay-temperature dependencies of CMOS gates and interconnects. Fig. 3 shows a block diagram of the proposed exploration framework. The flow entails four main steps:

1. Logic and physical synthesis: This first phase consists of a standard logic synthesis followed by a place-and-route stage where gates are placed in layout rows and the clock distribution network is fully synthesized and routed as a clock tree; intermediate steps include area-, power- and performance-driven optimizations.

The next three phases are independent of the kind of algorithms adopted during the clock tree synthesis.

2. Logic-thermal simulation: At this stage we estimate the thermal profile of the layout, also considering the uneven switching activity of internal gates. The behavior of the post-synthesis netlist is simulated with a logic simulator using randomly generated test vectors. This allows us to extract the signal statistics of the internal nodes, i.e., static probability and number of toggles, which are then back-annotated within a probabilistic power analysis tool (in our framework we use the power analysis engine integrated within the physical design platform). Exploiting the characterization data provided by the silicon vendor, we can finally extract accurate power values for each gate according to its logic activity. To finalize the logic-thermal analysis, both the power consumption and the physical location of each standard cell are used to generate the equivalent RC-based thermal model of the die. A detailed temperature distribution map is then generated as the main output.

3. Parasitic extraction: An accurate estimation of the interconnect parasitics is annotated in a Standard Parasitic Exchange Format (SPEF) file.

4. Clock tree simulation: Taking as input the information generated in the previous steps (i.e., the netlist of the circuit, the thermal map, and the SPEF file), we perform a temperature-driven delay simulation of the clock tree; output reports include delay measurements on the longest and shortest timing arcs, as well as the maximum and minimum arrival times of all the buffers at each level of the clock tree.

Fig. 3. Design flow and simulation framework for thermal analysis of the clock tree.

3.1. The thermal simulator

The heat generated in the transistor junctions during signal transitions is mainly transferred to the ambient environment through conduction. In general, heat conduction can be modeled using Fourier's law. In our work, we use the finite difference method (FDM) to solve the steady-state heat diffusion problem. Our thermal simulation method is described in detail in [17] and we briefly summarize it below.

Using the FDM, a chip is meshed into a uniform three-dimensional grid of thermal cells of the same size. Thermal cells in the mesh are modeled as a subcircuit composed of resistors, capacitors and current sources, based on the analogy between heat diffusion in the thermal domain and current flow in the electrical domain. The thermal mesh can therefore be converted to an equivalent RC circuit to be solved using circuit analysis techniques. Since meshing the chip at the tiny size of metal wires would result in an excessive number of thermal cells, we only obtain temperatures on the device layer, and temperatures on metal layers are computed using Eq. (1). We use SPICE to solve the equivalent RC circuit and obtain the nodal voltage within each thermal cell, which, according to the thermal-electrical analogy, is in fact the temperature at the center of the thermal cell. Power estimation is based on the annotated switching activity obtained through simulation of the post-synthesis netlist with randomly generated test vectors. Using the layout and power consumption information of the standard cells at the post-placement stage, our thermal simulator can produce a highly accurate thermal map.

3.2. The clock tree simulator

This is the actual core of the entire simulation framework. As shown in Fig. 3, it works after the clock tree synthesis (CTS) stage of the design flow, when the clock distribution network is placed and routed following a tree-like structure. The analysis is internally performed in two steps: (a) netlist generation and (b) path delay simulation.

The generated netlist contains the components of the clock tree and their topological connectivity, including active components, i.e., clock buffers, modeled by dedicated SPICE netlists and model cards provided by the silicon vendor, and passive components, i.e., metal wires modeled as lumped RC networks, with the values of parasitic resistance and capacitance read from the extracted SPEF file. The sinks of the clock tree (i.e., the flip-flops fed by the clock signal) are modeled as load capacitances, and the corresponding values are taken from the datasheet provided by the silicon vendor, according to the chosen PVT corner. The locations of clock buffers and RC elements in the netlist are matched against a given thermal map to set the individual temperature values. The temperature of a buffer is simply the value at the same location in the thermal map, while the temperature of an RC element is the value computed using Eq. (1).

In the second step, i.e., the path delay simulation step, a transient analysis is performed to calculate the arrival time for all paths in the clock tree. During the simulation, timing characteristics of the clock tree are measured and reported, including the longest and the shortest timing paths, the maximum and minimum arrival times of all buffers at each tree level and, most importantly, the clock skew.

It is worth emphasizing that the proposed clock skew analyzer allows the annotation of temperatures at individual clock tree elements and performs detailed timing analysis that can capture both the direct and the inverted temperature dependence effects. Beyond analyzing the quality of an obtained design solution, the proposed clock skew analyzer can also be used in an iterative approach where thermal compensation techniques are applied to reduce the extra skew caused by temperature variation.
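The simulator described above relies on SPICE transient analysis, which cannot be reproduced in a few lines. As a much simpler analytical stand-in, the sketch below evaluates the wire temperature profile of Eq. (1), the temperature-aware Elmore delay of Eqs. (3) and (4), and the skew definition of Eq. (9). It follows the equations as written, including the use of $T(x)$ directly in the resistance model. All function names and the numeric wire parameters in the example are hypothetical, chosen only to give picosecond-range delays; buffers and their ITD behavior are deliberately left out.

```python
import math

BETA = 3.9e-3  # temperature coefficient of copper resistance [1/degC], from the text


def wire_temperature(x, length, t_sub, theta, lam):
    """Temperature at position x along the wire, Eq. (1)."""
    ratio = (math.sinh(lam * x) + math.sinh(lam * (length - x))) / math.sinh(lam * length)
    return t_sub + theta / lam ** 2 * (1.0 - ratio)


def wire_delay(length, t_sub, r_d, r0, c0, c_load, theta=0.0, lam=1.0e3, steps=1000):
    """Temperature-aware Elmore delay, Eqs. (3)-(4), using trapezoidal integration.

    theta = 0 disables self-heating, so the wire sits at the substrate temperature.
    """
    # Eq. (4): Elmore delay at the reference temperature.
    d0 = r_d * (c0 * length + c_load) + (c0 * r0 * length ** 2 / 2.0 + r0 * length * c_load)
    h = length / steps
    int_t = 0.0    # integral of T(x) dx over [0, L]
    int_xt = 0.0   # integral of x * T(x) dx over [0, L]
    for k in range(steps + 1):
        x = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        t = wire_temperature(x, length, t_sub, theta, lam)
        int_t += weight * t * h
        int_xt += weight * x * t * h
    # Eq. (3): thermal correction on top of d0.
    return d0 + (c0 * length + c_load) * r0 * BETA * int_t - c0 * r0 * BETA * int_xt


def clock_skew(arrival_times):
    """Eq. (9): largest difference between any two sink arrival times."""
    return max(arrival_times) - min(arrival_times)


if __name__ == "__main__":
    # Two identical wires (hypothetical values) routed over a cold and a hot region.
    wire = dict(length=1.0e-3, r_d=100.0, r0=1.0e5, c0=2.0e-10, c_load=1.0e-14)
    t_cold = wire_delay(t_sub=25.0, **wire)
    t_hot = wire_delay(t_sub=100.0, **wire)
    print(f"cold path: {t_cold * 1e12:.2f} ps, hot path: {t_hot * 1e12:.2f} ps")
    print(f"skew: {clock_skew([t_cold, t_hot]) * 1e12:.2f} ps")
```

Because only the interconnect model is included, the hotter path is always the slower one here; capturing the opposite trend of buffers below $V_{ZTC}$ is precisely what requires the transistor-level simulation used in the framework.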

4. Clock-skew analysis

4.1. A flexible test-case: the micro-heater benchmark

In order to get a true understanding of the impact of thermal effects on the performance of the clock tree, the availability of a huge set of arbitrary power density maps is essential. Unfortunately, generating such maps only through the control of the input data in a generic circuit is extremely difficult, if not infeasible. Borrowing the idea of [18], we designed a dedicated circuit that serves as a benchmark for the thermal characterization. It consists of a thermal-programmable layout made up of a 30 × 30 two-dimensional grid of micro-heater blocks (Fig. 4). Each block, which contains 15 inverter chains with a variable length that ranges from 3 to 19 stages, represents an independent source of heat that can be activated using the enable control signal EN. By applying vectors of enable signals, it is possible to distribute the switching activity across the chip, therefore generating arbitrary thermal maps. A few examples are reported in Fig. 5, which shows a 15 × 15 thermal grid obtained through our thermal simulator. Note that there is no 1-to-1 matching between the number of thermal cells and the number of micro-heater blocks: each thermal cell may include multiple micro-heaters.

The circuit has been synthesized and placed using Synopsys tools belonging to the Galaxy Design Platform, while a 45 nm CMOS technology provided by STMicroelectronics has been used during technology mapping. The obtained circuit, which consists of 171,037 cells, has a clock tree with 526 buffers and 13,500 sinks.
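To make the link between enable vectors and thermal maps concrete, the sketch below turns a 30 × 30 enable pattern into per-cell power on a 15 × 15 grid and then solves a steady-state resistive thermal network by Gauss-Seidel iteration. This is a deliberately reduced illustration and not the simulator of [17], which meshes the chip in three dimensions and solves the equivalent RC circuit with SPICE. The regular 2 × 2 tiling of blocks per thermal cell, the per-block power `p_block`, the conductances `g_lat` and `g_amb`, and all function names are assumptions made for the example.

```python
GRID = 30             # micro-heater blocks per side (from the text)
CELLS = 15            # thermal cells per side (from the text)
TILE = GRID // CELLS  # assumed regular tiling: 2 x 2 blocks per thermal cell


def square_pattern(lo, hi):
    """Enable (EN = 1) the blocks whose row and column lie in [lo, hi)."""
    return [[1 if lo <= r < hi and lo <= c < hi else 0 for c in range(GRID)]
            for r in range(GRID)]


def cell_power(enable, p_block=1.0e-3):
    """Sum the power [W] of the enabled blocks inside each thermal cell.

    p_block is an assumed power per enabled block, not a measured value.
    """
    power = [[0.0] * CELLS for _ in range(CELLS)]
    for r in range(GRID):
        for c in range(GRID):
            if enable[r][c]:
                power[r // TILE][c // TILE] += p_block
    return power


def steady_state(power, t_amb=25.0, g_lat=1.0e-3, g_amb=2.0e-4, sweeps=500):
    """Gauss-Seidel solution of a single-layer resistive thermal network.

    Node balance: g_amb * (T - t_amb) + sum over neighbors g_lat * (T - T_n) = P.
    g_lat and g_amb [W/K] are assumed conductances, not extracted values.
    """
    n = len(power)
    temp = [[t_amb] * n for _ in range(n)]
    for _ in range(sweeps):
        for r in range(n):
            for c in range(n):
                nbrs = [temp[rr][cc]
                        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < n and 0 <= cc < n]
                temp[r][c] = ((power[r][c] + g_amb * t_amb + g_lat * sum(nbrs))
                              / (g_amb + g_lat * len(nbrs)))
    return temp


if __name__ == "__main__":
    # Central hotspot: only the middle 10 x 10 blocks are switching.
    temps = steady_state(cell_power(square_pattern(10, 20)))
    print(f"center: {temps[7][7]:.1f} degC, corner: {temps[0][0]:.1f} degC")
```

Changing the enable pattern (for instance, enabling one edge of the grid) yields a different map from the same code, which is the idea behind the programmable benchmark.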

4.2. Simulation results

The purpose of this section is to validate the functionality of the proposed simulation framework, and also to provide the reader with an in-depth analysis of the effects of non-uniform thermal gradients while considering that active transistors may experience ITD.

Table 1 shows a first set of simulation results where the performance of the clock tree is measured under different temperature distributions but the same supply voltage, i.e., $V_{dd} = 1.1$ V, which is the nominal value for this technology. Each row in the table refers to a different thermal profile. Uniform represents the typical worst case considered by standard tools, i.e., a uniform high-temperature distribution (125 °C). Hotspot is the temperature profile when the center of the layout is thermally overexcited with very high switching activity, while the external ambient temperature is 25 °C. Labels N140, E140, S140 and W140 refer to the previous Hotspot case but assume that the circuit is surrounded by a hotter component (at 140 °C) placed at one of the four edges: up, right, down and left, respectively. Even if we are dealing with standard cell designs, those artificially generated maps emulate thermal patterns that can be obtained in realistic multi-core architectures where the internal temperature of a chip is influenced by neighboring cores. Finally, the last four maps represent specific cases where the longest and the shortest clock paths cross thermal regions showing 50 °C of temperature gradient: l75s25, where the longest path works at 75 °C and the shortest at 25 °C; l75s125, with the longest path at 75 °C and the shortest at 125 °C; l25s75, with the longest path at 25 °C and the shortest at 75 °C; and l125s75, with the longest path at 125 °C and the shortest at 75 °C.

Fig. 4. Abstract view of the placed benchmark. (Hierarchy shown: grid, block, chain; each chain is gated by the enable signal EN.)

Fig. 5. Different thermal maps generated using the micro-heater benchmark and used as test-cases in Table 1 (clockwise: N140, S140, W140, E140). (Four 15 × 15 thermal maps with a color scale spanning 40 to 100.)

Table 1
Experimental results.

Thermal map   ΔTemp (°C)   Longest path (ps)   Shortest path (ps)   Global skew (ps)
Uniform        0.0          441.97              426.24               15.72
Hotspot       15.63         445.34              428.54               16.81 (6.9%)
N140          17.73         450.57              431.85               18.72 (19.0%)
E140          40.54         449.67              431.05               18.63 (18.5%)
S140          45.45         447.40              429.98               17.41 (10.8%)
W140          41.92         447.49              430.01               17.48 (11.2%)
l75s25        50.00         446.63              428.68               17.95 (14.2%)
l75s125       50.00         439.65              423.67               15.98 (1.6%)
l25s75        50.00         444.16              428.20               15.96 (1.5%)
l125s75       50.00         438.09              423.20               14.89 (−5.3%)

In the same table, ΔTemp represents the maximum temperature gradient on the circuit, Longest Path and Shortest Path are the root-to-sink propagation delays along the longest and the shortest paths of the clock tree, respectively, while Global Skew is the clock skew (percentage values represent the difference w.r.t. the Uniform map).

As one can observe, the clock skew is underestimated when considering a flat temperature distribution. The skew reported in the Hotspot row, for instance, is 6.9% larger, while the difference increases even more for the other cases (up to 19% for N140). Interestingly, the increase of the skew is larger for typical thermal maps (i.e., those with a single concentrated hotspot) than for apparently worst-case ones in which the shortest and the longest clock paths exhibit a large temperature gradient. Such numbers underline the deficiency of standard corner-based tools, which fail to correctly capture the thermal effects induced by sources of heat other than internal active transistors. In addition, our simulator is also able to show the impact of hotspots located on critical regions of the circuit, as in the case of heat concentrated on the longest/shortest paths (14.2% of error for l75s25).

The second set of experiments aims at showing the effect of ITD on the clock tree. The plots in Fig. 6 show the skew-temperature (left) and skew-voltage (right) relationships for operating temperatures which range from −40 °C to 140 °C (uniform distribution) and supply voltages from 0.8 V to 1.3 V (the nominal supply voltage is 1.1 V).

Fig. 6. Clock skew temperature dependence. (Left: skew [ps] versus temperature [°C], one curve per supply voltage from 0.8 V to 1.3 V. Right: skew [ps] versus supply voltage [V], one curve per temperature from −40 °C to 140 °C.)

Let us first focus on the left plot. For large values of $V_{dd}$, the traditional direct temperature dependence holds, i.e., the skew increases with temperature (bottom curves with a positive yet moderate slope). As $V_{dd}$ is set to lower values, the clock skew gets progressively insensitive to temperature variations (the flat curves in the middle), before showing an inverted dependence at very low $V_{dd}$, i.e., the skew reduces with temperature (the upper curves). It is worth emphasizing that for intermediate values of $V_{dd}$, e.g., 1.1 V, the clock skew is not monotonic in temperature, and the worst case appears at an intermediate temperature around 55 °C. This causes trouble for standard tools, as it is difficult to analytically determine (off-line) the exact temperature that maximizes (or minimizes) the delay of a path.

From the plot on the right it is also possible to abstract the concept of ZTC at the clock tree level, namely to find the voltage at which the clock shows zero skew variation due to temperature. The $V_{ZTC}$ can be graphically extracted as the x-coordinate at which the curves intersect each other, ≈ 1.0 V. Identifying the $V_{ZTC}$ is crucial not just for the sign-off stages, but also during the optimization phases.

If the circuit is powered with $V_{dd} < V_{ZTC}$, buffers show an inverted temperature dependence and the longest path, whose delay is mostly affected by buffers, gets slower as temperature reduces. Therefore the worst case manifests at low temperatures. For instance, at $V_{dd} = 0.8$ V the skew increases from 29 ps at 140 °C to 82 ps at −40 °C. On the contrary, when $V_{dd} > V_{ZTC}$, the buffers are almost insensitive to temperature variations (refer to Fig. 1), and the skew, dominated by delay degradation along metal wires, increases with temperature. At $V_{dd} = 1.3$ V the highest skew of 18 ps occurs at 140 °C.

5. Conclusions

In this paper we have shown that non-uniform thermal profiles combined with the complex temperature dependence of active transistors result in complex thermal behaviors which are not observable with standard analysis tools. To overcome this issue we developed a SPICE-level simulation framework that provides designers with accurate yet efficient skew analysis. Simulations on a thermal-programmable benchmark mapped on a 45 nm industrial technology clearly show the existence of critical operating conditions for which the clock skew exhibits unusual thermal behaviors, that is, a non-monotonic relationship with temperature. This translates into worst-case conditions different from standard high-temperature corners, which may invalidate the output of clock tree synthesis tools.

References

[1] P. Ramanathan, A. Dupont, K.G. Shin, Clock distribution in general VLSI circuits, IEEE Trans. Circuits Syst. I: Fundamental Theory Appl. 41 (5) (1994) 395–404.
[2] R.-S. Tsay, An exact zero-skew clock routing algorithm, IEEE Trans. Computer-Aided Des. Integr. Circuits Syst. 12 (2) (1993) 242–249.
[3] J. Cong, A. Kahng, G. Robins, Matching-based methods for high-performance clock routing, IEEE Trans. Computer-Aided Des. Integr. Circuits Syst. 12 (8) (1993) 1157–1169.
[4] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, V. De, Parameter variations and impact on circuits and microarchitecture, in: Proceedings of the Design Automation Conference (DAC'03), 2003, pp. 338–342.
[5] M. Cho, S. Ahmed, D.Z. Pan, TACO: temperature aware clock-tree optimization, in: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'05), 2005, pp. 582–587.
[6] K. Athikulwongse, X. Zhao, S.K. Lim, Buffered clock tree sizing for skew minimization under power and thermal budgets, in: Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC'10), 2010, pp. 474–479.

[7] A. Chakraborty, K. Duraisami, A. Sathanur, P. Sithambaram, L. Benini, A. Macii, E. Macii, M. Poncino, Dynamic thermal clock skew compensation using tunable delay buffers, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 16 (6) (2008) 639–649.
[8] J. Long, J.C. Ku, S. Memik, Y. Ismail, SACTA: a self-adjusting clock tree architecture for adapting to thermal-induced delay variation, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 18 (9) (2010) 1323–1336.
[9] K. Banerjee, A. Mehrotra, Global (interconnect) warming, IEEE Circuits Dev. Mag. 17 (5) (2001) 16–32.
[10] A. Calimera, E. Macii, M. Poncino, R.I. Bahar, Temperature-insensitive synthesis using multi-Vt libraries, in: Proceedings of the ACM Great Lakes Symposium on VLSI (GLSVLSI'08), ACM, 2008, pp. 5–10.
[11] A. Calimera, R.I. Bahar, E. Macii, M. Poncino, Temperature-insensitive dual-Vth synthesis for nanometer CMOS technologies under inverse temperature dependence, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 18 (11) (2010) 1608–1620.
[12] A. Dasdan, I. Hom, Handling inverted temperature dependence in static timing analysis, ACM Trans. Des. Automat. Electron. Syst. 11 (2006) 306–324.
[13] A.H. Ajami, K. Banerjee, M. Pedram, Modeling and analysis of nonuniform substrate temperature effects on global ULSI interconnects, IEEE Trans. Computer-Aided Des. Integr. Circuits Syst. 24 (6) (2005) 849–861.
[14] W.C. Elmore, The transient response of damped linear networks with particular regard to wideband amplifiers, J. Appl. Phys. 19 (1) (1948) 55–63.
[15] J.C. Ku, Y. Ismail, On the scaling of temperature-dependent effects, IEEE Trans. Computer-Aided Des. Integr. Circuits Syst. 26 (10) (2007) 1882–1888.
[16] T. Sakurai, A.R. Newton, Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas, IEEE J. Solid-State Circuits 25 (2) (1990) 584–594.
[17] W. Liu, A. Calimera, A. Nannarelli, E. Macii, M. Poncino, On-chip thermal modeling based on SPICE simulation, in: Proceedings of the International Workshop on Power And Timing Modeling, Optimization and Simulation (PATMOS'09), 2009, pp. 66–75.
[18] S. Reda, Thermal and power characterization of real computing devices, IEEE J. Emerg. Select. Topics Circuits Syst. 1 (2) (2011) 76–87.