
2015 28th International Conference on VLSI Design and 2015 14th International Conference on Embedded Systems

Rectilinear Steiner Clock Tree Routing Technique with Buffer Insertion in Presence of Obstacles

Abstract: Clock tree design plays a significant role in determining chip performance and demands careful attention when designing a critical VLSI circuit. Algorithms for designing a clock net must cope with memory and time complexity along with the physical design constraints. In this work an efficient algorithm, BBLUE (Blockage Look Up and Buffer Estimation), is designed, which routes all the sinks in two phases. First, routing in the global domain is performed after a tiling process; then routing in the local domain is done by connecting all the sinks inside each tile and combining the routes of all the tiles. BBLUE avoids obstacles by wire snaking with Steiner point insertion, and achieves skew minimization by restricted buffer insertion. BBLUE is tested on the ISPD 2010 benchmark suite, provided by Intel and IBM, and performs better on certain parameters than its contenders on that suite.

Keywords: VLSI Routing; Clock Tree; Obstacle Avoiding Rectilinear Steiner Tree; Buffer Insertion; Skew Minimization.

1063-9667/15 $31.00 © 2015 IEEE. DOI 10.1109/VLSID.2015.81

I. INTRODUCTION

Very large scale integration requires a large number of interconnects to connect the circuit components within a chip. It is a challenge for computer scientists to design and implement algorithms that complete these interconnections in minimal time and with minimal routing cost. Routing in the presence of obstacles (due to blockages, input/output pins and pre-routed nets) imposes more stringent constraints such as congestion avoidance, delay minimization, area utilization and many more [7]. Within most VLSI circuits, data transfer between functional elements is synchronized by a single control signal, the processing clock. The basic idea behind clock routing is that the clock signal is generated external to the chip and provided to the chip through the clock entry point, or clock pin. Each functional unit that needs the clock is connected to the clock entry point by the clock net. Each functional unit computes and then waits for the clock signal before passing its results to another unit ahead of the next processing cycle. Clock routing [15] is a variant in this paradigm, where the main task is to minimize clock skew. Several works have addressed the problem of skew minimization [3, 4, 17, 20]; however, none of them claims exact achievability of its goals. The proposed scheme, named BBLUE, is capable of clock tree construction in the presence of obstacles. Skew minimization by wire snaking is a popular approach; however, it is not a power and area efficient design. Wire snaking is a method in which the interconnect is turned left, right, up or down to avoid obstacles and move around them to reach the destination node.

We adopt a restricted wire snaking model along with buffer insertion in the global and local skew minimization steps. Wire snaking, rather than wire sizing, is a technology friendly design approach that allows easy optical proximity correction (OPC) during fabrication [12]. Restricting wire snaking also improves the area and power budget. Aggressive buffer insertion may cost some area overhead; however, inserting buffers only at the end of the global and local routing phases helps to optimize the number of buffers used, resulting in minimized skew at the sink terminals. This domain therefore offers many opportunities in selecting a route with the minimum number of buffers of an appropriate type, as well as in selecting a balanced path so that skew stays within a specified bound. Clock routing is done in the presence of obstacles to achieve zero or minimal clock skew at all the sink terminals, so that the clock signal reaches every sink synchronously for critical paths.

The technique comprises the following steps:
• Divide the chip layout into a number of tiles, detecting obstacle edges inside the tiles.
• Find the Centre of Mass (CM) of every tile by averaging the terminal points in that tile.
• Find the Global Centre of Mass (Global CM) by averaging the CMs of all tiles.
• Route the Global CM from the source, avoiding obstacles.
• Reach the CM of each tile with minimized delay and clock skew, by global perturbation and buffer insertion.
• Connect the node pins inside the tiles so that signals from the source arrive with equal delay, and merge the tiles in the global domain.
• Insert a restricted number of buffers in the routing paths, along with wire snaking, to reduce clock skew as well as wire delay. The process is aided by buffer insertion and wire snaking at the obstacle edges, if required.

The rest of the paper is organized as follows. Section II presents a brief overview of existing works on Steiner tree routing and clock routing. The problem definition is presented in Section III. Our proposed method is discussed in Section IV with the necessary algorithmic details. Section V shows simulation results and comparative studies of our work against some existing works. Finally, Section VI concludes the paper with possible future directions.

II. LITERATURE SURVEY

Rectilinear Steiner tree routing has been an area of research interest for decades for wire-length minimization [1], [4], [18] and [13]. The edge removal heuristic that the authors in [19] have chosen to focus on is a Simulated Annealing approach, whereby an initial solution is repeatedly improved by making small alterations until no further improvement is achieved. Works on Rectilinear Steiner Trees avoiding obstacles are presented in [5] and [14]. However, none of these works addresses performance issues like delay, crosstalk or clock tree generation. Clock routing with buffer insertion for skew minimization is shown in [18], [19], [11], [9] and [20]. In [11], the authors present ClockTune, a simultaneous buffer insertion or sizing and wire-sizing algorithm which guarantees zero skew and minimizes delay or power in pseudo-polynomial time. In [19], the authors propose a maze routing based algorithm integrated with buffer insertion and buffer sizing in order to achieve effective slew control. In [20], the authors use adjustable delay buffers (ADBs), whose delays can be tuned to minimize clock skew under different power modes, and propose a linear-time optimal algorithm which assigns the values of the ADBs so that the skew is optimal. They also propose an optimal algorithm to minimize the latency, with a heuristic to position the ADBs. Only a limited number of works are reported in the literature for clock tree construction in the presence of obstacles. Recently, Sze et al. proposed clock tree routing with buffer insertion and skew minimization [8].

III. PROBLEM FORMULATION

Buffer: The buffer is a concept, but it is implemented using an inverter. The propagation delay of an inverter defines the ultimate speed of logic. The propagation delay of an inverter circuit can be analysed as that of a first-order linear RC network. Thus, the propagation delay of such a network for a voltage step at the input is proportional to the time constant of the network, formed by the pull-down resistor and the load capacitance. It is desirable for a gate to have identical propagation delays for both rising and falling inputs. This condition can be achieved by making the on-resistances of the NMOS and PMOS approximately equal.

Mathematically,

$T_p = 0.69\,R_{eq}\,C_L$    (1)

where $T_p$ is the propagation delay, and $R_{eq}$ and $C_L$ are the equivalent resistance and the load capacitance respectively [10].

Clock Skew: In a synchronous circuit, clock skew ($T_{Skew}$) is the difference in clock arrival time between two sequentially adjacent registers. Given two sequentially adjacent registers $R_i$ and $R_j$ with clock arrival times at the register clock pins of $T_{C_i}$ and $T_{C_j}$ respectively, clock skew can be defined as:

$T_{Skew_{i,j}} = T_{C_i} - T_{C_j}$    (2)

Clock skew can be positive or negative. If the clock signals are in complete synchronization, then the clock skew observed at these registers is zero [10].

Problem Definition: Given a set of terminals (pins) and obstacle locations, our objective is to find a clock tree and a set of feasible buffer locations, buffer widths and wire widths with bounded delay, such that the zero-skew constraint is satisfied at all the sink terminals and the total tree delay is minimized simultaneously. The constraints for solving the problem are:
• The routing of the sinks is delay based, i.e. all the sinks should be reached with zero or minimum clock skew, where the routing delay is calculated using the Elmore delay model [15].
• Obstacles need to be avoided while making interconnects that follow a Steiner tree formation among the pins.
• The number of inserted buffers should be restricted to a bounded value to reduce the area penalty.
• Wire delay needs to be balanced while avoiding unnecessary wire snaking.

In the next section, our proposed method is discussed with an explanatory example.

IV. PROPOSED METHOD

4.1 The Solution – BBLUE Algorithm

The solution to the problem is a multistage one. In the previous section the objective function was set; this section works towards achieving that objective. Algorithm BBLUE works in the following way:

1) Read the input benchmark files to obtain the required information about the problem, such as the node coordinates, wire details, buffer library, obstacle details and other parasitics, and keep them in an intermediate file.
2) Keep all the sinks in memory and then tile them. The tiling method reduces the larger chip area to some suitable routable area. For tiling, our algorithm takes the LCS (Local Clock Skew) distance as the tile size, which is provided in the benchmark [2] as 600 nm. However, we also tested our algorithm with other tile sizes obtained by almost equal partitioning of the chip layout.
3) Once tiling is completed, the Centre of Mass (CM), e.g. A for tile 1 in Figure 1, is calculated from all nodes present in that tile using the function MEAN(Si), where Si represents the ith sink, as shown in the algorithm presented in Figure 2.
4) Then, the node farthest from the CM in the tile is routed using a rectilinear path, including Steiner nodes if required. Its delay defines the local Required Arrival Time (RAT). This is done using the function ROUTE_CENTER(Si, C, B, W, BLOCK), where the ith sink, Si, is routed to the CM, C, of the tile, avoiding obstacles, BLOCK, using wire, W, and buffers, B, as required.
5) All other nodes are connected to the CM following rectilinear paths such that the delay of every path is nearly equal to the RAT. This is achieved by inserting an appropriate number of buffers in the required paths. Thus, all the nodes in a tile are skew balanced using the procedure stated in Figure 3.
6) Then this process is repeated for all the tiles.
7) In the next phase of the algorithm, a similar procedure is applied to find the Global Centre of Mass, represented by O in Figure 1, considering the local CMs of all tiles.
8) Then all of the local CMs are connected to the Global CM using a procedure similar to that of Steps 4 and 5, where the path OR in Figure 1 corresponds to the path for the global RAT.
9) Thus delay is balanced for all local CMs using an appropriate number of buffers.

An added feature of BBLUE is that it is a block-aware routing algorithm that also minimizes the clock skew, thereby showing its efficiency. The blockages are avoided here by wire snaking around the obstacles.
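The tiling and Centre of Mass steps above can be illustrated with a minimal Python sketch. It assumes square tiles on a uniform grid and takes the CM as the arithmetic mean of coordinates; the helper names `tile_sinks` and `centre_of_mass`, the tile size and the sink coordinates are hypothetical and are not taken from the benchmark files.

```python
from collections import defaultdict
from statistics import mean


def tile_sinks(sinks, tile_size):
    """Group (x, y) sinks by the square tile that contains them."""
    tiles = defaultdict(list)
    for x, y in sinks:
        tiles[(int(x // tile_size), int(y // tile_size))].append((x, y))
    return dict(tiles)


def centre_of_mass(points):
    """Arithmetic mean of a list of (x, y) points (the MEAN step)."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


# Hypothetical sink coordinates, in arbitrary layout units.
sinks = [(10, 20), (30, 40), (110, 20), (130, 60), (50, 150)]
tiles = tile_sinks(sinks, tile_size=100)

# One local CM per tile, then the Global CM as the mean of the local CMs.
local_cms = {tile: centre_of_mass(pts) for tile, pts in tiles.items()}
global_cm = centre_of_mass(list(local_cms.values()))
```

Note that the Global CM here averages the local CMs, as in the text, so a tile with a single sink weighs as much as a densely populated tile.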

Figure. 1. Tiling and Skew Minimization using Buffers

Procedure: ROUTE_CENTER(P, CENTER, BUFF, WIRE, BLOCK)
INPUT: Set of Points for Sinks and Steiner nodes, P; Mean of the points, CENTER; Parasitic for buffer, BUFF; Parasitic for wire, WIRE; Blockage Parasitic, BLOCK.
OUTPUT: The centre rooted Routed Tree, TR.
Dist is the distance between the connecting points.
D gives the calculated delay value for the connection.
(* Initialization *)
P := { P1, P2, P3, …, Pn }
D := {Ø}, TR := {Ø}
i := 1
For each Pi ∈ P
    Connect C → Pi with WIRE
    Dist := length of C → Pi
    If C → Pi faces BLOCK
        Repeat until C → Pi clears BLOCK
            Move C → Pi as { UP, DOWN, RIGHT, LEFT }
            Increase Dist
    End If
    Di := Calculate Delay for Dist
    D := D ∪ Di
    Increase i by 1
End For
(* Initialization *)
i := 1
For each Di ∈ D and Pi ∈ P
    If Di < max(D)
        Connect C → Pi as C → BUFFER → Pi
    End If
    Increase i by 1
    TR := TR ∪ { C → Pi }
End For
Return TR

Figure. 3. Procedure ROUTE_CENTER
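The delay-balancing half of ROUTE_CENTER can be sketched as follows, under simplifying assumptions. Path length is taken as the plain Manhattan distance (no blockage handling), wire delay uses the Elmore delay of a uniform wire driving a lumped load, and the delay of one buffer is taken from Eq. (1). The rule for how many buffers to insert is an assumption made for illustration only, since the procedure states only that a buffer is inserted when a path is faster than the slowest one. All parasitic values are hypothetical.

```python
def wire_delay(length, r=0.1, c=0.2, c_load=1.0):
    """Elmore delay of a uniform wire driving c_load: R_w * (C_w / 2 + c_load)."""
    return (r * length) * (c * length / 2.0 + c_load)


def manhattan(a, b):
    """Rectilinear distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def balance_to_centre(centre, points, r_eq=1.0, c_l=0.5):
    """Return (rat, plan), where plan maps each point to (wire delay, buffer count)."""
    t_buf = 0.69 * r_eq * c_l  # Eq. (1): delay contributed by one buffer
    delays = {p: wire_delay(manhattan(centre, p)) for p in points}
    rat = max(delays.values())  # the slowest path sets the Required Arrival Time
    plan = {}
    for p, d in delays.items():
        # Assumed rule: pad the gap to RAT with whole buffers, never overshooting.
        plan[p] = (d, int((rat - d) // t_buf))
    return rat, plan


rat, plan = balance_to_centre((0, 0), [(10, 0), (3, 4), (1, 1)])
```

This sketch ignores the fact that a buffer also shields downstream capacitance and therefore changes the wire delay itself, so it only indicates the order of magnitude of buffering a path would need.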

Thus the algorithm BBLUE routes the tiled sinks such that the overall goal of routing is achieved while keeping all the constraints enforced.

V. EXPERIMENTAL RESULTS

This section compares the actual performance of BBLUE with that of its earlier variant, that is, one that minimizes skew using only wire snaking without any buffer. The BBLUE algorithm was implemented in GNU C on a Linux platform (RHEL 5.0) with an Intel Pentium Dual Core 2.2 GHz processor and 1 GB RAM. BBLUE was further tested on an Intel Core i3 (x64) on a Windows platform and shows similar performance with respect to skew and capacitance. A comparison with Contango2 [6], a performer on the ISPD 2010 benchmarks, is also provided. But this is only from a mathematical point of view, so for a complete analysis another view is required, which is depicted in the coming section.

Algorithm: BBLUE – Block Look Up and Buffer Estimation
INPUT: Set of Tiles, T; Set of Sinks, S; Set of Buffers, B; Set of Wires, W; Set of Blockages, BLOCK.
OUTPUT: Skew Minimized Global Routed Tree, TG.
TL is the Skew Minimized Local Routed Tree.
D is the delay in the global domain.
(* Initialization *)
C := {Ø}, TL := {Ø}, TG := {Ø}
T := { T1, T2, T3, …, Tn }
i := 1
For each Tile Ti belonging to T
    Si := { S1, S2, S3, …, Sn } ∈ Ti
    Ci := MEAN(Si)
    C := C ∪ Ci
    TLi := ROUTE_CENTER(Si, Ci, B, W, BLOCK)
    TL := TL ∪ TLi
    Increase i by 1
End For
Midcenter := MEAN(C)
TG := ROUTE_CENTER(C, Midcenter, B, W, BLOCK)
TG := TG ∪ TL
Return TG

Figure. 2. Algorithm BBLUE

5.1 Comparative Study
• The Performance of BBLUE in Skew Minimization, CPU Execution Time and Capacitance

The BBLUE algorithm is implemented in C to compare its results with those of the contenders of the ISPD 2010 contest. The implementation yielded results that can be compared easily with the help of the following charts.


Figure. 4.1 Capacitance comparison for [6, 16, 17, BBLUE]

5.2 Analysis

Here, it is noted that the Required Arrival Time (RAT) is the maximum of all the delays within a tile, so routing the other sinks requires less delay than the RAT. This causes a skew within the tile. The benchmark limits that skew to 7.5 ps within a Local Clock Skew distance, and BBLUE keeps the skew within this limit. The process is performed as (a) tiling, (b) routing the global Centre of Mass, (c) routing the centres of mass of all tiles to the global CM, and (d) routing all the sinks present in a tile to its centre of mass. Here BBLUE performs considerably well.
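The relation between RAT and in-tile skew described here can be made concrete with a small Python sketch. It follows Eq. (2) by taking the skew of a tile as the largest difference between two sink arrival times. The delay values are hypothetical and the helper name `tile_skew` is introduced only for illustration.

```python
SKEW_LIMIT_PS = 7.5  # local clock skew limit quoted in the text


def tile_skew(delays_ps):
    """Return (rat, skew) for the sink delays of one tile, in picoseconds."""
    rat = max(delays_ps)         # Required Arrival Time: the slowest path
    skew = rat - min(delays_ps)  # worst-case arrival difference in the tile
    return rat, skew


delays = [41.2, 38.9, 40.5, 36.0]  # hypothetical sink delays (ps)
rat, skew = tile_skew(delays)
within_limit = skew <= SKEW_LIMIT_PS
```

Adding delay on the faster paths, as the buffering step does, raises the minimum towards the RAT and so shrinks this difference.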

However, it should be mentioned that BBLUE is a block-aware algorithm. The benchmark design allows blocks to be present within the routable area. The blocks are areas within the design that do not allow any interconnect to pass through. BBLUE senses a block and takes the necessary avoidance measures, specifically snaking, to avoid the block and thus complete the routing. The simulation results of BBLUE are given in Table I (Appendix 1). Figure 4.1 depicts the capacitance of our algorithm in comparison with Contango2 [6], T. Mittal et al. [16] and Yeh Chi Chang et al. [17]. Our algorithm performs much better than the others [6, 16, 17] in 3 benchmarks and performs equally well in the other 5 benchmarks with respect to overall interconnect capacitance. Figure 4.2 shows the comparative study of our algorithm with [6, 16, 17] for skew minimization. BBLUE shows uniformity in overall skew and is comparable to [6], [16] and [17]. Here, it is worth mentioning that BBLUE successfully keeps the skew well within the skew limits provided by the ISPD 2010 contest [2]. Another aspect of BBLUE is that it executes considerably faster than [6], [16] and [17] in all of the 8 benchmarks, as depicted in Figure 4.3, though all of the algorithms were executed on similar processors. Figure 4.4 shows that BBLUE performs much better in skew minimization, when routing all sinks of every tile, than a normal wire snaking based algorithm. So, for skew minimization, we avoided wire snaking and used the buffering technique instead.
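To give a feel for the extra wire length that a blockage forces on a rectilinear route, the sketch below measures the shortest obstacle-avoiding path on a unit grid. It uses a plain breadth-first (Lee-style) maze search, which is a stand-in technique chosen for brevity and is not the snaking procedure of BBLUE. The grid size, wall position and function name are hypothetical.

```python
from collections import deque


def snaked_length(start, goal, blocked, width, height):
    """Shortest rectilinear path length on a grid, avoiding blocked cells."""
    if start in blocked or goal in blocked:
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            return dist[(x, y)]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in blocked and nxt not in dist):
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return None  # goal unreachable


# A three-cell wall at x == 2 (y = 0..2) between source and sink.
wall = {(2, 0), (2, 1), (2, 2)}
direct = snaked_length((0, 1), (4, 1), set(), 5, 5)
detour = snaked_length((0, 1), (4, 1), wall, 5, 5)
```

In this example the wall doubles the path length from 4 to 8 grid units, and since wire delay grows with length, such a detour shifts the delay of the affected sink relative to its neighbours.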

Figure. 4.2 Skew Comparison for [6, 16, 17, BBLUE]

VI. CONCLUSION

This work presents a two-stage technique for obstacle-avoiding clock tree construction with buffer insertion. The main objective of the proposed work is to construct an obstacle-avoiding rectilinear Steiner tree with minimal clock skew at all the sink terminals. However, several other parameters affect chip design in the nanometer regime, such as wire congestion, crosstalk between parallel wires, power consumption of buffers and many more. A possible future direction would be to design a rectilinear delay tree that addresses several conflicting design parameters such as power consumption, minimization of routing area, crosstalk, noise and jitter.

Figure. 4.3 Execution Time Comparison for [6, 16, 17, BBLUE]

Figure. 4.4 Skew comparison of BBLUE vs. Normal Snaking Based Clock Tree
NOTE: THE NUMBERS [1-8] ON THE CHART X-AXIS REPRESENT THE ISPD 2010 BENCHMARK FILES NAMED “cns01” TO “cns08”.

REFERENCES

[1] A. B. Kahng, G. Robins. A New Class of Iterative Steiner Tree Heuristics With Good Performance. IEEE Transactions on Computer-Aided Design, 11(7):893–902, July 1992.
[2] C. Sze, “ISPD 2010 High Performance Clock Network Synthesis Contest: Benchmark Suite and Results”. International Symposium on Physical Design 2010. http://www.sigda.org/ispd/contests/10/ISPD 2010-cns-contest-v3.pdf
[3] Charles J. Alpert, Jiang Hu, Sachin S. Sapatnekar, C. N. Sze. “Accurate Estimation of Global Buffer Delay Within a Floor-plan”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 6, June 2006.
[4] Chris Chu, Yiu-Chung Wong. FLUTE: Fast Lookup Table Based Rectilinear Steiner Minimal Tree Algorithm for VLSI Design. IEEE Transactions on Computer-Aided Design, 1-14, 2007.
[5] Chung-Wei Lin, Szu-Yu Chen, Chi-Feng Li, Yao-Wen Chang, Chia-Lin Yang. Obstacle-Avoiding Rectilinear Steiner Tree Construction Based on Spanning Graphs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 27, No. 4, April 2008.
[6] D.J. Lee et al. “Low-Power Clock Trees for CPUs”. In Proc. of ICCAD, pages 444-451, 2010.
[7] Evangeline F. Y. Young, Tao Huang. “Obstacle-Avoiding Rectilinear Steiner Minimum Tree Construction: An Optimal Approach.” In Proc. of the 2010 Intl. Conf. on Computer-Aided Design, pages 610–613. NY, USA, November 2010.
[8] Feifei Niu, Qiang Zhou, Hailong Yao, Yici Cai, Jianlei Yang, C.N. Sze. Obstacle avoiding and Slew-constrained Buffered Clock Tree Synthesis for Skew Optimization. GLSVLSI’11, May 2–4, 2011, Lausanne, Switzerland.
[9] Hochang Jang, Deokjin Joo, Taewhan Kim. Buffer Sizing and Polarity Assignment in Clock Tree Synthesis for Power/Ground Noise Minimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 30, No. 1, January 2011.
[10] J.E. Rabaey, A. Chandrakasan, B. Nikolic, “Digital Integrated Circuits: A Design Perspective”, 2nd Edition, Prentice Hall of India, 2003.
[11] Jeng-Liang Tsai, Tsung-Hao, Charlie Chung-Ping Chen. Minimum-Delay/Power Zero-Skew Clock-Tree Optimization with Simultaneous Buffer-Insertion/Sizing and Wire-Sizing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 23, No. 4, April 2004.
[12] Kun Yuan and David Z. Pan, “WISDOM: Wire Spreading Enhanced Decomposition of Masks in Double Patterning Lithography”, In Proc. IEEE/ACM Int'l Conference on Computer-Aided Design (ICCAD), pp. 32-38, November 2010.
[13] M. Borah, R.M. Owens, M.J. Irwin. An edge-based heuristic for Steiner routing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 13, pp. 1563-1568, Dec. 1994.
[14] M. R. Garey and D. S. Johnson, “The rectilinear Steiner tree problem is NP-complete,” SIAM J. Appl. Math., vol. 32, no. 4, pp. 826–834, Jun. 1977.
[15] Naveed Sherwani, “Clock Routing” in Algorithms for VLSI Physical Design Automation, 3rd Edition, Kluwer Academic Publishers, 2002.
[16] T. Mittal et al. “Cross Link Insertion for Improving Tolerance to Variations in Clock Network Synthesis”. In Proc. of ISPD, pages 29-36, 2011.
[17] Y.C. Chang, C.K. Wang, H.M. Chen, “On Construction Low Power and Robust Clock Tree via Slew Budgeting”, In Proc. of 2012 ACM International Symposium on Physical Design, pages 129-136.
[18] Yen-Hung Lin, Shu-Hsin Chang, Yih-Lang Li. Critical Trunk Based Obstacle Avoiding Rectilinear Steiner Tree Routings and Buffer Insertion for Delay and Slack Optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 9, September 2011.
[19] Ying-Yu Chen, Chen Dong, Deming Chen. Clock Tree Synthesis under Aggressive Buffer Insertion. DAC’10, June 2010.
[20] Yu-Shih Su, Wing-Kai Hon, Cheng-Chih Yang, Shih-Chieh Chang, and Yeong-Jar Chang, “Clock Skew Minimization in Multi-Voltage Mode Designs Using Adjustable Delay Buffers”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 29, No. 12, December 2010.

APPENDIX 1

TABLE I: COMPARATIVE STUDY OF BBLUE WITH OTHER WORKS [6, 16, 17] ON ISPD 2010 BENCHMARKS

Work                  Metric             cns01     cns02     cns03     cns04     cns05     cns06     cns07     cns08
Contango [6]          95% LCS (ps)        7.01      7.33      4.18      4.46      4.41      6.05      4.58      5.15
                      Cap (pF)          198.30    375.90     55.90     71.80     37.70     47.80     72.70     52.50
                      Run time (sec)  12015.00  25006.00   3840.00   6075.00   2406.00   2660.00   2351.00   1987.00
T. Mittal [16]        95% LCS (ps)        7.32      7.42      4.49      6.70      4.78      6.41      5.86      5.07
                      Cap (pF)          142.60    265.20     36.60     51.10     25.10     32.70     48.30     32.70
                      Run time (sec)   1092.00   4314.00    383.00    934.00    278.00    285.00    818.00    327.00
Chang-Wang-Chen [17]  95% LCS (ps)        6.48      7.38      4.76      7.14      5.88      5.61      6.62      6.50
                      Cap (pF)          137.90    268.30     34.20     42.80     22.10     28.50     43.90     28.40
                      Run time (sec)    472.00   1450.00     79.00    110.00     40.00     61.00    133.00     54.00
BBLUE [Ours]          95% LCS (ps)        6.25      6.25      6.21      6.09      6.23      5.64      5.94      5.32
                      Cap (pF)          115.48    203.31     46.55     72.16     35.42     40.76      65.1     40.59
                      Run time (sec)      0.66      1.27      0.54      0.98      0.48      0.47      0.99      0.61
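The run-time and skew figures of Table I can be cross-checked with a short Python sketch. The numbers are copied from the table, with works labelled by the reference numbers used in the table caption. The speedup ratio is a simple quotient of reported run times and is an illustrative measure introduced here, not one used in the paper. It assumes the reported run times are directly comparable across platforms.

```python
# Run times in seconds for cns01..cns08, copied from Table I.
runtime_s = {
    "Contango [6]": [12015, 25006, 3840, 6075, 2406, 2660, 2351, 1987],
    "T. Mittal [16]": [1092, 4314, 383, 934, 278, 285, 818, 327],
    "Chang-Wang-Chen [17]": [472, 1450, 79, 110, 40, 61, 133, 54],
}
bblue_runtime_s = [0.66, 1.27, 0.54, 0.98, 0.48, 0.47, 0.99, 0.61]
bblue_lcs_ps = [6.25, 6.25, 6.21, 6.09, 6.23, 5.64, 5.94, 5.32]

# Smallest per-benchmark ratio of another work's run time to BBLUE's.
min_speedup = {
    name: min(other / ours for other, ours in zip(times, bblue_runtime_s))
    for name, times in runtime_s.items()
}

# Check every BBLUE 95% LCS value against the 7.5 ps contest limit.
all_within_limit = all(value <= 7.5 for value in bblue_lcs_ps)

for name, ratio in min_speedup.items():
    print(f"{name}: at least {ratio:.0f}x slower than BBLUE")
```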