
2010 IEEE Annual Symposium on VLSI

BBVC-3D-NoC: An Efficient 3D NoC Architecture Using Bidirectional Bisynchronous Vertical Channels

Amir-Mohammad Rahmani (1,2), Pasi Liljeberg (2), Juha Plosila (2), and Hannu Tenhunen (1,2)
(1) Turku Centre for Computer Science (TUCS), Turku, Finland
(2) Computer Systems Lab., Department of Information Technology, University of Turku, Finland
Email: {amir.rahmani, pasi.liljeberg, juha.plosila, hannu.tenhunen}@utu.fi

Abstract—In this paper, a 3D NoC architecture based on Bidirectional Bisynchronous Vertical Channels (BBVCs) is proposed as a solution to mitigate the area footprint of vertical interconnects. BBVCs, which can dynamically self-configure to transmit flits in either direction, allow the system to use one high-speed bidirectional channel instead of a pair of unidirectional channels for inter-layer communication. By exploiting the high-speed nature of the vertical links in 3D ICs, this substitution yields better bandwidth utilization, a lower area footprint, and improved routability at each layer. Our results show that the proposed architecture achieves up to 47% savings in TSV area footprint at the 65 nm technology node.

I. INTRODUCTION

The emergence of three-dimensional (3D) integration and network-on-chip (NoC) technologies opens a new horizon for on-chip interconnect design. The major advantage of 3D NoCs is the considerable reduction in the length and number of global interconnects, which increases performance and decreases the power consumption and area of wire-limited circuits [1]. In addition, the 3D NoC approach offers an unmatched platform for implementing the globally asynchronous, locally synchronous (GALS) design paradigm [2]; this makes clock distribution and timing closure more manageable and makes 3D technology suitable for heterogeneous integration. Among the different stacking technologies for 3D ICs, wafer stacking is one of the most promising yet inexpensive [3]. Wafer stacking relies on Through-Silicon Vias (TSVs) [3] for vertical connectivity, which offer low parasitics and a high density of vertical wires. In 3D NoCs, as the number of cores in each layer increases to support growing application complexity, the amount of communication between layers is also expected to grow, and with it the number of interconnect TSVs. Since each TSV requires a pad for bonding to a wafer layer, the area footprint of TSVs in each layer is no longer negligible. To the best of our knowledge, only one study has targeted minimizing the number of TSVs in 3D NoCs [4]; it proposes serializing the vertical TSV interconnects as a way to reduce the interconnect TSV footprint. Although this can lead to a better thermal TSV distribution, and hence lower peak temperatures, as well as a more efficient core layout across multiple layers thanks to the reduced routing complexity, it degrades performance because of serialization overhead and low bandwidth utilization.

In this paper, we explore a mechanism to reduce the TSV area footprint and thereby improve 3D IC cost, routability, thermal efficiency, and power consumption. Specifically, we propose a novel technique that replaces the pair of unidirectional vertical channels between layers with a bidirectional channel that dynamically self-reconfigures to operate in either the outgoing or the incoming direction. To compensate for the resulting bandwidth loss, we exploit the low-latency nature of vertical TSVs by establishing high-speed inter-layer communication using mixed-clock FIFOs.

II. BBVC-BASED 3D-NOC ARCHITECTURE

Generally, the bandwidth utilization of vertical interconnects in a conventional NoC architecture is low (see Section III), and many vertical channels are likely to be idle in any given cycle because their direction is fixed. Given this observation and the aforementioned pros and cons of 3D NoC design, we propose an architecture using high-speed BBVCs to reduce the number of interconnect TSVs. Fig. 1 shows a schematic representation of such a system. The main idea of the proposed 3D NoC system is to use a single bidirectional channel for inter-layer communication that operates at a higher frequency than intra-layer communication (f2 > f1) and can dynamically change its direction between routers in neighboring layers according to the real-time bandwidth demand.

Fig. 1. Example of a 3D NoC with three 3×3 layers

3D routers in a mesh-type 3D NoC usually consist of routing logic, arbitration logic, a crossbar, and seven pairs of unidirectional ports: East, West, North, South, Local, Up, and Down. To implement a 3D NoC architecture using BBVCs, we modify the configuration of the Up and Down input/output ports and replace their two unidirectional channels with a BBVC, as shown in Fig. 2. We do not change the structure or the operating frequency of the routing module, the arbiter, or the crossbar of a conventional router. As shown in the figure, to adjust the direction of the vertical bidirectional channels dynamically at run time, we add a control module to each Up and Down port to arbitrate authority over the channel direction.

Fig. 2. Proposed router architecture supporting bidirectional bisynchronous vertical channels

To support a different clock frequency for inter-layer communication, we replace the synchronous input FIFOs of the Up and Down ports with bisynchronous FIFOs [5]. In addition, a bisynchronous FIFO is required at each vertical output port so that inter-layer communication can run at a different (higher) clock frequency. Since the number of slots in these output FIFOs does not affect the general architecture, their length can be reduced to limit the area overhead of the two additional FIFOs. This router architecture is only one possible design: the proposed NoC architecture can also be applied to other 3D topologies whose routers have different crossbar sizes.

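A bisynchronous FIFO is what lets a vertical port run at f2 while the rest of the router stays at f1. The paper uses the modular synchronizing FIFO of [5]; the Python sketch below swaps in the more common dual-clock FIFO with Gray-coded pointers, purely to illustrate the principle. Each clock domain advances only its own pointer and sees the other domain's pointer through a two-stage synchronizer, so full and empty are judged against a stale, and therefore safe, copy. The class `DualClockFifo`, the function `simulate`, the four-slot depth, and the 1 ns / 0.5 ns clock periods are illustrative assumptions, not details taken from the paper.

```python
def bin_to_gray(n: int) -> int:
    """Binary to reflected Gray code: consecutive values differ in one bit."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Inverse of bin_to_gray (XOR of all right shifts of g)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


class DualClockFifo:
    """Hypothetical behavioural model of a dual-clock FIFO (not the FIFO of [5]).

    Pointers carry one extra wrap bit so that full and empty can be told apart.
    Each cross-domain pointer passes through a 2-stage synchronizer, modelled as
    a 2-entry shift register clocked by the receiving domain.
    """

    def __init__(self, addr_bits: int = 2):
        self.depth = 1 << addr_bits
        self.mask = (1 << (addr_bits + 1)) - 1
        self.mem = [None] * self.depth
        self.wr_bin = 0           # owned by the write clock domain
        self.rd_bin = 0           # owned by the read clock domain
        self.rd_sync = [0, 0]     # Gray read pointer as seen by the writer
        self.wr_sync = [0, 0]     # Gray write pointer as seen by the reader

    def write_edge(self, item) -> bool:
        """One write-clock edge: shift the synchronizer, then write unless full."""
        self.rd_sync = [bin_to_gray(self.rd_bin), self.rd_sync[0]]
        rd_seen = gray_to_bin(self.rd_sync[1])
        if ((self.wr_bin - rd_seen) & self.mask) == self.depth:
            return False          # full (possibly pessimistic, never optimistic)
        self.mem[self.wr_bin % self.depth] = item
        self.wr_bin = (self.wr_bin + 1) & self.mask
        return True

    def read_edge(self):
        """One read-clock edge: shift the synchronizer, then read unless empty."""
        self.wr_sync = [bin_to_gray(self.wr_bin), self.wr_sync[0]]
        wr_seen = gray_to_bin(self.wr_sync[1])
        if wr_seen == self.rd_bin:
            return None           # empty (possibly pessimistic, never optimistic)
        item = self.mem[self.rd_bin % self.depth]
        self.rd_bin = (self.rd_bin + 1) & self.mask
        return item


def simulate(n_items: int = 8, t_write: float = 1.0, t_read: float = 0.5) -> list:
    """Push n_items through the FIFO; periods in ns (1.0 -> 1 GHz, 0.5 -> 2 GHz)."""
    fifo = DualClockFifo(addr_bits=2)
    pending = list(range(n_items))
    received = []
    t_w, t_r = 0.0, 0.1           # 0.1 ns offset: the two clocks are not aligned
    while len(received) < n_items:
        if t_w <= t_r:            # next event is a write-clock edge
            if pending and fifo.write_edge(pending[0]):
                pending.pop(0)
            t_w += t_write
        else:                     # next event is a read-clock edge
            item = fifo.read_edge()
            if item is not None:
                received.append(item)
            t_r += t_read
    return received


if __name__ == "__main__":
    print(simulate())             # writer at 1 GHz, reader at 2 GHz
    # [0, 1, 2, 3, 4, 5, 6, 7]
```

With the writer at 1 GHz and the reader at 2 GHz, as on a vertical output port, every flit arrives in order; swapping the two periods exercises the full check instead. In hardware the Gray coding matters because only one pointer bit changes per increment, so a synchronizer cannot capture a wildly wrong value; in this software model it only mirrors that structure.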
978-0-7695-4076-4/10 $26.00 © 2010 IEEE DOI 10.1109/ISVLSI.2010.21


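The bandwidth argument behind replacing the pair of unidirectional vertical links with one faster bidirectional link can be written out as a short calculation. In our own notation (not the paper's), let W be the number of data bits per vertical link, f1 the clock of a conventional vertical link, and f2 the clock of the BBVC, and ignore token hand-over and synchronization overhead:

```latex
B_{\text{pair}} = \underbrace{W f_1}_{\text{up}} + \underbrace{W f_1}_{\text{down}} = 2\,W f_1,
\qquad
B_{\text{BBVC}} = W f_2 \overset{f_2 = 2 f_1}{=} 2\,W f_1 = B_{\text{pair}}
```

So at f2 = 2f1, which matches the 1 GHz / 2 GHz setting used in the experiments, the BBVC keeps the peak aggregate capacity of the pair while using W data TSVs instead of 2W. The gain appears under unbalanced traffic: a fixed-direction link can never give one direction more than W f1, whereas the BBVC can give up to W f2 = 2 W f1 to whichever side currently holds the channel. For W = 32 bits and f1 = 1 GHz, that is at most 32 Gb/s per direction for the pair versus up to 64 Gb/s in a single direction for the BBVC.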
[Figure residue: Fig. 3 signal labels and plot panels (a) and (b) comparing Typical-3D-NoC and BBVC-3D-NoC versus Average Packet Arrival Rate; per-point overhead labels range from 0.18% to 0.83%]


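The control module added to each Up and Down port decides which of the two neighboring routers may drive the shared vertical link. The Python sketch below models that arbitration at packet level, assuming the token-passing rules this paper describes: the token holder sends whole packets, and only after the last flit (marked by eop) does it hand the token over, and only if the other side has a pending request. The class `BbvcLink`, its methods, the one-flit-per-step timing, and the rule that an idle holder yields immediately are hypothetical; credits, clock-domain crossing, and the disconnection of data_out are left out.

```python
from collections import deque


class BbvcLink:
    """Hypothetical packet-level model of token-passing direction control.

    Side 0 and side 1 stand for the two ends of one vertical channel, e.g. the
    Up port of a router in layer x and the Down port of its neighbour in layer
    x+1. A packet is a non-empty list of flits; its last flit plays eop's role.
    Credits, clock-domain crossing and the data_out disconnection are omitted.
    """

    def __init__(self):
        self.queues = (deque(), deque())  # packets waiting at each side
        self.holder = 0                   # side in send_mode (owns the token)
        self.offset = 0                   # next flit index of the current packet

    def send(self, side: int, packet: list) -> None:
        """Queue a packet at one side; a non-empty queue is a pending request."""
        self.queues[side].append(packet)

    def _pass_token(self) -> None:
        """put_token: the holder enters receive_mode, the neighbour send_mode."""
        self.holder = 1 - self.holder

    def step(self):
        """Advance one link cycle; return (sender, flit), or None if idle."""
        other = 1 - self.holder
        if self.offset == 0 and not self.queues[self.holder] and self.queues[other]:
            # Assumption: an idle token holder hands over the token as soon as
            # the neighbour has a pending request.
            self._pass_token()
        queue = self.queues[self.holder]
        if not queue:
            return None
        sender, packet = self.holder, queue[0]
        flit = packet[self.offset]
        self.offset += 1
        if self.offset == len(packet):    # last flit sent (eop)
            queue.popleft()
            self.offset = 0
            if self.queues[1 - self.holder]:
                self._pass_token()        # neighbour has a pending request
        return sender, flit


if __name__ == "__main__":
    link = BbvcLink()
    link.send(0, ["A0", "A1"])
    link.send(0, ["B0"])
    link.send(1, ["C0", "C1"])
    print([link.step() for _ in range(6)])
    # [(0, 'A0'), (0, 'A1'), (1, 'C0'), (1, 'C1'), (0, 'B0'), None]
```

With both sides busy, the link alternates direction at packet boundaries and never splits a packet across a direction change, which is what gating the hand-over on eop is meant to guarantee. Under this assumed policy a side with a long backlog cannot starve its neighbour, because it must yield after every packet while the other side is waiting.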
Fig. 3 shows the details of the proposed communication scheme and the signals between two adjacent routers. The vertical channel between the routers is composed of the following signals: (i) tx/rx: a control signal indicating data availability; (ii) data: the data to be sent; (iii) credit_in: a control signal indicating space availability in the target buffer; (iv) credit_out: a control signal indicating space availability in the input buffer; (v) clk_read and clk_write: clock signals. In the proposed scheme, the bidirectional channel is managed by a finite state machine. To change the channel direction dynamically, a token-passing technique is used, so one extra signal, used in both directions, is needed to pass the token between the control modules. Note that since the vertical ports use bisynchronous FIFOs, they can transmit data at a higher speed than intra-layer links.

In the proposed communication scheme, the control module that holds the token starts sending a packet whenever one or more packets are available in its input port FIFO. After sending the last flit, indicated by the eop signal, if there is a pending request from the other router, it passes the token to the neighboring module via the put_token signal, changes its status to receive_mode, and disconnects its data_out port from the bidirectional link. Next, the control module of the adjacent router changes its mode to send_mode, connects the credit_out signal coming from the input port buffer of the opposite router to the credit_in input of its own output port buffer, and starts the transmission.

[Fig. 3 diagram: the Up port of Router 1 (Layer x) and the Down port of Router 2 (Layer x+1), each with a bisynchronous FIFO and a control-module FSM, with signals labelled data, tx, rx, credit_in, credit_out, put_token, get_token, eop, clk_read, and clk_write]

Fig. 3. Inter-layer data transmission scheme

III. EXPERIMENTAL RESULTS

To assess the efficiency of the proposed BBVC-based 3D NoC, we simulated the Typical-3D-NoC and the BBVC-3D-NoC to characterize their area, latency, and bandwidth utilization. The area of the BBVC-based communication scheme was obtained after synthesis with CMOS 130, 90, and 65 nm standard cells using Synopsys Design Compiler. Fig. 4 shows the area savings of the proposed communication scheme for 8 µm and 16 µm TSV pitches with 32-bit and 64-bit links. To demonstrate that the proposed high-speed inter-layer communication scheme has a negligible performance overhead, a cycle-accurate 3×3×3 3D NoC simulation environment was implemented in HDL with two different inter-layer channel designs (typical and BBVC-based). The clock frequencies for intra-layer and inter-layer communication were set to 1 GHz and 2 GHz, respectively. The simulations used an XYZ wormhole routing algorithm under uniform and Negative Exponential Distribution (NED) [6][7] traffic patterns.

[Fig. 4 plot: % Area Saving (0 to 50) for 8 µm and 16 µm TSV pitches with 32-bit and 64-bit links, at 130, 90, and 65 nm]

Fig. 4. Percentage of area saving with the BBVC-based communication scheme for different TSV pitches and link sizes in 130–65 nm technologies

Fig. 5(a) and 5(b) show that our BBVC-3D-NoC has almost the same performance as the Typical-3D-NoC under these two traffic patterns. The number next to each point gives the performance overhead percentage at that average packet arrival rate (the difference between the two curves). The bandwidth utilization results versus average packet arrival rate under the synthetic traffic patterns are presented in Fig. 6(a) and 6(b). BBVC-3D-NoC has better bandwidth utilization in every case because of its flexibility.

Fig. 5. Latency versus average packet arrival rate results under (a) uniform and (b) NED traffic

Fig. 6. Bandwidth utilization versus average packet arrival rate results under (a) uniform and (b) NED traffic

IV. CONCLUSION

In this paper, we proposed a dynamically self-reconfiguring 3D NoC architecture that exploits bidirectional bisynchronous vertical channels. By replacing a pair of unidirectional vertical channels with one high-speed bidirectional channel, this communication scheme can considerably reduce the area footprint of interconnect TSVs and mitigate routing complexity. To compensate for the performance loss caused by the reduced number of vertical links, bisynchronous buffers were used for vertical communication.

REFERENCES


[1] B. S. Feero and P. P. Pande, “Networks-on-Chip in a Three-Dimensional Environment: A Performance Evaluation,” IEEE Trans. on Computers, Vol. 58, No. 1, 2009, pp. 32-45.

[2] J. Muttersbach et al., “Practical design of globally asynchronous locally synchronous systems,” in Proc. of ASYNC 2000, pp. 52-59.

[3] I. Loi et al., “Supporting vertical links for 3D networks-on-chip: Toward an automated design and analysis flow,” in Proc. of Nano-Net 2007, pp. 1-5.

[4] S. Pasricha, “Exploring serial vertical interconnects for 3D ICs,” in Proc. of DAC 2009, pp. 581-586.

[5] T. Ono and M. Greenstreet, “A Modular Synchronizing FIFO for NoCs,” in Proc. of NoCS 2009, pp. 224-233.

[6] A. M. Rahmani et al., “Negative Exponential Distribution Traffic Pattern for Power/Performance Analysis of Network on Chips,” in Proc. of VLSID 2009, pp. 157-162.

[7] A. M. Rahmani et al., “NED: A Novel Synthetic Traffic Pattern for Power/Performance Analysis of Network-on-chips Using Negative Exponential Distribution,” Journal of Low Power Electronics, Vol. 5, No. 3, 2009, pp. 396-405.