Modular Processor: A Flexible Library of ASIC Modules

Z. Stamenković, G. Panić, U. Jagdhold, H. Frankenfeldt, K. Tittelbach-Helmrich, G. Schoof and R. Kraemer
IHP GmbH, Im Technologiepark 25, 15236 Frankfurt (Oder), Germany

The Test project defines a new Design-for-Testability (DFT) approach and techniques for testing multiprocessors on a chip.

Abstract

The paper presents a specific approach to SoC design that aims to provide a library of ASIC modules reusable in a standard digital design flow. It introduces the concept of the library and its hierarchy, and describes how to specify, synthesise, lay out, verify, and reuse ASIC modules.

Commercially available libraries (for example, Synopsys's DesignWare® Library [6]) usually do not offer a full hierarchy: they do not include ready-to-use layout modules. In contrast, our approach provides a full and, at the same time, flexible library hierarchy by generating and verifying ASIC modules in all three forms: functional (RTL), logical (net-list), and physical (layout).

Key Words Processor library, ASIC module, synthesis, and layout

1. Introduction

In the following sections, we explain in detail how to create, maintain, and extend such a modular processor library. Section 2 describes the library structure. Section 3 focuses on the design flow and implementation details of the library. Section 4 presents an example of a library module.

The System-on-Chip (SoC) design methodology takes a design from the architectural level or Register Transfer Level (RTL) to layout. It should provide the designer with a working starting point for each stage of the design process. We describe such a methodology, which relies on a flexible and extensible library of reusable ASIC modules and satisfies the unique needs of custom applications.


As the Systems Department of IHP works on complete wireless high-end systems, our library is specified to support wireless applications and the design of a wireless engine implemented as a multiprocessor on a chip (Figure 1). Several projects contribute to and use the library:

[Figure 1 block labels: Protocol Engine, Application Engine, DLC, Power Management, Baseband, Test Engine, RF; associated projects: Wireless Internet, Mobile Business Engine, Test Project, Wireless Broadband Network]

The Wireless Broadband Network (WBN) project focuses on highly integrated broadband wireless modems according to the IEEE 802.11a [1] and HiperLAN2 [2] standards. The WBN team developed special hardware/software co-designs that allow the realisation of wireless modems with a full throughput of 54 Mb/s, including the MAC layer [3].

Figure 1 Illustration of the wireless engine approach

2. Library Structure

The library comprises modules described in RTL, in flat or hierarchical net-list format, in Cadence's Library Exchange Format (LEF) and Timing Library Format (TLF), and in Standard Delay Format (SDF), together with test benches. This section describes the hierarchical structure of the library, the module formats, and the test benches.
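As a rough illustration of this structure, a library entry can be pictured as a record that collects one file per format. The sketch below is not the authors' database schema: the class name `LibraryModule`, its fields, and the file names are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Views a module may provide, named after the formats listed in the text.
FORMATS = ("rtl", "netlist", "lef", "tlf", "sdf", "testbench")


@dataclass
class LibraryModule:
    """Hypothetical record for one library module and its sub-modules."""
    name: str
    files: Dict[str, str] = field(default_factory=dict)      # format -> file path
    children: List["LibraryModule"] = field(default_factory=list)

    def missing_formats(self) -> List[str]:
        """Formats for which this module has no file yet."""
        return [f for f in FORMATS if f not in self.files]

    def is_hard_core(self) -> bool:
        """Assumption: a module counts as a hard core once a LEF view exists."""
        return "lef" in self.files


# Made-up example entry that so far has only RTL and a test bench.
uart = LibraryModule("uart", {"rtl": "uart.v", "testbench": "uart_tb.v"})
print(uart.missing_formats())  # ['netlist', 'lef', 'tlf', 'sdf']
print(uart.is_hard_core())     # False
```

A record of this kind makes it easy to see which of the three forms (functional, logical, physical) is still missing for a given module.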

The Wireless Internet (WI) project develops a new terminal-oriented TCP/IP for wireless systems. The focus is on raising the energy efficiency by using vertical optimisation [4]. The Mobile Business Engine (MBE) project seeks a specific application processor for the wireless engine that targets highly efficient encryption operations to increase wireless privacy and security [5].


2.1 RTL Modules

Having written a design specification, the next job is to describe the design in computer-readable form using the client's Hardware Description Language (HDL) of choice (VHDL or Verilog). The aim is to produce RTL code that clearly exhibits the functionality prescribed in the specification while meeting the constraints placed upon it by the target technology and standard cell library. The skill and experience of the designer are the most important factors in this process.

2.5 Test Benches

A test bench is written to prove the correctness of the RTL model against the specification. A good test bench (or set of test benches) should:
• Prove that the device functions as described in the specification, reporting any discrepancies to the transcript so that they can be fixed;
• Fully exercise every line of RTL code;
• Allow re-simulation of the post-layout net-list. In many cases, only a subset of the total test suite can be run at gate level because of the massive increase in computing power required;
• In some cases, produce cycle-based test vectors for final electrical testing during manufacture.

2.2 Net-list Modules

To create a module database for the library, we synthesise flat and hierarchical Verilog net-lists. The module net-lists are compiled, saved and added to the library database. The basic elements of a net-list are global design data, components, and nets. Global design data is described in terms of special keywords; examples are the design name and comments specifying the design engineer, revision numbers, etc. Components are described in terms of instance names associated with the names of standard cells. Nets are described in terms of groups of standard-cell pin names.
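The three basic elements of a net-list can be sketched as a small data structure. This is only an illustrative in-memory model under our own naming (the class `Netlist`, its methods, and the cell names `NAND2` and `INV` are assumptions), not the format used by any synthesis tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Netlist:
    """Hypothetical in-memory view of a flat gate-level net-list."""
    design_name: str                                            # global design data
    comments: List[str] = field(default_factory=list)           # engineer, revision, ...
    components: Dict[str, str] = field(default_factory=dict)    # instance -> standard cell
    nets: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)  # net -> (instance, pin)

    def add_component(self, instance: str, cell: str) -> None:
        self.components[instance] = cell

    def connect(self, net: str, instance: str, pin: str) -> None:
        """Attach one standard-cell pin to a net; the instance must already exist."""
        if instance not in self.components:
            raise KeyError(f"unknown instance {instance!r}")
        self.nets.setdefault(net, []).append((instance, pin))

    def pin_count(self, net: str) -> int:
        """Number of cell pins grouped under the given net."""
        return len(self.nets.get(net, []))


nl = Netlist("demo", comments=["revision: 0"])
nl.add_component("u1", "NAND2")
nl.add_component("u2", "INV")
nl.connect("n1", "u1", "Z")
nl.connect("n1", "u2", "A")
print(nl.pin_count("n1"))  # 2
```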

We provide configurable and hierarchically linked test benches. The hierarchy of a top test bench follows the hierarchy of the library. Configuration of test benches is possible using an in-house prepared Tcl interface.

3. Modular Library Design Flow

The design flow starts with the creation of an HDL file describing the system and its components as black boxes [7]. Most components are configurable functional building modules for which, once the parameters are chosen, the net-list and physical layout are generated automatically. Functional modules are configured through an in-house Tcl interface. In addition, area and average power dissipation estimates are generated to support the exploration of design alternatives. This is done using the hierarchical structure of the library.
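One way to picture how a hierarchical library can yield area and power estimates is a bottom-up roll-up over the module tree. The following sketch assumes that each module stores its own contribution and that the totals are simple sums; the class `Block` and all numbers are placeholders, not figures from the library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical node in the module hierarchy."""
    name: str
    area_mm2: float = 0.0      # own area, excluding sub-modules
    power_mw: float = 0.0      # own average power, excluding sub-modules
    children: List["Block"] = field(default_factory=list)

    def total_area(self) -> float:
        return self.area_mm2 + sum(c.total_area() for c in self.children)

    def total_power(self) -> float:
        return self.power_mw + sum(c.total_power() for c in self.children)


# Placeholder values chosen only to make the arithmetic easy to follow.
cache = Block("cache", 1.0, 50.0)
cpu = Block("cpu", 2.0, 100.0, [cache])
uart = Block("uart", 0.25, 5.0)
soc = Block("soc", 0.5, 10.0, [cpu, uart])

print(soc.total_area())   # 3.75
print(soc.total_power())  # 165.0
```

Swapping one child for an alternative configuration and recomputing the totals is the kind of design-space exploration the estimates are meant to support.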

2.3 LEF and TLF

The first LEF file must include the technology information and a complete description of standard cells, I/O cells, and macro blocks. You can incrementally add data from other LEF files to the library in memory. A single macro block may have multiple LEF files (logical, physical and timing). The description within LEF files must follow the structure and syntax rules defined by the layout tool. The details of the LEF are as follows:
• The LEF file is used to create a library database.
• If you plan to use timing-driven placement, a TLF file with timing information is necessary.

Figure 2 shows the main steps of the design flow, including both the logical and the physical stages. Logic synthesis tools are used to synthesise a net-list representation of the design [8]. RTL and net-list simulations (with and without timing information) are performed by HDL simulation tools [9], [10]. Layout tools perform floor planning, placement, clock tree generation, routing (including delay optimisation), and layout verification [11].

It is necessary to use one timing model for all layout tools. One can use Compiled Timing Library Format (CTLF), which supports table (non-linear) and linear models.
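The requirement that the first LEF file carries the technology information can be checked before loading. The sketch below is a simplified keyword scan, not a LEF parser: it assumes that a top-level `LAYER` statement indicates technology data and that a `MACRO` statement indicates cell data, and the function names and sample fragments are our own.

```python
from typing import List, Set


def lef_sections(text: str) -> Set[str]:
    """Report which kinds of top-level statements appear in a LEF text.

    LAYER lines nested inside a MACRO (pin geometry) are ignored.
    """
    found: Set[str] = set()
    current_macro = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if current_macro is not None:
            # Stay inside the macro until its matching "END <name>".
            if words[0] == "END" and len(words) > 1 and words[1] == current_macro:
                current_macro = None
            continue
        if words[0] == "LAYER":
            found.add("technology")
        elif words[0] == "MACRO" and len(words) > 1:
            found.add("macro")
            current_macro = words[1]
    return found


def check_load_order(lef_texts: List[str]) -> bool:
    """True if the first file in the load order contains technology data."""
    return bool(lef_texts) and "technology" in lef_sections(lef_texts[0])


tech = "LAYER metal1\n  TYPE ROUTING ;\nEND metal1\n"
cells = (
    "MACRO NAND2\n  CLASS CORE ;\n  PIN A\n    PORT\n"
    "      LAYER metal1 ;\n    END\n  END A\nEND NAND2\n"
)
print(check_load_order([tech, cells]))  # True
print(check_load_order([cells, tech]))  # False
```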

2.4 SDF Files

The timing of a library module is contained in a corresponding SDF file. This file can be generated after logic and/or layout synthesis and is necessary for simulation.

3.1 Logic Synthesis

The standard digital design flow usually starts with an RTL description of the design. Synthesis tools then map the design, under the specified timing constraints, onto the target standard cell library to produce a gate-level equivalent. At the end of this synthesis step, a Verilog net-list and an SDF file are generated, and the design is simulated. To shorten the time needed for synthesis, we have carefully prepared synthesis scripts with design constraints.


3.2.1 Floor Planning


First Encounter supports hierarchical floor planning. The floor planner uses an interactive approach, combining automatic functions and interactive editing to plan locations for blocks and core rows. It can predict and assess the effects of physical layout before you place and route the design, which speeds up the design process by minimising costly iterations. The primary goal of floor planning is to create rows where the placer can place cells. Each row is assigned a site type, which limits the number of cells that can be placed in the row.

[Figure 2 flowchart labels: Applications; System Specification; Library Database with New Modules, Configurable (Customisable) Modules (synthesisable RTL code) and Predefined Modules (standard cells, synthesised net-list, SDF and/or LEF); HDL Model Definition; Logic Synthesis; Layout Synthesis; Chip Design; Test Benches; three "Simulation OK?" decision points]

Initialisation sets up the physical design based on the data you read in. It creates a core area with core rows, and sets up GCell and RGrid tracks. If the standard cell model is tall and narrow, horizontal rows are created. If the standard cell model is short and wide, vertical rows are created. Initialisation does not place any cells. The floor planner estimates the required layout area (die size), defines the layout domain (shape, aspect ratio), and plans and modifies the row configuration. The row configuration implies the position, spacing, orientation and row type. You can also align rows, adjust the channel space and modify the cell distribution within rows. Finally, it is possible to edit groups and group constraints, resize the layout area and modify the global routing grid.
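The relationship between core size, site type and row capacity can be illustrated with a few lines of arithmetic. The sketch assumes horizontal rows that tile the core without channel spacing and dimensions given as integers in one common unit; the function names and the example dimensions are hypothetical and do not come from the tool.

```python
from typing import List, Tuple


def row_orientation(cell_width: int, cell_height: int) -> str:
    """Tall, narrow cells get horizontal rows; short, wide cells get vertical rows."""
    return "horizontal" if cell_height > cell_width else "vertical"


def plan_rows(core_width: int, core_height: int,
              site_width: int, site_height: int) -> Tuple[int, int]:
    """Return (row_count, sites_per_row) for horizontal rows with no channels."""
    if min(core_width, core_height, site_width, site_height) <= 0:
        raise ValueError("all dimensions must be positive")
    return core_height // site_height, core_width // site_width


def fits_in_row(cell_widths_in_sites: List[int], sites_per_row: int) -> bool:
    """A row can hold the cells only if their widths do not exceed its sites."""
    return sum(cell_widths_in_sites) <= sites_per_row


rows, sites = plan_rows(core_width=1000, core_height=500, site_width=2, site_height=10)
print(row_orientation(2, 10))         # horizontal
print(rows, sites)                    # 50 500
print(fits_in_row([3, 4, 2], sites))  # True
```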


3.2.2 Power Planning


After the floor plan is ready and read in, the next step is to create power paths in the design. Then power rings and power stripes can be added.


3.2.3 Placement


Generally, I/O pins are placed before blocks. If there are predetermined I/O pin placement requirements, an I/O constraint file should be used to place the I/O pins. Otherwise, I/O pins and cells can be placed at the same time. After this, we can analyse the timing at different points in the design with an ideal clock, as there are no clock trees yet. First Encounter can perform in-place optimisation by resizing gates (including flip-flops) and inserting buffers and inverters to correct timing and electrical violations.

Figure 2 Overview of the library design flow

3.2 Layout and Physical Synthesis

Layout tools provide a foundation for deep-submicron technology design with three or more metal layers. We use First Encounter® [12], which is a tool for hierarchical designs that have timing-critical blocks. As the core of the Cadence® Encounter™ digital IC design platform, First Encounter quickly produces a silicon virtual prototype of the physical design, which provides both rapid feedback on chip performance and a fully functional, physically feasible layout.

3.2.4 Generating Clock Trees

Clock distribution and skew are controlled using an automatic clock tree generator. The clock buffer space and clock net must be defined. The clock tree generator automatically constructs an optimised clock tree, minimises skew and clock tree min/max insertion delay, and uses buffer/inverter selection control. First Encounter can perform a hierarchical clock tree synthesis.
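The quantities a clock tree generator tries to minimise can be stated compactly. In the sketch below, skew is taken as the difference between the largest and smallest insertion delay over all clock sinks, which is the usual definition; the function name, the sink names and the delay values in picoseconds are invented for illustration.

```python
from typing import Dict


def clock_tree_report(insertion_delay_ps: Dict[str, int]) -> Dict[str, int]:
    """Summarise min/max insertion delay and skew for a set of clock sinks."""
    if not insertion_delay_ps:
        raise ValueError("at least one clock sink is required")
    lo = min(insertion_delay_ps.values())
    hi = max(insertion_delay_ps.values())
    return {"min_insertion_ps": lo, "max_insertion_ps": hi, "skew_ps": hi - lo}


# Hypothetical insertion delays from the clock root to three flip-flops.
report = clock_tree_report({"ff_a": 820, "ff_b": 865, "ff_c": 840})
print(report)  # {'min_insertion_ps': 820, 'max_insertion_ps': 865, 'skew_ps': 45}
```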

With this physical prototype, our front-end designers can quickly explore the impact of their implementation choices on chip performance and physical feasibility. The back-end designers produce a floor plan and placement optimised for rapid, reliable design closure. We first do the floor planning, including the power planning, and then placement, clock tree generation and routing, using our own scripts.

3.2.5 Routing

As an example, we have selected a system-on-chip implementation of the IEEE 802.11a MAC layer [17]. It integrates some unique solutions in an architecture that uses dedicated hardware for timing-critical tasks. The system comprises the MIPS core (including an 8 KB instruction cache and an 8 KB data cache) and several peripherals: a hardware accelerator, an I2C bus, a PCMCIA interface, and two UARTs. The MIPS core has access to two slaves (a peripheral bus controller and a GPIO) via Xbusses. The peripheral bus controller is intended to control the external SRAM, programmable flash, and UARTs. The hardware accelerator, I2C bus and PCMCIA are connected to the processor via the GPIO module.

At this point, we have a fully placed design and are ready to route it. In general, the routing of a design is done in three stages: the routing of power nets, the routing of clock trees, and the routing of the remaining nets. We use Cadence® NanoRoute Ultra™ [13] to perform the global and detail routing.

3.2.6 Optimisation

There is a critical synergistic relationship between placement and routing for all aspects of design closure, including timing, power, cross talk and congestion. That is why a post-routing optimisation step is necessary.
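The three-stage routing order named in Section 3.2.5 (power nets, then clock trees, then the remaining nets) amounts to a stable sort of the nets by class. The sketch below only illustrates that ordering; the net names and the `ROUTING_ORDER` table are assumptions, and it says nothing about how the router itself works.

```python
from typing import List, Tuple

# Routing priority per net class: lower numbers are routed first.
ROUTING_ORDER = {"power": 0, "clock": 1, "signal": 2}


def routing_sequence(nets: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Order (net_name, net_class) pairs by stage, keeping input order within a stage."""
    return sorted(nets, key=lambda net: ROUTING_ORDER[net[1]])


nets = [("n1", "signal"), ("VDD", "power"), ("clk", "clock"), ("GND", "power")]
print(routing_sequence(nets))
# [('VDD', 'power'), ('GND', 'power'), ('clk', 'clock'), ('n1', 'signal')]
```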

Two embedded SRAM modules, of 8 KB and 2 KB, are intended for implementing the cache tag and data arrays and the PCMCIA and hardware accelerator interfaces.

3.2.7 Verification and Back-annotation

The routed design is checked for connectivity and geometry violations. At this phase, we extract parasitic capacitances from the layout and generate a Verilog net-list, an SDF file, and a LEF file of the design. This step ends with the design simulation (including the generated SDF file) or timing analysis to verify the design performance. The operation is known as back-annotation.

The cache tag array is implemented with a 2 KB module (organised as 512x32, although only 512x24 is needed). Likewise, the cache data array is implemented with an 8 KB module (2048x32). The way-select array (512x1) is made of flip-flops. The PCMCIA interface is intended to serve as the interface to the upper layer (TCP/IP). The interface to the system is provided via four modules of dual-ported SRAM (adapted single-port SRAMs) and a pair of configuration registers. There is a pair of 8 KB SRAM modules for data and a pair of 2 KB SRAM modules for control.
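The quoted memory sizes follow directly from the word counts and word widths, taking 1 KB as 1024 bytes. The short check below reproduces that arithmetic; the helper name `sram_bytes` is our own.

```python
def sram_bytes(words: int, bits_per_word: int) -> int:
    """Capacity in bytes of a memory organised as words x bits_per_word."""
    total_bits = words * bits_per_word
    if total_bits % 8:
        raise ValueError("capacity is not a whole number of bytes")
    return total_bits // 8


KB = 1024

tag_module = sram_bytes(512, 32)     # module used for the cache tag array
tag_needed = sram_bytes(512, 24)     # capacity the tag array actually needs
data_module = sram_bytes(2048, 32)   # module used for the cache data array

print(tag_module // KB)         # 2  (KB)
print(data_module // KB)        # 8  (KB)
print(tag_module - tag_needed)  # 512 bytes of the tag module stay unused
```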

3.3 Design Verification

We use an HDL simulator to verify that the RTL, the logically synthesised net-list and the physically synthesised net-list all have the expected functionality. The design is simulated with a specific hierarchical test bench. Iteration may be needed to obtain the final layout and, at the end, a new hard-core library module (as a LEF file).

The hardware accelerator includes a single-port SRAM module of 2 KB and a 2x256-byte dual-port RAM made of flip-flops.

3.4 Design Reuse

A new design automatically becomes a module of the library that can be reused in more complex system-on-chip designs. But we are still not done. To make a chip, we have to add I/O pads and a top-level power ring, and perform the hierarchical clock tree synthesis and final routing.

We have used the flat-design approach in order to meet the timing requirements and to generate the clock tree efficiently. In IHP's 0.25 µm CMOS technology with five metal layers, the total chip area is 40 mm² (Figure 3). The core area, excluding the pads and memory modules, is about 16 mm². The chip contains 188 signal pins and 16 power pins (204 pins in total) and integrates, excluding memories, about 200,000 NAND gates (800,000 transistors). The power consumption is estimated to be in the range of 1 W at 80 MHz.
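The chip figures quoted for the MAC SoC are mutually consistent, as a short calculation shows. The sketch assumes that the gate count refers to two-input static CMOS NAND gates with four transistors each; the derived core fraction, power density and clock period are our own back-of-the-envelope values based on the approximate numbers in the text, not results reported by the authors.

```python
TRANSISTORS_PER_NAND2 = 4  # assumption: two-input static CMOS NAND gate

nand_gates = 200_000
signal_pins, power_pins = 188, 16
chip_area_mm2, core_area_mm2 = 40, 16
power_w, clock_mhz = 1.0, 80

transistors = nand_gates * TRANSISTORS_PER_NAND2         # matches the quoted 800,000
total_pins = signal_pins + power_pins                    # matches the quoted 204
core_fraction = core_area_mm2 / chip_area_mm2            # share of the die taken by the core
power_density_mw_mm2 = 1000 * power_w / chip_area_mm2    # rough average over the whole die
clock_period_ns = 1000 / clock_mhz                       # period at the quoted clock rate

print(transistors, total_pins)     # 800000 204
print(core_fraction)               # 0.4
print(power_density_mw_mm2)        # 25.0
print(clock_period_ns)             # 12.5
```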

4. Library Modules

We started with a library including a soft-core MIPS32 4KEp™ processor [14], several open cores (UART, GPIO, I2C bus, PCMCIA and controllers [15]), three hardware accelerators, and three hard-core SRAM modules of 2 KB, 8 KB and 16 KB. Afterwards, the full version of the configurable AMBA on-chip bus [16] was installed. This served as a platform for the implementation of five modules: three for the WBN project, one for both the WI and MBE projects, and one based on scratchpads and the AMBA bus for test purposes.

5. Conclusions

We have developed a specific approach to SoC design: a modular system consisting of various hardware components that meet the requirements of modern wireless high-end systems. Most components are extensible or configurable functional building modules for which, once the parameters are chosen, the net-list and physical layout are generated automatically.


We have also described how to specify, synthesise, lay out, verify and reuse hard-core ASIC modules, and how to create a flexible library of modules as a base for the design of a powerful modular processor.

Figure 3 Layout of the MAC SoC design made by the modular library approach

References

[1] http://grouper.ieee.org/groups/802/11
[2] http://www.hiperlan2.com
[3] E. Grass, K. Tittelbach-Helmrich, U. Jagdhold, A. Troya, G. Lippert, O. Krüger, J. Lehmann, K. Maharatna, K.F. Dombrowski, N. Fiebig, R. Kraemer, & P. Mähönen, On the single-chip implementation of a Hiperlan/2 and IEEE 802.11a capable modem, IEEE Personal Communications Magazine, 8(12), 2001, 48-57.
[4] M. Methfessel, K.F. Dombrowski, P. Langendörfer, H. Frankenfeldt, I. Babanskaja, I. Matthaei, & R. Kraemer, Vertical optimization of data transmission for mobile wireless terminals, IEEE Wireless Communications, 9(12), 2002, 36-43.
[5] P. Langendörfer, Integration moderner Hand Implementierungstechniken in Codegeneratoren, University of Erlangen, 2001.
[6] http://www.synopsys.com/products/designware
[7] E. Blokken, H. DeKeulenaer, F. Catthoor, & H.J. DeMan, A flexible module library for custom DSP applications in a multiprocessor environment, IEEE J. Solid-State Circuits, 25(6), 1990, 720-729.
[8] http://www.synopsys.com/products/logic
[9] http://www.model.com/products
[10] http://www.cadence.com/products/functional_ver
[11] http://www.cadence.com/products/digital_ic
[12] http://www.cadence.com/products/digital_ic/first_encounter
[13] http://www.cadence.com/products/digital_ic/nanoroute_ultra
[14] http://www.mips.com/content/Products
[15] http://www.opencores.org/browse.cgi/by_category
[16] http://www.synopsys.com/cgi-bin/designware/amba
[17] G. Panić, D. Dietterle, Z. Stamenković, & K. Tittelbach-Helmrich, A system-on-chip implementation of the IEEE 802.11a MAC layer, Proc. 3rd EUROMICRO Symposium on Digital System Design, Antalya, Turkey, 2003, 319-324.
