
Clockless Design Performance Monitoring for Nanometer Technologies

Marc Renaudin, Aurélien Buhrig, Charles Guillemet
TIEMPO-SAS, Inovallée, 110 Rue Blaise Pascal, 38330 Montbonnot Saint Martin, France
marc.renaudin@tiempo-ic.com

Robin Wilson, Sylvain Engels
STMicroelectronics, 850 Rue Jean Monnet, 38926 Crolles, France
[email protected], [email protected]

Abstract: This paper introduces a breakthrough in the domain of performance process monitoring that is based on clockless (asynchronous) circuits. The process monitoring chip was fabricated using 32nm and 28nm bulk technologies and enabled the monitoring of relevant process parameters using an easy-to-use set-up.

Keywords: process monitoring; clockless design; asynchronous design; nanometer technologies.

I. MOTIVATIONS AND OBJECTIVES

When developing silicon design platforms, it is important from a design perspective to characterize key design performance metrics for a given technology node. For digital designs, such key performance parameters could be maximum frequency and dynamic and leakage power at given operating conditions. Design performance metric validation is required to confirm that the target technology performance is met by actual designs. In addition, it is necessary to demonstrate that the CAD models and design flow used during design development can accurately predict final silicon performance.

Currently there are a number of different approaches to design performance assessment, each with its own advantages and limitations. Such silicon characterization is possible through a wide variety of circuits and silicon measurements. For example, technology frequency capability can be assessed using the classical approach of ring oscillators, a reference digital design block, or a full product-level (SoC) IP. While the classical ring oscillator approaches are extremely good tools for CAD versus silicon modeling comparisons and also for technology performance assessment, they may not fully reflect the final design performance seen at product level, mainly due to “the random nature of digital logic and associated interconnect”.

Reference designs tailored to design performance monitoring can be used. However, given the ever-increasing design performance requirements, now in the 1-2 GHz arena, even a simplified reference design can be complex to implement because of its clocking and power requirements. Design implementation becomes more complex still with the evolving need to validate over a very wide voltage range.

This project is supported by ENIAC and was carried out in collaboration with CEA/LETI.

We introduced and validated a new performance monitoring method, based on Tiempo’s asynchronous design technology and CAD flow. As a validation of this performance-monitoring technology, we designed and implemented a monitoring application chip, MTAM16, that has been fabricated by STMicroelectronics using its 32nm and 28nm process technologies. The clockless library and design flow enable a highly robust and reliable monitoring chip that is capable of obtaining relevant performance parameters, including dynamic and leakage power, over a wide range of operating conditions.

II. DESIGN FOR PERFORMANCE MONITORING

The architecture of the monitoring chip MTAM16 includes a programmable processing unit based on Tiempo’s asynchronous microcontroller TAM16 (Fig. 1), associated with a RAM and a ROM, and four peripherals: an RS232 serial link to communicate with a host computer, delay-insensitive in-and-out serial links to communicate with other monitoring chips and/or sensors, and a GPIO used to provide status information and to select which monitoring programs are executed.

[Figure 1: block diagram of the TIEMPO Monitoring Chip with the blocks TAM16 µC, Decoder, RAM, ROM, GPIO, Serial_In/Out, and RS232.]

Fig. 1. Synoptic of the architecture of the monitoring chip.

All the blocks of the chip, including the memories, are asynchronous and delay insensitive; together they represent around 300 K equivalent logic gates. This choice was prompted by the need to use only digital blocks designed with Tiempo asynchronous technology, and not to rely on blocks that are either not yet available or sensitive to process variations. This is essential for reaching high robustness and for ensuring that the chip's functionality does not depend on process characterizations that are not yet available.

Even though the monitoring chip can easily be extended using the serial links (for example by adding voltage or temperature sensors), the sensors implemented in this application consist exclusively of short programs that execute a few instructions in a loop and are representative of digital IPs in terms of logic content. Because the chip executes real instructions and real calculations, many different logic paths are exercised during the performance characterization test, giving wide coverage over which throughput can be calculated. These measurements provide the means to calculate Fmax and power for representative cones of digital logic.

III. DESIGN FLOW

Tiempo developed a complete design flow based on commercial CAD tools complemented with Tiempo’s specific synthesis tool called ACC (Asynchronous Circuit Compiler). Circuits are designed following a standard-cell-based design flow as depicted in Fig. 2.

[Figure 2: flow diagram with the labels SystemVerilog, Standard cells, Asynch cells, ACC synthesis, Verilog, SDC / SDF, SystemVerilog Benches, P&R, Simulation, Physical verifications, and Tape-Out.]

Fig. 2. Design flow. Standard tools and libraries are in blue, and Tiempo tools and libraries are in green.

For the purpose of this project, a limited set of asynchronous cells (13 functions) was designed to complement the standard library available for synchronous circuits. This set is fully compatible with the existing synchronous library, so that synchronous and asynchronous cells can be mixed to synthesize, place and route asynchronous circuits. Details on the flow, and in particular the back-end steps applied to avoid the use of timing characterization while still properly managing the isochronic forks of the design, will be outlined during the presentation.

IV. SILICON CHARACTERIZATION EXPERIMENTS AND RESULTS

The ROM embedded in the chip includes a Built-In Self-Test (BIST) program enabling self-testing of all parts of the monitoring device. It also includes predefined monitoring programs as well as a loader enabling the execution of user-defined monitoring campaigns (a software development kit is available to enable the design and debug of application programs written in C). The on-chip hardware provides a straightforward mechanism to load test programs into the on-chip memory via an RS232 interface. In this way we were able to load the silicon characterization programs directly from a PC connected via USB to the RS232 port of the chip. The USB chip on the application board provides the voltage reference power supply for the TAM16.

Once the circuit is available on silicon, the fundamental characteristics of clockless design greatly facilitate silicon characterization. First of all, the robustness of the design across a very wide voltage range enables performance measurements for complex computation algorithms over that whole range. This can be regarded as characterizing the performance of a representative design with the simplicity of ring oscillator characterization. Likewise, Ileak and Idyn measurements can be made directly for each VDD setting. This is simpler than measuring the performance of a clocked design, where for a given characterization point the frequency is programmed to a given value (PLL settings), then a voltage sweep is made to find VDDmin, and then Idyn and Ileak are measured at this VDD. Unlike a clocked design, therefore, TAM16 requires no PLL and none of the associated programming steps.

In order to analyze the TAM16 performance, we first calibrated the silicon measurements against known design performance monitors that were available on the same skew wafers. The results of these monitors show that the TAM16 readings are in line with silicon centering expectations. From the graphs obtained we can clearly see the split between silicon corners.
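The contrast drawn earlier in this section between the clockless and clocked measurement procedures can be illustrated with a minimal Python sketch. It is not the authors' test software: the function names, the `Point` record, and the three `stub_*` functions are hypothetical, and the stub formulas are invented placeholders rather than silicon data. The sketch assumes throughput is obtained as loop iterations divided by elapsed time, and that the bench returns Idyn and Ileak directly for a given VDD. For brevity, the clocked routine stops at the VDDmin search and omits the current measurement that follows it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Point:
    vdd: float         # supply voltage (V)
    throughput: float  # loop iterations per second
    i_dyn: float       # dynamic current (A)
    i_leak: float      # leakage current (A)


def characterize_clockless(
    vdd_steps: Iterable[float],
    run_loop: Callable[[float, int], float],
    read_currents: Callable[[float], Tuple[float, float]],
    iterations: int,
) -> List[Point]:
    """Clockless flow: one direct measurement per VDD setting."""
    points = []
    for vdd in vdd_steps:
        # The self-timed chip simply runs; there is no frequency to program.
        elapsed_s = run_loop(vdd, iterations)
        i_dyn, i_leak = read_currents(vdd)
        points.append(Point(vdd, iterations / elapsed_s, i_dyn, i_leak))
    return points


def find_vddmin_clocked(
    freq_hz: float,
    vdd_steps: Iterable[float],
    passes: Callable[[float, float], bool],
) -> Optional[float]:
    """Clocked flow: fix the frequency first, then sweep VDD upward for VDDmin."""
    for vdd in sorted(vdd_steps):
        if passes(freq_hz, vdd):
            return vdd
    return None  # no tested voltage sustains this frequency


# Hypothetical stand-ins for bench equipment; the formulas are invented.
def stub_run_loop(vdd: float, iterations: int) -> float:
    return iterations * 1e-6 / vdd


def stub_read_currents(vdd: float) -> Tuple[float, float]:
    return 1e-3 * vdd, 1e-6 * vdd


def stub_passes(freq_hz: float, vdd: float) -> bool:
    return vdd >= 0.5 + freq_hz * 2.5e-7


if __name__ == "__main__":
    for p in characterize_clockless(
        [0.6, 0.8, 1.0], stub_run_loop, stub_read_currents, 1000
    ):
        print(p)
    print(find_vddmin_clocked(1.0e6, [0.6, 0.8, 1.0], stub_passes))
```

In this sketch the clockless routine needs a single pass over the VDD list, whereas the clocked routine has to be repeated for every frequency of interest, which mirrors the extra PLL programming step described in the text.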
We can also see a good correlation between the throughput of TAM16 and a reference silicon performance monitor.

V. CONCLUSION

Using the Tiempo design flow, we designed a monitoring chip that we can use to (1) monitor design characteristics in the target technology, and (2) calibrate the CAD model predictions against actual silicon behavior. The asynchronous design style produces a robust monitor that is as simple to control as a ring oscillator validation suite.

For this chip design, the use of timing information was avoided throughout the design flow. This proved that Tiempo circuits can be taped out and work correctly on any new process, even during the phase where the characterization of that process is not finalized. Additionally, this property enables an easy migration of the MTAM16 chip to any other advanced process technology, regardless of the maturity of that process. This was verified by performing a blind shrink of the 32nm chip to a 28nm process.