
IEEE TRANSACTIONS ON COMPUTER AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 19, NO. 12, DECEMBER 2000

EDA Challenges Facing Future Microprocessor Design

Timothy Kam, Shishpal Rawat, Desmond Kirkpatrick, Rabindra (Rob) Roy, Gregory S. Spirakis, Naveed Sherwani, and Craig Peterson

Abstract—As microprocessor design progresses from tens of millions of transistors on a chip using 0.18-µm process technology to approximately a billion transistors on a chip using 0.10-µm and finer process technologies, the microprocessor designer faces unprecedented Electronic Design Automation (EDA) challenges over the future generations of microprocessors. This article describes the changes in the design environment that will be necessary to develop increasingly complex microprocessors. In particular, the article describes the current status and the future challenges along three important areas in a design flow: design correctness, performance verification, and power management.

Index Terms—Design correctness, design flow, design methodology, design validation, design verification, electronic design automation, low-power design, microprocessor design, performance verification, power management.

I. INTRODUCTION

During the mid-1980s, most design activity in the microprocessor development cycle was centered on performance verification challenges. High-end microprocessors were operating at 25-MHz to 33-MHz speeds. The scope of the verification effort began to expand, and by the mid-1990s these resources were distributed across performance verification and presilicon/postsilicon verification. To address the design challenges, the industry transitioned from schematic capture to designs described in a hardware description language (HDL). A subset of the HDL description could be synthesized into the final design using logic and layout synthesis, a technology developed in the mid-1980s. As we approach the designs of the next decade (circa 2006, with production around 2008–2010), the landscape is about to undergo another radical change, as designs will be limited by power dissipation [1] and by our ability to verify performance and correctness. Design and verification of power-centric design modules will be added to the list of "verification challenges" for the coming decade. To take a closer look at these issues, we describe the design environment for the upcoming decade and then propose design directions.

Presently, designs at 0.18-µm are in production. Foundries such as TSMC have announced commercial availability of 0.13-µm designs for late 2000 [2]. Smaller designs, at 0.10-µm and below, are in advanced development in major labs [3]. This acceleration in feature size is faster than predicted by ITRS 99 (Table I).

Manuscript received May 19, 2000. This paper was recommended by Associate Editor R. Camposano. T. Kam, D. Kirkpatrick, N. Sherwani, and C. Peterson are with Intel Corporation, Hillsboro, OR 97124 USA. S. Rawat is with Intel Corporation, Folsom, CA 95630 USA. G. S. Spirakis is with Intel Corporation, Santa Clara, CA 95052 USA. R. Roy is with Mobilian Corporation, Hillsboro, OR 97124 USA. Publisher Item Identifier S 0278-0070(00)10452-X.

Looking ahead to the microprocessor design environment in the year 2006, a high-performance microprocessor design team will have an available silicon budget of 2.5 billion to 5 billion transistors (Table I). From past experience, these additional transistors will not be deployed in cache memory only. Continual architectural advancements are necessary to achieve the processor performance trend while effectively trading off cost and power. Selective local clocking trends to improve performance will force increasing divergence between local and global clocks (Table I). Micro-architecture will be selectively tuned to what the performance circuits and fabrication process can deliver. Furthermore, motivated by performance increase and cost reduction, silicon real estate has been used effectively to place more functionality on chip. The higher degree of integration is evident in processor offerings from a CPU-only processor (e.g., the 80386), to one with an integrated floating point unit (FPU), to on-chip cache, to MMX media enhancements, etc.

Table I is an abbreviated version of the ITRS roadmap [4] for high-end microprocessors. The semiconductor industry has consistently surpassed the ITRS roadmap long-range frequency targets; therefore we expect the microprocessor designs of the next decade to be more aggressive than indicated by the roadmap. In particular, the frequency targets for years 2008–2011 appear to be too conservative. Additionally, since supply voltage is not scaling down aggressively, architects, circuit designers, and tool developers face a major challenge in keeping the total power dissipation in line with ITRS projections. System designers consider the power ratings to be already 20%–30% too high for the highest performance servers. Today the power consumed by desktop through server microprocessors ranges from 40 W to 115 W in an air-cooled environment. Even if we meet these external constraints, the power delivery and power density issues are getting worse, since the core logic portion of the chip is decreasing as a percentage of the total chip (while the absolute number of transistors in the core is still growing). Extrapolation of current power trends predicts that future chips may run with the power density of a nuclear reactor—2000 W/cm² [1]. As the workstation and server markets diversify, we envision that the market will demand the family of microprocessors to cover a much wider spectrum of power range, although the upper range may be slightly relaxed in future systems.

Raw design complexity may compromise the optimality of designs as resources for design, test, and optimization are spread thin. For a microprocessor design beginning in the 2006 era, the



TABLE I ITRS 99 KEY FEATURES

performance verification flow should be able to handle 10 M transistors as a single block and have a much faster turn-around time. This will allow the resulting hierarchy and number of blocks at the full-chip level to be manageable given the various anticipated design constraints. Divide-and-conquer approaches can cope with design complexity: a design is partitioned and a hierarchy is created so that smaller subproblems can be solved first. Unfortunately, this approach leaves gaps in the global optimization space and adds complexity to verification. Today, there is a tool orientation and a designer bias toward small block-based designs, and little optimization occurs once partitioning into small blocks has taken place. This results in suboptimal performance, power consumption, test, and reliability. This small block-based approach is the centerpiece of a customized approach to high-performance microprocessor design. The approach is under significant pressure to move to higher levels of granularity—a bigger-block, application-specific integrated circuit-style approach. Therefore, future design flows for high-performance microprocessors will place heavy emphasis on tightly integrated flows to manage power, performance verification, and platform (presilicon/postsilicon) verification. The design teams will also continue their focus on producing a minimal-area design. For a component supplier, business success depends on managing the silicon area effectively (as opposed to a lower volume system design that can relax some of the tight area constraints and recoup the investment at the system level).

Besides complexity, design quality is a fundamental requirement for successful deployment of automated solutions to modern microprocessor designs. On one hand, high performance translates into stringent requirements on delay, area, and power, all minimized by quality computer-aided design (CAD) tools; on the other hand, the success of a microprocessor product relies heavily on low design errors in the first silicon. Presently, Intel produces more microprocessors in two quarters than the 386/486 generations did in twelve quarters; twelve quarters today exceeds the "life of the current microprocessor." This places heavy emphasis on getting the silicon right on the first pass. In order to achieve faster turnaround and maintain a tightly integrated approach, design methodology will need to move to higher levels of abstraction. As design abstraction rises, we expect design time and effort, as well as validation and verification, to improve significantly.

In 1989, Gelsinger and his co-authors studied the trends in microprocessor design and predicted the characteristics of a microprocessor in the year 2000 [5]. It was predicted to have 50 million transistors on a die measuring 1-in square, operating at over 250 MHz, and performing over 750 MIPS. As we

approach 2001, we note that microprocessor frequency rose sharply in the latter part of the decade, with the actual frequency overshooting the predicted one (to over 1.5 GHz). This overshooting implies that the actual designs may be even more complex than predicted.

II. DESIGN FLOW

The high-performance microprocessor design flow is quite complicated. A simplified version is shown in Fig. 1. During the design process, various subflows emerge based on design characteristics and performance targets. The details available at any stage dictate the type of tool flow that will be utilized. A complete design flow for a performance microprocessor typically utilizes 100–150 tools. Since the process technology also evolves while the design is being developed, the flows and subflows become quite complicated. Adding to the complexity of the flow is a myriad of circuit types (and requirements for manual design) and the need for robust design.

Based on detailed analysis of past design data and trends, we identify three problems as the most critical in the design of next generation products—Design Correctness, Performance Verification, and Power Management. Achieving faster design convergence and avoiding design errors are among the most critical challenges for state-of-the-art microprocessor designs, which are experiencing rapid growth in design complexity and increased time-to-market pressure. We review how these issues are handled in the current design flow and outline design challenges that must be addressed in order to execute an effective future microprocessor design flow.

III. DESIGN FOR CORRECTNESS

A. Current Status on Design Validation

Microprocessor designs can be divided into two classes—lead processors, and generations or proliferations. All Intel IA microprocessors are backward compatible in Instruction Set Architecture. Major micro-architectural changes are made in lead processor designs, while follow-on proliferations often involve a change in silicon process technology and minor changes in the micro-architecture. The front-end of a typical design flow for a lead microprocessor is:
a) architectural design and exploration in C/C++, feature selection, and performance analysis;
b) behavioral and structural register transfer language (RTL) emulation and simulation;
c) presilicon system validation—single- and multi-processor;
d) postsilicon debug and platform validation.


Fig. 1. Simplified microprocessor design flow.

An architectural team manages and executes the first step. The team does not have enough accuracy in estimating critical delays or power dissipation of various design blocks to effectively trade off different architectural features. The emphasis at this stage is on making the chip functional, as well as on reaching predefined performance targets. The path bridging behavioral and structural RTL involves a manual code rewrite, causing the behavioral model to be abandoned at the later design stages. Therefore, the structural RTL model becomes the golden model. A system validation environment is built out of the simulation/emulation capabilities developed during the design phase. In order to integrate these various flows, a more abstract model of the microprocessor is necessary as a starting point for architectural modeling, estimation, exploration, validation, and synthesis.

If we look at the current status and trends, the validation effort required for modern processors is increasing at an alarming rate. Design errata correspond to mismatches between the expected behavior and the observed response from simulation or emulation of a processor design. Fig. 2 shows the errata count discovered at specific weeks after a microprocessor has been taped out. In the lead/first generation of the microprocessor, the initial errata count is higher than in a proliferation project. As expected, the cumulative errata count for proliferation projects stabilizes in a much shorter period of time. Given that the time between generations of products, also called the product cycle, is becoming shorter and shorter, emphasis will shift to presilicon validation to reduce the errata count. Traditional simulation-based presilicon design validation will not be acceptable due to its slow discovery rate on the lead microprocessor.

There are two components in any validation effort—computation cycles and human effort. Given the falling computer price per MIPS, it may appear that the growing number of computation cycles is not a big concern. But Fig. 3 shows that the computational needs for validation in an Intel design project are heavily loaded toward the end of the project cycle. Detection and correction of design errata earlier in the design flow is much desired. Furthermore, over successive design projects, the rate of growth in the computation required is alarming and is faster than the growth rate of the computational capability of processors, which follows Moore's Law [6]. Since the increase in computational power of successive generations of microprocessors falls behind the growing need for computational cycles for validation, breakthrough technologies are needed to solve this problem.
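As a rough illustration of why faster computers alone cannot close this gap (a simple illustrative model; the rates and symbols are generic, not values from the project data): if the compute cycles required for validation grow exponentially at a rate $g_v$ while available compute throughput grows at the Moore's-Law rate $g_m$, with $g_v > g_m$, then the wall-clock validation effort behaves as

\[
\frac{C_{\text{required}}(t)}{P_{\text{available}}(t)} \;=\; \frac{C_0\, e^{\,g_v t}}{P_0\, e^{\,g_m t}} \;=\; \frac{C_0}{P_0}\; e^{\,(g_v - g_m)\, t},
\]

which itself grows exponentially. Buying more or faster machines only changes the constant $C_0/P_0$, not the exponent, which is why breakthrough validation technologies, rather than simply more simulation cycles, are called for.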

B. Future Challenges in Design for Correctness

Avoiding design errors is critical, since validation is one of the most time-consuming steps in our design process. We envision a validation framework in which errors traditionally detected downstream can be identified and fixed as early as possible in the design cycle. In order to obviate the need for maintaining millions of lines of RTL code and managing errata at every stage of the design process, it is imperative that we perform design and validation at a level of abstraction higher than RTL. Therefore, in order to move to higher levels of abstraction, a pre-RTL design description must be able to:
1) capture the desired level of functional detail;
2) be translated efficiently into the RT level.
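As a purely illustrative sketch of what such a pre-RTL, functional description might look like (the block, its size, and all names below are invented for this example and are not taken from the paper), consider a small FIFO modeled in plain C at the transaction level. The model captures what the block does, which is requirement 1), while deliberately omitting clocks, wires, and cycle-level structure; producing those is exactly the refinement toward RTL addressed by requirement 2).

```c
/* Illustrative pre-RTL functional model of a small FIFO block.
 * All names and the 4-entry depth are hypothetical, chosen only
 * to show a transaction-level C description. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define FIFO_DEPTH 4

typedef struct {
    uint32_t data[FIFO_DEPTH];
    unsigned head, tail, count;   /* functional state only; no clocks or wires */
} fifo_t;

static void fifo_init(fifo_t *f) { f->head = f->tail = f->count = 0; }

static bool fifo_push(fifo_t *f, uint32_t v) {
    if (f->count == FIFO_DEPTH) return false;   /* full: transaction rejected */
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return true;
}

static bool fifo_pop(fifo_t *f, uint32_t *v) {
    if (f->count == 0) return false;            /* empty: nothing to deliver */
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

int main(void) {
    fifo_t f;
    fifo_init(&f);
    /* Drive the model with transactions; an RTL refinement would instead
     * describe per-cycle behavior of pointers, flags, and storage arrays. */
    for (uint32_t i = 0; i < 6; i++)
        printf("push %u -> %s\n", (unsigned)i, fifo_push(&f, i) ? "ok" : "full");
    uint32_t v;
    while (fifo_pop(&f, &v))
        printf("pop  -> %u\n", (unsigned)v);
    return 0;
}
```

Keeping such a model and its eventual structural RTL formally related, rather than abandoning the high-level model as described in Section III-A, is what the envisioned validation framework would have to support.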


Fig. 2. Cumulative errata for different generations of product as a function of weeks after tapeout.

Fig. 3. Compute hours (in thousands) per week plotted against the duration of a design project cycle.

Today's environment is witnessing the growth of C/C++ as a system design language. This has been driven primarily by engineers working with system-level design or embedded hardware, where software is a major component of the design. Use of C/C++ enables faster hardware and/or software (co-)simulation. The synthesis/translation path from behavioral C to RTL is currently performed either by manual design refinement or, in rare cases, by behavioral synthesis, the latter being able to produce acceptable quality of results only for data-dominated designs. In fact, most EDA vendors are using a very restrictive subset/style of C to describe structural RTL. While the high-performance microprocessor designer desires unrestricted use of C/C++ for modeling the chip design and the entire platform, the designer needs the following essential capabilities to make high-level design worthwhile.
1) The high-level model (HLM) for high-performance microprocessors and ICs, as an earlier design entry point, must effectively capture popular and evolving architectural building blocks so that performance analysis and tuning can be performed.
2) The high-level transformations should enable effective design exploration of different microarchitectures, provide good starting design points for synthesis, and aid design convergence.
3) The high-level synthesis or automation-supported path from HLM to RTL must produce quality hardware results


Fig. 5. Variations in long wire RC delay and intrinsic delay with reduction in feature size.

Fig. 4. Requirements of new CAD environment.

as well as be capable of handling large designs, especially in cases where the design space is too large for a designer to achieve optimal designs manually.
4) Early, accurate design estimation is key to high-level design convergence and should ensure careful consideration of deep submicrometer (DSM) effects (including noise, power, timing, and interconnect).
5) Validation (simulation and formal verification) should be performed at and between different levels of design abstraction. Designers have used equivalence checking since the first layout verification engines came into existence in the early 1980s. It has now been moving into RTL verification, and we expect this trend to continue to higher levels of abstraction.

To be effective, high-level design should encompass all aspects of design, including modeling, validation, exploration, synthesis, and estimation. If a microarchitect or designer can perform all design activities at this high level, then the design can be explored and converged completely during the high-level design phase, which enables the search of a larger design space and reduces the possibility of nonconvergence at later stages of the design process. Research breakthroughs here promise a big impact on design productivity and quality. Researchers are encouraged to work on high-level design, rather than prematurely migrating to tackle problems only in system design or in HW/SW co-design. The future front-end design flow, as illustrated in Fig. 4, leverages a validation framework that provides tools and technologies to enable the designer to work at a higher level of abstraction.

IV. PERFORMANCE VERIFICATION

The performance verification phase attempts to drive timing closure between the various physical blocks. The structural RTL description is partitioned into synthesizable and nonsynthesizable blocks. Experienced engineers make an educated guess about timing allocations and power budgets. Progress at subsequent stages depends on the quality of the partitions and timing allocations. In some instances, achieving performance goals may be intractable for automated tools. In such instances, custom design

may be the only acceptable method for achieving performance targets. The inefficiency of manual partitioning manifests itself as costly, multiple iterations of the design cycle.

A. Current Status on Performance Verification

Fig. 5 shows the trend of wire delay versus gate delay. As the feature size shrinks over time, gate delay decreases while distributed wire (RC) delay goes up. The crossover occurs around 0.8 µm, and therefore, in the realm of current and next-generation designs, long wire RC delay definitely dominates. The performance verification loop ensures that the design meets timing, noise, power, and reliability verification constraints under worst-case and best-case operating conditions. Current generation tools try to solve these problems sequentially, and success through a portion of the design cannot be considered complete until the entire design has achieved convergence. In many cases, EDA tools have to be customized for certain circuit styles (e.g., gates with a large number of fan-ins). Multiple iterations through the performance verification loop are necessary because either the estimates (such as wire load) which guide convergence continue to be poor, or the resulting design partitioning makes the problem intractable in practice. The ability to make tradeoffs is limited to very local situations. Therefore, an integrated approach to managing these issues is required.

B. Future Challenges in Performance Verification

The performance verification flow at current processor frequencies requires a turn-around time of weeks. For a microprocessor design beginning in the 2006 era, the performance verification flow must be able to handle 10 M transistors as a single block and have a much faster turn-around time. Logic synthesis and layout automation tools must be capable of handling and analyzing diverse circuit families. As stated earlier, estimations and approximations must be guaranteed to be on a convergent path, so that repetitious and costly engineering changes are avoided.

A data-centric performance verification flow, such as the one shown in Fig. 6, is desirable. Every part of this design flow is dependent on other activities and cannot be treated in isolation. Good place and route solutions depend on an optimized netlist provided by the synthesis tool. Therefore, unless the synthesis tool comprehends the implications of its netlist for place and route, it


Fig. 6. Development of a data-centric flow.

will iterate only through old violations, possibly causing new violations to occur while fixing the old ones. In addition to this requirement of a data-centric flow, we believe that there are other specific issues to be addressed, as discussed below.

Current placement and routing technologies cannot handle the number of transistors expected in 2006 and beyond. The easy way out would be to perform heavy partitioning and create multiple levels of hierarchy. The efficiency of this type of methodology is questionable at best, and the amount of time needed to lay out the full chip would be intolerable. A reintroduction of redundant devices, and maximization of their usefulness in the synthesis process, could be a potential method of minimizing the impact of engineering changes and achieving design convergence faster.

Reliability will be an integral part of future performance verification flows. Metal electromigration issues have been temporarily put aside with the introduction of copper, which has about an order of magnitude more resistance to electromigration. However, new dielectrics raise concerns about thermal issues, both in conducting heat and in handling the stress due to thermal gradients. In contrast, SiO2 is a very robust material that conducts heat extremely well; movement to any other material is likely to degrade reliability. We have reached device oxide limits, with oxides only several molecular layers thick. Oxide stress and failure is a key concern now, which will likely only be resolved through new oxide materials or new device structures. Methodological solutions may include redundant circuitry to improve yield on chip.

The modeling of interconnect is a future performance verification problem, as RC modeling is only one small step in modeling, yet involves a large complexity growth in timing analysis. The next concerns are signal and timing integrity under increasing process variation and increasing coupling effects, such as inductive effects that come from switching wide structures (either single lines, like clocks, or multiple lines, like buses). Since inductive effects act "at a distance," the modeling complexity is far larger than that of moving from capacitive to RC models, posing a major challenge to EDA. Given that frequencies are heading toward time-of-flight limits for global interconnects,


Fig. 7. Historic power trends for lead and proliferation products.

multiple clock domain analysis and full-wave modeling may be required for global communication lines.

To summarize, a data-centric flow for performance verification will require a solution integrated across several domains. In the past, we have been able to solve performance verification issues using point tools. As we move forward, we will need to address the issues mentioned above simultaneously, for block sizes approaching 10 M transistors.
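The wire-versus-gate delay trend discussed at the start of this section (and plotted in Fig. 5) follows from a standard first-order approximation rather than anything specific to this design flow: the distributed RC delay of a wire of length $\ell$, with per-unit-length resistance $r$ and capacitance $c$, is roughly

\[
t_{\text{wire}} \;\approx\; \tfrac{1}{2}\, r\, c\, \ell^{2},
\]

while intrinsic gate delay shrinks with each process generation. As wire cross-sections scale down, $r$ per unit length rises sharply while $c$ per unit length stays roughly constant, so for global wires whose length tracks the die size rather than the feature size, $t_{\text{wire}}$ grows even as gate delay falls; this is the crossover near 0.8 µm noted in Section IV-A.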

V. POWER MANAGEMENT

A. Current Trends on Power

The current approach to power management is an iterative process involving a dedicated team of experienced engineers who work with RTL designers on power estimation, clock gating strategies, and circuit design techniques. The power management team works hand in hand with the design teams to reduce power on existing functional blocks. This approach has worked well so far but, as the following data will demonstrate, it will not be sufficient for maintaining a manageable power dissipation figure for next generation microprocessors.

Power estimation is an important part of any design flow. Average power estimates made at the beginning phase of design are derived empirically and therefore undergo multiple iterations. Power estimates get better for certain areas of the chip at the schematic level, e.g., clock power consumption. However, power estimates for certain other areas, such as logic, remain difficult to derive accurately.

Fig. 7 shows the power dissipation trends. The power dissipation of 2 W for the Intel 80386 processor was not a major concern in the late 1980s. However, power consumption has always increased at the introduction of a new lead microprocessor (though proliferations of recent processor families managed to reduce power). If we extrapolate this trend, the power density of a single processor will reach 2000 W/cm² [1]. New breakthrough technologies are absolutely essential for new microprocessor designs to meet and beat realistic power dissipation goals. This figure also demonstrates that the criticality of


Fig. 8. Active and leakage power trends.

Fig. 9. Supply induced noise trend.
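The quantities behind Figs. 7–9, and the discussion that follows, obey familiar first-order CMOS relations, restated here only for reference (the symbols are generic; no coefficients are taken from the paper):

\[
P_{\text{active}} \;\approx\; \alpha\, C\, V_{dd}^{2}\, f, \qquad
I_{\text{leak}} \;\propto\; e^{-V_{t}/(n\, v_{T})}, \quad v_{T} = kT/q, \qquad
\Delta V_{\text{supply}} \;\approx\; I\,R \;+\; L\,\frac{di}{dt},
\]

where $\alpha$ is the switching activity, $C$ the switched capacitance, $f$ the clock frequency, $V_t$ the device threshold voltage, and $n$ the subthreshold slope factor. The exponential dependence of leakage on $V_t$ and on temperature (through $v_T$), and the $L\,di/dt$ term of the supply drop, are the effects whose trends are plotted in Figs. 8 and 9.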

design parameters changes with time. While in the late 1980s performance and die size were more critical, the next decade will see an increasing emphasis on power dissipation.

In CMOS circuits, the dominant component of power dissipation has so far been active switching power; leakage and short-circuit power have been negligible. However, with deep-submicrometer circuits, this power consumption profile is no longer valid. As Fig. 8 demonstrates, with reducing feature width the leakage power becomes an increasing component of the total power dissipation, and it is not modeled adequately by current tools. In next-generation processors, focusing only on the reduction of active power will not be sufficient; techniques will be needed to reduce the leakage power as well. Two new devices, the FinFET from Berkeley and AT&T's new vertical transistor, provide hope [7].

Besides absolute power dissipation, delivery of power has also become very important. From the source of power, i.e., Vdd coming into the chip, the power needs to be delivered to the point of switching. This involves a resistive voltage drop (IR) and an inductive voltage drop (L di/dt). With the increased switching speed, the inductive voltage drop L di/dt is increasing rapidly, primarily due to the increase in di/dt, as shown in Fig. 9. This trend will continue, and the design of the next generation of microprocessors will need to address the power delivery issue in order to meet the required performance targets.

In summary, the three most critical design problems related to power are absolute power dissipation, leakage power, and power delivery.

B. Future Challenges in Power Management

We expect future designs to become constrained by power envelopes. Already, complex packages have become a significant fraction of unit cost, and more power dissipation will require even more complex packages. A significant part of the product space is becoming mobile/wireless, putting extreme limits on what levels of power consumption are acceptable. Increased design complexity to control active power, combined with new techniques to minimize standby power, is inevitable. Future power management issues for high-performance microprocessors need to be addressed throughout the design flow. We categorize power management into three major areas.
1) Reduction of Active Power: We believe that this is best addressed at the architectural level. Typical tradeoffs that have been made in the literature consist of solutions such as

clock gating and data-driven dependencies [8]. This needs to be taken to the next level, for example, by adapting the clock for continuous control of power, dynamically scheduling certain computations onto less aggressive hardware (based on certain environmental conditions), sensor-based clock throttling, etc. Research in this area is needed to ensure that power can be managed further at an architectural level. A smaller percentage of active power reduction will continue to come from circuit and design techniques throughout the entire design process. These low-level techniques are best suited to internal groups who understand their own technology and processes very well. Research should continue to target management of active power via architectural tradeoffs at early levels of design for it to have a substantial impact.
2) Reduction of Leakage Power: In very deep submicrometer technologies, it is very difficult to turn devices completely off. Transistor source-drain leakage current grows exponentially with decreasing Vt; one process generation increases leakage by a factor of 6 to 10. Leakage currents also increase exponentially with temperature, so maintaining a low die temperature is a co-requisite. This level of leakage becomes an architectural concern: how much active versus static power fits within the power envelope of the design? Leakage affects power distribution variation (through increased resistive IR drop and thermal effects), noise, and interconnect reliability under electromigration (EM) effects. Solutions may include designing with multiple Vt, careful stacking of transistors, and exotic techniques to control Vt, for example through the back-gate. Because of the temperature dependence, active power dissipation reduction indirectly benefits leakage. New transistor designs give hope against what could become the most fundamental limiter of CMOS design. Historically, voltage and process scaling kept the power equation for microprocessors in check for quite some time, but this trend will be difficult to continue. Assuming current leakage trends, a microprocessor in the 0.10-µm generation may dissipate up to 50% of its power as leakage. Currently there are no tools to minimize leakage power. Hence, due to the power dissipation limitations, use of the extra transistors available on the chip for other logic functions becomes prohibitive. A major breakthrough


could be achieved through innovative architectural methods and tools that make static/dynamic tradeoffs.
3) Power Distribution Networks: Power distribution networks do not scale; hence, with each generation, we dedicate more silicon area to distributing higher currents at lower voltages. Here, IR drop is an issue, and, increasingly, the on-die inductance of power supply networks must be modeled. The impedance of these power supply networks is frequency dependent, due to the skin effect of conductors and to the proximity effect of networks of conductors carrying signals consisting of multiple frequencies. Predicting return currents and modeling the effective supply impedance are major challenges at the scale of next generation microprocessor design. Solutions for improving power distribution may come from improved decoupling structures as well as increased metal dedicated to power. In the limit, we may have to dedicate power planes to reduce supply inductance on die. In addition to power distribution, we also do not have methods of generating clock trees at or above 4 GHz. Novel distribution methods for power and clock will require the development of new technologies for communicating across "local areas" of the chip.

Therefore, it is likely that architectural tradeoffs and circuit techniques can help us manage the various aspects of power defined above. However, research work in these areas is limited and needs to be accelerated if this is to become a reality. Research must focus on early tradeoffs, at higher levels of abstraction, as it is the only conceivable path to a smooth and productive design process.

VI. OTHER AREAS OF IMPORTANCE

There are other areas of design that are also very challenging. We feel that there is good work in progress on these areas in the industry; a few are listed here. The SRC website [9] contains a detailed list of challenges in all major areas of design and test. Specifically, physical design problems are described in more detail in [10] and [11]. Prominent issues that must be researched, including test issues needed to drive down the cost of manufacturing tests, are as follows:
• design environment management;
• clock planning, routing, and skew control;
• accurate 3-D modeling and complete characterization of cross coupling;
• designing for process variability;
• very high-speed place and route;
• layout-driven synthesis and synthesis-driven layout;
• general testing methods based on built-in self-test;
• built-out self-test using companion chips to test the main IC;
• use of noninvasive equipment, such as ion beams, to verify IC structures.

VII. SUMMARY

In this paper, we have focused on three aspects of the current microprocessor design flow: Design for Correctness, Performance Verification, and Power Management. We have highlighted the challenges that the microprocessor industry currently faces for designs in the year 2006 and beyond, and the need for designing at higher levels of complexity and


abstraction. We have also proposed a futuristic design validation and performance verification flow that attempts to address these challenges.

ACKNOWLEDGMENT

The authors would like to acknowledge the helpful comments and suggestions provided by S. Borkar, V. De, J. Parkhurst, G. Tollefson, P. G. Roy, and M. Kishinevskey of Intel Corporation.

REFERENCES

[1] S. Borkar, ISPD 2000 invited talk, reported in EE Times, Apr. 14, 2000 ed., p. 73.
[2] TSMC website. [Online]. Available: http://www.tsmc.com/
[3] D. Pescovitz, "Wired for Speed," Scientific American, pp. 40–41, May 2000.
[4] ITRS Roadmap (1999). [Online]. Available: http://www.itrs.net/1999_SIA_Roadmap/Home.htm
[5] P. Gelsinger et al., "Microprocessors circa 2000," IEEE Spectrum, Oct. 1989.
[6] Intel web site. Intel Corp., Folsom, CA. [Online]. Available: http://www.intel.com/intel/museum/25anniv/hof/moore.htm
[7] C. Brown, "New gate geometry opens way for ultra-fine transistor geometries," EE Times, Nov. 26, 1999.
[8] G. Yeap, Practical Low Power Digital VLSI Design. Norwell, MA: Kluwer Academic, 1998.
[9] SRC web site. [Online]. Available: http://www.src.org
[10] SRC web site, physical design problems. [Online]. Available: http://www.src.org/areas/cadts/pd.dgw
[11] J. Parkhurst et al., "SRC physical design top ten problems," in Proc. 1999 Int. Symp. Physical Design, pp. 55–58.

Timothy Kam received the B.Sc. (Eng.) degree with first class honors in electrical engineering and computer science (EECS) from University College, University of London, London, U.K., in 1986, and the M.S. and Ph.D. degrees in EECS from the University of California, Berkeley, in 1990 and 1995, respectively. Since 1995, he has been a Researcher and is the Technical Manager of synthesis at the Strategic CAD Laboratories of Intel Corporation, Hillsboro, OR. Between 1986 and 1989, he was an IC Design Engineer at Motorola, Hong Kong, where he received a patent award. He has spent summers at AT&T Bell Laboratories, Holmdel, NJ, and Hewlett-Packard, Santa Clara, CA. He co-authored Synthesis of FSMs: Functional Optimization (Norwell, MA: Kluwer Academic, 1996) and Synthesis of FSMs: Logic Optimization (Norwell, MA: Kluwer Academic, 1997). His research interests include synthesis, optimization, and verification of integrated circuits. Dr. Kam received a Best Paper Award at the 36th Design Automation Conference in 1999. He is the General Chair of the IEEE International Workshop on Logic Synthesis in 2001 and serves on numerous technical program committees.

Shishpal Rawat received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Kanpur, India, in 1979, and the M.S. and Ph.D. degrees in computer science from Pennsylvania State University, University Park, in 1982 and 1988, respectively. Since then, he has held a variety of Design and CAD management positions at Intel. He is currently Strategic Programs Manager, managing university research, EDA investment, and the EDA tools research program for Design Technology. Dr. Rawat served as Chair of the Design Science Technical Advisory Board of the Semiconductor Research Corporation in 1998 and as Chair of the Integrated Circuits and Systems Sciences Technical Advisory Board in 1999.


Desmond Kirkpatrick received the S.B. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge, MA, in 1986 and the Ph.D. degree in electrical engineering and computer sciences from the University of California at Berkeley in 1997. In 1986, he joined Intel Corporation, Santa Clara, CA, where he made significant contributions in hierarchical, full-chip timing analysis, floor-planning, layout synthesis, and extraction for the 486 and Pentium microprocessors. From 1991 to 1997, he worked with the Pentium Pro microprocessor design team while completing his research, and in 1997, he returned to Intel, joining the Pentium 4 microprocessor design team. On both teams he contributed to full-chip assembly and interconnect performance management techniques, especially addressing noise concerns. He also contributed to the specification of the interconnect architecture for Intel's 130 nm process technology. In 1999, he became Intel's first Technical Liaison to the Gigascale Silicon Research Center at the University of California at Berkeley.

Gregory S. Spirakis received the B.S. degree in electrical engineering and materials science engineering from the University of California, Berkeley in 1982. He is a Vice President of the Intel Architecture Group and Director of Design Technology at Intel Corporation, Folsom, CA. He is responsible for developing the computer-aided design technologies used to design Intel architecture microprocessors. He joined Intel in 1982 as an EPROM Reliability Engineer and has since held a variety of technical and management roles, including positions as Yield Manager, Design Manager and Q & R Manager. In 1997, he became Director of Design Technology. He has received two Intel Achievement Awards.

Rabindra (Rob) Roy received the B.Tech. (Hons.) degree in electronics and electrical communication engineering from the Indian Institute of Technology in 1984, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Illinois at Urbana-Champaign. He is currently the Chief Technology Strategist at Mobilian Corporation, Hillsboro, OR. Previously, he performed research and development in VLSI design and test at Intel Corporation, Hillsboro, OR, NEC Research Labs, Princeton, NJ, AT&T Bell Laboratories, Naperville, IL, and the General Electric R&D Center, Schenectady, NY. He has authored or co-authored more than 50 research papers. He holds five U.S. and international patents and has several pending. Dr. Roy was the Program Chair and General Chair of the IEEE VLSI Test Symposium in 1996 and 1998. He was the Chairman of the ACM Doctoral Dissertation Award Committee in 2000.

Craig Peterson received the B.S. degree in electrical and computer engineering from Oregon State University, Corvallis, in 1974. He has worked at Intel in design and design technology for 26 years. During that time, he co-designed three microprocessors, led projects on five generations of chipsets, and co-invented and headed the interconnect component development for the world's first supercomputer to break the sustained TeraFLOPS computation barrier. More recently, he was Technology Director in Design Technology, helping innovate dramatic improvements in microprocessor development productivity and optimization. He achieved many "firsts at Intel," including: co-wrote the first RTL simulation, wrote the first cell-based synthesis tool, led the first all-workstation-based project, designed the first single-chip memory controller, designed the first single-board multiprocessor board, and co-microarchitected the first multithreaded processor.

Naveed Sherwani received the Ph.D. degree in computer science from the University of Nebraska, Lincoln, in 1988. After graduation, he served as a Professor for six years, where his research concentrated on combinatorics, graph algorithms, and algorithms for VLSI physical design automation. In particular, he concentrated on efficient algorithms for over-the-cell routing to reduce channel routing area. Since 1994, he has been with Intel Corporation, Folsom, CA, where he is the General Manager for Intel Microelectronics. He has published over 75 refereed papers. His research also concentrated on graph-theoretic algorithms for routing in printed circuit boards. Dr. Sherwani's paper on three-layer over-the-cell routing received a "distinguished paper" award at ICCAD-91. He is the Founder of the Great Lakes Symposium on VLSI. He was a member of the technical program committees for ICCAD '97, ICCAD '98, and ICCAD '99, and Program Chair of the International Conference on VLSI for 1999 and 2000. He is also General Chair of the same conference in 2001.