Power Optimization of Sum-of-Products Design for Signal Processing Applications

Seok Won Heo
Computer Science Department, University of California at Los Angeles, CA, USA 90095
[email protected]

Suk Joong Huh
Samsung Electronics, Suwon, Korea
[email protected]

Miloš D. Ercegovac
Computer Science Department, University of California at Los Angeles, CA, USA 90095
[email protected]

Abstract—Power consumption is a critical concern in today's mobile environment, while high throughput remains a major design goal. To satisfy both low-power and high-throughput requirements, parallelism has been employed. In this paper we present an approach to reducing power dissipation in the design of the sum-of-products operation by utilizing parallel hardware while maintaining high throughput. In benchmark programs, the proposed design reduces execution time by about 46% with an energy penalty of about 12% compared to the ARM7TDMI-S multipliers.

Keywords—Low-power arithmetic; High-throughput arithmetic; Sum-of-products.

I. INTRODUCTION

There is a fundamental technological shift taking place in the electronics industry: it is moving from the wired era driven by the Personal Computer (PC) to the wireless era driven by mobile devices. With the increasing complexity of mobile VLSI systems and a growing number of signal processing applications, minimizing the power consumption of signal processing applications has become of great importance in today's mobile system design, while performance and area remain the other two major design goals.

Multiplication and related arithmetic operations are frequently executed in conventional digital signal processing applications. However, digital signal processing applications may take many clock cycles using a conventional multiplier, even when a high-performance parallel multiplier is included. This is a critical problem for the arithmetic operations in state-of-the-art signal processing applications, which require intensive numerical calculation. Moreover, studies of power dissipation in Digital Signal Processors (DSPs) and Graphics Processing Units (GPUs) indicate that the multiplier is one of the most power-demanding components on these chips [1]. Therefore, research on new arithmetic models is needed to satisfy the low-power and high-throughput requirements of mobile systems.

The total power consumed by a CMOS circuit comes from two sources: dynamic power and static power [2]. Dynamic power dissipation is the dominant factor in the total power consumption of a CMOS circuit and typically contributes over 60% of the total system power dissipation. Although the effect of static power dissipation increases significantly as VLSI manufacturing technology shrinks, the dynamic power dissipation remains dominant [3]. It can be described by

    P_dynamic = 0.5 × C_L × V_DD² × f_p × N    (1)

978-1-4799-0493-8/13/$31.00 © 2013 IEEE

where C_L is the load capacitance, V_DD is the power supply voltage, f_p is the clock frequency, and N is the switching activity. The equation indicates that the power supply voltage has the largest impact on the dynamic power dissipation because of its squared term. Unfortunately, reducing the power supply voltage causes performance degradation. A great deal of effort has been expended in recent years on techniques that exploit a low supply voltage while minimizing the throughput degradation; parallel architectures mitigate such throughput degradation [4]. This paper proposes a new arithmetic architecture for signal processing applications and develops a scheme to achieve power savings in the sum-of-products operation by utilizing parallel architectures.

This paper is organized as follows. Section II addresses the problem with conventional arithmetic architectures. Section III presents an in-depth view of recent research in the design of parallel multipliers and provides the proposed sum-of-products architecture. In Section IV, the paper provides power and throughput estimates for the sum-of-products design and compares them to the estimates for conventional ARM multipliers and the proposed multipliers. Section V discusses current problems. Finally, a summary is given in Section VI. The designs presented in this paper assume 32-bit integer operands, but they can easily be extended to other fixed-point operand types.

II. PROBLEM

Sums-of-products are found in many digital signal processing and multimedia applications, including FIR filters, high pass filters, and inner products. This computation is a summation of two products. It can be described by

    S = a × b + x × y    (2)

A variation of the sum-of-products is the inner-product, which is usually computed by repeatedly using a sum-of-products:


    S[i + 1] = a[i] × b[i] + x[i] × y[i] + S[i],    (3)

ASAP 2013

where S[0] = 1. Previous research has mainly focused on designs for dedicated multipliers, demonstrating that parallel multipliers can be implemented with clustering/partitioning [5], pipelining [6], bypassing [7] and signal gating [8] techniques for reduced power dissipation. An improved modified Booth encoding with a multiple-level conditional-sum adder [9] and a sign-select Booth encoder [10] have been proposed for high performance. However, recent studies show that conventional arithmetic designs cannot efficiently support increasingly demanding high-throughput and low-power requirements. The sum-of-products architecture offers an opportunity to satisfy these requirements.

Due to the frequent use of multiplication and related arithmetic calculations in digital signal processing applications, many processors provide multiply and/or multiply-accumulate instructions. In order to execute sum-of-products operations, processors use an existing multiplier or a multiply-accumulate (MAC) unit, and conventional processors take extra cycles when using multipliers and MAC units to perform sum-of-products. Clearly, by including a sum-of-products operation one expects that fewer cycles are needed; we want to show that the energy-delay product is also reduced. Consider a typical FIR filter:

    y[n] = Σ_{k=−∞}^{+∞} c[k] × x[n − k]    (4)

This equation can be implemented in a high-level language, such as C, as follows:

    y[n] = 0;
    for (k = 0; k < N; k++) {
        y[n] = y[n] + c[k] * x[n - k];
    }                                        (5)

The last line corresponds to a multiply-accumulate operation: x = x + y × z. This equation can be translated into a single multiply-accumulate instruction. The FIR filter can also be implemented in C in another way, as shown in (6).

III. SUM-OF-PRODUCTS DESIGN

A. Baseline Architecture

The sum-of-products baseline model needs two multipliers and one adder. One way to design the sum-of-products is to use two Partial Product Reduction (PPR) arrays and [4:2] adders followed by a single final Carry Propagate Adder (CPA). The other way is to use two PPR arrays and two CPAs followed by a single CPA. The structure using a [4:2] adder followed by a single CPA is the better solution because it has one less carry-propagate addition, and thus its power and delay are slightly better than those of its counterpart. The inner-product can be designed based on the sum-of-products model. It consists of two PPR arrays, [6:2] adders and latches for accumulation, and a single CPA. The [6:2] adders accumulate four inputs with the previous partial sums and carries. Figure 1 shows the baseline models.

B. Multiplier

Multipliers consume more power and have longer latency than adders, and thus this paper mainly describes multiplier designs. Previous studies demonstrate that array multipliers integrating array-splitting and left-to-right techniques are better than tree multipliers in terms of power while keeping similar delay and area for operand sizes up to 32 bits [11][12]. Therefore, in this paper we develop the sum-of-products based on left-to-right split array multipliers.

1) Left-to-Right Array Multiplier: In conventional right-to-left array multipliers, the Partial Products (PPs) are added sequentially starting from the rightmost multiplier bit. In contrast, in left-to-right array multipliers, the PPs are added in series starting from the leftmost multiplier bit [13]. Of the two designs, left-to-right array multipliers have the potential to save power and delay because the carry signals propagate through fewer stages, which reduces the power consumption in the Most Significant (MS) region. Left-to-right array multipliers are also superior for data with a large range, because the PPs corresponding to sign bits, which have low switching activity, are located in the upper region of the array [14][15].

    y[n] = 0;
    for (k = 0; k < N; k += 2) {
        y[n] = y[n] + c[k] * x[n - k] + c[k + 1] * x[n - (k + 1)];
    }                                        (6)

The last line corresponds to an accumulated sum-of-products: x = x + y0 × z0 + y1 × z1. This equation can be translated into a single instruction using the sum-of-products design. In the best-case scenario, the sum-of-products operations will require only half the number of cycles using sum-of-products hardware compared to using a single multiplier.

Fig. 1. The baseline models: (a) Sum-of-products design (b) Inner-products design

193

2) Split Array Multiplier: In array multipliers, the lower rows of the PPR array consume much more power than the upper rows because glitches cause a snowballing effect as signals propagate through the array [16]. Therefore, if the length of the array could be reduced, there would be significant power savings. The way to reduce the array is to split it into several parts. The previous architectures are the two-level even/odd [17] and upper/lower [14] split array multipliers. Each part has only half the number of rows and is added separately in parallel. The final even/odd (upper/lower) vectors from the two parts can be reduced to two vectors using [4:2] adders. The upper/lower split array architecture is shown in Figure 2.

Previous studies have mainly focused on developing two-level split array designs. However, the design would be more power- and delay-efficient if each part were split further. The upper/lower structure is better than the even/odd structure when four-level splitting is used, because it allows simpler interconnection. The physical regularity of array multipliers is maintained by interleaved placement and routing if the upper/lower structure is applied. Moreover, the two-level upper/lower split array multiplier consumes less power than its two-level even/odd counterpart. Therefore, in this paper, we utilize the four-level upper/lower split array multiplier.

3) Carry Save Adder: A [4:2] adder has been widely used in parallel multipliers. As technology scales into the deep sub-micron regime, the importance of simple wire interconnections increases. Compared to two cascaded [3:2] adders, a [4:2] adder has a regular structure with simple interconnection and thus reduces the physical complexity. Moreover, a [4:2] adder has the same gate complexity as two [3:2] adders, yet it is faster than two cascaded [3:2] adders: it has a 3 × T_XOR2 delay, while each single [3:2] adder has a 2 × T_XOR2 delay.
Thus, by using [4:2] adders, the PPR delay is reduced by about 25% without an area penalty. The delay reduction also benefits power, as less switching activity is generated when signals propagate through fewer stages. In this paper, we utilize [4:2] adders for low-power design.

Fig. 2. Upper/lower split array architecture [13].

IV. EXPERIMENTAL RESULTS

A. ARM Multiplier Results

We summarize relative performance using the total execution time of programs. The execution time required for a program

can be written as

    Execution time for a program
        = Clock cycles for a program × Clock cycle time                                  (7)
        = Instructions for a program × Clock cycles per instruction × Clock cycle time

The instruction count for a program depends on the compiler and the Instruction Set Architecture (ISA), and the Clock cycles Per Instruction (CPI) depends on the ISA and the microarchitecture [18]. Therefore, we restrict ourselves to a specific compiler, ISA and microarchitecture for accurate results. A good example is the ARM architecture. The ARM instruction set differs from the pure RISC definition in several ways that make it suitable for low-power embedded applications, and hence the ARM core is used to perform real-time digital signal processing in most embedded systems. Digital signal processing programs are typically multiplication intensive, and the performance of the multiplication hardware is critical to meeting real-time constraints.

All ARM processors include hardware support for integer multiplication, using two styles of multiplier [19]. Several ARM cores include low-cost multiplication hardware that supports only the 32-bit result multiply and multiply-accumulate instructions. This multiplier uses the main data path iteratively, employing the barrel shifter and Arithmetic Logic Unit (ALU) to generate a 2-bit PP in each clock. The other style appears in ARM cores with an M in their name (for example, the ARM7DM) and in recent higher-performance cores, which have a high-performance multiplier and support the 64-bit result multiply and multiply-accumulate instructions. This multiplier employs a modified Booth's algorithm to produce 2-bit PPs. The carry save array has four layers of adders, each handling two multiplier bits, so the array can process eight multiplier bits per clock cycle. The array is cycled up to four times, and the partial sum and carry are combined 32 bits at a time and written back into the register.
As multiplication performance is very important, more hardware resources must be dedicated to it. The best choice is the ARM7TDMI-S processor. The ARM7TDMI-S includes an enhanced 32 × 8 single multiplier with a radix-4 modified Booth's algorithm, and it is a synthesizable version of the ARM7TDMI core. Therefore, when measuring the cycle counts of an application executed on the ARM multiplier with a cycle-level simulator, the synthesizable core provides an efficient solution.

ARM and Thumb are two different instruction sets supported by ARM cores with a T in their name. ARM instructions are 32 bits wide, and Thumb instructions are 16 bits wide. Thumb mode allows code to be smaller and can potentially be faster if the target has slow memory. However, multiply-accumulate operations are not available in Thumb mode, so we use ARM mode in this experiment. The ARM7TDMI-S core does not have sum-of-products hardware, only an enhanced single multiplier, and thus cannot execute a sum-of-products instruction directly. The ARM compiler therefore never generates sum-of-products instructions, and hence we cannot directly measure the total clock cycles with sum-of-products using cycle-level simulation of compiled assembly code. This means that for every sum-of-products instruction the code must be regenerated manually after analyzing the original ARM assembly code.

Suppose we have a modified implementation of the ARM7TDMI-S ISA. We replace two consecutive multiplication operations with one sum-of-products operation. The sum-of-products instruction executes two multiplications simultaneously, and the two products are then combined into the final result using a CPA. An ARM7 multiplication finishes in up to 4 clock cycles, and thus a sum-of-products takes up to 5 clock cycles owing to the single-cycle final addition. To regenerate the modified ARM assembly code, we use the ARM technical reference manual after compiling the original C code [20]. The reference manual lists all instructions and their cycle counts. We measure the clock cycles of the ARM multiplier for the benchmark programs by running cycle-level simulation with the compiled ARM assembly code. A hardware/software co-simulation tool such as Mentor Graphics Questa Codelink profiles the clock cycles of the programs. The comparison of clock cycle estimates is shown in Table I. Based on this analysis, we expect the clock cycles of sum-of-products to be 42% and 48% less than those of multiplication for the FIR filter and high pass filter programs, respectively.

The clock cycle time is usually published as part of the specification document. However, as the ARM7TDMI-S is a synthesizable core, we can directly measure the power and latency of the ARM multiplier using Synopsys Design Compiler with the ARM7TDMI-S HDL code, and estimate those of the sum-of-products hardware. We assume the sum-of-products hardware consists of two identical ARM7TDMI-S multipliers and an ALU. Table II shows the power, delay and area of a multiplier and of the sum-of-products hardware.
The amount of energy used depends on the power and the time for which it is used, and can be written as

    Energy (Joules) = Power (Watts) × Time (Seconds)    (8)

The execution time for the benchmark programs can be calculated using equation (7) with the measured clock cycles for each program and the clock rate. Table III summarizes the energy and execution time. The sum-of-products unit dissipates 23-24% more energy than a single multiplier while achieving a 40-41% decrease in execution time for the FIR filter program, and 11-12% more energy with a 45-46% decrease in execution time for the high pass filter.

The designer often faces a trade-off between execution time and energy, so a suitable metric for energy efficiency is needed. The energy-delay product is widely used when reporting a new architecture design that addresses energy-performance effectiveness [21]. In the considered benchmarks, the sum-of-products units are better than the multiplier-only solution in terms of energy-delay product. The shorter execution time of sum-of-products can reduce the energy demanded by the design. If we reduce the supply voltage, our design can save significant energy: the clock cycles per program with sum-of-products are roughly halved compared to those with the multiplier, while reducing the supply voltage increases the clock cycle time only slightly. For example, if we replace the ARM multiplier at 1.32 V with the sum-of-products at 1.08 V for the high pass filter, execution time decreases by about 22% and energy by about 10%. For the FIR filter, the sum-of-products has 14% less execution time while keeping the same energy.

The multiplier and sum-of-products are characterized by execution time ratio versus energy ratio in Figure 3. The energy ratio decreases as the execution time ratio increases, and the sum-of-products unit consumes more energy as the difference in execution time between a sum-of-products and a multiplier increases. The energy ratio is expected to be less than 1 if their execution times are the same; that is, the sum-of-products unit consumes less power than a multiplier when the execution time is the same.

B. The Design Characteristics of the Proposed Sum-of-Products Units

TABLE I
Clock cycles for benchmark programs.

    Clock Cycles        FIR Filter (length = 100)    High Pass Filter (length = 100)
    Multiplication      1415 (1.00)                  1617 (1.00)
    Sum-of-products      817 (0.58)                   845 (0.52)

TABLE II
The power, delay and area of the ARM7TDMI-S multiplier and the sum-of-products hardware.

    Supply Voltage   Hardware   Power (µW)   Delay (ns)   Area (NAND2)
    1.32 V           MUL*       1678         0.99         1384
                     SOP†       3461         1.02         2941
    1.2 V            MUL*       1250         1.15         1316
                     SOP†       2578         1.19         2788
    1.08 V           MUL*        940         1.42         1364
                     SOP†       1940         1.48         2896

    *multiplier: measured value, †sum-of-products: estimated value

We implemented the proposed sum-of-products unit in Verilog using a top-down methodology. The designs were verified using Cadence NC-Verilog and synthesized using Synopsys Design Compiler with a Samsung 65 nm low-power CMOS standard cell library. The proposed designs were synthesized at the three supply voltages supported by the technology: 1.08 V, 1.20 V, and 1.32 V. To reduce the effects of changes made by the synthesis tool to the structure of the original Verilog code, we used the same design technology and the same Synopsys Design Compiler constraints for all designs. Placement and routing were performed with Synopsys Astro to obtain more precise results. Delays were obtained from Synopsys PrimeTime, and power figures were obtained from the Samsung in-house power estimation tool, CubicWare. Table IV shows power, delay and area estimates for the sum-of-products design. The synthesis results indicate that the multipliers account for most of the power, delay, and area of the sum-of-products design.

195

TABLE III
The execution time, energy, and energy-delay product of the ARM7TDMI-S multiplier and the sum-of-products hardware for benchmark programs.

    Benchmark          Supply    Hardware                  Exec. Time (ns)   Energy (µJ)   Energy-Delay Product
    FIR Filter         1.32 V    ARM7TDMI-S Multiplier*    1400.85 (1.00)    2.35 (1.00)   3292.00 (1.00)
    (length = 100)               Sum-of-products†           833.34 (0.59)    2.88 (1.23)   2400.02 (0.73)
                       1.2 V     ARM7TDMI-S Multiplier*    1627.25 (1.00)    2.03 (1.00)   3303.32 (1.00)
                                 Sum-of-products†           972.23 (0.60)    2.50 (1.23)   2430.58 (0.74)
                       1.08 V    ARM7TDMI-S Multiplier*    2009.30 (1.00)    1.89 (1.00)   3797.58 (1.00)
                                 Sum-of-products†          1209.16 (0.60)    2.35 (1.24)   2841.53 (0.75)
    High Pass Filter   1.32 V    ARM7TDMI-S Multiplier*    1600.83 (1.00)    2.69 (1.00)   4306.23 (1.00)
    (length = 100)               Sum-of-products†           861.90 (0.54)    2.98 (1.11)   2568.46 (0.60)
                       1.2 V     ARM7TDMI-S Multiplier*    1859.55 (1.00)    2.32 (1.00)   4314.16 (1.00)
                                 Sum-of-products†          1005.55 (0.54)    2.59 (1.12)   2604.37 (0.60)
                       1.08 V    ARM7TDMI-S Multiplier*    2296.14 (1.00)    2.16 (1.00)   4959.66 (1.00)
                                 Sum-of-products†          1250.60 (0.55)    2.43 (1.12)   3038.96 (0.61)

    *: measured value, †: estimated value

TABLE IV
Power, delay, and area for sum-of-products.

    Hardware                  Power (µW)              Delay (ns)     Area (NAND2)
    Sum-of-products           3799.45 (1.00)          13.80 (1.00)   11870 (1.00)
    LR_4ULS_42 Array* × 2     1469.90 × 2 (0.39 × 2)  12.68 (0.92)   5295 × 2 (0.45 × 2)
    [4:2] adder, CPA           859.65 (0.22)           1.12 (0.08)    1280 (0.10)

    *LR_4ULS_42 Array: 4-Level Upper/Lower Split Left-to-Right Array using [4:2] adders, at supply voltage 1.08 V

TABLE V
The power, delay and area of the proposed multiplier and sum-of-products hardware.

    Supply Voltage   Hardware       Power (µW)   Delay (ns)   Area (NAND2)
    1.32 V           LR_4ULS_42*    2691          9.84         5864
                     SOP†           5825         10.74        12226
    1.2 V            LR_4ULS_42*    2246         11.48         5736
                     SOP†           4844         11.85        12038
    1.08 V           LR_4ULS_42*    1856         13.02         5722
                     SOP†           3799         13.80        11870

    *LR_4ULS_42: 4-Level Upper/Lower Split Left-to-Right Array Multiplier using [4:2] adders = LR_4ULS_42 Array + CPA, †sum-of-products

To compare cycle-level results, a second experiment uses the proposed multipliers and sum-of-products. For accurate results, we assume the clock cycles for the benchmark programs are the same as those in the ARM7 test environment. Table V shows the power, delay and area of the proposed multiplier and sum-of-products hardware, and Table VI summarizes the energy and execution time. The sum-of-products unit dissipates 25-36% more energy than a single multiplier with a 37-40% decrease in execution time and a 15-23% decrease in energy-delay product for the FIR filter program. The sum-of-products also consumes 13-23% more energy with a 43-46% decrease in execution time and a 30-37% decrease in energy-delay product for the high pass filter. The sum-of-products is better than the multiplier-only solution in terms of energy-delay product.

V. DISCUSSION

A. Static Power Dissipation

As mentioned earlier, the power of a circuit consists of static and dynamic dissipation components. Static power is mainly determined by the silicon process technology and the total number of transistors. Unfortunately, the sum-of-products design consumes more static power than a single multiplier due to its larger area. One obvious technique to reduce static power is to reduce the supply voltages used in the circuit [22]. However, opportunities to reduce the supply voltage are limited, because delay increases as the supply voltage is scaled down, even though static power dissipation decreases. It is possible to use a high supply voltage in the critical paths of a design to achieve the required performance, while the off-critical paths use a lower supply voltage to achieve low static power dissipation. By partitioning the circuit into several domains operating at different supply voltages, static power savings are possible. However, level shifter circuits are required for inter-domain communication, which comes at the cost of added circuitry.

The other approach is to use multiple threshold voltages. This technology provides transistors with multiple threshold voltages in order to optimize delay or power. In modern process technologies, multiple threshold voltages are available for each transistor: a multi-threshold voltage process provides the designer with transistors that are either fast with high static power or slow with low static power. Therefore, a circuit can be partitioned into high and low threshold voltage gates, trading off high performance against reduced static power dissipation. A limitation of this technique is that CAD tools need to be developed and integrated into the design flow to optimize the partitioning process.

TABLE VI
The execution time, energy, and energy-delay product of the proposed multiplier and sum-of-products hardware for benchmark programs.

    Benchmark          Supply    Hardware           Exec. Time (µs)   Energy (µJ)    Energy-Delay Product
    FIR Filter         1.32 V    LR_4ULS_42         13.92 (1.00)      37.48 (1.00)   521.85 (1.00)
    (length = 100)               Sum-of-products     8.77 (0.63)      51.11 (1.36)   448.50 (0.85)
                       1.2 V     LR_4ULS_42         16.24 (1.00)      36.49 (1.00)   592.72 (1.00)
                                 Sum-of-products     9.68 (0.60)      46.90 (1.29)   454.07 (0.77)
                       1.08 V    LR_4ULS_42         18.42 (1.00)      34.20 (1.00)   630.10 (1.00)
                                 Sum-of-products    11.27 (0.61)      42.84 (1.25)   482.97 (0.77)
    High Pass Filter   1.32 V    LR_4ULS_42         15.91 (1.00)      42.84 (1.00)   681.48 (1.00)
    (length = 100)               Sum-of-products     9.08 (0.57)      52.86 (1.23)   479.76 (0.70)
                       1.2 V     LR_4ULS_42         18.65 (1.00)      41.70 (1.00)   774.03 (1.00)
                                 Sum-of-products    10.01 (0.54)      48.51 (1.16)   485.72 (0.63)
                       1.08 V    LR_4ULS_42         21.05 (1.00)      39.08 (1.00)   822.83 (1.00)
                                 Sum-of-products    11.66 (0.55)      44.31 (1.13)   516.65 (0.63)

Fig. 3. Comparison of energy ratio versus execution time ratio for the benchmarks: (a) ARM7TDMI-S multiplier and sum-of-products (b) LR_4ULS_42 and sum-of-products. In both plots the trend line shows that, when the execution time is the same, the sum-of-products unit consumes less power.

VI. SUMMARY

In this paper, we have proposed a new sum-of-products arithmetic architecture and have discussed ways of achieving power savings with the sum-of-products. The proposed unit achieves significant power or delay savings and is comparable in power and latency to other current multiplier designs. Compared to a sum-of-products implementation using the ARM7TDMI-S multiplier, the proposed sum-of-products unit reduces execution time by approximately 45% with a 15% energy penalty in the benchmark applications.

REFERENCES

[1] W. Suntiamorntut, Energy efficient functional unit for a parallel asynchronous DSP, Ph.D. dissertation, University of Manchester, 2005.
[2] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed., Prentice Hall, 2003.
[3] J. M. Rabaey, Low Power Design Essentials, Springer, 2009.
[4] D. E. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann Publishers, 1998.
[5] A. A. Fayed and M. A. Bayoumi, "A novel architecture for low-power design of parallel multipliers," in Proc. IEEE Comput. Soc. Workshop on VLSI, Apr. 2001, pp. 149-154.
[6] J. Di and J. S. Yuan, "Power-aware pipelined multiplier design based on 2-dimensional pipeline gating," in Proc. GLSVLSI, Apr. 2003, pp. 64-67.
[7] M.-C. Wen, S.-J. Wang, and Y.-N. Lin, "Low power parallel multiplier with column bypassing," in Proc. ISCAS, May 2005, pp. 1638-1641.
[8] Z. Huang and M. D. Ercegovac, "Two-dimensional signal gating for low-power array multiplier design," in Proc. ISCAS, vol. 1, Aug. 2002, pp. 489-492.
[9] W. Yeh and C. Jen, "High-speed Booth encoded parallel multiplier design," IEEE Trans. Comput., vol. 49, no. 7, pp. 692-700, Jul. 2000.
[10] K. Choi and M. Song, "Design of a high performance 32 × 32-bit multiplier with a novel sign select Booth encoder," in Proc. ISCAS, vol. 2, May 2001, pp. 701-704.
[11] Z. Huang, High-Level Optimization Techniques for Low-Power Multiplier Design, Ph.D. dissertation, University of California at Los Angeles, 2004.
[12] Z. Huang and M. D. Ercegovac, "High-performance low-power left-to-right array multiplier design," IEEE Trans. Comput., vol. 54, no. 3, pp. 272-283, Mar. 2005.
[13] M. D. Ercegovac and T. Lang, "Fast multiplication without carry-propagate addition," IEEE Trans. Comput., vol. 39, no. 11, pp. 1385-1390, Nov. 1990.
[14] Z. Huang and M. D. Ercegovac, "Low power array multiplier design by topology optimization," in Proc. SPIE Advanced Signal Processing Algorithms, Architectures, and Implementations XII, vol. 4791, Jul. 2002, pp. 424-435.
[15] Z. Huang and M. D. Ercegovac, "Number representation optimization for low power multiplier design," in Proc. SPIE Advanced Signal Processing Algorithms, Architectures, and Implementations XII, vol. 4791, Jul. 2002, pp. 345-356.
[16] T. Sakuta, W. Lee, and P. T. Balsara, "Delay balanced multipliers for low power/low voltage DSP core," in Proc. ISLPED, Oct. 1995, pp. 36-37.
[17] S. S. Mahant-Shetti, P. T. Balsara, and C. Lemonds, "High performance low power array multiplier using temporal tiling," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 7, no. 1, pp. 121-124, Mar. 1999.
[18] J. L. Hennessy and D. A. Patterson, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann Publishers, 2005.
[19] S. Furber, ARM System Architecture, Addison-Wesley, 1996.
[20] ARM, ARM7TDMI Technical Reference Manual.
[21] R. Gonzalez and M. Horowitz, "Energy dissipation in general purpose microprocessors," IEEE J. Solid-State Circuits, vol. 31, no. 9, Sep. 1996.
[22] S. W. Heo, S. J. Huh, and M. D. Ercegovac, "Power optimization in a parallel multiplier using voltage islands," to appear in Proc. ISCAS, May 2013.

197

6/24/2015

Gmail ­ ASAP2013 notification for paper 16

Seok Won Heo 

ASAP2013 notification for paper 16 ASAP2013  To: Seok Won Heo 

Wed, Mar 27, 2013 at 1:49 PM

Dear Authors, We received an number of outstanding papers for ASAP13 and we are pleased to inform you that your paper was selected as a SHORT PAPER for this year's conference.  The reviews are included at the end of this email.  Please address the comments from the reviewers and provide your camera­ready ***4­PAGE*** paper (2 additional pages my be purchased for a fee) as specified in the directions on the conference website (http://asap­conference.org/). Camera­ready papers are due April 21st. Information regarding registration, lodging, and travel for the conference are available at the conference website: http://asap­conference.org/ Best Regards, ASAP PC Co­Chairs ­­­­­­­­­­­­­­­­­­­­­­­ REVIEW 1 ­­­­­­­­­­­­­­­­­­­­­ PAPER: 16 TITLE: Power Optimization of Sum­of­Products Design for Signal Processing Applications AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac OVERALL EVALUATION: 4 (weak accept) Relevance: 5 (excellent) Originality: 5 (excellent) Soundness: 5 (excellent) Language: 2 (poor) Presentation: 3 (fair) Best paper candidate: 1 (no) ­­­­­­­­­­­ REVIEW ­­­­­­­­­­­ This paper examines a proposed enhancement to the ARM7TDMI­S processor core.  The processor's base design includes a hardware multiply unit that uses a modified Booth's algorithm with a carry save array of four layers to perform a 32x8 multiply per cycle, presumably requiring an 8 cycle latency to produce a full 64­bit result.  Although the ARM instruction set includes a multiply­accumulate instruction, it cannot be executed natively on this version of the core because it lacks a fused­multiply­add unit.  The authors designed their own sum of products functional unit, which is different from a multiply­accumulate in that it can perform two multiplies and one add in a single instruction.  Their functional unit is designed to be energy efficient because it uses a split array multiplier comprised of two left­to­right array multipliers which are built on [4:2] adders.  
In their sum-of-products instruction, it seems that their one multiplier is used twice and the addition is performed using an accumulator. Using this new instruction, the authors are able to achieve a substantial improvement in energy-delay product for an FIR filter kernel. This is an interesting paper because its results come from transistor-level models that include post-place-and-route layout parasitics, which is the most accurate way to measure power consumption aside from directly instrumenting the fabricated silicon. The authors used powerful tools, models, and a full technology and standard cell library to collect these results. I was particularly impressed with how the authors integrated their functional unit into the ARM core design. I would have thought that the ARM IP would be somehow encrypted or otherwise obfuscated to protect ARM's proprietary designs, making an effort like this impossible. I'm also impressed that they were able to perform whole-program simulation using a circuit-level simulator. That must have taken an enormous amount of simulation time.
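Review 1's description of the unit (two multiplies and one add per instruction) can be illustrated with a small behavioral sketch. This is purely illustrative and not the authors' RTL: the hypothetical `sop` function stands in for the proposed instruction, and the two FIR-style loops show how a multiply-accumulate loop over n taps issues roughly n/2 sum-of-products operations instead of n multiply-accumulates.

```c
#include <stdint.h>

/* Hypothetical model of the sum-of-products operation: a*b + c*d in one step. */
static int64_t sop(int32_t a, int32_t b, int32_t c, int32_t d) {
    return (int64_t)a * b + (int64_t)c * d;
}

/* Baseline: one multiply-accumulate per tap. */
static int64_t fir_mac(const int32_t *x, const int32_t *h, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)x[i] * h[i];
    return acc;
}

/* Paired form: two taps retired per sum-of-products operation. */
static int64_t fir_sop(const int32_t *x, const int32_t *h, int n) {
    int64_t acc = 0;
    for (int i = 0; i + 1 < n; i += 2)
        acc += sop(x[i], h[i], x[i + 1], h[i + 1]);
    if (n & 1)                              /* odd tap count: one leftover MAC */
        acc += (int64_t)x[n - 1] * h[n - 1];
    return acc;
}
```

Both forms compute the same dot product; the paired form simply halves the number of arithmetic operations issued, which is the source of the execution-time reduction the reviewer refers to.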


Gmail - ASAP2013 notification for paper 16

----------------------- REVIEW 2 ---------------------
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac

OVERALL EVALUATION: 5 (strong accept)
Relevance: 5 (excellent)
Originality: 4 (good)
Soundness: 5 (excellent)
Language: 5 (excellent)
Presentation: 5 (excellent)
Best paper candidate: 2 (yes)

----------- REVIEW -----------
I appreciated the clarity of this paper. The paper shows that expressing dot products and matrix products as sums-of-products (i.e., as operations of the form ab+cd) may lead to a significant reduction in execution time without a large energy penalty. The suggested architectures look interesting (yet I must say that I am not an expert in multiplier design). Maybe the name "sum-of-products" is a bit misleading: sum-of-two-products would be better. The fact that the application to dot products, FIR and IIR filters is interesting is intuitive. It would be interesting to see if this remains true for other applications (e.g., FFT, polynomial evaluation, LU decomposition, etc.), so trying other benchmarks would be important. Also, I would appreciate tests to know what the impact on accuracy is.

----------------------- REVIEW 3 ---------------------
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac

OVERALL EVALUATION: 4 (weak accept)
Relevance: 5 (excellent)
Originality: 4 (good)
Soundness: 3 (fair)
Language: 5 (excellent)
Presentation: 3 (fair)
Best paper candidate: 1 (no)

----------- REVIEW -----------
This submission presents a very high-performance parallel arithmetic unit for the computation of sum-of-products: a*b + c*d. The two products a*b and c*d are performed in parallel inside the unit to speed up the execution of long multiply-accumulate loops. The arithmetic part of the paper is new and very interesting, but not very well described.
It is difficult to clearly understand what the proposed solution for the partial product reduction array is. It seems to be a mix between left-to-right array multipliers, to reduce the glitching activity, and a split array multiplier, but the full description is not very clear. For the accumulation part, [4:2] adders seem to be used, but this part is not clear either. Does the unit contain an accumulator (redundant or non-redundant?), or is a general register of the register file used as an accumulator? The authors mainly detail the a*b+c*d part, not the accumulation. Without internal accumulation, this unit would require 5 read ports on the register file (see below). Moreover, in case of an internal redundant accumulator, the representation of signed numbers in redundant form is not detailed (2's complement extension for carry-save trees is not very efficient for power aspects). One cannot assume only positive integers are used in real signal processing applications. Several figures presented in the paper are not clear (fig. 2, 3, 4.b). The text inside these figures is too small; it is not possible to read the details.


It seems that there is no accumulation part in figure 1.a, but it is not clear why, compared to fig. 1.b. The authors report a lot of simulation results. The proposed sum-of-products unit clearly saves power compared to a simple a*b multiplier unit. But the comparison is not totally fair. It seems that the authors only compare the arithmetic part, not the performance and power of a complete processor. Using an a*b + c*d sum-of-products arithmetic unit instead of a simple a*b one will require a register file with 4 read ports (or 5 read ports in case of a solution without an internal accumulator). This would increase the fanout at the output of the register file and add delays in the register address decoding path. What would be the impact on the number of registers in the register file and on the load-store unit in the processor if 4 operands are fetched at each clock cycle instead of two? The proposed arithmetic part may then slow down the complete processor and all other parts of the applications. A complete architecture + arithmetic analysis would be interesting for the audience of the ASAP conference.

----------------------- REVIEW 4 ---------------------
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac

OVERALL EVALUATION: 3 (borderline paper)
Relevance: 3 (fair)
Originality: 3 (fair)
Soundness: 2 (poor)
Language: 3 (fair)
Presentation: 3 (fair)
Best paper candidate: 1 (no)

----------- REVIEW -----------
This paper describes the authors' work on designing and implementing multipliers and sum-of-products units. The overall presentation of the paper is good and easy to follow. The paper also includes extensive experimental results for different units and scenarios. However, most of the designs are based on existing methods or approaches; I do not see much novel contribution in this work. The title of the paper is focused on power optimization.
But what I see is just a straightforward design and measurements of power consumption in the experiments. I do not see much optimization effort that tries to reduce the power consumption of the design. For the comparison part, we see the reduction in execution time by using sum-of-products units, but we also see a big increase in area. The two filter applications might not be enough to justify the advantage of sum-of-products designs. It might be more convincing if we could see some more benchmarks.

----------------------- REVIEW 5 ---------------------
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac

OVERALL EVALUATION: 2 (weak reject)
Relevance: 3 (fair)
Originality: 2 (poor)
Soundness: 3 (fair)
Language: 3 (fair)
Presentation: 2 (poor)
Best paper candidate: 1 (no)

----------- REVIEW -----------
The authors propose adding sum-of-products hardware to an ARM microprocessor. When the new hardware is used, energy for sum-of-products computation is reduced.


The paper's presentation leaves something to be desired. It is not entirely clear until the end what the authors believe their particular contribution to be, and background is mixed with the "new content". From a technical perspective, the paper's conclusion is somewhat obvious: it is well understood that dedicated hardware is more efficient in terms of both performance and energy than microprocessors. The authors have selected a processor that does not have a multiply-accumulate instruction as a starting point, so even the addition of just that instruction would already have helped.

Minor other notes:
Page 1, Col 1: "multiplier is one of the most power demanding components on these chips" --> this statement is crying out for a reference.
Page 3, Col 2: "presents the less power consumption" --> reword.
Page 5, Table 3: it is not entirely clear whether 'energy' refers to energy for the complete core, or just for the multiplier/sum-of-products unit.

----------------------- REVIEW 6 ---------------------
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac

OVERALL EVALUATION: 3 (borderline paper)
Relevance: 4 (good)
Originality: 3 (fair)
Soundness: 3 (fair)
Language: 4 (good)
Presentation: 4 (good)
Best paper candidate: 1 (no)

----------- REVIEW -----------
This paper proposes a method to design a sum-of-products unit for digital signal processing that reduces the execution time of sum-of-products calculations compared to using the multiplier found on the ARM7TDMI-S processor. The proposed sum-of-products unit uses left-to-right multiplication, a four-level upper/lower split array multiplier, and [4:2] adders. The paper is fairly well written, and the proposed approach is interesting and seems to be effective.
Some issues with the paper include: (1) The techniques used to design the sum-of-products unit (left-to-right multiplication, split array multiplier, and [4:2] adder) have all been previously published. It seems like the new contributions are the use of a four-level split array multiplier (but this design is not described well), applying the multiplier design to sum-of-products calculations (but this is trivial), and the comparison with the multiplier in the ARM7TDMI-S processor. The authors should be clearer about the novel contributions of the paper. (2) The authors compare their design with the multiplier in the ARM7TDMI-S processor. However, this is an odd choice since the ARM processor has a 32-bit by 8-bit multiplier and does not directly support sum-of-products computations. It would be better if the authors instead compared their sum-of-products unit to previous sum-of-products units. Also, it was not clear from the discussion whether the authors implemented their sum-of-products unit using 32-bit by 32-bit multipliers or something else.
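The 32-bit by 8-bit multiplier that Reviews 1 and 6 refer to can be made concrete with a behavioral sketch. This is an illustration only, not ARM's circuit: it models a multiplier that retires 8 multiplier bits per cycle, so a full 32x32->64 product takes four such steps (the real ARM7TDMI-S unit also uses modified Booth recoding and early termination, which this sketch omits).

```c
#include <stdint.h>

/* Behavioral sketch (not ARM's implementation): a 32x32->64 multiply
 * performed as four 32x8 steps, one 8-bit slice of the multiplier b
 * per "cycle". */
static uint64_t mul_32x8_steps(uint32_t a, uint32_t b) {
    uint64_t acc = 0;
    for (int cycle = 0; cycle < 4; cycle++) {
        uint64_t slice = (b >> (8 * cycle)) & 0xFFu;  /* next 8 multiplier bits */
        acc += (uint64_t)a * slice << (8 * cycle);    /* add shifted partial product */
    }
    return acc;
}
```

The four-step decomposition is exact, which is why the reviewers note that a sum-of-products a*b + c*d on such a core costs several multiply iterations per product, and why a dedicated parallel unit can win on latency.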


ASAP 2013 Table of Contents

Linear Algebra and Signal Processing
160  A Practical Measure of FPGA Floating Point Acceleration for High Performance Computing (John D. Cappello, Dave Strenski)
168  Sparse Matrix-Vector Multiply on the Texas Instruments C6678 Digital Signal Processor (Yang Gao, Jason D. Bakos)
175  Transforming a Linear Algebra Core to an FFT Accelerator (Ardavan Pedram, John McCalpin, Andreas Gerstlauer)
185  Reduce, Reuse, Recycle (R3): A Design Methodology for Sparse Matrix Vector Multiplication on Reconfigurable Platforms (Kevin Townsend, Joseph Zambreno)
192  Power Optimization of Sum-of-Products Design for Signal Processing Applications (Seok Won Heo, Suk Joong Huh, Miloš D. Ercegovac)
198  An Efficient & Reconfigurable FPGA and ASIC Implementation of a Spectral Doppler Ultrasound Imaging System (Adam Page, Tinoosh Mohsenin)

ASAP 2013 Brief Author Index

H
Hajkazemi, Mohammad Hossein . . . 153
Hammond, Simon D. . . . 321
Hannig, Frank . . . 1, 10
Hao, Lu . . . 91
Hemani, Ahmed . . . 227, 277
Heo, Deukhyoun . . . 79
Heo, Seok Won . . . 192
Hsieh, Genie . . . 321
Hu, X. Sharon . . . 321
Huang, Jia . . . 35
Huang, Kai . . . 35
Huh, Suk Joong . . . 192
Hunt, Lee . . . 237
Hussain, Waqar . . . 339
Hutchings, Brad L. . . . 363

I
Ioualalen, Arnault . . . 113

J
Jafri, Syed M.A.H. . . . 227
Jain, Abhishek Kumar . . . 219
Jarollahi, Hooman . . . 305

K
Kang, Jihoon . . . 95
Kępa, Krzysztof . . . 26, 261
Kim, Yongjoo . . . 95
Kirchgessner, Robert . . . 211
Knoll, Alois . . . 35
Ko, Yohan . . . 95
Koelmans, Albert . . . 314
Kougianos, Elias . . . 75

ASAP 2013 Detailed Author Index

H
Hajkazemi, Mohammad Hossein . . . 153
  FARHAD: A Fault-Tolerant Power-Aware Hybrid Adder for Add Intensive Applications
Hammond, Simon D. . . . 321
  GPU Acceleration of Data Assembly in Finite Element Methods and Its Energy Implications
Hannig, Frank . . . 1
  Symbolic Parallelization of Loop Programs for Massively Parallel Processor Arrays
Hannig, Frank . . . 10
  Loop Program Mapping and Compact Code Generation for Programmable Hardware Accelerators
Hao, Lu . . . 91
  Virtual Finite-State-Machine Architectures for Fast Compilation and Portability
Hemani, Ahmed . . . 227
  Private Configuration Environments (PCE) for Efficient Reconfiguration in CGRAs
Hemani, Ahmed . . . 277
  Unifying CORDIC and Box-Muller Algorithms: An Accurate and Efficient Gaussian Random Number Generator
Heo, Deukhyoun . . . 79
  Design Space Exploration for Reliable mm-Wave Wireless NoC Architectures
Heo, Seok Won . . . 192
  Power Optimization of Sum-of-Products Design for Signal Processing Applications
Hsieh, Genie . . . 321
  GPU Acceleration of Data Assembly in Finite Element Methods and Its Energy Implications

Message from the ASAP 2013 Chairs

Tarek El-Ghazawi, General Chair

Alan George, General Co-Chair

Melissa Smith, Program Chair

Kubilay Atasu, Program Co-Chair

We welcome you to the 24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2013). This year's event takes place in Washington, D.C., USA, the capital of the United States. Prior to visiting D.C., the conference has been held in many places around the globe including Oxford (1986), San Diego (1988), Killarney (1989), Princeton (1990), Barcelona (1991), Berkeley (1992), Venice (1993), San Francisco (1994), Strasbourg (1995), Chicago (1996), Zürich (1997), Boston (2000), San Jose (2002), The Hague (2003), Galveston (2004), Samos (2005), Steamboat Springs (2006), Montréal (2007), Leuven (2008), Boston (2009), Rennes (2010), Santa Monica (2011), and Delft (2012).

This year's program includes an exciting collection of contributions resulting from a successful call for papers. The selected papers have been divided into thematic areas, which include regular papers, short papers, and poster papers that highlight the current focus of application-specific systems research activities. In response to the call for papers, 125 submissions were received, 108 of which were reviewed. These submitted papers came from 33 countries in Africa, Asia, Europe, and America. The largest number of submitting authors have an affiliation in the US (140), followed by the EU (102), and China (30). Submissions were subjected to a rigorous review by the members of the program committee, 46 members from 12 countries, as well as 70 external reviewers: the committee provided 235 reviews and the external reviewers contributed 98 reviews. After an intense scrutiny of the reviews, we are pleased to present a high quality technical program that includes 24 long papers, 15 short papers, and 22 posters for presentation at the conference. They represent the current state-of-the-art in application-specific systems research. These are complemented by keynote and invited talks.

We thank the authors who responded to our call for papers, the members of the program committee and the external referees who, with their opinion and expertise, ensured a very high quality program. Herman Lam worked tirelessly to publicize and seek out industry sponsors for the conference. Lubomir Riha helped in getting the IEEE and CS sponsorship. Ahmad Anbar ensured that the web environment was active and responsive and that we were on track with finances. Esam El-Araby ensured that the proceedings are indeed a reality and Vikram Narayana made sure the registration process was smooth. We are grateful to the IEEE Computer Society for sponsoring the conference. Thank you all. We hope that the proceedings will serve as a useful reference of the state-of-the-art in application-specific systems research.

General Chair: Tarek El-Ghazawi, The George Washington University, USA
General Co-Chair: Alan George, University of Florida, USA
Program Chair: Melissa Smith, Clemson University, USA
Program Co-Chair: Kubilay Atasu, IBM Research - Zurich, Switzerland

May 2013

ASAP 2013 Conference Committee

General Chair: Tarek El-Ghazawi, The George Washington University, USA
General Co-Chair: Alan George, University of Florida, USA
Honorary General Chair: Reiner Hartenstein, University of Kaiserslautern, Germany
Program Chair: Melissa Smith, Clemson University, USA
Program Co-Chair: Kubilay Atasu, IBM Research - Zurich, Switzerland
Industrial Chair: Herman Lam, University of Florida, USA
Publication Chair: Esam El-Araby, The Catholic University of America, USA
Registration Chair & Awards Chair: Vikram Narayana, The George Washington University, USA
Finance Chair: Lubomir Riha, The George Washington University, USA
Finance Co-Chair & Web Chair: Ahmad Anbar, The George Washington University, USA

Program Committee: Peter Athanas, Virginia Tech, USA; Jason Bakos, University of South Carolina, USA; Pascal Benoit, University of Montpellier, France; Jeremy Buhler, Washington University in St. Louis, USA; Joseph Cavallaro, Rice University, USA; Roger Chamberlain, Washington University in St. Louis, USA; Anupam Chattopadhyay, RWTH Aachen University, Germany; Esam El-Araby, Catholic University of America, USA; Suhaib Fahmy, Nanyang Technological University, Singapore; Michael J. Flynn, Stanford University, USA; José A.B. Fortes, University of Florida, USA; Frank Hannig, University of Erlangen-Nuremberg, Germany; Haohuan Fu, Tsinghua University, China; Krzysztof Kuchcinski, Lund Institute of Technology, Sweden; Sun-Yuan Kung, Princeton University, USA; Philip Leong, The University of Sydney, Australia; Wayne Luk, Imperial College London, UK; Diana Gohringer, KIT, Germany; Ann Gordon-Ross, University of Florida, USA; Akila Gothandaraman, University of Pittsburgh Center for Simulation & Modeling, USA; Peter Hofstee, IBM Research - Austin, USA; Brian Holland, SRC, USA; Martin Herbordt, Boston University, USA; Volodymyr Kindratenko, NCSA, USA; Jean-Michel Muller, École Normale Supérieure de Lyon, France; Onur Mutlu, Carnegie Mellon University, USA; Oliver Pell, Maxeler Technologies, UK; Gang Qu, University of Maryland, USA; Oliver Sander, KIT, Germany; Kentaro Sano, Tohoku University, Japan; Ron Sass, UNC Charlotte, USA; Mariagiovanna Sami, Politecnico di Milano, Italy; Michael J. Schulte, AMD Research, USA; Cristina Silvano, Politecnico di Milano, Italy; Eric Stahlberg, OpenFPGA, National Cancer Institute, USA; Thomas Steinke, Zuse Institute Berlin, Germany; Dave Strenski, Cray, USA; Earl Swartzlander, University of Texas at Austin, USA; Jürgen Teich, University of Erlangen-Nuremberg, Germany; David Thomas, Imperial College London, UK; Ingrid Verbauwhede, K.U. Leuven, Belgium; Mike Wirthlin, Brigham Young University, USA; Christophe Wolinski, Université de Rennes 1, France; Roger Woods, Queen's University Belfast, UK

External Reviewers: Per Andersson, Jose Rodrigo Azambuja, Paul Barber, Tobias Becker, Ramakrishna Bijanapalli Chakri, Srinivas Boppu, Vy Bui, Vineet Chadha, Francois Charot, Kit Cheung, Laurent Condat, Florent de Dinechin, James Demma, Shaver Deyerle, Chris Dobson, Renato Figueiredo, Scott Fischaber, Michael Frechtling, Flavius Gruian, John Harris, Abhishek Jain, Shweta Jain, Mioara Joldes, Ayesha Khalid, Peter Kornerup, William Kritikos, Vahid Lari, Tao Li, Andrew Love, Stephen McKeown, Alastair McKinley, Nick Ng, Xinyu Niu, Stuart Oberman, Gianluca Palermo, Vivek Pallipuram, Raphael Polig, Mitra Purandare, Krishna Ramadurai, Nimisha Raut, Felix Reimann, Kurt Rooks, Zoltán Endre Rákossy, Nilim Sarma, Yukinori Sato, Moritz Schmid, Bernhard Schmidt, Yuichiro Shibata, Ali Asgar Sohanghpurwala, Renato Stefanelli, Christoph Studer, Michael Sullivan, Hiroyuki Takizawa, Arnaud Tisserand, Carsten Tradowsky, David Uliana, Girish Venkatasubramanian, Aida Vosoughi, Guohui Wang, Andreas Weichselgartner, Eddie Weill, Michael Wu, Hongyi Xin, Simin Xu, Sotirios Xydis, Yoshiki Yamaguchi, Gavin Yao, Bei Yin, Qi Zhang, Daniel Ziener



Copy of www.cs-conference-ranking.org

Conference Ranking (was www.cs-conference-ranking.org). I maintained here (with only cosmetic alteration: the conferences where I had accepted papers have links to their latest edition; other copies of it can be found elsewhere) a copy of what was on http://www.cs-conference-ranking.org/conferencerankings/alltopics.html. Sadly that webpage is no longer maintained, but my feeling is that it was one of the most accurate conference rankings, and the fine grain (49 possibilities) gave a better understanding than the usual A, B, C notes (even though the second digit is probably not so representative).




Architecture / Hardware / High-Performance Computing / Tools / Operating Systems

Although we will attempt to keep this information accurate, we cannot guarantee the accuracy of the information provided. The numbers in brackets correspond to the EIC value (Estimated Impact of Conference). The numbers are normalized to be in the range 0.00-1.00 (the closer the number to 1.00, the better the conference). Only conferences with EIC above 0.50 have been included. The ranking lists will be updated every three months (end of January, April, July, and October), but this is getting more of a challenge than originally anticipated. Conferences listed below are considered to be tier 1 research meetings in their respective fields.


Top 57 conferences are listed (421 considered):

MICRO: Intl Symp on Microarchitecture (0.97)
OSDI: USENIX Operating Systems Design and Implementation (0.96)
SC/SUPER: ACM/IEEE Supercomputing Conference (0.96)
HPCA: IEEE Symp on High-Perf Comp Architecture (0.96)
ASPLOS: Architectural Support for Prog Lang and OS (0.95)
FCCM: IEEE Symposium on Field Programmable Custom Computing Machines (0.93)
ISCA: ACM/IEEE Symp on Computer Architecture (0.99)
HCS: Hot Chips Symp (0.92)
DAC: Design Automation Conf (0.92)
IPDPS: Intl Parallel and Distributed Processing Symposium (0.91)
PACT: IEEE Intl Conf on Parallel Architectures and Compilation Techniques (0.88)
ISSCC: IEEE Intl Solid-State Circuits Conf (0.87)
VLSI: IEEE Symp VLSI Circuits (0.87)
ICCAD: Intl Conf on Computer-Aided Design (0.86)
CODES+ISSS: Intl Conf on Hardware/Software Codesign & System Synthesis (0.86)
USENIX: Technical Conference (0.86)
DATE: IEEE/ACM Design, Automation & Test in Europe Conference (0.85)
ICA3PP: Intl Conf on Algorithms and Architectures for Parallel Processing (0.85)
ERSA: Intl Conf on Engineering of Reconfigurable Systems and Algorithms (0.85)
ICN: IEEE Intl Conf on Networking (0.84)
PDPTA: Intl Conf on Parallel & Distributed Processing Techniques and Appl. (0.84)
ASAP: IEEE Application-Specific Systems, Architectures, and Processors (0.84)
CHARME: Conference on Correct Hardware Design and Verification Methods (0.83)
FPL: Field-Programmable Logic and Applications (0.82)
ICCD: Intl Conference on Computer Design (0.81)
PPoPP: ACM SIGPLAN Symp. on Principles & Practice of Parallel Programming (0.81)
CASES: Intl Conf on Compilers, Architecture, & Synthesis for Embedded Systems (0.81)
ESA: Intl Conf on Embedded Systems and Applications (0.79)
PARCO: Parallel Computing Conference (0.77)
ICS: Intl Conf on Supercomputing (0.74)
SC: ACM/IEEE Intl Conf for High Perf. Comp., Networking, Storage & Analysis (0.73)
PADS: IEEE Workshop on Parallel and Distributed Simulation (0.72)
CDES: Intl Conf on Computer Design (0.68)
CANPC: Communication, Arch., & Appl. for Network-Based Parallel Comp. (0.71)
GCA: Intl Conf on Grid Computing and Applications (0.71)
ISPASS: Intl Symposium on Performance Analysis of Systems and Software (0.71)
RTAS: IEEE Real Time Technology and Applications Symposium (0.69)
CHES: Cryptographic Hardware and Embedded Systems (0.67)
PPSC: SIAM Conf on Parallel Processing for Scientific Computing (0.65)
NOSA: Nordic Symposium on Software Architecture (0.64)
ACSAC: Asia-Pacific Computer Systems Architecture Conference (0.62)
ICPP: Intl Conf. on Parallel Processing (0.61)
RTCOMP: Intl Conf on Real-Time Computing Systems and Applications (0.60)
ASYNC: Symposium on Asynchronous Circuits and Systems (0.59)
CAMP: Intl Workshop on Computer Architectures for Machine Perception (0.59)
PPSN: Parallel Problem Solving from Nature (0.56)
HPCS: Intl Symposium on High Performance Computing Systems (0.56)
HPDC: IEEE Intl Symposium on High Performance Distributed Computing (0.56)
VTS: IEEE VLSI Test Symposium (0.56)
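To illustrate the entry format used in the list above (`ACRONYM: Conference Name (EIC)`) and the stated rule of keeping only conferences with EIC above 0.50, here is a minimal Python sketch. The regular expression, function names, and sample string are illustrative assumptions, not part of any original ranking tool.

```python
import re

# Hypothetical parser for entries of the form "ACRONYM: Conference Name (EIC)",
# where EIC is a normalized score in 0.00-1.00.
ENTRY_RE = re.compile(r"([A-Za-z0-9+/\- ]+):\s*(.+?)\s*\((\d\.\d\d)\)")

def parse_entries(text):
    """Return (acronym, name, eic) tuples found in the text."""
    return [(a.strip(), n.strip(), float(e)) for a, n, e in ENTRY_RE.findall(text)]

def top_conferences(entries, threshold=0.50):
    """Keep only entries whose EIC exceeds the threshold, best first."""
    kept = [e for e in entries if e[2] > threshold]
    return sorted(kept, key=lambda e: e[2], reverse=True)

# Two sample entries taken from the list above.
sample = ("MICRO: Intl Symp on Microarchitecture (0.97) "
          "OSDI: USENIX Operating Systems Design and Implementation (0.96)")
print(top_conferences(parse_entries(sample)))
```

Sorting by the parsed EIC value reproduces the (mostly) descending order of the published list.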


Computer Science Conference Rank (Source: CORE)


Rank A CS conferences (acronym and name):

ACIS: Australasian Conference on Information Systems
ACSAC: Annual Computer Security Applications Conference
AIIM: Artificial Intelligence in Medicine
AIME: Artificial Intelligence in Medicine in Europe
AiML: Advances in Modal Logic
ALENEX: Workshop on Algorithm Engineering and Experiments
ALIFE: International Conference on the Simulation and Synthesis of Living Systems
ALT: Algorithmic Learning Theory
AMCIS: Americas Conference on Information Systems
AOSD: Aspect Oriented Software Development
APPROX: International Workshop on Approximation Algorithms for Combinatorial Optimization Problems
ASAP: International Conference on Application-Specific Array Processors
ASE: Automated Software Engineering Conference
ASIACRYPT: International Conference on the Theory and Application of Cryptology and Information Security
ATVA: International Symposium on Automated Technology for Verification and Analysis
BPM: International Conference on Business Process Management
CADE: International Conference on Automated Deduction
CaiSE: International Conference on Advanced Information Systems Engineering
CANIM: Computer Animation
CBSE: International Symposium on Component Based Software Engineering
CC: International Conference on Compiler Construction
CCC: IEEE Symposium on Computational Complexity
CCGRID: IEEE International Symposium on Cluster, Cloud and Grid Computing
CGO: International Symposium on Code Generation and Optimization
CIDR: Conference on Innovative Data Systems Research
CIKM: ACM International Conference on Information and Knowledge Management

http://lipn.univ-paris13.fr/~bennani/CSRank.html

Microsoft Academic Search: top conferences in hardware & architecture (Computer Science, Hardware & Architecture domain; 1–100 of 102 results). Each entry gives the publication count and field rating:

DAC - Design Automation Conference: 9030 publications, field rating 124
ISCAS - IEEE International Symposium on Circuits and Systems: 23073 publications, field rating 95
ICCAD - International Conference on Computer Aided Design: 2759 publications, field rating 94
ISPD - International Symposium on Physical Design: 2702 publications, field rating 89
MICRO - International Symposium on Microarchitecture: 898 publications, field rating 89
ISCA - International Symposium on Computer Architecture: 1342 publications, field rating 83
ISLPED - International Symposium on Low Power Electronics and Design: 1384 publications, field rating 71
HPCA - International Symposium on High-Performance Computer Architecture: 692 publications, field rating 65
DATE - Design, Automation, and Test in Europe: 4588 publications, field rating 64
FCCM - Field-Programmable Custom Computing Machines: 930 publications, field rating 56
FPGA - Symposium on Field Programmable Gate Arrays: 845 publications, field rating 55
Hybrid Systems: 778 publications, field rating 55
CHES - Cryptographic Hardware and Embedded Systems: 427 publications, field rating 53
ICCD - International Conference on Computer Design: 2438 publications, field rating 51
VTS - IEEE VLSI Test Symposium: 1450 publications, field rating 49
PACT - International Conference on Parallel Architectures and Compilation Techniques: 804 publications, field rating 48
ASPLOS - Architectural Support for Programming Languages and Operating Systems: 334 publications, field rating 44
ASP-DAC - Asia and South Pacific Design Automation Conference: 3962 publications, field rating 41
ISSS - International Symposium on Systems Synthesis: 483 publications, field rating 40
CODES - International Conference on Hardware Software Codesign: 340 publications, field rating 40
ECRTS - Euromicro Conference on Real-Time Systems: 694 publications, field rating 39
FPL - Field-Programmable Logic and Applications: 2022 publications, field rating 38
ASYNC - Symposium on Asynchronous Circuits and Systems: 402 publications, field rating 36
EURODAC - European Design and Test Conference: 945 publications, field rating 34
CASES - Compilers, Architecture, and Synthesis for Embedded Systems: 404 publications, field rating 34
ARITH - IEEE Symposium on Computer Arithmetic: 648 publications, field rating 33
VLSI Design: 2311 publications, field rating 32
COMPCON - Computer Society International Conference: 899 publications, field rating 32
TPHOLs - Theorem Proving in Higher Order Logics: 569 publications, field rating 32
CGO - Symposium on Code Generation and Optimization: 278 publications, field rating 29
EDTC - European Design and Test Conference: 368 publications, field rating 28
CHARME - Conference on Correct Hardware Design and Verification Methods: 220 publications, field rating 28
ISQED - International Symposium on Quality Electronic Design: 1773 publications, field rating 27
ARVLSI - Advanced Research in VLSI: 140 publications, field rating 27
DFT - Defect and Fault Tolerance in VLSI Systems: 1070 publications, field rating 26

http://academic.research.microsoft.com/RankList?entitytype=3&topDomainID=2&subDomainID=3&last=0&start=1&end=100

Top conferences in hardware & architecture (continued):

VLSI - Very Large Scale Integration: 621 publications, field rating 26
ACM Great Lakes Symposium on VLSI: 1477 publications, field rating 25
ASAP - Application-Specific Systems, Architectures, and Processors: 625 publications, field rating 25
ISMVL - IEEE International Symposium on Multiple-Valued Logic: 1374 publications, field rating 24
SiPS - IEEE Workshop on Signal Processing Systems: 1104 publications, field rating 24
IWLS - International Workshop on Logic & Synthesis: 246 publications, field rating 23
ATS - Asian Test Symposium: 752 publications, field rating 22
Annual Symposium on VLSI: 757 publications, field rating 20
RSP - Workshop on Rapid System Prototyping: 709 publications, field rating 20
ETS - European Test Symposium: 434 publications, field rating 20
FPT - IEEE International Conference on Field-Programmable Technology: 771 publications, field rating 19
IOLTS - International On-Line Testing Symposium: 686 publications, field rating 19
PACS - Power-Aware Computer Systems: 68 publications, field rating 19
ACSD - Int. Conf. on Application of Concurrency to System Design: 307 publications, field rating 18
DSD - Euromicro Symposium on Digital Systems Design: 1015 publications, field rating 17
SLIP - System-Level Interconnect Prediction: 172 publications, field rating 17
ERSA - Engineering of Reconfigurable Systems and Algorithms: 409 publications, field rating 16
DELTA - Workshop on Electronic Design, Test and Applications: 548 publications, field rating 15
MTDT - Memory Technology, Design and Testing: 344 publications, field rating 15
PATMOS - Workshop on Power and Timing Modeling, Optimization and Simulation: 614 publications, field rating 14
SAMOS - Systems, Architectures, Modeling, and Simulation: 300 publications, field rating 14
CAMP - Computer Architectures for Machine Perception: 379 publications, field rating 13
CPA - Communicating Process Architectures: 129 publications, field rating 13
Computer Hardware Description Languages and their Applications: 55 publications, field rating 13
AHS - Adaptive Hardware and Systems: 411 publications, field rating 12
CODES+ISSS - International Conference on Hardware Software Codesign: 192 publications, field rating 12
HIPEAC - High Performance Embedded Architectures and Compilers: 135 publications, field rating 12
TAU - Timing Issues in the Specification and Synthesis of Digital Systems: 39 publications, field rating 12
ACSAC - Asia-Pacific Computer Systems Architecture Conference: 330 publications, field rating 11
MIC - Modelling, Identification and Control: 151 publications, field rating 11
MVL - Multiple-Valued Logic: 58 publications, field rating 11
ICA3PP - International Conference on Algorithms and Architectures for Parallel Processing: 654 publications, field rating 10
DDECS - Workshop on Design and Diagnostics of Electronic Circuits and Systems: 604 publications, field rating 10
ARC - Applied Reconfigurable Computing: 275 publications, field rating 10
Workshop on Interaction between Compilers and Computer Architectures: 49 publications, field rating 10
TPCD - Theorem Provers in Circuit Design: 39 publications, field rating 10
Mathematical Science Institute Workshops: 27 publications, field rating 10
IWSOC - International Workshop System-on-Chip for Real-Time Applications: 311 publications, field rating 9
ESTIMEDIA - Embedded Systems for Real-Time Multimedia: 170 publications, field rating 9
MTV - Workshop on Microprocessor Test and Verification: 159 publications, field rating 9
IFIP WG10.5: 139 publications, field rating 9
WMPI - Workshop on Memory Performance Issues: 25 publications, field rating 9

Top conferences in hardware & architecture (continued):

ICMENS - International Conference on MEMS, NANO, and Smart Systems: 430 publications, field rating 8
SIPEW - SPEC International Performance Evaluation Workshop: 44 publications, field rating 8
E-smart - Research in Smart Cards: 22 publications, field rating 8
SoCC - IEEE International System-on-Chip (SoC) Conference: 289 publications, field rating 7
Synthesis for Control Dominated Circuits: 69 publications, field rating 7
CIT - Conference on Information Technology: 195 publications, field rating 6
RECOSOC - Reconfigurable Communication-centric Systems-on-Chip: 95 publications, field rating 6
New Zealand Computer Science Research Students' Conference: 28 publications, field rating 6
Intelligent Memory Systems: 18 publications, field rating 6
CDES - International Conference on Computer Design: 213 publications, field rating 5
IESS - International Embedded Systems Symposium: 75 publications, field rating 5
Fractals in the Natural and Applied Sciences: 41 publications, field rating 5
New Hardware Design Methods: 13 publications, field rating 5
MEDEA - Memory Performance: Dealing With Applications, Systems and Architecture: 7 publications, field rating 5
IASTEDCCS - Circuits, Signals, and Systems: 149 publications, field rating 4
Sagamore Computer Conference: 36 publications, field rating 4
Formal Hardware Verification: 8 publications, field rating 4
SIGMAP - Signal Processing and Multimedia Applications: 185 publications, field rating 3
Microcomputing: 46 publications, field rating 3
APCCAS - Asia Pacific Conference on Circuits and Systems: 6 publications, field rating 3
HPCNCS - High Performance Computing, Networking and Communication Systems: 81 publications, field rating 2
Rechnergestützter Entwurf und Architektur mikroelektronischer Systeme (Computer-Aided Design and Architecture of Microelectronic Systems): 25 publications, field rating 2
IFIP WG5.10 Publications: 22 publications, field rating 2