Power Optimization of Sum-of-Products Design for Signal Processing Applications

Seok Won Heo
Computer Science Department, University of California at Los Angeles, CA, USA 90095
[email protected]

Suk Joong Huh
Samsung Electronics, Suwon, Korea
[email protected]

Miloš D. Ercegovac
Computer Science Department, University of California at Los Angeles, CA, USA 90095
[email protected]
Abstract—Power consumption is a critical aspect of today's mobile environment, while high throughput remains a major design goal. To satisfy both low-power and high-throughput requirements, parallelism has been employed. In this paper we present an approach to reducing power dissipation in the design of the sum-of-products operation by utilizing parallel hardware while maintaining high throughput. The proposed design reduces execution time by about 46% with an energy penalty of about 12% compared to the ARM7TDMI-S multiplier on benchmark programs.
Keywords—Low-power arithmetic; High-throughput arithmetic; Sum-of-products.
978-1-4799-0493-8/13/$31.00 © 2013 IEEE

I. INTRODUCTION

There is a fundamental technological shift taking place in the electronics industry: it is moving from the wired era driven by the Personal Computer (PC) to the wireless era driven by mobile devices. With the increasing complexity of mobile VLSI systems and a growing number of signal processing applications, minimizing the power consumption of signal processing applications has become of great importance in today's mobile system design, while performance and area remain the other two major design goals.

Multiplication and related arithmetic operations are among the most frequently executed operations in conventional digital signal processing applications. However, digital signal processing applications may take many clock cycles on a conventional multiplier, even a high-performance parallel one. This is a critical problem for state-of-the-art signal processing applications, which require intensive numerical calculation. Moreover, studies of power dissipation in Digital Signal Processors (DSPs) and Graphics Processing Units (GPUs) indicate that the multiplier is one of the most power-demanding components on these chips [1]. Therefore, research on new arithmetic models is needed to satisfy the low-power and high-throughput requirements of mobile systems.

The total power consumed by a CMOS circuit comes from two sources: dynamic power and static power [2]. Dynamic power dissipation is the dominant factor in the total power consumption of a CMOS circuit and typically contributes over 60% of the total system power dissipation. Although the effect of static power dissipation increases significantly as VLSI manufacturing technology shrinks, dynamic power dissipation remains dominant [3]. It can be described by

P_dynamic = 0.5 × C_L × V_DD^2 × f_p × N    (1)
where C_L is the load capacitance, V_DD is the power supply voltage, f_p is the clock frequency, and N is the switching activity. The equation indicates that the power supply voltage has the largest impact on dynamic power dissipation because of its squared term. Unfortunately, reducing the power supply voltage causes performance degradation. A great deal of effort has been expended in recent years on techniques that exploit a low supply voltage while minimizing the throughput degradation. Parallel architectures mitigate such throughput degradation [4].

This paper proposes a new arithmetic architecture for signal processing applications and develops a scheme to achieve power savings in the sum-of-products operation by utilizing parallel architectures.

This paper is organized as follows. Section II addresses the problem with conventional arithmetic architectures. Section III reviews recent research in the design of parallel multipliers and presents the proposed sum-of-products architecture. Section IV provides power and throughput estimates for the sum-of-products design and compares them to the estimates for conventional ARM multipliers and the proposed multipliers. Section V discusses open problems. Finally, a summary is given in Section VI. The designs presented in this paper assume 32-bit integer operands, but they can easily be extended to other fixed-point operand types.

II. PROBLEM

Sum-of-products operations are found in many digital signal processing and multimedia applications, including FIR filters, high pass filters, and inner products. This computation is a summation of two products:

S = a × b + x × y    (2)
ASAP 2013

A variation of the sum-of-products is the inner product, which is usually computed by repeated use of a sum-of-products:

S[i + 1] = a[i] × b[i] + x[i] × y[i] + S[i],    (3)
where S[0] = 1.

Previous research has mainly focused on designs for dedicated multipliers, demonstrating that parallel multipliers can be implemented with clustering/partitioning [5], pipelining [6], bypassing [7], and signal gating [8] techniques for reduced power dissipation. An improved modified Booth encoding with a multiple-level conditional-sum adder [9] and a sign-select Booth encoder [10] have been proposed for high performance. However, recent studies show that conventional arithmetic designs cannot efficiently support increasingly demanding high-throughput and low-power requirements. The sum-of-products architecture offers an opportunity to satisfy these requirements.

Due to the frequent use of multiplication and related arithmetic calculations in digital signal processing applications, many processors provide multiply and/or multiply-accumulate instructions. To execute sum-of-products operations, processors use an existing multiplier or a multiply-accumulate (MAC) unit. Conventional processors take extra cycles when using multipliers and MAC units to perform sum-of-products. Clearly, by including a sum-of-products operation one expects that fewer cycles are needed; we want to show that the energy-delay product is also reduced. Consider a typical FIR filter:

y[n] = Σ_{k=−∞}^{+∞} c[k] × x[n − k]    (4)
This equation can be implemented in a high-level language, such as C, as follows:

y[n] = 0;
for (k = 0; k < N; k++) {
    y[n] = y[n] + c[k] * x[n - k];
}    (5)
The last line corresponds to a multiply-accumulate operation: x = x + y × z. It can be translated into a single multiply-accumulate instruction. The FIR filter can also be implemented in C in another way:
y[n] = 0;
for (k = 0; k < N; k += 2) {
    y[n] = y[n] + c[k] * x[n - k] + c[k + 1] * x[n - k - 1];
}    (6)

The last line corresponds to an accumulated sum-of-products: x = x + y0 × z0 + y1 × z1. It can be translated into a single instruction using the sum-of-products design. In the best case, sum-of-products operations require only half the number of cycles on sum-of-products hardware compared to using a single multiplier.

III. SUM-OF-PRODUCTS DESIGN

A. Baseline Architecture

The sum-of-products baseline model needs two multipliers and one adder. One way to design the sum-of-products is to use two Partial Product Reduction (PPR) arrays and [4:2] adders followed by a single final Carry Propagate Adder (CPA). The other way is to use two PPR arrays and two CPAs followed by a single CPA. The structure using a [4:2] adder followed by a single CPA is the better solution because it has one less carry-propagate addition, and thus its power and delay are slightly better than those of its counterpart.

The inner product can be designed based on the sum-of-products model. It consists of two PPR arrays, [6:2] adders and latches for accumulation, and a single CPA. The [6:2] adders accumulate four inputs with the previous partial sums and carries. Figure 1 shows the baseline models.

B. Multiplier

Multipliers consume more power and have a longer latency than adders, and thus this paper mainly describes multiplier designs. Previous studies demonstrate that array multipliers that integrate array splitting and left-to-right techniques are better than tree multipliers in terms of power, while keeping similar delay and area, for operands of up to 32 bits [11][12]. Therefore, in this paper we focus on developing the sum-of-products unit based on left-to-right split array multipliers.

1) Left-to-Right Array Multiplier: In conventional right-to-left array multipliers, the Partial Products (PPs) are added sequentially starting from the rightmost multiplier bit. In contrast, in left-to-right array multipliers, the PPs are added in series starting from the leftmost multiplier bit [13]. Of the two designs, left-to-right array multipliers have the potential to save power and delay because the carry signals propagate through fewer stages, which reduces the power consumption in the Most Significant (MS) region. Left-to-right array multipliers are also superior for data with a large range, because the PPs corresponding to sign bits, which have low switching activity, are located in the upper region of the array [14][15].
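The equivalence of the two loop formulations, Eq. (5) with one multiply-accumulate per iteration and Eq. (6) with one sum-of-products per iteration, can be checked functionally. The sketch below is our illustration, not code from the paper: the function names are ours, the tap count is assumed even, and the unrolled term indexes x[n − k − 1], which is what substituting k + 1 into Eq. (4) gives.

```c
/* Eq. (2) as a primitive: one fused sum-of-products operation. */
static int sum_of_products(int a, int b, int x, int y) {
    return a * b + x * y;
}

/* Eq. (5): the FIR sample computed with one MAC per iteration. */
int fir_mac(const int *c, const int *x, int n, int taps) {
    int y = 0;
    for (int k = 0; k < taps; k++)
        y = y + c[k] * x[n - k];
    return y;
}

/* Eq. (6): the same sample computed with one sum-of-products
   (two products) per iteration; taps is assumed even. */
int fir_sop(const int *c, const int *x, int n, int taps) {
    int y = 0;
    for (int k = 0; k < taps; k += 2)
        y = y + sum_of_products(c[k], x[n - k], c[k + 1], x[n - k - 1]);
    return y;
}
```

Both functions return the same output sample; on hardware that provides Eq. (2) as a single instruction, the second loop issues roughly half as many arithmetic instructions.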
Fig. 1. The baseline models: (a) Sum-of-products design (b) Inner-products design.
2) Split Array Multiplier: In array multipliers, the lower rows consume much more power than the upper rows of the PPR array, because glitches cause a snowballing effect as signals propagate through the array [16]. Therefore, if the length of the array could be reduced, there would be significant power savings. The way to reduce the array is to split it into several parts. The previous architectures are the two-level even/odd [17] and the two-level upper/lower split array multiplier [14]. Each part has only half the number of rows and is added separately in parallel. The final even/odd (upper/lower) vectors from the two parts can be reduced to two vectors using [4:2] adders. The upper/lower split array architecture is shown in Figure 2.

Previous studies have mainly focused on developing two-level split array designs. However, the design would be more power- and delay-efficient if each part were split further. The upper/lower structure is better than the even/odd structure when four-level splitting is used, because it allows simpler interconnection. The physical regularity of array multipliers is maintained by interleaved placement and routing when the upper/lower structure is applied. Moreover, the two-level upper/lower split array multiplier consumes less power than its two-level even/odd counterpart. Therefore, in this paper we utilize the four-level upper/lower split array multiplier.

3) Carry Save Adder: The [4:2] adder is widely used in parallel multipliers. As technology scales into the deep sub-micron regime, the importance of simple wire interconnection increases. Compared to two cascaded [3:2] adders, a [4:2] adder has a regular structure with simple interconnection, and thus it reduces the physical complexity. Moreover, a [4:2] adder has the same gate complexity as two [3:2] adders. However, a [4:2] adder is faster than two cascaded [3:2] adders: it has a 3 × T_XOR2 delay, while each [3:2] adder has a 2 × T_XOR2 delay, so the cascade has a 4 × T_XOR2 delay.
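The two building blocks just described, the split PPR array and the [4:2] reduction, can be modeled at word level. The sketch below is functional only: it reproduces the arithmetic, not the carry-save gate structure, the [4:2] stage is modeled as two cascaded [3:2] stages, and the function names are ours.

```c
#include <stdint.h>

/* [3:2] carry-save stage: reduces three operands to a (sum, carry)
   pair with the same total modulo 2^32. */
static void csa(uint32_t a, uint32_t b, uint32_t c,
                uint32_t *s, uint32_t *cy) {
    *s  = a ^ b ^ c;                              /* bitwise full-adder sum  */
    *cy = ((a & b) | (a & c) | (b & c)) << 1;     /* carries, weight-shifted */
}

/* [4:2] reduction modeled as two cascaded [3:2] stages, followed by
   the final carry-propagate addition (the CPA). */
uint32_t reduce42_cpa(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    uint32_t s1, c1, s2, c2;
    csa(a, b, c, &s1, &c1);       /* first [3:2] stage  */
    csa(s1, c1, d, &s2, &c2);     /* second [3:2] stage */
    return s2 + c2;               /* equals a+b+c+d mod 2^32 */
}

/* Upper/lower split of a 32x32 multiplication: the two halves of the
   partial products are accumulated independently (in parallel in
   hardware) and merged at the end. */
uint64_t split_array_mul(uint32_t x, uint32_t y) {
    uint64_t lower = 0, upper = 0;
    for (int i = 0; i < 16; i++)                  /* lower half of the array */
        if ((y >> i) & 1) lower += (uint64_t)x << i;
    for (int i = 16; i < 32; i++)                 /* upper half              */
        if ((y >> i) & 1) upper += (uint64_t)x << i;
    return lower + upper;                         /* final merge + CPA       */
}
```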
Thus, by using [4:2] adders, the PPR delay is reduced by about 25% without an area penalty. The delay reduction also benefits power, since less switching activity is generated when signals propagate through fewer stages. In this paper, we utilize [4:2] adders for low-power design.

Fig. 2. Upper/lower split array architecture [13].

IV. EXPERIMENTAL RESULTS

A. ARM Multiplier Results

We summarize relative performance using the total execution time of programs. The execution time required for a program
can be written as

Execution time for a program
    = Clock cycles for a program × Clock cycle time
    = Instructions for a program × Clock cycles per instruction × Clock cycle time    (7)

The instruction count for a program depends on the compiler and the Instruction Set Architecture (ISA), and the Clock cycles Per Instruction (CPI) depends on the ISA and the microarchitecture [18]. Therefore, we restrict ourselves to a specific compiler, ISA, and microarchitecture for accurate results. A good example is the ARM architecture. The ARM instruction set differs from the pure RISC definition in several ways that make it suitable for low-power embedded applications, and hence the ARM core is used to perform real-time digital signal processing in most embedded systems. Digital signal processing programs are typically multiplication-intensive, and the performance of the multiplication hardware is critical to meeting real-time constraints.

All ARM processors include hardware support for integer multiplication, using one of two styles of multiplier [19]. Several ARM cores include low-cost multiplication hardware that supports only the 32-bit-result multiply and multiply-accumulate instructions. This multiplier uses the main datapath iteratively, employing the barrel shifter and Arithmetic Logic Unit (ALU) to generate a 2-bit PP in each clock cycle. The other style, found in ARM cores with an M in their name (for example the ARM7DM) and in recent higher-performance cores, is a high-performance multiplier supporting the 64-bit-result multiply and multiply-accumulate instructions. This multiplier employs a modified Booth's algorithm to produce the 2-bit PPs. The carry-save array has four layers of adders, each handling two multiplier bits, so the array can process 8 multiplier bits per clock cycle. The array is cycled up to four times, and the partial sum and carry are combined 32 bits at a time and written back into the register.
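The radix-4 modified Booth recoding mentioned above can be sketched behaviorally. The model below is our illustration, not ARM's implementation: each overlapping 3-bit group of the multiplier selects a partial product from {−2x, −x, 0, +x, +2x}, so a 32-bit multiplier needs only 16 partial products (of which the ARM array generates four per pass).

```c
#include <stdint.h>

/* Behavioral model of radix-4 modified Booth recoding: the product is
   accumulated as the sum of recoded partial products. Returns the
   exact 64-bit signed product x * y. */
int64_t booth_radix4_mul(int32_t x, int32_t y) {
    /* Booth digit for the 3-bit group (y[i+1], y[i], y[i-1]). */
    static const int digit[8] = {0, 1, 1, 2, -2, -1, -1, 0};
    uint32_t uy = (uint32_t)y;      /* unsigned copy for well-defined shifts */
    int64_t product = 0;
    int prev = 0;                   /* implicit bit y[-1] = 0 */
    for (int i = 0; i < 32; i += 2) {
        int group = (int)(((uy >> i) & 3u) << 1) | prev;
        product += (int64_t)digit[group] * x * ((int64_t)1 << i);
        prev = (int)((uy >> (i + 1)) & 1u);
    }
    return product;
}
```

Because the group width is even and overlapping, the same recoding handles two's-complement operands without a separate sign-correction step.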
As multiplication performance is very important, more hardware resources must be dedicated to it. The best choice is the ARM7TDMI-S processor. The ARM7TDMI-S includes an enhanced 32 × 8 single multiplier with a radix-4 modified Booth's algorithm, and it is the synthesizable version of the ARM7TDMI core. Therefore, when measuring the cycle counts for an application executed on the ARM multiplier with a cycle-level simulator, the synthesizable core provides an efficient solution.

ARM and Thumb are two different instruction sets supported by ARM cores with a T in their name. ARM instructions are 32 bits wide, and Thumb instructions are 16 bits wide. Thumb mode allows code to be smaller, and can potentially be faster if the target has slow memory. However, multiply-accumulate operations are not available in Thumb mode, so we use ARM mode in this experiment. The ARM7TDMI-S core does not have sum-of-products hardware, only an enhanced single multiplier, and thus it cannot use a sum-of-products instruction directly. The ARM compiler avoids generating sum-of-products instructions, and hence we cannot
directly measure the total clock cycles with sum-of-products using cycle-level simulation of compiled assembly code. This means that for every sum-of-products instruction, code must be regenerated manually after analyzing the original ARM assembly code. Suppose we have a modified implementation of the ARM7TDMI-S ISA. We replace two consecutive multiplication operations with one sum-of-products operation. The sum-of-products instruction executes two multiplications simultaneously, and the two products are then converted into the final result using a CPA. An ARM7 multiplication finishes in up to 4 clock cycles, and thus a sum-of-products takes up to 5 clock cycles due to the single-cycle final addition. To regenerate the modified ARM assembly code, we use the ARM technical reference manual after compiling the original C code [20]. The reference manual lists all instructions and their cycle counts.

We can measure the clock cycles of the ARM multiplier for benchmark programs by running cycle-level simulation on compiled ARM assembly code. A hardware/software co-simulation tool such as Mentor Graphics Questa Codelink profiles the clock cycles for the programs. The clock cycle estimates are compared in Table I. Based on this analysis, we expect the clock cycles of sum-of-products to be 42% and 48% less than those of multiplication for the FIR filter and high pass filter programs, respectively.

The clock cycle time is usually published as part of the specification document. However, as the ARM7TDMI-S is a synthesizable core, we can directly measure the power and latency of the ARM multiplier using Synopsys Design Compiler with the ARM7TDMI-S HDL code, and estimate those of the sum-of-products hardware. We assume the sum-of-products hardware consists of two identical ARM7TDMI-S multipliers and an ALU. Table II shows the power, delay, and area of a multiplier and of the sum-of-products hardware.
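Equation (7) can be applied directly to the measured cycle counts in Table I. In the sketch below, the cycle counts (1415 vs. 817 for the FIR filter) are the paper's; the 1 ns clock cycle time is an assumed value used only for illustration.

```c
/* Eq. (7) with a fixed CPI already folded into the cycle count:
   execution time = clock cycles * clock cycle time. */
double execution_time_ns(long cycles, double cycle_time_ns) {
    return (double)cycles * cycle_time_ns;
}
```

For the FIR filter, execution_time_ns(817, 1.0) / execution_time_ns(1415, 1.0) ≈ 0.58, matching the 42% cycle reduction expected for the sum-of-products version.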
The amount of energy used depends on the power and the time for which it is used, and can be written as

Energy (Joules) = Power (Watts) × Time (Seconds)    (8)

The execution time for the benchmark programs can be calculated using equation (7) with the measured clock cycles for each program and the clock rate. Table III summarizes the energy and execution time. The sum-of-products unit dissipates 23% to 24% more energy than a single multiplier while reducing execution time by 40% to 41% for the FIR filter program, and it dissipates 11% to 12% more energy while reducing execution time by 45% to 46% for the high pass filter.

The designer often faces a trade-off between execution time and energy, so we need a suitable metric for energy efficiency. The energy-delay product is widely used when reporting a new architecture design that addresses energy-performance effectiveness [21]. In terms of energy-delay product, the sum-of-products units are better than the multiplier-only solution in the considered benchmarks. The shorter execution time of the sum-of-products can translate into lower energy demanded by the design. If we reduce the supply voltage, our design can save significant energy: the clock cycles per program with sum-of-products are reduced by roughly half compared to those with the multiplier, while reducing the supply voltage only slightly increases the clock cycle time. For example, if we replace the ARM multiplier at 1.32 V with the sum-of-products at 1.08 V for the high pass filter, execution time decreases by about 22% and energy by about 10%. For the FIR filter, the sum-of-products has 14% less execution time at the same energy.

The multiplier and sum-of-products are characterized by execution time ratio vs. energy ratio in Figure 3. The energy ratio decreases as the execution time ratio increases. The sum-of-products unit consumes more energy as the difference in execution time between a sum-of-products and a multiplier grows. The energy ratio is expected to be less than 1 if their execution times are the same; this means the sum-of-products unit would consume less power than a multiplier when the execution time is the same.
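Equation (8) can be cross-checked against Tables II and III. The sketch below uses the measured 1.2 V values (1250 µW over 1627.25 ns for the ARM multiplier, 2578 µW over 972.23 ns for the sum-of-products, FIR filter); the resulting ratios reproduce the 1.23 energy ratio and 0.74 energy-delay-product ratio reported in Table III.

```c
/* Eq. (8): energy is power integrated over the execution time. */
double energy_joules(double power_watts, double time_seconds) {
    return power_watts * time_seconds;
}

/* Energy-delay product, the efficiency metric of Section IV. */
double edp(double power_watts, double time_seconds) {
    return energy_joules(power_watts, time_seconds) * time_seconds;
}
```

Because both operands of the ratio share the same units, the comparison is unit-independent; only the relative power and time figures matter.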
B. The Design Characteristics of the Proposed Sum-of-Products Units
TABLE I. CLOCK CYCLES FOR BENCHMARK PROGRAMS.

Clock Cycles     | FIR Filter (length = 100) | High Pass Filter (length = 100)
Multiplication   | 1415  (1.00)              | 1617  (1.00)
Sum-of-products  |  817  (0.58)              |  845  (0.52)
TABLE II. THE POWER, DELAY, AND AREA OF THE ARM7TDMI-S MULTIPLIER AND THE SUM-OF-PRODUCTS HARDWARE.

Supply Voltage | Hardware | Power (µW) | Delay (ns) | Area (NAND2)
1.32 V         | MUL*     | 1678       | 0.99       | 1384
1.32 V         | SOP†     | 3461       | 1.02       | 2941
1.2 V          | MUL*     | 1250       | 1.15       | 1316
1.2 V          | SOP†     | 2578       | 1.19       | 2788
1.08 V         | MUL*     |  940       | 1.42       | 1364
1.08 V         | SOP†     | 1940       | 1.48       | 2896

* multiplier: measured value; † sum-of-products: estimated value
We implemented the proposed sum-of-products unit in Verilog using a top-down methodology. The designs were verified with Cadence NC-Verilog and synthesized with Synopsys Design Compiler using a Samsung 65 nm CMOS low-power standard cell library. The proposed designs were synthesized at the three supply voltages supported by the technology: 1.08 V, 1.20 V, and 1.32 V. To reduce the effect of structural changes made by the synthesis tool to the original Verilog code, we used the same technology and the same Synopsys Design Compiler constraints for all designs. Placement and routing were performed with Synopsys Astro to obtain more precise results. Delays were obtained from Synopsys PrimeTime, and power figures were obtained from the Samsung in-house power estimation tool, CubicWare. Table IV shows power, delay, and area estimates for the sum-of-products design. The synthesis results indicate that the multipliers account for most of the power, delay, and area of the sum-of-products design.
TABLE III. THE EXECUTION TIME, ENERGY, AND ENERGY-DELAY PRODUCT OF THE ARM7TDMI-S MULTIPLIER AND THE SUM-OF-PRODUCTS HARDWARE FOR BENCHMARK PROGRAMS.

FIR Filter (length = 100)
Supply Voltage | Hardware               | Execution Time (ns) | Energy (nJ)  | Energy-Delay Product
1.32 V         | ARM7TDMI-S Multiplier* | 1400.85 (1.00)      | 2.35 (1.00)  | 3292.00 (1.00)
1.32 V         | Sum-of-products†       |  833.34 (0.59)      | 2.88 (1.23)  | 2400.02 (0.73)
1.2 V          | ARM7TDMI-S Multiplier* | 1627.25 (1.00)      | 2.03 (1.00)  | 3303.32 (1.00)
1.2 V          | Sum-of-products†       |  972.23 (0.60)      | 2.50 (1.23)  | 2430.58 (0.74)
1.08 V         | ARM7TDMI-S Multiplier* | 2009.30 (1.00)      | 1.89 (1.00)  | 3797.58 (1.00)
1.08 V         | Sum-of-products†       | 1209.16 (0.60)      | 2.35 (1.24)  | 2841.53 (0.75)

High Pass Filter (length = 100)
Supply Voltage | Hardware               | Execution Time (ns) | Energy (nJ)  | Energy-Delay Product
1.32 V         | ARM7TDMI-S Multiplier* | 1600.83 (1.00)      | 2.69 (1.00)  | 4306.23 (1.00)
1.32 V         | Sum-of-products†       |  861.90 (0.54)      | 2.98 (1.11)  | 2568.46 (0.60)
1.2 V          | ARM7TDMI-S Multiplier* | 1859.55 (1.00)      | 2.32 (1.00)  | 4314.16 (1.00)
1.2 V          | Sum-of-products†       | 1005.55 (0.54)      | 2.59 (1.12)  | 2604.37 (0.60)
1.08 V         | ARM7TDMI-S Multiplier* | 2296.14 (1.00)      | 2.16 (1.00)  | 4959.66 (1.00)
1.08 V         | Sum-of-products†       | 1250.60 (0.55)      | 2.43 (1.12)  | 3038.96 (0.61)

* measured value; † estimated value
TABLE IV. POWER, DELAY, AND AREA FOR SUM-OF-PRODUCTS.

Hardware              | Power (µW)             | Delay (ns)   | Area (NAND2)
Sum-of-products       | 3799.45 (1.00)         | 13.80 (1.00) | 11870 (1.00)
LR_4ULS_42 Array* × 2 | 1469.90 × 2 (0.39 × 2) | 12.68 (0.92) | 5295 × 2 (0.45 × 2)
[4:2] adder, CPA      |  859.65 (0.22)         |  1.12 (0.08) |  1280 (0.10)

* LR_4ULS_42 Array: 4-Level Upper/Lower Split Left-to-Right Array using [4:2] adders, at a supply voltage of 1.08 V
TABLE V. THE POWER, DELAY, AND AREA OF THE PROPOSED MULTIPLIER AND SUM-OF-PRODUCTS HARDWARE.

Supply Voltage | Hardware    | Power (µW) | Delay (ns) | Area (NAND2)
1.32 V         | LR_4ULS_42* | 2691       |  9.84      |  5864
1.32 V         | SOP†        | 5825       | 10.74      | 12226
1.2 V          | LR_4ULS_42* | 2246       | 11.48      |  5736
1.2 V          | SOP†        | 4844       | 11.85      | 12038
1.08 V         | LR_4ULS_42* | 1856       | 13.02      |  5722
1.08 V         | SOP†        | 3799       | 13.80      | 11870

* LR_4ULS_42: 4-Level Upper/Lower Split Left-to-Right Array Multiplier using [4:2] adders = LR_4ULS_42 Array + CPA
† sum-of-products
To compare cycle-level results, a second experiment uses the proposed multipliers and sum-of-products unit. For a fair comparison, we assume the clock cycles for the benchmark programs are the same as those in the ARM7 test environment. Table V shows the power, delay, and area of the proposed multiplier and sum-of-products hardware, and Table VI summarizes the energy and execution time. The sum-of-products unit dissipates 25% to 36% more energy than a single multiplier while reducing execution time by 37% to 40% and the energy-delay product by 15% to 23% for the FIR filter program. For the high pass filter, the sum-of-products consumes 13% to 23% more energy while reducing execution time by 43% to 46% and the energy-delay product by 30% to 37%. The sum-of-products is better than the multiplier-only solution in terms of energy-delay product.

V. DISCUSSION
A. Static Power Dissipation

As mentioned earlier, the power of a circuit consists of static and dynamic power dissipation components. Static power is mainly determined by the silicon process technology and the total number of transistors. Unfortunately, the sum-of-products design will consume more static power than a multiplier alone due to its larger area. One obvious technique to reduce static power is to reduce the supply voltages used in the circuit [22]. However, it is difficult to find opportunities to reduce the supply voltage, since static power dissipation decreases with the scaling of the supply voltage while delay increases only linearly. It is possible to use a high supply voltage in the critical paths of a design to achieve the required performance while the off-critical paths of the design use a lower supply voltage to achieve low static power dissipation. By partitioning the circuit into several domains operating at different supply voltages, static power savings are possible. However, level shifter circuits are required for inter-domain communication, which comes at the cost of added circuitry.

The other approach is to use multiple threshold voltages. Modern process technologies provide transistors with multiple threshold voltages in order to optimize delay or power: a multi-threshold voltage process gives the designer transistors that are either fast with high static power or slow with low static power. Therefore, a circuit can be partitioned into high and low threshold voltage gates, trading off high performance against reduced static power dissipation. A limitation of this technique is that CAD tools must be developed and integrated into the design flow to optimize the partitioning process.

VI. SUMMARY

In this paper, we have proposed a new sum-of-products arithmetic architecture, and have discussed ways of power
savings of the sum-of-products. The proposed unit achieves significant savings in power or delay and is comparable in power and latency to other current multiplier designs. Compared to a sum-of-products implementation using the ARM7TDMI-S multiplier, the proposed sum-of-products unit reduces execution time by approximately 45% with an energy penalty of about 15% in the benchmark applications.

TABLE VI. THE EXECUTION TIME, ENERGY, AND ENERGY-DELAY PRODUCT OF THE PROPOSED MULTIPLIER AND SUM-OF-PRODUCTS HARDWARE FOR BENCHMARK PROGRAMS.

FIR Filter (length = 100)
Supply Voltage | Hardware        | Execution Time (µs) | Energy (µJ)   | Energy-Delay Product
1.32 V         | LR_4ULS_42      | 13.92 (1.00)        | 37.48 (1.00)  | 521.85 (1.00)
1.32 V         | Sum-of-products |  8.77 (0.63)        | 51.11 (1.36)  | 448.50 (0.85)
1.2 V          | LR_4ULS_42      | 16.24 (1.00)        | 36.49 (1.00)  | 592.72 (1.00)
1.2 V          | Sum-of-products |  9.68 (0.60)        | 46.90 (1.29)  | 454.07 (0.77)
1.08 V         | LR_4ULS_42      | 18.42 (1.00)        | 34.20 (1.00)  | 630.10 (1.00)
1.08 V         | Sum-of-products | 11.27 (0.61)        | 42.84 (1.25)  | 482.97 (0.77)

High Pass Filter (length = 100)
Supply Voltage | Hardware        | Execution Time (µs) | Energy (µJ)   | Energy-Delay Product
1.32 V         | LR_4ULS_42      | 15.91 (1.00)        | 42.84 (1.00)  | 681.48 (1.00)
1.32 V         | Sum-of-products |  9.08 (0.57)        | 52.86 (1.23)  | 479.76 (0.70)
1.2 V          | LR_4ULS_42      | 18.65 (1.00)        | 41.70 (1.00)  | 774.03 (1.00)
1.2 V          | Sum-of-products | 10.01 (0.54)        | 48.51 (1.16)  | 485.72 (0.63)
1.08 V         | LR_4ULS_42      | 21.05 (1.00)        | 39.08 (1.00)  | 822.83 (1.00)
1.08 V         | Sum-of-products | 11.66 (0.55)        | 44.31 (1.13)  | 516.65 (0.63)

Fig. 3. Comparison of energy ratio vs. execution time ratio for the benchmarks: (a) ARM7TDMI-S multiplier and sum-of-products (b) LR_4ULS_42 and sum-of-products. The trend lines indicate that when the execution time is the same, the sum-of-products units consume less power.

REFERENCES
[1] W. Suntiamorntut, Energy Efficient Functional Unit for a Parallel Asynchronous DSP, Ph.D. dissertation, University of Manchester, 2005.
[2] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed., Prentice Hall, 2003.
[3] J. M. Rabaey, Low Power Design Essentials, Springer, 2009.
[4] D. E. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, 1998.
[5] A. A. Fayed and M. A. Bayoumi, "A novel architecture for low-power design on parallel multipliers," in Proc. IEEE Comput. Soc. Workshop on VLSI, Apr. 2001, pp. 149–154.
[6] J. Di and J. S. Yuan, "Power-aware pipelined multiplier design based on 2-dimensional pipeline gating," in Proc. GLSVLSI, Apr. 2003, pp. 64–67.
[7] M.-C. Wen, S.-J. Wang, and Y.-N. Lin, "Low power parallel multiplier with column bypassing," in Proc. ISCAS, May 2005, pp. 1638–1641.
[8] Z. Huang and M. D. Ercegovac, "Two-dimensional signal gating for low-power array multiplier design," in Proc. ISCAS, vol. 1, Aug. 2002, pp. 489–492.
[9] W. Yeh and C. Jen, "High-speed Booth encoded parallel multiplier design," IEEE Trans. Comput., vol. 49, no. 7, pp. 692–700, Jul. 2000.
[10] K. Choi and M. Song, "Design of a high performance 32 × 32-bit multiplier with a novel sign select Booth encoder," in Proc. ISCAS, vol. 2, May 2001, pp. 701–704.
[11] Z. Huang, High-Level Optimization Techniques for Low Power Multiplier Design, Ph.D. dissertation, University of California at Los Angeles, 2004.
[12] Z. Huang and M. D. Ercegovac, "High-performance low-power left-to-right array multiplier design," IEEE Trans. Comput., vol. 54, no. 3, pp. 272–283, Mar. 2005.
[13] M. D. Ercegovac and T. Lang, "Fast multiplication without carry-propagate addition," IEEE Trans. Comput., vol. 39, no. 11, pp. 1385–1390, Nov. 1990.
[14] Z. Huang and M. D. Ercegovac, "Low power array multiplier design by topology optimization," in Proc. SPIE Advanced Signal Processing Algorithms, Architectures, and Implementations XII, vol. 4791, Jul. 2002, pp. 424–435.
[15] Z. Huang and M. D. Ercegovac, "Number representation optimization for low power multiplier design," in Proc. SPIE Advanced Signal Processing Algorithms, Architectures, and Implementations XII, vol. 4791, Jul. 2002, pp. 345–356.
[16] T. Sakuta, W. Lee, and P. T. Balsara, "Delay balanced multipliers for low power/low voltage DSP core," in Proc. ISLPED, Oct. 1995, pp. 36–37.
[17] S. S. Mahant-Shetti, P. T. Balsara, and C. Lemonds, "High performance low power array multiplier using temporal tiling," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 7, no. 1, pp. 121–124, Mar. 1999.
[18] J. L. Hennessy and D. A. Patterson, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 2005.
[19] S. Furber, ARM System Architecture, Addison-Wesley, 1996.
[20] ARM, ARM7TDMI Technical Reference Manual.
[21] R. Gonzalez and M. Horowitz, "Energy dissipation in general purpose microprocessors," IEEE J. Solid-State Circuits, vol. 31, no. 9, Sep. 1996.
[22] S. W. Heo, S. J. Huh, and M. D. Ercegovac, "Power optimization in a parallel multiplier using voltage islands," to appear in Proc. ISCAS, May 2013.
6/24/2015    Gmail - ASAP2013 notification for paper 16

Subject: ASAP2013 notification for paper 16
From: ASAP2013
To: Seok Won Heo
Date: Wed, Mar 27, 2013 at 1:49 PM
Dear Authors,

We received a number of outstanding papers for ASAP13 and we are pleased to inform you that your paper was selected as a SHORT PAPER for this year's conference. The reviews are included at the end of this email. Please address the comments from the reviewers and provide your camera-ready ***4 PAGE*** paper (2 additional pages may be purchased for a fee) as specified in the directions on the conference website (http://asapconference.org/). Camera-ready papers are due April 21st. Information regarding registration, lodging, and travel for the conference is available at the conference website: http://asapconference.org/

Best Regards,
ASAP PC Co-Chairs

REVIEW 1
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac
OVERALL EVALUATION: 4 (weak accept)
Relevance: 5 (excellent)
Originality: 5 (excellent)
Soundness: 5 (excellent)
Language: 2 (poor)
Presentation: 3 (fair)
Best paper candidate: 1 (no)

REVIEW:
This paper examines a proposed enhancement to the ARM7TDMI-S processor core. The processor's base design includes a hardware multiply unit that uses a modified Booth's algorithm with a carry-save array of four layers to perform a 32 × 8 multiply per cycle, presumably requiring an 8-cycle latency to produce a full 64-bit result. Although the ARM instruction set includes a multiply-accumulate instruction, it cannot be executed natively on this version of the core because it lacks a fused-multiply-add unit. The authors designed their own sum-of-products functional unit, which differs from a multiply-accumulate in that it can perform two multiplies and one add in a single instruction. Their functional unit is designed to be energy efficient because it uses a split array multiplier comprised of two left-to-right array multipliers which are built on [4:2] adders.
In their sum-of-products instruction, it seems that their one multiplier is used twice and the addition is performed using an accumulator. Using this new instruction, the authors are able to achieve a substantial improvement in energy-delay product for an FIR filter kernel. This is an interesting paper because its results are from transistor-level models that include post-place-and-route layout parasitics, which is the most accurate way to measure power consumption aside from directly instrumenting the fabricated silicon. The authors used powerful tools, models, and a full technology and standard cell library to collect these results. I was particularly impressed with how the authors integrated their functional unit into the ARM core design. I would have thought that the ARM IP would be somehow encrypted or otherwise obfuscated to protect ARM's proprietary designs, making an effort like this impossible. I'm also impressed that they were able to perform whole-program simulation using a circuit-level simulator. That must have taken an enormous amount of simulation time.

https://mail.google.com/mail/u/0/?ui=2&ik=d680faa9bb&view=pt&q=ASAP&qs=true&search=query&msg=13dad9d283a63fdb&siml=13dad9d283a63fdb
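For intuition about the cycle-count argument in the review above (the base core retires a multiply 8 multiplier bits per cycle, with early termination, while a fused unit issues one a*b + c*d operation), here is a small Python sketch. The model, its parameter names, and the early-termination rule are this illustration's own simplifying assumptions, not a description of the actual ARM7TDMI-S netlist or of the paper's hardware.

```python
def mul_cycles(multiplier: int, bits_per_cycle: int = 8, width: int = 32) -> int:
    """Cycles for a multiply unit that retires `bits_per_cycle` bits of the
    multiplier operand per cycle, stopping early once the remaining bits
    are all zero (a simplified, hypothetical model of a 32x8-per-cycle step)."""
    cycles = 0
    for shift in range(0, width, bits_per_cycle):
        cycles += 1
        if multiplier >> (shift + bits_per_cycle) == 0:
            break  # remaining multiplier bits are zero: terminate early
    return cycles

# A fused sum-of-products unit would evaluate a*b + c*d as one operation,
# instead of two multiplies (up to 4 cycles each in this model) plus an add.
print(mul_cycles(0x000000FF))  # small operand: 1 cycle
print(mul_cycles(0xFFFFFFFF))  # full-width operand: 4 cycles
```

Under this toy model, two back-to-back 32-bit multiplies plus an add cost up to 9 cycles, which is the gap a single-issue sum-of-products instruction targets.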
REVIEW 2
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac
OVERALL EVALUATION: 5 (strong accept)
Relevance: 5 (excellent) | Originality: 4 (good) | Soundness: 5 (excellent) | Language: 5 (excellent) | Presentation: 5 (excellent) | Best paper candidate: 2 (yes)

REVIEW: I appreciated the clarity of this paper. The paper shows that expressing dot products and matrix products as sums-of-products (i.e., as operations of the form ab+cd) may lead to a significant reduction in execution time without a large energy penalty. The suggested architectures look interesting (yet I must say that I am not an expert in multiplier design). Maybe the name "sum-of-products" is a bit misleading: "sum-of-two-products" would be better. The fact that application to dot products, FIR, and IIR filters is interesting is intuitive. It would be interesting to see if this remains true for other applications (e.g., FFT, polynomial evaluation, LU decomposition, etc.), so trying other benchmarks would be important. Also, I would appreciate tests to know what the impact on accuracy is.

REVIEW 3
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac
OVERALL EVALUATION: 4 (weak accept)
Relevance: 5 (excellent) | Originality: 4 (good) | Soundness: 3 (fair) | Language: 5 (excellent) | Presentation: 3 (fair) | Best paper candidate: 1 (no)

REVIEW: This submission presents a very high-performance parallel arithmetic unit for the computation of sum-of-products: a*b + c*d. The two products a*b and c*d are performed in parallel inside the unit to speed up the execution of long multiply-accumulate loops. The arithmetic part of the paper is new and very interesting but not very well described. It is difficult to clearly understand what the proposed solution for the partial product reduction array is.
It seems to be a mix between left-to-right array multipliers, to reduce the glitching activity, and a split array multiplier. But the full description is not very clear. For the accumulation part, [4:2] adders seem to be used, but this part is not clear. Does the unit contain an accumulator (redundant or non-redundant?), or is a general register of the register file used as an accumulator? The authors mainly detail the a*b + c*d part, not the accumulation. Without internal accumulation, this unit would require 5 read ports on the register file (see below). Moreover, in the case of an internal redundant accumulator, the representation of signed numbers in redundant form is not detailed (2's complement extension for carry-save trees is not very efficient for power aspects). One cannot assume only positive integers are used in real signal processing applications. Several figures presented in the paper are not clear (fig. 2, 3, 4.b). The text inside these figures is too small. It is not possible to read the details.
It seems that there is no accumulation part in figure 1.a, but it is not clear why compared to fig. 1.b. The authors report a lot of simulation results. The proposed sum-of-products unit clearly saves power compared to a simple a*b multiplier unit. But the comparison is not totally fair. It seems that the authors only compare the arithmetic part, not the performance and power of a complete processor. Using an a*b + c*d sum-of-products arithmetic unit instead of a simple a*b one will require a register file with 4 read ports (or 5 read ports in the case of a solution without an internal accumulator). This would increase the fanout at the output of the register file and add delays in the register address decoding path. What would be the impact on the number of registers in the register file and on the load-store unit in the processor if 4 operands are fetched at each clock cycle instead of two? The proposed arithmetic part may then slow down the complete processor and all other parts of the applications. A complete architecture + arithmetic analysis would be interesting for the audience of the ASAP conference.

REVIEW 4
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac
OVERALL EVALUATION: 3 (borderline paper)
Relevance: 3 (fair) | Originality: 3 (fair) | Soundness: 2 (poor) | Language: 3 (fair) | Presentation: 3 (fair) | Best paper candidate: 1 (no)

REVIEW: This paper describes the authors' work on designing and implementing multipliers and sum-of-products units. The overall presentation of the paper is good and easy to follow. The paper also includes extensive experimental results for different units and scenarios. However, most of the design is based on existing methods or approaches. I do not see much novel contribution in this work. The title of the paper is focused on power optimization. But what I see is just a straightforward design and measurements of power consumption in the experiments.
I do not see much optimization effort that tries to reduce the power consumption of the design. For the comparison part, we see the reduction in execution time from using sum-of-products units, but we also see a big increase in area. The two filter applications might not be enough to justify the advantage of sum-of-products designs. It might be more convincing if we could see some more benchmarks.

REVIEW 5
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac
OVERALL EVALUATION: 2 (weak reject)
Relevance: 3 (fair) | Originality: 2 (poor) | Soundness: 3 (fair) | Language: 3 (fair) | Presentation: 2 (poor) | Best paper candidate: 1 (no)

REVIEW: The authors propose adding sum-of-products hardware to an ARM microprocessor. When the new hardware is used, energy for sum-of-products computation is reduced.
The paper's presentation leaves something to be desired. It is not entirely clear until the end what the authors believe their particular contribution to be, and background is mixed with the "new content". From a technical perspective, the paper's conclusion is somewhat obvious: it is well understood that dedicated hardware is more efficient in terms of both performance and energy than microprocessors. The authors have selected a processor that does not have a multiply-accumulate instruction as a starting point, so even the addition of just that instruction would already have helped.

Minor other notes:
Page 1, Col 1: "multiplier is one of the most power demanding components on these chips" -> this statement is crying out for a reference.
Page 3, Col 2: "presents the less power consumption" -> reword.
Page 5, table 3: it is not entirely clear whether 'energy' refers to energy for the complete core, or just for the multiplier/sum-of-products unit.

REVIEW 6
PAPER: 16
TITLE: Power Optimization of Sum-of-Products Design for Signal Processing Applications
AUTHORS: Seok Won Heo, Suk Joong Huh and Miloš Ercegovac
OVERALL EVALUATION: 3 (borderline paper)
Relevance: 4 (good) | Originality: 3 (fair) | Soundness: 3 (fair) | Language: 4 (good) | Presentation: 4 (good) | Best paper candidate: 1 (no)

REVIEW: This paper proposes a method to design a sum-of-products unit for digital signal processing that reduces the execution time of sum-of-products calculations compared to using the multiplier found on the ARM7TDMI-S processor. The proposed sum-of-products unit uses left-to-right multiplication, a four-level upper/lower split array multiplier, and a [4:2] adder. The paper is fairly well written and the proposed approach is interesting and seems to be effective. Some issues with the paper include: (1) The techniques used to design the sum-of-products unit (left-to-right multiplication, split array multiplier, and [4:2] adder) have all been previously published.
It seems that their new contributions are the use of a four-level split array multiplier (but this design is not described well), applying the multiplier design to sum-of-products calculations (but this is trivial), and the comparison with the multiplier in the ARM7TDMI-S processor. The authors should be clearer about the novel contributions of the paper. (2) The authors compare their design with the multiplier in the ARM7TDMI-S processor. However, this is an odd choice since the ARM processor has a 32-bit by 8-bit multiplier and does not directly support sum-of-products computations. It would be better if the authors instead compared their sum-of-products unit to previous sum-of-products units. Also, it was not clear from the discussion whether the authors implemented their sum-of-products unit using 32-bit by 32-bit multipliers or something else.
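Reviews 2 and 3 center on the ab + cd ("sum-of-two-products") formulation. As a purely illustrative sketch (the function names and loop structure below are this note's own, not the paper's code), a dot-product loop can be restructured so that each iteration issues one fused a*b + c*d operation instead of two separate multiply-accumulates:

```python
def sop(a, b, c, d):
    # Stand-in for the fused sum-of-two-products operation: a*b + c*d
    return a * b + c * d

def dot_product(xs, ys):
    """Dot product expressed as a chain of a*b + c*d operations, halving
    the number of multiply instructions issued; an odd-length tail is
    handled with one plain multiply."""
    assert len(xs) == len(ys)
    acc = 0
    for i in range(0, len(xs) - 1, 2):
        acc += sop(xs[i], ys[i], xs[i + 1], ys[i + 1])
    if len(xs) % 2:
        acc += xs[-1] * ys[-1]
    return acc

print(dot_product([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

The same pairing applies directly to FIR-filter inner loops, which is why the reviewers found the filter benchmarks intuitive.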
ASAP 2013 Table of Contents
Linear Algebra and Signal Processing

160  A Practical Measure of FPGA Floating Point Acceleration for High Performance Computing (John D. Cappello, Dave Strenski)
168  Sparse Matrix-Vector Multiply on the Texas Instruments C6678 Digital Signal Processor (Yang Gao, Jason D. Bakos)
175  Transforming a Linear Algebra Core to an FFT Accelerator (Ardavan Pedram, John McCalpin, Andreas Gerstlauer)
185  Reduce, Reuse, Recycle (R3): A Design Methodology for Sparse Matrix Vector Multiplication on Reconfigurable Platforms (Kevin Townsend, Joseph Zambreno)
192  Power Optimization of Sum-of-Products Design for Signal Processing Applications (Seok Won Heo, Suk Joong Huh, Miloš D. Ercegovac)
198  An Efficient & Reconfigurable FPGA and ASIC Implementation of a Spectral Doppler Ultrasound Imaging System (Adam Page, Tinoosh Mohsenin)
ASAP 2013 Brief Author Index
H
Hajkazemi, Mohammad Hossein 153
Hammond, Simon D. 321
Hannig, Frank 1, 10
Hao, Lu 91
Hemani, Ahmed 227, 277
Heo, Deukhyoun 79
Heo, Seok Won 192
Hsieh, Genie 321
Hu, X. Sharon 321
Huang, Jia 35
Huang, Kai 35
Huh, Suk Joong 192
Hunt, Lee 237
Hussain, Waqar 339
Hutchings, Brad L. 363

I
Ioualalen, Arnault 113

J
Jafri, Syed M.A.H. 227
Jain, Abhishek Kumar 219
Jarollahi, Hooman 305

K
Kang, Jihoon 95
Kępa, Krzysztof 26, 261
Kim, Yongjoo 95
Kirchgessner, Robert 211
Knoll, Alois 35
Ko, Yohan 95
Koelmans, Albert 314
Kougianos, Elias 75
ASAP 2013 Detailed Author Index
H
Hajkazemi, Mohammad Hossein 153
FARHAD: A Fault-Tolerant Power-Aware Hybrid Adder for Add Intensive Applications
Hammond, Simon D. 321
GPU Acceleration of Data Assembly in Finite Element Methods and Its Energy Implications
Hannig, Frank 1
Symbolic Parallelization of Loop Programs for Massively Parallel Processor Arrays
10
Loop Program Mapping and Compact Code Generation for Programmable Hardware Accelerators
Hao, Lu 91
Virtual Finite-State-Machine Architectures for Fast Compilation and Portability
Hemani, Ahmed 227
Private Configuration Environments (PCE) for Efficient Reconfiguration in CGRAs
277
Unifying CORDIC and Box-Muller Algorithms: An Accurate and Efficient Gaussian Random Number Generator
Heo, Deukhyoun 79
Design Space Exploration for Reliable mm-Wave Wireless NoC Architectures
Heo, Seok Won 192
Power Optimization of Sum-of-Products Design for Signal Processing Applications
Hsieh, Genie 321
GPU Acceleration of Data Assembly in Finite Element Methods and Its Energy Implications
Message from the ASAP 2013 Chairs
Tarek El-Ghazawi, General Chair
Alan George, General Co-Chair
Melissa Smith, Program Chair
Kubilay Atasu, Program Co-Chair
We welcome you to the 24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2013). This year's event takes place in Washington, D.C., USA, the capital of the United States. Prior to visiting D.C., the conference has been held in many places around the globe including Oxford (1986), San Diego (1988), Killarney (1989), Princeton (1990), Barcelona (1991), Berkeley (1992), Venice (1993), San Francisco (1994), Strasbourg (1995), Chicago (1996), Zürich (1997), Boston (2000), San Jose (2002), The Hague (2003), Galveston (2004), Samos (2005), Steamboat Springs (2006), Montréal (2007), Leuven (2008), Boston (2009), Rennes (2010), Santa Monica (2011), and Delft (2012).

This year's program includes an exciting collection of contributions resulting from a successful call for papers. The selected papers have been divided into thematic areas, which include regular papers, short papers, and poster papers that highlight the current focus of application-specific systems research activities. In response to the call for papers, 125 submissions were received, 108 of which were reviewed. These submitted papers came from 33 countries in Africa, Asia, Europe, and America. The largest number of submitting authors have an affiliation in the US (140), followed by the EU (102), and China (30). Submissions were subjected to a rigorous review by the members of the program committee, 46 members from 12 countries, as well as 70 external reviewers: the committee provided 235 reviews and external reviewers contributed 98 reviews. After an intense scrutiny of the reviews, we are pleased to present a high quality technical
program that includes 24 long papers, 15 short papers, and 22 posters for presentation at the conference. They represent the current state-of-the-art in application-specific systems research. These are complemented by keynote and invited talks.

We thank the authors who responded to our call for papers, the members of the program committee and the external referees who, with their opinion and expertise, ensured a very high quality program. Herman Lam worked tirelessly to publicize and seek out industry sponsors for the conference. Lubomir Riha helped in getting the IEEE and CS sponsorship. Ahmad Anbar ensured that the web environment was active and responsive and that we were on track with finances. Esam El-Araby ensured that the proceedings are indeed a reality and Vikram Narayana made sure the registration process was smooth. We are grateful to the IEEE Computer Society for sponsoring the conference. Thank you all. We hope that the proceedings will serve as a useful reference of the state-of-the-art in application-specific systems research.

General Chair: Tarek El-Ghazawi, The George Washington University, USA
General Co-Chair: Alan George, University of Florida, USA
Program Chair: Melissa Smith, Clemson University, USA
Program Co-Chair: Kubilay Atasu, IBM Research - Zurich, Switzerland

May 2013
ASAP 2013 Conference Committee

General Chair: Tarek El-Ghazawi, The George Washington University, USA
General Co-Chair: Alan George, University of Florida, USA
Honorary General Chair: Reiner Hartenstein, University of Kaiserslautern, Germany
Program Chair: Melissa Smith, Clemson University, USA
Program Co-Chair: Kubilay Atasu, IBM Research - Zurich, Switzerland
Industrial Chair: Herman Lam, University of Florida, USA
Publication Chair: Esam El-Araby, The Catholic University of America, USA
Registration Chair & Awards Chair: Vikram Narayana, The George Washington University, USA
Finance Chair: Lubomir Riha, The George Washington University, USA
Finance Co-Chair & Web Chair: Ahmad Anbar, The George Washington University, USA

Program Committee:
Peter Athanas, Virginia Tech, USA
Jason Bakos, University of South Carolina, USA
Pascal Benoit, University of Montpellier, France
Jeremy Buhler, Washington University in St. Louis, USA
Joseph Cavallaro, Rice University, USA
Roger Chamberlin, Washington University in St. Louis, USA
Anupam Chattopadhyay, RWTH Aachen University, Germany
Esam El-Araby, Catholic University of America, USA
Suhaib Fahmy, Nanyang Technological University, Singapore
Michael J. Flynn, Stanford University, USA
José A.B. Fortes, University of Florida, USA
Frank Hannig, University of Erlangen-Nuremberg, Germany
Haohuan Fu, Tsinghua University, China
Krzysztof Kuchcinski, Lund Institute of Technology, Sweden
Sun-Yuan Kung, Princeton University, USA
Philip Leong, The University of Sydney, Australia
Wayne Luk, Imperial College London, UK
Diana Gohringer, KIT, Germany
Ann Gordon-Ross, University of Florida, USA
Akila Gothandaraman, University of Pittsburgh Center for Simulation & Modeling, USA
Peter Hofstee, IBM Research - Austin, USA
Brian Holland, SRC, USA
Martin Herbordt, Boston University, USA
Volodymyr Kindratenko, NCSA, USA
Jean-Michel Muller, Ecole Normale Supérieure de Lyon, France
Onur Mutlu, Carnegie Mellon University, USA
Oliver Pell, Maxeler Technologies, UK
Gang Qu, University of Maryland, USA
Oliver Sander, KIT, Germany
Kentaro Sano, Tohoku University, Japan
Ron Sass, UNC Charlotte, USA
Mariagiovanna Sami, Politecnico di Milano, Italy
Michael J. Schulte, AMD Research, USA
Cristina Silvano, Politecnico di Milano, Italy
Eric Stahlberg, OpenFPGA, National Cancer Institute, USA
Thomas Steinke, Zuse Institute Berlin, Germany
Dave Strenski, Cray, USA
Earl Swartzlander, University of Texas at Austin, USA
Jürgen Teich, University of Erlangen-Nuremberg, Germany
David Thomas, Imperial College London, UK
Ingrid Verbauwhede, K.U.Leuven, Belgium
Mike Wirthlin, Brigham Young University, USA
Christophe Wolinski, Université de Rennes 1, France
Roger Woods, Queen's University Belfast, UK

External Reviewers:
Per Andersson, Jose Rodrigo Azambuja, Paul Barber, Tobias Becker, Ramakrishna Bijanapalli Chakri, Srinivas Boppu, Vy Bui, Vineet Chadha, Francois Charot, Kit Cheung, Laurent Condat, Florent de Dinechin, James Demma, Shaver Deyerle, Chris Dobson, Renato Figueiredo, Scott Fischaber, Michael Frechtling, Flavius Gruian, John Harris, Abhishek Jain, Shweta Jain, Mioara Joldes, Ayesha Khalid, Peter Kornerup, William Kritikos, Vahid Lari, Tao Li, Andrew Love, Stephen McKeown, Alastair McKinley, Nick Ng, Xinyu Niu, Stuart Oberman, Gianluca Palermo, Vivek Pallipuram, Raphael Polig, Mitra Purandare, Krishna Ramadurai, Nimisha Raut, Felix Reimann, Kurt Rooks, Zoltán Endre Rákossy, Nilim Sarma, Yukinori Sato, Moritz Schmid, Bernhard Schmidt, Yuichiro Shibata, Ali Asgar Sohanghpurwala, Renato Stefanelli, Christoph Studer, Michael Sullivan, Hiroyuki Takizawa, Arnaud Tisserand, Carsten Tradowsky, David Uliana, Girish Venkatasubramanian, Aida Vosoughi, Guohui Wang, Andreas Weichselgartner, Eddie Weill, Michael Wu, Hongyi Xin, Simin Xu, Sotirios Xydis, Yoshiki Yamaguchi, Gavin Yao, Bei Yin, Qi Zhang, Daniel Ziener
6/16/2015
Copy of www.csconferenceranking.org
Conference Ranking (was www.csconferenceranking.org). I maintained here (with only cosmetic alteration: the conferences where I had accepted papers have links to their latest edition; you can find other copies of it there, there, and there) a copy of what was on http://www.csconferenceranking.org/conferencerankings/alltopics.html. Sadly, that webpage is no longer maintained, but my feeling is that it was one of the most accurate conference rankings, and the fine grain (49 possibilities) gave a better understanding than the usual A, B, C notes (even though the second digit is probably not so representative).
Databases / Knowledge and Data Management / Data Security / Web / Mining

Although we will attempt to keep this information accurate, we cannot guarantee the accuracy of the information provided. The numbers in brackets correspond to the EIC value (Estimated Impact of Conference). The numbers are normalized to be in the range 0.00-1.00 (the closer the number to 1.00, the better the conference). Only conferences with EIC above 0.50 have been included. We will attempt to update the ranking lists every three months (end of January, April, July, and October), but this is getting more of a challenge than originally anticipated. Conferences listed below are considered to be tier 1 research meetings in their respective fields. Top 88 conferences are listed (636 considered):

SIGMOD: ACM SIGMOD Conf on Management of Data (0.99)
VLDB: Very Large Data Bases (0.99)
KDD: Knowledge Discovery and Data Mining (0.99)
ICDE: Intl Conf on Data Engineering (0.98)
ICDT: Intl Conf on Database Theory (0.97)
S&P: IEEE Symposium on Security and Privacy (0.97)
SIGIR: ACM SIGIR Conf on Information Retrieval (0.96)
PODS: ACM SIGMOD Conf on Principles of DB Systems (0.95)
WWW: World-Wide Web Conference (0.92)
FODO: Intl Conf on Foundation on Data Organization (0.91)
ER: Intl Conf on Conceptual Modeling (ER) (0.90)
CIKM: Intl Conf on Information and Knowledge Management (0.90)
KR: Intl Conference on Principles of Knowledge Representation and Reasoning (0.90)
DOOD: Deductive and Object-Oriented Databases (0.89)
DEXA: Database and Expert System Applications (0.88)
SSDBM: Intl Conf on Scientific and Statistical DB Mgmt (0.88)
COMAD: Intl Conf on Management of Data (0.88)
EDBT: Extending DB Technology (0.88)
ICDM: IEEE International Conference on Data Mining (0.87)
VDB: Visual Database Systems (0.87)
SSD: Intl Symp on Large Spatial Databases (0.85)
CoopIS: Conference on Cooperative Information Systems (0.85)
SAM: Intl Conference on Security and Management (0.85)
IFIP-DS: IFIP-DS Conference (0.85)
DaWaK: Data Warehousing and Knowledge Discovery (0.85)
ADTI: Intl Symp on Advanced DB Technologies and Integration (0.83)
PAKDDM: Practical App of Knowledge Discovery and Data Mining (0.82)
NGDB: Intl Symp on Next Generation DB Systems and Apps (0.81)

http://perso.crans.org/~genest/conf.html
ANNIE: Artificial Neural Networks in Engineering (0.72)
AIED: World Conference on AI in Education (0.72)
DAS: International Workshop on Document Analysis Systems (0.71)
ICIP: Intl Conf on Image Processing (0.71)
ICGA: International Conference on Genetic Algorithms (0.71)
EA: International Conference on Artificial Evolution (0.70)
WACV: IEEE Workshop on Apps of Computer Vision (0.65)
COLING: International Conference on Computational Linguistics (0.64)
ECCV: European Conference on Computer Vision (0.63)
EACL: Annual Meeting of European Association Computational Linguistics (0.62)
DocEng: ACM Symposium on Document Engineering (0.61)
CAAI: Canadian Artificial Intelligence Conference (0.60)
AMAI: Artificial Intelligence and Maths (0.60)
ICRA: IEEE Intl Conf on Robotics and Automation (0.60)
WCES: World Congress on Expert Systems (0.60)
ACCV: Asian Conference on Computer Vision (0.59)
CAIA: Conf on AI for Applications (0.57)
IEA/AIE: Intl Conf on Ind. and Eng. Apps of AI and Expert Sys (0.57)
ICCBR: International Conference on Case-Based Reasoning (0.57)
ICASSP: IEEE Intl Conf on Acoustics, Speech and SP (0.57)
ASC: Intl Conf on AI and Soft Computing (0.57)
PACLIC: Pacific Asia Conference on Language, Information and Computation (0.56)
ICONIP: Intl Conf on Neural Information Processing (0.56)
IWPAAMS: Intl Workshop on Practical Appl. of Agents & Multiagent Systems (0.56)
SMC: IEEE Intl Conf on Systems, Man and Cybernetics (0.55)
CAEPIA: Conference of the Spanish Association for Artificial Intelligence (0.55)
IWANN: Intl Work-Conf on Art and Natural Neural Networks (0.55)
CIA: Cooperative Information Agents (0.55)
RANLP: Recent Advances in Natural Language Processing (0.54)
ICANN: International Conf on Artificial Neural Networks (0.54)
MLMTA: Intl Conf on Machine Learning; Models, Technologies and Applications (0.54)
NLPRS: Natural Language Pacific Rim Symposium (0.54)
ACIVS: Int Conference on Advanced Concepts For Intelligent Vision Systems (0.53)
RSS: Robotics: Science and Systems Conference (0.53)
ICAPS/AIPS: Conference on Artificial Intelligence Planning Systems (0.53)
ECAL: European Conference on Artificial Life (0.53)
MAAMAW: Modelling Autonomous Agents in a Multi-Agent World (0.52)
ANTS: Ant Colony Optimization and Swarm Intelligence (0.51)
NC: ICSC Symposium on Neural Computation (0.51)
Architecture / Hardware / High-Performance Computing / Tools / Operating Systems

Although we will attempt to keep this information accurate, we cannot guarantee the accuracy of the information provided. The numbers in brackets correspond to the EIC value (Estimated Impact of Conference). The numbers are normalized to be in the range 0.00-1.00 (the closer the number to 1.00, the better the conference). Only conferences with EIC above 0.50 have been included. The ranking lists will be updated every three months (end of January, April, July, and October), but this is getting more of a challenge than originally anticipated. Conferences listed below are considered to be tier 1 research meetings in their respective fields.
Top 57 conferences are listed (421 considered):
ISCA: ACM/IEEE Symp on Computer Architecture (0.99)
MICRO: Intl Symp on Microarchitecture (0.97)
OSDI: USENIX Operating Systems Design and Implementation (0.96)
SC/SUPER: ACM/IEEE Supercomputing Conference (0.96)
HPCA: IEEE Symp on High-Perf Comp Architecture (0.96)
ASPLOS: Architectural Support for Prog Lang and OS (0.95)
FCCM: IEEE Symposium on Field Programmable Custom Computing Machines (0.93)
HCS: Hot Chips Symp (0.92)
DAC: Design Automation Conf (0.92)
IPDPS: Intl Parallel and Distributed Processing Symposium (0.91)
PACT: IEEE Intl Conf on Parallel Architectures and Compilation Techniques (0.88)
ISSCC: IEEE Intl Solid-State Circuits Conf (0.87)
VLSI: IEEE Symp VLSI Circuits (0.87)
ICCAD: Intl Conf on Computer-Aided Design (0.86)
CODES+ISSS: Intl Conf on Hardware/Software Codesign & System Synthesis (0.86)
USENIX: Technical Conference (0.86)
DATE: IEEE/ACM Design, Automation & Test in Europe Conference (0.85)
ICA3PP: Algs and Archs for Parall Proc (0.85)
ERSA: Intl Conf on Engineering of Reconfigurable Systems and Algorithms (0.85)
ICN: IEEE Intl Conf on Networking Topology in Computer Science Conference (0.84)
PDPTA: Intl Conf on Parallel & Distributed Processing Techniques and Appl. (0.84)
ASAP: IEEE Application-Specific Systems, Architectures, and Processors (0.84)
CHARME: Conference on Correct Hardware Design and Verification Methods (0.83)
FPL: Field-Programmable Logic and Applications (0.82)
ICCD: Intl Conference on Computer Design (0.81)
PPoPP: ACM SIGPLAN Symp. on Principles & Practice of Parallel Programming (0.81)
CASES: Intl Conf on Compilers, Architecture, & Synthesis for Embedded Systems (0.81)
ESA: Intl Conf on Embedded Systems and Applications (0.79)
PARCO: Parallel Computing Conference (0.77)
ICS: Intl Conf on Supercomputing (0.74)
SC: ACM/IEEE Intl Conf for High Perf. Comp., Networking, Storage & Analysis (0.73)
PADS: IEEE Workshop on Parallel and Distributed Simulation (0.72)
CANPC: Communication, Arch., & Appl. for Network-Based Parallel Comp. (0.71)
GCA: Intl Conf on Grid Computing and Applications (0.71)
ISPASS: Int Symposium on Performance Analysis of Systems and Software (0.71)
RTAS: IEEE Real Time Technology and Applications Symposium (0.69)
CDES: Intl Conf on Computer Design (0.68)
CHES: Cryptographic Hardware and Embedded Systems (0.67)
PPSC: SIAM Conf on Parallel Processing for Scientific Computing (0.65)
NOSA: Nordic Symposium on Software Architecture (0.64)
ACSAC: Asia-Pacific Computer Systems Architecture Conference (0.62)
ICPP: Intl Conf. on Parallel Processing (0.61)
RTCOMP: Intl Conf on Real-Time Computing Systems and Applications (0.60)
ASYNC: Symposium on Asynchronous Circuits and Systems (0.59)
CAMP: Intl Workshop on Computer Architectures for Machine Perception (0.59)
PPSN: Parallel Problem Solving from Nature (0.56)
HPCS: Intl Symposium on High Performance Computing Systems (0.56)
HPDC: IEEE Intl Symposium on High Performance Distributed Computing (0.56)
VTS: IEEE VLSI Test Symposium (0.56)
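The EIC rules described above (scores normalized to 0.00-1.00, only conferences above 0.50 kept, listed best-first) can be sketched in a few lines. This is an illustrative sketch only: the `rank_conferences` function, the normalize-by-maximum scheme, and the sample raw scores are assumptions for demonstration, not the site's actual method.

```python
# Illustrative sketch (not from the source site): apply the stated EIC rules.
# Assumptions: raw impact scores are normalized so the best conference gets
# 1.00 (the site does not document its exact normalization).

def rank_conferences(raw_scores, threshold=0.50):
    """raw_scores: dict mapping conference acronym -> raw impact score.

    Returns (acronym, EIC) pairs above the threshold, best-first."""
    top = max(raw_scores.values())
    # Normalize into 0.00-1.00 and round to two decimals as on the site.
    eic = {name: round(score / top, 2) for name, score in raw_scores.items()}
    # Keep only conferences with EIC above the cutoff.
    kept = {name: v for name, v in eic.items() if v > threshold}
    # List best-first, as in the ranking above.
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical raw scores, for demonstration only.
scores = {"ISCA": 99, "MICRO": 97, "OSDI": 96, "SomeWorkshop": 30}
print(rank_conferences(scores))  # SomeWorkshop falls below 0.50 and is dropped
```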
Computer Science Conference Rank (Source: CORE)
Rank A CS conferences

Acronym | Name | Rank
ACIS | Australasian Conference on Information Systems | A
ACSAC | Annual Computer Security Applications Conference | A
AIIM | Artificial Intelligence in Medicine | A
AIME | Artificial Intelligence in Medicine in Europe | A
AiML | Advances in Modal Logic | A
ALENEX | Workshop on Algorithm Engineering and Experiments | A
ALIFE | International Conference on the Simulation and Synthesis of Living Systems | A
ALT | Algorithmic Learning Theory | A
AMCIS | Americas Conference on Information Systems | A
AOSD | Aspect Oriented Software Development | A
APPROX | International Workshop on Approximation Algorithms for Combinatorial Optimization Problems | A
ASAP | International Conference on Apps for Specific Array Processors | A
ASE | Automated Software Engineering Conference | A
ASIACRYPT | International Conference on the Theory and Application of Cryptology and Information Security | A
ATVA | International Symposium on Automated Technology for Verification and Analysis | A
BPM | International Conference in Business Process Management | A
CADE | International Conference on Automated Deduction | A
CAiSE | International Conference on Advanced Information Systems Engineering | A
CANIM | Computer Animation | A
CBSE | International Symposium on Component Based Software Engineering | A
CC | International Conference on Compiler Construction | A
CCC | IEEE Symposium on Computational Complexity | A
CCGRID | IEEE International Symposium on Cluster, Cloud and Grid Computing | A
CGO | International Symposium on Code Generation and Optimization | A
CIDR | Conference on Innovative Data Systems Research | A
CIKM | ACM International Conference on Information and Knowledge Management | A

http://lipn.univ-paris13.fr/~bennani/CSRank.html
Top conferences in hardware & architecture (Microsoft Academic Search, Computer Science > Hardware & Architecture)
http://academic.research.microsoft.com/RankList?entitytype=3&topDomainID=2&subDomainID=3&last=0&start=1&end=100
1–100 of 102 results

Conference | Publications | Field Rating
DAC Design Automation Conference | 9030 | 124
ISCAS IEEE International Symposium on Circuits and Systems | 23073 | 95
ICCAD International Conference on Computer-Aided Design | 2759 | 94
ISPD International Symposium on Physical Design | 2702 | 89
MICRO International Symposium on Microarchitecture | 898 | 89
ISCA International Symposium on Computer Architecture | 1342 | 83
ISLPED International Symposium on Low Power Electronics and Design | 1384 | 71
HPCA International Symposium on High-Performance Computer Architecture | 692 | 65
DATE Design, Automation, and Test in Europe | 4588 | 64
FCCM Field-Programmable Custom Computing Machines | 930 | 56
FPGA Symposium on Field Programmable Gate Arrays | 845 | 55
Hybrid Systems | 778 | 55
CHES Cryptographic Hardware and Embedded Systems | 427 | 53
ICCD International Conference on Computer Design | 2438 | 51
VTS IEEE VLSI Test Symposium | 1450 | 49
PACT International Conference on Parallel Architectures and Compilation Techniques | 804 | 48
ASPLOS Architectural Support for Programming Languages and Operating Systems | 334 | 44
ASPDAC Asia and South Pacific Design Automation Conference | 3962 | 41
ISSS International Symposium on Systems Synthesis | 483 | 40
CODES International Conference on Hardware Software Codesign | 340 | 40
ECRTS Euromicro Conference on Real-Time Systems | 694 | 39
FPL Field-Programmable Logic and Applications | 2022 | 38
ASYNC Symposium on Asynchronous Circuits and Systems | 402 | 36
EURODAC European Design and Test Conference | 945 | 34
CASES Compilers, Architecture, and Synthesis for Embedded Systems | 404 | 34
ARITH IEEE Symposium on Computer Arithmetic | 648 | 33
VLSI Design | 2311 | 32
COMPCON Computer Society International Conference | 899 | 32
TPHOLs Theorem Proving in Higher Order Logics | 569 | 32
CGO Symposium on Code Generation and Optimization | 278 | 29
EDTC European Design and Test Conference | 368 | 28
CHARME Conference on Correct Hardware Design and Verification Methods | 220 | 28
ISQED International Symposium on Quality Electronic Design | 1773 | 27
ARVLSI Advanced Research in VLSI | 140 | 27
DFT Defect and Fault Tolerance in VLSI Systems | 1070 | 26
VLSI Very Large Scale Integration | 621 | 26
ACM Great Lakes Symposium on VLSI | 1477 | 25
ASAP Application-Specific Systems, Architectures, and Processors | 625 | 25
ISMVL IEEE International Symposium on Multiple-Valued Logic | 1374 | 24
SiPS IEEE Workshop on Signal Processing Systems | 1104 | 24
IWLS International Workshop on Logic & Synthesis | 246 | 23
ATS Asian Test Symposium | 752 | 22
Annual Symposium on VLSI | 757 | 20
RSP Workshop on Rapid System Prototyping | 709 | 20
ETS European Test Symposium | 434 | 20
FPT IEEE International Conference on Field-Programmable Technology | 771 | 19
IOLTS International On-Line Testing Symposium | 686 | 19
PACS Power-Aware Computer Systems | 68 | 19
ACSD Int. Conf. on Application of Concurrency to System Design | 307 | 18
DSD Euromicro Symposium on Digital Systems Design | 1015 | 17
SLIP System-Level Interconnect Prediction | 172 | 17
ERSA Engineering of Reconfigurable Systems and Algorithms | 409 | 16
DELTA Workshop on Electronic Design, Test and Applications | 548 | 15
MTDT Memory Technology, Design and Testing | 344 | 15
PATMOS Workshop on Power and Timing Modeling, Optimization and Simulation | 614 | 14
SAMOS Systems, Architectures, Modeling, and Simulation | 300 | 14
CAMP Computer Architectures for Machine Perception | 379 | 13
CPA Communicating Process Architectures | 129 | 13
Computer Hardware Description Languages and their Applications | 55 | 13
AHS Adaptive Hardware and Systems | 411 | 12
CODES+ISSS International Conference on Hardware Software Codesign | 192 | 12
HIPEAC High Performance Embedded Architectures and Compilers | 135 | 12
TAU Timing Issues In The Specification And Synthesis Of Digital Systems | 39 | 12
ACSAC Asia-Pacific Computer Systems Architecture Conference | 330 | 11
MIC Modelling, Identification and Control | 151 | 11
MVL Multiple-Valued Logic | 58 | 11
ICA3PP International Conference on Algorithms and Architectures for Parallel Processing | 654 | 10
DDECS Workshop on Design and Diagnostics of Electronic Circuits and Systems | 604 | 10
ARC Applied Reconfigurable Computing | 275 | 10
Workshop on Interaction between Compilers and Computer Architectures | 49 | 10
TPCD Theorem Provers in Circuit Design | 39 | 10
Mathematical Science Institute Workshops | 27 | 10
IWSOC International Workshop System-on-Chip for Real-Time Applications | 311 | 9
ESTIMEDIA Embedded Systems for Real-Time Multimedia | 170 | 9
MTV Workshop on Microprocessor Test and Verification | 159 | 9
IFIP WG10.5 | 139 | 9
WMPI Workshop on Memory Performance Issues | 25 | 9
ICMENS International Conference on MEMS, NANO, and Smart Systems | 430 | 8
SIPEW SPEC International Performance Evaluation Workshop | 44 | 8
Esmart Research in Smart Cards | 22 | 8
SoCC IEEE International System-on-Chip (SoC) Conference | 289 | 7
Synthesis for Control Dominated Circuits | 69 | 7
CIT Conference on Information Technology | 195 | 6
RECOSOC Reconfigurable Communication-centric Systems-on-Chip | 95 | 6
New Zealand Computer Science Research Students' Conference | 28 | 6
Intelligent Memory Systems | 18 | 6
CDES International Conference on Computer Design | 213 | 5
IESS International Embedded Systems Symposium | 75 | 5
Fractals in the Natural and Applied Sciences | 41 | 5
New Hardware Design Methods | 13 | 5
MEDEA Memory Performance: Dealing With Applications, Systems And Architecture | 7 | 5
IASTED-CCS Circuits, Signals, and Systems | 149 | 4
Sagamore Computer Conference | 36 | 4
Formal Hardware Verification | 8 | 4
SIGMAP Signal Processing and Multimedia Applications | 185 | 3
Microcomputing | 46 | 3
APCCAS Asia Pacific Conference on Circuits and Systems | 6 | 3
HPCNCS High Performance Computing, Networking and Communication Systems | 81 | 2
Rechnergestützter Entwurf und Architektur mikroelektronischer Systeme (Computer-Aided Design and Architecture of Microelectronic Systems) | 25 | 2
IFIP WG5.10 Publications | 22 | 2