A Novel Predication Scheme for a SIMD System-on-Chip

Alexander Paar¹, Manuel L. Anido², and Nader Bagherzadeh³

¹ Universität Karlsruhe, Fakultät für Informatik, 76128 Karlsruhe, Germany, [email protected]
² Federal University of Rio de Janeiro, NCE, Brazil, [email protected]
³ University of California, Irvine, CA 92697, [email protected]
http://www.eng.uci.edu/comp.arch

Abstract. This paper presents a novel predication scheme that was applied to a SIMD system-on-chip. This approach was devised by improving and combining the unrestricted predication model and the guarded execution model. It is shown that significant execution autonomy is added to the SIMD processing elements and that the code size is reduced considerably. Finally, the implemented predication scheme is compared with predication schemes of general purpose processors, and it is shown that it enables more efficient if-conversion compilations than previous architectures.¹

1 Introduction

Many regular data-parallel problems have benefited from the tremendous computational power provided by Single Instruction Multiple Data (SIMD) machines. However, there is also a wide variety of irregular data-parallel problems that are much more difficult to map onto SIMD architectures. A major problem has been the difficulty those architectures have in dealing efficiently with parallel if-then-else clauses. Reflecting the importance of exploiting data parallelism to improve performance, some recent microprocessors have incorporated SIMD execution units, and it has been shown that significant improvements can be achieved [1, 2]. This has been driven by the need to accelerate multimedia and digital signal processing applications. Other applications such as computer image generation, volume rendering, and real-time scientific visualization have also employed SIMD architectures [3].

The next section gives a brief overview of the general SIMD computation model and a particular instance of such an architecture. It then introduces the principles of predicated execution before the main part of this paper describes how both techniques were combined to achieve improvements in performance and programmability over existing comparable architectures.

¹ The work that led to this paper was conducted at the Department of Electrical & Computer Engineering at University of California, Irvine, where the first two named authors had appointments as visiting researchers. Dr. M. Anido acknowledges the financial support by CNPq. This work was supported by DARPA (DoD) under contract F-33615-97-C-1126 and the National Science Foundation under grant CCR-0083080.

B. Monien and R. Feldmann (Eds.): Euro-Par 2002, LNCS 2400, pp. 834–843. © Springer-Verlag Berlin Heidelberg 2002

2 Previous Work

2.1 SIMD Architectures and MorphoSys Reconfigurable System-on-Chip

SIMD lets one instruction operate at the same time on multiple data items. What normally requires a repeated succession of instructions can now be performed in one instruction. For that purpose, standard SIMD architectures incorporate an array of processing elements that are centrally controlled by a general-purpose processor. Though there have been efforts to add execution autonomy to those array elements, such as allowing different subsets of PEs to have masked/complemented execution of one instruction based on a predicate flag [4, 5], none of these approaches enables a processing element to take a local decision about executing entire sections within if-then-else clauses, nor do previous approaches provide the capability of nested predication.

MorphoSys [6] is a coarse-grained, integrated reconfigurable system-on-chip targeted at data-parallel applications. It incorporates a reconfigurable array of processing elements, a RISC processor core, an efficient memory interface unit, and an 8x8 array of SIMD processing elements (PEs).

Fig. 1. MorphoSys architecture (blocks shown: main processor (e.g. RISC), reconfigurable processor array, high-bandwidth memory, instruction and data cache, system bus, external memory (e.g. SDRAM, RDRAM))

MorphoSys processing elements will be referred to as reconfigurable cells (RC). A reconfigurable cell comprises a 32-bit context register that contains the SIMD instruction (context word, CW) for the current cycle, two input data multiplexers, and a combinational network that includes an ALU, a multiplier, and a shifter. Data is stored in a set of RC registers and an output register for inter-cell data exchange (see Figs. 1 and 2).

Fig. 2. MorphoSys reconfigurable cell (blocks shown: context register, MUX A, MUX B, ALU + multiplier, shifter, output register, registers Reg 0 to Reg 5)
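The masked/complemented execution mentioned above can be illustrated with a small software model. The following Python sketch is not MorphoSys code; the function name simd_step, the flattened 64-element list standing in for the 8x8 array, and the flag layout are all assumptions made for illustration. It only shows the idea that one broadcast instruction is executed by every PE, or by the subset of PEs selected through a per-PE predicate flag.

```python
def simd_step(op, values, flags=None, complemented=False):
    """Apply one broadcast instruction `op` to the value held by every PE.

    flags:        optional per-PE predicate flags; None means unmasked execution.
    complemented: if True, the PEs whose flag is False execute instead.
    (Names and layout are hypothetical, chosen only for this sketch.)
    """
    out = []
    for i, v in enumerate(values):
        active = True if flags is None else (flags[i] != complemented)
        out.append(op(v) if active else v)  # inactive PEs keep their old value
    return out


# An 8x8 array flattened to 64 PEs, each holding one data item.
values = list(range(64))
flags = [v % 2 == 0 for v in values]  # predicate flag set on even-numbered PEs

doubled = simd_step(lambda v: 2 * v, values)            # all 64 PEs execute
masked = simd_step(lambda v: v + 100, values, flags)    # only flagged PEs execute
inverse = simd_step(lambda v: v + 100, values, flags, complemented=True)
```

Note that in this model a PE can only accept or ignore a single instruction; it has no way to skip a whole then-part or else-part on its own, which is the limitation the text attributes to earlier approaches.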

2.2 Predication

Predication refers to the conditional execution of an instruction based on the Boolean value of a qualifying predicate. If the value of the predicate is true, the instruction is allowed to execute normally (commit); otherwise the instruction is nullified, preventing it from modifying the processor state. Predication is able to remove branch instructions from program code. This is why it can be implemented even in a SIMD processing element that does not have a program counter.

The basic compiler transformation to exploit predicated execution is known as if-conversion. If-conversion replaces conditional branches in the code with comparison instructions that define one or more predicates. Instructions that are control dependent on the branch are then converted to predicated instructions, utilizing the appropriate predicate value. In this manner, control dependencies are converted to data dependencies.

There are two basic predication models. In the unrestricted predication model [7], all instructions can be predicated. The guarded execution model [8] instead introduces a special instruction that controls the conditional execution of the following non-predicated instructions. Using these two basic models, further refinements can be made via conditional move instructions and instruction nullification. Some of the better-known processor architectures that incorporate predicated ISAs are the Alpha processor [9] and the HPL PlayDoh research architecture [10].

Of major importance for every predication scheme are instructions that support efficient if-conversion as well as parallel computation of high fan-in logical expressions in case the execution of an instruction depends on more than one condition. The following sections will describe how predication was applied to MorphoSys SIMD RCs and how these techniques compare to the IA-64 architecture [11].
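To make the two basic models concrete, the following Python sketch interprets a tiny if-converted program under each of them. It is an illustrative model only, not the instruction set of any processor cited here: the tuple-based instruction encoding, the register names (p, np, r), and the choice of a guard that covers a fixed count of following instructions are assumptions. The program corresponds to the branchy statement "if (a == 0) r = b + 1; else r = b - 1;".

```python
def run_unrestricted(program, regs):
    """Unrestricted model: every instruction names a qualifying predicate (or None)."""
    for pred, dest, fn in program:
        if pred is None or regs[pred]:
            regs[dest] = fn(regs)  # predicate true: the instruction commits
        # predicate false: the instruction is nullified, regs stay untouched
    return regs


def run_guarded(program, regs):
    """Guarded model: a guard instruction covers the next `count` plain instructions."""
    skip = 0
    for instr in program:
        if skip:                   # still inside a nullified guarded region
            skip -= 1
            continue
        if instr[0] == "guard":
            _, pred, count = instr
            if not regs[pred]:
                skip = count       # nullify the following `count` instructions
            continue
        _, dest, fn = instr
        regs[dest] = fn(regs)
    return regs


# if (a == 0) r = b + 1; else r = b - 1;   after if-conversion (no branches left)
unrestricted_prog = [
    (None, "p",  lambda r: r["a"] == 0),  # comparison defines predicate p
    (None, "np", lambda r: not r["p"]),   # complementary predicate
    ("p",  "r",  lambda r: r["b"] + 1),   # then-part, predicated on p
    ("np", "r",  lambda r: r["b"] - 1),   # else-part, predicated on np
]

guarded_prog = [
    ("op", "p",  lambda r: r["a"] == 0),
    ("op", "np", lambda r: not r["p"]),
    ("guard", "p", 1),                    # guards the next instruction
    ("op", "r",  lambda r: r["b"] + 1),
    ("guard", "np", 1),
    ("op", "r",  lambda r: r["b"] - 1),
]


def if_converted(a, b, runner, program):
    """Run the if-converted program and return the value left in r."""
    return runner(program, {"a": a, "b": b, "r": None})["r"]
```

Both runners walk the instruction list strictly in order, with no jumps, which mirrors the point that predication needs no program counter. The assignment to r depends on p and np as ordinary data, so the control dependence of the original branch has become a data dependence.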

3 A Predicated MorphoSys Reconfigurable Cell

Modified implementations of both the unrestricted predication model and the guarded execution model were applied to the SIMD reconfigurable cells of the MorphoSys system-on-chip. A main design objective was to perform both the generation of the qualifying predicate and the evaluation of up to 4 basic conditions completely in parallel with the actual arithmetic/logical execution.

3.1 Predicate Generation

Conditional branches are taken depending on the outcome of several arithmetic operations. Hence an if-statement like if ((a==0) && (c