An Efficient VLSI Architecture for CORDIC Algorithm - Semantic Scholar

International Journal of Applied Engineering Research ISSN 0973-4562 Volume 8, Number 16 (2013) pp. 1897-1907 © Research India Publications http://www.ripublication.com/ijaer.htm

An Efficient VLSI Architecture for CORDIC Algorithm

R.Parameshwaran#1, K.Hariharan#2, R.Manikandan#3, M.Raguram*1 and B.Narendran*2

#1, 2 Assistant Professor, SASTRA University
#3 Senior Assistant Professor, SASTRA University
*1, 2 PG Scholar, SASTRA University
#1 [email protected] #2 [email protected] #3 [email protected] *1 [email protected], *[email protected]

Abstract
The proposed architecture uses n iterations to produce the final value of the function up to an accuracy of n bits. A two's complement 4-bit carry-lookahead adder/subtractor block with carry-save has been implemented as part of the architecture for greater speed. An 8-bit barrel shifter has been implemented for use in the algorithm. Edge-triggered latches and a clocking scheme have been designed to reduce the number of transistors involved. The iterative sequencing of steps requires a 3-bit counter and a clocking control scheme, both of which have been implemented in this project. The CORDIC algorithm requires a set of fixed values to be accessed during the iterative series of steps. A read-only memory (ROM) block has been designed for this purpose; it is accessed through a 3-bit address bus whose bits are set by the outputs of the 3-bit counter.

Keywords: CORDIC, Booth recoding, Online CORDIC, vector translation, coarse rotation.

Introduction
The CORDIC algorithm was introduced in 1956 by Jack Volder as a highly efficient, low-complexity, and robust technique for computing elementary functions. Initially intended for navigation technology, it has found its way into a wide range of applications, from pocket calculators and numerical co-processors to high-performance radar signal processing. After its invention, CORDIC served as the digital replacement for the analog navigation computers aboard the B-58 supersonic bomber aircraft (Pirsch, 1998). The CORDIC airborne navigational computer built for this purpose outperformed conventional contemporary computers by a factor of 7, mainly due to the CORDIC algorithm. Steve Walther continued the work on CORDIC, applying the algorithm in Hewlett-Packard calculators such as the HP-9100, the famous HP-35 in 1972, and the HP-41C in 1980. He showed how the unified CORDIC algorithm combines rotations in the circular, hyperbolic, and linear coordinate systems, and how it was applied in the HP-2116 floating-point numerical co-processor. Today, fast rotation techniques, which are closely related to CORDIC, perform orthonormal rotations at very low cost. Although fast rotations exist for certain angles only, they are sufficiently versatile and have already been widely applied in signal processing (Andraka, 1998). The basic block diagram of the CORDIC processor is shown in Fig 2.1.

Methodology

Fig. 2.1 Efficient VLSI Architecture

The CORDIC IP shown in Fig 2.2 consists of 5 blocks: the APB interface, the CORDIC engine for SIN/COS and coarse rotation, the 2-stage online CORDIC for TAN-1(), the radix-4 Booth recoding unit, and the control block.
• The APB I/F block is compliant with AMBA APB Specification 2.0. It has a 32-bit data bus.
• The CORDIC engine for SIN/COS has 3 main functions:
  o The CORDIC calculation engine, controlled by the control unit, performs the SIN()/COS() calculation.
  o The coarse rotation is done by reusing the CORDIC calculation engine.
  o The final sum part and carry part from the CSA (carry-save adder) are aggregated.
• The 2-stage online CORDIC is adopted to reduce the iteration cycles for the TAN-1() calculation. It performs a 2-stage calculation within 1 PCLK period.
• The Booth recoding unit performs radix-4 Booth recoding for i = 16 ~ 29 ('i' is the iteration index; during the iterations, 'i' changes from 0 to 29). From the result of the Booth recoding, it generates the next index value, next_i.
• The control block controls all of the above parts. Using an FSM, it starts activating the appropriate control lines just after the START bit is set to 1 (Baker, 1975).

SIN/COS Trigonometric Function
The sin(), cos() calculation flowchart is depicted in Fig. 2.3. After the iterations are completed, the X and Y registers hold the following values:
X(i) = cos(θ)
Y(i) = sin(θ)
X(0) is set to 1/R (1/1.6467602581) so that no multiplication by the compensation factor R is needed after the iterations (Ercegovac and Lang, 1987).

Fig. 2.3 Iteration flowchart for SIN() /COS()
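The SIN()/COS() iteration can be illustrated with a short floating-point sketch. This is not the authors' Verilog: the function names `cordic_gain` and `cordic_sin_cos` are hypothetical, the 30 iterations follow the index range i = 0 to 29 mentioned earlier, and the sign convention for σ (rotate so that Z is driven toward 0) is the conventional one and is assumed here because the flowchart is not reproduced in the text.

```python
import math

def cordic_gain(n_iter=30):
    """Gain R accumulated by n_iter micro-rotations (about 1.6467602581)."""
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain

def cordic_sin_cos(theta, n_iter=30):
    """Rotation-mode CORDIC for theta in [-pi/2, pi/2]; returns (cos, sin)."""
    # X(0) = 1/R pre-compensates the gain, so no final multiplication is needed.
    x, y, z = 1.0 / cordic_gain(n_iter), 0.0, theta
    for i in range(n_iter):
        sigma = 1.0 if z >= 0.0 else -1.0      # assumed convention: drive z to 0
        # Shift-and-add micro-rotation by atan(2^-i).
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)      # a ROM lookup in hardware
    return x, y

if __name__ == "__main__":
    c, s = cordic_sin_cos(math.pi / 4)
    print(c, s)
```

In hardware the multiplications by 2^-i are shifts and the arctangent values come from a small table; the sketch keeps them as floating-point operations for readability.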

Vector Rotate
The vector rotate operation rotates a vector by a given phase. It is very similar to the SIN/COS calculation; the only difference is the initial value, as shown in Fig 2.4.


Fig. 2.4 Iteration flowchart for the vector rotation

The result (X(i), Y(i)) is the input vector rotated by the given phase. However, its magnitude is R times larger than that of the original vector.

Coordinate conversion: Polar to Rectangular
For a conversion from polar to rectangular coordinates, the inputs are as follows:
X(i) = magnitude * 1/R
Y(i) = 0
Z(i) = phase

Vector Translation (Tan-1(θ))
Vector translation extracts the magnitude and angle of a vector. The tan-1(θ) function value is calculated as shown in Fig 2.5.
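The two conversions can be checked against each other with a small round-trip sketch: polar to rectangular uses the rotation mode with X = magnitude/R, Y = 0, Z = phase, and vector translation runs the same micro-rotations while driving Y toward 0. The names `rotate`, `translate` and `polar_to_rect` are hypothetical, the arithmetic is floating point rather than the fixed-point hardware, and the σ conventions are the usual textbook ones, assumed here because the flowcharts are not available in the text.

```python
import math

N_ITER = 30
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]   # fixed angle table
R = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))

def rotate(x, y, z):
    """Rotation mode: drive z to 0. The result is R times the rotated vector."""
    for i, angle in enumerate(ANGLES):
        s = 1.0 if z >= 0.0 else -1.0
        x, y, z = x - s * y * 2.0 ** -i, y + s * x * 2.0 ** -i, z - s * angle
    return x, y

def translate(x, y):
    """Vectoring mode for x > 0: drive y to 0. Returns (R * magnitude, phase)."""
    z = 0.0
    for i, angle in enumerate(ANGLES):
        s = -1.0 if y >= 0.0 else 1.0
        x, y, z = x - s * y * 2.0 ** -i, y + s * x * 2.0 ** -i, z - s * angle
    return x, z

def polar_to_rect(magnitude, phase):
    """Pre-scale by 1/R so the output needs no gain compensation."""
    return rotate(magnitude / R, 0.0, phase)

if __name__ == "__main__":
    x, y = polar_to_rect(2.0, 0.5)
    scaled_mag, phase = translate(x, y)
    print(x, y, scaled_mag / R, phase)   # magnitude and phase close to 2.0 and 0.5
```

Note that `translate` returns the magnitude still multiplied by R, matching the statement that the X register holds R times the magnitude; the caller divides by R if the true magnitude is needed.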

Fig. 2.5 Iteration flowchart for Tan-1(θ)

After the iterations, the registers hold:
X(i) = R * magnitude of the vector
Z(i) = the phase of the vector, or tan-1(θ)

The hybrid CORDIC with the online CORDIC scheme
The number of iteration cycles of the tan-1() calculation is reduced by adopting the online CORDIC algorithm. In this case, the iteration count is half that of the basic CORDIC algorithm, and the R value differs from the basic CORDIC result.

Coarse Rotation
The CORDIC algorithm supports the range -π/2 ~ +π/2. To support a full circle, -π ~ +π, a coarse rotation is needed. For sin()/cos() and the vector rotate mode, the coarse rotation is determined by the Z value, as shown in Fig 2.6 (Juang, 2006).
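One common way to perform a Z-based coarse rotation is sketched below. It is an assumption, not a transcription of Fig 2.6: when Z lies outside -π/2 ~ +π/2, the vector is pre-rotated by ±π, which only negates X and Y, and Z is adjusted by the same amount so that the remaining angle falls inside the range the basic iterations support. The function name `coarse_rotate` is hypothetical.

```python
import math

def coarse_rotate(x, y, z):
    """Fold z from [-pi, pi] into [-pi/2, pi/2].

    A rotation by +pi or -pi negates both coordinates, so it costs no
    multiplications; the remaining angle z is what the CORDIC iterations
    still have to rotate by.
    """
    if z > math.pi / 2:
        return -x, -y, z - math.pi
    if z < -math.pi / 2:
        return -x, -y, z + math.pi
    return x, y, z

if __name__ == "__main__":
    print(coarse_rotate(1.0, 0.0, 3.0))   # remaining angle is 3.0 - pi
```

Rotating the returned vector by the returned angle gives the same result as rotating the original vector by the original angle, which is the property a coarse rotation has to preserve.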

Fig. 2.6 Coarse rotation for SIN()/COS() and the vector rotation

For tan-1() mode, the coarse rotation is determined by the Y value, as shown in Fig 2.7.

Fig. 2.7 Coarse rotation for TAN-1()

CORDIC Engine for SIN()/COS()/Vector Rotate

Fig. 2.8 CORDIC engine for sin()/cos() with coarse rotation function

The SIN/COS CORDIC engine shown in Fig 2.8 provides the following functions:
• Sin() / Cos()
• Vector rotate (polar coordinate → Cartesian coordinate)
• Tan-1() & vector magnitude

Although Tan-1 is calculated faster by the 2-stage online CORDIC block, it can also be calculated with the SIN/COS CORDIC engine without any additional hardware (Marx, 1999; Takagi, 1991).

Kogge-Stone 32-bit adder/subtractor
The delay of a 32-bit adder/subtractor is large, so we used a Kogge-Stone adder (Timmerman et al., 1991).

Coarse Rotation
To enlarge the phase range, coarse rotation is adopted. It is implemented without additional logic by reusing the existing SIN/COS CORDIC engine (Rodrigues and Swartzlander, 2010).

Final Z calculation for TAN-1()
After the 2-stage online CORDIC iteration is completed, the final result, held as a sum part and a carry part, must be aggregated. This requires a 32-bit addition. By modifying the Z calculation part, the final aggregation is performed without an additional adder.

Radix-4 Booth Recoding
CORDIC calculates the sin() and cos() functions on the basis of the value of Z. As Z gets smaller, we can use the approximation tan-1(z) ≈ z. If z is small enough, we can reduce the iteration cycles by Booth recoding.
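Radix-4 Booth recoding itself is standard and can be sketched in a few lines. The sketch below recodes a two's complement integer into digits from {-2, -1, 0, 1, 2}, each covering two bit positions, and then computes a next index that skips zero digits. The function names `booth_radix4` and `next_index` are hypothetical, and the way zero digits map to next_i (each digit spans two iteration indices) is an interpretation of the text, not a description of the authors' logic.

```python
def booth_radix4(value, n_bits):
    """Recode an n_bits two's complement integer into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first, such that
    sum(d * 4**k) equals the signed value. n_bits must be even.
    """
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even")
    bits = [(value >> k) & 1 for k in range(n_bits)]
    digits = []
    prev = 0  # implicit 0 to the right of the least significant bit
    for k in range(0, n_bits, 2):
        # digit = b[k-1] + b[k] - 2*b[k+1]
        digits.append(prev + bits[k] - 2 * bits[k + 1])
        prev = bits[k + 1]
    return digits

def next_index(i, following_digits):
    """Iteration index of the next non-zero digit after the one at index i.

    following_digits lists the digits after the current one, most
    significant first; every digit covers two iteration indices.
    """
    skipped = 0
    for d in following_digits:
        if d != 0:
            break
        skipped += 1
    return i + 2 * (1 + skipped)

if __name__ == "__main__":
    print(booth_radix4(6, 4))               # digits [-2, 2]: -2 + 2*4 = 6
    print(next_index(16, [0, 0, 1, -2]))    # two zero digits are skipped
```

With 14 bits the recoding yields 7 digits, which is consistent with halving the 14 iterations of the range i = 16 ~ 29.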

Fig. 2.9 Radix-4 Booth recoding block diagram

As shown in Fig 2.9, radix-4 Booth recoding is adopted and is used during i = 16 ~ 29. With radix-4 Booth recoding, the iteration time during i = 16 ~ 29 is halved. Also, if a Booth recoding result is 0, there is nothing to do during that iteration, so it is better to skip the zeros. To skip them, next_i is generated. For example, if the current i value is 16 and two consecutive zeros follow (from the Booth recoding), next_i will be 22, which skips the 2 useless zero steps. If the Booth recoding were conducted just before the CORDIC operation, the critical path delay would include the Booth recoding logic delay. To avoid this, we adopted a pipeline scheme that pre-calculates the σ value and next_i. Although this needs one additional clock, it is crucial for achieving the 400 MHz operating frequency.

Two-stage online CORDIC Engine for TAN-1()

Fig. 2.10 Two stage online CORDIC Engine block diagram

The 2-stage online CORDIC shown in Fig 2.10 contains two online CORDIC stages for TAN-1(), which halve the iteration cycles by processing 2 iterations in 1 PCLK. Although the two parts have the same basic structure, they differ slightly. The left part processes only the case i = 2n, and the right part processes only the case i = 2n+1, so each part can be optimized for its own case (Volder, 1959).

σ Prediction
To implement the unfolded online CORDIC, the σ calculation is very important because the carry-save adder/subtractor can operate only after σ is determined. Normally, σ is determined by the sign bit of Ys+Yc. Obtaining this sign bit requires a 32-bit addition, whose delay would lie on the critical path. σ prediction is therefore needed to obtain the sign bit without a full addition.

σ Selection
Although σ prediction is performed with a 4-bit addition, it is still on the critical path, so the σ selection technique [7] was also adopted. The 1-stage online CORDIC operations for the 2 cases, σ = 1 and σ = -1, are calculated in parallel, and σ is predicted during that time. The predicted σ value then selects the appropriate result. The drawback of σ selection is that it needs twice the hardware logic. In order to meet the target specification, σ selection had to be implemented.
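The idea behind σ selection can be shown with a behavioural sketch: both candidate results are formed before σ is known, and σ then acts only as a multiplexer select. This is a software model under stated assumptions, not the authors' circuit. Y is passed as a sum part and a carry part but is resolved with an ordinary addition, so neither the carry-save adder nor the 4-bit σ prediction is modelled; the names `micro_rotation` and `step_with_sigma_selection` are hypothetical, and the sign convention (drive Y toward 0) is the usual one for tan-1 mode.

```python
def micro_rotation(x, y, i, sigma):
    """One CORDIC micro-rotation on fixed-point integers (arithmetic shifts)."""
    return x - sigma * (y >> i), y + sigma * (x >> i)

def step_with_sigma_selection(x, ys, yc, i):
    """Compute both sigma cases 'in parallel', then select one of them."""
    y = ys + yc  # stands in for the unresolved carry-save pair
    # Both candidates exist before sigma is known: this is the doubled hardware.
    candidate_plus = micro_rotation(x, y, i, +1)
    candidate_minus = micro_rotation(x, y, i, -1)
    # In hardware this sign would come from the sigma prediction logic.
    sigma = -1 if y >= 0 else +1
    x_next, y_next = candidate_plus if sigma == +1 else candidate_minus
    return x_next, y_next, sigma

if __name__ == "__main__":
    x, y = 1 << 20, 3 << 18          # fixed-point vector (1.0, 0.75)
    for i in range(20):
        x, y, sigma = step_with_sigma_selection(x, y - 12345, 12345, i)
    print(x, y)                      # y has been driven close to 0
```

The selected result is identical to what a sequential "determine σ, then rotate" step would produce; the benefit in hardware is that the rotation logic no longer waits for σ, at the cost of duplicating it.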

Simulation Results

Verilog simulation results
[Figure: cos 0° and sin 0°]
[Figure: cos 90° and sin 90°]
[Figure: cos 45° and sin 45°]
[Figure: Tan-1(0) = 0°]
[Figure: Sin, cos, tan values for various angles]
[Chart: sin()/cos() iteration cycles (vertical axis 0 to 35) for basic and Booth]
[Chart: atan() iteration cycles (vertical axis 0 to 35) for basic, hybrid_1, hybrid_2 and hybrid_σ]
[Chart: Total cell area (vertical axis 0 to 600000) for basic, Booth, hybrid_1, hybrid_2 and hybrid_σ]
[Figure: Layout from Synopsys Tool]


Conclusion
By using Booth recoding, we were able to generate more zeros, which means that the overall number of iteration cycles is reduced slightly compared with the basic CORDIC. The online CORDIC scheme was applied in this project to calculate tan-1; it halved the iteration time relative to the basic CORDIC. To optimize the online CORDIC, we implemented CORDIC with σ prediction. Finally, both σ prediction and σ selection were used. This allows addition, σ prediction, and selection to run in parallel, saving time at the expense of logic overhead. We developed MATLAB and Verilog code for all CORDIC variants, and the designs were simulated and verified.

References
[1] Andraka R. (1998) ‘A Survey of CORDIC Algorithms for FPGA Based Computers’, Proc. of the ACM/SIGDA Sixth International Symposium on FPGAs, February 1998, Monterey, CA, pp. 191-200.
[2] Baker P.W. (1975) ‘Suggestion for a Binary Cosine Generator’, IEEE Transactions on Computers, February, pp. 1134-1136.
[3] Chen T.C. (1972) ‘Automatic Computation of Exponentials, Logarithms, Ratios and Square Roots’, IBM J. Res. Development, July, pp. 380-388.
[4] Ercegovac M.D., Lang T. (1987) ‘Fast Cosine/Sine Implementation Using CORDIC Iterations’, IEEE Trans. on Comput., vol. 40, n 9, pp. 222-226.
[5] Juang T.B. (2006) ‘Area/delay efficient recoding methods for parallel CORDIC rotations’, in IEEE Asia Pacific Conf. on Circuits Syst., APCCAS’06, Dec., pp. 1539-1542.
[6] Jae-hyuk Kwak, Jae Hun Choi and Earl E. Swartzlander, Jr. (2000) ‘High-Speed CORDIC Based on an Overlapped Architecture and a Novel σ-Prediction Method’, Journal of VLSI Signal Processing 25, pp. 167-177.
[7] Marx M. (1999) ‘FPGA Implementation of sin(x) and cos(x) Generators Using the CORDIC Algorithm’, Final Year Project Report, School of Electronic Engineering, University of Surrey, Guildford, UK.
[8] Pirsch P. (1998) ‘Architectures for Digital Signal Processing’, John Wiley & Sons.
[9] Terence K. Rodrigues and Earl E. Swartzlander, Jr. (2010) ‘Using Parallel Angle Recoding to Accelerate Rotations’, IEEE Transactions on Computers, vol. 59, no. 4, April.
[10] Takagi N. (1991) ‘CORDIC Methods with a Constant Scale Factor for Sine and Cosine Computation’, IEEE Trans. on Comput., vol. 40, n 9, pp. 989-994.
[11] Timmerman D., Hahn H., Hosticka B.J. (1992) ‘Low Latency Time CORDIC Algorithms’, IEEE Transactions on Comput., vol. 41, n 8, pp. 1010-1014.
[12] Timmerman D., Hahn H., Hosticka B.J., Rix B. (1991) ‘A New Addition Scheme and Fast Scaling Factor Compensation Methods for CORDIC algorithms’, Integration – the VLSI Journal, vol. 11, n 1, pp. 85-100.
[13] Vandemeulebroecke A., Vanzieledhem E., et al. (1990) ‘A New Carry-Free Division Algorithm and its Application to a Single Chip 1024 bit RSA Processor’, IEEE Journal of Solid-State Circuits, vol. 25, n 3, pp. 748-755.
[14] Vankka J. (1996) ‘Methods of Mapping from Phase to Sine Amplitude in Direct Digital Synthesis’, Proc. of the 1996 IEEE International Frequency Control Symposium, pp. 942-950.
[15] Vlachos A. (1999) ‘Design and Implementation of CORDIC Modules for ADCS’, MSc Project Report, School of Electronic Engineering, University of Surrey, Guildford, UK.
[16] Volder J. (1959) ‘The CORDIC Computing Technique’, IRE Trans. Comput., Sept., pp. 330-334.
[17] Walther J.S. (1971) ‘A Unified Algorithm for Elementary Functions’, Proc. AFIPS Spring Joint Computer Conference, pp. 379-385.
[18] Wang S., Piuri V. (1996) ‘A Unified View of CORDIC Processor Design’, in Application Specific Processors, ed. by Earl E. Swartzlander, Jr., Kluwer Academic Press, pp. 121-160.
[19] Wertz J. (1985) ‘Spacecraft Attitude Determination and Control’, D. Reidel Publishing Company, London.
[20] www.dspguru.com/info/faqs/cordic2.htm
