In the last decade, the CORDIC algorithm has drawn wide attention from academia and industry for various applications such as DSP, biomedical signal processing, software defined radio, neural networks, and MIMO systems, to mention just a few. It is an iterative algorithm, requiring only simple shift and addition operations, for the hardware realization of basic elementary functions. Since CORDIC is used as a building block in various single-chip solutions, the critical aspects to be considered are high speed, low power, and low area, for achieving reasonable overall performance. In this paper, we first classify the CORDIC algorithm based on the number system and discuss the importance of the number system in the implementation of the CORDIC algorithm. Then, we present a systematic and comprehensive taxonomy of rotational CORDIC algorithms, which are subsequently discussed in depth. Special attention is devoted to the higher radix and flat techniques proposed in the literature for reducing the latency. Finally, a detailed comparison of various algorithms is presented, which can provide first-order information to designers looking either for further performance improvement or for the selection of a rotational CORDIC for a specific application.

The current research in the design of high speed VLSI architectures for real-time digital signal processing (DSP) algorithms has been driven by advances in VLSI technology, which have given designers significant impetus for porting algorithms into architectures. Many of the algorithms used in DSP and matrix arithmetic require elementary functions such as trigonometric, inverse trigonometric, logarithm, exponential, multiplication, and division functions. The commonly used software solutions for the digital implementation of these functions are table lookup methods and polynomial expansions, which require a number of multiplications and additions/subtractions. However, digit-by-digit methods also exist for the evaluation of these elementary functions, and they compute faster than the software solutions.

Some of the digit-by-digit methods for the computation of the above-mentioned elementary functions were described by Henry Briggs in 1624 in “Arithmetica Logarithmica” [

The conventional method of implementation of 2D vector rotation shown in Figure

Two-dimensional vector rotation.

Over the last 50 years, a wide variety of applications of the CORDIC algorithm have emerged. The CORDIC algorithm has received increased attention since a unified approach was proposed for its implementation [

Although CORDIC may not be the fastest technique to perform these operations, it is attractive due to its potential for efficient and low-cost implementation of a large class of applications. Several modifications of the CORDIC algorithm have been proposed in the literature during the last two decades to provide high-performance and low-cost hardware solutions for real-time computation of two-dimensional vector rotation and transcendental functions.

A new type of arithmetic operation called fast rotations or orthonormal

We have carried out a critical study of the different architectures proposed in the literature for 2D rotational CORDIC in the circular coordinate system, as a starting point for further latency reduction or throughput improvement. In this paper, we review the architectures proposed for rotational CORDIC. Specifically, we focus on redundant unfolded architectures that employ techniques suitable for increasing throughput and reducing latency.

The rest of the paper is organized as follows. In Section

A nonredundant radix-

The CORDIC algorithm involves rotation of a vector

Rotation in various coordinate systems: circular system, linear system, and hyperbolic system.

The generalized equations of the CORDIC algorithm for an iteration can be written as [
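The equations themselves did not survive extraction. For reference, a commonly used unified form is given below; the symbols ($m$ for the coordinate system, $\sigma_i$ for the rotation direction, $\alpha_{m,i}$ for the elementary angle) and the sign convention are our assumptions and may differ from the cited formulation.

$$
\begin{aligned}
x_{i+1} &= x_i - m\,\sigma_i\,2^{-i}\,y_i \\
y_{i+1} &= y_i + \sigma_i\,2^{-i}\,x_i \\
z_{i+1} &= z_i - \sigma_i\,\alpha_{m,i}, \qquad
\alpha_{m,i} = \begin{cases}
\tan^{-1}(2^{-i}) & m = 1 \ \text{(circular)}\\
2^{-i} & m = 0 \ \text{(linear)}\\
\tanh^{-1}(2^{-i}) & m = -1 \ \text{(hyperbolic)}
\end{cases}
\end{aligned}
$$

Here $\sigma_i = \operatorname{sign}(z_i)$ in rotation mode (driving $z$ to zero) and $\sigma_i = -\operatorname{sign}(x_i y_i)$ in vectoring mode (driving $y$ to zero).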

With the appropriate initial values of

Realization of some functions using CORDIC Algorithm.

m (coordinate system) | Mode | Initialization | Output
---|---|---|---
1 (Circular) | Rotation | |
1 (Circular) | Vectoring | |
0 (Linear) | Rotation | |
0 (Linear) | Vectoring | |
-1 (Hyperbolic) | Rotation | |
-1 (Hyperbolic) | Vectoring | |
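As an illustration of how the function classes in the table arise, the following is a floating-point sketch of the unified iteration in all three coordinate systems. The function names, the iteration count, and the repeated hyperbolic indices (4, 13, 40, ..., the usual choice for hyperbolic convergence) are our assumptions, not details taken from the cited work; scale factor compensation is left to the caller.

```python
import math

def cordic_indices(m, n):
    """Iteration indices for n basic iterations in coordinate system m."""
    if m != -1:
        return list(range(n))
    # Hyperbolic (assumed schedule): start at i = 1 and repeat i = 4, 13, 40, ...
    idx, repeat = [], 4
    for i in range(1, n + 1):
        idx.append(i)
        if i == repeat:
            idx.append(i)
            repeat = 3 * repeat + 1
    return idx

def cordic_gain(m, n):
    """Total norm growth: product of sqrt(1 + m * 2^(-2i)) over all iterations."""
    return math.prod(math.sqrt(1 + m * 4.0 ** -i) for i in cordic_indices(m, n))

def cordic(x, y, z, m, n=40, mode="rotation"):
    """Unified CORDIC sketch; m = 1 circular, 0 linear, -1 hyperbolic.

    Returns the final (x, y, z) WITHOUT scale factor compensation.
    """
    elementary = {1: math.atan, 0: lambda v: v, -1: math.atanh}[m]
    for i in cordic_indices(m, n):
        t = 2.0 ** -i
        if mode == "rotation":
            sigma = 1 if z >= 0 else -1        # drive z toward 0
        else:
            sigma = -1 if x * y >= 0 else 1    # drive y toward 0
        x, y, z = x - m * sigma * t * y, y + sigma * t * x, z - sigma * elementary(t)
    return x, y, z
```

For example, circular rotation started at $(1/K, 0)$ with $z_0 = \theta$ yields $(\cos\theta, \sin\theta)$, linear rotation with $y_0 = 0$ yields $y_n \approx x_0 z_0$ (a shift-add multiplication), and hyperbolic rotation started at $(1/K_h, 0)$ yields $(\cosh\theta, \sinh\theta)$.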

We present in this section a detailed description of 2D plane rotation in the circular coordinate system, since it is used in many applications. The CORDIC algorithm computes trigonometric functions, the rotation of a vector, and the angle of a vector by realizing two-dimensional vector rotation in the circular coordinate system. Figure

CORDIC algorithm based 2D vector rotation.

In rotation mode, the input angle

In vectoring mode, the unknown angle of a vector is determined by performing a finite number of microrotations satisfying the relation [
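A minimal floating-point sketch of circular vectoring mode is shown below, assuming $x_0 > 0$ and the sign convention $\sigma_i = -\operatorname{sign}(y_i)$ (our choice). Driving $y$ to zero leaves the scaled magnitude in $x$ and the accumulated angle in $z$; the function name is hypothetical.

```python
import math

def cordic_vectoring(x, y, n=40):
    """Circular vectoring-mode sketch for x > 0.

    Drives y to zero with micro-rotations by +/- atan(2^-i) and accumulates
    the applied angle in z. Returns (magnitude, angle); the magnitude is
    divided by the accumulated gain K.
    """
    z, K = 0.0, 1.0
    for i in range(n):
        t = 2.0 ** -i
        sigma = -1 if y >= 0 else 1              # rotate toward the x-axis
        x, y = x - sigma * t * y, y + sigma * t * x
        z -= sigma * math.atan(t)                # z_n -> atan(y0 / x0)
        K *= math.sqrt(1 + t * t)                # norm growth of this step
    return x / K, z
```

For instance, `cordic_vectoring(3.0, 4.0)` returns approximately `(5.0, 0.9273)`, that is, the vector length and `atan2(4, 3)`.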

The iteration equations of the radix-2 CORDIC algorithm [
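To show that each radix-2 iteration needs only shifts, additions, and a small angle table, here is a fixed-point sketch of rotation mode. The word format (16 fraction bits), the iteration count, and the names are assumptions for illustration; the start vector is pre-scaled by $1/K$ so that the outputs are directly $\cos\theta$ and $\sin\theta$.

```python
import math

FRAC_BITS = 16   # assumed word format: signed fixed point with 16 fraction bits
N_ITER = 16      # assumed number of iterations (one per fraction bit)

# Elementary angles atan(2^-i), quantized once to the fixed-point format.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS)) for i in range(N_ITER)]
# Reciprocal of the gain K = prod sqrt(1 + 2^-2i), also quantized.
INV_K = round((1 << FRAC_BITS) / math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(N_ITER)))

def cordic_rotate_fixed(theta):
    """Return (cos, sin) of theta (|theta| <= pi/2) using only shifts and adds.

    Python's >> on negative ints rounds toward minus infinity, like an
    arithmetic right shift of a two's complement word.
    """
    x, y = INV_K, 0                              # pre-scaled start vector (1/K, 0)
    z = round(theta * (1 << FRAC_BITS))
    for i in range(N_ITER):
        if z >= 0:                               # sigma_i = +1
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:                                    # sigma_i = -1
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    scale = float(1 << FRAC_BITS)
    return x / scale, y / scale
```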

The redundant radix-2 CORDIC [
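The following sketch illustrates why a redundant digit set leads to a variable scale factor. Purely as an assumption for illustration, $\sigma_i \in \{-1, 0, +1\}$ is chosen by an exact comparison of $|z_i|$ with half the elementary angle; hardware methods instead estimate the sign from a few leading redundant digits. Because no norm growth occurs when $\sigma_i = 0$, the gain $K$ depends on the input angle.

```python
import math

def redundant_cordic(theta, n=32):
    """Rotation mode with sigma_i in {-1, 0, +1}; returns (x, y, K).

    Simplified selection (our assumption): sigma_i = 0 whenever
    |z_i| < atan(2^-i) / 2. K accumulates only over nonzero sigma_i,
    so it varies with theta (variable scale factor).
    """
    x, y, z, K = 1.0, 0.0, theta, 1.0
    for i in range(n):
        t, alpha = 2.0 ** -i, math.atan(2.0 ** -i)
        if abs(z) < alpha / 2:
            sigma = 0                       # skip: no rotation, no norm growth
        else:
            sigma = 1 if z > 0 else -1
            K *= math.sqrt(1 + t * t)
        x, y, z = x - sigma * t * y, y + sigma * t * x, z - sigma * alpha
    return x, y, K
```

For example, $\theta = 0.1$ skips the first three iterations while $\theta = 0.9$ does not, so the two inputs end with different values of $K$ and would require different compensation.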

As mentioned above, the speed of CORDIC algorithm implementation can be improved by reducing the number of iterations. The iteration equations for the radix-4 CORDIC algorithm in rotation mode derived at the
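A minimal radix-4 sketch is given below to show how the digit set $\{-2, \dots, 2\}$ roughly halves the number of iterations compared with radix-2. The digit set restriction $\{-1, 0, 1\}$ at $i = 0$, the greedy choice of $\sigma_i$ that minimizes $|z_{i+1}|$, and the input range $|\theta| \le 1$ are our assumptions, not the selection logic of the cited algorithm. As in the redundant radix-2 case, the scale factor depends on the selected digits.

```python
import math

def radix4_cordic(theta, n=16):
    """Radix-4 rotation-mode sketch; returns (cos, sin, K) for |theta| <= 1.

    Assumptions (ours): digits {-2..2} ({-1, 0, 1} at i = 0) and a greedy
    sigma_i minimizing |z_{i+1}|. K depends on the chosen digits.
    """
    x, y, z, K = 1.0, 0.0, theta, 1.0
    for i in range(n):
        digits = (-1, 0, 1) if i == 0 else (-2, -1, 0, 1, 2)
        t = 4.0 ** -i
        sigma = min(digits, key=lambda d: abs(z - math.atan(d * t)))
        # sigma * 4^-i is a shift by 2i (|sigma| = 1) or 2i - 1 (|sigma| = 2).
        x, y = x - sigma * t * y, y + sigma * t * x
        z -= math.atan(sigma * t)
        K *= math.sqrt(1 + (sigma * t) ** 2)
    return x / K, y / K, K
```

Sixteen radix-4 iterations leave a residual angle of roughly $4^{-15}/2$, a precision that would take about 32 radix-2 iterations.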

Scale factor computation and compensation, as well as the convergence and accuracy of the CORDIC algorithm, are presented in the following sections.

The CORDIC rotation steps change the length of the vector in every iteration, distorting the norm of the vector as shown in Figure
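Because each radix-2 micro-rotation with $\sigma_i = \pm 1$ stretches the vector by $\sqrt{1 + 2^{-2i}}$ regardless of the direction, the total gain depends only on the number of iterations. The short computation below (function name ours) tabulates it.

```python
import math

def cordic_gain(n):
    """K_n = prod_{i=0}^{n-1} sqrt(1 + 2^(-2i)); independent of the sigma_i = +/-1 chosen."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))

for n in (1, 2, 4, 8, 16, 32):
    print(f"n={n:2d}  K={cordic_gain(n):.10f}  1/K={1 / cordic_gain(n):.10f}")
```

The product converges quickly to about 1.6468 (1/K ≈ 0.60725), which is why K can be treated as a precomputed constant once the number of iterations is fixed.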

The scale factor compensation technique involves scaling of the final coordinates

Scaling may be done by extending the sequence of CORDIC iterations [
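One way to realize scaling with extra iterations is sketched below: append iterations of the form $v \leftarrow v + s_j 2^{-j} v$ with $s_j \in \{-1, 0, +1\}$, so that $\prod_j (1 + s_j 2^{-j}) \approx 1/K$. The greedy choice of $s_j$ and all names are our assumptions; the cited methods may choose their scaling factors differently.

```python
import math

def cordic_gain(n):
    return math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))

def scaling_iterations(target, count):
    """Greedy s_j in {-1, 0, +1} (j = 1..count) so that prod(1 + s_j 2^-j) ~ target."""
    plan, p = [], 1.0
    for j in range(1, count + 1):
        s = min((-1, 0, 1), key=lambda c: abs(p * (1 + c * 2.0 ** -j) - target))
        if s:
            plan.append((s, j))
            p *= 1 + s * 2.0 ** -j
    return plan

def apply_scaling(v, plan):
    """Extra shift-add iterations v <- v + s * (v >> j) on a fixed-point integer."""
    for s, j in plan:
        v += s * (v >> j)
    return v
```

Each nonzero $s_j$ costs one shift and one addition per coordinate, so the plan length directly gives the extra latency of compensation.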

The CORDIC algorithm involves the rotation of a vector to reduce the

For
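The radix-2 circular iterations can absorb at most the sum of all elementary angles, $\sum_i \tan^{-1}(2^{-i}) \approx 1.743$ rad (about 99.9°), which covers $[-\pi/2, \pi/2]$. A common way (assumed here) to handle larger angles is an exact ±90° pre-rotation, since $(x, y) \mapsto (-y, x)$ needs no arithmetic. The sketch below computes the bound and performs the folding; names are ours.

```python
import math

# Largest |theta| the radix-2 circular iterations can absorb: sum of the
# elementary angles atan(2^-i), i = 0, 1, 2, ...
MAX_ANGLE = sum(math.atan(2.0 ** -i) for i in range(64))   # about 1.7433 rad

def prerotate(x, y, theta):
    """Fold theta into [-pi/2, pi/2] with an exact +/-90 degree pre-rotation."""
    theta = math.remainder(theta, 2 * math.pi)   # now in [-pi, pi]
    if theta > math.pi / 2:
        return -y, x, theta - math.pi / 2        # rotate (x, y) by +90 degrees
    if theta < -math.pi / 2:
        return y, -x, theta + math.pi / 2        # rotate (x, y) by -90 degrees
    return x, y, theta
```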

The accuracy of the CORDIC algorithm is affected by two primary sources of error, namely, angle approximation and rounding error. The error bounds for these two sources of error are derived by a detailed numerical analysis of the CORDIC algorithm [

Theoretically, the rotation angle
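With $n$ iterations the angle is only approximated by $\sum_{i<n} \sigma_i \tan^{-1}(2^{-i})$. A standard convergence argument (using $\tan^{-1}(2x) \le 2\tan^{-1}(x)$) gives $|z_n| \le \tan^{-1}(2^{-(n-1)})$ for $|\theta| \le \pi/2$. The check below, written by us in exact (floating-point) arithmetic so that rounding error is excluded, compares the worst residual over a grid of angles with that bound.

```python
import math

def residual_angle(theta, n):
    """|z_n| after n radix-2 rotation-mode iterations with sigma_i = sign(z_i)."""
    z = theta
    for i in range(n):
        z -= (1 if z >= 0 else -1) * math.atan(2.0 ** -i)
    return abs(z)

def worst_residual(n, samples=10000):
    """Largest |z_n| over a uniform grid of angles in [-pi/2, pi/2]."""
    return max(residual_angle(-math.pi / 2 + k * math.pi / samples, n)
               for k in range(samples + 1))

for n in (8, 12, 16):
    print(n, worst_residual(n), math.atan(2.0 ** -(n - 1)))
```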

The second type of error, called rounding error, is due to truncation of the CORDIC internal variables to the finite length of the storage elements. In addition, scale factor compensation also contributes to this error. In binary code, truncating the intermediate results after every iteration introduces a maximum rounding error of
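The experiment below sketches how truncation accumulates over the iterations. It runs the same rotation twice, once in floating point and once with $x$ and $y$ floored to a given number of fraction bits after every iteration, and reports the largest difference. Keeping $z$ exact in both runs is our simplification so that only the $x/y$ datapath error is measured; a common rule of thumb is to add roughly $\log_2 n$ guard bits to contain this growth.

```python
import math

def cordic_xy(theta, n, frac_bits=None):
    """Rotation mode on (1/K, 0); optionally floor x, y to frac_bits after each step."""
    K = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(n))

    def trunc(v):
        if frac_bits is None:
            return v
        q = 2.0 ** -frac_bits
        return math.floor(v / q) * q             # truncation toward minus infinity

    x, y, z = trunc(1 / K), 0.0, theta
    for i in range(n):
        s = 1 if z >= 0 else -1
        t = 2.0 ** -i
        x, y = trunc(x - s * t * y), trunc(y + s * t * x)
        z -= s * math.atan(t)                    # z path is identical in both runs
    return x, y

def max_rounding_error(n, frac_bits, samples=2000):
    """Largest |fixed - float| over angles in [-1.5, 1.5]."""
    worst = 0.0
    for k in range(samples + 1):
        th = -1.5 + 3.0 * k / samples
        xe, ye = cordic_xy(th, n)
        xq, yq = cordic_xy(th, n, frac_bits)
        worst = max(worst, abs(xq - xe), abs(yq - ye))
    return worst
```

Comparing `max_rounding_error(16, 16)` with `max_rounding_error(16, 20)` shows the effect of four guard bits for a 16-iteration datapath.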

In this section, a few architectures for mapping the CORDIC algorithm into hardware are presented. In general, the architectures can be broadly classified as folded and unfolded, as shown in Figure

Taxonomy of CORDIC architectures.

The CORDIC algorithm has traditionally been implemented using bit serial architecture with all iterations executed in the same hardware [

Unfolded pipelined CORDIC architecture.
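A clock-level behavioral model (not RTL) of an unfolded pipelined datapath is sketched below. Each stage has its shift amount fixed, so no barrel shifter is needed; the first result appears after `N_STAGES` clocks and one result follows per clock thereafter. The stage count, word width, and names are our assumptions.

```python
import math

N_STAGES = 12                      # assumed number of pipeline stages
FRAC = 16                          # assumed fraction bits of the fixed-point words
ANGLES = [round(math.atan(2.0 ** -i) * (1 << FRAC)) for i in range(N_STAGES)]
X0 = round((1 << FRAC) / math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(N_STAGES)))

def stage(i, x, y, z):
    """Stage i of the unfolded datapath; the shift amount i is hard-wired."""
    if z >= 0:
        return x - (y >> i), y + (x >> i), z - ANGLES[i]
    return x + (y >> i), y - (x >> i), z + ANGLES[i]

def run_pipeline(inputs):
    """Clock-by-clock model; returns [(clock, (x, y, z)), ...] as results leave the last stage."""
    regs = [None] * N_STAGES       # regs[i]: pipeline register after stage i
    pending, outputs, clock = list(inputs), [], 0
    while pending or any(r is not None for r in regs):
        clock += 1
        feed = pending.pop(0) if pending else None
        # On each clock edge every stage consumes its predecessor's register.
        srcs = [feed] + regs[:-1]
        regs = [None if s is None else stage(i, *s) for i, s in enumerate(srcs)]
        if regs[-1] is not None:
            outputs.append((clock, regs[-1]))
    return outputs
```

Feeding inputs of the form `(X0, 0, round(theta * (1 << FRAC)))` on consecutive clocks yields outputs at clocks 12, 13, 14, ..., which is the latency/throughput trade-off that unfolding buys with area.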

The implementation of the CORDIC algorithm has evolved over the years from conventional nonredundant to redundant arithmetic to suit the varying requirements of applications. The unfolded implementation with redundant arithmetic initiated the efforts to address the high latency of conventional CORDIC. Subsequently, several modifications have been proposed for the redundant CORDIC algorithm to reduce iteration delay, latency, area, and power. The evolution of the unfolded rotational CORDIC algorithms is shown in Figure

Taxonomy of CORDIC algorithms.

CORDIC is broadly classified as nonredundant CORDIC and redundant CORDIC based on the number system being employed. The major drawback of the conventional CORDIC algorithm [

A significant improvement to the conventional rotational CORDIC algorithm in the circular coordinate system is proposed in [

The low latency nonredundant radix-2 CORDIC algorithm achieves constant scale factor since

Redundant radix-2 CORDIC methods can be classified as variable and constant scale factor methods based on the dependence of scale factor on the input angle. In redundant radix-2 CORDIC [

Taxonomy of constant scale factor redundant radix-2 CORDIC methods.

The scale factor need not be computed for any of the constant scale factor techniques discussed in this section. No specific scale factor compensation technique is considered for these methods; a suitable compensation technique can be chosen depending on the application.

The redundant radix-2 CORDIC using SD arithmetic can be further classified based on the technique employed to achieve constant scale factor (see Figure

The double rotation method performs two rotation-extensions for each elementary angle during the first
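The idea behind double rotation can be illustrated as follows (a simplified sketch; the index ranges and digit selection of the cited method are not reproduced). Each redundant digit $\sigma_i \in \{-1, 0, +1\}$ is realized by two sub-rotations by $\pm\tan^{-1}(2^{-(i+1)})$: $(+,+)$ for $\sigma_i = 1$, $(-,-)$ for $\sigma_i = -1$, and $(+,-)$ for $\sigma_i = 0$. Every pair multiplies the norm by $1 + 2^{-2(i+1)}$ whatever $\sigma_i$ is, so the scale factor is constant. The selection threshold and names are our assumptions.

```python
import math

def double_rotation_cordic(theta, n=30):
    """Rotation mode from (1, 0) with a constant scale factor; returns (x, y, K)."""
    x, y, z, K = 1.0, 0.0, theta, 1.0
    for i in range(n):
        t = 2.0 ** -(i + 1)
        a = math.atan(t)                        # half of the elementary angle
        sigma = 0 if abs(z) < a else (1 if z > 0 else -1)
        subs = {1: (1, 1), -1: (-1, -1), 0: (1, -1)}[sigma]
        for s in subs:                          # two rotation-extensions per digit
            x, y = x - s * t * y, y + s * t * x
            z -= s * a
        K *= 1 + t * t                          # identical for every sigma
    return x, y, K
```

For $\sigma_i = 0$ the two sub-rotations cancel in angle and only scale the vector, which is exactly what keeps $K$ independent of the input.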

This is another method proposed to achieve a constant scale factor for the computation of sine and cosine functions. This method avoids rotation corresponding to

This method implements CORDIC algorithm using SD arithmetic, restricting the direction of rotations

The disadvantage of the branching method is the need to perform two conventional CORDIC iterations in parallel, which roughly doubles the implementation complexity. In addition, one of the modules is idle when branching does not take place. However, this method offers a faster implementation than the double and correcting rotation methods [

The double step branching method enhances the performance of the branching algorithm by improving hardware utilization. This method involves determining two distinct

It is worth discussing here one more classification related to constant scale factor redundant radix-2 CORDIC (see Figure

This algorithm is proposed to reduce the latency of redundant CORDIC [

In the sign estimation methods [

As mentioned earlier, throughput and latency are important performance attributes in CORDIC-based systems. The various radix-2 CORDIC algorithms presented so far may be used to reduce the iteration delay, thereby improving the throughput, with a constant scale factor. Higher radix CORDIC algorithms using SD arithmetic [

Classification of CORDIC algorithms based on the radix.

The scale factor need not be computed for the constant scale factor algorithms discussed in this section. Since no specific scale factor compensation technique is considered for these methods, a suitable one can be chosen depending on the application.

The generalized CORDIC algorithm for any radix in three coordinate systems and implementation of the same in rotation mode of circular coordinate system using radix-4 pipelined CORDIC processor is presented in [

The scale factor

The number of rotations in a redundant radix-2 CORDIC rotation unit is reduced by about 25% by expressing the direction of rotations in radix-2 and radix-4 [

This algorithm achieves constant scale factor, since the rotation corresponding to

A redundant radix-4 CORDIC algorithm using CS arithmetic is proposed to reduce the latency compared to redundant radix-2 CORDIC [

The possible scale factors are precomputed and stored in a ROM. The number of possible scale factors for

The CORDIC algorithms discussed so far have represented

Taxonomy of direction prediction based CORDIC algorithms.

The low latency parallel radix-2 CORDIC architecture presented for the rotation mode [

The sequential procedure in the computation of direction of rotations of the CORDIC algorithm is eliminated by the P-CORDIC algorithm, while maintaining a constant scale factor. This algorithm precomputes the direction of microrotations before the actual CORDIC rotation starts iteratively in the
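The observation usually exploited by direction-precomputation schemes can be sketched as follows; this is our illustration, not a reproduction of the P-CORDIC procedure. Since $\tan^{-1}(2^{-i}) = 2^{-i} - 2^{-3i}/3 + \dots$, for $i \ge k$ with $2^{-3k}/3 < 2^{-n}$ the elementary angles equal $2^{-i}$ to $n$ bits. Then the remaining directions $\sigma_i \in \{\pm 1\}$ follow directly from the bits $b_i$ of $(z_k + c)/2$, where $c = \sum_{i=k}^{n-1} 2^{-i}$ and $\sigma_i = 2b_i - 1$. Here the first $k$ directions are still found by a short loop on $z$ alone, and all directions are known before any $x/y$ rotation starts.

```python
import math

def linear_threshold(n):
    """Smallest k with 2^-3k / 3 < 2^-n, i.e. atan(2^-i) ~ 2^-i to n bits for i >= k."""
    k = 0
    while 2.0 ** (-3 * k) / 3 >= 2.0 ** -n:
        k += 1
    return k

def directions_from_bits(z, k, n):
    """sigma_k..sigma_{n-1} in {-1, +1} read from the bits of (z + c) / 2."""
    c = sum(2.0 ** -i for i in range(k, n))          # sum of the tail weights
    m = round((z + c) / 2 * 2 ** (n - 1))            # in units of 2^-(n-1)
    m = max(0, min(m, 2 ** (n - k) - 1))
    # The bit of weight 2^-i is bit (n - 1 - i) of m.
    return [1 if (m >> (n - 1 - i)) & 1 else -1 for i in range(k, n)]

def precomputed_direction_cordic(theta, n=24):
    """All n directions are fixed before the x/y rotations; returns (cos, sin, k)."""
    k = linear_threshold(n)
    z, sigmas = theta, []
    for i in range(k):                               # short sequential part on z only
        s = 1 if z >= 0 else -1
        sigmas.append(s)
        z -= s * math.atan(2.0 ** -i)
    sigmas += directions_from_bits(z, k, n)          # remaining directions at once
    x, y = 1.0, 0.0
    for i, s in enumerate(sigmas):
        t = 2.0 ** -i
        x, y = x - s * t * y, y + s * t * x
    K = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(n))
    return x / K, y / K, k
```

Because every $\sigma_i$ is $\pm 1$, the scale factor stays constant, in line with the constant scale factor property described for this class of algorithms.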

The scale factor in the implementation of P-CORDIC algorithm remains constant, as

For

The flat CORDIC algorithm is proposed to eliminate the iterative nature in the
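The identity that makes flattening possible is that, once all $\sigma_i$ are known, the $n$ micro-rotations collapse into one complex factor: $x_n + j y_n = (x_0 + j y_0)\prod_i (1 + j\sigma_i 2^{-i})$. Expanding the product gives a sum of terms of weight $2^{-\sum i}$, and terms below the target precision can be dropped. The sketch below verifies the identity and, as an assumption for illustration, flattens by keeping only terms whose index sum is at most a chosen limit; the exact expansion used by the cited hardware is not reproduced.

```python
import math

def sequential(x, y, sigmas):
    """Ordinary n-step micro-rotation sequence, returned as a complex number."""
    for i, s in enumerate(sigmas):
        t = 2.0 ** -i
        x, y = x - s * t * y, y + s * t * x
    return complex(x, y)

def rotation_product(sigmas):
    """P = prod(1 + j*s_i*2^-i): the n micro-rotations as a single complex factor."""
    p = 1 + 0j
    for i, s in enumerate(sigmas):
        p *= 1 + 1j * s * 2.0 ** -i
    return p

def flat_product(sigmas, max_weight):
    """Expand P, keeping only terms 2^-(sum of indices) with index sum <= max_weight."""
    total, kept = 0j, 0

    def walk(i, weight, coeff):
        nonlocal total, kept
        if i == len(sigmas):
            total += coeff * 2.0 ** -weight
            kept += 1
            return
        walk(i + 1, weight, coeff)                            # take the factor 1
        if weight + i <= max_weight:
            walk(i + 1, weight + i, coeff * 1j * sigmas[i])   # take j*s_i*2^-i

    walk(0, 0, 1 + 0j)
    return total, kept
```

With 12 directions and a weight limit of 24, far fewer than the $2^{12}$ possible terms survive, which is what allows a single multi-operand addition to replace the iterative sequence.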

The scale factor in the implementation of the flat CORDIC algorithm is maintained constant, since

The Para-CORDIC parallelizes the generation of direction of rotations

The iterative nature of the conventional CORDIC algorithm implementation is partially eliminated by the semi-flat algorithm. It is designed for the semi-parallelization of the

The computation time and area of the chip are affected by the choice of

This algorithm achieves constant scale factor, since

We present a comparison of latency estimates for the unfolded architectures available in the literature for 2D rotational CORDIC in Table

Comparison of various rotational CORDIC architectures, (SD: signed digit, CS: carry save).

Method (year) | Radix, | Arithmetic | Latency | Iterative | Iterative | Scale factor | |

Nonpipelined | Pipelined | ||||||

Nonredundant (1959) [ | 2's complement | Constant | |||||

Redundant (1987) [ | CS | Variable | |||||

Double-rotation/Correcting (1991) [ | SD | Constant | |||||

Low latency (1992) [ | CS | Constant | |||||

Branching (1993) [ | SD | Constant | |||||

DCORDIC (1996) [ | SD/CS | — | Constant | ||||

Radix-4 (1993) [ | SD | — | Variable | ||||

Radix2-4 (1996) [ | CS | — | Constant | ||||

Radix-4 (1997) [ | CS | — | Variable | ||||

PCORDIC (2002) [ | SD | — | Constant | ||||

Flat CORDIC (2002) [ | SD | 34 for 16-bit, 50 for 32-bit | — | combinational | Constant | ||

Para-CORDIC (2004) [ | CS | — | Constant | ||||

Semi-flat (2006) [ | SD | 33 for 16-bit | — | Constant | |||

Nonredundant low-latency (2008) [ | 2's complement | — | Constant |

The nonpipelined and pipelined implementation of the conventional radix-2 CORDIC algorithm [

The application of redundant arithmetic [

The double rotation and correcting rotation redundant CORDIC methods using SD arithmetic are proposed in [

Low latency CORDIC algorithm [

The branching algorithm using signed digit arithmetic is proposed to achieve a constant scale factor. The latency of the nonpipelined and pipelined implementations of this algorithm is

The direction of rotations computed using the sign estimation methods [

All the methods presented so far reduce the latency by decreasing the iteration delay using redundant arithmetic. Since latency can also be reduced by decreasing the number of iterations, this has motivated the implementation of a radix-4 pipelined CORDIC processor [

The mixed radix CORDIC algorithm [

The advantage of applying radix-4 rotations to all iteration stages is exploited in [

In [

The iterative nature in the

In [

The semi-flat technique is proposed in [

In [

In this paper, we have surveyed the algorithms for unfolded implementation of 2D rotational CORDIC. Special attention has been devoted to a systematic and comprehensive classification of the solutions proposed in the literature. In addition to the pipelined implementation of the nonredundant radix-2 CORDIC algorithm, which has received wide attention in the past, we have discussed the importance of redundant and higher radix algorithms. We have also stressed the importance of prediction algorithms to precompute the directions of rotations and parallelization of

We can now draw final conclusions about the different algorithms for an efficient implementation of application-specific rotational CORDIC. As far as the application of redundant arithmetic to the pipelined implementation of the conventional radix-2 CORDIC algorithm is concerned, the area is doubled, while the adder delay of each stage is reduced from