Design and Implementation of Hybrid CORDIC Algorithm Based on Phase Rotation Estimation for NCO

The numerically controlled oscillator (NCO) is widely used in radar, digital receivers, and software radio systems. This paper first introduces the traditional CORDIC algorithm. To improve computing speed and save resources, it then proposes a hybrid CORDIC algorithm based on phase rotation estimation for use in an NCO. By estimating the direction of part of the phase rotations, the algorithm removes those rotations and their add-subtract units and thereby decreases delay. The NCO is simulated and implemented with the Quartus II and Modelsim software. Simulation results indicate an improvement over the traditional CORDIC algorithm in ease of computation, resource utilization, and computing speed/delay, while the precision is maintained. The algorithm is suitable for high-speed, high-precision digital modulation and demodulation.


Introduction
The numerically controlled oscillator (NCO) is an important part of digital downconversion. It is widely used in radar, wireless transceiver systems, and software radio systems [1][2][3]. The main function of an NCO is to produce two mutually orthogonal, discrete-time streams of sine and cosine samples with variable frequency. Its advantages are high frequency precision and fast response.
NCOs are traditionally implemented with the lookup table method or the polynomial expansion method. The data accuracy of the lookup table method depends on the size of the lookup table ROM. The memory size grows exponentially with the phase word length, which increases resource consumption and reduces the processing speed of the system. The work in [4] addresses this problem by exploiting the odd-even symmetry of the stored waveform to map the storage contents, which reduces the storage resources to 12.5%. Under a high precision requirement, however, this approach still consumes a lot of resources. The polynomial expansion method computes samples in real time; it needs multiplier resources, which restricts the complexity and speed of the hardware. Neither method offers a good trade-off among speed, accuracy, and resources. The coordinate rotation digital computer (CORDIC) algorithm was proposed to solve this problem. It replaces a complex computation with a sequence of elementary operations and is easy to implement in hardware. It requires no hardware multiplier, and all operations are shifts and accumulations, which suits modular and regular hardware design.
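To make the exponential growth of the lookup table concrete, the following minimal sketch counts ROM bits for a single-table NCO. The function name `lut_rom_bits`, the 16-bit sample width, and the one-sample-per-phase-value layout are illustrative assumptions; the divisor of 8 mirrors the 12.5% figure quoted from [4].

```python
def lut_rom_bits(phase_bits: int, data_bits: int, symmetry_divisor: int = 1) -> int:
    """ROM bits for a table holding one data_bits-wide sample per stored phase value.

    symmetry_divisor=8 models storing only one eighth of the period (12.5%).
    """
    entries = (1 << phase_bits) // symmetry_divisor
    return entries * data_bits


if __name__ == "__main__":
    for phase_bits in (12, 16, 20):
        full = lut_rom_bits(phase_bits, 16)
        reduced = lut_rom_bits(phase_bits, 16, symmetry_divisor=8)
        print(f"{phase_bits}-bit phase: full={full} bits, one-eighth={reduced} bits")
```

Under these assumptions each extra phase bit doubles the table, and even the one-eighth table for a 16-bit phase word still needs 131072 bits.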
With the emergence of high-speed broadband receivers, the requirements on data accuracy and processing speed have risen. In this context the traditional CORDIC algorithm has some inherent drawbacks, such as a limited coverage angle and a large number of pipeline stages, which increase resource consumption and limit data processing speed. To address these shortcomings, this paper puts forward an efficient pipeline-architecture CORDIC algorithm for NCO design.

The CORDIC algorithm is used in the Fourier transform, the discrete cosine transform, digital modulation/demodulation, and stream processors [7][8][9][10]. A starting point $(x_0, y_0)$ is rotated step by step through a given phase and gradually approaches the final point $(x_n, y_n)$. The rotation vector diagram is shown in Figure 1.
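From Figure 1, a rotation through the phase $\theta$ gives

$$ x_n = x_0\cos\theta - y_0\sin\theta, \qquad y_n = y_0\cos\theta + x_0\sin\theta. \tag{1} $$

The rotation from the start position to the end position can be carried out in several steps, each of which rotates through only a certain phase $\theta_i$:

$$ x_{i+1} = x_i\cos\theta_i - y_i\sin\theta_i, \qquad y_{i+1} = y_i\cos\theta_i + x_i\sin\theta_i. \tag{2} $$

After extracting $\cos\theta_i$, formula (2) can be expressed as follows:

$$ x_{i+1} = \cos\theta_i\,(x_i - y_i\tan\theta_i), \qquad y_{i+1} = \cos\theta_i\,(y_i + x_i\tan\theta_i). \tag{3} $$

In order to simplify the hardware implementation, the magnitude of each rotation phase is set to $\arctan(2^{-i})$, so that $\theta_i = \sigma_i\arctan(2^{-i})$ with rotation direction $\sigma_i = \pm 1$. The total rotation phase is $\theta = \sum_i \theta_i$, and $\tan\theta_i = \sigma_i 2^{-i}$. Formula (3) can then be expressed as follows:

$$ x_{i+1} = \cos\theta_i\,(x_i - \sigma_i 2^{-i} y_i), \qquad y_{i+1} = \cos\theta_i\,(y_i + \sigma_i 2^{-i} x_i). \tag{4} $$

From formula (4), apart from the $\cos\theta_i$ coefficient, the operation consists only of shifts and additions.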
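In the final result, the $\cos\theta_i$ factors can be eliminated by multiplying by a known constant. For example, when the number of iterations is 16 and $|\theta| \le \pi/4$, the constant $K$ can be expressed as follows:

$$ K = \prod_{i} \cos\theta_i = \prod_{i} \frac{1}{\sqrt{1 + 2^{-2i}}}. $$

In the phase rotation process, the approximate iterative rotation formula is

$$ x_{i+1} = x_i - \sigma_i 2^{-i} y_i, \qquad y_{i+1} = y_i + \sigma_i 2^{-i} x_i. $$

The parameter $z$, with $z_0 = \theta$, holds the residual phase; it determines the rotation direction and is used to judge when the iteration is over:

$$ z_{i+1} = z_i - \sigma_i\arctan(2^{-i}), \qquad \sigma_i = \begin{cases} +1, & z_i \ge 0, \\ -1, & z_i < 0. \end{cases} $$

If the initial value is $(x, y) = (x_0, y_0) = (K, 0)$, then $(x_n, y_n)$ of the $n$th iteration converges to $(\cos\theta, \sin\theta)$. The phase convergence satisfies the CORDIC convergence theorem [6]. The constant scaling factor is fixed and can be precomputed once the precision is determined. An analysis of the calculation accuracy of the traditional CORDIC algorithm relates the iteration number to the phase precision through the minimum resolvable phase $\Delta\theta_{\min} = 2\pi/2^{N}$ (formula (7)), where $N$ is the input phase data width.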
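The iteration described above can be sketched in a few lines. This is a floating-point illustration of textbook rotation-mode CORDIC, not the paper's fixed-point hardware: the function name `cordic_cos_sin`, the start index $i = 0$, and the use of `2.0 ** -i` in place of a hardware right shift are assumptions made for the example.

```python
import math


def cordic_cos_sin(theta, iterations=16):
    """Rotation-mode CORDIC; returns approximations of (cos(theta), sin(theta))."""
    # Gain compensation K: product of cos(arctan(2^-i)) over all stages.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = k, 0.0, theta  # start on the x axis, pre-scaled by K
    for i in range(iterations):
        sigma = 1.0 if z >= 0.0 else -1.0  # rotate so that z moves toward 0
        shift = 2.0 ** -i                  # stands in for a right shift by i bits
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z -= sigma * math.atan(shift)      # arctan values come from a small table in hardware
    return x, y


if __name__ == "__main__":
    c, s = cordic_cos_sin(0.5)
    print(c - math.cos(0.5), s - math.sin(0.5))
```

Because the residual phase after stage $i$ is bounded by $\arctan(2^{-i})$, each additional stage contributes roughly one more bit of accuracy, which is why the stage count dominates both latency and resource usage.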

Hybrid CORDIC Algorithm Based on Phase Rotation Estimation
Common CORDIC structures are the iterative, pipelined, and differential structures. The iterative structure occupies fewer hardware resources, but its data processing efficiency is low. The pipeline structure occupies more hardware resources, but it improves throughput. Implementation schemes built on these two structures include parallel pipelines, hybrid rotation CORDIC, the angle encoding method, and so forth [11,12]. The work in [13] proposes predicting the rotation direction. That algorithm, applied in error analysis and elimination, is fast, but it does not optimize the hardware structure. Using the structure of the parallel hybrid CORDIC algorithm, the prediction scheme of [14] is more regular and simpler than previous approaches and can reduce the number of iterations by more than 50 percent. However, the judgment of the rotation direction is not optimized, which increases latency and resource usage and therefore lowers throughput. The work in [15] puts forward a modified hybrid CORDIC algorithm that improves the precision of the output data, but the method is more complex. Weighing the disadvantages of the above methods against the advantages of the pipeline and iterative structures, this paper simplifies the CORDIC algorithm further. By using a property of the arctangent function, it reduces the number of rotation-direction judgments and add-subtract operations.
The Scientific World Journal

In this paper, attention is focused mainly on techniques that reduce the number of iterations while keeping latency low. The hybrid CORDIC algorithm based on phase rotation estimation is presented in this section; it can be realized with digit-on-line pipelined CORDIC circuits and a repetitive multiple accumulations architecture.

Rotation Phase Estimation.
Assume that the input phase word length of the CORDIC algorithm is $N$ and the number of pipeline stages is $n$. The rotation phase can then be represented as follows:

$$ \theta = \sum_{i} \sigma_i \arctan(2^{-i}), $$

where $\sigma_i = \pm 1$. Note that the initial value of $\sigma$ is 1, because the rotation angle is restricted to the range $|\theta| \le \pi/4$ in the NCO application example.
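The estimation idea rests on the fact that $\arctan(2^{-i})$ becomes indistinguishable from $2^{-i}$ at $N$-bit resolution once $i$ is large enough. One common way to see this, offered here as a plausible reading rather than the paper's own derivation, uses the Taylor expansion $\arctan(x) = x - x^3/3 + \dots$: the linearization error is about $2^{-3i}/3$, which drops below $2^{-N}$ when $i \ge (N - \log_2 3)/3$. The sketch below checks this numerically; the function names are hypothetical.

```python
import math


def estimation_start_index(phase_bits):
    """Smallest i with 2^(-3i) / 3 <= 2^(-phase_bits), i.e. ceil((N - log2 3) / 3)."""
    return math.ceil((phase_bits - math.log2(3)) / 3)


def arctan_linearization_error(i):
    """|2^-i - arctan(2^-i)|: the error made by treating the micro-angle as 2^-i."""
    step = 2.0 ** -i
    return abs(step - math.atan(step))


if __name__ == "__main__":
    n_bits = 16
    m = estimation_start_index(n_bits)
    print("m =", m)
    for i in range(m - 1, m + 2):
        print(i, arctan_linearization_error(i) < 2.0 ** -n_bits)
```

For $N = 16$ this gives $m = 5$, which matches the value $\lceil (16 - \log_2 3)/3 \rceil = 5$ used for the pipeline design later in the paper.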

Rotation Function Optimization and Error Analysis.
In order to obtain cosine data from the new pipeline process, we put forward a unidirectional rotation method that removes the comparator and the selection between addition and subtraction. The estimated phase $\tilde{z}_{m+1}$ must be expressed first. An $N$-bit input phase is iterated $m$ times, and the results can be expressed as $(x_{m+1}, y_{m+1})$. The residual phase at this time is $z_{m+1}$, and $\tilde{z}_{m+1}$ is expressed in terms of the residual phase word. When bit $N-2$ of that word is 1, every bit of the word is flipped and 1 is added. In formula (15), $\tilde{z}_{m+1}$ is unknown; in the hardware implementation it needs to be expressed as a binary word whose bits are each 1 or 0. Substituting all bits of $\tilde{z}_{m+1}$ into (15) and uniting formulas (13) and (17), the last set of rotation phases can be expressed in binary, and the rotation direction is obtained directly from this last set.

One more shift-and-add operation then reduces the number of rotations by $N - m - 3$. While phase and data accuracy are preserved, this reduces resource consumption and improves the operation speed. Finally, with fewer pipeline stages, the constant coefficient $\hat{K}$ changes accordingly. The initial value $(x_0, y_0)$ is now set to $(\hat{K}, 0)$. According to the above process, $(x_{m+2}, y_{m+2})$ converges to $(\cos\theta, \sin\theta)$.
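The general shape of such a hybrid scheme can be sketched as follows. This is a simplified stand-in and not the paper's exact datapath: it runs `m` conventional micro-rotations, then treats the residual phase as a binary word and applies all remaining rotation in a single first-order step, one shift-and-add per set bit and with no direction comparison in between. The function name, the choice `m=6`, the floating-point arithmetic, and the gain constant built from the first `m` stages only are all assumptions of the example.

```python
import math


def hybrid_cos_sin(theta, m=6, phase_bits=16):
    """m conventional micro-rotations, then one merged first-order step on the residual."""
    k_hat = 1.0
    for i in range(m):
        k_hat /= math.sqrt(1.0 + 2.0 ** (-2 * i))  # gain of the m conventional stages only

    x, y, z = k_hat, 0.0, theta
    for i in range(m):
        sigma = 1.0 if z >= 0.0 else -1.0
        shift = 2.0 ** -i
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z -= sigma * math.atan(shift)

    # Residual phase as a sign plus a binary magnitude word (phase_bits fractional bits).
    sign = 1.0 if z >= 0.0 else -1.0
    word = round(abs(z) * (1 << phase_bits))

    # Each set bit contributes one shifted copy of x and y; the bits are read directly
    # from the word, so no comparator is needed between the partial sums.
    dx = dy = 0.0
    for bit in range(phase_bits):
        if (word >> bit) & 1:
            weight = 2.0 ** (bit - phase_bits)
            dx += y * weight
            dy += x * weight
    return x - sign * dx, y + sign * dy


if __name__ == "__main__":
    c, s = hybrid_cos_sin(0.3)
    print(c - math.cos(0.3), s - math.sin(0.3))
```

In this simplified form the merged step ignores second-order terms, so its error is on the order of half the squared residual phase; with `m=6` the residual is at most $\arctan(2^{-5})$ and the error stays below about $5 \times 10^{-4}$.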
The cosine error of this algorithm can be divided into three parts: (1) the quantization error caused by the limited data word length, (2) the approximation error caused by the limited phase word length, and (3) the rotation estimation error introduced by the phase estimation.
The quantization error is inversely related to the word length, and the output word length is set by the number of pipeline stages: the more pipeline stages there are, the lower the quantization error is. However, more pipeline stages consume more resources, so, depending on the data word length, a trade-off between the number of pipeline stages and the quantization error is necessary. Considering hardware consumption, computing speed, and precision, [7] proposes a method for optimizing the number of data bits and pipeline stages. According to [16], the quantization error consists of two parts: the quantization error accumulated in the previous rotations and the error produced in the current rotation. Here $E$ denotes the total quantization error and $e_i$ the quantization error of the $i$th phase rotation, with $\sigma_i = \pm 1$. $|e_i|$ is bounded when each component of $e_i$ is bounded by the rounding error of the output data word.

According to formula (7), when the phase word length is $N$, the phase resolution is $2\pi/2^{N}$. The approximation error produced by the limited phase word length is expressed through the actual value and the error value, where $\Delta\theta$ is the difference between the real phase and the approximate phase. In the final rotation phase estimate, the rotation phase $\arctan(2^{-i})$ is replaced by $2^{-i}$. Similarly, the radian value of $2\pi$ can only be approximated by a binary value. The error generated in formula (16) is given by formula (23).

High Speed and Precision NCO
Structure.
This paper adopts the efficient pipelined CORDIC algorithm for a high-speed, high-precision NCO. Its structure is shown in Figure 2. We take 16-bit control words as an example. First, the inputs are a 16-bit phase control word and a 16-bit frequency control word. Second, the phase accumulator and phase adder output a 16-bit phase value, and the phase map reduces it to the range $[0, \pi/4]$. Third, the shift-add efficient pipeline structure processes the phase data. Finally, according to the previous mapping relation, 16-bit sine and cosine data are generated.

The range of the rotation angle is $[-44.855^\circ, 44.855^\circ]$, which approximates $[-\pi/4, \pi/4]$. It does not cover the full phase range $[0, 2\pi)$. Before the 16-bit phase value is sent into the algorithm, the symmetry of the cosine function is therefore used to evaluate the highest, second highest, and third highest bits. According to a certain mapping relation, the highest 3 bits of the 16-bit phase value are reduced to 3'b000 and the phase is reduced to $[0, \pi/4]$. The highest bit controls the sign of the sine data: if the bit is 1, the algorithm flips the sine data and adds 1; otherwise the data are left unchanged. The highest and second highest bits control the sign of the cosine data: if they differ, the algorithm flips the cosine data and adds 1; otherwise the data are left unchanged. The second highest and third highest bits control the placement of the cosine and sine data: if they differ, the algorithm exchanges the cosine data and sine data; otherwise they are left unchanged.
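The three-bit mapping rules above can be checked with a small model. In this sketch `math.cos` and `math.sin` stand in for the CORDIC core, and the names `octant_map` and `nco_sample` are hypothetical. The text does not spell out how the lower 13 bits are folded in the odd octants; mirroring them about $\pi/4$ is an assumption made here so that the reduced phase always stays in $[0, \pi/4]$.

```python
import math

PHASE_BITS = 16


def octant_map(phase_word):
    """Split a 16-bit phase word into a first-octant word and three control flags."""
    b2 = (phase_word >> 15) & 1  # highest bit
    b1 = (phase_word >> 14) & 1  # second highest bit
    b0 = (phase_word >> 13) & 1  # third highest bit
    low = phase_word & 0x1FFF
    # Assumption: odd octants are mirrored so the reduced phase stays in [0, pi/4].
    reduced = (0x2000 - low) if b0 else low
    negate_sin = b2 == 1   # highest bit set: sine is negative
    negate_cos = b2 != b1  # top two bits differ: cosine is negative
    swap = b1 != b0        # second and third bits differ: exchange cosine and sine
    return reduced, negate_cos, negate_sin, swap


def nco_sample(phase_word):
    """Cosine and sine for a full-range phase word, built from a first-octant core."""
    reduced, negate_cos, negate_sin, swap = octant_map(phase_word)
    phi = reduced * 2.0 * math.pi / (1 << PHASE_BITS)
    c, s = math.cos(phi), math.sin(phi)  # stand-in for the CORDIC core
    if swap:
        c, s = s, c
    if negate_cos:
        c = -c
    if negate_sin:
        s = -s
    return c, s


if __name__ == "__main__":
    for word in (0x0000, 0x2000, 0x6000, 0xA000, 0xE000):
        print(hex(word), nco_sample(word))
```

Applying the exchange before the sign changes matters in this model: the sign flags refer to the final cosine and sine outputs, not to the first-octant values.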

Internal Architecture Design and the Major Implementation Steps.
According to formula (10), our algorithm needs $m = \lceil (16 - \log_2 3)/3 \rceil = 5$ traditional phase rotations and one rotation phase estimation. If the estimated residual phase is $\tilde{z}_6 = 2^{-9} + 2^{-10} + 2^{-13}$, the pipeline structure is as shown in Figure 3. Each level needs only three adder-subtractors, two or six phase shift registers, and a phase coefficient memory, and the structure eliminates more than half of the rotation-direction judgments and shift operations. To shorten the critical path in the pipelined implementation of traditional CORDIC, the differential CORDIC (D-CORDIC) algorithm based on digit-on-line pipelined CORDIC circuits [17] can be used to achieve higher throughput and lower pipeline latency. The D-CORDIC algorithm is equivalent to the usual CORDIC in terms of accuracy as well as convergence. The system architecture uses a parallel, pipelined differential CORDIC architecture to reduce latency and improve throughput. Digit-on-line pipelined CORDIC circuits take the place of the continuous phase accumulation in Figure 3.
From what has been discussed above, the major steps of our algorithm are as follows.
Step 1. The phase rotation is limited to the range $[-\pi/4, \pi/4]$.
Step 2. The traditional or differential CORDIC algorithm implements part of the phase rotation.
Step 3. Using a relatively simple prediction scheme, we divide the original CORDIC rotations into a lower part and a higher part.
Step 4. The differential CORDIC or the traditional architecture is used to compute the rotation direction. The lower part is computed by continuous accumulation or by an online architecture [18] based on differential CORDIC, and the higher part is predicted by rotation phase estimation.
Step 5. According to the phase mapping relationship, the required high-precision, high-speed cosine data are produced.

Table 1 compares the delay of some CORDIC rotation methods. Our proposed algorithm achieves good performance in delay and resource usage. To compare our pipeline CORDIC algorithm fairly with previously proposed methods, we assume that the carry-save adder (CSA) is the universal adder in all algorithms and that fast carry-propagate adders (CPA) are used in the last stage to convert the carry-save form back to the form of the initial input phase value.

In [13], the first $m$ iterations use the traditional continuous comparison method, the same as the traditional CORDIC. The delay increases logarithmically with the maximum number of shifts. If the delay of an $N$-bit carry-propagate adder (CPA) is $\lceil \log_2 N \rceil \cdot t_{FA}$, with $t_{FA}$ the delay of one full adder, the latency of the remaining $(N - m)$ iterations increases linearly with the word length, and the delay is $(4N/3) \cdot t_{FA}$. By the same calculation, the traditional CORDIC based on pipeline architecture has a delay of $\lceil \log_2 N \rceil \cdot N \cdot t_{FA}$. Unlike the above methods, our proposed method reduces the number of iterations and simplifies the datapath. The first $m$ iterations still adopt the traditional CORDIC algorithm, where a delay of $\lceil \log_2 N \rceil \cdot t_{FA}$ is assumed for an $N$-bit CPA. The accumulations of the final iteration use the repetitive multiple accumulations architecture [19], which has much higher throughput and less delay than a serial accumulator or a pipelined adder based on carry-save addition. The delay of the last iteration increases linearly and is $(4k/3) \cdot t_{FA}$, where $k$ is the number of full adders used for the accumulations in the adder-tree architecture.

Simulation Results.
According to the structure shown in Figure 2, the traditional pipeline structure and the efficient pipeline structure based on rotation phase estimation are each implemented in Verilog. The hardware platform is a Cyclone II series EP2C8Q208C8 chip, and the software platform is Altera's Quartus II. Modelsim 10.0 is used to simulate and verify the results. First, the input frequency control word, phase control word, and clock frequency are set to 16'h1999, 0, and 100 MHz, respectively. The output frequency is 10 MHz. The resource usage of the two structures is compared in Table 2.
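The quoted settings can be checked against the usual phase-accumulator relation $f_{out} = \mathrm{FCW} \cdot f_{clk} / 2^{N}$, which is the standard tuning equation for an NCO and is assumed here rather than taken from the paper. The function names below are hypothetical, and the accumulator is modelled as a plain $N$-bit wrapping counter.

```python
def nco_output_frequency(fcw, f_clk_hz, acc_bits=16):
    """Output frequency of a phase-accumulator NCO: fcw * f_clk / 2^acc_bits."""
    return fcw * f_clk_hz / (1 << acc_bits)


def phase_sequence(fcw, pcw=0, acc_bits=16, count=8):
    """First `count` phase words: the accumulator wraps modulo 2^acc_bits each clock."""
    mask = (1 << acc_bits) - 1
    acc = 0
    words = []
    for _ in range(count):
        words.append((acc + pcw) & mask)  # phase adder applies the phase control word
        acc = (acc + fcw) & mask          # phase accumulator adds the frequency control word
    return words


if __name__ == "__main__":
    f_out = nco_output_frequency(0x1999, 100e6)
    print(f"{f_out / 1e6:.4f} MHz")
    print([hex(w) for w in phase_sequence(0x1999, count=4)])
```

With a frequency control word of 16'h1999 and a 100 MHz clock the relation gives about 9.999 MHz, consistent with the 10 MHz output reported for the first test case.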
The comparison in Table 2 shows that our proposed algorithm clearly reduces resource usage. Its precision is the same as that of the traditional CORDIC algorithm, $\Delta\hat{\theta}_{\min} = 2\pi/2^{N}$. The input frequency control word, phase control word, and clock frequency are set to 16 ℎ00 6 and 16 0. The output frequency is 0.3125 MHz. The error statistics obtained by comparing the theoretical values with the experimental values are shown in Figures 4 and 5. Figure 6 shows that the simulation runtime of our proposed algorithm is shorter than that of the traditional CORDIC algorithm.
A comparison of Figures 4 and 5 shows that our proposed algorithm has larger error fluctuation, while the errors of both algorithms stay within $(-5 \times 10^{-4}, 5 \times 10^{-4})$.
Although our algorithm structure reduces the number of logic units, it preserves the accuracy of the cosine data. Figure 7 shows the NCO simulation waveform of the efficient pipeline structure.
It is necessary to determine the effective bits of the phase, the optimum iteration number, and the data width. We repeat the above experiment 200 times, with the random angle value restricted to the range from 0 to $45^\circ$. The effective bits are obtained for iteration numbers of 5 to 8 and data widths of 15, 16, 18, and 21. The relationship of the effective bit number to the iteration number and data width is shown in Table 3. The data unit is the degree.

The algorithm error stays within $(-5 \times 10^{-4}, 5 \times 10^{-4})$ when the iteration number is greater than 6. The experimental results show that the effective bit number is 13. Calculating the minimum number of microrotations shows that the effective bit number is generally seven greater than the iteration number. The total quantization error can be calculated in the same way.

Conclusion
In this paper, a hybrid CORDIC algorithm based on phase rotation estimation is proposed for NCO design. While assuring high-precision output, the algorithm eliminates more than half of the rotation-direction judgments and shift operations. Its resource consumption, operation speed, and system delay are considerably better than those of the traditional CORDIC algorithm. The algorithm is of practical value for electronic countermeasures; it has been successfully used in a high-speed broadband ADS-B receiver and shows good performance.