Optimization and Implementation of Scaling-Free CORDIC-Based Direct Digital Frequency Synthesizer for Body Care Area Network Systems

The coordinate rotation digital computer (CORDIC) is an efficient algorithm for computing trigonometric functions. Scaling-free CORDIC is a well-known CORDIC variant with advantages in speed and area. In this paper, a novel direct digital frequency synthesizer (DDFS) based on scaling-free CORDIC is presented. The proposed multiplier-less architecture, with a small ROM and a pipelined data path, offers high data rate, high precision, high performance, and low hardware cost. The design procedure, with performance and hardware analysis for optimization, is also given. The design is verified by Matlab simulations and then implemented in Verilog on a field programmable gate array (FPGA). The spurious-free dynamic range (SFDR) is over 86.85 dBc, and the signal-to-noise ratio (SNR) is more than 81.12 dB. The scaling-free CORDIC-based architecture is suitable for VLSI implementations of DDFS applications in terms of hardware cost, power consumption, SNR, and SFDR. The proposed DDFS is well suited for medical instruments and body care area network systems.


Introduction
The direct digital frequency synthesizer (DDFS) is widely used in modern communication systems. DDFS is preferable to the classical phase-locked-loop (PLL) based synthesizer in terms of switching speed, frequency resolution, and phase noise, which benefits high-performance communication systems. Figure 1 depicts the conventional DDFS architecture [1], which consists of a phase accumulator, a sine/cosine generator, a digital-to-analog converter (DAC), and a low-pass filter (LPF). Two inputs are used: the reference clock and the frequency control word (FCW). The phase accumulator integrates FCW to produce an angle in the interval [0, 2π), and the sine/cosine generator computes the sinusoidal values. In practice, the sine/cosine generator is implemented digitally and is therefore followed by digital-to-analog conversion and low-pass filtering to produce analogue outputs. Such systems can be applied in many fields, especially in industrial, biological, and medical applications [2][3][4].
In Figure 1, the word length of the phase accumulator is v bits; thus, the period of the output signal is as follows:
$$T_o = \frac{2^{v}\,T_s}{\mathrm{FCW}}, \tag{1}$$
where FCW is the phase increment and T_s denotes the sampling period. The output frequency can be written as
$$F_o = \frac{1}{T_o} = \frac{F_s \cdot \mathrm{FCW}}{2^{v}}, \tag{2}$$
where F_s = 1/T_s is the sampling frequency. According to the equation above, the minimum change of output frequency is given by
$$\Delta F_o = \frac{F_s}{2^{v}}. \tag{3}$$
Thus, the frequency resolution of DDFS depends on the word length of the phase accumulator as follows:
$$v = \left\lceil \log_2 \frac{F_s}{\Delta F_o} \right\rceil. \tag{4}$$
The bandwidth of DDFS is defined as the difference between the highest and the lowest output frequencies. The highest frequency is determined by either the maximum clock rate or the speed of the logic circuitry; the lowest frequency depends on FCW. The spurious-free dynamic range (SFDR) is defined as the ratio of the amplitude of the desired frequency component to that of the largest undesired one at the output of DDFS, and is often expressed in dBc as follows:
$$\mathrm{SFDR} = 20\log_{10}\frac{A_p}{A_s} = 20\log_{10}A_p - 20\log_{10}A_s, \tag{5}$$
where A_p is the amplitude of the desired frequency component and A_s is the amplitude of the largest undesired one.
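As a worked illustration of these relations, the following Python sketch evaluates the output frequency F_s · FCW / 2^v, the frequency resolution F_s / 2^v, the accumulator width needed for a target resolution, and the SFDR in dBc. The function names are hypothetical, and the numerical values (a 500 MHz clock and a 32-bit accumulator) are borrowed from later sections purely for illustration.

```python
import math


def ddfs_output_frequency(fcw: int, f_s: float, v: int) -> float:
    """Output frequency F_o = F_s * FCW / 2**v for a v-bit phase accumulator."""
    return f_s * fcw / (1 << v)


def ddfs_resolution(f_s: float, v: int) -> float:
    """Smallest output-frequency step, Delta F_o = F_s / 2**v."""
    return f_s / (1 << v)


def accumulator_bits(f_s: float, delta_f: float) -> int:
    """Smallest word length v whose resolution is at least as fine as delta_f."""
    return math.ceil(math.log2(f_s / delta_f))


def sfdr_dbc(a_p: float, a_s: float) -> float:
    """SFDR in dBc from the desired amplitude a_p and the largest spur a_s."""
    return 20.0 * math.log10(a_p / a_s)


if __name__ == "__main__":
    f_s = 500e6   # illustrative clock rate (Hz)
    v = 32        # illustrative accumulator width (bits)
    fcw = 1 << 27  # hypothetical control word chosen so that F_o = F_s / 2**5
    print(ddfs_output_frequency(fcw, f_s, v) / 1e6)  # 15.625 (MHz)
    print(ddfs_resolution(f_s, v))                   # roughly 0.116 Hz
```

Because FCW enters the output frequency linearly, doubling FCW doubles F_o, while each additional accumulator bit halves the frequency step.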
In this paper, a novel DDFS architecture based on the scaling-free CORDIC algorithm [34] with ROM mapping is presented. The rest of the paper is organized as follows. In Section 2, CORDIC is reviewed briefly. In Section 3, the proposed DDFS architecture is presented. In Section 4, the hardware implementation of DDFS is given. Conclusion can be found in Section 5.

The CORDIC Algorithm
CORDIC is an efficient algorithm that evaluates various elementary functions, including the sine and cosine functions. As its hardware implementation requires only simple adders and shifters, CORDIC has been widely used in high-speed applications.

The CORDIC Algorithm in the Circular Coordinate System.
A rotation of angle θ in the circular coordinate system can be obtained by performing a sequence of micro-rotations iteratively. Specifically, a vector can be successively rotated through a sequence of predetermined step angles α(i) = tan^{−1}(2^{−i}). This methodology can be applied to generate various elementary functions using only simple adders and shifters. The conventional CORDIC algorithm in the circular coordinate system is as follows [28,29]:
$$\begin{bmatrix} x(i+1) \\ y(i+1) \end{bmatrix} = \begin{bmatrix} 1 & -\sigma(i)2^{-i} \\ \sigma(i)2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x(i) \\ y(i) \end{bmatrix}, \tag{6}$$
$$z(i+1) = z(i) - \sigma(i)\,\alpha(i), \tag{7}$$
where σ(i) ∈ {−1, +1} denotes the direction of the ith micro-rotation, σ(i) = sign(z(i)) with z(i) → 0 in the vector rotation mode [34], σ(i) = −sign(x(i)) · sign(y(i)) with y(i) → 0 in the angle accumulated mode [34], the corresponding scale factor k(i) is equal to \sqrt{1 + σ^2(i) 2^{−2i}}, and i = 0, 1, ..., n − 1. The product of the scale factors after n micro-rotations is given by
$$K_1 = \prod_{i=0}^{n-1} k(i) = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}}. \tag{8}$$
In the vector rotation mode, sin θ and cos θ can be obtained with the initial value (x(0), y(0)) = (1/K_1, 0). More specifically, x_out and y_out are computed from the initial value (x_in, y_in) = (x(0), y(0)) as follows:
$$\begin{bmatrix} x_{\mathrm{out}} \\ y_{\mathrm{out}} \end{bmatrix} = K_1 \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_{\mathrm{in}} \\ y_{\mathrm{in}} \end{bmatrix}. \tag{9}$$

Scaling-Free CORDIC Algorithm in the Circular Coordinate System.
Based on the following approximations of the sine and cosine functions:
$$\sin\alpha(i) \cong \alpha(i) = 2^{-i}, \qquad \cos\alpha(i) \cong 1 - \frac{\alpha^2(i)}{2} = 1 - 2^{-(2i+1)}, \tag{10}$$
the scaling-free CORDIC algorithm is obtained by using (6), (7), and the above. The iterative rotation is as follows:
$$\begin{bmatrix} x(i+1) \\ y(i+1) \end{bmatrix} = \begin{bmatrix} 1 - 2^{-(2i+1)} & -2^{-i} \\ 2^{-i} & 1 - 2^{-(2i+1)} \end{bmatrix} \begin{bmatrix} x(i) \\ y(i) \end{bmatrix}, \qquad z(i+1) = z(i) - 2^{-i}. \tag{11}$$
For a word length of w bits, the implementation of the scaling-free CORDIC algorithm uses four shifters and four adders for each micro-rotation in the first w/2 micro-rotations; this drops to two shifters and two adders for each micro-rotation in the last w/2 micro-rotations [24,34,35], because the term 2^{−(2i+1)} falls below the w-bit precision once i ≥ w/2.
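The conventional rotation-mode iteration can be sketched in a few lines of Python. This is a minimal floating-point illustration, assuming Python floats in place of the fixed-point adders and shifters of a hardware data path and a counterclockwise rotation convention; the function name `cordic_rotate` and the default n = 16 are hypothetical choices, not taken from the paper.

```python
import math


def cordic_rotate(theta: float, n: int = 16):
    """Return (cos(theta), sin(theta)) approximated by n CORDIC micro-rotations.

    Valid for |theta| up to roughly 1.74 rad, the sum of the step angles.
    """
    # Product of the per-stage scale factors sqrt(1 + 2**(-2i)).
    k1 = 1.0
    for i in range(n):
        k1 *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Pre-scaling the start vector by 1/K_1 cancels the gain of the iteration.
    x, y, z = 1.0 / k1, 0.0, theta
    for i in range(n):
        sigma = 1.0 if z >= 0.0 else -1.0   # drive the residual angle z toward 0
        t = 2.0 ** -i                        # a right shift by i bits in hardware
        x, y = x - sigma * t * y, y + sigma * t * x
        z -= sigma * math.atan(t)            # step angle alpha(i) = atan(2**-i)
    return x, y


if __name__ == "__main__":
    c, s = cordic_rotate(0.6)
    print(c - math.cos(0.6), s - math.sin(0.6))  # both differences are small
```

The residual angle after n micro-rotations is bounded by the last step angle, so each additional iteration contributes roughly one more bit of accuracy.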

Design and Optimization of the Scaling-Free CORDIC-Based DDFS Architecture
In this section, the architecture of the proposed DDFS is presented together with its performance analysis. It combines the scaling-free CORDIC algorithm with a LUT; this hybrid approach takes advantage of both CORDIC and LUT to achieve high precision and high data rate, respectively. The proposed DDFS architecture consists of a phase accumulator, a radian converter, a sine/cosine generator, and an output stage.
Figure 2 shows the phase accumulator, which consists of a 32-bit adder that accumulates the phase angle by FCW recursively. At time n, the output of the phase accumulator is φ = (n · FCW)/2^32, and the sine/cosine generator produces sin((n · FCW)/2^32) and cos((n · FCW)/2^32). The load control signal loads FCW into the register, and the reset signal initializes the content of the phase accumulator to zero.
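The behaviour of the phase accumulator can be modelled with a short Python sketch. The class and method names are hypothetical; only the 32-bit width and the load and reset signals come from the description above, and the modulo-2^32 wrap-around is the usual behaviour of a fixed-width adder.

```python
class PhaseAccumulator:
    """Behavioural model of a fixed-width phase accumulator."""

    def __init__(self, width: int = 32):
        self.width = width
        self.mask = (1 << width) - 1
        self.fcw = 0   # frequency control word register
        self.acc = 0   # accumulated phase

    def load(self, fcw: int) -> None:
        """Load control: latch a new frequency control word."""
        self.fcw = fcw & self.mask

    def reset(self) -> None:
        """Reset control: clear the accumulated phase to zero."""
        self.acc = 0

    def clock(self) -> int:
        """One clock cycle: add FCW and wrap around modulo 2**width."""
        self.acc = (self.acc + self.fcw) & self.mask
        return self.acc

    def phase_fraction(self) -> float:
        """Accumulated phase as a fraction of one full output period."""
        return self.acc / (1 << self.width)


if __name__ == "__main__":
    pa = PhaseAccumulator()
    pa.load(1 << 30)                        # a quarter of the full range per clock
    print([pa.clock() for _ in range(4)])   # the fourth value wraps to 0
```

The wrap-around is what makes the output periodic: the accumulator overflows once per output period, so no explicit modulo operation is needed in hardware.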

Radian Converter.
To convert the output of the phase accumulator into its binary representation in radians, the following strategy is adopted. An efficient ROM reduction scheme based on the symmetry of the sinusoidal wave reconstructs the whole wave from its first-quadrant part using simple logic operations. The first two MSBs of an angle indicate its quadrant in the circular coordinate system, and the third MSB indicates the half of the quadrant; thus, the first three MSBs of an angle are used to control the interchange/negation operations in the output stage. As shown in Figure 3, the corresponding angles of φ in the second, third, and fourth quadrants can be mapped into the first quadrant by setting the first two MSBs to zero, giving φ′. The radian value of φ′ is therefore obtained by θ = (π/4)φ′, which can be implemented with the simple shifter-and-adder array shown in Figure 4. Note that the third MSB of any radian value in the upper half of a quadrant is 1, and the sine/cosine of an angle γ in the upper half of a quadrant can be obtained from the corresponding angle in the lower half, as shown in Figure 5. More specifically, as cos γ = sin((π/2) − γ) and sin γ = cos((π/2) − γ), the normalized angle is obtained by replacing θ with θ′ = 0.5 − θ when the third MSB is 1. When the third MSB is 0, no replacement is needed and θ′ = θ.
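The symmetry argument can be checked numerically with the following Python sketch. It is an illustration under stated assumptions rather than the circuit of Figures 3 to 5: the phase word is assumed to be 32 bits wide, the folded angle is normalized as a fraction of one octant (which differs from the normalization used in the text), and the helper names `fold_to_first_octant` and `reconstruct` are hypothetical.

```python
import math


def fold_to_first_octant(acc: int, width: int = 32):
    """Split a phase word into its three MSBs and an angle in [0, pi/4]."""
    frac_bits = width - 3
    octant = acc >> frac_bits                 # two quadrant bits plus the half bit
    r = acc & ((1 << frac_bits) - 1)          # position inside the octant
    if octant & 1:                            # upper half of a quadrant:
        r = (1 << frac_bits) - r              # use the complement pi/2 - gamma
    theta = (math.pi / 4) * r / (1 << frac_bits)
    return octant, theta


def reconstruct(octant: int, cos_t: float, sin_t: float):
    """Rebuild (cos(phi), sin(phi)) from first-octant values by swap/negation."""
    # Upper half of a quadrant: cos(gamma) = sin(theta), sin(gamma) = cos(theta).
    c, s = (sin_t, cos_t) if octant & 1 else (cos_t, sin_t)
    quadrant = octant >> 1
    if quadrant == 0:
        return c, s
    if quadrant == 1:       # phi = pi/2 + gamma
        return -s, c
    if quadrant == 2:       # phi = pi + gamma
        return -c, -s
    return s, -c            # phi = 3*pi/2 + gamma


if __name__ == "__main__":
    acc = 0xA0000000        # octant 5, at the start of that octant
    octant, theta = fold_to_first_octant(acc)
    print(octant, theta, reconstruct(octant, math.cos(theta), math.sin(theta)))
```

Only angles in the first octant ever reach the sine/cosine generator, which is what allows the ROM and the CORDIC stages to cover a range of π/4 instead of 2π.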

Sine/Cosine Generator.
As the core of the DDFS architecture, the sine/cosine generator produces sinusoidal waves based on the output of the radian converter. Without loss of generality, let the output resolution be w = 16 bits; the sine/cosine generator then consists of a cascade of w processors, each of which performs the sub-rotation by a fixed angle of 2^{−i} radian as follows. For i < 8,
$$\begin{aligned} x(i+1) &= \left(1 - \sigma(i)2^{-(2i+1)}\right)x(i) - \sigma(i)2^{-i}\,y(i), \\ y(i+1) &= \left(1 - \sigma(i)2^{-(2i+1)}\right)y(i) + \sigma(i)2^{-i}\,x(i). \end{aligned} \tag{12}$$
For 8 ≤ i < 16,
$$\begin{aligned} x(i+1) &= x(i) - \sigma(i)2^{-i}\,y(i), \\ y(i+1) &= y(i) + \sigma(i)2^{-i}\,x(i), \end{aligned} \tag{13}$$
where σ(i) ∈ {1, 0} represents the positive or zero sub-rotation, respectively. Figure 6 depicts the CORDIC processor-A for the first 7 micro-rotations, which consists of four 16-bit adders and four 16-bit shifters. The CORDIC processor-B, with two 16-bit adders and two 16-bit shifters for the last 9 micro-rotations, is shown in Figure 7. The first m CORDIC stages can be replaced by a simple LUT to shorten the data path, at the cost of hardware complexity that increases exponentially with m. Table 1 lists the hardware cost with respect to the number of replaced CORDIC stages, where each 16-bit adder, 16-bit shifter, and 1-bit memory requires 200 gates, 90 gates, and 1 gate [36], respectively. Figure 8 shows the hardware requirements with respect to the number of replaced CORDIC stages [24]. Figure 9 shows the SFDR and SNR with respect to the number of replaced CORDIC stages [25]. As one can expect from these figures, there is a tradeoff between hardware complexity and performance in the design of DDFS.
Figure 10 shows the architecture of the output stage, which maps the computed sin θ and cos θ to the desired sin φ and cos φ. As mentioned previously, this mapping can be accomplished by simple negation and/or interchange operations. The three control signals xinv, yinv, and swap, derived from the first three MSBs of φ, are shown in Table 2; xinv and yinv control the negation of the outputs, and swap controls the interchange operation.
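The cascade of sub-rotations, with the first m stages replaced by a LUT, can be sketched in Python as follows. This is a floating-point illustration under several assumptions that are not taken from the paper: the ROM is assumed to hold exact cosine/sine pairs for the m leading angle bits, the rotation is taken counterclockwise, the angle is truncated to 16 fractional bits, and the names `build_lut` and `scaling_free_sincos` are hypothetical. The value m = 5 follows the choice reported in the implementation section.

```python
import math

W = 16  # output resolution in bits


def build_lut(m: int):
    """Assumed ROM contents: exact (cos, sin) of k * 2**-m for every m-bit k."""
    return [(math.cos(k * 2.0 ** -m), math.sin(k * 2.0 ** -m))
            for k in range(1 << m)]


def scaling_free_sincos(theta: float, m: int = 5, lut=None):
    """Approximate (cos(theta), sin(theta)) for 0 <= theta < 1 rad."""
    bits = int(theta * (1 << W))        # sigma(1..W): binary digits of theta
    if lut is None:
        lut = build_lut(m)
    x, y = lut[bits >> (W - m)]         # stages 1..m come from the LUT
    for i in range(m + 1, W + 1):
        sigma = (bits >> (W - i)) & 1   # 1: rotate by 2**-i rad, 0: pass through
        if not sigma:
            continue
        t = 2.0 ** -i
        if i < 8:
            # Processor-A: keep the cosine correction term 2**-(2i+1).
            c = 1.0 - 2.0 ** -(2 * i + 1)
            x, y = c * x - t * y, c * y + t * x
        else:
            # Processor-B: the correction term is below the 16-bit precision.
            x, y = x - t * y, y + t * x
    return x, y


if __name__ == "__main__":
    c, s = scaling_free_sincos(0.7)
    print(c - math.cos(0.7), s - math.sin(0.7))  # both differences are small
```

Because σ(i) is simply the ith fractional bit of the angle, no comparison or sign decision is needed between stages, which is what makes the cascade easy to pipeline. A larger m shortens the cascade but doubles the LUT depth for every replaced stage, which is the tradeoff discussed above.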

Hardware Implementation of the Scaling-Free CORDIC-Based DDFS
In this section, the proposed low-power and high-performance DDFS architecture (m = 5) is presented. Figure 11 depicts the system block diagram; the SFDR of the proposed DDFS architecture at the output frequency F_clk/2^5 is shown in Figure 12. As one can see, the SFDR of the proposed architecture is more than 86.85 dBc. A platform for architecture development and verification has also been designed and implemented to evaluate the development cost [37][38][39][40]. The proposed DDFS architecture has been implemented on the Xilinx FPGA emulation board [41]. The Xilinx Spartan-3 FPGA has been integrated with a microcontroller (MCU) and an I/O interface circuit (USB 2.0) to form the architecture development and verification platform. Figure 13 depicts the block diagram and circuit board of the architecture development and evaluation platform, in which the microcontroller reads data and commands from the PC and writes the results back to the PC via the USB 2.0 bus, while the Xilinx Spartan-3 FPGA implements the proposed DDFS architecture. The hardware code in Verilog runs on the PC with the ModelSim simulation tool [42] and the Xilinx ISE smart compiler [43]. The throughput can be improved by using the proposed architecture, while the computation accuracy is the same as that obtained by using the conventional one with the same word length. Thus, the proposed DDFS architecture is able to improve the power consumption and computation speed significantly. Moreover, all the control signals are internally generated on-chip. The proposed DDFS provides both high performance and low hardware cost.
The chip has been synthesized by using the TSMC 0.18 μm 1P6M CMOS cell libraries [44]. The physical circuit has been synthesized by the Astro tool. The circuit has been evaluated by DRC, LVS, and PVS [45]. Figure 14 shows the cell-based design flow. Figure 15 shows the layout view of the proposed scaling-free CORDIC-based DDFS. The core size obtained by the Synopsys design analyzer is 452 × 452 μm². The power consumption obtained by PrimePower is 0.302 mW at a clock rate of 500 MHz and 1.8 V. The tuning latency is 11 clock cycles. All of the control signals are internally generated on-chip. The chip provides both high throughput and low gate count.

Conclusion
In this paper, we present a novel DDFS architecture based on the scaling-free CORDIC algorithm with a small ROM and a pipelined data path. Circuit emulation shows that the proposed high-performance architecture has the advantages of high precision, high data rate, and simple hardware. For a 16-bit DDFS, the SFDR of the proposed architecture is more than 86.85 dBc. As shown in Table 3, the proposed DDFS is superior to previous works in terms of SFDR, SNR, output resolution, and tuning latency [6,17,18,26,27]. Owing to its high performance, the proposed DDFS is well suited for medical instruments and body care area network systems [46][47][48][49]. The proposed DDFS, written in portable Verilog, is a reusable IP that can be implemented in various processes with tradeoffs among performance, area, and power consumption.