
Hardware Implementation of 32-Bit High-Speed Direct Digital Frequency Synthesizer

Salah Hasan Ibrahim (1, 2), Sawal Hamid Md. Ali (2), and Md. Shabiul Islam (3)

(1) Department of Electronics, College of Engineering, Diyala University, Baqubah, Diyala 32001, Iraq (uodiyala.edu.iq)

(2) Department of Electrical, Electronics & System Engineering, Faculty of Engineering, Universiti Kebangsaan Malaysia (UKM), 43600 Bangi, Malaysia (ukm.my)

(3) Institute of Microengineering and Nanoelectronics (IMEN), Universiti Kebangsaan Malaysia (UKM), 43600 Bangi, Malaysia (ukm.my)

Academic Editor: Robert Adams

Research Article. The Scientific World Journal (TSWJ), Volume 2014, Article ID 131568. Hindawi Publishing Corporation. ISSN: 1537-744X, 2356-6140. DOI: 10.1155/2014/131568

Received 4 February 2014; Revised 13 March 2014; Accepted 29 April 2014; Published 2 June 2014

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The design and implementation of a high-speed direct digital frequency synthesizer are presented. A modified Brent-Kung parallel adder is combined with a pipelining technique to improve the speed of the system. A gated clock technique is proposed to reduce the number of registers in the phase accumulator design. The quarter-wave symmetry technique is used to store only one quarter of the sine wave. The ROM lookup table (LUT) is partitioned into three 4-bit sub-ROMs based on an angular decomposition technique and a trigonometric identity. By exploiting the symmetry between sine and cosine together with XOR logic gates, one sub-ROM block can be removed from the design. These techniques compress the ROM to 368 bits. The ROM compression ratio is 534.2 : 1, with only two adders, two multipliers, and XOR gates as additional hardware, and a high frequency resolution of 0.029 Hz. These techniques make the direct digital frequency synthesizer an attractive candidate for wireless communication applications.

1. Introduction

Direct digital frequency synthesis (DDFS) systems with high-speed frequency hopping, low power consumption, high frequency resolution, and small chip area are in great demand, especially in wireless communications and radar systems. DDFS systems have low noise and high frequency resolution compared with phase-locked loops [1]. The DDFS output frequency ($F_{\mathrm{out}}$) is given by

$$F_{\mathrm{out}} = \frac{\mathrm{FCW}}{2^{N}} \times F_{\mathrm{clk}}, \tag{1}$$

where $F_{\mathrm{clk}}$ is the clock frequency, FCW is the frequency control word, and N is the phase accumulator width. Pipelining partitions a given task into a number of subtasks that are performed in sequence. The pipelining technique is used in [2–4] to increase the throughput of the output frequency. However, this technique also increases the power consumption and chip area. The gated clock technique was applied to the pipelined phase accumulator (PA) in [5] to reduce the number of registers without performance degradation. A parallel adder based on the progression-of-states technique was combined with the pipelining technique in [6, 7] to obtain a high-speed, low-power pipelined parallel PA. The pipelined accumulator with a parallel adder was proposed in [8]. In this paper, two blocks of parallel 16-bit ripple carry adder (RCA) are designed based on the progression-of-states technique with two pipelining stages.

DDFS spectral purity depends on the number of phase output bits used for addressing the ROM. Therefore, the higher the required spectral purity, the larger the ROM that is needed. Quarter-wave symmetry was applied in [9, 10] as a simple technique to reduce the ROM size, storing only one quarter (0 : π/2) of the sine wave and using the two most significant bits (MSBs) to generate the full sine wave (2π). Quarter-wave symmetry together with an angular decomposition method based on a trigonometric identity was used in [11–13] to compress the ROM by partitioning it into three sub-ROMs. This technique was also used in [14], where the ROM is partitioned into two ROMs, namely, coarse and fine ROMs, to obtain the quadrature outputs (sine and cosine values). The authors of [14] presented several DDFS systems implemented on different types of FPGA kit boards and compared them with an ASIC-based approach.

The polynomial approximation method was used in DDFS design to optimize the spurious-free dynamic range (SFDR). This method was used by Ashrafi and Adhami in [15] to determine the upper bound of the SFDR using piecewise polynomial interpolation. Fourier series was used to establish a linear relationship between the coefficients of the interpolating polynomials and the frequency spectrum. This method can be used in any kind of DDFS to maximize the SFDR.

De Caro et al. proposed a dual-slope technique in [16] to optimize the piecewise linear approximation for the phase to sine mapping. This technique improves the spectral purity and ROM size. The stored values of the two quarters (π/4) sine and cosine sub-ROMs, with 3 MSB bits and mapper, are required to accomplish the quarter phase to sine mapping and produce quadrature output from the DDFS. In [17], De Caro et al. proposed another DDFS architecture which is based on piecewise linear approximation technique with nonuniform segment length to the input of three groups of multiplexer. This technique can maximize the SFDR and reduce the size of the coefficients ROM.

This paper presents a high-speed DDFS system with a pipelined PA based on a modified parallel Brent-Kung (BK) adder and a gated clock technique. The ROM was reduced by applying quarter-wave symmetry, so that only one quarter of the sine wave is stored, and an angular decomposition technique based on a trigonometric identity was applied to compress the quarter ROM LUT. Based on these techniques, the quarter ROM LUT was partitioned into three sub-ROMs (A, B, and C). The proposed architecture improves the speed of the DDFS and reduces the size of the ROMs.

2. The Gated Clock and Parallel Pipelining Technique for Phase Accumulator

The modified parallel BK adder based on the progression-of-states technique, combined with the gated clock technique, was used in the proposed design of the PA. The frequency resolution ($F_{\mathrm{Res}}$) of the DDFS is determined by the clock frequency ($F_{\mathrm{clk}}$) and the number of input bits N of the PA:

$$F_{\mathrm{Res}} = \frac{F_{\mathrm{clk}}}{2^{N}}. \tag{2}$$

For high frequency resolution, it is preferable to design a PA with a wide FCW input. However, a very large ROM would be required to cover all $2^{N}$ phase values of the accumulator output. For this reason, only the most significant bits of the phase output are used to address the phase-to-amplitude converter (ROM lookup table), while the full accumulator width preserves the high frequency resolution. The pipelining technique was used to increase the throughput of the accumulator, and the throughput grows with the number of pipeline stages, as shown in Figure 1.

Figure 1

The conventional 32-bit pipelining phase accumulator.
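The truncation described above can be sketched as a short behavioral model. This is a minimal Python sketch, not the authors' Verilog design; the function name `phase_addresses` and the default 14-bit address width (taken from the truncated phase width reported later in the paper) are illustrative assumptions.

```python
def phase_addresses(fcw: int, cycles: int, n_bits: int = 32, addr_bits: int = 14) -> list[int]:
    """Behavioral model of an N-bit phase accumulator with MSB truncation.

    The accumulator adds the frequency control word (FCW) every clock cycle
    and wraps modulo 2**n_bits; only the top addr_bits bits address the ROM.
    """
    mask = (1 << n_bits) - 1
    acc = 0
    addresses = []
    for _ in range(cycles):
        acc = (acc + fcw) & mask                      # modulo-2^N accumulation
        addresses.append(acc >> (n_bits - addr_bits))  # keep only the MSBs
    return addresses


if __name__ == "__main__":
    # An FCW of 2**18 advances the 14-bit ROM address by exactly one per cycle.
    print(phase_addresses(1 << 18, 4))
```

The lower 18 bits still accumulate, which is why the frequency resolution is set by the full 32-bit width even though the ROM sees only 14 bits.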

The number of registers increases with the number of pipeline stages, which leads to high power consumption. Therefore, in this design, a gated clock technique was used to reduce the number of preskewing registers while preserving high-speed operation. In this technique, D flip-flops (DFFs) were used to connect each row of the pipeline stages with the FCW input. These registers are clocked by pipelined pulses, each one clock cycle wide, derived from the shifted clock pulses as shown in Figure 2(a). Considering that the phase accumulator has N input bits, the PA was partitioned into L stages with B DFFs in each stage. The number of DFFs, K, used as preskewing registers is given by

$$K = \frac{N(L+1)}{2}. \tag{3}$$

By applying the gated clock technique to the proposed design, the number of DFFs becomes

$$K = N + L. \tag{4}$$

As a result, with the gated clock technique (N = 32 and L = 4), the number of preskewing registers is reduced from 80 to 36, corresponding to a 55% reduction.
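The two register counts can be checked with a few lines of arithmetic. The sketch below assumes N = 32 and L = 4, the values consistent with the counts of 80 and 36 quoted above; the function names are illustrative.

```python
def preskew_registers_conventional(n_bits: int, stages: int) -> int:
    """Preskewing DFF count of the conventional pipeline: K = N * (L + 1) / 2."""
    return n_bits * (stages + 1) // 2


def preskew_registers_gated(n_bits: int, stages: int) -> int:
    """Preskewing DFF count with the gated clock technique: K = N + L."""
    return n_bits + stages


N_BITS, STAGES = 32, 4  # assumed: 32-bit accumulator split into four stages
conventional = preskew_registers_conventional(N_BITS, STAGES)
gated = preskew_registers_gated(N_BITS, STAGES)
saved = conventional - gated  # registers removed by clock gating
```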

Figure 2

(a) The gated clock technique, (b) parallel adder based on progression-of-states technique, and (c) pipelined PA output.

Figure 2(b) shows the operation of the parallel adder. Four adders operate on an 8-bit input word that is held constant for four clock cycles. Holding the FCW for four clock cycles limits the update rate of the frequency input word but does not cause any glitch, because the four adders steadily advance the accumulator output four times. Assume that the FCW is N and the parallel adder output at the Tth clock time is X(T). The parallel adder outputs are then

$$\begin{aligned} X(T+1) &= X(T) + N, \\ X(T+2) &= X(T+1) + N = X(T) + 2N, \\ X(T+3) &= X(T+2) + N = X(T) + 3N, \\ X(T+4) &= X(T+3) + N = X(T) + 4N. \end{aligned} \tag{5}$$

The result of the fourth adder is fed back to the second input of all four adders. To generate the second output X(T+2) and the fourth output X(T+4), the N input is shifted up by one and two bits, respectively: the top one or two bits of the FCW input are removed and replaced with one or two bits from the lower pipelining stage before the addition. The X(T+2) output and the X(T) output are used to generate X(T+3).

This operation allows the frequency tuning word to be held constant for four clock cycles without causing any imperfections in the PA output. The divided clock (Clk/4) drives the multiplexers, which select one of the four results at the output of the PA in turn to compensate for the holding time of the parallel adders, as illustrated in Figure 2(c).
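The progression-of-states idea can be illustrated by comparing a conventional accumulator with a four-way parallel one. The sketch below is a behavioral model only, under the assumption that each of the four adders produces X(T) + k·FCW (k = 1, 2, 3, 4) modulo 2^8 from the same held state; the function names and the 8-bit slice width are illustrative.

```python
MASK = (1 << 8) - 1  # 8-bit adder slice, as in the text


def sequential_accumulator(fcw: int, x0: int, cycles: int) -> list[int]:
    """Reference model: one addition of the FCW per fast clock cycle."""
    outputs, x = [], x0
    for _ in range(cycles):
        x = (x + fcw) & MASK
        outputs.append(x)
    return outputs


def parallel_accumulator(fcw: int, x0: int, cycles: int) -> list[int]:
    """Four adders work from the same held state X(T) at Clk/4.

    Adder k produces X(T) + k*FCW; the multiplexer emits the four results
    one per fast clock cycle, and the fourth result is fed back as the
    next held state.
    """
    outputs, x = [], x0
    for _ in range(cycles // 4):
        block = [(x + k * fcw) & MASK for k in (1, 2, 3, 4)]
        outputs.extend(block)  # multiplexer selects one result per cycle
        x = block[-1]          # feedback from the fourth adder
    return outputs
```

Both models produce the same output sequence, while each adder in the parallel version has four fast clock periods to complete its addition.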

3. Phase Accumulator Architecture

The proposed phase accumulator architecture based on the modified parallel BK adder and the gated clock technique with pipelining stages is shown in Figure 3. The output of the PA is a truncated 14-bit value formed from the 8 bits of the top pipelining stage and 6 bits of the second pipelining stage.

Figure 3

Pipelined phase accumulator with parallel adder based on the modified Brent-Kung adder and gated clocking technique.

The 32-bit FCW input is sufficient to obtain a frequency resolution of 0.029 Hz from the 125 MHz clock frequency of the Cyclone III FPGA kit board ($F_{\mathrm{Res}} = 125 \times 10^{6} / 2^{32} = 0.029$ Hz).

3.1. Modified Brent-Kung Adder

The BK adder is fast, and all carries are computed simultaneously through a binary tree of “BK” cells as shown in Figure 4(a).

Figure 4

Block circuit diagram of (a) the conventional Brent-Kung adder and (b) the modified Brent-Kung adder.

BK cells compute the carry for two or more full adder (FA) cells, and they are calculated as a sum of G and P cells [23, 24]. The arithmetic operation is given by

$$\mathrm{BK} = G + P, \qquad G = g'' + p'' \cdot g', \qquad P = p'' \cdot p', \tag{6}$$

where $g''$, $g'$ and $p''$, $p'$ are the higher and lower generate and propagate functions, respectively. Here p is the propagate function ($p_i = x_i \oplus y_i$) and g is the generate function ($g_i = x_i \cdot y_i$).

The general prefix addition algorithm is explained by Zimmermann in [25]. By adding the carry input $C_{\mathrm{in}}$ to the prefix structure with some modifications, the prefix structure can be used in pipelining-based adder design. This approach is used in the BK adder for fast carry computation. In this paper, a modification to the BK adder is proposed so that it can be used in a pipelined architecture. The modification removes the $g_0$ operation and obtains the carry out of the first bit, $C_1$, from a 2-to-1 multiplexer. The data inputs of this multiplexer are $x_0$ and $C_{\mathrm{in}}$, the select input is $p_0$, and the output is $C_1$. The operation of the multiplexer is given by

$$C_1 = (p_0 \cdot C_{\mathrm{in}}) + (x_0 \cdot \overline{p_0}), \tag{7}$$

where $C_1$ is the carry out, $p_0$ is the propagate function, $C_{\mathrm{in}}$ is the carry input, and $x_0$ is the first bit input. The proposed modification of the 8-bit BK adder is shown in Figure 4(b). The sum and carry out of the modified 8-bit BK adder are

$$S_0 = p_0 \oplus C_{\mathrm{in}}, \qquad S_{N \cdots 1} = p_{(N-1) \cdots 1} \oplus C_{(N-1) \cdots 1}, \qquad C_{\mathrm{out}} = g_N + p_N \cdot C_N. \tag{8}$$
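The multiplexer-based first carry can be checked against ordinary integer addition with a small bit-level model. The sketch below is an illustrative assumption, not the authors' circuit: it takes $C_1$ from the multiplexer of (7) and then ripples the remaining carries serially with the generate and propagate signals, whereas the Brent-Kung tree computes the same carries in parallel. The function name `modified_adder` is hypothetical.

```python
def modified_adder(x: int, y: int, cin: int, width: int = 8) -> tuple[int, int]:
    """Bit-level adder model using generate/propagate signals.

    C1 comes from a 2-to-1 multiplexer selected by p0 (no g0 term);
    the remaining carries use C[i+1] = g[i] | (p[i] & C[i]).
    Returns (sum, carry_out).
    """
    xs = [(x >> i) & 1 for i in range(width)]
    ys = [(y >> i) & 1 for i in range(width)]
    p = [a ^ b for a, b in zip(xs, ys)]  # propagate: p_i = x_i XOR y_i
    g = [a & b for a, b in zip(xs, ys)]  # generate:  g_i = x_i AND y_i

    carries = [cin]                           # carries[i] is the carry into bit i
    carries.append(cin if p[0] else xs[0])    # multiplexer for C1
    for i in range(1, width):
        carries.append(g[i] | (p[i] & carries[i]))

    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, carries[width]
```

When $p_0 = 0$ the two input bits are equal, so $x_0$ itself is the carry; when $p_0 = 1$ the incoming carry propagates, which is why the $g_0$ term is not needed.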

3.1.1. Comparison of Different Adder Architectures

An adder is a key element of the pipelining PA design, and a fast adder improves PA performance. Parallel-prefix adder tree structures such as Sklansky [26], Kogge-Stone adder [27], BK [23], and Beaumont-Smith [28] have been used in pipelining accumulator design for high-speed operation.

A comparison has been made between a conventional adder and several parallel-prefix adders for 12-bit, 18-bit, 24-bit, and 32-bit operation. The PA designs were coded in Verilog HDL, simulated using ALTERA Quartus II, and then verified on a Cyclone III FPGA kit board. The comparison result is shown in Figure 5. The figure shows that the BK adder is relatively faster, especially for large bit widths.

Figure 5

Comparison result of maximum operating frequency (FMax) for phase accumulator design with CLA, Brent-Kung, Kogge-Stone, Sklansky, Beaumont-Smith, and RCA adders.

4. ROM Lookup Table Design

The ROM LUT, or phase-to-amplitude converter (PAC), is the memory block of the DDFS that converts the phase signal into a sine wave amplitude signal. A high-accuracy DDFS output signal requires a large LUT, and the ROM size increases exponentially with the number of input bits. The designer's challenge is to reduce the ROM size while maintaining high performance (high resolution and high speed). The quarter-wave symmetry technique is used in the proposed DDFS design so that the ROM stores only one quarter (0 : π/2) of the sine waveform, and the two most significant bits (MSBs) from the phase accumulator are used to reconstruct the full sine wave. One of these two MSBs determines whether the sine amplitude is increasing or decreasing, and the other determines its sign.

The phase output is used directly in the first and third quarters, while the inverted values of the phase output are used in the second and fourth quarters. Producing the full sine wave requires a 2's complement when the phase is between π and 2π, which is normally achieved by adding a full adder at the output gate. To save power and achieve a smaller area, a 1/2 LSB offset is added to the values stored in the sub-ROMs. This offset removes the full adder from the 2's complement operation in the proposed design.
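The quadrant logic can be sketched in a few lines. This is a minimal floating-point model under stated assumptions: a 14-bit phase word, a quarter table sampled with a 1/2 LSB phase offset so that a plain bitwise complement of the address mirrors the table exactly, and arithmetic negation standing in for the hardware sign handling. The names `QUARTER_LUT` and `full_sine` are illustrative.

```python
import math

PHASE_BITS = 14               # truncated phase width
ADDR_BITS = PHASE_BITS - 2    # two MSBs select the quadrant
QUARTER = 1 << ADDR_BITS

# Quarter-wave table sampled at mid-step (1/2 LSB offset).
QUARTER_LUT = [math.sin((math.pi / 2) * (i + 0.5) / QUARTER) for i in range(QUARTER)]


def full_sine(phase: int) -> float:
    """Rebuild the full sine wave from the quarter table and two MSBs."""
    sign_bit = (phase >> (PHASE_BITS - 1)) & 1    # second half of the period
    mirror_bit = (phase >> (PHASE_BITS - 2)) & 1  # amplitude is decreasing
    addr = phase & (QUARTER - 1)
    if mirror_bit:
        addr ^= QUARTER - 1  # 1's complement of the address mirrors the quarter
    value = QUARTER_LUT[addr]
    return -value if sign_bit else value
```

With the mid-step sampling, the mirrored address lands exactly on the symmetric sample, so no adder is needed to form a 2's complement of the address.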

The angular decomposition technique based on a trigonometric identity is one of the best techniques to reduce the quarter ROM LUT size. The quarter ROM LUT was partitioned into three parts (A, B, and C), such that $A < \pi/2$, $B < (\pi/2)(1/2^{A})$, and $C < (\pi/2)(1/2^{A+B})$, with the same approximations based on the trigonometric identity. In the exponents, A and B denote the address widths of the corresponding sub-ROMs.

According to the trigonometric relation, the sine function is given by

$$\sin(A+B+C) = \sin(A+B)\cos C + \cos A \cos B \sin C - \sin A \sin B \sin C, \tag{9}$$

where

$$\sin(A+B) = \sin A \cos B + \cos A \sin B. \tag{10}$$

Equation (10) indicates the presence of four separate sub-ROMs, with $2^{A}$ and $2^{B}$ entries, for the sine and cosine of A and B. The calculated values of $\sin A$ and $\cos A$ show that they are inversely symmetrical. Based on this, $\cos A$ can be obtained by complementing the $\sin A$ values, which is achieved by connecting the $\sin A$ output and the high Vcc level to the inputs of XOR logic gates. In this way, only one sub-ROM is needed for the $\sin A$ and $\cos A$ values. The equation for the $\cos B$ calculation is

$$\cos B = \cos\left(\left(\frac{\pi}{2} \times \frac{[0 : (2^{B}-1)]}{2^{B}}\right) \times \frac{1}{2^{B}}\right). \tag{11}$$

With a 4-bit input fed into sub-ROM B (B = 4), $\cos B = \cos(0.046019) = 0.999999677 \cong 1$. Thus, the $\cos B$ ROM block was also removed from the proposed design. Therefore, (10) reduces to

$$\sin(A+B) = \sin A + \cos A \sin B. \tag{12}$$

With the three values A, B, and C, (9) may be rewritten as

$$\sin(A+B+C) = \sin A + \cos A \sin B + \cos A \sin C. \tag{13}$$

The change in the proposed ROM LUT design after hardware reduction is shown in Figure 6.

Figure 6

Architecture of compressed ROM LUT design.
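The approximation sin(A+B+C) ≈ sin A + cos A sin B + cos A sin C can be evaluated numerically to see how it behaves over the quarter wave. The sketch below is a floating-point experiment, not the authors' hardware: it assumes three 4-bit address fields mapped to angles as A = (π/2)·a/2^4, B = (π/2)·b/2^8, and C = (π/2)·c/2^12, and it computes cos A directly rather than through the XOR-based complement. All function names are illustrative.

```python
import math

FIELD_BITS = 4  # assumed width of each of the A, B, and C address fields


def decomposed_angles(a: int, b: int, c: int, bits: int = FIELD_BITS):
    """Map the three address fields to the coarse, medium, and fine angles."""
    quarter = math.pi / 2
    return (
        quarter * a / 2 ** bits,
        quarter * b / 2 ** (2 * bits),
        quarter * c / 2 ** (3 * bits),
    )


def approx_sine(a: int, b: int, c: int) -> float:
    """Approximation: sin A + cos A * sin B + cos A * sin C."""
    A, B, C = decomposed_angles(a, b, c)
    return math.sin(A) + math.cos(A) * math.sin(B) + math.cos(A) * math.sin(C)


def exact_sine(a: int, b: int, c: int) -> float:
    """Reference value sin(A + B + C)."""
    return math.sin(sum(decomposed_angles(a, b, c)))


def max_abs_error() -> float:
    """Largest absolute deviation over all 2^12 quarter-wave addresses."""
    n = 2 ** FIELD_BITS
    return max(
        abs(approx_sine(a, b, c) - exact_sine(a, b, c))
        for a in range(n) for b in range(n) for c in range(n)
    )
```

Calling `max_abs_error()` gives the worst-case deviation of this idealized model before any amplitude quantization is applied.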

The error that results from the approximation in (9) is $2\pi/2^{14}$, which is approximately $3.83 \times 10^{-4}$. The ROM size required for a 14-bit phase input with 12-bit output is $2^{14} \times 12 = 196608$ bits. Applying the sine wave approximation based on the trigonometric identity with three 4-bit ROMs requires only 368 bits, that is, $2^{4} \times 11 = 176$ bits for ROM A, $2^{4} \times 8 = 128$ bits for ROM B, and $2^{4} \times 4 = 64$ bits for ROM C. The compression ratio (196608/368) is 534.2 : 1, with only two adders, two multipliers, and XOR gates as additional hardware.
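The ROM sizes and the compression ratio follow from simple arithmetic, which the sketch below reproduces. The variable names are illustrative; the word widths (11, 8, and 4 bits) are those stated above.

```python
# Uncompressed table: 2^14 phase addresses, 12-bit amplitude words.
full_rom_bits = 2 ** 14 * 12

# Three 4-bit sub-ROMs with 11-, 8-, and 4-bit words.
sub_rom_bits = {"A": 2 ** 4 * 11, "B": 2 ** 4 * 8, "C": 2 ** 4 * 4}
compressed_bits = sum(sub_rom_bits.values())

compression_ratio = full_rom_bits / compressed_bits
```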

The final design of the high-speed DDFS, which consists of parallel pipelining PA and compressed ROM LUT by using the wave symmetry technique, is shown in Figure 7.

Figure 7

The block diagram of the final design of high-speed DDFS.

5. Hardware Implementation of High-Speed DDFS

The proposed design of the high-speed DDFS with 0.029 Hz frequency resolution was coded in Verilog HDL, successfully simulated in ALTERA Quartus II software, and implemented with a Cyclone III FPGA kit board.

The proposed DDFS has been verified using a spectrum analyzer. The measured results are consistent with the simulated results. The DDFS design accumulates the FCW input into the sawtooth PA output, and the PA output feeds into the phase-to-amplitude converter (ROM LUT). The full amplitude waveform at the output of the ROM lookup table is shown using the Signal Tap logic analyzer in Figure 8.

Figure 8

Signal Tap logic analyzer capture of the FCW input (in hexadecimal), the sawtooth PA output, and the half and full amplitude waveforms.

6. Result and Discussion

The 32-bit DDFS has been implemented on the Cyclone III FPGA kit board. The 32-bit phase accumulator output was truncated to 14 bits. A total of 196608 ($2^{14} \times 12$) bits would be required to store the full sine wave signal with 12-bit output. Quarter-wave symmetry and an angular decomposition technique based on a trigonometric identity were applied, and the ROM was compressed to only 368 bits, a ratio of 534.2 : 1. The measured DDFS output waveforms and spectra illustrate the purity of the sine wave output, as shown in Figure 9 for a frequency tuning word of 1FFFFFFF (hexadecimal) and $F_{\mathrm{clk}}$ = 125 MHz (the Cyclone III kit board clock frequency). The calculated output frequency is $F_{\mathrm{out}} = (\mathrm{FCW}/2^{N}) \times F_{\mathrm{clk}} = ((2^{29}-1)/2^{32}) \times 125 \times 10^{6} = 15.624999$ MHz. The output frequency measured with the oscilloscope is 15.63 MHz, which closely matches the calculated result.

Figure 9

The sine wave signal of the high-speed DDFS.
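The output frequency and the frequency resolution quoted in this paper can be reproduced directly from the two formulas. The sketch below assumes the 125 MHz clock and 32-bit accumulator stated in the text; the function names are illustrative.

```python
F_CLK_HZ = 125e6  # Cyclone III kit board clock frequency
ACC_BITS = 32     # phase accumulator width N


def frequency_resolution(f_clk: float = F_CLK_HZ, n_bits: int = ACC_BITS) -> float:
    """Smallest frequency step: F_clk / 2^N."""
    return f_clk / 2 ** n_bits


def output_frequency(fcw: int, f_clk: float = F_CLK_HZ, n_bits: int = ACC_BITS) -> float:
    """Synthesized frequency: (FCW / 2^N) * F_clk."""
    return fcw / 2 ** n_bits * f_clk


if __name__ == "__main__":
    print(frequency_resolution())        # resolution in Hz
    print(output_frequency(0x1FFFFFFF))  # output frequency in Hz
```

Because 1FFFFFFF is one step below 2^29, the result sits one resolution step below 15.625 MHz.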

The signal-to-noise ratio (SNR) was approximated based on

$$\mathrm{SNR} = -6.02M - 1.76\ \mathrm{dB} \tag{14}$$

(see [29]), where M represents the number of bits used to feed the digital-to-analog converter. For the proposed DDFS, M = 12, and the calculated SNR is 74 dB in magnitude.

The measured DDFS output from the spectrum analyzer shows an SNR of approximately 68 dB, as shown in Figures 10(a) and 10(b). This is about 6 dB less than the calculated result, which is attributed to noise from the wire connections.

Figure 10

The measured signal-to-noise ratio for the DDFS waveform output (a and b).

Table 1 compares the ROM size and other parameters of the proposed design with previous works. The table shows that the proposed design has the smallest ROM size.

Table 1

Comparison of ROM size with previous DDFS works.

Design	Phase accumulator (bit)	Truncated phase (bit)	Amplitude (bit)	ROM size (bit)	Compression ratio	SFDR (dB)
Sunderland et al., 1984 [11]	20	14	12	3328	59.1 : 1	72
Nicholas et al., 1988 [12]	31	15	14	3072	149.3 : 1	90
Curticǎpean and Niittylahti, 2001 [18]	28	14	12	832	236 : 1	84
De Caro and Strollo, 2005 [19]	24	14	12	480	409.6 : 1	83.6
Yang et al., 2004 [8]	32	14	12	2176	90.35 : 1	NA
Chimakurthy et al., 2006 [20]	15	15	15	1216	404 : 1	90.3
Babak and Keshavarzi, 2009 [21]	32	16	14	1664	551.3 : 1	85.3
De Caro et al., 2008 [22]	24	14	12	672	292.5 : 1	80
This work	32	14	12	368	534.2 : 1	68

Note: in this work the measured DDFS output waveform is in signal-to-noise ratio (SNR).

7. Conclusion

A 32-bit high-speed DDFS system was designed with 0.029 Hz frequency resolution. The system was successfully simulated in ALTERA Quartus II software. The phase accumulator together with the ROM lookup table was implemented on the Cyclone III FPGA kit and verified with a Signal Tap logic analyzer. The complete DDFS system with a digital-to-analog converter was also implemented on the Cyclone III FPGA kit board, and its performance was measured using an oscilloscope and a spectrum analyzer. Parallel pipelining with clock gating was applied to a modified BK adder for the proposed phase accumulator design. Wave symmetry and an angular decomposition technique based on a trigonometric identity were used to reduce the ROM size. The ROM was compressed to only 368 bits, a ratio of 534.2 : 1. The proposed design improves the operation speed and reduces the ROM size significantly.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] Tierney, Rader C. M., Gold. A digital frequency synthesizer. IEEE Transactions on Audio and Electroacoustics, vol. 19, pp. 48–57, 1971.

[2] Geng, Dai F. F., Irwin J. D., Jaeger R. C. An 11-bit 8.6 GHz direct digital synthesizer MMIC with 10-bit segmented sine-weighted DAC. IEEE Journal of Solid-State Circuits, vol. 45, no. 2, pp. 300–313, 2010. doi: 10.1109/JSSC.2009.2037542

[3] Jensen B. S., Khafaji M. M., Johansen T. K., Krozer, Scheytt J. C. Twelve-bit 20-GHz reduced size pipeline accumulator in 0.25 μm SiGe:C technology for direct digital synthesiser applications. IET Circuits, Devices and Systems, vol. 6, no. 1, pp. 19–27, 2012. doi: 10.1049/iet-cds.2010.0399

[4] Ibrahim S. H., Ali S. H. M., Islam M. S. Design a 24-bits pipeline phase accumulator for direct digital frequency synthesizer. Proceedings of the International Symposium on Instrumentation and Measurement, Sensor Network and Automation (IMSNA '12), pp. 393–397, August 2012. doi: 10.1109/MSNA.2012.6324603

[5] Chen, Zhou, Jin, Liu. A 5.3-GHz 32-bit accumulator designed for direct digital frequency synthesizer. Chinese Science Bulletin, vol. 57, no. 19, pp. 2480–2487, 2012. doi: 10.1007/s11434-012-5157-4

[6] Yang B.-D., Kim L.-S., H.-K. A high speed direct digital frequency synthesizer using a low power pipelined parallel accumulator. Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 373–376, May 2002.

[7] Chappell, McEwan. A low power high speed accumulator for DDFS applications. Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 797–800, May 2004.

[8] Yang B.-D., Choi J.-H., Han S.-H., Kim L.-S., H.-K. An 800-MHz low-power direct digital frequency synthesizer with an on-chip D/A converter. IEEE Journal of Solid-State Circuits, vol. 39, no. 5, pp. 761–774, 2004. doi: 10.1109/JSSC.2004.826323

[9] Rekha K. R., Nagabushan B. S., Nataraj K. R. FPGA implementation of low power digital frequency synthesizer. International Journal of Engineering Science and Technology, vol. 2, no. 10, pp. 5385–5394, 2010.

[10] Langlois J. M. P., Al-Khalili. Low power direct digital frequency synthesizers in 0.18 μm CMOS. Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 283–286, September 2003.

[11] Sunderland D. A., Strauch R. A., Wharfield S. S., Peterson H. T., Cole C. R. CMOS/SOS frequency synthesizer LSI circuit for spread spectrum communications. IEEE Journal of Solid-State Circuits, vol. 19, no. 4, pp. 497–506, 1984.

[12] Nicholas H. T., Samueli, Kim. Optimization of direct digital frequency synthesizer performance in the presence of finite word length effects. Proceedings of the 42nd Annual Frequency Control Symposium, pp. 357–363, June 1988.

[13] Khan M. N., Imran M. S., Rehan, Hai. High speed direct digital frequency synthesizer (DDFS) architecture with reduced ROM structure. Proceedings of the Student Conference on Engineering Sciences and Technology (SCONEST '05), pp. 1–5, August 2005. doi: 10.1109/SCONEST.2005.4382873

[14] Mariangela Genovese E. N., De Caro, Petra, Strollo A. G. M. Analysis and comparison of direct digital frequency synthesizers implemented on FPGA. Integration, the VLSI Journal, vol. 47, pp. 261–271, 2014.

[15] Ashrafi, Adhami. Theoretical upperbound of the spurious-free dynamic range in direct digital frequency synthesizers realized by polynomial interpolation methods. IEEE Transactions on Circuits and Systems I, vol. 54, no. 10, pp. 2252–2261, 2007. doi: 10.1109/TCSI.2007.904660

[16] De Caro, Strollo A. G. M. High-performance direct digital frequency synthesizers in 0.25 μm CMOS using dual-slope approximation. IEEE Journal of Solid-State Circuits, vol. 40, no. 11, pp. 2220–2227, 2005. doi: 10.1109/JSSC.2005.857371

[17] De Caro, Petra, Strollo A. G. M. Direct digital frequency synthesizer using nonuniform piecewise-linear approximation. IEEE Transactions on Circuits and Systems I, vol. 58, no. 10, pp. 2409–2419, 2011. doi: 10.1109/TCSI.2011.2123730

[18] Curticǎpean, Niittylahti. A hardware efficient direct digital frequency synthesizer. Proceedings of the 8th IEEE International Conference on Electronics, Circuits and Systems (ICECS '01), pp. 51–54, September 2001.

[19] De Caro, Strollo A. G. M. High-performance direct digital frequency synthesizers using piecewise-polynomial approximation. IEEE Transactions on Circuits and Systems I, vol. 52, no. 2, pp. 324–337, 2005. doi: 10.1109/TCSI.2004.841592

[20] Chimakurthy L. S. J., Ghosh, Dai F. F., Jaeger R. C. A novel DDS using nonlinear ROM addressing with improved compression ratio and quantization noise. IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, vol. 53, pp. 274–283, 2006.

[21] Babak, Keshavarzi. A novel DDFS based on trigonometric approximation with a scaling block. Proceedings of the 6th International Conference on Information Technology: New Generations (ITNG '09), pp. 102–106, 2009.

[22] De Caro, Petra, Strollo A. G. M. Reducing lookup-table size in direct digital frequency synthesizers using optimized multipartite table method. IEEE Transactions on Circuits and Systems I, vol. 55, no. 7, pp. 2116–2127, 2008. doi: 10.1109/TCSI.2008.918008

[23] Brent R. P., Kung H. T. Regular layout for parallel adders. IEEE Transactions on Computers, vol. 31, no. 3, pp. 260–264, 1982.

[24] Bazargan. EE 5324 - VLSI Design II: Adders. University of Minnesota, 2006.

[25] Zimmermann. Binary adder architectures for cell-based VLSI and their synthesis [Doctor of technical sciences]. Swiss Federal Institute of Technology in Zurich, 1997.

[26] Sklansky. Conditional-sum addition logic. IRE Transactions on Electronic Computers, vol. EC-9, pp. 226–231, 1960.

[27] Kogge P. M., Stone H. S. A parallel algorithm for the efficient solution of a general class of recurrence equations. IEEE Transactions on Computers, vol. 22, no. 8, pp. 786–793, 1973.

[28] Beaumont-Smith, Lim C.-C. Parallel prefix adder design. Proceedings of the 15th IEEE Symposium on Computer Arithmetic, pp. 218–225, June 2001.

[29] O'Leary, Maloberti. A direct-digital synthesizer with improved spectral performance. IEEE Transactions on Communications, vol. 39, no. 7, pp. 1046–1048, 1991.