In this paper, efficient digital filter design techniques, categorized as sigma-delta modulation based short word length (SWL) techniques and multibit (or contemporary) techniques, are reviewed in terms of hardware complexity, area, performance and power tradeoffs, synthesis issues, and algorithm versatility. More recent general purpose DSP applications, including classical LMS algorithms, reported using sigma-delta modulation encoding are reviewed thoroughly. A small number of basic arithmetic circuits designed using sigma-delta modulation encoding and synthesized on FPGAs are also described. Finally, a recent FPGA-based area-performance-power analysis of single-bit ternary FIR filtering is discussed and compared to its corresponding multibit system. This work shows that in most cases single-bit ternary FIR-like filters are able to outperform their equivalent multibit filters in terms of area, power, and performance.
It is no surprise that many signal processing tasks can be accomplished by a microprocessor or a digital signal processor (commonly called DSP kits). Built-in multiplication modules are the core element of these devices. Furthermore, implementation of multiply and accumulate (MAC) circuits within signal processors can significantly improve the throughput of FIR and IIR digital filter structures (see Figures
General structure of FIR filter.
Block Diagram of an IIR direct form II filter.
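To make the role of the MAC operation concrete, the following is a minimal software sketch of the two structures named in the captions above: a direct-form FIR filter and a direct form II IIR filter. The function names, the floating-point arithmetic, and the convention that the denominator is normalized so that a[0] = 1 are assumptions made for illustration; they are not taken from the reviewed designs.

```python
from typing import List, Sequence


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form FIR: one multiply-accumulate (MAC) per tap per output sample."""
    delay = [0.0] * len(h)              # tapped delay line, newest sample first
    y = []
    for sample in x:
        delay = ([sample] + delay)[:len(h)]
        acc = 0.0
        for coeff, tap in zip(h, delay):
            acc += coeff * tap          # the MAC operation
        y.append(acc)
    return y


def iir_df2(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Direct form II IIR, assuming a[0] == 1 (illustrative convention)."""
    order = max(len(b), len(a)) - 1
    b = list(b) + [0.0] * (order + 1 - len(b))
    a = list(a) + [0.0] * (order + 1 - len(a))
    w = [0.0] * order                   # shared delay line: w[k] holds w[n-1-k]
    y = []
    for sample in x:
        # Feedback (recursive) part, then feed-forward part, on the same delay line.
        w0 = sample - sum(a[k + 1] * w[k] for k in range(order))
        out = b[0] * w0 + sum(b[k + 1] * w[k] for k in range(order))
        w = ([w0] + w)[:order]
        y.append(out)
    return y
```

Every output sample costs one multiplication per coefficient, which is why the multiplier dominates the cost of these structures in hardware.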
An alternative solution is to use gate-level programmable devices such as field programmable gate arrays (FPGAs) to perform the digital filtering tasks. The concurrent (i.e., parallel) mode of operation of these devices is of great interest, as it can improve the throughput of digital signal processing circuits, especially digital filtering modules. This higher throughput is achieved at the cost of a larger chip area compared to a serial implementation of the circuits. Many of these FPGA devices include a number of built-in multipliers that take up a large amount of silicon area within the device. Further, the most recent FPGA devices include resources that easily support general purpose signal processing tasks, even within midrange commercial devices.
However, there is a direct tradeoff between chip area and throughput in these devices. Some obvious applications that require fast and efficient digital filters are decimation filters, audio filter banks, charge-coupled-device filters, and software defined radio, all of which require high throughput. To achieve fast and efficient implementations, many techniques have been proposed. The overarching theme of these techniques has been to reduce the complexity of the multiplication process in any possible way. One method of reducing the complexity of the multiplier is to reduce the word length of both the input and the filter coefficients. A preferred approach is to use sigma-delta modulation to reduce the word length; this paper focuses on these methods. There are many techniques that use some form of sigma-delta modulation or the like to improve the efficiency of digital filtering operations. Examples of such techniques were reported in [
The remainder of this paper proceeds as follows. Fast FIR filters are discussed in Section
Fast and efficient filters generally fall into two classes: sigma-delta modulation (ΣΔM) based techniques and optimization techniques within a multibit format. A brief description of both classes is given below.
This section describes input and coefficient encoding techniques that can be exploited to implement fast and efficient DSP algorithms in FPGAs. The techniques can be applied in either a single-bit or multibit environment [
As outlined above, it is the performance of the multiply-accumulate (MAC) stages that has the greatest impact on the overall behaviour of digital filters. Thus, various filter design techniques have been proposed that specifically target the complexity of these stages. For example, distributed arithmetic is a common technique that has been used in FPGA designs for many years [
Many other techniques have been proposed: Canonical Sign Digit (CSD) [
Apart from the classical multiplier complexity reduction techniques, a new approach called Slice Reduction Graphs (SRG) [
The primary intent of the techniques mentioned above has been to improve the area-performance characteristics of parallel multibit binary filters operating at the Nyquist rate. However, the format of the coefficients and input data is one reason for the high complexity of the MAC stages. In [
Much work has been reported on the design and implementation of sigma-delta modulation based FIR and IIR filters in various forms. The work that was commenced by the authors in [
In [
Block diagram of the error feedback ΣΔM for requantization.
Overall, an efficient implementation of a narrowband digital filter through a requantizing operation has shown a 50% reduction in logic resources compared to a traditional FIR filter implementation on an FPGA. This filter shows great promise for FIR filter implementation. Further reduction in complexity can be gained through harsher requantization to lower precision words.
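The requantizing idea can be illustrated with a minimal first-order error-feedback sketch. It assumes integer samples, a floor quantizer that drops the low-order bits, and a single feedback tap; the topology and parameters of the cited design are not specified in this text, so the function name and the `drop_bits` parameter are hypothetical.

```python
from typing import List, Sequence


def requantize_error_feedback(samples: Sequence[int], drop_bits: int) -> List[int]:
    """Requantize integers to multiples of 2**drop_bits with first-order error feedback.

    The truncation error of each sample is carried into the next one, so the
    error is pushed towards high frequencies instead of accumulating.
    """
    step = 1 << drop_bits
    residue = 0                      # quantization error carried from the previous sample
    out = []
    for x in samples:
        u = x + residue              # add back what was discarded last time
        y = (u // step) * step       # coarse word: floor to a multiple of step
        residue = u - y              # always in the range 0 <= residue < step
        out.append(y)
    return out
```

Because the discarded residue is fed back, the running sum of the coarse output never differs from the running sum of the input by a full quantization step. This is the property that lets a narrowband filter tolerate the shorter words.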
In [
Block diagram of the FIR filter with ΣΔM modulated filter coefficients.
The use of cascaded comb filters as reported in [
Block diagram of the decoder used in a FIR filter with ΣΔM modulated filter coefficients and with ΣΔM modulated input signal.
In a second approach (Figure
Block diagram of the FIR filter with ΣΔM modulated input signal.
To perform the filtering operation, full precision filter coefficients were zero padded by
In [
A ternary format has an extra symbol for input and filter coefficients and has been found to offer better stop band attenuation and dynamic range flexibility compared to the binary format [
A slightly different fast and efficient FIR filter design using sigma-delta encoding is presented in [
The last group of fast and efficient filter designs uses a canonical signed digit (CSD) quantizer with signed powers of two ΣΔM output [
Regardless of the many optimizations that have been proposed, a large number of multiplication stages still translates into large area, delay, and power consumption. One-bit ΣΔ modulators are widely used in A/D and D/A conversion stages due to their inherent linearity and precision. However, it is less common for the entire digital processing path to operate on single-bit data. The more usual approach has been to decimate the signal data stream after conversion and to perform the remaining processing in a standard binary format at the Nyquist rate, with a resolution mandated by dynamic range and noise considerations.
Sigma-delta modulation (ΣΔM) encoding of the FIR filter coefficients has been shown to be an efficient way to reduce the complexity of the multiplier and improve its area-performance tradeoffs [
As the name suggests, single-bit filters produce a single-bit output. In the last decade, various general purpose DSP applications have been reported using single-bit sigma-delta modulation encoding, including the classical FIR filter in [
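As a rough illustration of how a single-bit stream can carry amplitude information, the sketch below models a first-order ΣΔ modulator in software. The delayed-integrator form, the ±1 output alphabet, and the assumption that the input stays within [-1, 1] are illustrative choices, not details taken from the reviewed work.

```python
from typing import List, Sequence


def sigma_delta_1bit(samples: Sequence[float]) -> List[int]:
    """First-order sigma-delta modulator producing a +1/-1 bit stream.

    Assumes an oversampled input with |x| <= 1 (illustrative assumption).
    """
    integrator = 0.0
    bits = []
    for x in samples:
        y = 1 if integrator >= 0.0 else -1   # one-bit quantizer
        integrator += x - y                  # integrate the input minus the fed-back output
        bits.append(y)
    return bits
```

For a slowly varying input, the local average of the bit stream tracks the input value, which is why a decimating low-pass filter can recover a multibit Nyquist-rate signal from it.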
In [
Block diagram of the single-bit FIR filter.
Similarly, in the second approach presented in [
The VLSI analysis of the proposed design was carried out, and the single-bit design was found to be more efficient in terms of silicon resources than a PCM digital filter up to 80 taps. The structure still has the complexity of full precision filter coefficients, which can also increase the word length of the FIR filter output.
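The growth in output word length can be estimated with simple worst-case arithmetic. The sketch below assumes a ±1 single-bit input and signed two's-complement coefficients; the function name and the 16-bit coefficient width used in the example are hypothetical, chosen only to illustrate the calculation.

```python
def accumulator_bits(num_taps: int, coeff_bits: int) -> int:
    """Signed word length needed for the sum of num_taps products of a +/-1
    input with signed coeff_bits-wide two's-complement coefficients."""
    # Worst case: every coefficient has the largest magnitude, 2**(coeff_bits - 1),
    # and every product has the same sign.
    max_magnitude = num_taps * (1 << (coeff_bits - 1))
    # One extra bit for the sign, so both +max_magnitude and -max_magnitude fit.
    return max_magnitude.bit_length() + 1


# Example: an 80-tap filter with (assumed) 16-bit coefficients.
print(accumulator_bits(80, 16))
```

Even with a one-bit input, the accumulator is wider than the coefficients by roughly log2 of the number of taps, so full precision coefficients keep the adder tree and output stage multibit.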
The remodulator complexity is discussed by the same authors in [
The core idea of the IIR single-bit ΣΔM presented in [
Block diagram of the first-order single-bit IIR filter.
The stability of the system in Figure
However, a quasi-orthonormal state space IIR architecture was shown to have good filtering abilities with good stop band attenuation by the same authors in [
Recently, new DSP design techniques called short word length (SWL) have been reported in [
General block diagram of the single-bit ternary FIR-like filter.
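The appeal of the ternary format is that the multiplier disappears. The sketch below assumes a ±1 input stream and coefficients drawn from {-1, 0, +1}, and models only the multiplier-free accumulation; any remodulation of the multibit sum back to a single-bit output is omitted. The function name is hypothetical.

```python
from typing import List, Sequence


def ternary_fir(bits: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """FIR accumulation for a +/-1 input and ternary {-1, 0, +1} coefficients.

    Each product is decided by a sign comparison, so no multiplier is needed.
    """
    if any(b not in (-1, 1) for b in bits):
        raise ValueError("input samples must be +1 or -1")
    if any(c not in (-1, 0, 1) for c in coeffs):
        raise ValueError("coefficients must be -1, 0, or +1")
    delay = [0] * len(coeffs)            # 0 marks a delay cell not yet filled
    out = []
    for b in bits:
        delay = ([b] + delay)[:len(coeffs)]
        acc = 0
        for c, d in zip(coeffs, delay):
            if c == 0 or d == 0:
                continue                 # zero coefficient or empty cell adds nothing
            acc += 1 if c == d else -1   # equal signs give +1, opposite signs give -1
        out.append(acc)
    return out
```

In hardware, the per-tap decision maps to a few gates and the accumulation to a narrow adder tree, which is the source of the area savings reported for this class of filters.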
Using the same approach, a narrowband band pass ΣΔM was proposed in [
Single-bit narrowband bandpass FIR filter.
The performance of the proposed method is also discussed in [
Further to this work, an LMS-like single-bit adaptive filtering structure for noise cancelling has been presented in [
However, much work is still needed to explore the design using random inputs within a higher noise environment. In addition, it is still unclear what might be the optimum coefficient update rate or range of the convergence parameter (
Although much work has been reported on the design and analysis of single-bit systems, it appears that there has been little reported on rigorous hardware analysis of single-bit signal processing techniques using FPGAs. However, a small body of work has been reported on VLSI synthesis and analysis of bit-stream arithmetic modules and their variants, which are covered below. These arithmetic modules are building blocks of DSP algorithms rather than signal processing applications in themselves. Furthermore, these modules have been an inherent part of the single-bit systems already proposed in [
In [
Bitstream arithmetic modules with bi, tri, and quad levels are described in [
In [
Regardless of the work reported on simple arithmetic modules, the drawback to all of these reported works is the limited range of their adder and multiplier modules (i.e.,
Unlike [
First-order digital sigma-delta modulator [
Hardware implementation of all three designs was performed using the Xilinx Virtex-5 FPGA, and the area-performance characteristics of the multiplier were noted. The synthesis results show a direct tradeoff between the three designs and the two approaches for the IIR and FIR filter modules. These results indicate that the bi-level design is more resource efficient than either the tri-level or quad-level design and provides higher performance at the cost of lower noise suppression, and vice versa. However, this assumes that the system was stable by considering the same approach described in [
As discussed earlier, much work has been reported on the design and analysis of single-bit ternary FIR-like filters, including classical LMS-like filters, which are classified as general short word length (SWL) DSP systems. This work has tended to be performed using high-level tools such as MATLAB, with little reported on hardware implementation, particularly in field programmable gate arrays (FPGAs). Two primary areas of interest exist here. The first is the comparative behaviour of SWL and multibit systems exhibiting equal spectral performance in terms of their relative area, power, and throughput. The second is how chip area-performance varies with the OSR and bit width of the hardware SWL system, which remains to be determined.
In [
Simulation results obtained through the set of experiments are given in Tables
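Area-performance comparison of single-bit FIR versus multibit filter: non-pipelined mode.
| Device | Single-bit: number of tern. coeff | Single-bit: LUTs | Single-bit | Multibit | Multibit | Multibit: LUTs | Multibit |
|---|---|---|---|---|---|---|---|
| Cyclone III | 512 | 4089 (3%) | 71.4 | 64 | 8 | 8860 (7%) | 46.2 |
| Cyclone III | 2048 | 15603 (13%) | 52.6 | 64 | 12 | 17045 (14%) | 35.3 |
| Cyclone III | 4096 | 30894 (26%) | 45.3 | 64 | 16 | 26838 (22%) | 29.1 |
| Cyclone III | 8192 | 62747 (53%) | 40.3 | 64 | 18 | 32547 (27%) | 26.5 |
| Stratix III | 512 | 3925 (1%) | 129.8 | 64 | 8 | 5219 (2%) | 86.5 |
| Stratix III | 2048 | 14368 (5%) | 97.3 | 64 | 12 | 10942 (5%) | 69.1 |
| Stratix III | 4096 | 28499 (11%) | 82.8 | 64 | 16 | 17731 (7%) | 57.5 |
| Stratix III | 8192 | 55927 (21%) | 69.6 | 64 | 18 | 21568 (8%) | 51.2 |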
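Area-performance comparison of single-bit FIR versus multibit filter: pipelined mode.
| Device | Single-bit: number of tern. coeff | Single-bit: LUTs | Single-bit | Multibit | Multibit | Multibit: LUTs | Multibit |
|---|---|---|---|---|---|---|---|
| Cyclone III | 512 | 3963 (3%) | 125.6 | 64 | 8 | 9020 (8%) | 94.5 |
| Cyclone III | 2048 | 15399 (13%) | 122 | 64 | 12 | 17079 (14%) | 67.1 |
| Cyclone III | 4096 | 30607 (26%) | 120 | 64 | 16 | 26890 (23%) | 53.3 |
| Cyclone III | 8192 | 61029 (51%) | 118 | 64 | 18 | 32586 (27%) | 47.4 |
| Stratix III | 512 | 3719 (1%) | 240 | 64 | 8 | 4923 (1%) | 258.3 |
| Stratix III | 2048 | 14453 (5%) | 237 | 64 | 12 | 10353 (4%) | 199.0 |
| Stratix III | 4096 | 28745 (11%) | 237 | 64 | 16 | 16916 (7%) | 158.8 |
| Stratix III | 8192 | 57362 (21%) | 231 | 64 | 18 | 20662 (8%) | 139.7 |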
The power analysis of the two filters was performed in two steps using the clock obtained from the area-performance results (shown in Tables
In [
In this survey we have described the work reported on the development of fast and efficient filter designs, especially those employing sigma-delta modulation as the encoding technique. Single-bit design techniques have been studied since the early 1980s, having been first reported by [
Though significant work has been reported on single-bit design techniques, few analyses of VLSI bit-stream circuits appear in the literature. The design, analysis, and FPGA synthesis of arithmetic modules (i.e., adder, multiplier, and divider) were first reported in [
In [