Low-Cost Design of an FIR Filter by Using a Coefficient Mapping Method

This work presents a novel coefficient mapping method that reduces the area cost of finite impulse response (FIR) filter designs by optimizing their coefficients. The proposed mapping method, which both reduces the area cost and improves the filter performance, consists of four steps: quantization of coefficients, import of parameters, constitution of prime coefficients with parameters, and constitution of residual coefficients with prime coefficients. The effectiveness of the method is verified using the 48-tap filter of the IS-95 code division multiple access (CDMA) standard as the benchmark. Experimental results indicate that the proposed design with canonical signed digit (CSD) coefficients can operate at 86 MHz with an area of 241,813 μm², leading to a throughput rate of 1,382 Mbps. Its throughput/area ratio is 5,715 Kbps/μm², higher than that of previous designs. In summary, compared with the best previously reported design, the proposed design reduces the total filter area by 5.7%, shortens the critical path delay by 25.7%, and improves the throughput/area ratio by 14.8%.


Introduction
Digital signal processing applications are common in home entertainment systems, television sets, high-fidelity audio equipment, and information systems. The digital filter is an important component that performs mathematical operations on a sampled, discrete-time signal to reduce or enhance certain aspects of that signal. A digital filter is characterized by its transfer function. The two main types of digital filters are infinite impulse response (IIR) and finite impulse response (FIR) filters. The IIR filter has a transfer function with feedback, whereas the FIR filter has a transfer function without feedback. Commonly found in image processing, audio processing, and wireless communications, FIR filters are characterized by a linear phase, arbitrary magnitude, and relatively easy implementation. The filter hardware consists of adders, subtractors, shifters, and registers. Many related works [1][2][3][4][5][6][7][8][9] attempt to reduce the number of these components required in filter implementation, especially by optimizing the realization of the coefficients. Experimental results demonstrate that the proposed coefficient mapping method performs better than previous designs in terms of area ratio.
The rest of this paper is organized as follows. Section 2 briefly describes previous research on filter optimization. Section 3 then describes the coefficient mapping method. Next, Section 4 summarizes the experimental results and compares them with those of previous designs. Conclusions are finally drawn in Section 5, along with recommendations for future research.

Background
2.1. Digital FIR Filter.
Digital filters generally vary in coefficients, based on their specifications. The design of coefficients in a filter can be divided into four portions: coefficient selection, coefficient identification, searching algorithm, and coefficient quantization.
(1) Coefficient Selection. Typically determined by a set of filter specifications, coefficient selection must consider the number of taps, bit width, and filter complexity. Depending on the complexity of the coefficients, different algorithms are used to find the common subexpressions (CSs) and eliminate them to obtain the best area reduction.
(2) Coefficient Identification. The encoding of the coefficients determines the area cost of a filter and how often common subexpressions can be extracted. The most common representation is binary encoding. However, binary encoding produces more nonzero digits (1s) in the data representation and therefore more calculations in the hardware implementation. Hence, coefficient optimization [10, 11] often uses the canonical signed digit (CSD) representation, which eliminates many of the nonzero digits and requires fewer common subexpressions.
(3) Searching Algorithm. The searching algorithm finds common subexpressions to reduce the area cost of the filter. Although many works [1-9, 12, 13] have attempted to find as many common subexpressions as possible, a more complex algorithm may not yield a higher performance, especially for coefficients with a low complexity.
(4) Coefficient Quantization. Coefficient quantization is an effective means of reducing the number of logic gates when implementing a filter. When the coefficients are quantized for implementation, the commonly used rounding method causes the time and frequency responses of the implemented filter to deviate from the ideal response. The sensitivity of the filter response is therefore a priority concern when quantizing the coefficients.
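The effect of CSD encoding on the number of nonzero digits can be illustrated with a minimal Python sketch. It assumes that a coefficient is first quantized to an integer by rounding and then recoded into signed digits. The helper names `quantize`, `to_csd`, and `nonzero_count`, as well as the 4-bit fractional width, are illustrative assumptions and are not taken from the original design.

```python
def quantize(coeff: float, frac_bits: int) -> int:
    """Round a real coefficient to an integer with frac_bits fractional bits.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return int(round(coeff * (1 << frac_bits)))


def to_csd(value: int) -> list[int]:
    """Return the CSD digits (-1, 0, +1) of an integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)  # +1 when value % 4 == 1, -1 when value % 4 == 3
            value -= digit           # remaining value is now divisible by 2
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def nonzero_count(digits: list[int]) -> int:
    """Number of nonzero digits; each one costs a shifted term in hardware."""
    return sum(1 for d in digits if d != 0)


q = quantize(0.4375, 4)           # 0.4375 * 16 = 7
binary_ones = bin(q).count("1")   # 7 = 0b111 has three 1s
csd_digits = to_csd(q)            # 7 = 8 - 1, i.e. digits [-1, 0, 0, 1]
print(q, binary_ones, csd_digits, nonzero_count(csd_digits))
```

In this example the binary form of 7 needs three nonzero digits, while the CSD form needs two, which is the kind of reduction that motivates CSD coefficients.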

2.2. Optimization of the FIR Filter.
The FIR filter can be expressed as (1), and its transfer function can be expressed as (2):

$$y[n] = \sum_{i=0}^{N-1} h_i \, x[n-i] \qquad (1)$$

$$H(z) = \sum_{i=0}^{N-1} h_i \, z^{-i} \qquad (2)$$

Parameter N in (1) and (2) is the number of taps. This parameter is related to the output of a filter system and its frequency responses. Two implementation methods used for a filter are direct and cascade architectures. Here, the direct architecture is of priority concern, especially for enhancing the frequency responses of a filter with all zeros. The direct architecture can be further divided into direct and transposed forms. The direct form consists mainly of multipliers, adders, and registers. Figure 1 shows the architecture of the direct form. An N-tap filter with a direct form requires N copies of a multiplier, N − 1 copies of an adder, and N copies of a register.
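As a minimal sketch of the direct form, the following Python function evaluates the FIR sum y[n] = Σ h[i]·x[n − i] over an N-entry delay line, using N multiplications and N − 1 additions per output sample. The function name `fir_direct`, the notation h[i] and x[n], and the zero initial state are assumptions made for illustration; the code models the arithmetic only, not the hardware timing.

```python
def fir_direct(x: list[float], h: list[float]) -> list[float]:
    """Direct-form FIR: y[n] = sum_{i=0}^{N-1} h[i] * x[n - i], with x[k] = 0 for k < 0.

    Assumes at least one tap (len(h) >= 1).
    """
    n_taps = len(h)
    delay_line = [0.0] * n_taps          # holds x[n], x[n-1], ..., x[n-N+1]
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]                      # shift in the new sample
        products = [h_i * x_i for h_i, x_i in zip(h, delay_line)]    # N multiplications
        acc = products[0]
        for p in products[1:]:                                       # N - 1 additions
            acc += p
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
h = [0.5, 0.25, 0.125]
print(fir_direct(impulse, h))  # -> [0.5, 0.25, 0.125, 0.0, 0.0]
```

Feeding a unit impulse returns the coefficients themselves followed by zeros, which is a quick sanity check for any FIR implementation.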
In addition to using two horizontal methods and one vertical searching method to extract the CSs, the method in [5] also uses a multiplier-adder block (MAB) and structure adder (SA) to construct the CSs and their residues. Besides finding two CSs with the same appearances, the method in [6] extracts the CS with a smaller bit width. A previous work [8] developed two methods for extracting the CSs in CSD format. The first method analyzes 3-, 4-, and 5-bit CSs by collecting statistics on their appearances. The second method searches the coefficients from top to bottom and extracts the CSs between them by using a vertical search. The method in [9] proposes a rule under which the depth of logic gates cannot be increased, performing the horizontal search in the same way as in [4] and the vertical search in the same way as in [8].
Once the above searching methods have been applied, the CSs are extracted and each shared CS is calculated only once. The extractions reduce the calculation time of the filter. These searching methods can also reduce the required number of adders and subtractors. To verify the different searching methods, the 48-tap filter of IS-95 CDMA is selected as the benchmark.
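To make the idea of counting CS appearances concrete, the sketch below tallies two-term horizontal patterns in CSD-coded integer coefficients. It is a deliberately simplified frequency count and does not reproduce the search algorithm of any cited work. A pattern is keyed by the distance between two neighboring nonzero digits and by whether their signs agree. The function names and the example coefficients are hypothetical.

```python
from collections import Counter


def to_csd(value: int) -> list[int]:
    """Return the CSD digits (-1, 0, +1) of an integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)  # +1 when value % 4 == 1, -1 when value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def two_term_patterns(coeffs: list[int]) -> Counter:
    """Count (distance, sign product) patterns between neighboring nonzero CSD digits."""
    counts = Counter()
    for c in coeffs:
        digits = to_csd(c)
        positions = [i for i, d in enumerate(digits) if d != 0]
        for lo, hi in zip(positions, positions[1:]):
            counts[(hi - lo, digits[lo] * digits[hi])] += 1
    return counts


def adders_without_sharing(coeffs: list[int]) -> int:
    """Adders/subtractors needed if every coefficient is built independently."""
    return sum(max(sum(1 for d in to_csd(c) if d != 0) - 1, 0) for c in coeffs)


coeffs = [5, 21, 3]                     # hypothetical quantized coefficients
print(two_term_patterns(coeffs))        # pattern (2, +1) appears 3 times, (2, -1) once
print(adders_without_sharing(coeffs))   # -> 4
```

A pattern that appears often, such as (2, +1) here, is a candidate for being computed once and reused. Neighboring pairs can overlap within one coefficient, so a real extraction step would still have to choose a non-overlapping subset.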

Proposed Coefficient Mapping Method
The mapping method divides the coefficients into two parts: primary coefficients and remaining coefficients. The parameters that are set up in the algorithm are  ℎ and   . The constitution of coefficients has two steps: constitution of primary coefficients and calculation of multiple relations between the primary and remaining coefficients. The operation steps of the mapping method are described as follows.
For example, this work selects 18 coefficients, h_0 to h_17, from the 48-tap IS-95 CDMA filter to perform the mapping method. Before performing the mapping method, this work first sets the two parameters to 32 and 0.25, respectively, to reduce the hardware design complexity. After Steps 1 through 7 of the proposed method are performed, Table 1 lists the generated h values. Figure 2(a) shows the frequency responses and the distributions of poles and zeros of the original filter. Figure 2(b) shows the behaviors of the filter with modified coefficients after applying the proposed mapping method. The behaviors in Figures 2(a) and 2(b) are approximately the same.

Experimental Results
For comparison, Table 2 lists two filter designs: the coefficients expressed in binary and in CSD formats. At the architecture level, the proposed filter with CSD coefficients has the smallest total number of adders and subtractors. The proposed filter requires a total of only 56 adders and subtractors, fewer than any of the previous designs. The best previous design is the method in [7], which has an area of 255,450 μm² and achieves a throughput/area of 4,979 Kbps/μm². The proposed design with CSD coefficients can operate at 86 MHz with an area of 241,813 μm², leading to a throughput rate of 1,382 Mbps. Its throughput/area ratio is 5,715 Kbps/μm², higher than that of all previous designs. In summary, compared with the best design in [7], the proposed design reduces the total filter area by 5.7%, shortens the critical path delay by 25.7%, and improves the throughput/area ratio by 14.8%.

Conclusions
This work has developed a novel filter design with a coefficient mapping method. The proposed method reduces the area cost by finding the primary coefficients and using them to construct the remaining coefficients. The proposed method can also construct all of the filter coefficients from only several coefficients. Experimental results demonstrate that the proposed design with binary or CSD coefficients reduces the area cost and improves the throughput/area ratio more significantly than previous designs. Implementation results further demonstrate that the proposed design has the highest throughput with the lowest area cost.

Figure 1: Filter architecture of a direct form.

Figure 2: (a) Frequency responses and distributions of poles and zeros of the original filter. (b) Frequency responses and distributions of poles and zeros of the modified filter.

Table 2: Performance comparison of various 48-tap filter designs for generating 16-bit outputs.
Table 2 lists various 48-tap filter designs for IS-95 CDMA. These filters generate 16-bit output data in one clock cycle. The table shows the architecture- and gate-level information of the filters. The architecture-level area cost counts the adders, subtractors, and registers used for implementation. Gate-level information includes the area cost, critical path delay, throughput, and throughput per area. At the architecture level, analysis results indicate that the original filter with binary coefficients has the largest area cost among all designs. The design in [7] has the smallest total number of adders and subtractors. Besides having the smallest area cost, the proposed design also achieves the highest throughput and the highest throughput per area in gate-level synthesis among all designs.
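The gate-level figures of merit can be cross-checked with a few lines of Python. The sketch assumes that throughput equals the clock frequency times the 16-bit output width and that throughput per area is throughput divided by cell area. The function names are illustrative. With throughput in bps and area in μm², the quotient for the proposed design comes out near 5,715, so the sketch reports the ratio in bps/μm²; how this maps onto the Kbps/μm² label used in the text is left open here.

```python
def throughput_mbps(clock_mhz: float, output_bits: int) -> float:
    """Throughput in Mbps for a filter producing output_bits per clock cycle."""
    return clock_mhz * output_bits


def throughput_per_area(throughput_mbps_value: float, area_um2: float) -> float:
    """Throughput per area in bps per square micrometer."""
    return throughput_mbps_value * 1e6 / area_um2


# Reported values for the proposed CSD design and the best previous design [7].
nominal = throughput_mbps(86, 16)               # 1376 Mbps from the rounded 86 MHz clock
proposed = throughput_per_area(1382, 241_813)   # uses the reported 1,382 Mbps
improvement = proposed / 4979 - 1               # relative to the reported ratio of [7]

print(nominal)                   # -> 1376
print(round(proposed))           # -> 5715
print(round(100 * improvement, 1))  # -> 14.8
```

The nominal 1,376 Mbps is slightly below the reported 1,382 Mbps, which would be consistent with the clock frequency having been rounded down to 86 MHz; that reading is an assumption, not a statement from the source.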