A Low Power FIR Filter Design for Image Processing

In this paper, a new low power design method of the FIR filter for image processing is proposed. Because the correlation between adjacent pixels in image data is very high, clock gating is a good candidate for a low power strategy. However, the conventional clock gating strategy, which is applied independently to every flip-flop of the filter, incurs too much additional area overhead and does not achieve a good power reduction. In our method, each tap register, which delays the input data in the filter, is partitioned into two sub-registers according to the correlation characteristic of its input space. For the sub-register that receives highly correlated data, the dynamic power consumption is reduced by diminishing the switching activity of the clock signal. We can also reduce the additional hardware overhead by propagating the clock gating control signal of the first tap register to the other tap registers. To evaluate the efficiency of the proposed design method, we performed experiments on several filters designed in VHDL. Power estimation results show that the proposed method can reduce the power dissipation of the filter by more than 18% compared to the conventional filter design methods.


INTRODUCTION
As the market for portable electronics, including mobile communication equipment, has grown rapidly, interest in low power design has increased sharply [1,2]. In particular, the digital filter is one of the most widely used components in DSP applications that process image and speech signals.
Because filters account for a large part of the total power consumption of such DSP products, research has concentrated on developing algorithms for low power digital filter architectures [3][4][5]. There are two approaches to implementing low power filters: optimizing the filter coefficients and transforming the filter structure.
In [5], the author shows that transforming the filter structure reduces the switching activity at internal nodes of the filter.
In this paper, we propose a new design method that reduces the dynamic power consumption of the FIR filter. The proposed method is suitable for designing direct form filters. According to the correlation of the input data, we partition each tap register into two sub-registers. Clock gating the sub-register that receives highly correlated data reduces its dynamic power consumption. We can also reduce the size of the control circuit added to each tap register by propagating the clock gating control signal of the first tap register to the other tap registers in the filter. This paper is organized as follows. Section 2 describes the low power digital FIR filter architecture. Experimental results are presented in Section 3, and conclusions are given in Section 4.

LOW POWER ARCHITECTURE FOR DIGITAL FIR FILTER
Digital filters can be described as the convolution of the input data with the filter coefficients. The output of an N-tap FIR filter is given as follows:
$$y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k) \qquad (1)$$
As shown in Eq. (1), the output y(n) is the sum of the products of the delayed input signal x(n-k) and the filter coefficient h(k), where k ranges from 0 to N-1. Therefore, the filter consists of tap registers that delay the input data, multipliers for the convolution operation, and adders. There are two types of FIR filter structures: the direct form and the transposed form. We consider only direct form FIR filters in this paper.
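As an illustration of Eq. (1), the following is a minimal behavioural sketch of a direct-form N-tap FIR filter, assuming integer samples and coefficients and tap registers reset to zero. It models only the arithmetic and the shifting of the tap registers, not the VHDL hardware used in the paper; the function name `fir_direct_form` is hypothetical.

```python
from collections import deque
from typing import Iterable, List, Sequence


def fir_direct_form(h: Sequence[int], samples: Iterable[int]) -> List[int]:
    """Direct-form N-tap FIR: y(n) = sum_{k=0}^{N-1} h(k) * x(n-k).

    Sketch only: integer arithmetic, tap registers assumed reset to zero.
    """
    n_taps = len(h)
    # Tap registers hold x(n), x(n-1), ..., x(n-N+1).
    taps = deque([0] * n_taps, maxlen=n_taps)
    out = []
    for x in samples:
        # Shift the chain by one tap; the oldest sample falls off the right end.
        taps.appendleft(x)
        # One multiplier per tap, followed by the adder.
        out.append(sum(hk * xk for hk, xk in zip(h, taps)))
    return out


print(fir_direct_form([1, 2, 1], [4, 0, 0, 0]))  # [4, 8, 4, 0]
```

Feeding a single nonzero sample reproduces the scaled coefficients, which is a quick way to check that each tap register delays its input by exactly one sample.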

Low Power FIR Filter Architecture for Image Signal Processing
The proposed filter design method consists of two steps. First, after considering the spatial correlation of image data, we partition each tap register into two sub-registers: one whose input data has high spatial correlation, and one whose input data has low correlation.
Second, we apply the clock gating technique to both sub-registers. This reduces the switching activity of the clock signal of the sub-register whose input space is highly correlated, and drastically reduces that sub-register's dynamic power consumption. We can also reduce the additional circuitry that generates the clock gating control signals by propagating the control signal of the first tap register to the others.

Correlation Analysis of Image Data
Generally, a pixel of image data is highly correlated with its adjacent pixels [8]. We analyzed some test image data to measure this correlation characteristic. An image signal consists of the luminance signal Y and the color signals CB and CR. We consider only 256 × 256 images whose pixels are represented in 8 bits. Lenna, girl, and couple2 are widely used sample images in image processing. Figure 2 shows a histogram of the differential data measured between every pair of adjacent pixels. The horizontal axis gives the differential data in bits, and the vertical axis gives the frequency of each differential value as a percentage. From Figure 2, we can see that, in the case of couple2, about 90% of all the computed differential data are less than 4 bits. In the case of girl, about 76% are less than 4 bits. As a result, we can say that the …
The clock signal is a major source of dynamic power dissipation. Clock gating has recently become a popular way to reduce the switching activity of logic in redundant cycles. Disabling the clock to parts of the circuit that are not active during certain periods of time reduces the switching activity on the clock net [6,7].
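The adjacent-pixel analysis behind Figure 2 can be sketched as follows. This is a minimal sketch under two assumptions: "differential data in bits" is read as the number of bits needed to represent the magnitude of the difference between neighbouring pixels, and the sample scan line is made up for illustration rather than taken from the test images. The helper names `diff_bitwidth_histogram` and `bit_toggle_rates` are hypothetical; the second one reports, per bit position, how often that bit differs between neighbours, which is what motivates splitting a tap register into lower and upper halves.

```python
from collections import Counter
from typing import List, Sequence


def diff_bitwidth_histogram(pixels: Sequence[int]) -> Counter:
    """Histogram of the bit width of |p[i+1] - p[i]| over adjacent pixel pairs.

    Assumption: "differential data in bits" = bits needed for the magnitude of
    the difference (0 when two neighbouring pixels are equal).
    """
    return Counter(abs(cur - prev).bit_length() for prev, cur in zip(pixels, pixels[1:]))


def bit_toggle_rates(pixels: Sequence[int], width: int = 8) -> List[float]:
    """Fraction of adjacent pixel pairs in which each bit differs (index 0 = LSB)."""
    if len(pixels) < 2:
        raise ValueError("need at least two pixels")
    counts = [0] * width
    for prev, cur in zip(pixels, pixels[1:]):
        changed = prev ^ cur  # bits that differ between neighbouring pixels
        for b in range(width):
            counts[b] += (changed >> b) & 1
    return [c / (len(pixels) - 1) for c in counts]


# Hypothetical smooth scan line of 8-bit pixels (not from lenna, girl or couple2).
row = [100, 101, 103, 102, 100, 99, 101]
print(diff_bitwidth_histogram(row))
print(bit_toggle_rates(row))
```

On this smooth row every difference fits in at most 2 bits, and the upper four bit positions never toggle while the lowest bit toggles in half of the transitions.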
Figure 3 shows a simple model for a clock gating circuit.
The model contains the control signal generator and an AND gate, which combines the generator's output with the clock signal to produce the gated clock signal that clocks the flip-flop. In the conventional method, clock gating can be applied to a whole tap register or to each flip-flop in the tap register. However, this method is ineffective for highly correlated input such as image data, because it does not consider the correlation characteristic of the input space. Let $P_{Latch}$, $P_{XOR}$, and $P_{AND}$ be the power of a latch, an XOR gate, and an AND gate, respectively. If $P_{conv}$ denotes the total power consumption of the conventional clock gating circuitry, $P_{conv}$ for an 8-bit tap register is given as follows:
$$P_{conv} = 8 \times (P_{Latch} + P_{XOR} + P_{AND}) \qquad (2)$$
The correlation between two pixels is expressed by the number of bits that differ between them. High correlation means that a data value is similar to its adjacent input data; in highly correlated image data, the differential data between two adjacent pixels is zero or very small. After considering the spatial correlation of image data, we partition a tap register into two sub-registers according to the correlation characteristic explained in Section 2.2.1.
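To make the conventional per-flip-flop gating and Eq. (2) concrete, here is a cycle-level sketch. It assumes the Figure 3 circuit reduces, per clock cycle, to "a flip-flop is clocked exactly when its D input differs from its Q output", and that the tap register resets to 0; glitch and latch timing details are abstracted away. The names `conventional_gated_clocks` and `conventional_overhead` are hypothetical.

```python
from typing import List, Sequence


def conventional_gated_clocks(samples: Sequence[int], width: int = 8) -> List[int]:
    """Clock pulses reaching each flip-flop of one tap register under per-bit gating.

    Cycle-level abstraction of Figure 3 applied to every flip-flop: the XOR
    compares D with Q, the latch (transparent while the clock is low) holds that
    result during the high phase, and the AND gate passes the clock only when the
    latched enable is 1. Net effect: a flip-flop is clocked when its bit changes.
    """
    q = 0  # tap register assumed to reset to 0
    pulses = [0] * width
    for d in samples:
        changed = d ^ q  # one XOR per flip-flop
        for b in range(width):
            if (changed >> b) & 1:
                pulses[b] += 1
        q = d  # gated-off bits already equal d, so the register now holds d
    return pulses


def conventional_overhead(p_latch: float, p_xor: float, p_and: float, width: int = 8) -> float:
    """Eq. (2): one latch, one XOR and one AND gate added per flip-flop."""
    return width * (p_latch + p_xor + p_and)


print(conventional_gated_clocks([0x01, 0x01, 0x11]))  # [1, 0, 0, 0, 1, 0, 0, 0]
```

Without gating, each of the 8 flip-flops would receive all 3 clock edges (24 in total); with per-bit gating only 2 edges get through, but every bit pays for its own latch, XOR and AND gate, as Eq. (2) states.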
Figure 4 shows the clock gating circuitry for the two sub-registers that make up the first tap register.
As the figure shows, the tap register is partitioned into the sub-register sub1, which receives the lower 4 bits of the input, and the sub-register sub2, which receives the higher 4 bits. Clock gating is then applied to the partitioned sub-registers. As shown in Figure 2, sub1 (sub2) has low (high) correlation. The clock gating probability is defined as the probability that the clock signal is inactive. The clock gating probability of sub1 in our method is almost the same as that of each flip-flop of sub1 in the conventional method, because sub1 has low correlation and its input is likely to change at every clock. Nevertheless, our method is more area-efficient, which reduces the total power consumption.
Because the input of sub2 is highly correlated, the clock gating control signal of any flip-flop in sub2 is likely to change together with those of the other flip-flops in sub2, so we add only one clock gating circuit to sub2. Therefore, the power consumed by the additional control circuitry is much smaller than in the conventional method. If $P_{proposed}$ denotes the total power consumption of the proposed clock gating circuitry, $P_{proposed}$ for an 8-bit tap register is given as follows:
$$P_{proposed} = 2 \times (P_{Latch} + P_{AND}) + 8 \times P_{XOR} \qquad (3)$$
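The sub-register gating and Eq. (3) can be sketched at the cycle level as follows. The text does not say how the per-bit XOR outputs of a sub-register are merged into its single enable, so this sketch assumes they are OR-ed: a sub-register is clocked whenever any of its bits changes. The masks, the reset value of 0, the sample values and the names `partitioned_gated_clocks` and `proposed_overhead` are all illustrative assumptions.

```python
from typing import Sequence, Tuple

SUB1_MASK = 0x0F  # sub1: lower 4 bits (low correlation)
SUB2_MASK = 0xF0  # sub2: upper 4 bits (high correlation)


def partitioned_gated_clocks(samples: Sequence[int]) -> Tuple[int, int]:
    """Gated clock pulses delivered to sub1 and sub2 of one 8-bit tap register.

    Assumption (not spelled out in the text): a sub-register's single enable is
    the OR of its bits' XOR outputs, so the whole sub-register is clocked
    whenever any of its bits changes.
    """
    q = 0  # tap register assumed to reset to 0
    sub1 = sub2 = 0
    for d in samples:
        changed = d ^ q
        sub1 += 1 if changed & SUB1_MASK else 0
        sub2 += 1 if changed & SUB2_MASK else 0
        q = d
    return sub1, sub2


def proposed_overhead(p_latch: float, p_xor: float, p_and: float) -> float:
    """Eq. (3): one latch and one AND gate per sub-register, one XOR per bit."""
    return 2 * (p_latch + p_and) + 8 * p_xor


samples = [0x61, 0x62, 0x63, 0x64]  # hypothetical slowly varying pixels
print(partitioned_gated_clocks(samples))  # (4, 1)
print(proposed_overhead(1.0, 1.0, 1.0))  # 12.0
```

Here sub1 is clocked on every cycle, as the text predicts for low-correlation bits, while sub2 is clocked only once. With hypothetical unit gate powers, Eq. (3) gives 12 against 24 for Eq. (2); real ratios depend on the cell library.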

Additional Hardware Reduction Using Propagation of the Control Signal of the First Tap Register
In the conventional method, each tap register must be controlled by its own clock gating control signal, so every tap register requires its own generator circuitry. However, in a direct form filter, each tap register is connected in series to the next one. The clock gating control signal generated at the first tap register can therefore be used as the control signal of the next tap register during the next clock cycle. As a result, the tap registers other than the first do not need their own clock control generators; they need only two flip-flops to hold the control signals of sub1 and sub2 shifted from the previous tap register.
As shown in Figure 5, the two control signals of the first tap register are shifted to the next tap register at the falling edge of the clock. Compared to the conventional clock gating method, we need only two D-type flip-flops per tap register instead of 8 XOR gates and 8 latches.
Let $P_D$ be the power consumption of a D-type flip-flop. If $P_{c\_total}$ and $P_{p\_total}$ denote the total power consumption of the additional control circuitry in the conventional method and the proposed method, respectively, they are given as follows, where N is the number of tap registers:
$$P_{c\_total} = N \times 8 \times (P_{Latch} + P_{XOR} + P_{AND}) \qquad (4)$$
$$P_{p\_total} = 2 \times (P_{Latch} + P_{AND}) + 8 \times P_{XOR} + (N-1) \times (2 \times P_D + 2 \times P_{AND})$$
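The reason propagation works can be checked with a small cycle-level model: in a direct-form chain, tap k sees at cycle t exactly the transition that tap 0 saw at cycle t-k, so shifting tap 0's enable down a chain of flip-flops reproduces what a local generator at every tap would compute. This sketch assumes all registers reset to 0 and ignores the falling-edge timing of Figure 5; the function names and sample values are hypothetical. It also evaluates the two overhead expressions above with hypothetical unit gate powers.

```python
from typing import List, Sequence, Tuple


def direct_enables(samples: Sequence[int], n_taps: int, mask: int) -> List[List[bool]]:
    """Reference: every tap computes its own enable by comparing its D input with Q."""
    taps = [0] * n_taps  # tap registers, assumed reset to 0
    history = []
    for x in samples:
        inputs = [x] + taps[:-1]  # D of tap k is Q of tap k-1 (direct-form chain)
        history.append([bool((d ^ q) & mask) for d, q in zip(inputs, taps)])
        taps = inputs  # all taps now hold their new values
    return history


def propagated_enables(samples: Sequence[int], n_taps: int, mask: int) -> List[List[bool]]:
    """Only tap 0 has a generator; tap k reuses tap k-1's enable from the previous cycle."""
    shifted = [False] * n_taps  # per-tap control flip-flops, assumed reset to 0
    q0 = 0
    history = []
    for x in samples:
        en0 = bool((x ^ q0) & mask)
        shifted = [en0] + shifted[:-1]  # shift the control signals down the chain
        history.append(shifted)
        q0 = x
    return history


def total_overhead(n: int, p_latch: float, p_xor: float, p_and: float,
                   p_d: float) -> Tuple[float, float]:
    p_c_total = n * 8 * (p_latch + p_xor + p_and)  # Eq. (4)
    p_p_total = (2 * (p_latch + p_and) + 8 * p_xor
                 + (n - 1) * (2 * p_d + 2 * p_and))  # proposed method
    return p_c_total, p_p_total


samples = [0x61, 0x62, 0x72, 0x72, 0x13, 0x14]
for mask in (0x0F, 0xF0):  # sub1 and sub2 enables
    assert direct_enables(samples, 5, mask) == propagated_enables(samples, 5, mask)
print(total_overhead(13, 1, 1, 1, 1))  # (312, 60)
```

For a 13-tap filter with every gate and flip-flop assigned one hypothetical power unit, the control overhead drops from 312 to 60 units, because only the first tap keeps the 8 XOR gates.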
2.2.4. Clock Gating for Pipeline Registers in the Filter Operation Part
The filter operation part consists of multipliers and adders for the convolution operation. Because the delay of the operation part must be less than one clock period, pipeline registers can be inserted for high-speed operation. In this case, efficient clock gating for the pipeline registers is shown in Figure 6. The output of the gating logic in Figure 6 is the output signal A of the D flip-flop in Figure 5.
Although every tap register has its own gating logic, some of the gating logic is omitted from Figure 6 for simplicity. REG1 and REG2 are the pipeline registers. The inputs of REG1 (REG2) are the results of the convolution operation for three (two) tap registers and their coefficients. When all outputs of the corresponding gating logic are 0, the clock of the pipeline register is inactive. Therefore, the gating signals of the tap registers are reused to gate the clock of the pipeline registers without any additional hardware.
FIGURE 5 Propagation of the clock gating control signal.
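The pipeline-register rule (the clock is inactive only when all gating outputs of the feeding taps are 0) amounts to an OR of those outputs. A cycle-level sketch follows, assuming fixed coefficients, so that a pipeline register's inputs can change only when one of its taps is clocked. Which taps feed REG1 and REG2 is not specified beyond their count, so the grouping below and the enable history are hypothetical, as is the name `pipeline_clock_pulses`.

```python
from typing import List, Sequence


def pipeline_clock_pulses(tap_enable_history: Sequence[Sequence[bool]],
                          groups: Sequence[Sequence[int]]) -> List[int]:
    """Clock pulses reaching each pipeline register.

    groups[i] lists the (hypothetical) tap indices whose products feed pipeline
    register i. With fixed coefficients its inputs only change when one of those
    taps is clocked, so its clock is enabled by the OR of the taps' gating outputs.
    """
    pulses = [0] * len(groups)
    for enables in tap_enable_history:
        for i, group in enumerate(groups):
            if any(enables[k] for k in group):  # all gating outputs 0 -> clock off
                pulses[i] += 1
    return pulses


# Hypothetical grouping: REG1 fed by taps 0-2, REG2 by taps 3-4.
history = [
    [True, False, False, False, False],
    [False, False, False, False, False],
    [False, False, False, True, False],
    [False, True, False, False, True],
]
print(pipeline_clock_pulses(history, [[0, 1, 2], [3, 4]]))  # [2, 2]
```

Over these four cycles each pipeline register is clocked twice instead of four times, and no new enable logic is generated: the tap gating signals are simply reused.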

EXPERIMENTAL RESULTS
We measured the power consumption of four FIR filters for image processing. Filter1 is the 13-tap LPF (low pass filter) shown in Figure 1, Filter2 is a 13-tap BPF (band pass filter), and Filter3 and Filter4 are a 5-tap LPF and a 5-tap BPF, respectively. We implemented these filters in VHDL and synthesized them using Synopsys tools. The process technology used in our experiments is a 0.5 µm CMOS standard cell library with a supply voltage of 3.3 V. The power estimation procedure is as follows. About 10,000 input patterns from three images (lenna, couple2, and girl) were simulated at a clock frequency of 10 MHz for each filter, and the switching activity of each node was recorded.
Then, the power dissipation of the filters was estimated using Synopsys DesignPower. Experimental results for the four filters are shown in Table I.
The results show an average power reduction of 19.3% for Filter1, 17.3% for Filter2, 18.4% for Filter3, and 18.3% for Filter4. From Table I, we can see that the proposed method achieves an average power reduction of about 18% over all filters. Averaged over the four filters, the power reduction for each image is 15.5% for lenna, 23.2% for couple2, and 16% for girl. Note that the power reduction for the couple2 image is larger than for the other images because couple2 has higher correlation than the others, as shown in Figure 2.
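The overall averages quoted above can be checked directly from the per-filter and per-image figures in the text; this snippet uses only those quoted numbers, not the individual Table I entries.

```python
from statistics import mean

# Average power reductions (%) quoted in the text above.
per_filter = {"Filter1": 19.3, "Filter2": 17.3, "Filter3": 18.4, "Filter4": 18.3}
per_image = {"lenna": 15.5, "couple2": 23.2, "girl": 16.0}

print(round(mean(per_filter.values()), 1))  # 18.3
print(round(mean(per_image.values()), 1))  # 18.2
```

Both views land at about 18%. Computed from the unrounded Table I entries they would coincide exactly, since each is the mean of the same filter-by-image grid; the small gap presumably reflects rounding in the quoted values.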

CONCLUSIONS
In this paper, we presented a new low power design method of the FIR filter for image processing. Because the correlation of image data is very high, clock gating is a good candidate for a low power strategy. However, the conventional clock gating method is ineffective for highly correlated input such as image data, because it does not consider the correlation characteristic of the input space. To overcome this drawback, each tap register is partitioned into two sub-registers according to the correlation characteristic of its input space, and clock gating is then applied to the partitioned sub-registers. The clock gating probability of the sub-register whose input data is weakly correlated is almost the same as that of each flip-flop of the register, because its input is likely to change at every clock. Nevertheless, our method is area-efficient, which reduces the total power consumption. For the sub-register with highly correlated input, the clock gating control signal of any flip-flop is likely to change together with those of the other flip-flops in the register, so we add only one clock gating circuit. Therefore, the power consumed by the additional control circuitry is much smaller than in the conventional method. We can also reduce the additional hardware overhead by shifting the clock gating control signal of the first tap register to the other tap registers.
From the experimental results on four filters, we showed that the proposed method can reduce the power dissipation by more than 18% compared to the conventional clock gating methods.
In the clock gating circuit of Figure 3, the control signal generator produces the control signal that gates the clock. The output of the XOR gate goes high when the input differs from the output of the flip-flop. The XOR output drives a level-triggered latch with an active-low enable, and the latch output is ANDed with the clock signal.