A Reliable Leakage Reduction Technique for Approximate Full Adder with Reduced Ground Bounce Noise

In this paper, an effective and reliable sleep circuit is proposed, which not only reduces leakage power but also significantly reduces ground bounce noise (GBN) in approximate full adder (FA) circuits. Four 1-bit approximate FA circuits are modified using the proposed sleep circuit, which uses one NMOS and one PMOS transistor. Design metrics such as average power, delay, power delay product (PDP), leakage power, and GBN are compared with nine other 1-bit FA circuits reported to date. All the comparisons are done using post-layout netlists at 45nm technology. The modified designs achieve reductions in leakage power and GBN of up to 60% and 80%, respectively, as compared to the best reported approximate FA circuits. The modified approximate FA also achieves an 83% reduction in leakage power as compared to the conventional FA. Finally, application-level metrics such as peak signal to noise ratio (PSNR) are considered to measure the performance of all the proposed approximate FAs.


Introduction
In today's electronics-driven era, the demand for high-speed electronic devices has put tremendous pressure on VLSI designers to develop low-leakage, high-speed arithmetic circuits. CMOS adders have a significant impact [1] on overall system performance. Most advanced DSP algorithms also use addition as a primary operation, so any improvement in the performance parameters of an FA circuit can improve the whole system [2].
During the earlier phase of the VLSI era, designers focused on miniaturization of MOS transistors to reduce chip size and thereby make electronic devices more portable. It is well known that as technology shrinks, threshold voltage and supply voltage are also scaled down to maintain switching speed and performance, which results in an exponential rise in leakage current [3] in nanoscale VLSI circuits. Leakage power has become a critical issue [4], because electronic devices such as mobile phones, laptops, and other battery-operated devices remain in standby mode most of the time, which results in rapid battery discharge due to heavy leakage current [5]. Several techniques have been used in the past to reduce leakage power in digital circuits, but the most commonly used technique is power gating [6]. It uses sleep transistors, which provide high impedance between Vdd and GND during sleep mode to reduce leakage power. Although existing power gating techniques are very effective, all of them suffer from large voltage fluctuations [7] during the sleep to active mode transition. During the sleep period, the VGND node charges up to the supply voltage, which creates instantaneous current peaks during the mode transition. These voltage fluctuations are known as GBN, as shown in Figure 1.
GBN was not a major problem in micrometer technologies, but it is a serious concern in nanoscale VLSI design [8]. At deep submicron technology, the GBN peaks are large enough to change the internal logic states of a circuit. The reliability of a circuit can therefore be increased by curtailing its GBN.
During the last two decades, various FA circuits have been presented in the literature for low power arithmetic circuits. However, very few of them are optimized for leakage power. Recently, a sleep circuit with three NMOS transistors and one PMOS transistor was presented [9], which reduces leakage power and GBN but requires a large silicon area. A tri-mode circuit [10], which has a park mode between the active and standby modes, is used to reduce leakage current and GBN. It uses a high Vth transistor along with the footer transistor. This technique has a large propagation delay and requires two separate control signals, which further increases complexity.
Another circuit, power gating with stacking [11], is presented to reduce the leakage power and GBN in a 10T FA circuit. This technique carries a penalty in delay and area due to the extra transistors. A threshold voltage tuning technique [12] was presented recently. It uses one sleep transistor, a bias, and a parker transistor with a forward biased body. Although GBN decreases during the sleep to active mode transition, the park control increases delay because of two-step switching. Recently, a new technique has been presented in the literature [13] which uses two sleep transistors and one capacitor. In this approach, one transistor gets the sleep signal immediately and the second transistor gets the sleep signal after some delay. Two GBN peaks and the need for extra buffers are the limitations of this approach. Recently, approximation [14][15][16][17] has been used as one of the possible solutions for low power VLSI design. In the past, several circuit styles of approximate adders have been presented to reduce the average power and delay of the circuit but, to the best of our knowledge, leakage power is not improved. Two imprecise FA circuits [18] are presented, which have faster and lower power operation. They show a reduction in dynamic power dissipation and transistor count as compared to the conventional adder, but leakage power is not considered for optimization. Four approximate one-bit FAs [19] are presented, obtained by logic reduction at the transistor level. Average power dissipation and the number of transistors are reduced relative to the conventional adder at the expense of accuracy, but these circuits consume large leakage power. Another inexact FA [20] is presented, which uses 10 transistors for a one-bit FA.
Although it improves dynamic power, this circuit does not have full voltage swing for the sum signal, which reduces the noise margin, and hence it is not suitable for large cascaded arithmetic circuits. Leakage power is also not reported in that paper, and it might be high due to the two pass transistors used. Although leakage power and GBN have been reduced, further reduction in the GBN peaks is still possible. Most of the existing approximate adders consume large leakage power, which makes them unsuitable for long chain arithmetic circuits. In this article, we have modified the approximate adders with the proposed sleep circuit to reduce leakage power and GBN.

Proposed Circuit
The modified structure of approximate adder is shown in Figure 2 with proposed sleep circuit between VGND and GND terminal.
Four approximate FAs [19] are tested with the proposed sleep circuit. These modified approximate adders are named MAA1, MAA2, MAA3, and MAA4 in this paper, as shown in Figures 3, 4, 5, and 6, respectively, where MAA stands for modified approximate adder.
We have calculated the W/L ratios of all the modified as well as existing approximate FAs according to the basic principle of CMOS design in which the effective width of the pull-up network is double the effective width of the pull-down network. This gives equal rise and fall times for all the output signals. The transistor sizes of all the modified approximate FA cells are included in Figures 3-6. The length of all the transistors is kept at 45nm, which is the minimum length at 45nm technology.
The circuit diagram of the proposed sleep circuit is shown in Figure 7. The sleep circuit has three modes of operation: (i) active mode, (ii) sleep mode, (iii) sleep to active mode transition.
All the simulations are performed using 45nm technology, with the threshold voltages of the NMOS and PMOS transistors given as follows.
$V_{th}(\mathrm{NMOS}) = 0.62\,\mathrm{V}, \quad V_{th}(\mathrm{PMOS}) = -0.58\,\mathrm{V}$ (1)
Active Mode. In active mode, the sleep signal is at logic high. It switches the NMOS transistor of the sleep circuit to the ON state, and the VGND terminal connects to real GND, which forces the PMOS transistor into the cutoff state. In active mode, the NMOS transistor offers very low resistance and the PMOS transistor offers very high resistance. The equivalent circuit of both sleep transistors in active mode is shown in Figure 8, where $C_{DSN}$ is the capacitance between the drain and source of the NMOS transistor, $C_{SDP}$ is the capacitance between the source and drain of the PMOS transistor, and $R_{non}$ is the resistance between the drain and source of the NMOS transistor.
In active mode, the output of the logic circuit is the same as in the normal circuit without the sleep circuit.
Sleep Mode. In sleep mode, the sleep signal gradually falls from Vdd to 0V and the potential at the VGND node increases gradually, as shown in Figure 9.
As long as the voltage level of the sleep signal is higher than the threshold voltage of the Mn1 transistor, Mn1 operates in the saturation region and the potential at the VGND node is very small, which is shown during time T1 in Figure 9. During time T2, the sleep signal reaches 0V and Mn1 goes into the cutoff state. The potential at the VGND node rises sharply during time T2, and when it increases beyond 0.5V, Mp1 turns ON. As the bias at the gate of Mp1 is -0.1V, the $V_{gs}$ of Mp1 at this time is given as follows.
$V_{gs} = V_g - V_s = -0.1\,\mathrm{V} - 0.5\,\mathrm{V} = -0.6\,\mathrm{V}$ (2)
As the threshold voltage of the PMOS transistor is -0.58V, the Mp1 transistor turns ON during time T3. The ON time of the Mp1 transistor during the sleep period depends upon the value of the bias voltage. The higher the value of the bias voltage is, the less the turn ON time will be. The equivalent circuit of the sleep circuit in this condition is shown in Figure 10.
During time T3, Mp1 discharges the potential from the VGND node to GND; as it goes slightly below 0.5V, the Mp1 transistor goes into cutoff because $V_{gs}$ is no longer sufficient to keep Mp1 ON. Therefore, both transistors Mn1 and Mp1 are in the cutoff state simultaneously after time T3, which is the stable state of the circuit. The equivalent circuit of the sleep circuit in this condition is shown in Figure 11.
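The turn-ON and cutoff behaviour of Mp1 described above can be checked with a short first-order calculation. The sketch below is only an illustration: it uses the bias and threshold values quoted in the text, while the function name `pmos_is_on` and the simple threshold comparison (ignoring body effect and subthreshold conduction) are assumptions introduced here.

```python
# First-order check of when the PMOS sleep transistor Mp1 conducts.
# Values are taken from the text; the helper name is hypothetical.

V_BIAS = -0.1    # gate bias of Mp1 in volts
V_TH_P = -0.58   # PMOS threshold voltage in volts


def pmos_is_on(v_gate, v_source, v_th_p=V_TH_P):
    """A PMOS conducts when V_gs is at or below its (negative) threshold."""
    v_gs = v_gate - v_source
    return v_gs <= v_th_p


# Smallest VGND potential (the source of Mp1) at which Mp1 starts to conduct:
# V_gs = V_BIAS - VGND <= V_TH_P  =>  VGND >= V_BIAS - V_TH_P
v_gnd_turn_on = V_BIAS - V_TH_P

print(pmos_is_on(V_BIAS, 0.50))   # VGND at 0.5 V: V_gs = -0.6 V
print(pmos_is_on(V_BIAS, 0.45))   # VGND at 0.45 V: V_gs = -0.55 V
print(round(v_gnd_turn_on, 2))
```

Under these assumptions the switching point sits just below 0.5V, which is consistent with the description that Mp1 turns ON once VGND exceeds 0.5V and returns to cutoff shortly after VGND falls below it.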
The potentials at various nodes during sleep mode are shown in Figure 12.
Sleep to Active Mode Transition. In the sleep to active mode transition, the sleep signal changes from 0V to Vdd and the potential at the gate of the NMOS transistor starts rising. When the potential at the gate of the NMOS transistor increases beyond the threshold voltage, the transistor turns ON. The NMOS transistor turns ON in the linear region, because the potential held at the VGND node during sleep mode (Figure 12) ensures that Mn1 turns ON in the linear region. This greatly reduces the GBN peaks during the switching of the circuit from sleep to active mode.

Simulation Setup
To demonstrate the performance of our modified approximate FAs, post-layout simulation results are extracted at 45nm technology. All the circuits, existing as well as modified, are tested under the same environmental conditions. The simulation setup is shown in Figure 13. A load of 0.8fF is connected at the output of each approximate FA, which is the equivalent load of four minimum unit inverters at 45nm technology.

Results and Discussion
The results are discussed in this section with respect to the average power, delay, PDP, leakage power, and GBN performance metrics. All the performance metrics for the existing and modified adders have been extracted through post-layout simulations. The modified design has the following advantages as compared to the existing sleep circuit [11].
(i) The proposed sleep circuit has very simple circuitry, which uses only one NMOS and one PMOS transistor with normal $V_{th}$.
(ii) The PMOS transistor is used with a negative gate bias. This maintains the potential in such a way that the NMOS transistor of the sleep circuit turns ON in the linear region during the sleep to active mode transition, which reduces the GBN as compared to the existing sleep circuit [11].
(iii) The proposed sleep circuit has symmetric and compact layout as compared to the existing sleep circuit [11].
(iv) Simulation results at various temperature and voltage values confirm the stability of the modified circuit.
Layout of Modified Design. Layouts of all the existing as well as modified FAs under consideration in this paper are designed using the Cadence Virtuoso layout editor at 45nm technology. These layouts are used to perform physical verification (DRC, LVS) and extraction of the post-layout netlists. The post-layout netlists are used for simulation and extraction of the performance parameters of all the FAs in this paper. Layouts of the modified approximate FAs are shown in Figures 14-17.
Average Power and Delay. To extract the average power and delay of all the FAs under consideration, standard test input patterns [21] have been used, which are shown in Figures 18 and 19, respectively. These patterns cover all the possible worst case switching nodes and give very accurate results. In this paper, the maximum frequency of the input data is 500MHz. The output waveforms of all the modified approximate FAs in active mode are shown in Figure 18. Average power includes three types of power dissipation in the CMOS circuit. Mathematically, it is given as follows.
$P_{average} = P_{switching} + P_{static} + P_{short\text{-}circuit}$ (3)
Switching power dissipation is the average power dissipation due to the switching of all the nodes in the circuit during working condition. Static power is the power dissipated by the circuit during its idle state. Short-circuit power is the power dissipation due to the simultaneous conduction of the pull-down and pull-up networks connected between the Vdd and GND of the circuit. The propagation delay is measured from the 50% point of an input transition (0 to 1 or 1 to 0) to the 50% point of the corresponding output transition (0 to 1 or 1 to 0). In this paper, the input pattern shown in Figure 19 is used for measuring the delay. It has a total of 56 input patterns that cover the maximum possible input/output transitions for all the possible input combinations of the FA.
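The 50% to 50% delay definition and the PDP metric can be sketched in a few lines. This is a minimal illustration and not the measurement script used for the reported results: the function names, the sampled-waveform representation, and the linear interpolation between samples are all assumptions made here.

```python
# Sketch: 50%-to-50% propagation delay from sampled waveforms, and PDP.
# All names and the interpolation scheme are illustrative assumptions.

def crossing_time(times, volts, level):
    """Return the first time the waveform crosses `level` (linear interpolation)."""
    for k in range(len(times) - 1):
        t0, t1 = times[k], times[k + 1]
        v0, v1 = volts[k], volts[k + 1]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, vdd):
    """Delay between the 50% points of the input and output transitions."""
    half = vdd / 2.0
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)


def power_delay_product(p_average, delay):
    """PDP is the product of average power and propagation delay."""
    return p_average * delay


# Toy waveforms (time in arbitrary units): input rises, output falls later.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
vin = [0.0, 1.1, 1.1, 1.1, 1.1]
vout = [1.1, 1.1, 0.0, 0.0, 0.0]
print(propagation_delay(t, vin, vout, vdd=1.1))
```

For a full delay figure, the same measurement would be repeated over every transition in the stimulus and the worst case (or average) taken, depending on the convention chosen.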
Leakage Power. The leakage power is measured in the idle mode of the circuit. Leakage current is essentially a combination of the subthreshold and gate oxide leakage currents. Mathematically, the leakage current is given as
$I_{leakage} = I_{subthreshold} + I_{gateleakage}$ (4)
where $I_{leakage}$ is the total leakage current of the circuit, $I_{subthreshold}$ is the subthreshold leakage current, and $I_{gateleakage}$ is the gate leakage current.
The subthreshold and gate oxide leakage currents are expressed in terms of the following quantities: $T_{ox}$ is the gate oxide thickness, $V_T$ is the thermal voltage, W is the width of the transistor, and $K_2$ and $\alpha$ are experimentally derived parameters.
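For reference, widely cited first-order expressions for the two leakage components are sketched below. They are an assumption about the intended form and not a reproduction of the authors' equations; $K_1$, $n$, $V_{th}$, and the supply voltage $V$ are symbols introduced here for illustration, while $W$, $V_T$, $T_{ox}$, and $K_2$ follow the definitions in the text.

```latex
% Assumed first-order leakage models (illustrative only)
I_{subthreshold} = K_1 \, W \, e^{-V_{th}/(n V_T)} \left(1 - e^{-V/V_T}\right)

I_{gateleakage} = K_2 \, W \left(\frac{V}{T_{ox}}\right)^{2} e^{-\alpha T_{ox}/V}
```

In this form the subthreshold term grows exponentially as $V_{th}$ is lowered, and the gate term grows rapidly as $T_{ox}$ is thinned, which matches the scaling trends discussed in the Introduction.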
At 65nm technology, subthreshold leakage is dominant. As the technology shrinks below 65nm, gate leakage current becomes the dominant factor as compared to subthreshold leakage current. During the stationary state of the circuit, the data at the input of the circuit does not change; there is stable data at the input. An FA has 3 inputs, so there are $2^3 = 8$ possible data combinations. Any of these 8 combinations can exist during the idle state, and leakage power is strongly dependent on the data present at the inputs during the idle state. To measure the leakage power accurately, we have measured it for each possible data input, and the average of all these values is the final leakage power of the circuit. The same procedure was adopted to measure the leakage power of all the FA circuits under consideration. Table 1 shows the leakage power values for all the existing and modified FA circuits.
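The averaging procedure described above can be written down directly. In the sketch below, `measure_leakage` is a hypothetical callback standing in for one idle-state simulation at a fixed input vector; the toy model passed to it is made up purely to exercise the code and carries no measured data.

```python
# Sketch: average the idle-state leakage over all 2^3 input vectors of an FA.
from itertools import product


def average_leakage(measure_leakage):
    """Return (mean leakage, per-vector readings) over all (A, B, Cin) vectors.

    `measure_leakage(a, b, cin)` is a hypothetical hook that returns the
    leakage power observed with the inputs held at the given logic levels.
    """
    readings = {
        vector: measure_leakage(*vector)
        for vector in product((0, 1), repeat=3)
    }
    mean = sum(readings.values()) / len(readings)
    return mean, readings


# Toy stand-in for a simulator call (arbitrary units, not real data).
def toy_model(a, b, cin):
    return 1.0 + a + b + cin


mean, readings = average_leakage(toy_model)
print(len(readings), mean)
```

Keeping the per-vector readings alongside the mean makes it easy to see which input combination dominates the leakage.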
GBN Calculations. GBN is the voltage fluctuation during the sleep to active mode transition of a circuit, which is also known as simultaneous switching noise. GBN is a serious issue in deep submicron technology. In this paper, the behavior of the GBN peaks is examined for the existing and modified FAs under consideration. To measure the GBN, we have used the ...
Figure 26 shows the effect of voltage variations on PDP. As the voltage increases, PDP decreases, because the delay decreases more sharply than the average power dissipation increases.
The PDP increases for all the adders as the temperature rises, as shown in Figure 27, because both the average power dissipation and the delay increase with temperature.
To evaluate the circuit performance more precisely, we have studied the effect of output load variations on PDP for all the existing as well as proposed designs under consideration in this paper. It is clear from Figure 28 that as the output load increases, PDP increases. This is because delay and dynamic power are both directly proportional to capacitance, so PDP increases with the output load.
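The load dependence can be made explicit with a standard first-order model. The expressions below are a textbook approximation assumed here for illustration and are not taken from the simulation results; $\alpha_{sw}$ (switching activity), $R_{eq}$ (equivalent driver resistance), $C_L$ (load capacitance), and $f$ (switching frequency) are symbols introduced for this sketch.

```latex
% Assumed first-order model (illustrative only)
P_{switching} = \alpha_{sw} \, C_L \, V_{dd}^{2} \, f

t_{p} \approx 0.69 \, R_{eq} \, C_L

PDP = P_{average} \cdot t_{p}
```

Since both $P_{switching}$ and $t_p$ scale with $C_L$ in this model, the switching contribution to PDP grows faster than linearly as the output load increases.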

Motion Detection Using Proposed Approximate FAs.
In order to measure the application level performance of proposed approximate FA, we have used a practical image processing application of motion detector.
Figures 29(a) and 29(b) show two images with slight differences, which we have used to detect the movement of the object. These two images are converted into binary matrices, named $n_1$ and $n_2$, respectively, using a MATLAB program. They are then subtracted using
$Q(i,j) = n_1(i,j) - n_2(i,j)$ (8)
where i and j index the rows and columns of the binary matrices.
The Q(i,j) matrices determine the possible movement between the two picture frames. If there is no movement, the two images are exactly the same and the resultant matrix is completely zero. If there is movement, the corresponding shadows appear in the resultant picture. We have used two's complement for the subtraction of the $n_2(i,j)$ matrices, which is shown in (9).
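The frame subtraction by two's complement addition can be sketched bit by bit. The code below is an illustration under stated assumptions: pixels are taken to be 8-bit unsigned values, a 9-bit word is used so that the sign of the difference is representable, and all function names are hypothetical. Only an exact full adder is shown, because the truth tables of the approximate cells are not given in this text; an approximate cell could be supplied through the `fa` argument.

```python
# Sketch: Q = n1 - n2 computed as n1 + (~n2) + 1 with a ripple-carry adder
# built from 1-bit full adder cells. Names and bit widths are assumptions.

def exact_full_adder(a, b, cin):
    """Accurate 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(x, y, cin=0, width=9, fa=exact_full_adder):
    """Add two `width`-bit words by chaining 1-bit full adders."""
    result, carry = 0, cin
    for bit in range(width):
        s, carry = fa((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result, carry


def subtract_twos_complement(x, y, width=9, fa=exact_full_adder):
    """Compute x - y as x + (~y) + 1, truncated to `width` bits."""
    mask = (1 << width) - 1
    diff, _ = ripple_add(x, ~y & mask, cin=1, width=width, fa=fa)
    return diff


def to_signed(value, width=9):
    """Interpret a `width`-bit word as a two's complement signed number."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


def frame_difference(n1, n2, width=9, fa=exact_full_adder):
    """Absolute per-pixel difference of two equally sized frames."""
    return [
        [abs(to_signed(subtract_twos_complement(a, b, width, fa), width))
         for a, b in zip(row1, row2)]
        for row1, row2 in zip(n1, n2)
    ]


frame1 = [[200, 10], [5, 5]]
frame2 = [[10, 200], [5, 5]]
print(frame_difference(frame1, frame2))
```

Pixels that are unchanged between the frames give zero, so any nonzero entry marks a region where movement occurred.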

Mathematical Problems in Engineering
The addition operation in (9) is the binary addition of two matrices, which is performed using the accurate and the proposed approximate FAs, respectively. The Q(i,j) matrices are converted into the output image using a MATLAB program. Figure 30 shows the resultant picture from the Q(i,j) matrices with the absolute difference computed using all accurate adders. Figure 31 shows the absolute difference between the frames using the MAA1 adder (Figure 31(a)), MAA2 adder (Figure 31(b)), MAA3 adder (Figure 31(c)), and MAA4 adder (Figure 31(d)). As the number of approximate outputs increases, the output result tends to become inaccurate, which is clearly visible in Figure 31.
Finally, the PSNR is measured as a quantitative metric of the quality of the proposed approximate FAs. The PSNR is measured using (10), where $MAX_f$ denotes the maximum signal value that exists in our original image and MSE denotes the mean-squared error. The MSE is calculated using (11), where the symbols n and m denote the number of rows and columns of a picture. Table 3 shows the comparison of PSNR for the subtracted images obtained using the proposed approximate adders. All the results in Table 3 are extracted through a MATLAB program.
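A minimal sketch of the PSNR calculation is given below. It assumes the commonly used definitions, PSNR = 20 log10(MAX_f / sqrt(MSE)) with the MSE averaged over all n by m pixels, which may differ in detail from the paper's equations (10) and (11). The function names are hypothetical, and plain nested lists stand in for the image matrices.

```python
# Sketch: PSNR between a reference image and a test image (assumed definitions).
import math


def mean_squared_error(reference, test):
    """Average squared pixel difference over an n-by-m image."""
    n, m = len(reference), len(reference[0])
    total = sum(
        (r - t) ** 2
        for row_r, row_t in zip(reference, test)
        for r, t in zip(row_r, row_t)
    )
    return total / (n * m)


def psnr(reference, test, max_value=None):
    """PSNR in dB; `max_value` defaults to the largest reference pixel."""
    if max_value is None:
        max_value = max(max(row) for row in reference)
    mse = mean_squared_error(reference, test)
    if mse == 0:
        return math.inf  # identical images
    return 20 * math.log10(max_value / math.sqrt(mse))


exact = [[255, 0], [0, 255]]
approx = [[255, 0], [0, 245]]
print(mean_squared_error(exact, approx))
print(psnr(exact, approx))
```

Here the reference would be the difference image produced with accurate adders and the test image the one produced with an approximate adder, so a higher PSNR indicates a smaller loss of quality.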
Comparison of Performance between Modified Approximate FA and Existing FAs. This section presents the performance comparison of the modified FA circuits with the existing FAs in terms of average power, delay, leakage power, PDP, and maximum GBN peaks. From the results in Table 2, it is observed that the MAA3 adder has the best performance characteristics among all the modified adders, so we have compared the MAA3 adder with the existing adders. It has the following advantages.
(1) The modified MAA3 adder has the least leakage power consumption (34W) as compared to any other FA under consideration. The reduction in leakage power is achieved due to the reduction in potential at the VGND node in sleep mode, which decreases the potential difference between the source and drain of the NMOS transistors in the pull-down network of the adder, and hence the gate leakage current reduces. This feature of the one-bit adder makes the modified design a good choice for designing long chain arithmetic circuits at nanoscale VLSI technology.
(2) The modified MAA3 has a delay of 316ps, which is 12% more than that of the existing adder of the same type, MA3. The increase in delay in the MAA3 FA is due to the inclusion of the sleep circuit.
(3) The GBN peak of the modified MAA3 adder is 145v, which is 80% less than that of the existing adder of the same category (MA3). This is achieved because of the reduction in stored charge at the VGND node during sleep mode. Also, the NMOS transistor of the proposed sleep circuit turns ON in the linear region during the sleep to active mode transition, which further reduces the voltage fluctuations and hence the GBN peaks. The reduction in GBN peaks makes this design a good choice for designing low leakage arithmetic circuits for portable electronic devices.
(4) MAA3 has the least average power dissipation among the modified adders and has almost equal average power dissipation when compared to existing adder MA3.
(5) MAA3 has 11% more PDP as compared to the MA3 adder, but it still ranks second when MAA3 is compared with all the other existing adders under consideration, as shown in Table 2.
The modified approximate FAs in this paper not only reduce GBN but also effectively reduce the leakage current using the sleep circuit. Most existing power gating techniques use multithreshold transistors to design sleep circuits, which increases the complexity of layout design due to various extra processing steps. In all the modified approximate FA circuits, all the transistors have the same $V_{th}$, which makes layout design very simple. However, there is a marginal increase in delay and PDP in the modified circuits, which is still in an acceptable range. Reduction in leakage power and GBN carries more weight than PDP and delay, because modern portable electronic devices need low leakage circuits to improve battery life in the idle condition.

Conclusion
This paper presented four modified approximate adders which minimize the leakage power as well as control the GBN during mode transition. The modified approximate FAs are compared with other standard design approaches such as MA1, MA2, MA3, MA4, imprecise adder, inexact adder, hybrid adder (FA), and conventional adder. The existing as well as modified FAs are simulated using post-layout netlists extracted from the Cadence Virtuoso layout editor. The results show that the modified design has the least leakage power dissipation and GBN in comparison to the other adders. The modified designs offer 60% and 80% improvement in leakage power and GBN, respectively, as compared to the best reported existing approximate FAs. The modified approximate adder circuits are verified over various voltage and temperature ranges, which confirms their robustness and reliability. Although the modified design needs extra transistors, the improvement in performance parameters makes this circuit a good choice for nanoscale VLSI design. It is concluded that the modified design is one of the best contenders for designing long chain arithmetic circuits for low leakage power applications.

Figure 18: (a) Stimulus for average power dissipation for proposed and existing FAs. (a) Output waveforms of MAA1 adder. (b) Output waveforms of MAA2 adder. (c) Output waveforms of MAA3 adder. (d) Output waveforms of MAA4 adder.

Figure 19: Input stimuli used to calculate delay for existing and modified approximate FA under consideration.

Figure 22: Effect on leakage power for existing and modified approximate FA due to variations in voltages.

Figure 30: Resultant picture with absolute difference between two frames using all accurate adders.

Table 2: Comparison of post-layout simulation results between existing and modified FAs at 1.1V and 27 °C.

Table 3: PSNR for the subtracted images obtained using the proposed MAA1, MAA2, MAA3, and MAA4 approximate FAs.