Subthreshold circuit design is popular for ultra-low power applications in which minimum energy consumption is the primary concern. Because of the weak driving current, however, such circuits generally suffer severe performance degradation. In this paper we therefore analyze the performance of a near-threshold circuit (NTC), which retains much of the energy efficiency of the subthreshold design while recovering part of the performance. A modified row-based dual Vdd 4-operand carry save adder (CSA) is designed in 45 nm technology and, to assess the effectiveness of near-threshold operation, compared with other design styles. Simulation results at a frequency of 20 MHz show that the proposed CSA design consumes 3.009 × 10^-7 Watt of average power (Pavg), almost 90.9% less than the conventional CSA design, while its maximum delay at the output is 44.37% lower than that of the subthreshold CSA design.
1. Introduction
Subthreshold digital circuit design is a well-established technique for implementing highly energy-constrained, ultra-low power applications such as implanted sensors, pacemakers, and mobile peripheral processors [1, 2]. Its primary challenge, which restricts it to low performance systems, is the weak driving current. In subthreshold or near-threshold operation, the MOS transistor is driven with a gate-to-source voltage that is below or close to the threshold voltage (vth) of the device; correspondingly, the supply voltage (Vdd) is scaled below vth or set close to it. This minimizes power consumption and hence extends battery lifetime [2]. The energy advantage, however, comes at the cost of performance, mainly because the load capacitances of the circuit are charged and discharged (as the logic state changes) by the weak subthreshold leakage current [3].
It has been observed that a notable improvement in the performance of a CMOS circuit is possible if a small amount of energy efficiency is sacrificed [3]. This observation motivates the growing use of near-threshold circuits (NTCs). More precisely, an NTC is a circuit that operates with a supply voltage equal to or slightly greater than vth [4].
On the other hand, a dual Vdd scheme can be very effective in reducing both the dynamic and the leakage power of a CMOS circuit [5, 6]. It supplies the higher voltage (VddH) to timing-critical logic gates, while the noncritical gates are driven by a lower supply voltage (VddL). The dual Vdd technique thus reduces the overall power consumption without degrading the performance of the circuit too much [2, 4]. However, using VddH to speed up the timing-critical gates and VddL to save power in the noncritical gates normally requires additional level-shifters, which cost extra power and area [7]. For an NTC, the key advantage is that VddH and VddL are very close to each other, and such a small difference between the two supply voltages can eliminate the need for voltage level-shifters [4]. By properly selecting the subset of logic gates assigned to VddH, we can therefore significantly improve the performance of the circuit at an affordable power cost [4].
Although dual Vdd assignment is attractive for NTCs, it may incur extra cost in the physical design implementation [4]. To reduce this routing overhead, we may use row-based dual Vdd assignment, in which the rows of the circuit are prioritized by timing criticality: rows lying on the critical path are driven by VddH, while the remaining rows are supplied with VddL [4]. In this work, to assess the effectiveness of row-based dual Vdd assignment for NTCs, the scheme is implemented on an example circuit, the 4-operand CSA described in [8]. The rest of the paper is organized as follows. Section 2 introduces several design issues for subthreshold circuits. Section 3 presents the row-based dual Vdd assignment for a 4-operand CSA, and Section 4 describes the near-threshold operation of the 4-operand CSA and analyzes its performance. Section 5 concludes the paper.
2. Subthreshold Circuit Design Issues
2.1. Modeling the Minimum Energy Point
In subthreshold operation (Vdd < vth), the current that flows through the channel of a transistor is mainly due to diffusion [9]. To estimate the minimum energy point of a subthreshold circuit, we use the current model that serves as the basis for the entire analysis [9]. Assuming that the total drain current in the subthreshold regime equals the subthreshold current (Isub), and taking n as the subthreshold slope factor (n = 1 + Cd/Cox), VT as the thermal voltage (VT = kT/q), η as the linearized drain induced barrier lowering (DIBL) coefficient, and S as the subthreshold slope (S = n·VT·ln 10), Isub can be represented as [3]
$$I_{sub} = I_0 \times 10^{(V_{gs} - v_{th} + \eta V_{ds})/S}\left(1 - e^{-V_{ds}/V_T}\right), \tag{1}$$
where I0 denotes the drain current at Vgs (gate-to-source voltage) equal to vth (threshold voltage), and η models the dependence on Vds (drain-to-source voltage) in the “quasisaturation” region [9].
Again, for a subthreshold circuit, the gate delay is expressed by the following [3]:
$$t_d = \frac{K \cdot C_L \cdot V_{dd}}{I_{on}}, \tag{2}$$
where K is the delay fitting parameter and CL is the output load capacitance of the gate.
For Vgs = Vds = Vdd ≫ VT, we can rewrite (2) as [3]
$$t_d = \frac{K \cdot C_L \cdot V_{dd}}{I_0 \times 10^{((\eta + 1)V_{dd} - v_{th})/S}}. \tag{3}$$
Thus, the propagation delay of the gate depends exponentially on both Vdd and vth.
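The exponential dependence can be made concrete with a minimal numerical sketch of (1) and (3). The DIBL term uses the sign implied by the (η + 1)Vdd factor in (3). Apart from vth = 0.469 Volt (the NMOS threshold quoted in Section 2.2.1), every parameter value below (K, CL, I0, η, S) is an illustrative assumption and is not taken from the SPICE model used in this work.

```python
import math

def i_sub(v_gs, v_ds, i0, v_th, eta, s, v_t=0.0259):
    """Subthreshold drain current of Eq. (1); v_t is the thermal voltage near 300 K."""
    exponent = (v_gs - v_th + eta * v_ds) / s
    return i0 * 10 ** exponent * (1.0 - math.exp(-v_ds / v_t))

def gate_delay(v_dd, k, c_load, i0, v_th, eta, s):
    """Gate delay of Eq. (3), valid for V_gs = V_ds = V_dd >> V_T."""
    i_on = i0 * 10 ** (((eta + 1.0) * v_dd - v_th) / s)
    return k * c_load * v_dd / i_on

# Assumed, illustrative parameters (only v_th comes from the text).
params = dict(k=1.0, c_load=1e-15, i0=1e-7, v_th=0.469, eta=0.1, s=0.09)

for v_dd in (0.2, 0.3, 0.4):
    print(f"Vdd = {v_dd:.1f} V -> td = {gate_delay(v_dd, **params):.3e} s")
```

With these assumed values, each 0.1 Volt reduction in Vdd lengthens the delay by roughly an order of magnitude, which is the behavior that limits subthreshold designs to low performance systems.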
Next, the total energy consumed per cycle (assuming rail to rail swing, i.e., Vgs=Vdd for “ON” current) by a single gate can be expressed as [3]
$$E_{total} = E_{dynamic} + E_{leak}, \tag{4}$$
where $E_{dynamic} = \alpha_{0\to1} \cdot C_L \cdot V_{dd}^{2}$ and $E_{leak} = I_{leakage} \times V_{dd} \times t_d$.
Here Ileakage denotes the leakage current, and α0→1 is the low-to-high switching activity at the output of the gate [3].
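As a worked step, substituting the delay of (3) into the leakage term of (4) makes the supply dependence of the total energy explicit. This is only a rearrangement of the expressions above, with Ileakage kept as a separate parameter:

$$E_{total} = \alpha_{0\to1} C_L V_{dd}^{2} + I_{leakage} V_{dd} \cdot \frac{K C_L V_{dd}}{I_0 \times 10^{((\eta + 1)V_{dd} - v_{th})/S}}$$

The first term grows quadratically with Vdd, whereas the delay factor in the second term grows exponentially as Vdd is lowered, so the two contributions pull the energy-optimal supply voltage in opposite directions.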
2.2. Optimum Sizing of the Various Logic Gates
2.2.1. Subthreshold Voltage Transfer Characteristics (VTC) of the CMOS Inverter Circuit
For the 45 nm technology node, the SPICE model used for simulation sets the threshold voltage to 0.469 Volt for the NMOS and −0.418 Volt for the PMOS. Figure 1 shows the voltage transfer characteristic (VTC) curves of an inverter circuit as the supply voltage is varied from 0 to 0.4 Volt (in steps of 0.1 Volt) to inspect the behavior of the circuit in the subthreshold region. For a PMOS-to-NMOS width ratio (Wp/Wn) of around 4 : 1, the output shows a sharp transition whenever the input crosses the Vdd/2 level.
Figure 1: Voltage transfer characteristics curve of the CMOS inverter circuit.
2.2.2. Subthreshold XOR Gate Using Transmission Gate Logic
Figure 2(a) shows the conventional transmission gate based 8-transistor XOR, which works at ultra-low voltages [9]. The use of transmission gates also helps to balance the number of parallel devices operating at the minimum voltage [9].
Figure 2: (a) 8-transistor transmission gate XOR [9]. (b) Plot of Pavg and tdmax versus (Wp/Wn).
Figure 2(b) illustrates the plot of the values of the Pavg and the maximum delay at output (tdmax), with the variation of the Wp and Wn sizes of the transistors used in the circuit. It is seen that Wp/Wn = 800 nm/200 nm gives the optimum point where the two curves cross each other.
2.2.3. Subthreshold Operation of a Two-Inverter Chain or a Buffer Circuit
Here a two-inverter chain (buffer circuit) is simulated first with a single Vdd and then with a dual Vdd (Vdd1 = 0.4 Volt and Vdd2 = 0.8 Volt). In the first case, with Vdd = 0.4 Volt and a frequency of 200 MHz, we considered different Wp/Wn values (maintaining the above-mentioned ratio) for the transistors of the buffer circuit. For a gate length (L) of 45 nm and Wp/Wn = 800 nm/200 nm, the Pavg of the circuit is 3.528 × 10^-8 Watt and the tdmax is 1.378 × 10^-10 Second.
In a dual Vdd assignment for any CMOS circuit, the major problem arises when a low-swing input drives a high-Vdd gate. Whenever a high voltage gate has to be driven by a low voltage gate, a level converter (LC) is therefore normally required [3] to shift the signal from the lower voltage level to the higher one. However, as LCs do not implement any logic function, using a large number of them in a circuit causes area as well as energy overhead [7].
To mitigate this issue, [7] describes the use of a second threshold voltage for the PMOS transistors in the higher voltage gates that are driven by lower voltage gates. We follow a similar concept here (as shown in Figure 3), except that we raise the threshold of those PMOS transistors by increasing their gate lengths [10]. The overall performance of this buffer circuit with a dual Vdd is given in Table 1.
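Table 1: Performance of the buffer circuit with dual Vdd.

| Vdd1 (Volt) | Vdd2 (Volt) | Wp/Wn (nm/nm) | L (nm) | Average power (Watt) | Delay, tdmax (Second) | Output swing (mVolt) |
|---|---|---|---|---|---|---|
| 0.4 | 0.8 | 800/200 | 45 | 3.553 × 10^-6 | - | 450 to 800 |
| 0.4 | 0.8 | 800/200 | 60 | 1.132 × 10^-6 | 1.272 × 10^-10 | 65 to 800 |
| 0.4 | 0.8 | 800/200 | 75 | 5.664 × 10^-7 | 1.429 × 10^-10 | 18 to 800 |
| 0.4 | 0.8 | 800/200 | 90 | 3.961 × 10^-7 | 1.516 × 10^-10 | 10 to 800 |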
Figure 3: Buffer circuit with dual Vdd.
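From Table 1, it can be seen that the best results are obtained when the gate length of the PMOS transistor in the higher voltage inverter is set to 90 nm: the average power is the lowest and the output swing is the widest (10 to 800 mVolt), at the cost of a slightly larger tdmax.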
2.3. Obtaining the Vdd,optimum for a Full Adder Circuit
First, the full adder (FA) circuit of Figure 4 is driven by a single Vdd [11, 12], with inputs at a frequency of 200 MHz. This FA circuit, which has no buffer circuits at its sum and carry outputs, is hereafter called FA1 unless otherwise mentioned.
Figure 4: Design of the full adder circuit, which is used in FA1 and FA2 blocks [11].
To find the Vdd,optimum for FA1, we varied Vdd from 0.1 Volt to 0.8 Volt (in steps of 0.1 Volt) and measured the resulting changes in Ileakage and tdmax (Table 2).
Table 2: Variation in Eleakage with the change in Vdd.

| Vdd (Volt) | Ileakage (Amp.) | Delay, tdmax (Second) | Eleakage (Joule) |
|---|---|---|---|
| 0.1 | 0.9 × 10^-9 | - | - |
| 0.2 | 1.8 × 10^-9 | 1.45 × 10^-8 | 0.5220 × 10^-17 |
| 0.3 | 3 × 10^-9 | 3.78 × 10^-10 | 0.0340 × 10^-17 |
| 0.4 | 5 × 10^-9 | 10.79 × 10^-11 | 0.0215 × 10^-17 |
| 0.5 | 7.8 × 10^-9 | 3.02 × 10^-11 | 0.0117 × 10^-17 |
| 0.6 | 12 × 10^-9 | 1.61 × 10^-11 | 0.0115 × 10^-17 |
| 0.7 | 20 × 10^-9 | 1.12 × 10^-11 | 0.0156 × 10^-17 |
| 0.8 | 33 × 10^-9 | 0.907 × 10^-11 | 0.0239 × 10^-17 |
It is observed that Eleakage (= Ileakage × Vdd × td) is lowest in the region of Vdd = 0.4 Volt to 0.6 Volt. Since Edynamic increases with Vdd, we have chosen Vdd = 0.4 Volt as the Vdd,optimum for the FA1 circuit.
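The leakage energy column of Table 2 can be recomputed directly from the measured Ileakage and tdmax values. The short sketch below does only that arithmetic; the variable and function names are illustrative, and the Vdd = 0.1 Volt row is omitted because no delay is reported for it.

```python
# (Vdd [Volt], Ileakage [Amp.], tdmax [Second]) transcribed from Table 2.
TABLE_2 = [
    (0.2, 1.8e-9, 1.45e-8),
    (0.3, 3e-9, 3.78e-10),
    (0.4, 5e-9, 10.79e-11),
    (0.5, 7.8e-9, 3.02e-11),
    (0.6, 12e-9, 1.61e-11),
    (0.7, 20e-9, 1.12e-11),
    (0.8, 33e-9, 0.907e-11),
]

def leakage_energy(v_dd, i_leak, t_d):
    """Eleakage = Ileakage x Vdd x td, in Joule."""
    return i_leak * v_dd * t_d

energies = {v_dd: leakage_energy(v_dd, i_leak, t_d) for v_dd, i_leak, t_d in TABLE_2}
v_min = min(energies, key=energies.get)

for v_dd, e_leak in energies.items():
    print(f"Vdd = {v_dd:.1f} V  Eleakage = {e_leak:.3e} J")
print(f"Smallest Eleakage at Vdd = {v_min} V")
```

The recomputed products agree with the tabulated Eleakage values to within rounding. The smallest product occurs at Vdd = 0.6 Volt, closely followed by 0.5 Volt, so the choice of 0.4 Volt rests on the additional Edynamic consideration stated above.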
Next, the same full adder circuit of Figure 4 is provided with two buffer circuits at its sum and carry outputs. In these buffers, the first inverter is driven by a supply of Vdd1 and the second by a supply of Vdd2. As mentioned in Section 2.2, the gate length of the PMOS transistor of the second inverter is taken as L = 90 nm. This FA circuit, supplied with the dual Vdd, is hereafter referred to as FA2.
Table 3 shows the performance of the FA2 circuit when Vdd1 is set to 0.4 Volt, the frequency is 200 MHz, and Vdd2 is varied from 0.4 Volt to 0.8 Volt.
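Table 3: Performance of the FA2 circuit with dual Vdd.

| Vdd1 (Volt) | Vdd2 (Volt) | Average power (Watt) | Delay, tdmax, Sum (Second) | Delay, tdmax, Carry (Second) | Average power × Delay, Sum (Joule) | Average power × Delay, Carry (Joule) |
|---|---|---|---|---|---|---|
| 0.4 | 0.8 | 7.993 × 10^-7 | 5.237 × 10^-10 | 4.178 × 10^-10 | 41.859 × 10^-17 | 33.395 × 10^-17 |
| 0.4 | 0.7 | 2.147 × 10^-7 | 5.472 × 10^-10 | 4.403 × 10^-10 | 11.750 × 10^-17 | 9.454 × 10^-17 |
| 0.4 | 0.6 | 1.323 × 10^-7 | 5.703 × 10^-10 | 4.406 × 10^-10 | 7.545 × 10^-17 | 5.829 × 10^-17 |
| 0.4 | 0.5 | 1.122 × 10^-7 | 6.084 × 10^-10 | 4.735 × 10^-10 | 6.830 × 10^-17 | 5.315 × 10^-17 |
| 0.4 | 0.4 | 1.019 × 10^-7 | 7.415 × 10^-10 | 6.015 × 10^-10 | 7.559 × 10^-17 | 6.132 × 10^-17 |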
From Table 3, it can be inferred that the best power delay product is obtained when Vdd1 = 0.4 Volt and Vdd2 = 0.5 Volt.
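The selection of Vdd2 can be reproduced from the power and delay columns of Table 3. The following sketch recomputes the power delay product for the sum and carry outputs and picks the Vdd2 that minimizes each; the names are illustrative and the data are transcribed from the table.

```python
# (Vdd2 [Volt], Pavg [Watt], tdmax sum [Second], tdmax carry [Second]),
# all measured with Vdd1 = 0.4 Volt, transcribed from Table 3.
TABLE_3 = [
    (0.8, 7.993e-7, 5.237e-10, 4.178e-10),
    (0.7, 2.147e-7, 5.472e-10, 4.403e-10),
    (0.6, 1.323e-7, 5.703e-10, 4.406e-10),
    (0.5, 1.122e-7, 6.084e-10, 4.735e-10),
    (0.4, 1.019e-7, 7.415e-10, 6.015e-10),
]

def power_delay_products(rows):
    """Map each Vdd2 to its (sum, carry) power delay product in Joule."""
    return {v_dd2: (p_avg * td_sum, p_avg * td_carry)
            for v_dd2, p_avg, td_sum, td_carry in rows}

pdp = power_delay_products(TABLE_3)
best_for_sum = min(pdp, key=lambda v: pdp[v][0])
best_for_carry = min(pdp, key=lambda v: pdp[v][1])

for v_dd2, (pdp_sum, pdp_carry) in pdp.items():
    print(f"Vdd2 = {v_dd2:.1f} V  PDP(sum) = {pdp_sum:.3e} J  PDP(carry) = {pdp_carry:.3e} J")
print(f"Minimum PDP: sum at {best_for_sum} V, carry at {best_for_carry} V")
```

Both outputs reach their minimum power delay product at Vdd2 = 0.5 Volt, in agreement with the choice made above.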
3. Row-Based Dual Vdd Assignment for a 4-Operand CSA
Figure 5 shows a 4-operand CSA, where four 4-bit binary numbers (say, A3A2A1A0, B3B2B1B0, C3C2C1C0, and D3D2D1D0) can be added with an initial carry-in [8]. The upper two rows of the circuit (as shown in Figure 5) form the 4-bit CSA, whereas the third row serves as the carry propagation adder (CPA) [8].
Figure 5: Architecture of the 4-operand CSA.
To fine-tune the performance, we can opt for near-threshold operation of this example circuit, selectively using VddH for the gates on the critical path and VddL for the remaining gates to reduce the overall power consumption [4]. The dotted line in Figure 5 denotes the critical path of the circuit. From the viewpoint of physical design implementation, row-based dual Vdd assignment is adopted here: the entire circuit is partitioned into three clusters of rows. The first cluster consists of the rows that are not time-critical (and are hence driven by VddL), whereas the third cluster consists of the rows that are time-critical (and are hence driven by VddH). The row in the second cluster is populated with gates that are equipped to interface between the row at VddL and the row at VddH.
With this notion, in our modified 4-operand CSA design (as illustrated in Figure 5), row1 is driven by the VddL (=0.4 Volt), row2 is driven by a dual supply of VddL (= 0.4 Volt) and VddH (= 0.5 Volt), and row3 is driven by the VddH (= 0.5 Volt) only. Furthermore, as the basic building blocks of the CSA design, we have used FA1 blocks for both row1 and row3 and FA2 blocks for the intermediate row2.
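The row assignment described above can be recorded in a small data structure, which is sometimes convenient when scripting netlist generation. The sketch below is purely illustrative: the class, the field names, and the "adjacent rows must share a supply rail" check are assumptions introduced here as a bookkeeping aid and are not part of the design flow used in this work.

```python
from dataclasses import dataclass

V_DD_L = 0.4  # Volt, lower supply from the text
V_DD_H = 0.5  # Volt, higher supply from the text

@dataclass(frozen=True)
class Row:
    name: str
    supplies: frozenset  # supply rails (Volt) available in this row
    cell: str            # full adder block used in this row

# Assignment stated in the text: row1 at VddL, row2 dual, row3 at VddH.
ROWS = (
    Row("row1", frozenset({V_DD_L}), "FA1"),
    Row("row2", frozenset({V_DD_L, V_DD_H}), "FA2"),
    Row("row3", frozenset({V_DD_H}), "FA1"),
)

def unbridged_boundaries(rows):
    """Return adjacent row pairs that share no supply rail (hypothetical check)."""
    return [(a.name, b.name)
            for a, b in zip(rows, rows[1:])
            if not (a.supplies & b.supplies)]

print(unbridged_boundaries(ROWS))
```

Under this assumed criterion, the three-row assignment has no unbridged boundary, whereas placing row1 directly next to row3 would be flagged, which mirrors the interfacing role of the FA2 blocks in row2.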
4. Near-Threshold Operation of the Proposed Scheme of CSA Design and Its Performance Analysis
When the conventional CSA design of [8] is simulated with a larger supply voltage (Vdd = 1 Volt) at a frequency of 20 MHz, the tdmax obtained is 2.071 × 10^-10 Second. For subthreshold operation (Vdd = 0.4 Volt) of the same circuit, the power consumption drops drastically, but tdmax increases to 7.774 × 10^-9 Second. The application of the subthreshold design is therefore mostly limited to low performance systems.
Now, to maintain this excellent energy efficiency of the subthreshold design, while boosting the speed of operation by a significant amount, we can explore the performance of the design for the near-threshold operation [13]. And that is what we have actually done in this work. To evaluate the effectiveness of the near-threshold operation of our modified CSA design, it has been compared with the conventional CSA design as well as the subthreshold CSA design (as shown in Table 4).
Table 4: Comparison of different design styles for the 4-operand CSA.

| Design style | Supply voltage (Volt) | Average power (Watt) | Delay, tdmax (Second) |
|---|---|---|---|
| Conventional CSA design | Vdd = 1 | 3.307 × 10^-6 | 2.071 × 10^-10 |
| Proposed scheme of CSA design | VddL = 0.4, VddH = 0.5 | 3.009 × 10^-7 | 4.324 × 10^-9 |
| Subthreshold CSA design | Vdd = 0.4 | 1.577 × 10^-7 | 7.774 × 10^-9 |
At a frequency of 20 MHz, the proposed CSA design consumes 3.009 × 10^-7 Watt of Pavg, almost 90.9% less than the conventional CSA design. In terms of delay at the output, the proposed CSA design provides a 44.37% improvement in tdmax over the subthreshold CSA design.
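The two percentages quoted above follow from the entries of Table 4. A minimal sketch of the arithmetic, with illustrative variable names, is given below.

```python
def percent_reduction(reference, value):
    """Percentage by which `value` is lower than `reference`."""
    return 100.0 * (reference - value) / reference

# Values transcribed from Table 4 (20 MHz).
p_conventional = 3.307e-6   # Watt, Vdd = 1 Volt
p_proposed = 3.009e-7       # Watt, VddL = 0.4 Volt, VddH = 0.5 Volt
td_subthreshold = 7.774e-9  # Second, Vdd = 0.4 Volt
td_proposed = 4.324e-9      # Second

power_saving = percent_reduction(p_conventional, p_proposed)
delay_improvement = percent_reduction(td_subthreshold, td_proposed)

print(f"Power saving vs. conventional design:    {power_saving:.2f}%")
print(f"Delay improvement vs. subthreshold design: {delay_improvement:.2f}%")
```

The computed values agree with the quoted 90.9% and 44.37% to within rounding of the last digit.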
Figure 6 illustrates the variation in power consumption for all three design styles at frequencies of 20 MHz, 50 MHz, and 200 MHz.
Figure 6: Average power consumption values for different frequencies.
The key points regarding the performance of the modified 4-operand CSA design presented in this work are as follows.
First, a higher supply voltage can be chosen freely (as required) for the gates on the critical path. Where the speed of the circuit is important, it can be tuned by increasing VddH, even up to 0.8 Volt.
From Figure 6, it can be inferred that the proposed scheme works not only at an operating frequency of 20 MHz but also at the higher frequencies of 50 MHz and 200 MHz.
Row-based dual Vdd assignment is incorporated in the proposed CSA design to ease the physical design implementation.
Where performance tuning is not required and Pavg is the only concern, one may simply choose the subthreshold CSA design, whose power consumption is the lowest (Table 4).
Lastly, Vdd = 1 Volt and Vdd = 0.4 Volt give the upper and lower limits of Pavg, respectively, and correspondingly the lower and upper limits of tdmax.
The proposed CSA design lies between these limits, with the major advantage that the choice of VddH is flexible and can be used for performance tuning.
5. Conclusion
In this work we have focused on the performance analysis of a row-based dual Vdd CSA design that operates in the near-threshold voltage regime. For that purpose, we used two supply voltages: VddH (= 0.5 Volt) and VddL (= 0.4 Volt). The entire circuit is partitioned into clusters of rows, and all the logic gates within a cluster are driven by a single supply (either VddH or VddL), except for the interfacing row, which uses both. A comparison among the different design styles for a 4-operand CSA has also been presented. From the results obtained, we infer that near-threshold operation of the proposed CSA design can be very effective in reducing the overall energy consumption, much like a subthreshold design, while also allowing the performance of the circuit to be tuned so that the maximum delay at the output is reduced.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors would like to thank SMDP-II Project Lab., IC Design & Fabrication Centre, Jadavpur University, for giving them the opportunity to carry out this work using SPICE Tools.
References
[1] Radfar M., Shah K., Singh J. Recent subthreshold design techniques. 2012, vol. 2012, Article ID 926753, 11 pages. doi:10.1155/2012/926753.
[2] Kim K., Agrawal V. D. Minimum energy CMOS design with dual subthreshold supply and multiple logic-level gates. Proceedings of the 12th International Symposium on Quality Electronic Design (ISQED '11), March 2011, Santa Clara, Calif, USA, pp. 689-694. doi:10.1109/ISQED.2011.5770804.
[3] Kim K., Agrawal V. D. True minimum energy design using dual below-threshold supply voltages. Proceedings of the 24th IEEE Annual Conference on VLSI Design, January 2011, pp. 292-297. doi:10.1109/VLSID.2011.67.
[4] Kakoee M. R., Benini L. Fine-grained power and body-bias control for near-threshold deep sub-micron CMOS circuits. 2011, vol. 1, no. 2, pp. 131-140. doi:10.1109/JETCAS.2011.2159285.
[5] Lackey D. E., Zuchowski P. S., Bednar T. R. Managing power and performance for system-on-chip designs using voltage islands. Proceedings of the IEEE/ACM International Conference on Computer Aided Design (ICCAD '02), 2002, pp. 195-202. doi:10.1109/ICCAD.2002.1167534.
[6] Hu J., Shin Y., Dhanwada N., Marculescu R. Architecting voltage islands in core-based system-on-a-chip designs. Proceedings of the International Symposium on Lower Power Electronics and Design (ISLPED '04), August 2004, pp. 180-185.
[7] Diril A. U., Dhillon Y. S., Chatterjee A., Singh A. D. Level-shifter free design of low power dual supply voltage CMOS circuits using dual threshold voltages. Proceedings of the 18th International Conference on VLSI Design: Power Aware Design of VLSI Systems, January 2005, pp. 159-164.
[8] Yeo K. S., Roy K. Tata McGraw Hill, 2009.
[9] Wang A., Calhoun B. H., Chandrakasan A. P. Springer, 2006.
[10] Johnson M., Roy K. Subthreshold leakage control by multiple channel length CMOS (McCMOS). 1997, 80.
[11] Kang S.-M., Leblebici Y. 3rd edition, Tata McGraw-Hill, 2003.
[12] Basak S., Saha D., Mukherjee S., Chatterjee S., Sarkar C. K. Design and analysis of a robust, high speed, energy efficient 18 transistor 1-bit full adder cell, modified with the concept of MVT scheme. Proceedings of the 3rd International Symposium on Electronic System Design, December 2012, pp. 130-134. doi:10.1109/ISED.2012.23.
[13] Zhai B., Dreslinski R. G., Blaauw D., Mudge T., Sylvester D. Energy efficient near-threshold Chip Multi-processing. ISLPED, ACM/IEEE, 2007.