Device and Circuit Design Challenges in the Digital Subthreshold Region for Ultralow-Power Applications

In recent years, subthreshold operation has attracted considerable attention because of its ultralow power consumption in applications requiring low to medium performance. It has also been shown that optimizing the device structure can further reduce the power consumption of digital subthreshold logic while improving its performance. Subthreshold circuit design is therefore very promising for future ultralow-energy sensor applications as well as high-performance parallel processing. This paper addresses the device and circuit design challenges associated with the state of the art in digital subthreshold circuit design and reviews device design methodologies and circuit topologies for optimal digital subthreshold operation. It identifies suitable candidates for subthreshold operation at the device and circuit levels and provides a roadmap for digital designers interested in ultralow-power applications.


Introduction
In the digital VLSI system design space, considerable attention has been given to the design of high-performance microprocessors. However, in recent years, the demand for power-sensitive designs has grown significantly. This demand has mainly been driven by the fast growth of battery-operated portable applications such as personal digital assistants, cellular phones, medical applications, wireless receivers, and other portable communication devices. Further, due to the aggressive scaling of transistor sizes for high-performance applications, not only does the subthreshold leakage current increase exponentially, but the gate leakage and the band-to-band tunneling (BTBT) currents of the reverse-biased source-substrate and drain-substrate junctions also increase significantly. The tunneling currents are detrimental to the functionality of the devices. Well-known methods of low-power design (such as voltage scaling, switching activity reduction, architectural techniques of pipelining and parallelism, and Computer-Aided Design (CAD) techniques of device sizing, interconnect, and logic optimization) may not be sufficient in many applications, such as portable computing gadgets and medical electronics, where ultralow power consumption with a medium frequency of operation (tens to hundreds of MHz) is the primary requirement. To cope with this, several novel design techniques have been proposed. Energy recovery or adiabatic techniques promise to reduce the power of computation by orders of magnitude, but they require high-quality inductors, which makes integration difficult. More recently, digital subthreshold logic has been investigated, in which transistors are operated in the subthreshold region (supply voltage V_dd less than the transistor threshold voltage V_th) [1][2][3][4]. In this technique, the subthreshold leakage current of the device is used for the necessary computation. This results in a high transconductance gain of the devices (thereby providing near-ideal voltage transfer characteristics of the logic gates) and a reduced gate input capacitance. Its impact on system design is an exponential reduction of power at the cost of reduced performance. Digital computation using subthreshold leakage current has gained wide interest in recent years as a way to achieve ultralow power consumption in portable computing devices. Both logic and memory circuits have been extensively studied, with design considerations at various levels of abstraction. It has been shown that subthreshold operation can achieve significant power savings in applications requiring low to medium (tens to hundreds of megahertz) frequencies of operation [5][6][7].
This paper is organized as follows. The scope of subthreshold operation for ultralow-power applications is presented in Section 2. Various challenging issues confronting current and future robust subthreshold circuit design are reviewed in Section 3. Section 4 presents various device-level optimization methodologies identified for optimal subthreshold operation. Section 5 shows various circuit styles other than static CMOS that are suitable for robust subthreshold operation. Finally, conclusions are drawn in Section 6.

Scope of Subthreshold Operation for Ultralow Power Applications
Subthreshold circuits operate with a supply voltage below the transistor threshold voltage, far below traditional levels, so the transistor operates essentially on leakage current. While traditional digital CMOS has relied on running transistors either in the ON state (saturation) or the OFF state (subthreshold), subthreshold circuits are either in an OFF state or an almost-ON state (still in the subthreshold regime, in weak inversion). Running at these nonstandard operating points limits performance, which remains acceptable for low-to-medium cost applications given the substantial increase in energy efficiency.
As dynamic power is related quadratically to the supply voltage, reducing the voltage to these ultralow levels results in a dramatic reduction in both power and energy consumption in digital systems. Due to the exponential current-voltage (I-V) characteristics of the transistor, subthreshold logic gates provide near-ideal voltage transfer characteristics. Furthermore, in the subthreshold region, the transistor input capacitance is less than that of strong inversion operation.
The transistor input capacitance (C_i) in subthreshold is a combination of the intrinsic capacitances (oxide capacitance C_ox and depletion capacitance C_d) and the parasitic capacitances (overlap capacitance C_do, fringing capacitances C_if and C_of, etc.) of a transistor (Figure 1) and is given by [8]

C_i = series(C_ox, C_d) || C_if || C_of || C_do.

In contrast, the input capacitance in strong inversion operation is dominated by the oxide capacitance. Due to the smaller capacitance and the lower supply voltage (below the threshold voltage of the transistor), digital subthreshold circuits consume less power than their strong inversion counterparts at a particular frequency of operation. However, since the subthreshold leakage current is used as the operating current, these circuits cannot be operated at very high frequencies. Figure 2 illustrates the region of operation for digital subthreshold operation.
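As a rough illustration of the capacitance and power argument above, the following Python sketch combines the oxide and depletion capacitances in series, adds the parasitic terms in parallel, and compares the dynamic power C · V_dd^2 · f of a subthreshold gate with that of a strong inversion gate. All numerical values (capacitances in fF, the 1.0 V strong inversion supply, the 1 MHz frequency) are hypothetical and chosen only for illustration; approximating the strong inversion input capacitance by C_ox alone is likewise a simplifying assumption.

```python
def series(c1, c2):
    """Series combination of two capacitances."""
    return c1 * c2 / (c1 + c2)


def subthreshold_input_cap(c_ox, c_d, c_if, c_of, c_do):
    """Intrinsic part (C_ox in series with C_d) plus parasitics in parallel."""
    return series(c_ox, c_d) + c_if + c_of + c_do


def dynamic_power(cap, vdd, freq):
    """Switching power C * V_dd^2 * f (activity factor taken as 1)."""
    return cap * vdd ** 2 * freq


# Hypothetical per-gate capacitances in fF (illustrative only).
C_OX, C_D, C_IF, C_OF, C_DO = 1.0, 0.25, 0.05, 0.05, 0.10

c_sub = subthreshold_input_cap(C_OX, C_D, C_IF, C_OF, C_DO)
c_strong = C_OX  # assumption: strong inversion dominated by C_ox

FREQ = 1e6  # same operating frequency for both cases (assumed)
p_sub = dynamic_power(c_sub * 1e-15, 0.2, FREQ)        # V_dd = 0.2 V
p_strong = dynamic_power(c_strong * 1e-15, 1.0, FREQ)  # V_dd = 1.0 V
power_ratio = p_strong / p_sub

print(f"C_i (subthreshold) = {c_sub:.2f} fF")
print(f"strong-inversion / subthreshold power = {power_ratio:.1f}x")
```

With these assumed numbers most of the saving comes from the quadratic V_dd term; the smaller input capacitance contributes a further factor of about 2.5.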
The potential for minimizing energy at the cost of speed degradation defines the following set of applications for which subthreshold circuits are well suited.
(a) Energy-constrained applications, such as wireless sensor nodes, RFID tags, medical equipment such as hearing aids and pacemakers, wearable computing or implants, personal digital assistants, energy-scavenging applications, and laptops. These are dominated primarily by the need to minimize energy consumption and increase battery lifetime; speed is a secondary consideration for this class of applications, so subthreshold circuits offer a good solution.
(b) Burst-mode applications, which require high performance for very short durations between extended periods of low-performance operation. Subthreshold circuits can minimize the energy of computations executed during the low-performance slots. This type of application appears in almost every design, including high-performance microprocessors and cell phones.

Roadmap or State-of-the-Art Challenging Issues in Digital Subthreshold Circuit Design
We have identified various device and circuit design challenges that need to be addressed to advance the state of the art in subthreshold circuit design, emphasizing the need for codesign at all levels of abstraction (device, circuit, architecture, and so forth). This section outlines these challenges for designers interested in energy-constrained applications, particularly those taking advantage of subthreshold circuits.
(1) Device Optimization for Subthreshold. Subthreshold circuits can greatly benefit from redesigning the devices. In addition to technology scaling, devices need to be optimized for subthreshold operation to reach higher operating frequencies, since conventional devices, which are optimized for operation in the strong inversion region, may not give optimal results in subthreshold [9][10][11][12][13][14][15][16][17].
(2) Exploring Logic Families Optimal for Subthreshold Circuit Design. The low V_dd results in a reduced I_ON/I_OFF ratio, which can reduce robustness. Static CMOS gates continue to function in subthreshold, but because short-channel effects are aggravated by variations at the nanoscale, logic families other than static CMOS may offer greater resiliency to certain variation sources such as voltage or process. The design of robust subthreshold logic circuits using logic families other than static CMOS is therefore another open area for exploration [18][19][20][21][22][23][24].
(3) Variability in Subthreshold Circuits. Variability from all sources, including process, voltage, and temperature (PVT), is magnified in subthreshold circuits due to the exponential I-V characteristics. There is therefore a great need for effective techniques to combat this variability and to design robust and reliable subthreshold circuits [25][26][27][28][29][30].
(4) Device Modeling and Sizing Analysis for Subthreshold. For V_dd < V_th, delay increases exponentially with additional voltage scaling. Leakage current integrates over the longer delay until the leakage energy per operation exceeds the active energy. There is a great need for models that capture this effect and illustrate the impact of variations on the minimum energy point, the optimal supply voltage, and the threshold voltage of subthreshold circuits [31][32][33][34].
(5) Need for Alternative Scaling Trends for Subthreshold. The scaling of transistor dimensions and electrical characteristics represents both an opportunity and a threat for subthreshold circuits. Device scaling reduces gate capacitance, and at super-threshold voltages it offers a welcome reduction in switching energy and gate delay. Scaling has also led to a dramatic increase in density (which was an effective cost-reduction measure in the past). At the same time, device scaling has brought about a number of problems in super-threshold circuits, including process variability, increased subthreshold leakage, and increased gate leakage. The implications of device scaling for super-threshold circuits have been explored by many, but no comparable attention has been given to subthreshold circuits. Transistor design is particularly important in the subthreshold regime because of the exponential sensitivities to V_th, V_dd, and inverse subthreshold slope; it is therefore not immediately clear how subthreshold circuits will fare under device scaling, and very few studies have comprehensively examined its effects on subthreshold circuits. A clear understanding of the consequences of traditional performance-driven scaling for subthreshold combinational blocks and SRAM cells is therefore important, as are improved scaling strategies that target the needs of subthreshold circuits [35][36][37][38].
(6) ... (b) Reduced ON-to-OFF current ratios complicate the reading and writing steps. None of the current approaches is completely satisfactory, and advances in this area are among the most crucial needs for the proliferation of extremely low-power systems.
(7) Need for Codesign Approach for Subthreshold. In the new paradigm of computation with leakage, conventional wisdom can deliver low-power systems but fails to provide the optimal or near-optimal solution. For subthreshold operation, the lowest power for a given throughput can be achieved only by complete codesign across device, circuit, and architecture [16,17], and much work remains to be done in that direction. In addition, complete codesign at all levels of the hierarchy can further suppress the effects of process variation, reduce power consumption, and improve performance. Variability-aware design strategies at all levels of abstraction (device, circuit, and architecture) are therefore imperative to ensure the success and functionality of power-efficient designs [44].
(8) Developing Subthreshold Benchmark Circuits. Since there are no industrial subthreshold devices against which the results of optimized subthreshold devices can be compared, there is a need to build benchmark circuits with subthreshold devices and to compare their variation immunity, power, and performance with those of subthreshold circuits built from standard devices.
(9) Advancement in CAD Tools. Another significant issue for subthreshold operation is system verification. Using SPICE to verify large systems rapidly becomes infeasible as the number of process corners, temperature corners, and supply voltage values increases. Hspice is too slow for larger circuits, and Nanosim can simulate large netlists in reasonable time but will not correctly model the devices for supply voltages below 1 V. There is therefore a need either to modify current simulators or to develop a new subthreshold circuit simulator that can verify large systems running at such ultralow voltages and estimate the power dissipation of circuits. Advances in CAD tools to address this problem are necessary, and these tools must also account for the statistical distributions of delay and power introduced by local variations [45,46].
(10) Ultradynamic Voltage Scaling (UDVS). Since an entire system may not be able to operate completely in the subthreshold region, devices need to be switched periodically between strong inversion and subthreshold operation. UDVS is therefore a strong candidate for tying together subthreshold operation and higher-performance operation. Work on UDVS can also focus on system integration: decisions about the best interfaces among blocks operating at different effective rates and V_dd values will affect the system energy and delay, and selecting the best bus protocols, level converters, and dc/dc converters for a system remains an open problem. Theoretical work on UDVS can also investigate optimum scheduling and control at the system level, considering all of the blocks and their modes of operation from full shutdown to full-speed active mode [45,46].
(11) Architectures for Optimal Subthreshold Circuits. There are many opportunities for future work on architectures for subthreshold circuits. One area is the use of pipelining and massively parallel architectures that increase the activity factor of a circuit and require minimum-supply-voltage operation. There is also a great need for a complete subthreshold standard cell library, which would provide further opportunities to optimize for minimal energy dissipation [45,46].
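The minimum energy point mentioned in item (4) can be illustrated with a small Python sketch. It is a simplified model and not the formulation of [31][32][33][34]: the active energy is taken as C_eff · V_dd^2, the gate delay is assumed to grow as 10^(-V_dd/S) below threshold, and the leakage energy is the OFF current integrated over that delay, so it scales as V_dd^2 · 10^(-V_dd/S). The names `c_eff`, `leak_factor`, and `swing`, and their values, are hypothetical; DIBL is ignored, and the model is only meant for V_dd < V_th.

```python
def energy_per_op(vdd, c_eff=1.0, leak_factor=800.0, swing=0.090):
    """Normalized (total, active, leakage) energy per operation.

    vdd         supply voltage in volts (model intended for vdd < V_th)
    c_eff       normalized switched capacitance (hypothetical)
    leak_factor lumps logic depth and number of leaking gates (hypothetical)
    swing       subthreshold swing S in V/decade (hypothetical)
    """
    e_active = c_eff * vdd ** 2
    # I_ON / I_OFF = 10^(vdd / swing): delay, and therefore the time over
    # which leakage integrates, grows by the same factor as vdd is lowered.
    e_leak = leak_factor * c_eff * vdd ** 2 * 10 ** (-vdd / swing)
    return e_active + e_leak, e_active, e_leak


# Sweep the supply from 0.10 V to 0.40 V in 1 mV steps.
grid = [mv / 1000 for mv in range(100, 401)]
v_opt = min(grid, key=lambda v: energy_per_op(v)[0])

e_tot, e_act, e_lk = energy_per_op(v_opt)
print(f"minimum-energy V_dd ~ {v_opt:.3f} V")
print(f"active = {e_act:.4f}, leakage = {e_lk:.4f} (normalized)")
```

Below the optimum the leakage term dominates and the total energy rises again even though the active energy keeps falling, which is the behavior described in item (4).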

Device-Level Optimization Methodologies for Subthreshold Operation
In conventional methods, standard transistors were operated in the subthreshold region to implement subthreshold logic. Standard transistors are "super-threshold transistors" that are optimized for ultrahigh-performance design. It is therefore prudent to investigate whether standard transistors are well suited for subthreshold operation. The following device optimization methodologies have been identified; they give good insight for developing new methodologies for present and future technology nodes.
4.1. Bulk CMOS Technology for Subthreshold Operation. We have identified various device optimization methodologies for bulk CMOS technology in the subthreshold region, and we hope this section gives readers a useful brief for identifying the technology gaps [9][10][11][12].

Device Optimization by Changing Channel Doping Profile for Subthreshold Operation.
It is well established that scaled super-threshold transistors need halo and retrograde doping to suppress short-channel effects. The main functions of halo doping and retrograde wells are to reduce drain-induced barrier lowering (DIBL), prevent body punch-through, and control the threshold voltage of the device independently of its subthreshold slope. In subthreshold operation, however, the overall supply bias is small (on the order of 0.15 V-0.3 V). Consequently, the effects of DIBL and body punch-through are extremely small. Further, as long as the I_ON budget is met, a better subthreshold slope (S) leads to a better device. Since our interest is in the region below the threshold voltage, the actual value of the threshold voltage is of no interest as long as a predefined I_ON and S are met. Hence, it has been shown qualitatively and quantitatively that halo and retrograde doping are not essential for subthreshold device design [9]. The absence of halo and retrograde doping has the following implications.
(i) A simplified process technology in terms of process steps and cost.
(ii) A significant reduction of the junction capacitances. The halo regions near the source-substrate and drain-substrate junctions significantly increase the junction capacitances, thereby increasing the switching power and the delay of the logic gates. The absence of halo/retrograde doping reduces this junction capacitance.
It should, however, be noted that the doping in these optimized devices should have a high-to-low profile [9]. A low doping level is needed in the bulk of the device to (i) reduce the capacitance of the bottom junction and (ii) reduce substrate noise effects and parasitic latch-up.
Table 1 shows that the optimized subthreshold device improves the subthreshold slope by 7.8%, the junction capacitance by 34.7%, the ON current by 60%, and the power-delay product (PDP) by 50% compared to the standard device, owing to the abovementioned factors.

It was shown in [9] that halo and retrograde doping profiles are not necessary in devices for subthreshold operation (because of the low supply voltage) and that a high-low doping profile is instead suitable for achieving a better subthreshold slope and a lower junction capacitance. That analysis, however, assumes the minimum oxide thickness (T_ox) provided by the technology in order to obtain a better subthreshold slope. The minimum possible oxide thickness may not be optimum for subthreshold operation, because it does not guarantee minimum energy consumption, which is the primary goal of subthreshold operation [12].
Although the intrinsic gate capacitance of the transistor in subthreshold operation is dominated by the depletion capacitance, the parasitic capacitances such as the overlap and fringe capacitances (see Figure 1) will eventually dominate the overall gate capacitance if the oxide is too thin. Therefore, a detailed analysis of the oxide thickness optimization of transistors for subthreshold operation is necessary. Note that in conventional strong inversion operation, the effective gate capacitance is dominated by the oxide capacitance (C_ox; Figure 1), and a minimum T_ox, which improves the subthreshold slope (S), is desirable to achieve high performance. In subthreshold operation, the effective gate capacitance C_g of a transistor is dominated by the intrinsic depletion capacitance and the parasitic (both overlap and fringe) capacitances, which strongly depend on T_ox: overlap capacitances are inversely proportional to T_ox, whereas fringe capacitances are a logarithmic function of the oxide thickness. In energy-constrained design, the primary objective will be to choose T_ox to minimize these capacitances. However, a change in T_ox affects both the effective capacitance and the subthreshold swing. Figure 3(a) demonstrates that reducing T_ox improves the subthreshold swing S but also increases C_g, and beyond a certain point the improvement in S is masked by the degradation in C_g. Although the improvement in S reduces the supply voltage required to achieve a particular performance (i.e., a desired ON current), it may result in an overall increase in power (C_g · V_dd^2 · f) due to the increase in C_g if an optimum T_ox is not chosen. Figure 3(b) shows the dynamic energy E_dyn versus T_ox for different fanouts. It can be seen that the V_dd required for constant I_ON decreases as T_ox is reduced, as expected. However, E_dyn does not decrease monotonically with oxide thickness, and the minimum occurs at a T_ox that is larger than the minimum T_ox (1.2 nm) offered by the technology. Further, the optimum T_ox (corresponding to minimum energy) is approximately the oxide thickness at which the increase in C_g exceeds the improvement in subthreshold swing (Figure 3(a)). Note that the optimum T_ox has a weak dependency on the fanout (Figure 3(b)); however, the variation in the minimum E_dyn is less than 2%. The exponential I_ds-V_g (linear log I_ds-V_g) relation in the subthreshold region also ensures that the optimum T_ox provides minimum dynamic energy at all performance levels (interpreted as I_ON) as long as the circuit is operated in subthreshold (V_dd < V_th). It was therefore demonstrated that minimizing the oxide thickness to improve the subthreshold slope does not necessarily provide minimum energy consumption in digital subthreshold operation. The oxide thickness should instead be optimized considering the changes in both the transistor effective capacitance and the subthreshold slope to achieve minimum power consumption.

Figure 3: (a) Change in S and C_g with T_ox [12]. (b) Change in E_dyn with T_ox [12].
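The trade-off between subthreshold swing and gate capacitance can be illustrated with a toy Python model. This is a sketch under strong simplifying assumptions and does not reproduce the simulations of [12]: the swing is modeled as S = 60 mV/dec · (1 + C_d/C_ox) with C_d/C_ox taken proportional to T_ox, the effective gate capacitance as a fixed part plus an overlap part proportional to 1/T_ox (the logarithmic fringe dependence is folded into the fixed part), and the required V_dd as the supply that gives a fixed I_ON/I_OFF ratio. The coefficients `k`, `c_fixed`, `c_ov_coeff`, and `on_off_ratio` are hypothetical.

```python
import math


def swing_mv(t_ox_nm, k=0.15):
    """Swing in mV/dec: 60 * (1 + C_d/C_ox), with C_d/C_ox ~ k * t_ox."""
    return 60.0 * (1.0 + k * t_ox_nm)


def gate_cap(t_ox_nm, c_fixed=0.3, c_ov_coeff=0.4):
    """Effective C_g (arbitrary units): fixed part plus overlap ~ 1/t_ox."""
    return c_fixed + c_ov_coeff / t_ox_nm


def vdd_required_mv(t_ox_nm, on_off_ratio=1000.0):
    """Supply (mV) giving the target I_ON/I_OFF: V_dd = S * log10(ratio)."""
    return swing_mv(t_ox_nm) * math.log10(on_off_ratio)


def e_dyn(t_ox_nm):
    """Dynamic energy C_g * V_dd^2 in arbitrary units."""
    return gate_cap(t_ox_nm) * vdd_required_mv(t_ox_nm) ** 2


# Sweep T_ox from 1.2 nm to 4.0 nm in 0.01 nm steps.
grid = [1.2 + 0.01 * i for i in range(281)]
t_opt = min(grid, key=e_dyn)

print(f"S at 1.2 nm = {swing_mv(1.2):.1f} mV/dec")
print(f"energy-optimal T_ox in this toy model ~ {t_opt:.2f} nm")
```

In this model a thinner oxide always gives a better swing and a lower required V_dd, yet the energy minimum lies above the thinnest oxide because the overlap capacitance grows as 1/T_ox, which mirrors the qualitative conclusion above.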

New Device Sizing Utilizing Reverse Short-Channel Effects for Subthreshold Operation.
In order to design optimal subthreshold circuits using CMOS devices that are targeted for super-threshold operation, it is crucial to develop techniques that exploit the side effects that appear in this new regime. One such mechanism, the pronounced reverse short-channel effect (RSCE), is used to achieve optimal performance in subthreshold circuits [11]. The short-channel effect (SCE, or V_th roll-off) is an undesirable phenomenon in short-channel devices in which V_th decreases as the channel length is reduced. Variation in critical device dimensions translates into a larger variation in the threshold voltage as SCE worsens with increasing DIBL. Typically, nonuniform halo doping is used to mitigate this problem by narrowing the depletion widths and hence reducing the DIBL effect. As a byproduct of halo doping, a short-channel device shows RSCE behavior, in which V_th decreases as the channel length is increased.
In subthreshold circuits, the SCE mechanism is not as strong as in super-threshold circuits because the drain-to-source voltage is very small. RSCE, on the other hand, is still significant enough to affect subthreshold performance. Moreover, current becomes an exponential function of V_th in this regime, which makes it possible to use longer-channel devices that exploit RSCE to improve drive current. Unlike in super-threshold circuits, using a longer channel length in subthreshold does not have a significant impact on the load capacitance, because of the reduced depletion capacitance under the gate. The method in [11] sizes transistors for subthreshold operation so as to exploit RSCE, improving drive current, capacitance, robustness to process variation, subthreshold swing, and energy dissipation.
Table 2 shows the implications of this device sizing for device- and circuit-level properties. The subthreshold swing of the proposed method is 71 mV/dec, which is 16 mV/dec lower than that of the conventional minimum-channel device. The improved subthreshold slope reduces the OFF current by 30% for the same ON current. At 0.2 V, the I_ON/I_OFF ratio was 484 for the proposed scheme, a 2.5 times improvement over the conventional minimum-channel device. Circuits using the proposed sizing scheme are more robust against random dopant fluctuations (RDFs) because of the increased gate area at the optimal performance point. The proposed sizing scheme reduces delay and power dissipation simultaneously, which is not possible using conventional sizing schemes. As a result, a significant improvement in energy is obtained. Average delay in ISCAS benchmark circuits was improved by 13%, while average power dissipation and energy dissipation were reduced by 31% and 40%, respectively.
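The reported averages are mutually consistent if energy is taken as power times delay, as the short Python check below shows. This is only an arithmetic illustration: treating the product of the averaged delay and power figures as the average energy is an assumption, since the source reports per-benchmark averages. The implied values for the conventional minimum-channel device follow directly from the numbers quoted above.

```python
# Figures quoted in the text for the RSCE-based sizing scheme [11].
delay_improvement = 0.13   # average delay reduced by 13%
power_reduction = 0.31     # average power reduced by 31%

# Energy = power * delay, so the remaining fractions multiply.
energy_reduction = 1 - (1 - delay_improvement) * (1 - power_reduction)
print(f"implied energy reduction: {energy_reduction:.1%}")  # 40.0%

# Implied figures for the conventional minimum-channel device.
swing_conventional = 71 + 16    # mV/dec
ratio_conventional = 484 / 2.5  # I_ON/I_OFF at 0.2 V
print(f"conventional swing: {swing_conventional} mV/dec")
print(f"conventional I_ON/I_OFF: {ratio_conventional:.1f}")
```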

New Device Sizing Based on Subthreshold Logical Effort.
In conventional logical effort calculations, the optimal ratio of PMOS width (W_p) to NMOS width (W_n) for equal current drivability is approximately 2.5 : 1 because of the mobility difference between the carriers in PMOS and NMOS devices. In addition, the effective width of a transistor in a stack of n devices is roughly 1/n of its drawn width in the strong-inversion region. This means that for an n-stack to conduct the same current as a single transistor, the devices in the stack must each be sized up by a factor of n. Selecting the proper W_p : W_n ratio and effective width of stacked transistors is crucial for achieving optimal performance. It was found that the conventional logical effort framework based on strong-inversion operation fails to do so for subthreshold logic because of the difference in transistor current behavior [10]. In the strong-inversion regime, drive current is a first- or second-order function of the four MOS terminal voltages, whereas in subthreshold designs it is an exponential function of the terminal voltages. Hence a new design paradigm is needed for optimal device sizing based on the exponential current equation in the subthreshold region. The optimal PMOS-to-NMOS width ratio in the subthreshold regime was found by simulating a chain of equally sized inverters and observing the rise and fall delays. Results show that a 1.5 : 1 ratio gives equal rise and fall delays at V_dd = 0.2 V, and a slightly smaller ratio is optimal for V_dd = 0.3 V [10]. This optimization scheme resulted in performance gains of up to 13.5% for ISCAS benchmark circuits and 33.1% for component circuits operating in subthreshold, which was shown to match the theoretically attainable improvements.

Double Gate-MOSFETs for Subthreshold Operation
The key benefits of choosing DG-MOSFETs for subthreshold operation are as follows [13][14][15][16].
(1) Double gate (DG)-MOSFET is promising for subthreshold operations due to its near-ideal subthreshold slope [13].
(2) DG-MOSFET subthreshold operation shows that devices with longer channel length (compared to minimum gate length) can be used for robust subthreshold operation without any loss of performance [13].
(3) Raised S/D structure is not necessary for subthreshold operation and can be simplified greatly [13].
(4) The device will have better resiliency to L_g, T_ox, T_si, and RDF variations due to the underlying SOI structure.
(5) By using optimum gate underlap, the parasitic capacitances can be significantly reduced, resulting in higher performance and lower power consumption [14].
(6) Independent control of the front and back gates and asymmetric DGMOS can be used effectively for designing low-power and high-performance circuits [15].
(7) Scalability is better than that of bulk CMOS, and device characteristics including I_ON and I_OFF can be optimized by the choice of device geometries, gate material, work function, and so forth [15].
(8) Junction capacitances will be significantly smaller than in bulk CMOS, leading to better power and delay performance.
Various DGMOS device optimization methodologies for subthreshold operation have been identified and are presented in the following subsections.

DGMOS Devices with Longer Channel Length for Subthreshold Operation.
In a short-channel device where velocity saturation occurs, I_ON is a weak function of gate length, so the dependence of delay on gate length is mainly decided by C_g; hence, delay increases linearly with gate length in super-threshold operation. In contrast, as can be seen in Figure 4, the optimal channel length for maximum performance of DG-MOSFET subthreshold logic is longer than the minimum L_g when the I_OFF of every device is matched [13]. As shown in Figure 5, C_g is almost constant regardless of L_g in the DG-MOSFET subthreshold device because the main components of C_g for the subthreshold DG-MOSFET are the gate overlap capacitance and the fringing gate capacitance, which do not depend on L_g. Note that the intrinsic capacitance of the DG-MOSFET is negligible [13]. Hence, the dependence of delay on L_g in DG-MOSFET subthreshold operation is mainly decided by I_ON. With a relatively small increase in C_g, a longer-channel device has a larger I_ON in the subthreshold region under the same I_OFF condition because of its smaller subthreshold slope. Note that I_ON in the subthreshold region is decided by the subthreshold slope (S) alone if I_OFF is fixed. The I_OFF of devices with different L_g is matched by adjusting the metal gate work functions [13].
As shown in Figure 6, S of the short-channel device is larger than that of the long-channel device due to the short-channel effect. Figure 6 also shows the dependency of I_ON on S in the subthreshold region. Since the current does not increase with L_g once S approaches the ideal limit (Figure 6), there is an optimal L_g for minimum delay, as shown in Figure 4. Hence, the optimal channel length for subthreshold operation is the minimum channel length that has an ideal subthreshold slope.
Figure 7 shows that a short-channel device is more sensitive to L_g variation than a long-channel device due to drain-induced barrier lowering. Figure 7 also shows that variation in T_si causes around ±10% I_ON variation for long-channel symmetric devices due to the volume inversion effect. The short-channel device experiences more I_ON variation due to two-dimensional short-channel effects in addition to volume inversion [13]. A long-channel device is therefore more suitable for subthreshold operation than a short-channel device.
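The statement that I_ON is set by the subthreshold slope once I_OFF is fixed can be made concrete with a short Python sketch. It assumes an ideal exponential characteristic, I_ON = I_OFF · 10^(V_dd/S), and ignores drain-voltage effects; the swing values and the normalized I_OFF are hypothetical and are not taken from Figure 6.

```python
def on_current(i_off, vdd, swing):
    """I_ON at V_gs = V_dd for a fixed I_OFF (ideal exponential model).

    swing is the subthreshold swing S in volts per decade.
    """
    return i_off * 10 ** (vdd / swing)


VDD = 0.2    # supply voltage in volts
I_OFF = 1.0  # normalized OFF current, matched across devices (assumed)

for swing_mv in (60, 70, 80, 90):
    i_on = on_current(I_OFF, VDD, swing_mv / 1000)
    print(f"S = {swing_mv} mV/dec -> I_ON/I_OFF = {i_on:7.1f}")

# Gain in I_ON from improving S from 90 to 60 mV/dec at iso-I_OFF.
gain = on_current(I_OFF, VDD, 0.060) / on_current(I_OFF, VDD, 0.090)
print(f"I_ON gain (90 -> 60 mV/dec): {gain:.1f}x")
```

Because the dependence is exponential, a few tens of mV/dec in S translate into an order-of-magnitude change in ON current at V_dd = 0.2 V, and the gain saturates once S reaches its ideal limit.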

DGMOS Devices with Optimum Underlap for Subthreshold Operation.
The impact of gate underlap on the effective gate capacitance of double-gate MOS (DGMOS) transistors for digital subthreshold operation has been analyzed, showing that with optimum gate underlap the parasitic fringe capacitances of DGMOS can be significantly reduced, resulting in higher performance and lower power consumption [14]. Figure 8 shows the schematic of an underlap DGMOS device. The parasitic capacitances of DGMOS include the overlap (C_ov) and the fringe (C_fr) capacitances. Since there is no C_ov in an underlap device, the effective gate capacitance (C_g) is dominated by C_fr. The fringe capacitance of DGMOS consists of inner (C_if) and outer (C_of) fringe components, which strongly depend on the device geometry.

Figure 7: Sensitivity of I_ON (I_ds at V_dd = 0.2 V) to variation of L_g, T_ox, and T_si [13].

It can be seen from Figure 9 that the effective gate capacitance C_g initially decreases with increasing underlap and then becomes flat. This is because C_g is dominated by the fringe capacitance (C_fr), which is a logarithmic function of the underlap length (L_un). I_ON and I_OFF (I_ds at V_gs = 0) also decrease with the underlap (Figure 10). It can be observed that initially the percentage reduction in I_ON is larger than that of C_g, which indicates that in this region the delay of the circuit with underlap will be larger than in the no-underlap case. Beyond a certain L_un (15 nm), C_g still decreases logarithmically with L_un, while I_ON decreases only linearly, resulting in a smaller percentage reduction than that of C_g (Figure 10). Consequently, for L_un > 15 nm, the delay of the ring oscillator (RO) decreases with increasing L_un. Although the delay of the RO first increases with the underlap and then decreases, both power and PDP decrease monotonically with underlap. A 40% improvement in delay can be achieved with the optimum L_un, together with a 7.3× reduction in PDP, for a full adder circuit. It can be seen from Table 3 that the above subthreshold (V_dd = 0.2 V) full adder circuit with 50 nm underlap DGMOS devices can be operated at a frequency of 1.25 GHz with 6.2× less energy consumption than the zero-underlap device [14].

DGSOI Technology with Codesign Methodology for Optimal Subthreshold Operation.
This is a design methodology spanning all levels of the hierarchy (device, circuit, and architecture) for ultralow-power digital subthreshold operation (V dd < V th ). It has been demonstrated that conventional design techniques are not optimal for subthreshold design. With proper Codesign [16,17], it is possible to obtain hundreds of MHz of performance in subthreshold systems at very low power. It has further been demonstrated that double-gate MOSFETs are better suited for subthreshold operation (∼10× higher throughput at iso-power) than bulk MOSFETs [16], mainly because DG-SOI has no intrinsic capacitance in the subthreshold region.
Double-gate MOS (DGMOS) transistors are suitable for subthreshold operation because of their near-ideal subthreshold slope and negligible junction capacitance. Because the thin, fully depleted silicon body is sandwiched between two gates, these devices have excellent gate control over the channel. Furthermore, the undoped thin silicon body provides negligible source/drain p-n junction capacitance, which largely enhances circuit performance. In subthreshold operation, the intrinsic capacitance of DGMOS is also negligible and depends only very weakly on the channel length. For iso-I OFF conditions, Table 4 compares the important properties of the standard and optimized bulk and DG-SOI devices for subthreshold operation. Because of the near-ideal subthreshold slope, the DG-SOI devices have almost an order of magnitude higher ON-current than the bulk devices [16]. Table 4 also illustrates the PDP of the bulk inverter and the SOI inverter (each driving another inverter) operating in the subthreshold regime. The DG-SOI inverter has almost one order of magnitude lower PDP than the corresponding bulk inverter. This can be ascribed to the negligibly small intrinsic capacitance of DG-SOI, which keeps the switching energy extremely low and makes DG-SOI an extremely powerful technology for subthreshold design.
Sub-pseudo-NMOS is also more efficient than sub-CMOS in terms of PDP, in both the bulk and the DG-SOI technologies. Simulation results (for both technologies) of a pseudo-NMOS inverter (driving an identical inverter) and a CMOS inverter are compared in Table 4. In the bulk subthreshold region, pseudo-NMOS gives approximately 20% improvement in PDP compared to CMOS; in DG-SOI the improvement is more than 30%. With the device/circuit/architectural optimizations, the throughput obtained in the bulk technology is more than two times better (at iso-power) than that of the conventional design (Table 4). The same strategy applied to DG-SOI improves the throughput by 3.8× at iso-power [16]. Thus, significant improvement can be achieved by proper Codesign across device, circuit, and architecture. An overall comparison of the two technologies in the subthreshold regime, in terms of the power-throughput tradeoff of an FIR filter after optimization at the device, circuit, and architectural levels, shows that DG-SOI achieves more than 10× higher throughput at iso-power than the bulk technology. This is due mainly to the fact that DG-SOI has no intrinsic capacitance in the subthreshold domain, whereas bulk transistors do. This significant lowering of the device capacitance increases the throughput of the overall system at iso-power. In summary, we have the following.
(i) By proper Codesign in aspects of device/circuit/architecture we can improve the throughput at iso-power in the subthreshold region.
(ii) DG-SOI MOSFETs inherently have no intrinsic capacitance in the subthreshold region, which gives significant improvement in PDP, and DG-SOI is better suited to subthreshold operation than the corresponding bulk technology.
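The link drawn above between a near-ideal subthreshold slope and a roughly tenfold higher ON-current at equal I OFF can be illustrated with the ideal exponential subthreshold characteristic, in which the current rises by one decade for every S millivolts of gate voltage. The sketch below assumes a 0.2 V supply, a 60 mV/decade slope for DG-SOI, and a hypothetical 85 mV/decade slope for bulk; none of these values are taken from Table 4.

```python
def on_off_ratio(vdd, slope_mv_per_decade):
    """I_ON / I_OFF for an ideal exponential subthreshold characteristic.

    The current rises by one decade for every `slope_mv_per_decade` of gate
    voltage, so I_ON / I_OFF = 10 ** (V_dd / S).
    """
    return 10.0 ** (vdd * 1000.0 / slope_mv_per_decade)


VDD = 0.2       # volts, assumed subthreshold supply
S_DG = 60.0     # mV/decade, the ideal room-temperature limit (assumed for DG-SOI)
S_BULK = 85.0   # mV/decade, hypothetical bulk value for illustration

# At equal I_OFF, the ratio of ON/OFF ratios is the ON-current advantage.
gain = on_off_ratio(VDD, S_DG) / on_off_ratio(VDD, S_BULK)
print(f"I_ON advantage at equal I_OFF: {gain:.1f}x")
```

With these assumed slopes the advantage comes out close to a factor of ten, which is the order of magnitude the text attributes to the DG-SOI device; a different bulk slope would shift the number accordingly.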

Carbon Nanotube (CNFETs) Technology for Subthreshold Operation.
Aggressive scaling of CMOS devices over successive technology generations has led to higher integration density and performance. However, "short-channel effects" such as the exponential increase in leakage current and large parameter variations stand in the way of scaling devices much beyond 10 nm. Hence, research has started in earnest on alternative devices and circuit architectures for a sub-10-nm transistor era. Carbon nanotubes (CNTs) and molecular transistors have already gained widespread attention as possible alternative nanoscale transistors. CNTs are sheets of graphite rolled into the shape of a tube. Depending on the direction in which the nanotubes are rolled (chirality), they can be either metallic or semiconducting. Since their inception in the early 1990s, there has been immense research into the electrical properties of CNTs. Semiconducting nanotubes have been used in high-performance transistors in which the channel is the nanotube itself. High-performance carbon nanotube field-effect transistors (CNFETs) with very high "on"-currents have been reported, and the device physics has evolved [47][48][49][50][51][52][53][54][55]. As high-mobility devices are being investigated, near-ballistic transport no longer seems impossible. Absence of scattering in the channel is the characteristic of ballistic devices [50], which makes them ultrahigh speed and apt for high-performance circuit design. The theory of CNT transistors is still primitive and the technology is still nascent.
To determine whether the CNFET meets the performance/device requirement, the traditional MOSFET was compared with the newly developed CNFET. Before the comparison, the authors assumed that the CNFET takes on the same characteristics as the MOSFET [51]. The parameter code for the MOSFET and CNFET was developed by Arijit Raychowdhury, graduate student mentor, electrical and computer engineering at Purdue University. To build the FET circuits correctly, the authors of [52] used the parameter codes as include files within their main circuit codes. To compare the two types of transistors, they designed and tested an inverter, a ring oscillator, a full adder, and a 4-bit ripple carry adder (RCA), each built from both MOSFETs and CNFETs.
Table 5 shows that in super-threshold operation, a ring oscillator built from CNFETs runs at a frequency around 2000 times higher than the MOSFET-based ring oscillator, the CNFET full adder is 125 times faster with just 1% of the PDP of an equivalent MOSFET-based full adder, and the 4-bit CNFET RCA is 61 times faster with 1% of the PDP of an equivalent MOSFET-based RCA. Table 6 shows that in subthreshold operation, the CNFET ring oscillator (RO) runs at a frequency around 8400 times higher than the MOSFET-based RO, and the 4-bit CNFET RCA is 440 times faster with only 0.3% of the PDP of an equivalent MOSFET-based 4-bit RCA at 65 nm. This shows the superiority of CNFET-based circuits over MOSFET-based circuits in both super-threshold and subthreshold operation, and particularly in subthreshold operation. The subthreshold MOSFET ring oscillator runs 85% slower than the super-threshold MOSFET ring oscillator, whereas the subthreshold CNFET ring oscillator runs only 36% slower than the super-threshold CNFET ring oscillator.
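The ring-oscillator figures quoted above can be cross-checked against each other with simple arithmetic. The sketch below uses only the rounded numbers given in the text; the variable names are illustrative, and because the inputs are rounded the result is only an approximate consistency check.

```python
# Figures quoted in the text (all rounded by the source).
super_ratio = 2000.0      # CNFET RO frequency / MOSFET RO frequency, super-threshold
mosfet_slowdown = 0.85    # MOSFET RO loses 85% of its speed in subthreshold
cnfet_slowdown = 0.36     # CNFET RO loses 36% of its speed in subthreshold

# f_sub = f_super * (1 - slowdown) for each technology, so the subthreshold
# ratio is the super-threshold ratio scaled by the two retention factors.
sub_ratio = super_ratio * (1.0 - cnfet_slowdown) / (1.0 - mosfet_slowdown)
print(f"implied subthreshold ratio: {sub_ratio:.0f}x")
```

The implied ratio lies within about 2% of the subthreshold ring-oscillator figure quoted from Table 6, so the four rounded numbers are mutually consistent.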

Logic Families for Subthreshold Operation
In this section, we evaluate the scope of logic families other than static CMOS for designing optimal subthreshold logic circuits, considering the robustness, power, and performance improvements they can bring to subthreshold operation. The following logic families have been identified as suitable for designing more robust and energy-efficient subthreshold circuits, with some tradeoffs.
(iv) Subthreshold DTMOS logic.

Subthreshold CMOS Logic.
Sub-threshold CMOS (Sub-CMOS) logic is conventional CMOS logic operated in the subthreshold region. The voltage transfer characteristic (VTC) of an inverter running in subthreshold mode is closer to ideal than the VTC in the normal strong-inversion region [19]. The improvement is mainly caused by the increase in circuit gain: the exponential relationship between I ds and V gs in the subthreshold region gives rise to an extremely high transconductance, g m . The much improved VTC yields better noise margins. Circuit designers also have more freedom in sizing than in strong-inversion CMOS and can still obtain a near-optimum delay, because the delay stays flat over a wider range of PMOS-to-NMOS ratios [19]. Power supply variation, however, has a significant negative impact on subthreshold circuits: the sensitivity of gate delay to V dd variation increases by a factor of 8 as the supply voltage is reduced for subthreshold CMOS logic [19]. Hence, V dd stabilization is crucial for the proper operation of subthreshold circuits.
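The need for V dd stabilization follows from the exponential current-voltage relationship and can be illustrated with a simplified subthreshold model. The sketch below assumes a drain current of the form I0 exp((V gs - V th ) / (n kT/q)) and a first-order delay C V dd / I ON ; the slope factor, threshold voltage, load capacitance, and current prefactor are hypothetical, and the example does not attempt to reproduce the factor of 8 reported in [19].

```python
import math

K_B_T_OVER_Q = 0.02585   # thermal voltage at about 300 K, in volts


def subthreshold_current(vgs, vth, n=1.5, i0=1e-7):
    """Simplified subthreshold drain current (drain-voltage term omitted).

    n (slope factor) and i0 (current at V_gs = V_th) are hypothetical values.
    """
    return i0 * math.exp((vgs - vth) / (n * K_B_T_OVER_Q))


def gate_delay(vdd, vth, c_load=1e-15, n=1.5):
    """First-order delay C * V_dd / I_ON with I_ON taken at V_gs = V_dd."""
    return c_load * vdd / subthreshold_current(vdd, vth, n)


VTH = 0.35                        # hypothetical threshold voltage, volts
nominal = gate_delay(0.20, VTH)
drooped = gate_delay(0.18, VTH)   # 10% supply droop
print(f"delay increase for a 10% V_dd droop: {drooped / nominal:.2f}x")
```

With these hypothetical parameters a 10% supply droop lengthens the delay by roughly 50%, because the linear reduction in V dd is far outweighed by the exponential loss of drive current.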

Subthreshold Pseudo-NMOS Logic.
In the subthreshold region, Pseudo-NMOS logic is more robust than Pseudo-NMOS logic in strong inversion: its VTC is closer to the ideal curve, its output swings rail-to-rail because of the large gain in the subthreshold region, and it does not suffer from the degraded low logic level seen in strong inversion. Pseudo-NMOS also operates faster than CMOS while consuming less area [19]. The two main disadvantages of Pseudo-NMOS relative to CMOS in strong inversion, higher power consumption and lower robustness, are eliminated in the subthreshold region owing to the more ideal device characteristics. In summary, subthreshold Pseudo-NMOS has better PDP than static CMOS in the subthreshold region, with comparable robustness.

VT Sub-CMOS Logic.
To ensure proper operation under temperature and process variations, two subthreshold logic families, namely, Variable Threshold voltage Sub-threshold CMOS logic (VT-Sub-CMOS logic) and Sub-threshold Dynamic Threshold voltage logic (Sub-DTMOS logic), have been proposed [20]. Both logic families show a significant improvement in stability against temperature and process variations while maintaining the same ultra low-power design constraint. VT-Sub-CMOS logic is sub-CMOS logic with an additional stabilization scheme. The stabilization circuit monitors any change in the transistor current due to temperature and process variations and provides an appropriate bias to the substrate. Any increase of the current above a certain prespecified threshold value is thus reduced by an appropriate bias to the substrate. Both the logic and the stabilization circuits of VT-Sub-CMOS work in the subthreshold region, that is, with a supply voltage less than the threshold voltage of the transistor (V dd < V th ). With proper substrate biasing, stable operation can thus be achieved in VT-Sub-CMOS logic, thereby increasing the robustness of the circuit. However, the stabilization scheme incurs additional overhead in area and circuit complexity. Table 7 shows that for a 10% change in V th , the change in energy/switching (PDP) ranges from 0.1% to 1.4% for strong-inversion CMOS logic, from 34.7% to 96.2% for subthreshold CMOS, and from only 5% to 42.4% for VT-Sub-CMOS logic, showing the improved tolerance of VT-Sub-CMOS to variations. For a temperature change from 25 °C to 100 °C, the energy/switching of strong-inversion CMOS logic changes by only 28.2%, that of Sub-CMOS logic by 61.5%, and that of VT-Sub-CMOS logic by only 33.7% [20].
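The stabilization idea described above (monitor the transistor current, then correct it through the substrate bias) can be sketched as a simple behavioral feedback loop. This is not the circuit of [20]: the exponential current model, the linearized body-effect coefficient, the loop gain, and all numerical values are assumptions made purely for illustration.

```python
import math

V_T = 0.02585   # thermal voltage near 300 K, volts
N = 1.5         # hypothetical subthreshold slope factor
K_BODY = 0.2    # hypothetical linearized body effect, V of V_th per V of V_bs


def monitor_current(vgs, vth0, dvth, vbs, i0=1e-7):
    """Subthreshold current of the monitor transistor.

    dvth models a process- or temperature-induced threshold shift, and a
    positive (forward) vbs lowers the effective threshold.
    """
    vth_eff = vth0 + dvth - K_BODY * vbs
    return i0 * math.exp((vgs - vth_eff) / (N * V_T))


def stabilize(dvth, vgs=0.2, vth0=0.35, steps=50, gain=0.5):
    """Adjust the substrate bias until the current returns to its nominal value."""
    target = monitor_current(vgs, vth0, 0.0, 0.0)
    vbs = 0.0
    for _ in range(steps):
        current = monitor_current(vgs, vth0, dvth, vbs)
        # The error in log-current is proportional to the residual V_th error.
        error = math.log(target / current)
        vbs += gain * error * N * V_T / K_BODY
    return vbs, monitor_current(vgs, vth0, dvth, vbs) / target


# V_th 10% below a hypothetical 0.35 V nominal value.
unstabilized = monitor_current(0.2, 0.35, -0.035, 0.0) / monitor_current(0.2, 0.35, 0.0, 0.0)
vbs, ratio = stabilize(dvth=-0.035)
print(f"uncorrected current: {unstabilized:.2f}x nominal")
print(f"corrected current:   {ratio:.4f}x nominal at V_bs = {vbs:.3f} V")
```

In this model a 10% drop in V th raises the uncorrected current to roughly 2.5 times its nominal value, and the loop settles on a reverse substrate bias that restores the nominal current. The real scheme pays for this correction with the extra area and complexity noted in the text.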

Sub-DTMOS Logic.
Sub-DTMOS logic provides an alternative way to achieve the same stability through direct substrate biasing, without the additional control circuitry needed in VT-Sub-CMOS logic. Sub-DTMOS logic uses transistors whose gates are tied to their substrates [21]. Because the substrate voltage in sub-DTMOS logic follows the gate input voltage, the threshold voltage changes dynamically. In the OFF-state, that is, V in = 0 (V in = V dd ) for NMOS (PMOS), the characteristics of a DTMOS transistor are exactly the same as those of a regular MOS transistor: both have the same off-current, subthreshold slope, and threshold voltage. In the ON-state, however, the substrate-source voltage V bs is forward-biased and thus reduces the threshold voltage of the DTMOS transistor. The threshold voltage falls because the body charge is reduced. The reduced body charge brings another advantage, namely higher carrier mobility, because it lowers the effective normal field. The reduced threshold voltage, lower effective normal field, and higher mobility result in a higher ON-current drive in DTMOS than in a regular MOS transistor. Furthermore, the subthreshold slope of DTMOS improves and approaches the ideal 60 mV/decade, which makes it more efficient in subthreshold logic circuits for obtaining higher gain [21]. Another significant advantage of sub-DTMOS logic is that it does not require any additional limiter transistors, which further reduces the design complexity. In contrast, in the normal strong-inversion region, limiter transistors are necessary to keep the forward-biased V bs below 0.6 V. This prevents forward-biasing the parasitic PN junction diode while allowing a much higher power supply to be used in the circuit. The PDP of DTMOS is comparable to the PDP of regular CMOS [21]. Thus, using DTMOS logic, we can operate the circuit at a much higher frequency while still maintaining the same energy/switching, with enhanced robustness compared to static CMOS.

Subthreshold Domino Logic.
Sub-threshold static and ratioed logic has recently been proposed to satisfy the ultralow-power requirement in applications such as hearing aids, pacemakers, and wearable wristwatch computers. These logic circuits, however, can be operated only at lower frequencies because of the lower supply voltage. To increase the frequency of operation, a subthreshold dynamic logic, Subdomino logic, has been proposed [22]. A standard full adder circuit implemented in both Subdomino and Sub-CMOS logic operating in the subthreshold region has been simulated. Results from Table 8 show that Subdomino logic has lower power consumption (32% of Sub-CMOS), smaller area (60% of Sub-CMOS), and is 3 times faster than Sub-CMOS logic. It has also been shown that Subdomino logic has excellent noise margin [22].

Subthreshold DTPT Logic.
For pass transistor logic, we can use dynamic threshold transistors whose gates are tied to their substrates, forming subthreshold dynamic threshold pass transistor (Sub-DTPT) logic [24]. It has been observed that Sub-DTPT logic shows better stability against temperature variation than the corresponding sub-pass-transistor (sub-PT) logic. For example, in the second XOR structure in [24], the delay reduction caused by a 100 °C temperature increase is 17.8% for sub-PT and just 7.2% for Sub-DTPT logic.
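The dynamic-threshold behaviour shared by Sub-DTMOS and Sub-DTPT logic can be illustrated with the classic body-effect expression for the threshold voltage, evaluated with the body tied to the gate. The sketch below is a textbook approximation rather than the model used in [21] or [24]; the zero-bias threshold, body-effect coefficient, and surface potential are hypothetical values.

```python
import math


def threshold_voltage(vbs, vth0=0.35, gamma=0.4, two_phi_f=0.8):
    """Classic body-effect expression for an NMOS threshold voltage.

    V_th = V_th0 + gamma * (sqrt(2*phi_F - V_bs) - sqrt(2*phi_F))
    vth0 (V), gamma (V**0.5) and two_phi_f (V) are hypothetical device values.
    """
    if vbs >= two_phi_f:
        raise ValueError("V_bs must stay below 2*phi_F for this expression")
    return vth0 + gamma * (math.sqrt(two_phi_f - vbs) - math.sqrt(two_phi_f))


VDD = 0.2   # subthreshold supply; the body follows the gate, so V_bs <= V_dd

vth_off = threshold_voltage(0.0)   # gate low: identical to a regular MOS transistor
vth_on = threshold_voltage(VDD)    # gate high: body forward-biased by V_dd
print(f"V_th off = {vth_off:.3f} V, V_th on = {vth_on:.3f} V")
```

With the gate low the device keeps its regular threshold voltage, and with the gate high the forward body bias lowers it, which is the source of the extra ON-current. Because the forward bias can never exceed the 0.2 V supply assumed here, it stays well below the 0.6 V limit mentioned in the text, which is why no limiter transistors are needed in subthreshold operation.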

Conclusions
As supply voltage continues to scale with each new generation of CMOS technology, subthreshold design is an inevitable choice in the semiconductor roadmap for achieving ultra low-power consumption. Device optimization is a must for optimal subthreshold operation to further reduce power and enhance performance. Comparative studies show that double-gate SOI devices and CNFETs are better candidates for subthreshold operation than bulk CMOS devices. At the circuit level, Sub-Pseudo-NMOS, Sub-DTPT, and Subdomino logic can be considered for robust subthreshold operation because of their improved performance and better stability against PVT variations, with reduced or comparable energy/switching relative to conventional static CMOS logic. Device/circuit Codesign methodology can further enhance subthreshold operation in terms of performance and robustness.

Figure 6: I ON and S versus gate length [13].

Figure 10: Change in I ON and I OFF with underlap at V dd = 0.2 V [14].

Table 3 :
[14]g optimum under lap[14].In contrast, C g of the device operated in strong inversion is dominated by the gate-oxide capacitance and hence does not vary considerably with underlap.While C g decreases with the gate underlap, I ON (I ds at V gs
5.1.Subthreshold CMOS Logic.Sub-threshold CMOS (Sub-CMOS) logic is the conventional CMOS logic operated in the subthreshold region.The voltage transfer characteristics (VTC) of the inverter gate running in subthreshold mode is closer to ideal compared to the VTC in normal strong inversion region