On Mixed PTL / Static Logic for Low-power and High-speed Circuits

We present more evidence in a 0.25 µm CMOS technology that a pass-transistor logic (PTL) structure that mixes conventional PTL with static logic gates can achieve better performance and lower power consumption than conventional PTL. The goal is to use the static gates to perform logic functions as well as buffering. Our experimental results demonstrate that the proposed mixed PTL structure beats the pure static structure and conventional PTL in 9 out of 15 test cases for either delay or power consumption or both in a 0.25 µm CMOS process. The average delay, power consumption, and power-delay product of the proposed structure for the 15 test cases are 10% to 20% better than the pure static implementations and up to 50% better than the conventional PTL implementations.


INTRODUCTION
With power increasingly becoming a limiting factor for system performance, the demand for low-power circuits that do not sacrifice circuit performance is evident and will continue to be the focus of the low-power, high-performance circuit design community.
Conventional static CMOS logic gates are widely used in ASIC design today, mainly because they are easy to synthesize and are supported by a well-developed synthesis and analysis environment.
Pass-transistor logic (PTL), which was first proposed in [1,2], attracted a lot of attention early on. PTL's advantage as an alternative low-power circuit design approach in 0.5 µm and 0.3 µm CMOS technology has been well documented [3,4]. However, PTL's advantage in deep-submicron CMOS technology is not yet well understood. There are speculations [3] that with aggressive scaling of transistor critical dimensions and supply voltage, the advantages of PTL over conventional static CMOS gates will gradually diminish. This paper presents results in a 0.25 µm CMOS process showing that PTL circuits, when mixed with static logic gates, offer up to 50% power-delay benefits compared to conventional PTL. The adoption of PTL is further hindered by the lack of a synthesis environment that supports PTL gates. A well-known method of synthesizing a PTL circuit from a set of logic equations was proposed in [5]. The use of binary decision diagrams (BDDs) [6][7][8] made the synthesis process easy and straightforward. To minimize delay, a buffer is inserted after every fixed number of series transistors in the PTL chain. The process of buffer insertion can also be viewed as a process of BDD decomposition, as suggested in [9], where buffers are inserted at the root of each decomposed BDD. For complex logic functions, the number of buffers to be inserted in a PTL circuit can be significant. One way to alleviate this problem is to use static logic gates that act as buffers while also performing logic functions. Mixing PTL with static logic gates was proposed in [10], where only one circuit example is used to show the benefit of the mixed PTL/static logic structure in a 0.35 µm CMOS process.
This paper presents more evidence in a deep-submicron CMOS process to further validate the benefit of mixed PTL circuits. By eliminating explicit buffering between PTL stages, mixed PTL circuits can further improve the low-power nature of PTL circuits without sacrificing performance. Our experimental results show that our proposed structure beats the pure static structure and conventional PTL in 9 out of 15 test cases for either delay or power consumption or both in a 0.25 µm CMOS process. The average delay, power consumption, and power-delay product of the proposed structure for the 15 test cases are 10% to 20% better than the pure static implementations and up to 50% better than the conventional PTL implementations.
The remainder of this paper consists of three sections. We review some basic issues related to PTL and propose our mixed PTL structure in Section 2. In Section 3, a set of experimental results is presented in detail. Concluding remarks are given in Section 4.

The average power, $P_{av}$, consumed by a CMOS gate is given by

$P_{av} = V_{dd}^{2} \, f \, C_L \, A_i$    (1)

where $V_{dd}$ is the supply voltage, $f$ is the operating frequency, $C_L$ is the total load capacitance driven by the gate, and $A_i$ is the switching activity of the gate. If $V_{dd}$, $f$, and $A_i$ are fixed, decreasing $C_L$ reduces power consumption.
Since the input capacitance of a PTL gate is smaller than the input capacitance of a static CMOS gate, replacing CMOS gates with PTL gates should result in a reduction in the power consumed.
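As a minimal illustration of Equation (1), the sketch below evaluates the average power for two load capacitances. Only the 2.5 V supply comes from this paper; the frequency, switching activity, and capacitance values are hypothetical numbers chosen purely to show that, with the other factors fixed, power scales linearly with the load capacitance.

```python
def average_power(vdd, freq, c_load, activity):
    """Average dynamic power of a CMOS gate: Vdd^2 * f * C_L * A_i (Equation 1)."""
    return vdd ** 2 * freq * c_load * activity


VDD = 2.5           # supply voltage in volts (value used in the experiments)
FREQ = 100e6        # operating frequency in Hz (hypothetical)
ACTIVITY = 0.2      # switching activity (hypothetical)
C_STATIC = 10e-15   # load seen when driving a static gate, in farads (hypothetical)
C_PTL = 6e-15       # load seen when driving a PTL gate, in farads (hypothetical)

p_static = average_power(VDD, FREQ, C_STATIC, ACTIVITY)
p_ptl = average_power(VDD, FREQ, C_PTL, ACTIVITY)

# With Vdd, f, and A_i fixed, the power ratio equals the capacitance ratio.
print(f"static load: {p_static:.3e} W, PTL load: {p_ptl:.3e} W")
print(f"power ratio: {p_ptl / p_static:.2f}")
```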
We cannot arbitrarily connect PTL gates together, however, since the delay through pass-transistors is a quadratic function of the number of transistors in series. The delay in PTL can be estimated as $t_d \approx \tau_n N^2$, where $\tau_n$ and $N$ are the time constant and the number of transistors in series, respectively [11]. This quadratic dependence can be seen in Figure 1, where the results from an HSPICE simulation of a 0.25 µm process are shown. The delay is plotted versus the number of nMOS transistors in series in Figure 1. Figure 2 plots the percentage increase in delay over one pass-transistor against the number of transistors in series. These graphs show that the delay is unacceptably high when the number of pass-transistors is three or more. Therefore, we limited ourselves to a maximum of two pass-transistors in series without an intervening static gate for this process.
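The quadratic estimate can be tabulated with a short sketch. It uses only the idealized model in which delay grows with the square of the number of series transistors; it does not reproduce the HSPICE data plotted in Figures 1 and 2, and the function names are illustrative.

```python
def relative_delay(n_series):
    """Delay of n series pass-transistors relative to a single one, assuming t_d ~ tau_n * N^2."""
    if n_series < 1:
        raise ValueError("need at least one pass-transistor")
    return n_series ** 2


def percent_increase(n_series):
    """Percentage increase in delay over one pass-transistor under the same model."""
    return (relative_delay(n_series) - 1) * 100


for n in range(1, 5):
    print(f"N={n}: {relative_delay(n)}x the single-transistor delay "
          f"({percent_increase(n)}% increase)")
```

Under this model a chain of three transistors is already nine times slower than a single one, which is consistent with the decision to cap unbuffered chains at two.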
Another issue with pass-transistors is that they do not have full output voltage swing. The high output voltage is limited to $V_{dd} - V_{tn}$ for an nMOS transistor chain, where $V_{tn}$ is the nMOS threshold voltage. We had hoped to use this limited voltage swing to reduce the dynamic power consumption.
However, this output voltage was not high enough to completely turn off the pMOS transistors in the following static gates, and, therefore, the static current became unacceptably high. To fix this problem, an inverter and a pull-up pMOS transistor had to be added at the output of each PTL cell to restore the swing all the way to $V_{dd}$.
The sizes of the inverter and pull-up pMOS were chosen so that the loading effect at the cell output is minimized. This arrangement is shown in the inset of Figure 5.

FIGURE 1 Relationship between the number of pass-transistors in a chain and delay (0.25 µm CMOS technology).

Conventional Static and PTL Mapping
One of our goals is to compare our proposed mixed PTL structure with the conventional PTL structure. To map a given set of Boolean functions to a conventional PTL circuit, we followed a method similar to the one described in [5]. The logic equations are first converted to a reduced and ordered BDD. This BDD is then converted to a PTL circuit simply by replacing each node with two nMOS pass transistors in a Y-shaped pattern, as shown in the inset in Figure 3. This is a straightforward method, and it ensures that only one path is on at a time. The final step is to add buffers after every two pass-transistors in series for the reasons outlined in Section 2.1. The buffers use pull-up transistors to restore full swing at the output of the pass-transistors.
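The node-by-node replacement and buffer insertion described above can be sketched as follows. This is a simplified illustration, not the mapping tool used in this work: the dictionary-based BDD encoding, the net naming, and the small three-variable example (a majority function) are all assumptions made for the sketch.

```python
TERMINALS = ("0", "1")

# Hypothetical reduced, ordered BDD: node -> (variable, low_child, high_child).
bdd = {
    "n1": ("a", "n2", "n3"),
    "n2": ("b", "0", "n4"),
    "n3": ("b", "n4", "1"),
    "n4": ("c", "0", "1"),
}


def map_bdd_to_ptl(bdd, root, max_series=2):
    """Replace each BDD node with two nMOS pass transistors; buffer every max_series stages."""
    transistors = []  # (gate signal, source net, drain net)
    buffered = []     # nodes whose output is followed by a buffer
    chain = {}        # unbuffered series depth at each node's output

    def out_net(node):
        # Parents connect to the buffered net when the child has a buffer.
        return node + "_buf" if node in buffered else node

    def visit(node):
        if node in TERMINALS:
            return 0
        if node in chain:
            return chain[node]
        var, low, high = bdd[node]
        depth = 1 + max(visit(low), visit(high))
        transistors.append((var, out_net(high), node))       # on when var = 1
        transistors.append((var + "'", out_net(low), node))  # on when var = 0
        if depth >= max_series:
            buffered.append(node)  # restore the signal and restart the chain count
            depth = 0
        chain[node] = depth
        return depth

    visit(root)
    return transistors, buffered


transistors, buffered = map_bdd_to_ptl(bdd, "n1")
print(len(transistors), "pass transistors; buffers after:", buffered)
```

Each BDD node contributes exactly two transistors driven by complementary gate signals, so only one branch conducts at a time. Whether the root output also receives a buffer is left out of this sketch.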
As an example of a conventional PTL circuit, the Boolean equation yos2stlX2 "t-S1SoX2Xl + S2X2XlXO "1-yos2slsox2 (2) was mapped using the above process. The results are shown in Figure 3.
For the pure static mapping, we used the SIS program as described in [12,13]. Since technology mapping is an NP-hard problem [14,15], SIS uses a set of heuristic algorithms. The first algorithm decomposes the given Boolean function into a graph containing only simple gates, such as 2-input NANDs and inverters. In the second step, this large graph is partitioned into smaller pieces called subject graphs [16,17] to make the third step tractable. The third step is called covering [16][17][18] and consists of trying to match portions of the subject graph to graphs created from the cell library. The cell library used by SIS contains only static CMOS gates. In the final step, the algorithm chooses which library cells to use, usually based upon some sort of timing constraint. SIS was used to implement the function f given in (2). The result is shown in Figure 4.

PTL/Static Hybrid Mapping
Our approach to mixed PTL/static technology mapping is similar to the method used in SIS [12,13]. The mapping begins by decomposing the Boolean functions into a graph containing only simple gates called base functions: 2-input ANDs, 2-input ORs, and inverters in our case. The primary purpose of this step is to express all local functions as simple functions; it is required to ensure that every node is covered by at least one library cell. To ensure the existence of a solution, the library has to include the base functions.
During partitioning, the large graph is broken into subject graphs. Since the technology mapping problem is NP-complete in nature, breaking the graph into cones reduces the problem size and makes the problem tractable. We have chosen multiple-fanout points as the boundaries of the subject graphs.
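A minimal sketch of choosing subject-graph boundaries at multiple-fanout points is given below. The fanin-list representation and the small example network are hypothetical, and treating primary outputs as additional cone roots is an assumption of the sketch.

```python
from collections import Counter

# Hypothetical decomposed network: gate -> list of fanin signals.
# Signals that never appear as keys ("a", "b", "c", "d") are primary inputs.
fanins = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g1", "d"],
    "f": ["g2", "g3"],
}


def cone_roots(fanins, outputs):
    """Return the gates that root a subject graph: primary outputs and multi-fanout gates."""
    fanout = Counter(src for ins in fanins.values() for src in ins)
    roots = set(outputs)
    # A gate feeding more than one other gate becomes a boundary between cones.
    roots.update(n for n, count in fanout.items() if count > 1 and n in fanins)
    return sorted(roots)


print(cone_roots(fanins, ["f"]))
```

Here g1 drives both g2 and g3, so the network splits into one cone rooted at g1 and one rooted at the output f.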
We used a covering method similar to that in SIS, but ours differs in that three PTL gates are added to the library of static gates, and the static gates are used both to perform logic functions and to act as buffers between PTL gates. These PTL gates have a maximum of two pass-transistors in series. Figure 5 shows the schematic of these three cells. With the addition of the PTL gates, some new rules must be added to the conventional covering scheme to generate the mapped circuit after finding possible matches at every node in the subject graph. First, all inputs to a PTL cell must come from static cells. Similarly, all outputs from a PTL cell must go to static cells. These rules ensure that two PTL cells are never connected in series, and thus no more than two pass-transistors are in series at any point in the circuit. This has to be done to keep the delay at acceptable levels, as discussed in Section 2.1.
The details of the mapping process are beyond the scope of this paper. The mapping algorithm will be presented in a separate paper at a later time. A sample of our hybrid approach is shown in Figure 6, where the function f given in (2) is implemented.
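The two connection rules amount to forbidding any direct PTL-to-PTL connection, which can be checked on a mapped netlist with a few lines of code. The sketch below is only an illustrative legality check, not the covering algorithm itself; the netlist format and cell names are hypothetical, and primary inputs and outputs are assumed to lie outside the rule.

```python
def ptl_rule_violations(cells):
    """Return (driver, load) pairs where one PTL cell feeds another PTL cell directly."""
    violations = []
    for name, cell in cells.items():
        if cell["type"] != "ptl":
            continue
        for src in cell["inputs"]:
            # Signals not present in `cells` are treated as primary inputs.
            if src in cells and cells[src]["type"] == "ptl":
                violations.append((src, name))
    return violations


# Hypothetical mapped netlists: cell name -> cell type and input signals.
legal = {
    "u1": {"type": "static", "inputs": ["a", "b"]},
    "p1": {"type": "ptl", "inputs": ["u1", "c"]},
    "u2": {"type": "static", "inputs": ["p1", "d"]},
}
illegal = dict(legal, p2={"type": "ptl", "inputs": ["p1", "u1"]})

print(ptl_rule_violations(legal))
print(ptl_rule_violations(illegal))
```

Because each library PTL cell has at most two pass-transistors in series, a netlist with no violations never places more than two in series.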

EXPERIMENTAL RESULTS
We implemented three circuits from a microprocessor design in each of the three logic styles: conventional static, conventional PTL, and mixed PTL/static. The characteristics of the three circuits are summarized in Table I. These three circuits are implemented using a commercial 0.25 µm CMOS process with $V_{dd} = 2.5$ V. For each of the three circuits, we randomly chose five paths to compare delay, power consumption (under different input setups to activate the given path), and power-delay product. The transistor sizes in all the circuits were determined such that the rise and fall times of all the signal nodes in the circuits are in the range between 300 ps and 400 ps.
The HSPICE simulation results are shown in Table II, where the paths numbered 1 through 5 are from circuit 1, 6 through 10 are from circuit 2, and 11 through 15 are from circuit 3. This was the original process we designed our cells for, and the results are encouraging. In 9 out of 15 paths (path # 1, 6, 7, 8, 11, 12, 13, 14, 15), our proposed scheme outperformed the conventional static logic and the conventional PTL in either delay or power consumption. Out of these 9 paths, 6 paths outperformed in both delay and power consumption.
In 10 out of 15 paths (path # 1, 6, 7, 8, 9, 11, 12, 13, 14, 15), our scheme outperformed the conventional static logic and the conventional PTL in power-delay product. On average, our proposed scheme has a 30 percent improvement in delay while consuming approximately half of the power when compared to the conventional PTL. More modest, but still substantial, gains are seen when comparing our approach to the static CMOS implementation, where there is approximately a ten percent savings in both delay and power. For power-delay product (PDP), our approach yields a 17 percent improvement over the static CMOS method.
In addition, our experimental results show that, at 0.25 µm, the advantages of conventional PTL over conventional static are disappearing. In fact, the conventional PTL scheme is significantly outperformed by the conventional static logic in all three categories of delay, power consumption, and power-delay product. It is our belief that the explicit use of buffering between PTL stages adds significant delay and additional power consumption to the conventional PTL circuits.
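As a rough consistency check on these averages, the sketch below composes a fractional delay saving and a fractional power saving into a power-delay-product saving. It is only an approximation: the product of two averages is generally not the average of the per-path products, so it is not expected to reproduce the reported PDP figure exactly. The function name is illustrative.

```python
def pdp_improvement(delay_gain, power_gain):
    """Fractional PDP saving implied by fractional savings in delay and in power."""
    return 1.0 - (1.0 - delay_gain) * (1.0 - power_gain)


# Roughly ten percent savings in both delay and power versus static CMOS.
static_estimate = pdp_improvement(0.10, 0.10)
print(f"estimated PDP saving versus static CMOS: {static_estimate:.0%}")
```

The estimate of about 19 percent is in the same range as the 17 percent PDP improvement measured over the 15 paths.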

CONCLUSIONS
We presented a new pass-transistor scheme in which conventional PTL stages are mixed with static logic gates.The static logic gates not only act as buffers between PTL stages, but also perform part of the logic functions of the given block.
Compared to the conventional PTL, the elimination of explicit buffering between PTL stages significantly improved both circuit performance and power consumption. The experimental results confirm that, by using the proposed scheme, an improvement of over 10% in delay and power consumption over the conventional static logic, and an improvement of up to 50% in power consumption and up to 30% in delay over the conventional PTL, can be achieved. The proposed scheme requires a new approach to technology mapping. The problem is NP-complete in nature. We are continuing our effort to formalize the process of mapping a given logic function using the mixed PTL and static logic gates.

FIGURE 2 Relationship between the number of pass-transistors in a chain and percentage increase in delay (0.25 µm CMOS technology).