Efficient Low Power / Low Swing Bus Design Architectures

Novel low-power circuits based on a low swing voltage technique, applied to the internal nodes of bus architectures, are proposed. Different classes of driver/receiver and repeater circuits are presented. They are implemented in conventional CMOS technology. The proposed technique is based on inserting a variable number of MOSFET transistors in the driver circuits, producing variable low swing voltage levels at the output of the driver circuits. In order to pull the low swing voltage back up to full swing, innovative high-speed, cross-coupled latch voltage receiver circuits are proposed. For applications with high load capacitance due to long interconnections, novel repeater circuits, also based on the low swing voltage technique, are introduced. The difference between the threshold voltages of the nMOS and pMOS transistors is exploited to decrease the power dissipation. The effect of the proposed technique on noise margins is also analysed.


INTRODUCTION
The increasing complexity of VLSI circuits and the growing demand for portable equipment make power dissipation one of the most important issues in modern VLSI applications [1]. Decreasing the dynamic power dissipation is the most significant part of decreasing the total power dissipation. Bus architectures, clock designs and long interconnections are examples where high capacitance exists at the output. They are often routed over long distances, resulting in large load capacitances that must be charged and discharged. Usually up to 50% of the total power is dissipated by the clock signal, while 40% of the power dissipation is caused by long interconnections [2][3][4][5][6].
Several low-power bus architectures have been proposed. They are based on reducing the voltage swing on the internal nodes [7][8][9][10][11][12]. In [7], non-conventional technology is required and transistors with different resistances are used in order to decrease the swing voltage at the output of the driver. In [8], careful design and reference voltage threshold generators are required. The idea of reducing the clock voltage swing was pursued in [9], but it required four clock lines, which increased the clock interconnection capacitance. Moreover, routing four clock lines is disadvantageous in area, and skew adjustment is difficult to implement. BiCMOS technology is used in [10] in order to achieve high-speed operation and lower power dissipation. In [11], a method based on diodes, using an additional clock circuit design, is presented, without a significant decrease in power dissipation. A design similar to our proposed technique, based on [10], is proposed in [12], but it requires a second external supply voltage of 6 V (Vw0) in order to decrease static DC current dissipation.
As the above techniques show, the main disadvantages of the low swing voltage technique are complex design and large silicon area. In this paper, we propose three different driver, receiver and repeater circuits based on conventional architecture using simple circuit design. In the proposed technique, a variable level of low swing voltage can be generated at the output of the driver. In this way, the proposed technique can be used in a large number of applications where the low swing level depends on the application. Different power savings can be achieved by increasing or decreasing the number of inserted MOS transistors, as will be shown in the next section.
The organization of this paper is as follows: In Section 2, the new low swing bus architectures are presented. Bus architecture evaluation is discussed in Section 3. For bus architectures with long interconnections, repeater circuits are proposed in Section 4. The influence of the proposed architecture on the noise margin is shown in Section 5. We conclude in Section 6.

BUS ARCHITECTURE
The conventional CMOS bus architecture is shown in Figure 1. It consists of a driver, a delay element (interconnection line), and a receiver.
Both the driver and the receiver are designed with conventional CMOS inverters. The input/output voltage level of the driver/receiver ranges between 0 and the power supply voltage (VDD).
In a CMOS circuit, decreasing the supply voltage is the most efficient way to reduce the power dissipation. However, this also reduces the circuit speed. New circuits that combine high-speed operation with low power dissipation are therefore needed.

Low Swing Voltage Driver Architectures
Current state-of-the-art bus architectures are based on reducing the voltage swing in the internal nodes. They require additional circuitry for the bus structures, consisting of a driver and a receiver. Although these techniques achieve a high reduction in power dissipation, they also have a number of constraints that reduce their usability.
Comparing our proposed technique with existing low swing techniques shows improvements in both power savings and delay time.
Three different classes of proposed driver circuits, achieving reductions in output swing voltage levels, are shown in Figures 2a, b, Figures 4a, b and Figures 6a, b. The M1 nMOS transistor (Fig. 2), which is inserted between the pMOS and nMOS transistors of a simple inverter, is used to reduce the output voltage swing. When a reference voltage Vref is applied to its gate, the voltage on its source, and therefore on the output node of the driver, cannot exceed Vref - VTH, where VTH is the threshold voltage of the M1 transistor [13,14].
A simple method is used for the derivation of Vref. This is accomplished by a parallel or serial structure of MOSFET transistors. As the gate of each MOSFET transistor in the structure is connected to the source/drain of the previous one, the voltage on the source node of the last transistor is:
$$V_{ref} = V_{DD} - n\,V_{TH}$$
where n is the number of MOSFET transistors used in the structure. With this driver design, the number of inserted MOSFET transistors directly controls the output swing voltage and therefore the power savings.
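The dependence of the driver output level on the number of inserted transistors can be illustrated with a minimal Python sketch. It assumes that each transistor in the structure drops exactly one threshold voltage and ignores body effect; the function names and the 0.7 V threshold are illustrative assumptions, not values from the paper.

```python
def reference_voltage(vdd: float, vth: float, n: int) -> float:
    """Vref after n stacked transistors, each assumed to drop one threshold voltage."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return max(vdd - n * vth, 0.0)


def driver_high_level(vdd: float, vth: float, n: int) -> float:
    """Highest driver output level: M1 passes at most Vref - vth."""
    return max(reference_voltage(vdd, vth, n) - vth, 0.0)


if __name__ == "__main__":
    VDD = 3.3  # supply voltage used in the paper's evaluation
    VTH = 0.7  # assumed threshold voltage, for illustration only
    for n in range(4):
        vref = reference_voltage(VDD, VTH, n)
        vout = driver_high_level(VDD, VTH, n)
        print(f"n={n}: Vref={vref:.2f} V, max driver output={vout:.2f} V")
```

Under these assumptions, each additional transistor lowers the maximum output level by one threshold voltage, which is the knob the proposed drivers use to trade swing for power.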
As shown in Figures 2a, 4a, because of the gate-to-source and gate-to-drain capacitances of the transistor, as the voltage on the source and drain nodes of the M1 transistor varies, an amount of charge is trapped on these gates. This causes an increase in the gate voltage, which disrupts the operation of the circuit. In order for Vref to remain at the proper value, the C1 capacitor, with a value at least five times that of the source and drain parasitic capacitance, is inserted. This capacitance can be easily produced in CMOS technology (gate capacitance of a transistor with proper size) [15][16][17].
The driver circuits in Figures 2a, b belong to the first class, called Up Low Swing Voltage Driver (ULD). In this class, the low swing output voltage (VLS) ranges between 0 and Vx, where Vx = Vref - VTN and VTN is the threshold voltage of the nMOS transistor. For the same input voltage and load capacitance of each driver, the total power dissipation of the proposed driver is compared with the power dissipation of the conventional driver as well as with the drivers proposed in [18 (Fig. 2)], [10] and [12 (Fig. 3)]. The comparison clearly shows a distinct improvement of the two proposed driver circuits over the other low swing driver circuits. Simulation results are given using only one nMOS transistor in the nMOS structure. It should be emphasised that inserting a second or more nMOS transistors in the MOSFET structure increases the power savings, as will be shown later.
The second class of low swing voltage driver is called Down Low swing voltage Driver (DLD). It is shown in Figures 4a, b. It is the inverse form of the first class, replacing the nMOS transistors of the ULD by pMOS transistors. In this class, the low swing output voltage (VLS) ranges between Vy and VDD, where Vy = Vref - VTp, Vref = -n VTp, and VTp is the threshold voltage of the pMOS transistor. The choice between ULD and DLD depends on the absolute value of the n/pMOS transistor threshold voltages [19]. To our knowledge, no other driver circuits have been proposed in this class. We have compared the proposed driver circuits with the conventional CMOS driver using only one pMOS transistor in the pMOSFET structure (Fig. 5).
The driver circuits in Figures 6a, b belong to the third class, called Up-Down Low swing voltage Driver (UDLD). It is a combination of both previous designs. In this case, the VLS swing output voltage ranges between Vx for low input and Vy for high input. The differences in the threshold voltages between the inserted (n/pMOS) transistors cause differences in the range of low swing at the output voltage of the driver circuit.
For different values of supply voltage and load capacitance, Figure 7 shows comparison results of the proposed technique with other techniques proposed in [18 (Fig. 5a)].
It is clear from the proposed driver designs above that the output voltage swing, and therefore the power savings, are controlled by two factors: (1) the value of the threshold voltage of the inserted n/pMOS transistor and (2) the number of n/pMOS transistors in the MOSFET structure.

Pull-up Receiver Circuits
When the conventional CMOS inverter is used to convert a low-swing signal to a full-swing signal, the standby power can be significant [7]. Symmetric structures of the proposed receiver circuit offer a solution to the problem of standby power dissipation.
Special receiver circuits are required to pull up the low swing output voltage of the driver circuits to the conventional full swing (range between 0 and VDD). Different receiver circuits, based on different methods, have been proposed in the literature [7, 20, 10, 12, 18]. For receiver circuits, the main problems are the large silicon area used and the increase in delay time compared to the conventional CMOS receiver. Special attention must therefore be paid to the design of receiver circuits. In our case, we are interested in designing receiver circuits based on conventional CMOS technology, using simple circuit design with the least silicon area and delay time.
Because power dissipation and propagation delay time are the most important factors of the receiver circuits, we use the normalized power delay product in the following comparisons for simplicity. For each class of the proposed driver circuit, an appropriate corresponding receiver circuit is proposed.
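The figure of merit used in the comparisons can be written as a short Python sketch. It assumes that "normalized" means the power delay product of a circuit divided by that of the conventional CMOS inverter measured under the same conditions; the function names and the sample numbers in the usage note are hypothetical.

```python
def power_delay_product(power_w: float, delay_s: float) -> float:
    """Power delay product in joules (watts times seconds)."""
    return power_w * delay_s


def normalized_pdp(power_w: float, delay_s: float,
                   ref_power_w: float, ref_delay_s: float) -> float:
    """PDP of a circuit relative to a reference circuit (assumed: the CMOS inverter)."""
    reference = power_delay_product(ref_power_w, ref_delay_s)
    if reference <= 0.0:
        raise ValueError("reference power delay product must be positive")
    return power_delay_product(power_w, delay_s) / reference
```

For instance, a hypothetical receiver that dissipates half the power of the reference but is 20% slower has `normalized_pdp(0.5e-3, 1.2e-9, 1.0e-3, 1.0e-9)` of about 0.6, so a value below 1 indicates a net improvement over the reference.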
The Up Full swing Receiver circuit (UFR) is shown in Figure 8a. It is based on a voltage sense transistor circuit. The operation of the proposed receiver circuit is as follows: As the receiver input (InReceiver) is connected to the driver output (VLS), the receiver input voltage swings between 0 and (VDD - n VTN). When the input is at Vx = VDD - n VTN, transistor M2 turns on, discharging the receiver output node to ground. Thus, M3 turns on, charging the gate node of M4 to VDD, which turns M4 off. At this time, for the proper operation of the circuit, transistor M5 must be off. When the receiver input (VLS) is 0, transistor M2 turns off while transistor M5 turns on, discharging the gate of transistor M4 to 0 V. Thus, M4 turns on, charging the output load to VDD and ensuring full swing operation.
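The switching sequence described for the UFR can be summarized as a purely behavioural Python sketch. It models only the steady-state transistor states stated in the text, not the circuit itself; the decision threshold at half of Vx and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UfrState:
    m2_on: bool
    m3_on: Optional[bool]  # None where the text does not state M3's condition
    m4_on: bool
    m5_on: bool
    out: float


def ufr_behaviour(v_in: float, v_x: float, vdd: float) -> UfrState:
    """Steady-state UFR behaviour for an input at 0 or at the low-swing level v_x."""
    input_high = v_in > v_x / 2.0  # assumed decision threshold, illustration only
    if input_high:
        # M2 on -> output discharged; M3 on -> M4 gate at VDD -> M4 off; M5 off
        return UfrState(m2_on=True, m3_on=True, m4_on=False, m5_on=False, out=0.0)
    # M2 off; M5 on -> M4 gate at 0 V -> M4 on -> output charged to VDD
    return UfrState(m2_on=False, m3_on=None, m4_on=True, m5_on=True, out=vdd)
```

In this sketch the receiver is inverting: a low-swing high input yields 0 V at the output and a 0 V input yields the full VDD level.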
The normalized power delay product of the proposed receiver circuit (Fig. 8a) and of the corresponding receivers proposed in previously published papers [18 (Figs. 2, 8a)] [10 (Fig. 4a)] are compared to that of the conventional receiver (CMOS inverter). The simulation results are obtained using the same low swing input voltage and output load capacitance, as shown in Figure 8b.
The same logic of the previous receiver circuits can be implemented in order to cooperate with the Down Low swing voltage Driver circuit (DLD).
The proposed Down Full swing voltage Receiver (DFR) is shown in Figure 9a. The logic operation of this receiver is exactly the inverse of that of the UFR. When the receiver input (InReceiver) is at the VLS voltage, transistor M2 turns on, charging the receiver output node to VDD. Thus, M3 turns on, discharging the gate node of M4 to GND, which turns M4 off. In this case M5 turns off. When the receiver input is VDD, transistor M2 turns off while transistor M5 turns on, charging the gate of transistor M4 to VDD. Thus, M4 turns on, discharging the output load to 0 V and ensuring full swing operation.
The normalized power delay products of the proposed receiver circuit and of the receiver circuit proposed in [10 (Fig. 4b)], compared to the conventional CMOS inverter for the same input voltage and load capacitance, are shown in Figure 9b.
The combination of the receiver circuits described above results in the Up-Down Full swing voltage Receiver circuit (UDFR), as illustrated in Figure 10a. The circuit is symmetric. When the low swing is high, the up half receiver converts the low swing to VDD, while when the low swing is low, the down half swing receiver converts the low swing to 0 V. The main advantage of the proposed design is that the output of the receiver is derived from the CMOS inverter. The output voltage in this case will be 0 V (low logic) or VDD (high logic).
The normalized power delay products of the proposed receiver circuit and of the corresponding receivers proposed in other published papers [18 (Fig. 5a)] [10 (Fig. 3a)] are compared to that of the conventional receiver, as shown in Figure 10b.

BUS ARCHITECTURE EVALUATION
In order to show the improvements of the proposed driver and receiver circuits, the proposed architecture is compared with the conventional bus architecture (using two CMOS inverter circuits).
FIGURE 12 Bus architecture normalized power delay product.

Each driver-receiver pair of the three proposed classes forms a bus architecture. The combination of the ULD and the UFR forms an Up Low swing Bus architecture (ULB), the DLD and the DFR form a Down Low swing Bus architecture (DLB), and the UDLD and the UDFR form an Up-Down Low swing Bus architecture (UDLB).
Figure 11 shows the normalized power delay product of the three proposed bus architectures compared to the conventional bus architecture under the same conditions of load capacitance, input voltage and transistor widths.
Figures 12 and 13 illustrate the normalized power delay product for different resistance and capacitance values in the delay element, respectively.
The normalized propagation delay and total power dissipation for the three proposed classes of bus architecture, using different values of n, are shown in Table I. An increase in the number of MOS transistors (n) in the inserted structure results in an increase in the delay time and a reduction in the power dissipation. The comparison results are obtained using a supply voltage of 3.3 V, a load capacitance of 10 pF and a delay line with 0.5 Ω resistance and 0.2 pF capacitors. The widths of the transistors are shown in the corresponding figures. The power dissipation was measured with the power-meter circuit proposed in [21].
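The trend of power against n can be sketched in Python with the first-order dynamic power model used in this paper, in which power is proportional to the product of the supply voltage and the output swing. The sketch assumes an Up Low swing Bus whose swing is VDD - (n + 1) VTN; the 0.7 V threshold and the function names are illustrative assumptions, and short-circuit and static currents are ignored.

```python
def dynamic_power(f_c: float, c_load: float, vdd: float, v_swing: float) -> float:
    """First-order dynamic power: frequency * load capacitance * VDD * output swing."""
    return f_c * c_load * vdd * v_swing


def ulb_swing(vdd: float, vtn: float, n: int) -> float:
    """Assumed ULB output swing with n transistors in the reference structure."""
    return max(vdd - (n + 1) * vtn, 0.0)


def saving_factor_percent(vdd: float, vtn: float, n: int) -> float:
    """Power saving relative to full-swing operation, in percent."""
    full = dynamic_power(1.0, 1.0, vdd, vdd)
    low = dynamic_power(1.0, 1.0, vdd, ulb_swing(vdd, vtn, n))
    return (1.0 - low / full) * 100.0


if __name__ == "__main__":
    VDD, VTN = 3.3, 0.7  # 3.3 V as in the paper; VTN is an assumed value
    for n in range(3):
        print(f"n={n}: saving = {saving_factor_percent(VDD, VTN, n):.1f} %")
```

The frequency and load capacitance cancel in the ratio, so the saving in this model depends only on n and on the ratio of the threshold voltage to the supply voltage.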
In CMOS technology, dynamic power dissipation (Pd) occurs when current flows from the power supply (VDD) to the output load, and is calculated by multiplying the total load capacitance (CL), the operation frequency (fc) and the square of the supply voltage (VDD^2):
$$P_d = f_c \, C_L \, (V_{DD} \cdot V_{DD})$$
Dynamic power dissipation in the proposed architecture can be calculated by
$$P_d = f_c \, C_L \, (V_{DD} \cdot V_{LS})$$
where VLS is the low swing voltage at the output of the driver. Compared to the full voltage-swing case, a reduction in power dissipation is achieved. The above equation indicates the significant savings in dynamic power dissipation that can be achieved by reducing the supply voltage (VDD). In the case of the Up Low swing voltage Bus architecture and for low voltage swing operation, the energy required to charge a capacitive load CL to the value Vref - VTN, with Vref = VDD - n VTN, is
$$E = C_L \, V_{DD} \, \left(V_{DD} - (n+1)\,V_{TN}\right)$$
The power savings, compared to the power dissipation Etot of the full swing operation (Etot = CL VDD^2), are given by the power saving factor k,
$$k = \left(1 - \frac{E}{E_{tot}}\right) \cdot 100\% = \frac{(n+1)\,V_{TN}}{V_{DD}} \cdot 100\% \tag{7}$$
The dynamic power dissipation of the two other bus classes can be calculated using similar equations.

REPEATER CIRCUITS
The delay of a long line with distributed resistive and capacitive components grows as the square of its length [22][23][24]. To avoid this dependence, a common solution is to divide the interconnection line into segments of equal length, which are driven by repeaters [25,8,2,3].
We propose three different classes of repeater circuits, matched to the three classes of driver-receiver circuits. For ULB, the Up Low swing Repeater (ULR) (Fig. 14a), for DLB, the Down Low swing Repeater (DLR) (Fig. 14b) and for UDLB, the Up-Down Low swing Repeater (UDLR) (Fig. 14c) are proposed.
The operation of the repeater circuits is as follows (e.g. Fig. 14a): The repeater circuit has as input the driver's output, which ranges between 0 and Vref - VTN (Vx). When the input value is 0, the pMOS transistor turns on and the nMOS transistor turns off. The voltage at the source of transistor M1 cannot exceed (Vref - VTN), which is therefore the value of the output voltage of the repeater. When the input voltage is VLS (Vref - VTN), the nMOS transistor turns on and the pMOS transistor turns off. Therefore, the output of the repeater is exactly 0, which satisfies the logic operation of the circuit. The inverse logic operation holds for the repeater in Figure 14b.
In the third class (Fig. 14c), the input of the UDLR ranges between Vy and Vx. When the input voltage value is Vx, the pMOS transistor (M1) turns off and the nMOS transistor (M2) turns on (its source voltage value is VTp). The voltage at the source of transistor M1 cannot exceed (VDD - VTN), which is the value of the output voltage of the driver. When the input voltage is Vy, transistor M2 turns off and transistor M1 turns on (its source voltage value is VTN). The output of the repeater in this case cannot exceed the voltage value VTp.
Because the input and the output of the repeater circuit range in low swing voltage level, the power dissipation in this case will be calculated from the following equation:
In order to evaluate the repeater's delay time and power dissipation, we applied a different number of repeaters for each of the proposed bus classes.
The long interconnection line (Fig. 1) was cut into symmetrical segments as shown in Figure 15. For the measurements, 10 pF capacitors and 0.5 Ω resistances were used.
The normalized propagation delay and normalized power dissipation measurements for the three bus architectures with different numbers of repeaters (k) are shown in Table II.
The measurements are obtained using only one MOS transistor in each inserted structure. An analysis of the optimum number of repeaters is given in [23]. Table II shows that an increase in the number of repeaters (k) results in a decrease in the normalized delay time, as expected.
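The reason repeaters reduce delay can be illustrated with a first-order Python sketch. It assumes an Elmore-style estimate of 0.5 R C for a distributed RC line and a fixed delay per repeater; both the coefficient and the repeater delay are modelling assumptions for illustration, not values taken from the paper's measurements.

```python
def line_delay(r_total: float, c_total: float) -> float:
    """Elmore-style delay estimate of a distributed RC line (assumed 0.5 * R * C)."""
    return 0.5 * r_total * c_total


def repeated_line_delay(r_total: float, c_total: float,
                        k: int, t_repeater: float) -> float:
    """Delay of the same line cut into k + 1 equal segments by k repeaters."""
    if k < 0:
        raise ValueError("k must be non-negative")
    segments = k + 1
    segment_delay = line_delay(r_total / segments, c_total / segments)
    return segments * segment_delay + k * t_repeater


if __name__ == "__main__":
    R, C = 200.0, 2e-12  # hypothetical total line resistance (ohm) and capacitance (F)
    T_REP = 20e-12       # hypothetical delay of one repeater (s)
    for k in range(5):
        print(f"k={k}: delay = {repeated_line_delay(R, C, k, T_REP) * 1e12:.1f} ps")
```

In this model the wire contribution falls as 1/(k + 1) while the repeater contribution grows linearly with k, which is why an optimum number of repeaters exists.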
The normalized propagation delay and normalized power dissipation measurements for the three bus architectures with different numbers of MOS transistors (n) are shown in Table III.
For all the above measurements, the SPICE parameters shown in Table IV are used.

NOISE MARGIN
Noise margin is an important factor in applications with low swing voltage. The maximum noise margin can be defined as the largest noise that can be tolerated in a circuit without producing a logic error. The worst-case value of either the low or the high noise margin is 0.1 VDD [26]. Measurements show that in the proposed bus architectures, the lowest noise margin is 0.45 VDD (in the case of UDLB). Noise margin measurements taken for UDLB are shown in Table V. In order to increase the noise margin in the proposed designs, we have also proposed the repeater circuits, which keep the noise margin at high values.
CONCLUSIONS
In this paper, three low power bus architectures are presented. They are based on the voltage swing reduction technique. Simple design principles and conventional CMOS technology are strictly employed. Symmetrical driver/receiver circuits for different low swing voltages are proposed. Using a structure of nMOS/pMOS transistors, a parametric reduction in voltage swing can be achieved. High power dissipation savings are obtained, and the trade-off between power dissipation and propagation delay is also examined. In order to decrease the delay time in long interconnections, repeater circuits compatible with each type of proposed driver/receiver circuit are also proposed, resulting in a decrease in the delay time as well as an increase in the noise margin.

FIGURE 6 Up Down Low swing Drivers (UDLD).

FIGURE 8b UFR normalized power delay product.

FIGURE 13 Bus architecture normalized power delay product.
FIGURE 8a Up Full swing voltage Receiver (UFR).

TABLE II
Measurements using different numbers of repeaters (k)

TABLE III
Measurements using different numbers of inserted transistors (n)