Reduced Voltage Scaling in Clock Distribution Networks

We propose a novel circuit technique to generate reduced voltage swing (RVS) signals for active power reduction on main buses and clocks. This is achieved without performance degradation, without an extra power supply, and with minimum area overhead. The technique stops the discharge path of the net that is swinging low at a certain voltage value. It reduces active power on the target net by as much as 33% compared to traditional full-swing signaling. The logic 0 voltage value is programmable through control bits, and the reduced-swing mode can also be disabled if desired. The approach assumes that the logic 0 voltage value is always less than the threshold voltage of the nMOS receivers, which eliminates the need for low-to-high voltage translation. The reduced noise margin and the increased leakage of the receiver transistors under this approach are addressed through the selective use of multithreshold voltage (MTV) devices and the programmability of the low voltage value.


Introduction
Continuous VLSI technology scaling has enabled the integration of millions of transistors on a single chip operating at clock frequencies above 1 GHz. Besides area (cost) and performance, low power consumption has become critical for modern VLSI designs, owing to limited battery lifetime in mobile applications, the increased priority on energy efficiency in data centers, web servers, and supercomputing centers, and the expense of alternative cooling options for personal computers.
Low-voltage swing (LVS) signaling is an effective technique for reducing active power consumption, since active power consumption is proportional to signal voltage swing. Interconnects are responsible for up to 50% of active power consumption, and up to 90% of interconnect power consumption comes from only 10% of the interconnects, such as clock networks and global signal buses [1]. Developing LVS techniques for these power-hungry interconnects is therefore critical for modern VLSI designs.
Power efficiency is an increasingly critical VLSI design objective. Low-power design for high-performance computers improves energy efficiency and reduces package cost for heat dissipation, while low-power design for mobile applications increases battery lifetime.
Low-voltage swing is an effective technique to reduce dynamic power consumption, especially for clocks, which are among the most active signals in VLSI circuits and generally consume up to 50% of the total power [2]. Reduced voltage swing clock signals can be applied at the upper levels of a clock tree for low power, while clock gates (such as inverters) amplify the signals to full swing upon reaching the sequential elements [1].
Existing techniques to generate reduced voltage swing signals require either an extra low-voltage supply or precise timing of a pulse signal that enables the driver gate, while a number of voltage level converters have been developed to convert a reduced voltage swing signal into a full swing signal [3].
In general, there are two techniques to reduce clock swing: the first uses a dual power supply voltage, and the second uses a single power supply voltage. The first method adds complexity to the overall design and layout. With a single supply, the challenge is the design of reduced-swing buffers. Many papers [1,4] implemented this method by using pMOS transistors to pass the low logic level and nMOS transistors to pass the high logic level. Such techniques result in poor rise and fall times, which make them impractical for high-performance applications.
In this paper, we propose a Reduced Voltage Swing (RVS) design as opposed to the traditional Low Voltage Swing (LVS) technique. We elevate the low logic voltage instead of lowering the high logic voltage. We propose an inverter design that generates RVS signals at the cost of one extra transistor, and an extension of the RVS inverter with programmable gates for an adjustable low logic voltage. We achieve (1) minimum area overhead (by not requiring an extra power supply network), (2) minimum performance degradation (by keeping the supply voltage and the high logic voltage), and (3) robustness to process variations (the logic 0 voltage adapts to process variations). HSPICE [5] simulation results show that we reduce active power consumption with very limited performance loss.

VLSI Design
The rest of the paper is organized as follows. Existing low-voltage swing signaling and clocking schemes are reviewed in Section 2. Section 3 presents the reduced voltage swing principle and circuits, followed by implementation and simulation results in Section 4. Finally, Section 5 concludes the paper.

Existing Low-Voltage Swing Signaling and Clocking Schemes
Existing low-voltage swing circuits [4] have a number of deficiencies, such as the need for extra supplies, performance impact, differential signaling, and reliability degradation. They typically reduce the supply voltage on the targeted net, which impacts timing significantly. Most papers describing low or reduced voltage swing signals target clock networks or signal nodes with high capacitance to reduce power. Zhang et al. [4] surveyed the different options and circuits used to generate small or reduced signal swings. Their paper compares the speed, power, and complexity of the different options and points out the deficiencies of each technique. They also proposed their own scheme, called pseudodifferential interconnect (PDIFF). However, all these LVS signaling techniques require an extra power supply, which adds cost and complexity to the design. An LVS clocking technique that requires only a single power supply is proposed in [3], wherein intermediate clock buffers are turned off once they reach the desired voltage levels. This leaves the clock node essentially floating and therefore susceptible to noise. Subsequent regular clock buffers act as amplifiers that restore the clock signal to full swing. The short-circuit power consumption of these amplifier clock buffers is reduced by using small, high threshold voltage transistors.

Reduced Voltage Swing Principle and Circuits
In clock distribution, the synchronous clock must be distributed across the whole chip with minimum possible skew. The clock network consumes a significant amount of power, and the parasitics of the clock distribution interconnects, which grow with scaling, further increase this power consumption. Typically, buffers are inserted within the clock network to isolate the downstream capacitance; this reduces transition times but substantially increases power consumption.
As noted in [6][7][8], there is a need to reduce the power dissipation of the clock network while maintaining the performance objectives. Power can be reduced by reducing the clock frequency; however, the frequency cannot be changed without significant architectural changes. Alternatively, power can be reduced by reducing the total load capacitance CL on all nodes, by reducing VDD, or by reducing Vswing without reducing VDD, which yields a linear reduction in power dissipation.
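The linear dependence on Vswing follows from the standard switching-power expression for a net driven from a single supply: each rising transition draws charge CL·Vswing from the supply at potential VDD, so P = α·CL·VDD·Vswing·f. The sketch below evaluates that expression; the operating point (2 fF load, 1 GHz, 1.0 V supply, 0.3 V logic 0) and the function names are hypothetical illustrations, and short-circuit and leakage power are ignored.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, v_swing: float, freq: float) -> float:
    """Switching power drawn from a single supply VDD.

    Each rising transition pulls charge c_load * v_swing from the supply at
    potential vdd, so the energy per transition is c_load * vdd * v_swing.
    alpha is the number of rising transitions per cycle (1 for a clock).
    Short-circuit and leakage power are ignored.
    """
    return alpha * c_load * vdd * v_swing * freq


def rvs_saving(vdd: float, v_low: float) -> float:
    """Fractional switching-power saving when logic 0 is raised from 0 V to v_low."""
    full = dynamic_power(1.0, 1.0, vdd, vdd, 1.0)
    reduced = dynamic_power(1.0, 1.0, vdd, vdd - v_low, 1.0)
    return 1.0 - reduced / full  # algebraically equal to v_low / vdd


if __name__ == "__main__":
    # Hypothetical operating point: 2 fF load, 1 GHz clock, 1.0 V supply, 0.3 V logic 0.
    p_full = dynamic_power(1.0, 2e-15, 1.0, 1.0, 1e9)
    p_rvs = dynamic_power(1.0, 2e-15, 1.0, 1.0 - 0.3, 1e9)
    print(f"full swing: {p_full * 1e6:.2f} uW, RVS: {p_rvs * 1e6:.2f} uW")
    print(f"saving: {rvs_saving(1.0, 0.3):.0%}")
```

Under this first-order model, the saving from raising the logic 0 level is simply VL/VDD, which is why the programmable logic 0 voltage directly sets the power reduction.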
Prior work reduces Vswing by adding extra supplies to the selected nets. An extra supply not only adds cost and area but also increases the complexity of routing and of switching off two power grids for power savings. Figure 1 shows the waveforms obtained with the conventional reduced voltage swing approach, together with the waveforms produced by our proposed circuit. The main difference is that, with the proposed circuit, the voltage waveform still reaches VDD. This eliminates the timing impact; in our approach, the rising-edge delay (t1) of lclk1 from clk (Figure 1) is not affected.
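Why keeping the logic 1 level at VDD matters for timing can be illustrated with the Sakurai-Newton alpha-power law, in which a saturated transistor's current scales as (VGS − Vth)^α. The sketch below compares the receiver nMOS drive at logic 1 when the same swing reduction ΔV is taken from the top (LVS) or from the bottom (RVS). The model choice, α = 1.3, Vth = 0.35 V, and the function names are assumptions for illustration only; the slower falling edge caused by the series driver transistor is not modeled.

```python
def alpha_power_current(vgs: float, vth: float, alpha: float = 1.3) -> float:
    """Relative saturation current from the alpha-power law.

    Returns 0 below threshold (subthreshold current is ignored).
    """
    overdrive = vgs - vth
    return overdrive ** alpha if overdrive > 0 else 0.0


def receiver_drive_ratio(vdd: float, delta_v: float, vth: float, alpha: float = 1.3):
    """Receiver nMOS drive at logic 1, relative to a full-swing signal.

    LVS lowers logic 1 to vdd - delta_v; RVS raises logic 0 to delta_v and
    keeps logic 1 at vdd. Both reduce the swing to vdd - delta_v.
    """
    full = alpha_power_current(vdd, vth, alpha)
    lvs = alpha_power_current(vdd - delta_v, vth, alpha) / full
    rvs = alpha_power_current(vdd, vth, alpha) / full
    return lvs, rvs


if __name__ == "__main__":
    # Hypothetical values: 1.0 V supply, 0.3 V swing reduction, 0.35 V receiver threshold.
    lvs, rvs = receiver_drive_ratio(1.0, 0.3, 0.35)
    print(f"LVS receiver drive: {lvs:.2f} of full swing, RVS: {rvs:.2f}")
```

With these assumed numbers the LVS receiver loses more than half of its drive at logic 1, while the RVS receiver sees the same gate voltage as with full-swing signaling.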
The proposed RVS circuit is shown in Figure 2, with both receiver and driver using the same power supply. We show both the traditional and the new circuit with their expected waveforms. In the proposed approach, no extra supply is needed to generate the RVS signal (lclk2). The overhead of adding a transistor to the driver is minimal compared to the total net capacitance, especially when the driver fans out to many receivers, as is the case in a clock distribution network or on a long wire. In our proposed design, the driver uses low threshold voltage (LVT) transistors and the receiver uses high threshold voltage (HVT) transistors. This provides a built-in noise margin on net lclk2 equal to the difference between the HVT and LVT threshold voltages. In addition, because the receiver transistor is HVT and its drain-to-source voltage is less than VDD, the increased leakage due to the elevated gate voltage is minimized. The SSTC latch topology [9] is selected for the receiver because its clock pin drives only an nFET, which eliminates the need for level translation to prevent short-circuit current. Another advantage of our proposed design is that lclk2 is always actively driven. If coupling noise pushes the lclk2 net high, the pull-down stack turns on and clears the charge before the net reaches the threshold of the receiver HVT FET; this holds because the driver is an LVT device and the receiver is an HVT device. The series transistor added to the final driver slows down the falling edge of the clock, which affects only hold time and not the speed of the circuit (clk-to-q delay). Transistor M1, controlled by the power mode signal, provides an override mode for the system: if power mode is set to 1, the RVS circuit behaves the same as the traditional one.
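The driver and receiver conditions described above can be captured in a small behavioral model. The sketch assumes, as a simplification consistent with the stated noise margin of "HVT minus LVT", that the series LVT device stops the discharge once lclk2 falls to about one LVT threshold; the class name, field names, and threshold values are hypothetical, not taken from the circuit in Figure 2.

```python
from dataclasses import dataclass


@dataclass
class RvsNet:
    """Hypothetical behavioral model of an RVS-driven net such as lclk2."""

    vdd: float
    vth_lvt: float       # threshold of the LVT driver devices (assumed value)
    vth_hvt: float       # threshold of the HVT receiver nFET (assumed value)
    power_mode: int = 0  # 1 = override via M1: behave like a full-swing driver

    def logic0_level(self) -> float:
        # Simplifying assumption: the series LVT device cuts off the
        # discharge path once the net falls to about one LVT threshold.
        return 0.0 if self.power_mode == 1 else self.vth_lvt

    def built_in_noise_margin(self) -> float:
        # Distance from the logic 0 level to the receiver HVT threshold;
        # in RVS mode this is vth_hvt - vth_lvt.
        return self.vth_hvt - self.logic0_level()

    def is_valid(self) -> bool:
        # RVS assumes logic 0 stays below the receiver threshold,
        # so no low-to-high level translation is needed.
        return self.logic0_level() < self.vth_hvt

    def swing(self) -> float:
        # Logic 1 always reaches VDD; only the logic 0 level moves.
        return self.vdd - self.logic0_level()


if __name__ == "__main__":
    net = RvsNet(vdd=1.0, vth_lvt=0.25, vth_hvt=0.45)
    print(net.is_valid(), round(net.built_in_noise_margin(), 3), round(net.swing(), 3))
    override = RvsNet(vdd=1.0, vth_lvt=0.25, vth_hvt=0.45, power_mode=1)
    print(override.is_valid(), round(override.swing(), 3))
```

The model makes the trade-off explicit: a higher logic 0 level saves more power but shrinks the built-in noise margin, and setting power_mode to 1 restores full swing.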
One limitation of our proposed technique, shown in Figure 2, is that it only limits the swing of lclk2 between VDD [...]
To summarize the differences between our proposed technique and traditional techniques in terms of some key design metrics, we have the following.
[...] LVS and RVS signaling are equivalent in reducing signal voltage swing and hence active power consumption.
(3) Performance. The low supply voltage and low logic 1 voltage in LVS signaling lead to performance degradation. In RVS signaling, by contrast, the unchanged supply voltage and logic 1 voltage do not degrade performance.
(4) Noise margin. The reduced signal voltage swing needs to cover the receiver flip-flop's metastability point (e.g., 0.5 VDD), and the minimum distance from the metastability point to the input signal voltage swing boundary gives the noise margin.
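The noise-margin definition in item (4) can be applied directly to both schemes. The sketch below, using hypothetical values (VDD = 1.0 V, swing reduction 0.3 V), shows that LVS and RVS give the same noise margin when the metastability point sits at 0.5 VDD, and that they diverge when it does not; the function name is an illustrative assumption.

```python
def noise_margin(v_low: float, v_high: float, v_meta: float) -> float:
    """Minimum distance from the metastability point to either swing boundary.

    A negative result means the swing does not cover v_meta.
    """
    return min(v_high - v_meta, v_meta - v_low)


vdd, dv = 1.0, 0.3  # hypothetical supply and swing reduction

# Metastability point at 0.5 VDD, as in item (4).
nm_lvs = noise_margin(0.0, vdd - dv, 0.5 * vdd)  # LVS: logic 1 lowered
nm_rvs = noise_margin(dv, vdd, 0.5 * vdd)        # RVS: logic 0 raised
print(f"v_meta = 0.50 V: LVS {nm_lvs:.2f} V, RVS {nm_rvs:.2f} V")

# A metastability point below 0.5 VDD favors LVS, since RVS raised the low level.
print(f"v_meta = 0.45 V: LVS {noise_margin(0.0, vdd - dv, 0.45):.2f} V, "
      f"RVS {noise_margin(dv, vdd, 0.45):.2f} V")
```

The second case shows why the programmable logic 0 level matters: it lets the low level be trimmed back if the receiver's switching point is lower than expected.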

Simulation Results
We compared our proposed RVS inverter with a traditional full voltage swing (FVS) inverter using HSPICE; results for the two inverters with 2.00 fF load capacitance under 0.9, 1.0, 1. [...] To further verify this approach, we built a clock spine that drives an array of 6 × 32 = 192 flip-flops. HSPICE simulation shows that by replacing the clock buffers with the proposed RVS buffers, as shown in Figure 4, we achieve a 37.2% power reduction, while the signal propagation delay from the spine input to a flip-flop is degraded by 8.6% and the clock signal slew rate is degraded by 30.0%. Table 2 gives the comparison results, and Figure 5 shows the RVS waveforms and output results.
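As a rough way to weigh the reported 37.2% power reduction against the 8.6% delay increase, the two figures can be combined into a relative power-delay product. This combined metric is our own arithmetic on the reported numbers, not a result from the simulations, and it is only indicative because clock insertion delay does not necessarily set the cycle time.

```python
# Reported clock-spine results (RVS buffers relative to full-swing buffers).
power_reduction = 0.372  # 37.2% lower power
delay_increase = 0.086   # 8.6% longer spine-to-flip-flop delay

relative_power = 1.0 - power_reduction
relative_delay = 1.0 + delay_increase
relative_pdp = relative_power * relative_delay  # power-delay product, RVS / FVS

print(f"relative power: {relative_power:.3f}")
print(f"relative delay: {relative_delay:.3f}")
print(f"relative power-delay product: {relative_pdp:.3f}")
```

Even after accounting for the delay penalty, the power-delay product in this calculation drops by roughly 32%.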
We also verified that the circuit still functions correctly at different supply voltages. Figures 6 and 7 show the results. Table 3 shows results for different load capacitances, clock input pulse widths, and supply voltages.

Conclusion
In this paper, we propose Reduced-Voltage-Swing (RVS) signaling, as opposed to traditional Low-Voltage-Swing (LVS) signaling, for reduced active power consumption. We achieve minimum area overhead (no extra power supply network to route and a minimum number of extra transistors), equivalent active power reduction, and minimum performance degradation. HSPICE simulation results using Arizona State University technology models, across a variety of design parameters (supply voltage, load capacitance, input signal slew rate, etc.), verify the effectiveness of these novel RVS circuits, which save an average of 37.2% dynamic power with an 8.6% increase in clock insertion delay in a clock spine driving 192 flip-flops.