A Low-Power Digitally Controlled Oscillator for All Digital Phase-Locked Loops

A low-power and low-jitter 12-bit CMOS digitally controlled oscillator (DCO) design is presented. The DCO is based on a ring oscillator implemented with Schmitt trigger inverters and uses thermometer-type control codes to reduce jitter. Performance of the DCO is verified through a novel all digital phase-locked loop (ADPLL) designed with a unique lock-in process that employs a time-to-digital converter to measure both the frequency of the reference clock and the delay between the DCO output and the DCO clock. A carefully designed reset process reduces the phase acquisition process to two cycles. The ADPLL was implemented using the 32 nm Predictive Technology Model (PTM) at a 0.9 V supply voltage, and the simulation results show that the proposed ADPLL achieves frequency and phase acquisition in 10 and 2 reference cycles, respectively, at 700 MHz with less than 67 ps peak-to-peak jitter. The DCO consumes 2.2 mW at 650 MHz with a 0.9 V power supply.


Introduction
Phase-locked loops (PLLs) are widely used in many communication systems for clock and data recovery or frequency synthesis [1][2][3][4][5]. Cellular phones, computers, televisions, radios, and motor speed controllers are just a few examples that rely on PLLs for proper operation. With such a broad range of applications, PLLs have been extensively studied in the literature.
Conventional PLLs are often designed using analog approaches. However, analog PLLs have to overcome digital switching noise coupled through the power supply as well as substrate-induced noise. In addition, an analog PLL is very sensitive to process parameters and must be redesigned if the process is changed or migrated to a next-generation process. Although many approaches have been developed to improve the jitter performance, they often result in a long lock-in time and increased design complexity. With the increasing performance and decreasing cost of digital VLSI design technology, all digital phase-locked loops (ADPLLs) have become more attractive. Although an ADPLL will not have the same performance as its analog counterpart, it provides a faster lock-in time and better testability, stability, and portability across different processes [6,7].
The digitally controlled oscillator (DCO) is a key component of the fully digital PLL, where it replaces the conventional voltage or current controlled oscillator. DCOs are more flexible and usually more robust than the conventional VCO. Furthermore, the design compromise for the frequency gain in a voltage or current controlled oscillator is not necessary in DCOs because the immunity of their control input is very high. There are two main techniques for DCO design, as shown in Figure 1. One technique changes the driving strength dynamically with a fixed capacitance loading [8,9], while the other uses a shunt capacitor technique to tune the capacitance loading [10]. Although both approaches have a good linear frequency response and a reasonable operating frequency range, power dissipation has not been taken into consideration. Moreover, in DCO design there is a tradeoff between the operating range and the maximum frequency that the DCO can achieve: increasing the operating range by adding more capacitance loading results in a lower maximum frequency and higher power consumption. Since power consumption is of extreme concern for portable battery-powered computing systems, reducing it has become a major concern in modern electronic systems. This paper proposes a novel DCO circuit with significantly reduced power consumption using binary controlled pass transistors and Schmitt trigger inverters.
The functionality and performance are verified through a novel ADPLL that uses the proposed DCO. Usually, an ADPLL structure based on a second-order negative feedback system has a faster lock-in time with a limited lock-in range [11,12]. On the other hand, separating the locking process into frequency and phase acquisition makes a wide lock-in range available [13,14]. However, it takes more time due to the blind "ahead" or "behind" comparison as well as the extra phase acquisition process.
In this paper, the new ADPLL also uses separate frequency and phase lock-in processes. Instead of an "ahead" or "behind" comparison, a time-to-digital converter is used to measure the frequency difference accurately, which greatly reduces the lock-in time.
The phase acquisition takes only two reference clock cycles. In the first cycle, the DCO is reset by the reference clock, taking into account the delay between the DCO output and the DCO clock. In the second cycle, the DCO frequency changes back to that of the reference clock by updating the control bits. The ADPLL with the proposed DCO was implemented using the 32 nm Predictive Technology Model (PTM) at 0.9 V.

DCO Principle and Design
2.1. DCO Principle. The DCO should generate an oscillation period $T_{DCO}$ that is a function of the digital input word $D$:
$$T_{DCO} = f(D). \qquad (1)$$
Typically, the DCO transfer function is defined such that the period of oscillation $T_{DCO}$ is linearly proportional to $D$ with an offset. Therefore, the oscillation period is rewritten as
$$T_{DCO} = T_{offset} - D \cdot T_{step}, \qquad (2)$$
where $T_{offset}$ is a constant offset period and $T_{step}$ is the period of the quantization step.
For the conventional driving strength-controlled DCO shown in Figure 2, the constant delay of each cell is calculated from $R_1$, $R_2$, $C_1$, and $C_2$, where $R_1$ and $R_2$ are the equivalent resistances of M1 and M1′, and $C_1$ and $C_2$ are the total capacitances at the drains of M1 and M1′, respectively, which mainly consist of drain-to-body and source-to-body capacitances. The delay tuning range of this standard cell is obtained from the same quantities under the assumption that the two transistors have the same driving strength. In order to have a good linear tuning range, the width of the transistor M1 has to be increased, as illustrated in (7). Consequently, the equivalent resistance $R_1$ decreases, resulting in a smaller delay tuning range. One way to increase the tuning range while keeping the linear response is to increase the capacitance loading. However, this lowers the maximum frequency that the DCO can reach and also increases the power consumption.
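As a minimal numerical sketch of the linear transfer function discussed above, the following Python fragment evaluates the period and frequency for a given control word. The sign convention (the period shrinks as the code grows) and the constants `T_OFFSET_PS`, `T_STEP_PS`, and `CODE_BITS` are illustrative assumptions, not parameters of the proposed DCO.

```python
# Hypothetical constants chosen only for illustration.
T_OFFSET_PS = 1754.0   # period at code 0 (roughly 570 MHz)
T_STEP_PS = 0.125      # period change per LSB
CODE_BITS = 12


def dco_period_ps(code: int) -> float:
    """Period of an idealized linear DCO: T_DCO = T_offset - D * T_step."""
    if not 0 <= code < 2 ** CODE_BITS:
        raise ValueError("control word out of range")
    return T_OFFSET_PS - code * T_STEP_PS


def dco_frequency_mhz(code: int) -> float:
    """Oscillation frequency in MHz for a period expressed in picoseconds."""
    return 1e6 / dco_period_ps(code)


if __name__ == "__main__":
    for code in (0, 2048, 4095):
        print(code, dco_period_ps(code), round(dco_frequency_mhz(code), 1))
```

In this idealized model every code step changes the period by the same amount, which is the property that a linear tuning range is meant to provide.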

Proposed DCO Design.
The proposed DCO employs a new approach to increase the delay tuning range using digitally controlled pass transistor arrays and Schmitt trigger-based inverters [15]. The Schmitt trigger-based inverter has a higher $V_{M+}$ (low-to-high switching threshold) and a lower $V_{M-}$ (high-to-low switching threshold) than the conventional inverter, as shown in Figure 3. As a result, the proposed DCO circuit provides the same tuning range with a smaller capacitance loading, which is beneficial for power consumption reduction. Moreover, in the conventional DCO circuit, the slope of the input signal to each stage decreases gradually due to the large delay between stages. This results in not only nonideal rail-to-rail switching but also poor power performance. The steep slope of the output signal from the Schmitt trigger-based inverter mitigates this problem to a certain extent.
The circuit diagrams of the conventional DCO and the proposed DCO are shown in Figure 4. The conventional DCO consists of two identical binary controlled coarse cells as well as a similar fine cell with a smaller tuning range. The proposed DCO also consists of two coarse cells and a fine cell. The coarse cells have 2-bit tuning codes applied to a PMOS array or an NMOS array in the form of thermometer code, which provides better duty cycle performance and linearity. The fine cell has 6-bit tuning codes applied only to an NMOS array in the form of thermometer code, as shown in Figure 4(c). The thermometer code reduces jitter. Since the bits are grouped in pairs, the circuit that converts binary code to thermometer code is also kept small.
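The binary-to-thermometer conversion for 2-bit groups can be sketched behaviorally as follows. The function names and the ordering of groups and thermometer lines are assumptions made for this illustration; the actual transistor-array wiring is not specified here.

```python
def group_to_thermometer(value: int) -> list:
    """Encode one 2-bit value (0..3) as three thermometer lines."""
    if not 0 <= value <= 3:
        raise ValueError("a 2-bit group must be in the range 0..3")
    # Line i is on when the value exceeds i, so value 2 gives [1, 1, 0].
    return [1 if i < value else 0 for i in range(3)]


def word_to_thermometer(word: int, num_bits: int) -> list:
    """Split an even-width binary word into 2-bit groups, LSB group first,
    and thermometer-encode each group independently."""
    if num_bits % 2 != 0:
        raise ValueError("num_bits must be even")
    return [group_to_thermometer((word >> shift) & 0b11)
            for shift in range(0, num_bits, 2)]


if __name__ == "__main__":
    print(word_to_thermometer(0b100111, 6))
```

Within a group, incrementing the value by one switches exactly one line, so no line toggles in the opposite direction during a step. Each group needs only a 2-to-3 converter, which is why grouping per 2 bits keeps the conversion logic small.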

Improved Structure with Larger Operating Range.
The binary controlled DCO structure has a limited linear operating range, as discussed above. In this paper, three-stage constant delay chains and a 4:1 multiplexer are used to increase the operating range, and the three-stage constant delay is tuned by a fixed code such that each stage provides an accurate delay, as shown in Figure 5. As a result, the operating range can be four times larger than that of the original design.

Comparison between the Two DCO Structures.
The proposed DCO and the conventional DCO are simulated and compared using the 32 nm CMOS PTM (Predictive Technology Model) with a supply voltage of 0.9 V. The choice of 12 bits is a compromise between DCO resolution, operating range, and circuit complexity. Table 1 shows the impact of each control bit on the period of the two DCO structures. Both structures have the same linear tuning range. Since the two DCO structures have the same operating range, it is reasonable to compare their power consumption.
Compared to the conventional DCO, the proposed DCO saves approximately 40% power consumption as shown in Figure 6. As discussed above, this reduction is due to the comparatively smaller capacitance loading for the Schmitt trigger-based inverter than the conventional inverter at the same operating frequency. The proposed DCO is significantly more power efficient than the conventional DCO. However, this DCO design has a limited operating frequency range, which is improved in this paper by employing the fixed delay blocks shown in Figure 5.

Simulation Results of the Proposed DCO.
The proposed DCO structure with increased operating range is designed and simulated using the 32 nm PTM. Figure 7 shows the operating frequency ranges for coarse and fine tuning of the novel DCO. The curves are monotonic, which is a key factor in PLL performance. The response of the operating frequency to process, voltage, and temperature (PVT) variations is shown in Figure 8. The curves show the data normalized with respect to the center frequency. Figure 8 shows that the relative delay per code is almost the same regardless of the PVT variations. In other words, the proposed DCO design is very robust to PVT variations. Table 3 compares the results with a few recent state-of-the-art DCO designs [6,10,16,17]. The proposed DCO achieves the finest LSB resolution and the highest operating frequency. In addition, the proposed DCO consumes less power than the others.

Performance Verification of the Proposed DCO
In this section, the performance of the proposed DCO is verified through a novel ADPLL built around the proposed DCO circuit. The proposed ADPLL is designed with a unique lock-in process that relies on the good monotonicity of the DCO.

ADPLL Architecture Overview.
The block structure of the new ADPLL is shown in Figure 9. The control word corresponding to the period of the reference clock $T_{ref}$ is stored in register1. In register2, the control word corresponds to a new period of $T_{ref} - T_{Delay}$, which is the reference clock period minus the delay between the DCO output and the DCO clock. Unlike conventional ADPLL designs, the clock signals to all the logic blocks are generated from the DCO output. Phase lock begins with frequency acquisition. In this mode, a time-to-digital converter (TDC) measures the time difference between the reference clock and the DCO clock.
As shown in Figure 10(a), it converts the time difference into the digital words $T_1$ and $T_2$, which are the time differences between the DCO clock's rising edge and the reference clock's rising and falling edges, respectively. As a result, the frequency (period) difference can be defined in terms of these words together with $N$ and $T_1'$, where $N$ is the number of low-to-high or high-to-low transitions of the reference clock during one DCO clock period, obtained from the 4-bit counter, and $T_1'$ is the stored value of $T_1$ from the previous DCO period.
Compared to other frequency acquisition approaches, the DCO does not have to be reset at the beginning of every reference cycle for the initial phase alignment, which reduces the design complexity. Moreover, the frequency acquisition process can be reduced to fewer than ten cycles if the DCO has good linearity. However, as shown in Figure 10(b), due to the nonzero setup time, the last transition of the reference clock may be missed if the time difference $T_1'$ is smaller than the register's setup time, which results in an incorrect transition number $N$. The improved integer counter, which is designed to address this problem, is discussed in the circuit design part.
The ADPLL in this paper starts with frequency and phase acquisition followed by a maintenance mode. Once the frequency and phase are acquired using the coarse code, they are maintained by updating the fine codes to correct the phase and frequency drift due to noise.
During the frequency acquisition mode, the coarse control bits are generated by the algorithm (arithmetic) blocks in Figure 9 and applied to the DCO. Since the DCO has a good linearity, this acquisition process takes fewer reference cycles compared to the previous blind fast or slow comparison. When the frequency is locked, the control bits are stored in the coarse bit register and the lock-in process is switched from the frequency acquisition to the phase acquisition process by the state machine.
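To illustrate why a measured period error shortens frequency acquisition, the following sketch compares a TDC-guided update with a blind one-step-per-cycle search on an idealized linear DCO. The DCO model, its constants, and both update rules are assumptions made for this example; they are not the arithmetic blocks of Figure 9, and the cycle counts are not the simulated results of the ADPLL.

```python
T_OFFSET_PS = 1760.0    # hypothetical DCO period at coarse code 0
COARSE_STEP_PS = 10.0   # hypothetical coarse resolution
TDC_RES_PS = 20.0       # hypothetical TDC resolution


def dco_period_ps(code: int) -> float:
    """Idealized linear DCO: the period shrinks by one step per code."""
    return T_OFFSET_PS - code * COARSE_STEP_PS


def acquire_with_tdc(t_ref_ps: float, max_cycles: int = 20):
    """Jump directly by the quantized period error measured each cycle."""
    code = 0
    for cycle in range(1, max_cycles + 1):
        error = dco_period_ps(code) - t_ref_ps
        measured = round(error / TDC_RES_PS) * TDC_RES_PS  # TDC quantization
        step = round(measured / COARSE_STEP_PS)
        if step == 0:
            return code, cycle
        code += step
    return code, max_cycles


def acquire_blind(t_ref_ps: float, max_cycles: int = 4096):
    """Move one code per cycle until the DCO is no longer slower than the reference."""
    code = 0
    for cycle in range(1, max_cycles + 1):
        if dco_period_ps(code) <= t_ref_ps:
            return code, cycle
        code += 1
    return code, max_cycles


if __name__ == "__main__":
    t_ref = 1e6 / 700.0  # 700 MHz reference period in ps
    print("TDC-guided:", acquire_with_tdc(t_ref))
    print("blind:     ", acquire_blind(t_ref))
```

With a perfectly linear model the guided search settles almost immediately, while the blind search needs one cycle per code step. A real DCO is only approximately linear, so several corrective cycles would be expected.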
In the phase acquisition, the DCO clock edge is aligned to the reference clock edge. In reality, there are several stages of logic separating the DCO output and the DCO clock, such as the duty-cycle corrector shown in Figure 9. As a result, the DCO clock edge cannot be aligned to the reference clock by a simple reset process, as shown in Figure 11. The delay time $T_{Delay}$ results from the logic blocks between the DCO output and the DCO clock.
A phase acquisition process is required to align the phase, which is usually done by comparing the phase position of the two signals. The control word is adjusted based on the "behind" or "ahead" signal until there is a polarity change. However, this kind of acquisition process takes many cycles, which results in a slow lock-in process.
A novel reset process is presented in this paper, which reduces the phase lock process to two cycles, as shown in Figure 12. In the first cycle, the DCO is reset by the reference clock while still using the control word in register2, which corresponds to the period $T_{ref} - T_{Delay}$. Without the delay, the second rising edge would lead the reference clock by $T_{Delay}$, as the DCO output does. For the DCO clock signal, however, this lead is compensated by the existing delay $T_{Delay}$, and the second rising edge is aligned to the reference clock. In the second cycle, the control word in register1 is reloaded and the DCO frequency is the same as that of the reference clock again.
After the phase acquisition, a maintenance mode is applied to preserve the phase alignment of the DCO clock relative to the reference clock. The phase detector generates an "ahead" or "behind" signal based on the rising edges of the reference clock and the DCO clock. The phase gain, which is stored in a register, is shifted to the left by one bit every cycle. When the polarity changes, the control word and the phase gain are reset to the initial value stored during the frequency acquisition. The edge detector keeps comparing the phase difference and updating the fine control bits in order to maintain the phase lock. The fine resolution of the DCO as well as the bit shift strategy provides a fast phase lock-in time and better jitter performance.
Table 4: Working status of the state machine.
Status  Function
"000"   Find the control word for $T = T_{ref}$
"001"   Reset DCO in order to find the delay value $T_{Delay}$
"010"   Find a new control word for $T = T_{ref} - T_{Delay}$
"011"   Reset DCO for the phase alignment
"100"   Maintenance mode
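The timing of the two-cycle reset used for phase acquisition can be checked with a short numerical sketch. It assumes an ideal DCO whose output restarts exactly at the reference edge and a fixed logic delay between the DCO output and the DCO clock; the function name and the example values are hypothetical.

```python
def dco_clock_edges(t_ref_ps: int, t_delay_ps: int, n_edges: int) -> list:
    """Rising-edge times of the DCO clock after a reset at t = 0,
    where t = 0 is a rising edge of the reference clock."""
    edges = []
    output_edge = 0                     # DCO output restarts at the reset instant
    period = t_ref_ps - t_delay_ps      # first cycle: shortened period (register2)
    for _ in range(n_edges):
        # The DCO clock lags the DCO output by the logic delay.
        edges.append(output_edge + t_delay_ps)
        output_edge += period
        period = t_ref_ps               # afterwards: reference period (register1)
    return edges


if __name__ == "__main__":
    # Hypothetical values: 1428 ps reference period, 150 ps logic delay.
    print(dco_clock_edges(1428, 150, 4))
```

The first clock edge after the reset still lags the reference by the logic delay. Running one cycle at the shortened period absorbs that lag, so the second and later edges fall on multiples of the reference period.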
The TDC used in this paper is composed of two parts: an integer counter that counts the reference clock edges within one DCO clock period and a fractional counter that quantizes the residual phase difference, which helps to improve the resolution of the proposed TDC.
The block diagram of the fractional TDC structure is shown in Figure 13. It consists of 16 delay blocks and two types of independent decoders. The resolution of the TDC is the delay of a single buffer, which reduces the delay mismatch compared with using the delay of a single inverter. The reference clock waveform propagates through a chain of 8 × 16 delay elements whose outputs are sampled by 8 × 16 flip-flops at the rising edge of each DCO clock.
Figure 16: Block diagram of the state machine.
The decoding process is shown in Figure 14. The 16-bit output of the delay blocks is decoded into the higher 4 bits of $T_1$ and $T_2$. At the same time, the 8-bit output of each delay block is decoded into a series of lower 3 bits of $T_1$ and $T_2$. Based on the output of decoder1, the proper set of $T_1(0:2)$ and $T_2(0:2)$ is selected. In the example of Figure 14, the transitions take place in the 2nd and 6th blocks. As a result, the decoded outputs of those two blocks are selected as the lower 3 bits of $T_1$ and $T_2$. Separating the decoder into two parts greatly reduces the design complexity.
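The two-level decoding can be sketched behaviorally as follows. The sketch assumes a 128-tap sample vector, uses the last tap of each block as the coarse word, and returns the tap index of each level change; which change corresponds to $T_1$ or $T_2$ depends on signal polarity and is left open. All names are hypothetical, and the code does not model the gate-level decoders of Figure 14.

```python
TAPS_PER_BLOCK = 8
NUM_BLOCKS = 16


def decode_transitions(samples: list) -> list:
    """Return 7-bit codes (4 block bits, 3 tap bits) of the level changes
    found along the sampled delay chain.

    Assumes at most one level change per block, which holds when the
    reference half-period is longer than one block delay."""
    if len(samples) != TAPS_PER_BLOCK * NUM_BLOCKS:
        raise ValueError("expected 128 samples")
    codes = []
    prev_last = samples[0]  # level entering the first block
    for block in range(NUM_BLOCKS):
        taps = samples[block * TAPS_PER_BLOCK:(block + 1) * TAPS_PER_BLOCK]
        # Coarse decoder: a block holds a change if its last tap differs
        # from the last tap of the previous block.
        if taps[-1] != prev_last:
            # Fine decoder: position of the change inside this block.
            for pos, bit in enumerate(taps):
                if bit != prev_last:
                    codes.append((block << 3) | pos)
                    break
        prev_last = taps[-1]
    return codes


if __name__ == "__main__":
    # Changes at taps 13 and 44, that is, in the 2nd and 6th blocks.
    chain = [0] * 13 + [1] * 31 + [0] * 84
    print(decode_transitions(chain))
```

Because the coarse decoder only inspects 16 bits and each fine decoder only 8 bits, no single decoder has to search the full 128-bit vector, which is the complexity reduction that the split is meant to achieve.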
As mentioned above, the nonzero setup time may result in an incorrect transition number $N$ if $T_1$ is less than the setup time. The proposed integer counter, which is designed to solve this problem, is shown in Figure 15.
Using a delay buffer in the clock path, register2 is able to detect the closest rising edge of the reference clock while register1 cannot. The select signal is the XOR output of the first delay block. When the rising edge of the DCO clock is slightly behind the edge of the reference clock, the transition takes place in the first delay block. As a result, the XOR output of this delay block is logic high and the output of register2 is selected. When the DCO clock edge is slightly ahead or $T_1$ is large enough, the XOR output of delay block 1 is logic low and the output of register1 is selected. This eliminates another possible error, in which register2 would store $N + 1$ instead of $N$ when the DCO clock edge is slightly ahead of the reference clock edge.
The state machine is the control unit of the proposed ADPLL, and it has five working statuses, as shown in Table 4. It takes the reference clock and the DCO output as input signals, and it outputs four-bit state signals (bit0, bit1, bit2, and bit3) as well as a DCO reset signal, as shown in Figure 16.
In the initial "000" status, the control word corresponding to $T_{ref}$ is stored in register1. After that, the lock indicator generates a logic-high signal and the ADPLL switches to the "001" status. The delay $T_{Delay}$ between the DCO output and the DCO clock is measured by resetting the DCO using the reference clock. Counter2 is used to make sure that the reset process takes only two cycles, and it is cleared after that. In the "010" status, a new control word corresponding to $T_{ref} - T_{Delay}$ is stored in register2, and the corresponding lock indicator signal switches the status to "011". Then, the reset process starts again with the control word corresponding to $T_{ref} - T_{Delay}$. After that, the ADPLL goes into the maintenance mode and the state machine is locked. The five consecutive statuses ensure a fast lock-in and low-jitter ADPLL design.
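The status sequence can be sketched as a small behavioral model. The class name, the `lock_indicator` argument, and the cycle-level timing are assumptions; only the five status codes, their order, and the two-cycle reset come from the description above.

```python
class LockStateMachine:
    """Behavioral model of the five-status lock-in sequence."""

    RESET_CYCLES = 2  # length of each reset phase, tracked by counter2

    def __init__(self):
        self.state = "000"
        self.counter2 = 0

    def step(self, lock_indicator: bool) -> str:
        """Advance by one reference cycle and return the new status."""
        if self.state in ("000", "010"):
            # Frequency search phases end when the lock indicator goes high.
            if lock_indicator:
                self.state = "001" if self.state == "000" else "011"
                self.counter2 = 0
        elif self.state in ("001", "011"):
            # Reset phases last a fixed number of cycles.
            self.counter2 += 1
            if self.counter2 == self.RESET_CYCLES:
                self.state = "010" if self.state == "001" else "100"
                self.counter2 = 0
        # "100" is the maintenance mode: the state machine stays locked.
        return self.state


if __name__ == "__main__":
    sm = LockStateMachine()
    stimulus = [False, True, False, False, True, False, False, False]
    print([sm.step(lock) for lock in stimulus])
```

In this model the lock indicator is only examined during the two search statuses, so a spurious indication during a reset phase or in maintenance mode cannot change the sequence.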

Simulation Results of the ADPLL with the Proposed DCO
The proposed ADPLL structure is designed and simulated using the 32 nm CMOS Predictive Technology Model. The resolution of the TDC, which is the delay of a single buffer used in the delay chain, is 20 ps. The 12-bit digitally controlled oscillator has a coarse resolution close to 10 ps and a fine resolution close to 1 ps, with a tuning range from 570 MHz to 800 MHz. The lock-in process of the proposed ADPLL when locking to 700 MHz is illustrated in Figure 17. The output of the state machine steps through the five consecutive statuses during the lock-in process, as shown in Figure 17(a). Two frequency lock-in processes are completed during the "000" status and the "010" status, respectively, and the corresponding control words are stored in register1 and register2, as shown in Figures 17(b) and 17(c). The phase lock-in process takes 2 clock cycles, as shown in Figure 17(d). As a result, the frequency acquisition takes about ten reference cycles and the phase acquisition takes two cycles.
Figure 18 shows an eye diagram of the DCO jitter performance during the maintenance mode after acquisition. As shown in the figure, this ADPLL achieves a peak-to-peak jitter of 67 ps at 700 MHz with a power supply of 0.9 V. Table 5 compares the proposed ADPLL with conventional ADPLLs in terms of acquisition time, jitter, operating frequency, power consumption, and locking range.

Conclusion
A 32 nm CMOS 12-bit digitally controlled oscillator design for low power consumption and low jitter is presented. The presented DCO demonstrates good robustness to process, voltage, and temperature variations and better linearity compared to the conventional design. The performance and the functionality of the DCO are verified through a novel ADPLL that uses the proposed DCO. This ADPLL is designed and implemented using the 32 nm CMOS Predictive Technology Model for a frequency range of 570 MHz to 800 MHz at a 0.9 V supply voltage. The overall lock-in process of the ADPLL takes about 12 reference cycles at 700 MHz with a peak-to-peak jitter of less than 67 ps. The power consumption of the DCO is 2.2 mW at 650 MHz with a supply voltage of 0.9 V. The presented results demonstrate that the proposed design is viable for various clock control systems with fully digital implementations. The proposed work will be a good reference for future advanced ADPLLs, such as an ADPLL that multiplies the reference clock frequency by a fractional number without using a fractional divider.