480 MHz 10-tap Clock Generator Using Edge-Combiner DLL for USB 2.0 Applications



Introduction
The clock generator may be one of the largest blocks of the physical layer (PHY) in wireline communications, because it usually consists of a phase-locked loop (PLL) that uses large capacitors in its lowpass filter. Many reports have proposed ways of shrinking the design area of PLLs. A capacitance-multiplication technique was reported to shrink the capacitance of the loop filter [1,2]. However, the shrink ratio of a capacitor may be less than five when the leakage current of the capacitor and process, voltage, and temperature (PVT) variation are taken into consideration. Thus, the total design area cannot be drastically reduced. An all-digital PLL (ADPLL) technique has also been reported [3]. However, the issue of accurate operation under PVT variation still remains. Therefore, a new approach is desirable for fundamentally reducing the design area.
The delay-locked loop (DLL) has several advantages over the PLL. First, it can be designed to be smaller than the PLL. While the PLL is a higher-order system, the DLL is a first-order system and is always stable. Thus, the DLL needs only small capacitors to keep its loop stable, whereas the PLL needs large capacitors to build a stable lowpass filter. Second, the DLL can achieve a shorter locking time than the PLL. Third, the DLL consumes less power than the PLL: the PLL has a voltage-controlled oscillator (VCO) and a divider, which consume a large amount of power in order to reduce the jitter. However, the DLL also has several disadvantages relative to the PLL when used as a clock generator. First, the DLL cannot generate clock signals as fast as the PLL can. Second, the DLL has a locking range limitation while the PLL does not. This means that the DLL cannot achieve a fractional multiplication ratio, whereas a fractional-N PLL can.
An edge-combiner DLL (ECDLL) has been reported as an alternative high-speed clock generator because the ECDLL is based on the DLL and can multiply the reference frequency [4-8]. The ECDLL has potential for use as a clock generator, although it has rarely been used in this capacity because it has several challenges that need to be overcome. The first is operation under PVT variation. The second is the limitation on the output signal frequency.
Regarding the frequency limitation, the operating frequency of the DLL has been increasing with recent CMOS processes. The DLL can operate at frequencies below about 1 GHz in a submicron CMOS process. Thus, the DLL might be usable as the clock generator for wireline communications whose operating frequency is less than about 1 GHz. USB 2.0 is among the most popular wireline communication standards in the world and operates at 480 MHz. A USB 2.0 PHY needs a small design area and low power consumption for use in portable devices. Therefore, USB 2.0 may be one of the most suitable applications for a clock generator based on the ECDLL.
In order to apply the ECDLL to USB 2.0, we propose techniques that overcome the above-mentioned challenges. The first is a shot pulse generator that prevents harmonic lock. Another is a VCDL trimming function that keeps the circuit operating under PVT variation.
In this paper, we propose a clock generator that applies an ECDLL to a USB 2.0 PHY to shrink the design area [1]. The organization of this paper is as follows. Section 2 describes the overall structure of the proposed clock generator architecture. Section 3 describes the ECDLL and DLL in detail. Section 4 presents the evaluation results of our measurements, and Section 5 concludes with a short summary of the key points.

Overall Clock Generator Architecture
Figure 1 shows a block diagram of a USB 2.0 PHY. The PHY consists of a clock generator, a band-gap reference (BGR), a controller (CNT), a driver (DRV), a receiver (RCV), and [...] × 550 µm². The loop bandwidth is designed at 1.6 MHz using a second-order lowpass filter that consists of 130 pF and 34 pF capacitances and a 2.7 kΩ resistance. Thus, the lowpass filter occupies a large portion of the design area. Therefore, we propose the ECDLL as a clock generator to shrink the design area.
For this clock generator, there are three candidate structures: one ECDLL and one DLL, two ECDLLs and one DLL, and three ECDLLs and one DLL, as shown in Figures 3 and 4. The one ECDLL and one DLL structure consists of DLL1, an ECDLL with a multiplication ratio of 40, and DLL2, a DLL that generates the 10-tap 480 MHz signals, as shown in Figure 3. The two ECDLL and one DLL structure consists of DLL1, an ECDLL with a multiplication ratio of five; DLL2, an ECDLL with a multiplication ratio of eight; and DLL3, a DLL that operates in the same manner as the DLL above, as shown in Figure 4. The three ECDLL and one DLL structure consists of DLL1, an ECDLL with a multiplication ratio of two; DLL2, an ECDLL with a multiplication ratio of four; DLL3, an ECDLL with a multiplication ratio of five; and DLL4, a DLL that operates in the same manner as the DLL above, as shown in Figure 3.
First, it is reasonable for each ECDLL to have a multiplication ratio of less than 10, in view of the design area and operation under PVT variation. As the multiplication ratio increases, the number of VCDL stages increases, which leads to a large design area. The one ECDLL and one DLL structure is twice as large as the two ECDLL and one DLL structure because the ECDLL with a multiplication ratio of 40 is large. The two ECDLL and one DLL structure has almost the same design area as the three ECDLL and one DLL structure. In addition, an earlier DLL in the cascade operates more slowly than a later one. Thus, the delay cell size in the earlier DLL may be smaller than that in the later one. Therefore, to shrink the design area, the earlier DLL should have a smaller multiplication ratio than the later one.
Second, the number of cascaded DLL blocks should be as low as possible so that the operation of the whole clock generator is stable and the settling period is short. In our proposed clock generator, a standby sequence is necessary because a DLL may fall into the unlock state if it starts to operate before the preceding DLL completes its lock, as shown in Figure 4.
Finally, we propose the two ECDLL and one DLL structure in consideration of the above concerns. DLL1 and DLL2 have multiplication ratios of five and eight, respectively.
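The arithmetic behind the three candidate structures can be checked with a short sketch. It assumes the 12 MHz reference given for DLL1 in the Figure 5 caption; the function and variable names are illustrative and do not come from the paper.

```python
from math import prod

F_REF_MHZ = 12  # reference frequency (F_IN of DLL1 in the Figure 5 caption)

# ECDLL multiplication ratios of each candidate, in cascade order.
# The final plain DLL only generates the 10 taps and does not multiply.
candidates = {
    "one ECDLL and one DLL": [40],
    "two ECDLL and one DLL": [5, 8],
    "three ECDLL and one DLL": [2, 4, 5],
}

def stage_frequencies(f_ref_mhz, ratios):
    """Return the output frequency (MHz) after each ECDLL stage."""
    freqs = []
    f = f_ref_mhz
    for ratio in ratios:
        f *= ratio
        freqs.append(f)
    return freqs

for name, ratios in candidates.items():
    print(name, ratios, stage_frequencies(F_REF_MHZ, ratios))
```

Every candidate has an overall ratio of 40, so all of them reach 480 MHz; they differ only in how the ratio is split across stages, which is what drives the area and stability comparison.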
The counter (CNT) generates the standby signals of each block (ST1, ST2, and ST3) from the standby signal of the clock generator (ST) to create a standby sequence for each DLL, as shown in Figure 4. If DLL2 starts the lock operation before DLL1 completes its lock, DLL2 might fall into the unlock state. Thus, the CNT controls the sequential wake-up operation by generating ST1, ST2, and ST3, as shown in Figure 4.
Our ECDLL and DLL are subject to harmonic lock. We propose a shot pulse reset technique, which uses the shot pulse generator, to resolve this issue. V_C is initially set close to V_DD. This allows the first rising edge of F_B to come before the second rising edge of F_REF when the PD starts operation. Figures 8 and 9 show a circuit diagram of the shot pulse generator (SHOT) and the simulation results from the SHOT. After ST1 is set to low, R issues the reset pulse and Q is set to high at the rising edge of F_REF. The pulse on R resets the PD operation, and the SW charges the capacitor during the period between the falling edge of ST1 and the rising edge of Q. After the capacitor is charged, V_C is almost V_DD. Our ECDLL and DLL can operate accurately because of the SHOT.
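The harmonic-lock issue can be illustrated with a small sketch. It rests on simplifying assumptions that are not taken from the paper: the loop is treated as locking at the first VCDL delay equal to an integer multiple k of the reference period that it reaches while the delay increases from its starting value, and the delay values used (40 ns, 100 ns, 200 ns) are hypothetical. Only k = 1 is the intended lock.

```python
import math

def lock_points(d_start_ns, d_max_ns, t_ref_ns):
    """List (k, k * T_ref) for every integer k >= 1 with the delay in [d_start, d_max]."""
    k_lo = max(1, math.ceil(d_start_ns / t_ref_ns))
    k_hi = math.floor(d_max_ns / t_ref_ns)
    return [(k, k * t_ref_ns) for k in range(k_lo, k_hi + 1)]

def first_lock(d_start_ns, d_max_ns, t_ref_ns):
    """First lock point met when the delay grows from d_start, or None if there is none."""
    points = lock_points(d_start_ns, d_max_ns, t_ref_ns)
    return points[0] if points else None

T_REF_NS = 1000.0 / 12.0  # period of the 12 MHz reference
D_MAX_NS = 200.0          # hypothetical maximum VCDL delay

# Precharged start (V_C near V_DD, delay shorter than one period): locks at k = 1.
print(first_lock(40.0, D_MAX_NS, T_REF_NS))
# Hypothetical start beyond one period: the first lock point met is k = 2.
print(first_lock(100.0, D_MAX_NS, T_REF_NS))
```

In this simplified picture, starting with a delay shorter than one reference period is what guarantees that the first lock point encountered is the fundamental one, which is the condition the precharge of V_C is described as establishing.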

Charge Pump.
Figure 10 shows a circuit diagram of the CP. M5 and M6 are the switches that charge and discharge the capacitor. When the CP charges the capacitor, UP, UN, DP, and DN are high, low, low, and high, respectively. The charge current, which is the M13 drain current, passes through the switch M5 and charges the capacitor connected at V_C, and the M12 drain current passes through the switches (M1 and M4) and flows to M10. When the CP discharges the capacitor, UP, UN, DP, and DN are low, high, high, and low, respectively. The discharge current, which is the M10 drain current, passes through the switch M6 and discharges the capacitor connected at V_C, and the M13 drain current passes through the switches (M3 and M2) and flows to M9. An op-amp is included in the CP to form the common-mode feedback. When V_C is not equal to the voltage V_CM, the difference between the charge current and the discharge current becomes larger. This causes a constant phase error. Figure 11 shows the simulation results from the CP. The lock range, which is the range in which V_C is equal to the voltage V_CM, is 0.427-0.884 V under the worst conditions (ss/1.05 V/125 °C).
In the VCDL, the sensitivity is important for DLL operation. Figure 15 illustrates the sensitivity of the VCDL and the DLL settling operation. If the sensitivity of the VCDL delay-current characteristic is large at the lock point, the DLL settling operation may not be stable, as shown in Figure 15(d). The reason is that the overshoot is large because the delay change per clock cycle is large. To prevent this unstable state, the VCDL sensitivity is made small by using large delay cells for the VCDL, as shown in Figure 15(a). However, this design causes large power consumption, and a malfunction may occur under the worst condition if the sensitivity is designed too small, as shown in Figure 15(b).
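The link between VCDL sensitivity and settling behavior can be sketched with a discrete-time first-order loop. This is an illustrative model, not the authors' simulation: the single parameter `gain` is an assumption that lumps together the VCDL sensitivity and the charge delivered by the CP per reference cycle, and delays are normalized so that the lock target is 1.

```python
def settle(gain, target=1.0, start=0.5, steps=12):
    """Update the delay once per reference cycle by gain * (phase error)."""
    d = start
    trace = [d]
    for _ in range(steps):
        d += gain * (target - d)
        trace.append(d)
    return trace

def overshoot(trace, target=1.0):
    """Largest excursion above the target (0.0 if the target is never exceeded)."""
    return max(0.0, max(trace) - target)

for gain in (0.3, 1.6, 2.2):
    trace = settle(gain)
    print(f"gain={gain}: overshoot={overshoot(trace):.3f}, "
          f"final error={abs(trace[-1] - 1.0):.4f}")
```

In this model the error is multiplied by (1 - gain) every cycle, so a small gain settles monotonically, a gain between 1 and 2 overshoots and rings, and a gain above 2 does not settle. A very small gain settles slowly, which is one way to read the caution about designing the sensitivity too small.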

VCDL.
The delay is mainly determined by the control current and the input capacitance, which consists of the gate capacitances of M4 and M2 and the parasitic capacitance between delay cells. If the buffer MOSs (M4-M2 and M5-M3) are designed small, the necessary delay is obtained with a small current. However, this causes large sensitivity. Thus, the buffer MOSs are not designed small. Figure 16 shows the VCDL delay-current characteristics for delay cells of various sizes. As the size of the delay cell increases, the characteristic at the required delay point becomes gentler, that is, the sensitivity becomes smaller.
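The trade-off described above can be made explicit with a first-order model. This is a sketch under an assumption not stated in the text: each delay cell is treated as a load capacitance C_L (gate plus parasitic capacitance) charged through a voltage swing ΔV by the control current I_C. The symbols C_L, ΔV, I_C, t_d, and S are introduced here for illustration only.

```latex
t_d \approx \frac{C_L\,\Delta V}{I_C},
\qquad
S \equiv \frac{\partial t_d}{\partial I_C}
  = -\frac{C_L\,\Delta V}{I_C^{2}}
  = -\frac{t_d}{I_C}.
% At a fixed target delay t_d, the required current is I_C = C_L \Delta V / t_d, so
|S| = \frac{t_d^{2}}{C_L\,\Delta V}.
```

Under this model, a larger cell (larger C_L) needs a proportionally larger current to reach the same target delay, which lowers the sensitivity magnitude but raises the power consumption, in line with the behavior described for Figure 16.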
Figures 17, 18, and 19 show the postlayout simulation results of the VCDL delay-current characteristics for DLL1, DLL2, and DLL3, respectively. The VCDL for DLL1 can achieve a target delay of 8.3 ns at about 9 µA under various conditions. The VCDL for DLL2 can achieve a target delay of 1.04 ns at 80 µA under the typical and the best conditions, which are tt/1.20 V/25 °C and ff/1.35 V/−40 °C, respectively. However, under the worst condition, which is ss/1.05 V/125 °C, [...] 22(a). The EC can operate with various input signals, as shown in Figure 22(b). If the SRFF cannot operate accurately because of the leakage current, the output signal of the EC partly slips clock edges. However, the EC can capture all clock edges of each signal at various V_C, as shown in Figure 22(b), and it can operate under various conditions, as shown in Figure 23.
Figure 24 shows the postlayout simulation results from the VCDL and EC for DLL2. The one-cycle delay is obtained between 0.45 V and 0.50 V at ff/1.30 V/−40 °C and tt/1.20 V/25 °C, as shown in Figures 23(a) and 23(b), and between 0.50 V and 0.60 V at ss/1.05 V/125 °C. The EC can capture all clock edges of each signal at various V_C, and it can operate under various conditions, as shown in Figure 24.
Figure 27 shows the postlayout simulation result of the DLL2 locking operation. The simulation condition is tt/1.2 V/25 °C. DLL2 has a capacitor of 0.5 pF. After ST2 is set to low at about 200 ns, R is set to high and then a shot pulse occurs. The PD operation is reset by the shot pulse, as shown by the UP and DN signals in Figure 27. After that, the PD generates a wide DN pulse and then V_C decreases. Finally, DLL2 completes the lock at about 400 ns. When DLL2 completes the lock, V_C is about 0.6 V. Figure 28 shows the VCDL output signals after DLL2 completes the lock. The EC can generate the output signal (F_O2) of 480 MHz.
Figure 29 shows the postlayout simulation results of the DLL3 locking operation. The simulation condition is tt/1.2 V/25 °C. DLL3 has a capacitor of 1 pF. After ST3 is set to low at about 50 ns, the PD operation is reset by [...]
Figure 31 shows the postlayout simulation results from a clock generator that consists of DLL1, DLL2, and DLL3. After ST is set to low, ST1 is set to low first. At this time, ST2 and ST3 remain high. F_REF is input to DLL1, and V_C is nearly V_DD due to the precharge. The PD generates a wide DN pulse at first because of the precharge. V_C decreases due to the wide DN pulse. At about 2 µs, DLL1 completes the lock and generates F_O1, which is the 60 MHz clock signal. ST2 is set to low at about 1 µs. In principle, it should be set to low after DLL1 completes the lock. However, in this simulation, it is set to low before the DLL lock time. The V_C in DLL2 is almost V_DD due to the precharge. After ST2 is set to low, the PD in DLL2 generates a wide DN pulse and V_C soon reaches almost 0.5 V. At about 2 µs, DLL2 completes the lock. ST3 is set to low at about 2 µs. In principle, it should be set to low after DLL2 completes the lock, but in this simulation, it is also set to low before the DLL lock time. After ST3 is set to low, V_C at first remains almost V_DD. V_C decreases at about 2.6 µs and finally settles at about 0.5 V. DLL3 completes the lock and generates the 10-tap 480 MHz clock signals. The total lock time of the clock generator is about 3.0 µs.
In general, the locking time of the DLL is determined by the capacitance and the CP current. When the CP current is large relative to the capacitance, the locking time is short, but the locking operation is less stable. DLL1 is designed to be stable because the phase error of DLL1 directly influences the other DLLs. DLL2 and DLL3 are designed to achieve a fast locking time so that the clock generator as a whole can lock quickly. For this reason, DLL2 and DLL3 start to operate before the preceding DLL completes its lock. When DLL2 starts to operate, F_O1 is already almost 60 MHz. Thus, DLL2 can operate accurately.
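The target delays quoted for the DLL1 and DLL2 delay lines are consistent with one input period divided by the number of VCDL taps given in the Figure 5 caption (10 taps at 12 MHz, 16 taps at 60 MHz). The sketch below checks this. The DLL3 entry assumes that its 10 taps also span exactly one 480 MHz period, which the text implies but does not state as a number; the function name is illustrative.

```python
def per_stage_delay_ns(f_in_mhz, taps):
    """Delay of one VCDL stage when the whole line spans one input period."""
    period_ns = 1000.0 / f_in_mhz
    return period_ns / taps

vcdl = {
    "DLL1": (12, 10),   # 12 MHz input, 10 taps
    "DLL2": (60, 16),   # 60 MHz input, 16 taps
    "DLL3": (480, 10),  # 480 MHz input, 10 taps (assumed to span one period)
}

for name, (f_in_mhz, taps) in vcdl.items():
    print(f"{name}: {per_stage_delay_ns(f_in_mhz, taps):.3f} ns per stage")
```

The DLL1 and DLL2 results round to the 8.3 ns and 1.04 ns targets quoted above.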

Measurement Results
A 90 nm CMOS process was used to fabricate our proposed clock generator for use in a USB 2.0 PHY. Figure 32 shows the measurement results of the output signal F_O[9]. The measured signal is F_O[9] divided by eight. The clock generator output signal frequency is 480 MHz. The jitter is less than 0.8 ps rms. Figure 33 shows the measurement results of the eye pattern for the USB 2.0 specifications. The USB 2.0 PHY with our proposed clock generator passes these specifications. Figure 34 shows the measurement results of the random data pattern in the USB 2.0 specifications. The USB 2.0 PHY with our proposed clock generator can handle random data in a way that meets the USB 2.0 specifications. Figure 35 shows the layout of the chip. Our proposed clock generator, which consists of three DLLs, occupies half the design area of the conventional one, which consists of a PLL.
Our clock generator consists of three DLLs. However, each DLL needs only a small capacitor to maintain the loop stability. Thus, our clock generator is smaller than the conventional one, which has a large capacitor in the loop filter. Table 1 compares the proposed clock generator with the conventional one. The proposed clock generator has a power consumption of 1.3 mW, which is less than that of the conventional PLL-based one shown in Figure 2. The ECDLL operates at the necessary reference signal frequency in the DLL loop that includes the VCDL. Thus, the power consumption is less than that of a PLL that has a VCO and a divider. A locking time of less than 3.5 µs can also be achieved.

Conclusion
We proposed a novel clock generator architecture to shrink the design area. The proposed clock generator consists of two edge-combiner DLLs and a DLL. A shot pulse generator is used in the DLLs to prevent harmonic lock, and a CP with common-mode feedback is used in the DLLs to reduce the pattern jitter due to a constant phase error. A controller is used to control the wake-up sequence to prevent malfunctions. Our proposed clock generator was fabricated using a 90 nm CMOS process. It can achieve 10-tap 480 MHz clock signals that meet the USB 2.0 specifications. A power consumption of less than 1.3 mA was also achieved. Our USB 2.0 PHY with this clock generator also meets the USB 2.0 specifications. Our proposed clock generator needs only half the design area of the conventional one, which is based on the PLL.

Figure 3: Block diagrams of candidate clock generators based on DLLs. The upper block diagram is the structure that consists of one ECDLL and one DLL. The ECDLL has a multiplication ratio of 40. The bottom block diagram is the structure that consists of three ECDLLs and one DLL. DLL1, DLL2, and DLL3 have multiplication ratios of two, four, and five, respectively.

Figure 5: Block diagram of DLL1 and DLL2. In DLL1, F_IN and F_O are 12 MHz and 60 MHz, respectively. The VCDL generates 10-tap 12 MHz output signals. The capacitor is 10 pF. In DLL2, F_IN and F_O are 60 MHz and 480 MHz, respectively. The VCDL generates 16-tap 60 MHz output signals. The capacitor is 0.5 pF.

Figure 8: Block diagram of shot pulse generator.

3.1. Shot Pulse Reset Technique.
Figures 5 and 6 show block diagrams of our proposed ECDLL, which is applied to DLL1 and DLL2, and of the DLL, which is applied to DLL3. They consist of a phase detector (PD), a charge pump (CP), a switch (SW), a capacitor, a voltage-controlled delay line (VCDL), and a shot pulse generator (SHOT). In addition, the ECDLL has an edge combiner (EC) and the DLL has an output buffer (BUF). The PD compares the phase of F_REF with the phase of the feedback clock (F_B) and generates the result signals (UP/DN). The CP charges and discharges the capacitor. The VCDL generates the output signals from F_REF. The delay time from F_REF to F_B is controlled by a control voltage (V_C). In Figure 6, the EC generates the output signal (F_O1) from the VCDL output signals.
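How an edge combiner multiplies the reference frequency can be sketched in a few lines. The model below is an assumption made for illustration rather than the circuit of this paper: the N VCDL taps are taken to be evenly spaced over one reference period, even-numbered taps set the output and odd-numbered taps reset it, so the output has N/2 rising edges per reference period.

```python
def edge_combiner_edges(t_ref_ns, taps, periods=1):
    """Return (rising, falling) output edge times in ns for an idealized EC."""
    step = t_ref_ns / taps
    rising, falling = [], []
    for p in range(periods):
        for k in range(taps):
            t = p * t_ref_ns + k * step
            # Assumed pairing: even taps set the output, odd taps reset it.
            (rising if k % 2 == 0 else falling).append(t)
    return rising, falling

def output_frequency_mhz(t_ref_ns, taps):
    """Output frequency derived from the spacing of consecutive rising edges."""
    rising, _ = edge_combiner_edges(t_ref_ns, taps, periods=2)
    return 1000.0 / (rising[1] - rising[0])

print(output_frequency_mhz(1000.0 / 12.0, 10))  # DLL1: 10 taps at 12 MHz
print(output_frequency_mhz(1000.0 / 60.0, 16))  # DLL2: 16 taps at 60 MHz
```

With the tap counts from the Figure 5 caption, this idealized model gives multiplication ratios of five and eight, matching the 60 MHz and 480 MHz outputs of DLL1 and DLL2.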

Figure 22: Postlayout simulation results from VCDL and EC applied to DLL1 for various V_C. The simulation condition is ff/1.30 V/−40 °C.

3.5. Lock Operation.
Figure 25 shows the postlayout simulation results of the DLL1 locking operation. The simulation condition is tt/1.2 V/25 °C. DLL1 has a capacitor of 10 pF. After ST1 is set to low at about 100 ns, R is set to high, and then a shot pulse occurs. The PD operation is reset by the shot pulse, as shown by the UP and DN signals in Figure 25. After that, the PD generates a wide DN pulse and then V_C decreases. Finally, DLL1 completes the lock at about 1 µs. When DLL1 completes the lock, V_C is about 0.6 V. Figure 26 shows the VCDL output signals after DLL1 completes the lock. The EC can generate the output signal (F_O1) of 60 MHz.

Figure 33: Measurement results from EYE pattern of fabricated PHY.The results meet the USB 2.0 specifications.