An Optimization-Based Reconfigurable Design for a 6-Bit 11-MHz Parallel Pipeline ADC with Double-Sampling S&H

This paper presents a 6-bit, 11 MS/s time-interleaved pipeline A/D converter design. The specification process, from block level down to elementary circuits, is covered step by step to draw a design methodology. Both power consumption and mismatch between the parallel chain elements are addressed through techniques such as double sampling and bottom-plate sampling, fully differential circuits, RSD digital correction, and geometric programming (GP) optimization of the elementary analog circuits (OTAs and comparators). Prelayout simulations of the complete ADC are presented to characterize the designed converter, which consumes 12 mW while sampling a 500 kHz input signal. Moreover, the block inside the ADC with the most stringent requirements in power, speed, and precision was sent to fabrication in a 0.35 μm CMOS AMS technology, and some postlayout results are shown.


Introduction
An ADC for a multistandard receiver can be developed in several ways, since both the standards involved and the selected architecture have their own drawbacks and implementation issues. A multistandard receiver is not merely a combination of isolated systems, each operating under one of the standards, but a system capable of working efficiently under those dynamic conditions. To that end, desirable capabilities include reconfigurable computing and the possibility of sharing and reusing as many blocks as possible between the operation modes.
The time-interleaved pipeline architecture is frequently used to satisfy the previous requirements in high-speed, moderate-resolution applications [1-3]. Its main advantage is flexibility: different numbers of time-interleaved branches and pipeline stages can be enabled or disabled to configure the resolution and sampling frequency, thus leading to a reconfigurable system. Figure 1 shows a 2-channel, 4-stage version of the architecture, which could provide 12 bits @ 2.75 MS/s and 6 bits @ 11 MS/s for a GSM/Bluetooth receiver. There are, however, some drawbacks related to the parallelism of time-interleaved pipeline ADCs, such as channel offset, gain, and timing mismatch. A front-end sample-and-hold (S&H) circuit is the most straightforward way to avoid timing skew between channels, as shown in Figure 1 [3]. After this S&H block, which operates at the full sample rate of the converter, the input signals are no longer continuous. Thus, the exact sampling instants of the first pipeline stages, which now see ideally constant input signals, are no longer critical. Additionally, if double-sampling techniques are used, the transitions between sampling and hold phases are identical for both branches, reducing timing mismatch [4]. Channel offset and gain mismatch are also diminished by reusing amplifiers, making capacitor mismatch the most important error source [5].
This work presents a single-standard version of the time-interleaved pipeline ADC, which meets Bluetooth specifications while minimizing power consumption and the mismatch issues inherent to the architecture. This simpler design allows going deeper into the architecture details as well as preparing it for a multistandard implementation. In addition, optimization techniques are applied and explored in search of even lower power consumption in the most elementary circuits of the A/D converter [6]. Finally, as an extended version of [7], this work upgrades the supporting material for GP and the design strategy. New intermediate results are also included to show how the complete ADC and the S&H work in prelayout and postlayout simulations, respectively.
The architecture of the ADC is described in Section 2, and Section 3 presents some of the blocks in the topology. Next, the design process goes down to a lower hierarchy level to describe the OTA and comparator designs using GP in Section 4. Simulation results are presented in Section 5 and, finally, conclusions are drawn in Section 6.

System and Architecture Level
The first step in the ADC design is to know the selected wireless standard. Table 1 shows the main Bluetooth specifications affecting the converter. Albeit general for the entire receiver chain, these specifications can be used in system-level simulations and analysis to determine the ADC requirements listed in Table 2. The resolution specification is taken from SNR-BER graphs, along with profiles of adjacent-channel interference (see Figure 2) and design margins. The sampling frequency is derived from system-level simulations taking into account settling times and preamble length. Furthermore, linearity requirements (DNL and INL) are defined to guarantee a monotonic converter [3]. As seen from Figure 1, there are 2 variables to play with in a time-interleaved pipeline architecture: the number of parallel channels and the resolution of the pipeline stages. First, the inverse relation between number of channels and [...] Furthermore, the fewer bits every stage resolves, the more stages are needed to provide the required total resolution. This leads to a larger power consumption, yet also to looser specifications for the comparators in the sub-ADC. On the other hand, each additional bit doubles the number of comparators and halves the allowed offset in the sub-ADCs of the pipeline stages. Nonetheless, this also requires less power-consuming MDACs (multiplying DACs) [5].
After this discussion, as well as some parametric analysis, an architecture with 2 channels and 2 pipeline stages is proposed in Figure 3. Each parallel pipeline chain operates at 5.5 MS/s to reach a total sampling rate of 11 MS/s. Unlike the first 4-bit stage, which is complete and includes sub-ADC and MDAC blocks, the second 3-bit stage has only the sub-ADC block. Moreover, of the 7 output bits, only 6 are effective and 1 (from the first stage) is used for Redundant Signed Digit (RSD) correction. This task is done with additional digital circuitry, which also combines and multiplexes the bits into a single 6-bit digital output word.
Errors within a pipeline stage may appear at four different points, as shown in Figure 4. The sub-ADC nonidealities produce $e_{ADC}$, while circuit limitations of the S&H, DAC, and residue amplifier add the $e_{S\&H}$, $e_{DAC}$, and $e_G$ components, respectively. The latter three components, however, are jointly generated by one single block: the MDAC. Figure 4 thus identifies the error sources, as well as the circuit performance parameters that must be constrained to keep those errors under control. Some of these parameters are stage resolution, amplifier gain precision, offset voltages, and noise levels, among others [3]. Therefore, taking into account the linearity specifications from Table 2 and the backward scaling of errors through the pipeline stage gains (1), restrictions of forms similar to (2) can be used to determine the different stage specifications presented in Table 3.
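As a behavioral illustration of the two-channel time interleaving described above, the following Python sketch distributes a sample stream over two channels and multiplexes the results back into a single stream. The function name `interleave` and the per-channel offset values are assumptions introduced only for illustration; they are not taken from the design.

```python
def interleave(samples, channel_offsets=(0.0, 0.0)):
    """Round-robin the samples over the channels, then multiplex them back.

    channel_offsets models a hypothetical static offset added by each channel.
    """
    n_ch = len(channel_offsets)
    # Channel k receives samples k, k + n_ch, k + 2*n_ch, ...
    channels = [samples[k::n_ch] for k in range(n_ch)]
    channels = [[s + off for s in ch] for ch, off in zip(channels, channel_offsets)]
    # Output multiplexer: read the channels in the same round-robin order.
    return [channels[i % n_ch][i // n_ch] for i in range(len(samples))]


f_s_total = 11e6                 # overall sampling rate stated in the text
f_s_channel = f_s_total / 2      # each of the 2 channels runs at 5.5 MS/s

matched = interleave([0.0] * 6)                     # identical channels
mismatched = interleave([0.0] * 6, (0.0, 0.01))     # assumed 10 mV offset mismatch
print(f_s_channel, matched, mismatched)
```

With mismatched offsets, a constant input produces an output that alternates every sample, that is, a spurious tone at half the overall sampling rate. This is the kind of channel mismatch artifact that the design techniques discussed in this work aim to reduce.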

Block Level
Going deeper into the architecture, the block details and functions are revealed. Accordingly, Figure 5 shows the details of the implemented converter. Not only are the analog blocks presented, which consist of the S&H, sub-ADCs, MDACs, buffers, and bias circuits, but the digital circuitry is also broken down into clock generation, synchronization, combination, multiplexing, and correction circuits. All of these blocks were full-custom designed and are introduced next, along with some of their design considerations. The S&H circuit is normally implemented with switched-capacitor (SC) architectures that include an amplifier as their central component. Because of the purely capacitive load, the central amplifier employs single-stage topologies (OTAs), which are considered the fastest and most power-efficient ones [3,5]. Though the OTA improves S&H performance, its nonidealities, including finite gain and bandwidth, phase margin, slew rate, offset, and noise, limit the circuit specifications. During operation, the OTA is required only in the hold phase and thus remains idle while sampling takes place. The double-sampling technique takes advantage of this idle time to use the amplifier. Despite providing samples at double speed, power consumption remains almost unchanged, since it is dominated by the amplifier, which normally uses class-A architectures that dissipate power even when idle [5].
The S&H schematic is shown in Figure 6, together with the signals controlling its operation. This S&H employs bottom-plate sampling with switches S7N(P) − S8N(P) and phases $\varphi_{1,2e}$ to reduce the component of charge injection that depends on the input signal. By using the fully differential architecture, the remaining constant error components are also diminished. In addition, the shared switches S9N(P), driven by phase $\varphi$ at the full sample rate $f_s$, eliminate the parallelism at the sampling instant, thus making the S&H insensitive to timing skew.
To specify the S&H block, a minimum sampling capacitance is first determined from the noise (3), mismatch, and parasitic component requirements. Since input signals are stored in only one pair of capacitors at each sampling moment, and the OTA also adds thermal noise from its active devices ($v_{n,amp}^2$), the total output-referred (and input-referred, because it is a unity-gain circuit) S&H noise is given by (3), where γ is a channel-length-dependent excess-noise factor. If the OTA design is assumed to guarantee a minimum input noise level at the S&H operation frequency, its component in (3) may be considered negligible. As a result, the sampling capacitors are the dominant thermal noise source in the S&H.
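To give a feeling for the noise-driven capacitor bound, the following Python sketch compares the textbook kT/C sampled noise with the quantization noise of a 6-bit converter. It is only a rough illustration: it uses the plain kT/C term rather than the full expression (3), it neglects the OTA contribution, and the full-scale voltage of 1 V and the temperature of 300 K are assumed values, not figures taken from the specification tables.

```python
K_BOLTZMANN = 1.380649e-23  # J/K


def ktc_noise_power(c_sample, temperature=300.0):
    """Thermal noise power (V^2) sampled onto a capacitor: kT/C."""
    return K_BOLTZMANN * temperature / c_sample


def quantization_noise_power(v_fs, bits):
    """Ideal quantization noise power (V^2): LSB^2 / 12."""
    lsb = v_fs / (1 << bits)
    return lsb * lsb / 12.0


def min_cap_for_noise(v_fs, bits, temperature=300.0):
    """Capacitance at which kT/C noise equals the quantization noise."""
    return K_BOLTZMANN * temperature / quantization_noise_power(v_fs, bits)


V_FS = 1.0   # assumed full-scale voltage (hypothetical value)
BITS = 6     # converter resolution from the text

c_min = min_cap_for_noise(V_FS, BITS)
print(f"kT/C-limited minimum capacitance: {c_min * 1e15:.3f} fF")
```

Under these assumptions the thermal-noise bound falls well below 1 fF, which suggests that at 6-bit resolution the mismatch and parasitic requirements, rather than noise, are likely to set the practical capacitor size.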
Once the MOS switches are sized so as not to degrade the frequency response of the amplifier or limit the finite-bandwidth input signal [5], the OTA specifications can be described. The combination of the S&H $e_G$ specification in Table 3 and expression (4) gives the DC gain ($A_0$) requirement for the OTA. The transfer function on the right side of (4) is obtained by applying the charge conservation principle to the circuit of Figure 6 and assuming that no charge leakage occurs between phases $\varphi_1$ and $\varphi_2$, giving (5), which can be approximated as in (6).
When a single-pole model is used for the OTA during the hold phase, the settling time for a step input voltage is determined by the unity gain-bandwidth product GBW. This assumption, however, is valid only if the OTA is designed so that its frequency response is close to a single-pole response; that is, there is one dominant low-frequency pole, while the other poles and zeros lie at much higher frequencies [5]. As a result, (6) can be rewritten as in (7), where $pA_0 = 2\pi\,\mathrm{GBW}$, with $p$ being the dominant pole. By applying the inverse Laplace transform, expression (8) is obtained, where $t_{GBW}$ is the exponential settling time for $V_{out}$.
The right side of Figure 7 shows the $t_{GBW}$ and $t_{SR}$ times. Because of double sampling, the S&H output has to settle within half a clock period to guarantee that the sub-ADC and MDAC in the first pipeline stage have enough time to sample it. In addition, it is good practice to reserve 1/3 of the settling time for slewing and the rest for the GBW-limited part [5]. With these ideas, (9), the unity gain-bandwidth (GBW) specification, needs to be satisfied to achieve the DNL and INL requirements after settling, provided that the OTA frequency response guarantees a single-pole system.
Besides the unity gain-bandwidth, the finite OTA output current available to charge and discharge the capacitive load limits the settling time as well. This prevents the OTA outputs from following large voltage steps faster than its slew rate. Accordingly, (10) can be used to find the SR requirement, taking into account that $V_{step\text{-}max}$ is the maximum step size at the OTA output and $t_{SR}$ is shown on the right side of Figure 7. Unlike the very common assumption $V_{step\text{-}max} = V_{FS}$ in the literature [3], this work examines this specification more closely because of the strong and direct effect of SR on the power consumption of the OTA. The previous assumption is valid when the S&H circuit applies a reset between consecutive samples. However, the S&H of Figure 6 behaves as a track-and-hold topology, whose output tracks its input during the sampling mode. In addition, the acquisition phase has been suppressed in order to use the double-sampling technique, so the S&H operates in a way very similar to a zero-order hold. For this kind of system, consecutive samples can be very close in amplitude depending on the ratio of $f_s$ to $f_{in}$, as shown on the left side of Figure 7. Thus, the OTA output evidently does not have to follow voltage steps as large as $V_{FS}$. A more realistic value of $V_{step\text{-}max}$ can be obtained from the maximum derivative of a sinusoidal input signal $V_{in} = (V_{FS}/2)\sin(2\pi f_{in} t)$ multiplied by the time slot between consecutive samples, as illustrated in Figure 7 [7].
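The settling budget discussed above can be sketched numerically. The Python example below assumes that the time slot between consecutive samples is $1/f_s$, that the SR requirement has the form $V_{step\text{-}max}/t_{SR}$, and that single-pole settling to within half an LSB is required with a unity feedback factor. These are illustrative assumptions and may differ from the exact expressions (9) and (10); the 1 V full scale is also a hypothetical value.

```python
import math


def max_step(v_fs, f_in, f_s):
    """Bound on the change between consecutive samples of (v_fs/2)*sin(2*pi*f_in*t)."""
    max_slope = (v_fs / 2.0) * 2.0 * math.pi * f_in   # peak dV/dt of the sine
    return max_slope / f_s                            # slope times one sample slot


def settling_budget(f_s, slew_fraction=1.0 / 3.0):
    """Split the half clock period into slewing and GBW-limited parts."""
    t_settle = 0.5 / f_s
    t_sr = slew_fraction * t_settle
    return t_sr, t_settle - t_sr


def required_slew_rate(v_step, t_sr):
    """Slew rate needed to cover v_step within t_sr."""
    return v_step / t_sr


def required_gbw(bits, t_gbw, feedback_factor=1.0):
    """Single-pole settling to within 1/2 LSB: exp(-t/tau) <= 2**-(bits + 1)."""
    n_tau = (bits + 1) * math.log(2.0)
    return n_tau / (2.0 * math.pi * feedback_factor * t_gbw)


F_S, F_IN, V_FS, BITS = 11e6, 500e3, 1.0, 6   # V_FS is an assumed value

t_sr, t_gbw = settling_budget(F_S)
v_step = max_step(V_FS, F_IN, F_S)
print(f"V_step-max = {v_step:.3f} V (vs. V_FS = {V_FS} V)")
print(f"SR  >= {required_slew_rate(v_step, t_sr) / 1e6:.1f} V/us "
      f"(vs. {required_slew_rate(V_FS, t_sr) / 1e6:.1f} V/us for a full-scale step)")
print(f"GBW >= {required_gbw(BITS, t_gbw) / 1e6:.1f} MHz")
```

For the 500 kHz input and 11 MS/s rate used in this work, the bound on the step is $\pi V_{FS}/22$, roughly one seventh of full scale, so the SR requirement relaxes by about the same factor compared with the $V_{step\text{-}max} = V_{FS}$ assumption.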
The quantization process in each pipeline stage is executed by low-resolution sub-ADCs. In order to maximize the available settling time for the S&H/MDAC outputs, so that delays and signal-dependent conversion errors are reduced, the flash architecture is chosen for these blocks. The topology consists of a comparator bank followed by registers that synchronize the output bits with the system, using a carefully selected $V_{LATCH}$ signal. Furthermore, the thermometer output bits have to be converted into binary codes. Gray coding is also applied as an intermediate step to diminish spike errors from thermometer transitions. Owing to RSD correction, the 4-bit sub-ADC in the first stage has 14 comparators (Figure 8), while the second stage uses 7 comparators to produce 3 bits. The two main comparator specifications, offset and speed, are derived from $e_{ADC}$ in Table 3 and the timing (sampling frequency) in Figure 7, respectively.
The MDAC is the other main block within the pipeline stages, except for the last one in the chain. Its function is to bring the sub-ADC output back to the analog domain, subtract it from the previously sampled stage input, and then amplify the resulting residue, which is used as the input of the next stage (Figure 5). The MDAC is also based on an SC architecture and has a capacitor bank rather than a single sampling capacitor (Figure 9), so all the S&H design considerations and analysis are valid here, too. Using again the noise and the other stage specifications from Table 3, the different components of the MDAC architecture can be specified [5,8].
The above-mentioned need for clock phase generation, bit combination, and coding now brings the digital circuitry into focus. The circuit in Figure 10 generates the different clock phases required by the SC circuits of Figures 6 and 9 and the control signals for the comparators and registers in Figure 8 [5]. It is a standard divide-by-2 architecture using a D flip-flop [9] and 2 cross-coupled NAND gates, along with delay chains to control the phase duty cycles.
Figure 11 shows an arrangement of registers, inverters, and NAND, OR, and XOR gates that converts the thermometer code into binary, synchronizes the bits with an extra stage, and finally applies RSD to digitally correct the output bits coming from the sub-ADCs in the pipeline stages. A glitch-free intermediate Gray coding stage is included as well. The RSD combination was implemented with a carry-lookahead adder, and static logic gates were used throughout the digital block.
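The thermometer-to-binary conversion with an intermediate Gray step can be illustrated with a short behavioral model. The Python sketch below is not the gate-level encoder of the design; the function names are hypothetical, and counting the active comparators is used here as a simple stand-in for the encoding logic.

```python
def thermometer_to_gray(thermo):
    """Map comparator outputs (lowest threshold first) to a Gray code.

    Counting the ones is a behavioral shortcut; it also tolerates an
    isolated bubble in the thermometer code.
    """
    count = sum(1 for bit in thermo if bit)
    return count ^ (count >> 1)


def gray_to_binary(gray):
    """Decode a Gray code by XOR-ing all right shifts of the word."""
    binary = 0
    while gray:
        binary ^= gray
        gray >>= 1
    return binary


# 14 comparators, as in the first-stage sub-ADC: codes 0..14.
for level in range(15):
    thermo = [1] * level + [0] * (14 - level)
    gray = thermometer_to_gray(thermo)
    print(level, format(gray, "04b"), gray_to_binary(gray))
```

Because adjacent Gray codes differ in a single bit, a comparator that is still resolving at a code transition can disturb at most one bit of the intermediate word, which is the motivation for the intermediate step.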
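The arithmetic behind the digital correction can be sketched with an idealized two-stage model. The following Python example assumes an input normalized to [-1, 1), first-stage thresholds at odd multiples of 1/16, an interstage gain of 8, and an overlap of one bit between the stage codes. These values are assumptions chosen to be consistent with the 14 and 7 comparators mentioned in the text; they are not taken from Table 3, and the actual circuit performs the addition with a carry-lookahead adder.

```python
from fractions import Fraction
from math import floor

STAGE1_THRESHOLDS = [Fraction(2 * k - 13, 16) for k in range(14)]  # -13/16 ... +13/16
STAGE2_THRESHOLDS = [Fraction(k - 3, 4) for k in range(7)]         # -3/4 ... +3/4


def stage1(x, offsets=None):
    """First stage: 14 comparators give d1 in 0..14; residue uses an assumed gain of 8."""
    offsets = offsets or [Fraction(0)] * 14
    d1 = sum(1 for t, o in zip(STAGE1_THRESHOLDS, offsets) if x >= t + o)
    residue = 8 * x - (d1 - 7)
    return d1, residue


def stage2(residue):
    """Second stage: 7 comparators resolve 3 bits over [-1, 1)."""
    return sum(1 for t in STAGE2_THRESHOLDS if residue >= t)


def combine(d1, d2):
    """Add the stage codes with a 1-bit overlap to obtain the 6-bit word."""
    return 4 * d1 + d2


def convert(x, offsets=None):
    d1, residue = stage1(x, offsets)
    return combine(d1, stage2(residue))


def ideal_code(x):
    """Ideal 6-bit code for x in [-1, 1)."""
    return floor(32 * (x + 1))


# Hypothetical comparator offsets of +/- 1/32 in the first stage.
offsets = [Fraction((-1) ** k, 32) for k in range(14)]
x = Fraction(5, 64)
print(ideal_code(x), convert(x), convert(x, offsets))
```

In this model a wrong first-stage decision shifts the residue by one unit, which the second stage still resolves inside its range, so the combined word is unchanged. This is why the redundancy loosens the comparator offset requirement of the first-stage sub-ADC.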

GP on Circuit Level
GP is applied here to minimize power consumption and optimize performance under specific requirements, for both the OTA and comparator design presented as follows.
Geometric Programming.
GP is a special kind of mathematical optimization problem in which the objective function and constraints belong to a set of functions with a particular form, thus satisfying some specific conditions. A geometric program is itself a complex nonlinear optimization problem; however, it can be turned into a convex problem through variable changes and transformations of the related functions. It can then be solved by very efficient algorithms available from a number of companies and research groups working on the matter.
Despite the above benefits, GP is very restrictive about its formulation. Only monomial and posynomial expressions can be part of a geometric program. These function types are shown, respectively, in (11) and (12),
$$f(x) = c\,x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} \tag{11}$$
$$f(x) = \sum_{k=1}^{K} c_k\,x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}} \tag{12}$$
where $c > 0$ and $c_k > 0$, each exponent $a_i$ (or $a_{ik}$) is any real number, and $x_1, \ldots, x_n$ are $n$ real and positive variables. It is important to identify which operations preserve the original monomial and posynomial structures, because GP is very restrictive in this respect. A geometric program in standard form is an optimization problem with the format
$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 1, \quad i = 1, \ldots, m, \\ & g_i(x) = 1, \quad i = 1, \ldots, p, \end{aligned} \tag{13}$$
where $f_0$ is called the objective function, $f_i$ are the inequality constraints, and $g_i$ are the equality constraints. In a geometric program in standard form as described in (13), the functions $f_i$ are posynomials, $g_i$ are monomials, and $x_i$ are the optimization variables, with the implicit constraint that the variables be positive ($x_i > 0$). In standard-form GP, the objective must be a posynomial (and it must be minimized); the equality constraints can only have the form of a monomial equal to one, and the inequality constraints can only have the form of a posynomial less than or equal to one [10,11]. This GP solution is based on very efficient algorithms, specially designed for convex optimization.

International Journal of Reconfigurable Computing
A basic strategy for obtaining optimal circuits via GP (some modifications are required where the formulation is incompatible with GP) can be detailed as follows: (1) mathematical formulation of the circuit in GP standard form, (2) identification and modeling of the required transistor parameters, (3) construction of the optimization file taking the models and design specifications as inputs, (4) verification of the results using a circuit simulator, (5) identification of new modeling regions after the GP solution and return to point 2.
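The convex transformation that makes GP tractable can be illustrated on a toy problem. The Python sketch below substitutes $x = e^y$ in a one-variable posynomial, which turns its logarithm into a convex log-sum-exp function, and then minimizes it with a ternary search. The objective $x + 4/x$, the bounds, and the function names are hypothetical and chosen only for illustration; practical circuit problems have many variables and rely on dedicated convex solvers.

```python
import math


def posynomial(terms, x):
    """Evaluate sum_k c_k * x**a_k for one positive variable x."""
    return sum(c * x ** a for c, a in terms)


def log_domain(terms, y):
    """Posynomial after x = exp(y), in log form: a log-sum-exp, convex in y."""
    return math.log(sum(c * math.exp(a * y) for c, a in terms))


def solve_1d_gp(terms, x_min, x_max, iters=200):
    """Minimize a one-variable posynomial subject to x_min <= x <= x_max.

    The bounds are the monomial constraints x_min/x <= 1 and x/x_max <= 1.
    Ternary search is valid because the log-domain objective is convex.
    """
    lo, hi = math.log(x_min), math.log(x_max)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_domain(terms, m1) <= log_domain(terms, m2):
            hi = m2
        else:
            lo = m1
    x_opt = math.exp(0.5 * (lo + hi))
    return x_opt, posynomial(terms, x_opt)


# Hypothetical objective f0(x) = x + 4/x as (coefficient, exponent) pairs.
objective = [(1.0, 1.0), (4.0, -1.0)]

print(solve_1d_gp(objective, 0.1, 10.0))   # interior optimum
print(solve_1d_gp(objective, 0.1, 1.5))    # upper bound becomes active
```

The second call shows how a specification acting as an upper bound moves the optimum onto the constraint, which is the typical situation when a circuit requirement limits the achievable power.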

OTA and Comparator Design.
In this work, a fully differential folded-cascode topology was chosen for the OTA circuit, as shown in Figure 12. This architecture is preferred over the telescopic one because of its wider input and output dynamic ranges. These voltage swings are very important for this application because some signals may have maximum amplitudes as large as $V_{FS}$, mainly in the S&H. An SC-CMFB circuit controlling the output common-mode voltage $V_{CM,OUT}$ is also shown in Figure 12 [7]. Before applying GP, the OTA design space is delimited according to the offset requirements in Table 3. Accordingly, a theoretical estimate of the random offset is made through parametric variations of the involved transistor sizes, as expressed in (14), where $\Delta V_{th}$ and $\Delta K$ stand for the threshold voltage and gain mismatch parameters, respectively, while $g_m$ is the transconductance and $V_{GS}$ the gate-source voltage. A prototype designed via GP and fabricated in a 0.35 μm technology is presented in [12] along with some experimental results. Point 1 in the GP methodology requires the main performance parameters to be expressed as restrictions for the minimum values of DC gain in (15), unity gain-bandwidth in (16), SR in (17), and an approximation of the phase margin in (18), where $\rho_2$, $C_{L,tot}$, and $C_{d(M9),tot}$ are the nondominant pole and the total capacitances, including parasitics, at the output node and at the drain terminal of M9(10) in the schematic of Figure 12, respectively. All these expressions were formulated from the equivalent circuit shown in Figure 13, where the body effect of transistors M1(2) was ignored since their small-signal source voltage $v_s$ is ideally zero (balanced differential pair). A similar simplification was applied to transistors M7(8) because they are PMOS devices that can have their own isolated N-wells in a P-substrate CMOS process. In addition, biasing and device geometry conditions are also formulated into the GP, whose objective is to minimize the OTA power consumption established in (19).
A low-area, low-power dynamic architecture (Figure 14) can be used for the comparators, since the RSD correction loosens the offset restrictions in the sub-ADC [5]. Internal threshold generation avoids the need for the typical resistor ladder of Figure 8. Even though this topology suffers from offset, this can be tolerated in low-resolution applications such as 3-bit and 4-bit flash sub-ADCs. The threshold voltage is set by current division in the crossed differential pairs. With the assumptions made for the circuit of Figure 14, and using large-signal models for the transistor currents, expression (20) is obtained, with $K = \mu C_{ox}$ as the transistor gain factor.
GP is initially applied to optimize power consumption and delays in the basic comparator, which needs an external threshold voltage. Afterwards, (20) is used to establish the threshold voltages needed in the sub-ADC by modifying the transistor size and current ratios.

Results
Even though the complete ADC of Figure 5 was designed, only the S&H block has reached the silicon fabrication phase so far. The complete converter needs more time to reach a finished layout and be sent to chip integration, owing to the size and complexity of its architecture. Therefore, this section presents prelayout simulations for the time-interleaved pipeline ADC, from the circuit level up to the top system, and some postlayout results for the sample-and-hold block together with other auxiliary circuits to be introduced later. First, at the circuit level, comparator yield was simulated to evaluate whether the offset of the dynamic topology would affect the required resolution. In Figure 15, each point is the result of 100 Monte Carlo runs, indicating the percentage of correct decisions for a known comparator input. The closer the input signal is to the threshold, the lower the yield. Yield rises above 95% when the difference between input and threshold voltages exceeds $\pm v_{offset}$. Alternatively, this can be understood as the probability of the comparator having an offset voltage lower than $v_{offset}$ [13]. The offset voltage was simulated for the OTA too, as shown on the left side of Figure 16, together with an input dynamic range result; the specified input dynamic range is greater than 1 V, and its simulation requires the amplifier to be placed in a unity-gain feedback loop. The offset voltage is given by $v_{offset} \le \mu \pm 3\sigma$, where μ and σ are the mean and the standard deviation in Figure 16, respectively. Consequently, $v_{offset} < 7$ mV for the S&H OTA simulated in this case, fulfilling the requirement of Table 3.
At the block level, the S&H circuit was simulated while sampling a slow ramp signal, as shown in Figure 17. The discrete output is plotted against the analog input, along with nonideality details and the block current consumption. The observed glitches come from the nonlinear variation of the CMOS switch resistances and the reduced output resistance of the OTA cascode transistors (Figures 12 and 13), since they start working close to their triode region when $V_{in}$ approaches $V_{FS}$. This is why the output and input DR (see Figure 16) are important specifications for the OTA design. Bootstrapped CMOS switches can be used to further reduce those glitches.
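The interpretation of comparator yield as an offset probability can be reproduced with a simple statistical model. The Python sketch below assumes a zero-mean Gaussian offset with a hypothetical standard deviation of 5 mV; it is a behavioral stand-in for the circuit-level Monte Carlo runs, not a reproduction of Figure 15.

```python
import random
from statistics import NormalDist


def analytic_yield(delta_v, sigma):
    """Probability of a correct decision when the input exceeds the threshold by delta_v."""
    return NormalDist(0.0, sigma).cdf(delta_v)


def monte_carlo_yield(delta_v, sigma, runs=100, rng=None):
    """Fraction of runs in which a random offset does not flip the decision."""
    rng = rng or random.Random(0)
    correct = sum(1 for _ in range(runs) if delta_v - rng.gauss(0.0, sigma) > 0)
    return correct / runs


SIGMA = 5e-3   # assumed offset standard deviation (hypothetical value)

for delta_mv in (0, 2, 5, 10, 15):
    delta = delta_mv * 1e-3
    print(delta_mv, monte_carlo_yield(delta, SIGMA), round(analytic_yield(delta, SIGMA), 3))
```

With only 100 runs per point, as in the simulations described above, each estimated yield carries a statistical uncertainty of a few percent, so the curve is best read as a trend rather than as an exact probability.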

Finally, simulation results from the complete ADC are shown in Figures 18 and 19, obtained while sampling an input tone at the maximum frequency (500 kHz) using the maximum sampling rate (11 MS/s). Time-interleaved pipeline operation can be observed in the digital output word in Figure 19 as a result of multiplexing the 6-bit outputs from the 2 parallel channels, whereas its FFT in Figure 18 allows an estimation of the ADC frequency characteristics. Linearity can be quantified from the curves in Figure 20. Better DNL and INL measurements require Monte Carlo simulations, which would take considerably longer.
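As a reference point for reading such results, the signal-to-quantization-noise ratio of an ideal 6-bit quantizer can be computed directly. The Python sketch below uses a full-scale sine with 47 cycles in 1024 samples, which corresponds to a tone close to 500 kHz at 11 MS/s; the record length, the number of cycles, and the function names are assumptions for illustration, and no circuit nonideality is modeled.

```python
import math


def ideal_adc(x, bits=6):
    """Ideal quantizer for x in [-1, 1], clipped to the available codes."""
    levels = 1 << bits
    code = int(math.floor((x + 1.0) / 2.0 * levels))
    return min(max(code, 0), levels - 1)


def sqnr_db(bits=6, n=1024, cycles=47):
    """Signal-to-quantization-noise ratio of a full-scale sine, in dB."""
    levels = 1 << bits
    sig = err = 0.0
    for i in range(n):
        x = math.sin(2.0 * math.pi * cycles * i / n)
        xq = (ideal_adc(x, bits) + 0.5) * 2.0 / levels - 1.0   # mid-point reconstruction
        sig += x * x
        err += (x - xq) ** 2
    return 10.0 * math.log10(sig / err)


def enob(sndr_db):
    """Effective number of bits from a signal-to-noise-and-distortion ratio."""
    return (sndr_db - 1.76) / 6.02


value = sqnr_db()
print(f"ideal SQNR = {value:.1f} dB, ENOB = {enob(value):.2f} bits")
```

The result should sit near the familiar $6.02N + 1.76$ dB estimate for $N = 6$; a simulated or measured converter is expected to fall below this bound once noise, mismatch, and distortion are included.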
Table 4 shows some results from the prelayout simulations, which characterize the designed ADC along with the other specifications presented in Table 2. It is difficult to compare these prelayout values with other works for 2 reasons. The first is that these numbers are not silicon measurements, and the second is that it is really hard to find another converter with similar resolution, sampling frequency, technology, and architecture characteristics in the literature, mainly due to the particularity of this application. In future work, when the complete ADC is sent to fabrication, the comparison will be worthwhile.
As previously stated, the S&H block was sent to fabrication. The complete chip, including testing circuits, follows the schematic in Figure 21, and Figure 22 shows power supply. The final layout size including pads is 800 μm × 1400 μm, yet the core is only 500 μm × 255 μm.

Conclusions
By carefully deriving key circuit specifications, it is possible to reduce their strong impact on total system power dissipation. Following this idea, a low-power time-interleaved pipeline ADC for the Bluetooth standard was designed. The specification process, from the standard down to the elementary circuits, was reviewed to justify the selection of the 6-bit, 11 MS/s topology with 2 time-interleaved channels and 2 pipeline stages. This is the first step toward a reconfigurable system implementation. Indeed, the skill and architecture knowledge gained from this work will make it easier and faster to turn the multistandard application into a power-efficient solution.
It is difficult to compare the results of this work with others because of its very specific characteristics, which do not follow the current trends of raising either the sampling rate or the resolution, but instead meet the Bluetooth standard requirements.
The main support for the contribution of this paper is the application of GP, since it provides better knowledge and experience of circuit behavior. When designs are developed using GP, it is easier to detect relations and trends between circuit requirements and design variables, which helps to identify possible optimization targets for global system performance, as in the complete ADC presented here.
As the main contribution of this work over its original version in [7], a prototype 11 MHz S&H was designed in a 0.35 μm, 3.3 V CMOS process to verify the design of its central OTA via GP. Some testing blocks are also on the chip, yet they are not intended to influence the S&H performance itself. Based on a stacked, common-centroid chip layout, future measurements aim to demonstrate the optimized power consumption while operating under the highest speed requirements that a block would face in this time-interleaved pipeline ADC.

Figure 16: OTA offset and input DR simulations.

Table 1: Some specifications for Bluetooth standard.

Table 2: Some design specifications for the ADC.

Table 3: Specifications for pipeline ADC stages.

Table 2 .
It is difficult to compare these prelayout values with other works because