Recent Advances on the Design of High-Gain Wideband Operational Transconductance Amplifiers

Feed-forward techniques are explored for the design of high-frequency Operational Transconductance Amplifiers (OTAs). For single-stage amplifiers, a recycling folded-cascode OTA presents nearly twice the GBW (197.2 MHz versus 106.3 MHz) and more than twice the slew rate (231.1 V/μs versus 99.3 V/μs) of a conventional folded-cascode OTA for the same load, power consumption, and transistor dimensions. It is demonstrated that the efficiency of the recycling folded-cascode is equivalent to that of a telescopic OTA. As for multistage amplifiers, a No-Capacitor Feed-Forward (NCFF) compensation scheme is discussed, which uses a high-frequency pole-zero doublet to obtain greater than 90 dB DC gain, a GBW of 325 MHz, and better than 70° phase margin. The settling time of the NCFF topology can be shorter than that of OTAs with Miller compensation. Experimental results for the recycling folded-cascode OTA fabricated in TSMC 0.18 μm CMOS, and results of the NCFF, demonstrate the efficiency and feasibility of the feed-forward schemes.


Introduction
The growing demand for high-speed and high-precision analog ICs dictates stringent design specifications for the amplifiers, which are the basic building blocks for numerous applications; IF switched-capacitor (SC) filters and high-resolution data converters with sampling frequencies above 100 MHz require very fast OTAs with settling times of less than 4 nanoseconds for good performance. High-gain amplifiers use cascode structures or multistage designs with long-channel transistors biased at low current levels, while high-bandwidth amplifiers use single-stage designs with short-channel transistors biased at high current levels.
For single-stage amplifiers, the folded-cascode (FC) OTA has a higher signal swing than a telescopic OTA while still presenting a single parasitic pole and relatively large DC gain, and hence it is commonly used for high-frequency applications [5, 7-18]. For such applications the typical FC structure presents some limitations. PMOS drivers are predominantly used for their lower flicker noise and higher-frequency parasitic pole, but the bandwidth is limited because of the lower carrier mobility in PMOS devices. If NMOS drivers are used, the settling behavior suffers because of the lower-frequency parasitic pole, and in order to extend the bandwidth, several phase compensation schemes have been reported in the literature [8-12]. Another limitation of the FC, regardless of driver type, is that the maximum slewing current is roughly half the total OTA current, unlike the telescopic OTA, which utilizes the total current. It is shown that the recycling folded-cascode (RFC) OTA can alleviate many of the conventional FC limitations; it can settle faster and more accurately, boost slew rate, and improve overall efficiency.
In multistage amplifiers, cascading of individual gain stages increases the overall amplifier gain, but each stage introduces a low-frequency pole, which produces a negative phase shift and degrades the phase margin. Many phase compensation schemes for multistage amplifiers have been reported in the literature [6, 7, 19-24]. Most of these are variations of the basic Miller compensation scheme for a two-stage amplifier. The NCFF compensation scheme employs a feed-forward path to create LHP zeros but does not use any Miller capacitor [25]. This topology results in a higher gain-bandwidth product (GBW) with a fast step response.
The theoretical aspects of feed-forward techniques are discussed in Section 2. Section 3 deals with feed-forward techniques associated with the FC OTA and introduces the RFC. A design case study in Section 4 compares several OTA aspects of the FC and RFC. High-gain two-stage amplifiers without Miller compensation are considered in Section 5. Section 6 describes the circuit simulation and experimental results, and the conclusions are drawn in Section 7.

Settling-Time in the Presence of a Pole-Zero Pair
A macromodel of the capacitive amplifier used in switched-capacitor circuits is shown in Figure 1(a). By using conventional circuit analysis techniques, the small-signal transfer function can be calculated and is given by (1), where $A_v = g_m/g_0$ and $\beta = C_2/(C_1 + C_2 + C_3)$ are the amplifier open-loop DC gain and the feedback factor, respectively. A typical open- and closed-loop magnitude response is depicted in Figure 1(b). The location of the pole is given by (2), where $C_L = C_4 + \beta(C_1 + C_3)$ is the effective loading capacitor. The typical step response of the critically damped or overdamped capacitive amplifier is shown in Figure 2. It consists of two phases: the first is limited by the slew rate and the second by the closed-loop bandwidth. The error in the final value is determined by the factor $1/(\beta A_v)$, as can be seen in (1).
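The definitions above can be turned into a quick numerical check. The following is a minimal sketch, not the authors' design flow: the function names and all capacitor and gain values are hypothetical and chosen only for illustration, while the three formulas are the ones stated in the text.

```python
def feedback_factor(c1, c2, c3):
    """Feedback factor beta = C2 / (C1 + C2 + C3)."""
    return c2 / (c1 + c2 + c3)


def effective_load(c1, c3, c4, beta):
    """Effective loading capacitor C_L = C4 + beta * (C1 + C3)."""
    return c4 + beta * (c1 + c3)


def static_gain_error(beta, a_v):
    """Final-value error factor 1 / (beta * A_v)."""
    return 1.0 / (beta * a_v)


# Hypothetical values (not taken from the paper): capacitors in farads,
# and an open-loop DC gain of 1000 V/V (60 dB).
c1, c2, c3, c4 = 1e-12, 1e-12, 0.5e-12, 2e-12
a_v = 1000.0

beta = feedback_factor(c1, c2, c3)
c_l = effective_load(c1, c3, c4, beta)
error = static_gain_error(beta, a_v)

print(f"beta = {beta:.3f}, C_L = {c_l * 1e12:.2f} pF, error = {error * 100:.3f} %")
```

With these illustrative numbers the loop gain $\beta A_v$ is 400, so the final value is off by a quarter of a percent; reaching 0.1% accuracy at the same feedback factor would need an open-loop gain of at least 2500 V/V.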
Single-stage OTA slew rate (SR) is determined by the amount of current that can be delivered to or extracted from the output and by the effective load capacitor ($SR = I/C_L$). The bandwidth-limited phase is determined by both the effective pole's frequency $\omega_{Peff}$ and the phase margin, and in many practical low-voltage cases it dominates the overall settling time. If the slew rate and the RHP zero effects are ignored, the closed-loop pulse response of the amplifier is given by (3), where $\alpha = C_1/C_2$ is the ideal amplifier's gain. A high-performance amplifier should have a high $\omega_{Peff}$ for fast settling and a high DC gain $A_v$ for final-value precision. The analysis of the amplifier impulse response in the presence of a pole-zero doublet is more complex; in [26-28] it was shown that low-frequency pole-zero pairs may generate slow components that significantly reduce the amplifier's speed. This is not the case if high-frequency pole-zero doublets are present. In order to consider the effects of high-frequency pole-zero pairs, the overall open-loop transconductance of the amplifier can be simplified as (4). If the right-half-plane zero $g_m/C_2$ is ignored, then using (1) and (4) the closed-loop transfer function is obtained as (5), where $A$ denotes $1 + (1/\omega_z + 1/\omega_{Peff} + g_0/(\beta g_m \omega_p))s$, and $\omega_{Peff}$ is defined by (2).
Figure 3: Typical root locus for a system with two poles and one zero: (a) zero is located at high frequencies, and (b) zero is located between the poles. In both cases the dominant pole is terminated by the zero.
According to (5), the closed-loop poles are located at the frequencies given by (6). For real poles, the two poles lie on either side of $(\omega_p(1 + \omega_{Peff}/\omega_z) + g_0/C_L)/2$. If the poles are complex conjugate, the magnitude of the real part is above $\omega_p/2$. Notice that the zero reduces the imaginary part of the poles. Figure 3 shows the typical root locus of a two-pole, one-zero system. In both cases, the lowest-frequency pole is close to the frequency of the zero if enough feedback is used. A common case for the feed-forward amplifiers to be discussed in the following sections is shown in Figure 4, which corresponds to the root locus shown in Figure 3. Slow output components caused by pole-zero spacing are avoided if both closed-loop poles $\omega_{p1}$ and $\omega_{p2}$ are placed at high frequencies to guarantee small time constants; this is possible if and only if the zero is located at high frequencies, which directly impacts the location of $\omega_{p1}$ and $\omega_{p2}$ as seen in (6). An important observation here is that if the closed-loop dominant pole is close to the location of the zero, its coefficient (proportional to $\omega_{p1} - \omega_z$) is reduced, thereby reducing the effect of possible slow components.
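The statement that the dominant-pole coefficient scales with $\omega_{p1} - \omega_z$ can be illustrated with a generic normalized system. The sketch below is an assumption-laden illustration rather than the paper's expression: it takes a unity-DC-gain transfer function $H(s) = (1 + s/\omega_z)/[(1 + s/\omega_{p1})(1 + s/\omega_{p2})]$ with real, distinct LHP poles, and computes the step-response coefficients by partial fractions. The function names and the numerical pole and zero values are hypothetical.

```python
import math


def doublet_step_coefficients(p1, p2, z):
    """Coefficients (k1, k2) of y(t) = 1 + k1*exp(-p1*t) + k2*exp(-p2*t).

    Valid for H(s) = (1 + s/z) / ((1 + s/p1) * (1 + s/p2)) with real,
    distinct, positive p1 and p2 (rad/s). k1 is proportional to (p1 - z).
    """
    k1 = p2 * (p1 - z) / (z * (p2 - p1))
    k2 = p1 * (p2 - z) / (z * (p1 - p2))
    return k1, k2


def step_response(t, p1, p2, z):
    """Unit-step response of the two-pole, one-zero system at time t."""
    k1, k2 = doublet_step_coefficients(p1, p2, z)
    return 1.0 + k1 * math.exp(-p1 * t) + k2 * math.exp(-p2 * t)


# Hypothetical pole and zero positions in rad/s.
p1, p2 = 1.0e9, 5.0e9
for z in (2.0e9, 1.1e9, 1.0e9):
    k1, k2 = doublet_step_coefficients(p1, p2, z)
    print(f"z = {z:.2e} rad/s: k1 = {k1:+.4f}, k2 = {k2:+.4f}")
```

As the zero approaches the slower pole, `k1` shrinks toward zero and the response is governed by the faster pole alone, which is the mechanism the paragraph describes.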

Feed-Forward Techniques for Folded-Cascode OTAs
The typical FC OTA is shown in Figure 5 [7]. Its small-signal transconductance gain is approximately given by (8), where $g_{m1}$ is the small-signal transconductance of M1, and $C_N$ is the capacitance associated with the source of M5. The transconductance of the cascode transistors and the equivalent parasitic capacitor $C_N$ at that node determine the open-loop pole's frequency. For wideband applications, a large unity-gain frequency is needed, and therefore the frequency of the parasitic pole $\omega_p$ ($= g_{m3}/C_N$) must be as high as possible. PMOS drivers are preferred for FC amplifiers since the parasitic pole of the folding node is then associated with NMOS cascode devices and is located at a higher frequency. When reducing the widths of the cascode transistors, the benefit of increasing the frequency of the parasitic pole might be limited because the saturation voltage must be maintained within the limits dictated by the supply voltages and signal swing. Mobility degradation due to the vertical electric field becomes more critical in that case as well. Reducing the length of the cascode transistors reduces $V_{DSAT}$ and increases $\omega_p$; the drawback is the reduction of the OTA DC gain. Increasing the bias current also increases the frequency of the parasitic poles, but the DC gain is reduced and the power consumption increases. Moreover, the choice of PMOS drivers comes at the expense of a larger input capacitance than NMOS drivers would need for the same $g_m$. Ideally, an OTA should use NMOS transistors for both the differential pair and the cascode devices, such that both the small-signal transconductance and the phase margin are increased. This is the major advantage of the telescopic structure [13, 14], but its output swing is limited, especially for low-voltage applications and if low-$V_T$ transistors are not available.
To overcome some of these tradeoffs, a number of feed-forward compensation techniques have been reported [5, 8-12]. The technique proposed in [9] uses RC networks connected to the gates of the cascode transistors; hence a zero is introduced such that the parasitic pole is partially compensated. In the technique proposed in [10], the low-frequency signal flows through the PMOS cascode transistors, and, by using RC networks, the high-frequency signal flows through the NMOS cascode transistors. Due to the higher mobility of the NMOS devices, better performance can theoretically be achieved. The additional networks, however, increase silicon area and the capacitance of the parasitic nodes, thus reducing the frequency of the poles; a medium-frequency pole-zero pair may increase the amplifier's settling time. In [11], the gate of the cascode transistor is directly connected to the input signals. By using that feed-forward scheme, further improvements in the OTA phase margin are obtained due to the presence of a high-frequency zero. A major drawback of this technique, however, is that the gate-drain capacitors of the cascode transistors affect the precision of the system, especially for SC circuits. This drawback has been partially solved by using cross-coupled capacitors [12].
Complementary differential pairs have been used for a long time in the design of rail-to-rail amplifiers [16]. They can also be used for fast amplifiers [17], where all cascode transistors can be exploited as shown in Figure 6. It can be shown that the small-signal transconductance of the complementary OTA is given by (9), where $P$ denotes $g_{m3}(g_{m9}/g_{m7})(C_N/C_P)$, and $C_N$ and $C_P$ are the parasitic capacitors lumped at the sources of transistors M7 and M9, respectively. According to this result, if the poles at the sources of M7 and M9 are placed at the same frequency, the overall small-signal transconductance becomes $G_m(0) = g_{m1} + g_{m2}$ with a single pole located at $\omega = g_{m7}/C_N = g_{m9}/C_P$. In general, two signal paths generate poles located at different frequencies, leading to the so-called "phantom zero"; this term is used because no physical element generates the zero, which instead results from the addition of signal components with slightly different phases. The overall current consumption is $4I_B$, the same as that of the FC OTA previously discussed. For the same overall current and the same input capacitance, its small-signal transconductance is around 15%-20% higher than that of the FC OTA. A downside is the introduction of the parasitic pole associated with the PMOS cascode transistor. Moreover, the addition of the signal paths generates a zero at a lower frequency than the pole associated with the NMOS cascode devices. Also, the input common-mode range over which the transconductance is maximized is limited. The slew rate, on the other hand, is 33% higher because the sourced or sunk current can be as high as $4I_B/3$.
The current-mirror cascode OTA shown in Figure 7 has a nondominant pole at the gate of M3 in addition to the pole of the cascode transistor M7. The overall small-signal transconductance is given by (10), where $g_{m3}$ ($g_{m7}$) is the transconductance of transistor M3 (M7), $C_{N1}$ ($= C_{GS3}(1 + N)$) is the capacitance associated with the gate of M3, $C_{N2}$ is the capacitance associated with the source of the cascode device M7, and $N = g_{m5}/g_{m3}$.
The current-mirror cascode OTA suffers from a similar limitation as the FC OTA; during negative slewing, only half of the drain current of M5 is employed in discharging the load capacitance because the DC current provided by M11 cancels the other half. However, a larger fraction of the overall current used can be transferred to the load if N > 1.
With a current gain greater than 1 in the current mirror, the size of the input transistors can be reduced for the same GBW as the FC OTA. Although this decreases the input capacitance, the parasitic capacitance at the gate of M3 increases, which pushes the nondominant pole to lower frequencies. Also, for the same power consumption, $N > 1$ increases the current levels at the output stage, thereby lowering the OTA's DC gain. Nonetheless, if the current-mirror OTA is designed with sufficient phase margin, it may settle faster than the FC OTA because of its enhanced slew rate and smaller input capacitance.
A recycling folded-cascode (RFC) OTA built by combining the conventional FC and the current-mirror OTAs is depicted in Figure 8 [18]. This architecture shares the benefits of the two OTAs from which it is created, but without sharing their limitations. It is named the recycling folded-cascode because it reconfigures the same devices of an FC and reuses previously idle current in the signal path with virtually no increase in silicon area. In the FC OTA of Figure [...] It can be shown that the transconductance of the RFC is given by (11), where $g_{m1}$ is the same as that of the original FC ($g_{m1} = 2g_{m1a}$), and $C_N$ is the lumped capacitance at the source of M5. With $N = 3$, the low-frequency transconductance of the RFC is found to be twice that of the original FC for the same power consumption. Unlike in the current-mirror OTA, the increase in the RFC transconductance does not come at the expense of increasing the output current and reducing the output impedance. As far as bandwidth is concerned, the input signal follows two paths to the output: M2b-M3b-M3a-M5 creates a current-mirror OTA, while the feed-forward path M1a-M5 creates an FC OTA. Since the signal parts add in phase at the source of M5, an LHP zero is created by the feed-forward path, which partially compensates the negative phase shift induced by $\omega_{p1}$. Since all the poles and the zero of the RFC are associated with NMOS devices, they are naturally at high frequencies and will not introduce slow settling components as long as $N$ is kept moderately small. In fact, the pole-zero pair associated with the current mirrors M3a,b and M4a,b can be placed beyond the OTA unity-gain frequency, $\omega_u$. Suppose that a condition is imposed such that $\omega_{p1} > 3\omega_u$; then an upper bound is placed on $N$ as described by (12), with $\omega_{p1} \cong g_{m3b}/[C_{GS3b}(1 + N)]$ and $\omega_{p2} \cong g_{m5}/C_N$. Given the RFC modifications, the slew rate is also improved.
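The bound on $N$ follows directly from the mirror-pole expression: requiring $g_{m3b}/[C_{GS3b}(1 + N)] > 3\omega_u$ gives $N < g_{m3b}/(3\omega_u C_{GS3b}) - 1$. The sketch below evaluates that inequality. It assumes, for simplicity, that $g_{m3b}$ and $C_{GS3b}$ do not change as $N$ is varied, which may not hold in a real sizing procedure; the function names and device values are hypothetical.

```python
import math


def mirror_pole(gm3b, cgs3b, n):
    """Mirror pole omega_p1 ~= g_m3b / (C_GS3b * (1 + N)), in rad/s."""
    return gm3b / (cgs3b * (1.0 + n))


def max_mirror_ratio(gm3b, cgs3b, omega_u, margin=3.0):
    """Largest N that still keeps omega_p1 > margin * omega_u.

    Derived by solving g_m3b / (C_GS3b * (1 + N)) > margin * omega_u for N.
    Assumes g_m3b and C_GS3b are fixed (a simplification).
    """
    return gm3b / (margin * omega_u * cgs3b) - 1.0


# Hypothetical device values, for illustration only.
gm3b = 1.0e-3                    # S
cgs3b = 20.0e-15                 # F
omega_u = 2.0 * math.pi * 200e6  # rad/s, i.e. a 200 MHz unity-gain frequency

n_max = max_mirror_ratio(gm3b, cgs3b, omega_u)
print(f"N must stay below {n_max:.2f}")
print(f"omega_p1 / omega_u at N = 3: {mirror_pole(gm3b, cgs3b, 3) / omega_u:.2f}")
```

For these example values, $N = 3$ leaves the mirror pole well above $3\omega_u$, consistent with the recommendation to keep $N$ moderately small.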
Assuming a single-ended load $C_L$, the slew rates of the original FC and the current-mirror OTAs are $2I_B/C_L$ and $2NI_B/C_L$, respectively. Now consider the RFC when a large signal is applied at the input. As $V_{in+}$ approaches $V_{DD}$, transistors M1a,b shut off, which forces transistors M4a,b and M6 to shut off. Hence the total current available to charge the [...]. Thus, M9 is sourcing $I_B(N - 1)/2$ while M3a is sinking $2NI_B$, so that the capacitance at $V_{out-}$ is discharged by $I_B(3N + 1)/2$. This differential imbalance in the charging and discharging of $V_{out+}$ and $V_{out-}$ is quickly converted to a common-mode error and corrected by the common-mode feedback (CMFB), and the result is a maximum symmetrical slew rate of $2NI_B/C_L$. While it is clear that the slew rate of the RFC is enhanced over that of the original FC, the same may not be so obvious when it comes to the current-mirror OTA. However, for the same power consumption, the value of $N$ used in the current-mirror OTA is 1, whereas for the RFC $N$ is 3; the slew rate is therefore also enhanced over that of the current-mirror OTA. In the design of any OTA, however, the slew rate will be restricted by the size and biasing conditions of the devices in the signal path, which will limit the slew rate to a smaller value than in theory, especially for low-voltage implementations. An aspect worth examining is the overall efficiency. If efficiency is defined as the ratio of generated small-signal current to total DC current, that is, $G_m(0)/I_{total}$, then the efficiencies of the original FC, current-mirror, and RFC OTAs are given by (13). The RFC is clearly the most efficient OTA. Although the current-mirror OTA is almost as efficient as the RFC, its increased efficiency comes at the expense of a large $N$, which drastically affects its $G_m$ pole locations and limits its bandwidth, whereas the efficiency of the RFC is independent of $N$.
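The slew-rate expressions above are simple enough to tabulate. The following sketch evaluates the ideal formulas $2I_B/C_L$ and $2NI_B/C_L$ and checks the discharge-current bookkeeping at $V_{out-}$. The bias current and load used here are assumptions picked for illustration (an $I_B$ of 200 μA and a 3.6 pF load); since the text notes that practical slew rates fall below the ideal values, the printed numbers should be read as upper bounds, not predictions of the measured results.

```python
def slew_rate_fc(i_b, c_l):
    """Ideal FC slew rate 2*I_B / C_L, in V/s."""
    return 2.0 * i_b / c_l


def slew_rate_rfc(i_b, c_l, n):
    """Ideal RFC symmetrical slew rate 2*N*I_B / C_L, in V/s."""
    return 2.0 * n * i_b / c_l


def vout_minus_discharge_current(i_b, n):
    """Net current discharging V_out-: M3a sinks 2*N*I_B, M9 sources I_B*(N-1)/2."""
    sunk_by_m3a = 2.0 * n * i_b
    sourced_by_m9 = i_b * (n - 1) / 2.0
    return sunk_by_m3a - sourced_by_m9


# Assumed values for illustration only.
i_b = 200e-6   # A
c_l = 3.6e-12  # F
n = 3

sr_fc = slew_rate_fc(i_b, c_l)
sr_rfc = slew_rate_rfc(i_b, c_l, n)
print(f"ideal FC  slew rate: {sr_fc / 1e6:.1f} V/us")
print(f"ideal RFC slew rate: {sr_rfc / 1e6:.1f} V/us")
print(f"V_out- discharge current: {vout_minus_discharge_current(i_b, n) * 1e6:.0f} uA")
```

The net discharge current reproduces the $I_B(3N + 1)/2$ figure quoted in the text, and the ideal RFC-to-FC slew-rate ratio equals $N$.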
More importantly, the efficiency of the RFC is the same as that of a telescopic OTA (total telescopic current is $2I_B$), but the RFC has a wider input common-mode range and larger output swing.
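The efficiency comparison can be checked with the figures already stated in the text: the FC delivers $g_{m1}$ from a total current of $4I_B$, the RFC with $N = 3$ delivers $2g_{m1}$ from the same $4I_B$, and the telescopic OTA delivers $g_{m1}$ from $2I_B$. The sketch below is only a consistency check of those three cases; the current-mirror OTA is left out because its expression is not reproduced here, and the numerical values of $g_{m1}$ and $I_B$ are hypothetical.

```python
def efficiency(gm_dc, i_total):
    """Efficiency G_m(0) / I_total, in S/A (equivalently 1/V)."""
    return gm_dc / i_total


# Hypothetical values for illustration only.
gm1 = 2.0e-3  # S, transconductance of the original FC input device
i_b = 200e-6  # A

eff_fc = efficiency(gm1, 4.0 * i_b)          # FC: G_m(0) = g_m1, I_total = 4*I_B
eff_rfc = efficiency(2.0 * gm1, 4.0 * i_b)   # RFC with N = 3: G_m(0) = 2*g_m1
eff_telescopic = efficiency(gm1, 2.0 * i_b)  # telescopic: I_total = 2*I_B

print(f"FC:         {eff_fc:.2f} 1/V")
print(f"RFC (N=3):  {eff_rfc:.2f} 1/V")
print(f"telescopic: {eff_telescopic:.2f} 1/V")
```

Under these assumptions the RFC and telescopic figures coincide and are twice the FC figure, matching the claim in the text.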

Folded-Cascode OTA Case Study
This enhanced efficiency of the RFC can be viewed from another angle. If the RFC is able to achieve twice the transconductance and more than twice the slew rate ($N = 3$) of the original FC while using the same power and silicon area, then the RFC must be able to achieve the same transconductance and slew rate as the original FC using significantly less power and silicon area. Indeed, if we take the RFC of Figure 8 and reduce the width of all devices by a factor of 2, it will achieve a performance similar to that of the original FC while using only half the power and half the area, which also means half the input capacitance. To demonstrate this, three OTAs were designed in TSMC 0.18 μm CMOS technology with a 1.8 V supply: an FC and two RFC OTAs. One of the recycling folded-cascodes, RFC1, uses the same power and area as the FC, while the second, RFC2, uses only half the power and half the area. The setup in Figure 9 was used to characterize the different OTA aspects. To preserve the high output impedance of the OTAs and limit the DC output current drawn, $R$ was set to 560 kΩ. $C_1$ and $C_2$ were set to 2.2 pF and 2.5 pF, respectively, which yields an overall load of 3.6 pF. As seen in Figure 10, RFC1 indeed has a wider bandwidth, whereas RFC2 has virtually the same bandwidth as the FC; this was anticipated from the analysis in the preceding section. While RFC1 has +6 dB gain due to an enhanced $G_m$, RFC2 has +6 dB gain because it consumes half the current; the additional 2-4 dB improvement is attributed to the enhanced output impedance. The enhancement in the RFC output resistance $R_o$ is due to the increased $r_{ds}$ of M1a and M3a, as they conduct less current than their counterparts M1 and M3 of the FC. Therefore, an overall low-frequency gain enhancement of 8-10 dB can be seen in the RFC compared to the FC, as seen in Figure 10.
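The +6 dB figures quoted above are what a factor of two looks like on a decibel scale. The short sketch below makes that arithmetic explicit, treating the low-frequency gain as the product $G_m R_o$. Reading the RFC2 case as a doubling of output resistance when the current is halved is an assumption (it relies on $r_{ds}$ being roughly inversely proportional to drain current); the helper name `to_db` is hypothetical.

```python
import math


def to_db(voltage_ratio):
    """Convert a voltage-gain ratio to decibels."""
    return 20.0 * math.log10(voltage_ratio)


# RFC1: transconductance doubled relative to the FC (stated in the text).
gain_from_gm = to_db(2.0)

# RFC2: half the current. Assumption: r_ds scales roughly as 1/I_D, so the
# output resistance doubles while G_m matches that of the FC.
gain_from_ro = to_db(2.0)

# Extra output-impedance improvement of 2 to 4 dB reported in the text.
total_low = gain_from_gm + 2.0
total_high = gain_from_gm + 4.0

print(f"factor-of-two gain change: {gain_from_gm:.2f} dB")
print(f"overall enhancement range: {total_low:.1f} to {total_high:.1f} dB")
```

The resulting range of about 8 to 10 dB agrees with the overall enhancement reported for Figure 10.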
The phase response shows some degradation for both RFC1 and RFC2 with respect to the FC. This is to be expected: as discussed earlier, the addition of current mirrors in the signal path (M3a,b-M4a,b) introduces a pole-zero pair. However, by satisfying the condition set by (12) [...] For Figure 11, the input signal was a 500 mVpp, 10 MHz pulse with a common-mode level of 450 mV. RFC1 clearly has a better slew rate than the FC, as seen in Figure 11(a). RFC2 also has a better slew rate, which is seen more clearly in Figure 11(b) as a higher peak output current. Moreover, the settling behavior of both RFC1 and RFC2 was not affected by the phase margin degradation in comparison to the FC.
As for noise, RFC1 shows better performance than the FC. Intuitively, the enhanced transconductance of RFC1 reduces the noise when referred to the input. This, however, is counteracted by an increased output noise due to contributions by M3b and M4b, which are amplified by $N^2$. Considering that the output current thermal and flicker noise PSD of a MOS device can be expressed as (14), it can be demonstrated that the input-referred thermal noise PSDs ($\overline{v_{iT}^2}$) of the FC and RFC1 are given by (15) and (16).
The noise performance improvement of RFC1 ($N = 3$) is hence explained by two smaller terms in (17) [...] A summary of the discussed results is shown in Table 1.
Figure 12: Three-stage amplifiers with nested Miller compensation.

VLSI Design
Figure 13: Two-stage Miller compensation.
The inverting amplifiers are not needed if differential stages are used. DC gains of 90-100 dB can be achieved. Due to the three high-impedance nodes, double Miller compensation might be required for adequate phase margin. The classic two-stage Miller compensation scheme is shown in Figure 13. The open-loop dominant pole, $\omega_{p1} \cong g_{01}/(A_{v2} C_m)$, is pushed to lower frequencies by the increase in effective capacitance formed by the compensation capacitor, $C_m$, and the gain of the second stage, $A_{v2}$. This decreases the open-loop unity-gain frequency $\omega_u$ ($\sim g_{m1}/C_m$) and results in a slower settling time. The nondominant pole is mainly given by $g_{m2}/(C_{01} + C_{02})$. For good stability, the condition $g_{m2}/(C_{01} + C_{02}) > g_{m1}/C_m$ must be satisfied. However, high-frequency SC circuits may require large load capacitors that force a large $g_{m2}$ and further increase the power consumption and the capacitor $C_{01}$. Feed-forward compensation techniques have been used to boost the DC gain of OTAs, especially for low-frequency applications [25, 29]. Figure 14 shows the simplified schematic of the compensation scheme. The NCFF compensation scheme does not employ any compensation capacitor but uses a left-half-plane (LHP) zero to obtain a good phase response. It can be found that the open-loop small-signal transconductance gain is $G_m(s) \cong (A_{v1} g_{m2} + g_{m3}) \dfrac{1 + s/[(g_{m2}/g_{m3}) A_{v1} \omega_p]}{1 + s/\omega_p}$, where $A_{v1}$ is the DC gain of the first stage ($= g_{m1}/g_{01}$), and the dominant pole of the first stage is located at $\omega_p = g_{01}/C_{01}$. The DC transconductance is approximately given by $g_m = g_{m1} g_{m2}/g_{01}$. By using this OTA in the amplifier configuration shown in Figure 1(a), and according to (1), (2), (5), (6), and (25), the closed-loop zero and poles are located at the following frequencies: [...] Real poles are obtained if $g_{m3}$ is further increased, but the frequency of the closed-loop zero decreases, and slow components might appear.
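The location of the NCFF zero can be checked numerically. The sketch below assumes the open-loop transconductance is the sum of a two-stage path with one pole, $A_{v1} g_{m2}/(1 + s/\omega_p)$, and a frequency-independent feed-forward path $g_{m3}$; this parallel-path form is an interpretation of the scheme rather than an expression quoted from the paper. The transconductances and $C_{01}$ are the values quoted in this section for the simulated amplifier, while $g_{01}$ is a hypothetical value and the function names are illustrative.

```python
import math


def ncff_gm(s, a_v1, gm2, gm3, omega_p):
    """Assumed parallel-path model: A_v1*g_m2 / (1 + s/omega_p) + g_m3."""
    return a_v1 * gm2 / (1.0 + s / omega_p) + gm3


def ncff_zero(a_v1, gm2, gm3, omega_p):
    """Return (exact, approximate) LHP zero frequency in rad/s.

    exact:       (1 + A_v1*g_m2/g_m3) * omega_p
    approximate: (g_m2/g_m3) * A_v1 * omega_p, valid when A_v1*g_m2 >> g_m3
    """
    exact = (1.0 + a_v1 * gm2 / gm3) * omega_p
    approximate = (gm2 / gm3) * a_v1 * omega_p
    return exact, approximate


gm1, gm2, gm3 = 1e-3, 4e-3, 10e-3  # S, as quoted for the simulated amplifier
c01 = 0.25e-12                     # F, nominal first-stage output capacitance
g01 = 1e-6                         # S, hypothetical first-stage output conductance

a_v1 = gm1 / g01
omega_p = g01 / c01
zero_exact, zero_approx = ncff_zero(a_v1, gm2, gm3, omega_p)

print(f"first-stage pole: {omega_p / (2 * math.pi) / 1e6:.2f} MHz")
print(f"LHP zero (approx): {zero_approx / (2 * math.pi) / 1e6:.1f} MHz")
print(f"|G_m| at s = -zero_exact: {abs(ncff_gm(-zero_exact, a_v1, gm2, gm3, omega_p)):.2e} S")
```

In the approximate form the zero reduces to $g_{m1} g_{m2}/(g_{m3} C_{01})$, so it does not depend on the assumed $g_{01}$ and moves up in frequency as $C_{01}$ shrinks.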
The dominant pole and zero are close enough (mismatch < 10%) if a condition relating $g_{m1}$, $g_{m2}$, $g_{m3}$, and $g_{02}$ is met [...] Additional computations show that under this condition, the poles are located at [...] Notice that under these conditions, and with sufficient feedback, $\omega_z$ and $\omega_{p1}$ are very close to each other regardless of the absolute value of the load capacitors used; the root locus is similar to the one depicted in Figure 3. The frequencies of both $\omega_{p1}$ and $\omega_z$ increase, increasing the speed, if the parasitic capacitance at the output of the first stage, $C_{01}$, is reduced; this is an important design consideration. If $C_{01}$ is reduced, then complex poles might appear, but these can be tolerated; although some ringing appears in the transient response, a fast response results if the real part of the poles is sufficiently large. The SC amplifier of Figure 1(a) has been simulated using the NCFF architecture with transconductances $g_{m1}$, $g_{m2}$, and $g_{m3}$ set at 1 mA/V, 4 mA/V, and 10 mA/V, respectively. The amplifier DC gain is around 90 dB, because a telescopic amplifier is used for the first stage. Shown in Figure 15 [...] (2) $C_1$ = 0.5 pF, $C_2$ = 1 pF, and $C_{01}$ = 0.5 pF (large capacitance at the output of the first stage); (3) $C_1$ = 0.5 pF, $C_2$ = 1 pF, and $C_{01}$ = 0.75 pF (largest capacitance at the output of the first stage); (4) $C_1$ = 1 pF, $C_2$ = 2 pF, and $C_{01}$ = 0.25 pF (bigger input and integrating capacitors).
Although the variations in parameters are large, the 0.1% settling time is around 3.2 nanoseconds for cases 1 and 4. The pulse response is slower if $C_{01}$ increases (cases 2 and 3), where the 1% settling time is 3.3 and 7 nanoseconds, respectively. For comparison, a two-stage Miller amplifier with large transconductance stages was designed; the transconductances used are $g_{m1} = g_{m2}$ = 10 mA/V with a nominal $C_m$ of 2 pF, and a nulling resistor optimized for RHP zero cancellation is used. The amplifier DC gain is set at 90 dB. Shown in Figure 15 [...] (1) input and integrating capacitors of 0.5 pF and 1 pF, and $C_m$ = 2 pF; (2) input and integrating capacitors of 1 pF and 2 pF, and $C_m$ = 3 pF; (3) input and integrating capacitors of 1 pF and 2 pF, and $C_m$ = 4 pF.
Notice that the NCFF approach (nominal case, $C_{01}$ = 0.25 pF) can be faster than the Miller amplifier, even if the latter structure uses larger transconductances.
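To see why the Miller amplifier slows down as $C_m$ grows, the approximate relations given earlier for the two-stage Miller scheme, $\omega_u \sim g_{m1}/C_m$ and the stability condition $g_{m2}/(C_{01} + C_{02}) > g_{m1}/C_m$, can be evaluated for the three compensation capacitors used in the comparison. This is a rough sketch: the values of $C_{01}$ and $C_{02}$ are hypothetical, the function names are illustrative, and the first-order estimates ignore the feedback factor and the nulling resistor.

```python
import math


def miller_unity_gain(gm1, c_m):
    """Approximate open-loop unity-gain frequency omega_u ~ g_m1 / C_m (rad/s)."""
    return gm1 / c_m


def miller_nondominant_pole(gm2, c01, c02):
    """Approximate nondominant pole g_m2 / (C01 + C02) (rad/s)."""
    return gm2 / (c01 + c02)


def miller_condition_met(gm1, gm2, c_m, c01, c02):
    """True if g_m2 / (C01 + C02) > g_m1 / C_m."""
    return miller_nondominant_pole(gm2, c01, c02) > miller_unity_gain(gm1, c_m)


gm1 = gm2 = 10e-3              # S, as quoted for the Miller design
c01, c02 = 0.25e-12, 1.0e-12   # F, hypothetical node capacitances

for c_m in (2e-12, 3e-12, 4e-12):
    f_u = miller_unity_gain(gm1, c_m) / (2.0 * math.pi)
    ok = miller_condition_met(gm1, gm2, c_m, c01, c02)
    print(f"C_m = {c_m * 1e12:.0f} pF: f_u ~ {f_u / 1e6:.0f} MHz, condition met: {ok}")
```

With equal transconductances the condition reduces to $C_{01} + C_{02} < C_m$, which shows how a larger load forces either a larger $C_m$ (lower $\omega_u$) or a larger $g_{m2}$ (more power).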

Experimental and Simulated Results
The aforementioned FC, RFC1, and RFC2 OTA prototypes have been fabricated in a TSMC 0.18 μm CMOS process; a microphotograph of the chip is shown in Figure 16. The silicon areas of the amplifiers are 4700 μm², 4950 μm², and 3000 μm², and they were biased with total currents of 800 μA, 800 μA, and 400 μA, respectively. Input, integrating, and load capacitors of 2.2 pF, 2.2 pF, and 2.5 pF, respectively, were used. Equipment and PCB routing parasitics contribute an additional 2.1 pF, 3.4 pF, and 2.2 pF to the FC, RFC1, and RFC2, respectively. The amplifiers' pulse response is depicted in Figure 17 with no observable overshoot. The 1% settling times are 20.7 nanoseconds, 13.7 nanoseconds, and 20.8 nanoseconds, respectively.
Figure 18: Single-ended amplifier with NCFF compensation scheme.
A two-stage OTA using the NCFF compensation scheme was implemented in AMI 0.5 μm CMOS technology with supply voltages of ±1.25 V; the schematic is shown in Figure 18. The active area of the amplifier is around 0.16 mm². The bias current for the first stage is only $I_{B1}$ = 50 μA, and the one used in the second stage is $I_{B2}$ = 2 mA. For the feed-forward stage the tail current is $I_{B3}$ = 5 mA. The transistor aspect ratios are 960 μm/0.6 μm for the first differential pair, 600 μm/0.9 μm for the second stage, and 120 μm/0.9 μm for the feed-forward path. According to [25] and [27] the pole-zero matching should be fairly good. Postlayout simulations show that for a load capacitance of 8 pF and a step of 300 mV, the 1% settling time of the OTA was 5.1 nanoseconds. Neither overshoot nor low-frequency components were observed. The postlayout simulation results for a single-ended OTA show a DC gain of 91 dB, a GBW of 325 MHz, and a slew rate of 140 V/μs. An inverting amplifier, similar to the one shown in Figure 1(a), was experimentally tested. For the test setup, external capacitors of 5 pF were employed. The total effective load capacitance was 12 pF (estimated from the measurement equipment probe capacitance and the package bond-pad capacitance). Transient postlayout results for a 400 mV peak signal are shown in Figure 19(a); the amplifier response corresponds to that of a typical first-order system. The 1% settling time is around 6.5 nanoseconds; the first 1 nanosecond is associated with slew rate limitations while 5.6 nanoseconds correspond to linear settling.
The chip was measured, and the 1% settling time for an input step of 800 mV was 17 nanoseconds, as depicted in Figure 19(b), which divides into roughly 12 nanoseconds in the slew-rate-limited phase and 5 nanoseconds in the bandwidth-limited settling phase. For these results, the input edge had a fall time of around 3 nanoseconds due to PCB and bond-pad parasitics (a DIP-40 package was used) and equipment loading effects. The output step response has no ringing, which indicates a good phase margin. Postlayout simulation results for the amplifier with a 4 nanosecond fall-time input step, parasitic capacitors at the OTA input of 3 pF, and a load capacitor of 12 pF show a 1% settling time of around 13.5 nanoseconds, which is in good agreement with the measured results.
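The split between slew-limited and bandwidth-limited settling can be related to an equivalent time constant if the linear phase is treated as a single-pole exponential, for which settling to within a fraction $\varepsilon$ takes $\tau \ln(1/\varepsilon)$. The sketch below applies that idealized first-order model to the measured 5 nanosecond linear phase. The model is an assumption made for illustration (it is not a result reported in the text), and the function names are hypothetical.

```python
import math


def linear_settling_time(tau, epsilon):
    """Time for a single-pole response to settle within epsilon: tau * ln(1/epsilon)."""
    return tau * math.log(1.0 / epsilon)


def time_constant_from_settling(t_linear, epsilon):
    """Invert the single-pole model: tau = t_linear / ln(1/epsilon)."""
    return t_linear / math.log(1.0 / epsilon)


def total_settling_time(t_slew, tau, epsilon):
    """Simple two-phase estimate: slew-limited time plus linear settling time."""
    return t_slew + linear_settling_time(tau, epsilon)


# Measured split quoted in the text: about 12 ns slewing plus 5 ns linear (1%).
t_slew, t_linear, epsilon = 12e-9, 5e-9, 0.01

tau = time_constant_from_settling(t_linear, epsilon)
f_cl = 1.0 / (2.0 * math.pi * tau)

print(f"equivalent time constant: {tau * 1e9:.2f} ns")
print(f"implied closed-loop bandwidth: {f_cl / 1e6:.0f} MHz")
print(f"total 1% settling estimate: {total_settling_time(t_slew, tau, epsilon) * 1e9:.1f} ns")
```

Under this model, tightening the accuracy target from 1% to 0.1% lengthens only the linear phase, by a factor of $\ln(1000)/\ln(100) = 1.5$.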

Conclusions
Feed-forward techniques can improve the speed of closed-loop switched-capacitor networks. It has been shown that the recycling folded-cascode OTA presents a higher slew rate and better settling performance than the conventional folded-cascode OTA for the same power consumption. The pole-zero pair present in feed-forward topologies must be placed at high frequencies to avoid slow settling components. Another important advantage of feed-forward schemes is that the gain enhancement and the smaller parasitic capacitance presented at the input reduce the error after settling compared with that obtained with the regular folded-cascode OTA. The NCFF compensation scheme enables both high gain and fast settling, resulting in an accurate and fast step response. LHP zeros are used to cancel the phase shift of poles to obtain a good phase margin. The effect of pole-zero mismatches on the performance of feed-forward amplifiers was studied, and it was shown that the pole-zero cancellation should occur at high frequencies for the best settling time performance. Simulation and experimental results for the amplifiers are in accordance with the theoretical derivations.