A Design of a New Column-Parallel Analog-to-Digital Converter Flash for Monolithic Active Pixel Sensor

The CMOS Monolithic Active Pixel Sensor (MAPS) for the International Linear Collider (ILC) vertex detector (VXD) places stringent requirements on its analog readout electronics, specifically on the analog-to-digital converter (ADC). This paper presents the design and optimization of a new architecture for a low-power, high-speed, small-area 4-bit column-parallel Flash ADC. We propose to insert a sample-and-hold (S/H) block in the converter. The S/H block increases the sensitivity of the converter to the very small amplitude of the sensor signal and gives the converter sufficient time to encode the input signal. The ADC is developed in a 0.18 μm CMOS process with a pixel pitch of 35 μm. The proposed ADC meets the constraints of power dissipation, size, and speed for a MAPS composed of a matrix of 64 rows and 48 columns, where each column ADC covers a small area of 35 × 336.76 μm². The proposed ADC consumes low power at a 1.8 V supply and a 100 MS/s sampling rate with a dynamic range of 125 mV. Its DNL and INL are 0.0812/−0.0787 LSB and 0.0811/−0.0787 LSB, respectively. Furthermore, this ADC achieves a speed of more than 5 GHz.


Introduction
CMOS Monolithic Active Pixel Sensors (MAPS) are charged particle tracking devices that integrate radiation-sensitive detector elements and their front-end readout electronics on the same silicon substrate. In recent years, MAPS [1–3] have evolved into an interesting alternative for meeting the requirements of vertex detectors in future high energy physics and biomedical imaging applications, compared to existing detectors such as Charge Coupled Devices (CCD) [4–9] or Hybrid Pixel Detectors (HPD) [10]. MAPS offer many advantages, such as high spatial resolution, low fabrication cost, low power, radiation hardness, compactness, random access, and fast readout. Nevertheless, despite these advantages, several challenges remain for the future International Linear Collider (ILC) vertex detector (VXD) [11]. Three requirements must be met to deal with the very small amplitude of the pixel signal (around a couple of millivolts). First, the readout chain must have low noise. Second, the readout circuitry must be fast in order to achieve an integration time ranging from 10 to 100 μs. Third, a further decrease in power consumption and active area is strongly desirable.
The popularity of the column-parallel readout architecture, which improves the readout speed and allows reading up to 10 k frames/s, has led to the design and fabrication of more than 30 different minimum ionizing particle MOS active pixel sensors (MIMOSA) [12,13]. Sensors equipping the innermost layer of the ILC VXD must show a single point resolution better than 3 μm together with a very short integration time (less than 10 μs) because of the beamstrahlung background. This prerequisite encourages an R&D effort centred on a high readout speed design. A sensor with a small pixel pitch of 16 μm (called MIMOSA-30), terminated by a discriminator, has been proposed [14–17]. The larger sensors for the outer layers, which account for about 90% of the total VXD surface, are less constrained in terms of spatial resolution and readout speed. To minimize the power consumption, a single point resolution of 3-4 μm is combined with an integration time shorter than 100 μs, which is expected to form a valuable trade-off. A larger pixel pitch of 35 μm combined with a 4-bit ADC has been proposed [18,19], reducing the power consumption without losing spatial resolution.
Different kinds of ADC architectures have been studied by several researchers [18,20–33]. The chosen ADC architecture determines how well the targets listed below can be met. In the literature, the pipeline [20], double ramp [21], and successive approximation register (SAR) [18,22,23] architectures are notable for achieving the needed specifications. Dahoumane et al. [20,24] and Bouvier et al. [25] have shown that the pipeline architecture can reach a high speed; however, it requires several operational amplifiers, which results in a large power dissipation. Pillet et al. [21] have shown that the double ramp architecture can achieve low power consumption and small area, but it is not suitable for a conversion speed of 1 M samples/s. The SAR architecture proposed by Zhang et al. [18,22,23] requires several comparison cycles to complete one conversion and therefore has a limited operating speed. Each ADC architecture thus has advantages and weaknesses that make it more suitable for some applications than for others. Several key figures, such as the accuracy in bits, the power dissipation, and the conversion speed, are used to build formulae that compare different architectures [2]. Note that, although the pipeline, double ramp, and SAR architectures have been widely used in the literature to design MAPS readout, no work has addressed the use of a Flash ADC architecture where the power consumption of the column ADC is a very critical issue. For this reason, we propose in this paper a new low-power, high-speed, small-area 4-bit column-parallel Flash ADC architecture. In this proposal, we insert an S/H block in the converter.
The S/H block increases the sensitivity of the converter to the very small amplitude of the input signal from the sensor (around a couple of millivolts) and gives the converter sufficient time to encode the input signal. The ADC is developed in a 0.18 μm CMOS process with a pixel pitch of 35 μm. This paper describes the design of a column-parallel ADC suitable for the outer layer CMOS sensors, where the design of a new 4-bit column-parallel Flash ADC is constrained by several factors:
(i) The technology used must be the one already validated for the pixels.
(ii) The ADC must convert a continuous signal and hence must not have dead time.
(iii) In order to achieve an integration time of 100 μs or less in a full size sensor (about 2 × 2 cm²), the ADC serving the pixel readout in parallel is required to work at a frequency of 100 MHz (10 ns/row).
(iv) The layout has to fit the dimensions of the pixel (width 35 μm).
(v) The readout chain should introduce very little noise in order to handle the modest pixel signal (around a couple of millivolts).
(vi) The power consumption of the ADC must be minimized.
The simulation results show that the proposed ADC meets the constraints of power dissipation, size, and speed for MAPS sensors, and they are compared with the results of Zhang et al. [18,22,23], Dahoumane et al. [20,24], and Bouvier et al. [25]. The ADC must be compact, fast (sampling frequency of 100 MS/s), very low in power dissipation, and responsive to a minimum signal of approximately 7.81 mV. The minimum signal delivered by each column is typically of the order of millivolts, which is the first challenge in the design of the readout circuit. The choice of this ADC is a compromise between the granularity and spatial resolution of the sensor, the size, and the power dissipation.
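As a worked check, the 7.81 mV figure is consistent with one least significant bit of a 4-bit converter over the 125 mV dynamic range quoted in the abstract, and the 10 ns/row figure is the period of the 100 MHz clock. Reading the minimum signal as one LSB is an assumption made here for illustration; the symbols $V_{range}$, $N$, and $T_S$ are not taken from the original text.

```latex
V_{LSB} = \frac{V_{range}}{2^{N}} = \frac{125\ \text{mV}}{2^{4}} = 7.8125\ \text{mV} \approx 7.81\ \text{mV},
\qquad
T_S = \frac{1}{f_S} = \frac{1}{100\ \text{MHz}} = 10\ \text{ns}
```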

ADC Design
The global architecture of the MAPS chip, comprising the pixel array with its associated readout electronics and conversion stages, is shown in Figure 1.
Unlike the conventional Flash ADC architecture, where the input signal from the sensor is connected directly to the comparators, we propose to insert an S/H block in the converter. The S/H block (i) increases the sensitivity of the converter to the very small amplitude of the input signal from the sensor (around a couple of millivolts) and (ii) gives the converter sufficient time to encode the input signal.
Furthermore, the architecture of this S/H block was optimized to use a minimum number of components capable of performing several signal conditioning operations. The ADC converts the pixel output signal using a new Flash ADC architecture based on a multiplexer based encoder and a specific sample-and-hold (S/H) circuit, as shown in Figure 2. The main components are a sample-and-hold (S/H) circuit, a resistive voltage divider, a series of comparators, a multiplexer based encoder, and a DFF register. The S/H circuit samples and amplifies the pixel signal. The voltage divider, a ladder of resistors placed in series, generates the reference voltages of the comparators. For the 4-bit converter, a ladder with 16 resistors is needed, so the maximum voltage is divided by 16. The comparator stage is composed of 15 comparators, each including a buffer and a preamplifier used to adapt the level of the reference voltages supplied by the voltage divider. The output of a comparator is 1.8 V when the input voltage is greater than the corresponding reference voltage and 0 V otherwise. The multiplexer based encoder uses 2 : 1 multiplexers; it requires 11 multiplexers for the 15 inputs and 4 inverters to convert the thermometer code to binary code. It should be noted that at this step the output signals are not synchronous. To solve this problem, a DFF register built from four latch-based flip-flops is used to deliver a synchronous binary signal. The output consists of 4 bits that come out in parallel.
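The comparator bank described above can be sketched behaviourally as follows. This is a minimal illustration, not the authors' circuit: it assumes an ideal, equally spaced 16-resistor ladder spanning the 125 mV dynamic range, thresholds referred to the ADC input (the S/H gain is ignored), and ideal comparators. The function names are hypothetical.

```python
V_RANGE = 0.125  # dynamic range in volts (125 mV, from the text)
N_BITS = 4       # resolution of the converter


def ladder_references(v_range=V_RANGE, n_bits=N_BITS):
    """Reference voltages at the 15 taps of an ideal 16-resistor ladder."""
    n_steps = 2 ** n_bits
    return [v_range * k / n_steps for k in range(1, n_steps)]


def comparators(v_in, refs):
    """Ideal comparator bank: 1 where the input exceeds the reference."""
    return [1 if v_in > ref else 0 for ref in refs]


def flash_convert(v_in):
    """Output code as the number of comparators that are high."""
    return sum(comparators(v_in, ladder_references()))


if __name__ == "__main__":
    for millivolts in (0, 10, 40, 124):
        v_in = millivolts / 1000
        print(millivolts, comparators(v_in, ladder_references()), flash_convert(v_in))
```

Counting the ones is only a stand-in for the encoder; the paper uses a multiplexer based encoder for that step.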

Proposed Sample-and-Hold Circuit (S/H).
To increase the sensitivity of the converter to the low amplitude of the input signal from the sensor (around a couple of millivolts) and to give the converter sufficient time to encode the input signal, we propose to insert an S/H block in the converter. As shown in Figure 3, the proposed sample-and-hold (S/H) circuit consists of an operational transconductance amplifier (OTA) with output feedback, a hold capacitor, and a switch operating at the sampling frequency. When the switch is closed, the voltage across the capacitor follows the voltage to be converted. When the switch is open, the voltage across the capacitor no longer follows the changes of the signal to be converted. A transmission gate (TG) is used as the switch, and the hold capacitor $C_0$ has a value of 250.8 fF.
The idea behind using a transmission gate (TG) as the switch is to reach the maximum sampling frequency. In general, the upper limit of the sampling frequency depends on the type of switch used; with a TG as the switch, sampling frequencies from around 6.25 MHz to 5 GHz can be reached without strongly affecting the output. The main advantage of this architecture is that the charge injection error and the clock feedthrough error are effectively removed. This type of S/H therefore offers very high accuracy, and the architecture can achieve high speed and low noise. The gain of this S/H circuit is expressed in terms of $A_V$, the gain of the operational transconductance amplifier (OTA) circuit. The value of $R_{ON,TG} C_0$ is chosen to limit the $kT/C$ noise effect. In order to maximize the gain, $R_{ON,TG} C_0$ should be minimized; however, that would produce a large parasitic capacitance in the layout and require a large drive current from the OTA. Consequently, a trade-off must be made between gain and power. Figure 4 shows the simulation of our S/H for an input signal frequency $f_{in}$ = 10 MHz and a sampling frequency $f_S$ = 100 MHz.
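The sampling condition of Figure 4 can be illustrated with an idealized sample-and-hold model. This is a minimal sketch and not the simulated circuit: it assumes an ideal switch and an OTA of unity gain, and the sine amplitude and offset (chosen to span the 125 mV range) are assumptions. All names are hypothetical.

```python
import math


def sample_and_hold(signal, f_s, t_end, points_per_period=10):
    """Sample signal(t) at the start of each clock period and hold it.

    Returns a list of (time, held_value) pairs, with points_per_period
    points in every sampling period.
    """
    t_s = 1.0 / f_s
    n_periods = int(round(t_end * f_s))
    trace = []
    for n in range(n_periods):
        held = signal(n * t_s)  # value captured at the sampling instant
        for k in range(points_per_period):
            trace.append((n * t_s + k * t_s / points_per_period, held))
    return trace


F_IN = 10e6   # input frequency, 10 MHz (from the text)
F_S = 100e6   # sampling frequency, 100 MHz (from the text)


def sine(t):
    """Assumed test input: sine centred in the 125 mV range."""
    return 0.0625 + 0.0625 * math.sin(2 * math.pi * F_IN * t)


# One input period gives f_S / f_in = 10 held levels.
trace = sample_and_hold(sine, F_S, t_end=1 / F_IN)
```

With these two frequencies each input period is represented by ten held levels, and each level stays constant for 10 ns, which is the time available to the comparators and encoder.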

Operational Amplifier Circuit.
In the sample-and-hold circuit, the operational amplifier is essential for obtaining accurate results [34]. We propose the use of an operational transconductance amplifier with a gain of about 103 dB for a bias current of 9.5 μA, with $V_{DD}$ = 1.8 V and $V_{SS}$ = 0 V.
The load capacitor is 0.1 pF. The proposed amplifier is composed of three stages: a differential input stage that drives an active load, a gain stage that increases the gain, and an output stage that can be added to drive large off-chip loads. This configuration offers a good common mode range, output swing, voltage gain, and common-mode rejection ratio (CMRR) in a simple circuit that can be compensated by a capacitor and a resistor.
The performance of the proposed OTA is summarized in Table 1. Figure 5 shows the open loop gain and phase margin of the proposed OTA.

Comparator Circuit.
Among the different architectures in the literature, we chose a static comparator [34], which has the advantage of a low offset and reduced switching noise at the input. The proposed comparator consists of three stages. The first stage is a differential input pair formed by the P-channel transistors PM1 and PM2, loaded by an active load formed by the NMOS transistors NM1 and NM2 and biased by transistor PM3. The second stage is added to increase the differential mode gain. The last stage consists of two inverters (NOT gates) whose role is to produce a clean switching transition.
The performance figures of the comparator, namely the open loop gain, the slew rate (SR), the input common mode range (ICMR), offset, bandwidth, settling time, and power dissipation, are shown in Table 2.

The Digital Part.
The digital part is composed of an encoder and a register. The encoder converts the thermometer code delivered by the comparator stage into a binary signal. It should be noted that at this step the output signals are not synchronous. To solve this problem, a DFF register built from four latch-based flip-flops is used to deliver a synchronous binary signal. The output consists of 4 bits that come out in parallel. In the literature, Säll et al. [35] and Wallace [36] have shown that the Wallace tree based decoder uses a ones counter; the output is the decoded binary code, and the decoder also applies global bubble error correction/suppression. This approach therefore has the benefit of bubble suppression, but its disadvantage is a large delay and power. Lee et al. [37] have shown that the Fat tree based decoder architecture is efficient in power consumption and delay, with reduced area and delay in comparison to the Wallace tree based decoder. A more optimized implementation of the Fat tree based encoder is presented by Hiremath and Ren [38]. This approach further reduces the array of OR gates into NAND-NOR pairs, with the NAND-NOR gates implemented in pseudodynamic CMOS logic. Säll and Vesterbacka [39] have proposed another architecture, the existing MUX based decoder, which results in a short critical path and small area. Nevertheless, this architecture results in a huge fan-out in the critical path, and the increased fan-out causes increased power consumption and delay. Note that the Wallace tree based decoder, the Fat tree based decoder, and the existing MUX based decoder architectures have been widely used in the literature to design ADCs. Up to now, no work has been done to improve the multiplexer based encoder architecture where the power consumption is a very critical issue.
For this reason, we propose a new low-power, high-speed, small-area 4-bit encoder architecture. The multiplexer based encoder circuit uses 2 : 1 MUX, so 11 MUX are required for the 15 inputs, together with 4 inverters, to convert thermometer codes to binary codes. A 2 : 1 MUX has two input signals and one select line; the select line takes the logic value 0 or 1, and depending on this value the MUX transmits one of its two inputs. The most significant bit (MSB) of the binary output is equal to the middle bit of the thermometer code because the encoder follows the twin logic. The logical encoder used for the 4-bit ADC is represented in Figure 6. The truth table for the 4-bit multiplexer based encoder is shown in Table 3. The MSB of the output is equal to bit 7 of the input (the middle bit), and the least significant bit (LSB) of the output is derived from inputs 14 to 0. In this design 11 multiplexers are used for the 15 inputs: 7 MUX in the first stage, 3 MUX in the second stage, where the output of the middle multiplexer acts as the select line, and 1 MUX in the last stage. The multiplexers are designed using transmission gates for better accuracy. Figure 7 shows the encoder simulation, in which the thermometer code is encoded into binary code. The outputs of the 15 comparators are indexed from 0 to 14 and the binary output bits from 3 to 0.
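The 7 + 3 + 1 multiplexer count can be reproduced with a behavioural sketch of the classical MUX based thermometer-to-binary scheme: the middle bit of the current code gives one output bit and selects which half of the code is passed to the next stage. The exact wiring of the proposed encoder and the role of its 4 inverters are not modelled here; the function names and the polarity convention (a 1 on the select line passes the upper half) are assumptions.

```python
def mux2(select, low_input, high_input):
    """2:1 multiplexer: passes high_input when select is 1, else low_input."""
    return high_input if select else low_input


def mux_encode(thermo):
    """Convert a thermometer code of length 2**n - 1 to n binary bits.

    thermo[i] is the output of comparator i (index 0 = lowest reference).
    Returns (bits, mux_count) with bits ordered MSB first.
    """
    bits = []
    mux_count = 0
    level = list(thermo)
    while len(level) > 1:
        mid = len(level) // 2
        select = level[mid]          # middle bit is the next output bit
        bits.append(select)
        lower, upper = level[:mid], level[mid + 1:]
        # One 2:1 MUX per pair keeps the half that still carries information.
        level = [mux2(select, lo, up) for lo, up in zip(lower, upper)]
        mux_count += len(level)
    bits.append(level[0])            # the last remaining bit is the LSB
    return bits, mux_count


if __name__ == "__main__":
    for ones in range(16):
        code = [1] * ones + [0] * (15 - ones)
        print(ones, mux_encode(code))
```

For a 15-bit input the loop uses 7, then 3, then 1 multiplexer, which matches the total of 11 stated above, and the first output bit is input 7, the middle bit.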

TG-Register Circuit Based on D-Type Flip-Flop.
The clock signal CLK(t) is applied to the different CMOS circuits to control their operation. Figure 8 shows the clock signal CLK(t) and its complement $\overline{CLK}(t)$. The operations in a digital network are synchronized by means of a clock signal with respect to an absolute time base. The period, denoted $T$, is expressed in seconds, and the frequency $f$ is the inverse of the period: $f = 1/T$, where $f$ is in hertz (Hz or s⁻¹). The data flow is synchronized by the clock signal, since a TG can be activated or deactivated with a complementary clock pair [40].
Here, our idea is to create a low power TG-Register circuit based on a master-slave D-type flip-flop in CMOS technology, built from the TG-latch circuit shown in Figure 9, in order to synchronize the signals coming from the encoder. Master-slave flip-flops reduce the sensitivity to noise by minimizing the period of transparency, and they operate on the clock edge. The master-slave D-type flip-flop consists of two cascaded D-latches clocked in phase opposition. The first is called the master and the second the slave. Figure 10 shows this circuit. It makes it possible to obtain a synchronous binary signal by means of latch-based flip-flops.
The operation of this circuit is as follows:
(i) When the clock signal is low (CLK = 0, $\overline{CLK}$ = 1), TG1 conducts and transfers the data bit to the first stage (master) latch. No data transfer to the output occurs while TG2 and TG3 are both open.
(ii) When the clock signal is high (CLK = 1, $\overline{CLK}$ = 0), TG1 acts like an open switch and blocks changes in the data. At this time, TG2 closes and completes the feedback latching circuit, while TG3 conducts and allows the data voltage to pass into the second stage (slave) latch.
The master-slave circuit appears as a flip-flop with a data input (D), a clock input (CLK), and an output (Q). When the clock switches from the low state to the high state, the output Q takes the value of D that has been presented, which makes the circuit a positive-edge-triggered storage element.
Logically, the operation of the TG-Register is synchronous. Its role is to memorize a logical value at a precise moment. The data applied at D is taken into account at the beginning of the rising edge and transferred to the output Q at the end of this rising edge. A new transfer from input to output occurs at the next rising edge of the clock. Figure 11 shows the simulation of the TG-Register.
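The master-slave behaviour described in this section can be sketched at the logic level as follows. This is a minimal model under stated assumptions, not the transistor-level TG circuit: the master latch is transparent while the clock is low, the slave latch is transparent while the clock is high, and transmission gate delays are ignored. The class and function names are hypothetical.

```python
class MasterSlaveDFF:
    """Logic-level model of a positive-edge-triggered master-slave D flip-flop."""

    def __init__(self):
        self.master = 0  # state of the first (master) latch
        self.q = 0       # state of the second (slave) latch, i.e. the output

    def step(self, clk, d):
        """Apply one clock level and data value, return the output Q."""
        if clk == 0:
            # Clock low: master follows D, slave holds the previous output.
            self.master = d
        else:
            # Clock high: master holds, slave copies the stored master value.
            self.q = self.master
        return self.q


def register_step(flops, clk, bits):
    """Clock a register of flip-flops with one bit per flip-flop."""
    return [ff.step(clk, b) for ff, b in zip(flops, bits)]


if __name__ == "__main__":
    register = [MasterSlaveDFF() for _ in range(4)]
    register_step(register, 0, [1, 0, 1, 1])         # clock low: data loaded
    print(register_step(register, 1, [0, 0, 0, 0]))  # rising edge: stored data appears
```

In this model a change of D while the clock is high does not reach Q, which is the property that synchronizes the four encoder outputs.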

ADC Complete Simulation Results
The complete simulation of the proposed ADC is presented in Figures 12(a) and 12(b) for two clock frequencies, 100 MHz (a period of 10 ns) and 5 GHz (a period of 200 ps). The low-amplitude MAPS sensor input signal is a 125 mV ramp, which exercises all the values encoded by the ADC. The supply voltage is 1.8 V at an ambient temperature of 27 °C. The outputs of the ADC are displayed as the bit signals 0 to 3. We observe that all binary values from 0000 to 1111 are present and are traversed uniformly over the duration of the simulation. To test the output of the proposed ADC, we connect an ideal DAC to the output of the ADC. Figure 13 shows the ideal and real transfer functions for a resolution of 4 bits. The horizontal axis represents the digital input and the vertical axis represents the analog output. The digital input ranges from 0000 to 1111. In the ideal case, the width and height of a "quantum" are constant and equal to 1 LSB and $V_{LSB}$, respectively. In reality, the transfer function is altered by a number of factors such as noise, component mismatch, and the aperture error of the comparators. These static errors can be described by only four parameters: the offset error, the gain error, the DNL, and the INL [41]. The results are reported in Table 4. The conversion time of the ADC is 10 ns at a sampling frequency of 100 MHz, giving an integration time of 48 μs for the full size sensor. Since power consumption is a determining factor in the design of our ADC, we study its variation with frequency; more precisely, we calculate the consumption of each block of the ADC for frequencies ranging from 6.25 MHz to 5 GHz. Figure 16 shows the simulated average power consumption of the ADC subblocks for varying frame rates with a dynamic range of 125 mV.
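The DNL and INL figures can be computed from the code transition voltages of a ramp simulation. The sketch below uses the common definitions (DNL as the deviation of each code width from 1 LSB, INL as the running sum of the DNL); the paper does not state which definitions or reference line it uses, so this is an assumption, and the transition voltages in the usage example are illustrative rather than simulated data.

```python
def dnl_inl(transitions, lsb):
    """Compute DNL and INL in LSB from code transition voltages.

    transitions[k] is the input voltage at which the output code changes
    from k to k + 1. DNL is reported for the codes bounded by two
    transitions; INL is the cumulative sum of the DNL.
    """
    dnl = [
        (transitions[k + 1] - transitions[k]) / lsb - 1.0
        for k in range(len(transitions) - 1)
    ]
    inl = []
    running = 0.0
    for d in dnl:
        running += d
        inl.append(running)
    return dnl, inl


if __name__ == "__main__":
    lsb = 0.125 / 16                       # 125 mV range, 4 bits
    ideal = [k * lsb for k in range(1, 16)]
    shifted = list(ideal)
    shifted[3] += 0.1 * lsb                # illustrative 0.1 LSB threshold error
    dnl, inl = dnl_inl(shifted, lsb)
    print(max(dnl), min(dnl), max(inl), min(inl))
```

The reported pairs such as 0.0812/−0.0787 LSB correspond to the maximum and minimum of such lists.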
The layout of our proposed ADC was realized in TSMC 0.18 μm technology. The difficulty of this design stage is hard to quantify. Drawing the masks without exceeding a width of 35 μm is a complicated task: the lack of space for routing tracks, the lack of space for component placement, and the parasitic capacitances associated with the form factor are just some of the many challenges faced. Figure 17 presents the mask layout of the proposed ADC, whose area is 35 μm × 336.76 μm. Table 4 compares the results of the proposed ADC with other state-of-the-art ADCs [18,20–22]. The pipeline ADC [20] can achieve a high speed, but it has a larger power consumption. The ramp ADC [21] has a moderate sampling rate, which is satisfactory for the frame rate of MAPS in the VXD outer layers. Note that [18,22] only include static power consumption without the sample-and-hold circuit; the power consumption has been compared with the other works on that basis. From the results, this ADC has one of the best power efficiencies among published works. Moreover, it achieves the lowest power consumption and a speed of more than 5 GHz. This ADC also has the smallest active area, 35 × 336.76 μm².

Conclusion
In this paper, a new optimized architecture of a low-power, high-speed, small-area 4-bit column-parallel Flash ADC, integrated per column (PC-ADC) with the MAPS sensor array, has been proposed. To increase the sensitivity of the converter to the very small amplitude of the input signal from the sensor and to give the converter sufficient time to encode the input signal, we have proposed to insert an optimized S/H block in the converter. The simulated results show that the architecture offers attractive performance, such as a low power consumption of 751.42 μW without the S/H at a sampling rate of 100 MS/s; this value rises to 1.28 mW with the S/H. Its DNL and INL are 0.0812/−0.0787 LSB and 0.0811/−0.0787 LSB, respectively. Furthermore, this ADC achieves a speed of more than 5 GHz and has the smallest active area, 35 × 336.76 μm². Consequently, with these optimized characteristics, this kind of ADC can be used for monolithic active pixel sensors (MAPS) in high energy physics to meet the requirements of next generation detectors at sampling rates of some GS/s.