A VLSI Implementation of Rank-Order Searching Circuit Employing a Time-Domain Technique

We present a compact and low-power rank-order searching (ROS) circuit that can be used for building associative memories and rank-order filters (ROFs) by employing time-domain computation and floating-gate MOS techniques. The architecture inherits the accuracy and programmability of digital implementations as well as the compactness and low power consumption of analog ones. We aim to implement the location identification function as the first-priority objective; the filtering function can be implemented once location identification has been carried out. The prototype circuit was designed and fabricated in a 0.18 μm CMOS technology. It consumes only 132.3 μW for an eight-input demonstration case.


Introduction
Searching is an important function in recognition systems. In conventional recognition systems, only the nearest matched template among a vast number of template data can be retrieved. However, in some applications, such as k-neighbor selectors or internet routers, finding the rth nearest matched data is necessary. Although this kind of operation can be carried out by a sorting processor, sorting is computationally expensive and time consuming, making it unsuitable for building low-power systems.
In image and speech processing, data compression, communication, neural networks, and so forth, nonlinear filters find many applications, such as attenuating impulsive noise while preserving sudden changes in the signal. Among the many types of nonlinear filters, MIN, MAX, and MEDIAN are the most popular ones. These filters can be implemented by using rank-order filters (ROFs) with appropriate rank-order setting values. Several ROFs have been implemented in fully digital [1,2], mixed-signal [3], and analog approaches [4][5][6]. When circuit area must be saved so that the structure can serve as a basic block of a parallel processing array, analog implementations of ROFs are preferred. Although they achieve low power consumption and small chip real estate, the main drawback of analog implementations is limited accuracy, caused for example by mismatches between transconductance amplifiers [4].
In this paper, we have developed a compact and low-power rank-order searching (ROS) circuit by employing a time-domain computation technique. Here, a time interval or delay time is used to represent a value. The circuitry in this study is the core that can be used for building associative memories and ROFs. Because it employs a time-domain technique, the architecture not only achieves the small chip real estate and low power consumption of analog implementations but also improves on their accuracy. In the design, we aim to identify the location of the candidate as the first-priority objective. Getting out its content, that is, the filtering function, can be implemented easily once the location has been found.
In the rest of the paper, the system organization and the major circuitries utilized in the prototype chip design are described in Sections 2 and 3. Section 4 shows the experimental results from the test chips fabricated in a 0.18 μm CMOS technology. The conclusion of the paper is given in Section 5.

System Organization
A ROF employing pulse width modulation (PWM) signals was proposed in [7] in a fully digital architecture. That architecture works well for the filtering function, but it suffers from narrow-pulse signals (glitches) that can occur at the outputs of the XOR gates in the address encoder circuit, leading to errors in the location identification function. The problem becomes more pronounced as the number of inputs grows. In order to overcome this problem, and to achieve a compact architecture that copes with the large number of inputs required in many applications, an analog rank-order searching engine employing time-domain computation techniques is proposed in Figure 1. It is the basic core for building ROFs and associative memories. The identification function is the first objective that we aim at with this architecture.
Basically, it consists of analog-to-delay-time converters (ATCs), a rank-order setting circuitry, a comparator based on floating-gate MOS technology, a binary encoder, and a binary counter. The ATCs convert the N analog input values to delay-time signals, which the rank-order searching circuit then uses as input data. The final output is a binary code representing the location of the rth smallest value in the analog voltage domain, or equivalently the rth risen-up signal in the time domain. In addition, in order to establish a smooth interface to the following digital processing, a binary counter can be added to the system. The value of the rth smallest input is given in digital format at the output of this counter.
The filtering/searching operation is carried out within a period called the operation slot, which is determined by the SLOT signal. The value of SLOT is selected mainly according to the desired resolution of computation. For example, SLOT is set to 256 (= $2^8$) clock cycles for 8-bit resolution. From now on, a rank of $r$ is represented by a binary number RANK equal to ($r - 1$). For example, a rank of four is represented by the rank value $11_2 = 3$.
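As a purely behavioral illustration of the searching operation described above (not a model of the circuit itself), the following Python sketch returns the location of the rth risen-up signal within one operation slot. The function name `rank_order_search`, the encoding of delay times as non-negative integer clock cycles, and the tie handling are assumptions made for illustration only.

```python
def rank_order_search(delays, r, slot=256):
    """Return (address, cycle) of the r-th signal to rise within the slot.

    delays: rise time of each input in clock cycles (non-negative integers;
            a hypothetical encoding chosen for this sketch).
    r:      requested rank, 1 = smallest; stored as RANK = r - 1.
    slot:   length of the operation slot in clock cycles (2**8 for 8 bits).
    """
    rank_value = r - 1                      # RANK as defined in the text
    for cycle in range(slot):
        risen = [i for i, d in enumerate(delays) if d <= cycle]
        if len(risen) > rank_value:         # number of "1" inputs exceeds RANK
            # The candidate is an input that rose exactly at this cycle.
            newest = [i for i in risen if delays[i] == cycle]
            return newest[0], cycle         # ties: lowest address (assumption)
    return None                             # fewer than r inputs rose in the slot
```

For a 5-input case with rank four, `rank_order_search([10, 20, 30, 40, 50], 4)` selects address 3, which corresponds to the binary code 011.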

Analog-to-Delay-Time Converter (ATC).
The input voltages are converted to delay-time signals, $P_0, P_1, \ldots, P_{N-1}$, by ATCs as shown in Figure 1. The input analog voltages are applied to the negative nodes of voltage comparators, while a common ramp voltage signal is applied to the positive nodes. Each comparator compares its input analog voltage with this ramp voltage. The output of the comparator remains at the "0" level until the ramp exceeds the input voltage. At that moment, the comparator output switches to the "1" level. In this manner, an analog voltage is converted to a delay-time signal. A smaller delay time corresponds to a smaller analog voltage.
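The conversion described above can be sketched behaviorally as follows. This is a minimal model assuming an ideal linear ramp that advances one step per clock cycle; the function name `atc_delay`, the 0 V to 1.8 V ramp range, and the 256-cycle slot are illustrative assumptions and are not taken from the text.

```python
def atc_delay(v_in, v_ramp_start=0.0, v_ramp_end=1.8, slot=256):
    """Clock cycle at which a linear ramp first exceeds v_in, or None.

    The ramp range and slot length are assumed values for this sketch.
    """
    step = (v_ramp_end - v_ramp_start) / slot
    for cycle in range(slot):
        v_ramp = v_ramp_start + step * (cycle + 1)
        if v_ramp > v_in:        # comparator output switches from "0" to "1"
            return cycle
    return None                  # the ramp never exceeds v_in within the slot
```

A smaller input voltage yields a smaller returned cycle, which mirrors the monotonic voltage-to-delay mapping stated in the text.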

Floating-Gate-MOS-Based Comparator and Rank-Order Setting Circuit.
In order to reduce the circuit area compared with [7], the carry save adder (CSA) and the subtractor in [7] are replaced by a simple floating-gate-MOS-based comparator and a rank-order setting circuit. A simplified schematic of the floating-gate-MOS-based comparator is shown in Figure 2. The voltages at the floating gates are determined as linear weighted summations of multiple input signals and calculated by [8]:
$$V_x = \frac{\sum_{i=0}^{N+1} C_i^x V_i^x}{C_{TOT}^x},$$
where $C_{TOT}^x = \sum_{i=0}^{N+1} C_i^x + C_0$ and $x$ is either "a" or "b." $V_0^x, V_1^x, \ldots, V_{N+1}^x$ are the input voltages, each taking one of two levels, 0 or $V_{DD}$; $C_0^x, C_1^x, \ldots, C_{N+1}^x$ are the capacitive coupling coefficients between the floating gate and each of the input gates; $C_0$ is the capacitive coupling coefficient between the floating gate and the substrate. $C_N^x$ and $C_{N+1}^x$ are necessary to guarantee that $V_a$ is smaller than $V_b$ at each given rank and to fit the range of $V_a$ and $V_b$ inside the input range of the comparator. The smallest MIM capacitance available in the fabrication process, 16 fF, was chosen to save chip area.
For a given rank-order value, the rank-order setting circuit connects some of its capacitors (i.e., $C_i^b$) to $V_{DD}$ and the others to ground so that it sets a corresponding $V_b$ proportional to the rank value. The voltage $V_a$ rises in proportion to the number of "1" inputs. When $V_a$ exceeds $V_b$, the comparator output COUT becomes "high," as shown in Figure 2, after a small delay due to the response of the comparator.
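The comparison between the two floating-gate voltages can be illustrated with a small numerical sketch. The weighted-sum relation follows the description above; everything else is an assumption made for illustration: the supply voltage of 1.8 V, the choice of two unit capacitors per signal input and one unit per extra capacitor, and the way one extra capacitor on the rank side is tied to the supply to create a half-step offset. The actual capacitor sizing of the chip is not given in the text.

```python
C_UNIT = 16e-15  # smallest MIM capacitor quoted in the text (16 fF)
VDD = 1.8        # assumed supply voltage; not stated in the text


def floating_gate_voltage(levels, caps, c0=0.0):
    """Weighted sum of input voltages divided by the total capacitance."""
    c_tot = sum(caps) + c0
    return sum(c * v for c, v in zip(caps, levels)) / c_tot


def comparator_output(inputs, rank_value):
    """Return 1 when V_a exceeds V_b, else 0 (ideal, zero-delay comparator).

    inputs:     list of 0/1 levels of the N delay-time signals.
    rank_value: RANK = r - 1; that many setting capacitors are tied to VDD.
    """
    n = len(inputs)
    # Hypothetical sizing: N signal capacitors plus two extra capacitors.
    caps = [2 * C_UNIT] * n + [C_UNIT, C_UNIT]
    # Side a: signal capacitors follow the inputs, extra capacitors grounded.
    v_a = floating_gate_voltage([VDD * b for b in inputs] + [0.0, 0.0], caps)
    # Side b: RANK capacitors at VDD, plus one extra capacitor at VDD as offset.
    setting = [VDD] * rank_value + [0.0] * (n - rank_value)
    v_b = floating_gate_voltage(setting + [VDD, 0.0], caps)
    return 1 if v_a > v_b else 0
```

With this sizing, the output stays at 0 while the number of "1" inputs equals RANK and switches to 1 as soon as one more input rises, which is the behavior the text attributes to the two extra capacitors.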
Figure 3 demonstrates the case of a 5-input system with the required rank order of four. As can be seen in this example, the filtered input, that is, the candidate, is $P_3$, and a binary code of $011_2$ will be generated by the address encoder.
Figure 4 illustrates the schematic diagram of the address encoding circuit. It identifies the location of the filtered input and represents this location as a binary code. It consists of simple XOR gates, narrow-pulse filters, domino buffers, and a binary encoder. The location identification function is carried out by taking the XOR of COUT and each delay-time signal and then searching for the "0" that remains at the outputs of the domino buffers. Unfortunately, due to the response time of the comparator, a narrow-pulse signal (i.e., a glitch) will occur at the output of the XOR gate corresponding to the candidate signal. Theoretically, such a glitch disappears if the response time is zero. This pulse is narrower than the pulses occurring at the other XOR outputs. Narrow-pulse filters are placed at the outputs of the XOR gates to remove it. The domino buffers following the filters detect "0" to "1" transitions. As a result, only one output of these buffers remains at the "0" level, and the others become "1," at the end of the operation slot. The following binary priority encoder senses the "0" input and generates the corresponding address. As can be seen in the example of Figure 3, the fourth smallest data ($P_3$) is nearly identical to the signal COUT, and the glitch it produces is removed by the narrow-pulse filter.
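The address encoding path described above (XOR, narrow-pulse filter, domino buffer, priority encoder) can be sketched behaviorally as follows. This is a simplified timing model that assumes every input rises within the operation slot; the function name `encode_address`, the pulse-width representation, and the 5 ns filter threshold used in the usage note are illustrative assumptions.

```python
def encode_address(delays, t_cout, min_width):
    """Return the address whose XOR pulse is filtered out, or None.

    delays:    rise time of each delay-time signal (same unit as t_cout).
    t_cout:    rise time of COUT (candidate rise time + comparator response).
    min_width: narrowest pulse that survives the narrow-pulse filter.
    """
    # XOR(P_i, COUT) is high while exactly one of the two signals is high,
    # so each XOR output carries one pulse of width |t_cout - delay|.
    widths = [abs(t_cout - d) for d in delays]
    # A domino buffer latches to 1 when a surviving 0 -> 1 transition arrives.
    latched = [1 if w >= min_width else 0 for w in widths]
    # Priority encoder: address of the first buffer output still at 0.
    for address, bit in enumerate(latched):
        if bit == 0:
            return address
    return None
```

For example, with rise times of 10, 30, 50, 70, and 90 ns, a COUT edge at 73.8 ns, and an assumed filter threshold of 5 ns, only the pulse belonging to address 3 is removed, so `encode_address([10, 30, 50, 70, 90], 73.8, 5.0)` selects address 3.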

Narrow-Pulse Filter (Glitch Filter).
A well-known delay cell [9] has been employed as a narrow-pulse filter in this design. Figure 5 shows the basic schematic of the filter. At the beginning of operation, the output of the filter is reset to "0" by $T_3$ and $T_4$. The output changes its state when the voltage of the parasitic capacitor becomes smaller than the threshold voltage of the inverter. Therefore, if the discharge time, namely, the time required to discharge the parasitic capacitance from $V_{DD}$ to the inverter threshold voltage, is larger than the width of an input pulse, that pulse is filtered. The advantage of this filter over conventional RC filters is that the filtered pulse width is programmable by changing a bias voltage. In this manner, the filter is programmed to remove only the narrowest pulse, which is the one related to the candidate signal, while the other pulses are just delayed.
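The filtering rule above can be sketched numerically. The model below assumes a constant discharge current, which is a simplification introduced here and not stated in the text; the function names and all component values in the usage note are hypothetical.

```python
def discharge_time(c_par, vdd, v_th, i_dis):
    """Time for a constant current i_dis to pull c_par from vdd down to v_th.

    Constant-current discharge is an assumption of this sketch.
    """
    return c_par * (vdd - v_th) / i_dis


def filter_pulse(start, width, t_discharge):
    """Time at which the filter output switches, or None if the pulse is removed."""
    if t_discharge > width:
        return None                # pulse ends before the threshold is reached
    return start + t_discharge     # wider pulses pass, delayed by t_discharge
```

With an assumed 10 fF parasitic capacitance, a 1.8 V supply, a 0.9 V inverter threshold, and a 2 μA discharge current, `discharge_time` gives about 4.5 ns, so a 3.8 ns glitch would be removed while wider pulses would only be delayed. Changing the discharge current plays the role of the bias-voltage programming mentioned in the text.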

Experimental Results
4.1. Chip Fabrication. The proof-of-concept chip was designed and fabricated using a 0.18 μm standard CMOS technology. The chip includes eight inputs for demonstration. Time-domain signals are applied directly as input data; that is, the ATCs shown in Figure 1 were not implemented in the test chip. For simplicity, the binary counter that counts the number of clocks representing the digital value of the ranked input was not implemented either.
A photomicrograph of the test chip is shown in Figure 6, and the specifications are summarized in Table 1. The core size is 0.006 mm². The power dissipation is 132.3 μW, and the time accuracy is 9.5 ns. Assuming that the system has 8-bit resolution, each operation slot takes 256 clock cycles. As a result, the filtering latency is 2.432 μs.
4.2. Measurement Results and Discussion. In order to verify the operation of the prototype chip, an arbitrary pattern of eight time-domain signals, as shown in Figure 7, was applied as input data. In this example, a rank value of $100_2$, corresponding to searching for the 5th smallest signal, is applied to the rank-order setting circuit. Figure 8 shows the measurement results of the searching operation. The searched input (filtered input) is the input $P_6$. A winner address code of $110_2$ was generated by the encoder. These waveforms were captured at the maximum time accuracy (i.e., time resolution) of 9.5 ns.
Figure 9 shows the response of the comparator. The comparator has a response time of 3.8 ns at a bias voltage $V_{BIAS}$ of 0.9 V. Reducing $V_{BIAS}$ will increase the response time at the expense of more power dissipation. The time accuracy of the system is the minimum interval between two successive delay-time signals at which the system can still distinguish them correctly. In this design, the time accuracy is at least twice the response time of the comparator because of the XOR function and the filtering operation. If two successive signals violate the time accuracy, they both generate narrow pulses at the outputs of the XORs, and both pulses will be filtered. Consequently, the binary encoder may give a wrong decision. For the test chip, a time accuracy as small as 9.5 ns is achieved. The response time can be reduced by employing fast comparators, but usually at the cost of more power consumption. High-speed synchronous comparators such as in [10] can be implemented in the system since time-domain signals can be synchronized with the system clock. The time accuracy is not a critical issue because it can be satisfied by changing the slope of the ramp voltage signal in the ATC. It mainly affects the latency required for a given resolution.
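The timing relations discussed above can be checked with a short sketch. It assumes that one step of the operation slot lasts one time-accuracy interval, which reproduces the latency quoted for the test chip; the function names are hypothetical.

```python
def latency_ns(bits, time_accuracy_ns):
    """Operation-slot length: 2**bits steps of one time-accuracy interval each."""
    return (2 ** bits) * time_accuracy_ns


def spacing_violations(delays_ns, time_accuracy_ns):
    """Pairs of successive rise times that are closer than the time accuracy."""
    ordered = sorted(delays_ns)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if b - a < time_accuracy_ns]
```

For the reported values, `latency_ns(8, 9.5)` gives 2432 ns, that is, 2.432 μs, and twice the 3.8 ns comparator response time (7.6 ns) stays below the 9.5 ns time accuracy. A pattern such as `[0, 20, 25, 60]` would be flagged by `spacing_violations` because the signals at 20 ns and 25 ns are closer than 9.5 ns.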
The performance of the test chip is summarized in Table 2 along with some ROF implementations from the literature. As can be seen from the table, the analog design in [5] gives the best performance in terms of compactness. It is also quite fast, but its precision is not good. The digital implementation in [2] is very fast, but it occupies a large area. The architecture in this study achieves a small core size and low power consumption. Although this architecture still suffers from some sources of error, such as the offset of the comparators and process variations, most of these errors can be eliminated by increasing the time resolution of the system to a certain value, via changing the slope of the ramp voltage in the ATCs, so that the system can distinguish successive time signals correctly. The tradeoff is a larger latency. The problem of glitches in [7], which can occur at the outputs of the XOR gates, is solved by using programmable narrow-pulse filters.
Figure 10: New proposed narrow-pulse filter.
The rank-order setting circuit can be removed in certain applications where the rank is fixed, making the architecture simpler and thus saving chip area. In addition, the chip real estate becomes smaller if a high-k MIM capacitor technology is available, since the area required for the capacitors in the floating-gate-MOS-based comparator would be reduced significantly. In terms of computation accuracy, the proposed approach can preserve the precision of digital approaches, which is difficult to achieve with pure analog implementations.
The narrow-pulse filter employing delay elements described in Section 3.4 can be replaced by a better version shown in Figure 10. The power-hungry current source caused by $T_6$ in Figure 5 is removed, thus reducing the total DC power consumption. As a result, an estimated power consumption as small as 77 μW is achieved.
Once the location of the desired input is identified, the ROF function can be implemented either by an additional counter, as shown in Figure 1, to obtain the filtered value as a binary code or by an additional multiplexer to select the filtered analog signal.

Conclusions
A low-power analog implementation of a rank-order searching circuit for building ROFs and associative memories has been developed by using a time-domain computation scheme. The architecture preserves the accuracy of digital implementations while achieving the advantages of analog implementations in terms of low power dissipation and small chip real estate. The architecture is also a promising solution when a large number of inputs is required, because it does not require many additional circuits. The circuit operation has been verified by experimental results obtained from the fabricated proof-of-concept chip.

Address encoding F 1 :
Block diagram of the rank-order searching circuit.V b (Rank value)

T 1 :F 8 :
Speci�cations of the proof-of-concept chip.consumption 132.3 (W) Latency 2.432 (s) at 8-bit resolution Function �ank-order location identi�cation 4.2.Measurement Results and Discussion.In order to verify the operation of the prototype chip, an arbitrary pattern of Measured waveforms.Waveforms of operation slot signal (SLOT), signal  6 , LSB of the winner address, and output of the comparator (COUT).