Nanosensor Data Processor in Quantum-Dot Cellular Automata

Quantum-dot cellular automata (QCA) is an attractive nanotechnology and a potential alternative to CMOS technology. QCA provides an interesting paradigm for faster speed, smaller size, and lower power consumption than transistor-based technology, in both communication and computation. This paper describes the design of a 4-bit multifunction nanosensor data processor (NSDP). The functions of NSDP are (i) sending the preprocessed raw data to a high-level processor, (ii) counting the number of active majority gates, and (iii) generating an approximate sigmoid function. The whole system is designed and simulated with several different sets of input data.


Introduction
Nanotechnology is a multidisciplinary field that brings together many science and engineering disciplines including physics, chemistry, biosciences, material science, computer science, and electrical and mechanical engineering. Nanotechnology will radically affect all these disciplines and their application areas. The economic impact is foreseen to be comparable to that of the information technology and telecommunication industries [1]. Nanotechnology has direct applications in sensing, sensor miniaturization, and new materials development as well as in electronics and electromechanical devices. Substantial advances in nanotechnology development have been achieved in the fields of engineering and bioscience [2][3][4]. For example, integrating a large number (496) of programmable FET nodes in a small area of about 960 μm² and programming them into a full adder, subtractor, multiplexer, and demultiplexer has been reported. Work reported in [3] demonstrated that a lateral integration of 700 rows of ZnO nanowires produces a peak voltage of 1.26 V at a low strain of 0.19%, which is potentially sufficient to recharge an AA battery. The nanosensor development described in [5], based on nanowires, is emerging as a powerful and general class of ultrasensitive electrical sensors for the direct detection of biological and chemical species, from proteins and DNA to drug molecules and viruses, down to the ultimate level of a single molecule. It also shows that a nanosensor array that contains 100 addressable elements provides unique opportunities for label-free multiplexed detection of biological and chemical species. The literature [6] provides an in-depth view of nanosensor technology and electromagnetic communication among nanosensors. With the integration of the technologies described in [3,5,6], that is, the integration of nanosensors, self-powered nanodevices, and wireless nanosensors, many nanosensor applications can be considered. These applications can be classified into four groups: biomedical, environmental, industrial, and military applications. In biomedical applications, health monitoring systems and drug delivery systems are some of the examples. In environmental applications, plant monitoring systems and plague defeating systems can be given as examples. In industrial and consumer goods applications, the examples are ultrahigh sensitivity touch surfaces and haptic interfaces. For military and defense applications, examples are nuclear, biological, and chemical defenses and damage detection systems.
(iii) Massive Data. Nanosensing devices contain a large number of nanosensors ($10^3$ to $10^7$ or higher) that generate data continuously. The data from each nanosensor may be primitive and ambiguous. However, a massive amount of this type of data, if processed correctly, delivers useful and meaningful information from the device as a whole.
To analyze data from the nanosensing devices mentioned above, classical computing and processing models, such as loosely coupled distributed systems or tightly coupled parallel systems, can be employed. However, in the classical computing and processing models, the number of processors is usually in the range of $10^1$ to $10^3$. This number of processors can be increased, but doing so brings a tremendous increase in cost and system dimensions. Each processing unit usually has a local memory and can conduct complex operations independently. Also, the processors consume large amounts of energy (power), and heat dissipation becomes a critical issue. The above-mentioned computing and processing models are therefore not suitable for processing data provided by nanosensors. The computing and processing models for nanosensing should meet the following three principles.
(i) Simplicity. The basic processing element (PE), that is, the cell, is simple. The PE does not need complicated operations and does not need many instructions, unlike the existing general-purpose CPUs. It only needs a small number of operations because it will function as a bridge between nanosensors and high-level processors.
(ii) Parallelism. There will be a vast number of cells operating in parallel. Hence, the processing must be distributed.
(iii) Locality. All interactions take place on a purely local basis. A cell can interact with only a few other cells.
This paper describes the design of a PE for nanosensor data processing, named the nanosensor data processor (NSDP), based on QCA. NSDP works as a bridge between the nanosensors and the high-level processor. Its functions are limited to three: (i) sending the preprocessed raw data to the high-level processor, (ii) counting the number of active majority gates (active means that the output of a majority gate is 1), and (iii) generating an approximate sigmoid function for postprocessing based on an artificial neural network. This paper is organized as follows. Section 2 presents the background of QCA, focusing on its unique clocking scheme. Section 3 shows the detailed design of NSDP. Simulation results are shown in Section 4. Conclusions and future work are given in Section 5.

QCA Background
2.1. QCA Cells and Wires. A QCA cell is a square nanostructure with a quantum dot in each of the four corners [17], as shown schematically in Figure 1. The cell is populated with two electrons that can tunnel between two pairs of quantum dots connected via a tunnel junction. The two electrons occupy antipodal sites within the cell due to Coulombic repulsion. Tunneling occurs only within the cell; no tunneling happens between cells. The combination of quantum confinement, Coulombic repulsion, and the discrete electronic charge produces bistable behavior. The two charge configurations can be used to represent binary "0" and "1" with polarization of −1 and +1, respectively. In contrast to a physical wire, a QCA "wire" is a chain of cells where the cells are adjacent to each other, as shown in Figure 1(b). Since no electrons tunnel between cells, QCA provides a mechanism for transferring information without current flow.

QCA Logic Gates.
In QCA, three-input majority gates and inverters serve as the fundamental gates. A majority gate, as shown in Figure 2(a), consists of five QCA cells that realize the function $M(A, B, C) = AB + BC + AC$. An inverter, as shown in Figure 2(b), is made by positioning cells diagonally from each other to achieve the inversion functionality. Figures 2(c) and 2(d) show alternative layouts of an inverter. Majority gates and inverters form a universal set; that is, any logic function can be implemented by using this set. For example, a two-input AND gate is realized by fixing one of the majority gate inputs to "0", that is, $\mathrm{AND}(A, B) = M(A, B, 0) = AB$. Similarly, an OR gate is realized by fixing one input to "1", that is, $\mathrm{OR}(A, B) = M(A, B, 1) = AB + A \cdot 1 + B \cdot 1 = A + B$.
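The reduction of AND and OR to a majority gate with one fixed input can be checked exhaustively. The following is a minimal sketch at the Boolean level only; the function names `majority`, `and2`, and `or2` are illustrative and do not come from the paper, and no QCA cell physics is modeled.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def and2(a: int, b: int) -> int:
    """AND built from a majority gate with one input fixed to 0."""
    return majority(a, b, 0)


def or2(a: int, b: int) -> int:
    """OR built from a majority gate with one input fixed to 1."""
    return majority(a, b, 1)


if __name__ == "__main__":
    # Print the truth table of both derived gates.
    for a, b in product((0, 1), repeat=2):
        print(a, b, "AND:", and2(a, b), "OR:", or2(a, b))
```

Together with an inverter, these two derived gates are enough to build any logic function, which is the sense in which the text calls the set universal.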
2.3. QCA Clocking Scheme. Adiabatic switching is used for QCA clocking, which significantly reduces metastability issues and enables deep pipelines [18]. During each clock cycle, half of the wire is active for signal propagation, while the other half is unpolarized. During the next clock cycle, half of the previous active clock zone is deactivated and the remaining active zone cells trigger the newly activated cells to polarize. In each zone, the clock signal has four states: high-to-low, low, low-to-high, and high. The cell begins computing during the high-to-low state and holds the value during the low state. The cell is released when the clock is in the low-to-high state and is inactive during the high state.
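The four clock states can be illustrated with a small timing sketch. It assumes, beyond what the text states, that consecutive clocking zones lag each other by one quarter of a clock cycle, so that a zone starts computing while its upstream neighbor is holding a value. The names `STATES`, `ROLE`, and `zone_state` are hypothetical.

```python
# Clock states in the order given in the text, with the cell behavior in each.
STATES = ("high-to-low", "low", "low-to-high", "high")
ROLE = {
    "high-to-low": "computing",
    "low": "holding",
    "low-to-high": "releasing",
    "high": "inactive",
}


def zone_state(zone: int, step: int) -> str:
    """Clock state of `zone` at quarter-cycle `step`.

    Assumption: zone k lags zone k-1 by one quarter cycle.
    """
    return STATES[(step - zone) % 4]


if __name__ == "__main__":
    # Show what four consecutive zones are doing over one full clock cycle.
    for step in range(4):
        print(step, [ROLE[zone_state(z, step)] for z in range(4)])
```

Under this assumption a value advances by one zone per quarter cycle, which is why a QCA wire behaves like a pipeline.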

QCA Design Rules.
A nominal cell size of 20 nm by 20 nm is assumed. The cell has a width and height of 18 nm and quantum dots 5 nm in diameter. The cells are placed on a grid with a cell center-to-center distance of 20 nm. QCA design rules are well studied in [19][20][21] and are summarized in the following.
(2) Minimum Number of Cells in a Single Clocking Zone. It can be one cell. However, the waveform of a one-cell clocking zone can become distorted, and cascading this kind of clocking zone could lead to incorrect results [22]. To observe correct outputs from a circuit, it is recommended that clocking zones consist of at least two cells.
(3) Minimum Wire Spacing for Signal Separation. A space of one QCA cell size is sufficient separation between two wires carrying different signals.
(4) Wire Crossover. A unique property of QCA layout is the possibility of implementing crossovers by using only one layer, known as coplanar crossing. Coplanar crossing uses both 45° and 90° cells. However, such crossings can easily fail due to low robustness [23] and fabrication issues [24]. The alternative is multilayer crossing, which uses more than one layer of cells, similar to the routing of metal wires in CMOS technology. The extra layers of QCA are believed to be useful as active components of the circuits and consume less area compared to coplanar circuits [23].
(5) QCA Equivalent λ-Rule. A "λ-rule" for QCA circuit design could be defined according to the size of a QCA cell, or perhaps the cell size itself could be used as the equivalent λ.

(B) Timing Design Rules
(1) Logic Component Timing Rule. The timing constraint on a QCA majority gate is that all three inputs are expected to reach the device cell (central cell) at the same time in order to have fair voting.
(2) Clocking Zone Assignment Rule. In QCA circuits, the cells in each clocking zone should be synchronized.
(C) Special Rules for QCA
(1) Majority Logic Reduction. The logic primitive used in QCA is the majority gate. The majority logic-based reduction method [25] can significantly reduce the complexity of QCA circuits.
(2) Systolic Design. The features of systolic architecture in terms of synchrony, deep pipelines, and local interconnection are particularly suitable for accommodating the special timing requirement in QCA circuits. When applying systolic architecture to QCA, significant benefits can be achieved, even more than when applied to CMOS-based technology [26].
In the design of NSDP, the second layout design rule (minimum number of cells in a single clocking zone) is applied in the following way. A one-cell clocking zone is used for a fixed-value input, such as fixing one input of a majority gate to "−1.00" (binary 0) to make it an AND gate. A one-cell clocking zone is also used where space is limited, but without cascading. For the fourth rule (wire crossover), the design employs the multilayer crossing technique. The following section shows the detailed design of NSDP.

NSDP Architecture
NSDP is a processor that works as a bridge between nanosensors and the high-level processor. Its functions are (i) sending the preprocessed raw data to the high-level processor, (ii) counting the number of active majority gates, and (iii) generating an approximate sigmoid function. Among these functions, the last one is the focus of NSDP because it provides the sigmoid function output that the high-level processor needs for processing based on an artificial neural network (ANN). The block diagram of NSDP is shown in Figure 4. Each block is explained in detail below.
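The first two functions can be sketched behaviorally as follows. The sketch assumes that the twelve sensor bits mentioned later in the text are grouped into four consecutive triples, one per three-input majority gate; that grouping, and the names `preprocess` and `count_active`, are illustrative assumptions rather than details taken from the paper.

```python
from typing import List, Sequence


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote."""
    return (a & b) | (b & c) | (a & c)


def preprocess(sensors: Sequence[int]) -> List[int]:
    """Reduce 12 sensor bits to 4 majority-gate outputs.

    Assumption: sensors are grouped as consecutive triples.
    """
    if len(sensors) != 12:
        raise ValueError("expected exactly 12 sensor bits")
    return [majority(*sensors[i:i + 3]) for i in range(0, 12, 3)]


def count_active(gate_outputs: Sequence[int]) -> int:
    """Number of majority gates whose output is 1 (a value from 0 to 4)."""
    return sum(gate_outputs)


if __name__ == "__main__":
    sensors = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
    raw = preprocess(sensors)
    print("majority outputs:", raw, "active gates:", count_active(raw))
```

The count depends only on how many gates fire, not on which ones, which matches the remark in the text that bit positions carry no weight.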

Decoder.
The decoder unit and the sigmoid pattern generator unit are used to approximate the sigmoid function $y = 1/(1 + e^{-x})$. As shown in Table 1, NSDP employs 4 points to approximate the sigmoid function. The decoder is a 3-to-4 decoder: its inputs are the three counter output bits (bits 2, 1, and 0), and it has four outputs (outputs 4, 3, 2, and 1). Output 1 is asserted when the 3-bit counter value is 0 or 1, that is, when the 4-bit word at the counter input is 1000, 0100, 0010, 0001, or 0000. Note that the position of the "1" does not matter. For the four counter inputs (bits 3, 2, 1, and 0), bit 3 is not a most significant bit and bit 0 is not a least significant bit, because the twelve input sensors can only be considered "fired" or "not fired" and the majority gates can be numbered in any order. The same holds hereafter. Likewise, output 2 is asserted when the counter value is 2, that is, when the counter input word is 1100, 0110, 0011, 1010, 0101, or 1001. Output 3 is asserted when the counter value is 3, that is, when the counter input word is 1110, 1101, 1011, or 0111. Output 4 is asserted when the counter value is 4, that is, when the counter input word is 1111. The four decoder outputs are given by equation (3). Figure 7(a) shows the schematic of this 3-to-4 decoder. In the simulation, the decoder output word (outputs 4 down to 1) is equal to 0001, 0001, 0010, 1000, 0001, 0100, 0001, and 0010 for the successive inputs, and the delay is one clock.

Table 1: Decoder output and sigmoid function pattern.
Input condition                                                        Decoder output   Sigmoid function pattern
The number of "1" is less than or equal to 1 (position does not matter)    0001             0000
The number of "1" is equal to 2 (position does not matter)                 0010             0011
The number of "1" is equal to 3 (position does not matter)                 0100             1101
The number of "1" is equal to 4                                            1000             1111
The delay from the output of the counter to the output of the sigmoid function is 2 3/4 clocks. Figure 10 shows the graph of the approximated sigmoid function, generated by the decoder and pattern generator unit described above.
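The mapping from the number of active majority gates to the 4-point sigmoid approximation can be written as a small lookup, using only the decoder outputs and patterns listed in Table 1. This is a behavioral sketch, not the gate-level decoder equation of the paper; the names `decode`, `PATTERN`, and `approx_sigmoid` are hypothetical.

```python
# Sigmoid function pattern for each one-hot decoder output, as in Table 1.
PATTERN = {
    0b0001: 0b0000,  # zero or one active gate
    0b0010: 0b0011,  # two active gates
    0b0100: 0b1101,  # three active gates
    0b1000: 0b1111,  # four active gates
}


def decode(count: int) -> int:
    """One-hot decoder word (outputs 4 down to 1) for a count of 0 to 4."""
    if not 0 <= count <= 4:
        raise ValueError("count must be between 0 and 4")
    if count <= 1:
        return 0b0001
    return 1 << (count - 1)


def approx_sigmoid(gate_word: int) -> int:
    """4-bit sigmoid pattern for a 4-bit word of majority-gate outputs."""
    count = bin(gate_word & 0b1111).count("1")
    return PATTERN[decode(count)]


if __name__ == "__main__":
    for word in (0b0000, 0b0100, 0b0101, 0b0111, 0b1111):
        print(f"{word:04b} -> {approx_sigmoid(word):04b}")
```

Read as unsigned integers, the four patterns are 0, 3, 13, and 15, so the output rises slowly, then steeply, then saturates, which is the S-shape the approximation aims for.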

Function Control.
NSDP is a multifunction nanosensor data processor. The function selection of NSDP is determined by the function control unit, which is a 2-to-3 decoder. The truth table of the function control unit is shown in Table 2.
When the 2-bit control bus is equal to 00, the output of NSDP is the raw majority gate output, that is, the outputs of the four independent majority gates. The bit position does not matter; that is, bit 3 is not the MSB and bit 0 is not the LSB. When the control bus is equal to 01, the output of NSDP is the number of 1s in the majority gate output, given as a 4-bit word consisting of a leading 0 followed by the three counter bits (bits 2, 1, and 0). Note that here bit 0 is the LSB and the MSB is always 0. When the control bus is equal to 10, the output of NSDP is the sigmoid function output. The outputs of the function control unit follow the truth table in Table 2. The schematic of the function control unit is shown in Figure 11(a), the QCA layout is shown in Figure 11(b), and the simulation result is shown in Figure 11(c). When the control input is equal to 00, 01, and 10, the three select outputs (indexed 10, 01, and 00) are equal to 001, 010, and 100, respectively. The delay from the input to the output is 1 clock.
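The 2-to-3 function control decoder can be sketched at the Boolean level as follows. The sketch assumes the usual one-hot decoding, in which each defined control value asserts its own select line, and it assumes that the unused control value 11 asserts nothing; the text does not specify that case. The name `function_control` and the select-line names are hypothetical.

```python
from typing import Tuple


def function_control(c1: int, c0: int) -> Tuple[int, int, int]:
    """Return the select lines (sel_10, sel_01, sel_00) for control bits c1 c0.

    Assumption: control value 11 is unused and asserts no select line.
    """
    n1, n0 = 1 - c1, 1 - c0  # inverted control bits
    sel_00 = n1 & n0         # raw majority-gate output
    sel_01 = n1 & c0         # number of active majority gates
    sel_10 = c1 & n0         # sigmoid function output
    return sel_10, sel_01, sel_00


if __name__ == "__main__":
    for c1, c0 in ((0, 0), (0, 1), (1, 0), (1, 1)):
        print(c1, c0, "->", function_control(c1, c0))
```

Each select line is a two-input AND, so in QCA it maps onto one majority gate with a fixed "0" input plus inverters on the control bits.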

NSDP Output.
The function control unit of NSDP determines which output of the three function units (raw majority data, the number of active majority gates, and the sigmoid function) is passed to the NSDP output. The schematic of the NSDP output unit is shown in Figure 12(a), and the partial QCA layout for output bit 0 is shown in Figure 12(b). Note that the three bit-0 inputs of the output unit are taken from the outputs of the M-latch, S-latch, and G-latch, respectively, which are not shown in Figure 12(b) (refer to Figure 13 for details).
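The selection performed by the output unit can be sketched as a three-way multiplexer driven by the 2-bit control bus. This is a behavioral sketch under the control encoding described in the function control section (00 for raw data, 01 for the count, 10 for the sigmoid output); the function name `nsdp_output` and its parameters are hypothetical, and the latches are not modeled.

```python
def nsdp_output(control: int, raw: int, count: int, sigmoid: int) -> int:
    """Select the 4-bit NSDP output word.

    control: 2-bit control bus value (0b00, 0b01, or 0b10)
    raw:     4-bit word of majority-gate outputs
    count:   number of active majority gates (0 to 4)
    sigmoid: 4-bit sigmoid function pattern
    """
    if control == 0b00:
        return raw & 0b1111
    if control == 0b01:
        return count & 0b0111  # MSB of the output word is always 0
    if control == 0b10:
        return sigmoid & 0b1111
    raise ValueError("control value 11 is not defined in the text")


if __name__ == "__main__":
    raw, count, sigmoid = 0b1011, 3, 0b1101
    for control in (0b00, 0b01, 0b10):
        print(f"{control:02b} -> {nsdp_output(control, raw, count, sigmoid):04b}")
```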
The complete QCA layout of NSDP is shown in Figure 13.NSDP consists of the following units: preprocessing, counter, 3-to-4 decoder, pattern generator, sigmoid function output, function control, M-latch, S-latch, G-latch, and NSDP output.Each unit is surrounded by a red dotted line in Figure 13.The complete experiment results are given in the next section.

Conclusions and Future Works
This paper presents the design of a 4-bit nanosensor data processor (NSDP) which has 3 functions: (i) sending the result of 4 majority gates to the high-level processor; (ii) counting the number of active majority gates (i.e., gates whose output is 1); (iii) generating an approximate sigmoid function for postprocessing based on an artificial neural network. QCA circuits have significant wire delays. For a fast design in QCA, it is generally necessary to minimize the complexity. The design uses systolic array structures to produce an output on every clock cycle with low latency to the first output. The layouts and functionality checks were done using QCADesigner. NSDP is a relatively complicated system. The simulation results show that its three functions work correctly. It is hoped that this paper will inspire further ideas on developing application systems based on QCA circuits.
As mentioned in Section 1, a nanosensor system generates a large amount of nanosensor data. This type of system needs $10^3$ to $10^7$ PEs to process the real-time data. The NSDP described in this paper works as a PE. A large number of PEs ($10^3$ to $10^7$) need to be arranged in a parallel and distributed manner to handle the huge amount of nanosensor data. NSDP is a 4-bit processor. Extending it to an 8-bit processor would make it more convenient to handle large amounts of nanosensor data. All of these are left as future work.

(A) Layout Design Rules
(1) Maximum Number of Cells in a Single Clocking Zone. It can be as large as 47 cells; the maximum length of a QCA wire is 25 cells. Any QCA wire longer than 25 cells needs to be partitioned into different clocking zones.

3.5. Sigmoid Function Output. The output of the sigmoid function is one of the four 4-bit patterns (patterns 4, 3, 2, and 1), depending on the number of 1s reported by the counter. Therefore, each bit of the sigmoid function output (bits 3, 2, 1, and 0) can be written as the OR of the corresponding bits of the four patterns, each ANDed with its decoder output.
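The pattern selection can be sketched as an AND-OR structure over the four one-hot decoder outputs. The sketch assumes that each output bit is the OR, over the four patterns, of the pattern bit ANDed with its decoder output; this form is inferred from the description rather than copied from the paper's equation. The pattern values come from Table 1, and the names `PATTERNS` and `sigmoid_output` are hypothetical.

```python
from typing import Dict, Tuple

# Pattern bits (bit 3 down to bit 0) selected by decoder outputs 4, 3, 2, 1.
PATTERNS: Dict[int, Tuple[int, int, int, int]] = {
    4: (1, 1, 1, 1),
    3: (1, 1, 0, 1),
    2: (0, 0, 1, 1),
    1: (0, 0, 0, 0),
}


def sigmoid_output(d4: int, d3: int, d2: int, d1: int) -> Tuple[int, ...]:
    """Bits 3..0 of the sigmoid output for one-hot decoder outputs d4..d1."""
    select = {4: d4, 3: d3, 2: d2, 1: d1}
    bits = []
    for i in range(4):
        bit = 0
        for k, pattern in PATTERNS.items():
            bit |= select[k] & pattern[i]  # AND with the select line, then OR
        bits.append(bit)
    return tuple(bits)


if __name__ == "__main__":
    print(sigmoid_output(0, 1, 0, 0))  # decoder output 3 selected
```

Because the decoder outputs are one-hot, exactly one pattern contributes to the OR, so the structure behaves as a four-way selector.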

Figure 9: Output of sigmoid function. (a) Schematic for the output of the sigmoid function; (b) QCA layout for output bit 0 of the sigmoid function (this QCA layout is applicable to bits 1, 2, and 3 by changing to the corresponding input); (c) simulation result of the sigmoid function output.