An Efficient Design of DCT Approximation Based on Quantum Dot Cellular Automata (QCA) Technology

Optimization for power is one of the most important design objectives in modern digital image processing applications. The DCT is one of the most essential techniques in image and video compression systems, and researchers have therefore carried out extensive work on its power optimization. Quantum-dot cellular automata (QCA), in turn, offer a novel opportunity to design highly parallel architectures and algorithms that improve the performance of image and video processing systems. QCA also has considerable advantages over CMOS technology, such as extremely low power dissipation, high operating frequency, and small size. Therefore, in this study, the authors propose a multiplier-less DCT architecture in QCA technology. The proposed design provides high circuit performance, very low power consumption, and very small dimensions, and it outperforms existing conventional structures. The QCADesigner tool has been used for QCA circuit design and for the functional verification of all designs in this work. QCAPro, a widely used power estimation tool, is applied to estimate the power dissipation of the proposed circuit. The suggested design achieves a 53% improvement in power over the conventional solution. The outcome of this work can open up new opportunities for low-power image processing systems.


Introduction
In recent years, considerable research has been devoted to transform techniques such as the fast Fourier transform (FFT), the discrete cosine transform (DCT), and the discrete wavelet transform (DWT), which are extensively used in various digital signal processing (DSP) applications [1,2]. The FFT is an essential transform in DSP, with applications in signal filtering, frequency analysis, and compression. The DWT is a widely used time-frequency method for the analysis of nonstationary signals. The DCT has been widely exploited for real-life data compression, where it performs better than the other transforms in some applications. Its energy compaction and decorrelation properties make it very close to the Karhunen-Loeve Transform (KLT); thus, the DCT is preferable for data compression applications. It is an essential conversion between the time and frequency domains in various applications of speech and image processing, communication systems, and signal processing [3]; in image processing, it maps an image from the spatial domain into the frequency domain. The DCT is extensively used in several image and video compression standards such as JPEG [4], MPEG-1 [5], MPEG-2 [6], H.261 [7], H.263 [8], and others [9,10]. However, a direct implementation of the DCT algorithm is not efficient because of its floating-point calculations and complex loops. Floating-point algorithms are slow in software and require more silicon area in hardware [11], whereas the DCT must be computed in a very short time. In this context, a large number of DCT approximations have been proposed in the last few years to reduce the complexity of this transform [12][13][14]. Indeed, the demand for higher-quality video has increased because of the enormous number of electronic devices that process digital video at ever higher resolutions. Thus, power optimization and area minimization are the two principal research areas in very large-scale integration (VLSI) design for embedded and handheld devices that employ various image processing algorithms.

Up to now, complementary metal oxide semiconductor (CMOS) based VLSI technology has been extensively used to improve the quality of image processing systems. However, traditional transistors cannot get much smaller than their current size, which has a large impact on the speed, performance, and power consumption of future designs. The challenges created by this trend could be partially met by innovative technologies proposed as alternatives to classic CMOS. Presently, the single electron transistor (SET), the tunnel field-effect transistor (TFET), carbon nanotube (CNT) devices, and the silicon nanowire transistor are being investigated as alternatives to conventional VLSI technology [15,16]. Among them, quantum-dot cellular automata (QCA) is one of the most promising solutions for designing ultra-low-power and very high-speed digital circuits [17,18]. QCA technology offers a revolutionary approach to computing at the nanoscale, with a promising future because of its ability to achieve high performance in terms of device density, clock frequency, and power consumption; in particular, it offers potential advantages of ultra-low power dissipation. It is expected to achieve a very high device density of 10¹² devices/cm², switching speeds of 10 ps, and a power dissipation of 100 W/cm² [19]. Consequently, an efficient design of circuits based on this new technology could reduce computational complexity and power consumption.
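To make the cost of the floating-point DCT and its energy compaction concrete, the following minimal Python sketch (an illustration only, not part of the proposed design) evaluates the orthonormal 8-point DCT-II directly from its definition, which needs N² = 64 real multiplications per 8-sample vector. The input row is a hypothetical smooth ramp of pixel values, and the names `dct_ii` and `energy_fraction` are illustrative.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II evaluated directly from its definition (floating point)."""
    n_pts = len(x)
    coeffs = []
    for k in range(n_pts):
        # Orthonormal scaling: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
        scale = math.sqrt(1.0 / n_pts) if k == 0 else math.sqrt(2.0 / n_pts)
        acc = sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
            for n in range(n_pts)
        )
        coeffs.append(scale * acc)
    return coeffs


def energy_fraction(coeffs, m):
    """Fraction of the total energy held by the first m coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:m]) / total


# A smooth (highly correlated) row of pixel values, chosen for illustration.
row = [100, 102, 104, 106, 108, 110, 112, 114]
coeffs = dct_ii(row)
print(f"DC share of energy: {energy_fraction(coeffs, 1):.4f}")  # DC share of energy: 0.9982
```

Because the transform is orthonormal, the total energy is preserved, and for this correlated input almost all of it lands in the first coefficient. This is the property that compression exploits.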
These benefits make QCA attractive for image processing applications on portable communication devices, where low power consumption is in high demand. Recently, some efforts have been made towards the design of QCA logic circuits for image processing applications such as MAC operation [20], BinDCT [21], image steganography [22], morphological edge detection [23], thresholding [24], noise removal [25], and morphological erosion and dilation [26]. This scenario motivates us to investigate a new low-power DCT architecture based on QCA technology.
In this paper, we first present an efficient full adder structure built from a three-input XOR gate and a three-input majority gate, which is then used to design an eight-bit ripple carry adder (RCA). Next, an efficient QCA D flip-flop (DFF) is designed and used as the building block of a parallel-in parallel-out (PIPO) shift register. The designed RCA and PIPO shift register are then used to build the QCA DCT architecture. The power dissipation of the proposed DCT design is estimated, and the reliability of the proposed QCA circuit is also explored.
The remainder of this paper is organized as follows. Section 2 provides the background of the DCT algorithm. Section 3 presents an overview of QCA. Section 4 discusses DCT power optimization using QCA technology. Section 5 presents the results and discussion of the proposed DCT architecture. Finally, conclusions are drawn in Section 6.

DCT Algorithm
The discrete cosine transform (DCT) plays a critical role in image and video compression due to its near-optimal decorrelation efficiency [3]. The DCT is similar to the discrete Fourier transform (DFT) and is used to compress both color and grayscale images. The main advantage of transforming an image with the DCT is the suppression of redundancy between neighbouring pixels. In practice, DCT approximations with low bit rates and low computational complexity are preferred, and significant research has been devoted to reducing the computational complexity of the DCT [13,[27][28][29][30][31][32][33][34]]. In [13], a low-power DCT architecture requiring only sixteen additions is proposed, which lowers the computational complexity. A low-complexity orthogonal 8 × 8 transform matrix for fast image compression, requiring only fourteen additions and two shift operations, is proposed in [33]. A new DCT matrix requiring only 12 additions, which achieves low power consumption when implemented in hardware, is reported in [34]. Several other studies have also been carried out to improve the performance of the DCT module and reduce the complexity of the processing [35,36]. Moreover, power consumption is a fundamental problem when designing embedded video applications: embedded and handheld devices face tight energy constraints as a result of their size and weight. This motivates designers to search for new solutions that provide low power consumption for video processing applications. QCA technology, motivated by its applications in low-power electronic design, has attracted significant attention in this respect. In this paper, we use the digital architecture (Figure 1) proposed in [34], which can be implemented quite easily using adders and parallel-in parallel-out (PIPO) shift registers.
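The matrix of [34] is not reproduced in this section, so the Python sketch below swaps in a different, well-known multiplier-free approximation: the signed DCT (SDCT), which replaces every DCT-II basis entry with its sign. It illustrates the general principle behind such approximations. When every matrix entry is +1 or −1, the transform reduces to additions and subtractions, which map naturally onto adders. The naive row-by-row evaluation shown here uses up to 8 × 7 = 56 additions/subtractions; fast factorizations such as the 12-addition design of [34] need far fewer. Function names are hypothetical.

```python
import math

N = 8


def signed_dct_matrix(n_pts=N):
    """SDCT: the sign of each DCT-II basis entry cos(pi*(2n+1)*k/(2N)).

    For N = 8 no basis entry is zero, so every entry is +1 or -1.
    """
    return [
        [1 if math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts)) > 0 else -1
         for n in range(n_pts)]
        for k in range(n_pts)
    ]


def apply_sign_matrix(t, x):
    """Compute y = T x for a matrix T whose entries are all +1 or -1.

    Each output is formed by adding or subtracting inputs only, so no
    multiplier is required (integer inputs give integer outputs).
    """
    y = []
    for row in t:
        acc = 0
        for sign, value in zip(row, x):
            acc = acc + value if sign > 0 else acc - value
        y.append(acc)
    return y


T = signed_dct_matrix()
row = [100, 102, 104, 106, 108, 110, 112, 114]
print(apply_sign_matrix(T, row))
```

Unlike the exact DCT, the SDCT output is not scaled and the SDCT is generally not orthogonal. It is shown here only to illustrate why adder-only datapaths suffice for this family of approximations.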

QCA Fundamentals
The QCA approach, introduced in 1993 by Lent et al. [18], has been proposed as a replacement for field-effect transistor (FET) based devices at the nanoscale. QCA cells are generally classified by implementation into metal-island, nanomagnetic, semiconductor, and molecular types. In QCA technology, data are transmitted through cell polarization, with binary information encoded in quantum-dot cells. This nanotechnology was conceived based on some of Landauer's ideas regarding energy-efficient and robust digital devices [37]. A QCA circuit consists of an array of cells. Each cell contains four quantum dots at the corners of a square, and each dot can hold a single electron. Two electrons are injected into each cell, and because of Coulomb repulsion they occupy diametrically opposite dots [38]. Through these Coulombic effects, two stable polarizations (labelled −1 and +1) can be formed.
These polarizations represent binary "0" and binary "1", as shown in Figures 2 and 3, which show the propagation of logic "0" and logic "1", respectively, from the input to the output of a QCA binary wire due to Coulombic repulsion. In general, the Coulombic interaction between electrons in neighbouring cells is used to implement many logic functions, which are controlled by the clocking mechanism [39].
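The polarization that encodes a bit is commonly defined from the electron charge ρᵢ on the four dots as P = [(ρ₁ + ρ₃) − (ρ₂ + ρ₄)] / (ρ₁ + ρ₂ + ρ₃ + ρ₄), assuming dots 1 and 3 lie on one diagonal and dots 2 and 4 on the other (numbering conventions vary). The Python sketch below computes P and maps P = +1 to logic 1 and P = −1 to logic 0. It also models an idealized binary wire in which each cell aligns with the polarization of its left neighbour. Clocking, kinks, and temperature are ignored, and the function names are hypothetical.

```python
def polarization(rho):
    """Cell polarization from dot charges (rho1, rho2, rho3, rho4).

    Assumes dots 1 and 3 share one diagonal and dots 2 and 4 the other.
    """
    r1, r2, r3, r4 = rho
    return ((r1 + r3) - (r2 + r4)) / (r1 + r2 + r3 + r4)


def to_bit(p):
    """Map polarization to logic: P = +1 -> 1, P = -1 -> 0."""
    return 1 if p > 0 else 0


def propagate_wire(input_polarization, n_cells):
    """Idealized binary wire: each cell settles to the polarization of its left neighbour."""
    cells = []
    left = input_polarization
    for _ in range(n_cells):
        cell = 1.0 if left > 0 else -1.0  # Coulomb repulsion favours alignment
        cells.append(cell)
        left = cell
    return cells


p = polarization((1, 0, 1, 0))  # electrons on dots 1 and 3
print(p, to_bit(p))             # 1.0 1
print(propagate_wire(p, 4))     # [1.0, 1.0, 1.0, 1.0]
```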

Logic Gates.
The majority gate and the inverter are the fundamental logic gates in QCA implementations, and both are composed of QCA cells. Several types of inverter and majority gates are shown in Figure 4. In the inverter gate, the output is the inverse of the input. The majority gate acts as an AND gate or an OR gate simply by fixing one input permanently to 0 or 1, respectively. Its logical function can be expressed by the following equation:

$$M(A, B, C) = AB + BC + AC \tag{1}$$

QCA Clocking.
The clocking system is an important factor in the dynamics of QCA. Its principal functions are the synchronization of data flows and the implementation of adiabatic cell switching, which gives QCA circuits high energy efficiency [40]. QCA clocking generally uses four phases (switch, hold, release, and relax), as illustrated in Figure 5. During the switch phase, in which the actual computation occurs, the barriers are raised, and a cell is affected by the polarization of its adjacent cells and acquires a definite polarity. During the hold phase, the barriers are high and the polarization of the cell is retained. During the release phase, the barriers are lowered and the cell loses its polarity. During the relax phase, the cell remains unpolarized [41].
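A short Python sketch of the majority function of Eq. (1) and of the AND/OR reductions described above, with bits modelled as the integers 0 and 1 and illustrative names. The `clock_phase` helper assumes an idealized four-phase scheme in which clock zone z lags zone z − 1 by one phase (a quarter period). Under this assumption, data switched in one zone is held while the next zone switches. Timing margins are ignored.

```python
from itertools import product

PHASES = ("switch", "hold", "release", "relax")


def maj(a, b, c):
    """Three-input majority gate: M(A, B, C) = AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)


def inv(a):
    """QCA inverter on a single bit."""
    return 1 - a


def and2(a, b):
    return maj(a, b, 0)  # one input fixed to 0 gives AND


def or2(a, b):
    return maj(a, b, 1)  # one input fixed to 1 gives OR


def clock_phase(zone, step):
    """Phase of a clock zone at a time step (each zone one phase behind the previous)."""
    return PHASES[(step - zone) % 4]


# Exhaustive check of the AND/OR reductions.
for a, b in product((0, 1), repeat=2):
    assert and2(a, b) == (a & b) and or2(a, b) == (a | b)

print([clock_phase(z, 1) for z in range(4)])  # ['hold', 'switch', 'relax', 'release']
```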

Crossovers in QCA.
Two approaches are used to cross two wires in QCA: multilayer crossovers and coplanar crossings. Multilayer QCA circuits occupy considerably less area than coplanar circuits; however, they may be expensive and difficult to manufacture. In this paper, we therefore use the coplanar crossing approach in designing our DCT architecture, since the multilayer technique incurs a high cost due to fabrication issues. The coplanar crossing requires two cell types (regular and rotated cells), as shown in Figure 6(a), and has already been applied in several studies [37,42].

QCA Implementation of the DCT
In this section, we present a new DCT architecture based on QCA technology that mitigates the computational complexity and power consumption issues. The configuration is composed of two stages (stage 1 and stage 2). The submodules used in our DCT architecture are eight-bit adders and PIPO shift registers that store the results generated by these adders. Thus, reducing the cell count and area of these components contributes most to achieving low power.

Study of Stage 1.
This stage is composed of eight 8-bit adders and eight 8-bit PIPO shift registers.

Eight-Bit Adder.
The adder circuit plays an important role in arithmetic circuits, and several attempts have recently been made to implement efficient adder circuits in QCA technology [43][44][45][46][47][48][49][50]. In particular, the three-input XOR gate of [51] can easily be used in the synthesis of adder designs. In this subsection, we propose a novel QCA full adder based on a majority gate and a three-input XOR gate. The inputs are A, B, and C_in, and the outputs are the carry-out (C_out) and the Sum, given respectively by the following equations:

$$C_{out} = M(A, B, C_{in}) = AB + BC_{in} + AC_{in} \tag{2}$$

$$Sum = A \oplus B \oplus C_{in} \tag{3}$$

The QCA layout of the proposed full adder is depicted in Figure 7. It consists of one majority gate and one three-input exclusive-OR gate. According to QCADesigner (version 2.0.3), the design consists of 45 cells and covers an area of 0.04 μm². The proposed design provides correct outputs after a delay of two clock phases, as depicted in Figure 8.

An eight-bit ripple carry adder (RCA) is then constructed by cascading eight copies of the proposed full adder in series (Figure 9(a)). To obtain a correct addition with all sum bits available at the same time, additional cells are inserted at the inputs and outputs in different clock zones to synchronize the circuit. The layout of the eight-bit RCA is shown in Figure 9(b). This design uses 526 cells and requires 9 clock phases to generate the final output.
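A behavioural Python sketch of the full adder described above (C_out = M(A, B, C_in), Sum = A ⊕ B ⊕ C_in) and of the eight-bit RCA built by cascading it through the carry. It checks logic only. Clock zones, the 9-phase latency, and cell-level behaviour are not modelled, and the function names are illustrative.

```python
def maj(a, b, c):
    """Majority gate, used for the carry-out: Cout = M(A, B, Cin)."""
    return (a & b) | (b & c) | (a & c)


def xor3(a, b, c):
    """Three-input XOR, used for the sum: Sum = A xor B xor Cin."""
    return a ^ b ^ c


def full_adder(a, b, cin):
    """One majority gate plus one three-input XOR, returning (sum, carry_out)."""
    return xor3(a, b, cin), maj(a, b, cin)


def rca(a, b, cin=0, width=8):
    """Ripple carry adder: `width` full adders cascaded through the carry."""
    total = 0
    carry = cin
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


# Exhaustive functional check of the 8-bit RCA (2 * 256 * 256 cases).
for cin in (0, 1):
    for a in range(256):
        for b in range(256):
            s, cout = rca(a, b, cin)
            assert s + (cout << 8) == a + b + cin

print(rca(200, 100))  # (44, 1), since 300 = 256 + 44
```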

QCA 8-Bit PIPO.
In this subsection, the design of the proposed 8-bit PIPO shift register is explained. The basic building block of a PIPO shift register is the flip-flop, usually a D-type flip-flop. Figure 10 illustrates the proposed QCA D flip-flop, which is built using majority and inverter gates. Its logic behaviour is given by the following equation, where Q⁺ denotes the next state:

$$Q^{+} = D \cdot Clk + Q \cdot \overline{Clk} \tag{4}$$

That is, the input D is copied to the output Q only when the clock input is active; otherwise Q retains its value. The proposed design includes 42 cells with an area of 0.04 μm². It takes five clock periods for the input to reach the output, and the first meaningful output appears at the sixth clock. Figure 11 presents the simulation results of the QCA D flip-flop.
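A behavioural Python sketch of the D flip-flop described above, using the next-state relation Q⁺ = D·Clk + Q·¬Clk (D is copied to Q only while the clock is active), and of an 8-bit PIPO register built from eight such cells on a shared clock. The realization with three majority gates (two acting as AND, one as OR) and one inverter is an assumption for illustration and is not necessarily the gate arrangement of Figure 10. The class and function names are hypothetical, and QCA clock-zone latency is not modelled.

```python
def maj(a, b, c):
    return (a & b) | (b & c) | (a & c)


def inv(a):
    return 1 - a


def dff_next_state(d, clk, q):
    """Level-sensitive model: Q+ = D.Clk + Q.(not Clk), from majority gates."""
    load = maj(d, clk, 0)        # D AND Clk
    keep = maj(q, inv(clk), 0)   # Q AND NOT Clk
    return maj(load, keep, 1)    # OR of the two terms


class PIPORegister:
    """Parallel-in parallel-out register: `width` D cells driven by one clock."""

    def __init__(self, width=8):
        self.q = [0] * width

    def tick(self, data_bits, clk):
        """Apply D0..D(width-1) in parallel; return Q0..Q(width-1) in parallel."""
        self.q = [dff_next_state(d, clk, q) for d, q in zip(data_bits, self.q)]
        return list(self.q)


reg = PIPORegister()
reg.tick([1, 0, 1, 1, 0, 0, 1, 0], clk=1)         # clock active: load all bits
print(reg.tick([0, 0, 0, 0, 0, 0, 0, 0], clk=0))  # [1, 0, 1, 1, 0, 0, 1, 0] (held)
```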

Study of Stage 2.
This stage is composed of eight 8-bit adders and four 8-bit PIPO shift registers. The full adder and PIPO shift register proposed for the first stage are also used in this stage.
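As a back-of-the-envelope aid, the Python sketch below adds up the submodule cell counts reported in this section (526 cells per eight-bit adder and 407 cells per eight-bit PIPO register) for the two stages. The result is only a lower bound on the cells used by the submodules. It excludes interconnect wiring, coplanar crossings, and synchronization cells between modules, and it is not the cell count of the complete DCT design.

```python
# Cell counts reported for the submodules.
ADDER_8BIT_CELLS = 526   # eight-bit ripple carry adder
PIPO_8BIT_CELLS = 407    # eight-bit PIPO shift register

STAGES = {
    "stage 1": {"adders": 8, "pipos": 8},
    "stage 2": {"adders": 8, "pipos": 4},
}


def submodule_cells(adders, pipos):
    """Sum of submodule cell counts; interconnect and crossings are excluded."""
    return adders * ADDER_8BIT_CELLS + pipos * PIPO_8BIT_CELLS


per_stage = {name: submodule_cells(**counts) for name, counts in STAGES.items()}
print(per_stage, sum(per_stage.values()))  # {'stage 1': 7464, 'stage 2': 5836} 13300
```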

Results and Discussions
The proposed designs are implemented and simulated using the QCADesigner 2.0.3 tool [52] and are investigated in semiconductor QCA technology. The simulation parameters are as follows: cell width = 18 nm, cell height = 18 nm, cell-to-cell spacing = 2 nm, dot diameter = 5 nm, number of samples = 12,800, convergence tolerance = 0.001, radius of effect = 80 nm, relative permittivity = 12.9, clock high = 9.8 × 10⁻²² J, clock low = 3.8 × 10⁻²³ J, clock amplitude factor = 2, layer separation = 11.5 nm, and maximum iterations per sample = 100.
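The reported layout areas can be related to the geometry parameters above with a little arithmetic. With 18 nm cells and 2 nm spacing, each cell occupies a 20 nm × 20 nm pitch, that is, 400 nm². The Python sketch below (function name hypothetical) computes the area covered by the cells themselves and compares it with the reported layout areas. The gap is presumably because the reported area is that of the layout's bounding box, which includes unused space between cells.

```python
# Geometry parameters from the QCADesigner settings listed above.
CELL_WIDTH_NM = 18.0
CELL_SPACING_NM = 2.0


def active_cell_area_um2(n_cells, width_nm=CELL_WIDTH_NM, spacing_nm=CELL_SPACING_NM):
    """Area covered by n_cells at the cell pitch (width + spacing), in um^2.

    1 um^2 = 1e6 nm^2. Empty space inside the layout bounding box is ignored.
    """
    pitch_nm = width_nm + spacing_nm
    return n_cells * pitch_nm ** 2 / 1e6


for name, cells, reported in [("full adder", 45, 0.04),
                              ("D flip-flop", 42, 0.04),
                              ("8-bit PIPO", 407, 0.52)]:
    print(f"{name}: {active_cell_area_um2(cells):.4f} um^2 of cells "
          f"vs {reported} um^2 layout")
```

For the full adder this gives 0.0180 μm² of cells inside a 0.04 μm² layout.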
The spacing between two wires is at least two cells wide, and each clock zone contains at least two cells. In this design, the coplanar wire crossing method has been used. The proposed QCA submodules are compared with previously reported designs in terms of circuit complexity in Tables 1-4.
The proposed subcircuits of the QCA DCT approximation have lower complexity and better performance than existing designs. As shown in Table 1, the designed full adder achieves improvements of 78%, 85%, and 75% in cell count, area, and delay, respectively, compared with the design in [53]. Compared with the design in [49], the proposed full adder improves cell count by 8.16% and delay by 50%. Table 2 shows that the proposed 8-bit adder reduces cell count by 33%, area by 5.3%, and delay by 65% compared with the circuit in [47]. In addition, the cell count, area, and delay of the designed QCA D flip-flop are considerably improved compared with the QCA circuits in [21,[56][57][58]], as listed in Table 3. Table 4 summarizes the comparative results, which indicate that the designed eight-bit PIPO outperforms the design in [21] in cell count and area by 27% and 29%, respectively. Hence, the proposed submodules contribute directly to the low-power DCT design.
Since QCA computation relies on Coulomb interaction between cells rather than on electrical current flow, the power consumption of the proposed design is much lower than that of the classical CMOS-based solution. We employed the QCAPro software [59] to estimate the power dissipation of the proposed DCT design. The total power consumption of the system is 0.091 mW, which is considerably lower than the values reported in the literature for CMOS-based designs [34,60,61]. According to Table 5, the proposed architecture dissipates nearly 53% less power than the design presented in [34]. In addition, the proposed design can operate at a higher frequency (above 1 GHz) than the conventional solution. These results indicate that the proposed module could be a good candidate for numerous video and image applications, including future high-definition video applications, where it could help meet the real-time constraints of the most recent high-resolution video formats.
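The percentage figures quoted in this section follow the usual relative-reduction formula, improvement = (reference − proposed) / reference × 100%. As a check, the short Python sketch below applies it to the Table 5 power values for the transform in [34] (0.1954 mW) and the proposed design (0.091 mW). The result is about 53.4%, matching the "nearly 53%" stated above. Whether every percentage in Tables 1-4 was computed exactly this way is an assumption.

```python
def improvement_percent(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return (reference - proposed) / reference * 100.0


# Power values from Table 5 (mW): transform in [34] vs. the proposed design.
print(f"{improvement_percent(0.1954, 0.091):.1f}%")  # 53.4%
```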

Journal of Electrical and Computer Engineering
With the advances being made in QCA technology and the ever-increasing computational requirements of image processing, this work can open up new opportunities in this field.
The effect of temperature variations on the polarization of the output cell in the proposed DCT design has also been investigated. The output polarization was measured at different temperatures, and the effect is depicted in Figure 14. According to this figure, the DCT circuit works correctly between 1 K and 6 K. Above 6 K, the output polarization drops dramatically and the design starts malfunctioning.

Conclusion
Area minimization and low power are two indispensable requirements for portable multimedia devices, which use several image processing algorithms. QCA technology offers several advantages, such as very low power dissipation, high functional density, and improved computing speed (in the terahertz range), and it facilitates further miniaturization at the nanoscale. In this paper, a novel design of a DCT approximation in QCA technology has been presented. The proposed design consumes 0.091 mW of power, and the operating frequency of this architecture can exceed 1 THz. This work provides high circuit performance, very low power consumption, and very small dimensions compared with traditional VLSI technology. The outcome of this work can open up new opportunities for low-power video designs. Future extensions, such as various applications based on this QCA DCT, could be investigated.

Figures 12 and 13 show, respectively, the schematic and the QCA layout of the proposed eight-bit PIPO shift register. It consists of eight QCA D flip-flops driven by a common clock signal. The input data D0, D1, ..., D7 are loaded into the register simultaneously (in parallel), and the output data Q0, Q1, ..., Q7 are available in parallel at the outputs of the individual D flip-flops. The proposed QCA layout is composed of 407 cells with an area of 0.52 μm². It has a critical path length of 35 clock zones.

Figure 7: Proposed QCA layout of the FA circuit.

Figure 8: Simulated input-output waveform of the proposed FA circuit.

Figure 9: Proposed (a) logical diagram and (b) QCA layout of the 8-bit parallel binary full adder.

Figure 10: Proposed (a) logical diagram and (b) QCA layout of the D flip-flop.

Table 1: Comparison of the proposed adder with the previous works.

Table 2: Comparison of the proposed 8-bit adder with the previous works.

Table 3: Comparison of the proposed D flip-flop with the previous works.

Table 4: Comparison of the proposed 8-bit PIPO with the previous works.

Table 5: Comparison of the proposed DCT with the previous works.
Design                  Power (mW)
Transform in [60]       29.78
Transform in [61]       12.4
Transform in [34]       0.1954
Proposed transform      0.091